RAPPOR
======

RAPPOR is a novel privacy technology that allows inferring statistics about
populations while preserving the privacy of individual users.

This repository contains simulation and analysis code in Python and R.

For a detailed description of the algorithms, see the
[paper](http://arxiv.org/abs/1407.6981) and links below.

Feel free to send feedback to the
[rappor-discuss mailing list][group].

Running the Demo
----------------

Although the Python and R libraries should be portable to any platform, our
end-to-end demo has only been tested on Linux.

If you don't have a Linux box handy, you can [view the generated
output](http://google.github.io/rappor/examples/report.html).

To set up your environment, install the required packages and R dependencies
with the setup script:

    $ ./setup.sh

Then build the native components:

    $ ./build.sh

This compiles and tests the `fastrand` C extension module for Python, which
speeds up the simulation.

Finally, run the demo:

    $ ./demo.sh

The demo strings together the Python and R code. It:

1. Generates simulated input data with different distributions
2. Runs it through the RAPPOR privacy-preserving reporting mechanisms
3. Analyzes and plots the aggregated reports against the true input

The output is written to `_tmp/regtest/results.html`, and can be opened with a
browser.

Dependencies
------------

[R](http://r-project.org) analysis (`analysis/R`):

- [glmnet](http://cran.r-project.org/web/packages/glmnet/index.html)
- [limSolve](https://cran.r-project.org/web/packages/limSolve/index.html)

Demo dependencies (`demo.sh`):

These are necessary if you want to test changes to the code.

- R libraries
    - [ggplot2](http://cran.r-project.org/web/packages/ggplot2/index.html)
    - [optparse](http://cran.r-project.org/web/packages/optparse/index.html)
- bash shell / coreutils: to run tests

Python client (`client/python`):

- None. You should be able to just import the `rappor.py` file.

Platform:

- R: tested on R 3.0.
- Python: tested on Python 2.7.
- OS: the shell script tests have been tested on Linux, but may work on
  Mac/Cygwin. The R and Python code should work on any OS.

Development
-----------

To run tests:

    $ ./test.sh

This currently runs Python unit tests, lints Python source files, and runs R
unit tests.

API
---

`rappor.py` is a tiny standalone Python file, and you can easily copy it into a
Python program.

NOTE: Its interface is subject to change. We are in the demo stage now, but if
there's demand, we will document and publish the interface.

The R interface is also subject to change.

<!-- TODO: Add links to interface docs when available. -->

The `fastrand` C module is optional. It's likely only useful for simulation of
thousands of clients. It doesn't use cryptographically strong randomness, and
thus should **not** be used in production.

Directory Structure
-------------------

    analysis/
      R/                 # R code for analysis
      cpp/               # Fast reimplementations of certain analysis
                         #   algorithms
    apps/                # Web apps to help you use RAPPOR (using Shiny)
    bin/                 # Command line tools for analysis.
    client/              # Client libraries
      python/            # Python client library
        rappor.py
        ...
      cpp/               # C++ client library
        encoder.cc
        ...
    doc/                 # Documentation
    tests/               # Tools for regression tests
      compare_dist.R     # Test helper for single variable analysis
      gen_true_values.R  # Generate test input
      make_summary.py    # Generate an HTML report for the regtest
      rappor_sim.py      # RAPPOR client simulation
      regtest_spec.py    # Specification of test cases
      ...
    build.sh             # Build scripts (docs, C extension, etc.)
    demo.sh              # Quick demonstration
    docs.sh              # Generate docs from the markdown in doc/
    gh-pages/            # Where generated docs go. (A subtree of the branch gh-pages)
    pipeline/            # Analysis pipeline code.
    regtest.sh           # End-to-end regression tests, including client
                         #   libraries and analysis
    setup.sh             # Install dependencies (for Linux)
    test.sh              # Test runner

Documentation
-------------

- [RAPPOR Data Flow](http://google.github.io/rappor/doc/data-flow.html)

Publications
------------

- [RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response](http://arxiv.org/abs/1407.6981)
- [Building a RAPPOR with the Unknown: Privacy-Preserving Learning of Associations and Data Dictionaries](http://arxiv.org/abs/1503.01214)

Links
-----

- [Google Blog Post about RAPPOR](http://googleresearch.blogspot.com/2014/10/learning-statistics-with-privacy-aided.html)
- [RAPPOR implementation in Chrome](http://www.chromium.org/developers/design-documents/rappor)
    - This is a production-quality C++ implementation, but it's somewhat tied to
      Chrome, and doesn't support all privacy parameters (e.g. only a few values
      of p and q). On the other hand, the code in this repo is not yet
      production quality, but supports experimentation with different parameters
      and data sets. Of course, anyone is free to implement RAPPOR independently
      as well.
- Mailing list: [rappor-discuss][group]

[group]: https://groups.google.com/forum/#!forum/rappor-discuss
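
For intuition about the `p` and `q` privacy parameters mentioned under Links,
here is a minimal Python sketch of the per-bit randomized response step
described in the RAPPOR paper: a set bit is reported as 1 with probability `q`,
and an unset bit with probability `p`. The function names are hypothetical and
are **not** the interface of `rappor.py`. The sketch uses Python's
non-cryptographic `random` module, so it is for illustration only.

```python
import random


def instantaneous_randomized_response(bits, p, q, rng=random):
    """Hypothetical helper: randomize a list of 0/1 bits.

    A bit that is 1 is reported as 1 with probability q;
    a bit that is 0 is reported as 1 with probability p.
    """
    return [1 if rng.random() < (q if b else p) else 0 for b in bits]


def estimate_true_count(reported_ones, num_reports, p, q):
    """Hypothetical helper: estimate how many reports truly had the bit set.

    Inverts E[reported_ones] = q * t + p * (num_reports - t) for t.
    """
    return (reported_ones - p * num_reports) / (q - p)
```

With `p = 0` and `q = 1` the report equals the input, so there is no privacy.
Moving `p` and `q` closer together adds noise to each report, and the aggregate
estimate then needs more reports to be accurate.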