RAPPOR
======

RAPPOR is a novel privacy technology that allows inferring statistics about
populations while preserving the privacy of individual users.

This repository contains simulation and analysis code in Python and R.

For a detailed description of the algorithms, see the
[paper](http://arxiv.org/abs/1407.6981) and links below.

Feel free to send feedback to the [rappor-discuss][group] mailing list.

Running the Demo
----------------

Although the Python and R libraries should be portable to any platform, our
end-to-end demo has only been tested on Linux.

If you don't have a Linux box handy, you can [view the generated
output](http://google.github.io/rappor/examples/report.html).

To set up your environment, you need some packages and R dependencies. A setup
script installs them:

    $ ./setup.sh

Then, to build the native components, run:

    $ ./build.sh

This compiles and tests the `fastrand` C extension module for Python, which
speeds up the simulation.

Finally, run the demo:

    $ ./demo.sh

The demo strings together the Python and R code. It:

1. Generates simulated input data with different distributions
2. Runs it through the RAPPOR privacy-preserving reporting mechanisms
3. Analyzes and plots the aggregated reports against the true input

The output is written to `_tmp/regtest/results.html`, and can be opened with a
browser.

Dependencies
------------

[R](http://r-project.org) analysis (`analysis/R`):

- [glmnet](http://cran.r-project.org/web/packages/glmnet/index.html)
- [limSolve](https://cran.r-project.org/web/packages/limSolve/index.html)

Demo dependencies (`demo.sh`):

These are necessary if you want to test changes to the code.

- R libraries
    - [ggplot2](http://cran.r-project.org/web/packages/ggplot2/index.html)
    - [optparse](http://cran.r-project.org/web/packages/optparse/index.html)
- bash shell / coreutils: to run tests

Python client (`client/python`):

- None. You should be able to just import the `rappor.py` file.

Platform:

- R: tested on R 3.0.
- Python: tested on Python 2.7.
- OS: the shell script tests have been tested on Linux, but may work on
  Mac/Cygwin. The R and Python code should work on any OS.

Development
-----------

To run tests:

    $ ./test.sh

This currently runs Python unit tests, lints Python source files, and runs R
unit tests.

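As a rough illustration of the three demo steps listed under "Running the Demo"
(simulate input, randomize it, analyze the aggregate), here is a minimal sketch.
It is not the code in this repository: the function names are hypothetical, and
it assumes a much-simplified model in which each client holds a single true bit
and reports 1 with probability `q` when the bit is 1 and with probability `p`
when it is 0. The mechanism described in the paper has more parts than this.

```python
import random


def randomized_report(true_bit, p, q, rng):
    """Return a noisy bit: 1 with probability q if true_bit is 1, else p.

    Hypothetical helper for illustration; not part of rappor.py.
    """
    prob_one = q if true_bit else p
    return 1 if rng.random() < prob_one else 0


def estimate_true_count(reports, p, q):
    """Estimate how many clients held a true 1 from the noisy reports.

    With t true ones among n clients, E[sum(reports)] = q*t + p*(n - t),
    so t is estimated as (sum(reports) - p*n) / (q - p).
    """
    n = len(reports)
    return (sum(reports) - p * n) / (q - p)


def simulate(num_clients=100000, true_rate=0.3, p=0.25, q=0.75, seed=0):
    """Run the three steps: generate input, randomize it, analyze it."""
    rng = random.Random(seed)  # NOT cryptographically strong; simulation only
    true_bits = [1 if rng.random() < true_rate else 0
                 for _ in range(num_clients)]
    reports = [randomized_report(b, p, q, rng) for b in true_bits]
    return sum(true_bits), estimate_true_count(reports, p, q)


if __name__ == "__main__":
    actual, estimated = simulate()
    print("actual: %d, estimated: %.0f" % (actual, estimated))
```

No individual report reveals a client's true bit with certainty, yet the
aggregate estimate should land close to the actual count when there are many
clients. The parameter values above are arbitrary choices for the sketch.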

API
---

`rappor.py` is a tiny standalone Python file, and you can easily copy it into a
Python program.

NOTE: Its interface is subject to change. We are in the demo stage now, but if
there's demand, we will document and publish the interface.

The R interface is also subject to change.

<!-- TODO: Add links to interface docs when available. -->

The `fastrand` C module is optional. It's likely only useful for simulation of
thousands of clients. It doesn't use cryptographically strong randomness, and
thus should **not** be used in production.

Directory Structure
-------------------

    analysis/
      R/                 # R code for analysis
      cpp/               # Fast reimplementations of certain analysis
                         #   algorithms
    apps/                # Web apps to help you use RAPPOR (using Shiny)
    bin/                 # Command line tools for analysis.
    client/              # Client libraries
      python/            # Python client library
        rappor.py
        ...
      cpp/               # C++ client library
        encoder.cc
        ...
    doc/                 # Documentation
    tests/               # Tools for regression tests
      compare_dist.R     # Test helper for single variable analysis
      gen_true_values.R  # Generate test input
      make_summary.py    # Generate an HTML report for the regtest
      rappor_sim.py      # RAPPOR client simulation
      regtest_spec.py    # Specification of test cases
      ...
    build.sh             # Build scripts (docs, C extension, etc.)
    demo.sh              # Quick demonstration
    docs.sh              # Generate docs from the markdown in doc/
    gh-pages/            # Where generated docs go. (A subtree of the branch gh-pages)
    pipeline/            # Analysis pipeline code.
    regtest.sh           # End-to-end regression tests, including client
                         #   libraries and analysis
    setup.sh             # Install dependencies (for Linux)
    test.sh              # Test runner

Documentation
-------------

- [RAPPOR Data Flow](http://google.github.io/rappor/doc/data-flow.html)

Publications
------------

- [RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response](http://arxiv.org/abs/1407.6981)
- [Building a RAPPOR with the Unknown: Privacy-Preserving Learning of Associations and Data Dictionaries](http://arxiv.org/abs/1503.01214)

Links
-----

- [Google Blog Post about RAPPOR](http://googleresearch.blogspot.com/2014/10/learning-statistics-with-privacy-aided.html)
- [RAPPOR implementation in Chrome](http://www.chromium.org/developers/design-documents/rappor)
    - This is a production quality C++ implementation, but it's somewhat tied to
      Chrome, and doesn't support all privacy parameters (e.g. only a few values
      of p and q). On the other hand, the code in this repo is not yet
      production quality, but supports experimentation with different parameters
      and data sets. Of course, anyone is free to implement RAPPOR independently
      as well.
- Mailing list: [rappor-discuss][group]

[group]: https://groups.google.com/forum/#!forum/rappor-discuss