RAPPOR
======

RAPPOR is a novel privacy technology that allows inferring statistics about
populations while preserving the privacy of individual users.

This repository contains simulation and analysis code in Python and R.

For a detailed description of the algorithms, see the
[paper](http://arxiv.org/abs/1407.6981) and links below.

Feel free to send feedback to
[rappor-discuss@googlegroups.com][group].

Running the Demo
----------------

Although the Python and R libraries should be portable to any platform, our
end-to-end demo has only been tested on Linux.

If you don't have a Linux box handy, you can [view the generated
output](http://google.github.io/rappor/examples/report.html).

To set up your environment, install the required packages and R dependencies
with the setup script:

    $ ./setup.sh

Then build the native components:

    $ ./build.sh

This compiles and tests the `fastrand` C extension module for Python, which
speeds up the simulation.

Finally, run the demo:

    $ ./demo.sh

The demo strings together the Python and R code.  It:

1. Generates simulated input data with different distributions
2. Runs it through the RAPPOR privacy-preserving reporting mechanisms
3. Analyzes and plots the aggregated reports against the true input

The output is written to `_tmp/regtest/results.html`, and can be opened with a
browser.
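
To give a feel for the three steps above, here is a minimal, self-contained
sketch of the basic RAPPOR mechanism as described in the paper: Bloom filter
encoding, permanent and instantaneous randomized response, and a per-bit count
estimate. It is not the interface of `rappor.py` or of the R analysis code. The
function names (`bloom_bits`, `encode`, `estimate_bit_counts`, `simulate`), the
MD5-based hashing, and the parameter values are all assumptions chosen for
illustration. It targets Python 3, uses a single cohort, and stops at estimated
bit counts rather than decoding them back into string frequencies.

```python
import hashlib
import random


def bloom_bits(value, cohort, num_bits, num_hashes):
    """Return the Bloom filter for `value` as a list of 0/1 bits."""
    bits = [0] * num_bits
    for i in range(num_hashes):
        # Hypothetical hashing scheme: MD5 over cohort, hash index and value.
        key = "%d:%d:%s" % (cohort, i, value)
        digest = hashlib.md5(key.encode("utf-8")).digest()
        bits[int.from_bytes(digest[:4], "big") % num_bits] = 1
    return bits


def encode(bits, f, p, q, rng):
    """Apply permanent, then instantaneous, randomized response to `bits`."""
    report = []
    for b in bits:
        # Permanent randomized response: keep the true bit with probability
        # 1 - f, otherwise replace it with a fair coin flip.
        prr = b if rng.random() >= f else int(rng.random() < 0.5)
        # Instantaneous randomized response: report 1 with probability q if
        # the PRR bit is 1, and with probability p if it is 0.
        report.append(int(rng.random() < (q if prr else p)))
    return report


def estimate_bit_counts(reports, f, p, q):
    """Estimate how many clients truly set each Bloom filter bit."""
    n = len(reports)
    observed = [sum(column) for column in zip(*reports)]
    # Expected number of reported 1s for a bit that no client set.
    noise = (p + 0.5 * f * q - 0.5 * f * p) * n
    return [(c - noise) / ((1 - f) * (q - p)) for c in observed]


def simulate(num_clients=5000, num_bits=16, num_hashes=2,
             f=0.25, p=0.25, q=0.75, seed=0):
    """Simulate clients reporting 'apple' (70%) or 'banana' (30%)."""
    rng = random.Random(seed)  # seeded PRNG: fine for simulation only
    true_counts = [0] * num_bits
    reports = []
    for _ in range(num_clients):
        value = "apple" if rng.random() < 0.7 else "banana"
        bits = bloom_bits(value, 0, num_bits, num_hashes)
        true_counts = [t + b for t, b in zip(true_counts, bits)]
        reports.append(encode(bits, f, p, q, rng))
    return true_counts, estimate_bit_counts(reports, f, p, q)


if __name__ == "__main__":
    true_counts, estimates = simulate()
    for i, (t, e) in enumerate(zip(true_counts, estimates)):
        print("bit %2d  true %5d  estimated %8.1f" % (i, t, e))
```

Each simulated client reports only once here, so the permanent randomized
response is drawn once per client. A real client would need to memoize it per
value, since re-drawing it on every report would weaken the longitudinal
privacy guarantee.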

Dependencies
------------

[R](http://r-project.org) analysis (`analysis/R`):

- [glmnet](http://cran.r-project.org/web/packages/glmnet/index.html)
- [limSolve](https://cran.r-project.org/web/packages/limSolve/index.html)

Demo dependencies (`demo.sh`):

These are necessary if you want to test changes to the code.

- R libraries
  - [ggplot2](http://cran.r-project.org/web/packages/ggplot2/index.html)
  - [optparse](http://cran.r-project.org/web/packages/optparse/index.html)
- bash shell / coreutils: to run tests

Python client (`client/python`):

- None.  You should be able to just import the `rappor.py` file.

Platform:

- R: tested on R 3.0.
- Python: tested on Python 2.7.
- OS: the shell script tests have been tested on Linux, but may work on
  Mac/Cygwin.  The R and Python code should work on any OS.

Development
-----------

To run tests:

    $ ./test.sh

This currently runs Python unit tests, lints Python source files, and runs R
unit tests.

API
---

`rappor.py` is a tiny standalone Python file, and you can easily copy it into a
Python program.

NOTE: Its interface is subject to change.  We are in the demo stage now, but if
there's demand, we will document and publish the interface.

The R interface is also subject to change.

<!-- TODO: Add links to interface docs when available. -->

The `fastrand` C module is optional.  It's likely only useful for simulation of
thousands of clients.  It doesn't use cryptographically strong randomness, and
thus should **not** be used in production.
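
To illustrate the distinction, the sketch below draws the same kind of biased
random bits from two sources in Python's standard library: `random.Random`
(a seedable Mersenne Twister, suitable for reproducible simulation) and
`random.SystemRandom` (backed by the operating system's entropy source). The
helper `biased_bits` is a hypothetical name used only for this example; it does
not show the interface of `fastrand` or `rappor.py`.

```python
import random


def biased_bits(num_bits, prob_one, rng):
    """Return an int whose low `num_bits` bits are each 1 with probability
    `prob_one`, drawing randomness from `rng`."""
    mask = 0
    for i in range(num_bits):
        if rng.random() < prob_one:
            mask |= 1 << i
    return mask


# Simulation: fast and reproducible from a seed, but predictable.
simulation_rng = random.Random(42)

# Production-style source: uses os.urandom and cannot be seeded.
production_rng = random.SystemRandom()

if __name__ == "__main__":
    print(format(biased_bits(16, 0.5, simulation_rng), "016b"))
    print(format(biased_bits(16, 0.5, production_rng), "016b"))
```

Passing the generator in as a parameter keeps the encoding logic identical in
both settings, so only the randomness source changes between simulation and
real use.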

Directory Structure
-------------------

    analysis/
      R/                 # R code for analysis
      cpp/               # Fast reimplementations of certain analysis
                         #   algorithms
    apps/                # Web apps to help you use RAPPOR (using Shiny)
    bin/                 # Command line tools for analysis.
    client/              # Client libraries
      python/            # Python client library
        rappor.py
        ...
      cpp/               # C++ client library
        encoder.cc
        ...
    doc/                 # Documentation
    tests/               # Tools for regression tests
      compare_dist.R     # Test helper for single variable analysis
      gen_true_values.R  # Generate test input
      make_summary.py    # Generate an HTML report for the regtest
      rappor_sim.py      # RAPPOR client simulation
      regtest_spec.py    # Specification of test cases
      ...
    build.sh             # Build scripts (docs, C extension, etc.)
    demo.sh              # Quick demonstration
    docs.sh              # Generate docs from the markdown in doc/
    gh-pages/            # Where generated docs go. (A subtree of the branch gh-pages)
    pipeline/            # Analysis pipeline code.
    regtest.sh           # End-to-end regression tests, including client
                         #  libraries and analysis
    setup.sh             # Install dependencies (for Linux)
    test.sh              # Test runner

Documentation
-------------

- [RAPPOR Data Flow](http://google.github.io/rappor/doc/data-flow.html)

Publications
------------

- [RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response](http://arxiv.org/abs/1407.6981)
- [Building a RAPPOR with the Unknown: Privacy-Preserving Learning of Associations and Data Dictionaries](http://arxiv.org/abs/1503.01214)

Links
-----

- [Google Blog Post about RAPPOR](http://googleresearch.blogspot.com/2014/10/learning-statistics-with-privacy-aided.html)
- [RAPPOR implementation in Chrome](http://www.chromium.org/developers/design-documents/rappor)
  - This is a production quality C++ implementation, but it's somewhat tied to
    Chrome, and doesn't support all privacy parameters (e.g. only a few values
    of p and q).  On the other hand, the code in this repo is not yet
    production quality, but supports experimentation with different parameters
    and data sets.  Of course, anyone is free to implement RAPPOR independently
    as well.
- Mailing list: [rappor-discuss@googlegroups.com][group]

[group]: https://groups.google.com/forum/#!forum/rappor-discuss