Command Line Tools
==================

This directory contains command line tools for RAPPOR analysis.

Analysis Tools
--------------

### decode-dist

Decodes a distribution. Requires a "counts" file (summed bits from reports),
a map file, and a params file. See `test.sh decode-dist` in this directory for
an example.
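Decoding starts by correcting the observed bit counts for the noise that RAPPOR adds. The sketch below shows only that first debiasing step, using the estimator from the RAPPOR paper (Erlingsson et al., 2014): t = (c - (p + f·q/2 - f·p/2)·N) / ((1 - f)(q - p)). It assumes that each row of the counts file is `[N, c_1, ..., c_k]`, with N being the number of reports in the cohort, and that the params file supplies the usual RAPPOR probabilities p, q and f. The function name `estimate_true_bit_counts` is hypothetical. This is not the R code used by `decode-dist`, which also fits the candidates in the map file to these estimates.

```python
from typing import List, Sequence


def estimate_true_bit_counts(
    counts_row: Sequence[float], p: float, q: float, f: float
) -> List[float]:
    """Debias one cohort's row of a counts file (hypothetical helper).

    Assumes counts_row = [N, c_1, ..., c_k], where N is the number of reports
    in the cohort and c_i is how many reports had bit i set. Returns estimated
    counts t_i of clients whose true Bloom filter bit i was set.
    """
    n, observed = counts_row[0], counts_row[1:]
    # Expected count for a bit whose true value is 0 (noise alone).
    baseline = (p + 0.5 * f * q - 0.5 * f * p) * n
    # How much one client with a true 1-bit raises the expected count.
    scale = (1.0 - f) * (q - p)
    return [(c - baseline) / scale for c in observed]


if __name__ == "__main__":
    # Hypothetical cohort: 100 reports, two bits observed set 60 and 40 times.
    print(estimate_true_bit_counts([100, 60, 40], p=0.25, q=0.75, f=0.5))
    # prints [90.0, 10.0]
```

With f = 0, p = 0 and q = 1 there is no noise, and the estimate equals the observed count.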

### decode-assoc

Decodes a joint distribution between two variables ("association analysis").
See `test.sh decode-assoc-R` or `test.sh decode-assoc-cpp` in this directory
for an example.

Currently it only supports associating strings with booleans.
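To make the target quantity concrete: the joint distribution of a string variable and a boolean variable is the fraction of clients that have each (string, boolean) combination. The sketch below computes that table directly from plaintext pairs. `decode-assoc` has to estimate the same table from noisy RAPPOR reports instead, and this sketch does not implement its decoding. The function name `joint_distribution` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def joint_distribution(
    pairs: Iterable[Tuple[str, bool]]
) -> Dict[Tuple[str, bool], float]:
    """Fraction of clients with each (string, boolean) combination.

    Computed from plaintext pairs for illustration only; this is NOT how
    decode-assoc works, since it never sees the true values.
    """
    counts = Counter(pairs)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical data: each client has one string value and one boolean.
    clients = [("foo", True), ("foo", False), ("bar", True), ("foo", True)]
    print(joint_distribution(clients))
```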

### Setup

Both of these tools are written in R and require several R libraries to be
installed (see `../setup.sh r-packages`).

If `--em-executable` is passed, `decode-assoc` also shells out to a native
binary written in C++. Building it requires a C++ compiler (see
`analysis/cpp/run.sh`). Run `test.sh decode-assoc-cpp` to test it.

Helper Tools
------------

These are simple Python implementations of tools needed for analysis. At
Google, Chrome uses alternative C++/Go implementations of these tools.

### sum-bits

Given a CSV file of RAPPOR reports (IRRs), produces a "counts" CSV file on
stdout. This is the `m x (k+1)` matrix used in the R analysis, where m is the
number of cohorts and k is the report width in bits.
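The summing step can be sketched as follows. This is a minimal sketch under several assumptions. The input uses a hypothetical `cohort,irr` row layout, where each IRR is a string of k `0`/`1` characters. The extra column in `m x (k+1)` is taken to hold the number of reports in each cohort. Column j+1 counts how often character j is `1`. The real `sum-bits` reads a different report layout and may order bits differently. The name `sum_bits` is hypothetical.

```python
import csv
import io
import sys
from typing import List, TextIO


def sum_bits(reports: TextIO, num_cohorts: int, num_bits: int) -> List[List[int]]:
    """Build an m x (k+1) counts matrix from RAPPOR reports (sketch).

    Hypothetical input: CSV rows of "cohort,irr", where cohort is an integer
    in [0, num_cohorts) and irr is a string of num_bits '0'/'1' characters.
    Row i of the result is [reports in cohort i, sum of bit 0, ..., sum of bit k-1].
    """
    counts = [[0] * (num_bits + 1) for _ in range(num_cohorts)]
    for cohort_str, irr in csv.reader(reports):
        if len(irr) != num_bits:
            raise ValueError(f"expected {num_bits} bits, got {irr!r}")
        row = counts[int(cohort_str)]
        row[0] += 1  # one more report in this cohort
        for j, ch in enumerate(irr):
            if ch == "1":
                row[j + 1] += 1
    return counts


if __name__ == "__main__":
    # Three hypothetical reports: two in cohort 0, one in cohort 1.
    sample = io.StringIO("0,1010\n0,1000\n1,0011\n")
    writer = csv.writer(sys.stdout, lineterminator="\n")
    writer.writerows(sum_bits(sample, num_cohorts=2, num_bits=4))
```

Each cohort gets its own row because cohorts use different hash functions, so their bit sums cannot be pooled.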

### hash-candidates

Given a list of candidates on stdin, produces a CSV file of hashes (the "map
file"). Each row has `m x h` cells, where m is the number of cohorts and h is
the number of hash functions.
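A map-file row can be sketched as the candidate string followed by h Bloom filter bit positions for each of the m cohorts. This sketch assumes a few things. Each cell is a 0-based bit position within that cohort's k-bit filter. The hash is a stand-in built on SHA-256 over a hypothetical `"<cohort>:<candidate>"` key. It does not reproduce RAPPOR's real hashing or the exact encoding of the map file, so its output cannot be decoded against real reports. `bloom_bits` and `map_row` are hypothetical names.

```python
import csv
import hashlib
import sys
from typing import List


def bloom_bits(candidate: str, cohort: int, num_hashes: int, num_bits: int) -> List[int]:
    """Hypothetical hash: num_hashes bit positions in [0, num_bits).

    Uses one byte of a SHA-256 digest per hash, so it supports at most 32
    hashes and is only uniform for num_bits that divide 256. NOT RAPPOR's
    real hash function.
    """
    if num_hashes > 32:
        raise ValueError("SHA-256 digest only has 32 bytes")
    digest = hashlib.sha256(f"{cohort}:{candidate}".encode("utf-8")).digest()
    return [digest[i] % num_bits for i in range(num_hashes)]


def map_row(candidate: str, num_cohorts: int, num_hashes: int, num_bits: int) -> List[str]:
    """One map-file row: the candidate, then m x h bit positions."""
    row = [candidate]
    for cohort in range(num_cohorts):
        row.extend(str(b) for b in bloom_bits(candidate, cohort, num_hashes, num_bits))
    return row


if __name__ == "__main__":
    # The real tool reads candidates from stdin; a fixed list keeps this demo
    # from blocking on input.
    writer = csv.writer(sys.stdout, lineterminator="\n")
    for candidate in ["foo", "bar"]:
        writer.writerow(map_row(candidate, num_cohorts=4, num_hashes=2, num_bits=16))
```

Because the hash depends on the cohort, the same candidate sets different bits in each cohort. This is why the map file needs m x h cells per candidate rather than just h.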

See the `regtest.sh` script for examples of how these tools are invoked.