pipeline
========

This directory contains tools and scripts for running a cron job that does
RAPPOR analysis and generates an HTML dashboard.

It works like this:

1. `task_spec.py` generates a text file where each line corresponds to a process
   to be run (a "task").  The process is `bin/decode-dist` or
   `bin/decode-assoc`.  The line contains the task parameters.

2. `xargs -P` is used to run processes in parallel.  Our analysis is generally
   single-threaded (because R is single-threaded), so this helps utilize
   the machine fully.  Each task places its output in a different subdirectory.

3. `cook.sh` calls `combine_results.py` to combine analysis results into a time
   series.  It also calls `combine_status.py` to keep track of task data for
   "meta-analysis".  `metric_status.R` generates more summary CSV files.

4. `ui.sh` calls `csv_to_html.py` to generate HTML fragments from the CSV
   files.

5. The JavaScript in `ui/ui.js` is loaded from static HTML, and makes AJAX calls
   to retrieve the HTML fragments.  The page is made interactive with
   `ui/table-lib.js`.
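
Steps 1 and 2 can be sketched in Python as follows.  This is a minimal
illustration, not the code in `task_spec.py`: the task-line layout
(`<metric> <date> <subdir>`), the helper names, and the stand-in child process
used in place of `bin/decode-dist` are all assumptions.  A thread pool bounds
the number of concurrent child processes, which is the role `xargs -P` plays in
the pipeline.

```python
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_task_spec(path, metrics, dates):
    """Write one line per task: '<metric> <date> <output subdirectory>'.

    This field layout is hypothetical, not the format task_spec.py emits.
    """
    lines = [f"{m} {d} {m}/{d}" for m in metrics for d in dates]
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


def run_task(line, out_root):
    """Run one task in its own process, writing into its own subdirectory."""
    metric, date, subdir = line.split()
    out_dir = Path(out_root) / subdir
    out_dir.mkdir(parents=True, exist_ok=True)
    # Stand-in for bin/decode-dist or bin/decode-assoc (assumption): a child
    # process that only records the parameters it was given.
    code = "import sys; open(sys.argv[1], 'w').write(' '.join(sys.argv[2:]))"
    cmd = [sys.executable, "-c", code, str(out_dir / "result.txt"), metric, date]
    return subprocess.run(cmd).returncode


def run_all(spec_path, out_root, max_procs=4):
    """Run every task in the spec, at most max_procs at once (like xargs -P)."""
    lines = Path(spec_path).read_text().splitlines()
    with ThreadPoolExecutor(max_workers=max_procs) as pool:
        return list(pool.map(lambda line: run_task(line, out_root), lines))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        spec = Path(tmp) / "task_spec.txt"
        # Sample metric names and dates, made up for the demo.
        write_task_spec(spec, ["metricA", "metricB"], ["2015-01-01", "2015-01-02"])
        print(run_all(spec, Path(tmp) / "out"))
```

Threads are enough here because each one only waits on a child process; the
actual work happens in the separate processes.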

`dist.sh` and `assoc.sh` contain functions which coordinate this process.
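
As a rough illustration of steps 3 and 4 above, the sketch below merges
per-task results into a time-series CSV and renders a CSV as an HTML table
fragment.  The input shape and the function names are hypothetical and do not
reflect the actual interfaces of `combine_results.py` or `csv_to_html.py`.

```python
import csv
import html
import io


def combine_results(per_task):
    """Merge {date: {metric: value}} results into time-series CSV text."""
    metrics = sorted({m for row in per_task.values() for m in row})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["date"] + metrics)
    # One row per date, in chronological order; missing values stay empty.
    for date in sorted(per_task):
        writer.writerow([date] + [per_task[date].get(m, "") for m in metrics])
    return out.getvalue()


def csv_to_html_fragment(csv_text):
    """Render CSV text as a <table> fragment, escaping every cell."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    parts = ["<table>", "<thead><tr>"]
    parts += [f"<th>{html.escape(c)}</th>" for c in header]
    parts += ["</tr></thead>", "<tbody>"]
    for row in body:
        cells = "".join(f"<td>{html.escape(c)}</td>" for c in row)
        parts.append(f"<tr>{cells}</tr>")
    parts += ["</tbody>", "</table>"]
    return "\n".join(parts)
```

A fragment like this has no `<html>` or `<body>` wrapper, so a page can fetch it
and insert it into an existing element.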

`alarm-lib.sh` is used to kill processes that have been running for too long.
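
The same idea, a deadline after which a runaway task is killed, can be sketched
in Python with the `timeout` argument of `subprocess.run`.  This is an analogy
only: how `alarm-lib.sh` implements it is not shown here, and
`run_with_deadline` is a hypothetical helper.

```python
import subprocess
import sys


def run_with_deadline(cmd, seconds):
    """Run cmd; return its exit code, or None if it was killed for overrunning."""
    try:
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising,
        # so no process is left behind.
        return None


if __name__ == "__main__":
    quick = [sys.executable, "-c", "pass"]
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_deadline(quick, 10))
    print(run_with_deadline(slow, 0.5))
```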

Testing
-------

`pipeline/regtest.sh` contains end-to-end demos of this process.  Right now it
depends on testdata from elsewhere in the tree:

    rappor$ ./demo.sh run   # prepare dist testdata
    rappor$ cd bin

    bin$ ./test.sh write-assoc-testdata  # prepare assoc testdata
    bin$ cd ../pipeline

    pipeline$ ./regtest.sh dist
    pipeline$ ./regtest.sh assoc

    pipeline$ python -m SimpleHTTPServer  # start a static web server

    http://localhost:8000/_tmp/
53