xref: /aosp_15_r20/external/private-join-and-compute/README.md (revision a6aa18fbfbf9cb5cd47356a9d1b057768998488c)
1*a6aa18fbSYabin Cui# Private Join and Compute
2*a6aa18fbSYabin Cui
3*a6aa18fbSYabin CuiThis project contains an implementation of the "Private Join and Compute"
4*a6aa18fbSYabin Cuifunctionality. This functionality allows two users, each holding an input file,
5*a6aa18fbSYabin Cuito privately compute the sum of associated values for records that have common
6*a6aa18fbSYabin Cuiidentifiers.
7*a6aa18fbSYabin Cui
8*a6aa18fbSYabin CuiIn more detail, suppose a Server has a file containing the following
9*a6aa18fbSYabin Cuiidentifiers:
10*a6aa18fbSYabin Cui
11*a6aa18fbSYabin CuiIdentifiers |
12*a6aa18fbSYabin Cui----------- |
13*a6aa18fbSYabin CuiSam         |
14*a6aa18fbSYabin CuiAda         |
15*a6aa18fbSYabin CuiRuby        |
16*a6aa18fbSYabin CuiBrendan     |
17*a6aa18fbSYabin Cui
18*a6aa18fbSYabin CuiAnd a Client has a file containing the following identifiers, paired with
19*a6aa18fbSYabin Cuiassociated integer values:
20*a6aa18fbSYabin Cui
21*a6aa18fbSYabin CuiIdentifiers | Associated Values
22*a6aa18fbSYabin Cui----------- | -----------------
23*a6aa18fbSYabin CuiRuby        | 10
24*a6aa18fbSYabin CuiAda         | 30
25*a6aa18fbSYabin CuiAlexander   | 5
26*a6aa18fbSYabin CuiMika        | 35
27*a6aa18fbSYabin Cui
28*a6aa18fbSYabin CuiThen the Private Join and Compute functionality would allow the Client to learn
29*a6aa18fbSYabin Cuithat the input files had *2* identifiers in common, and that the associated
30*a6aa18fbSYabin Cuivalues summed to *40*. It does this *without* revealing which specific
31*a6aa18fbSYabin Cuiidentifiers were in common (Ada and Ruby in the example above), or revealing
32*a6aa18fbSYabin Cuianything additional about the other identifiers in the two parties' data set.
33*a6aa18fbSYabin Cui
34*a6aa18fbSYabin CuiPrivate Join and Compute is a variant of the well-studied Private Set
35*a6aa18fbSYabin CuiIntersection functionality. We sometimes also refer to Private Join and Compute
36*a6aa18fbSYabin Cuias Private Intersection-Sum.
37*a6aa18fbSYabin Cui
38*a6aa18fbSYabin Cui## How to run the protocol
39*a6aa18fbSYabin Cui
40*a6aa18fbSYabin CuiIn order to run Private Join and Compute, you need to install Bazel, if you
41*a6aa18fbSYabin Cuidon't have it already.
42*a6aa18fbSYabin Cui[Follow the instructions for your platform on the Bazel website.](https://docs.bazel.build/versions/master/install.html)
43*a6aa18fbSYabin Cui
44*a6aa18fbSYabin CuiYou also need to install Git, if you don't have it already.
45*a6aa18fbSYabin Cui[Follow the instructions for your platform on the Git website.](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
46*a6aa18fbSYabin Cui
47*a6aa18fbSYabin CuiOnce you've installed Bazel and Git, open a Terminal and clone the Private Join
48*a6aa18fbSYabin Cuiand Compute repository into a local folder:
49*a6aa18fbSYabin Cui
50*a6aa18fbSYabin Cui```shell
51*a6aa18fbSYabin Cuigit clone https://github.com/google/private-join-and-compute.git
52*a6aa18fbSYabin Cui```
53*a6aa18fbSYabin Cui
54*a6aa18fbSYabin CuiNavigate into the `private-join-and-compute` folder you just created, and build
55*a6aa18fbSYabin Cuithe Private Join and Compute library and dependencies using Bazel:
56*a6aa18fbSYabin Cui
57*a6aa18fbSYabin Cui```bash
58*a6aa18fbSYabin Cuicd private-join-and-compute
59*a6aa18fbSYabin Cuibazel build //private_join_and_compute:all
60*a6aa18fbSYabin Cui```
61*a6aa18fbSYabin Cui
62*a6aa18fbSYabin Cui(All the following instructions must be run from inside the
63*a6aa18fbSYabin Cuiprivate-join-and-compute folder.)
64*a6aa18fbSYabin Cui
65*a6aa18fbSYabin CuiNext, generate some dummy data to run the protocol on:
66*a6aa18fbSYabin Cui
67*a6aa18fbSYabin Cui```shell
68*a6aa18fbSYabin Cuibazel-bin/private_join_and_compute/generate_dummy_data --server_data_file=/tmp/dummy_server_data.csv \
69*a6aa18fbSYabin Cui--client_data_file=/tmp/dummy_client_data.csv
70*a6aa18fbSYabin Cui```
71*a6aa18fbSYabin Cui
72*a6aa18fbSYabin CuiThis will create dummy data for the server and client at the specified
73*a6aa18fbSYabin Cuilocations. You can look at the files in `/tmp/dummy_server_data.csv` and
74*a6aa18fbSYabin Cui`/tmp/dummy_client_data.csv` to see the dummy data that was generated. You can
75*a6aa18fbSYabin Cuialso change the size of the dummy data generated using additional flags. For
76*a6aa18fbSYabin Cuiexample:
77*a6aa18fbSYabin Cui
78*a6aa18fbSYabin Cui```shell
79*a6aa18fbSYabin Cuibazel-bin/private_join_and_compute/generate_dummy_data \
80*a6aa18fbSYabin Cui--server_data_file=/tmp/dummy_server_data.csv \
81*a6aa18fbSYabin Cui--client_data_file=/tmp/dummy_client_data.csv --server_data_size=1000 \
82*a6aa18fbSYabin Cui--client_data_size=1000 --intersection_size=200 --max_associated_value=100
83*a6aa18fbSYabin Cui```
84*a6aa18fbSYabin Cui
85*a6aa18fbSYabin CuiOnce you've generated dummy data, you can start the server as follows:
86*a6aa18fbSYabin Cui
87*a6aa18fbSYabin Cui```shell
88*a6aa18fbSYabin Cuibazel-bin/private_join_and_compute/server --server_data_file=/tmp/dummy_server_data.csv
89*a6aa18fbSYabin Cui```
90*a6aa18fbSYabin Cui
91*a6aa18fbSYabin CuiThe server will load data from the specified file, and wait for a connection
92*a6aa18fbSYabin Cuifrom the client.
93*a6aa18fbSYabin Cui
94*a6aa18fbSYabin CuiOnce the server is running, you can start a client to connect to the server.
95*a6aa18fbSYabin CuiCreate a new terminal and navigate to the private-join-and-compute folder. Once
96*a6aa18fbSYabin Cuithere, run the following command to start the client:
97*a6aa18fbSYabin Cui
98*a6aa18fbSYabin Cui```shell
99*a6aa18fbSYabin Cuibazel-bin/private_join_and_compute/client --client_data_file=/tmp/dummy_client_data.csv
100*a6aa18fbSYabin Cui```
101*a6aa18fbSYabin Cui
102*a6aa18fbSYabin CuiThe client will connect to the server and execute the steps of the protocol
103*a6aa18fbSYabin Cuisequentially. At the end of the protocol, the client will output the
104*a6aa18fbSYabin CuiIntersection Size (the number of identifiers in common) and the Intersection Sum
105*a6aa18fbSYabin Cui(the sum of associated values). If the protocol was successful, both the server
106*a6aa18fbSYabin Cuiand client will shut down.
107*a6aa18fbSYabin Cui
108*a6aa18fbSYabin Cui## Caveats
109*a6aa18fbSYabin Cui
110*a6aa18fbSYabin CuiSeveral caveats should be carefully considered before using Private Join and
111*a6aa18fbSYabin CuiCompute.
112*a6aa18fbSYabin Cui
113*a6aa18fbSYabin Cui### Security Model
114*a6aa18fbSYabin Cui
115*a6aa18fbSYabin CuiOur protocol has security against honest-but-curious adversaries. This means
116*a6aa18fbSYabin Cuithat as long as both participants follow the protocol honestly, neither will
117*a6aa18fbSYabin Cuilearn more than the size of the intersection and the intersection-sum. However,
118*a6aa18fbSYabin Cuiif a participant deviates from the protocol, it is possible they could learn
119*a6aa18fbSYabin Cuimore than the prescribed information. For example, they could learn the specific
120*a6aa18fbSYabin Cuiidentifiers in the intersection. If the underlying data is sensitive, we
121*a6aa18fbSYabin Cuirecommend performing a careful risk analysis before using Private Join and
122*a6aa18fbSYabin CuiCompute, to ensure that neither party has an incentive to deviate from the
123*a6aa18fbSYabin Cuiprotocol. The protocol can also be supplemented with external enforcement such
124*a6aa18fbSYabin Cuias code audits to ensure that no party deviates from the protocol.
125*a6aa18fbSYabin Cui
126*a6aa18fbSYabin Cui### Maliciously Chosen Inputs
127*a6aa18fbSYabin Cui
128*a6aa18fbSYabin CuiWe note that our protocol does not authenticate that parties use "real" input,
129*a6aa18fbSYabin Cuinor does it prevent them from arbitrarily changing their input. We suggest
130*a6aa18fbSYabin Cuicareful analysis of whether any party has an incentive to lie about their
131*a6aa18fbSYabin Cuiinputs. This risk can also be mitigated by external enforcement such as code
132*a6aa18fbSYabin Cuiaudits.
133*a6aa18fbSYabin Cui
134*a6aa18fbSYabin Cui### Leakage from the Intersection-Sum.
135*a6aa18fbSYabin Cui
136*a6aa18fbSYabin CuiWhile the Private Join and Compute functionality is supposed to reveal only the
137*a6aa18fbSYabin Cuiintersection-size and intersection-sum, it is possible that the intersection-sum
138*a6aa18fbSYabin Cuiitself could reveal something about which identifiers were in common.
139*a6aa18fbSYabin Cui
140*a6aa18fbSYabin CuiFor example, if an identifier has a very unique associated integer values, then
141*a6aa18fbSYabin Cuiit may be easy to detect if that identifier was in the intersection simply by
142*a6aa18fbSYabin Cuilooking at the intersection-sum. One way this could happen is if one of the
143*a6aa18fbSYabin Cuiidentifiers has a very large associated value compared to all other identifiers.
144*a6aa18fbSYabin CuiIn that case, if the intersection-sum is large, one could reasonably infer that
145*a6aa18fbSYabin Cuithat identifier was in the intersection. To mitigate this, we suggest scrubbing
146*a6aa18fbSYabin Cuiinputs to remove identifiers with "outlier" values.
147*a6aa18fbSYabin Cui
148*a6aa18fbSYabin CuiAnother way that the intersection-sum may leak which identifiers are in the
149*a6aa18fbSYabin Cuiintersection is if the intersection is too small. This could make it easier to
150*a6aa18fbSYabin Cuiguess which combination of identifiers could be in the intersection in order to
151*a6aa18fbSYabin Cuiyield a particular intersection-sum. To mitigate this, one could abort the
152*a6aa18fbSYabin Cuiprotocol if the intersection-size is below a certain threshold, or to add noise
153*a6aa18fbSYabin Cuito the output of the protocol.
154*a6aa18fbSYabin Cui
155*a6aa18fbSYabin Cui(Note that these mitigations are not currently implemented in this open-source
156*a6aa18fbSYabin Cuilibrary.)
157*a6aa18fbSYabin Cui
158*a6aa18fbSYabin Cui## Disclaimers
159*a6aa18fbSYabin Cui
160*a6aa18fbSYabin CuiThis is not an officially supported Google product. The software is provided
161*a6aa18fbSYabin Cuias-is without any guarantees or warranties, express or implied.
162