1*a6aa18fbSYabin Cui# Private Join and Compute 2*a6aa18fbSYabin Cui 3*a6aa18fbSYabin CuiThis project contains an implementation of the "Private Join and Compute" 4*a6aa18fbSYabin Cuifunctionality. This functionality allows two users, each holding an input file, 5*a6aa18fbSYabin Cuito privately compute the sum of associated values for records that have common 6*a6aa18fbSYabin Cuiidentifiers. 7*a6aa18fbSYabin Cui 8*a6aa18fbSYabin CuiIn more detail, suppose a Server has a file containing the following 9*a6aa18fbSYabin Cuiidentifiers: 10*a6aa18fbSYabin Cui 11*a6aa18fbSYabin CuiIdentifiers | 12*a6aa18fbSYabin Cui----------- | 13*a6aa18fbSYabin CuiSam | 14*a6aa18fbSYabin CuiAda | 15*a6aa18fbSYabin CuiRuby | 16*a6aa18fbSYabin CuiBrendan | 17*a6aa18fbSYabin Cui 18*a6aa18fbSYabin CuiAnd a Client has a file containing the following identifiers, paired with 19*a6aa18fbSYabin Cuiassociated integer values: 20*a6aa18fbSYabin Cui 21*a6aa18fbSYabin CuiIdentifiers | Associated Values 22*a6aa18fbSYabin Cui----------- | ----------------- 23*a6aa18fbSYabin CuiRuby | 10 24*a6aa18fbSYabin CuiAda | 30 25*a6aa18fbSYabin CuiAlexander | 5 26*a6aa18fbSYabin CuiMika | 35 27*a6aa18fbSYabin Cui 28*a6aa18fbSYabin CuiThen the Private Join and Compute functionality would allow the Client to learn 29*a6aa18fbSYabin Cuithat the input files had *2* identifiers in common, and that the associated 30*a6aa18fbSYabin Cuivalues summed to *40*. It does this *without* revealing which specific 31*a6aa18fbSYabin Cuiidentifiers were in common (Ada and Ruby in the example above), or revealing 32*a6aa18fbSYabin Cuianything additional about the other identifiers in the two parties' data set. 33*a6aa18fbSYabin Cui 34*a6aa18fbSYabin CuiPrivate Join and Compute is a variant of the well-studied Private Set 35*a6aa18fbSYabin CuiIntersection functionality. We sometimes also refer to Private Join and Compute 36*a6aa18fbSYabin Cuias Private Intersection-Sum. 37*a6aa18fbSYabin Cui 38*a6aa18fbSYabin Cui## How to run the protocol 39*a6aa18fbSYabin Cui 40*a6aa18fbSYabin CuiIn order to run Private Join and Compute, you need to install Bazel, if you 41*a6aa18fbSYabin Cuidon't have it already. 42*a6aa18fbSYabin Cui[Follow the instructions for your platform on the Bazel website.](https://docs.bazel.build/versions/master/install.html) 43*a6aa18fbSYabin Cui 44*a6aa18fbSYabin CuiYou also need to install Git, if you don't have it already. 45*a6aa18fbSYabin Cui[Follow the instructions for your platform on the Git website.](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) 46*a6aa18fbSYabin Cui 47*a6aa18fbSYabin CuiOnce you've installed Bazel and Git, open a Terminal and clone the Private Join 48*a6aa18fbSYabin Cuiand Compute repository into a local folder: 49*a6aa18fbSYabin Cui 50*a6aa18fbSYabin Cui```shell 51*a6aa18fbSYabin Cuigit clone https://github.com/google/private-join-and-compute.git 52*a6aa18fbSYabin Cui``` 53*a6aa18fbSYabin Cui 54*a6aa18fbSYabin CuiNavigate into the `private-join-and-compute` folder you just created, and build 55*a6aa18fbSYabin Cuithe Private Join and Compute library and dependencies using Bazel: 56*a6aa18fbSYabin Cui 57*a6aa18fbSYabin Cui```bash 58*a6aa18fbSYabin Cuicd private-join-and-compute 59*a6aa18fbSYabin Cuibazel build //private_join_and_compute:all 60*a6aa18fbSYabin Cui``` 61*a6aa18fbSYabin Cui 62*a6aa18fbSYabin Cui(All the following instructions must be run from inside the 63*a6aa18fbSYabin Cuiprivate-join-and-compute folder.) 64*a6aa18fbSYabin Cui 65*a6aa18fbSYabin CuiNext, generate some dummy data to run the protocol on: 66*a6aa18fbSYabin Cui 67*a6aa18fbSYabin Cui```shell 68*a6aa18fbSYabin Cuibazel-bin/private_join_and_compute/generate_dummy_data --server_data_file=/tmp/dummy_server_data.csv \ 69*a6aa18fbSYabin Cui--client_data_file=/tmp/dummy_client_data.csv 70*a6aa18fbSYabin Cui``` 71*a6aa18fbSYabin Cui 72*a6aa18fbSYabin CuiThis will create dummy data for the server and client at the specified 73*a6aa18fbSYabin Cuilocations. You can look at the files in `/tmp/dummy_server_data.csv` and 74*a6aa18fbSYabin Cui`/tmp/dummy_client_data.csv` to see the dummy data that was generated. You can 75*a6aa18fbSYabin Cuialso change the size of the dummy data generated using additional flags. For 76*a6aa18fbSYabin Cuiexample: 77*a6aa18fbSYabin Cui 78*a6aa18fbSYabin Cui```shell 79*a6aa18fbSYabin Cuibazel-bin/private_join_and_compute/generate_dummy_data \ 80*a6aa18fbSYabin Cui--server_data_file=/tmp/dummy_server_data.csv \ 81*a6aa18fbSYabin Cui--client_data_file=/tmp/dummy_client_data.csv --server_data_size=1000 \ 82*a6aa18fbSYabin Cui--client_data_size=1000 --intersection_size=200 --max_associated_value=100 83*a6aa18fbSYabin Cui``` 84*a6aa18fbSYabin Cui 85*a6aa18fbSYabin CuiOnce you've generated dummy data, you can start the server as follows: 86*a6aa18fbSYabin Cui 87*a6aa18fbSYabin Cui```shell 88*a6aa18fbSYabin Cuibazel-bin/private_join_and_compute/server --server_data_file=/tmp/dummy_server_data.csv 89*a6aa18fbSYabin Cui``` 90*a6aa18fbSYabin Cui 91*a6aa18fbSYabin CuiThe server will load data from the specified file, and wait for a connection 92*a6aa18fbSYabin Cuifrom the client. 93*a6aa18fbSYabin Cui 94*a6aa18fbSYabin CuiOnce the server is running, you can start a client to connect to the server. 95*a6aa18fbSYabin CuiCreate a new terminal and navigate to the private-join-and-compute folder. Once 96*a6aa18fbSYabin Cuithere, run the following command to start the client: 97*a6aa18fbSYabin Cui 98*a6aa18fbSYabin Cui```shell 99*a6aa18fbSYabin Cuibazel-bin/private_join_and_compute/client --client_data_file=/tmp/dummy_client_data.csv 100*a6aa18fbSYabin Cui``` 101*a6aa18fbSYabin Cui 102*a6aa18fbSYabin CuiThe client will connect to the server and execute the steps of the protocol 103*a6aa18fbSYabin Cuisequentially. At the end of the protocol, the client will output the 104*a6aa18fbSYabin CuiIntersection Size (the number of identifiers in common) and the Intersection Sum 105*a6aa18fbSYabin Cui(the sum of associated values). If the protocol was successful, both the server 106*a6aa18fbSYabin Cuiand client will shut down. 107*a6aa18fbSYabin Cui 108*a6aa18fbSYabin Cui## Caveats 109*a6aa18fbSYabin Cui 110*a6aa18fbSYabin CuiSeveral caveats should be carefully considered before using Private Join and 111*a6aa18fbSYabin CuiCompute. 112*a6aa18fbSYabin Cui 113*a6aa18fbSYabin Cui### Security Model 114*a6aa18fbSYabin Cui 115*a6aa18fbSYabin CuiOur protocol has security against honest-but-curious adversaries. This means 116*a6aa18fbSYabin Cuithat as long as both participants follow the protocol honestly, neither will 117*a6aa18fbSYabin Cuilearn more than the size of the intersection and the intersection-sum. However, 118*a6aa18fbSYabin Cuiif a participant deviates from the protocol, it is possible they could learn 119*a6aa18fbSYabin Cuimore than the prescribed information. For example, they could learn the specific 120*a6aa18fbSYabin Cuiidentifiers in the intersection. If the underlying data is sensitive, we 121*a6aa18fbSYabin Cuirecommend performing a careful risk analysis before using Private Join and 122*a6aa18fbSYabin CuiCompute, to ensure that neither party has an incentive to deviate from the 123*a6aa18fbSYabin Cuiprotocol. The protocol can also be supplemented with external enforcement such 124*a6aa18fbSYabin Cuias code audits to ensure that no party deviates from the protocol. 125*a6aa18fbSYabin Cui 126*a6aa18fbSYabin Cui### Maliciously Chosen Inputs 127*a6aa18fbSYabin Cui 128*a6aa18fbSYabin CuiWe note that our protocol does not authenticate that parties use "real" input, 129*a6aa18fbSYabin Cuinor does it prevent them from arbitrarily changing their input. We suggest 130*a6aa18fbSYabin Cuicareful analysis of whether any party has an incentive to lie about their 131*a6aa18fbSYabin Cuiinputs. This risk can also be mitigated by external enforcement such as code 132*a6aa18fbSYabin Cuiaudits. 133*a6aa18fbSYabin Cui 134*a6aa18fbSYabin Cui### Leakage from the Intersection-Sum. 135*a6aa18fbSYabin Cui 136*a6aa18fbSYabin CuiWhile the Private Join and Compute functionality is supposed to reveal only the 137*a6aa18fbSYabin Cuiintersection-size and intersection-sum, it is possible that the intersection-sum 138*a6aa18fbSYabin Cuiitself could reveal something about which identifiers were in common. 139*a6aa18fbSYabin Cui 140*a6aa18fbSYabin CuiFor example, if an identifier has a very unique associated integer values, then 141*a6aa18fbSYabin Cuiit may be easy to detect if that identifier was in the intersection simply by 142*a6aa18fbSYabin Cuilooking at the intersection-sum. One way this could happen is if one of the 143*a6aa18fbSYabin Cuiidentifiers has a very large associated value compared to all other identifiers. 144*a6aa18fbSYabin CuiIn that case, if the intersection-sum is large, one could reasonably infer that 145*a6aa18fbSYabin Cuithat identifier was in the intersection. To mitigate this, we suggest scrubbing 146*a6aa18fbSYabin Cuiinputs to remove identifiers with "outlier" values. 147*a6aa18fbSYabin Cui 148*a6aa18fbSYabin CuiAnother way that the intersection-sum may leak which identifiers are in the 149*a6aa18fbSYabin Cuiintersection is if the intersection is too small. This could make it easier to 150*a6aa18fbSYabin Cuiguess which combination of identifiers could be in the intersection in order to 151*a6aa18fbSYabin Cuiyield a particular intersection-sum. To mitigate this, one could abort the 152*a6aa18fbSYabin Cuiprotocol if the intersection-size is below a certain threshold, or to add noise 153*a6aa18fbSYabin Cuito the output of the protocol. 154*a6aa18fbSYabin Cui 155*a6aa18fbSYabin Cui(Note that these mitigations are not currently implemented in this open-source 156*a6aa18fbSYabin Cuilibrary.) 157*a6aa18fbSYabin Cui 158*a6aa18fbSYabin Cui## Disclaimers 159*a6aa18fbSYabin Cui 160*a6aa18fbSYabin CuiThis is not an officially supported Google product. The software is provided 161*a6aa18fbSYabin Cuias-is without any guarantees or warranties, express or implied. 162