# Cross-Device Federated Computations Demo

This directory contains an example
[Federated Program platform](https://github.com/tensorflow/federated/blob/main/tensorflow_federated/python/program/README.md#platform-specific-components)
implementation that's compatible with the Federated Compute client.

The code in this directory prioritizes understandability over production
scalability because many of the frameworks used to create robust servers are
dependent on the intended deployment environment. Comments throughout the code
and documentation call out where changes should be made for a production
implementation. Unless otherwise noted, the libraries in other directories are
production quality.

See
[Towards Federated Learning at Scale: System Design](https://arxiv.org/abs/1902.01046)
(TFLaS) for additional information on scaling Federated Learning.

## Example Usage

> See `federated_program_test.py` for a working example of configuring and
> running a Federated Program using this package.

The following example program is based on the example in the
[TFF documentation](https://github.com/tensorflow/federated/blob/main/tensorflow_federated/python/program/README.md#program):

```python
import os

from absl import app
from absl import flags
import tensorflow_federated as tff

from fcp import demo
from fcp.protos import plan_pb2

# Parameters set by the customer.
_OUTPUT_DIR = flags.DEFINE_string('output_dir', None, 'The output path.')
_POPULATION_NAME = flags.DEFINE_string(
    'population_name', None, 'The identifier for the client population.')
_COLLECTION_URI = flags.DEFINE_string(
    'collection_uri', None,
    'A URI identifying the example collection to read from.')


def main(argv):
  del argv  # Unused.

  # Parameters set by the program.
  total_rounds = 10
  num_clients = 3

  # Configure the platform-specific components.
  with demo.FederatedContext(
      _POPULATION_NAME.value,
      base_context=tff.framework.get_context_stack().current) as context:
    data_source = demo.FederatedDataSource(
        _POPULATION_NAME.value,
        plan_pb2.ExampleSelector(collection_uri=_COLLECTION_URI.value))

    # Configure the platform-agnostic components.
    summary_dir = os.path.join(_OUTPUT_DIR.value, 'summary')
    output_managers = [
        tff.program.LoggingReleaseManager(),
        tff.program.TensorBoardReleaseManager(summary_dir),
    ]
    program_state_dir = os.path.join(..., 'program_state')
    program_state_manager = tff.program.FileProgramStateManager(
        program_state_dir)

    # Define the computations.
    initialize = ...
    train = ...

    # Execute the computations using program logic. `train_federated_model` is
    # the program logic from the TFF documentation linked above; it is not
    # defined in this package.
    tff.framework.set_default_context(context)
    train_federated_model(
        initialize=initialize,
        train=train,
        data_source=data_source,
        total_rounds=total_rounds,
        num_clients=num_clients,
        output_managers=output_managers,
        program_state_manager=program_state_manager)


if __name__ == '__main__':
  app.run(main)
```

## Code Structure

```mermaid
flowchart
  client(Client)

  subgraph FP[Federated Program Process]
    federated_program(Federated Program)
    style federated_program color:#333,fill:#bbb,stroke:#666,stroke-width:3px;

    subgraph Server[In-Process Server]
      server(server.py)
      http_actions(http_actions.py)
      plan_utils(plan_utils.py)

      subgraph Handlers[HTTP Handlers]
        aggregations(aggregations.py)
        eligibility_eval_tasks(eligibility_eval_tasks.py)
        media(media.py)
        task_assignments(task_assignments.py)
      end
    end

    subgraph FP_Platform[Federated Program Platform]
      federated_context(federated_context.py)
      federated_computation(federated_computation.py)
      federated_data_source(federated_data_source.py)
      checkpoint_tensor_reference(checkpoint_tensor_reference.py)
    end
  end

  client & server --> Handlers
  server --> http_actions & plan_utils
  Handlers --> http_actions
  federated_program --> federated_context & federated_computation & federated_data_source
  federated_context --> checkpoint_tensor_reference & server
```

### Client

The [Federated Computations Client](../client)
library is used by applications running on end-user devices to run
server-defined computations over on-device data and report back results (such as
updated model weights) to be aggregated by the server.

> See `federated_program_test.py` for command-line flags that should be used
> when running `//fcp/client:client_runner_main`.

> ⚠️ The client requires TLS when connecting to any host other than `localhost`.
> The server's public and private keys will need to be provided to the
> `demo.FederatedContext` constructor, and the corresponding CA certificate will
> need to be passed to the client library (e.g., via `--test_cert` for
> `client_runner_main`).

### Federated Program Platform

The demo Federated Computations platform is a
[Federated Program platform](https://github.com/tensorflow/federated/blob/main/tensorflow_federated/python/program/README.md#platform-specific-components)
implementation that allows TFF computations to be run using Federated
Computations Clients.

A production implementation could reuse much of this code as-is, though
`federated_context.py` would need to be updated to communicate with remote
server(s) instead of an in-process server.

#### `federated_context.py`

Contains a
[`tff.program.FederatedContext`](https://www.tensorflow.org/federated/api_docs/python/tff/program/FederatedContext)
implementation for running computations on the demo Federated Computations
platform.

This module uses libraries in
[`fcp/artifact_building`](../artifact_building) to
convert TFF computations to the format expected by the
[in-process server](#in-process-server) and [client](#client).

#### `federated_computation.py`

Contains a
[`tff.Computation`](https://www.tensorflow.org/federated/api_docs/python/tff/Computation)
subclass for computations that will be run by the demo Federated Computations
platform.

#### `federated_data_source.py`

Contains a
[`tff.program.FederatedDataSource`](https://www.tensorflow.org/federated/api_docs/python/tff/program/FederatedDataSource)
implementation for representing on-device data sources.

#### `checkpoint_tensor_reference.py`

Contains a
[`tff.program.MaterializableValueReference`](https://www.tensorflow.org/federated/api_docs/python/tff/program/MaterializableValueReference)
implementation that reads values from a TensorFlow checkpoint.

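The idea of a value reference that is only read when requested can be sketched
without TFF or TensorFlow. In the minimal example below, a JSON file stands in
for the checkpoint, and `LazyFileValueReference` and `write_fake_checkpoint` are
hypothetical names invented for illustration; only the asynchronous `get_value`
method mirrors the `tff.program.MaterializableValueReference` interface. It is
not the module's actual implementation.

```python
import asyncio
import json
import os
import tempfile


class LazyFileValueReference:
  """Hypothetical reference that reads its value only when asked."""

  def __init__(self, path: str, key: str):
    self._path = path
    self._key = key
    self._loaded = False
    self._value = None

  def _read(self):
    with open(self._path) as f:
      return json.load(f)[self._key]

  async def get_value(self):
    if not self._loaded:
      # Do the blocking file read off the event loop, then cache the result.
      self._value = await asyncio.to_thread(self._read)
      self._loaded = True
    return self._value


def write_fake_checkpoint(values: dict) -> str:
  """Writes a JSON file standing in for a TensorFlow checkpoint."""
  fd, path = tempfile.mkstemp(suffix='.json')
  with os.fdopen(fd, 'w') as f:
    json.dump(values, f)
  return path
```

Constructing the reference does no I/O; the file is read on the first
`await ref.get_value()` and the result is cached for later calls.
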
### In-Process Server

An in-process HTTP(S) server that implements the
[Federated Compute protocol](../protos/federatedcompute).
This server is responsible for selecting which clients will contribute to each
computation invocation (**task**), broadcasting computations and state to
clients, aggregating the results of on-device computation, and incorporating
that aggregate information back into the model or metrics.

In a production implementation, each Federated Compute protocol service would
likely be handled by a separate replicated microservice, not a Python module.

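The four responsibilities listed above (select, broadcast, aggregate,
incorporate) can be illustrated with a toy, single-process round. This is a
sketch under strong assumptions: the "model" is one float, each "client" is a
local Python function, and the aggregate is a plain mean. `make_client` and
`run_round` are hypothetical names, and nothing here reflects how the demo
server exchanges data over HTTP.

```python
import random
from typing import Callable, Sequence

Client = Callable[[float], float]


def make_client(local_target: float) -> Client:
  """Returns a toy client that nudges the model toward its local target."""
  def compute_update(model: float) -> float:
    return 0.5 * (local_target - model)
  return compute_update


def run_round(model: float, clients: Sequence[Client], num_clients: int,
              rng: random.Random) -> float:
  # 1. Select which clients contribute to this task.
  selected = rng.sample(list(clients), num_clients)
  # 2. Broadcast the current state; each client computes an update locally.
  updates = [client(model) for client in selected]
  # 3. Aggregate the on-device results (a plain mean in this sketch).
  mean_update = sum(updates) / len(updates)
  # 4. Incorporate the aggregate back into the model.
  return model + mean_update
```

For example, `run_round(0.0, [make_client(4.0)] * 3, 2, random.Random(0))`
moves the model halfway toward the shared target of `4.0`.
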
#### `server.py`

Provides the interface for setting up and stopping the in-process HTTP(S) server
and running computations provided by the `FederatedContext`. This module is
responsible for notifying the various Federated Compute protocol service
implementations when a new task has been added and then managing the lifecycle
of that task.

#### `eligibility_eval_tasks.py`

Contains handlers for the Federated Compute protocol's
[EligibilityEvalTasks](../protos/federatedcompute/eligibility_eval_tasks.proto)
service. This service is responsible for serving optional pre-task-assignment
computations that determine which tasks each client is eligible to contribute
to. The demo platform does not currently support configuring Eligibility Eval
tasks; clients are considered to be eligible for all tasks.

#### `task_assignments.py`

Contains handlers for the Federated Compute protocol's
[TaskAssignments](../protos/federatedcompute/task_assignments.proto)
service. This service is responsible for either assigning each client to a task
or rejecting the client.

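The assign-or-reject decision can be sketched as follows. The policy shown
(assign a client to the first task for its population that still needs
contributors, otherwise reject) is an assumption made for illustration, and
`Task` and `assign_task` are hypothetical names rather than the module's API.

```python
import dataclasses
from typing import List, Optional


@dataclasses.dataclass
class Task:
  """Hypothetical bookkeeping for one task."""
  task_name: str
  population_name: str
  clients_needed: int
  assigned: int = 0


def assign_task(tasks: List[Task], population_name: str) -> Optional[str]:
  """Returns the assigned task's name, or None if the client is rejected."""
  for task in tasks:
    if (task.population_name == population_name
        and task.assigned < task.clients_needed):
      task.assigned += 1
      return task.task_name
  # No task for this population needs more clients: reject.
  return None
```

A rejected client would typically check in again later; that retry behavior is
outside the scope of this sketch.
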
#### `aggregations.py`

Contains handlers for the Federated Compute protocol's
[Aggregations](../protos/federatedcompute/aggregations.proto)
service. This service is responsible for aggregating client-reported data using
the
[simple Aggregation Protocol](../aggregation/protocol/simple_aggregation)
library.

Note that the demo platform does not currently contain an implementation of the
[SecureAggregations](../protos/federatedcompute/secure_aggregations.proto)
service.

#### `media.py`

Contains handlers for HTTP uploads and downloads using `PUT` and `GET` requests.

A production implementation will likely replace this module with a
deployment-environment-specific download service; a custom upload service
implementation may be needed since it should not persistently store
client-uploaded data.

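As a rough illustration of `PUT` uploads and `GET` downloads backed by
non-persistent storage, the sketch below uses only the Python standard library.
`MediaHandler`, `round_trip`, and the `/data/upload-1` path are hypothetical and
chosen for this example; the demo's own handlers are structured differently.

```python
import http.client
import http.server
import threading


class MediaHandler(http.server.BaseHTTPRequestHandler):
  """Keeps uploaded bytes in memory only, keyed by request path."""

  store = {}  # Shared across requests; lost when the process exits.

  def do_PUT(self):
    length = int(self.headers.get('Content-Length', 0))
    self.store[self.path] = self.rfile.read(length)
    self.send_response(200)
    self.send_header('Content-Length', '0')
    self.end_headers()

  def do_GET(self):
    data = self.store.get(self.path)
    if data is None:
      self.send_error(404)
      return
    self.send_response(200)
    self.send_header('Content-Length', str(len(data)))
    self.end_headers()
    self.wfile.write(data)

  def log_message(self, *args):
    pass  # Keep the example quiet.


def _request(port, method, path, body=None):
  conn = http.client.HTTPConnection('127.0.0.1', port)
  try:
    conn.request(method, path, body=body)
    response = conn.getresponse()
    return response.status, response.read()
  finally:
    conn.close()


def round_trip(payload: bytes) -> bytes:
  """Uploads payload with PUT, then downloads it again with GET."""
  server = http.server.HTTPServer(('127.0.0.1', 0), MediaHandler)
  threading.Thread(target=server.serve_forever, daemon=True).start()
  try:
    _request(server.server_port, 'PUT', '/data/upload-1', payload)
    _, data = _request(server.server_port, 'GET', '/data/upload-1')
    return data
  finally:
    server.shutdown()
    server.server_close()
```

Binding to port `0` lets the operating system pick a free port, which keeps the
example safe to run alongside other local servers.
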
#### `http_actions.py`

Contains helper functions for converting proto-based handlers into HTTP
handlers. This conversion mimics the Cloud Endpoints
[HTTP to gRPC transcoding](https://cloud.google.com/endpoints/docs/grpc/transcoding).

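The general shape of such a conversion can be sketched with a decorator that
binds a handler to an HTTP method and a path template, then fills request
fields from both the path variables and the body. This is a simplified stand-in
that uses JSON dictionaries instead of protos; `http_action`, `dispatch`, and
the `/v1/widgets/{widget_id}:start` route are hypothetical and are not the
module's actual helpers.

```python
import json
import re

_ROUTES = []


def _compile(pattern: str):
  """Turns '/v1/widgets/{widget_id}:start' into a regex with named groups."""
  parts = re.split(r'\{(\w+)\}', pattern)
  regex = ''.join(
      re.escape(part) if i % 2 == 0 else f'(?P<{part}>[^/:]+)'
      for i, part in enumerate(parts))
  return re.compile(f'^{regex}$')


def http_action(method: str, pattern: str):
  """Registers a dict-based handler for an HTTP method and path template."""
  def decorator(handler):
    _ROUTES.append((method, _compile(pattern), handler))
    return handler
  return decorator


def dispatch(method: str, path: str, body: bytes = b''):
  """Routes a request; returns (status code, response body)."""
  for route_method, regex, handler in _ROUTES:
    match = regex.match(path)
    if route_method == method and match:
      request = json.loads(body) if body else {}
      # Path variables populate request fields, as in HTTP/gRPC transcoding.
      request.update(match.groupdict())
      return 200, json.dumps(handler(request)).encode()
  return 404, b''


@http_action('POST', '/v1/widgets/{widget_id}:start')
def start_widget(request):
  return {**request, 'started': True}
```

With this in place, `dispatch('POST', '/v1/widgets/w1:start', b'{"count": 2}')`
calls `start_widget` with both `widget_id` and `count` set, while an unmatched
method or path yields a 404.
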
#### `plan_utils.py`

Contains helper functions for constructing the TensorFlow graph and input
checkpoint used by the client and running TensorFlow-based post-processing on
aggregated results.
