# Cross-Device Federated Computations Demo

This directory contains an example
[Federated Program platform](https://github.com/tensorflow/federated/blob/main/tensorflow_federated/python/program/README.md#platform-specific-components)
implementation that's compatible with the Federated Compute client.

The code in this directory prioritizes understandability over production
scalability because many of the frameworks used to create robust servers are
dependent on the intended deployment environment. Comments throughout the code
and documentation call out where changes should be made for a production
implementation. Unless otherwise noted, the libraries in other directories are
production quality.

See
[Towards Federated Learning at Scale: System Design](https://arxiv.org/abs/1902.01046)
(TFLaS) for additional information on scaling Federated Learning.

## Example Usage

> See `federated_program_test.py` for a working example of configuring and
> running a Federated Program using this package.

The following example program is based on the example in the
[TFF documentation](https://github.com/tensorflow/federated/blob/main/tensorflow_federated/python/program/README.md#program):

```python
import os

from absl import flags
import tensorflow_federated as tff

from fcp import demo
from fcp.protos import plan_pb2

# Parameters set by the customer.
_OUTPUT_DIR = flags.DEFINE_string('output_dir', None, 'The output path.')
_POPULATION_NAME = flags.DEFINE_string(
    'population_name', None, 'The identifier for the client population.')
_COLLECTION_URI = flags.DEFINE_string(
    'collection_uri', None,
    'A URI identifying the example collection to read from.')


def main():
  # Parameters set by the program.
  total_rounds = 10
  num_clients = 3

  # Configure the platform-specific components.
  with demo.FederatedContext(
      _POPULATION_NAME.value,
      base_context=tff.framework.get_context_stack().current) as context:
    data_source = demo.FederatedDataSource(
        _POPULATION_NAME.value,
        plan_pb2.ExampleSelector(collection_uri=_COLLECTION_URI.value))

    # Configure the platform-agnostic components.
    summary_dir = os.path.join(_OUTPUT_DIR.value, 'summary')
    output_managers = [
        tff.program.LoggingReleaseManager(),
        tff.program.TensorBoardReleaseManager(summary_dir),
    ]
    program_state_dir = os.path.join(..., 'program_state')
    program_state_manager = tff.program.FileProgramStateManager(
        program_state_dir)

    # Define the computations.
    initialize = ...
    train = ...

    # Execute the computations using program logic.
    # `train_federated_model` is the program logic from the TFF documentation
    # linked above; it is not defined in this snippet.
    tff.framework.set_default_context(context)
    train_federated_model(
        initialize=initialize,
        train=train,
        data_source=data_source,
        total_rounds=total_rounds,
        num_clients=num_clients,
        output_managers=output_managers,
        program_state_manager=program_state_manager)
```

## Code Structure

```mermaid
flowchart
  client(Client)

  subgraph FP[Federated Program Process]
    federated_program(Federated Program)
    style federated_program color:#333,fill:#bbb,stroke:#666,stroke-width:3px;

    subgraph Server[In-Process Server]
      server(server.py)
      http_actions(http_actions.py)
      plan_utils(plan_utils.py)

      subgraph Handlers[HTTP Handlers]
        aggregations(aggregations.py)
        eligibility_eval_tasks(eligibility_eval_tasks.py)
        media(media.py)
        task_assignments(task_assignments.py)
      end
    end

    subgraph FP_Platform[Federated Program Platform]
      federated_context(federated_context.py)
      federated_computation(federated_computation.py)
      federated_data_source(federated_data_source.py)
      checkpoint_tensor_reference(checkpoint_tensor_reference.py)
    end
  end

  client & server --> Handlers
  server --> http_actions & plan_utils
  Handlers --> http_actions
  federated_program --> federated_context & federated_computation & federated_data_source
  federated_context --> checkpoint_tensor_reference & server
```

### Client

The [Federated Computations Client](../client)
library is used by applications running on end-user devices to run
server-defined computations over on-device data and report back results (such as
updated model weights) to be aggregated by the server.

> See `federated_program_test.py` for command-line flags that should be used
> when running `//fcp/client:client_runner_main`.

> ⚠️ The client requires TLS when connecting to any host other than `localhost`.
> The server's public and private keys will need to be provided to the
> `demo.FederatedContext` constructor, and the corresponding CA certificate will
> need to be passed to the client library (e.g., via `--test_cert` for
> `client_runner_main`).

### Federated Program Platform

The demo Federated Computations platform is a
[Federated Program platform](https://github.com/tensorflow/federated/blob/main/tensorflow_federated/python/program/README.md#platform-specific-components)
implementation that allows TFF computations to be run using Federated
Computations Clients.

A production implementation could reuse much of this code as-is, though
`federated_context.py` would need to be updated to communicate with remote
server(s) instead of an in-process server.

#### `federated_context.py`

Contains a
[`tff.program.FederatedContext`](https://www.tensorflow.org/federated/api_docs/python/tff/program/FederatedContext)
implementation for running computations on the demo Federated Computations
platform.

This module uses libraries in
[`fcp/artifact_building`](../artifact_building) to
convert TFF computations to the format expected by the
[in-process server](#in-process-server) and [client](#client).
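
The `with demo.FederatedContext(...)` usage in the example above suggests that the context owns the lifetime of the in-process server. The following is a minimal sketch of that general pattern using only the Python standard library. It is not the actual `federated_context.py` implementation: `in_process_server`, `fetch`, and `_PingHandler` are hypothetical names introduced for illustration, and the handler returns a fixed body rather than speaking the Federated Compute protocol.

```python
import contextlib
import http.client
import http.server
import threading


class _PingHandler(http.server.BaseHTTPRequestHandler):
  """Hypothetical handler that answers every GET with a fixed body."""

  def do_GET(self):
    body = b'ok'
    self.send_response(200)
    self.send_header('Content-Length', str(len(body)))
    self.end_headers()
    self.wfile.write(body)

  def log_message(self, format, *args):
    # Silence the default per-request logging to stderr.
    pass


@contextlib.contextmanager
def in_process_server(host='127.0.0.1'):
  """Runs an HTTP server on a background thread for the `with` block."""
  # Port 0 asks the OS for any free port.
  server = http.server.ThreadingHTTPServer((host, 0), _PingHandler)
  thread = threading.Thread(target=server.serve_forever, daemon=True)
  thread.start()
  try:
    yield server
  finally:
    # Stop serving, release the socket, and wait for the thread to exit.
    server.shutdown()
    server.server_close()
    thread.join()


def fetch(port, path='/'):
  """Issues a GET against the local server; returns (status, body)."""
  conn = http.client.HTTPConnection('127.0.0.1', port, timeout=5)
  try:
    conn.request('GET', path)
    response = conn.getresponse()
    return response.status, response.read()
  finally:
    conn.close()
```

Tying the server's lifetime to a `with` block keeps setup and teardown in one place. A production context would presumably replace this with a client for remote server(s), as noted above.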

#### `federated_computation.py`

Contains a
[`tff.Computation`](https://www.tensorflow.org/federated/api_docs/python/tff/Computation)
subclass for computations that will be run by the demo Federated Computations
platform.

#### `federated_data_source.py`

Contains a
[`tff.program.FederatedDataSource`](https://www.tensorflow.org/federated/api_docs/python/tff/program/FederatedDataSource)
implementation for representing on-device data sources.

#### `checkpoint_tensor_reference.py`

Contains a
[`tff.program.MaterializableValueReference`](https://www.tensorflow.org/federated/api_docs/python/tff/program/MaterializableValueReference)
implementation that reads values from a TensorFlow checkpoint.

### In-Process Server

An in-process HTTP(S) server that implements the
[Federated Compute protocol](../protos/federatedcompute).
This server is responsible for selecting which clients will contribute to each
computation invocation (**task**), broadcasting computations and state to
clients, aggregating the results of on-device computation, and incorporating
that aggregate information back into the model or metrics.

In a production implementation, each Federated Compute protocol service would
likely be handled by a separate replicated microservice, not a Python module.

#### `server.py`

Provides the interface for setting up and stopping the in-process HTTP(S) server
and running computations provided by the `FederatedContext`. This module is
responsible for notifying the various Federated Compute protocol service
implementations when a new task has been added and then managing the lifecycle
of that task.

#### `eligibility_eval_tasks.py`

Contains handlers for the Federated Compute protocol's
[EligibilityEvalTasks](../protos/federatedcompute/eligibility_eval_tasks.proto)
service. This service is responsible for serving optional pre-task-assignment
computations that determine which tasks each client is eligible to contribute
to. The demo platform does not currently support configuring Eligibility Eval
tasks; clients are considered to be eligible for all tasks.

#### `task_assignments.py`

Contains handlers for the Federated Compute protocol's
[TaskAssignments](../protos/federatedcompute/task_assignments.proto)
service. This service is responsible for either assigning each client to a task
or rejecting the client.

#### `aggregations.py`

Contains handlers for the Federated Compute protocol's
[Aggregations](../protos/federatedcompute/aggregations.proto)
service. This service is responsible for aggregating client-reported data using
the
[Simple Aggregation Protocol](../aggregation/protocol/simple_aggregation)
library.

Note that the demo platform does not currently contain an implementation of the
[SecureAggregations](../protos/federatedcompute/secure_aggregations.proto)
service.

#### `media.py`

Contains handlers for HTTP uploads and downloads using `PUT` and `GET` requests.

A production implementation will likely replace this module with a
deployment-environment-specific download service; a custom upload service
implementation may be needed since it should not persistently store
client-uploaded data.

#### `http_actions.py`

Contains helper functions for converting proto-based handlers into HTTP
handlers. This conversion mimics the Cloud Endpoints
[HTTP to gRPC transcoding](https://cloud.google.com/endpoints/docs/grpc/transcoding).

#### `plan_utils.py`

Contains helper functions for constructing the TensorFlow graph and input
checkpoint used by the client and running TensorFlow-based post-processing on
aggregated results.
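
To illustrate the transcoding idea described under `http_actions.py`, the sketch below maps an HTTP method and a path template onto a request-in, response-out handler. It is a simplified stand-in, not the module's actual API: `Router`, `compile_path_template`, and the example path are hypothetical, and plain dictionaries with JSON bodies are used in place of protocol buffer messages.

```python
import json
import re


def compile_path_template(template):
  """Turns '/v1/tasks/{task_id}:report' into a regex with named groups."""
  # re.split with a capturing group alternates literal text and field names.
  parts = re.split(r'\{(\w+)\}', template)
  regex = ''.join(
      re.escape(part) if i % 2 == 0 else f'(?P<{part}>[^/:]+)'
      for i, part in enumerate(parts))
  return re.compile(f'^{regex}$')


class Router:
  """Dispatches HTTP requests to handlers that take and return dicts."""

  def __init__(self):
    self._routes = []

  def add(self, method, template, handler):
    self._routes.append((method, compile_path_template(template), handler))

  def dispatch(self, method, path, body=b''):
    """Returns (status_code, response_bytes) for the given request."""
    for route_method, pattern, handler in self._routes:
      match = pattern.match(path)
      if route_method == method and match:
        # Merge the JSON body with fields captured from the URL path,
        # loosely following how transcoding fills request message fields.
        request = json.loads(body) if body else {}
        request.update(match.groupdict())
        return 200, json.dumps(handler(request)).encode()
    return 404, b''
```

A handler registered with `router.add('POST', '/v1/tasks/{task_id}:report', handler)` would receive the parsed body with `task_id` filled in from the URL, so the handler itself stays unaware of HTTP details.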