| Name | Date | Size | #Lines | LOC |
|---|---|---|---|---|
| configurations/ | 25-Apr-2025 | - | 37 | 35 |
| data/ | 25-Apr-2025 | - | 59 | 47 |
| metrics/ | 25-Apr-2025 | - | 248 | 202 |
| models/ | 25-Apr-2025 | - | 42 | 35 |
| server/ | 25-Apr-2025 | - | 340 | 302 |
| trainer/ | 25-Apr-2025 | - | 476 | 412 |
| README.md | 25-Apr-2025 | 2.4 KiB | 63 | 46 |
| launcher.py | 25-Apr-2025 | 17.3 KiB | 546 | 493 |
| utils.py | 25-Apr-2025 | 2 KiB | 61 | 50 |

README.md

# RPC PS Benchmark

## How to add your experiment

1. Data
    - Create a data class and add it to the data directory
    - Update benchmark_class_helper.py to include your data class in the data_map
    - Add configurations to data_configurations.json in the configurations directory
2. Model
    - Create a model class and add it to the models directory
    - Update benchmark_class_helper.py to include your model class in the model_map
    - Add configurations to model_configurations.json in the configurations directory
3. Trainer
    - Create a trainer class and add it to the trainer directory
    - Update benchmark_class_helper.py to include your trainer class in the trainer_map
    - Add configurations to trainer_configurations.json in the configurations directory
4. Parameter Server
    - Create a parameter server class and add it to the server directory
    - Update benchmark_class_helper.py to include your parameter server class in the ps_map
    - Add configurations to parameter_server_configurations.json in the configurations directory
5. Script
    - Create a bash script for your experiment and add it to the experiment_scripts directory
6. Testing
    - Add a test method for your script to test_scripts.py

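The register-then-configure pattern in steps 1 through 4 can be sketched as follows. This is a minimal illustration, not the benchmark's actual code: the README does not show the contents of benchmark_class_helper.py or the JSON schema, so the dict shape of `data_map`, the keys `data_class` and `configurations`, the class `MyDummyData`, and the helper `build_data` are all assumptions made for the example.

```python
import json


# Hypothetical data class; a real one would live in the data directory.
class MyDummyData:
    def __init__(self, batch_size, feature_count):
        self.batch_size = batch_size
        self.feature_count = feature_count


# Assumed shape of the lookup table in benchmark_class_helper.py:
# a dict mapping a string key to the class it names.
data_map = {
    "MyDummyData": MyDummyData,
}

# Assumed shape of an entry in data_configurations.json:
# one top-level key per configuration, naming the class and its arguments.
data_configurations = json.loads("""
{
    "my_dummy_data": {
        "data_class": "MyDummyData",
        "configurations": {"batch_size": 32, "feature_count": 8}
    }
}
""")


def build_data(config_name):
    """Look up a configuration by name and instantiate the class it refers to."""
    entry = data_configurations[config_name]
    data_class = data_map[entry["data_class"]]
    return data_class(**entry["configurations"])


data = build_data("my_dummy_data")
print(type(data).__name__, data.batch_size, data.feature_count)
```

Under these assumptions, the same pattern would apply to model_map, trainer_map, and ps_map: the map ties a string key to a class, and the JSON file supplies the constructor arguments for that key.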
## Trainer class

The trainer directory contains base classes that provide a starting point for implementing a trainer.
Inherit from a base class and implement your trainer. The benchmark has two requirements for trainers.

1. It must implement an `__init__` method that takes rank, trainer_count, ps_rref, backend, and use_cuda_rpc as arguments.

    ```python
    def __init__(self, rank, trainer_count, ps_rref, backend, use_cuda_rpc):
    ```

2. It must implement a train method that takes model and data as arguments.

    ```python
    def train(self, model, data):
    ```

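A class meeting both trainer requirements might look like the sketch below. It is deliberately standalone: the base classes in the trainer directory are not shown here, so `MyTrainer` does not inherit from one, and the `batches_seen` counter, the callable `model`, the iterable `data`, and the `"gloo"` placeholder string are assumptions for illustration only.

```python
class MyTrainer:
    """Hypothetical trainer; a real one would inherit from a base class in the trainer directory."""

    def __init__(self, rank, trainer_count, ps_rref, backend, use_cuda_rpc):
        # Store the arguments the benchmark passes in (signature taken from the README).
        self.rank = rank
        self.trainer_count = trainer_count
        self.ps_rref = ps_rref
        self.backend = backend
        self.use_cuda_rpc = use_cuda_rpc
        self.batches_seen = 0  # illustrative bookkeeping, not required by the benchmark

    def train(self, model, data):
        # Assumption: data is an iterable of batches and model is callable on one batch.
        outputs = []
        for batch in data:
            outputs.append(model(batch))
            self.batches_seen += 1
        return outputs


# Usage with stand-ins: a doubling "model" and three integer "batches".
trainer = MyTrainer(rank=0, trainer_count=1, ps_rref=None, backend="gloo", use_cuda_rpc=False)
results = trainer.train(model=lambda batch: batch * 2, data=[1, 2, 3])
print(results, trainer.batches_seen)
```

A real trainer would presumably use ps_rref to exchange gradients or parameters with the parameter server inside train; that part is omitted because the README does not describe it.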
## Parameter Server class

The server directory contains base classes that provide a starting point for implementing a parameter server.
Inherit from a base class and implement your parameter server. The benchmark has two requirements for parameter servers.

1. It must implement an `__init__` method that takes rank, ps_trainer_count, backend, and use_cuda_rpc as arguments.

    ```python
    def __init__(self, rank, ps_trainer_count, backend, use_cuda_rpc):
    ```

2. It must implement a reset_state method that takes ps_rref as its argument.

    ```python
    def reset_state(ps_rref):
    ```

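The two parameter server requirements can be sketched in the same way. Several things here are assumptions rather than facts from the README: `reset_state` is written as a `@staticmethod` because the documented signature has no `self`; `ps_rref` is assumed to expose a `local_value()` method, modeled on `torch.distributed.rpc.RRef.local_value()`; and `LocalRef` and `received_gradients` are hypothetical names that let the example run without an RPC setup.

```python
class LocalRef:
    """Hypothetical stand-in for a remote reference so the sketch runs without RPC."""

    def __init__(self, value):
        self._value = value

    def local_value(self):
        # Return the wrapped object, as a local reference would.
        return self._value


class MyParameterServer:
    """Hypothetical server; a real one would inherit from a base class in the server directory."""

    def __init__(self, rank, ps_trainer_count, backend, use_cuda_rpc):
        # Store the arguments the benchmark passes in (signature taken from the README).
        self.rank = rank
        self.ps_trainer_count = ps_trainer_count
        self.backend = backend
        self.use_cuda_rpc = use_cuda_rpc
        self.received_gradients = []  # illustrative per-run state

    @staticmethod
    def reset_state(ps_rref):
        # Assumption: the reference resolves to the server instance whose state is cleared.
        server = ps_rref.local_value()
        server.received_gradients = []


# Usage: accumulate some state, then reset it through the reference.
server = MyParameterServer(rank=0, ps_trainer_count=2, backend="gloo", use_cuda_rpc=False)
server.received_gradients.append("grad-from-trainer-0")
MyParameterServer.reset_state(LocalRef(server))
print(server.received_gradients)
```

Clearing state through a reference rather than on `self` would let a caller on another process reset the server between benchmark runs, which is one plausible reason for the `ps_rref` parameter.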
## Testing

Use `pytest` to run the test methods added to test_scripts.py. To run the tests for every script, use `pytest test_scripts.py`.