RPC PS Benchmark

How to add your experiment

  1. Data
    • Create a data class and add it to the data directory
    • Update benchmark_class_helper.py to include your data class in the data_map (see the registration sketch after this list)
    • Add configurations to data_configurations.json in the configurations directory
  2. Model
    • Create a model class and add it to the model directory
    • Update benchmark_class_helper.py to include your model class in the model_map
    • Add configurations to model_configurations.json in the configurations directory
  3. Trainer
    • Create a trainer class and add it to the trainer directory
    • Update benchmark_class_helper.py to include your trainer class in the trainer_map
    • Add configurations to trainer_configurations.json in the configurations directory
  4. Parameter Server
    • Create a parameter server class and add it to the parameter_servers directory
    • Update benchmark_class_helper.py to include your parameter_server class in the ps_map
    • Add configurations to parameter_server_configurations.json in the configurations directory
  5. Script
    • Create a bash script for your experiment and add it to the experiment_scripts directory
  6. Testing
    • Add a test method for your script to test_scripts.py
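
As a concrete illustration of steps 1 through 4, the sketch below shows how a hypothetical trainer could be registered in benchmark_class_helper.py. MyTrainer and its import path are placeholders, and the exact layout of the existing maps in that file may differ.

    # Hypothetical registration in benchmark_class_helper.py.
    # MyTrainer and its module path are placeholders for your own class;
    # data, model, and parameter server classes are registered the same way
    # in data_map, model_map, and ps_map.
    from trainer.my_trainer import MyTrainer

    trainer_map = {
        # ...existing trainer entries...
        "MyTrainer": MyTrainer,
    }

The matching entry in trainer_configurations.json would then presumably refer to the class by its map key ("MyTrainer"); the existing entries in the configurations directory define the exact JSON schema.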

Trainer class

The trainer directory contains base classes to provide a starting point for implementing a trainer. Inherit from a base class and implement your trainer. The benchmark has two requirements for trainers.

  1. It must implement an __init__ method that takes rank, trainer_count, ps_rref, backend, and use_cuda_rpc as arguments.

    def __init__(self, rank, trainer_count, ps_rref, backend, use_cuda_rpc):
    
  2. It must implement a train method that takes model and data as arguments.

    def train(self, model, data):
    
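Putting the two requirements together, a minimal trainer might look like the sketch below. The base-class name (TrainerBase), its constructor arguments, and the import path are assumptions made for illustration; consult the base classes in the trainer directory for the hooks they actually provide.

    # A minimal sketch, assuming a TrainerBase-style base class in the
    # trainer directory; the base-class name, its constructor signature,
    # and the import path are assumptions.
    import torch

    from trainer.trainer import TrainerBase  # hypothetical import path

    class MyTrainer(TrainerBase):
        def __init__(self, rank, trainer_count, ps_rref, backend, use_cuda_rpc):
            super().__init__(rank)  # assumed base-class constructor
            self.rank = rank
            self.trainer_count = trainer_count
            self.ps_rref = ps_rref
            self.backend = backend
            self.use_cuda_rpc = use_cuda_rpc

        def train(self, model, data):
            # data is assumed to be an iterable of (inputs, targets) batches
            criterion = torch.nn.CrossEntropyLoss()
            for inputs, targets in data:
                model.zero_grad()
                loss = criterion(model(inputs), targets)
                loss.backward()
                # a real trainer would typically hand the gradients to the
                # parameter server at this point, for example through RPC
                # calls on self.ps_rref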

Parameter Server class

The parameter_server directory contains base classes to provide a starting point for implementing a parameter server. Inherit from a base class and implement your parameter server. The benchmark has two requirements for parameter servers.

  1. It must implement an __init__ method that takes rank, ps_trainer_count, backend, and use_cuda_rpc as arguments.

    def __init__(self, rank, ps_trainer_count, backend, use_cuda_rpc):
    
  2. It must implement a reset_state method that takes ps_rref as an argument.

    def reset_state(ps_rref):
    
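A minimal parameter server follows the same pattern, as sketched below. The base-class name (ParameterServerBase), the import path, and the gradient bookkeeping are assumptions made for illustration; reset_state follows the staticmethod-with-ps_rref shape shown above.

    # A minimal sketch, assuming a ParameterServerBase-style base class in
    # the parameter_server directory; the base-class name, import path, and
    # the gradient bookkeeping shown here are assumptions.
    import threading

    from parameter_server.parameter_server import ParameterServerBase  # hypothetical import path

    class MyParameterServer(ParameterServerBase):
        def __init__(self, rank, ps_trainer_count, backend, use_cuda_rpc):
            super().__init__(rank)  # assumed base-class constructor
            self.rank = rank
            self.trainer_count = ps_trainer_count
            self.backend = backend
            self.use_cuda_rpc = use_cuda_rpc
            self.lock = threading.Lock()
            self.gradient_dict = {}

        @staticmethod
        def reset_state(ps_rref):
            # clear any state accumulated during the previous run so the
            # next experiment starts from a clean parameter server
            self = ps_rref.local_value()
            with self.lock:
                self.gradient_dict.clear()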

Testing

Use pytest to run the test methods added to test_scripts.py. To run all of the added script tests at once, use pytest test_scripts.py.
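
A test method for a new experiment script might look like the sketch below; the script name and the experiment_scripts path are placeholders, and the existing tests in test_scripts.py may invoke scripts differently.

    # A hedged sketch of a test method for test_scripts.py; the script name
    # and the experiment_scripts directory layout are placeholders.
    import os
    import subprocess

    def test_my_experiment():
        script = os.path.join(
            os.path.dirname(__file__), "experiment_scripts", "my_experiment.sh"
        )
        # the experiment script is expected to exit with status 0 on success
        subprocess.check_call(["bash", script])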