This adds a `reduce_scatter` implementation for ProcessGroupGloo. It is a fairly naive implementation, as it issues one allreduce per rank, but it may be useful for testing in FSDP etc. There was an existing implementation of `reduce_scatter_tensor`/`reduce_scatter_tensor_coalesced` that takes a very similar approach but requires a fixed tensor size per rank. If users find these functions too slow, we can address that as issues arise.

Gloo now supports all major distributed operations. Quite a few of these were added by @rohan-varma and @yifuwang, but the support chart was never updated. We also have `CUDAWork` variants of most operations, so those were added to the chart as well.

Test plan:

```
pytest -v test/distributed/test_c10d_gloo.py -k reduce_scatter
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149869
Approved by: https://github.com/fduwjj
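For intuition, here is a minimal Python sketch of the one-allreduce-per-rank strategy described above, written against the public `torch.distributed` API. The helper name `naive_reduce_scatter` is hypothetical and for illustration only; it is not the actual C++ ProcessGroupGloo implementation.

```python
import torch
import torch.distributed as dist

def naive_reduce_scatter(output: torch.Tensor, inputs: list[torch.Tensor]) -> None:
    """Hypothetical sketch: emulate reduce_scatter with one allreduce per rank.

    Each rank passes inputs[i] as its contribution destined for rank i.
    After the loop, rank r holds sum-over-ranks of inputs[r] in `output`.
    """
    rank = dist.get_rank()
    for i, chunk in enumerate(inputs):
        buf = chunk.clone()        # all_reduce works in place; keep inputs intact
        dist.all_reduce(buf)       # default op is SUM across all ranks
        if i == rank:
            output.copy_(buf)      # keep only the chunk addressed to this rank
```

With a Gloo process group initialized (e.g. `dist.init_process_group("gloo")`), this matches the semantics of `dist.reduce_scatter(output, inputs)` while trading bandwidth for simplicity: each allreduce delivers the full chunk to every rank, and all but one copy is discarded.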
Please see the "Writing documentation" section of `CONTRIBUTING.md` for details on both writing and building the docs.