Commit Graph

2 Commits

Author SHA1 Message Date
Xiang Gao
cda210e23b UCC PG build in CI (#81583)
- Modifies the current cmake build definitions to use `find_package` to find UCX and UCC installed in the system
- Install UCX and UCC in CUDA dockers
- Build PyTorch with `USE_UCC=1` in pipelines
- Currently, we are not running unit tests with the UCC PG. Those tests will be added in future PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81583
Approved by: https://github.com/vtlam, https://github.com/malfet
2022-08-10 00:23:47 +00:00
Terry Lam
54bdaf76d6 [PFC] Native UCC process group for Pytorch (#79918)
Summary:
This diff integrates UCC process group as a native component of Pytorch Distributed core. It is based on the existing torch-ucc (https://github.com/facebookresearch/torch_ucc) as the wrapper for UCC collective communication library.
The environment and cmake variables are named in mirroring to the existing process groups such as NCCL and Gloo. Specifically,
- USE_UCC: enables UCC PG. This defaults to OFF, so there is no breakage of existing builds that do not have UCX/UCC external libraries.
- USE_SYSTEM_UCC: uses external UCX and UCC shared libraries that are set accordingly with UCX_HOME and UCC_HOME.

Currently, this diff only supports USE_SYSTEM_UCC=ON, i.e., requiring users to specify external libraries for UCX and UCC. In subsequent diffs, we will add UCX and UCC repos as third-party dependencies in pytorch/third-party.

Test Plan:
Passed Torch-UCC tests that invoke UCC process group. For example:

$ sh test/start_test.sh test/torch_allreduce_test.py --backend gloo --use-cuda
...
Test allreduce: succeeded

Differential Revision: D36973688

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79918
Approved by: https://github.com/kwen2501, https://github.com/kingchc
2022-07-12 14:45:44 +00:00