Summary: `is_backend_available` takes in a string and expects it to only be backend, if its given a composite (device:backend) string, it fails.
Reviewed By: prashrock
Differential Revision: D81886736
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165061
Approved by: https://github.com/H-Huang
It generally recommended to use `is/is not` to compare types. Therefore this series of changes apply this suggestion in the code base, and it aims to finally enabling related linter checks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165037
Approved by: https://github.com/mlazos
For https://github.com/pytorch/pytorch/issues/114850, we will port distributed tests to Intel GPU. This PR will work on some test files under test/distributed. We could enable Intel GPU with following methods and try the best to keep the original code styles:
- instantiate_device_type_tests()
- use "torch.accelerator.current_accelerator()" to determine the accelerator backend
- use requires_accelerator_dist_backend to allow both nccl and xccl test
- enabled XPU for some test path
- Change the hardcoded world_size according to device_count.
- Unify some common code under torch/testing/_internal for multiple backend, for example:
Added xpu for Backend.backend_capability and dist.Backend.register_backend()
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159473
Approved by: https://github.com/guangyey, https://github.com/d4l3k
Today we can initialize a mixed-backend process group (e.g. "cpu:gloo,cuda:nccl") but we can only pass one set of process group options.
However, when we call `split_group`, we retrieve that set of options from the parent PG and pass it to the ProcessGroup::groupSplit C++ API, which then attempts to propagate that set of options to all backends.
This leads to an assert on some user code, where ProcessGroupGloo::split is expecting gloo options but receives nccl options instead.
Arguably the APIs as currently designed are just broken; we should not ever expect a single set of backend options to apply across multiple backends. However, fixing this would require changing quite a few public APIs.
As a quick fix, since user-provided options really only exist for NCCL, just warn and fall-back to defaulted options for Gloo if non-gloo options are detected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162424
Approved by: https://github.com/d4l3k, https://github.com/fduwjj, https://github.com/H-Huang
For https://github.com/pytorch/pytorch/issues/114850, we will port distributed tests to Intel GPU. This PR will work on some test files under test/distributed. We could enable Intel GPU with following methods and try the best to keep the original code styles:
- instantiate_device_type_tests()
- use "torch.accelerator.current_accelerator()" to determine the accelerator backend
- use requires_accelerator_dist_backend to allow both nccl and xccl test
- enabled XPU for some test path
- Change the hardcoded world_size according to device_count.
- Unify some common code under torch/testing/_internal for multiple backend, for example:
Added xpu for Backend.backend_capability and dist.Backend.register_backend()
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159473
Approved by: https://github.com/guangyey, https://github.com/d4l3k
Today we can initialize a mixed-backend process group (e.g. "cpu:gloo,cuda:nccl") but we can only pass one set of process group options.
However, when we call `split_group`, we retrieve that set of options from the parent PG and pass it to the ProcessGroup::groupSplit C++ API, which then attempts to propagate that set of options to all backends.
This leads to an assert on some user code, where ProcessGroupGloo::split is expecting gloo options but receives nccl options instead.
Arguably the APIs as currently designed are just broken; we should not ever expect a single set of backend options to apply across multiple backends. However, fixing this would require changing quite a few public APIs.
As a quick fix, since user-provided options really only exist for NCCL, just warn and fall-back to defaulted options for Gloo if non-gloo options are detected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162424
Approved by: https://github.com/d4l3k, https://github.com/fduwjj, https://github.com/H-Huang
This PR is greatly simplified now that it stacked on top of a PR that builds with distributed always. We only need to stub functions that may not be defined due to a backend not being enabled.
Signed-off-by: Edward Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159889
Approved by: https://github.com/wconstab
ghstack dependencies: #160449
This PR is greatly simplified now that it stacked on top of a PR that builds with distributed always. We only need to stub functions that may not be defined due to a backend not being enabled.
Signed-off-by: Edward Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159889
Approved by: https://github.com/wconstab
ghstack dependencies: #160449
This PR is greatly simplified now that it stacked on top of a PR that builds with distributed always. We only need to stub functions that may not be defined due to a backend not being enabled.
Signed-off-by: Edward Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159889
Approved by: https://github.com/wconstab
ghstack dependencies: #160449
For https://github.com/pytorch/pytorch/issues/114850, we will port distributed tests to Intel GPU. This PR will work on some test files under test/distributed. We could enable Intel GPU with following methods and try the best to keep the original code styles:
- instantiate_device_type_tests()
- use "torch.accelerator.current_accelerator()" to determine the accelerator backend
- use requires_accelerator_dist_backend to allow both nccl and xccl test
- enabled XPU for some test path
- Change the hardcoded world_size according to device_count.
- Unify some common code under torch/testing/_internal for multiple backend, for example:
Added xpu for Backend.backend_capability and dist.Backend.register_backend()
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159473
Approved by: https://github.com/guangyey, https://github.com/d4l3k
This PR is greatly simplified now that it stacked on top of a PR that builds with distributed always. We only need to stub functions that may not be defined due to a backend not being enabled.
Signed-off-by: Edward Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159889
Approved by: https://github.com/wconstab
ghstack dependencies: #160449
This PR is greatly simplified now that it stacked on top of a PR that builds with distributed always. We only need to stub functions that may not be defined due to a backend not being enabled.
Signed-off-by: Edward Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159889
Approved by: https://github.com/wconstab
ghstack dependencies: #160449
`send_object_list` and `recv_object_list` use regular `send`/`recv` P2P ops which means that they will create 2-rank NCCL communicators between ranks if the communicators have not been initialized.
This adds an option `use_batch` which will call the send/recv with `batch_isend_irecv` which will re-use the communicators already initialized for collectives in the group.
---
BatchP2P ops, creates (or use existing) communicator keyed by device index
Regular P2P Ops, creates (or use existing) dedicated 2-rank communicators keyed by “rank1:rank2”
See:
c8205cb354/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp (L3980-L4008)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160342
Approved by: https://github.com/wconstab