Headers under torch/csrc/distributed may be referenced with a relative path, e.g., "<c10d/...>". However, relative paths cannot be handled gracefully by Meta's internal build when the NCCL PG is hipified to support AMD/RCCL, because the "hipified" header files are generated in other directories. Moreover, absolute header paths are already the standard in most PyTorch components. Thus, this patch refactors all header paths in torch/csrc/distributed to be absolute.
See D39835774 for more details about the Meta-internal complication.
**How to test**: commit 9e5d199 removes `-I./torch/csrc/distributed` from the compile options; use it to verify that no relative-path uses of torch/csrc/distributed headers remain. A representative change is shown below.
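For illustration, the pattern applied across the tree (the PR touches many files; this pair of includes shows it):

```cpp
// Before: relative include, resolved only via -I./torch/csrc/distributed,
// which breaks when hipified copies are generated in other directories.
#include <c10d/ProcessGroupNCCL.hpp>

// After: absolute include rooted at the repository top level.
#include <torch/csrc/distributed/c10d/ProcessGroupNCCL.hpp>
```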
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85780
Approved by: https://github.com/kumpera, https://github.com/huydhn
### Changes
- Move ProcessGroup::Work into its own class and update all references to it and the corresponding header includes.
#### Motivation
In future PRs we will repurpose ProcessGroup to instead contain a list of Backends (ProcessGroupNCCL/Gloo/UCC) and dispatch to them based on tensor type. This change prevents a circular dependency in which ProcessGroup depends on Backend and Backend depends on ProcessGroup::Work.
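A minimal sketch of the untangling (simplified names; the real classes live under torch/csrc/distributed/c10d):

```cpp
// Before: Work was nested inside ProcessGroup, so a Backend returning
// Work objects would need ProcessGroup, while ProcessGroup would need
// Backend -- a circular dependency:
//
//   class ProcessGroup { public: class Work { /* ... */ }; };

// After: Work is hoisted into its own class and header.
class Work {
 public:
  virtual ~Work() = default;
  virtual bool wait() { return true; }  // placeholder for the real API
};

// Backend (the ProcessGroupNCCL/Gloo/UCC base) now depends only on
// Work, and ProcessGroup can depend on Backend without a cycle.
class Backend {
 public:
  virtual ~Backend() = default;
};
```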
Differential Revision: [D38839212](https://our.internmc.facebook.com/intern/diff/D38839212)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83680
Approved by: https://github.com/kwen2501
This PR enables using a Python process group with DDP by doing the following:
- Surface PG::Work::getFuture() as overridable.
- Use Work::getFuture() to retrieve values from a PG.
- Add a _create_work_from_future python method that creates a Work object that wraps a Future.
To test these changes, we run DDP with a Python-based PG using both strategies.
The reason for offering two methods is that both have shortcomings.
The wrapper method is harder to troubleshoot, as there is no visibility into how the future is used.
The subclass method has memory-management issues, as can be seen in the test suite, where Work instances must be kept alive by storing them in PG fields.
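A rough sketch of the wrapper method, with std::future standing in for the real c10::ivalue::Future and c10d Work types (the class name here is illustrative):

```cpp
#include <future>
#include <utility>

// Stand-in for the Work interface.
class Work {
 public:
  virtual ~Work() = default;
  virtual void wait() = 0;
};

// A Work whose completion is defined entirely by the wrapped future;
// _create_work_from_future builds an object like this for Python.
class FutureWrappingWork : public Work {
 public:
  explicit FutureWrappingWork(std::future<void> fut) : fut_(std::move(fut)) {}
  void wait() override { fut_.wait(); }  // delegate completion to the future

 private:
  std::future<void> fut_;
};
```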
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79176
Approved by: https://github.com/rohan-varma
Summary:
`batch_isend_irecv` previously only worked for two-rank cases and would otherwise hang, e.g. pytorch/pytorch#73960. This Diff extends `batch_isend_irecv` to support more than two ranks. The fix treats the operation more like a collective than a two-rank P2P when selecting the communicator, since more ranks can participate in the batch call than just "my" rank and "my" peer (see the sketch after the rules below).
Rules:
- If `batch_isend_irecv` is the first collective-style call (collectives and all-to-all included) in the `group` given as the argument, then all ranks of the `group` are expected to participate in this call.
- Otherwise, if it is not the first collective-style call in the `group` (i.e., the communicator has already been initialized), then batched P2P communication involving only a subset of the processes in the `group` is allowed.
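A hedged sketch of the communicator-key selection described above (names are illustrative, not the actual ProcessGroupNCCL internals):

```cpp
#include <algorithm>
#include <string>

// A single two-rank P2P keys the cached communicator on the rank pair;
// a batched P2P instead uses the collective-style device key, so every
// rank participating in the batch call agrees on one communicator.
std::string commKey(bool isBatchP2P, int myRank, int peer, int deviceIndex) {
  if (isBatchP2P) {
    return "device:" + std::to_string(deviceIndex);  // collective-style key
  }
  const int lo = std::min(myRank, peer);
  const int hi = std::max(myRank, peer);
  return std::to_string(lo) + ":" + std::to_string(hi);  // pairwise key
}
```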
Test Plan:
Added p2p_tests.py testing the following patterns:
+ sendrecv_neighbor(input, output) # Ring-like neighbor exchange
+ sendrecv_ripple(input, output) # Exchange with growing distance (pytorch/pytorch#73960)
+ sendrecv_P2P(input, output) # Single P2P operation
+ isendrecv_P2P(input, output) # Single non-blocking P2P operation
+ isendrecv_P2P_batch(input, output, 0) # batched P2P between only two ranks
+ isendrecv_P2P_batch(input, output, 1) # batched P2P within a new group created for two ranks
Differential Revision: D35122664
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74701
Approved by: https://github.com/mingzhe09088, https://github.com/osalpekar
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73702
This PR adds logging to track usage of ProcessGroup backends. The
change involves adding an `init()` method to the ProcessGroup base class and
modifying all the subclasses to call it.
We can't add logging directly in the ProcessGroup constructor since
`getBackendName()` is pure virtual and can't be called from the base class
constructor. See
https://isocpp.org/wiki/faq/strange-inheritance#calling-virtuals-from-ctors for
more details.
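A compact illustration of the constraint and the resulting `init()` pattern (simplified; see the linked FAQ):

```cpp
#include <iostream>
#include <string>

class ProcessGroup {
 public:
  virtual ~ProcessGroup() = default;
  virtual const std::string getBackendName() const = 0;

 protected:
  // Called from each subclass constructor, after the subclass vtable is
  // in place, so the virtual call resolves to the subclass override.
  // Calling getBackendName() from ProcessGroup's own constructor would
  // instead hit the pure virtual, which is undefined behavior.
  void init() {
    std::cout << "c10d backend in use: " << getBackendName() << "\n";
  }
};

class ProcessGroupExample : public ProcessGroup {
 public:
  ProcessGroupExample() { init(); }  // safe here, not in the base ctor
  const std::string getBackendName() const override { return "example"; }
};
```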
ghstack-source-id: 150381852
Test Plan: check logging
Reviewed By: cbalioglu
Differential Revision: D34596295
fbshipit-source-id: 5baa7bfb53bdcd1b4032292c4e7715467f983288
(cherry picked from commit b2344144fe0c25db02c2e958f2acd772e9cd4dc8)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73166
This PR refactors, cleans up, and optimizes the implementation of `TORCH_DISTRIBUTED_DEBUG`. It also introduces three new user APIs: `get_debug_level()`, `set_debug_level()`, and `set_debug_level_from_env()` to retrieve and modify the debug level after a process has started.
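As a sketch of the env-driven piece, here is the parsing that `set_debug_level_from_env()` implies, assuming the documented OFF/INFO/DETAIL values of `TORCH_DISTRIBUTED_DEBUG` (the enum and function names here are illustrative):

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

enum class DebugLevel { Off, Info, Detail };

DebugLevel debugLevelFromEnv() {
  const char* env = std::getenv("TORCH_DISTRIBUTED_DEBUG");
  const std::string value = env ? env : "OFF";  // unset means logging off
  if (value == "OFF") return DebugLevel::Off;
  if (value == "INFO") return DebugLevel::Info;
  if (value == "DETAIL") return DebugLevel::Detail;
  throw std::invalid_argument(
      "TORCH_DISTRIBUTED_DEBUG must be OFF, INFO, or DETAIL, got: " + value);
}
```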
ghstack-source-id: 149778566
Test Plan: Run the existing unit tests.
Reviewed By: rohan-varma
Differential Revision: D34371226
fbshipit-source-id: e18443b411adcbaf39b2ec999178c198052fcd5b
(cherry picked from commit 26d6bb1584b83a0490d8b766482656a5887fa21d)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71711
This will fix a ton of broken asserts that are meant to always fire but never actually do: a string literal used as the assert condition converts to a non-null pointer, so the condition is always true. All of them would have been caught with `-Wstring-conversion` warnings enabled.
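The bug class, in miniature (plain assert used for illustration):

```cpp
#include <cassert>

void errorPath() {
  // Broken: a string literal converts to a non-null pointer, so the
  // condition is always true and this assert can never fire.
  assert("unreachable code path");

  // Intended: negating the literal makes the condition false, so the
  // assert fires whenever this line is executed.
  assert(!"unreachable code path");
}
```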
Test Plan: CI Pass
Differential Revision: D33743605
fbshipit-source-id: 062641f9d5d02c6e317c5a286fd01017cf77237f
(cherry picked from commit 639b42e04b)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64943
Most ProcessGroup collective APIs are pure virtual. As a result, c10d extensions need to override all of them and throw an error for the APIs they don't need, which is too verbose for users. This commit changes those collective APIs to plain virtual with a default implementation that throws an error. Note that ProcessGroup is still an abstract class, as `getBackendName` is a pure virtual function that all subclasses have to override.
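A simplified sketch of the new default behavior (not the real c10d signatures):

```cpp
#include <memory>
#include <stdexcept>
#include <string>

struct Work {};  // stand-in for the real c10d Work type

class ProcessGroup {
 public:
  virtual ~ProcessGroup() = default;

  // Still pure virtual, so ProcessGroup remains an abstract class.
  virtual const std::string getBackendName() const = 0;

  // Previously pure virtual; now virtual with a throwing default, so a
  // c10d extension only overrides the collectives it actually supports.
  virtual std::shared_ptr<Work> allreduce() {
    throw std::runtime_error(
        getBackendName() + " does not support allreduce");
  }
};
```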
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang cbalioglu gcramer23
Test Plan: Imported from OSS
Reviewed By: cbalioglu
Differential Revision: D30906866
Pulled By: mrshenli
fbshipit-source-id: c4df8962d60350a44d2df72fd04f9dd6eadb9fa6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60543
Now that c10d is part of libtorch, it would also be nice if the sources all lived in one place.
ghstack-source-id: 132306292
Test Plan: It builds
Reviewed By: cbalioglu
Differential Revision: D29062002
fbshipit-source-id: d9e1301e9d73e1643fa0f0119cd2d618f1ad52e6