Commit Graph

1265 Commits

Author SHA1 Message Date
Yuxin Wu
c8ed84ad06 Fix a static initialization order fiasco in c10d (#90149)
The `TORCH_LIBRARY_IMPL` registrations in `OpsImpl.cpp` need to happen after `ProcessGroup` is registered as a torch class -- which happens in `Ops.cpp`. However, the order of the registrations between the two files is undefined.

If the registration in `OpsImpl.cpp` runs before the one in `Ops.cpp`, we get a crash at program launch similar to #83255. This happens in our internal build.

This PR moves the contents of `OpsImpl.cpp` to the end of `Ops.cpp`, since objects within a single translation unit are initialized in the order they are defined. According to the omniscient lord of ChatGPT:
![ChatGPT on static initialization order within a translation unit](https://user-images.githubusercontent.com/1381301/205542847-3535b319-3c2a-4e8e-bc11-27913f6afb39.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90149
Approved by: https://github.com/kwen2501, https://github.com/H-Huang, https://github.com/soumith
2022-12-12 08:21:54 +00:00
Howard Huang
80150788bc [21/N] Add alltoall_base custom op with CPU/CUDA implementations (#89813)
Differential Revision: [D41812670](https://our.internmc.facebook.com/intern/diff/D41812670)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89813
Approved by: https://github.com/kwen2501
2022-12-08 23:39:26 +00:00
Howard Huang
e65ee3975f [20/N] Add recv_any_source custom op with CPU/CUDA implementations (#89505)
Differential Revision: [D41812671](https://our.internmc.facebook.com/intern/diff/D41812671)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89505
Approved by: https://github.com/kwen2501
2022-12-08 23:39:26 +00:00
Masaki Kozuki
508916128d [ReduceOp] ameliorate custom __eq__ (#90088)
Improve the completeness of `ReduceOp.__eq__`.

A follow-up should support the equality operator with a `RedOpType` as the first argument and a `ReduceOp` as the second.

Fixes #90072
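
For illustration only (not part of the PR), a minimal sketch of the comparisons `ReduceOp.__eq__` is expected to handle; constructing a `ReduceOp` from a `RedOpType` member is assumed to be supported:

```python
import torch.distributed as dist

# Member-vs-member comparisons.
assert dist.ReduceOp.SUM == dist.ReduceOp.SUM
assert dist.ReduceOp.SUM != dist.ReduceOp.MAX

# A constructed ReduceOp compares equal to the corresponding member
# (ReduceOp on the left, RedOpType on the right).
op = dist.ReduceOp(dist.ReduceOp.SUM)
assert op == dist.ReduceOp.SUM
```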

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90088
Approved by: https://github.com/kwen2501
2022-12-06 05:13:50 +00:00
Aidyn-A
a5430e1067 [UCC] Properly finalize unsuccessful collective posts (#89306)
This PR adds a `ucc_collective_finalize` call when `ucc_collective_post` or `ucc_collective_triggered_post` does not succeed.
According to the [UCC documentation](https://openucx.github.io/ucc/api/v1.1/html/group___u_c_c___c_o_l_l_e_c_t_i_v_e_s.html):
```
On error, request handle becomes invalid, user is responsible to call ucc_collective_finalize to free allocated resources.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89306
Approved by: https://github.com/kwen2501
2022-12-01 23:01:45 +00:00
Howard Huang
5797f74924 [19/N] Add monitored_barrier custom op with CPU implementation (#89318)
Differential Revision: [D41415324](https://our.internmc.facebook.com/intern/diff/D41415324)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89318
Approved by: https://github.com/kwen2501
2022-11-22 14:18:40 +00:00
Howard Huang
be22b5d39f [18/N] Add allgather_coalesced custom op with CPU/CUDA implementations (#89317)
Differential Revision: [D41415321](https://our.internmc.facebook.com/intern/diff/D41415321)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89317
Approved by: https://github.com/kwen2501
2022-11-22 14:14:17 +00:00
Howard Huang
58a74f34f9 [17/N] Add _reduce_scatter_base custom op with CPU/CUDA implementation (#88903)
Differential Revision: [D41415325](https://our.internmc.facebook.com/intern/diff/D41415325)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88903
Approved by: https://github.com/kwen2501
2022-11-22 00:42:11 +00:00
Kirtesh Patil
fe276ea0f9 [UCC] Add pre & post processing for CPU collectives (#89030)
Summary: The CPU block in `collective_post` was missing pre- and post-processing. The reduce-scatter implementation expects the pre-processing callback to flatten the input tensors; the missing invocation meant garbage values were being passed.
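
For reference, a minimal sketch of the collective exercised here (not from the PR; assumes a launcher-initialized, UCC-enabled build, and arbitrary tensor sizes):

```python
# python -m torch.distributed.run --nproc_per_node=2 <script>.py
import torch
import torch.distributed as dist

if __name__ == "__main__":
    dist.init_process_group("ucc")  # assumes a UCC-enabled build
    world_size = dist.get_world_size()
    # reduce_scatter takes a list of per-rank inputs; flattening that list is
    # what the pre-processing callback mentioned above is responsible for.
    inputs = [torch.full((4,), float(dist.get_rank())) for _ in range(world_size)]
    output = torch.empty(4)
    dist.reduce_scatter(output, inputs)
```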

Test Plan: Tested the reduce-scatter collective using PARAM

Reviewed By: eastzone

Differential Revision: D41291592

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89030
Approved by: https://github.com/kingchc, https://github.com/kwen2501
2022-11-16 16:40:24 +00:00
Masaki Kozuki
63e16216d8 [c10d] Implement __instancecheck__ for c10d::ReduceOp (#88275)
Summary:
- Customize the metaclass of `torch.distributed.distributed_c10d.ReduceOp` for the sake of custom `__instancecheck__`
- Add `copy.copy`, `copy.deepcopy`, and `pickle` support with tests
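
An illustrative sketch (not the PR's test suite) of what the custom `__instancecheck__` and the copy/pickle support enable:

```python
import copy
import pickle
import torch.distributed as dist

# The custom metaclass makes RedOpType members pass isinstance checks against ReduceOp.
assert isinstance(dist.ReduceOp.SUM, dist.ReduceOp)

# copy / deepcopy / pickle round-trips are expected to preserve equality.
op = dist.ReduceOp.SUM
assert copy.copy(op) == op
assert copy.deepcopy(op) == op
assert pickle.loads(pickle.dumps(op)) == op
```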

Rel:
- #81272
- #84243
- #87191
- #87303
- #87555

Ref:
- https://github.com/pybind/pybind11/issues/2696

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88275
Approved by: https://github.com/wanchaol
2022-11-15 13:21:41 +00:00
Kazuaki Ishizaki
e0c194f10b Fix typos in messages under torch (#88961)
This PR fixes typos in messages and parameter names in C++ source and header files under the `torch` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88961
Approved by: https://github.com/albanD
2022-11-14 19:06:41 +00:00
Howard Huang
df1df9d10a [16/N] Add _allgather_base custom op with CPU/CUDA implementation (#88889)
Differential Revision: [D41227739](https://our.internmc.facebook.com/intern/diff/D41227739)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88889
Approved by: https://github.com/kwen2501
2022-11-12 22:31:07 +00:00
Howard Huang
6e5f736d86 [15/N] Add allreduce_coalesced custom op with CPU/CUDA implementations (#88846)
Differential Revision: [D41227740](https://our.internmc.facebook.com/intern/diff/D41227740)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88846
Approved by: https://github.com/kwen2501
2022-11-12 14:23:45 +00:00
Howard Huang
3a3500fa08 [13/N] Update gather with CPU/CUDA implementations (#86409)
Differential Revision: [D40181612](https://our.internmc.facebook.com/intern/diff/D40181612)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86409
Approved by: https://github.com/kwen2501
2022-11-09 22:11:40 +00:00
Howard Huang
55df18e3da [12/N] Update scatter with CPU/CUDA implementations (#86408)
Differential Revision: [D40181613](https://our.internmc.facebook.com/intern/diff/D40181613)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86408
Approved by: https://github.com/kwen2501
2022-11-09 18:40:25 +00:00
Howard Huang
bc66ddb5cb Add torch.distributed.DistBackendError exception type, thrown from C10D_NCCL_CHECK (#88134)
Currently all of the distributed errors are thrown from the `TORCH_CHECK` macro, which throws a generic `RuntimeError`. This change introduces a new error type, `DistBackendError`, which derives from `RuntimeError`, to signify that there was an error with the backend communication library. This allows for better error handling and analysis at higher levels in the stack. Motivation: https://docs.google.com/document/d/1j6VPOkC6znscliFuiDWMuMV1_fH4Abgdq7TCHMcXai4/edit#heading=h.a9rc38misyx8

Changes:
- introduce new error type
- Update `C10D_NCCL_CHECK`

Sample script to demonstrate the new error type:

```python
# python -m torch.distributed.run --nproc_per_node=2 <script>.py

import torch
import torch.distributed as dist

if __name__ == "__main__":
    dist.init_process_group("nccl")
    dist.broadcast(torch.tensor([1, 2, 3]).cuda(), 0)
```
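
And a hedged sketch (not part of the PR; the handler body is illustrative) of how callers might use the new type:

```python
# python -m torch.distributed.run --nproc_per_node=2 <script>.py
import torch
import torch.distributed as dist

if __name__ == "__main__":
    dist.init_process_group("nccl")
    try:
        dist.broadcast(torch.tensor([1, 2, 3]).cuda(), 0)
    except dist.DistBackendError as err:
        # DistBackendError derives from RuntimeError, so existing handlers that
        # catch RuntimeError keep working; new code can match it specifically.
        print(f"backend communication failure: {err}")
```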

Differential Revision: [D40998803](https://our.internmc.facebook.com/intern/diff/D40998803)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88134
Approved by: https://github.com/rohan-varma
2022-11-08 13:26:42 +00:00
Howard Huang
81f74eed75 [11/N] Update all_to_all with CPU/CUDA implementations (#86407)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86407
Approved by: https://github.com/kwen2501
2022-11-01 17:54:13 +00:00
Howard Huang
bed8102741 [10/N] Update barrier with CPU/CUDA implementations (#86368)
### Changes
- Updates for the barrier collective
- NOTE: the current change will not achieve dispatching for barrier, since there is no tensor to read the device from

### Context
https://github.com/pytorch/pytorch/issues/86225

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @kwen2501 @awgu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86368
Approved by: https://github.com/kwen2501
2022-11-01 17:41:01 +00:00
Yanli Zhao
44f8efd5c1 [BE]fix DDP when the number of output features is zero (#87793)
Fixes #87280

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87793
Approved by: https://github.com/rohan-varma
2022-11-01 15:27:40 +00:00
Howard Huang
20d849b982 [9/N] [Dispatchable Collectives] Update reduce_scatter with CPU / CUDA implementations (#86166)
### Changes
- Updates for the reduce_scatter collective

### Context
https://github.com/pytorch/pytorch/issues/86225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86166
Approved by: https://github.com/kwen2501
2022-11-01 15:23:41 +00:00
Fuzzkatt
d13f1e6ab4 Add sequence number support for UCC (#85047)
Add sequence number support for UCC, mostly following the format of ProcessGroupNCCL.
Passes the new test: `test_all_gather_object_subgroup`
Adds skips for the gather tests: `test_gather_object` and `test_gather_object_subgroup`

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85047
Approved by: https://github.com/kwen2501
2022-10-31 03:56:55 +00:00
Sergey Lebedev
19171a21ee Make barrier blocking in UCC (#86961)
Currently, the CUDA UCC barrier is non-blocking with respect to the CPU, and there is no flag to change that. To make UCC PG barrier behaviour consistent with the NCCL PG, this PR changes barrier to always be blocking.
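
A minimal usage sketch (assumes a UCC-enabled build launched with a distributed launcher):

```python
# python -m torch.distributed.run --nproc_per_node=2 <script>.py
import torch.distributed as dist

if __name__ == "__main__":
    dist.init_process_group("ucc")
    dist.barrier()  # with blocking behaviour, this returns only after all ranks arrive
    print(f"rank {dist.get_rank()} passed the barrier")
```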

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86961
Approved by: https://github.com/kwen2501
2022-10-29 16:33:18 +00:00
soulitzer
adb76ef510 Expose API for backward execution order (#87507)
In this PR:
- graph_task stores graph roots on construction so that we can later traverse through the graph
- before the nodes are returned, they need to be converted from raw_ptr to shared_ptr; this should be OK because the graph is guaranteed to be alive

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87507
Approved by: https://github.com/albanD
2022-10-26 21:28:45 +00:00
Howard Huang
4ef5f5dec7 Fix use after free in tensorpipe agent (#87627)
Fixes #87359, which identifies a use after free for reverse device maps. This only affects the dynamic RPC feature, not the stable RPC code path.

Unfortunately, the failing test `TensorPipeRpcTest.test_dynamic_rpc_existing_rank_can_communicate_with_new_rank_cuda` is also running into a separate issue. I've temporarily disabled some of the test code and will investigate the error asynchronously.

Testing plan:
- tested all the dynamic RPC tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87627
Approved by: https://github.com/rohan-varma
2022-10-25 04:17:43 +00:00
Masaki Kozuki
aa8248cc9a Reenable isinstance with torch.distributed.ReduceOp (#87303)
tentatively marking as draft as I haven't gotten a comprehensive list of side effects...

Ref: https://stackoverflow.com/questions/40244413/python-static-class-attribute-of-the-class-itself
Rel: https://github.com/pytorch/pytorch/issues/87191

cc @kwen2501
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87303
Approved by: https://github.com/wanchaol
2022-10-21 15:05:36 +00:00
Andrew Gu
f552eee427 [Docs] Remove outdated comment for sparse all-reduce (#87018)
https://github.com/pytorch/pytorch/pull/23917 switched to using allgatherv instead of allgather for gloo sparse all-reduce. This PR removes a comment saying to use allgatherv if available since that has already been done.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87018
Approved by: https://github.com/H-Huang
2022-10-17 21:17:07 +00:00
Howard Huang
3356d0385f [BE] Store helper functions C++ for python API parity (#82136)
Add helper functions for `store.set()` and `store.compare_set()` that accept string arguments instead of `std::vector<uint8_t>`, and refactor some usages internally.
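
For context, a sketch of the Python-level store behaviour these C++ helpers mirror (port and values are arbitrary):

```python
import torch.distributed as dist

store = dist.TCPStore("localhost", 29500, 1, True)
store.set("key", "value")                       # string values at the Python level
store.compare_set("key", "value", "new_value")  # swap only if the current value matches
print(store.get("key"))                         # b'new_value'
```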
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82136
Approved by: https://github.com/rohan-varma
2022-10-12 17:49:38 +00:00
Louis Feng
55479fe80e Enable capturing of comm collective parameters (#98) (#85368)
Summary:
X-link: https://github.com/facebookresearch/torch_ucc/pull/98

Add tensor input, output, and other metadata for PyTorch comms.

Test Plan: P517138779

Reviewed By: Pavani-Panakanti

Differential Revision: D38357077

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85368
Approved by: https://github.com/H-Huang
2022-10-11 04:38:26 +00:00
Howard Huang
ad449b338f [8/N] [Dispatchable Collectives] Update allgather with CPU / CUDA implementations  (#84423)
### Changes
- Updates for the allgather collective

### Context
https://github.com/pytorch/pytorch/issues/86225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84423
Approved by: https://github.com/kwen2501
2022-10-10 17:18:48 +00:00
Howard Huang
8a1fc5d2f8 [7/N] [Dispatchable Collectives] Update reduce with CPU / CUDA implementations (#83916)
### Changes
- Updates for the reduce collective

### Context
https://github.com/pytorch/pytorch/issues/86225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83916
Approved by: https://github.com/kwen2501
2022-10-10 15:58:37 +00:00
Chengqi Deng
b43ae1c411 Add reference counter in FileStore (#85601)
Fixes #67566.

This diff adds a reference counter to the FileStore object. The underlying file is removed only when the reference counter reaches 0.
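
A rough illustration of the intended behaviour (single process, two handles to the same backing file; the path is arbitrary):

```python
import torch.distributed as dist

path = "/tmp/filestore_refcount_example"
store_a = dist.FileStore(path, 2)
store_b = dist.FileStore(path, 2)   # second reference to the same backing file
store_a.set("key", "value")

del store_a                         # file survives: store_b still holds a reference
print(store_b.get("key"))           # b'value'
del store_b                         # last reference dropped; the file is removed
```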

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85601
Approved by: https://github.com/H-Huang
2022-10-07 17:59:29 +00:00
Howard Huang
d39e9c1e90 [6/N] [Dispatchable Collectives] Update recv with CPU / CUDA implementations (#83876)
### Changes
- Updates for the recv collective

### Context
https://github.com/pytorch/pytorch/issues/86225

Differential Revision: [D40044552](https://our.internmc.facebook.com/intern/diff/D40044552)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83876
Approved by: https://github.com/kwen2501
2022-10-04 20:30:21 +00:00
Howard Huang
3f2e7d5c9a [5/N] [Dispatchable Collectives] Update send with CPU / CUDA implementations (#83859)
Differential Revision: [D40044550](https://our.internmc.facebook.com/intern/diff/D40044550)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83859
Approved by: https://github.com/kwen2501
2022-10-04 14:32:37 +00:00
Fuzzkatt
d9421f8158 added fix for WorkUCC (#84368)
Added a new constructor for WorkUCC that takes an optional inputTensors argument, to enable record_shapes=True for profiling purposes. Tested at https://github.com/pytorch/pytorch/pull/84323, which manually merges in https://github.com/pytorch/pytorch/pull/83285.
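
For illustration, the kind of profiling run this enables (not from the PR; assumes a UCC-enabled build and a distributed launcher):

```python
# python -m torch.distributed.run --nproc_per_node=2 <script>.py
import torch
import torch.distributed as dist
from torch.profiler import profile

if __name__ == "__main__":
    dist.init_process_group("ucc")
    with profile(record_shapes=True) as prof:   # input shapes now recorded for UCC work
        dist.all_reduce(torch.ones(8))
    print(prof.key_averages().table(sort_by="cpu_time_total"))
```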

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84368
Approved by: https://github.com/kingchc, https://github.com/kwen2501
2022-09-30 22:51:59 +00:00
Ke Wen
1f38abb5d2 Adopt ncclRemoteError (#85887)
`ncclRemoteError` was added in NCCL 2.13 to indicate a network error or a remote process exiting prematurely.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85887
Approved by: https://github.com/wanchaol
2022-09-30 09:17:49 +00:00
Ke Wen
ade1c19612 Add reduce_scatter_tensor in place of _reduce_scatter_base (#85867)
This is a twin PR similar to the one for `all_gather_into_tensor` (#85686).
The philosophy for renaming `_reduce_scatter_base` instead of merging it is described in #85686.
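
A hedged usage sketch of the renamed API (assumes an initialized NCCL group, one GPU per rank on a single node):

```python
# python -m torch.distributed.run --nproc_per_node=2 <script>.py
import torch
import torch.distributed as dist

if __name__ == "__main__":
    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank())           # single-node assumption
    world_size = dist.get_world_size()
    inp = torch.ones(world_size * 4, device="cuda")  # flattened input, one chunk per rank
    out = torch.empty(4, device="cuda")
    dist.reduce_scatter_tensor(out, inp)             # replaces _reduce_scatter_base
```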

Cc @rohan-varma @H-Huang @crcrpar @ptrblck @mrshenli

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85867
Approved by: https://github.com/crcrpar, https://github.com/H-Huang
2022-09-30 05:48:16 +00:00
Min Si
1ad0048b64 Refactor distributed to use absolute header path (#85780)
Headers under torch/csrc/distributed may be referenced with a relative path, e.g., `<c10d/...>`. However, relative paths cannot be gracefully handled by the Meta internal build when the NCCL PG is hipified to support AMD/RCCL, because the hipified header files are generated in other directories. Moreover, using absolute paths for header inclusion is the state of the art in most components of PyTorch. Thus, this patch refactors all header paths in torch/csrc/distributed to be absolute.
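
For example, an include previously written with a relative path would now use the absolute one (illustrative):

```
#include <c10d/Store.hpp>                          // relative (old)
#include <torch/csrc/distributed/c10d/Store.hpp>   // absolute (new)
```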

See D39835774 for more details about Meta internal complication.

**How to test**: commit 9e5d199 removes `-I./torch/csrc/distributed` from the compile options; use it to verify we don't miss any relative-path use of torch/csrc/distributed headers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85780
Approved by: https://github.com/kumpera, https://github.com/huydhn
2022-09-30 05:13:50 +00:00
PyTorch MergeBot
a50d8864fc Revert "Refactor distributed to use absolute header path (#85780)"
This reverts commit 668082718a.

Reverted https://github.com/pytorch/pytorch/pull/85780 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but it breaks build due to a missing file <c10d/Store.hpp>
2022-09-30 02:04:29 +00:00
Min Si
668082718a Refactor distributed to use absolute header path (#85780)
Headers under torch/csrc/distributed may be referenced with a relative path, e.g., `<c10d/...>`. However, relative paths cannot be gracefully handled by the Meta internal build when the NCCL PG is hipified to support AMD/RCCL, because the hipified header files are generated in other directories. Moreover, using absolute paths for header inclusion is the state of the art in most components of PyTorch. Thus, this patch refactors all header paths in torch/csrc/distributed to be absolute.

See D39835774 for more details about Meta internal complication.

**How to test**: commit 9e5d199 removes `-I./torch/csrc/distributed` from the compile options; use it to verify we don't miss any relative-path use of torch/csrc/distributed headers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85780
Approved by: https://github.com/kumpera
2022-09-30 00:27:24 +00:00
Andrey
dde43d083b [c10d] Reorder macros so they are defined before getting used (#85850)
Summary: Move preprocessor macros all the way up, so they are defined before being used.

Test Plan: existing tests

Reviewed By: wanchaol

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85850
Approved by: https://github.com/wanchaol
2022-09-29 23:44:57 +00:00
Saliya Ekanayake
941d7a31f6 Pass group ranks and options to third party distributed backends (#73164)
Fixes #73163

PyTorch's [_new_process_group_helper()](9f541aa3ac/torch/distributed/distributed_c10d.py (L633)) does not pass a group's participating ranks to the backend.

This PR adds that capability, and also refactors some variables for better clarity.
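
For context, a minimal sketch of a ranks-restricted group; after this change, the participating ranks (and any backend options) reach third-party backends as well:

```python
# python -m torch.distributed.run --nproc_per_node=4 <script>.py
import torch.distributed as dist

if __name__ == "__main__":
    dist.init_process_group("gloo")
    # Only ranks 0 and 1 participate; the backend now receives [0, 1] too.
    subgroup = dist.new_group(ranks=[0, 1])
```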
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73164
Approved by: https://github.com/kumpera
2022-09-29 17:28:58 +00:00
Wanchao Liang
72b32f1644 [c10d] move ncclgetlasterror directive definition upfront (#85825)
Move the directive definition of ncclGetLastError() up front so that the C++ preprocessor does not treat it as an empty string.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85825
Approved by: https://github.com/H-Huang, https://github.com/kwen2501
2022-09-29 06:17:43 +00:00
Howard Huang
06e0583fb0 [4/N] [Dispatchable Collectives] Update all_reduce_ with CPU / CUDA implementations (#83810)
### About this PR
* Update the all_reduce op to dispatch to CPU and CUDA implementations. Right now both perform the same logic, so this is essentially a no-op.
* Update the test to validate that a separate device implementation is not supported.

### About this stack
In the future we will repurpose ProcessGroup to instead contain a list of Backends (ProcessGroupNCCL/Gloo/UCC) and perform dispatching to them based on tensor type. The CPU and CUDA implementations will be updated to have process group select its CPU and CUDA backends respectively.
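
A small sketch of the dispatching idea (not from the PR): the registered implementation is chosen by the tensor's device.

```python
# python -m torch.distributed.run --nproc_per_node=2 <script>.py
import torch
import torch.distributed as dist

if __name__ == "__main__":
    dist.init_process_group("gloo")
    t = torch.ones(4)        # CPU tensor -> routed to the CPU implementation
    dist.all_reduce(t)
    # A CUDA tensor (t.cuda()) would route the same op to the CUDA implementation.
```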

Differential Revision: [D39506979](https://our.internmc.facebook.com/intern/diff/D39506979)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83810
Approved by: https://github.com/kwen2501
2022-09-28 08:48:32 +00:00
Howard Huang
ccac8d13d5 [3/N] [Dispatchable Collectives] Update broadcast_ with CPU and CUDA implementations (#83735)
### About this PR
* Update the broadcast op to dispatch to CPU and CUDA implementations. Right now both perform the same logic, so this is essentially a no-op.
* Add a test to validate that a separate device implementation is not supported.

### About this stack
In the future we will repurpose ProcessGroup to instead contain a list of Backends (ProcessGroupNCCL/Gloo/UCC) and perform dispatching to them based on tensor type. The CPU and CUDA implementations will be updated to have process group select its CPU and CUDA backends respectively.

Differential Revision: [D38876771](https://our.internmc.facebook.com/intern/diff/D38876771)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83735
Approved by: https://github.com/kwen2501
2022-09-28 03:24:06 +00:00
Ke Wen
3276b51243 Add environment parse function that supports default value (#85563)
We use "-2" to represent an unset environment variable.
This PR adds a util function that applies a default value when the environment variable is unset.
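
A behavioural sketch in Python (the real helper is C++; the variable name below is hypothetical):

```python
import os

def parse_env_int(name: str, default: int) -> int:
    # Previously an unset variable was represented by the sentinel -2; the new
    # util lets callers supply an explicit default instead.
    val = os.environ.get(name)
    return default if val is None else int(val)

timeout_s = parse_env_int("EXAMPLE_TIMEOUT_SECS", 30)  # hypothetical variable
```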
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85563
Approved by: https://github.com/rohan-varma, https://github.com/H-Huang
2022-09-28 02:56:50 +00:00
James Zeng
7934596b70 [ucc] Remove internal tracing (#85730)
Summary: Remove internal tracing since it has not been upstreamed yet.

Test Plan: All PyTorch test should pass.

Differential Revision: D39853937

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85730
Approved by: https://github.com/kwen2501
2022-09-27 23:27:43 +00:00
PyTorch MergeBot
b360d66391 Revert "Add environment parse function that supports default value (#85563)"
This reverts commit 784f4ba1ce.

Reverted https://github.com/pytorch/pytorch/pull/85563 on behalf of https://github.com/huydhn due to failing test_DistributedDataParallel (__main__.TestDistBackendWithSpawn)
2022-09-27 02:55:59 +00:00
Ke Wen
784f4ba1ce Add environment parse function that supports default value (#85563)
We use "-2" to represent an unset environment variable.
This PR adds a util function that applies a default value when the environment variable is unset.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85563
Approved by: https://github.com/rohan-varma, https://github.com/H-Huang
2022-09-27 00:34:50 +00:00
Howard Huang
db40fbdee0 Add deprecation warning to ProcessGroupRoundRobin (#85158)
Trying to add any deprecation messages we anticipate needing before the 1.13 branch cut. This adds a deprecation message to ProcessGroupRoundRobin.

## Test

```python
import torch.distributed as dist

if __name__ == "__main__":
    pg = dist._round_robin_process_groups(
        [
            dist.ProcessGroupGloo(dist.TCPStore("localhost", 29500, 1, True), 0, 1)
        ]
    )
```

gives message
```
W0916 16:19:38.367360 68031 ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85158
Approved by: https://github.com/rohan-varma
2022-09-24 18:00:28 +00:00
Wanchao Liang
976f8bee94 [c10d] add ncclGetLastError to NCCL pg (#83724)
This PR adds the ncclGetLastError API to the NCCL PG, to provide better error reporting for NCCL failures directly, instead of guessing at random reasons.

Differential Revision: [D39161199](https://our.internmc.facebook.com/intern/diff/D39161199)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83724
Approved by: https://github.com/kwen2501, https://github.com/H-Huang
2022-09-14 23:21:33 +00:00