Summary: The CPU block in `collective_post` was missing pre- and post-processing. The reduce-scatter implementation expects the pre-processing callback to flatten the input tensors; because the callback was never invoked, garbage values were being passed.
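For reference, a minimal sketch of the kind of flattening the pre-processing callback performs (hypothetical helper name, not the actual callback):
```python
import torch

def flatten_inputs(input_tensors):
    # Pack the input tensors into one contiguous buffer, which is the
    # layout the reduce-scatter implementation expects to receive.
    return torch.cat([t.contiguous().view(-1) for t in input_tensors])
```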
Test Plan: Tested the reduce-scatter collective using PARAM
Reviewed By: eastzone
Differential Revision: D41291592
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89030
Approved by: https://github.com/kingchc, https://github.com/kwen2501
Currently all of the distributed errors are thrown from the `TORCH_CHECK` macro, which throws a generic `RuntimeError`. This change introduces a new error type, `DistBackendError`, which derives from `RuntimeError` and signifies that the error came from the backend communication library. This allows for better error handling and analysis at higher levels in the stack. Motivation: https://docs.google.com/document/d/1j6VPOkC6znscliFuiDWMuMV1_fH4Abgdq7TCHMcXai4/edit#heading=h.a9rc38misyx8
Changes:
- Introduce the new error type `DistBackendError`
- Update `C10D_NCCL_CHECK`
Sample script to demonstrate the new error type:
```python
# python -m torch.distributed.run --nproc_per_node=2 <script>.py
import torch
import torch.distributed as dist
if __name__ == "__main__":
    dist.init_process_group("nccl")
    dist.broadcast(torch.tensor([1, 2, 3]).cuda(), 0)
```
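Since `DistBackendError` derives from `RuntimeError`, existing handlers keep working while new code can catch backend failures specifically. A minimal sketch, assuming the type is exposed as `torch.distributed.DistBackendError`:
```python
import torch
import torch.distributed as dist

try:
    dist.broadcast(torch.tensor([1, 2, 3]).cuda(), 0)
except dist.DistBackendError:
    # Failure inside the backend communication library (e.g. NCCL)
    raise
except RuntimeError:
    # Any other runtime failure
    raise
```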
Differential Revision: [D40998803](https://our.internmc.facebook.com/intern/diff/D40998803)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88134
Approved by: https://github.com/rohan-varma
Add sequence number support for UCC, mostly following the format of ProcessGroupNCCL.
Pass new test: `test_all_gather_object_subgroup`
Add skips for gather tests: `test_gather_object` and `test_gather_object_subgroup`
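A minimal sketch of reading the sequence number off the default process group (assuming a UCC-enabled build and the private `_get_sequence_number_for_group` accessor that ProcessGroupNCCL also exposes):
```python
# python -m torch.distributed.run --nproc_per_node=2 <script>.py
import torch.distributed as dist

if __name__ == "__main__":
    dist.init_process_group("ucc")
    pg = dist.distributed_c10d._get_default_group()
    # Sequence numbers advance with each collective, enabling desync checks
    print(pg._get_sequence_number_for_group())
```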
cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85047
Approved by: https://github.com/kwen2501
Currently the CUDA UCC barrier is nonblocking with respect to the CPU, and there is no flag to change this. To make the UCC PG barrier behavior consistent with the NCCL PG, this PR changes barrier to always be blocking.
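Illustration of the semantics after this change (a minimal sketch; the `"ucc"` backend assumes a UCC-enabled build):
```python
# python -m torch.distributed.run --nproc_per_node=2 <script>.py
import torch.distributed as dist

if __name__ == "__main__":
    dist.init_process_group("ucc")
    dist.barrier()  # now blocks the calling CPU thread until all ranks arrive
    print("every rank has reached this point")
```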
cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86961
Approved by: https://github.com/kwen2501
In this PR:
- graph_task stores graph roots on construction so that we can later traverse through the graph
- before the nodes are returned, they need to be converted from raw_ptr to shared_ptr; this should be OK because the graph is guaranteed to be alive
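For context, a minimal Python sketch of the kind of traversal this enables, walking a graph from its root via `grad_fn.next_functions` (illustrative only; the PR itself operates on the C++ graph_task):
```python
import torch

x = torch.randn(3, requires_grad=True)
root = (x * 2).sum().grad_fn

# Depth-first walk over the autograd graph starting at the root node
stack, seen = [root], set()
while stack:
    node = stack.pop()
    if node is None or node in seen:
        continue
    seen.add(node)
    print(type(node).__name__)
    stack.extend(fn for fn, _ in node.next_functions)
```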
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87507
Approved by: https://github.com/albanD
Fixes #87359, which identifies a use-after-free for reverse device maps. This is only in the dynamic RPC feature and does not affect the stable RPC code path.
Unfortunately, the failing test `TensorPipeRpcTest.test_dynamic_rpc_existing_rank_can_communicate_with_new_rank_cuda` is also running into a separate issue. I've temporarily disabled some of the test code and will investigate that error asynchronously.
Testing plan:
- tested all the dynamic RPC tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87627
Approved by: https://github.com/rohan-varma
Headers under torch/csrc/distributed may be referenced with a relative path, e.g., `<c10d/...>`. However, relative paths cannot be gracefully handled by the Meta internal build when the NCCL PG is hipified to support AMD/RCCL, because the hipified header files are generated in other directories. Moreover, using absolute paths for header inclusion is the state of the art in most PyTorch components. Thus, this patch refactors all header paths in torch/csrc/distributed to be absolute.
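For instance, an include written as `#include <c10d/Utils.hpp>` becomes `#include <torch/csrc/distributed/c10d/Utils.hpp>` (the specific header here is only an example).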
See D39835774 for more details about the Meta internal complication.
**How to test**: commit 9e5d199 removes `-I./torch/csrc/distributed` from the compile options. Use it to verify that we don't miss any relative-path uses of torch/csrc/distributed headers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85780
Approved by: https://github.com/kumpera, https://github.com/huydhn
### About this PR
* Update the all_reduce op to dispatch to CPU and CUDA implementations. Right now they both perform the same logic, so this is essentially a no-op.
* Update test to validate that a separate device implementation is not supported.
### About this stack
In the future we will repurpose ProcessGroup to instead contain a list of Backends (ProcessGroupNCCL/Gloo/UCC) and perform dispatching to them based on tensor type. The CPU and CUDA implementations will be updated to have process group select its CPU and CUDA backends respectively.
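A minimal sketch of what device-based dispatch means from the caller's side (illustrative; uses the Gloo backend with a CPU tensor):
```python
# python -m torch.distributed.run --nproc_per_node=2 <script>.py
import torch
import torch.distributed as dist

if __name__ == "__main__":
    dist.init_process_group("gloo")
    t = torch.ones(4)   # a CPU tensor routes to the CPU implementation
    dist.all_reduce(t)  # the dispatcher selects the kernel by tensor device
    print(t)
```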
Differential Revision: [D39506979](https://our.internmc.facebook.com/intern/diff/D39506979)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83810
Approved by: https://github.com/kwen2501
### About this PR
* Update the broadcast op to dispatch to CPU and CUDA implementations. Right now they both perform the same logic, so this is essentially a no-op.
* Add test to validate that a separate device implementation is not supported.
### About this stack
In the future we will repurpose ProcessGroup to instead contain a list of Backends (ProcessGroupNCCL/Gloo/UCC) and perform dispatching to them based on tensor type. The CPU and CUDA implementations will be updated to have process group select its CPU and CUDA backends respectively.
Differential Revision: [D38876771](https://our.internmc.facebook.com/intern/diff/D38876771)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83735
Approved by: https://github.com/kwen2501
Add the deprecation messages we anticipate needing before the 1.13 branch cut. This PR adds a deprecation message to the round-robin process group.
## Test
```python
import torch.distributed as dist
if __name__ == "__main__":
    pg = dist._round_robin_process_groups(
        [
            dist.ProcessGroupGloo(dist.TCPStore("localhost", 29500, 1, True), 0, 1)
        ]
    )
```
gives the message:
```
W0916 16:19:38.367360 68031 ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85158
Approved by: https://github.com/rohan-varma