pytorch/torch/lib/c10d
Jeff Daily ce5bca5502 ProcessGroupNCCL::alltoall_base needs to call recordStream (#46603)
Summary:
For similar reasons to those documented in the `[Sync Streams]` note. `ProcessGroupNCCL::allgather`, for example, has the same requirement and already calls `recordStream`.

The application creates the output tensor on the default stream, but NCCL/RCCL runs the collective on a separate internal stream (ncclStream). If the output tensor is not recorded on ncclStream, it might be deallocated, and its memory handed out again, while NCCL/RCCL is still using it.

Because ncclStream is internal to ProcessGroupNCCL, the application does not know about it and therefore cannot record the output tensor on it.
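The race described above can be reproduced with a toy, pure-Python model of a stream-aware caching allocator. This is a sketch only, loosely modeled on the idea behind PyTorch's CUDA caching allocator and `Tensor.record_stream`. A freed block is normally reusable right away on its allocation stream. Once another stream has been recorded on the block, reuse waits until that stream's pending work finishes. All names here (`Stream`, `ToyCachingAllocator`, `reused_while_nccl_busy`) are hypothetical, and nothing touches a GPU.

```python
class Stream:
    """Toy CUDA stream: work is enqueued in order and finishes later."""

    def __init__(self):
        self.issued = 0     # kernels enqueued so far
        self.completed = 0  # kernels that have finished executing

    def launch(self):
        """Enqueue one asynchronous kernel; it does not finish immediately."""
        self.issued += 1

    def record_event(self):
        """Marker that counts as done once all work issued so far has finished."""
        return (self, self.issued)

    def synchronize(self):
        """Pretend the device has drained this stream."""
        self.completed = self.issued


def event_done(event):
    stream, target = event
    return stream.completed >= target


class Block:
    def __init__(self, block_id, alloc_stream):
        self.block_id = block_id
        self.alloc_stream = alloc_stream
        self.stream_uses = set()  # other streams recorded as using this block


class ToyCachingAllocator:
    """Hypothetical, heavily simplified stream-aware caching allocator."""

    def __init__(self):
        self.free_blocks = []  # blocks that are safe to hand out again
        self.pending = []      # (block, events) still waiting on other streams
        self.next_id = 0

    def _process_events(self):
        # Move blocks whose recorded streams have finished back to the free pool.
        still_pending = []
        for block, events in self.pending:
            if all(event_done(e) for e in events):
                self.free_blocks.append(block)
            else:
                still_pending.append((block, events))
        self.pending = still_pending

    def allocate(self, stream):
        self._process_events()
        for i, block in enumerate(self.free_blocks):
            # Same-stream reuse is safe: later work on that stream runs after earlier work.
            if block.alloc_stream is stream:
                return self.free_blocks.pop(i)
        self.next_id += 1
        return Block(self.next_id, stream)

    def record_stream(self, block, stream):
        if stream is not block.alloc_stream:
            block.stream_uses.add(stream)

    def free(self, block):
        if not block.stream_uses:
            self.free_blocks.append(block)  # immediately reusable on its own stream
        else:
            events = [s.record_event() for s in block.stream_uses]
            block.stream_uses = set()
            self.pending.append((block, events))


def reused_while_nccl_busy(record):
    """True if the output block is handed out again before the NCCL stream finishes."""
    default, nccl = Stream(), Stream()
    alloc = ToyCachingAllocator()
    out = alloc.allocate(default)       # application allocates the output tensor
    nccl.launch()                       # collective writes into `out` asynchronously
    if record:
        alloc.record_stream(out, nccl)  # the step ProcessGroupNCCL must take itself
    alloc.free(out)                     # application drops its last reference
    nxt = alloc.allocate(default)       # unrelated allocation on the default stream
    return nxt is out
```

Without the recorded stream the block is reused while the collective is still in flight. With it, reuse is deferred until the NCCL stream has drained.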

Patch originally developed by sarunyap.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46603

Reviewed By: srinivas212

Differential Revision: D24458530

fbshipit-source-id: b02e74d1c3a176ea1b9bbdd7dc671b221fcadaef
2020-10-22 15:53:19 -07:00
bin
example
test [Distributed] deleteKey support for HashStore (#46049) 2020-10-14 12:04:42 -07:00
CMakeLists.txt Fix Windows build failure after DDP PR merged (#45335) 2020-09-25 12:37:50 -07:00
FileStore.cpp [Distributed] DeleteKey API for c10d TCP Store (#45401) 2020-09-28 15:30:39 -07:00
FileStore.hpp [Distributed] DeleteKey API for c10d TCP Store (#45401) 2020-09-28 15:30:39 -07:00
GlooDeviceFactory.cpp Fix Windows build failure after DDP PR merged (#45335) 2020-09-25 12:37:50 -07:00
GlooDeviceFactory.hpp transport open registration (#30167) 2019-11-22 17:41:52 -08:00
HashStore.cpp [Distributed] deleteKey support for HashStore (#46049) 2020-10-14 12:04:42 -07:00
HashStore.hpp [Distributed] DeleteKey API for c10d TCP Store (#45401) 2020-09-28 15:30:39 -07:00
NCCLUtils.cpp improve error handling in getNCCLVersion in NCCLUtils (#27883) 2019-10-15 17:33:09 -07:00
NCCLUtils.hpp [NCCL] Provide additional information about NCCL error codes. (#45950) 2020-10-13 21:18:20 -07:00
PrefixStore.cpp [Distributed] DeleteKey API for c10d TCP Store (#45401) 2020-09-28 15:30:39 -07:00
PrefixStore.hpp [Distributed] DeleteKey API for c10d TCP Store (#45401) 2020-09-28 15:30:39 -07:00
ProcessGroup.cpp [ci-all tests] Improve logging in ProcessGroupNCCL for debugging purposes. (#46010) 2020-10-09 09:46:58 -07:00
ProcessGroup.hpp Add warning on ProcessGroup and ProcessGroup::Work APIs (#46220) 2020-10-14 16:27:37 -07:00
ProcessGroupGloo.cpp Fix Windows build failure after DDP PR merged (#45335) 2020-09-25 12:37:50 -07:00
ProcessGroupGloo.hpp Move torch/csrc/utils/hash.h to c10/util/hash.h. (#42503) 2020-08-29 17:47:00 -07:00
ProcessGroupMPI.cpp [c10d] Template computeLengthsAndOffsets() (#42706) 2020-08-10 19:21:46 -07:00
ProcessGroupMPI.hpp [NCCL] Add timeout to ProcessGroup Work Wait (#40944) 2020-07-16 10:56:58 -07:00
ProcessGroupNCCL.cpp ProcessGroupNCCL::alltoall_base needs to call recordStream (#46603) 2020-10-22 15:53:19 -07:00
ProcessGroupNCCL.hpp [pytorch][PR] Record FutureNCCL callback stream on CUDA caching allocator (#45318) 2020-10-22 01:49:47 -07:00
ProcessGroupRoundRobin.cpp Add NCCL Alltoall to PT NCCL process group (#42514) 2020-08-04 08:39:28 -07:00
ProcessGroupRoundRobin.hpp Add NCCL Alltoall to PT NCCL process group (#42514) 2020-08-04 08:39:28 -07:00
Store.cpp
Store.hpp [Distributed] DeleteKey API for c10d TCP Store (#45401) 2020-09-28 15:30:39 -07:00
TCPStore.cpp [Distributed] DeleteKey API for c10d TCP Store (#45401) 2020-09-28 15:30:39 -07:00
TCPStore.hpp [Distributed] DeleteKey API for c10d TCP Store (#45401) 2020-09-28 15:30:39 -07:00
Types.hpp Add All-to-all comms support to distributed module and MPI backend (#32361) 2020-04-01 08:57:12 -07:00
Utils.cpp Fix Windows build failure after DDP PR merged (#45335) 2020-09-25 12:37:50 -07:00
Utils.hpp [Distributed] General Function for Parsing Environment Variable Flags in PG (#46045) 2020-10-14 12:21:11 -07:00