pytorch/torch/csrc/cuda
Dan Johnson 3c97b0ab00 Use ncclAlltoAllv and ncclAlltoAll API when supported (#134499)
Stock NCCL does not provide ncclAllToAll / ncclAllToAllv APIs, so PyTorch emulates all-to-all with point-to-point send/recv. When a NCCL build does expose these APIs, use them directly instead (a sketch of the dispatch is shown below the commit header).

Differential Revision: [D61683836](https://our.internmc.facebook.com/intern/diff/D61683836/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134499
Approved by: https://github.com/shuqiangzhang, https://github.com/eqy
2024-09-16 20:08:06 +00:00
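For context on the change above, a minimal sketch of the dispatch, not PyTorch's actual nccl.cpp implementation: the NCCL_HAS_ALLTOALL feature macro and the ncclAllToAll signature are assumptions here (some vendor builds, e.g. RCCL, ship an entry point of this shape); stock NCCL guarantees only ncclSend/ncclRecv plus ncclGroupStart/ncclGroupEnd, which the fallback path uses.

#include <nccl.h>

// Equal-split all-to-all: every rank exchanges perRankCount elements of
// `type` with each of the nRanks peers. elemSize is sizeof one element.
ncclResult_t allToAll(
    const void* sendbuff,
    void* recvbuff,
    size_t perRankCount,
    size_t elemSize,
    ncclDataType_t type,
    int nRanks,
    ncclComm_t comm,
    cudaStream_t stream) {
#ifdef NCCL_HAS_ALLTOALL  // assumed feature macro, not defined by stock NCCL
  // Assumed vendor signature; call the native collective when available.
  return ncclAllToAll(sendbuff, recvbuff, perRankCount, type, comm, stream);
#else
  // Fallback: one send and one recv per peer, fused into a single NCCL
  // group so all point-to-point ops are issued as one operation.
  // Per-call error checks elided for brevity.
  ncclGroupStart();
  for (int peer = 0; peer < nRanks; ++peer) {
    const size_t offset = static_cast<size_t>(peer) * perRankCount * elemSize;
    ncclSend(static_cast<const char*>(sendbuff) + offset,
             perRankCount, type, peer, comm, stream);
    ncclRecv(static_cast<char*>(recvbuff) + offset,
             perRankCount, type, peer, comm, stream);
  }
  return ncclGroupEnd();
#endif
}

Fusing the per-peer send/recv pairs inside a single NCCL group lets NCCL schedule them together and avoids the deadlocks that a strict pairwise ordering across ranks could otherwise cause.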
shared [sparse] Add cuSPARSELt as a backend (#128534) 2024-08-21 22:06:07 +00:00
comm.cpp
comm.h
CUDAPluggableAllocator.cpp [Reland] Refactor caching device allocator utils (#130923) 2024-09-07 11:14:17 +00:00
CUDAPluggableAllocator.h [Reland] Refactor caching device allocator utils (#130923) 2024-09-07 11:14:17 +00:00
device_set.h
Event.cpp Drop the GIL in a couple of places where holding it leads to deadlocks (#134910) 2024-09-01 00:05:53 +00:00
Event.h
GdsFile.cpp [Reland] Add wrappers for synchronous GPUDirect Storage APIs (#133489) 2024-08-15 17:11:52 +00:00
GdsFile.h [Reland] Add wrappers for synchronous GPUDirect Storage APIs (#133489) 2024-08-15 17:11:52 +00:00
Graph.cpp
memory_snapshot.cpp [Memory Snapshot] Skip C++ warmup unwind() call if context is not set (#133038) 2024-08-13 17:25:24 +00:00
memory_snapshot.h
MemPool.cpp Implements torch.cuda.MemPool() API (#131152) 2024-08-01 01:29:30 +00:00
Module.cpp [ROCm] Enable ROCm support for inductor's dynamic_rblock_scaling (#129663) 2024-09-13 16:45:39 +00:00
Module.h
nccl.cpp Use ncclAlltoAllv and ncclAlltoAll API when supported (#134499) 2024-09-16 20:08:06 +00:00
nccl.h
python_comm.cpp
python_comm.h
python_nccl.cpp
python_nccl.h
Stream.cpp
Stream.h
Tensor.cpp
THCP.h
utils.cpp