| Name | Last commit | Date |
| --- | --- | --- |
| .. | | |
| control_collectives | | |
| control_plane | [19/N] Fix extra warnings brought by clang-tidy-17 (#144448) | 2025-01-09 15:58:05 +00:00 |
| cuda | [codemod] Remove unused-variable in caffe2/torch/csrc/distributed/c10d/cuda/AsyncMM.cu (#148501) | 2025-03-07 00:33:39 +00:00 |
| quantization | | |
| Backend.cpp | | |
| Backend.hpp | c10d/ProcessGroup: cleanup abort and shutdown (#148798) | 2025-03-08 18:33:18 +00:00 |
| Backoff.cpp | | |
| Backoff.hpp | | |
| c10d.h | | |
| comm.cpp | | |
| comm.hpp | | |
| CudaDMAConnectivity.cpp | | |
| CUDASymmetricMemory-inl.h | Support SymmetricMemory's signaling kernels on sm60 and sm70 (#146308) | 2025-02-21 15:29:02 +00:00 |
| CUDASymmetricMemory.cu | [c10d] Restrict use condition of NCCL mem pool (#147764) | 2025-02-26 03:40:00 +00:00 |
| CUDASymmetricMemory.hpp | [SymmetricMemory] support specifying group_name at rendezvous time (#139529) | 2024-11-17 09:31:17 +00:00 |
| CUDASymmetricMemoryOps.cu | Support SymmetricMemory's signaling kernels on sm60 and sm70 (#146308) | 2025-02-21 15:29:02 +00:00 |
| debug.cpp | | |
| debug.h | | |
| default_comm_hooks.cpp | | |
| default_comm_hooks.hpp | | |
| DMAConnectivity.cpp | [19/N] Fix extra warnings brought by clang-tidy-17 (#144448) | 2025-01-09 15:58:05 +00:00 |
| DMAConnectivity.hpp | | |
| error.h | | |
| exception.h | [BE] TCPStore: use typed errors for assertions (#147647) | 2025-02-24 20:58:10 +00:00 |
| FakeProcessGroup.hpp | Add support for non functional collectives under FakeTensorMode and fake_pg for memory tracking (#147566) | 2025-03-08 18:00:49 +00:00 |
| FileStore.cpp | Enable clang-tidy on torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp (#143806) | 2025-01-24 12:22:13 +00:00 |
| FileStore.hpp | | |
| FlightRecorder.cpp | [c10d] Flush file in file recorder (#145458) | 2025-01-27 23:15:52 +00:00 |
| FlightRecorder.hpp | [2/N] Rename NCCLTraceBuffer to FlightRecorder (#141712) | 2024-11-29 21:15:31 +00:00 |
| Functional.cpp | Optimize shard_dim_alltoall to use alltoall_single (#148868) | 2025-03-10 18:38:12 +00:00 |
| Functional.hpp | | |
| GlooDeviceFactory.cpp | [Reland][Environment Variable][4/N] Use thread-safe getenv functions (#140593) | 2025-01-28 20:51:49 +00:00 |
| GlooDeviceFactory.hpp | | |
| GroupRegistry.cpp | Remove some NOLINT (#146610) | 2025-02-07 01:50:06 +00:00 |
| GroupRegistry.hpp | | |
| HashStore.cpp | | |
| HashStore.hpp | | |
| init.cpp | Revert "[PGNCCL] Launch kernel on current stream & remove record_stream entirely (#148590)" | 2025-03-17 22:43:15 +00:00 |
| intra_node_comm.cpp | Fix compile errors (#148758) | 2025-03-08 04:56:42 +00:00 |
| intra_node_comm.cu | [IntraNodeComm] fix a recent breakage (#141200) | 2024-11-26 00:46:38 +00:00 |
| intra_node_comm.hpp | | |
| logger.cpp | [4/N] Remove unnecessary once flag usage (#146783) | 2025-02-11 13:55:06 +00:00 |
| logger.hpp | [fr][c10d] log trace capture enabled or not in flight recorder (#143865) | 2024-12-27 03:07:55 +00:00 |
| logging.cpp | | |
| logging.h | Enable clang-tidy on torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp (#143806) | 2025-01-24 12:22:13 +00:00 |
| NanCheck.cu | catch tensor.numel() == 0 in nan detector (#140741) | 2024-11-15 05:03:20 +00:00 |
| NanCheck.hpp | | |
| NCCLUtils.cpp | [codemod] Fix unused-value issue in caffe2/aten/src/ATen/cuda/detail/CUDAHooks.cpp +4 (#147555) | 2025-03-01 19:46:13 +00:00 |
| NCCLUtils.hpp | [DDP] Use NCCL allocated memory for gradient bucket (#146589) | 2025-02-10 05:23:11 +00:00 |
| Ops.cpp | Revert "[PGNCCL] Launch kernel on current stream & remove record_stream entirely (#148590)" | 2025-03-17 22:43:15 +00:00 |
| ParamCommsUtils.cpp | | |
| ParamCommsUtils.hpp | | |
| PrefixStore.cpp | | |
| PrefixStore.hpp | | |
| ProcessGroup.cpp | | |
| ProcessGroup.hpp | Revert "[PGNCCL] Launch kernel on current stream & remove record_stream entirely (#148590)" | 2025-03-17 22:43:15 +00:00 |
| ProcessGroupGloo.cpp | Use task submitter TLS in gloo working threads (#142184) | 2024-12-06 17:03:17 +00:00 |
| ProcessGroupGloo.hpp | Use task submitter TLS in gloo working threads (#142184) | 2024-12-06 17:03:17 +00:00 |
| ProcessGroupMPI.cpp | [2/N] Remove unnecessary once flag usage (#145057) | 2025-01-23 09:48:46 +00:00 |
| ProcessGroupMPI.hpp | c10d/ProcessGroup: cleanup abort and shutdown (#148798) | 2025-03-08 18:33:18 +00:00 |
| ProcessGroupNCCL.cpp | Revert "[PGNCCL] Launch kernel on current stream & remove record_stream entirely (#148590)" | 2025-03-17 22:43:15 +00:00 |
| ProcessGroupNCCL.hpp | Revert "[PGNCCL] Launch kernel on current stream & remove record_stream entirely (#148590)" | 2025-03-17 22:43:15 +00:00 |
| ProcessGroupUCC.cpp | Cleanup CallOnce.h (#146700) | 2025-02-07 16:44:45 +00:00 |
| ProcessGroupUCC.hpp | [c10d][UCC] Add _reduce_scatter_base to c10d::ProcessGroupUCC (#138021) | 2024-12-09 16:02:24 +00:00 |
| ProcessGroupWrapper.cpp | | |
| ProcessGroupWrapper.hpp | | |
| PyProcessGroup.hpp | c10d/ProcessGroup: cleanup abort and shutdown (#148798) | 2025-03-08 18:33:18 +00:00 |
| python_comm_hook.cpp | | |
| python_comm_hook.h | | |
| RankLocal.hpp | | |
| reducer_cuda.cpp | Fix compile errors (#148758) | 2025-03-08 04:56:42 +00:00 |
| reducer_timer.hpp | | |
| reducer.cpp | [reland][ca] side-effect free inital trace: compiled_args (#148376) | 2025-03-11 01:57:36 +00:00 |
| reducer.hpp | Enable clang-tidy on torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp (#143806) | 2025-01-24 12:22:13 +00:00 |
| sequence_num.cpp | [4/N] Apply bugprone-unchecked-optional-access (#142832) | 2024-12-12 04:33:32 +00:00 |
| sequence_num.hpp | | |
| socket_fmt.h | | |
| socket.cpp | Remove unnecessary once flag usage (#143255) | 2025-01-16 02:36:11 +00:00 |
| socket.h | | |
| Store.cpp | | |
| Store.hpp | | |
| SymmetricMemory.cpp | [SymmetricMemory] introduce multimem_all_gather (#142810) | 2024-12-17 01:07:27 +00:00 |
| SymmetricMemory.hpp | [torch/distributed] Make _SymmetricMemory.has_multicast_support() ret… (#141598) | 2024-11-26 23:36:32 +00:00 |
| TCPStore.cpp | Fix dist.init_process_group on windows (#148266) | 2025-03-05 00:07:56 +00:00 |
| TCPStore.hpp | | |
| TCPStoreBackend.cpp | [19/N] Fix extra warnings brought by clang-tidy-17 (#144448) | 2025-01-09 15:58:05 +00:00 |
| TCPStoreBackend.hpp | | |
| TCPStoreLibUvBackend.cpp | [BE] TCPStore: use typed errors for assertions (#147647) | 2025-02-24 20:58:10 +00:00 |
| TraceUtils.h | [pgnccl][simple] log started work numel (#139773) | 2024-11-05 23:11:19 +00:00 |
| Types.hpp | Revert "[PGNCCL] Launch kernel on current stream & remove record_stream entirely (#148590)" | 2025-03-17 22:43:15 +00:00 |
| UCCTracing.cpp | | |
| UCCTracing.hpp | | |
| UCCUtils.cpp | | |
| UCCUtils.hpp | [3/N] Replace c10::sv with std::sv (#139861) | 2024-11-07 20:03:57 +00:00 |
| UnixSockUtils.hpp | | |
| Utils.cpp | Code Refactoring for getting start and stride from global ranks (#147230) | 2025-02-21 10:02:50 +00:00 |
| Utils.hpp | Code Refactoring for getting start and stride from global ranks (#147230) | 2025-02-21 10:02:50 +00:00 |
| WinSockUtils.hpp | | |
| Work.cpp | Enable more readability-redundant checks (#143963) | 2024-12-30 14:49:33 +00:00 |
| Work.hpp | | |