pytorch/test/cpp/c10d
Ke Wen e474f0de82 [PGNCCL] Slimming watchdog loop (#139834)
- Refactored traceback code into `work.printTraceback()`. cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @shuqiangzhang
- Refactored desync debug code into `class DesyncDebugger`.
- Moved occurrences of `futureWorkResult_->markCompleted` into `checkAndSetException` and `checkTimeout`, respectively. cc @shuqiangzhang
- Modularized dump signal broadcast code into `ProcessGroupNCCL::broadcastDumpSignal`. cc @fduwjj @c-p-i-o

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139834
Approved by: https://github.com/shuqiangzhang
2024-11-07 17:22:44 +00:00
example
BackoffTest.cpp [Distributed] [16/N] Fix clang-tidy warnings in torch/csrc/distributed/c10d (#137404) 2024-10-10 18:05:34 +00:00
CMakeLists.txt [CMake] Remove pthread linking (#134436) 2024-10-29 23:14:40 +00:00
CUDATest.cu
CUDATest.hpp
FileStoreTest.cpp C10_UNUSED to [[maybe_unused]] (#6357) (#138364) 2024-10-19 13:17:43 +00:00
HashStoreTest.cpp C10_UNUSED to [[maybe_unused]] (#6357) (#138364) 2024-10-19 13:17:43 +00:00
ProcessGroupGlooAsyncTest.cpp C10_UNUSED to [[maybe_unused]] (#6357) (#138364) 2024-10-19 13:17:43 +00:00
ProcessGroupGlooTest.cpp C10_UNUSED to [[maybe_unused]] (#6357) (#138364) 2024-10-19 13:17:43 +00:00
ProcessGroupMPITest.cpp [Distributed] [16/N] Fix clang-tidy warnings in torch/csrc/distributed/c10d (#137404) 2024-10-10 18:05:34 +00:00
ProcessGroupNCCLErrorsTest.cpp [PGNCCL] Slimming watchdog loop (#139834) 2024-11-07 17:22:44 +00:00
ProcessGroupNCCLTest.cpp Make Context to be Device-agnostic Step by Step (1/N) (#136519) (#138155) 2024-10-17 20:58:56 +00:00
ProcessGroupUCCTest.cpp [Distributed] [16/N] Fix clang-tidy warnings in torch/csrc/distributed/c10d (#137404) 2024-10-10 18:05:34 +00:00
StoreTestCommon.hpp
TCPStoreTest.cpp C10_UNUSED to [[maybe_unused]] (#6357) (#138364) 2024-10-19 13:17:43 +00:00
TestUtils.hpp