Mirror of https://github.com/zebrajr/pytorch.git, synced 2025-12-07 00:21:07 +01:00
Replaces the "always sleep 30 sec before abort" behavior with "wait up to
30 sec for the future to complete, then abort". The difference is that the
abort now happens as soon as the dump finishes, bounded by the 30 sec
maximum, instead of always waiting the full maximum.
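The bounded wait described above can be sketched with `std::future::wait_for`. This is a minimal illustration and not the actual ProcessGroupNCCL code: `waitForDump` and `launchFakeDump` are hypothetical names, and the fake dump just sleeps to simulate work.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical helper: block until the dump completes or maxWait elapses,
// whichever comes first. Returns true if the dump finished in time.
bool waitForDump(std::future<bool>& dumpFuture,
                 std::chrono::milliseconds maxWait) {
  assert(maxWait.count() >= 0);
  if (!dumpFuture.valid()) {
    return false;  // no dump was launched, so there is nothing to wait for
  }
  // wait_for returns as soon as the result is ready, so a fast dump does
  // not pay for the full timeout.
  return dumpFuture.wait_for(maxWait) == std::future_status::ready;
}

// Hypothetical stand-in for a debug dump that takes `work` to complete.
std::future<bool> launchFakeDump(std::chrono::milliseconds work) {
  return std::async(std::launch::async, [work] {
    std::this_thread::sleep_for(work);
    return true;
  });
}
```

A caller would invoke `waitForDump(future, std::chrono::seconds(30))` and then proceed to abort regardless of the result; the return value only indicates whether the dump completed before the deadline.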
Allows multiple calls to dump, which will be serialized.
Renames tryWriteDebugInfo to launchAsyncDebugDump in the spirit of the
change: it now supports more than one launch and always launches, rather
than launching only on the first call.
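One way to picture "every call launches, but dumps are serialized" is a mutex held for the duration of each dump. The sketch below is an assumption about the general pattern, not the real implementation: `SerializedDumper`, `launch`, and `dumpsWritten` are hypothetical names, and incrementing a counter stands in for writing the dump.

```cpp
#include <cassert>
#include <future>
#include <mutex>

// Hypothetical sketch: each call to launch() starts a new async dump,
// and a mutex ensures only one dump writes at a time.
class SerializedDumper {
 public:
  // Always launches; the returned future yields the 1-based index of the
  // dump once it has been written. The dumper must outlive the futures.
  std::future<int> launch() {
    return std::async(std::launch::async, [this] {
      std::lock_guard<std::mutex> lock(writeMutex_);
      assert(!writing_);  // serialization invariant: no overlapping dumps
      writing_ = true;
      int index = ++dumpsWritten_;  // placeholder for the real dump work
      writing_ = false;
      return index;
    });
  }

  int dumpsWritten() {
    std::lock_guard<std::mutex> lock(writeMutex_);
    return dumpsWritten_;
  }

 private:
  std::mutex writeMutex_;
  bool writing_ = false;
  int dumpsWritten_ = 0;
};
```

Calling `launch()` several times yields one future per call; the order in which the dumps run is not guaranteed, only that they never overlap.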
Adds a test for dumping on timeout.
This reverts commit
| Name |
|---|
| example |
| CMakeLists.txt |
| CUDATest.cu |
| CUDATest.hpp |
| FileStoreTest.cpp |
| HashStoreTest.cpp |
| ProcessGroupGlooAsyncTest.cpp |
| ProcessGroupGlooTest.cpp |
| ProcessGroupMPITest.cpp |
| ProcessGroupNCCLErrorsTest.cpp |
| ProcessGroupNCCLTest.cpp |
| ProcessGroupUCCTest.cpp |
| StoreTestCommon.hpp |
| TCPStoreTest.cpp |
| TestUtils.hpp |