Mirror of https://github.com/zebrajr/pytorch.git, synced 2025-12-06 12:20:52 +01:00
We added `CudaEventCache` in https://github.com/pytorch/pytorch/pull/133727. This feature reuses CudaEvents so that we never call a CudaEvent's destroy, which has caused hangs in the past. We have already run a large number of tests, plus testing on TorchTitan and internal workloads, and no errors or crashes have been found so far, so we have decided to roll it out to all OSS users. Internal workloads are unaffected by this PR because of internal gating. We also observed some multi-device use cases in OSS, so we want to bring back the multi-device support originally proposed in https://github.com/pytorch/pytorch/pull/122732/files. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140975 Approved by: https://github.com/eqy, https://github.com/kwen2501
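The core idea behind the cache is a simple object pool: finished events are returned to a free list and handed out again, so the event's destroy path is never taken while work may still reference it. Below is a minimal, illustrative sketch of that pattern in Python. All names (`Event`, `EventCache`, `acquire`, `release`) are hypothetical stand-ins, not the actual PyTorch C++ API.

```python
class Event:
    """Hypothetical stand-in for a CUDA event handle."""
    _next_id = 0

    def __init__(self):
        Event._next_id += 1
        self.id = Event._next_id


class EventCache:
    """Reuses Event objects instead of destroying them (illustrative only)."""

    def __init__(self):
        self._free = []

    def acquire(self):
        # Hand out a cached event if one is available; otherwise create one.
        return self._free.pop() if self._free else Event()

    def release(self, event):
        # Instead of destroying the event, keep it for later reuse.
        self._free.append(event)


cache = EventCache()
e1 = cache.acquire()
cache.release(e1)
e2 = cache.acquire()
assert e1 is e2  # the same event object was reused, not destroyed and recreated
```

In the real implementation the pool is per-device (hence the multi-device support mentioned above) and must be thread-safe; this sketch omits both concerns for clarity.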
| Name |
|---|
| example |
| BackoffTest.cpp |
| CMakeLists.txt |
| CUDATest.cu |
| CUDATest.hpp |
| FileStoreTest.cpp |
| HashStoreTest.cpp |
| ProcessGroupGlooAsyncTest.cpp |
| ProcessGroupGlooTest.cpp |
| ProcessGroupMPITest.cpp |
| ProcessGroupNCCLErrorsTest.cpp |
| ProcessGroupNCCLTest.cpp |
| ProcessGroupUCCTest.cpp |
| StoreTestCommon.hpp |
| TCPStoreTest.cpp |
| TestUtils.hpp |