pytorch/test/cpp
fduwjj ae7df51232 [c10d] Fix CudaEventCache for dangling references (#144496)
Reported in https://github.com/pytorch/pytorch/issues/143470: `CudaEventCache` has dangling references, and this PR fixes them.
1. We add a unit test that reproduces the problem described in the issue.
2. Instead of converting variables to shared pointers as suggested in the issue, we make the cache itself a shared pointer. So if the thread that creates the cache dies before all events are recycled, the cache stays alive until the last CudaEvent is deleted. (Thanks to @kwen2501 for the suggestion.)
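
The lifetime argument in point 2 can be sketched in isolation. The snippet below is not the PyTorch code: `FakeEvent`, `EventCache`, `create`, `pooled`, and `demo_event_outlives_owner` are hypothetical names, no CUDA is involved, and an owner handle going out of scope stands in for the creating thread exiting. It only illustrates the idea that an event whose deleter holds a shared pointer to the cache keeps that cache alive until the event is recycled.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a CUDA event (assumption, not the real type).
struct FakeEvent {};

// Hypothetical cache that is always owned through a std::shared_ptr.
class EventCache : public std::enable_shared_from_this<EventCache> {
 public:
  // Hand out an event whose deleter returns it to this cache's pool.
  std::shared_ptr<FakeEvent> create() {
    FakeEvent* raw = nullptr;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (!free_.empty()) {
        raw = free_.back().release();  // reuse a pooled event
        free_.pop_back();
      }
    }
    if (raw == nullptr) {
      raw = new FakeEvent();
    }
    // The deleter copies a shared_ptr to the cache, so the cache cannot be
    // destroyed while any handed-out event is still alive.
    std::shared_ptr<EventCache> self = shared_from_this();
    return std::shared_ptr<FakeEvent>(raw, [self](FakeEvent* e) {
      std::lock_guard<std::mutex> lock(self->mutex_);
      self->free_.emplace_back(e);  // recycle instead of freeing
    });
  }

  // Number of events currently sitting in the pool.
  std::size_t pooled() {
    std::lock_guard<std::mutex> lock(mutex_);
    return free_.size();
  }

 private:
  std::mutex mutex_;
  std::vector<std::unique_ptr<FakeEvent>> free_;
};

// Returns the pool size after an event outlives the cache's original owner.
std::size_t demo_event_outlives_owner() {
  std::shared_ptr<FakeEvent> event;
  std::weak_ptr<EventCache> observer;
  {
    // This owner plays the role of the thread that created the cache.
    auto owner = std::make_shared<EventCache>();
    observer = owner;
    event = owner->create();
  }  // owner is gone here, like the creating thread having exited

  // The event's deleter still references the cache, so it is still alive.
  assert(!observer.expired());
  std::shared_ptr<EventCache> cache = observer.lock();

  event.reset();  // runs the deleter, which recycles into a live cache
  return cache->pooled();
}
```

Had the deleter captured a raw pointer or reference to the cache instead, `event.reset()` would touch a destroyed object once the owner was gone, which is the kind of dangling reference the issue describes.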

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144496
Approved by: https://github.com/kwen2501
2025-01-15 05:11:48 +00:00
aoti_abi_check [AOTI] Fix complex64 not defined (#132810) 2024-08-08 18:08:23 +00:00
aoti_inference [AOTInductor] Add standalone test for compilation from ExportedProgram (#142327) 2024-12-10 06:50:09 +00:00
api [ROCm][CI] upgrade CI to ROCm 6.3 (#142152) 2025-01-09 17:14:16 +00:00
c10d [c10d] Fix CudaEventCache for dangling references (#144496) 2025-01-15 05:11:48 +00:00
common [AOTI] Add ABI-compatiblity tests (#123848) 2024-04-19 00:51:24 +00:00
dist_autograd Set RUNPATH so installed tests can find the required shared libraries (#136627) 2024-10-25 09:38:08 +00:00
jit Revert "Fix poision child process issue when call getAccelerator() (#144368)" 2025-01-10 23:36:43 +00:00
lazy [BE]: Replace clone detach with detach clone to be more efficient (#144469) 2025-01-09 18:28:39 +00:00
lite_interpreter_runtime Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
monitor
profiler [codemod] Fix a few unused-variable issues in pytorch (#143517) 2024-12-19 00:18:08 +00:00
rpc [rpc] Fix unit test after c10::nullopt removal (#143690) 2024-12-20 23:36:07 +00:00
tensorexpr Fix floating point literals in IRPrinter (#142119) 2024-12-18 21:59:48 +00:00
__init__.py