Mirror of https://github.com/zebrajr/pytorch.git (synced 2025-12-07 12:21:27 +01:00)
> capture_error_mode (str, optional): specifies the cudaStreamCaptureMode for the graph capture stream. Can be "global", "thread_local" or "relaxed". During CUDA graph capture, some actions, such as cudaMalloc, may be unsafe. "global" will error on actions in other threads, "thread_local" will only error for actions in the current thread, and "relaxed" will not error on these actions.
>
> Inductor codegen is single-threaded, so it should be safe to enable "thread_local" for Inductor's CUDA graph capturing. We have seen errors when Inductor cudagraphs have been used concurrently with data preprocessing in other threads.
>
> Differential Revision: [D48656014](https://our.internmc.facebook.com/intern/diff/D48656014)
>
> Pull Request resolved: https://github.com/pytorch/pytorch/pull/107407
>
> Approved by: https://github.com/albanD, https://github.com/eqy

| Name | | |
|---|---|---|
| .. | ||
| shared | ||
| comm.cpp | ||
| comm.h | ||
| CUDAPluggableAllocator.cpp | ||
| CUDAPluggableAllocator.h | ||
| device_set.h | ||
| Event.cpp | ||
| Event.h | ||
| Graph.cpp | ||
| memory_snapshot.cpp | ||
| memory_snapshot.h | ||
| Module.cpp | ||
| Module.h | ||
| nccl.cpp | ||
| nccl.h | ||
| python_comm.cpp | ||
| python_comm.h | ||
| python_nccl.cpp | ||
| python_nccl.h | ||
| Stream.cpp | ||
| Stream.h | ||
| Tensor.cpp | ||
| THCP.h | ||
| utils.cpp | ||
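
The commit message above describes the `capture_error_mode` argument. A minimal sketch of how it might be passed during capture follows. It assumes a PyTorch build that includes this change (where `torch.cuda.graph` accepts `capture_error_mode`) and a CUDA device. The helper names `check_mode` and `capture_matmul` are hypothetical and not part of PyTorch, and the function returns `None` when PyTorch or CUDA is unavailable.

```python
# Hypothetical helpers for illustration; only torch.cuda.* calls are PyTorch APIs.
VALID_MODES = ("global", "thread_local", "relaxed")


def check_mode(mode):
    """Reject strings that are not one of the three documented modes."""
    if mode not in VALID_MODES:
        raise ValueError(f"capture_error_mode must be one of {VALID_MODES}, got {mode!r}")
    return mode


def capture_matmul(mode="thread_local"):
    """Capture and replay a small matmul with the given capture error mode.

    Returns the output tensor, or None if PyTorch or CUDA is not available.
    """
    check_mode(mode)
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None

    x = torch.randn(8, 8, device="cuda")

    # Warm up on a side stream before capture, as is usual for CUDA graphs.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        y = x @ x
    torch.cuda.current_stream().wait_stream(side)

    # With "thread_local", unsafe calls made by other threads (for example a
    # data-loading thread) should not raise errors during this capture.
    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g, capture_error_mode=mode):
        y = x @ x

    g.replay()
    torch.cuda.synchronize()
    return y
```

Calling `capture_matmul("global")` instead would use the strictest mode, in which potentially unsafe actions in other threads also raise errors while the capture is in progress.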