pytorch/torch/cuda
Shivam Raikundalia 1083bc749d [Memory Snapshot] Add Flag to Toggle Global and Local Callbacks for Annotations (#154932)
Summary:
There are some cases where we want only local annotations for memory snapshots, such as when executing inside a CUDA stream callback, which cannot launch CUDA operators. Otherwise we hit CUDA errors like: Exception in RecordFunction callback: CUDA error: operation not permitted

However, we still need an option to enable the callbacks globally so that on-demand snapshots can capture annotations. Additionally, there may be cases in which auto-trace also wants annotations via record functions, so we expose the flag to auto-trace as well.
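The toggle described above can be sketched as follows. This is a hedged illustration only: the keyword name `global_record_annotations` is an assumption for this sketch (the actual parameter name added by the PR should be checked against `torch.cuda.memory._record_memory_history` in your build); the other keywords (`enabled`, `context`, `stacks`) are the documented snapshot-recording options.

```python
def snapshot_kwargs(on_demand: bool) -> dict:
    # On-demand snapshots want global RecordFunction callbacks so annotations
    # are captured from any thread; inside a CUDA stream callback (which
    # cannot launch CUDA operators) local-only callbacks avoid the
    # "operation not permitted" error described above.
    return {
        "enabled": "all",
        "context": "all",
        "stacks": "all",
        "global_record_annotations": on_demand,  # hypothetical flag name
    }

# Usage (requires a CUDA build of PyTorch):
# import torch
# torch.cuda.memory._record_memory_history(**snapshot_kwargs(on_demand=True))
```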

Test Plan:
Run the MVAI executable and confirm the errors go away.

Rollback Plan:

Differential Revision: D75831687

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154932
Approved by: https://github.com/mzzchy, https://github.com/sanrise
2025-06-04 23:15:19 +00:00
amp
__init__.py [BE] Introduce torch.AcceleratorError (#152023) 2025-06-01 21:02:43 +00:00
_gpu_trace.py
_memory_viz.py
_sanitizer.py
_utils.py Add torch.cuda._compile_kernel() (#151484) 2025-04-24 07:14:31 +00:00
comm.py
error.py
gds.py
graphs.py
jiterator.py
memory.py [Memory Snapshot] Add Flag to Toggle Global and Local Callbacks for Annotations (#154932) 2025-06-04 23:15:19 +00:00
nccl.py
nvtx.py
profiler.py
random.py Avoid unnecessary clone in torch.cuda.set_rng_state (#149283) 2025-03-18 20:47:57 +00:00
sparse.py
streams.py
tunable.py [ROCm][TunableOp] Support submatrices in offline tuning (#151138) 2025-04-19 04:14:27 +00:00