pytorch/torch/profiler
David Berard 7d205b22b5 [profiler][retry] don't disable CUPTI_LAZY_REINIT for cuda >= 12.6 (#151124)
Retry of https://github.com/pytorch/pytorch/pull/150957, which was reverted due to internal meta failures

Credit to @mgmtea who wrote the initial version of this PR: https://github.com/pytorch/pytorch/pull/146604

Context: CUPTI is the NVIDIA library that Kineto uses for collecting GPU-side info during profiling. The intended usage is to register a callback while you want profiling to occur, and then unregister the callback when you want profiling to stop. But a bug would cause crashes if CUPTI callbacks were de-registered when used with cudagraphs. The workaround was to disable "CUPTI_LAZY_REINIT" and "CUPTI_TEARDOWN" in Kineto - which prevents crashes, but can result in slower execution after profiling has occurred and completed.

This bug is believed to be fixed in CUDA >= 12.6, so this PR qualifies that DISABLE_CUPTI_LAZY_REINIT=1 and CUPTI_TEARDOWN=0 should only be applied if CUDA >= 12.6. Additionally, `profiler_allow_cudagraph_cupti_lazy_reinit_cuda12()` is added as an escape hatch so that we can add a killswitch in case we see more crashes related to this.

Differential Revision: [D72842114](https://our.internmc.facebook.com/intern/diff/D72842114/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D72842114/)!

Differential Revision: [D72842114](https://our.internmc.facebook.com/intern/diff/D72842114)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151124
Approved by: https://github.com/sraikund16
2025-04-15 16:11:49 +00:00
..
__init__.py [Profiler] Enable Iterative Step without profiler in fbcode (#142077) 2024-12-12 19:00:13 +00:00
_memory_profiler.py [BE][Ez]: Use itertools.chain.from_iterable when possible (#148190) 2025-03-06 20:37:06 +00:00
_pattern_matcher.py PEP585 update - torch/nn torch/optim torch/package torch/profiler torch/serialization torch/sparse torch/xpu (#145175) 2025-01-21 16:57:27 +00:00
_utils.py PEP585 update - torch/nn torch/optim torch/package torch/profiler torch/serialization torch/sparse torch/xpu (#145175) 2025-01-21 16:57:27 +00:00
itt.py [BE][Easy][19/19] enforce style for empty lines in import segments in torch/[o-z]*/ (#129771) 2024-08-01 17:07:14 +00:00
profiler.py [profiler][retry] don't disable CUPTI_LAZY_REINIT for cuda >= 12.6 (#151124) 2025-04-15 16:11:49 +00:00
python_tracer.py PEP585 update - torch/nn torch/optim torch/package torch/profiler torch/serialization torch/sparse torch/xpu (#145175) 2025-01-21 16:57:27 +00:00