pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00


Zizeng Meng 861945100e [Kineto] Enable OOM observer (#152160)
Summary:
Context: When a memory leak happens, it usually triggers an OOM in a later iteration, and a snapshot of the full iteration is huge and hard to interpret. On the CUDA side, an OOM observer generates a snapshot when an OOM happens, containing the latest 1,500,000 entries for debugging. This diff implements the same feature on the MTIA side.
Test Plan: Run this test with the last diff in the stack:
```
buck run @//mode/opt kineto/libkineto/fb/mtia/integration_tests:mtia_memory_auto_trace_test
```
As shown, the memory snapshot is generated when the OOM happens.
Log: P1794792326
Snapshot: https://fburl.com/pytorch_memory_visualizer/lx73y6s3
Differential Revision: D71993315
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152160
Approved by: https://github.com/sraikund16		2025-04-27 15:56:44 +00:00
cpp	Fix broken URLs (#152237)	2025-04-27 09:56:42 +00:00
source	[Kineto] Enable OOM observer (#152160)	2025-04-27 15:56:44 +00:00
.gitignore
libtorch.rst	Add ROCm documentation to libtorch (C++) reST. (#136378)	2024-09-25 02:30:56 +00:00
make.bat
Makefile	Add scripts to generate plots of LRSchedulers (#149189)	2025-04-14 09:53:38 +00:00
README.md
requirements.txt	Update docs dependencies for local build (#151796)	2025-04-24 18:40:42 +00:00

Please see the "Writing documentation" section of CONTRIBUTING.md for details on both writing and building the docs.