Mirror of https://github.com/zebrajr/pytorch.git (synced 2025-12-07 00:21:07 +01:00)
Summary: Adds comprehensive memory instrumentation to the CUDA caching memory allocator.

# Counters

Added comprehensive instrumentation for the following stats:

- Allocation requests (`allocation`)
- Allocated memory (`allocated_bytes`)
- Reserved segments from cudaMalloc (`segment`)
- Reserved memory (`reserved_bytes`)
- Active memory blocks (`active`)
- Active memory (`active_bytes`)
- Inactive, non-releasable blocks (`inactive_split`)
- Inactive, non-releasable memory (`inactive_split_bytes`)
- Number of failed cudaMalloc calls that result in a cache flush and retry (`cuda_malloc_retries`)
- Number of OOMs (`num_ooms`)

Except for the last two, these stats are segmented between all memory, large blocks, and small blocks. Along with the current value of each stat, historical counts of allocs/frees as well as peak usage are tracked by the allocator.

# Snapshots

Added the capability to take a "memory snapshot", i.e. to generate a complete dump of the allocator's block/segment state.

# Implementation: major changes

- Added `torch.cuda.memory_stats()` (and associated C++ changes), which returns all instrumented stats as a dictionary.
- Added `torch.cuda.snapshot()` (and associated C++ changes), which returns a complete dump of the allocator block/segment state as a list of segments.
- Added a memory summary generator, `torch.cuda.memory_summary()`, for ease of client access to the instrumentation stats. Potentially useful to dump when catching OOMs. Sample output here: https://pastebin.com/uKZjtupq

Rough usage sketches for these APIs appear after this summary.

# Implementation: minor changes

- Added error-checking helper functions for Python dicts and lists in `torch/csrc/utils/`.
- Existing memory management functions in `torch.cuda` were moved from `__init__.py` to `memory.py` and star-imported into the main CUDA module.
- Added various helper functions to `torch.cuda` that return individual items from `torch.cuda.memory_stats()`.
- `torch.cuda.reset_max_memory_cached()` and `torch.cuda.reset_max_memory_allocated()` are deprecated in favor of `reset_peak_stats`. It is difficult to think of a case where only one of those stats should be reset, and this makes the peak stats collectively more consistent.
- `torch.cuda.memory_cached()` and `torch.cuda.max_memory_cached()` are deprecated in favor of `*memory_reserved()`.
- Style cleanup (added access modifiers in the allocator class, assorted nit fixes, etc.).

# Testing

- Added a consistency check for stats in `test_cuda.py`. This verifies that the data from `memory_stats()` is faithful to the data from `snapshot()` (a rough sketch of such a check appears after this summary).
- Ran on various basic workflows (toy example, CIFAR).

# Performance

Running the following speed benchmark: https://pastebin.com/UNndQg50

- Before this PR: 45.98 microseconds per tensor creation
- After this PR: 46.65 microseconds per tensor creation

Pull Request resolved: https://github.com/pytorch/pytorch/pull/27361

Differential Revision: D17758747

Pulled By: jma127

fbshipit-source-id: 5a84e82d696c40c505646b9a1b4e0c3bba38aeb6
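As a rough illustration of the stats dictionary described above, here is a minimal sketch. The dotted key layout (`<stat>.<pool>.<field>`) and key names such as `num_alloc_retries` reflect the API as it eventually shipped in `torch.cuda` and should be treated as assumptions relative to this PR's exact naming:

```python
import torch

# A couple of allocations so the caching allocator has something to report
# (any CUDA workload would do).
a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(2048, 2048, device="cuda")

stats = torch.cuda.memory_stats()

# Each segmented stat is exposed under keys of the form "<stat>.<pool>.<field>",
# where <pool> is "all", "large_pool", or "small_pool" and <field> is
# "current", "peak", "allocated", or "freed".
print("currently allocated:", stats["allocated_bytes.all.current"])
print("peak reserved:      ", stats["reserved_bytes.all.peak"])
print("inactive splits:    ", stats["inactive_split.all.current"])

# The two unsegmented counters are plain scalars.
print("alloc retries:", stats["num_alloc_retries"])
print("ooms:         ", stats["num_ooms"])

# memory_summary() renders the same data as a human-readable table.
print(torch.cuda.memory_summary(abbreviated=True))
```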
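The suggestion to dump the summary when catching OOMs might look like the following sketch. Only `torch.cuda.memory_summary()` comes from this PR; the wrapper function and its name are purely illustrative:

```python
import torch

def alloc_or_report(shape, device="cuda"):
    """Try an allocation; on CUDA OOM, print the allocator summary before re-raising."""
    try:
        return torch.empty(shape, device=device)
    except RuntimeError as err:
        # CUDA out-of-memory errors surface as RuntimeError containing this substring.
        if "out of memory" in str(err):
            print(torch.cuda.memory_summary())
        raise

# Example: a deliberately oversized request exercises the OOM path.
# alloc_or_report((1 << 40,))
```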
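The consistency check mentioned under Testing can be approximated as below. This is a sketch, not the actual `test_cuda.py` test; the snapshot accessor name (`memory_snapshot()`) and field names (`device`, `total_size`) follow the later public snapshot format and are assumptions here:

```python
import torch

def check_stats_vs_snapshot(device=0):
    """Cross-check aggregate counters against the per-segment snapshot (rough sketch)."""
    stats = torch.cuda.memory_stats(device)
    segments = [s for s in torch.cuda.memory_snapshot() if s["device"] == device]

    # Reserved memory reported by the counters should equal the sum of segment sizes,
    # and the segment counter should match the number of snapshot entries.
    assert stats["reserved_bytes.all.current"] == sum(s["total_size"] for s in segments)
    assert stats["segment.all.current"] == len(segments)

a = torch.randn(512, 512, device="cuda")
check_stats_vs_snapshot()
```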
| Name |
|---|
| autograd.rst |
| broadcasting.rst |
| cpu_threading_torchscript_inference.rst |
| cpu_threading_torchscript_inference.svg |
| cuda.rst |
| extending.rst |
| faq.rst |
| large_scale_deployments.rst |
| multiprocessing.rst |
| randomness.rst |
| serialization.rst |
| windows.rst |