Summary: Adds comprehensive memory instrumentation to the CUDA caching memory allocator.

# Counters

Added comprehensive instrumentation for the following stats:

- Allocation requests (`allocation`)
- Allocated memory (`allocated_bytes`)
- Reserved segments from cudaMalloc (`segment`)
- Reserved memory (`reserved_bytes`)
- Active memory blocks (`active`)
- Active memory (`active_bytes`)
- Inactive, non-releasable blocks (`inactive_split`)
- Inactive, non-releasable memory (`inactive_split_bytes`)
- Number of failed cudaMalloc calls that result in a cache flush and retry (`cuda_malloc_retries`)
- Number of OOMs (`num_ooms`)

Except for the last two, these stats are segmented between all memory, large blocks, and small blocks. Along with the current value of each stat, the allocator tracks historical counts of allocations/frees as well as peak usage.
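To make the counter structure concrete, here is a minimal sketch of reading a few of these stats through the `torch.cuda.memory_stats()` call described below. The flattened `"<stat>.<pool>.<field>"` naming matches how the dictionary is laid out, but treat the exact keys in this sketch as illustrative:

```python
import torch

# Sketch: each segmented stat appears under keys of the form
# "<stat>.<pool>.<field>", where pool is one of {all, large_pool,
# small_pool} and field is one of {current, peak, allocated, freed}.
# Assumes a CUDA device is available.
x = torch.empty(1024, 1024, device="cuda")  # trigger an allocation

stats = torch.cuda.memory_stats()
print(stats["allocation.all.current"])             # live allocation count
print(stats["allocated_bytes.all.peak"])           # peak allocated bytes
print(stats["reserved_bytes.large_pool.current"])  # bytes reserved for large blocks
print(stats["num_ooms"])                           # unsegmented OOM counter
```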
# Snapshots

Added the capability to get a "memory snapshot": a complete dump of the allocator block/segment state.
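A sketch of walking the snapshot; the segment/block dictionary keys used here (`total_size`, `active_size`, `segment_type`, `blocks`) reflect the shape of the dump but should be read as illustrative:

```python
import torch

# Each snapshot entry describes one cudaMalloc'd segment and the
# blocks the allocator has carved out of it.
for seg in torch.cuda.memory_snapshot():
    print(f"device {seg['device']}: {seg['segment_type']} segment, "
          f"{seg['total_size']} bytes reserved, {seg['active_size']} bytes active")
    for block in seg["blocks"]:
        print(f"  block of {block['size']} bytes, state={block['state']}")
```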
# Implementation: major changes

- Added `torch.cuda.memory_stats()` (and associated C++ changes), which returns all instrumented stats as a dictionary.
- Added `torch.cuda.memory_snapshot()` (and associated C++ changes), which returns a complete dump of the allocator block/segment state as a list of segments.
- Added a memory summary generator, `torch.cuda.memory_summary()`, for easy client access to the instrumentation stats. Potentially useful to dump when catching OOMs; see the sketch below. Sample output here: https://pastebin.com/uKZjtupq
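For example, one hedged way to use the summary when catching an OOM (matching on the exception text is a heuristic, not a stable API):

```python
import torch

try:
    too_big = torch.empty(1 << 40, device="cuda")  # deliberately oversized
except RuntimeError as e:
    if "out of memory" in str(e):
        # Human-readable dump of all instrumentation stats.
        print(torch.cuda.memory_summary())
    raise
```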
# Implementation: minor changes

- Added error-checking helper functions for Python dicts and lists in `torch/csrc/utils/`.
- Moved the existing memory management functions in `torch.cuda` from `__init__.py` to `memory.py`, star-importing them into the main CUDA module.
- Added various helper functions to `torch.cuda` that return individual items from `torch.cuda.memory_stats()`.
- Deprecated `torch.cuda.reset_max_memory_cached()` and `torch.cuda.reset_max_memory_allocated()` in favor of `reset_peak_memory_stats()`. It's difficult to think of a case where only one of those stats should be reset, and this makes the peak stats collectively more consistent.
- Deprecated `torch.cuda.memory_cached()` and `torch.cuda.max_memory_cached()` in favor of `memory_reserved()` and `max_memory_reserved()`.
- Style fixes (access modifiers in the allocator class, assorted nits, etc.).
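A quick before/after sketch of the deprecations (the deprecated forms are noted in the comments):

```python
import torch

reserved = torch.cuda.memory_reserved()           # was: memory_cached()
peak_reserved = torch.cuda.max_memory_reserved()  # was: max_memory_cached()
torch.cuda.reset_peak_memory_stats()              # was: reset_max_memory_allocated()
                                                  #      and reset_max_memory_cached()
```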
# Testing

- Added a consistency check for stats in `test_cuda.py`, verifying that the data from `memory_stats()` is faithful to the data from `memory_snapshot()`.
- Ran on various basic workflows (toy example, CIFAR).
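The idea behind the check, sketched below as an illustration rather than the actual test code: every byte the counters report as reserved should be accounted for by some segment in the snapshot, and vice versa.

```python
import torch

tensors = [torch.randn(n, device="cuda") for n in (16, 4096, 1_000_000)]

stats = torch.cuda.memory_stats()
snapshot = torch.cuda.memory_snapshot()

# Reserved memory and segment counts must agree between the two views.
assert stats["reserved_bytes.all.current"] == sum(s["total_size"] for s in snapshot)
assert stats["segment.all.current"] == len(snapshot)
```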
# Performance

Running the following speed benchmark: https://pastebin.com/UNndQg50

- Before this PR: 45.98 microseconds per tensor creation
- After this PR: 46.65 microseconds per tensor creation
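The linked script isn't reproduced here; a minimal harness in the same spirit might look like this (tensor shape and iteration count are made up):

```python
import timeit

import torch

torch.cuda.synchronize()
n = 100_000
elapsed = timeit.timeit(lambda: torch.empty(128, device="cuda"), number=n)
print(f"{elapsed / n * 1e6:.2f} microseconds per tensor creation")
```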
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27361

Differential Revision: D17758747

Pulled By: jma127

fbshipit-source-id: 5a84e82d696c40c505646b9a1b4e0c3bba38aeb6