# pytorch/docs/source
Latest commit: **Jerry Ma** `1610ea8ef8` Comprehensive-ish instrumentation for CUDA memory allocator (#27361)
Summary:
Adds comprehensive memory instrumentation to the CUDA caching memory allocator.

# Counters

Added comprehensive instrumentation for the following stats:
  - Allocation requests (`allocation`)
  - Allocated memory (`allocated_bytes`)
  - Reserved segments from cudaMalloc (`segment`)
  - Reserved memory (`reserved_bytes`)
  - Active memory blocks (`active`)
  - Active memory (`active_bytes`)
  - Inactive, non-releasable blocks (`inactive_split`)
  - Inactive, non-releasable memory (`inactive_split_bytes`)
  - Number of failed cudaMalloc calls that result in a cache flush and retry (`cuda_malloc_retries`)
  - Number of OOMs (`num_ooms`)

Except for the last two, each stat is further broken down across three pools: all memory, large blocks, and small blocks. For every stat, the allocator tracks not only the current value but also the peak value and cumulative allocation/free counts.
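
As a rough illustration (not part of the PR itself), here's a minimal sketch of reading these counters, assuming `torch.cuda.memory_stats()` flattens them into `"{stat}.{pool}.{metric}"` keys with pools `all`/`large_pool`/`small_pool` and metrics `current`/`peak`/`allocated`/`freed`:

```python
import torch

x = torch.empty(1024, 1024, device="cuda")  # ~4 MiB of float32

stats = torch.cuda.memory_stats()  # flat dict of instrumented stats

# Key names below assume the "{stat}.{pool}.{metric}" layout described above.
print(stats["allocated_bytes.all.current"])  # bytes currently allocated
print(stats["allocated_bytes.all.peak"])     # high-water mark since last reset
print(stats["allocation.all.allocated"])     # cumulative count of allocations
print(stats["num_ooms"])                     # OOM count (not pool-segmented)
```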

# Snapshots

Added the capability to take a "memory snapshot": a complete dump of the allocator's block/segment state.
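
A sketch of walking such a dump (the PR text calls the entry point `torch.cuda.snapshot()`; the sketch assumes it's exposed as `torch.cuda.memory_snapshot()`, and the per-segment field names are likewise an assumed schema):

```python
import torch

torch.empty(1024, 1024, device="cuda")

# "total_size", "segment_type", "blocks", "size", "state" are assumed field
# names for the segment/block dicts; adjust to the actual dump format.
snapshot = torch.cuda.memory_snapshot()
for seg in snapshot:
    print(f"device {seg['device']}: {seg['segment_type']} segment, "
          f"{seg['total_size']} bytes reserved")
    for block in seg["blocks"]:
        print(f"  block: {block['size']} bytes, state={block['state']}")
```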

# Implementation: major changes

- Added `torch.cuda.memory_stats()` (and associated C++ changes) which returns all instrumented stats as a dictionary.
- Added `torch.cuda.snapshot()` (and associated C++ changes) which returns a complete dump of the allocator block/segment state as a list of segments.
- Added a memory summary generator, `torch.cuda.memory_summary()`, for convenient access to the instrumented stats. Potentially useful to dump when catching OOMs (see the sketch after this list). Sample output here: https://pastebin.com/uKZjtupq
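
For example, a minimal sketch of dumping the summary from an OOM handler (the allocation sizes and loop are illustrative only; CUDA OOMs surface as `RuntimeError` here):

```python
import torch

def allocate_until_oom():
    blobs = []
    try:
        while True:
            blobs.append(torch.empty(1024, 1024, 256, device="cuda"))  # ~1 GiB each
    except RuntimeError as e:
        print(f"Caught: {e}")
        # Dump the human-readable allocator summary for post-mortem debugging.
        print(torch.cuda.memory_summary())

allocate_until_oom()
```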

# Implementation: minor changes

- Added error-checking helper functions for Python dicts and lists in `torch/csrc/utils/`.
- Moved existing memory management functions in `torch.cuda` from `__init__.py` to `memory.py`, star-importing them into the main CUDA module.
- Added various helper functions to `torch.cuda` that return individual items from `torch.cuda.memory_stats()`.
- Deprecated `torch.cuda.reset_max_memory_cached()` and `torch.cuda.reset_max_memory_allocated()` in favor of `reset_peak_stats` (see the sketch after this list). It's hard to imagine a case where only one of those peaks should be reset, and IMO resetting them together keeps the peak stats mutually consistent.
- Deprecated `torch.cuda.memory_cached()` and `torch.cuda.max_memory_cached()` in favor of `memory_reserved()` and `max_memory_reserved()`.
- Style fixes (added access modifiers in the allocator class, assorted nit fixes, etc.).
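
A sketch of measuring the peak memory of a region using the combined peak reset (assuming the new entry point is exposed as `torch.cuda.reset_peak_memory_stats()`, referred to as `reset_peak_stats` above):

```python
import torch

def peak_memory_of(fn, device="cuda"):
    """Return peak allocated bytes during fn(). Sketch only; assumes the
    reset entry point is torch.cuda.reset_peak_memory_stats()."""
    torch.cuda.synchronize(device)
    torch.cuda.reset_peak_memory_stats(device)  # resets ALL peak stats together
    fn()
    torch.cuda.synchronize(device)
    return torch.cuda.max_memory_allocated(device)

print(peak_memory_of(lambda: torch.empty(4096, 4096, device="cuda")))
```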

# Testing

- Added a consistency check for stats in `test_cuda.py`, verifying that the data from `memory_stats()` is faithful to the data from `snapshot()` (a sketch of such a cross-check appears after this list).
- Ran various basic workflows (a toy example, CIFAR training).
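
A hedged sketch of this kind of cross-check, reusing the key layout and snapshot schema assumed in the earlier sketches (and assuming a single device, since the snapshot spans all devices while the stats are per-device):

```python
import torch

def check_consistency():
    stats = torch.cuda.memory_stats()
    snapshot = torch.cuda.memory_snapshot()

    # Reserved bytes reported by the counters should equal the sum of
    # segment sizes in the dump.
    expected_reserved = sum(seg["total_size"] for seg in snapshot)
    assert stats["reserved_bytes.all.current"] == expected_reserved

    # Likewise, the live segment count should match the dump length.
    assert stats["segment.all.current"] == len(snapshot)
```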

# Performance

Results from running the following speed benchmark: https://pastebin.com/UNndQg50

- Before this PR: 45.98 microseconds per tensor creation
- After this PR: 46.65 microseconds per tensor creation
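
The pastebin contains the actual benchmark; a minimal sketch of this kind of per-tensor-creation timing (my own approximation, not the benchmark from the link):

```python
import timeit
import torch

torch.empty(1, device="cuda")  # warm up CUDA context and allocator
torch.cuda.synchronize()

n = 100_000
elapsed = timeit.timeit(lambda: torch.empty(256, device="cuda"), number=n)
torch.cuda.synchronize()
print(f"{elapsed / n * 1e6:.2f} microseconds per tensor creation")
```
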
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27361

Differential Revision: D17758747

Pulled By: jma127

fbshipit-source-id: 5a84e82d696c40c505646b9a1b4e0c3bba38aeb6
Committed: 2019-10-08 15:42:48 -07:00
| File | Last commit | Date |
|------|-------------|------|
| _static/img | hyperparameter plugin (#23134) | 2019-08-26 10:40:34 -07:00 |
| _templates | Generate sphinx docs with secure content. (#18508) | 2019-03-27 11:01:48 -07:00 |
| community | Adjust maintainers list (#23693) | 2019-08-01 22:59:02 -07:00 |
| notes | Comprehensive-ish instrumentation for CUDA memory allocator (#27361) | 2019-10-08 15:42:48 -07:00 |
| scripts | Add CELU activation to pytorch (#8551) | 2018-08-01 07:54:44 -07:00 |
| __config__.rst | Allow a non-OpenMP based build (#19749) | 2019-05-06 19:34:48 -07:00 |
| autograd.rst | Added torch.autograd.profiler.record_function() as context manager. (#23428) | 2019-07-30 11:10:01 -07:00 |
| bottleneck.rst | [docs] Clarify more CUDA profiling gotchas in bottleneck docs (#6763) | 2018-04-19 13:15:27 -04:00 |
| checkpoint.rst | Stashing checkpointing RNG states based on devices of arg tensors (#14518) | 2018-12-11 09:48:45 -08:00 |
| conf.py | Finish testing code examples in the docs (#25668) | 2019-09-05 16:13:37 -07:00 |
| cpp_extension.rst | Inline JIT C++ Extensions (#7059) | 2018-04-30 11:48:44 -04:00 |
| cuda_deterministic_backward.rst | Typo correction in cuda_deterministic_backward.rst (#25011) | 2019-08-22 21:19:39 -07:00 |
| cuda_deterministic.rst | Amend nondeterminism notes (#12217) | 2018-10-16 23:59:26 -07:00 |
| cuda.rst | Comprehensive-ish instrumentation for CUDA memory allocator (#27361) | 2019-10-08 15:42:48 -07:00 |
| cudnn_deterministic.rst | Amend nondeterminism notes (#12217) | 2018-10-16 23:59:26 -07:00 |
| cudnn_persistent_rnn.rst | don't copy weight gradients in rnn (#12600) | 2018-10-12 13:34:10 -07:00 |
| data.rst | Slightly improve dataloader docs on when auto-batching is disabled (#23671) | 2019-08-01 12:10:17 -07:00 |
| distributed.rst | Update distributed.rst (#23289) | 2019-07-26 16:55:52 -07:00 |
| distributions.rst | More doc edits (#19929) | 2019-04-30 13:52:07 -07:00 |
| dlpack.rst | document torch.utils.dlpack (#9343) | 2018-07-11 07:46:09 -07:00 |
| hub.rst | Hub improvements (#26723) | 2019-09-25 08:21:50 -07:00 |
| index.rst | Alphabetize Package Reference section in Docs | 2019-09-04 14:31:16 -07:00 |
| jit_builtin_functions.rst | Fix builtin function reference (#24056) | 2019-08-09 15:58:15 -07:00 |
| jit.rst | Reduce error context from 10 -> 3 (#26765) | 2019-10-04 11:24:52 -07:00 |
| model_zoo.rst | add/move a few apis in torch.hub (#18758) | 2019-04-10 23:10:39 -07:00 |
| multiprocessing.rst | Update multiprocessing note now that shared CUDA tensors are refcounted (#19904) | 2019-05-25 17:40:42 -07:00 |
| nn.functional.rst | Breaks up NN module in docs so it loads faster. | 2019-06-11 09:38:41 -07:00 |
| nn.init.rst | Add document of functions nn.init.ones_/zeros_ (#23145) | 2019-07-25 06:09:50 -07:00 |
| nn.rst | Fixed flatten docs (I think) (#25544) | 2019-09-02 11:34:56 -07:00 |
| onnx.rst | Fix dead link and syntax in ONNX landing page | 2019-08-29 23:58:34 -07:00 |
| optim.rst | Add CosineAnnealingWarmRestarts to optim documentation (#25421) | 2019-09-05 19:06:18 -07:00 |
| random.rst | Adds torch.random to docs/toc (#23553) | 2019-08-07 16:31:32 -07:00 |
| sparse.rst | fix typo: toDense --> to_dense #25706 (#25832) | 2019-09-09 18:27:03 -07:00 |
| storage.rst | Start documenting torch.Tensor (#377) | 2016-12-30 01:21:34 -05:00 |
| tensor_attributes.rst | Expose a torch.result_type and simplify tensor iterator | 2019-09-25 06:52:23 -07:00 |
| tensorboard.rst | Add method add_hparams to API doc (#27344) | 2019-10-03 17:07:45 -07:00 |
| tensors.rst | Remove deprecated torch.gels (#26480) | 2019-09-23 07:15:39 -07:00 |
| torch.rst | Add torch.promote_types function | 2019-09-27 16:48:38 -07:00 |
| type_info.rst | Allow converting char tensor to numpy; add [fi]info.min (#15046) | 2018-12-24 09:11:24 -08:00 |