pytorch/docs/source
Michael Wootton 67dcd62310 Don't split oversize cached blocks (#44742)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35901

This change is designed to prevent fragmentation in the Caching Allocator. Permissive block splitting in the allocator allows very large blocks to be split into many pieces. Once a block is split too finely, it is unlikely that all of its pieces will be 'free' at the same time, so the original allocation can never be returned. Anecdotally, we've seen a model run out of memory while failing to allocate a 50 MB block on a 32 GB card, even though the caching allocator was holding 13 GB of 'split free blocks'.

Approach:

- Large blocks above a certain size are designated "oversize". This limit is currently set one decade (10x) above the "large" block size, at 200 MB.
- Oversize blocks cannot be split.
- Oversize blocks must closely match the requested size (e.g. a 200 MB request will match an existing 205 MB block, but not a 300 MB block).
- In lieu of splitting oversize blocks, there is a mechanism to quickly free a single oversize block (to the system allocator) so that a block of the appropriate size can be allocated. This is activated under memory pressure and prevents _release_cached_blocks()_ from triggering.
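
The rules above can be illustrated with a small toy model. This is only a sketch in Python, not the allocator's actual C++ code: the function names, the 20 MB closeness tolerance, treating exactly 200 MB as oversize, and the choice of which oversize block to free are all assumptions made for illustration. Only the 200 MB limit and the 200/205/300 MB example come from the description above.

```python
MB = 1024 * 1024
OVERSIZE_LIMIT = 200 * MB      # limit stated in the commit message
CLOSE_MATCH_SLACK = 20 * MB    # assumption: tolerance for "closely match"


def is_oversize(size):
    # Assumption: a size exactly at the limit counts as oversize.
    return size >= OVERSIZE_LIMIT


def find_free_block(free_sizes, request):
    """Return the smallest cached block allowed to serve `request`, or None."""
    for size in sorted(free_sizes):
        if size < request:
            continue
        if is_oversize(size):
            # Oversize blocks are never split, so they only serve
            # oversize requests that they fit closely.
            if not is_oversize(request):
                continue
            if size - request > CLOSE_MATCH_SLACK:
                continue
        return size
    return None


def release_one_oversize(free_sizes, request):
    """Toy version of the memory-pressure path: hand a single oversize
    free block back to the system instead of releasing every cached block.

    Assumption: pick the smallest oversize block that is at least `request`.
    Returns the freed size, or None if no such block exists.
    """
    candidates = [s for s in free_sizes if is_oversize(s) and s >= request]
    if not candidates:
        return None
    freed = min(candidates)
    free_sizes.remove(freed)
    return freed
```

With this model, a 200 MB request is served by a cached 205 MB block but not by a 300 MB one, and a 50 MB request never consumes (and fragments) an oversize block.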

Initial performance tests show this is similar to or quicker than the original strategy. Additional tests are ongoing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44742

Reviewed By: ngimel

Differential Revision: D23752058

Pulled By: ezyang

fbshipit-source-id: ccb7c13e3cf8ef2707706726ac9aaac3a5e3d5c8
2021-04-14 03:04:41 -07:00
_static Add documentation page for pipeline parallelism. (#50791) 2021-01-25 13:47:13 -08:00
_templates various doc building cleanups (#53851) 2021-03-16 15:01:59 -07:00
community Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
notes Don't split oversize cached blocks (#44742) 2021-04-14 03:04:41 -07:00
rpc Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
scripts Optimize SiLU (Swish) op in PyTorch (#42976) 2020-08-16 13:21:57 -07:00
__config__.rst Fix __config__ docs (#48557) 2020-11-29 23:57:06 -08:00
amp.rst [Relanding] Implemented torch.linalg.multi_dot (#52859) 2021-04-01 04:49:05 -07:00
autograd.rst DOC: use autosummary on tensors.rst (#55042) 2021-04-08 06:44:23 -07:00
backends.rst Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
benchmark_utils.rst Expand benchmark utils docs (#51664) 2021-02-04 00:22:41 -08:00
bottleneck.rst [docs] Clarify more CUDA profiling gotchas in bottleneck docs (#6763) 2018-04-19 13:15:27 -04:00
checkpoint.rst Stashing checkpointing RNG states based on devices of arg tensors (#14518) 2018-12-11 09:48:45 -08:00
complex_numbers.rst various doc building cleanups (#53851) 2021-03-16 15:01:59 -07:00
conf.py various doc building cleanups (#53851) 2021-03-16 15:01:59 -07:00
cpp_extension.rst correct some cpp extension code usages and documents (#39766) 2020-06-10 08:31:22 -07:00
cpp_index.rst Add C++ Landing Page (#38450) 2020-05-14 16:02:01 -07:00
cuda.rst docs: add reset_peak_memory_stats in cuda.rst (#54668) 2021-03-29 10:05:20 -07:00
cudnn_persistent_rnn.rst Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
cudnn_rnn_determinism.rst Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
data.rst Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
ddp_comm_hooks.rst [SPMD] Restrict DDP communication hooks to SPSD mode (#55253) 2021-04-05 16:46:47 -07:00
distributed.optim.rst [Reland] Update and expose ZeroRedundancyOptimizer docs (#53112) 2021-03-02 14:16:12 -08:00
distributed.rst update distributed doc table for alltoall nccl (#54277) 2021-03-19 15:35:10 -07:00
distributions.rst Add sample validation for LKJCholesky.log_prob (#52763) 2021-02-25 16:12:29 -08:00
dlpack.rst Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
docutils.conf Revert "Revert D21337640: [pytorch][PR] Split up documentation into subpages and clean up some warnings" (#37778) 2020-05-04 14:32:35 -07:00
fft.rst Use autosummary on torch.fft, torch.linalg (#55748) 2021-04-13 12:02:36 -07:00
futures.rst [JIT x RPC] Consolidate Future type class and Future impl class (#40406) 2020-06-24 01:44:49 -07:00
fx.rst [FX][docs] Render inherited methods in fx.Tracer API reference (#53630) 2021-03-09 14:30:41 -08:00
hub.rst Add a torch.hub.load_local() function that can load models from any local directory with a hubconf.py (#44204) 2020-09-21 14:17:21 -07:00
index.rst [package] Create API reference (#55812) 2021-04-13 09:58:45 -07:00
jit_builtin_functions.rst Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
jit_language_reference_v2.rst Remove legacy constructor calls from pytorch codebase. (#54142) 2021-04-11 15:45:17 -07:00
jit_language_reference.rst add type annotations to torch.nn.modules.conv (#49564) 2021-01-15 11:16:11 -08:00
jit_python_reference.rst [JIT] Add support for with statements (#34705) 2020-06-18 16:57:18 -07:00
jit_unsupported.rst [JIT] Update docs for recently added features (#45232) 2020-09-28 18:17:42 -07:00
jit.rst Add documentation for torch.jit.Attribute and torch.jit.annotate (#54485) 2021-03-29 14:44:53 -07:00
linalg.rst Use autosummary on torch.fft, torch.linalg (#55748) 2021-04-13 12:02:36 -07:00
math-quantizer-equation.png adding quantization.rst file for quantization feature (#27559) 2019-10-09 16:45:09 -07:00
mobile_optimizer.rst Mod lists to neutral+descriptive terms in caffe2/docs (#49803) 2020-12-23 11:37:11 -08:00
model_zoo.rst add/move a few apis in torch.hub (#18758) 2019-04-10 23:10:39 -07:00
multiprocessing.rst Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
name_inference.rst Add CSR (compressed sparse row) layout for sparse tensors (#50937) 2021-04-12 10:09:12 -07:00
named_tensor.rst Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
nn.functional.rst split nn.functional (#55038) 2021-04-07 06:35:47 -07:00
nn.init.rst Bag of documentation fixes; fix more sphinx warnings (#27850) 2019-10-15 07:31:14 -07:00
nn.rst docs: separate autosummary for flatten layers (#54663) 2021-03-29 10:23:34 -07:00
onnx.rst [ONNX] Add hardsigmoid symbolic in opset 9 #49649 (#54193) 2021-04-07 14:28:31 -07:00
optim.rst Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
package.rst [package] Minor fixes to PackageExporter docstrings (#55817) 2021-04-13 10:00:38 -07:00
pipeline.rst Add tutorials to pipeline docs. (#55209) 2021-04-05 20:01:00 -07:00
profiler.rst docs: fix profiler docstring (#55750) 2021-04-13 00:23:14 -07:00
quantization-support.rst [docs][quant] Add fx graph mode quant api doc (#55306) 2021-04-05 13:56:23 -07:00
quantization.rst [docs][quant] Fix FX Graph Mode Quantization tutorial link (#54715) 2021-03-29 17:25:19 -07:00
random.rst Remove duplicated entries in random.rst (#39725) 2020-06-10 16:51:15 -07:00
rpc.rst Add 'remote_parameters' and 'get_module_rref' to RemoteModule docs. (#54645) 2021-03-26 21:41:28 -07:00
sparse.rst Add CSR (compressed sparse row) layout for sparse tensors (#50937) 2021-04-12 10:09:12 -07:00
special.rst [special] Alias for sigmoid and logit & follow-up (#54759) 2021-04-08 00:56:59 -07:00
storage.rst Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
tensor_attributes.rst Remove legacy constructor calls from pytorch codebase. (#54142) 2021-04-11 15:45:17 -07:00
tensor_view.rst Add torch.swapdims and torch.swapaxes (#46041) 2020-11-18 11:35:53 -08:00
tensorboard.rst Add method add_hparams to API doc (#27344) 2019-10-03 17:07:45 -07:00
tensors.rst DOC: use autosummary on tensors.rst (#55042) 2021-04-08 06:44:23 -07:00
torch.nn.intrinsic.qat.rst [quantization] Add some support for 3d operations (#50003) 2021-03-10 16:40:35 -08:00
torch.nn.intrinsic.quantized.rst Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
torch.nn.intrinsic.rst [quantization] Add some support for 3d operations (#50003) 2021-03-10 16:40:35 -08:00
torch.nn.qat.rst Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
torch.nn.quantized.dynamic.rst Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
torch.nn.quantized.rst [quant] add docs for embedding/embedding_bag (#51770) 2021-02-05 11:43:15 -08:00
torch.overrides.rst Add documentation for torch.overrides submodule. (#48170) 2020-11-30 11:25:31 -08:00
torch.quantization.rst Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
torch.rst DOC: use autosummary on tensors.rst (#55042) 2021-04-08 06:44:23 -07:00
type_info.rst DOC: split quantization.rst into smaller pieces (#41321) 2020-07-25 23:59:40 -07:00