pytorch/docs/source
Michael Wootton 2f3be2735f Don't split oversize cached blocks (#44742)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35901

This change is designed to prevent fragmentation in the Caching Allocator.  Permissive block splitting in the allocator allows very large blocks to be split into many pieces.  Once split too finely it is unlikely all pieces will be 'free' at that same time so the original allocation can never be returned.   Anecdotally, we've seen a model run out of memory failing to alloc a 50 MB block on a 32 GB card while the caching allocator is holding 13 GB of 'split free blocks'

Approach:

- Large blocks above a certain size are designated "oversize".  This limit is currently set 1 decade above large, 200 MB
- Oversize blocks can not be split
- Oversize blocks must closely match the requested size (e.g. a 200 MB request will match an existing 205 MB block, but not a 300 MB block)
- In lieu of splitting oversize blocks there is a mechanism to quickly free a single oversize block (to the system allocator) to allow an appropriate size block to be allocated.  This will be activated under memory pressure and will prevent _release_cached_blocks()_ from triggering

Initial performance tests show this is similar or quicker than the original strategy.  Additional tests are ongoing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44742

Reviewed By: zou3519

Differential Revision: D29186394

Pulled By: ezyang

fbshipit-source-id: c88918836db3f51df59de6d1b3e03602ebe306a9
2021-06-21 11:46:08 -07:00
..
_static DOC Improve documentation for LayerNorm (#59178) 2021-06-07 14:34:10 -07:00
_templates Remove master documentation from being indexable by search engines (#58056) 2021-05-18 06:20:09 -07:00
community Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
elastic [torch/elastic] Update the rendezvous docs (#58160) 2021-05-12 16:54:28 -07:00
notes Don't split oversize cached blocks (#44742) 2021-06-21 11:46:08 -07:00
rpc Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
scripts Add mish activation function (#58648) 2021-05-25 10:36:21 -07:00
__config__.rst Fix __config__ docs (#48557) 2020-11-29 23:57:06 -08:00
amp.rst Moves grid_sampler to autocast promote list (#58618) 2021-06-21 10:22:36 -07:00
autograd.rst Add no-grad inference mode note (#58513) 2021-05-25 13:06:54 -07:00
backends.rst Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
benchmark_utils.rst Expand benchmark utils docs (#51664) 2021-02-04 00:22:41 -08:00
bottleneck.rst
checkpoint.rst
complex_numbers.rst Abladawood patch 1 (#58496) 2021-05-20 10:32:18 -07:00
conf.py Use proper Google Analytics id (#56578) 2021-05-04 13:23:16 -07:00
cpp_extension.rst correct some cpp extension code usages and documents (#39766) 2020-06-10 08:31:22 -07:00
cpp_index.rst Add C++ Landing Page (#38450) 2020-05-14 16:02:01 -07:00
cuda.rst breakup optim, cuda documentation (#55673) 2021-04-14 12:44:00 -07:00
cudnn_persistent_rnn.rst Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
cudnn_rnn_determinism.rst Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
data.rst [DataLoader][doc] Randomness for base_seed generator and NumPy seed (#56528) 2021-04-22 09:40:45 -07:00
ddp_comm_hooks.rst [Gradient Compression] Remove unnecessary warning on the rst file and the check on C++ version (#58170) 2021-05-12 14:15:10 -07:00
distributed.elastic.rst [1/n][torch/elastic] Move torchelastic docs *.rst (#148) 2021-05-04 00:57:56 -07:00
distributed.optim.rst [Reland] Update and expose ZeroRedundancyOptimizer docs (#53112) 2021-03-02 14:16:12 -08:00
distributed.rst [reland] Document debugability features in torch.distributed (#59726) 2021-06-09 16:40:11 -07:00
distributions.rst Add sample validation for LKJCholesky.log_prob (#52763) 2021-02-25 16:12:29 -08:00
dlpack.rst Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
docutils.conf Revert "Revert D21337640: [pytorch][PR] Split up documentation into subpages and clean up some warnings" (#37778) 2020-05-04 14:32:35 -07:00
fft.rst Use autosummary on torch.fft, torch.linalg (#55748) 2021-04-13 12:02:36 -07:00
futures.rst Update docs to mention CUDA support for Future (#50048) 2021-05-11 08:26:33 -07:00
fx.rst [FX][docs][EZ] Fix link to fuser example (#59670) 2021-06-08 17:32:55 -07:00
hub.rst Add a torch.hub.load_local() function that can load models from any local directory with a hubconf.py (#44204) 2020-09-21 14:17:21 -07:00
index.rst add torch.testing to docs (#57247) 2021-05-07 09:16:39 -07:00
jit_builtin_functions.rst Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
jit_language_reference_v2.rst Fix hasattr support type (#57950) 2021-05-10 12:21:56 -07:00
jit_language_reference.rst add type annotations to torch.nn.modules.conv (#49564) 2021-01-15 11:16:11 -08:00
jit_python_reference.rst [JIT] improve documentation (#57991) 2021-05-19 11:47:32 -07:00
jit_unsupported.rst [JIT] Update docs for recently added features (#45232) 2020-09-28 18:17:42 -07:00
jit.rst Remove caption for Lang Reference (#56526) 2021-04-20 14:33:42 -07:00
linalg.rst Add torch.linalg.inv_ex without checking for errors by default (#58039) 2021-05-13 09:42:15 -07:00
math-quantizer-equation.png
mobile_optimizer.rst Mod lists to neutral+descriptive terms in caffe2/docs (#49803) 2020-12-23 11:37:11 -08:00
model_zoo.rst
multiprocessing.rst Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
name_inference.rst Abladawood patch 1 (#58496) 2021-05-20 10:32:18 -07:00
named_tensor.rst Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
nn.functional.rst Add mish activation function (#58648) 2021-05-25 10:36:21 -07:00
nn.init.rst
nn.rst ENH Adds nn.ReflectionPad3d (#59791) 2021-06-21 10:53:14 -07:00
onnx.rst Update Autograd Export Docs (#56594) (#59534) 2021-06-15 12:23:00 -07:00
optim.rst To add single and chained learning schedulers to docs (#56705) 2021-04-23 09:36:00 -07:00
package.rst [package] fix tutorial link (#60113) 2021-06-16 11:27:25 -07:00
pipeline.rst Add tutorials to pipeline docs. (#55209) 2021-04-05 20:01:00 -07:00
profiler.rst docs: fix profiler docstring (#55750) 2021-04-13 00:23:14 -07:00
quantization-support.rst [docs][quant] Add fx graph mode quant api doc (#55306) 2021-04-05 13:56:23 -07:00
quantization.rst quantization: improve documentation on natively supported backends (#58925) 2021-06-07 17:29:03 -07:00
random.rst Remove duplicated entries in random.rst (#39725) 2020-06-10 16:51:15 -07:00
rpc.rst Add a disclaimer about limited CUDA support in RPC (#58023) 2021-05-12 00:11:22 -07:00
sparse.rst Add CSR (compressed sparse row) layout for sparse tensors (#50937) 2021-04-12 10:09:12 -07:00
special.rst [special] Add special.ndtri (#58650) 2021-06-19 18:36:54 -07:00
storage.rst Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
tensor_attributes.rst Remove legacy constructor calls from pytorch codebase. (#54142) 2021-04-11 15:45:17 -07:00
tensor_view.rst Conjugate View (#54987) 2021-06-04 14:12:41 -07:00
tensorboard.rst
tensors.rst Revert D28994140: [pytorch][PR] Implemented torch.cov 2021-06-13 02:33:37 -07:00
testing.rst add torch.testing to docs (#57247) 2021-05-07 09:16:39 -07:00
torch.nn.intrinsic.qat.rst [quantization] Add some support for 3d operations (#50003) 2021-03-10 16:40:35 -08:00
torch.nn.intrinsic.quantized.rst Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
torch.nn.intrinsic.rst [quantization] Add some support for 3d operations (#50003) 2021-03-10 16:40:35 -08:00
torch.nn.qat.rst Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
torch.nn.quantized.dynamic.rst Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
torch.nn.quantized.rst [quant] add docs for embedding/embedding_bag (#51770) 2021-02-05 11:43:15 -08:00
torch.overrides.rst Add documentation for torch.overrides submodule. (#48170) 2020-11-30 11:25:31 -08:00
torch.quantization.rst Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00
torch.rst Implementation of torch.isin() (#53125) 2021-06-14 13:50:53 -07:00
type_info.rst DOC: split quantization.rst into smaller pieces (#41321) 2020-07-25 23:59:40 -07:00