pytorch/torch/cuda
Dan Johnson d22c4cc353 Add option to use mempool on OOM (#151487)
MemPool is a separate pool of memory handled by the caching allocator. This PR adds the option to let the caching allocator use this pool as a last resort instead of OOMing, by associating a `use_on_oom` bool with each MemPool.

Usage:
Users can optionally specify a `use_on_oom` bool (False by default) during MemPool creation. If true, the CUDACachingAllocator will use memory in this pool as a last resort instead of OOMing.

```python
import torch

# `allocator` is a handle obtained from a pluggable allocator (see the sketch
# below); passing None instead backs the pool with the default allocator.
pool = torch.cuda.MemPool(allocator, use_on_oom=True)
with torch.cuda.use_mem_pool(pool):
    # 40 MiB of raw bytes; torch.randn does not support uint8, so use torch.empty
    a = torch.empty(40 * 1024 * 1024, dtype=torch.uint8, device="cuda")
del a
# At the memory limit, this allocation succeeds by drawing on the pool's
# cached memory instead of raising an OOM error.
b = torch.empty(40 * 1024 * 1024, dtype=torch.uint8, device="cuda")
```
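
For reference, a minimal sketch of how the `allocator` handle above might be constructed with `torch.cuda.memory.CUDAPluggableAllocator`; the shared-library path and exported symbol names (`my_allocator.so`, `my_malloc`, `my_free`) are placeholders for illustration, not part of this PR.

```python
import torch

# Hypothetical: a compiled shared library exporting C functions with the
# signatures CUDAPluggableAllocator expects:
#   void* my_malloc(size_t size, int device, cudaStream_t stream);
#   void  my_free(void* ptr, size_t size, int device, cudaStream_t stream);
pluggable = torch.cuda.memory.CUDAPluggableAllocator(
    "./my_allocator.so", "my_malloc", "my_free"
)
allocator = pluggable.allocator()  # the handle passed to MemPool above
```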

Testing:
```
python test/test_cuda.py -k test_mempool_limited_memory_with_allocator
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151487
Approved by: https://github.com/eqy, https://github.com/syed-ahmed, https://github.com/ngimel
2025-04-26 04:04:57 +00:00
| File | Last commit | Date |
| --- | --- | --- |
| `amp` | | |
| `__init__.py` | [ROCm] Fixes to enable VM-based MI300 CI runners (#152133) | 2025-04-25 18:06:48 +00:00 |
| `_gpu_trace.py` | | |
| `_memory_viz.py` | | |
| `_sanitizer.py` | | |
| `_utils.py` | Add torch.cuda._compile_kernel() (#151484) | 2025-04-24 07:14:31 +00:00 |
| `comm.py` | | |
| `error.py` | | |
| `gds.py` | [BE] Upgrade to mypy 1.14 (#145966) | 2025-03-04 20:58:26 +00:00 |
| `graphs.py` | Revert "Implement cuda graphs implementation of torch.cond and torch.while_loop (#140979)" | 2025-02-13 18:04:26 +00:00 |
| `jiterator.py` | | |
| `memory.py` | Add option to use mempool on OOM (#151487) | 2025-04-26 04:04:57 +00:00 |
| `nccl.py` | | |
| `nvtx.py` | | |
| `profiler.py` | | |
| `random.py` | Avoid unnecessary clone in torch.cuda.set_rng_state (#149283) | 2025-03-18 20:47:57 +00:00 |
| `sparse.py` | | |
| `streams.py` | | |
| `tunable.py` | [ROCm][TunableOp] Support submatrices in offline tuning (#151138) | 2025-04-19 04:14:27 +00:00 |