pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00


Banit Agrawal 48d18fbd4c [PyTorch CUDA Allocator] Allow reuse of non-split blocks with better rounding (#136174 )	2024-09-17 19:08:44 +00:00

Summary: This diff adds an option to round the non-split blocks in the caching allocator so that they can be reused without causing lots of fragmentation for large memory segments.

For example, if we specify the max_split memory size as 400MB, then no allocation larger than 400MB will be split. Let's say we allocated some 1024MB blocks and these are cached in the allocator. If we request a new 500MB block, we round it up to the nearest power-of-2 division, which is 512MB, and add the default kLargeBuffer of 20MB, giving 532MB. Because the cached 1024MB block is at least 532MB, it is treated as too large for this request: the 1024MB block is not used, and a new 512MB block is created instead.

In this diff, we make the kLargeBuffer rounding configurable as max_non_split_rounding_size. The check becomes 512MB + max_non_split_rounding_size; if that is greater than 1024MB, we use the cached 1024MB block and do not create a new 512MB block with cudaMalloc. This option lets us pre-allocate some large blocks, reuse them as much as possible, and avoid stalling on calls to cudaMalloc.

Differential Revision: D62758758
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136174
Approved by: https://github.com/zyan0
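The reuse rule described in this commit message can be sketched in Python. This is a simplified model built only from the description above, not the allocator's C++ code. The helper names `round_up_power2_division` and `can_reuse_cached_block` are hypothetical, the rounding helper assumes power-of-2 division rounding is enabled (the message's "nearest power-2-division" step), and the sketch only models requests at or above max_split, i.e. non-split allocations.

```python
MB = 1024 * 1024


def round_up_power2_division(size: int, divisions: int) -> int:
    """Hypothetical helper: round `size` up to the next of `divisions` equal
    steps between consecutive powers of two (e.g. with 4 divisions, sizes
    between 256MB and 512MB round up to a multiple of 64MB)."""
    if size <= 0:
        raise ValueError("size must be positive")
    floor_pow2 = 1 << (size.bit_length() - 1)
    if floor_pow2 == size:
        return size  # already a power of two
    if divisions <= 1:
        return floor_pow2 << 1  # plain next power of two
    step = max(floor_pow2 // divisions, 1)
    return -(-size // step) * step  # ceiling to a multiple of step


def can_reuse_cached_block(
    block_size: int,
    request_size: int,
    max_split_size: int,
    max_non_split_rounding_size: int = 20 * MB,  # default mirrors kLargeBuffer (20MB)
) -> bool:
    """Hypothetical helper: may a cached, non-split block serve this request?"""
    if request_size < max_split_size:
        raise ValueError("sketch only models non-split (>= max_split_size) requests")
    if block_size < request_size:
        return False  # cached block is too small
    # Reject blocks that would waste too much memory: reuse only if the block
    # is smaller than request_size + max_non_split_rounding_size.
    return block_size < request_size + max_non_split_rounding_size


if __name__ == "__main__":
    request = round_up_power2_division(500 * MB, divisions=4)
    print(request // MB)  # 512
    # Default rounding: 1024MB >= 512MB + 20MB, so a new block would be allocated.
    print(can_reuse_cached_block(1024 * MB, request, max_split_size=400 * MB))  # False
    # Larger rounding: 1024MB < 512MB + 1024MB, so the cached block is reused.
    print(can_reuse_cached_block(1024 * MB, request, 400 * MB, 1024 * MB))  # True
```

The trade-off the sketch makes visible: a larger max_non_split_rounding_size accepts more wasted memory per allocation in exchange for reusing pre-allocated large blocks instead of calling cudaMalloc.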
_static	Clean up distributed/CONTRIBUTING.md (#128450 )	2024-06-22 02:41:22 +00:00
_templates	Remove sdp_kernel and replace with sdpa_kernel in attention namespace (#114689 )	2024-01-24 22:28:04 +00:00
community	Add Alban and Piotr into Core Maintainers (#130903 )	2024-07-20 16:02:42 +00:00
elastic	DOC: add docstring to construct_and_record_rdzv_event() (#128189 )	2024-06-10 22:17:33 +00:00
notes	[PyTorch CUDA Allocator] Allow reuse of non-split blocks with better rounding (#136174 )	2024-09-17 19:08:44 +00:00
rpc	[Doc] fix some typos (found by codespell and typos) (#132544 )	2024-08-05 17:21:56 +00:00
scripts	[Doc] fix some typos (found by codespell and typos) (#132544 )	2024-08-05 17:21:56 +00:00
amp.rst	Update document for autocast on CPU (#135299 )	2024-09-13 09:11:47 +00:00
autograd.rst	Add torch.library.register_autograd (#124071 )	2024-04-18 12:47:59 +00:00
backends.rst	[sparse] Add cuSPARSELt as a backend (#128534 )	2024-08-21 22:06:07 +00:00
benchmark_utils.rst	Adding Compare in torch.utils.benchmark documentation (#125009 )	2024-05-03 00:50:54 +00:00
bottleneck.rst
checkpoint.rst	[checkpoint] Clean up selective activation checkpoint and make public (#125795 )	2024-06-18 18:18:50 +00:00
complex_numbers.rst	Document complex optimizer semantic behavior (#121667 )	2024-03-16 00:43:47 +00:00
cond.rst	[Doc] fix some typos (found by codespell and typos) (#132544 )	2024-08-05 17:21:56 +00:00
conf.py	Revert "[dtensor] move DTensor to public namespace (#133113 )"	2024-08-19 05:00:19 +00:00
config_mod.rst
cpp_extension.rst
cpp_index.rst
cpu.rst	Add current_device() to torch.cpu (#110987 )	2023-10-11 05:13:10 +00:00
cuda_environment_variables.rst	Add doc page for environment variables that effect PyTorch Runtime (#119087 )	2024-02-15 21:41:38 +00:00
cuda._sanitizer.rst
cuda.rst	Uses MemPoolContext to route allocations from CUDACachingAllocator (#134685 )	2024-08-29 03:56:31 +00:00
cuda.tunable.rst	[ROCm] TunableOp improvements (#124362 )	2024-06-03 22:30:11 +00:00
cudnn_persistent_rnn.rst
cudnn_rnn_determinism.rst
data.rst	Revert "reseed all Generators in Dataloader's _worker_loop() -- via GC (#107131 )"	2023-08-23 17:08:07 +00:00
ddp_comm_hooks.rst
debugging_environment_variables.rst	Add doc page for environment variables that effect PyTorch Runtime (#119087 )	2024-02-15 21:41:38 +00:00
deploy.rst
deterministic.rst	Add `torch.utils.deterministic.fill_uninitialized_memory` flag (#111377 )	2023-11-01 16:10:09 +00:00
distributed.algorithms.join.rst
distributed.checkpoint.rst	[Doc] fix some typos (found by codespell and typos) (#132544 )	2024-08-05 17:21:56 +00:00
distributed.elastic.rst	Reapply "distributed debug handlers (#126601 )" (#127805 )	2024-06-04 19:44:30 +00:00
distributed.optim.rst
distributed.pipelining.rst	[PP] Add ZeroBubble schedule (#133467 )	2024-08-22 13:32:15 +00:00
distributed.rst	[reland][dtensor] move DTensor to public namespace (#134203 )	2024-09-08 17:08:40 +00:00
distributed.tensor.parallel.rst	Update link in distributed.tensor.parallel.rst (#136103 )	2024-09-15 19:36:29 +00:00
distributed.tensor.rst	[reland][dtensor] move DTensor to public namespace (#134203 )	2024-09-08 17:08:40 +00:00
distributions.rst	Add inverse gamma distribution and fix `sign` bug in `PowerTransform`. (#104501 )	2023-11-01 02:26:25 +00:00
dlpack.rst
docutils.conf
export.ir_spec.rst	[export] Remove torch._export.export (#119095 )	2024-02-08 21:22:04 +00:00
export.rst	Add some doc for export_for_training (#135918 )	2024-09-15 17:08:12 +00:00
fft.rst
fsdp.rst
func.api.rst
func.batch_norm.rst
func.migrating.rst
func.rst
func.ux_limitations.rst
func.whirlwind_tour.rst
future_mod.rst	Add swap_tensors path to nn.Module._apply (#117167 )	2024-02-07 18:55:44 +00:00
futures.rst
fx.experimental.rst	Only thunkify proxies in some situations (#132421 )	2024-08-08 12:03:06 +00:00
fx.rst	Consolidate SymDispatchMode into ProxyTensorMode (#132674 )	2024-08-08 12:02:54 +00:00
hub.rst
index.rst	[reland][dtensor] move DTensor to public namespace (#134203 )	2024-09-08 17:08:40 +00:00
jit_builtin_functions.rst
jit_language_reference_v2.rst	[Doc] fix some typos (found by codespell and typos) (#132544 )	2024-08-05 17:21:56 +00:00
jit_language_reference.rst	[Doc] fix some typos (found by codespell and typos) (#132544 )	2024-08-05 17:21:56 +00:00
jit_python_reference.rst
jit_unsupported.rst	Add support for `torch.Generator` type in TorchScript (#110413 )	2023-11-21 23:07:21 +00:00
jit_utils.rst
jit.rst	Doc test non packages (#110568 )	2023-10-06 14:16:01 +00:00
library.rst	[custom ops] Add register_vmap for custom ops (#130589 )	2024-07-23 17:48:38 +00:00
linalg.rst
logging.rst	Change classification to beta for TORCH_LOGS (#118682 )	2024-01-31 21:50:55 +00:00
masked.rst	Add MaskedTensor passthrough: unfold, F.Unfold, F.Fold, stack (#125262 )	2024-09-06 19:06:23 +00:00
math-quantizer-equation.png
meta.rst	Add documentation for meta device (#119119 )	2024-02-04 01:05:22 +00:00
miscellaneous_environment_variables.rst	[RFC] Add support for device extension autoloading (#127074 )	2024-07-09 06:14:13 +00:00
mobile_optimizer.rst	Add ExecuTorch warning to mobile_optimizer (#134697 )	2024-09-04 17:47:14 +00:00
model_zoo.rst
module_tracker.rst	Add module tracker (#125352 )	2024-05-04 18:33:35 +00:00
monitor.rst
mps_environment_variables.rst	[MPS] Add mps profiler env vars to docs (#129552 )	2024-07-04 06:44:48 +00:00
mps.rst	Add support in Python API for the recommended max working set size. (#128289 )	2024-06-12 16:03:57 +00:00
mtia.rst	[MTIA] Support torch.cuda.get_device_capability equivalent API on MTIA (#135889 )	2024-09-17 17:42:56 +00:00
multiprocessing.rst	Doc test non packages (#110568 )	2023-10-06 14:16:01 +00:00
name_inference.rst	[docs] Properly link register_post_accumulate_grad_hook docs (#108157 )	2023-08-29 22:13:33 +00:00
named_tensor.rst
nested.rst
nn.attention.bias.rst	Remove sdp_kernel and replace with sdpa_kernel in attention namespace (#114689 )	2024-01-24 22:28:04 +00:00
nn.attention.flex_attention.rst	[Inductor] Added and_masks and or_masks utilities & make fully masked out rows 0 instead of nan (#131552 )	2024-07-25 21:29:46 +00:00
nn.attention.rst	Make FlexAttention API public (#130755 )	2024-07-16 16:21:25 +00:00
nn.functional.rst	Add RMSNorm module (#121364 )	2024-03-29 18:05:28 +00:00
nn.init.rst
nn.rst	Make adding Buffers more like adding Parameters (#125971 )	2024-07-31 10:32:40 +00:00
onnx_dynamo_onnxruntime_backend.rst	Follow-up #108379 (#108905 )	2023-09-09 01:38:36 +00:00
onnx_dynamo.rst	[ONNX] Improves documentation of ONNX exporter (#135372 )	2024-09-09 15:09:01 +00:00
onnx_torchscript_supported_aten_ops.rst	Refactor torch.onnx documentation (#108379 )	2023-09-08 18:23:48 +00:00
onnx_torchscript.rst	[ONNX] Remove logging apis from public (#133825 )	2024-09-13 22:19:52 +00:00
onnx.rst	[ONNX] Improves documentation of ONNX exporter (#135372 )	2024-09-09 15:09:01 +00:00
optim.rst	Make optim.swa.util content accessible from the torch.optim doc (#133393 )	2024-08-21 00:43:46 +00:00
package.rst	Doc test non packages (#110568 )	2023-10-06 14:16:01 +00:00
profiler.rst	Doc test non packages (#110568 )	2023-10-06 14:16:01 +00:00
quantization-accuracy-debugging.rst
quantization-backend-configuration.rst
quantization-support.rst	Update pt2e numeric debugger to use node.meta["custom"] field (#134040 )	2024-08-27 19:51:03 +00:00
quantization.rst	Cleanup some duplicated placeholder py:module docs (#123244 )	2024-04-05 03:18:53 +00:00
random.rst
rpc.rst
signal.rst
size.rst	Added a docstring for torch.Size.numel. (#124186 )	2024-04-19 09:23:02 +00:00
sparse.rst	SparseCsrCUDA: cuDSS backend for linalg.solve (#129856 )	2024-08-22 07:57:30 +00:00
special.rst
storage.rst
tensor_attributes.rst	Refine the logic of device construction when only device index is given (#129119 )	2024-07-15 14:34:29 +00:00
tensor_view.rst
tensorboard.rst
tensors.rst	add xpu to torch.tensors (#127280 )	2024-06-11 18:13:01 +00:00
testing.rst
threading_environment_variables.rst	Add doc page for environment variables that effect PyTorch Runtime (#119087 )	2024-02-15 21:41:38 +00:00
torch_cuda_memory.rst	Fix typo under docs directory (#110359 )	2023-10-03 16:36:05 +00:00
torch_environment_variables.rst	[Docs][MPS] Add mps environment variable table (#129008 )	2024-06-20 03:30:35 +00:00
torch_nccl_environment_variables.rst	[c10d][doc] Add docs for ENV variables TORCH_NCCL_ASYNC_ERROR_HANDLING TORCH_NCCL_TRACE_CPP_STACK and TORCH_NCCL_COORD_CHECK_MILSEC (#132920 )	2024-08-09 21:08:20 +00:00
torch.ao.ns._numeric_suite_fx.rst
torch.ao.ns._numeric_suite.rst
torch.compiler_aot_inductor.rst	[AOTI] docs: add suggestion to turn on freezing on CPU (#128010 )	2024-06-07 08:57:02 +00:00
torch.compiler_api.rst	[RFC][dynamo] add decorator to register polyfill for unsupported C++ function to avoid graph break (#133712 )	2024-08-21 06:36:41 +00:00
torch.compiler_best_practices_for_backends.rst
torch.compiler_cudagraph_trees.rst	[CUDAGraph] add more docs for cudagraph trees (#127963 )	2024-06-18 02:07:07 +00:00
torch.compiler_custom_backends.rst	Fix a link in the compiler backend doc (#126079 )	2024-05-21 20:16:04 +00:00
torch.compiler_dynamic_shapes.rst	feat: Add min, max ranges to mark_dynamic API (#119737 )	2024-03-07 23:26:03 +00:00
torch.compiler_dynamo_deepdive.rst	Stop immediately specializing common constants 0/1 for plain int (#128327 )	2024-07-03 16:41:51 +00:00
torch.compiler_dynamo_overview.rst	Rename TorchDynamo -> Dyanamo in the dynamo tutorial doc (#123431 )	2024-05-07 05:07:00 +00:00
torch.compiler_fake_tensor.rst	[BE] Reroute all uses of proxy_tensor.maybe_disable_fake_tensor_mode to fake_tensor.unset_fake_temporarily (#132770 )	2024-08-08 23:07:23 +00:00
torch.compiler_faq.rst	[dynamo] Retire CompileProfiler (#135133 )	2024-09-05 01:08:40 +00:00
torch.compiler_fine_grain_apis.rst	[Doc] fix some typos (found by codespell and typos) (#132544 )	2024-08-05 17:21:56 +00:00
torch.compiler_get_started.rst	Revert "[inductor] More fixes on the keys of `constants` and `signature` dictionaries (#135406 )"	2024-09-16 17:58:02 +00:00
torch.compiler_inductor_profiling.rst
torch.compiler_ir.rst	[export] torch.export landing page (#108783 )	2023-09-10 01:40:42 +00:00
torch.compiler_nn_module.rst	Revert "Reland 3rd try [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#109323 )" + Forward fixes + test (#110964 )	2023-10-11 05:16:47 +00:00
torch.compiler_performance_dashboard.rst
torch.compiler_profiling_torch_compile.rst	[EZ] Fix spelling typo (#136157 )	2024-09-16 19:30:30 +00:00
torch.compiler_transformations.rst	Fix typo under docs directory (#110359 )	2023-10-03 16:36:05 +00:00
torch.compiler_troubleshooting.rst	[dynamo] Retire CompileProfiler (#135133 )	2024-09-05 01:08:40 +00:00
torch.compiler.rst	add xpu to torch.compile (#127279 )	2024-06-13 21:15:09 +00:00
torch.overrides.rst	Doc test non packages (#110568 )	2023-10-06 14:16:01 +00:00
torch.rst	Autoselect default device in FSDP construction. (#127609 )	2024-08-08 05:25:17 +00:00
type_info.rst
utils.rst	New swap function (#111747 )	2023-12-08 18:49:35 +00:00
xpu.rst	[Intel GPU] Add XPU memory-related APIs (#129919 )	2024-09-07 11:15:17 +00:00