pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Frank Lin bec6541d84 [CUDA][CUDAGraph] Reduce capture overhead in CUDA Graph memory reuse (#162186)

Previous work #158352 delivered CUDAGraph memory footprint reduction with no replay-time impact, but capture time regressed (up to 20× slower) due to repeated full-graph traversals. See the previous benchmark results [here](https://github.com/pytorch/pytorch/pull/158352#issuecomment-3215947565).

This PR removes the capture/replay overhead while preserving the memory savings:

1. Terminals as free markers

   We stop inserting empty nodes and instead record the current stream terminals as free markers. This avoids mutating the user’s graph and keeps semantics unchanged.

2. Incremental, cached reachability

   We add a per-graph reuse context that caches reverse-traversal state:

   * `graph_reuse_context[graph].visited[stream]` tracks nodes already seen from that stream’s terminal frontier.
   * On each allocation during capture, we resume traversal from the latest terminals and only visit unseen nodes.
   * A block is freed when all its recorded markers are in the visited set of its allocation stream, i.e., all markers are proven predecessors of future work.

See [the performance results here](https://docs.google.com/spreadsheets/d/e/2PACX-1vRPvdd9Xa8W87ixbiA0da_qvOhrUAjUpFz0G-_j-MsDnoeRyhEa4_ut_W3rqcg1VVZVFJ-gucwov-3b/pubhtml?gid=1468302443&single=true). We sweep synthetic multi-stream CUDA Graphs built by `capture_benchmark.py` (as before, we generate a random interleaving of alloc/free/join with given probabilities; see the [gist here](https://gist.github.com/eee4017/e2092d215b1d4bd46534148939af39e3)) and compare median capture/replay times and memory. On an NVIDIA H100 PCIe across 24 configs, the optimization preserves the reserved memory reduction at ~24–98%, leaves allocated memory unchanged, and brings capture time back to baseline (range 0.96–1.04× vs. baseline) with replay time unchanged (range 0.97–1.11×).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162186
Approved by: https://github.com/eqy, https://github.com/ngimel

2025-09-30 22:28:46 +00:00
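
The incremental, cached reachability idea in the commit message can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the allocator's actual C++ code: the `ReuseContext` class, its `advance` and `can_free` methods, and the dictionary-of-predecessors graph are hypothetical names introduced here. It also assumes that capture only appends nodes, so the predecessors of an already visited node never change.

```python
from collections import defaultdict


class ReuseContext:
    """Hypothetical per-graph cache of reverse-traversal state."""

    def __init__(self):
        # stream -> set of nodes already reached from that stream's terminals
        self.visited = defaultdict(set)

    def advance(self, preds, stream, terminals):
        """Resume the reverse traversal from `terminals`.

        `preds` maps each node to its predecessor (dependency) nodes.
        Only nodes not seen before are visited; returns how many were new.
        """
        seen = self.visited[stream]
        stack = [t for t in terminals if t not in seen]
        new = 0
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            new += 1
            # A visited node's predecessors were already visited, so skip them.
            stack.extend(p for p in preds.get(node, ()) if p not in seen)
        return new

    def can_free(self, stream, markers):
        """A block is reusable once every free marker precedes future work."""
        return all(m in self.visited[stream] for m in markers)


# Toy graph: b depends on a; c is independent work on another stream.
preds = {"a": [], "b": ["a"], "c": []}
ctx = ReuseContext()

first = ctx.advance(preds, "s0", ["b"])           # visits b and a
blocked = ctx.can_free("s0", ["a", "c"])          # c is not yet a predecessor

preds["d"] = ["b", "c"]                           # a join makes c a predecessor
second = ctx.advance(preds, "s0", ["d"])          # visits only d and c
freed = ctx.can_free("s0", ["a", "c"])
```

In this model the second call touches only the two unseen nodes rather than re-walking the whole graph, which is the property the commit message credits for bringing capture time back to baseline.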
_static	[doc] Add AOTInductor intermediate debug printer OSS user manual (#163794 )	2025-09-30 03:01:03 +00:00
_templates	Migrate to new theme (#149331 )	2025-04-16 21:35:19 +00:00
accelerator	[OpenReg] Add AMP Integration guide for accelerators (#162050 )	2025-09-30 12:27:11 +00:00
community	Update persons of interest for XLA. The previous one is out of date. (#158652 )	2025-09-30 19:21:18 +00:00
compile	Add dynamic shapes doc (#159428 )	2025-09-22 21:01:27 +00:00
elastic	Support NUMA Binding for Callable Entrypoints (#160163 )	2025-08-12 20:08:49 +00:00
export	[export] Update PT2 archive docs (#162308 )	2025-09-09 02:08:13 +00:00
notes	[CUDA][CUDAGraph] Reduce capture overhead in CUDA Graph memory reuse (#162186 )	2025-09-30 22:28:46 +00:00
rpc	Fix broken URLs (#152237 )	2025-04-27 09:56:42 +00:00
scripts	[ONNX] Filter out torchscript sentences (#158850 )	2025-07-24 20:59:06 +00:00
user_guide	[OpenReg] Migrate Accelerator Document from source/notes into source/accelerator (#161845 )	2025-09-03 03:12:18 +00:00
accelerator.md	Add unified memory APIs for torch.accelerator (#152932 )	2025-08-08 17:41:22 +00:00
amp.md	[Docs] Convert to markdown: accelerator.rst, amp.rst, autograd.rst, backends.rst, benchmark_utils.rst (#155762 )	2025-06-12 02:55:06 +00:00
autograd.md	[Docs] Convert to markdown: accelerator.rst, amp.rst, autograd.rst, backends.rst, benchmark_utils.rst (#155762 )	2025-06-12 02:55:06 +00:00
backends.md	Revert "[ROCm] SDPA fix mem fault when dropout is enabled (#154864 )"	2025-08-26 20:03:59 +00:00
benchmark_utils.md	[Docs] Convert to markdown: accelerator.rst, amp.rst, autograd.rst, backends.rst, benchmark_utils.rst (#155762 )	2025-06-12 02:55:06 +00:00
checkpoint.md	Convert to markdown: checkpoint.rst (#156009 )	2025-06-16 17:48:23 +00:00
complex_numbers.md	Convert complex_numbers.rst to markdown (#156039 )	2025-06-16 17:24:37 +00:00
cond.md	[Docs] Fix indentations in cond.md (#156147 )	2025-09-21 05:50:50 +00:00
conf.py	Revert "[RELAND] Always build USE_DISTRIBUTED (#160449 ) and Make distributed modules importable even when backend not built (#159889 ) (#162594 )"	2025-09-25 13:47:46 +00:00
config_mod.md	[Docs] Convert to markdown cond.rst, config_mod.rst (#155653 )	2025-06-13 20:58:57 +00:00
cpp_extension.rst	xpu: support sycl with torch.utils.cpp_extension APIs (#132945 )	2025-02-16 16:50:59 +00:00
cpp_index.rst	[3/n] Remove references to TorchScript in PyTorch docs (#158315 )	2025-07-15 21:14:18 +00:00
cpu.rst
cuda_environment_variables.rst
cuda._sanitizer.rst
cuda.aliases.md	[BE] Adding aliases for CUDA and XPU API documentation (#162984 )	2025-09-21 22:28:27 +00:00
cuda.md	[BE] Adding aliases for CUDA and XPU API documentation (#162984 )	2025-09-21 22:28:27 +00:00
cuda.tunable.md	Fix #155016 for Docathon - convert rst to markdown (#155198 )	2025-06-13 20:24:34 +00:00
cudnn_persistent_rnn.rst
cudnn_rnn_determinism.rst	Fix broken URLs (#152237 )	2025-04-27 09:56:42 +00:00
data.md	Fix #155016 for Docathon - convert rst to markdown (#155198 )	2025-06-13 20:24:34 +00:00
ddp_comm_hooks.md	DOC: Convert to markdown: ddp_comm_hooks.rst, debugging_environment_variables.rst, deploy.rst, deterministic.rst, distributed.algorithms.join.rst (#155298 )	2025-06-06 22:44:50 +00:00
debugging_environment_variables.md	DOC: Convert to markdown: ddp_comm_hooks.rst, debugging_environment_variables.rst, deploy.rst, deterministic.rst, distributed.algorithms.join.rst (#155298 )	2025-06-06 22:44:50 +00:00
deterministic.md	DOC: Convert to markdown: ddp_comm_hooks.rst, debugging_environment_variables.rst, deploy.rst, deterministic.rst, distributed.algorithms.join.rst (#155298 )	2025-06-06 22:44:50 +00:00
distributed._dist2.md	Add a title to distributed._dist2.md (#159385 )	2025-07-30 04:09:41 +00:00
distributed.algorithms.join.md	DOC: Convert to markdown: ddp_comm_hooks.rst, debugging_environment_variables.rst, deploy.rst, deterministic.rst, distributed.algorithms.join.rst (#155298 )	2025-06-06 22:44:50 +00:00
distributed.checkpoint.md	[DCP][HuggingFace] Add Support for dequantization of SafeTensors checkpoints (#160682 )	2025-09-04 01:09:53 +00:00
distributed.elastic.md	NUMA binding integration with elastic agent and torchrun (#149334 )	2025-07-25 21:19:49 +00:00
distributed.fsdp.fully_shard.md	[FSDP2] explain user contract for fully_shard (#156070 )	2025-06-17 10:03:19 +00:00
distributed.md	[DCP][HuggingFace] Add Support for dequantization of SafeTensors checkpoints (#160682 )	2025-09-04 01:09:53 +00:00
distributed.optim.md	Fix #155018 (convert distributed rst to markdown) (#155528 )	2025-06-16 20:46:09 +00:00
distributed.pipelining.md	[PP] Add DualPipeV schedule (#159591 )	2025-08-14 14:58:35 +00:00
distributed.tensor.md	[DTensor] Add guide for what to do about mixed torch.Tensor and DTensor operations (#162651 )	2025-09-18 06:41:02 +00:00
distributed.tensor.parallel.md	fix Dtensor doc link (#162494 )	2025-09-09 22:10:37 +00:00
distributions.md	Convert to markdown: distributed.tensor.parallel.rst, distributed.tensor.rst, distributions.rst, dlpack.rst (#155297 )	2025-06-13 22:08:37 +00:00
dlpack.md	Convert to markdown: distributed.tensor.parallel.rst, distributed.tensor.rst, distributions.rst, dlpack.rst (#155297 )	2025-06-13 22:08:37 +00:00
docutils.conf
export.md	Docs on export joint with descriptors (#159006 )	2025-09-06 03:02:58 +00:00
fft.md	Convert to .md: draft_export.rst, export.ir_spec.rst, fft.rst (#155567 )	2025-06-13 05:19:43 +00:00
fsdp.md	Convert rst files to md (#155369 )	2025-06-11 23:00:52 +00:00
func.api.md	Convert rst files to md (#155369 )	2025-06-11 23:00:52 +00:00
func.batch_norm.md	Convert rst files to md (#155369 )	2025-06-11 23:00:52 +00:00
func.md	Convert rst files to md (#155369 )	2025-06-11 23:00:52 +00:00
func.migrating.md	Convert rst files to md (#155369 )	2025-06-11 23:00:52 +00:00
func.ux_limitations.md	Fix #155022 rst to markdown conversion (#155540 )	2025-06-12 00:21:22 +00:00
func.whirlwind_tour.md	Fix #155022 rst to markdown conversion (#155540 )	2025-06-12 00:21:22 +00:00
future_mod.md	Fix #155022 rst to markdown conversion (#155540 )	2025-06-12 00:21:22 +00:00
futures.md	Fix #155022 rst to markdown conversion (#155540 )	2025-06-12 00:21:22 +00:00
fx.experimental.md	[dynamic shapes] DynamicInts prototype (#162194 )	2025-09-18 23:26:28 +00:00
fx.md	Preserve user annotation in graph (#163673 )	2025-09-25 15:50:15 +00:00
hub.md	Convert hub.rst to hub.md (#155483 )	2025-06-13 04:39:55 +00:00
index.md	Add placeholder for the User Guide (#159379 )	2025-08-13 14:56:04 +00:00
jit_builtin_functions.rst	[4/n] Remove references to TorchScript in PyTorch docs (#158317 )	2025-07-16 20:01:34 +00:00
jit_language_reference_v2.md	[1/n] Remove references to TorchScript in PyTorch docs (#158305 )	2025-07-15 20:16:53 +00:00
jit_language_reference.md	[2/n] Remove references to TorchScript in PyTorch docs (#158306 )	2025-07-15 20:57:23 +00:00
jit_python_reference.md	[3/n] Remove references to TorchScript in PyTorch docs (#158315 )	2025-07-15 21:14:18 +00:00
jit_unsupported.md	[4/n] Remove references to TorchScript in PyTorch docs (#158317 )	2025-07-16 20:01:34 +00:00
jit_utils.md	Convert to markdown: jit_python_reference.rst, jit_unsupported.rst, jit_utils.rst, library.rst (#155404 )	2025-06-26 21:09:46 +00:00
jit.rst	[4/n] Remove references to TorchScript in PyTorch docs (#158317 )	2025-07-16 20:01:34 +00:00
library.md	Add utility to get computed kernel in torch.library (#158393 )	2025-08-13 21:00:59 +00:00
linalg.md	[Docs] Convert to markdown to fix 155025 (#155789 )	2025-06-17 15:08:14 +00:00
logging.md	[Docs] Convert to markdown to fix 155025 (#155789 )	2025-06-17 15:08:14 +00:00
masked.md	[Docs] Convert to markdown to fix 155025 (#155789 )	2025-06-17 15:08:14 +00:00
math-quantizer-equation.png
meta.md	[Docs] Convert to markdown to fix 155025 (#155789 )	2025-06-17 15:08:14 +00:00
miscellaneous_environment_variables.md	[Docs] Convert to markdown to fix 155025 (#155789 )	2025-06-17 15:08:14 +00:00
mobile_optimizer.md	DOC: Convert to markdown: mobile_optimizer.rst, model_zoo.rst, module_tracker.rst, monitor.rst, mps_environment_variables.rst (#155702 )	2025-06-11 22:16:04 +00:00
model_zoo.md	DOC: Convert to markdown: mobile_optimizer.rst, model_zoo.rst, module_tracker.rst, monitor.rst, mps_environment_variables.rst (#155702 )	2025-06-11 22:16:04 +00:00
module_tracker.md	DOC: Convert to markdown: mobile_optimizer.rst, model_zoo.rst, module_tracker.rst, monitor.rst, mps_environment_variables.rst (#155702 )	2025-06-11 22:16:04 +00:00
monitor.md	DOC: Convert to markdown: mobile_optimizer.rst, model_zoo.rst, module_tracker.rst, monitor.rst, mps_environment_variables.rst (#155702 )	2025-06-11 22:16:04 +00:00
mps_environment_variables.md	DOC: Convert to markdown: mobile_optimizer.rst, model_zoo.rst, module_tracker.rst, monitor.rst, mps_environment_variables.rst (#155702 )	2025-06-11 22:16:04 +00:00
mps.md	Fix/issue #155027 (#155252 )	2025-06-08 21:17:31 +00:00
mtia.md	[BE] Add Documentation for Device APIs (#162834 )	2025-09-16 17:01:06 +00:00
mtia.memory.md	[Re-land][Inductor] Support native Inductor as backend for MTIA (#159211 )	2025-07-29 17:03:24 +00:00
multiprocessing.md	Fix/issue #155027 (#155252 )	2025-06-08 21:17:31 +00:00
name_inference.md	Fix/issue #155027 (#155252 )	2025-06-08 21:17:31 +00:00
named_tensor.md	Convert to markdown: named_tensor.rst, nested.rst, nn.attention.bias.rst, nn.attention.experimental.rst, nn.attention.flex_attention.rst #155028 (#155696 )	2025-06-14 03:32:00 +00:00
nativert.rst	[nativert] aoti (#162353 )	2025-09-12 05:56:25 +00:00
nested.md	Convert to markdown: named_tensor.rst, nested.rst, nn.attention.bias.rst, nn.attention.experimental.rst, nn.attention.flex_attention.rst #155028 (#155696 )	2025-06-14 03:32:00 +00:00
nn.aliases.md	[BE] Use .md instead of .rst for nn.aliases doc (#158666 )	2025-07-25 22:03:55 +00:00
nn.attention.bias.md	Convert to markdown: named_tensor.rst, nested.rst, nn.attention.bias.rst, nn.attention.experimental.rst, nn.attention.flex_attention.rst #155028 (#155696 )	2025-06-14 03:32:00 +00:00
nn.attention.experimental.md	Convert to markdown: named_tensor.rst, nested.rst, nn.attention.bias.rst, nn.attention.experimental.rst, nn.attention.flex_attention.rst #155028 (#155696 )	2025-06-14 03:32:00 +00:00
nn.attention.flex_attention.md	[BC Breaking] Remove flex + njt code paths (#161734 )	2025-09-16 00:13:56 +00:00
nn.attention.rst
nn.functional.rst
nn.init.rst
nn.rst	[BE] More torch.nn docs coverage test (except for torch.nn.parallel) (#158654 )	2025-07-25 22:03:55 +00:00
notes.md	Migrate to new theme (#149331 )	2025-04-16 21:35:19 +00:00
onnx_export.md	[ONNX] Remove enable_fake_mode and exporter_legacy (#161222 )	2025-08-22 22:15:27 +00:00
onnx_ops.md	[ONNX] Implement Attention-23 (#156431 )	2025-06-20 23:54:57 +00:00
onnx_testing.md	[ONNX] Expose the testing module (#162495 )	2025-09-10 01:40:24 +00:00
onnx_verification.md	[ONNX] Refactor torchscript based exporter (#161323 )	2025-09-02 16:10:30 +00:00
onnx.md	[ONNX] Expose the testing module (#162495 )	2025-09-10 01:40:24 +00:00
optim.aliases.md	Document the rest of the specific optimizer module APIs (#158669 )	2025-07-19 07:27:15 +00:00
optim.md	[muon] Introduce Muon optimizer to PyTorch (#160213 )	2025-08-24 08:03:04 +00:00
package.md	[3/n] Remove references to TorchScript in PyTorch docs (#158315 )	2025-07-15 21:14:18 +00:00
profiler.md	Convert rst to markdown - profiler.rst #155031 (#155559 )	2025-06-13 05:02:54 +00:00
pytorch-api.md	[BE] Remove bottleneck (#163210 )	2025-09-18 12:08:13 +00:00
quantization-support.md	Convert to markdown: quantization-accuracy-debugging.rst, quantization-backend-configuration.rst, quantization-support.rst, random.rst (#155520 )	2025-06-18 18:46:04 +00:00
quantization.rst	Remove the uncessary empty file (#160728 )	2025-08-19 10:54:08 +00:00
random.md	Convert to markdown: quantization-accuracy-debugging.rst, quantization-backend-configuration.rst, quantization-support.rst, random.rst (#155520 )	2025-06-18 18:46:04 +00:00
rpc.md	RPC tutorial audit (#157938 )	2025-07-10 14:15:37 +00:00
signal.md	Convert rst to md: rpc.rst, signal.rst, size.rst, special.rst (#155430 )	2025-06-18 01:27:04 +00:00
size.md	Convert rst to md: rpc.rst, signal.rst, size.rst, special.rst (#155430 )	2025-06-18 01:27:04 +00:00
sparse.rst	[BE] fix typos in docs/ (#156080 )	2025-06-21 02:47:32 +00:00
special.md	Convert rst to md: rpc.rst, signal.rst, size.rst, special.rst (#155430 )	2025-06-18 01:27:04 +00:00
storage.rst	Super tiny fix typo (#151212 )	2025-04-14 16:47:40 +00:00
tensor_attributes.rst	revamp dtype documentation for 2025 (#156087 )	2025-06-27 13:10:23 +00:00
tensor_view.rst	[docs] fix numpy docs reference (#147697 )	2025-02-26 01:30:03 +00:00
tensorboard.rst
tensors.rst	revamp dtype documentation for 2025 (#156087 )	2025-06-27 13:10:23 +00:00
testing.md	Convert to markdown: testing.rst, threading_environment_variables.rst, torch_cuda_memory.rst, torch_environment_variables.rst, torch_nccl_environment_variables.rst (#155523 )	2025-06-10 20:38:36 +00:00
threading_environment_variables.md	Convert to markdown: testing.rst, threading_environment_variables.rst, torch_cuda_memory.rst, torch_environment_variables.rst, torch_nccl_environment_variables.rst (#155523 )	2025-06-10 20:38:36 +00:00
torch_cuda_memory.md	Fixes broken memory_viz link in CUDA memory docs (#161426 )	2025-09-02 02:06:54 +00:00
torch_environment_variables.md	Convert to markdown: testing.rst, threading_environment_variables.rst, torch_cuda_memory.rst, torch_environment_variables.rst, torch_nccl_environment_variables.rst (#155523 )	2025-06-10 20:38:36 +00:00
torch_nccl_environment_variables.md	Convert to markdown: testing.rst, threading_environment_variables.rst, torch_cuda_memory.rst, torch_environment_variables.rst, torch_nccl_environment_variables.rst (#155523 )	2025-06-10 20:38:36 +00:00
torch.aliases.md	Remove torch.functional entries from the doc ignore list (#158581 )	2025-07-25 17:19:01 +00:00
torch.compiler_aot_inductor_debugging_guide.md	[doc] AOTI debugging guide (#160430 )	2025-08-14 23:42:17 +00:00
torch.compiler_aot_inductor_minifier.md	Converting .rst files to .md files (#155377 )	2025-06-13 22:54:27 +00:00
torch.compiler_aot_inductor.md	[AOTI] Update AOTInductor tutorial (#163808 )	2025-09-26 22:01:31 +00:00
torch.compiler_api.md	Implement guard collectives (optimized version) (#156562 )	2025-06-24 04:59:49 +00:00
torch.compiler_backward.md	Add AOTDispatcher config to set backward autocast behavior (#156356 )	2025-06-27 14:58:58 +00:00
torch.compiler_cudagraph_trees.md	[Graph Partition] add graph partition doc (#159450 )	2025-07-30 17:01:10 +00:00
torch.compiler_custom_backends.md	DOC: Convert to markdown: torch.compiler_best_practices_for_backends.rst, torch.compiler_cudagraph_trees.rst, torch.compiler_custom_backends.rst, torch.compiler_dynamic_shapes.rst, torch.compiler_dynamo_deepdive.rst (#155137 )	2025-06-10 20:51:05 +00:00
torch.compiler_dynamic_shapes.md	Adjust ...mark_unbacked() -> ...decorators.mark_unbacked() in logs. (#164131 )	2025-09-29 17:44:00 +00:00
torch.compiler_dynamo_deepdive.md	Dynamo Deep Dive Documentation Fix (#158860 )	2025-08-12 08:53:33 +00:00
torch.compiler_dynamo_overview.md	convert: rst to myst pr 1/2 (#155840 )	2025-06-13 18:02:28 +00:00
torch.compiler_fake_tensor.md	convert: rst to myst pr 1/2 (#155840 )	2025-06-13 18:02:28 +00:00
torch.compiler_faq.md	convert: rst to myst pr2/2 (#155911 )	2025-06-16 00:44:44 +00:00
torch.compiler_fine_grain_apis.md	convert: rst to myst pr2/2 (#155911 )	2025-06-16 00:44:44 +00:00
torch.compiler_get_started.md	convert: rst to myst pr2/2 (#155911 )	2025-06-16 00:44:44 +00:00
torch.compiler_inductor_profiling.md	Convert compiler rst files to markdown (#155335 )	2025-06-10 01:12:11 +00:00
torch.compiler_inductor_provenance.rst	Add to inductor provenance tracking doc (#162975 )	2025-09-16 19:09:06 +00:00
torch.compiler_ir.md	[export] Update docs (#157750 )	2025-07-16 19:53:12 +00:00
torch.compiler_nn_module.md	Convert compiler rst files to markdown (#155335 )	2025-06-10 01:12:11 +00:00
torch.compiler_performance_dashboard.md	Convert compiler rst files to markdown (#155335 )	2025-06-10 01:12:11 +00:00
torch.compiler_profiling_torch_compile.md	[Docs] Update PT2 Profiler Torch-Compiled Region Image (#158066 )	2025-07-11 07:56:45 +00:00
torch.compiler_transformations.md	[Docs] Convert to markdown: torch.compiler_transformations.rst, torch.compiler.config.rst (#155347 )	2025-06-11 18:55:30 +00:00
torch.compiler_troubleshooting_old.md	Add torch compile force disable caches alias (#158072 )	2025-08-02 23:23:17 +00:00
torch.compiler_troubleshooting.md	[doc] AOTI debugging guide (#160430 )	2025-08-14 23:42:17 +00:00
torch.compiler.config.md	[Docs] Convert to markdown: torch.compiler_transformations.rst, torch.compiler.config.rst (#155347 )	2025-06-11 18:55:30 +00:00
torch.compiler.md	Add dynamic shapes doc (#159428 )	2025-09-22 21:01:27 +00:00
torch.intermediate_debug_printer.md	[doc] Add AOTInductor intermediate debug printer OSS user manual (#163794 )	2025-09-30 03:01:03 +00:00
torch.overrides.md	DOC: Convert to markdown: torch.overrides.rst, type_info.rst, utils.rst, xpu.rst (#155088 )	2025-06-06 20:16:13 +00:00
torch.rst	Remove torch.functional entries from the doc ignore list (#158581 )	2025-07-25 17:19:01 +00:00
type_info.md	finfo eps doc fix (#160502 )	2025-08-14 01:49:35 +00:00
utils.md	Rename to _debug_mode.py to make it private (#163534 )	2025-09-23 04:27:10 +00:00
xpu.aliases.md	[BE] Adding aliases for CUDA and XPU API documentation (#162984 )	2025-09-21 22:28:27 +00:00
xpu.md	[BE] Adding aliases for CUDA and XPU API documentation (#162984 )	2025-09-21 22:28:27 +00:00