pytorch/docs/source
Marko Radmilac 945e359fc1 Initial implementation of host memory stats (#147660)
This is an initial attempt to provide some statistics for the pinned host memory allocations flowing through CachingHostAllocator. Many times in the past we have had inexplicable slowdowns that would have been much easier to diagnose if host memory statistics had been available.

This change tries hard not to disrupt the original design of the allocator, and wherever possible it uses the existing locking mechanism to gather statistics "for free". The only deviation is on the "slow path", where we incur CUDA calls anyway, so taking a short lock should not hurt performance much, especially in the steady state, where most allocations come from the cache.

As mentioned above, this is a first PR, meant to introduce the concept and to see whether it fits the right paradigm. We can always add more later.

Metrics that would require more involved changes to the code base and its locks, such as requested memory, have been punted for now. I also tried to reuse the Stat structure used in the CUDA caching allocator, in order to maintain symmetry.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147660
Approved by: https://github.com/ngimel
2025-02-28 18:36:44 +00:00
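
The commit message above mentions reusing the Stat structure from the CUDA caching allocator and updating it under locks the allocator already holds. The following is a minimal Python sketch of that idea, not the PR's actual C++ code. The four counters (current, peak, allocated, freed) are modeled on the allocator's Stat; the `HostStats` class, its field names, and its methods are hypothetical and exist only for illustration.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Stat:
    """Illustrative counter with current, peak, and accumulated totals."""
    current: int = 0    # value right now
    peak: int = 0       # highest value of `current` since the last peak reset
    allocated: int = 0  # running total of all increases
    freed: int = 0      # running total of all decreases

    def increase(self, amount: int) -> None:
        self.current += amount
        self.peak = max(self.peak, self.current)
        self.allocated += amount

    def decrease(self, amount: int) -> None:
        self.current -= amount
        self.freed += amount

    def reset_accumulated(self) -> None:
        self.allocated = 0
        self.freed = 0

    def reset_peak(self) -> None:
        self.peak = self.current


@dataclass
class HostStats:
    """Hypothetical container: one Stat for block counts, one for bytes."""
    allocations: Stat = field(default_factory=Stat)
    allocated_bytes: Stat = field(default_factory=Stat)
    lock: threading.Lock = field(default_factory=threading.Lock)

    def record_alloc(self, nbytes: int) -> None:
        # In the real allocator the update would piggyback on a lock that is
        # already held; a dedicated lock stands in for that here.
        with self.lock:
            self.allocations.increase(1)
            self.allocated_bytes.increase(nbytes)

    def record_free(self, nbytes: int) -> None:
        with self.lock:
            self.allocations.decrease(1)
            self.allocated_bytes.decrease(nbytes)


stats = HostStats()
stats.record_alloc(1024)
stats.record_alloc(512)
stats.record_free(1024)
print(stats.allocated_bytes)
```

Keeping the peak next to the current value is what allows a later inspection to tell a transient spike in pinned memory from steady growth, without logging every allocation.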
_static Update OSS nested tensor docs to focus on NJT (#145402) 2025-01-25 04:08:19 +00:00
_templates Remove link to search survey (#147751) 2025-02-25 19:26:59 +00:00
community Add @nikitaved to torch.linalg CODEOWNERS/persons_of_interest (#141803) 2025-02-04 16:11:31 +00:00
elastic DOC: add docstring to construct_and_record_rdzv_event() (#128189) 2024-06-10 22:17:33 +00:00
notes Udpate hw requirement for FP64 on "Getting Started on Intel GPU" (#147802) 2025-02-27 01:54:19 +00:00
rpc [Doc] fix some typos (found by codespell and typos) (#132544) 2024-08-05 17:21:56 +00:00
scripts [BE][CI] bump ruff to 0.9.0: string quote styles (#144569) 2025-02-24 19:56:09 +00:00
accelerator.rst [BE][accelerator] formalize API name {current,set}_device_{idx => index} (#140542) 2024-12-12 10:53:48 +00:00
amp.rst Update document for autocast on CPU (#135299) 2024-09-13 09:11:47 +00:00
autograd.rst Add torch.library.register_autograd (#124071) 2024-04-18 12:47:59 +00:00
backends.rst Revert "Reverting the PR adding Kleidiai-based int4 kernels (#145392)" (#145505) 2025-01-23 18:50:59 +00:00
benchmark_utils.rst Adding Compare in torch.utils.benchmark documentation (#125009) 2024-05-03 00:50:54 +00:00
bottleneck.rst
checkpoint.rst [checkpoint] Clean up selective activation checkpoint and make public (#125795) 2024-06-18 18:18:50 +00:00
complex_numbers.rst Document complex optimizer semantic behavior (#121667) 2024-03-16 00:43:47 +00:00
cond.rst [Doc] fix some typos (found by codespell and typos) (#132544) 2024-08-05 17:21:56 +00:00
conf.py Initial implementation of host memory stats (#147660) 2025-02-28 18:36:44 +00:00
config_mod.rst
cpp_extension.rst xpu: support sycl with torch.utils.cpp_extension APIs (#132945) 2025-02-16 16:50:59 +00:00
cpp_index.rst
cpu.rst Add current_device() to torch.cpu (#110987) 2023-10-11 05:13:10 +00:00
cuda_environment_variables.rst Add doc page for environment variables that effect PyTorch Runtime (#119087) 2024-02-15 21:41:38 +00:00
cuda._sanitizer.rst
cuda.rst Initial implementation of host memory stats (#147660) 2025-02-28 18:36:44 +00:00
cuda.tunable.rst [ROCm] Fix TunableOp UTs: Rotating Buffer (#143172) 2024-12-14 06:18:11 +00:00
cudnn_persistent_rnn.rst
cudnn_rnn_determinism.rst
data.rst Revert "reseed all Generators in Dataloader's _worker_loop() -- via GC (#107131)" 2023-08-23 17:08:07 +00:00
ddp_comm_hooks.rst
debugging_environment_variables.rst Add doc page for environment variables that effect PyTorch Runtime (#119087) 2024-02-15 21:41:38 +00:00
deploy.rst
deterministic.rst Add torch.utils.deterministic.fill_uninitialized_memory flag (#111377) 2023-11-01 16:10:09 +00:00
distributed.algorithms.join.rst
distributed.checkpoint.rst [DCP] Cross-link DCP doc to tutorials (#139776) 2024-11-07 02:19:49 +00:00
distributed.elastic.rst Reapply "distributed debug handlers (#126601)" (#127805) 2024-06-04 19:44:30 +00:00
distributed.fsdp.fully_shard.rst [FSDP2] Move to public torch.distributed.fsdp (#141868) 2024-12-07 01:24:28 +00:00
distributed.optim.rst
distributed.pipelining.rst [pipelining] Update tutorials and documentation (#143045) 2024-12-12 18:42:17 +00:00
distributed.rst [C10D] Update docs for wait() (#143305) 2024-12-17 00:41:11 +00:00
distributed.tensor.parallel.rst Update link in distributed.tensor.parallel.rst (#136103) 2024-09-15 19:36:29 +00:00
distributed.tensor.rst [dtensor] expose the __create_chunk_list__ in the doc (#144100) 2025-01-03 20:06:23 +00:00
distributions.rst Add inverse gamma distribution and fix sign bug in PowerTransform. (#104501) 2023-11-01 02:26:25 +00:00
dlpack.rst
docutils.conf
export.ir_spec.rst [export] Update docs (#142011) 2024-12-05 03:44:46 +00:00
export.programming_model.rst fix formatting in programming model doc (#143587) 2024-12-20 07:09:19 +00:00
export.rst [docs] Minor fixes to export and aoti docs (#144513) 2025-02-13 15:19:35 +00:00
fft.rst
fsdp.rst [FSDP][state_dict] Expose optimizer state_dict config (#105949) 2023-08-21 07:29:49 +00:00
func.api.rst Add torch.func.debug_unwrap (#146528) 2025-02-06 18:48:09 +00:00
func.batch_norm.rst
func.migrating.rst
func.rst
func.ux_limitations.rst
func.whirlwind_tour.rst
future_mod.rst Add swap_tensors path to nn.Module._apply (#117167) 2024-02-07 18:55:44 +00:00
futures.rst
fx.experimental.rst Add truediv support in export serializer (#136364) 2024-12-05 17:33:33 +00:00
fx.rst Consolidate SymDispatchMode into ProxyTensorMode (#132674) 2024-08-08 12:02:54 +00:00
hub.rst
index.rst Add Torchao docs link to Pytorch libraries (#145412) 2025-01-24 17:11:20 +00:00
jit_builtin_functions.rst
jit_language_reference_v2.rst [Doc] fix some typos (found by codespell and typos) (#132544) 2024-08-05 17:21:56 +00:00
jit_language_reference.rst [Doc] fix some typos (found by codespell and typos) (#132544) 2024-08-05 17:21:56 +00:00
jit_python_reference.rst
jit_unsupported.rst Add support for torch.Generator type in TorchScript (#110413) 2023-11-21 23:07:21 +00:00
jit_utils.rst
jit.rst Doc test non packages (#110568) 2023-10-06 14:16:01 +00:00
library.rst [Custom Ops] Add a new API to allow users to register an autocast for the custom op (#145588) 2025-01-27 19:22:43 +00:00
linalg.rst
logging.rst Change classification to beta for TORCH_LOGS (#118682) 2024-01-31 21:50:55 +00:00
masked.rst Add MaskedTensor passthrough: unfold, F.Unfold, F.Fold, stack (#125262) 2024-09-06 19:06:23 +00:00
math-quantizer-equation.png
meta.rst Add documentation for meta device (#119119) 2024-02-04 01:05:22 +00:00
miscellaneous_environment_variables.rst Add environment variable to force no weights_only load (#138225) 2024-10-21 23:26:15 +00:00
mobile_optimizer.rst Add ExecuTorch warning to mobile_optimizer (#134697) 2024-09-04 17:47:14 +00:00
model_zoo.rst
module_tracker.rst Add module tracker (#125352) 2024-05-04 18:33:35 +00:00
monitor.rst
mps_environment_variables.rst [MPS] Add mps profiler env vars to docs (#129552) 2024-07-04 06:44:48 +00:00
mps.rst [MPS] Expose MPSProfiler::start/stopCapture to Python (#144561) 2025-01-11 02:05:36 +00:00
mtia.memory.rst Revert "[MTIA] (3/n) Implement PyTorch APIs to query/reset device peak memory usage (#143347)" 2024-12-21 04:04:16 +00:00
mtia.rst Revert "[MTIA] (3/n) Implement PyTorch APIs to query/reset device peak memory usage (#143347)" 2024-12-21 04:04:16 +00:00
multiprocessing.rst Doc test non packages (#110568) 2023-10-06 14:16:01 +00:00
name_inference.rst [docs] Properly link register_post_accumulate_grad_hook docs (#108157) 2023-08-29 22:13:33 +00:00
named_tensor.rst fixing named tensor unflatten example (#106921) 2023-08-22 18:00:10 +00:00
nested.rst Update OSS nested tensor docs to focus on NJT (#145402) 2025-01-25 04:08:19 +00:00
nn.attention.bias.rst Remove sdp_kernel and replace with sdpa_kernel in attention namespace (#114689) 2024-01-24 22:28:04 +00:00
nn.attention.experimental.rst [Flex Attention] Paged Attention (#137164) 2024-10-29 17:05:22 +00:00
nn.attention.flex_attention.rst FlexAttention support for NJT (#136792) 2024-10-28 20:01:27 +00:00
nn.attention.rst [Flex Attention] Paged Attention (#137164) 2024-10-29 17:05:22 +00:00
nn.functional.rst Add RMSNorm module (#121364) 2024-03-29 18:05:28 +00:00
nn.init.rst
nn.rst Add APIs to separate norm calculation and gradient scaling in nn.utils.clip_grad_norm_ (#139662) 2024-11-07 23:13:23 +00:00
onnx_dynamo_memory_usage.rst Update TorchDynamo-based ONNX Exporter memory usage example code. (#144139) 2025-01-03 20:41:36 +00:00
onnx_dynamo_onnxruntime_backend.rst Follow-up #108379 (#108905) 2023-09-09 01:38:36 +00:00
onnx_dynamo.rst [ONNX] Update images and APIs to onnx_dynamo.rst (#144358) 2025-01-08 21:44:43 +00:00
onnx_torchscript_supported_aten_ops.rst Refactor torch.onnx documentation (#108379) 2023-09-08 18:23:48 +00:00
onnx_torchscript.rst [ONNX] Update images and APIs to onnx_dynamo.rst (#144358) 2025-01-08 21:44:43 +00:00
onnx.rst [ONNX] Improves documentation of ONNX exporter (#135372) 2024-09-09 15:09:01 +00:00
optim.rst Ensure SWA boundary conditions w.r.t. definition (#133773) 2024-10-31 18:24:08 +00:00
package.rst Doc test non packages (#110568) 2023-10-06 14:16:01 +00:00
profiler.rst Doc test non packages (#110568) 2023-10-06 14:16:01 +00:00
quantization-accuracy-debugging.rst
quantization-backend-configuration.rst
quantization-support.rst Add support for prototype affine quantization in pt2e flow (#141421) 2024-12-24 04:22:18 +00:00
quantization.rst [BC-Breaking]Remove capture_pre_autograd_graph references in quantization (#139505) 2024-12-13 22:26:22 +00:00
random.rst
rpc.rst [BE] RPC is missing RRef docs (#106902) 2023-08-10 16:26:27 +00:00
signal.rst
size.rst Added a docstring for torch.Size.numel. (#124186) 2024-04-19 09:23:02 +00:00
sparse.rst SparseCsrCUDA: cuDSS backend for linalg.solve (#129856) 2024-08-22 07:57:30 +00:00
special.rst
storage.rst Doc: Rewrite the storage.rst file to emphasize untyped storages (#140145) 2024-11-13 17:40:16 +00:00
tensor_attributes.rst [Docs] Remove duplicate declaration of double_tensor (#140927) 2024-11-18 21:22:30 +00:00
tensor_view.rst [docs] fix numpy docs reference (#147697) 2025-02-26 01:30:03 +00:00
tensorboard.rst
tensors.rst add xpu to torch.tensors (#127280) 2024-06-11 18:13:01 +00:00
testing.rst
threading_environment_variables.rst Add doc page for environment variables that effect PyTorch Runtime (#119087) 2024-02-15 21:41:38 +00:00
torch_cuda_memory.rst Fix typo under docs directory (#110359) 2023-10-03 16:36:05 +00:00
torch_environment_variables.rst [Docs][MPS] Add mps environment variable table (#129008) 2024-06-20 03:30:35 +00:00
torch_nccl_environment_variables.rst [c10d][doc] Add docs for ENV variables TORCH_NCCL_ASYNC_ERROR_HANDLING TORCH_NCCL_TRACE_CPP_STACK and TORCH_NCCL_COORD_CHECK_MILSEC (#132920) 2024-08-09 21:08:20 +00:00
torch.ao.ns._numeric_suite_fx.rst
torch.ao.ns._numeric_suite.rst
torch.compiler_aot_inductor_minifier.rst Aoti minifier flatten (#141156) 2024-12-06 07:12:45 +00:00
torch.compiler_aot_inductor.rst [docs] Minor fixes to export and aoti docs (#144513) 2025-02-13 15:19:35 +00:00
torch.compiler_api.rst [export] add is_exporting flag (#142425) 2024-12-18 21:36:28 +00:00
torch.compiler_best_practices_for_backends.rst Restructure torch.compile docs (#105376) 2023-07-28 20:58:57 +00:00
torch.compiler_cudagraph_trees.rst Revert "Implement cuda graphs implementation of torch.cond and torch.while_loop (#140979)" 2025-02-13 18:04:26 +00:00
torch.compiler_custom_backends.rst [pt2, docs] Add new PT2 troubleshooting doc (#138620) 2024-11-09 01:17:39 +00:00
torch.compiler_dynamic_shapes.rst feat: Add min, max ranges to mark_dynamic API (#119737) 2024-03-07 23:26:03 +00:00
torch.compiler_dynamo_deepdive.rst fix typo in torch.compiler_dynamo_deepdive.rst (#140871) 2024-11-19 14:42:36 +00:00
torch.compiler_dynamo_overview.rst Rename TorchDynamo -> Dyanamo in the dynamo tutorial doc (#123431) 2024-05-07 05:07:00 +00:00
torch.compiler_fake_tensor.rst [doc] improve code in fake tensor doc (#140329) 2024-11-13 05:14:56 +00:00
torch.compiler_faq.rst Rename cache limit to recompile limit in configs (#143709) 2024-12-22 10:03:57 +00:00
torch.compiler_fine_grain_apis.rst [export] add is_exporting flag (#142425) 2024-12-18 21:36:28 +00:00
torch.compiler_get_started.rst [Inductor] Update AttrsDescriptor instantiation for Triton changes (#137458) 2024-10-14 20:20:29 +00:00
torch.compiler_inductor_profiling.rst Restructure torch.compile docs (#105376) 2023-07-28 20:58:57 +00:00
torch.compiler_ir.rst [export] torch.export landing page (#108783) 2023-09-10 01:40:42 +00:00
torch.compiler_nn_module.rst Revert "Reland 3rd try [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#109323)" + Forward fixes + test (#110964) 2023-10-11 05:16:47 +00:00
torch.compiler_performance_dashboard.rst Restructure torch.compile docs (#105376) 2023-07-28 20:58:57 +00:00
torch.compiler_profiling_torch_compile.rst [EZ] Fix spelling typo (#136157) 2024-09-16 19:30:30 +00:00
torch.compiler_transformations.rst Fix typo under docs directory (#110359) 2023-10-03 16:36:05 +00:00
torch.compiler_troubleshooting_old.rst Rename cache limit to recompile limit in configs (#143709) 2024-12-22 10:03:57 +00:00
torch.compiler_troubleshooting.rst Rename cache limit to recompile limit in configs (#143709) 2024-12-22 10:03:57 +00:00
torch.compiler.config.rst Profile guided optimization for automatic_dynamic (#139001) 2024-11-03 06:29:57 +00:00
torch.compiler.rst Profile guided optimization for automatic_dynamic (#139001) 2024-11-03 06:29:57 +00:00
torch.overrides.rst Doc test non packages (#110568) 2023-10-06 14:16:01 +00:00
torch.rst Transform unbacked int expressions into a fresh unbacked int. (#141917) 2024-12-05 16:53:44 +00:00
type_info.rst
utils.rst New swap function (#111747) 2023-12-08 18:49:35 +00:00
xpu.rst Add get_stream_from_external API for XPU backend (#141123) 2024-12-31 11:15:52 +00:00