pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Mikayla Gawarecki db3685a35c Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880 )	2025-01-27 23:57:30 +00:00

## Background

This PR adds `torch.utils.serialization.config.load.calculate_storage_offsets`. This option relies on the previous PR in this stack, where storage order was changed to non-lexicographical. A `.format_version` entry was added to the zipfile, and `calculate_storage_offsets` will only work on checkpoints with `.format_version`.

When this is turned on, for `torch.load(mmap=True)`, the offsets of each storage record (other than the 0th storage) will be calculated instead of relying on `miniz` APIs to determine them.

The existing APIs issue multiple random reads (reading the end of central directory record, then reading the zipfile header for the record) to determine the storage offset where the record starts. This can greatly degrade `torch.load(mmap=True)` performance for non-filesystem cases.

`6aaae9d78f/caffe2/serialize/inline_container.cc (L589-L605)`

## Testing strategy

The agreed-upon testing strategy was as follows:

- Add debug code gated by an environment flag `TORCH_SERIALIZATION_DEBUG` that will run this offset calculation logic and verify it against getRecordOffset for each storage (when mmap=False)
- This flag is set throughout CI, which means that every time `torch.load` is called, the offset calculation logic is implicitly being tested.

Differential Revision: [D67673026](https://our.internmc.facebook.com/intern/diff/D67673026)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143880

Approved by: https://github.com/albanD

ghstack dependencies: #143879
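
To make the "multiple random reads" in the commit message above concrete, here is a minimal sketch of how a record's data offset is typically located in a ZIP archive. It is not PyTorch's or `miniz`'s implementation: it uses Python's standard `zipfile` module as a stand-in, and the record names `archive/data/0` and `archive/data/1` and the helper names are hypothetical, chosen only for illustration.

```python
import io
import struct
import zipfile

LOCAL_HEADER_SIZE = 30  # fixed-size part of a ZIP local file header


def record_data_offset(buf: bytes, name: str) -> int:
    """Return the byte offset at which a stored record's data begins."""
    # Read 1: the central directory at the END of the archive tells us
    # where the record's local file header starts.
    with zipfile.ZipFile(io.BytesIO(buf)) as zf:
        header_offset = zf.getinfo(name).header_offset

    # Read 2: the local file header itself, which sits elsewhere in the
    # file, holds the variable-length name and extra-field sizes.
    header = buf[header_offset:header_offset + LOCAL_HEADER_SIZE]
    if header[:4] != b"PK\x03\x04":
        raise ValueError("not a ZIP local file header")
    name_len, extra_len = struct.unpack("<HH", header[26:30])

    # The data follows the fixed header, the file name, and the extra field.
    return header_offset + LOCAL_HEADER_SIZE + name_len + extra_len


def make_archive() -> bytes:
    """Build a small uncompressed archive with two hypothetical records."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("archive/data/0", b"\x01" * 16)
        zf.writestr("archive/data/1", b"\x02" * 8)
    return out.getvalue()


archive = make_archive()
offset0 = record_data_offset(archive, "archive/data/0")
offset1 = record_data_offset(archive, "archive/data/1")
payload0 = archive[offset0:offset0 + 16]
payload1 = archive[offset1:offset1 + 8]
```

Each lookup touches two distant regions of the file (the end of the archive and the record's header). On storage where random reads are expensive, that is the cost the commit message describes avoiding by computing offsets from a known record order instead.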
..
_static	Update OSS nested tensor docs to focus on NJT (#145402 )	2025-01-25 04:08:19 +00:00
_templates	Add an option for classic search (#142018 )	2024-12-06 01:24:52 +00:00
community	Update maintainers for inductor and x86 CPU (#136839 )	2024-10-11 07:24:07 +00:00
elastic	DOC: add docstring to construct_and_record_rdzv_event() (#128189 )	2024-06-10 22:17:33 +00:00
notes	Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880 )	2025-01-27 23:57:30 +00:00
rpc	[Doc] fix some typos (found by codespell and typos) (#132544 )	2024-08-05 17:21:56 +00:00
scripts	[ONNX] Update images and APIs to onnx_dynamo.rst (#144358 )	2025-01-08 21:44:43 +00:00
accelerator.rst	[BE][accelerator] formalize API name `{current,set}_device_{idx => index}` (#140542 )	2024-12-12 10:53:48 +00:00
amp.rst	Update document for autocast on CPU (#135299 )	2024-09-13 09:11:47 +00:00
autograd.rst	Add torch.library.register_autograd (#124071 )	2024-04-18 12:47:59 +00:00
backends.rst	Revert "Reverting the PR adding Kleidiai-based int4 kernels (#145392 )" (#145505 )	2025-01-23 18:50:59 +00:00
benchmark_utils.rst	Adding Compare in torch.utils.benchmark documentation (#125009 )	2024-05-03 00:50:54 +00:00
bottleneck.rst
checkpoint.rst	[checkpoint] Clean up selective activation checkpoint and make public (#125795 )	2024-06-18 18:18:50 +00:00
complex_numbers.rst	Document complex optimizer semantic behavior (#121667 )	2024-03-16 00:43:47 +00:00
cond.rst	[Doc] fix some typos (found by codespell and typos) (#132544 )	2024-08-05 17:21:56 +00:00
conf.py	Revert "Add flop formula for _scaled_mm (#144872 )"	2025-01-16 15:16:18 +00:00
config_mod.rst
cpp_extension.rst
cpp_index.rst
cpu.rst
cuda_environment_variables.rst	Add doc page for environment variables that effect PyTorch Runtime (#119087 )	2024-02-15 21:41:38 +00:00
cuda._sanitizer.rst
cuda.rst	Add get_stream_from_external API for CUDA backend (#143799 )	2024-12-31 11:15:59 +00:00
cuda.tunable.rst	[ROCm] Fix TunableOp UTs: Rotating Buffer (#143172 )	2024-12-14 06:18:11 +00:00
cudnn_persistent_rnn.rst
cudnn_rnn_determinism.rst
data.rst
ddp_comm_hooks.rst
debugging_environment_variables.rst	Add doc page for environment variables that effect PyTorch Runtime (#119087 )	2024-02-15 21:41:38 +00:00
deploy.rst
deterministic.rst
distributed.algorithms.join.rst
distributed.checkpoint.rst	[DCP] Cross-link DCP doc to tutorials (#139776 )	2024-11-07 02:19:49 +00:00
distributed.elastic.rst	Reapply "distributed debug handlers (#126601 )" (#127805 )	2024-06-04 19:44:30 +00:00
distributed.fsdp.fully_shard.rst	[FSDP2] Move to public `torch.distributed.fsdp` (#141868 )	2024-12-07 01:24:28 +00:00
distributed.optim.rst
distributed.pipelining.rst	[pipelining] Update tutorials and documentation (#143045 )	2024-12-12 18:42:17 +00:00
distributed.rst	[C10D] Update docs for wait() (#143305 )	2024-12-17 00:41:11 +00:00
distributed.tensor.parallel.rst	Update link in distributed.tensor.parallel.rst (#136103 )	2024-09-15 19:36:29 +00:00
distributed.tensor.rst	[dtensor] expose the __create_chunk_list__ in the doc (#144100 )	2025-01-03 20:06:23 +00:00
distributions.rst
dlpack.rst
docutils.conf
export.ir_spec.rst	[export] Update docs (#142011 )	2024-12-05 03:44:46 +00:00
export.programming_model.rst	fix formatting in programming model doc (#143587 )	2024-12-20 07:09:19 +00:00
export.rst	torch export programming model (#143546 )	2024-12-19 16:56:13 +00:00
fft.rst
fsdp.rst
func.api.rst
func.batch_norm.rst
func.migrating.rst
func.rst
func.ux_limitations.rst
func.whirlwind_tour.rst
future_mod.rst	Add swap_tensors path to nn.Module._apply (#117167 )	2024-02-07 18:55:44 +00:00
futures.rst
fx.experimental.rst	Add `truediv` support in export serializer (#136364 )	2024-12-05 17:33:33 +00:00
fx.rst	Consolidate SymDispatchMode into ProxyTensorMode (#132674 )	2024-08-08 12:02:54 +00:00
hub.rst
index.rst	Add Torchao docs link to Pytorch libraries (#145412 )	2025-01-24 17:11:20 +00:00
jit_builtin_functions.rst
jit_language_reference_v2.rst	[Doc] fix some typos (found by codespell and typos) (#132544 )	2024-08-05 17:21:56 +00:00
jit_language_reference.rst	[Doc] fix some typos (found by codespell and typos) (#132544 )	2024-08-05 17:21:56 +00:00
jit_python_reference.rst
jit_unsupported.rst
jit_utils.rst
jit.rst
library.rst	[Custom Ops] Add a new API to allow users to register an autocast for the custom op (#145588 )	2025-01-27 19:22:43 +00:00
linalg.rst
logging.rst	Change classification to beta for TORCH_LOGS (#118682 )	2024-01-31 21:50:55 +00:00
masked.rst	Add MaskedTensor passthrough: unfold, F.Unfold, F.Fold, stack (#125262 )	2024-09-06 19:06:23 +00:00
math-quantizer-equation.png
meta.rst	Add documentation for meta device (#119119 )	2024-02-04 01:05:22 +00:00
miscellaneous_environment_variables.rst	Add environment variable to force no weights_only load (#138225 )	2024-10-21 23:26:15 +00:00
mobile_optimizer.rst	Add ExecuTorch warning to mobile_optimizer (#134697 )	2024-09-04 17:47:14 +00:00
model_zoo.rst
module_tracker.rst	Add module tracker (#125352 )	2024-05-04 18:33:35 +00:00
monitor.rst
mps_environment_variables.rst	[MPS] Add mps profiler env vars to docs (#129552 )	2024-07-04 06:44:48 +00:00
mps.rst	[MPS] Expose `MPSProfiler::start/stopCapture` to Python (#144561 )	2025-01-11 02:05:36 +00:00
mtia.memory.rst	Revert "[MTIA] (3/n) Implement PyTorch APIs to query/reset device peak memory usage (#143347 )"	2024-12-21 04:04:16 +00:00
mtia.rst	Revert "[MTIA] (3/n) Implement PyTorch APIs to query/reset device peak memory usage (#143347 )"	2024-12-21 04:04:16 +00:00
multiprocessing.rst
name_inference.rst
named_tensor.rst
nested.rst	Update OSS nested tensor docs to focus on NJT (#145402 )	2025-01-25 04:08:19 +00:00
nn.attention.bias.rst	Remove sdp_kernel and replace with sdpa_kernel in attention namespace (#114689 )	2024-01-24 22:28:04 +00:00
nn.attention.experimental.rst	[Flex Attention] Paged Attention (#137164 )	2024-10-29 17:05:22 +00:00
nn.attention.flex_attention.rst	FlexAttention support for NJT (#136792 )	2024-10-28 20:01:27 +00:00
nn.attention.rst	[Flex Attention] Paged Attention (#137164 )	2024-10-29 17:05:22 +00:00
nn.functional.rst	Add RMSNorm module (#121364 )	2024-03-29 18:05:28 +00:00
nn.init.rst
nn.rst	Add APIs to separate norm calculation and gradient scaling in `nn.utils.clip_grad_norm_` (#139662 )	2024-11-07 23:13:23 +00:00
onnx_dynamo_memory_usage.rst	Update TorchDynamo-based ONNX Exporter memory usage example code. (#144139 )	2025-01-03 20:41:36 +00:00
onnx_dynamo_onnxruntime_backend.rst
onnx_dynamo.rst	[ONNX] Update images and APIs to onnx_dynamo.rst (#144358 )	2025-01-08 21:44:43 +00:00
onnx_torchscript_supported_aten_ops.rst
onnx_torchscript.rst	[ONNX] Update images and APIs to onnx_dynamo.rst (#144358 )	2025-01-08 21:44:43 +00:00
onnx.rst	[ONNX] Improves documentation of ONNX exporter (#135372 )	2024-09-09 15:09:01 +00:00
optim.rst	Ensure SWA boundary conditions w.r.t. definition (#133773 )	2024-10-31 18:24:08 +00:00
package.rst
profiler.rst
quantization-accuracy-debugging.rst
quantization-backend-configuration.rst
quantization-support.rst	Add support for prototype affine quantization in pt2e flow (#141421 )	2024-12-24 04:22:18 +00:00
quantization.rst	[BC-Breaking]Remove capture_pre_autograd_graph references in quantization (#139505 )	2024-12-13 22:26:22 +00:00
random.rst
rpc.rst
signal.rst
size.rst	Added a docstring for torch.Size.numel. (#124186 )	2024-04-19 09:23:02 +00:00
sparse.rst	SparseCsrCUDA: cuDSS backend for linalg.solve (#129856 )	2024-08-22 07:57:30 +00:00
special.rst
storage.rst	Doc: Rewrite the storage.rst file to emphasize untyped storages (#140145 )	2024-11-13 17:40:16 +00:00
tensor_attributes.rst	[Docs] Remove duplicate declaration of `double_tensor` (#140927 )	2024-11-18 21:22:30 +00:00
tensor_view.rst
tensorboard.rst
tensors.rst	add xpu to torch.tensors (#127280 )	2024-06-11 18:13:01 +00:00
testing.rst
threading_environment_variables.rst	Add doc page for environment variables that effect PyTorch Runtime (#119087 )	2024-02-15 21:41:38 +00:00
torch_cuda_memory.rst
torch_environment_variables.rst	[Docs][MPS] Add mps environment variable table (#129008 )	2024-06-20 03:30:35 +00:00
torch_nccl_environment_variables.rst	[c10d][doc] Add docs for ENV variables TORCH_NCCL_ASYNC_ERROR_HANDLING TORCH_NCCL_TRACE_CPP_STACK and TORCH_NCCL_COORD_CHECK_MILSEC (#132920 )	2024-08-09 21:08:20 +00:00
torch.ao.ns._numeric_suite_fx.rst
torch.ao.ns._numeric_suite.rst
torch.compiler_aot_inductor_minifier.rst	Aoti minifier flatten (#141156 )	2024-12-06 07:12:45 +00:00
torch.compiler_aot_inductor.rst	[AOTI][doc] Update tutorial (#143390 )	2024-12-17 18:35:40 +00:00
torch.compiler_api.rst	[export] add is_exporting flag (#142425 )	2024-12-18 21:36:28 +00:00
torch.compiler_best_practices_for_backends.rst
torch.compiler_cudagraph_trees.rst	[CUDAGraph][Docs] add `cuda` to `torch.randn` (#144793 )	2025-01-15 18:02:10 +00:00
torch.compiler_custom_backends.rst	[pt2, docs] Add new PT2 troubleshooting doc (#138620 )	2024-11-09 01:17:39 +00:00
torch.compiler_dynamic_shapes.rst	feat: Add min, max ranges to mark_dynamic API (#119737 )	2024-03-07 23:26:03 +00:00
torch.compiler_dynamo_deepdive.rst	fix typo in `torch.compiler_dynamo_deepdive.rst` (#140871 )	2024-11-19 14:42:36 +00:00
torch.compiler_dynamo_overview.rst	Rename TorchDynamo -> Dyanamo in the dynamo tutorial doc (#123431 )	2024-05-07 05:07:00 +00:00
torch.compiler_fake_tensor.rst	[doc] improve code in fake tensor doc (#140329 )	2024-11-13 05:14:56 +00:00
torch.compiler_faq.rst	Rename cache limit to recompile limit in configs (#143709 )	2024-12-22 10:03:57 +00:00
torch.compiler_fine_grain_apis.rst	[export] add is_exporting flag (#142425 )	2024-12-18 21:36:28 +00:00
torch.compiler_get_started.rst	[Inductor] Update AttrsDescriptor instantiation for Triton changes (#137458 )	2024-10-14 20:20:29 +00:00
torch.compiler_inductor_profiling.rst
torch.compiler_ir.rst
torch.compiler_nn_module.rst
torch.compiler_performance_dashboard.rst
torch.compiler_profiling_torch_compile.rst	[EZ] Fix spelling typo (#136157 )	2024-09-16 19:30:30 +00:00
torch.compiler_transformations.rst
torch.compiler_troubleshooting_old.rst	Rename cache limit to recompile limit in configs (#143709 )	2024-12-22 10:03:57 +00:00
torch.compiler_troubleshooting.rst	Rename cache limit to recompile limit in configs (#143709 )	2024-12-22 10:03:57 +00:00
torch.compiler.config.rst	Profile guided optimization for automatic_dynamic (#139001 )	2024-11-03 06:29:57 +00:00
torch.compiler.rst	Profile guided optimization for automatic_dynamic (#139001 )	2024-11-03 06:29:57 +00:00
torch.overrides.rst
torch.rst	Transform unbacked int expressions into a fresh unbacked int. (#141917 )	2024-12-05 16:53:44 +00:00
type_info.rst
utils.rst
xpu.rst	Add get_stream_from_external API for XPU backend (#141123 )	2024-12-31 11:15:52 +00:00