pytorch/torch/csrc/jit/python
Mikayla Gawarecki db3685a35c Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880)
## Background

This PR adds `torch.utils.serialization.config.load.calculate_storage_offsets`. This option relies on the previous PR in this stack, where the storage order was changed to be non-lexicographical. A `.format_version` entry was also added to the zipfile, and `calculate_storage_offsets` only works on checkpoints that contain `.format_version`.

When this option is turned on, `torch.load(mmap=True)` calculates the offset of each storage record (other than the 0th storage) instead of relying on `miniz` APIs to determine it.
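
To make the idea concrete, here is a minimal standard-library sketch that predicts where each record's bytes start from the previous record's offset and size. It is an illustration only, not PyTorch's implementation: it assumes uncompressed entries written back to back with no extra field, no data descriptor, and no alignment padding, which real checkpoints may not satisfy. The names `build_archive` and `calculated_offsets` are hypothetical.

```python
import io
import zipfile

LOCAL_HEADER_SIZE = 30  # fixed part of a zip local file header, in bytes


def build_archive(records):
    """Write records (name -> bytes) uncompressed, in insertion order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, payload in records.items():
            zf.writestr(name, payload)
    return buf.getvalue()


def calculated_offsets(records, first_offset):
    """Predict each record's data offset from the one before it.

    Assumption: no extra field, data descriptor, or padding between records.
    """
    offsets = {}
    position = first_offset
    previous_size = None
    for name, payload in records.items():
        if previous_size is not None:
            # Skip the previous payload, then this record's local header.
            position += previous_size + LOCAL_HEADER_SIZE + len(name.encode("utf-8"))
        offsets[name] = position
        previous_size = len(payload)
    return offsets


records = {"data/0": b"a" * 16, "data/1": b"b" * 40, "data/2": b"c" * 8}
archive = build_archive(records)

# The 0th record's offset is taken as given; the rest is pure arithmetic.
first = LOCAL_HEADER_SIZE + len("data/0")
offsets = calculated_offsets(records, first)

# Each predicted offset points at the start of that record's payload.
for name, offset in offsets.items():
    assert archive[offset:offset + len(records[name])] == records[name]
```

Once the first offset is known, no further reads of zip metadata are needed in this sketch, which is the property the option is after.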

The existing APIs will issue multiple random reads (reading the end of central directory record, then reading the zipfile header for the record) to determine the storage offset where the record starts. This can greatly degrade `torch.load(mmap=True)` performance for non-filesystem cases.

6aaae9d78f/caffe2/serialize/inline_container.cc (L589-L605)
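
For comparison, the metadata-driven lookup described above can be sketched with the standard library. This is not the `miniz` code path itself, only an illustration of the same two-step access pattern: one read near the end of the file for the central directory, then a jump to the record's local header. The helper name `lookup_data_offset` is hypothetical.

```python
import io
import struct
import zipfile

LOCAL_HEADER = struct.Struct("<4s5H3L2H")  # 30-byte fixed local file header


def lookup_data_offset(fileobj, name):
    """Find where a record's bytes start by reading zip metadata."""
    # Read 1: ZipFile seeks to the end of the file to parse the
    # end-of-central-directory record and the central directory.
    with zipfile.ZipFile(fileobj) as zf:
        header_offset = zf.getinfo(name).header_offset

    # Read 2: jump to the record's local header; the name and extra-field
    # lengths stored there determine where the payload begins.
    fileobj.seek(header_offset)
    fields = LOCAL_HEADER.unpack(fileobj.read(LOCAL_HEADER.size))
    signature, name_len, extra_len = fields[0], fields[9], fields[10]
    if signature != b"PK\x03\x04":
        raise ValueError("not a local file header")
    return header_offset + LOCAL_HEADER.size + name_len + extra_len


# Build a small uncompressed archive in memory to exercise the lookup.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("data/0", b"a" * 16)
    zf.writestr("data/1", b"b" * 40)

offset = lookup_data_offset(buf, "data/1")
buf.seek(offset)
payload = buf.read(40)
```

On a local disk these two seeks are cheap. When each seek turns into a separate request against remote or otherwise non-filesystem storage, repeating them per record is where the cost comes from.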

## Testing strategy

The agreed-upon testing strategy was as follows:
- Add debug code, gated by the environment flag `TORCH_SERIALIZATION_DEBUG`, that runs this offset calculation logic and verifies it against `getRecordOffset` for each storage (when `mmap=False`).
- This flag is set throughout CI, which means that every time `torch.load` is called, the offset calculation logic is implicitly being tested.
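
The gating pattern described in this strategy might look roughly like the sketch below. It is not the actual debug code: `storage_offset`, `calculate`, and `look_up` are hypothetical stand-ins, and treating `"1"` as the enabling value of `TORCH_SERIALIZATION_DEBUG` is an assumption.

```python
import os


def debug_enabled():
    """True when the debug flag is set to "1" (accepted value is an assumption)."""
    return os.environ.get("TORCH_SERIALIZATION_DEBUG", "0") == "1"


def storage_offset(name, calculate, look_up):
    """Return the looked-up offset, cross-checking the calculation in debug mode."""
    expected = look_up(name)
    if debug_enabled():
        # Only pay for the extra calculation when the flag is on.
        actual = calculate(name)
        if actual != expected:
            raise AssertionError(f"{name}: calculated {actual}, expected {expected}")
    return expected
```

Because the check rides along with ordinary loads, any test that loads a checkpoint with the flag set also exercises the calculation, without needing dedicated test cases.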

Differential Revision: [D67673026](https://our.internmc.facebook.com/intern/diff/D67673026)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143880
Approved by: https://github.com/albanD
ghstack dependencies: #143879
2025-01-27 23:57:30 +00:00
..
init.cpp Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880) 2025-01-27 23:57:30 +00:00
init.h
module_python.h
pybind_utils.cpp
pybind_utils.h [NFC] Fix some minor typos. (#145599) 2025-01-24 18:58:59 +00:00
pybind.h Use Wextra-semi (#140236) 2024-11-13 02:15:16 +00:00
python_arg_flatten.cpp
python_arg_flatten.h
python_custom_class.cpp
python_custom_class.h
python_dict.cpp
python_dict.h Use Wextra-semi (#140236) 2024-11-13 02:15:16 +00:00
python_interpreter.cpp
python_ir.cpp [TorchScript] bindings for torch._C.ClassType.method_names() (#140444) 2024-11-13 17:23:23 +00:00
python_ir.h
python_ivalue.h
python_list.cpp
python_list.h Use Wextra-semi (#140236) 2024-11-13 02:15:16 +00:00
python_sugared_value.cpp Fix PyBind 2.10.4 compatibility issue in caffe2/torch/csrc/dynamo/guards.cpp +2 (#141456) 2024-11-24 21:05:48 +00:00
python_sugared_value.h Use Wextra-semi (#140236) 2024-11-13 02:15:16 +00:00
python_tracer.cpp
python_tracer.h
python_tree_views.cpp
python_tree_views.h
script_init.cpp
script_init.h
update_graph_executor_opt.cpp
update_graph_executor_opt.h
utf8_decoding_ignore.cpp
utf8_decoding_ignore.h