mirror of
https://github.com/zebrajr/pytorch.git
synced 2025-12-07 00:21:07 +01:00
## Background
This PR adds `torch.utils.serialization.config.load.calculate_storage_offsets`. This option relies on the previous PR in this stack, where the storage order was changed to be non-lexicographical. A `.format_version` entry was added to the zipfile, and `calculate_storage_offsets` only works on checkpoints that contain `.format_version`.
When this option is turned on, `torch.load(mmap=True)` calculates the offset of each storage record (other than the 0th storage) instead of relying on `miniz` APIs to determine it.
The existing APIs will issue multiple random reads (reading the end of central directory record, then reading the zipfile header for the record) to determine the storage offset where the record starts. This can greatly degrade `torch.load(mmap=True)` performance for non-filesystem cases.
6aaae9d78f/caffe2/serialize/inline_container.cc (L589-L605)
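The difference between looking up an offset and calculating it can be sketched with Python's standard `zipfile` module. This is an illustration of the general idea only, not PyTorch's implementation: the helper names and the record names are hypothetical, the archive is written uncompressed without data descriptors, and the `extra_len` parameter stands in for any extra-field bytes (such as alignment padding) that a real checkpoint may carry.

```python
import io
import struct
import zipfile

LOCAL_HEADER_FIXED_SIZE = 30  # fixed-size part of a ZIP local file header


def data_offset_from_header(buf: bytes, header_offset: int) -> int:
    """Lookup approach: read the record's local header to find its payload."""
    # The name length and extra-field length sit at bytes 26..29 of the header.
    name_len, extra_len = struct.unpack_from("<HH", buf, header_offset + 26)
    return header_offset + LOCAL_HEADER_FIXED_SIZE + name_len + extra_len


def predicted_data_offset(
    prev_data_offset: int, prev_size: int, name: str, extra_len: int = 0
) -> int:
    """Calculation approach: derive the next payload offset without any read.

    Assumes stored (uncompressed) records laid out back to back with no
    data descriptor between them.
    """
    next_header = prev_data_offset + prev_size
    return next_header + LOCAL_HEADER_FIXED_SIZE + len(name.encode("utf-8")) + extra_len


def build_archive() -> bytes:
    """Build a tiny in-memory archive with two stored records."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("archive/data/0", b"a" * 100)
        zf.writestr("archive/data/1", b"b" * 40)
    return out.getvalue()


if __name__ == "__main__":
    buf = build_archive()
    with zipfile.ZipFile(io.BytesIO(buf)) as zf:
        first, second = zf.infolist()
    # The 0th record is still looked up; later ones are calculated from it.
    start = data_offset_from_header(buf, first.header_offset)
    guess = predicted_data_offset(start, first.file_size, second.filename)
    print(guess == data_offset_from_header(buf, second.header_offset))
```

Only the first record needs a header read here; every later offset follows from the previous one, which is why a fixed, known storage order matters for this kind of calculation.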
## Testing strategy
The agreed-upon testing strategy was as follows:
- Add debug code gated by an environment flag `TORCH_SERIALIZATION_DEBUG` that runs this offset calculation logic and verifies it against `getRecordOffset` for each storage (when `mmap=False`)
- This flag is set throughout CI, which means that every time `torch.load` is called, the offset calculation logic is implicitly being tested.
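
The gating pattern described in this list can be sketched in a few lines of Python. The helper names below are hypothetical, and treating the value `"1"` as "enabled" is an assumption; only the flag name `TORCH_SERIALIZATION_DEBUG` comes from the description above.

```python
import os
from typing import Callable, Mapping, Optional

DEBUG_ENV_VAR = "TORCH_SERIALIZATION_DEBUG"  # flag name taken from the text above


def debug_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the debug flag is set (assumed to mean the value "1")."""
    env = os.environ if environ is None else environ
    return env.get(DEBUG_ENV_VAR) == "1"


def resolve_offset(
    calculate: Callable[[], int],
    lookup: Callable[[], int],
    environ: Optional[Mapping[str, str]] = None,
) -> int:
    """Return the looked-up offset, cross-checking the calculation in debug mode."""
    reference = lookup()
    if debug_enabled(environ):
        calculated = calculate()
        if calculated != reference:
            raise AssertionError(
                f"calculated offset {calculated} != reference offset {reference}"
            )
    return reference
```

With the flag unset, the calculation never runs and the reference lookup is returned unchanged, so the check adds no cost outside debug runs.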
Differential Revision: [D67673026](https://our.internmc.facebook.com/intern/diff/D67673026)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143880
Approved by: https://github.com/albanD
ghstack dependencies: #143879
25 lines
626 B
Python
import sys
from typing import Optional as _Optional, TYPE_CHECKING as _TYPE_CHECKING


if _TYPE_CHECKING:
    from torch.serialization import LoadEndianness as _LoadEndianess

from torch.utils._config_module import install_config_module as _install_config_module


class load:
    mmap: bool = False
    endianness: _Optional["_LoadEndianess"] = None
    # MAP_PRIVATE = 2
    mmap_flags: _Optional[int] = None if sys.platform == "win32" else 2
    calculate_storage_offsets: bool = False


class save:
    compute_crc32: bool = True
    use_pinned_memory_for_d2h: bool = False


_install_config_module(sys.modules[__name__])