pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Mikayla Gawarecki	001e355a56	Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880 )

## Background

This PR adds `torch.utils.serialization.config.load.calculate_storage_offsets`. This option relies on the previous PR in this stack, where storage order was changed to non-lexicographical. A `.format_version` entry was added to the zipfile, and `calculate_storage_offsets` will only work on checkpoints with `.format_version`.

When this is turned on, for `torch.load(mmap=True)`, the offset of each storage record (other than the 0th storage) will be calculated instead of relying on `miniz` APIs to determine it.

The existing APIs will issue multiple random reads (reading the end of central directory record, then reading the zipfile header for the record) to determine the storage offset where the record starts. This can greatly degrade `torch.load(mmap=True)` performance for non-filesystem cases.

`6aaae9d78f/caffe2/serialize/inline_container.cc (L589-L605)`

## How does this work

The format for the checkpoint is as such

```
archive_name/
|_ data.pkl
|_.format_version
|_byteorder
|_data/
  |_ 0
  |_ 1
  |_ 2
  |_ ...
|_
```

Each `data/i` record represents a storage, where storages are written in the order that the Pickler encounters them.

For each storage, our `persistent_load` logic saves the following metadata to the pickle file `dtype, numel, key, location` where `numel` is the number of bytes in the storage.

Note that we always use the `miniz` writer in zip64 mode per [here](`7796e308d0/caffe2/serialize/inline_container.cc (L701)`). A zipfile record written by miniz looks as such

```
 ---------------- ----------------- ------------------- ---------------- --------- ------------------------------
| 30 byte header | n byte filename | zip64_extra_data | m byte padding | storage | 16 or 24 byte local dir footer |
 ---------------- ----------------- ------------------- ---------------- --------- ------------------------------
```

- The header size (30) is given by [`MZ_ZIP_LOCAL_DIR_HEADER_SIZE`](https://github.com/pytorch/pytorch/blob/main/third_party/miniz-3.0.2/miniz.c?fbclid=IwZXh0bgNhZW0CMTEAAR2O8Vysd--UoSCxW70gabXIS1dbz733oHwuUQ5_Ff1hY2WU6PL2i6CSH4A_aem_J9oaU2HpDeWtJKOU9EnVqw#L3290)
- filename will be `"{archive_name}/{filepath}"`
- `zip64_extra_data` is determined by [`mz_zip_writer_create_zip64_extra_data`](`7796e308d0/third_party/miniz-3.0.2/miniz.c (L6202)`). Note that [we only create zip64_extra_data if storage_size >= 0xFFFFFFFF or the offset of the start of the header >= 0xFFFFFFFF](`7796e308d0/third_party/miniz-3.0.2/miniz.c (L6519-L6524)`)
- `m` is determined by [`getPadding`](`7796e308d0/caffe2/serialize/inline_container.cc (L254)`), which accounts for filename and zip64_extra_data to determine `m` such that the start of `storage` is aligned to 64 bytes. The `m` bytes will always start with `F B padding_size` as the first 4 bytes
- The local dir footer size is determined based on [this snippet](`7796e308d0/third_party/miniz-3.0.2/miniz.c (L6610-L6632)`): if the buffer size is 0 it is skipped. If the zip64_extra_data was created, it is 24, otherwise it is 16.

When `torch.utils.serialization.config.load.calculate_storage_offsets` is set we do the following

- We keep track of where the "cursor" is in the file using `current_offset`; after each persistent_load call, it will be at the offset where the header for the next record starts
- for the 0th storage, "data/0", we use the regular get_record_offset to determine the start of the storage
- for any other storage (where the storages will be in the order encountered by the unpickler: 0, 1, 2, 3, ...), we use `get_record_offset_no_read`, which re-uses the `getPadding` logic to determine the offset of the storage
- Note that `load_tensor` will only ever be called again with the same key if the storage's `._data_ptr()` is 0 [[pointer1](https://github.com/pytorch/pytorch/blob/main/torch/serialization.py#L1917-L1918)][[pointer2](https://github.com/pytorch/pytorch/blob/main/torch/serialization.py#L1936-L1937)], so we cache the offsets for this edge case
- After each storage, if the storage size is non-zero, we account for the local dir footer based on the logic described above

## Testing strategy

The agreed upon testing strategy was as follows:

- Add debug code gated by an environment flag `TORCH_SERIALIZATION_DEBUG` that will run this offset calculation logic and verify it against getRecordOffset for each storage (when mmap=False)
- This flag is set throughout CI, which means that every time `torch.load` is called, the offset calculation logic is implicitly being tested.

Differential Revision: [D67673026](https://our.internmc.facebook.com/intern/diff/D67673026) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143880 Approved by: https://github.com/albanD ghstack dependencies: #143879	2025-01-31 17:09:20 +00:00
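The record layout described in this commit suggests how a storage's start offset can be derived arithmetically, without reading the file. The following is a minimal sketch of that arithmetic, not the actual `get_record_offset_no_read` or `getPadding` implementation: `storage_start`, `next_header_offset`, and their parameters are hypothetical names, the zip64 extra-data size is passed in rather than derived, and the padding is assumed to be the smallest `m >= 4` (4 bytes for the `F B padding_size` prefix) that aligns the storage to 64 bytes.

```python
LOCAL_HEADER_SIZE = 30  # MZ_ZIP_LOCAL_DIR_HEADER_SIZE, per the commit message
ALIGNMENT = 64          # storages start on a 64-byte boundary
PADDING_PREFIX = 4      # assumed size of the "F B padding_size" prefix


def storage_start(header_offset: int, filename: str, zip64_extra_size: int = 0) -> int:
    """Return the offset at which the storage bytes of a record begin."""
    unpadded = (
        header_offset + LOCAL_HEADER_SIZE + len(filename.encode()) + zip64_extra_size
    )
    # Assumption: the padding field is its 4-byte prefix plus filler bytes
    # up to the next multiple of ALIGNMENT.
    start = unpadded + PADDING_PREFIX
    return start + (-start) % ALIGNMENT


def next_header_offset(start: int, storage_size: int, used_zip64: bool = False) -> int:
    """Advance the cursor past the storage and its local dir footer."""
    if storage_size == 0:
        footer = 0  # the footer is skipped for empty storages
    else:
        footer = 24 if used_zip64 else 16
    return start + storage_size + footer


cursor = 1000  # hypothetical offset of the header for "archive/data/1"
start = storage_start(cursor, "archive/data/1")
cursor = next_header_offset(start, storage_size=4096)
print(start, cursor)
```

Only the first record needs a real lookup under this scheme; every later offset follows from the previous cursor position, the filename length, and the storage size recorded in the pickle.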
Mikayla Gawarecki	2d6f6637d3	Remove lexicographical sorting of storage keys in torch.save (#143879 ) Currently the order is lexicographical (i.e. 0, 10, 11, ..., 19, 2, ...) instead of 0, 1, 2, 3, 4, 5 (the order that storage metadata is actually pickled in). Since PyTorch will never be used with Python < 3.7, we can be assured that the keys will be read in the order of insertion (numerically sorted). This makes the order in which storages are written the same as the pickling/unpickling order, so we can calculate their offsets with fewer random reads. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143879 Approved by: https://github.com/albanD	2025-01-31 17:00:23 +00:00
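The difference between the two orderings is easy to see in plain Python. This is an illustrative sketch only; the `storages` dict is a stand-in and not the structure `torch.save` uses internally.

```python
# Storage keys in the order the pickler encounters them.
keys = [str(i) for i in range(12)]

# Sorting string keys compares them character by character.
lexicographic = sorted(keys)
print(lexicographic)

# Since Python 3.7, dicts preserve insertion order, so iterating
# without sorting yields the keys in pickling order.
storages = {key: f"storage-{key}" for key in keys}
insertion_order = list(storages)
print(insertion_order)
```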
PyTorch MergeBot	dbef2a9bc9	Revert "Remove lexicographical sorting of storage keys in torch.save (#143879 )" This reverts commit `7db0afabaa`. Reverted https://github.com/pytorch/pytorch/pull/143879 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally. See D68746524 for details ([comment](https://github.com/pytorch/pytorch/pull/143879#issuecomment-2619661492))	2025-01-28 17:40:16 +00:00
PyTorch MergeBot	9010649292	Revert "Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880 )" This reverts commit `db3685a35c`. Reverted https://github.com/pytorch/pytorch/pull/143880 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but either this PR or the base PR breaks distributed tests ([comment](https://github.com/pytorch/pytorch/pull/143880#issuecomment-2617743403))	2025-01-28 03:07:17 +00:00
Mikayla Gawarecki	db3685a35c	Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880 )

## Background

This PR adds `torch.utils.serialization.config.load.calculate_storage_offsets`. This option relies on the previous PR in this stack, where storage order was changed to non-lexicographical. A `.format_version` entry was added to the zipfile, and `calculate_storage_offsets` will only work on checkpoints with `.format_version`.

When this is turned on, for `torch.load(mmap=True)`, the offset of each storage record (other than the 0th storage) will be calculated instead of relying on `miniz` APIs to determine it.

The existing APIs will issue multiple random reads (reading the end of central directory record, then reading the zipfile header for the record) to determine the storage offset where the record starts. This can greatly degrade `torch.load(mmap=True)` performance for non-filesystem cases.

`6aaae9d78f/caffe2/serialize/inline_container.cc (L589-L605)`

## Testing strategy

The agreed upon testing strategy was as follows:

- Add debug code gated by an environment flag `TORCH_SERIALIZATION_DEBUG` that will run this offset calculation logic and verify it against getRecordOffset for each storage (when mmap=False)
- This flag is set throughout CI, which means that every time `torch.load` is called, the offset calculation logic is implicitly being tested.

Differential Revision: [D67673026](https://our.internmc.facebook.com/intern/diff/D67673026) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143880 Approved by: https://github.com/albanD ghstack dependencies: #143879	2025-01-27 23:57:30 +00:00
Mikayla Gawarecki	7db0afabaa	Remove lexicographical sorting of storage keys in torch.save (#143879 ) Currently the order is lexicographical (i.e. 0, 10, 11, ..., 19, 2, ...) instead of 0, 1, 2, 3, 4, 5 (the order that storage metadata is actually pickled in). Since PyTorch will never be used with Python < 3.7, we can be assured that the keys will be read in the order of insertion (numerically sorted). This makes the order in which storages are written the same as the pickling/unpickling order, so we can calculate their offsets with fewer random reads. Differential Revision: [D67673025](https://our.internmc.facebook.com/intern/diff/D67673025) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143879 Approved by: https://github.com/albanD	2025-01-27 23:57:30 +00:00
Mikayla Gawarecki	0eda02a94c	Prevent legacy_load when weights_only=True (correctly) (#145020 ) Only prevent `legacy_load` (.tar format removed in https://github.com/pytorch/pytorch/pull/713), not the whole of `_legacy_load` (.tar format + _use_new_zipfile_serialization=False) Differential Revision: [D68301405](https://our.internmc.facebook.com/intern/diff/D68301405) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145020 Approved by: https://github.com/kit1980, https://github.com/albanD	2025-01-17 20:10:22 +00:00
PyTorch MergeBot	aa4a1ff027	Revert "Prevent _legacy_load with weights_only=True (#144914 )" This reverts commit `7c3aa1da1c`. Reverted https://github.com/pytorch/pytorch/pull/144914 on behalf of https://github.com/izaitsevfb due to breaking inductor on trunk ([comment](https://github.com/pytorch/pytorch/pull/144914#issuecomment-2596922781))	2025-01-16 21:29:50 +00:00
Mikayla Gawarecki	7c3aa1da1c	Prevent _legacy_load with weights_only=True (#144914 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144914 Approved by: https://github.com/malfet, https://github.com/albanD	2025-01-16 19:33:46 +00:00
PyTorch MergeBot	99f2491af9	Revert "Use absolute path `path.resolve()` -> `path.absolute()` (#129409 )" This reverts commit `45411d1fc9`. Reverted https://github.com/pytorch/pytorch/pull/129409 on behalf of https://github.com/jeanschmidt due to Breaking internal CI, @albanD please help get this PR merged ([comment](https://github.com/pytorch/pytorch/pull/129409#issuecomment-2571316444))	2025-01-04 14:17:20 +00:00
Xuehai Pan	45411d1fc9	Use absolute path `path.resolve()` -> `path.absolute()` (#129409 ) Changes: 1. Always explicit `.absolute()`: `Path(__file__)` -> `Path(__file__).absolute()` 2. Replace `path.resolve()` with `path.absolute()` if the code is resolving the PyTorch repo root directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129409 Approved by: https://github.com/albanD	2025-01-03 20:03:40 +00:00
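The commit message does not state the motivation for the switch. As general background, the two `pathlib` methods differ in how they treat symlinks: `absolute()` only anchors the path at the current working directory, while `resolve()` also follows symlinks. The sketch below illustrates this with a throwaway directory; the `real` and `link` names are illustrative, and creating symlinks may require extra privileges on Windows.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    real = Path(tmp) / "real"
    real.mkdir()
    link = Path(tmp) / "link"
    os.symlink(real, link, target_is_directory=True)

    # absolute() keeps the symlink component in the path.
    absolute_name = link.absolute().name
    # resolve() replaces the symlink with its target.
    resolved_name = link.resolve().name
    print(absolute_name, resolved_name)
```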
PyTorch MergeBot	cc4e70b7c3	Revert "Use absolute path `path.resolve()` -> `path.absolute()` (#129409 )" This reverts commit `135c7db99d`. Reverted https://github.com/pytorch/pytorch/pull/129409 on behalf of https://github.com/malfet due to need to revert to as dependency of https://github.com/pytorch/pytorch/pull/129374 ([comment](https://github.com/pytorch/pytorch/pull/129409#issuecomment-2562969825))	2024-12-26 17:26:06 +00:00
Xuehai Pan	135c7db99d	Use absolute path `path.resolve()` -> `path.absolute()` (#129409 ) Changes: 1. Always explicit `.absolute()`: `Path(__file__)` -> `Path(__file__).absolute()` 2. Replace `path.resolve()` with `path.absolute()` if the code is resolving the PyTorch repo root directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129409 Approved by: https://github.com/albanD	2024-12-24 08:33:08 +00:00
Animesh Jain	4627cfd1f9	[dynamo] Support user defined dicts (#143548 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143548 Approved by: https://github.com/yanboliang, https://github.com/jansel, https://github.com/williamwen42	2024-12-21 01:46:14 +00:00
Mikayla Gawarecki	372b023eb1	Fix test_serialization_zipfile_actually_jit when weights_only is not default (#143668 ) Fails in fbcode where weights_only isn't default Pull Request resolved: https://github.com/pytorch/pytorch/pull/143668 Approved by: https://github.com/awgu ghstack dependencies: #143326, #143403	2024-12-20 21:25:10 +00:00
Mikayla Gawarecki	8e483654cb	Add config.save.use_pinned_memory_for_d2h to serialization config (#143342 )

This was benchmarked with two separate scripts on my A100:

(A) Save state_dict of llama3-style model on CUDA to disk with ``torch.save``
(B) Save `ModuleList` of 10 `nn.Linear(10000, 10000)` on CUDA to disk with `torch.save`

Timings are an average of 5 runs, and benchmark scripts + results are attached.

Under both scenarios, we see ~2x speedup in ``torch.save`` time with (``compute_crc32=False`` and ``use_pinned_memory_for_d2h=True``) compared to the baseline of the current defaults (``compute_crc32=True`` and ``use_pinned_memory_for_d2h=False``).

(A) Save state_dict of llama3-style model on CUDA to disk with ``torch.save`` [[script](https://gist.github.com/mikaylagawarecki/d3a86ea1bb08045d1a839976808d7432)][[results](https://gist.github.com/mikaylagawarecki/f61a4714e5cff703146a1fcb7e0c755c)]

| | use_pinned_memory_for_d2h=False (Default) | use_pinned_memory_for_d2h=True |
|-|-|-|
| `compute_crc32=True` (Default) | 28.54s | 20.76s |
| `compute_crc32=False` | 22.57s | 14.51s |

(B) Save `ModuleList` of 10 `nn.Linear(10000, 10000)` on CUDA to disk with `torch.save` [[script](https://gist.github.com/mikaylagawarecki/ecbc505436bdd4b5190ef1b3430c12b6)][[results](https://gist.github.com/mikaylagawarecki/4e686bcf030b57de8c3ca74d8f5a88f7)]

| | use_pinned_memory_for_d2h=False (Default) | use_pinned_memory_for_d2h=True |
|-|-|-|
| `compute_crc32=True` (Default) | 8.38s | 5.53s |
| `compute_crc32=False` | 6.94s | 3.99s |

Trace of (A) with `use_pinned_memory_for_d2h=True`, `compute_crc32=False`: [Screenshot 2024-12-16 at 7 32 33 PM](https://github.com/user-attachments/assets/80b87a8c-5a70-4eb9-ad66-7abc4aa7cc25)

Baseline trace of (A) with `use_pinned_memory_for_d2h=False`, `compute_crc32=True`: [Screenshot 2024-12-16 at 7 38 20 PM](https://github.com/user-attachments/assets/13fa12d1-8f5f-424c-adc4-275b67012927)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143342 Approved by: https://github.com/albanD ghstack dependencies: #143324	2024-12-20 21:01:18 +00:00
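The "~2x" figure can be checked directly against the reported timings. The short calculation below uses only the numbers from the two benchmark tables in this commit; the dictionary layout and labels are illustrative.

```python
# Reported torch.save timings in seconds (baseline = current defaults,
# optimized = compute_crc32=False with use_pinned_memory_for_d2h=True).
timings = {
    "A (llama3-style state_dict)": {"baseline": 28.54, "optimized": 14.51},
    "B (10 x nn.Linear)": {"baseline": 8.38, "optimized": 3.99},
}

speedups = {
    name: t["baseline"] / t["optimized"] for name, t in timings.items()
}
for name, speedup in speedups.items():
    print(f"{name}: {speedup:.2f}x")
```

Both scenarios land close to 2x, with roughly 1.97x for (A) and 2.10x for (B).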
Tom Ritchford	d8c8ba2440	Fix unused Python variables in test/[e-z]* (#136964 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136964 Approved by: https://github.com/justinchuby, https://github.com/albanD	2024-12-18 23:02:30 +00:00
Mikayla Gawarecki	ac8342f881	Prevent torch.jit.load path in torch.load when weights_only=True (#143326 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143326 Approved by: https://github.com/albanD	2024-12-18 00:17:41 +00:00
Mikayla Gawarecki	3bec67b8e5	Fix tests in test/test_serialization that were failing if run individually (#141300 ) #140739 and #140740 made it such that `get_safe_globals` no longer returns an empty list by default. This caused some tests that check the content of `get_safe_globals` to fail when run individually (they didn't fail in the test suite because other tests that ran before them called `clear_safe_globals`) [T208186010](https://www.internalfb.com/intern/tasks/?t=208186010): test_safe_globals_for_weights_only, test_safe_globals_context_manager_weights_only. This PR fixes that and also makes most tests calling `clear_safe_globals` use the `safe_globals` context manager rather than try/finally. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141300 Approved by: https://github.com/awgu	2024-11-22 02:40:37 +00:00
Mikayla Gawarecki	37959c554d	Add small test case for #140230 (#140850 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140850 Approved by: https://github.com/malfet ghstack dependencies: #140739, #140740	2024-11-19 02:44:54 +00:00
Mikayla Gawarecki	b63a84804c	Allow NJT by default for weights_only torch.load (take 2) (#140739 )

Per discussion with @malfet, only allow the weights_only unpickler to load NJT if `torch.nested` and `torch._dynamo` are imported (this is slightly weird as technically `torch.nested` is actually imported by default and `torch._dynamo.decorators._DimRange` is actually what needs to be imported).

We can't import this from `torch.nested` as this would

- undo dynamo lazy import
- cause circular import

===========================

Redo of https://github.com/pytorch/pytorch/pull/140304, which caused issues as `torch.nested._internal.foo` needs to be imported, which causes issues like

```python
torch/_weights_only_unpickler.py", line 339, in load
    if full_path in _get_allowed_globals():
torch/_weights_only_unpickler.py", line 188, in _get_allowed_globals
    torch.nested._internal.nested_tensor.NestedTensor
AttributeError: module 'torch.nested' has no attribute '_internal'
```

This likely wasn't caught in our CI because imports are global during unit tests(?), so we use subprocess to properly test this time.

Differential Revision: [D65961691](https://our.internmc.facebook.com/intern/diff/D65961691) @jbschlosser Pull Request resolved: https://github.com/pytorch/pytorch/pull/140739 Approved by: https://github.com/malfet	2024-11-19 02:44:53 +00:00
Mikayla Gawarecki	41bb1539d3	Fix get_unsafe_globals_in_checkpoint to account for user allowed globals per docstring (#140738 ) bugfix: this function did not account for the user allowed globals :( Differential Revision: [D65960696](https://our.internmc.facebook.com/intern/diff/D65960696) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140738 Approved by: https://github.com/malfet	2024-11-15 22:47:35 +00:00
cz2h	9602f56979	Fix misuse of offset param in seek (#140633 ) Fixes #115630. The size of BufferAdapter was calculated incorrectly due to misuse of the Python `seek` method, which caused the miniz reader to be initialized with the wrong size. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140633 Approved by: https://github.com/ezyang Co-authored-by: Edward Z. Yang <ezyang@fb.com>	2024-11-15 19:07:52 +00:00
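The commit message does not show the exact misuse, so the following is only a general illustration of how `seek(offset, whence)` can go wrong when measuring a stream's size. `stream_size` is a hypothetical helper, not the BufferAdapter code.

```python
import io
import os


def stream_size(stream) -> int:
    """Return the total size of a seekable stream, preserving its position."""
    current = stream.tell()
    # Offset 0 relative to the end of the stream; seek returns the new position.
    size = stream.seek(0, os.SEEK_END)
    stream.seek(current, os.SEEK_SET)
    return size


buf = io.BytesIO(b"x" * 100)
buf.seek(10)
size = stream_size(buf)
restored = buf.tell()

# A typical mistake: passing the whence constant as the offset.
# This seeks to absolute position os.SEEK_END (the integer 2), not to the end.
wrong = buf.seek(os.SEEK_END)
print(size, restored, wrong)
```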
Mikayla Gawarecki	ca43ecd599	Flip default on weights_only (#137602 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137602 Approved by: https://github.com/malfet, https://github.com/albanD ghstack dependencies: #138936, #139221, #139433, #139541	2024-11-04 18:30:29 +00:00
Mikayla Gawarecki	ea0e09b3f3	Add utility to get all unsafe globals in checkpoint (no pickletools dependency) (#139221 ) Fixes https://github.com/pytorch/pytorch/issues/129698 https://github.com/pytorch/pytorch/pull/139106 without pickletools Pull Request resolved: https://github.com/pytorch/pytorch/pull/139221 Approved by: https://github.com/malfet ghstack dependencies: #138936	2024-11-01 19:31:39 +00:00
Mikayla Gawarecki	2a309c0997	Fix weights_only for BUILD instructions for user allowlisted objects with __slots__ (#138936 )

Previously the `BUILD` instruction handling missed `__slots__`. This only applies to things allowlisted via `add_safe_globals`/`safe_globals` that use slots.

### Background

When does pickle serialize a `BUILD` instruction? When `state` is not `None` and `state_setter` is `None` [[link](`c5b99f5c2c/Lib/pickle.py (L765)`)]. In this case, the docs tell us that either `__setstate__` or a `__dict__` update will be performed [[link](https://github.com/python/cpython/blob/3.13/Lib/pickletools.py#L1984)].

`__reduce__`/`__reduce_ex__` are expected to return tuples of length 2 to 6 where `state` is the 3rd argument. When the user doesn't override `__reduce__` but overrides `__setstate__`/`__getstate__`, state will be what is returned by `__getstate__`.

Note the return type for [`__getstate__`](https://docs.python.org/3/library/pickle.html#object.__getstate__)

- For a class that has no instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and no [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is None.
- For a class that has an instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and no [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is `self.__dict__`.
- For a class that has an instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is a tuple consisting of two dictionaries: `self.__dict__`, and a dictionary mapping slot names to slot values. Only slots that have a value are included in the latter.
- For a class that has [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__) and no instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__), the default state is a tuple whose first item is None and whose second item is a dictionary mapping slot names to slot values described in the previous bullet.

See the handling in the pickle code `c5b99f5c2c/Lib/pickle.py (L1846-L1867)`

Before this PR, we didn't account for the fact that when `__setstate__` is not defined, `state` might be a tuple, so this would fail

```python
from dataclasses import dataclass

import torch

# Define the dataclass
@dataclass
class MyDataClass:
    __slots__ = ["x", "y"]
    x: int
    y: str

# Create an instance of the dataclass
my_data = MyDataClass(x=2, y="3")
# Save the dataclass to a file
torch.save(my_data, "my_data.pt")
with torch.serialization.safe_globals([MyDataClass]):
    loaded_my_data = torch.load("my_data.pt", weights_only=True)
# AttributeError: 'MyDataClass' object has no attribute '__dict__'
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138936 Approved by: https://github.com/malfet	2024-11-01 00:59:29 +00:00
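The tuple-shaped state described in this commit can be observed with the standard library alone. The sketch below is illustrative and is not the code in `torch/_weights_only_unpickler.py`: `Point` and `apply_state` are hypothetical names, and `apply_state` loosely mirrors what CPython's pure-Python unpickler does for `BUILD` when no `__setstate__` is defined.

```python
class Point:
    __slots__ = ["x", "y"]

    def __init__(self, x, y):
        self.x = x
        self.y = y


point = Point(2, "a")

# The third element of the reduce tuple is the state that BUILD receives.
state = point.__reduce_ex__(2)[2]
print(state)


def apply_state(obj, state):
    """Apply a pickled state to obj, handling the (dict_state, slot_state) form."""
    slot_state = None
    if isinstance(state, tuple) and len(state) == 2:
        state, slot_state = state
    if state:
        obj.__dict__.update(state)
    if slot_state:
        for name, value in slot_state.items():
            setattr(obj, name, value)


clone = Point.__new__(Point)
apply_state(clone, state)
print(clone.x, clone.y)
```

Treating `state` as a plain dict and calling `obj.__dict__.update(state)` unconditionally is what fails for a slots-only class, since such an instance has no `__dict__`.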
Wouter Devriendt	bae3426af7	reimport pr137735 due to merging check issues (#138959 ) This is a cherry-pick from #137735 by @mikaylagawarecki , that cannot be merged due to a (wrongly) failing check for codev @diff-train-skip-merge Pull Request resolved: https://github.com/pytorch/pytorch/pull/138959 Approved by: https://github.com/mikaylagawarecki	2024-10-27 16:31:34 +00:00
Nikita Shulga	b999daf7a9	Add sets to list of safe objects to de-serialize (#138866 )

Lists, dicts and tuples are already allowed; it's a bit weird to exclude set from the list of basic containers.

Test plan (in addition to unittest):

```python
import torch

torch.save({1, 2, 3}, "foo.pt")
torch.load("foo.pt", weights_only=True)
```

Fixes https://github.com/pytorch/pytorch/issues/138851 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138866 Approved by: https://github.com/mikaylagawarecki Co-authored-by: mikaylagawarecki <mikaylagawarecki@gmail.com>	2024-10-25 05:23:08 +00:00
Mikayla Gawarecki	e24871eb3c	Add environment variable to force no weights_only load (#138225 ) In preparation for `weights_only` flip, if users don't have access to the `torch.load` call Pull Request resolved: https://github.com/pytorch/pytorch/pull/138225 Approved by: https://github.com/albanD	2024-10-21 23:26:15 +00:00
PyTorch MergeBot	dd32a32cb6	Revert "Expose option to disable CRC-32 computation during `torch.save` (#137735 )" This reverts commit `534fa96f2d`. Reverted https://github.com/pytorch/pytorch/pull/137735 on behalf of https://github.com/clee2000 due to failing internally D64438525, probably needs gating ([comment](https://github.com/pytorch/pytorch/pull/137735#issuecomment-2417412264))	2024-10-16 17:03:06 +00:00
Mikayla Gawarecki	534fa96f2d	Expose option to disable CRC-32 computation during `torch.save` (#137735 ) Option only works in open source, not internal Pull Request resolved: https://github.com/pytorch/pytorch/pull/137735 Approved by: https://github.com/albanD	2024-10-15 19:30:02 +00:00
Mikayla Gawarecki	b6cb174816	Fix serialization for torch.uint16, torch.uint32, torch.uint64 (#137184 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137184 Approved by: https://github.com/albanD	2024-10-03 14:56:11 +00:00
Jez Ng	a0c76ea853	Make test_skip_data_serialization regex more flexible (#136580 ) Some CI machines seem to throw "Can't get local object" rather than "Can't pickle local object". Pull Request resolved: https://github.com/pytorch/pytorch/pull/136580 Approved by: https://github.com/mikaylagawarecki	2024-09-25 11:27:23 +00:00
Mikayla Gawarecki	a096f2899d	Add torch.serialization.skip_data context manager (#134504 )

## Semantic

The semantic is

(1) By default, `torch.serialization.skip_data(materialize_fake_tensors=False)` will make `torch.save` skip writing storages (but reserve space for them in the checkpoint).

```python
import torch
import torch.nn as nn

sd = nn.Linear(3, 5).state_dict()
with torch.serialization.skip_data():
    torch.save(sd, 'foo.pt')
print(torch.load('foo.pt', weights_only=True))
```

(2) With `torch.serialization.skip_data(materialize_fake_tensors=True)`, if a FakeTensor is passed to `torch.save`, the pickler will treat these FakeTensors as being "materialized" (space will be reserved in the checkpoint for the associated storage bytes, and when loading, the type will be Tensor instead of FakeTensor).

```python
import torch
import torch.nn as nn
from torch._subclasses.fake_tensor import FakeTensorMode

with FakeTensorMode():
    m = nn.Linear(3, 5, dtype=torch.float16, device='cuda')

sd = m.state_dict()
with torch.serialization.skip_data(materialize_fake_tensors=True):
    torch.save(sd, 'bla.pt')
print(torch.load('bla.pt', weights_only=True))
# OrderedDict([('weight', tensor([[0., 0., 0.],
#         [0., 0., 0.],
#         [0., 0., 0.],
#         [0., 0., 0.],
#         [0., 0., 0.]], device='cuda:0', dtype=torch.float16)), ('bias', tensor([0., 0., 0., 0., 0.], device='cuda:0', dtype=torch.float16))])
```

## Follow Ups

- [ ] `torch.load` semantic for skip_data context manager
- [ ] Mechanism for getting offsets of storages saved via this method (for writing in a separate pass)

Differential Revision: [D62238610](https://our.internmc.facebook.com/intern/diff/D62238610) Pull Request resolved: https://github.com/pytorch/pytorch/pull/134504 Approved by: https://github.com/albanD	2024-09-05 16:53:39 +00:00
PyTorch MergeBot	2fd36086bc	Revert "Add torch.serialization.skip_data context manager (#134504 )" This reverts commit `94db935749`. Reverted https://github.com/pytorch/pytorch/pull/134504 on behalf of https://github.com/kit1980 due to See D62082697 ([comment](https://github.com/pytorch/pytorch/pull/134504#issuecomment-2327542276))	2024-09-03 22:21:27 +00:00
Mikayla Gawarecki	94db935749	Add torch.serialization.skip_data context manager (#134504 )

## Semantic

The semantic is

(1) By default, `torch.serialization.skip_data(materialize_fake_tensors=False)` will make `torch.save` skip writing storages (but reserve space for them in the checkpoint).

```python
import torch
import torch.nn as nn

sd = nn.Linear(3, 5).state_dict()
with torch.serialization.skip_data():
    torch.save(sd, 'foo.pt')
print(torch.load('foo.pt', weights_only=True))
```

(2) With `torch.serialization.skip_data(materialize_fake_tensors=True)`, if a FakeTensor is passed to `torch.save`, the pickler will treat these FakeTensors as being "materialized" (space will be reserved in the checkpoint for the associated storage bytes, and when loading, the type will be Tensor instead of FakeTensor).

```python
import torch
import torch.nn as nn
from torch._subclasses.fake_tensor import FakeTensorMode

with FakeTensorMode():
    m = nn.Linear(3, 5, dtype=torch.float16, device='cuda')

sd = m.state_dict()
with torch.serialization.skip_data(materialize_fake_tensors=True):
    torch.save(sd, 'bla.pt')
print(torch.load('bla.pt', weights_only=True))
# OrderedDict([('weight', tensor([[0., 0., 0.],
#         [0., 0., 0.],
#         [0., 0., 0.],
#         [0., 0., 0.],
#         [0., 0., 0.]], device='cuda:0', dtype=torch.float16)), ('bias', tensor([0., 0., 0., 0., 0.], device='cuda:0', dtype=torch.float16))])
```

## Follow Ups

- [ ] `torch.load` semantic for skip_data context manager
- [ ] Mechanism for getting offsets of storages saved via this method (for writing in a separate pass)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134504 Approved by: https://github.com/albanD	2024-08-29 04:52:52 +00:00
PyTorch MergeBot	1285443994	Revert "Add torch.serialization.skip_data context manager (#134504 )" This reverts commit `202600bc23`. Reverted https://github.com/pytorch/pytorch/pull/134504 on behalf of https://github.com/mikaylagawarecki due to This is breaking Windows docs tests due to NamedTemporaryFile on Windows not working well ([comment](https://github.com/pytorch/pytorch/pull/134504#issuecomment-2316543901))	2024-08-29 01:30:49 +00:00
Mikayla Gawarecki	202600bc23	Add torch.serialization.skip_data context manager (#134504 )

## Semantic

The semantic is

(1) By default, `torch.serialization.skip_data(materialize_fake_tensors=False)` will make `torch.save` skip writing storages (but reserve space for them in the checkpoint).

```python
import torch
import torch.nn as nn

sd = nn.Linear(3, 5).state_dict()
with torch.serialization.skip_data():
    torch.save(sd, 'foo.pt')
print(torch.load('foo.pt', weights_only=True))
```

(2) With `torch.serialization.skip_data(materialize_fake_tensors=True)`, if a FakeTensor is passed to `torch.save`, the pickler will treat these FakeTensors as being "materialized" (space will be reserved in the checkpoint for the associated storage bytes, and when loading, the type will be Tensor instead of FakeTensor).

```python
import torch
import torch.nn as nn
from torch._subclasses.fake_tensor import FakeTensorMode

with FakeTensorMode():
    m = nn.Linear(3, 5, dtype=torch.float16, device='cuda')

sd = m.state_dict()
with torch.serialization.skip_data(materialize_fake_tensors=True):
    torch.save(sd, 'bla.pt')
print(torch.load('bla.pt', weights_only=True))
# OrderedDict([('weight', tensor([[0., 0., 0.],
#         [0., 0., 0.],
#         [0., 0., 0.],
#         [0., 0., 0.],
#         [0., 0., 0.]], device='cuda:0', dtype=torch.float16)), ('bias', tensor([0., 0., 0., 0., 0.], device='cuda:0', dtype=torch.float16))])
```

## Follow Ups

- [ ] `torch.load` semantic for skip_data context manager
- [ ] Mechanism for getting offsets of storages saved via this method (for writing in a separate pass)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134504 Approved by: https://github.com/albanD	2024-08-28 23:53:17 +00:00
Mikayla Gawarecki	2ac710e667	Make torch.serialization.set_default_mmap_options usable as a context manager (#134371 ) As title Pull Request resolved: https://github.com/pytorch/pytorch/pull/134371 Approved by: https://github.com/albanD	2024-08-27 14:45:29 +00:00
Mikayla Gawarecki	d9576c9440	Fix failures when default is flipped for weights_only (#127627 ) Tests on XLA shard not fixed yet but there is an issue here https://github.com/pytorch/xla/issues/7799 Pull Request resolved: https://github.com/pytorch/pytorch/pull/127627 Approved by: https://github.com/albanD ghstack dependencies: #132349	2024-08-16 00:22:43 +00:00
Mikayla Gawarecki	c8ad5e37e8	Fix all RuntimeErrors during weights_only load from being erroneously reported with the weights_only message (#132349 ) Caught in above PR #127627 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132349 Approved by: https://github.com/albanD	2024-08-16 00:22:43 +00:00
Luciano Bello	1e9bedf688	Add `_codecs.encode` and `builtins.bytearray` to `_get_allowed_globals` to support bytes and bytearray serialization (#133189 )

Fixes #133163

Debugged in collaboration with @hariveliki

The `bytes` type requires the global `_codecs.encode`. That means the following currently works:

```python
import _codecs

import torch

torch.save(b'hello', '/tmp/dummy.pth')
torch.serialization.add_safe_globals([_codecs.encode])
torch.load('/tmp/dummy.pth', weights_only=True)
```

Similarly, `bytearray` needs `builtins.bytearray`.

Following the promise in the `torch.load` docs, both types should be supported without `add_safe_globals` as they are both primitive types:

> weights_only: Indicates whether unpickler should be restricted to
> loading only tensors, primitive types, dictionaries
> and any types added via :func:`torch.serialization.add_safe_globals`.

This PR adds both `_codecs.encode` and `builtins.bytearray` to `_get_allowed_globals` and tests saving and loading of both types with and without `weights_only`.

Co-authored-by: hariveliki <98284163+hariveliki@users.noreply.github.com> Co-authored-by: mikaylagawarecki <mikaylagawarecki@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/133189 Approved by: https://github.com/mikaylagawarecki	2024-08-13 02:20:28 +00:00
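Why `bytes` drags in `_codecs.encode` can be seen with the standard library alone. Pickle protocol 2, which `torch.save` uses by default, predates the dedicated bytes opcodes, so a non-empty `bytes` object is written as a call to `_codecs.encode`. The sketch below lists the `GLOBAL` opcodes in such a pickle; it uses plain `pickle` rather than `torch.save`.

```python
import pickle
import pickletools

# Protocol 2 has no native bytes opcode, so pickle falls back to a reduce call.
payload = pickle.dumps(b"hello", protocol=2)

# Collect every global the unpickler would have to import for this payload.
globals_used = [
    arg for opcode, arg, _pos in pickletools.genops(payload)
    if opcode.name == "GLOBAL"
]
print(globals_used)

# With protocol 3 or later, bytes are stored directly and need no global.
modern_globals = [
    arg for opcode, arg, _pos in pickletools.genops(pickle.dumps(b"hello", protocol=4))
    if opcode.name in ("GLOBAL", "STACK_GLOBAL")
]
print(modern_globals)
```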
Oguz Ulgen	221350e3a4	Add None return type to init -- tests (#132352 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132352 Approved by: https://github.com/ezyang ghstack dependencies: #132335, #132351	2024-08-01 15:44:51 +00:00
Mikayla Gawarecki	d3556786b8	Blocklist certain modules for weights_only load (#131259 ) Also bold certain text in the error message as suggested. With a GLOBAL like `os.execv`, the error message now reads as follows:
```python
File "/data/users/mg1998/pytorch/torch/serialization.py", line 1256, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. Trying to load unsupported GLOBAL posix.execv whose module posix is blocked. Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131259 Approved by: https://github.com/malfet, https://github.com/albanD	2024-07-22 18:23:21 +00:00
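The idea of a module blocklist can be illustrated with a restricted `pickle.Unpickler`. This is a minimal standard-library sketch, not PyTorch's implementation: `BLOCKED_MODULES`, `RestrictedUnpickler` and `restricted_loads` are hypothetical names, and the set of blocked modules is an assumption chosen for illustration.

```python
import io
import pickle

# Hypothetical blocklist for illustration; the real list in PyTorch may differ.
BLOCKED_MODULES = {"os", "posix", "nt", "subprocess", "sys"}


class RestrictedUnpickler(pickle.Unpickler):
    """Refuse to resolve any global whose top-level module is blocked."""

    def find_class(self, module, name):
        root = module.split(".")[0]
        if root in BLOCKED_MODULES:
            raise pickle.UnpicklingError(
                f"Trying to load unsupported GLOBAL {module}.{name} "
                f"whose module {root} is blocked."
            )
        return super().find_class(module, name)


def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Hand-written protocol 0 pickle: GLOBAL 'os system' followed by STOP.
suspicious = b"cos\nsystem\n."
try:
    restricted_loads(suspicious)
    message = None
except pickle.UnpicklingError as exc:
    message = str(exc)
print(message)
```

Checking the blocklist before anything else means a blocked module is rejected even if a user later allowlists one of its functions by mistake, which is one plausible reason for keeping the two lists separate.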
Mikayla Gawarecki	7c289c2a5c	Add torch.serialization.safe_globals context manager (#127939 ) Add context manager mentioned in https://github.com/pytorch/pytorch/pull/127808#pullrequestreview-2096298486 Pull Request resolved: https://github.com/pytorch/pytorch/pull/127939 Approved by: https://github.com/albanD	2024-07-12 20:38:43 +00:00
Xuehai Pan	4ee1cb9b95	[BE][Easy] replace `import pathlib` with `from pathlib import Path` (#129426 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129426 Approved by: https://github.com/malfet	2024-06-30 01:36:07 +00:00
PyTorch MergeBot	2effbcfcd8	Revert "[BE][Easy] replace `import pathlib` with `from pathlib import Path` (#129426 )" This reverts commit `6d75604ef1`. Reverted https://github.com/pytorch/pytorch/pull/129426 on behalf of https://github.com/XuehaiPan because `Path` is recognized as a new exported API ([comment](https://github.com/pytorch/pytorch/pull/129426#issuecomment-2198371625))	2024-06-29 23:24:06 +00:00
Xuehai Pan	6d75604ef1	[BE][Easy] replace `import pathlib` with `from pathlib import Path` (#129426 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129426 Approved by: https://github.com/malfet	2024-06-29 15:42:09 +00:00
Mikayla Gawarecki	45f3e20527	Improve error message for weights_only load (#129705 ) As @vmoens pointed out, the current error message does not make the "either/or" between setting `weights_only=False` and using `add_safe_globals` clear enough, and it should print the code the user needs to call `add_safe_globals`. The new formatting looks as follows. In the case that `add_safe_globals` can be used:
```python
>>> import torch
>>> from torch.testing._internal.two_tensor import TwoTensor
>>> torch.save(TwoTensor(torch.randn(2), torch.randn(2)), "two_tensor.pt")
>>> torch.load("two_tensor.pt", weights_only=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/users/mg1998/pytorch/torch/serialization.py", line 1225, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options (1) Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. WeightsUnpickler error: Unsupported global: GLOBAL torch.testing._internal.two_tensor.TwoTensor was not an allowed global by default. Please use `torch.serialization.add_safe_globals([TwoTensor])` to allowlist this global if you trust this class/function. Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
```
For other issues (unsupported bytecode):
```python
>>> import torch
>>> t = torch.randn(2, 3)
>>> torch.save(t, "protocol_5.pt", pickle_protocol=5)
>>> torch.load("protocol_5.pt", weights_only=True)
/data/users/mg1998/pytorch/torch/_weights_only_unpickler.py:359: UserWarning: Detected pickle protocol 5 in the checkpoint, which was not the default pickle protocol used by `torch.load` (2). The weights_only Unpickler might not support all instructions implemented by this protocol, please file an issue for adding support if you encounter this.
  warnings.warn(
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/users/mg1998/pytorch/torch/serialization.py", line 1225, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Unsupported operand 149 Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
```
The old formatting would have looked like this:
```python
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/users/mg1998/pytorch/torch/serialization.py", line 1203, in load
    raise pickle.UnpicklingError(UNSAFE_MESSAGE + str(e)) from None
_pickle.UnpicklingError: Weights only load failed. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you get the file from a trusted source. Alternatively, to load with `weights_only` please check the recommended steps in the following error message. WeightsUnpickler error: Unsupported global: GLOBAL torch.testing._internal.two_tensor.TwoTensor was not an allowed global by default. Please use `torch.serialization.add_safe_globals` to allowlist this global if you trust this class/function.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129705 Approved by: https://github.com/albanD, https://github.com/vmoens ghstack dependencies: #129239, #129396, #129509	2024-06-28 19:36:31 +00:00
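The number in `Unsupported operand 149` can be related to the pickle opcode table with a short standard-library check. The sketch below lists the opcodes at the start of a protocol 5 and a protocol 2 pickle. Reading 149 as the `FRAME` opcode is an inference from the opcode values, not something the commit message states, and `leading_opcodes` is a hypothetical helper name.

```python
import pickle
import pickletools


def leading_opcodes(obj, protocol, count=2):
    """Return (name, byte value) for the first `count` opcodes of obj's pickle."""
    payload = pickle.dumps(obj, protocol=protocol)
    ops = [(op.name, ord(op.code)) for op, _arg, _pos in pickletools.genops(payload)]
    return ops[:count]


sample = [1.0, 2.0, 3.0]

# Protocol 4 and later wrap the payload in a frame right after the PROTO header.
print(leading_opcodes(sample, protocol=5))

# Protocol 2 has no framing, so no opcode with value 149 appears at the start.
print(leading_opcodes(sample, protocol=2))
```

Under that reading, a reader that only implements the protocol 2 instruction set would stop at the very first instruction after the header of a protocol 5 checkpoint.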
Mikayla Gawarecki	303ad8d7f5	Add warning for weights_only (#129239 ) Also changes default for `weights_only` to `None` per comment below (hence the `suppress-bc-linter` tag) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129239 Approved by: https://github.com/albanD, https://github.com/malfet	2024-06-26 14:20:19 +00:00

149 Commits