Summary: To get the source for a particular module, the "correct" approach is to consult the module's spec and call `get_source` when the loader is a `SourceFileLoader`, since loader subclasses may serve source from somewhere other than `__file__`; the spec is the source of truth. For the torch packager, however, we prefer linecache, but the loader could still remap the file, so where possible we resolve the module's filename through the spec's loader rather than using `module.__file__`.
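A minimal sketch of that resolution order (the helper name is hypothetical):
```
import linecache
from importlib.machinery import SourceFileLoader

def _module_source_lines(module):
    # Hypothetical helper: resolve the filename through the spec's loader when
    # possible, since a SourceFileLoader subclass may serve source from
    # somewhere other than module.__file__.
    spec = getattr(module, "__spec__", None)
    filename = None
    if spec is not None and isinstance(spec.loader, SourceFileLoader):
        filename = spec.loader.get_filename(module.__name__)
    if filename is None:
        filename = getattr(module, "__file__", None)
    if filename is None:
        return None
    # Passing module.__dict__ lets linecache consult __loader__.get_source,
    # so remapped files still resolve.
    return linecache.getlines(filename, module.__dict__)
```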
Test Plan: This code path will get exercised by CI. Also added a test for remapped files.
Differential Revision: D41412983
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90258
Approved by: https://github.com/PaliC
Summary:
This logic traverses the pickle entries to find the module for each STACK_GLOBAL entry.
According to 2837241f22/Lib/pickletools.py (L1799), we need to look for GET, BINGET, and LONG_BINGET, so this diff updates the traversal accordingly.
While testing, I also found cases of empty module names (e.g. for `tanh`), so I added an option to skip processing when that happens.
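A simplified sketch of that traversal; only the string pushes and memo opcodes that can feed STACK_GLOBAL are tracked:
```
import pickletools

def _stack_global_modules(data: bytes):
    # The module name consumed by STACK_GLOBAL may arrive via a memo lookup
    # (GET / BINGET / LONG_BINGET) rather than a literal string push.
    memo, stack = {}, []
    for opcode, arg, _pos in pickletools.genops(data):
        name = opcode.name
        if name in ("SHORT_BINUNICODE", "BINUNICODE", "UNICODE"):
            stack.append(arg)
        elif name in ("PUT", "BINPUT", "LONG_BINPUT") and stack:
            memo[arg] = stack[-1]
        elif name == "MEMOIZE" and stack:
            memo[len(memo)] = stack[-1]
        elif name in ("GET", "BINGET", "LONG_BINGET"):
            stack.append(memo.get(arg))
        elif name == "STACK_GLOBAL" and len(stack) >= 2:
            if stack[-2]:  # skip empty module names (e.g. as seen for tanh)
                yield stack[-2]
            del stack[-2:]
```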
Test Plan: Tested with f392778829
Differential Revision: D41748595
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90223
Approved by: https://github.com/PaliC
Summary: When using torch deploy, if we apply an fx transformation and then try to pickle/unpickle an fx GraphModule, the GraphModule's generated code may depend on `builtins` even though we never added `builtins` as an extern module.
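A hedged sketch of the scenario and the fix (the extern patterns are illustrative):
```
import torch
from torch.package import PackageExporter

# fx generates Python source for forward(); that generated code can
# reference `builtins`, so deploy must be able to re-import it.
gm = torch.fx.symbolic_trace(torch.nn.Linear(4, 4))

with PackageExporter("gm.pt") as exporter:
    exporter.extern("torch.**")
    exporter.extern("builtins")  # the fix: extern builtins explicitly
    exporter.save_pickle("model", "model.pkl", gm)
```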
Reviewed By: PaliC
Differential Revision: D40958730
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88385
Approved by: https://github.com/PaliC
### Description
Since the major changes for `_TypedStorage` and `_UntypedStorage` are now complete, they can be renamed to be public.
`TypedStorage._untyped()` is renamed to `TypedStorage.untyped()`.
Documentation for storages is improved as well.
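For example, under the new public names (a small sketch):
```
import torch

t = torch.tensor([1.0, 2.0])
typed = t.storage()        # torch.TypedStorage, formerly torch._TypedStorage
untyped = typed.untyped()  # renamed from TypedStorage._untyped()
assert isinstance(untyped, torch.UntypedStorage)
```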
### Issue
Fixes #82436
### Testing
N/A
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82438
Approved by: https://github.com/ezyang
The last entry, `torch/onnx/**/*.py`, will be covered in a separate PR by the onnx code owners.
### Description
Once ufmt (black + usort) covers the same set of files as black, we can remove black and keep only one "true" linter for pytorch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82043
Approved by: https://github.com/kit1980
I noticed that in #81261 all of the stdlib module names were explicitly listed; however, as of Python 3.10 the stdlib has a built-in mechanism for this. https://github.com/python/cpython/issues/87121
I figured it was better to use `sys.stdlib_module_names` going forward for 3.10+ instead of having to maintain this file for every new Python release. For docs see:
https://docs.python.org/3/library/sys.html#sys.stdlib_module_names
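A minimal sketch of the approach (`_hardcoded_stdlib_modules` is a hypothetical name standing in for the existing set):
```
import sys

if sys.version_info >= (3, 10):
    # Ask the interpreter itself; no per-release maintenance needed.
    stdlib_modules = frozenset(sys.stdlib_module_names)
else:
    # Fall back to the hand-maintained set on older Pythons.
    stdlib_modules = _hardcoded_stdlib_modules
```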
I did a symmetric difference to determine what the effective change would be. I verified that everything listed in this file is included in `sys.stdlib_module_names`. However, there are modules in `sys.stdlib_module_names` that were not included in the previous hard-coded definition, namely:
```
frozenset({'__future__',
'_abc',
'_aix_support',
'_asyncio',
'_bisect',
'_blake2',
'_bootsubprocess',
'_bz2',
'_codecs',
'_codecs_cn',
'_codecs_hk',
'_codecs_iso2022',
'_codecs_jp',
'_codecs_kr',
'_codecs_tw',
'_collections',
'_collections_abc',
'_compat_pickle',
'_compression',
'_contextvars',
'_crypt',
'_csv',
'_ctypes',
'_curses',
'_curses_panel',
'_datetime',
'_dbm',
'_decimal',
'_elementtree',
'_frozen_importlib',
'_frozen_importlib_external',
'_functools',
'_gdbm',
'_hashlib',
'_heapq',
'_imp',
'_io',
'_json',
'_locale',
'_lsprof',
'_lzma',
'_markupbase',
'_md5',
'_msi',
'_multibytecodec',
'_multiprocessing',
'_opcode',
'_operator',
'_osx_support',
'_overlapped',
'_pickle',
'_posixshmem',
'_posixsubprocess',
'_py_abc',
'_pydecimal',
'_pyio',
'_queue',
'_random',
'_scproxy',
'_sha1',
'_sha256',
'_sha3',
'_sha512',
'_signal',
'_sitebuiltins',
'_socket',
'_sqlite3',
'_sre',
'_ssl',
'_stat',
'_statistics',
'_string',
'_strptime',
'_struct',
'_symtable',
'_threading_local',
'_tkinter',
'_tracemalloc',
'_uuid',
'_warnings',
'_weakref',
'_weakrefset',
'_winapi',
'_zoneinfo',
'antigravity',
'genericpath',
'idlelib',
'nt',
'nturl2path',
'opcode',
'pydoc_data',
'pyexpat',
'this'})
```
I'm not sure if excluding these matters. I wouldn't think it would, but if it does and it is better to explicitly update this file each time, then feel free to close this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81520
Approved by: https://github.com/malfet
Summary:
This pr addresses https://github.com/pytorch/multipy/issues/82 and https://github.com/pytorch/multipy/issues/44. The changes will be copied over to [pytorch/multipy](https://github.com/pytorch/multipy) as well.
A C extension module behaves a bit differently than a normal Python package in that it does not have a `__path__` attribute. However, such modules still carry information about their submodules, so this PR checks whether a module is a C extension module and whether the module we are looking for is one of its children.
For example, when importing `torch._C._nn`, we check whether the parent `torch._C` is a C extension module and, if so, whether `torch._C._nn` is a proper child of `torch._C`.
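A minimal sketch of that check (the helper name is hypothetical):
```
import types

def _is_child_of_c_extension(parent: types.ModuleType, fullname: str) -> bool:
    # C extension modules lack __path__, but their submodules are still
    # reachable as attributes, e.g. torch._C._nn on torch._C.
    if hasattr(parent, "__path__"):
        return False  # ordinary package; normal import machinery applies
    child = getattr(parent, fullname.rpartition(".")[2], None)
    return isinstance(child, types.ModuleType)
```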
Differential Revision: D37630120
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80917
Approved by: https://github.com/d4l3k
Summary:
Applies new import merging and sorting from µsort v1.0.
When merging imports, µsort will make a best-effort to move associated
comments to match merged elements, but there are known limitations due to
the diynamic nature of Python and developer tooling. These changes should
not produce any dangerous runtime changes, but may require touch-ups to
satisfy linters and other tooling.
Note that µsort uses case-insensitive, lexicographical sorting, which
results in a different ordering compared to isort. This provides a more
consistent sorting order, matching the case-insensitive order used when
sorting import statements by module name, and ensures that "frog", "FROG",
and "Frog" always sort next to each other.
For details on µsort's sorting and merging semantics, see the user guide:
https://usort.readthedocs.io/en/stable/guide.html#sorting
Test Plan: S271899
Reviewed By: lisroach
Differential Revision: D36402110
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78973
Approved by: https://github.com/osalpekar
Summary: This adds usage logging for deploy and package. These logs can be used to track where they are being used in production so we can support them better.
Test Plan: no functional changes - existing tests
Reviewed By: PaliC
Differential Revision: D36258876
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77097
Approved by: https://github.com/PaliC
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72237
Add a generic zip file reader/writer to torch.package in order to remove the dependency on torch for non-TorchScript / non-tensor uses of package. This also lets users derive from the zip file reader/writer classes to supply their own serialization/deserialization when performance requires it.
https://www.internalfb.com/intern/diff/D35423079/ was reverted because this refactor changed where most of the implementation components of PackageExporter/PackageImporter (e.g. ModuleActionType_) are defined.
This diff also updates the import paths of those components to point to the correct file, compared to D35423079.
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D35423079
Pulled By: PaliC
fbshipit-source-id: 31abc4364d5fd007911cfb67cf36ebfac5d786f4
(cherry picked from commit 023b0d1445e0b1e1bb7a03c660cd62eb9d26d2a6)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74998
This is a cleaned-up version of a util that I commonly use to figure out where new transitive (surprise!) dependencies come from when a model breaks, since in large models it can be difficult to tell exactly which code change indirectly added the dependency. I tried to keep the opinionated bits out of OSS as much as possible.
Reviewed By: PaliC
Differential Revision: D35265017
fbshipit-source-id: e126e03aa113db6ab79d32d86a69bcbba844875e
(cherry picked from commit ffa0f4b0a294cbcf7413b176d0d2fa0a605b3b9e)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74702
- add flag in constructor.
- add an if-condition in `_intern_module` that routes C extensions to extern.
- add unit test for the new condition.
Test Plan: Imported from OSS
Reviewed By: PaliC
Differential Revision: D35124731
fbshipit-source-id: a4b7fdf3210e0ad4bfd1ea30fd94595d10405987
(cherry picked from commit 57239a77ae099328025ab2d634e7880bd14a473b)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74610
Adds the Python version to the exported package and reads it back on import, per https://github.com/pytorch/pytorch/issues/74068.
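A hedged sketch of the write side; in the PR this happens inside the exporter, and the record location is an assumption:
```
import platform
from torch.package import PackageExporter

with PackageExporter("model.pt") as exporter:
    # Record the packaging interpreter's version so the importer can
    # warn when the package is loaded under a different Python.
    exporter.save_text(".data", "python_version", platform.python_version())
```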
ghstack-source-id: 152003088
Test Plan: CI Tests
Reviewed By: PaliC
Differential Revision: D35062709
fbshipit-source-id: 04091a1255a09b96255112a60d31df127c424193
(cherry picked from commit ed39fd54b8b20918dac89a2873ecccf06aafd724)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74315
Now, instead of raising a NotImplementedError when the PackageExporter finds an object from a mocked module, it combines these errors with the rest of the PackageExporter's PackagingErrors to produce something like
```
torch.package.package_exporter.PackagingError:
* Module was mocked out, but is still being used in the package. Please intern or extern the mocked modules if objects are supposed to be in the package.
package_a
Context: Object(s) '['PackageASubpackageObject']' from module package_a was mocked out during packaging but is being used in resource - obj.pkl in package obj.
package_a.subpackage
Context: Object(s) '['PackageASubpackageObject']' from module package_a.subpackage was mocked out during packaging but is being used in resource - obj.pkl in package obj.
```
This makes it significantly easier to fix mocked object errors as they all should appear at once.
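A hedged sketch of code that now surfaces the aggregated error (`obj` and the module names are illustrative):
```
from torch.package import PackageExporter

with PackageExporter("out.pt") as exporter:
    exporter.mock("package_a.**")
    exporter.intern("**")
    # obj: an object whose pickle references package_a (illustrative).
    # Closing the exporter now raises one PackagingError listing every
    # mocked module still in use, instead of a NotImplementedError per object.
    exporter.save_pickle("obj", "obj.pkl", obj)
```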
Test Plan: Imported from OSS
Reviewed By: d4l3k
Differential Revision: D34951973
Pulled By: PaliC
fbshipit-source-id: 01ee4ba3767967ef9a9bcd69ad86362ebc100b2d
(cherry picked from commit 900edd270ee8f5802fc6e56df08fff6b073ac6f2)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74315
Now, instead of raising a NotImplementedError when the PackageExporter finds an object from a mocked module, it combines these errors with the rest of the PackageExporter's PackagingErrors to produce something like
```
torch.package.package_exporter.PackagingError:
* Module was mocked out, but is still being used in the package. Please intern or extern the mocked modules if objects are supposed to be in the package.
package_a
Context: Object(s) '['PackageASubpackageObject']' from module package_a was mocked out during packaging but is being used in resource - obj.pkl in package obj.
package_a.subpackage
Context: Object(s) '['PackageASubpackageObject']' from module package_a.subpackage was mocked out during packaging but is being used in resource - obj.pkl in package obj.
```
This makes it significantly easier to fix mocked object errors as they all should appear at once.
Test Plan: Imported from OSS
Reviewed By: aivanou
Differential Revision: D34932200
Pulled By: PaliC
fbshipit-source-id: 7f12bd88dbfbad974fd04b5dcaba3203b5c68a04
(cherry picked from commit 73df434ddd3e26f0e4c5ea3dd2ca1b6984736213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73106
The original error message didn't have next steps, and someone got confused. This error message should make debugging a bit easier.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D34559499
Pulled By: PaliC
fbshipit-source-id: fd5fec9c4db10a20775435a587bad24336a671ef
(cherry picked from commit efdcf1e198389ee156a46cf0f8b185d8145d3266)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72533
The current check that we have for dummy packages is too expansive; it
will skip anything without a `__file__`, including extension modules in
the standard library.
So first check if a module was created by torch.package before skipping
it, which should rule out anything accidentally getting skipped (as the
only time torch.package creates something without a `__file__` is the
dummy case).
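A minimal sketch of the tightened check (the attribute and helper names are assumptions):
```
def _is_package_dummy(module) -> bool:
    # Only skip modules torch.package itself created as dummies; a stdlib
    # extension module also lacks __file__ but must not be skipped.
    created_by_package = getattr(module, "__torch_package__", False)
    return created_by_package and not hasattr(module, "__file__")
```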
Test Plan: Imported from OSS
Reviewed By: PaliC
Differential Revision: D34082792
Pulled By: suo
fbshipit-source-id: 18b17eb0f693927697657b20843ec5cd8bcccb47
(cherry picked from commit f571370078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71520
Specifically this pr is to deal with adverse cases such as D33662192
It is possible for someone to export a package, change something in it, and then attempt to repackage it. In that case, some dependencies of the package may no longer be interned, and it is not obvious where we should look for them, so we throw an error.
Test Plan: Imported from OSS
Reviewed By: bradleyhd
Differential Revision: D33675557
Pulled By: PaliC
fbshipit-source-id: 807962bfb340d30d418617d6e78661a033828314
(cherry picked from commit 1b10c23807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70641
Raises a NotImplementedError if we attempt to pickle an object which uses a mocked module. We no longer have to load the object to trigger this check; it now happens right on the saving path.
Review history is on https://github.com/pytorch/pytorch/pull/69793; the PR was moved to a different branch because the original branch got corrupted.
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D33414365
Pulled By: PaliC
fbshipit-source-id: 6d72ddb05c47a3d060e9622ec0b6e5cd6c6c71c8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71025
TL;DR In some cases:
1) user imports `dill`, which mutates `_Pickler.dispatch`,
2) user imports lib that imports `torch.package`
3) `PackagePickler.dispatch = _Pickler.dispatch.copy()` makes a copy of the mutated table
4) user calls `dill.extend(use_dill=False)` to reset `_Pickler.dispatch`, expecting everything to be okay
5) `PackagePickler` is used to pickle something like `ModuleDict`. `PackagePickler.dispatch` has stale entries to dill pickle functions like `save_module_dict`, which sometimes hard-code calls to `StockPickler.save_global`, which is unaware of torch.package module prefixes.
6) Exception is raised, e.g. `Got unhandled exception Can't pickle <class '<torch_package_2>.caffe2.mylib'>: it's not found as <class '<torch_package_2>.caffe2.mylib'>`
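The interaction, restated as a runnable sketch (step numbers match the list above):
```
import dill                    # (1) mutates pickle._Pickler.dispatch

import torch.package          # (2)-(3) PackagePickler copies the table,
                              #         dill's entries included

dill.extend(use_dill=False)   # (4) resets pickle._Pickler.dispatch...
# (5) ...but the copied dispatch still holds dill save functions such as
# save_module_dict, which hard-code StockPickler.save_global and don't
# understand <torch_package_N> module prefixes.
# (6) Pickling through torch.package then fails with "it's not found as ...".
```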
Differential Revision: D33483672
fbshipit-source-id: d7cd2a925bedf27c02524a6a4c3132a262f5c984
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62030
Remove dtype tracking from Python Storage interface, remove all the different `<type>Storage` classes except for `ByteStorage`, and update serialization accordingly, while maintaining as much FC/BC as possible
Fixes https://github.com/pytorch/pytorch/issues/47442
* **THE SERIALIZATION FORMAT IS FULLY FC/BC.** We worked very hard to make sure this is the case. We will probably want to break FC at some point to make the serialization structure of tensors make more sense, but not today.
* There is now only a single torch.ByteStorage class. Methods like `Tensor.set_` no longer check that the dtype of storage is appropriate.
* As we no longer know what the dtype of a storage is, we've **removed** the size method from Storage, replacing it with nbytes. This helps catch otherwise silent errors where you confuse the number of elements with the number of bytes.
* `Storage._new_shared` takes a `nbytes` kwarg and will reject previous positional only calls. `Storage._new_with_file` and `_set_from_file` require explicit element size arguments.
* It's no longer possible to convert storages to different types using the float/double/etc methods. Instead, do the conversion using a tensor.
* It's no longer possible to allocate a typed storage directly using FloatStorage/DoubleStorage/etc constructors. Instead, construct a tensor and extract its storage. The classes still exist but they are used purely for unpickling.
* The preexisting serialization format stores dtype with storage, and in fact this dtype is used to determine the dtype of the tensor overall.
To accommodate this case, we introduce a new TypedStorage concept that exists only during unpickling time which is used to temporarily store the dtype so we can construct a tensor. **If you overrode the handling of pickling/unpickling, you MUST add handling for TypedStorage** or your serialization code will degrade to standard file-based serialization.
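A hedged sketch of the migration for a couple of the bullets above:
```
import torch

s = torch.ByteStorage(16)
nbytes = s.nbytes()               # size() is gone; count bytes, not elements

t = torch.arange(4, dtype=torch.float32)
converted = t.double().storage()  # convert via a tensor, not storage.double()
```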
Original pull request: https://github.com/pytorch/pytorch/pull/59671
Reviewed By: soulitzer, ngimel
Differential Revision: D29466819
Pulled By: ezyang
fbshipit-source-id: 4a14e5d3c2b08e06e558683d97f7378a3180b00e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65101
As title. Previously this was guarded against for implementation
simplicity, as we didn't really think there was a use case for saving a
mangled module name directly.
But people started doing stuff like:
```
exporter.save_module(my_imported_obj.__module__)
```
which implicitly passes along the mangled module name.
This PR makes it so that a given `PackageImporter` instance can always
import modules that it created, and changes `PackageExporter` to
properly demangle the resulting module name when writing the package to
the export archive.
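A minimal sketch of the demangling step, using torch.package's internal `_mangling` helpers (treated here as an assumption about the internal API):
```
from torch.package._mangling import demangle, is_mangled

name = my_imported_obj.__module__      # e.g. "<torch_package_0>.my.module"
if is_mangled(name):
    name = demangle(name)              # "my.module" is what goes in the archive
```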
Differential Revision: D30975712
Test Plan: Imported from OSS
Pulled By: suo
fbshipit-source-id: d9e849bf651713890e72dccdcef74fa52d377149