Summary:
## Original commit message:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73368
The debug_pkl file inside PyTorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source, which is a stack trace, a filename, and start/end numbers. These are emitted in the debug_pkl file as strings.
Since many SourceRanges share the same Source, the trace strings can be deduplicated.
The newer format saves the set of unique traces in a tuple, and each SourceRange then saves the offset of its trace within that tuple (i.e. manually applied dictionary compression); see the sketch below.
This shrinks the file size. On loading, however, if we copied each trace into a Source as a string, runtime memory would still blow up.
To mitigate this, we use SourceView directly instead of Source, which takes a reference to the string inside the Deserializer as a string_view. This is safe because the Deserializer is held by the Unpickler via shared_ptr, and the Unpickler is in turn held via shared_ptr by another Source object, which stays alive for the duration of model construction.
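A minimal Python sketch of the dictionary-compression idea (illustrative only; the real format is written by the C++ pickler, and the field layout here is not the actual debug_pkl schema):
```
def dedup_traces(source_ranges):
    """source_ranges: list of (trace_str, start, end) tuples."""
    unique_traces = []
    offset_of = {}
    compressed = []
    for trace, start, end in source_ranges:
        if trace not in offset_of:
            offset_of[trace] = len(unique_traces)
            unique_traces.append(trace)
        # Each SourceRange stores the offset of its trace in the tuple.
        compressed.append((offset_of[trace], start, end))
    return tuple(unique_traces), compressed
```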
Test Plan:
## Original Test plan
unit test
Took the original file (312271638_930.predictor.disagg.local), loaded it with `torch.jit.load`, and saved it again with `torch.jit.save`. Unzipped both archives and compared their contents (resave sketch after the listing):
```
[qihan@devvm5585.vll0 ~]$ du archive -h
4.0K archive/xl_model_weights
3.7M archive/extra
8.0K archive/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K archive/code/__torch__/caffe2/torch/fb/model_transform
8.0K archive/code/__torch__/caffe2/torch/fb
8.0K archive/code/__torch__/caffe2/torch
8.0K archive/code/__torch__/caffe2
20M archive/code/__torch__/torch/fx/graph_module
20M archive/code/__torch__/torch/fx
8.0K archive/code/__torch__/torch/classes
20M archive/code/__torch__/torch
20M archive/code/__torch__
20M archive/code
2.7M archive/constants
35M archive
[qihan@devvm5585.vll0 ~]$ du resaved -h
4.0K resaved/extra
8.0K resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K resaved/code/__torch__/caffe2/torch/fb/model_transform
8.0K resaved/code/__torch__/caffe2/torch/fb
8.0K resaved/code/__torch__/caffe2/torch
8.0K resaved/code/__torch__/caffe2
1.3M resaved/code/__torch__/torch/fx/graph_module
1.3M resaved/code/__torch__/torch/fx
8.0K resaved/code/__torch__/torch/classes
1.4M resaved/code/__torch__/torch
1.4M resaved/code/__torch__
1.4M resaved/code
2.7M resaved/constants
13M resaved
[qihan@devvm5585.vll0 ~]$
```
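The resave step is just a round trip through the public API (a sketch; file names as in the listing above):
```
import torch

# Load the original archive and save it back so debug_pkl is rewritten
# in the deduplicated format; the two zips are then compared with `du`.
m = torch.jit.load("312271638_930.predictor.disagg.local")
torch.jit.save(m, "resaved")
```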
## Additional test:
`buck test mode/dev-tsan //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --exact 'caffe2/benchmarks/static_runtime:static_runtime_cpptest - StaticRuntime.to'` passes
test jest.fbios.startup_cold_start.local.simulator f333356873 -
Differential Revision: D35196883
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74869
Approved by: https://github.com/gmagogsfm
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75201
In this diff:
1. Bump the supported version to 9, which serves as a placeholder for the upcoming v9 bump for the flatbuffer format migration.
2. Implement backport from a v9 flatbuffer file to a v8 pickle file (see the sketch below).
ghstack-source-id: 153225189
(Note: this ignores all push blocking failures!)
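A sketch of driving the backport from Python, assuming the helper bindings in `torch.jit.mobile` (names as in current releases; treat as illustrative):
```
from torch.jit.mobile import _backport_for_mobile, _get_model_bytecode_version

# Back-port a bytecode-v9 flatbuffer model to the v8 pickle format.
_backport_for_mobile("model_v9.ptl", "model_v8.ptl", to_version=8)
assert _get_model_bytecode_version("model_v8.ptl") == 8
```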
Test Plan:
fb:
```
cd ~/fbsource/fbcode/ && buck test -c fbcode.caffe2_enable_flatbuffer=1 caffe2/test/cpp/jit:jit -- LiteInterpreterTest.BackPortByteCodeModelAllVersions
Parsing buck files: finished in 0.7 sec
Downloaded 0/25 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 20.7 sec (100%) 21783/21783 jobs, 5/21783 updated
cd ~/fbsource/fbcode/ && buck test caffe2/test/cpp/jit:jit -- FlatbufferTest.FlatbufferBackPortTest
Parsing buck files: finished in 0.7 sec
Building: finished in 4.5 sec (100%) 12972/53298 jobs, 0/53298 updated
Total time: 5.3 sec
More details at https://www.internalfb.com/intern/buck/build/b658d597-d358-4293-97cb-28e7612b96e8
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 35d5542d-6ee3-4c28-be10-1d822c7a6fef
Trace available for this run at /tmp/tpx-20220308-090347.891303-35d5542d-6ee3-4c28-be10-1d822c7a6fef/trace.log
RemoteExecution session id: reSessionID-35d5542d-6ee3-4c28-be10-1d822c7a6fef-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8444249379196000
✓ ListingSuccess: caffe2/test/cpp/jit:jit : 490 tests discovered (22.838)
✓ Pass: caffe2/test/cpp/jit:jit - FlatbufferTest.FlatbufferBackPortTest (0.289)
Summary
Pass: 1
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/8444249379196000
```
Reviewed By: iseeyuan
Differential Revision: D34702597
fbshipit-source-id: 5c203c29d13360d7934ce6e57557739e7038c05e
(cherry picked from commit 6189e08a2bd968fdab636f77cb6bd73d6c36beb2)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75244
Original commit changeset: d653a5af662a
Original Phabricator Diff: D35060736 (d9d34922a0)
Test Plan: Model loading test; verified that D35060736 (d9d34922a0) causes the torch::save => torch::load failure.
Reviewed By: yinghai, jianyuh
Differential Revision: D35387009
fbshipit-source-id: 9d176992d402d57779e2af3d905b3c1538335298
(cherry picked from commit 6c8cc0d3b8a88b15e35702d70e18bbae8aa4628a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74119
Implemented a function to generate source as an ExtraFilesMap and constants, and
wrote a function to construct a JIT module given an (ivalue, source,
constants) triple.
Test Plan: unittest
Reviewed By: pavithranrao
Differential Revision: D34803945
fbshipit-source-id: 2edc798407fe68294cb4c3c7516f5bd143df88c3
(cherry picked from commit 35e54e166b8f0f5cfe8f08c07866b59ae61ee79d)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70338
Today Unpickler is used by both server and mobile for deserializing models, and it always falls back to the mobile type parser when the user provides no type resolver. This is not intended, as the server and mobile type parsers support different things. In this diff we provide a default fallback using the script parser and opt out of it for all mobile cases.
ghstack-source-id: 146727330
(Note: this ignores all push blocking failures!)
Test Plan: CI
Reviewed By: iseeyuan
Differential Revision: D33284352
fbshipit-source-id: 997c4f110b36eee6596e8f23f6a87bf91a4197ed
Summary:
This PR was created to replace the https://github.com/pytorch/pytorch/pull/53180 PR stack, which has all the review discussions. A replacement was needed due to a messy Sandcastle issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64234
Reviewed By: gmagogsfm
Differential Revision: D30656444
Pulled By: ansley
fbshipit-source-id: 77536c8bcc88162e2c72636026ca3c16891d669a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59735
1. Fixes the ABA storage-identity problem during serialization for `torch.package` by keeping references to serialized storages for the lifetime of the `PackageExporter`, preventing reuse of a memory address. Achieved by extending the logic used to solve the same issue on mobile (see the sketch below).
2. Adds determinism to the naming scheme of serialized storages in export code paths which utilize `tensor_cdata_naming_scheme` (introduces a second mapping in `StorageContext`, which now maps `storage cdata ptr` -> `unique id` and `unique id` -> `c10::Storage`).
3. Additionally uses the presence of a storage in the `StorageContext` instance as a marker for whether that storage has been serialized, removing the need to scan the `PythonStreamWriter` for the storage's serialization file.
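A rough Python sketch of the two mappings and why holding the storage avoids the ABA hazard (the real `StorageContext` is C++; `untyped_storage()` is the current API spelling):
```
import torch

seen = {}        # storage cdata ptr -> unique id  (first mapping)
keep_alive = []  # unique id -> storage            (second mapping)

def storage_id(storage) -> int:
    ptr = storage.data_ptr()
    if ptr not in seen:
        # Holding a reference keeps the allocation alive, so this address
        # cannot be recycled for a different storage during export.
        seen[ptr] = len(keep_alive)
        keep_alive.append(storage)
    return seen[ptr]

t = torch.zeros(4)
assert storage_id(t.untyped_storage()) == storage_id(t.untyped_storage())
```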
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D29075276
Pulled By: Lilyjjo
fbshipit-source-id: 15a5c30b1de99c5bd7079388f2db9b6ece2eca12
Summary:
Add an API, `_get_bytecode_version`, that returns the version number of a bytecode model, in both C++ and Python; the input can come from either a file path or a buffer (usage sketch below).
## Test
CI (the newly added unit test runs as part of `pytorch_core-buck`)
1. run test_lite_interpreter.cpp
2. `python test/mobile/test_bytecode.py`
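A usage sketch; current releases expose the binding as `_get_model_bytecode_version` in `torch.jit.mobile` (the exact name may differ from this diff):
```
import io
from torch.jit.mobile import _get_model_bytecode_version

# From a file path:
v = _get_model_bytecode_version("lite_model.ptl")
# From a buffer:
with open("lite_model.ptl", "rb") as f:
    v = _get_model_bytecode_version(io.BytesIO(f.read()))
```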
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56801
ghstack-source-id: 128169647
Test Plan:
CI (the newly added unit test runs as part of `pytorch_core-buck`)
1. run test_lite_interpreter.cpp
2. `python test/mobile/test_bytecode.py`
Reviewed By: iseeyuan
Differential Revision: D27961417
fbshipit-source-id: f786cc9573d855feecff0b4fe8e5363e25f5728c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57098
1. Separate `readArchiveAndTensors()` from `jit/import.cpp` into a new file, `jit/import_read.cpp`.
2. Use `readArchiveAndTensors()` in `mobile/import.cpp`.
3. Add a C++ util function that can read .pkl files directly instead of loading the entire module (see the sketch below).
ghstack-source-id: 127703081
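The util itself is C++-side; a rough Python analogue for inspecting a single `.pkl` record without loading the module (paths illustrative):
```
import pickletools, zipfile

# A .pt file is a zip archive, so individual pickle records can be read
# directly. Storages appear as persistent ids, so a plain pickle.load()
# would need a custom persistent_load hook.
with zipfile.ZipFile("archive.pt") as z:
    data = z.read("archive/data.pkl")
pickletools.dis(data)
```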
Test Plan: CI
Reviewed By: raziel, iseeyuan
Differential Revision: D28052193
fbshipit-source-id: c8d57f3270bdcf2e52a32f7c111899bd5da7cac2
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname, "-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit", "--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892
Reviewed By: H-Huang
Differential Revision: D27991944
Pulled By: malfet
fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50832
Please see the previous diff in this stack for the motivation. This makes the same change, but for the non-mobile codebase.
ghstack-source-id: 120184012
Test Plan: Sandcastle + Build
Reviewed By: raziel, iseeyuan
Differential Revision: D25979986
fbshipit-source-id: 7708f4f6a50cb16d7a23651e5655144d277d0a4f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48802
The current torch::jit::load API only supports a unique_ptr ReadAdapterInterface input, but in some cases torch::jit::load may not be the only consumer of the reader adapter. This diff adds an overload of torch::jit::load that accepts a shared_ptr PyTorchStreamReader.
Reviewed By: malfet, houseroad
Differential Revision: D25241904
fbshipit-source-id: aa403bac9ed820cc0e94342aebfe524a1d5bf913
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40718
Currently, every constant except tensors must be inlined during serialization;
tensors are stored in the constant table. This patch generalizes the
constant-table capability to any IValue. This is particularly useful for
non-ASCII string literals, which cannot be inlined (see the sketch below).
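A small repro of the use case via the Python API (a sketch, not from this diff):
```
import io
import torch

class M(torch.nn.Module):
    def forward(self) -> str:
        # Non-ASCII literal: cannot be inlined into the serialized source,
        # so it travels through the (now generalized) constant table.
        return "héllo →"

buf = io.BytesIO()
torch.jit.save(torch.jit.script(M()), buf)
buf.seek(0)
assert torch.jit.load(buf)() == "héllo →"
```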
Test Plan: Imported from OSS
Differential Revision: D22298169
Pulled By: bzinodev
fbshipit-source-id: 88cc59af9cc45e426ca8002175593b9e431f4bac
Summary:
Clearly expressing that a type was inferred by PyTorch, rather than explicitly annotated by the user, makes many error messages more user-friendly.
Currently, Type has two string-conversion methods: str() for IR printing and python_str() for serialization and error-message generation. To include more information in type printing while maintaining serialization/deserialization correctness, we need to split python_str() into annotation_str() and repr_str().
annotation_str() is solely responsible for serialization and strictly matches the format of a Python type annotation. repr_str() is responsible for generating human-readable error messages, which may include information like "this type was inferred, not explicitly annotated".
Closes https://github.com/pytorch/pytorch/issues/39449
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39544
Differential Revision: D21978759
Pulled By: gmagogsfm
fbshipit-source-id: 733566f5a62e748b5ca4bb3c5943ebb6d5b664d0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37472
Our convention is for `findX` to return an optional version and `getX`
to assert that the X is there. Fix up `getMethod` to be consistent with
this convention.
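In Python terms the convention amounts to the following (a sketch, not the C++ API):
```
from typing import Callable, Optional

def find_method(obj: object, name: str) -> Optional[Callable]:
    # findX: may return nothing
    return getattr(obj, name, None)

def get_method(obj: object, name: str) -> Callable:
    # getX: asserts that the X is there
    m = find_method(obj, name)
    assert m is not None, f"method '{name}' not found"
    return m
```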
Test Plan: Imported from OSS
Differential Revision: D21297543
Pulled By: suo
fbshipit-source-id: b40f56231cc8183e61bbb01fe5c0c113bcb6464d
Summary:
fmt is a formatting library for C++. It has several properties that make it nice
for inclusion in PyTorch:
- Widely used
- Basically copies how Python does it
- Support for all the compilers and platforms we care about
- Standards track (C++20)
- Small code size
- Header only
This PR includes it as a submodule and sets up the build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37356
Differential Revision: D21262619
Pulled By: suo
fbshipit-source-id: 1d9a1a5ed08a634213748e7b02fc718ef8dac4c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37251
This was broken by recent changes to how we serialize with type tags. We
save a name (like `Dict[str, MyNamedTuple]`) and then rely on the
mobile type parser to resolve that name back into a set of types.
This doesn't work for any NamedType, as the mobile type parser doesn't
know how to resolve those. The unpickler allows the caller to inject a
type resolver for this purpose; use that so that importing in a
non-mobile environment gives the right results.
A second problem also had to be fixed: the SourceImporter type loader
would only load named types directly (e.g. `MyNamedTuple`) and choked if
given a general type that contained a named tuple (e.g.
`List[MyNamedTuple]`). Fixed that and renamed `loadNamedType` to
`loadType` for clarity (see the sketch below).
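A minimal sketch of the scenario this fixes, via the Python API:
```
import io
from typing import List, NamedTuple
import torch

class Point(NamedTuple):
    x: float
    y: float

class M(torch.nn.Module):
    def forward(self) -> List[Point]:
        return [Point(1.0, 2.0)]

buf = io.BytesIO()
torch.jit.save(torch.jit.script(M()), buf)
buf.seek(0)
m = torch.jit.load(buf)  # resolving List[Point] used to choke here
```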
Test Plan: Imported from OSS
Differential Revision: D21235213
Pulled By: suo
fbshipit-source-id: 16db0f4c5e91a890d67a8687cc8ababa6b94b0f4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35115
This commit runs the newly added tools/clang_format.py on the JIT
codebase and includes all of the formatting changes thus produced.
Testing:
Ran the script, CI.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D20568523
Pulled By: SplitInfinity
fbshipit-source-id: e09bdb982ccf090eecfb7c7b461b8d0681eef82b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34515
Once upon a time we thought this was necessary. In reality it is not, so
removing it.
For backcompat, our public interface (defined in `api/`) still has
typedefs to the old `script::` names.
There was only one collision: `Pass` as a `Stmt` and `Pass` as a graph
transform. I renamed one of them.
Test Plan: Imported from OSS
Differential Revision: D20353503
Pulled By: suo
fbshipit-source-id: 48bb911ce75120a8c9e0c6fb65262ef775dfba93
Summary:
Stacked PRs
* #33474 - [jit] Remove list specializations from pickler
* **#33255 - [jit] Add type tags to lists/dicts in pickle**
This adds a global call to `torch.jit._pickle.restore_type_tags` for
lists and dicts so that we can preserve their types after serialization.
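A sketch of what the tags preserve, via the Python API:
```
import io
from typing import Dict
import torch

class M(torch.nn.Module):
    table: Dict[str, int]

    def __init__(self):
        super().__init__()
        self.table = {"a": 1}

    def forward(self) -> Dict[str, int]:
        return self.table

buf = io.BytesIO()
torch.jit.save(torch.jit.script(M()), buf)
buf.seek(0)
m = torch.jit.load(buf)  # the Dict[str, int] tag survives the round trip
```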
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33255
Pulled By: driazati
Differential Revision: D20346780
fbshipit-source-id: c8534954ef4adb2e3c880401acbee30cd284f3db
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33294
1. Serialize the bytecode of `__setstate__` and run it when loading the model (see the sketch below).
2. One use case is quantization. To test this use case, a few operators are temporarily registered for the lite interpreter. The `_`-prefixed registrations will be removed once the operators are all migrated to mobile.
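A sketch of a module whose `__getstate__`/`__setstate__` get compiled and shipped as bytecode (the lite-interpreter helper names are from current releases and may differ here):
```
from typing import Tuple
import torch
from torch.jit.mobile import _load_for_lite_interpreter

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.scale = 2.0

    @torch.jit.export
    def __getstate__(self) -> Tuple[float]:
        return (self.scale,)

    @torch.jit.export
    def __setstate__(self, state: Tuple[float]):
        # Serialized as bytecode and executed when the model is loaded.
        self.scale = state[0]

    def forward(self, x):
        return x * self.scale

scripted = torch.jit.script(M())
scripted._save_for_lite_interpreter("m.ptl")
m = _load_for_lite_interpreter("m.ptl")  # runs the compiled __setstate__
```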
Test Plan: Imported from OSS
Differential Revision: D20162898
Pulled By: iseeyuan
fbshipit-source-id: 7a3180807bf38fbce594d86993896861f12bb58c
Summary:
Stacked PRs
* #33474 - [jit] Remove list specializations from pickler
* **#33255 - [jit] Add type tags to lists/dicts in pickle**
This adds a global call to `torch.jit._pickle.restore_type_tags` for
lists and dicts so that we can preserve their types after serialization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33255
Pulled By: driazati
Reviewed By: xman1979, Tianshu-Bao
Differential Revision: D19868637
fbshipit-source-id: 2f1826e6679a786ca209198690269f399a542c04