Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75230
Op diagonal is a view op which we can't code-gen yet. Therefore, support
it by making hand-written IR construction and lowering.
Test Plan: ./build/bin/test_lazy --gtest_filter=LazyOpsTest.TestDiagonal*
Reviewed By: wconstab
Differential Revision: D35378316
Pulled By: alanwaketan
fbshipit-source-id: 7958d00107aef20ac37aabcf2868346240977530
(cherry picked from commit 84155528fce484627c9688cfd92fd4aeb68219e5)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75201
In this diff:
1. Bump supported version to 9, which will serve as a placeholder for upcoming version bump to v9 for flatbuffer format migration.
2. Implements backport from v9 flatbuffer file to v8 pickle file.
ghstack-source-id: 153225189
(Note: this ignores all push blocking failures!)
Test Plan:
fb:
```
cd ~/fbsource/fbcode/ && buck test -c fbcode.caffe2_enable_flatbuffer=1 caffe2/test/cpp/jit:jit -- LiteInterpreterTest.BackPortByteCodeModelAllVersions
Parsing buck files: finished in 0.7 sec
Downloaded 0/25 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 20.7 sec (100%) 21783/21783 jobs, 5/21783 updated
cd ~/fbsource/fbcode/ && buck test caffe2/test/cpp/jit:jit -- FlatbufferTest.FlatbufferBackPortTest
Parsing buck files: finished in 0.7 sec
Building: finished in 4.5 sec (100%) 12972/53298 jobs, 0/53298 updated
Total time: 5.3 sec
More details at https://www.internalfb.com/intern/buck/build/b658d597-d358-4293-97cb-28e7612b96e8
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 35d5542d-6ee3-4c28-be10-1d822c7a6fef
Trace available for this run at /tmp/tpx-20220308-090347.891303-35d5542d-6ee3-4c28-be10-1d822c7a6fef/trace.log
RemoteExecution session id: reSessionID-35d5542d-6ee3-4c28-be10-1d822c7a6fef-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8444249379196000
✓ ListingSuccess: caffe2/test/cpp/jit:jit : 490 tests discovered (22.838)
✓ Pass: caffe2/test/cpp/jit:jit - FlatbufferTest.FlatbufferBackPortTest (0.289)
Summary
Pass: 1
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/8444249379196000
```
Reviewed By: iseeyuan
Differential Revision: D34702597
fbshipit-source-id: 5c203c29d13360d7934ce6e57557739e7038c05e
(cherry picked from commit 6189e08a2bd968fdab636f77cb6bd73d6c36beb2)
Summary:
PyTorch supports registering a custom operator by `TORCH_LIBRARY_FRAGMENT` / `TORCH_LIBRARY_IMPL` and `torch::jit::tensorexpr::getNNCLoweringRegistry` could insert a custom operator. But the te fuser passes conditional check does not support custom operator. The `isSupported` of `tensorexpr_fuser` checks whether the `Node` is `get_tensorexpr_elementwise_set()`, `supported_non_eltwise_set()`, `supported_misc_set` and `supported_reduction_set`. If a custom operator needs to be added to the TE fusion group, the checked will block it.
Taking the RN50 as an example, we can speed up the model by fusing the convolution and consecutive element-wise operator into a custom operator. The framework overhead becomes non-negligible when the computation becomes more efficient, especially for the latency mode and the tiny models. If the TE fuser allows adding the custom operator to the fusion group, then the entire RN50 model could be fused by TE as a single operator/function consisting of "ExternalCalls" and TE-IR. This could significantly reduce framework overhead, which in turn improves RN50 E2E performance. The same goes for other models.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73073
Reviewed By: pbelevich
Differential Revision: D35453165
Pulled By: ZolotukhinM
fbshipit-source-id: a764cf340b0b1e05fe230649cbe44f5786bdd37d
(cherry picked from commit ee95aa4d36714540fbb216a338799e6a6bb966d5)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57775
The minimum supported bytecode version is updated from 3 to 4. We no longer support version 3 bytecode models.
Why?
* There are hacky codes in operator loading, that performs differently on one operator on the global bytecode version 3. Instead operator related metadata should be passed (for example, in #56845). To allow future development, we remove the hacky way first.
* The bytecode version was bumped from 3 to 4 more than half a year ago. Since all the production models are all bumped to version 4, it's not practical to keep and maintain version 3. The risk to deprecate version 3 is low.
Test Plan: Imported from OSS
Reviewed By: raziel
Differential Revision: D28270791
Pulled By: cccclai
fbshipit-source-id: 70b1bd6352fdaae5f8d2173b81578d77018c8e44
(cherry picked from commit 3e930fa381cd01f3705116795c6426df992372fc)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75244
Original commit changeset: d653a5af662a
Original Phabricator Diff: D35060736 (d9d34922a0)
Test Plan: Model loading test, verified that D35060736 (d9d34922a0) will cause the torch::save => torch::load failure.
Reviewed By: yinghai, jianyuh
Differential Revision: D35387009
fbshipit-source-id: 9d176992d402d57779e2af3d905b3c1538335298
(cherry picked from commit 6c8cc0d3b8a88b15e35702d70e18bbae8aa4628a)
Summary:
update the upgrader models by hacking backport logic - copy everything in the model and only rewrite the bytecode version to 4 in D35265596
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75120
ghstack-source-id: 152823046
Test Plan: CI
Reviewed By: qihqi
Differential Revision: D35321154
fbshipit-source-id: 333158bd0fd9b4819b3b7cf47d80c285934adf3e
(cherry picked from commit 74bb2da73a4d18f448b8486772643eac89eb759a)
Summary:
Previously, the torchscript backend would be (partially) initialized at startup.
- the dispatcher registrations would be registered,
- but other backend components would not be initialized until explicitly calling
the backend init function
With this change, the torchscript backend is not initialized until its explicit
initialization function is called.
This enables external backends to register their own backend instead of the torchscript
backend to the same (Lazy) key.
Lands a change contributed by antoniojkim via lazy_tensor_staging branch (https://github.com/pytorch/pytorch/issues/73973)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74557
Reviewed By: bdhirsh
Differential Revision: D35051464
Pulled By: wconstab
fbshipit-source-id: 5a8b0851293e394f49427d1416ee571a8881fe9f
(cherry picked from commit ef745a4a2c8d1d7f9510541a20f1f40625ce29de)
Summary:
This PR introduces `SymInt` type to Pytorch which will be used by LTC and AOTAutograd for tracing size arithmetic and tests.
`SymInt` is a C++ union structure [int64_t, SymbolicIntNode*] that wraps around an int64_t field where the value of the field could be an index into a list of `shared_ptr<SymbolicIntNode>` or a real int.
This PR doesn't add any support for actually tracing symbolic ints. i.e. data_ for now can only contain real ints.
```
Goal 1: just to show we can add a type to PyTorch core. (wraps int) LANDEABLE
Finalize the naming - symint
Want the name to be short
Does invoke “size” - NO
SInt/SymInt/SymbolicInt
SInt could mean signed int
sym_int or symint or SymInt (originally it was “int”; capitalized implies object semantics, whereas lowercase implies value semantics)
JIT schema - symint
C++ - symint
```
See more details here: https://docs.google.com/document/d/1iiLNwR5ohAsw_ymfnOpDsyF6L9RTUaHMpD8 (d843f63f2a)YLw-jxEw
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74861
Reviewed By: qihqi, ngimel
Differential Revision: D35226230
Pulled By: Krovatkin
fbshipit-source-id: 34acf342bd50fcaa4d8d5dd49c2fd6a98823a5b3
(cherry picked from commit 218643f63ef181cabb92d13a6e837eb64f2dda3c)
Summary:
Things changed in this PR that requires review:
test/forward_backward_compatibility/check_forward_backward_compatibility.py
Our previous function overload extension names were wrong and has been updated in this PR, hence the compatibility list updated.
nvfuser code updates with bug fixes towards failures we encountered in OpInfoTests as well as failures reported by AOTAutograd team.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73627
Reviewed By: Chillee
Differential Revision: D34765458
Pulled By: davidberard98
fbshipit-source-id: c81f3d6a1b723fb3a8ba419b7f82227f70440ca7
(cherry picked from commit b6a2c362c37051e44fac31687b2fe272f776551e)
This is a technical revert of 6d36bbde7e to reconcile it with e50478c02592597f12b8490ec5496f76c7d8b8cc (which is the same + lint changes applied)
Should be skipped during import
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74580
Handle Flatbuffer-serialized parameters.
Make `_load_parameters()` detect the input data format and use the correct deserializer to load the parameters.
Also, rename `BytecodeDeserializer` to `IValueUnpickler` to make it clear that it unpickles an `IValue` and doesn't have anything to do with bytecode.
ghstack-source-id: 152487890
Test Plan:
New unit test shows a successful round trip from _save_parameters() to _load_parameters() using flatbuffers.
```
$ buck test //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
Building: finished in 0.5 sec (100%) 346/346 jobs, 0/346 updated
Total time: 0.6 sec
Testing: finished in 0.5 sec (26 PASS/0 FAIL)
BUILD SUCCEEDED
RESULTS FOR //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
PASS <100ms 13 Passed 0 Skipped 0 Failed //xplat/caffe2:test_lite_trainer
PASS <100ms 13 Passed 0 Skipped 0 Failed //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
TESTS PASSED
```
Reviewed By: qihqi
Differential Revision: D34488913
fbshipit-source-id: 8d2c0b895699f3b336115d33bf96d49cbf9245d2
(cherry picked from commit 319345deff260826197f8cdf5ac03071b412c72f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74579
Now that we can convert a module to a flatbuffer, update `_save_parameters()` to optionally write to that format.
Also, rename the internal `ScriptModuleSerializer` class to `IValuePickler` to make it more clear that a) it's pickle-specific, and b) it serializes IValues, not Modules.
ghstack-source-id: 152487889
Test Plan:
New unit test shows that we can produce Flatbuffer-formatted output.
```
$ buck test //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
Building: finished in 0.5 sec (100%) 346/346 jobs, 0/346 updated
Total time: 0.6 sec
Testing: finished in 0.5 sec (26 PASS/0 FAIL)
BUILD SUCCEEDED
RESULTS FOR //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
PASS <100ms 13 Passed 0 Skipped 0 Failed //xplat/caffe2:test_lite_trainer
PASS <100ms 13 Passed 0 Skipped 0 Failed //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
TESTS PASSED
```
A new test in later commit D34488913 tests the full round trip.
Reviewed By: qihqi
Differential Revision: D34408538
fbshipit-source-id: eea183c31b5e1b2b75a65f384d8a479223a4ae72
(cherry picked from commit de310a15422b65fb7e443f7005d287d9f5f586bc)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74186
Make the execution settings mutable on function_impl so that we can set it for running op decompositions. Add mapping to function objects and show example in test of executing op decompositions.
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D34938125
Pulled By: eellison
fbshipit-source-id: adf108b2f6c1bd166910c6d7b94245661d67ce0d
(cherry picked from commit 9957e33803002d9e71abe4ff802769270b6960d3)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74012
This allows setting an executor on a function. The first use case is use to decompositions in C++ without additional fusion passes etc which might not work with custom tensors like batched tensors/vmap. A subsequent use case might be taking advantage of invokees of JIT execution which guard on certain properties before invocation (such as complete shapes in AOT autograd, rank in lazy tensor).
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D34938124
Pulled By: eellison
fbshipit-source-id: cf7a45416457942b872322cab47d871a8336bdb5
(cherry picked from commit 9c600eb9ad0f2173f003e511268e97584edae36d)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73875
Previously we had a few settings:
- getExecutor - which toggled between Profiling Executor and Legacy
- getGraphOptimize - if true, overrides PE/Legacy to run with simple executor (no optimizations)
and then...
- getProfilingMode - which would set PE to 0 specializtions.
The last mode is redundant with getGraphOptimize, we should just remove it and use getGraphOptimize in these cases. It would lead to potentially invalid combinations of logic - what does mean if getProfilingMode is true but getExecutor is set to false ? This would lead to a bug in specialize_autograd_zero in this case, see: https://github.com/pytorch/pytorch/blob/master/torch%2Fcsrc%2Fjit%2Fpasses%2Fspecialize_autogradzero.cpp#L93.
The tests here are failing but get fixed with the PR above it, so i'll squash for landing.
Test Plan: Imported from OSS
Reviewed By: cpuhrsch
Differential Revision: D34938130
Pulled By: eellison
fbshipit-source-id: 1a9c0ae7f6d1cfddc2ed3499a5af611053ae5e1b
(cherry picked from commit cf69ce3d155ba7d334022c42fb2cee54bb088c23)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74594
Extending `_save_for_mobile` and `_load_for_mobile` to support faltbuffer format with additional optional argument which is set to pick pickle by default.
Adding new binary target with suffix `_pickle_and_flatbuffer` to help migration.
Size test in D34909502 shows the size has regressed by ~40K but after removing pickle and comparing lite_predictors we have ~120K size measure that we will achieve when deprecating pickle and moving to flatbuffer
**BEFORE:**
```lang=mermaid
graph TD;
torch_core-->torch_mobile_deserialize;
torch_mobile_core-->torch_mobile_deserialize;
jit_module_saving-->torch_core;
jit_module_saving-->torch_mobile_core;
torch_mobile_deserialize-->caffe2_serialize;
torch_mobile_deserialize-->torch_mobile_module;
caffe2_serialize-->miniz;
flatbuffer_loader-->mobile_bytecode;
flatbuffer_serializer-->mobile_bytecode;
mobile_bytecode-->flatbuffer_2.0;
flatbuffer_loader-->torch_mobile_module;
flatbuffer_serializer-->torch_mobile_module;
```
**AFTER:**
```lang=mermaid
graph TD;
torch_core-->torch_mobile_deserialize;
torch_mobile_core-->torch_mobile_deserialize;
jit_module_saving-->torch_core;
jit_module_saving-->torch_mobile_core;
torch_mobile_deserialize-->caffe2_serialize;
torch_mobile_deserialize-->torch_mobile_module;
caffe2_serialize-->miniz;
flatbuffer_loader-->mobile_bytecode;
flatbuffer_serializer-->mobile_bytecode;
mobile_bytecode-->flatbuffer_2.0;
torch_mobile_deserialize_pickle_and_flatbuffer-->|new| flatbuffer_loader;
torch_mobile_deserialize_pickle_and_flatbuffer-->|new| torch_mobile_deserialize;
torch_mobile_core_pickle_and_flatbuffer-->|new| torch_mobile_deserialize_pickle_and_flatbuffer;
torch_core_pickle_and_flatbuffer-->|new| torch_mobile_deserialize_pickle_and_flatbuffer;
jit_module_saving_pickle_and_flatbuffer-->|new| torch_core_pickle_and_flatbuffer;
jit_module_saving_pickle_and_flatbuffer-->|new| torch_mobile_core_pickle_and_flatbuffer;
flatbuffer_serializer-->torch_mobile_module;
jit_module_saving_pickle_and_flatbuffer-->|new|jit_module_saving;
jit_module_saving_pickle_and_flatbuffer-->|new|flatbuffer_serializer;
flatbuffer_loader-->torch_mobile_module;
```
Original commit changeset: 780dfb6fd6ba
Original Phabricator Diff: D34805092 (284b2b7135)
ghstack-source-id: 152044801
(Note: this ignores all push blocking failures!)
Test Plan:
CI
```
~/fbsource/fbcode] cd ~/fbsource/fbcode/ && buck test -c fbcode.caffe2_enable_flatbuffer=1 //caffe2/test/cpp/jit:jit -- FlatbufferTest.ExtraFiles
Parsing buck files: finished in 0.9 sec
Building: finished in 5.3 sec (100%) 12992/54304 jobs, 0/54304 updated
Total time: 6.2 sec
More details at https://www.internalfb.com/intern/buck/build/2b387fff-f813-4cfa-b53f-eb2378630d4e
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d
Trace available for this run at /tmp/tpx-20220323-134108.766518-f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d/trace.log
RemoteExecution session id: reSessionID-f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4503599723101693
✓ ListingSuccess: caffe2/test/cpp/jit:jit : 486 tests discovered (19.122)
✓ Pass: caffe2/test/cpp/jit:jit - FlatbufferTest.ExtraFiles (0.187)
Summary
Pass: 1
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4503599723101693
```
Similar Build Deps Dags
```
[pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops_pickle_and_flatbuffer, //xplat/caffe2:torch_mobile_deserialize_pickle_and_flatbuffer)' --output-format dot-compact | pastry
P486770901: https://www.internalfb.com/intern/paste/P486770901/
[pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops, //xplat/caffe2:torch_mobile_deserialize)' --output-format dot-compact | pastry
P486771278: https://www.internalfb.com/intern/paste/P486771278/
```
pickle_and_flatbuffer: https://www.internalfb.com/intern/dgw/graph/?build_id=P486770901
pickle: https://www.internalfb.com/intern/dgw/graph/?build_id=P486771278
Reviewed By: iseeyuan
Differential Revision: D35067157
fbshipit-source-id: 9044259c17a2e0da79bd6aedb28efbdfd57e23e0
(cherry picked from commit f738069ec3a72e79da56172741d027de514e9e5f)
Summary:
Also enables bazel build to run lazy codegen. Bazel (oss) build feeds off the same filelists as cmake/buck (build_variables.bzl), so enabling it is easier than keeping it disabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74111
Test Plan: Run CI and verify test_lazy_ops is running via OSS cmake builds
Reviewed By: bdhirsh
Differential Revision: D34772403
fbshipit-source-id: 8a63f58b9536e6ac1be530667932176ef2549496
(cherry picked from commit e807ffb1918853d10b924fdc24f85ee5b1a39021)
Summary:
This merges changes that have already been reviewed/landed onto lazy_tensor_staging branch. It combines changes from multiple PRs into one diff.
updated from lazy_tensor_staging on 3/16
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74311
Test Plan:
Run CI to ensure compilation on various platforms
Run unit tests on lazy_tensor_staging branch with source version of all these diffs
Reviewed By: desertfire
Differential Revision: D34929235
fbshipit-source-id: babbc3bbeabc5b8107ee9284ed7765887a148622
(cherry picked from commit d91577a6557343ec536f6859e4808ec1a8a9b685)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74309
Since the test file is large, it can be landed on its own and then switched on
in the diff that actually builds lazy tensor code.
Test Plan: verify CI passes
Reviewed By: desertfire
Differential Revision: D34928619
fbshipit-source-id: cd556155326f7fb55b3f29031f80bc36c936d565
(cherry picked from commit 60945adbefb6a8d19f89e330f8b344d076b13bfc)
Summary:
Hooks into existing autograd codegen script (generate_code.py) to take advantage of its integrations into buck/cmake/bazel.
Adds a new option (--gen_lazy_ts_backend) to. generate_code.py, calling this from CMake OSS build and fbcode build, but not from other internal xplat/ovrsource builds (these could be opted in later)
Bazel support is added in a later diff.
Includes one generated file (torch/csrc/lazy/generated/LazyIr.h) in a unit test (test/cpp/lazy/test_ir.cpp) to partially verify the generator is working, but does not compile the remaining output sources from the generator yet as they depend on other files not yet landed from lazy_tensor_staging branch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73996
Test Plan: OSS/internal CI - verify all builds are working and test_ir.cpp compiles LazyIr.h
Reviewed By: ezyang
Differential Revision: D34408536
fbshipit-source-id: 8af0aea3b95d81eccafc17d64390d70ddd176515
(cherry picked from commit f930612f2bad61c76eb02d85cfbec9f33a1459dc)