Commit Graph

655 Commits

Author SHA1 Message Date
jishaomin
91e9fcf5b0 sup torch script parameterlist
Fixes #61176

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75479
Approved by: https://github.com/davidberard98
2022-04-20 20:53:07 +00:00
Elias Ellison
0c671c15ec [JIT] Remove CSE Hoisting
This has led to a couple of bugs, and I don't think the additional complexity was worth keeping in the codebase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75756
Approved by: https://github.com/davidberard98
2022-04-19 20:59:25 +00:00
Han Qi
b34b192d6b Reland "Make debug_pkl smaller by only emitting unique traces." (#73368)
Summary:
## Original commit message:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73368

The debug_pkl file inside PyTorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source, which is a stack trace, a filename, and start and end numbers. Those are emitted in the debug_pkl file as strings.
Since many SourceRanges share the same Source, the trace string can be deduplicated.
The newer format saves the set of unique traces in a tuple; each SourceRange then saves the offset of its trace, i.e. its position in that tuple (in effect, manually applying dictionary compression).
This reduces the file size. On loading, however, copying each trace into a Source as a string would still blow up runtime memory.
To mitigate this, we use SourceView directly instead of Source; it takes a reference to the string inside the Deserializer and turns it into a string_view. This is safe because the Deserializer is held by the Unpickler via shared_ptr, and the Unpickler is in turn held via shared_ptr by another Source object. That Source object stays alive during model construction.
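
The deduplication described above can be sketched in plain Python. This is only an illustration of the dictionary-compression idea, not the serializer's actual code: the `(trace, start, end)` record layout and the `compress` / `decompress` names are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical stand-in for a serialized SourceRange: (trace, start, end).
SourceRange = Tuple[str, int, int]


def compress(ranges: List[SourceRange]):
    """Store each distinct trace once; each range keeps only an index into that tuple."""
    index_of = {}       # trace string -> position in unique_traces
    unique_traces = []  # distinct traces, in first-seen order
    compact = []        # (trace_index, start, end) records
    for trace, start, end in ranges:
        if trace not in index_of:
            index_of[trace] = len(unique_traces)
            unique_traces.append(trace)
        compact.append((index_of[trace], start, end))
    return tuple(unique_traces), compact


def decompress(unique_traces, compact) -> List[SourceRange]:
    """Rebuild the original records by looking each index up in the tuple."""
    return [(unique_traces[i], start, end) for i, start, end in compact]


ranges = [
    ("model.py:forward", 0, 10),
    ("model.py:forward", 10, 25),
    ("layers.py:linear", 0, 8),
    ("model.py:forward", 25, 40),
]
unique_traces, compact = compress(ranges)
```

Here the long trace string is stored twice instead of four times, and the saving grows with the number of ranges that share a trace.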

Test Plan:
## Original Test plan
unit test

Took the original file (312271638_930.predictor.disagg.local), loaded it with `torch.jit.load`, and saved it again with `torch.jit.save`. Unzipped both and compared the contents:
```
[qihan@devvm5585.vll0 ~]$ du archive -h
4.0K    archive/xl_model_weights
3.7M    archive/extra
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform
8.0K    archive/code/__torch__/caffe2/torch/fb
8.0K    archive/code/__torch__/caffe2/torch
8.0K    archive/code/__torch__/caffe2
20M     archive/code/__torch__/torch/fx/graph_module
20M     archive/code/__torch__/torch/fx
8.0K    archive/code/__torch__/torch/classes
20M     archive/code/__torch__/torch
20M     archive/code/__torch__
20M     archive/code
2.7M    archive/constants
35M     archive
[qihan@devvm5585.vll0 ~]$ du resaved -h
4.0K    resaved/extra
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform
8.0K    resaved/code/__torch__/caffe2/torch/fb
8.0K    resaved/code/__torch__/caffe2/torch
8.0K    resaved/code/__torch__/caffe2
1.3M    resaved/code/__torch__/torch/fx/graph_module
1.3M    resaved/code/__torch__/torch/fx
8.0K    resaved/code/__torch__/torch/classes
1.4M    resaved/code/__torch__/torch
1.4M    resaved/code/__torch__
1.4M    resaved/code
2.7M    resaved/constants
13M     resaved
[qihan@devvm5585.vll0 ~]$
```
## Additional test:
`buck test mode/dev-tsan //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --exact 'caffe2/benchmarks/static_runtime:static_runtime_cpptest - StaticRuntime.to'` passes

 test jest.fbios.startup_cold_start.local.simulator f333356873 -

Differential Revision: D35196883

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74869
Approved by: https://github.com/gmagogsfm
2022-04-18 22:34:21 +00:00
John Clow
f281d83d77 Moving Remove Tensor Type Specializations to after custom passes
This is to allow for Intel folks to use type information in their custom passes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71748

Approved by: https://github.com/eellison
2022-04-11 22:12:01 +00:00
Emma Blink
ca056cc918 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D35543681

fbshipit-source-id: 0453f35c2a39299df172dc2b4fc77fb73963bb97
(cherry picked from commit aae11d9628a1cf7fd88a2113191f31e979750bc8)
2022-04-11 13:48:41 +00:00
eellison
00d11de564 [JIT] Add support for closed over inf
Fixes https://github.com/facebookresearch/torchdynamo/issues/124
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75439
Approved by: https://github.com/anijain2305, https://github.com/davidberard98
2022-04-07 21:39:01 +00:00
Elias Ellison
9a8e605565 Add support for legacy tensor constructors in JIT (#74785)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74785

Fix for https://github.com/facebookresearch/torchdynamo/issues/93

Because the constructors follow a non-standard input schema (variadic integers), they are handled specially in ir_emitter.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D35362762

Pulled By: eellison

fbshipit-source-id: 960badf08ba2ab0818af5fd331aff3542051250f
(cherry picked from commit bd579dead5a5206fc6e5b535ecf4f99ae67ee135)
2022-04-06 18:11:23 +00:00
Elias Ellison
43b56b3814 Add Parsing of tensor constants (#75119)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75119

Add support for parsing Tensor constants like Double(4, 4) ... by initializing random tensors. This makes saving IR and then parsing it lossy, so it is off by default, but it is useful in cases like reproducing fusions with tensor constants post-freezing.

cc Krovatkin

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D35373999

Pulled By: eellison

fbshipit-source-id: a5c8d9f93f23a7442258fc745ed6b6def330dca8
(cherry picked from commit 32dd6567522973563bd452bf486ed27b02e4e35c)
2022-04-06 18:00:53 +00:00
David Berard
e9e75215e2 [JIT] Optionally validate nvfuser outputs after execution (#74361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74361

This adds an optional validation after executing an NVFuser node, which checks that the output is the same as the unfused implementation. Then the outputs and the graph are reported via a callback.

```python
import torch

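def callback(x, y, graph):
    # x and y are the two lists of outputs being compared; print them
    # pairwise, followed by the graph that produced them.
    for i in range(len(x)):
        print(x[i])
        print(y[i])
    print(graph)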

with torch.jit.fuser("fuser2"):
    torch._C._jit_nvfuser_set_comparison_callback(True, callback)

    @torch.jit.script
    def g(x, y):
        z = torch.add(x, y)
        return torch.sin(z)

    def f(x, y, a):
        z = torch.add(x, y)
        return g(torch.relu(z), a)

    f_s = torch.jit.script(f)
    x = torch.rand((10, 10), dtype=torch.half).cuda()
    y = torch.rand((10, 10), dtype=torch.half).cuda()
    a = torch.rand((10, 10), dtype=torch.half).cuda()
    f_s(x, y, a)
    f_s(x, y, a)
    f_s(x, y, a)
```

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D34975310

Pulled By: davidberard98

fbshipit-source-id: 2379c9a6f371cd58da6a187c1f16882f3923ab24
(cherry picked from commit 96c87992c65f5e6bb1bdd51791682dd837af99b4)
2022-04-01 23:48:30 +00:00
Nikita Shulga
9bb12beda1 Fix sign-compare violations in python_list.h
`idx` is a signed type, as is `len()`, so there is no need to cast one of the
two to unsigned.
Prerequisite change for enabling `-Werror=sign-compare` across PyTorch repo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75076

Approved by: https://github.com/albanD
2022-04-01 19:15:51 +00:00
Nikolay Korovaiko
5177f95d21 Introducing SymInt to Pytorch (for tracing size arithmetic) (master rebase) (#74861)
Summary:
This PR introduces `SymInt` type to Pytorch which will be used by LTC and AOTAutograd for tracing size arithmetic and tests.
`SymInt` is a C++ union structure [int64_t, SymbolicIntNode*] that wraps around an int64_t field where the value of the field could be an index into a list of `shared_ptr<SymbolicIntNode>` or a real int.
This PR doesn't add any support for actually tracing symbolic ints. i.e. data_ for now can only contain real ints.

```
Goal 1: just to show we can add a type to PyTorch core. (wraps int) LANDEABLE
Finalize the naming - symint
Want the name to be short
Does invoke “size” - NO
SInt/SymInt/SymbolicInt
SInt could mean signed int
sym_int or symint or SymInt (originally it was “int”; capitalized implies object semantics, whereas lowercase implies value semantics)
JIT schema - symint
C++ - symint
```

See more details here: https://docs.google.com/document/d/1iiLNwR5ohAsw_ymfnOpDsyF6L9RTUaHMpD8YLw-jxEw

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74861

Reviewed By: qihqi, ngimel

Differential Revision: D35226230

Pulled By: Krovatkin

fbshipit-source-id: 34acf342bd50fcaa4d8d5dd49c2fd6a98823a5b3
(cherry picked from commit 218643f63ef181cabb92d13a6e837eb64f2dda3c)
2022-03-31 21:59:59 +00:00
Elias Ellison
2ef5611f31 Add comments for adding shape function and linting (#73570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73570

Approved by: https://github.com/huiguoo

Test Plan: contbuild & OSS CI, see 6d36bbde7e

Reviewed By: pbelevich

Differential Revision: D35192688

Pulled By: atalman

fbshipit-source-id: b12b80e6a6dd1adaa57a8facb6bb077989faa543
(cherry picked from commit e50478c02592597f12b8490ec5496f76c7d8b8cc)
2022-03-31 04:25:43 +00:00
Nikita Shulga
3036a0309d [skip ci]Revert "Add comments for adding shape function and linting"
This is a technical revert of 6d36bbde7e to reconcile it with e50478c02592597f12b8490ec5496f76c7d8b8cc (which is the same + lint changes applied)

Should be skipped during import
2022-03-30 21:21:28 -07:00
Elias Ellison
6d36bbde7e Add comments for adding shape function and linting
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73570

Approved by: https://github.com/huiguoo
2022-03-29 23:02:22 +00:00
Elias Ellison
aacdf291e0 [JIT] Make aot autograd decompositions usable in JIT, add script for serializing the decompositions (#73938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73938

This is a first step in porting and making usable all of the decompositions defined in [functorch](https://github.com/pytorch/functorch/blob/main/functorch/_src/decompositions.py#L349) in core and in JIT as well as C++.

The decompositions are defined in Python, scripted and inlined, and then serialized as C++ code which TorchScript can parse. The workflow is to edit the Python decomposition file and then run [tools/codegen/decompositions/gen_jit_decompositions.py](https://github.com/pytorch/pytorch/pull/73938/files#diff-6adef2116be233c3524e3b583e373ab0ffc9169beb6c1f6d96b5d0385e75afa1).

Decompositions are mapped to their corresponding aten schemas via the schema in their python def. This allows multiple decompositions for an overloaded op like `aten.var` (shown here in the example).
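
As a rough illustration of keying decompositions by schema, the sketch below registers two overloads of the same op under different signature-derived keys. Everything here is hypothetical (the `decomposition_table`, `schema_of`, and `register_decomposition` names, the key format, and the use of plain Python lists instead of tensors); it does not reproduce the PR's registration code.

```python
import inspect
from typing import Callable, Dict

# Hypothetical registry: schema-like string -> decomposition function.
decomposition_table: Dict[str, Callable] = {}


def schema_of(op_name: str, fn: Callable) -> str:
    """Build a schema-like key from the op name and the annotated parameter types."""
    params = inspect.signature(fn).parameters.values()
    arg_types = ", ".join(p.annotation.__name__ for p in params)
    return f"{op_name}({arg_types})"


def register_decomposition(op_name: str):
    """Decorator that files a function under the key derived from its own signature."""
    def wrap(fn: Callable) -> Callable:
        decomposition_table[schema_of(op_name, fn)] = fn
        return fn
    return wrap


@register_decomposition("aten::var")
def var(values: list, unbiased: bool) -> float:
    mean = sum(values) / len(values)
    denom = len(values) - 1 if unbiased else len(values)
    return sum((v - mean) ** 2 for v in values) / denom


@register_decomposition("aten::var")
def var_correction(values: list, correction: int) -> float:
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - correction)


keys = sorted(decomposition_table)
```

Because the key includes the parameter types, both functions can coexist under the same op name, which is the property the paragraph above relies on for overloaded ops.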

This is just a first PR; I'm sure there will be many follow-ups, such as:
- making these runnable in C++ with simple executor
- porting over more decompositions from AOT Autograd
- Using opinfos / more robust testing
- Categorizing decompositions
- Hooking in decompositions at various points of JIT execution

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D34938126

Pulled By: eellison

fbshipit-source-id: 9559a7cb731982e3a726f2f95af498b84fb09c13
(cherry picked from commit a4e0e748791e378e7e12a9dd0b63fb3c62dc1890)
2022-03-29 18:38:52 +00:00
Elias Ellison
6694fdaccd Clean up profiling mode and profiling executor strategy (#73875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73875

Previously we had a few settings:
- getExecutor - which toggled between Profiling Executor and Legacy
- getGraphOptimize - if true, overrides PE/Legacy to run with simple executor (no optimizations)
and then...
- getProfilingMode - which would set PE to 0 specializations.

The last mode is redundant with getGraphOptimize; we should just remove it and use getGraphOptimize in these cases. It could lead to potentially invalid combinations of logic: what does it mean if getProfilingMode is true but getExecutor is set to false? This would lead to a bug in specialize_autograd_zero in this case, see: https://github.com/pytorch/pytorch/blob/master/torch%2Fcsrc%2Fjit%2Fpasses%2Fspecialize_autogradzero.cpp#L93.

The tests here are failing but get fixed with the PR above it, so I'll squash for landing.

Test Plan: Imported from OSS

Reviewed By: cpuhrsch

Differential Revision: D34938130

Pulled By: eellison

fbshipit-source-id: 1a9c0ae7f6d1cfddc2ed3499a5af611053ae5e1b
(cherry picked from commit cf69ce3d155ba7d334022c42fb2cee54bb088c23)
2022-03-29 18:38:51 +00:00
Oleg Khabinov
5079321b71 Fix issue with prim::Print() and torch::deploy (#74513)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74513

Reviewed By: d4l3k, houseroad

Differential Revision: D35035089

fbshipit-source-id: d67b98600c74e2ed16b4d80f52148cd64b9e6ca0
(cherry picked from commit 16caf865077e28be31b805f015b9a61962632c8f)
2022-03-25 03:14:34 +00:00
Han Qi
75d6cbe605 [4/5]Testing jit module in flatbuffer in Python. (#74387)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74387

Make temporary python bindings for flatbuffer to test ScriptModule save / load.

(Note: this ignores all push blocking failures!)

Test Plan: unittest

Reviewed By: iseeyuan

Differential Revision: D34968080

fbshipit-source-id: d23b16abda6e4b7ecf6b1198ed6e00908a3db903
(cherry picked from commit 5cbbc390c5f54146a1c469106ab4a6286c754325)
2022-03-24 23:29:47 +00:00
Pavithran Ramachandran
fc2cf3d26f Back out "Revert D34805092: Extend _save_for_mobile and _load_for_mobile to support flatbuffer format; Default format is pickle + Change buck targets to support only pickle and pickle + flatbuffer for migration" (#74594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74594

Extending `_save_for_mobile` and `_load_for_mobile` to support the flatbuffer format via an additional optional argument, which defaults to pickle.

Adding new binary target with suffix `_pickle_and_flatbuffer` to help migration.

Size test in D34909502 shows the size has regressed by ~40K but after removing pickle and comparing lite_predictors we have ~120K size measure that we will achieve when deprecating pickle and moving to flatbuffer

**BEFORE:**

```lang=mermaid
graph TD;
    torch_core-->torch_mobile_deserialize;

    torch_mobile_core-->torch_mobile_deserialize;

    jit_module_saving-->torch_core;
    jit_module_saving-->torch_mobile_core;

    torch_mobile_deserialize-->caffe2_serialize;
    torch_mobile_deserialize-->torch_mobile_module;

    caffe2_serialize-->miniz;

    flatbuffer_loader-->mobile_bytecode;
    flatbuffer_serializer-->mobile_bytecode;

    mobile_bytecode-->flatbuffer_2.0;

    flatbuffer_loader-->torch_mobile_module;
    flatbuffer_serializer-->torch_mobile_module;
```

**AFTER:**
```lang=mermaid
graph TD;
    torch_core-->torch_mobile_deserialize;

    torch_mobile_core-->torch_mobile_deserialize;

    jit_module_saving-->torch_core;
    jit_module_saving-->torch_mobile_core;

    torch_mobile_deserialize-->caffe2_serialize;
    torch_mobile_deserialize-->torch_mobile_module;

    caffe2_serialize-->miniz;

    flatbuffer_loader-->mobile_bytecode;
    flatbuffer_serializer-->mobile_bytecode;

    mobile_bytecode-->flatbuffer_2.0;

    torch_mobile_deserialize_pickle_and_flatbuffer-->|new| flatbuffer_loader;
    torch_mobile_deserialize_pickle_and_flatbuffer-->|new| torch_mobile_deserialize;
    torch_mobile_core_pickle_and_flatbuffer-->|new| torch_mobile_deserialize_pickle_and_flatbuffer;
    torch_core_pickle_and_flatbuffer-->|new| torch_mobile_deserialize_pickle_and_flatbuffer;

    jit_module_saving_pickle_and_flatbuffer-->|new| torch_core_pickle_and_flatbuffer;
    jit_module_saving_pickle_and_flatbuffer-->|new| torch_mobile_core_pickle_and_flatbuffer;

    flatbuffer_serializer-->torch_mobile_module;

    jit_module_saving_pickle_and_flatbuffer-->|new|jit_module_saving;
    jit_module_saving_pickle_and_flatbuffer-->|new|flatbuffer_serializer;

    flatbuffer_loader-->torch_mobile_module;
```

Original commit changeset: 780dfb6fd6ba

Original Phabricator Diff: D34805092 (284b2b7135)
ghstack-source-id: 152044801

(Note: this ignores all push blocking failures!)

Test Plan:
CI

```
~/fbsource/fbcode] cd ~/fbsource/fbcode/ && buck test -c fbcode.caffe2_enable_flatbuffer=1 //caffe2/test/cpp/jit:jit  -- FlatbufferTest.ExtraFiles
Parsing buck files: finished in 0.9 sec
Building: finished in 5.3 sec (100%) 12992/54304 jobs, 0/54304 updated
  Total time: 6.2 sec
More details at https://www.internalfb.com/intern/buck/build/2b387fff-f813-4cfa-b53f-eb2378630d4e
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d
Trace available for this run at /tmp/tpx-20220323-134108.766518-f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d/trace.log
RemoteExecution session id: reSessionID-f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4503599723101693
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 486 tests discovered (19.122)
    ✓ Pass: caffe2/test/cpp/jit:jit - FlatbufferTest.ExtraFiles (0.187)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4503599723101693
```

Similar Build Deps Dags

```
[pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops_pickle_and_flatbuffer, //xplat/caffe2:torch_mobile_deserialize_pickle_and_flatbuffer)' --output-format dot-compact  | pastry
P486770901: https://www.internalfb.com/intern/paste/P486770901/

[pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops, //xplat/caffe2:torch_mobile_deserialize)' --output-format dot-compact  | pastry
P486771278: https://www.internalfb.com/intern/paste/P486771278/
```

pickle_and_flatbuffer: https://www.internalfb.com/intern/dgw/graph/?build_id=P486770901
pickle: https://www.internalfb.com/intern/dgw/graph/?build_id=P486771278

Reviewed By: iseeyuan

Differential Revision: D35067157

fbshipit-source-id: 9044259c17a2e0da79bd6aedb28efbdfd57e23e0
(cherry picked from commit f738069ec3a72e79da56172741d027de514e9e5f)
2022-03-24 21:51:05 +00:00
CodemodService FBSourceClangFormatLinterBot
c9612cddb7 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zsol

Differential Revision: D35109008

fbshipit-source-id: 35d37cc1d991569c6df8e65fc789803ac881012b
(cherry picked from commit f5beda976adc343f90b8e622257b2bcac3ac0d27)
2022-03-24 09:35:26 +00:00
jiej
e4e19d5beb nvfuser parser skip api (#74520)
Summary:
Added a Python API to disable nvfuser on certain op kinds.

```
          "_jit_set_nvfuser_skip_node_kind",
          [](const std::string& op_name, bool flip = true) {
            return fuser::cuda::skipNode(op_name, flip);
          })
```

Args:
    `op_name`: Symbol of op;
    `flip`: flag indicating whether to flip the given op in the skip list.
Returns:
    a bool flag indicating if `op_name` was already in the skip list.

A Python example that disables the fusion of `aten::add` from then on:
`torch._C._jit_set_nvfuser_skip_node_kind("aten::add", True)  # returns False, as no op is in skip list by default`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74520

Reviewed By: saketh-are

Differential Revision: D35046110

Pulled By: davidberard98

fbshipit-source-id: 689f5286513dbab206768823a852467b9f6b49b6
(cherry picked from commit 9a31129f7591ba2d393ab057b1cd137a6a25e7e8)
2022-03-23 20:56:43 +00:00
Nikita Shulga
c53b3ed20f Revert D34805092: Extend _save_for_mobile and _load_for_mobile to support flatbuffer format; Default format is pickle + Change buck targets to support only pickle and pickle + flatbuffer for migration
Test Plan: revert-hammer

Differential Revision:
D34805092 (284b2b7135)

Original commit changeset: 57f3fc81d68f

Original Phabricator Diff: D34805092 (284b2b7135)

fbshipit-source-id: 780dfb6fd6ba5f9348f24a2fb3c57971b7155541
(cherry picked from commit bebeb8b84e11c34cbde4857d0e1c291731a7c781)
2022-03-22 22:45:50 +00:00
Pavithran Ramachandran
284b2b7135 Extend _save_for_mobile and _load_for_mobile to support flatbuffer format; Default format is pickle + Change buck targets to support only pickle and pickle + flatbuffer for migration (#74209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74209

Extending `_save_for_mobile` and `_load_for_mobile` to support the flatbuffer format via an additional optional argument, which defaults to pickle.

Adding new binary target with suffix `_pickle_and_flatbuffer` to help migration.

Size test in D34909502 shows the size has regressed by ~40K but after removing pickle and comparing lite_predictors we have ~120K size measure that we will achieve when deprecating pickle and moving to flatbuffer

**BEFORE:**

```lang=mermaid
graph TD;
    torch_core-->torch_mobile_deserialize;

    torch_mobile_core-->torch_mobile_deserialize;

    jit_module_saving-->torch_core;
    jit_module_saving-->torch_mobile_core;

    torch_mobile_deserialize-->caffe2_serialize;
    torch_mobile_deserialize-->torch_mobile_module;

    caffe2_serialize-->miniz;

    flatbuffer_loader-->mobile_bytecode;
    flatbuffer_serializer-->mobile_bytecode;

    mobile_bytecode-->flatbuffer_2.0;

    flatbuffer_loader-->torch_mobile_module;
    flatbuffer_serializer-->torch_mobile_module;
```

**AFTER:**
```lang=mermaid
graph TD;
    torch_core-->torch_mobile_deserialize;

    torch_mobile_core-->torch_mobile_deserialize;

    jit_module_saving-->torch_core;
    jit_module_saving-->torch_mobile_core;

    torch_mobile_deserialize-->caffe2_serialize;
    torch_mobile_deserialize-->torch_mobile_module;

    caffe2_serialize-->miniz;

    flatbuffer_loader-->mobile_bytecode;
    flatbuffer_serializer-->mobile_bytecode;

    mobile_bytecode-->flatbuffer_2.0;

    torch_mobile_deserialize_pickle_and_flatbuffer-->|new| flatbuffer_loader;
    torch_mobile_deserialize_pickle_and_flatbuffer-->|new| torch_mobile_deserialize;
    torch_mobile_core_pickle_and_flatbuffer-->|new| torch_mobile_deserialize_pickle_and_flatbuffer;
    torch_core_pickle_and_flatbuffer-->|new| torch_mobile_deserialize_pickle_and_flatbuffer;

    jit_module_saving_pickle_and_flatbuffer-->|new| torch_core_pickle_and_flatbuffer;
    jit_module_saving_pickle_and_flatbuffer-->|new| torch_mobile_core_pickle_and_flatbuffer;

    flatbuffer_serializer-->torch_mobile_module;

    jit_module_saving_pickle_and_flatbuffer-->|new|jit_module_saving;
    jit_module_saving_pickle_and_flatbuffer-->|new|flatbuffer_serializer;

    flatbuffer_loader-->torch_mobile_module;
```
ghstack-source-id: 151744258

Test Plan:
Similar Build Deps Dags

```
[pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops_pickle_and_flatbuffer, //xplat/caffe2:torch_mobile_deserialize_pickle_and_flatbuffer)' --output-format dot-compact  | pastry
P486770901: https://www.internalfb.com/intern/paste/P486770901/

[pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops, //xplat/caffe2:torch_mobile_deserialize)' --output-format dot-compact  | pastry
P486771278: https://www.internalfb.com/intern/paste/P486771278/
```

pickle_and_flatbuffer: https://www.internalfb.com/intern/dgw/graph/?build_id=P486770901
pickle: https://www.internalfb.com/intern/dgw/graph/?build_id=P486771278

Reviewed By: iseeyuan

Differential Revision: D34805092

fbshipit-source-id: 57f3fc81d68fce941a050c35bd8e6f05951183b3
(cherry picked from commit 671ae4ed29e65b86ffe507a503548d3e86ab0ea4)
2022-03-22 20:00:53 +00:00
Michael Suo
e5bf87963d Revert D34584878: [pytorch][PR] Add JIT graph fuser for oneDNN Graph API (Preview4)
Test Plan: revert-hammer

Differential Revision:
D34584878 (7dd0823011)

Original commit changeset: ce817aa8cc90

Original Phabricator Diff: D34584878 (7dd0823011)

fbshipit-source-id: a941aaad34f8fe5f0c51f719f9f5c29b811c4d5b
(cherry picked from commit a43262ec7521b1665b02a64d3f279e72ee2344b9)
2022-03-21 23:07:14 +00:00
chunyuan
7dd0823011 Add JIT graph fuser for oneDNN Graph API (Preview4) (#68111)
Summary:
## Description
Preview4 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444).

On the basis of https://github.com/pytorch/pytorch/pull/50256, the below improvements are included:

- The [preview4 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1) of the oneDNN Graph API is used
- The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties.

### User API:
The optimization pass is disabled by default. Users could enable it by:
```
torch.jit.enable_onednn_fusion(True)
```

### Performance:
[pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance:
- SkyLake 8180 (1 socket of 28 cores):

  ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png)

- SkyLake 8180 (single thread):

  ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png)
 \* By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI)
  \** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops

### Directory structure of the integration code
Fuser-related code is placed under:
```
torch/csrc/jit/codegen/onednn/
```

Optimization pass registration is done in:
```
torch/csrc/jit/passes/onednn_graph_fuser.h
```

CMake for the integration code is:
```
caffe2/CMakeLists.txt
```

## Limitations

- In this PR, the optimization is supported only on Linux. Support for Windows and macOS will be enabled as the next step.
- We have only optimized the inference use case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68111

Reviewed By: eellison

Differential Revision: D34584878

Pulled By: malfet

fbshipit-source-id: ce817aa8cc9052ee9ed930c9cf66be83449e61a4
(cherry picked from commit cd17683aa7d9c0947df45a1ab53627feff795587)
2022-03-21 22:12:19 +00:00
BowenBao
54a6942f8d [ONNX] ONNX Exporter logging (#71342)
Summary:
Add an ONNX exporter logging facility. It supports both C++ and Python logging APIs. Logging can be turned on or off, and the output stream can be set to either `stdout` or `stderr`.

A few other changes:
* When an exception is raised in a pass, the IR graph currently being processed is logged.
* When an exception is raised from `_jit_pass_onnx` (the pass that converts nodes from namespace `ATen` to `ONNX`), both the ATen IR graph and the ONNX IR graph under construction are logged.
* The exception message for ConstantFolding is truncated to avoid being too verbose.
* The final printed IR graph now carries the node name from the ONNX ModelProto as a node attribute. A Torch IR Node does not have a name, so adding this to the printed IR graph helps debugging.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71342

Reviewed By: msaroufim

Differential Revision: D34433473

Pulled By: malfet

fbshipit-source-id: 4b137dfd6a33eb681a5f2612f19aadf5dfe3d84a
(cherry picked from commit 67a8ebed5192c266f604bdcca931df6fe589699f)
2022-03-17 19:40:03 +00:00
jjsjann123
0120ff759c fixing assert condition (#74239)
Summary:
fixing assert for `_jit_set_fusion_strategy`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74239

Reviewed By: H-Huang

Differential Revision: D34896284

Pulled By: eellison

fbshipit-source-id: a4daec70f68dcae2098447551ea071c744f6b0b7
(cherry picked from commit 60746f45b69e0448232626d1d601e8051dc5d427)
2022-03-15 19:28:52 +00:00
David Berard
b5244b8470 [JIT] add keep_unique_names arg to canonicalize python bindings (#74074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74074

Adds the keep_unique_names argument to the python binding for Canonicalize.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D34821816

Pulled By: davidberard98

fbshipit-source-id: 7932562cb20e504494f53b83484393bb296e717a
(cherry picked from commit 62bbcff972287550eeaa3ddb0e5c35ff2bbe60ad)
2022-03-11 22:35:55 +00:00
Edward Yang
2f90c82265 Get rid of TorchScript sparse tensor is experimental warning. (#73874)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73874

These get triggered when you are doing normal stuff with sparse
tensors and `__torch_dispatch__`, but it all works fine.  No need
to warn.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D34707395

Pulled By: ezyang

fbshipit-source-id: 3492c03abb1df1e925af3855dbf772784405d8c1
(cherry picked from commit 95e5981b304abf0367740906c238b29cadeea41c)
2022-03-09 15:45:24 +00:00
Han Qi
0723639b60 Revert D34455360: Multisect successfully blamed D34455360 for test failures
Summary:
This diff is reverting D34455360 (61d6c43864)
D34455360 (61d6c43864) is causing the following tests to fail. This revert diff is either the revert of the blame diff or the revert of the stack of diffs that need to be reverted to revert the blame diff.

Tests affected:
- https://www.internalfb.com/intern/test/562950004334605/

Multisect link:
https://www.internalfb.com/intern/testinfra/multisect/756170

Test Plan: NA

Reviewed By: zhxchen17

Differential Revision: D34596156

fbshipit-source-id: a465bca0094db3caf6130c80f1ed49eea981359b
(cherry picked from commit ef5e5578c64ce9827570757fb016aafa9c782c6a)
2022-03-08 23:18:54 +00:00
anjali411
beda4e8b2f Fix fx tracing for OpOverload (#73940)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73940

Test Plan: Imported from OSS

Reviewed By: zhxchen17

Differential Revision: D34727831

Pulled By: anjali411

fbshipit-source-id: 26e7044a1d5ba9ee0854bda784633b134971074b
(cherry picked from commit 69685e19b3de5ea3f494464eddcce44e93cb0f4d)
2022-03-08 21:47:55 +00:00
Peter Bell
9ef5c679ef record_function: add torchbind alternative API (#72301)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72301

First step in resolving #35026.

This adds `PythonRecordFunction`, which is a `torch::CustomClassHolder`
for `at::RecordFunction`, to keep the ATen code free of torch includes.
It also adds a new, currently unused internal API function,
`_record_function_enter_new`, which returns the torchbind object.

Once the FC period is expired, `torch.profiler.record_function` will
be updated to use this new internal API. Then once BC period is
expired, the cpp_custom_type_hack-based API can be removed.

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D34586311

Pulled By: robieta

fbshipit-source-id: d3eb9ffad7b348548a2b22c75203a92d1cb5115b
(cherry picked from commit 92d2ca808e5fbd20c9d6645dcabc3f059f9ef2d3)
2022-03-08 03:26:27 +00:00
anjali411
086645ad77 Update __torch_dispatch__ to return op overload instead of the opoverload packet function (#72673)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72673

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D34627164

Pulled By: anjali411

fbshipit-source-id: 3cb6406a392d530bf9da36b4d8e0a62b30e6497e
(cherry picked from commit 65b85a0a67df4d0f16ac8964e2b685d478a610fb)
2022-03-07 22:38:42 +00:00
Vasiliy Kuznetsov
bf896a2988 dbr quant: add torchscript pass to remove redundant aliases (#71230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71230

DBR quantization uses `torch.Tensor.as_subclass` frequently. When
the quantized model is traced with `torch.jit.trace`, these calls appear
in the resulting graph as `aten::alias`. This PR adds a pass to remove
these calls from the graph, for two reasons:
1. ease of debugging (these calls do nothing)
2. less work for downstream passes (for example, converting to ONNX currently breaks if these alias calls are present)

For now, we have to inline the graph in order for `aliasDb` to determine
safety properly. In the future, we may choose to relax this if there is
a need for it.
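
As a rough illustration of what removing `aten::alias` calls involves, here is a minimal sketch on a toy graph. The tuple-based node format and the `remove_aliases` helper are hypothetical and are not the TorchScript IR or the pass added in this PR. The toy also skips the safety check that the real pass delegates to `aliasDb`.

```python
# Toy graph: each node is (output_name, op, input_names). This is only a
# stand-in for illustration, not the real TorchScript IR.
def remove_aliases(nodes):
    replacement = {}  # alias output -> the value it aliases
    kept = []
    for out, op, inputs in nodes:
        # Resolve inputs through any aliases removed so far.
        inputs = [replacement.get(i, i) for i in inputs]
        if op == "aten::alias":
            # Drop the node and point later uses at its (resolved) input.
            replacement[out] = inputs[0]
        else:
            kept.append((out, op, inputs))
    return kept


graph = [
    ("%1", "aten::alias", ["%x"]),
    ("%2", "aten::relu", ["%1"]),
    ("%3", "aten::alias", ["%2"]),
    ("%4", "aten::add", ["%3", "%1"]),
]
cleaned = remove_aliases(graph)
```

In this toy, every use of an alias output is rewired to the aliased value, so downstream consumers never see the alias nodes.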

Test Plan:
Test plan is pretty basic for now, it can be improved in future PRs.
```
python test/test_quantization.py TestQuantizeDBR.test_jit_tracing_removes_aliases
```

Reviewed By: eellison

Differential Revision: D33552387

Pulled By: vkuzo

fbshipit-source-id: 681a33ddfff394a91e971263ac593afd93c5ea78
(cherry picked from commit 0f8412725d0c6fd9ef1072a50d4203465aa5d1f9)
2022-03-03 15:31:53 +00:00
David Berard
b27ec57331 [JIT] script & logging for extracting IR from logs (#72889)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72889

The script along with the GRAPH_EXPORT macro will allow for an easy way to extract IR from logs. One use case in this diff is to extract the fusion groups from nvfuser, so that the fusions can be tested individually.

Usage (e.g. for nvfuser test)

1. Write some test.py file that uses nvfuser
2. `PYTORCH_JIT_LOG_LEVEL=">>graph_fuser" python3 test.py 2>&1 | tee output.txt`
3. `python3 pytorch/scripts/jit/log_extract.py output.txt --nvfuser`

This will run with and without nvfuser to compare the output.

Alternatively, use `--output` to dump the IR so that it can be used in other applications.

Currently, only `--output` works (since generating input tensors is not supported)

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D34440189

Pulled By: davidberard98

fbshipit-source-id: fca0f619200ee37aba34bb39b69e6c640c263e26
(cherry picked from commit eb319166075db160f1628f0de545641fbecde8be)
2022-03-02 18:34:35 +00:00
Han Qi
61d6c43864 Make debug_pkl smaller by only emitting unique traces. (#73368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73368

The debug_pkl file inside of pytorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source, which is a stack trace, a filename, and start and end numbers. Those are emitted in the debug_pkl file as strings.
Since many SourceRanges share the same source, the string for the trace can be deduped.
The newer format saves a set of unique traces in a tuple, then each SourceRange saves the offset of its trace w.r.t. its position in that tuple (i.e. manually applying dictionary compression).
The above helps with smaller file size. On loading, if we copied each trace to Source as a string, the runtime memory would still blow up.
To mitigate this, we use SourceView directly instead of Source, which takes a reference to the string inside of Deserializer and makes that into a string_view. This is safe because Deserializer is held by Unpickler by shared_ptr, and Unpickler is also held by shared_ptr by another Source object. That Source object will be alive during the model construction.
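
The dictionary-compression idea can be sketched in a few lines of Python. This is only an illustration under assumed names (`compress`, `decompress`, and the `(trace, start, end)` tuple layout are hypothetical); it is not the actual debug_pkl format or the C++ code in this change.

```python
def compress(source_ranges):
    """Replace each repeated trace string with an index into a tuple.

    source_ranges: list of (trace, start, end) tuples (assumed layout).
    """
    unique_traces = []
    index_of = {}
    compact = []
    for trace, start, end in source_ranges:
        if trace not in index_of:
            index_of[trace] = len(unique_traces)
            unique_traces.append(trace)
        compact.append((index_of[trace], start, end))
    return tuple(unique_traces), compact


def decompress(unique_traces, compact):
    """Rebuild the original list by looking each index up in the tuple."""
    return [(unique_traces[i], start, end) for i, start, end in compact]


ranges = [
    ("File a.py, line 3", 0, 10),
    ("File a.py, line 3", 10, 25),
    ("File b.py, line 7", 25, 40),
    ("File a.py, line 3", 40, 52),
]
traces, compact = compress(ranges)
```

Each distinct trace string is stored once, and every range carries only a small integer, which is where the file-size saving comes from.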

Test Plan:
unit test

Took the original file (312271638_930.predictor.disagg.local), loaded it with `torch.jit.load`, and saved it again with `torch.jit.save`. Unzip both and look at the contents:
```
[qihan@devvm5585.vll0 ~]$ du archive -h
4.0K    archive/xl_model_weights
3.7M    archive/extra
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform
8.0K    archive/code/__torch__/caffe2/torch/fb
8.0K    archive/code/__torch__/caffe2/torch
8.0K    archive/code/__torch__/caffe2
20M     archive/code/__torch__/torch/fx/graph_module
20M     archive/code/__torch__/torch/fx
8.0K    archive/code/__torch__/torch/classes
20M     archive/code/__torch__/torch
20M     archive/code/__torch__
20M     archive/code
2.7M    archive/constants
35M     archive
[qihan@devvm5585.vll0 ~]$ du resaved -h
4.0K    resaved/extra
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform
8.0K    resaved/code/__torch__/caffe2/torch/fb
8.0K    resaved/code/__torch__/caffe2/torch
8.0K    resaved/code/__torch__/caffe2
1.3M    resaved/code/__torch__/torch/fx/graph_module
1.3M    resaved/code/__torch__/torch/fx
8.0K    resaved/code/__torch__/torch/classes
1.4M    resaved/code/__torch__/torch
1.4M    resaved/code/__torch__
1.4M    resaved/code
2.7M    resaved/constants
13M     resaved
[qihan@devvm5585.vll0 ~]$
```

Reviewed By: gmagogsfm

Differential Revision: D34455360

fbshipit-source-id: 8cc716f9bba7183746b1b4ecc33a2de34ac503b9
(cherry picked from commit f1a04730fc9ac8fdab6c8e4c44cb5529e42090e4)
2022-03-02 08:37:08 +00:00
Elias Ellison
ab6395fc65 Add api for recursively analyzing function calls (#73329)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73329

There is a quantization use case for having better alias analysis with function calls remaining. This takes the relatively simple approach of getting the inlined graph of each function call and then analyzing that subgraph. Since we need a unique, single analysis of every `Value*`, we make a copy of the graph for every analysis of a function call past the first. This is relatively slow, but given the limited use case here it should work well enough (and is no slower than calling the inlining pass).

cc vkuzo

Test Plan: Imported from OSS

Reviewed By: davidberard98

Differential Revision: D34451424

Pulled By: eellison

fbshipit-source-id: b7c7e54679d723f5ded1e11ffb32eb6d2176431d
(cherry picked from commit 81a42b31522b890311a3f512448b372c4ebbefd1)
2022-02-28 17:44:45 +00:00
Elias Ellison
8bc28e9c9c [JIT] Add more python ir utilities (#69871)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69871

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33515232

Pulled By: eellison

fbshipit-source-id: d48da7b398a3f1a8862789484a4035d874196763
(cherry picked from commit e5976b8b7a4995be25a93601bbae5c52d6d3fca8)
2022-02-25 01:07:05 +00:00
Alban Desmaison
3bd1507ff2 Revert D33994011: Make debug_pkl smaller by only emitting unique traces.
Test Plan: revert-hammer

Differential Revision:
D33994011 (3d37f5b052)

Original commit changeset: 8e6224c6e942

Original Phabricator Diff: D33994011 (3d37f5b052)

fbshipit-source-id: 885e739efa1081382e1fcf9c6cccba92c57e9f7a
(cherry picked from commit a6d98c85a736c2eb321a6f38005dd0f5dc43eb87)
2022-02-24 16:38:55 +00:00
Han Qi
3d37f5b052 Make debug_pkl smaller by only emitting unique traces. (#72596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72596

The debug_pkl file inside of pytorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source, which is a stack trace, a filename, and start and end numbers. Those are emitted in the debug_pkl file as strings.

Since many SourceRanges share the same source, the string for the trace can be deduped.

The newer format saves a set of unique traces in a tuple, then each SourceRange saves the offset of its trace w.r.t. its position in that tuple (i.e. manually applying dictionary compression).

The above helps with smaller file size. On loading, if we copied each trace to Source as a string, the runtime memory would still blow up.
To mitigate this, we use SourceView directly instead of Source, which takes a reference to the string inside of Deserializer and makes that into a string_view. This is safe because Deserializer is held by Unpickler by shared_ptr, and Unpickler is also held by shared_ptr by another Source object. That Source object will be alive during the model construction.

Test Plan:
unit test

Took the original file (312271638_930.predictor.disagg.local), loaded it with `torch.jit.load`, and saved it again with `torch.jit.save`. Unzip both and look at the contents:
```
[qihan@devvm5585.vll0 ~]$ du archive -h
4.0K    archive/xl_model_weights
3.7M    archive/extra
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform
8.0K    archive/code/__torch__/caffe2/torch/fb
8.0K    archive/code/__torch__/caffe2/torch
8.0K    archive/code/__torch__/caffe2
20M     archive/code/__torch__/torch/fx/graph_module
20M     archive/code/__torch__/torch/fx
8.0K    archive/code/__torch__/torch/classes
20M     archive/code/__torch__/torch
20M     archive/code/__torch__
20M     archive/code
2.7M    archive/constants
35M     archive
[qihan@devvm5585.vll0 ~]$ du resaved -h
4.0K    resaved/extra
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform
8.0K    resaved/code/__torch__/caffe2/torch/fb
8.0K    resaved/code/__torch__/caffe2/torch
8.0K    resaved/code/__torch__/caffe2
1.3M    resaved/code/__torch__/torch/fx/graph_module
1.3M    resaved/code/__torch__/torch/fx
8.0K    resaved/code/__torch__/torch/classes
1.4M    resaved/code/__torch__/torch
1.4M    resaved/code/__torch__
1.4M    resaved/code
2.7M    resaved/constants
13M     resaved
[qihan@devvm5585.vll0 ~]$
```

Reviewed By: JasonHanwen

Differential Revision: D33994011

fbshipit-source-id: 8e6224c6e942e91c3403f686c8f0937d1002ed41
(cherry picked from commit a7014dd4029308c95007f362a57c31796d686647)
2022-02-24 09:31:16 +00:00
BowenBao
bbac8c9c48 [ONNX] List of files to consider for mergebot onnx rule (#72297)
Summary:
Based on past PRs, here is a non-exhaustive list of files to consider for extension. The PR is not meant to be final. Based on feedback and discussion, files could be dropped from the list, or the PR could be updated to move code around such that extension is no longer needed.

List of files below and description:

* These files are for converting from IR to ONNX proto. These should be used only for ONNX.
```
"torch/csrc/jit/serialization/export.*",
"torch/csrc/jit/serialization/onnx.*",
```

* This file is touched whenever pass signature is updated.
```
"torch/_C/__init__.pyi.in",
```

* These files are touched whenever pass signature is updated. Somehow it's been convention that onnx passes are also added here, but it could be possible to move them. Let me know what you think.
~~"torch/csrc/jit/python/init.cpp",~~
~~"torch/csrc/jit/python/script_init.cpp",~~
Update: Bowen will move onnx passes to files under onnx folder.

* ~~Touched when need new attr::xxx, or onnx::xxx.~~
~~"aten/src/ATen/core/interned_strings.h"~~
Update: Nikita will help separate this file.

malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72297

Reviewed By: H-Huang

Differential Revision: D34254666

Pulled By: malfet

fbshipit-source-id: 032cfa590cbedf4648b7335fe8f09a2380ab14cb
(cherry picked from commit 88653eadbf)
2022-02-16 23:01:13 +00:00
Shunting Zhang
763ad1bf25 (2/2) Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions: frontend change (#72899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72899

Reland D33282878 (911d527b87). This is the frontend change.
ghstack-source-id: 149204031

Test Plan: Refer to D33282878 (911d527b87). Also check CI

Reviewed By: gmagogsfm

Differential Revision: D34252127

fbshipit-source-id: 27b17ddd4d05d904eb91fd9ee094d9121f00e388
(cherry picked from commit 1d276baca3)
2022-02-16 03:45:15 +00:00
Michael Suo
7db4a48d92 Revert D33342569: (2/2) Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions: frontend change
Test Plan: revert-hammer

Differential Revision:
D33342569 (856157fcee)

Original commit changeset: 57984ac67ae2

Original Phabricator Diff: D33342569 (856157fcee)

fbshipit-source-id: 4c12235a1776a3652e7f91e93b626705759d5176
(cherry picked from commit 4cbd7d8bab)
2022-02-15 18:45:44 +00:00
Shunting Zhang
856157fcee (2/2) Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions: frontend change (#70471)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70471

Reland D33282878 (911d527b87). This is the frontend change.
ghstack-source-id: 149114933

Test Plan: Refer to D33282878 (911d527b87). Also check CI

Reviewed By: gmagogsfm

Differential Revision: D33342569

fbshipit-source-id: 57984ac67ae2c56c38f72d3b1fb69105901fb472
(cherry picked from commit b47cc935ee)
2022-02-15 07:21:19 +00:00
BowenBao
cc792746d2 [ONNX] De-duplicate initializers (#68202) (#69547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69547

ScriptModule export introduces duplicated ONNX initializers for shared weights, unnecessarily increasing ONNX model size. This PR de-duplicates ONNX initializers for models exported in eval mode by checking whether the underlying tensors share the same `data_ptr`, `strides` and `sizes`.
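
A minimal sketch of this kind of check, assuming each initializer is described by a small record (`TensorInfo` and `dedup_initializers` are hypothetical names, and this is not the exporter's actual code):

```python
from collections import namedtuple

# Hypothetical record standing in for an exported initializer.
TensorInfo = namedtuple("TensorInfo", "name data_ptr sizes strides")


def dedup_initializers(tensors):
    """Map every initializer name to the name of the one to keep.

    Two initializers count as duplicates only if they share the same
    data pointer, sizes and strides.
    """
    canonical = {}  # (data_ptr, sizes, strides) -> first name seen
    rename = {}
    for t in tensors:
        key = (t.data_ptr, t.sizes, t.strides)
        rename[t.name] = canonical.setdefault(key, t.name)
    return rename


inits = [
    TensorInfo("encoder.weight", 0x1000, (4, 4), (4, 1)),
    TensorInfo("decoder.weight", 0x1000, (4, 4), (4, 1)),  # shared weight
    TensorInfo("decoder.view", 0x1000, (2, 8), (8, 1)),    # same memory, other shape
    TensorInfo("bias", 0x2000, (4,), (1,)),
]
rename = dedup_initializers(inits)
```

Comparing sizes and strides as well as the pointer keeps differently shaped views of the same memory from being merged.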

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32994271

Pulled By: malfet

fbshipit-source-id: 10ac66638b6255890875272472aa9ed07a5b1d9a

Co-authored-by: BowenBao <bowbao@microsoft.com>
(cherry picked from commit d7cbde940c)
2022-02-11 22:05:15 +00:00
BowenBao
04c5d978b9 [ONNX] Refactor _run_symbolic_function (#67573) (#68491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68491

* Allows implementing symbolic functions for domains other than `aten`, for example `prim`, in symbolic_opset#.py.
* Allows symbolic function to access extra context if needed, through `SymbolicFunctionState`.
  * In particular, the `prim::PythonOp` special case can access the node without needing to pass the node through inputs. Updates will be made downstream, and in a follow-up PR we will remove the previous workaround in the exporter.
* `prim::Loop`, `prim::If`, etc are now moved outside of `_run_symbolic_function` from utils.py, and to symbolic_opset9.py.

Motivation for this change:
- Better maintainability and reducing complexity. Easier to add symbolic for operators, both simple and complex ones (that need additional context), without the former needing to know the existence of the latter.
- The design idea was long outdated. prim ops are no longer rare special cases, and they shouldn't all be handled inside `_run_symbolic_function`. As a result this function becomes too clumsy. There were also prim ops symbolic added in symbolic_opset#.py with signature `prim_[opname]`, creating separation and confusion.

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D32483782

Pulled By: malfet

fbshipit-source-id: f9affc31b1570af30ffa6668da9375da111fd54a

Co-authored-by: BowenBao <bowbao@microsoft.com>
(cherry picked from commit 1e04ffd2fd)
2022-02-11 18:35:35 +00:00
David Berard
bbd42c605a [JIT] Opinfo tests for nnc fusion - retry (#72486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72486

Retry #70465.

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D34061628

Pulled By: davidberard98

fbshipit-source-id: e27ed315bc4ad57cdbfbc9cedffcbb7886004524
(cherry picked from commit 7937808d2e)
2022-02-09 19:01:22 +00:00
Nikita Shulga
bb101ec78d Revert D33595240: [JIT] Opinfo tests for nnc fusion
Test Plan: revert-hammer

Differential Revision:
D33595240 (0b57bd4c66)

Original commit changeset: e2e17a921bc3

Original Phabricator Diff: D33595240 (0b57bd4c66)

fbshipit-source-id: 172a3ffd19d180b1b3617956b1f881be62f37bc9
(cherry picked from commit 324cfaea86)
2022-02-08 01:28:42 +00:00
David Berard
0b57bd4c66 [JIT] Opinfo tests for nnc fusion (#70465)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70465

These tests check that
(a) the result after nnc fusion (of a single op) is the same as the
unfused op
(b) for certain ops where fusion is expected to occur, fusion does
actually occur

Test Plan: Imported from OSS

Reviewed By: wenleix

Differential Revision: D33595240

Pulled By: davidberard98

fbshipit-source-id: e2e17a921bc30c313e92e8e5bbc6c1b5fcd14bc1
(cherry picked from commit b1ba221acc)
2022-02-07 20:56:21 +00:00
Nikita Shulga
717d8c6224 [BE] Fix pybind deprecation warnings (#72376)
Summary:
Fixes:
```
../torch/csrc/autograd/python_variable.cpp:1798:33: warning: ‘bool pybind11::handle::operator==(const pybind11::handle&) const’ is deprecated: Use obj1.is(obj2) instead [-Wdeprecated-declarations]
     TORCH_CHECK(out == py::none(), "Expected __torch_dispatch__ for ", op.operator_name(),
```
and
```
../torch/csrc/jit/python/python_list.cpp:254:57: warning: ‘pybind11::object::object(pybind11::handle, bool)’ is deprecated: Use reinterpret_borrow<object>() or reinterpret_steal<object>() [-Wdeprecated-declarations]
                     py::object(obj, /*is_borrowed*/ true),
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72376

Reviewed By: albanD

Differential Revision: D34021328

Pulled By: malfet

fbshipit-source-id: 72906077db9031311c6b0ae4c65eb79df9c514d4
(cherry picked from commit e1877ca268)
2022-02-07 18:33:32 +00:00
Anjali Chourdia
a1383a9cfa Reland torch.ops API change machinery with the core functionality disabled (#71785)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71785

see https://github.com/pytorch/pytorch/pull/67254
ghstack-source-id: 147648699

Test Plan: github CI

Reviewed By: albanD

Differential Revision: D33777229

fbshipit-source-id: 517b36be9743025eb40d708d380dae62e3663184
(cherry picked from commit a637e69569)
2022-02-02 16:06:29 +00:00
CodemodService FBSourceClangFormatLinterBot
ed435e903f [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33938055

fbshipit-source-id: 6c0643a18f09854e87e183341f252c66dd6395a6
(cherry picked from commit fd183aedbc)
2022-02-02 11:27:15 +00:00
Elias Ellison
59a6375639 [NNC] Add Tests for Dynamic Shape Fusion Change default fusion strategy (#71651)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71651

The only tests that regress do so because chunk is NYI (not yet implemented); the other tests that I touched were passing only because `assertAllFused` wasn't working correctly. In addition, we're no longer compiling conv/matmul with dynamic shapes.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D33801500

Pulled By: eellison

fbshipit-source-id: 074118ab4a975b7db876a4fcdfb9483afb879e79
(cherry picked from commit abaa7948c1)
2022-02-01 19:07:02 +00:00
Elias Ellison
f1499d6c18 Refactor PE so fusion specializations are configurable (#71650)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71650

*

Refactors PE so that there is a current fusion strategy set, which takes a vector such as [(STATIC, 2), (DYNAMIC, 10)], meaning: fuse two static invocations, then fuse 10 dynamic ones, then stop specializing.
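
One way to read such a strategy is as an ordered budget of specializations. The sketch below is a pure-Python illustration with hypothetical helper names (`specialization_plan`, `kind_for_attempt`); it does not reproduce the profiling executor's actual logic.

```python
def specialization_plan(strategy):
    """Expand [(kind, depth), ...] into an ordered list of specializations."""
    plan = []
    for kind, depth in strategy:
        if kind not in ("STATIC", "DYNAMIC"):
            raise ValueError(f"unknown fusion kind: {kind}")
        plan.extend([kind] * depth)
    return plan


def kind_for_attempt(strategy, attempt):
    """Return the kind used for the given 0-based specialization attempt,
    or None once the budget is exhausted (no further specialization)."""
    plan = specialization_plan(strategy)
    return plan[attempt] if attempt < len(plan) else None


strategy = [("STATIC", 2), ("DYNAMIC", 10)]
plan = specialization_plan(strategy)
```

With the strategy from the commit message, attempts 0 and 1 are static, attempts 2 through 11 are dynamic, and attempt 12 gets no specialization.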

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33801501

Pulled By: eellison

fbshipit-source-id: ebc7ac3c57e35a3b9bb15ab751f0aa1d25cc9bd5
(cherry picked from commit 8dd89088d3)
2022-02-01 19:07:02 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
c5df294940 Fix bug in upgrader generation in mobile (#71578)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71578

Use a more robust way of extracting upgrader min and max versions.

Test Plan: omgitsgreen

Reviewed By: cccclai

Differential Revision: D33690113

fbshipit-source-id: 79a964acb26d7ca1354e104710a285b8da3f46d1
(cherry picked from commit 9e316ee5c1)
2022-01-28 18:20:59 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
e849c8b0f2 Move bytecode generation to python (#71681)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71681

Test Plan: Imported from OSS

Reviewed By: gmagogsfm, cccclai

Differential Revision: D33730791

Pulled By: tugsbayasgalan

fbshipit-source-id: e752e9ae20c01a57a3bea270f604215fdcc9182e
(cherry picked from commit 69c9dc0548)
2022-01-28 02:33:00 +00:00
Chen Lai
e755a4f124 Update the operator version check logic when generating models for testing upgraders (#71894)
Summary:
The model generation script checks the model version to ensure that developers run the script before they change an operator.

Previously, the check used the old model version. However, it is hard for developers to know the old version number. With this change, the check uses the current max operator version instead. It's less strict, but more developer friendly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71894

ghstack-source-id: 147769215

Test Plan:
first time run:
```
chenlai@devvm5615:~/fbsource/fbcode(b82243650)$ buck run mode/opt //caffe2/torch/fb/mobile/upgrader_codegen:upgrader_test_models_gen
Parsing buck files: finished in 0.7 sec
Downloaded 0/2 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 21.6 sec (100%) 11547/11547 jobs, 2/11547 updated
  Total time: 22.4 sec
BUILD SUCCEEDED
TestVersionedDivTensorExampleV7() aten::div.Tensor
INFO:test.jit.fixtures_srcs.generate_models:Processing TestVersionedDivTensorExampleV7
INFO:test.jit.fixtures_srcs.generate_models:Generating model test_versioned_div_tensor_example_v7 and it's save to /data/users/chenlai/fbsource/fbcode/caffe2/test/jit/fixtures/test_versioned_div_tensor_example_v7.ptl
chenlai@devvm5615:~/fbsource/fbcode(b82243650)$
```

second time run:
```
chenlai@devvm5615:~/fbsource/fbcode(b82243650)$ rm caffe2/test/jit/fixtures/test_versioned_div_tensor_example_v4.ptl
chenlai@devvm5615:~/fbsource/fbcode(b82243650)$ buck run mode/opt //caffe2/torch/fb/mobile/upgrader_codegen:upgrader_test_models_gen
Action graph will be rebuilt because files have been added or removed.
Parsing buck files: finished in 2.0 sec
Building... 17.4 sec (99%) 9289/9290 jobs, 0/9290 updated
TestVersionedDivTensorExampleV7() aten::div.Tensor
INFO:test.jit.fixtures_srcs.generate_models:Processing TestVersionedDivTensorExampleV7
INFO:test.jit.fixtures_srcs.generate_models:Model test_versioned_div_tensor_example_v7 already exists, skipping
chenlai@devvm5615:~/fbsource/fbcode(b82243650)$ jf s
```

Reviewed By: tugsbayasgalan

Differential Revision: D33804737

fbshipit-source-id: 7424b81a700703bdf896ec606c2dac8df6dbf8a6
(cherry picked from commit 44b4e37d30)
2022-01-27 21:15:32 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
c9bd1c60ed Move upgraders from python to cpp (#70593)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70593

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D33402543

Pulled By: tugsbayasgalan

fbshipit-source-id: 713c54fbbb2bc4c96d5e3b6084f3090a8923a12d
(cherry picked from commit e72b375264)
2022-01-22 00:24:24 +00:00
Jacob Szwejbka
e926360cb8 [Pytorch Edge] Refactor Compatibility Stuff into own directory (#71432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71432

Organizing jit/mobile a little more

ghstack-source-id: 147184536

Test Plan: ci.

Reviewed By: iseeyuan

Differential Revision: D33640527

fbshipit-source-id: f3a7884fe0d06d80bb8d9cf141ecaee34b6f88ff
(cherry picked from commit 4c3d1e5435)
2022-01-20 19:38:41 +00:00
Can Balioglu
80b19c4c8c Enable Python bindings for UntypedStorage (#68945)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68945

This PR enables the Python conversion functions for `Storage` (specifically `UntypedStorage`) and also cleans up some remnants of the deprecated typed storages from `DynamicTypes.cpp`.
ghstack-source-id: 147245110

Test Plan: Run the existing unit and integration tests.

Reviewed By: albanD

Differential Revision: D32676505

fbshipit-source-id: 3a3f6db4fb0da5c78dd406c96ab70bdc37015521
(cherry picked from commit d6427b94cf)
2022-01-20 02:11:34 +00:00
Yan Li
6964aa2ced backout D33469839 (#71443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71443

cogwheel test inline_cvr_infer_canary_pyper_model_publish is timing out.

The convert_fx call takes > 20 mins for local and local_ro sub modules, which used to take ~ 2 mins.

Test Plan:
Fblearn flow run
* the following cmd took 1113 seconds before the diff and 5002 seconds after.
    flow-cli clone-locally 320014219  --run-as-secure-group pytorch_at_scale  --operators pyper_model_publish_workflow.pyper_model_publish_workflow.process_torch_package_model_files.process_non_sparse_parameters[0]

Cogwheel test
* Cogwheel test with packages in B3588 (the last good run) took 4694.48s
* Cogwheel test with packages in B3590 (the first timeout) took 13975.83s
* Cogwheel test with the following packages took 4535.04s
  * all packages in B3588 except the model publish
  * the model publish built with D33469839 (043e84b3d2) reversed (created D33633570)

Reviewed By: albanD, jerryzh168

Differential Revision: D33633570

fbshipit-source-id: dc5e777c48a90c551641a3f79126461f6a60449e
(cherry picked from commit 03ab65023a)
2022-01-18 23:51:51 +00:00
CodemodService FBSourceClangFormatLinterBot
88012c7daf [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33577744

fbshipit-source-id: 7ecc8367998ee1dffde54c2f4dd3cfafe19a53c9
2022-01-14 06:10:57 -08:00
John Clow
ade83ed90c Building Default Inference for Device Type (#69049)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69049

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33555885

Pulled By: Gamrix

fbshipit-source-id: 7364066cbc544ab8442a47c82ea89f0e73eaaa06
2022-01-13 13:57:08 -08:00
Nikita Shulga
1de830a985 Use ptrdiff_t rather than ssize_t (#71271)
Summary:
`diff_type` should naturally be `ptrdiff_t`, as `ssize_t` is actually defined [here](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_types.h.html) as:
> The type ssize_t shall be capable of storing values at least in the range [-1, {SSIZE_MAX}].

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71271

Reviewed By: atalman

Differential Revision: D33569304

Pulled By: malfet

fbshipit-source-id: 57dafed5fc42a1f91cdbed257e76cec4fdfbbebe
2022-01-13 12:41:53 -08:00
Elias Ellison
39be20f259 [JIT][NNC] Add handling of strides to dynamic shape support. (#70464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70464

Add handling of strided input tensors to dynamic fusion. This is done with the same set of input striding specializations as https://github.com/pytorch/pytorch/pull/60684/:
```
  S_ONE, // STRIDE_ONE: packed
  S_CONT, // STRIDE_CONTIGUOUS: stride[i + 1] * sizes[i + 1]
  S_TRAN_CONT, // STRIDE_TRANSPOSED_CONTIGUOUS: stride[i-1] * sizes[i-1]
  S_AS_ARG, // STRIDE_AS_ARG: stride passed in as runtime value
```
and then two additional specializations for a) a contiguous tensor and b) a channels-last tensor. Channels-last is a common case and we should optimize for it. Additionally, tensors natively store whether they are contiguous/channels-last contiguous, which makes it faster to check whether tensors follow this pattern.

Output striding will be done in a follow up.
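
To make the four per-dimension descriptors above concrete, here is a small sketch that classifies each dimension from plain `sizes` and `strides` lists. The function name `describe_strides` and the order in which the checks are tried are assumptions for illustration; the real implementation may differ.

```python
def describe_strides(sizes, strides):
    """Assign one stride descriptor per dimension (illustrative only)."""
    n = len(sizes)
    descs = []
    for i in range(n):
        if strides[i] == 1:
            descs.append("S_ONE")        # packed
        elif i + 1 < n and strides[i] == strides[i + 1] * sizes[i + 1]:
            descs.append("S_CONT")       # stride[i + 1] * sizes[i + 1]
        elif i > 0 and strides[i] == strides[i - 1] * sizes[i - 1]:
            descs.append("S_TRAN_CONT")  # stride[i - 1] * sizes[i - 1]
        else:
            descs.append("S_AS_ARG")     # passed in as a runtime value
    return descs


contiguous = describe_strides([2, 3, 4], [12, 4, 1])
transposed = describe_strides([3, 2], [1, 3])
channels_last = describe_strides([10, 11, 12, 13], [1716, 1, 143, 11])
```

Under these assumed rules, the channels-last layout leaves two of its four dimensions as runtime arguments, which suggests why a dedicated whole-tensor channels-last specialization is worthwhile.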

The striding is stored on both the TensorGroup node and on the guard node. The striding descriptors are stored as a vector of strings on the node for debuggability and to make use of storing ivalues as attributes on nodes.

As an example:

```
%8 : Double(10, 11, 12, 13, strides=[1716, 1, 143, 11], requires_grad=0, device=cpu) = prim::TensorExprGroup_0[symbolic_shape_inputs=[-37, -36, -35, -34], striding_inputs_desc=[["TENSOR_CONT_CHANNELS_LAST"]](%x, %24, %23, %22, %21)
```

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D33458649

Pulled By: eellison

fbshipit-source-id: c42616d3c683d70f6258180d23d3841a31a6030d
2022-01-12 09:11:31 -08:00
Elias Ellison
97e8dcba5e Fix mis-specified device arg name (#69645)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69645

As noted in code comment:
The existing device operator is registered with the input name `a`, which prevents torch.device(type="cuda") from working. Add a shim layer here.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33515231

Pulled By: eellison

fbshipit-source-id: c04af8158a9568a20cd5fbbbd573f6efab98fd60
2022-01-11 22:11:24 -08:00
CodemodService FBSourceClangFormatLinterBot
fb8a9732d9 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33524330

fbshipit-source-id: 112291a23e2efe2d573bee86ead8ce2fc3957e5b
2022-01-11 04:33:21 -08:00
anjali411
043e84b3d2 Per-overload torch.ops API (#67254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67254

Fixes https://github.com/pytorch/pytorch/issues/65997

BC breaking:
`output = torch.ops._test.leaky_relu(self=torch.tensor(-1.0))` now fails with the error `TypeError: __call__() got multiple values for argument 'self'` since we call into `OpOverloadBundle`'s `__call__` method that has `self` bound to it as its first argument.
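
The error comes from ordinary Python call semantics and can be reproduced without torch. In the sketch below, `FakeOverloadBundle` is a hypothetical stand-in class, not the real `OpOverloadBundle`.

```python
class FakeOverloadBundle:
    """Stand-in for a callable object whose __call__ is a bound method."""

    def __call__(self, *args, **kwargs):
        return args, kwargs


op = FakeOverloadBundle()

# Positional use works: the instance is bound to `self`, -1.0 goes to args.
positional_result = op(-1.0)

# Keyword use collides with the already-bound first parameter.
try:
    op(self=-1.0)
    error = None
except TypeError as exc:
    error = str(exc)
```

Because the instance already occupies the `self` parameter, a keyword argument that is also named `self` is rejected before the body of `__call__` runs.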

Follow up work:
1. disallow `default` as an overload name for aten operators.
2. Add a method to obtain a list of all overloads (exclude the ones registered by JIT)
3. Add methods/properties to `OpOverload` to access more schema information (types of input and output args etc)

cc ezyang gchanan

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D33469839

Pulled By: anjali411

fbshipit-source-id: c3fc43460f1c7c9651c64b4d46337be21c400621
2022-01-10 17:29:06 -08:00
John Clow
80659b71a5 Hoisting common expressions out of If blocks [retry] (#65645)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65645

This is a retry of PR: https://github.com/pytorch/pytorch/pull/59492

Latest Changes: Added more tests, added the getOrCreateDB pattern, updated logic to remove unnecessary checks,
and addressed all comments.

Adding code to find common expressions from the two subblocks of an if
operation and hoist them before the if block.
This also allows Dead Code Elimination to
then eliminate some if blocks.
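
The hoisting idea can be sketched on a toy representation where each branch is a list of `(output, op, inputs)` tuples. The helper `hoist_common` is hypothetical; it only hoists expressions whose inputs are defined outside the branch, and it ignores side effects and aliasing, which the real pass has to account for.

```python
def hoist_common(true_block, false_block):
    """Return (hoisted, true_rest, false_rest) for two toy branch bodies."""
    defined_true = {out for out, _, _ in true_block}
    defined_false = {out for out, _, _ in false_block}
    # Candidate expressions in the false branch that depend only on
    # values defined outside of it.
    false_exprs = {
        (op, tuple(ins)): out
        for out, op, ins in false_block
        if not set(ins) & defined_false
    }
    hoisted, true_rest, rename = [], [], {}
    for out, op, ins in true_block:
        key = (op, tuple(ins))
        if key in false_exprs and not set(ins) & defined_true:
            hoisted.append((out, op, list(ins)))
            rename[false_exprs[key]] = out  # false-branch value -> hoisted value
        else:
            true_rest.append((out, op, list(ins)))
    false_rest = [
        (out, op, [rename.get(i, i) for i in ins])
        for out, op, ins in false_block
        if out not in rename
    ]
    return hoisted, true_rest, false_rest


true_block = [("%a", "aten::mul", ["%x", "%y"]), ("%b", "aten::relu", ["%a"])]
false_block = [("%c", "aten::mul", ["%x", "%y"]), ("%d", "aten::neg", ["%c"])]
hoisted, true_rest, false_rest = hoist_common(true_block, false_block)
```

Here the shared `aten::mul` is computed once before the if, and the false branch is rewritten to use the hoisted value.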

Test Plan: python test_jit.py TestIfHoisting

Reviewed By: eellison

Differential Revision: D33302065

Pulled By: Gamrix

fbshipit-source-id: a5a184a480cf07354359aaca344c6e27b687a3c2
2022-01-10 13:28:17 -08:00
Zhengxu Chen
649dda9fee [jit] Implement DynamicType for TorchScript runtime. (#68136)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68136

DynamicType is an extension to existing server JIT types. Today using normal server types on Edge is a bit problematic because in embedded environments we don't need the full spectrum of types but we still build with these unneeded dependencies.

Is it possible to just get rid of unneeded JIT types from Edge builds? It's not easy to do so at this moment. For example, on Edge we don't support Union type, but we have to pull in the dependency of Union type because Optional type is being supported which inherits from Union type, so Union type has to be included in the build. Although we could split Union type and Optional type, it could be argued that the root cause is every time we use anything inheriting from `c10::Type`, we don't have the direct evidence of how much dependency we pull in, because we do virtual calls and we don't know what exactly we're calling with server JIT types. If we don't know, it's highly possible that the linker doesn't know either so it cannot effectively strip unused methods.

To address this problem, one option is to implement a separate `DynamicType` which has simpler behavior and doesn't store different types as different symbols in binary but rather raw data (or "tag"). This could increase the binary size by several KBs, so I included several binary size reductions in the same stack, hoping at least we don't regress the binary size.

Currently `DynamicType` inherits from `c10::Type` because I want to reduce the migration cost of `DynamicType` by making it interface with existing server JIT types. In the future `DynamicType` should be implemented as a separate class without relying on `c10::Type` to make things both simpler and leaner.
ghstack-source-id: 146670522

Test Plan: in the next diff.

Reviewed By: VitalyFedyunin

Differential Revision: D32264615

fbshipit-source-id: 180eb0998a14eacc1d8b28db39870d84fcc17d5b
2022-01-07 11:23:07 -08:00
Scott Wolchok
ddea6980fe [PyTorch][JIT] Don't refcount Type singletons (#69579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69579

This should help us avoid reference counting overhead on singleton Type subclasses without a major rewrite of the Type subsystem.
ghstack-source-id: 146643993

Test Plan:
Ran //caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads:cpp_benchmark with arguments `--op empty -niter 40 --stressTestRecordFunction --captureRecordFunctionInputs` on devbig with turbo off.

Before:
```
I1206 13:47:15.037441 1201670 bench.cpp:144] Mean 0.737675
I1206 13:47:15.037463 1201670 bench.cpp:145] Median 0.736725
I1206 13:47:15.037468 1201670 bench.cpp:146] Min 0.722897
I1206 13:47:15.037473 1201670 bench.cpp:147] stddev 0.00508187
I1206 13:47:15.037482 1201670 bench.cpp:148] stddev / mean 0.00688903
```

After:
```
I1206 13:48:16.830123 1205612 bench.cpp:144] Mean 0.66988
I1206 13:48:16.830150 1205612 bench.cpp:145] Median 0.663956
I1206 13:48:16.830157 1205612 bench.cpp:146] Min 0.65986
I1206 13:48:16.830164 1205612 bench.cpp:147] stddev 0.0335928
I1206 13:48:16.830171 1205612 bench.cpp:148] stddev / mean 0.0501475
```

Static runtime startup is also improved; for CMF local_ro, time to initialize a predictor went from 10.01s to 9.59s.
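
The relative size of these changes can be worked out from the figures quoted above. The following is a small sketch of that arithmetic only; the helper name `relative_reduction` is made up for illustration, and the inputs are copied from the benchmark logs and the startup timings in this message.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which `after` is smaller than `before`."""
    return (before - after) / before


# Figures copied from the "Before" and "After" benchmark logs above.
mean_reduction = relative_reduction(0.737675, 0.66988)
median_reduction = relative_reduction(0.736725, 0.663956)
# Predictor initialization time for CMF local_ro, in seconds.
startup_reduction = relative_reduction(10.01, 9.59)

for label, value in [
    ("mean", mean_reduction),
    ("median", median_reduction),
    ("startup", startup_reduction),
]:
    print(f"{label}: {value:.1%}")
```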

(Note: I wish I had a production workload to demonstrate the advantage of this on. I tried ctr_mobile_feed local_ro net but it was neutral. Anything that manipulates types or List/Dict a lot might be promising.)

Reviewed By: suo

Differential Revision: D32923880

fbshipit-source-id: c82ed6689b3598e61047fbcb2149982173127ff0
2022-01-06 17:39:16 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
b0fdca8855 Bump version number to 7 and compile old operators with old schema (#68358)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68358

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33433730

Pulled By: tugsbayasgalan

fbshipit-source-id: 202c58365bae13195d3545cefcb0da9162b02151
2022-01-05 23:57:22 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
8bdbe94344 Add forward compatibility tests in CI (#64139)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64139

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30626912

Pulled By: tugsbayasgalan

fbshipit-source-id: 781a88386701b42e2e86daaca0a779d1fc1c4df3
2022-01-05 23:40:06 -08:00
Michael Suo
402f2934bf Revert D33262228: Per-overload torch.ops API
Test Plan: revert-hammer

Differential Revision:
D33262228 (8e6d1738a4)

Original commit changeset: 600dbf511514

Original Phabricator Diff: D33262228 (8e6d1738a4)

fbshipit-source-id: 238fa88ea9c4f26c7511334765c07452fbca9655
2022-01-05 22:10:11 -08:00
anjali411
8e6d1738a4 Per-overload torch.ops API (#67254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67254

Fixes https://github.com/pytorch/pytorch/issues/65997

TODO: disallow `default` as an overload name for aten operators.

BC breaking:
`output = torch.ops._test.leaky_relu(self=torch.tensor(-1.0))` now fails with the error `TypeError: __call__() got multiple values for argument 'self'` since we call into `OpOverloadBundle`'s `__call__` method that has `self` bound to it as its first argument.
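
The failure mode can be reproduced in plain Python without PyTorch. The sketch below uses a hypothetical stand-in class, `OpBundleSketch`, which is not the real `OpOverloadBundle`; it only assumes that the real `__call__` is an ordinary bound method whose first parameter is named `self`.

```python
class OpBundleSketch:
    """Hypothetical stand-in for OpOverloadBundle; not the real class."""

    def __init__(self, name):
        self.name = name

    def __call__(self, *args, **kwargs):
        # `self` is already bound to the instance, so a keyword argument
        # that is also named `self` collides with it before this body runs.
        return (self.name, args, kwargs)


leaky_relu = OpBundleSketch("leaky_relu")

# Passing the tensor positionally works.
ok = leaky_relu(-1.0)

# Passing it as the keyword `self=` raises TypeError.
try:
    leaky_relu(self=-1.0)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Under this assumption, callers that used the keyword form would need to pass the first argument positionally.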

cc ezyang gchanan

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33262228

Pulled By: anjali411

fbshipit-source-id: 600dbf511514ea9b41aea3e6b1bc1102dab08909
2022-01-05 15:17:41 -08:00
Michael Suo
0ece9a49d7 Revert D33198155: Bump version number to 7 and compile old operators with old schema
Test Plan: revert-hammer

Differential Revision:
D33198155 (d35fc409ad)

Original commit changeset: 38a1185f9ecb

Original Phabricator Diff: D33198155 (d35fc409ad)

fbshipit-source-id: 411aaeb4e047aad9202db50d4d0f2ff35bc51f9d
2022-01-04 13:44:59 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
d35fc409ad Bump version number to 7 and compile old operators with old schema (#68358)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68358

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D33198155

Pulled By: tugsbayasgalan

fbshipit-source-id: 38a1185f9ecb34a33f737ad0b060b3490956300c
2022-01-04 01:31:25 -08:00
Peter Bell
fa09099ba3 Codegen: TraceType only includes operators being registered (#68691)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68691

TraceType is a sharded file, so by only including specific operator
headers, we ensure that changing one (non-method) operator only needs
one shard to be re-compiled.

This also changes all the included autograd and jit headers from
including `ATen/ATen.h` to just including `ATen/core/Tensor.h`.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D33336948

Pulled By: albanD

fbshipit-source-id: 4e40371592b9a5a7e7fcd1d8cecae11ffb873113
2022-01-02 13:09:19 -08:00
Bo Wu
bf610f08b0 Back out "Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions"
Summary: as title

Test Plan:
```
buck run mode/opt-split-dwarf -c=python.package_style=inplace //ai_infra/distributed_ai/pyper_test_framework/templates:pyper_release_v2 -- --model inline_cvr_post_imp_deterministic_shrunk_pyper_release_v2 --cluster TSCTestCluster --hpc_identity oncall_pyper_oncall --stage prod_offline_training --test_module training_platform
...
############## Start inline_cvr_post_imp_model Test Results Analysis ##############
I1226 22:03:56.789000 3346280 test_driver.py:139  UNKNOWN     ] Test finished in 808.2743511786684 seconds.
+-------------------------+---------+------------------------+-----------------+
| Test Case               | Status  | Message                | Model Entity ID |
+-------------------------+---------+------------------------+-----------------+
| SmallWorld_release_test | Success | finished successfully. | 987987491       |
+-------------------------+---------+------------------------+-----------------+
I1226 22:03:56.790000 3346280 test_driver.py:143  UNKNOWN     ] test_run_id: 3d085f61-28d1-411d-bd27-940ea2554b23 use this id to find your run in scuba pyper_test_framework
I1226 22:03:56.792000 3346280 test_driver.py:160  UNKNOWN     ] Calling cleanup
I1226 22:03:56.792000 3346280 training_platform_test_launcher.py:385  UNKNOWN     ] Stopping launched jobs 1
I1226 22:03:59.563122 3346280 ClientSingletonManager.cpp:100] Shutting down Manifold ClientSingletonManager
```

Reviewed By: seemethere

Differential Revision: D33325936

fbshipit-source-id: 64414bf7061ad77e8ac12eb8abafee4043e0fa1e
2021-12-27 09:11:46 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
4ae71c8d34 Add graph op replacement pass (#69915)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69915

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D33198158

Pulled By: tugsbayasgalan

fbshipit-source-id: f2b924edf9959aaf51f97db994fae031fa062cf8
2021-12-25 13:03:19 -08:00
Shunting Zhang
911d527b87 Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions (#70339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70339

When a Python program is translated to TorchScript, the Python exception type is dropped. This makes users' lives hard when they need to categorize errors based on more than just the exception message.

Here we make the change so that when we raise a Python exception, we record the fully qualified class name of the exception. Later on, when the TorchScript is interpreted, a special exception CustomJITException is thrown. Users can get the Python class name from CustomJITException::getPythonClassName.

Note that this diff does not customize the mapping from C++ exceptions to Python exceptions. It's left to users to do whatever mapping they want.

Code under scripts/shunting is just my own experimental code. I can split it out if requested.
ghstack-source-id: 146221879

Test Plan: buck test mode/opt //caffe2/test:jit

Reviewed By: gmagogsfm

Differential Revision: D33282878

fbshipit-source-id: 910f67a764519f1053a48589d1a34df69001525d
2021-12-24 00:25:40 -08:00
jjsjann123
e429a68478 Allow single node fusion for nvfuser (#70000)
Summary:
Setting `PYTORCH_NVFUSER_ONE_OP_FUSION=1` makes nvFuser take every node it supports, instead of waiting for a fusion opportunity.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70000

Reviewed By: samdow

Differential Revision: D33292195

Pulled By: davidberard98

fbshipit-source-id: 8ed5ce5e82fbb6737e8ab5ce4223b038eaf47756
2021-12-23 17:07:57 -08:00
CodemodService FBSourceClangFormatLinterBot
181120f7d7 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33229251

fbshipit-source-id: 3a69bb459fa0a65888d6f9c8e70b5de032ddad97
2021-12-19 16:38:25 -08:00
Peter Bell
ef70174f2e Separate c10::Symbol header from list of interned strings (#69406)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69406

Most files that include `interned_strings.h` don't actually depend on
anything generated from `FORALL_NS_SYMBOLS`, yet because everything is
in a single file they must be recompiled whenever a new symbol is added.
Here I move the class definition into a separate file so this doesn't
happen.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32923637

Pulled By: albanD

fbshipit-source-id: 6e488cbfcfe2c041a99d9ff22e167dbddf3f46d7
2021-12-19 14:52:26 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
5c7817fd43 Add test operator in upgrader entry (#69427)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69427

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D32867984

Pulled By: tugsbayasgalan

fbshipit-source-id: 25810fc2fd4b943911f950618968af067c04da5c
2021-12-15 00:40:05 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
20f7c893c1 Populate runtime with upgrader graph (#68773)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68773

Test Plan: Imported from OSS

Reviewed By: qihqi, gmagogsfm

Differential Revision: D32603258

Pulled By: tugsbayasgalan

fbshipit-source-id: 6fa0b7ee4ebe46c9aa148923c6ef3e1de106ad13
2021-12-11 13:44:24 -08:00
Yanan Cao
17f3179d60 Back out "[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer" (#69796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69796

(Note: this ignores all push blocking failures!)

Test Plan: External CI + Sandcastle

Reviewed By: zhxchen17

Differential Revision: D33032671

fbshipit-source-id: dbf6690e960e25d6a5f19043cbe792add2acd7ef
2021-12-10 21:29:53 -08:00
Peter Bell
b2e79ed5ec Remove WindowsTorchApiMacro.h in favor of Export.h (#69585)
Summary:
Follow up to https://github.com/pytorch/pytorch/issues/68095

This also changes the files from the ATen folder to include c10's `Export.h` instead since they can't ever be exporting `TORCH_PYTHON_API`.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69585

Reviewed By: mrshenli

Differential Revision: D32958594

Pulled By: albanD

fbshipit-source-id: 1ec7ef63764573fa2b486928955e3a1172150061
2021-12-09 17:30:09 -08:00
Han Qi
d3649309e6 [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer (#69306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69306

Included functions:

save_mobile_module -> saves a mobile::Module to flatbuffer
load_mobile_module_from_file -> loads a flatbuffer into mobile::Module
parse_mobile_module -> parses from bytes or deserialized flatbuffer
Module object

Test Plan: unittests

Reviewed By: gmagogsfm

Differential Revision: D32806835

fbshipit-source-id: 71913c6650e225634f878946bd16960d377a7f57
2021-12-09 14:53:31 -08:00
David Berard
c21169ea41 [JIT] optimize_for_inference on methods other than forward (#69367)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69367

Test Plan: Imported from OSS

Reviewed By: cpuhrsch

Differential Revision: D32835529

Pulled By: davidberard98

fbshipit-source-id: d3066c23d071bc2a3bee59b8ab03b6ab0e43efcf
2021-12-07 12:36:47 -08:00
CodemodService FBSourceClangFormatLinterBot
945d2e380c [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D32910817

fbshipit-source-id: 60d0cb10412e1a37a0249bb223b75855c5596dbd
2021-12-07 08:11:09 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
bc89528931 Initialize upgrader and operator version files (#68772)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68772

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D32603257

Pulled By: tugsbayasgalan

fbshipit-source-id: 5a3d9ba4d0a01ddff4ff6ebdf7bb88ec125765b0
2021-12-06 16:27:52 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
dde801686b Expose MobileCode to python (#66592)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66592

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D31632600

Pulled By: tugsbayasgalan

fbshipit-source-id: 46a7ac20ddb6b433bd037280ed020481901a15c9
2021-12-02 13:18:46 -08:00
Alban Desmaison
28c519961f Follow the undefined Tensor <-> None rule better in torch dispatch (#67793)
Summary:
As per title. In particular, this makes it easier to override backward functions for which the underlying backend returns `None`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67793

Reviewed By: zou3519

Differential Revision: D32242962

Pulled By: albanD

fbshipit-source-id: 6e114def90ee9499161e1303d301ba7fd003ff89
2021-12-02 07:46:56 -08:00
Alban Desmaison
00ebbd5ef6 Revert D32010095: [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer
Test Plan: revert-hammer

Differential Revision:
D32010095 (41d35dc201)

Original commit changeset: d763b0557780

fbshipit-source-id: bf746a0389135c9f5f67f00f449435ce08fb5f6d
2021-12-02 06:41:40 -08:00
Han Qi
41d35dc201 Add ability for a mobile::Module to save as flatbuffer (#67351)
Summary:
Included functions:

* save_mobile_module -> saves a mobile::Module to flatbuffer
* load_mobile_module_from_file -> loads a flatbuffer into mobile::Module
* parse_mobile_module -> parses from bytes or deserialized flatbuffer
      Module object

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67351

Reviewed By: iseeyuan

Differential Revision: D32010095

Pulled By: qihqi

fbshipit-source-id: d763b0557780f7c2661b6485105b045e41a5e8f1
2021-12-01 23:58:15 -08:00
Nikolay Korovaiko
ab1d879b33 [WIP] forbid aliasing between the outputs of a differentiable graph (#67732)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67732

Reviewed By: cpuhrsch

Differential Revision: D32522826

Pulled By: Krovatkin

fbshipit-source-id: 9fdf3509dcd1b885f7c7f06d22b340c0f93bbe12
2021-11-18 15:03:35 -08:00
Deyu Huang
d32efe8bc2 [ONNX] Remove the argument use_external_data_format of export() method entirely. (#67080) (#67811)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67811

* remove the argument use_external_data_format of export() method entirely

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32181302

Pulled By: malfet

fbshipit-source-id: 4bc1448b7487bb9dfdad4e36008ff5b227fd64a3

Co-authored-by: hwangdeyu <dejack953@outlook.com>
2021-11-15 17:20:04 -08:00
Thomas Viehmann
be281fc597 Check for None in torch.jit.Graph.create (#68253)
Summary:
...because we don't like segfaults from Python (see test).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68253

Reviewed By: suo

Differential Revision: D32396747

Pulled By: gmagogsfm

fbshipit-source-id: a0925e8479702766e88176280985a63bc79e4f6a
2021-11-13 11:30:33 -08:00
Elias Ellison
6b44e75f6b aliasing fixes (#66977)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66977

Fix for https://github.com/pytorch/pytorch/issues/47218

More context is in original PR here: https://github.com/pytorch/pytorch/pull/20556

Test Plan: Imported from OSS

Reviewed By: malfet, albanD

Differential Revision: D31935573

Pulled By: eellison

fbshipit-source-id: 3658d5711116396c35f1d5016773b0096ed347a5
2021-11-09 18:33:37 -08:00
John Clow
a9c2f11d2a Update Freezing Logic and add new passes (#68024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68024

Pull Request resolved: #67949

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32260614

Pulled By: eellison

fbshipit-source-id: 41d7a9b45e33297a17560a22eba8973e2fc48b43
2021-11-09 13:21:52 -08:00
Bowen Bao
02e35ce17b [ONNX] Update onnx function export with comments and clean up (#66817) (#67803)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67803

* Addresses comments from #63589

[ONNX] remove torch::onnx::PRODUCER_VERSION (#67107)

Use constants from version.h instead.
This simplifies things since we no longer have to update
PRODUCER_VERSION for each release.

Also add TORCH_VERSION to version.h so that a string is available for
this purpose.

[ONNX] Set `ir_version` based on opset_version. (#67128)

This increases the odds that the exported ONNX model will be usable.
Before this change, we were setting the IR version to a value which may
be higher than what the model consumer supports.

Also some minor clean-up in the test code:
* Fix string replacement.
* Use a temporary file so as to not leave files around in the test
  current working directory.

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32181306

Pulled By: malfet

fbshipit-source-id: 02f136d34ef8f664ade0bc1985a584f0e8c2b663

Co-authored-by: BowenBao <bowbao@microsoft.com>
Co-authored-by: Gary Miguel <garymiguel@microsoft.com>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
2021-11-05 10:35:35 -07:00
John Clow
ec8a71f9ac Dtype Analysis for Unary and Binary ops with Metatensors (#66898)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66898

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32175961

Pulled By: Gamrix

fbshipit-source-id: 72721259b900e5a311b6bcb5c350366ba420b734
2021-11-04 19:00:50 -07:00
Natalia Gimelshein
3d4a6ff15d Revert D32154788: Move Concat Linear out of Optimize Numerics
Test Plan: revert-hammer

Differential Revision:
D32154788 (ea94dde573)

Original commit changeset: faa6465c89b3

fbshipit-source-id: 0dcaa65268b68ed01e6a5bc7b73ade1f51163b33
2021-11-04 12:20:02 -07:00
John Clow
ea94dde573 Move Concat Linear out of Optimize Numerics (#67196)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67196

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D32154788

Pulled By: Gamrix

fbshipit-source-id: faa6465c89b3676d6b1ff7c20a677738a7fbdf88
2021-11-04 11:30:39 -07:00
Elias Ellison
2486061c72 [JIT] make x (+ or -) 0 and x (* or /) 1 peepholes type promotion aware (#67688)
Summary:
Some of the "no-ops" are not actually no-ops because they can change the dtype
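
As a rough illustration of why `x + 0` is not always a no-op, the sketch below models one simplified promotion rule: a Python scalar changes a tensor's dtype only when the scalar's category (bool < int < float) is higher than the tensor's. The tables and the function name are assumptions made for this example; this is not the peephole pass itself.

```python
# Simplified category model (assumption for illustration): bool < int < float.
TENSOR_CATEGORY = {"bool": 0, "int32": 1, "int64": 1, "float32": 2, "float64": 2}
SCALAR_CATEGORY = {bool: 0, int: 1, float: 2}


def identity_op_is_noop(tensor_dtype: str, scalar) -> bool:
    """Return True if `x + scalar` keeps x's dtype under this simplified model.

    Only in that case may a peephole drop `x + 0` or `x * 1` without
    changing the dtype that downstream ops observe.
    """
    return SCALAR_CATEGORY[type(scalar)] <= TENSOR_CATEGORY[tensor_dtype]


print(identity_op_is_noop("float32", 0))   # int scalar, float tensor
print(identity_op_is_noop("int64", 0.0))   # float scalar, int tensor
print(identity_op_is_noop("bool", 1))      # int scalar, bool tensor
```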

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67688

Reviewed By: davidberard98

Differential Revision: D32104601

Pulled By: eellison

fbshipit-source-id: ccb99179a4b30fd20b5a9228374584f2cdc8ec21
2021-11-03 20:11:46 -07:00
Nikolay Korovaiko
3db536e55e add jit_trace_module python binding (#67425)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67425

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D31998564

Pulled By: Krovatkin

fbshipit-source-id: f7e38c8c3f560f2c4e5ed62e1acae2c100efebd4
2021-11-02 23:55:23 -07:00
Scott Wolchok
82f7f8d471 [PyTorch] Adopt IValue::toTupleRef() where obvious (#65505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65505

Generated with

`fastmod -m 'toTuple\(\)(\s*)->' 'toTupleRef()${1}.'`

, followed by

`fastmod '(std::move\(.*)toTupleRef\(\).' '${1}toTuple()->'`

to unbreak 2 callsites.
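
For readers without `fastmod`, the two rewrites can be approximated with Python's `re` module. This is a sketch only: `adopt_to_tuple_ref` is a made-up helper name, and Python's regex dialect is assumed to behave like fastmod's for these two patterns.

```python
import re


def adopt_to_tuple_ref(source: str) -> str:
    """Apply the two substitutions described above to a C++ source string."""
    # Pass 1: `toTuple()->` becomes `toTupleRef().`, keeping any whitespace
    # that sat between the call and the arrow.
    rewritten = re.sub(r"toTuple\(\)(\s*)->", r"toTupleRef()\1.", source)
    # Pass 2: where the value comes from std::move(...), restore
    # `toTuple()->`. The trailing `.` is left unescaped, as in the
    # original command.
    return re.sub(r"(std::move\(.*)toTupleRef\(\).", r"\1toTuple()->", rewritten)


plain = adopt_to_tuple_ref("auto& e = v.toTuple()->elements();")
moved = adopt_to_tuple_ref("auto t = std::move(v).toTuple()->elements();")
print(plain)
print(moved)
```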
ghstack-source-id: 142065835

Test Plan: CI

Reviewed By: gchanan

Differential Revision: D31131025

fbshipit-source-id: 54457ae5bbeb38db9c7f196d469b98521c3d3f34
2021-11-02 10:22:18 -07:00
Zhengxu Chen
5ef62c88a9 [jit] Replace get_executor() with call() in abstract Function interface. (#65969)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65969

ghstack-source-id: 141759210

Test Plan: no behavior change.

Reviewed By: anjali411

Differential Revision: D31326151

fbshipit-source-id: 201f6dc4c23fdb2531f6b8c73d26127f9e212de4
2021-10-28 13:11:29 -07:00
Zhengxu Chen
f20614af21 [jit] Allow custom class functions to be traced in invokeScriptMethodFromPython(). (#67380)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67380

Test Plan: eyes

Reviewed By: tugsbayasgalan

Differential Revision: D31975656

fbshipit-source-id: 47c8c9854899e9fed5a635f88470711dc4c95970
2021-10-27 16:38:50 -07:00
jjsjann123
1ec732bc46 Add fp16/fp32 autocasting to JIT/TorchScript (#63939)
Summary:
Adds mixed precision autocasting support between fp32/fp16 to TorchScript/JIT. A more in-depth description can be found at [torch/csrc/jit/JIT-AUTOCAST.md](https://github.com/pytorch/pytorch/pull/63939/files#diff-1f1772aaa508841c5bb58b74ab98f49a1e577612cd9ea5c386c8714a75db830b)

This PR implements an autocast optimization pass (torch/csrc/jit/passes/autocast.cpp) that inserts casting ops per AMP rule, mimicking the behavior of eager autocast. The pass also takes into account the context of `torch.cuda.amp.autocast` and only inserts casting ops within the enabled context manager, giving feature parity with eager AMP autocast.

We currently provide JIT AMP autocast as a prototype feature, so it is off by default and can be turned on via `torch._C._jit_set_autocast_mode(True)`.

The JIT support for autocast is subject to different constraints compared to the eager mode implementation (mostly related to the fact that TorchScript is statically typed); restrictions on user-facing Python code are described in torch/csrc/jit/JIT-AUTOCAST.md.

This is a prototype. There are also implementation limitations that were necessary to keep this PR small and get something functioning quickly upstream, so we can iterate on designs.

A few limitations/challenges that are not properly resolved in this PR:
1. Autocast inserts cast operations, which affect the scalar type of output tensors feeding downstream operations. We are not currently propagating the updated scalar types, which can cause issues or wrong results for operations that rely on promotion rules.

2. Backward for autodiff in JIT misses the casting of dgrad to the input scalar type, which autograd does in eager mode. This forces us to explicitly mark the casting operation for certain operations (e.g. binary ops); otherwise, we might feed dgrad with a mismatched scalar type to the input. This could potentially break gradient functions consuming dgrad (e.g. gemm backward, which assumes grad_output has the same scalar type as the input).

3. The `torch.autocast` API has an optional argument `dtype`, which is not currently supported in JIT autocast; we require a static value.

Credit goes mostly to:
tlemo
kevinstephano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63939

Reviewed By: navahgar

Differential Revision: D31093381

Pulled By: eellison

fbshipit-source-id: da6e26c668c38b01e296f304507048d6c1794314
2021-10-27 12:11:36 -07:00
Zhengxu Chen
b55a2500d2 [jit] Remove graph() call from abstract Function interface. (#65967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65967

Graph is an implementation detail. If users want access to the
underlying graph, they should explicitly dynamic cast instead.
ghstack-source-id: 141659819

Test Plan: no behavior change.

Reviewed By: gmagogsfm

Differential Revision: D31326153

fbshipit-source-id: a0e984f57c6013494b92a7095bf5bb660035eb84
2021-10-27 11:54:26 -07:00
Zhengxu Chen
f510193e22 [jit][edge] Export maybe-used interface methods from modules. (#65966)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65966

ghstack-source-id: 141594521

Support exporting "interface methods" from a submodule to a mobile module. "Interface methods" are methods that might be called dynamically in a module and therefore need to be exported anyway, like virtual functions in C++.

Before this change, the export algorithm was a simple iteration through all toplevel methods. Now that we have indirect calls, we need to recursively walk the call graph to find all potentially used methods. This means the order in which we export methods might break old runtimes. To guarantee forward compatibility, we export toplevel methods first and then the extra methods; in this order, toplevel methods will always be found first.
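
The ordering constraint can be sketched as follows. This is an illustration only, not the exporter's code: the function name, the method names, and the dictionary-based call graph are all hypothetical, and the walk uses an explicit worklist rather than recursion.

```python
def export_order(toplevel, calls):
    """Return method names in export order.

    toplevel: list of toplevel method names, in their original order.
    calls: dict mapping a method name to the names it may call
           (a hypothetical call graph for illustration).
    """
    # Toplevel methods keep their original positions at the front, so an
    # old runtime that only scans for them still finds them first.
    ordered = list(toplevel)
    seen = set(ordered)
    # Walk the call graph and append every newly discovered method.
    worklist = list(reversed(toplevel))
    while worklist:
        method = worklist.pop()
        for callee in calls.get(method, []):
            if callee not in seen:
                seen.add(callee)
                ordered.append(callee)
                worklist.append(callee)
    return ordered


order = export_order(
    ["forward", "helper"],
    {"forward": ["sub.iface_a"], "sub.iface_a": ["sub.iface_b", "helper"]},
)
print(order)
```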

NOTE: interface method export is disabled by default in this diff. We need to call torch._C._enable_mobile_interface_call_export to actually enable it.

Test Plan: buck test mode/dev //caffe2/test:jit -- --exact 'caffe2/test:jit - test_export_opnames_interface (jit.test_misc.TestMisc)'

Reviewed By: qihqi, iseeyuan

Differential Revision: D31326155

fbshipit-source-id: 5be7234cca07691f62648a85133b6db65e427b53
2021-10-26 16:35:15 -07:00
Zhengxu Chen
059ae96007 [jit] Factor findAllNodes into one place. (#65965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65965

ghstack-source-id: 141504185

Test Plan: no behavior change

Reviewed By: qihqi, ejguan

Differential Revision: D31326152

fbshipit-source-id: 2e0261a96853bfb67a96dd68972c905b6b26d562
2021-10-25 15:42:52 -07:00
Nikolay Korovaiko
a7ebf76a15 jit trace (#59949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59949

Reviewed By: ZolotukhinM

Differential Revision: D31366787

Pulled By: Krovatkin

fbshipit-source-id: 798cbcd97e8ecfba984f98cd70214954be9309af
2021-10-24 18:04:22 -07:00
Nikita Shulga
6f3f302d9f [ONNX] Deprecate fold_if pass (#65697) (#66145)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66145

Deprecate fold_if pass

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31424097

fbshipit-source-id: 25b89679c756393a1065ca6aaa24d29db960cbd4

Co-authored-by: jiafatom <jiafa@microsoft.com>
2021-10-22 13:46:20 -07:00
Nikita Shulga
53a163a015 [ONNX] Export nn.Module call as ONNX local function (#63589) (#66140)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66140

* Add a new argument to the export API that lets users specify `nn.Module` classes they wish to export as local functions in the ONNX model.
* Refactor `torch/csrc/jit/serialization/export.cpp`, and remove redundant `EncoderBase` class.
* ~~Contains changes from #63268~~
* Depends on #63716 to update onnx submodule.

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31424098

fbshipit-source-id: c949d0b01c206c30b4182c2dd1a5b90e32b7a0d3

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-10-22 13:44:56 -07:00
Elias Ellison
63b41e1f4d [JIT] Add partial evaluation graph stitching logic (#65377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65377

When we run symbolic shape analysis on
```
conv = torch.nn.Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
max_pool = torch.nn.MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
mod = nn.Sequential(conv, max_pool)
...
graph(%self : __torch__.torch.nn.modules.container.___torch_mangle_0.Sequential,
      %input.1 : Tensor):
  %18 : bool = prim::Constant[value=0]()
  %30 : int[] = prim::Constant[value=[1, 1]]()
  %29 : int[] = prim::Constant[value=[3, 3]]()
  %28 : int[] = prim::Constant[value=[2, 2]]()
  %6 : int = prim::Constant[value=1]()
  %self.0.bias : NoneType = prim::Constant()
  %self.0.weight : Double(64, 3, 7, 7, strides=[147, 49, 7, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %input.5 : Tensor(SS(-2), 64, SS(-3), SS(-4)) = aten::conv2d(%input.1, %self.0.weight, %self.0.bias, %28, %29, %30, %6)
  %input.9 : Tensor(SS(-2), 64, SS(-5), SS(-6)) = aten::max_pool2d(%input.5, %29, %28, %30, %30, %18)
  return (%input.9)
```
we partially evaluate the shape compute graph of `conv2d`, whose output gets passed in and used to partially evaluate the shape compute graph of `max_pool2d`.

The remaining partially eval'd graph for conv2d is [here](https://gist.github.com/eellison/0598bd224a422211efa1a45d2b7560b7), and the partially eval'd graph for maxpool2d is [here](https://gist.github.com/eellison/625540b84f650ddbefd3ae5511ab8814). We can take the partially eval'd graphs of a series of operators and stitch them together, which allows us to
a) recover symbolic equivalences by CSE'ing & other optimizations
b) calculate shapes for a whole block of operators just from the input, such as for fusing the whole model to nnc with dynamic shapes and then passing along the computed symbolic shapes. The calculation also handles error checking.
c) (future-looking) generate inputs on demand for straight-line networks that are composed just of aten operators

The combined graph of the two gives us compute for the unknown symbolic dimensions - `SS(-2), SS(-3), SS(-4), SS(-5), and SS(-6)`.
```
graph(%input.1 : int[]):
  %42 : bool = prim::Constant[value=0]() # <string>:152:17
  %15 : int = prim::Constant[value=3]()
  %input_batch_size_dim.1 : int = prim::Constant[value=0]() # <string>:417:41
  %13 : int = prim::Constant[value=1]() # <string>:426:61
  %12 : int = prim::Constant[value=4]() # <string>:437:32
  %11 : str = prim::Constant[value="AssertionError: "]()
  %9 : int = prim::Constant[value=2]()
  %8 : int = prim::Constant[value=6]()
  %7 : int = prim::Constant[value=7]()
  %16 : int = aten::len(%input.1) # <string>:438:17
  %17 : bool = aten::eq(%16, %12) # <string>:438:17
   = prim::If(%17) # <string>:438:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:438:10
      -> ()
  %18 : int = aten::__getitem__(%input.1, %13) # <string>:407:17
  %19 : bool = aten::eq(%18, %15) # <string>:407:17
   = prim::If(%19) # <string>:407:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:407:10
      -> ()
  %20 : int = aten::__getitem__(%input.1, %9) # <string>:411:20
  %21 : int = aten::add(%20, %8) # <string>:411:20
  %22 : bool = aten::ge(%21, %7) # <string>:411:20
   = prim::If(%22) # <string>:411:12
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:411:12
      -> ()
  %23 : int = aten::__getitem__(%input.1, %15) # <string>:411:20
  %24 : int = aten::add(%23, %8) # <string>:411:20
  %25 : bool = aten::ge(%24, %7) # <string>:411:20
   = prim::If(%25) # <string>:411:12
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:411:12
      -> ()
  %26 : int = aten::__getitem__(%input.1, %input_batch_size_dim.1) # <string>:422:29
  %27 : int = aten::sub(%20, %13) # <string>:428:32
  %28 : int = aten::floordiv(%27, %9) # <string>:428:32
  %29 : int = aten::add(%28, %13) # <string>:428:32
  %30 : int = aten::sub(%23, %13) # <string>:428:32
  %31 : int = aten::floordiv(%30, %9) # <string>:428:32
  %32 : int = aten::add(%31, %13) # <string>:428:32
  %48 : int = aten::floordiv(%28, %9) # <string>:133:17
  %outputSize.2 : int = aten::add(%48, %13) # <string>:136:23
  %51 : int = aten::floordiv(%31, %9) # <string>:133:17
  %outputSize.1 : int = aten::add(%51, %13) # <string>:136:23
  %53 : bool = aten::ne(%29, %input_batch_size_dim.1) # <string>:156:41
  %54 : bool = prim::If(%53) # <string>:157:64
    block0():
      %55 : bool = aten::ne(%32, %input_batch_size_dim.1) # <string>:157:93
      -> (%55)
    block1():
      -> (%42)
   = prim::If(%54) # <string>:157:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:157:10
      -> ()
  %56 : bool = aten::ge(%outputSize.1, %13) # <string>:160:17
  %57 : bool = prim::If(%56) # <string>:160:17
    block0():
      %58 : bool = aten::ge(%outputSize.2, %13) # <string>:160:38
      -> (%58)
    block1():
      -> (%42)
   = prim::If(%57) # <string>:160:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:160:10
      -> ()
  return (%26, %29, %32, %outputSize.2, %outputSize.1)
  ```

This PR runs shape analysis, retains the partially evaluated graphs, and then stitches them together, keeping track of which inputs in the partial eval graph correspond to which inputs in the encompassing graph IR and which outputs correspond to which symbolic shape. Adding NNC people as reviewers because this is relevant to dynamic shape fusion.

Question for reviewers: should I make this a separate file?

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31797472

Pulled By: eellison

fbshipit-source-id: a41ed31fad085d3563e71c815f49af0cd18aaeed
2021-10-20 16:12:58 -07:00
Michael Suo
70c9eb130d Revert D31732419: [JIT] Add partial evaluation graph stitching logic
Test Plan: revert-hammer

Differential Revision:
D31732419 (5db7db667f)

Original commit changeset: 883a55cbeef0

fbshipit-source-id: f5faba69dfb6b54aeb29d1beaeec8c5b0373830f
2021-10-19 20:07:04 -07:00
Elias Ellison
5db7db667f [JIT] Add partial evaluation graph stitching logic (#65377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65377

When we run symbolic shape analysis on
```
conv = torch.nn.Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
max_pool = torch.nn.MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
mod = torch.nn.Sequential(conv, max_pool)
...
graph(%self : __torch__.torch.nn.modules.container.___torch_mangle_0.Sequential,
      %input.1 : Tensor):
  %18 : bool = prim::Constant[value=0]()
  %30 : int[] = prim::Constant[value=[1, 1]]()
  %29 : int[] = prim::Constant[value=[3, 3]]()
  %28 : int[] = prim::Constant[value=[2, 2]]()
  %6 : int = prim::Constant[value=1]()
  %self.0.bias : NoneType = prim::Constant()
  %self.0.weight : Double(64, 3, 7, 7, strides=[147, 49, 7, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %input.5 : Tensor(SS(-2), 64, SS(-3), SS(-4)) = aten::conv2d(%input.1, %self.0.weight, %self.0.bias, %28, %29, %30, %6)
  %input.9 : Tensor(SS(-2), 64, SS(-5), SS(-6)) = aten::max_pool2d(%input.5, %29, %28, %30, %30, %18)
  return (%input.9)
```
we partially evaluate the shape compute graph of `conv2d`, whose output gets passed in and used to partially evaluate the shape compute graph of `max_pool2d`.

The remaining partially eval'd graph of conv2d is [here](https://gist.github.com/eellison/0598bd224a422211efa1a45d2b7560b7), and the partially eval'd graph of maxpool2d is [here](https://gist.github.com/eellison/625540b84f650ddbefd3ae5511ab8814). We can take the partially eval'd graphs of a series of operators and stitch them together, which allows us to
a) recover symbolic equivalences by running CSE and other optimizations
b) calculate shapes for a whole block of operators from just the input, such as when fusing the whole model to NNC with dynamic shapes and then passing along the computed symbolic shapes. The calculation also takes care of error handling.
c) (future-looking) generate inputs on demand for straight-line networks that are composed just of aten operators

The combined graph of the two gives us compute for the unknown symbolic dimensions - `SS(-2), SS(-3), SS(-4), SS(-5), and SS(-6)`.
```
graph(%input.1 : int[]):
  %42 : bool = prim::Constant[value=0]() # <string>:152:17
  %15 : int = prim::Constant[value=3]()
  %input_batch_size_dim.1 : int = prim::Constant[value=0]() # <string>:417:41
  %13 : int = prim::Constant[value=1]() # <string>:426:61
  %12 : int = prim::Constant[value=4]() # <string>:437:32
  %11 : str = prim::Constant[value="AssertionError: "]()
  %9 : int = prim::Constant[value=2]()
  %8 : int = prim::Constant[value=6]()
  %7 : int = prim::Constant[value=7]()
  %16 : int = aten::len(%input.1) # <string>:438:17
  %17 : bool = aten::eq(%16, %12) # <string>:438:17
   = prim::If(%17) # <string>:438:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:438:10
      -> ()
  %18 : int = aten::__getitem__(%input.1, %13) # <string>:407:17
  %19 : bool = aten::eq(%18, %15) # <string>:407:17
   = prim::If(%19) # <string>:407:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:407:10
      -> ()
  %20 : int = aten::__getitem__(%input.1, %9) # <string>:411:20
  %21 : int = aten::add(%20, %8) # <string>:411:20
  %22 : bool = aten::ge(%21, %7) # <string>:411:20
   = prim::If(%22) # <string>:411:12
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:411:12
      -> ()
  %23 : int = aten::__getitem__(%input.1, %15) # <string>:411:20
  %24 : int = aten::add(%23, %8) # <string>:411:20
  %25 : bool = aten::ge(%24, %7) # <string>:411:20
   = prim::If(%25) # <string>:411:12
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:411:12
      -> ()
  %26 : int = aten::__getitem__(%input.1, %input_batch_size_dim.1) # <string>:422:29
  %27 : int = aten::sub(%20, %13) # <string>:428:32
  %28 : int = aten::floordiv(%27, %9) # <string>:428:32
  %29 : int = aten::add(%28, %13) # <string>:428:32
  %30 : int = aten::sub(%23, %13) # <string>:428:32
  %31 : int = aten::floordiv(%30, %9) # <string>:428:32
  %32 : int = aten::add(%31, %13) # <string>:428:32
  %48 : int = aten::floordiv(%28, %9) # <string>:133:17
  %outputSize.2 : int = aten::add(%48, %13) # <string>:136:23
  %51 : int = aten::floordiv(%31, %9) # <string>:133:17
  %outputSize.1 : int = aten::add(%51, %13) # <string>:136:23
  %53 : bool = aten::ne(%29, %input_batch_size_dim.1) # <string>:156:41
  %54 : bool = prim::If(%53) # <string>:157:64
    block0():
      %55 : bool = aten::ne(%32, %input_batch_size_dim.1) # <string>:157:93
      -> (%55)
    block1():
      -> (%42)
   = prim::If(%54) # <string>:157:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:157:10
      -> ()
  %56 : bool = aten::ge(%outputSize.1, %13) # <string>:160:17
  %57 : bool = prim::If(%56) # <string>:160:17
    block0():
      %58 : bool = aten::ge(%outputSize.2, %13) # <string>:160:38
      -> (%58)
    block1():
      -> (%42)
   = prim::If(%57) # <string>:160:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:160:10
      -> ()
  return (%26, %29, %32, %outputSize.2, %outputSize.1)
  ```

This PR runs shape analysis, retains the partially evaluated graphs, and then stitches them together, keeping track of which inputs in the partial eval graph correspond to which inputs in the encompassing graph IR and which outputs correspond to which symbolic shape. Adding NNC people as reviewers because this is relevant to dynamic shape fusion.
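
The arithmetic in the combined graph can be mirrored in plain Python. This is only an illustrative sketch: the function names are hypothetical, and the Conv2d (kernel 7, stride 2, padding 3) and MaxPool2d (kernel 3, stride 2, padding 1) parameters from the example are hard-coded instead of going through the real shape functions.

```python
from typing import List, Tuple


def conv2d_out_dims(inp: List[int]) -> Tuple[int, int, int]:
    """Partially evaluated conv2d shape function (kernel=7, stride=2, padding=3)."""
    assert len(inp) == 4      # mirrors aten::len(%input.1) == 4
    assert inp[1] == 3        # input channels must match the weight
    for dim in (inp[2], inp[3]):
        assert dim + 6 >= 7   # padded input must cover the kernel
    # (dim + 2*3 - 7) // 2 + 1 simplifies to (dim - 1) // 2 + 1, as in the graph.
    return inp[0], (inp[2] - 1) // 2 + 1, (inp[3] - 1) // 2 + 1


def max_pool2d_out_dims(h: int, w: int) -> Tuple[int, int]:
    """Partially evaluated max_pool2d shape function (kernel=3, stride=2, padding=1)."""
    assert h != 0 and w != 0  # mirrors the aten::ne(..., 0) checks
    out_h = (h - 1) // 2 + 1
    out_w = (w - 1) // 2 + 1
    assert out_h >= 1 and out_w >= 1
    return out_h, out_w


def stitched_shapes(inp: List[int]) -> Tuple[int, int, int, int, int]:
    """Feed the conv2d outputs into the max_pool2d computation, as the stitched graph does."""
    n, h1, w1 = conv2d_out_dims(inp)
    h2, w2 = max_pool2d_out_dims(h1, w1)
    # Order matches SS(-2), SS(-3), SS(-4), SS(-5), SS(-6).
    return n, h1, w1, h2, w2
```

For example, `stitched_shapes([1, 3, 224, 224])` yields `(1, 112, 112, 56, 56)`.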

Question for reviewers: should I make this a separate file?

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31732419

Pulled By: eellison

fbshipit-source-id: 883a55cbeef0fd5a6068a779ffa89b6f537245b3
2021-10-19 16:41:19 -07:00
gmagogsfm
147f7559b1 Add SourceView which doesn't own source text as base class of Source (#65309)
Summary:
This would save the cost of copying text from stack to heap in some cases (like
parsing function schema during the loading phase of libtorch.so)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65309

Reviewed By: swolchok

Differential Revision: D31060315

Pulled By: gmagogsfm

fbshipit-source-id: 0caf7a688b40df52bb4388c5191d1a42351d6f1a
2021-10-18 23:17:22 -07:00
Scott Wolchok
e88d1c4f10 [PyTorch] Add tuple inline storage (#64066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64066

I noticed a bunch of time being spent heap-allocating Tuples
in the unpickler. 1-, 2-, and 3-element Tuples are apparently common
enough that they get their own bytecode instructions, so I decided to
try also giving them their own representation. We store up to 3
IValues inline in `Tuple` rather than doing a second heap allocation
for a `std::vector<IValue>`.
ghstack-source-id: 140695395

Test Plan:
Added automated tests for TupleElements.

Pixel 3 before: https://www.internalfb.com/intern/aibench/details/761596366576284
Pixel 3 after: https://www.internalfb.com/intern/aibench/details/591414145082422
We went from 347 ms to 302 ms.

Reviewed By: dhruvbird

Differential Revision: D30592622

fbshipit-source-id: 93625c54c9dca5f765ef6d5c191944179cb281a8
2021-10-15 12:16:51 -07:00
John Clow
3bad54069b Concatting multiple linear layers with same input Tensor (different weight/bias) (#63198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63198

Linear layers using the same input tensor can be concatted together
as long as the weights and biases are compatible.
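
A minimal pure-Python sketch of the idea, not the actual pass: the helper names are hypothetical, and "compatible" is assumed here to mean that every layer has the same input width.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]  # (weight of shape (out, in), bias of shape (out,))


def linear(x: List[float], weight: Matrix, bias: List[float]) -> List[float]:
    """Compute y = W x + b for a single input vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def concat_linear(x: List[float], layers: List[Layer]) -> List[List[float]]:
    """Run several linear layers that share the input x as one larger linear layer."""
    # Assumed compatibility check: every weight row matches the shared input width.
    assert all(len(row) == len(x) for weight, _ in layers for row in weight)
    big_weight = [row for weight, _ in layers for row in weight]  # stack along the output dim
    big_bias = [b for _, bias in layers for b in bias]
    fused = linear(x, big_weight, big_bias)
    # Split the fused output back into one slice per original layer.
    outputs, start = [], 0
    for weight, _ in layers:
        outputs.append(fused[start:start + len(weight)])
        start += len(weight)
    return outputs
```

The fused call produces the same values as running each layer separately, but with a single larger matrix multiply.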

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31240642

fbshipit-source-id: 1e78daa6b89822412ba2513d326ee0e072ceff1e
2021-10-08 10:55:46 -07:00
Scott Wolchok
2d885ab73d [jit] Reduce refcounting of Types (#65345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65345

FooType::get() can return a const reference. Inconveniently, converting shared_ptr<FooType> to shared_ptr<Type> requires a copy & refcount bump, so to properly take advantage of this in unshapedType() we need to take a const Type& in isSubtypeOf(), which is good practice anyway -- don't require a shared_ptr if you don't need to take ownership.
ghstack-source-id: 140044165

Test Plan:
CI

perf says c10::unshapedType time decreased from 2.8% to 2.2% during static runtime startup, though I expect this to be generally beneficial.

Reviewed By: hlu1

Differential Revision: D31027361

fbshipit-source-id: 676feb81db9f74ad7b8651d8774f4ecb4cfa6ab8
2021-10-08 09:03:04 -07:00
Chen Lai
a5895f85be [PyTorch Edge][type] Add type check in compatibility api (#63129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63129

1. Add an api to get `supported_types` from runtime, expose in c++ only.
2. Add an api to get `contained_types` from model, expose in both c++ and Python.
3. Add a field `contained_types_` in `type_parser.cpp` to track the contained types when parsing python string.
4. Expand `is_compatible` api to check type. When checking type, it will check the contained type list from the model against the supported type list from runtime.
5. Expand the unittest for compatibility to cover type
6. Add unit test in python to check type list
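
The type check described in item 4 amounts to a set comparison. A rough sketch, assuming types are represented as plain strings (the function name and return shape are hypothetical, not the real `is_compatible` signature):

```python
from typing import Iterable, List, Set, Tuple


def check_type_compatibility(contained_types: Iterable[str],
                             supported_types: Set[str]) -> Tuple[bool, List[str]]:
    """Return (is_compatible, unsupported types) for a model/runtime pair."""
    # Any type the model contains but the runtime does not support is a failure.
    unsupported = sorted(set(contained_types) - supported_types)
    return (not unsupported, unsupported)
```

Returning the list of unsupported types alongside the boolean makes it easy to build an error message for the failing case.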
ghstack-source-id: 139826944

Test Plan:
```
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.GetContainTypes'

buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.isCompatibleSuccess'
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.isCompatibleFail'

buck test //caffe2/test:mobile
```

Reviewed By: iseeyuan

Differential Revision: D30231419

fbshipit-source-id: 8427f423ec28cc5de56411f15fd960d8595d6947
2021-10-06 02:23:44 -07:00
Gary Miguel
d1058df885 fix clang-tidy error introduced by #64382 (#65977)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65977

Reviewed By: ngimel

Differential Revision: D31423174

Pulled By: malfet

fbshipit-source-id: 0ea560b9a6ddd6431f70bd3ac10ace68e26ab352
2021-10-05 20:13:13 -07:00
John Clow
6cdea8239e Precomputing Transposes for frozen linear layers (#65631)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65631

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D31314248

Pulled By: Gamrix

fbshipit-source-id: 85611f3ccfe7b91a183d5d12f7fb9aca3c51acb0
2021-10-05 20:08:32 -07:00
jjsjann123
d609957c95 patching graph_for (#55139)
Summary:
Allows each individual DifferentiableGraphOp to display its optimized forward graph. This improves user visibility into graph mutations made by optimization passes, especially fusion.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55139

Reviewed By: albanD

Differential Revision: D31330909

Pulled By: dzhulgakov

fbshipit-source-id: c745b482fdc34876dc404cbe3bacd99dcf2ac724
2021-10-04 21:50:22 -07:00
Hariom Narang
2828ce53fd Added jit log stream changing function and some refactor (#65768)
Summary:
Description:
- Have only added `stdout` and `stderr` as possible options from python
  API for now. We can do file path passing later maybe.
- Put the class `JitLoggingConfig` in the cpp file as none of its methods were being used outside of this file.

Python API:
`torch._C._jit_set_logging_stream('stdout|stderr')`
C++ API:
`::torch::jit::set_jit_logging_output_stream(ostream);`

Testing:
- Tested python API locally.
- Unit test for the C++ API is written

Fixes https://github.com/pytorch/pytorch/issues/54182

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65768

Reviewed By: mrshenli

Differential Revision: D31291739

Pulled By: ZolotukhinM

fbshipit-source-id: eee72edc20488efad78a01c5b0ed8a132886a08d
2021-09-30 23:25:11 -07:00
Elias Ellison
928a4bbafb [JIT] Fix compilation unit reference link in constant object upon load (#65784)
Summary:
Follow up to https://github.com/pytorch/pytorch/pull/65442, make sure objects inserted into the graph from load do not hold an owning reference.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65784

Reviewed By: suo

Differential Revision: D31251033

Pulled By: eellison

fbshipit-source-id: 59efe19ce6f70744383de4eebf0f89f79f3eb03a
2021-09-30 09:32:28 -07:00
Pruthvi Madugundu
085e2f7bdd [ROCm] Changes not to rely on CUDA_VERSION or HIP_VERSION (#65610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610

- Replace HIP_PLATFORM_HCC with USE_ROCM
- Don't rely on CUDA_VERSION or HIP_VERSION; use USE_ROCM and ROCM_VERSION instead.

- In the next PR
   - Will be removing the mapping from CUDA_VERSION to HIP_VERSION and CUDA to HIP in hipify.
   - HIP_PLATFORM_HCC is deprecated, so will add HIP_PLATFORM_AMD to support HIP host code compilation on gcc.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd

Reviewed By: jbschlosser

Differential Revision: D30909053

Pulled By: ezyang

fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06
2021-09-29 09:55:43 -07:00
BowenBao
20143bf07f [ONNX] Deprecate use_external_data_format param from torch.onnx.export() function. (#62257) (#64382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64382

* The `use_external_data_format` parameter is used for large models that cannot be exported because of the 2GB protobuf limit.

* When `use_external_data_format` is set to True, the model is exported in ONNX external data format, in which case some of the model parameters are stored in external binary files and not in the ONNX model file itself.

* This PR marks this parameter as DEPRECATED and checks the model proto size in code instead of relying on the user; if the size is larger than 2GB, `use_external_data_format = True` is applied automatically.
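
A rough sketch of that automatic decision. The function name and the way sizes are passed in are assumptions for illustration; the exporter's real size check may measure the proto differently.

```python
from typing import Iterable

TWO_GB = 2 * 1024 ** 3  # the protobuf size limit referred to above


def needs_external_data(param_sizes_bytes: Iterable[int],
                        graph_size_bytes: int = 0,
                        limit: int = TWO_GB) -> bool:
    """Decide whether the model should be written in the external data format."""
    return graph_size_bytes + sum(param_sizes_bytes) > limit
```

With this in place the caller no longer has to pass a flag; the format is chosen from the model size alone.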

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905265

Pulled By: malfet

fbshipit-source-id: 82b4e17bfa6a8de2bfd700a5282c12f6835603cb

Co-authored-by: hwangdeyu <dejack953@outlook.com>
2021-09-23 22:20:48 -07:00
David Berard
8eb21488fd [JIT] Improve BatchMM mutability handling (#65097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65097

Previously, BatchMM would skip any block containing any mutable
operators. Now it will avoid batching any operation whose inputs or
outputs are ever mutated. Specifically: consider a tree of ADD, T,
and MM nodes rooted at an ADD node.  If any input or output to any
node in the tree is ever mutated, then the entire tree will be ignored
by BatchMM.
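
The rule can be sketched on a toy tree as follows. The `Node` class and the set of mutated value names are hypothetical stand-ins for the JIT IR and its alias analysis.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    kind: str                 # "ADD", "T" or "MM"
    inputs: List[str]         # names of the values the node reads
    output: str               # name of the value the node produces
    children: List["Node"] = field(default_factory=list)


def tree_values(root: Node) -> Set[str]:
    """Collect every input and output value name used anywhere in the tree."""
    values = set(root.inputs) | {root.output}
    for child in root.children:
        values |= tree_values(child)
    return values


def can_batch(root: Node, mutated: Set[str]) -> bool:
    """A tree rooted at ADD is batchable only if none of its values is ever mutated."""
    return root.kind == "ADD" and tree_values(root).isdisjoint(mutated)
```

Mutations of values outside the tree no longer disqualify it, which is the difference from the previous whole-block check.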

Test Plan: python test/test_jit.py TestBatchMM

Reviewed By: eellison

Differential Revision: D30973515

Pulled By: davidberard98

fbshipit-source-id: 9d836faa1ef0c9e3fefe0ffc0bd265f275471f48
2021-09-16 10:46:14 -07:00
Ansley Ussery
6831d8e379 Support Union in TorchScript (#64234)
Summary:
This PR is created to replace the https://github.com/pytorch/pytorch/pull/53180 PR stack, which has all the review discussions. A replacement is needed because of a messy Sandcastle issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64234

Reviewed By: gmagogsfm

Differential Revision: D30656444

Pulled By: ansley

fbshipit-source-id: 77536c8bcc88162e2c72636026ca3c16891d669a
2021-09-03 06:12:24 -07:00
James Reed
e1c3e5f830 [resubmit][FX] Prototype for guarding against mutable operations in tracing (#64467)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64467

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30744870

Pulled By: jamesr66a

fbshipit-source-id: fc652f8b17748f90dbeb83fabf3bd5bb57d6ff1a
2021-09-02 21:13:21 -07:00
Eli Uriegas
32a93c2424 Revert D30675780: [FX] Prototype for guarding against mutable operations in tracing
Test Plan: revert-hammer

Differential Revision:
D30675780 (795387477f)

Original commit changeset: b2116b51dcc8

fbshipit-source-id: d4f1173f4989556ea54974f4c2739ef85a705fae
2021-09-02 16:07:29 -07:00
James Reed
795387477f [FX] Prototype for guarding against mutable operations in tracing (#64295)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64295

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D30675780

Pulled By: jamesr66a

fbshipit-source-id: b2116b51dcc87357f0c84192c4c336680875e27a
2021-09-02 15:17:04 -07:00
Zhengxu Chen
ac99d63f83 [jit] Make operation call accept Stack& instead Stack* (#63414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63414

Misuse of a raw pointer here, where stack is never nullable.
ghstack-source-id: 136938318

Test Plan:
compiles.

Imported from OSS

Reviewed By: ejguan

Differential Revision: D30375410

fbshipit-source-id: 9d65b620bb76d90d886c800f54308520095d58ee
2021-08-30 11:49:20 -07:00
Meghan Lele
95d0b3199b Back out "[ONNX] Fix an issue that optimizations might adjust graph inputs unexpectedly. (#61280)" (#64004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64004

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63904

Fixes T98808160

Test Plan: T98808160

Reviewed By: msaroufim

Differential Revision: D30527450

fbshipit-source-id: 6262901a78ca929cecda1cf740893139aa26f1b4
2021-08-26 12:49:42 -07:00
Bert Maher
8dda299d96 Re-apply: [nnc] Support thread level parallelism in fused kernels (#63776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63776

I reverted this out of an abundance of caution because some test
failures occurred, but they were all due to precision issues fixed lower in
this stack.  Let's try again.

I've rolled the elimination of the allow-parallelism-in-fusions toggle into
this diff since they're pretty tightly coupled.
ghstack-source-id: 136529847

Test Plan: CI

Reviewed By: huiguoo

Differential Revision: D30484555

fbshipit-source-id: 38fd33520f710585d1130c365a8c60c9ce794a59
2021-08-24 18:56:55 -07:00
Bert Maher
a709ab34a8 [nnc] Re-enable CPU fusion (#63665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63665

This reverts commit 125e2d02e5.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30471646

Pulled By: bertmaher

fbshipit-source-id: 4189869566f03b5f9ada78d78830f6a34946eed6
2021-08-23 12:42:42 -07:00
Bert Maher
76da46ccdc Revert D30417127: Remove flag to toggle CPU fusion in the presence of parallelism
Test Plan: revert-hammer

Differential Revision:
D30417127 (6600bc9651)

Original commit changeset: b77d7c68364f

fbshipit-source-id: 6b52fb83a84fe241945e3cb3eeb71050d1d9c8f1
2021-08-21 03:38:07 -07:00
BowenBao
8760254911 [ONNX] Fix an issue that optimizations might adjust graph inputs unexpectedly. (#61280) (#62763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62763

This PR fixes an issue where the graph inputs might be updated when a model is exported in inference mode.

When a model is exported in inference mode, some optimizations are applied. One side effect of these optimizations is that the inputs of the graph might be adjusted. Such optimizations include:

	1. Conv and BatchNorm op fusion.
	2. Constant folding.

If the user sets export_params=False or keep_initializers_as_inputs=True, it is highly likely that the user wants to provide the corresponding parameters or initializers as inputs of the graph.
In that situation, regardless of whether the model is exported in inference mode or training mode, the exporter needs to prevent the above optimizations from adjusting the graph inputs, so that the graph inputs match the inputs the user provides.

The changes in this PR add a common check that decides whether the above optimizations should run. From the values of the export_params and keep_initializers_as_inputs arguments, it infers whether the graph inputs are allowed to be adjusted.
If not, these optimizations are skipped, even if the other requirements are met.
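
The check can be sketched as below. The function names and pass labels are hypothetical, and the arguments are treated as plain booleans for simplicity.

```python
from typing import List


def may_adjust_graph_inputs(export_params: bool,
                            keep_initializers_as_inputs: bool) -> bool:
    """Graph inputs may only change if the user is not supplying params or initializers."""
    return bool(export_params) and not bool(keep_initializers_as_inputs)


def enabled_optimizations(inference_mode: bool,
                          do_constant_folding: bool,
                          export_params: bool,
                          keep_initializers_as_inputs: bool) -> List[str]:
    """List the input-adjusting optimizations that would run for these settings."""
    if not inference_mode:
        return []
    if not may_adjust_graph_inputs(export_params, keep_initializers_as_inputs):
        return []  # skipped even though the other requirements are met
    passes = ["conv_bn_fusion"]
    if do_constant_folding:
        passes.append("constant_folding")
    return passes
```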

Besides these code changes, the comments on the parameters below have been updated to give users more guidance on how to use them for different purposes:

	1. export_params
	2. training
	3. do_constant_folding
	4. keep_initializers_as_inputs

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30375183

Pulled By: msaroufim

fbshipit-source-id: 4db8b9695649eb32a3a0fefa950ee2e5651bdba0

Co-authored-by: fatcat-z <jiz@microsoft.com>
2021-08-20 12:46:52 -07:00
Alban Desmaison
125e2d02e5 Revert D30417370: [nnc] Enable CPU fusion
Test Plan: revert-hammer

Differential Revision:
D30417370 (b9fc656cf2)

Original commit changeset: 84ce7a578a36

fbshipit-source-id: cd23774cdc3273fd72f8a05f1900eaf36f373e6b
2021-08-20 12:30:21 -07:00
Bert Maher
b9fc656cf2 [nnc] Enable CPU fusion (#63545)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63545

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30417370

Pulled By: bertmaher

fbshipit-source-id: 84ce7a578a3678d5562bab99d1dc00330c4f72d1
2021-08-20 11:18:21 -07:00
Bert Maher
6600bc9651 Remove flag to toggle CPU fusion in the presence of parallelism (#63514)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63514

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30417127

Pulled By: bertmaher

fbshipit-source-id: b77d7c68364f2af73570740540f3b1152313016e
2021-08-20 11:18:19 -07:00
Alban Desmaison
ce61100923 Revert D29399533: Hoisting common expressions out of If blocks
Test Plan: revert-hammer

Differential Revision:
D29399533 (9477211e7d)

Original commit changeset: 9336b9dc48c0

fbshipit-source-id: f081c7280203f40328bcbb0c03a7c6a007acedb7
2021-08-19 06:20:40 -07:00
John Clow
9477211e7d Hoisting common expressions out of If blocks (#59492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59492

Adding code to find common expressions from the two subblocks of an if
operation and hoist them before the if block.
This also allows Dead Code Elimination to
then eliminate some if blocks.
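
A toy version of the transformation, assuming every op is pure and each statement is a `(output, op, inputs)` tuple. This representation and the function name are illustrative only; the real pass works on TorchScript IR nodes and has to account for aliasing.

```python
from typing import Dict, List, Set, Tuple

Stmt = Tuple[str, str, Tuple[str, ...]]  # (output name, op, input names)


def hoist_common(true_block: List[Stmt], false_block: List[Stmt],
                 outer: Set[str]) -> Tuple[List[Stmt], List[Stmt], List[Stmt]]:
    """Return (hoisted, new_true, new_false) for a toy if-node with pure ops."""
    available = set(outer)
    # Index the false branch by (op, inputs) so matches are cheap to find.
    false_index: Dict[Tuple[str, Tuple[str, ...]], str] = {
        (op, ins): out for out, op, ins in false_block
    }
    hoisted: List[Stmt] = []
    new_true: List[Stmt] = []
    rename: Dict[str, str] = {}
    for out, op, ins in true_block:
        key = (op, ins)
        if key in false_index and all(i in available for i in ins):
            hoisted.append((out, op, ins))
            available.add(out)
            rename[false_index[key]] = out  # the false branch reuses the hoisted value
        else:
            new_true.append((out, op, ins))
    # Drop hoisted statements from the false branch and rewrite their uses.
    new_false = [
        (out, op, tuple(rename.get(i, i) for i in ins))
        for out, op, ins in false_block
        if out not in rename
    ]
    return hoisted, new_true, new_false
```

If both branches end up empty after hoisting, the if node no longer does anything, which is what lets dead code elimination remove it.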

Also eliminated some dead code in the codebase.

Test Plan:
python test_jit.py TestIfHoisting

Imported from OSS

Reviewed By: ngimel

Differential Revision: D29399533

fbshipit-source-id: 9336b9dc48c02c38862f98f98cd72fc1767a1802
2021-08-18 16:29:30 -07:00
Jiewen Tan
04caef8e1d Improve IMethod::getArgumentNames to deal with empty argument names list (#62947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62947

This diff improved IMethod::getArgumentNames to deal with empty argument names list.

Test Plan:
buck test mode/dev //caffe2/caffe2/fb/predictor:pytorch_predictor_test -- PyTorchDeployPredictor.GetEmptyArgumentNamesValidationMode
buck test mode/dev //caffe2/caffe2/fb/predictor:pytorch_predictor_test -- PyTorchDeployPredictor.GetEmptyArgumentNamesRealMode

Reviewed By: wconstab

Differential Revision: D30179974

fbshipit-source-id: c7aec35c360a73318867c5b77ebfec3affee47e3
2021-08-11 16:44:00 -07:00
Elias Ellison
ea808df25d Test shape analysis with opinfos (#59814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59814

Using opinfos to test shape analysis. By default, we just check that we don't give incorrect answers; if `assert_jit_shape_analysis` is true, we also test that we correctly propagate the full shape. It found a couple of bugs 😃

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D30200058

Pulled By: eellison

fbshipit-source-id: 6226be87f5390277cfa5a1fffaa1b072d4bc8803
2021-08-10 09:47:33 -07:00