Commit Graph

2136 Commits

Author SHA1 Message Date
Jiewen Tan
dc37090ec5 [LT] Support diagonal op (#75230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75230

Op diagonal is a view op which we can't code-gen yet. Therefore, support
it by making hand-written IR construction and lowering.

Test Plan: ./build/bin/test_lazy --gtest_filter=LazyOpsTest.TestDiagonal*

Reviewed By: wconstab

Differential Revision: D35378316

Pulled By: alanwaketan

fbshipit-source-id: 7958d00107aef20ac37aabcf2868346240977530
(cherry picked from commit 84155528fce484627c9688cfd92fd4aeb68219e5)
2022-04-08 19:49:42 +00:00
Nikolay Korovaiko
4a85145bbd Ansley's rebase of DimensionNode onto master (#75352)
Summary:
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75352

Reviewed By: wconstab

Differential Revision: D35455859

Pulled By: Krovatkin

fbshipit-source-id: e24c81d63dc66d03b752cc8de5cb551d84b003ac
(cherry picked from commit 4ad371cb4cc88860ce8ec398d82083f6759e3fcf)
2022-04-08 17:22:56 +00:00
John Clow
f1db3e465a Adding integration of SSA into LazyTensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75050

Approved by: https://github.com/Krovatkin
2022-04-07 19:49:41 +00:00
Pavithran Ramachandran
3001bda304 [PyTorchEdge] Backport from v9 flatbuffer to v8 pickle (#75201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75201

In this diff:
1. Bump supported version to 9, which will serve as a placeholder for upcoming version bump to v9 for flatbuffer format migration.
2. Implements backport from v9 flatbuffer file to v8 pickle file.
ghstack-source-id: 153225189

(Note: this ignores all push blocking failures!)

Test Plan:
fb:
```
cd ~/fbsource/fbcode/ && buck test  -c fbcode.caffe2_enable_flatbuffer=1 caffe2/test/cpp/jit:jit -- LiteInterpreterTest.BackPortByteCodeModelAllVersions
Parsing buck files: finished in 0.7 sec
Downloaded 0/25 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 20.7 sec (100%) 21783/21783 jobs, 5/21783 updated

cd ~/fbsource/fbcode/ && buck test caffe2/test/cpp/jit:jit -- FlatbufferTest.FlatbufferBackPortTest
Parsing buck files: finished in 0.7 sec
Building: finished in 4.5 sec (100%) 12972/53298 jobs, 0/53298 updated
  Total time: 5.3 sec
More details at https://www.internalfb.com/intern/buck/build/b658d597-d358-4293-97cb-28e7612b96e8
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 35d5542d-6ee3-4c28-be10-1d822c7a6fef
Trace available for this run at /tmp/tpx-20220308-090347.891303-35d5542d-6ee3-4c28-be10-1d822c7a6fef/trace.log
RemoteExecution session id: reSessionID-35d5542d-6ee3-4c28-be10-1d822c7a6fef-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8444249379196000
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 490 tests discovered (22.838)
    ✓ Pass: caffe2/test/cpp/jit:jit - FlatbufferTest.FlatbufferBackPortTest (0.289)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/8444249379196000
```

Reviewed By: iseeyuan

Differential Revision: D34702597

fbshipit-source-id: 5c203c29d13360d7934ce6e57557739e7038c05e
(cherry picked from commit 6189e08a2bd968fdab636f77cb6bd73d6c36beb2)
2022-04-07 19:43:57 +00:00
Wang, Eikan
252e1ccce6 Enable TE fuser to support user defined operator (#73073)
Summary:
PyTorch supports registering a custom operator by `TORCH_LIBRARY_FRAGMENT` / `TORCH_LIBRARY_IMPL` and `torch::jit::tensorexpr::getNNCLoweringRegistry` could insert a custom operator. But the te fuser passes conditional check does not support custom operator. The `isSupported` of `tensorexpr_fuser` checks whether the `Node` is `get_tensorexpr_elementwise_set()`, `supported_non_eltwise_set()`, `supported_misc_set` and `supported_reduction_set`. If a custom operator needs to be added to the TE fusion group, the checked will block it.

Taking the RN50 as an example, we can speed up the model by fusing the convolution and consecutive element-wise operator into a custom operator. The framework overhead becomes non-negligible when the computation becomes more efficient, especially for the latency mode and the tiny models. If the TE fuser allows adding the custom operator to the fusion group, then the entire RN50 model could be fused by TE as a single operator/function consisting of "ExternalCalls" and TE-IR.  This could significantly reduce framework overhead, which in turn improves RN50 E2E performance. The same goes for other models.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73073

Reviewed By: pbelevich

Differential Revision: D35453165

Pulled By: ZolotukhinM

fbshipit-source-id: a764cf340b0b1e05fe230649cbe44f5786bdd37d
(cherry picked from commit ee95aa4d36714540fbb216a338799e6a6bb966d5)
2022-04-07 04:36:39 +00:00
Martin Yuan
00c1e01ad0 Remove internal logic to handle bytecode version 3 (#57775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57775

The minimum supported bytecode version is updated from 3 to 4. We no longer support version 3 bytecode models.

Why?
* There are hacky codes in operator loading, that performs differently on one operator on the global bytecode version 3. Instead operator related metadata should be passed (for example, in #56845). To allow future development, we remove the hacky way first.
* The bytecode version was bumped from 3 to 4 more than half a year ago. Since all the production models are all bumped to version 4, it's not practical to keep and maintain version 3. The risk to deprecate version 3 is low.

Test Plan: Imported from OSS

Reviewed By: raziel

Differential Revision: D28270791

Pulled By: cccclai

fbshipit-source-id: 70b1bd6352fdaae5f8d2173b81578d77018c8e44
(cherry picked from commit 3e930fa381cd01f3705116795c6426df992372fc)
2022-04-07 01:45:52 +00:00
Pavithran Ramachandran
f984e50f39 Extend jit::load to work on flatbuffer file; Take 2 (#75256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75256

ghstack-source-id: 153138970

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D35399581

fbshipit-source-id: dafe9d301009d3f70986ed92bfe06d160ab90ba0
(cherry picked from commit ccc860fd07946de5aae12bc179a0b8bbba83b997)
2022-04-06 17:54:01 +00:00
John Clow
26dcec152c Added support for SSA for ops not in a JIT graph
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74340

Approved by: https://github.com/eellison
2022-04-06 01:45:37 +00:00
Antonio Kim
e1b4117e30 Move shape and operand definitions to base node (#75223)
Summary:
First stage of breaking up https://github.com/pytorch/pytorch/pull/74710

Moves the shape and operand definitions from `TsNode` to the base `Node`

CC: wconstab JackCaoG henrytwo

Partially Fixes https://github.com/pytorch/pytorch/issues/74628

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75223

Reviewed By: zou3519

Differential Revision: D35410285

Pulled By: wconstab

fbshipit-source-id: bb84d3fb636882cbe7e18af4b35ff2c0e22aaa58
(cherry picked from commit a4144c9a48379d8a9007cff845796608b597cce1)
2022-04-06 01:43:46 +00:00
Lu Fang
32e58c73c4 Back out "Extend jit::load to work on flatbuffer file" (#75244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75244

Original commit changeset: d653a5af662a

Original Phabricator Diff: D35060736 (d9d34922a0)

Test Plan: Model loading test, verified that D35060736 (d9d34922a0) will cause the torch::save => torch::load failure.

Reviewed By: yinghai, jianyuh

Differential Revision: D35387009

fbshipit-source-id: 9d176992d402d57779e2af3d905b3c1538335298
(cherry picked from commit 6c8cc0d3b8a88b15e35702d70e18bbae8aa4628a)
2022-04-05 09:55:04 +00:00
Nikita Shulga
81d765ef1f Fix sign-compare violations in cpp tests
Prerequisite change for enabling `-Werror=sign-compare` across PyTorch repo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75080

Approved by: https://github.com/atalman
2022-04-04 23:05:31 +00:00
Chen Lai
6efc5c1acf Rewrite upgrader bytecode version from 3 to 4 (content unchanged) (#75120)
Summary:
update the upgrader models by hacking backport logic - copy everything in the model and only rewrite the bytecode version to 4 in D35265596

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75120

ghstack-source-id: 152823046

Test Plan: CI

Reviewed By: qihqi

Differential Revision: D35321154

fbshipit-source-id: 333158bd0fd9b4819b3b7cf47d80c285934adf3e
(cherry picked from commit 74bb2da73a4d18f448b8486772643eac89eb759a)
2022-04-02 01:51:39 +00:00
Pavithran Ramachandran
d9d34922a0 Extend jit::load to work on flatbuffer file (#75022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75022

Extending torch::jit::load to read flatbuffer file
ghstack-source-id: 152820697

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D35060736

fbshipit-source-id: d653a5af662a46107ff4fd70209fd2a0a4d40f20
(cherry picked from commit 109e14a54bd279011c8f9066e6c29e8e0b1fc4db)
2022-04-02 01:33:34 +00:00
Pavithran Ramachandran
7aaa75af05 Extending _get_bytecode_version to support flatbuffers format (#75021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75021

Extending `_get_bytecode_version` to support flatbuffers.
ghstack-source-id: 152771695

(Note: this ignores all push blocking failures!)

Test Plan:
```
~/fbsource/xplat] cd ~/fbsource/xplat/ && buck test //xplat/caffe2:test_lite_interpreter
Building: finished in 0.8 sec (100%) 327/327 jobs, 0/327 updated
  Total time: 0.9 sec
Testing: finished in 06:59.5 min (85 PASS/0 FAIL)
BUILD SUCCEEDED
RESULTS FOR //xplat/caffe2:test_lite_interpreter
PASS    412.3s 85 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_interpreter
TESTS PASSED
```

Reviewed By: iseeyuan

Differential Revision: D34900498

fbshipit-source-id: 65743076d43a933c5381ec128d0268f22c0a8441
(cherry picked from commit 457c76c7d1df6050b941c56a8198162e2e4a3388)
2022-04-01 15:05:37 +00:00
Will Constable
b9e535a64a Add non-eager registration to dispatch autogen (#74557)
Summary:
Previously, the torchscript backend would be (partially) initialized at startup.
- the dispatcher registrations would be registered,
- but other backend components would not be initialized until explicitly calling
  the backend init function

With this change, the torchscript backend is not initialized until its explicit
initialization function is called.

This enables external backends to register their own backend instead of the torchscript
backend to the same (Lazy) key.

Lands a change contributed by antoniojkim via lazy_tensor_staging branch (https://github.com/pytorch/pytorch/issues/73973)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74557

Reviewed By: bdhirsh

Differential Revision: D35051464

Pulled By: wconstab

fbshipit-source-id: 5a8b0851293e394f49427d1416ee571a8881fe9f
(cherry picked from commit ef745a4a2c8d1d7f9510541a20f1f40625ce29de)
2022-04-01 03:42:53 +00:00
Will Constable
14affba799 Fix ir_metadata Python frames func and remove dead code (#74979)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74979

Reviewed By: alanwaketan

Differential Revision: D35261641

Pulled By: wconstab

fbshipit-source-id: e82b5f17d0043c4a3de72c16fb42fd02a85414fe
(cherry picked from commit fc6c0a1654256871361a5ad08926bc39d74cd0c5)
2022-03-31 23:23:36 +00:00
Nikolay Korovaiko
5177f95d21 Introducing SymInt to Pytorch (for tracing size arithmetic) (master rebase) (#74861)
Summary:
This PR introduces `SymInt` type to Pytorch which will be used by LTC and AOTAutograd for tracing size arithmetic and tests.
`SymInt` is a C++ union structure [int64_t, SymbolicIntNode*] that wraps around an int64_t field where the value of the field could be an index into a list of `shared_ptr<SymbolicIntNode>` or a real int.
This PR doesn't add any support for actually tracing symbolic ints. i.e. data_ for now can only contain real ints.

```
Goal 1: just to show we can add a type to PyTorch core. (wraps int) LANDEABLE
Finalize the naming - symint
Want the name to be short
Does invoke “size” - NO
SInt/SymInt/SymbolicInt
SInt could mean signed int
sym_int or symint or SymInt (originally it was “int”; capitalized implies object semantics, whereas lowercase implies value semantics)
JIT schema - symint
C++ - symint
```

See more details here: https://docs.google.com/document/d/1iiLNwR5ohAsw_ymfnOpDsyF6L9RTUaHMpD8 (d843f63f2a)YLw-jxEw

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74861

Reviewed By: qihqi, ngimel

Differential Revision: D35226230

Pulled By: Krovatkin

fbshipit-source-id: 34acf342bd50fcaa4d8d5dd49c2fd6a98823a5b3
(cherry picked from commit 218643f63ef181cabb92d13a6e837eb64f2dda3c)
2022-03-31 21:59:59 +00:00
jjsjann123
873ced7cd0 Nvfuser code bump 030122 (#73627)
Summary:
Things changed in this PR that requires review:

test/forward_backward_compatibility/check_forward_backward_compatibility.py

Our previous function overload extension names were wrong and has been updated in this PR, hence the compatibility list updated.

nvfuser code updates with bug fixes towards failures we encountered in OpInfoTests as well as failures reported by AOTAutograd team.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73627

Reviewed By: Chillee

Differential Revision: D34765458

Pulled By: davidberard98

fbshipit-source-id: c81f3d6a1b723fb3a8ba419b7f82227f70440ca7
(cherry picked from commit b6a2c362c37051e44fac31687b2fe272f776551e)
2022-03-31 08:18:22 +00:00
Nikita Shulga
43313cbde3 Revert D34647822: [tensorexpr] Add support for aten::stack
Test Plan: revert-hammer

Differential Revision:
D34647822 (954c7e2a77)

Original commit changeset: 3b863c71886c

Original Phabricator Diff: D34647822 (954c7e2a77)

fbshipit-source-id: e9ce06c9c8d7caf0fbb2565f0d99035bad685793
(cherry picked from commit b2ff355e9dbaa4e940fb221254223984c3c8a215)
2022-03-31 04:25:43 +00:00
Nikita Shulga
320e5a8268 Revert D34808051: [tensorexpr] Enabled aten::stack in the fuser pass with static shapes
Test Plan: revert-hammer

Differential Revision:
D34808051

Original commit changeset: 213e2ffdf87f

Original Phabricator Diff: D34808051

fbshipit-source-id: b618daeb346f784e8ab9525040edcb4a30a39613
(cherry picked from commit e47b973cba5c95e9410f8aecdfd5619de6d4be7c)
2022-03-31 04:25:43 +00:00
Hui Guo
90c3699cc8 [tensorexpr] Enabled aten::stack in the fuser pass with static shapes (#74077)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74077

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D34808051

Pulled By: huiguoo

fbshipit-source-id: 213e2ffdf87fb1a74104037cea7ef25e4bfd4307
(cherry picked from commit ad9e84842e5b47eda845827d325b08ba361a8286)
2022-03-31 04:25:43 +00:00
Elias Ellison
2ef5611f31 Add comments for adding shape function and linting (#73570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73570

Approved by: https://github.com/huiguoo

Test Plan: contbuild & OSS CI, see 6d36bbde7e

Reviewed By: pbelevich

Differential Revision: D35192688

Pulled By: atalman

fbshipit-source-id: b12b80e6a6dd1adaa57a8facb6bb077989faa543
(cherry picked from commit e50478c02592597f12b8490ec5496f76c7d8b8cc)
2022-03-31 04:25:43 +00:00
Nikita Shulga
3036a0309d [skip ci]Revert "Add comments for adding shape function and linting"
This is a technical revert of 6d36bbde7e to reconcile it with e50478c02592597f12b8490ec5496f76c7d8b8cc (which is the same + lint changes applied)

Should be skipped during import
2022-03-30 21:21:28 -07:00
Hui Guo
954c7e2a77 [tensorexpr] Add support for aten::stack (#73801)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73801

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D34647822

Pulled By: huiguoo

fbshipit-source-id: 3b863c71886c7c6616b16f5d3313079714c8b82a
(cherry picked from commit c71778cf6a5724d26b671bf3ee0478add24990e8)
2022-03-30 21:25:15 +00:00
Dave Bort
f82b2d4a82 [PyTorchEdge] Make _load_parameters() handle flatbuffer inputs (#74580)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74580

Handle Flatbuffer-serialized parameters.

Make `_load_parameters()` detect the input data format and use the correct deserializer to load the parameters.

Also, rename `BytecodeDeserializer` to `IValueUnpickler` to make it clear that it unpickles an `IValue` and doesn't have anything to do with bytecode.
ghstack-source-id: 152487890

Test Plan:
New unit test shows a successful round trip from _save_parameters() to _load_parameters() using flatbuffers.

```
$ buck test //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
Building: finished in 0.5 sec (100%) 346/346 jobs, 0/346 updated
  Total time: 0.6 sec
Testing: finished in 0.5 sec (26 PASS/0 FAIL)
BUILD SUCCEEDED
RESULTS FOR //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
PASS    <100ms 13 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_trainer
PASS    <100ms 13 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
TESTS PASSED
```

Reviewed By: qihqi

Differential Revision: D34488913

fbshipit-source-id: 8d2c0b895699f3b336115d33bf96d49cbf9245d2
(cherry picked from commit 319345deff260826197f8cdf5ac03071b412c72f)
2022-03-30 20:39:58 +00:00
Dave Bort
1659a267f9 [PyTorchEdge] Export flatbuffers from _save_parameters() (#74579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74579

Now that we can convert a module to a flatbuffer, update `_save_parameters()` to optionally write to that format.

Also, rename the internal `ScriptModuleSerializer` class to `IValuePickler` to make it more clear that a) it's pickle-specific, and b) it serializes IValues, not Modules.
ghstack-source-id: 152487889

Test Plan:
New unit test shows that we can produce Flatbuffer-formatted output.

```
$ buck test //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
Building: finished in 0.5 sec (100%) 346/346 jobs, 0/346 updated
  Total time: 0.6 sec
Testing: finished in 0.5 sec (26 PASS/0 FAIL)
BUILD SUCCEEDED
RESULTS FOR //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
PASS    <100ms 13 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_trainer
PASS    <100ms 13 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
TESTS PASSED
```

A new test in later commit D34488913 tests the full round trip.

Reviewed By: qihqi

Differential Revision: D34408538

fbshipit-source-id: eea183c31b5e1b2b75a65f384d8a479223a4ae72
(cherry picked from commit de310a15422b65fb7e443f7005d287d9f5f586bc)
2022-03-30 20:39:58 +00:00
Elias Ellison
6d36bbde7e Add comments for adding shape function and linting
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73570

Approved by: https://github.com/huiguoo
2022-03-29 23:02:22 +00:00
Elias Ellison
9c4a63787b Add api for changing function executor settings, hook up execution with decomposition registry (#74186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74186

Make the execution settings mutable on function_impl so that we can set it for running op decompositions. Add mapping to function objects and show example in test of executing op decompositions.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D34938125

Pulled By: eellison

fbshipit-source-id: adf108b2f6c1bd166910c6d7b94245661d67ce0d
(cherry picked from commit 9957e33803002d9e71abe4ff802769270b6960d3)
2022-03-29 18:38:52 +00:00
Elias Ellison
0ecf1add1b Introduce function-local settings for executor, expose in c++ (#74012)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74012

This allows setting an executor on a function. The first use case is use to decompositions in C++ without additional fusion passes etc which might not work with custom tensors like batched tensors/vmap. A subsequent use case might be taking advantage of invokees of JIT execution which guard on certain properties before invocation (such as complete shapes in AOT autograd, rank in lazy tensor).

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D34938124

Pulled By: eellison

fbshipit-source-id: cf7a45416457942b872322cab47d871a8336bdb5
(cherry picked from commit 9c600eb9ad0f2173f003e511268e97584edae36d)
2022-03-29 18:38:52 +00:00
Elias Ellison
6694fdaccd Clean up profiling mode and profiling executor strategy (#73875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73875

Previously we had a few settings:
- getExecutor - which toggled between Profiling Executor and Legacy
- getGraphOptimize - if true, overrides PE/Legacy to run with simple executor (no optimizations)
and then...
- getProfilingMode - which would set PE to 0 specializtions.

The last mode is redundant with getGraphOptimize, we should just remove it and use getGraphOptimize in these cases. It would lead to potentially invalid combinations of logic - what does mean if getProfilingMode is true but getExecutor is set to false ? This would lead to a bug in specialize_autograd_zero in this case, see: https://github.com/pytorch/pytorch/blob/master/torch%2Fcsrc%2Fjit%2Fpasses%2Fspecialize_autogradzero.cpp#L93.

The tests here are failing but get fixed with the PR above it, so i'll squash for landing.

Test Plan: Imported from OSS

Reviewed By: cpuhrsch

Differential Revision: D34938130

Pulled By: eellison

fbshipit-source-id: 1a9c0ae7f6d1cfddc2ed3499a5af611053ae5e1b
(cherry picked from commit cf69ce3d155ba7d334022c42fb2cee54bb088c23)
2022-03-29 18:38:51 +00:00
Kurt Mohler
5375b2e994 Resolve int[]? arguments to new OptionalIntArrayRef class
This PR uses the `OptionalArrayRef` template class that was drafted in #64084.

Fixes #44409
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70864
Approved by: https://github.com/ezyang
2022-03-26 01:45:50 +00:00
Pavithran Ramachandran
fc2cf3d26f Back out "Revert D34805092: Extend _save_for_mobile and _load_for_mobile to support flatbuffer format; Default format is pickle + Change buck targets to support only pickle and pickle + flatbuffer for migration" (#74594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74594

Extending `_save_for_mobile` and `_load_for_mobile` to support faltbuffer format with additional optional argument which is set to pick pickle by default.

Adding new binary target with suffix `_pickle_and_flatbuffer` to help migration.

Size test in D34909502 shows the size has regressed by ~40K but after removing pickle and comparing lite_predictors we have ~120K size measure that we will achieve when deprecating pickle and moving to flatbuffer

**BEFORE:**

```lang=mermaid
graph TD;
    torch_core-->torch_mobile_deserialize;

    torch_mobile_core-->torch_mobile_deserialize;

    jit_module_saving-->torch_core;
    jit_module_saving-->torch_mobile_core;

    torch_mobile_deserialize-->caffe2_serialize;
    torch_mobile_deserialize-->torch_mobile_module;

    caffe2_serialize-->miniz;

    flatbuffer_loader-->mobile_bytecode;
    flatbuffer_serializer-->mobile_bytecode;

    mobile_bytecode-->flatbuffer_2.0;

    flatbuffer_loader-->torch_mobile_module;
    flatbuffer_serializer-->torch_mobile_module;
```

**AFTER:**
```lang=mermaid
graph TD;
    torch_core-->torch_mobile_deserialize;

    torch_mobile_core-->torch_mobile_deserialize;

    jit_module_saving-->torch_core;
    jit_module_saving-->torch_mobile_core;

    torch_mobile_deserialize-->caffe2_serialize;
    torch_mobile_deserialize-->torch_mobile_module;

    caffe2_serialize-->miniz;

    flatbuffer_loader-->mobile_bytecode;
    flatbuffer_serializer-->mobile_bytecode;

    mobile_bytecode-->flatbuffer_2.0;

    torch_mobile_deserialize_pickle_and_flatbuffer-->|new| flatbuffer_loader;
    torch_mobile_deserialize_pickle_and_flatbuffer-->|new| torch_mobile_deserialize;
    torch_mobile_core_pickle_and_flatbuffer-->|new| torch_mobile_deserialize_pickle_and_flatbuffer;
    torch_core_pickle_and_flatbuffer-->|new| torch_mobile_deserialize_pickle_and_flatbuffer;

    jit_module_saving_pickle_and_flatbuffer-->|new| torch_core_pickle_and_flatbuffer;
    jit_module_saving_pickle_and_flatbuffer-->|new| torch_mobile_core_pickle_and_flatbuffer;

    flatbuffer_serializer-->torch_mobile_module;

    jit_module_saving_pickle_and_flatbuffer-->|new|jit_module_saving;
    jit_module_saving_pickle_and_flatbuffer-->|new|flatbuffer_serializer;

    flatbuffer_loader-->torch_mobile_module;
```

Original commit changeset: 780dfb6fd6ba

Original Phabricator Diff: D34805092 (284b2b7135)
ghstack-source-id: 152044801

(Note: this ignores all push blocking failures!)

Test Plan:
CI

```
~/fbsource/fbcode] cd ~/fbsource/fbcode/ && buck test -c fbcode.caffe2_enable_flatbuffer=1 //caffe2/test/cpp/jit:jit  -- FlatbufferTest.ExtraFiles
Parsing buck files: finished in 0.9 sec
Building: finished in 5.3 sec (100%) 12992/54304 jobs, 0/54304 updated
  Total time: 6.2 sec
More details at https://www.internalfb.com/intern/buck/build/2b387fff-f813-4cfa-b53f-eb2378630d4e
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d
Trace available for this run at /tmp/tpx-20220323-134108.766518-f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d/trace.log
RemoteExecution session id: reSessionID-f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4503599723101693
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 486 tests discovered (19.122)
    ✓ Pass: caffe2/test/cpp/jit:jit - FlatbufferTest.ExtraFiles (0.187)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4503599723101693
```

Similar Build Deps Dags

```
[pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops_pickle_and_flatbuffer, //xplat/caffe2:torch_mobile_deserialize_pickle_and_flatbuffer)' --output-format dot-compact  | pastry
P486770901: https://www.internalfb.com/intern/paste/P486770901/

[pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops, //xplat/caffe2:torch_mobile_deserialize)' --output-format dot-compact  | pastry
P486771278: https://www.internalfb.com/intern/paste/P486771278/
```

pickle_and_flatbuffer: https://www.internalfb.com/intern/dgw/graph/?build_id=P486770901
pickle: https://www.internalfb.com/intern/dgw/graph/?build_id=P486771278

Reviewed By: iseeyuan

Differential Revision: D35067157

fbshipit-source-id: 9044259c17a2e0da79bd6aedb28efbdfd57e23e0
(cherry picked from commit f738069ec3a72e79da56172741d027de514e9e5f)
2022-03-24 21:51:05 +00:00
Will Constable
3547f20872 Land remaining parts of Torchscript Lazy Tensor backend (#74111)
Summary:
Also enables bazel build to run lazy codegen.  Bazel (oss) build feeds off the same filelists as cmake/buck (build_variables.bzl), so enabling it is easier than keeping it disabled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74111

Test Plan: Run CI and verify test_lazy_ops is running via OSS cmake builds

Reviewed By: bdhirsh

Differential Revision: D34772403

fbshipit-source-id: 8a63f58b9536e6ac1be530667932176ef2549496
(cherry picked from commit e807ffb1918853d10b924fdc24f85ee5b1a39021)
2022-03-22 23:14:03 +00:00
Nikita Shulga
c53b3ed20f Revert D34805092: Extend _save_for_mobile and _load_for_mobile to support flatbuffer format; Default format is pickle + Change buck targets to support only pickle and pickle + flatbuffer for migration
Test Plan: revert-hammer

Differential Revision:
D34805092 (284b2b7135)

Original commit changeset: 57f3fc81d68f

Original Phabricator Diff: D34805092 (284b2b7135)

fbshipit-source-id: 780dfb6fd6ba5f9348f24a2fb3c57971b7155541
(cherry picked from commit bebeb8b84e11c34cbde4857d0e1c291731a7c781)
2022-03-22 22:45:50 +00:00
Pavithran Ramachandran
284b2b7135 Extend _save_for_mobile and _load_for_mobile to support flatbuffer format; Default format is pickle + Change buck targets to support only pickle and pickle + flatbuffer for migration (#74209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74209

Extending `_save_for_mobile` and `_load_for_mobile` to support faltbuffer format with additional optional argument which is set to pick pickle by default.

Adding new binary target with suffix `_pickle_and_flatbuffer` to help migration.

Size test in D34909502 shows the size has regressed by ~40K but after removing pickle and comparing lite_predictors we have ~120K size measure that we will achieve when deprecating pickle and moving to flatbuffer

**BEFORE:**

```lang=mermaid
graph TD;
    torch_core-->torch_mobile_deserialize;

    torch_mobile_core-->torch_mobile_deserialize;

    jit_module_saving-->torch_core;
    jit_module_saving-->torch_mobile_core;

    torch_mobile_deserialize-->caffe2_serialize;
    torch_mobile_deserialize-->torch_mobile_module;

    caffe2_serialize-->miniz;

    flatbuffer_loader-->mobile_bytecode;
    flatbuffer_serializer-->mobile_bytecode;

    mobile_bytecode-->flatbuffer_2.0;

    flatbuffer_loader-->torch_mobile_module;
    flatbuffer_serializer-->torch_mobile_module;
```

**AFTER:**
```lang=mermaid
graph TD;
    torch_core-->torch_mobile_deserialize;

    torch_mobile_core-->torch_mobile_deserialize;

    jit_module_saving-->torch_core;
    jit_module_saving-->torch_mobile_core;

    torch_mobile_deserialize-->caffe2_serialize;
    torch_mobile_deserialize-->torch_mobile_module;

    caffe2_serialize-->miniz;

    flatbuffer_loader-->mobile_bytecode;
    flatbuffer_serializer-->mobile_bytecode;

    mobile_bytecode-->flatbuffer_2.0;

    torch_mobile_deserialize_pickle_and_flatbuffer-->|new| flatbuffer_loader;
    torch_mobile_deserialize_pickle_and_flatbuffer-->|new| torch_mobile_deserialize;
    torch_mobile_core_pickle_and_flatbuffer-->|new| torch_mobile_deserialize_pickle_and_flatbuffer;
    torch_core_pickle_and_flatbuffer-->|new| torch_mobile_deserialize_pickle_and_flatbuffer;

    jit_module_saving_pickle_and_flatbuffer-->|new| torch_core_pickle_and_flatbuffer;
    jit_module_saving_pickle_and_flatbuffer-->|new| torch_mobile_core_pickle_and_flatbuffer;

    flatbuffer_serializer-->torch_mobile_module;

    jit_module_saving_pickle_and_flatbuffer-->|new|jit_module_saving;
    jit_module_saving_pickle_and_flatbuffer-->|new|flatbuffer_serializer;

    flatbuffer_loader-->torch_mobile_module;
```
ghstack-source-id: 151744258

Test Plan:
Similar Build Deps Dags

```
[pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops_pickle_and_flatbuffer, //xplat/caffe2:torch_mobile_deserialize_pickle_and_flatbuffer)' --output-format dot-compact  | pastry
P486770901: https://www.internalfb.com/intern/paste/P486770901/

[pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops, //xplat/caffe2:torch_mobile_deserialize)' --output-format dot-compact  | pastry
P486771278: https://www.internalfb.com/intern/paste/P486771278/
```

pickle_and_flatbuffer: https://www.internalfb.com/intern/dgw/graph/?build_id=P486770901
pickle: https://www.internalfb.com/intern/dgw/graph/?build_id=P486771278

Reviewed By: iseeyuan

Differential Revision: D34805092

fbshipit-source-id: 57f3fc81d68fce941a050c35bd8e6f05951183b3
(cherry picked from commit 671ae4ed29e65b86ffe507a503548d3e86ab0ea4)
2022-03-22 20:00:53 +00:00
Han Qi
4b4f652f79 [3/5] Put JIT source inside flatbuffer (#74245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74245

title

Test Plan: unittest

Reviewed By: iseeyuan

Differential Revision: D34881612

fbshipit-source-id: 7037982e9267ad72b86e91cd5f2d92426d71dd56
(cherry picked from commit 88f34eb55b2bee6ef8ef27188e075fa2b8767fdf)
2022-03-17 18:46:47 +00:00
Will Constable
d67a265881 Sync lazy_tensor_staging to master (#74311)
Summary:
This merges changes that have already been reviewed/landed onto lazy_tensor_staging branch.  It combines changes from multiple PRs into one diff.

updated from lazy_tensor_staging on 3/16

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74311

Test Plan:
Run CI to ensure compilation on various platforms
Run unit tests on lazy_tensor_staging branch with source version of all these diffs

Reviewed By: desertfire

Differential Revision: D34929235

fbshipit-source-id: babbc3bbeabc5b8107ee9284ed7765887a148622
(cherry picked from commit d91577a6557343ec536f6859e4808ec1a8a9b685)
2022-03-17 16:08:57 +00:00
Will Constable
44a8d4d998 Add lazy tensor unit tests, disabled (#74309)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74309

Since the test file is large, it can be landed on its own and then switched on
in the diff that actually builds lazy tensor code.

Test Plan: verify CI passes

Reviewed By: desertfire

Differential Revision: D34928619

fbshipit-source-id: cd556155326f7fb55b3f29031f80bc36c936d565
(cherry picked from commit 60945adbefb6a8d19f89e330f8b344d076b13bfc)
2022-03-17 15:31:26 +00:00
Will Constable
72b1194464 Run lazy tensor codegen in generate_code.py (#73996)
Summary:
Hooks into existing autograd codegen script (generate_code.py) to take advantage of its integrations into buck/cmake/bazel.

Adds a new option (--gen_lazy_ts_backend) to. generate_code.py, calling this from CMake OSS build and fbcode build, but not from other internal xplat/ovrsource builds (these could be opted in later)

Bazel support is added in a later diff.

Includes one generated file (torch/csrc/lazy/generated/LazyIr.h) in a unit test (test/cpp/lazy/test_ir.cpp) to partially verify the generator is working, but does not compile the remaining output sources from the generator yet as they depend on other files not yet landed from lazy_tensor_staging branch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73996

Test Plan: OSS/internal CI - verify all builds are working and test_ir.cpp compiles LazyIr.h

Reviewed By: ezyang

Differential Revision: D34408536

fbshipit-source-id: 8af0aea3b95d81eccafc17d64390d70ddd176515
(cherry picked from commit f930612f2bad61c76eb02d85cfbec9f33a1459dc)
2022-03-17 15:31:26 +00:00
Han Qi
ded82ad7c7 Create method to map JIT module to (source, constant) and back. (#74119)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74119

  implemented function to generate source as ExtraFilesMap and constants

  wrote function to construct jit module given (ivalue, source,
  constant) tripple.

Test Plan: unittest

Reviewed By: pavithranrao

Differential Revision: D34803945

fbshipit-source-id: 2edc798407fe68294cb4c3c7516f5bd143df88c3
(cherry picked from commit 35e54e166b8f0f5cfe8f08c07866b59ae61ee79d)
2022-03-15 18:30:08 +00:00
Taylor Robie
0b1f3bd158 [Profiler] Prefer TSC to wall clock when available (#73855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73855

Calling the clock is one of the most expensive parts of profiling. We can reduce the profiling overhead by using `rdtsc` instead. The tradeoff is that we have to measure and convert. (shift and scale)

Test Plan: I added a cpp unit test with *very* aggressive anti-flake measures. I also ran the overhead benchmark (9 replicates) with `--stressTestKineto` (0.94 -> 0.89 us) and `--stressTestKineto --kinetoProfileMemory` (1.27 -> 1.17 us)

Reviewed By: chaekit

Differential Revision: D34231071

fbshipit-source-id: e3b3dd7580d93bcc783e87c7f2fc726cb74f4df8
(cherry picked from commit e8be9f8160793c6ee35d5af02bca3e01703e377d)
2022-03-13 18:29:06 +00:00
Taylor Robie
5a58820f01 [Profiler] Specialized AppendOnlyQueue (#73409)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73409

We can do better than `vector` or `deque`, and it's sufficiently important to the hot path to justify a custom container. (This is part of the larger queue refactor, but this is a standalone drop-in replacement so we don't need to wait.)

Test Plan: It's a pretty simple container type, so I just added a few cpp tests for emplace and read back. I also ran the overhead benchmark (replicates=9) with both `--stressTestKineto` (0.99 -> 0.94 us) and `--stressTestKineto --kinetoProfileMemory` (1.36 -> 1.27 us).

Reviewed By: swolchok

Differential Revision: D34231072

fbshipit-source-id: ed57299729d444d59cf843a0d38a3ee2240eeec1
(cherry picked from commit 43907948f3a8d2137244e7bb59f43999bd660917)
2022-03-11 19:47:40 +00:00
David Dang
abfaef0aec [Quant][core] Merged conv packed params and linear packed params (#73486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73486

conv and linear packed params were previously defined in ATen/native/quantized/cpu/conv_packed_params.h> and ATen/native/quantized/cpu/packed_params.h>. These two files have been merged into one and has been relocated to ATen/native/quantized/cpu/packed_params.h>.

Differential Revision:
D34513286
D34513286

Test Plan: Imported from OSS

Reviewed By: dagitses

Pulled By: dzdang

fbshipit-source-id: 813845af7ea9449e316ab7822efe7460f0bd0d88
(cherry picked from commit 2f627561f27f81977ff73b8863c5e9e719dc4c60)
2022-03-11 15:18:45 +00:00
Ivan Kobzarev
519e226b66 [tensorexp] ExternalCall2 without memcpy (#72225)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72225

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D33960933

Pulled By: IvanKobzarev

fbshipit-source-id: fc73a3de9e5150919e3806516065b4a6c8316000
(cherry picked from commit f637842c341e0ba94906a0c8a1efc81691dc512c)
2022-03-09 21:19:26 +00:00
Han Qi
0723639b60 Revert D34455360: Multisect successfully blamed D34455360 for test failures
Summary:
This diff is reverting D34455360 (61d6c43864)
D34455360 (61d6c43864) is making the following tests to fail and this revert diff is either the revert of the blame diff or the revert of the stack of diffs that need to be reverted to revert the blame diff

Tests affected:
- https://www.internalfb.com/intern/test/562950004334605/

Multisect link:
https://www.internalfb.com/intern/testinfra/multisect/756170

Test Plan: NA

Reviewed By: zhxchen17

Differential Revision: D34596156

fbshipit-source-id: a465bca0094db3caf6130c80f1ed49eea981359b
(cherry picked from commit ef5e5578c64ce9827570757fb016aafa9c782c6a)
2022-03-08 23:18:54 +00:00
Elias Ellison
52ccbf4494 Lock thread/block computation (#73800)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73800

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D34647281

Pulled By: eellison

fbshipit-source-id: adbdaf24191c4c1b85e0b62564388f2481002ed2
(cherry picked from commit 6cf38015cc14691518b1b5cb7d636e80eb3684fc)
2022-03-04 22:32:08 +00:00
Dave Bort
7b51629c53 [PyTorchEdge] Add getFileFormat() so we can differentiate Zip/Pickle from Flatbuffer (#73707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73707

Add a helper function to detect the file format from the first bytes of a data file or stream. This will be necessary during the migration from Pickle-serialized modules to Flatbuffer-serialized modules.
ghstack-source-id: 150384317

Test Plan:
Existing tests for ZIP+Pickle continue to pass.

New unit tests pass:
```
cd xplat && buck test //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_interpreter

Building: finished in 26.6 sec (100%) 3180/3180 jobs, 571/3180 updated
  Total time: 32.2 sec
Testing: finished in 07:08.3 min (89 PASS/0 FAIL)
BUILD SUCCEEDED
RESULTS FOR //xplat/caffe2:test_lite_interpreter //xplat/caffe2:test_lite_trainer
PASS    421.1s 81 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_interpreter
PASS     103ms  8 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_trainer
TESTS PASSED
```

Reviewed By: iseeyuan

Differential Revision: D34527859

fbshipit-source-id: ff2d1eabc2f8be1de2e44709c878e2d1a373f0df
(cherry picked from commit 5c394848346ab9e374c9e7eed479ad70ed09a7ae)
2022-03-04 19:35:41 +00:00
Han Qi
61d6c43864 Make debug_pkl smaller by only emitting unique traces. (#73368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73368

debug_pkl file inside of pytorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source which is a stack track, filename, and start, end numbers. Those are emitted in debug_pkl file as strings.
Since many SourceRange shares the same source, the string for trace can be deduped.
The newer format saves a set of unique traces in a tuple, then each SourceRange will save the offset of it's trace w.r.t. position in that tuple. (i.e. manually applying dictionary compression).
The above helps with smaller file size. On loading, if we copy each trace to Source as string the runtime memory would still blowup.
To mitigate this, we use SourceView directly instead of source which will take the reference of string inside of Deserializer and make that into string_view. This is safe because Deserializer is hold by Unpickler by shared_ptr, and Unpickler is also hold by shared_ptr by another Source object. That Source object will be alive during the model construction.

Test Plan:
unit test

Took original file (312271638_930.predictor.disagg.local); loaded with `torch.jit.load` save again with `torch.jit.save`. Unzip both, look at contents:
```
[qihan@devvm5585.vll0 ~]$ du archive -h
4.0K    archive/xl_model_weights
3.7M    archive/extra
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform
8.0K    archive/code/__torch__/caffe2/torch/fb
8.0K    archive/code/__torch__/caffe2/torch
8.0K    archive/code/__torch__/caffe2
20M     archive/code/__torch__/torch/fx/graph_module
20M     archive/code/__torch__/torch/fx
8.0K    archive/code/__torch__/torch/classes
20M     archive/code/__torch__/torch
20M     archive/code/__torch__
20M     archive/code
2.7M    archive/constants
35M     archive
[qihan@devvm5585.vll0 ~]$ du resaved -h
4.0K    resaved/extra
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform
8.0K    resaved/code/__torch__/caffe2/torch/fb
8.0K    resaved/code/__torch__/caffe2/torch
8.0K    resaved/code/__torch__/caffe2
1.3M    resaved/code/__torch__/torch/fx/graph_module
1.3M    resaved/code/__torch__/torch/fx
8.0K    resaved/code/__torch__/torch/classes
1.4M    resaved/code/__torch__/torch
1.4M    resaved/code/__torch__
1.4M    resaved/code
2.7M    resaved/constants
13M     resaved
[qihan@devvm5585.vll0 ~]$
```

Reviewed By: gmagogsfm

Differential Revision: D34455360

fbshipit-source-id: 8cc716f9bba7183746b1b4ecc33a2de34ac503b9
(cherry picked from commit f1a04730fc9ac8fdab6c8e4c44cb5529e42090e4)
2022-03-02 08:37:08 +00:00
Mengwei Liu
9ce9803abe [PyTorch] Add codegen unboxing ability (#69881)
Summary:
RFC: https://github.com/pytorch/rfcs/pull/40

This PR (re)introduces python codegen for unboxing wrappers. Given an entry of `native_functions.yaml` the codegen should be able to generate the corresponding C++ code to convert ivalues from the stack to their proper types. To trigger the codegen, run
```
tools/jit/gen_unboxing.py -d cg/torch/share/ATen
```

Merged changes on CI test. In https://github.com/pytorch/pytorch/issues/71782 I added an e2e test for static dispatch + codegen unboxing. The test exports a mobile model of mobilenetv2, load and run it on a new binary for lite interpreter: `test/mobile/custom_build/lite_predictor.cpp`.

## Lite predictor build specifics

1. Codegen: `gen.py` generates `RegisterCPU.cpp` and `RegisterSchema.cpp`. Now with this PR, once `static_dispatch` mode is enabled, `gen.py` will not generate `TORCH_LIBRARY` API calls in those cpp files, hence avoids interaction with the dispatcher. Once `USE_LIGHTWEIGHT_DISPATCH` is turned on, `cmake/Codegen.cmake` calls `gen_unboxing.py` which generates `UnboxingFunctions.h`, `UnboxingFunctions_[0-4].cpp` and `RegisterCodegenUnboxedKernels_[0-4].cpp`.
2. Build: `USE_LIGHTWEIGHT_DISPATCH` adds generated sources into `all_cpu_cpp` in `aten/src/ATen/CMakeLists.txt`. All other files remain unchanged. In reality all the `Operators_[0-4].cpp` are not necessary but we can rely on linker to strip them off.

## Current CI job test coverage update

Created a new CI job `linux-xenial-py3-clang5-mobile-lightweight-dispatch-build` that enables the following build options:
* `USE_LIGHTWEIGHT_DISPATCH=1`
* `BUILD_LITE_INTERPRETER=1`
* `STATIC_DISPATCH_BACKEND=CPU`

This job triggers `test/mobile/lightweight_dispatch/build.sh` and builds `libtorch`. Then the script runs C++ tests written in `test_lightweight_dispatch.cpp` and `test_codegen_unboxing.cpp`. Recent commits added tests to cover as many C++ argument type as possible: in `build.sh` we installed PyTorch Python API so that we can export test models in `tests_setup.py`. Then we run C++ test binary to run these models on lightweight dispatch enabled runtime.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69881

Reviewed By: iseeyuan

Differential Revision: D33692299

Pulled By: larryliu0820

fbshipit-source-id: 211e59f2364100703359b4a3d2ab48ca5155a023
(cherry picked from commit 58e1c9a25e3d1b5b656282cf3ac2f548d98d530b)
2022-03-01 23:28:13 +00:00
Elias Ellison
d3d74e9040 Allow custom registration of shape functions (#73270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73270

Together with open registration of NNC lowerings this should make possible to add support for custom operators, including internal fb-ops

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D34451275

Pulled By: eellison

fbshipit-source-id: ae8ae2deb93caa6770e738217461e65853897b55
(cherry picked from commit ea6b7e8a6d8f970a20e68d02eefc5c951e32aa07)
2022-02-28 17:44:45 +00:00
Pavithran Ramachandran
62eb7d64cf [PyTorchEdge] Extend flatbuffer to support extra files map (#72951)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72951

Extend flatbuffer to support extra files map

Flatbuffer schema has extra files. The users can write extra files by providing a `map<string, string>` which will be part of the flatbuffer model asset and and can be loaded back similar to pickle.
ghstack-source-id: 149622799

Test Plan:
fb:

```[pavithran@devvm5216.vll0 ~/fbsource/fbcode] cd ~/fbsource/fbcode/ && buck test caffe2/test/cpp/jit:jit -- FlatbufferTest.ExtraFiles
Parsing buck files: finished in 0.7 sec
Downloaded 0/8 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 20.0 sec (100%) 22343/22343 jobs, 4/22343 updated
  Total time: 20.7 sec
More details at https://www.internalfb.com/intern/buck/build/7dba5034-d623-4a1e-afa1-b0e809df7066
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 9c1ac1e0-a8c0-4a62-95df-8f49695aa7d1
Trace available for this run at /tmp/tpx-20220216-144630.207992/trace.log
RemoteExecution session id: reSessionID-9c1ac1e0-a8c0-4a62-95df-8f49695aa7d1-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/7318349470518809
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 468 tests discovered (17.211)
    ✓ Pass: caffe2/test/cpp/jit:jit - FlatbufferTest.ExtraFiles (0.169)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/7318349470518809````

Reviewed By: iseeyuan

Differential Revision: D34286346

fbshipit-source-id: 4e09ab25b8ed6af6f8923db3aab046c255f13bb8
(cherry picked from commit ce8d88e22a360b25253d8a75f428d523fa88a79a)
2022-02-24 19:39:32 +00:00
Jacob Szwejbka
faacb8ab36 [Pytorch Edge] Lean Runtime Test
Summary:
As far as I can tell theres no CI that actually runs the lean_runtime. This should add it I think. (Is this directory covered by CI?) Next up is to create some test for min_runtime_lib

(Note: this ignores all push blocking failures!)

Test Plan: buck test :lean_runtime_delegate_flatbuffer_test

Reviewed By: iseeyuan

Differential Revision: D34255148

fbshipit-source-id: b44693220e93869edd984bbcd17d33db4007a4ea
(cherry picked from commit 0a4a6b5bd2b4a1f8cce8bc1c4a22dad9539631c1)
2022-02-24 18:40:47 +00:00
Alban Desmaison
3bd1507ff2 Revert D33994011: Make debug_pkl smaller by only emitting unique traces.
Test Plan: revert-hammer

Differential Revision:
D33994011 (3d37f5b052)

Original commit changeset: 8e6224c6e942

Original Phabricator Diff: D33994011 (3d37f5b052)

fbshipit-source-id: 885e739efa1081382e1fcf9c6cccba92c57e9f7a
(cherry picked from commit a6d98c85a736c2eb321a6f38005dd0f5dc43eb87)
2022-02-24 16:38:55 +00:00
Han Qi
3d37f5b052 Make debug_pkl smaller by only emitting unique traces. (#72596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72596

debug_pkl file inside of pytorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source which is a stack track, filename, and start, end numbers. Those are emitted in debug_pkl file as strings.

Since many SourceRange shares the same source, the string for trace can be deduped.

The newer format saves a set of unique traces in a tuple, then each SourceRange will save the offset of it's trace w.r.t. position in that tuple. (i.e. manually applying dictionary compression).

The above helps with smaller file size. On loading, if we copy each trace to Source as string the runtime memory would still blowup.
To mitigate this, we use SourceView directly instead of source which will take the reference of string inside of Deserializer and make that into string_view. This is safe because Deserializer is hold by Unpickler by shared_ptr, and Unpickler is also hold by shared_ptr by another Source object. That Source object will be alive during the model construction.

Test Plan:
unit test

Took original file (312271638_930.predictor.disagg.local); loaded with `torch.jit.load` save again with `torch.jit.save`. Unzip both, look at contents:
```
[qihan@devvm5585.vll0 ~]$ du archive -h
4.0K    archive/xl_model_weights
3.7M    archive/extra
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform
8.0K    archive/code/__torch__/caffe2/torch/fb
8.0K    archive/code/__torch__/caffe2/torch
8.0K    archive/code/__torch__/caffe2
20M     archive/code/__torch__/torch/fx/graph_module
20M     archive/code/__torch__/torch/fx
8.0K    archive/code/__torch__/torch/classes
20M     archive/code/__torch__/torch
20M     archive/code/__torch__
20M     archive/code
2.7M    archive/constants
35M     archive
[qihan@devvm5585.vll0 ~]$ du resaved -h
4.0K    resaved/extra
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform
8.0K    resaved/code/__torch__/caffe2/torch/fb
8.0K    resaved/code/__torch__/caffe2/torch
8.0K    resaved/code/__torch__/caffe2
1.3M    resaved/code/__torch__/torch/fx/graph_module
1.3M    resaved/code/__torch__/torch/fx
8.0K    resaved/code/__torch__/torch/classes
1.4M    resaved/code/__torch__/torch
1.4M    resaved/code/__torch__
1.4M    resaved/code
2.7M    resaved/constants
13M     resaved
[qihan@devvm5585.vll0 ~]$
```

Reviewed By: JasonHanwen

Differential Revision: D33994011

fbshipit-source-id: 8e6224c6e942e91c3403f686c8f0937d1002ed41
(cherry picked from commit a7014dd4029308c95007f362a57c31796d686647)
2022-02-24 09:31:16 +00:00
Hui Guo
5eb5b61221 [tensorexpre] Add typecast when src and dest buf types are different in PlacementAllocate (#71934)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71934

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D33826700

Pulled By: huiguoo

fbshipit-source-id: 9fb29a43ab5983586a6bfde3a34d7e2f2120ab0a
(cherry picked from commit 2bee018691ec888cb1ec761528951f5745d7ef79)
2022-02-23 19:36:50 +00:00
Yedidya Feldblum
7a5b0efc64 [caffe2] fix build failures in optimized builds under clang
Summary:
There are various possible approaches, but the approach chosen minimizes disruption to source control blame.

Addresses:
```
error: Function _ZN23FunctionalTest_Pad_Test8TestBodyEv is too big to optimize [-Werror,-Wignored-optimization-argument]
```

Test Plan: buck2 build mode/opt caffe2/test/cpp/api:functional

Reviewed By: jamesr66a

Differential Revision: D34027291

fbshipit-source-id: 9dfd771ad56d3d4bc0d41b38b04654c8dae7c006
(cherry picked from commit d43b5a7ed6)
2022-02-22 22:31:47 +00:00
Raghavan Raman
0d66748948 [jit] Add tests for JIT with dynamic shape fusion (#72201)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72201

Reviewed By: mikaylagawarecki

Differential Revision: D34067211

Pulled By: navahgar

fbshipit-source-id: 2c13bb43c76c7fed720ad37892d2177c3dc0b924
(cherry picked from commit eed2d8cea4)
2022-02-18 23:29:08 +00:00
Alban Desmaison
0951cb513a Revert D34342689: Revert D34250357: Sync lazy_tensor_staging back to master
Test Plan: revert-hammer

Differential Revision:
D34342689

Original commit changeset: 43f6da6986f7

Original Phabricator Diff: D34250357 (69389fb542)

fbshipit-source-id: 8a3fb74877e719e9b9577b58027b4e7061a04ef0
(cherry picked from commit c749f08e7a)
2022-02-18 17:31:21 +00:00
Alban Desmaison
86a961af87 Revert D34250357: Sync lazy_tensor_staging back to master
Test Plan: revert-hammer

Differential Revision:
D34250357 (69389fb542)

Original commit changeset: aa7d589f6050

Original Phabricator Diff: D34250357 (69389fb542)

fbshipit-source-id: 43f6da6986f7fc5189d641b7803adc5ada27194c
(cherry picked from commit 3c930a5e4e)
2022-02-18 15:47:37 +00:00
Will Constable
69389fb542 Sync lazy_tensor_staging back to master (#72875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72875

This diff contains changes from several PRs landed to lazy_tensor_staging branch.
* generating 'fallback' overrides for each codegenned op, useful for debugging
* supports operators which are missing aten:: symbols for op names, instead using their string counterpart
* makes the IR class a base class instead of hardcoding the assumption of TS

It also resolves lint issues and in particular cleans up the following:
* {Type}s shouldn't be passed into isValueType, and using the catch-all base class of CType is nicer than specifying a list of types.

Fixes #72852

Test Plan: test manually on lazy_tensor_staging branch

Reviewed By: shunting314

Differential Revision: D34250357

fbshipit-source-id: aa7d589f605055d5d02bc77c77fa6f1182ff7497
(cherry picked from commit 2f8f5e4971)
2022-02-18 03:49:46 +00:00
Raghavan Raman
6d33852685 [NNC] TensorExprKernel state should not be modified on calls to run methods (#73028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73028

A typical use case for `TensorExprKernel` is to create the kernel once and call it multiple times, possibly in parallel. For the parallel calls to work, we need to ensure that the run() method calls do not change any state in `TensorExprKernel`.

Before this change, the `run()` method was modifying the sizes and strides vectors when dynamic shapes were present. This manifested as a data race when running a model with Static Runtime.
ghstack-source-id: 149398820

Test Plan:
```
buck build mode/dev-asan //caffe2/test/cpp/tensorexpr:tensorexpr
./buck-out/dev/gen/caffe2/test/cpp/tensorexpr/tensorexpr --gtest_filter="DynamicShapes.MultiThreadedExecution"
```

Reviewed By: eellison

Differential Revision: D34287960

fbshipit-source-id: d311f3c5a66c5d5de4e1deaeaa01816b53e9906e
(cherry picked from commit 161568bfae)
2022-02-17 23:14:27 +00:00
Mike Iovine
d1c5f9e439 [JIT][SR] Introduce prim::IfThenElse (#72587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72587

This pattern frequently appears in a few graphs:

```
%result = prim::If(%condition)
  block0():
    -> (%a)
  block1():
    -> (%b)
```

This is slow, particularly in static runtime. Static runtime creates memory planners/block runners for each sub-block, which eats up a lot of memory and introduces a lot of extra overhead for this relatively simple operation.

This diff introduces a new op that replaces nodes like the above with a single op meant to act like a ternary operator:

```
%result = prim::IfThenElse(%condition, %a, %b)
```

Test Plan: New unit tests

Reviewed By: eellison

Differential Revision: D34091789

fbshipit-source-id: eb6a8c460c39b4c019a1f4ab1f3f1e5b6edc400c
(cherry picked from commit 0f1b335e5b)
2022-02-17 18:22:48 +00:00
Will Constable
889f3f48b2 Revert D34178476: Update lazy_ir.py from lazy_tensor_staging
Test Plan: revert-hammer

Differential Revision:
D34178476 (3842140fd5)

Original commit changeset: 7190b2e0d82b

Original Phabricator Diff: D34178476 (3842140fd5)

fbshipit-source-id: 4c969a355f01244c6f5acc52bc31679f2182aa55
(cherry picked from commit 17082075dd)
2022-02-16 19:34:41 +00:00
Will Constable
3842140fd5 Update lazy_ir.py from lazy_tensor_staging (#72730)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72730

This diff contains changes from several PRs landed to lazy_tensor_staging branch.
- generating 'fallback' overrides for each codegenned op, useful for debugging
- supports operators which are missing aten:: symbols for op names, instead using their string counterpart
- makes the IR class a base class instead of hardcoding the assumption of TS

Test Plan: tested on lazy_tensor_staging branch

Reviewed By: desertfire

Differential Revision: D34178476

fbshipit-source-id: 7190b2e0d82b4eb1f4510c858c24446c6df3f9d0
(cherry picked from commit 6713d3f0ef)
2022-02-16 18:33:31 +00:00
Shunting Zhang
763ad1bf25 (2/2) Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions: frontend change (#72899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72899

Reland D33282878 (911d527b87). This is the frontend change.
ghstack-source-id: 149204031

Test Plan: Refer to D33282878 (911d527b87). Also check CI

Reviewed By: gmagogsfm

Differential Revision: D34252127

fbshipit-source-id: 27b17ddd4d05d904eb91fd9ee094d9121f00e388
(cherry picked from commit 1d276baca3)
2022-02-16 03:45:15 +00:00
Ivan Kobzarev
67cd98fad4 [tensorexpr] Fix isNLC segfault (#72786)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72786

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D34204523

Pulled By: IvanKobzarev

fbshipit-source-id: 9a0f2ce0a1921e261932029c3ebd842330fdf528
(cherry picked from commit b8326064f6)
2022-02-15 20:31:56 +00:00
Michael Suo
7db4a48d92 Revert D33342569: (2/2) Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions: frontend change
Test Plan: revert-hammer

Differential Revision:
D33342569 (856157fcee)

Original commit changeset: 57984ac67ae2

Original Phabricator Diff: D33342569 (856157fcee)

fbshipit-source-id: 4c12235a1776a3652e7f91e93b626705759d5176
(cherry picked from commit 4cbd7d8bab)
2022-02-15 18:45:44 +00:00
Shunting Zhang
856157fcee (2/2) Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions: frontend change (#70471)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70471

Reland D33282878 (911d527b87). This is the frontend change.
ghstack-source-id: 149114933

Test Plan: Refer to D33282878 (911d527b87). Also check CI

Reviewed By: gmagogsfm

Differential Revision: D33342569

fbshipit-source-id: 57984ac67ae2c56c38f72d3b1fb69105901fb472
(cherry picked from commit b47cc935ee)
2022-02-15 07:21:19 +00:00
Pavithran Ramachandran
a482aeb0ce [PyTorchEdge] backport v8 to v7 to support promoted ops as instruction (#71662)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71662

backport v8 to v7 to support promoted ops as instruction

a flag to help export as instruction from v8 and export as operators for v7 and below

Test Plan:
```
buck test caffe2/test/cpp/jit:jit -- LiteInterpreterTest.BackPortByteCodeModelAllVersions

Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/5629499620570927
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 461 tests discovered (15.693)
    ✓ Pass: caffe2/test/cpp/jit:jit - LiteInterpreterTest.BackPortByteCodeModelAllVersions (2.712)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/5629499620570927
```

```
buck run mode/opt //caffe2/torch/fb/mobile/upgrader_codegen:upgrader_codegen

buck test mode/opt //caffe2/test:upgrader_codegen -- mobile.test_upgrader_codegen.TestLiteScriptModule
Parsing buck files: finished in 0.8 sec
Downloaded 0/2 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 01:39.4 min (100%) 11031/11031 jobs, 2/11031 updated
  Total time: 01:40.2 min
More details at https://www.internalfb.com/intern/buck/build/a8b0e417-019c-44ba-be6b-23379411a965
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 44fbfa66-cce8-4277-82ac-f89d79558581
Trace available for this run at /tmp/tpx-20220202-160956.915412/trace.log
RemoteExecution session id: reSessionID-44fbfa66-cce8-4277-82ac-f89d79558581-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/281475200877601
    ✓ ListingSuccess: caffe2/test:upgrader_codegen : 1 tests discovered (1.249)
    ✓ Pass: caffe2/test:upgrader_codegen - test_generate_bytecode (mobile.test_upgrader_codegen.TestLiteScriptModule) (1.365)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/281475200877601
```

Reviewed By: iseeyuan

Differential Revision: D33719098

fbshipit-source-id: e2d2b23d298f98e4d4fcdfc344f7b8c6f92cff26
(cherry picked from commit 81b956c23a)
2022-02-15 03:47:39 +00:00
jiej
2d110d514f Nvfuser code bump 2_1_2022 (#72127)
Summary:
Things changed in this PR that requires review:
1. aten/src/ATen/core/interned_strings.h
2. torch/csrc/jit/ir/alias_analysis.h : exposing createValue to allow efficient mutation
3. torch/csrc/jit/runtime/symbolic_shape_registry.cpp : added gelu/tanh/erf in registry
4. torch/jit/_script.py : throws scripting model sees autocast as decorator since it's not supported

nvfuser code update:
1. codegen improvements and performance tuning
2. integration bug fixes for shape expression logic
3. kernel segmentation update to address perf regression from horizontal fusion
4. scalar cpu tensor promotion to support inter-device operation between cpu scalar tensor and cuda tensor

Things reverted from local changes:
aten::gelu with approximation (tracked in PR: https://github.com/pytorch/pytorch/pull/61439)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72127

Reviewed By: HamidShojanazeri

Differential Revision: D34113233

Pulled By: jbschlosser

fbshipit-source-id: b82cde32b71e324eca0ea57cb8c9f9647278ca74
(cherry picked from commit e009bc5c4e)
2022-02-15 00:43:16 +00:00
Jacob Szwejbka
52c516ecb8 [Pytorch Edge] Minor improve documentation in test_backend_with_compiler
Summary:
Went through all these files and the design doc to understand the to_backend api. Figured I could add some comments to these files to make the apis a little clearer for those that come after.

(Note: this ignores all push blocking failures!)

Test Plan: na

Reviewed By: raziel, larryliu0820

Differential Revision: D34221989

fbshipit-source-id: 699fcbd8714bfb6b58c6c0bf0e5fbc019d2ef6f8
(cherry picked from commit 0b3f5d73e8)
2022-02-14 23:44:46 +00:00
Ryan Spring
4f8b986e28 Implement Tanh Gelu Approximation (#61439)
Summary:
1. Implements https://github.com/pytorch/pytorch/issues/39853
2. Adds approximate boolean flag to Gelu
3. Enables Tanh Gelu approximation
4. Adds double backward support for Gelu
5. Enable Tanh Gelu in NvFuser

```
def gelu(x, approximate : str = 'none'):
    if approximate == 'tanh':
        # sqrt(2/pi) = 0.7978845608028654
        return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0))))
    else:
        return x * normcdf(x)
```

Linking XLA PR - https://github.com/pytorch/xla/pull/3039

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439

Reviewed By: VitalyFedyunin

Differential Revision: D33894937

Pulled By: jbschlosser

fbshipit-source-id: b65e8fb6ea66168af8f34f45ed50e92737a33851
(cherry picked from commit 6e986f91a9)
2022-02-14 03:40:32 +00:00
Mikhail Zolotukhin
1855b14922 [TensorExpr] Delet DimArg class. (#72390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72390

This class didn't add much value and only caused more boilerplate code.
This change removes the class and updates all the use cases with
uses of `ExprHandle`.

A side effect of this change is different names in loop variables, which
caused massive mechanical changes in our tests.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D34030296

Pulled By: ZolotukhinM

fbshipit-source-id: 2ba4e313506a43ab129a10d99e72b638b7d40108
(cherry picked from commit c2ec46a058)
2022-02-11 01:21:59 +00:00
Mikhail Zolotukhin
9123e9b3b5 [TensorExpr] Switch from ExprPtr to ExprHandle in Compute impl. (#72389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72389

This is an NFC change that just prepares the code for the upcoming
deletion of `DimArg` class. This change makes `Compute` and `Reduce`
APIs to use `ExprHandle` everywhere.

There should be no observable behavior change from this PR.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D34030295

Pulled By: ZolotukhinM

fbshipit-source-id: 3fd035b6a6bd0a07ccfa92e118819478ae85412a
(cherry picked from commit 1b0a4b6fac)
2022-02-11 01:21:59 +00:00
David Berard
c314750401 [JIT] enable profiling optional tensors (#70532)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70532

This adds profiling to Optional[Tensor] types

First, in profiling_record.cpp, profiling nodes are added to Optional[Tensor] inputs. The nodes record
(a) whether or not any `None` types are encountered, and
(b) of the Tensor types, what's the most specific type matching all of non-null tensors that were encoutered (shape, dtype, etc.)

In tensorexpr_fuser, when specializing types based on the profiled information, an Optional[Tensor] type will always be Optional[], but the Tensor type contained in the optional type can be specialized (e.g. `Optional[Float(2x2x2, cpu, etc)]`)

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33714748

Pulled By: davidberard98

fbshipit-source-id: 93c819054450de7ac84b112de1012c0c12e34120
(cherry picked from commit 21cfd80123)
2022-02-08 22:52:26 +00:00
Raghavan Raman
765908708b [nnc] Adding a test with dynamic shapes from a model (#72198)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72198

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D33951741

Pulled By: navahgar

fbshipit-source-id: 596b193eba14c8e1affa9fa13070079f05d64cac
(cherry picked from commit ddbb78ff80)
2022-02-08 02:00:46 +00:00
Raghavan Raman
ff71429906 [nnc] Add stride args while running with allocated outputs (#72223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72223

ghstack-source-id: 148494871

Test Plan:
```
buck test mode/opt //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - DynamicShapes.GraphWithSymbolicStrides'
```

Reviewed By: eellison

Differential Revision: D33960592

fbshipit-source-id: 6334978d5e3713889b4ad12bcd8ed8c69df39d58
(cherry picked from commit 95cc102bc2)
2022-02-07 19:24:56 +00:00
Han Qi
57f039b41f Fixing few bugs in torch flatbuffer (#72349)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72349

1. Interface call'd methods need to be registered to class. Previously all interface calls are inlined so  there was no such problem.
2. parseDoubleList and parseBoolList got reversed when refactoring.

Test Plan:
1. Get ASR's test model at
```
mkdir ~/asr1 && cd ~/asr1
fbpkg fetch speech.tuna.milan.ondevice.en_us
```
2. Convert model:
```
cd ~/fbsource
buck run //xplat/caffe2/fb/lite_predictor:convert_model -- --model=$HOME/asr1/pytorchmodel.pt --output_name=$HOME/asr1/pytorchmodel.ff
```
3. Ran lite_predictor_flatbuffer
```
 buck run //xplat/caffe2/fb/lite_predictor:lite_predictor_flatbuffer -- --model=$HOME/asr1/pytorchmodel.ff --method_to_call=encode_src --method_to_generate_input=get_all_bundled_inputs_for_encode_src

```

See perf metric generated (means loading and inference succeeded).

Reviewed By: gmagogsfm, zhxchen17

Differential Revision: D33959746

fbshipit-source-id: 24671e1189438119f477032eb6c29bd7736e74ca
(cherry picked from commit 5e18809350)
2022-02-05 00:25:27 +00:00
Raghavan Raman
38f696c0cd [nnc] Add a API to unroll loops by a given factor (#72071)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72071

Reviewed By: ngimel

Differential Revision: D33946250

Pulled By: navahgar

fbshipit-source-id: 3f3f92054174620025a9d71154d006f1738953e2
(cherry picked from commit d8b53598e9)
2022-02-03 18:41:21 +00:00
kshitij12345
02f6226bff [fix] Dropout2d-3d no-batch-dim (#69885)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/69801

TODO:
* [x] Update C++ API

cc albanD mruberry jbschlosser walterddr kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69885

Reviewed By: mruberry

Differential Revision: D33175470

Pulled By: jbschlosser

fbshipit-source-id: c9d7d9e0f59ba290a0157725c338a345f3d58b9f
(cherry picked from commit 7e4271a156)
2022-02-02 16:40:32 +00:00
CodemodService FBSourceClangFormatLinterBot
ed435e903f [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33938055

fbshipit-source-id: 6c0643a18f09854e87e183341f252c66dd6395a6
(cherry picked from commit fd183aedbc)
2022-02-02 11:27:15 +00:00
Ivan Kobzarev
34e4418dfa [nnc] tensorexpr for quantized/aten::upsample_nearest2d (#71236)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71236

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D33553305

Pulled By: IvanKobzarev

fbshipit-source-id: 2442afee6d23123bb3a4bc52d3555393b0254106
(cherry picked from commit 90a263fc08)
2022-02-01 19:48:53 +00:00
Elias Ellison
cf1833df70 [WIP] add explicit dynamic fusion arg (#71173)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71173

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D33536222

Pulled By: eellison

fbshipit-source-id: a097408ecdd6e284432de128feb297993d882d52
(cherry picked from commit 0e3419b2d3)
2022-02-01 19:07:02 +00:00
Nikita Shulga
74c44ba9d6 Revert D33850228: [pytorch][PR] Implement Tanh Gelu Approximation
Test Plan: revert-hammer

Differential Revision:
D33850228 (23d03025dc)

Original commit changeset: 3cc33fb298e4

Original Phabricator Diff: D33850228 (23d03025dc)

fbshipit-source-id: 9436e7df73c2b2e2011f321674f24973316d3692
(cherry picked from commit c9efb58223)
2022-01-31 17:44:19 +00:00
Ryan Spring
23d03025dc Implement Tanh Gelu Approximation (#61439)
Summary:
1. Implements https://github.com/pytorch/pytorch/issues/39853
2. Adds approximate boolean flag to Gelu
3. Enables Tanh Gelu approximation
4. Adds double backward support for Gelu
5. Enable Tanh Gelu in NvFuser

```
def gelu(x, approximate : str = 'none'):
    if approximate == 'tanh':
        # sqrt(2/pi) = 0.7978845608028654
        return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0))))
    else:
        return x * normcdf(x)
```

Linking XLA PR - https://github.com/pytorch/xla/pull/3039

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439

Reviewed By: cpuhrsch

Differential Revision: D33850228

Pulled By: jbschlosser

fbshipit-source-id: 3cc33fb298e480d7ecc5c67716da019d60c6ab33
(cherry picked from commit 3a53b3e94f)
2022-01-31 17:07:45 +00:00
Tristan Rice
6208c2800e torch/monitor: merge Interval and FixedCount stats (#72009)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72009

This simplifies the Stats interface by merging IntervalStat and FixedCountStat into a single Stat w/ a specific window size duration and an optional max samples per window. This allows for the original intention of having comparably sized windows (for statistical purposes) while also having a consistent output bandwidth.

Test Plan:
```
buck test //caffe2/test:monitor //caffe2/test/cpp/monitor:monitor
```

Reviewed By: kiukchung

Differential Revision: D33822956

fbshipit-source-id: a74782492421be613a1a8b14341b6fb2e8eeb8b4
(cherry picked from commit 293b94e0b4)
2022-01-30 23:21:59 +00:00
David Berard
99bc978b78 [JIT] Propagate requires_grad to autodiff subgraphs (#71666)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71666

When JIT autodiff is constructing a gradient computation graph, it will only add gradients for tensors that require_grad. Previously, require_grad information was **not** propagated to the subgraph that autodiff used; as a result, autodiff would calculate *all* gradients, even if requires_grad had never been set during profiling runs. In certain cases, this can lead to performance issues. For example, during training, the gradient of the input data is not needed, but is still computed.

This propagates requires_grad to the subgraph passed into autodiff, so that autodiff will not compute unnecessary gradients.

Test: `./bin/test_jit --gtest_filter="AutodiffRemoveUnusedGradientsTest.Linear"`

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D33725304

Pulled By: davidberard98

fbshipit-source-id: ca7ab4c9a6a26f94f93aff2d5a4135e125323ba1
(cherry picked from commit a97fe0556d)
2022-01-28 18:57:36 +00:00
Joel Schlosser
cb823d9f07 Revert D33744717: [pytorch][PR] Implement Tanh Gelu Approximation
Test Plan: revert-hammer

Differential Revision:
D33744717 (f499ab9cef)

Original commit changeset: d64532a562ed

Original Phabricator Diff: D33744717 (f499ab9cef)

fbshipit-source-id: 396c3f63de5865f894dbc353d0790a01a624be93
(cherry picked from commit e9fb2d1db1)
2022-01-28 18:35:01 +00:00
Ryan Spring
f499ab9cef Implement Tanh Gelu Approximation (#61439)
Summary:
1. Implements https://github.com/pytorch/pytorch/issues/39853
2. Adds approximate boolean flag to Gelu
3. Enables Tanh Gelu approximation
4. Adds double backward support for Gelu
5. Enable Tanh Gelu in NvFuser

```
def gelu(x, approximate : str = 'none'):
    if approximate == 'tanh':
        # sqrt(2/pi) = 0.7978845608028654
        return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0))))
    else:
        return x * normcdf(x)
```

Linking XLA PR - https://github.com/pytorch/xla/pull/3039

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439

Reviewed By: mikaylagawarecki

Differential Revision: D33744717

Pulled By: jbschlosser

fbshipit-source-id: d64532a562ed53247bb4fa52bb16722634d5c187
(cherry picked from commit 4713dd9cca)
2022-01-28 16:59:09 +00:00
John Clow
c85965600c Fix bug where frozen mod not used for OFI #68903 (#71436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71436

Fixes issue #68903

Test Plan: Imported from OSS

Reviewed By: george-qi

Differential Revision: D33824857

Pulled By: Gamrix

fbshipit-source-id: 8d351feb4a621916f55003c58527a1e85eec476e
(cherry picked from commit 57bb420040)
2022-01-27 23:37:50 +00:00
Pavithran Ramachandran
bf69a61293 (1/2) Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions: backend change
Summary: Reland for D33282878 (911d527b87) . Land backend change first to maintain FC. Will wait for 2 weeks after this diff is in. And than land the front-end change in next diff.

Test Plan:
test in next diff

time buck test mode/dev-nosan fblearner/flow/projects/langtech/translation:tests -- test_e2e_base_training

Reviewed By: gmagogsfm

Differential Revision: D33342547

fbshipit-source-id: b3dee9a4bdfd78103848c12629e5fccafdd621e3
(cherry picked from commit ae1935f1af)
2022-01-27 03:29:40 +00:00
Mikhail Zolotukhin
1dbcde2ade [TensorExpr] Support scalar intermediate and output values. (#71186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71186

So far we've only supported scalar inputs, but couldn't handle scalar outputs
or intermediates. This PR adds it.

Scalar outputs are returned as 0-dim tensors. If the kernel is invoked on a
stack of IValues, we correctly convert the results to scalar IValues when
needed. If the kernel is invoked with a vector of void* pointers, everything
works out of the box without any conversions.

Lowerings for scalar operators are a bit tricky. Usual lowerings return a pair
<Buf, Stmt> (aka Tensor), but for scalar operators we also want to have the
corresponding Var that the lowering function supposedly creates (in theory we
could just use Loads and Stores, but I'm worried it can affect performance as
there is no guarantee this will be optimized by LLVM). So, what we do here to
work around this is we return a fake buf + stmt that sets the corresponding
var. Then outside of the lowering we create a real buffer and generate a Store
to it with the value from the variable we passed as the base handle of the fake
buf. This real buffer is then treated as usual by the rest of the system and we
can use it if we need to return this scalar value as a kernel output. If we do
not need to return it, then the Store will be deleted by the DCE pass.

Differential Revision:
D33539324
D33539324

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: ab4524b9820ce204f106effcf6232ed33d4ee223
(cherry picked from commit 7faa0939f0)
2022-01-26 06:32:51 +00:00
Jacob Szwejbka
70f3078dd6 [Pytorch Edge] Wrap lowered module in to_backend (#71597)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71597

Problem: _jit_to_backend overrides get/set state. This means any attributes added to the module after lowering will not be preserved after serialization. For edge workflows the biggest problem here is it breaks bundled_inputs.

Solution?:

Real quick and easy way to handle issues with to_backend overriding get/set state. Wraps the lowered module in another module and has forwarding functions for the api specified in 'method_compile_spec'.

The tradeoff with this approach is now the actual workhorse of the module is 1 layer deep which might make debugging slightly grosser/more difficult/confusing. The other approach Martin David and I talked about would be to only lower the portions that require custom get/set state logic. This leaves the top level the same, and only specific backened internals are changed. Personally I'm not sure how much that really addresses the debugging concern all that well. It seems like if you cracked the model open you'd still run into similar amounts of confusion with a lot of the variables and logic referenced coming from another module.

The other concern with this approach is whether or not 'compile_spec' specifies the public api of the module (since thats our source of truth for this wrapper). While it may not be enforced, it certainly seems to be true by convention and the to_backend api already uses it as a source of truth for all functions that get generated in the resulting module. I say we just formally commit to this (compile spec keys being functions) being the contract of the api instead of just assuming it to be the case and then having weird behavior if its not.

Test Plan:
New Unit Test
CI to check for existing behavior and contracts.

manually tested in a notebook with bundled inputs.

{P475790313}

Reviewed By: raziel

Differential Revision: D33694257

fbshipit-source-id: 9ff27db421eba41bac083dff11a22e9e40a36970
(cherry picked from commit 91ef49977e)
2022-01-25 06:30:19 +00:00
Peter Bell
40d1f77384 Codegen: python_torch_functions only include relevant operators (#68693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68693

Generation of python bindings for native functions is split over 8
different files. One for each namespace, with the torch namespace
split into 3 shards, and methods in their own file as well. This
change ensures that editing any single (non-method) operator only
causes one of these files to be rebuilt.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32596270

Pulled By: albanD

fbshipit-source-id: 0570ec69e7476b8f1bc21138ba18fe8f95ebbe3f
(cherry picked from commit ba0fc71a3a)
2022-01-21 15:37:06 +00:00
Jacob Szwejbka
e926360cb8 [Pytorch Edge] Refactor Compatibility Stuff into own directory (#71432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71432

Organizing jit/mobile a little more

ghstack-source-id: 147184536

Test Plan: ci.

Reviewed By: iseeyuan

Differential Revision: D33640527

fbshipit-source-id: f3a7884fe0d06d80bb8d9cf141ecaee34b6f88ff
(cherry picked from commit 4c3d1e5435)
2022-01-20 19:38:41 +00:00
Han Qi
21b697b646 add flatbuffer_loader and flatbuffer_serializer as BUCK target (#71463)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71463

title

Test Plan: unittest

Reviewed By: zhxchen17

Differential Revision: D33651339

fbshipit-source-id: 4bf325a40e263a441fd86bce560645ad0c1ebb23
(cherry picked from commit 4cb02e62a6)
2022-01-20 04:51:10 +00:00
Raghavan Raman
70c9146c40 [nnc] Update block and thread extents in cuda_codegen to use int64_t (#71428)
Summary:
The block and thread extent calculations in `cuda_codegen` should be using `int64_t` instead of `int`. The updated test, `test_dynamic_shapes`, fails without this change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71428

Reviewed By: samdow

Differential Revision: D33640374

Pulled By: navahgar

fbshipit-source-id: 64c340ad2a9a1fa1fe066cf1c5dfc3b546b7be6d
(cherry picked from commit 6ea546ce11)
2022-01-19 23:21:24 +00:00
Peter Bell
6f4c491c6b empty_cpu: Add functions that don't depend on Tensor (#70613)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70613

This refactors `at::detail::empty_cpu` to use only `TensorBase` so you
can construct tensors without including `Tensor.h`. It also adds a
`TensorOptions` version to reduce friction in operators moving from
the `at::empty` API.

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D33623682

Pulled By: ngimel

fbshipit-source-id: 7a7b08bc2ed06830a3d698197a0c8389a096dc1d
(cherry picked from commit 2e17ad0bbd)
2022-01-19 00:01:58 +00:00
Jiewen Tan
680d61daab [LT] Remove torch::lazy::convertShapes (#71291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71291

This commit removes torch::lazy::convertShapes since it's no longer used.
In addition, it replaces a numel logic within LTCTensorImpl.

Test Plan:
./build/bin/test_lazy
CI in lazy_tensor_staging branch

Reviewed By: wconstab

Differential Revision: D33575084

Pulled By: alanwaketan

fbshipit-source-id: b104ef39fd552822e1f4069eab2cb942d48423a6
2022-01-14 12:06:39 -08:00
CodemodService FBSourceClangFormatLinterBot
88012c7daf [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33577744

fbshipit-source-id: 7ecc8367998ee1dffde54c2f4dd3cfafe19a53c9
2022-01-14 06:10:57 -08:00
Mike Ruberry
3a0c680a14 Jiterates exp2, erfc, erfinv and entr and refactors code_template.h to ATen (#71295)
Summary:
Per title.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71295

Reviewed By: ngimel

Differential Revision: D33575885

Pulled By: mruberry

fbshipit-source-id: bc841b46fc0b5458a26a4d4465b18a7a54cd5a5b
2022-01-13 23:58:51 -08:00
Zhengxu Chen
5f2b4be3b9 [jit] Split DynamicType conformance test into smaller pieces. (#71275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71275

Currently it's taking more than 10 minutes to run the conformance test. Instead we should use parametrized test to shard into test segments so that they can run in parallel.
ghstack-source-id: 146990608

Test Plan:
```
[zhxchen17@devbig560.ftw3 /data/users/zhxchen17/fbsource/fbcode] buck test mode/dev-tsan //caffe2/test/cpp/jit:jit -- -r 'LiteInterpreterDynamicTypeTestFixture'
Building... 34.9 sec (99%) 12110/12111 jobs, 0/12111 updated
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: ebea52b3-7c7f-46be-9f69-18e2e7b040cc
Trace available for this run at /tmp/tpx-20220113-113635.717778/trace.log
RemoteExecution session id: reSessionID-ebea52b3-7c7f-46be-9f69-18e2e7b040cc-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4222124735827748
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 431 tests discovered (11.173)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/0 (51.331)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/1 (65.614)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/3 (76.875)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/5 (77.271)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/4 (78.871)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/6 (78.984)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/7 (84.068)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/2 (85.198)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/8 (88.815)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/9 (90.332)
Summary
  Pass: 10
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4222124735827748
```

Reviewed By: qihqi

Differential Revision: D33570442

fbshipit-source-id: 5c49e03b0f88068d444c84b4adeaaf45433ce1fa
2022-01-13 18:22:55 -08:00
CodemodService FBSourceClangFormatLinterBot
60632a00fe [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33561057

fbshipit-source-id: 79873717c45c8bbe6d0ae760e718770fd960185d
2022-01-13 03:27:06 -08:00
Scott Wolchok
1bbea3c3a2 [PyTorch][JIT] Support mayContainAlias(Value*, ArrayRef<Value*>) (#69853)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69853

We can implement this overload more efficiently.
ghstack-source-id: 146924693

Test Plan:
patched alias_analysis tests

Time reported to initialize a predictor by static runtime when given ctr_mobile_feed local_ro net is 9.5s instead of 10.5s.

Reviewed By: mikeiovine

Differential Revision: D33039731

fbshipit-source-id: 52559d678e9eb00e335b9e0db304e7a5840ea397
2022-01-12 16:53:54 -08:00
Han Qi
1bc3571078 [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer (#70201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70201

Included functions:
save_mobile_module -> saves a mobile::Module to flatbuffer
load_mobile_module_from_file -> loads a flatbuffer into mobile::Module
parse_mobile_module -> parses from bytes or deserialized flatbuffer module object

Compared to previous attempts, this diff only adds flatbuffer to cmake target and leaves fbcode/xplat ones unchanged.

Test Plan: unittest

Reviewed By: malfet, gmagogsfm

Differential Revision: D33239362

fbshipit-source-id: b9ca36b83d6af2d78cc50b9eb9e2a6fa7fce0763
2022-01-12 16:30:39 -08:00
Raghavan Raman
9ca367d48b [nnc] Use given kernel function name while emitting code (#67781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67781

Update `LLVMCodeGen` in NNC to use the given kernel function name while emitting code.

This was earlier committed as D31445799 (c30dc52739) and got reverted as part of a stack of diffs that included a cache for `PyTorchLLVMJIT`, which was the likely culprit.

Test Plan:
```
buck test mode/opt //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - LLVM.CodeGenKernelFuncName'
```

Reviewed By: ZolotukhinM, bdhirsh

Differential Revision: D32145958

fbshipit-source-id: 5f4e0400c4fa7cabce5b91e6de2a294fa0cad88e
2022-01-12 15:49:17 -08:00
Tristan Rice
bfe1abd3b5 torch/monitor: add pybind (#69567)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69567

This exposes torch.monitor events and stats via pybind11 to the underlying C++ implementation.

* The registration interface is a tad different since it takes a lambda function in Python where as in C++ it's a full class.
* This has a small amount of changes to the counter interfaces since there's no way to create an initializer list at runtime so they now also take a vector.
* Only double based stats are provided in Python since it's intended more for high level stats where float imprecision shouldn't be an issue. This can be changed down the line if need arises.

```
events = []

def handler(event):
    events.append(event)

handle = register_event_handler(handler)

log_event(Event(type="torch.monitor.TestEvent", timestamp=datetime.now(), metadata={"foo": 1.0}))
```

D32969391 is now included in this diff.
This cleans up the naming for events. type is now name, message is gone, and metadata is renamed data.

Test Plan: buck test //caffe2/test:monitor //caffe2/test/cpp/monitor:monitor

Reviewed By: kiukchung

Differential Revision: D32924141

fbshipit-source-id: 563304c2e3261a4754e40cca39fc64c5a04b43e8
2022-01-12 13:35:11 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
70951884d4 Add option to load historic operators in IR when the operator is deprecated (#71148)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71148

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D33521300

Pulled By: tugsbayasgalan

fbshipit-source-id: a0607dba5e7233590384326537017eb0b18da419
2022-01-12 11:07:04 -08:00
Elias Ellison
5480deb183 Add support for permutting dynamic fusion group outputs to channels last format (#70656)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70656

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D33458650

Pulled By: eellison

fbshipit-source-id: f0c7d20743deac7a87f7c9176e60da8100aefe41
2022-01-12 09:11:34 -08:00
Elias Ellison
39be20f259 [JIT][NNC] Add handling of strides to dynamic shape support. (#70464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70464

Add handling of strided input tensors to dynamic fusion. This is done with the same set of input striding specializations as https://github.com/pytorch/pytorch/pull/60684/:
```
  S_ONE, // STRIDE_ONE: packed
  S_CONT, // STRIDE_CONTIGUOUS: stride[i + 1] * sizes[i + 1]
  S_TRAN_CONT, // STRIDE_TRANSPOSED_CONTIGUOUS: stride[i-1] * sizes[i-1]
  S_AS_ARG, // STRIDE_AS_ARG: stride passed in as runtime value
```
and then two additional specializations for a) contiguous tensor and b) channels-last tensor. channels-last is a common case and we should optimize for it. additionally, tensors natively store whether they are contiguous/channels-last contiguous, which makes it faster to check if tensors follow this pattern.

Output striding will be done in a follow up.

The striding is stored on both the TensorGroup node and on the guard node. The striding descriptors are stored as a vector of strings on the node for debugability and to make use of storing ivalues as attributes on nodes.

As an example:

```

%8 : Double(10, 11, 12, 13, strides=[1716, 1, 143, 11], requires_grad=0, device=cpu) = prim::TensorExprGroup_0[symbolic_shape_inputs=[-37, -36, -35, -34], striding_inputs_desc=[["TENSOR_CONT_CHANNELS_LAST"]](%x, %24, %23, %22, %21)```
```

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D33458649

Pulled By: eellison

fbshipit-source-id: c42616d3c683d70f6258180d23d3841a31a6030d
2022-01-12 09:11:31 -08:00
Elias Ellison
975e7d246e Remove ignore shapes arg (#71144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71144

This wasn't being used anywhere. It was originally intended for the SR flow but we're doing something else now.

Test Plan: Imported from OSS

Reviewed By: navahgar, ZolotukhinM

Differential Revision: D33521061

Pulled By: eellison

fbshipit-source-id: 0574698a2b7409df6feb703f81e806d886225307
2022-01-12 09:09:49 -08:00
CodemodService FBSourceClangFormatLinterBot
93b2399c6c [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33544281

fbshipit-source-id: 4f0b5d6d490e6fcb967550cfb1dc0111b1770f73
2022-01-12 04:16:43 -08:00
Elias Ellison
9bccb31306 Remove precise tuple construct flag (#71121)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71121

Test Plan: Imported from OSS

Reviewed By: d1jang

Differential Revision: D33515234

Pulled By: eellison

fbshipit-source-id: 57cfe171b583a6bb4d3493a34b159061e97a11b8
2022-01-11 22:12:36 -08:00
Zhengxu Chen
9465c24245 [jit][edge] Use dynamic type instead of union types for schema parsers. (#70509)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70509

TypeFactory will construct DynamicType when building on Edge platforms. We use this facility to make FunctionSchema return DynamicType all the time for OptionalType. We don't explicitly use DynamicTypeFactory everywhere because that requires too many changes and will split the entire aten codebase.
ghstack-source-id: 146818621

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D33306737

fbshipit-source-id: d7ce00b438f7c03b43945d578280cfd254b1f634
2022-01-11 20:14:25 -08:00
Zhengxu Chen
e7634f83ce [jit][edge] Migrate base types to DynamicType on mobile. (#70233)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70233

Make type parser to produce DynamicType for all base types which don't have type arguments, and return DynamicType pointer for IValue::type().
ghstack-source-id: 146818622

Test Plan: no behavior change.

Reviewed By: iseeyuan

Differential Revision: D33137219

fbshipit-source-id: 1612c924f5619261ebb21359936309b41b2754f5
2022-01-11 13:53:29 -08:00
Zhengxu Chen
4f35b9144c [jit][edge] Migrate ListType to DynamicType on mobile. (#70212)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70212

Use DynamicType instead of ListType all over the place in Lite Interpreter. Namely we need to modify the following places:
1. Type parser which produces the Type constants.
2. IValue::type() which returns reflected Type from IValues.
3. Helper functions to construct the container value.
4. Typechecks which test whether a type instance is a particular container type.
ghstack-source-id: 146818619

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D33176931

fbshipit-source-id: 9144787f5fc4778538e5c665946974eb6171a2e6
2022-01-11 10:57:53 -08:00
Zhengxu Chen
b12ca69179 [jit][edge] Migrate DictType to DynamicType on mobile. (#70202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70202

Use DynamicType instead of DictType all over the place in Lite Interpreter. Namely we need to modify the following places:
1. Type parser which produces the Type constants.
2. IValue::type() which returns reflected Type from IValues.
3. Helper functions to construct the container value.
4. Typechecks which test whether a type instance is a particular container type.
ghstack-source-id: 146735648

Test Plan: no behavior change.

Reviewed By: iseeyuan

Differential Revision: D33137257

fbshipit-source-id: 971bf431658c422ea9353cc32cdab66e98876e9d
2022-01-10 15:55:29 -08:00
Zhengxu Chen
30699cbfd5 Reland D33284352: [jit][edge] Do not reuse mobile type parser for all unpicklers. (#71048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71048

reland D33284352 (0a921ba0d0)
ghstack-source-id: 146735646

Test Plan: All Github CI: ciflow rerun -l ciflow/all

Reviewed By: gmagogsfm

Differential Revision: D33489731

fbshipit-source-id: 3e160209a1abb193ad3eed3018054aa7d331025e
2022-01-10 12:42:23 -08:00
Elias Ellison
fb66f561b1 Add copy out to the fallback path in SR invocation of composed op (#70871)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70871

We had previously handled reusing memory in the optimized kernel execution path, but not yet handled it if we hit the unoptimized fallback.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D33458652

Pulled By: eellison

fbshipit-source-id: 4eb62181ed02c95813a99638f5e2d0f9347b5c08
2022-01-10 12:16:38 -08:00
Zhengxu Chen
9762aa0fdc Revert D33284352: [jit][edge] Do not reuse mobile type parser for all unpicklers.
Test Plan: revert-hammer

Differential Revision:
D33284352 (0a921ba0d0)

Original commit changeset: 997c4f110b36

Original Phabricator Diff: D33284352 (0a921ba0d0)

fbshipit-source-id: af316727442a64f1ae40d53d7a9d26ec550d634e
2022-01-07 19:58:03 -08:00
Zhengxu Chen
0a921ba0d0 [jit][edge] Do not reuse mobile type parser for all unpicklers. (#70338)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70338

Today Unpickler is used by both server and mobile for deserializing model, and it always fallback to mobile parser when there's no type resolver provided by user. However this is not intended as server and mobile type parser supports different things. In this diff we provide a default fallback using script parser and opt it out for all mobile cases.
ghstack-source-id: 146727330

(Note: this ignores all push blocking failures!)

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D33284352

fbshipit-source-id: 997c4f110b36eee6596e8f23f6a87bf91a4197ed
2022-01-07 18:35:32 -08:00
Jiewen Tan
338eb1b2b3 [LTC] Export torch::lazy::GetBackendDevice() (#70963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70963

This commit exports torch::lazy::GetBackendDevice().

Test Plan: CI in the lazy_tensor_staging branch.

Reviewed By: wconstab

Differential Revision: D33468938

Pulled By: alanwaketan

fbshipit-source-id: f65599c9238bf6b4f4ffbd5194befdc267272831
2022-01-07 13:13:18 -08:00
Zhengxu Chen
1011ac188f [jit][edge] Create DynamicType for OptionalType in mobile. (#68137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68137

A small step to replace existing OptionalType usage to DynamicType in Edge runtime.
ghstack-source-id: 146670520

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D32264617

fbshipit-source-id: 62d3ffad40901842deac19ca2098ea5ca132e718
2022-01-07 11:23:12 -08:00
Zhengxu Chen
0517e719ac [jit] Add conformance test for DynamicType with server JIT types. (#69482)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69482

Add a test to enumerate a number of JIT type combinations and see if their subtyping behavior is preserved in the new DynamicType system.
ghstack-source-id: 146670526

Test Plan: buck test mode/opt //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.DynamicType'

Reviewed By: gmagogsfm

Differential Revision: D32891263

fbshipit-source-id: 728211b39778e93db011b69b0a4047df78a8fc5b
2022-01-07 11:23:09 -08:00
Xiang Gao
6e16c9bb1d Add support for deleteKey for FileStore (#69953)
Summary:
torch_ucc uses `deleteKey`, and trying to run PyTorch tests with torch_ucc leads to failure about `deleteKey not implemented for FileStore`.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69953

Reviewed By: ngimel

Differential Revision: D33458457

Pulled By: H-Huang

fbshipit-source-id: f46afd59f950722ae594d9aafb8843f14019e930
2022-01-07 06:20:59 -08:00
Mikhail Zolotukhin
8223ef1cd8 [TensorExpr] Clean-up logic for copying input tensors and remove some dead code. (#70535)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70535

This also fixes handling of inputs that happen to be outputs (they
require copy).

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D33399116

Pulled By: ZolotukhinM

fbshipit-source-id: 9845838eb653b82ae47b527631b51893990d5319
2022-01-07 01:03:56 -08:00
Mikhail Zolotukhin
5d7cc8f22a [TensorExpr] Add some graph-rewrite passes to prepare models for AOT compilation. (#66515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66515

These passes should not be used generally as they change API of the
model's forward method, but they help experimenting with the model and
ironing out all the kinks before it can be compiled properly. In the
long run ideally we should provide a better way to enable such
experiments.

Differential Revision:
D31590862
D31590862

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 74ded34c6c871d4cafa29f43dc27c7e71daff8fc
2022-01-07 01:03:53 -08:00
Joel Schlosser
e6befbe85c Add flag to optionally average output attention weights across heads (#70055)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47583

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70055

Reviewed By: bhosmer

Differential Revision: D33457866

Pulled By: jbschlosser

fbshipit-source-id: 17746b3668b0148c1e1ed8333227b7c42f1e3bf5
2022-01-06 17:32:37 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
b0fdca8855 Bump version number to 7 and compile old operators with old schema (#68358)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68358

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33433730

Pulled By: tugsbayasgalan

fbshipit-source-id: 202c58365bae13195d3545cefcb0da9162b02151
2022-01-05 23:57:22 -08:00
Raghavan Raman
616afcf981 [jit] [shape analysis] Move constant tensors out of fused subgraphs during generalization (#70320)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70320

ghstack-source-id: 146514368

Test Plan: `buck test mode/dev-nosan //caffe2/test/cpp/jit:jit`

Reviewed By: eellison

Differential Revision: D33280508

fbshipit-source-id: fe4291d7c49f0a498b330de96b698e99f6f6a505
2022-01-05 10:19:14 -08:00
Michael Suo
0ece9a49d7 Revert D33198155: Bump version number to 7 and compile old operators with old schema
Test Plan: revert-hammer

Differential Revision:
D33198155 (d35fc409ad)

Original commit changeset: 38a1185f9ecb

Original Phabricator Diff: D33198155 (d35fc409ad)

fbshipit-source-id: 411aaeb4e047aad9202db50d4d0f2ff35bc51f9d
2022-01-04 13:44:59 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
d35fc409ad Bump version number to 7 and compile old operators with old schema (#68358)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68358

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D33198155

Pulled By: tugsbayasgalan

fbshipit-source-id: 38a1185f9ecb34a33f737ad0b060b3490956300c
2022-01-04 01:31:25 -08:00
Salil Desai
35251a5528 [PyTorch] Add Enum to IValue Deepcopy (#69937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69937

This enables ```export_torch_mobile_model``` compatibility with Enum IValues

Test Plan: ModuleAPITest.DeepCopyEnum

Reviewed By: gmagogsfm

Differential Revision: D33104681

fbshipit-source-id: ca2a6d259c312487fe38dd1bed33ab6b7910bc2a
2021-12-30 07:52:22 -08:00
George Qi
8af39b7668 AdaptiveLogSoftmaxWithLoss no_batch_dim support (#69054)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69054

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33200166

Pulled By: george-qi

fbshipit-source-id: 9d953744351a25f372418d2a64e8402356d1e9b7
2021-12-29 10:25:26 -08:00
Bo Wu
bf610f08b0 Back out "Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions"
Summary: as title

Test Plan:
```
buck run mode/opt-split-dwarf -c=python.package_style=inplace //ai_infra/distributed_ai/pyper_test_framework/templates:pyper_release_v2 -- --model inline_cvr_post_imp_deterministic_shrunk_pyper_release_v2 --cluster TSCTestCluster --hpc_identity oncall_pyper_oncall --stage prod_offline_training --test_module training_platform
...
############## Start inline_cvr_post_imp_model Test Results Analysis ##############
I1226 22:03:56.789000 3346280 test_driver.py:139  UNKNOWN     ] Test finished in 808.2743511786684 seconds.
+-------------------------+---------+------------------------+-----------------+
| Test Case               | Status  | Message                | Model Entity ID |
+-------------------------+---------+------------------------+-----------------+
| SmallWorld_release_test | Success | finished successfully. | 987987491       |
+-------------------------+---------+------------------------+-----------------+
I1226 22:03:56.790000 3346280 test_driver.py:143  UNKNOWN     ] test_run_id: 3d085f61-28d1-411d-bd27-940ea2554b23 use this id to find your run in scuba pyper_test_framework
I1226 22:03:56.792000 3346280 test_driver.py:160  UNKNOWN     ] Calling cleanup
I1226 22:03:56.792000 3346280 training_platform_test_launcher.py:385  UNKNOWN     ] Stopping launched jobs 1
I1226 22:03:59.563122 3346280 ClientSingletonManager.cpp:100] Shutting down Manifold ClientSingletonManager
```

Reviewed By: seemethere

Differential Revision: D33325936

fbshipit-source-id: 64414bf7061ad77e8ac12eb8abafee4043e0fa1e
2021-12-27 09:11:46 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
4ae71c8d34 Add graph op replacement pass (#69915)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69915

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D33198158

Pulled By: tugsbayasgalan

fbshipit-source-id: f2b924edf9959aaf51f97db994fae031fa062cf8
2021-12-25 13:03:19 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
df3cbcff28 Add utility methods to find an upgrader (#68355)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68355

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D33198156

Pulled By: tugsbayasgalan

fbshipit-source-id: 68380148f0d9bee96d8090bf01c8dfca8e1f8b12
2021-12-24 12:23:04 -08:00
Shunting Zhang
911d527b87 Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions (#70339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70339

When a python program is translated to TorchScript, the python exception type is dropped. This makes users's life hard when they need to categorize errors based more than only exception message.

Here we make the change so when we raise a python exception, we record the fully qualified class name for the exception. Later on when the TorchScript is interpreted, a special exception CustomJITException is thrown. User can get the python class name from CustomJITException::getPythonClassName .

Note that, this diff does not customize the mapping from C++ exception to Python exception. It's left to the users to do whatever mapping they want.

Code under scripts/shunting are just my own experimental code. I can split them out if requested.
ghstack-source-id: 146221879

Test Plan: buck test mode/opt //caffe2/test:jit

Reviewed By: gmagogsfm

Differential Revision: D33282878

fbshipit-source-id: 910f67a764519f1053a48589d1a34df69001525d
2021-12-24 00:25:40 -08:00
Jiewen Tan
ab57f6d12c [LTC] Upstream utils to extract BackendDevice from at::Tensor (#70069)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70069

This commit upstreams utils to extract BackendDevice from at::Tensor.

Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.GetBackendDevice*

Reviewed By: samdow

Differential Revision: D33293160

Pulled By: alanwaketan

fbshipit-source-id: 78647239f90b4d04adce84ae6022b8983ad30c09
2021-12-23 12:42:03 -08:00
Michael Suo
795af1578c Revert D33172665: [LTC] Upstream utils to extract BackendDevice from at::Tensor
Test Plan: revert-hammer

Differential Revision:
D33172665 (121d067999)

Original commit changeset: b334ee358ea7

Original Phabricator Diff: D33172665 (121d067999)

fbshipit-source-id: 8bff43cddfc5d30483ec5cea8eff037aab9d1cfa
2021-12-22 21:12:49 -08:00
Jiewen Tan
121d067999 [LTC] Upstream utils to extract BackendDevice from at::Tensor (#70069)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70069

This commit upstreams utils to extract BackendDevice from at::Tensor.

Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.GetBackendDevice*

Reviewed By: wconstab

Differential Revision: D33172665

Pulled By: alanwaketan

fbshipit-source-id: b334ee358ea7b031bbffb0a16fa634715dba83f5
2021-12-22 18:15:45 -08:00
vfdev-5
ce9a2f8ba9 [C++ API] Added missing nearest-exact mode and anti-alias flag (#69318)
Summary:
Description:

Following https://github.com/pytorch/pytorch/pull/65142#issuecomment-981995692 adding missing nearest-exact mode and anti-alias flag to C++ frontend.

- https://github.com/pytorch/pytorch/pull/65142
- https://github.com/pytorch/pytorch/pull/64501

- added tests in pytorch/test/cpp/api/functional.cpp

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69318

Reviewed By: davidberard98

Differential Revision: D33278995

Pulled By: jbschlosser

fbshipit-source-id: fa87c0c78df6b398e4f9688cc02111eed187afa7
2021-12-22 11:10:51 -08:00
Jiewen Tan
e02d836cb2 [LTC] Upstream LTCTensorImpl (#70062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70062

This commit upstreams LTCTensorImpl from the lazy_tensor_staging branch.
It inherits from c10::TensorImpl and thus manages the lifetime/storage
of LazyTensor.

Test Plan: ./build/bin/test_lazy --gtest_filter=LazyTensorImplTest.*

Reviewed By: desertfire

Differential Revision: D33171186

Pulled By: alanwaketan

fbshipit-source-id: 6af9f91cc7c7e997f120cb89a7bcd6785c03ace0
2021-12-22 03:21:52 -08:00
Raghavan Raman
4dec15e6d8 [nnc] Add a run method to TensorExprKernel that takes in output tensors (#69477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69477

This diff adds a new run method to `TensorExprKernel` which takes in
output tensors as inputs and stores the output in those given tensors.
ghstack-source-id: 146107009

Test Plan: buck test mode/dev-nosan //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - Kernel.RunWithAllocatedOutputs'

Reviewed By: ZolotukhinM

Differential Revision: D32823890

fbshipit-source-id: edc1f4839785124048b034060feb71cb8c1be34f
2021-12-22 00:30:15 -08:00
George Qi
bb51519937 bug fix FractionalMaxPool2d (random_samples dimensions) (#70031)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70031

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D33200618

Pulled By: george-qi

fbshipit-source-id: 142f224c2cab1008d2d4e9ed333697a92d2d42db
2021-12-21 12:21:54 -08:00
Hui Guo
7abb7667a6 [tensorexpr] Add memory planning to reuse intermediate buffers (#66452)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66452

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D31557188

Pulled By: huiguoo

fbshipit-source-id: f18dfeba1df20d5d4f118640fc10782534eb9219
2021-12-17 01:38:02 -08:00
Hui Guo
bbfd7b75ca [tensorexpr] Move the allocation of intermediate buffers from TEK to CodeGen (#67143)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67143

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D31881151

Pulled By: huiguoo

fbshipit-source-id: 457e5d4ff8a15f70af9c797c9ab4803d8e779abe
2021-12-17 01:37:56 -08:00
Hui Guo
c7e0951524 [tensorexpr] Add a stmt recorder to obtain stmt PCs (#66450)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66450

Test Plan: Imported from OSS

Reviewed By: navahgar, ZolotukhinM

Differential Revision: D31557189

Pulled By: huiguoo

fbshipit-source-id: 416d79ddfc46a0109187cdeb919ad9b5abde8030
2021-12-17 01:36:37 -08:00
Zhengxu Chen
d459e79500 [jit][edge] Remove usage of shared_ptr<mobile::Code>. (#68037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68037

Right now mobile::Code doesn't outlive its enclosing Function, and all accesses to Code happens inside interpreter loop which doesn't outlive the module, so we don't need to use std::shared_ptr here. This also should saves us 1-2 KB for binary size, because shared_ptr seems to bloat on arm64 android.
ghstack-source-id: 145818696

Test Plan: eyes.

Reviewed By: qihqi, tugsbayasgalan

Differential Revision: D32264616

fbshipit-source-id: d83f538d6604cf75fd7728a25127b4849ce7ab2a
2021-12-16 13:11:46 -08:00
Jiawei Lv
b4c4a015d6 Revert D33163841: Revert D33102715: Back out "Revert D32606547: torch/monitor: add C++ events and handlers"
Test Plan: revert-hammer

Differential Revision:
D33163841

Original commit changeset: e262b6d8c80a

Original Phabricator Diff: D33102715 (eb374de3f5)

fbshipit-source-id: 644216036a238a458f0a2198460b36d24fb035f8
2021-12-16 11:12:18 -08:00
Jiawei Lv
c80b5b8c8f Revert D33102715: Back out "Revert D32606547: torch/monitor: add C++ events and handlers"
Test Plan: revert-hammer

Differential Revision:
D33102715 (eb374de3f5)

Original commit changeset: 3816ff01c578

Original Phabricator Diff: D33102715 (eb374de3f5)

fbshipit-source-id: e262b6d8c80a05f3a67e024fedfbadefdbfe6e29
2021-12-16 09:39:57 -08:00
David Berard
8c7f4a0d0b [tensorexpr] check for index out of bounds in ir_eval (#68858)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68858

when executing with ir_eval, check for index out of bounds.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D32657881

Pulled By: davidberard98

fbshipit-source-id: 62dd0f85bb182b34e9c9f795ff761081290f6922
2021-12-16 09:27:45 -08:00
jiej
76d282d447 Nvfuser code bump 12 5 (#69964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69964

Things added in this PR that requires review:
1. cuLaunchCooperativeKernel driver API added
aten/src/ATen/cuda/detail/LazyNVRTC.cpp
aten/src/ATen/cuda/nvrtc_stub/ATenNVRTC.h

nvfuser code update:
1. perf turning on codegen scheduler that improves performance.
2. permutation support has been extended beyond contiguous/channels-last. (The improvements could be observed on PW benchmark)

Things reverted from local changes:
1. aten::gelu with approximation
2. local changes that is upstreamed in PR https://github.com/pytorch/pytorch/issues/68804

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69428

Reviewed By: ngimel

Differential Revision: D33073817

Pulled By: wconstab

fbshipit-source-id: e77d32e81d037d7370822b040456fd4c3bd68edb
2021-12-16 08:28:54 -08:00
Tristan Rice
eb374de3f5 Back out "Revert D32606547: torch/monitor: add C++ events and handlers" (#69923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69923

Original commit changeset: fbaf2cc06ad4

Original Phabricator Diff: D32606547 (e61fc1c03b)

This is the same thing as the original diff but just using a normal std::mutex instead of std::shared_timed_mutex which is not available on OSX 10.11. The performance difference should be negligible and easy to change down the line if it does become a bottleneck.

Old failing build: https://github.com/pytorch/pytorch/runs/4495465412?check_suite_focus=true

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68783

Test Plan:
buck test //caffe2/test/cpp/monitor:monitor

will add ciflow tags to ensure mac builds are fine

Reviewed By: aivanou

Differential Revision: D33102715

fbshipit-source-id: 3816ff01c578d8e844d303d881a63cf5c3817bdb
2021-12-15 22:51:43 -08:00
Taylor Robie
24bc3be146 [Profiler] Clean up profiler includes. (#69421)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69421

I've hit a lot of build issues in D32671972, and I've come to realize that a lot of it boils down to header hygene. `function.h` includes `profiler.h` *solely* to transitively include `record_function.h` which winds up leaking the profiler symbols. Moreover several files are relying on transitive includes to get access to `getTime`. As long as I have to touch all the places that use `getTime`, I may as well also move them to the new namespace.

Test Plan: Unit tests and CI.

Reviewed By: aaronenyeshi, albanD

Differential Revision: D32865907

fbshipit-source-id: f87d6fd5afb784dca2146436e72c69e34623020e
2021-12-15 12:50:24 -08:00
Chen Lai
408283319a [Operator Versioning][Edge] Change OP to CALL when there is a valid upgrader (#67731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67731

1. Register upgrader function at loading stage
2. Change OP to CALL when there operator_version from model is smaller than current runtime version and there exists a valid upgrader

The interpreter log is :
```
RUNNING 0 STOREN 1 3
RUNNING 1 DROPR 1
RUNNING 2 LOAD 2
RUNNING 3 LOAD 3
RUNNING 4 CALL 0
RUNNING 0 STOREN 1 2
RUNNING 1 LOAD 1
RUNNING 2 OP 0, aten::is_floating_point
RUNNING 3 JF 3
RUNNING 4 LOADC 1
RUNNING 5 JMP 3
RUNNING 8 STORE 3
RUNNING 9 MOVE 3
RUNNING 10 JF 5
RUNNING 11 LOAD 1
RUNNING 12 LOAD 2
RUNNING 13 OP 1, aten::div.Tensor
RUNNING 14 JMP 5
RUNNING 19 STORE 4
RUNNING 20 DROPR 2
RUNNING 21 DROPR 1
RUNNING 22 MOVE 4
RUNNING 23 RET
RUNNING 5 LOAD 2
RUNNING 6 LOAD 3
RUNNING 7 CALL 0
RUNNING 0 STOREN 1 2
RUNNING 1 LOAD 1
RUNNING 2 OP 0, aten::is_floating_point
RUNNING 3 JF 3
RUNNING 4 LOADC 1
RUNNING 5 JMP 3
RUNNING 8 STORE 3
RUNNING 9 MOVE 3
RUNNING 10 JF 5
RUNNING 11 LOAD 1
RUNNING 12 LOAD 2
RUNNING 13 OP 1, aten::div.Tensor
RUNNING 14 JMP 5
RUNNING 19 STORE 4
RUNNING 20 DROPR 2
RUNNING 21 DROPR 1
RUNNING 22 MOVE 4
RUNNING 23 RET
RUNNING 8 MOVE 2
RUNNING 9 MOVE 3
RUNNING 10 CALL 0
RUNNING 0 STOREN 1 2
RUNNING 1 LOAD 1
RUNNING 2 OP 0, aten::is_floating_point
RUNNING 3 JF 3
RUNNING 4 LOADC 1
RUNNING 5 JMP 3
RUNNING 8 STORE 3
RUNNING 9 MOVE 3
RUNNING 10 JF 5
RUNNING 11 LOAD 1
RUNNING 12 LOAD 2
RUNNING 13 OP 1, aten::div.Tensor
RUNNING 14 JMP 5
RUNNING 19 STORE 4
RUNNING 20 DROPR 2
RUNNING 21 DROPR 1
RUNNING 22 MOVE 4
RUNNING 23 RET
RUNNING 11 TUPLE_CONSTRUCT 3
RUNNING 12 RET
```

The upgrader bytecode is:
```
(STOREN, 1, 2)
(LOAD, 1, 0)
(OP, 0, 0)
(JF, 3, 0)
(LOADC, 1, 0)
(JMP, 3, 0)
(LOAD, 2, 0)
(OP, 0, 0)
(STORE, 3, 0)
(MOVE, 3, 0)
(JF, 5, 0)
(LOAD, 1, 0)
(LOAD, 2, 0)
(OP, 1, 0)
(JMP, 5, 0)
(LOAD, 1, 0)
(LOAD, 2, 0)
(LOADC, 0, 0)
(OP, 2, 0)
(STORE, 4, 0)
(DROPR, 2, 0)
(DROPR, 1, 0)
(MOVE, 4, 0)
(RET, 0, 0)
```
ghstack-source-id: 145635622

Test Plan: describe in summary and CI

Reviewed By: iseeyuan

Differential Revision: D32092517

fbshipit-source-id: 0314b4bda5d2578cdd4e7cfbfd1e3c07fbccf8a3
2021-12-14 19:13:12 -08:00
Chen Lai
9e4d60a552 [Operator Versioning][Edge] Use check in cpp source file for upgrader (#67728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67728

1. Check in upgrader_mobile.h and upgrader_mobile.cpp
2. Add test to parse all bytecode from upgrader_mobile.h
ghstack-source-id: 145635621

Test Plan: buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterUpgraderTest.Upgrader'

Reviewed By: iseeyuan

Differential Revision: D32087295

fbshipit-source-id: 21e95aabb5e9db76be27e01adfea8fbc41caeaf6
2021-12-14 19:10:51 -08:00
Michael Suo
f565167fbd Revert D32606547: torch/monitor: add C++ events and handlers
Test Plan: revert-hammer

Differential Revision:
D32606547 (e61fc1c03b)

Original commit changeset: a00d0364092d

Original Phabricator Diff: D32606547 (e61fc1c03b)

fbshipit-source-id: fbaf2cc06ad4bec606e8a9c6f591d65c04e6fa56
2021-12-11 22:51:03 -08:00
Tristan Rice
e61fc1c03b torch/monitor: add C++ events and handlers (#68783)
Summary:
This adds a C++ event handler corresponding to the Python one mentioned in the RFC.

This changes the counters a bit to all be push driven instead of being polled. The two window types are "fixed count" and "interval". One is based off the number of logged events and the other is based off of time windows. There's currently no active ticker for interval so it needs a regular stream of events to ensure events are produced. A follow up diff can add support for things like HHWheel / simple ticker.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68783

Test Plan: buck test //caffe2/test/cpp/monitor:monitor

Reviewed By: kiukchung

Differential Revision: D32606547

fbshipit-source-id: a00d0364092d7d8a98e0b18e503c0ca8ede2bead
2021-12-11 16:44:46 -08:00
Yanan Cao
17f3179d60 Back out "[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer" (#69796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69796

(Note: this ignores all push blocking failures!)

Test Plan: External CI + Sandcastle

Reviewed By: zhxchen17

Differential Revision: D33032671

fbshipit-source-id: dbf6690e960e25d6a5f19043cbe792add2acd7ef
2021-12-10 21:29:53 -08:00
Hao Lu
91d16cb633 [Jit] Fix schema of aten::split int[] version (#69745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69745

Missed in D31935573 (6b44e75f6b).

Reviewed By: d1jang

Differential Revision: D31889867

fbshipit-source-id: 417bd0b15db4891dbd641b35a803553f11d0d756
2021-12-10 02:33:36 -08:00
Nikita Shulga
3bb20ae49f Make c10d tests -Werror clean (#69703)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69703

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D32997001

Pulled By: malfet

fbshipit-source-id: 38b5f195c04f2b3b920e6883a96fe9a36345b9d2
2021-12-09 22:10:04 -08:00
Ivan Kobzarev
7dba88dfdb [nnc][quant] Fix quantized concat (#69596)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69596

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D32941108

Pulled By: IvanKobzarev

fbshipit-source-id: 727f608b98625648e2e444396d910838c95f58f2
2021-12-09 18:55:32 -08:00
Peter Bell
b2e79ed5ec Remove WindowsTorchApiMacro.h in favor of Export.h (#69585)
Summary:
Follow up to https://github.com/pytorch/pytorch/issues/68095

This also changes the files from the ATen folder to include c10's `Export.h` instead since they can't ever be exporting `TORCH_PYTHON_API`.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69585

Reviewed By: mrshenli

Differential Revision: D32958594

Pulled By: albanD

fbshipit-source-id: 1ec7ef63764573fa2b486928955e3a1172150061
2021-12-09 17:30:09 -08:00
Han Qi
d3649309e6 [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer (#69306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69306

Included functions:

save_mobile_module -> saves a mobile::Module to flatbuffer
load_mobile_module_from_file -> loads a flatbuffer into mobile::Module
parse_mobile_module -> parses from bytes or deserialized flatbuffer
Module object

Test Plan: unittests

Reviewed By: gmagogsfm

Differential Revision: D32806835

fbshipit-source-id: 71913c6650e225634f878946bd16960d377a7f57
2021-12-09 14:53:31 -08:00
Richard Barnes
afb742382a use irange for loops 10 (#69394)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69394

Modified loops in files under fbsource/fbcode/caffe2/ from the format
```
for(TYPE var=x0;var<x_max;x++)
```
to the format
```
for(const auto var: irange(xmax))
```

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D32837991

fbshipit-source-id: fc7c4f76d2f32a17a0faf329294b3fe7cb81df32
2021-12-09 09:49:34 -08:00
Chen Lai
13faaff54c [Operator Versioning][Edge] Implement register function for upgrader (#67730)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67730

This pr implement the register function for upgrader so it can be used at loading stage
ghstack-source-id: 145170986

Test Plan:
```
buck test //caffe2/test/cpp/jit:jit
```

Reviewed By: iseeyuan

Differential Revision: D32092518

fbshipit-source-id: 779b51eb12b8cb162a93a55c1e66fe0becc4cb36
2021-12-09 02:18:09 -08:00
Peter Bell
e279963eef Remove remaining THC code (#69039)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69039

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D32872476

Pulled By: ngimel

fbshipit-source-id: 7972aacc24aef9450fb59b707ed6396c501bcb31
2021-12-08 12:18:08 -08:00
Bin Bao
e8f4c9cc40 [LT] Upstream LazyView and view ops IR Nodes (#69277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69277

LazyView is the main class for tracking alias caused by view
ops. The corresponding IR classes for view ops are hand-written now, and
we can switch to code-gen them in future. For certain view ops, they
have a reverse IR class to perform inplace update in the backward
direction on a chain of alias ops.

As part of the future work, we will simplify the logic for LazyView once
the functionalization pass in core is ready to use.

Test Plan: Imported from OSS

Reviewed By: wconstab

Differential Revision: D32820014

Pulled By: desertfire

fbshipit-source-id: d9eb526cb23885f667e4815dc9dd291a7b7e4256
2021-12-04 08:44:54 -08:00
Ramanpreet Nara
f587267dc7 Revert D31705359: use irange for loops 8
Test Plan: revert-hammer

Differential Revision:
D31705359 (17e5200441)

Original commit changeset: c9ea2fbc0f9c

fbshipit-source-id: 08fff2d12beca953ad30dd0baabf86e39ac84f14
2021-12-02 12:55:08 -08:00
Richard Barnes
17e5200441 use irange for loops 8 (#66743)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66743

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D31705359

fbshipit-source-id: c9ea2fbc0f9cd29e97a52dcb203addc5f2abb09b
2021-12-02 10:21:29 -08:00
Alban Desmaison
00ebbd5ef6 Revert D32010095: [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer
Test Plan: revert-hammer

Differential Revision:
D32010095 (41d35dc201)

Original commit changeset: d763b0557780

fbshipit-source-id: bf746a0389135c9f5f67f00f449435ce08fb5f6d
2021-12-02 06:41:40 -08:00
Han Qi
41d35dc201 Add ability for a mobile::Module to save as flatbuffer (#67351)
Summary:
Included functions:

* save_mobile_module -> saves a mobile::Module to flatbuffer
* load_mobile_module_from_file -> loads a flatbuffer into mobile::Module
* parse_mobile_module -> parses from bytes or deserialized flatbuffer
      Module object

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67351

Reviewed By: iseeyuan

Differential Revision: D32010095

Pulled By: qihqi

fbshipit-source-id: d763b0557780f7c2661b6485105b045e41a5e8f1
2021-12-01 23:58:15 -08:00
Jacob Szwejbka
291e56eda4 [Pytorch Edge] Update Black Box Api with operator versioning (#68678)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68678

Test Plan: Ill update the unit test before land

Reviewed By: cccclai

Differential Revision: D32573603

fbshipit-source-id: 19271bcbb68b61d24d6943e61a943f4f75fddb5d
2021-12-01 19:13:32 -08:00
Chen Lai
b9738e923e [Operator Versioning][Edge] Add old models and unittest (#67726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67726

1. Check in one model with aten:div_tensor old op with unittest in both cpp and python. The following two lines are commented out and expected to work after using upgrader.
```
_helper(mobile_module_v2, div_tensor_0_3)
_helper(current_mobile_module, torch.div)
```

2. Update the commented code accordingly.

Currently there are 6 upgraders. The following old models with operators are added to cover these 6 upgraders:
```
// Tensor x Tensor

test_versioned_div_tensor_v3

// Tensor x Scalar

test_versioned_div_scalar_float_v3
test_versioned_div_scalar_reciprocal_int_v3
test_versioned_div_scalar_inplace_float_v3

// Scalar x Scalar

test_versioned_div_scalar_scalar_v3

// Tensor x Tensor with out kwarg

test_versioned_div_tensor_out_v3

// Tensor x Tensor inplace

test_versioned_div_tensor_inplace_v3

// Tensor x Scalar inplace

test_versioned_div_scalar_inplace_int_v3

```
Note:
In this pr, per model, it includes the following test:
1. Model (with old op) load/run test will be in both cpp and python
2. Model (with old op) + upgrader test will be in python
Other tests considered adding:
1. per upgrader bytecode test
2. app level integration test
ghstack-source-id: 144422418

Test Plan: CI and the added unittest

Reviewed By: iseeyuan

Differential Revision: D32069653

fbshipit-source-id: 96d9567088a1f709bc7795f78beed7a308e71ca9
2021-12-01 18:46:30 -08:00
Jiewen Tan
e6c435bf96 [LTC] Upstream helpers for c10::Device <=> BackendDevice (#69064)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69064

This commit upstreams helpers for converting a c10::Device to
BackendDevice and vice versa.

Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.FromAten:BackendDeviceTest.ToAten

Reviewed By: wconstab

Differential Revision: D32732607

Pulled By: alanwaketan

fbshipit-source-id: 0dd233d37a4a30fc4b22dba322ddd85d4cb3635b
2021-12-01 12:15:32 -08:00
Scott Wolchok
1d84d8c5d8 [PyTorch] Remove StringView from RecordFunction interface (1/2) (#68410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68410

First step toward not heap-allocating a string in RecordFunction::before() every time
ghstack-source-id: 144287654

Test Plan: CI

Reviewed By: chaekit

Differential Revision: D32453847

fbshipit-source-id: 080d95095fb568287b65fcc41a4ca6929b5f9a87
2021-11-30 13:20:08 -08:00
Joel Schlosser
8fef7c09f5 Remove finput from slow2d signatures (#68896)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68896

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32655874

Pulled By: jbschlosser

fbshipit-source-id: 3c9acb106961c40af1432652179edb2bc5a4bfa5
2021-11-30 09:47:24 -08:00
Jiewen Tan
0cdeb586ae [LTC] Upstream some utilities (#69046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69046

This commit upstreams utilities including ExceptionCleanup, MaybeRef,
Iota, ToVector, ToOptionalVector and GetEnumValue.

Test Plan: ./build/bin/test_lazy --gtest_filter=UtilTest.*

Reviewed By: wconstab, Chillee

Differential Revision: D32709090

Pulled By: alanwaketan

fbshipit-source-id: 5147433becd4dbb07be7d36d66b0b8685054d714
2021-11-30 02:44:02 -08:00
Mikhail Zolotukhin
75ce040620 [TensorExpr] Allow for 'keepdim' argument in aten::mean in NNC's external call. (#68756)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68756

That fixes some warnings in our tests.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D32600952

Pulled By: ZolotukhinM

fbshipit-source-id: 548eaf3659e20795cce44d8f57e77f4a47d44d98
2021-11-30 00:06:34 -08:00
Vinnam Kim
7b701ce2d4 Add set_to_none option to C++ API (#68801)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68167.

Signed-off-by: Vinnam Kim <vinnam.kim@makinarocks.ai>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68801

Reviewed By: mruberry

Differential Revision: D32625239

Pulled By: jbschlosser

fbshipit-source-id: 5f09b959e23d5448106a47029d06ec20ad094d82
2021-11-29 08:42:39 -08:00
Bin Bao
787ded5103 Add lazy::Shape::numel() (#68314)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68314

Add a convenience to lazy::Shape for counting the number of elements (by multiplying out the dimensions).  This is a method on Tensor, and in switching other lazy tensor shape utils to use aten shape inference, we need numel counts.

Test Plan: add unit tests

Reviewed By: alanwaketan

Differential Revision: D32409138

fbshipit-source-id: 3ae725300f8826d38e45412f46501d5e5f776fb2
2021-11-29 08:38:09 -08:00
Han Qi
959cb03132 Populate operator_input_sizes_ (#68542)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68542

title

Test Plan: unittest

Reviewed By: iseeyuan

Differential Revision: D32508159

fbshipit-source-id: 0773a725973a493f19a2e9a340365e559dfdf7f8
2021-11-23 12:18:06 -08:00
Tristan Rice
758d7dea9c torch.monitor - Initial C++ Stats (#68074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68074

This is the first step of many PRs towards implementing the `torch.monitor` RFC https://github.com/pytorch/rfcs/pull/30

This defines the aggregation types, the `Stat` class and provides some simple collection of the stats.

This doesn't match the RFC exactly as it incorporates some of the comments on the RFC as well as a few changes for performance.

Changes:
* added window_size to the stats. If specified it will always compute the stat using the `window_size` number of values. If there aren't enough values within that window it reports the previous stats.
* This doesn't include the push metrics yet (will be coming).
  After more discussion it looks like the best way to handle this is to support a hybrid where the metric can set how frequently it'll be logged. For fixed window_size metrics it'll be logged each time it hits the window size. This will allow performant counters as well as lower frequency push counters (window_size=1).

Performance considerations:
* Updating the stats acquires a lock on that Stat object. This should be performant unless there's many-many threads writing to the same stat. Single thread will typically use futex so should be quite fast.
* Adding/removing/fetching all stats sets a global lock on the stat list -- this shouldn't be an issue since these events happen infrequently.
* Fetching stats accesses one stat at a time instead of a global lock. This means the exported values are linearizable but not serializable across multiple stats but I don't expect this to be an issue.

Next steps:
1. Add StatCollector interface for push style metrics
1. Add pybind interfaces to expose to Python
1. Add default metric providers
1. Integrate into Kineto trace view

Test Plan:
buck test //caffe2/test/cpp/monitor:monitor

CI

Reviewed By: kiukchung

Differential Revision: D32266032

fbshipit-source-id: dab8747b4712f5dba5644387817a3a0fda18b66a
2021-11-18 21:46:23 -08:00
Hongyi Jia
146a7f68e2 Enable desync root cause analysis for NCCL (#68310)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68310

Enable desync root cause analysis by recording the last footprint of collective calls. When timeout we parse the store trace and figure out the root cause of the desync issue. This feature is built based on async error handling.

Test Plan:
Standalone test
* Typical desync - P467288969
* Mismatched collectives - P467288916
* Mismatched broadcast size - P467288873

DDP benchmark
* DDP benchmark desync - P467433483, P467520195

No perf regression:
* w/o this diff https://www.internalfb.com/intern/fblearner/details/308379789?tab=Outputs
* w/ this diff https://www.internalfb.com/intern/fblearner/details/308534088?tab=Outputs

Reviewed By: mingzhe09088

Differential Revision: D32348647

fbshipit-source-id: 43e7e96e3fa2be0ac66c1325bceb639b461a8b3a
2021-11-17 20:29:03 -08:00
Han Qi
4eb772fde6 Refactor saving jit::Module to mobile .pt in 2 steps: (#66494)
Summary:
1. is to convert Function -> mobile::Function
2. is to serialize mobile::Function

This also opens opportunity to create mobile::Module without saving/reloading

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66494

Reviewed By: zhxchen17

Differential Revision: D32293022

Pulled By: qihqi

fbshipit-source-id: 29b43d47ff86071d5e2f9d6ca4dba4445711ce3d
2021-11-17 12:02:20 -08:00
jjsjann123
0dc3f829d9 Nvfuser code bump 11 5 (#67943)
Summary:
nvfuser code update:
1. Tuning heuristics on schedulers for reduction/normalization kernels;
2. bfloat16 on IO tensor support;
3. Refactored memory format support, now we can support dimension collapsing with non-coherent input tensors with different memory format. e.g. channels last tensor input to batch normalization. Note that we are currently limiting memory format to only Contiguous and Channels last;
4. Refactored nvfuser graph partitioning in `graph_fuser.cpp`, separated node merge and profile node API. Updated `profiling_record.cpp`.

Things that are reverted from our local branch:
1. changes on some entries in autodiff
2. aten::gelu with approximation
3. native_dropout(_backward)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67943

Reviewed By: ngimel

Differential Revision: D32288709

Pulled By: dzhulgakov

fbshipit-source-id: fc9491182ea7e0158bc112c66f096823c588eaf1
2021-11-17 01:22:17 -08:00
Raghavan Raman
2fd468e5f8 [jit] Set the graph input types before interpreting the graph during tracing (#68242)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68242

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D32382958

Pulled By: navahgar

fbshipit-source-id: 4e82a604a9ea2046af2755de23944147e618a65f
2021-11-15 15:44:32 -08:00
Mike Iovine
c697eeba72 [JIT] Combine concat nodes where possible (#67000)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67000

See the [related issue](https://github.com/pytorch/pytorch/issues/66654) for context.

This new JIT optimization transforms patterns like this:
```
%inputs.1 : Tensor[] = prim::ListConstruct(%a, %b, %c)
%concat.1 : Tensor = aten::cat(%inputs, %dim)
%inputs.2 : Tensor[] = prim::ListConstruct(%x, %concat.1, %y)
%concat.2 : Tensor = aten::cat(%inputs.2, %dim)
```
into this:
```
%inputs.2 : Tensor[] = prim::ListConstruct(%x, %a, %b, %c, %y)
%concat.2 : Tensor = aten::cat(%inputs.2, %dim)
```
(it can do this for chains of `aten::cat` longer than 2 as well)

A few conditions have to hold:
1.  The `dim`s have to match.
2. `inputs.1` and `inputs.2` cannot be mutated

Test Plan: `buck test caffe2/test/cpp/jit:jit -- ConcatOpt`

Reviewed By: d1jang

Differential Revision: D31819491

fbshipit-source-id: 9f1a501d52099eb1a630b5dd906df4c38c3817ba
2021-11-15 12:02:45 -08:00
Mikhail Zolotukhin
e511a7a5b4 [TensorExpr] Remove non-determinism in iterating over unordered_set of intermediate buffers. (#68277)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68277

Differential Revision:
D32400553
D32400553

Test Plan: Imported from OSS

Reviewed By: saketh-are, priyaramani

Pulled By: ZolotukhinM

fbshipit-source-id: a8fe820bbddaa19f95db432efaa6d3e36095a05e
2021-11-13 00:50:57 -08:00
Will Constable
6ddaf3bd37 [LT] Upstream TsNode, TsNodeLowering, TsLoweringContext (#68154)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68154

Test Plan: added a basic test; cover more by using lazy_tensor_staging tests

Reviewed By: Krovatkin, alanwaketan

Differential Revision: D32224303

fbshipit-source-id: ac3e1161229b8ae60fdb15ffa72e17072b595914
2021-11-12 12:57:20 -08:00
Will Constable
dc24503a89 Fix Hash(c10::Scalar), account for garbage data in union (#68201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68201

Hash(c10::Scalar) made a bad assumption that it was valid to just hash over all the bytes of data of the c10::Scalar struct.

Becuase c10::Scalar stores a union of different (float/int/complex) types with different sizes, not all bytes are valid in all cases.  Hash() should only read the bytes corresponding to the currently active type.

Test Plan: Added new unit tests.  Verified HashTest.Scalar failed with the original Hash() impl and then fixed.

Reviewed By: alanwaketan

Differential Revision: D32367564

fbshipit-source-id: ac30dd4f6dd0513954986d3d23c0c11ba802c37b
2021-11-12 07:20:08 -08:00
Howard Huang
7b376bf844 Remove ProcessGroup from TensorPipeAgent initialization (#68128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68128

Reland of D31762735 (0cbfd466d2).

This diff was originally reverted due to failure in test_send_export_type_through_rpc_with_custom_pickler.

I updated rpc_pickler_test.py to prevent a race condition where processes were not registering their pickler before handling their rpc_sync calls.

Test Plan:
rpc_pickler_test file:

buck test mode/dev-nosan -c 'cxx.coverage_only=caffe2' //caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test //caffe2/torch/fb/training_toolkit/backend/metrics/collectors/fbdata_aggregator/tests:batch_collector_test -- --run-disabled --collect-coverage '--code-coverage-session=test_session' --force-tpx

rpc_pickler stress test:

buck test mode/dev-nosan -c 'cxx.coverage_only=caffe2' //caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test -- --exact 'caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test - test_send_export_type_through_rpc_with_custom_pickler (caffe2.torch.fb.training_toolkit.backend.metrics.tests.rpc_pickler_test.CythonTypeRpcSpawnTest)' --run-disabled --collect-coverage '--code-coverage-session=test_session' --force-tpx --jobs 18 --stress-runs 10 --record-results

Reviewed By: mrshenli

Differential Revision: D32316077

fbshipit-source-id: e58de2335fbaa3ab46d46fe222c659197633a5e4
2021-11-11 12:28:55 -08:00
Martin Yuan
bd5f33f91e demo backend decoupled from operators (#66100)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66100

A backend should not directly dependent on ATen operators. The demo backend is changed to that way for testing purpose.

Test Plan: Imported from OSS

Reviewed By: pavithranrao

Differential Revision: D31384614

Pulled By: iseeyuan

fbshipit-source-id: c97f0c4aa12feb1d124f1d7a852e9955a7a2ce42
2021-11-11 10:26:17 -08:00
Will Constable
d6e6064efc [LT] Upstream backend interfaces (#67927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67927

BackendData - represents 'tensor data' in opaque backend storage
LoweringContext - interface for performing backend-specific IR lowering
BackendImplInterface - interface for lazy tensors backends to implement

Reorgs backend-related files into lazy/backend subdir

includes a few small fixes, which were made on lazy_tensor_staging but need to be back-ported to master.

Test Plan: used by lazy_tensor_staging branch

Reviewed By: desertfire

Differential Revision: D32142032

fbshipit-source-id: 828c717bcd0d511876e64ad209b50f7bfb10cec5
2021-11-10 12:55:31 -08:00
Jiewen Tan
6011c35a79 [LTC] Upstream class BackendDevice (#68027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68027

This commit upstreams class BackendDevice to the master, which is a backend
specific representation of the actual hardware, for instances, CPU, GPU, or
TPU.

This concept is important for backend like XLA where it needs to tell the
actual hardware type from the c10::DeviceType::Lazy virtual device during
both IR constructions and lowerings.

Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.*

Reviewed By: wconstab

Differential Revision: D32261838

Pulled By: alanwaketan

fbshipit-source-id: 579c3fc5f9da7847c887a383c6047e8ecb9cc5bc
2021-11-10 07:05:43 -08:00
Bin Bao
a027551358 [LT] Merge cache.h (#67929)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67929

1. Write a node-hash based unit test for Cache
2. Replace CHECK with TORCH_CHECK in IrUtil

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D32246134

Pulled By: desertfire

fbshipit-source-id: c464bc300126d47e9ad4af3b3e8484a389757dc0
2021-11-09 12:02:02 -08:00
Bin Bao
a473417076 [LT] Merge permutation_util into master (#67766)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67766

Test Plan: `build/bin/test_lazy`

Reviewed By: wconstab

Differential Revision: D32147676

Pulled By: desertfire

fbshipit-source-id: 528b48c9cf789abc171235091c7146b2ab7a9c76
2021-11-09 12:00:39 -08:00
Howard Huang
9fb3ba9d7b Revert D31762735 (#67924)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67924

This diff reverts the changes made in D31762735 (0cbfd466d2)

Test Plan: Wait for CI

Reviewed By: derekmod-fb

Differential Revision: D32214744

fbshipit-source-id: e0a65b6a31a88216ae1243549fcbc901ef812374
2021-11-06 17:34:13 -07:00
Chen Lai
ae501a9727 [PyTorch Edge] Update bytecode version compatibility check (#67417)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67417

bytecode version is valid when it's smaller than kMaxSupported and larger than kMinSupported
ghstack-source-id: 142609392

Test Plan:
```
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.isCompatibleFail'
```

Reviewed By: JacobSzwejbka, iseeyuan

Differential Revision: D31984839

fbshipit-source-id: 2011e77455c931c0a8a58267494d44bcf167b877
2021-11-05 19:34:01 -07:00
Raghavan Raman
e7a3bbce89 [nnc] Add support for dynamic shapes in TensorExprKernel (#67861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67861

Previously submitted as https://github.com/pytorch/pytorch/pull/67197.
This got reverted because its failures were hidden by the failures of
another PR.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D32178196

Pulled By: navahgar

fbshipit-source-id: cc8a5c68aed360d06289e69645461cfa773e1300
2021-11-05 11:18:19 -07:00
Jiewen Tan
8bed46ef38 [WIP][LTC] Upstream class Shape (#67672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67672

This commit Upstreams class Shape from lazy_tensor_staging branch.

Test Plan: WIP.

Reviewed By: malfet

Differential Revision: D32095478

Pulled By: alanwaketan

fbshipit-source-id: 61611b12fc079b195833b5b22a6cf73c0935b8b9
2021-11-04 14:12:03 -07:00
Rohan Varma
90d311b268 [RPC] Add exception logging to constValue() (#67802)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67802

In RPC C++ code, we might sometimes call constValue() when the future actually has an exception, and in unittests we want to assert on the exception. What happens is that we get a message basically saying "!eptr_" which indicates there is some exception but we don't know what it is.

This diff simply adds logging for the exception and mentions that `value` over `constValue` should be used when the future can have an exception. The contract of `constValue` to throw when `eptr_` is set is still held, it is just enhanced with additional logging.
ghstack-source-id: 142375391

Test Plan: Added UT

Reviewed By: mrshenli

Differential Revision: D32156552

fbshipit-source-id: 4dd5e73b92173209074c104a4b75c2021e20de4b
2021-11-04 10:04:09 -07:00
Natalia Gimelshein
ca445645f9 Revert D31902471: [nnc] Add support for dynamic shapes in TensorExprKernel
Test Plan: revert-hammer

Differential Revision:
D31902471 (15a3c374e2)

Original commit changeset: d2729a38ba1a

fbshipit-source-id: 4c05de82e626bbf744df84fd2b914b66fd165a19
2021-11-03 14:48:12 -07:00
Raghavan Raman
15a3c374e2 [nnc] Add support for dynamic shapes in TensorExprKernel (#67197)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67197

Test Plan: Imported from OSS

Reviewed By: eellison, ZolotukhinM

Differential Revision: D31902471

Pulled By: navahgar

fbshipit-source-id: d2729a38ba1ac607ff07f516ed56fbd9085715dc
2021-11-03 11:24:17 -07:00
Raghavan Raman
383c1f51b1 [nnc] Fixed handling of 0-sized tensors in cat (#67734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67734

The implementation of `aten::cat` op in NNC has to ignore tensors that have 0-size in any dimension.

Test Plan: `buck test mode/dev-nosan //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - Kernel.CatWithEmptyInputs'`

Reviewed By: ZolotukhinM

Differential Revision: D32122171

fbshipit-source-id: 90c697813bc504664673cdc262df6e7ce419c655
2021-11-03 10:16:16 -07:00
Mikhail Zolotukhin
ff5c61a74e [TensorExpr] Add lowering for aten::max (reduction). (#66519)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66519

Differential Revision:
D31590853
D31590853

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: a702621621f681d7f5392912e8a77ca124e14170
2021-11-03 09:44:09 -07:00
Mikhail Zolotukhin
00afe9ba7b [TensorExpr] Add lowering for aten::embedding. (#66518)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66518

Differential Revision:
D31590855
D31590855

Test Plan: Imported from OSS

Reviewed By: pbelevich

Pulled By: ZolotukhinM

fbshipit-source-id: aace0a87b1649330dae44182f7873aca27160d64
2021-11-03 09:44:07 -07:00
Mikhail Zolotukhin
008a58d226 [TensorExpr] Add lowering for aten::conv1d. (#66517)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66517

Differential Revision:
D31590856
D31590856

Test Plan: Imported from OSS

Reviewed By: pbelevich

Pulled By: ZolotukhinM

fbshipit-source-id: c05a37d8741acd0606c2adb8d6cfeb1f57bc8aa0
2021-11-03 09:44:05 -07:00
Mikhail Zolotukhin
d58ef2bbff [TensorExpr] Fix lowering for aten::softmax for the case when dtype parameter is None. (#66516)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66516

Differential Revision:
D31590858
D31590858

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 0aeee7a5be64b3b9c8fa00aacb1a94031a7e25d1
2021-11-03 09:42:48 -07:00
Rohan Varma
885da61d7d [PG NCCL] Disable NCCL health check (#67668)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67668

This adds an env var to enable NCCL health check, which when left unspecified, results in the check not being run. Unit tests that need to test this functionality have the env variable set. Please see internal diff for more details.

Test Plan: CI

Reviewed By: yuguo68, mrshenli

Differential Revision: D32089763

fbshipit-source-id: dff5664a5e607f711515cd1042089ca769914fbb
2021-11-02 16:21:59 -07:00
Scott Wolchok
82f7f8d471 [PyTorch] Adopt IValue::toTupleRef() where obvious (#65505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65505

Generated with

`fastmod -m 'toTuple\(\)(\s*)->' 'toTupleRef()${1}.'`

, followed by

`fastmod '(std::move\(.*)toTupleRef\(\).' '${1}toTuple()->'`

to unbreak 2 callsites.
ghstack-source-id: 142065835

Test Plan: CI

Reviewed By: gchanan

Differential Revision: D31131025

fbshipit-source-id: 54457ae5bbeb38db9c7f196d469b98521c3d3f34
2021-11-02 10:22:18 -07:00
Howard Huang
0cbfd466d2 Remove ProcessGroup from TensorPipeAgent initialization (#66708)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66708

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D31762735

Pulled By: H-Huang

fbshipit-source-id: 9f3879fca6b8258f7e6171b14d2c1d6cce21627d
2021-11-01 14:15:27 -07:00
Max Ren
ba369ea053 check to ensure profiler_edge is only added when use_kineto is on (#67494)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67494

Reviewed By: jbschlosser

Differential Revision: D32031142

Pulled By: mcr229

fbshipit-source-id: 8267f0e02c5bed0fbc4956af6935a551bedb27ef
2021-11-01 13:42:14 -07:00
Ivan Kobzarev
7fbcf79684 [tensorexpr][nnc] Support quantization (#66676)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66676

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31676329

Pulled By: IvanKobzarev

fbshipit-source-id: 288b41ff4ed603dfaacb465f296997f14bb23c22
2021-10-31 22:49:30 -07:00
Jacob Szwejbka
66202b7f8d [Pytorch Edge] Expose runtime operators versioning (#67385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67385

As part of the expanded operator versioning effort we are going to start looking at this variable and whats stored locally in the model file.
ghstack-source-id: 141782717

Test Plan: unit test

Reviewed By: cccclai

Differential Revision: D31976654

fbshipit-source-id: 255a23cff7c4f4039089de23b4da95772be48324
2021-10-29 13:42:59 -07:00
Elias Ellison
fc82ad186a Add Initial NNC Dynamic Shapes Flow (#66136)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66136

FOR REVIEWERS: this is ready to review, test failures comes from somewhere else in stack..

Takes in a TensorExprGraph of static shapes and generalizes the input shapes
to symbolic dimensions. Dimensions of value 1 will be preserved, otherwise
dimensions with the same value will be bucketed to the same symbolic shape.

E.g. `Tensor(5, 3), Tensor(3, 1) -> Tensor(SS(-1), SS(-2)), Tensor(SS(-2), 1)`

From there, runs symbolic shape inference on the graph, and creates a
versioning if in the graph with prim::TensorExprDynamicGuard checking if
the inputs at runtime match the Generalized Symbolic Shapes that are inputs
to the TE Kernel. The computate to calculate all symbolic dimensions is
inlined in to the if block with the TE Kernel. All Sym Dim Value* are
appended to the end of the TE Kernel Graph/Node inputs, and the Node is
augmented with a integer list attr `symbolic_shape_inputs` that gives the
mapping from Value * -> Symbolic Shape int64_t value. For more lengthy IR
examples and walkthrough look at ShapeAnalysisTest.DynamicShapesFusion in
`test_shape_analysis` Returns True on Success, False on Failure, can fail if
shape propagation fails to propagate # of dims or if complete shapes on
inputs not set.

Example transformation
```
graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %3 : Tensor = prim::TensorExprGroup_0(%x_inp, %y_inp, %z_inp)
  return ()
with prim::TensorExprGroup_0 = graph(%x.1 : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y.1 : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %3 : int = prim::Constant[value=0]()
  %4 : Tensor = aten::tanh(%x.1)
  %5 : Tensor = aten::erf(%4)
  %6 : Tensor = aten::relu(%y.1)
  %7 : Tensor[] = prim::ListConstruct(%5, %6)
  %8 : Tensor = aten::cat(%7, %3)
  %9 : Tensor = aten::hardswish(%8)
  %10 : Tensor = aten::mul(%9, %z)
  return (%9)
```
->

```
  graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %4 : bool = prim::TensorExprDynamicGuard[types=[Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)]](%x_inp, %y_inp, %z_inp)
  %5 : Tensor = prim::If(%4)
    block0():
      %15 : int[] = aten::size(%x_inp)
      %16 : int[] = aten::size(%y_inp)
      %17 : int = prim::Constant[value=1]()
      %18 : int = prim::Constant[value=0]()
      %elem.3 : int = aten::__getitem__(%15, %18) # <string>:40:10
      %elem.5 : int = aten::__getitem__(%15, %17) # <string>:40:10
      %elem.11 : int = aten::__getitem__(%16, %18) # <string>:40:10
      %cat_dim_size.48 : int = aten::add(%elem.3, %elem.11) # <string>:321:29
      %3 : Tensor = prim::TensorExprGroup_0[symbolic_shape_inputs=[-5, -4, -3, -2]](%x_inp, %y_inp, %z_inp, %cat_dim_size.48, %elem.11, %elem.5, %elem.3)
      -> (%3)
    block1():
      %14 : Tensor = prim::FallbackGraph_1(%x_inp, %y_inp, %z_inp)
      -> (%14)
  return ()
  with prim::TensorExprGroup_0 = graph(%x.1 : Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu),
        %y.1 : Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu),
        %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu),
        %SS_5 : int,
        %SS_4 : int,
        %SS_3 : int,
        %SS_2 : int):
    %3 : int = prim::Constant[value=0]()
    %4 : Tensor(SS(-2), SS(-3)) = aten::tanh(%x.1)
    %5 : Tensor(SS(-2), SS(-3)) = aten::erf(%4)
    %6 : Tensor(SS(-4), SS(-3)) = aten::relu(%y.1)
    %7 : Tensor[] = prim::ListConstruct(%5, %6)
    %8 : Tensor(SS(-5), SS(-3)) = aten::cat(%7, %3)
    %9 : Tensor(SS(-5), SS(-3)) = aten::hardswish(%8)
    %10 : Tensor(SS(-5), SS(-3)) = aten::mul(%9, %z)
    return (%9)
```

Test Plan: Imported from OSS

Reviewed By: navahgar, anjali411

Differential Revision: D31797466

Pulled By: eellison

fbshipit-source-id: b508d2f5baef6e8e4020955ab1d4bc4b9c7bdfdd
2021-10-28 17:09:03 -07:00
Zhengxu Chen
0795735351 [jit] Clean up unneeded virtual methods from Function interface. (#65968)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65968

tryToGraphFunction() should cover all cases and more composable than
adhoc virtual methods.
ghstack-source-id: 141759214

Test Plan: no behavior change.

Reviewed By: gmagogsfm

Differential Revision: D31326154

fbshipit-source-id: 692a35df424f7d4f777a96489c4cbb24b3ae7807
2021-10-28 12:28:48 -07:00
Bin Bao
2366948085 [LT] Add ir_util for ComputePostOrder (#67282)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67282

Test Plan: `build/bin/test_lazy`

Reviewed By: wconstab, ngimel

Differential Revision: D31961754

Pulled By: desertfire

fbshipit-source-id: 28466588ece8057640a7202b8c79cc1a4357d373
2021-10-28 08:17:52 -07:00
Zhengxu Chen
b55a2500d2 [jit] Remove graph() call from abstract Function interface. (#65967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65967

Graph is an implementation detail. If user wants to get access to the
underlying graph, they should be able to explicitly dynamic cast instead.
ghstack-source-id: 141659819

Test Plan: no behavior change.

Reviewed By: gmagogsfm

Differential Revision: D31326153

fbshipit-source-id: a0e984f57c6013494b92a7095bf5bb660035eb84
2021-10-27 11:54:26 -07:00
Pavithran Ramachandran
1ce500f56f [easy][PyTorch] Use at::native::is_nonzero (#67195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67195

Now that `is_nonzero` is part of `at::native` refer https://github.com/pytorch/pytorch/pull/66663, replacing `TensorCompare::is_nonzero` to `at::native::is_nonzero`

ghstack-source-id: 141514416

Test Plan: CI

Reviewed By: larryliu0820

Differential Revision: D31704041

fbshipit-source-id: 36813e5411d0aa2eb2d0442e2a195bbed417b33d
2021-10-26 12:40:32 -07:00
Michael Shi
ad5731cacc [PyTorch] Add flop count for bmm and baddbmm (#66636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66636

Add FLOP count for bmm and baddbmm, which is `2*b*m*n*k`.

Reviewed By: ngimel

Differential Revision: D31622061

fbshipit-source-id: f3e1e1e34c45228693117b81647fb4a623c4085b
2021-10-25 17:31:12 -07:00
Zhengxu Chen
12daa4f663 [jit][edge] Enable CALL instruction in lite interpreter. (#65964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65964

ghstack-source-id: 141425519

Test Plan: buck run xplat/caffe2:test_lite_interpreter

Reviewed By: cccclai

Differential Revision: D31326149

fbshipit-source-id: 8a599d92f3fa4e6c125100adb36d89592e71e547
2021-10-25 14:44:33 -07:00
Nikolay Korovaiko
a7ebf76a15 jit trace (#59949)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59949

Reviewed By: ZolotukhinM

Differential Revision: D31366787

Pulled By: Krovatkin

fbshipit-source-id: 798cbcd97e8ecfba984f98cd70214954be9309af
2021-10-24 18:04:22 -07:00
Chen Lai
5f58764d1d [PyTorch Edge][type] Add type support for NamedTuple custom class (import) (#63130)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63130

Extend `type_parser` to handle `NamedTuple` type. It can be extended to handle other types when needed. The custom type will follow the following format:
```
"qualified_named[
    NamedTuple, [
        [filed_name_1, field_type_1],
        [filed_name_2, field_type_2]
    ]
]"
```
For example:
```
"__torch__.base_models.sparse_nn.pytorch_preproc_types.PreprocOutputType[
    NamedTuple, [
        [float_features, Tensor],
        [id_list_features, List[Tensor]],
        [label,  Tensor],
        [weight, Tensor],
        ]
    ]"
```

For nested types, the order of type lists from type table should be:
```
std::string type_1 = “__torch__.C [
    NamedTuple, [
        [field_name_c_1, Tensor],
        [field_name_c_2, Tuple[Tensor, Tensor]],
    ]
]”

std::string type_2 = “__torch__.B [
   NamedTuple, [
       [field_name_b, __torch__.C ]
   ]
]”

std::string type_3 = “__torch__.A[
   NamedTuple, [
       [field_name_a, __torch__.B]
   ]
]”
std::vector<std::string> type_strs = {type_str_1, type_str_2, type_3};
std::vector<TypePtr> type_ptrs =  c10::parseType(type_strs);
```

namedtuple from both `collection` and `typing` are supported
```

from typing import NamedTuple
from collections import namedtuple
```

This change only adds the parser and now new runtime can read the above format.
ghstack-source-id: 141293658

Test Plan:
```
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.CompatiblePrimitiveType'
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.CompatibleCustomType'

buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.InCompatiblePrimitiveType'
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.InCompatibleCustomType'
```

Reviewed By: iseeyuan

Differential Revision: D30261547

fbshipit-source-id: 68a9974338464e320b39a5c613dc048f6c5adeb5
2021-10-22 00:40:57 -07:00
David Berard
e86d8323cb [JIT] Add special cases for batch_norm, instance_norm in alias_analysis (#66554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66554

In native_functions.yaml, the schemas for batch_norm and instance_norm
are incorrect: the inputs `running_mean` and `running_var` are mutated,
but are not marked as such in the function schema. Since `(a!)?`
annotations are currently not working (see #65760), this instead adds a
special case to `alias_anaysis.cpp`. If the value of `training` or
`use_input_stats` is known to be `false`, then `alias_analysis` will
mark the input as _not_ being written to.

Test Plan:
Removed the `skip` annotation on the following test, and added a special
exception in `check_alias_annotations`:
```
python test/test_ops.py -k test_variant_consistency_jit_nn_functional_batch_norm
```

Also:
```
./build/bin/test_jit --gtest_filter="*BatchAndInstanceNormFixture*"
```

Imported from OSS

Reviewed By: eellison

Differential Revision: D31612339

fbshipit-source-id: 12ca61b782b9e41e06883ba080a276209dc435bb
2021-10-20 10:22:10 -07:00
Michael Suo
1bf0e1acb4 Revert D31732414: Add Initial NNC Dynamic Shapes Flow
Test Plan: revert-hammer

Differential Revision:
D31732414 (de4fe7a38c)

Original commit changeset: 290a94a667c2

fbshipit-source-id: 3021a1d7a8661967e37d4f9cfc86ed47cc4a7f3d
2021-10-19 20:05:29 -07:00
Elias Ellison
de4fe7a38c Add Initial NNC Dynamic Shapes Flow (#66136)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66136

FOR REVIEWERS: this is ready to review, test failures comes from somewhere else in stack..

Takes in a TensorExprGraph of static shapes and generalizes the input shapes
to symbolic dimensions. Dimensions of value 1 will be preserved, otherwise
dimensions with the same value will be bucketed to the same symbolic shape.

E.g. `Tensor(5, 3), Tensor(3, 1) -> Tensor(SS(-1), SS(-2)), Tensor(SS(-2), 1)`

From there, runs symbolic shape inference on the graph, and creates a
versioning if in the graph with prim::TensorExprDynamicGuard checking if
the inputs at runtime match the Generalized Symbolic Shapes that are inputs
to the TE Kernel. The computate to calculate all symbolic dimensions is
inlined in to the if block with the TE Kernel. All Sym Dim Value* are
appended to the end of the TE Kernel Graph/Node inputs, and the Node is
augmented with a integer list attr `symbolic_shape_inputs` that gives the
mapping from Value * -> Symbolic Shape int64_t value. For more lengthy IR
examples and walkthrough look at ShapeAnalysisTest.DynamicShapesFusion in
`test_shape_analysis` Returns True on Success, False on Failure, can fail if
shape propagation fails to propagate # of dims or if complete shapes on
inputs not set.

Example transformation
```
graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %3 : Tensor = prim::TensorExprGroup_0(%x_inp, %y_inp, %z_inp)
  return ()
with prim::TensorExprGroup_0 = graph(%x.1 : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y.1 : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %3 : int = prim::Constant[value=0]()
  %4 : Tensor = aten::tanh(%x.1)
  %5 : Tensor = aten::erf(%4)
  %6 : Tensor = aten::relu(%y.1)
  %7 : Tensor[] = prim::ListConstruct(%5, %6)
  %8 : Tensor = aten::cat(%7, %3)
  %9 : Tensor = aten::hardswish(%8)
  %10 : Tensor = aten::mul(%9, %z)
  return (%9)
```
->

```
  graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %4 : bool = prim::TensorExprDynamicGuard[types=[Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)]](%x_inp, %y_inp, %z_inp)
  %5 : Tensor = prim::If(%4)
    block0():
      %15 : int[] = aten::size(%x_inp)
      %16 : int[] = aten::size(%y_inp)
      %17 : int = prim::Constant[value=1]()
      %18 : int = prim::Constant[value=0]()
      %elem.3 : int = aten::__getitem__(%15, %18) # <string>:40:10
      %elem.5 : int = aten::__getitem__(%15, %17) # <string>:40:10
      %elem.11 : int = aten::__getitem__(%16, %18) # <string>:40:10
      %cat_dim_size.48 : int = aten::add(%elem.3, %elem.11) # <string>:321:29
      %3 : Tensor = prim::TensorExprGroup_0[symbolic_shape_inputs=[-5, -4, -3, -2]](%x_inp, %y_inp, %z_inp, %cat_dim_size.48, %elem.11, %elem.5, %elem.3)
      -> (%3)
    block1():
      %14 : Tensor = prim::FallbackGraph_1(%x_inp, %y_inp, %z_inp)
      -> (%14)
  return ()
  with prim::TensorExprGroup_0 = graph(%x.1 : Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu),
        %y.1 : Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu),
        %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu),
        %SS_5 : int,
        %SS_4 : int,
        %SS_3 : int,
        %SS_2 : int):
    %3 : int = prim::Constant[value=0]()
    %4 : Tensor(SS(-2), SS(-3)) = aten::tanh(%x.1)
    %5 : Tensor(SS(-2), SS(-3)) = aten::erf(%4)
    %6 : Tensor(SS(-4), SS(-3)) = aten::relu(%y.1)
    %7 : Tensor[] = prim::ListConstruct(%5, %6)
    %8 : Tensor(SS(-5), SS(-3)) = aten::cat(%7, %3)
    %9 : Tensor(SS(-5), SS(-3)) = aten::hardswish(%8)
    %10 : Tensor(SS(-5), SS(-3)) = aten::mul(%9, %z)
    return (%9)
```

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31732414

Pulled By: eellison

fbshipit-source-id: 290a94a667c20467717202a43c60e4f9ca4c00e2
2021-10-19 16:41:49 -07:00
gmagogsfm
147f7559b1 Add SourceView which doesn't own source text as base class of Source (#65309)
Summary:
This would save the cost copying text from stack to heap in some cases (like
parsing function schema during loading phase of libtorch.so)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65309

Reviewed By: swolchok

Differential Revision: D31060315

Pulled By: gmagogsfm

fbshipit-source-id: 0caf7a688b40df52bb4388c5191d1a42351d6f1a
2021-10-18 23:17:22 -07:00
Richard Barnes
e0643fa3fc use irange for loops 5 (#66744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66744

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31705358

fbshipit-source-id: d6ea350cbaa8f452fc78f238160e5374be637a48
2021-10-18 21:59:50 -07:00
Will Constable
d05c1ec007 Add lazy Node base and associated infra (#66601)
Summary:
- Adds Node base class and unit tests
- Also adds metadata utils to enable source code annotation and scope tracking

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66601

Test Plan: Add new unit tests

Reviewed By: desertfire

Differential Revision: D31634044

fbshipit-source-id: a042d54f06fbc480acfc63c18d43cb6fceb6fea5
2021-10-18 19:09:42 -07:00
Ivan Yashchuk
0d203a16fe Add relative and absolute tolerances for matrix_rank, pinv (#63102)
Summary:
This pull request introduces new keyword arguments for `torch.linalg.matrix_rank` and `torch.linalg.pinv`: `atol` and `rtol`.

Currently, only tensor overload has default values for either `atol` or `rtol`, the float overload requires both arguments to be specified.

FC compatibility: https://github.com/pytorch/pytorch/pull/63102#discussion_r710930509

Fixes https://github.com/pytorch/pytorch/issues/54151. Fixes https://github.com/pytorch/pytorch/issues/66618.

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63102

Reviewed By: H-Huang

Differential Revision: D31641456

Pulled By: mruberry

fbshipit-source-id: 4c765508ab1657730703e42975fc8c0d0a60eb7c
2021-10-17 22:15:42 -07:00
Xue Li
2f099c7555 Revert D30652629: use irange for loops
Test Plan: revert-hammer

Differential Revision:
D30652629 (687c2267d4)

Original commit changeset: 0ae6c4bbbb55

fbshipit-source-id: 5c4f067b584a021c8c9656454d1ee60999600fb3
2021-10-15 15:23:10 -07:00
Richard Barnes
687c2267d4 use irange for loops (#66234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.

bypass_size_limit
allow-large-files

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D30652629

fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
2021-10-15 13:50:33 -07:00
Scott Wolchok
e88d1c4f10 [PyTorch] Add tuple inline storage (#64066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64066

I noticed a bunch of time being spent heap-allocating Tuples
in the unpickler. 1-, 2-, and 3-element Tuples are apparently common
enough that they get their own bytecode instructions, so I decided to
try also giving them their own representation. We store up to 3
IValues inline in `Tuple` rather than doing a second heap allocation
for a `std::vector<IValue>`.
ghstack-source-id: 140695395

Test Plan:
Added automated tests for TupleElements.

Pixel 3 before: https://www.internalfb.com/intern/aibench/details/761596366576284
Pixel 3 after: https://www.internalfb.com/intern/aibench/details/591414145082422
We went from 347 ms to 302 ms.

Reviewed By: dhruvbird

Differential Revision: D30592622

fbshipit-source-id: 93625c54c9dca5f765ef6d5c191944179cb281a8
2021-10-15 12:16:51 -07:00
Rohan Varma
06fa6c15c0 Back out "Revert D31299350: Back out "Revert D31005792: [NCCL] Init dummy NCCL comms in constructor"" (#66393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66393

Third try!

Fixes:
- test_nccl_timeout can be flaky because of 1s timeout, bump up the timeout to resolve the flakiness. But in general we should not have been relying on time.sleep for this test, filed https://github.com/pytorch/pytorch/issues/66354 to track that.
- ciflow/all did not actually run tests due to a bug causing multigpu tests to not be run. This has since been fixed.
ghstack-source-id: 140560113

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D31534735

fbshipit-source-id: 8b7e0f4fed3972b7a77cbcda28876c9eefb0c7e2
2021-10-14 22:23:22 -07:00
soulitzer
93d326c868 Add InplaceOrView boxed kernel (#63878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63878

See https://github.com/pytorch/pytorch/issues/64407, https://github.com/pytorch/pytorch/issues/62032 for context:

In this PR:
 - Add boxed kernel by replicating `gen_inplace_or_view`'s logic that is ONLY for use with the Autograd not-implemented kernel
   - Unlike `gen_inplace_or_view` we always pass a view_func to as_view in order to ensure that an "derivative is not implemented" error is raised even if an in-place update is performed on the view. Without the `view_func`, the CopySlice + AsStridedBackward nodes would replace the NotImplemented node.
   - This limitation makes it impossible to use this node for general use
   - view relationship must be between first input (must be tensor) and first output (may be tensor or vec of tensor)
   - do not support non-differentiable views (_values, _indices, view.dtype) - view relationship is always fw and bw differentiable
 - Adds the macro `#define REGISTER_AUTOGRAD_NOT_IMPLEMENTED_FALLBACK(ns, op)` to be the interface for this feature:
   - static initialization can be slowed down(? not measured) if there are many registrations, because each line translates to 2 library calls but the workaround is just to manually use the two functions `AutogradNotImplementedFallback` and `ADInplaceOrViewFallback` and call `m.impl`.
 - Adds testing:
    - for views: view relationship created
      -  performing in-place operation on the view, raises properly
      - trying to create two view relationships is not allowed,
      - single view relationship but not first input/first output should error
      - view relation created properly for tensor vector output
    - for in-place:
      - version count bump
      - triggers rebase_history
      - multiple mutations is okay and also updates version counter
 - TODO (follow up): Update tutorials for adding  third-party operators (and document the above limitations)
 - TODO (follow up): Look at torch-audio/torch-vision and identify places where this can simplify existing code

EDIT: Made it more clear what is introduced in this PR and moved some more contextual stuff into the issue itself

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30901714

Pulled By: soulitzer

fbshipit-source-id: 48de14c28be023ff4bd31b7ea5e7cba88aeee04c
2021-10-12 18:55:50 -07:00
Kimish Patel
c6216b2a43 Back out "Revert D30710710: [Pytorch Edge] Support profiling kineto events from external source" (#66421)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66421

Original commit changeset: ab6bb8fe4e83

Plus this incldes BUILD.bazel changes, the reason for the revert.

Test Plan: See original diff

Reviewed By: gdankel

Differential Revision: D31542513

fbshipit-source-id: ee30aca2d6705638f97e04b77a9ae31fe5cc4ebb
2021-10-12 10:55:29 -07:00
Animesh Jain
cc24e4e5d0 [NNC] Normalize loops in SplitWithTail (#66242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66242

While working on random test generation, I observed that many simple transformations were upsetting vectorization. Digging deeper, I found that it calls SplitWithTail which incorrectly splits the loop when the loop start is not zero. This path normalizes the loop before we start splitting it.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D31506853

Pulled By: anijain2305

fbshipit-source-id: 5c5f2568ce0a239bfaa515458be52541eafd23b1
2021-10-11 13:44:05 -07:00
Nikita Shulga
c373387709 Update CMake and use native CUDA language support (#62445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62445

PyTorch currently uses the old style of compiling CUDA in CMake which is just a
bunch of scripts in `FindCUDA.cmake`. Newer versions support CUDA natively as
a language just like C++ or C.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31503350

fbshipit-source-id: 2ee817edc9698531ae1b87eda3ad271ee459fd55
2021-10-11 09:05:48 -07:00
Jane Xu
0a48f56318 Revert D31299350: Back out "Revert D31005792: [NCCL] Init dummy NCCL comms in constructor"
Test Plan: revert-hammer

Differential Revision:
D31299350 (f1f3bd8c36)

Original commit changeset: 9ad5c8fa17f7

fbshipit-source-id: d63d889922f507a4a0e2e042e451b95b9591c317
2021-10-08 17:55:28 -07:00
Jane Xu
c62ed96496 Revert D30710710: [Pytorch Edge] Support profiling kineto events from external source
Test Plan: revert-hammer

Differential Revision:
D30710710 (c1343ff706)

Original commit changeset: 51399f9b0b64

fbshipit-source-id: ab6bb8fe4e83ed1052e621e427259192a4f0f540
2021-10-08 17:46:18 -07:00
Rohan Varma
f1f3bd8c36 Back out "Revert D31005792: [NCCL] Init dummy NCCL comms in constructor" (#65883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65883

Original commit changeset: d8e962b8aab6
ghstack-source-id: 139836954

Test Plan: ci

Reviewed By: zhaojuanmao

Differential Revision: D31299350

fbshipit-source-id: 9ad5c8fa17f7038ba579cb1eda6d9271ac07a130
2021-10-08 16:04:20 -07:00
Kimish Patel
c1343ff706 [Pytorch Edge] Support profiling kineto events from external source (#64397)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64397

This diff exposes a way to add events to kineto profiler from external
source.
This can be a backend that executes a subgraph and wants to record this
execution in kineto profiler.
This diff also adds "backend" metadata to identify the backend an event
would have executed on.

Test Plan:
test_lite_interpreter

Imported from OSS

Reviewed By: raziel

Differential Revision: D30710710

fbshipit-source-id: 51399f9b0b647bc2d0076074ad4ea9286d0ef3e2
2021-10-08 15:59:42 -07:00
Raghavan Raman
92ce188510 Revert D31445799: [nnc] Use given kernel function name while emitting code
Test Plan: revert-hammer

Differential Revision:
D31445799 (c30dc52739)

Original commit changeset: 8d1642098313

fbshipit-source-id: 6b9d8c816437e9fcba8eb19cc683bc0a46a04cf5
2021-10-08 12:39:01 -07:00
Raghavan Raman
2e6fa0261f Revert D31445797: [nnc] Added a cache to use singleton instances of PytorchLLVMJIT for every triple,cpu,attrs combination
Test Plan: revert-hammer

Differential Revision:
D31445797 (7e5ef5e517)

Original commit changeset: 4e1450100928

fbshipit-source-id: fc13b34dbb66c7a22816eb46cf6d98ae9f332d39
2021-10-08 12:38:59 -07:00
Scott Wolchok
2d885ab73d [jit] Reduce refcounting of Types (#65345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65345

FooType::get() can return a const reference. Inconveniently, converting shared_ptr<FooType> to shared_ptr<Type> requires a copy & refcount bump, so to properly take advantage of this in unshapedType() we need to take a const Type& in isSubtypeOf(), which is good practice anyway -- don't require a shared_ptr if you don't need to take ownership.
ghstack-source-id: 140044165

Test Plan:
CI

perf says c10::unshapedType time decreased from 2.8% to 2.2% during static runtime startup, though I expect this to be generally beneficial.

Reviewed By: hlu1

Differential Revision: D31027361

fbshipit-source-id: 676feb81db9f74ad7b8651d8774f4ecb4cfa6ab8
2021-10-08 09:03:04 -07:00
Raghavan Raman
7e5ef5e517 [nnc] Added a cache to use singleton instances of PytorchLLVMJIT for every triple,cpu,attrs combination (#66217)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66217

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D31445797

Pulled By: navahgar

fbshipit-source-id: 4e1450100928132ccce4ef3c6c20ad6661cfabed
2021-10-07 13:17:11 -07:00
Raghavan Raman
c30dc52739 [nnc] Use given kernel function name while emitting code (#66216)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66216

Test Plan: Imported from OSS

Reviewed By: dagitses, priyaramani

Differential Revision: D31445799

Pulled By: navahgar

fbshipit-source-id: 8d164209831339d364710b14f6a263a16e108281
2021-10-07 13:15:46 -07:00
Will Constable
a8c0b362ce [pytorch][PR] Add hash and int128 utils for Lazy Tensor Core" (#66181)
Summary:
These utils are prerequisites for Lazy Node base class.
- set up new torch/csrc/lazy, test/cpp/lazy dirs
- add source files to build_variables.bzl in new lazy_core_sources var
- create new test_lazy binary

Fixes https://github.com/pytorch/pytorch/issues/65636

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66181

Original commit changeset: 3d0d5377d71e

Test Plan:
Run PyTorch XLA corresponding PR in XLA CI:
https://github.com/pytorch/xla/pull/3148/files

Reviewed By: suo

Differential Revision: D31416438

fbshipit-source-id: 58a6a49c5bc30134bc6bae2e42778f359b9a8f40
2021-10-07 10:05:26 -07:00
Chen Lai
a5895f85be [PyTorch Edge][type] Add type check in compatibility api (#63129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63129

1. Add an api to get `supported_types` from runtime, expose in c++ only.
2. Add an api to get `contained_types` from model, expose in both c++ and PyThon.
3. Add a field `contained_types_` in `type_parser.cpp` to track the contained types when parsing python string.
4. Expand `is_compatible` api to check type. When checking type, it will check the contained type list from the model with the support type list from runtime.
5. Expand the unittest for compatibility to cover type
6. Add unit test in python to check type list
ghstack-source-id: 139826944

Test Plan:
```
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.GetContainTypes'

buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.isCompatibleSuccess'
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.isCompatibleFail'

buck test //caffe2/test:mobile
```

Reviewed By: iseeyuan

Differential Revision: D30231419

fbshipit-source-id: 8427f423ec28cc5de56411f15fd960d8595d6947
2021-10-06 02:23:44 -07:00
Michael Suo
f062def486 Revert D31260343: [pytorch][PR] Add hash and int128 utils for Lazy Tensor Core
Test Plan: revert-hammer

Differential Revision:
D31260343 (e94fea08d0)

Original commit changeset: 8bb1194188e3

fbshipit-source-id: 3d0d5377d71ed928015bcb2105801be368e38cd8
2021-10-05 17:15:50 -07:00
Will Constable
e94fea08d0 Add hash and int128 utils for Lazy Tensor Core (#65635)
Summary:
These utils are prerequisites for Lazy Node base class.

- set up new torch/csrc/lazy, test/cpp/lazy dirs
- add source files to build_variables.bzl in new lazy_core_sources var
- create new test_lazy binary

Fixes https://github.com/pytorch/pytorch/issues/65636

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65635

Reviewed By: alanwaketan

Differential Revision: D31260343

Pulled By: wconstab

fbshipit-source-id: 8bb1194188e3e77fc42e08a14ba37faed37a9c2e
2021-10-05 16:43:55 -07:00
Nikita Shulga
4c4525fa5c Compile without -Wno-unused-variable (take 2) (#66041)
Summary:
Delete `-Wno-unused-variable` from top level `CMakeLists.txt`
Still suppress those warnings for tests and `torch_python`

Delete number of unused variables from caffe2 code
Use `(void)var;` to suppress unused variable in range loops
Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants

Do not delete `caffe2::OperatorBase::Output` calls as they have side effects

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66041

Reviewed By: ngimel

Differential Revision: D31360142

Pulled By: malfet

fbshipit-source-id: 6fdfb9f91efdc49ca984a2f2a17ee377d28210c8
2021-10-04 20:39:39 -07:00
Don Jang
7941590a51 [JIT] Selectively enable precise alias analysis for TupleConstruct (#66025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66025

This change adds an option to selectively enable precise alias analysis for `prim::`TupleConstruct` (introduced by D30437737 (cd458fe092)) to minimize its exposure only to `StaticRuntime` as of now.

Test Plan: Modified existing unit tests whose behavior depends on D30437737 (cd458fe092).

Reviewed By: eellison

Differential Revision: D31350285

fbshipit-source-id: 3ce777f07f99650d74634481ad0805192dce55c6
2021-10-01 20:42:22 -07:00
Nikita Shulga
e4ee5ca698 Revert D31326599: [pytorch][PR] Compile without -Wno-unused-variable
Test Plan: revert-hammer

Differential Revision:
D31326599 (a6280ab653)

Original commit changeset: 924155f1257a

fbshipit-source-id: b8ee5bc0298637443232f5ee9ec79e51ed256faf
2021-10-01 20:40:47 -07:00
Nikita Shulga
a6280ab653 Compile without -Wno-unused-variable (#65954)
Summary:
Delete `-Wno-unused-variable` from top level `CMakeLists.txt`
Still suppress those warnings for tests and `torch_python`

Delete number of unused variables from caffe2 code
Use `(void)var;` to suppress unused variable in range loops
Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65954

Reviewed By: ngimel

Differential Revision: D31326599

Pulled By: malfet

fbshipit-source-id: 924155f1257a2ba1896c50512f615e45ca1f61f3
2021-10-01 17:40:47 -07:00
Hariom Narang
2828ce53fd Added jit log stream changing function and some refactor (#65768)
Summary:
Description:
- Have only added `stdout` and `stderr` as possible options from python
  API for now. We can do file path passing later maybe.
- Put the class `JitLoggingConfig` in the cpp file as none of its methods were being used outside of this file.

Python API:
`torch._C._jit_set_logging_stream('stdout|stderr')`
C++ API:
`::torch::jit::set_jit_logging_output_stream(ostream);`

Testing:
- Tested python API locally.
- Unit test for the C++ API is written

Fixes https://github.com/pytorch/pytorch/issues/54182

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65768

Reviewed By: mrshenli

Differential Revision: D31291739

Pulled By: ZolotukhinM

fbshipit-source-id: eee72edc20488efad78a01c5b0ed8a132886a08d
2021-09-30 23:25:11 -07:00
Michael Suo
33c03cb61a [deploy][1/n] Make deploy code conform to PyTorch style. (#65861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65861

First in a series. This PR changes the code in deploy.h/cpp and
interpreter_impl.h/cpp to be camel case instead of snake case. Starting
with this as it has the most impact on downstream users.

Test Plan: Imported from OSS

Reviewed By: shannonzhu

Differential Revision: D31291183

Pulled By: suo

fbshipit-source-id: ba6f74042947c9a08fb9cb3ad7276d8dbb5b2934
2021-09-30 22:59:47 -07:00
Mikhail Zolotukhin
3a0165da49 [TensorExpr] Port NNC lowerings to the new registry mechanism. (#65551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65551

Previously we had a big switch on Op kind to decide how to lower a given
JIT operator to NNC. This PR changes this switch to a hash table lookup.

Why? This helps us with at least two things:
1) With this approach we can easily check if we know how to handle a
given node in advance - i.e. we can inspect the entire graph and tell
whether it's possible to compile it or not without actually trying to do
that and dying in the middle. This would allow us to, say, provide
user-friendly error messages in AOT workflow.
2) We can switch to use schema instead of op kind to determine correct
lowering. Unlike op schema, op kind might be ambigous (see e.g. #64963)
and using it instead of schema can lead to bugs.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31148926

Pulled By: ZolotukhinM

fbshipit-source-id: ac12684e2126c899426ef5e4cc1e3f70fa01f704
2021-09-30 22:56:18 -07:00
Don Jang
cd458fe092 [JIT] Make output of prim::TupleConstruct alias only with its inputs (#64879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64879

This change makes the output of `prim::TupleConstruct` alias only with its inputs *when* the created tuple is directly returned from the graph.

The same treatment could be made to any tuples newly constructed by `prim::TupleConstruct` if they do not let their elements escape. However, this change only focuses on only one simplest, but frequently used usecase: tuples constructed only to be returned from a graph. This usecase turns out to be very often used.

Test Plan:
Added
- `AliasMoveForTupleConstructWithSingleUseAsGraphOutput`
- `WildcardAliasForTupleConstructWithUses`

to cover the newly added code.

Reviewed By: eellison

Differential Revision: D30437737

fbshipit-source-id: 417fbc6bc348062e60e7acdddd340d4754d090eb
2021-09-29 21:56:31 -07:00
Mike Ruberry
91f8755b0e Revert D31005792: [NCCL] Init dummy NCCL comms in constructor
Test Plan: revert-hammer

Differential Revision:
D31005792 (2b22a5dde2)

Original commit changeset: c2c582dee25a

fbshipit-source-id: d8e962b8aab6fda8a6c013e8577492dff9568c27
2021-09-29 20:46:38 -07:00
Rohan Varma
2b22a5dde2 [NCCL] Init dummy NCCL comms in constructor (#65173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65173

Initializes dummy NCCL communicators in constructor for a basic health
check that communicators can be initialized prior to launching the first
collective.

After successful init, we immediately use `ncclCommAbort` to destroy these
communicators to ensure they don't interfere with regular communicator creation
during collectives.

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D31005792

fbshipit-source-id: c2c582dee25a098361ead6ef03f541e7833c606b
2021-09-29 15:36:54 -07:00
Scott Wolchok
ece25c453f [PyTorch] Store Argument::alias_info_ on the heap (#64824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64824

See comment in function_schema.h for explanation. I claim that this is a good tradeoff because the aliasing information seems to be used only in compiler-ish code paths, where performance isn't as critical as actual execution. If performance is important there too, perhaps we should hoist isWrite into the Argument itself since there are several paths that only care about isWrite.
ghstack-source-id: 138958896

Test Plan: CI, profile schema parsing on startup and see much fewer page faults in createArgumentVector.

Reviewed By: suo

Differential Revision: D30860719

fbshipit-source-id: 1d4d2328f2b8e34f5ddf9d82083fd4dd7b7f738f
2021-09-24 17:00:51 -07:00
Peter Bell
68e5935498 Remove fgrad_input from slow_conv2d (#64280)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64280

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30830887

Pulled By: jbschlosser

fbshipit-source-id: 5a3a79ad9d9118177672eabf872f9d9a3313ebe4
2021-09-24 14:27:39 -07:00
XiaobingSuper
1682722152 keep output type after calling SubgraphRewriter (#65453)
Summary:
For jit **SubgraphRewriter**, it doesn't keep output type after overwriting the old graph, for example, in profiling mode, the old graph has the old operator's shapes, but after replacing the old operator with a newer operator by applying **SubgraphRewriter**, the tensor shape info was eliminated.

The activation is that I want to replace pytorch convolution with a customer's convolution, I first register **aten::_convolution** as a profiler node that can reorder the input and output's shapes, and then using graph rewrite to replace it as **aten::conv2d**, which tensors' shapes info are eliminated. I hope using input size do some pre-progress before replacing **aten::conv2d** with the customer's convolution.

Before rewrite:
```
graph(%self.1 : __torch__.MyModule,
      %x.1 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu)):
  %7 : int = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/                      site-packages/torch/nn/modules/conv.py:443:0
  %6 : bool = prim::Constant[value=0](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6                      /site-packages/torch/nn/modules/conv.py:443:0
  %5 : bool = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6                      /site-packages/torch/nn/modules/conv.py:443:0
  %4 : NoneType = prim::Constant()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %2 : int[] = prim::Constant[value=[0, 0]]()
  %conv : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="conv"](%self.1)
  %z : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::clone(%x.1, %4) # jit_test.py:2                      2:0
  %weight : Float(3, 3, 1, 1, strides=[3, 1, 1, 1], requires_grad=0, device=cpu) = prim::GetAttr[name="weight"](%conv)
  %x : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::_convolution(%x.1, %weight, %4,                       %3, %2, %3, %6, %2, %7, %6, %6, %5, %5), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.                      6/site-packages/torch/nn/modules/conv.py:443:0
  %16 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::add(%x, %z, %7) # jit_test.py:                      24:0
  return (%16)
```
 after rewrite by using **aten::conv2d**
```
graph(%self.1 : __torch__.MyModule,
      %x.1 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu)):
  %7 : int = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %6 : bool = prim::Constant[value=0](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %5 : bool = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %4 : NoneType = prim::Constant()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %2 : int[] = prim::Constant[value=[0, 0]]()
  %conv : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="conv"](%self.1)
  %z : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::clone(%x.1, %4) # jit_test.py:22:0
  %weight : Float(3, 3, 1, 1, strides=[3, 1, 1, 1], requires_grad=0, device=cpu) = prim::GetAttr[name="weight"](%conv)
  %18 : Tensor = aten::conv2d(%x.1, %weight, %4, %3, %2, %3, %7)
  %16 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::add(%18, %z, %7) # jit_test.py:24:0
  return (%16)
```

expected result after replace **aten::_convolution** with  **aten::conv2d**:

```
graph(%self.1 : __torch__.MyModule,
      %x.1 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu)):
  %7 : int = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/                      site-packages/torch/nn/modules/conv.py:443:0
  %6 : bool = prim::Constant[value=0](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6                      /site-packages/torch/nn/modules/conv.py:443:0
  %5 : bool = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6                      /site-packages/torch/nn/modules/conv.py:443:0
  %4 : NoneType = prim::Constant()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %2 : int[] = prim::Constant[value=[0, 0]]()
  %conv : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="conv"](%self.1)
  %z : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::clone(%x.1, %4) # jit_test.py:2                      2:0
  %weight : Float(3, 3, 1, 1, strides=[3, 1, 1, 1], requires_grad=0, device=cpu) = prim::GetAttr[name="weight"](%conv)
  %18 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::conv2d(%x.1, %weight, %4, %3,                       %2, %3, %7)
  %16 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::add(%18, %z, %7) # jit_test.py                      :24:0
  return (%16)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65453

Reviewed By: zdevito

Differential Revision: D31162489

Pulled By: ZolotukhinM

fbshipit-source-id: 0d1c1d607cb612df47c64f173d9f4c9e8b1d6c49
2021-09-24 11:07:40 -07:00
kshitij12345
a012216b96 [nn] Fold : no batch dim (#64909)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64907
Reference: https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64909

Reviewed By: cpuhrsch, heitorschueroff

Differential Revision: D30991087

Pulled By: jbschlosser

fbshipit-source-id: 91a37e0b1d51472935ff2308719dfaca931513f3
2021-09-23 08:37:32 -07:00
jiej
127c9402d0 Revert "Revert D30752939: [pytorch][PR] nvfuser update" (#65137)
Summary:
This reverts commit 03389dc851.

Attempt again for PR: https://github.com/pytorch/pytorch/issues/63745
Fixes the windows build failure.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65137

Reviewed By: seemethere, dzhulgakov, heitorschueroff

Differential Revision: D30994556

Pulled By: malfet

fbshipit-source-id: f1925b6c5cc1a1a441a96499667c91e8dfc1b53d
2021-09-22 04:54:51 -07:00
Chen Lai
880098a7e3 [PyTorch Edge] Backport function for defaults args with out args, flag on (#63651)
Summary:
1. Enable support for operators with default args and out args. For `torch.add(x, h, out=x)`, the number of specified arguments will be 3 instead of 4.
2. Bump bytecode version from 6 to 7
3. Implement backport_v7_to_v6 function. Also slightly refactor the local_thread to allow re-emit operators.
4. unittest to cover backport function
5. Update expect result from 4 to 3 in unit test DefaultArgsWithOutArg to cover the number of specified arguments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63651

ghstack-source-id: 138539912

Test Plan:
```
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.BackPortByteCodeModelAllVersions
```

Reviewed By: raziel, tugsbayasgalan

Differential Revision: D30454080

fbshipit-source-id: 357c50b96682430675142d20d688d1f64e1de307
2021-09-20 22:50:30 -07:00
Mengwei Liu
eaf85fad62 [PyTorch] Extract parseOperator() into a standalone source file (#65179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65179

This is following up this PR: https://github.com/pytorch/pytorch/pull/61862. The purpose is to modularize operator parsing so that it can be used as needed without pulling the whole `import.cpp` into build.

Test Plan: Added a unit test in `test_lite_predictor.cpp` called `ParseOperators`, similar to `ParseBytecode`.

Reviewed By: iseeyuan

Differential Revision: D31006555

fbshipit-source-id: c38e221800af4cf72963a353c452c5437f56a0ac
2021-09-17 13:31:59 -07:00
Jane Xu
1ee66a5278 Remove CUDA 9.2 references conditionals and workarounds (#65070)
Summary:
Title says it all

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65070

Reviewed By: malfet

Differential Revision: D30966464

Pulled By: janeyx99

fbshipit-source-id: e454906fd5d7d321d390939ba5d237e1d9b150f8
2021-09-17 12:28:23 -07:00
Raghavan Raman
bbe25af0df [nnc] Updated inlining to handle cases when producer indices are constants after eval (#65044)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65044

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30954655

Pulled By: navahgar

fbshipit-source-id: dfaedb5af710b2625ceec3a443a6c4e34158ab16
2021-09-17 11:28:48 -07:00
Raghavan Raman
03fc636d5c [nnc] Updated inliner to remove assertions and exception (#64719)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64719

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30828583

Pulled By: navahgar

fbshipit-source-id: 9826a59085a210e44d101a843ff2cae440dfd633
2021-09-17 11:28:46 -07:00
Edward Yang
9601deb1b3 Disable autograd fallback tests on Windows (#65147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65147

I think they trigger an MSVC bug per https://github.com/pytorch/pytorch/issues/48763
ghstack-source-id: 138247203

Test Plan: breakpointed https://www.internalfb.com/intern/sandcastle/job/9007199738584981/ and sush'ed into the host and ran `buck build arvr/mode/win/opt //xplat/caffe2:autograd_libtorch_test_ovrsource` in `/cygdrive/d/ovrsource-null-hg`

Reviewed By: soulitzer

Differential Revision: D30992685

fbshipit-source-id: 06c6fb2c18d55490f89fc91ee5b7a4c5a7faf1c6
2021-09-17 08:32:43 -07:00
Eli Uriegas
03389dc851 Revert D30752939: [pytorch][PR] nvfuser update
Test Plan: revert-hammer

Differential Revision:
D30752939 (cfaecaf40b)

Original commit changeset: ce122e80f01b

fbshipit-source-id: 57685df8f9946032a06eff1de8a3d1498500d2d2
2021-09-15 17:38:47 -07:00
Mikhail Zolotukhin
7e9c599784 [TensorExpr] Add a method for sanitizing Var and Buf names in Stmt. (#65010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65010

This pass ensures all names are legal and not-duplicated.

Fixes #52727.

Test Plan: Imported from OSS

Reviewed By: bertmaher, navahgar

Differential Revision: D30939717

Pulled By: ZolotukhinM

fbshipit-source-id: 7dbe7f937de41f22ad49137a5e067d698443ed63
2021-09-15 17:15:06 -07:00
jiej
cfaecaf40b nvfuser update (#63745)
Summary:
Syncing nvfuser code base from devel branch, Listing a few of our development since last sync:

- Extends support to normalization and reduction kernels.
- Multiple kernel launch for single `CudaFusionGroup`. Hierarchical caching system has been updated to cache graph segmentation.
- profile_ivalue is enabled to convert dynamic scalar into compile time constants, which are required by the codegen. (e.g. reduction axes).

To keep this PR simple and relatively review-free. We stripped most external changes and submitted them as separate PRs, so this gigantic PR is easier to handle.

internal updates are files located in:
1. updates in nvfuser codegen `torch/csrc/jit/coddgen/cuda`
2. added nvfuser specific benchmarks `benchmarks/cpp/nvfuser`
3. nvfuser jit cpp tests `test/cpp/jit/test_gpu.cpp` `test/cpp/jit/test_gpu_shift.cpp` `test/cpp/jit/test_gpu_validator.h`

updates affecting integration:

1. profile_ivalue enabled for nvfuser. related changes are in `torch/csrc/jit/runtime/*`,
2. exposed a few more symbols `aten/src/ATen/core/*` used by codegen

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63745

Reviewed By: saketh-are

Differential Revision: D30752939

Pulled By: malfet

fbshipit-source-id: ce122e80f01bcd3865f5bd3c4dfde660665fd84c
2021-09-15 14:42:55 -07:00
Mikhail Zolotukhin
f23f21dafe [TensorExpr] Remove 'Placeholder' class. (#64887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64887

BufHandle has exactly the same functionality and should be used instead.

Differential Revision:
D30889483
D30889483

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 365fe8e396731b88920535a3de96bd3301aaa3f3
2021-09-14 00:22:44 -07:00
Martin Yuan
30a7c768d7 [RFC] Modularize functions of parsing bytecode (#61862)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61862

Modularize functions of parsing bytecode tables so that they can be used as needed in situations other than mobile lite interpreter.
* The decoupled functions are re-used by current lite interpreter loader.
* The bytecode can be serialized/deserialized from other formats.
* The decoupled functions have minimum dependencies on other PyTorch components.

Next:
Build a driver binary to include the parser and interpreter, but only has necessary dependency on other PyTorch components.
ghstack-source-id: 137867287

Test Plan:
As an example, a simple bytecode is parsed to a mobile function, and directly run in the added unit test, `RunTimeTest:ParseBytecode`. It contains basic control flow (if, else) and basic data orchestration (list construction).
CI

Reviewed By: larryliu0820

Differential Revision: D29798382

Pulled By: iseeyuan

fbshipit-source-id: 1c173a5f5d37097e3a97baec3f3e48e1eea1400f
2021-09-11 22:24:05 -07:00
Mikhail Zolotukhin
180e4fbfae [TensorExpr] LLVMCodegen: fix lowering for UInt->Float casts. (#64862)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64862

Previously we erroneously were looking at dst signedness. This was
discovered when we tried to implement quantize/dequantize ops.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30881696

Pulled By: ZolotukhinM

fbshipit-source-id: 34af842e5e52a3b6b5d2e70c4ef32f910a20341f
2021-09-11 09:24:36 -07:00
Hui Guo
4481c87ac4 [tensorexpr] Simplify x/100 -> 0 if x is a non-negative integer less than 100. (#64763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64763

Simplification pattern:
  x/N -> 0; N is a constant positive integer and x is a for-loop index whose range is a subset of [0, N).

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30845854

Pulled By: huiguoo

fbshipit-source-id: 814d69ed4be05e57405c222183cc1c6c526721cd
2021-09-10 20:33:02 -07:00
Raghavan Raman
cad7a4b0ea [nnc] Added an implementation of sign op (#64033)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64033

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30579197

Pulled By: navahgar

fbshipit-source-id: f9f7fa7f2ffa109cf4e441eb1af821b8b891d4d3
2021-09-10 16:49:04 -07:00
Mikhail Zolotukhin
a17d6c7f80 [TensorExpr] Simplify TE IR before applying any transformations. (#64717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64717

This also exposed several bugs, which are fixed in this PR.

Differential Revision:
D30826408
D30826408

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: a67ec5739aceed9ffdf0d24f77eb3787cefe4560
2021-09-09 18:50:51 -07:00
Raghavan Raman
b7c86365d1 [nnc] Handled cast in index expression during inlining (#64716)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64716

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30826388

Pulled By: navahgar

fbshipit-source-id: 7e446602f650527e0d954e437f0370602019e040
2021-09-09 08:30:52 -07:00
Raghavan Raman
652a8bf7d0 [nnc] Updated indices during broadcast to use int64_t (#64627)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64627

This fixes the root cause of S242719

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30801686

Pulled By: navahgar

fbshipit-source-id: b6d3ebdc7eb57116eaced53c2f35c7798bb17e80
2021-09-09 08:29:37 -07:00
Elias Ellison
3bf93d769c [JIT] Add gradient check in constants (#64613)
Summary:
fixes internal issue

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64613

Reviewed By: Gamrix

Differential Revision: D30799016

Pulled By: eellison

fbshipit-source-id: 48ef52d1cac627919e6cd232216d24878a2a8b58
2021-09-09 08:13:57 -07:00
Hui Guo
5c27a580ec [tensorexpr] Allocate intermediate buffers at compile time (#64227)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64227

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30652220

Pulled By: huiguoo

fbshipit-source-id: cd75005cdfa42751318de7174b44e14a3a01634e
2021-09-08 15:34:44 -07:00
Peter Bell
d701357d92 Factor out TensorBase that doesn't depend on native operators (#63612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63612

This makes Tensor inherit from a new class TensorBase, that provides a subset of Tensor that doesn't
directly depend on native_functions.yaml. Code that only includes TensorBase.h with thus not need to
be rebuilt every time someone changes an operator signature.

Making `Tensor` inherit from this class means that `const TensorBase&` parameters will be callable
with an ordinary `Tensor`. I've also made `Tensor` constructible and assignable from `TensorBase` to
minimize friction in code mixing the two types.

To help enforce that `Tensor.h` and `Functions.h` aren't accidentally included, I've added an error
into `Operators.h` if `TORCH_ASSERT_NO_OPERATORS` is defined. We can either set this in the build
system for certain folders, or just define it at the top of any file.

I've also included an example of manually special-casing the commonly used `contiguous` operator.
The inline function's slow path defers to `TensorBase::__dispatch_contiguous` which is defined in
`Tensor.cpp`. I've made it so `OptionalTensorRef` is constructible from `TensorBase`, so I can
materialize a `Tensor` for use in dispatch without actually increasing its refcount.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30728580

Pulled By: ezyang

fbshipit-source-id: 2cbc8eee08043382ee6904ea8e743b1286921c03
2021-09-08 13:28:54 -07:00
Mikhail Zolotukhin
72274e2a2f [TensorExpr] Don't rely on exceptions in Vectorizer. (#64609)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64609

We've been using exceptions to indicate whether vectorization succeeded
or not, but that posed some problems with (e.g. we spent too much time
symbolicazing these exceptions). This change converts this mechanism to
a standard error return code.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30795342

Pulled By: ZolotukhinM

fbshipit-source-id: 16e38b37bcdd78ceb438ac814cc377f35b058e17
2021-09-08 00:25:34 -07:00
Maksim Levental
81fe2c5e49 add out variant of linear (#61801)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61801

resubmitting because the last one was unrecoverable due to making changes incorrectly in the stack

Test Plan: Imported from OSS

Reviewed By: desertfire

Differential Revision: D29812510

Pulled By: makslevental

fbshipit-source-id: ba9685dc81b6699724104d5ff3211db5852370a6
2021-09-07 19:58:52 -07:00
Ansley Ussery
6831d8e379 Support Union in TorchScript (#64234)
Summary:
This PR is created to replace https://github.com/pytorch/pytorch/pull/53180 PR stack, which has all the review discussions. Reason for needing a replacement is due to a messy Sandcastle issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64234

Reviewed By: gmagogsfm

Differential Revision: D30656444

Pulled By: ansley

fbshipit-source-id: 77536c8bcc88162e2c72636026ca3c16891d669a
2021-09-03 06:12:24 -07:00
Chen Lai
8d5b95019d [PyTorch Edge] Support default args with out arg, flag off (#63540)
Summary:
1. Allow consuming operators with defaults arguments and out arguments. Flag is off to keep the same behavior as v6, in pr 63651, turn on the flag.
2. Add two unittests to cover this type of operators.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63540

ghstack-source-id: 137211562

Test Plan:
```
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg
```

Reviewed By: raziel, iseeyuan, tugsbayasgalan

Differential Revision: D30414156

fbshipit-source-id: 0f3a219a22aee10ac53184cbd95940726c459d1f
2021-09-02 01:36:16 -07:00
Kimish Patel
468001600c Back out "Revert D30327514: [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling." (#64307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64307

Original commit changeset: 0b2aa7c57d08

Restores original changes.
This diff changes the way operator profiling is done in lite predictor
benchmarking binary.
Instead of using custom callbacks it uses KinetoEdgeCPUProfiler to profile
events and then generate operator level metric from it.
Since KinetoEvents do not contain cpu clock time, now we report only wallclock
time.
This unifies various profiling effort that we have for benchmarking purpose. In
production we will still use observer based mechanism, but the advantage of
using kineto profiler is that we get few other things for free, such as:
chrome trace generation.
operator level memory profiling (to be added)
flop counts (to be added)
Furthermore possible we can use python post processing script to parse chrome
trace and generate output similar to torch.profiler. (To be done)

Furthermore removes some tests from test_lite_interpreter.cpp which were testing module hierarchy in debug info. They should be covered by test_mobile_profiler.cpp.

Test Plan:
aibench run
Model without debug info:
https://www.internalfb.com/intern/aibench/details/219598441154763
Model with debug info and --print_module_info true (see Operator summary has now module hierarchy information).
https://www.internalfb.com/intern/aibench/details/617154236292985

Reviewed By: raziel

Differential Revision: D30680354

fbshipit-source-id: b6ba0d59c510c13d13d9935b1d8051cc82ffa4e9
2021-09-01 13:29:35 -07:00
Mikhail Zolotukhin
8337a3fb3f [TensorExpr] Wrap error messages with buildErrorMessage call. (#64330)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64330

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30687226

Pulled By: ZolotukhinM

fbshipit-source-id: ade1be2ad6847c6afbba60307ef854696821b4e3
2021-08-31 20:31:16 -07:00
Kimish Patel
67cb131458 Revert D30327514: [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling.
Test Plan: revert-hammer

Differential Revision:
D30327514 (bc9277dca3)

Original commit changeset: 3bb2f2daaaed

fbshipit-source-id: 0b2aa7c57d08de77c9aaa75e546a7d0938610f64
2021-08-31 08:30:36 -07:00
Kimish Patel
bc9277dca3 [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling. (#63367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63367

This diff changes the way operator profiling is done in lite predictor
benchmarking binary.
Instead of using custom callbacks it uses KinetoEdgeCPUProfiler to profile
events and then generate operator level metric from it.
Since KinetoEvents do not contain cpu clock time, now we report only wallclock
time.
This unifies various profiling effort that we have for benchmarking purpose. In
production we will still use observer based mechanism, but the advantage of
using kineto profiler is that we get few other things for free, such as:
- chrome trace generation.
- operator level memory profiling (to be added)
- flop counts (to be added)

Furthermore possible we can use python post processing script to parse chrome
trace and generate output similar to torch.profiler. (To be done)

Test Plan:
aibench run
Model without debug info:
https://www.internalfb.com/intern/aibench/details/219598441154763
Model with debug info and `--print_module_info true` (see Operator summary has now module hierarchy information).
https://www.internalfb.com/intern/aibench/details/617154236292985

Reviewed By: raziel

Differential Revision: D30327514

fbshipit-source-id: 3bb2f2daaaedfb04bd6f5d9c91292783f9c4344f
2021-08-30 20:54:51 -07:00
Will Constable
85df73658c Make name() part of IMethod interface (#63995)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63995

JIT methods already have name() in their interface, and Py methods have names in their implementation.  I'm adding this for a particular case where someone tried to use name() on a JIT method that we're replacing with an IMethod.

Test Plan: add case to imethod API test

Reviewed By: suo

Differential Revision: D30559401

fbshipit-source-id: 76236721f5cd9a9d9d488ddba12bfdd01d679a2c
2021-08-30 13:31:55 -07:00
Zhengxu Chen
ac99d63f83 [jit] Make operation call accept Stack& instead Stack* (#63414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63414

Misuse of raw pointer in here where stack is never nullable.
ghstack-source-id: 136938318

Test Plan:
compiles.

Imported from OSS

Reviewed By: ejguan

Differential Revision: D30375410

fbshipit-source-id: 9d65b620bb76d90d886c800f54308520095d58ee
2021-08-30 11:49:20 -07:00
Thomas J. Fan
d3bcba5f85 ENH Adds label_smoothing to cross entropy loss (#63122)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/7455

Partially resolves pytorch/vision#4281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63122

Reviewed By: iramazanli

Differential Revision: D30586076

Pulled By: jbschlosser

fbshipit-source-id: 06afc3aa1f8b9edb07fe9ed68c58968ad1926924
2021-08-29 23:33:04 -07:00
Bert Maher
2e6221a232 [nnc] Make 64-bit dimensions work (#64077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64077

We were assuming kernel dimensions fit in 32 bits (the old fuser made
this assumption too), but we should be able to support 64.
ghstack-source-id: 136933272

Test Plan: unit tests; new IR level test with huge sizes

Reviewed By: ZolotukhinM

Differential Revision: D30596689

fbshipit-source-id: 23b7e393a2ebaecb0c391a6b1f0c4b05a98bcc94
2021-08-28 19:59:47 -07:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
d0c63e857d Enhancement for smart serialization for out schemas (#63096)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63096

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D30415255

Pulled By: tugsbayasgalan

fbshipit-source-id: eb40440a3b46258394d035479f5fc4a4baa12bcc
2021-08-28 11:46:27 -07:00
Mikhail Zolotukhin
2d75ab0c8f [TensorExpr] Update tutorial. (#64109)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64109

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30614050

Pulled By: ZolotukhinM

fbshipit-source-id: e8f9bd9ef2483e6eafbc0bd5394d311cd694c7b2
2021-08-27 16:19:29 -07:00
soulitzer
90a6498a12 Add autograd not implemented boxed fallback (#63458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63458

See description and discussion from https://github.com/pytorch/pytorch/pull/62450

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30518572

Pulled By: soulitzer

fbshipit-source-id: 3b1504d49abb84560ae17077f0dec335749c9882
2021-08-27 15:00:28 -07:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
19c1b45f25 Detect out argument in the schema (#62755)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62755

After this change, out argument can be checked by calling is_out()

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30415256

Pulled By: tugsbayasgalan

fbshipit-source-id: b2e1fa46bab7c813aaede1f44149081ef2df566d
2021-08-27 11:20:33 -07:00
Jiewen Tan
ed573a8e08 Enable test_api IMethodTest in OSS (#63345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63345

This diff did the following few things to enable the tests:
1. Exposed IMethod as TORCH_API.
2. Linked torch_deploy to test_api if USE_DEPLOY == 1.
3. Generated torch::deploy examples when building torch_deploy library.

Test Plan: ./build/bin/test_api --gtest_filter=IMethodTest.*

Reviewed By: ngimel

Differential Revision: D30346257

Pulled By: alanwaketan

fbshipit-source-id: 932ae7d45790dfb6e00c51893933a054a0fad86d
2021-08-26 16:50:52 -07:00
Cheng Chang
0f6b524665 [NNC] Add C++ codegen backend to NNC (#62869)
Summary:
Adds a C++ codegen backend to NNC to generate C++ for CPU instead of generating LLVM IR.
Tensors are represented as blobs of float. Vector operations are devectorized/unrolled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62869

Test Plan:
https://github.com/pytorch/pytorch/tree/mvz-nnc-aot-prototype makes it able to AOT compile the whole MobileNetV3 model into binary code through LLVM codegen in NNC.

I forked that branch to https://github.com/cheng-chang/pytorch/tree/cc-aot-cpp, merged this PR into it, and modified `fancy_compile` to compile MobileNetV3 into C++ through

```
import torch

m = torch.jit.load('mobnet.pt')
m.eval()
f = torch.jit.freeze(m)
torch._C._fancy_compile(f.graph, [1, 3, 224, 224])
```

The generated C++ file `mobnet.cc` can be found at https://gist.github.com/cheng-chang/e2830cc6920b39204ebf368035b2bcec.

I manually compiled the generated C++ through `g++ -o mobnet -std=c++14 -L./build/lib -ltorch_cpu -ltorch mobnet.cc`, and it succeeded.

Reviewed By: ZolotukhinM

Differential Revision: D30149482

Pulled By: cheng-chang

fbshipit-source-id: e77b189f0353e37cd309423a48a513e668d07675
2021-08-26 09:56:37 -07:00
Raghavan Raman
6d31ba6ddc [nnc] Sanitized the names of constants in the input graph. (#63990)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63923

The input graph can contain constants whose names contain special characters. So, all names of constants in the input graph need to be sanitized.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63990

Reviewed By: ZolotukhinM

Differential Revision: D30558432

Pulled By: navahgar

fbshipit-source-id: de5b0c23d50ee8997f40f2c0fc605dda3719186f
2021-08-26 09:52:02 -07:00
Bert Maher
8dda299d96 Re-apply: [nnc] Support thread level parallelism in fused kernels (#63776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63776

I reverted this out of an abundance of caution because some test
failures occurred, but they were all due to precision issues fixed lower in
this stack.  Let's try again.

I've rolled the elimination of the allow-parallelism-in-fusions toggle into
this diff since they're pretty tightly coupled.
ghstack-source-id: 136529847

Test Plan: CI

Reviewed By: huiguoo

Differential Revision: D30484555

fbshipit-source-id: 38fd33520f710585d1130c365a8c60c9ce794a59
2021-08-24 18:56:55 -07:00
yanbing-j
33a163d886 Enable BFloat16 LeakyReLU and RReLU in CPU path (#61514)
Summary:
Enable and optimize BFloat16 LeakyReLU and RReLU in CPU path.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61514

Reviewed By: ejguan

Differential Revision: D30257612

Pulled By: VitalyFedyunin

fbshipit-source-id: 8cc0d1faacd02dcc9827af724a86d95b6952748f
2021-08-24 08:34:56 -07:00
Mike Iovine
1385f9fb12 [JIT] Add variadic stack op (#63578)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63578

Added a new op `prim::VarStack` and a pass that transforms instances of `aten::stack(list, dim)` into `prim::VarStack(list[0], ..., list[n], dim)`. Also provided a JIT interpreter implementation.

Most of the implementation/tests are the same as `prim::VarConcat`.

Test Plan: `buck test caffe2/test/cpp/jit:jit -- TestStackOpt`

Reviewed By: navahgar

Differential Revision: D30426232

fbshipit-source-id: 9829a7db6e0a5038c9b7528c43c25b0c221aa2ce
2021-08-24 08:20:54 -07:00
Mikhail Zolotukhin
f0d274294d [TensorExpr] Nuke KernelArena and KernelScope. (#63587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63587

Now that there is no classes using KernelArena for memory management we
can remove it.

Differential Revision:
D30429115
D30429115

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 375f6f9294d27790645eeb7cb5a8e87047a57544
2021-08-24 00:32:16 -07:00
Mikhail Zolotukhin
62d02f2b57 [TensorExpr] Make 'Tensor' a value type. (#63586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63586

This is another commit in transition from KernelArena memory management.
Tensor is essentially just a pair of <BufPtr, StmtPtr> and we don't need
to dynamically allocate it at all - it's cheap to pass it by value, and
that's what we're switching to in this commit.

After this change nothing uses KernelScope/KernelArena and they can be
safely removed.

Differential Revision:
D30429114
D30429114

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: f90b859cfe863692b7beffbe9bd0e4143df1e819
2021-08-24 00:32:13 -07:00
Mikhail Zolotukhin
dd96c26066 [TensorExpr] More NFC changes like Expr* -> ExprPtr. (#63778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63778

This is a preparation for a switch from raw pointers to shared pointers
as a memory model for TE expressions and statements.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30487425

Pulled By: ZolotukhinM

fbshipit-source-id: 9cbe817b7d4e5fc2f150b29bb9b3bf578868f20c
2021-08-24 00:30:49 -07:00
Mike Iovine
fc6dd0bc00 [JIT] Move UseVariadicCat internals (#63577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63577

Since other variadic ops will have an almost identical implementation, we can generalize the `UseVariadicCat` implementation and put it in a common folder.

Also moved some test utilities that other variadic op tests will likely need.

Test Plan: `buck test caffe2/test/cpp/jit:jit -- ConcatOptTest`

Reviewed By: navahgar

Differential Revision: D30409937

fbshipit-source-id: 925c11c27b58ce98cb8368d2a205e26ba66d3db9
2021-08-23 17:30:36 -07:00
Bert Maher
37d60c08e5 Revert D30360382: [nnc] Support thread level parallelism in fused kernels
Test Plan: revert-hammer

Differential Revision:
D30360382 (d6d86efb1c)

Original commit changeset: 29acf4e932c6

fbshipit-source-id: e0531113135d30eabb172dc1537d5dd6d65dc438
2021-08-21 03:46:43 -07:00
Bert Maher
76da46ccdc Revert D30417127: Remove flag to toggle CPU fusion in the presence of parallelism
Test Plan: revert-hammer

Differential Revision:
D30417127 (6600bc9651)

Original commit changeset: b77d7c68364f

fbshipit-source-id: 6b52fb83a84fe241945e3cb3eeb71050d1d9c8f1
2021-08-21 03:38:07 -07:00
Bert Maher
6600bc9651 Remove flag to toggle CPU fusion in the presence of parallelism (#63514)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63514

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30417127

Pulled By: bertmaher

fbshipit-source-id: b77d7c68364f2af73570740540f3b1152313016e
2021-08-20 11:18:19 -07:00
Bert Maher
d6d86efb1c [nnc] Support thread level parallelism in fused kernels (#63386)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63386

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30360382

Pulled By: bertmaher

fbshipit-source-id: 29acf4e932c669ce0f35823faea9099bcd8119b6
2021-08-20 11:18:17 -07:00
Raghavan Raman
d82667f7e2 [nnc] Updated sliceTail to do inplace mutation (#63532)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63532

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30412184

Pulled By: navahgar

fbshipit-source-id: e7669d3b9d24e14501f3feb6505c88d1d42030c6
2021-08-19 22:55:30 -07:00
Raghavan Raman
5e31a3b904 [nnc] Updated sliceHead to do inplace mutation (#63531)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63531

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30412183

Pulled By: navahgar

fbshipit-source-id: 47ee9482a36e606788d28d22eee4edaca45ffa50
2021-08-19 22:54:05 -07:00
Mikhail Zolotukhin
6e00b31b15 [TensorExpr] Make CacheReplacer and IndexFlattener mutate stmts/exprs inplace. (#63527)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63527

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30411411

Pulled By: ZolotukhinM

fbshipit-source-id: efb14ee57b36537fa4fefa89bdd6bafe7151c012
2021-08-18 22:59:31 -07:00
Mikhail Zolotukhin
1d62fb8a63 [TensorExpr] Speedup ExternalCall.ComputeInterop test by reducing tensor sizes. (#63526)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63526

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30411410

Pulled By: ZolotukhinM

fbshipit-source-id: d9a99afac14d2238b5100c98ae9ed4467f9f05ea
2021-08-18 22:58:25 -07:00
Mikhail Zolotukhin
7fdba4564a [TensorExpr] IRSimplifier: sort terms in polynomials, terms, minterms, maxterms. (#63197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63197

This solves non-determinism from using hash values in sort methods.
Changes in tests are mostly mechanical.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30292776

Pulled By: ZolotukhinM

fbshipit-source-id: 74f57b53c3afc9d4be45715fd74781271373e055
2021-08-18 14:49:27 -07:00
Mikhail Zolotukhin
1dc2b52764 [TensorExpr] Add a wrapper for all expr and stmt pointers. (#63195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63195

This helps us to later switch from using KernelArena with raw pointers
to shared pointers without having to change all our source files at
once.

The changes are mechanical and should not affect any functionality.

With this PR, we're changing the following:
 * `Add*` --> `AddPtr`
 * `new Add(...)` --> `alloc<Add>(...)`
 * `dynamic_cast<Add*>` --> `to<Add>`
 * `static_cast<Add*>` --> `static_to<Add>`

Due to some complications with args forwarding, some places became more
verbose, e.g.:
 * `new Block({})` --> `new Block(std::vector<ExprPtr>())`

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30292779

Pulled By: ZolotukhinM

fbshipit-source-id: 150301c7d2df56b608b035827b6a9a87f5e2d9e9
2021-08-17 13:44:45 -07:00
Mikhail Zolotukhin
548c717cbd [TensorExpr] Remove test_train from tensorexpr tests. (#63194)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63194

This test implements functionality used nowhere, and the author no
longer works on that. This PR also adds test_approx to CMakeLists where
it's been missing before.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30292777

Pulled By: ZolotukhinM

fbshipit-source-id: ab6d98e729320a16f1b02ea0c69734f5e7fb2554
2021-08-16 20:36:31 -07:00
Don Jang
e7724bb100 [JIT] Set future's error to current exception as is when --torch_jit_enable_rethrow_caught_exception=true (#63348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63348

This change addresses singlaiiit's comment on D30241792 (61b49c8e41), which makes the JIT interpreter's behavior consistent between `future` is set and not.

Test Plan: Enhanced `EnableRethrowCaughtExceptionTest.EnableRethrowCaughtExceptionTestRethrowsCaughtException` to cover the modified code path.

Reviewed By: singlaiiit

Differential Revision: D30347782

fbshipit-source-id: 79ce57283154ca4372e5341217d942398db21ac8
2021-08-16 17:32:13 -07:00
Raghavan Raman
e50e8b07d8 [nnc] Updated IRMutator and IRSimplifier to perform in-place mutations. (#63246)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63246

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30309636

Pulled By: navahgar

fbshipit-source-id: 409ea8d6982888cfee9127e6248044dd2ed9d8d4
2021-08-16 00:09:22 -07:00
Kimish Patel
38c185189c [Pytorch Edge] Enable kineto profiler on mobile via EdgeKinetoProfiler (#62419)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62419

This diff adds support for cpu only kineto profiler on mobile. Thus
enabling chrome trace generation on mobile. This bring cpp API for
mobile profiling on part with Torchscript.
This is done via:
1. Utilizating debug handle annotations in KinetoEvent.
2. Adding post processing capability, via callbacks, to
KinetoThreadLocalState
3. Creating new RAII stype profiler, KinetoEdgeCPUProfiler, which can be
used in surrounding scope of model execution. This will write chrome
trace to the location specified in profiler constructor.

Test Plan:
MobileProfiler.ModuleHierarchy

Imported from OSS

Reviewed By: raziel

Differential Revision: D29993660

fbshipit-source-id: 0b44f52f9e9c5f5aff81ebbd9273c254c3c03299
2021-08-13 21:40:19 -07:00
Kimish Patel
1b04d99f55 [Pytorch Profiler] Introduce scopes to enableProfiler (#62417)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62417

This diff adds an option to make enableProfiler enable callbacks only
for certain RecordScopes.
Why?
Profiling has some overhead when we repeatedly execute callbacks for
alls copes. On mobile side when we often have small quantized models
this overhead can be large. We observed that by only profiling top level
op and skipping profiling of other atend ops called within we can limit
this overhead. For example, instead of profling at::conv2d -> at::convolution ->
at::convolution_ and further more if ops like transpose etc. are called,
skipping profiling of those. Of course this limits the visibility, but
at the least this way we get a choice.

Test Plan: Imported from OSS

Reviewed By: ilia-cher

Differential Revision: D29993659

fbshipit-source-id: 852d3ae7822f0d94dc6e507bd4019b60d488ef69
2021-08-13 21:40:15 -07:00
Kimish Patel
b00afe135d [Pytorch Profiler] Add debug_handles to KinetoEvent (#62228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62228

This diff adds debug handles to events and provides a way to use
RECORD_FUNCTIONs that will pass debug_handles down to profiler, which
will record it in the events.

Why add debug_handles?
For pytorch mobile, with lite interpreter, we generate debug handles
that can be used for lazily symbolicate exception traces to model level
stack trace. Similar to the model level stack trace you get in
TorchScript models. The debug_handles also enable getting module
hierarchy for lite interpreter model, support for which was added to
KinetoProfiler in previous diffs.

Followup plan:
1. Enabled scope callbacks such that lite interpreter can use it to
profiler only top level ops.
2. Enable post processing callbacks that take KinetoEvents and populate
module hierarchy using debug handles.

This will let us use KinetoProfiler for lite interpter use cases on
mobile. Aim is to use RAII guard to similarly generate chrome trace for
mobile usecases as well, although only for top level ops.

Test Plan:
test_misc : RecordDebugHandles.Basic

Imported from OSS

Reviewed By: ilia-cher

Differential Revision: D29935899

fbshipit-source-id: 4f06dc411b6b5fe0ffaebdd26d3274c96f8f389b
2021-08-13 21:40:14 -07:00
Kimish Patel
54f2eb6e7e [Pytorch Profiler] Add support for adding module hierarchy to (#61792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61792

KinetoEvent

This PR adds module hierarchy information to events.
What is module hierarchy information attached to events?
During profiling a TorchScript module, when events are added, we ask JIT
what is the module hierarchy associated with the node being
executed. At the time of execution of that node, there might be multiple
frames in the stack of interpreter. For each frame, we find
corresponding node and the corresponding module hierarchy is queried.
Module hierarchy corresponding to the node is associated with node's
InlinedCallStack. InlinedCallStack of node tracks the path via which the
node is inlined. Thus during the inlining process we annotate
module information corresponding to the CallMethod nodes being inlined.

With this PR, chrome trace will contain additional metadata:
"Module Hierarchy". This can look like this:
TOP(ResNet)::forward.SELF(ResNet)::_forward_impl.layer1(Sequential)::forward.0(BasicBlock)::forward.conv1(Conv2d)::forward.SELF(Conv2d)::_conv_forward
It contains module instance, type name and the method name in the
callstack.

Test Plan:
test_profiler

Imported from OSS

Reviewed By: raziel, ilia-cher

Differential Revision: D29745442

fbshipit-source-id: dc8dfaf7c5b8ab256ff0b2ef1e5ec265ca366528
2021-08-13 21:39:10 -07:00
Don Jang
61b49c8e41 [JIT] Add a flag to rethrow caught exception in jit interpreter (#63073)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63073

It turned out that it's less than ideal to print out verbose stacktrace in exception messages in high-QPS services (see the related task) with a non-significant failure rate due to the truncation of long stacktrace which results in losing the original exception message thrown from native code. It is actually desirable to retain only the message of the original exception directly thrown from native code in such a usecase.

This change adds a new flag `torch_jit_disable_exception_stacktrace` to the pytorch jit interpreter to suppress stacktrace in the messages of exception thrown from the interpreter.

Reviewed By: Krovatkin

Differential Revision: D30241792

fbshipit-source-id: c340225c69286663cbd857bd31ba6f1736b1ac4c
2021-08-13 08:44:24 -07:00
Shen Li
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
Zsolt Dollenstein
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
Nikita Shulga
709ac6853a Fix warnings (#62930)
Summary:
Add `-Wno-writable-strings`(which is clang's flavor of `-Wwrite-strings`) to list of warnings ignored while compiling torch_python.
Avoid unnecessary copies in range loop
Fix number of signed-unsigned comparisons

Found while building locally on M1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62930

Reviewed By: albanD

Differential Revision: D30171981

Pulled By: malfet

fbshipit-source-id: 25bd43dab5675f927ca707e32737ed178b04651e
2021-08-11 14:07:10 -07:00
Howard Cheng
fa22f6303f [PyTorch] Add flop count for addmm (#61895)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61895

* Add FLOP count for addmm, should be `2*m*n*k`.

Share the same code path for `addmm` and `mm`.

Test Plan:
Imported from OSS

`python test/test_profiler.py`
Run a sample profile and check that FLOPS for `aten::addmm` is correct.

`[chowar@devbig053.frc2 ~/local/pytorch/build] ninja bin/test_jit`
`[chowar@devbig053.frc2 ~/local/pytorch/build] ./bin/test_jit --gtest_filter='ComputeFlopsTest*'`

Reviewed By: dskhudia

Differential Revision: D29785671

fbshipit-source-id: d1512036202d7234a981bda897af1f75808ccbfe
2021-08-11 12:33:43 -07:00
Jacob Szwejbka
b746fed164 [Pytorch Edge] Move RuntimeCompatibilityInfo Factory Method (#63005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63005

Realized I forgot to move the Runtime half of these functions be within the struct.

Test Plan: ci

Reviewed By: pavithranrao

Differential Revision: D30205521

fbshipit-source-id: ccd87d7d78450dd0dd23ba493bbb9d87be4640a5
2021-08-11 11:15:57 -07:00
tktrungna
2f5ac9c0ba update test distributed (#62796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62796

Fixes #62380

* update test functions to call wheel install folder {sitepackages}/torch instead of build/ folder
* add symbolic link for shared libraries which are called by the tests (this is a bit hacky and should be fixed the rpath before compiling -- similar to https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/test.sh#L204-L208).

### Test plan
check if all ci workflows pass

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30193142

Pulled By: tktrungna

fbshipit-source-id: 1247f9eda1c11c763c31c7383c77545b1ead1a60
2021-08-10 16:29:47 -07:00
Howard Huang
4d0497034c Remove process_group_agent and faulty_process_group_agent files (#62985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62985

Remove the process_group_agent and faulty_process_group_agent code now that PROCESS_GROUP backend has been deprecated for RPC (https://github.com/pytorch/pytorch/issues/55615). Discussed with xush6528 that it was okay to remove ProcessGroupAgentTest and ProcessGroupAgentBench which depended on process_group_agent.

Test Plan: CI tests

Reviewed By: pritamdamania87

Differential Revision: D30195576

fbshipit-source-id: 8b4381cffadb868b19d481198015d0a67b205811
2021-08-10 15:57:39 -07:00
Will Constable
22e3cc21e5 Back out "Enable test_api IMethodTest in OSS" (#62893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62893

Original commit changeset: 50eb3689cf84

Test Plan: Confirm pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_test2 passes in OSS

Reviewed By: seemethere, alanwaketan

Differential Revision: D30159999

fbshipit-source-id: 74ff8975328409a3dc8222d3e2707a1bb0ab930c
2021-08-06 16:43:50 -07:00
Jiewen Tan
4b68801c69 Enable test_api IMethodTest in OSS (#62521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62521

This diff did the following few things to enable the tests:
1. Exposed IMethod as TORCH_API.
2. Linked torch_deploy to test_api if USE_DEPLOY == 1.

Test Plan:
./build/bin/test_api --gtest_filter=IMethodTest.*

To be noted, one needs to run `python torch/csrc/deploy/example/generate_examples.py` before the above command.

Reviewed By: ezyang

Differential Revision: D30055372

Pulled By: alanwaketan

fbshipit-source-id: 50eb3689cf84ed0f48be58cd109afcf61ecca508
2021-08-04 21:14:20 -07:00
Raghavan Raman
59dd12042e [nnc] Removed const from all fields in IR. (#62336)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62336

This PR was generated by removing `const` for all types of nodes in NNC IR, and fixing compilation errors that were the result of this change.

This is the first step in making all NNC mutations in-place.

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30049829

Pulled By: navahgar

fbshipit-source-id: ed14e2d2ca0559ffc0b92ac371f405579c85dd63
2021-08-03 11:44:36 -07:00
Jacob Szwejbka
474d7ec43b [Pytorch Edge] Black Box Compatibility API (#61477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61477

It would be nice if the compatibility api was just kinda plug and play with no care about the internals of the api at all. Thats what this diff aims to provide.

The general usage would be something like
  < On the Client >
  RuntimeCompatibilityInfo runtime_info = get_runtime_compatibility_info();

  .
  .
  .
  < On the Server >
  ModelCompatibilityInfo model_info = get_model_compatibility_info(<model_path>);
  bool compatible = is_compatible(runtime_info, model_info);

Currently RuntimeCompatibilityInfo and ModelCompatibilityInfo are exactly the same, but it seemed feasible to me that they may end up diverging as more information is added to the api (such as a min supported bytecode version being exposed from the runtime).

Test Plan: unit test and ci

Reviewed By: dhruvbird, raziel

Differential Revision: D29624080

fbshipit-source-id: 43c1ce15531f6f1a92f357f9cde4e6634e561700
2021-08-03 11:27:28 -07:00
yanbing-j
c7a7c2b62f Enable Gelu fp32/bf16 in CPU path using Mkldnn implementation (#58525)
Summary:
Enable Gelu bf16/fp32 in CPU path using Mkldnn implementation. User doesn't need to_mkldnn() explicitly. New Gelu fp32 performs better than original one.

Add Gelu backward for https://github.com/pytorch/pytorch/pull/53615.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58525

Reviewed By: ejguan

Differential Revision: D29940369

Pulled By: ezyang

fbshipit-source-id: df9598262ec50e5d7f6e96490562aa1b116948bf
2021-08-03 06:52:23 -07:00
Hui Guo
3a592730d5 [nnc] Simplify i%100 to i if i is less than 100; fixed #52580 (#60693)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60693

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29375938

Pulled By: huiguoo

fbshipit-source-id: 1388729c5b93805cb156efa53e8823d5462885bf
2021-08-02 18:38:54 -07:00
Hui Guo
8f7ae77040 [nnc] Add context-sensitive simplification for div/mod (#60688)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60688

Test Plan: Imported from OSS

Reviewed By: navahgar, ZolotukhinM

Differential Revision: D29373313

Pulled By: huiguoo

fbshipit-source-id: 90d7f2fbfce583b0ea3b0f1c7899e22b0210bd62
2021-08-02 18:37:39 -07:00
Joel Schlosser
ee482edf0a Callable activation function support for Transformer modules (C++) (#62342)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60747

Enhances the C++ versions of `Transformer`, `TransformerEncoderLayer`, and `TransformerDecoderLayer` to support callables as their activation functions. The old way of specifying activation function still works as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62342

Reviewed By: malfet

Differential Revision: D30022592

Pulled By: jbschlosser

fbshipit-source-id: d3c62410b84b1bd8c5ed3a1b3a3cce55608390c4
2021-08-02 08:06:39 -07:00
Will Constable
bc787f2402 Fix setArgumentNames and make Script/Python consistent (#62442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62442

For PythonMethodWrapper::setArgumentNames, make sure to use the correct method
specified by method_name_ rather than using the parent model_ obj which itself
_is_ callable, but that callable is not the right signature to extract.

For Python vs Script, unify the behavior to avoid the 'self' parameter, so we only
list the argument names to the unbound arguments which is what we need in practice.

Test Plan: update unit test and it passes

Reviewed By: alanwaketan

Differential Revision: D29965283

fbshipit-source-id: a4e6a1d0f393f2a41c3afac32285548832da3fb4
2021-07-29 21:29:06 -07:00
Dhruv Matani
0b3f42fa4f [PyTorch Edge] Add test for lite interpreter operator caching (#62306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62306

Test to see if caching of operators works as expected. When caching operators during model load we look up using the operator name. This test ensures that even if there are multiple operators with the same name (in the same model), the caching distinguishes between the ones that have a different number of arguments specified during the call in the serialized bytecode.

In this specific test, there's a model with 3 methods, 2 of which return a `float32` tensor and one which return an `int64` dtype. Please see the comments in the diff for details.

ghstack-source-id: 134634613

Test Plan:
Test command:

```
cd fbsource/fbcode/
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.OperatorCacheDifferentiatesDefaultArgs'
```

```
cd fbsource/
buck test xplat/caffe2:test_lite_interpreter
```

Reviewed By: raziel

Differential Revision: D29929116

fbshipit-source-id: 1d42bd3e6d33128631e970c477344564b0337325
2021-07-29 20:14:45 -07:00
Dhruv Matani
0bbdf0e1e3 [PyTorch Edge] Add test_lite_interpreter to fbsource xplat BUCK files (#62305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62305

Currently, it's super time consuming to run a lite interpreter test from fbcode since it takes > 10 minutes to build. Recently, I haven't been able to do that either due to low disk space.

Having this test available in fbsource/xplat/ is a great win for productivity since I can re-run it in ~2 minutes even after significant changes!

I've had to disarm some tests that can only run in OSS of fbcode builds (since they need functionality that we don't include for on-device FB builds). They are disarmed using the macro `FB_XPLAT_BUILD`.

ghstack-source-id: 134634611

Test Plan: New test!

Reviewed By: raziel, JacobSzwejbka, cccclai

Differential Revision: D29954943

fbshipit-source-id: e55eab14309472ef6bc9b0afe0af126c561dbdb1
2021-07-29 20:13:06 -07:00