Commit Graph

2136 Commits

goldenxuett
e71f4e7958 [JIT] Implement may_contain_alias in FunctionSchema (#81352)
- Created may_contain_alias method in FunctionSchema to expose more detailed aliasing information about the inputs and outputs of a schema. This method returns whether the first argument may contain an alias to the second argument (i.e., if the first argument is a List[Tensor], it can contain an alias to the second argument if the second argument is Tensor(*)) and vice versa if bidirectional = true.
- Created helper methods that are explained more thoroughly in function_schema.h.
- Tested the may_contain_alias method for basic functionality, bidirectional functionality, wildcard functionality, and dual-container functionality in test_schema_info.cpp.
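
A rough usage sketch (the `SchemaArgument`/`SchemaArgType` spellings and the header path are assumptions based on the description above, not verified signatures):

```
#include <torch/csrc/jit/frontend/function_schema_parser.h>

// Sketch: does the Tensor[] input possibly contain an alias of self?
c10::FunctionSchema schema = torch::jit::parseSchema(
    "aten::foo(Tensor(a!) self, Tensor[] others) -> Tensor(a!)");
bool maybe_aliases = schema.may_contain_alias(
    /*lhs=*/{c10::SchemaArgType::input, 1},  // the list argument
    /*rhs=*/{c10::SchemaArgType::input, 0},  // self
    /*bidirectional=*/true);
```
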
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81352
Approved by: https://github.com/davidberard98, https://github.com/Gamrix
2022-07-15 21:57:37 +00:00
goldenxuett
42ee1608d3 [JIT] Add special cases batch_norm, instance_norm and dropout for SchemaInfo (#81007)
- Added special cases for detach in the is_non_deterministic() check, and for batch_norm and instance_norm in the is_mutable() check, in SchemaInfo.
- Added tests for the above special cases for detach, batch_norm and instance_norm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81007
Approved by: https://github.com/davidberard98
2022-07-15 04:52:02 +00:00
goldenxuett
3b4964230e [JIT] Add side effects checks for ops in SchemaInfo subclass (#81002)
- Added a has_side_effects method which returns whether a given op has side effects. Currently this is implemented with a hard-coded list of functions copied from ir.cpp in AliasDB, but this will eventually be implemented by returning whether a given schema has the has_side_effects tag.
- Tested in test_schema_info.cpp with both an op with side effects and an op without side effects.
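
A minimal sketch of the intended use (header path and constructor-from-string assumed from #80734; op choices illustrative — aten::warn sits on the hard-coded side-effect list in ir.cpp):

```
#include <torch/csrc/utils/schema_info.h>

torch::utils::SchemaInfo warn_info(
    "aten::warn(str message, int stacklevel=2) -> ()");
torch::utils::SchemaInfo add_info(
    "aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor");
TORCH_INTERNAL_ASSERT(warn_info.has_side_effects());   // on the hard-coded list
TORCH_INTERNAL_ASSERT(!add_info.has_side_effects());   // ordinary arithmetic op
```
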
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81002
Approved by: https://github.com/davidberard98
2022-07-13 00:39:30 +00:00
goldenxuett
14c28caed9 [JIT] Add determinism checks for ops in SchemaInfo subclass (#81000)
- Added is_non_deterministic, which returns whether a given op is non-deterministic. Currently this is implemented with a hard-coded list of non-deterministic functions copied from ir.cpp in AliasDB, but this will eventually be implemented by returning whether a given schema has the non_deterministic tag.
- Tested the is_non_deterministic method with a deterministic op and a non-deterministic op in test_schema_info.cpp.

**Note that the case of the op "aten::dropout(Tensor input, float p, bool train) -> Tensor", which is deterministic whenever "train=false", is not accounted for in this PR and will be fixed in a later PR. Currently "aten::dropout(Tensor input, float p, bool train) -> Tensor" is always considered non-deterministic.**
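
A short sketch of the check (includes as in the sketches above; op choices illustrative — aten::bernoulli appears on the copied non-deterministic list):

```
torch::utils::SchemaInfo bernoulli_info(
    "aten::bernoulli(Tensor self, *, Generator? generator=None) -> Tensor");
torch::utils::SchemaInfo relu_info("aten::relu(Tensor self) -> Tensor");
TORCH_INTERNAL_ASSERT(bernoulli_info.is_non_deterministic());
TORCH_INTERNAL_ASSERT(!relu_info.is_non_deterministic());
```
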
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81000
Approved by: https://github.com/davidberard98
2022-07-13 00:35:42 +00:00
goldenxuett
50ba94f5cc [JIT] Add aliasing checks in SchemaInfo with associated tests (#80984)
- Created may_alias method in SchemaInfo to extend FunctionSchema::may_alias to handle cases where arguments alias because the inputs themselves alias.
- Created the output_alias_map_ internal variable to track cases where outputs might alias due to inputs aliasing. This variable is updated in generateAliasMap().
- Added tests for various may_alias special cases (input - input, input - output, output - output) where inputs aliasing causes other arguments to alias as well; see the sketch below.
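
A sketch of the new behavior (API spellings assumed from the descriptions above and in #80972): with no value information the two inputs are distinct, but once SchemaInfo is told they are the same tensor, may_alias reports the alias:

```
#include <ATen/ATen.h>

at::Tensor t = at::ones({2, 2});
torch::utils::SchemaInfo info(
    "aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor");
info.addArgumentValue("self", t);
info.addArgumentValue("other", t);  // same tensor, so the inputs alias
bool aliases = info.may_alias(
    {c10::SchemaArgType::input, 0}, {c10::SchemaArgType::input, 1});  // true
```
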
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80984
Approved by: https://github.com/davidberard98
2022-07-13 00:18:43 +00:00
goldenxuett
aa61fdb667 [JIT] Add argumentValue functions and is_mutable checks to SchemaInfo (#80972)
- Created addArgumentValue/s methods in SchemaInfo to pass argument values into the subclass. These are used for more accurate mutation, aliasing, and determinism checks that include special cases.
- Added input_alias_map_ to keep track of which inputs alias each other. This is updated with the generateAliasMap method.
- Implemented is_mutable methods in SchemaInfo which also take argument values into account. For instance, if two inputs alias and one is mutable per the schema, then the other is considered mutable as well (sketched below).
- Tested the SchemaInfo is_mutable implementation where inputs alias, as mentioned above.
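
A sketch of the aliasing-aware mutability check (signatures assumed; includes as above):

```
at::Tensor t = at::ones({2, 2});
torch::utils::SchemaInfo info(
    "aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> Tensor(a!)");
info.addArgumentValue("self", t);
info.addArgumentValue("other", t);  // other aliases the mutated self
TORCH_INTERNAL_ASSERT(info.is_mutable("self"));   // mutable per the schema
TORCH_INTERNAL_ASSERT(info.is_mutable("other"));  // mutable via aliasing
```
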
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80972
Approved by: https://github.com/davidberard98
2022-07-13 00:16:41 +00:00
goldenxuett
e3a870986e [JIT] Add may_alias in FunctionSchema with associated tests (#80918)
- Created may_alias method in FunctionSchema to expose aliasing information about the inputs and outputs of a schema.
- Tested the may_alias method for basic functionality, exceptions, and wildcard functionality.

**Cases where elements of a container alias another argument will be handled by a new may_contain_alias method, which will be created in a later PR.**
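
A basic usage sketch (argument spellings assumed): for an in-place op, the annotated input and output share an alias set, so may_alias returns true for that pair:

```
c10::FunctionSchema schema = torch::jit::parseSchema(
    "aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> Tensor(a!)");
bool aliases = schema.may_alias(
    {c10::SchemaArgType::input, 0},    // Tensor(a!) self
    {c10::SchemaArgType::output, 0});  // Tensor(a!) return -> true
```
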
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80918
Approved by: https://github.com/davidberard98
2022-07-12 18:07:23 +00:00
goldenxuett
b4e342928b [JIT] Add mutability checks in FunctionSchema and create SchemaInfo subclass (#80734)
- Added overloads to the is_mutable method in FunctionSchema to tell whether the argument at a given index, or with a given name, is mutable.
- Created the SchemaInfo subclass of FunctionSchema with constructors from FunctionSchema and from a const char* signature.
- Tested is_mutable method overloads in new test_schema_info.cpp file.

**Note that this PR is used to set up SchemaInfo. The implementation of SchemaInfo will be addressed in later commits; a usage sketch is below.**
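
A sketch of the two overloads (exact parameter types assumed; includes as in the sketches above):

```
c10::FunctionSchema schema = torch::jit::parseSchema(
    "aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> Tensor(a!)");
bool by_index = schema.is_mutable(0);       // argument at index 0 (self) -> true
bool by_name  = schema.is_mutable("self");  // argument named "self" -> true
```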

Differential Revision: [D37651384](https://our.internmc.facebook.com/intern/diff/D37651384)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80734
Approved by: https://github.com/davidberard98
2022-07-11 19:13:06 +00:00
soulitzer
516f3198d6 Fix retains grad behavior after in-place (#79996)
See this doc: https://docs.google.com/document/d/1KiRdnoj6B4cI3yl017hTbCqcOGO1gWIpUf20sldipHM/edit#

Two issues are fixed: (1) regarding hooks in general and (2) regarding retains-grad hooks. Python hooks, which rely on a different mechanism, are not fixed here:
- Hooks in cpp in general
  - (fixed) new hooks registered to a newer version of the tensor no longer get applied to the grad_fn
    associated with the older version of the tensor from when the first hook was registered
  - (unchanged) hooks registered to the older version of the tensor remain active on that version's grad_fn
- Retains-grad hooks
  - (fixed) now get moved to the latest grad_fn (see the sketch after this list). NB: to the user, retains_grad is not
    considered a hook or expected to behave like hooks (which we consider properties of the grad_fn);
    retains-gradness is a property of the tensor.
- (not in this PR) Python hooks
  - (will fix) same issue as the cpp hooks, where new hooks are being applied to the grad_fn associated
    with the older version of the tensor
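
A minimal libtorch sketch of the fixed retains-grad behavior (illustrative; the linked doc has the full details):

```
#include <torch/torch.h>

void retains_grad_after_inplace() {
  auto x = torch::rand({2, 2}, torch::requires_grad());
  auto y = x * 2;   // non-leaf tensor
  y.retain_grad();  // retains-grad hook attached to y's current grad_fn
  y.add_(1);        // in-place update: y gets a new grad_fn
  y.sum().backward();
  // With this fix, retains-grad moves to the new grad_fn, so y.grad() is set.
  TORCH_CHECK(y.grad().defined());
}
```
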
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79996
Approved by: https://github.com/albanD
2022-07-08 19:13:28 +00:00
Sergii Dymchenko
b0aaefb50f Build example_allreduce only for GLOO (#81062)
`example/allreduce.cpp` is GLOO-specific and will not compile with USE_GLOO=0

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81062
Approved by: https://github.com/malfet
2022-07-08 02:25:54 +00:00
Nikolay Korovaiko
8389ccbcd8 reinstate size and shape returning symints (#79560)
This PR redirects `size` and `.shape` to call `sym_sizes`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79560
Approved by: https://github.com/Chillee
2022-07-08 01:17:33 +00:00
Boyan Atanasov
b603860c1d Back out "[profiling] Adding targets file for test_mobile_profiler" (#80789)
Summary:
Original commit changeset: 38314c83d223

Original Phabricator Diff: D37116110 (c9aa74a37f)

Reviewed By: mcr229

Differential Revision: D37582906

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80789
Approved by: https://github.com/bochko, https://github.com/salilsdesai
2022-07-05 23:34:15 +00:00
Han Qi (qihqi)
c93ceef658 Wrap static initializers in ifdef (#80590)
Because on iOS some projects enable -Wglobal-constructors, and the code won't build without a guard.
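
A sketch of the pattern (the guard macro name here is hypothetical, not the one used in the PR):

```
struct Registrar {
  Registrar() { /* register something at load time */ }
};

#if !defined(DISABLE_STATIC_INIT)  // hypothetical guard macro
// A namespace-scope object with a non-trivial constructor triggers
// -Wglobal-constructors, so it is compiled out where that warning is an error.
static Registrar registrar;
#endif
```
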

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80590
Approved by: https://github.com/cccclai
2022-07-01 04:42:17 +00:00
Max Ren
c9aa74a37f [profiling] Adding targets file for test_mobile_profiler (#80351)
Summary:
Testing for successful recording of backend events: the test checks that the trace file contains the memory recording added from the backend at execute time. The record in the trace file looks like:

```
{
    "ph": "i", "cat": "cpu_instant_event", "s": "t", "name": "[memory]",
    "pid": 847267, "tid": 847267,
    "ts": 1655333276408215,
    "args": {
      "Device Type": 0, "Device Id": -1, "Addr": 108370615407104, "Bytes": 16384, "Total Allocated": 16384, "Total Reserved": 49152
    }
  }
```

Test Plan:
```
buck test //caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler
Parsing buck files: finished in 1.6 sec
Creating action graph: finished in 30.9 sec
Downloaded 0/5 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 37.9 sec (100%) 25314/25314 jobs, 5/25314 updated
  Total time: 01:10.5 min
More details at https://www.internalfb.com/intern/buck/build/ef1c4324-13d3-494e-bce7-8004047d5f89
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 17f300d4-9a78-4302-9e9e-d7ab79ba1ff0
Trace available for this run at /tmp/tpx-20220615-165413.567757-17f300d4-9a78-4302-9e9e-d7ab79ba1ff0/trace.log
RemoteExecution session id: reSessionID-17f300d4-9a78-4302-9e9e-d7ab79ba1ff0-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/7881299443250383
    ✓ ListingSuccess: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler : 3 tests discovered (37.049)
    ✓ Pass: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler - MobileProfiler.Backend (0.402)
    ✓ Pass: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler - MobileProfiler.ModuleHierarchy (0.487)
    ✓ Pass: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler - MobileProfiler.BackendMemoryEvents (0.280)
Summary
  Pass: 3
  ListingSuccess: 1
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/7881299443250383
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
```

Differential Revision: D37116110

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80351
Approved by: https://github.com/kimishpatel
2022-06-30 17:27:35 +00:00
Sergei Vorobev
a8b0988596 Fix //:module_test Conversion_MultiCUDA (#79926)
Fixes #79871

Make `module.cpp` tests respect the change that was made in #78436 (no int types in autograd).

Note that there is still a gap in the CMake test -- it's unclear why it didn't fail CI before.

As far as I can tell it should be executed, because it's included here: 79507d2a9d/test/cpp/api/CMakeLists.txt (L17)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79926
Approved by: https://github.com/soulitzer
2022-06-21 23:32:18 +00:00
Nikolay Korovaiko
efc7343743 Revert "Revert "Put symint overloads on a different name"" (#79680)
This relands https://github.com/pytorch/pytorch/pull/79281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79680
Approved by: https://github.com/malfet
2022-06-21 07:06:33 +00:00
Han Qi (qihqi)
fed12ff680 [BE][flatbuffer] Remove code duplications and refactor (#79184)
Summary:
Remove code duplication in import.cpp / export_modules.cpp such that
1. There is only one copy of the switching logic (detect flatbuffer / is_flatbuffer);
2. Detection of whether flatbuffer support is included is moved to runtime (so no more macros).

This also reverses the dependency: import.cpp -> flatbuffer_loader.cpp becomes flatbuffer_loader.cpp -> import.cpp.

Differential Revision: D36926217

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79184
Approved by: https://github.com/zhxchen17
2022-06-20 16:37:38 +00:00
Nikita Shulga
4a4890cfb2 [BE] Use CamelCase for enum class members (#79772)
Per many C++ code-style guides (for [example](https://google.github.io/styleguide/cppguide.html#Enumerator_Names)), members of an `enum` should be CamelCase,
and only defines should be ALL_CAPS.

Changes the `MemOverlap`, `MemOverlapStatus` and `CmpEvalResult` enum values.

Also, `YES`, `NO`, `TRUE` and `FALSE` are often system defines.

Fixes, among other things, the current iOS build regression, which manifests as follows (see [this](6e90572bb9)):
```
/Users/runner/work/pytorch/pytorch/aten/src/ATen/MemoryOverlap.h:19:29: error: expected identifier
enum class MemOverlap { NO, YES, TOO_HARD };
                            ^
/Applications/Xcode_12.4.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator14.4.sdk/usr/include/objc/objc.h:89:13: note: expanded from macro 'YES'
#define YES __objc_yes
```
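
The shape of the fix, using the enum from the error above:

```
// Before: members collide with the system #define YES on iOS.
// enum class MemOverlap { NO, YES, TOO_HARD };

// After: CamelCase members, no macro collision.
enum class MemOverlap { No, Yes, TooHard };
```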

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79772
Approved by: https://github.com/drisspg, https://github.com/kulinseth
2022-06-17 05:53:57 +00:00
PyTorch MergeBot
b9bb52d97b Revert "Put symint overloads on a different name"
This reverts commit 213a8fc992.

Reverted https://github.com/pytorch/pytorch/pull/79281 on behalf of https://github.com/bigfootjon due to Diff reverted internally
2022-06-15 17:15:21 +00:00
Nikolay Korovaiko
83e575c510 have a common interface to extract metadata from SizeNodes (#78088)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78088
Approved by: https://github.com/JackCaoG, https://github.com/wconstab
2022-06-15 04:59:08 +00:00
John Clow
07a528cac7 Adding isDynamic Support to SizeNodes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77917

Approved by: https://github.com/Krovatkin
2022-06-14 03:27:57 +00:00
David Berard
91a2e953e5 [JIT] Use signed integers in CalculatedNecessaryArgs
x was underflowing:
```
size_t x = ...
while (x >= 0) {
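  // x is unsigned: x >= 0 is always true, and x-- wraps to SIZE_MAX at zero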
  x--;
}
```

Changed the variables to ssize_t.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79331

Approved by: https://github.com/yuhc, https://github.com/tugsbayasgalan
2022-06-13 19:41:18 +00:00
Edward Z. Yang
213a8fc992 Put symint overloads on a different name
Due to implicit conversion shenanigans, having both IntArrayRef
and SymIntArrayRef overloads makes {} ambiguous.  While we could
fix this by making a single unified type that accepts all the overloads
we want, an easier fix was to just push the SymIntArrayRef overload
to its own name.
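
A sketch of the ambiguity (function names and headers illustrative):

```
#include <c10/core/SymIntArrayRef.h>
#include <c10/util/ArrayRef.h>

void resize(c10::IntArrayRef sizes);
void resize(c10::SymIntArrayRef sizes);

void caller() {
  resize({});  // error: ambiguous -- {} converts to either overload
}

// The fix adopted here: the SymInt overload gets its own name, e.g.
void resize_symint(c10::SymIntArrayRef sizes);
```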

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79281

Approved by: https://github.com/suo
2022-06-12 14:36:39 +00:00
Michael Suo
30fb2c4aba [lint] autoformat test/cpp and torch/csrc
Let's have some fun.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78828

Approved by: https://github.com/ezyang
2022-06-11 21:11:16 +00:00
Michael Andreas Dagitses
ab2ca95dd1 turn on -Werror=unused-variable in our Bazel CPU build
Summary:
We also fix any existing issues. Note that we only do this for the CPU
build because nvcc is considered a C++ toolchain but it does not have
the same flag support. Adding flags to the GPU build will cause nvcc
errors.

Test Plan: Built locally, rely on CI to confirm.

Reviewers: malfet

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79156

Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD
2022-06-11 02:46:34 +00:00
Michael Andreas Dagitses
606b234336 turn on -Werror=unused-function in our Bazel CPU build
Summary:
We also fix any existing issues. Note that we only do this for the CPU
build because nvcc is considered a C++ toolchain but it does not have
the same flag support. Adding flags to the GPU build will cause nvcc
errors.

Test Plan: Built locally, rely on CI to confirm.

Reviewers: malfet

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79154

Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD
2022-06-10 22:11:54 +00:00
PyTorch MergeBot
bcd7a20953 Revert "turn on -Werror=unused-function in our Bazel CPU build"
This reverts commit 67d313a032.

Reverted https://github.com/pytorch/pytorch/pull/79154 on behalf of https://github.com/malfet due to Breaks bazel build: 67d313a032
2022-06-10 20:43:03 +00:00
Michael Andreas Dagitses
67d313a032 turn on -Werror=unused-function in our Bazel CPU build
Summary:
We also fix any existing issues. Note that we only do this for the CPU
build because nvcc is considered a C++ toolchain but it does not have
the same flag support. Adding flags to the GPU build will cause nvcc
errors.

Test Plan: Built locally, rely on CI to confirm.

Reviewers: malfet

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79154

Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD
2022-06-10 18:30:08 +00:00
Brian Hirsh
7b3a0ff87a Port index.Tensor to structured kernels.
Tracking issue: #55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69607

Approved by: https://github.com/bdhirsh
2022-06-10 17:27:47 +00:00
Michael Andreas Dagitses
f96d96a7fc turn on -Werror=type-limits in our Bazel CPU build
Summary:
We also fix any existing issues.

Test Plan: Built locally, rely on CI to confirm.

Reviewers: malfet

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79139

Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD
2022-06-10 10:04:08 +00:00
Nikita Shulga
3255ddeec9 Make Wunused-local-typedef a hard error (#77918)
Only allow it for `libtorch_python` and tests
Helps prevent regression like https://github.com/pytorch/pytorch/pull/76547#issuecomment-1132208232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77918
Approved by: https://github.com/osalpekar, https://github.com/seemethere
2022-06-09 18:14:01 +00:00
Mark Harfouche
221755cc71 Link BLAS privately (#78883)
We've had some users report that they are getting symbol collisions when linking to BLAS.

I don't see a need to re-export the blas library symbols.

I figured I would share here for other packagers to be able to benefit too.

xref: https://github.com/conda-forge/pytorch-cpu-feedstock/pull/116
xref: https://github.com/conda-forge/openblas-feedstock/issues/134
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78883
Approved by: https://github.com/ezyang
2022-06-09 17:02:06 +00:00
PyTorch MergeBot
4b82ef7928 Revert "Port index.Tensor to structured kernels."
This reverts commit cfd84125bd.

Reverted https://github.com/pytorch/pytorch/pull/69607 on behalf of https://github.com/zengk95 due to This is breaking mac trunk tests cfd84125bd
2022-06-08 20:16:10 +00:00
Brian Hirsh
cfd84125bd Port index.Tensor to structured kernels.
Tracking issue: #55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69607

Approved by: https://github.com/bdhirsh
2022-06-08 18:17:52 +00:00
dzdang
a56f4e23b9 [quant][core][better-engineering] Rename files in quantized directory to conform with non-quantized counterpart filenames
Summary:
Names of analogous files in quantized directory (previously snake case) were inconsistent with
their non-quantized filename counterparts (pascal case). This is the first of a series of PRs that changes
all files in quantized (and sub-directories) dir to have pascal case.

`aten/src/ATen/native/quantized/qconv_unpack.cpp` has not been renamed yet
because (for reasons currently unknown) after making the name change, `import torch` produces the error below (renaming `qlinear_unpack.cpp` also seems to fail some Phabricator CI tests for similar reasons). We suspect that these may be undefined errors and will revisit naming these files in a future PR.

```
terminate called after throwing an instance of 'c10::Error'
  what():  Type c10::intrusive_ptr<ConvPackedParamsBase<2> > could not be converted to any of the known types.
Exception raised from operator() at ../aten/src/ATen/core/jit_type.h:1735 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x55 (0x7f26745c0c65 in /data/users/dzdang/pytorch/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xb1 (0x7f26745bdcd1 in /data/users/dzdang/pytorch/torch/lib/libc10.so)
frame #2: <unknown function> + 0x1494e24 (0x7f2663b14e24 in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0xfed0bc (0x7f266366d0bc in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #4: c10::detail::infer_schema::make_function_schema(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>) + 0x5a (0x7f266366d71a in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #5: c10::detail::infer_schema::make_function_schema(c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>) + 0x7b (0x7f266366e06b in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0x1493f32 (0x7f2663b13f32 in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0xe227dd (0x7f26634a27dd in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #8: <unknown function> + 0x14e0a (0x7f268c934e0a in /lib64/ld-linux-x86-64.so.2)
..........................truncated.............
```

Test Plan:
```
python test/test_quantization.py
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77037

Approved by: https://github.com/jerryzh168
2022-06-07 13:47:08 +00:00
PyTorch MergeBot
6a4997e66a [Profiler] Weaken ordering check during post processing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78563

The profiler assembles a call hierarchy by replaying recorded events. There is an assert to ensure that the events form a well-structured tree; however, many of the inputs come from external sources, and small differences (e.g. recording time at a lower precision) lead to traces which violate that assumption. For now this is acceptable; the post processing can handle resolving these discrepancies. As a result, I am relaxing the assert to only test event types where we expect the framework to be able to enforce these strong structural requirements.

Differential Revision: [D36787787](https://our.internmc.facebook.com/intern/diff/D36787787/)

Approved by: https://github.com/suo
2022-06-01 18:55:19 +00:00
PyTorch MergeBot
fca1f495c2 Revert "Port index.Tensor to structured kernels."
This reverts commit 9fe6f1baf5.

Reverted https://github.com/pytorch/pytorch/pull/69607 on behalf of https://github.com/suo due to this broke master, see: 9fe6f1baf5
2022-06-01 00:12:15 +00:00
PyTorch MergeBot
ceb93afe3f Revert "Fix bug in flatbuffer deserialization"
This reverts commit 7e72c96b10.

Reverted https://github.com/pytorch/pytorch/pull/78344 on behalf of https://github.com/tugsbayasgalan due to as we need to land it in fbcode asap
2022-05-31 23:34:04 +00:00
Brian Hirsh
9fe6f1baf5 Port index.Tensor to structured kernels.
Tracking issue: #55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69607

Approved by: https://github.com/bdhirsh
2022-05-31 22:15:20 +00:00
Tugsbayasgalan Manlaibaatar
7e72c96b10 Fix bug in flatbuffer deserialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78344

Approved by: https://github.com/qihqi
2022-05-31 18:37:30 +00:00
Michael Suo
032d1ace1d [ci] disable flaky MobileProfiler.Backend test
This test is flaky; normally I'd disable it using the disable bot, but the bot
doesn't support cpp.

[skip ci]

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78320

Approved by: https://github.com/malfet
2022-05-26 03:22:55 +00:00
Shunting Zhang
26d9386f67 Make string serialization of C++ FunctionSchema consistent with torchgen.model.FunctionSchema
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77926

There is a discrepancy between the string representations of the C++ FunctionSchema and torchgen.model.FunctionSchema.
The latter will not add parentheses around the return types if there is a single item,
but the C++ FunctionSchema always adds the parentheses.

Make them consistent so we can convert one type to the other via its string representation and parse method.
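
An example of the difference (schema chosen for illustration):

```
// torchgen.model.FunctionSchema renders a single return without parentheses:
//   aten::relu(Tensor self) -> Tensor
// The C++ FunctionSchema previously always parenthesized the returns:
//   aten::relu(Tensor self) -> (Tensor)
// After this change, both render the single-return form identically.
```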

Differential Revision: [D36535924](https://our.internmc.facebook.com/intern/diff/D36535924/)

Approved by: https://github.com/bdhirsh
2022-05-24 19:39:26 +00:00
John Clow
c82fb7a67f Adding support for upper and lower bound functions in SSA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77389

Approved by: https://github.com/eellison
2022-05-20 23:58:40 +00:00
Nikolay Korovaiko
df1f9b9840 Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#77756)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77756
Approved by: https://github.com/desertfire
2022-05-20 05:39:03 +00:00
PyTorch MergeBot
e9d660c331 Revert "Revert "Revert "Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#76836)"""
This reverts commit acf7136a52.

Reverted https://github.com/pytorch/pytorch/pull/77719 on behalf of https://github.com/suo
2022-05-18 05:06:50 +00:00
Edward Z. Yang
acf7136a52 Revert "Revert "Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#76836)""
This reverts commit c35bd8d423.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77719

Approved by: https://github.com/Chillee, https://github.com/malfet
2022-05-18 03:25:43 +00:00
PyTorch MergeBot
c35bd8d423 Revert "Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#76836)"
This reverts commit fc4c3c9bc7.

Reverted https://github.com/pytorch/pytorch/pull/76836 on behalf of https://github.com/suo
2022-05-18 02:45:25 +00:00
Han Qi (qihqi)
3822a472ef Python function to extract information on mobile::Module from flatbuffer (#77624)
Summary:
Includes the following refactors:
1. Common loading and operator-validation logic that was duplicated in the pickle and
   flatbuffer loaders moved to function.h/cpp;
2. Allow loading of a function without wiring up operators.

This function will be used to implement get_bundled_input and friends
for flatbuffer.

Test Plan: contbuild & OSS CI, see 69fa49f123

Reviewed By: cccclai

Differential Revision: D36348549

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77624
Approved by: https://github.com/cccclai
2022-05-18 00:42:57 +00:00
Nikolay Korovaiko
fc4c3c9bc7 Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#76836)
LTC Tensors now create real IR (SizeNode) for sym_sizes() in LTCTensorImpl.cpp.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76836
Approved by: https://github.com/ezyang
2022-05-18 00:40:42 +00:00
Michael Suo
7f1e331b34 Make SymInt constructor explicit
Since we plan to have a bunch of code that is sensitive to whether or
not a SymInt contains a symbolic shape, it seems like a bad idea
to have an implicit constructor.

For example, code like:
```
sizes_and_strides_.stride_at_unchecked(dim) = 0;
```

would sail through, and the `0` would get implicitly promoted to a
SymInt.

This is a tradeoff though: it makes code that handles `SymInt`s more
clunky as `int64_t`s and integer literals need to be explicitly wrapped
in `SymInt` before being used.
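
Continuing the example from this message, with the explicit constructor the literal must be wrapped:

```
sizes_and_strides_.stride_at_unchecked(dim) = c10::SymInt(0);
```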

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77666

Approved by: https://github.com/ezyang
2022-05-17 22:28:35 +00:00
Bin Bao
25c6ebd12c Revert "Revert "[LT] Codegen ReuseNode for supported ops""
Summary: Fixed an XLC build failure by generating an always-return-false
default CanBeReused method.

This reverts commit 3cade9d454.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77513

Approved by: https://github.com/alanwaketan
2022-05-16 20:14:42 +00:00
Wang, Eikan
e5a5cd149f Simplify IfThenElse and CompareSelect within for-loop (#76793)
Analyze the range to determine whether a condition can be satisfied. Suppose the for-loop body contains an `IfThenElse` or `CompareSelect` whose condition depends on the for-loop index `Var`. In that case, we analyze the range to check whether the condition is always (or never) satisfied. If the condition is deterministic, simplify the logic.
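
An illustrative example of the simplification (NNC pseudo-IR):

```
// The condition i < 20 is provably true over the whole range [0, 10):
//   for (int i = 0; i < 10; i++) {
//     A[i] = IfThenElse(i < 20, x, y);
//   }
// simplifies to:
//   for (int i = 0; i < 10; i++) {
//     A[i] = x;
//   }
```
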
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76793
Approved by: https://github.com/huiguoo
2022-05-15 20:21:28 +00:00
PyTorch MergeBot
3cade9d454 Revert "[LT] Codegen ReuseNode for supported ops"
This reverts commit 6066e5929f.

Reverted https://github.com/pytorch/pytorch/pull/76738 on behalf of https://github.com/malfet
2022-05-14 00:33:10 +00:00
Bin Bao
6066e5929f [LT] Codegen ReuseNode for supported ops
Summary:
1. Update the codegen script to add a TrieCache lookup (ReuseNode)
before creating a new IR node. The following is an example of the generated
code:

```
    at::Tensor LazyNativeFunctions::add(const at::Tensor & self, const at::Tensor & other, const at::Scalar & alpha) {
        ...
        torch::lazy::NodePtr node = torch::lazy::ReuseNode<AddTensor>(lazy_self->GetIrValue(), lazy_other->GetIrValue(), node_alpha);
        if (!node) {
            auto out_meta = at::meta::add(self, other, alpha);
            std::vector<Shape> shapes{Shape(out_meta.scalar_type(), out_meta.sizes().vec())};
            TORCH_INTERNAL_ASSERT(shapes.size() == 1);
            if(symbolicShapeEnabled()){
                std::vector<jit::IValue> inputs = { self, other, alpha };
                char* schema_str = "aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor";
                applySymbolicShapesOnLT(schema_str, inputs, shapes);
            }

            node = torch::lazy::MakeNode<AddTensor>(lazy_self->GetIrValue(), lazy_other->GetIrValue(), node_alpha, std::move(shapes));
            CacheNode(node);
        }
        ...
    }
```
2. TrieCache lookup depends on each IR node subclass to provide its own
comparison function. The following is an example of the generated code:

```
  bool CanBeReused(const torch::lazy::Value& self, const torch::lazy::Value& other, const torch::lazy::Value& alpha) const {
    size_t i = 0;
    return (operand(i++) == self &&
        operand(i++) == other &&
        operand(i++) == alpha);
  }
```

3. DeviceData is specially handled.

4. Non-codegen op changes are coming a separate PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76738

Approved by: https://github.com/JackCaoG, https://github.com/wconstab
2022-05-13 19:13:58 +00:00
yanbing-j
4f82f439d1 Enable BFloat16 ELU, SELU and CELU in CPU path (#62546)
Enable BFloat16 ELU, SELU and CELU in the CPU path. SELU and CELU will call the ELU implementation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62546
Approved by: https://github.com/frank-wei
2022-05-12 16:56:57 +00:00
Xiang Gao
cc9d0f309e lshift and rshift stop supporting floating types (#77146)
Fixes #74358

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77146
Approved by: https://github.com/ngimel
2022-05-11 22:29:30 +00:00
Bin Bao
8f5cdc6d5d Revert "Revert "[LT] Store OpKind for each IR subclass in a static field""
Summary: Re-land https://github.com/pytorch/pytorch/pull/76711 by
fixing internal build errors.
Generate class-level opkind as a static method instead of a static
member.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77102

Approved by: https://github.com/wconstab, https://github.com/JackCaoG, https://github.com/antoniojkim
2022-05-11 12:27:05 +00:00
John Clow
26e2936edc [JIT SSA] Added testing for the Cat Op in LazyTensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76552

Approved by: https://github.com/Krovatkin
2022-05-09 22:11:14 +00:00
PyTorch MergeBot
7eaf4780ba Revert "[LT] Store OpKind for each IR subclass in a static field"
This reverts commit ac37ddc795.

Reverted https://github.com/pytorch/pytorch/pull/76711 on behalf of https://github.com/malfet
2022-05-09 20:50:09 +00:00
Fuqiang Zhang
bd573389f6 [Bootcamp]Add option for flatbuffer loader to copy memory to individual tensors (#76986)
Summary: Add an option for the flatbuffer loader to copy memory to individual tensors, allowing memory to be freed without waiting for all tensor runs to complete.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76986
Approved by: https://github.com/qihqi
2022-05-09 17:29:30 +00:00
Bin Bao
ac37ddc795 [LT] Store OpKind for each IR subclass in a static field
Summary: Currently OpKind is stored as an object field called op_ for each IR
node, and one usage of op_ is to avoid dynamic_cast in NodeCast when we
need to downcast a base-node pointer into a concrete sub-node pointer.
As a result, we need to construct and pass in an op when downcasting
nodes, and this becomes quite annoying when we start to implement
trie-based IR node reuse. More importantly, the op for each subclass
should be unique for that subclass and thus making it a const static field
is a more logical design.

In this PR, we still keep the object-level op_ for easier XLA adoption. As
future work, we can come back to remove op_, make the op() method
virtual, and get rid of OpKind in all the node constructors.
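
A sketch of the direction described here (the static-method spelling follows the reland notes elsewhere in this log; details illustrative):

```
class AddTensor : public torch::lazy::TsNode {
 public:
  // Class-level OpKind: unique per subclass, no per-object op_ needed.
  static torch::lazy::OpKind ClassOpKind() {
    return torch::lazy::OpKind(at::aten::add);
  }
  // ...
};

// A NodeCast-style downcast can then compare against the class-level kind
// without the caller constructing and passing in an op:
//   if (node->op() == AddTensor::ClassOpKind()) { /* static_cast */ }
```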

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76711

Approved by: https://github.com/wconstab, https://github.com/JackCaoG
2022-05-06 19:14:46 +00:00
David Berard
6c615a21a0 [NVFuser] prep for on-by-default
1. fix tests that expected nvfuser off-by-default behavior
2. skip nvfuser if getExecutorMode() == false

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76937

Approved by: https://github.com/eellison
2022-05-06 18:18:53 +00:00
Bin Bao
f05710dd40 [LT] Add a trie data structure for caching IR nodes
Summary: TrieCache provides a way to look up an IR node before we
actually create it. If the lookup hits in TrieCache, we reuse the
existing node and move the current pointer in TrieCache to point to that
node; if the lookup misses, we create a new node and insert it into TrieCache.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76542

Approved by: https://github.com/wconstab, https://github.com/JackCaoG
2022-05-04 23:48:03 +00:00
Wang, Eikan
429a80dded [NNC] Lowering function generates the output buffer with the specified stride (#76529)
Summary:
Pass stride information to the lowering function to generate the output buffer with the proper memory layout.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76529

Reviewed By: ZolotukhinM

Differential Revision: D36116712

Pulled By: IvanKobzarev

fbshipit-source-id: d3901f756b3710ecce172d6db3ecb0b7c12fb929
(cherry picked from commit b6cd53c91c01db36ea0e99167dc0ce0ae1d3aa23)
2022-05-04 20:04:22 +00:00
Bin Bao
f8a4780eb2 [LT] Move MakeNode into ir_builder.h
Summary: Move MakeNode into ir_builder.h to avoid circular header
reference later when introducing a trie cache for IR node lookup.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76482

Approved by: https://github.com/wconstab
2022-05-03 14:53:19 +00:00
Elias Ellison
e5a55af305 Reland reland
Reland of https://github.com/pytorch/pytorch/pull/76397 and https://github.com/pytorch/pytorch/pull/76493

This time I'll get it right 😢
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76539
Approved by: https://github.com/davidberard98, https://github.com/osalpekar
2022-04-28 20:41:55 +00:00
PyTorch MergeBot
a5bc02aeb2 Revert "[JIT] Register decomp reland"
This reverts commit 81b9cb741c.

Reverted https://github.com/pytorch/pytorch/pull/76397 on behalf of https://github.com/osalpekar
2022-04-28 03:33:29 +00:00
Antonio Kim
f3f327e103 Decouple LTC from TS Backend using Lazy IR Builder
Next stage of breaking up https://github.com/pytorch/pytorch/pull/74710

IR builder class introduced to decouple the explicit usage of `TsNode` in core lazy tensors.

Requires https://github.com/pytorch/pytorch/pull/75324 to be merged in first.

**Background**
- there are ~5 special ops used in lazy core but defined as `: public {Backend}Node` (DeviceData, Expand, Scalar...)
- we currently require all nodes derive from {Backend}Node, so that backends can make this assumption safely
- it is hard to have shared 'IR classes' in core/ because they depend on 'Node'

**Motivation**

1. avoid copy-paste of "special" node classes for each backend
2. in general decouple and remove all dependencies that LTC has on the TS backend

**Summary of changes**
- new 'IRBuilder' interface that knows how to make 5 special ops
- move 'special' node classes to `ts_backend/`
- implement TSIRBuilder that makes the special TS Nodes
- new backend interface API to get the IRBuilder
- update core code to call the builder

CC: @wconstab @JackCaoG @henrytwo

Partially Fixes #74628

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75433
Approved by: https://github.com/wconstab
2022-04-28 02:07:02 +00:00
Jiewen Tan
a28b132bc2 Revert D35860266: [pytorch][PR] Update torch::lazy::BackendDevice to have a new default ordinal
Test Plan: revert-hammer

Differential Revision:
D35860266 (f9d07ae644)

Original commit changeset: 554ebe16a068

Original Phabricator Diff: D35860266 (f9d07ae644)

fbshipit-source-id: 325c54aa2e87e51134115213352b3d33a81b7edf
(cherry picked from commit bbd74bf34a534d1b87aadff9790038e3dbbfa9c8)
2022-04-27 18:11:24 +00:00
Elias Ellison
81b9cb741c [JIT] Register decomp reland
Reland of https://github.com/pytorch/pytorch/pull/76252
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76397
Approved by: https://github.com/davidberard98
2022-04-26 23:17:18 +00:00
PyTorch MergeBot
2d72cb3373 Revert "[JIT] Allow registering Decompositions"
This reverts commit d9f0774f98.

Reverted https://github.com/pytorch/pytorch/pull/76252 on behalf of https://github.com/zengk95
2022-04-26 04:47:05 +00:00
Elias Ellison
d9f0774f98 [JIT] Allow registering Decompositions
- Allow registering custom decompositions
- Add easier API for invoking decompositions
- Shorten API names (no users yet)

I am doing these as one pr because they are fairly short/simple and because github first does not support ghstack yet.

cc @Chillee @zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76252
Approved by: https://github.com/davidberard98
2022-04-26 03:00:35 +00:00
Nikolay Korovaiko
bb60cac25a E2E SymInt example narrow_copy
This **roughly** corresponds to Goal 3.2 in https://docs.google.com/document/d/1iiLNwR5ohAsw_ymfnOpDsyF6L9RTUaHMpD8YLw-jxEw/edit#

Namely:

It adds the following:

* SymbolicIntNode interface
* LazySymbolicIntNode implementation
* Lazy `narrow_copy` implementation
* Adds needed support for SymInt in codegen
* Test (below)

```cpp
TEST(LazyDynamicOpsTest, NarrowCopy) {
  auto x = torch::rand({5, 10, 10}).to(kLazy);
  const size_t Y_DIM = 3;
  const size_t X_DIM_INDEX = 2;
  auto y = torch::rand({Y_DIM}).to(kLazy);
  auto ly = torch::lazy::TryGetLtcTensor(y);
  auto dim_node = MakeNode<SizeNode>(ly->GetIrValue(), 0);
  auto lmn = new torch::lazy::SymbolicIntNode(dim_node);
  auto z = x.narrow_copy(X_DIM_INDEX, 0, lmn->toSymInt());
  AllClose(z.cpu(), x.cpu().narrow_copy(X_DIM_INDEX, 0, Y_DIM));
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75759
Approved by: https://github.com/wconstab
2022-04-26 02:40:27 +00:00
Wonjoo Lee
f9d07ae644 Update torch::lazy::BackendDevice to have a new default ordinal (#76264)
Summary:
Fixes https://github.com/pytorch/xla/issues/3490. Updates `torch::lazy::BackendDevice` with changes below:

1. Remove the no-op string constructor.
2. Update default ordinal to `-1`.
3. Add an `is_valid` function to check if `ordinal` is valid/non-default (`ordinal >= 0`).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76264

Reviewed By: mrshenli

Differential Revision: D35860266

Pulled By: alanwaketan

fbshipit-source-id: 554ebe16a0683d37b00270c4f35163bf690bfe28
(cherry picked from commit b941d10e8545dfecfb34e4d5c24a29a1cc49bc4b)
2022-04-25 23:57:18 +00:00
zengk95
1d55518198 Revert "[nnc] Strides to Tensor (#72962)"
This reverts commit 939060925f.

Fixes https://github.com/pytorch/vision/issues/5873

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76332
Approved by: https://github.com/seemethere
2022-04-25 19:50:00 +00:00
Ivan Kobzarev
939060925f [nnc] Strides to Tensor (#72962)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72962

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM, cpuhrsch

Differential Revision: D34589306

Pulled By: IvanKobzarev

fbshipit-source-id: ecee5249760ecc0c8b2edb1842b90218899bc944
(cherry picked from commit 9e310c4c67389da30da89126d838ffe3864aba6f)
2022-04-23 19:35:15 +00:00
Prem
7557407653 Added directory check before saving in C++ API
Fixes #75177

Couldn't find any utility method to get a directory name in the pytorch repo, hence creating a function for that.
Let me know if a new function is not needed.

I also referred to [this](https://github.com/pytorch/pytorch/blob/master/c10/test/util/tempfile_test.cpp#L15) for the directory check.

Also, I am using TORCH_CHECK to show the error. This is highly verbose, with the entire stack visible. Is there any alternative so that it is easier to read? This could happen frequently, so a small and concise error would be more helpful here.
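
A sketch of the guard, using std::filesystem purely for illustration (the PR adds its own helper to get the directory name):

```
#include <filesystem>
#include <string>

#include <c10/util/Exception.h>

void check_parent_dir_exists(const std::string& filename) {
  auto dir = std::filesystem::path(filename).parent_path();
  TORCH_CHECK(
      dir.empty() || std::filesystem::is_directory(dir),
      "Parent directory ", dir.string(), " does not exist.");
}
```
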
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75681
Approved by: https://github.com/albanD
2022-04-22 20:04:41 +00:00
Wang, Eikan
ef0873327e [NNC] Add utility functions to check channels-last contiguous (#75938)
Summary:
The `Buf` uses `std::vector<ExprHandle>` to represent its strides. The `ExprHandle` could be an immediate value or a mathematical expression with variables involved, both for static and dynamic shapes. So it is hard to directly deduce the channels-last contiguous layout from a numerical calculation. Hence, the utility functions in this PR are based on pattern matching to check whether the `Buf` is channels-last contiguous.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75938

Reviewed By: cpuhrsch

Differential Revision: D35724091

Pulled By: ZolotukhinM

fbshipit-source-id: f79ae21749d0aad8601f0434b52df88602ff09bf
(cherry picked from commit 3712bbbe4bea57c5c1abe1eafde4b8778e13e0c4)
2022-04-22 06:42:39 -07:00
Antonio Kim
2c2c13d21b Decouple Lazy Node Shape Cache (#75324)
Summary:
Next stage of breaking up https://github.com/pytorch/pytorch/pull/74710

Move shape cache implementation to the backend interface. Also, clean up some of the hashing logic in the base node class.

CC: wconstab JackCaoG henrytwo

Partially Fixes https://github.com/pytorch/pytorch/issues/74628

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75324

Reviewed By: anjali411

Differential Revision: D35730823

Pulled By: wconstab

fbshipit-source-id: cf6fa326319b9324e5f422a78817b6fb5bf7e9b8
(cherry picked from commit faec5043df56639e2fd23de2d91ae796e4f3df70)
2022-04-21 17:27:05 -07:00
Nikolay Korovaiko
69e048b090 List of SymInt rebase on master
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75115
Approved by: https://github.com/ezyang
2022-04-20 02:09:55 +00:00
Elias Ellison
f65eb09d6b [JIT] Move Shape Function definition to python
Moves jit shape function registration to python. Like jit decompositions, a script must be run after adding new definitions to serialize them into a c++ file.

This was a request so that torch-mlir could define functions in python and upstream their shape functions. cc @silvasean  @makslevental
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75546
Approved by: https://github.com/davidberard98
2022-04-19 20:59:44 +00:00
Taylor Robie
a5e338a826 [RecordFunction] More efficient machinery to determine which callbacks to run. (#75807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75807

There is a tension in RecordFunction between two use cases:
1) In the normal eager path we don't run any callbacks, so we need to bail out of the profiling path as soon as possible to minimize eager overhead.
2) When profiling we want to determine which callbacks to run as efficiently as possible to minimize instrumentation overhead.

The confounding factor in all of this is sampling callbacks because they change which callbacks will run on each call, even in steady state operation. This has traditionally been handled with a two stage procedure: first we flip a coin to determine if a sampled callback *might* run. If false (which it usually is), do nothing. This solves (1). If true, check to see if we need to build the full callback set or if it was a false positive. This procedure has two negative effects:
* It forces us to rebuild the set of callbacks to run on every step when profiling
* It leaks the sampling abstraction, requiring other parts of the code to bump certain values and forces RecordFunction to lazily initialize.

This change introduces a multi-level cache which can (in the common case) quickly determine which callbacks *will* run, rather than if callbacks *might* run. This means that rather than call `shouldRunRecordFunction`, we can simply get the callbacks for an invocation and check if they are empty. (And completely removes the pre-sampling heuristic.) Another major benefit of the new cache structure is that it allows thread-safe registration and unregistration of global callbacks.

It's worth briefly discussing how this maintains eager performance. In the standard eager case (only sampling callbacks registered) the cache first checks that the global callbacks haven't changed (atomic read), decrements a counter to see if a sampling callback fired, and then returns the active callbacks which is simply a SmallVector of pointer pairs and a couple POD values (scope, needs inputs/outputs/ids). The biggest cost according to perf is the SmallVector logic; we could consider adopting a hard limit on active callbacks; more than half a dozen callbacks *running* in a single step would be quite a lot. But the total cost relative to `PYTORCH_DISABLE_PER_OP_PROFILING` is only ~10ns, so debatable if it's worth it to switch to `std::array`.

The primary change is in `record_function.cpp`, which has a more detailed description of the new cache structure. `record_function.h` has some minor changes to align with the new calling convention and the remaining files are simply changes to the call sites.

Future work:
  * RecordFunction no longer needs to be lazily initialized.
  * We can deprecate the disable/reenable APIs, since we can now safely add and remove global callbacks.

Test Plan:
I tested eager mode performance using the overhead benchmark and found that the non-profiled path was unaffected. However the no-op observer dropped from 0.41us to 0.37us (0.25us if no observers are active) which is about 1/3rd reduction in the cost of the callback selection machinery.

I also added several C++ unit tests, as the core RecordFunction machinery (especially sampling) was largely untested.

Reviewed By: swolchok, davidberard98

Differential Revision: D35276158

fbshipit-source-id: 35135f444724fba4eb97c0ae7f3f710f0f9016fd
(cherry picked from commit 9e359b87422c18f2a195185f32e7e85c82f956fd)
2022-04-19 20:46:16 +00:00
Han Qi
b34b192d6b Reland "Make debug_pkl smaller by only emitting unique traces." (#73368)
Summary:
## Original commit message:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73368

The debug_pkl file inside of pytorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source, which is a stack trace, a filename, and start/end numbers. Those are emitted in the debug_pkl file as strings.
Since many SourceRanges share the same source, the trace strings can be deduped.
The newer format saves the set of unique traces in a tuple, and each SourceRange then saves the offset of its trace w.r.t. its position in that tuple (i.e. manually applying dictionary compression).
The above helps with smaller file size. On loading, if we copied each trace into Source as a string, the runtime memory would still blow up.
To mitigate this, we use SourceView directly instead of Source, which takes a reference to the string inside the Deserializer and turns it into a string_view. This is safe because the Deserializer is held by the Unpickler via shared_ptr, and the Unpickler is in turn held via shared_ptr by another Source object. That Source object will be alive during model construction.
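
A sketch of the "manual dictionary compression" step (illustrative):

```
#include <string>
#include <unordered_map>
#include <vector>

struct TraceTable {
  std::vector<std::string> unique_traces;           // serialized once, as a tuple
  std::unordered_map<std::string, size_t> offsets;  // trace -> position

  // Each SourceRange stores the returned offset instead of the full string.
  size_t intern(const std::string& trace) {
    auto it = offsets.find(trace);
    if (it != offsets.end()) {
      return it->second;
    }
    offsets.emplace(trace, unique_traces.size());
    unique_traces.push_back(trace);
    return unique_traces.size() - 1;
  }
};
```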

Test Plan:
## Original Test plan
unit test

Took original file (312271638_930.predictor.disagg.local); loaded with `torch.jit.load` save again with `torch.jit.save`. Unzip both, look at contents:
```
[qihan@devvm5585.vll0 ~]$ du archive -h
4.0K    archive/xl_model_weights
3.7M    archive/extra
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform
8.0K    archive/code/__torch__/caffe2/torch/fb
8.0K    archive/code/__torch__/caffe2/torch
8.0K    archive/code/__torch__/caffe2
20M     archive/code/__torch__/torch/fx/graph_module
20M     archive/code/__torch__/torch/fx
8.0K    archive/code/__torch__/torch/classes
20M     archive/code/__torch__/torch
20M     archive/code/__torch__
20M     archive/code
2.7M    archive/constants
35M     archive
[qihan@devvm5585.vll0 ~]$ du resaved -h
4.0K    resaved/extra
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform
8.0K    resaved/code/__torch__/caffe2/torch/fb
8.0K    resaved/code/__torch__/caffe2/torch
8.0K    resaved/code/__torch__/caffe2
1.3M    resaved/code/__torch__/torch/fx/graph_module
1.3M    resaved/code/__torch__/torch/fx
8.0K    resaved/code/__torch__/torch/classes
1.4M    resaved/code/__torch__/torch
1.4M    resaved/code/__torch__
1.4M    resaved/code
2.7M    resaved/constants
13M     resaved
[qihan@devvm5585.vll0 ~]$
```
## Additional test:
`buck test mode/dev-tsan //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --exact 'caffe2/benchmarks/static_runtime:static_runtime_cpptest - StaticRuntime.to'` passes

 test jest.fbios.startup_cold_start.local.simulator f333356873 -

Differential Revision: D35196883

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74869
Approved by: https://github.com/gmagogsfm
2022-04-18 22:34:21 +00:00
Han Qi
7d5c07830d Add upgrader related logic to flatbuffer (#71451)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71451

title

Test Plan: unittest

Reviewed By: tugsbayasgalan

Differential Revision: D33593056

fbshipit-source-id: c48d6ad50e6e2f757b68525dfe07693711b95840
(cherry picked from commit 8e09e20c1dafcdbdb45c2d1574da68a32e54a3a5)
2022-04-17 18:51:23 +00:00
Nikita Shulga
fe8eff3711 Revert "Add upgrader related logic to flatbuffer"
This reverts commit dfae96171a.
2022-04-17 11:38:59 -07:00
Han Qi
dfae96171a Add upgrader related logic to flatbuffer
Summary: title

Test Plan: unittest

Differential Revision: D33593056

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71451
Approved by: https://github.com/tugsbayasgalan
2022-04-16 02:04:48 +00:00
Raghavan Raman
c2d5f6a5a4 [nnc] Update bounds overlap analysis to identify non-overlaps even with symbolic bounds
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74658

Approved by: https://github.com/ZolotukhinM
2022-04-14 20:24:03 +00:00
Raghavan Raman
d8ad1a579f [nnc] Fuse loops that have variable bounds
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74346

Approved by: https://github.com/ZolotukhinM
2022-04-14 20:24:03 +00:00
Jiewen Tan
ab0d9b18e9 [LT] Support Tensor.is_alias_of
Summary:
Tensor.is_alias_of relies on Storage to perform the check. However, LTCTensorImpl was
not implemented with that in mind. This commit adds a fake storage to LazyTensor
as a marker for LazyTensors that point to the same storage. The reason
why it's not done at LTCTensorImpl is that LazyTensor maintains the view ops/alias
logic in LazyTensor class instead of relying on TensorImpl to do the check.

Test Plan:
./build/bin/test_lazy --gtest_filter=LazyOpsTest.IsAliasOf

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75246

Approved by: https://github.com/bdhirsh
2022-04-14 07:28:03 +00:00
Nikolay Korovaiko
ce842f43f2 Relanding shape cache (75400) (#75710)
Summary:
https://github.com/pytorch/pytorch/pull/75400

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75710

Reviewed By: malfet

Differential Revision: D35598920

Pulled By: Krovatkin

fbshipit-source-id: 2bbbb3d0c24214b5dbb4ca605e7daa94671f96b0
(cherry picked from commit 572f2f9df5bfd73cd7b83536f619bc86d820ccd8)
2022-04-13 17:17:30 +00:00
PyTorch MergeBot
db1801099b Revert "Relanding shape cache (75400)"
This reverts commit 89486821ed.

Reverted https://github.com/pytorch/pytorch/pull/75710 on behalf of https://github.com/malfet
2022-04-13 17:14:38 +00:00
Nikolay Korovaiko
89486821ed Relanding shape cache (75400)
https://github.com/pytorch/pytorch/pull/75400
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75710
Approved by: https://github.com/malfet
2022-04-13 07:28:32 +00:00
PyTorch MergeBot
c274f66268 Revert "Adding Caching of calculated Symbolic Shapes"
This reverts commit 9a7bfaa929.

Reverted https://github.com/pytorch/pytorch/pull/75400 on behalf of https://github.com/mehtanirav
2022-04-12 21:53:31 +00:00
John Clow
9a7bfaa929 Adding Caching of calculated Symbolic Shapes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75400

Approved by: https://github.com/eellison
2022-04-12 11:19:58 +00:00
Pavithran Ramachandran
6402e62454 Refactor flatbuffer jit code
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75239

Refactor flatbuffer_serializer to move JIT-related code to a separate file.

Differential Revision: [D35301020](https://our.internmc.facebook.com/intern/diff/D35301020/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35301020/)!

Approved by: https://github.com/iseeyuan
2022-04-11 23:41:48 +00:00
John Clow
f281d83d77 Moving Remove Tensor Type Specializations to after custom passes
This is to allow Intel folks to use type information in their custom passes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71748

Approved by: https://github.com/eellison
2022-04-11 22:12:01 +00:00
Yulv-git
ac2d2e3a3d Fix some typos.
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75561
Approved by: https://github.com/albanD
2022-04-11 21:55:59 +00:00
Nikita Shulga
80ea6955af Add cuda-11.3+clang9 build workflow (take 2)
To be able to detect unused captures in GPU code lambdas (as gcc does not support this diagnostic)

Remove unused opts lambda capture in `ProcessGroupMPI.cpp` and `Distributions.cu`

Fix sign-compare in nvfuser benchmark and ignore signed unsigned comparison in nvfuser tests
Fixes https://github.com/pytorch/pytorch/issues/75475 by aliasing CMAKE_CUDA_HOST_COMPILER to C_COMPILER when clang is used
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75293
Approved by: https://github.com/atalman, https://github.com/seemethere
2022-04-11 17:13:01 +00:00
PyTorch MergeBot
8fe43d76d5 Revert "Add cuda-11.3+clang9 build workflow"
This reverts commit 709fcc862e.

Reverted https://github.com/pytorch/pytorch/pull/75293 on behalf of https://github.com/janeyx99
2022-04-11 15:24:59 +00:00
Nikita Shulga
709fcc862e Add cuda-11.3+clang9 build workflow
To be able to detect unused captures in GPU code lambdas (as gcc does not support this diagnostic)

Remove unused opts lambda capture in `ProcessGroupMPI.cpp` and `Distributions.cu`

Fix sign-compare in nvfuser benchmark and ignore signed unsigned comparison in nvfuser tests
Fixes https://github.com/pytorch/pytorch/issues/75475 by aliasing CMAKE_CUDA_HOST_COMPILER to C_COMPILER when clang is used
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75293
Approved by: https://github.com/atalman, https://github.com/seemethere
2022-04-11 14:10:57 +00:00
Jiewen Tan
dc37090ec5 [LT] Support diagonal op (#75230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75230

Op diagonal is a view op which we can't code-gen yet. Therefore, we support
it with hand-written IR construction and lowering.
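
A minimal sketch of the hand-written piece such a view op needs: an IR node that derives its own output shape, since codegen can't do it. The names below are simplified stand-ins for the torch::lazy types, not the PR's actual code:

```cpp
// Sketch only: shape inference a hand-written diagonal IR node must do.
#include <algorithm>
#include <cstdint>
#include <vector>

using Shape = std::vector<int64_t>;

// diagonal(input, offset, dim1, dim2): the output drops dim1/dim2 and
// appends the diagonal length, so the shape is computed by hand.
Shape diagonalShape(const Shape& in, int64_t offset, int64_t dim1, int64_t dim2) {
  Shape out;
  for (int64_t d = 0; d < static_cast<int64_t>(in.size()); ++d) {
    if (d != dim1 && d != dim2) out.push_back(in[d]);
  }
  const int64_t len = offset >= 0
      ? std::min(in[dim1], in[dim2] - offset)
      : std::min(in[dim1] + offset, in[dim2]);
  out.push_back(std::max<int64_t>(len, 0));  // clamp empty diagonals to 0
  return out;
}
```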

Test Plan: ./build/bin/test_lazy --gtest_filter=LazyOpsTest.TestDiagonal*

Reviewed By: wconstab

Differential Revision: D35378316

Pulled By: alanwaketan

fbshipit-source-id: 7958d00107aef20ac37aabcf2868346240977530
(cherry picked from commit 84155528fce484627c9688cfd92fd4aeb68219e5)
2022-04-08 19:49:42 +00:00
Nikolay Korovaiko
4a85145bbd Ansley's rebase of DimensionNode onto master (#75352)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75352

Reviewed By: wconstab

Differential Revision: D35455859

Pulled By: Krovatkin

fbshipit-source-id: e24c81d63dc66d03b752cc8de5cb551d84b003ac
(cherry picked from commit 4ad371cb4cc88860ce8ec398d82083f6759e3fcf)
2022-04-08 17:22:56 +00:00
John Clow
f1db3e465a Adding integration of SSA into LazyTensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75050

Approved by: https://github.com/Krovatkin
2022-04-07 19:49:41 +00:00
Pavithran Ramachandran
3001bda304 [PyTorchEdge] Backport from v9 flatbuffer to v8 pickle (#75201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75201

In this diff:
1. Bump supported version to 9, which will serve as a placeholder for the upcoming version bump to v9 for the flatbuffer format migration.
2. Implements backport from v9 flatbuffer file to v8 pickle file.
ghstack-source-id: 153225189

(Note: this ignores all push blocking failures!)

Test Plan:
fb:
```
cd ~/fbsource/fbcode/ && buck test  -c fbcode.caffe2_enable_flatbuffer=1 caffe2/test/cpp/jit:jit -- LiteInterpreterTest.BackPortByteCodeModelAllVersions
Parsing buck files: finished in 0.7 sec
Downloaded 0/25 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 20.7 sec (100%) 21783/21783 jobs, 5/21783 updated

cd ~/fbsource/fbcode/ && buck test caffe2/test/cpp/jit:jit -- FlatbufferTest.FlatbufferBackPortTest
Parsing buck files: finished in 0.7 sec
Building: finished in 4.5 sec (100%) 12972/53298 jobs, 0/53298 updated
  Total time: 5.3 sec
More details at https://www.internalfb.com/intern/buck/build/b658d597-d358-4293-97cb-28e7612b96e8
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 35d5542d-6ee3-4c28-be10-1d822c7a6fef
Trace available for this run at /tmp/tpx-20220308-090347.891303-35d5542d-6ee3-4c28-be10-1d822c7a6fef/trace.log
RemoteExecution session id: reSessionID-35d5542d-6ee3-4c28-be10-1d822c7a6fef-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8444249379196000
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 490 tests discovered (22.838)
    ✓ Pass: caffe2/test/cpp/jit:jit - FlatbufferTest.FlatbufferBackPortTest (0.289)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/8444249379196000
```

Reviewed By: iseeyuan

Differential Revision: D34702597

fbshipit-source-id: 5c203c29d13360d7934ce6e57557739e7038c05e
(cherry picked from commit 6189e08a2bd968fdab636f77cb6bd73d6c36beb2)
2022-04-07 19:43:57 +00:00
Wang, Eikan
252e1ccce6 Enable TE fuser to support user defined operator (#73073)
Summary:
PyTorch supports registering a custom operator via `TORCH_LIBRARY_FRAGMENT` / `TORCH_LIBRARY_IMPL`, and `torch::jit::tensorexpr::getNNCLoweringRegistry` can insert a lowering for a custom operator. But the TE fuser pass's conditional check does not support custom operators: the `isSupported` check of `tensorexpr_fuser` only admits a `Node` that is in `get_tensorexpr_elementwise_set()`, `supported_non_eltwise_set()`, `supported_misc_set` or `supported_reduction_set`. If a custom operator needs to be added to the TE fusion group, the check will block it.

Taking the RN50 as an example, we can speed up the model by fusing the convolution and consecutive element-wise operator into a custom operator. The framework overhead becomes non-negligible when the computation becomes more efficient, especially for the latency mode and the tiny models. If the TE fuser allows adding the custom operator to the fusion group, then the entire RN50 model could be fused by TE as a single operator/function consisting of "ExternalCalls" and TE-IR.  This could significantly reduce framework overhead, which in turn improves RN50 E2E performance. The same goes for other models.
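
A sketch of the registration half of this workflow, under the registration APIs named above; the fused op here is a toy stand-in, and the NNC lowering itself is only gestured at in a comment:

```cpp
// Sketch: registering a custom op so the TE fuser could consider it.
// TORCH_LIBRARY_FRAGMENT / TORCH_LIBRARY_IMPL are the public APIs the
// summary names; "myops::add_relu" is an illustrative toy op.
#include <ATen/ATen.h>
#include <torch/library.h>

at::Tensor my_add_relu(const at::Tensor& a, const at::Tensor& b) {
  return at::relu(a + b);  // eager reference implementation
}

TORCH_LIBRARY_FRAGMENT(myops, m) {
  m.def("add_relu(Tensor a, Tensor b) -> Tensor");
}

TORCH_LIBRARY_IMPL(myops, CPU, m) {
  m.impl("add_relu", my_add_relu);
}

// An NNC lowering would additionally be inserted via
// torch::jit::tensorexpr::getNNCLoweringRegistry(), and -- the point of
// this PR -- the fuser's isSupported() check must also be taught to
// admit "myops::add_relu" into the fusion group.
```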

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73073

Reviewed By: pbelevich

Differential Revision: D35453165

Pulled By: ZolotukhinM

fbshipit-source-id: a764cf340b0b1e05fe230649cbe44f5786bdd37d
(cherry picked from commit ee95aa4d36714540fbb216a338799e6a6bb966d5)
2022-04-07 04:36:39 +00:00
Martin Yuan
00c1e01ad0 Remove internal logic to handle bytecode version 3 (#57775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57775

The minimum supported bytecode version is updated from 3 to 4. We no longer support version 3 bytecode models.

Why?
* There is hacky code in operator loading that behaves differently for one operator when the global bytecode version is 3. Instead, operator-related metadata should be passed (for example, in #56845). To allow future development, we remove the hacky path first.
* The bytecode version was bumped from 3 to 4 more than half a year ago. Since all production models have been bumped to version 4, it's not practical to keep maintaining version 3. The risk of deprecating version 3 is low.

Test Plan: Imported from OSS

Reviewed By: raziel

Differential Revision: D28270791

Pulled By: cccclai

fbshipit-source-id: 70b1bd6352fdaae5f8d2173b81578d77018c8e44
(cherry picked from commit 3e930fa381cd01f3705116795c6426df992372fc)
2022-04-07 01:45:52 +00:00
Pavithran Ramachandran
f984e50f39 Extend jit::load to work on flatbuffer file; Take 2 (#75256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75256

ghstack-source-id: 153138970

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D35399581

fbshipit-source-id: dafe9d301009d3f70986ed92bfe06d160ab90ba0
(cherry picked from commit ccc860fd07946de5aae12bc179a0b8bbba83b997)
2022-04-06 17:54:01 +00:00
John Clow
26dcec152c Added support for SSA for ops not in a JIT graph
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74340

Approved by: https://github.com/eellison
2022-04-06 01:45:37 +00:00
Antonio Kim
e1b4117e30 Move shape and operand definitions to base node (#75223)
Summary:
First stage of breaking up https://github.com/pytorch/pytorch/pull/74710

Moves the shape and operand definitions from `TsNode` to the base `Node`

CC: wconstab JackCaoG henrytwo

Partially Fixes https://github.com/pytorch/pytorch/issues/74628

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75223

Reviewed By: zou3519

Differential Revision: D35410285

Pulled By: wconstab

fbshipit-source-id: bb84d3fb636882cbe7e18af4b35ff2c0e22aaa58
(cherry picked from commit a4144c9a48379d8a9007cff845796608b597cce1)
2022-04-06 01:43:46 +00:00
Lu Fang
32e58c73c4 Back out "Extend jit::load to work on flatbuffer file" (#75244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75244

Original commit changeset: d653a5af662a

Original Phabricator Diff: D35060736 (d9d34922a0)

Test Plan: Model loading test, verified that D35060736 (d9d34922a0) will cause the torch::save => torch::load failure.

Reviewed By: yinghai, jianyuh

Differential Revision: D35387009

fbshipit-source-id: 9d176992d402d57779e2af3d905b3c1538335298
(cherry picked from commit 6c8cc0d3b8a88b15e35702d70e18bbae8aa4628a)
2022-04-05 09:55:04 +00:00
Nikita Shulga
81d765ef1f Fix sign-compare violations in cpp tests
Prerequisite change for enabling `-Werror=sign-compare` across PyTorch repo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75080

Approved by: https://github.com/atalman
2022-04-04 23:05:31 +00:00
Chen Lai
6efc5c1acf Rewrite upgrader bytecode version from 3 to 4 (content unchanged) (#75120)
Summary:
Update the upgrader models by hacking the backport logic: copy everything in the model and only rewrite the bytecode version to 4 (in D35265596).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75120

ghstack-source-id: 152823046

Test Plan: CI

Reviewed By: qihqi

Differential Revision: D35321154

fbshipit-source-id: 333158bd0fd9b4819b3b7cf47d80c285934adf3e
(cherry picked from commit 74bb2da73a4d18f448b8486772643eac89eb759a)
2022-04-02 01:51:39 +00:00
Pavithran Ramachandran
d9d34922a0 Extend jit::load to work on flatbuffer file (#75022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75022

Extending torch::jit::load to read flatbuffer files
ghstack-source-id: 152820697
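
A minimal usage sketch, assuming the point of this change: the same `torch::jit::load` entry point accepts a flatbuffer-serialized module as well as a zip/pickle one. The "model.ff" path is hypothetical:

```cpp
// Sketch: loading works through the usual entry point regardless of
// on-disk format; format detection happens inside load().
#include <torch/script.h>
#include <iostream>
#include <vector>

int main() {
  torch::jit::Module m = torch::jit::load("model.ff");  // hypothetical file
  std::vector<torch::jit::IValue> inputs{torch::ones({2, 3})};
  std::cout << m.forward(inputs).toTensor() << "\n";
  return 0;
}
```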

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D35060736

fbshipit-source-id: d653a5af662a46107ff4fd70209fd2a0a4d40f20
(cherry picked from commit 109e14a54bd279011c8f9066e6c29e8e0b1fc4db)
2022-04-02 01:33:34 +00:00
Pavithran Ramachandran
7aaa75af05 Extending _get_bytecode_version to support flatbuffers format (#75021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75021

Extending `_get_bytecode_version` to support flatbuffers.
ghstack-source-id: 152771695

(Note: this ignores all push blocking failures!)

Test Plan:
```
~/fbsource/xplat] cd ~/fbsource/xplat/ && buck test //xplat/caffe2:test_lite_interpreter
Building: finished in 0.8 sec (100%) 327/327 jobs, 0/327 updated
  Total time: 0.9 sec
Testing: finished in 06:59.5 min (85 PASS/0 FAIL)
BUILD SUCCEEDED
RESULTS FOR //xplat/caffe2:test_lite_interpreter
PASS    412.3s 85 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_interpreter
TESTS PASSED
```

Reviewed By: iseeyuan

Differential Revision: D34900498

fbshipit-source-id: 65743076d43a933c5381ec128d0268f22c0a8441
(cherry picked from commit 457c76c7d1df6050b941c56a8198162e2e4a3388)
2022-04-01 15:05:37 +00:00
Will Constable
b9e535a64a Add non-eager registration to dispatch autogen (#74557)
Summary:
Previously, the torchscript backend would be (partially) initialized at startup.
- the dispatcher registrations would be made,
- but other backend components would not be initialized until explicitly calling
  the backend init function

With this change, the torchscript backend is not initialized until its explicit
initialization function is called.

This enables external backends to register their own backend instead of the torchscript
backend to the same (Lazy) key.

Lands a change contributed by antoniojkim via lazy_tensor_staging branch (https://github.com/pytorch/pytorch/issues/73973)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74557

Reviewed By: bdhirsh

Differential Revision: D35051464

Pulled By: wconstab

fbshipit-source-id: 5a8b0851293e394f49427d1416ee571a8881fe9f
(cherry picked from commit ef745a4a2c8d1d7f9510541a20f1f40625ce29de)
2022-04-01 03:42:53 +00:00
Will Constable
14affba799 Fix ir_metadata Python frames func and remove dead code (#74979)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74979

Reviewed By: alanwaketan

Differential Revision: D35261641

Pulled By: wconstab

fbshipit-source-id: e82b5f17d0043c4a3de72c16fb42fd02a85414fe
(cherry picked from commit fc6c0a1654256871361a5ad08926bc39d74cd0c5)
2022-03-31 23:23:36 +00:00
Nikolay Korovaiko
5177f95d21 Introducing SymInt to Pytorch (for tracing size arithmetic) (master rebase) (#74861)
Summary:
This PR introduces `SymInt` type to Pytorch which will be used by LTC and AOTAutograd for tracing size arithmetic and tests.
`SymInt` is a C++ union-like structure [int64_t, SymbolicIntNode*] that wraps an int64_t field whose value can be either an index into a list of `shared_ptr<SymbolicIntNode>` or a real int.
This PR doesn't add any support for actually tracing symbolic ints; i.e., data_ for now can only contain real ints.
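
A simplified sketch of that tagged-int64_t idea; the tag bit and side table here are illustrative, not PyTorch's actual encoding:

```cpp
// Sketch of the SymInt concept: one int64_t that is either a plain
// integer or (when tagged) an index into a table of SymbolicIntNodes.
#include <cstdint>
#include <memory>
#include <vector>

struct SymbolicIntNode { /* traced size expression would live here */ };

class SymIntSketch {
 public:
  static constexpr int64_t kTag = INT64_C(1) << 62;  // illustrative tag bit

  /* implicit */ SymIntSketch(int64_t concrete) : data_(concrete) {}

  static SymIntSketch fromNode(std::shared_ptr<SymbolicIntNode> node) {
    table().push_back(std::move(node));
    return SymIntSketch(kTag | static_cast<int64_t>(table().size() - 1));
  }

  bool is_symbolic() const { return (data_ & kTag) != 0; }
  int64_t expect_int() const { return data_; }  // only valid if !is_symbolic()

 private:
  static std::vector<std::shared_ptr<SymbolicIntNode>>& table() {
    static std::vector<std::shared_ptr<SymbolicIntNode>> t;
    return t;
  }
  int64_t data_;
};
```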

```
Goal 1: just to show we can add a type to PyTorch core. (wraps int) LANDEABLE
Finalize the naming - symint
Want the name to be short
Does invoke “size” - NO
SInt/SymInt/SymbolicInt
SInt could mean signed int
sym_int or symint or SymInt (originally it was “int”; capitalized implies object semantics, whereas lowercase implies value semantics)
JIT schema - symint
C++ - symint
```

See more details here: https://docs.google.com/document/d/1iiLNwR5ohAsw_ymfnOpDsyF6L9RTUaHMpD8d843f63f2aYLw-jxEw

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74861

Reviewed By: qihqi, ngimel

Differential Revision: D35226230

Pulled By: Krovatkin

fbshipit-source-id: 34acf342bd50fcaa4d8d5dd49c2fd6a98823a5b3
(cherry picked from commit 218643f63ef181cabb92d13a6e837eb64f2dda3c)
2022-03-31 21:59:59 +00:00
jjsjann123
873ced7cd0 Nvfuser code bump 030122 (#73627)
Summary:
Things changed in this PR that requires review:

test/forward_backward_compatibility/check_forward_backward_compatibility.py

Our previous function overload extension names were wrong and have been updated in this PR; hence the compatibility list was updated.

nvfuser code updates with bug fixes for failures we encountered in OpInfo tests, as well as failures reported by the AOTAutograd team.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73627

Reviewed By: Chillee

Differential Revision: D34765458

Pulled By: davidberard98

fbshipit-source-id: c81f3d6a1b723fb3a8ba419b7f82227f70440ca7
(cherry picked from commit b6a2c362c37051e44fac31687b2fe272f776551e)
2022-03-31 08:18:22 +00:00
Nikita Shulga
43313cbde3 Revert D34647822: [tensorexpr] Add support for aten::stack
Test Plan: revert-hammer

Differential Revision:
D34647822 (954c7e2a77)

Original commit changeset: 3b863c71886c

Original Phabricator Diff: D34647822 (954c7e2a77)

fbshipit-source-id: e9ce06c9c8d7caf0fbb2565f0d99035bad685793
(cherry picked from commit b2ff355e9dbaa4e940fb221254223984c3c8a215)
2022-03-31 04:25:43 +00:00
Nikita Shulga
320e5a8268 Revert D34808051: [tensorexpr] Enabled aten::stack in the fuser pass with static shapes
Test Plan: revert-hammer

Differential Revision:
D34808051

Original commit changeset: 213e2ffdf87f

Original Phabricator Diff: D34808051

fbshipit-source-id: b618daeb346f784e8ab9525040edcb4a30a39613
(cherry picked from commit e47b973cba5c95e9410f8aecdfd5619de6d4be7c)
2022-03-31 04:25:43 +00:00
Hui Guo
90c3699cc8 [tensorexpr] Enabled aten::stack in the fuser pass with static shapes (#74077)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74077

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D34808051

Pulled By: huiguoo

fbshipit-source-id: 213e2ffdf87fb1a74104037cea7ef25e4bfd4307
(cherry picked from commit ad9e84842e5b47eda845827d325b08ba361a8286)
2022-03-31 04:25:43 +00:00
Elias Ellison
2ef5611f31 Add comments for adding shape function and linting (#73570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73570

Approved by: https://github.com/huiguoo

Test Plan: contbuild & OSS CI, see 6d36bbde7e

Reviewed By: pbelevich

Differential Revision: D35192688

Pulled By: atalman

fbshipit-source-id: b12b80e6a6dd1adaa57a8facb6bb077989faa543
(cherry picked from commit e50478c02592597f12b8490ec5496f76c7d8b8cc)
2022-03-31 04:25:43 +00:00
Nikita Shulga
3036a0309d [skip ci]Revert "Add comments for adding shape function and linting"
This is a technical revert of 6d36bbde7e to reconcile it with e50478c02592597f12b8490ec5496f76c7d8b8cc (which is the same + lint changes applied)

Should be skipped during import
2022-03-30 21:21:28 -07:00
Hui Guo
954c7e2a77 [tensorexpr] Add support for aten::stack (#73801)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73801

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D34647822

Pulled By: huiguoo

fbshipit-source-id: 3b863c71886c7c6616b16f5d3313079714c8b82a
(cherry picked from commit c71778cf6a5724d26b671bf3ee0478add24990e8)
2022-03-30 21:25:15 +00:00
Dave Bort
f82b2d4a82 [PyTorchEdge] Make _load_parameters() handle flatbuffer inputs (#74580)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74580

Handle Flatbuffer-serialized parameters.

Make `_load_parameters()` detect the input data format and use the correct deserializer to load the parameters.

Also, rename `BytecodeDeserializer` to `IValueUnpickler` to make it clear that it unpickles an `IValue` and doesn't have anything to do with bytecode.
ghstack-source-id: 152487890

Test Plan:
New unit test shows a successful round trip from _save_parameters() to _load_parameters() using flatbuffers.

```
$ buck test //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
Building: finished in 0.5 sec (100%) 346/346 jobs, 0/346 updated
  Total time: 0.6 sec
Testing: finished in 0.5 sec (26 PASS/0 FAIL)
BUILD SUCCEEDED
RESULTS FOR //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
PASS    <100ms 13 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_trainer
PASS    <100ms 13 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
TESTS PASSED
```

Reviewed By: qihqi

Differential Revision: D34488913

fbshipit-source-id: 8d2c0b895699f3b336115d33bf96d49cbf9245d2
(cherry picked from commit 319345deff260826197f8cdf5ac03071b412c72f)
2022-03-30 20:39:58 +00:00
Dave Bort
1659a267f9 [PyTorchEdge] Export flatbuffers from _save_parameters() (#74579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74579

Now that we can convert a module to a flatbuffer, update `_save_parameters()` to optionally write to that format.

Also, rename the internal `ScriptModuleSerializer` class to `IValuePickler` to make it more clear that a) it's pickle-specific, and b) it serializes IValues, not Modules.
ghstack-source-id: 152487889

Test Plan:
New unit test shows that we can produce Flatbuffer-formatted output.

```
$ buck test //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
Building: finished in 0.5 sec (100%) 346/346 jobs, 0/346 updated
  Total time: 0.6 sec
Testing: finished in 0.5 sec (26 PASS/0 FAIL)
BUILD SUCCEEDED
RESULTS FOR //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
PASS    <100ms 13 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_trainer
PASS    <100ms 13 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
TESTS PASSED
```

A new test in later commit D34488913 tests the full round trip.

Reviewed By: qihqi

Differential Revision: D34408538

fbshipit-source-id: eea183c31b5e1b2b75a65f384d8a479223a4ae72
(cherry picked from commit de310a15422b65fb7e443f7005d287d9f5f586bc)
2022-03-30 20:39:58 +00:00
Elias Ellison
6d36bbde7e Add comments for adding shape function and linting
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73570

Approved by: https://github.com/huiguoo
2022-03-29 23:02:22 +00:00
Elias Ellison
9c4a63787b Add api for changing function executor settings, hook up execution with decomposition registry (#74186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74186

Make the execution settings mutable on function_impl so that we can set them for running op decompositions. Add a mapping to function objects and show an example, in a test, of executing op decompositions.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D34938125

Pulled By: eellison

fbshipit-source-id: adf108b2f6c1bd166910c6d7b94245661d67ce0d
(cherry picked from commit 9957e33803002d9e71abe4ff802769270b6960d3)
2022-03-29 18:38:52 +00:00
Elias Ellison
0ecf1add1b Introduce function-local settings for executor, expose in c++ (#74012)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74012

This allows setting an executor on a function. The first use case is using decompositions in C++ without additional fusion passes, etc., which might not work with custom tensors like batched tensors/vmap. A subsequent use case might be taking advantage of invokers of JIT execution which guard on certain properties before invocation (such as complete shapes in AOT autograd, or rank in lazy tensor).

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D34938124

Pulled By: eellison

fbshipit-source-id: cf7a45416457942b872322cab47d871a8336bdb5
(cherry picked from commit 9c600eb9ad0f2173f003e511268e97584edae36d)
2022-03-29 18:38:52 +00:00
Elias Ellison
6694fdaccd Clean up profiling mode and profiling executor strategy (#73875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73875

Previously we had a few settings:
- getExecutor - which toggled between the Profiling Executor and Legacy
- getGraphOptimize - if true, overrides PE/Legacy to run with the simple executor (no optimizations)
and then...
- getProfilingMode - which would set PE to 0 specializations.

The last mode is redundant with getGraphOptimize; we should just remove it and use getGraphOptimize in these cases. It could lead to potentially invalid combinations of logic - what does it mean if getProfilingMode is true but getExecutor is set to false? This would lead to a bug in specialize_autograd_zero in this case, see: https://github.com/pytorch/pytorch/blob/master/torch%2Fcsrc%2Fjit%2Fpasses%2Fspecialize_autogradzero.cpp#L93.

The tests here are failing but get fixed with the PR above it, so I'll squash for landing.

Test Plan: Imported from OSS

Reviewed By: cpuhrsch

Differential Revision: D34938130

Pulled By: eellison

fbshipit-source-id: 1a9c0ae7f6d1cfddc2ed3499a5af611053ae5e1b
(cherry picked from commit cf69ce3d155ba7d334022c42fb2cee54bb088c23)
2022-03-29 18:38:51 +00:00
Kurt Mohler
5375b2e994 Resolve int[]? arguments to new OptionalIntArrayRef class
This PR uses the `OptionalArrayRef` template class that was drafted in #64084.
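
A sketch of the calling convention this enables for `int[]? dim`-style arguments, assuming the optional-like interface drafted in #64084 (`has_value()`/`value()`, implicit conversion from braced lists and `c10::nullopt`):

```cpp
// Sketch: one signature covers both "no dims" and "these dims" without
// callers spelling out c10::optional<c10::IntArrayRef>.
#include <ATen/ATen.h>
#include <c10/util/OptionalArrayRef.h>

at::Tensor mean_over(const at::Tensor& t, c10::OptionalIntArrayRef dims) {
  return dims.has_value() ? t.mean(dims.value()) : t.mean();
}

// Usage:
//   mean_over(t, {0, 1});       // reduce over dims 0 and 1
//   mean_over(t, c10::nullopt); // reduce over everything
```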

Fixes #44409
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70864
Approved by: https://github.com/ezyang
2022-03-26 01:45:50 +00:00
Pavithran Ramachandran
fc2cf3d26f Back out "Revert D34805092: Extend _save_for_mobile and _load_for_mobile to support flatbuffer format; Default format is pickle + Change buck targets to support only pickle and pickle + flatbuffer for migration" (#74594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74594

Extending `_save_for_mobile` and `_load_for_mobile` to support the flatbuffer format with an additional optional argument, which is set to pickle by default.
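
A usage sketch under that description; the exact trailing parameters of `_save_for_mobile` (extra_files, save_mobile_debug_info, the new format flag) are assumptions here, not a verified signature:

```cpp
// Sketch: pickle stays the default, flatbuffer is opt-in.
#include <torch/script.h>

void export_for_mobile(torch::jit::Module& m) {
  torch::jit::ExtraFilesMap extra;
  // Default path: pickle-based zip container, as before.
  m._save_for_mobile("model.ptl", extra);
  // Opt-in path: flatbuffer output via the new argument (assumed name).
  m._save_for_mobile("model.ff", extra, /*save_mobile_debug_info=*/false,
                     /*use_flatbuffer=*/true);
}
```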

Adding a new binary target with suffix `_pickle_and_flatbuffer` to help migration.

The size test in D34909502 shows the size has regressed by ~40K, but after removing pickle and comparing lite_predictors we measure a ~120K size win that we will achieve when deprecating pickle and moving to flatbuffer.

**BEFORE:**

```lang=mermaid
graph TD;
    torch_core-->torch_mobile_deserialize;

    torch_mobile_core-->torch_mobile_deserialize;

    jit_module_saving-->torch_core;
    jit_module_saving-->torch_mobile_core;

    torch_mobile_deserialize-->caffe2_serialize;
    torch_mobile_deserialize-->torch_mobile_module;

    caffe2_serialize-->miniz;

    flatbuffer_loader-->mobile_bytecode;
    flatbuffer_serializer-->mobile_bytecode;

    mobile_bytecode-->flatbuffer_2.0;

    flatbuffer_loader-->torch_mobile_module;
    flatbuffer_serializer-->torch_mobile_module;
```

**AFTER:**
```lang=mermaid
graph TD;
    torch_core-->torch_mobile_deserialize;

    torch_mobile_core-->torch_mobile_deserialize;

    jit_module_saving-->torch_core;
    jit_module_saving-->torch_mobile_core;

    torch_mobile_deserialize-->caffe2_serialize;
    torch_mobile_deserialize-->torch_mobile_module;

    caffe2_serialize-->miniz;

    flatbuffer_loader-->mobile_bytecode;
    flatbuffer_serializer-->mobile_bytecode;

    mobile_bytecode-->flatbuffer_2.0;

    torch_mobile_deserialize_pickle_and_flatbuffer-->|new| flatbuffer_loader;
    torch_mobile_deserialize_pickle_and_flatbuffer-->|new| torch_mobile_deserialize;
    torch_mobile_core_pickle_and_flatbuffer-->|new| torch_mobile_deserialize_pickle_and_flatbuffer;
    torch_core_pickle_and_flatbuffer-->|new| torch_mobile_deserialize_pickle_and_flatbuffer;

    jit_module_saving_pickle_and_flatbuffer-->|new| torch_core_pickle_and_flatbuffer;
    jit_module_saving_pickle_and_flatbuffer-->|new| torch_mobile_core_pickle_and_flatbuffer;

    flatbuffer_serializer-->torch_mobile_module;

    jit_module_saving_pickle_and_flatbuffer-->|new|jit_module_saving;
    jit_module_saving_pickle_and_flatbuffer-->|new|flatbuffer_serializer;

    flatbuffer_loader-->torch_mobile_module;
```

Original commit changeset: 780dfb6fd6ba

Original Phabricator Diff: D34805092 (284b2b7135)
ghstack-source-id: 152044801

(Note: this ignores all push blocking failures!)

Test Plan:
CI

```
~/fbsource/fbcode] cd ~/fbsource/fbcode/ && buck test -c fbcode.caffe2_enable_flatbuffer=1 //caffe2/test/cpp/jit:jit  -- FlatbufferTest.ExtraFiles
Parsing buck files: finished in 0.9 sec
Building: finished in 5.3 sec (100%) 12992/54304 jobs, 0/54304 updated
  Total time: 6.2 sec
More details at https://www.internalfb.com/intern/buck/build/2b387fff-f813-4cfa-b53f-eb2378630d4e
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d
Trace available for this run at /tmp/tpx-20220323-134108.766518-f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d/trace.log
RemoteExecution session id: reSessionID-f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4503599723101693
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 486 tests discovered (19.122)
    ✓ Pass: caffe2/test/cpp/jit:jit - FlatbufferTest.ExtraFiles (0.187)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4503599723101693
```

Similar Build Deps Dags

```
[pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops_pickle_and_flatbuffer, //xplat/caffe2:torch_mobile_deserialize_pickle_and_flatbuffer)' --output-format dot-compact  | pastry
P486770901: https://www.internalfb.com/intern/paste/P486770901/

[pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops, //xplat/caffe2:torch_mobile_deserialize)' --output-format dot-compact  | pastry
P486771278: https://www.internalfb.com/intern/paste/P486771278/
```

pickle_and_flatbuffer: https://www.internalfb.com/intern/dgw/graph/?build_id=P486770901
pickle: https://www.internalfb.com/intern/dgw/graph/?build_id=P486771278

Reviewed By: iseeyuan

Differential Revision: D35067157

fbshipit-source-id: 9044259c17a2e0da79bd6aedb28efbdfd57e23e0
(cherry picked from commit f738069ec3a72e79da56172741d027de514e9e5f)
2022-03-24 21:51:05 +00:00
Will Constable
3547f20872 Land remaining parts of Torchscript Lazy Tensor backend (#74111)
Summary:
Also enables bazel build to run lazy codegen.  Bazel (oss) build feeds off the same filelists as cmake/buck (build_variables.bzl), so enabling it is easier than keeping it disabled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74111

Test Plan: Run CI and verify test_lazy_ops is running via OSS cmake builds

Reviewed By: bdhirsh

Differential Revision: D34772403

fbshipit-source-id: 8a63f58b9536e6ac1be530667932176ef2549496
(cherry picked from commit e807ffb1918853d10b924fdc24f85ee5b1a39021)
2022-03-22 23:14:03 +00:00
Nikita Shulga
c53b3ed20f Revert D34805092: Extend _save_for_mobile and _load_for_mobile to support flatbuffer format; Default format is pickle + Change buck targets to support only pickle and pickle + flatbuffer for migration
Test Plan: revert-hammer

Differential Revision:
D34805092 (284b2b7135)

Original commit changeset: 57f3fc81d68f

Original Phabricator Diff: D34805092 (284b2b7135)

fbshipit-source-id: 780dfb6fd6ba5f9348f24a2fb3c57971b7155541
(cherry picked from commit bebeb8b84e11c34cbde4857d0e1c291731a7c781)
2022-03-22 22:45:50 +00:00
Pavithran Ramachandran
284b2b7135 Extend _save_for_mobile and _load_for_mobile to support flatbuffer format; Default format is pickle + Change buck targets to support only pickle and pickle + flatbuffer for migration (#74209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74209

Extending `_save_for_mobile` and `_load_for_mobile` to support the flatbuffer format with an additional optional argument, which is set to pickle by default.

Adding a new binary target with suffix `_pickle_and_flatbuffer` to help migration.

The size test in D34909502 shows the size has regressed by ~40K, but after removing pickle and comparing lite_predictors we measure a ~120K size win that we will achieve when deprecating pickle and moving to flatbuffer.

**BEFORE:**

```lang=mermaid
graph TD;
    torch_core-->torch_mobile_deserialize;

    torch_mobile_core-->torch_mobile_deserialize;

    jit_module_saving-->torch_core;
    jit_module_saving-->torch_mobile_core;

    torch_mobile_deserialize-->caffe2_serialize;
    torch_mobile_deserialize-->torch_mobile_module;

    caffe2_serialize-->miniz;

    flatbuffer_loader-->mobile_bytecode;
    flatbuffer_serializer-->mobile_bytecode;

    mobile_bytecode-->flatbuffer_2.0;

    flatbuffer_loader-->torch_mobile_module;
    flatbuffer_serializer-->torch_mobile_module;
```

**AFTER:**
```lang=mermaid
graph TD;
    torch_core-->torch_mobile_deserialize;

    torch_mobile_core-->torch_mobile_deserialize;

    jit_module_saving-->torch_core;
    jit_module_saving-->torch_mobile_core;

    torch_mobile_deserialize-->caffe2_serialize;
    torch_mobile_deserialize-->torch_mobile_module;

    caffe2_serialize-->miniz;

    flatbuffer_loader-->mobile_bytecode;
    flatbuffer_serializer-->mobile_bytecode;

    mobile_bytecode-->flatbuffer_2.0;

    torch_mobile_deserialize_pickle_and_flatbuffer-->|new| flatbuffer_loader;
    torch_mobile_deserialize_pickle_and_flatbuffer-->|new| torch_mobile_deserialize;
    torch_mobile_core_pickle_and_flatbuffer-->|new| torch_mobile_deserialize_pickle_and_flatbuffer;
    torch_core_pickle_and_flatbuffer-->|new| torch_mobile_deserialize_pickle_and_flatbuffer;

    jit_module_saving_pickle_and_flatbuffer-->|new| torch_core_pickle_and_flatbuffer;
    jit_module_saving_pickle_and_flatbuffer-->|new| torch_mobile_core_pickle_and_flatbuffer;

    flatbuffer_serializer-->torch_mobile_module;

    jit_module_saving_pickle_and_flatbuffer-->|new|jit_module_saving;
    jit_module_saving_pickle_and_flatbuffer-->|new|flatbuffer_serializer;

    flatbuffer_loader-->torch_mobile_module;
```
ghstack-source-id: 151744258

Test Plan:
Similar Build Deps Dags

```
[pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops_pickle_and_flatbuffer, //xplat/caffe2:torch_mobile_deserialize_pickle_and_flatbuffer)' --output-format dot-compact  | pastry
P486770901: https://www.internalfb.com/intern/paste/P486770901/

[pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops, //xplat/caffe2:torch_mobile_deserialize)' --output-format dot-compact  | pastry
P486771278: https://www.internalfb.com/intern/paste/P486771278/
```

pickle_and_flatbuffer: https://www.internalfb.com/intern/dgw/graph/?build_id=P486770901
pickle: https://www.internalfb.com/intern/dgw/graph/?build_id=P486771278

Reviewed By: iseeyuan

Differential Revision: D34805092

fbshipit-source-id: 57f3fc81d68fce941a050c35bd8e6f05951183b3
(cherry picked from commit 671ae4ed29e65b86ffe507a503548d3e86ab0ea4)
2022-03-22 20:00:53 +00:00
Han Qi
4b4f652f79 [3/5] Put JIT source inside flatbuffer (#74245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74245

title

Test Plan: unittest

Reviewed By: iseeyuan

Differential Revision: D34881612

fbshipit-source-id: 7037982e9267ad72b86e91cd5f2d92426d71dd56
(cherry picked from commit 88f34eb55b2bee6ef8ef27188e075fa2b8767fdf)
2022-03-17 18:46:47 +00:00
Will Constable
d67a265881 Sync lazy_tensor_staging to master (#74311)
Summary:
This merges changes that have already been reviewed/landed onto lazy_tensor_staging branch.  It combines changes from multiple PRs into one diff.

updated from lazy_tensor_staging on 3/16

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74311

Test Plan:
Run CI to ensure compilation on various platforms
Run unit tests on lazy_tensor_staging branch with source version of all these diffs

Reviewed By: desertfire

Differential Revision: D34929235

fbshipit-source-id: babbc3bbeabc5b8107ee9284ed7765887a148622
(cherry picked from commit d91577a6557343ec536f6859e4808ec1a8a9b685)
2022-03-17 16:08:57 +00:00
Will Constable
44a8d4d998 Add lazy tensor unit tests, disabled (#74309)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74309

Since the test file is large, it can be landed on its own and then switched on
in the diff that actually builds lazy tensor code.

Test Plan: verify CI passes

Reviewed By: desertfire

Differential Revision: D34928619

fbshipit-source-id: cd556155326f7fb55b3f29031f80bc36c936d565
(cherry picked from commit 60945adbefb6a8d19f89e330f8b344d076b13bfc)
2022-03-17 15:31:26 +00:00
Will Constable
72b1194464 Run lazy tensor codegen in generate_code.py (#73996)
Summary:
Hooks into existing autograd codegen script (generate_code.py) to take advantage of its integrations into buck/cmake/bazel.

Adds a new option (--gen_lazy_ts_backend) to generate_code.py, calling this from the CMake OSS build and the fbcode build, but not from other internal xplat/ovrsource builds (these could be opted in later)

Bazel support is added in a later diff.

Includes one generated file (torch/csrc/lazy/generated/LazyIr.h) in a unit test (test/cpp/lazy/test_ir.cpp) to partially verify the generator is working, but does not compile the remaining output sources from the generator yet as they depend on other files not yet landed from lazy_tensor_staging branch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73996

Test Plan: OSS/internal CI - verify all builds are working and test_ir.cpp compiles LazyIr.h

Reviewed By: ezyang

Differential Revision: D34408536

fbshipit-source-id: 8af0aea3b95d81eccafc17d64390d70ddd176515
(cherry picked from commit f930612f2bad61c76eb02d85cfbec9f33a1459dc)
2022-03-17 15:31:26 +00:00
Han Qi
ded82ad7c7 Create method to map JIT module to (source, constant) and back. (#74119)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74119

  Implemented a function to generate source as an ExtraFilesMap plus constants.

  Wrote a function to construct a JIT module given an (ivalue, source,
  constant) triple.

Test Plan: unittest

Reviewed By: pavithranrao

Differential Revision: D34803945

fbshipit-source-id: 2edc798407fe68294cb4c3c7516f5bd143df88c3
(cherry picked from commit 35e54e166b8f0f5cfe8f08c07866b59ae61ee79d)
2022-03-15 18:30:08 +00:00
Taylor Robie
0b1f3bd158 [Profiler] Prefer TSC to wall clock when available (#73855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73855

Calling the clock is one of the most expensive parts of profiling. We can reduce the profiling overhead by using `rdtsc` instead. The tradeoff is that we have to measure and convert (shift and scale).
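
A sketch of that "measure and convert" step: sample TSC and the wall clock together once, derive a tick-to-nanosecond factor, then turn cheap `rdtsc` reads into approximate timestamps. x86-only as written, and not the profiler's actual calibration code:

```cpp
// Sketch: calibrate rdtsc against the wall clock, then convert.
#include <x86intrin.h>  // __rdtsc
#include <chrono>
#include <cstdint>
#include <thread>

struct TscClock {
  double ns_per_tick;
  uint64_t tsc0;
  int64_t wall0_ns;

  static int64_t wall_ns() {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
               std::chrono::system_clock::now().time_since_epoch()).count();
  }

  static TscClock calibrate() {
    const uint64_t t0 = __rdtsc();
    const int64_t w0 = wall_ns();
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    const uint64_t t1 = __rdtsc();
    const int64_t w1 = wall_ns();
    return {static_cast<double>(w1 - w0) / static_cast<double>(t1 - t0),
            t0, w0};
  }

  // Hot path: one rdtsc plus a multiply-add, no clock syscall.
  int64_t now_ns() const {
    return wall0_ns + static_cast<int64_t>(
        static_cast<double>(__rdtsc() - tsc0) * ns_per_tick);
  }
};
```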

Test Plan: I added a cpp unit test with *very* aggressive anti-flake measures. I also ran the overhead benchmark (9 replicates) with `--stressTestKineto` (0.94 -> 0.89 us) and `--stressTestKineto --kinetoProfileMemory` (1.27 -> 1.17 us)

Reviewed By: chaekit

Differential Revision: D34231071

fbshipit-source-id: e3b3dd7580d93bcc783e87c7f2fc726cb74f4df8
(cherry picked from commit e8be9f8160793c6ee35d5af02bca3e01703e377d)
2022-03-13 18:29:06 +00:00
Taylor Robie
5a58820f01 [Profiler] Specialized AppendOnlyQueue (#73409)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73409

We can do better than `vector` or `deque`, and it's sufficiently important to the hot path to justify a custom container. (This is part of the larger queue refactor, but this is a standalone drop-in replacement so we don't need to wait.)
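
A sketch of why a specialized append-only container beats `vector` and `deque` here: fixed-size chunks make emplace a bounds check plus an in-place construct, written elements never move (unlike `vector` growth), and there is no deque bookkeeping. Illustrative only, not the actual container:

```cpp
// Sketch: chunked append-only storage with stable element addresses.
#include <cstddef>
#include <utility>
#include <vector>

template <typename T, std::size_t kChunkSize = 1024>
class AppendOnlySketch {
 public:
  template <typename... Args>
  void emplace_back(Args&&... args) {
    if (chunks_.empty() || chunks_.back().size() == kChunkSize) {
      chunks_.emplace_back();
      chunks_.back().reserve(kChunkSize);  // one allocation per chunk
    }
    chunks_.back().emplace_back(std::forward<Args>(args)...);
  }

  std::size_t size() const {
    return chunks_.empty()
        ? 0
        : (chunks_.size() - 1) * kChunkSize + chunks_.back().size();
  }

  // Replay events in insertion order once profiling has stopped.
  template <typename Fn>
  void for_each(Fn&& fn) const {
    for (const auto& chunk : chunks_) {
      for (const auto& item : chunk) fn(item);
    }
  }

 private:
  // Inner buffers never reallocate (reserved up front), so written
  // elements stay put even as the outer vector grows.
  std::vector<std::vector<T>> chunks_;
};
```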

Test Plan: It's a pretty simple container type, so I just added a few cpp tests for emplace and read back. I also ran the overhead benchmark (replicates=9) with both `--stressTestKineto` (0.99 -> 0.94 us) and `--stressTestKineto --kinetoProfileMemory` (1.36 -> 1.27 us).

Reviewed By: swolchok

Differential Revision: D34231072

fbshipit-source-id: ed57299729d444d59cf843a0d38a3ee2240eeec1
(cherry picked from commit 43907948f3a8d2137244e7bb59f43999bd660917)
2022-03-11 19:47:40 +00:00
David Dang
abfaef0aec [Quant][core] Merged conv packed params and linear packed params (#73486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73486

conv and linear packed params were previously defined in <ATen/native/quantized/cpu/conv_packed_params.h> and <ATen/native/quantized/cpu/packed_params.h>. These two files have been merged into one and relocated to <ATen/native/quantized/cpu/packed_params.h>.

Differential Revision: D34513286

Test Plan: Imported from OSS

Reviewed By: dagitses

Pulled By: dzdang

fbshipit-source-id: 813845af7ea9449e316ab7822efe7460f0bd0d88
(cherry picked from commit 2f627561f27f81977ff73b8863c5e9e719dc4c60)
2022-03-11 15:18:45 +00:00
Ivan Kobzarev
519e226b66 [tensorexp] ExternalCall2 without memcpy (#72225)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72225

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D33960933

Pulled By: IvanKobzarev

fbshipit-source-id: fc73a3de9e5150919e3806516065b4a6c8316000
(cherry picked from commit f637842c341e0ba94906a0c8a1efc81691dc512c)
2022-03-09 21:19:26 +00:00
Han Qi
0723639b60 Revert D34455360: Multisect successfully blamed D34455360 for test failures
Summary:
This diff reverts D34455360 (61d6c43864).
D34455360 (61d6c43864) is making the following tests fail, and this revert diff is either the revert of the blame diff or the revert of the stack of diffs that need to be reverted to revert the blame diff.

Tests affected:
- https://www.internalfb.com/intern/test/562950004334605/

Multisect link:
https://www.internalfb.com/intern/testinfra/multisect/756170

Test Plan: NA

Reviewed By: zhxchen17

Differential Revision: D34596156

fbshipit-source-id: a465bca0094db3caf6130c80f1ed49eea981359b
(cherry picked from commit ef5e5578c64ce9827570757fb016aafa9c782c6a)
2022-03-08 23:18:54 +00:00
Elias Ellison
52ccbf4494 Lock thread/block computation (#73800)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73800

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D34647281

Pulled By: eellison

fbshipit-source-id: adbdaf24191c4c1b85e0b62564388f2481002ed2
(cherry picked from commit 6cf38015cc14691518b1b5cb7d636e80eb3684fc)
2022-03-04 22:32:08 +00:00
Dave Bort
7b51629c53 [PyTorchEdge] Add getFileFormat() so we can differentiate Zip/Pickle from Flatbuffer (#73707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73707

Add a helper function to detect the file format from the first bytes of a data file or stream. This will be necessary during the migration from Pickle-serialized modules to Flatbuffer-serialized modules.
ghstack-source-id: 150384317
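
A sketch of first-bytes format sniffing. The ZIP magic `PK\x03\x04` is standard; the 4-byte flatbuffer file identifier at offset 4 is assumed here to be the mobile-module identifier "PTMF":

```cpp
// Sketch: decide the deserializer from the first 8 bytes of the stream.
#include <array>
#include <cstring>
#include <istream>

enum class FileFormat { ZipFile, Flatbuffer, Unknown };

FileFormat sniffFormat(std::istream& in) {
  std::array<char, 8> header{};
  in.read(header.data(), header.size());
  in.seekg(0);  // rewind so the real loader sees the whole stream
  if (std::memcmp(header.data(), "PK\x03\x04", 4) == 0) {
    return FileFormat::ZipFile;     // zip container (pickle inside)
  }
  if (std::memcmp(header.data() + 4, "PTMF", 4) == 0) {
    return FileFormat::Flatbuffer;  // flatbuffer mobile module (assumed id)
  }
  return FileFormat::Unknown;
}
```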

Test Plan:
Existing tests for ZIP+Pickle continue to pass.

New unit tests pass:
```
cd xplat && buck test //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_interpreter

Building: finished in 26.6 sec (100%) 3180/3180 jobs, 571/3180 updated
  Total time: 32.2 sec
Testing: finished in 07:08.3 min (89 PASS/0 FAIL)
BUILD SUCCEEDED
RESULTS FOR //xplat/caffe2:test_lite_interpreter //xplat/caffe2:test_lite_trainer
PASS    421.1s 81 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_interpreter
PASS     103ms  8 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_trainer
TESTS PASSED
```

Reviewed By: iseeyuan

Differential Revision: D34527859

fbshipit-source-id: ff2d1eabc2f8be1de2e44709c878e2d1a373f0df
(cherry picked from commit 5c394848346ab9e374c9e7eed479ad70ed09a7ae)
2022-03-04 19:35:41 +00:00
Han Qi
61d6c43864 Make debug_pkl smaller by only emitting unique traces. (#73368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73368

debug_pkl file inside of pytorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source, which is a stack trace, a filename, and start/end numbers. Those are emitted in the debug_pkl file as strings.
Since many SourceRanges share the same source, the string for a trace can be deduped.
The newer format saves the set of unique traces in a tuple, and each SourceRange then saves the offset of its trace w.r.t. its position in that tuple (i.e. manually applying dictionary compression).
The above helps with smaller file size. On loading, if we copied each trace into Source as a string, the runtime memory would still blow up.
To mitigate this, we use SourceView directly instead of Source, which takes a reference to the string inside the Deserializer and turns it into a string_view. This is safe because the Deserializer is held by the Unpickler via shared_ptr, and the Unpickler is also held via shared_ptr by another Source object. That Source object stays alive during model construction.
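
A sketch of that dictionary-compression step: intern each unique trace string once and have every SourceRange record only an offset into the table. Names are illustrative:

```cpp
// Sketch: dedupe trace strings into a table of unique entries.
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

struct TraceTable {
  std::vector<std::string> traces;               // serialized once, as a tuple
  std::unordered_map<std::string, std::size_t> index;

  std::size_t intern(const std::string& trace) {
    auto it = index.find(trace);
    if (it != index.end()) return it->second;    // shared source: reuse offset
    traces.push_back(trace);
    index.emplace(trace, traces.size() - 1);
    return traces.size() - 1;
  }
};

struct SourceRangeRecord {
  std::size_t trace_offset;  // points into TraceTable::traces
  int64_t start, end;
};
```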

Test Plan:
unit test

Took the original file (312271638_930.predictor.disagg.local); loaded it with `torch.jit.load`, saved it again with `torch.jit.save`. Unzipped both and looked at the contents:
```
[qihan@devvm5585.vll0 ~]$ du archive -h
4.0K    archive/xl_model_weights
3.7M    archive/extra
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform
8.0K    archive/code/__torch__/caffe2/torch/fb
8.0K    archive/code/__torch__/caffe2/torch
8.0K    archive/code/__torch__/caffe2
20M     archive/code/__torch__/torch/fx/graph_module
20M     archive/code/__torch__/torch/fx
8.0K    archive/code/__torch__/torch/classes
20M     archive/code/__torch__/torch
20M     archive/code/__torch__
20M     archive/code
2.7M    archive/constants
35M     archive
[qihan@devvm5585.vll0 ~]$ du resaved -h
4.0K    resaved/extra
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform
8.0K    resaved/code/__torch__/caffe2/torch/fb
8.0K    resaved/code/__torch__/caffe2/torch
8.0K    resaved/code/__torch__/caffe2
1.3M    resaved/code/__torch__/torch/fx/graph_module
1.3M    resaved/code/__torch__/torch/fx
8.0K    resaved/code/__torch__/torch/classes
1.4M    resaved/code/__torch__/torch
1.4M    resaved/code/__torch__
1.4M    resaved/code
2.7M    resaved/constants
13M     resaved
[qihan@devvm5585.vll0 ~]$
```

Reviewed By: gmagogsfm

Differential Revision: D34455360

fbshipit-source-id: 8cc716f9bba7183746b1b4ecc33a2de34ac503b9
(cherry picked from commit f1a04730fc9ac8fdab6c8e4c44cb5529e42090e4)
2022-03-02 08:37:08 +00:00
Mengwei Liu
9ce9803abe [PyTorch] Add codegen unboxing ability (#69881)
Summary:
RFC: https://github.com/pytorch/rfcs/pull/40

This PR (re)introduces python codegen for unboxing wrappers. Given an entry of `native_functions.yaml` the codegen should be able to generate the corresponding C++ code to convert ivalues from the stack to their proper types. To trigger the codegen, run
```
tools/jit/gen_unboxing.py -d cg/torch/share/ATen
```
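
A hand-written sketch of the shape of code this generator emits: pop IValues off the interpreter stack, convert to typed arguments, call the statically dispatched kernel, push the result back. The real output lives in the generated UnboxingFunctions_*.cpp files; this analogue uses `aten::add.Tensor` for illustration:

```cpp
// Sketch of one generated unboxing wrapper (not the generator's output).
#include <ATen/ATen.h>
#include <ATen/core/stack.h>

void add_Tensor_unboxed(torch::jit::Stack& stack) {
  // Schema: add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor
  // Arguments are popped in reverse order (last argument is on top).
  at::Scalar alpha = torch::jit::pop(stack).toScalar();
  at::Tensor other = torch::jit::pop(stack).toTensor();
  at::Tensor self = torch::jit::pop(stack).toTensor();
  torch::jit::push(stack, at::add(self, other, alpha));
}
```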

Merged changes on CI test. In https://github.com/pytorch/pytorch/issues/71782 I added an e2e test for static dispatch + codegen unboxing. The test exports a mobile model of mobilenetv2, then loads and runs it in a new binary for the lite interpreter: `test/mobile/custom_build/lite_predictor.cpp`.

## Lite predictor build specifics

1. Codegen: `gen.py` generates `RegisterCPU.cpp` and `RegisterSchema.cpp`. Now with this PR, once `static_dispatch` mode is enabled, `gen.py` will not generate `TORCH_LIBRARY` API calls in those cpp files, hence avoids interaction with the dispatcher. Once `USE_LIGHTWEIGHT_DISPATCH` is turned on, `cmake/Codegen.cmake` calls `gen_unboxing.py` which generates `UnboxingFunctions.h`, `UnboxingFunctions_[0-4].cpp` and `RegisterCodegenUnboxedKernels_[0-4].cpp`.
2. Build: `USE_LIGHTWEIGHT_DISPATCH` adds generated sources into `all_cpu_cpp` in `aten/src/ATen/CMakeLists.txt`. All other files remain unchanged. In reality all the `Operators_[0-4].cpp` are not necessary but we can rely on linker to strip them off.

## Current CI job test coverage update

Created a new CI job `linux-xenial-py3-clang5-mobile-lightweight-dispatch-build` that enables the following build options:
* `USE_LIGHTWEIGHT_DISPATCH=1`
* `BUILD_LITE_INTERPRETER=1`
* `STATIC_DISPATCH_BACKEND=CPU`

This job triggers `test/mobile/lightweight_dispatch/build.sh` and builds `libtorch`. Then the script runs C++ tests written in `test_lightweight_dispatch.cpp` and `test_codegen_unboxing.cpp`. Recent commits added tests to cover as many C++ argument type as possible: in `build.sh` we installed PyTorch Python API so that we can export test models in `tests_setup.py`. Then we run C++ test binary to run these models on lightweight dispatch enabled runtime.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69881

Reviewed By: iseeyuan

Differential Revision: D33692299

Pulled By: larryliu0820

fbshipit-source-id: 211e59f2364100703359b4a3d2ab48ca5155a023
(cherry picked from commit 58e1c9a25e3d1b5b656282cf3ac2f548d98d530b)
2022-03-01 23:28:13 +00:00
Elias Ellison
d3d74e9040 Allow custom registration of shape functions (#73270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73270

Together with open registration of NNC lowerings this should make possible to add support for custom operators, including internal fb-ops

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D34451275

Pulled By: eellison

fbshipit-source-id: ae8ae2deb93caa6770e738217461e65853897b55
(cherry picked from commit ea6b7e8a6d8f970a20e68d02eefc5c951e32aa07)
2022-02-28 17:44:45 +00:00
Pavithran Ramachandran
62eb7d64cf [PyTorchEdge] Extend flatbuffer to support extra files map (#72951)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72951

Extend flatbuffer to support extra files map

Flatbuffer schema has extra files. The users can write extra files by providing a `map<string, string>`, which will be part of the flatbuffer model asset and can be loaded back similarly to pickle.
ghstack-source-id: 149622799
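
A round-trip sketch for extra files, mirroring the pickle behavior the summary describes. The flatbuffer flag on `_save_for_mobile` follows the related commits in this log and is an assumption; `_load_for_mobile` fills in values for keys pre-seeded in the map:

```cpp
// Sketch: save extra files into a flatbuffer module and read them back.
#include <torch/csrc/jit/mobile/import.h>
#include <torch/script.h>

void roundtrip_extra_files(torch::jit::Module& m) {
  torch::jit::ExtraFilesMap to_save{{"metadata.json", "{\"version\": 1}"}};
  m._save_for_mobile("model.ff", to_save, /*save_mobile_debug_info=*/false,
                     /*use_flatbuffer=*/true);  // assumed trailing flag

  torch::jit::ExtraFilesMap loaded{{"metadata.json", ""}};  // keys to fetch
  torch::jit::mobile::Module mm =
      torch::jit::_load_for_mobile("model.ff", c10::nullopt, loaded);
  // loaded["metadata.json"] now holds the saved string.
}
```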

Test Plan:
fb:

```
[pavithran@devvm5216.vll0 ~/fbsource/fbcode] cd ~/fbsource/fbcode/ && buck test caffe2/test/cpp/jit:jit -- FlatbufferTest.ExtraFiles
Parsing buck files: finished in 0.7 sec
Downloaded 0/8 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 20.0 sec (100%) 22343/22343 jobs, 4/22343 updated
  Total time: 20.7 sec
More details at https://www.internalfb.com/intern/buck/build/7dba5034-d623-4a1e-afa1-b0e809df7066
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 9c1ac1e0-a8c0-4a62-95df-8f49695aa7d1
Trace available for this run at /tmp/tpx-20220216-144630.207992/trace.log
RemoteExecution session id: reSessionID-9c1ac1e0-a8c0-4a62-95df-8f49695aa7d1-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/7318349470518809
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 468 tests discovered (17.211)
    ✓ Pass: caffe2/test/cpp/jit:jit - FlatbufferTest.ExtraFiles (0.169)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/7318349470518809
```

Reviewed By: iseeyuan

Differential Revision: D34286346

fbshipit-source-id: 4e09ab25b8ed6af6f8923db3aab046c255f13bb8
(cherry picked from commit ce8d88e22a360b25253d8a75f428d523fa88a79a)
2022-02-24 19:39:32 +00:00
Jacob Szwejbka
faacb8ab36 [Pytorch Edge] Lean Runtime Test
Summary:
As far as I can tell, there's no CI that actually runs the lean_runtime. This should add it, I think. (Is this directory covered by CI?) Next up is to create some tests for min_runtime_lib.

(Note: this ignores all push blocking failures!)

Test Plan: buck test :lean_runtime_delegate_flatbuffer_test

Reviewed By: iseeyuan

Differential Revision: D34255148

fbshipit-source-id: b44693220e93869edd984bbcd17d33db4007a4ea
(cherry picked from commit 0a4a6b5bd2b4a1f8cce8bc1c4a22dad9539631c1)
2022-02-24 18:40:47 +00:00
Alban Desmaison
3bd1507ff2 Revert D33994011: Make debug_pkl smaller by only emitting unique traces.
Test Plan: revert-hammer

Differential Revision:
D33994011 (3d37f5b052)

Original commit changeset: 8e6224c6e942

Original Phabricator Diff: D33994011 (3d37f5b052)

fbshipit-source-id: 885e739efa1081382e1fcf9c6cccba92c57e9f7a
(cherry picked from commit a6d98c85a736c2eb321a6f38005dd0f5dc43eb87)
2022-02-24 16:38:55 +00:00
Han Qi
3d37f5b052 Make debug_pkl smaller by only emitting unique traces. (#72596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72596

debug_pkl file inside of pytorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source, which is a stack trace, a filename, and start/end numbers. Those are emitted in the debug_pkl file as strings.

Since many SourceRanges share the same source, the string for a trace can be deduped.

The newer format saves the set of unique traces in a tuple, and each SourceRange then saves the offset of its trace w.r.t. its position in that tuple (i.e. manually applying dictionary compression).

The above helps with smaller file size. On loading, if we copied each trace into Source as a string, the runtime memory would still blow up.
To mitigate this, we use SourceView directly instead of Source, which takes a reference to the string inside the Deserializer and turns it into a string_view. This is safe because the Deserializer is held by the Unpickler via shared_ptr, and the Unpickler is also held via shared_ptr by another Source object. That Source object stays alive during model construction.

Test Plan:
unit test

Took the original file (312271638_930.predictor.disagg.local); loaded it with `torch.jit.load`, saved it again with `torch.jit.save`. Unzipped both and looked at the contents:
```
[qihan@devvm5585.vll0 ~]$ du archive -h
4.0K    archive/xl_model_weights
3.7M    archive/extra
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform
8.0K    archive/code/__torch__/caffe2/torch/fb
8.0K    archive/code/__torch__/caffe2/torch
8.0K    archive/code/__torch__/caffe2
20M     archive/code/__torch__/torch/fx/graph_module
20M     archive/code/__torch__/torch/fx
8.0K    archive/code/__torch__/torch/classes
20M     archive/code/__torch__/torch
20M     archive/code/__torch__
20M     archive/code
2.7M    archive/constants
35M     archive
[qihan@devvm5585.vll0 ~]$ du resaved -h
4.0K    resaved/extra
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform
8.0K    resaved/code/__torch__/caffe2/torch/fb
8.0K    resaved/code/__torch__/caffe2/torch
8.0K    resaved/code/__torch__/caffe2
1.3M    resaved/code/__torch__/torch/fx/graph_module
1.3M    resaved/code/__torch__/torch/fx
8.0K    resaved/code/__torch__/torch/classes
1.4M    resaved/code/__torch__/torch
1.4M    resaved/code/__torch__
1.4M    resaved/code
2.7M    resaved/constants
13M     resaved
[qihan@devvm5585.vll0 ~]$
```

Reviewed By: JasonHanwen

Differential Revision: D33994011

fbshipit-source-id: 8e6224c6e942e91c3403f686c8f0937d1002ed41
(cherry picked from commit a7014dd4029308c95007f362a57c31796d686647)
2022-02-24 09:31:16 +00:00
Hui Guo
5eb5b61221 [tensorexpr] Add typecast when src and dest buf types are different in PlacementAllocate (#71934)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71934

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D33826700

Pulled By: huiguoo

fbshipit-source-id: 9fb29a43ab5983586a6bfde3a34d7e2f2120ab0a
(cherry picked from commit 2bee018691ec888cb1ec761528951f5745d7ef79)
2022-02-23 19:36:50 +00:00
Yedidya Feldblum
7a5b0efc64 [caffe2] fix build failures in optimized builds under clang
Summary:
There are various possible approaches, but the approach chosen minimizes disruption to source control blame.

Addresses:
```
error: Function _ZN23FunctionalTest_Pad_Test8TestBodyEv is too big to optimize [-Werror,-Wignored-optimization-argument]
```

Test Plan: buck2 build mode/opt caffe2/test/cpp/api:functional

Reviewed By: jamesr66a

Differential Revision: D34027291

fbshipit-source-id: 9dfd771ad56d3d4bc0d41b38b04654c8dae7c006
(cherry picked from commit d43b5a7ed6)
2022-02-22 22:31:47 +00:00
Raghavan Raman
0d66748948 [jit] Add tests for JIT with dynamic shape fusion (#72201)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72201

Reviewed By: mikaylagawarecki

Differential Revision: D34067211

Pulled By: navahgar

fbshipit-source-id: 2c13bb43c76c7fed720ad37892d2177c3dc0b924
(cherry picked from commit eed2d8cea4)
2022-02-18 23:29:08 +00:00
Alban Desmaison
0951cb513a Revert D34342689: Revert D34250357: Sync lazy_tensor_staging back to master
Test Plan: revert-hammer

Differential Revision:
D34342689

Original commit changeset: 43f6da6986f7

Original Phabricator Diff: D34250357 (69389fb542)

fbshipit-source-id: 8a3fb74877e719e9b9577b58027b4e7061a04ef0
(cherry picked from commit c749f08e7a)
2022-02-18 17:31:21 +00:00
Alban Desmaison
86a961af87 Revert D34250357: Sync lazy_tensor_staging back to master
Test Plan: revert-hammer

Differential Revision:
D34250357 (69389fb542)

Original commit changeset: aa7d589f6050

Original Phabricator Diff: D34250357 (69389fb542)

fbshipit-source-id: 43f6da6986f7fc5189d641b7803adc5ada27194c
(cherry picked from commit 3c930a5e4e)
2022-02-18 15:47:37 +00:00
Will Constable
69389fb542 Sync lazy_tensor_staging back to master (#72875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72875

This diff contains changes from several PRs landed to lazy_tensor_staging branch.
* generating 'fallback' overrides for each codegenned op, useful for debugging
* supports operators which are missing aten:: symbols for op names, instead using their string counterpart
* makes the IR class a base class instead of hardcoding the assumption of TS

It also resolves lint issues and in particular cleans up the following:
* {Type}s shouldn't be passed into isValueType, and using the catch-all base class of CType is nicer than specifying a list of types.

Fixes #72852

Test Plan: test manually on lazy_tensor_staging branch

Reviewed By: shunting314

Differential Revision: D34250357

fbshipit-source-id: aa7d589f605055d5d02bc77c77fa6f1182ff7497
(cherry picked from commit 2f8f5e4971)
2022-02-18 03:49:46 +00:00
Raghavan Raman
6d33852685 [NNC] TensorExprKernel state should not be modified on calls to run methods (#73028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73028

A typical use case for `TensorExprKernel` is to create the kernel once and call it multiple times, possibly in parallel. For the parallel calls to work, we need to ensure that the run() method calls do not change any state in `TensorExprKernel`.

Before this change, the `run()` method was modifying the sizes and strides vectors when dynamic shapes were present. This manifested as a data race when running a model with Static Runtime.
ghstack-source-id: 149398820
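
A sketch of the fix's shape: anything that varies per call (concrete sizes/strides resolved from dynamic shapes) lives in locals owned by `run()`, never written back into the kernel object, so concurrent calls cannot race. Names below are illustrative, not the actual `TensorExprKernel` code:

```cpp
// Sketch: per-call shape resolution kept off the shared kernel state.
#include <cstdint>
#include <vector>

class KernelSketch {
 public:
  void run(const std::vector<std::vector<int64_t>>& input_sizes) const {
    // Per-call buffers live on this stack frame, not in members.
    std::vector<int64_t> resolved_sizes;
    std::vector<int64_t> resolved_strides;
    bindDynamicShapes(input_sizes, resolved_sizes, resolved_strides);
    // ... launch the compiled code with the locals ...
  }

 private:
  // const: the compiled artifact is immutable after construction.
  void bindDynamicShapes(const std::vector<std::vector<int64_t>>& in,
                         std::vector<int64_t>& sizes,
                         std::vector<int64_t>& strides) const {
    for (const auto& s : in) {
      sizes.insert(sizes.end(), s.begin(), s.end());
    }
    (void)strides;  // strides would be derived similarly; elided
  }
};
```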

Test Plan:
```
buck build mode/dev-asan //caffe2/test/cpp/tensorexpr:tensorexpr
./buck-out/dev/gen/caffe2/test/cpp/tensorexpr/tensorexpr --gtest_filter="DynamicShapes.MultiThreadedExecution"
```

Reviewed By: eellison

Differential Revision: D34287960

fbshipit-source-id: d311f3c5a66c5d5de4e1deaeaa01816b53e9906e
(cherry picked from commit 161568bfae)
2022-02-17 23:14:27 +00:00
Mike Iovine
d1c5f9e439 [JIT][SR] Introduce prim::IfThenElse (#72587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72587

This pattern frequently appears in a few graphs:

```
%result = prim::If(%condition)
  block0():
    -> (%a)
  block1():
    -> (%b)
```

This is slow, particularly in static runtime. Static runtime creates memory planners/block runners for each sub-block, which eats up a lot of memory and introduces a lot of extra overhead for this relatively simple operation.

This diff introduces a new op that replaces nodes like the above with a single op meant to act like a ternary operator:

```
%result = prim::IfThenElse(%condition, %a, %b)
```
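
For reference, a minimal TorchScript sketch (illustrative, not from this diff) of source that lowers to the trivial `prim::If` pattern above:

```
import torch

@torch.jit.script
def pick(cond: bool, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # a scripted ternary lowers to a prim::If whose blocks just return the
    # two values -- exactly the shape this pass rewrites to prim::IfThenElse
    return a if cond else b

print(pick.graph)  # shows the prim::If node before the rewrite runs
```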

Test Plan: New unit tests

Reviewed By: eellison

Differential Revision: D34091789

fbshipit-source-id: eb6a8c460c39b4c019a1f4ab1f3f1e5b6edc400c
(cherry picked from commit 0f1b335e5b)
2022-02-17 18:22:48 +00:00
Will Constable
889f3f48b2 Revert D34178476: Update lazy_ir.py from lazy_tensor_staging
Test Plan: revert-hammer

Differential Revision:
D34178476 (3842140fd5)

Original commit changeset: 7190b2e0d82b

Original Phabricator Diff: D34178476 (3842140fd5)

fbshipit-source-id: 4c969a355f01244c6f5acc52bc31679f2182aa55
(cherry picked from commit 17082075dd)
2022-02-16 19:34:41 +00:00
Will Constable
3842140fd5 Update lazy_ir.py from lazy_tensor_staging (#72730)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72730

This diff contains changes from several PRs landed to lazy_tensor_staging branch.
- generating 'fallback' overrides for each codegenned op, useful for debugging
- supports operators which are missing aten:: symbols for op names, instead using their string counterpart
- makes the IR class a base class instead of hardcoding the assumption of TS

Test Plan: tested on lazy_tensor_staging branch

Reviewed By: desertfire

Differential Revision: D34178476

fbshipit-source-id: 7190b2e0d82b4eb1f4510c858c24446c6df3f9d0
(cherry picked from commit 6713d3f0ef)
2022-02-16 18:33:31 +00:00
Shunting Zhang
763ad1bf25 (2/2) Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions: frontend change (#72899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72899

Reland D33282878 (911d527b87). This is the frontend change.
ghstack-source-id: 149204031

Test Plan: Refer to D33282878 (911d527b87). Also check CI

Reviewed By: gmagogsfm

Differential Revision: D34252127

fbshipit-source-id: 27b17ddd4d05d904eb91fd9ee094d9121f00e388
(cherry picked from commit 1d276baca3)
2022-02-16 03:45:15 +00:00
Ivan Kobzarev
67cd98fad4 [tensorexpr] Fix isNLC segfault (#72786)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72786

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D34204523

Pulled By: IvanKobzarev

fbshipit-source-id: 9a0f2ce0a1921e261932029c3ebd842330fdf528
(cherry picked from commit b8326064f6)
2022-02-15 20:31:56 +00:00
Michael Suo
7db4a48d92 Revert D33342569: (2/2) Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions: frontend change
Test Plan: revert-hammer

Differential Revision:
D33342569 (856157fcee)

Original commit changeset: 57984ac67ae2

Original Phabricator Diff: D33342569 (856157fcee)

fbshipit-source-id: 4c12235a1776a3652e7f91e93b626705759d5176
(cherry picked from commit 4cbd7d8bab)
2022-02-15 18:45:44 +00:00
Shunting Zhang
856157fcee (2/2) Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions: frontend change (#70471)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70471

Reland D33282878 (911d527b87). This is the frontend change.
ghstack-source-id: 149114933

Test Plan: Refer to D33282878 (911d527b87). Also check CI

Reviewed By: gmagogsfm

Differential Revision: D33342569

fbshipit-source-id: 57984ac67ae2c56c38f72d3b1fb69105901fb472
(cherry picked from commit b47cc935ee)
2022-02-15 07:21:19 +00:00
Pavithran Ramachandran
a482aeb0ce [PyTorchEdge] backport v8 to v7 to support promoted ops as instruction (#71662)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71662

Backport v8 to v7 to support promoted ops as instructions.

Adds a flag so that promoted ops can be exported as instructions for v8, and as operators for v7 and below.
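
For illustration, a sketch of driving the backport from Python, assuming the OSS helpers `torch.jit.mobile._backport_for_mobile` and `_get_model_bytecode_version` (the file names are hypothetical):

```
from torch.jit.mobile import _backport_for_mobile, _get_model_bytecode_version

# hypothetical file names: backport a v8 lite-interpreter model for v7 runtimes
_backport_for_mobile("model_v8.ptl", "model_v7.ptl", to_version=7)
print(_get_model_bytecode_version("model_v7.ptl"))  # expected: 7
```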

Test Plan:
```
buck test caffe2/test/cpp/jit:jit -- LiteInterpreterTest.BackPortByteCodeModelAllVersions

Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/5629499620570927
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 461 tests discovered (15.693)
    ✓ Pass: caffe2/test/cpp/jit:jit - LiteInterpreterTest.BackPortByteCodeModelAllVersions (2.712)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/5629499620570927
```

```
buck run mode/opt //caffe2/torch/fb/mobile/upgrader_codegen:upgrader_codegen

buck test mode/opt //caffe2/test:upgrader_codegen -- mobile.test_upgrader_codegen.TestLiteScriptModule
Parsing buck files: finished in 0.8 sec
Downloaded 0/2 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 01:39.4 min (100%) 11031/11031 jobs, 2/11031 updated
  Total time: 01:40.2 min
More details at https://www.internalfb.com/intern/buck/build/a8b0e417-019c-44ba-be6b-23379411a965
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 44fbfa66-cce8-4277-82ac-f89d79558581
Trace available for this run at /tmp/tpx-20220202-160956.915412/trace.log
RemoteExecution session id: reSessionID-44fbfa66-cce8-4277-82ac-f89d79558581-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/281475200877601
    ✓ ListingSuccess: caffe2/test:upgrader_codegen : 1 tests discovered (1.249)
    ✓ Pass: caffe2/test:upgrader_codegen - test_generate_bytecode (mobile.test_upgrader_codegen.TestLiteScriptModule) (1.365)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/281475200877601
```

Reviewed By: iseeyuan

Differential Revision: D33719098

fbshipit-source-id: e2d2b23d298f98e4d4fcdfc344f7b8c6f92cff26
(cherry picked from commit 81b956c23a)
2022-02-15 03:47:39 +00:00
jiej
2d110d514f Nvfuser code bump 2_1_2022 (#72127)
Summary:
Things changed in this PR that require review:
1. aten/src/ATen/core/interned_strings.h
2. torch/csrc/jit/ir/alias_analysis.h : exposing createValue to allow efficient mutation
3. torch/csrc/jit/runtime/symbolic_shape_registry.cpp : added gelu/tanh/erf in registry
4. torch/jit/_script.py : throws an error when scripting a model that uses autocast as a decorator, since that's not supported

nvfuser code update:
1. codegen improvements and performance tuning
2. integration bug fixes for shape expression logic
3. kernel segmentation update to address perf regression from horizontal fusion
4. scalar cpu tensor promotion to support inter-device operation between cpu scalar tensor and cuda tensor

Things reverted from local changes:
aten::gelu with approximation (tracked in PR: https://github.com/pytorch/pytorch/pull/61439)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72127

Reviewed By: HamidShojanazeri

Differential Revision: D34113233

Pulled By: jbschlosser

fbshipit-source-id: b82cde32b71e324eca0ea57cb8c9f9647278ca74
(cherry picked from commit e009bc5c4e)
2022-02-15 00:43:16 +00:00
Jacob Szwejbka
52c516ecb8 [Pytorch Edge] Minor documentation improvements in test_backend_with_compiler
Summary:
Went through all these files and the design doc to understand the to_backend API. Figured I could add some comments to these files to make the APIs a little clearer for those who come after.

(Note: this ignores all push blocking failures!)

Test Plan: na

Reviewed By: raziel, larryliu0820

Differential Revision: D34221989

fbshipit-source-id: 699fcbd8714bfb6b58c6c0bf0e5fbc019d2ef6f8
(cherry picked from commit 0b3f5d73e8)
2022-02-14 23:44:46 +00:00
Ryan Spring
4f8b986e28 Implement Tanh Gelu Approximation (#61439)
Summary:
1. Implements https://github.com/pytorch/pytorch/issues/39853
2. Adds approximate boolean flag to Gelu
3. Enables Tanh Gelu approximation
4. Adds double backward support for Gelu
5. Enable Tanh Gelu in NvFuser

```
import torch

def normcdf(x):
    # standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 0.5 * (1.0 + torch.erf(x * 0.7071067811865476))

def gelu(x, approximate: str = 'none'):
    if approximate == 'tanh':
        # sqrt(2/pi) = 0.7978845608028654
        return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0))))
    else:
        return x * normcdf(x)
```
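
As a quick sanity check of the approximation (a sketch, not part of this PR):

```
x = torch.randn(1000)
err = (gelu(x, 'tanh') - gelu(x)).abs().max()
print(err)  # small; the tanh form closely tracks the exact GELU
```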

Linking XLA PR - https://github.com/pytorch/xla/pull/3039

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439

Reviewed By: VitalyFedyunin

Differential Revision: D33894937

Pulled By: jbschlosser

fbshipit-source-id: b65e8fb6ea66168af8f34f45ed50e92737a33851
(cherry picked from commit 6e986f91a9)
2022-02-14 03:40:32 +00:00
Mikhail Zolotukhin
1855b14922 [TensorExpr] Delete DimArg class. (#72390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72390

This class didn't add much value and only caused more boilerplate code.
This change removes the class and updates all its use sites to use
`ExprHandle`.

A side effect of this change is different names in loop variables, which
caused massive mechanical changes in our tests.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D34030296

Pulled By: ZolotukhinM

fbshipit-source-id: 2ba4e313506a43ab129a10d99e72b638b7d40108
(cherry picked from commit c2ec46a058)
2022-02-11 01:21:59 +00:00
Mikhail Zolotukhin
9123e9b3b5 [TensorExpr] Switch from ExprPtr to ExprHandle in Compute impl. (#72389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72389

This is an NFC change that just prepares the code for the upcoming
deletion of `DimArg` class. This change makes `Compute` and `Reduce`
APIs to use `ExprHandle` everywhere.

There should be no observable behavior change from this PR.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D34030295

Pulled By: ZolotukhinM

fbshipit-source-id: 3fd035b6a6bd0a07ccfa92e118819478ae85412a
(cherry picked from commit 1b0a4b6fac)
2022-02-11 01:21:59 +00:00
David Berard
c314750401 [JIT] enable profiling optional tensors (#70532)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70532

This adds profiling to Optional[Tensor] types

First, in profiling_record.cpp, profiling nodes are added to Optional[Tensor] inputs. The nodes record
(a) whether or not any `None` types are encountered, and
(b) of the Tensor types, the most specific type matching all of the non-null tensors that were encountered (shape, dtype, etc.)

In tensorexpr_fuser, when specializing types based on the profiled information, an Optional[Tensor] type will always be Optional[], but the Tensor type contained in the optional type can be specialized (e.g. `Optional[Float(2x2x2, cpu, etc)]`)
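
A minimal sketch (illustrative, not from the diff) of the kind of program this affects:

```
import torch
from typing import Optional

@torch.jit.script
def add_bias(x: torch.Tensor, bias: Optional[torch.Tensor]) -> torch.Tensor:
    if bias is not None:
        return x + bias
    return x

# profiling runs record (a) whether `bias` was ever None and (b) the most
# specific tensor type covering the non-None values that were seen
for _ in range(5):
    add_bias(torch.rand(2, 2), torch.rand(2, 2))
```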

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33714748

Pulled By: davidberard98

fbshipit-source-id: 93c819054450de7ac84b112de1012c0c12e34120
(cherry picked from commit 21cfd80123)
2022-02-08 22:52:26 +00:00
Raghavan Raman
765908708b [nnc] Adding a test with dynamic shapes from a model (#72198)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72198

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D33951741

Pulled By: navahgar

fbshipit-source-id: 596b193eba14c8e1affa9fa13070079f05d64cac
(cherry picked from commit ddbb78ff80)
2022-02-08 02:00:46 +00:00
Raghavan Raman
ff71429906 [nnc] Add stride args while running with allocated outputs (#72223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72223

ghstack-source-id: 148494871

Test Plan:
```
buck test mode/opt //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - DynamicShapes.GraphWithSymbolicStrides'
```

Reviewed By: eellison

Differential Revision: D33960592

fbshipit-source-id: 6334978d5e3713889b4ad12bcd8ed8c69df39d58
(cherry picked from commit 95cc102bc2)
2022-02-07 19:24:56 +00:00
Han Qi
57f039b41f Fixing few bugs in torch flatbuffer (#72349)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72349

1. Interface-called methods need to be registered to the class. Previously all interface calls were inlined, so there was no such problem.
2. parseDoubleList and parseBoolList got reversed during refactoring.

Test Plan:
1. Get ASR's test model at
```
mkdir ~/asr1 && cd ~/asr1
fbpkg fetch speech.tuna.milan.ondevice.en_us
```
2. Convert model:
```
cd ~/fbsource
buck run //xplat/caffe2/fb/lite_predictor:convert_model -- --model=$HOME/asr1/pytorchmodel.pt --output_name=$HOME/asr1/pytorchmodel.ff
```
3. Ran lite_predictor_flatbuffer
```
 buck run //xplat/caffe2/fb/lite_predictor:lite_predictor_flatbuffer -- --model=$HOME/asr1/pytorchmodel.ff --method_to_call=encode_src --method_to_generate_input=get_all_bundled_inputs_for_encode_src

```

See perf metric generated (means loading and inference succeeded).

Reviewed By: gmagogsfm, zhxchen17

Differential Revision: D33959746

fbshipit-source-id: 24671e1189438119f477032eb6c29bd7736e74ca
(cherry picked from commit 5e18809350)
2022-02-05 00:25:27 +00:00
Raghavan Raman
38f696c0cd [nnc] Add an API to unroll loops by a given factor (#72071)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72071

Reviewed By: ngimel

Differential Revision: D33946250

Pulled By: navahgar

fbshipit-source-id: 3f3f92054174620025a9d71154d006f1738953e2
(cherry picked from commit d8b53598e9)
2022-02-03 18:41:21 +00:00
kshitij12345
02f6226bff [fix] Dropout2d-3d no-batch-dim (#69885)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/69801

TODO:
* [x] Update C++ API

cc albanD mruberry jbschlosser walterddr kshitij12345
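
A small sketch of the no-batch-dim usage this enables, assuming the unbatched input is interpreted as (C, H, W):

```
import torch

m = torch.nn.Dropout2d(p=0.5)
x = torch.rand(3, 8, 8)  # unbatched (C, H, W) input
y = m(x)                 # whole channels are zeroed, matching the batched behavior
```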

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69885

Reviewed By: mruberry

Differential Revision: D33175470

Pulled By: jbschlosser

fbshipit-source-id: c9d7d9e0f59ba290a0157725c338a345f3d58b9f
(cherry picked from commit 7e4271a156)
2022-02-02 16:40:32 +00:00
CodemodService FBSourceClangFormatLinterBot
ed435e903f [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33938055

fbshipit-source-id: 6c0643a18f09854e87e183341f252c66dd6395a6
(cherry picked from commit fd183aedbc)
2022-02-02 11:27:15 +00:00
Ivan Kobzarev
34e4418dfa [nnc] tensorexpr for quantized/aten::upsample_nearest2d (#71236)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71236

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D33553305

Pulled By: IvanKobzarev

fbshipit-source-id: 2442afee6d23123bb3a4bc52d3555393b0254106
(cherry picked from commit 90a263fc08)
2022-02-01 19:48:53 +00:00
Elias Ellison
cf1833df70 [WIP] add explicit dynamic fusion arg (#71173)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71173

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D33536222

Pulled By: eellison

fbshipit-source-id: a097408ecdd6e284432de128feb297993d882d52
(cherry picked from commit 0e3419b2d3)
2022-02-01 19:07:02 +00:00
Nikita Shulga
74c44ba9d6 Revert D33850228: [pytorch][PR] Implement Tanh Gelu Approximation
Test Plan: revert-hammer

Differential Revision:
D33850228 (23d03025dc)

Original commit changeset: 3cc33fb298e4

Original Phabricator Diff: D33850228 (23d03025dc)

fbshipit-source-id: 9436e7df73c2b2e2011f321674f24973316d3692
(cherry picked from commit c9efb58223)
2022-01-31 17:44:19 +00:00
Ryan Spring
23d03025dc Implement Tanh Gelu Approximation (#61439)
Summary:
1. Implements https://github.com/pytorch/pytorch/issues/39853
2. Adds approximate boolean flag to Gelu
3. Enables Tanh Gelu approximation
4. Adds double backward support for Gelu
5. Enable Tanh Gelu in NvFuser

```
import torch

def normcdf(x):
    # standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 0.5 * (1.0 + torch.erf(x * 0.7071067811865476))

def gelu(x, approximate: str = 'none'):
    if approximate == 'tanh':
        # sqrt(2/pi) = 0.7978845608028654
        return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0))))
    else:
        return x * normcdf(x)
```

Linking XLA PR - https://github.com/pytorch/xla/pull/3039

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439

Reviewed By: cpuhrsch

Differential Revision: D33850228

Pulled By: jbschlosser

fbshipit-source-id: 3cc33fb298e480d7ecc5c67716da019d60c6ab33
(cherry picked from commit 3a53b3e94f)
2022-01-31 17:07:45 +00:00
Tristan Rice
6208c2800e torch/monitor: merge Interval and FixedCount stats (#72009)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72009

This simplifies the Stats interface by merging IntervalStat and FixedCountStat into a single Stat with a specific window duration and an optional max number of samples per window. This preserves the original intention of having comparably sized windows (for statistical purposes) while also providing consistent output bandwidth.

Test Plan:
```
buck test //caffe2/test:monitor //caffe2/test/cpp/monitor:monitor
```

Reviewed By: kiukchung

Differential Revision: D33822956

fbshipit-source-id: a74782492421be613a1a8b14341b6fb2e8eeb8b4
(cherry picked from commit 293b94e0b4)
2022-01-30 23:21:59 +00:00
David Berard
99bc978b78 [JIT] Propagate requires_grad to autodiff subgraphs (#71666)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71666

When JIT autodiff is constructing a gradient computation graph, it will only add gradients for tensors that have requires_grad set. Previously, requires_grad information was **not** propagated to the subgraph that autodiff used; as a result, autodiff would calculate *all* gradients, even if requires_grad had never been set during profiling runs. In certain cases, this can lead to performance issues. For example, during training, the gradient of the input data is not needed, but is still computed.

This propagates requires_grad to the subgraph passed into autodiff, so that autodiff will not compute unnecessary gradients.
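
A sketch of the scenario (illustrative names; assumes the profiling executor with warm-up runs):

```
import torch

def linear(x, w):
    return torch.nn.functional.linear(x, w)

scripted = torch.jit.script(linear)
x = torch.rand(4, 8)                       # input data: gradient not needed
w = torch.rand(16, 8, requires_grad=True)  # weight: gradient needed

# after profiling runs, the autodiff subgraph should only compute the
# gradient w.r.t. w; the unused gradient for x is no longer calculated
for _ in range(3):
    out = scripted(x, w)
out.sum().backward()
```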

Test: `./bin/test_jit --gtest_filter="AutodiffRemoveUnusedGradientsTest.Linear"`

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D33725304

Pulled By: davidberard98

fbshipit-source-id: ca7ab4c9a6a26f94f93aff2d5a4135e125323ba1
(cherry picked from commit a97fe0556d)
2022-01-28 18:57:36 +00:00
Joel Schlosser
cb823d9f07 Revert D33744717: [pytorch][PR] Implement Tanh Gelu Approximation
Test Plan: revert-hammer

Differential Revision:
D33744717 (f499ab9cef)

Original commit changeset: d64532a562ed

Original Phabricator Diff: D33744717 (f499ab9cef)

fbshipit-source-id: 396c3f63de5865f894dbc353d0790a01a624be93
(cherry picked from commit e9fb2d1db1)
2022-01-28 18:35:01 +00:00
Ryan Spring
f499ab9cef Implement Tanh Gelu Approximation (#61439)
Summary:
1. Implements https://github.com/pytorch/pytorch/issues/39853
2. Adds approximate boolean flag to Gelu
3. Enables Tanh Gelu approximation
4. Adds double backward support for Gelu
5. Enable Tanh Gelu in NvFuser

```
import torch

def normcdf(x):
    # standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 0.5 * (1.0 + torch.erf(x * 0.7071067811865476))

def gelu(x, approximate: str = 'none'):
    if approximate == 'tanh':
        # sqrt(2/pi) = 0.7978845608028654
        return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0))))
    else:
        return x * normcdf(x)
```

Linking XLA PR - https://github.com/pytorch/xla/pull/3039

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439

Reviewed By: mikaylagawarecki

Differential Revision: D33744717

Pulled By: jbschlosser

fbshipit-source-id: d64532a562ed53247bb4fa52bb16722634d5c187
(cherry picked from commit 4713dd9cca)
2022-01-28 16:59:09 +00:00
John Clow
c85965600c Fix bug where frozen mod not used for OFI #68903 (#71436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71436

Fixes issue #68903

Test Plan: Imported from OSS

Reviewed By: george-qi

Differential Revision: D33824857

Pulled By: Gamrix

fbshipit-source-id: 8d351feb4a621916f55003c58527a1e85eec476e
(cherry picked from commit 57bb420040)
2022-01-27 23:37:50 +00:00
Pavithran Ramachandran
bf69a61293 (1/2) Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions: backend change
Summary: Reland for D33282878 (911d527b87). Land the backend change first to maintain FC. Will wait for 2 weeks after this diff is in, and then land the front-end change in the next diff.

Test Plan:
test in next diff

time buck test mode/dev-nosan fblearner/flow/projects/langtech/translation:tests -- test_e2e_base_training

Reviewed By: gmagogsfm

Differential Revision: D33342547

fbshipit-source-id: b3dee9a4bdfd78103848c12629e5fccafdd621e3
(cherry picked from commit ae1935f1af)
2022-01-27 03:29:40 +00:00
Mikhail Zolotukhin
1dbcde2ade [TensorExpr] Support scalar intermediate and output values. (#71186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71186

So far we've only supported scalar inputs, but couldn't handle scalar outputs
or intermediates. This PR adds that support.

Scalar outputs are returned as 0-dim tensors. If the kernel is invoked on a
stack of IValues, we correctly convert the results to scalar IValues when
needed. If the kernel is invoked with a vector of void* pointers, everything
works out of the box without any conversions.

Lowerings for scalar operators are a bit tricky. Usual lowerings return a pair
<Buf, Stmt> (aka Tensor), but for scalar operators we also want to have the
corresponding Var that the lowering function supposedly creates (in theory we
could just use Loads and Stores, but I'm worried it can affect performance as
there is no guarantee this will be optimized by LLVM). So, what we do here to
work around this is we return a fake buf + stmt that sets the corresponding
var. Then outside of the lowering we create a real buffer and generate a Store
to it with the value from the variable we passed as the base handle of the fake
buf. This real buffer is then treated as usual by the rest of the system and we
can use it if we need to return this scalar value as a kernel output. If we do
not need to return it, then the Store will be deleted by the DCE pass.

Differential Revision: D33539324

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: ab4524b9820ce204f106effcf6232ed33d4ee223
(cherry picked from commit 7faa0939f0)
2022-01-26 06:32:51 +00:00
Jacob Szwejbka
70f3078dd6 [Pytorch Edge] Wrap lowered module in to_backend (#71597)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71597

Problem: _jit_to_backend overrides get/set state. This means any attributes added to the module after lowering will not be preserved after serialization. For edge workflows the biggest problem here is it breaks bundled_inputs.

Solution?:

Real quick and easy way to handle issues with to_backend overriding get/set state. Wraps the lowered module in another module and has forwarding functions for the api specified in 'method_compile_spec'.

The tradeoff with this approach is that the actual workhorse of the module is now 1 layer deep, which might make debugging slightly grosser/more difficult/confusing. The other approach Martin, David, and I talked about would be to only lower the portions that require custom get/set state logic. This leaves the top level the same, and only specific backend internals are changed. Personally I'm not sure how much that really addresses the debugging concern all that well. It seems like if you cracked the model open you'd still run into similar amounts of confusion, with a lot of the variables and logic referenced coming from another module.

The other concern with this approach is whether or not 'compile_spec' specifies the public API of the module (since that's our source of truth for this wrapper). While it may not be enforced, it certainly seems to be true by convention, and the to_backend API already uses it as a source of truth for all functions that get generated in the resulting module. I say we just formally commit to this (compile spec keys being functions) being the contract of the API instead of just assuming it to be the case and then having weird behavior if it's not.

Test Plan:
New Unit Test
CI to check for existing behavior and contracts.

manually tested in a notebook with bundled inputs.

{P475790313}

Reviewed By: raziel

Differential Revision: D33694257

fbshipit-source-id: 9ff27db421eba41bac083dff11a22e9e40a36970
(cherry picked from commit 91ef49977e)
2022-01-25 06:30:19 +00:00
Peter Bell
40d1f77384 Codegen: python_torch_functions only include relevant operators (#68693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68693

Generation of python bindings for native functions is split over 8
different files. One for each namespace, with the torch namespace
split into 3 shards, and methods in their own file as well. This
change ensures that editing any single (non-method) operator only
causes one of these files to be rebuilt.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32596270

Pulled By: albanD

fbshipit-source-id: 0570ec69e7476b8f1bc21138ba18fe8f95ebbe3f
(cherry picked from commit ba0fc71a3a)
2022-01-21 15:37:06 +00:00
Jacob Szwejbka
e926360cb8 [Pytorch Edge] Refactor Compatibility Stuff into own directory (#71432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71432

Organizing jit/mobile a little more

ghstack-source-id: 147184536

Test Plan: ci.

Reviewed By: iseeyuan

Differential Revision: D33640527

fbshipit-source-id: f3a7884fe0d06d80bb8d9cf141ecaee34b6f88ff
(cherry picked from commit 4c3d1e5435)
2022-01-20 19:38:41 +00:00
Han Qi
21b697b646 add flatbuffer_loader and flatbuffer_serializer as BUCK target (#71463)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71463

title

Test Plan: unittest

Reviewed By: zhxchen17

Differential Revision: D33651339

fbshipit-source-id: 4bf325a40e263a441fd86bce560645ad0c1ebb23
(cherry picked from commit 4cb02e62a6)
2022-01-20 04:51:10 +00:00
Raghavan Raman
70c9146c40 [nnc] Update block and thread extents in cuda_codegen to use int64_t (#71428)
Summary:
The block and thread extent calculations in `cuda_codegen` should be using `int64_t` instead of `int`. The updated test, `test_dynamic_shapes`, fails without this change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71428

Reviewed By: samdow

Differential Revision: D33640374

Pulled By: navahgar

fbshipit-source-id: 64c340ad2a9a1fa1fe066cf1c5dfc3b546b7be6d
(cherry picked from commit 6ea546ce11)
2022-01-19 23:21:24 +00:00
Peter Bell
6f4c491c6b empty_cpu: Add functions that don't depend on Tensor (#70613)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70613

This refactors `at::detail::empty_cpu` to use only `TensorBase` so you
can construct tensors without including `Tensor.h`. It also adds a
`TensorOptions` version to reduce friction in operators moving from
the `at::empty` API.

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D33623682

Pulled By: ngimel

fbshipit-source-id: 7a7b08bc2ed06830a3d698197a0c8389a096dc1d
(cherry picked from commit 2e17ad0bbd)
2022-01-19 00:01:58 +00:00
Jiewen Tan
680d61daab [LT] Remove torch::lazy::convertShapes (#71291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71291

This commit removes torch::lazy::convertShapes since it's no longer used.
In addition, it replaces a numel logic within LTCTensorImpl.

Test Plan:
./build/bin/test_lazy
CI in lazy_tensor_staging branch

Reviewed By: wconstab

Differential Revision: D33575084

Pulled By: alanwaketan

fbshipit-source-id: b104ef39fd552822e1f4069eab2cb942d48423a6
2022-01-14 12:06:39 -08:00
CodemodService FBSourceClangFormatLinterBot
88012c7daf [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33577744

fbshipit-source-id: 7ecc8367998ee1dffde54c2f4dd3cfafe19a53c9
2022-01-14 06:10:57 -08:00
Mike Ruberry
3a0c680a14 Jiterates exp2, erfc, erfinv and entr and refactors code_template.h to ATen (#71295)
Summary:
Per title.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71295

Reviewed By: ngimel

Differential Revision: D33575885

Pulled By: mruberry

fbshipit-source-id: bc841b46fc0b5458a26a4d4465b18a7a54cd5a5b
2022-01-13 23:58:51 -08:00
Zhengxu Chen
5f2b4be3b9 [jit] Split DynamicType conformance test into smaller pieces. (#71275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71275

Currently it's taking more than 10 minutes to run the conformance test. Instead we should use a parametrized test to shard it into segments so that they can run in parallel.
ghstack-source-id: 146990608

Test Plan:
```
[zhxchen17@devbig560.ftw3 /data/users/zhxchen17/fbsource/fbcode] buck test mode/dev-tsan //caffe2/test/cpp/jit:jit -- -r 'LiteInterpreterDynamicTypeTestFixture'
Building... 34.9 sec (99%) 12110/12111 jobs, 0/12111 updated
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: ebea52b3-7c7f-46be-9f69-18e2e7b040cc
Trace available for this run at /tmp/tpx-20220113-113635.717778/trace.log
RemoteExecution session id: reSessionID-ebea52b3-7c7f-46be-9f69-18e2e7b040cc-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4222124735827748
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 431 tests discovered (11.173)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/0 (51.331)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/1 (65.614)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/3 (76.875)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/5 (77.271)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/4 (78.871)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/6 (78.984)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/7 (84.068)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/2 (85.198)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/8 (88.815)
    ✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/9 (90.332)
Summary
  Pass: 10
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4222124735827748
```

Reviewed By: qihqi

Differential Revision: D33570442

fbshipit-source-id: 5c49e03b0f88068d444c84b4adeaaf45433ce1fa
2022-01-13 18:22:55 -08:00
CodemodService FBSourceClangFormatLinterBot
60632a00fe [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33561057

fbshipit-source-id: 79873717c45c8bbe6d0ae760e718770fd960185d
2022-01-13 03:27:06 -08:00
Scott Wolchok
1bbea3c3a2 [PyTorch][JIT] Support mayContainAlias(Value*, ArrayRef<Value*>) (#69853)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69853

We can implement this overload more efficiently.
ghstack-source-id: 146924693

Test Plan:
patched alias_analysis tests

Time reported to initialize a predictor by static runtime when given ctr_mobile_feed local_ro net is 9.5s instead of 10.5s.

Reviewed By: mikeiovine

Differential Revision: D33039731

fbshipit-source-id: 52559d678e9eb00e335b9e0db304e7a5840ea397
2022-01-12 16:53:54 -08:00
Han Qi
1bc3571078 [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer (#70201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70201

Included functions:
save_mobile_module -> saves a mobile::Module to flatbuffer
load_mobile_module_from_file -> loads a flatbuffer into mobile::Module
parse_mobile_module -> parses from bytes or deserialized flatbuffer module object

Compared to previous attempts, this diff only adds flatbuffer to cmake target and leaves fbcode/xplat ones unchanged.

Test Plan: unittest

Reviewed By: malfet, gmagogsfm

Differential Revision: D33239362

fbshipit-source-id: b9ca36b83d6af2d78cc50b9eb9e2a6fa7fce0763
2022-01-12 16:30:39 -08:00
Raghavan Raman
9ca367d48b [nnc] Use given kernel function name while emitting code (#67781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67781

Update `LLVMCodeGen` in NNC to use the given kernel function name while emitting code.

This was earlier committed as D31445799 (c30dc52739) and got reverted as part of a stack of diffs that included a cache for `PyTorchLLVMJIT`, which was the likely culprit.

Test Plan:
```
buck test mode/opt //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - LLVM.CodeGenKernelFuncName'
```

Reviewed By: ZolotukhinM, bdhirsh

Differential Revision: D32145958

fbshipit-source-id: 5f4e0400c4fa7cabce5b91e6de2a294fa0cad88e
2022-01-12 15:49:17 -08:00
Tristan Rice
bfe1abd3b5 torch/monitor: add pybind (#69567)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69567

This exposes torch.monitor events and stats via pybind11 to the underlying C++ implementation.

* The registration interface is a tad different since it takes a lambda function in Python where as in C++ it's a full class.
* This makes a small number of changes to the counter interfaces: since there's no way to create an initializer list at runtime, they now also take a vector.
* Only double based stats are provided in Python since it's intended more for high level stats where float imprecision shouldn't be an issue. This can be changed down the line if need arises.

```
from datetime import datetime

from torch.monitor import Event, register_event_handler, log_event

events = []

def handler(event):
    events.append(event)

handle = register_event_handler(handler)

# with the naming cleanup in this diff: `type` is now `name`, `metadata` is now `data`
log_event(Event(name="torch.monitor.TestEvent", timestamp=datetime.now(), data={"foo": 1.0}))
```

D32969391 is now included in this diff.
This cleans up the naming for events: `type` is now `name`, `message` is gone, and `metadata` is renamed `data`.

Test Plan: buck test //caffe2/test:monitor //caffe2/test/cpp/monitor:monitor

Reviewed By: kiukchung

Differential Revision: D32924141

fbshipit-source-id: 563304c2e3261a4754e40cca39fc64c5a04b43e8
2022-01-12 13:35:11 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
70951884d4 Add option to load historic operators in IR when the operator is deprecated (#71148)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71148

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D33521300

Pulled By: tugsbayasgalan

fbshipit-source-id: a0607dba5e7233590384326537017eb0b18da419
2022-01-12 11:07:04 -08:00
Elias Ellison
5480deb183 Add support for permutting dynamic fusion group outputs to channels last format (#70656)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70656

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D33458650

Pulled By: eellison

fbshipit-source-id: f0c7d20743deac7a87f7c9176e60da8100aefe41
2022-01-12 09:11:34 -08:00
Elias Ellison
39be20f259 [JIT][NNC] Add handling of strides to dynamic shape support. (#70464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70464

Add handling of strided input tensors to dynamic fusion. This is done with the same set of input striding specializations as https://github.com/pytorch/pytorch/pull/60684/:
```
  S_ONE, // STRIDE_ONE: packed
  S_CONT, // STRIDE_CONTIGUOUS: stride[i + 1] * sizes[i + 1]
  S_TRAN_CONT, // STRIDE_TRANSPOSED_CONTIGUOUS: stride[i-1] * sizes[i-1]
  S_AS_ARG, // STRIDE_AS_ARG: stride passed in as runtime value
```
and then two additional specializations for a) a contiguous tensor and b) a channels-last tensor. Channels-last is a common case and we should optimize for it. Additionally, tensors natively store whether they are contiguous/channels-last contiguous, which makes it faster to check if tensors follow this pattern.
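
For reference, a small sketch (not part of this diff) of the native contiguity flags mentioned above:

```
import torch

x = torch.rand(10, 11, 12, 13).to(memory_format=torch.channels_last)
print(x.is_contiguous())                                   # False
print(x.is_contiguous(memory_format=torch.channels_last))  # True: a cheap flag check, no stride math
```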

Output striding will be done in a follow up.

The striding is stored on both the TensorExprGroup node and on the guard node. The striding descriptors are stored as a vector of strings on the node for debuggability and to make use of storing ivalues as attributes on nodes.

As an example:

```
%8 : Double(10, 11, 12, 13, strides=[1716, 1, 143, 11], requires_grad=0, device=cpu) = prim::TensorExprGroup_0[symbolic_shape_inputs=[-37, -36, -35, -34], striding_inputs_desc=[["TENSOR_CONT_CHANNELS_LAST"]]](%x, %24, %23, %22, %21)
```

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D33458649

Pulled By: eellison

fbshipit-source-id: c42616d3c683d70f6258180d23d3841a31a6030d
2022-01-12 09:11:31 -08:00
Elias Ellison
975e7d246e Remove ignore shapes arg (#71144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71144

This wasn't being used anywhere. It was originally intended for the SR flow but we're doing something else now.

Test Plan: Imported from OSS

Reviewed By: navahgar, ZolotukhinM

Differential Revision: D33521061

Pulled By: eellison

fbshipit-source-id: 0574698a2b7409df6feb703f81e806d886225307
2022-01-12 09:09:49 -08:00
CodemodService FBSourceClangFormatLinterBot
93b2399c6c [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33544281

fbshipit-source-id: 4f0b5d6d490e6fcb967550cfb1dc0111b1770f73
2022-01-12 04:16:43 -08:00
Elias Ellison
9bccb31306 Remove precise tuple construct flag (#71121)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71121

Test Plan: Imported from OSS

Reviewed By: d1jang

Differential Revision: D33515234

Pulled By: eellison

fbshipit-source-id: 57cfe171b583a6bb4d3493a34b159061e97a11b8
2022-01-11 22:12:36 -08:00
Zhengxu Chen
9465c24245 [jit][edge] Use dynamic type instead of union types for schema parsers. (#70509)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70509

TypeFactory will construct DynamicType when building on Edge platforms. We use this facility to make FunctionSchema return DynamicType all the time for OptionalType. We don't explicitly use DynamicTypeFactory everywhere because that requires too many changes and will split the entire aten codebase.
ghstack-source-id: 146818621

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D33306737

fbshipit-source-id: d7ce00b438f7c03b43945d578280cfd254b1f634
2022-01-11 20:14:25 -08:00
Zhengxu Chen
e7634f83ce [jit][edge] Migrate base types to DynamicType on mobile. (#70233)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70233

Make the type parser produce DynamicType for all base types which don't have type arguments, and return a DynamicType pointer from IValue::type().
ghstack-source-id: 146818622

Test Plan: no behavior change.

Reviewed By: iseeyuan

Differential Revision: D33137219

fbshipit-source-id: 1612c924f5619261ebb21359936309b41b2754f5
2022-01-11 13:53:29 -08:00
Zhengxu Chen
4f35b9144c [jit][edge] Migrate ListType to DynamicType on mobile. (#70212)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70212

Use DynamicType instead of ListType all over the place in Lite Interpreter. Namely we need to modify the following places:
1. Type parser which produces the Type constants.
2. IValue::type() which returns reflected Type from IValues.
3. Helper functions to construct the container value.
4. Typechecks which test whether a type instance is a particular container type.
ghstack-source-id: 146818619

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D33176931

fbshipit-source-id: 9144787f5fc4778538e5c665946974eb6171a2e6
2022-01-11 10:57:53 -08:00
Zhengxu Chen
b12ca69179 [jit][edge] Migrate DictType to DynamicType on mobile. (#70202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70202

Use DynamicType instead of DictType all over the place in Lite Interpreter. Namely we need to modify the following places:
1. Type parser which produces the Type constants.
2. IValue::type() which returns reflected Type from IValues.
3. Helper functions to construct the container value.
4. Typechecks which test whether a type instance is a particular container type.
ghstack-source-id: 146735648

Test Plan: no behavior change.

Reviewed By: iseeyuan

Differential Revision: D33137257

fbshipit-source-id: 971bf431658c422ea9353cc32cdab66e98876e9d
2022-01-10 15:55:29 -08:00
Zhengxu Chen
30699cbfd5 Reland D33284352: [jit][edge] Do not reuse mobile type parser for all unpicklers. (#71048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71048

reland D33284352 (0a921ba0d0)
ghstack-source-id: 146735646

Test Plan: All Github CI: ciflow rerun -l ciflow/all

Reviewed By: gmagogsfm

Differential Revision: D33489731

fbshipit-source-id: 3e160209a1abb193ad3eed3018054aa7d331025e
2022-01-10 12:42:23 -08:00
Elias Ellison
fb66f561b1 Add copy out to the fallback path in SR invocation of composed op (#70871)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70871

We had previously handled reusing memory in the optimized kernel execution path, but not yet handled it if we hit the unoptimized fallback.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D33458652

Pulled By: eellison

fbshipit-source-id: 4eb62181ed02c95813a99638f5e2d0f9347b5c08
2022-01-10 12:16:38 -08:00
Zhengxu Chen
9762aa0fdc Revert D33284352: [jit][edge] Do not reuse mobile type parser for all unpicklers.
Test Plan: revert-hammer

Differential Revision:
D33284352 (0a921ba0d0)

Original commit changeset: 997c4f110b36

Original Phabricator Diff: D33284352 (0a921ba0d0)

fbshipit-source-id: af316727442a64f1ae40d53d7a9d26ec550d634e
2022-01-07 19:58:03 -08:00
Zhengxu Chen
0a921ba0d0 [jit][edge] Do not reuse mobile type parser for all unpicklers. (#70338)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70338

Today Unpickler is used by both server and mobile for deserializing models, and it always falls back to the mobile parser when no type resolver is provided by the user. However this is not intended, as the server and mobile type parsers support different things. In this diff we provide a default fallback using the script parser and opt out of it for all mobile cases.
ghstack-source-id: 146727330

(Note: this ignores all push blocking failures!)

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D33284352

fbshipit-source-id: 997c4f110b36eee6596e8f23f6a87bf91a4197ed
2022-01-07 18:35:32 -08:00
Jiewen Tan
338eb1b2b3 [LTC] Export torch::lazy::GetBackendDevice() (#70963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70963

This commit exports torch::lazy::GetBackendDevice().

Test Plan: CI in the lazy_tensor_staging branch.

Reviewed By: wconstab

Differential Revision: D33468938

Pulled By: alanwaketan

fbshipit-source-id: f65599c9238bf6b4f4ffbd5194befdc267272831
2022-01-07 13:13:18 -08:00
Zhengxu Chen
1011ac188f [jit][edge] Create DynamicType for OptionalType in mobile. (#68137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68137

A small step to replace existing OptionalType usage to DynamicType in Edge runtime.
ghstack-source-id: 146670520

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D32264617

fbshipit-source-id: 62d3ffad40901842deac19ca2098ea5ca132e718
2022-01-07 11:23:12 -08:00
Zhengxu Chen
0517e719ac [jit] Add conformance test for DynamicType with server JIT types. (#69482)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69482

Add a test to enumerate a number of JIT type combinations and see if their subtyping behavior is preserved in the new DynamicType system.
ghstack-source-id: 146670526

Test Plan: buck test mode/opt //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.DynamicType'

Reviewed By: gmagogsfm

Differential Revision: D32891263

fbshipit-source-id: 728211b39778e93db011b69b0a4047df78a8fc5b
2022-01-07 11:23:09 -08:00
Xiang Gao
6e16c9bb1d Add support for deleteKey for FileStore (#69953)
Summary:
torch_ucc uses `deleteKey`, and trying to run PyTorch tests with torch_ucc leads to a failure: `deleteKey not implemented for FileStore`.
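
A minimal sketch of the call that used to fail (the path is illustrative):

```
import torch.distributed as dist

store = dist.FileStore("/tmp/filestore_example", 1)
store.set("key", "value")
store.delete_key("key")  # previously raised "deleteKey not implemented for FileStore"
```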

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69953

Reviewed By: ngimel

Differential Revision: D33458457

Pulled By: H-Huang

fbshipit-source-id: f46afd59f950722ae594d9aafb8843f14019e930
2022-01-07 06:20:59 -08:00
Mikhail Zolotukhin
8223ef1cd8 [TensorExpr] Clean-up logic for copying input tensors and remove some dead code. (#70535)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70535

This also fixes handling of inputs that happen to be outputs (they
require copy).

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D33399116

Pulled By: ZolotukhinM

fbshipit-source-id: 9845838eb653b82ae47b527631b51893990d5319
2022-01-07 01:03:56 -08:00
Mikhail Zolotukhin
5d7cc8f22a [TensorExpr] Add some graph-rewrite passes to prepare models for AOT compilation. (#66515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66515

These passes should not be used generally, as they change the API of the
model's forward method, but they help with experimenting with the model and
ironing out all the kinks before it can be compiled properly. In the
long run, ideally we should provide a better way to enable such
experiments.

Differential Revision: D31590862

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 74ded34c6c871d4cafa29f43dc27c7e71daff8fc
2022-01-07 01:03:53 -08:00
Joel Schlosser
e6befbe85c Add flag to optionally average output attention weights across heads (#70055)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47583
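
A sketch of the new behavior, assuming the flag landed as the `average_attn_weights` kwarg on MultiheadAttention's forward:

```
import torch

mha = torch.nn.MultiheadAttention(embed_dim=8, num_heads=2)
q = k = v = torch.rand(5, 1, 8)  # (L, N, E)

out, w_avg = mha(q, k, v)                              # default: weights averaged over heads -> (N, L, L)
out, w_all = mha(q, k, v, average_attn_weights=False)  # per-head weights -> (N, num_heads, L, L)
print(w_avg.shape, w_all.shape)
```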

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70055

Reviewed By: bhosmer

Differential Revision: D33457866

Pulled By: jbschlosser

fbshipit-source-id: 17746b3668b0148c1e1ed8333227b7c42f1e3bf5
2022-01-06 17:32:37 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
b0fdca8855 Bump version number to 7 and compile old operators with old schema (#68358)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68358

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33433730

Pulled By: tugsbayasgalan

fbshipit-source-id: 202c58365bae13195d3545cefcb0da9162b02151
2022-01-05 23:57:22 -08:00
Raghavan Raman
616afcf981 [jit] [shape analysis] Move constant tensors out of fused subgraphs during generalization (#70320)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70320

ghstack-source-id: 146514368

Test Plan: `buck test mode/dev-nosan //caffe2/test/cpp/jit:jit`

Reviewed By: eellison

Differential Revision: D33280508

fbshipit-source-id: fe4291d7c49f0a498b330de96b698e99f6f6a505
2022-01-05 10:19:14 -08:00
Michael Suo
0ece9a49d7 Revert D33198155: Bump version number to 7 and compile old operators with old schema
Test Plan: revert-hammer

Differential Revision:
D33198155 (d35fc409ad)

Original commit changeset: 38a1185f9ecb

Original Phabricator Diff: D33198155 (d35fc409ad)

fbshipit-source-id: 411aaeb4e047aad9202db50d4d0f2ff35bc51f9d
2022-01-04 13:44:59 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
d35fc409ad Bump version number to 7 and compile old operators with old schema (#68358)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68358

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D33198155

Pulled By: tugsbayasgalan

fbshipit-source-id: 38a1185f9ecb34a33f737ad0b060b3490956300c
2022-01-04 01:31:25 -08:00
Salil Desai
35251a5528 [PyTorch] Add Enum to IValue Deepcopy (#69937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69937

This enables `export_torch_mobile_model` compatibility with Enum IValues

Test Plan: ModuleAPITest.DeepCopyEnum

Reviewed By: gmagogsfm

Differential Revision: D33104681

fbshipit-source-id: ca2a6d259c312487fe38dd1bed33ab6b7910bc2a
2021-12-30 07:52:22 -08:00
George Qi
8af39b7668 AdaptiveLogSoftmaxWithLoss no_batch_dim support (#69054)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69054

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33200166

Pulled By: george-qi

fbshipit-source-id: 9d953744351a25f372418d2a64e8402356d1e9b7
2021-12-29 10:25:26 -08:00
Bo Wu
bf610f08b0 Back out "Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions"
Summary: as title

Test Plan:
```
buck run mode/opt-split-dwarf -c=python.package_style=inplace //ai_infra/distributed_ai/pyper_test_framework/templates:pyper_release_v2 -- --model inline_cvr_post_imp_deterministic_shrunk_pyper_release_v2 --cluster TSCTestCluster --hpc_identity oncall_pyper_oncall --stage prod_offline_training --test_module training_platform
...
############## Start inline_cvr_post_imp_model Test Results Analysis ##############
I1226 22:03:56.789000 3346280 test_driver.py:139  UNKNOWN     ] Test finished in 808.2743511786684 seconds.
+-------------------------+---------+------------------------+-----------------+
| Test Case               | Status  | Message                | Model Entity ID |
+-------------------------+---------+------------------------+-----------------+
| SmallWorld_release_test | Success | finished successfully. | 987987491       |
+-------------------------+---------+------------------------+-----------------+
I1226 22:03:56.790000 3346280 test_driver.py:143  UNKNOWN     ] test_run_id: 3d085f61-28d1-411d-bd27-940ea2554b23 use this id to find your run in scuba pyper_test_framework
I1226 22:03:56.792000 3346280 test_driver.py:160  UNKNOWN     ] Calling cleanup
I1226 22:03:56.792000 3346280 training_platform_test_launcher.py:385  UNKNOWN     ] Stopping launched jobs 1
I1226 22:03:59.563122 3346280 ClientSingletonManager.cpp:100] Shutting down Manifold ClientSingletonManager
```

Reviewed By: seemethere

Differential Revision: D33325936

fbshipit-source-id: 64414bf7061ad77e8ac12eb8abafee4043e0fa1e
2021-12-27 09:11:46 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
4ae71c8d34 Add graph op replacement pass (#69915)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69915

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D33198158

Pulled By: tugsbayasgalan

fbshipit-source-id: f2b924edf9959aaf51f97db994fae031fa062cf8
2021-12-25 13:03:19 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
df3cbcff28 Add utility methods to find an upgrader (#68355)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68355

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D33198156

Pulled By: tugsbayasgalan

fbshipit-source-id: 68380148f0d9bee96d8090bf01c8dfca8e1f8b12
2021-12-24 12:23:04 -08:00
Shunting Zhang
911d527b87 Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions (#70339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70339

When a Python program is translated to TorchScript, the Python exception type is dropped. This makes users' lives hard when they need to categorize errors based on more than just the exception message.

Here we make the change so that when we raise a Python exception, we record the fully qualified class name for the exception. Later on, when the TorchScript is interpreted, a special exception CustomJITException is thrown. Users can get the Python class name from CustomJITException::getPythonClassName.

Note that, this diff does not customize the mapping from C++ exception to Python exception. It's left to the users to do whatever mapping they want.

Code under scripts/shunting is just my own experimental code. I can split it out if requested.
ghstack-source-id: 146221879

Test Plan: buck test mode/opt //caffe2/test:jit

Reviewed By: gmagogsfm

Differential Revision: D33282878

fbshipit-source-id: 910f67a764519f1053a48589d1a34df69001525d
2021-12-24 00:25:40 -08:00
Jiewen Tan
ab57f6d12c [LTC] Upstream utils to extract BackendDevice from at::Tensor (#70069)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70069

This commit upstreams utils to extract BackendDevice from at::Tensor.

Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.GetBackendDevice*

Reviewed By: samdow

Differential Revision: D33293160

Pulled By: alanwaketan

fbshipit-source-id: 78647239f90b4d04adce84ae6022b8983ad30c09
2021-12-23 12:42:03 -08:00
Michael Suo
795af1578c Revert D33172665: [LTC] Upstream utils to extract BackendDevice from at::Tensor
Test Plan: revert-hammer

Differential Revision:
D33172665 (121d067999)

Original commit changeset: b334ee358ea7

Original Phabricator Diff: D33172665 (121d067999)

fbshipit-source-id: 8bff43cddfc5d30483ec5cea8eff037aab9d1cfa
2021-12-22 21:12:49 -08:00
Jiewen Tan
121d067999 [LTC] Upstream utils to extract BackendDevice from at::Tensor (#70069)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70069

This commit upstreams utils to extract BackendDevice from at::Tensor.

Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.GetBackendDevice*

Reviewed By: wconstab

Differential Revision: D33172665

Pulled By: alanwaketan

fbshipit-source-id: b334ee358ea7b031bbffb0a16fa634715dba83f5
2021-12-22 18:15:45 -08:00
vfdev-5
ce9a2f8ba9 [C++ API] Added missing nearest-exact mode and anti-alias flag (#69318)
Summary:
Description:

Following https://github.com/pytorch/pytorch/pull/65142#issuecomment-981995692, this adds the missing nearest-exact mode and anti-alias flag to the C++ frontend.

- https://github.com/pytorch/pytorch/pull/65142
- https://github.com/pytorch/pytorch/pull/64501

- added tests in pytorch/test/cpp/api/functional.cpp

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69318

Reviewed By: davidberard98

Differential Revision: D33278995

Pulled By: jbschlosser

fbshipit-source-id: fa87c0c78df6b398e4f9688cc02111eed187afa7
2021-12-22 11:10:51 -08:00
Jiewen Tan
e02d836cb2 [LTC] Upstream LTCTensorImpl (#70062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70062

This commit upstreams LTCTensorImpl from the lazy_tensor_staging branch.
It inherits from c10::TensorImpl and thus manages the lifetime/storage
of LazyTensor.

Test Plan: ./build/bin/test_lazy --gtest_filter=LazyTensorImplTest.*

Reviewed By: desertfire

Differential Revision: D33171186

Pulled By: alanwaketan

fbshipit-source-id: 6af9f91cc7c7e997f120cb89a7bcd6785c03ace0
2021-12-22 03:21:52 -08:00
Raghavan Raman
4dec15e6d8 [nnc] Add a run method to TensorExprKernel that takes in output tensors (#69477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69477

This diff adds a new run method to `TensorExprKernel` which takes in
output tensors as inputs and stores the output in those given tensors.
ghstack-source-id: 146107009

Test Plan: buck test mode/dev-nosan //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - Kernel.RunWithAllocatedOutputs'

Reviewed By: ZolotukhinM

Differential Revision: D32823890

fbshipit-source-id: edc1f4839785124048b034060feb71cb8c1be34f
2021-12-22 00:30:15 -08:00
George Qi
bb51519937 bug fix FractionalMaxPool2d (random_samples dimensions) (#70031)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70031

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D33200618

Pulled By: george-qi

fbshipit-source-id: 142f224c2cab1008d2d4e9ed333697a92d2d42db
2021-12-21 12:21:54 -08:00
Hui Guo
7abb7667a6 [tensorexpr] Add memory planning to reuse intermediate buffers (#66452)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66452

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D31557188

Pulled By: huiguoo

fbshipit-source-id: f18dfeba1df20d5d4f118640fc10782534eb9219
2021-12-17 01:38:02 -08:00
Hui Guo
bbfd7b75ca [tensorexpr] Move the allocation of intermediate buffers from TEK to CodeGen (#67143)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67143

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D31881151

Pulled By: huiguoo

fbshipit-source-id: 457e5d4ff8a15f70af9c797c9ab4803d8e779abe
2021-12-17 01:37:56 -08:00
Hui Guo
c7e0951524 [tensorexpr] Add a stmt recorder to obtain stmt PCs (#66450)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66450

Test Plan: Imported from OSS

Reviewed By: navahgar, ZolotukhinM

Differential Revision: D31557189

Pulled By: huiguoo

fbshipit-source-id: 416d79ddfc46a0109187cdeb919ad9b5abde8030
2021-12-17 01:36:37 -08:00
Zhengxu Chen
d459e79500 [jit][edge] Remove usage of shared_ptr<mobile::Code>. (#68037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68037

Right now mobile::Code doesn't outlive its enclosing Function, and all accesses to Code happen inside the interpreter loop, which doesn't outlive the module, so we don't need to use std::shared_ptr here. This should also save us 1-2 KB of binary size, because shared_ptr seems to bloat on arm64 android.
ghstack-source-id: 145818696

Test Plan: eyes.

Reviewed By: qihqi, tugsbayasgalan

Differential Revision: D32264616

fbshipit-source-id: d83f538d6604cf75fd7728a25127b4849ce7ab2a
2021-12-16 13:11:46 -08:00
Jiawei Lv
b4c4a015d6 Revert D33163841: Revert D33102715: Back out "Revert D32606547: torch/monitor: add C++ events and handlers"
Test Plan: revert-hammer

Differential Revision:
D33163841

Original commit changeset: e262b6d8c80a

Original Phabricator Diff: D33102715 (eb374de3f5)

fbshipit-source-id: 644216036a238a458f0a2198460b36d24fb035f8
2021-12-16 11:12:18 -08:00