Commit Graph

2136 Commits

Author SHA1 Message Date
Edward Z. Yang
9790d90e4b Don't introduce new overload for SymInt (#83628)
Previously, we introduced new SymInt overloads for every function we wanted.  This led to a lot of boilerplate, and also a lot of confusion about how the overloads needed to be implemented.

This PR takes a simpler but more risky approach: just take the original function and change its ints to SymInts.

This is BC-breaking in the following ways:

* The C++ API for registering implementations for aten operators will change from int64_t to SymInt whenever you make this change. Code-generated registrations in PyTorch do not change, as codegen handles the translation automatically, but manual registrations will need to follow the change. Typically, if you now accept a SymInt where you previously only took int64_t, you have to convert it back manually, as sketched below. This will definitely break XLA; see the companion PR https://github.com/pytorch/xla/pull/3914. Note that not all dispatch keys get the automatic translation; all the composite keys and Meta keys are modified to take SymInt directly (because they should handle them directly), and so there are adjustments for this.
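
A hedged sketch of what a manual registration might now look like for a backend that handles only concrete shapes (`expect_int` is an assumed name for the conversion helper; the PR only states that converting back to `int64_t` becomes manual):

```cpp
#include <ATen/ATen.h>

// Hypothetical backend kernel: the registered signature now receives
// c10::SymInt, and converting back to concrete int64_t is the kernel's job.
at::Tensor my_backend_narrow(const at::Tensor& self, int64_t dim,
                             c10::SymInt start, c10::SymInt length) {
  // expect_int() is an assumption for "give me the concrete integer".
  return at::narrow(self, dim, start.expect_int(), length.expect_int());
}
```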

This is not BC-breaking in the following ways:

* The user-facing C++ API remains compatible. Even if a function changes from int to SymInt, the default C++ binding still takes only ints (e.g., `at::empty(IntArrayRef, ...)`). To call with SymInts, you must call `at::empty_symint` instead (see the sketch after this list). This involved adding two more signatures to CppSignatureGroup; in many cases I refactored code to iterate over all signatures in the group instead of hard-coding the two that previously existed.
* This is TorchScript compatible; internally we treat SymInts as ints, so there is no change to what happens at runtime in TorchScript. In particular, it's OK to reference an empty schema by its old type (using int types); as long as you're not doing string equality (which you shouldn't be), these parse to the same underlying type.
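
To make the calling convention concrete, here is a minimal sketch (the dtype arguments are illustrative, not taken from the PR):

```cpp
#include <ATen/ATen.h>

void demo(c10::SymInt n) {
  // Unchanged user-facing binding: plain ints under the original name.
  at::Tensor a = at::empty({4, 4}, at::kFloat);
  // Symbolic sizes must go through the explicit *_symint name.
  at::Tensor b = at::empty_symint({n, n}, at::kFloat);
}
```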

Structure of the PR:

* The general strategy of this PR is that, even when you write `SymInt` inside `native_functions.yaml`, sometimes we will treat it *as if* it were an `int`. This idea pervades the codegen changes, where we have a translation from SymInt to either c10::SymInt or int64_t, controlled by a `symint` kwarg which I added; I then audited all call sites to decide which I wanted. Here are some of the major places where we pick one or the other:
  * The C++ FunctionSchema representation represents `SymInt` as `int`. There are a few places we do need to know that we actually have a SymInt and we consult `real_type()` to get the real type in this case. In particular:
    * When we do schema validation of C++ operator registration, we must compare against the true schema (as the C++ API will provide `c10::SymInt`, and this will only be accepted if the schema is `SymInt`). This is handled with cloneWithRealTypes before we check for schema differences.
    * In `toIValue` argument parsing, we parse against the true schema value. For backwards compatibility reasons, I do still accept ints in many places where Layout/SymInt/etc were expected. (Well, accepting int where SymInt is expected is not BC, it's just the right logic!)
  * In particular, because SymInt never shows up as type() in FunctionSchema, this means that we no longer need a dedicated Tag::SymInt. This is good, because SymInts never show up in mobile anyway.
* Changes to functorch/aten are mostly about tracking changes to the C++ API registration convention. Additionally, since SymInt overloads no longer exist, registrations for SymInt implementations are deleted. In many cases, the old implementations did not properly support SymInts; I did not add any new functionality with this PR, but I did try to annotate with TODOs where there is work to do. Finally, because the signature of the `native::` API changed from int to SymInt, I needed to find alternative APIs for the people who were directly calling these functions. Typically, I insert a new dispatch call when perf doesn't matter, or use the `at::compositeexplicitautograd` namespace to handle other cases.
* The change to `make_boxed_from_unboxed_functor.h` is so that we accept a plain IntList IValue anywhere a SymIntList is expected; these are read-only arguments so covariant typing is OK.
* I change how unboxing logic works slightly. Previously, we interpret the C++ type for Layout/etc directly as IntType JIT type, which works well because the incoming IValue is tagged as an integer. Now, we interpret the C++ type for Layout as its true type, e.g., LayoutType (change to `jit_type.h`), but then we accept an int IValue for it anyway. This makes it symmetric with SymInt, where we interpret the C++ type as SymIntType, and then accept SymInt and int IValues for it.
* I renamed the `empty.names` overload to `empty_names` to make it less confusing (I kept mixing it up with the real `empty` overload).
* I deleted the `empty.SymInt` overload, which ended up killing a pile of functions. (This was originally a separate PR but the profiler expect test was giving me grief so I folded it in.)
* I deleted the LazyDynamicOpsTest tests. These were failing after these changes, and I couldn't figure out why they used to be passing: they make use of `narrow_copy` which didn't actually support SymInts; they were immediately converted to ints.
* I bashed LTC into working. The patches made here are not the end of the story. The big problem is that SymInt translates into Value, but what if you have a list of SymInt? This cannot be conveniently represented in the IR today, since variadic Values are not supported. To work around this, I translate SymInt[] into plain int[] (this is fine for tests because LTC dynamic shapes never actually worked); but this will need to be fixed for proper LTC SymInt support. The LTC codegen also looked somewhat questionable; I added comments based on my code reading.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83628
Approved by: https://github.com/albanD, https://github.com/bdhirsh
2022-08-26 01:35:40 +00:00
Xiang Gao
a4a55f5ea6 New TORCH_UCC_BLOCKING_WAIT env variable (#81791)
Cherry-pick of https://github.com/facebookresearch/torch_ucc/pull/95.

I recommend waiting until https://github.com/pytorch/pytorch/pull/81583 is merged first, so the CI is checking if this PR compiles correctly.

Marking this as a draft for now; will change to "ready for review" once https://github.com/pytorch/pytorch/pull/81583 is merged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81791
Approved by: https://github.com/kwen2501
2022-08-25 21:33:17 +00:00
Nikolay Korovaiko
86e134ddf7 disable c10::SymIntNode tests on mobile (#84066)
This fixes C++ test breakages where we were passing pointers and expecting `is_symbolic` to return `true`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84066
Approved by: https://github.com/albanD
2022-08-25 17:28:23 +00:00
jjsjann123
b21a6ff639 [NVFuser] Upstream push 0811 (#83239)
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Code changes includes:

- codegen improvements:
  1. double support in expression evaluator
- bug fixes:
  1. dropout fix - rework RNG to support broadcasted dropout (Fixes #82784)
  2. expand fix - Patch expand+reduction, expand+view, rework view analysis and guard
- scheduler:
  1. manual transpose schedule example
  2. WIP transpose scheduler

Commits in this PR from the devel branch:

```
b7435afcd22c917713c2f41a7237bc26e1183f14 Transpose scheduler, step 1 (#1854)
8a45dbf72034684eb8e18b1835b533e90b68f184 Add an example on how to manually schedule transpose (#1889)
83dbf56a9554b2efbd5416461d938fff477b0b27 Patch dropout fix (#1898)
69d3519a532250719b1aa8341b50e067b181b42d Expand+Reduction, Expand+View support, rework View analysis and guards (#1883)
15091c488e96343bdc49e3990acbf238a3b3da51 Rework RNG to correctly support broadcasted dropout (#1888)
aafe2d048aaac596e503596a41303423619f3954 Make ExpressionEvaluator support Double (#1885)
```

RUN_TORCHBENCH: nvfuser

Differential Revision: [D38657074](https://our.internmc.facebook.com/intern/diff/D38657074)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83239
Approved by: https://github.com/davidberard98
2022-08-25 02:23:22 +00:00
PyTorch MergeBot
a7edf71360 Revert "Don't introduce new overload for SymInt (#83628)"
This reverts commit 8fae7027b3.

Reverted https://github.com/pytorch/pytorch/pull/83628 on behalf of https://github.com/malfet due to breaking internal builds, see https://www.internalfb.com/diff/D38984222
2022-08-25 00:49:40 +00:00
Richard Barnes
67f0940cdd Check all CUDA API calls for errors in test/ (#74921) (#83954)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74921

Test Plan: Sandcastle

Reviewed By: ezyang, malfet, ngimel

Differential Revision: D35194966

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83954
Approved by: https://github.com/ezyang
2022-08-24 20:12:25 +00:00
Larry Liu
a8a36c45a6 [frontend] Fix tensor list alias annotation (#84005)
For issue https://github.com/pytorch/pytorch/issues/77920 and a retry of https://github.com/pytorch/pytorch/pull/83921

The current logic checks alias info before `[]` and after. If no alias info exists after `[]`, we overwrite the alias info from before it. This logic failed on arguments like `Tensor(a!)[]`, dropping the alias info before `[]` on the floor. This PR adds a new alias info if it's missing after `[]`; this way we can keep the alias info before `[]`.
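
A hedged illustration of the case (accessor usage is sketched from the C++ schema API; the annotation before `[]` describes the contained Tensor elements):

```cpp
#include <torch/csrc/jit/frontend/function_schema_parser.h>

void demo() {
  // Before this fix, the (a!) written before [] was silently dropped.
  auto schema = torch::jit::parseSchema("my::op(Tensor(a!)[] self) -> ()");
  const c10::AliasInfo* info = schema.arguments()[0].alias_info();
  // With the fix, the contained Tensor type keeps its write-alias info.
  bool elem_is_write = info && !info->containedTypes().empty() &&
      info->containedTypes()[0].isWrite();
  (void)elem_is_write;
}
```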
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84005
Approved by: https://github.com/cccclai, https://github.com/bdhirsh
2022-08-24 19:50:19 +00:00
Nikolay Korovaiko
b842670aa5 logical ops (#83879)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83879
Approved by: https://github.com/ezyang
2022-08-24 17:49:57 +00:00
Nikolay Korovaiko
2b805e3520 add arithmetic ops (#83878)
arithmetic ops tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83878
Approved by: https://github.com/ezyang
2022-08-24 17:49:56 +00:00
Edward Z. Yang
8fae7027b3 Don't introduce new overload for SymInt (#83628)
Previously, we introduced new SymInt overloads for every function we wanted.  This led to a lot of boilerplate, and also a lot of confusion about how the overloads needed to be implemented.

This PR takes a simpler but more risky approach: just take the original function and change its ints to SymInts.

This is BC-breaking in the following ways:

* The C++ API for registering implementations for aten operators will change from int64_t to SymInt whenever you make this change. Code-generated registrations in PyTorch do not change, as codegen handles the translation automatically, but manual registrations will need to follow the change. Typically, if you now accept a SymInt where you previously only took int64_t, you have to convert it back manually. This will definitely break XLA; see the companion PR https://github.com/pytorch/xla/pull/3914. Note that not all dispatch keys get the automatic translation; all the composite keys and Meta keys are modified to take SymInt directly (because they should handle them directly), and so there are adjustments for this.

This is not BC-breaking in the following ways:

* The user-facing C++ API remains compatible. Even if a function changes from int to SymInt, the default C++ binding still takes only ints (e.g., `at::empty(IntArrayRef, ...)`). To call with SymInts, you must call `at::empty_symint` instead. This involved adding two more signatures to CppSignatureGroup; in many cases I refactored code to iterate over all signatures in the group instead of hard-coding the two that previously existed.
* This is TorchScript compatible; internally we treat SymInts as ints, so there is no change to what happens at runtime in TorchScript. In particular, it's OK to reference an empty schema by its old type (using int types); as long as you're not doing string equality (which you shouldn't be), these parse to the same underlying type.

Structure of the PR:

* The general strategy of this PR is that, even when you write `SymInt` inside `native_functions.yaml`, sometimes we will treat it *as if* it were an `int`. This idea pervades the codegen changes, where we have a translation from SymInt to either c10::SymInt or int64_t, controlled by a `symint` kwarg which I added; I then audited all call sites to decide which I wanted. Here are some of the major places where we pick one or the other:
  * The C++ FunctionSchema representation represents `SymInt` as `int`. There are a few places we do need to know that we actually have a SymInt and we consult `real_type()` to get the real type in this case. In particular:
    * When we do schema validation of C++ operator registration, we must compare against the true schema (as the C++ API will provide `c10::SymInt`, and this will only be accepted if the schema is `SymInt`). This is handled with cloneWithRealTypes before we check for schema differences.
    * In `toIValue` argument parsing, we parse against the true schema value. For backwards compatibility reasons, I do still accept ints in many places where Layout/SymInt/etc were expected. (Well, accepting int where SymInt is expected is not BC, it's just the right logic!)
  * In particular, because SymInt never shows up as type() in FunctionSchema, this means that we no longer need a dedicated Tag::SymInt. This is good, because SymInts never show up in mobile anyway.
* Changes to functorch/aten are mostly about tracking changes to the C++ API registration convention. Additionally, since SymInt overloads no longer exist, registrations for SymInt implementations are deleted. In many cases, the old implementations did not properly support SymInts; I did not add any new functionality with this PR, but I did try to annotate with TODOs where there is work to do. Finally, because the signature of the `native::` API changed from int to SymInt, I needed to find alternative APIs for the people who were directly calling these functions. Typically, I insert a new dispatch call when perf doesn't matter, or use the `at::compositeexplicitautograd` namespace to handle other cases.
* The change to `make_boxed_from_unboxed_functor.h` is so that we accept a plain IntList IValue anywhere a SymIntList is expected; these are read-only arguments so covariant typing is OK.
* I change how unboxing logic works slightly. Previously, we interpret the C++ type for Layout/etc directly as IntType JIT type, which works well because the incoming IValue is tagged as an integer. Now, we interpret the C++ type for Layout as its true type, e.g., LayoutType (change to `jit_type.h`), but then we accept an int IValue for it anyway. This makes it symmetric with SymInt, where we interpret the C++ type as SymIntType, and then accept SymInt and int IValues for it.
* I renamed the `empty.names` overload to `empty_names` to make it less confusing (I kept mixing it up with the real `empty` overload).
* I deleted the `empty.SymInt` overload, which ended up killing a pile of functions. (This was originally a separate PR but the profiler expect test was giving me grief so I folded it in.)
* I deleted the LazyDynamicOpsTest tests. These were failing after these changes, and I couldn't figure out why they used to be passing: they make use of `narrow_copy` which didn't actually support SymInts; they were immediately converted to ints.
* I bashed LTC into working. The patches made here are not the end of the story. The big problem is that SymInt translates into Value, but what if you have a list of SymInt? This cannot be conveniently represented in the IR today, since variadic Values are not supported. To work around this, I translate SymInt[] into plain int[] (this is fine for tests because LTC dynamic shapes never actually worked); but this will need to be fixed for proper LTC SymInt support. The LTC codegen also looked somewhat questionable; I added comments based on my code reading.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83628
Approved by: https://github.com/albanD, https://github.com/bdhirsh
2022-08-23 22:04:07 +00:00
Nikolay Korovaiko
fcb124406b release the current symintnode in the move c-tor (#83789)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83789
Approved by: https://github.com/ezyang
2022-08-22 14:37:06 +00:00
Milad Mohammadi
72963bbae9 Update isDynamic api to align with is_symbolic API (#83415)
Downstream: https://github.com/pytorch/xla/pull/3888

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83415
Approved by: https://github.com/Krovatkin
2022-08-18 22:53:19 +00:00
soulitzer
31fad3926a Add option to run anomaly mode without nan checking (#83481)
Fixes https://github.com/pytorch/pytorch/issues/83117

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83481
Approved by: https://github.com/albanD
2022-08-16 22:56:23 +00:00
richard
382ef1fda7 Autograd graphtask trim unnecessary edges (#82544)
### Introduction
<!-- What did you change and why was it needed? -->

Removing unnecessary weight gradient calculation is very important for applications that need high-order derivatives during training. However, this is not supported by the current Autograd engine.

For more detail: the backward function of a `matmul` operator (e.g., `linear`, `addmm`, `mm`) has two matmuls, one for the `input gradient` and another for the `weight gradient`. For a typical neural network (nn) with a few linear layers and activation functions, if the user calls `torch.autograd.grad()` to calculate the derivative of the nn output `y` w.r.t. the nn input `x`, only the `input gradient` of the `matmul` operator is needed, and the `weight gradient` is discarded. However, the current PyTorch autograd engine will always calculate the `weight gradient` if `weight` requires gradient (the calculation of the high-order derivative is performed during training).

The figure attached shows the autograd graph of the following code snippet:
```py
y = torch.nn.functional.linear(x, weight, bias)
y = y.pow(2)
# first order derivative
y__x, = torch.autograd.grad(y, x, grad_outputs=grad_outputs, create_graph=True)
# second order derivative
y__x__x, = torch.autograd.grad(y__x, x, grad_outputs=grad_outputs, create_graph=True)
```
The path marked in the figure is not needed when calculating these derivatives.

<img width="50%" alt="image" src="https://user-images.githubusercontent.com/9999318/182018117-719c5a23-bcc6-4a63-8e8d-1bca3ebda2e3.png">

### Issue
<!-- Link to Issue ticket or RFP -->
Related issue: https://github.com/pytorch/pytorch/issues/56500

### Method
When calling `torch.autograd.grad`, `exec_info_` is created for each GraphTask, which allows filtering paths on the graph that are not needed. However, when the GraphTask calls into the node, the node still does not know whether the edges are needed or not. In the case of matmul, `weight.requires_grad is True` so the weight gradient is always calculated.

Following https://github.com/pytorch/pytorch/issues/56500#issuecomment-825694656, this PR passes the graph task's thread_local `exec_info_` into the node, so it could trim unnecessary edges during `torch.autograd.grad` calls.
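
Conceptually (a sketch with illustrative names, not the engine's actual code), the node-level check looks like:

```cpp
#include <torch/csrc/autograd/function.h>
#include <unordered_map>

// Minimal stand-in for the graph task's per-node execution record.
struct ExecInfoSketch {
  bool needed = false;
  bool should_execute() const { return needed; }
};

// Skip producing a gradient for an output edge whose target node is not
// on any path to a requested input.
bool should_compute_output(
    const torch::autograd::Node& fn, size_t i,
    const std::unordered_map<torch::autograd::Node*, ExecInfoSketch>& exec_info) {
  const auto& edge = fn.next_edge(i);
  if (!edge.is_valid()) return false;
  auto it = exec_info.find(edge.function.get());
  return it != exec_info.end() && it->second.should_execute();
}
```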

### Benchmark
Benchmark script: https://gist.github.com/yueyericardo/24158433a2021c51eeef9c3e2722df99

Benchmark result:
6 hidden layers, batch size 10000, on A100

FP32 result
| hessian benchmark             | FP32 (before) | FP32 (After)      | FP32 (Functorch v0.1.1) |
| ----------------------------- | ------------- | ----------------- | ----------------------- |
| Linear + ReLU (no backward)   | 55.658 ms     | 29.392 ms (1.90X) | 29.547 ms (1.90X)       |
| Linear + ReLU (with backward) | 81.173 ms     | 54.917 ms (1.47X) | 68.988 ms (1.18X)       |

TF32 result
| hessian benchmark             | TF32 (before) | TF32 (after)      | TF32 (Functorch v0.1.1) |
| ----------------------------- | ------------- | ----------------- | ----------------------- |
| Linear + ReLU (no backward)   | 19.801 ms     | 11.259 ms (1.76X) | 10.754 ms (1.84X)       |
| Linear + ReLU (with backward) | 29.167 ms     | 20.466 ms (1.42X) | 22.784 ms (1.28X)       |

For the FP32 result, we get a 1.9X speedup for hessian calculation and a 1.47X speedup during training, which is even faster than functorch's `vmap(jacfwd(jacrev))` implementation. (functorch has a performance regression in v0.2.0, https://github.com/pytorch/functorch/issues/989, so we are using v0.1.1 for the benchmark.)

@zou3519 does functorch also includes similar optimizations during hessian calculation? If not, what do we need to do so the functorch could also benefit from this PR?

### Testing
<!-- How did you test your change? -->

- [x] we need to figure out a way to unit-test this

### Thanks
Thanks for the great blog: [How Computational Graphs are Executed in PyTorch | PyTorch](https://pytorch.org/blog/how-computational-graphs-are-executed-in-pytorch/)

cc @zasdfgbnm @albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82544
Approved by: https://github.com/soulitzer
2022-08-11 18:50:09 +00:00
Nikita Shulga
1b2a17b8f9 Build MacOS binaries with -Werror (#83049)
Should prevent proliferating MPS warnings
Fixes https://github.com/pytorch/pytorch/issues/82966
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83049
Approved by: https://github.com/albanD, https://github.com/ezyang
2022-08-10 17:29:44 +00:00
Nikita Shulga
62c8d30f9f [BE] Add append_cxx_flag_if_supported macro (#82883)
And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on

Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in test's `CMakeLists.txt`

Delete `-Wno-unknown-warning-option` to test that the conditions are indeed working as expected
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883
Approved by: https://github.com/seemethere
2022-08-10 14:32:26 +00:00
PyTorch MergeBot
d3a1f17fc7 Revert "[BE] Add append_cxx_flag_if_supported macro (#82883)"
This reverts commit d7e6aaa59b.

Reverted https://github.com/pytorch/pytorch/pull/82883 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2022-08-10 10:27:59 +00:00
David Chen
90821aab10 Add SOFT_ASSERT to gracefully recover from invariant violations (#82689)
Summary: Implement SOFT_ASSERT, which fails in debug mode but only triggers a warning log in release mode. This allows us to gracefully handle some of the invariant violations when processing traces that don't necessarily need to crash the entire program.
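
A minimal sketch of the idea (the actual macro body is not reproduced here; the logging call is an assumption):

```cpp
#include <c10/util/Exception.h>
#include <c10/util/Logging.h>

// Sketch: hard failure in debug builds, warn-and-continue in release builds.
inline bool softAssertSketch(bool cond, const char* msg) {
#ifdef NDEBUG
  if (!cond) {
    LOG(WARNING) << "SOFT_ASSERT failed: " << msg;
  }
  return cond;
#else
  TORCH_INTERNAL_ASSERT(cond, msg);
  return true;
#endif
}
```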

Test Plan: Added SOFT_ASSERT test in containers.cpp

Differential Revision: D38327334

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82689
Approved by: https://github.com/robieta
2022-08-10 00:58:07 +00:00
Han Qi (qihqi)
f9533560cc Use flatbuffer of alternate namespace (#82952)
Summary: Minimal change to make use of flatbuffer with the fbsource namespace.

Test Plan: existing unit tests

Differential Revision: D38494999

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82952
Approved by: https://github.com/cccclai
2022-08-09 07:40:59 +00:00
Tugsbayasgalan Manlaibaatar
b4b60c2a2e Get rid of ENABLE_UPGRADERS macro (#77574)
Since it's been a while since we merged the upgrader design and we haven't encountered any issues, let's get rid of the macro for safe rollout
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77574
Approved by: https://github.com/gmagogsfm
2022-08-09 05:33:14 +00:00
Nikita Shulga
d7e6aaa59b [BE] Add append_cxx_flag_if_supported macro (#82883)
And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on

Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in test's `CMakeLists.txt`

Delete `-Wno-unknown-warning-option` to test that the conditions are indeed working as expected
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883
Approved by: https://github.com/seemethere
2022-08-08 21:04:09 +00:00
Dave Bort
6e712823c5 Migrate remaining pytorch code to use new flatbuffer_loader.h APIs (#82620)
This is the only file in pytorch core that refers to the deprecated flatbuffer_loader.h APIs. Move to the non-deprecated functions.

Differential Revision: [D38330369](https://our.internmc.facebook.com/intern/diff/D38330369/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82620
Approved by: https://github.com/qihqi
2022-08-05 02:25:33 +00:00
Dave Bort
0810961d5f Remove flatbuffer types/headers from flatbuffer_serializer[_jit].h (#82619)
Hide the flatbuffers types and headers from the serialize APIs, and stop using the DEPRECATED functions from flatbuffer_loader.h.

This required creating the new `DetachedBuffer` type to replace/hide `flatbuffers::DetachedBuffer`, a class that owns a span of custom-allocated memory.

This is another step towards hiding the flatbuffers types and headers from the load/serialize APIs.

Differential Revision: [D38292798](https://our.internmc.facebook.com/intern/diff/D38292798/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D38292798/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82619
Approved by: https://github.com/qihqi
2022-08-05 02:23:34 +00:00
Howard Huang
9d228fe517 [Small] Remove using c10d::ProcessGroup directive from c10d test (#82681)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82681
Approved by: https://github.com/awgu
2022-08-03 17:23:35 +00:00
Edward Z. Yang
df69660832 Revert "Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552)"" (#82599)
This reverts commit 532b8a9e00.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82599
Approved by: https://github.com/albanD
2022-08-02 19:37:02 +00:00
PyTorch MergeBot
532b8a9e00 Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552)"
This reverts commit 9465c0e0b5.

Reverted https://github.com/pytorch/pytorch/pull/82552 on behalf of https://github.com/zengk95 due to This seems to be breaking windows binary wheels
2022-08-01 20:25:35 +00:00
Edward Z. Yang
9465c0e0b5 Add a lint rule for torch/csrc/util/pybind.h include (#82552)
We define specializations for pybind11 defined templates
(in particular, PYBIND11_DECLARE_HOLDER_TYPE) and consequently
it is important that these specializations *always* be #include'd
when making use of pybind11 templates whose behavior depends on
these specializations; otherwise we can cause an ODR violation.

The easiest way to ensure that all the specializations are always
loaded is to designate a header (in this case, torch/csrc/util/pybind.h)
that ensures the specializations are defined, and then add a lint
to ensure this header is included whenever pybind11 headers are
included.
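
In effect, the lint enforces an include discipline like this (illustrative):

```cpp
// Flagged by the lint: a raw pybind11 include, which can leave the
// PYBIND11_DECLARE_HOLDER_TYPE specializations out of this TU.
//   #include <pybind11/pybind11.h>

// Accepted: the designated header pulls in pybind11 together with the
// specializations, so every TU agrees on the template definitions.
#include <torch/csrc/util/pybind.h>
```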

The existing grep linter didn't have enough knobs to do this
conveniently, so I added some features.  I'm open to suggestions
for how to structure the features better.  The main changes:

- Added an --allowlist-pattern flag, which turns off the grep lint
  if some other line exists.  This is used to stop the grep
  lint from complaining about pybind11 includes if the util
  include already exists.

- Added --match-first-only flag, which lets grep only match against
  the first matching line.  This is because, even if there are multiple
  includes that are problematic, I only need to fix one of them.
  We don't /really/ need this, but when I was running lintrunner -a
  to fixup the preexisting codebase it was annoying without this,
  as the lintrunner overall driver fails if there are multiple edits
  on the same file.

I excluded any files that didn't otherwise have a dependency on
torch/ATen, this was mostly caffe2 and the valgrind wrapper compat
bindings.

Note the grep replacement is kind of crappy, but clang-tidy lint
cleaned it up in most cases.

See also https://github.com/pybind/pybind11/issues/4099

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82552
Approved by: https://github.com/albanD
2022-08-01 17:16:58 +00:00
Edward Z. Yang
50e8abbcad Change SymIntNode into an intrusive pointer (#82548)
This will make the pointer type a single word, which is important
for packing it into an int64_t
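
The size claim can be stated as a compile-time check (a sketch; the actual bit-packing is not shown):

```cpp
#include <c10/util/intrusive_ptr.h>
#include <cstdint>

// An intrusive_ptr holds just one raw pointer (the refcount lives in the
// pointee), so on a 64-bit platform it fits in an int64_t payload.
static_assert(
    sizeof(c10::intrusive_ptr<c10::intrusive_ptr_target>) == sizeof(int64_t),
    "intrusive_ptr must be a single word to pack into an int64_t");
```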

This time, this diff doesn't segfault when you build with DEBUG mode; more details at https://github.com/pybind/pybind11/issues/4099

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82548
Approved by: https://github.com/albanD
2022-08-01 15:07:21 +00:00
PyTorch MergeBot
3b9cbb1738 Revert "Change SymIntNode into an intrusive pointer (#82432)"
This reverts commit 7be44f8158.

Reverted https://github.com/pytorch/pytorch/pull/82432 on behalf of https://github.com/ezyang due to segfaults on test but not caught in CI
2022-07-29 20:08:59 +00:00
Edward Z. Yang
7be44f8158 Change SymIntNode into an intrusive pointer (#82432)
This will make the pointer type a single word, which is important
for packing it into an int64_t

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82432
Approved by: https://github.com/albanD, https://github.com/Krovatkin
2022-07-29 17:32:54 +00:00
Max Ren
727a327162 Back out "Back out "[profiling] Adding targets file for test_mobile_profiler"" (#82243)
Summary:
Originally reverted this diff D37116110 (c9aa74a37f) because

```
> /usr/local/bin/buck build //caffe2/test/cpp/lite_interpreter_runtime/...

BUILD FAILED
The rule //caffe2:backend_interface_libAndroid could not be found.
Please check the spelling and whether it is one of the 1866 targets in /data/users/batanasov/fbsource/fbcode/caffe2/TARGETS. (52107 bytes)
1 similar targets in /data/users/batanasov/fbsource/fbcode/caffe2/TARGETS are:
  //caffe2:backend_interface_lib

This error happened while trying to get dependency '//caffe2:backend_interface_libAndroid' of target '//caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profilerAndroid'
    At //caffe2:backend_interface_libAndroid (ovr_config//platform/linux:x86_64-fbcode)
    At //caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profilerAndroid (ovr_config//platform/linux:x86_64-fbcode)
```

The added test_mobile_profiler was not meant to be built for Android or other mobile platforms, so we are changing the test to a cpp_unittest

Test Plan:
```
buck test //caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler
Parsing buck files: finished in 0.9 sec
Creating action graph: finished in 26.5 sec
Downloaded 2/2 artifacts, 1.30 Mbytes, 0.0% cache miss (for updated rules)
Building: finished in 16.5 sec (100%) 18451/18451 jobs, 3/18451 updated
  Total time: 44.0 sec
More details at https://www.internalfb.com/intern/buck/build/8bee82c1-66a9-4fae-805f-e4ef5505d25d
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 6904f989-5c17-4c5b-9a4f-ffb643dfcc43
Trace available for this run at /tmp/tpx-20220726-114727.001729-6904f989-5c17-4c5b-9a4f-ffb643dfcc43/trace.log
RemoteExecution session id: reSessionID-6904f989-5c17-4c5b-9a4f-ffb643dfcc43-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/844425183404951
    ✓ ListingSuccess: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler : 3 tests discovered (17.640)
    ✓ Pass: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler - MobileProfiler.Backend (0.206)
    ✓ Pass: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler - MobileProfiler.BackendMemoryEvents (0.271)
    ✓ Pass: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler - MobileProfiler.ModuleHierarchy (0.268)
Summary
  Pass: 3
  ListingSuccess: 1
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/844425183404951
```

Differential Revision: D38166171

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82243
Approved by: https://github.com/salilsdesai
2022-07-28 23:08:52 +00:00
Edward Z. Yang
fd5ac1e6b5 Rename SymbolicIntNode to SymIntNodeImpl (#82350)
Done via

```
git grep -l 'SymbolicIntNode' | xargs sed -i 's/SymbolicIntNode/SymIntNodeImpl/g'
```

Reasoning for the change:

* Sym is shorter than Symbolic, and consistent with SymInt
* You usually will deal in shared_ptr<...>, so we're going to
  reserve the shorter name (SymIntNode) for the shared pointer.

But I don't want to update the Python name, so afterwards I ran

```
 git grep -l _C.SymIntNodeImpl | xargs sed -i 's/_C.SymIntNodeImpl/_C.SymIntNode/'
```

and manually fixed up the binding code

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82350
Approved by: https://github.com/Krovatkin
2022-07-28 18:27:45 +00:00
goldenxuett
c2ccf6e625 [JIT] Add backwards compatibility test for old NonDeterminism ops list in ir.cpp (#82257)
- Added backwards compatibility test to ensure that every Op in the old Nondeterministic op list from ir.cpp has the tag nondeterministic_seeded.
**Note that the 3 ops marked "normal" were not actually real op signatures (i.e., findOp with the dispatcher returned a nullptr). These were changed to normal.Tensor_Tensor, normal.Tensor_float and normal.float_Tensor in the list, since that is what matches the rest of their signatures.**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82257
Approved by: https://github.com/davidberard98
2022-07-27 20:19:22 +00:00
goldenxuett
8d5951e7e8 [JIT] Add is_aliasing method to FunctionSchema (#82255)
- Add an is_aliasing method in FunctionSchema to indicate whether an argument has an alias_set attached to it. This is utilized in the integration with autograd (see the next PR)
- Tested in test_schema_info
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82255
Approved by: https://github.com/davidberard98
2022-07-27 20:19:21 +00:00
goldenxuett
67c22b6c07 [JIT] Modify is_nondeterministic to utilize tags in SchemaInfo for non-mobile contexts and integrate with ir.cpp (#82253)
- Modified is_nondeterministic method in SchemaInfo class to utilize tags.
- Modified isNonDeterministic method in ir.cpp to utilize SchemaInfo when a Node is an aten op.
- Added an assert to ensure that if a node is an aten op kind, it has a schema.
- Tested through verifying that all IR.cpp tests run, and through adding 2 custom determinism checks to test for the special dropout edge case and a general bernoulli case.

Differential Revision: [D38179499](https://our.internmc.facebook.com/intern/diff/D38179499)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82253
Approved by: https://github.com/davidberard98
2022-07-27 20:19:19 +00:00
PyTorch MergeBot
e1bd244a14 Revert "[JIT] Modify is_nondeterministic to utilize tags in schemaInfo and integrate with ir.cpp (#81836)"
This reverts commit fc3555ce4d.

Reverted https://github.com/pytorch/pytorch/pull/81836 on behalf of https://github.com/osalpekar due to Internal Mobile NNPACK custom_ops tests failing with Error: tags are not saved for Mobile
2022-07-26 19:11:49 +00:00
PyTorch MergeBot
cbeef2c541 Revert "[JIT] Add is_aliasing method to FunctionSchema (#81916)"
This reverts commit eb2ea9a581.

Reverted https://github.com/pytorch/pytorch/pull/81916 on behalf of https://github.com/osalpekar due to Need to revert this to revert https://github.com/pytorch/pytorch/pull/81836 cleanly. That PR broke internal mobile custom_ops
2022-07-26 19:06:56 +00:00
PyTorch MergeBot
18dd7e55c9 Revert "[JIT] Add backwards compatibility test for old NonDeterminism ops list in ir.cpp (#82029)"
This reverts commit 7288ea4e1d.

Reverted https://github.com/pytorch/pytorch/pull/82029 on behalf of https://github.com/osalpekar due to Need to revert this to revert https://github.com/pytorch/pytorch/pull/81836 cleanly. That PR broke internal mobile custom_ops
2022-07-26 19:00:44 +00:00
Will Constable
4f34cd6d1e Replace all CHECK_ and DCHECK_ with TORCH_* macros (#82032)
Avoid exposing defines that conflict with google logging, since this blocks external usage of libtorch in certain cases.

All the 'interesting' changes should be in these two files, and the rest should just be mechanical changes via sed.
c10/util/logging_is_not_google_glog.h
c10/util/logging_is_google_glog.h
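
The mechanical part looks like this (illustrative):

```cpp
#include <c10/util/Logging.h>

void check_sizes(int64_t a, int64_t b) {
  // Previously CHECK_EQ(a, b) and DCHECK_GT(a, 0); the TORCH_ prefix
  // avoids colliding with google logging's identically named macros.
  TORCH_CHECK_EQ(a, b);
  TORCH_DCHECK_GT(a, 0);
}
```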

Fixes https://github.com/pytorch/pytorch/issues/81415

cc @miladm @malfet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82032
Approved by: https://github.com/soumith, https://github.com/miladm
2022-07-26 01:20:44 +00:00
goldenxuett
7288ea4e1d [JIT] Add backwards compatibility test for old NonDeterminism ops list in ir.cpp (#82029)
- Added backwards compatibility test to ensure that every Op in the old Nondeterministic op list from ir.cpp has the tag nondeterministic_seeded.
**Note that the 3 ops marked "normal" were not actually real op signatures (i.e., findOp with the dispatcher returned a nullptr). These were changed to normal.Tensor_Tensor, normal.Tensor_float and normal.float_Tensor in the list, since that is what matches the rest of their signatures.**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82029
Approved by: https://github.com/davidberard98
2022-07-25 15:44:34 +00:00
goldenxuett
eb2ea9a581 [JIT] Add is_aliasing method to FunctionSchema (#81916)
- Add an is_aliasing method in FunctionSchema to indicate whether an argument has an alias_set attached to it. This is utilized in the integration with autograd (see the next PR)
- Tested in test_schema_info
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81916
Approved by: https://github.com/davidberard98
2022-07-25 15:44:33 +00:00
goldenxuett
fc3555ce4d [JIT] Modify is_nondeterministic to utilize tags in schemaInfo and integrate with ir.cpp (#81836)
- Modified is_nondeterministic method in SchemaInfo class to utilize tags.
- Modified isNonDeterministic method in ir.cpp to utilize SchemaInfo when a Node is an aten op.
- Added an assert to ensure that if a node is an aten op kind, it has a schema.
- Tested through verifying that all IR.cpp tests run, and through adding 2 custom determinism checks to test for the special dropout edge case and a general bernoulli case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81836
Approved by: https://github.com/davidberard98
2022-07-25 15:44:31 +00:00
goldenxuett
9a5fa15ea8 [JIT] Remove BatchNorm and InstanceNorm special cases from AliasDB and replace with SchemaInfo is_mutable checks (#81785)
- Generalized AnalyzeImpl cases for batchNorm and InstanceNorm in alias_analysis.cpp using schema_info.
- Tested by ensuring all aliasDB special case checks for batchNorm and instanceNorm pass as expected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81785
Approved by: https://github.com/davidberard98
2022-07-23 05:50:39 +00:00
Peter Bell
8d0cbce069 Lower randint default dtype to the C++ API (#81410)
The default dtype for randint is currently handled with manual python
binding code; this moves it into the `native_functions.yaml` declaration
for API consistency.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81410
Approved by: https://github.com/albanD
2022-07-21 16:42:49 +00:00
goldenxuett
c9497886fd [JIT] Modify is_mutable in FunctionSchema and SchemaInfo to have SchemaArgument parameter instead of index (#81784)
- Modify the is_mutable(size_t index) overload to become is_mutable(const SchemaArgument& argument), due to cases where one might want to check the mutability of either input or output arguments (see the sketch after this list).
- Refactored all calls to the function to use this new overload
- Tested through is_mutable() tests in test_schema_info.cpp
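
For illustration, a hedged sketch of the new call shape:

```cpp
#include <torch/csrc/jit/frontend/function_schema_parser.h>

void demo() {
  auto schema = torch::jit::parseSchema("my::op(Tensor(a!) self) -> Tensor(a!)");
  // Old: schema.is_mutable(0) was ambiguous between inputs and outputs.
  // New: the SchemaArgument says which side index 0 refers to.
  bool input_mut = schema.is_mutable({c10::SchemaArgType::input, 0});
  bool output_mut = schema.is_mutable({c10::SchemaArgType::output, 0});
  (void)input_mut;
  (void)output_mut;
}
```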
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81784
Approved by: https://github.com/davidberard98
2022-07-20 22:09:56 +00:00
goldenxuett
1ddbc5a7dc [JIT] Remove has_side_effects functionality from SchemaInfo (#81575)
- This removes all functionality from https://github.com/pytorch/pytorch/pull/81002 due to a realization that the side effects check doesn't affect any ops outside of JIT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81575
Approved by: https://github.com/davidberard98
2022-07-19 22:33:19 +00:00
goldenxuett
a6e716cfed [JIT] Add may_contains_alias function in SchemaInfo class (#81444)
- Created may_contain_alias method in SchemaInfo, which is a wrapper around FunctionSchema's may_contain_alias that also accounts for argument values. This is done using logic similar to AliasDB's, with an internal understanding of wildcard sets and container objects
- Added a multitude of tests for various graph edge cases (inputs aliasing, outputs aliasing, multiple input wildcards, multiple container objects, etc...).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81444
Approved by: https://github.com/davidberard98
2022-07-19 04:29:22 +00:00
goldenxuett
47cdab6601 [JIT] Fix double wildcard edge case for may_alias in SchemaInfo and improve formatting (#81439)
- Create a c10::AliasTypeSet typedef of vector<TypePtr> to match alias_analysis.cpp formatting and improve readability.
- Move canAliasTypeSetsAlias, mapTypeToAliasTypeSet, getAliasTypeSetContainedTypes, and getCorrectList to public in function_schema.h for use in SchemaInfo class.
**In the future it might be better to find a different home for most of these functions, since they don't depend on FunctionSchema.**
- Created hash function for SchemaArgument
- Add assert to ensure that there is only 1 input and 1 output with each alias set (excluding wildcard)
- Fixed the double wildcard input edge case for may_alias. (This is the case where, if there is a schema with the form (Tensor(a) a, Tensor(*) b, Tensor(*) c) -> Tensor, and the argument values for 'a' and 'b' cause them to alias, then 'a' may also alias 'c'.)
- Added tests for double wildcard case in may_alias, mismatching types in may_alias, and the uniqueness internal assert.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81439
Approved by: https://github.com/davidberard98
2022-07-19 04:29:21 +00:00
Nikolay Korovaiko
4aac42cc98 [LT] Add a new backend interface [DUP of the original] (#81662)
This is a dup of https://github.com/pytorch/pytorch/pull/76517, which is failing because Jiewen needs to re-sign the CLA.

Summary:
This commit introduces a new set of BackendImplInterface: GetDefaultDeviceOrdinal
and SetDefaultDeviceOrdinal. It allows backends to specify their own default
device ordinal, e.g., 1 for XLA and 0 for CUDA/CPU.

Test Plan:
./build/bin/test_lazy --gtest_filter=BackendDeviceTest.*

ghstack-source-id: b4adfef49253e51bffbbf40d356188a92c98994d
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76517

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81662
Approved by: https://github.com/JackCaoG, https://github.com/wconstab
2022-07-19 01:15:22 +00:00
zhang, xiaobing
86b86202b5 fix torch.config can't respect USE_MKLDNN flag issue (#75001)
Fixes https://github.com/pytorch/pytorch/issues/74949, which reports that torch.config can't respect USE_MKLDNN flag.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75001
Approved by: https://github.com/malfet
2022-07-17 15:00:48 +00:00
goldenxuett
e71f4e7958 [JIT] Implement may_contain_alias in FunctionSchema (#81352)
- Created may_contain_alias method in FunctionSchema to publicize more detailed aliasing information about the inputs and outputs of a schema. This method returns whether the first argument may contain an alias to the second argument (i.e., if the first argument is a list[Tensor], it can contain an alias to the second argument if the second argument is Tensor(*)), and vice versa if bidirectional = true. A usage sketch follows this list.
- The created helper methods are explained in more detail in function_schema.h
- Tested may_contain_alias methods for basic functionality, bidirectional functionality, wildcard functionality, and dual container functionality in test_schema_info.cpp.
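
A hedged usage sketch (assuming the SchemaArgument-style call used elsewhere in this series):

```cpp
#include <torch/csrc/jit/frontend/function_schema_parser.h>

void demo() {
  auto schema = torch::jit::parseSchema("my::op(Tensor[] list, Tensor(*) t) -> ()");
  // May the Tensor[] at input 0 contain an alias of the wildcard Tensor at
  // input 1? With bidirectional=true the reverse direction is checked too.
  bool maybe = schema.may_contain_alias(
      {c10::SchemaArgType::input, 0},
      {c10::SchemaArgType::input, 1},
      /*bidirectional=*/true);
  (void)maybe;
}
```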
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81352
Approved by: https://github.com/davidberard98, https://github.com/Gamrix
2022-07-15 21:57:37 +00:00
goldenxuett
42ee1608d3 [JIT] Add special cases batch_norm, instance_norm and dropout for SchemaInfo (#81007)
- Added special cases for detach in the is_non_deterministic() check, and for batch_norm and instance_norm in the is_mutable() check, in SchemaInfo.
- Added tests for the above special cases for detach, batch_norm and instance_norm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81007
Approved by: https://github.com/davidberard98
2022-07-15 04:52:02 +00:00
goldenxuett
3b4964230e [JIT] Add side effects checks for ops in SchemaInfo subclass (#81002)
- Added has_side_effects method, which returns whether a given op has side effects. Currently this is implemented with a hard-coded list of functions copied from ir.cpp in AliasDB, but it will eventually be implemented by returning whether a given schema has the has_side_effects tag.
- Tested in test_schema_info.cpp with both an op with side effects and an op without side effects.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81002
Approved by: https://github.com/davidberard98
2022-07-13 00:39:30 +00:00
goldenxuett
14c28caed9 [JIT] Add determinism checks for ops in SchemaInfo subclass (#81000)
- Added is_non_deterministic, which returns whether a given op is non-deterministic. Currently this is implemented with a hard-coded list of non-deterministic functions copied from ir.cpp in AliasDB, but it will eventually be implemented by returning whether a given schema has the non_deterministic tag.
- Tested is_non_deterministic method with a deterministic op and a non deterministic op in test_schema_info.cpp

**Note that the case for the op "aten::dropout(Tensor input, float p, bool train) -> Tensor", which is deterministic whenever "train=false", is not accounted for in this PR and will be fixed in a later PR. Currently "aten::dropout(Tensor input, float p, bool train) -> Tensor" is always considered nondeterministic.**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81000
Approved by: https://github.com/davidberard98
2022-07-13 00:35:42 +00:00
goldenxuett
50ba94f5cc [JIT] Add aliasing checks in SchemaInfo with associated tests (#80984)
- Created may_alias method in SchemaInfo to update the implementation of FunctionSchema::may_alias for aliasing cases due to inputs aliasing.
- Created output_alias_map_ internal variable to check cases where outputs might alias due to inputs aliasing. This variable is updated in generateAliasMap().
- Added tests for various may_alias special cases (input - input, input - output, output - output) due to inputs aliasing causing other arguments to also alias.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80984
Approved by: https://github.com/davidberard98
2022-07-13 00:18:43 +00:00
goldenxuett
aa61fdb667 [JIT] Add argumentValue functions and is_mutable checks to SchemaInfo (#80972)
- Created addArgumentValue/s methods in SchemaInfo to pass argument values into the subclass. These are used for more accurate mutation, aliasing and determinism checks which include special cases.
- Added input_alias_map_ to keep track of which inputs alias each other. This is updated with the method generateAliasMap.
- Implemented is_mutable methods in SchemaInfo which also give information based on argument values. For instance, if two inputs alias and one is mutable by the schema, then the other will also be mutable.
- Tested Schema Info is_mutable implementation where inputs alias as mentioned above.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80972
Approved by: https://github.com/davidberard98
2022-07-13 00:16:41 +00:00
goldenxuett
e3a870986e [JIT] Add may_alias in FunctionSchema with associated tests (#80918)
- Created may_alias method in FunctionSchema to publicize aliasing information about inputs and outputs of a schema.
- Tested may_alias methods for basic functionality, exceptions, and wildcard functionality.

**Cases where elements of a container alias another argument will be handled with a new may_contain_alias method, which will be created in a later PR**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80918
Approved by: https://github.com/davidberard98
2022-07-12 18:07:23 +00:00
goldenxuett
b4e342928b [JIT] Add mutability checks in FunctionSchema and create SchemaInfo subclass (#80734)
- Added overloads to the is_mutable method in FunctionSchema to tell whether the argument at a given index is mutable, or whether an argument with a given name is mutable.
- Created SchemaInfo subclass of FunctionSchema with constructors from FunctionSchema and from const char* signature.
- Tested is_mutable method overloads in new test_schema_info.cpp file.

**Note that this PR is used to set up SchemaInfo. Implementation for SchemaInfo will be addressed in later commits**

Differential Revision: [D37651384](https://our.internmc.facebook.com/intern/diff/D37651384)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80734
Approved by: https://github.com/davidberard98
2022-07-11 19:13:06 +00:00
soulitzer
516f3198d6 Fix retains grad behavior after in-place (#79996)
See this doc: https://docs.google.com/document/d/1KiRdnoj6B4cI3yl017hTbCqcOGO1gWIpUf20sldipHM/edit#

Two issues are fixed, (1) regarding hooks in general and (2) regarding retains-grad hooks; Python hooks, which rely on a different mechanism, are not discussed here:
- Hooks in cpp in general
  - (fixed) new hooks registered to a newer version of the tensor no longer get applied to the grad_fn
    associated with the older version of the tensor when the first hook was ever registered
  - (unchanged) hooks registered to the older version of the tensor remain active on it
- Retains-grad hooks
  - (fixed) now get moved to the latest grad_fn. NB: to the user, retains_grad is not considered a hook
    or expected to behave like hooks (which we consider properties of the grad_fn), whereas retains-gradness
    is a property of the tensor.
- (not in this PR) Python hooks
  - (will fix) same issue as hooks in cpp, where new hooks are being applied to the grad_fn associated
    with the older version of the tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79996
Approved by: https://github.com/albanD
2022-07-08 19:13:28 +00:00
Sergii Dymchenko
b0aaefb50f Build example_allreduce only for GLOO (#81062)
`example/allreduce.cpp` is GLOO-specific and will not compile with USE_GLOO=0

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81062
Approved by: https://github.com/malfet
2022-07-08 02:25:54 +00:00
Nikolay Korovaiko
8389ccbcd8 reinstate size and shape returning symints (#79560)
This PR redirects `size` and `.shape` to call `sym_sizes`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79560
Approved by: https://github.com/Chillee
2022-07-08 01:17:33 +00:00
Boyan Atanasov
b603860c1d Back out "[profiling] Adding targets file for test_mobile_profiler" (#80789)
Summary:
Original commit changeset: 38314c83d223

Original Phabricator Diff: D37116110 (c9aa74a37f)

Reviewed By: mcr229

Differential Revision: D37582906

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80789
Approved by: https://github.com/bochko, https://github.com/salilsdesai
2022-07-05 23:34:15 +00:00
Han Qi (qihqi)
c93ceef658 Wrap static initializers in ifdef (#80590)
because on iOS some projects have -Wglobal-constructors and it won't build otherwise.

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80590
Approved by: https://github.com/cccclai
2022-07-01 04:42:17 +00:00
Max Ren
c9aa74a37f [profiling] Adding targets file for test_mobile_profiler (#80351)
Summary:
Tests successful recording of backend events. The test checks that the trace file successfully includes the memory recording from the backend at execute time. The record in the trace file looks like:

```
{
    "ph": "i", "cat": "cpu_instant_event", "s": "t", "name": "[memory]",
    "pid": 847267, "tid": 847267,
    "ts": 1655333276408215,
    "args": {
      "Device Type": 0, "Device Id": -1, "Addr": 108370615407104, "Bytes": 16384, "Total Allocated": 16384, "Total Reserved": 49152
    }
  }
```

Test Plan:
```
buck test //caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler
Parsing buck files: finished in 1.6 sec
Creating action graph: finished in 30.9 sec
Downloaded 0/5 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 37.9 sec (100%) 25314/25314 jobs, 5/25314 updated
  Total time: 01:10.5 min
More details at https://www.internalfb.com/intern/buck/build/ef1c4324-13d3-494e-bce7-8004047d5f89
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 17f300d4-9a78-4302-9e9e-d7ab79ba1ff0
Trace available for this run at /tmp/tpx-20220615-165413.567757-17f300d4-9a78-4302-9e9e-d7ab79ba1ff0/trace.log
RemoteExecution session id: reSessionID-17f300d4-9a78-4302-9e9e-d7ab79ba1ff0-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/7881299443250383
    ✓ ListingSuccess: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler : 3 tests discovered (37.049)
    ✓ Pass: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler - MobileProfiler.Backend (0.402)
    ✓ Pass: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler - MobileProfiler.ModuleHierarchy (0.487)
    ✓ Pass: caffe2/test/cpp/lite_interpreter_runtime:test_mobile_profiler - MobileProfiler.BackendMemoryEvents (0.280)
Summary
  Pass: 3
  ListingSuccess: 1
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/7881299443250383
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
```

Differential Revision: D37116110

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80351
Approved by: https://github.com/kimishpatel
2022-06-30 17:27:35 +00:00
Sergei Vorobev
a8b0988596 Fix //:module_test Conversion_MultiCUDA (#79926)
Fixes #79871

Make `module.cpp` tests respect the change that was made in #78436 (no int types in autograd).

Note that there is still a gap in the CMake test -- it's unclear why it didn't fail CI before.

As far as I can tell it should be executed, because it's included here: 79507d2a9d/test/cpp/api/CMakeLists.txt (L17)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79926
Approved by: https://github.com/soulitzer
2022-06-21 23:32:18 +00:00
Nikolay Korovaiko
efc7343743 Revert "Revert "Put symint overloads on a different name"" (#79680)
This relands https://github.com/pytorch/pytorch/pull/79281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79680
Approved by: https://github.com/malfet
2022-06-21 07:06:33 +00:00
Han Qi (qihqi)
fed12ff680 [BE][flatbuffer] Remove code duplications and refactor (#79184)
Summary:
Remove code dup in import.cpp / export_modules.cpp such that
1. Only one copy of switching logic (detect flatbuffer / is_flatbuffer);
2. Detection of whether flatbuffer is included moves to runtime (so no more macros)

This also reverses the dependency: import.cpp -> flatbuffer_loader.cpp becomes flatbuffer_loader.cpp -> import.cpp.

Differential Revision: D36926217

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79184
Approved by: https://github.com/zhxchen17
2022-06-20 16:37:38 +00:00
Nikita Shulga
4a4890cfb2 [BE] Use CamelCase for enum class members (#79772)
Per many C++ code-style guides (for [example](https://google.github.io/styleguide/cppguide.html#Enumerator_Names)), members of an `enum` should be CamelCase,
and only defines should be ALL_CAPS

Changes `MemOverlap`, `MemOverlapStatus` and `CmpEvalResult` enum values

Also, `YES`, `NO`, `TRUE` and `FALSE` are often system defines

Fixes, among other things, the current iOS build regression, which manifests as follows (see [this](6e90572bb9)):
```
/Users/runner/work/pytorch/pytorch/aten/src/ATen/MemoryOverlap.h:19:29: error: expected identifier
enum class MemOverlap { NO, YES, TOO_HARD };
                            ^
/Applications/Xcode_12.4.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator14.4.sdk/usr/include/objc/objc.h:89:13: note: expanded from macro 'YES'
#define YES __objc_yes
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79772
Approved by: https://github.com/drisspg, https://github.com/kulinseth
2022-06-17 05:53:57 +00:00
PyTorch MergeBot
b9bb52d97b Revert "Put symint overloads on a different name"
This reverts commit 213a8fc992.

Reverted https://github.com/pytorch/pytorch/pull/79281 on behalf of https://github.com/bigfootjon due to Diff reverted internally
2022-06-15 17:15:21 +00:00
Nikolay Korovaiko
83e575c510 have a common interface to extract metadata from SizeNodes (#78088)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78088
Approved by: https://github.com/JackCaoG, https://github.com/wconstab
2022-06-15 04:59:08 +00:00
John Clow
07a528cac7 Adding isDynamic Support to SizeNodes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77917

Approved by: https://github.com/Krovatkin
2022-06-14 03:27:57 +00:00
David Berard
91a2e953e5 [JIT] Use signed integers in CalculatedNecessaryArgs
x was underflowing:
```
size_t x = ...
// x is unsigned, so (x >= 0) is always true; when x reaches 0,
// x-- wraps around to SIZE_MAX and the loop never terminates.
while (x >= 0) {
  x--;
}
```

Changed the variables to ssize_t.
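A sketch of the fix under the same shape (the initializer `num_args` is hypothetical):
```
ssize_t x = static_cast<ssize_t>(num_args);  // signed, so the condition can fail
while (x >= 0) {
  x--;  // reaches -1 and exits instead of wrapping to SIZE_MAX
}
```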

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79331

Approved by: https://github.com/yuhc, https://github.com/tugsbayasgalan
2022-06-13 19:41:18 +00:00
Edward Z. Yang
213a8fc992 Put symint overloads on a different name
Due to implicit conversion shenanigans, having both IntArrayRef
and SymIntArrayRef overloads makes {} ambiguous.  While we could
fix this by making a single unified type that accepts all the overloads
we want, an easier fix was to just push the SymIntArrayRef overload
to its own name.
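A hedged sketch of the ambiguity (hypothetical overload set; the comment marks the line that fails to compile):
```
#include <c10/core/SymIntArrayRef.h>
#include <c10/util/ArrayRef.h>

void resize(c10::IntArrayRef sizes) {}
void resize(c10::SymIntArrayRef sizes) {}

int main() {
  resize({});  // error: ambiguous -- {} converts equally well to both overloads
}
```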

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79281

Approved by: https://github.com/suo
2022-06-12 14:36:39 +00:00
Michael Suo
30fb2c4aba [lint] autoformat test/cpp and torch/csrc
Let's have some fun.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78828

Approved by: https://github.com/ezyang
2022-06-11 21:11:16 +00:00
Michael Andreas Dagitses
ab2ca95dd1 turn on -Werror=unused-variable in our Bazel CPU build
Summary:
We also fix any existing issues. Note that we only do this for the CPU
build because nvcc is considered a C++ toolchain but it does not have
the same flag support. Adding flags to the GPU build will cause nvcc
errors.

Test Plan: Built locally, rely on CI to confirm.

Reviewers: malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79156

Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD
2022-06-11 02:46:34 +00:00
Michael Andreas Dagitses
606b234336 turn on -Werror=unused-function in our Bazel CPU build
Summary:
We also fix any existing issues. Note that we only do this for the CPU
build because nvcc is considered a C++ toolchain but it does not have
the same flag support. Adding flags to the GPU build will cause nvcc
errors.

Test Plan: Built locally, rely on CI to confirm.

Reviewers: malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79154

Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD
2022-06-10 22:11:54 +00:00
PyTorch MergeBot
bcd7a20953 Revert "turn on -Werror=unused-function in our Bazel CPU build"
This reverts commit 67d313a032.

Reverted https://github.com/pytorch/pytorch/pull/79154 on behalf of https://github.com/malfet due to Breaks bazel build: 67d313a032
2022-06-10 20:43:03 +00:00
Michael Andreas Dagitses
67d313a032 turn on -Werror=unused-function in our Bazel CPU build
Summary:
We also fix any existing issues. Note that we only do this for the CPU
build because nvcc is considered a C++ toolchain but it does not have
the same flag support. Adding flags to the GPU build will cause nvcc
errors.

Test Plan: Built locally, rely on CI to confirm.

Reviewers: malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79154

Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD
2022-06-10 18:30:08 +00:00
Brian Hirsh
7b3a0ff87a Port index.Tensor to structured kernels.
Tracking issue: #55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69607

Approved by: https://github.com/bdhirsh
2022-06-10 17:27:47 +00:00
Michael Andreas Dagitses
f96d96a7fc turn on -Werror=type-limits in our Bazel CPU build
Summary:
We also fix any existing issues.

Test Plan: Built locally, rely on CI to confirm.

Reviewers: malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79139

Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD
2022-06-10 10:04:08 +00:00
Nikita Shulga
3255ddeec9 Make Wunused-local-typedef a hard error (#77918)
Only allow it for `libtorch_python` and tests
Helps prevent regression like https://github.com/pytorch/pytorch/pull/76547#issuecomment-1132208232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77918
Approved by: https://github.com/osalpekar, https://github.com/seemethere
2022-06-09 18:14:01 +00:00
Mark Harfouche
221755cc71 Link BLAS privately (#78883)
We've had some users report that they are getting symbol collisions when linking to BLAS.

I don't see a need to re-export the blas library symbols.

I figured I would share here for other packagers to be able to benefit too.

xref: https://github.com/conda-forge/pytorch-cpu-feedstock/pull/116
xref: https://github.com/conda-forge/openblas-feedstock/issues/134
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78883
Approved by: https://github.com/ezyang
2022-06-09 17:02:06 +00:00
PyTorch MergeBot
4b82ef7928 Revert "Port index.Tensor to structured kernels."
This reverts commit cfd84125bd.

Reverted https://github.com/pytorch/pytorch/pull/69607 on behalf of https://github.com/zengk95 due to This is breaking mac trunk tests cfd84125bd
2022-06-08 20:16:10 +00:00
Brian Hirsh
cfd84125bd Port index.Tensor to structured kernels.
Tracking issue: #55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69607

Approved by: https://github.com/bdhirsh
2022-06-08 18:17:52 +00:00
dzdang
a56f4e23b9 [quant][core][better-engineering] Rename files in quantized directory to conform with non-quantized counterpart filenames
Summary:
Names of analogous files in the quantized directory (previously snake case) were inconsistent with
their non-quantized filename counterparts (pascal case). This is the first of a series of PRs that renames
all files in the quantized directory (and its sub-directories) to pascal case.

`aten/src/ATen/native/quantized/qconv_unpack.cpp` has not been renamed yet
because (for reasons currently unknown) after making the name change, `import torch` produces the error below (renaming `qlinear_unpack.cpp` also seems to fail some Phabricator CI tests for similar reasons). We suspect that these may be undefined errors and will revisit renaming these files in a future PR.

```
terminate called after throwing an instance of 'c10::Error'
  what():  Type c10::intrusive_ptr<ConvPackedParamsBase<2> > could not be converted to any of the known types.
Exception raised from operator() at ../aten/src/ATen/core/jit_type.h:1735 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x55 (0x7f26745c0c65 in /data/users/dzdang/pytorch/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xb1 (0x7f26745bdcd1 in /data/users/dzdang/pytorch/torch/lib/libc10.so)
frame #2: <unknown function> + 0x1494e24 (0x7f2663b14e24 in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0xfed0bc (0x7f266366d0bc in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #4: c10::detail::infer_schema::make_function_schema(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>) + 0x5a (0x7f266366d71a in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #5: c10::detail::infer_schema::make_function_schema(c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>) + 0x7b (0x7f266366e06b in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0x1493f32 (0x7f2663b13f32 in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0xe227dd (0x7f26634a27dd in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #8: <unknown function> + 0x14e0a (0x7f268c934e0a in /lib64/ld-linux-x86-64.so.2)
..........................truncated.............
```

Test Plan:
```
python test/test_quantization.py
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77037

Approved by: https://github.com/jerryzh168
2022-06-07 13:47:08 +00:00
PyTorch MergeBot
6a4997e66a [Profiler] Weaken ordering check during post processing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78563

The profiler assembles a call hierarchy by replaying recorded events. There is an assert to ensure that the events form a well structured tree; however, many of the inputs are from external sources, and small differences (e.g. recording time at a lower precision) lead to traces which violate that assumption. For now this is acceptable; the post processing can handle resolving these discrepancies. As a result, I am relaxing the assert to only test event types where we expect the framework to be able to enforce these strong structural requirements.

Differential Revision: [D36787787](https://our.internmc.facebook.com/intern/diff/D36787787/)

Approved by: https://github.com/suo
2022-06-01 18:55:19 +00:00
PyTorch MergeBot
fca1f495c2 Revert "Port index.Tensor to structured kernels."
This reverts commit 9fe6f1baf5.

Reverted https://github.com/pytorch/pytorch/pull/69607 on behalf of https://github.com/suo due to this broke master, see: 9fe6f1baf5
2022-06-01 00:12:15 +00:00
PyTorch MergeBot
ceb93afe3f Revert "Fix bug in flatbuffer deserialization"
This reverts commit 7e72c96b10.

Reverted https://github.com/pytorch/pytorch/pull/78344 on behalf of https://github.com/tugsbayasgalan due to as we need to land it in fbcode asap
2022-05-31 23:34:04 +00:00
Brian Hirsh
9fe6f1baf5 Port index.Tensor to structured kernels.
Tracking issue: #55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69607

Approved by: https://github.com/bdhirsh
2022-05-31 22:15:20 +00:00
Tugsbayasgalan Manlaibaatar
7e72c96b10 Fix bug in flatbuffer deserialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78344

Approved by: https://github.com/qihqi
2022-05-31 18:37:30 +00:00
Michael Suo
032d1ace1d [ci] disable flaky MobileProfiler.Backend test
This test is flaky, normally I'd disable using the disable bot but it
doesn't support cpp.

[skip ci]

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78320

Approved by: https://github.com/malfet
2022-05-26 03:22:55 +00:00
Shunting Zhang
26d9386f67 Make string serialization of C++ FunctionSchema consistent with torchgen.model.FunctionSchema
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77926

There is a discrepancy between the string representation of the C++ FunctionSchema and torchgen.model.FunctionSchema.
The latter does not add parentheses around the returned types if there is a single item,
but the C++ FunctionSchema always adds the parentheses.

Make them consistent so we can convert one type to the other via its string representation and parse method.
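A sketch of the round-trip this enables (assumes `torch::jit::parseSchema`; the schema string is illustrative):
```
#include <torch/csrc/jit/frontend/function_schema_parser.h>

// With consistent rendering, parseSchema(toString(s)) is stable even for a
// single, unparenthesized return type:
auto s = torch::jit::parseSchema(
    "add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor");
auto s2 = torch::jit::parseSchema(toString(s));  // previously rendered "-> (Tensor)"
```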

Differential Revision: [D36535924](https://our.internmc.facebook.com/intern/diff/D36535924/)

Approved by: https://github.com/bdhirsh
2022-05-24 19:39:26 +00:00
John Clow
c82fb7a67f Adding support for upper and lower bound functions in SSA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77389

Approved by: https://github.com/eellison
2022-05-20 23:58:40 +00:00
Nikolay Korovaiko
df1f9b9840 Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#77756)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77756
Approved by: https://github.com/desertfire
2022-05-20 05:39:03 +00:00
PyTorch MergeBot
e9d660c331 Revert "Revert "Revert "Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#76836)"""
This reverts commit acf7136a52.

Reverted https://github.com/pytorch/pytorch/pull/77719 on behalf of https://github.com/suo
2022-05-18 05:06:50 +00:00
Edward Z. Yang
acf7136a52 Revert "Revert "Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#76836)""
This reverts commit c35bd8d423.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77719

Approved by: https://github.com/Chillee, https://github.com/malfet
2022-05-18 03:25:43 +00:00
PyTorch MergeBot
c35bd8d423 Revert "Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#76836)"
This reverts commit fc4c3c9bc7.

Reverted https://github.com/pytorch/pytorch/pull/76836 on behalf of https://github.com/suo
2022-05-18 02:45:25 +00:00
Han Qi (qihqi)
3822a472ef Python function to extract information on mobile::Module from flatbuffer (#77624)
Summary:
Includes the following refactoring:
1. Common loading logic for operator validation that was duplicated in the pickle and
   flatbuffer loaders is moved to function.h/cpp;
2. Allow loading of a function without wiring up operators.

This function will be used to implement get_bundled_input and friends
for flatbuffer.

Test Plan: contbuild & OSS CI, see 69fa49f123

Reviewed By: cccclai

Differential Revision: D36348549

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77624
Approved by: https://github.com/cccclai
2022-05-18 00:42:57 +00:00
Nikolay Korovaiko
fc4c3c9bc7 Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#76836)
LTC Tensors now create real IR (SizeNode) for sym_sizes() in LTCTensorImpl.cpp.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76836
Approved by: https://github.com/ezyang
2022-05-18 00:40:42 +00:00
Michael Suo
7f1e331b34 Make SymInt constructor explicit
Since we plan to have a bunch of code that is sensitive to whether or
not a SymInt contains a symbolic shape or not, it seems like a bad idea
to have an implicit constructor.

For example, code like:
```
sizes_and_strides_.stride_at_unchecked(dim) = 0;
```

would sail through, and the `0` would get implicitly promoted to a
SymInt.

This is a tradeoff though: it makes code that handles `SymInt`s more
clunky as `int64_t`s and integer literals need to be explicitly wrapped
in `SymInt` before being used.
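A hedged illustration of the tradeoff at a call site:
```
c10::SymInt a{5};        // OK: explicit construction
// c10::SymInt b = 5;    // no longer compiles once the constructor is explicit
// Literals must now be wrapped, e.g.:
// sizes_and_strides_.stride_at_unchecked(dim) = c10::SymInt(0);
```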

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77666

Approved by: https://github.com/ezyang
2022-05-17 22:28:35 +00:00
Bin Bao
25c6ebd12c Revert "Revert "[LT] Codegen ReuseNode for supported ops""
Summary: Fixed a XLC build failure by generating an always-return-false
default CanBeReused method.

This reverts commit 3cade9d454.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77513

Approved by: https://github.com/alanwaketan
2022-05-16 20:14:42 +00:00
Wang, Eikan
e5a5cd149f Simplify IfThenElse and CompareSelect within for-loop (#76793)
Analyze the range to determine if a condition cannot be satisfied. Suppose the for-loop body contains `IfThenElse` or `CompareSelect` while the condition of the two statements depends on the for-loop index `Var`. In that case, we will analyze the range to check whether the condition could always be satisfied or not. If the condition is deterministic, simplify the logic.
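A minimal illustration of the range analysis in plain C++ rather than TE IR (arrays hypothetical):
```
int a[10] = {0};
int b[10];
for (int i = 0; i < 10; i++) {
  // The loop index is provably in [0, 10), so (i < 20) is always true and
  // the simplifier can replace the select with its true branch:
  b[i] = (i < 20) ? a[i] : 0;  // simplifies to b[i] = a[i];
}
```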
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76793
Approved by: https://github.com/huiguoo
2022-05-15 20:21:28 +00:00
PyTorch MergeBot
3cade9d454 Revert "[LT] Codegen ReuseNode for supported ops"
This reverts commit 6066e5929f.

Reverted https://github.com/pytorch/pytorch/pull/76738 on behalf of https://github.com/malfet
2022-05-14 00:33:10 +00:00
Bin Bao
6066e5929f [LT] Codegen ReuseNode for supported ops
Summary:
1. Update the codegen script to add a TrieCache lookup (ReuseNode)
before creating a new IR node. The following is an example generated
code,

```
    at::Tensor LazyNativeFunctions::add(const at::Tensor & self, const at::Tensor & other, const at::Scalar & alpha) {
        ...
        torch::lazy::NodePtr node = torch::lazy::ReuseNode<AddTensor>(lazy_self->GetIrValue(), lazy_other->GetIrValue(), node_alpha);
        if (!node) {
            auto out_meta = at::meta::add(self, other, alpha);
            std::vector<Shape> shapes{Shape(out_meta.scalar_type(), out_meta.sizes().vec())};
            TORCH_INTERNAL_ASSERT(shapes.size() == 1);
            if(symbolicShapeEnabled()){
                std::vector<jit::IValue> inputs = { self, other, alpha };
                char* schema_str = "aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor";
                applySymbolicShapesOnLT(schema_str, inputs, shapes);
            }

            node = torch::lazy::MakeNode<AddTensor>(lazy_self->GetIrValue(), lazy_other->GetIrValue(), node_alpha, std::move(shapes));
            CacheNode(node);
        }
        ...
    }
```
2. TrieCache lookup depends on each IR node subclass to provide its own
comparison function. The following is an example generated code,

```
  bool CanBeReused(const torch::lazy::Value& self, const torch::lazy::Value& other, const torch::lazy::Value& alpha) const {
    size_t i = 0;
    return (operand(i++) == self &&
        operand(i++) == other &&
        operand(i++) == alpha);
  }
```

3. DeviceData is specially handled.

4. Non-codegen op changes are coming a separate PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76738

Approved by: https://github.com/JackCaoG, https://github.com/wconstab
2022-05-13 19:13:58 +00:00
yanbing-j
4f82f439d1 Enable BFloat16 ELU, SELU and CELU in CPU path (#62546)
Enable BFloat16 ELU, SELU and CELU in CPU path. SELU and CELU will call ELU implementation.
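A usage sketch in the C++ API (a minimal example, assuming a CPU build with this change):
```
#include <torch/torch.h>

auto x = torch::randn({8}, torch::dtype(torch::kBFloat16));
auto y = torch::selu(x);                 // routed through the shared ELU CPU kernel
auto z = torch::celu(x, /*alpha=*/1.0);  // likewise
```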

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62546
Approved by: https://github.com/frank-wei
2022-05-12 16:56:57 +00:00
Xiang Gao
cc9d0f309e lshift and rshift stop support floating types (#77146)
Fixes #74358

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77146
Approved by: https://github.com/ngimel
2022-05-11 22:29:30 +00:00
Bin Bao
8f5cdc6d5d Revert "Revert "[LT] Store OpKind for each IR subclass in a static field""
Summary: Re-land https://github.com/pytorch/pytorch/pull/76711 by
fixing internal build errors.
Generate class-level opkind as a static method instead of a static
member.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77102

Approved by: https://github.com/wconstab, https://github.com/JackCaoG, https://github.com/antoniojkim
2022-05-11 12:27:05 +00:00
John Clow
26e2936edc [JIT SSA] Added testing for the Cat Op in LazyTensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76552

Approved by: https://github.com/Krovatkin
2022-05-09 22:11:14 +00:00
PyTorch MergeBot
7eaf4780ba Revert "[LT] Store OpKind for each IR subclass in a static field"
This reverts commit ac37ddc795.

Reverted https://github.com/pytorch/pytorch/pull/76711 on behalf of https://github.com/malfet
2022-05-09 20:50:09 +00:00
Fuqiang Zhang
bd573389f6 [Bootcamp] Add option for flatbuffer loader to copy memory to individual tensors (#76986)
Summary: Add option for flatbuffer loader to copy memory to individual tensors, to allow freeing memory without waiting for all tensor runs to complete.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76986
Approved by: https://github.com/qihqi
2022-05-09 17:29:30 +00:00
Bin Bao
ac37ddc795 [LT] Store OpKind for each IR subclass in a static field
Summary: Currently OpKind is stored as an object field called op_ for each IR
node, and one usage of op_ is to avoid dynamic_cast in NodeCast when we
need to downcast a base-node pointer into a concrete sub-node pointer.
As a result, we need to construct and pass in an op when downcasting
nodes, and this becomes quite annoying when we start to implement the
trie-based IR node reusing. More importantly, the op for each subclass
should be unique to that subclass, and thus making it a const static field
is a more logical design.

In this PR, we still keep the object-level op_ for easier XLA adoption. As
future work, we can come back to remove op_, make the op() method
virtual, and get rid of OpKind in all the node constructors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76711

Approved by: https://github.com/wconstab, https://github.com/JackCaoG
2022-05-06 19:14:46 +00:00
David Berard
6c615a21a0 [NVFuser] prep for on-by-default
1. fix tests that expected nvfuser off-by-default behavior
2. skip nvfuser if getExecutorMode() == false

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76937

Approved by: https://github.com/eellison
2022-05-06 18:18:53 +00:00
Bin Bao
f05710dd40 [LT] Add a trie data structure for caching IR nodes
Summary: TrieCache provides a way to look up an IR node before we
actually create it. If the lookup hits in TrieCache, we reuse the
existing node and move the current pointer in TrieCache to point to that
node; if the lookup misses, we create a new node and insert it into TrieCache.
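A distilled sketch of the lookup-before-create pattern (cf. the generated code in the ReuseNode commit earlier in this log; operands `a`, `b` hypothetical):
```
torch::lazy::NodePtr node = torch::lazy::ReuseNode<AddTensor>(a, b);
if (!node) {
  node = torch::lazy::MakeNode<AddTensor>(a, b);  // miss: create a new node...
  CacheNode(node);                                // ...and record it in the trie
}
```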

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76542

Approved by: https://github.com/wconstab, https://github.com/JackCaoG
2022-05-04 23:48:03 +00:00
Wang, Eikan
429a80dded [NNC] Lowering function generates the output buffer with the specified stride (#76529)
Summary:
Pass stride information to the lowering function to generate the output buffer with the proper memory layout.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76529

Reviewed By: ZolotukhinM

Differential Revision: D36116712

Pulled By: IvanKobzarev

fbshipit-source-id: d3901f756b3710ecce172d6db3ecb0b7c12fb929
(cherry picked from commit b6cd53c91c01db36ea0e99167dc0ce0ae1d3aa23)
2022-05-04 20:04:22 +00:00
Bin Bao
f8a4780eb2 [LT] Move MakeNode into ir_builder.h
Summary: Move MakeNode into ir_builder.h to avoid circular header
reference later when introducing a trie cache for IR node lookup.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76482

Approved by: https://github.com/wconstab
2022-05-03 14:53:19 +00:00
Elias Ellison
e5a55af305 Reland reland
Reland of https://github.com/pytorch/pytorch/pull/76397 and https://github.com/pytorch/pytorch/pull/76493

This time I'll get it right 😢
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76539
Approved by: https://github.com/davidberard98, https://github.com/osalpekar
2022-04-28 20:41:55 +00:00
PyTorch MergeBot
a5bc02aeb2 Revert "[JIT] Register decomp reland"
This reverts commit 81b9cb741c.

Reverted https://github.com/pytorch/pytorch/pull/76397 on behalf of https://github.com/osalpekar
2022-04-28 03:33:29 +00:00
Antonio Kim
f3f327e103 Decouple LTC from TS Backend using Lazy IR Builder
Next stage of breaking up https://github.com/pytorch/pytorch/pull/74710

IR builder class introduced to decouple the explicit usage of `TsNode` in core lazy tensors.

Requires https://github.com/pytorch/pytorch/pull/75324 to be merged in first.

**Background**
- there are ~5 special ops used in lazy core but defined as `: public {Backend}Node` (DeviceData, Expand, Scalar, ...)
- we currently require all nodes derive from {Backend}Node, so that backends can make this assumption safely
- it is hard to have shared 'IR classes' in core/ because they depend on 'Node'

**Motivation**

1. avoid copy-paste of "special" node classes for each backend
2. in general decouple and remove all dependencies that LTC has on the TS backend

**Summary of changes**
- new 'IRBuilder' interface that knows how to make 5 special ops
- move 'special' node classes to `ts_backend/`
- implement TSIRBuilder that makes the special TS Nodes
- new backend interface API to get the IRBuilder
- update core code to call the builder

CC: @wconstab @JackCaoG @henrytwo

Partially Fixes #74628

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75433
Approved by: https://github.com/wconstab
2022-04-28 02:07:02 +00:00
Jiewen Tan
a28b132bc2 Revert D35860266: [pytorch][PR] Update torch::lazy::BackendDevice to have a new default ordinal
Test Plan: revert-hammer

Differential Revision:
D35860266 (f9d07ae644)

Original commit changeset: 554ebe16a068

Original Phabricator Diff: D35860266 (f9d07ae644)

fbshipit-source-id: 325c54aa2e87e51134115213352b3d33a81b7edf
(cherry picked from commit bbd74bf34a534d1b87aadff9790038e3dbbfa9c8)
2022-04-27 18:11:24 +00:00
Elias Ellison
81b9cb741c [JIT] Register decomp reland
Reland of https://github.com/pytorch/pytorch/pull/76252
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76397
Approved by: https://github.com/davidberard98
2022-04-26 23:17:18 +00:00
PyTorch MergeBot
2d72cb3373 Revert "[JIT] Allow registering Decompositions"
This reverts commit d9f0774f98.

Reverted https://github.com/pytorch/pytorch/pull/76252 on behalf of https://github.com/zengk95
2022-04-26 04:47:05 +00:00
Elias Ellison
d9f0774f98 [JIT] Allow registering Decompositions
- Allow registering custom decompositions
- Add easier API for invoking decompositions
- Shorten API names (no users yet)

I am doing these as one pr because they are fairly short/simple and because github first does not support ghstack yet.

cc @Chillee @zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76252
Approved by: https://github.com/davidberard98
2022-04-26 03:00:35 +00:00
Nikolay Korovaiko
bb60cac25a E2E SymInt example narrow_copy
This **roughly** corresponds to Goal 3.2 in https://docs.google.com/document/d/1iiLNwR5ohAsw_ymfnOpDsyF6L9RTUaHMpD8YLw-jxEw/edit#

Namely:

It adds the following:

* SymbolicIntNode interface
* LazySymbolicIntNode implementation
* Lazy `narrow_copy` implementation
* Support for SymInt in codegen
* Test (below)

```cpp
TEST(LazyDynamicOpsTest, NarrowCopy) {
  auto x = torch::rand({5, 10, 10}).to(kLazy);
  const size_t Y_DIM = 3;
  const size_t X_DIM_INDEX = 2;
  auto y = torch::rand({Y_DIM}).to(kLazy);
  auto ly = torch::lazy::TryGetLtcTensor(y);
  auto dim_node = MakeNode<SizeNode>(ly->GetIrValue(), 0);
  auto lmn = new torch::lazy::SymbolicIntNode(dim_node);
  auto z = x.narrow_copy(X_DIM_INDEX, 0, lmn->toSymInt());
  AllClose(z.cpu(), x.cpu().narrow_copy(X_DIM_INDEX, 0, Y_DIM));
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75759
Approved by: https://github.com/wconstab
2022-04-26 02:40:27 +00:00
Wonjoo Lee
f9d07ae644 Update torch::lazy::BackendDevice to have a new default ordinal (#76264)
Summary:
Fixes https://github.com/pytorch/xla/issues/3490. Updates `torch::lazy::BackendDevice` with changes below:

1. Remove the no-op string constructor.
2. Update default ordinal to `-1`.
3. Add an `is_valid` function to check if `ordinal` is valid/non-default (`ordinal >= 0`).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76264

Reviewed By: mrshenli

Differential Revision: D35860266

Pulled By: alanwaketan

fbshipit-source-id: 554ebe16a0683d37b00270c4f35163bf690bfe28
(cherry picked from commit b941d10e8545dfecfb34e4d5c24a29a1cc49bc4b)
2022-04-25 23:57:18 +00:00
zengk95
1d55518198 Revert "[nnc] Strides to Tensor (#72962)"
This reverts commit 939060925f.

Fixes https://github.com/pytorch/vision/issues/5873

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76332
Approved by: https://github.com/seemethere
2022-04-25 19:50:00 +00:00
Ivan Kobzarev
939060925f [nnc] Strides to Tensor (#72962)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72962

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM, cpuhrsch

Differential Revision: D34589306

Pulled By: IvanKobzarev

fbshipit-source-id: ecee5249760ecc0c8b2edb1842b90218899bc944
(cherry picked from commit 9e310c4c67389da30da89126d838ffe3864aba6f)
2022-04-23 19:35:15 +00:00
Prem
7557407653 Added directory check before saving in C++ API
Fixes #75177

Couldn't find any utility method to get a directory name in the pytorch repo, hence creating a function for that.
Let me know if a new function is not needed.

I also referred to [this](https://github.com/pytorch/pytorch/blob/master/c10/test/util/tempfile_test.cpp#L15) for the directory check.

Also, I am using TORCH_CHECK to show the error. This is highly verbose, with the entire stack visible. Is there any alternative so that it is easier to read? This could happen frequently, so a small and concise error would be more helpful here.
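A hedged illustration of the failure mode being guarded (path hypothetical):
```
#include <torch/torch.h>

std::vector<torch::Tensor> v{torch::ones({2, 2})};
// If "/no/such/dir" does not exist, this now fails up front with a TORCH_CHECK
// error naming the missing directory, instead of erroring deep inside the serializer.
torch::save(v, "/no/such/dir/tensor.pt");
```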
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75681
Approved by: https://github.com/albanD
2022-04-22 20:04:41 +00:00
Wang, Eikan
ef0873327e [NNC] Add utility functions to check channels-last contiguous (#75938)
Summary:
The `Buf` uses `std::vector<ExprHandle>` to represent its strides. The `ExprHandle` could be an immediate value or a mathematical expression with variables involved both for the static shape and dynamic shape. So it is hard to directly deduce the channels-last contiguous layout based on the numerical calculation. Hence, the utility functions of this PR are based on the pattern match to check whether the `Buf` is channels-last contiguous.
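For intuition, a numeric-only sketch of the property being matched (the PR's utilities work on symbolic `ExprHandle` strides via pattern matching, not numeric evaluation):
```
#include <cstdint>
#include <vector>

// Channels-last contiguity for logical NCHW sizes {N, C, H, W} means
// strides {C*H*W, 1, C*W, C}.
bool isChannelsLastContiguous(const std::vector<int64_t>& sizes,
                              const std::vector<int64_t>& strides) {
  if (sizes.size() != 4 || strides.size() != 4) return false;
  const int64_t c = sizes[1], h = sizes[2], w = sizes[3];
  return strides[1] == 1 && strides[3] == c &&
         strides[2] == c * w && strides[0] == c * w * h;
}
```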

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75938

Reviewed By: cpuhrsch

Differential Revision: D35724091

Pulled By: ZolotukhinM

fbshipit-source-id: f79ae21749d0aad8601f0434b52df88602ff09bf
(cherry picked from commit 3712bbbe4bea57c5c1abe1eafde4b8778e13e0c4)
2022-04-22 06:42:39 -07:00
Antonio Kim
2c2c13d21b Decouple Lazy Node Shape Cache (#75324)
Summary:
Next stage of breaking up https://github.com/pytorch/pytorch/pull/74710

Move shape cache implementation to the backend interface. Also, clean up some of the hashing logic in the base node class.

CC: wconstab JackCaoG henrytwo

Partially Fixes https://github.com/pytorch/pytorch/issues/74628

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75324

Reviewed By: anjali411

Differential Revision: D35730823

Pulled By: wconstab

fbshipit-source-id: cf6fa326319b9324e5f422a78817b6fb5bf7e9b8
(cherry picked from commit faec5043df56639e2fd23de2d91ae796e4f3df70)
2022-04-21 17:27:05 -07:00
Nikolay Korovaiko
69e048b090 List of SymInt rebase on master
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75115
Approved by: https://github.com/ezyang
2022-04-20 02:09:55 +00:00
Elias Ellison
f65eb09d6b [JIT] Move Shape Function definition to python
Moves jit shape function registration to Python. Like jit decompositions, a script must be run after adding new definitions, which serializes them in a C++ file.

This was a request so that torch-mlir could define functions in Python and upstream their shape functions. cc @silvasean @makslevental
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75546
Approved by: https://github.com/davidberard98
2022-04-19 20:59:44 +00:00
Taylor Robie
a5e338a826 [RecordFunction] More efficient machinery to determine which callbacks to run. (#75807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75807

There is a tension in RecordFunction between two use cases:
1) In the normal eager path we don't run any callbacks, so we need to bail out of the profiling path as soon as possible to minimize eager overhead.
2) When profiling we want to determine which callbacks to run as efficiently as possible to minimize instrumentation overhead.

The confounding factor in all of this is sampling callbacks because they change which callbacks will run on each call, even in steady state operation. This has traditionally been handled with a two stage procedure: first we flip a coin to determine if a sampled callback *might* run. If false (which it usually is), do nothing. This solves (1). If true, check to see if we need to build the full callback set or if it was a false positive. This procedure has two negative effects:
* It forces us to rebuild the set of callbacks to run on every step when profiling
* It leaks the sampling abstraction, requiring other parts of the code to bump certain values and forces RecordFunction to lazily initialize.

This change introduces a multi-level cache which can (in the common case) quickly determine which callbacks *will* run, rather than if callbacks *might* run. This means that rather than call `shouldRunRecordFunction`, we can simply get the callbacks for an invocation and check if they are empty. (And completely removes the pre-sampling heuristic.) Another major benefit of the new cache structure is that it allows thread-safe registration and unregistration of global callbacks.

It's worth briefly discussing how this maintains eager performance. In the standard eager case (only sampling callbacks registered) the cache first checks that the global callbacks haven't changed (atomic read), decrements a counter to see if a sampling callback fired, and then returns the active callbacks which is simply a SmallVector of pointer pairs and a couple POD values (scope, needs inputs/outputs/ids). The biggest cost according to perf is the SmallVector logic; we could consider adopting a hard limit on active callbacks; more than half a dozen callbacks *running* in a single step would be quite a lot. But the total cost relative to `PYTORCH_DISABLE_PER_OP_PROFILING` is only ~10ns, so debatable if it's worth it to switch to `std::array`.

The primary change is in `record_function.cpp`, which has a more detailed description of the new cache structure. `record_function.h` has some minor changes to align with the new calling convention and the remaining files are simply changes to the call sites.
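A hedged sketch of the resulting call-site shape (simplified; names follow the convention described above, not the exact code):
```
// Fetch the resolved callback set for this scope; in steady-state eager mode
// this is cheap and usually empty, so profiling is skipped entirely.
auto step_callbacks = at::getStepCallbacks(at::RecordScope::FUNCTION);
if (!step_callbacks.empty()) {
  at::RecordFunction guard(std::move(step_callbacks));
  guard.before("aten::add");
  // ... run the op under the guard ...
}
```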

Future work:
  * RecordFunction no longer needs to be lazily initialized.
  * We can deprecate the disable/reenable APIs, since we can now safely add and remove global callbacks.

Test Plan:
I tested eager mode performance using the overhead benchmark and found that the non-profiled path was unaffected. However the no-op observer dropped from 0.41us to 0.37us (0.25us if no observers are active) which is about 1/3rd reduction in the cost of the callback selection machinery.

I also added several C++ unit tests, as the core RecordFunction machinery (especially sampling) was largely untested.

Reviewed By: swolchok, davidberard98

Differential Revision: D35276158

fbshipit-source-id: 35135f444724fba4eb97c0ae7f3f710f0f9016fd
(cherry picked from commit 9e359b87422c18f2a195185f32e7e85c82f956fd)
2022-04-19 20:46:16 +00:00
Han Qi
b34b192d6b Reland "Make debug_pkl smaller by only emitting unique traces." (#73368)
Summary:
## Original commit message:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73368

The debug_pkl file inside pytorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source, which is a stack trace, a filename, and start/end numbers. Those are emitted in the debug_pkl file as strings.
Since many SourceRanges share the same Source, the string for a trace can be deduped.
The newer format saves a set of unique traces in a tuple, then each SourceRange saves the offset of its trace w.r.t. its position in that tuple (i.e. manually applying dictionary compression).
The above helps with smaller file size. On loading, if we copied each trace to Source as a string, the runtime memory would still blow up.
To mitigate this, we use SourceView directly instead of Source, which takes a reference to the string inside the Deserializer and makes it into a string_view. This is safe because the Deserializer is held by the Unpickler via shared_ptr, and the Unpickler is also held via shared_ptr by another Source object. That Source object will be alive during model construction.

Test Plan:
## Original Test plan
unit test

Took original file (312271638_930.predictor.disagg.local); loaded with `torch.jit.load` save again with `torch.jit.save`. Unzip both, look at contents:
```
[qihan@devvm5585.vll0 ~]$ du archive -h
4.0K    archive/xl_model_weights
3.7M    archive/extra
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform
8.0K    archive/code/__torch__/caffe2/torch/fb
8.0K    archive/code/__torch__/caffe2/torch
8.0K    archive/code/__torch__/caffe2
20M     archive/code/__torch__/torch/fx/graph_module
20M     archive/code/__torch__/torch/fx
8.0K    archive/code/__torch__/torch/classes
20M     archive/code/__torch__/torch
20M     archive/code/__torch__
20M     archive/code
2.7M    archive/constants
35M     archive
[qihan@devvm5585.vll0 ~]$ du resaved -h
4.0K    resaved/extra
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform
8.0K    resaved/code/__torch__/caffe2/torch/fb
8.0K    resaved/code/__torch__/caffe2/torch
8.0K    resaved/code/__torch__/caffe2
1.3M    resaved/code/__torch__/torch/fx/graph_module
1.3M    resaved/code/__torch__/torch/fx
8.0K    resaved/code/__torch__/torch/classes
1.4M    resaved/code/__torch__/torch
1.4M    resaved/code/__torch__
1.4M    resaved/code
2.7M    resaved/constants
13M     resaved
[qihan@devvm5585.vll0 ~]$
```
## Additional test:
`buck test mode/dev-tsan //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --exact 'caffe2/benchmarks/static_runtime:static_runtime_cpptest - StaticRuntime.to'` passes

 test jest.fbios.startup_cold_start.local.simulator f333356873 -

Differential Revision: D35196883

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74869
Approved by: https://github.com/gmagogsfm
2022-04-18 22:34:21 +00:00
Han Qi
7d5c07830d Add upgrader related logic to flatbuffer (#71451)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71451

title

Test Plan: unittest

Reviewed By: tugsbayasgalan

Differential Revision: D33593056

fbshipit-source-id: c48d6ad50e6e2f757b68525dfe07693711b95840
(cherry picked from commit 8e09e20c1dafcdbdb45c2d1574da68a32e54a3a5)
2022-04-17 18:51:23 +00:00
Nikita Shulga
fe8eff3711 Revert "Add upgrader related logic to flatbuffer"
This reverts commit dfae96171a.
2022-04-17 11:38:59 -07:00
Han Qi
dfae96171a Add upgrader related logic to flatbuffer
Summary: title

Test Plan: unittest

Differential Revision: D33593056

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71451
Approved by: https://github.com/tugsbayasgalan
2022-04-16 02:04:48 +00:00
Raghavan Raman
c2d5f6a5a4 [nnc] Update bounds overlap analysis to identify non-overlaps even with symbolic bounds
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74658

Approved by: https://github.com/ZolotukhinM
2022-04-14 20:24:03 +00:00
Raghavan Raman
d8ad1a579f [nnc] Fuse loops that have variable bounds
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74346

Approved by: https://github.com/ZolotukhinM
2022-04-14 20:24:03 +00:00
Jiewen Tan
ab0d9b18e9 [LT] Support Tensor.is_alias_of
Summary:
Tensor.is_alias_of relies on Storage to work. However, LTCTensorImpl was
not implemented with that in mind. This commit adds a fake storage to LazyTensor
as a marker to mark LazyTensors that point to the same storage. The reason
why it's not done in LTCTensorImpl is that LazyTensor maintains the view-op/alias
logic in the LazyTensor class instead of relying on TensorImpl to do the check.
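A hedged usage sketch (assumes an initialized lazy backend, per the test plan's LazyOpsTest.IsAliasOf):
```
auto t = torch::rand({2, 3}).to(torch::kLazy);
auto v = t.view({6});             // a lazy view/alias of t
bool aliased = t.is_alias_of(v);  // now true: both map to the same fake storage
```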

Test Plan:
./build/bin/test_lazy --gtest_filter=LazyOpsTest.IsAliasOf

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75246

Approved by: https://github.com/bdhirsh
2022-04-14 07:28:03 +00:00
Nikolay Korovaiko
ce842f43f2 Relanding shape cache (75400) (#75710)
Summary:
https://github.com/pytorch/pytorch/pull/75400

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75710

Reviewed By: malfet

Differential Revision: D35598920

Pulled By: Krovatkin

fbshipit-source-id: 2bbbb3d0c24214b5dbb4ca605e7daa94671f96b0
(cherry picked from commit 572f2f9df5bfd73cd7b83536f619bc86d820ccd8)
2022-04-13 17:17:30 +00:00
PyTorch MergeBot
db1801099b Revert "Relanding shape cache (75400)"
This reverts commit 89486821ed.

Reverted https://github.com/pytorch/pytorch/pull/75710 on behalf of https://github.com/malfet
2022-04-13 17:14:38 +00:00
Nikolay Korovaiko
89486821ed Relanding shape cache (75400)
https://github.com/pytorch/pytorch/pull/75400
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75710
Approved by: https://github.com/malfet
2022-04-13 07:28:32 +00:00
PyTorch MergeBot
c274f66268 Revert "Adding Caching of calculated Symbolic Shapes"
This reverts commit 9a7bfaa929.

Reverted https://github.com/pytorch/pytorch/pull/75400 on behalf of https://github.com/mehtanirav
2022-04-12 21:53:31 +00:00
John Clow
9a7bfaa929 Adding Caching of calculated Symbolic Shapes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75400

Approved by: https://github.com/eellison
2022-04-12 11:19:58 +00:00
Pavithran Ramachandran
6402e62454 Refactor flatbuffer jit code
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75239

Refactor flatbuffer_serializer to move JIT-related code to a separate file.

Differential Revision: [D35301020](https://our.internmc.facebook.com/intern/diff/D35301020/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35301020/)!

Approved by: https://github.com/iseeyuan
2022-04-11 23:41:48 +00:00
John Clow
f281d83d77 Moving Remove Tensor Type Specializations to after custom passes
This is to allow Intel folks to use type information in their custom passes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71748

Approved by: https://github.com/eellison
2022-04-11 22:12:01 +00:00
Yulv-git
ac2d2e3a3d Fix some typos.
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75561
Approved by: https://github.com/albanD
2022-04-11 21:55:59 +00:00
Nikita Shulga
80ea6955af Add cuda-11.3+clang9 build workflow (take 2)
To be able to detect unused captures in GPU code lambdas (as gcc does not support this diagnostic)

Remove unused opts lambda capture in `ProcessGroupMPI.cpp` and `Distributions.cu`

Fix sign-compare in nvfuser benchmark and ignore signed unsigned comparison in nvfuser tests
Fixes https://github.com/pytorch/pytorch/issues/75475 by aliasing CMAKE_CUDA_HOST_COMPILER to C_COMPILER when clang is used
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75293
Approved by: https://github.com/atalman, https://github.com/seemethere
2022-04-11 17:13:01 +00:00
PyTorch MergeBot
8fe43d76d5 Revert "Add cuda-11.3+clang9 build workflow"
This reverts commit 709fcc862e.

Reverted https://github.com/pytorch/pytorch/pull/75293 on behalf of https://github.com/janeyx99
2022-04-11 15:24:59 +00:00
Nikita Shulga
709fcc862e Add cuda-11.3+clang9 build workflow
To be able to detect unused captures in GPU code lambdas (as gcc does not support this diagnostic)

Remove unused opts lambda capture in `ProcessGroupMPI.cpp` and `Distributions.cu`

Fix sign-compare in nvfuser benchmark and ignore signed unsigned comparison in nvfuser tests
Fixes https://github.com/pytorch/pytorch/issues/75475 by aliasing CMAKE_CUDA_HOST_COMPILER to C_COMPILER when clang is used
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75293
Approved by: https://github.com/atalman, https://github.com/seemethere
2022-04-11 14:10:57 +00:00
Jiewen Tan
dc37090ec5 [LT] Support diagonal op (#75230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75230

Op diagonal is a view op which we can't code-gen yet. Therefore, we support
it with hand-written IR construction and lowering.

Test Plan: ./build/bin/test_lazy --gtest_filter=LazyOpsTest.TestDiagonal*

Reviewed By: wconstab

Differential Revision: D35378316

Pulled By: alanwaketan

fbshipit-source-id: 7958d00107aef20ac37aabcf2868346240977530
(cherry picked from commit 84155528fce484627c9688cfd92fd4aeb68219e5)
2022-04-08 19:49:42 +00:00
Nikolay Korovaiko
4a85145bbd Ansley's rebase of DimensionNode onto master (#75352)
Summary:
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75352

Reviewed By: wconstab

Differential Revision: D35455859

Pulled By: Krovatkin

fbshipit-source-id: e24c81d63dc66d03b752cc8de5cb551d84b003ac
(cherry picked from commit 4ad371cb4cc88860ce8ec398d82083f6759e3fcf)
2022-04-08 17:22:56 +00:00
John Clow
f1db3e465a Adding integration of SSA into LazyTensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75050

Approved by: https://github.com/Krovatkin
2022-04-07 19:49:41 +00:00
Pavithran Ramachandran
3001bda304 [PyTorchEdge] Backport from v9 flatbuffer to v8 pickle (#75201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75201

In this diff:
1. Bump supported version to 9, which will serve as a placeholder for upcoming version bump to v9 for flatbuffer format migration.
2. Implements backport from v9 flatbuffer file to v8 pickle file.
ghstack-source-id: 153225189

(Note: this ignores all push blocking failures!)

Test Plan:
fb:
```
cd ~/fbsource/fbcode/ && buck test  -c fbcode.caffe2_enable_flatbuffer=1 caffe2/test/cpp/jit:jit -- LiteInterpreterTest.BackPortByteCodeModelAllVersions
Parsing buck files: finished in 0.7 sec
Downloaded 0/25 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 20.7 sec (100%) 21783/21783 jobs, 5/21783 updated

cd ~/fbsource/fbcode/ && buck test caffe2/test/cpp/jit:jit -- FlatbufferTest.FlatbufferBackPortTest
Parsing buck files: finished in 0.7 sec
Building: finished in 4.5 sec (100%) 12972/53298 jobs, 0/53298 updated
  Total time: 5.3 sec
More details at https://www.internalfb.com/intern/buck/build/b658d597-d358-4293-97cb-28e7612b96e8
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 35d5542d-6ee3-4c28-be10-1d822c7a6fef
Trace available for this run at /tmp/tpx-20220308-090347.891303-35d5542d-6ee3-4c28-be10-1d822c7a6fef/trace.log
RemoteExecution session id: reSessionID-35d5542d-6ee3-4c28-be10-1d822c7a6fef-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8444249379196000
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 490 tests discovered (22.838)
    ✓ Pass: caffe2/test/cpp/jit:jit - FlatbufferTest.FlatbufferBackPortTest (0.289)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/8444249379196000
```

Reviewed By: iseeyuan

Differential Revision: D34702597

fbshipit-source-id: 5c203c29d13360d7934ce6e57557739e7038c05e
(cherry picked from commit 6189e08a2bd968fdab636f77cb6bd73d6c36beb2)
2022-04-07 19:43:57 +00:00
Wang, Eikan
252e1ccce6 Enable TE fuser to support user defined operator (#73073)
Summary:
PyTorch supports registering a custom operator via `TORCH_LIBRARY_FRAGMENT` / `TORCH_LIBRARY_IMPL`, and `torch::jit::tensorexpr::getNNCLoweringRegistry` allows inserting a lowering for a custom operator. But the TE fuser pass's conditional check does not support custom operators: the `isSupported` check of `tensorexpr_fuser` tests whether the `Node` is in `get_tensorexpr_elementwise_set()`, `supported_non_eltwise_set()`, `supported_misc_set` or `supported_reduction_set`, so a custom operator that needs to be added to a TE fusion group is blocked by the check.

Taking RN50 as an example, we can speed up the model by fusing the convolution and consecutive element-wise operators into a custom operator. The framework overhead becomes non-negligible when the computation becomes more efficient, especially for latency mode and tiny models. If the TE fuser allows adding the custom operator to the fusion group, then the entire RN50 model can be fused by TE as a single operator/function consisting of "ExternalCalls" and TE-IR. This could significantly reduce framework overhead, which in turn improves RN50 E2E performance. The same goes for other models.
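A hedged sketch of the registration surface named in the summary (operator name and schema hypothetical; the exact lowering-registry insert signature may differ):
```
#include <torch/library.h>

// Register a custom fused op so the JIT sees it as a first-class operator:
TORCH_LIBRARY_FRAGMENT(myops, m) {
  m.def("conv2d_relu(Tensor input, Tensor weight) -> Tensor");
}

// Then register an NNC lowering for it, roughly:
//   torch::jit::tensorexpr::getNNCLoweringRegistry().insert(<schema>, <lowering>);
// so that (with this PR's check changes) the TE fuser can pull it into a fusion group.
```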

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73073

Reviewed By: pbelevich

Differential Revision: D35453165

Pulled By: ZolotukhinM

fbshipit-source-id: a764cf340b0b1e05fe230649cbe44f5786bdd37d
(cherry picked from commit ee95aa4d36714540fbb216a338799e6a6bb966d5)
2022-04-07 04:36:39 +00:00
Martin Yuan
00c1e01ad0 Remove internal logic to handle bytecode version 3 (#57775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57775

The minimum supported bytecode version is updated from 3 to 4. We no longer support version 3 bytecode models.

Why?
* There is hacky code in operator loading that behaves differently for one operator when the global bytecode version is 3. Instead, operator-related metadata should be passed (for example, in #56845). To allow future development, we remove the hacky path first.
* The bytecode version was bumped from 3 to 4 more than half a year ago. Since all the production models have been bumped to version 4, it's not practical to keep and maintain version 3. The risk of deprecating version 3 is low.

Test Plan: Imported from OSS

Reviewed By: raziel

Differential Revision: D28270791

Pulled By: cccclai

fbshipit-source-id: 70b1bd6352fdaae5f8d2173b81578d77018c8e44
(cherry picked from commit 3e930fa381cd01f3705116795c6426df992372fc)
2022-04-07 01:45:52 +00:00
Pavithran Ramachandran
f984e50f39 Extend jit::load to work on flatbuffer file; Take 2 (#75256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75256

ghstack-source-id: 153138970

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D35399581

fbshipit-source-id: dafe9d301009d3f70986ed92bfe06d160ab90ba0
(cherry picked from commit ccc860fd07946de5aae12bc179a0b8bbba83b997)
2022-04-06 17:54:01 +00:00
John Clow
26dcec152c Added support for SSA for ops not in a JIT graph
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74340

Approved by: https://github.com/eellison
2022-04-06 01:45:37 +00:00
Antonio Kim
e1b4117e30 Move shape and operand definitions to base node (#75223)
Summary:
First stage of breaking up https://github.com/pytorch/pytorch/pull/74710

Moves the shape and operand definitions from `TsNode` to the base `Node`

CC: wconstab JackCaoG henrytwo

Partially Fixes https://github.com/pytorch/pytorch/issues/74628

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75223

Reviewed By: zou3519

Differential Revision: D35410285

Pulled By: wconstab

fbshipit-source-id: bb84d3fb636882cbe7e18af4b35ff2c0e22aaa58
(cherry picked from commit a4144c9a48379d8a9007cff845796608b597cce1)
2022-04-06 01:43:46 +00:00
Lu Fang
32e58c73c4 Back out "Extend jit::load to work on flatbuffer file" (#75244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75244

Original commit changeset: d653a5af662a

Original Phabricator Diff: D35060736 (d9d34922a0)

Test Plan: Model loading test, verified that D35060736 (d9d34922a0) will cause the torch::save => torch::load failure.

Reviewed By: yinghai, jianyuh

Differential Revision: D35387009

fbshipit-source-id: 9d176992d402d57779e2af3d905b3c1538335298
(cherry picked from commit 6c8cc0d3b8a88b15e35702d70e18bbae8aa4628a)
2022-04-05 09:55:04 +00:00
Nikita Shulga
81d765ef1f Fix sign-compare violations in cpp tests
Prerequisite change for enabling `-Werror=sign-compare` across PyTorch repo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75080

Approved by: https://github.com/atalman
2022-04-04 23:05:31 +00:00
Chen Lai
6efc5c1acf Rewrite upgrader bytecode version from 3 to 4 (content unchanged) (#75120)
Summary:
Update the upgrader models by hacking the backport logic: copy everything in the model and only rewrite the bytecode version to 4 (in D35265596).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75120

ghstack-source-id: 152823046

Test Plan: CI

Reviewed By: qihqi

Differential Revision: D35321154

fbshipit-source-id: 333158bd0fd9b4819b3b7cf47d80c285934adf3e
(cherry picked from commit 74bb2da73a4d18f448b8486772643eac89eb759a)
2022-04-02 01:51:39 +00:00
Pavithran Ramachandran
d9d34922a0 Extend jit::load to work on flatbuffer file (#75022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75022

Extending torch::jit::load to read flatbuffer file
ghstack-source-id: 152820697
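A usage sketch (file name hypothetical; the format is detected from the file contents):
```
#include <torch/script.h>

// After this change, torch::jit::load() accepts either a pickle-based .pt
// archive or a flatbuffer-serialized module:
torch::jit::Module m = torch::jit::load("model.ff");
```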

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D35060736

fbshipit-source-id: d653a5af662a46107ff4fd70209fd2a0a4d40f20
(cherry picked from commit 109e14a54bd279011c8f9066e6c29e8e0b1fc4db)
2022-04-02 01:33:34 +00:00
Pavithran Ramachandran
7aaa75af05 Extending _get_bytecode_version to support flatbuffers format (#75021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75021

Extending `_get_bytecode_version` to support flatbuffers.
ghstack-source-id: 152771695

(Note: this ignores all push blocking failures!)

Test Plan:
```
~/fbsource/xplat] cd ~/fbsource/xplat/ && buck test //xplat/caffe2:test_lite_interpreter
Building: finished in 0.8 sec (100%) 327/327 jobs, 0/327 updated
  Total time: 0.9 sec
Testing: finished in 06:59.5 min (85 PASS/0 FAIL)
BUILD SUCCEEDED
RESULTS FOR //xplat/caffe2:test_lite_interpreter
PASS    412.3s 85 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_interpreter
TESTS PASSED
```

Reviewed By: iseeyuan

Differential Revision: D34900498

fbshipit-source-id: 65743076d43a933c5381ec128d0268f22c0a8441
(cherry picked from commit 457c76c7d1df6050b941c56a8198162e2e4a3388)
2022-04-01 15:05:37 +00:00
Will Constable
b9e535a64a Add non-eager registration to dispatch autogen (#74557)
Summary:
Previously, the torchscript backend would be (partially) initialized at startup.
- the dispatcher registrations would be registered,
- but other backend components would not be initialized until explicitly calling
  the backend init function

With this change, the torchscript backend is not initialized until its explicit
initialization function is called.

This enables external backends to register their own backend instead of the torchscript
backend to the same (Lazy) key.

Lands a change contributed by antoniojkim via lazy_tensor_staging branch (https://github.com/pytorch/pytorch/issues/73973)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74557

Reviewed By: bdhirsh

Differential Revision: D35051464

Pulled By: wconstab

fbshipit-source-id: 5a8b0851293e394f49427d1416ee571a8881fe9f
(cherry picked from commit ef745a4a2c8d1d7f9510541a20f1f40625ce29de)
2022-04-01 03:42:53 +00:00
Will Constable
14affba799 Fix ir_metadata Python frames func and remove dead code (#74979)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74979

Reviewed By: alanwaketan

Differential Revision: D35261641

Pulled By: wconstab

fbshipit-source-id: e82b5f17d0043c4a3de72c16fb42fd02a85414fe
(cherry picked from commit fc6c0a1654256871361a5ad08926bc39d74cd0c5)
2022-03-31 23:23:36 +00:00
Nikolay Korovaiko
5177f95d21 Introducing SymInt to Pytorch (for tracing size arithmetic) (master rebase) (#74861)
Summary:
This PR introduces `SymInt` type to Pytorch which will be used by LTC and AOTAutograd for tracing size arithmetic and tests.
`SymInt` is a C++ union structure [int64_t, SymbolicIntNode*] that wraps an int64_t field, where the value of the field can be either an index into a list of `shared_ptr<SymbolicIntNode>` or a real int.
This PR doesn't add any support for actually tracing symbolic ints, i.e. data_ for now can only contain real ints.
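A rough sketch of the described layout (simplified; not the exact class):
```
#include <cstdint>

class SymInt {
 public:
  explicit SymInt(int64_t d) : data_(d) {}
  int64_t data() const { return data_; }
  // With tracing support, data_ could instead encode an index into a table
  // of shared_ptr<SymbolicIntNode>; for now it always holds a real int.
 private:
  int64_t data_;
};
```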

```
Goal 1: just to show we can add a type to PyTorch core. (wraps int) LANDEABLE
Finalize the naming - symint
Want the name to be short
Does invoke “size” - NO
SInt/SymInt/SymbolicInt
SInt could mean signed int
sym_int or symint or SymInt (originally it was “int”; capitalized implies object semantics, whereas lowercase implies value semantics)
JIT schema - symint
C++ - symint
```

See more details here: https://docs.google.com/document/d/1iiLNwR5ohAsw_ymfnOpDsyF6L9RTUaHMpD8YLw-jxEw

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74861

Reviewed By: qihqi, ngimel

Differential Revision: D35226230

Pulled By: Krovatkin

fbshipit-source-id: 34acf342bd50fcaa4d8d5dd49c2fd6a98823a5b3
(cherry picked from commit 218643f63ef181cabb92d13a6e837eb64f2dda3c)
2022-03-31 21:59:59 +00:00
jjsjann123
873ced7cd0 Nvfuser code bump 030122 (#73627)
Summary:
Things changed in this PR that require review:

test/forward_backward_compatibility/check_forward_backward_compatibility.py

Our previous function overload extension names were wrong and have been updated in this PR, hence the compatibility-list update.

nvfuser code updates with bug fixes for failures we encountered in OpInfo tests as well as failures reported by the AOTAutograd team.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73627

Reviewed By: Chillee

Differential Revision: D34765458

Pulled By: davidberard98

fbshipit-source-id: c81f3d6a1b723fb3a8ba419b7f82227f70440ca7
(cherry picked from commit b6a2c362c37051e44fac31687b2fe272f776551e)
2022-03-31 08:18:22 +00:00
Nikita Shulga
43313cbde3 Revert D34647822: [tensorexpr] Add support for aten::stack
Test Plan: revert-hammer

Differential Revision:
D34647822 (954c7e2a77)

Original commit changeset: 3b863c71886c

Original Phabricator Diff: D34647822 (954c7e2a77)

fbshipit-source-id: e9ce06c9c8d7caf0fbb2565f0d99035bad685793
(cherry picked from commit b2ff355e9dbaa4e940fb221254223984c3c8a215)
2022-03-31 04:25:43 +00:00
Nikita Shulga
320e5a8268 Revert D34808051: [tensorexpr] Enabled aten::stack in the fuser pass with static shapes
Test Plan: revert-hammer

Differential Revision:
D34808051

Original commit changeset: 213e2ffdf87f

Original Phabricator Diff: D34808051

fbshipit-source-id: b618daeb346f784e8ab9525040edcb4a30a39613
(cherry picked from commit e47b973cba5c95e9410f8aecdfd5619de6d4be7c)
2022-03-31 04:25:43 +00:00
Hui Guo
90c3699cc8 [tensorexpr] Enabled aten::stack in the fuser pass with static shapes (#74077)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74077

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D34808051

Pulled By: huiguoo

fbshipit-source-id: 213e2ffdf87fb1a74104037cea7ef25e4bfd4307
(cherry picked from commit ad9e84842e5b47eda845827d325b08ba361a8286)
2022-03-31 04:25:43 +00:00
Elias Ellison
2ef5611f31 Add comments for adding shape function and linting (#73570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73570

Approved by: https://github.com/huiguoo

Test Plan: contbuild & OSS CI, see 6d36bbde7e

Reviewed By: pbelevich

Differential Revision: D35192688

Pulled By: atalman

fbshipit-source-id: b12b80e6a6dd1adaa57a8facb6bb077989faa543
(cherry picked from commit e50478c02592597f12b8490ec5496f76c7d8b8cc)
2022-03-31 04:25:43 +00:00
Nikita Shulga
3036a0309d [skip ci]Revert "Add comments for adding shape function and linting"
This is a technical revert of 6d36bbde7e to reconcile it with e50478c02592597f12b8490ec5496f76c7d8b8cc (which is the same + lint changes applied)

Should be skipped during import
2022-03-30 21:21:28 -07:00
Hui Guo
954c7e2a77 [tensorexpr] Add support for aten::stack (#73801)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73801

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D34647822

Pulled By: huiguoo

fbshipit-source-id: 3b863c71886c7c6616b16f5d3313079714c8b82a
(cherry picked from commit c71778cf6a5724d26b671bf3ee0478add24990e8)
2022-03-30 21:25:15 +00:00
Dave Bort
f82b2d4a82 [PyTorchEdge] Make _load_parameters() handle flatbuffer inputs (#74580)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74580

Handle Flatbuffer-serialized parameters.

Make `_load_parameters()` detect the input data format and use the correct deserializer to load the parameters.

Also, rename `BytecodeDeserializer` to `IValueUnpickler` to make it clear that it unpickles an `IValue` and doesn't have anything to do with bytecode.
ghstack-source-id: 152487890

Test Plan:
New unit test shows a successful round trip from _save_parameters() to _load_parameters() using flatbuffers.

```
$ buck test //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
Building: finished in 0.5 sec (100%) 346/346 jobs, 0/346 updated
  Total time: 0.6 sec
Testing: finished in 0.5 sec (26 PASS/0 FAIL)
BUILD SUCCEEDED
RESULTS FOR //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
PASS    <100ms 13 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_trainer
PASS    <100ms 13 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
TESTS PASSED
```

Reviewed By: qihqi

Differential Revision: D34488913

fbshipit-source-id: 8d2c0b895699f3b336115d33bf96d49cbf9245d2
(cherry picked from commit 319345deff260826197f8cdf5ac03071b412c72f)
2022-03-30 20:39:58 +00:00
Dave Bort
1659a267f9 [PyTorchEdge] Export flatbuffers from _save_parameters() (#74579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74579

Now that we can convert a module to a flatbuffer, update `_save_parameters()` to optionally write to that format.

Also, rename the internal `ScriptModuleSerializer` class to `IValuePickler` to make it more clear that a) it's pickle-specific, and b) it serializes IValues, not Modules.
ghstack-source-id: 152487889

Test Plan:
New unit test shows that we can produce Flatbuffer-formatted output.

```
$ buck test //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
Building: finished in 0.5 sec (100%) 346/346 jobs, 0/346 updated
  Total time: 0.6 sec
Testing: finished in 0.5 sec (26 PASS/0 FAIL)
BUILD SUCCEEDED
RESULTS FOR //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
PASS    <100ms 13 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_trainer
PASS    <100ms 13 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_trainer_pickle_and_flatbuffer
TESTS PASSED
```

A new test in later commit D34488913 tests the full round trip.

Reviewed By: qihqi

Differential Revision: D34408538

fbshipit-source-id: eea183c31b5e1b2b75a65f384d8a479223a4ae72
(cherry picked from commit de310a15422b65fb7e443f7005d287d9f5f586bc)
2022-03-30 20:39:58 +00:00
Elias Ellison
6d36bbde7e Add comments for adding shape function and linting
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73570

Approved by: https://github.com/huiguoo
2022-03-29 23:02:22 +00:00
Elias Ellison
9c4a63787b Add api for changing function executor settings, hook up execution with decomposition registry (#74186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74186

Make the execution settings mutable on function_impl so that we can set them for running op decompositions. Add a mapping to function objects, and show an example in a test of executing op decompositions.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D34938125

Pulled By: eellison

fbshipit-source-id: adf108b2f6c1bd166910c6d7b94245661d67ce0d
(cherry picked from commit 9957e33803002d9e71abe4ff802769270b6960d3)
2022-03-29 18:38:52 +00:00
Elias Ellison
0ecf1add1b Introduce function-local settings for executor, expose in c++ (#74012)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74012

This allows setting an executor on a function. The first use case is using decompositions in C++ without additional fusion passes, etc., which might not work with custom tensors like batched tensors/vmap. A subsequent use case might be taking advantage of JIT execution entry points which guard on certain properties before invocation (such as complete shapes in AOT autograd, or rank in lazy tensor).

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D34938124

Pulled By: eellison

fbshipit-source-id: cf7a45416457942b872322cab47d871a8336bdb5
(cherry picked from commit 9c600eb9ad0f2173f003e511268e97584edae36d)
2022-03-29 18:38:52 +00:00
Elias Ellison
6694fdaccd Clean up profiling mode and profiling executor strategy (#73875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73875

Previously we had a few settings:
- getExecutor - which toggled between the Profiling Executor and the Legacy executor
- getGraphOptimize - if true, overrides PE/Legacy to run with the simple executor (no optimizations)
and then...
- getProfilingMode - which would set PE to 0 specializations.

The last mode is redundant with getGraphOptimize; we should just remove it and use getGraphOptimize in these cases. It could lead to potentially invalid combinations of logic - what does it mean if getProfilingMode is true but getExecutor is set to false? This would lead to a bug in specialize_autograd_zero in this case, see: https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/passes/specialize_autogradzero.cpp#L93.

The tests here are failing but get fixed with the PR above it, so I'll squash for landing.

Test Plan: Imported from OSS

Reviewed By: cpuhrsch

Differential Revision: D34938130

Pulled By: eellison

fbshipit-source-id: 1a9c0ae7f6d1cfddc2ed3499a5af611053ae5e1b
(cherry picked from commit cf69ce3d155ba7d334022c42fb2cee54bb088c23)
2022-03-29 18:38:51 +00:00
Kurt Mohler
5375b2e994 Resolve int[]? arguments to new OptionalIntArrayRef class
This PR uses the `OptionalArrayRef` template class that was drafted in #64084.
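
A hedged sketch of how an `int[]?` schema argument can be consumed through this class, assuming `c10::OptionalArrayRef<int64_t>` (the type this PR resolves `int[]?` to):

```cpp
#include <c10/util/OptionalArrayRef.h>
#include <cstdint>

// Computes the number of elements implied by an optional size list;
// returns -1 for the `None` case of `int[]?`.
int64_t numelFromSize(c10::OptionalArrayRef<int64_t> size) {
  if (!size.has_value()) {
    return -1; // caller passed None
  }
  int64_t n = 1;
  for (int64_t d : *size) {
    n *= d;
  }
  return n;
}
```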

Fixes #44409
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70864
Approved by: https://github.com/ezyang
2022-03-26 01:45:50 +00:00
Pavithran Ramachandran
fc2cf3d26f Back out "Revert D34805092: Extend _save_for_mobile and _load_for_mobile to support flatbuffer format; Default format is pickle + Change buck targets to support only pickle and pickle + flatbuffer for migration" (#74594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74594

Extending `_save_for_mobile` and `_load_for_mobile` to support the flatbuffer format, with an additional optional argument which is set to pick pickle by default.

Adding a new binary target with the suffix `_pickle_and_flatbuffer` to help migration.

The size test in D34909502 shows the size has regressed by ~40K, but after removing pickle and comparing lite_predictors we get the ~120K measurement that we will achieve when deprecating pickle and moving to flatbuffer.

**BEFORE:**

```lang=mermaid
graph TD;
    torch_core-->torch_mobile_deserialize;

    torch_mobile_core-->torch_mobile_deserialize;

    jit_module_saving-->torch_core;
    jit_module_saving-->torch_mobile_core;

    torch_mobile_deserialize-->caffe2_serialize;
    torch_mobile_deserialize-->torch_mobile_module;

    caffe2_serialize-->miniz;

    flatbuffer_loader-->mobile_bytecode;
    flatbuffer_serializer-->mobile_bytecode;

    mobile_bytecode-->flatbuffer_2.0;

    flatbuffer_loader-->torch_mobile_module;
    flatbuffer_serializer-->torch_mobile_module;
```

**AFTER:**
```lang=mermaid
graph TD;
    torch_core-->torch_mobile_deserialize;

    torch_mobile_core-->torch_mobile_deserialize;

    jit_module_saving-->torch_core;
    jit_module_saving-->torch_mobile_core;

    torch_mobile_deserialize-->caffe2_serialize;
    torch_mobile_deserialize-->torch_mobile_module;

    caffe2_serialize-->miniz;

    flatbuffer_loader-->mobile_bytecode;
    flatbuffer_serializer-->mobile_bytecode;

    mobile_bytecode-->flatbuffer_2.0;

    torch_mobile_deserialize_pickle_and_flatbuffer-->|new| flatbuffer_loader;
    torch_mobile_deserialize_pickle_and_flatbuffer-->|new| torch_mobile_deserialize;
    torch_mobile_core_pickle_and_flatbuffer-->|new| torch_mobile_deserialize_pickle_and_flatbuffer;
    torch_core_pickle_and_flatbuffer-->|new| torch_mobile_deserialize_pickle_and_flatbuffer;

    jit_module_saving_pickle_and_flatbuffer-->|new| torch_core_pickle_and_flatbuffer;
    jit_module_saving_pickle_and_flatbuffer-->|new| torch_mobile_core_pickle_and_flatbuffer;

    flatbuffer_serializer-->torch_mobile_module;

    jit_module_saving_pickle_and_flatbuffer-->|new|jit_module_saving;
    jit_module_saving_pickle_and_flatbuffer-->|new|flatbuffer_serializer;

    flatbuffer_loader-->torch_mobile_module;
```

Original commit changeset: 780dfb6fd6ba

Original Phabricator Diff: D34805092 (284b2b7135)
ghstack-source-id: 152044801

(Note: this ignores all push blocking failures!)

Test Plan:
CI

```
~/fbsource/fbcode] cd ~/fbsource/fbcode/ && buck test -c fbcode.caffe2_enable_flatbuffer=1 //caffe2/test/cpp/jit:jit  -- FlatbufferTest.ExtraFiles
Parsing buck files: finished in 0.9 sec
Building: finished in 5.3 sec (100%) 12992/54304 jobs, 0/54304 updated
  Total time: 6.2 sec
More details at https://www.internalfb.com/intern/buck/build/2b387fff-f813-4cfa-b53f-eb2378630d4e
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d
Trace available for this run at /tmp/tpx-20220323-134108.766518-f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d/trace.log
RemoteExecution session id: reSessionID-f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4503599723101693
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 486 tests discovered (19.122)
    ✓ Pass: caffe2/test/cpp/jit:jit - FlatbufferTest.ExtraFiles (0.187)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4503599723101693
```

Similar Build Deps Dags

```
[pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops_pickle_and_flatbuffer, //xplat/caffe2:torch_mobile_deserialize_pickle_and_flatbuffer)' --output-format dot-compact  | pastry
P486770901: https://www.internalfb.com/intern/paste/P486770901/

[pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops, //xplat/caffe2:torch_mobile_deserialize)' --output-format dot-compact  | pastry
P486771278: https://www.internalfb.com/intern/paste/P486771278/
```

pickle_and_flatbuffer: https://www.internalfb.com/intern/dgw/graph/?build_id=P486770901
pickle: https://www.internalfb.com/intern/dgw/graph/?build_id=P486771278

Reviewed By: iseeyuan

Differential Revision: D35067157

fbshipit-source-id: 9044259c17a2e0da79bd6aedb28efbdfd57e23e0
(cherry picked from commit f738069ec3a72e79da56172741d027de514e9e5f)
2022-03-24 21:51:05 +00:00
Will Constable
3547f20872 Land remaining parts of Torchscript Lazy Tensor backend (#74111)
Summary:
Also enables the bazel build to run lazy codegen. The bazel (OSS) build feeds off the same filelists as cmake/buck (build_variables.bzl), so enabling it is easier than keeping it disabled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74111

Test Plan: Run CI and verify test_lazy_ops is running via OSS cmake builds

Reviewed By: bdhirsh

Differential Revision: D34772403

fbshipit-source-id: 8a63f58b9536e6ac1be530667932176ef2549496
(cherry picked from commit e807ffb1918853d10b924fdc24f85ee5b1a39021)
2022-03-22 23:14:03 +00:00
Nikita Shulga
c53b3ed20f Revert D34805092: Extend _save_for_mobile and _load_for_mobile to support flatbuffer format; Default format is pickle + Change buck targets to support only pickle and pickle + flatbuffer for migration
Test Plan: revert-hammer

Differential Revision:
D34805092 (284b2b7135)

Original commit changeset: 57f3fc81d68f

Original Phabricator Diff: D34805092 (284b2b7135)

fbshipit-source-id: 780dfb6fd6ba5f9348f24a2fb3c57971b7155541
(cherry picked from commit bebeb8b84e11c34cbde4857d0e1c291731a7c781)
2022-03-22 22:45:50 +00:00
Pavithran Ramachandran
284b2b7135 Extend _save_for_mobile and _load_for_mobile to support flatbuffer format; Default format is pickle + Change buck targets to support only pickle and pickle + flatbuffer for migration (#74209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74209

Extending `_save_for_mobile` and `_load_for_mobile` to support the flatbuffer format, with an additional optional argument which is set to pick pickle by default.

Adding a new binary target with the suffix `_pickle_and_flatbuffer` to help migration.

The size test in D34909502 shows the size has regressed by ~40K, but after removing pickle and comparing lite_predictors we get the ~120K measurement that we will achieve when deprecating pickle and moving to flatbuffer.

**BEFORE:**

```lang=mermaid
graph TD;
    torch_core-->torch_mobile_deserialize;

    torch_mobile_core-->torch_mobile_deserialize;

    jit_module_saving-->torch_core;
    jit_module_saving-->torch_mobile_core;

    torch_mobile_deserialize-->caffe2_serialize;
    torch_mobile_deserialize-->torch_mobile_module;

    caffe2_serialize-->miniz;

    flatbuffer_loader-->mobile_bytecode;
    flatbuffer_serializer-->mobile_bytecode;

    mobile_bytecode-->flatbuffer_2.0;

    flatbuffer_loader-->torch_mobile_module;
    flatbuffer_serializer-->torch_mobile_module;
```

**AFTER:**
```lang=mermaid
graph TD;
    torch_core-->torch_mobile_deserialize;

    torch_mobile_core-->torch_mobile_deserialize;

    jit_module_saving-->torch_core;
    jit_module_saving-->torch_mobile_core;

    torch_mobile_deserialize-->caffe2_serialize;
    torch_mobile_deserialize-->torch_mobile_module;

    caffe2_serialize-->miniz;

    flatbuffer_loader-->mobile_bytecode;
    flatbuffer_serializer-->mobile_bytecode;

    mobile_bytecode-->flatbuffer_2.0;

    torch_mobile_deserialize_pickle_and_flatbuffer-->|new| flatbuffer_loader;
    torch_mobile_deserialize_pickle_and_flatbuffer-->|new| torch_mobile_deserialize;
    torch_mobile_core_pickle_and_flatbuffer-->|new| torch_mobile_deserialize_pickle_and_flatbuffer;
    torch_core_pickle_and_flatbuffer-->|new| torch_mobile_deserialize_pickle_and_flatbuffer;

    jit_module_saving_pickle_and_flatbuffer-->|new| torch_core_pickle_and_flatbuffer;
    jit_module_saving_pickle_and_flatbuffer-->|new| torch_mobile_core_pickle_and_flatbuffer;

    flatbuffer_serializer-->torch_mobile_module;

    jit_module_saving_pickle_and_flatbuffer-->|new|jit_module_saving;
    jit_module_saving_pickle_and_flatbuffer-->|new|flatbuffer_serializer;

    flatbuffer_loader-->torch_mobile_module;
```
ghstack-source-id: 151744258

Test Plan:
Similar Build Deps Dags

```
[pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops_pickle_and_flatbuffer, //xplat/caffe2:torch_mobile_deserialize_pickle_and_flatbuffer)' --output-format dot-compact  | pastry
P486770901: https://www.internalfb.com/intern/paste/P486770901/

[pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops, //xplat/caffe2:torch_mobile_deserialize)' --output-format dot-compact  | pastry
P486771278: https://www.internalfb.com/intern/paste/P486771278/
```

pickle_and_flatbuffer: https://www.internalfb.com/intern/dgw/graph/?build_id=P486770901
pickle: https://www.internalfb.com/intern/dgw/graph/?build_id=P486771278

Reviewed By: iseeyuan

Differential Revision: D34805092

fbshipit-source-id: 57f3fc81d68fce941a050c35bd8e6f05951183b3
(cherry picked from commit 671ae4ed29e65b86ffe507a503548d3e86ab0ea4)
2022-03-22 20:00:53 +00:00
Han Qi
4b4f652f79 [3/5] Put JIT source inside flatbuffer (#74245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74245

title

Test Plan: unittest

Reviewed By: iseeyuan

Differential Revision: D34881612

fbshipit-source-id: 7037982e9267ad72b86e91cd5f2d92426d71dd56
(cherry picked from commit 88f34eb55b2bee6ef8ef27188e075fa2b8767fdf)
2022-03-17 18:46:47 +00:00
Will Constable
d67a265881 Sync lazy_tensor_staging to master (#74311)
Summary:
This merges changes that have already been reviewed/landed onto lazy_tensor_staging branch.  It combines changes from multiple PRs into one diff.

updated from lazy_tensor_staging on 3/16

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74311

Test Plan:
Run CI to ensure compilation on various platforms
Run unit tests on lazy_tensor_staging branch with source version of all these diffs

Reviewed By: desertfire

Differential Revision: D34929235

fbshipit-source-id: babbc3bbeabc5b8107ee9284ed7765887a148622
(cherry picked from commit d91577a6557343ec536f6859e4808ec1a8a9b685)
2022-03-17 16:08:57 +00:00
Will Constable
44a8d4d998 Add lazy tensor unit tests, disabled (#74309)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74309

Since the test file is large, it can be landed on its own and then switched on
in the diff that actually builds lazy tensor code.

Test Plan: verify CI passes

Reviewed By: desertfire

Differential Revision: D34928619

fbshipit-source-id: cd556155326f7fb55b3f29031f80bc36c936d565
(cherry picked from commit 60945adbefb6a8d19f89e330f8b344d076b13bfc)
2022-03-17 15:31:26 +00:00
Will Constable
72b1194464 Run lazy tensor codegen in generate_code.py (#73996)
Summary:
Hooks into existing autograd codegen script (generate_code.py) to take advantage of its integrations into buck/cmake/bazel.

Adds a new option (--gen_lazy_ts_backend) to generate_code.py, calling it from the CMake OSS build and the fbcode build, but not from other internal xplat/ovrsource builds (these could be opted in later).

Bazel support is added in a later diff.

Includes one generated file (torch/csrc/lazy/generated/LazyIr.h) in a unit test (test/cpp/lazy/test_ir.cpp) to partially verify the generator is working, but does not compile the remaining output sources from the generator yet as they depend on other files not yet landed from lazy_tensor_staging branch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73996

Test Plan: OSS/internal CI - verify all builds are working and test_ir.cpp compiles LazyIr.h

Reviewed By: ezyang

Differential Revision: D34408536

fbshipit-source-id: 8af0aea3b95d81eccafc17d64390d70ddd176515
(cherry picked from commit f930612f2bad61c76eb02d85cfbec9f33a1459dc)
2022-03-17 15:31:26 +00:00
Han Qi
ded82ad7c7 Create method to map JIT module to (source, constant) and back. (#74119)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74119

  Implemented a function to generate source as an ExtraFilesMap plus constants.

  Wrote a function to construct a JIT module given an (ivalue, source, constant) triple.

Test Plan: unittest

Reviewed By: pavithranrao

Differential Revision: D34803945

fbshipit-source-id: 2edc798407fe68294cb4c3c7516f5bd143df88c3
(cherry picked from commit 35e54e166b8f0f5cfe8f08c07866b59ae61ee79d)
2022-03-15 18:30:08 +00:00
Taylor Robie
0b1f3bd158 [Profiler] Prefer TSC to wall clock when available (#73855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73855

Calling the clock is one of the most expensive parts of profiling. We can reduce the profiling overhead by using `rdtsc` instead. The tradeoff is that we have to measure and convert (shift and scale).
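
A minimal sketch of the shift-and-scale conversion, assuming x86 and illustrative names (the profiler's real calibration is more careful):

```cpp
#include <chrono>
#include <cstdint>
#include <x86intrin.h> // __rdtsc, x86 only

struct TscConverter {
  uint64_t tsc0 = 0;      // TSC reading at calibration start (the "shift")
  int64_t ns0 = 0;        // wall-clock ns at calibration start
  double ns_per_tick = 0; // linear fit slope (the "scale")

  static int64_t wall_ns() {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
               std::chrono::steady_clock::now().time_since_epoch())
        .count();
  }

  void calibrate() {
    tsc0 = __rdtsc();
    ns0 = wall_ns();
    uint64_t t1 = tsc0;
    int64_t n1 = ns0;
    while (n1 - ns0 < 10'000'000) { // ~10 ms measurement window
      t1 = __rdtsc();
      n1 = wall_ns();
    }
    ns_per_tick = double(n1 - ns0) / double(t1 - tsc0);
  }

  // Convert a raw TSC sample taken during profiling back to wall-clock ns.
  int64_t to_ns(uint64_t tsc) const {
    return ns0 + int64_t(double(tsc - tsc0) * ns_per_tick);
  }
};
```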

Test Plan: I added a cpp unit test with *very* aggressive anti-flake measures. I also ran the overhead benchmark (9 replicates) with `--stressTestKineto` (0.94 -> 0.89 us) and `--stressTestKineto --kinetoProfileMemory` (1.27 -> 1.17 us)

Reviewed By: chaekit

Differential Revision: D34231071

fbshipit-source-id: e3b3dd7580d93bcc783e87c7f2fc726cb74f4df8
(cherry picked from commit e8be9f8160793c6ee35d5af02bca3e01703e377d)
2022-03-13 18:29:06 +00:00
Taylor Robie
5a58820f01 [Profiler] Specialized AppendOnlyQueue (#73409)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73409

We can do better than `vector` or `deque`, and it's sufficiently important to the hot path to justify a custom container. (This is part of the larger queue refactor, but this is a standalone drop-in replacement so we don't need to wait.)
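
A minimal sketch of the idea behind such a container, with illustrative names: fixed-size blocks chained together, so appends never relocate existing elements (unlike `vector` growth) and the common case is just an index bump in the tail block:

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <utility>

template <typename T, size_t kBlockSize = 1024>
class AppendOnlyQueueSketch {
  struct Block {
    std::array<T, kBlockSize> data; // requires default-constructible T
    size_t used = 0;
    std::unique_ptr<Block> next;
  };

 public:
  AppendOnlyQueueSketch()
      : head_(std::make_unique<Block>()), tail_(head_.get()) {}

  // Hot path: usually just an index bump and a write into the tail block.
  template <class... Args>
  T& emplace_back(Args&&... args) {
    if (tail_->used == kBlockSize) { // rare: allocate a fresh block
      tail_->next = std::make_unique<Block>();
      tail_ = tail_->next.get();
    }
    T& slot = tail_->data[tail_->used++];
    slot = T(std::forward<Args>(args)...); // requires move-assignable T
    return slot;
  }

 private:
  std::unique_ptr<Block> head_;
  Block* tail_;
};
```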

Test Plan: It's a pretty simple container type, so I just added a few cpp tests for emplace and read back. I also ran the overhead benchmark (replicates=9) with both `--stressTestKineto` (0.99 -> 0.94 us) and `--stressTestKineto --kinetoProfileMemory` (1.36 -> 1.27 us).

Reviewed By: swolchok

Differential Revision: D34231072

fbshipit-source-id: ed57299729d444d59cf843a0d38a3ee2240eeec1
(cherry picked from commit 43907948f3a8d2137244e7bb59f43999bd660917)
2022-03-11 19:47:40 +00:00
David Dang
abfaef0aec [Quant][core] Merged conv packed params and linear packed params (#73486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73486

conv and linear packed params were previously defined in `ATen/native/quantized/cpu/conv_packed_params.h` and `ATen/native/quantized/cpu/packed_params.h`. These two files have been merged into one and relocated to `ATen/native/quantized/cpu/packed_params.h`.

Differential Revision: D34513286

Test Plan: Imported from OSS

Reviewed By: dagitses

Pulled By: dzdang

fbshipit-source-id: 813845af7ea9449e316ab7822efe7460f0bd0d88
(cherry picked from commit 2f627561f27f81977ff73b8863c5e9e719dc4c60)
2022-03-11 15:18:45 +00:00
Ivan Kobzarev
519e226b66 [tensorexp] ExternalCall2 without memcpy (#72225)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72225

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D33960933

Pulled By: IvanKobzarev

fbshipit-source-id: fc73a3de9e5150919e3806516065b4a6c8316000
(cherry picked from commit f637842c341e0ba94906a0c8a1efc81691dc512c)
2022-03-09 21:19:26 +00:00
Han Qi
0723639b60 Revert D34455360: Multisect successfully blamed D34455360 for test failures
Summary:
This diff is reverting D34455360 (61d6c43864).
D34455360 (61d6c43864) is making the following tests fail, and this revert diff is either the revert of the blame diff or the revert of the stack of diffs that need to be reverted in order to revert the blame diff.

Tests affected:
- https://www.internalfb.com/intern/test/562950004334605/

Multisect link:
https://www.internalfb.com/intern/testinfra/multisect/756170

Test Plan: NA

Reviewed By: zhxchen17

Differential Revision: D34596156

fbshipit-source-id: a465bca0094db3caf6130c80f1ed49eea981359b
(cherry picked from commit ef5e5578c64ce9827570757fb016aafa9c782c6a)
2022-03-08 23:18:54 +00:00
Elias Ellison
52ccbf4494 Lock thread/block computation (#73800)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73800

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D34647281

Pulled By: eellison

fbshipit-source-id: adbdaf24191c4c1b85e0b62564388f2481002ed2
(cherry picked from commit 6cf38015cc14691518b1b5cb7d636e80eb3684fc)
2022-03-04 22:32:08 +00:00
Dave Bort
7b51629c53 [PyTorchEdge] Add getFileFormat() so we can differentiate Zip/Pickle from Flatbuffer (#73707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73707

Add a helper function to detect the file format from the first bytes of a data file or stream. This will be necessary during the migration from Pickle-serialized modules to Flatbuffer-serialized modules.
ghstack-source-id: 150384317
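
A minimal sketch of the detection, assuming ZIP's `PK\x03\x04` local-file magic for the pickle container and a 4-byte flatbuffer file identifier at offset 4 (the exact identifier string is an assumption here):

```cpp
#include <array>
#include <cstring>
#include <istream>

enum class FileFormat { ZipFile, FlatbufferFile, Unknown };

FileFormat sniffFileFormat(std::istream& in) {
  std::array<char, 8> header{};
  in.read(header.data(), header.size());
  in.clear();
  in.seekg(0); // rewind so the chosen deserializer can re-read from the start
  if (std::memcmp(header.data(), "PK\x03\x04", 4) == 0) {
    return FileFormat::ZipFile; // ZIP container => pickle-based module
  }
  if (std::memcmp(header.data() + 4, "PTMF", 4) == 0) { // assumed identifier
    return FileFormat::FlatbufferFile;
  }
  return FileFormat::Unknown;
}
```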

Test Plan:
Existing tests for ZIP+Pickle continue to pass.

New unit tests pass:
```
cd xplat && buck test //xplat/caffe2:test_lite_trainer //xplat/caffe2:test_lite_interpreter

Building: finished in 26.6 sec (100%) 3180/3180 jobs, 571/3180 updated
  Total time: 32.2 sec
Testing: finished in 07:08.3 min (89 PASS/0 FAIL)
BUILD SUCCEEDED
RESULTS FOR //xplat/caffe2:test_lite_interpreter //xplat/caffe2:test_lite_trainer
PASS    421.1s 81 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_interpreter
PASS     103ms  8 Passed   0 Skipped   0 Failed   //xplat/caffe2:test_lite_trainer
TESTS PASSED
```

Reviewed By: iseeyuan

Differential Revision: D34527859

fbshipit-source-id: ff2d1eabc2f8be1de2e44709c878e2d1a373f0df
(cherry picked from commit 5c394848346ab9e374c9e7eed479ad70ed09a7ae)
2022-03-04 19:35:41 +00:00
Han Qi
61d6c43864 Make debug_pkl smaller by only emitting unique traces. (#73368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73368

The debug_pkl file inside of PyTorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source, which is a stack trace, a filename, and start/end numbers. Those are emitted into the debug_pkl file as strings.
Since many SourceRanges share the same source, the string for a trace can be deduped.
The newer format saves a set of unique traces in a tuple, and each SourceRange then saves the offset of its trace w.r.t. its position in that tuple (i.e., manually applying dictionary compression).
The above helps with smaller file size. On loading, if we copied each trace into Source as a string, runtime memory would still blow up.
To mitigate this, we use SourceView directly instead of Source, which takes a reference to the string inside the Deserializer and turns it into a string_view. This is safe because the Deserializer is held by the Unpickler via shared_ptr, and the Unpickler is in turn held via shared_ptr by another Source object. That Source object stays alive during model construction.
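
A minimal sketch of the dictionary-compression step described above (illustrative types, not the serializer's actual ones):

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Interns each distinct trace string once; SourceRanges then serialize only
// the offset of their trace within `unique`, which is written out as a tuple.
struct TraceTable {
  std::vector<std::string> unique;
  std::unordered_map<std::string, size_t> offset_of;

  size_t intern(const std::string& trace) {
    auto it = offset_of.find(trace);
    if (it != offset_of.end()) {
      return it->second; // duplicate trace: reuse its offset
    }
    unique.push_back(trace);
    offset_of.emplace(trace, unique.size() - 1);
    return unique.size() - 1;
  }
};
```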

Test Plan:
unit test

Took original file (312271638_930.predictor.disagg.local); loaded with `torch.jit.load` save again with `torch.jit.save`. Unzip both, look at contents:
```
[qihan@devvm5585.vll0 ~]$ du archive -h
4.0K    archive/xl_model_weights
3.7M    archive/extra
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform
8.0K    archive/code/__torch__/caffe2/torch/fb
8.0K    archive/code/__torch__/caffe2/torch
8.0K    archive/code/__torch__/caffe2
20M     archive/code/__torch__/torch/fx/graph_module
20M     archive/code/__torch__/torch/fx
8.0K    archive/code/__torch__/torch/classes
20M     archive/code/__torch__/torch
20M     archive/code/__torch__
20M     archive/code
2.7M    archive/constants
35M     archive
[qihan@devvm5585.vll0 ~]$ du resaved -h
4.0K    resaved/extra
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform
8.0K    resaved/code/__torch__/caffe2/torch/fb
8.0K    resaved/code/__torch__/caffe2/torch
8.0K    resaved/code/__torch__/caffe2
1.3M    resaved/code/__torch__/torch/fx/graph_module
1.3M    resaved/code/__torch__/torch/fx
8.0K    resaved/code/__torch__/torch/classes
1.4M    resaved/code/__torch__/torch
1.4M    resaved/code/__torch__
1.4M    resaved/code
2.7M    resaved/constants
13M     resaved
[qihan@devvm5585.vll0 ~]$
```

Reviewed By: gmagogsfm

Differential Revision: D34455360

fbshipit-source-id: 8cc716f9bba7183746b1b4ecc33a2de34ac503b9
(cherry picked from commit f1a04730fc9ac8fdab6c8e4c44cb5529e42090e4)
2022-03-02 08:37:08 +00:00
Mengwei Liu
9ce9803abe [PyTorch] Add codegen unboxing ability (#69881)
Summary:
RFC: https://github.com/pytorch/rfcs/pull/40

This PR (re)introduces python codegen for unboxing wrappers. Given an entry of `native_functions.yaml` the codegen should be able to generate the corresponding C++ code to convert ivalues from the stack to their proper types. To trigger the codegen, run
```
tools/jit/gen_unboxing.py -d cg/torch/share/ATen
```
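
A hedged sketch of the shape of a generated unboxing wrapper, using an example schema (the generated code's exact structure and helper names may differ):

```cpp
#include <ATen/ATen.h>
#include <ATen/core/stack.h>

// For `add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor`:
// pop boxed IValues (last argument first), convert each to its C++ type,
// call the unboxed kernel, and push the boxed result back onto the stack.
void add_Tensor_unboxed(torch::jit::Stack& stack) {
  at::Scalar alpha = torch::jit::pop(stack).toScalar();
  at::Tensor other = torch::jit::pop(stack).toTensor();
  at::Tensor self = torch::jit::pop(stack).toTensor();
  torch::jit::push(stack, at::add(self, other, alpha));
}
```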

Merged changes on CI test. In https://github.com/pytorch/pytorch/issues/71782 I added an e2e test for static dispatch + codegen unboxing. The test exports a mobile model of mobilenetv2, then loads and runs it on a new binary for the lite interpreter: `test/mobile/custom_build/lite_predictor.cpp`.

## Lite predictor build specifics

1. Codegen: `gen.py` generates `RegisterCPU.cpp` and `RegisterSchema.cpp`. Now with this PR, once `static_dispatch` mode is enabled, `gen.py` will not generate `TORCH_LIBRARY` API calls in those cpp files, hence avoiding interaction with the dispatcher. Once `USE_LIGHTWEIGHT_DISPATCH` is turned on, `cmake/Codegen.cmake` calls `gen_unboxing.py`, which generates `UnboxingFunctions.h`, `UnboxingFunctions_[0-4].cpp` and `RegisterCodegenUnboxedKernels_[0-4].cpp`.
2. Build: `USE_LIGHTWEIGHT_DISPATCH` adds the generated sources into `all_cpu_cpp` in `aten/src/ATen/CMakeLists.txt`. All other files remain unchanged. In reality all the `Operators_[0-4].cpp` are not necessary, but we can rely on the linker to strip them off.

## Current CI job test coverage update

Created a new CI job `linux-xenial-py3-clang5-mobile-lightweight-dispatch-build` that enables the following build options:
* `USE_LIGHTWEIGHT_DISPATCH=1`
* `BUILD_LITE_INTERPRETER=1`
* `STATIC_DISPATCH_BACKEND=CPU`

This job triggers `test/mobile/lightweight_dispatch/build.sh` and builds `libtorch`. Then the script runs C++ tests written in `test_lightweight_dispatch.cpp` and `test_codegen_unboxing.cpp`. Recent commits added tests to cover as many C++ argument types as possible: in `build.sh` we installed the PyTorch Python API so that we can export test models in `tests_setup.py`. Then we run the C++ test binary to run these models on the lightweight-dispatch-enabled runtime.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69881

Reviewed By: iseeyuan

Differential Revision: D33692299

Pulled By: larryliu0820

fbshipit-source-id: 211e59f2364100703359b4a3d2ab48ca5155a023
(cherry picked from commit 58e1c9a25e3d1b5b656282cf3ac2f548d98d530b)
2022-03-01 23:28:13 +00:00
Elias Ellison
d3d74e9040 Allow custom registration of shape functions (#73270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73270

Together with open registration of NNC lowerings, this should make it possible to add support for custom operators, including internal fb-ops.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D34451275

Pulled By: eellison

fbshipit-source-id: ae8ae2deb93caa6770e738217461e65853897b55
(cherry picked from commit ea6b7e8a6d8f970a20e68d02eefc5c951e32aa07)
2022-02-28 17:44:45 +00:00