Commit Graph

4284 Commits

Author SHA1 Message Date
BowenBao
8f4edf1e1d [ONNX] Initial version of diagnostics infrastructure. (#85107)
This PR introduces a general Python diagnostics infrastructure powered by SARIF,
and the exporter diagnostics module that builds on top of it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85107
Approved by: https://github.com/abock, https://github.com/justinchuby
2022-09-30 07:47:26 +00:00
BowenBao
33401ee81f [ONNX] Rename 'sarif_om' to 'sarif' (#85918)
'sarif_om' was the module name in the original repository https://github.com/microsoft/sarif-python-om.
But since we have moved along with various extensions, it wouldn't hurt to rename the module for clarity.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85918
Approved by: https://github.com/abock, https://github.com/thiagocrepaldi, https://github.com/justinchuby
2022-09-30 05:39:49 +00:00
BowenBao
e9b254a025 [ONNX] Migrate SARIF from attr to dataclasses (#85651)
Move to dataclasses since PyTorch does not depend on `attr`.
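As a rough illustration of this kind of migration, the sketch below defines a SARIF-like object with the standard-library `dataclasses` module instead of `attr`. The class name `Message`, its fields, the `schema_property_name` metadata key, and the `to_json_dict` helper are all hypothetical stand-ins, not the generated code from this PR.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass
class Message:
    """Hypothetical SARIF-like message object (illustrative only)."""

    # Required field: plain annotation, no default.
    text: str
    # Optional field: defaults to None; the metadata records the name the
    # property might carry once serialized (an assumption for this sketch).
    message_id: Optional[str] = dataclasses.field(
        default=None, metadata={"schema_property_name": "id"}
    )


def to_json_dict(obj) -> dict:
    """Serialize a dataclass instance, renaming fields via metadata and dropping None."""
    out = {}
    for f in dataclasses.fields(obj):
        value = getattr(obj, f.name)
        if value is None:
            continue
        out[f.metadata.get("schema_property_name", f.name)] = value
    return out


msg = Message(text="unsupported operator", message_id="E001")
print(to_json_dict(msg))
```

Because `dataclasses` ships with Python, a model written this way needs no third-party dependency.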
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85651
Approved by: https://github.com/justinchuby, https://github.com/AllenTiTaiWang, https://github.com/abock, https://github.com/thiagocrepaldi
2022-09-30 05:34:40 +00:00
BowenBao
91667d1d21 [ONNX] Introduce SARIF (#85428)
That's the parent issue tracking this and more follow up tasks, so will keep open after this.
This PR introduces the Python classes for the SARIF object model, along with a script for generating them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85428
Approved by: https://github.com/justinchuby, https://github.com/AllenTiTaiWang, https://github.com/abock, https://github.com/thiagocrepaldi
2022-09-30 05:32:41 +00:00
soulitzer
7e4684009c Improve codegen for jvp decomposition (#84894)
Fixes: https://github.com/pytorch/pytorch/issues/84888
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84894
Approved by: https://github.com/albanD
2022-09-29 03:04:15 +00:00
soulitzer
bd65adf4e9 Properly fix log_sigmoid vmapjvp and remove hack (#84892)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84892
Approved by: https://github.com/albanD, https://github.com/zou3519
2022-09-29 01:19:13 +00:00
Mikayla Gawarecki
afaee00fec Add python nested_tensor and as_nested_tensor constructors in torch.nested (#85593)
Remove `torch.nested_tensor`, which has erroneous behavior with respect to gradients (the result could be either a leaf or a non-leaf). Introduce `torch.nested.nested_tensor` and `torch.nested.as_nested_tensor` in the vein of `torch.tensor` and `torch.as_tensor`. Done in the nested `__init__.py` for now, but this can move to pybind in the future (when we want to load from numpy/nested lists).

Discussed offline with @cpuhrsch: the pybind constructor (https://github.com/pytorch/pytorch/pull/85536) was more gnarly than expected, so we can move to that when we do need loading from numpy etc.

Differential Revision: [D39806622](https://our.internmc.facebook.com/intern/diff/D39806622)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85593
Approved by: https://github.com/drisspg, https://github.com/cpuhrsch
2022-09-28 20:15:02 +00:00
Horace He
a4bd89b267 Revert "Revert "Symintified mmm/addmm derivative formulas (#85794)"" (#85820)
This reverts commit 823dc33b00.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85820
Approved by: https://github.com/huydhn
2022-09-28 17:34:11 +00:00
PyTorch MergeBot
a0b1693996 Revert "Update amax/amin/norm/count_nonzero signatures with int[*]? dim (#83300)"
This reverts commit 1c0f0b33a0.

Reverted https://github.com/pytorch/pytorch/pull/83300 on behalf of https://github.com/jeffdaily due to The commit breaks nvfuser tests
2022-09-28 17:04:53 +00:00
PyTorch MergeBot
823dc33b00 Revert "Symintified mmm/addmm derivative formulas (#85794)"
This reverts commit 230edd2515.

Reverted https://github.com/pytorch/pytorch/pull/85794 on behalf of https://github.com/janeyx99 due to Sorry, reverting as this breaks an aot_autograd mac test on functorch 230edd2515
2022-09-28 16:02:05 +00:00
Horace He
230edd2515 Symintified mmm/addmm derivative formulas (#85794)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85794
Approved by: https://github.com/ezyang
2022-09-28 14:07:57 +00:00
Edward Z. Yang
793488cda2 Revert "Revert "Symintifying slice ops (#85196)"" (#85746)
This reverts commit 3a171dfb0c.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85746
Approved by: https://github.com/albanD
2022-09-28 04:37:35 +00:00
Kurt Mohler
1c0f0b33a0 Update amax/amin/norm/count_nonzero signatures with int[*]? dim (#83300)
Changes `dim` arg to use `int[*]?` type for the following functions in `native_functions.yaml`:
* `amax`
* `amin`
* `norm`
* `frobenius_norm`
* `native_norm`
* `count_nonzero`

Part of #29137

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83300
Approved by: https://github.com/ngimel, https://github.com/albanD, https://github.com/kulinseth
2022-09-28 01:56:37 +00:00
PyTorch MergeBot
572dd862c4 Revert "Update amax/amin/norm/count_nonzero signatures with int[*]? dim (#83300)"
This reverts commit 8c7c7ed322.

Reverted https://github.com/pytorch/pytorch/pull/83300 on behalf of https://github.com/huydhn due to The commit pin breaks XLA test somehow
2022-09-28 01:36:43 +00:00
Kurt Mohler
8c7c7ed322 Update amax/amin/norm/count_nonzero signatures with int[*]? dim (#83300)
Changes `dim` arg to use `int[*]?` type for the following functions in `native_functions.yaml`:
* `amax`
* `amin`
* `norm`
* `frobenius_norm`
* `native_norm`
* `count_nonzero`

Part of #29137

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83300
Approved by: https://github.com/ngimel, https://github.com/albanD, https://github.com/kulinseth
2022-09-27 23:50:04 +00:00
PyTorch MergeBot
3a171dfb0c Revert "Symintifying slice ops (#85196)"
This reverts commit 4c01c51266.

Reverted https://github.com/pytorch/pytorch/pull/85196 on behalf of https://github.com/atalman due to Break internal build Exutorch
2022-09-27 18:01:27 +00:00
soulitzer
15c52ffc4f Disallow auto_element_wise for in-place and fix some in-place gradients (#85634)
Fixes https://github.com/pytorch/pytorch/issues/85535

Also fixes the backward and forward gradients of `nn.functional.threshold`. The issue was that in-place gradients weren't tested because the in-place variants were not properly registered to the OpInfo.

Perhaps an alternative to this is to make auto_element_wise smart enough to actually handle the in-place cases (we have 4 cases total now where we manually copy_ after doing auto_element_wise), but that requires a few more changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85634
Approved by: https://github.com/albanD
2022-09-27 15:35:24 +00:00
George Qi
686555b663 [maskedtensor] port torch/_masked into torch/masked (#85515)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85515
Approved by: https://github.com/cpuhrsch
2022-09-26 23:41:13 +00:00
Brian Hirsh
4a2d2e5e40 Change API type Tensor[] for structured kernels. (#73350)
Partially fixes: #66328

This PR:
- adds support for `ITensorList` to the dispatcher for:
  - computing the dispatch key
  - boxing and unboxing `ITensorList`
- modified the codegen for structured kernels:
  - codegen APIs use `ITensorList` instead of `ArrayRef<Tensor>`

**Changes summary:**

- Signature changes due to the different APIs:
  - dispatcher API (e.g. `BatchingRegistrations.cpp`)
  - C++ API (e.g. `TensorShape.cpp`)
- Miscellaneous functions used by codegen'd functions (e.g. `FunctionalTensorWrapper.*`)
- Dispatcher changes for handling `ITensorList` correctly (e.g. `DispatchKeyExtractor.h`)
- Signature changes of `at::cat` due to the need of `const` inside `TensorBody.h`
- Forward declarations of `ITensorList` (e.g. `MethodOperators.h`)
- Codegen changes, special casing structured kernels (e.g. `gen.py`)

**Short description of structured kernels special casing:**

I introduced, mainly, 5 types of changes to the codegen for generating code depending on
whether the kernel is structured or not:

1. Added a `structured_type_override` flag to the `argument_type` function definition of
the affected APIs (mainly the dispatcher and C++ APIs).
  - `api/cpp.py`, `api/dispatcher.py`, `api/native.py`
2. Added a `structured_type_override` member to the signature
classes (e.g. `CppSignature`), since `FunctionSchema` doesn't really know whether the
function is structured or not
  - `api/types.py`
3. Added a `part_of_structured_group` to `NativeFunction` class, which is just a
convenient function to forward to `structured_type_override` wherever needed
  - `model.py`
4. Appropriately changed the rest of the codegen, whenever it used either the signature
classes or the `arguments` function directly
5. Added a check for `const ITensorList&` type wherever there was a check for `TensorList`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73350
Approved by: https://github.com/bdhirsh
2022-09-26 21:46:38 +00:00
Edward Z. Yang
4c01c51266 Symintifying slice ops (#85196)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85196
Approved by: https://github.com/ezyang
2022-09-23 22:01:32 +00:00
Catherine Lee
49e10c1598 [ci] test_ops in parallel, ci tests log to file (#85528)
part one of splitting up https://github.com/pytorch/pytorch/pull/84961 into (probably 2) parts

contains
* logging to file
* testing test_ops in parallel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85528
Approved by: https://github.com/huydhn
2022-09-23 20:45:20 +00:00
Ivan Yashchuk
539076e2c2 Remove deprecated torch.lstsq (#70980)
The time has come to remove deprecated linear algebra related functions. This PR removes `torch.lstsq`.

There's a note in `tools/codegen/gen.py` about the `lstsq` schema in `native_functions.yaml` that I will not remove:
87139d8532/tools/codegen/gen.py (L734-L770)

cc @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @Lezcano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70980
Approved by: https://github.com/lezcano, https://github.com/kit1980
2022-09-23 00:16:55 +00:00
Richard Zou
848437590f Delete functorch's monkeypatching (#85430)
By upstreaming functorch's tensor printing logic into PyTorch. There's
no way of creating a custom print function for a TensorImpl subclass (as
opposed to a torch_dispatch or torch_function tensor subclass, which can
just override repr()) right now, so we need to directly interpose inside
regular Tensor printing in PyTorch.

Monkey patching is bad; users do not expect `import blah` to change
something about another library.

Fixes https://github.com/pytorch/functorch/issues/900

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85430
Approved by: https://github.com/ezyang
2022-09-22 18:47:12 +00:00
kshitij12345
56a41b5998 [composite compliance] ctc_loss (#84752)
Ref #69991

I have mixed feelings about adding new (private) operators. Backend writers will have to override them as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84752
Approved by: https://github.com/zou3519
2022-09-22 00:21:11 +00:00
PyTorch MergeBot
3dce26635f Revert "test in parallel at file granularity (#84961)"
This reverts commit 8107666c6a.

Reverted https://github.com/pytorch/pytorch/pull/84961 on behalf of https://github.com/clee2000 due to makes test_forward_ad_nn_functional_max_unpool2d_cuda_float32 flakily unexpectedly pass
2022-09-21 20:21:25 +00:00
Mikayla Gawarecki
77f1f98479 Re-introduce torch.Tensor.to_padded_tensor (#85293)
Differential Revision: [D39629004](https://our.internmc.facebook.com/intern/diff/D39629004)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85293
Approved by: https://github.com/cpuhrsch
2022-09-21 18:45:56 +00:00
Catherine Lee
8107666c6a test in parallel at file granularity (#84961)
run tests in parallel at the test file granularity

runs 3 files in parallel using multiprocessing pool, output goes to a file, which is then printed when the test finishes.  Some tests cannot be run in parallel (usually due to lacking memory), so we run those after.  Sharding is changed to attempt to mask large files with other large files/run them on the same shard.
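The scheduling idea in the paragraph above can be sketched roughly as follows. This is not the CI script from the PR: it uses the thread-backed `multiprocessing.pool.ThreadPool` so that the example stays self-contained, it leaves out the log-to-file step, and the names `run_test_files`, `run_one`, and `serial_only` are hypothetical.

```python
from multiprocessing.pool import ThreadPool


def run_test_files(files, run_one, serial_only=frozenset(), workers=3):
    """Run `run_one(file)` for each file: parallel-safe files first, the rest serially."""
    parallel = [f for f in files if f not in serial_only]
    serial = [f for f in files if f in serial_only]
    results = {}
    with ThreadPool(workers) as pool:
        # imap yields results in input order, so zip pairs each file with its outcome.
        for name, outcome in zip(parallel, pool.imap(run_one, parallel)):
            results[name] = outcome
    # Files that cannot share the machine (e.g. memory-hungry ones) run afterwards.
    for name in serial:
        results[name] = run_one(name)
    return results


# Stand-in runner; a real one would launch the test file and capture its log.
outcomes = run_test_files(
    ["test_a.py", "test_b.py", "test_big.py"],
    run_one=lambda name: f"ran {name}",
    serial_only={"test_big.py"},
)
print(outcomes)
```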

test_ops* gets a custom handler to run it because it is simply too big (2hrs on windows) and linalg_cholesky fails (I would really like a solution to this if possible, but until then we use the custom handler).

reduces cuda tests by a lot, reduces total windows test time by ~1hr

Ref. https://github.com/pytorch/pytorch/issues/82894
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84961
Approved by: https://github.com/huydhn
2022-09-21 16:58:11 +00:00
Edward Z. Yang
3eb27229dd as_strided symbolic support (#85264)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: [D39662820](https://our.internmc.facebook.com/intern/diff/D39662820)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85264
Approved by: https://github.com/wconstab
2022-09-21 13:34:55 +00:00
Edward Z. Yang
e1f634753c Setup fake tensor and symbolic shapes once at beginning of AOTAutograd (#85233)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: [D39662822](https://our.internmc.facebook.com/intern/diff/D39662822)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85233
Approved by: https://github.com/wconstab
2022-09-20 19:11:25 +00:00
Thomas Viehmann
e41d758e26 Handle implicit real->complex casting for backward of stack (#84993)
Fixes: #75852

P.S.: Yay for the PyTorch foundation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84993
Approved by: https://github.com/soulitzer
2022-09-19 21:20:34 +00:00
Edward Z. Yang
6a18616296 Support for sym_strides() in backwards formulas (#85210)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85210
Approved by: https://github.com/Chillee, https://github.com/voznesenskym
2022-09-19 18:05:09 +00:00
Brian Hirsh
1838957e6f fix external codegen kernel error checking (#85029)
Fixes https://github.com/pytorch/pytorch/issues/84987. I followed the repro steps from the issue (changed `empty_symint` to `empty_symint2`) and confirmed that an error gets raised.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85029
Approved by: https://github.com/ezyang
2022-09-17 04:08:09 +00:00
Edward Z. Yang
490727a35f New calling convention for Python dispatcher (#85133)
Instead of calling into the Python dispatcher for EVERY dispatcher
call, we now have a two step process.  First, we
getattr(op: OpOverload, dispatch_key) to "load" the handler for the
function.  This can either be a conventional function (in which
case we will call it, in the same way the old Python dispatcher
worked), or it can be a DispatchKey, in which case we will directly
call that DispatchKey in C++, bypassing marshalling between Python
and C++ entirely.  OpOverload.__getattr__ is carefully written so
that it will cache the

A further optimization would be to define __slots__ on OpOverload,
and to ensure that the DispatchKey strings are interned.

The resulting Python dispatcher is less flexible: after the first
lookup, the handler is cached and we won't recompute it.  Furthermore,
by default, dispatches will not go into Python, and so you won't
get stack frames for the Python dispatcher by default.  But we get
a huge performance improvement: on the following microbenchmark
we go from 2.5s to 1.9s.

```
import time
import torch
from functorch import make_fx

def f(x):
    for i in range(1000):
        x = x * x
    return x

begin = time.time()
res = make_fx(f, tracing_mode="symbolic")(torch.randn(10, 20))
print(time.time()-begin)
```
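
The two-step lookup described in this message can be sketched in plain Python. This is only an illustration of caching through `__getattr__`, not the actual `OpOverload` code; the class `FakeOpOverload`, the `py_kernels` table, the `lookups` counter, and the key names are hypothetical.

```python
class FakeOpOverload:
    """Hypothetical stand-in for an operator overload with per-key handlers."""

    def __init__(self, name, py_kernels):
        self.name = name
        # Maps a dispatch key name to a Python function, for keys that
        # have a Python implementation.
        self.py_kernels = py_kernels
        self.lookups = 0  # counts how often the slow path runs

    def __getattr__(self, key):
        # Python calls __getattr__ only when normal attribute lookup fails,
        # so this is the slow path taken on the first access for each key.
        if key.startswith("_"):
            raise AttributeError(key)
        self.lookups += 1
        # Either a Python callable, or the key name itself, standing in for
        # "call this dispatch key directly in C++".
        handler = self.py_kernels.get(key, key)
        # Cache on the instance so later accesses never reach __getattr__.
        setattr(self, key, handler)
        return handler


op = FakeOpOverload("aten::mul.Tensor", {"Python": lambda a, b: a * b})
first = op.Python   # slow path: handler is computed and cached
second = op.Python  # fast path: plain attribute hit
print(first(3, 4), op.CPU, op.lookups)
```

The sketch also shows the trade-off noted in the message: once a handler is cached on the instance, changing `py_kernels` afterwards has no effect on that key.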

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85133
Approved by: https://github.com/wconstab
2022-09-16 20:38:21 +00:00
lezcano
d710c95cc0 Implement forward AD for scatter_reduce (#85000)
I left the case `reduction="prod"` for future work as it's a bit of a pain.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85000
Approved by: https://github.com/soulitzer
2022-09-16 17:45:07 +00:00
Edward Z. Yang
00ce302c07 Performance optimizations to proxy tensor (#85049)
- Lazily allocate FX nodes for size/stride accessors on proxy tensor
- Properly track derived computations on strides/numel/etc
- Remove unnecessary tree_map at end of proxy tensor trace checking
  invariants; we will just have to be smart (it's too expensive)
- Avoid tree_map in sym proxy tracing

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85049
Approved by: https://github.com/wconstab
2022-09-16 00:28:50 +00:00
soulitzer
7f88934a8f [reland 2] Call jit decomp in VariableType to improve forward AD coverage (#84976)
Reland of https://github.com/pytorch/pytorch/pull/84675
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84976
Approved by: https://github.com/zou3519
2022-09-15 22:46:19 +00:00
Michael Voznesensky
8ca1839d32 Python Dispatcher integration with C++ dispatcher (#85050)
#84826 but without ghstack
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85050
Approved by: https://github.com/malfet
2022-09-15 00:43:36 +00:00
PyTorch MergeBot
706b990306 Revert "Python Dispatcher integration with C++ dispatcher (#84826)"
This reverts commit 35f6a69191.

Reverted https://github.com/pytorch/pytorch/pull/84826 on behalf of https://github.com/malfet due to Broke dynamo, see 35f6a69191
2022-09-14 14:07:58 +00:00
Michael Voznesensky
35f6a69191 Python Dispatcher integration with C++ dispatcher (#84826)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

From @ezyang's original PR:

There are a number of situations where we have non-backend kernels (e.g., CompositeImplicitAutograd, batching rules) which we would like to port to Python, but we have no way to integrate these ports with the overall system while using preexisting C++ registrations otherwise. This PR changes that by introducing a Python dispatcher (which can have its own kernels directly in Python), which can be interposed over ordinary C++ dispatch. The ingredients:

- We introduce a new PythonDispatcher dispatch key, that has the same tenor as FuncTorchDynamicLayerFrontMode: it works by getting triggered before every other dispatch key in the dispatch key set, and shunting to a Python implementation.
- The Python dispatcher is a per-interpreter global object that is enabled/disabled via the guard EnablePythonDispatcher/DisablePythonDispatcher. We don't make it compositional as I have no idea what a compositional version of this feature would look like. Because it is global, we don't need to memory manage it and so I use a simpler SafePyHandle (newly added) to control access to this pointer from non-Python C++. Like __torch_dispatch__, we use PyInterpreter to get to the Python interpreter to handle the dispatch.
- I need to reimplement dispatch table computation logic in Python. To do this, I expose a lot more helper functions for doing computations on alias dispatch keys and similar. I also improve the pybind11 handling for DispatchKey so that you can either accept the pybind11 bound enum or a string; this simplifies our binding code. See https://github.com/pybind/pybind11/issues/483#issuecomment-1237418106 for how this works; the technique is generally useful.
- I need to be able to call backend fallbacks. I do this by permitting you to call at a dispatch key which doesn't have a kernel for the operator; if the kernel doesn't exist, we check the backend fallback table instead.
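
The backend-fallback rule described above can be pictured with two lookup tables. This is a minimal sketch, assuming hypothetical dictionaries `op_kernels` and `backend_fallbacks` and a hypothetical helper `resolve_kernel`; the real dispatcher keeps this state in C++.

```python
def resolve_kernel(op_kernels, backend_fallbacks, op_name, dispatch_key):
    """Return the kernel for (op_name, dispatch_key), else the key's backend fallback."""
    per_op = op_kernels.get(op_name, {})
    if dispatch_key in per_op:
        # The operator has its own kernel registered at this key.
        return per_op[dispatch_key]
    if dispatch_key in backend_fallbacks:
        # No per-operator kernel: use the fallback registered for the key.
        return backend_fallbacks[dispatch_key]
    raise LookupError(f"no kernel or fallback for {op_name} at {dispatch_key}")


# Illustrative tables; the kernel values are plain strings for readability.
op_kernels = {"aten::add": {"CPU": "add_cpu_kernel"}}
backend_fallbacks = {"Batched": "batched_fallback"}
print(resolve_kernel(op_kernels, backend_fallbacks, "aten::add", "CPU"))
print(resolve_kernel(op_kernels, backend_fallbacks, "aten::add", "Batched"))
```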

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84826
Approved by: https://github.com/ezyang
2022-09-14 06:57:19 +00:00
PyTorch MergeBot
36d79143ce Revert "[reland] Call jit decomposition in VariableType to increase forward AD coverage (#84151) (#84675)"
This reverts commit bb4e96c964.

Reverted https://github.com/pytorch/pytorch/pull/84675 on behalf of https://github.com/osalpekar due to causing asan xplat link-time errors like ld.lld: error: undefined symbol: torch::jit::has_jit_decomposition(c10::FunctionSchema const&)
2022-09-13 22:54:54 +00:00
drisspg
bda8a5729b [Nested Tensor] Create differentiable nt to tensor view functions (#83371)
This PR attempts to implement 2) "the safe way" of creating a view of a nested tensor that returns a regular tensor. The rest of the breakdown is here: https://fb.quip.com/J8QCAx41af11

https://gist.github.com/drisspg/8622e9c97d374fa920ac647e1167cabc
This is a short list of some edge cases. After some more work I was able to address two of the test cases in the above gist. There are a few complex aspects here, for which I left defeated comments inline.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83371
Approved by: https://github.com/bdhirsh
2022-09-13 20:35:58 +00:00
Thomas Orozco
b4799736ee autograd: fix non-deterministic output in codegen comments (#84695)
Summary:
Like it says in the title. Currently, this will return output like this:

In Buck1, that's OK because Buck1's caching doesn't really care too much about

However, in Buck2, this is a disaster, because caching is based exclusively
on inputs and outputs and

The diff here proposes making the path relative to the codegen script itself,
which should carry about as much info, but avoid cache misses.

Concretely, this:

```
// generated from /dev/shm/uid-34135/cfbc5712-seed-nspid4026533424_cgpid2794673-ns-4026533443/tools/autograd/templates/python_functions.h
```

Becomes, this:

```
// generated from ../tools/autograd/templates/python_functions.h
```

So, we keep the useful part, and we get caching. This matters because those
headers are used in actions like:

```
fbcode//deeplearning/fbgemm/fbgemm_gpu/codegen:embedding_ops -- action (cxx_compile gen_embedding_backward_adam_split_unweighted_cuda.cu (pic))
```

Those actions take upwards of 5 minutes to finish, so by allowing a cache hit,
we are a) saving our users a lot of time and b) saving some RE capacity as
well.

This actually matters a lot because right now those targets are produced by
`//caffe2:generate-code`, which itself doesn't get cache hits from RE because
`generate_code.par` is non-deterministic (this is, unfortunately, true of PARs
in general), so that rule introduces non-determinism that the codegen
propagates and we get zero caching.

This diff doesn't fix `//caffe2:generate-code`'s  inputs being
non-deterministic, but it does fix its *outputs* being non-deterministic, which
means the non-determinism stops there, and we get back to cache hits.
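
The fix described above amounts to emitting a path relative to the codegen script rather than an absolute sandbox path. A minimal sketch, assuming a hypothetical helper `generated_from_comment` and a hypothetical script location under `torchgen/`; the actual change in the PR may be structured differently.

```python
import os


def generated_from_comment(template_path: str, script_path: str) -> str:
    """Build a '// generated from' header using a path relative to the script's directory."""
    rel = os.path.relpath(template_path, start=os.path.dirname(script_path))
    # Normalize separators so the emitted text is the same on every platform.
    return "// generated from " + rel.replace(os.sep, "/")


# Two builds in different sandbox roots (made-up paths) yield the same comment.
comment_a = generated_from_comment(
    "/tmp/run-a/tools/autograd/templates/python_functions.h",
    "/tmp/run-a/torchgen/gen.py",
)
comment_b = generated_from_comment(
    "/tmp/run-b/tools/autograd/templates/python_functions.h",
    "/tmp/run-b/torchgen/gen.py",
)
print(comment_a)
print(comment_a == comment_b)
```

Since the sandbox root no longer appears in the output, identical inputs produce byte-identical headers, which is what an output-based cache needs.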

Test Plan:
- CI

```
buck2 build fbcode//caffe2:generate-code
buck2 build fbcode//deeplearning/fbgemm/fbgemm_gpu/codegen:embedding_ops
```

Reviewed By: ndmitchell

Differential Revision: D39348565

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84695
Approved by: https://github.com/soulitzer
2022-09-13 18:41:15 +00:00
soulitzer
bb4e96c964 [reland] Call jit decomposition in VariableType to increase forward AD coverage (#84151) (#84675)
This reverts commit acb4a09628.

In addition, we also fix a memory leak in layer norm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84675
Approved by: https://github.com/zou3519
2022-09-12 20:33:14 +00:00
Mikayla Gawarecki
e217b30b0f Add torch.nested namespace (#84102)
First step towards #83775
- only `to_padded_tensor` is moved to the nested namespace for now
- following the schema used for `special`, `fft`, `linalg` and other namespaces, nested functions are registered in native_functions.yaml as `nested_{function_name}` and are bound to the desired Python name in
`torch/nested/__init__.py`, and the desired C++ name in `torch/csrc/api/include/torch/nested.h`.

~~**Question**: should we keep the documentation for `Tensor.to_padded_tensor` or can this be deleted since it is shared by `torch.nested.to_padded_tensor`?~~

[generated nested docs](https://docs-preview.pytorch.org/84102/nested.html?highlight=nested#module-torch.nested)

Differential Revision: [D39361148](https://our.internmc.facebook.com/intern/diff/D39361148)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84102
Approved by: https://github.com/drisspg
2022-09-12 16:31:05 +00:00
Mengwei Liu
2765243cd5 [torchgen] Refactor static_dispatch to take in source signature (#84384)
Summary: Context: currently `static_dispatch` assumes that given a native function `f`, we always want to map from its `DispatcherSignature` to its `CppSignature`. This assumption may not hold true for some use cases, where the source bindings may not come from its `DispatcherSignature`. Here I'm changing the argument `sig: DispatcherSignature` to be `sig: Union[CppSignature, DispatcherSignature]`, and also removing the unused `f`.

Test Plan: Rely on added unit test.

Differential Revision: D39192969

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84384
Approved by: https://github.com/iseeyuan
2022-09-10 06:58:56 +00:00
Ivan Yashchuk
01c54ad6de Remove deprecated torch.eig (#70982)
The time has come to remove deprecated linear algebra related functions. This PR removes `torch.eig`.

cc @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @Lezcano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70982
Approved by: https://github.com/Lezcano, https://github.com/malfet
2022-09-09 21:31:57 +00:00
Eli Uriegas
93aef3a010 Use presence of _symint in kernel name to generate symint sig or not (#84579)
Something people found confusing was that whether or not a native::
signature would get SymInt or not in its type was based on the dispatch
key.  This changes it so that SymInt or not in type is based on whether
or not you have _symint in the name of the kernel or not.  This means
that even when we make operators support SymInt, you no longer have to
go and update all the preexisting definitions; instead, you now
selectively write _symint to opt individual kernels into SymInt support.

I then go and update a bunch of kernels that don't have proper SymInt
support to make use of this convention.  There is some hacking around
for view generation code.

I also add support for external backends to specify 'symint' operators, for which we generate SymInt signatures instead of regular signatures.
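
The naming convention described above can be illustrated with a small sketch. The helper names `kernel_uses_symint` and `render_size_arg_type` are hypothetical and the returned C++ type strings are only for illustration; this is not the torchgen implementation.

```python
def kernel_uses_symint(kernel_name: str) -> bool:
    """Hypothetical check: a kernel opts into SymInt when its name contains '_symint'."""
    return "_symint" in kernel_name


def render_size_arg_type(kernel_name: str) -> str:
    """Pick the C++ type for a size-like argument from the kernel name alone."""
    if kernel_uses_symint(kernel_name):
        return "c10::SymIntArrayRef"
    return "at::IntArrayRef"


# Kernel names below are examples only.
for name in ("empty_symint", "empty_cpu"):
    print(name, "->", render_size_arg_type(name))
```

The point of the convention is that the decision depends only on the kernel name, so existing kernels keep their plain integer signatures until someone renames them.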

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: [D39310060](https://our.internmc.facebook.com/intern/diff/D39310060)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84579
Approved by: https://github.com/wconstab
2022-09-09 18:31:56 +00:00
Dhruv Matani
18a31cc044 [Mobile] Fix The Build For Model Tracer (#84755)
Summary: Currently, the model tracer build is broken because of 2 reasons:
1. A few source files are missing, resulting in missing link time symbols
2. The `TRACING_BASED` flag isn't passed correctly from the command line (specified as an environment variable) as a CMake flag

Both these issues were fixed.

Test Plan: Ran this command: `USE_CUDA=0 TRACING_BASED=1 python setup.py develop --cmake`

and saw that the tracer binary was built at `build/bin/model_tracer` - also ran it to ensure that it can generate a YAML file.

Differential Revision: [D39391270](https://our.internmc.facebook.com/intern/diff/D39391270)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84755
Approved by: https://github.com/cccclai
2022-09-09 18:22:24 +00:00
Justin Chu
2fa8142cf9 [ONNX] Rename constants for clarity (#84645)
Rename constants to make them more clear. Fix styles to upper case.

Removed `onnx_stable_opsets` because it can be computed from `ONNX_MIN_OPSET` and `ONNX_MAX_OPSET`.
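
A minimal sketch of that computation, assuming the stable opsets are every version from the minimum to the maximum inclusive. The two numeric values are placeholders for illustration and are not taken from the PR.

```python
# Placeholder values chosen for illustration; not taken from the PR.
ONNX_MIN_OPSET = 7
ONNX_MAX_OPSET = 16

# Assumption: the stable opsets are all versions from min to max, inclusive.
stable_opsets = tuple(range(ONNX_MIN_OPSET, ONNX_MAX_OPSET + 1))
print(stable_opsets)
```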

Fixes #84643

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84645
Approved by: https://github.com/BowenBao
2022-09-09 01:22:14 +00:00
PyTorch MergeBot
acb4a09628 Revert "Call jit decomposition in VariableType to increase forward AD coverage (#84151)"
This reverts commit 42d99e6f19.

Reverted https://github.com/pytorch/pytorch/pull/84151 on behalf of https://github.com/malfet due to Regressed test_jvpvjp_nn_functional_layer_norm_cuda_float32, see 42d99e6f19
2022-09-07 18:02:27 +00:00
soulitzer
42d99e6f19 Call jit decomposition in VariableType to increase forward AD coverage (#84151)
This PR:
- updates forward AD codegen in core to generate code that tries calling into decompositions registered to jit when
   - (1) the function is not in-place or out variant
   - AND (2) the function is differentiable (requires_derivative=True)
   - AND (3) there are no forward AD formulas registered
   - To simplify things we always generate the if/else (as long as (1) is true), but generate 'false' when either (2) or (3) are false.
 - removes the mechanism from functorch
    - (follow up) some functorch tests should be updated here so they no longer have to compute the Jacobian with vjp
  - factors out some logic to generate the any_has_forward_grad condition
     - (bc-breaking) when TensorList inputs unexpectedly have forward grad, the error will no longer contain the name
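
The three conditions listed above can be read as the following decision sketch. The function name `jit_decomp_condition`, its parameters, and the `"<runtime forward-grad check>"` placeholder are hypothetical; the real codegen emits C++ and is linked from the discussion below.

```python
def jit_decomp_condition(is_inplace_or_out: bool,
                         requires_derivative: bool,
                         has_forward_ad_formula: bool):
    """Return the condition text for the generated if/else, or None if no branch is emitted."""
    # (1) In-place and out variants never get the branch.
    if is_inplace_or_out:
        return None
    # (2) and (3): the if/else is always emitted once (1) holds, but its
    # condition is the constant 'false' unless the function is differentiable
    # and has no forward AD formula registered.
    if requires_derivative and not has_forward_ad_formula:
        return "<runtime forward-grad check>"  # placeholder, not real codegen output
    return "false"


print(jit_decomp_condition(False, True, False))
print(jit_decomp_condition(False, True, True))
print(jit_decomp_condition(True, True, False))
```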

See https://github.com/pytorch/pytorch/pull/84151#issuecomment-1238519247 for codegen output and more discussion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84151
Approved by: https://github.com/samdow, https://github.com/albanD, https://github.com/zou3519
2022-09-07 15:31:46 +00:00
Mikayla Gawarecki
1cad744694 Enable select.int when NestedTensor requires grad (#83875)
Previously indexing a nested tensor when it requires_grad would raise an error because the backward formula for `select.int` uses `self.sizes()`. This PR fixes that by temporarily registering a _nested_select_backward function which can be removed when we start using the symint approach to register kernels. For now this functionality is needed for creating a POC that nested tensor can be an API to `segment_coo` and `segment_csr` in the torch_scatter repo

```
a = torch.arange(10).reshape(2, 5).float()
b = torch.arange(12).reshape(2, 6).float()
nt = torch.nested_tensor([a, b], dtype=torch.float).requires_grad_(True)
nt[0]
# RuntimeError: Internal error: NestedTensorImpl doesn't support sizes. Please file an issue on https://github.com/pytorch/nestedtensor
```

whereas

```
nt = torch.nested_tensor([a, b], dtype=torch.float).requires_grad_(False)
nt[0]
```
would succeed

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83875
Approved by: https://github.com/albanD, https://github.com/drisspg
2022-09-06 22:19:32 +00:00
mikey dagitses
4f0b9f3c31 move PyTorch internal-only starlark files into fb/ subdirectories (#84548)
Summary: These are not used in OSS, so they should not add clutter there.

Test Plan: Rely on CI.

Differential Revision: D39262135

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84548
Approved by: https://github.com/DanilBaibak
2022-09-06 18:08:42 +00:00
Nikolay Korovaiko
f725009a48 as_strided supports SymInt; codegen supports optional SymInt (#84393)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84393
Approved by: https://github.com/ezyang
2022-09-06 16:39:24 +00:00
Edward Z. Yang
2a332afbf4 Add SymFloat, support SymInt to SymFloat conversion (#84284)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84284
Approved by: https://github.com/albanD
2022-09-03 01:30:32 +00:00
YifanShenSZ
673b35c847 Better reshape with autograd support (#82754) (#84154)
The original author is @YifanShenSZ  and the original PR is: #82754
# Summary:
The previous reshape [https://github.com/pytorch/pytorch/issues/80981](https://github.com/pytorch/pytorch/pull/80981) is fine for forward but needs improvement for backward: it must handle the "sometimes view, sometimes copy" behavior.

This pull request fixes it by:
1. adding a new alias dispatch key `CompositeImplicitAutogradNestedTensor`, which ideally would work as the nested-tensor version of `CompositeImplicitAutograd`
2. registering `reshape_nested` to `reshape` via `CompositeImplicitAutogradNestedTensor`

Side changes:
* add contiguous memory format support to `clone_nested`
* add `view_nested`
* add `reshape_as_nested`

Fix issue [https://github.com/pytorch/pytorch/issues/83041](https://github.com/pytorch/pytorch/issues/83041)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82754

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Reviewed By: albanD

Differential Revision: D39023822

Pulled By: drisspg

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84154
Approved by: https://github.com/bdhirsh, https://github.com/albanD
2022-09-01 20:01:39 +00:00
Edward Z. Yang
f1ee162193 Use SymInt signature to compute saved variables (#84354)
This seems to have been accidentally working, but it broke
when I added support for saving optional SymInt directly
from input arguments.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84354
Approved by: https://github.com/Krovatkin
2022-09-01 16:30:00 +00:00
Elias Ellison
f701cb04fb Test Dynamo CI w Fake Tensors (#84282)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84282
Approved by: https://github.com/anijain2305
2022-09-01 00:15:05 +00:00
Nikolay Korovaiko
eda217ab67 Reland symint_numel (#84281)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84281
Approved by: https://github.com/ezyang
2022-08-30 21:53:34 +00:00
Jeff Daily
d09486ab23 [ROCm] enable nvfuser (#82498)
### Description
The nvfuser is enabled for ROCm.

### Testing
CI label ciflow/trunk covers the newly enabled ROCm functionality as well as any CUDA regressions caused by these changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82498
Approved by: https://github.com/jjsjann123, https://github.com/davidberard98
2022-08-30 21:50:39 +00:00
Nikolay Korovaiko
44a975335e Revert "Re-land sym_numel (#82374) (#82726) (#82731) (#82855)" (#84207)
This reverts commit bfebf254dd.

Differential Revision: [D39104562](https://our.internmc.facebook.com/intern/diff/D39104562)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84207
Approved by: https://github.com/robieta
2022-08-30 13:22:58 +00:00
Edward Z. Yang
ad44670fa1 Back out "Revert D38984222: Don't introduce new overload for SymInt (#83628)" (#84173)
Also Back out "Revert D39075159: [acc_tensor] Use SymIntArrayRef for overloaded empty.memory_format's signature"

Original commit changeset: dab4a9dba4fa
Original commit changeset: dcaf16c037a9

Original Phabricator Diff: D38984222
Original Phabricator Diff: D39075159

Also update Metal registrations for C++ registration changes.

Also update NNPI registration to account for tightened schema checking

Differential Revision: [D39084762](https://our.internmc.facebook.com/intern/diff/D39084762/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39084762/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84173
Approved by: https://github.com/Krovatkin
2022-08-29 18:01:07 +00:00
PyTorch MergeBot
c7edcd6968 Revert "Don't introduce new overload for SymInt (#83628)"
This reverts commit 9790d90e4b.

Reverted https://github.com/pytorch/pytorch/pull/83628 on behalf of https://github.com/malfet due to Breaks internal builds, see D39076487
2022-08-27 01:23:17 +00:00
Catherine Lee
582c0833d5 mac circleci workflows (#82780)
Add mac and ios workflows to circleci so they can be run on pull requests

m1 tests are not included because circleci doesn't have machines

Unsure how to get certain environment variables, specifically for arm64 ios builds that require env vars like `IOS_SIGN_KEY_2022` and `IOS_DEV_TEAM_ID`, which are stored in the org-member context that is not accessible to everyone.

doc regarding env vars https://docs.google.com/document/d/1J_3Z9sfu2vlHMF1fjdJfeTuxPXC6dgqJs7aU0KpYSBU/edit#

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82780
Approved by: https://github.com/malfet, https://github.com/huydhn
2022-08-26 18:48:48 +00:00
Edward Z. Yang
9790d90e4b Don't introduce new overload for SymInt (#83628)
Previously, we introduced new SymInt overloads for every function we wanted.  This led to a lot of boilerplate, and also a lot of confusion about how the overloads needed to be implemented.

This PR takes a simpler but more risky approach: just take the original function and change its ints to SymInts.

This is BC-breaking in the following ways:

* The C++ API for registering implementations for aten operators will change from int64_t to SymInt whenever you make this change. Code generated registrations in PyTorch do not change as codegen handles the translation automatically, but manual registrations will need to follow the change.  Typically, if you now accept a SymInt where you previously only took int64_t, you have to convert it back manually.  This will definitely break XLA, see companion PR https://github.com/pytorch/xla/pull/3914 Note that not all dispatch keys get the automatic translation; all the composite keys and Meta keys are modified to take SymInt directly (because they should handle them directly), and so there are adjustments for this.

This is not BC-breaking in the following ways:

* The user-facing C++ API remains compatible. Even if a function changes from int to SymInt, the default C++ binding still takes only ints (e.g., `at::empty(IntArrayRef, ...)`). To call with SymInts, you must call `at::empty_symint` instead. This involved adding two more signatures to CppSignatureGroup; in many cases I refactored code to iterate over all signatures in the group instead of hard-coding the two that previously existed.
* This is TorchScript compatible; internally we treat SymInts as ints so there is no change to what happens at runtime in TorchScript. In particular, it's OK to reference an empty schema by its old type (using int types); as long as you're not doing string equality (which you shouldn't be), these parse to the same underlying type.

Structure of the PR:

* The general strategy of this PR is that, even when you write `SymInt` inside `native_functions.yaml`, sometimes, we will treat it *as if* it were an `int`. This idea pervades the codegen changes, where we have a translation from SymInt to c10::SymInt or int64_t, and this is controlled by a symint kwarg which I added and then audited all call sites to decide which I wanted. Here are some of the major places where we pick one or the other:
  * The C++ FunctionSchema representation represents `SymInt` as `int`. There are a few places we do need to know that we actually have a SymInt and we consult `real_type()` to get the real type in this case. In particular:
    * When we do schema validation of C++ operator registration, we must compare against the true schema (as the C++ API will provide `c10::SymInt`, and this will only be accepted if the schema is `SymInt`). This is handled with `cloneWithRealTypes` before we check for schema differences.
    * In `toIValue` argument parsing, we parse against the true schema value. For backwards compatibility reasons, I do still accept ints in many places where Layout/SymInt/etc were expected. (Well, accepting int where SymInt is expected is not BC, it's just the right logic!)
  * In particular, because SymInt never shows up as type() in FunctionSchema, this means that we no longer need a dedicated Tag::SymInt. This is good, because SymInts never show up in mobile anyway.
* Changes to functorch/aten are mostly about tracking changes to the C++ API registration convention. Additionally, since SymInt overloads no longer exist, registrations for SymInt implementations are deleted. In many cases, the old implementations did not properly support SymInts; I did not add any new functionality with this PR, but I did try to annotate with TODOs where there is work to do. Finally, because the signature of the `native::` API changed from int to SymInt, I need to find alternative APIs for people who were directly calling these functions. Typically, I insert a new dispatch call when perf doesn't matter, or use the `at::compositeexplicitautograd` namespace to handle other cases.
* The change to `make_boxed_from_unboxed_functor.h` is so that we accept a plain IntList IValue anywhere a SymIntList is expected; these are read-only arguments so covariant typing is OK.
* I change how unboxing logic works slightly. Previously, we interpret the C++ type for Layout/etc directly as IntType JIT type, which works well because the incoming IValue is tagged as an integer. Now, we interpret the C++ type for Layout as its true type, e.g., LayoutType (change to `jit_type.h`), but then we accept an int IValue for it anyway. This makes it symmetric with SymInt, where we interpret the C++ type as SymIntType, and then accept SymInt and int IValues for it.
* I renamed the `empty.names` overload to `empty_names` to make it less confusing (I kept mixing it up with the real empty overload)
* I deleted the `empty.SymInt` overload, which ended up killing a pile of functions. (This was originally a separate PR but the profiler expect test was giving me grief so I folded it in.)
* I deleted the LazyDynamicOpsTest tests. These were failing after these changes, and I couldn't figure out why they used to be passing: they make use of `narrow_copy` which didn't actually support SymInts; they were immediately converted to ints.
* I bashed LTC into working. The patches made here are not the end of the story. The big problem is that SymInt translates into Value, but what if you have a list of SymInt? This cannot be conveniently represented in the IR today, since variadic Values are not supported. To work around this, I translate SymInt[] into plain int[] (this is fine for tests because LTC dynamic shapes never actually worked); but this will need to be fixed for proper LTC SymInt support. The LTC codegen also looked somewhat questionable; I added comments based on my code reading.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83628
Approved by: https://github.com/albanD, https://github.com/bdhirsh
2022-08-26 01:35:40 +00:00
Mario Lezcano
f5a3515083 Make linalg.inv composite of linalg.solve (#80074)
The `getri` kernel calls `getrs` internally, so we can do so explicitly
ourselves and save ourselves from having to maintain an extra kernel.
This way we just need to optimise `lu_factor` and `lu_solve` and `inv`
will be as efficient as it can be, as it'll be choosing the best backend
to perform the factorisation and the best backend (not necessarily the
same) to perform the solve.
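
As an illustration of the idea (not PyTorch's kernels), the following pure-Python sketch computes an inverse by factorising once and then solving against the columns of the identity. The helper names `lu_factor`, `lu_solve` and `inv` only mirror the operator names mentioned above; their bodies are assumptions written for this example.

```python
def lu_factor(a):
    """Return (lu, piv): packed LU factors and the row permutation of square matrix a."""
    n = len(a)
    lu = [row[:] for row in a]
    piv = list(range(n))
    for k in range(n):
        # Partial pivoting: pick the row with the largest entry in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, piv


def lu_solve(lu, piv, b):
    """Solve A x = b for one right-hand side using the packed factors."""
    n = len(lu)
    y = [b[piv[i]] for i in range(n)]  # apply the row permutation to b
    for i in range(n):  # forward substitution (unit lower triangle)
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(n)):  # back substitution (upper triangle)
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y


def inv(a):
    """Invert a by solving A X = I column by column, reusing one factorisation."""
    n = len(a)
    lu, piv = lu_factor(a)
    cols = [
        lu_solve(lu, piv, [1.0 if i == j else 0.0 for i in range(n)])
        for j in range(n)
    ]
    # cols[j] is column j of the inverse; transpose to row-major order.
    return [[cols[j][i] for j in range(n)] for i in range(n)]
```

Because `inv` is expressed only in terms of the factorisation and the solve, any improvement to those two steps carries over to the inverse without a dedicated kernel.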

Fixes https://github.com/pytorch/pytorch/issues/77498

The benchmarks: https://github.com/pytorch/pytorch/pull/80074#issuecomment-1164309071
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80074
Approved by: https://github.com/IvanYashchuk, https://github.com/albanD, https://github.com/malfet
2022-08-25 09:28:55 +00:00
PyTorch MergeBot
a7edf71360 Revert "Don't introduce new overload for SymInt (#83628)"
This reverts commit 8fae7027b3.

Reverted https://github.com/pytorch/pytorch/pull/83628 on behalf of https://github.com/malfet due to breaking internal builds, see https://www.internalfb.com/diff/D38984222
2022-08-25 00:49:40 +00:00
PyTorch MergeBot
5321bf52f2 Revert "Make linalg.inv composite of linalg.solve (#80074)"
This reverts commit 4737b33614.

Reverted https://github.com/pytorch/pytorch/pull/80074 on behalf of https://github.com/malfet due to Depends on the changes from https://github.com/pytorch/pytorch/pull/83628
2022-08-25 00:43:00 +00:00
Catherine Lee
4a6726a840 use condensed disabled tests file (#84017)
follow up to https://github.com/pytorch/test-infra/pull/545

then we can get rid of the non condensed version
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84017
Approved by: https://github.com/huydhn, https://github.com/janeyx99
2022-08-25 00:34:25 +00:00
Mario Lezcano
3e6e0a1d10 Support a stable double backward on linalg.det for real inputs (#80217)
The complex case still fails. I do not know why.

Fixes https://github.com/pytorch/pytorch/issues/62327
Fixes https://github.com/pytorch/pytorch/issues/53364
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80217
Approved by: https://github.com/nikitaved, https://github.com/albanD, https://github.com/malfet
2022-08-24 15:18:56 +00:00
Mario Lezcano
4737b33614 Make linalg.inv composite of linalg.solve (#80074)
The `getri` kernel calls inside `getrs` so we can do so explicitly
ourselves and save ourselves from having to maintain an extra kernel.
This way we just need to optimise `lu_factor` and `lu_solve` and `inv`
will be as efficient as it can be, as it'll be choosing the best backend
to perform the factorisation and the best backend (not necessarily the
same) to perform the solve.

Fixes https://github.com/pytorch/pytorch/issues/77498

The benchmarks: https://github.com/pytorch/pytorch/pull/80074#issuecomment-1164309071
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80074
Approved by: https://github.com/IvanYashchuk, https://github.com/albanD, https://github.com/malfet
2022-08-24 15:18:56 +00:00
Edward Z. Yang
0491e1a13a Support returning symbolic strides from t.stride() in Python (#83842)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83842
Approved by: https://github.com/albanD, https://github.com/Chillee, https://github.com/bdhirsh
2022-08-24 04:32:51 +00:00
Sergii Dymchenko
591222f5d9 Fix use-dict-literal lint (#83718)
Fix use-dict-literal pylint suggestions by changing `dict()` to `{}`. This PR should do the change for every Python file except test/jit/test_list_dict.py, where I think the intent is to test the constructor.
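
A minimal illustration of the change, assuming the simplest case the lint targets (an empty `dict()` call); the variable names are made up for this example.

```python
# Flagged by the use-dict-literal check: builds the dict through a constructor call.
options_from_call = dict()

# Preferred form: the literal produces the same empty dict without the call.
options_from_literal = {}

# Both spellings yield equal, independent dictionaries.
same_value = options_from_call == options_from_literal
distinct_objects = options_from_call is not options_from_literal
```

A file that exists to exercise the constructor itself, such as the excluded test mentioned above, is a reasonable place to keep `dict()`.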
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83718
Approved by: https://github.com/albanD
2022-08-24 00:26:46 +00:00
Edward Z. Yang
8fae7027b3 Don't introduce new overload for SymInt (#83628)
Previously, we introduced new SymInt overloads for every function we wanted.  This led to a lot of boilerplate, and also a lot of confusion about how the overloads needed to be implemented.

This PR takes a simpler but more risky approach: just take the original function and change its ints to SymInts.

This is BC-breaking in the following ways:

* The C++ API for registering implementations for aten operators will change from int64_t to SymInt whenever you make this change. Code generated registrations in PyTorch do not change as codegen handles the translation automatically, but manual registrations will need to follow the change.  Typically, if you now accept a SymInt where you previously only took int64_t, you have to convert it back manually.  This will definitely break XLA, see companion PR https://github.com/pytorch/xla/pull/3914 Note that not all dispatch keys get the automatic translation; all the composite keys and Meta keys are modified to take SymInt directly (because they should handle them directly), and so there are adjustments for this.

This is not BC-breaking in the following ways:

* The user-facing C++ API remains compatible. Even if a function changes from int to SymInt, the default C++ binding still takes only ints (e.g., `at::empty(IntArrayRef, ...)`). To call with SymInts, you must call `at::empty_symint` instead. This involved adding two more signatures to CppSignatureGroup; in many cases I refactored code to iterate over all signatures in the group instead of hard-coding the two that previously existed.
* This is TorchScript compatible; internally we treat SymInts as ints so there is no change to what happens at runtime in TorchScript. In particular, it's OK to reference an empty schema by its old type (using int types); as long as you're not doing string equality (which you shouldn't be), these parse to the same underlying type.

Structure of the PR:

* The general strategy of this PR is that, even when you write `SymInt` inside `native_functions.yaml`, sometimes, we will treat it *as if* it were an `int`. This idea pervades the codegen changes, where we have a translation from SymInt to c10::SymInt or int64_t, and this is controlled by a symint kwarg which I added and then audited all call sites to decide which I wanted. Here are some of the major places where we pick one or the other:
  * The C++ FunctionSchema representation represents `SymInt` as `int`. There are a few places we do need to know that we actually have a SymInt and we consult `real_type()` to get the real type in this case. In particular:
    * When we do schema validation of C++ operator registration, we must compare against the true schema (as the C++ API will provide `c10::SymInt`, and this will only be accepted if the schema is `SymInt`). This is handled with `cloneWithRealTypes` before we check for schema differences.
    * In `toIValue` argument parsing, we parse against the true schema value. For backwards compatibility reasons, I do still accept ints in many places where Layout/SymInt/etc were expected. (Well, accepting int where SymInt is expected is not BC, it's just the right logic!)
  * In particular, because SymInt never shows up as type() in FunctionSchema, this means that we no longer need a dedicated Tag::SymInt. This is good, because SymInts never show up in mobile anyway.
* Changes to functorch/aten are mostly about tracking changes to the C++ API registration convention. Additionally, since SymInt overloads no longer exist, registrations for SymInt implementations are deleted. In many cases, the old implementations did not properly support SymInts; I did not add any new functionality with this PR, but I did try to annotate with TODOs where there is work to do. Finally, because the signature of the `native::` API changed from int to SymInt, I need to find alternative APIs for people who were directly calling these functions. Typically, I insert a new dispatch call when perf doesn't matter, or use the `at::compositeexplicitautograd` namespace to handle other cases.
* The change to `make_boxed_from_unboxed_functor.h` is so that we accept a plain IntList IValue anywhere a SymIntList is expected; these are read-only arguments so covariant typing is OK.
* I change how unboxing logic works slightly. Previously, we interpret the C++ type for Layout/etc directly as IntType JIT type, which works well because the incoming IValue is tagged as an integer. Now, we interpret the C++ type for Layout as its true type, e.g., LayoutType (change to `jit_type.h`), but then we accept an int IValue for it anyway. This makes it symmetric with SymInt, where we interpret the C++ type as SymIntType, and then accept SymInt and int IValues for it.
* I renamed the `empty.names` overload to `empty_names` to make it less confusing (I kept mixing it up with the real empty overload)
* I deleted the `empty.SymInt` overload, which ended up killing a pile of functions. (This was originally a separate PR but the profiler expect test was giving me grief so I folded it in.)
* I deleted the LazyDynamicOpsTest tests. These were failing after these changes, and I couldn't figure out why they used to be passing: they make use of `narrow_copy` which didn't actually support SymInts; they were immediately converted to ints.
* I bashed LTC into working. The patches made here are not the end of the story. The big problem is that SymInt translates into Value, but what if you have a list of SymInt? This cannot be conveniently represented in the IR today, since variadic Values are not supported. To work around this, I translate SymInt[] into plain int[] (this is fine for tests because LTC dynamic shapes never actually worked); but this will need to be fixed for proper LTC SymInt support. The LTC codegen also looked somewhat questionable; I added comments based on my code reading.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83628
Approved by: https://github.com/albanD, https://github.com/bdhirsh
2022-08-23 22:04:07 +00:00
Driss Guessous
7cfc8b7820 [MPS] Move mps_linear to mps dispatch key (#80068)
Fixes #77394

This is related to #79920, which adds linear support for nested tensors. Codegen still throws an assert stopping this from compiling. However, I tested locally by commenting out this assert: 61305cd638/tools/autograd/gen_variable_type.py (L798)
and the intended behavior appears to be working. I am not sure what changes need to be made to codegen to make this work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80068
Approved by: https://github.com/albanD, https://github.com/malfet, https://github.com/kulinseth
2022-08-23 01:13:17 +00:00
chenlai
7aba6f8e7b Rename flatbuffer_serializer to *_mobile or *_full_jit (#82827)
The target named `flatbuffer_serializer` in fbcode depends on full jit, and the one in xplat depends on mobile only. Rename them accordingly:

```
flatbuffer_serializer in fbcode -> flatbuffer_serializer_full_jit
flatbuffer_serializer in xplat -> flatbuffer_serializer_mobile
```

so it's more readable.

Differential Revision: [D38413369](https://our.internmc.facebook.com/intern/diff/D38413369/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D38413369/)!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82827
Approved by: https://github.com/qihqi
2022-08-19 01:29:46 +00:00
Mario Lezcano
88d3acd6b1 Fix and improve the efficiency of the backward of xlog* functions. (#82713)
That is `xlogy`, `special.xlogy`, `special.xlog1py`.

Fixes https://github.com/pytorch/pytorch/issues/80770
Fixes https://github.com/pytorch/pytorch/issues/74279
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82713
Approved by: https://github.com/albanD
2022-08-18 21:55:42 +00:00
Mario Lezcano
aad89bb771 Make the derivative of masked_fill more efficient (#83515)
There's no need to add all the zeros if we extract all the non-zero
elements.
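
One way to read the sentence above, sketched on plain Python lists rather than tensors: the gradient with respect to the fill value is the sum of the incoming gradient over the masked positions, and it can be computed either by zeroing the unmasked entries and summing everything, or by summing only the selected entries. Both function names are hypothetical.

```python
def grad_value_dense(grad, mask):
    """Zero out unmasked entries, then sum over every position."""
    return sum(g if m else 0.0 for g, m in zip(grad, mask))


def grad_value_selected(grad, mask):
    """Sum only the entries where the mask is set, skipping the zeros entirely."""
    return sum(g for g, m in zip(grad, mask) if m)


grad = [1.0, 2.0, 3.0, 4.0]
mask = [True, False, True, False]
dense = grad_value_dense(grad, mask)
selected = grad_value_selected(grad, mask)
```

The two results agree; the second form simply avoids materialising and adding the zero entries.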
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83515
Approved by: https://github.com/albanD, https://github.com/soulitzer
2022-08-18 13:00:12 +00:00
Mengwei Liu
badbdb0330 [torchgen] Relax the restriction on number of custom namespaces (#83580)
Summary:
We started to see use cases that involve more than 1 custom namespace living within the same yaml file. Hence this relaxes the restriction that 1 yaml file can only have 1 custom namespace other than `aten`. Updated the unit test as well.

Differential Revision: D38775685

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83580
Approved by: https://github.com/JacobSzwejbka
2022-08-18 04:47:13 +00:00
Edward Z. Yang
52be908225 Delete unnecessary sum.SymInt overload (#83591)
Dims argument only ever takes dimensions, which we do not need
to SymInt-ify.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83591
Approved by: https://github.com/albanD
2022-08-18 02:00:50 +00:00
Jay Chae
451c6296af [kineto] deprecate USE_KINETO_UPDATED (#83305)
Summary: This is used to do cross repo updates but has not been cleaned up properly

Test Plan: CI

Reviewed By: aaronenyeshi

Differential Revision: D38633379

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83305
Approved by: https://github.com/aaronenyeshi
2022-08-17 22:31:49 +00:00
Mikayla Gawarecki
bd0ad7a84f Add backward support for rudimentary NestedTensor.sum(dim) (#82625)
Per offline discussion, this will be updated to use expand once expand semantics for nested tensor have been fleshed out.

Next steps will be to add support for other features for forward sum mentioned on #82387 and likewise update the backward

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82625
Approved by: https://github.com/albanD
2022-08-17 18:12:00 +00:00
Larry Liu
11d4d91bdc [torchgen] Add logic in annotation parser to accept alias set (#83501)
Extending the current regex in `model.py` to support annotation alias sets. See issue #83214.

Ideally we should have a full-fledged lexer similar to `schema_type_parser.cpp`, since the regex becomes increasingly difficult to read as we add more support to it.

Adding this to unblock this issue for now.
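
For illustration only, here is a small sketch of what accepting an alias set with a regex can look like. The pattern and the `parse_annotation` helper are assumptions made for this example and are not the regex used in `model.py`; they handle single-letter alias names separated by `|`, with an optional trailing `!` marking a write.

```python
import re

# Hypothetical pattern: one or more single-letter alias names separated by "|",
# optionally followed by "!" to mark the annotation as a write.
ANNOTATION_RE = re.compile(r"^([a-z](?:\|[a-z])*)(!?)$")


def parse_annotation(text):
    """Return (alias_set, is_write) for strings such as "a", "a!" or "a|b"."""
    match = ANNOTATION_RE.match(text)
    if match is None:
        raise ValueError(f"unsupported annotation: {text!r}")
    alias_set = tuple(match.group(1).split("|"))
    is_write = match.group(2) == "!"
    return alias_set, is_write
```

Even this small pattern is already dense, which is the readability concern that motivates a dedicated lexer.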
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83501
Approved by: https://github.com/SherlockNoMad
2022-08-17 07:04:25 +00:00
Justin Chu
cd68f08992 [ONNX] Update the script for version updates (#83283)
This PR updates the `tools/onnx/update_default_opset_version.py` script to ensure files are edited correctly to prepare for the opset 17 support in torch.onnx.

- (clean up) Move script to `main()`
- Add a `--skip_build` option to avoid building pytorch if we want to rerun the process due to errors after compilation is done
- Update to edit the correct files now that the onnx files were refactored
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83283
Approved by: https://github.com/thiagocrepaldi, https://github.com/AllenTiTaiWang, https://github.com/abock
2022-08-16 22:28:54 +00:00
Nikita Shulga
a8941aa996 [BE] Better test stats errors (#83484)
When `BUILD_ENVIRONMENT` is not defined, print a sensible error message,
which is better than:
```
Could not download https://raw.githubusercontent.com/pytorch/test-infra/generated-stats/stats/test-times.json because: 'BUILD_ENVIRONMENT'
```
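
A minimal sketch of the pattern, assuming a hypothetical helper `get_build_environment`; the function name and the wording of the message are not taken from the PR.

```python
import os


def get_build_environment(env=None):
    """Return BUILD_ENVIRONMENT, raising a descriptive error when it is unset."""
    env = os.environ if env is None else env
    value = env.get("BUILD_ENVIRONMENT")
    if value is None:
        # A bare env["BUILD_ENVIRONMENT"] would raise KeyError, whose message is
        # only the quoted key name, as in the unhelpful output shown above.
        raise RuntimeError(
            "BUILD_ENVIRONMENT is not set; define it before fetching test stats"
        )
    return value
```

Passing the mapping as a parameter keeps the helper easy to exercise without touching the real process environment.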

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83484
Approved by: https://github.com/huydhn, https://github.com/ZainRizvi
2022-08-16 07:51:12 +00:00
Edward Z. Yang
2d8f091f6a Move TorchDispatchModeTLS to c10/core (#83370)
I need to access it directly from TensorImpl so that TensorImpl-induced
operations can be routed directly to modes (upcoming PR).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83370
Approved by: https://github.com/zou3519
2022-08-15 17:59:57 +00:00
Mor Tzur
316cb8a06a embedded_interpreter_hip (#83329)
Summary: Adding embedded_interpreter_hip and deps to enable torch::deploy on AMD.

Test Plan: Sandcastle

Reviewed By: zrphercule

Differential Revision: D38546701

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83329
Approved by: https://github.com/jfix71
2022-08-15 15:08:55 +00:00
Mengwei Liu
d0d6b1f222 [torchgen] Generate out variant for functional operator (#81437)
Summary:
Previously we did not generate an out variant (both schema and kernel) for an operator that has only a functional variant. This adds support for that, along with a test.

## Changes on `native_function_generation.py`

We now generate an out variant for every functional variant where possible. This PR introduces many newly generated out variants, and `native_functions.yaml` needs to incorporate the changes by adding `autogen` keywords.

The logic for determining what operators we should generate an out variant for is the following:

1. No existing out variant for this `NativeFunction`
2. Contains an existing in-place, mutable or functional variant
3. Contains at least 1 tensor-like return

For operators matching the first two conditions but failing the third, I listed them in `FUNCTIONAL_OPS_THAT_CANNOT_GET_AN_OUT_VARIANT`.
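
The three conditions above can be sketched as a predicate. The `OpInfo` record and its fields are hypothetical stand-ins invented for this example; they are not torchgen's `NativeFunction` model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpInfo:
    """Hypothetical summary of one operator group, for illustration only."""
    name: str
    has_out_variant: bool
    variants: frozenset  # subset of {"inplace", "mutable", "functional"}
    num_tensor_returns: int


def should_autogen_out_variant(op):
    """Apply the three listed conditions in order."""
    if op.has_out_variant:  # 1. an out variant must not already exist
        return False
    if not op.variants & {"inplace", "mutable", "functional"}:  # 2. a source variant exists
        return False
    return op.num_tensor_returns >= 1  # 3. at least one tensor-like return
```

An operator that passes the first two checks but fails the third corresponds to the kind of entry that the message says is listed in `FUNCTIONAL_OPS_THAT_CANNOT_GET_AN_OUT_VARIANT`.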

## Special handling

The following operators satisfy all 3 criteria above, but we chose not to autogen them, for the reasons given:
* `mkldnn_adaptive_avg_pool2d`: the generated out variant `mkldnn_adaptive_avg_pool2d.out` collides with the `mkldnn_adaptive_avg_pool2d_out` kernel of the `adaptive_avg_pool2d.out` operator. I manually created `mkldnn_adaptive_avg_pool2d.out` and renamed `mkldnn_adaptive_avg_pool2d_out` to `mkldnn_adaptive_avg_pool2d_out_stub`.
* `min`, `max` and `mean`: `min.out`, `max.out` and `mean.out` already exist, but they have different semantics from the functional ones. I manually created `min.unary_out`, `max.unary_out` and `mean.dtype_out` to disambiguate.

## Autograd Changes

We introduced logic to avoid matching derivative info in `derivatives.yaml` to out variants, since we generate `NOT_IMPLEMENTED` kernels for those out variants anyway. The issue with the original logic is that it doesn't handle `TensorOption` arguments well. For example, we have these two operators:

* `_to_copy(Tensor self, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, bool non_blocking=False, MemoryFormat? memory_format=None) -> Tensor`
* `_to_copy.out(Tensor self, *, bool non_blocking=False, MemoryFormat? memory_format=None, Tensor(a!) out) -> Tensor(a!)`

If we use the `_to_copy` derivative info, there will be a compilation error since `dtype` is missing from the `_to_copy.out` signature.

Test Plan: Rely on unit test

Differential Revision: D37832342

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81437
Approved by: https://github.com/iseeyuan, https://github.com/bdhirsh
2022-08-13 05:44:53 +00:00
Nikolay Korovaiko
88d7322b07 fix a comment since the options in arg parser no longer require Declarations.yaml (#83337)
fix a comment since the options in arg parser no longer require Declarations.yaml
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83337
Approved by: https://github.com/albanD
2022-08-12 21:10:41 +00:00
Edward Z. Yang
d423722607 Add data_dependent_output tag; generalize proxy tensor to test it (#83312)
Fixes https://github.com/pytorch/pytorch/issues/83251

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83312
Approved by: https://github.com/albanD
2022-08-12 17:31:55 +00:00
Yifan Shen
7f18ef14c1 Register nested matmul as an addition to CompositeImplicit (#82786)
The initial matmul_nested in [#81957](https://github.com/pytorch/pytorch/pull/81957) is imperfect:
* it is now allowed to register another kernel in addition to CompositeImplicit
* so we should do that instead of relying on the `is_nested()` code smell
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82786
Approved by: https://github.com/albanD
2022-08-11 21:46:05 +00:00
richard
382ef1fda7 Autograd graphtask trim unnecessary edges (#82544)
### Introduction

Removing unnecessary weight gradient calculation is very important for applications that need high-order derivatives during training. However, this is not supported by the current Autograd engine.

For more detail: The backward function of a `matmul` operator (e.g., `linear` `addmm` `mm`), has two matmuls, one for `input gradient` and another for `weight gradient`. For a typical neural network (nn) with a few linear layers and activation functions, if the user calls `torch.autograd.grad()` to calculate the derivative of the nn output `y` w.r.t the nn input `x`,  only the `input gradient` of the `matmul` operator is needed, and the `weight gradient` is discarded. However, the current PyTorch autograd engine will always calculate the `weight gradient` if `weight` requires gradient (the calculation of the high-order derivative is performed during training).

The figure attached shows the autograd graph of the following code snippet:
```py
y = torch.nn.functional.linear(x, weight, bias)
y = y.pow(2)
# first order derivative
y__x, = torch.autograd.grad(y, x, grad_outputs=grad_outputs, create_graph=True)
# second order derivative
y__x__x, = torch.autograd.grad(y__x, x, grad_outputs=grad_outputs, create_graph=True)
```
The path marked in the figure is not needed when calculating derivatives.

<img width="50%" alt="image" src="https://user-images.githubusercontent.com/9999318/182018117-719c5a23-bcc6-4a63-8e8d-1bca3ebda2e3.png">

### Issue
Related issue: https://github.com/pytorch/pytorch/issues/56500

### Method
When calling `torch.autograd.grad`, `exec_info_` is created for each GraphTask, which allows filtering paths on the graph that are not needed. However, when the GraphTask calls into the node, the node still does not know whether the edges are needed or not. In the case of matmul, `weight.requires_grad is True` so the weight gradient is always calculated.

Following https://github.com/pytorch/pytorch/issues/56500#issuecomment-825694656, this PR passes the graph task's thread_local `exec_info_` into the node, so it could trim unnecessary edges during `torch.autograd.grad` calls.
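
The idea can be illustrated with a toy, pure-Python sketch. It is not the autograd engine: `linear_backward` and the `needed` set are hypothetical names standing in for a node's backward and for the information that the graph task's `exec_info_` carries.

```python
# Toy illustration only: a linear layer y = x @ w whose backward skips
# gradients that the caller did not ask for. All names are hypothetical.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def linear_backward(grad_out, x, w, needed):
    """Return (grad_x, grad_w), computing only the entries named in `needed`."""
    grad_x = matmul(grad_out, transpose(w)) if "x" in needed else None
    grad_w = matmul(transpose(x), grad_out) if "w" in needed else None
    return grad_x, grad_w


x = [[1.0, 2.0]]        # shape (1, 2)
w = [[3.0], [4.0]]      # shape (2, 1)
grad_out = [[1.0]]      # shape (1, 1)

# Differentiating w.r.t. the input only: the weight-gradient matmul is skipped.
grad_x, grad_w = linear_backward(grad_out, x, w, needed={"x"})
```

Without the `needed` information the node would have to compute both products, which is the extra work this PR aims to avoid.
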

### Benchmark
Benchmark script: https://gist.github.com/yueyericardo/24158433a2021c51eeef9c3e2722df99

Benchmark result:
6 hidden layers, batch size 10000, on A100

FP32 result
| hessian benchmark             | FP32 (before) | FP32 (after)      | FP32 (Functorch v0.1.1) |
| ----------------------------- | ------------- | ----------------- | ----------------------- |
| Linear + ReLU (no backward)   | 55.658 ms     | 29.392 ms (1.90X) | 29.547 ms (1.90X)       |
| Linear + ReLU (with backward) | 81.173 ms     | 54.917 ms (1.47X) | 68.988 ms (1.18X)       |

TF32 result
| hessian benchmark             | TF32 (before) | TF32 (after)      | TF32 (Functorch v0.1.1) |
| ----------------------------- | ------------- | ----------------- | ----------------------- |
| Linear + ReLU (no backward)   | 19.801 ms     | 11.259 ms (1.76X) | 10.754 ms (1.84X)       |
| Linear + ReLU (with backward) | 29.167 ms     | 20.466 ms (1.42X) | 22.784 ms (1.28X)       |

For the FP32 results, we get a 1.9X speedup for the Hessian calculation and a 1.47X speedup during training, which is even faster than the functorch `vmap(jacfwd(jacrev` implementation. (functorch has a performance regression in v0.2.0, https://github.com/pytorch/functorch/issues/989, so we use v0.1.1 for the benchmark.)

@zou3519 does functorch also include similar optimizations during Hessian calculation? If not, what do we need to do so that functorch can also benefit from this PR?

### Testing

- [x] we need to figure out a way to unit test this

### Thanks
Thanks for the great blog: [How Computational Graphs are Executed in PyTorch | PyTorch](https://pytorch.org/blog/how-computational-graphs-are-executed-in-pytorch/)

cc @zasdfgbnm @albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82544
Approved by: https://github.com/soulitzer
2022-08-11 18:50:09 +00:00
Mengwei Liu
c322fc03a1 [torchgen] Fix selective build error on custom namespace (#83141)
Summary: Currently `SelectiveBuilder` hardcodes the namespace `aten` for operators. This no longer works now that operators can have custom namespaces. This PR fixes it.
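
A minimal sketch of the difference, assuming operator names of the form `namespace::name`. The helpers `split_operator_name` and `is_selected` are hypothetical and are not the `SelectiveBuilder` API.

```python
# Hypothetical helpers: keep the operator's own namespace instead of
# assuming "aten", which is used only as a fallback for bare names.

def split_operator_name(qualified_name, default_namespace="aten"):
    """Split 'ns::op' into (ns, op); bare names get the default namespace."""
    namespace, sep, name = qualified_name.rpartition("::")
    if not sep:
        return default_namespace, qualified_name
    return namespace, name


def is_selected(qualified_name, selected_ops):
    """Check membership using the operator's own namespace."""
    namespace, name = split_operator_name(qualified_name)
    return f"{namespace}::{name}" in selected_ops


selected = {"aten::add", "custom::add_3"}
```

With a hardcoded `aten` prefix, `custom::add_3` would be looked up as an `aten` operator and never match its own entry.
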

Test Plan: Rely on newly added unit test

Differential Revision: D38565527

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83141
Approved by: https://github.com/JacobSzwejbka
2022-08-10 21:27:05 +00:00
PyTorch MergeBot
f534b2c627 Revert "Remove split functional wrapper (#74727)"
This reverts commit a58876ace7.

Reverted https://github.com/pytorch/pytorch/pull/74727 on behalf of https://github.com/seemethere due to Fails internal use cases, might extend out to external use cases as well. Need to assess overall impact of this change more widely
2022-08-10 19:45:23 +00:00
Mikayla Gawarecki
e3e33cfae0 Enable codegen of per-dispatch key derivative formulas in derivatives.yaml (#82801)
`derivatives.yaml` can now take a `dispatch` entry which registers per-autograd dispatch key derivatives such as
```
name: foo(Tensor self, Tensor y) -> Tensor
dispatch:
  Default:
    self: grad
    y: grad.expand(y.sizes())
  AutogradNestedTensor:
    self: grad
    y: NestedTensor_foo_backward(grad, y)
output_differentiability: [True]
```

The old schema, which has no `dispatch` entry, is still supported.
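
One way to picture how both shapes can coexist is to normalize every entry into a mapping from dispatch key to formulas. The sketch below works on already-parsed dictionaries; `formulas_by_dispatch_key` and `NON_FORMULA_KEYS` are hypothetical and this is not the actual torchgen loader.

```python
# Hypothetical normalization of a parsed derivatives.yaml entry into
# {dispatch_key: {argument_name: formula}}. Illustration only.

NON_FORMULA_KEYS = {"name", "dispatch", "output_differentiability"}


def formulas_by_dispatch_key(entry):
    if "dispatch" in entry:
        # New schema: formulas are grouped per autograd dispatch key.
        return {key: dict(formulas) for key, formulas in entry["dispatch"].items()}
    # Old schema: formulas sit at the top level and apply to the default key.
    return {"Default": {k: v for k, v in entry.items() if k not in NON_FORMULA_KEYS}}


old_entry = {
    "name": "foo(Tensor self, Tensor y) -> Tensor",
    "self": "grad",
    "y": "grad.expand(y.sizes())",
}

new_entry = {
    "name": "foo(Tensor self, Tensor y) -> Tensor",
    "dispatch": {
        "Default": {"self": "grad", "y": "grad.expand(y.sizes())"},
        "AutogradNestedTensor": {"self": "grad", "y": "NestedTensor_foo_backward(grad, y)"},
    },
    "output_differentiability": [True],
}

old_formulas = formulas_by_dispatch_key(old_entry)
new_formulas = formulas_by_dispatch_key(new_entry)
```
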

Would greatly appreciate feedback on *how to improve the testing strategy* of this PR. I have currently registered an aten test op in TestOps.cpp with dummy gradients in derivatives.yaml and have some tests in test_autograd.py:TestAutogradMultipleDispatch, but I am not sure whether these are sufficiently rigorous.

Additionally, this PR also makes the assumption that sets like [VIEW_FUNCTIONS](ff5399e528/tools/autograd/gen_inplace_or_view_type.py (L60)) are per-native-function and not per-native-function-and-dispatch-key. I'm not sure whether this is necessarily the case: *would there ever be a situation where, e.g., a nested_tensor op is a view op but the aten function is not, or vice versa?*

* __->__ #82801
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82801
Approved by: https://github.com/bhosmer, https://github.com/albanD
2022-08-10 19:26:29 +00:00
Peter Bell
a58876ace7 Remove split functional wrapper (#74727)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74727
Approved by: https://github.com/albanD, https://github.com/khabinov
2022-08-10 17:57:48 +00:00
Kurt Mohler
be5b3df6cc Update std_mean/var_mean/nanmean/nansum signatures with int[1]? dim (#82912)
### Description
Change the type of the `dim` arg for `std_mean/var_mean/nanmean/nansum` to `int[1]?` in `native_functions.yaml`

### Issue
Part of #29137

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82912
Approved by: https://github.com/albanD
2022-08-10 16:58:26 +00:00
Nicolas Macchioni
b236352036 Add mask identifier for multiplexed src_mask/src_key_padding_mask in BT (#81947)
Summary:
Transformer fastpath multiplexes two arguments, src_mask [seq_len x seq_len] and src_key_padding_mask [batch_size x seq_len], and later deduces the type based on mask shape.

In the event that batch_size == seq_len, any src_mask is wrongly interpreted as a src_key_padding_mask. This is fixed by requiring a mask_type identifier be supplied whenever batch_size == seq_len.
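
The ambiguity can be sketched in a few lines. This is an illustration under assumptions: `identify_mask` and the integer codes are hypothetical and do not reproduce the fastpath's actual interface.

```python
# Hypothetical sketch of shape-based mask identification. The names and the
# integer codes are illustrative, not the actual kernel interface.

SRC_MASK = 0              # shape [seq_len, seq_len]
SRC_KEY_PADDING_MASK = 1  # shape [batch_size, seq_len]


def identify_mask(mask_shape, batch_size, seq_len, mask_type=None):
    looks_like_src_mask = mask_shape == (seq_len, seq_len)
    looks_like_padding_mask = mask_shape == (batch_size, seq_len)
    if looks_like_src_mask and looks_like_padding_mask:
        # Only possible when batch_size == seq_len: shape alone cannot decide.
        if mask_type is None:
            raise ValueError("mask_type is required when batch_size == seq_len")
        return mask_type
    if looks_like_src_mask:
        return SRC_MASK
    if looks_like_padding_mask:
        return SRC_KEY_PADDING_MASK
    raise ValueError(f"unexpected mask shape {mask_shape}")
```

When the two sizes differ, the shape is enough; when they coincide, only the explicit identifier disambiguates.
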

Additionally, added support for src_mask in masked_softmax CPU path.

Test Plan: existing unit tests + new unit tests (batch_size == seq_len)

Differential Revision: D37932240

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81947
Approved by: https://github.com/zrphercule
2022-08-09 23:42:16 +00:00
soulitzer
b55f9047e1 Add forward AD support for elu_, celu_, selu_ (#83080)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83080
Approved by: https://github.com/albanD
2022-08-09 20:15:44 +00:00
Natalia Gimelshein
e77d4ec5eb fix where backward to use scalar 0 (#83043)
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83043
Approved by: https://github.com/Chillee
2022-08-09 16:27:44 +00:00
Shunting Zhang
943553965e support custom class in torchgen schema parser (#82925)
Differential Revision: [D38480514](https://our.internmc.facebook.com/intern/diff/D38480514/)

torchgen schema parser does not support parsing function schemas using custom class so far. Here is an example:
```
quantized::conv2d_relu.new(Tensor qx, __torch__.torch.classes.quantized.Conv2dPackedParamsBase packed_weight, float output_scale, int output_zero_point) -> (Tensor)
```

This PR parses the custom class name and encapsulates it in an object of CustomClassType. The only thing we need right now is to store the class name as a string and return it from the `__str__` method.
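
A minimal sketch of such a type, assuming custom classes are recognized by the `__torch__.torch.classes.` prefix seen in the example schema. The helper `parse_custom_class` is hypothetical, and whether `__str__` includes the prefix is also an assumption.

```python
from dataclasses import dataclass

# Prefix taken from the example schema above; treating it as the marker for
# custom classes is an assumption of this sketch.
CUSTOM_CLASS_PREFIX = "__torch__.torch.classes."


@dataclass(frozen=True)
class CustomClassType:
    class_name: str

    def __str__(self):
        # Round-trips to the spelling used in the schema string.
        return f"{CUSTOM_CLASS_PREFIX}{self.class_name}"


def parse_custom_class(type_str):
    """Return a CustomClassType for a custom class type string, else None."""
    if type_str.startswith(CUSTOM_CLASS_PREFIX):
        return CustomClassType(type_str[len(CUSTOM_CLASS_PREFIX):])
    return None


packed = parse_custom_class("__torch__.torch.classes.quantized.Conv2dPackedParamsBase")
```
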
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82925
Approved by: https://github.com/ezyang, https://github.com/bdhirsh
2022-08-08 22:24:43 +00:00
Nikolay Korovaiko
35b4ac4eeb remove unused/debug header (#82845)
### Description
Missed one of the review comments in https://github.com/pytorch/pytorch/pull/82731, namely to remove an unused `<iostream>` include that had been used for debugging.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82845
Approved by: https://github.com/Chillee, https://github.com/albanD
2022-08-08 21:40:17 +00:00
Peter Bell
4f255dbfb3 Remove manual bindings for arange (#81380)
The functional variant of one of the `arange` overloads has a schema mismatch with the out variant. The functional one has `Scalar step`, but the corresponding out variant has `Scalar step=1`. This isn't allowed, so it had to be special-cased in the python codegen and manually bound. This adds the default `step` value to the functional overload and removes the special-casing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81380
Approved by: https://github.com/ngimel
2022-08-07 00:10:27 +00:00
Peter Bell
adc5e7d32e Remove manual bindings for linspace, logspace and full (#81378)
These functions are bound manually because their default dtype isn't
always the same as `torch.get_default_dtype()`. This was necessary
because the python binding codegen effectively translated
`ScalarType? dtype=None` to `ScalarType dtype=torch.get_default_dtype()`.

I've fixed the python bindings generator to correctly pass through
`None`, and thus we can safely remove the manual bindings.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81378
Approved by: https://github.com/ngimel
2022-08-07 00:10:27 +00:00
Nikolay Korovaiko
bfebf254dd Re-land sym_numel (#82374) (#82726) (#82731) (#82855)
### Description
This is a reland of (#82374) (#82726) (#82731)
This PR has no extra fixes; it simply updates the **correct** pin to point to the XLA side that has the corresponding changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82855
Approved by: https://github.com/ezyang, https://github.com/qihqi
2022-08-05 03:36:09 +00:00
PyTorch MergeBot
78bd95b13a Revert "Re-land sym_numel (#82374) (#82726) (#82731)"
This reverts commit c90e00cf85.

Reverted https://github.com/pytorch/pytorch/pull/82731 on behalf of https://github.com/zengk95 due to This is breaking XLA  tests on trunk. It seems to have passed on PR and was able to checkout that commit c90e00cf85.
2022-08-04 22:45:26 +00:00
Nikolay Korovaiko
c90e00cf85 Re-land sym_numel (#82374) (#82726) (#82731)
This PR relands sym_numel #82374 and fixes the iOS build break in commit 8cbd0031c5,
which was a type mismatch in an equality comparison.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82731
Approved by: https://github.com/malfet
2022-08-04 21:05:24 +00:00
Michael Gschwind
82f558feee Allow user to assert no mask contiguous check is necessary (#82533)
Summary:
Allow user to assert no mask contiguous check is necessary:
(1) prevents a sync event, which would disrupt CUDA Graph collection, and
(2) offers slightly better performance by avoiding a sync

This needs to be a separate opt-in option because it changes the behavior for malformed masks. It's the only way to get BT into CUDA Graph, based on what I understood about CUDA Graph collection from ngimel.

Test Plan: sandcastle unit tests

Differential Revision: D38040418

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82533
Approved by: https://github.com/jbschlosser, https://github.com/zrphercule
2022-08-04 17:57:57 +00:00
zengk95
d0e6e5a5bb Revert "sym_numel (#82374)" (#82726)
TSIA

It looks like PR #82374 is breaking mac builds on trunk, but I can't revert it normally since there's a merge conflict in the XLA hash.
<img width="1753" alt="image" src="https://user-images.githubusercontent.com/34172846/182644661-b7fdda4b-e5ce-45c3-96a2-ad6737d169ae.png">

I reverted it and resolved the conflict using the old XLA hash that this commit was based upon
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82726
Approved by: https://github.com/albanD, https://github.com/janeyx99
2022-08-03 15:23:47 +00:00
Nikolay Korovaiko
fd68b0931f sym_numel (#82374)
### Description
This PR makes `numel` symint-aware, like `sym_sizes()` and `sym_strides()`; it is similar to https://github.com/pytorch/pytorch/pull/81300. This PR is part of a bigger project to support dynamic_shapes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82374
Approved by: https://github.com/ezyang
2022-08-03 06:33:45 +00:00
Edward Z. Yang
df69660832 Revert "Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552)"" (#82599)
This reverts commit 532b8a9e00.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82599
Approved by: https://github.com/albanD
2022-08-02 19:37:02 +00:00
Elias Ellison
642aed8b99 Add Autocast Support for FakeTensors / use fake device dispatch keys (#82449)
From PR:
```
Note: [Fake Tensor Dispatch Keys]
In order to model the behavior of device-specific autocast
and autograd logic, we update the dispatch keys of FakeTensors
to reflect their fake device. This includes the BackendComponent
(DispatchKey::Meta -> DispatchKey::CUDA), and also the BackendComponent
related Autocast and Autograd keys. __torch_dispatch__ sits below
Autocast and Autograd, and is only invoked when we are at the
kernel for the BackendComponent. Then, we add Meta to the
thread-local dispatch include set to hit the meta kernel
instead of the kernel of the BackendComponent for the fake device.
```

Also adds the `conv1/2/3d.padding` operators to the Autocast rule set. Without that fix, the FakeTensor dtype would diverge.

See: https://github.com/pytorch/pytorch/issues/81608

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82449
Approved by: https://github.com/ezyang
2022-08-01 21:40:36 +00:00
PyTorch MergeBot
532b8a9e00 Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552)"
This reverts commit 9465c0e0b5.

Reverted https://github.com/pytorch/pytorch/pull/82552 on behalf of https://github.com/zengk95 due to This seems to be breaking windows binary wheels
2022-08-01 20:25:35 +00:00
kshitij12345
e93b5210ec [composite compliance] allclose, linalg_eig (#82437)
Ref: #69991

Make `allclose` CompositeExplicit because it calls `item` (which we can't avoid), and that makes it non-Composite-Compliant.

`linalg_eig` backward passes CompositeCompliance as it calls `allclose`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82437
Approved by: https://github.com/zou3519
2022-08-01 18:01:15 +00:00
Edward Z. Yang
9465c0e0b5 Add a lint rule for torch/csrc/util/pybind.h include (#82552)
We define specializations for pybind11 defined templates
(in particular, PYBIND11_DECLARE_HOLDER_TYPE) and consequently
it is important that these specializations *always* be #include'd
when making use of pybind11 templates whose behavior depends on
these specializations, otherwise we can cause an ODR violation.

The easiest way to ensure that all the specializations are always
loaded is to designate a header (in this case, torch/csrc/util/pybind.h)
that ensures the specializations are defined, and then add a lint
to ensure this header is included whenever pybind11 headers are
included.

The existing grep linter didn't have enough knobs to do this
conveniently, so I added some features.  I'm open to suggestions
for how to structure the features better.  The main changes:

- Added an --allowlist-pattern flag, which turns off the grep lint
  if some other line exists.  This is used to stop the grep
  lint from complaining about pybind11 includes if the util
  include already exists.

- Added --match-first-only flag, which lets grep only match against
  the first matching line.  This is because, even if there are multiple
  includes that are problematic, I only need to fix one of them.
  We don't /really/ need this, but when I was running lintrunner -a
  to fixup the preexisting codebase it was annoying without this,
  as the lintrunner overall driver fails if there are multiple edits
  on the same file.
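
The behavior of the two flags described above can be sketched as follows. This is an illustration, not the actual grep linter: `grep_lint` and its parameters are hypothetical stand-ins for `--allowlist-pattern` and `--match-first-only`.

```python
import re


def grep_lint(lines, pattern, allowlist_pattern=None, match_first_only=False):
    """Return the 1-based numbers of lines matching `pattern`.

    If any line matches `allowlist_pattern`, the file is considered fine and
    nothing is reported. With `match_first_only`, only the first hit is kept.
    """
    if allowlist_pattern and any(re.search(allowlist_pattern, line) for line in lines):
        return []
    hits = [number for number, line in enumerate(lines, start=1)
            if re.search(pattern, line)]
    return hits[:1] if match_first_only else hits


PYBIND_INCLUDE = r"#include <pybind11/"
UTIL_INCLUDE = r"#include <torch/csrc/util/pybind\.h>"

unfixed = [
    "#include <pybind11/pybind11.h>",
    "#include <pybind11/stl.h>",
    "int x;",
]
fixed = ["#include <torch/csrc/util/pybind.h>"] + unfixed

all_hits = grep_lint(unfixed, PYBIND_INCLUDE, allowlist_pattern=UTIL_INCLUDE)
first_hit = grep_lint(unfixed, PYBIND_INCLUDE, allowlist_pattern=UTIL_INCLUDE,
                      match_first_only=True)
no_hits = grep_lint(fixed, PYBIND_INCLUDE, allowlist_pattern=UTIL_INCLUDE)
```

Reporting only the first hit means an automatic fixer produces a single edit per file, which is the situation the second bullet describes.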

I excluded any files that didn't otherwise have a dependency on
torch/ATen, this was mostly caffe2 and the valgrind wrapper compat
bindings.

Note the grep replacement is kind of crappy, but clang-tidy lint
cleaned it up in most cases.

See also https://github.com/pybind/pybind11/issues/4099

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82552
Approved by: https://github.com/albanD
2022-08-01 17:16:58 +00:00
Edward Z. Yang
a9320e6d96 Delete SymInt::data() in favor of as_int_unchecked() (#82477)
I audited all the sites while I was at it, and marked a few suspicious
ones.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82477
Approved by: https://github.com/Chillee
2022-08-01 15:07:22 +00:00
Edward Z. Yang
50e8abbcad Change SymIntNode into an intrusive pointer (#82548)
This will make the pointer type a single word, which is important
for packing it into an int64_t

This time, this diff doesn't segfault when you build with DEBUG mode; more details at https://github.com/pybind/pybind11/issues/4099

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82548
Approved by: https://github.com/albanD
2022-08-01 15:07:21 +00:00
YifanShenSZ
4bb7e148c4 add nested tensor matmul support (#81957)
There was a discussion on whether to let nested tensor `reshape` support collapsing and splitting dimension 0. The conclusion was to keep `reshape` simple, so we need a tweaked `matmul` that only supports the 3+ dimensional non-broadcast case, i.e. a generalized `bmm`.
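
As a toy model of the "generalized `bmm`" idea, a nested tensor can be pictured as a Python list of matrices whose shapes may differ per entry. The functions below are hypothetical and only illustrate the non-broadcast semantics; they are not the kernel added by this PR.

```python
# Toy model: a "nested tensor" is a list of 2-D matrices (lists of rows) whose
# shapes may differ per entry. matmul multiplies matching entries pairwise.

def matmul2d(a, b):
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def nested_matmul(lhs, rhs):
    # No broadcasting: both operands need the same number of entries.
    if len(lhs) != len(rhs):
        raise ValueError("operands must have the same number of entries")
    return [matmul2d(a, b) for a, b in zip(lhs, rhs)]


lhs = [[[1, 2]], [[1, 0], [0, 1]]]          # entry shapes (1, 2) and (2, 2)
rhs = [[[3], [4]], [[5, 6], [7, 8]]]        # entry shapes (2, 1) and (2, 2)
product = nested_matmul(lhs, rhs)           # entry shapes (1, 1) and (2, 2)
```
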

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81957
Approved by: https://github.com/jbschlosser
2022-07-30 22:35:09 +00:00
Kurt Mohler
14d0296e5c Rename _Typed/_UntypedStorage to Typed/UntypedStorage and update docs (#82438)
### Description

Since the major changes for `_TypedStorage` and `_UntypedStorage` are now complete, they can be renamed to be public.

`TypedStorage._untyped()` is renamed to `TypedStorage.untyped()`.

Documentation for storages is improved as well.

### Issue
Fixes #82436

### Testing
N/A

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82438
Approved by: https://github.com/ezyang
2022-07-30 19:37:08 +00:00
Mengwei Liu
301fe8c27d [torchgen] Fix multiple backends with custom namespace (#82133)
Summary:
Some quantized operators need the `QuantizedCPU` backend. Due to an issue in namespace checking, if a native function currently has two backends as well as a custom namespace, codegen hits an assertion error. This PR fixes the issue.

The root cause is that codegen currently asserts that a native function has only one namespace. If a native function is not found in a `BackendIndex`, we use the default namespace for that backend, for fallback kernels. However, that default namespace may not be listed in the yaml file, and it should not be counted when checking whether we have two different namespaces for that backend. In our error case, we have 2 `BackendIndex`es, one for `QuantizedCPU` and one for `CPU`. The native function doesn't have a kernel in `QuantizedCPU`, but we still use a default namespace (`at::native`) for it. Since we have a custom namespace for dispatch key `CPU`, we ran into the assertion error.

This PR changes the assertion criteria. We only error out if a native function has two or more kernels and they have two or more different namespaces.
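
The revised criterion can be sketched as below. The function `check_single_namespace` and the dictionary shape are hypothetical; the point is that only explicitly listed kernels are counted, so a backend that would merely fall back to a default namespace does not contribute.

```python
# Hypothetical sketch. `kernel_namespaces` maps dispatch key -> namespace for
# kernels that are explicitly listed. Backends without a kernel are absent,
# so no default namespace is injected for them.

def check_single_namespace(func_name, kernel_namespaces):
    namespaces = set(kernel_namespaces.values())
    # Two different namespaces can only arise from two or more kernels.
    if len(namespaces) >= 2:
        raise AssertionError(
            f"{func_name} has kernels in multiple namespaces: {sorted(namespaces)}"
        )
    return namespaces


# The error case from the description: a CPU kernel in a custom namespace and
# no QuantizedCPU kernel at all. This now passes.
ok = check_single_namespace("custom_op", {"CPU": "custom"})
```
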

Test Plan: rely on newly added unit test

Differential Revision: D38101345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82133
Approved by: https://github.com/iseeyuan
2022-07-29 22:53:58 +00:00
Shintaro Iwasaki
ccd30a12a2 [PyTorch][Kineto] add ActivityType.h when USE_KINETO is not set (#82028)
Summary:
This patch fixes an error "'ActivityType.h' file not found" when `use_kineto()` is false.

## Problem
Even when `use_kineto()` is not set (i.e., `-DUSE_KINETO` is not passed), `ActivityType.h` is required for PyTorch compilation:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/profiler/kineto_shim.h#L15

## Solution
Add the `ActivityType.h` dependency even when `use_kineto() == False`.

Test Plan: PyTorch internal and external CI tests.

Differential Revision: D38090153

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82028
Approved by: https://github.com/kit1980, https://github.com/robieta
2022-07-29 20:57:59 +00:00
Peter Bell
ba4727d4e5 Codegen: Parse deprecated signatures as a full FunctionSchema (#82179)
Deprecated signatures are currently "parsed" manually to find the
relative order of the argument names and all other information is
inferred from the aten schema for the non-deprecated overload.
However, this leads to problems if the argument names don't match or
if there are multiple candidates that match the ATen function call.

Instead, this makes the deprecated function a full FunctionSchema and
so the entire python signature comes solely from the deprecated
schema, with the `aten:` clause only used for the dispatch lambda call.

I have confirmed locally that there is no change to
`python_torch_functionsEverything.cpp`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82179
Approved by: https://github.com/albanD
2022-07-29 17:19:54 +00:00
Kurt Mohler
2bfae07a79 Enable dim=None for torch.mean (#81286)
Part of #79525

This will require coordination with XLA before merging, just like #79881
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81286
Approved by: https://github.com/albanD
2022-07-28 22:34:56 +00:00
Edward Z. Yang
fd5ac1e6b5 Rename SymbolicIntNode to SymIntNodeImpl (#82350)
Done via

```
git grep -l 'SymbolicIntNode' | xargs sed -i 's/SymbolicIntNode/SymIntNodeImpl/g'
```

Reasoning for the change:

* Sym is shorter than Symbolic, and consistent with SymInt
* You usually will deal in shared_ptr<...>, so we're going to
  reserve the shorter name (SymIntNode) for the shared pointer.

But I don't want to update the Python name, so afterwards I ran

```
 git grep -l _C.SymIntNodeImpl | xargs sed -i 's/_C.SymIntNodeImpl/_C.SymIntNode/'
```

and manually fixed up the binding code

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82350
Approved by: https://github.com/Krovatkin
2022-07-28 18:27:45 +00:00
PyTorch MergeBot
40a0150f8b Revert "libtorch: exclude from libomnibus to support multipy usage from pybind (#81672)"
This reverts commit 0933c037e7.

Reverted https://github.com/pytorch/pytorch/pull/81672 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2022-07-28 17:59:16 +00:00
Catherine Lee
86f038dd56 download test times during build to avoid race conditions (#81915)
After https://github.com/pytorch/pytorch/pull/81116, we started pulling test times straight from the source instead of first downloading them in the build job and then having the test job take the build job's version. This can cause an issue where different shards pull different versions of the file, leading to incorrect sharding (e.g., two shards accidentally running the same test file). This generally happens if the test jobs run while the test times file is being updated (unlikely, but not impossible) or if someone reruns a test job the next day.

In this PR, I return to the old method of downloading the test times file during the build job and having the test jobs pull from the build job's uploaded artifacts. If there is no test times file in the build job's artifacts, we fall back to the default sharding plan.
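
To see why every shard must read the same file, consider a sketch in which each shard computes the full plan itself and then runs only its own slice. `shard_tests` is hypothetical, and the round-robin fallback is an assumption standing in for the default sharding plan.

```python
# Hypothetical sharding sketch: every shard computes the whole plan locally
# and picks its own slice, so all shards must see identical timing data.

def shard_tests(tests, num_shards, test_times=None):
    shards = [[] for _ in range(num_shards)]
    if not test_times:
        # Assumed fallback: round-robin over sorted test names.
        for index, test in enumerate(sorted(tests)):
            shards[index % num_shards].append(test)
        return shards
    loads = [0.0] * num_shards
    # Greedy plan: longest tests first, each to the currently lightest shard.
    for test in sorted(tests, key=lambda t: (-test_times.get(t, 0.0), t)):
        lightest = loads.index(min(loads))
        shards[lightest].append(test)
        loads[lightest] += test_times.get(test, 0.0)
    return shards


tests = ["a", "b", "c", "d"]
times_v1 = {"a": 10, "b": 1, "c": 1, "d": 1}
times_v2 = {"a": 1, "b": 10, "c": 1, "d": 1}   # a later version of the file

plan_v1 = shard_tests(tests, 2, times_v1)
plan_v2 = shard_tests(tests, 2, times_v2)

# Shard 0 read version 1 while shard 1 read version 2.
actually_run = plan_v1[0] + plan_v2[1]
```

In this example test `a` runs on both shards and test `b` runs on neither, which is the kind of incorrect sharding described above.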

Notes:
* script moved to a new file to avoid needing to import torch, which would require torch to be built, which can cause issues with asan
* I got errors with asan (`ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.`), so I put the script at the beginning of the build

### Test Plan
Verified that the number of tests run in the pull and trunk workflows is similar to that of workflows run on master. Checked logs to see if artifacts were being used for sharding. Spot checked a few test configs to check that their lists of selected tests didn't overlap.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81915
Approved by: https://github.com/huydhn
2022-07-28 16:35:01 +00:00
Edward Z. Yang
d38ffa6a4c Make all of new_/_like factory functions composite explicit autograd (#82238)
Once CompositeImplicitAutograd gets registered to Python key, this will
ensure that tensor subclasses can interpose on these functions directly
rather than getting decomposed.  We prefer not decomposing as these
functions are functional, but their implementations use inplace
operations (and are thus more difficult to deal with, unless you use
functionalization.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82238
Approved by: https://github.com/zou3519, https://github.com/bdhirsh
2022-07-27 18:33:46 +00:00
Tristan Rice
0933c037e7 libtorch: exclude from libomnibus to support multipy usage from pybind (#81672)
Summary: When libtorch is bundled into libomnibus all of the symbols are marked as unexported which causes issues when deploy/multipy tries to link in a subinterpreter at runtime. This excludes `libtorch` and `ATen-core` from libomnibus so the symbols remain exported and available.

Test Plan:
stacked diff

```
buck2 test @//mode/opt -c python.package_style=inplace //multipy/runtime:test_deploy_from_python
```

Differential Revision: D37946374

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81672
Approved by: https://github.com/PaliC
2022-07-27 17:27:57 +00:00
Xinya Zhang
ec99a8003a [ROCM] Improvements of incremental hipification and build (#82190)
### Description
Improve the incremental build process on ROCM by eliminating unnecessary file changes.

### Issue
N/A

### Testing
1. Run `python tools/amd_build/build_amd.py --out-of-place-only` multiple times, and ensure that the file `third_party/gloo/cmake/Modules/Findrccl.cmake` does not contain patterns like `RCCL_LIBRARY_PATH_PATH`
2. Run `python tools/amd_build/build_amd.py; USE_ROCM=1 python3 setup.py develop` twice, and confirm the second run does not trigger the compiling of thousands of files.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82190
Approved by: https://github.com/jithunnair-amd, https://github.com/ezyang
2022-07-27 13:37:40 +00:00
Horace He
fc389cc0a0 Added new_empty.symint overload and a new_empty ref (#82049)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82049
Approved by: https://github.com/ezyang
2022-07-27 00:31:57 +00:00
PyTorch MergeBot
6c10a598ca Revert "add nested tensor matmul support (#81957)"
This reverts commit 7bdafed4f1.

Reverted https://github.com/pytorch/pytorch/pull/81957 on behalf of https://github.com/osalpekar due to Reverting this in order to revert https://github.com/pytorch/pytorch/pull/80981 cleanly. That diff caused GPU Inference breakage internally
2022-07-26 21:10:28 +00:00
Nikolay Korovaiko
d2c47d559c Revert "Revert "Enabling SymInt in autograd; take 3 (#81145)"" ; make sure is_intlist checks for symintnodes (#82189)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82189
Approved by: https://github.com/ezyang
2022-07-26 20:47:11 +00:00
Nikolay Korovaiko
30e74be784 a new section for ir generation (#81847)
This is to get a conversation started.

* @JackCaoG we could add attributes to items in `ir_codegen` section to customize IR generation logic (e.g. not generating `::Lower`). Though it could be a bit tricky to thread it through.
* Adding an extra argument to `map_codegen` to filter native functions out seems like a step in the right direction. Otherwise, it's a bit confusing how we go from a full list to a codegen list.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81847
Approved by: https://github.com/JackCaoG, https://github.com/wconstab, https://github.com/bdhirsh
2022-07-26 20:39:07 +00:00
YifanShenSZ
7bdafed4f1 add nested tensor matmul support (#81957)
There was a discussion on whether to let nested tensor `reshape` support collapsing and splitting dimension 0. The conclusion was to keep `reshape` simple, so we need a tweaked `matmul` that only supports the 3+ dimensional non-broadcast case, i.e. a generalized `bmm`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81957
Approved by: https://github.com/jbschlosser
2022-07-26 16:58:42 +00:00
Edward Z. Yang
9d45243e24 Move empty_like to DONT_REQUIRE_DERIVATIVE list (#82178)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82178
Approved by: https://github.com/soulitzer
2022-07-26 04:26:22 +00:00
Catherine Lee
6f2a88dd50 script to monitor memory + cpu utilization (#82006)
Add a python script that runs in the background during test jobs to log cpu + gpu memory usage and cpu utilization of python tests (really any python process) to a file and upload the file as an artifact.

I plan on using the GPU memory usage stats to better understand how to parallelize the tests, but it is easy to add other stats if people want them.

In the future, we want to add the ability to track network usage to see if we can decrease it.  GPU utilization will also likely need to be improved.

Click the hud link to see uploaded usage log artifacts

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82006
Approved by: https://github.com/huydhn
2022-07-25 16:53:31 +00:00
Kshiteej K
db0e121b46 [composite compliance] put, take (#81094)
Reference: #69991

This PR makes `put` CompositeExplicit as it is implemented in terms of `put_` (for which we can't handle Composite Compliance at the implementation level).

Ref (put implementation)
478081c698/aten/src/ATen/native/TensorAdvancedIndexing.cpp (L619-L621)

Also, we update the `take` gradient formula to handle tensor subclasses.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81094
Approved by: https://github.com/zou3519
2022-07-25 15:05:16 +00:00
PyTorch MergeBot
c078476eb0 Revert "Enabling SymInt in autograd; take 3 (#81145)"
This reverts commit 032facd6e6.

Reverted https://github.com/pytorch/pytorch/pull/81145 on behalf of https://github.com/jeanschmidt due to breaking internal builds
2022-07-22 11:15:20 +00:00
Zain Rizvi
d28e667159 Update actionlint (#81922)
This PR will:

1. Update actionlint to fix false positives from https://github.com/pytorch/pytorch/issues/81807
2. Establish a new naming convention for S3 file paths for linter adapters which allows older commits of pytorch to no longer be broken
3. Add update instructions to the s3_init_config.json file.

**Why are the instructions embedded in this json file and not the pytorch wiki?**
Anyone who tries to update the binaries will definitely see this file and can read the instructions in it. The wiki is not nearly as searchable, and instructions there are likely to go unnoticed.

**Why embed the comment as data in the json file?**
Json doesn't support native comments. But since nothing is validating the exact shape of this json file, adding an extra dictionary entry to serve as a comment is perfectly safe.
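
A minimal sketch of the pattern, assuming a reader that looks entries up by name. The `_comment` key and the tool entry are made up for illustration and do not reproduce `s3_init_config.json`.

```python
import json

# Hypothetical config: the "_comment" key and the entry contents are
# placeholders, not the real file's layout.
config_text = """
{
    "_comment": "To update a binary: upload it, then update its hash below.",
    "actionlint": {"hash": "<sha256 of the binary>"}
}
"""

config = json.loads(config_text)

# A reader that looks tools up by name never touches the comment entry.
actionlint_entry = config["actionlint"]

# A reader that iterates over all entries has to skip it explicitly.
tools = {name: entry for name, entry in config.items() if name != "_comment"}
```

The trade-off is that any consumer iterating over every key needs to know about the comment entry, which is why this is only safe while nothing validates the file's exact shape.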

## Testing
I validated the architectures of the old binaries by running `file actionlint` on them and inspecting the outputs
I validated the hash was sha256 by checking tools/linter/adapters/s3_init.py and by also downloading the binaries from s3 and verifying their sha256 matches what's in s3_init_config.json
I validated end to end behavior by:
1. Deleting `.lintbin\actionlint` locally, running `lintrunner init` and verifying it got installed correctly and could lint files
2. Changing the sha to an invalid value and verifying `lintrunner init` failed to install actionlint
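
The hash check described above can be sketched roughly as follows. This is an illustration using only the standard library, not the code in tools/linter/adapters/s3_init.py; the helper names and the temporary stand-in file are assumptions.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large binaries are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_binary(path, expected_sha256):
    """Return True when the file's sha256 matches the expected hex digest."""
    return sha256_of_file(path) == expected_sha256.lower()


# Stand-in for a downloaded binary (hypothetical contents).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"fake binary contents")
    demo_path = tmp.name

good_hash = hashlib.sha256(b"fake binary contents").hexdigest()
matches = verify_binary(demo_path, good_hash)    # correct hash is accepted
rejected = verify_binary(demo_path, "0" * 64)    # invalid hash is rejected
os.remove(demo_path)
print(matches, rejected)
```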
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81922
Approved by: https://github.com/kit1980, https://github.com/janeyx99
2022-07-22 01:55:42 +00:00
Nikolay Korovaiko
032facd6e6 Enabling SymInt in autograd; take 3 (#81145)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81145
Approved by: https://github.com/ezyang
2022-07-22 00:14:50 +00:00
lezcano
c5330183ca [PrimTorch] Reference for linalg.matrix_norm (#81113)
As per title. I corrected a thing or two from my previous implementation
to give better errors in some weird edge cases and to make it clearer
when this function supports low_precision types and when it doesn't.

We also use the optimisation for bfloat16 within `vector_norm` within
this function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81113
Approved by: https://github.com/ngimel
2022-07-21 23:07:32 +00:00
Edward Z. Yang
a7c1f74426 Revert "Revert "Call lift_fresh after scalar_to_tensor in composite derivative formulas (#81609)"" (#81885)
This reverts commit fdc2af0090.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81885
Approved by: https://github.com/soulitzer
2022-07-21 17:35:49 +00:00
Peter Bell
8d0cbce069 Lower randint default dtype to the C++ API (#81410)
The default dtype for randint is currently handled with manual python
binding code; this moves it into the `native_functions.yaml` declaration
for API consistency.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81410
Approved by: https://github.com/albanD
2022-07-21 16:42:49 +00:00
Peter Bell
5f2e31797a Replace _dtype_default_type_hack (#81479)
Currently any function with a default dtype other than None has to be
manually entered into this function. Instead, this reads the default
directly from `native_functions.yaml`. In order to do this, I also
change `PythonSignatureGroup` to take `tensor_options_args` from the
functional variant since the out variant doesn't actually have tensor
options arguments to take the default values from.

Also note that we need to use `default_init` instead of `default`
because the out argument version doesn't have a `tensor_options`
argument to extract the default value from and so the PythonSignature
objects wouldn't match.
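
As a rough sketch of why the default has to come from the functional variant: the schema strings below are illustrative, written in the style of `native_functions.yaml`, and may not match the real entries. The `default_for` helper is hypothetical and is not the actual codegen.

```python
import re

# Illustrative schema strings; the real entries may differ.
FUNCTIONAL = (
    "randint(int high, int[] size, *, ScalarType? dtype=long, "
    "Layout? layout=None) -> Tensor"
)
OUT = "randint.out(int high, int[] size, *, Tensor(a!) out) -> Tensor(a!)"


def default_for(schema, arg_name):
    """Return the declared default of arg_name as a string, or None if absent.

    Simplified: does not handle defaults that themselves contain commas.
    """
    match = re.search(rf"\b{re.escape(arg_name)}=([^,)]+)", schema)
    return match.group(1) if match else None


functional_default = default_for(FUNCTIONAL, "dtype")  # declared on the functional variant
out_default = default_for(OUT, "dtype")                # the out variant has no dtype argument
print(functional_default, out_default)
```

In this sketch the out variant yields no default at all, which is the situation the message describes.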
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81479
Approved by: https://github.com/albanD
2022-07-21 16:42:49 +00:00
PyTorch MergeBot
fdc2af0090 Revert "Call lift_fresh after scalar_to_tensor in composite derivative formulas (#81609)"
This reverts commit aad7a1c06c.

Reverted https://github.com/pytorch/pytorch/pull/81609 on behalf of https://github.com/jeanschmidt due to breaking internal builds
2022-07-21 10:50:05 +00:00
Edward Z. Yang
aad7a1c06c Call lift_fresh after scalar_to_tensor in composite derivative formulas (#81609)
`scalar_to_tensor` is not dispatched and thus there is no interposition point for modes to ensure that the resulting tensor is appropriately wrapped. `lift_fresh` introduces this interposition point. This prevents FakeTensorMode from erroring. I can't make these wrapped numbers because there is some downstream logic on convolution backwards that expects these inputs to be honest to goodness tensors for conjugation.

This fixes test_aot_autograd_exhaustive_special_ndtr_cpu_float32
in https://github.com/pytorch/functorch/pull/935

See https://github.com/pytorch/pytorch/issues/81608 for more discussion

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81609
Approved by: https://github.com/soulitzer
2022-07-21 04:46:05 +00:00
ssjia
96958be6be [vulkan] Automatically generate shader layout from GLSL (#81715)
Differential Revision: [D37966838](https://our.internmc.facebook.com/intern/diff/D37966838/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81715
Approved by: https://github.com/kirklandsign
2022-07-20 01:57:59 +00:00
Catherine Lee
06a0cfc0ea pytest to run test_ops, test_ops_gradients, test_ops_jit in non linux cuda environments (#79898)
This PR uses pytest to run test_ops, test_ops_gradients, and test_ops_jit in parallel in non linux cuda environments to decrease TTS. I am excluding linux cuda because running in parallel results in errors due to running out of memory.

Notes:
* update hypothesis version for compatibility with pytest
* use rerun-failures to rerun tests (similar to flaky tests, although these test files generally don't have flaky tests)
  * reruns are denoted by a rerun tag in the xml.  Failed reruns also have the failure tag.  Successes (meaning that the test is flaky) do not have the failure tag.
* see https://docs.google.com/spreadsheets/d/1aO0Rbg3y3ch7ghipt63PG2KNEUppl9a5b18Hmv2CZ4E/edit#gid=602543594 for info on speedup (or slowdown in the case of slow tests)
  * expecting windows tests to decrease by 60 minutes total
* slow test infra is expected to stay the same - verified by running pytest and unittest on the same job and checking the number of skipped/run tests
* test reports to s3 changed - add entirely new table to keep track of invoking_file times
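
The rerun/failure tagging described in the notes above could be read back along these lines. This is a sketch only: the XML shape, the attribute names, and the `classify` helper are assumptions, not the actual report format or tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical report: element and attribute names are assumed for illustration.
REPORT = """
<testsuite>
  <testcase name="test_a"/>
  <testcase name="test_b"><rerun message="first attempt failed"/></testcase>
  <testcase name="test_c"><rerun message="first attempt failed"/><failure message="still failing"/></testcase>
</testsuite>
"""


def classify(testcase):
    """Map a <testcase> element to 'passed', 'flaky', or 'failed'."""
    has_rerun = testcase.find("rerun") is not None
    has_failure = testcase.find("failure") is not None
    if has_failure:
        return "failed"   # failed outright, or failed again after a rerun
    if has_rerun:
        return "flaky"    # rerun tag without a failure tag: the rerun succeeded
    return "passed"


root = ET.fromstring(REPORT.strip())
results = {tc.get("name"): classify(tc) for tc in root.iter("testcase")}
print(results)
```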
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79898
Approved by: https://github.com/malfet, https://github.com/janeyx99
2022-07-19 19:50:57 +00:00
Larry Liu
e345138591 [retake2][mobile] Fix lightweight dispatch OOM error by introducing selective build (#80791)
To fix #78540 I committed #78983, which was reverted due to an internal CI failure. Then I committed #79215, which only fixed the failure but didn't have the full feature of #78983. This PR is another try.

This PR adds a script to dump all operators from test models and automatically write them into `lightweight_dispatch_ops.yaml`. This way we don't have to manually update the yaml file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80791
Approved by: https://github.com/raziel
2022-07-15 18:04:25 +00:00
Peter Bell
00459c2c87 [primTorch] Implement constant_pad_nd (#80182)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80182
Approved by: https://github.com/mruberry, https://github.com/ngimel
2022-07-15 15:13:42 +00:00