Commit Graph

90 Commits

Author SHA1 Message Date
Peter Bell
5c580a9846 [decomp] Add test tracking core ATen operators (#104262)
This adds an expect-test that finds the set of core ATen operators by
subtracting the operators with decompositions in core_aten_decompositions from
the set of all operators that have decompositions and could be decomposed.

This is useful because if you add a new decomposition but forget to add it to
the list of core decompositions, it will appear in the PR diff.
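The set arithmetic described above can be sketched roughly like this (the operator names below are illustrative placeholders, not the real decomposition registries, which the actual test reads from the dispatcher):

```python
# Toy sketch of the expect-test's set subtraction.
all_decomposable = {"aten::addmv", "aten::elu_backward", "aten::linspace", "aten::relu"}
core_aten_decompositions = {"aten::addmv", "aten::elu_backward", "aten::linspace"}

# Whatever has a decomposition but is not in the core table is, by
# elimination, treated as a core ATen operator.
core_aten_ops = sorted(all_decomposable - core_aten_decompositions)
```

A newly added decomposition that is missing from the core table then shows up as a new entry in the expected output, which is what makes it visible in the PR diff.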

Also, by going through this list I have identified some operators where the
functional variant is decomposed but the in-place variant is not, which must be
an oversight.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104262
Approved by: https://github.com/lezcano
2023-07-04 16:41:44 +00:00
Fuzzkatt
d805a53f1f disable tf32 for rnn tests and norm tests (#102005)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102005
Approved by: https://github.com/ngimel
2023-05-24 02:22:58 +00:00
Khushi
1aaf0396eb [reland][opinfo] empty_strided (#101782)
Follows #100223

Previous PR: #100890

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101782
Approved by: https://github.com/ezyang
2023-05-19 03:06:29 +00:00
PyTorch MergeBot
dfac4364c4 Revert "[opinfo] empty_strided (#100890)"
This reverts commit 01c7106580.

Reverted https://github.com/pytorch/pytorch/pull/100890 on behalf of https://github.com/PaliC due to broke test_ops.py slow test ([comment](https://github.com/pytorch/pytorch/pull/100890#issuecomment-1551903975))
2023-05-17 19:00:15 +00:00
Jiong Gong
788ff0623b [decomp] fix decomp of batch_norm when weight/bias is not flattened (#101059)
Fix https://github.com/pytorch/pytorch/issues/100970
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101059
Approved by: https://github.com/ezyang
2023-05-16 00:00:34 +00:00
Khushi
01c7106580 [opinfo] empty_strided (#100890)
Follows: #100223

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100890
Approved by: https://github.com/ezyang
2023-05-15 23:39:39 +00:00
Khushi
51fe53e619 [opinfo] item (#100313)
Follows #100223

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100313
Approved by: https://github.com/ezyang
2023-05-10 11:32:45 +00:00
Animesh Jain
e1021ec535 [decomp] Bad accuracy for elu_backward (#100284)
Accuracy is tested by the full model at https://github.com/pytorch/pytorch/issues/100061
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100284
Approved by: https://github.com/ngimel
2023-04-29 04:21:20 +00:00
Aaron Gokaslan
e2a3817dfd [BE] Enable C419 rule for any all shortcircuiting (#99890)
Apparently https://github.com/pytorch/pytorch/pull/78142 made torch.jit accept simple generator expressions, which lets us enable rules that replace unnecessary list comprehensions with generators in any/all. This was originally part of #99280, but I split it off into this PR so that it can be easily reverted should anything break.
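As a rough illustration of the pattern C419 flags (example data made up here):

```python
data = [0, 3, 5]

# Flagged by C419: the list comprehension materializes the whole list
# before any() even starts consuming it.
found_list = any([x > 4 for x in data])

# Preferred: a bare generator expression lets any() short-circuit at the
# first True element.
found_gen = any(x > 4 for x in data)
```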

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99890
Approved by: https://github.com/justinchuby, https://github.com/kit1980, https://github.com/malfet
2023-04-25 15:02:13 +00:00
Rohan Gupta
b01d6f2cdb addmv decomp #2 (#96264)
Fixes #94617

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96264
Approved by: https://github.com/ngimel, https://github.com/ezyang
2023-03-16 23:09:45 +00:00
Edward Z. Yang
6a675f7cac Correctly resolve dispatch keys for PyOperator (#96306)
Previously, we never actually used resolve_key, which meant that
you had to register CPU/CUDA/etc all manually; none of the alias
keys worked.  Now they work.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96306
Approved by: https://github.com/Skylion007, https://github.com/zou3519
2023-03-09 22:16:31 +00:00
Yanan Cao (PyTorch)
039b4c8809 Add meta function for _upsample_bilinear2d_aa (#94982)
Differential Revision: D43353000

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94982
Approved by: https://github.com/ezyang
2023-02-19 07:11:20 +00:00
Aaron Gokaslan
67d9790985 [BE] Apply almost all remaining flake8-comprehension checks (#94676)
Applies the remaining flake8-comprehensions fixes and checks. These changes replace all remaining unnecessary generator expressions with list/dict/set comprehensions, which are more succinct, performant, and better supported by our torch.jit compiler. They also remove useless generators such as `set(a for a in b)`, resolving them into just the `set` call.
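For instance, one of the cleaned-up patterns looks like this (toy data):

```python
b = [1, 2, 2, 3]

# A useless inner generator: set() can consume the iterable directly.
before = set(a for a in b)

# Equivalent, shorter, and avoids the intermediate generator frame.
after = set(b)
```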

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676
Approved by: https://github.com/ezyang
2023-02-12 01:01:25 +00:00
Peter Bell
e22e323bea [decomp] Use var_mean in native_batch_norm decomposition (#94140)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94140
Approved by: https://github.com/ngimel
2023-02-10 15:19:46 +00:00
lezcano
fe0e28ab87 [decompositions] GRU decompositon with and without packed sequence (#91466)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91466
Approved by: https://github.com/zou3519
2023-02-08 14:16:30 +00:00
lezcano
bef61225c3 [decompositions] add decomposition for RNN with packed sequence (#91281)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91281
Approved by: https://github.com/zou3519
2023-02-08 14:16:30 +00:00
lezcano
e5f6e1f660 [decompositions] add LSTM decomp (#91124)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91124
Approved by: https://github.com/zou3519
2023-02-08 14:16:30 +00:00
lezcano
c2a92687e0 [decompositions] add RNN decomp and testing (#91123)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91123
Approved by: https://github.com/zou3519
2023-02-08 14:16:30 +00:00
Peter Bell
cee5174d44 Add test tracking operators without decompositions (#90887)
This test inspects the dispatcher directly, so captures operators without
`OpInfo` including internal helper operators and backward operators that might
appear in a trace.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90887
Approved by: https://github.com/ezyang
2023-01-26 01:44:42 +00:00
PyTorch MergeBot
a2da0a0b02 Revert "Add test tracking operators without decompositions (#90887)"
This reverts commit 2740daf701.

Reverted https://github.com/pytorch/pytorch/pull/90887 on behalf of https://github.com/huydhn due to Sorry for reverting your PR. We reverted https://github.com/pytorch/pytorch/pull/70988 in acdd462b1a and this test started to fail. There is probably a dependency between the two
2023-01-24 21:56:58 +00:00
Peter Bell
2740daf701 Add test tracking operators without decompositions (#90887)
This test inspects the dispatcher directly, so captures operators without
`OpInfo` including internal helper operators and backward operators that might
appear in a trace.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90887
Approved by: https://github.com/ezyang
2023-01-24 17:38:27 +00:00
lezcano
66e498626c Perform first the decomposition and then the ATen function to catch in-place modifications (#92243)
Addresses https://github.com/pytorch/pytorch/pull/91672#discussion_r1070412867

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92243
Approved by: https://github.com/ezyang
2023-01-17 16:53:36 +00:00
lezcano
ea8b14f27e Add a test for decompositions that decomposes all the operations as much as possible (#87182)
This enables more thorough testing of the decompositions than that provided
by OpInfos alone.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87182
Approved by: https://github.com/ezyang
2023-01-17 16:53:34 +00:00
lezcano
d162c8f92b Assorted decomposition fixes (#87183)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87183
Approved by: https://github.com/ngimel
2023-01-17 16:53:31 +00:00
Yanbo Liang
25f39c1bce Fix uniform ref implementation (#90094)
Fixes https://github.com/pytorch/torchdynamo/issues/1954

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90094
Approved by: https://github.com/ngimel
2022-12-06 21:28:17 +00:00
Animesh Jain
c1950620c5 [decomp] Fix native_batch_norm_backward dtype of dweight and dbias (#89740)
Discovered while debugging an accuracy issue for Inductor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89740
Approved by: https://github.com/soumith, https://github.com/ngimel
2022-11-29 03:15:20 +00:00
Jane Xu
8695f0cced Rectify native_batch_norm schema by splitting it into two legit schemas (#88697)
Using the same repro from the issue (but with BatchNorm2D)

Rectifies the native_batch_norm schema by splitting it into two:
1. one will have NON-optional alias-able running_mean and running_var inputs
2. the other will just not have those parameters at all (no_stats variation)

**Calling for name suggestions!**

## test plan
I've added tests in test_functionalization.py as well as an entry in common_method_invocations.py for `native_batch_norm_legit`
CI should pass.

## next steps
Because of bc/fc reasons, we reroute native_batch_norm to call our new schemas ONLY through the python dispatcher, but in 2 weeks or so, we should make `native_batch_norm_legit` the official batch_norm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88697
Approved by: https://github.com/albanD
2022-11-23 23:23:17 +00:00
lezcano
1d6a188d08 Reland Dispatch torch.norm to linalg.vector_norm and linalg.matrix_norm (#81761) (#84624)
Reland https://github.com/pytorch/pytorch/pull/81761

Differential Revision: [D39332292](https://our.internmc.facebook.com/intern/diff/D39332292)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84624
Approved by: https://github.com/kit1980
2022-11-22 07:53:24 +00:00
lezcano
3320915303 Fix decomp for embedding_backward and simplify the decomposition of embedding_dense and embedding_dense_backward (#87204)
See the title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87204
Approved by: https://github.com/Chillee
2022-11-16 17:46:54 +00:00
lezcano
e1ecf53d84 Simplify linspace decomp and increase its tolerance (#87203)
This is an interesting one

Since this is an operation that's intrinsically defined on the reals,
we should perform the ops on that dtype always, and just cast to
the desired dtype at the end. This simplifies the decomposition.

I started looking at this one after seeing failures on a test added in a
later PR. What's going on here is that, by upcasting to a higher dtype and
then casting down to integers, there's sometimes an off-by-one error. I think
this is fine, as the decomposition is more accurate than the original
function, which is in line with the whole PrimTorch effort.
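A rough sketch of the idea, with a simplified signature (illustrative only, not the actual PyTorch decomposition):

```python
import torch

def linspace_decomp(start, end, steps, dtype=torch.float32):
    # Do all arithmetic on a real dtype (float64 here), and cast to the
    # requested dtype only once at the very end.
    if steps == 1:
        return torch.full((1,), start, dtype=dtype)
    step = (end - start) / (steps - 1)
    rng = torch.arange(steps, dtype=torch.float64)
    return (start + step * rng).to(dtype)
```

Casting down at the end is where the occasional off-by-one against the eager kernel can appear for integer dtypes, since the real-valued intermediate may land just below an integer boundary.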

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87203
Approved by: https://github.com/mruberry
2022-11-16 17:46:54 +00:00
Sherlock Huang
5faa2792fa Symintify decomps for split and upsample_bilinear; Fix decomp for _softmax_backward_data and native_dropout_backward (#88761)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88761
Approved by: https://github.com/ezyang
2022-11-15 13:34:45 +00:00
PyTorch MergeBot
eea506aee1 Revert "Symintify decomps for split and upsample_bilinear; Fix decomp for _softmax_backward_data and native_dropout_backward (#88761)"
This reverts commit 9eabcc370f.

Reverted https://github.com/pytorch/pytorch/pull/88761 on behalf of https://github.com/suo due to much broken 9eabcc370f
2022-11-14 01:58:47 +00:00
Sherlock Huang
9eabcc370f Symintify decomps for split and upsample_bilinear; Fix decomp for _softmax_backward_data and native_dropout_backward (#88761)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88761
Approved by: https://github.com/ezyang
2022-11-13 21:30:53 +00:00
Fabio Rocha
652af5ec15 upsample_*.vec ops are now CompositeImplicit (#85638)
They were previously CompositeExplicitAutograd, but that was not really necessary.
See discussion in https://github.com/pytorch/pytorch/issues/85405

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85638
Approved by: https://github.com/ezyang, https://github.com/lezcano, https://github.com/malfet, https://github.com/jansel
2022-11-09 09:58:04 +00:00
soulitzer
4c20c0509d Split out forward AD tests from test_ops_gradients and reenable slow gradcheck CI (#88216)
Fixes: https://github.com/pytorch/pytorch/issues/88010

This PR does a couple things to stop slow gradcheck from timing out:
- Splits out test_ops_fwd_gradients from test_ops_gradients, and factors out TestFwdGradients and TestBwdGradients which both inherit from TestGradients, now situated in common_utils (maybe there is a better place?)
- Skips CompositeCompliance (and several other test files) for slow gradcheck CI since they do not use gradcheck
- because test times for test_ops_fwd_gradients and test_ops_gradients are either unknown or wrong, we hardcode them for now to prevent them from being put together. We can undo the hack after we see actual test times are updated. ("def calculate_shards" randomly divides tests with unknown test times in a round-robin fashion.)
- Updates references to test_ops_gradients and TestGradients
- Test files that are skipped for slow gradcheck CI are now centrally located in run_tests.py. This reduces how fine-grained we can be with the skips, so for some skips (one so far) we still use the old skipping mechanism, e.g. for test_mps

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88216
Approved by: https://github.com/albanD
2022-11-03 00:20:45 +00:00
lezcano
faf9c47abb Simplify a few diagonal-related functions (#87180)
`diag` was unnecessarily implemented as a kernel rather than as a composite
function, which made it needlessly difficult to maintain (an explicit backward +
all that it entails).

We also change a few uses of `diag` on 2D tensors to `diagonal()`. The
latter returns a view rather than creating a new tensor.

We also upgrade its meta implementation to a fully-fledged decomposition.

I tried implementing the backwards of `diagonal()` via `diag_scatter` (or better `diag_scatter_` to keep the perf) but functionalisation was failing and I was not sure how to fix this, so I moved on. It may be possible to simplify that one as well if @soulitzer or someone knows how to do this.
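The view-vs-copy distinction mentioned above, in a minimal example:

```python
import torch

x = torch.eye(3)
d = torch.diag(x)   # on a 2D tensor, extracts the diagonal as a new tensor (a copy)
v = x.diagonal()    # returns a view into x's storage

v.fill_(2.0)        # writing through the view mutates x; the copy is unaffected
```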
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87180
Approved by: https://github.com/ngimel, https://github.com/albanD, https://github.com/mruberry
2022-10-24 06:11:53 +00:00
Peter Bell
6eeeb88172 OpInfo: Sample input cleanup (4/n) (#86324)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86324
Approved by: https://github.com/mruberry
2022-10-19 21:25:45 +00:00
PyTorch MergeBot
317eeb81c3 Revert "OpInfo: Sample input cleanup (4/n) (#86324)"
This reverts commit 2a6d37d23d.

Reverted https://github.com/pytorch/pytorch/pull/86324 on behalf of https://github.com/peterbell10 due to Caused tolerance issues in periodic test
2022-10-17 18:26:59 +00:00
Peter Bell
2a6d37d23d OpInfo: Sample input cleanup (4/n) (#86324)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86324
Approved by: https://github.com/mruberry
2022-10-16 19:12:44 +00:00
Khushi Agrawal
77d29bcee2 [primTorch] special: ndtr, ndtri, log_ndtr, erfcx (#86077)
- Adds prims and _refs for `erfcx` and `ndtri`.
- Adds _refs for `ndtr`, and `log_ndtr`.

cc @kshitij12345 @lezcano @mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86077
Approved by: https://github.com/mruberry
2022-10-13 01:18:30 +00:00
Jane Xu
6923dc3b59 Add module: decompositions as an owner to test_decomp.py (#86703)
so flaky tests can be attributed to @SherlockNoMad too 😛
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86703
Approved by: https://github.com/albanD
2022-10-11 17:23:36 +00:00
Peter Bell
3ec71fce79 Improve make_tensor performance for float and complex types (#85473)
For floating types, `make_tensor` calls `rand` and then does a linear
interpolation from `low` to `high`. This instead calls `uniform_(low,
high)` to cut out the interpolation step.

For complex types, `make_tensor` does the `rand` + interpolation step
twice and calls `torch.complex(real, imag)` at the end. This instead
uses `view_as_real` and `uniform_(low, high)` to fuse it all into one
operation.
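A sketch of the two fused approaches described above (function names here are illustrative, not the actual `make_tensor` internals):

```python
import torch

def make_float(shape, low, high):
    # Sample directly in [low, high) instead of rand() + interpolation.
    return torch.empty(shape).uniform_(low, high)

def make_complex(shape, low, high, dtype=torch.complex64):
    t = torch.empty(shape, dtype=dtype)
    # view_as_real exposes the trailing (real, imag) layout, so a single
    # uniform_ call fills both components in one pass.
    torch.view_as_real(t).uniform_(low, high)
    return t
```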

My benchmarks show significant speedups in all cases for float32 and
complex64.

| Device | dtype     | Size  | Master (us) | This PR (us) | Speedup |
|--------|-----------|-------|-------------|--------------|---------|
| CPU    | float32   | 8     | 19.4        | 6.34         | 3.1     |
|        |           | 4096  | 36.8        | 21.3         | 1.7     |
|        |           | 2**24 | 167,000     | 80,500       | 2.1     |
|        | complex32 | 8     | 37.0        | 7.57         | 4.9     |
|        |           | 4096  | 73.1        | 37.6         | 1.9     |
|        |           | 2**24 | 409,000     | 161,000      | 2.5     |
| CUDA   | float32   | 8     | 40.4        | 11.7         | 3.5     |
|        |           | 4096  | 38.7        | 11.7         | 3.3     |
|        |           | 2**24 | 2,300       | 238          | 9.7     |
|        | complex32 | 8     | 78.7        | 14           | 5.6     |
|        |           | 4096  | 82.7        | 13.8         | 6.0     |
|        |           | 2**24 | 5,520       | 489          | 11.3    |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85473
Approved by: https://github.com/mruberry
2022-10-05 17:05:20 +00:00
PyTorch MergeBot
6db3539e70 Revert "Improve make_tensor performance for float and complex types (#85473)"
This reverts commit a76995e584.

Reverted https://github.com/pytorch/pytorch/pull/85473 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it seems to cause a bunch of flaky tests in pull and periodic
2022-09-29 20:06:52 +00:00
Peter Bell
a76995e584 Improve make_tensor performance for float and complex types (#85473)
For floating types, `make_tensor` calls `rand` and then does a linear
interpolation from `low` to `high`. This instead calls `uniform_(low,
high)` to cut out the interpolation step.

For complex types, `make_tensor` does the `rand` + interpolation step
twice and calls `torch.complex(real, imag)` at the end. This instead
uses `view_as_real` and `uniform_(low, high)` to fuse it all into one
operation.

My benchmarks show significant speedups in all cases for float32 and
complex64.

| Device | dtype     | Size  | Master (us) | This PR (us) | Speedup |
|--------|-----------|-------|-------------|--------------|---------|
| CPU    | float32   | 8     | 19.4        | 6.34         | 3.1     |
|        |           | 4096  | 36.8        | 21.3         | 1.7     |
|        |           | 2**24 | 167,000     | 80,500       | 2.1     |
|        | complex32 | 8     | 37.0        | 7.57         | 4.9     |
|        |           | 4096  | 73.1        | 37.6         | 1.9     |
|        |           | 2**24 | 409,000     | 161,000      | 2.5     |
| CUDA   | float32   | 8     | 40.4        | 11.7         | 3.5     |
|        |           | 4096  | 38.7        | 11.7         | 3.3     |
|        |           | 2**24 | 2,300       | 238          | 9.7     |
|        | complex32 | 8     | 78.7        | 14           | 5.6     |
|        |           | 4096  | 82.7        | 13.8         | 6.0     |
|        |           | 2**24 | 5,520       | 489          | 11.3    |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85473
Approved by: https://github.com/mruberry
2022-09-29 11:46:09 +00:00
Animesh Jain
796da4df4d Return contiguous tensor from softmax decomposition (#85788)
Fixes https://github.com/pytorch/torchdynamo/issues/1135

Softmax decomp's output stride does not match the ATen softmax output stride. Not sure if that's desirable. Opening a PR for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85788
Approved by: https://github.com/ngimel, https://github.com/ezyang
2022-09-28 20:52:45 +00:00
Peter Bell
29c78266c0 test_decomp.py: Skip tests for embedding_backward bf16 (#84554)
`embedding_backward`'s decomposition is less accurate for bf16.
Currently bfloat16 is skipped in both forward and backward, but the
forward decomposition matches 1-1 with the ATen implementation, so this
re-enables the test for the forward decomposition.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84554
Approved by: https://github.com/albanD
2022-09-28 19:32:54 +00:00
Edward Z. Yang
793488cda2 Revert "Revert "Symintifying slice ops (#85196)"" (#85746)
This reverts commit 3a171dfb0c.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85746
Approved by: https://github.com/albanD
2022-09-28 04:37:35 +00:00
PyTorch MergeBot
3a171dfb0c Revert "Symintifying slice ops (#85196)"
This reverts commit 4c01c51266.

Reverted https://github.com/pytorch/pytorch/pull/85196 on behalf of https://github.com/atalman due to breaking the internal Executorch build
2022-09-27 18:01:27 +00:00
samdow
18d8c548f4 [Modes] remove enable and rewrite mode stack (squashed) (#84774)
Based on @ezyang's suggestion, the mode stack now has "one true mode", which is the _only_ mode that can ever be active at the C++ level. That mode's torch dispatch just takes the top mode in the stack, reenables itself (if we aren't at the end of the mode stack), and runs the top mode's torch_{dispatch|function}.

This maintains the invariant that, in the middle of a mode's torch dispatch, the mode itself is not active. It changes the function the user has to call to see what the current mode is (it no longer queries the C++; it's Python-only) but also lets the user easily see the entire mode stack.

Removes `enable_torch_dispatch_mode` and `.restore()` since neither makes sense in this new setup

### Background
Why do we want this? Well, a pretty common pattern that was coming up was that users had to do something like

```python
## PRE-PR UX
def f(mode):
  with mode.restore():  # user needs to understand this restore thing?
    ...

with Mode() as m:
  pass
f(m)
```

Many users were getting errors from forgetting to call `.restore` or from forgetting to add the (tbh weird) "mode instantiation" step where they use the mode as a context manager with an empty body. Really, they wanted to treat modes like context managers and just write
```python
## FROM FEEDBACK, USER DESIRED CODE. POSSIBLE POST-PR
def f(mode):
  with mode:
    ...
f(Mode())
```

### Technical Details
With the old mode stack, we basically had a linked list, so the mode itself could only be used once and had a fixed parent. In this new design, the mode stack is just a Python list that we push to and pop from. There's only one mode that's ever active at the C++ level, and it runs the next mode in the Python list. The modes don't have state on them anymore.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84774
Approved by: https://github.com/ezyang, https://github.com/zou3519
2022-09-27 01:04:35 +00:00
Fabio Rocha
d5ce2bbed2 [primTorch] decompositions for upsample_bicubic2d (#85403)
FYI, this decomposition seems to be significantly slower than the lowering in torchinductor:

```
------------------------------------- upsample_bicubic2d -------------------------------------]
                                                              |  lowering  |  Inductor  |  Eager
32 threads: ------------------------------------------------------------------------------------
      (torch.Size([16, 4, 128, 256]),), ((512, 1024), True)   |    1.8     |   3.880    |   1.4
      (torch.Size([16, 4, 128, 256]),), ((512, 1024), False)  |    1.9     |   3.887    |   1.4
```

This seems related to the fact that in the lowering we can use int32s as the indices and in the decomp we can only use int64s (see https://github.com/pytorch/torchdynamo/issues/1293).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85403
Approved by: https://github.com/ngimel
2022-09-26 20:11:23 +00:00