Commit Graph

2119 Commits

Author SHA1 Message Date
kvathupo
2b9d9bcb85 Deprecate non-bool masks in masked_fill (#96594)
__What?__
Per discussion at #94634, deprecate `masked_fill` with non-bool masks. Deprecation warnings were previously added by #22261, but not for Apple MPS. I can revert the MPS changes if deprecation warnings are wanted first tho. See also #96112.

Fixes #85063 and #89320.

__Further Development?__
- Fixed the mask dtype checking for the cuda dispatch for `masked_fill` in `aten/src/ATen/native/cuda/Indexing.cu`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96594
Approved by: https://github.com/malfet, https://github.com/ngimel
2023-03-13 01:41:47 +00:00
Nikita Shulga
1cd0929bf7 [BC] Allow only bool tensors as mask in masked_select (#96112)
`byte` support was marked as deprecated in 1.8, so it's fine to remove this in 2.1 (or even 2.0)
Deprecation warning was added by https://github.com/pytorch/pytorch/pull/22261

Also, fix bunch of syntactic errors in comments

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96112
Approved by: https://github.com/ezyang
2023-03-07 01:43:14 +00:00
puririshi98
8aa34602f7 Jetson Update for CI Redo (#94549)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94549
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-02-21 17:13:38 +00:00
Yuxin Wu
9bb2fe3eae fix numpy1.24 deprecations in unittests (#93997)
Fixes https://github.com/pytorch/pytorch/issues/91329

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93997
Approved by: https://github.com/ngimel, https://github.com/jerryzh168
2023-02-18 00:59:09 +00:00
Xuehai Pan
b005ec62b9 [BE] Remove dependency on six and future (#94709)
Remove the Python 2 and 3 compatibility library [six](https://pypi.org/project/six) and [future](https://pypi.org/project/future) and `torch._six`. We only support Python 3.8+ now. It's time to retire them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94709
Approved by: https://github.com/malfet, https://github.com/Skylion007
2023-02-14 09:14:14 +00:00
Brian Hirsh
ceb0f1576b turn functionalization on in aot_autograd inference (#92857)
still waiting for CI fallout
fixes #90759

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92857
Approved by: https://github.com/ezyang
2023-02-13 17:48:00 +00:00
Nikita Shulga
4869929f32 Update Triton hash (#94249)
That includes MLIR + latest packaging changes (that also download ptxas from CUDA-12)
Tweak CI to install gcc-9 to build trition

Disable a few tests to make everything be correct

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94249
Approved by: https://github.com/Skylion007, https://github.com/ngimel, https://github.com/weiwangmeta
2023-02-13 13:17:36 +00:00
Aaron Gokaslan
9171f7d4cd [BE] Modernize PyTorch even more for 3.8 with pyupgrade (#94520)
Applies some more pyupgrade fixits to PyTorch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94520
Approved by: https://github.com/ezyang
2023-02-10 18:02:50 +00:00
ganler
db6cfff827 fix: forbid multi-index for index_select over scalar (#94347)
Fixes #88940

According to the [doc](https://pytorch.org/docs/stable/generated/torch.index_select.html):
1. "The returned tensor has the same number of dimensions as the original tensor (`input`). "
2.  "The `dim`th dimension has the same size as the length of `index`; other dimensions have the same size as in the original tensor."

These two conditions cannot be satisfied at the same time if the `input` is a scalar && `index` has multiple values: because a scalar at most holds one element (according to property 1, the output is a scalar), it is impossible to satisfy "The `dim`th dimension has the same size as the length of `index`" when `index` has multiple values.

However, currently, if we do so we either get:

1. Buffer overflow with ASAN;
2. Or (w/o ASAN) silently returns outputs that is not consistent with the doc (`x.index_select(0, torch.Tensor([0, 0, 0]).int())` returns `x`).

As a result, we should explicitly reject such cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94347
Approved by: https://github.com/malfet
2023-02-10 17:17:09 +00:00
min-jean-cho
900e09c872 [Dynamo] Support torch.Tensor.fn as TorchVariable, not UserDefinedObjectVariable, preventing graph break (#93243)
As found in #92709, thanks to @ngimel and @jansel, currently `torch.Tensor.fn` points to `UserDefinedObjectVariable` rather than `TorchVariable`. The root cause is due to https://github.com/pytorch/pytorch/pull/92709#pullrequestreview-1273357406. To prevent this, build `TorchVariable`  of `torch.Tensor.fn` pointing to `torch.ops.aten.fn`.

This issue propagates to `torch.Tensor.fn` causing graph break with `nopython=True`.
```python
import torch
import torch._dynamo as dynamo

#op = torch.ops.aten.abs_ # no graph break
op = torch.Tensor.abs_ # graph break
args = torch.empty(10)

def foo(args):
    return op(args)

opt_foo = dynamo.optimize("inductor", nopython=True)(foo)
y_ = opt_foo(args)

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93243
Approved by: https://github.com/jansel
2023-02-07 09:26:50 +00:00
min-jean-cho
6e1cfcdf4b cauchy_ few fixes (1) check gamma > 0 (2) better dtype error log (#93314)
Related #92047

(1) `torch.Tensor.cauchy_` is missing check for `gamma > 0` (`torch.distributions.cauchy.Cauchy` correctly checks  `gamma > 0`).
(2) add better error log on dtype similar to exponential_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93314
Approved by: https://github.com/jgong5, https://github.com/fritzo, https://github.com/lezcano
2023-02-03 11:56:28 +00:00
min-jean-cho
2f0b0c5dd7 exponential_ few fixes (1) lambda > 0 (2) mkl kernel to continuous (3) better error log on dtype (#92891)
Exponential distribution is continuous. Fixes CPU MKL exponential implementation to exclude integer dtypes.

```python
import torch
dtypes = [torch.uint8, torch.int8, torch.int16, torch.int32, torch.int64]

for dtype in dtypes:
    x = torch.empty(10000, dtype=dtype).exponential_() # should fail !
    print("dtype: ", x.dtype, "sum: ", x.sum())
```

### Additional Context

Related to #92709. This issue propagates to OpInfo of exponential.

```
AssertionError: The supported dtypes for exponential on device type cpu are incorrect!
The following dtypes worked in forward but are not listed by the OpInfo: {torch.int64, torch.uint8, torch.int8, torch.int16, torch.int32}.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92891
Approved by: https://github.com/CaoE, https://github.com/jgong5, https://github.com/ngimel
2023-01-28 02:27:16 +00:00
Yanbo Liang
a6b51448f5 [Dynamo] Supports if condition on user defined object (#90892)
Fixes Meta internal user case, see the pattern in unit test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90892
Approved by: https://github.com/jansel, https://github.com/mlazos
2023-01-26 04:19:32 +00:00
PyTorch MergeBot
9b23fd378f Revert "Logcumsumexp for complex in CPU and CUDA (#90847)"
This reverts commit 64985123e4.

Reverted https://github.com/pytorch/pytorch/pull/90847 on behalf of https://github.com/malfet due to Reverting to decrease build time, let's discuss the alternatives here
2023-01-24 20:49:08 +00:00
pierreHaslee
1c30844eaa where() function added as a Tensor method as well (#92849)
Fixes #88470

I added the "method" keyword in `aten/src/ATen/native/native_functions.yaml` for the function `where` with Scalar Overload.
This way, you can now use `Tensor.where()` with a scalar parameter the same way `torch.where()` can.

I added a test in `test/test_torch.py` as requested.
It uses the `where()` method on a tensor and then checks it has the same results as the `torch.where()` function.
The test is roughly the same as the one provided by the author of the issue.

PS: this is the second PR I make to resolve this issue, the first one is #92747. I had troubles with commit signatures and is therefore closed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92849
Approved by: https://github.com/albanD
2023-01-24 03:09:33 +00:00
mfkasim1
64985123e4 Logcumsumexp for complex in CPU and CUDA (#90847)
Another PR towards solving #89205.
What's in this PR:

* The implementation of forward `logcumsumexp` for complex numbers in CPU & CUDA
* The tests on forward call of `logcumsumexp` for complex numbers
* The implementation of backward `logcumsumexp` for complex numbers

What's missing:

* The test on backward gradient of `logcumsumexp` (it complaints `RuntimeError: logcumsumexp does not support automatic differentiation for outputs with complex dtype.` and I don't know how to solve the error and I don't know where to put the test for the backward computation). If possible, I'd like this to be done in this PR.

It's really tricky to handle the edge cases here (i.e. the ones involving `inf`), but I've tried my best to put some comments explaining the reasonings of my decisions in this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90847
Approved by: https://github.com/albanD
2023-01-20 15:10:50 +00:00
Kurt Mohler
647b8f8e3e Add TORCH_CHECK_TENSOR_ALL (#89097)
`TORCH_CHECK_TENSOR_ALL(cond, ...)` is a wrapper around `TORCH_CHECK` which allows the condition argument to be a tensor, batched or unbatched. `cond` can be a boolean tensor of any size. If any element is False, or if `cond.numel() == 0`, then `TORCH_CHECK_TENSOR_ALL` raises an error

Part of #72948
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89097
Approved by: https://github.com/zou3519
2023-01-19 21:04:09 +00:00
BowenBao
a72bcb3388 Do not leak SkipFrame exception to parent frames (#91059)
Discovered by https://github.com/pytorch/torchdynamo/issues/2000, we noticed the exception `SkipFrame` to avoid repeatedly compiling frame of loop with graph breaks could leak to parent frames while inlining, which then prevents compiling.

This PR checks at inlining if such exception is raised and would instead raise an `Unsupported` to the outer frame. The original behavior and goal of #88857 is unaffected: the inner frame that has loop would still be skipped.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91059
Approved by: https://github.com/jansel, https://github.com/thiagocrepaldi
2023-01-13 17:11:22 +00:00
XiaobingSuper
1892c75a45 fix norrow_copy correctness issue for non-contiguous input for cpu path(reland) (#91883)
This PR is about re-land https://github.com/pytorch/pytorch/pull/91789.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91883
Approved by: https://github.com/lezcano
2023-01-10 10:56:18 +00:00
PyTorch MergeBot
d85f3c8237 Revert "fix norrow_copy correctness issue for non-contiguous input for cpu path (#91789)"
This reverts commit 136dadd689.

Reverted https://github.com/pytorch/pytorch/pull/91789 on behalf of https://github.com/huydhn due to This breaks trunk with XPASS test_vmap_exhaustive_narrow_copy_cpu_float32 136dadd689
2023-01-09 06:50:20 +00:00
XiaobingSuper
136dadd689 fix norrow_copy correctness issue for non-contiguous input for cpu path (#91789)
Fix https://github.com/pytorch/pytorch/issues/91690.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91789
Approved by: https://github.com/jgong5, https://github.com/lezcano
2023-01-09 00:55:03 +00:00
PyTorch MergeBot
b3603f8129 Revert "Deduplicate c10 error and PyTorchError hierarchy (#87855)"
This reverts commit 34f2d3e6ae.

Reverted https://github.com/pytorch/pytorch/pull/87855 on behalf of https://github.com/osalpekar due to perf regression in quantization tests
2023-01-06 19:56:35 +00:00
William Phetsinorath
34f2d3e6ae Deduplicate c10 error and PyTorchError hierarchy (#87855)
Fixes #53370

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87855
Approved by: https://github.com/albanD
2023-01-02 15:53:36 +00:00
ecao
274d3b24c3 use scatter_add for index_add when dim is the most inner dim (#88729)
### Motivation
When dim is -1 and the slice of source or result is noncontiguous, original `index_add` is slow as it uses add for the sliced tensor, which is serial on index and parallel on sliced tensor to avoid write conflict. Doing parallel on the sliced tensor is not optimal as the size of sliced tensor may be not big enough to parallel and also causes multiple parallelizations.

`scatter_add ` is used to speedup for this case as `scatter_add ` parallels on the outer dimension of input and is serial on the inner dimension to avoid write conflict. `scatter_add ` only need one parallel and the size of outer dimensions is bigger to do parallel.

### Testing

- Single core:

Before:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 2.82E-03 | 2.11E-03
[10, 128, 50, 50] | 0.023604 | 0.023794

After:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 9.30E-04 | 1.66E-03
[10, 128, 50, 50] | 0.005995 | 0.010003

- Single socket (28 cores):

Before:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 2.96E-03 | 2.52E-03
[10, 128, 50, 50] | 0.012208 | 0.012568

After:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 7.44E-05 | 1.33E-04
[10, 128, 50, 50] | 0.000333 | 0.000469

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88729
Approved by: https://github.com/mingfeima, https://github.com/jgong5, https://github.com/malfet
2022-12-28 12:04:17 +00:00
PyTorch MergeBot
eadd557266 Revert "use scatter_add for index_add when dim is the most inner dim (#88729)"
This reverts commit 68e9da68cb.

Reverted https://github.com/pytorch/pytorch/pull/88729 on behalf of https://github.com/atalman due to Break internal build
2022-12-22 18:06:45 +00:00
ecao
68e9da68cb use scatter_add for index_add when dim is the most inner dim (#88729)
### Motivation
When dim is -1 and the slice of source or result is noncontiguous, original `index_add` is slow as it uses add for the sliced tensor, which is serial on index and parallel on sliced tensor to avoid write conflict. Doing parallel on the sliced tensor is not optimal as the size of sliced tensor may be not big enough to parallel and also causes multiple parallelizations.

`scatter_add ` is used to speedup for this case as `scatter_add ` parallels on the outer dimension of input and is serial on the inner dimension to avoid write conflict. `scatter_add ` only need one parallel and the size of outer dimensions is bigger to do parallel.

### Testing

- Single core:

Before:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 2.82E-03 | 2.11E-03
[10, 128, 50, 50] | 0.023604 | 0.023794

After:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 9.30E-04 | 1.66E-03
[10, 128, 50, 50] | 0.005995 | 0.010003

- Single socket (28 cores):

Before:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 2.96E-03 | 2.52E-03
[10, 128, 50, 50] | 0.012208 | 0.012568

After:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 7.44E-05 | 1.33E-04
[10, 128, 50, 50] | 0.000333 | 0.000469

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88729
Approved by: https://github.com/mingfeima, https://github.com/jgong5, https://github.com/malfet
2022-12-22 01:13:35 +00:00
PyTorch MergeBot
3194281ca7 Revert "use scatter_add for index_add when dim is the most inner dim (#88729)"
This reverts commit 13dbad6369.

Reverted https://github.com/pytorch/pytorch/pull/88729 on behalf of https://github.com/desertfire due to causing inductor test failure
2022-12-20 15:19:54 +00:00
ecao
13dbad6369 use scatter_add for index_add when dim is the most inner dim (#88729)
### Motivation
When dim is -1 and the slice of source or result is noncontiguous, original `index_add` is slow as it uses add for the sliced tensor, which is serial on index and parallel on sliced tensor to avoid write conflict. Doing parallel on the sliced tensor is not optimal as the size of sliced tensor may be not big enough to parallel and also causes multiple parallelizations.

`scatter_add ` is used to speedup for this case as `scatter_add ` parallels on the outer dimension of input and is serial on the inner dimension to avoid write conflict. `scatter_add ` only need one parallel and the size of outer dimensions is bigger to do parallel.

### Testing

- Single core:

Before:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 2.82E-03 | 2.11E-03
[10, 128, 50, 50] | 0.023604 | 0.023794

After:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 9.30E-04 | 1.66E-03
[10, 128, 50, 50] | 0.005995 | 0.010003

- Single socket (28 cores):

Before:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 2.96E-03 | 2.52E-03
[10, 128, 50, 50] | 0.012208 | 0.012568

After:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 7.44E-05 | 1.33E-04
[10, 128, 50, 50] | 0.000333 | 0.000469

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88729
Approved by: https://github.com/mingfeima, https://github.com/jgong5, https://github.com/malfet
2022-12-20 13:12:36 +00:00
Yanbo Liang
511fbad830 [Dynamo] Fix builder for class with metaclass (#90807)
Fixes Meta internal user case: a class with metaclass can't be identified as ```UserDefinedClassVariable```.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90807
Approved by: https://github.com/jansel
2022-12-20 05:02:28 +00:00
Edward Z. Yang
e686a442b4 If a torch.* returns non-Tensor, make this unimplemented rather than assert. (#89918)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89918
Approved by: https://github.com/albanD
2022-12-15 21:53:54 +00:00
Edward Z. Yang
283cf718ed Fix _fix_weakref memory leak (#90823)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90823
Approved by: https://github.com/eellison, https://github.com/albanD
2022-12-15 01:07:29 +00:00
Edward Z. Yang
cc504ce292 Restore test_warn_types (#90810)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90810
Approved by: https://github.com/ngimel
2022-12-14 05:15:32 +00:00
Bin Bao
7035bcdd0f [inductor] Enable test_torch (#90518)
Summary: Skipping failures in those tests so that CI can guard other
passing cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90518
Approved by: https://github.com/jansel
2022-12-13 16:21:35 +00:00
Yuxin Wu
5d8618dfbd Some memory saving in large unittests (#90148)
Two tests test_large_cumsum, test_large_cumprod use a lot of memory. This PR:
* Reduces their memory usage by: avoid `self.assertEqual` and avoid a temporary python variable
* Mark their memory requirement by decorator.

related to https://github.com/pytorch/pytorch/issues/84944
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90148
Approved by: https://github.com/soumith
2022-12-11 21:04:38 +00:00
Edward Z. Yang
2ad6ed8ac9 Fix some typed storage is deprecated warnings. (#89867)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89867
Approved by: https://github.com/albanD
2022-12-07 20:09:57 +00:00
eqy
f7520cb51e Reduce memory usage requirement of test_pdist_norm_large in test_torch.py (#90075)
Basically the same fix as #85373, `/usr/bin/time` indicates that the memory requirement on the host-side was actually ~64GiB before the workaround and ~30GiB after.

CC @ptrblck @davidberard98

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90075
Approved by: https://github.com/davidberard98
2022-12-03 05:28:21 +00:00
Yu, Guangye
4144ad16af add XPU backend to support torch.save and torch.load (#89679)
# Motivate
We need to add XPU backend to support torch.save and torch.load when parameter _use_new_zipfile_serialization=False.

# Solution
We give a design via wrap data as a tensor:
>1. and use an in-place copy for H2D
>2. directly call a tensor.to() for D2H.

This can help us:
>1. unify the generic code for all backends.
>2. support all the non-CPU device backends.

# Additional Context
No need more UT.
test/test_serialization.py will cover this code change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89679
Approved by: https://github.com/ezyang
2022-11-30 20:38:02 +00:00
albanD
8713119c89 Stream actually overrides __new__ so we need to patch it as well (#89592)
Avoids
```
$ python foo.py
Traceback (most recent call last):
  File "foo.py", line 3, in <module>
    a = torch.cuda.Stream()
  File "/home/albandes/local/pytorch/3.8_debug_source/torch/cuda/streams.py", line 34, in __new__
    return super(Stream, cls).__new__(cls, priority=priority, **kwargs)
TypeError: object.__new__() takes exactly one argument (the type to instantiate)
```
And now gets
```
$ python foo.py
Traceback (most recent call last):
  File "foo.py", line 3, in <module>
    a = torch.cuda.Stream()
  File "/home/albandes/local/pytorch/3.8_debug_source/torch/cuda/streams.py", line 34, in __new__
    return super(Stream, cls).__new__(cls, priority=priority, **kwargs)
  File "/home/albandes/local/pytorch/3.8_debug_source/torch/cuda/_utils.py", line 44, in err_fn
    raise RuntimeError(
RuntimeError: Tried to instantiate dummy base class Stream

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89592
Approved by: https://github.com/soumith
2022-11-29 21:43:23 +00:00
David Berard
a029ec2c88 Move gpu slow tests to sm86 (#87880)
NVFuser tests (which are slow tests) would be better to run on more
modern GPU hardware.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87880
Approved by: https://github.com/malfet
2022-11-29 19:29:59 +00:00
Nikita Karetnikov
57af0c8245 Bug fix: make sure copy_impl doesn't read out of bounds (#88544)
Fixes #88543.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88544
Approved by: https://github.com/lezcano
2022-11-16 13:23:38 +00:00
PyTorch MergeBot
8441443132 Revert "Add nondeterministic error for scatter (#88244)"
This reverts commit e940a2f8e2.

Reverted https://github.com/pytorch/pytorch/pull/88244 on behalf of https://github.com/mehtanirav due to Internal test failures
2022-11-10 23:56:49 +00:00
Kurt Mohler
ee28b865ee Deprecate TypedStorage, its derived classes, and all of their public methods (#85303)
Part of #85302

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85303
Approved by: https://github.com/ezyang
2022-11-08 18:11:01 +00:00
Kurt Mohler
e940a2f8e2 Add nondeterministic error for scatter (#88244)
Fixes #88096

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88244
Approved by: https://github.com/ezyang, https://github.com/mruberry
2022-11-04 20:23:59 +00:00
Nikolay Korovaiko
0f6304ef1e disable the out variants in test_cumprod test for inductor (#88328)
`out=` variants aren't supported by autograd and it's not a must fix, so disabling the test (https://github.com/pytorch/torchdynamo/issues/1798) for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88328
Approved by: https://github.com/desertfire
2022-11-03 16:52:37 +00:00
Nikolay Korovaiko
529ba076c6 add an exclude for test_constructor for inductor (#88143)
This test (https://github.com/pytorch/torchdynamo/issues/1800) fails since none of the c-tor ops support `pin_memory=True`. Natalia suggests it's not a priority to fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88143
Approved by: https://github.com/desertfire
2022-11-03 16:21:18 +00:00
Edward Z. Yang
f884e817d4 Make Python op registration work with torchdeploy/multipy (#87162)
See strategy at PythonOpRegistrationTrampoline.cpp for the
big picture.

Along the way, I made OperatorHandle support == and hashing,
and slightly changed the low level python_dispatch impl API
to disallow empty strings for dispatch key, which had the knock
on effect of requiring us to explicitly make sure we pass in
CompositeImplicitAutograd if we would have passed in "" (I didn't apply
this to the rest of the file because I'm lazy.)

Test strategy is we delete the logic for preventing Python op
registrations in torch from being skipped in a torchdeploy context
and show CI still works.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87162
Approved by: https://github.com/anjali411, https://github.com/bdhirsh
2022-11-03 12:56:44 +00:00
Philip Meier
bc73affdad prepare removal of deprecated functionality in torch.testing (#87969)
_Redo of #86586 with all BC breaking changes granularly placed into separate commits._

---

Per title. Deprecation happened on Feb 25, 2022 in c6f1bbc0ac, which made it into the 1.12 release. Since it is now 245 days later and the next release will be 1.14, the removals later in the stack comply with the [BC policy](https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#minimizing-the-disruption-of-bc-breaking-changes).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87969
Approved by: https://github.com/mruberry
2022-11-02 14:04:48 +00:00
Kurt Mohler
1dbc8ad3b7 Add Warning class and refactor C++ warnings to use it (#84101)
Also adds `TORCH_WARN_WITH` and `TORCH_WARN_DEPRECATION` macros

Part of #72948

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84101
Approved by: https://github.com/albanD
2022-10-18 20:02:42 +00:00
Natalia Gimelshein
1704256b10 Enables where to have cpu scalar args (#87022)
This is for decompositions only, no attempt made to have good performance for this case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87022
Approved by: https://github.com/ezyang, https://github.com/eellison, https://github.com/mruberry
2022-10-17 17:08:47 +00:00
Mikayla Gawarecki
afaee00fec Add python nested_tensor and as_nested_tensor constructors in torch.nested (#85593)
Remove `torch.nested_tensor` which has erroneous behavior wrt gradients (could be either leaf or not leaf). Introduce `torch.nested.nested_tensor` and `torch.nested.as_nested_tensor` in the vein of `torch.tensor` and `torch.as_tensor`. Done in nested `__init__.py` for now but can move to pybind in future (when we want to load from numpy/nested lists ).

Discussed offline with @cpuhrsch and pybind constructor (https://github.com/pytorch/pytorch/pull/85536) was more gnarly than expected, so we can move to that when we do need loading from numpy etc.

Differential Revision: [D39806622](https://our.internmc.facebook.com/intern/diff/D39806622)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85593
Approved by: https://github.com/drisspg, https://github.com/cpuhrsch
2022-09-28 20:15:02 +00:00
Kurt Mohler
b0a631cd14 Add nondeterministic alert for MaxUnpool1d/2d/3d (#84766)
Part of #80827
Part of #78249
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84766
Approved by: https://github.com/Lezcano, https://github.com/mruberry, https://github.com/nikitaved
2022-09-17 11:58:18 +00:00
soulitzer
02f654abca Disable torch.library.Library with PYTORCH_DISABLE_LIBRARY (#85190)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85190
Approved by: https://github.com/d4l3k
2022-09-17 03:05:43 +00:00
Khushi Agrawal
a9258eba8e [Testing] Port bernoulli and multinomial to ErrorInputs. (#74683)
Hi,
The PR aims to port `bernoulli` and `multinomial` to error inputs. Thanks!

cc: @kshitij12345! :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74683
Approved by: https://github.com/kshitij12345, https://github.com/mruberry
2022-09-16 21:24:09 +00:00
Elias Ellison
f37069aac7 Re-enable fixed dynamo tests (#84969)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84969
Approved by: https://github.com/bdhirsh, https://github.com/ezyang
2022-09-16 15:36:52 +00:00
Kurt Mohler
95a2c3df31 Replace expectedAlertNondeterministic with simpler check function (#84808)
Fixes #84807

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84808
Approved by: https://github.com/mruberry
2022-09-16 01:10:12 +00:00
Kurt Mohler
5b58140d1a Add deterministic impl of scatter_add CUDA for all input sizes (#79466)
Fixes #50469

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79466
Approved by: https://github.com/ngimel
2022-09-07 03:12:49 +00:00
Natalia Gimelshein
0b363c5c5c don't synchronize single element any/all reductions (#84465)
Fixes #84291

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84465
Approved by: https://github.com/ezyang
2022-09-02 21:18:58 +00:00
Elias Ellison
f701cb04fb Test Dynamo CI w Fake Tensors (#84282)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84282
Approved by: https://github.com/anijain2305
2022-09-01 00:15:05 +00:00
mattip
4dfa6d28a1 Normalize DLPack stride to 1 where shape < 2 (#83158)
Fixes #83069. Also move all the dlpack tests to a new file., `test_dlpack.py`.

The fix involves always allocating a "strides" int array when converting to dlPack and deleting the strides when the capsule descructor is called. Then the strides are copied from the tensor, and `strides[i]` is set to `1` where `shape[i] < 2`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83158
Approved by: https://github.com/ezyang
2022-08-23 15:03:29 +00:00
Brian Hirsh
0c24af4985 Always allow tensor metadata changes (#83590)
Make it so that it is valid to set metadata after detach calls, like `x.detach().resize_(...)`.

This technically lifts some restrictions around `.data`. This PR means that you can now technically call `x.data.resize_(...)`, which can now directly resize `x` instead of erroring.

My understanding: Before the tensor-variable merge, when `x` and `x.data` were really different tensors, you could resize `x.data` independently of `x`, and during the merge, this error was added to avoid silent confusing behavior changes.

It was agreed that this error has been around long enough (several years) that it's acceptable to drop.  cc @albanD @ezyang.

(Ed already had a prototype PR [here](https://github.com/pytorch/pytorch/pull/83545) - I ended up making one to try to slog through test failures).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83590
Approved by: https://github.com/ezyang
2022-08-19 23:30:43 +00:00
Nikita Shulga
1a09b05c94 Fix torch.equal on CPU (#83350)
`torch.equal` should not raise an exception when comparing tensors of different types
I.e. `torch.equal(torch.tensor([1, 2]), torch.tensor([1, 2], dtype=torch.float)))` should return True rather than raise an exception.
Also, this makes it consistent with GPU behaviour

Fixes https://github.com/pytorch/pytorch/issues/83314

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83350
Approved by: https://github.com/albanD
2022-08-17 03:22:56 +00:00
Nikita Karetnikov
4010f96121 [primTorch] Fix off by 1 in canonicalize_dim (#83198)
Also fix an issue in the `unsqueeze` ref due to this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83198
Approved by: https://github.com/ngimel
2022-08-16 17:57:01 +00:00
PyTorch MergeBot
f534b2c627 Revert "Remove split functional wrapper (#74727)"
This reverts commit a58876ace7.

Reverted https://github.com/pytorch/pytorch/pull/74727 on behalf of https://github.com/seemethere due to Fails internal use cases, might extend out to external use cases as well. Need to assess overall impact of this change more widely
2022-08-10 19:45:23 +00:00
Peter Bell
a58876ace7 Remove split functional wrapper (#74727)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74727
Approved by: https://github.com/albanD, https://github.com/khabinov
2022-08-10 17:57:48 +00:00
Kurt Mohler
c379915969 Add nondeterministic alert to CUDA cumsum (#75693)
Part of #75240

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75693
Approved by: https://github.com/ngimel
2022-08-04 01:58:29 +00:00
Kurt Mohler
14d0296e5c Rename _Typed/_UntypedStorage to Typed/UntypedStorage and update docs (#82438)
### Description

Since the major changes for `_TypedStorage` and `_UntypedStorage` are now complete, they can be renamed to be public.

`TypedStorage._untyped()` is renamed to `TypedStorage.untyped()`.

Documentation for storages is improved as well.

### Issue
Fixes #82436

### Testing
N/A

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82438
Approved by: https://github.com/ezyang
2022-07-30 19:37:08 +00:00
Fabio Rocha
fd84c458f4 Add torch.unflatten and improve its docs (#81399)
unflatten now has a free function version in torch.flatten in addition to
    the method in torch.Tensor.flatten.

    Updated docs to reflect this and polished them a little.
    For consistency, changed the signature of the int version of unflatten in
    native_functions.yaml.

    Some override tests were failing because unflatten has unusual
    characteristics in terms of the .int and .Dimname versions having
    different number of arguments so this required some changes
    to test/test_override.py

    Removed support for using mix of integer and string arguments
    when specifying dimensions in unflatten.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81399
Approved by: https://github.com/Lezcano, https://github.com/ngimel
2022-07-29 15:02:42 +00:00
ecao
1ebe98220c Optimize the copy of BFloat16 to Float and Float to BFloat16 (#79685)
Optimize the copy of BFloat16 to Float and Float to BFloat16.
* Vectorize the copy of BFLoat16 <-> Float
* Use `at::internal::serial_for_each` instead of directly using `cpu_kernel_vec` as  `cpu_kernel_vec` can't handle that input and output has different data types.

single socket (28cores):
```
before: torch.Size([10, 128, 10, 124])  bf16 -> fp32: 4.18e-05 ms;   fp32 -> bf16: 5.04e-05 ms
        torch.Size([10, 128, 30, 124])  bf16 -> fp32: 0.00011868 ms; fp32 -> bf16: 0.0001476 ms

after:  torch.Size([10, 128, 10, 124])  bf16 -> fp32: 1.35e-05 ms;   fp32 -> bf16: 1.97e-05 ms
        torch.Size([10, 128, 30, 124])  bf16 -> fp32: 7.32e-05 ms;   fp32 -> bf16: 5.70e-05 ms
```
single core:
```
before: torch.Size([10, 128, 10, 124])  bf16 -> fp32: 0.000848 ms;   fp32 -> bf16: 0.00105 ms
        torch.Size([10, 128, 30, 124])  bf16 -> fp32: 0.00269 ms;    fp32 -> bf16: 0.00321 ms

after:  torch.Size([10, 128, 10, 124])  bf16 -> fp32: 0.000370 ms;   fp32 -> bf16: 0.000382 ms
        torch.Size([10, 128, 30, 124])  bf16 -> fp32: 0.00153 ms;    fp32 -> bf16: 0.00113 ms
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79685
Approved by: https://github.com/malfet
2022-07-28 14:34:08 +00:00
Huy Do
edf1868e67 Fix test_doc_template regex (#81755)
### The problem

This original regex abuses .* in combination with `re.DOTALL` and leads to a catastrophic backtracking perf issue when there is no match. When it happens, test_doc_template will run "forever" and timeout. Here is an example timeout test https://github.com/pytorch/pytorch/runs/7413337595

Another minor issue with this regex is that it won't matches concatenated doc string like `"""FOO""" + """BAR"""`, which is used for some API `_torch_docs.py`

### The fix
* Remove most of the match all .* usage. I have tested to make sure that the test finishes even when there is no match, i.e. it fails successfully
* Update the regex to match all the following cases before and after linting (You can also try it out on https://pythex.org):

BEFORE
```
add_docstr(torch.abs, r"""
abs(input, *, out=None) -> Tensor

Computes the absolute value of each element in :attr:`input`.

.. math::
    \text{out}_{i} = |\text{input}_{i}|
""" + r"""
Args:
    {input}

Keyword args:
    {out}

Example::

    >>> torch.abs(torch.tensor([-1, -2, 3]))
    tensor([ 1,  2,  3])
""".format(**common_args))

add_docstr(torch.absolute,
           r"""
absolute(input, *, out=None) -> Tensor

Alias for :func:`torch.abs`
""")
```

AFTER
```
add_docstr(
    torch.abs,
    r"""
abs(input, *, out=None) -> Tensor

Computes the absolute value of each element in :attr:`input`.

.. math::
    \text{out}_{i} = |\text{input}_{i}|
"""
    + r"""
Args:
    {input}

Keyword args:
    {out}

Example::

    >>> torch.abs(torch.tensor([-1, -2, 3]))
    tensor([ 1,  2,  3])
""".format(
        **common_args
    ),
)

add_docstr(
    torch.absolute,
    r"""
absolute(input, *, out=None) -> Tensor

Alias for :func:`torch.abs`
""",
)
```

This will unblock https://github.com/pytorch/pytorch/pull/81643
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81755
Approved by: https://github.com/atalman
2022-07-21 16:28:29 +00:00
Animesh Jain
1d90d6ee60 Setup for running PyTorch tests with TorchDynamo and skips for known failing tests (#80106)
@ezyang I am going to keep adding more skips in this PR for now. And once we have the CI running, I will replace with the appropriate decorators.

cc @mlazos , we should add those tests in test_ops.py in this PR as well

cc @jansel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80106
Approved by: https://github.com/ezyang, https://github.com/jansel
2022-07-07 18:57:33 +00:00
Kurt Mohler
4c279994fd Fix Module.share_memory error (#80843)
Fixes #80733

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80843
Approved by: https://github.com/malfet
2022-07-05 15:17:36 +00:00
PyTorch MergeBot
f668b7ecb0 Add integer support to index_reduce (#80464)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80464
Approved by: https://github.com/cpuhrsch
2022-06-30 12:54:51 +00:00
PyTorch MergeBot
d7847ed23e Add integer support to scatter_reduce (#80324)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80324
Approved by: https://github.com/cpuhrsch
2022-06-29 21:10:26 +00:00
Alexander Grund
71d9592a72 Only sync CUDA if the operation is run on GPU (#80328)
This fixes test failures when PyTorch is build without CUDA

Fixes https://github.com/pytorch/pytorch/issues/58563

I used the same is_cuda check that is used in test_nn.py

CC @ailzhang after #58564
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80328
Approved by: https://github.com/mruberry
2022-06-27 14:49:39 +00:00
Alexander Grund
3b8589ac44 Copy Tensor for tests to avoid in-place transform modifying the original tensor (#80331)
Fixes #48591

CC @mruberry  after #60256
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80331
Approved by: https://github.com/mruberry
2022-06-27 14:47:52 +00:00
lezcano
f54e7b4ad6 More forward AD formulas
This PR:
- Corrects the forward AD formula of `torch.sgn`.
  - The reason why we can't use `auto_element_wise` for this operations is rather subtle. I left a comment.
  - This, in turn, fixes a problem we had in forward-over-backward for `linalg.svd` and other spectral decompositions (and `norm`, `linalg.norm`, `linalg.matrix_norm`) that were using `torch.abs` (whose derivative is given by `torch.sgn`.
- Implement the formula for a number of missing operations `nansum`, `amax`, `amin`...
- Simplified a few formulas, most notably the forward AD for `div` and the derivative of `norm`, `linalg.norm` and `vector_norm` for `ord=+-inf`.
- Correct the formula for `mean`, `std_mean`, `var_mean` when `dim` is provided and equal to `()` (or `None`)
- A few minor improvements to `sum_backward`, `unsqueeze_multiple` and formulas depending on them
- Fix the derivatives of `std_mean` and `std_var` (complex support,
ASAN, forward AD...)

Fixes: https://github.com/pytorch/pytorch/issues/67539

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80082

Approved by: https://github.com/zou3519
2022-06-23 01:31:08 +00:00
Alex Hedges
cb2b7b1e57 Fix code that triggers BytesWarning (#79868)
Fixes #74812.

I have fixed the multiple instances in the repository that trigger
`BytesWarning`, and I have enabled the `-bb` option when tests are run
to prevent regressions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79868
Approved by: https://github.com/janeyx99
2022-06-21 01:12:21 +00:00
PyTorch MergeBot
e10cbe3880 Revert "Fix BytesWarning in torch.load() (#74813)"
This reverts commit 6c2e8119dd.

Reverted https://github.com/pytorch/pytorch/pull/74813 on behalf of https://github.com/janeyx99 due to Broke slow tests in cuda 10.2 https://github.com/pytorch/pytorch/runs/6944238177?check_suite_focus=true
2022-06-18 03:53:54 +00:00
Alex Hedges
6c2e8119dd Fix BytesWarning in torch.load() (#74813)
Fixes #74812.

I have enabled the `-bb` option when tests are run to prevent regressions. I don't think it will make CI run more slowly, but I'm not entirely sure.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74813
Approved by: https://github.com/kit1980
2022-06-17 22:56:43 +00:00
drisspg
bdcee8f995 update is_same_size to work with nested tensor dispatch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79297

Approved by: https://github.com/soulitzer
2022-06-11 00:07:27 +00:00
Brian Hirsh
7b3a0ff87a Port index.Tensor to structured kernels.
Tracking issue: #55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69607

Approved by: https://github.com/bdhirsh
2022-06-10 17:27:47 +00:00
Peter Bell
7843a5e882 Move Tensor.grad back into C++
`Tensor.grad` was moved to python in #30531 to add a warning. However,
that warning has since been lowered into C++ so this wrapper is no
longer necessary.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76675

Approved by: https://github.com/albanD
2022-06-10 13:44:45 +00:00
PyTorch MergeBot
4b82ef7928 Revert "Port index.Tensor to structured kernels."
This reverts commit cfd84125bd.

Reverted https://github.com/pytorch/pytorch/pull/69607 on behalf of https://github.com/zengk95 due to This is breaking mac trunk tests cfd84125bd
2022-06-08 20:16:10 +00:00
Brian Hirsh
cfd84125bd Port index.Tensor to structured kernels.
Tracking issue: #55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69607

Approved by: https://github.com/bdhirsh
2022-06-08 18:17:52 +00:00
Kshiteej K
497ae27050 [chalf] warn once on creating a chalf tensor (#78245)
`chalf` is experimental as the op coverage is low.

Following script raises 6 warnings if `set_warn_always(True)` else raises only 1 warning.
```python
import torch
torch.set_warn_always(True)
device='cpu'
t = torch.randn(3, dtype=torch.chalf, device=device)
y = torch.rand(3, dtype=torch.chalf, device=device)
# Allocates new tensor for result
t + y

device='cuda'
t = torch.randn(3, dtype=torch.chalf, device=device)
y = torch.rand(3, dtype=torch.chalf, device=device)

# Allocates new tensor for result
t + y

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78245
Approved by: https://github.com/anjali411
2022-06-01 18:38:31 +00:00
yuguo68
efdb4192bc set data permits requires_grad=True on integer tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78436

Approved by: https://github.com/albanD, https://github.com/soulitzer
2022-06-01 15:56:32 +00:00
PyTorch MergeBot
fca1f495c2 Revert "Port index.Tensor to structured kernels."
This reverts commit 9fe6f1baf5.

Reverted https://github.com/pytorch/pytorch/pull/69607 on behalf of https://github.com/suo due to this broke master, see: 9fe6f1baf5
2022-06-01 00:12:15 +00:00
Brian Hirsh
9fe6f1baf5 Port index.Tensor to structured kernels.
Tracking issue: #55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69607

Approved by: https://github.com/bdhirsh
2022-05-31 22:15:20 +00:00
Kurt Mohler
e9afb43676 Add meta device support to _UntypedStorage and _TypedStorage (#78008)
Fixes #77885

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78008
Approved by: https://github.com/ezyang
2022-05-28 15:33:45 +00:00
Yu Guo
f69c990ecc fix index_select when source tensor is empty
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77881

Approved by: https://github.com/ezyang
2022-05-26 03:10:47 +00:00
Kurt Mohler
cecb2ad95e Restore old names for private funcs in legacy storages (#77861)
Followup from #75459

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77861
Approved by: https://github.com/ezyang
2022-05-20 02:03:34 +00:00
Eric Sauser
2d4291fb81 [torch] Fixed a few test for Windows & Linux GPUs (#77531)
Summary:
While running those tests on
- my local windows GPU machine
- a dev server
- an on-demand GPU
I noticed a few test failures and here's some tentative fixes

Test Plan:
Ran tests on:
- my local windows GPU machine
- a linux dev server w/o GPU
- an linux on-demand GPU server

Note that when using CUDA11, the tests crashes (segfaults) on calls to torch.nn.ConvTranspose3d. Fails on master, but works with CUDA10.

Differential Revision: D36377288

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77531
Approved by: https://github.com/ezyang
2022-05-19 14:04:13 +00:00
Kurt Mohler
aea6e2c396 Merge torch.cuda._UntypedStorage into torch._UntypedStorage (#75459)
Fixes #74933

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75459
Approved by: https://github.com/ezyang
2022-05-19 13:54:39 +00:00
Mikayla Gawarecki
841c65f499 Unprivate _index_reduce and add documentation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76997

Approved by: https://github.com/cpuhrsch
2022-05-13 19:48:38 +00:00
Kulin Seth
e011a8e18b Enable PyTorch operations on MPS Backend. (#77343)
Add PyTorch operations to MPS backend.

- https://github.com/pytorch/pytorch/issues/77394
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77343
Approved by: https://github.com/albanD
2022-05-13 18:28:53 +00:00
Mikayla Gawarecki
1141b45e7a Index reduction CUDA support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76296

Approved by: https://github.com/cpuhrsch, https://github.com/ngimel
2022-05-13 14:47:52 +00:00
Christian Puhrsch
ce9a477fdf Support torch.Tensor.to for CSR
Fixes #76379

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76400
Approved by: https://github.com/pearu, https://github.com/davidberard98
2022-05-05 21:59:50 +00:00
Natalia Gimelshein
ce76244200 fix where type promotion
Fixes #73298
I don't know whether `where` kernel actually supports type promotion, nor am I in the mood to find out, so it's manual type promotion.
Edit: nah, i can't tell TI to "promote to common dtype" because of bool condition, so manual type promotion is our only option.
I'll see what tests start failing and fix.
Uses some parts from #62084
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76691
Approved by: https://github.com/mruberry
2022-05-03 04:40:04 +00:00
kshitij12345
e36d25fbae [complex32] support printing the tensor
Reference: #74537
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76614
Approved by: https://github.com/anjali411
2022-05-01 12:46:09 +00:00
Mikayla Gawarecki
676a4a3969 Prototype _index_reduce (CPU-only)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75981

Approved by: https://github.com/cpuhrsch
2022-04-27 23:01:00 +00:00
kshitij12345
aa51704ce5 [complex32] add chalf alias for complex32 and chalf method
Reference: https://github.com/pytorch/pytorch/issues/74537

Adds chalf alias for complex32 and also adds method `chalf` similar to `cfloat, cdouble`

TODO:
* [x] Add docs
* [x] Add override
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75320
Approved by: https://github.com/anjali411
2022-04-20 23:44:47 +00:00
Edward Z. Yang
ee955b8bb9 Cannibalize noarch CI job into crossref CI job
crossref is a new strategy for performing tests when you want
to run a normal PyTorch API call, separately run some variation of
the API call (e.g., same thing but all the arguments are meta tensors)
and then cross-reference the results to see that they are consistent.
Any logic you add to CrossRefMode will get run on *every* PyTorch API
call that is called in the course of PyTorch's test suite.  This can
be a good choice for correctness testing if OpInfo testing is not
exhaustive enough.

For now, the crossref test doesn't do anything except verify that
we can validly push a mode onto the torch function mode stack for all
functions.

Signed-off-by: Edward Z. Yang <ezyangfb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75988

Approved by: https://github.com/seemethere
2022-04-20 11:56:25 +00:00
Edward Z. Yang
30943d1610 Remove noarchTest decorator
These tests are cheap so it doesn't matter if we run them on all
configs.  This is in preparation for removing the noarch build
configuration entirely.

Signed-off-by: Edward Z. Yang <ezyangfb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75985

Approved by: https://github.com/seemethere, https://github.com/cbalioglu
2022-04-19 00:48:49 +00:00
Beilei Zheng
332086c08d Add BFloat16 support for multinomial and poisson on CPU
Add BFloat16 support for multinomial and poisson on CPU

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63215
Approved by: https://github.com/frank-wei, https://github.com/bigfootjon
2022-04-14 15:42:18 +00:00
Jagadish Krishnamoorthy
26ba7a9297 ROCm: Enable test_masked_scatter_large_tensor
#68487 fixes the issue #60190 for ROCm >= 5.0 release.

Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>

Fixes #60190

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75455
Approved by: https://github.com/ezyang
2022-04-08 15:59:40 +00:00
Nikita Karetnikov
936a65056e Use the same checks in all grid_sampler functions
Fixes #73187.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75164
Approved by: https://github.com/albanD
2022-04-04 15:21:44 +00:00
Alban Desmaison
0ce02ea52d Revert D35284563: Use the same checks in all grid_sampler functions
Test Plan: revert-hammer

Differential Revision:
D35284563 (835cc66e5d)

Original commit changeset: 1477c506b875

Original Phabricator Diff: D35284563 (835cc66e5d)

fbshipit-source-id: 7260f4dfda23bd60200e5ba2c5bf3e4f833c2646
(cherry picked from commit fbe082905ef678e7dd70dbc9520dca644383ce01)
2022-04-01 16:45:46 +00:00
kshitij12345
65b65af236 [complex32] cat, fill_(partial), item
Reference : #74537

`cat_backwards` (on CUDA) requires support for `fill`, have added support for `fill`. (Also `fill` requires `item` support)

Now `fill` backward requires `sum` (will add it in later PR).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75010
Approved by: https://github.com/anjali411
2022-04-01 15:19:05 +00:00
Nikita Karetnikov
835cc66e5d Use the same checks in all grid_sampler functions (#74635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74635

Fixes #73187.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D35284563

Pulled By: albanD

fbshipit-source-id: 1477c506b8755d864ca902ee140bee7bdb0069b0
(cherry picked from commit dcbd5242baaae11f9e323d99a9596e5b88e86bd7)
2022-04-01 14:26:16 +00:00
Mikayla Gawarecki
2bfa018462 [BC-breaking] Use ScatterGatherKernel for scatter_reduce (CPU-only) (#74226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74226

Update signature of `scatter_reduce_` to match `scatter_/scatter_add_`

`Tensor.scatter_reduce_(int64 dim, Tensor index, Tensor src, str reduce)`

- Add new reduction options in ScatterGatherKernel.cpp and update `scatter_reduce` to call into the cpu kernel for `scatter.reduce`
- `scatter_reduce` now has the same shape constraints as `scatter_` and `scatter_add_`
- Migrate `test/test_torch.py:test_scatter_reduce` to `test/test_scatter_gather_ops.py`

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D35222842

Pulled By: mikaylagawarecki

fbshipit-source-id: 84930add2ad30baf872c495251373313cb7428bd
(cherry picked from commit 1b45139482e22eb0dc8b6aec2a7b25a4b58e31df)
2022-04-01 05:57:45 +00:00
Nikita Shulga
bfac65dfe5
[testing] Update dispatch macros (#74977)
This PR is reland of #74289 
Co-authored-by: Khushi Agrawal <khushiagrawal411@gmail.com>
2022-03-30 14:13:21 -07:00
PyTorch MergeBot
2e4152b118 Revert "[testing] Update dispatch macros"
This reverts commit eed19a0f38.

Reverted https://github.com/pytorch/pytorch/pull/74289 on behalf of https://github.com/malfet
2022-03-30 19:52:37 +00:00
Khushi Agrawal
eed19a0f38 [testing] Update dispatch macros
Hi,
This PR is the follow-up PR of #71561. (the previous PR had a couple of merge conflicts and was reverted, this PR resolves that).
Please take a look. Thanks!

cc: @pmeier @mruberry @kshitij12345
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74289
Approved by: https://github.com/pmeier, https://github.com/mruberry
2022-03-30 16:10:16 +00:00
Edward Z. Yang
51e7a3406c Fix formatting of scalar tensors (don't call item)
Signed-off-by: Edward Z. Yang <ezyangfb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74376

Approved by: https://github.com/bdhirsh
2022-03-25 02:22:25 +00:00
Jane Xu
3f9115dc7a Decorate test_pdist_large for requiring large memory (#74574)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/74154

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74574

Reviewed By: george-qi

Differential Revision: D35100229

Pulled By: janeyx99

fbshipit-source-id: d7df377318e45c7f5447c034aa025b1422fcc06e
(cherry picked from commit 335a76d9f2a721b30e1b9e1c869bfbe431f01a2a)
2022-03-24 17:25:37 +00:00
kshitij12345
f7ee308dfb [complex-half] support casting (by updating copy_)
Reference https://github.com/pytorch/pytorch/issues/71680

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73847
Approved by: https://github.com/anjali411
2022-03-23 21:42:59 +00:00
Kurt Mohler
79ddc72b85 Virtualize <type>Storage classes (#66970)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66228

cc ezyang bhosmer smessmer ljk53 bdhirsh

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66970

Reviewed By: bdhirsh

Differential Revision: D33245612

Pulled By: ezyang

fbshipit-source-id: 4c61c2cb029e2b94b0e68927c377d3e1c358dd7c
(cherry picked from commit d29fcdfb4bc2cc17b1795d4349e4b56fa0d1cf12)
2022-03-22 23:44:48 +00:00
Saketh Are
46a88036af Refactor error input tests in test_torch.py to OpInfos (#73981)
Summary:
This PR ports several tests in `test/test_torch.py` over to OpInfo ErrorInputs.

Some tests commented "convert to ErrorInputs" still remain in `test_torch.py`. They fall under two categories:
- Memory overlap tests which specifically test the in-place version of an operator (e.g. [this test](424a054d53/test/test_torch.py (L3788)) for index_add_).
- Tests with non-trivial behavior calling `torch.cuda.synchronize()` after calling the operator being tested (e.g. [this test](424a054d53/test/test_torch.py (L4948)) for torch.multinomial).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73981

Reviewed By: qihqi

Differential Revision: D35016669

Pulled By: saketh-are

fbshipit-source-id: bc0016d2b2bfb566a9dfef81ecf44e0adb9e4b14
(cherry picked from commit 99bcbdb05f2c10a717a269b0010aa3a3e24fe5c0)
2022-03-21 22:31:37 +00:00
Nikita Shulga
ef066f0832 Revert D34856571: [pytorch][PR] Replace get_all_ type macros with the ATen dispatch macros.
Test Plan: revert-hammer

Differential Revision:
D34856571 (3ded7b1da3)

Original commit changeset: 0dca038bcad5

Original Phabricator Diff: D34856571 (3ded7b1da3)

fbshipit-source-id: 594553fa0b710d78beba59d5d2b646f1f1270386
(cherry picked from commit 8090eb9b12dcf452a9e7dc01792a66fb91b563b6)
2022-03-15 22:07:11 +00:00
Khushi Agrawal
3ded7b1da3 Replace get_all_ type macros with the ATen dispatch macros. (#71561)
Summary:
Hi, Team!
The PR is motivated from https://github.com/pytorch/pytorch/pull/71153#discussion_r782446738. It aims to replace `get_all` type macros with the ATen dispatch macros.

The files it iterates over are: (Thanks, Lezcano, for the idea!!)

<details>
<summary>

`test/test_autograd.py`</summary>

<p>

```python
43:from torch.testing._internal.common_dtype import get_all_dtypes
8506:        floating_dt = [dt for dt in get_all_dtypes() if dt.is_floating_point]
```

</p>
</details>

<details>
<summary>

`test/test_binary_ufuncs.py`</summary>

<p>

```python
26:    all_types_and_complex_and, integral_types_and, get_all_dtypes, get_all_int_dtypes, get_all_math_dtypes,
27:    get_all_complex_dtypes, get_all_fp_dtypes,
935:    dtypes(*get_all_dtypes(include_bool=False, include_complex=False))
1035:    dtypes(*get_all_dtypes(
1488:    dtypes(*(get_all_dtypes(include_bool=False, include_bfloat16=False)))
1879:    dtypes(*product(get_all_dtypes(include_complex=False), get_all_dtypes(include_complex=False)))
1887:    dtypes(*(get_all_int_dtypes() + [torch.bool]))
1913:    dtypes(*(get_all_fp_dtypes()))
1941:    dtypes(*(get_all_fp_dtypes()))
1977:    dtypes(*product(get_all_complex_dtypes(), get_all_dtypes()))
2019:    dtypes(*product(get_all_fp_dtypes(), get_all_fp_dtypes()))
2048:    dtypes(*get_all_dtypes())
2110:    dtypes(*product(get_all_dtypes(include_complex=False),
2111:                     get_all_dtypes(include_complex=False)))
2128:            types = [torch.bool, torch.bfloat16] + get_all_int_dtypes()
2173:        if dtypes[1] in get_all_fp_dtypes():
2178:    dtypes(*product(get_all_fp_dtypes(),
2179:                     get_all_fp_dtypes()))
2260:    dtypesIfCUDA(*set(get_all_math_dtypes('cuda')) - {torch.complex64, torch.complex128})
2261:    dtypes(*set(get_all_math_dtypes('cpu')) - {torch.complex64, torch.complex128})
2273:    dtypesIfCUDA(*set(get_all_math_dtypes('cuda')) - {torch.complex64, torch.complex128})
2274:    dtypes(*set(get_all_math_dtypes('cpu')) - {torch.complex64, torch.complex128})
2307:    dtypes(*get_all_math_dtypes('cpu'))
2319:    dtypes(*get_all_fp_dtypes(include_bfloat16=False))
2331:    dtypes(*get_all_int_dtypes())
2356:    dtypes(*get_all_dtypes(include_bfloat16=False, include_bool=False, include_complex=False))
2393:        if dtype in get_all_int_dtypes():
2614:    dtypes(*get_all_dtypes())
2624:    dtypes(*tuple(itertools.combinations_with_replacement(get_all_dtypes(), 2)))
2806:    dtypes(*list(product(get_all_dtypes(include_complex=False),
2807:                          get_all_dtypes(include_complex=False))))
2866:    dtypes(*list(product(get_all_complex_dtypes(),
2867:                          get_all_complex_dtypes())))
2902:    dtypes(*product(get_all_dtypes(), get_all_dtypes()))
2906:    dtypes(*product(get_all_dtypes(), get_all_dtypes()))
2910:    dtypes(*product(get_all_dtypes(), get_all_dtypes()))
3019:        dtypes = [torch.float, torch.double] + get_all_complex_dtypes()
3221:    dtypes(*get_all_dtypes(include_complex=False))
3407:    dtypes(*list(product(get_all_dtypes(include_bool=False),
3408:                          get_all_dtypes(include_bool=False))))
3504:    dtypes(*product(get_all_dtypes(include_complex=False, include_bfloat16=False),
3505:                     get_all_dtypes(include_complex=False, include_bfloat16=False)))
3516:            if x.dtype in get_all_int_dtypes() + [torch.bool]:
3643:    dtypes(*product(get_all_dtypes(include_complex=False,
3645:                     get_all_dtypes(include_complex=False,
```

</p>
</details>

<details>
<summary>

`test/test_complex.py`</summary>

<p>

```python
6:from torch.testing._internal.common_dtype import get_all_complex_dtypes
11:    dtypes(*get_all_complex_dtypes())
```

</p>
</details>

<details>
<summary>

`test/test_foreach.py`</summary>

<p>

```python
18:    get_all_dtypes, get_all_int_dtypes, get_all_complex_dtypes, get_all_fp_dtypes,
142:            if dtype in get_all_int_dtypes():
179:            disable_fastpath = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool]
201:            disable_fastpath = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool]
205:                disable_fastpath |= dtype in get_all_int_dtypes() + [torch.bool]
211:                disable_fastpath |= dtype not in get_all_complex_dtypes()
241:                bool_int_div = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool]
246:                    disable_fastpath |= dtype in get_all_int_dtypes() + [torch.bool]
248:                    disable_fastpath |= dtype not in get_all_complex_dtypes()
250:                    disable_fastpath |= True and dtype not in get_all_complex_dtypes()
307:        disable_fastpath = dtype in get_all_int_dtypes() + [torch.bool]
365:        if opinfo.name == "_foreach_abs" and dtype in get_all_complex_dtypes():
376:    ops(foreach_unary_op_db, dtypes=get_all_dtypes())
393:         dtypes=get_all_dtypes(include_half=True, include_bfloat16=True, include_complex=False))
401:    ops(foreach_minmax_op_db, dtypes=get_all_fp_dtypes(include_bfloat16=True, include_half=True))
426:            if ord in (1, 2) and dtype in torch.testing.get_all_fp_dtypes():
439:    dtypes(*get_all_dtypes())
449:    ops(foreach_binary_op_db, dtypes=get_all_dtypes())
481:    ops(foreach_binary_op_db, dtypes=get_all_dtypes())
536:            if dtype in get_all_int_dtypes() + [torch.bool] and foreach_op == torch._foreach_div:
545:    ops(foreach_binary_op_db, dtypes=get_all_dtypes())
637:    ops(foreach_pointwise_op_db, allowed_dtypes=get_all_fp_dtypes(include_half=False, include_bfloat16=False))
```

</p>
</details>

<details>
<summary>

`test/test_linalg.py`</summary>

<p>

```python
29:    all_types, floating_types, floating_and_complex_types, get_all_dtypes, get_all_int_dtypes, get_all_complex_dtypes,
30:    get_all_fp_dtypes,
111:    dtypes(*(get_all_dtypes()))
794:        float_and_complex_dtypes = get_all_fp_dtypes() + get_all_complex_dtypes()
807:    dtypes(*(get_all_int_dtypes()))
828:    dtypes(*(get_all_fp_dtypes() + get_all_complex_dtypes()))
841:        if dtype in get_all_complex_dtypes():
844:    dtypes(*itertools.product(get_all_dtypes(),
845:                               get_all_dtypes()))
855:        for dtypes0, dtypes1, dtypes2 in product(get_all_dtypes(), repeat=3):
5607:                  *get_all_fp_dtypes(include_half=not CUDA9, include_bfloat16=(CUDA11OrLater and SM53OrLater)))
5608:    dtypes(*(set(get_all_dtypes()) - {torch.half, torch.bool}))
5644:    dtypes(*(get_all_complex_dtypes() + get_all_fp_dtypes()))
6255:    dtypesIfCUDA(*get_all_complex_dtypes(),
6256:                  *get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater)),
6292:    dtypesIfCUDA(*get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater))))
6323:    dtypesIfCUDA(*get_all_complex_dtypes(),
6324:                  *get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater))))
6325:    dtypes(*get_all_complex_dtypes(), *get_all_fp_dtypes())
6358:    dtypesIfCUDA(*([torch.float, torch.double] + get_all_complex_dtypes()))
6556:    dtypes(*get_all_fp_dtypes(), *get_all_complex_dtypes())
6668:    dtypes(*get_all_fp_dtypes(), *get_all_complex_dtypes())
6741:    dtypes(*get_all_fp_dtypes(), *get_all_complex_dtypes())
```

</p>
</details>

<details>
<summary>

`test/test_nn.py`</summary>

<p>

```python
37:from torch.testing._internal.common_dtype import integral_types, get_all_fp_dtypes, get_all_math_dtypes
50:    onlyNativeDeviceTypes, deviceCountAtLeast, largeTensorTest, expectedFailureMeta, skipMeta, get_all_device_types, \
8862:                for device in get_all_device_types():
9629:            for dt1 in get_all_math_dtypes(device):
9630:                for dt2 in get_all_math_dtypes(device):
9631:                    for dt3 in get_all_math_dtypes(device):
9648:            for input_dtype in get_all_math_dtypes(device):
9664:            for input_dtype in get_all_math_dtypes(device):
13015:    dtypes(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM))
13034:    dtypes(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM))
13159:    dtypes(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM))
17400:    dtypesIfCUDA(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM))
17768:    dtypesIfCUDA(*get_all_fp_dtypes())
17773:    dtypesIfCUDA(*get_all_fp_dtypes())
17778:    dtypesIfCUDA(*get_all_fp_dtypes())
17783:    dtypesIfCUDA(*get_all_fp_dtypes())
17788:    dtypesIfCUDA(*get_all_fp_dtypes())
17793:    dtypesIfCUDA(*get_all_fp_dtypes())
17798:    dtypesIfCUDA(*get_all_fp_dtypes())
17963:    dtypesIfCUDA(*get_all_fp_dtypes())
17977:    dtypesIfCUDA(*get_all_fp_dtypes())
18684:    def test_cross_entropy_loss_prob_target_all_reductions(self, device):
```

</p>
</details>

<details>
<summary>

`test/test_numpy_interop.py`</summary>

<p>

```python
12:from torch.testing._internal.common_dtype import get_all_dtypes
399:    dtypes(*get_all_dtypes())
```

</p>
</details>

<details>
<summary>

`test/test_ops.py`</summary>

<p>

```python
12:from torch.testing._internal.common_dtype import floating_and_complex_types_and, get_all_dtypes
86:        for dtype in get_all_dtypes():
```

</p>
</details>

<details>
<summary>

`test/test_reductions.py`</summary>

<p>

```python
16:    get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_complex_dtypes, get_all_fp_dtypes,
360:         allowed_dtypes=get_all_dtypes(include_bfloat16=False))
366:         allowed_dtypes=get_all_dtypes(include_bfloat16=False))
394:         allowed_dtypes=get_all_dtypes(include_bfloat16=False))
750:        for dtype in [dtype for dtype in get_all_math_dtypes('cpu') if dtype != torch.float16]:
1404:    dtypes(*get_all_dtypes(include_bool=False, include_complex=False))
1457:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1458:              get_all_complex_dtypes()))
1465:            return dtype in get_all_int_dtypes()
1494:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False)))
1501:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False)))
1507:    dtypes(*(get_all_complex_dtypes()))
1514:        dtypes = list(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False))
1523:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False)))
1531:        if dtype in get_all_fp_dtypes():
1608:    dtypes(*(get_all_dtypes(include_half=True, include_bfloat16=False,
1837:    dtypes(*get_all_dtypes(include_bool=False, include_complex=False))
1855:    dtypes(*(set(get_all_dtypes(include_bool=False, include_complex=False)) - {torch.uint8}))
3219:        for dtype in get_all_dtypes(include_half=True, include_bfloat16=False,
```

</p>
</details>

<details>
<summary>

`test/test_serialization.py`</summary>

<p>

```python
26:from torch.testing._internal.common_dtype import get_all_dtypes
586:        for device, dtype in product(devices, get_all_dtypes()):
589:            for other_dtype in get_all_dtypes():
```

</p>
</details>

<details>
<summary>

`test/test_shape_ops.py`</summary>

<p>

```python
18:from torch.testing._internal.common_dtype import get_all_dtypes
230:    dtypes(*get_all_dtypes(include_complex=False, include_bool=False, include_half=False,
232:    dtypesIfCUDA(*get_all_dtypes(include_complex=False, include_bool=False, include_bfloat16=False))
344:    dtypes(*get_all_dtypes())
443:    dtypes(*get_all_dtypes())
461:    dtypes(*get_all_dtypes())
570:    dtypes(*get_all_dtypes(include_complex=False))
```

</p>
</details>

<details>
<summary>

`test/test_sort_and_select.py`</summary>

<p>

```python
12:    all_types, all_types_and, floating_types_and, get_all_dtypes, get_all_int_dtypes, get_all_fp_dtypes,
136:    dtypes(*set(get_all_dtypes()) - {torch.bool, torch.complex64, torch.complex128})
231:    dtypes(*set(get_all_dtypes()) - {torch.bool, torch.complex64, torch.complex128})
296:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
647:    dtypesIfCUDA(*get_all_fp_dtypes())
678:    dtypesIfCUDA(*(get_all_dtypes(include_complex=False,
682:    dtypes(*(get_all_dtypes(include_complex=False, include_bool=False, include_half=False, include_bfloat16=False)))
739:    dtypesIfCPU(*set(get_all_dtypes()) - {torch.complex64, torch.complex128})
740:    dtypes(*set(get_all_dtypes()) - {torch.bfloat16, torch.complex64, torch.complex128})
799:    dtypesIfCPU(*set(get_all_dtypes()) - {torch.complex64, torch.complex128})
800:    dtypes(*set(get_all_dtypes()) - {torch.bfloat16, torch.complex64, torch.complex128})
```

</p>
</details>

<details>
<summary>

`test/test_sparse.py`</summary>

<p>

```python
20:from torch.testing import get_all_complex_dtypes, get_all_fp_dtypes
29:    floating_and_complex_types, floating_and_complex_types_and, get_all_dtypes, get_all_int_dtypes,
1963:            return dtype in get_all_int_dtypes()
1994:    dtypes(*get_all_dtypes(include_bool=False, include_half=False,
2103:            return dtype in get_all_int_dtypes()
2138:    dtypes(*get_all_dtypes(include_bool=False, include_half=False,
2626:        all_sparse_dtypes = get_all_dtypes(include_complex=True)
2633:        all_sparse_dtypes = get_all_dtypes(include_complex=True)
3230:    dtypes(*get_all_complex_dtypes(),
3231:            *get_all_fp_dtypes(include_half=False, include_bfloat16=False))
3234:                  *get_all_fp_dtypes(
```

</p>
</details>

<details>
<summary>

`test/test_sparse_csr.py`</summary>

<p>

```python
7:from torch.testing import get_all_complex_dtypes, get_all_fp_dtypes, floating_and_complex_types, make_tensor
17:from torch.testing._internal.common_dtype import floating_types, get_all_dtypes
120:    dtypes(*get_all_dtypes())
133:    dtypes(*get_all_dtypes())
150:    dtypes(*get_all_dtypes())
180:    dtypes(*get_all_dtypes())
201:    dtypes(*get_all_dtypes())
210:    dtypes(*get_all_dtypes())
225:    dtypes(*get_all_dtypes())
244:    dtypes(*get_all_dtypes())
263:    dtypes(*get_all_dtypes())
285:    dtypes(*get_all_dtypes())
411:    dtypes(*get_all_dtypes())
482:    dtypes(*get_all_dtypes())
502:    dtypes(*get_all_dtypes())
562:    dtypes(*get_all_dtypes())
588:    dtypesIfCUDA(*get_all_complex_dtypes(),
589:                  *get_all_fp_dtypes(include_half=SM53OrLater, include_bfloat16=SM80OrLater))
745:    dtypesIfCUDA(*get_all_complex_dtypes(),
746:                  *get_all_fp_dtypes(include_half=SM53OrLater and TEST_CUSPARSE_GENERIC,
765:    dtypesIfCUDA(*get_all_complex_dtypes(),
766:                  *get_all_fp_dtypes(include_half=SM53OrLater and TEST_CUSPARSE_GENERIC,
801:                  *torch.testing.get_all_fp_dtypes(include_bfloat16=SM80OrLater,
841:                  *torch.testing.get_all_fp_dtypes(include_bfloat16=SM80OrLater,
1182:    dtypes(*get_all_dtypes())
1276:    dtypes(*get_all_dtypes(include_bool=False, include_half=False, include_bfloat16=False))
1286:    dtypes(*get_all_dtypes())
```

</p>
</details>

<details>
<summary>

`test/test_tensor_creation_ops.py`</summary>

<p>

```python
21:    onlyCUDA, skipCPUIf, dtypesIfCUDA, skipMeta, get_all_device_types)
23:    get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes
150:        for dt in get_all_dtypes():
160:        for dt in get_all_dtypes():
314:        dtypes = [dtype for dtype in get_all_dtypes() if dtype != torch.bfloat16]
1012:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1013:              get_all_complex_dtypes()))
1032:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1033:              get_all_complex_dtypes()))
1050:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1051:              get_all_complex_dtypes()))
1745:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
1779:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
1868:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
1926:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
1954:            do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, torch_device)
1956:            do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, None)
1957:            do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, torch_device)
2538:        for device in get_all_device_types():
2645:        for dtype in get_all_dtypes():
2678:    dtypes(*(get_all_fp_dtypes(include_half=False, include_bfloat16=False) +
2679:              get_all_complex_dtypes()))
2716:    dtypes(*get_all_fp_dtypes(include_half=False, include_bfloat16=False))
2827:            for dt in get_all_dtypes():
2913:    dtypes(*get_all_dtypes(include_bool=False, include_half=False))
2914:    dtypesIfCUDA(*get_all_dtypes(include_bool=False, include_half=True))
3028:    dtypes(*(get_all_fp_dtypes() + get_all_complex_dtypes()))
3033:    dtypes(*(get_all_fp_dtypes() + get_all_complex_dtypes()))
3074:    dtypes(*get_all_dtypes(include_bool=False, include_half=False, include_complex=False))
3075:    dtypesIfCUDA(*((get_all_int_dtypes() + [torch.float32, torch.float16, torch.bfloat16])
3077:                    else get_all_dtypes(include_bool=False, include_half=True, include_complex=False)))
3873:    dtypes(*get_all_dtypes())
3884:    dtypes(*get_all_dtypes(include_bool=False))
3916:            for other in get_all_dtypes():
3922:    dtypes(*get_all_dtypes())
3932:    dtypes(*get_all_dtypes(include_bool=False))
3955:    dtypes(*get_all_dtypes(include_bool=False))
3961:    dtypes(*get_all_dtypes(include_bool=False))
3965:    dtypes(*get_all_dtypes())
```

</p>
</details>

<details>
<summary>

`test/test_testing.py`</summary>

<p>

```python
25:from torch.testing._internal.common_dtype import get_all_dtypes
31:    dtypes(*(get_all_dtypes(include_half=True, include_bfloat16=False,
```

</p>
</details>

<details>
<summary>

`test/test_torch.py`</summary>

<p>

```python
51:    expectedAlertNondeterministic, get_all_device_types, skipXLA)
57:    get_all_fp_dtypes, get_all_int_dtypes, get_all_math_dtypes, get_all_dtypes, get_all_complex_dtypes
296:            for d in get_all_device_types():
323:            for device in get_all_device_types():
324:                for dt1 in get_all_dtypes():
325:                    for dt2 in get_all_dtypes():
343:            all_dtypes = get_all_dtypes()
350:            all_dtypes = get_all_dtypes()
781:            for dtype in get_all_dtypes():
986:            for device in get_all_device_types():
1017:            for device in get_all_device_types():
1018:                for dtype in get_all_math_dtypes(device):
2792:            for device in get_all_device_types():
3186:    dtypes(*get_all_dtypes())
3195:        for error_dtype in get_all_dtypes():
3203:    dtypes(*get_all_dtypes())
3212:        for error_dtype in get_all_dtypes():
4539:    dtypes(*get_all_fp_dtypes())
4545:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
4577:    dtypes(*get_all_fp_dtypes(include_half=False, include_bfloat16=False))
4578:    dtypesIfCPU(*(get_all_fp_dtypes(include_half=False, include_bfloat16=True)))
4579:    dtypesIfCUDA(*(get_all_fp_dtypes(include_bfloat16=False)))
4599:    dtypes(*(get_all_fp_dtypes(include_half=False, include_bfloat16=False)))
4600:    dtypesIfCPU(*(get_all_dtypes(include_half=False, include_bfloat16=False, include_complex=False)))
4601:    dtypesIfCUDA(*(get_all_dtypes(include_bfloat16=False, include_complex=False)))
4613:        for p_dtype in get_all_fp_dtypes(include_half=device.startswith('cuda'), include_bfloat16=False):
4628:    dtypes(*(get_all_fp_dtypes(include_half=False, include_bfloat16=False)))
4629:    dtypesIfCUDA(*(get_all_fp_dtypes(include_bfloat16=False)))
4640:    dtypes(*get_all_fp_dtypes())
4723:    dtypes(*get_all_fp_dtypes())
4735:    dtypes(*get_all_fp_dtypes(include_bfloat16=False))
4736:    dtypesIfCUDA(*get_all_fp_dtypes())
4747:    dtypes(*get_all_fp_dtypes())
4761:    dtypes(*get_all_fp_dtypes())
4771:    dtypes(*get_all_fp_dtypes())
4792:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
5302:    dtypes(*get_all_dtypes(include_bfloat16=False))
5322:    dtypes(*get_all_dtypes(include_half=False, include_bfloat16=False))
5323:    dtypesIfCPU(*get_all_dtypes(include_bfloat16=False))
5324:    dtypesIfCUDA(*get_all_dtypes(include_bfloat16=False))
5591:        for dt in get_all_dtypes():
5611:        for dt in get_all_dtypes():
5678:        for dt in get_all_dtypes():
5696:    dtypesIfCUDA(*set(get_all_math_dtypes('cuda')))
5697:    dtypes(*set(get_all_math_dtypes('cpu')))
5746:    dtypes(*get_all_dtypes())
5780:    dtypes(*get_all_dtypes())
5885:    dtypes(*get_all_dtypes())
5902:    dtypes(*get_all_dtypes())
5945:    dtypes(*get_all_dtypes())
5979:    dtypes(*get_all_dtypes(include_bool=False))
6049:    dtypes(*get_all_dtypes(include_bool=False))
6092:    dtypes(*(get_all_fp_dtypes(include_bfloat16=False, include_half=False) +
6093:              get_all_complex_dtypes()))
6094:    dtypesIfCPU(*get_all_dtypes())
6095:    dtypesIfCUDA(*get_all_dtypes())
6122:    dtypes(*(get_all_fp_dtypes(include_bfloat16=False, include_half=False) +
6123:              get_all_complex_dtypes()))
6124:    dtypesIfCPU(*get_all_dtypes())
6125:    dtypesIfCUDA(*get_all_dtypes())
6163:    dtypes(*(get_all_fp_dtypes(include_bfloat16=False, include_half=False) +
6164:              get_all_complex_dtypes()))
6165:    dtypesIfCPU(*get_all_dtypes())
6166:    dtypesIfCUDA(*get_all_dtypes())
6190:    dtypes(*(get_all_complex_dtypes() +
6191:              get_all_int_dtypes()))
6238:    dtypes(*get_all_dtypes())
6323:    dtypes(*get_all_dtypes())
6389:    dtypes(*product(get_all_dtypes(), (torch.uint8, torch.bool)))
6699:    dtypesIfCUDA(*set(get_all_math_dtypes('cuda')))
6700:    dtypes(*set(get_all_math_dtypes('cpu')))
7452:    dtypes(*get_all_dtypes(include_bool=False))
7461:    dtypes(*get_all_dtypes(include_bool=False))
7477:    dtypes(*get_all_dtypes(include_bool=False))
7496:    dtypes(*get_all_dtypes(include_bool=False))
7538:    dtypes(*get_all_dtypes(include_bool=False))
8162:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes() +
8163:              get_all_complex_dtypes()))
8175:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes() +
8176:              get_all_complex_dtypes()))
```

</p>
</details>

<details>
<summary>

`test/test_type_promotion.py`</summary>

<p>

```python
14:    get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_fp_dtypes
187:        for dtype in get_all_dtypes():
262:        dtypes1 = get_all_math_dtypes('cuda')
263:        dtypes2 = get_all_math_dtypes(device)
339:    dtypes(*itertools.product(get_all_dtypes(), get_all_dtypes()))
468:            for dt1 in get_all_math_dtypes(device):
469:                for dt2 in get_all_math_dtypes(device):
519:            for dt1 in get_all_math_dtypes(device):
520:                for dt2 in get_all_math_dtypes(device):
528:        for dt in get_all_math_dtypes(device):
561:        for dtype in get_all_dtypes():
766:                                          dtypes=get_all_math_dtypes(device))
771:                                          dtypes=get_all_math_dtypes(device))
782:                                          dtypes=get_all_math_dtypes(device))
879:        dtypes = get_all_dtypes(include_bfloat16=False)
898:        dtypes = get_all_dtypes(include_bfloat16=False, include_bool=False)
965:    dtypesIfCUDA(*itertools.product(get_all_dtypes(include_bfloat16=False, include_complex=False),
966:                                     get_all_dtypes(include_bfloat16=False, include_complex=False)))
967:    dtypes(*itertools.product(get_all_dtypes(include_half=False, include_bfloat16=False,
969:                               get_all_dtypes(include_half=False, include_bfloat16=False,
976:            return dtype in get_all_int_dtypes() + [torch.bool]
979:            return dtype in get_all_fp_dtypes(include_half=True, include_bfloat16=False)
```

</p>
</details>

<details>
<summary>

`test/test_unary_ufuncs.py`</summary>

<p>

```python
24:    floating_types_and, all_types_and_complex_and, floating_and_complex_types_and, get_all_dtypes, get_all_math_dtypes,
25:    get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes
517:    dtypes(*(get_all_int_dtypes() + [torch.bool] +
518:              get_all_fp_dtypes(include_bfloat16=False)))
596:    dtypes(*get_all_fp_dtypes(include_half=True, include_bfloat16=False))
611:        invalid_input_dtypes = get_all_int_dtypes() + \
612:            get_all_complex_dtypes() + \
619:        for dtype in get_all_fp_dtypes(include_half=True, include_bfloat16=False):
1048:    dtypes(*get_all_math_dtypes('cpu'))
1182:    dtypesIfCUDA(*get_all_fp_dtypes())
1190:    dtypesIfCUDA(*get_all_fp_dtypes())
1205:    dtypesIfCUDA(*get_all_fp_dtypes())
1215:    dtypesIfCUDA(*get_all_fp_dtypes())
1307:    dtypes(*(get_all_dtypes(include_bool=False)))
1349:    dtypes(*(get_all_fp_dtypes(include_half=False) +
1350:              get_all_complex_dtypes()))
1351:    dtypesIfCUDA(*(get_all_fp_dtypes(include_half=True) +
1352:                    get_all_complex_dtypes()))
```

</p>
</details>

<details>
<summary>

`test/test_view_ops.py`</summary>

<p>

```python
19:    get_all_dtypes, get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes
124:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
131:    dtypes(*get_all_dtypes(include_bfloat16=False))
213:            for view_dtype in [*get_all_fp_dtypes(), *get_all_complex_dtypes()]:
220:    dtypes(*get_all_dtypes())
224:        for view_dtype in get_all_dtypes():
305:    dtypes(*get_all_complex_dtypes(include_complex32=True))
343:    dtypes(*get_all_dtypes())
354:    dtypes(*get_all_dtypes())
364:    dtypes(*get_all_dtypes())
374:    dtypes(*get_all_dtypes())
384:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
395:    dtypes(*get_all_complex_dtypes())
426:    dtypes(*get_all_complex_dtypes())
451:    dtypes(*product(get_all_complex_dtypes(), get_all_dtypes()))
1263:    dtypes(*(torch.testing.get_all_dtypes()))
1279:    dtypes(*(torch.testing.get_all_dtypes()))
1405:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1406:              get_all_complex_dtypes()))
1471:    dtypes(*get_all_dtypes(include_bfloat16=False))
1574:    dtypes(*get_all_dtypes())
1601:    dtypes(*get_all_dtypes(include_bfloat16=False))
1632:    dtypes(*get_all_dtypes(include_bfloat16=False))
1711:        for dt in get_all_dtypes():
1717:        for dt in get_all_dtypes():
1724:        for dt in get_all_dtypes():
```

</p>
</details>

I'm looking forward to your viewpoints. Thanks :)

cc: mruberry kshitij12345 anjali411

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71561

Reviewed By: samdow

Differential Revision: D34856571

Pulled By: mruberry

fbshipit-source-id: 0dca038bcad5cf69906245c496d2e61ac3876335
(cherry picked from commit b058f67b4313143efa714ab105f36e74083131b9)
2022-03-15 20:31:41 +00:00
Natalia Gimelshein
967606124a port torch cov tests to error inputs (#73977)
Summary:
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73977

Reviewed By: malfet

Differential Revision: D34779552

Pulled By: ngimel

fbshipit-source-id: b4191101a029981eb27c75e1b56d739db046f819
(cherry picked from commit 2c2af726ffdba68f358a4ff0ee07580609bccc34)
2022-03-10 19:04:44 +00:00
Natalia Gimelshein
e47a5a64bb Back out "Revert D34524207: [pytorch][PR] remove _s_where" (#73579)
Summary:
Original commit changeset: 87b1220d851c

Original Phabricator Diff: D34524207 (4eb2482568) (4eb2482568)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73579

Test Plan:
OSS tests
tested with canary https://www.internalfb.com/intern/ads/canary/441912928798660873

Reviewed By: ezyang

Differential Revision: D34688237

Pulled By: ngimel

fbshipit-source-id: 32f3a0046053ef52e95ab45a26bfc1de17e7e061
(cherry picked from commit d1c0acbe3e0ff884c429072923a468ee1d3d447d)
2022-03-08 19:15:30 +00:00
anjali411
37e0d2e361 Fix segfault while real and imaginary attributes are set to a number (#73867)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73867

Fixes https://github.com/pytorch/pytorch/issues/72947

Test Plan: Imported from OSS

Reviewed By: davidberard98

Differential Revision: D34695956

Pulled By: anjali411

fbshipit-source-id: 2f3eda272a5214335eae506bd387ce8da4d81b8c
(cherry picked from commit fdb07354cac22c30aa047e65fbac9840608db811)
2022-03-08 18:58:26 +00:00
Natalia Gimelshein
55525632ab Revert D34554432: Back out "Revert D34524207: [pytorch][PR] remove _s_where"
Test Plan: revert-hammer

Differential Revision:
D34554432 (9c03c6163f)

Original commit changeset: 2f3601d3d426

Original Phabricator Diff: D34554432 (9c03c6163f)

fbshipit-source-id: db434750f44c6e6ec545a248c462d8fdcbefbaf8
(cherry picked from commit 866d4d0c795edd7ef519925683b5e57dd9b116ad)
2022-03-04 20:32:39 +00:00
Natalia Gimelshein
9c03c6163f Back out "Revert D34524207: [pytorch][PR] remove _s_where" (#73579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73579

Original commit changeset: 87b1220d851c

Original Phabricator Diff: D34524207 (4eb2482568)

Test Plan: OSS tests

Reviewed By: malfet

Differential Revision: D34554432

fbshipit-source-id: 2f3601d3d4261ebcebb05b4b1aec0c9a8a00ea04
(cherry picked from commit b9cad3f2bc54e12b275567454336cf4d9dcb78c4)
2022-03-04 19:35:41 +00:00
Nikita Karetnikov
eb0d370f14 Write explicit meta-kernels for normal (#70089)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70089

See #69386.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D34089964

Pulled By: bdhirsh

fbshipit-source-id: eb88eb7c4830545d3d43c82b6f3abb98617cee8e
(cherry picked from commit 89c9c02a0fb1c780495fee6370961104f4b1dcd1)
2022-03-01 23:28:14 +00:00
Nikita Shulga
dd9517cc4a Revert D34524207: [pytorch][PR] remove _s_where
Test Plan: revert-hammer

Differential Revision:
D34524207 (4eb2482568)

Original commit changeset: bc71e27b6d3f

Original Phabricator Diff: D34524207 (4eb2482568)

fbshipit-source-id: 87b1220d851c3d2b51bdd1cf2f8a493c58ab9b14
(cherry picked from commit af1f0cc9e032b00619a7979bbbd2281f69e0fdf0)
2022-03-01 17:43:16 +00:00
Natalia Gimelshein
4eb2482568 remove _s_where (#73468)
Summary:
Per title
Fixes https://github.com/pytorch/pytorch/issues/73135

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73468

Reviewed By: albanD

Differential Revision: D34524207

Pulled By: ngimel

fbshipit-source-id: bc71e27b6d3fa50de6737533c92375266d9eadc5
(cherry picked from commit 047b925849370e6e4cbe9e3a722db52bb1e965b9)
2022-03-01 07:30:34 +00:00
Philip Meier
0973c5a1cc align signature of make_tensor with other creation ops (#72702)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72702

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D34457729

Pulled By: mruberry

fbshipit-source-id: 83d580c4201eef946dc9cf4b9e28a3d36be55609
(cherry picked from commit aa4cf20fbeb4b795595729b8ac2e6ba7707d8283)
2022-02-25 06:30:31 +00:00
Nikita Shulga
cfb6c942fe scatter_reduce documentation (#73125)
Summary:
Reland of https://github.com/pytorch/pytorch/issues/68580 (which were milestoned for 1.11) plus partial revert of https://github.com/pytorch/pytorch/pull/72543

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73125

Reviewed By: bdhirsh

Differential Revision: D34355217

Pulled By: malfet

fbshipit-source-id: 325ecdeaf53183d653b44ee5e6e8839ceefd9200
(cherry picked from commit 71db31748a)
2022-02-22 19:33:46 +00:00
Philip Meier
1f74e082e2 only compare attributes for meta tensors (#72508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72508

Todo:

- [x] document this behavior
- [x] add tests

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D34262452

Pulled By: ezyang

fbshipit-source-id: bc5c9653d5c3ad5c6efccc9c8e0efc0d28e15104
(cherry picked from commit 233142c88e)
2022-02-17 02:33:08 +00:00
Brian Hirsh
f87f753bb9 avoiding adding some functions to the public python API before 1.11 release (#72543)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72543

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D34085724

Pulled By: bdhirsh

fbshipit-source-id: 941d5a90a6fa5328268d623e0e2b01577e4132ca
(cherry picked from commit 6676a0c79a)
2022-02-14 19:49:01 +00:00
Kurt Mohler
47c6993355 Update from_dlpack tests and documentation (#70543)
Summary:
Part of https://github.com/pytorch/pytorch/issues/58742

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70543

Reviewed By: soulitzer

Differential Revision: D34172475

Pulled By: mruberry

fbshipit-source-id: d498764b8651a8b7a19181b3421aeebf28a5db2b
(cherry picked from commit 05332f164c)
2022-02-14 03:35:17 +00:00
anjali411
f607af126e Set correct device id on efficientzerotensors (#71611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71611

Fixes https://github.com/pytorch/pytorch/issues/71160 https://github.com/pytorch/pytorch/issues/69925 #69913

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D33897543

Pulled By: anjali411

fbshipit-source-id: f1d8608c351876b8c2619da5ef891f74bad30ab5
(cherry picked from commit 643e666ea3)
2022-02-02 21:51:32 +00:00
Anjali Chourdia
1e4aefaa2f Revert D33834916: Set correct device id on efficientzerotensors
Test Plan: revert-hammer

Differential Revision:
D33834916 (a18cfb790d)

Original commit changeset: 11cec343e95e

Original Phabricator Diff: D33834916 (a18cfb790d)

fbshipit-source-id: 3d3f60b760b445383768161b1d21ea4dadbe5d7c
(cherry picked from commit eba41aa646)
2022-01-31 03:49:56 +00:00
anjali411
a18cfb790d Set correct device id on efficientzerotensors (#71611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71611

Fixes https://github.com/pytorch/pytorch/issues/71160 https://github.com/pytorch/pytorch/issues/69925

Test Plan: Imported from OSS

Reviewed By: george-qi

Differential Revision: D33834916

Pulled By: anjali411

fbshipit-source-id: 11cec343e95e2ee188ab7576f26f64aa19317891
(cherry picked from commit f6e86f8a6b)
2022-01-30 20:53:15 +00:00
Mikayla Gawarecki
09c417ae65 Add new reduce options and autograd support for scatter_reduce (#71788)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71788

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D33778525

Pulled By: cpuhrsch

fbshipit-source-id: 47b8544e29df3075bc6ede894c59499a7ffec876
(cherry picked from commit ddcddac726)
2022-01-27 17:38:50 +00:00
Mikayla Gawarecki
fdec94504f Rename _scatter_reduce to scatter_reduce and make it unstructured (#71787)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71787

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D33778524

Pulled By: cpuhrsch

fbshipit-source-id: 55a330e1c2227c0eaaa1c0d2f9205a4dee24a11b
(cherry picked from commit 6e4a8a91da)
2022-01-27 16:29:13 +00:00
Mike Ruberry
0891c908bb Revert D33768645: Set correct device id on efficientzerotensors
Test Plan: revert-hammer

Differential Revision:
D33768645 (5dd6cd55ba)

Original commit changeset: 66ce9907630b

Original Phabricator Diff: D33768645 (5dd6cd55ba)

fbshipit-source-id: 4bb1ad46f01cd33aeb813bdc123741cf665194a8
(cherry picked from commit 8ca385b1d8)
2022-01-26 17:01:32 +00:00
anjali411
5dd6cd55ba Set correct device id on efficientzerotensors (#71611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71611

Fixes https://github.com/pytorch/pytorch/issues/71160

Test Plan: Imported from OSS

Reviewed By: pbelevich, ngimel

Differential Revision: D33768645

Pulled By: anjali411

fbshipit-source-id: 66ce9907630b65a12c0775077147a7e72ff4cee4
(cherry picked from commit 3af98a4d70)
2022-01-25 23:32:11 +00:00
Jonathan Colen
33403f4848 edge_order check in torch.gradient only applies to dim argument (#67926)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67919

The compatibility check on `edge_order` in `pre_check_gradient` now looks only at dim argument if it is present, otherwise it checks all dimensions.

Previously, it would check all dimensions regardless of the dim argument and throw unnecessary errors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67926

Reviewed By: albanD

Differential Revision: D33760621

Pulled By: mruberry

fbshipit-source-id: d490cd8610c68ff3787e670fc947de3cbf2db062
(cherry picked from commit 45bc56de9e)
2022-01-25 21:29:31 +00:00
Mike Ruberry
e0d829a266 Kill the test_torch.py mixin and creates test_scatter_gather_ops (#71691)
Summary:
Per title.

Also annotates test_torch.py with additional cleanup tasks and adds empty sample inputs to elementwise unary and binary OpInfos.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71691

Reviewed By: ngimel

Differential Revision: D33735126

Pulled By: mruberry

fbshipit-source-id: 8cc097a7581a8b620540c95b2a5889c1165ecf23
(cherry picked from commit 5c6a245a3f)
2022-01-24 09:32:32 +00:00
Mike Ruberry
7680a0ae9d Deprecates _aminmax (#71576)
Summary:
Replaces https://github.com/pytorch/pytorch/pull/62432. Existing callsites are updated.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71576

Reviewed By: ngimel

Differential Revision: D33689960

Pulled By: mruberry

fbshipit-source-id: fad1ba78347ecec7fd48f21862c3eb606662b8f4
(cherry picked from commit 6cd438e9a1)
2022-01-21 09:23:29 +00:00
Peter Bell
17bb68618f Copy: Fix CPU transpose path ignoring neg and conj bits (#69026)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69026

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D33064533

Pulled By: anjali411

fbshipit-source-id: 98c25586a1707ac2324f69f652ce5a14dd59c0ad
2022-01-14 10:13:33 -08:00
Emilio Castillo
8dfff8b2e2 Fix scatter for empty indexes (#70662)
Summary:
This PR fixes an issue with `scatter` where the output is garbage for zero-sized indexes.

```py
import torch

null_index = torch.zeros((0, 4), dtype=torch.int64)
null_arr = torch.zeros((0, 4))
zeros_arr = torch.zeros((1, 4))

result = zeros_arr.scatter(0, null_index, null_arr)

print(null_index)
print(null_arr)
print(zeros_arr)
print(result)
```

```
tensor([], size=(0, 4), dtype=torch.int64)
tensor([], size=(0, 4))
tensor([[0., 0., 0., 0.]])
tensor([[1.7036e+19, 2.9965e+32, 3.9133e-14, 1.3585e-19]])
```

the out array is never filled if `index` arg has 0 elements.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70662

Reviewed By: dagitses

Differential Revision: D33476807

Pulled By: albanD

fbshipit-source-id: 97dbdd9c0133899e58828c43ecba81838807b8af
2022-01-07 09:20:43 -08:00
Peter Bell
917d56a7e4 Copy: Fix conj bit being ignored on type mismatch (#68963)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68963

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D33064492

Pulled By: anjali411

fbshipit-source-id: 043f927d6bfff46bf5f8ea6fce9409f250bf8ff8
2022-01-05 17:59:32 -08:00
Brian Hirsh
457ba1dd3e Porting index_add to structured kernels, add an out variant (#65993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65993

This PR attempts to port `index_add` to structured kernels, but does more than that:

* Adds an `out=` variant to `index_add`
* Revises `native_functions.yaml` registrations, to not have multiple entries and instead pass default value to `alpha`.
* Changes in `derivatives.yaml` file for autograd functioning
* Revises error messages, please see: https://github.com/pytorch/pytorch/pull/65993#issuecomment-945441615

Follow-up PRs in near future will attempt to refactor the OpInfo test, and will give another look at tests in `test/test_torch.py` for this function. (hence the use of ghstack for this)

~This is WIP because there are tests failing for `Dimname` variant on mobile/android builds, and I'm working on fixing them.~

Issue tracker: https://github.com/pytorch/pytorch/issues/55070

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32646426

fbshipit-source-id: b035ecf843a9a27d4d1e18b202b035adc2a49ab5
2021-12-14 11:57:13 -08:00
kshitij12345
5b2586fe09 [testing] Ignore expected_regex in assertRaisesRegex for non-native device (#68723)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29719

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68723

Reviewed By: zou3519

Differential Revision: D32797061

Pulled By: mruberry

fbshipit-source-id: 3bcae6d3d62d180059dbe39be520b0e7f9aea19f
2021-12-02 14:52:27 -08:00
Emilio Castillo
533e72e0a4 Fix DLPack CUDA stream convention (#67618)
Summary:
Apparently for the array API, cuda default stream and per thread stream should be 1 and 2 instead of 0 and 1:

https://data-apis.org/array-api/latest/API_specification/array_object.html?dlpack-self-stream-none#dlpack-self-stream-none.

This caused a problem in the interop with CuPy https://github.com/cupy/cupy/pull/5970#discussion_r739912926.

cc rgommers leofang mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67618

Reviewed By: albanD

Differential Revision: D32521805

Pulled By: mruberry

fbshipit-source-id: 95777e4014e5edf1f88ba10adc03c6e34c13248d
2021-11-18 08:36:05 -08:00
kshitij12345
d5d2096dab [testing] make @dtypes mandatory when using @dtypesIf (#68186)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53647

With this if a test forgets to add `dtypes` while using `dtypesIf`, following error is raised
```
AssertionError: dtypes is mandatory when using dtypesIf however 'test_exponential_no_zero' didn't specify it
```

**Tested Locally**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68186

Reviewed By: VitalyFedyunin

Differential Revision: D32468581

Pulled By: mruberry

fbshipit-source-id: 805e0855f988b77a5d8d4cd52b31426c04c2200b
2021-11-18 08:29:31 -08:00
rusty1s
9807787135 scatter_reduce (#68115)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63780

Basic functionality of a `scatter_reduce` algorithm with `reduce="sum"`:

* `scatter_reduce` is named as `scatter_reduce2` due to compiling issues
* It currently re-uses functionality from `scatter_add`
* Tests are missing: WIP

The error when the `scatter_reduce` naming is used:
```
In file included from aten/src/ATen/core/TensorBody.h:3,
                 from ../aten/src/ATen/core/Tensor.h:3,
                 from ../aten/src/ATen/DeviceGuard.h:4,
                 from ../aten/src/ATen/ATen.h:11,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Operators.h:13949:18: error: redefinition of ‘struct at::_ops::scatter_reduce’
13949 | struct TORCH_API scatter_reduce {
      |                  ^~~~~~~~~~~~~~
aten/src/ATen/Operators.h:13817:18: note: previous definition of ‘struct at::_ops::scatter_reduce’
13817 | struct TORCH_API scatter_reduce {
      |                  ^~~~~~~~~~~~~~
aten/src/ATen/Operators.h:13960:18: error: redefinition of ‘struct at::_ops::scatter_reduce_out’
13960 | struct TORCH_API scatter_reduce_out {
      |                  ^~~~~~~~~~~~~~~~~~
aten/src/ATen/Operators.h:13839:18: note: previous definition of ‘struct at::_ops::scatter_reduce_out’
13839 | struct TORCH_API scatter_reduce_out {
      |                  ^~~~~~~~~~~~~~~~~~
In file included from ../aten/src/ATen/core/Tensor.h:3,
                 from ../aten/src/ATen/DeviceGuard.h:4,
                 from ../aten/src/ATen/ATen.h:11,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/core/TensorBody.h: In member function ‘at::Tensor at::Tensor::scatter_reduce(int64_t, const at::Tensor&, c10::string_view, c10::optional<long int>) const’:
aten/src/ATen/core/TensorBody.h:3976:83: error: cannot convert ‘c10::string_view’ {aka ‘c10::basic_string_view<char>’} to ‘const at::Tensor&’
 3976 |     return at::_ops::scatter_reduce::call(const_cast<Tensor&>(*this), dim, index, reduce, output_size);
      |                                                                                   ^~~~~~
      |                                                                                   |
      |                                                                                   c10::string_view {aka c10::basic_string_view<char>}
In file included from aten/src/ATen/core/TensorBody.h:3,
                 from ../aten/src/ATen/core/Tensor.h:3,
                 from ../aten/src/ATen/DeviceGuard.h:4,
                 from ../aten/src/ATen/ATen.h:11,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Operators.h:13824:109: note:   initializing argument 4 of ‘static at::Tensor at::_ops::scatter_reduce::call(const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, c10::string_view)’
13824 |   static at::Tensor call(const at::Tensor & self, int64_t dim, const at::Tensor & index, const at::Tensor & src, c10::string_view reduce);
      |                                                                                          ~~~~~~~~~~~~~~~~~~~^~~
In file included from ../aten/src/ATen/ATen.h:15,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Functions.h: In function ‘at::Tensor at::scatter_reduce(const at::Tensor&, int64_t, const at::Tensor&, c10::string_view, c10::optional<long int>)’:
aten/src/ATen/Functions.h:7119:61: error: cannot convert ‘c10::string_view’ {aka ‘c10::basic_string_view<char>’} to ‘const at::Tensor&’
 7119 |     return at::_ops::scatter_reduce::call(self, dim, index, reduce, output_size);
      |                                                             ^~~~~~
      |                                                             |
      |                                                             c10::string_view {aka c10::basic_string_view<char>}
In file included from aten/src/ATen/core/TensorBody.h:3,
                 from ../aten/src/ATen/core/Tensor.h:3,
                 from ../aten/src/ATen/DeviceGuard.h:4,
                 from ../aten/src/ATen/ATen.h:11,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Operators.h:13824:109: note:   initializing argument 4 of ‘static at::Tensor at::_ops::scatter_reduce::call(const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, c10::string_view)’
13824 |   static at::Tensor call(const at::Tensor & self, int64_t dim, const at::Tensor & index, const at::Tensor & src, c10::string_view reduce);
      |                                                                                          ~~~~~~~~~~~~~~~~~~~^~~
In file included from ../aten/src/ATen/ATen.h:15,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Functions.h: In function ‘at::Tensor& at::scatter_reduce_out(at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, c10::string_view, c10::optional<long int>)’:
aten/src/ATen/Functions.h:7124:65: error: cannot convert ‘c10::string_view’ {aka ‘c10::basic_string_view<char>’} to ‘const at::Tensor&’
 7124 |     return at::_ops::scatter_reduce_out::call(self, dim, index, reduce, output_size, out);
      |                                                                 ^~~~~~
      |                                                                 |
      |                                                                 c10::string_view {aka c10::basic_string_view<char>}
In file included from aten/src/ATen/core/TensorBody.h:3,
                 from ../aten/src/ATen/core/Tensor.h:3,
                 from ../aten/src/ATen/DeviceGuard.h:4,
                 from ../aten/src/ATen/ATen.h:11,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Operators.h:13846:111: note:   initializing argument 4 of ‘static at::Tensor& at::_ops::scatter_reduce_out::call(const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, c10::string_view, at::Tensor&)’
13846 |   static at::Tensor & call(const at::Tensor & self, int64_t dim, const at::Tensor & index, const at::Tensor & src, c10::string_view reduce, at::Tensor & out);
      |                                                                                            ~~~~~~~~~~~~~~~~~~~^~~
In file included from ../aten/src/ATen/ATen.h:15,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Functions.h: In function ‘at::Tensor& at::scatter_reduce_outf(const at::Tensor&, int64_t, const at::Tensor&, c10::string_view, c10::optional<long int>, at::Tensor&)’:
aten/src/ATen/Functions.h:7129:65: error: cannot convert ‘c10::string_view’ {aka ‘c10::basic_string_view<char>’} to ‘const at::Tensor&’
 7129 |     return at::_ops::scatter_reduce_out::call(self, dim, index, reduce, output_size, out);
      |                                                                 ^~~~~~
      |                                                                 |
      |                                                                 c10::string_view {aka c10::basic_string_view<char>}
In file included from aten/src/ATen/core/TensorBody.h:3,
                 from ../aten/src/ATen/core/Tensor.h:3,
                 from ../aten/src/ATen/DeviceGuard.h:4,
                 from ../aten/src/ATen/ATen.h:11,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Operators.h:13846:111: note:   initializing argument 4 of ‘static at::Tensor& at::_ops::scatter_reduce_out::call(const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, c10::string_view, at::Tensor&)’
13846 |   static at::Tensor & call(const at::Tensor & self, int64_t dim, const at::Tensor & index, const at::Tensor & src, c10::string_view reduce, at::Tensor & out);
      |                                                                                            ~~~~~~~~~~~~~~~~~~~^~~
In file included from aten/src/ATen/NativeFunctions.h:6,
                 from ../aten/src/ATen/TensorIndexing.h:12,
                 from ../aten/src/ATen/ATen.h:20,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/NativeMetaFunctions.h: At global scope:
aten/src/ATen/NativeMetaFunctions.h:496:18: error: redefinition of ‘struct at::meta::structured_scatter_reduce’
  496 | struct TORCH_API structured_scatter_reduce : public at::impl::MetaBase {
      |                  ^~~~~~~~~~~~~~~~~~~~~~~~~
aten/src/ATen/NativeMetaFunctions.h:481:18: note: previous definition of ‘struct at::meta::structured_scatter_reduce’
  481 | struct TORCH_API structured_scatter_reduce : public at::impl::MetaBase {
      |                  ^~~~~~~~~~~~~~~~~~~~~~~~~
ninja: build stopped: subcommand failed.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68115

Reviewed By: albanD

Differential Revision: D32488450

Pulled By: cpuhrsch

fbshipit-source-id: 65e79c6d0555c0d5715535bb52aade8d5fcd9722
2021-11-17 19:53:12 -08:00
Mikayla Gawarecki
cac3cd1433 add torch.diff support for n greater than 1 (#67260)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67260

Addressing 54853

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D31930294

Pulled By: mikaylagawarecki

fbshipit-source-id: 97c7a27e9200c6688242680ff96b73dfff828479
2021-11-17 09:16:33 -08:00
Nick Anderson
f9ea41f257 Fixes spelling error writeable to writable, improves warning, and documentation (#67664)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46741
pytorchbot

contributors: nickleus27, yanivsagy, and khanhthien123

SmrutiSikha this is mostly your work.  We just did very minor clean up.

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67664

Reviewed By: gchanan

Differential Revision: D32311838

Pulled By: mruberry

fbshipit-source-id: 0e5d4d888caeccb0fd7c80e6ff11b1b1fa8e00d6
2021-11-11 13:05:00 -08:00
Kurt Mohler
db014b8529 Add set_deterministic_debug_mode and get_deterministic_debug_mode (#67778)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67386

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67778

Reviewed By: ngimel

Differential Revision: D32310661

Pulled By: mruberry

fbshipit-source-id: 300129e96ca51c22fa711182ce6a9f4d4d2ce57f
2021-11-11 12:48:29 -08:00
Thomas Viehmann
33b7790907 Fix conv_transpose3d backward with non-contiguous grad_out (#67829)
Summary:
Many thanks to Forest Yang (meowmix) from the forum for reporting it with a minimal reproduction.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67829

Reviewed By: malfet

Differential Revision: D32184786

Pulled By: albanD

fbshipit-source-id: b63dbd3148b5def2109deb2f4612c08f55f59dfb
2021-11-05 08:34:21 -07:00
soulitzer
83e8612d11 Clean up test autograd (#67413)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/66066

This PR:
 - cleans up op-specific testing from test_autograd. test_autograd should be reserved for testing generic autograd functionality
 - tests related to an operator are better colocated
 - see the tracker for details

What to think about when moving tests to their correct test suite:
 - naming, make sure its not too generic
 - how the test is parametrized, sometimes we need to add/remove a device/dtype parameter
 - can this be merged with existing tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67413

Reviewed By: jbschlosser, albanD

Differential Revision: D32031480

Pulled By: soulitzer

fbshipit-source-id: 8e13da1e58a38d5cecbfdfd4fe2b4fe6f816897f
2021-11-03 15:26:09 -07:00
kshitij12345
885a8e53ba replace onlyOnCPUAndCUDA with onlyNativeDeviceTypes (#65201)
Summary:
Reference https://github.com/pytorch/pytorch/issues/53849

Replace `onlyOnCPUandCUDA` with `onlyNativeDeviceTypes` which includes `cpu, cuda and meta`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65201

Reviewed By: mrshenli

Differential Revision: D31299718

Pulled By: mruberry

fbshipit-source-id: 2d8356450c035d6a314209ab51b2c237583920fd
2021-11-01 09:22:34 -07:00
kshitij12345
c00806beda Add skipXLA and expectedFailureXLA decorator (#66857)
Summary:
Add skipXLA and expectedFailureXLA decorator and relevant test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66857

Reviewed By: ngimel

Differential Revision: D32039856

Pulled By: mruberry

fbshipit-source-id: 3c99d5e06c1c7684d1f798c11c783bd6ebea9899
2021-10-29 19:53:36 -07:00
jjsjann123
1ec732bc46 Add fp16/fp32 autocasting to JIT/TorchScript (#63939)
Summary:
Adds mixed precision autocasting support between fp32/fp16 to torchscript/JIT. More in depth descriptoin can be found at [torch/csrc/jit/JIT-AUTOCAST.md](https://github.com/pytorch/pytorch/pull/63939/files#diff-1f1772aaa508841c5bb58b74ab98f49a1e577612cd9ea5c386c8714a75db830b)

This PR implemented an autocast optimization pass that inserts casting ops per AMP rule (torch/csrc/jit/passes/autocast.cpp), that mimics the behavior of eager autocast. The pass also takes into consideration the context of `torch.cuda.amp.autocast` and only inserts casting ops within the enabled context manager, giving feature parity as with eager amp autocast.

We currently provide JIT AMP autocast as a prototyping feature, so it is default off and could be turned on via `torch._C._jit_set_autocast_mode(True)`

The JIT support for autocast is subject to different constraints compared to the eager mode implementation (mostly related to the fact that TorchScript is statically typed), restriction on the user facing python code is described in doc torch/csrc/jit/JIT-AUTOCAST.md

This is a prototype, there are also implementation limitation that's necessary to keep this PR small and get something functioning quickly on upstream, so we can iterate on designs.

Few limitation/challenge that is not properly resolved in this PR:
1. Autocast inserts cast operation, which would have impact on scalar type of output tensor feeding downstream operations. We are not currently propagating the updated scalar types, this would give issues/wrong results on operations in promotion rules.

2. Backward for autodiff in JIT misses the casting of dgrad to input scalar type, as what autograd does in eager. This forces us to explicitly mark the casting operation for certain operations (e.g. binary ops), otherwise, we might be feeding dgrad with mismatch scalar type to input. This could potentially break gradient function consuming dgrad. (e.g. gemm backwards, which assumes grad_output to be of same scalar type as input')

3. `torch.autocast` api has an optional argument `dtype` which is not currently supported in the JIT autocast and we require a static value.

Credit goes mostly to:
tlemo
kevinstephano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63939

Reviewed By: navahgar

Differential Revision: D31093381

Pulled By: eellison

fbshipit-source-id: da6e26c668c38b01e296f304507048d6c1794314
2021-10-27 12:11:36 -07:00
Nikita Shulga
77beccaedb Do not build PyTorch with caffe2 by default (#66658)
Summary:
CAFFE2 has been deprecated for a while, but still included in every PyTorch build.
We should stop building it by default, although CI should still validate that caffe2 code is buildable.

Build even fewer dependencies when compiling mobile builds without Caffe2
Introduce `TEST_CAFFE2` in torch.common.utils
Skip `TestQuantizedEmbeddingOps` and `TestJit.test_old_models_bc`  is code is compiled without Caffe2
Should be landed after https://github.com/pytorch/builder/pull/864

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66658

Reviewed By: driazati, seemethere, janeyx99

Differential Revision: D31669156

Pulled By: malfet

fbshipit-source-id: 1cc45e2d402daf913a4685eb9f841cc3863e458d
2021-10-21 20:32:47 -07:00
Kurt Mohler
94f4e9a995 Enable warning tests for nondeterministic backward functions (#66736)
Summary:
Followup from https://github.com/pytorch/pytorch/issues/66233

Since https://github.com/pytorch/pytorch/issues/50209 was fixed, we can enable these warning tests now

cc mruberry kurtamohler

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66736

Reviewed By: zou3519

Differential Revision: D31723385

Pulled By: mruberry

fbshipit-source-id: dc1922a6d0c45cc80020db85710e755a89113861
2021-10-21 12:51:53 -07:00
Jane Xu
8a65047acc [skip ci] Set test owners for everything considered with module: tests (#66865)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66865

Reviewed By: anjali411

Differential Revision: D31771147

Pulled By: janeyx99

fbshipit-source-id: 8bebe5ac2098364ef1ee93b590abb5f4455b0f89
2021-10-20 09:37:03 -07:00
lezcano
0974215c4d Prefer mT and mH over transpose(-2, -1) and transpose(-2, -1).conj() (#64181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64181

This PR replaces all the calls to:
- `transpose(-2, -1)` or `transpose(-1, -2)` by `mT()` in C++ and `mT` in Python
- `conj().transpose(-2, -1)` or `transpose(-2, -1).conj()` or `conj().transpose(-1, -2)` or `transpose(-1, -2).conj()` by `mH()` in C++ and `mH` in Python.

It also simplifies two pieces of code, and fixes one bug where a pair
of parentheses were missing in the function `make_symmetric_matrices`.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31692896

Pulled By: anjali411

fbshipit-source-id: e9112c42343663d442dc5bd53ff2b492094b434a
2021-10-18 13:02:25 -07:00
Kurt Mohler
a25648953c Add warn_only kwarg to use_deterministic_algorithms (#66233)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64883

Adds a `warn_only` kwarg to `use_deterministic_algorithms`. When enabled, calling an operation that does not have a deterministic implementation will raise a warning, rather than an error.

`torch.testing._internal.common_device_type.expectedAlertNondeterministic` is also refactored and documented in this PR to make it easier to use and understand.

cc mruberry kurtamohler

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66233

Reviewed By: bdhirsh

Differential Revision: D31616481

Pulled By: mruberry

fbshipit-source-id: 059634a82d54407492b1d8df08f059c758d0a420
2021-10-15 13:54:59 -07:00
anjali411
a82fcd3560 Disable .numpy() and .tolist() for tensor subclasses subclasses and fix .tolist() for conjugated and negated tensors (#66082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66082

Fixes https://github.com/pytorch/pytorch/issues/66024 #65779

cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved albanD

Test Plan: Imported from OSS

Reviewed By: Gamrix, albanD

Differential Revision: D31615588

Pulled By: anjali411

fbshipit-source-id: c3e65ef0fe301630eb76732ccd7819683c09aa19
2021-10-13 13:57:51 -07:00
lezcano
82a216c45b Add tensor.{adjoint(),H,mT,mH} methods and properties (#64179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64179

This PR follows the discussion in https://github.com/pytorch/pytorch/issues/45063#issuecomment-904431478

Fixes https://github.com/pytorch/pytorch/issues/45063

cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved rgommers pmeier asmeurer leofang AnirudhDagar asi1024 emcastillo kmaehashi heitorschueroff

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30730483

Pulled By: anjali411

fbshipit-source-id: 821d25083f5f682450f6812bf852dc96a1cdf9f2
2021-10-13 07:44:43 -07:00
Kurt Mohler
5883523c1d Remove dtype from torch.Storage and use only torch.ByteStorage (#62030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62030

Remove dtype tracking from Python Storage interface, remove all the different `<type>Storage` classes except for `ByteStorage`, and update serialization accordingly, while maintaining as much FC/BC as possible

Fixes https://github.com/pytorch/pytorch/issues/47442

* **THE SERIALIZATION FORMAT IS FULLY FC/BC.** We worked very hard to make sure this is the case. We will probably want to break FC at some point to make the serialization structure of tensors make more sense, but not today.
* There is now only a single torch.ByteStorage class. Methods like `Tensor.set_` no longer check that the dtype of storage is appropriate.
* As we no longer know what dtype of a storage is, we've **removed** the size method from Storage, replacing it with nbytes. This is to help catch otherwise silent errors where you confuse number of elements with number of bytes.
* `Storage._new_shared` takes a `nbytes` kwarg and will reject previous positional only calls.  `Storage._new_with_file` and `_set_from_file` require explicit element size arguments.
* It's no longer possible to convert storages to different types using the float/double/etc methods. Instead, do the conversion using a tensor.
* It's no longer possible to allocate a typed storage directly using FloatStorage/DoubleStorage/etc constructors. Instead, construct a tensor and extract its storage. The classes still exist but they are used purely for unpickling.
* The preexisting serialization format stores dtype with storage, and in fact this dtype is used to determine the dtype of the tensor overall.
 To accommodate this case, we introduce a new TypedStorage concept that exists only during unpickling time which is used to temporarily store the dtype so we can construct a tensor. **If you overrode the handling of pickling/unpickling, you MUST add handling for TypedStorage** or your serialization code will degrade to standard file-based serialization.

Original pull request: https://github.com/pytorch/pytorch/pull/59671

Reviewed By: soulitzer, ngimel

Differential Revision: D29466819

Pulled By: ezyang

fbshipit-source-id: 4a14e5d3c2b08e06e558683d97f7378a3180b00e
2021-10-05 13:50:34 -07:00
Philip Meier
aebde1bc2b deprecate device getter from torch.testing namespace (#63844)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63844

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31141433

Pulled By: mruberry

fbshipit-source-id: a29331278ab99a19e225e2cb357458e3db4f9732
2021-09-29 02:40:52 -07:00
Alban Desmaison
7c62b6e973 add deepcopy support to subclasses (#65584)
Summary:
Happy to get any feedback on how to make this code cleaner!

This:
- Fix Tensor attribute deepcopy BC-breaking?
- Add a test for Tensor attribute deepcopy
- Fix subclass deepcopy
- Moves the subclass serialization tests into their own class not to interfere with other serialization test logic
- Add a test for subclass deepcopy

cc ezyang gchanan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65584

Reviewed By: gchanan

Differential Revision: D31206590

Pulled By: albanD

fbshipit-source-id: 74a8f0767f4933b9c941fbea880a8fd1b893ea2f
2021-09-27 14:36:22 -07:00
Kshiteej K
ff6b475d4a [fix] don't expose unique_dim in torch (#63080)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62793

This is mostly a quick fix. I think the more correct fix could be updating `unique_dim` to `_unique_dim` which could be BC-breaking for C++ users (� maybe). Maybe something else I am missing.

~~Not sure how to add a test for it.~~ Have tested it locally.

We can add a test like following. Tested this locally, it fails currently but passes with the fix.
```python
        def test_wildcard_import(self):
            exec('from torch import *')

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63080

Reviewed By: gchanan

Differential Revision: D30738711

Pulled By: zou3519

fbshipit-source-id: b86d0190e45ba0b49fd2cffdcfd2e3a75cc2a35e
2021-09-14 18:19:17 -07:00
Victor Quach
8131bc85d0 Raise TypeError on assigned grad with wrong type (#64876)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64813

Raises a TypeError when assigned value to a grad is not a Tensor or
None.

Adds tests.

cc ezyang gchanan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64876

Reviewed By: anjali411

Differential Revision: D30901678

Pulled By: soulitzer

fbshipit-source-id: dbb3cb5fd0bbac6918e0b2e2f51d340daa43dee0
2021-09-13 16:41:45 -07:00
Emilio Castillo
1cb3507ed3 Adds DLPack support (#57110)
Summary:
Partially Fixes https://github.com/pytorch/pytorch/issues/55090
Depends on https://github.com/pytorch/pytorch/issues/55365

Inspired by https://github.com/dmlc/dlpack/issues/57#issuecomment-774482973

Questions, in PyTorch we can't create streams or easily synchronize them from just an integer. Should we add an [`ExternalStream`](https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.ExternalStream.html) object like the one we have in CuPy?

TODO: Add tests

Would like some feedback as this design needs quite a few iterations
rgommers leofang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57110

Reviewed By: saketh-are

Differential Revision: D30761481

Pulled By: mruberry

fbshipit-source-id: e85d78df3c1f8defc2a698878da89cd843cb1209
2021-09-12 19:47:15 -07:00
Alban Desmaison
d8ae3cc318 Add more error checking in subclass creation (#64746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64746

This extracts the error checking that used to be in the PR above.
We are not going to land the proposed fix there, but I think we want this error checking in right now as these would lead to respectively a memory leak and arbitrary memory read/write.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30867569

Pulled By: albanD

fbshipit-source-id: bf468033fb8b49fcb26eed423f5fad82b4a46c56
2021-09-10 16:49:10 -07:00
Philip Meier
26b7ff5aea deprecate dtype getters from torch.testing namespace (#63554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63554

Following https://github.com/pytorch/pytorch/pull/61840#issuecomment-884087809, this deprecates all the dtype getters publicly exposed in the `torch.testing` namespace. The reason for this twofold:

1. If someone is not familiar with the C++ dispatch macros PyTorch uses, the names are misleading. For example `torch.testing.floating_types()` will only give you `float32` and `float64` skipping `float16` and `bfloat16`.
2. The dtype getters provide very minimal functionality that can be easily emulated by downstream libraries.

We thought about [providing an replacement](https://gist.github.com/pmeier/3dfd2e105842ad0de4505068a1a0270a), but ultimately decided against it. The major problem is BC: by keeping it, either the namespace is getting messy again after a new dtype is added or we need to somehow version the return values of the getters.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D30662206

Pulled By: mruberry

fbshipit-source-id: a2bdb10ab02ae665df1b5b76e8afa9af043bbf56
2021-09-07 08:58:51 -07:00
Ivan Yashchuk
a91a278d60 Fix copy_transpose_valid condition for copy_same_type_transpose_ (#64425)
Summary:
Thanks to ngimel for the hint where the problem might be (https://github.com/pytorch/pytorch/issues/64358#issuecomment-910868849)!

I added a test that fails on master to verify the fix. The shape `(60, 60)` was chosen because of `MIN_SZ = 60 * 60` in `copy_transpose_valid`.

Fixes https://github.com/pytorch/pytorch/issues/64358

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64425

Reviewed By: mruberry

Differential Revision: D30752725

Pulled By: ngimel

fbshipit-source-id: f40370ea8365c94e30f8e8a3dcab5f3b3462464a
2021-09-03 18:50:33 -07:00
Kushashwa Ravi Shrimali
76e187aa08 Port gather to structured kernel (#63312)
Summary:
Will add a description once this is ready for review.

cc: ysiraichi ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63312

Reviewed By: iramazanli

Differential Revision: D30597447

Pulled By: ezyang

fbshipit-source-id: d36e59835c2f4b38e286032dd2a1111a7e16b7e5
2021-09-02 01:36:21 -07:00
anjali411
5d80a48cef Add fast path for addmm when the inputs are conjugate (#59380)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59380

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28898374

Pulled By: anjali411

fbshipit-source-id: eab0e64d37bb57c18b54cabb8e5c00666338ba04
2021-09-01 16:34:02 -07:00
Philip Meier
401bbb2aa0 remove componentwise comparison of complex values in TestCase.assertEqual (#63572)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63572

Addresses #61906. Issue will be fixed later in the stack when `torch.testing.assert_close` got the same treatment.

cc ezyang gchanan

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30633527

Pulled By: mruberry

fbshipit-source-id: c2002a4998a7a75cb2ab83f87190bde43a9d4f7c
2021-08-30 12:36:45 -07:00
Kushashwa Ravi Shrimali
d37636901e [Doc] make_tensor to torch.testing module (#63925)
Summary:
This PR aims to add `make_tensor` to the `torch.testing` module in PyTorch docs.

TODOs:

* [x] Add examples

cc: pmeier mruberry brianjo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63925

Reviewed By: ngimel

Differential Revision: D30633487

Pulled By: mruberry

fbshipit-source-id: 8e5a1f880c6ece5925b4039fee8122bd739538af
2021-08-30 12:25:40 -07:00
mingfeima
b0782f0f32 add BFloat16 support for bernoulli and Dropout on CPU (#56372)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56372

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D28836792

Pulled By: VitalyFedyunin

fbshipit-source-id: ede951d172a59276e11383fd767778ab959b5a6b
2021-08-25 12:01:27 -07:00
Aaron Bockover
c78ab28441 Add support for the ONNX Runtime Eager Mode backend (#58248)
Summary:
This PR implements the necessary hooks/stubs/enums/etc for complete ONNX Runtime (ORT) Eager Mode integration. The actual extension will live out of tree at https://github.com/pytorch/ort.

We have been [working on this at Microsoft](https://github.com/microsoft/onnxruntime-pytorch/tree/eager-ort/torch_onnxruntime) for the last few months, and are finally ready to contribute the PyTorch core changes upstream (nothing major or exciting, just the usual boilerplate for adding new backends).

The ORT backend will allow us to ferry [almost] all torch ops into granular ONNX kernels that ORT will eagerly execute against any devices it supports (therefore, we only need a single ORT backend from a PyTorch perspective).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58248

Reviewed By: astaff

Differential Revision: D30344992

Pulled By: albanD

fbshipit-source-id: 69082b32121246340d686e16653626114b7714b2
2021-08-20 11:17:13 -07:00
Philip Meier
99203580a9 Updates internal assert_allclose callsites in favor of assert_close (#61841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61841

Redo of #60863.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30408145

Pulled By: mruberry

fbshipit-source-id: 0b34ebc7f23ba38ecd89640b61d8aca59b7eab58
2021-08-19 12:50:41 -07:00
Thomas J. Fan
07b00fc324 ENH Migrate nll_loss2d from THC to ATen (#62826)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24608
Fixes https://github.com/pytorch/pytorch/issues/24607

With the following benchmark, the backward pass runs a little slower. This is strange since the implementation should be exactly the same.

<details>
 <summary>Benchmark script</summary>

```python
from itertools import product

import torch
import torch.nn as nn
import torch.nn.functional as F
import time

torch.manual_seed(0)
MS_PER_SECOND = 1000

def _time():
    torch.cuda.synchronize()
    return time.perf_counter() * MS_PER_SECOND

device = "cuda"
C = 3
n_runs = 30
reductions = ["none", "sum", "mean"]
Ns = [128, 256, 512]
Hs = [128, 256, 512]

for reduction, N, H in product(reductions, Ns, Hs):
    total_fwd_time = 0
    total_back_time = 0
    if reduction == "none":
        grad_out = torch.randn(N, H, H, device=device)
    else:
        grad_out = torch.randn(1)[0]

    for _ in range(n_runs):
        input = torch.randn(N, C, H, H, device=device, requires_grad=True)
        target = torch.rand(N, H, H, device=device).mul(3).floor().long()

        # forward
        start = _time()
        result = F.nll_loss(input, target, reduction=reduction)
        total_fwd_time += _time() - start

    result = F.nll_loss(input, target, reduction=reduction)
    for _ in range(n_runs):
        # backward
        start = _time()
        result.backward(grad_out, retain_graph=True)
        total_back_time += _time() - start

    fwd_avg = total_fwd_time / n_runs
    bwd_avg = total_back_time / n_runs
    print(
        f"input size({N}, {C}, {H}, {H}), reduction: {reduction}, fwd: {fwd_avg:.2f} (ms), back: {bwd_avg:.2f} (ms)"
    )

```

</details>

<details>
 <summary>master results</summary>

```
input size(128, 3, 128, 128), reduction: none, fwd: 0.34 (ms), back: 0.57 (ms)
input size(128, 3, 256, 256), reduction: none, fwd: 2.56 (ms), back: 3.85 (ms)
input size(128, 3, 512, 512), reduction: none, fwd: 14.54 (ms), back: 16.62 (ms)
input size(256, 3, 128, 128), reduction: none, fwd: 1.26 (ms), back: 1.78 (ms)
input size(256, 3, 256, 256), reduction: none, fwd: 7.07 (ms), back: 8.22 (ms)
input size(256, 3, 512, 512), reduction: none, fwd: 29.38 (ms), back: 33.29 (ms)
input size(512, 3, 128, 128), reduction: none, fwd: 3.41 (ms), back: 4.05 (ms)
input size(512, 3, 256, 256), reduction: none, fwd: 14.32 (ms), back: 16.46 (ms)
input size(512, 3, 512, 512), reduction: none, fwd: 59.20 (ms), back: 66.68 (ms)
input size(128, 3, 128, 128), reduction: sum, fwd: 0.08 (ms), back: 0.21 (ms)
input size(128, 3, 256, 256), reduction: sum, fwd: 0.21 (ms), back: 0.73 (ms)
input size(128, 3, 512, 512), reduction: sum, fwd: 0.82 (ms), back: 2.86 (ms)
input size(256, 3, 128, 128), reduction: sum, fwd: 0.12 (ms), back: 0.39 (ms)
input size(256, 3, 256, 256), reduction: sum, fwd: 0.42 (ms), back: 1.45 (ms)
input size(256, 3, 512, 512), reduction: sum, fwd: 1.53 (ms), back: 5.66 (ms)
input size(512, 3, 128, 128), reduction: sum, fwd: 0.21 (ms), back: 0.74 (ms)
input size(512, 3, 256, 256), reduction: sum, fwd: 0.78 (ms), back: 2.86 (ms)
input size(512, 3, 512, 512), reduction: sum, fwd: 2.98 (ms), back: 11.23 (ms)
input size(128, 3, 128, 128), reduction: mean, fwd: 0.07 (ms), back: 0.21 (ms)
input size(128, 3, 256, 256), reduction: mean, fwd: 0.21 (ms), back: 0.73 (ms)
input size(128, 3, 512, 512), reduction: mean, fwd: 0.82 (ms), back: 2.86 (ms)
input size(256, 3, 128, 128), reduction: mean, fwd: 0.13 (ms), back: 0.39 (ms)
input size(256, 3, 256, 256), reduction: mean, fwd: 0.42 (ms), back: 1.45 (ms)
input size(256, 3, 512, 512), reduction: mean, fwd: 1.54 (ms), back: 5.65 (ms)
input size(512, 3, 128, 128), reduction: mean, fwd: 0.22 (ms), back: 0.74 (ms)
input size(512, 3, 256, 256), reduction: mean, fwd: 0.78 (ms), back: 2.87 (ms)
input size(512, 3, 512, 512), reduction: mean, fwd: 2.98 (ms), back: 11.23 (ms)
```

</details>

<details>
 <summary>PR results</summary>

```
input size(128, 3, 128, 128), reduction: none, fwd: 0.33 (ms), back: 0.59 (ms)
input size(128, 3, 256, 256), reduction: none, fwd: 2.51 (ms), back: 3.92 (ms)
input size(128, 3, 512, 512), reduction: none, fwd: 14.52 (ms), back: 17.05 (ms)
input size(256, 3, 128, 128), reduction: none, fwd: 1.23 (ms), back: 1.85 (ms)
input size(256, 3, 256, 256), reduction: none, fwd: 7.07 (ms), back: 8.45 (ms)
input size(256, 3, 512, 512), reduction: none, fwd: 29.39 (ms), back: 34.21 (ms)
input size(512, 3, 128, 128), reduction: none, fwd: 3.40 (ms), back: 4.18 (ms)
input size(512, 3, 256, 256), reduction: none, fwd: 14.33 (ms), back: 16.90 (ms)
input size(512, 3, 512, 512), reduction: none, fwd: 59.04 (ms), back: 68.36 (ms)
input size(128, 3, 128, 128), reduction: sum, fwd: 0.07 (ms), back: 0.25 (ms)
input size(128, 3, 256, 256), reduction: sum, fwd: 0.21 (ms), back: 0.86 (ms)
input size(128, 3, 512, 512), reduction: sum, fwd: 0.82 (ms), back: 3.33 (ms)
input size(256, 3, 128, 128), reduction: sum, fwd: 0.12 (ms), back: 0.46 (ms)
input size(256, 3, 256, 256), reduction: sum, fwd: 0.42 (ms), back: 1.70 (ms)
input size(256, 3, 512, 512), reduction: sum, fwd: 1.53 (ms), back: 6.58 (ms)
input size(512, 3, 128, 128), reduction: sum, fwd: 0.21 (ms), back: 0.87 (ms)
input size(512, 3, 256, 256), reduction: sum, fwd: 0.78 (ms), back: 3.34 (ms)
input size(512, 3, 512, 512), reduction: sum, fwd: 2.98 (ms), back: 13.07 (ms)
input size(128, 3, 128, 128), reduction: mean, fwd: 0.07 (ms), back: 0.26 (ms)
input size(128, 3, 256, 256), reduction: mean, fwd: 0.21 (ms), back: 0.86 (ms)
input size(128, 3, 512, 512), reduction: mean, fwd: 0.82 (ms), back: 3.34 (ms)
input size(256, 3, 128, 128), reduction: mean, fwd: 0.12 (ms), back: 0.46 (ms)
input size(256, 3, 256, 256), reduction: mean, fwd: 0.42 (ms), back: 1.72 (ms)
input size(256, 3, 512, 512), reduction: mean, fwd: 1.53 (ms), back: 6.60 (ms)
input size(512, 3, 128, 128), reduction: mean, fwd: 0.21 (ms), back: 0.87 (ms)
input size(512, 3, 256, 256), reduction: mean, fwd: 0.78 (ms), back: 3.33 (ms)
input size(512, 3, 512, 512), reduction: mean, fwd: 2.98 (ms), back: 13.07 (ms)
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62826

Reviewed By: bdhirsh

Differential Revision: D30282279

Pulled By: ngimel

fbshipit-source-id: 4aa0ff3f8af0632957417931d332ec486a12b52d
2021-08-12 18:07:15 -07:00
Shen Li
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
Zsolt Dollenstein
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
Matti Picus
658540f43f remove deprecated is_deterministic and set_deterministic (#62158)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58096

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62158

Reviewed By: mruberry

Differential Revision: D29909634

Pulled By: ezyang

fbshipit-source-id: ccffbcf8f378e39bd2c7fbeace7ed1cbbe003981
2021-08-04 16:45:23 -07:00
Natalia Gimelshein
d783617216 enable warnings on cuda synchronization (#62092)
Summary:
This creates `torch.cuda.set_warn_on_synchronization()` function that would warn or error when synchronizing operation is performed. We could wrap it in a context manager for ease of use, but it would be a lie, because it sets global, and not thread-local state. Since it's intended for debugging, maybe that's ok though.
As all `torch.cuda.*` functions, it's going through CPython, not pybind, so the argument is converted to long before being passed to c10 function. I'll make python argument a python enum class, but without pybind it'll still have to go thourgh long conversion.

For a test script
```
import torch
torch.cuda.set_warn_on_synchronization(1)
x=torch.randn(10, device="cuda")
x.nonzero()
y=torch.randn((), device="cuda")

if y:
    print("something")
torch.multinomial(x.abs(), 10, replacement=False)
torch.randperm(20000, device="cuda")
ind = torch.randint(10, (3,), device="cuda")
mask = torch.randint(2, (10,), device="cuda", dtype=torch.bool)
val = torch.randn((), device="cuda")
x[mask]=1.
x[mask] = val
torch.cuda.synchronize()
```
the output is
```
/../playground/sync_warn_test.py:4: UserWarning: called a synchronizing operation (Triggered internally at  ../c10/cuda/CUDAFunctions.cpp:145.)
  x.nonzero()
/../playground/sync_warn_test.py:7: UserWarning: called a synchronizing operation (Triggered internally at  ../c10/cuda/CUDAFunctions.cpp:145.)
  if y:
something
/../playground/sync_warn_test.py:9: UserWarning: called a synchronizing operation (Triggered internally at  ../c10/cuda/CUDAFunctions.cpp:145.)
  torch.multinomial(x.abs(), 10, replacement=False)
/../playground/sync_warn_test.py:15: UserWarning: called a synchronizing operation (Triggered internally at  ../c10/cuda/CUDAFunctions.cpp:145.)
  x[mask] = val
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62092

Reviewed By: mruberry

Differential Revision: D29968792

Pulled By: ngimel

fbshipit-source-id: cc6f817212c164727ed99ecf6ab050dc29631b9e
2021-07-30 09:13:01 -07:00
Jagadish Krishnamoorthy
64d61901eb [ROCm] Skip test_masked_scatter_large_tensor_cuda (#61313)
Summary:
Refer https://github.com/pytorch/pytorch/issues/60190. Skipping unit test until hipcub issue is fixed.

Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61313

Reviewed By: iramazanli

Differential Revision: D29626664

Pulled By: malfet

fbshipit-source-id: db2a390d2a3e28ec05a5032a50aa9a35c86b96ca
2021-07-09 10:27:08 -07:00
kshitij12345
5e9bcf9101 fix: support removing hook in the hook (#61250)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/58354

Problem:
Once a hook is called
05c1e5b655/torch/csrc/autograd/python_hook.cpp (L51-L54)

If the hook has `handle.remove()` while executing and if there are no references to the hook function object then `python` is free to garbage collect.

At the subsequent call to
05c1e5b655/torch/csrc/autograd/python_hook.cpp (L54)

we have `hook` pointing to invalid memory

Thus when we try to fetch the name for `hook` from `check_single_result` with
05c1e5b655/torch/csrc/autograd/python_hook.cpp (L175-L177)
we get segfault.

Solution:
Temporarily increase the life-time of hook with `Py_INCREF` till we have verified the result.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61250

Reviewed By: iramazanli

Differential Revision: D29623826

Pulled By: soulitzer

fbshipit-source-id: c71322311f19066cafb7203980668868c59d4e5e
2021-07-09 09:27:58 -07:00
Heitor Schueroff
f32f85e6da Implemented torch.corrcoef (#60420)
Summary:
Implements `torch.corrcoef` similar to [`np.corrcoef`](https://numpy.org/doc/stable/reference/generated/numpy.corrcoef.html) using `torch.cov` implemented in https://github.com/pytorch/pytorch/pull/58311.

closes https://github.com/pytorch/pytorch/issues/1254

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60420

Reviewed By: mruberry

Differential Revision: D29474687

Pulled By: heitorschueroff

fbshipit-source-id: f3c7c5610363aebd88274a51fc77e3cf879cb611
2021-06-30 12:36:02 -07:00
Victor Bittorf
91c076eadc Add TorchVitals for DataLoader (#60959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60959

Add TorchVitals for Dataloader, this indicates that the data loader was enabled.

This is a no-op if TORCH_VITALS environment variable is not set.

Test Plan: buck test mode/dbg caffe2/test:torch -- --regex vitals

Reviewed By: VitalyFedyunin

Differential Revision: D29445146

fbshipit-source-id: d5778fff3dafb3c0463fec7a498bff4905597518
2021-06-29 14:08:32 -07:00
Heitor Schueroff
ec9c03c234 Implemented torch.cov (#58311)
Summary:
Based from https://github.com/pytorch/pytorch/pull/50466

Adds the initial implementation of `torch.cov` similar to `numpy.cov`. For simplicity, we removed support for many parameters in `numpy.cov` that are either redundant such as `bias`, or have simple workarounds such as `y` and `rowvar`.

cc PandaBoi

closes https://github.com/pytorch/pytorch/issues/19037

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58311

Reviewed By: jbschlosser

Differential Revision: D29431651

Pulled By: heitorschueroff

fbshipit-source-id: 167dea880f534934b145ba94291a9d634c25b01b
2021-06-29 14:02:39 -07:00
kshitij12345
956faea585 [fix] cauchy sampling inf on cuda (#60186)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59144

As pointed by ngimel, the issue is indeed with calling `tan`.

However the C++ `std::tan` [documenation](https://en.cppreference.com/w/cpp/numeric/math/tan) states that

```
The function has mathematical poles at π(1/2 + n); however no common floating-point representation
is able to represent π/2 exactly, thus there is no value of the argument for which a pole error occurs.
```

All `torch.tan`,`numpy.tan` and `math.tan` are compliant with the above statement.

<details>

```python
import torch
import math
import numpy as np

# Single Precision
print(torch.tan(torch.tensor(math.pi, device='cuda', dtype=torch.float32) * 0.5))
print(np.tan(np.array(np.pi, dtype=np.float32) * 0.5))

# Double Precision
print(math.tan(math.pi * 0.5))
print(torch.tan(torch.tensor(math.pi, device='cuda', dtype=torch.double) * 0.5))
print(np.tan(np.array(np.pi, dtype=np.float64) * 0.5))
```

Output
```
tensor(-22877334., device='cuda:0')
-22877332.42885646
1.633123935319537e+16
tensor(1.6331e+16, device='cuda:0', dtype=torch.float64)
1.633123935319537e+16
```

</details>

So this issue stems from the use of `__tanf` faster approximation of tan from CUDA library (for float16, bfloat16 and float).

8a839c5478/aten/src/ATen/NumericUtils.h (L91-L100)

The fix in the PR is to use the **slower** but more correct version.

Benchmark::
```
[ cauchy : input dtype torch.float16 device cuda ]
                             |  Before  |  After
1 threads: -------------------------------------
      (128,)                 |    3.8   |    4.3
      (256, 128)             |    3.8   |    4.2
      (2, 512, 256)          |    3.8   |    4.2
      (2, 64, 256, 128)      |   22.8   |   29.6
      (4, 2, 512, 256, 128)  |  649.6   |  869.3

Times are in microseconds (us).

[ cauchy : input dtype torch.bfloat16 device cuda ]
                             |  Before  |  After
1 threads: -------------------------------------
      (128,)                 |    3.8   |    4.3
      (256, 128)             |    3.8   |    4.3
      (2, 512, 256)          |    3.8   |    4.3
      (2, 64, 256, 128)      |   23.8   |   30.8
      (4, 2, 512, 256, 128)  |  682.5   |  904.2

Times are in microseconds (us).

[ cauchy : input dtype torch.float32 device cuda ]
                             |  Before  |  After
1 threads: --------------------------------------
      (128,)                 |     3.8  |     4.2
      (256, 128)             |     3.7  |     4.2
      (2, 512, 256)          |     3.7  |     4.2
      (2, 64, 256, 128)      |    35.3  |    37.1
      (4, 2, 512, 256, 128)  |  1020.0  |  1058.3

Times are in microseconds (us).

[- cauchy : input dtype torch.float64 device cuda ]
                             |   Before  |   After
1 threads: ----------------------------------------
      (128,)                 |      3.8  |      4.2
      (256, 128)             |      8.0  |      8.0
      (2, 512, 256)          |     46.0  |     46.0
      (2, 64, 256, 128)      |    669.2  |    669.4
      (4, 2, 512, 256, 128)  |  21255.0  |  21262.1

Times are in microseconds (us).
```

<details>

Benchmark Script:
```python
import torch
import itertools
import time
from torch.utils.benchmark import Timer
from torch.utils.benchmark import Compare
import sys
import pickle

print('Using pytorch %s' % (torch.__version__))

cuda_shapes = [(128,), (256, 128), (2, 512, 256), (2, 64, 256, 128), (4, 2, 512, 256, 128)]
cuda_dtypes = [torch.half, torch.bfloat16, torch.float, torch.double]
results = []
repeats = 10

for device in ['cuda']:
    dtypes = cuda_dtypes
    shapes = cuda_shapes

    for dtype in dtypes:
        for shape in shapes:
            t = torch.randn(shape, device=device, dtype=dtype) * 10

            tasks = [("t.cauchy_()", "After", "")]
            timers = [Timer(stmt=stmt, label=f"cauchy : input dtype {dtype} device {device}", sub_label=f"{(shape)}", description=desc, globals=globals()) for stmt, desc, label in tasks]

            for i, timer in enumerate(timers * repeats):
                results.append(
                    timer.blocked_autorange()
                )
                print(f"\r{i + 1} / {len(timers) * repeats}", end="")
                sys.stdout.flush()

with open('after-pr.pkl', 'wb') as f:
    pickle.dump(results, f)

comparison = Compare(results)
comparison.print()
```

Compare Script:
```
import torch
import itertools
import time
from torch.utils.benchmark import Timer
from torch.utils.benchmark import Compare
import sys
import pickle

with open('before-pr.pkl', 'rb') as f:
    after_results = pickle.load(f)

with open('after-pr.pkl', 'rb') as f:
    before_results = pickle.load(f)

comparison = Compare(after_results + before_results)
comparison.print()
```

</details>

TODO:
* [x] Add comment

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60186

Reviewed By: jbschlosser

Differential Revision: D29433897

Pulled By: ngimel

fbshipit-source-id: 9c5f14b83e3372bed72369f70eed9256c04385c6
2021-06-28 12:49:30 -07:00
Victor Bittorf
8b6487c650 Add CUDA Vital (#58059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58059

Add CUDA.used vital sign which is true only if CUDA was "used" which technically means the context was created.

Also adds the following features:
- Force vitals to be written even if vitals are disabled, to enable testing when the env variable is not set from the start of execution
- Add a read_vitals call for python to read existing vital signs.

Test Plan: buck test mode/dbg caffe2/test:torch -- --regex basic_vitals

Reviewed By: xuzhao9

Differential Revision: D28357615

fbshipit-source-id: 681bf9ef63cb1458df9f1c241d301a3ddf1e5252
2021-06-25 16:31:11 -07:00
Masaki Kozuki
a404cc9a7b CUDA addcmul and addcdiv do math in float for 16 bits I/O (#60715)
Summary:
Currently foreach `addcmul` and `addcdiv` cast scalar to float so that actual math is done in FP32 when tensor dtype is Float16/BFloat16 while regular `addcmul` and `addcdiv`, not.

### Reproducible steps to see the behavioral difference
```ipython
In [1]: import torch; torch.__version__
Out[1]: '1.9.0'

In [2]: a, b, c = torch.tensor([60000.0], device='cuda', dtype=torch.half), torch.tensor([60000.0], device='cuda', dtype=torch.half), torch.tensor([-1.0], device='cuda', dtype=torch.half)

In [4]: torch.addcmul(a, b, c, value=2)
Out[4]: tensor([-inf], device='cuda:0', dtype=torch.float16)

In [5]: torch._foreach_addcmul([a], [b], [c], value=2)[0]
Out[5]: tensor([-60000.], device='cuda:0', dtype=torch.float16)
```

### How foreach casts?
Foreach addcmul and addcdiv cast scalar to `opmath_t` (almost equivalent to acc_type) here: 42c8439b6e/aten/src/ATen/native/cuda/ForeachPointwiseOp.cu (L30) and cast inputs and results here:
42c8439b6e/aten/src/ATen/native/cuda/ForeachFunctors.cuh (L133-L135)

Related to https://github.com/pytorch/pytorch/issues/58833 #60227 https://github.com/pytorch/pytorch/issues/60454
cc ptrblck mcarilli ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60715

Reviewed By: albanD

Differential Revision: D29385715

Pulled By: ngimel

fbshipit-source-id: 8bb2db19ab66fc99d686de056a6ee60f9f71d603
2021-06-25 10:21:35 -07:00
Ilqar Ramazanli
90cd57ee16 To add edge_order=2 and documentation for gradient operator (#58165)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56036
Fixes https://github.com/pytorch/pytorch/issues/56130

* All the interior points are computed using second order accurate central differences method for gradient operator. However, currently we only have first order method computation for edge points. In this PR we are adding second order methods for edge points as well.

* Currently, there is no detailed description of how gradient operator computed using second order method, and how to use parameters correctly. We add detailed explanation of meaning of each parameter, and return of the gradient operator, meanwhile giving description of the second-order computation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58165

Reviewed By: mruberry

Differential Revision: D29305321

Pulled By: iramazanli

fbshipit-source-id: 0e0e418eed801c8510b8babe2ad3d064479fb4d6
2021-06-23 03:35:15 -07:00
Philip Meier
0c916c8a4e up the priority of numpy array comparisons in self.assertEqual (#59067)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58988.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59067

Reviewed By: jbschlosser

Differential Revision: D28986642

Pulled By: heitorschueroff

fbshipit-source-id: 3ef2d26b4010fc3519d0a1a020ea446ffeb46ba0
2021-06-22 13:07:07 -07:00
praneeth
9b30fb8528 add support for constant (#60166)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58739 Add support for constants according to python array API stipulation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60166

Reviewed By: anjali411

Differential Revision: D29253958

Pulled By: mruberry

fbshipit-source-id: 0bc86b74d3a4eb3ec4a65c941ec2710747402db1
2021-06-21 20:47:21 -07:00
Thomas J. Fan
c16f87949f ENH Adds nn.ReflectionPad3d (#59791)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27655

This PR adds a C++ and Python version of ReflectionPad3d with structured kernels. The implementation uses lambdas extensively to better share code from the backward and forward pass.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59791

Reviewed By: gchanan

Differential Revision: D29242015

Pulled By: jbschlosser

fbshipit-source-id: 18e692d3b49b74082be09f373fc95fb7891e1b56
2021-06-21 10:53:14 -07:00
Peter Bell
e8e3394ea8 Recognize transposed dense tensors as a form of partial overlap (#59014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59014

Fixes #48401

`assert_no_overlap` currently has a false-negative where it recognizes
the transpose of a contiguous tensor as fully overlapping. This happens because
the memory regions do fully overlap, but of course the strides are different so
the actual elements don't all overlap.

This goes slightly in the other direction, by requiring strides to exactly
match we get false-positives for some unusual situations, e.g.
```
torch.add(a, a, out=a.view([1, *a.shape]))
```
Or replacing strides of length-1 dimensions, etc. However, I think these are
sufficiently obscure that it's okay to error and the common cases like
inplace operations still work as before.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D29040928

Pulled By: ngimel

fbshipit-source-id: 5a636c67536a3809c83f0d3117d2fdf49c0a45e6
2021-06-18 16:29:25 -07:00
Mike Ruberry
92513038e8 Revert D28994140: [pytorch][PR] Implemented torch.cov
Test Plan: revert-hammer

Differential Revision:
D28994140 (23c232554b)

Original commit changeset: 1890166c0a9c

fbshipit-source-id: 73dfe1b00464e38f004f99960cdeeb604ed4b20a
2021-06-13 02:33:37 -07:00
Heitor Schueroff
23c232554b Implemented torch.cov (#58311)
Summary:
Based from https://github.com/pytorch/pytorch/pull/50466

Adds the initial implementation of `torch.cov` similar to `numpy.cov`. For simplicity, we removed support for many parameters in `numpy.cov` that are either redundant such as `bias`, or have simple workarounds such as `y` and `rowvar`.

cc PandaBoi

TODO

- [x] Improve documentation

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58311

Reviewed By: mruberry

Differential Revision: D28994140

Pulled By: heitorschueroff

fbshipit-source-id: 1890166c0a9c01e0a536acd91571cd704d632f44
2021-06-11 09:40:50 -07:00
Kimish Patel
4f79270b89 [PyTorch ] Thread parallel bmm across batch dim (#59596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59596

Parallelize batch matmul across batch dim. This was found to improve perf for
some usecases on mobile.
ghstack-source-id: 130989569

Test Plan: CI unit tests

Reviewed By: albanD

Differential Revision: D26833417

fbshipit-source-id: 9b84d89d29883a6c9d992d993844dd31a25f76b1
2021-06-10 08:25:40 -07:00
Yukio Siraichi
84061dadad Add reduce variants for scatter operation. (#57015)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56463 #56464

- Add reduce variants for `scatter` in both _native_functions.yaml_ and _TensorAdvancedIndexing.cpp_
- Add `OpInfo` tests and reduce tests in _test_torch.py_
- Fix default reduce argument for `scatter_` in __tensor_docs.py_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57015

Reviewed By: mrshenli

Differential Revision: D28162657

Pulled By: ezyang

fbshipit-source-id: 4d37ed1569ce8560aca1085c9cf5349f11427c4f
2021-06-08 13:37:26 -07:00
Mike Ruberry
de40c8e495 Adds remaining OpInfos and removes redundant test generators (#55558)
Summary:
Per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55558

Reviewed By: ngimel

Differential Revision: D28922522

Pulled By: mruberry

fbshipit-source-id: 89cefd93788bc8aa0683f4583cf5caa81aa2dc93
2021-06-06 14:52:26 -07:00
Natalia Gimelshein
344ecb2e71 flip via TI (#59509)
Summary:
Resubmit of https://github.com/pytorch/pytorch/issues/58747

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59509

Reviewed By: mruberry

Differential Revision: D28918665

Pulled By: ngimel

fbshipit-source-id: b045c7b35eaf22e53b1bc359ffbe5a4fda05dcda
2021-06-05 15:43:29 -07:00
Natalia Gimelshein
5117ac3bb4 Revert D28877076: [pytorch][PR] torch.flip via TI
Test Plan: revert-hammer

Differential Revision:
D28877076 (d82bc3feb8)

Original commit changeset: 4fa6eb519085

fbshipit-source-id: c81e7d3283ff6822db913bf9f49a1533268755d0
2021-06-04 23:03:53 -07:00
lezcano
d82bc3feb8 torch.flip via TI (#58747)
Summary:
Implements an idea by ngimel to improve the performance of `torch.flip` via a clever hack into TI to bypass the fact that TI is not designed to work with negative indices.

Something that might be added is vectorisation support on CPU, given how simple the implementation is now.

Some low-hanging fruits that I did not implement:
- Write it as a structured kernel
- Migrate the tests to opinfos
- Have a look at `cumsum_backward` and `cumprod_backward`,  as I think that they could be implemented faster with `flip`, now that `flip` is fast.

**Edit**
This operation already has OpInfos and it cannot be migrated to a structured kernel because it implements quantisation

Summary of the PR:
- x1.5-3 performance boost on CPU
- x1.5-2 performance boost on CUDA
- Comparable performance across dimensions, regardless of the strides (thanks TI)
- Simpler code

<details>
<summary>
Test Script
</summary>

```python
from itertools import product

import torch
from torch.utils.benchmark import Compare, Timer

def get_timer(size, dims, num_threads, device):
    x = torch.rand(*size, device=device)

    timer = Timer(
        "torch.flip(x, dims=dims)",
        globals={"x": x, "dims": dims},
        label=f"Flip {device}",
        description=f"dims: {dims}",
        sub_label=f"size: {size}",
        num_threads=num_threads,
    )

    return timer.blocked_autorange(min_run_time=5)

def get_params():
    sizes = ((1000,)*2, (1000,)*3, (10000,)*2)
    for size, device in product(sizes, ("cpu", "cuda")):
        threads = (1, 2, 4) if device == "cpu" else (1,)
        list_dims = [(0,), (1,), (0, 1)]
        if len(size) == 3:
            list_dims.append((0, 2))
        for num_threads, dims in product(threads, list_dims):
            yield size, dims, num_threads, device

def compare():
    compare = Compare([get_timer(*params) for params in get_params()])
    compare.trim_significant_figures()
    compare.colorize()
    compare.print()

compare()
```
</details>

<details>
<summary>
Benchmark PR
</summary>

![image](https://user-images.githubusercontent.com/3291265/119139954-81e46d80-ba3b-11eb-9aad-e825e515d41b.png)

</details>

<details>
<summary>
Benchmark master
</summary>

![image](https://user-images.githubusercontent.com/3291265/119139915-76914200-ba3b-11eb-9aa8-84b3ca220c93.png)

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58747

Reviewed By: agolynski

Differential Revision: D28877076

Pulled By: ngimel

fbshipit-source-id: 4fa6eb519085950176cb3a9161eeb3b6289ec575
2021-06-04 20:13:38 -07:00
Elton Leander Pinto
2119efd234 reflection_pad1d_backward: Port to structured (#59103)
Summary:
Tracking Issue: https://github.com/pytorch/pytorch/issues/55070
Port `reflection_pad1d_backward` to structured kernel.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59103

Test Plan: Pre-existing tests

Reviewed By: jbschlosser

Differential Revision: D28836043

Pulled By: ezyang

fbshipit-source-id: 4c3b0880edf305896f540113dcab70c8af24253b
2021-06-04 10:23:53 -07:00
Edward Yang
f05d5bec48 Preserve PyObject even when it goes dead (#56017)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56017

Fixes #55686

This patch is seemingly straightforward but some of the changes are very
subtle.  For the general algorithmic approach, please first read the
quoted issue.  Based on the algorithm, there are some fairly
straightforward changes:

- New boolean on TensorImpl tracking if we own the pyobj or not
- PythonHooks virtual interface for requesting deallocation of pyobj
  when TensorImpl is being released and we own its pyobj, and
  implementation of the hooks in python_tensor.cpp
- Modification of THPVariable to MaybeOwned its C++ tensor, directly
  using swolchok's nice new class

And then, there is python_variable.cpp.  Some of the changes follow the
general algorithmic approach:

- THPVariable_NewWithVar is simply adjusted to handle MaybeOwned and
  initializes as owend (like before)
- THPVariable_Wrap adds the logic for reverting ownership back to
  PyObject when we take out an owning reference to the Python object
- THPVariable_dealloc attempts to resurrect the Python object if
  the C++ tensor is live, and otherwise does the same old implementation
  as before
- THPVariable_tryResurrect implements the resurrection logic.  It is
  modeled after CPython code so read the cited logic and see if
  it is faithfully replicated
- THPVariable_clear is slightly updated for MaybeOwned and also to
  preserve the invariant that if owns_pyobj, then pyobj_ is not null.
  This change is slightly dodgy: the previous implementation has a
  comment mentioning that the pyobj nulling is required to ensure we
  don't try to reuse the dead pyobj.  I don't think, in this new world,
  this is possible, because the invariant says that the pyobj only
  dies if the C++ object is dead too.  But I still unset the field
  for safety.

And then... there is THPVariableMetaType.  colesbury explained in the
issue why this is necessary: when destructing an object in Python, you
start off by running the tp_dealloc of the subclass before moving up
to the parent class (much in the same way C++ destructors work).  The
deallocation process for a vanilla Python-defined class does irreparable
harm to the PyObject instance (e.g., the finalizers get run) making it
no longer valid attempt to resurrect later in the tp_dealloc chain.
(BTW, the fact that objects can resurrect but in an invalid state is
one of the reasons why it's so frickin' hard to write correct __del__
implementations).  So we need to make sure that we actually override
the tp_dealloc of the bottom most *subclass* of Tensor to make sure
we attempt a resurrection before we start finalizing.  To do this,
we need to define a metaclass for Tensor that can override tp_dealloc
whenever we create a new subclass of Tensor.  By the way, it was totally
not documented how to create metaclasses in the C++ API, and it took
a good bit of trial error to figure it out (and the answer is now
immortalized in https://stackoverflow.com/q/67077317/23845 -- the things
that I got wrong in earlier versions of the PR included setting
tp_basicsize incorrectly, incorrectly setting Py_TPFLAGS_HAVE_GC on
the metaclass--you want to leave it unset so that it inherits, and
determining that tp_init is what actually gets called when you construct
a class, not tp_call as another not-to-be-named StackOverflow question
suggests).

Aside: Ordinarily, adding a metaclass to a class is a user visible
change, as it means that it is no longer valid to mixin another class
with a different metaclass.  However, because _C._TensorBase is a C
extension object, it will typically conflict with most other
metaclasses, so this is not BC breaking.

The desired new behavior of a subclass tp_dealloc is to first test if
we should resurrect, and otherwise do the same old behavior.  In an
initial implementation of this patch, I implemented this by saving the
original tp_dealloc (which references subtype_dealloc, the "standard"
dealloc for all Python defined classes) and invoking it.  However, this
results in an infinite loop, as it attempts to call the dealloc function
of the base type, but incorrectly chooses subclass type (because it is
not a subtype_dealloc, as we have overridden it; see
b38601d496/Objects/typeobject.c (L1261) )
So, with great reluctance, I must duplicate the behavior of
subtype_dealloc in our implementation.  Note that this is not entirely
unheard of in Python binding code; for example, Cython
c25c3ccc4b/Cython/Compiler/ModuleNode.py (L1560)
also does similar things.  This logic makes up the bulk of
THPVariable_subclass_dealloc

To review this, you should pull up the CPython copy of subtype_dealloc
b38601d496/Objects/typeobject.c (L1230)
and verify that I have specialized the implementation for our case
appropriately.  Among the simplifications I made:

- I assume PyType_IS_GC, because I assume that Tensor subclasses are
  only ever done in Python and those classes are always subject to GC.
  (BTW, yes!  This means I have broken anyone who has extend PyTorch
  tensor from C API directly.  I'm going to guess no one has actually
  done this.)

- I don't bother walking up the type bases to find the parent dealloc;
  I know it is always THPVariable_dealloc.  Similarly, I can get rid
  of some parent type tests based on knowledge of how
  THPVariable_dealloc is defined

- The CPython version calls some private APIs which I can't call, so
  I use the public PyObject_GC_UnTrack APIs.

- I don't allow the finalizer of a Tensor to change its type (but
  more on this shortly)

One alternative I discussed with colesbury was instead of copy pasting
the subtype_dealloc, we could transmute the type of the object that was
dying to turn it into a different object whose tp_dealloc is
subtype_dealloc, so the stock subtype_dealloc would then be applicable.
We decided this would be kind of weird and didn't do it that way.

TODO:

- More code comments

- Figure out how not to increase the size of TensorImpl with the new
  bool field

- Add some torture tests for the THPVariable_subclass_dealloc, e.g.,
  involving subclasses of Tensors that do strange things with finalizers

- Benchmark the impact of taking the GIL to release C++ side tensors
  (e.g., from autograd)

- Benchmark the impact of adding a new metaclass to Tensor (probably
  will be done by separating out the metaclass change into its own
  change)

- Benchmark the impact of changing THPVariable to conditionally own
  Tensor (as opposed to unconditionally owning it, as before)

- Add tests that this actually indeed preserves the Python object

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D27765125

Pulled By: ezyang

fbshipit-source-id: 857f14bdcca2900727412aff4c2e2d7f0af1415a
2021-06-03 10:50:36 -07:00
Thomas J. Fan
7f2e620105 FIX Validates that weights are 2d in embedding (#59314)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/55185

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59314

Reviewed By: H-Huang

Differential Revision: D28837753

Pulled By: jbschlosser

fbshipit-source-id: 683378244c61b0937c95563f91ef87ab09fd1653
2021-06-02 12:52:21 -07:00
Natalia Gimelshein
12418a4f86 Back out "Revert D28664514: [pytorch][PR] various TensorIterator speed improvements"
Summary: Original commit changeset: fcad039b7dc8

Test Plan: Existing tests

Reviewed By: mruberry

Differential Revision: D28720186

fbshipit-source-id: 14ac99ee2d7cafb86b20c979f8917beeefd616e1
2021-05-26 12:22:48 -07:00
Edward Yang
17fb651a3b Make torch.Tensor(torch.tensor(1.0)) work (#58885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58885

Fixes #58884

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D28687510

Pulled By: ezyang

fbshipit-source-id: 81325f501cc3e83cbac02f7c44ded9d396356bb8
2021-05-26 11:33:05 -07:00
Natalia Gimelshein
8398ebaa86 Revert D28664514: [pytorch][PR] various TensorIterator speed improvements
Test Plan: revert-hammer

Differential Revision:
D28664514 (8a28bbeeb9)

Original commit changeset: 2e03cf90b37a

fbshipit-source-id: fcad039b7dc823fec8afa694ab74a7ac5011f8ab
2021-05-26 10:49:58 -07:00
Xiang Gao
c88333484f [resubmit] masked_scatter thrust->cub (#58865)
Summary:
See ae7760cf50bb2cddff4663a07b9d68decf4b6c75 for the fix

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58865

Reviewed By: mruberry

Differential Revision: D28657940

Pulled By: ngimel

fbshipit-source-id: 9155c710b0e18ebb3bfa2dabfdd117355ac30840
2021-05-25 11:00:50 -07:00
Natalia Gimelshein
8a28bbeeb9 various TensorIterator speed improvements (#58810)
Summary:
1) remove pushing back to strides vector for 1D tensors, those strides are never used in the loop anyway
2) avoid calling get_data_ptrs unless necessary
3) don't call into assert_no_partial_overlap if tensorImpls are the same (assert_no_partial_overlap has this comparison too, but after a couple of nested function calls)
4) is_non_overlapping_and_dense instead of is_contiguous in memory overlap (which, for some reason, is faster than is_contiguous, though I hoped after is_contiguous is non-virtualized, it should be the same).

Altogether, brings instruction count down from ~110K to 102735 for the following binary inplace benchmark:
```
In [2]:  timer = Timer("m1.add_(b);", setup="at::Tensor m1=torch::empty({1}); at::Tensor b = torch::empty({1});", language="c++", timer=timeit.default_timer)
   ...:  stats=timer.collect_callgrind(number=30, repeats=3)
   ...:  print(stats[1].as_standardized().stats(inclusive=False))
```
similar improvements for unary inplace.

Upd: returned stride packing for now, counts is now 104295, so packing is worth ~ 52 instructions, we should think about how to remove it  safely.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58810

Reviewed By: bhosmer

Differential Revision: D28664514

Pulled By: ngimel

fbshipit-source-id: 2e03cf90b37a411d9994a7607402645f1d8f3c93
2021-05-25 10:44:51 -07:00
Serhat Yilmaz
b4f3a989da [torch][repeat_interleave] Fix ambigious function call (#58881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58881

recently added new parameter to the function with PR: https://github.com/pytorch/pytorch/pull/58417

However, this introduced ambiguity when making call below:
  some_tensor.repeat_interleave(some_integer_value)

Making it optional to avoid the issue.

Reviewed By: ezyang, ngimel

Differential Revision: D28653820

fbshipit-source-id: 5bc0b1f326f069ff505554b51e3b24d60e69c843
2021-05-25 00:31:32 -07:00
Yu Guo
74c12da451 add deterministic path for scatter_add_cuda for 1D tensors (#58761)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58761

previously we implemented deterministic path for gather_backward in https://github.com/pytorch/pytorch/pull/55573, which replaced non-deterministic scatter_add_cuda.

It's better to move it inside scatter_add so scatter_add can benefit from the deterministic path.

Test Plan:
buck test mode/opt //caffe2/test:torch_cuda -- test_scatter_add_one_dim_deterministic

    ✓ ListingSuccess: caffe2/test:torch_cuda - main (5.063)
    ✓ Pass: caffe2/test:torch_cuda - test_scatter_add_one_dim_deterministic_cuda (test_torch.TestTorchDeviceTypeCUDA) (30.909)
    ✓ Pass: caffe2/test:torch_cuda - main (30.909)
Summary
  Pass: 2
  ListingSuccess: 1

buck test mode/opt //caffe2/test:torch_cuda -- test_gather_backward

    ✓ ListingSuccess: caffe2/test:torch_cuda - main (4.613)
    ✓ Pass: caffe2/test:torch_cuda - test_gather_backward_deterministic_path_cuda (test_torch.TestTorchDeviceTypeCUDA) (25.369)

buck test mode/opt //caffe2/test:torch_cuda -- test_nondeterministic_alert

    ✓ ListingSuccess: caffe2/test:torch_cuda - main (5.356)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_CTCLoss_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_put_accumulate_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_ReplicationPad1d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_scatter_add_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_FractionalMaxPool2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_AdaptiveAvgPool2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_AvgPool3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_grid_sample_2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_NLLLoss_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_put_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_median_cuda_float64 (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_gather_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_bincount_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_histc_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_ReflectionPad1d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_interpolate_bilinear_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_ReplicationPad2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_interpolate_bicubic_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_grid_sample_3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_MaxPool3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_AdaptiveAvgPool3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_EmbeddingBag_max_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_interpolate_trilinear_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_AdaptiveMaxPool2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_ReflectionPad2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_FractionalMaxPool3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_kthvalue_cuda_float64 (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_interpolate_linear_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_ReplicationPad3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (28.146)
    ✓ Pass: caffe2/test:torch_cuda - main (28.146)
Summary
  Pass: 30
  ListingSuccess: 1

Reviewed By: ngimel

Differential Revision: D28585659

fbshipit-source-id: 1ad003d4130501ceff5f6a7a870ca3dbc9a3f1f2
2021-05-23 21:36:02 -07:00
kshitij12345
ee3ea31f12 OpInfo: split, split_with_sizes (#58184)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58184

Reviewed By: ngimel

Differential Revision: D28627271

Pulled By: mruberry

fbshipit-source-id: e6c0d2b005904ddebc9dab76685403530a6f6519
2021-05-23 15:47:35 -07:00
Serhat Yilmaz
4ca4640bae [torch][repeat_interleave] remove stream syncronization if output size is given (#58417)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58417

Same as title.

Test Plan:
Rely on CI signal.

Update unit test to exercise new code path as well.

Reviewed By: ngimel

Differential Revision: D28482927

fbshipit-source-id: 3ec8682810ed5c8547b1e8d3869924480ce63dcd
2021-05-22 20:53:28 -07:00
Natalia Gimelshein
9e261de630 Revert D28547564: [pytorch][PR] masked_scatter thrust->cub
Test Plan: revert-hammer

Differential Revision:
D28547564 (5152cf8647)

Original commit changeset: 83aeddfaf702

fbshipit-source-id: d5259afb584e0f6c0a11de4d4cb3d56a2a562eb7
2021-05-21 09:18:34 -07:00
Xiang Gao
5152cf8647 masked_scatter thrust->cub (#56750)
Summary:
Benchmark:

```python
import torch
import itertools

def run50_sync(f):
    for _ in range(50):
        f()
    torch.cuda.synchronize()

run50_sync(lambda: torch.randperm(1000000, device='cuda'))

def benchmark(M):
    a = torch.randn(M, device='cuda')
    m = torch.randint(1, (M,), dtype=torch.long, device='cuda').bool()
    v = torch.randn(M, device='cuda')

    torch.cuda.synchronize()

    %timeit run50_sync(lambda:a.masked_scatter_(m, v))

for M in (100, 1000, 100000, 10000000):
    print(M)
    benchmark(M)
```

Before:
```
100
8.65 ms ± 80.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1000
8.75 ms ± 72.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
100000
9.27 ms ± 87.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10000000
33.6 ms ± 358 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

After
```
100
8.04 ms ± 37.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1000
8.09 ms ± 38.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
100000
8.63 ms ± 76.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10000000
31.9 ms ± 298 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56750

Reviewed By: ailzhang

Differential Revision: D28547564

Pulled By: ngimel

fbshipit-source-id: 83aeddfaf7023f9f9501c6b1e2faf91e8b6277b1
2021-05-20 10:27:58 -07:00
lezcano
452569dffb cfloat and cdouble functions (#58137)
Summary:
This adds the methods `Tensor.cfloat()` and `Tensor.cdouble()`.

I was not able to find the tests for `.float()` functions. I'd be happy to add similar tests for these functions  once someone points me to them.

Fixes https://github.com/pytorch/pytorch/issues/56014

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58137

Reviewed By: ejguan

Differential Revision: D28412288

Pulled By: anjali411

fbshipit-source-id: ff3653cb3516bcb3d26a97b9ec3d314f1f42f83d
2021-05-13 21:13:37 -07:00
kshitij12345
6b1eeef601 OpInfo: squeeze (#58080)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58080

Reviewed By: agolynski

Differential Revision: D28379485

Pulled By: mruberry

fbshipit-source-id: 2b288036f595a5bd6b948a072494ee87f82322ce
2021-05-12 21:29:31 -07:00
Yu Guo
8a45006765 enable deterministic path for index_copy_cuda with index_put (#58144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58144

reland D28291041 (14badd9929), which was reverted due to a type error from Tuple[torch.Tensor], seems that mypy requires Tuple[torch.Tensor, torch.Tensor, torch.Tensor]

Test Plan:
buck test mode/opt //caffe2/test:torch_cuda -- test_index_copy_deterministic

    ✓ ListingSuccess: caffe2/test:torch_cuda - main (9.229)
    ✓ Pass: caffe2/test:torch_cuda - test_index_copy_deterministic_cuda (test_torch.TestTorchDeviceTypeCUDA) (25.750)
    ✓ Pass: caffe2/test:torch_cuda - main (25.750)

Reviewed By: ngimel

Differential Revision: D28383178

fbshipit-source-id: 38896fd6ddd670cfcce36e079aee7ad52adc2a28
2021-05-12 16:26:50 -07:00
kshitij12345
d09abf004c OpInfo: narrow (#58082)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58082

Reviewed By: agolynski

Differential Revision: D28379371

Pulled By: mruberry

fbshipit-source-id: 484e560b1e6ceba234e497585ed308a27cd8b7a0
2021-05-12 15:39:15 -07:00
Mike Ruberry
c911c30520 Revert D28291041: enable deterministic path for index_copy_cuda with index_put
Test Plan: revert-hammer

Differential Revision:
D28291041 (14badd9929)

Original commit changeset: 7f0cf3ec7280

fbshipit-source-id: 6117bc6e5b2044ce70d4e4a19bccd8c183ea3702
2021-05-12 03:33:57 -07:00
Kurt Mohler
c7fb0a0e82 Remove beta warning for use_deterministic_algorithms (#58074)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58073

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58074

Reviewed By: ngimel

Differential Revision: D28373676

Pulled By: mruberry

fbshipit-source-id: cae9a92ebbf6ac5f8d3008aa6a6a9cd5c1041c9f
2021-05-12 03:30:12 -07:00
Yu Guo
14badd9929 enable deterministic path for index_copy_cuda with index_put (#57870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57870

this is similar to index_add_cuda with index_put accumulate = True

Test Plan:
buck test mode/opt //caffe2/test:torch_cuda -- test_index_copy_deterministic

    ✓ ListingSuccess: caffe2/test:torch_cuda - main (9.229)
    ✓ Pass: caffe2/test:torch_cuda - test_index_copy_deterministic_cuda (test_torch.TestTorchDeviceTypeCUDA) (25.750)
    ✓ Pass: caffe2/test:torch_cuda - main (25.750)

Reviewed By: ngimel

Differential Revision: D28291041

fbshipit-source-id: 7f0cf3ec72805f3617fd1de9ff03e1d49114fed8
2021-05-12 00:32:35 -07:00
Yu Guo
a07a0190f9 enable deterministic path for index_put with accumulate=False on CPU and CUDA (#57839)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57839

we reuse the `index_put_accum_kernel`, rename it to  `index_put_deterministic_kernel` and add a bool `accumulate` in `index_backward_kernel`

Test Plan:
buck test mode/opt //caffe2/test:torch -- test_index_put_non_accumulate_deterministic

    ✓ Pass: caffe2/test:torch - test_index_put_non_accumulate_deterministic_cpu (test_torch.TestTorchDeviceTypeCPU) (5.120)
Summary
  Pass: 1
  Skip: 1
    ↻ caffe2/test:torch - test_index_put_non_accumulate_deterministic_meta (test_torch.TestTorchDeviceTypeMETA)
  ListingSuccess: 1

buck test mode/opt //caffe2/test:torch_cuda -- test_index_put_non_accumulate_deterministic

    ✓ ListingSuccess: caffe2/test:torch_cuda - main (6.397)
    ✓ Pass: caffe2/test:torch_cuda - test_index_put_non_accumulate_deterministic_cuda (test_torch.TestTorchDeviceTypeCUDA) (26.030)
    ✓ Pass: caffe2/test:torch_cuda - main (26.030)
Summary
  Pass: 2
  ListingSuccess: 1

Reviewed By: ngimel

Differential Revision: D28290699

fbshipit-source-id: df8bbe7af2e72017566161b05b85737fda4ceb3f
2021-05-12 00:31:19 -07:00
Ilqar Ramazanli
8b816e9010 To implement gradient for Pytorch (#54617)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56129

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54617

Reviewed By: anjali411

Differential Revision: D28057452

Pulled By: iramazanli

fbshipit-source-id: 9bd86679282d34f5e5393e6447121586517eb4f0
2021-05-11 18:52:20 -07:00
kshitij12345
502eb664ae OpInfo: chunk (#57935)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57935

Reviewed By: ngimel

Differential Revision: D28346217

Pulled By: mruberry

fbshipit-source-id: 331995aa18fd2983fc2122a9af31fba43ab9839c
2021-05-11 10:16:10 -07:00
Edward Yang
da8cc355a3 Relax tp_new so that it is OK to call (#57544)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57544

Instead of removing tp_new from the superclass (which causes
super().__new__ to not work), I now still install tp_new on the
superclass, but verify that you are not trying to directly
construct _TensorBase.

Fixes https://github.com/pytorch/pytorch/issues/57421

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D28189475

Pulled By: ezyang

fbshipit-source-id: 9397a3842a77f5428d182dd62244b42425bca827
2021-05-05 09:04:39 -07:00
Peter Bell
33eea146ee torch.clamp with tensor min and max (#52695)
Summary:
Fixes gh-2793

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52695

Reviewed By: mruberry

Differential Revision: D27395977

Pulled By: ezyang

fbshipit-source-id: f86aa240feb034d42e4c45447e72218f6a773c24
2021-05-03 12:56:16 -07:00
kshitij12345
154eca0309 OpInfo: ravel, view, view_as (#56910)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56910

Reviewed By: ngimel

Differential Revision: D28141867

Pulled By: mruberry

fbshipit-source-id: bff49d40d7e3bb36bc83d1405bd77f5529eeffe9
2021-05-02 22:10:36 -07:00
Ivan Yashchuk
eaf00bf7d4 Skip linalg.qr saved mode check if compiled without LAPACK (#56284)
Summary:
This PR also removes qr and eig tests from test/test_torch.py. They were not skipped if compiled without LAPACK and they are now replaced with OpInfos.

Fixes https://github.com/pytorch/pytorch/issues/55929

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56284

Reviewed By: ejguan

Differential Revision: D27827077

Pulled By: mruberry

fbshipit-source-id: 1dceb955810a9fa34bb6baaccbaf0c8229444d3a
2021-05-02 16:07:07 -07:00
kshitij12345
41099ef71c OpInfo: mvlgamma (#56907)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/42515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56907

Reviewed By: astaff

Differential Revision: D28118669

Pulled By: mruberry

fbshipit-source-id: f54ad6dc64ddb6bcfca5c5c7fd8f395cd9761128
2021-05-01 20:51:01 -07:00
Wenlei Xie
20085f6d23 Support auto generation of device check (#56872)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56872

ghstack-source-id: 127914018

Test Plan: auto test

Reviewed By: ezyang

Differential Revision: D27986429

fbshipit-source-id: 0da8413b0b8e6810fcea27ed1de499f11f68bd1f
2021-05-01 12:02:09 -07:00
Emilio Castillo
0a9c9cc674 Update DLPack to 0.4 (#55365)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/55090

I included the header directly, but I am not sure if we should add this as a git submodule, what do you guys think?
Also regarding the implementation, in ATen lanes seems not to be supported, but from CuPy complex types are exported with 2 lanes, I am not sure wether this is correct or not. However, in PyTorch this seems to be working properly, so I forgive 2 lanes for complex datatypes.

TODO: add tests for complex and bfloat

Easy test script against cupy

```python
import cupy
import torch

from torch.utils.dlpack import to_dlpack
from torch.utils.dlpack import from_dlpack

# Create a PyTorch tensor.
tx1 = torch.tensor(
    [2 + 1j, 3 + 2j, 4 + 3j, 5 + 4j], dtype=torch.complex128
).cuda()

# Convert it into a DLPack tensor.
dx = to_dlpack(tx1)

# Convert it into a CuPy array.
cx = cupy.fromDlpack(dx)

# Convert it back to a PyTorch tensor.
tx2 = from_dlpack(cx.toDlpack())
torch.testing.assert_allclose(tx1, tx2)
```

Thanks to leofang who updated CuPy's dlpack version and his PR served me as the guide for this one.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55365

Reviewed By: ngimel

Differential Revision: D27724923

Pulled By: mruberry

fbshipit-source-id: 481eadb882ff3dd31e7664e08e8908c60a960f66
2021-04-30 10:30:05 -07:00
Edward Yang
e362ee6f8a Make it illegal to directly construct _TensorBase (#56150)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56150

See #56017 for full context; the short story is that by making
it illegal to directly construct _TensorBase, we need only
write a *single* tp_dealloc function which will work universally
for all _TensorBase subclasses, rather than having to write two
versions, one for _TensorBase itself, and others for Python subclasses
of _TensorBase.  This means simpler code.

The subtlety here is that we only install our custom `tp_new` for direct subclasses of TensorBase.  This is important, because overriding the `tp_new` also overrides any user defined constructor.  Fortunately class Tensor(_TensorBase) has no nontrivial constructors and doesn't mind, but other subclasses like Parameter definitely mind!

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D28028746

Pulled By: ezyang

fbshipit-source-id: 3c03a14666ad1ded1145fe676afb0a7623cdb9bb
2021-04-28 09:25:25 -07:00
Arindam Roy
5d7e48c9fc Disable one test in rocm (#56951)
Summary:
The test seems to be failing in ROCM 4.1 on CI node.  Disabling the same for now. The test will be    re-enabled for ROCM when CI transitions to 4.2.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56951

Reviewed By: zou3519

Differential Revision: D28059808

Pulled By: ezyang

fbshipit-source-id: a9b064b7525ae6dce89c51fe29ff07f37b7ac796
2021-04-28 08:58:51 -07:00
Yukio Siraichi
cf17fd6dd5 Fix multinomial CUDA misalignment and non-deterministic behavior (#55364)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46702

- fails on probability distribution with odd items
  - trying to access an `acc_type` (`float`) in a `scalar_t` (`float16`) aligned memory
- produce unrepeatable result for large input tensor
  - parallel cumsum not monotonic at some positions

### Fixes
- computing cumsum on `acc_type` (`float`) instead of using `scalar_t` (`float16`) fixed both issues
- the non-monotonic behavior may happen even using `float`, though
  - in these cases, deterministic behavior may be achieved by eliminating the race condition when writing the result, using the atomic function `atomicMax`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55364

Reviewed By: mruberry

Differential Revision: D28031666

Pulled By: ngimel

fbshipit-source-id: 0fc6289e0b9ea2d31ef3771e7ca370de8f5c02de
2021-04-27 12:04:32 -07:00
Yu Guo
f5c24cc891 add deterministic path for index_copy_cpu (#56900)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56900

use serial copy with iter.serial_for_each in the deterministic mode

Test Plan:
buck test mode/opt //caffe2/test:torch -- test_index_copy_deterministic

    ✓ Pass: caffe2/test:torch - test_index_copy_deterministic_cpu (test_torch.TestTorchDeviceTypeCPU) (5.581)

buck test mode/opt //caffe2/test:torch_cuda -- test_nondeterministic_alert_index_copy

    ✓ ListingSuccess: caffe2/test:torch_cuda - main (11.565)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_index_copy_cuda_float64 (test_torch.TestTorchDeviceTypeCUDA) (29.172)
    ✓ Pass: caffe2/test:torch_cuda - main (29.172)

Reviewed By: ngimel

Differential Revision: D27992992

fbshipit-source-id: cebeefd8508553f9dbc4145819fe90dd625502f3
2021-04-26 16:57:47 -07:00
Yu Guo
72c3ee073f add deterministic path for index_add_cuda (#56521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56521

index_add_cuda is non-deterministic due to cuda atomicAdd. Here we add a deterministic code path with index_put(accumulate=True)

Test Plan:
buck test mode/opt //caffe2/test:torch_cuda -- test_index_add_deterministic

    ✓ ListingSuccess: caffe2/test:torch_cuda - main (12.289)
    ✓ Pass: caffe2/test:torch_cuda - test_index_add_deterministic_cuda (test_torch.TestTorchDeviceTypeCUDA) (27.190)
    ✓ Pass: caffe2/test:torch_cuda - main (27.190)
Summary
  Pass: 2
  ListingSuccess: 1

buck test mode/opt //caffe2/test:torch_cuda -- test_nondeterministic_alert

    ✓ ListingSuccess: caffe2/test:torch_cuda - main (16.088)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_ReflectionPad1d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_kthvalue_cuda_float64 (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_ReplicationPad1d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_bincount_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_index_put_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_EmbeddingBag_max_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_MaxPool3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_AdaptiveAvgPool3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_histc_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_interpolate_linear_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_AdaptiveMaxPool2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_FractionalMaxPool3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_AdaptiveAvgPool2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_NLLLoss_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_put_accumulate_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_grid_sample_2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_put_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_interpolate_trilinear_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_interpolate_bicubic_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_ReflectionPad2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_scatter_add_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_AvgPool3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_grid_sample_3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_CTCLoss_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_FractionalMaxPool2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_ReplicationPad3d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_index_copy_cuda_float64 (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_ReplicationPad2d_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_median_cuda_float64 (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_gather_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - test_nondeterministic_alert_interpolate_bilinear_cuda (test_torch.TestTorchDeviceTypeCUDA) (37.654)
    ✓ Pass: caffe2/test:torch_cuda - main (37.654)
Summary
  Pass: 32
  ListingSuccess: 1

Reviewed By: ngimel

Differential Revision: D27861072

fbshipit-source-id: c33731017b863751f3e3068a23135129c555b66f
2021-04-26 12:14:58 -07:00
kshitij12345
9eee14704a OpInfo: roll and rot90 (#56770)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56770

Reviewed By: ngimel

Differential Revision: D27987820

Pulled By: mruberry

fbshipit-source-id: c6b86cdc1b89d91eeda2215020137582e7c20c65
2021-04-25 22:12:38 -07:00
kshitij12345
9e027d7ea3 [OpInfo] Add opinfo for transpose and its aliases (#56122)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56122

Reviewed By: ezyang

Differential Revision: D27962878

Pulled By: mruberry

fbshipit-source-id: cfd84bb0dcedeb98233a10e2c9754281f7cb76af
2021-04-25 21:58:16 -07:00
kshitij12345
298db67220 [OpInfo] Add Function Variant and Opinfo for permute (#56125)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56125

Reviewed By: ezyang

Differential Revision: D27960312

Pulled By: mruberry

fbshipit-source-id: b9dd89f7e69d7dff29f3b53828656c13df898fa5
2021-04-25 21:26:44 -07:00
Kurt Mohler
1f04494c0e Consolidate nondeterministic error tests (#55631)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51498

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55631

Reviewed By: malfet

Differential Revision: D27909953

Pulled By: mruberry

fbshipit-source-id: 9115b2433f9c276555be55bd51b270a7a2846829
2021-04-22 23:37:01 -07:00
Sam Estep
75024e228c Add lint for unqualified type: ignore (#56290)
Summary:
The other half of https://github.com/pytorch/pytorch/issues/56272.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56290

Test Plan:
CI should pass on the tip of this PR, and we know that the lint works because the following CI runs (before this PR was finished) failed:

- https://github.com/pytorch/pytorch/runs/2384511062
- https://github.com/pytorch/pytorch/actions/runs/765036024

Reviewed By: seemethere

Differential Revision: D27867219

Pulled By: samestep

fbshipit-source-id: e648f07b6822867e70833e23ddafe7fb7eaca235
2021-04-21 08:07:23 -07:00
Brandon Lin
d806b06167 Support int32 indices in torch.repeat_interleave (#55102)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55102

To avoid casting a tensor to `.long()`, we introduce support for int32 in `torch.repeat_interleave`.

Reviewed By: ezyang

Differential Revision: D27478235

fbshipit-source-id: 08b4cce65fe94ff10535ddc07e1ba2bacea6a2cf
2021-04-19 09:07:25 -07:00