Commit Graph

162 Commits

Author SHA1 Message Date
Yanbo Liang
d719f0276d [Dynamo] Fix nested function resume execution (#100426)
Fixes #99665

Let me explain the root cause using the unit test I added:
* This bug is triggered when:
  * `wrapped` is a nested function.
  * `wrapped` is defined in a different module from the main function `fn`.
  * There is a graph break inside `wrapped`.
* The root cause: when resuming a nested function we were actually using the outermost function's (`fn` in my example) globals, but `wrapped` calls `inner_func`, which is not part of `fn`'s globals, so we have to set the correct globals when the nested function resumes execution (see the sketch below).
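A minimal sketch of the failing pattern, assuming the current `torch.compile`/`torch._dynamo` API (in the actual unit test `wrapped`/`inner_func` live in a separate module from `fn`; names and the single-file layout here are illustrative):

```python
import torch
import torch._dynamo

def inner_func(x):              # lives in wrapped's module globals, not in fn's
    return x + 1

def make_wrapped():
    def wrapped(x):             # nested function with a graph break inside
        x = inner_func(x)
        torch._dynamo.graph_break()
        return inner_func(x)    # on resume, this lookup needs *this* module's globals
    return wrapped

wrapped = make_wrapped()

@torch.compile(backend="eager")
def fn(x):                      # in the real repro, fn is in a different module
    return wrapped(x)

print(fn(torch.ones(3)))        # tensor([3., 3., 3.])
```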

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100426
Approved by: https://github.com/jansel
2023-05-06 05:04:50 +00:00
mingfeima
a8429342df fix mul/div overflow issue on CPU float16 (#98820)
Fix https://github.com/pytorch/pytorch/issues/63482 and https://github.com/pytorch/pytorch/issues/98691

The above two issues have the same root cause:

**binary_ops** create a TensorIterator with the `promote_inputs_to_common_dtype` flag on, which converts both input tensors to the common_dtype_ (this logic is bypassed on CUDA) and might overflow on Half. If one of the inputs is a scalar with an absolute value larger than ~65000, it will overflow.

This patch fetches the scalar value from the `original_tensor_base`, which records the original scalar input value, and then treats the TensorIterator as a unary op in `cpu_kernel_vec`.

Previously, CPU and CUDA had different behaviors for this scenario; with this patch they are aligned. Test cases are added for both CPU and CUDA devices.
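For reference, the overflow comes from casting the scalar itself to float16, which cannot represent values above ~65504:

```python
import torch

# the scalar overflows to inf once it is converted to float16
print(torch.tensor(524288.0, dtype=torch.float16))  # tensor(inf, dtype=torch.float16)
print(torch.tensor(65536.0, dtype=torch.float16))   # tensor(inf, dtype=torch.float16)
# hence the "before" results below: x / inf == 0 and finite * inf == inf
```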

The following are the results:

#### before:
```
>>> torch.tensor([3388.], dtype=torch.half).div(524288.0)
tensor([0.], dtype=torch.float16)
>>> torch.tensor([0.01], dtype=torch.float16) * torch.tensor(65536, dtype=torch.float32)
tensor([inf], dtype=torch.float16)
```

#### after:
```
>>> torch.tensor([3388.], dtype=torch.half).div(524288.0)
tensor([0.0065], dtype=torch.float16)
>>> torch.tensor([0.01], dtype=torch.float16) * torch.tensor(65536, dtype=torch.float32)
tensor([655.5000], dtype=torch.float16)
```

The `RRelu` implementation also needs to be updated to use float to store intermediate results; otherwise the following test case would fail:
```
. build/bin/test_api --gtest_filter=ModulesTest.RReLU
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98820
Approved by: https://github.com/jgong5, https://github.com/ngimel
2023-04-17 07:12:53 +00:00
Aaron Gokaslan
47dca20d80 [BE] Enable flake8-comprehension rule C417 (#97880)
Enables flake8-comprehension rule C417. Ruff autogenerated these fixes to the codebase.
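C417 flags unnecessary `map()` usage that can be written as a comprehension. A typical fix of this kind looks like the following (illustrative example, not taken from the diff):

```python
xs = [1, 2, 3]

# flagged by C417: map() with a lambda
doubled = list(map(lambda x: x * 2, xs))

# the autogenerated rewrite: an equivalent list comprehension
doubled = [x * 2 for x in xs]
print(doubled)  # [2, 4, 6]
```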

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97880
Approved by: https://github.com/ezyang, https://github.com/kit1980, https://github.com/albanD
2023-03-30 14:34:24 +00:00
BJ Hargrave
b59a60ddff Fix CPU bitwise shifts for out-of-limit shift values (#96659)
Negative shift values and positive shift values greater than the bit size of the dtype (limit `0..bits`) now yield expected results which are consistent with numpy.

A left shift with an out-of-limit shift value results in a value of `0`. A right shift with an out-of-limit shift value results in a value of `-1` for negative inputs and `0` for non-negative inputs (sign preserving).
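Illustrating the described behavior with `int8` tensors (expected outputs per the description above):

```python
import torch

t = torch.tensor([-16, 16], dtype=torch.int8)

# shift amounts outside 0..7 are "out of limit" for int8
print(t << 10)   # tensor([0, 0], dtype=torch.int8)    left shift -> 0
print(t >> 10)   # tensor([-1, 0], dtype=torch.int8)   -1 / 0, sign preserving
```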

Fixes https://github.com/pytorch/pytorch/issues/70904

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96659
Approved by: https://github.com/ngimel, https://github.com/albanD, https://github.com/zou3519, https://github.com/jgong5, https://github.com/malfet
2023-03-17 21:35:34 +00:00
mfkasim1
975333d80c Logaddexp for complex in CPU (#95717)
Continuation of PR #93153 where I implemented logaddexp for complex, but didn't expose it to `torch.logaddexp`. So this PR is to expose the complex logaddexp to `torch.logaddexp`.
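A quick usage example of the newly exposed complex path. For well-scaled inputs it matches the naive formula log(exp(a) + exp(b)); the point of the dedicated kernel is to stay accurate when exp() would over/underflow:

```python
import torch

a = torch.tensor([1.0 + 2.0j], dtype=torch.complex64)
b = torch.tensor([3.0 - 1.0j], dtype=torch.complex64)

out = torch.logaddexp(a, b)
print(torch.allclose(out, torch.log(torch.exp(a) + torch.exp(b))))  # True
```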

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95717
Approved by: https://github.com/lezcano
2023-03-01 20:37:46 +00:00
Xuehai Pan
b005ec62b9 [BE] Remove dependency on six and future (#94709)
Remove the Python 2 and 3 compatibility libraries [six](https://pypi.org/project/six) and [future](https://pypi.org/project/future), as well as `torch._six`. We only support Python 3.8+ now. It's time to retire them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94709
Approved by: https://github.com/malfet, https://github.com/Skylion007
2023-02-14 09:14:14 +00:00
Aaron Gokaslan
67d9790985 [BE] Apply almost all remaining flake8-comprehension checks (#94676)
Applies the remaining flake8-comprehensions fixes and checks. This change replaces all remaining unnecessary generator expressions with list/dict/set comprehensions, which are more succinct, performant, and better supported by our torch.jit compiler. It also removes useless generators such as `set(a for a in b)`, resolving them into just the set call.
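Illustrating the kind of rewrite described (not actual lines from the diff):

```python
items = [1, 2, 2, 3]

# before: useless generators wrapped in constructor calls
squares = list(x * x for x in items)
unique = set(a for a in items)

# after: a direct comprehension / just the set call
squares = [x * x for x in items]
unique = set(items)
print(squares, unique)  # [1, 4, 4, 9] {1, 2, 3}
```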

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676
Approved by: https://github.com/ezyang
2023-02-12 01:01:25 +00:00
Muhammad Firmansyah Kasim
aaa27a6b6d Vectorized more stable complex division (#93277)
Fixes #92043 and completes #92539 by implementing a vectorized, more stable complex division.
I implement this using the internal `abs_` function to avoid branching. I also re-implement the internal `abs_` to make it more stable.
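The general idea, as a scalar Python sketch (a simplification under my reading of the PR; the actual kernel works on vectorized lanes and uses the internal `abs_` to stay branch-free):

```python
def stable_complex_div(a, b, c, d):
    """(a + bj) / (c + dj), scaled to avoid intermediate under/overflow."""
    s = max(abs(c), abs(d)) or 1.0   # scale taken from the denominator's magnitudes
    cs, ds = c / s, d / s            # |cs|, |ds| <= 1
    denom = c * cs + d * ds          # == (c*c + d*d) / s, stays representable
    return (a * cs + b * ds) / denom, (b * cs - a * ds) / denom

print(stable_complex_div(1.0, 2.0, 3.0, 4.0))  # (0.44, 0.08) == (1+2j)/(3+4j)
```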

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93277
Approved by: https://github.com/peterbell10, https://github.com/lezcano
2023-02-03 11:48:20 +00:00
mfkasim1
1f55f3b0de Solving the under/overflow for complex division (#92539)
Fixes #92043.
I'm following numpy's implementation, as suggested by @min-jean-cho.
I found that this implementation still overflows when working with numbers greater than `finfo.max / 2`, but this is still much better than the previous implementation, which overflowed with numbers greater than `finfo.max ** 0.5`.
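A quick numpy illustration of why the scaling helps, and why the remaining threshold is roughly `finfo.max / 2` rather than `finfo.max ** 0.5` (the scaled denominator is on the order of |c| + |d|):

```python
import numpy as np

fmax = np.finfo(np.float32).max
c = d = np.float32(fmax / 2)               # still representable in float32

# naive denominator of (a+bi)/(c+di): c*c + d*d overflows long before fmax
print(np.float32(c * c + d * d))           # inf (numpy also emits an overflow warning)

# scaling by max(|c|, |d|) keeps the denominator in range (~ c + d here)
s = np.float32(max(abs(c), abs(d)))
print(np.float32(c * (c / s) + d * (d / s)))   # finite (== fmax here)
```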

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92539
Approved by: https://github.com/lezcano
2023-01-26 01:14:06 +00:00
Natalia Gimelshein
f1b78224ca Fix type promotion for 2 wrapped scalar args (#87845)
Fixes #76801

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87845
Approved by: https://github.com/SherlockNoMad, https://github.com/mruberry
2022-10-27 15:53:11 +00:00
CaoE
cca909645f Add bfloat16 support for lerp on CPU (#84327)
### Description
Add bfloat16 support for lerp on CPU

### Testing
single core:

op | shape | fp32 forward/s | bf16 forward/s | fp32 backward/s | bf16 backward/s
-- | -- | -- | -- | -- | --
lerp (tensor) | [10, 128, 10, 124] | 0.005489 | 0.000613 | 0.006658 | 0.003385
  | [10, 128, 20, 124] | 0.011057 | 0.001204 | 0.016032 | 0.007869
  | [10, 128, 30, 124] | 0.016691 | 0.001954 | 0.025549 | 0.012823
  |   |   |   |   |  
lerp (scalar) | [10, 128, 10, 124] | 0.001096 | 0.000507 | 0.002024 | 0.001479
  | [10, 128, 20, 124] | 0.00247 | 0.000997 | 0.005468 | 0.002907
  | [10, 128, 30, 124] | 0.004178 | 0.001513 | 0.009775 | 0.004859


single socket (28cores):

op | shape | fp32 forward/s | bf16 forward/s | fp32 backward/s | bf16 backward/s
-- | -- | -- | -- | -- | --
lerp (tensor) | [10, 128, 10, 124] | 0.000236 | 3.93E-05 | 0.000494 | 0.000235
  | [10, 128, 20, 124] | 0.000525 | 7.39E-05 | 0.002485 | 0.000638
  | [10, 128, 30, 124] | 0.000801 | 0.000121 | 0.004235 | 0.001529
  |   |   |   |   |  
lerp (scalar) | [10, 128, 10, 124] | 5.90E-05 | 3.32E-05 | 0.000129 | 0.000116
  | [10, 128, 20, 124] | 0.000155 | 5.87E-05 | 0.000368 | 0.000206
  | [10, 128, 30, 124] | 0.000324 | 9.04E-05 | 0.001322 | 0.000313

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84327
Approved by: https://github.com/frank-wei
2022-09-29 01:16:16 +00:00
Natalia Gimelshein
a998a8eb10 Fix segfault for out with a large number of dims (#85294)
Fixes #85166, #85167, #79218, #85251

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85294
Approved by: https://github.com/malfet
2022-09-19 22:00:43 +00:00
Mario Lezcano
88d3acd6b1 Fix and improve the efficiency of the backward of xlog* functions. (#82713)
That is, `xlogy`, `special.xlogy`, and `special.xlog1py`.

Fixes https://github.com/pytorch/pytorch/issues/80770
Fixes https://github.com/pytorch/pytorch/issues/74279
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82713
Approved by: https://github.com/albanD
2022-08-18 21:55:42 +00:00
Sergii Dymchenko
a0b3854548 Change seperate -> separate (#83056)
One instance was caught by Meta-internal "exact-word-misspell" linter in D38505529.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83056
Approved by: https://github.com/huydhn, https://github.com/seemethere
2022-08-09 23:11:34 +00:00
Peter Bell
5e3d1ef49f Allow ufunc OpInfos to have no reference (#82348)
The `ref` property was moved down from `{Unary,Binary}UfuncInfo` into
`OpInfo` quite some time ago, but `OpInfo` uses `None` to signal no
reference is available while the others use `_NOTHING`. This makes
everything consistently use `None`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82348
Approved by: https://github.com/ngimel
2022-08-09 04:38:17 +00:00
PyTorch MergeBot
814c19b266 Revert "Allow ufunc OpInfos to have no reference (#82348)"
This reverts commit 566d734396.

Reverted https://github.com/pytorch/pytorch/pull/82348 on behalf of https://github.com/peterbell10 due to This stack broke macos tests on trunk
2022-08-06 21:09:09 +00:00
Peter Bell
566d734396 Allow ufunc OpInfos to have no reference (#82348)
The `ref` property was moved down from `{Unary,Binary}UfuncInfo` into
`OpInfo` quite some time ago, but `OpInfo` uses `None` to signal no
reference is available while the others use `_NOTHING`. This makes
everything consistently use `None`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82348
Approved by: https://github.com/ngimel
2022-08-06 20:01:39 +00:00
Natalia Gimelshein
ac8c6d09d1 Don't check overflow for scalar arg in comparison ops (#78881)
That check was potentially synchronizing (people were running into the synchronization in real workloads) and mostly unneeded.
Type promotion takes care of comparisons with values that cannot be safely converted to the type of the other argument.
`comp(int_tensor, float('inf'))` now works as expected. For `comp(uint8_tensor, large_int)` the integer is silently wrapped to uint8, whereas it would error out before, but this also happens with other arithmetic ops.
Also adds a reference NumPy implementation to the comparison op OpInfos, and additional reference inputs to test the newly enabled behavior.
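Illustrating the behavior described above (expected outputs per the description; the exact uint8 result assumes the wrap-around semantics this PR introduces):

```python
import torch

int_tensor = torch.tensor([1, 2, 3])

# no overflow check / sync: type promotion handles the comparison
print(int_tensor < float('inf'))        # tensor([True, True, True])

# uint8 vs. an out-of-range Python int: the scalar wraps to uint8 (256 -> 0),
# matching other arithmetic ops, instead of raising
uint8_tensor = torch.tensor([0, 1], dtype=torch.uint8)
print(uint8_tensor == 256)              # tensor([ True, False])
```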

Fixes #76805
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78881
Approved by: https://github.com/mruberry
2022-06-05 01:43:51 +00:00
Kshiteej K
849b08f14b [reland][chalf] where(cpu and cuda), pow(cuda) (#78665)
Reland: https://github.com/pytorch/pytorch/pull/77640
Ref: #74537
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78665
Approved by: https://github.com/ngimel
2022-06-02 18:04:06 +00:00
PyTorch MergeBot
4bb8db85e9 Revert "[chalf] where(cpu and cuda), pow(cuda) (#77640)"
This reverts commit 3697cf7f76.

Reverted https://github.com/pytorch/pytorch/pull/77640 on behalf of https://github.com/mruberry due to as it broke ROCM on trunk
2022-06-01 19:39:38 +00:00
kshitij12345
3697cf7f76 [chalf] where(cpu and cuda), pow(cuda) (#77640)
Ref: #74537
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77640
Approved by: https://github.com/anjali411, https://github.com/ngimel
2022-06-01 18:35:53 +00:00
Mike Ruberry
089203f8bc Updates floor_divide to perform floor division (#78411)
Fixes https://github.com/pytorch/pytorch/issues/43874

This PR changes floor_divide to perform floor division instead of truncation division.

This is a BC-breaking change, but it's a "bug fix," and we've already warned users for several releases that this behavior would change.
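The semantic difference in one example: floor division rounds toward negative infinity, truncation rounds toward zero.

```python
import torch

a, b = torch.tensor(-7), torch.tensor(2)

print(torch.floor_divide(a, b))   # tensor(-4): floor(-3.5), the behavior after this change
print(torch.trunc(a / b))         # tensor(-3.): the old truncation behavior
```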
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78411
Approved by: https://github.com/ngimel
2022-05-29 21:28:45 +00:00
kshitij12345
efb2c093fc [fix] complex type promotion (#77524)
Fixes https://github.com/pytorch/pytorch/issues/76803

Before Fix:
```python
>>> a = torch.randn((2, 2), dtype=torch.float)
>>> b = torch.tensor(1, dtype=torch.cdouble)
>>> (a + b).dtype
torch.complex128
```

After Fix:
```python
>>> a = torch.randn((2, 2), dtype=torch.float)
>>> b = torch.tensor(1, dtype=torch.cdouble)
>>> (a + b).dtype
torch.complex64
```

**Note**: This is a BC Breaking change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77524
Approved by: https://github.com/anjali411, https://github.com/mruberry
2022-05-20 10:23:56 +00:00
Xiang Gao
f274558018 Bitwise ops improvements (#77621)
- Bitwise shift: remove floating point support
- Bitwise and, or, xor: add a (scalar, tensor) overload
- Use `test_ops.py` to test these ops, including error cases
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77621
Approved by: https://github.com/ngimel
2022-05-17 21:16:42 +00:00
Andrew M. James
39d3a7ffe5 Connect Tensor.__ipow__ to pow_ method
The `pow_` method should be connected to `Tensor.__ipow__` so that the
operator `**=` works correctly.
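With `__ipow__` wired up, in-place exponentiation works as expected:

```python
import torch

t = torch.arange(1.0, 4.0)   # tensor([1., 2., 3.])
t **= 2                      # now dispatches to the in-place pow_
print(t)                     # tensor([1., 4., 9.])
```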

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76900

Approved by: https://github.com/mruberry
2022-05-15 15:06:30 +00:00
Jiayi Sun
cdf9572c52 add BFloat16 operators on CPU: histc, atan2 (#72695)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72695
Approved by: https://github.com/VitalyFedyunin, https://github.com/frank-wei
2022-05-12 17:45:08 +00:00
Xiang Gao
cc9d0f309e lshift and rshift stop support floating types (#77146)
Fixes #74358

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77146
Approved by: https://github.com/ngimel
2022-05-11 22:29:30 +00:00
Mike Ruberry
c031643e39 Adds decorators for Python References and extends Python Reference testing (#76945)
This PR does the following...

Tests:
- fixes test_type_promotion in test_binary_ufuncs to correctly generate scalar cpu tensors
- fixes test_python_reference_consistency to use the Python Reference's reference inputs
- extends Python reference testing to test_conj_view, test_neg_view, and test_neg_conj_view
- adds a NaN propagation sample input for elementwise unary and binary operations
- fixes the UnaryUfuncInfo class to properly register its reference inputs
- Updates the Python Reference OpInfos to skip error inputs when their behavior on scalar inputs is inconsistent with their reference operators

Code organization:
- moves elementwise type promotion functionality to prims.utils

Prims & Refs:
- fixes scalar cpu tensor handling by having them pass through broadcasting and device and shape checks
- adds two decorators, `elementwise_type_promotion_wrapper` and `out_wrapper`: the former automates elementwise type promotion, and the latter automatically adds the out kwarg and handles it properly (a simplified sketch of the idea follows below)
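For readers unfamiliar with the pattern, here is a hypothetical, much-simplified sketch of what an out-handling decorator in this spirit could look like (decorator and function names are illustrative; this is not PyTorch's actual implementation):

```python
import functools
import torch

def out_wrapper_sketch(fn):
    """Hypothetical sketch: add an `out=` kwarg to a reference that lacks one."""
    @functools.wraps(fn)
    def wrapper(*args, out=None, **kwargs):
        result = fn(*args, **kwargs)
        if out is None:
            return result
        # a real implementation would also validate dtype/device before writing
        out.resize_(result.shape)
        out.copy_(result)
        return out
    return wrapper

@out_wrapper_sketch
def add_ref(a, b):                 # a toy reference with no out= support
    return torch.add(a, b)

buf = torch.empty(3)
add_ref(torch.ones(3), torch.ones(3), out=buf)   # writes [2., 2., 2.] into buf
```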

cc @ezyang who also had some thoughts on cpu scalar tensor handling
cc @chillee -- might want to use this new decorator as we converge decompositions and references
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76945
Approved by: https://github.com/ngimel
2022-05-07 03:42:24 +00:00
Mike Ruberry
b557e102d8 Fixes prim type promotion and updates type promotion testing
This PR fixes prim elementwise type promotion, tests elementwise binary references using `test_type_promotion` in the elementwise binary test suite, and updates that test with additional cases for float x complex and scalar type promotion.

The following issues were discovered while working on this PR:

- https://github.com/pytorch/pytorch/issues/76806
- https://github.com/pytorch/pytorch/issues/76805
- https://github.com/pytorch/pytorch/issues/76804
- https://github.com/pytorch/pytorch/issues/76803
- https://github.com/pytorch/pytorch/issues/76801

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76809
Approved by: https://github.com/ngimel
2022-05-04 17:58:10 +00:00
Mike Ruberry
f6bbecf8b5 Adds python ref consistency test, elementwise unary reference inputs, and formats test files
Per title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76626
Approved by: https://github.com/ngimel
2022-05-01 22:42:46 +00:00
kshitij12345
1f1d0b30ce [fix] mul chalf: cpu scalar and cuda tensor case
https://github.com/pytorch/pytorch/pull/76158 incorrectly added `chalf` support for `mul` on CUDA.

The following sample
```python
torch.mul(torch.zeros(3, device='cuda'), 2.5) # CUDA Tensor and CPU Scalar
```

fails with
```
RuntimeError: iter.device(arg).is_cuda() INTERNAL ASSERT FAILED at "../aten/src/ATen/native/cuda/JitLoops.cuh":83, please report a bug to PyTorch. argument 2: expected a CUDA device but found cpu
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76364
Approved by: https://github.com/mruberry
2022-04-30 06:31:52 +00:00
Mike Ruberry
4048d4cdd2 [primTorch] Prototype tracer and elementwise unary reference opinfo class
Adds a prototype tracer with no caching support and the `ElementwiseUnaryPythonRefInfo` class. A reference for `floor` is added to test the latter, and the elementwise binary reference inputs are extended to also return noncontiguous inputs. The SampleInput transform operation has been updated to return an actual SampleInput instead of a tuple to facilitate uniform handling of (transformed) SampleInputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76388
Approved by: https://github.com/ngimel
2022-04-27 14:40:21 +00:00
Jeff Daily
24d06182f6 [ROCm] unskip test_fmod_remainder_by_zero_integral
The test is updated to check for ROCm-specific undefined behavior for
integral fmod division by zero.

Fixes #48130.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76119
Approved by: https://github.com/mruberry
2022-04-22 17:34:36 +00:00
Natalia Gimelshein
2f9a239ae9 do intermediate math in opmath_t in lerp
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76062
Approved by: https://github.com/mruberry
2022-04-20 04:05:55 +00:00
Mike Ruberry
de949a0e59 Various OpInfo architecture improvements
This PR makes the following improvements:

- moves the custom skip list for test_normalize_operator_exhaustive in test_fx_experimental to use the typical OpInfo skip architecture. The skips were updated to xfails, and that identified some operators which were no longer failing the test
- redundant tests with OpInfo-based testing in test_jit.py were removed
- test_dtypes was improved so its error messages are clear and it makes test_nondifferentiable redundant; the latter test has been removed
- OpInfo.supports_complex_autograd() is removed in favor of a more accurate and general test for whether the particular dtype is in the backward dtypes of the operator
- gradchecks have been improved to verify that an operator doesn't support grad if it claims not to
- gradchecks have been improved to test the gradient of all input tensors that require gradient
- the concept of "default test dtypes" has been removed
- excessive and mostly redundant out testing for elementwise unary operators has been removed
- metadata for whether an op supports nuanced "safe casting" to out behavior has been removed from OpInfos
- numerous skips have been converted to xfails
- numerous OpInfos have had their metadata fixed based on the new checks
- jit-specific utilities in common_methods_invocations.py have been moved to jit_programming_utils.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75951
Approved by: https://github.com/ngimel
2022-04-18 21:55:32 +00:00
Mike Ruberry
b09769992f Improves the OpInfo out= tests
Edit: OpInfos separated into their own PRs to debug an ASAN failure that doesn't identify the failing test properly. This PR now just updates the out tests.

Adds OpInfos for:

- nn.functional.smooth_l1_loss
- nn.functional.l1_loss
- nn.functional.pdist
- nn.functional.binary_cross_entropy
- nn.functional.triplet_margin_loss
- nn.functional.triplet_margin_with_distance_loss
- nn.functional.max_unpool{1, 2, 3}D
- nn.functional.alpha_dropout
- nn.functional.soft_margin_loss
- nn.functional.multilabel_soft_margin_loss
- nn.functional.multilabel_margin_loss
- nn.functional.multi_margin_loss
- nn.functional.margin_ranking_loss

These OpInfos were taken from https://github.com/pytorch/pytorch/pull/67560, https://github.com/pytorch/pytorch/pull/67823, https://github.com/pytorch/pytorch/pull/68625, and https://github.com/pytorch/pytorch/pull/67079. The sample input update from https://github.com/pytorch/pytorch/pull/67017 is also rolled into this PR.

cc @zou3519 @nikitaved @pmeier @vfdev-5 @dagitses
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75782
Approved by: https://github.com/ngimel
2022-04-15 06:16:01 +00:00
Richard Zou
791321e44d Fix minimum, maximum forward-ad formula for float32
fixes https://github.com/pytorch/pytorch/issues/69034

Test Plan:
- new tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75277
Approved by: https://github.com/soulitzer
2022-04-07 13:58:39 +00:00
Nikita Shulga
bfac65dfe5 [testing] Update dispatch macros (#74977)
This PR is a reland of #74289.
Co-authored-by: Khushi Agrawal <khushiagrawal411@gmail.com>
2022-03-30 14:13:21 -07:00
PyTorch MergeBot
2e4152b118 Revert "[testing] Update dispatch macros"
This reverts commit eed19a0f38.

Reverted https://github.com/pytorch/pytorch/pull/74289 on behalf of https://github.com/malfet
2022-03-30 19:52:37 +00:00
Khushi Agrawal
eed19a0f38 [testing] Update dispatch macros
Hi,
This PR is the follow-up to #71561 (the previous PR had a couple of merge conflicts and was reverted; this PR resolves that).
Please take a look. Thanks!

cc: @pmeier @mruberry @kshitij12345
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74289
Approved by: https://github.com/pmeier, https://github.com/mruberry
2022-03-30 16:10:16 +00:00
Mike Ruberry
0aa3c39e5f Extends OpInfo architecture with reference inputs, adds them for elementwise binary operators
This PR extends our OpInfo test architecture with "reference inputs," an optional expansion of typical sample inputs that allows for more thorough testing. Currently only the elementwise binary operations implement an extended set of reference inputs. This PR also cleans up some smaller OpInfo-related issues, including several bugs, and it identified https://github.com/pytorch/pytorch/issues/74279.

A reference inputs function can be specified for an OpInfo by filling in its "reference_inputs_func" metadata. If this is done it's recommended that the reference inputs function first call the sample inputs function, then produce additional sample inputs. See `reference_inputs_elementwise_binary` for an example of this pattern.
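A hypothetical sketch of that pattern (op name, shapes, and the internal `SampleInput` helper import are illustrative, not taken from the PR):

```python
import torch
from torch.testing._internal.common_methods_invocations import SampleInput

def sample_inputs_foo(op_info, device, dtype, requires_grad, **kwargs):
    # the op's ordinary sample inputs (placeholder shapes)
    yield SampleInput(torch.randn(3, 3, device=device, dtype=dtype,
                                  requires_grad=requires_grad))

def reference_inputs_foo(op_info, device, dtype, requires_grad, **kwargs):
    # start from the regular samples...
    yield from sample_inputs_foo(op_info, device, dtype, requires_grad, **kwargs)
    # ...then add extra cases only exercised by reference testing
    yield SampleInput(torch.randn(0, device=device, dtype=dtype))       # empty tensor
    yield SampleInput(torch.randn(5, 1, device=device, dtype=dtype)
                      .expand(5, 7))                                    # noncontiguous view

# these would then be wired into the OpInfo via
#   sample_inputs_func=sample_inputs_foo, reference_inputs_func=reference_inputs_foo
```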

In addition to implementing reference inputs for the elementwise binary operations, this PR improves their consistency and simplifies how their metadata is represented. The great majority now use a generic sample input function, and those that want extensions start by calling the generic sample input function and then adding additional samples. This removes many older sample input functions. The BinaryUfuncInfo subclass also now allows specifying scalar support more precisely, and reference inputs and error inputs are generated based on this metadata to ensure it's correct.

cc @kshitij12345 @pmeier @zou3519 @Chillee

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74280
Approved by: https://github.com/ngimel
2022-03-21 03:24:16 +00:00
Nikita Shulga
ef066f0832 Revert D34856571: [pytorch][PR] Replace get_all_ type macros with the ATen dispatch macros.
Test Plan: revert-hammer

Differential Revision:
D34856571 (3ded7b1da3)

Original commit changeset: 0dca038bcad5

Original Phabricator Diff: D34856571 (3ded7b1da3)

fbshipit-source-id: 594553fa0b710d78beba59d5d2b646f1f1270386
(cherry picked from commit 8090eb9b12dcf452a9e7dc01792a66fb91b563b6)
2022-03-15 22:07:11 +00:00
Khushi Agrawal
3ded7b1da3 Replace get_all_ type macros with the ATen dispatch macros. (#71561)
Summary:
Hi, Team!
The PR is motivated from https://github.com/pytorch/pytorch/pull/71153#discussion_r782446738. It aims to replace `get_all` type macros with the ATen dispatch macros.

The files it iterates over are: (Thanks, Lezcano, for the idea!!)

<details>
<summary>

`test/test_autograd.py`</summary>

<p>

```python
43:from torch.testing._internal.common_dtype import get_all_dtypes
8506:        floating_dt = [dt for dt in get_all_dtypes() if dt.is_floating_point]
```

</p>
</details>

<details>
<summary>

`test/test_binary_ufuncs.py`</summary>

<p>

```python
26:    all_types_and_complex_and, integral_types_and, get_all_dtypes, get_all_int_dtypes, get_all_math_dtypes,
27:    get_all_complex_dtypes, get_all_fp_dtypes,
935:    dtypes(*get_all_dtypes(include_bool=False, include_complex=False))
1035:    dtypes(*get_all_dtypes(
1488:    dtypes(*(get_all_dtypes(include_bool=False, include_bfloat16=False)))
1879:    dtypes(*product(get_all_dtypes(include_complex=False), get_all_dtypes(include_complex=False)))
1887:    dtypes(*(get_all_int_dtypes() + [torch.bool]))
1913:    dtypes(*(get_all_fp_dtypes()))
1941:    dtypes(*(get_all_fp_dtypes()))
1977:    dtypes(*product(get_all_complex_dtypes(), get_all_dtypes()))
2019:    dtypes(*product(get_all_fp_dtypes(), get_all_fp_dtypes()))
2048:    dtypes(*get_all_dtypes())
2110:    dtypes(*product(get_all_dtypes(include_complex=False),
2111:                     get_all_dtypes(include_complex=False)))
2128:            types = [torch.bool, torch.bfloat16] + get_all_int_dtypes()
2173:        if dtypes[1] in get_all_fp_dtypes():
2178:    dtypes(*product(get_all_fp_dtypes(),
2179:                     get_all_fp_dtypes()))
2260:    dtypesIfCUDA(*set(get_all_math_dtypes('cuda')) - {torch.complex64, torch.complex128})
2261:    dtypes(*set(get_all_math_dtypes('cpu')) - {torch.complex64, torch.complex128})
2273:    dtypesIfCUDA(*set(get_all_math_dtypes('cuda')) - {torch.complex64, torch.complex128})
2274:    dtypes(*set(get_all_math_dtypes('cpu')) - {torch.complex64, torch.complex128})
2307:    dtypes(*get_all_math_dtypes('cpu'))
2319:    dtypes(*get_all_fp_dtypes(include_bfloat16=False))
2331:    dtypes(*get_all_int_dtypes())
2356:    dtypes(*get_all_dtypes(include_bfloat16=False, include_bool=False, include_complex=False))
2393:        if dtype in get_all_int_dtypes():
2614:    dtypes(*get_all_dtypes())
2624:    dtypes(*tuple(itertools.combinations_with_replacement(get_all_dtypes(), 2)))
2806:    dtypes(*list(product(get_all_dtypes(include_complex=False),
2807:                          get_all_dtypes(include_complex=False))))
2866:    dtypes(*list(product(get_all_complex_dtypes(),
2867:                          get_all_complex_dtypes())))
2902:    dtypes(*product(get_all_dtypes(), get_all_dtypes()))
2906:    dtypes(*product(get_all_dtypes(), get_all_dtypes()))
2910:    dtypes(*product(get_all_dtypes(), get_all_dtypes()))
3019:        dtypes = [torch.float, torch.double] + get_all_complex_dtypes()
3221:    dtypes(*get_all_dtypes(include_complex=False))
3407:    dtypes(*list(product(get_all_dtypes(include_bool=False),
3408:                          get_all_dtypes(include_bool=False))))
3504:    dtypes(*product(get_all_dtypes(include_complex=False, include_bfloat16=False),
3505:                     get_all_dtypes(include_complex=False, include_bfloat16=False)))
3516:            if x.dtype in get_all_int_dtypes() + [torch.bool]:
3643:    dtypes(*product(get_all_dtypes(include_complex=False,
3645:                     get_all_dtypes(include_complex=False,
```

</p>
</details>

<details>
<summary>

`test/test_complex.py`</summary>

<p>

```python
6:from torch.testing._internal.common_dtype import get_all_complex_dtypes
11:    dtypes(*get_all_complex_dtypes())
```

</p>
</details>

<details>
<summary>

`test/test_foreach.py`</summary>

<p>

```python
18:    get_all_dtypes, get_all_int_dtypes, get_all_complex_dtypes, get_all_fp_dtypes,
142:            if dtype in get_all_int_dtypes():
179:            disable_fastpath = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool]
201:            disable_fastpath = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool]
205:                disable_fastpath |= dtype in get_all_int_dtypes() + [torch.bool]
211:                disable_fastpath |= dtype not in get_all_complex_dtypes()
241:                bool_int_div = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool]
246:                    disable_fastpath |= dtype in get_all_int_dtypes() + [torch.bool]
248:                    disable_fastpath |= dtype not in get_all_complex_dtypes()
250:                    disable_fastpath |= True and dtype not in get_all_complex_dtypes()
307:        disable_fastpath = dtype in get_all_int_dtypes() + [torch.bool]
365:        if opinfo.name == "_foreach_abs" and dtype in get_all_complex_dtypes():
376:    ops(foreach_unary_op_db, dtypes=get_all_dtypes())
393:         dtypes=get_all_dtypes(include_half=True, include_bfloat16=True, include_complex=False))
401:    ops(foreach_minmax_op_db, dtypes=get_all_fp_dtypes(include_bfloat16=True, include_half=True))
426:            if ord in (1, 2) and dtype in torch.testing.get_all_fp_dtypes():
439:    dtypes(*get_all_dtypes())
449:    ops(foreach_binary_op_db, dtypes=get_all_dtypes())
481:    ops(foreach_binary_op_db, dtypes=get_all_dtypes())
536:            if dtype in get_all_int_dtypes() + [torch.bool] and foreach_op == torch._foreach_div:
545:    ops(foreach_binary_op_db, dtypes=get_all_dtypes())
637:    ops(foreach_pointwise_op_db, allowed_dtypes=get_all_fp_dtypes(include_half=False, include_bfloat16=False))
```

</p>
</details>

<details>
<summary>

`test/test_linalg.py`</summary>

<p>

```python
29:    all_types, floating_types, floating_and_complex_types, get_all_dtypes, get_all_int_dtypes, get_all_complex_dtypes,
30:    get_all_fp_dtypes,
111:    dtypes(*(get_all_dtypes()))
794:        float_and_complex_dtypes = get_all_fp_dtypes() + get_all_complex_dtypes()
807:    dtypes(*(get_all_int_dtypes()))
828:    dtypes(*(get_all_fp_dtypes() + get_all_complex_dtypes()))
841:        if dtype in get_all_complex_dtypes():
844:    dtypes(*itertools.product(get_all_dtypes(),
845:                               get_all_dtypes()))
855:        for dtypes0, dtypes1, dtypes2 in product(get_all_dtypes(), repeat=3):
5607:                  *get_all_fp_dtypes(include_half=not CUDA9, include_bfloat16=(CUDA11OrLater and SM53OrLater)))
5608:    dtypes(*(set(get_all_dtypes()) - {torch.half, torch.bool}))
5644:    dtypes(*(get_all_complex_dtypes() + get_all_fp_dtypes()))
6255:    dtypesIfCUDA(*get_all_complex_dtypes(),
6256:                  *get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater)),
6292:    dtypesIfCUDA(*get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater))))
6323:    dtypesIfCUDA(*get_all_complex_dtypes(),
6324:                  *get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater))))
6325:    dtypes(*get_all_complex_dtypes(), *get_all_fp_dtypes())
6358:    dtypesIfCUDA(*([torch.float, torch.double] + get_all_complex_dtypes()))
6556:    dtypes(*get_all_fp_dtypes(), *get_all_complex_dtypes())
6668:    dtypes(*get_all_fp_dtypes(), *get_all_complex_dtypes())
6741:    dtypes(*get_all_fp_dtypes(), *get_all_complex_dtypes())
```

</p>
</details>

<details>
<summary>

`test/test_nn.py`</summary>

<p>

```python
37:from torch.testing._internal.common_dtype import integral_types, get_all_fp_dtypes, get_all_math_dtypes
50:    onlyNativeDeviceTypes, deviceCountAtLeast, largeTensorTest, expectedFailureMeta, skipMeta, get_all_device_types, \
8862:                for device in get_all_device_types():
9629:            for dt1 in get_all_math_dtypes(device):
9630:                for dt2 in get_all_math_dtypes(device):
9631:                    for dt3 in get_all_math_dtypes(device):
9648:            for input_dtype in get_all_math_dtypes(device):
9664:            for input_dtype in get_all_math_dtypes(device):
13015:    dtypes(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM))
13034:    dtypes(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM))
13159:    dtypes(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM))
17400:    dtypesIfCUDA(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM))
17768:    dtypesIfCUDA(*get_all_fp_dtypes())
17773:    dtypesIfCUDA(*get_all_fp_dtypes())
17778:    dtypesIfCUDA(*get_all_fp_dtypes())
17783:    dtypesIfCUDA(*get_all_fp_dtypes())
17788:    dtypesIfCUDA(*get_all_fp_dtypes())
17793:    dtypesIfCUDA(*get_all_fp_dtypes())
17798:    dtypesIfCUDA(*get_all_fp_dtypes())
17963:    dtypesIfCUDA(*get_all_fp_dtypes())
17977:    dtypesIfCUDA(*get_all_fp_dtypes())
18684:    def test_cross_entropy_loss_prob_target_all_reductions(self, device):
```

</p>
</details>

<details>
<summary>

`test/test_numpy_interop.py`</summary>

<p>

```python
12:from torch.testing._internal.common_dtype import get_all_dtypes
399:    dtypes(*get_all_dtypes())
```

</p>
</details>

<details>
<summary>

`test/test_ops.py`</summary>

<p>

```python
12:from torch.testing._internal.common_dtype import floating_and_complex_types_and, get_all_dtypes
86:        for dtype in get_all_dtypes():
```

</p>
</details>

<details>
<summary>

`test/test_reductions.py`</summary>

<p>

```python
16:    get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_complex_dtypes, get_all_fp_dtypes,
360:         allowed_dtypes=get_all_dtypes(include_bfloat16=False))
366:         allowed_dtypes=get_all_dtypes(include_bfloat16=False))
394:         allowed_dtypes=get_all_dtypes(include_bfloat16=False))
750:        for dtype in [dtype for dtype in get_all_math_dtypes('cpu') if dtype != torch.float16]:
1404:    dtypes(*get_all_dtypes(include_bool=False, include_complex=False))
1457:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1458:              get_all_complex_dtypes()))
1465:            return dtype in get_all_int_dtypes()
1494:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False)))
1501:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False)))
1507:    dtypes(*(get_all_complex_dtypes()))
1514:        dtypes = list(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False))
1523:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False)))
1531:        if dtype in get_all_fp_dtypes():
1608:    dtypes(*(get_all_dtypes(include_half=True, include_bfloat16=False,
1837:    dtypes(*get_all_dtypes(include_bool=False, include_complex=False))
1855:    dtypes(*(set(get_all_dtypes(include_bool=False, include_complex=False)) - {torch.uint8}))
3219:        for dtype in get_all_dtypes(include_half=True, include_bfloat16=False,
```

</p>
</details>

<details>
<summary>

`test/test_serialization.py`</summary>

<p>

```python
26:from torch.testing._internal.common_dtype import get_all_dtypes
586:        for device, dtype in product(devices, get_all_dtypes()):
589:            for other_dtype in get_all_dtypes():
```

</p>
</details>

<details>
<summary>

`test/test_shape_ops.py`</summary>

<p>

```python
18:from torch.testing._internal.common_dtype import get_all_dtypes
230:    dtypes(*get_all_dtypes(include_complex=False, include_bool=False, include_half=False,
232:    dtypesIfCUDA(*get_all_dtypes(include_complex=False, include_bool=False, include_bfloat16=False))
344:    dtypes(*get_all_dtypes())
443:    dtypes(*get_all_dtypes())
461:    dtypes(*get_all_dtypes())
570:    dtypes(*get_all_dtypes(include_complex=False))
```

</p>
</details>

<details>
<summary>

`test/test_sort_and_select.py`</summary>

<p>

```python
12:    all_types, all_types_and, floating_types_and, get_all_dtypes, get_all_int_dtypes, get_all_fp_dtypes,
136:    dtypes(*set(get_all_dtypes()) - {torch.bool, torch.complex64, torch.complex128})
231:    dtypes(*set(get_all_dtypes()) - {torch.bool, torch.complex64, torch.complex128})
296:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
647:    dtypesIfCUDA(*get_all_fp_dtypes())
678:    dtypesIfCUDA(*(get_all_dtypes(include_complex=False,
682:    dtypes(*(get_all_dtypes(include_complex=False, include_bool=False, include_half=False, include_bfloat16=False)))
739:    dtypesIfCPU(*set(get_all_dtypes()) - {torch.complex64, torch.complex128})
740:    dtypes(*set(get_all_dtypes()) - {torch.bfloat16, torch.complex64, torch.complex128})
799:    dtypesIfCPU(*set(get_all_dtypes()) - {torch.complex64, torch.complex128})
800:    dtypes(*set(get_all_dtypes()) - {torch.bfloat16, torch.complex64, torch.complex128})
```

</p>
</details>

<details>
<summary>

`test/test_sparse.py`</summary>

<p>

```python
20:from torch.testing import get_all_complex_dtypes, get_all_fp_dtypes
29:    floating_and_complex_types, floating_and_complex_types_and, get_all_dtypes, get_all_int_dtypes,
1963:            return dtype in get_all_int_dtypes()
1994:    dtypes(*get_all_dtypes(include_bool=False, include_half=False,
2103:            return dtype in get_all_int_dtypes()
2138:    dtypes(*get_all_dtypes(include_bool=False, include_half=False,
2626:        all_sparse_dtypes = get_all_dtypes(include_complex=True)
2633:        all_sparse_dtypes = get_all_dtypes(include_complex=True)
3230:    dtypes(*get_all_complex_dtypes(),
3231:            *get_all_fp_dtypes(include_half=False, include_bfloat16=False))
3234:                  *get_all_fp_dtypes(
```

</p>
</details>

<details>
<summary>

`test/test_sparse_csr.py`</summary>

<p>

```python
7:from torch.testing import get_all_complex_dtypes, get_all_fp_dtypes, floating_and_complex_types, make_tensor
17:from torch.testing._internal.common_dtype import floating_types, get_all_dtypes
120:    dtypes(*get_all_dtypes())
133:    dtypes(*get_all_dtypes())
150:    dtypes(*get_all_dtypes())
180:    dtypes(*get_all_dtypes())
201:    dtypes(*get_all_dtypes())
210:    dtypes(*get_all_dtypes())
225:    dtypes(*get_all_dtypes())
244:    dtypes(*get_all_dtypes())
263:    dtypes(*get_all_dtypes())
285:    dtypes(*get_all_dtypes())
411:    dtypes(*get_all_dtypes())
482:    dtypes(*get_all_dtypes())
502:    dtypes(*get_all_dtypes())
562:    dtypes(*get_all_dtypes())
588:    dtypesIfCUDA(*get_all_complex_dtypes(),
589:                  *get_all_fp_dtypes(include_half=SM53OrLater, include_bfloat16=SM80OrLater))
745:    dtypesIfCUDA(*get_all_complex_dtypes(),
746:                  *get_all_fp_dtypes(include_half=SM53OrLater and TEST_CUSPARSE_GENERIC,
765:    dtypesIfCUDA(*get_all_complex_dtypes(),
766:                  *get_all_fp_dtypes(include_half=SM53OrLater and TEST_CUSPARSE_GENERIC,
801:                  *torch.testing.get_all_fp_dtypes(include_bfloat16=SM80OrLater,
841:                  *torch.testing.get_all_fp_dtypes(include_bfloat16=SM80OrLater,
1182:    dtypes(*get_all_dtypes())
1276:    dtypes(*get_all_dtypes(include_bool=False, include_half=False, include_bfloat16=False))
1286:    dtypes(*get_all_dtypes())
```

</p>
</details>

<details>
<summary>

`test/test_tensor_creation_ops.py`</summary>

<p>

```python
21:    onlyCUDA, skipCPUIf, dtypesIfCUDA, skipMeta, get_all_device_types)
23:    get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes
150:        for dt in get_all_dtypes():
160:        for dt in get_all_dtypes():
314:        dtypes = [dtype for dtype in get_all_dtypes() if dtype != torch.bfloat16]
1012:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1013:              get_all_complex_dtypes()))
1032:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1033:              get_all_complex_dtypes()))
1050:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1051:              get_all_complex_dtypes()))
1745:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
1779:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
1868:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
1926:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
1954:            do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, torch_device)
1956:            do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, None)
1957:            do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, torch_device)
2538:        for device in get_all_device_types():
2645:        for dtype in get_all_dtypes():
2678:    dtypes(*(get_all_fp_dtypes(include_half=False, include_bfloat16=False) +
2679:              get_all_complex_dtypes()))
2716:    dtypes(*get_all_fp_dtypes(include_half=False, include_bfloat16=False))
2827:            for dt in get_all_dtypes():
2913:    dtypes(*get_all_dtypes(include_bool=False, include_half=False))
2914:    dtypesIfCUDA(*get_all_dtypes(include_bool=False, include_half=True))
3028:    dtypes(*(get_all_fp_dtypes() + get_all_complex_dtypes()))
3033:    dtypes(*(get_all_fp_dtypes() + get_all_complex_dtypes()))
3074:    dtypes(*get_all_dtypes(include_bool=False, include_half=False, include_complex=False))
3075:    dtypesIfCUDA(*((get_all_int_dtypes() + [torch.float32, torch.float16, torch.bfloat16])
3077:                    else get_all_dtypes(include_bool=False, include_half=True, include_complex=False)))
3873:    dtypes(*get_all_dtypes())
3884:    dtypes(*get_all_dtypes(include_bool=False))
3916:            for other in get_all_dtypes():
3922:    dtypes(*get_all_dtypes())
3932:    dtypes(*get_all_dtypes(include_bool=False))
3955:    dtypes(*get_all_dtypes(include_bool=False))
3961:    dtypes(*get_all_dtypes(include_bool=False))
3965:    dtypes(*get_all_dtypes())
```

</p>
</details>

<details>
<summary>

`test/test_testing.py`</summary>

<p>

```python
25:from torch.testing._internal.common_dtype import get_all_dtypes
31:    dtypes(*(get_all_dtypes(include_half=True, include_bfloat16=False,
```

</p>
</details>

<details>
<summary>

`test/test_torch.py`</summary>

<p>

```python
51:    expectedAlertNondeterministic, get_all_device_types, skipXLA)
57:    get_all_fp_dtypes, get_all_int_dtypes, get_all_math_dtypes, get_all_dtypes, get_all_complex_dtypes
296:            for d in get_all_device_types():
323:            for device in get_all_device_types():
324:                for dt1 in get_all_dtypes():
325:                    for dt2 in get_all_dtypes():
343:            all_dtypes = get_all_dtypes()
350:            all_dtypes = get_all_dtypes()
781:            for dtype in get_all_dtypes():
986:            for device in get_all_device_types():
1017:            for device in get_all_device_types():
1018:                for dtype in get_all_math_dtypes(device):
2792:            for device in get_all_device_types():
3186:    dtypes(*get_all_dtypes())
3195:        for error_dtype in get_all_dtypes():
3203:    dtypes(*get_all_dtypes())
3212:        for error_dtype in get_all_dtypes():
4539:    dtypes(*get_all_fp_dtypes())
4545:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
4577:    dtypes(*get_all_fp_dtypes(include_half=False, include_bfloat16=False))
4578:    dtypesIfCPU(*(get_all_fp_dtypes(include_half=False, include_bfloat16=True)))
4579:    dtypesIfCUDA(*(get_all_fp_dtypes(include_bfloat16=False)))
4599:    dtypes(*(get_all_fp_dtypes(include_half=False, include_bfloat16=False)))
4600:    dtypesIfCPU(*(get_all_dtypes(include_half=False, include_bfloat16=False, include_complex=False)))
4601:    dtypesIfCUDA(*(get_all_dtypes(include_bfloat16=False, include_complex=False)))
4613:        for p_dtype in get_all_fp_dtypes(include_half=device.startswith('cuda'), include_bfloat16=False):
4628:    dtypes(*(get_all_fp_dtypes(include_half=False, include_bfloat16=False)))
4629:    dtypesIfCUDA(*(get_all_fp_dtypes(include_bfloat16=False)))
4640:    dtypes(*get_all_fp_dtypes())
4723:    dtypes(*get_all_fp_dtypes())
4735:    dtypes(*get_all_fp_dtypes(include_bfloat16=False))
4736:    dtypesIfCUDA(*get_all_fp_dtypes())
4747:    dtypes(*get_all_fp_dtypes())
4761:    dtypes(*get_all_fp_dtypes())
4771:    dtypes(*get_all_fp_dtypes())
4792:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
5302:    dtypes(*get_all_dtypes(include_bfloat16=False))
5322:    dtypes(*get_all_dtypes(include_half=False, include_bfloat16=False))
5323:    dtypesIfCPU(*get_all_dtypes(include_bfloat16=False))
5324:    dtypesIfCUDA(*get_all_dtypes(include_bfloat16=False))
5591:        for dt in get_all_dtypes():
5611:        for dt in get_all_dtypes():
5678:        for dt in get_all_dtypes():
5696:    dtypesIfCUDA(*set(get_all_math_dtypes('cuda')))
5697:    dtypes(*set(get_all_math_dtypes('cpu')))
5746:    dtypes(*get_all_dtypes())
5780:    dtypes(*get_all_dtypes())
5885:    dtypes(*get_all_dtypes())
5902:    dtypes(*get_all_dtypes())
5945:    dtypes(*get_all_dtypes())
5979:    dtypes(*get_all_dtypes(include_bool=False))
6049:    dtypes(*get_all_dtypes(include_bool=False))
6092:    dtypes(*(get_all_fp_dtypes(include_bfloat16=False, include_half=False) +
6093:              get_all_complex_dtypes()))
6094:    dtypesIfCPU(*get_all_dtypes())
6095:    dtypesIfCUDA(*get_all_dtypes())
6122:    dtypes(*(get_all_fp_dtypes(include_bfloat16=False, include_half=False) +
6123:              get_all_complex_dtypes()))
6124:    dtypesIfCPU(*get_all_dtypes())
6125:    dtypesIfCUDA(*get_all_dtypes())
6163:    dtypes(*(get_all_fp_dtypes(include_bfloat16=False, include_half=False) +
6164:              get_all_complex_dtypes()))
6165:    dtypesIfCPU(*get_all_dtypes())
6166:    dtypesIfCUDA(*get_all_dtypes())
6190:    dtypes(*(get_all_complex_dtypes() +
6191:              get_all_int_dtypes()))
6238:    dtypes(*get_all_dtypes())
6323:    dtypes(*get_all_dtypes())
6389:    dtypes(*product(get_all_dtypes(), (torch.uint8, torch.bool)))
6699:    dtypesIfCUDA(*set(get_all_math_dtypes('cuda')))
6700:    dtypes(*set(get_all_math_dtypes('cpu')))
7452:    dtypes(*get_all_dtypes(include_bool=False))
7461:    dtypes(*get_all_dtypes(include_bool=False))
7477:    dtypes(*get_all_dtypes(include_bool=False))
7496:    dtypes(*get_all_dtypes(include_bool=False))
7538:    dtypes(*get_all_dtypes(include_bool=False))
8162:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes() +
8163:              get_all_complex_dtypes()))
8175:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes() +
8176:              get_all_complex_dtypes()))
```

</p>
</details>

<details>
<summary>

`test/test_type_promotion.py`</summary>

<p>

```python
14:    get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_fp_dtypes
187:        for dtype in get_all_dtypes():
262:        dtypes1 = get_all_math_dtypes('cuda')
263:        dtypes2 = get_all_math_dtypes(device)
339:    dtypes(*itertools.product(get_all_dtypes(), get_all_dtypes()))
468:            for dt1 in get_all_math_dtypes(device):
469:                for dt2 in get_all_math_dtypes(device):
519:            for dt1 in get_all_math_dtypes(device):
520:                for dt2 in get_all_math_dtypes(device):
528:        for dt in get_all_math_dtypes(device):
561:        for dtype in get_all_dtypes():
766:                                          dtypes=get_all_math_dtypes(device))
771:                                          dtypes=get_all_math_dtypes(device))
782:                                          dtypes=get_all_math_dtypes(device))
879:        dtypes = get_all_dtypes(include_bfloat16=False)
898:        dtypes = get_all_dtypes(include_bfloat16=False, include_bool=False)
965:    dtypesIfCUDA(*itertools.product(get_all_dtypes(include_bfloat16=False, include_complex=False),
966:                                     get_all_dtypes(include_bfloat16=False, include_complex=False)))
967:    dtypes(*itertools.product(get_all_dtypes(include_half=False, include_bfloat16=False,
969:                               get_all_dtypes(include_half=False, include_bfloat16=False,
976:            return dtype in get_all_int_dtypes() + [torch.bool]
979:            return dtype in get_all_fp_dtypes(include_half=True, include_bfloat16=False)
```

</p>
</details>

<details>
<summary>

`test/test_unary_ufuncs.py`</summary>

<p>

```python
24:    floating_types_and, all_types_and_complex_and, floating_and_complex_types_and, get_all_dtypes, get_all_math_dtypes,
25:    get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes
517:    dtypes(*(get_all_int_dtypes() + [torch.bool] +
518:              get_all_fp_dtypes(include_bfloat16=False)))
596:    dtypes(*get_all_fp_dtypes(include_half=True, include_bfloat16=False))
611:        invalid_input_dtypes = get_all_int_dtypes() + \
612:            get_all_complex_dtypes() + \
619:        for dtype in get_all_fp_dtypes(include_half=True, include_bfloat16=False):
1048:    dtypes(*get_all_math_dtypes('cpu'))
1182:    dtypesIfCUDA(*get_all_fp_dtypes())
1190:    dtypesIfCUDA(*get_all_fp_dtypes())
1205:    dtypesIfCUDA(*get_all_fp_dtypes())
1215:    dtypesIfCUDA(*get_all_fp_dtypes())
1307:    dtypes(*(get_all_dtypes(include_bool=False)))
1349:    dtypes(*(get_all_fp_dtypes(include_half=False) +
1350:              get_all_complex_dtypes()))
1351:    dtypesIfCUDA(*(get_all_fp_dtypes(include_half=True) +
1352:                    get_all_complex_dtypes()))
```

</p>
</details>

<details>
<summary>

`test/test_view_ops.py`</summary>

<p>

```python
19:    get_all_dtypes, get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes
124:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
131:    dtypes(*get_all_dtypes(include_bfloat16=False))
213:            for view_dtype in [*get_all_fp_dtypes(), *get_all_complex_dtypes()]:
220:    dtypes(*get_all_dtypes())
224:        for view_dtype in get_all_dtypes():
305:    dtypes(*get_all_complex_dtypes(include_complex32=True))
343:    dtypes(*get_all_dtypes())
354:    dtypes(*get_all_dtypes())
364:    dtypes(*get_all_dtypes())
374:    dtypes(*get_all_dtypes())
384:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
395:    dtypes(*get_all_complex_dtypes())
426:    dtypes(*get_all_complex_dtypes())
451:    dtypes(*product(get_all_complex_dtypes(), get_all_dtypes()))
1263:    dtypes(*(torch.testing.get_all_dtypes()))
1279:    dtypes(*(torch.testing.get_all_dtypes()))
1405:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1406:              get_all_complex_dtypes()))
1471:    dtypes(*get_all_dtypes(include_bfloat16=False))
1574:    dtypes(*get_all_dtypes())
1601:    dtypes(*get_all_dtypes(include_bfloat16=False))
1632:    dtypes(*get_all_dtypes(include_bfloat16=False))
1711:        for dt in get_all_dtypes():
1717:        for dt in get_all_dtypes():
1724:        for dt in get_all_dtypes():
```

</p>
</details>

I'm looking forward to your viewpoints. Thanks :)

cc: mruberry kshitij12345 anjali411

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71561

Reviewed By: samdow

Differential Revision: D34856571

Pulled By: mruberry

fbshipit-source-id: 0dca038bcad5cf69906245c496d2e61ac3876335
(cherry picked from commit b058f67b4313143efa714ab105f36e74083131b9)
2022-03-15 20:31:41 +00:00
Sergii Dymchenko
794f813522 Fix test_binary_ufuncs.py for Python 3.10
This is a partial fix for https://github.com/pytorch/pytorch/issues/66424

Before this PR, test_binary_ufuncs.py gives DeprecationWarning in Python 3.8 and "TypeError: 'float' object cannot be interpreted as an integer" in Python 3.10.

With this PR, `python run_test.py -i test_binary_ufuncs.py` succeeds with Python 3.10.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74013
Approved by: https://github.com/ezyang
2022-03-11 02:13:39 +00:00
Philip Meier
0973c5a1cc align signature of make_tensor with other creation ops (#72702)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72702

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D34457729

Pulled By: mruberry

fbshipit-source-id: 83d580c4201eef946dc9cf4b9e28a3d36be55609
(cherry picked from commit aa4cf20fbeb4b795595729b8ac2e6ba7707d8283)
2022-02-25 06:30:31 +00:00
Philip Meier
1f74e082e2 only compare attributes for meta tensors (#72508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72508

Todo:

- [x] document this behavior
- [x] add tests

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D34262452

Pulled By: ezyang

fbshipit-source-id: bc5c9653d5c3ad5c6efccc9c8e0efc0d28e15104
(cherry picked from commit 233142c88e)
2022-02-17 02:33:08 +00:00
Natalia Gimelshein
06d0536dad Low precision support for jiterator (#70157)
Summary:
This adds support for bfloat16 and fp16 types to the jiterator by adding at::Half and at::BFloat16 classes to the jiterator code template. The only methods defined in those classes are construction from float and implicit conversion to float. Mathematical operations on them never need to be defined, because the jiterator is written to implicitly upcast the inputs to the functor, so all math has to be performed on float only; for example, the compute part of the kernel would always be written as
```
        out[j] = i0<float>(arg0[j]);
```
It also adds support for casting to complex outputs by adding a similar templated class c10::complex<T>. Originally I planned to only support the float -> complex conversion for it, but to compile the fetch_and_cast function we also need the complex -> float conversion. We could avoid that by compiling fetch_and_cast for a different subset of types, but I'm not doing it in this PR. Thus, technically, we can compile a kernel that would accept complex inputs and produce wrong results, but we guard against it by static-asserting that none of the functor datatypes are complex and by runtime-checking that none of the inputs are complex.
Adding bfloat16, half and complex support allows us to remove the special handling in the type promotion tests for gcd.
i0 (which supports half and bfloat16 inputs) is moved to use the jiterator.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70157

Reviewed By: mruberry

Differential Revision: D33221645

Pulled By: ngimel

fbshipit-source-id: 9cfe8aba3498a0604c4ea62c217292ea06c826b1
2021-12-19 11:56:57 -08:00
Natalia Gimelshein
9ff8c49ed9 Enable cpu scalar arguments for jiterator (#69861)
Summary:
Creates an analog of `gpu_kernel_with_scalars` for jiterator kernels

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69861

Reviewed By: mruberry

Differential Revision: D33134013

Pulled By: ngimel

fbshipit-source-id: fd2412e8d6432e15d5721e95a194d29fa70ad92c
2021-12-16 10:58:59 -08:00
Mike Ruberry
a974699633 Skips failing ROCm test (#69456)
Summary:
ROCm and CUDA type promotion are slightly divergent and need to be updated.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69456

Reviewed By: anjali411, janeyx99

Differential Revision: D32883895

Pulled By: mruberry

fbshipit-source-id: 3b0ba8a9d092c2d7ff20d78da42d4a147b1db12d
2021-12-06 09:12:31 -08:00
Mike Ruberry
b6f41bb848 The Jiterator (#69439)
Summary:
This PR:

- creates the "jiterator" pattern, allowing elementwise unary and binary kernels that don't accept scalars to be jit compiled when called
- ports the gcd and i1 CUDA kernels to use the jiterator
- extends elementwise binary systemic testing to be comparable to elementwise unary systemic testing
- separates one test case from test_out in test_ops.py
- updates more OpInfos to use expected failures instead of skips

The jiterator currently does not support half, bfloat16 or complex dtypes. It also (as mentioned above) doesn't support scalar inputs. In the future we expect to add support for those datatypes and scalars.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69439

Reviewed By: ngimel

Differential Revision: D32874968

Pulled By: mruberry

fbshipit-source-id: d44bb9cde4f602703e75400ec5a0b209f085e9b3
2021-12-06 07:32:48 -08:00
soulitzer
83e8612d11 Clean up test autograd (#67413)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/66066

This PR:
 - cleans up op-specific testing from test_autograd. test_autograd should be reserved for testing generic autograd functionality
 - tests related to an operator are better colocated
 - see the tracker for details

What to think about when moving tests to their correct test suite:
 - naming: make sure it's not too generic
 - how the test is parametrized, sometimes we need to add/remove a device/dtype parameter
 - can this be merged with existing tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67413

Reviewed By: jbschlosser, albanD

Differential Revision: D32031480

Pulled By: soulitzer

fbshipit-source-id: 8e13da1e58a38d5cecbfdfd4fe2b4fe6f816897f
2021-11-03 15:26:09 -07:00
kshitij12345
885a8e53ba replace onlyOnCPUAndCUDA with onlyNativeDeviceTypes (#65201)
Summary:
Reference https://github.com/pytorch/pytorch/issues/53849

Replace `onlyOnCPUAndCUDA` with `onlyNativeDeviceTypes`, which includes `cpu`, `cuda` and `meta`.
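
A minimal sketch of how the decorator is used in a device-generic test (illustrative only; it relies on the internal `torch.testing._internal` helpers, which are not a public API):
```
import torch
from torch.testing._internal.common_utils import TestCase, run_tests
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests, onlyNativeDeviceTypes)

class TestExample(TestCase):
    @onlyNativeDeviceTypes  # instantiated for the native device types, not out-of-tree backends
    def test_add_shape(self, device):
        x = torch.ones(3, device=device)
        self.assertEqual((x + x).shape, x.shape)

instantiate_device_type_tests(TestExample, globals())

if __name__ == "__main__":
    run_tests()
```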

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65201

Reviewed By: mrshenli

Differential Revision: D31299718

Pulled By: mruberry

fbshipit-source-id: 2d8356450c035d6a314209ab51b2c237583920fd
2021-11-01 09:22:34 -07:00
Yukio Siraichi
83f70db95c Fix common device computation for comparison ops. (#66245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66245

Fixes #66053

This PR splits `declare_static_dtype_and_device` into two new methods for
`TensorIteratorBase`: `declare_static_dtype` and `declare_static_device`.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31503849

Pulled By: ngimel

fbshipit-source-id: 4b131b691d29ceb5f3709f5d6503997ea0875c54
2021-10-22 18:43:17 -07:00
Jane Xu
8a65047acc [skip ci] Set test owners for everything considered with module: tests (#66865)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66865

Reviewed By: anjali411

Differential Revision: D31771147

Pulled By: janeyx99

fbshipit-source-id: 8bebe5ac2098364ef1ee93b590abb5f4455b0f89
2021-10-20 09:37:03 -07:00
jiayisun
0b8dc0f04a add BFloat16 operators on CPU: logaddexp, logaddexp2, remainder (#63621)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63621

Reviewed By: H-Huang

Differential Revision: D31640811

Pulled By: mruberry

fbshipit-source-id: 1fd061b65c196398738018eefc52bf459e424b1c
2021-10-15 13:11:45 -07:00
Freey0
2223737da9 restore test_inplace_comparison_ops_require_inputs_have_same_dtype Expected behavior (#64267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64267

This test expects every operation to throw a runtime error.

It also reinserts the in-place operation test and fixes a bug in the comparison operations.

fix: #64018

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30720915

Pulled By: ezyang

fbshipit-source-id: 215a6556d20770f70f4ced1c1f9a9753933f1d37
2021-09-08 06:42:12 -07:00
Kevin Tse
7e4ebe06ca Fixes issue related to torch.trapezoid broadcasting behavior and documentation (#64054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64054

Fixes #63608

cc mruberry rgommers heitorschueroff

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D30617078

Pulled By: NivekT

fbshipit-source-id: 815896ec56d447562790df4d662e94fd13457e2a
2021-09-07 11:41:55 -07:00
Philip Meier
26b7ff5aea deprecate dtype getters from torch.testing namespace (#63554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63554

Following https://github.com/pytorch/pytorch/pull/61840#issuecomment-884087809, this deprecates all the dtype getters publicly exposed in the `torch.testing` namespace. The reason for this twofold:

1. If someone is not familiar with the C++ dispatch macros PyTorch uses, the names are misleading. For example, `torch.testing.floating_types()` will only give you `float32` and `float64`, skipping `float16` and `bfloat16`.
2. The dtype getters provide very minimal functionality that can be easily emulated by downstream libraries.

We thought about [providing a replacement](https://gist.github.com/pmeier/3dfd2e105842ad0de4505068a1a0270a), but ultimately decided against it. The major problem is BC: by keeping it, either the namespace gets messy again after a new dtype is added or we need to somehow version the return values of the getters.
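
For reference, a rough sketch of the kind of downstream replacement this has in mind (illustrative; the exact tuples a project needs depend on which dtypes it actually supports):
```
import torch

# torch.testing.floating_types() only covered float32/float64; a downstream
# project can simply spell out the dtypes it cares about instead.
FLOATING_DTYPES = (torch.float16, torch.bfloat16, torch.float32, torch.float64)
INTEGRAL_DTYPES = (torch.uint8, torch.int8, torch.int16, torch.int32, torch.int64)

for dtype in FLOATING_DTYPES:
    x = torch.zeros(2, dtype=dtype)
    assert x.dtype == dtype
```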

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D30662206

Pulled By: mruberry

fbshipit-source-id: a2bdb10ab02ae665df1b5b76e8afa9af043bbf56
2021-09-07 08:58:51 -07:00
Saketh Are
83e28a7d28 Use stacklevel for floordiv deprecation warnings (#64034)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60548

`Tensor.__floordiv__` was indirectly deprecated by deprecation of `torch.floor_divide` (see https://github.com/pytorch/pytorch/issues/43874). Deprecating it directly provides clearer feedback.

Repro:
```
import torch
x = torch.tensor(0)
x // 1
```

Before this change, a deprecation warning was triggered within the C++ implementation of floor_divide:
```
UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  ../aten/src/ATen/native/BinaryOps.cpp:571.)
  return torch.floor_divide(self, other)
```

After this change, the warning instead cites the user's offending line of Python code:
```
UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  x // 1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64034

Reviewed By: mruberry

Differential Revision: D30658010

Pulled By: saketh-are

fbshipit-source-id: b0e6c5008d741897509d102f4a89efb47de4aa2a
2021-08-31 11:27:56 -07:00
Kushashwa Ravi Shrimali
d37636901e [Doc] make_tensor to torch.testing module (#63925)
Summary:
This PR aims to add `make_tensor` to the `torch.testing` module in PyTorch docs.

TODOs:

* [x] Add examples

cc: pmeier mruberry brianjo
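
For readers landing here from the docs, a small usage example (not taken from the PR; it assumes the public `torch.testing.make_tensor` signature):
```
import torch
from torch.testing import make_tensor

# Random tensor with a controlled shape, dtype, device and value range.
t = make_tensor((2, 3), dtype=torch.float32, device="cpu", low=-1.0, high=1.0)
print(t.shape, t.dtype, bool(t.min() >= -1.0), bool(t.max() <= 1.0))
```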

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63925

Reviewed By: ngimel

Differential Revision: D30633487

Pulled By: mruberry

fbshipit-source-id: 8e5a1f880c6ece5925b4039fee8122bd739538af
2021-08-30 12:25:40 -07:00
Philip Meier
70a3210eca Add BinaryUfuncOpInfo and broadcasting tests (#61964)
Summary:
As proof of concept, this PR uses the new `BinaryUfuncOpInfo` in broadcasting tests for `add`, `sub`, `mul`, `div`, `floor_div`, and `true_div`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61964

Reviewed By: ngimel

Differential Revision: D30407734

Pulled By: mruberry

fbshipit-source-id: ada28994f43b0635f279f45a02ecba18bc8ee033
2021-08-20 11:44:15 -07:00
Shen Li
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
Zsolt Dollenstein
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
Kevin Tse
87465a6e68 adding operator cumulative_trapezoid (#61615)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* https://github.com/pytorch/pytorch/issues/61616
* **https://github.com/pytorch/pytorch/issues/61615**
* https://github.com/pytorch/pytorch/issues/61475

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61615

Reviewed By: malfet, mruberry

Differential Revision: D29975064

Pulled By: NivekT

fbshipit-source-id: 4d4e98f3efb720fdc44eb238ecbf0fa157ac13d7
2021-08-03 08:04:00 -07:00
kshitij12345
fd8004b42e add bfloat16 impl for nextafter (#61829)
Summary:
Add `BFloat16` support for `nextafter`.

* [x] Add OpInfo
* [x] Add Implementation Test (C++ tests)
* [x] Add credit
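
A quick illustration of the new dtype coverage (not part of the PR):
```
import torch

a = torch.tensor([1.0, 2.0], dtype=torch.bfloat16)
b = torch.tensor([2.0, 1.0], dtype=torch.bfloat16)
# Smallest representable step from a towards b, now also available for bfloat16.
print(torch.nextafter(a, b))
```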

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61829

Reviewed By: ejguan

Differential Revision: D29932498

Pulled By: mruberry

fbshipit-source-id: 89524531a4800569ba1addd08a4ace330a6f72a4
2021-08-02 23:16:58 -07:00
Freey0
109bd5e78a OpInfo: bitwise_and (#61349)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61349

Also adds a type promotion test for bugs found by PR #60813

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29592840

Pulled By: ezyang

fbshipit-source-id: ee013b20e31baf6c6ebf2edb881ae6d8e215c7a6
2021-07-22 07:04:17 -07:00
Nikita Shulga
604f503d30 Revert D29794958 + compilation fix (#61937)
Summary:
This PR un-reverts https://github.com/pytorch/pytorch/issues/61475 and fixes compilation with MSVC, which does not recognize alternative operator spellings (i.e. using `or` instead of `||`).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61937

Reviewed By: albanD

Differential Revision: D29805941

Pulled By: malfet

fbshipit-source-id: 01e5963c6717c1b44b260300d87ba0bf57f26ce9
2021-07-20 18:14:45 -07:00
Nikita Shulga
22fff61f06 Revert D29794958: [pytorch][PR] changing trapz to trapezoid
Test Plan: revert-hammer

Differential Revision:
D29794958 (95cec8f4fa)

Original commit changeset: 60b9c07efd47

fbshipit-source-id: 2dcda2d62e01c2521a86ae5ed8246cfb686d3f64
2021-07-20 16:00:46 -07:00
Kevin Tse
95cec8f4fa changing trapz to trapezoid (#61475)
Summary:
This PR resolves issue https://github.com/pytorch/pytorch/issues/52606 while also adding support for complex numbers
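
For orientation, a tiny example of the renamed op (illustrative, not from the PR):
```
import torch

y = torch.tensor([1.0, 2.0, 3.0])
print(torch.trapezoid(y))                      # trapezoidal rule with unit spacing -> tensor(4.)
print(torch.trapezoid(y, dx=0.5))              # same samples, half the spacing -> tensor(2.)
print(torch.trapezoid(torch.tensor([1 + 1j, 2 + 2j])))  # complex inputs are now supported
```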

Stack from [ghstack](https://github.com/ezyang/ghstack):
* https://github.com/pytorch/pytorch/issues/61616
* https://github.com/pytorch/pytorch/issues/61615
* **https://github.com/pytorch/pytorch/issues/61475**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61475

Reviewed By: mruberry

Differential Revision: D29794958

Pulled By: NivekT

fbshipit-source-id: 60b9c07efd47fd85b9c8178768fc7828d7b57d29
2021-07-20 15:25:55 -07:00
Akifumi Imanishi
4d9fd8958b Support __rand__, __ror__ and __rxor__ (#59240)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58120.

This PR implements `torch.Tensor.{__rand__/__ror__/__rxor__}` for the compatibility with NumPy’s interface.
(cc: mruberry, rgommers, emcastillo, kmaehashi)
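
For illustration (not from the PR), the reflected forms let a Python scalar appear on the left-hand side:
```
import torch

t = torch.tensor([1, 2, 3])
print(3 & t)   # Tensor.__rand__ -> tensor([1, 2, 3])
print(1 | t)   # Tensor.__ror__  -> tensor([1, 3, 3])
print(2 ^ t)   # Tensor.__rxor__ -> tensor([3, 0, 1])
```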

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59240

Reviewed By: ngimel

Differential Revision: D29482304

Pulled By: mruberry

fbshipit-source-id: 13789202c1d8dddf8658a45381aeedcc31e2f603
2021-07-07 13:34:14 -07:00
Joel Schlosser
03b5a225a7 Test parametrization for instantiated device-specific tests (#60233)
Summary:
The `ops` decorator provides a way to parameterize a test across a given list of ops. This would be useful for modules as well (e.g. a `modules` decorator), but the mechanism by which this is accomplished is specific to ops. In the details, the `ops` decorator tags a test function with the metadata needed (list of ops, `dtypes`) and the actual tests are generated according to this metadata during the call to `instantiate_device_type_tests()`.

This PR makes this mechanism more generic, allowing for test parameterization across arbitrary dimensions. This makes a `modules` decorator (or any similar type of decorator) straightforward to implement without changes to the device-specific test instantiation logic.

One caveat is that, since this is implemented where the old `ops` decorator was (within `instantiate_device_type_tests()`), this only works for tests instantiated using the device-specific instantiation logic. Longer term, even device-specific test instantiation could be treated as an optional parameterization across device types, but this PR takes a low-risk approach for now. In practice, this just means that a `device` kwarg is required for all test signatures used with the mechanism.

The `ops` decorator has been refactored to use the generic mechanism and works the same as before, with one difference: when `OpDTypes.none` is specified, the test signature no longer needs an unused `dtype` kwarg. This is a nice bonus that demonstrates the added flexibility of a generic parameterization mechanism. The refactored form also has the bonus that all op-specific test generation logic is contained within the `ops` decorator class, improving readability.

Behind the scenes, the generic mechanism is a base decorator class (`_TestParameterizer`) from which `ops` derives. The core functionality is in the `_parameterize_test()` method, which takes in a test function and returns a generator that produces parameterized tests, including names and parameter kwargs to pass to them. Using the `ops` decorator results in a set of op-specific tests from a given generic test.
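
To make the mechanism concrete, a stripped-down conceptual sketch (names like `for_dtypes` are hypothetical; the real implementation lives in the device-type test framework):
```
class _TestParameterizer:
    """Conceptual base: turns one generic test into several concrete tests."""
    def _parameterize_test(self, test):
        # Yield (name_suffix, kwargs) pairs, one per generated variant.
        raise NotImplementedError

class for_dtypes(_TestParameterizer):  # hypothetical analogue of the `ops` decorator
    def __init__(self, *dtypes):
        self.dtypes = dtypes

    def __call__(self, test):
        test._parameterizer = self  # tag the test; instantiation reads this later
        return test

    def _parameterize_test(self, test):
        for dtype in self.dtypes:
            yield f"_{dtype}", {"dtype": dtype}

@for_dtypes("float32", "float64")
def test_add(device, dtype):
    print("running test_add on", device, "with", dtype)

# Roughly what the instantiation step does with the tag:
for suffix, kwargs in test_add._parameterizer._parameterize_test(test_add):
    test_add("cpu", **kwargs)
```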

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60233

Reviewed By: iramazanli

Differential Revision: D29494995

Pulled By: jbschlosser

fbshipit-source-id: a14446488c106094fafcaa75ccf8e9e3faf33bfc
2021-06-30 18:50:22 -07:00
kshitij12345
dfd2edc025 [special] add zeta (#59623)
Summary:
Reference https://github.com/pytorch/pytorch/issues/50345

`zeta` was already present in the codebase to support computation of `polygamma`.

However, `zeta` only had a `double(double, double)` signature **for CPU** before this PR (which meant that `polygamma` computations were always upcast to `double` for the zeta part).

With this PR, float computations will take place in float and double in double.

Have also refactored the code and moved the duplicate code from `Math.cuh` to `Math.h`

**Note**: For scipy, `q` is optional, and if it is `None`, it defaults to `1`, which corresponds to the Riemann zeta function. However, for `torch.special.zeta` I made it mandatory, because it feels odd that without `q` this is the Riemann zeta and with `q` it is the general Hurwitz zeta. I think sticking to just the general form makes more sense, as passing `1` for `q` is trivial.
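
A small numerical sanity check (illustrative only):
```
import torch

x = torch.tensor([2.0, 4.0])
q = torch.tensor([1.0, 1.0])
# Hurwitz zeta with q=1 reduces to the Riemann zeta function:
# zeta(2) = pi**2 / 6, zeta(4) = pi**4 / 90.
print(torch.special.zeta(x, q))  # approximately tensor([1.6449, 1.0823])
```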

Verify:
* [x] Docs https://14234587-65600975-gh.circle-artifacts.com/0/docs/special.html#torch.special.zeta

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59623

Reviewed By: ngimel

Differential Revision: D29348269

Pulled By: mruberry

fbshipit-source-id: a3f9ebe1f7724dbe66de2b391afb9da1cfc3e4bb
2021-06-24 00:00:12 -07:00
Akifumi Imanishi
26cdec6ce4 Support torch.bitwise_{left/right}_shift and __rlshift__, __rrshift__ (#59544)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58121

This PR implements `torch.bitwise_left_shift`, `torch.bitwise_right_shift`, and `torch.Tensor.{__rlshift__/__rrshift__}` for compatibility with the Python array API standard.
(cc: mruberry, rgommers, emcastillo, kmaehashi)
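
For illustration (not part of the PR):
```
import torch

t = torch.tensor([1, 2, 4])
print(torch.bitwise_left_shift(t, 2))   # tensor([ 4,  8, 16])
print(torch.bitwise_right_shift(t, 1))  # tensor([0, 1, 2])
print(16 >> torch.tensor([1, 2, 3]))    # Tensor.__rrshift__ -> tensor([8, 4, 2])
```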

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59544

Reviewed By: ngimel

Differential Revision: D29348869

Pulled By: mruberry

fbshipit-source-id: 329aee296cf890735e8a9f858bccfe87c03d06ca
2021-06-23 23:57:16 -07:00
Masaki Kozuki
9e773ea7d5 Use accscalar_t for CUDA add/sub with Tensor and Scalar (#60454)
Summary:
Follow up of https://github.com/pytorch/pytorch/issues/60227, related to https://github.com/pytorch/pytorch/issues/59907 & https://github.com/pytorch/pytorch/issues/58833

With this pull request, `torch.add` & `torch.sub` use `acc_type` for `Scalar` if either of two arguments is `Scalar`.
This mimics the behavior of [`torch.mul`](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu#L18), `torch._foreach_(add|sub).Scalar` and `torch._foreach_(add|sub).ScalarList`.

 ---

**reference**
- torch.mul CUDA kernel: b0c9762e2d/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu (L17-L25)
- `torch._foreach_(add|sub).Scalar`: cast scalar b0c9762e2d/aten/src/ATen/native/cuda/ForeachBinaryOpScalar.cu (L27)
- `torch._foreach_(add|sub).ScalarList`: `BinaryOpScalarListFunctor` b0c9762e2d/aten/src/ATen/native/cuda/ForeachFunctors.cuh (L180-L182), and multi_tensor_apply handles `scalar_t` and computes in `opmath_t` (almost equivalent to `accscalar_t`) b0c9762e2d/aten/src/ATen/native/cuda/MultiTensorApply.cuh (L60-L68). `BinaryOpScalarListFunctor` is used in b0c9762e2d/aten/src/ATen/native/cuda/ForeachBinaryOpScalarList.cu (L24)

cc ngimel ptrblck mcarilli

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60454

Reviewed By: VitalyFedyunin

Differential Revision: D29345035

Pulled By: ngimel

fbshipit-source-id: 5dbafbdfe029a9544ec2e58f17d547928e017a04
2021-06-23 18:59:22 -07:00
Philip Meier
0c916c8a4e up the priority of numpy array comparisons in self.assertEqual (#59067)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58988.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59067

Reviewed By: jbschlosser

Differential Revision: D28986642

Pulled By: heitorschueroff

fbshipit-source-id: 3ef2d26b4010fc3519d0a1a020ea446ffeb46ba0
2021-06-22 13:07:07 -07:00
Masaki Kozuki
b298013cd5 [add/sub] Cast alpha to acc_type (#60227)
Summary:
This PR lets `torch.add` & `torch.sub` CUDA kernels cast `alpha` to `acc_type`, not `scalar_t`.
I do not remove `cast`s from `test/test_foreach.py` because I'll do this in https://github.com/pytorch/pytorch/issues/59907 or follow-up for it.

Current upstream `torch._foreach_add` & `torch._foreach_sub` upcast the `alpha` parameter to `acc_type<scalar_t>` while `torch.add` & `torch.sub` do not. This is kind of problematic because the outputs of `torch.add` and `torch.sub` differ from `torch._foreach_add` and `torch._foreach_sub`, respectively, if the dtype of the input tensors is either `torch.half` or `torch.bfloat16`. The discrepancy is roughly proportional to `abs(alpha)`, except when `alpha` is exactly representable in 16 bits.
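
A small repro in the spirit of the issue (illustrative; requires a CUDA device, and the exact deviation depends on the values involved):
```
import torch

if torch.cuda.is_available():
    a = torch.rand(1024, dtype=torch.half, device="cuda")
    b = torch.rand(1024, dtype=torch.half, device="cuda")
    alpha = 1234.5678  # not exactly representable in 16 bits
    out_half = torch.add(a, b, alpha=alpha)
    ref = torch.add(a.float(), b.float(), alpha=alpha).half()
    # After this change the half-precision path should track the float32
    # reference much more closely, since alpha is no longer rounded to half.
    print((out_half.float() - ref.float()).abs().max())
```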

ref:
- `torch._foreach_add` & `torch._foreach_sub` cast `alpha`: 6d0fb85a62/aten/src/ATen/native/cuda/ForeachBinaryOpList.cu (L21-L28), `BinaryOpListAlphaFunctor` is defined here: 6d0fb85a62/aten/src/ATen/native/cuda/ForeachFunctors.cuh (L202)

related: https://github.com/pytorch/pytorch/issues/58833, https://github.com/pytorch/pytorch/pull/59907

cc ngimel ptrblck mcarilli

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60227

Reviewed By: mruberry

Differential Revision: D29252759

Pulled By: ngimel

fbshipit-source-id: 847f3b9493ae30a900f7445af00aef1abcc1ab21
2021-06-20 19:05:22 -07:00
Akifumi Imanishi
0a5bfa9919 Support __rmod__ (#58476)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58035.

This PR implements `torch.Tensor.__rmod__` and `torch.remainder(scalar, tensor)` for the compatibility with NumPy’s interface.
(cc: mruberry, rgommers, emcastillo, kmaehashi)

TODO:
  - [x] Update `tensor_binary_op` in test/test_binary_ufuncs.py after https://github.com/pytorch/pytorch/issues/58216 is merged.
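
For illustration (not part of the PR), both spellings now work with a Python scalar on the left:
```
import torch

t = torch.tensor([2, 3, 4])
print(5 % t)                  # Tensor.__rmod__ -> tensor([1, 2, 1])
print(torch.remainder(5, t))  # scalar-first overload -> tensor([1, 2, 1])
```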

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58476

Reviewed By: ngimel

Differential Revision: D28776810

Pulled By: mruberry

fbshipit-source-id: 74f8aea80f439ef2cc370333524e39971eeb7bf4
2021-06-05 16:19:24 -07:00
Akifumi Imanishi
3113a1de4a Fix some tensor operators to return NotImplemented for invalid inputs (#58216)
Summary:
Same as https://github.com/pytorch/pytorch/issues/57934. (cc/ albanD)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58216

Reviewed By: ailzhang

Differential Revision: D28494886

Pulled By: albanD

fbshipit-source-id: 380205867ee1cde90e1c6fcfe2a31749e1243530
2021-05-19 13:09:57 -07:00
Xue Haotian
098d9975a7 Port heaviside to structured kernel (#57933)
Summary:
Port heaviside to structured kernel
Related https://github.com/pytorch/pytorch/issues/55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57933

Reviewed By: mruberry

Differential Revision: D28362533

Pulled By: ezyang

fbshipit-source-id: 96b4591db3f609434784bd0ef9e54c61c918fb88
2021-05-13 10:48:11 -07:00
Alban Desmaison
5e83c62a9e Revert D28351931: [pytorch][PR] Fix some tensor operators to return NotImplemented for invalid inputs
Test Plan: revert-hammer

Differential Revision:
D28351931 (35521a2629)

Original commit changeset: 985457a44dba

fbshipit-source-id: 10724c219e53648f10a70719e25bcf774c6c7852
2021-05-12 13:58:03 -07:00
Akifumi Imanishi
35521a2629 Fix some tensor operators to return NotImplemented for invalid inputs (#57934)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57719.

This PR fixes `torch.Tensor{__rsub__, __rdiv__, __rtruediv__, __pow__, __rmatmul__}` to return `NotImplemented` instead of raising a `TypeError`.

cc/ mruberry: The first commit of this PR is the same as 1d209db1cc except for the commit message.
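
To make the protocol concrete, a minimal sketch of why returning `NotImplemented` matters (illustrative; the `MyScalar` class is made up):
```
import torch

class MyScalar:
    def __rpow__(self, base):
        # Reached only because Tensor.__pow__ returns NotImplemented for
        # operands it does not understand, letting Python try the reflected op.
        return "MyScalar.__rpow__ called"

print(torch.tensor([2.0]) ** MyScalar())
```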

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57934

Reviewed By: mruberry

Differential Revision: D28351931

Pulled By: albanD

fbshipit-source-id: 985457a44dba24d2496794dfb8c1661cbcd4ff8f
2021-05-12 11:03:23 -07:00
Akifumi Imanishi
14282232d9 Fix generate_not_implemented_tests not testing unknown types correctly (#56997)
Summary:
Currently, the test code is not testing unknown types correctly because `op` is overwritten in the for-loop (i.e., currently only `__ior__` is tested).
This PR fixes the test `generate_not_implemented_tests` to bind the operator name to each generated method, and removes the currently unsupported operators (`__rand__`, …).

cc/ mruberry This fix is needed to add tests for the operators we are going to introduce (e.g., `__rand__`)
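
The underlying pitfall is ordinary Python late binding in a loop; a generic sketch of the bug and the fix (illustrative, not the actual test code):
```
ops = ["__iadd__", "__isub__", "__ior__"]

# Buggy: every generated function sees the final value of `op`.
buggy = [lambda: op for op in ops]
print([f() for f in buggy])          # ['__ior__', '__ior__', '__ior__']

# Fixed: bind the current value via a default argument.
fixed = [lambda op=op: op for op in ops]
print([f() for f in fixed])          # ['__iadd__', '__isub__', '__ior__']
```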

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56997

Reviewed By: astaff

Differential Revision: D28118465

Pulled By: mruberry

fbshipit-source-id: c5a466a7604262ed5490862300d47043aff63d0b
2021-05-09 05:34:10 -07:00
Wenlei Xie
20085f6d23 Support auto generation of device check (#56872)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56872

ghstack-source-id: 127914018

Test Plan: auto test

Reviewed By: ezyang

Differential Revision: D27986429

fbshipit-source-id: 0da8413b0b8e6810fcea27ed1de499f11f68bd1f
2021-05-01 12:02:09 -07:00
kshitij12345
d4ddb47719 [special] Add xlog1py (#55138)
Summary:
Reference : https://github.com/pytorch/pytorch/issues/50345

* [x] Check Rendered Document (https://12494173-65600975-gh.circle-artifacts.com/0/docs/special.html#torch.special.xlog1py)
* [x] Tests in Binary Ufunc
* [x] OpInfo
* [x] Structured Kernel
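
For orientation (not from the PR), `xlog1py(x, y)` computes `x * log1p(y)`, with the convention that the result is `0` when `x == 0`:
```
import torch

x = torch.tensor([0.0, 2.0])
y = torch.tensor([-1.0, 1.0])
print(torch.special.xlog1py(x, y))  # tensor([0.0000, 1.3863]); 2 * log(2) ~= 1.3863
```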

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55138

Reviewed By: ngimel

Differential Revision: D27961461

Pulled By: mruberry

fbshipit-source-id: 30a8f41970a829bf50254aadf5615e8ce4148c7e
2021-04-30 05:51:13 -07:00
Peter Bell
5536cda19a Update floor_divide behavior in line with NumPy 1.20 (#56893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56893

Fixes gh-56814

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28025814

Pulled By: mruberry

fbshipit-source-id: 8654978ea1d5aa7c12bcf5a8c939966287a2d34e
2021-04-28 05:01:23 -07:00
Brian Hirsh
e8faf69739 fix torch.pow type promotion issue (#54085)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54085

Fixes https://github.com/pytorch/pytorch/issues/50121.

This fixes two similar issues pointed out with the dtype that `torch.pow` performs its computation. Thanks ngimel for spotting the issues originally (comments [here](https://github.com/pytorch/pytorch/pull/53669#discussion_r594624355) and [here](https://github.com/pytorch/pytorch/pull/53669#discussion_r594719704))!

Before:
```
>>> torch.pow(2, torch.tensor([17], dtype=torch.uint8), out=torch.tensor([0]))
tensor([0])
>>> torch.pow(2, torch.tensor(17, dtype=torch.uint8), out=torch.tensor(0))
tensor(131072)
>>> torch.pow(2, torch.tensor([17], dtype=torch.uint8, device='cuda'), out=torch.tensor([0], device='cuda'))
tensor([131072], device='cuda:0')
>>> torch.pow(2, torch.tensor(17, dtype=torch.uint8, device='cuda'), out=torch.tensor(0, device='cuda'))
tensor(131072, device='cuda:0')
```

After:
```
>>> torch.pow(2, torch.tensor([17], dtype=torch.uint8), out=torch.tensor([0]))
tensor([0])
>>> torch.pow(2, torch.tensor(17, dtype=torch.uint8), out=torch.tensor(0))
tensor(0)
>>> torch.pow(2, torch.tensor([17], dtype=torch.uint8, device='cuda'), out=torch.tensor([0], device='cuda'))
tensor([0], device='cuda:0')
>>> torch.pow(2, torch.tensor(17, dtype=torch.uint8, device='cuda'), out=torch.tensor(0, device='cuda'))
tensor(0, device='cuda:0')
```

In all four cases above, `tensor(0, ...)` is the correct value because the computed "common dtype" among the inputs is expected to be `uint8`. Computing `2 ** 17` in uint8 will then overflow to zero. Finally, we cast the computed output to the output tensor's dtype, which is `int32`.

There were two separate issues fixed in this PR: one for cpu and one for cuda:
* For CPU, The `pow(Scalar, Tensor)` overload wasn't calling `set_wrapped_number(true)` after wrapping the scalar in a Tensor, which caused the "promoted" scalar to incorrectly participate in type promotion (see the documented behavior [here](aa8714dfed/c10/core/TensorImpl.h (L590)))
* For CUDA, the cuda kernels defined in `PowKernel.cu` were using the output's dtype to run the computation, instead of the common dtype.

As an aside: The CPU and CUDA kernels actually both use `iter.dtype()` instead of `iter.common_dtype()` to run the computation, which I fixed. The reason that only manifested here for CUDA is because TensorIterator has cpu-specific logic to create temporary outputs with the intermediate dtype (shown [here](aa8714dfed/aten/src/ATen/TensorIterator.cpp (L349))). I'm not sure what the end state is there- I can imagine that being something we're more okay doing for cpu than for cuda, but it also leads to hard-to-track-down inconsistencies between the two like in this case.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27096330

Pulled By: bdhirsh

fbshipit-source-id: a7e2909243851625cb3056d1e7abb2383bfe95f2
2021-04-15 08:55:53 -07:00
Winston Smith
aceceb3d5c Reland #50999 (Added pow() on CPU for float16 & bfloat16) (#55280)
Summary:
#### Reason for relanding
Line 1607 of `torch/testing/_internal/common_methods_invocations.py` of https://github.com/pytorch/pytorch/issues/50999  had `dtype` instead of `dtype=torch.bool`, so 4 of the 9 sample inputs for `bool` had incorrect dtype. This bug was caught by https://github.com/pytorch/pytorch/issues/54949.

1. Added support for pow() on CPU for `float16` (`Half`) and `bfloat16` types.
Both `pow(Tensor, Scalar)` and `pow(Tensor, Tensor)` are now supported for the aforementioned types.
However autograd isn't supported for `Float16` on CPU yet, as `log_vml_cpu` can't be enabled for it.
2. heitorschueroff added `pow_tensor_scalar_optimized_kernel` to refactor & simplify `PowKernel.cpp`.
It provides a common path for all the complex types & floating point types (except Float16, due to lack of complete AVX2 vectorization support for it).  It replaced code that had previously been duplicated for (float, double) and complex types,
so PowKernel.cpp looks a lot cleaner now.
3. Enabled (unskipped) some tests for `erf`, `erfc`,`erfinv`, `tan` and `linalg.vector.norm` which were being skipped earlier due to `pow()` not having been implemented for `float16` & `bfloat16`.
4. Added an OpInfo for `pow()` & enabled some test cases for `pow()`.
5. Extended the coverage of existing tests for `pow` in `test_binary_ufuncs.py` in order to enable comparison with `numpy`, even with discontiguous tensors, and added a test to ensure that a runtime error is raised for `pow`'s inplace variant if resizing the base tensor is required during its invocation.
6. Added `float16` & `bfloat16` to `square`'s dtype lists in its `UnaryUfuncInfo`.
7. Removed redundant `dtypesIfCPU` and `dtypesIfCUDA` from `OpInfo`s where they are equal to `dtypes`.
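
For illustration (not part of the PR), both overloads now run natively in reduced precision on CPU:
```
import torch

t = torch.tensor([1.5, 2.0, 3.0], dtype=torch.bfloat16)
print(torch.pow(t, 2))                                             # pow(Tensor, Scalar)
print(torch.pow(t, torch.tensor([2.0, 3.0, 0.5], dtype=torch.bfloat16)))  # pow(Tensor, Tensor)
print(torch.tensor([2.0], dtype=torch.half) ** 3)
```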

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55280

Reviewed By: jbschlosser

Differential Revision: D27591772

Pulled By: heitorschueroff

fbshipit-source-id: c7420811b32595bb3353149a61e54a73f2eb352b
2021-04-13 13:23:29 -07:00
Richard Zou
1e70d217e7 Add error message for complex alpha and non-complex inputs (#54964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54964

Previously, the following would error out with a strange error message:
```
import torch
x=torch.randn(2)
torch.rsub(x, 1, alpha=2j)

Traceback (most recent call last)
<ipython-input-2-caf2a1c03d0b> in <module>
      1 import torch
      2 x=torch.randn(2)
----> 3 torch.rsub(x, 1, alpha=2j)

RuntimeError: value cannot be converted to type float without overflow: (-0,-2)
```

The reason why this is happening is because the alpha check doesn't check for if `x` is not complex and `alpha` is complex.
The error gets thrown further along in the implementation of torch.sub,
when it coerces `alpha` to be the same dtype as the input tensor:
https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cpu/BinaryOpsKernel.cpp#L53

This PR fixes the bad error message by adding a new check to the alpha check.

Test Plan:
- pytest test/test_binary_ufuncs.py
- NB: add, sub, and rsub all share the same alpha check. The test only tests it for torch.add, but that should be sufficient.

Reviewed By: gchanan

Differential Revision: D27504017

Pulled By: zou3519

fbshipit-source-id: 70b9aa75a7a4faaaa93f6ba235cae85998a91697
2021-04-07 14:12:34 -07:00
Peter Bell
2ee02b30b1 Replace rounding_mode="true" with rounding_mode=None (#51988)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51988

* **#51988 Replace rounding_mode="true" with rounding_mode=None**

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27561817

Pulled By: mruberry

fbshipit-source-id: 60d1d9c389570f60d599fc1876518717367fb368
2021-04-05 14:53:43 -07:00
Nikita Shulga
8377e6221a Revert D27478225: [pytorch][PR] Added pow() on CPU for float16 & bfloat16
Test Plan: revert-hammer

Differential Revision:
D27478225 (6d030c14cf)

Original commit changeset: d309dd98d5a9

fbshipit-source-id: e0518f15185b41946caf3a8456c7af3f52e5a910
2021-04-03 10:26:44 -07:00
Winston Smith
6d030c14cf Added pow() on CPU for float16 & bfloat16 (#50999)
Summary:
Added the functionality desired in https://github.com/pytorch/pytorch/issues/50789.

1. Added support for pow() on CPU for `float16` (`Half`) and `bfloat16` types.
Both `pow(Tensor, Scalar)` and `pow(Tensor, Tensor)` are now supported for the aforementioned types.
However autograd isn't supported for `Float16` on CPU yet, as `log_vml_cpu` can't be enabled for it.
2. heitorschueroff added `pow_tensor_scalar_optimized_kernel` to refactor & simplify `PowKernel.cpp`.
It provides a common path for all the complex types & floating point types (except Float16, due to lack of complete AVX2 vectorization support for it).  It replaced code that had previously been duplicated for (float, double) and complex types,
so PowKernel.cpp looks a lot cleaner now.
3. Enabled (unskipped) some tests for `erf`, `erfc`,`erfinv`, `linalg.norm` and `linalg.vector.norm` which were being skipped earlier due to `pow()` not having been implemented for `float16` & `bfloat16`.
4. Added an OpInfo for `pow()` & enabled some test cases for `pow()`.
5. Extended the coverage of existing tests for `pow` in `test_binary_ufuncs.py` in order to enable comparison with `numpy`, even with discontiguous tensors, and added a test to ensure that a runtime error is raised for `pow`'s inplace variant if resizing the base tensor is required during its invocation.
6. Added `float16` & `bfloat16` to `square`'s dtype lists in its `UnaryUfuncInfo`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50999

Reviewed By: zou3519

Differential Revision: D27478225

Pulled By: heitorschueroff

fbshipit-source-id: d309dd98d5a96d0cb9b08281757bb1c65266d011
2021-04-02 15:57:06 -07:00
kshitij12345
bac566bf61 torch.square : OpInfo and minor fixes (#52551)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/42515

Add `out` variant to be consistent with Unary Ops.
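
A quick illustration of the new `out` variant (not from the PR):
```
import torch

x = torch.tensor([-2.0, 3.0])
out = torch.empty(2)
torch.square(x, out=out)
print(out)  # tensor([4., 9.])
```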

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52551

Reviewed By: heitorschueroff

Differential Revision: D27233482

Pulled By: mruberry

fbshipit-source-id: fef6f241849a12c46028bd1aad8f5ecc1dc65ea1
2021-03-24 00:04:42 -07:00
kshitij12345
b93ab10b7a torch.lerp: cuda complex support (#54129)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54048

TODO
* [x] Add test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54129

Reviewed By: bdhirsh

Differential Revision: D27261878

Pulled By: anjali411

fbshipit-source-id: 10937a2eab944c73b5a98ec6278f50a876b8c7dc
2021-03-23 19:58:43 -07:00
Brian Hirsh
779cae9e42 port at::pow to structured (#53669)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53669

This PR does two things:
* Ports `pow` to be structured
* Fixes a bug with how pow handles mixed cpu and cuda tensors

**bug fix**
Pow is a binary op, and all binary ops that use TensorIterator are currently written to handle the case when one of the inputs is a CUDA tensor, and the other is a zero-dimensional cpu tensor.

`pow` incidentally only handles one of the two cases: it fails when the CUDA tensor is passed as the exponent, e.g. `at::pow(torch.tensor(2.0, device='cpu'), torch.tensor([2, 2], device='cuda'))`. Porting `pow` to structured happened to change the error that was outputted from a `TORCH_CHECK` in TensorIterator to an `INTERNAL_ASSERT` in loop.cuh, so I ended up trying to fix the error and update the tests. I added more details in a comment on the PR.

**notes on the structured port**
Pow is a little weird, so I wrote down a couple of issues I noticed during the port:
* Multiple independent overloads. `pow` has two overloads that have their own cpu/cuda kernels, meaning one doesn't call the other. I had to update the names of the kernel overloads to make the compiler happy, since the codegen would otherwise try to generate two classes with the same name. `pow` actually has 3 overloads that all have `out` variants, so I ported all 3 to structured; one of them just happens to redispatch one of the others in most cases.
* Name propagation. Is name propagation implemented per operator, or is it expected to work for most/all ops by default? Right now it looks like it happens for TensorIterator ops by default. For ops that don't use TensorIterator, we need to explicitly pass the names through to the `set_output()` call in the meta function. This happened to matter for `pow` because it has 3 overloads, but only two of them directly use TensorIterator. I had to pass names directly to `set_output` in the 3rd overload to make tests happy.
*  Lack of `const Tensor &` in the C++ API. It's a goal to slowly make all `Tensor &` arguments const as part of the structured port, but in this case I needed to explicitly cast constness away because one structured kernel called back into the C++ API, which still has ordinary `Tensor &` arguments. This probably isn't something we'll fix soon, since we have boxing logic that actually relies on the `Tensor &` / `const Tensor &` distinction in some places.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D27029821

Pulled By: bdhirsh

fbshipit-source-id: c1786e770de6e6c2474b9a48210b88057ab1018e
2021-03-19 14:30:48 -07:00
mattip
54a2498919 Modify tests to use assertWarnsOnceRegex instead of maybeWarnsRegex (#52387)
Summary:
Related to https://github.com/pytorch/pytorch/issues/50006

Follow on for https://github.com/pytorch/pytorch/issues/48560 to ensure TORCH_WARN_ONCE warnings are caught. Most of this is straight-forward find-and-replace, but I did find one place where the TORCH_WARN_ONCE warning was not wrapped into a python warning.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52387

Reviewed By: albanD

Differential Revision: D26773387

Pulled By: mruberry

fbshipit-source-id: 5be7efbc8ab4a32ec8437c9c45f3b6c3c328f5dd
2021-03-08 03:32:14 -08:00
Natalia Gimelshein
3309f034aa remove pointless test (#52609)
Summary:
Fixes T81870118

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52609

Reviewed By: mruberry

Differential Revision: D26584288

Pulled By: ngimel

fbshipit-source-id: 7cec37db46cfe5b5b2fd21fe7c3e3fcbb8aba049
2021-02-22 16:25:04 -08:00
Gregory Chanan
983347fa25 Allow broadcasting against lerp weights. (#52319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52319

Fixes: https://github.com/pytorch/pytorch/issues/52254

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D26488411

Pulled By: gchanan

fbshipit-source-id: 60eb471609986584c4235ba7f263581e988e7642
2021-02-18 09:53:25 -08:00
Mike Ruberry
594a66d778 Warn about floor_divide performing incorrect rounding (#50281) (#50281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51745

Test Plan: Imported from OSS

Reviewed By: ngimel

Pulled By: mruberry

Differential Revision: D26257855

fbshipit-source-id: e5d497cf07b0c746838ed081c5d0e82fb4cb701b
2021-02-10 03:13:34 -08:00
Peter Bell
b150f150ba Add division overload with rounding_mode selection (#51706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51706

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50280

As mentioned in gh-43874, this adds a `rounding_mode={'true', 'trunc', 'floor'}`
argument so `torch.div` can be used as a replacement for `floor_divide` during
the transitional period.

I've included dedicated kernels for truncated and floor division which
aren't strictly necessary for float, but do perform significantly better (~2x) than
doing true division followed by a separate rounding kernel.

Note: I introduce new overloads for `aten::div` instead of just adding a default
`rounding_mode` because various JIT passes rely on the exact operator schema.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D26123271

Pulled By: mruberry

fbshipit-source-id: 51a83717602114597ec9c4d946e35a392eb01d46
2021-02-04 13:08:36 -08:00
kiyosora
4803eaf502 Implement NumPy-like function torch.fmax() & torch.fmin() (#49312)
Summary:
- Implements the NumPy-like functions `torch.fmax()` and `torch.fmin()` recommended in https://github.com/pytorch/pytorch/issues/48440
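
For illustration (not part of the PR), these follow NumPy's NaN-ignoring semantics, unlike `torch.maximum`/`torch.minimum`:
```
import torch

a = torch.tensor([1.0, float('nan'), 3.0])
b = torch.tensor([2.0, 2.0, float('nan')])
print(torch.fmax(a, b))     # tensor([2., 2., 3.]); NaN is ignored when the other value is valid
print(torch.maximum(a, b))  # tensor([2., nan, nan]); NaN propagates
```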

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49312

Reviewed By: izdeby

Differential Revision: D25887246

Pulled By: heitorschueroff

fbshipit-source-id: d762eeff8b328bfcbe7d48b7ee9d2da72c249691
2021-01-20 06:45:25 -08:00