Commit Graph

123 Commits

Author SHA1 Message Date
Nikita Shulga
8377e6221a Revert D27478225: [pytorch][PR] Added pow() on CPU for float16 & bfloat16
Test Plan: revert-hammer

Differential Revision:
D27478225 (6d030c14cf)

Original commit changeset: d309dd98d5a9

fbshipit-source-id: e0518f15185b41946caf3a8456c7af3f52e5a910
2021-04-03 10:26:44 -07:00
Winston Smith
6d030c14cf Added pow() on CPU for float16 & bfloat16 (#50999)
Summary:
Added the functionality desired in https://github.com/pytorch/pytorch/issues/50789.

1. Added support for pow() on CPU for `float16` (`Half`) and `bfloat16` types.
Both `pow(Tensor, Scalar)` and `pow(Tensor, Tensor)` are now supported for the aforementioned types.
However autograd isn't supported for `Float16` on CPU yet, as `log_vml_cpu` can't be enabled for it.
2. heitorschueroff added `pow_tensor_scalar_optimized_kernel` to refactor & simplify `PowKernel.cpp`.
It provides a common path for all the complex types & floating point types (except Float16, due to lack of complete AVX2 vectorization support for it).  It replaced code that had previously been duplicated for (float, double) and complex types,
so PowKernel.cpp looks a lot cleaner now.
3. Enabled (unskipped) some tests for `erf`, `erfc`,`erfinv`, `linalg.norm` and `linalg.vector.norm` which were being skipped earlier due to `pow()` not having been implemented for `float16` & `bfloat16`.
4. Added an OpInfo for `pow()` & enabled some test cases for `pow()`.
5. Extended the coverage of existing tests for `pow` in `test_binary_ufuncs.py` in order to enable comparison with `numpy`, even with discontiguous tensors, and added a test to ensure that a runtime error is raised for `pow`'s inplace variant if resizing the base tensor is required during its invocation.
6. Added `float16` & `bfloat16` to `square`'s dtype lists in its `UnaryUfuncInfo`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50999

Reviewed By: zou3519

Differential Revision: D27478225

Pulled By: heitorschueroff

fbshipit-source-id: d309dd98d5a96d0cb9b08281757bb1c65266d011
2021-04-02 15:57:06 -07:00
kshitij12345
bac566bf61 torch.square : OpInfo and minor fixes (#52551)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/42515

Add `out` variant to be consistent with Unary Ops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52551

Reviewed By: heitorschueroff

Differential Revision: D27233482

Pulled By: mruberry

fbshipit-source-id: fef6f241849a12c46028bd1aad8f5ecc1dc65ea1
2021-03-24 00:04:42 -07:00
kshitij12345
b93ab10b7a torch.lerp: cuda complex support (#54129)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54048

TODO
* [x] Add test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54129

Reviewed By: bdhirsh

Differential Revision: D27261878

Pulled By: anjali411

fbshipit-source-id: 10937a2eab944c73b5a98ec6278f50a876b8c7dc
2021-03-23 19:58:43 -07:00
Brian Hirsh
779cae9e42 port at::pow to structured (#53669)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53669

This PR does two things:
* Ports `pow` to be structured
* Fixes a bug with how pow handles mixed cpu and cuda tensors

**bug fix**
Pow is a binary op, and all binary ops that use TensorIterator are currently written to handle the case when one of the inputs is a CUDA tensor, and the other is a zero-dimensional cpu tensor.

`pow` incidentally only handles one of the two cases: it fails when the CUDA tensor is passed as the exponent, e.g. `at::pow(torch.tensor(2.0, device='cpu'), torch.tensor([2, 2], device='cuda'))`. Porting `pow` to structured happened to change the error that was outputted from a `TORCH_CHECK` in TensorIterator to an `INTERNAL_ASSERT` in loop.cuh, so I ended up trying to fix the error and update the tests. I added more details in a comment on the PR.

**notes on the structured port**
Pow is a little weird, so I wrote down a couple of issues I noticed during the port:
* Multiple independent overloads. `pow` has two overloads that have their own cpu/cuda kernels, meaning one doesn't call the other. I have to update the names of the kernel overloads to make the compiler happy, since the codegen would otherwise try to generate two classes with the same name. `pow` actually has 3 overloads that all have `out` variants, so I ported all 3 to structured- one of them just happens to redispatch one of the others in most cases.
* Name propagation. Is name propagation implemented per operator? Or is expected to work for most/all ops by default. Right now it looks like it happens for TensorIterator ops by default. For ops that don't use TensorIterator, we need to explicitly pass the names through to the `set_output()` call in the meta function. This happened to matter for `pow` because it has 3 overloads, but only two of them directly use TensorIterator. I had to pass names directly to `set_output` in the 3rd overload to make tests happy.
*  Lack of `const Tensor &` in the C++ API. It's a goal to slowly make all `Tensor &` arguments const as part of the structured port, but in this case I needed to explicitly cast constness away because one structured kernel called back into the C++ API, which still has ordinary `Tensor &` arguments. This probably isn't something we'll fix soon, since we have boxing logic that actually relies on the `Tensor &` / `const Tensor &` distinction in some places.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D27029821

Pulled By: bdhirsh

fbshipit-source-id: c1786e770de6e6c2474b9a48210b88057ab1018e
2021-03-19 14:30:48 -07:00
mattip
54a2498919 Modify tests to use assertWarnsOnceRegex instead of maybeWarnsRegex (#52387)
Summary:
Related to https://github.com/pytorch/pytorch/issues/50006

Follow on for https://github.com/pytorch/pytorch/issues/48560 to ensure TORCH_WARN_ONCE warnings are caught. Most of this is straight-forward find-and-replace, but I did find one place where the TORCH_WARN_ONCE warning was not wrapped into a python warning.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52387

Reviewed By: albanD

Differential Revision: D26773387

Pulled By: mruberry

fbshipit-source-id: 5be7efbc8ab4a32ec8437c9c45f3b6c3c328f5dd
2021-03-08 03:32:14 -08:00
Natalia Gimelshein
3309f034aa remove pointless test (#52609)
Summary:
Fixes T81870118

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52609

Reviewed By: mruberry

Differential Revision: D26584288

Pulled By: ngimel

fbshipit-source-id: 7cec37db46cfe5b5b2fd21fe7c3e3fcbb8aba049
2021-02-22 16:25:04 -08:00
Gregory Chanan
983347fa25 Allow broadcasting against lerp weights. (#52319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52319

Fixes: https://github.com/pytorch/pytorch/issues/52254

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D26488411

Pulled By: gchanan

fbshipit-source-id: 60eb471609986584c4235ba7f263581e988e7642
2021-02-18 09:53:25 -08:00
Mike Ruberry
594a66d778 Warn about floor_divide performing incorrect rounding (#50281) (#50281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51745

Test Plan: Imported from OSS

Reviewed By: ngimel

Pulled By: mruberry

Differential Revision: D26257855

fbshipit-source-id: e5d497cf07b0c746838ed081c5d0e82fb4cb701b
2021-02-10 03:13:34 -08:00
Peter Bell
b150f150ba Add division overload with rounding_mode selection (#51706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51706

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50280

As mentioned in gh-43874, this adds a `rounding_mode={'true', 'trunc', 'floor'}`
argument so `torch.div` can be used as a replacement for `floor_divide` during
the transitional period.

I've included dedicated kernels for truncated and floor division which
aren't strictly necessary for float, but do perform significantly better (~2x) than
doing true division followed by a separate rounding kernel.

Note: I introduce new overloads for `aten::div` instead of just adding a default
`rounding_mode` because various JIT passes rely on the exact operator schema.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D26123271

Pulled By: mruberry

fbshipit-source-id: 51a83717602114597ec9c4d946e35a392eb01d46
2021-02-04 13:08:36 -08:00
kiyosora
4803eaf502 Implement NumPy-like function torch.fmax() & torch.fmin() (#49312)
Summary:
- Implementing the NumPy-like function`torch.fmax()` and `torch.fmin()` recommended in https://github.com/pytorch/pytorch/issues/48440

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49312

Reviewed By: izdeby

Differential Revision: D25887246

Pulled By: heitorschueroff

fbshipit-source-id: d762eeff8b328bfcbe7d48b7ee9d2da72c249691
2021-01-20 06:45:25 -08:00
76181208+imaginary-person@users.noreply.github.com
3f052ba07b Remove unnecessary dtype checks for complex types & disable complex dispatch for CPU min/max pointwise ops (#50465)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50064

**PROBLEM DESCRIPTION:**
1. Had not removed dtype checks for complex types in the previous PR (https://github.com/pytorch/pytorch/issues/50347) for this issue.
These type-checks were added in https://github.com/pytorch/pytorch/issues/36377, but are no longer necessary,
as we now rely upon dispatch macros to produce error messages.
2. dtype checks in `clamp_max()` and `clamp_min()` for complex inputs had not been removed either.
3. For min/max pointwise ops in TensorCompareKernel.cpp, complex dispatch had not been removed for min/max functions.

### **FIX DESCRIPTION:**
**FIX SUMMARY:**
1. Removed dtype checks added in https://github.com/pytorch/pytorch/issues/36377, and added 3 more in TensorCompare.cpp.
2. Removed dtype checks for complex inputs in `clamp_max()` and `clamp_min()`.
3.  Disabled complex dispatch for min/max pointwise ops in TensorCompareKernel.cpp.
4. Error messages in the exceptions raised due to min/max ops not being implemented are now checked for containing the text _not support_ (which can also be present in _not supported_), or _not implemented_, so one of them should be a part of error messages, in order for them to be informative.

**REASON FOR NOT CHANGING DISPATCH FOR CUDA AND CLAMP OPS**:

As for the CUDA min/max operations, their kernels do not seem to be compiled & dispatched for complex types anyway, so no further changes seem to be required. Basically, the dispatch macros currently being used don't have cases for complex types.

For example,

1. the reduce CUDA ops use [AT_DISPATCH_ALL_TYPES_AND2 (678fe9f077)](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Dispatch.h#L548-L575) in [ReduceMinMaxKernel.cu](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/ReduceMinMaxKernel.cu), and that macro doesn't allow complex types.

2. In [MinMaxElementwiseKernel.cu](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/MaxMinElementwiseKernel.cu), the CUDA pointwise ops use [`AT_DISPATCH_FLOATING_TYPES_AND2 (678fe9f077)`](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Dispatch.h#L240-L263) for non-integral & non-boolean types, and this marco doesn't have a case for complex types either.

3. [clamp CUDA ops](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/UnaryOpsKernel.cu#L170-L211) use `AT_DISPATCH_ALL_TYPES_AND2 (678fe9f077)`, which doesn't have a case for complex types.

Similarly, [CPU clamp min/max ops](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cpu/UnaryOpsKernel.cpp#L428-L458) use the `AT_DISPATCH_ALL_TYPES_AND `dispatch macro, which doesn't have a case for complex types.

**REASON FOR ADDING 3 dtype CHECKS:**
There are a few cases in which the methods corresponding to `min_stub()` or `max_stub()` are not called, so dispatch macros don't get invoked, resulting in no exceptions being raised. Hence, `dtype` checks are necessary at 3 places to raise exceptions:

1. 52dcc72999/aten/src/ATen/native/TensorCompare.cpp (L342)
2. 52dcc72999/aten/src/ATen/native/TensorCompare.cpp (L422)
3. 52dcc72999/aten/src/ATen/native/TensorCompare.cpp (L389)

The first dtype check requirement can be verified from the following example Python code based on `test_complex_unsupported()`:
```
import unittest
import torch

class MyTestCase(unittest.TestCase):

   def test_1(self):
      t = torch.tensor((1 + 1j), device='cpu', dtype=torch.complex128)
      with self.assertRaises(Exception):
         torch.max(t, dim=0)

if __name__ == '__main__':
    unittest.main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50465

Reviewed By: mruberry

Differential Revision: D25938106

Pulled By: ngimel

fbshipit-source-id: 95e2df02ba8583fa3ce87d4a2fdcd60b912dda46
2021-01-17 22:00:05 -08:00
Erjia Guan
ca5d9617ba Fix remainder type promotion (#48668)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48668

Combine tests for `fmod` and `remainder`.

## BC-breaking Note:
In order to make `remainder` operator have type promotion, we have to introduce BC breaking.
### 1.7.1:
In the case where the second argument is a python number, the result is casted to the dtype of the first argument.
```python
>>> torch.remainder(x, 1.2)
tensor([0, 0, 0, 0, 0], dtype=torch.int32)
```
### This PR:
In the case where the second argument is a python number, the dtype of result is determined by type promotion of both inputs.
```python
>>> torch.remainder(x, 1.2)
tensor([1.0000, 0.8000, 0.6000, 0.4000, 0.2000])
```

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D25869136

Pulled By: ejguan

fbshipit-source-id: 8e5e87eec605a15060f715952de140f25644008c
2021-01-12 22:09:30 -08:00
Erjia Guan
a0f7b18391 Fix fmod type promotion (#48278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48278

Remove various lines from tests due to no type promotion introduced from #47323

## BC-breaking Note:
In order to make `fmod` operator have type promotion, we have to introduce BC breaking.
### 1.7.1:
In the case where the second argument is a python number, the result is casted to the dtype of the first argument.
```python
>>> torch.fmod(x, 1.2)
tensor([0, 0, 0, 0, 0], dtype=torch.int32)
```
### Prior PR:
Check the BC-breaking note of #47323

### This PR:
In the case where the second argument is a python number, the dtype of result is determined by type promotion of both inputs.
```python
>>> torch.fmod(x, 1.2)
tensor([1.0000, 0.8000, 0.6000, 0.4000, 0.2000])
```

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D25869137

Pulled By: ejguan

fbshipit-source-id: bce763926731e095b75daf2e934bff7c03ff0832
2021-01-12 22:04:19 -08:00
Gregory Chanan
88bd69b488 Stop using c10::scalar_to_tensor in float_power. (#50105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50105

There should be no functional change here.

A couple of reasons here:
1) This function is generally an anti-pattern (https://github.com/pytorch/pytorch/issues/49758) and it is good to minimize its usage in the code base.
2) pow itself has a fair amount of smarts like not broadcasting scalar/tensor combinations and we should defer to it.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D25786172

Pulled By: gchanan

fbshipit-source-id: 89de03aa0b900ce011a62911224a5441f15e331a
2021-01-08 09:44:15 -08:00
Gregory Chanan
74dcb6d363 torch.xlogy: Use wrapped_scalar_tensor / gpu_with_scalars to speed up GPU kernel. (#49926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49926

While investigating https://github.com/pytorch/pytorch/issues/49758, I changed the xlogy kernel to use the recommended wrapped_scaler_tensor pattern instead of moving the scalar to the GPU as a tensor.
While this doesn't avoid a synchronization (there is no synchronization in the move, as its done via fill), this does significantly speed up the GPU kernel (almost ~50%, benchmark in PR comments).

From looking at the nvprof output, it looks like this code path avoids broadcasting.  Aside: this seems unnecessary, as there is nothing special from the point-of-view of broadcasting whether the Tensor
is ()-sized or marked as a wrapped_scalar.  Still, this is a useful change to make as we avoid extra kernel launches and dispatches to create and fill the tensor.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D25724215

Pulled By: gchanan

fbshipit-source-id: 4adcd5d8b3297502672ffeafc77e8af80592f460
2021-01-04 12:42:08 -08:00
kshitij12345
2780400904 [numpy] Add torch.xlogy (#48777)
Summary:
Reference https://github.com/pytorch/pytorch/issues/38349
Fixes https://github.com/pytorch/pytorch/issues/22656

TODO:
* [x] Add docs
* [x] Add tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48777

Reviewed By: ngimel

Differential Revision: D25681346

Pulled By: mruberry

fbshipit-source-id: 369e0a29ac8a2c44de95eec115bf75943fe1aa45
2020-12-22 15:05:59 -08:00
kshitij12345
2df249f0ab [fix] inplace remainder/% (#49390)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49214

**BC-Breaking**
Before this PR, `%=` didn't actually do the operation inplace and returned a new tensor.
After this PR, `%=` operation is actually inplace and the modified input tensor is returned.

Before PR,
```python
>>> import torch
>>> a = torch.tensor([11,12,13])
>>> id(a)
139627966219328
>>> a %= 10
>>> id(a)
139627966219264
```

After PR,
```python
>>> import torch
>>> a = torch.tensor([11,12,13])
>>> id(a)
139804702425280
>>> a %= 10
>>> id(a)
139804702425280
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49390

Reviewed By: izdeby

Differential Revision: D25560423

Pulled By: zou3519

fbshipit-source-id: 2b92bfda260582aa4ac22c4025376295e51f854e
2020-12-22 07:30:03 -08:00
Jeffrey Wan
5ab9593098 torch.reciprocal: promote integer inputs to float (#49102)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49091

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49102

Reviewed By: VitalyFedyunin

Differential Revision: D25639541

Pulled By: soulitzer

fbshipit-source-id: 1dd360bd7b77f106d606143d8d3961610bac8cb7
2020-12-18 16:17:30 -08:00
Erjia Guan
c98c98d77d Migrate fmod and fmod_ from TH to ATen (CUDA) (#47323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47323

Fixes #24565

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D24763086

Pulled By: ejguan

fbshipit-source-id: fa004baea19bbbdbeb44814903db29226805ef0e
2020-12-02 09:38:29 -08:00
FNSTER
30324d1e71 fix INTERNAL ASSERT FAILED for maximum (#48446)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48393

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48446

Reviewed By: zhangguanheng66

Differential Revision: D25240270

Pulled By: ngimel

fbshipit-source-id: 57fc223b98f2b6f96f2f24e1d9041644e3187262
2020-12-01 15:29:48 -08:00
Nikita Shulga
032e4f81a8 Fix test comparison ops check for scalar overflow (#48597)
Summary:
Test should verify, that all listed conditions throw, not just the first one
Refactor duplicated constants
Use `self.assertTrue()` instead of suppressing flake8 `B015: Pointless Comparison` warning

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48597

Reviewed By: mruberry

Differential Revision: D25222734

Pulled By: malfet

fbshipit-source-id: 7854f755a84f23a1a52dc74402582e34d69ff984
2020-11-30 12:39:28 -08:00
Mike Ruberry
36c87f1243 Refactors test_torch.py to be fewer than 10k lines (#47356)
Summary:
Creates multiple new test suites to have fewer tests in test_torch.py, consistent with previous test suite creation like test_unary_ufuncs.py and test_linalg.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47356

Reviewed By: ngimel

Differential Revision: D25202268

Pulled By: mruberry

fbshipit-source-id: 75fde3ca76545d1b32b86d432a5cb7a5ba8f5bb6
2020-11-28 20:11:40 -08:00