Commit Graph

162 Commits

Author SHA1 Message Date
soulitzer
83e8612d11 Clean up test autograd (#67413)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/66066

This PR:
 - cleans up op-specific testing from test_autograd. test_autograd should be reserved for testing generic autograd functionality
 - tests related to an operator are better colocated
 - see the tracker for details

What to think about when moving tests to their correct test suite:
 - naming, make sure its not too generic
 - how the test is parametrized, sometimes we need to add/remove a device/dtype parameter
 - can this be merged with existing tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67413

Reviewed By: jbschlosser, albanD

Differential Revision: D32031480

Pulled By: soulitzer

fbshipit-source-id: 8e13da1e58a38d5cecbfdfd4fe2b4fe6f816897f
2021-11-03 15:26:09 -07:00
kshitij12345
885a8e53ba replace onlyOnCPUAndCUDA with onlyNativeDeviceTypes (#65201)
Summary:
Reference https://github.com/pytorch/pytorch/issues/53849

Replace `onlyOnCPUandCUDA` with `onlyNativeDeviceTypes` which includes `cpu, cuda and meta`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65201

Reviewed By: mrshenli

Differential Revision: D31299718

Pulled By: mruberry

fbshipit-source-id: 2d8356450c035d6a314209ab51b2c237583920fd
2021-11-01 09:22:34 -07:00
Yukio Siraichi
83f70db95c Fix common device computation for comparison ops. (#66245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66245

Fixes #66053

This PR splits `declare_static_dtype_and_device` into two new methods for
`TensorIteratorBase`: `declare_static_dtype` and `declare_static_device`.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31503849

Pulled By: ngimel

fbshipit-source-id: 4b131b691d29ceb5f3709f5d6503997ea0875c54
2021-10-22 18:43:17 -07:00
Jane Xu
8a65047acc [skip ci] Set test owners for everything considered with module: tests (#66865)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66865

Reviewed By: anjali411

Differential Revision: D31771147

Pulled By: janeyx99

fbshipit-source-id: 8bebe5ac2098364ef1ee93b590abb5f4455b0f89
2021-10-20 09:37:03 -07:00
jiayisun
0b8dc0f04a add BFloat16 operators on CPU: logaddexp, logaddexp2, remainder (#63621)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63621

Reviewed By: H-Huang

Differential Revision: D31640811

Pulled By: mruberry

fbshipit-source-id: 1fd061b65c196398738018eefc52bf459e424b1c
2021-10-15 13:11:45 -07:00
Freey0
2223737da9 restore test_inplace_comparison_ops_require_inputs_have_same_dtype Expected behavior (#64267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64267

This test expects every operation to throw a runtime error.

And Reinsert in-place operation test,Fix bug for comparison operation

fix: #64018

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30720915

Pulled By: ezyang

fbshipit-source-id: 215a6556d20770f70f4ced1c1f9a9753933f1d37
2021-09-08 06:42:12 -07:00
Kevin Tse
7e4ebe06ca Fixes issue related torch.trapezoid broadcasting behavior and documentation (#64054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64054

Fixes #63608

cc mruberry rgommers heitorschueroff

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D30617078

Pulled By: NivekT

fbshipit-source-id: 815896ec56d447562790df4d662e94fd13457e2a
2021-09-07 11:41:55 -07:00
Philip Meier
26b7ff5aea deprecate dtype getters from torch.testing namespace (#63554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63554

Following https://github.com/pytorch/pytorch/pull/61840#issuecomment-884087809, this deprecates all the dtype getters publicly exposed in the `torch.testing` namespace. The reason for this twofold:

1. If someone is not familiar with the C++ dispatch macros PyTorch uses, the names are misleading. For example `torch.testing.floating_types()` will only give you `float32` and `float64` skipping `float16` and `bfloat16`.
2. The dtype getters provide very minimal functionality that can be easily emulated by downstream libraries.

We thought about [providing an replacement](https://gist.github.com/pmeier/3dfd2e105842ad0de4505068a1a0270a), but ultimately decided against it. The major problem is BC: by keeping it, either the namespace is getting messy again after a new dtype is added or we need to somehow version the return values of the getters.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D30662206

Pulled By: mruberry

fbshipit-source-id: a2bdb10ab02ae665df1b5b76e8afa9af043bbf56
2021-09-07 08:58:51 -07:00
Saketh Are
83e28a7d28 Use stacklevel for floordiv deprecation warnings (#64034)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60548

`Tensor.__floordiv__` was indirectly deprecated by deprecation of `torch.floor_divide` (see https://github.com/pytorch/pytorch/issues/43874). Deprecating it directly provides clearer feedback.

Repro:
```
import torch
x = torch.tensor(0)
x // 1
```

Before this change, a deprecation warning was triggered within the C++ implementation of floor_divide:
```
UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  ../aten/src/ATen/native/BinaryOps.cpp:571.)
  return torch.floor_divide(self, other)
```

After this change, the warning instead cites the user's offending line of Python code:
```
UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  x // 1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64034

Reviewed By: mruberry

Differential Revision: D30658010

Pulled By: saketh-are

fbshipit-source-id: b0e6c5008d741897509d102f4a89efb47de4aa2a
2021-08-31 11:27:56 -07:00
Kushashwa Ravi Shrimali
d37636901e [Doc] make_tensor to torch.testing module (#63925)
Summary:
This PR aims to add `make_tensor` to the `torch.testing` module in PyTorch docs.

TODOs:

* [x] Add examples

cc: pmeier mruberry brianjo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63925

Reviewed By: ngimel

Differential Revision: D30633487

Pulled By: mruberry

fbshipit-source-id: 8e5a1f880c6ece5925b4039fee8122bd739538af
2021-08-30 12:25:40 -07:00
Philip Meier
70a3210eca Add BinaryUfuncOpInfo and broadcasting tests (#61964)
Summary:
As proof of concept, this PR uses the new `BinaryUfuncOpInfo` in broadcasting tests for `add`, `sub`, `mul`, `div`, `floor_div`, and `true_div`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61964

Reviewed By: ngimel

Differential Revision: D30407734

Pulled By: mruberry

fbshipit-source-id: ada28994f43b0635f279f45a02ecba18bc8ee033
2021-08-20 11:44:15 -07:00
Shen Li
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
Zsolt Dollenstein
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
Kevin Tse
87465a6e68 adding operator cumulative_trapezoid (#61615)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* https://github.com/pytorch/pytorch/issues/61616
* **https://github.com/pytorch/pytorch/issues/61615**
* https://github.com/pytorch/pytorch/issues/61475

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61615

Reviewed By: malfet, mruberry

Differential Revision: D29975064

Pulled By: NivekT

fbshipit-source-id: 4d4e98f3efb720fdc44eb238ecbf0fa157ac13d7
2021-08-03 08:04:00 -07:00
kshitij12345
fd8004b42e add bfloat16 impl for nextafter (#61829)
Summary:
Add `BFloat16` support for `nextafter`.

* [x] Add OpInfo
* [x] Add Implementation Test (C++ tests)
* [x] Add credit

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61829

Reviewed By: ejguan

Differential Revision: D29932498

Pulled By: mruberry

fbshipit-source-id: 89524531a4800569ba1addd08a4ace330a6f72a4
2021-08-02 23:16:58 -07:00
Freey0
109bd5e78a OpInfo: bitwise_and (#61349)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61349

Also add type promotion test for bugs found by pr #60813

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29592840

Pulled By: ezyang

fbshipit-source-id: ee013b20e31baf6c6ebf2edb881ae6d8e215c7a6
2021-07-22 07:04:17 -07:00
Nikita Shulga
604f503d30 Revert D29794958 + compilation fix (#61937)
Summary:
This PR un-reverts https://github.com/pytorch/pytorch/issues/61475 + fixes compilation with MSVC, that does not recognize alternative operator spellings (i.e. using `or` instead of `||` )

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61937

Reviewed By: albanD

Differential Revision: D29805941

Pulled By: malfet

fbshipit-source-id: 01e5963c6717c1b44b260300d87ba0bf57f26ce9
2021-07-20 18:14:45 -07:00
Nikita Shulga
22fff61f06 Revert D29794958: [pytorch][PR] changing trapz to trapezoid
Test Plan: revert-hammer

Differential Revision:
D29794958 (95cec8f4fa)

Original commit changeset: 60b9c07efd47

fbshipit-source-id: 2dcda2d62e01c2521a86ae5ed8246cfb686d3f64
2021-07-20 16:00:46 -07:00
Kevin Tse
95cec8f4fa changing trapz to trapezoid (#61475)
Summary:
This PR resolves issue https://github.com/pytorch/pytorch/issues/52606 while also adding support for complex number

Stack from [ghstack](https://github.com/ezyang/ghstack):
* https://github.com/pytorch/pytorch/issues/61616
* https://github.com/pytorch/pytorch/issues/61615
* **https://github.com/pytorch/pytorch/issues/61475**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61475

Reviewed By: mruberry

Differential Revision: D29794958

Pulled By: NivekT

fbshipit-source-id: 60b9c07efd47fd85b9c8178768fc7828d7b57d29
2021-07-20 15:25:55 -07:00
Akifumi Imanishi
4d9fd8958b Support __rand__, __ror__ and __rxor__ (#59240)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58120.

This PR implements `torch.Tensor.{__rand__/__ror__/__rxor__}` for the compatibility with NumPy’s interface.
(cc: mruberry, rgommers, emcastillo, kmaehashi)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59240

Reviewed By: ngimel

Differential Revision: D29482304

Pulled By: mruberry

fbshipit-source-id: 13789202c1d8dddf8658a45381aeedcc31e2f603
2021-07-07 13:34:14 -07:00
Joel Schlosser
03b5a225a7 Test parametrization for instantiated device-specific tests (#60233)
Summary:
The `ops` decorator provides a way to parameterize a test across a given list of ops. This would be useful for modules as well (e.g. a `modules` decorator), but the mechanism by which this is accomplished is specific to ops. In the details, the `ops` decorator tags a test function with the metadata needed (list of ops, `dtypes`) and the actual tests are generated according to this metadata during the call to `instantiate_device_type_tests()`.

This PR makes this mechanism more generic, allowing for test parameterization across arbitrary dimensions. This makes a `modules` decorator (or any similar type of decorator) straightforward to implement without changes to the device-specific test instantiation logic.

One caveat is that, since this is implemented where the old `ops` decorator was (within `instantiate_device_type_tests()`), this only works for tests instantiated using the device-specific instantiation logic. Longer term, even device-specific test instantiation could be treated as an optional parameterization across device types, but this PR takes a low-risk approach for now. In practice, this just means that a `device` kwarg is required for all test signatures used with the mechanism.

The `ops` decorator has been refactored to use the generic mechanism and works the same as before, with one difference: when `OpDTypes.none` is specified, the test signature no longer needs an unused `dtype` kwarg. This is a nice bonus that demonstrates the added flexibility of a generic parameterization mechanism. The refactored form also has the bonus that all op-specific test generation logic is contained within the `ops` decorator class, improving readability.

Behind the scenes, the generic mechanism is a base decorator class (`_TestParameterizer`) from which `ops` derives. The core functionality is in the `_parameterize_test()` method, which takes in a test function and returns a generator that produces parameterized tests, including names and parameter kwargs to pass to them. Using the `ops` decorator results in a set of op-specific tests from a given generic test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60233

Reviewed By: iramazanli

Differential Revision: D29494995

Pulled By: jbschlosser

fbshipit-source-id: a14446488c106094fafcaa75ccf8e9e3faf33bfc
2021-06-30 18:50:22 -07:00
kshitij12345
dfd2edc025 [special] add zeta (#59623)
Summary:
Reference https://github.com/pytorch/pytorch/issues/50345

`zeta` was already present in the codebase to support computation of `polygamma`.

However, `zeta` only had `double(double, double)` signature **for CPU** before the PR (which meant that computation `polygamma` were always upcasted to `double` for zeta part).

With this PR, float computations will take place in float and double in double.

Have also refactored the code and moved the duplicate code from `Math.cuh` to `Math.h`

**Note**: For scipy, q is optional, and if it is `None`, it defaults `1` which corresponds to Reimann-Zeta. However, for `torch.specia.zeta`, I made it mandatory cause for me it feels odd without `q` this is Reimann-Zeta and with `q` it is the general Hurwitz Zeta. I think sticking to just general made more sense as passing `1` for q sounds trivial.

Verify:
* [x] Docs https://14234587-65600975-gh.circle-artifacts.com/0/docs/special.html#torch.special.zeta

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59623

Reviewed By: ngimel

Differential Revision: D29348269

Pulled By: mruberry

fbshipit-source-id: a3f9ebe1f7724dbe66de2b391afb9da1cfc3e4bb
2021-06-24 00:00:12 -07:00
Akifumi Imanishi
26cdec6ce4 Support torch.bitwise_{left/right}_shift and __rlshift__, __rrshift__ (#59544)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58121

This PR implements `torch.bitwise_left_shift` and `torch.bitwise_right_shift` and `torch.Tensor.{__rlshift__/__rrshift__}`for compatibility with Python array API standard.
(cc: mruberry, rgommers, emcastillo, kmaehashi)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59544

Reviewed By: ngimel

Differential Revision: D29348869

Pulled By: mruberry

fbshipit-source-id: 329aee296cf890735e8a9f858bccfe87c03d06ca
2021-06-23 23:57:16 -07:00
Masaki Kozuki
9e773ea7d5 Use accscalar_t for CUDA add/sub with Tensor and Scalar (#60454)
Summary:
Follow up of https://github.com/pytorch/pytorch/issues/60227, related to https://github.com/pytorch/pytorch/issues/59907 & https://github.com/pytorch/pytorch/issues/58833

With this pull request, `torch.add` & `torch.sub` use `acc_type` for `Scalar` if either of two arguments is `Scalar`.
This mimics the behavior of [`torch.mul`](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu#L18), `torch._foreach_(add|sub).Scalar` and `torch._foreach_(add|sub).ScalarList`.

 ---

**reference**
- torch.mul CUDA kernel: b0c9762e2d/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu (L17-L25)
- `torch._foreach_(add|sub).Scalar`: cast scalar b0c9762e2d/aten/src/ATen/native/cuda/ForeachBinaryOpScalar.cu (L27)
- `torch._foreach_(add|sub).ScalarList`: `BinaryOpScalarListFunctor` b0c9762e2d/aten/src/ATen/native/cuda/ForeachFunctors.cuh (L180-L182) and multi_tensor_apply handles `scalar_t` and computes `opmath_t` (almost equivalent `accscalar_t`)  b0c9762e2d/aten/src/ATen/native/cuda/MultiTensorApply.cuh (L60-L68). BinaryOpScalarListFunctor
is used b0c9762e2d/aten/src/ATen/native/cuda/ForeachBinaryOpScalarList.cu (L24)

cc ngimel ptrblck mcarilli

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60454

Reviewed By: VitalyFedyunin

Differential Revision: D29345035

Pulled By: ngimel

fbshipit-source-id: 5dbafbdfe029a9544ec2e58f17d547928e017a04
2021-06-23 18:59:22 -07:00
Philip Meier
0c916c8a4e up the priority of numpy array comparisons in self.assertEqual (#59067)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58988.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59067

Reviewed By: jbschlosser

Differential Revision: D28986642

Pulled By: heitorschueroff

fbshipit-source-id: 3ef2d26b4010fc3519d0a1a020ea446ffeb46ba0
2021-06-22 13:07:07 -07:00
Masaki Kozuki
b298013cd5 [add/sub] Cast alpha to acc_type (#60227)
Summary:
This PR lets `torch.add` & `torch.sub` CUDA kernels cast `alpha` to `acc_type`, not `scalar_t`.
I do not remove `cast`s from `test/test_foreach.py` because I'll do this in https://github.com/pytorch/pytorch/issues/59907 or follow-up for it.

Current upstream `torch._foreach_add` & `torch._foreach_sub` upcast `alpha` parameter to `acc_type<scalar_t>` while `torch.add` & `torch.sub` not. This is kind of problematic because outputs of `torch.add` and `torch.sub` are different from `torch._foreach_add` and `torch._foreach_sub`, respectively if the dtype of input tensors is either `torch.half` or `torch.bfloat16`. The discrepancy is proportional-ish to `abs(alpha)` except when `alpha` is representable with 16 bits.

ref:
- `torch._foreach_add` & `torch._foreach_sub` cast `alpha`: 6d0fb85a62/aten/src/ATen/native/cuda/ForeachBinaryOpList.cu (L21-L28), `BinaryOpListAlphaFunctor` is defined here: 6d0fb85a62/aten/src/ATen/native/cuda/ForeachFunctors.cuh (L202)

related: https://github.com/pytorch/pytorch/issues/58833, https://github.com/pytorch/pytorch/pull/59907

cc ngimel ptrblck mcarilli

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60227

Reviewed By: mruberry

Differential Revision: D29252759

Pulled By: ngimel

fbshipit-source-id: 847f3b9493ae30a900f7445af00aef1abcc1ab21
2021-06-20 19:05:22 -07:00
Akifumi Imanishi
0a5bfa9919 Support __rmod__ (#58476)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58035.

This PR implements `torch.Tensor.__rmod__` and `torch.remainder(scalar, tensor)` for the compatibility with NumPy’s interface.
(cc: mruberry, rgommers, emcastillo, kmaehashi)

TODO:
  - [x] Update `tensor_binary_op` in test/test_binary_ufuncs.py after https://github.com/pytorch/pytorch/issues/58216 is merged.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58476

Reviewed By: ngimel

Differential Revision: D28776810

Pulled By: mruberry

fbshipit-source-id: 74f8aea80f439ef2cc370333524e39971eeb7bf4
2021-06-05 16:19:24 -07:00
Akifumi Imanishi
3113a1de4a Fix some tensor operators to return NotImplemented for invalid inputs (#58216)
Summary:
Same as https://github.com/pytorch/pytorch/issues/57934. (cc/ albanD)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58216

Reviewed By: ailzhang

Differential Revision: D28494886

Pulled By: albanD

fbshipit-source-id: 380205867ee1cde90e1c6fcfe2a31749e1243530
2021-05-19 13:09:57 -07:00
Xue Haotian
098d9975a7 Port heaviside to structured kernel (#57933)
Summary:
Port heaviside to structured kernel
Related https://github.com/pytorch/pytorch/issues/55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57933

Reviewed By: mruberry

Differential Revision: D28362533

Pulled By: ezyang

fbshipit-source-id: 96b4591db3f609434784bd0ef9e54c61c918fb88
2021-05-13 10:48:11 -07:00
Alban Desmaison
5e83c62a9e Revert D28351931: [pytorch][PR] Fix some tensor operators to return NotImplemented for invalid inputs
Test Plan: revert-hammer

Differential Revision:
D28351931 (35521a2629)

Original commit changeset: 985457a44dba

fbshipit-source-id: 10724c219e53648f10a70719e25bcf774c6c7852
2021-05-12 13:58:03 -07:00
Akifumi Imanishi
35521a2629 Fix some tensor operators to return NotImplemented for invalid inputs (#57934)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57719.

This PR fixes `torch.Tensor{__rsub__, __rdiv__, __rtruediv__, __pow__, __rmatmul__}` to return `NotImplemented` instead of raising a `TypeError`.

cc/ mruberry: The first commit of this PR is the same as 1d209db1cc excepts the commit message.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57934

Reviewed By: mruberry

Differential Revision: D28351931

Pulled By: albanD

fbshipit-source-id: 985457a44dba24d2496794dfb8c1661cbcd4ff8f
2021-05-12 11:03:23 -07:00
Akifumi Imanishi
14282232d9 Fix generate_not_implemented_tests not testing unknown types correctly (#56997)
Summary:
Currently, the test code is not testing unknown types correctly because `op` is overwritten in the for-loop (i.e., currently only `__ior__` is tested).
This PR fixes the test `generate_not_implemented_tests` to bind operator name to each method, and remove operators currently unsupported (`__rand__`, …).

cc/ mruberry This fix is be needed to add tests for the operator we are going to introduce (e.g., `__rand__`)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56997

Reviewed By: astaff

Differential Revision: D28118465

Pulled By: mruberry

fbshipit-source-id: c5a466a7604262ed5490862300d47043aff63d0b
2021-05-09 05:34:10 -07:00
Wenlei Xie
20085f6d23 Support auto generation of device check (#56872)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56872

ghstack-source-id: 127914018

Test Plan: auto test

Reviewed By: ezyang

Differential Revision: D27986429

fbshipit-source-id: 0da8413b0b8e6810fcea27ed1de499f11f68bd1f
2021-05-01 12:02:09 -07:00
kshitij12345
d4ddb47719 [special] Add xlog1py (#55138)
Summary:
Reference : https://github.com/pytorch/pytorch/issues/50345

* [x] Check Rendered Document (https://12494173-65600975-gh.circle-artifacts.com/0/docs/special.html#torch.special.xlog1py)
* [x] Tests in Binary Ufunc
* [x] OpInfo
* [x] Structured Kernel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55138

Reviewed By: ngimel

Differential Revision: D27961461

Pulled By: mruberry

fbshipit-source-id: 30a8f41970a829bf50254aadf5615e8ce4148c7e
2021-04-30 05:51:13 -07:00
Peter Bell
5536cda19a Update floor_divide behavior in line with NumPy 1.20 (#56893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56893

Fixes gh-56814

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28025814

Pulled By: mruberry

fbshipit-source-id: 8654978ea1d5aa7c12bcf5a8c939966287a2d34e
2021-04-28 05:01:23 -07:00
Brian Hirsh
e8faf69739 fix torch.pow type promotion issue (#54085)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54085

Fixes https://github.com/pytorch/pytorch/issues/50121.

This fixes two similar issues pointed out with the dtype that `torch.pow` performs its computation. Thanks ngimel for spotting the issues originally (comments [here](https://github.com/pytorch/pytorch/pull/53669#discussion_r594624355) and [here](https://github.com/pytorch/pytorch/pull/53669#discussion_r594719704))!

Before:
```
>>> torch.pow(2, torch.tensor([17], dtype=torch.uint8), out=torch.tensor([0]))
tensor([0])
>>> torch.pow(2, torch.tensor(17, dtype=torch.uint8), out=torch.tensor(0))
tensor(131072)
>>> torch.pow(2, torch.tensor([17], dtype=torch.uint8, device='cuda'), out=torch.tensor([0], device='cuda'))
tensor([131072], device='cuda:0')
>>> torch.pow(2, torch.tensor(17, dtype=torch.uint8, device='cuda'), out=torch.tensor(0, device='cuda'))
tensor(131072, device='cuda:0')
```

After:
```
>>> torch.pow(2, torch.tensor([17], dtype=torch.uint8), out=torch.tensor([0]))
tensor([0])
>>> torch.pow(2, torch.tensor(17, dtype=torch.uint8), out=torch.tensor(0))
tensor(0)
>>> torch.pow(2, torch.tensor([17], dtype=torch.uint8, device='cuda'), out=torch.tensor([0], device='cuda'))
tensor([0], device='cuda:0')
>>> torch.pow(2, torch.tensor(17, dtype=torch.uint8, device='cuda'), out=torch.tensor(0, device='cuda'))
tensor(0, device='cuda:0')
```

In all four cases above, `tensor(0, ...)` is the correct value because the computed "common dtype" among the inputs is expected to be `uint8`. Computing `2 ** 7` in uint8 will then overflow to zero. Finally, we cast the computed output to the output tensor's dtype, which is `int32`.

There were two separate issues fixed in this PR: one for cpu and one for cuda:
* For CPU, The `pow(Scalar, Tensor)` overload wasn't calling `set_wrapped_number(true)` after wrapping the scalar in a Tensor, which caused the "promoted" scalar to incorrectly participate in type promotion (see the documented behavior [here](aa8714dfed/c10/core/TensorImpl.h (L590)))
* For CUDA, the cuda kernels defined in `PowKernel.cu` were using the output's dtype to run the computation, instead of the common dtype.

As an aside: The CPU and CUDA kernels actually both use `iter.dtype()` instead of `iter.common_dtype()` to run the computation, which I fixed. The reason that only manifested here for CUDA is because TensorIterator has cpu-specific logic to create temporary outputs with the intermediate dtype (shown [here](aa8714dfed/aten/src/ATen/TensorIterator.cpp (L349))). I'm not sure what the end state is there- I can imagine that being something we're more okay doing for cpu than for cuda, but it also leads to hard-to-track-down inconsistencies between the two like in this case.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27096330

Pulled By: bdhirsh

fbshipit-source-id: a7e2909243851625cb3056d1e7abb2383bfe95f2
2021-04-15 08:55:53 -07:00
Winston Smith
aceceb3d5c Reland #50999 (Added pow() on CPU for float16 & bfloat16) (#55280)
Summary:
#### Reason for relanding
Line 1607 of `torch/testing/_internal/common_methods_invocations.py` of https://github.com/pytorch/pytorch/issues/50999  had `dtype` instead of `dtype=torch.bool`, so 4 of the 9 sample inputs for `bool` had incorrect dtype. This bug was caught by https://github.com/pytorch/pytorch/issues/54949.

1. Added support for pow() on CPU for `float16` (`Half`) and `bfloat16` types.
Both `pow(Tensor, Scalar)` and `pow(Tensor, Tensor)` are now supported for the aforementioned types.
However autograd isn't supported for `Float16` on CPU yet, as `log_vml_cpu` can't be enabled for it.
2. heitorschueroff added `pow_tensor_scalar_optimized_kernel` to refactor & simplify `PowKernel.cpp`.
It provides a common path for all the complex types & floating point types (except Float16, due to lack of complete AVX2 vectorization support for it).  It replaced code that had previously been duplicated for (float, double) and complex types,
so PowKernel.cpp looks a lot cleaner now.
3. Enabled (unskipped) some tests for `erf`, `erfc`,`erfinv`, `tan` and `linalg.vector.norm` which were being skipped earlier due to `pow()` not having been implemented for `float16` & `bfloat16`.
4. Added an OpInfo for `pow()` & enabled some test cases for `pow()`.
5. Extended the coverage of existing tests for `pow` in `test_binary_ufuncs.py` in order to enable comparison with `numpy`, even with discontiguous tensors, and added a test to ensure that a runtime error is raised for `pow`'s inplace variant if resizing the base tensor is required during its invocation.
6. Added `float16` & `bfloat16` to `square`'s dtype lists in its `UnaryUfuncInfo`.
7. Removed redundant `dtypesIfCPU` and `dtypesIfCUDA` from `OpInfo`s where they are equal to `dtypes`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55280

Reviewed By: jbschlosser

Differential Revision: D27591772

Pulled By: heitorschueroff

fbshipit-source-id: c7420811b32595bb3353149a61e54a73f2eb352b
2021-04-13 13:23:29 -07:00
Richard Zou
1e70d217e7 Add error message for complex alpha and non-complex inputs (#54964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54964

Previously, the following would error out with a strange error message:
```
import torch
x=torch.randn(2)
torch.rsub(x, 1, alpha=2j)

Traceback (most recent call last)
<ipython-input-2-caf2a1c03d0b> in <module>
      1 import torch
      2 x=torch.randn(2)
----> 3 torch.rsub(x, 1, alpha=2j)

RuntimeError: value cannot be converted to type float without overflow: (-0,-2)
```

The reason why this is happening is because the alpha check doesn't check for if `x` is not complex and `alpha` is complex.
The error gets thrown further along in the implementation of torch.sub,
when it coerces `alpha` to be the same dtype as the input tensor:
https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cpu/BinaryOpsKernel.cpp#L53

This PR fixes the bad error message by adding a new check to the alpha check.

Test Plan:
- pytest test/test_binary_ufuncs.py
- NB: add, sub, and rsub all share the same alpha check. The test only tests it for torch.add, but that should be sufficient.

Reviewed By: gchanan

Differential Revision: D27504017

Pulled By: zou3519

fbshipit-source-id: 70b9aa75a7a4faaaa93f6ba235cae85998a91697
2021-04-07 14:12:34 -07:00
Peter Bell
2ee02b30b1 Replace rounding_mode="true" with rounding_mode=None (#51988)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51988

* **#51988 Replace rounding_mode="true" with rounding_mode=None**

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27561817

Pulled By: mruberry

fbshipit-source-id: 60d1d9c389570f60d599fc1876518717367fb368
2021-04-05 14:53:43 -07:00
Nikita Shulga
8377e6221a Revert D27478225: [pytorch][PR] Added pow() on CPU for float16 & bfloat16
Test Plan: revert-hammer

Differential Revision:
D27478225 (6d030c14cf)

Original commit changeset: d309dd98d5a9

fbshipit-source-id: e0518f15185b41946caf3a8456c7af3f52e5a910
2021-04-03 10:26:44 -07:00
Winston Smith
6d030c14cf Added pow() on CPU for float16 & bfloat16 (#50999)
Summary:
Added the functionality desired in https://github.com/pytorch/pytorch/issues/50789.

1. Added support for pow() on CPU for `float16` (`Half`) and `bfloat16` types.
Both `pow(Tensor, Scalar)` and `pow(Tensor, Tensor)` are now supported for the aforementioned types.
However autograd isn't supported for `Float16` on CPU yet, as `log_vml_cpu` can't be enabled for it.
2. heitorschueroff added `pow_tensor_scalar_optimized_kernel` to refactor & simplify `PowKernel.cpp`.
It provides a common path for all the complex types & floating point types (except Float16, due to lack of complete AVX2 vectorization support for it).  It replaced code that had previously been duplicated for (float, double) and complex types,
so PowKernel.cpp looks a lot cleaner now.
3. Enabled (unskipped) some tests for `erf`, `erfc`,`erfinv`, `linalg.norm` and `linalg.vector.norm` which were being skipped earlier due to `pow()` not having been implemented for `float16` & `bfloat16`.
4. Added an OpInfo for `pow()` & enabled some test cases for `pow()`.
5. Extended the coverage of existing tests for `pow` in `test_binary_ufuncs.py` in order to enable comparison with `numpy`, even with discontiguous tensors, and added a test to ensure that a runtime error is raised for `pow`'s inplace variant if resizing the base tensor is required during its invocation.
6. Added `float16` & `bfloat16` to `square`'s dtype lists in its `UnaryUfuncInfo`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50999

Reviewed By: zou3519

Differential Revision: D27478225

Pulled By: heitorschueroff

fbshipit-source-id: d309dd98d5a96d0cb9b08281757bb1c65266d011
2021-04-02 15:57:06 -07:00
kshitij12345
bac566bf61 torch.square : OpInfo and minor fixes (#52551)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/42515

Add `out` variant to be consistent with Unary Ops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52551

Reviewed By: heitorschueroff

Differential Revision: D27233482

Pulled By: mruberry

fbshipit-source-id: fef6f241849a12c46028bd1aad8f5ecc1dc65ea1
2021-03-24 00:04:42 -07:00
kshitij12345
b93ab10b7a torch.lerp: cuda complex support (#54129)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54048

TODO
* [x] Add test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54129

Reviewed By: bdhirsh

Differential Revision: D27261878

Pulled By: anjali411

fbshipit-source-id: 10937a2eab944c73b5a98ec6278f50a876b8c7dc
2021-03-23 19:58:43 -07:00
Brian Hirsh
779cae9e42 port at::pow to structured (#53669)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53669

This PR does two things:
* Ports `pow` to be structured
* Fixes a bug with how pow handles mixed cpu and cuda tensors

**bug fix**
Pow is a binary op, and all binary ops that use TensorIterator are currently written to handle the case when one of the inputs is a CUDA tensor, and the other is a zero-dimensional cpu tensor.

`pow` incidentally only handles one of the two cases: it fails when the CUDA tensor is passed as the exponent, e.g. `at::pow(torch.tensor(2.0, device='cpu'), torch.tensor([2, 2], device='cuda'))`. Porting `pow` to structured happened to change the error that was outputted from a `TORCH_CHECK` in TensorIterator to an `INTERNAL_ASSERT` in loop.cuh, so I ended up trying to fix the error and update the tests. I added more details in a comment on the PR.

**notes on the structured port**
Pow is a little weird, so I wrote down a couple of issues I noticed during the port:
* Multiple independent overloads. `pow` has two overloads that have their own cpu/cuda kernels, meaning one doesn't call the other. I have to update the names of the kernel overloads to make the compiler happy, since the codegen would otherwise try to generate two classes with the same name. `pow` actually has 3 overloads that all have `out` variants, so I ported all 3 to structured- one of them just happens to redispatch one of the others in most cases.
* Name propagation. Is name propagation implemented per operator? Or is expected to work for most/all ops by default. Right now it looks like it happens for TensorIterator ops by default. For ops that don't use TensorIterator, we need to explicitly pass the names through to the `set_output()` call in the meta function. This happened to matter for `pow` because it has 3 overloads, but only two of them directly use TensorIterator. I had to pass names directly to `set_output` in the 3rd overload to make tests happy.
*  Lack of `const Tensor &` in the C++ API. It's a goal to slowly make all `Tensor &` arguments const as part of the structured port, but in this case I needed to explicitly cast constness away because one structured kernel called back into the C++ API, which still has ordinary `Tensor &` arguments. This probably isn't something we'll fix soon, since we have boxing logic that actually relies on the `Tensor &` / `const Tensor &` distinction in some places.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D27029821

Pulled By: bdhirsh

fbshipit-source-id: c1786e770de6e6c2474b9a48210b88057ab1018e
2021-03-19 14:30:48 -07:00
mattip
54a2498919 Modify tests to use assertWarnsOnceRegex instead of maybeWarnsRegex (#52387)
Summary:
Related to https://github.com/pytorch/pytorch/issues/50006

Follow on for https://github.com/pytorch/pytorch/issues/48560 to ensure TORCH_WARN_ONCE warnings are caught. Most of this is straight-forward find-and-replace, but I did find one place where the TORCH_WARN_ONCE warning was not wrapped into a python warning.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52387

Reviewed By: albanD

Differential Revision: D26773387

Pulled By: mruberry

fbshipit-source-id: 5be7efbc8ab4a32ec8437c9c45f3b6c3c328f5dd
2021-03-08 03:32:14 -08:00
Natalia Gimelshein
3309f034aa remove pointless test (#52609)
Summary:
Fixes T81870118

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52609

Reviewed By: mruberry

Differential Revision: D26584288

Pulled By: ngimel

fbshipit-source-id: 7cec37db46cfe5b5b2fd21fe7c3e3fcbb8aba049
2021-02-22 16:25:04 -08:00
Gregory Chanan
983347fa25 Allow broadcasting against lerp weights. (#52319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52319

Fixes: https://github.com/pytorch/pytorch/issues/52254

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D26488411

Pulled By: gchanan

fbshipit-source-id: 60eb471609986584c4235ba7f263581e988e7642
2021-02-18 09:53:25 -08:00
Mike Ruberry
594a66d778 Warn about floor_divide performing incorrect rounding (#50281) (#50281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51745

Test Plan: Imported from OSS

Reviewed By: ngimel

Pulled By: mruberry

Differential Revision: D26257855

fbshipit-source-id: e5d497cf07b0c746838ed081c5d0e82fb4cb701b
2021-02-10 03:13:34 -08:00
Peter Bell
b150f150ba Add division overload with rounding_mode selection (#51706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51706

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50280

As mentioned in gh-43874, this adds a `rounding_mode={'true', 'trunc', 'floor'}`
argument so `torch.div` can be used as a replacement for `floor_divide` during
the transitional period.

I've included dedicated kernels for truncated and floor division which
aren't strictly necessary for float, but do perform significantly better (~2x) than
doing true division followed by a separate rounding kernel.

Note: I introduce new overloads for `aten::div` instead of just adding a default
`rounding_mode` because various JIT passes rely on the exact operator schema.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D26123271

Pulled By: mruberry

fbshipit-source-id: 51a83717602114597ec9c4d946e35a392eb01d46
2021-02-04 13:08:36 -08:00
kiyosora
4803eaf502 Implement NumPy-like function torch.fmax() & torch.fmin() (#49312)
Summary:
- Implementing the NumPy-like function`torch.fmax()` and `torch.fmin()` recommended in https://github.com/pytorch/pytorch/issues/48440

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49312

Reviewed By: izdeby

Differential Revision: D25887246

Pulled By: heitorschueroff

fbshipit-source-id: d762eeff8b328bfcbe7d48b7ee9d2da72c249691
2021-01-20 06:45:25 -08:00
76181208+imaginary-person@users.noreply.github.com
3f052ba07b Remove unnecessary dtype checks for complex types & disable complex dispatch for CPU min/max pointwise ops (#50465)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50064

**PROBLEM DESCRIPTION:**
1. Had not removed dtype checks for complex types in the previous PR (https://github.com/pytorch/pytorch/issues/50347) for this issue.
These type-checks were added in https://github.com/pytorch/pytorch/issues/36377, but are no longer necessary,
as we now rely upon dispatch macros to produce error messages.
2. dtype checks in `clamp_max()` and `clamp_min()` for complex inputs had not been removed either.
3. For min/max pointwise ops in TensorCompareKernel.cpp, complex dispatch had not been removed for min/max functions.

### **FIX DESCRIPTION:**
**FIX SUMMARY:**
1. Removed dtype checks added in https://github.com/pytorch/pytorch/issues/36377, and added 3 more in TensorCompare.cpp.
2. Removed dtype checks for complex inputs in `clamp_max()` and `clamp_min()`.
3.  Disabled complex dispatch for min/max pointwise ops in TensorCompareKernel.cpp.
4. Error messages in the exceptions raised due to min/max ops not being implemented are now checked for containing the text _not support_ (which can also be present in _not supported_), or _not implemented_, so one of them should be a part of error messages, in order for them to be informative.

**REASON FOR NOT CHANGING DISPATCH FOR CUDA AND CLAMP OPS**:

As for the CUDA min/max operations, their kernels do not seem to be compiled & dispatched for complex types anyway, so no further changes seem to be required. Basically, the dispatch macros currently being used don't have cases for complex types.

For example,

1. the reduce CUDA ops use [AT_DISPATCH_ALL_TYPES_AND2 (678fe9f077)](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Dispatch.h#L548-L575) in [ReduceMinMaxKernel.cu](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/ReduceMinMaxKernel.cu), and that macro doesn't allow complex types.

2. In [MinMaxElementwiseKernel.cu](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/MaxMinElementwiseKernel.cu), the CUDA pointwise ops use [`AT_DISPATCH_FLOATING_TYPES_AND2 (678fe9f077)`](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Dispatch.h#L240-L263) for non-integral & non-boolean types, and this marco doesn't have a case for complex types either.

3. [clamp CUDA ops](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/UnaryOpsKernel.cu#L170-L211) use `AT_DISPATCH_ALL_TYPES_AND2 (678fe9f077)`, which doesn't have a case for complex types.

Similarly, [CPU clamp min/max ops](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cpu/UnaryOpsKernel.cpp#L428-L458) use the `AT_DISPATCH_ALL_TYPES_AND `dispatch macro, which doesn't have a case for complex types.

**REASON FOR ADDING 3 dtype CHECKS:**
There are a few cases in which the methods corresponding to `min_stub()` or `max_stub()` are not called, so dispatch macros don't get invoked, resulting in no exceptions being raised. Hence, `dtype` checks are necessary at 3 places to raise exceptions:

1. 52dcc72999/aten/src/ATen/native/TensorCompare.cpp (L342)
2. 52dcc72999/aten/src/ATen/native/TensorCompare.cpp (L422)
3. 52dcc72999/aten/src/ATen/native/TensorCompare.cpp (L389)

The first dtype check requirement can be verified from the following example Python code based on `test_complex_unsupported()`:
```
import unittest
import torch

class MyTestCase(unittest.TestCase):

   def test_1(self):
      t = torch.tensor((1 + 1j), device='cpu', dtype=torch.complex128)
      with self.assertRaises(Exception):
         torch.max(t, dim=0)

if __name__ == '__main__':
    unittest.main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50465

Reviewed By: mruberry

Differential Revision: D25938106

Pulled By: ngimel

fbshipit-source-id: 95e2df02ba8583fa3ce87d4a2fdcd60b912dda46
2021-01-17 22:00:05 -08:00
Erjia Guan
ca5d9617ba Fix remainder type promotion (#48668)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48668

Combine tests for `fmod` and `remainder`.

## BC-breaking Note:
In order to make `remainder` operator have type promotion, we have to introduce BC breaking.
### 1.7.1:
In the case where the second argument is a python number, the result is casted to the dtype of the first argument.
```python
>>> torch.remainder(x, 1.2)
tensor([0, 0, 0, 0, 0], dtype=torch.int32)
```
### This PR:
In the case where the second argument is a python number, the dtype of result is determined by type promotion of both inputs.
```python
>>> torch.remainder(x, 1.2)
tensor([1.0000, 0.8000, 0.6000, 0.4000, 0.2000])
```

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D25869136

Pulled By: ejguan

fbshipit-source-id: 8e5e87eec605a15060f715952de140f25644008c
2021-01-12 22:09:30 -08:00
Erjia Guan
a0f7b18391 Fix fmod type promotion (#48278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48278

Remove various lines from tests due to no type promotion introduced from #47323

## BC-breaking Note:
In order to make `fmod` operator have type promotion, we have to introduce BC breaking.
### 1.7.1:
In the case where the second argument is a python number, the result is casted to the dtype of the first argument.
```python
>>> torch.fmod(x, 1.2)
tensor([0, 0, 0, 0, 0], dtype=torch.int32)
```
### Prior PR:
Check the BC-breaking note of #47323

### This PR:
In the case where the second argument is a python number, the dtype of result is determined by type promotion of both inputs.
```python
>>> torch.fmod(x, 1.2)
tensor([1.0000, 0.8000, 0.6000, 0.4000, 0.2000])
```

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D25869137

Pulled By: ejguan

fbshipit-source-id: bce763926731e095b75daf2e934bff7c03ff0832
2021-01-12 22:04:19 -08:00
Gregory Chanan
88bd69b488 Stop using c10::scalar_to_tensor in float_power. (#50105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50105

There should be no functional change here.

A couple of reasons here:
1) This function is generally an anti-pattern (https://github.com/pytorch/pytorch/issues/49758) and it is good to minimize its usage in the code base.
2) pow itself has a fair amount of smarts like not broadcasting scalar/tensor combinations and we should defer to it.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D25786172

Pulled By: gchanan

fbshipit-source-id: 89de03aa0b900ce011a62911224a5441f15e331a
2021-01-08 09:44:15 -08:00
Gregory Chanan
74dcb6d363 torch.xlogy: Use wrapped_scalar_tensor / gpu_with_scalars to speed up GPU kernel. (#49926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49926

While investigating https://github.com/pytorch/pytorch/issues/49758, I changed the xlogy kernel to use the recommended wrapped_scaler_tensor pattern instead of moving the scalar to the GPU as a tensor.
While this doesn't avoid a synchronization (there is no synchronization in the move, as its done via fill), this does significantly speed up the GPU kernel (almost ~50%, benchmark in PR comments).

From looking at the nvprof output, it looks like this code path avoids broadcasting.  Aside: this seems unnecessary, as there is nothing special from the point-of-view of broadcasting whether the Tensor
is ()-sized or marked as a wrapped_scalar.  Still, this is a useful change to make as we avoid extra kernel launches and dispatches to create and fill the tensor.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D25724215

Pulled By: gchanan

fbshipit-source-id: 4adcd5d8b3297502672ffeafc77e8af80592f460
2021-01-04 12:42:08 -08:00
kshitij12345
2780400904 [numpy] Add torch.xlogy (#48777)
Summary:
Reference https://github.com/pytorch/pytorch/issues/38349
Fixes https://github.com/pytorch/pytorch/issues/22656

TODO:
* [x] Add docs
* [x] Add tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48777

Reviewed By: ngimel

Differential Revision: D25681346

Pulled By: mruberry

fbshipit-source-id: 369e0a29ac8a2c44de95eec115bf75943fe1aa45
2020-12-22 15:05:59 -08:00
kshitij12345
2df249f0ab [fix] inplace remainder/% (#49390)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49214

**BC-Breaking**
Before this PR, `%=` didn't actually do the operation inplace and returned a new tensor.
After this PR, `%=` operation is actually inplace and the modified input tensor is returned.

Before PR,
```python
>>> import torch
>>> a = torch.tensor([11,12,13])
>>> id(a)
139627966219328
>>> a %= 10
>>> id(a)
139627966219264
```

After PR,
```python
>>> import torch
>>> a = torch.tensor([11,12,13])
>>> id(a)
139804702425280
>>> a %= 10
>>> id(a)
139804702425280
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49390

Reviewed By: izdeby

Differential Revision: D25560423

Pulled By: zou3519

fbshipit-source-id: 2b92bfda260582aa4ac22c4025376295e51f854e
2020-12-22 07:30:03 -08:00
Jeffrey Wan
5ab9593098 torch.reciprocal: promote integer inputs to float (#49102)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49091

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49102

Reviewed By: VitalyFedyunin

Differential Revision: D25639541

Pulled By: soulitzer

fbshipit-source-id: 1dd360bd7b77f106d606143d8d3961610bac8cb7
2020-12-18 16:17:30 -08:00
Erjia Guan
c98c98d77d Migrate fmod and fmod_ from TH to ATen (CUDA) (#47323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47323

Fixes #24565

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D24763086

Pulled By: ejguan

fbshipit-source-id: fa004baea19bbbdbeb44814903db29226805ef0e
2020-12-02 09:38:29 -08:00
FNSTER
30324d1e71 fix INTERNAL ASSERT FAILED for maximum (#48446)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48393

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48446

Reviewed By: zhangguanheng66

Differential Revision: D25240270

Pulled By: ngimel

fbshipit-source-id: 57fc223b98f2b6f96f2f24e1d9041644e3187262
2020-12-01 15:29:48 -08:00
Nikita Shulga
032e4f81a8 Fix test comparison ops check for scalar overflow (#48597)
Summary:
Test should verify, that all listed conditions throw, not just the first one
Refactor duplicated constants
Use `self.assertTrue()` instead of suppressing flake8 `B015: Pointless Comparison` warning

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48597

Reviewed By: mruberry

Differential Revision: D25222734

Pulled By: malfet

fbshipit-source-id: 7854f755a84f23a1a52dc74402582e34d69ff984
2020-11-30 12:39:28 -08:00
Mike Ruberry
36c87f1243 Refactors test_torch.py to be fewer than 10k lines (#47356)
Summary:
Creates multiple new test suites to have fewer tests in test_torch.py, consistent with previous test suite creation like test_unary_ufuncs.py and test_linalg.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47356

Reviewed By: ngimel

Differential Revision: D25202268

Pulled By: mruberry

fbshipit-source-id: 75fde3ca76545d1b32b86d432a5cb7a5ba8f5bb6
2020-11-28 20:11:40 -08:00