Commit Graph

1571 Commits

Edward Yang
f9a0abfc43 Fix code review from #48659 and #48116 (#48731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48731

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D25278034

Pulled By: ezyang

fbshipit-source-id: 73652311b48d8d80c06e9385b7ff18ef3a158ae8
2020-12-03 08:26:17 -08:00
kshitij12345
90a3049a9a [fix] repr(torch.device) (#48655)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48585

In commit 4c9eb57914, the type of `DeviceIndex` was changed from `uint16_t` to `uint8_t`.
`uint8_t` is treated as an ASCII character by std::cout and other stream operators, hence the broken `repr`.

Stackoverflow Reference: https://stackoverflow.com/questions/19562103/uint8-t-cant-be-printed-with-cout
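
For reference, a minimal check of the fixed behavior from Python (any `uint8_t` streamed through `std::cout` renders as a raw character, which is what garbled the index):

```python
import torch

d = torch.device('cuda', 0)
print(repr(d))  # device(type='cuda', index=0) -- the index prints as an integer again
```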

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48655

Reviewed By: bdhirsh

Differential Revision: D25272289

Pulled By: ezyang

fbshipit-source-id: a1549f5f8d417138cf38795e4c373e3a487d3691
2020-12-02 15:48:17 -08:00
Erjia Guan
c98c98d77d Migrate fmod and fmod_ from TH to ATen (CUDA) (#47323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47323

Fixes #24565

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D24763086

Pulled By: ejguan

fbshipit-source-id: fa004baea19bbbdbeb44814903db29226805ef0e
2020-12-02 09:38:29 -08:00
Edward Yang
b4f5efa7b2 Structured kernels generate Meta registrations (#48116)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48116

If you port kernels to be structured, you get Meta kernels automatically
generated for you.  This is one payoff of structured kernels.

Code generation was mercifully simple, although at risk of
"swiss cheese" syndrome: there are two new conditionals in the codegen
to tweak behavior when generating for meta keys. It's not too bad
right now, but there's a risk of things getting out of hand. One
way to rationalize the logic here would be to transmit "TensorMeta-ness"
inside the TensorOptions (so tensor_from_meta can deal with it); then
the "Meta" kernel magic would literally just be generating empty
out_impls to call after all the scaffolding is done.  But I didn't
do this because it seemed like it would be more annoying short term.

Also had to teach resize_ to work on meta tensors, since we use them
to implement the out kernels.
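
A quick sketch of what the generated Meta kernels buy you, assuming a build where the `meta` device is exposed to Python and `add` has been ported to structured:

```python
import torch

# Meta tensors carry only shape/dtype metadata, no storage, so the
# generated Meta kernels do shape inference without any compute.
a = torch.empty(2, 3, device='meta')
b = torch.empty(2, 3, device='meta')
out = torch.empty(0, device='meta')
torch.add(a, b, out=out)  # resize_ on meta tensors makes out= kernels work
print(out.shape)          # torch.Size([2, 3])
```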

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer, ailzhang

Differential Revision: D25056640

Pulled By: ezyang

fbshipit-source-id: f8fcfa0dbb58a94d9b4196748f56e155f83b1521
2020-12-02 07:54:48 -08:00
kshitij12345
bcc85a363e [numpy] torch.sigmoid : promote integer inputs to float (#47551)
Summary:
Reference https://github.com/pytorch/pytorch/issues/42515
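
A quick check of the new behavior (the output dtype assumes the default dtype is `torch.float32`):

```python
import torch

x = torch.arange(5)             # int64 input
print(torch.sigmoid(x).dtype)   # torch.float32 -- promoted before computing
```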

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47551

Reviewed By: ngimel

Differential Revision: D25211953

Pulled By: mruberry

fbshipit-source-id: 9174cda401aeba0fd585a4c9bda166dbcf64f42f
2020-12-01 23:28:57 -08:00
Taylor Robie
27905dfe9c Expose CXX_FLAGS through __config__ (#47861)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47861
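
A hedged usage sketch: `torch.__config__.show()` is the long-standing entry point for build info; the `_cxx_flags()` accessor name below is an assumption based on the PR title.

```python
import torch

print(torch.__config__.show())         # general build configuration
print(torch.__config__._cxx_flags())   # assumed accessor for the CXX_FLAGS exposed here
```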

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25199263

Pulled By: robieta

fbshipit-source-id: 3cfdb0485d686a03a68dd0907d1733634857963f
2020-12-01 19:58:29 -08:00
Mike Ruberry
36c87f1243 Refactors test_torch.py to be fewer than 10k lines (#47356)
Summary:
Creates multiple new test suites to have fewer tests in test_torch.py, consistent with previous test suite creation like test_unary_ufuncs.py and test_linalg.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47356

Reviewed By: ngimel

Differential Revision: D25202268

Pulled By: mruberry

fbshipit-source-id: 75fde3ca76545d1b32b86d432a5cb7a5ba8f5bb6
2020-11-28 20:11:40 -08:00
kiyosora
272f4db043 Implement NumPy-like function torch.float_power() (#44937)
Summary:
- Related to https://github.com/pytorch/pytorch/issues/38349
- Implements the NumPy-like function `torch.float_power()`.
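
A minimal usage sketch; like `numpy.float_power`, the computation is always carried out in double precision:

```python
import torch

x = torch.tensor([2, 3])
print(torch.float_power(x, 2))   # tensor([4., 9.], dtype=torch.float64)
```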

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44937

Reviewed By: ngimel

Differential Revision: D25192119

Pulled By: mruberry

fbshipit-source-id: 2e446b8e0c2825f045fe057e30c9419335557a05
2020-11-27 18:01:42 -08:00
Antonio Cuni
344918576c Migrate eig from the TH to Aten (CUDA) (#44105)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24553

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44105

Reviewed By: ngimel

Differential Revision: D25192116

Pulled By: mruberry

fbshipit-source-id: 87f1ba4924b9174bfe0d9e2ab14bbe1c6bae879c
2020-11-27 15:15:48 -08:00
elfringham
db1b0b06c4 Flake8 fixes (#48453)
Summary:
Quiets errors from flake8. Only a couple of code changes, for Python syntax deprecated since before Python 2.4; the rest is just adding `noqa` markers.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48453

Reviewed By: mruberry

Differential Revision: D25181871

Pulled By: ngimel

fbshipit-source-id: f8d7298aae783b1bce2a46827b088fc390970641
2020-11-25 19:09:50 -08:00
Xiao Wang
4ab2055857 Re-enable only cuda tests wrongly disabled before (#48429)
Summary:
Close https://github.com/pytorch/pytorch/issues/46536

Re-enable only cuda tests wrongly disabled in https://github.com/pytorch/pytorch/pull/45332

See discussions https://github.com/pytorch/pytorch/issues/46536#issuecomment-721386038 and https://github.com/pytorch/pytorch/pull/45332#issuecomment-721350987

~~See also https://github.com/pytorch/pytorch/pull/47237 and https://github.com/pytorch/pytorch/pull/47642~~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48429

Reviewed By: ngimel

Differential Revision: D25176368

Pulled By: mruberry

fbshipit-source-id: 3822f5a45e58c0e387624e70ea272d16218901a9
2020-11-25 13:26:35 -08:00
kshitij12345
9ecaeb0962 [numpy] Add unary-ufunc tests for erf variants (#47155)
Summary:
Adds unary-ufunc test entries for the `erf` variants.

We use SciPy functions as the reference implementation.

We can update the tests later, once these functions promote integer inputs to float.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47155

Reviewed By: ngimel

Differential Revision: D25176654

Pulled By: mruberry

fbshipit-source-id: cb08efed1468b27650cec4f87a9a34e999ebd810
2020-11-25 13:20:14 -08:00
Fayçal Arbai
2e0a8b75d8 An implementation of torch.tile as requested in pytorch/pytorch#38349 (#47974)
Summary:
The approach is to simply reuse `torch.repeat`, adding one more behavior to tile: prepending 1's to the reps array when the tensor has more dimensions than the given reps. Thus, for a tensor of shape (64, 3, 24, 24), reps of (2, 2) become (1, 1, 2, 2), which is what NumPy does.

I've encountered some instability with the test on my end, where I could get a random failure (due to, sometimes, a random value of `self.dim()`, and sometimes, segfaults). I'd appreciate any feedback on the test, or an explanation for this instability so I can fix it.
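
A minimal sketch of the prepending behavior described above:

```python
import torch

x = torch.randn(64, 3, 24, 24)
# reps (2, 2) is padded to (1, 1, 2, 2) for a 4-d tensor, as in NumPy
print(torch.tile(x, (2, 2)).shape)   # torch.Size([64, 3, 48, 48])
```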

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47974

Reviewed By: ngimel

Differential Revision: D25148963

Pulled By: mruberry

fbshipit-source-id: bf63b72c6fe3d3998a682822e669666f7cc97c58
2020-11-24 18:07:25 -08:00
Kurt Mohler
b6654906c7 Fix assertEqual's handling of numpy array inputs (#48217)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47948

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48217

Reviewed By: mrshenli

Differential Revision: D25119607

Pulled By: mruberry

fbshipit-source-id: efe84380d3797d242c2aa7d43d2209bcba89cee0
2020-11-22 00:13:42 -08:00
Nikita Shulga
dc843fe197 Fix test_ldexp on Windows (#48335)
Summary:
Force `torch.randint` to generate a tensor of int32 rather than int64.
Delete unneeded copies.
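
A sketch of the idea (not the literal test code): passing `dtype=` makes `torch.randint` produce int32 directly instead of the int64 default.

```python
import torch

exp = torch.randint(-8, 8, (4,), dtype=torch.int32)
print(exp.dtype)   # torch.int32, no conversion needed
```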

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48335

Reviewed By: ranman

Differential Revision: D25133312

Pulled By: malfet

fbshipit-source-id: 70bfcb6b7ff3bea611c4277e6634dc7473541288
2020-11-20 15:41:59 -08:00
Randall Hunt
562d4c3bc5 Add basic ldexp operator for numpy compatibility (#45370)
Summary:
Adds ldexp operator for https://github.com/pytorch/pytorch/issues/38349

I'm not entirely sure the changes to `NamedRegistrations.cpp` were needed, but I saw other operators in there, so I added one.

Normally the ldexp operator is used along with frexp to construct and deconstruct floating-point values. This is useful for operating on either the mantissa or the exponent portion of a floating-point value.

Sleef, std `math.h`, and CUDA support both ldexp and frexp, but not for all data types. I wasn't able to figure out how to get the iterators to play nicely with a vectorized kernel, so I have left this with just the normal CPU kernel for now.

This is the first operator I'm adding so please review with an eye for errors.
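
A minimal usage sketch (`frexp`, the inverse decomposition, landed separately):

```python
import torch

# ldexp(x, e) computes x * 2**e element-wise
x = torch.tensor([1.0, 2.0, 0.5])
e = torch.tensor([2, 3, 1])
print(torch.ldexp(x, e))   # tensor([ 4., 16.,  1.])
```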

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45370

Reviewed By: mruberry

Differential Revision: D24333516

Pulled By: ranman

fbshipit-source-id: 2df78088f00aa9789aae1124eda399771e120d3f
2020-11-20 04:09:39 -08:00
kiyosora
008f840e7a Implement in-place methods torch.cumsum_ and torch.cumprod_ (#47651)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47193
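
A minimal usage sketch of the new in-place variants:

```python
import torch

x = torch.tensor([1., 2., 3.])
x.cumsum_(dim=0)    # in-place cumulative sum
print(x)            # tensor([1., 3., 6.])
x.cumprod_(dim=0)   # in-place cumulative product of the result
print(x)            # tensor([ 1.,  3., 18.])
```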

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47651

Reviewed By: zou3519

Differential Revision: D24992438

Pulled By: ezyang

fbshipit-source-id: c38bea55f4af1fc92be780eaa8e1d462316e6192
2020-11-19 11:20:12 -08:00
mfkasim91
8819bad86c Implement igammac (3rd PR) (#48171)
Summary:
Related: https://github.com/pytorch/pytorch/issues/46183 (torch.igamma)
This is the regularized upper incomplete gamma function.
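
A quick sanity check of the defining identity, assuming `torch.igamma` from the related PR is available: the regularized lower and upper incomplete gamma functions sum to one.

```python
import torch

a = torch.tensor([1.0, 2.0])
x = torch.tensor([0.5, 1.5])
print(torch.igamma(a, x) + torch.igammac(a, x))   # ~tensor([1., 1.])
```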

This is supposed to be exactly the same as https://github.com/pytorch/pytorch/issues/47463, but after rebasing the `viable/strict` branch.

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48171

Reviewed By: zhangguanheng66

Differential Revision: D25060107

Pulled By: mruberry

fbshipit-source-id: 89780dea21dbb2141cbc4f7f18192cb78a769b17
2020-11-18 23:44:32 -08:00
Edward Yang
a97d059614 Get TestTorch.test_empty_meta working again (#48113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48113

Fix is simple: just treat Meta as a backend covered by AutogradOther.
This semantically makes sense, since meta kernels are just like regular
CPU/CUDA kernels; they just don't do any compute.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zhangguanheng66

Differential Revision: D25056641

Pulled By: ezyang

fbshipit-source-id: 7b68911982352b3e0ee8616b38cd9c70bd58a740
2020-11-18 19:50:27 -08:00
Scott Wolchok
4c9eb57914 [PyTorch] Narrow Device to 2 bytes by narrowing DeviceType and DeviceIndex (#47023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47023

DeviceType pretty clearly only needs 1 byte. DeviceIndex only needs 1 byte given that machines don't have anywhere near 255 GPUs in them as far as I know.
ghstack-source-id: 116901430

Test Plan: Existing tests, added assertion to catch if my assumption about DeviceIndex is incorrect

Reviewed By: dzhulgakov

Differential Revision: D24605460

fbshipit-source-id: 7c9a89027fcf8eebd623b7cdbf6302162c981cd2
2020-11-18 19:39:40 -08:00
Mike Ruberry
ea1e78a0c5 Revert D24853669: [pytorch][PR] Migrate eig from the TH to Aten (CUDA)
Test Plan: revert-hammer

Differential Revision:
D24853669 (866f8591be)

Original commit changeset: a513242dc7f4

fbshipit-source-id: a0c8c424b61b1e627d9102de6b4c6d0717a6c06d
2020-11-18 16:53:18 -08:00
Antonio Cuni
866f8591be Migrate eig from the TH to Aten (CUDA) (#44105)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24553

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44105

Reviewed By: heitorschueroff

Differential Revision: D24853669

Pulled By: mruberry

fbshipit-source-id: a513242dc7f49f55dbc6046c18d8a9d9aa2aaf8d
2020-11-18 12:10:18 -08:00
kshitij12345
68a3a3f3b5 Add torch.swapdims and torch.swapaxes (#46041)
Summary:
Reference https://github.com/pytorch/pytorch/issues/38349

Delegates to `torch.transpose` (not sure what the best way to alias is); see the sketch after the TODO list below.

TODO:
* [x] Add test
* [x] Add documentation
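
A minimal sketch of the aliases:

```python
import torch

x = torch.randn(2, 3, 4)
# Both names alias torch.transpose and swap exactly two dimensions.
print(torch.swapaxes(x, 0, 2).shape)   # torch.Size([4, 3, 2])
print(torch.swapdims(x, 0, 1).shape)   # torch.Size([3, 2, 4])
```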

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46041

Reviewed By: gchanan

Differential Revision: D25022816

Pulled By: mruberry

fbshipit-source-id: c80223d081cef84f523ef9b23fbedeb2f8c1efc5
2020-11-18 11:35:53 -08:00
Ivan Yashchuk
81b1673a21 Enable complex tests that depend on batched matmul on CUDA (#47910)
Summary:
Now that https://github.com/pytorch/pytorch/pull/42553 is merged, we can delete a bit of code from the tests and enable some of the skipped complex tests.

Unfortunately, `test_pinverse_complex_xfailed` and `test_symeig_complex_xfailed` had bugs, and it wasn't caught automatically that these tests xpass. We need to be careful next time with `unittest.expectedFailure`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47910

Reviewed By: zhangguanheng66

Differential Revision: D25052130

Pulled By: mruberry

fbshipit-source-id: 29512995c024b882f9cb78b7bede77733d5762d0
2020-11-18 10:44:47 -08:00
Heitor Schueroff
2ff748a680 Move kthvalue scalar test to separate method for XLA (#48042)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48042

Moving the scalar test to a separate method so the XLA team can continue to test the other cases without failures. Requested here: https://github.com/pytorch/xla/issues/2620#issuecomment-725696108

Test Plan: Imported from OSS

Reviewed By: zhangguanheng66

Differential Revision: D25055677

Pulled By: heitorschueroff

fbshipit-source-id: 5da66bac78ea197821fee0b9b8a213ff2dc19c67
2020-11-18 07:49:14 -08:00
Xiang Gao
d293413b3e Batched matmul dtypes (#47873)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47873

Reviewed By: navahgar

Differential Revision: D24928256

Pulled By: anjali411

fbshipit-source-id: a26aef7a15a13fc0b5716e905971265d8b1cea61
2020-11-14 22:45:48 -08:00
anjali411
db1f217d8d Add complex support for torch.addcmul and torch.addcdiv (#46639)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46639

Resolves: https://github.com/pytorch/pytorch/issues/46546#issuecomment-713122245
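
A minimal sketch of the new complex path (`addcmul` computes `input + value * tensor1 * tensor2`):

```python
import torch

t = torch.zeros(2, dtype=torch.complex64)
a = torch.tensor([1 + 1j, 2 + 0j])
b = torch.tensor([0 + 1j, 1 + 0j])
print(torch.addcmul(t, a, b, value=2))   # tensor([-2.+2.j,  4.+0.j])
```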

Test Plan: Imported from OSS

Reviewed By: izdeby, ansley

Differential Revision: D24879099

Pulled By: anjali411

fbshipit-source-id: 76131dc68ac964e67a633f62e07f7c799df4463e
2020-11-14 21:27:34 -08:00
Ivan Yashchuk
260daf088d Added linalg.cholesky (#46083)
Summary:
This PR adds `torch.linalg.cholesky` function that matches `numpy.linalg.cholesky`.

Fixed `lda` argument to `lapackCholesky` calls.
Added `random_hermitian_pd_matrix` helper function for tests.
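
A minimal usage sketch; for a real symmetric positive-definite input, the factor reproduces the input:

```python
import torch

a = torch.tensor([[4., 2.], [2., 3.]])   # symmetric positive-definite
L = torch.linalg.cholesky(a)             # lower-triangular factor
print(torch.allclose(L @ L.t(), a))      # True
```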

Ref https://github.com/pytorch/pytorch/issues/42666.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46083

Reviewed By: ailzhang

Differential Revision: D24861752

Pulled By: mruberry

fbshipit-source-id: 214dbceb4e8a2c589df209493efd843962d25593
2020-11-13 16:50:40 -08:00
Richard Zou
1c7c612af0 Revert D24543682: [pytorch][PR] Added support for complex input for torch.lu_solve
Test Plan: revert-hammer

Differential Revision:
D24543682 (ffd0003022)

Original commit changeset: 165bde39ef95

fbshipit-source-id: 790b4157fdbc7149aaf0748555efe6daed7e1a23
2020-11-13 08:24:53 -08:00
Ivan Yashchuk
ffd0003022 Added support for complex input for torch.lu_solve (#46862)
Summary:
`torch.lu_solve` now works for complex inputs both on CPU and GPU.
I moved the existing tests to `test_linalg.py` and modified them to test complex dtypes, but I didn't modify/improve the body of the tests.
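
A minimal sketch of the intended behavior (note this change was reverted shortly after; see the revert commit above). `torch.lu` of this era returns the packed factorization and pivots:

```python
import torch

A = torch.randn(3, 3, dtype=torch.complex128)
b = torch.randn(3, 2, dtype=torch.complex128)
LU, pivots = torch.lu(A)            # factor once...
x = torch.lu_solve(b, LU, pivots)   # ...then solve for any right-hand side
print(torch.allclose(A @ x, b))     # True
```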

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46862

Reviewed By: nikithamalgifb

Differential Revision: D24543682

Pulled By: anjali411

fbshipit-source-id: 165bde39ef95cafebf976c5ba4b487297efe8433
2020-11-13 02:35:31 -08:00
Gao, Xiang
0652d755d3 Fix some flaky tests in test_torch.py and test_nn.py (#46941)
Summary:
Fixed test:
- `test_is_nonzero`: this asserted an exact match, which is flaky when `TORCH_SHOW_CPP_STACKTRACES=1`; I changed it to a non-exact assert
- `test_pinverse` TF32
- `test_symeig` TF32
- `test_triangular_solve_batched_many_batches_cpu_float64` precision on CPU BLAS
- `test_qr` TF32, plus a tensor factory that forgot `dtype=dtype`
- `test_lu` TF32
- `ConvTranspose2d` TF32
- `Conv3d_1x1x1_no_bias` TF32
- `Transformer*` TF32

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46941

Reviewed By: heitorschueroff

Differential Revision: D24852725

Pulled By: mruberry

fbshipit-source-id: ccd4740cc643476178d81059d1c78da34e5082ed
2020-11-12 22:35:42 -08:00
kshitij12345
3649a2c170 [numpy] torch.sqrt : promote integer inputs to float (#47293)
Summary:
Reference https://github.com/pytorch/pytorch/issues/42515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47293

Reviewed By: malfet

Differential Revision: D24855994

Pulled By: mruberry

fbshipit-source-id: 1e6752f2eeba6d638dea0bdea0c650cf722718c9
2020-11-12 16:16:09 -08:00
Ivan Yashchuk
149190c014 Added CUDA support for complex input for torch.solve (#47045)
Summary:
`torch.solve` now works for complex inputs on GPU.
I moved the existing tests to `test_linalg.py` and modified them to test complex and float32 dtypes.
Differentiation also works correctly with complex inputs.
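
A minimal sketch using the era's `torch.solve` signature, which returns `(solution, LU)`; requires a CUDA build (`torch.solve` was later superseded by `torch.linalg.solve`):

```python
import torch

A = torch.randn(3, 3, dtype=torch.complex128, device='cuda')
b = torch.randn(3, 1, dtype=torch.complex128, device='cuda')
x, LU = torch.solve(b, A)
print(torch.allclose(A @ x, b))   # True
```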

Fixes https://github.com/pytorch/pytorch/issues/41084
Ref. https://github.com/pytorch/pytorch/issues/33152

anjali411 I hope you don't mind that I took over https://github.com/pytorch/pytorch/pull/42737

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47045

Reviewed By: nikithamalgifb

Differential Revision: D24921503

Pulled By: anjali411

fbshipit-source-id: 4c3fc4f193a84b6e28c43c08672d480715000923
2020-11-12 12:22:59 -08:00
Gregory Chanan
b6cb2caa68 Revert "Fixed einsum compatibility/performance issues (#46398)" (#47821)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47821

This reverts commit a5c65b86ce.

 Conflicts:
	test/test_linalg.py

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D24909923

Pulled By: gchanan

fbshipit-source-id: 9dcf98e7c4a3c7e5aaffe475867fa086f3bb6ff2
2020-11-12 08:11:40 -08:00
anjali411
e1ee3bfc0e Port bmm and baddbmm from TH to ATen (#42553)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42553

Ports `torch.bmm` and `torch.baddbmm` from TH to ATen, as well as adds support for complex dtypes. Also removes dead TH code for Level 2 functions.

Closes #24539
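
A minimal sketch of the newly supported complex batched products:

```python
import torch

a = torch.randn(10, 3, 4, dtype=torch.complex64)
b = torch.randn(10, 4, 5, dtype=torch.complex64)
print(torch.bmm(a, b).shape)             # torch.Size([10, 3, 5])

base = torch.zeros(10, 3, 5, dtype=torch.complex64)
# baddbmm computes beta * base + alpha * bmm(a, b)
print(torch.baddbmm(base, a, b).shape)   # torch.Size([10, 3, 5])
```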

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D24893511

Pulled By: anjali411

fbshipit-source-id: 0eba3f2aec99c48b3018a5264ee7789279cfab58
2020-11-12 07:57:42 -08:00
Ivan Yashchuk
52ec8b9340 Added CUDA support for complex input for torch.triangular_solve (#46916)
Summary:
`torch.triangular_solve` now works for complex inputs on GPU.
I moved the existing tests to `test_linalg.py` and modified them to test complex and float32 dtypes.

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46916

Reviewed By: navahgar, agolynski

Differential Revision: D24706647

Pulled By: anjali411

fbshipit-source-id: fe780eac93d2ae1b2549539bb385e5fac25213b3
2020-11-11 16:08:11 -08:00
Ivan Yashchuk
a1db5b0f2b Added CUDA support for complex input for torch.inverse #2 (#47595)
Summary:
`torch.inverse` now works for complex inputs on GPU.
Opening a new PR here. The previous PR was merged and reverted due to a bug in tests marked with `slowTest`.
Previous PR https://github.com/pytorch/pytorch/pull/45034

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47595

Reviewed By: navahgar

Differential Revision: D24840955

Pulled By: anjali411

fbshipit-source-id: ec49fffdc4b3cb4ae7507270fa24e127be14f59b
2020-11-11 11:06:08 -08:00
Heitor Schueroff
a5c65b86ce Fixed einsum compatibility/performance issues (#46398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46398

This PR makes torch.einsum compatible with numpy.einsum, except for the sublist input option, as requested here: https://github.com/pytorch/pytorch/issues/21412. It also fixes two performance issues linked below and adds a check for reducing to torch.dot instead of torch.bmm, which is faster in some cases.

fixes #45854, #37628, #30194, #15671

fixes #41467 with benchmark below
```python
import torch
from torch.utils.benchmark import Timer

a = torch.randn(10000, 100, 101, device='cuda')
b = torch.randn(10000, 101, 3, device='cuda')

c = torch.randn(10000, 100, 1, device='cuda')
d = torch.randn(10000, 100, 1, 3, device='cuda')

print(Timer(
    stmt='torch.einsum("bij,bjf->bif", a, b)',
    globals={'a': a, 'b': b}
).blocked_autorange())

print()

print(Timer(
    stmt='torch.einsum("bic,bicf->bif", c, d)',
    globals={'c': c, 'd': d}
).blocked_autorange())
```
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7fa37c413850>
torch.einsum("bij,bjf->bif", a, b)
  Median: 4.53 ms
  IQR:    0.00 ms (4.53 to 4.53)
  45 measurements, 1 runs per measurement, 1 thread

<torch.utils.benchmark.utils.common.Measurement object at 0x7fa37c413700>
torch.einsum("bic,bicf->bif", c, d)
  Median: 63.86 us
  IQR:    1.52 us (63.22 to 64.73)
  4 measurements, 1000 runs per measurement, 1 thread
```

fixes #32591 with benchmark below
```python
import torch
from torch.utils.benchmark import Timer

a = torch.rand(1, 1, 16, 2, 16, 2, 16, 2, 2, 2, 2, device="cuda")
b = torch.rand(729, 1, 1, 2, 1, 2, 1, 2, 2, 2, 2, device="cuda")

print(Timer(
    stmt='(a * b).sum(dim = (-3, -2, -1))',
    globals={'a': a, 'b': b}
).blocked_autorange())

print()

print(Timer(
    stmt='torch.einsum("...ijk, ...ijk -> ...", a, b)',
    globals={'a': a, 'b': b}
).blocked_autorange())
```
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7efe0de28850>
(a * b).sum(dim = (-3, -2, -1))
  Median: 17.86 ms
  2 measurements, 10 runs per measurement, 1 thread

<torch.utils.benchmark.utils.common.Measurement object at 0x7efe0de286a0>
torch.einsum("...ijk, ...ijk -> ...", a, b)
  Median: 296.11 us
  IQR:    1.38 us (295.42 to 296.81)
  662 measurements, 1 runs per measurement, 1 thread
```

TODO

- [x] add support for ellipsis broadcasting
- [x] fix corner case issues with sumproduct_pair
- [x] update docs and add more comments
- [x] add tests for error cases

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D24860367

Pulled By: heitorschueroff

fbshipit-source-id: 31110ee598fd598a43acccf07929b67daee160f9
2020-11-10 19:38:43 -08:00
Heitor Schueroff
bf6a156f64 Fix kthvalue error for scalar input (#47600)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47600

fixes https://github.com/pytorch/pytorch/issues/30818

Note that the median case was already fixed by https://github.com/pytorch/pytorch/pull/45847
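
A sketch of the fixed behavior, assuming 0-dimensional inputs now behave as they already did for `median`:

```python
import torch

t = torch.tensor(5.)                     # 0-dimensional (scalar) tensor
values, indices = torch.kthvalue(t, 1)
print(values, indices)                   # tensor(5.) tensor(0)
```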

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D24860337

Pulled By: heitorschueroff

fbshipit-source-id: 69ccbbb6c7c86671e5712b1c2056c012d898b4f2
2020-11-10 17:21:52 -08:00
kshitij12345
6575e674ce [numpy] torch.{all, any} : Extend Dtype Support (#44790)
Summary:
Reference https://github.com/pytorch/pytorch/issues/44779
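
A quick check of the extended dtype support (previously only `bool`/`uint8` inputs were accepted):

```python
import torch

x = torch.tensor([0, 1, 2])         # int64 input
print(torch.all(x), torch.any(x))   # tensor(False) tensor(True)
print(torch.all(x).dtype)           # torch.bool
```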

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44790

Reviewed By: bdhirsh

Differential Revision: D24393119

Pulled By: heitorschueroff

fbshipit-source-id: a9b88e9d06b3c282f2e5360b6eaea4ae8ef77c1d
2020-11-10 17:11:39 -08:00
Natalia Gimelshein
c9d37675b2 Back out "[pytorch][PR] The dimension being reduced should not be coalesced by TensorIterator" (#47642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47642

Original commit changeset: 02bb2b15694c

Test Plan: Covered by CI tests

Reviewed By: anjali411

Differential Revision: D24849072

fbshipit-source-id: a8790cbf46936aee7a6f504dac8595997175fc65
2020-11-10 16:31:33 -08:00
Radhakrishnan Venkataramani
163adb9fa7 Add HalfToFloat + FloatToHalf operators to PyTorch (#45092)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45092

Adding two operators
1. at::float_to_half -> Converts FP32 tensor to FP16 tensor
2. at::half_to_float -> Converts FP16 tensor to FP32 tensor.

These operators internally use the kernel provided by FBGeMM. Both C2 and PT will use the same FBGeMM kernel underneath.

Test Plan:
buck test //caffe2/test:torch -- .*test_half_tensor.*

Run benchmark locally using

```
buck run //caffe2/benchmarks/operator_benchmark/pt:tensor_to_test
```

AI Bench results are pending; I don't expect them to finish soon, as we have a large queue with jobs pending for 2+ days.

Benchmark for 512x512 tensor with FbGeMM implementation

```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: FloatToHalfTensorConversionBenchmark
# Mode: Eager
# Name: FloatToHalfTensorConversionBenchmark_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 1246.332

# Benchmarking PyTorch: HalfToFloatTensorConversionBenchmark
# Mode: Eager
# Name: HalfToFloatTensorConversionBenchmark_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 1734.304
```

Benchmark for 512x512 tensor trunk with no FbGeMM integration.

```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: FloatToHalfTensorConversionBenchmark
# Mode: Eager
# Name: FloatToHalfTensorConversionBenchmark_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 169045.724

# Benchmarking PyTorch: HalfToFloatTensorConversionBenchmark
# Mode: Eager
# Name: HalfToFloatTensorConversionBenchmark_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 152382.494
```

Reviewed By: ngimel

Differential Revision: D23824869

fbshipit-source-id: ef044459b6c8c6e5ddded72080204c6a0ab4582c
2020-11-10 12:00:53 -08:00
Gregory Chanan
65a72cae2c Fix type promotion for trace on CPU. (#47305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47305

Fixes https://github.com/pytorch/pytorch/issues/47127.

Ideally this would just use diag and sum (as the CUDA implementation does), but that seems to have performance problems, which I'll link in the github PR.
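
A sketch of the expected promotion after the fix, assuming CPU `trace` now matches the `diag().sum()` reference (integral inputs accumulate in int64, per `sum`'s promotion rules):

```python
import torch

x = torch.ones(3, 3, dtype=torch.uint8)
print(torch.trace(x).dtype)        # torch.int64
print(torch.diag(x).sum().dtype)   # torch.int64 -- the reference behavior
```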

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D24729627

Pulled By: gchanan

fbshipit-source-id: 151b786b53e7b958f0929c803dbf8e95981c6884
2020-11-10 07:46:03 -08:00
John Kilpatrick
8aca85dbcd Add diagflat complex support (#47564)
Summary:
Adds complex number support for `torch.diagflat` (and thus the underlying `torch.diag`):
``` python
>>> import torch
>>> a = torch.ones(2, dtype=torch.complex128)
>>> torch.diagflat(a)
tensor([[1.+0.j, 0.+0.j],
        [0.+0.j, 1.+0.j]], dtype=torch.complex128)
>>> b = a.cuda()
>>> torch.diagflat(b)
tensor([[1.+0.j, 0.+0.j],
        [0.+0.j, 1.+0.j]], device='cuda:0', dtype=torch.complex128)
```

Note that automatic differentiation isn't implemented:
``` python
>>> d = torch.ones(1, dtype=torch.complex128, requires_grad=True)
>>> torch.diagflat(d)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: diag does not support automatic differentiation for outputs with complex dtype.
```

Fixes https://github.com/pytorch/pytorch/issues/47499

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47564

Reviewed By: heitorschueroff

Differential Revision: D24844467

Pulled By: anjali411

fbshipit-source-id: 9c8cb795d52880b7dcffab0c059b0f6c2e5ef151
2020-11-09 20:28:23 -08:00
Xiang Gao
f23a2a1115 The dimension being reduced should not be coalesced by TensorIterator (#47237)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/37583#issuecomment-720172838

Also adds an overload of `<<` for debugging convenience.

This PR is tested by `test_reduction_split_cuda` which was added in https://github.com/pytorch/pytorch/pull/37788.

Reproduce
```python
import torch

a = torch.zeros(8, 1, 128, 1024, 1024)
a.cuda().sum(1)
```

Before

```
TensorIterator @ 0x7ffd05b10ba0 {
  ntensors() = 2
  noutputs() = 1
  shape() = [1073741824]
  strides(*) = {
    (0) = [4]
    (1) = [4]
  }
  dtype(*) = {
    (0) = Float
    (1) = Float
  }
  is_reduction_ = 1
}
```

After

```
TensorIterator @ 0x7fffc9051010 {
  ntensors() = 2
  noutputs() = 1
  shape() = [1, 1073741824]
  strides(*) = {
    (0) = [0, 4]
    (1) = [536870912, 4]
  }
  dtype(*) = {
    (0) = Float
    (1) = Float
  }
  is_reduction_ = 1
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47237

Reviewed By: ejguan

Differential Revision: D24734763

Pulled By: ngimel

fbshipit-source-id: 02bb2b15694c68f96434f55033b63b6e5ff7085b
2020-11-07 01:30:24 -08:00
Xiong Wei
f90da88d8f Add complex support for torch.mean [CUDA] (#47048)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46982
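
A minimal check (requires a CUDA build):

```python
import torch

x = torch.tensor([1 + 2j, 3 + 4j], device='cuda')
print(x.mean())   # tensor(2.+3.j, device='cuda:0')
```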

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47048

Reviewed By: heitorschueroff

Differential Revision: D24729895

Pulled By: anjali411

fbshipit-source-id: 8e948480eb87c37de810207edf909375c0380772
2020-11-06 21:29:19 -08:00
Howard Huang
451e7d3db4 Enable diag for bool Tensors (#47455)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47455

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D24772483

Pulled By: H-Huang

fbshipit-source-id: 08ea4af4352972617db3c6475943b326f36b3049
2020-11-06 21:29:17 -08:00
Howard Huang
3253ccbd9f Add bool tensor support for where (#47454)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47454

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D24772482

Pulled By: H-Huang

fbshipit-source-id: ea488aae5bf64ac20f7a5d001e8edf55eed16eaf
2020-11-06 21:26:24 -08:00
Rong Rong
5614f72534 Suppress test issues in test_torch running in sandcastle (#47474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47474

After enabling GPU/RE (remote execution), some test issues turned out to be specific to those runs.

Test Plan:
```
buck test -c test.external_runner=tpx mode/opt //caffe2/test:torch_cuda -- --use-remote-execution --force-tpx --run-disabled
```

Reviewed By: malfet, janeyx99

Differential Revision: D24771578

fbshipit-source-id: 1ada79dae12c8cb6f795a0d261c60f038eee2dfb
2020-11-06 10:34:28 -08:00
Edward Yang
1aeefcdaa6 Revert D24730264: [pytorch][PR] Added CUDA support for complex input for torch.inverse
Test Plan: revert-hammer

Differential Revision:
D24730264 (33acbedace)

Original commit changeset: b9c94ec46301

fbshipit-source-id: beb9263700e9bc92685f74c37c46aa33f3b595b9
2020-11-06 07:28:14 -08:00