Summary:
This pull request fixes and re-enables two of the tests disabled in https://github.com/pytorch/pytorch/issues/37427
1. `test_sparse_add_out_bfloat16` in test_sparse.py is fixed to use the updated `atol` argument instead of `prec` for `assertEqual`.
2. The conversion of `flt_min` to `int64` diverges on HIP compared to numpy, so the change removes that conversion from the `test_float_to_int_conversion_finite` test case in test_torch.py.
cc: ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37616
Differential Revision: D21379876
Pulled By: ezyang
fbshipit-source-id: 2bfb41d67874383a01330c5d540ee516b3b07dcc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37507
Replace `TORCH_WARN` with `TORCH_CHECK` if `Tensor.random_()`'s `from` or `to-1` is out of bounds for the tensor's dtype. Previously the warning said "This warning will become an error in version 1.6 release, please fix the code in advance", so the time has come.
Related to #33106
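A minimal sketch of the resulting behavior (bounds chosen ad hoc; assuming the raised error surfaces as a `RuntimeError` in Python):
```python
import torch

t = torch.empty(4, dtype=torch.uint8)
t.random_(0, 256)        # fine: `to - 1` == 255 is representable in uint8
try:
    t.random_(0, 1000)   # `to - 1` == 999 is out of bounds for uint8: now an error, not a warning
except RuntimeError as e:
    print(e)
```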
Test Plan: Imported from OSS
Differential Revision: D21349413
Pulled By: pbelevich
fbshipit-source-id: ac7c196a48fc58634611e427e65429a948119e40
Summary:
Following up on https://github.com/pytorch/pytorch/pull/35851: cross-dtype storage copy is not being used internally, so I have not included cross-dtype copy for complex.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35771
Differential Revision: D21319650
Pulled By: anjali411
fbshipit-source-id: 07c72996ee598eba0cf401ad61534494d6f5b5b3
Summary:
Closes https://github.com/pytorch/pytorch/issues/24641
Benchmark with same build settings on same system.
gcc : version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
CUDA : 10.1
GPU : 1050ti
```python
import timeit
for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.tan(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.tan(a); torch.cuda.synchronize()',
                            setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")',
                            number=t))
```
Before:
```
torch.tan(a) a.numel() == 10000 for 20000 times torch.half
0.28325206200003095
torch.tan(a) a.numel() == 10000 for 20000 times torch.float
0.28363607099998944
torch.tan(a) a.numel() == 10000 for 20000 times torch.double
0.43924326799998425
torch.tan(a) a.numel() == 100000 for 20000 times torch.half
0.3754699589999859
torch.tan(a) a.numel() == 100000 for 20000 times torch.float
0.38143782899999223
torch.tan(a) a.numel() == 100000 for 20000 times torch.double
1.7672172019999834
```
After:
```
torch.tan(a) a.numel() == 10000 for 20000 times torch.half
0.28982524599996395
torch.tan(a) a.numel() == 10000 for 20000 times torch.float
0.29121579000002384
torch.tan(a) a.numel() == 10000 for 20000 times torch.double
0.4599610559998837
torch.tan(a) a.numel() == 100000 for 20000 times torch.half
0.3557764019997194
torch.tan(a) a.numel() == 100000 for 20000 times torch.float
0.34793807599999127
torch.tan(a) a.numel() == 100000 for 20000 times torch.double
1.7564662459999454
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36906
Differential Revision: D21335320
Pulled By: VitalyFedyunin
fbshipit-source-id: efab9c175c60fb09223105380d48b93a81994fb0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27957
Benchmark (gcc 8.3, Debian Buster, turbo off, Release build, Intel(R) Xeon(R) E-2136):
```python
import timeit
for dtype in ('torch.double', 'torch.float', 'torch.uint8', 'torch.int8', 'torch.int16', 'torch.int32', 'torch.int64'):
    for n, t in [(40_000, 50000),
                 (400_000, 5000)]:
        print(f'torch.linspace(0, 10, {n}, dtype={dtype}) for {t} times')
        print(timeit.timeit(f'torch.linspace(0, 10, {n}, dtype={dtype})', setup=f'import torch', number=t))
```
Before:
```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.3964195849839598
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
1.2374563289922662
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.8631796519621275
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
1.6991038109990768
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.8358083459897898
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.7214750979910605
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.8356257299892604
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.706238206999842
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
1.7463878280250356
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
1.6172360889613628
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
1.8656846070080064
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
1.714238062966615
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
1.8272205490502529
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
1.6409171230043285
```
After:
```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.0077099470072426
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
0.8227124120458029
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.0058343949494883
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
0.8376779520185664
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.903041019977536
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.7576498500420712
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.7628699769848026
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.6204477970022708
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
2.0970272019621916
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
1.9493417189805768
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
2.29020385700278
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
2.1212510910118
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
2.3479344319785014
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
2.156775983981788
```
Test Plan: Imported from OSS
Differential Revision: D20773454
Pulled By: VitalyFedyunin
fbshipit-source-id: ebeef59a90edde581669cc2afcc3d65929c8ac79
Summary:
Benchmark with same build settings on same system.
Closes https://github.com/pytorch/pytorch/issues/24545
gcc : version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
CUDA : 10.1
GPU : 1050ti
```python
import timeit
for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.cos(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.cos(a); torch.cuda.synchronize()',
                            setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")',
                            number=t))
```
Before:
```
torch.cos(a) a.numel() == 10000 for 20000 times torch.half
0.2797315450006863
torch.cos(a) a.numel() == 10000 for 20000 times torch.float
0.283109110998339
torch.cos(a) a.numel() == 10000 for 20000 times torch.double
0.3648525129974587
torch.cos(a) a.numel() == 100000 for 20000 times torch.half
0.34239949499897193
torch.cos(a) a.numel() == 100000 for 20000 times torch.float
0.33680364199972246
torch.cos(a) a.numel() == 100000 for 20000 times torch.double
1.0512770260102116
```
After:
```
torch.cos(a) a.numel() == 10000 for 20000 times torch.half
0.285825898999974
torch.cos(a) a.numel() == 10000 for 20000 times torch.float
0.2781305120001889
torch.cos(a) a.numel() == 10000 for 20000 times torch.double
0.34188826099989456
torch.cos(a) a.numel() == 100000 for 20000 times torch.half
0.29040409300023384
torch.cos(a) a.numel() == 100000 for 20000 times torch.float
0.28678944200009937
torch.cos(a) a.numel() == 100000 for 20000 times torch.double
1.065477349000048
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36653
Differential Revision: D21164675
Pulled By: VitalyFedyunin
fbshipit-source-id: 5dd5d3af47c2a5527e1f4ab7669c2ed9a2293cee
Summary:
Adds support for generating Vandermonde matrices based off of the Numpy implementation found [here](https://github.com/numpy/numpy/blob/v1.17.0/numpy/lib/twodim_base.py#L475-L563).
Adds a test to ensure the generated matrix matches the expected NumPy output. Note that the test is limited to torch.long and torch.double due to differences in how PyTorch and NumPy handle type promotion.
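A quick sketch of the expected behavior, assuming the op is exposed as `torch.vander` with a numpy.vander-like signature (decreasing powers by default):
```python
import torch

x = torch.tensor([1, 2, 3, 5])
print(torch.vander(x, N=3))
# tensor([[ 1,  1,  1],
#         [ 4,  2,  1],
#         [ 9,  3,  1],
#         [25,  5,  1]])
```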
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36725
Differential Revision: D21075138
Pulled By: jessebrizzi
fbshipit-source-id: 6bb1559e8247945714469b0e2b07c6f4d5fd1fd0
Summary:
Added enough operators to make sure that all unit tests from ATen/basic are passing, except for MM and IntArrayRefExpansion
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37121
Test Plan: `./bin/basic --gtest_filter=BasicTest.BasicTestHalfCPU` + `python -c "import torch; x = torch.tensor([2], dtype=torch.half); print(torch.isfinite(x+x))"`
Differential Revision: D21296863
Pulled By: malfet
fbshipit-source-id: e03d7a6939df11f611a9b317543bac52403cd009
Summary:
This pull request disables the unit tests that were observed to be failing once `test2` was enabled. These tests will be looked at and fixed one by one as soon as possible, but until then they are disabled to unblock `test2`.
The pull request also disables fftPlanDestroy for rocFFT to avoid double-freeing FFT handles.
cc: ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37427
Differential Revision: D21302909
Pulled By: ezyang
fbshipit-source-id: ecadda3778e65b7f4f97e24b932b96b9ce928616
Summary:
dylanbespalko anjali411
Not sure if the test should be added to `test_torch` or `test_complex`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36749
Differential Revision: D21290529
Pulled By: anjali411
fbshipit-source-id: 07bc282e4c9480cd015ec5db104e79728437cd90
Summary:
Fixes https://github.com/pytorch/pytorch/issues/37084
There are three alternatives for this design; this PR implements the first one.
When a tensor is a scalar (`ndim==0`), accessing `view_offsets_[0]` during reductions yields an invalid offset for the index that is the output of `argmax` and `argmin`.
fba9b9a023/aten/src/ATen/native/cpu/Reduce.h (L217)
This also happens in cuda code:
fba9b9a023/aten/src/ATen/native/cuda/Reduce.cuh (L797)
The second alternative is to check the size of `view_offsets` before accessing it. But this introduces some burden.
The third alternative is related to the way that inputs are treated in `argmax` and `argmin`
depending on the `dim` argument value.
fba9b9a023/aten/src/ATen/native/ReduceOps.cpp (L775-L780)
If `dim` is not specified, then the scalar gets reshaped into a 1-dim tensor and everything works properly, since now `view_offsets` has an actual entry.
If `dim` is specified, then the input remains a scalar, causing the issue we see here.
This PR tries to solve it in a generic way for every case, so I went with option 1. I am willing to discuss it and change it if you think the other alternatives are better.
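A small illustration of the scalar case described above (values chosen ad hoc):
```python
import torch

x = torch.tensor(3.14)          # 0-dim (scalar) tensor
print(torch.argmax(x))          # already worked: input is reshaped to 1-D when dim is omitted
print(torch.argmax(x, dim=0))   # the case fixed here; expected result is tensor(0)
```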
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37214
Differential Revision: D21258320
Pulled By: ngimel
fbshipit-source-id: 46223412187bbba4bfa7337e3f1d2518db72dea2
Summary:
Today in PyTorch, warnings triggered in C++ are printed to Python users like this:
`../aten/src/ATen/native/BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.`
This may be unhelpful to Python users, who have complained it's difficult to relate these messages back to their programs. After this PR, warnings that go through the PyWarningHandler and allow it to add context print like this:
```
test/test_torch.py:16463: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead. (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:81.)
cpu_result = getattr(cpu_tensor, op_str)(*cpu_args)
```
This relates the warning back to the user's program. The information about the cpp file and line number is preserved in the body of the warning message.
Some warnings, like those generated in the JIT, already account for a user's Python context, and so they specify that they should be printed verbatim and are unaffected by this change. Warnings originating in Python and warnings that go through c10's warning handler, which prints to cerr, are also unaffected.
A test is added to test_torch.py for this behavior. The test relies on uint8 indexing being deprecated and its warning originating from its current header file, which is an unfortunate dependency. We could implement a `torch.warn` function instead.
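A sketch of how the change is visible from Python, assuming the integer-division deprecation warning of this PyTorch release is still emitted; the op itself is incidental:
```python
import warnings
import torch

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    torch.tensor([4]) / torch.tensor([2])   # triggers the C++ deprecation warning

for w in caught:
    # after this PR, filename/lineno point at this Python script, and the original
    # C++ location is appended to the message ("Triggered internally at ...")
    print(w.filename, w.lineno, str(w.message))
```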
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36052
Differential Revision: D20887740
Pulled By: mruberry
fbshipit-source-id: d3515c6658a387acb7fccaf83f23dbb452f02847
Summary:
Resolves https://github.com/pytorch/pytorch/issues/36730 and https://github.com/pytorch/pytorch/issues/36057
Partially resolves: https://github.com/pytorch/pytorch/issues/36671
```
>>> 2j / torch.tensor([4], dtype = torch.complex64)
tensor([(0.0000+0.5000j)], dtype=torch.complex64)
>>> 1 / torch.tensor(3+4j)
tensor((0.1200-0.1600j), dtype=torch.complex64)
```
rdiv is more generally broken for all dtypes because it doesn't promote the types properly, e.g.
```
>>> 1 / torch.tensor(2)
tensor(0)
>>> 2j / torch.tensor(4)
tensor(0)
```
so that issue should be fixed in a separate PR.
This PR also:
- Adds CPU acc types for complex
- Adds cumsum and cumprod for complex dtypes
- Adds complex dtypes to get_all_math_dtypes to expand testing for complex dtypes
Old PR - https://github.com/pytorch/pytorch/pull/36747
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37193
Differential Revision: D21229373
Pulled By: anjali411
fbshipit-source-id: 8a086136d8c10dabe62358d276331e3f22bb2342
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37128
In certain build modes (in fbcode, building a .par) the mechanism to get test output "expect" files doesn't work.
All other tests in test_torch.py already had assertExpectedInline instead of assertExpected, with the expected result inline in the file.
There was no equivalent for assertExpectedRaises, so I added one and changed the tests for test_is_nonzero (the only test using this).
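A hypothetical sketch of the inline-expect style, assuming the new helper is named `assertExpectedRaisesInline` and that the error text shown is the real message; the actual test in test_torch.py is authoritative:
```python
import torch
from torch.testing._internal.common_utils import TestCase, run_tests

class IsNonzeroExample(TestCase):
    def test_is_nonzero_inline(self):
        # expected exception type and message live inline in the test file
        self.assertExpectedRaisesInline(
            RuntimeError,
            lambda: torch.tensor([1, 2]).is_nonzero(),
            "Boolean value of Tensor with more than one value is ambiguous",
        )

if __name__ == "__main__":
    run_tests()
```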
Test Plan: CI, specifically the test test_is_nonzero should pass
Reviewed By: malfet
Differential Revision: D21197651
fbshipit-source-id: 2a07079efdcf1f0b0abe60e92cadcf55d81d4b13
Summary:
Closes https://github.com/pytorch/pytorch/issues/24642
Benchmark with same build settings on same system.
gcc : version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
CUDA : 10.1
GPU : 1050ti
```python
import timeit
for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.tanh(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.tanh(a); torch.cuda.synchronize()',
                            setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")',
                            number=t))
```
Before:
```
torch.tanh(a) a.numel() == 10000 for 20000 times torch.half
0.2816318240002147
torch.tanh(a) a.numel() == 10000 for 20000 times torch.float
0.2728829070001666
torch.tanh(a) a.numel() == 10000 for 20000 times torch.double
0.39797203200214426
torch.tanh(a) a.numel() == 100000 for 20000 times torch.half
0.3228214350019698
torch.tanh(a) a.numel() == 100000 for 20000 times torch.float
0.31780802399953245
torch.tanh(a) a.numel() == 100000 for 20000 times torch.double
1.3745740449994628
```
After:
```
torch.tanh(a) a.numel() == 10000 for 20000 times torch.half
0.27825374500025646
torch.tanh(a) a.numel() == 10000 for 20000 times torch.float
0.27764024499992956
torch.tanh(a) a.numel() == 10000 for 20000 times torch.double
0.3771585260001302
torch.tanh(a) a.numel() == 100000 for 20000 times torch.half
0.2995866400015075
torch.tanh(a) a.numel() == 100000 for 20000 times torch.float
0.28355561699936516
torch.tanh(a) a.numel() == 100000 for 20000 times torch.double
1.393811182002537
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36995
Differential Revision: D21163353
Pulled By: ngimel
fbshipit-source-id: e2216ff62cdfdd13b6a56daa63d4ef1440d991d4
Summary:
Fixes a safety issue (Nonsense values and segfaults) introduced by https://github.com/pytorch/pytorch/pull/36875 when in-place gather tries to use incorrect shapes.
Consider the following block of code:
```python
import torch

k0 = 8
k1 = 8
m = 100
x = torch.rand((k0, k1))
ind = torch.randint(0, k0, (m, k1))
output = torch.empty((m, k1))
print(torch.gather(x, 0, ind, out=output))
print(torch.gather(x, 1, ind, out=output))
```
The first gather is legal; the second is not (`ind` and `output` would need to be transposed). Previously this was caught when the kernel tried to restride inputs for TensorIterator, but we can no longer rely on those checks and must test explicitly. If `m` is small, the second gather returns gibberish; if it is large enough to push the read out of the memory block, the program segfaults.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37102
Differential Revision: D21190580
Pulled By: robieta
fbshipit-source-id: 80175620d24ad3380d78995f7ec7dbf2627d2998
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/24605, https://github.com/pytorch/pytorch/issues/24535, https://github.com/pytorch/pytorch/issues/24739, https://github.com/pytorch/pytorch/issues/24680, https://github.com/pytorch/pytorch/issues/30986
This does not fix https://github.com/pytorch/pytorch/issues/29984, it will be fixed in later PR.
Most of this PR just follows the same logic as the existing TH and THC code, except for the handling of n-dimensional zero-sized tensors, specifically the case:
```
(m,).addmv((m, 0), (0,), beta, alpha)
```
# Legacy code bugs and how this PR deal with it
The above is a case where BLAS often has a semantics mismatch with PyTorch: for BLAS and cuBLAS, the above is a no-op, but for PyTorch, it is a scalar-vector multiplication `output = beta * input`. The handling of this case in legacy code is already very poor, and it is poorly tested:
For the CPU implementation, there are two code paths:
- Path 1: when dtype is float or double and `USE_BLAS`, then use BLAS
- Path 2: when other dtypes or not `USE_BLAS`, use a fallback kernel in PyTorch
For the CUDA implementation, there are also two code paths:
- Path 1: when float or double, then use `cublasSgemv` or `cublasDgemv` in cuBlas
- Path 2: when half, dispatch to `addmm`
`test_blas_alpha_beta_empty` is supposed to cover all cases, but unfortunately it only tests Path 1 of CUDA and Path 1 of CPU, and both uncovered paths (Path 2 for CPU and Path 2 for CUDA) are buggy in legacy code. In this PR, I expanded the coverage of `test_blas_alpha_beta_empty`, but unfortunately I have to skip the `half` dtype on CUDA 9. See the description below for details:
## Bug on CPU implementation
For the CPU implementation, the fallback kernel in Path 2 already has the same semantics as PyTorch, not BLAS. But the code that corrects BLAS semantics to match PyTorch also runs in this case, leading to a double correction: `output = beta * input` becomes `output = beta * beta * input`.
This leads to the issue https://github.com/pytorch/pytorch/issues/30986 I just opened, and it is fixed in this PR.
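A small sketch of the zero-sized case above: with an empty `mat @ vec` product, the result should reduce to `beta * input`, not `beta * beta * input` (shapes chosen ad hoc):
```python
import torch

inp = torch.randn(5)
mat = torch.randn(5, 0)   # matrix with zero columns
vec = torch.randn(0)      # empty vector
out = torch.addmv(inp, mat, vec, beta=2.0, alpha=1.0)
print(torch.allclose(out, 2.0 * inp))   # expected: True after this PR
```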
## Bug on CUDA implementation
For the CUDA implementation, path 2 dispatches to
```
(m, 1).addmm((m, 0), (0, 1), beta, alpha)
```
But unfortunately, for some old CUDA versions on old GPUs with the half dtype, the above is also a no-op, which is definitely not correct.
But from what I see, on newer CUDA versions or newer GPUs this is not a problem. This is a PyTorch bug in `addmm`, so I opened a new issue https://github.com/pytorch/pytorch/issues/31006 to track it. It is most likely a dependency bug originating from cuBLAS, and it only affects a rarely used edge case on old hardware and software, so this issue would be a `won't_fix` unless real requirements strongly indicate it should be fixed.
This issue already exists in legacy code, and this PR does not make it worse. To keep it from blocking us, I disable the `half` dtype test on CUDA 9 when expanding the coverage of `test_blas_alpha_beta_empty`.
I promote a CircleCI CUDA 10.1 test to `XImportant` so that it runs on PRs, because the path 2 of CUDA implementation is only covered by this configuration. Let me know if I should revert this change.
## An additional problem
In the legacy `addmv` code, the `bfloat16` dtype is enabled and dispatched to `addmm`, but `addmm` does not support `bfloat16` from what I tested. I do the same thing in the new code. Let me know if I should do it differently.
# Benchmark
Code:
```python
import torch
print(torch.__version__)
for i in range(1000):
    torch.arange(i, device='cuda')
print('cpu')
for i in 10, 100, 1000, 10000:
    a = torch.randn((i,))
    b = torch.randn((i, i))
    c = torch.randn((i,))
    %timeit a.addmv(b, c, alpha=1, beta=2)
print('cuda')
for i in 10, 100, 1000, 10000:
    a = torch.randn((i,)).cuda()
    b = torch.randn((i, i)).cuda()
    c = torch.randn((i,)).cuda()
    torch.cuda.synchronize()
    %timeit a.addmv(b, c, alpha=1, beta=2); torch.cuda.synchronize()
```
Before:
```
1.5.0a0+2b45368
cpu
2.74 µs ± 30.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
8.5 µs ± 85.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
686 µs ± 2.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
74 ms ± 410 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
cuda
The slowest run took 4.81 times longer than the fastest. This could mean that an intermediate result is being cached.
27.6 µs ± 23 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
17.3 µs ± 151 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
20.5 µs ± 369 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
756 µs ± 6.81 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
After:
```
1.5.0a0+66b4034
cpu
3.29 µs ± 20 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
9.09 µs ± 7.41 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
687 µs ± 7.01 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
73.8 ms ± 453 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
cuda
18.2 µs ± 478 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
17.7 µs ± 299 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
21.5 µs ± 2.38 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
751 µs ± 35.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30898
Differential Revision: D20660338
Pulled By: anjali411
fbshipit-source-id: db1f521f124198f63545064026f93fcb16b68f18
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36680
If torch is compiled without MKL, this test fails because torch.fft requires MKL support.
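A rough sketch of gating an MKL-dependent spectral test (decorator and test name are illustrative, and `torch.fft` here is the function form of this release, which takes a `signal_ndim` argument):
```python
import unittest
import torch

class FftRequiresMklExample(unittest.TestCase):
    @unittest.skipIf(not torch.backends.mkl.is_available(),
                     "torch.fft requires PyTorch to be built with MKL")
    def test_fft_roundtrip(self):
        x = torch.randn(4, 3, 2)            # batch of 4 length-3 complex signals
        y = torch.ifft(torch.fft(x, 1), 1)  # forward then inverse FFT
        self.assertTrue(torch.allclose(x, y, atol=1e-5))

if __name__ == "__main__":
    unittest.main()
```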
Test Plan: CI
Reviewed By: malfet
Differential Revision: D21051362
fbshipit-source-id: dd2e2c7d323622c1c25fc4c817b85d83d2241b3a
Summary:
Closes https://github.com/pytorch/pytorch/issues/24546
Benchmark with same build settings on same system.
gcc : version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
CUDA : 10.1
GPU : 1050ti
```python
import timeit
for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.cosh(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.cosh(a); torch.cuda.synchronize()',
                            setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")',
                            number=t))
```
Before:
```
torch.cosh(a) a.numel() == 10000 for 20000 times torch.half
0.2813017509997735
torch.cosh(a) a.numel() == 10000 for 20000 times torch.float
0.28355878599904827
torch.cosh(a) a.numel() == 10000 for 20000 times torch.double
0.27810572300040803
torch.cosh(a) a.numel() == 100000 for 20000 times torch.half
0.3239932899996347
torch.cosh(a) a.numel() == 100000 for 20000 times torch.float
0.321233343998756
torch.cosh(a) a.numel() == 100000 for 20000 times torch.double
0.5546665399997437
```
After:
```
torch.cosh(a) a.numel() == 10000 for 20000 times torch.half
0.2905335750001541
torch.cosh(a) a.numel() == 10000 for 20000 times torch.float
0.27596429500044906
torch.cosh(a) a.numel() == 10000 for 20000 times torch.double
0.30358699899989006
torch.cosh(a) a.numel() == 100000 for 20000 times torch.half
0.30139567500009434
torch.cosh(a) a.numel() == 100000 for 20000 times torch.float
0.30246640400036995
torch.cosh(a) a.numel() == 100000 for 20000 times torch.double
0.5403946970000106
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36654
Differential Revision: D21164606
Pulled By: VitalyFedyunin
fbshipit-source-id: 55e88f94044957f81599ae3c12cda38a3e2c985c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35615
Python 2 has reached end-of-life and is no longer supported by PyTorch.
Now we can clean up a lot of cruft that we put in place to support it.
These changes were all done manually, and I skipped anything that seemed
like it would take more than a few seconds, so I think it makes sense to
review it manually as well (though using side-by-side view and ignoring
whitespace change might be helpful).
Test Plan: CI
Differential Revision: D20842886
Pulled By: dreiss
fbshipit-source-id: 8cad4e87c45895e7ce3938a88e61157a79504aed
Summary:
Resolves https://github.com/pytorch/pytorch/issues/36730 and https://github.com/pytorch/pytorch/issues/36057
Partially resolves: https://github.com/pytorch/pytorch/issues/36671
```
>>> 2j / torch.tensor([4], dtype = torch.complex64)
tensor([(0.0000+0.5000j)], dtype=torch.complex64)
>>> 1 / torch.tensor(3+4j)
tensor((0.1200-0.1600j), dtype=torch.complex64)
```
rdiv is more generally broken for all dtypes because it doesn't promote the types properly, e.g.
```
>>> 1 / torch.tensor(2)
tensor(0)
>>> 2j / torch.tensor(4)
tensor(0)
```
so that issue should be fixed in a separate PR.
This PR also:
- Adds CPU acc types for complex
- Adds cumsum and cumprod for complex dtypes
- Adds complex dtypes to get_all_math_dtypes to expand testing for complex dtypes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36747
Differential Revision: D21138687
Pulled By: anjali411
fbshipit-source-id: ad3602ccf86c70294a6e71e564cb0d46c393dfab
Summary:
Previously torch.isclose would throw a RuntimeError when called on complex tensors. This PR updates torch.isclose to run on complex tensors and be consistent with [NumPy](https://numpy.org/doc/1.18/reference/generated/numpy.isclose.html). However, NumPy's handling of NaN, -inf, and inf values is odd, so I adopted Python's [cmath.isclose](https://docs.python.org/3/library/cmath.html) behavior when dealing with them. See https://github.com/numpy/numpy/issues/15959 for more on NumPy's behavior.
While implementing complex isclose I also simplified the isclose algorithm (see the sketch after this list) to:
- A is close to B if A and B are equal, if equal_nan is true then NaN is equal to NaN
- If A and B are finite, then A is close to B if `abs(a - b) <= (atol + abs(rtol * b))`
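A minimal scalar reference for that simplified rule on real inputs (the function name is illustrative, not the actual kernel):
```python
import math

def isclose_reference(a, b, rtol=1e-05, atol=1e-08, equal_nan=False):
    if a == b:                                # equal values, including equal infinities
        return True
    if math.isnan(a) or math.isnan(b):        # NaNs match only when equal_nan is set
        return equal_nan and math.isnan(a) and math.isnan(b)
    if math.isinf(a) or math.isinf(b):        # a remaining infinity is never within tolerance
        return False
    return abs(a - b) <= atol + abs(rtol * b)

print(isclose_reference(1.0, 1.0 + 1e-9))                              # True
print(isclose_reference(float('nan'), float('nan'), equal_nan=True))   # True
```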
This PR also documents torch.isclose, since it was undocumented, and adds multiple tests for its behavior to test_torch.py since it had no dedicated tests.
The PR leaves equal_nan=True with complex inputs an error for now, pending the outcome of https://github.com/numpy/numpy/issues/15959.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36456
Differential Revision: D21159853
Pulled By: mruberry
fbshipit-source-id: fb18fa7048e6104cc24f5ce308fdfb0ba5e4bb30
Summary:
Addresses https://github.com/pytorch/pytorch/issues/36807. Also updates the cast testing to catch issues like this better.
In the future a more constexpr based approach to casting would be nice.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36832
Differential Revision: D21120822
Pulled By: mruberry
fbshipit-source-id: 9504ddd36cfe6d9f9f545fc277fef36855c1b221
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34258
This PR allows both atol and rtol to be specified, uses defaults based on the prior analysis (spreadsheet attached to https://github.com/pytorch/pytorch/pull/32538), but retains the absolute tolerance behavior in cases where precision was previously specified explicitly.
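A minimal sketch of supplying both tolerances explicitly (test name and values are illustrative; exact keyword handling may differ across versions):
```python
import torch
from torch.testing._internal.common_utils import TestCase, run_tests

class TolerancesExample(TestCase):
    def test_tolerances(self):
        expected = torch.tensor([1.0, 2.0, 3.0])
        actual = expected + 1e-6
        # roughly: |actual - expected| <= atol + rtol * |expected|
        self.assertEqual(actual, expected, atol=1e-5, rtol=1.3e-6)

if __name__ == "__main__":
    run_tests()
```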
Test Plan: Imported from OSS
Differential Revision: D21110255
Pulled By: nairbv
fbshipit-source-id: 57b3a004c7d5ac1be80ee765f03668b1b13f4a7e
Summary:
Implements complex isfinite and isinf, consistent with NumPy.
A complex value is finite if and only if both its real and imaginary parts are finite.
A complex value is infinite if and only if its real or imaginary part is infinite.
Old isfinite, isinf, and isnan tests are modernized: instead of fixtures, the torch results are compared with NumPy. A new test is added for complex isfinite, isinf, and isnan. The docs for each function are updated to clarify what finite, infinite, and NaN values are.
The new tests rely on a new helper, _np_compare, that we'll likely want to generalize in the near future and use in more tests.
Addresses part of the complex support tasks. See https://github.com/pytorch/pytorch/issues/33152.
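A quick illustration of the definitions above (values chosen ad hoc):
```python
import torch

z = torch.tensor([complex(1.0, 2.0),
                  complex(float('inf'), 0.0),
                  complex(1.0, float('nan'))])
print(torch.isfinite(z))   # tensor([ True, False, False])
print(torch.isinf(z))      # tensor([False,  True, False])
print(torch.isnan(z))      # tensor([False, False,  True])
```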
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36648
Differential Revision: D21054766
Pulled By: mruberry
fbshipit-source-id: d947707c5437385775c82f4e6c722349ca5a2174
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36351
Adds CUDA kernels for hardsigmoid, to enable its use in training.
Note: the update to the CPU backward pass is to keep the CPU vs CUDA logic consistent; there is no change in functionality.
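A small sketch of what the new kernels enable, assuming a CUDA build (hardsigmoid is clamp(x / 6 + 1 / 2, 0, 1)):
```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(8, device=device, requires_grad=True)
y = F.hardsigmoid(x)       # forward pass
y.sum().backward()         # backward pass now also has a CUDA kernel
print(x.grad)
```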
Test Plan:
add CI for the forward pass
run this for the backward pass:
https://gist.github.com/vkuzo/95957d365600f9ad10d25bd20f58cc1a
Imported from OSS
Differential Revision: D20955589
fbshipit-source-id: dc198aa6a58e1a7996e1831f1e479c398ffcbc90
Summary:
Per title. test_abs used to be marked as slow_test and run on CPU only. Conceptually similar tests are done in TestTorchMathOps, so it's a matter of adding an `abs` test there. Two remaining checks (correct abs for large-valued long tensors, and correct abs for signed zeros) are factored into separate tests.
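The two remaining checks, illustrated with ad hoc values:
```python
import torch

print(torch.tensor([-2**62], dtype=torch.int64).abs())   # exact for large-valued longs
print(torch.tensor([-0.0]).abs())                         # signed zero maps to +0.0: tensor([0.])
```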
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36465
Differential Revision: D21000248
Pulled By: ngimel
fbshipit-source-id: 8bc8b0da936b1c10fe016ff2f0dbb5ea428e7e61
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36411
This PR removes the PyTorch-specific assertWarns helper and uses the standard unittest one; it also formats some tests.
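A minimal sketch of the standard-library helper used after this change (class and warning are illustrative):
```python
import unittest
import warnings

class AssertWarnsExample(unittest.TestCase):
    def test_warns(self):
        with self.assertWarns(UserWarning):
            warnings.warn("deprecated behavior", UserWarning)

if __name__ == "__main__":
    unittest.main()
```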
Test Plan: Imported from OSS
Differential Revision: D20998159
Pulled By: wanchaol
fbshipit-source-id: 1280ecff2dd293b95a639d13cc7417fc819c2201