Summary:
This PR adds support for `arccosh`, `arcsinh` and `arctanh`. Please see issue https://github.com/pytorch/pytorch/issues/38349 for more details.
**TODOs:**
* [x] Add test cases for `arccosh`, `arcsinh` and `arctanh`. (need help)
* [x] Overload ops if `std::op` does not work with `thrust::complex` types (like for `sinh`, `cosh`).
Note: `std::acosh`, `std::asinh` and `std::atanh` do not support `thrust::complex` types, so this PR adds complex support for these 3 ops (`arccosh`, `arcsinh`, `arctanh`).
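For reference, the complex versions of these three functions can be written in terms of `log` and `sqrt`. The sketch below uses the textbook principal-value identities and checks them against Python's `cmath`; it is an illustration only, not the overloads added in this PR, and the `*_via_log` helper names are made up for the example.

```python
import cmath


def asinh_via_log(z: complex) -> complex:
    # asinh(z) = log(z + sqrt(z^2 + 1))
    return cmath.log(z + cmath.sqrt(z * z + 1))


def acosh_via_log(z: complex) -> complex:
    # acosh(z) = log(z + sqrt(z + 1) * sqrt(z - 1))
    return cmath.log(z + cmath.sqrt(z + 1) * cmath.sqrt(z - 1))


def atanh_via_log(z: complex) -> complex:
    # atanh(z) = (log(1 + z) - log(1 - z)) / 2
    return 0.5 * (cmath.log(1 + z) - cmath.log(1 - z))


z = complex(0.5, 0.25)  # a point away from the branch cuts
for name, sketch, reference in [
    ("asinh", asinh_via_log, cmath.asinh),
    ("acosh", acosh_via_log, cmath.acosh),
    ("atanh", atanh_via_log, cmath.atanh),
]:
    print(name, abs(sketch(z) - reference(z)))
```

These naive formulas lose accuracy near the branch cuts and for large magnitudes, so a production implementation would need more care there.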
cc: mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38388
Differential Revision: D21882055
Pulled By: mruberry
fbshipit-source-id: d334590b47c5a89e491a002c3e41e6ffa89000e3
Summary:
Re-enable some test cases in `test_memory_format_operators` since their corresponding issue has been fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38648
Differential Revision: D21689085
Pulled By: VitalyFedyunin
fbshipit-source-id: 0aa09e0bf31ba98c8ad0191ac3afd31dda0f1d42
Summary:
Cut from https://github.com/pytorch/pytorch/pull/38994.
This is a helper function for comparing torch and NumPy behavior. It updates the existing and increasingly popular _np_compare function and moves it to be a method on TestCase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39179
Differential Revision: D21855082
Pulled By: mruberry
fbshipit-source-id: edca3b78ae392d32243b02bf61960898b6ba590f
Summary:
Fixes https://github.com/pytorch/pytorch/issues/32866, resubmit of https://github.com/pytorch/pytorch/issues/38970
The memory error in the issue is caused by int overflowing in col2vol. This version uses a mix of 32-bit and 64-bit index calculations, which raises the maximum supported index without compromising the performance of ConvTranspose3d, versus a 20-30% regression with pure 64-bit indexing.
This requires that input.numel() <= UINT_MAX and channels * kernel.numel() <= UINT_MAX; otherwise it raises an error. Previously, the code would crash or give incorrect results unless input.numel() * kernel.numel() <= INT_MAX.
Note that the test is a minimised reproducer for the issue.
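The arithmetic behind these limits can be illustrated with plain Python integers. This is a sketch only: `wrap_int32` and `check_limits` are hypothetical helpers written for this example (they are not functions from the PR), the sizes are made up, and the wraparound models the typical two's complement result of overflowing a 32-bit `int`.

```python
INT_MAX = 2**31 - 1
UINT_MAX = 2**32 - 1


def wrap_int32(value: int) -> int:
    """Simulate storing `value` in a signed 32-bit int (two's complement wraparound)."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT_MAX else value


def check_limits(input_numel: int, channels: int, kernel_numel: int) -> None:
    """Hypothetical validation mirroring the two limits quoted above."""
    if input_numel > UINT_MAX:
        raise ValueError("input.numel() exceeds UINT_MAX")
    if channels * kernel_numel > UINT_MAX:
        raise ValueError("channels * kernel.numel() exceeds UINT_MAX")


input_numel = 2**28   # assumed input size for the example
kernel_numel = 27     # e.g. a 3x3x3 kernel

product = input_numel * kernel_numel
print(product > INT_MAX)     # the old 32-bit index would overflow here
print(wrap_int32(product))   # the wrapped, negative index a 32-bit int would hold

# Both new preconditions hold for these sizes, so no error is raised.
check_limits(input_numel, channels=64, kernel_numel=kernel_numel)
```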
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39198
Differential Revision: D21817836
Pulled By: ezyang
fbshipit-source-id: b9adfe9f9dd00f04435be132966b33ac6b9efbef
Summary:
The test is currently only enabled for CPU, and it will be enabled for CUDA after the migration of `min` and `max` from THC to ATen is done.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38850
Differential Revision: D21819388
Pulled By: ngimel
fbshipit-source-id: 406343e96bccbf9139eb1f8f2d49ed530dd83d62
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39033
Added `real` and `imag` views as tensor attributes. Right now, `tensor.imag` is disabled for real tensors: if we returned a new tensor of zeros, the user would be able to update the tensor returned by `tensor.imag`, which should not be allowed. NumPy returns a read-only array in this case, and PyTorch doesn't support read-only tensors yet.
TODO in follow-up PRs:
1. add a setter for `real` and `imag`
2. add special case in codegen for `real` and `imag` backward functions.
3. remove `copy_real` and `copy_imag` methods.
Test Plan: Imported from OSS
Differential Revision: D21767542
Pulled By: anjali411
fbshipit-source-id: 539febf01f01ff055e3fbc7e9ff01fd3fe729056
Summary:
Adds complex support to `cumsum` and `cumprod`, and updates the relevant tests in `test_torch::tensor_op_tests`.
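As a reminder of what a cumulative sum and product mean for complex inputs, here is a small stdlib-only sketch using `itertools.accumulate` on Python complex numbers. It only illustrates the expected values; it does not use or reproduce the tensor kernels touched by this PR.

```python
from itertools import accumulate
import operator

values = [1 + 2j, 3 - 1j, -2 + 0.5j]

# Running sum: each entry is the sum of all values up to that position.
running_sum = list(accumulate(values))

# Running product: each entry is the product of all values up to that position.
running_prod = list(accumulate(values, operator.mul))

print(running_sum)
print(running_prod)
```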
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39063
Differential Revision: D21771186
Pulled By: anjali411
fbshipit-source-id: 632916d4bdbd1c0941001898ab8146be2b7884fc
Summary:
**BC-breaking note:**
In previous versions of PyTorch zero dimensional CUDA tensors could be moved across devices implicitly. For example,
```
torch.tensor(5, device='cuda:0') + torch.tensor((1, 1), device='cuda:1')
```
would work, even though the tensors are on different CUDA devices. This is a frequent source of user confusion, however, and PyTorch generally does not move data across devices without it being explicit. This functionality is removed in PyTorch 1.6.
**PR Summary:**
Today in PyTorch we allow implicit data movement of zero dimensional CUDA tensors. For example, we allow:
```
torch.tensor(5, device='cuda:0') + torch.tensor((1, 1), device='cuda:1')
```
and
```
torch.tensor(2, device='cuda') + torch.tensor((3, 5))
```
In both of these cases TensorIterator would move the zero dim CUDA tensor to the device of the non-scalar tensor (cuda:1 in the first snippet, the CPU in the second snippet).
One of PyTorch's fundamental rules, however, is that it does not perform implicit data movement like this, and this change causes these cases to throw an error. New tests for this behavior are added to test_torch.py, and tests of the old behavior are removed from test_torch.py and test_autograd.py. A cpp test in tensor_iterator_test.cpp is modified to account for the new behavior.
This addresses https://github.com/pytorch/pytorch/issues/36722.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38998
Differential Revision: D21757617
Pulled By: mruberry
fbshipit-source-id: 2498f07f4938d6de691fdbd5155ad2e881ff7fdb
Summary:
Fixes https://github.com/pytorch/pytorch/issues/32866
The memory error in the issue is caused by `int` overflowing in `col2vol`. This version uses a mix of 32-bit and 64-bit index calculations, which raises the maximum supported index without compromising the performance of `ConvTranspose3d`, versus a 20-30% regression with pure 64-bit indexing.
This requires that `input.numel() <= UINT_MAX` and `channels * kernel.numel() <= UINT_MAX`; otherwise it raises an error. Previously, the code would crash or give incorrect results unless `input.numel() * kernel.numel() <= INT_MAX`.
Note that the test is a minimised reproducer for the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38970
Differential Revision: D21748644
Pulled By: ezyang
fbshipit-source-id: 95060423219dc647595e1a24b3dcac520d3aecba
Summary:
`_TestTorchMixin` is a base class which is instantiated across multiple types.
It inherited from `object` in order to hide it from the unittest test discovery mechanism.
But this approach makes it almost impossible to use a static code analyzer on the class.
This PR implements an alternative approach by hiding the base class inside an inner class, per https://stackoverflow.com/a/25695512
It also changes the imported class access path in `test_cuda.py`.
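The inner-class pattern can be sketched as follows. The class names (`Wrapper`, `_SharedTestsBase`, `IntTests`, `FloatTests`) are hypothetical and not the ones used in `test_torch.py`, and `discoverable` only mimics the kind of check unittest applies to module-level names; it is not unittest's own loader code.

```python
import unittest


class Wrapper:
    # Wrapper is not a TestCase, so module-level discovery skips it and never
    # looks inside for the nested base class.
    class _SharedTestsBase(unittest.TestCase):
        value = None  # concrete subclasses override this

        def test_value_is_positive(self):
            self.assertGreater(self.value, 0)


class IntTests(Wrapper._SharedTestsBase):
    value = 3


class FloatTests(Wrapper._SharedTestsBase):
    value = 2.5


def discoverable(namespace):
    """Mimic discovery: keep module-level names bound to TestCase subclasses."""
    return sorted(
        name
        for name, obj in namespace.items()
        if isinstance(obj, type) and issubclass(obj, unittest.TestCase)
    )


names = discoverable({"Wrapper": Wrapper, "IntTests": IntTests, "FloatTests": FloatTests})
print(names)  # only the concrete test classes

# The concrete classes still inherit and run the shared test.
loader = unittest.TestLoader()
suite = unittest.TestSuite(
    [loader.loadTestsFromTestCase(IntTests), loader.loadTestsFromTestCase(FloatTests)]
)
result = unittest.TestResult()
suite.run(result)
print(result.testsRun, result.wasSuccessful())
```

Because the base class is a real `unittest.TestCase` subclass, static analyzers can resolve `assertGreater` and friends, which is the motivation given above.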
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39110
Test Plan:
run `test_torch.py --discover-tests` and `test_cuda.py --discover-tests` before and after change:
```
$ python test_torch.py --discover-tests|md5sum
2ca437bb5d65700763ce04cdacf6de3e -
$ python test_cuda.py --discover-tests|md5sum
b17df916fb0eeb6f0dd7222d7dae392c -
```
Differential Revision: D21759265
Pulled By: malfet
fbshipit-source-id: b01b06111469e551f7b78387449975e5248f6b9e
Summary:
1.6 Deprecation Note:
In 1.6 attempting to perform integer division using addcdiv will throw a RuntimeError, and in 1.7 the behavior will change so that addcdiv always performs a true division of its tensor1 and tensor2 inputs. See the warning in torch.addcdiv's documentation for more information.
PR Summary:
This PR updates the warning that appears when addcdiv performs integer division to throw a RuntimeError. This is intended to prevent silent errors when torch.addcdiv's behavior is changed to always perform true division in 1.7. The documentation is updated (slightly) to reflect this, as are the addcdiv tests in test_torch and test_type_promotion.
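To show why the change could otherwise cause silent errors, here is a pure-Python sketch of the elementwise formula `input + value * tensor1 / tensor2` under both division modes. Both helper functions are hypothetical models operating on lists, and the truncating variant assumes C-style truncation toward zero for the legacy integer behavior.

```python
def addcdiv_true(inp, t1, t2, value=1):
    """Model of the planned behavior: true division of t1 by t2."""
    return [i + value * (a / b) for i, a, b in zip(inp, t1, t2)]


def addcdiv_truncating(inp, t1, t2, value=1):
    """Model of the legacy integer behavior (assumed: truncation toward zero)."""
    return [i + value * int(a / b) for i, a, b in zip(inp, t1, t2)]


inp, t1, t2 = [0, 0], [3, -3], [2, 2]

print(addcdiv_true(inp, t1, t2))        # fractional results are kept
print(addcdiv_truncating(inp, t1, t2))  # fractional parts are dropped
```

The same integer inputs give different answers under the two modes, so raising an error in the intermediate release makes affected call sites visible before the semantics change.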
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38762
Differential Revision: D21657585
Pulled By: mruberry
fbshipit-source-id: c514b44409706f2bcfeca4473424b30cc48aafbc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37181
Now that assertEquals considers dtypes in determining tolerance, most
tests don't need explicitly set precision.
Those that do are a few half precision tests on cuda. In this PR, those
are broken out to be handled explicitly, though we may also want to
consider further loosening the tolerance on half-precision.
Test Plan: Imported from OSS
Differential Revision: D21728402
Pulled By: nairbv
fbshipit-source-id: 85f3daf63f1bdbb5101e8dea8c125f13448ca228
Summary:
This updates assertEqual and assertEqual-like functions to either require both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replaced with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument.
In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too.
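The "both or neither" rule and the keyword-only "msg" can be sketched on plain Python numbers as below. `assert_equal_sketch` is a hypothetical stand-in, not the test suite's assertEqual, and the exact-comparison default when no tolerance is given is an assumption made only for this example.

```python
def assert_equal_sketch(x, y, *, atol=None, rtol=None, msg=None):
    """Hypothetical stand-in for a tolerance-aware assertEqual on Python numbers."""
    # Tolerances come as a pair: supplying only one of them is rejected.
    if (atol is None) != (rtol is None):
        raise ValueError("specify both atol and rtol, or neither")
    if atol is None:
        # Assumed default for this sketch only: exact comparison.
        atol, rtol = 0.0, 0.0
    if abs(x - y) > atol + rtol * abs(y):
        raise AssertionError(msg if msg is not None else f"{x!r} != {y!r}")


# Neither tolerance given: exact comparison.
assert_equal_sketch(1.0, 1.0)

# Both tolerances given, with a keyword-only message.
assert_equal_sketch(1.0, 1.0005, atol=1e-3, rtol=0.0, msg="values drifted")
```

Because everything after `y` is keyword-only, a stray third positional argument fails loudly with a `TypeError` instead of being read as a tolerance.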
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872
Differential Revision: D21740237
Pulled By: mruberry
fbshipit-source-id: acbc027aa1d7877a49664d94db9a5fff91a07042
Summary:
This updates assertEqual and assertEqual-like functions to either require both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replaced with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument.
In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872
Differential Revision: D21717199
Pulled By: mruberry
fbshipit-source-id: 9feb856f94eee911b44f6c7140a1d07c1b026d3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37098
### **Cherry-picked from another stack:**
Some code review already occurred here: https://github.com/pytorch/pytorch/pull/32582
### Summary:
Fixes: https://github.com/pytorch/pytorch/issues/32436
The issue caused incorrect handling of dtypes for scalar ** tensor.
e.g. before this change:
```
>>> 5.5 ** torch.ones(5, dtype=torch.int32)
tensor([5, 5, 5, 5, 5], dtype=torch.int32)
```
should return a float tensor.
Also fixes a number of incorrect cases:
* tensors to negative powers were giving incorrect results (1 instead
of 0 or error)
* Behavior wasn't consistent between cuda/cpu
* large_value ** 1 in some cases gave a result not equal
to large_value because of truncation in conversion to double and back.
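The last bullet can be reproduced with plain Python integers, independent of any tensor code. This sketch only shows the rounding effect of going through a double; the specific value is an illustrative choice, not one taken from the PR's tests.

```python
# A double has a 53-bit significand, so not every integer above 2**53 is representable.
large_value = 2**53 + 1

# Converting to double and back silently changes the value.
round_tripped = int(float(large_value))

print(large_value)
print(round_tripped)
print(round_tripped == large_value)

# Exact integer arithmetic does not have this problem.
print(large_value ** 1 == large_value)
```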
BC-breaking:
Previously incorrect behavior (in 1.4):
```
>>> a
tensor([1, 1, 1, 1, 1], dtype=torch.int32)
>>> a.pow_(.5)
tensor([1, 1, 1, 1, 1], dtype=torch.int32)
```
After this change:
`RuntimeError: result type Float can't be cast to the desired output type Int`
Test Plan: Imported from OSS
Differential Revision: D21686207
Pulled By: nairbv
fbshipit-source-id: e797e7b195d224fa46404f668bb714e312ea78ac
Summary:
Related issue: https://github.com/pytorch/pytorch/issues/36900
Since I feel this PR is already large enough, I didn't migrate max in this PR. Legacy code is not cleaned up either. All this remaining work will be done in later PRs after this is merged.
Benchmark on an extreme case
```python
import torch
print(torch.__version__)
t = torch.randn(100000, 2, device='cuda')
warmup = torch.arange(100000000)
torch.cuda.synchronize()
%timeit t.min(dim=0); torch.cuda.synchronize()
```
Before: 4ms; After: 24.5us.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38440
Differential Revision: D21560691
Pulled By: ngimel
Summary:
This PR fixes the tolerance values for some of the bfloat16 div tests that were enabled on ROCm with incorrect tolerance values in the PR https://github.com/pytorch/pytorch/pull/38621
Also disabled (to unblock CI) `test_addcdiv*`, for which the error is large when the absolute values in the tensor are higher. This will have to be investigated further.
ezyang jeffdaily sunway513
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38823
Differential Revision: D21686290
Pulled By: ezyang
fbshipit-source-id: 85472680e1886bdc7c227ed2656e0b4fd5328e46
Summary:
This PR ports `masked_select` from TH to ATen and optimizes the performance on CPU with TensorIterator.
https://github.com/pytorch/pytorch/issues/33053
1. single socket run: up to **5.4x** speedup;
2. single core run: up to **1.16x** speedup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33269
Differential Revision: D20922288
Pulled By: ngimel
fbshipit-source-id: 38e183a4e3599bba29bbbebe36264026abe1c50e
Summary:
Updates our tests in preparation of integer division using torch.div and torch.addcdiv throwing a runtime error by avoiding integer division using torch.div. This creates a brief period where integer division using torch.div is untested, but that should be OK (since it will soon throw a runtime error).
These callsites were identified using https://github.com/pytorch/pytorch/issues/36897.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38621
Differential Revision: D21612823
Pulled By: mruberry
fbshipit-source-id: 749c03a69feae02590b4395335163d9bf047e162
Summary:
floordiv was missing a couple of dunder registrations, which was causing __ifloordiv__ to not be called when it should. This adds the appropriate registrations and adds a test verifying that the inplace dunders are actually occurring inplace.
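The effect of a missing in-place dunder can be seen with ordinary Python classes. The two classes below are hypothetical toy types, not tensor code: when `__ifloordiv__` is absent, `//=` falls back to `__floordiv__` and rebinds the name to a new object instead of updating the existing one.

```python
class Box:
    """Toy value type with both the out-of-place and in-place floordiv dunders."""

    def __init__(self, value):
        self.value = value

    def __floordiv__(self, other):
        return Box(self.value // other)

    def __ifloordiv__(self, other):
        self.value //= other  # mutate in place
        return self


class BoxWithoutInplace:
    """Same type, but without __ifloordiv__."""

    def __init__(self, value):
        self.value = value

    def __floordiv__(self, other):
        return BoxWithoutInplace(self.value // other)


a = Box(7)
a_alias = a
a //= 2
print(a is a_alias, a_alias.value)   # same object, updated through the alias

b = BoxWithoutInplace(7)
b_alias = b
b //= 2
print(b is b_alias, b_alias.value)   # new object, alias left unchanged
```

An identity check like `a is a_alias` is one simple way a test can confirm that an operation really happened in place.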
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38695
Differential Revision: D21633980
Pulled By: mruberry
fbshipit-source-id: a423f5ec327cdc062fd6d9d56abd36fe44ac8198
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37984
- `NumericUtils.h`
CUDA distribution kernels had two variants of the transformation lambdas (`uniform`/`normal` -> `lognormal`/`exponential`/`cauchy`/`geometric`...): one for double precision and one optimized for CUDA single precision. This was done by using `::log`/`__logf`, `::exp`/`__expf` and `::tan`/`__tanf`. I moved them to `NumericUtils.h` and called them `at::exp`, `at::log` and `at::tan`. This made it possible to unify the CPU/CUDA transformation templates in `TransformationHelper.h`.
- `DistributionsHelper.h`
Made `normal_distribution`, `geometric_distribution`, `exponential_distribution`, `cauchy_distribution`, `lognormal_distribution` C10_HOST_DEVICE compatible to reuse them in CPU/CUDA distribution kernels.
Replaced explicit math with transformations from `TransformationHelper.h`
- `TransformationHelper.h`
Renamed `*_transformation` to `transformation::*`
Added clear, unified host/device transformation templates `normal`, `cauchy`, `exponential`, `geometric`, `log_normal`, which are used by both CPU and CUDA distribution kernels and custom PRNG distribution kernels.
- `cpu/DistributionTemplates.h`
Unified `normal_kernel`, `cauchy_kernel`, `log_normal_kernel`, `geometric_kernel`, `exponential_kernel`.
- `cuda/DistributionTemplates.h`
Extracted `UNIFORM_AND_TRANSFORM` and `NORMAL_AND_TRANSFORM` macros to reuse code between distribution kernel templates.
Unified the transformation lambdas (`uniform`/`normal` -> `lognormal`/`exponential`/`cauchy`/`geometric`...)
- `test_torch.py`
Added `scipy.stats.kstest` [Kolmogorov–Smirnov](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test) tests for the `uniform`/`normal`/`lognormal`/`exponential`/`cauchy` distributions and a [Chi-squared](https://en.wikipedia.org/wiki/Chi-squared_test) test for the `geometric` one, to make sure that our distributions are correct.
- `cpu_rng_test.cpp`, `rng_test.h`
Fixed random_()'s from and to bounds issue for floating-point types, fixed cast/overflow warnings
- `THTensorRandom.h`, `THVector.h`
Moved unnecessary includes to `THTensorRandom.cpp`
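For readers unfamiliar with the `uniform`/`normal` -> target distribution transformations mentioned above, here is a pure-Python sketch using the textbook inverse-transform formulas. The function names and signatures are chosen for this example and are assumptions; they are not the templates in `TransformationHelper.h`, whose exact formulas may differ.

```python
import math
import random


def exponential(u: float, lam: float) -> float:
    """Map a uniform sample u in [0, 1) to Exponential(lam)."""
    return -math.log(1.0 - u) / lam


def cauchy(u: float, median: float, sigma: float) -> float:
    """Map a uniform sample u in (0, 1) to Cauchy(median, sigma)."""
    return median + sigma * math.tan(math.pi * (u - 0.5))


def log_normal(n: float, mean: float, std: float) -> float:
    """Map a standard normal sample n to LogNormal(mean, std)."""
    return math.exp(n * std + mean)


def geometric(u: float, p: float) -> int:
    """Map a uniform sample u in (0, 1) to Geometric(p) on {1, 2, ...}."""
    return math.ceil(math.log(u) / math.log1p(-p))


# Usage: draw base samples once, then push them through a transformation.
rng = random.Random(0)
lam = 2.0
samples = [exponential(rng.random(), lam) for _ in range(10_000)]
print(sum(samples) / len(samples))  # should be close to 1 / lam
```

Keeping each transformation as a small pure function of the base sample is what makes it possible to share one definition between different kernels; only the source of `u` or `n` changes.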
Test Plan: Imported from OSS
Differential Revision: D21477955
Pulled By: pbelevich
fbshipit-source-id: 7b793d1761a7a921c4b4a4a7d21d5d6c48f03e72
Summary:
Edit: this has been updated to reflect the PR's current status, which has changed after review.
This PR updates the behavior of the assertEqual, assertNotEqual, and assert_allclose to be consistent with each other and torch.isclose. It corrects several additional bugs in the current implementations and adds extensive testing and comments, too.
These updates follow from changes to assertEqual like https://github.com/pytorch/pytorch/pull/34258 and https://github.com/pytorch/pytorch/pull/37069, and from our discussion of torch.isclose for complex tensors (see https://github.com/pytorch/pytorch/issues/36462), where we decided to implement a NumPy-compatible mathematical notion of "closeness" for complex tensors that is not a great fit for our testing framework.
The detailed changelist is:
- New test framework functions for comparing tensors and scalars
- Tensors are compared using isclose; the real and imaginary parts of complex tensors are compared independently
- Scalars are compared using the same algorithm
- assertEqual and assert_allclose now use this common comparison function, instead of each implementing their own with divergent behavior
- assertEqual-like debug messages are now available for all tensor and scalar comparisons, with additional context when comparing the components of sparse, quantized, and complex tensors
- Extensive testing of the comparison behavior and debug messages
- Small Updates
- assertEqual now takes an "exact_device" argument, analogous to "exact_dtype", which should be useful in multidevice tests
- assertEqual now takes an "equal_nan" argument for argument consistency with torch.isclose
- assertEqual no longer takes the "allow_inf" keyword, which misleadingly only applied to scalar comparisons, was only ever set (rarely) to true, and is not supported by torch.isclose
- Bug fixes:
- the exact_dtype attribute has been removed (no longer needed after https://github.com/pytorch/pytorch/pull/38103)
- message arguments passed to assertEqual are now handled correctly
- bool x other dtype comparisons are now supported
- uint8 and int8 tensor comparisons now function properly
- rtol for integer comparisons is now supported (default is zero)
- rtol and atol for scalar comparisons are now supported
- complex scalar comparisons are now supported, analogous to complex tensor comparisons
- assertNotEqual is now equivalent to the logical negation of assertEqual
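The difference between comparing complex values by parts and comparing them by distance in the complex plane can be sketched on Python scalars. The helpers below are hypothetical and only model the idea; they use the `isclose` rule `|a - b| <= atol + rtol * |b|` and are not the test framework's comparison functions.

```python
import math


def is_close(a: float, b: float, *, rtol: float, atol: float, equal_nan: bool = False) -> bool:
    """Scalar closeness check following |a - b| <= atol + rtol * |b|."""
    if math.isnan(a) or math.isnan(b):
        return equal_nan and math.isnan(a) and math.isnan(b)
    if a == b:  # also covers equal infinities
        return True
    if math.isinf(a) or math.isinf(b):
        return False
    return abs(a - b) <= atol + rtol * abs(b)


def complex_close_by_parts(a: complex, b: complex, *, rtol: float, atol: float) -> bool:
    """Compare the real and imaginary parts independently."""
    return is_close(a.real, b.real, rtol=rtol, atol=atol) and is_close(
        a.imag, b.imag, rtol=rtol, atol=atol
    )


def complex_close_by_modulus(a: complex, b: complex, *, rtol: float, atol: float) -> bool:
    """Treat a and b as points in the plane and compare by distance."""
    return abs(a - b) <= atol + rtol * abs(b)


a = complex(1e6, 1e-3)
b = complex(1e6, 2e-3)

print(complex_close_by_modulus(a, b, rtol=1e-5, atol=0.0))  # large real part hides the gap
print(complex_close_by_parts(a, b, rtol=1e-5, atol=0.0))    # imaginary parts differ by 2x
```

With the distance-based notion, a large real part can mask a badly wrong imaginary part, which is one reason a per-component comparison is a better fit for tests.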
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37294
Differential Revision: D21596830
Pulled By: mruberry
fbshipit-source-id: f2576669f7113a06f82581fc71883e6b772de19b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38505
This takes the testing of https://github.com/pytorch/pytorch/pull/38275, but doesn't include the kernel changes which are still being worked out.
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D21580574
Pulled By: gchanan
fbshipit-source-id: f12317259cb7373989f6c9ad345b19aaac524851