Summary:
Issue https://github.com/pytorch/pytorch/issues/24596
This PR moves `mm` cuda to ATen. The internal `addmmImpl` that was used as the base of the old TH version of `mm` cuda is also ported.
This PR also sets up `addmm` cuda to be fairly easily ported to ATen in a future PR, since TH `mm` and `addmm` used the same `addmmImpl` function at their core.
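For context, a minimal sketch of the relationship the shared `addmmImpl` exploits (assuming a CUDA device is available): `mm` is just `addmm` with `beta=0`, so both can sit on top of a single GEMM implementation.
```
import torch

a = torch.randn(2, 3, device="cuda")
b = torch.randn(3, 4, device="cuda")
out = torch.empty(2, 4, device="cuda")
# With beta=0 the values in `out` are ignored, so addmm reduces to mm.
assert torch.allclose(torch.mm(a, b), torch.addmm(out, a, b, beta=0))
```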
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34891
Differential Revision: D20650713
Pulled By: ngimel
fbshipit-source-id: 692aba1bbae65a18d23855b5e101446082d64c66
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35167
The purpose of this PR is to move `normal`/`normal_`/`normal_out` to `native/DistributionTemplates.h`, `native/cpu/DistributionTemplates.h`, and `native/cuda/DistributionTemplates.h` to make them reusable for custom RNGs; see cpu_rng_test.cpp for an example of a custom RNG.
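For reference, a usage sketch of the three public entry points being templatized, as they appear in current PyTorch (not the template internals):
```
import torch

x = torch.empty(3)
x.normal_(mean=0.0, std=1.0)               # in-place normal_
y = torch.normal(0.0, 1.0, size=(3,))      # out-of-place normal
torch.normal(0.0, 1.0, size=(3,), out=x)   # normal with out=
```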
Test Plan: Imported from OSS
Differential Revision: D20588248
Pulled By: pbelevich
fbshipit-source-id: 7ee60be97f81522cd68894ff1389007c05130a60
Summary:
Fixes https://github.com/pytorch/pytorch/issues/34191
`at::native::radixSelect` essentially uses integer comparison, which imposes a defined ordering on non-finite float values. This isn't compatible with IEEE float comparison, so mixing the two leads to values that are never written to the output.
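To illustrate, here is a minimal sketch of the kind of monotone bit transform radix select relies on (the helper name is illustrative, not the THC one): reinterpreting float bits as unsigned integers gives every value, including NaN, a fixed position in the ordering, whereas IEEE comparison leaves NaN unordered.
```
import struct

def float_to_ordered_int(x):
    # Reinterpret the IEEE-754 bits of a float32 as an unsigned int.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Flip all bits for negatives, flip just the sign bit for non-negatives,
    # so unsigned integer order matches IEEE order for finite values.
    return bits ^ 0xFFFFFFFF if bits & 0x80000000 else bits | 0x80000000

vals = [float("nan"), float("inf"), -0.0, 0.0, 1.5, -1.5]
# NaN lands at a defined (here: largest) position under the integer ordering.
print(sorted(vals, key=float_to_ordered_int))
```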
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35253
Differential Revision: D20645554
Pulled By: ezyang
fbshipit-source-id: 651bcb1742ed67086ec89cc318d862caae65b981
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34747
Adds the hardswish FP operator from MobileNetV3 to PyTorch. This is for
common operator coverage, since this is widely used. A future PR will
add the quantized version. CUDA is saved for a future PR as well.
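For reference, the operator's semantics from the MobileNetV3 paper (a sketch, not the PR's kernel):
```
import torch

def hardswish_ref(x):
    # hardswish(x) = x * relu6(x + 3) / 6
    return x * torch.clamp(x + 3, 0, 6) / 6
```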
Test Plan:
tests pass:
```
python test/test_torch.py TestTorchDeviceTypeCPU.test_hardswish_cpu_float32
```
microbenchmark:
https://gist.github.com/vkuzo/b10d3b238f24e58c585314e8b5385aca
(batch_size == 1: 11.5GiB/s, batch_size == 4: 11.9GiB/s)
Imported from OSS
Differential Revision: D20451404
fbshipit-source-id: c7e13c9ab1a83e27a1ba18182947c82c896efae2
Summary:
Per title. See related https://github.com/pytorch/pytorch/pull/34570.
In PyTorch 1.7 the plan is for torch.div and Python's division operator to perform "true" division, like Python 3, JAX, and NumPy. To facilitate this change, this PR expands true_divide to be a method so it can cover all of torch.div's use cases.
New true_divide tests are added to test_torch.py, test_type_promotion.py, and test_sparse.py.
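A quick sketch of the expanded surface (values illustrative):
```
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([2, 2, 2])
torch.true_divide(a, b)   # function form, as before
a.true_divide(b)          # new method form, mirroring Tensor.div
```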
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34794
Differential Revision: D20545507
Pulled By: mruberry
fbshipit-source-id: 55286f819716c8823d1930441a69008560ac2bd5
Summary:
In C++, casting a floating point value to an integer dtype is undefined when the value is outside the dtype's dynamic range. For example, casting 300.5 to Int8 is undefined behavior because the maximum representable Int8 value is 127, and 300.5 > 127.
PyTorch, like NumPy, deliberately allows and performs these casts, however, and doing so triggers undefined behavior that causes our sanitizers to (correctly) complain. I propose skipping this sanitization on our cast function.
The history of this PR demonstrates the issue, showing a single CI failure in the ASAN build when a test is added that converts a large float value to an integral value. The current PR shows a green CI after the sanitization is skipped.
There are alternatives to skipping this sanitization:
- Clamping or otherwise converting floats to the dynamic range of integral types they're cast to
- Throwing a runtime error if a float value is outside the dynamic range of the integral type it's cast to (this would not be NumPy compatible)
- Declaring programs that perform these casts to be in error (which is technically true)
- Preventing this happening in PyTorch proper so the ASAN build doesn't fail
None of these alternatives seems particularly appealing, and I think it's appropriate to skip the sanitization because our behavior is deliberate.
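For concreteness, a small demonstration of the deliberately-permitted cast; the printed values are platform dependent, since in C++ terms the underlying conversion is undefined:
```
import numpy as np
import torch

# Both libraries permit the out-of-range cast; the result is unspecified.
print(np.array(300.5, dtype=np.float32).astype(np.int8))
print(torch.tensor(300.5).to(torch.int8))
```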
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35086
Differential Revision: D20591163
Pulled By: mruberry
fbshipit-source-id: fa7a90609c73c4c627bd39726a7dcbaeeffa1d1b
Summary:
Per title.
In the future we want to make div(), the division operator, and addcdiv perform true division as in Python 3, NumPy, and JAX. To do this without silently breaking users we plan to:
- Warn (once) in 1.5 when a user performs integer division using div or addcdiv
- RuntimeError in 1.6 when a user attempts to perform integer division using div or addcdiv
- Always perform true division in 1.7 using div, /, and addcdiv
Users can use true_divide or floor_divide today to explicitly specify the type of division they like.
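For example, the two explicit forms side by side (output values shown for illustration):
```
import torch

a = torch.tensor([3, 4])
b = torch.tensor([2, 2])
a.true_divide(b)    # tensor([1.5000, 2.0000]) -- the future div behavior
a.floor_divide(b)   # tensor([1, 2])           -- today's integer division
```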
A test for this behavior is added to test_type_promotion. Unfortunately, because we are only warning once (to avoid a deluge), the test only uses maybeWarnsRegex.
The XLA failure is real but will be solved by https://github.com/pytorch/pytorch/pull/34552. I'll be sure to land that PR first to avoid temporarily breaking the XLA build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34570
Differential Revision: D20529211
Pulled By: mruberry
fbshipit-source-id: 65af5a9641c5825175d029e8413c9e1730c661d0
Summary:
(Updated per review feedback)
`torch.floor_divide` is currently a function that can operate on two tensors or a tensor and a scalar (scalar x scalar floor division is handled natively by Python and the JIT has a builtin function for it). This PR updates it to:
- have an out variant: `floor_divide(x, y, out=z)`
- be a method on a tensor: `x.floor_divide(y)`
- have an in-place variant: `x.floor_divide_(y)`
- work with sparse tensors
Tests are added to test_sparse.py and test_torch.py for these new behaviors.
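A minimal sketch of the variants listed above:
```
import torch

x = torch.tensor([5., 7.])
y = torch.tensor([2., 2.])
z = torch.empty(2)
torch.floor_divide(x, y, out=z)   # out variant
x.floor_divide(y)                 # method
x.floor_divide_(y)                # in-place
```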
In addition, this PR:
- cleans up the existing sparse division and true_division code and improves their error message
- adds testing of sparse true_division to test_sparse.py
- extends existing floor_divide testing in test_torch to run on CUDA, too, not just the CPU
Unfortunately, making floor_divide a method requires breaking backwards compatibility, and floor_divide has been added to the BC whitelist since this break is intentional. The BC issue is that the first parameter name to torch.floor_divide is changing from input to self. If you previously called torch.floor_divide with keyword arguments, e.g. torch.floor_divide(input=x, other=y), you will need to update to torch.floor_divide(self=x, other=y), or the more common torch.floor_divide(x, y).
The intent of this PR is to allow floor_divide to be substituted for division (torch.div, /) wherever division was previously used. In 1.6 we expect torch.div to perform true_division, and floor_divide is how users can continue to perform integer division with tensors.
There are two potential follow-up issues suggested by this PR:
- the test framework might benefit from additional tensor construction classes, like one to create dividends and divisors for multiple dtypes
- the test framework might benefit from a universal function test class. While methods have reasonable coverage as part of test_torch.py's TestTensorOp tests, function coverage is spotty. Universal functions are similar enough that it should be possible to generate tests for them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34552
Differential Revision: D20509850
Pulled By: mruberry
fbshipit-source-id: 2cd3c828aad67191c77f2ed8470411e246f604f8
Summary:
Per title.
Currently torch.full will always (attempt to) produce a float tensor. This is inconsistent with NumPy in (at least) two cases:
- When integral fill values (including bool) are given
- When complex fill values are given
For example:
```
np.full((1, 2), 1).dtype
: dtype('int64')
np.full((1, 2), (1 + 1j)).dtype
: dtype('complex128')
```
Whereas in PyTorch
```
torch.full((1, 2), 1).dtype
: torch.float32
torch.full((1, 2), (1 + 1j)).dtype
: RuntimeError: value cannot be converted to type float without overflow: (1,1)
```
This PR begins deprecating our current behavior of returning float tensors (by default) when given integer fill values. It warns the user that integer fill values will require explicitly specifying the dtype or out kwargs in 1.6, and that in 1.7 the behavior will change to return a LongTensor by default (a BoolTensor for bool values). The intermediate 1.6 release prevents the behavior from changing silently and unexpectedly.
The PR also implements inference for complex types. So that with it:
```
torch.full((1, 2), (1 + 1j)).dtype
: torch.complex64
```
The complex type inference returns a ComplexFloat tensor when given a complex fill value (and no dtype or out kwarg is specified), unless the default dtype is Double, in which case a ComplexDouble tensor is returned.
A test for these behaviors is added to test_torch.py.
Implementation note:
This PR required customizing full's dispatch because currently in eager codegen the TensorOptions object passed to functions improperly sets has_dtype() to true, even if the user did not explicitly provide a dtype. torch.arange already worked around this issue with its own custom implementation. The JIT, however, does pass a properly constructed TensorOptions object.
Future Work:
This PR does not extend torch.full's complex type inference to ONNX. This seems unlikely to come up and will be a clear error if it does. When integer type inference is added to torch.full, however, then porting the behavior to ONNX may be warranted. torch.arange ported its complex type promotion logic to ONNX, for example.
Additionally, this PR mostly leaves intact existing call sites in PyTorch that would trigger this warning. This keeps the PR more minimal (since it is already BC-breaking). I will submit a separate PR fixing PyTorch's call sites.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34709
Differential Revision: D20509387
Pulled By: mruberry
fbshipit-source-id: 129593ba06a1662032bbbf8056975eaa59baf933
Summary:
(Updated per review feedback)
`torch.floor_divide` is currently a function that can operate on two tensors or a tensor and a scalar (scalar x scalar floor division is handled natively by Python and the JIT has a builtin function for it). This PR updates it to:
- have an out variant: `floor_divide(x, y, out=z)`
- be a method on a tensor: `x.floor_divide(y)`
- have an in-place variant: `x.floor_divide_(y)`
- work with sparse tensors
Tests are added to test_sparse.py and test_torch.py for these new behaviors.
In addition, this PR:
- cleans up the existing sparse division and true_division code and improves their error message
- adds testing of sparse true_division to test_sparse.py
- extends existing floor_divide testing in test_torch to run on CUDA, too, not just the CPU
Unfortunately, making floor_divide a method requires breaking backwards compatibility, and floor_divide has been added to the BC whitelist since this break is intentional. The BC issue is that the first parameter name to torch.floor_divide is changing from input to self. If you previously called torch.floor_divide with keyword arguments, e.g. torch.floor_divide(input=x, other=y), you will need to update to torch.floor_divide(self=x, other=y), or the more common torch.floor_divide(x, y).
The intent of this PR is to allow floor_divide to be substituted for division (torch.div, /) wherever division was previously used. In 1.6 we expect torch.div to perform true_division, and floor_divide is how users can continue to perform integer division with tensors.
There are two potential follow-up issues suggested by this PR:
- the test framework might benefit from additional tensor construction classes, like one to create dividends and divisors for multiple dtypes
- the test framework might benefit from a universal function test class. While methods have reasonable coverage as part of test_torch.py's TestTensorOp tests, function coverage is spotty. Universal functions are similar enough that it should be possible to generate tests for them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34552
Differential Revision: D20497453
Pulled By: mruberry
fbshipit-source-id: ac326f2007d8894f730d1278fef84d63bcb07b5d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34545
This is for common operator coverage, since this is widely used. A future PR
will add the quantized version.
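For reference, the operator's semantics (a sketch, not the PR's kernel):
```
import torch

def hardsigmoid_ref(x):
    # hardsigmoid(x) = relu6(x + 3) / 6 = clamp(x / 6 + 1 / 2, 0, 1)
    return torch.clamp(x + 3, 0, 6) / 6
```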
Some initial questions for reviewers, since it's my first FP operator
diff:
* do we need a backwards.out method for this?
* do we need CUDA? If yes, should it be in this PR, or is it OK to split it out into a future PR?
Test Plan:
```
// test
python test/test_torch.py TestTorchDeviceTypeCPU.test_hardsigmoid_cpu_float32
// benchmark
python -m pt.hardsigmoid_test
...
Forward Execution Time (us) : 40.315
Forward Execution Time (us) : 42.603
```
Imported from OSS
Differential Revision: D20371692
fbshipit-source-id: 95668400da9577fd1002ce3f76b9777c6f96c327
Summary:
This test is flaky on my computer; the error is:
```
AssertionError: tensor(1.3351e-05) not less than or equal to 1e-05
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34764
Differential Revision: D20476006
Pulled By: ezyang
fbshipit-source-id: dad7e702275346070552c8a98765c37e6ca2c197
Summary:
This PR enables bfloat16 type for
- Embedding, Index, Sigmoid Ops used in [DLRM](https://github.com/facebookresearch/dlrm)
- Miscellaneous ops like comparison ops and the arange op used in unit tests
- Renames type lists with the pattern `*_with_bfloat16` in `test_torch.py` to avoid confusion
iotamudelta ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34630
Differential Revision: D20405093
Pulled By: ezyang
fbshipit-source-id: aa9538acf81b3a5a9a46ce5014529707fdf25687
Summary:
This PR implements the following linear algebra algorithms for low-rank matrices (a short usage sketch follows the list):
- [x] Approximate `A` as `Q Q^H A` - using Algorithm 4.4 from [Halko et al, 2009](http://arxiv.org/abs/0909.4061).
+ exposed as `torch.lowrank.get_approximate_basis(A, q, niter=2, M=None) -> Q`
+ [x] dense matrices
+ [x] batches of dense matrices
+ [x] sparse matrices
+ [x] documentation
- [x] SVD - using Algorithm 5.1 from [Halko et al, 2009](http://arxiv.org/abs/0909.4061).
+ uses `torch.lowrank.get_approximate_basis`
+ exposed as `torch.svd_lowrank(A, q=6, niter=2, M=None) -> (U, S, V)`
+ [x] dense matrices
+ [x] batches of dense matrices
+ [x] sparse matrices
+ [x] documentation
- [x] PCA - using `torch.svd_lowrank`
+ uses `torch.svd_lowrank`
+ exposed as `torch.pca_lowrank(A, center=True, q=None, niter=2) -> (U, S, V)`
+ [x] dense matrices
+ [x] batches of dense matrices
+ [x] sparse matrices, uses non-centered sparse matrix algorithm
+ [x] documentation
- [x] generalized eigenvalue solver using the original LOBPCG algorithm [Knyazev, 2001](https://epubs.siam.org/doi/abs/10.1137/S1064827500366124)
+ exposed as `torch.lobpcg(A, B=None, k=1, method="basic", ...)`
+ [x] dense matrices
+ [x] batches of dense matrices
+ [x] sparse matrices
+ [x] documentation
- [x] generalized eigenvalue solver using robust LOBPCG with orthogonal basis selection [Stathopoulos, 2002](https://epubs.siam.org/doi/10.1137/S1064827500370883)
+ exposed as `torch.lobpcg(A, B=None, k=1, method="ortho", ...)`
+ [x] dense matrices
+ [x] batches of dense matrices
+ [x] sparse matrices
+ [x] documentation
- [x] generalized eigenvalue solver using the robust and efficient LOBPCG Algorithm 8 from [Duersch et al, 2018](https://epubs.siam.org/doi/abs/10.1137/17M1129830) that switches to orthogonal basis selection automatically
+ the "ortho" method improves iterations so rapidly that in the current test cases it does not make sense to use the basic iterations at all. If users will have matrices for which basic iterations could improve convergence then the `tracker` argument allows breaking the iteration process at user choice so that the user can switch to the orthogonal basis selection if needed. In conclusion, there is no need to implement Algorithm 8 at this point.
- [x] benchmarks
+ [x] `torch.svd` vs `torch.svd_lowrank`, see notebook [Low-rank SVD](https://github.com/Quansight/pearu-sandbox/blob/master/pytorch/Low-rank%20SVD.ipynb). In conclusion, the low-rank SVD is going to be useful only for large sparse matrices where the full-rank SVD will fail due to memory limitations.
+ [x] `torch.lobpcg` vs `scipy.sparse.linalg.lobpcg`, see notebook [LOBPCG - pytorch vs scipy](https://github.com/Quansight/pearu-sandbox/blob/master/pytorch/LOBPCG%20-%20pytorch%20vs%20scipy.ipynb). In conclusion, both implementations give the same results (up to numerical errors from different methods); the scipy lobpcg implementation is generally faster.
+ [x] On very small tolerance cases, `torch.lobpcg` is more robust than `scipy.sparse.linalg.lobpcg` (see `test_lobpcg_scipy` results)
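A short usage sketch of the new entry points (shapes and arguments illustrative):
```
import torch

A = torch.randn(100, 20)
U, S, V = torch.svd_lowrank(A, q=6, niter=2)         # approximate rank-6 SVD
U2, S2, V2 = torch.pca_lowrank(A, q=6, center=True)  # PCA via svd_lowrank
M = A.t() @ A                                        # symmetric positive semi-definite
E, X = torch.lobpcg(M, k=3, method="ortho")          # 3 largest eigenpairs
```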
Resolves https://github.com/pytorch/pytorch/issues/8049.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29488
Differential Revision: D20193196
Pulled By: vincentqb
fbshipit-source-id: 78a4879912424595e6ea95a95e483a37487a907e
Summary:
This PR enables bfloat16 type for loss criterion ops (and the ops they depend on) and a few miscellaneous ops required to train resnet50.
iotamudelta ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34469
Differential Revision: D20348856
Pulled By: ezyang
fbshipit-source-id: 0a8f06c2169cfa3c9cf319120e27150170095f6c
Summary:
This allows us to enable some double-based pdist tests that previously ran into accumulated error from casting down to float.
Addresses https://github.com/pytorch/pytorch/issues/33128
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34103
Differential Revision: D20343279
Pulled By: ezyang
fbshipit-source-id: a2da768259fab34ef326976283b7a15bebbbb979
Summary:
Addresses https://github.com/pytorch/pytorch/issues/5442.
Per title (and see issue). A test is added to test_torch.py to verify the behavior.
Update (with new behavior):
NumPy arrays can be non-writeable (read-only). When converting a NumPy array to a Torch tensor the storage is shared, but the tensor is always writable (PyTorch doesn't have a read-only tensor). Thus, when a non-writeable NumPy array is converted to a PyTorch tensor, it can be written to.
In the past, PyTorch would silently copy non-writeable NumPy arrays and then convert those copies into tensors. This behavior violates the from_numpy contract, however, which promises that the tensor and the array share memory.
This PR adds a warning when a non-writeable NumPy array is converted into a Torch tensor. This will not break any networks, but will make end users aware of the behavior. They can work around the warning by marking their NumPy arrays as writeable.
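A sketch of how the warning surfaces and the workaround (array contents illustrative):
```
import numpy as np
import torch

a = np.arange(3.0)
a.setflags(write=False)    # a read-only array...
t = torch.from_numpy(a)    # ...now warns: the tensor will still be writable
a.setflags(write=True)     # workaround: mark the array writeable first
t = torch.from_numpy(a)    # no warning; storage is shared as promised
```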
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33615
Differential Revision: D20289894
Pulled By: mruberry
fbshipit-source-id: b76df0077399eb91038b12a6bf1917ef38c2cafd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33825
Partially addresses #20376
I do this by overriding assertEqual in classes that opt into
this. This means I have to fix #33821. The fix is a little
unsatisfactory, as idiomatic Python 2 super() calls don't work
(since the class is no longer in scope); hopefully this will just
work when we go to Python 3.
General approach taken:
- A lot of dtype mismatches are because we specified tensor constants
that infer to some dtype, but the actual dtype needed is something else.
Those are easy, just annotate the tensor() constructor (often a legacy
Tensor/FloatTensor call) with dtype
- There are a few cases where the promotion rules are nontrivial. Some of them
I just typed out the expected promotion rules manually (based on trial
and error)
- There are some more complex cases; if it gets too hairy I just
set exact_dtype=False and nope the fuck out
I don't have time to do it for all the other classes. But the setup
should work if people just incrementally add the overrides to classes,
and then eventually flip the default.
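A sketch of the most common fix described above (names illustrative):
```
import torch

# Before: a legacy constructor infers float32, tripping exact-dtype checks
# when the value under test is float64.
expected = torch.FloatTensor([1, 2])
# After: annotate the constructor with the dtype the comparison needs.
expected = torch.tensor([1, 2], dtype=torch.float64)
```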
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20125791
Pulled By: ezyang
fbshipit-source-id: 389c2d1efbd93172af02f13e38ac5e92fe730c57
Summary:
1. randn and normal_ methods will work for complex tensors after this PR
2. added an internal function for viewing complex tensors as float tensors, which enables us to reuse functions defined for float tensors for complex tensors, with a change in the arguments passed (like size, or the standard deviation in the case of normal_). Currently the resultant float tensor doesn't share storage with the input complex tensor, which means the version counter wouldn't be updated if any function is called on this resultant tensor; once the dtype entry is removed from the storage class, this issue will be resolved.
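Conceptually (a sketch using today's public torch.view_as_real, which, unlike the internal helper here, does share storage): a complex tensor of shape (n,) maps to a float tensor of shape (n, 2) holding (real, imag) pairs.
```
import torch

z = torch.randn(4, dtype=torch.complex64)
f = torch.view_as_real(z)   # shape (4, 2); last dim is (real, imag)
```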
Side notes:
1. didn't add a separate header for the util functions because of this issue https://github.com/pytorch/pytorch/issues/20686#issuecomment-593002293
2. we should eventually have a public API method view_complex_as_float once the storage-sharing issue mentioned in (2) above is resolved
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34037
Differential Revision: D20221793
Pulled By: anjali411
fbshipit-source-id: a78f5e83d6104e2f55e0b250c4ec32e8d29a14eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33901
After this change, the pytest profile looks like:
4.83s call test/test_torch.py::TestTorch::test_fft_ifft_rfft_irfft
4.23s call test/test_torch.py::TestTorch::test_var_dim
4.22s call test/test_torch.py::TestTorch::test_std_dim
4.19s call test/test_torch.py::TestTorch::test_max
4.06s call test/test_torch.py::TestTorch::test_min
3.60s call test/test_torch.py::TestTorchDeviceTypeCPU::test_cdist_norm_batch_cpu
2.62s call test/test_torch.py::TestTorchDeviceTypeCPU::test_pow_cpu
2.60s call test/test_torch.py::TestTorch::test_matmul_small_brute_force_1d_Nd
And the entire CPU-only test suite can be run in 88s on my Intel(R) Xeon(R) CPU
E5-2650 v4 @ 2.20GHz
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20222288
Pulled By: ezyang
fbshipit-source-id: 4224a9117f42566e290ae202881d76f1545cebec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33819
These conditions are for the specific implementation; the fallback implementation works without these checks, so we use the fallback if any of these checks fails.
Resubmit of https://github.com/pytorch/pytorch/pull/33419 (which got reverted due to a problem with XLA, but which now has been fixed)
ghstack-source-id: 99333280
Test Plan: Test included
Differential Revision: D20121460
fbshipit-source-id: c1056b8e26751e24078bbe80c7cb4b223bcca7cb
Summary:
- Modified assertEqual to handle complex tensors
- Added a test in test_torch.py to test torch.zeros
- Added dispatch for complex for index_kernel, index_put_kernel
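For example, a minimal illustration of the new coverage:
```
import torch

z = torch.zeros(2, dtype=torch.complex64)  # constructible and comparable in tests
```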
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33773
Differential Revision: D20135553
Pulled By: anjali411
fbshipit-source-id: f716604535c0447ecffa335b0fc843431397c988