Commit Graph

1095 Commits

Author SHA1 Message Date
Mike Ruberry
21c94606b8 Cleans up type conversions, adds CPU test comparing with NumPy (#35374)
Summary:
Per title. Follow-up to https://github.com/pytorch/pytorch/pull/35086.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35374

Differential Revision: D20712443

Pulled By: mruberry

fbshipit-source-id: 987089c14bff644fd6a636da5530dc260e1d1a68
2020-03-27 22:11:57 -07:00
anjali411
96eec95ece torch.from_numpy for complex dtypes (#35531)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35531

Differential Revision: D20693581

Pulled By: anjali411

fbshipit-source-id: d53e26b4175452fa00b287efbfceea18104c1364
2020-03-27 14:40:28 -07:00
Johannes M Dieterich
835ee34e38 [ROCm] Update to ROCm 3.1.1 (#35552)
Summary:
Redux.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35552

Differential Revision: D20701593

Pulled By: ezyang

fbshipit-source-id: 1946d1e8fb47d597da903bae5d355bf52a5f017f
2020-03-27 12:21:12 -07:00
Vitaly Fedyunin
930d218fbf Increase Channels Last test coverage (#35504)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35504

Test Plan: Imported from OSS

Differential Revision: D20682117

Pulled By: VitalyFedyunin

fbshipit-source-id: ddd7ef1f075ea2c5c35df7bd698974fc5c59bc40
2020-03-27 12:04:47 -07:00
Natalia Gimelshein
8d720b7034 fix complex conversions on cuda (#35344)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35225.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35344

Differential Revision: D20650471

Pulled By: ngimel

fbshipit-source-id: f9edabc6dd8884f72c1a38cdf9dbe1de8362535e
2020-03-26 13:17:37 -07:00
KostekIV
ada40777c4 Rand function for complex dtype (#34924)
Summary:
Address https://github.com/pytorch/pytorch/issues/34380
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34924

Differential Revision: D20596623

Pulled By: anjali411

fbshipit-source-id: e17ce069cd763b773399128d113704579ca766e6
2020-03-26 08:34:56 -07:00
Johannes M Dieterich
d807292c4a [ROCm] Hotfix disable tests (#35396)
Summary:
Regressions were introduced over the last few days - disable these tests for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35396

Differential Revision: D20656744

Pulled By: xw285cornell

fbshipit-source-id: 386e4e5d50fb81a1d44e8f3558b81cb69299fe92
2020-03-26 00:21:40 -07:00
Kurt Mohler
a7c232f74c Port mm cuda from TH to ATen (#34891)
Summary:
Issue https://github.com/pytorch/pytorch/issues/24596

This PR moves `mm` cuda to ATen. The internal `addmmImpl` that was used as the base of the old TH version of `mm` cuda is also ported.

This PR also sets up `addmm` cuda to be fairly easily ported to ATen in a future PR, since TH `mm` and `addmm` used the same `addmmImpl` function at their core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34891

Differential Revision: D20650713

Pulled By: ngimel

fbshipit-source-id: 692aba1bbae65a18d23855b5e101446082d64c66
2020-03-25 21:42:35 -07:00
Pavel Belevich
2dd867f30f Move normal() to DistributionTemplates (#35167)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35167

The purpose of this PR is to move `normal`/`normal_`/`normal_out` to `native/DistributionTemplates.h`, `native/cpu/DistributionTemplates.h`, and `native/cuda/DistributionTemplates.h` to make them reusable for custom RNGs; see cpu_rng_test.cpp for an example of a custom RNG.

Test Plan: Imported from OSS

Differential Revision: D20588248

Pulled By: pbelevich

fbshipit-source-id: 7ee60be97f81522cd68894ff1389007c05130a60
2020-03-25 19:54:18 -07:00
Peter Bell
40b244ceb4 Fix handling of non-finite values in topk (#35253)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/34191

`at::native::radixSelect` essentially uses integer comparison, which imposes a well-defined ordering on non-finite float values. That ordering isn't compatible with IEEE float comparison, so mixing the two leads to unwritten values in the output.
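
As a quick illustration (an assumed sketch, not part of the PR), topk over inputs containing infinities should agree with a descending sort, with every output slot written:

```python
import torch

# Assumed sketch: the original issue was on CUDA; the same check works on CPU.
x = torch.tensor([1.0, float('inf'), -float('inf'), 0.5, -2.0])
values, _ = x.topk(3)                     # expected: [inf, 1.0, 0.5]
sorted_desc, _ = x.sort(descending=True)
assert torch.equal(values, sorted_desc[:3])
```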
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35253

Differential Revision: D20645554

Pulled By: ezyang

fbshipit-source-id: 651bcb1742ed67086ec89cc318d862caae65b981
2020-03-25 13:29:45 -07:00
Zafar Takhirov
5959bd6c29 Making sure all tensors in torch.cat sequence have the same dtype. (#35150)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35150

Fixes #35014

Test Plan: Imported from OSS

Differential Revision: D20578589

Pulled By: z-a-f

fbshipit-source-id: edeaef133d1cf5152dcbafab2b969f1424ee2836
2020-03-25 11:36:12 -07:00
Vasiliy Kuznetsov
f3e9fa6122 add hardswish FP operator (#34747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34747

Adds the hardswish FP operator from MobileNetV3 to PyTorch. This is for
common operator coverage, since this is widely used.  A future PR will
add the quantized version.  CUDA is saved for a future PR as well.
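
For reference, a small assumed usage sketch (not from the PR); hardswish(x) = x * relu6(x + 3) / 6:

```python
import torch
import torch.nn.functional as F

x = torch.linspace(-5, 5, steps=11)
manual = x * F.relu6(x + 3.0) / 6.0            # MobileNetV3 definition of hardswish
print(torch.allclose(F.hardswish(x), manual))  # expected: True
```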

Test Plan:
tests pass:
```
python test/test_torch.py TestTorchDeviceTypeCPU.test_hardswish_cpu_float32
```

microbenchmark:
https://gist.github.com/vkuzo/b10d3b238f24e58c585314e8b5385aca
(batch_size == 1: 11.5GiB/s, batch_size == 4: 11.9GiB/s)

Imported from OSS

Differential Revision: D20451404

fbshipit-source-id: c7e13c9ab1a83e27a1ba18182947c82c896efae2
2020-03-24 15:15:34 -07:00
Peter Bell
6f6436ff5d Fix input overwriting in irfft (#35219)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/34551
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35219

Differential Revision: D20605330

Pulled By: ezyang

fbshipit-source-id: a62f1685779bb05c3682255bb3a3f6f9ec35814f
2020-03-24 08:27:06 -07:00
Mike Ruberry
7c1ea736ba Extends true_divide to be a method (#34794)
Summary:
Per title. See related https://github.com/pytorch/pytorch/pull/34570.

In PyTorch 1.7 the plan is for torch.div and Python's division operator to perform "true" division, like Python 3, JAX, and NumPy. To facilitate this change, this PR expands true_divide to be a method so it can cover all of torch.div's use cases.

New true_divide tests are added to test_torch.py, test_type_promotion.py, and test_sparse.py.
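
A brief assumed sketch of the function and method forms:

```python
import torch

a = torch.tensor([5, 9, 3])
b = torch.tensor([2, 3, 2])

print(torch.true_divide(a, b))  # function form -> tensor([2.5000, 3.0000, 1.5000])
print(a.true_divide(b))         # new method form, equivalent to the above
```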
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34794

Differential Revision: D20545507

Pulled By: mruberry

fbshipit-source-id: 55286f819716c8823d1930441a69008560ac2bd5
2020-03-23 23:12:23 -07:00
Mike Ruberry
36e36eff2f Ignores deliberate undefined float->int conversion (#35086)
Summary:
In C++, casting a floating point value to an integer dtype is undefined when the value is outside the dtype's dynamic range. For example, casting 300.5 to Int8 is undefined behavior because the maximum representable Int8 value is 127, and 300.5 > 127.

PyTorch, like NumPy, deliberately allows and makes these casts, however, and when we do this we trigger undefined behavior that causes our sanitizers to (correctly) complain. I propose skipping this sanitization on our cast function.

The history of this PR demonstrates the issue, showing a single CI failure in the ASAN build when a test is added that converts a large float value to an integral value. The current PR shows a green CI after the sanitization is skipped.

There are alternatives to skipping this sanitization:

- Clamping or otherwise converting floats to the dynamic range of integral types they're cast to
- Throwing a runtime error if a float value is outside the dynamic range of the integral type it's cast to (this would not be NumPy compatible)
- Declaring programs in error if they perform these casts (this is technically true)
- Preventing this happening in PyTorch proper so the ASAN build doesn't fail

None of these alternatives seems particularly appealing, and I think it's appropriate to skip the sanitization because our behavior is deliberate.
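
For concreteness, an assumed sketch of the kind of cast involved (the resulting value is implementation-defined, which is exactly the point):

```python
import torch

# 300.5 is outside Int8's dynamic range; the cast is deliberately allowed,
# matching NumPy, but C++ leaves the resulting value unspecified.
x = torch.tensor([300.5])
print(x.to(torch.int8))  # some platform-dependent value, e.g. 44 on many x86 builds
```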
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35086

Differential Revision: D20591163

Pulled By: mruberry

fbshipit-source-id: fa7a90609c73c4c627bd39726a7dcbaeeffa1d1b
2020-03-23 01:08:57 -07:00
anjali411
7d5a899883 randn cuda kernel complex dtype (#35056)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35056

Differential Revision: D20559396

Pulled By: anjali411

fbshipit-source-id: 64b911f893e9c54aef89e8c1e643998d8b70e613
2020-03-20 11:19:08 -07:00
Wojciech Baranowski
eb78f7ea41 torch.cat: disallow inputs on different devices (#35053)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35045
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35053

Differential Revision: D20545517

Pulled By: ngimel

fbshipit-source-id: eee3fc87c7e578ff44d69d5ce6f92a8f496fa97b
2020-03-19 22:06:39 -07:00
rohithkrn
edb794fb19 [ROCm] Enable BFloat16 type for TopK operator on ROCm. (#34849)
Summary:
This PR enables bfloat16 for topk on ROCm.

iotamudelta ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34849

Differential Revision: D20544732

Pulled By: ezyang

fbshipit-source-id: 1ad017a4403d2a429d98e60c8eb1f78b320df920
2020-03-19 20:04:08 -07:00
Mike Ruberry
0d8447a9b8 Warns when performing integer division with div and addcdiv (#34570)
Summary:
Per title.

In the future we want to make div(), the division operator, and addcdiv perform true division as in Python 3, NumPy, and JAX. To do this without silently breaking users we plan to:

- Warn (once) in 1.5 when a user performs integer division using div or addcdiv
- RuntimeError in 1.6 when a user attempts to perform integer division using div or addcdiv
- Always perform true division in 1.7 using div, /, and addcdiv

Users can use true_divide or floor_divide today to explicitly specify the type of division they like.
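
A short assumed sketch of the explicit alternatives:

```python
import torch

a = torch.tensor([7, 9])
b = torch.tensor([2, 4])

print(torch.true_divide(a, b))   # tensor([3.5000, 2.2500]) -- explicit "true" division
print(torch.floor_divide(a, b))  # tensor([3, 2])           -- explicit integer division
# torch.div(a, b) on integer inputs warns in 1.5 and is planned to error in 1.6
```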

A test for this behavior is added to test_type_promotion. Unfortunately, because we only warn once (to avoid a deluge of warnings), the test can only use maybeWarnsRegex.

The XLA failure is real but will be solved by https://github.com/pytorch/pytorch/pull/34552. I'll be sure to land that PR first to avoid temporarily breaking the XLA build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34570

Differential Revision: D20529211

Pulled By: mruberry

fbshipit-source-id: 65af5a9641c5825175d029e8413c9e1730c661d0
2020-03-19 04:10:55 -07:00
Mike Ruberry
3b7e1cd2cc Makes floor_divide a method, adds sparse floor division (#34552)
Summary:
(Updated per review feedback)

`torch.floor_divide` is currently a function that can operate on two tensors or a tensor and a scalar (scalar x scalar floor division is handled natively by Python and the JIT has a builtin function for it). This PR updates it to:

- have an out variant: `floor_divide(x, y, out=z)`
- be a method on a tensor: `x.floor_divide(y)`
- have an in-place variant: `x.floor_divide_(y)`
- work with sparse tensors

Tests are added to test_sparse.py and test_torch.py for these new behaviors.
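
A quick assumed sketch of the new variants listed above:

```python
import torch

x = torch.tensor([7., 9.])
y = torch.tensor([2., 4.])
z = torch.empty(2)

torch.floor_divide(x, y, out=z)  # out variant
print(x.floor_divide(y))         # method variant -> tensor([3., 2.])
x.floor_divide_(y)               # in-place variant; x is now tensor([3., 2.])
```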

In addition, this PR:

- cleans up the existing sparse division and true_division code and improves their error message
- adds testing of sparse true_division to test_sparse.py
- extends existing floor_divide testing in test_torch to run on CUDA, too, not just the CPU

Unfortunately, making floor_divide a method requires breaking backwards compatibility, and floor_divide has been added to the BC whitelist since this break is intentional. The BC issue is that the first parameter name of torch.floor_divide is changing from input to self. If you previously called torch.floor_divide with keyword arguments, e.g. torch.floor_divide(input=x, other=y), you will need to update to torch.floor_divide(self=x, other=y), or the more common torch.floor_divide(x, y).

The intent of this PR is to allow floor_divide to be substituted for division (torch.div, /) wherever division was previously used. In 1.6 we expect torch.div to perform true_division, and floor_divide is how users can continue to perform integer division with tensors.

There are two potential follow-up issues suggested by this PR:

- the test framework might benefit from additional tensor construction classes, like one to create dividends and divisors for multiple dtypes
- the test framework might benefit from a universal function test class. While methods have reasonable coverage as part of test_torch.py's TestTensorOp tests, function coverage is spotty. Universal functions are similar enough that it should be possible to generate tests for them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34552

Differential Revision: D20509850

Pulled By: mruberry

fbshipit-source-id: 2cd3c828aad67191c77f2ed8470411e246f604f8
2020-03-18 15:00:53 -07:00
Mike Ruberry
1afc584188 Deprecates current torch.full integral type inference, adds torch.full complex type inference (#34709)
Summary:
Per title.

Currently torch.full will always (attempt to) produce a float tensor. This is inconsistent with NumPy in (at least) two cases:

- When integral fill values (including bool) are given
- When complex fill values are given

For example:

```
np.full((1, 2), 1).dtype
: dtype('int64')

np.full((1, 2), (1 + 1j)).dtype
: dtype('complex128')
```

Whereas in PyTorch

```
torch.full((1, 2), 1).dtype
: torch.float32

torch.full((1, 2), (1 + 1j)).dtype
: RuntimeError: value cannot be converted to type float without overflow: (1,1)
```

This PR begins the process of deprecating our current behavior of returning float tensors (by default) when given integer fill values by warning the user that integer fill values will require explicitly specifying the dtype or out kwargs in 1.6; in 1.7 the behavior will change to return a LongTensor by default (BoolTensor for bool values). The intermediate 1.6 release is to prevent changing the behavior silently and unexpectedly.
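
An assumed sketch of the workaround the deprecation asks for, i.e. passing dtype explicitly:

```python
import torch

# Explicitly specifying dtype avoids the deprecation warning and pins the result type.
print(torch.full((1, 2), 1, dtype=torch.long).dtype)     # torch.int64
print(torch.full((1, 2), True, dtype=torch.bool).dtype)  # torch.bool
```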

The PR also implements inference for complex types. So that with it:

```
torch.full((1, 2), (1 + 1j)).dtype
: torch.complex64
```

The complex type inference returns a ComplexFloat tensor when given a complex fill value (and no dtype or out kwarg is specified), unless the default dtype is Double, in which case a ComplexDouble tensor is returned.

A test for these behaviors is added to test_torch.py.

Implementation note:

This PR required customizing full's dispatch because currently in eager codegen the TensorOptions object passed to functions improperly sets has_dtype() to true, even if the user did not explicitly provide a dtype. torch.arange already worked around this issue with its own custom implementation. The JIT, however, does pass a properly constructed TensorOptions object.

Future Work:

This PR does not extend torch.full's complex type inference to ONNX. This seems unlikely to come up and will be a clear error if it does. When integer type inference is added to torch.full, however, then porting the behavior to ONNX may be warranted. torch.arange ported its complex type promotion logic to ONNX, for example.

Additionally, this PR mostly leaves existing call sites in PyTorch that would trigger this warning intact. This is to be more minimal (since the PR is BC breaking). I will submit a separate PR fixing PyTorch's call sites.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34709

Differential Revision: D20509387

Pulled By: mruberry

fbshipit-source-id: 129593ba06a1662032bbbf8056975eaa59baf933
2020-03-18 12:19:31 -07:00
Mike Ruberry
a1eaaea288 Revert D20497453: [pytorch][PR] Makes floor_divide a method, adds sparse floor division
Test Plan: revert-hammer

Differential Revision:
D20497453

Original commit changeset: ac326f2007d8

fbshipit-source-id: b94b89b1a25521506e3d0a6b072d3d4d8c55e63d
2020-03-18 01:48:50 -07:00
Mike Ruberry
b7129050e7 Makes floor_divide a method, adds sparse floor division (#34552)
Summary:
(Updated per review feedback)

`torch.floor_divide` is currently a function that can operate on two tensors or a tensor and a scalar (scalar x scalar floor division is handled natively by Python and the JIT has a builtin function for it). This PR updates it to:

- have an out variant: `floor_divide(x, y, out=z)`
- be a method on a tensor: `x.floor_divide(y)`
- have an in-place variant: `x.floor_divide_(y)`
- work with sparse tensors

Tests are added to test_sparse.py and test_torch.py for these new behaviors.

In addition, this PR:

- cleans up the existing sparse division and true_division code and improves their error message
- adds testing of sparse true_division to test_sparse.py
- extends existing floor_divide testing in test_torch to run on CUDA, too, not just the CPU

Unfortunately, making floor_divide a method requires breaking backwards compatibility, and floor_divide has been added to the BC whitelist since this break is intentional. The BC issue is that the first parameter name of torch.floor_divide is changing from input to self. If you previously called torch.floor_divide with keyword arguments, e.g. torch.floor_divide(input=x, other=y), you will need to update to torch.floor_divide(self=x, other=y), or the more common torch.floor_divide(x, y).

The intent of this PR is to allow floor_divide to be substituted for division (torch.div, /) wherever division was previously used. In 1.6 we expect torch.div to perform true_division, and floor_divide is how users can continue to perform integer division with tensors.

There are two potential follow-up issues suggested by this PR:

- the test framework might benefit from additional tensor construction classes, like one to create dividends and divisors for multiple dtypes
- the test framework might benefit from a universal function test class. While methods have reasonable coverage as part of test_torch.py's TestTensorOp tests, function coverage is spotty. Universal functions are similar enough that it should be possible to generate tests for them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34552

Differential Revision: D20497453

Pulled By: mruberry

fbshipit-source-id: ac326f2007d8894f730d1278fef84d63bcb07b5d
2020-03-18 00:01:45 -07:00
Vasiliy Kuznetsov
1bac5fd0d3 add hardsigmoid FP operator to PyTorch (#34545)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34545

This is for common operator coverage, since this is widely used.  A future PR
will add the quantized version.
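
For reference, a small assumed sketch (not from the PR); hardsigmoid(x) = relu6(x + 3) / 6, a piecewise-linear approximation of the sigmoid:

```python
import torch
import torch.nn.functional as F

x = torch.linspace(-5, 5, steps=11)
manual = F.relu6(x + 3.0) / 6.0                  # equivalently clamp(x / 6 + 0.5, 0, 1)
print(torch.allclose(F.hardsigmoid(x), manual))  # expected: True
```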

Some initial questions for reviewers, since it's my first FP operator
diff:
* do we need a backwards.out method for this?
* do we need CUDA? If yes, should it be in this PR or is it ok to split it out?

Test Plan:
```
// test
python test/test_torch.py TestTorchDeviceTypeCPU.test_hardsigmoid_cpu_float32

// benchmark
python -m pt.hardsigmoid_test
...
Forward Execution Time (us) : 40.315

Forward Execution Time (us) : 42.603
```

Imported from OSS

Differential Revision: D20371692

fbshipit-source-id: 95668400da9577fd1002ce3f76b9777c6f96c327
2020-03-16 15:24:12 -07:00
Xiang Gao
31eaeba38a Increase the prec of test_baddbmm (#34764)
Summary:
This test is flaky on my computer; the error is:
```
AssertionError: tensor(1.3351e-05) not less than or equal to 1e-05
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34764

Differential Revision: D20476006

Pulled By: ezyang

fbshipit-source-id: dad7e702275346070552c8a98765c37e6ca2c197
2020-03-16 15:06:01 -07:00
Pearu Peterson
8bae1ed144 PCA and SVD for low-rank matrices, LOBPCG for positive-defined generalized eigenvalue problem - copy (#34721)
Summary:
This is a copy of PR https://github.com/pytorch/pytorch/issues/29488 to help the merging process.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34721

Differential Revision: D20444270

Pulled By: vincentqb

fbshipit-source-id: 042c56c8c0dae37834f52b4aee2deae7dd6fa659
2020-03-16 14:13:30 -07:00
Andrew Delong
8e8a37d746 Fix bug in baddbmm corner case (#33467) (#33538)
Summary:
Ensure `torch.baddbmm(c, a, b)` returns `beta*c` when `a @ b` has empty inner dimension.

Fixes https://github.com/pytorch/pytorch/issues/33467.
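
A quick assumed sketch of the corner case:

```python
import torch

beta = 2.0
c = torch.ones(1, 2, 3)
a = torch.randn(1, 2, 0)   # empty inner dimension (k = 0)
b = torch.randn(1, 0, 3)

out = torch.baddbmm(c, a, b, beta=beta)
print(torch.equal(out, beta * c))   # expected: True -- a @ b contributes nothing
```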
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33538

Differential Revision: D20352352

Pulled By: albanD

fbshipit-source-id: a7021c1979f82402ecea4784d6cc39783392ea16
2020-03-13 09:30:20 -07:00
rohithkrn
2f32b92763 [ROCm] Enable BFloat16 type for EmbeddingBag ops et al (#34630)
Summary:
This PR enables bfloat16 type for

- Embedding, Index, Sigmoid Ops used in [DLRM](https://github.com/facebookresearch/dlrm)
- Miscellaneous ops like comparison ops, arange op used in unit tests
- Renames type lists matching the pattern `*_with_bfloat16` in `test_torch.py` to avoid confusion

iotamudelta ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34630

Differential Revision: D20405093

Pulled By: ezyang

fbshipit-source-id: aa9538acf81b3a5a9a46ce5014529707fdf25687
2020-03-12 11:30:33 -07:00
Edward Yang
4b929e5466 Revert D20193196: [pytorch][PR] PCA and SVD for low-rank matrices, LOBPCG for positive-defined generalized eigenvalue problem
Test Plan: revert-hammer

Differential Revision:
D20193196

Original commit changeset: 78a487991242

fbshipit-source-id: 8da4f8cb17c45af41e8c0ce80bc72581eb10dbb8
2020-03-11 09:24:34 -07:00
Pearu Peterson
2ec779d46c PCA and SVD for low-rank matrices, LOBPCG for positive-defined generalized eigenvalue problem (#29488)
Summary:
This PR implements the following linear algebra algorithms for low-rank matrices:
- [x] Approximate `A` as `Q Q^H A` - using Algorithm 4.4 from [Halko et al, 2009](http://arxiv.org/abs/0909.4061).
  + exposed as `torch.lowrank.get_approximate_basis(A, q, niter=2, M=None) -> Q`
  + [x] dense matrices
  + [x] batches of dense matrices
  + [x] sparse matrices
  + [x] documentation
- [x] SVD - using Algorithm 5.1 from [Halko et al, 2009](http://arxiv.org/abs/0909.4061).
  + uses `torch.lowrank.get_approximate_basis`
  + exposed as `torch.svd_lowrank(A, q=6, niter=2, M=None) -> (U, S, V)`
  + [x] dense matrices
  + [x] batches of dense matrices
  + [x] sparse matrices
  + [x] documentation
- [x] PCA - using `torch.svd_lowrank`
  + uses `torch.svd_lowrank`
  + exposed as `torch.pca_lowrank(A, center=True, q=None, niter=2) -> (U, S, V)`
  + [x] dense matrices
  + [x] batches of dense matrices
  + [x] sparse matrices, uses non-centered sparse matrix algorithm
  + [x] documentation
- [x] generalized eigenvalue solver using the original LOBPCG algorithm [Knyazev, 2001](https://epubs.siam.org/doi/abs/10.1137/S1064827500366124)
  + exposed as `torch.lobpcg(A, B=None, k=1, method="basic", ...)`
  + [x] dense matrices
  + [x] batches of dense matrices
  + [x] sparse matrices
  + [x] documentation
- [x] generalized eigenvalue solver using robust LOBPCG with orthogonal basis selection [Stathopoulos, 2002](https://epubs.siam.org/doi/10.1137/S1064827500370883)
  + exposed as `torch.lobpcg(A, B=None, k=1, method="ortho", ...)`
  + [x] dense matrices
  + [x] batches of dense matrices
  + [x] sparse matrices
  + [x] documentation
- [x] generalized eigenvalue solver using the robust and efficient LOBPCG Algorithm 8 from [Duersch et al, 2018](https://epubs.siam.org/doi/abs/10.1137/17M1129830) that switches to orthogonal basis selection automatically
  + the "ortho" method improves iterations so rapidly that in the current test cases it does not make sense to use the basic iterations at all. If users have matrices for which the basic iterations could improve convergence, the `tracker` argument allows breaking the iteration process at the user's choice, so that they can switch to orthogonal basis selection if needed. In conclusion, there is no need to implement Algorithm 8 at this point.
- [x] benchmarks
  + [x] `torch.svd` vs `torch.svd_lowrank`, see notebook [Low-rank SVD](https://github.com/Quansight/pearu-sandbox/blob/master/pytorch/Low-rank%20SVD.ipynb). In conclusion, the low-rank SVD is going to be useful only for large sparse matrices where the full-rank SVD will fail due to memory limitations.
  + [x] `torch.lobpcg` vs `scipy.sparse.linalg.lobpcg`, see notebook [LOBPCG - pytorch vs scipy](https://github.com/Quansight/pearu-sandbox/blob/master/pytorch/LOBPCG%20-%20pytorch%20vs%20scipy.ipynb). In conclusion, both implementations give the same results (up to numerical errors from the different methods); the scipy lobpcg implementation is generally faster.
  + [x] For very small tolerances, `torch.lobpcg` is more robust than `scipy.sparse.linalg.lobpcg` (see `test_lobpcg_scipy` results)

Resolves https://github.com/pytorch/pytorch/issues/8049.
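
A hedged usage sketch of the new entry points (shapes follow the signatures above; convergence details elided):

```python
import torch

A = torch.randn(100, 20) @ torch.randn(20, 80)     # a rank-20 matrix
U, S, V = torch.svd_lowrank(A, q=20, niter=2)      # approximate SVD
print(U.shape, S.shape, V.shape)                   # (100, 20), (20,), (80, 20)

U, S, V = torch.pca_lowrank(A, q=6, center=True)   # low-rank PCA

spd = A @ A.t() + 100 * torch.eye(100)             # symmetric positive definite
eigvals, eigvecs = torch.lobpcg(spd, k=3, method="ortho")
print(eigvals.shape, eigvecs.shape)                # (3,), (100, 3)
```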
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29488

Differential Revision: D20193196

Pulled By: vincentqb

fbshipit-source-id: 78a4879912424595e6ea95a95e483a37487a907e
2020-03-11 07:33:49 -07:00
Kurt Mohler
fbbeee0983 Port remainder from TH to ATen (CPU and CUDA) (#34136)
Summary:
CPU issue https://github.com/pytorch/pytorch/issues/24753
CUDA issue https://github.com/pytorch/pytorch/issues/24615
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34136

Differential Revision: D20375458

Pulled By: ezyang

fbshipit-source-id: 1a9fb39a7e2d17a0d31bd14b211eaacea060e834
2020-03-11 07:08:11 -07:00
Ailing Zhang
ab2297dfe6 Add Tensor overload for start in narrow. (#34317)
Summary:
https://github.com/pytorch/pytorch/issues/31558
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34317

Differential Revision: D20294333

Pulled By: ailzhang

fbshipit-source-id: 47c6646ae298e04a455923bd5048db026a5e3c7c
2020-03-10 22:33:22 -07:00
Gao, Xiang
d0834c5b64 Preserve memory format for torch.cat on CUDA (#34526)
Summary:
fix https://github.com/pytorch/pytorch/issues/34084

cc: ptrblck VitalyFedyunin
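
An assumed sketch of the expected behavior after this change (requires a CUDA build):

```python
import torch

if torch.cuda.is_available():
    a = torch.randn(2, 3, 4, 4, device="cuda").contiguous(memory_format=torch.channels_last)
    b = torch.randn(2, 3, 4, 4, device="cuda").contiguous(memory_format=torch.channels_last)
    out = torch.cat([a, b], dim=0)
    # Expected to print True once the channels-last layout is preserved through cat.
    print(out.is_contiguous(memory_format=torch.channels_last))
```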
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34526

Differential Revision: D20371847

Pulled By: ngimel

fbshipit-source-id: e3b1a34caff2db8099ad9afe91bf9b473d5da6e8
2020-03-10 16:06:10 -07:00
rohithkrn
29b673392f [ROCm] Enable BFloat16 type for loss functions and few misc ops required for resnet50 (#34469)
Summary:
This PR enables bfloat16 type for loss criterion ops(and the ops they depend on) and few miscellaneous ops required to train resnet50.

iotamudelta ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34469

Differential Revision: D20348856

Pulled By: ezyang

fbshipit-source-id: 0a8f06c2169cfa3c9cf319120e27150170095f6c
2020-03-10 08:39:07 -07:00
Johannes M Dieterich
2c1a302d6a [ROCm] Enable double __shfl_down (#34103)
Summary:
This allows us to enable some double-based pdist tests that previously ran into accumulated error from casting down to float.

Addresses https://github.com/pytorch/pytorch/issues/33128
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34103

Differential Revision: D20343279

Pulled By: ezyang

fbshipit-source-id: a2da768259fab34ef326976283b7a15bebbbb979
2020-03-09 16:23:56 -07:00
Mike Ruberry
7e55494502 Warns on read-only Numpy array->tensor conversion (#33615)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/5442.

Per title (and see issue). A test is added to test_torch.py to verify the behavior.

Update (with new behavior):

NumPy arrays can be non-writeable (read-only). When converting a NumPy array to a Torch tensor the storage is shared, but the tensor is always writable (PyTorch doesn't have a read-only tensor). Thus, when a non-writeable NumPy array is converted to a PyTorch tensor it can be written to.

In the past, PyTorch would silently copy non-writeable NumPy arrays and then convert those copies into tensors. This behavior violates the from_numpy contract, however, which promises that the tensor and the array share memory.

This PR adds a warning message when a non-writeable NumPy array is converted into a Torch tensor. This will not break any networks, but will make end users aware of the behavior. They can work around the warning by marking their NumPy arrays as writeable.
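
A hedged sketch of the behavior and the workaround described above:

```python
import numpy as np
import torch

arr = np.ones(3)
arr.flags.writeable = False

t = torch.from_numpy(arr)   # warns: the array is non-writeable, but the returned
                            # tensor shares its memory and is writable

arr.flags.writeable = True  # workaround: mark the array writeable (or pass a copy)
t2 = torch.from_numpy(arr)  # no warning
```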
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33615

Differential Revision: D20289894

Pulled By: mruberry

fbshipit-source-id: b76df0077399eb91038b12a6bf1917ef38c2cafd
2020-03-08 20:03:50 -07:00
Pavel Belevich
35b6d2945d Tensor.random_ check that from and to are in tensor dtype bounds (#34033)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34033

Test Plan: Imported from OSS

Differential Revision: D20182414

Pulled By: pbelevich

fbshipit-source-id: 3704570ead7de169ce13c81164be0aff0806fb46
2020-03-06 07:22:47 -08:00
lixinyu
f9f135c5d8 ChannelsLast3d support is_contiguous, contiguous, suggest_memory_format, caching (#33033)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33033

Test Plan: Imported from OSS

Differential Revision: D19759661

Pulled By: glaringlee

fbshipit-source-id: 6c4798fa93589338c0c71c5308b9fd1151330245
2020-03-06 06:02:03 -08:00
Peter Bell
2af64ba3ed Allow output to zero-strided tensors if the size is <= 1 along that dim (#34100)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33812
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34100

Differential Revision: D20267778

Pulled By: ngimel

fbshipit-source-id: 1b84c4f6e6bf5d29c3698daa3cb71554b25c1eee
2020-03-05 16:01:33 -08:00
Edward Yang
ba1bd41767 Turn on strict dtype checking for test_torch.py (#33825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33825

Partially addresses #20376

I do this by overriding assertEqual in classes that opt into
this.  This means I have to fix #33821.  The fix is a little
unsatisfactory as idiomatic Python 2 super() calls don't work
(since the class is no longer in scope); hopefully this will just
work when we go to Python 3.

General approach taken:
- A lot of dtype mismatches are because we specified tensor constants
  that infer to some dtype, but the actual dtype needed is something else.
  Those are easy, just annotate the tensor() constructor (often a legacy
  Tensor/FloatTensor call) with dtype
- There are a few cases where the promotion rules are nontrivial.  Some of them
  I just typed out the expected promotion rules manually (based on trial
  and error)
- There are some more complex cases; if it gets too hairy I just
  set exact_dtype=False and nope the fuck out

I don't have time to do it for all the other classes.  But the setup
should work if people just incrementally add the overrides to classes,
and then eventually flip the default.
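
A hypothetical sketch of the opt-in pattern described above (not the actual test-framework code; class and test names are made up):

```python
import unittest
import torch

class StrictDtypeTestCase(unittest.TestCase):
    # Hypothetical base class: tensor comparisons require matching dtypes
    # unless a test explicitly passes exact_dtype=False.
    def assertEqual(self, x, y, *, exact_dtype=True, **kwargs):
        if isinstance(x, torch.Tensor) and isinstance(y, torch.Tensor):
            if exact_dtype:
                self.assertIs(x.dtype, y.dtype)
            self.assertTrue(torch.allclose(x.double(), y.double()))
        else:
            super().assertEqual(x, y, **kwargs)

class ExampleTest(StrictDtypeTestCase):
    def test_arange_dtype(self):
        self.assertEqual(torch.arange(3), torch.tensor([0, 1, 2], dtype=torch.long))

if __name__ == "__main__":
    unittest.main()
```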

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20125791

Pulled By: ezyang

fbshipit-source-id: 389c2d1efbd93172af02f13e38ac5e92fe730c57
2020-03-03 14:45:53 -08:00
anjali411
fbc9c61c81 randn and normal_ for complex tensors (#34037)
Summary:
1. randn and normal_ methods will work for complex tensors after this PR
2. added an internal function for viewing complex tensors as float tensors, which enables us to reuse functions defined for float tensors for complex tensors with a change in the arguments passed (like size, or standard deviation in the case of normal_). Currently the resultant float tensor doesn't share storage with the input complex tensor, which means the version counter won't be updated if a function is called on the resultant tensor; once the dtype entry is removed from the storage class, this issue will be resolved.

Side notes:
1. didn't add a separate header for the util functions because of this issue https://github.com/pytorch/pytorch/issues/20686#issuecomment-593002293
2. we should eventually have a public API method view_complex_as_float once the storage-sharing issue mentioned in point (2) above is resolved
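
A brief assumed sketch of what this enables:

```python
import torch

z = torch.randn(2, 3, dtype=torch.complex64)  # complex Gaussian samples
z.normal_()                                   # in-place refill also works for complex
print(z.dtype, z.shape)                       # torch.complex64 torch.Size([2, 3])
```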
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34037

Differential Revision: D20221793

Pulled By: anjali411

fbshipit-source-id: a78f5e83d6104e2f55e0b250c4ec32e8d29a14eb
2020-03-03 12:46:01 -08:00
Edward Yang
74a0663afd In torch_test, mark every test that takes >5s on a DEBUG CPU-only build as slow test (#33901)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33901

After this change, the pytest profile looks like:

4.83s call     test/test_torch.py::TestTorch::test_fft_ifft_rfft_irfft
4.23s call     test/test_torch.py::TestTorch::test_var_dim
4.22s call     test/test_torch.py::TestTorch::test_std_dim
4.19s call     test/test_torch.py::TestTorch::test_max
4.06s call     test/test_torch.py::TestTorch::test_min
3.60s call     test/test_torch.py::TestTorchDeviceTypeCPU::test_cdist_norm_batch_cpu
2.62s call     test/test_torch.py::TestTorchDeviceTypeCPU::test_pow_cpu
2.60s call     test/test_torch.py::TestTorch::test_matmul_small_brute_force_1d_Nd

And the entire CPU-only test suite can be run in 88s on my Intel(R) Xeon(R) CPU
E5-2650 v4 @ 2.20GHz

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20222288

Pulled By: ezyang

fbshipit-source-id: 4224a9117f42566e290ae202881d76f1545cebec
2020-03-03 11:49:49 -08:00
Gerard Goossen
f29110fdf8 [pytorch] blas gemm fix for k=0 (#33819)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33819

These conditions are specific to the optimized implementation; the fallback implementation works without these checks, so use the fallback whenever any of them isn't true.
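
A quick assumed sketch of the k = 0 case that now takes the fallback path:

```python
import torch

a = torch.randn(2, 0)   # k = 0: empty inner dimension
b = torch.randn(0, 3)
print(torch.mm(a, b))   # expected: a 2x3 tensor of zeros
```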

Resubmit of https://github.com/pytorch/pytorch/pull/33419 (which got reverted due to a problem with XLA, but which now has been fixed)
ghstack-source-id: 99333280

Test Plan: Test included

Differential Revision: D20121460

fbshipit-source-id: c1056b8e26751e24078bbe80c7cb4b223bcca7cb
2020-03-03 08:56:05 -08:00
Pavel Belevich
e568c039bd Enable Tensor.random_(from, to) for half on CPU (#34030)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34030

Test Plan: Imported from OSS

Differential Revision: D20182412

Pulled By: pbelevich

fbshipit-source-id: b7439e6d66e1c0b9ffa8b397cab057c9146f5714
2020-03-02 14:22:35 -08:00
anjali411
ba4cff2ffc [dtype inference] Following pytorch default for float vs double (#33713)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33713

Differential Revision: D20193387

Pulled By: anjali411

fbshipit-source-id: d802ec395df4e75e2be02e91d7288ae6fb7cf8e0
2020-03-02 11:56:34 -08:00
Mingfei Ma
c6d301220a Fix torch.cat() performance regression on single core CPU (#33534)
Summary:
This PR addresses the single-threaded CPU performance regression in `torch.cat()`.
The previous optimization, https://github.com/pytorch/pytorch/issues/30806, introduced regressions for several cases in the PyTorch operator benchmark.
See https://github.com/pytorch/pytorch/issues/33334 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33534

Differential Revision: D20129963

Pulled By: VitalyFedyunin

fbshipit-source-id: 3fa6cd266978e5b54fa37105555502b77352df3e
2020-02-28 11:22:08 -08:00
anjali411
dece155335 Modified assertEqual to handle complex tensors (#33773)
Summary:
- Modified assertEqual to handle complex tensors
- Added a test in test_torch.py for torch.zeros
- Added dispatch for complex to index_kernel and index_put_kernel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33773

Differential Revision: D20135553

Pulled By: anjali411

fbshipit-source-id: f716604535c0447ecffa335b0fc843431397c988
2020-02-28 08:43:28 -08:00
Pavel Belevich
095de1e872 Migrate random_ from the TH to Aten (CPU and CUDA) (#33663)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33663

Test Plan: Imported from OSS

Differential Revision: D20056350

Pulled By: pbelevich

fbshipit-source-id: f9859b79ffdec70c48d6ee3ec70fd6fad593a9f5
2020-02-27 05:05:42 -08:00
Edward Yang
8159316714 Revert D19941103: [pytorch] blas gemm fix for k=0
Test Plan: revert-hammer

Differential Revision:
D19941103

Original commit changeset: e1c85d1e7574

fbshipit-source-id: da12747130c60b61452aa46e269c66546a1075f9
2020-02-25 13:30:38 -08:00
xiaobing.zhang
4d203c6fc8 Move cumprod and cumsum to Aten(CPU) (#33280)
Summary:
This PR moves cumprod and cumsum to ATen.
Test script:
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    return time.time()

device = "cpu"

#torch.set_num_threads(1)

#warm up
for n in [10, 300]:
    input = torch.randn(n, n, n, requires_grad=False, device=device)
    input = input * 0.01 + 1
    for dim in range(input.dim()):
        for i in range(100):
            #output = input.cumsum(dim)
            output = input.cumprod(dim)

for n in [10, 300]:
    input = torch.randn(n, n, n, requires_grad=False, device=device)
    input = input * 0.01 + 1
    for dim in range(input.dim()):
        fwd_t = 0
        for i in range(1000):
            t1 = _time()
            #output = input.cumsum(dim)
            output = input.cumprod(dim)
            t2 = _time()
            fwd_t = fwd_t + (t2 -t1)
        fwd_avg = fwd_t / 1000 * 1000
        print("size = (%d, %d, %d); reduce dim=%d; compute time is %.4f(ms)" % (n, n, n, dim, fwd_avg))
```
Test device: **skx-8180**.
Performance:
```
cumsum:
Before:
size = (10, 10, 10); reduce dim=0; compute time is 0.0098(ms)
size = (10, 10, 10); reduce dim=1; compute time is 0.0089(ms)
size = (10, 10, 10); reduce dim=2; compute time is 0.0089(ms)
size = (300, 300, 300); reduce dim=0; compute time is 208.9403(ms)
size = (300, 300, 300); reduce dim=1; compute time is 241.5989(ms)
size = (300, 300, 300); reduce dim=2; compute time is 66.2587(ms)
After:
size = (10, 10, 10); reduce dim=0; compute time is 0.0065(ms)
size = (10, 10, 10); reduce dim=1; compute time is 0.0063(ms)
size = (10, 10, 10); reduce dim=2; compute time is 0.0053(ms)
size = (300, 300, 300); reduce dim=0; compute time is 36.0139(ms)
size = (300, 300, 300); reduce dim=1; compute time is 36.0776(ms)
size = (300, 300, 300); reduce dim=2; compute time is 21.0111(ms)
number_threads = 1:
size = (10, 10, 10); reduce dim=0; compute time is 0.0053(ms)
size = (10, 10, 10); reduce dim=1; compute time is 0.0052(ms)
size = (10, 10, 10); reduce dim=2; compute time is 0.0051(ms)
size = (300, 300, 300); reduce dim=0; compute time is 81.8831(ms)
size = (300, 300, 300); reduce dim=1; compute time is 88.5687(ms)
size = (300, 300, 300); reduce dim=2; compute time is 54.9922(ms)

cumprod:
Before:
size = (10, 10, 10); reduce dim=0; compute time is 0.0096(ms)
size = (10, 10, 10); reduce dim=1; compute time is 0.0088(ms)
size = (10, 10, 10); reduce dim=2; compute time is 0.0088(ms)
size = (300, 300, 300); reduce dim=0; compute time is 221.2601(ms)
size = (300, 300, 300); reduce dim=1; compute time is 249.7894(ms)
size = (300, 300, 300); reduce dim=2; compute time is 71.5182(ms)
number_threads = 1:
size = (10, 10, 10); reduce dim=0; compute time is 0.0100(ms)
size = (10, 10, 10); reduce dim=1; compute time is 0.0093(ms)
size = (10, 10, 10); reduce dim=2; compute time is 0.0093(ms)
size = (300, 300, 300); reduce dim=0; compute time is 207.6287(ms)
size = (300, 300, 300); reduce dim=1; compute time is 241.6693(ms)
size = (300, 300, 300); reduce dim=2; compute time is 66.2977(ms)
After:
size = (10, 10, 10); reduce dim=0; compute time is 0.0063(ms)
size = (10, 10, 10); reduce dim=1; compute time is 0.0062(ms)
size = (10, 10, 10); reduce dim=2; compute time is 0.0053(ms)
size = (300, 300, 300); reduce dim=0; compute time is 36.4283(ms)
size = (300, 300, 300); reduce dim=1; compute time is 38.1139(ms)
size = (300, 300, 300); reduce dim=2; compute time is 20.9140(ms)
number_threads = 1:
size = (10, 10, 10); reduce dim=0; compute time is 0.0052(ms)
size = (10, 10, 10); reduce dim=1; compute time is 0.0052(ms)
size = (10, 10, 10); reduce dim=2; compute time is 0.0050(ms)
size = (300, 300, 300); reduce dim=0; compute time is 82.6926(ms)
size = (300, 300, 300); reduce dim=1; compute time is 90.1265(ms)
size = (300, 300, 300); reduce dim=2; compute time is 55.0196(ms)
```
Fix https://github.com/pytorch/pytorch/issues/24668, https://github.com/pytorch/pytorch/issues/24669.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33280

Differential Revision: D20076997

Pulled By: VitalyFedyunin

fbshipit-source-id: 12225767da8cfdc5e44257462a432bffa04cd469
2020-02-25 13:03:16 -08:00