Commit Graph

529 Commits

Author SHA1 Message Date
Jeffrey Wan
c0966914bc Internal gradcheck wrapper in testing._internal that sets certain flags to True (#51133)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49409

There are many call sites where gradcheck/gradgradcheck is now implicitly invoked with `check_batched_grad` as True, whereas it was previously False. The cases fall into two basic categories:
1) the call site was previously using `torch.autograd.gradcheck` but is now changed to use the globally imported function instead
2) the call site was already using the globally imported function, but did not explicitly pass the `check_batched_grad` flag

Only in the _assertGradAndGradgradChecks cases, which are infrequent, did I assume that the author is aware that omitting the flag means not applying check_batched_grad=True (but maybe that is not the case?).

Overall, this PR in its current state assumes that unless the author explicitly specified `check_batched_grad=False`, they were probably not aware of this flag and did not intend for it to be False.
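A minimal sketch (assumed; not the actual `torch.testing._internal` code) of the kind of wrapper this PR introduces: it forwards to `torch.autograd.gradcheck` with `check_batched_grad` defaulting to True, so call sites opt out explicitly rather than opting in.
```python
import torch
from torch.autograd import gradcheck as _autograd_gradcheck

def gradcheck(fn, inputs, **kwargs):
    # Default check_batched_grad to True; call sites that cannot support
    # batched gradients (e.g. MKL-DNN, sparse) must pass it as False explicitly.
    kwargs.setdefault('check_batched_grad', True)
    return _autograd_gradcheck(fn, inputs, **kwargs)

# Example: a simple op checked with batched gradients enabled by default.
x = torch.randn(3, dtype=torch.double, requires_grad=True)
print(gradcheck(torch.sin, (x,)))
```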

So far exceptions to the above (as discovered by CI) include:
 - Mkldnn (opaque tensors do not have strides) https://app.circleci.com/pipelines/github/pytorch/pytorch/264416/workflows/e4d87886-6247-4305-8526-2696130aa9a4/jobs/10401882/tests
 - all cases in test_sparse (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407103)
 - all cases in test_overrides (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407236)
 - test_autograd (test_LSTM_grad_and_gradgrad) - (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407235)
 - test_data_parallel (test_data_parallel_buffers_requiring_grad) - *SIGSEGV* (https://app.circleci.com/pipelines/github/pytorch/pytorch/264820/workflows/14d89503-040d-4e3d-9f7b-0bc04833589b/jobs/10422697)
 - test_nn (https://app.circleci.com/pipelines/github/pytorch/pytorch/264919/workflows/df79e3ed-8a31-4a8e-b584-858ee99686ff/jobs/10427315)

Possible TODO is to prevent new tests from invoking external gradcheck.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51133

Reviewed By: ezyang

Differential Revision: D26147919

Pulled By: soulitzer

fbshipit-source-id: dff883b50f337510a89f391ea2fd87de2d531432
2021-01-29 09:13:37 -08:00
Ivan Yashchuk
6e4746c1ac Port cholesky_inverse to ATen (#50269)
Summary:
Now we can remove `_th_potri`!

Compared to the original TH-based `cholesky_inverse`, complex (https://github.com/pytorch/pytorch/issues/33152) and batched inputs (https://github.com/pytorch/pytorch/issues/7500) are now supported both on CPU and CUDA.

Closes https://github.com/pytorch/pytorch/issues/24685.
Closes https://github.com/pytorch/pytorch/issues/24543.

Ref. https://github.com/pytorch/pytorch/issues/49421, https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50269

Reviewed By: bdhirsh

Differential Revision: D26047548

Pulled By: anjali411

fbshipit-source-id: e4f191e39c684f241b7cb0f4b4c025de082cccef
2021-01-28 16:24:41 -08:00
Scott Wolchok
1321f2bfe6 [PyTorch] Port Caffe2 opti for BatchMatMul batch size 1 to baddbmm (#51057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51057

Caffe2 has an
[optimization](f8eefbdf7a/caffe2/operators/batch_matmul_op.h (L192))
for the case where the batch size is 1 that uses the underlying `gemm`
instead of `gemm_batched` BLAS function. This diff tries to port that
optimization to `baddbmm_mkl`.

Note that I have very little linear algebra background and am just
going off existing code and cblas API documentation, so please
review without assuming I know what I'm doing with the math itself.
ghstack-source-id: 120342923

Reviewed By: hlu1

Differential Revision: D26056613

fbshipit-source-id: feef80344b96601fc2bd0a2e8c8f6b57510d7856
2021-01-27 15:59:57 -08:00
Gao, Xiang
16dd5ca8ab Followup of kron PR (#51045)
Summary:
Followup of https://github.com/pytorch/pytorch/pull/50927

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51045

Reviewed By: mruberry

Differential Revision: D26089204

Pulled By: ngimel

fbshipit-source-id: 77291dd83fba32d6f80a8540910b112a1d85a892
2021-01-27 10:33:05 -08:00
Xiang Gao
ba316a7612 Fix TF32 failures in test_linalg.py (#50453)
Summary:
On Ampere GPUs, matmuls are computed with TF32 by default when the dtype is `torch.float` (https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices), which results in reduced precision. However, linear algebra usually needs higher precision, so many tests in `test_linalg.py` were failing on Ampere GPUs because of precision issues.

To fix this issue:
- Most linear algebra methods, except for matmuls, should add `NoTF32Guard`
- Expected results in unit tests should be computed with NumPy instead of PyTorch CUDA matmuls (see the sketch below)
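A hedged sketch of the second point, assuming a plain CUDA matmul: the reference is computed with NumPy in double precision, so it is not affected by TF32, and a loose tolerance is used for the comparison.
```python
import numpy as np
import torch

a = torch.randn(64, 64, device='cuda')
b = torch.randn(64, 64, device='cuda')

# Reference computed with NumPy in float64, unaffected by TF32.
expected = a.cpu().double().numpy() @ b.cpu().double().numpy()
actual = (a @ b).cpu().numpy()
print(np.allclose(actual, expected, atol=1e-2))  # loose tolerance while TF32 is on

# Alternatively, TF32 can be turned off for matmuls globally:
torch.backends.cuda.matmul.allow_tf32 = False
```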

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50453

Reviewed By: glaringlee

Differential Revision: D26023005

Pulled By: ngimel

fbshipit-source-id: f0ea533494fee322b07925565b57e3b0db2570c5
2021-01-26 19:51:20 -08:00
Xiang Gao
b822aba8ec Enable BFloat support for gemms on arch other than ampere (#50442)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50442

Reviewed By: bdhirsh

Differential Revision: D26044981

Pulled By: mruberry

fbshipit-source-id: 65c42f2c1de8d24e4852a1b5bd8f4b1735b2230e
2021-01-26 11:07:07 -08:00
Antonio Cuni
880f007480 Add torch.eig complex forward (CPU, CUDA) (#49168)
Summary:
Related to issue https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49168

Reviewed By: mrshenli

Differential Revision: D25954027

Pulled By: mruberry

fbshipit-source-id: e429f9587efff5e638bfd0e4de864c06f41c63b1
2021-01-25 21:27:08 -08:00
Ivan Yashchuk
ddf26816d3 Make torch.svd return V, not V.conj() for complex inputs (#51012)
Summary:
**BC-breaking note:**

torch.svd() added support for complex inputs in PyTorch 1.7, but was not documented as doing so. The complex "V" tensor returned was actually the complex conjugate of what's expected. This PR fixes the discrepancy.

This will silently break all users of torch.svd() with complex inputs.

**Original PR Summary:**

This PR resolves https://github.com/pytorch/pytorch/issues/45821.

The problem was that when introducing the support of complex inputs for `torch.svd` it was overlooked that LAPACK/MAGMA returns the conjugate transpose of V matrix, not just the transpose of V. So `torch.svd` was silently returning U, S, V.conj() instead of U, S, V.

Behavior of `torch.linalg.pinv`, `torch.pinverse` and `torch.linalg.svd` (they depend on `torch.svd`) is not changed in this PR.
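A short sketch of what the change means for callers (an illustration, not part of the PR): code that previously conjugated the returned V to compensate for the old behavior should drop that workaround.
```python
import torch

a = torch.randn(4, 4, dtype=torch.cdouble)
u, s, v = torch.svd(a)

# V is now returned directly, so U @ diag(S) @ V^H reconstructs the input.
# Code that used `v.conj()` to undo the old behavior should be updated.
recon = u @ torch.diag(s).to(a.dtype) @ v.conj().transpose(-2, -1)
print(torch.allclose(recon, a))
```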

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51012

Reviewed By: bdhirsh

Differential Revision: D26047593

Pulled By: albanD

fbshipit-source-id: d1e08dbc3aab9ce1150a95806ef3b5da98b5d3ca
2021-01-25 14:06:41 -08:00
Heitor Schueroff
a7cf04ec40 Workaround for MAGMA accessing illegal memory in batched cholesky (#50957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50957

MAGMA has an off-by-one error in their batched cholesky implementation which is causing illegal memory access for certain inputs. The workaround implemented in this PR is to pad the input to MAGMA with 1 extra element.

**Benchmark**
Ran the script below for both before and after my PR and got similar results.

*Script*
```
import torch
from torch.utils import benchmark

DTYPE = torch.float32
BATCHSIZE = 512 * 512
MATRIXSIZE = 16

a = torch.eye(MATRIXSIZE, device='cuda', dtype=DTYPE)

t0 = benchmark.Timer(
    stmt='torch.cholesky(a)',
    globals={'a': a},
    label='Single'
)

t1 = benchmark.Timer(
    stmt='torch.cholesky(a)',
    globals={'a': a.expand(BATCHSIZE, -1, -1)},
    label='Batched'
)

print(t0.timeit(100))
print(t1.timeit(100))
```

*Results before*
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7faf9bc63400>
Single
  2.08 ms
  1 measurement, 100 runs , 1 thread
<torch.utils.benchmark.utils.common.Measurement object at 0x7faf9bc63400>
Batched
  7.68 ms
  1 measurement, 100 runs , 1 thread
```

*Results after*
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7faf9bc63400>
Single
  2.10 ms
  1 measurement, 100 runs , 1 thread
<torch.utils.benchmark.utils.common.Measurement object at 0x7faf9bc63400>
Batched
  7.56 ms
  1 measurement, 100 runs , 1 thread
```

Fixes https://github.com/pytorch/pytorch/issues/41394, https://github.com/pytorch/pytorch/issues/26996, https://github.com/pytorch/pytorch/issues/48996

See also https://github.com/pytorch/pytorch/issues/42666, https://github.com/pytorch/pytorch/pull/26789

TODO
 ---
- [x] Benchmark to check for perf regressions

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D26050978

Pulled By: heitorschueroff

fbshipit-source-id: 7a5ba7e34c9d74b58568b2a0c631cc6d7ba63f86
2021-01-25 13:39:24 -08:00
Ivan Yashchuk
627a331257 Port CPU torch.orgqr to ATen (#50502)
Summary:
Now we can remove `_th_orgqr`!

Compared to the original TH-based `orgqr`, complex (https://github.com/pytorch/pytorch/issues/33152) and batched inputs are now supported.
CUDA support will be added in a follow-up PR.
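A hedged usage sketch of the ported op with a real-valued input: `torch.geqrf` produces the Householder representation of the QR factorization and `torch.orgqr` materializes the orthogonal factor from it.
```python
import torch

a = torch.randn(5, 3, dtype=torch.float64)
reflectors, tau = torch.geqrf(a)   # Householder representation of the QR factorization
q = torch.orgqr(reflectors, tau)   # explicitly form Q from the reflectors

# Q has orthonormal columns.
print(torch.allclose(q.transpose(-2, -1) @ q, torch.eye(3, dtype=torch.float64)))
```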

Closes https://github.com/pytorch/pytorch/issues/24747

Ref. https://github.com/pytorch/pytorch/issues/49421, https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50502

Reviewed By: mrshenli

Differential Revision: D25953300

Pulled By: mruberry

fbshipit-source-id: f52a74e1c8f51b5e24f7b461430ca8fc96e4d149
2021-01-25 02:57:05 -08:00
Xiao Wang
186c3da037 Add cusolver gesvdj and gesvdjBatched to the backend of torch.svd (#48436)
Summary:
This PR adds cusolver `gesvdj` and `gesvdjBatched` to the backend of `torch.svd`.

I've tested the performance using CUDA 11.1 on a 2070, V100, and A100. The cusolver gesvdj and gesvdjBatched paths perform better than MAGMA in all square-matrix cases, so the cusolver backend will replace the MAGMA backend when available.

When both matrix dimensions are no greater than 32, `gesvdjBatched` is used. Otherwise, `gesvdj` is used.
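A minimal sketch of the selection heuristic described above (illustrative Python, not the actual ATen dispatch code):
```python
def pick_cuda_svd_backend(m: int, n: int) -> str:
    # Small matrices go through the batched Jacobi SVD; everything else uses
    # the per-matrix Jacobi SVD.
    if m <= 32 and n <= 32:
        return "cusolver gesvdjBatched"
    return "cusolver gesvdj"

print(pick_cuda_svd_backend(16, 16))   # gesvdjBatched
print(pick_cuda_svd_backend(128, 64))  # gesvdj
```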

Detailed benchmark is available at https://github.com/xwang233/code-snippet/tree/master/svd.

Some relevant code and discussions
- https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/linalg/svd_op_gpu.cu.cc
- https://github.com/google/jax/blob/master/jaxlib/cusolver.cc
- https://github.com/cupy/cupy/issues/3174
- https://github.com/tensorflow/tensorflow/issues/13603
- https://www.nvidia.com/en-us/on-demand/session/gtcsiliconvalley2019-s9226/

See also https://github.com/pytorch/pytorch/issues/42666 https://github.com/pytorch/pytorch/issues/47953

Close https://github.com/pytorch/pytorch/pull/50516

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48436

Reviewed By: ejguan

Differential Revision: D25977046

Pulled By: heitorschueroff

fbshipit-source-id: c27e705cd29b6fd7c8ac674c1f9f490fa26ee1bf
2021-01-24 15:47:05 -08:00
Xiang Gao
ab331da7ac Rewrite kron with broadcasting at::mul (#50927)
Summary:
Because it is shorter, faster, and does not have the TF32 issue.

Benchmark: https://github.com/zasdfgbnm/things/blob/master/2021Q1/kron.ipynb
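A minimal sketch of the broadcasting idea for 2-D inputs (an illustration; the actual implementation handles batching and broadcasting more generally): expand both operands so an elementwise multiply produces every pairwise product, then collapse the paired dimensions.
```python
import torch

def kron2d(a, b):
    m, n = a.shape
    p, q = b.shape
    # a[i, j] * b[k, l] lands at position (i*p + k, j*q + l) after the reshape.
    return (a[:, None, :, None] * b[None, :, None, :]).reshape(m * p, n * q)

a = torch.randn(2, 3)
b = torch.randn(4, 5)
print(torch.allclose(kron2d(a, b), torch.kron(a, b)))
```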

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50927

Reviewed By: glaringlee

Differential Revision: D26022385

Pulled By: ngimel

fbshipit-source-id: 513c9e9138c35c70d3a475a8407728af21321dae
2021-01-22 20:58:17 -08:00
Kurt Mohler
8ab1a1495d Rename set_deterministic to use_deterministic_algorithms (#49904)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49100

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49904

Reviewed By: ezyang, mrshenli

Differential Revision: D25956761

Pulled By: mruberry

fbshipit-source-id: 86a59289d50825a0ebbd7c358b483c8d8039ffa6
2021-01-22 11:27:07 -08:00
Kurt Mohler
c082e2184d Add autograd tests for complex matrix norm nuclear and +/-2 (#50746)
Summary:
Also upgrades `linalg.norm`'s autograd and jit tests to `OpInfo`

Fixes https://github.com/pytorch/pytorch/issues/48842

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50746

Reviewed By: mruberry

Differential Revision: D25968246

Pulled By: anjali411

fbshipit-source-id: d457069ddb4caf2a5caed1aa64c791ef0790952c
2021-01-21 15:33:08 -08:00
Richard Zou
884fb48794 Miscellaneous batched grad testing (#50738)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50738

This PR adds batched grad testing for:
- test_linalg.py
- test_unary_ufuncs.py

Future:
- add batched grad testing for test_nn
- enable option for batched grad testing in OpInfo

Test Plan: - run tests

Reviewed By: ejguan

Differential Revision: D25997678

Pulled By: zou3519

fbshipit-source-id: 9a9f6694c041580061bd52b5e45661c872b0b761
2021-01-21 14:26:46 -08:00
Ivan Yashchuk
f9a5ba7398 Added linalg.slogdet (#49194)
Summary:
This PR adds `torch.linalg.slogdet`.

Changes compared to the original torch.slogdet:

- Complex input now works as in NumPy
- Added out= variant (allocates temporary and makes a copy for now)
- Updated `slogdet_backward` to work with complex input

Ref. https://github.com/pytorch/pytorch/issues/42666
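A brief usage sketch with a complex input (for illustration):
```python
import torch

a = torch.randn(4, 4, dtype=torch.cdouble)
sign, logabsdet = torch.linalg.slogdet(a)

# For complex input, `sign` is complex with |sign| == 1 and `logabsdet` is real,
# matching numpy.linalg.slogdet.
print(sign.abs(), logabsdet.dtype)
```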

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49194

Reviewed By: VitalyFedyunin

Differential Revision: D25916959

Pulled By: mruberry

fbshipit-source-id: cf9be8c5c044870200dcce38be48cd0d10e61a48
2021-01-19 07:28:12 -08:00
Ivan Yashchuk
9384d31af5 Added linalg.pinv (#48399)
Summary:
This PR adds `torch.linalg.pinv`.

Changes compared to the original `torch.pinverse`:
 * New kwarg "hermitian": with `hermitian=True` eigendecomposition is used instead of singular value decomposition.
 * `rcond` argument can now be a `Tensor` of appropriate shape to apply matrix-wise clipping of singular values.
 * Added `out=` variant (allocates temporary and makes a copy for now)

Ref. https://github.com/pytorch/pytorch/issues/42666
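A hedged usage sketch of the new kwarg (illustrative only):
```python
import torch

a = torch.randn(3, 5, dtype=torch.cdouble)
a_pinv = torch.linalg.pinv(a)                 # SVD-based pseudoinverse
print(torch.allclose(a @ a_pinv @ a, a))      # Moore-Penrose identity

# With hermitian=True an eigendecomposition is used instead of the SVD;
# the input is assumed to be Hermitian.
h = a @ a.conj().transpose(-2, -1)
h_pinv = torch.linalg.pinv(h, hermitian=True)
print(torch.allclose(h @ h_pinv @ h, h))
```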

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48399

Reviewed By: zhangguanheng66

Differential Revision: D25869572

Pulled By: mruberry

fbshipit-source-id: 0f330a91d24ba4e4375f648a448b27594e00dead
2021-01-12 06:52:06 -08:00
Ivan Yashchuk
4774c6800b Added linalg.inv (#48261)
Summary:
This PR adds `torch.linalg.inv` for NumPy compatibility.

`linalg_inv_out` uses in-place operations on provided `result` tensor.

I modified `apply_inverse` to accept a tensor of ints instead of a std::vector; that way we can write a function similar to `linalg_inv_out` but with the error checks and device memory synchronization removed.

I fixed `lda` (leading dimension parameter which is max(1, n)) in many places to handle 0x0 matrices correctly.
Zero batch dimensions are also working and tested.

Ref https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48261

Reviewed By: gchanan

Differential Revision: D25849590

Pulled By: mruberry

fbshipit-source-id: cfee6f1daf7daccbe4612ec68f94db328f327651
2021-01-10 04:00:51 -08:00
Antonio Cuni
b5ab0a7f78 Improve torch.linalg.qr (#50046)
Summary:
This is a follow up of PR https://github.com/pytorch/pytorch/issues/47764 to fix the remaining details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50046

Reviewed By: zou3519

Differential Revision: D25825557

Pulled By: mruberry

fbshipit-source-id: b8e335e02265e73484a99b0189e4cc042828e0a9
2021-01-08 09:52:31 -08:00
Antonio Cuni
5c5abd591d Implement torch.linalg.svd (#45562)
Summary:
This is related to https://github.com/pytorch/pytorch/issues/42666 .
I am opening this PR to have the opportunity to discuss things.
First, we need to consider the differences between `torch.svd` and `numpy.linalg.svd`:

1. `torch.svd` takes `some=True`, while `numpy.linalg.svd` takes `full_matrices=True`, which is effectively the opposite (and with the opposite default, too!)

2. `torch.svd` returns `(U, S, V)`, while `numpy.linalg.svd` returns `(U, S, VT)` (i.e., V transposed).

3. `torch.svd` always returns a 3-tuple; `numpy.linalg.svd` returns only `S` in case `compute_uv==False`

4. `numpy.linalg.svd` also takes an optional `hermitian=False` argument.

I think that the plan is to eventually deprecate `torch.svd` in favor of `torch.linalg.svd`, so this PR does the following:

1. Rename/adapt the old `svd` C++ functions into `linalg_svd`: in particular, now `linalg_svd` takes `full_matrices` and returns `VT`

2. Re-implement the old C++ interface on top of the new (by negating `full_matrices` and transposing `VT`).

3. The C++ version of `linalg_svd` *always* returns a 3-tuple (we can't do anything else). So, there is a python wrapper which manually calls `torch._C._linalg.linalg_svd` to tweak the return value in case `compute_uv==False`.

Currently, `linalg_svd_backward` is broken because it has not been adapted yet after the `V ==> VT` change, but before continuing and spending more time on it I wanted to make sure that the general approach is fine.
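A small sketch of how the two interfaces relate on a real input (illustrative; both calls share the same backend, so the factors should agree):
```python
import torch

a = torch.randn(5, 3)

u, s, v = torch.svd(a, some=True)                        # old: returns V
u2, s2, vh = torch.linalg.svd(a, full_matrices=False)    # new: returns V^H (VT)

# some=True corresponds to full_matrices=False; for real inputs V^H is just V^T.
print(torch.allclose(v, vh.transpose(-2, -1)))
```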

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45562

Reviewed By: H-Huang

Differential Revision: D25803557

Pulled By: mruberry

fbshipit-source-id: 4966f314a0ba2ee391bab5cda4563e16275ce91f
2021-01-08 06:46:16 -08:00
Richard Barnes
ec6d29d6fa Drop unused imports from test (#49973)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49973

From
```
./python/libcst/libcst codemod remove_unused_imports.RemoveUnusedImportsWithGlean --no-format caffe2/
```

Test Plan: Standard sandcastle tests

Reviewed By: xush6528

Differential Revision: D25727350

fbshipit-source-id: 237ec4edd85788de920663719173ebec7ddbae1c
2021-01-07 12:09:38 -08:00
Antonio Cuni
361f5ed91d Implement torch.linalg.qr (#47764)
Summary:
I am opening this PR early to have a place to discuss design issues.
The biggest difference between `torch.qr` and `numpy.linalg.qr` is that the former takes a boolean parameter `some=True`, while the latter takes a string parameter `mode='reduced'`, which can be one of the following:

`reduced`
this is completely equivalent to `some=True`, and both are the default.

`complete`
this is completely equivalent to `some=False`.

`r`
this returns only `r` instead of a tuple `(r, q)`. We have already decided that we don't want different return types depending on the parameters, so I propose to return `(r, empty_tensor)` instead. I **think** that in this mode it will be impossible to implement the backward pass, so we should raise an appropriate error in that case.

`raw`
in this mode, it returns `(h, tau)` instead of `(q, r)`. Internally, `h` and `tau` are obtained by calling lapack's `dgeqrf` and are later used to compute the actual values of `(q, r)`. The numpy docs suggest that these might be useful for calling other lapack functions, but at the moment none of them is exposed by numpy and I don't know how often this is used in the real world.
I suppose implementing the backward pass needs some attention: the most straightforward solution is to use `(h, tau)` to compute `(q, r)` and then use the normal logic for `qr_backward`, but there might be faster alternatives.

`full`, `f`
alias for `reduced`, deprecated since numpy 1.8.0

`economic`, `e`
similar to `raw` but it returns only `h` instead of `(h, tau)`. Deprecated since numpy 1.8.0

To summarize:
  * `reduced`, `complete` and `r` are straightforward to implement (see the usage sketch below).

  * `raw` needs a bit of extra care, but I don't know how high priority it is: since it is used rarely, we might want to not support it right now and maybe implement it in the future?

  * I think we should just leave `full` and `economic` out, and possibly add a note to the docs explaining what you need to use instead
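A hedged usage sketch of the straightforward modes (shapes shown assume a 4×3 input):
```python
import torch

a = torch.randn(4, 3)

q_red, r_red = torch.linalg.qr(a, mode='reduced')     # ~ torch.qr(a, some=True)
q_full, r_full = torch.linalg.qr(a, mode='complete')  # ~ torch.qr(a, some=False)

print(q_red.shape, r_red.shape)    # torch.Size([4, 3]) torch.Size([3, 3])
print(q_full.shape, r_full.shape)  # torch.Size([4, 4]) torch.Size([4, 3])
```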

/cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47764

Reviewed By: ngimel

Differential Revision: D25708870

Pulled By: mruberry

fbshipit-source-id: c25c70a23a02ec4322430d636542041e766ebe1b
2020-12-28 17:28:17 -08:00
Mike Ruberry
5acc27c00a Revert D25690129: [pytorch][PR] Added linalg.inv
Test Plan: revert-hammer

Differential Revision:
D25690129 (8554b58fbd)

Original commit changeset: edb2d03721f2

fbshipit-source-id: 8679ea18e637423d35919544d2b047a62ac3abd8
2020-12-23 15:27:52 -08:00
Ivan Yashchuk
8554b58fbd Added linalg.inv (#48261)
Summary:
This PR adds `torch.linalg.inv` for NumPy compatibility.

`linalg_inv_out` uses in-place operations on provided `result` tensor.

I modified `apply_inverse` to accept a tensor of ints instead of a std::vector; that way we can write a function similar to `linalg_inv_out` but with the error checks and device memory synchronization removed.

I fixed `lda` (leading dimension parameter which is max(1, n)) in many places to handle 0x0 matrices correctly.
Zero batch dimensions are also working and tested.

Ref https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48261

Reviewed By: ngimel

Differential Revision: D25690129

Pulled By: mruberry

fbshipit-source-id: edb2d03721f22168c42ded8458513cb23dfdc712
2020-12-23 11:29:00 -08:00
Hao Lu
d54cf2aa27 [pt][ATen] Optimize bmm (#49506)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49506

- Get rid of expensive stuff like `TensorArg`, `checkBackend`, `checkSize`, and `TensorAccessor`.
- Add `checkDim` that does not require creating a `TensorArg` which incurs a refcount bump
- Avoid unnecessary calls to `torch.select`, which goes through the dispatcher in the cases we care about, with mat1 and mat2 not permuted or permuted with dims = [0, 2, 1]. The pt version of bmm supports crazy cases like when the inputs are permuted with dims = [1, 2, 0], which is uncommon in SparseNNs.

Test Plan:
Unit test:
```
buck test //caffe2/test:linalg
```

Benchmark with the adindexer model:
```
Before:
I1216 14:02:24.155516 2595800 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0847197. Iters per second: 11803.6
After:
I1216 14:02:26.583878 2595939 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.082051. Iters per second: 12187.5
```

Reviewed By: bwasti

Differential Revision: D25577574

fbshipit-source-id: 8aba69b950e7b4d9d1b14ba837931695a908c068
2020-12-21 22:08:39 -08:00
Ivan Yashchuk
8be205ae13 Added linalg.solve (#48456)
Summary:
This PR adds `torch.linalg.solve`.

`linalg_solve_out` uses in-place operations on the provided result tensor.

I modified `apply_solve` to accept a tensor of ints instead of a std::vector; that way we can write a function similar to `linalg_solve_out` but with the error checks and device memory synchronization removed.

In comparison to `torch.solve` this routine accepts 1-dimensional tensors and batches of 1-dim tensors for the right-hand-side term. `torch.solve` requires it to be at least 2-dimensional.
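A short sketch of the 1-D right-hand-side support (illustrative):
```python
import torch

a = torch.randn(3, 3, dtype=torch.float64)
b = torch.randn(3, dtype=torch.float64)      # 1-D right-hand side

x = torch.linalg.solve(a, b)                 # accepted directly
print(torch.allclose(a @ x, b))

# torch.solve, by contrast, expects b to be at least 2-dimensional:
# x_old, _ = torch.solve(b.unsqueeze(-1), a)
```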

Ref. https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48456

Reviewed By: izdeby

Differential Revision: D25562222

Pulled By: mruberry

fbshipit-source-id: a9355c029e2442c2e448b6309511919631f9e43b
2020-12-21 10:11:12 -08:00
Ivan Yashchuk
f5ee619d2a Updated derivative rules for complex svd and pinverse (#47761)
Summary:
Updated `svd_backward` to work correctly for complex-valued inputs.
Updated `common_methods_invocations.py` to take dtype, device arguments for input construction.
Removed `test_pinverse` from `test_autograd.py`, it is replaced by entries to `common_methods_invocations.py`.
Added `svd` and `pinverse` to list of complex tests.

References for complex-valued SVD differentiation:

- https://giggleliu.github.io/2019/04/02/einsumbp.html
- https://arxiv.org/abs/1909.02659

The derived rules assume gauge invariance of loss functions, so the result would not be correct for loss functions that are not gauge invariant.
https://re-ra.xyz/Gauge-Problem-in-Automatic-Differentiation/

The same rule is implemented in Tensorflow and [BackwardsLinalg.jl](https://github.com/GiggleLiu/BackwardsLinalg.jl).
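A small illustration (not part of the PR) of a gauge-invariant quantity: the singular values are unaffected by the phase ambiguity in U and V, so a loss built only from them backpropagates consistently.
```python
import torch

a = torch.randn(4, 4, dtype=torch.cdouble, requires_grad=True)
_, s, _ = torch.svd(a)

loss = s.sum()        # depends only on singular values -> gauge invariant
loss.backward()
print(a.grad.dtype)   # complex gradient flows back through the SVD
```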

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47761

Reviewed By: ngimel

Differential Revision: D25658897

Pulled By: mruberry

fbshipit-source-id: ba33ecbbea3f592238c01e62c7f193daf22a9d01
2020-12-20 14:39:31 -08:00
Mike Ruberry
f5b68e74d7 Revert D25574962: [pytorch][PR] Updated derivative rules for complex svd and pinverse
Test Plan: revert-hammer

Differential Revision:
D25574962 (9955355853)

Original commit changeset: 832b61303e88

fbshipit-source-id: d73f77f3e51b0f535dad6d21c5bebf8d41a6bfbd
2020-12-17 00:59:43 -08:00
Ivan Yashchuk
9955355853 Updated derivative rules for complex svd and pinverse (#47761)
Summary:
Updated `svd_backward` to work correctly for complex-valued inputs.
Updated `common_methods_invocations.py` to take dtype, device arguments for input construction.
Removed `test_pinverse` from `test_autograd.py`, it is replaced by entries to `common_methods_invocations.py`.
Added `svd` and `pinverse` to list of complex tests.

References for complex-valued SVD differentiation:

- https://giggleliu.github.io/2019/04/02/einsumbp.html
- https://arxiv.org/abs/1909.02659

The derived rules assume gauge invariance of loss functions, so the result would not be correct for loss functions that are not gauge invariant.
https://re-ra.xyz/Gauge-Problem-in-Automatic-Differentiation/

The same rule is implemented in Tensorflow and [BackwardsLinalg.jl](https://github.com/GiggleLiu/BackwardsLinalg.jl).

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47761

Reviewed By: izdeby

Differential Revision: D25574962

Pulled By: mruberry

fbshipit-source-id: 832b61303e883ad3a451b84850ccf0f36763a6f6
2020-12-16 12:32:22 -08:00
Gao, Xiang
48d1ad1ada Reland "Add test for empty tensors for batch matmuls" (#48797)
Summary:
This reverts commit c7746adbc6.

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48797

Reviewed By: mruberry

Differential Revision: D25575264

Pulled By: ngimel

fbshipit-source-id: c7f3b384db833d727bb5bd8a51f1493a13016d09
2020-12-16 11:19:27 -08:00
Heitor Schueroff
45b33c83f1 Revert "Revert D24923679: Fixed einsum compatibility/performance issues (#46398)" (#49189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49189

This reverts commit d307601365 and fixes the bug with diagonals and ellipsis combined.

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D25540722

Pulled By: heitorschueroff

fbshipit-source-id: 86d0c9a7dcfda600b546457dad102af2ff33e353
2020-12-16 10:38:07 -08:00
Kurt Mohler
54f0556ee4 Add missing complex support for torch.norm and torch.linalg.norm (#48284)
Summary:
**BC-breaking note:**

Previously, when given a complex input, `torch.linalg.norm` and `torch.norm` would return a complex output. `torch.linalg.cond` would sometimes return a complex output and sometimes return a real output when given a complex input, depending on its `p` argument. This PR changes this behavior to match `numpy.linalg.norm` and `numpy.linalg.cond`, so that a complex input will result in the downgraded real number type, consistent with NumPy.

**PR Summary:**

The following cases were previously unsupported for complex inputs, and this commit adds support:

- Frobenius norm
- Norm order 2 (vector and matrix)
- CUDA vector norm

Part of https://github.com/pytorch/pytorch/issues/47833
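A short sketch of the downgraded-to-real behavior for the newly supported cases (illustrative):
```python
import torch

a = torch.randn(3, 3, dtype=torch.cdouble)

print(torch.linalg.norm(a).dtype)                   # torch.float64: Frobenius norm is real
print(torch.linalg.norm(a.flatten(), ord=2).dtype)  # torch.float64: vector 2-norm is real
```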

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48284

Reviewed By: H-Huang

Differential Revision: D25420880

Pulled By: mruberry

fbshipit-source-id: 11f6a2f3cad57d66476d30921c3f6ab8f3cd4017
2020-12-10 10:23:45 -08:00
Kurt Mohler
27f7d1c286 Port eig CPU from TH to ATen (#43215)
Summary:
Also consolidates shared logic between `eig` CPU and CUDA implementations

Fixes https://github.com/pytorch/pytorch/issues/24693

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43215

Reviewed By: VitalyFedyunin, zhangguanheng66

Differential Revision: D23862622

Pulled By: ngimel

fbshipit-source-id: ca1002428850520cd74cd5b7ed8cb4d12dbd9c52
2020-12-09 23:27:35 -08:00
X Wang
a849f38222 skip cuda test_cholesky_solve_batched_many_batches due to illegal memory access (#48999)
Summary:
See https://github.com/pytorch/pytorch/issues/48996

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48999

Reviewed By: zhangguanheng66

Differential Revision: D25390070

Pulled By: mruberry

fbshipit-source-id: cf59130f6189ab8c2dade6a6a4de2f69753a5e36
2020-12-09 00:47:55 -08:00
Heitor Schueroff
d307601365 Revert D24923679: Fixed einsum compatibility/performance issues (#46398)
Test Plan: revert-hammer

Differential Revision:
D24923679 (ea2a568cca)

Original commit changeset: 47e48822cd67

fbshipit-source-id: 52f17b66a4aa075d0159bdf1c98616e6098091b8
2020-12-07 11:48:36 -08:00
Heitor Schueroff
ea2a568cca Fixed einsum compatibility/performance issues (#46398) (#47860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47860

This PR makes torch.einsum compatible with numpy.einsum except for the sublist input option as requested here https://github.com/pytorch/pytorch/issues/21412. It also fixes two performance issues linked below and adds a check for reducing to torch.dot instead of torch.bmm, which is faster in some cases.

fixes #45854, #37628, #30194, #15671

fixes #41467 with benchmark below
```python
import torch
from torch.utils.benchmark import Timer

a = torch.randn(10000, 100, 101, device='cuda')
b = torch.randn(10000, 101, 3, device='cuda')

c = torch.randn(10000, 100, 1, device='cuda')
d = torch.randn(10000, 100, 1, 3, device='cuda')

print(Timer(
    stmt='torch.einsum("bij,bjf->bif", a, b)',
    globals={'a': a, 'b': b}
).blocked_autorange())

print()

print(Timer(
    stmt='torch.einsum("bic,bicf->bif", c, d)',
    globals={'c': c, 'd': d}
).blocked_autorange())
```
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7fa37c413850>
torch.einsum("bij,bjf->bif", a, b)
  Median: 4.53 ms
  IQR:    0.00 ms (4.53 to 4.53)
  45 measurements, 1 runs per measurement, 1 thread

<torch.utils.benchmark.utils.common.Measurement object at 0x7fa37c413700>
torch.einsum("bic,bicf->bif", c, d)
  Median: 63.86 us
  IQR:    1.52 us (63.22 to 64.73)
  4 measurements, 1000 runs per measurement, 1 thread
```

fixes #32591 with benchmark below
```python
import torch
from torch.utils.benchmark import Timer

a = torch.rand(1, 1, 16, 2, 16, 2, 16, 2, 2, 2, 2, device="cuda")
b = torch.rand(729, 1, 1, 2, 1, 2, 1, 2, 2, 2, 2, device="cuda")

print(Timer(
    stmt='(a * b).sum(dim = (-3, -2, -1))',
    globals={'a': a, 'b': b}
).blocked_autorange())

print()

print(Timer(
    stmt='torch.einsum("...ijk, ...ijk -> ...", a, b)',
    globals={'a': a, 'b': b}
).blocked_autorange())
```
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7efe0de28850>
(a * b).sum(dim = (-3, -2, -1))
  Median: 17.86 ms
  2 measurements, 10 runs per measurement, 1 thread

<torch.utils.benchmark.utils.common.Measurement object at 0x7efe0de286a0>
torch.einsum("...ijk, ...ijk -> ...", a, b)
  Median: 296.11 us
  IQR:    1.38 us (295.42 to 296.81)
  662 measurements, 1 runs per measurement, 1 thread
```

TODO

- [x] add support for ellipsis broadcasting
- [x] fix corner case issues with sumproduct_pair
- [x] update docs and add more comments
- [x] add tests for error cases

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D24923679

Pulled By: heitorschueroff

fbshipit-source-id: 47e48822cd67bbcdadbdfc5ffa25ee8ba4c9620a
2020-12-06 08:02:37 -08:00
Ivan Yashchuk
85121a7a0f Added CUDA support for complex input for torch.cholesky_solve (#47047)
Summary:
`torch.cholesky_solve` now works for complex inputs on GPU.
I moved the existing tests to `test_linalg.py` and modified them to test complex and float32 dtypes.
Differentiation also works correctly with complex inputs now.

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47047

Reviewed By: ngimel

Differential Revision: D24730020

Pulled By: mruberry

fbshipit-source-id: 95402da5789c56e5a682019790985207fa28fa1f
2020-12-05 20:18:30 -08:00
Ivan Yashchuk
cb285080b0 Added computing matrix condition numbers (linalg.cond) (#45832)
Summary:
This PR adds `torch.linalg.cond` for NumPy compatibility.

Ref https://github.com/pytorch/pytorch/issues/42666.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45832

Reviewed By: ngimel

Differential Revision: D25183690

Pulled By: mruberry

fbshipit-source-id: a727959bfec2bc2dc36df59d9ef79c0534b68194
2020-12-04 02:23:57 -08:00
Heitor Schueroff
c134f32835 Implemented torch.inner (#46716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46716

Implemented torch.inner similar to [numpy.inner](https://numpy.org/doc/stable/reference/generated/numpy.inner.html). For now it's implemented as a composite op.
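A minimal sketch of inner as a composite op (assumed for illustration; not necessarily the exact implementation): contract over the last dimension of both inputs, falling back to a plain product for scalars.
```python
import torch

def inner(a, b):
    if a.dim() == 0 or b.dim() == 0:
        return a * b
    return torch.tensordot(a, b, dims=([a.dim() - 1], [b.dim() - 1]))

a = torch.randn(2, 3)
b = torch.randn(4, 3)
print(inner(a, b).shape)  # torch.Size([2, 4]), matching numpy.inner
```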

TODO

- [x] Add documentation

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D24860351

Pulled By: heitorschueroff

fbshipit-source-id: de5c82f285893495491fdba73b35634f4d00bac8
2020-12-03 11:37:55 -08:00
Mike Ruberry
c7746adbc6 Revert D24874754: [pytorch][PR] Add test for empty tensors for batch matmuls
Test Plan: revert-hammer

Differential Revision:
D24874754 (5f105e2aa6)

Original commit changeset: 41ba837740ff

fbshipit-source-id: d6cb31cbc4a2a386aab0a5f24710f218f9a561ca
2020-12-03 00:29:07 -08:00
Xiang Gao
5f105e2aa6 Add test for empty tensors for batch matmuls (#47700)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47700

Reviewed By: malfet

Differential Revision: D24874754

Pulled By: ngimel

fbshipit-source-id: 41ba837740ff7d5bd49d5f7277ad2064985aba2f
2020-12-02 20:45:59 -08:00
Nikita Vedeneev
3b25af02a4 matrix_exp + matrix_exp.backward complex support (#48363)
Summary:
As per title. Fixes https://github.com/pytorch/pytorch/issues/48299.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48363

Reviewed By: ejguan

Differential Revision: D25224498

Pulled By: albanD

fbshipit-source-id: 0c80ffb03ccfc46ab86398911edfba0b09049e55
2020-12-02 08:35:14 -08:00
Ivan Yashchuk
e41e780f7a Added support for complex input for torch.lu_solve #2 (#48028)
Summary:
Relanding https://github.com/pytorch/pytorch/pull/46862
There was an issue with the simultaneous merge of two slightly conflicting PRs.

This PR adds `torch.lu_solve` for complex inputs both on CPU and GPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48028

Reviewed By: linbinyu

Differential Revision: D25003700

Pulled By: zou3519

fbshipit-source-id: 24cd1babe9ccdbaa4e2ed23f08a9153d40d0f0cd
2020-12-02 08:13:02 -08:00
Ivan Yashchuk
74330e0497 Added linalg.matrix_rank (#48206)
Summary:
This PR adds `torch.linalg.matrix_rank`.

Changes compared to the original `torch.matrix_rank`:
- input with the complex dtype is supported
- batched input is supported
- "symmetric" kwarg renamed to "hermitian"

Should I update the documentation for `torch.matrix_rank`?

For input with no elements (for example a 0×0 matrix), the current implementation diverges from NumPy. NumPy stumbles on the undefined max for such input; here I chose to return an appropriately sized tensor of zeros. I think that's mathematically the correct thing to do.

Ref https://github.com/pytorch/pytorch/issues/42666.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48206

Reviewed By: albanD

Differential Revision: D25211965

Pulled By: mruberry

fbshipit-source-id: ae87227150ab2cffa07f37b4a3ab228788701837
2020-12-02 03:29:25 -08:00
Mike Ruberry
36c87f1243 Refactors test_torch.py to be fewer than 10k lines (#47356)
Summary:
Creates multiple new test suites to have fewer tests in test_torch.py, consistent with previous test suite creation like test_unary_ufuncs.py and test_linalg.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47356

Reviewed By: ngimel

Differential Revision: D25202268

Pulled By: mruberry

fbshipit-source-id: 75fde3ca76545d1b32b86d432a5cb7a5ba8f5bb6
2020-11-28 20:11:40 -08:00
Antonio Cuni
344918576c Migrate eig from the TH to Aten (CUDA) (#44105)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24553

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44105

Reviewed By: ngimel

Differential Revision: D25192116

Pulled By: mruberry

fbshipit-source-id: 87f1ba4924b9174bfe0d9e2ab14bbe1c6bae879c
2020-11-27 15:15:48 -08:00
Ivan Yashchuk
4ed7f36ed1 Added linalg.eigh, linalg.eigvalsh (#45526)
Summary:
This PR adds `torch.linalg.eigh`, and `torch.linalg.eigvalsh` for NumPy compatibility.
The current `torch.symeig` uses (on CPU) a different LAPACK routine than NumPy (`syev` vs `syevd`). Even though it shouldn't matter in practice, `torch.linalg.eigh` uses `syevd` (as NumPy does).

Ref https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45526

Reviewed By: gchanan

Differential Revision: D25022659

Pulled By: mruberry

fbshipit-source-id: 3676b77a121c4b5abdb712ad06702ac4944e900a
2020-11-22 04:57:28 -08:00
Xiong Wei
ec256ab2f2 implement torch.addr using TensorIterator based kernels (#47664)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47313

This PR implements `torch.addr` function using `TensorIterator` with `cpu_kernel_vec` and `gpu_kernel`.
It helps reduce memory usage, improve performance, and fix the bug when `beta` or `alpha` is a complex number.
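A hedged sketch of what the complex fix enables, checking `torch.addr` against its definition `beta * input + alpha * outer(vec1, vec2)`:
```python
import torch

m = torch.randn(3, 4, dtype=torch.cdouble)
v1 = torch.randn(3, dtype=torch.cdouble)
v2 = torch.randn(4, dtype=torch.cdouble)

out = torch.addr(m, v1, v2, beta=2j, alpha=1 + 1j)   # complex beta/alpha now work
ref = 2j * m + (1 + 1j) * torch.outer(v1, v2)
print(torch.allclose(out, ref))
```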

Todo
- [x] benchmarking `torch.addr` for the change of this PR, as well as the legacy TH implementation used in PyTorch 1.6.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47664

Reviewed By: zhangguanheng66

Differential Revision: D25059693

Pulled By: ngimel

fbshipit-source-id: 20a90824aa4cb2240e81a9f17a9e2f16ae6e3437
2020-11-20 00:21:49 -08:00
Ivan Yashchuk
343b3e5cae Added linalg.tensorinv (#45969)
Summary:
This PR adds `torch.linalg.tensorinv` for NumPy compatibility.

Ref https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45969

Reviewed By: zhangguanheng66

Differential Revision: D25060568

Pulled By: mruberry

fbshipit-source-id: 3b145ce64e4bd5021bc229f5ffdd791c572673a0
2020-11-19 11:54:50 -08:00
Mike Ruberry
ea1e78a0c5 Revert D24853669: [pytorch][PR] Migrate eig from the TH to Aten (CUDA)
Test Plan: revert-hammer

Differential Revision:
D24853669 (866f8591be)

Original commit changeset: a513242dc7f4

fbshipit-source-id: a0c8c424b61b1e627d9102de6b4c6d0717a6c06d
2020-11-18 16:53:18 -08:00
Antonio Cuni
866f8591be Migrate eig from the TH to Aten (CUDA) (#44105)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24553

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44105

Reviewed By: heitorschueroff

Differential Revision: D24853669

Pulled By: mruberry

fbshipit-source-id: a513242dc7f49f55dbc6046c18d8a9d9aa2aaf8d
2020-11-18 12:10:18 -08:00
Ivan Yashchuk
81b1673a21 Enable complex tests that depend on batched matmul on CUDA (#47910)
Summary:
Now that https://github.com/pytorch/pytorch/pull/42553 is merged, we can delete a bit of code from the tests and enable some of the skipped complex tests.

Unfortunately, `test_pinverse_complex_xfailed` and `test_symeig_complex_xfailed` had bugs, and the fact that these tests now xpass wasn't caught automatically. We need to be careful next time with `unittest.expectedFailure`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47910

Reviewed By: zhangguanheng66

Differential Revision: D25052130

Pulled By: mruberry

fbshipit-source-id: 29512995c024b882f9cb78b7bede77733d5762d0
2020-11-18 10:44:47 -08:00
Ivan Yashchuk
260daf088d Added linalg.cholesky (#46083)
Summary:
This PR adds `torch.linalg.cholesky` function that matches `numpy.linalg.cholesky`.

Fixed `lda` argument to `lapackCholesky` calls.
Added `random_hermitian_pd_matrix` helper function for tests.
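A minimal sketch (assumed; the helper in the test suite may differ) of a `random_hermitian_pd_matrix`-style helper together with the new function:
```python
import torch

def random_hermitian_pd_matrix(n, dtype=torch.cdouble, device='cpu'):
    a = torch.randn(n, n, dtype=dtype, device=device)
    # A @ A^H is Hermitian positive semi-definite; adding n*I makes it
    # comfortably positive definite.
    return a @ a.conj().transpose(-2, -1) + n * torch.eye(n, dtype=dtype, device=device)

m = random_hermitian_pd_matrix(4)
L = torch.linalg.cholesky(m)
print(torch.allclose(L @ L.conj().transpose(-2, -1), m))
```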

Ref https://github.com/pytorch/pytorch/issues/42666.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46083

Reviewed By: ailzhang

Differential Revision: D24861752

Pulled By: mruberry

fbshipit-source-id: 214dbceb4e8a2c589df209493efd843962d25593
2020-11-13 16:50:40 -08:00
Richard Zou
1c7c612af0 Revert D24543682: [pytorch][PR] Added support for complex input for torch.lu_solve
Test Plan: revert-hammer

Differential Revision:
D24543682 (ffd0003022)

Original commit changeset: 165bde39ef95

fbshipit-source-id: 790b4157fdbc7149aaf0748555efe6daed7e1a23
2020-11-13 08:24:53 -08:00
Ivan Yashchuk
ffd0003022 Added support for complex input for torch.lu_solve (#46862)
Summary:
`torch.lu_solve` now works for complex inputs both on CPU and GPU.
I moved the existing tests to `test_linalg.py` and modified them to test complex dtypes, but I didn't modify/improve the body of the tests.

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46862

Reviewed By: nikithamalgifb

Differential Revision: D24543682

Pulled By: anjali411

fbshipit-source-id: 165bde39ef95cafebf976c5ba4b487297efe8433
2020-11-13 02:35:31 -08:00
Ivan Yashchuk
149190c014 Added CUDA support for complex input for torch.solve (#47045)
Summary:
`torch.solve` now works for complex inputs on GPU.
I moved the existing tests to `test_linalg.py` and modified them to test complex and float32 dtypes.
Differentiation also works correctly with complex inputs.

Fixes https://github.com/pytorch/pytorch/issues/41084
Ref. https://github.com/pytorch/pytorch/issues/33152

anjali411 I hope you don't mind that I took over https://github.com/pytorch/pytorch/pull/42737

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47045

Reviewed By: nikithamalgifb

Differential Revision: D24921503

Pulled By: anjali411

fbshipit-source-id: 4c3fc4f193a84b6e28c43c08672d480715000923
2020-11-12 12:22:59 -08:00
Gregory Chanan
b6cb2caa68 Revert "Fixed einsum compatibility/performance issues (#46398)" (#47821)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47821

This reverts commit a5c65b86ce.

 Conflicts:
	test/test_linalg.py

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D24909923

Pulled By: gchanan

fbshipit-source-id: 9dcf98e7c4a3c7e5aaffe475867fa086f3bb6ff2
2020-11-12 08:11:40 -08:00
Jeff Daily
2df5600155 [ROCm] add skipCUDAIfRocm to test_linalg test_norm_fro_2_equivalence_old (#47809)
Summary:
This test started failing when ROCm CI moved to 3.9.  Skip until triage is complete.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47809

Reviewed By: seemethere

Differential Revision: D24906319

Pulled By: walterddr

fbshipit-source-id: 0c425f3b21190cfbc5e0d1c3f477d834af40f0ca
2020-11-12 07:12:43 -08:00
Ivan Yashchuk
52ec8b9340 Added CUDA support for complex input for torch.triangular_solve (#46916)
Summary:
`torch.triangular_solve` now works for complex inputs on GPU.
I moved the existing tests to `test_linalg.py` and modified them to test complex and float32 dtypes.

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46916

Reviewed By: navahgar, agolynski

Differential Revision: D24706647

Pulled By: anjali411

fbshipit-source-id: fe780eac93d2ae1b2549539bb385e5fac25213b3
2020-11-11 16:08:11 -08:00
Ivan Yashchuk
a1db5b0f2b Added CUDA support for complex input for torch.inverse #2 (#47595)
Summary:
`torch.inverse` now works for complex inputs on GPU.
Opening a new PR here. The previous PR was merged and reverted due to a bug in tests marked with `slowTest`.
Previous PR https://github.com/pytorch/pytorch/pull/45034

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47595

Reviewed By: navahgar

Differential Revision: D24840955

Pulled By: anjali411

fbshipit-source-id: ec49fffdc4b3cb4ae7507270fa24e127be14f59b
2020-11-11 11:06:08 -08:00
Heitor Schueroff
a5c65b86ce Fixed einsum compatibility/performance issues (#46398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46398

This PR makes torch.einsum compatible with numpy.einsum except for the sublist input option as requested here https://github.com/pytorch/pytorch/issues/21412. It also fixes two performance issues linked below and adds a check for reducing to torch.dot instead of torch.bmm, which is faster in some cases.

fixes #45854, #37628, #30194, #15671

fixes #41467 with benchmark below
```python
import torch
from torch.utils.benchmark import Timer

a = torch.randn(10000, 100, 101, device='cuda')
b = torch.randn(10000, 101, 3, device='cuda')

c = torch.randn(10000, 100, 1, device='cuda')
d = torch.randn(10000, 100, 1, 3, device='cuda')

print(Timer(
    stmt='torch.einsum("bij,bjf->bif", a, b)',
    globals={'a': a, 'b': b}
).blocked_autorange())

print()

print(Timer(
    stmt='torch.einsum("bic,bicf->bif", c, d)',
    globals={'c': c, 'd': d}
).blocked_autorange())
```
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7fa37c413850>
torch.einsum("bij,bjf->bif", a, b)
  Median: 4.53 ms
  IQR:    0.00 ms (4.53 to 4.53)
  45 measurements, 1 runs per measurement, 1 thread

<torch.utils.benchmark.utils.common.Measurement object at 0x7fa37c413700>
torch.einsum("bic,bicf->bif", c, d)
  Median: 63.86 us
  IQR:    1.52 us (63.22 to 64.73)
  4 measurements, 1000 runs per measurement, 1 thread
```

fixes #32591 with benchmark below
```python
import torch
from torch.utils.benchmark import Timer

a = torch.rand(1, 1, 16, 2, 16, 2, 16, 2, 2, 2, 2, device="cuda")
b = torch.rand(729, 1, 1, 2, 1, 2, 1, 2, 2, 2, 2, device="cuda")

print(Timer(
    stmt='(a * b).sum(dim = (-3, -2, -1))',
    globals={'a': a, 'b': b}
).blocked_autorange())

print()

print(Timer(
    stmt='torch.einsum("...ijk, ...ijk -> ...", a, b)',
    globals={'a': a, 'b': b}
).blocked_autorange())
```
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7efe0de28850>
(a * b).sum(dim = (-3, -2, -1))
  Median: 17.86 ms
  2 measurements, 10 runs per measurement, 1 thread

<torch.utils.benchmark.utils.common.Measurement object at 0x7efe0de286a0>
torch.einsum("...ijk, ...ijk -> ...", a, b)
  Median: 296.11 us
  IQR:    1.38 us (295.42 to 296.81)
  662 measurements, 1 runs per measurement, 1 thread
```

TODO

- [x] add support for ellipsis broadcasting
- [x] fix corner case issues with sumproduct_pair
- [x] update docs and add more comments
- [x] add tests for error cases

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D24860367

Pulled By: heitorschueroff

fbshipit-source-id: 31110ee598fd598a43acccf07929b67daee160f9
2020-11-10 19:38:43 -08:00
Edward Yang
1aeefcdaa6 Revert D24730264: [pytorch][PR] Added CUDA support for complex input for torch.inverse
Test Plan: revert-hammer

Differential Revision:
D24730264 (33acbedace)

Original commit changeset: b9c94ec46301

fbshipit-source-id: beb9263700e9bc92685f74c37c46aa33f3b595b9
2020-11-06 07:28:14 -08:00
Ivan Yashchuk
33acbedace Added CUDA support for complex input for torch.inverse (#45034)
Summary:
`torch.inverse` now works for complex inputs on GPU.
Test cases with complex matrices are xfailed for now. For example, batched matmul does not work with complex yet.

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45034

Reviewed By: zou3519

Differential Revision: D24730264

Pulled By: anjali411

fbshipit-source-id: b9c94ec463012913c117278a884adeee96ea02aa
2020-11-05 16:30:11 -08:00
Heitor Schueroff
a4ba018e57 Updated docs/test for dot and vdot (#47242)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47242

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D24733771

Pulled By: heitorschueroff

fbshipit-source-id: 92e3b0e28e0565918335fa85d52abe5db9eeff57
2020-11-05 06:27:50 -08:00
Nikita Vedeneev
8a3728c819 Make torch.det() support complex input. (#45980)
Summary:
As per title. A minor fix was required to make it available on the CPU (`fmod` does not support complex).
For CUDA it requires https://github.com/pytorch/pytorch/pull/45898.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45980

Reviewed By: izdeby

Differential Revision: D24539097

Pulled By: anjali411

fbshipit-source-id: 508830dbfd7794ab73e19320d07c69a051c91819
2020-11-04 17:47:03 -08:00
Ivan Yashchuk
f276ab55cd Added Kronecker product of tensors (torch.kron) (#45358)
Summary:
This PR adds a function for calculating the Kronecker product of tensors.
The implementation is based on `at::tensordot` with permutations and reshape.
Tests pass.

TODO:

- [x] Add more test cases
- [x] Write documentation
- [x] Add entry to `common_methods_invocations.py`

Ref. https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45358

Reviewed By: mrshenli

Differential Revision: D24680755

Pulled By: mruberry

fbshipit-source-id: b1f8694589349986c3abfda3dc1971584932b3fa
2020-11-03 12:41:41 -08:00
Ivan Yashchuk
f629fbe235 Added torch.linalg.tensorsolve (#46142)
Summary:
This PR adds `torch.linalg.tensorsolve` function that matches `numpy.linalg.tensorsolve`.

Ref https://github.com/pytorch/pytorch/issues/42666.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46142

Reviewed By: izdeby

Differential Revision: D24539400

Pulled By: mruberry

fbshipit-source-id: 6e38364fe0bc511e739036deb274d9307df119b2
2020-10-29 10:29:28 -07:00
Kurt Mohler
b61671ccd2 Enable dtype arg for torch.linalg.norm with order 'fro' and 'nuc' (#46637)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46255

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46637

Reviewed By: gchanan

Differential Revision: D24459097

Pulled By: mruberry

fbshipit-source-id: 7f207a23de902c27f8313ee80f452687a97e8f6f
2020-10-26 02:59:00 -07:00
Kurt Mohler
a0a8bc8870 Fix mistakes and increase clarity of norm documentation (#42696)
Summary:
* Removes incorrect statement that "the vector norm will be applied to the last dimension".
* More clearly describe each different combination of `p`, `ord`, and input size.
* Moves norm tests from `test/test_torch.py` to `test/test_linalg.py`
* Adds test ensuring that `p='fro'` and `p=2` give same results for mutually valid inputs

Fixes https://github.com/pytorch/pytorch/issues/41388

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42696

Reviewed By: bwasti

Differential Revision: D23876862

Pulled By: mruberry

fbshipit-source-id: 36f33ccb6706d5fe13f6acf3de8ae14d7fbdff85
2020-10-10 14:12:43 -07:00
Kurt Mohler
d360402f34 Use out variants of functions used by linalg.norm, where possible (#45641)
Summary:
Closes https://github.com/pytorch/pytorch/issues/45669

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45641

Reviewed By: ngimel

Differential Revision: D24186731

Pulled By: mruberry

fbshipit-source-id: 7e3d12ef34704bf461b8de19830e7b2f73f3739b
2020-10-08 10:55:35 -07:00
Nikita Shulga
3a27fc966a Test torch.svd using complex float and double numbers (take 2) (#45795)
Summary:
Adds support for magmaSvd for complex numbers

Fixes use-after-free error in `apply_symeig`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45795

Reviewed By: ezyang

Differential Revision: D24096955

Pulled By: malfet

fbshipit-source-id: 0d8d8492f89fe722bbd5aed3528f244245b496d0
2020-10-03 11:33:28 -07:00
Mike Ruberry
6417a70465 Updates linalg warning + docs (#45415)
Summary:
Changes the deprecation of norm to a docs deprecation, since PyTorch components still rely on norm and some behavior, like automatically flattening tensors, may need to be ported to torch.linalg.norm. The documentation is also updated to clarify that torch.norm and torch.linalg.norm are distinct.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45415

Reviewed By: ngimel

Differential Revision: D23958252

Pulled By: mruberry

fbshipit-source-id: fd54e807c59a2655453a6bcd9f4073cb2c12e8ac
2020-09-28 05:28:42 -07:00
Xiong Wei
241afc9188 Migrate addr from the TH to Aten (CPU) (#44364)
Summary:
Related https://github.com/pytorch/pytorch/issues/24507
Fixes https://github.com/pytorch/pytorch/issues/24666

This PR modernizes the CPU implementation of the vector outer product.
The existing TH implementation behind `torch.addr` is migrated to ATen; since `torch.ger` computes the outer product via the `addr` functions, it is covered by this migration as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44364

Reviewed By: ezyang

Differential Revision: D23866733

Pulled By: mruberry

fbshipit-source-id: 5159ea22f0e3c991123fe7c19cc9beb6ad00301e
2020-09-25 01:18:09 -07:00
Mike Ruberry
95df8657c9 Enables test linalg (#45278)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45271.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45278

Reviewed By: ngimel

Differential Revision: D23926124

Pulled By: mruberry

fbshipit-source-id: 26692597f9a1988e5fa846f97b8430c3689cac27
2020-09-24 23:09:38 -07:00
Kurt Mohler
28a23fce4c Deprecate torch.norm and torch.functional.norm (#44321)
Summary:
Part of https://github.com/pytorch/pytorch/issues/24802

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44321

Reviewed By: mrshenli

Differential Revision: D23617273

Pulled By: mruberry

fbshipit-source-id: 6f88b5cb097fd0acb9cf0e415172c5a86f94e9f2
2020-09-10 01:16:41 -07:00
Kurt Mohler
68297eeb1a Add support for integer dim arg in torch.linalg.norm (#43907)
Summary:
Now that PR https://github.com/pytorch/pytorch/issues/43262 is merged, this works.

Part of https://github.com/pytorch/pytorch/issues/24802

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43907

Reviewed By: anjali411

Differential Revision: D23471964

Pulled By: mruberry

fbshipit-source-id: ef2f11f78343fc866f752c9691b0c1fa687353ba
2020-09-05 23:16:36 -07:00
Kurt Mohler
68b9daa9bf Add torch.linalg.norm (#42749)
Summary:
Adds `torch.linalg.norm` function that matches the behavior of `numpy.linalg.norm`.

Additional changes:
* Add support for dimension wrapping in `frobenius_norm` and `nuclear_norm`
* Fix `out` argument behavior for `nuclear_norm`
* Fix issue where `frobenius_norm` allowed duplicates in `dim` argument
* Add `_norm_matrix`

Closes https://github.com/pytorch/pytorch/issues/24802

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42749

Reviewed By: ngimel

Differential Revision: D23336234

Pulled By: mruberry

fbshipit-source-id: f0aba3089a3a0bf856aa9c4215e673ff34228fac
2020-08-28 18:28:33 -07:00
Mike Ruberry
bee174dc3f Adds linalg.det alias, fixes outer alias, updates alias testing (#42802)
Summary:
This PR:

- updates test_op_normalization.py, which verifies that aliases are correctly translated in the JIT
- adds torch.linalg.det as an alias for torch.det
- moves the torch.linalg.outer alias to torch.outer (to be consistent with NumPy)

The torch.linalg.outer alias was erroneously put in the linalg namespace as a placeholder, since it's a "linear algebra op" according to NumPy but is actually still in the main NumPy namespace.

The updates to test_op_normalization are necessary. Previously it was using method_tests to generate tests, and method_tests assumes test suites using it also use the device generic framework, which test_op_normalization did not. For example, some ops require decorators like `skipCPUIfNoLapack`, which only works in device generic test classes. Moving test_op_normalization to the device generic framework also lets these tests run on CPU and CUDA.

Continued reliance on method_tests() is excessive since the test suite is only interested in testing aliasing, and a simpler and more readable `AliasInfo` class is used for the required information. An example impedance mismatch between method_tests and the new tests, for example, was how to handle ops in namespaces like torch.linalg.det. In the future this information will likely be folded into a common 'OpInfo' registry in the test suite.

The actual tests performed are similar to what they were previously: a scripted and traced version of the op is run and the test verifies that both graphs do not contain the alias name and do contain the aliased name.

The guidance for adding an alias has been updated accordingly.

cc mattip

Note:

ngimel suggests:
- deprecating and then removing the `torch.ger` name
- reviewing the implementation of `torch.outer`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42802

Reviewed By: zou3519

Differential Revision: D23059883

Pulled By: mruberry

fbshipit-source-id: 11321c2a7fb283a6e7c0d8899849ad7476be42d1
2020-08-11 21:48:31 -07:00
Mike Ruberry
9c8021c0b1 Adds torch.linalg namespace (#42664)
Summary:
This PR adds the `torch.linalg` namespace as part of our continued effort to be more compatible with NumPy. The namespace is tested by adding a single function, `torch.linalg.outer`, and testing it in a new test suite, test_linalg.py. It follows the same pattern that https://github.com/pytorch/pytorch/pull/41911, which added the `torch.fft` namespace, did.

Future PRs will likely:

- add more functions to torch.linalg
- expand the testing done in test_linalg.py, including legacy functions, like torch.ger
- deprecate existing linalg functions outside of `torch.linalg` in preference to the new namespace

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42664

Reviewed By: ngimel

Differential Revision: D22991019

Pulled By: mruberry

fbshipit-source-id: 39258d9b116a916817b3588f160b141f956e5d0b
2020-08-07 10:18:30 -07:00