Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51057
Caffe2 has an
[optimization](f8eefbdf7a/caffe2/operators/batch_matmul_op.h (L192))
for the case where the batch size is 1 that uses the underlying `gemm`
instead of `gemm_batched` BLAS function. This diff tries to port that
optimization to `baddbmm_mkl`.
Note that I have very little linear algebra background and am just
going off existing code and cblas API documentation, so please
review without assuming I know what I'm doing with the math itself.
ghstack-source-id: 120342923
Reviewed By: hlu1
Differential Revision: D26056613
fbshipit-source-id: feef80344b96601fc2bd0a2e8c8f6b57510d7856
Summary:
On Ampere GPU, matmuls are computed by default with TF32 when the dtype is `torch.float`: https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices, which results in reduced precision in results. However, linear algebra usually need higher precision, therefore lots of tests in `test_linalg.py` are failing on Ampere GPU because of precision issue.
To fix this issue:
- Most linear algebra methods, except for matmuls, should add `NoTF32Guard`
- Expected results in unit tests should compute matmuls using numpy instead of pytorch cuda.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50453
Reviewed By: glaringlee
Differential Revision: D26023005
Pulled By: ngimel
fbshipit-source-id: f0ea533494fee322b07925565b57e3b0db2570c5
Summary:
**BC-breaking note:**
torch.svd() added support for complex inputs in PyTorch 1.7, but was not documented as doing so. The complex "V" tensor returned was actually the complex conjugate of what's expected. This PR fixes the discrepancy.
This will silently break all users of torch.svd() with complex inputs.
**Original PR Summary:**
This PR resolves https://github.com/pytorch/pytorch/issues/45821.
The problem was that when introducing the support of complex inputs for `torch.svd` it was overlooked that LAPACK/MAGMA returns the conjugate transpose of V matrix, not just the transpose of V. So `torch.svd` was silently returning U, S, V.conj() instead of U, S, V.
Behavior of `torch.linalg.pinv`, `torch.pinverse` and `torch.linalg.svd` (they depend on `torch.svd`) is not changed in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51012
Reviewed By: bdhirsh
Differential Revision: D26047593
Pulled By: albanD
fbshipit-source-id: d1e08dbc3aab9ce1150a95806ef3b5da98b5d3ca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50957
MAGMA has an off-by-one error in their batched cholesky implementation which is causing illegal memory access for certain inputs. The workaround implemented in this PR is to pad the input to MAGMA with 1 extra element.
**Benchmark**
Ran the script below for both before and after my PR and got similar results.
*Script*
```
import torch
from torch.utils import benchmark
DTYPE = torch.float32
BATCHSIZE = 512 * 512
MATRIXSIZE = 16
a = torch.eye(MATRIXSIZE, device='cuda', dtype=DTYPE)
t0 = benchmark.Timer(
stmt='torch.cholesky(a)',
globals={'a': a},
label='Single'
)
t1 = benchmark.Timer(
stmt='torch.cholesky(a)',
globals={'a': a.expand(BATCHSIZE, -1, -1)},
label='Batched'
)
print(t0.timeit(100))
print(t1.timeit(100))
```
*Results before*
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7faf9bc63400>
Single
2.08 ms
1 measurement, 100 runs , 1 thread
<torch.utils.benchmark.utils.common.Measurement object at 0x7faf9bc63400>
Batched
7.68 ms
1 measurement, 100 runs , 1 thread
```
*Results after*
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7faf9bc63400>
Single
2.10 ms
1 measurement, 100 runs , 1 thread
<torch.utils.benchmark.utils.common.Measurement object at 0x7faf9bc63400>
Batched
7.56 ms
1 measurement, 100 runs , 1 thread
```
Fixes https://github.com/pytorch/pytorch/issues/41394, https://github.com/pytorch/pytorch/issues/26996, https://github.com/pytorch/pytorch/issues/48996
See also https://github.com/pytorch/pytorch/issues/42666, https://github.com/pytorch/pytorch/pull/26789
TODO
---
- [x] Benchmark to check for perf regressions
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D26050978
Pulled By: heitorschueroff
fbshipit-source-id: 7a5ba7e34c9d74b58568b2a0c631cc6d7ba63f86
Summary:
This PR adds `torch.linalg.slogdet`.
Changes compared to the original torch.slogdet:
- Complex input now works as in NumPy
- Added out= variant (allocates temporary and makes a copy for now)
- Updated `slogdet_backward` to work with complex input
Ref. https://github.com/pytorch/pytorch/issues/42666
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49194
Reviewed By: VitalyFedyunin
Differential Revision: D25916959
Pulled By: mruberry
fbshipit-source-id: cf9be8c5c044870200dcce38be48cd0d10e61a48
Summary:
This PR adds `torch.linalg.pinv`.
Changes compared to the original `torch.pinverse`:
* New kwarg "hermitian": with `hermitian=True` eigendecomposition is used instead of singular value decomposition.
* `rcond` argument can now be a `Tensor` of appropriate shape to apply matrix-wise clipping of singular values.
* Added `out=` variant (allocates temporary and makes a copy for now)
Ref. https://github.com/pytorch/pytorch/issues/42666
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48399
Reviewed By: zhangguanheng66
Differential Revision: D25869572
Pulled By: mruberry
fbshipit-source-id: 0f330a91d24ba4e4375f648a448b27594e00dead
Summary:
This PR adds `torch.linalg.inv` for NumPy compatibility.
`linalg_inv_out` uses in-place operations on provided `result` tensor.
I modified `apply_inverse` to accept tensor of Int instead of std::vector, that way we can write a function similar to `linalg_inv_out` but removing the error checks and device memory synchronization.
I fixed `lda` (leading dimension parameter which is max(1, n)) in many places to handle 0x0 matrices correctly.
Zero batch dimensions are also working and tested.
Ref https://github.com/pytorch/pytorch/issues/42666
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48261
Reviewed By: gchanan
Differential Revision: D25849590
Pulled By: mruberry
fbshipit-source-id: cfee6f1daf7daccbe4612ec68f94db328f327651
Summary:
This is related to https://github.com/pytorch/pytorch/issues/42666 .
I am opening this PR to have the opportunity to discuss things.
First, we need to consider the differences between `torch.svd` and `numpy.linalg.svd`:
1. `torch.svd` takes `some=True`, while `numpy.linalg.svd` takes `full_matrices=True`, which is effectively the opposite (and with the opposite default, too!)
2. `torch.svd` returns `(U, S, V)`, while `numpy.linalg.svd` returns `(U, S, VT)` (i.e., V transposed).
3. `torch.svd` always returns a 3-tuple; `numpy.linalg.svd` returns only `S` in case `compute_uv==False`
4. `numpy.linalg.svd` also takes an optional `hermitian=False` argument.
I think that the plan is to eventually deprecate `torch.svd` in favor of `torch.linalg.svd`, so this PR does the following:
1. Rename/adapt the old `svd` C++ functions into `linalg_svd`: in particular, now `linalg_svd` takes `full_matrices` and returns `VT`
2. Re-implement the old C++ interface on top of the new (by negating `full_matrices` and transposing `VT`).
3. The C++ version of `linalg_svd` *always* returns a 3-tuple (we can't do anything else). So, there is a python wrapper which manually calls `torch._C._linalg.linalg_svd` to tweak the return value in case `compute_uv==False`.
Currently, `linalg_svd_backward` is broken because it has not been adapted yet after the `V ==> VT` change, but before continuing and spending more time on it I wanted to make sure that the general approach is fine.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45562
Reviewed By: H-Huang
Differential Revision: D25803557
Pulled By: mruberry
fbshipit-source-id: 4966f314a0ba2ee391bab5cda4563e16275ce91f
Summary:
I am opening this PR early to have a place to discuss design issues.
The biggest difference between `torch.qr` and `numpy.linalg.qr` is that the former `torch.qr` takes a boolean parameter `some=True`, while the latter takes a string parameter `mode='reduced'` which can be one of the following:
`reduced`
this is completely equivalent to `some=True`, and both are the default.
`complete`
this is completely equivalent to `some=False`.
`r`
this returns only `r` instead of a tuple `(r, q)`. We have already decided that we don't want different return types depending on the parameters, so I propose to return `(r, empty_tensor)` instead. I **think** that in this mode it will be impossible to implement the backward pass, so we should raise an appropriate error in that case.
`raw`
in this mode, it returns `(h, tau)` instead of `(q, r)`. Internally, `h` and `tau` are obtained by calling lapack's `dgeqrf` and are later used to compute the actual values of `(q, r)`. The numpy docs suggest that these might be useful to call other lapack functions, but at the moment none of them is exposed by numpy and I don't know how often it is used in the real world.
I suppose the implementing the backward pass need attention to: the most straightforward solution is to use `(h, tau)` to compute `(q, r)` and then use the normal logic for `qr_backward`, but there might be faster alternatives.
`full`, `f`
alias for `reduced`, deprecated since numpy 1.8.0
`economic`, `e`
similar to `raw but it returns only `h` instead of `(h, tau). Deprecated since numpy 1.8.0
To summarize:
* `reduce`, `complete` and `r` are straightforward to implement.
* `raw` needs a bit of extra care, but I don't know how much high priority it is: since it is used rarely, we might want to not support it right now and maybe implement it in the future?
* I think we should just leave `full` and `economic` out, and possibly add a note to the docs explaining what you need to use instead
/cc mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47764
Reviewed By: ngimel
Differential Revision: D25708870
Pulled By: mruberry
fbshipit-source-id: c25c70a23a02ec4322430d636542041e766ebe1b
Summary:
This PR adds `torch.linalg.inv` for NumPy compatibility.
`linalg_inv_out` uses in-place operations on provided `result` tensor.
I modified `apply_inverse` to accept tensor of Int instead of std::vector, that way we can write a function similar to `linalg_inv_out` but removing the error checks and device memory synchronization.
I fixed `lda` (leading dimension parameter which is max(1, n)) in many places to handle 0x0 matrices correctly.
Zero batch dimensions are also working and tested.
Ref https://github.com/pytorch/pytorch/issues/42666
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48261
Reviewed By: ngimel
Differential Revision: D25690129
Pulled By: mruberry
fbshipit-source-id: edb2d03721f22168c42ded8458513cb23dfdc712
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49506
- Get rid of expensive stuff like `TensorArg`, `checkBackend`, `checkSize`, and `TensorAccessor`.
- Add `checkDim` that does not require creating a `TensorArg` which incurs a refcount bump
- Avoid unnecessary calls to `torch.select`, which goes through the dispatcher in the cases we care about, with mat1 and mat2 not permuted or permuted with dims = [0, 2, 1]. The pt version of bmm supports crazy cases like when the inputs are permuted with dims = [1, 2, 0], which is uncommon in SparseNNs.
Test Plan:
Unit test:
```
buck test //caffe2/test:linalg
```
Benchmark with the adindexer model:
```
Before:
I1216 14:02:24.155516 2595800 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0847197. Iters per second: 11803.6
After:
I1216 14:02:26.583878 2595939 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.082051. Iters per second: 12187.5
```
Reviewed By: bwasti
Differential Revision: D25577574
fbshipit-source-id: 8aba69b950e7b4d9d1b14ba837931695a908c068
Summary:
This PR adds `torch.linalg.solve`.
`linalg_solve_out` uses in-place operations on the provided result tensor.
I modified `apply_solve` to accept tensor of Int instead of std::vector, that way we can write a function similar to `linalg_solve_out` but removing the error checks and device memory synchronization.
In comparison to `torch.solve` this routine accepts 1-dimensional tensors and batches of 1-dim tensors for the right-hand-side term. `torch.solve` requires it to be at least 2-dimensional.
Ref. https://github.com/pytorch/pytorch/issues/42666
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48456
Reviewed By: izdeby
Differential Revision: D25562222
Pulled By: mruberry
fbshipit-source-id: a9355c029e2442c2e448b6309511919631f9e43b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49189
This reverts commit d307601365 and fixes the bug with diagonals and ellipsis combined.
Test Plan: Imported from OSS
Reviewed By: glaringlee
Differential Revision: D25540722
Pulled By: heitorschueroff
fbshipit-source-id: 86d0c9a7dcfda600b546457dad102af2ff33e353
Summary:
**BC-breaking note:**
Previously, when given a complex input, `torch.linalg.norm` and `torch.norm` would return a complex output. `torch.linalg.cond` would sometimes return a complex output and sometimes return a real output when given a complex input, depending on its `p` argument. This PR changes this behavior to match `numpy.linalg.norm` and `numpy.linalg.cond`, so that a complex input will result in the downgraded real number type, consistent with NumPy.
**PR Summary:**
The following cases were previously unsupported for complex inputs, and this commit adds support:
- Frobenius norm
- Norm order 2 (vector and matrix)
- CUDA vector norm
Part of https://github.com/pytorch/pytorch/issues/47833
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48284
Reviewed By: H-Huang
Differential Revision: D25420880
Pulled By: mruberry
fbshipit-source-id: 11f6a2f3cad57d66476d30921c3f6ab8f3cd4017
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47860
This PR makes torch.einsum compatible with numpy.einsum except for the sublist input option as requested here https://github.com/pytorch/pytorch/issues/21412. It also fixed 2 performance issues linked below and adds a check for reducing to torch.dot instead of torch.bmm which is faster in some cases.
fixes#45854, #37628, #30194, #15671fixes#41467 with benchmark below
```python
import torch
from torch.utils.benchmark import Timer
a = torch.randn(10000, 100, 101, device='cuda')
b = torch.randn(10000, 101, 3, device='cuda')
c = torch.randn(10000, 100, 1, device='cuda')
d = torch.randn(10000, 100, 1, 3, device='cuda')
print(Timer(
stmt='torch.einsum("bij,bjf->bif", a, b)',
globals={'a': a, 'b': b}
).blocked_autorange())
print()
print(Timer(
stmt='torch.einsum("bic,bicf->bif", c, d)',
globals={'c': c, 'd': d}
).blocked_autorange())
```
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7fa37c413850>
torch.einsum("bij,bjf->bif", a, b)
Median: 4.53 ms
IQR: 0.00 ms (4.53 to 4.53)
45 measurements, 1 runs per measurement, 1 thread
<torch.utils.benchmark.utils.common.Measurement object at 0x7fa37c413700>
torch.einsum("bic,bicf->bif", c, d)
Median: 63.86 us
IQR: 1.52 us (63.22 to 64.73)
4 measurements, 1000 runs per measurement, 1 thread
```
fixes#32591 with benchmark below
```python
import torch
from torch.utils.benchmark import Timer
a = torch.rand(1, 1, 16, 2, 16, 2, 16, 2, 2, 2, 2, device="cuda")
b = torch.rand(729, 1, 1, 2, 1, 2, 1, 2, 2, 2, 2, device="cuda")
print(Timer(
stmt='(a * b).sum(dim = (-3, -2, -1))',
globals={'a': a, 'b': b}
).blocked_autorange())
print()
print(Timer(
stmt='torch.einsum("...ijk, ...ijk -> ...", a, b)',
globals={'a': a, 'b': b}
).blocked_autorange())
```
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7efe0de28850>
(a * b).sum(dim = (-3, -2, -1))
Median: 17.86 ms
2 measurements, 10 runs per measurement, 1 thread
<torch.utils.benchmark.utils.common.Measurement object at 0x7efe0de286a0>
torch.einsum("...ijk, ...ijk -> ...", a, b)
Median: 296.11 us
IQR: 1.38 us (295.42 to 296.81)
662 measurements, 1 runs per measurement, 1 thread
```
TODO
- [x] add support for ellipsis broadcasting
- [x] fix corner case issues with sumproduct_pair
- [x] update docs and add more comments
- [x] add tests for error cases
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D24923679
Pulled By: heitorschueroff
fbshipit-source-id: 47e48822cd67bbcdadbdfc5ffa25ee8ba4c9620a
Summary:
`torch.cholesky_solve` now works for complex inputs on GPU.
I moved the existing tests to `test_linalg.py` and modified them to test complex and float32 dtypes.
Differentiation also works correctly with complex inputs now.
Ref. https://github.com/pytorch/pytorch/issues/33152
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47047
Reviewed By: ngimel
Differential Revision: D24730020
Pulled By: mruberry
fbshipit-source-id: 95402da5789c56e5a682019790985207fa28fa1f
Summary:
Relanding https://github.com/pytorch/pytorch/pull/46862
There was an issue with the simultaneous merge of two slightly conflicting PRs.
This PR adds `torch.lu_solve` for complex inputs both on CPU and GPU.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48028
Reviewed By: linbinyu
Differential Revision: D25003700
Pulled By: zou3519
fbshipit-source-id: 24cd1babe9ccdbaa4e2ed23f08a9153d40d0f0cd
Summary:
This PR adds `torch.linalg.matrix_rank`.
Changes compared to the original `torch.matrix_rank`:
- input with the complex dtype is supported
- batched input is supported
- "symmetric" kwarg renamed to "hermitian"
Should I update the documentation for `torch.matrix_rank`?
For the input with no elements (for example 0×0 matrix), the current implementation is divergent from NumPy. NumPy stumbles on not defined max for such input, here I chose to return appropriately sized tensor of zeros. I think that's mathematically a correct thing to do.
Ref https://github.com/pytorch/pytorch/issues/42666.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48206
Reviewed By: albanD
Differential Revision: D25211965
Pulled By: mruberry
fbshipit-source-id: ae87227150ab2cffa07f37b4a3ab228788701837
Summary:
Creates multiple new test suites to have fewer tests in test_torch.py, consistent with previous test suite creation like test_unary_ufuncs.py and test_linalg.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47356
Reviewed By: ngimel
Differential Revision: D25202268
Pulled By: mruberry
fbshipit-source-id: 75fde3ca76545d1b32b86d432a5cb7a5ba8f5bb6
Summary:
This PR adds `torch.linalg.eigh`, and `torch.linalg.eigvalsh` for NumPy compatibility.
The current `torch.symeig` uses (on CPU) a different LAPACK routine than NumPy (`syev` vs `syevd`). Even though it shouldn't matter in practice, `torch.linalg.eigh` uses `syevd` (as NumPy does).
Ref https://github.com/pytorch/pytorch/issues/42666
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45526
Reviewed By: gchanan
Differential Revision: D25022659
Pulled By: mruberry
fbshipit-source-id: 3676b77a121c4b5abdb712ad06702ac4944e900a
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47313
This PR implements `torch.addr` function using `TensorIterator` with `cpu_kernel_vec` and `gpu_kernel`.
It helps reduce memory usage, improve performance, and fix the bug when `beta` or `alpha` is a complex number.
Todo
- [x] benchmarking `torch.addr` for the change of this PR, as well as the legacy TH implementation used in PyTorch 1.6.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47664
Reviewed By: zhangguanheng66
Differential Revision: D25059693
Pulled By: ngimel
fbshipit-source-id: 20a90824aa4cb2240e81a9f17a9e2f16ae6e3437
Summary:
Now when https://github.com/pytorch/pytorch/pull/42553 is merged we can delete a bit of code from the tests and enable some of the skipped complex tests.
Unfortunately, `test_pinverse_complex_xfailed` and `test_symeig_complex_xfailed` had bugs and it wasn't caught automatically that these tests xpass. Need to be careful next time with `unittest.expectedFailure`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47910
Reviewed By: zhangguanheng66
Differential Revision: D25052130
Pulled By: mruberry
fbshipit-source-id: 29512995c024b882f9cb78b7bede77733d5762d0
Summary:
`torch.lu_solve` now works for complex inputs both on CPU and GPU.
I moved the existing tests to `test_linalg.py` and modified them to test complex dtypes, but I didn't modify/improve the body of the tests.
Ref. https://github.com/pytorch/pytorch/issues/33152
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46862
Reviewed By: nikithamalgifb
Differential Revision: D24543682
Pulled By: anjali411
fbshipit-source-id: 165bde39ef95cafebf976c5ba4b487297efe8433
Summary:
This test started failing when ROCm CI moved to 3.9. Skip until triage is complete.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47809
Reviewed By: seemethere
Differential Revision: D24906319
Pulled By: walterddr
fbshipit-source-id: 0c425f3b21190cfbc5e0d1c3f477d834af40f0ca
Summary:
`torch.triangular_solve` now works for complex inputs on GPU.
I moved the existing tests to `test_linalg.py` and modified them to test complex and float32 dtypes.
Ref. https://github.com/pytorch/pytorch/issues/33152
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46916
Reviewed By: navahgar, agolynski
Differential Revision: D24706647
Pulled By: anjali411
fbshipit-source-id: fe780eac93d2ae1b2549539bb385e5fac25213b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46398
This PR makes torch.einsum compatible with numpy.einsum except for the sublist input option as requested here https://github.com/pytorch/pytorch/issues/21412. It also fixed 2 performance issues linked below and adds a check for reducing to torch.dot instead of torch.bmm which is faster in some cases.
fixes#45854, #37628, #30194, #15671fixes#41467 with benchmark below
```python
import torch
from torch.utils.benchmark import Timer
a = torch.randn(10000, 100, 101, device='cuda')
b = torch.randn(10000, 101, 3, device='cuda')
c = torch.randn(10000, 100, 1, device='cuda')
d = torch.randn(10000, 100, 1, 3, device='cuda')
print(Timer(
stmt='torch.einsum("bij,bjf->bif", a, b)',
globals={'a': a, 'b': b}
).blocked_autorange())
print()
print(Timer(
stmt='torch.einsum("bic,bicf->bif", c, d)',
globals={'c': c, 'd': d}
).blocked_autorange())
```
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7fa37c413850>
torch.einsum("bij,bjf->bif", a, b)
Median: 4.53 ms
IQR: 0.00 ms (4.53 to 4.53)
45 measurements, 1 runs per measurement, 1 thread
<torch.utils.benchmark.utils.common.Measurement object at 0x7fa37c413700>
torch.einsum("bic,bicf->bif", c, d)
Median: 63.86 us
IQR: 1.52 us (63.22 to 64.73)
4 measurements, 1000 runs per measurement, 1 thread
```
fixes#32591 with benchmark below
```python
import torch
from torch.utils.benchmark import Timer
a = torch.rand(1, 1, 16, 2, 16, 2, 16, 2, 2, 2, 2, device="cuda")
b = torch.rand(729, 1, 1, 2, 1, 2, 1, 2, 2, 2, 2, device="cuda")
print(Timer(
stmt='(a * b).sum(dim = (-3, -2, -1))',
globals={'a': a, 'b': b}
).blocked_autorange())
print()
print(Timer(
stmt='torch.einsum("...ijk, ...ijk -> ...", a, b)',
globals={'a': a, 'b': b}
).blocked_autorange())
```
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7efe0de28850>
(a * b).sum(dim = (-3, -2, -1))
Median: 17.86 ms
2 measurements, 10 runs per measurement, 1 thread
<torch.utils.benchmark.utils.common.Measurement object at 0x7efe0de286a0>
torch.einsum("...ijk, ...ijk -> ...", a, b)
Median: 296.11 us
IQR: 1.38 us (295.42 to 296.81)
662 measurements, 1 runs per measurement, 1 thread
```
TODO
- [x] add support for ellipsis broadcasting
- [x] fix corner case issues with sumproduct_pair
- [x] update docs and add more comments
- [x] add tests for error cases
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D24860367
Pulled By: heitorschueroff
fbshipit-source-id: 31110ee598fd598a43acccf07929b67daee160f9
Summary:
`torch.inverse` now works for complex inputs on GPU.
Test cases with complex matrices are xfailed for now. For example, batched matmul does not work with complex yet.
Ref. https://github.com/pytorch/pytorch/issues/33152
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45034
Reviewed By: zou3519
Differential Revision: D24730264
Pulled By: anjali411
fbshipit-source-id: b9c94ec463012913c117278a884adeee96ea02aa
Summary:
This PR adds a function for calculating the Kronecker product of tensors.
The implementation is based on `at::tensordot` with permutations and reshape.
Tests pass.
TODO:
- [x] Add more test cases
- [x] Write documentation
- [x] Add entry `common_methods_invokations.py`
Ref. https://github.com/pytorch/pytorch/issues/42666
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45358
Reviewed By: mrshenli
Differential Revision: D24680755
Pulled By: mruberry
fbshipit-source-id: b1f8694589349986c3abfda3dc1971584932b3fa
Summary:
* Removes incorrect statement that "the vector norm will be applied to the last dimension".
* More clearly describe each different combination of `p`, `ord`, and input size.
* Moves norm tests from `test/test_torch.py` to `test/test_linalg.py`
* Adds test ensuring that `p='fro'` and `p=2` give same results for mutually valid inputs
Fixes https://github.com/pytorch/pytorch/issues/41388
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42696
Reviewed By: bwasti
Differential Revision: D23876862
Pulled By: mruberry
fbshipit-source-id: 36f33ccb6706d5fe13f6acf3de8ae14d7fbdff85
Summary:
Changes the deprecation of norm to a docs deprecation, since PyTorch components still rely on norm and some behavior, like automatically flattening tensors, may need to be ported to torch.linalg.norm. The documentation is also updated to clarify that torch.norm and torch.linalg.norm are distinct.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45415
Reviewed By: ngimel
Differential Revision: D23958252
Pulled By: mruberry
fbshipit-source-id: fd54e807c59a2655453a6bcd9f4073cb2c12e8ac
Summary:
This PR:
- updates test_op_normalization.py, which verifies that aliases are correctly translated in the JIT
- adds torch.linalg.det as an alias for torch.det
- moves the torch.linalg.outer alias to torch.outer (to be consistent with NumPy)
The torch.linalg.outer alias was put the linalg namespace erroneously as a placeholder since it's a "linear algebra op" according to NumPy but is actually still in the main NumPy namespace.
The updates to test_op_normalization are necessary. Previously it was using method_tests to generate tests, and method_tests assumes test suites using it also use the device generic framework, which test_op_normalization did not. For example, some ops require decorators like `skipCPUIfNoLapack`, which only works in device generic test classes. Moving test_op_normalization to the device generic framework also lets these tests run on CPU and CUDA.
Continued reliance on method_tests() is excessive since the test suite is only interested in testing aliasing, and a simpler and more readable `AliasInfo` class is used for the required information. An example impedance mismatch between method_tests and the new tests, for example, was how to handle ops in namespaces like torch.linalg.det. In the future this information will likely be folded into a common 'OpInfo' registry in the test suite.
The actual tests performed are similar to what they were previously: a scripted and traced version of the op is run and the test verifies that both graphs do not contain the alias name and do contain the aliased name.
The guidance for adding an alias has been updated accordingly.
cc mattip
Note:
ngimel suggests:
- deprecating and then removing the `torch.ger` name
- reviewing the implementation of `torch.outer`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42802
Reviewed By: zou3519
Differential Revision: D23059883
Pulled By: mruberry
fbshipit-source-id: 11321c2a7fb283a6e7c0d8899849ad7476be42d1
Summary:
This PR adds the `torch.linalg` namespace as part of our continued effort to be more compatible with NumPy. The namespace is tested by adding a single function, `torch.linalg.outer`, and testing it in a new test suite, test_linalg.py. It follows the same pattern that https://github.com/pytorch/pytorch/pull/41911, which added the `torch.fft` namespace, did.
Future PRs will likely:
- add more functions to torch.linalg
- expand the testing done in test_linalg.py, including legacy functions, like torch.ger
- deprecate existing linalg functions outside of `torch.linalg` in preference to the new namespace
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42664
Reviewed By: ngimel
Differential Revision: D22991019
Pulled By: mruberry
fbshipit-source-id: 39258d9b116a916817b3588f160b141f956e5d0b