Summary:
This PR fixes `torch.linalg.inv_ex` with MAGMA backend.
The `info` tensor was returned on the CPU even for CUDA inputs.
Now it is on the same device as the input.
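A minimal sketch of the fixed behavior (assuming a CUDA device is available):
```python
import torch

A = torch.randn(3, 3, device='cuda')
inverse, info = torch.linalg.inv_ex(A)
assert info.device == A.device  # previously `info` was always on the CPU
```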
Fixes https://github.com/pytorch/pytorch/issues/58769
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59223
Reviewed By: ngimel
Differential Revision: D28814876
Pulled By: mruberry
fbshipit-source-id: f66c6f06fb8bc305cb2e22b08750a25c8888fb65
Summary:
Per title. `norm` with fp16/bfloat16 inputs and an fp32 output on CUDA no longer performs an explicit cast of the input.
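A minimal sketch of the call pattern this affects, shown here with `torch.linalg.norm` (the reduction now accumulates in fp32 without materializing an fp32 copy of the input):
```python
import torch

x = torch.randn(1024, device='cuda', dtype=torch.float16)
# Output dtype is float32; the input is no longer cast to float32 up front.
n = torch.linalg.norm(x, dtype=torch.float32)
```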
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59134
Reviewed By: mruberry
Differential Revision: D28775729
Pulled By: ngimel
fbshipit-source-id: 896daa4f02e8a817cb7cb99ae8a93c02fa8dd5e9
Summary:
This PR adds an alternative way of calling `torch.einsum`. Instead of specifying the subscripts as letters in the `equation` parameter, one can now specify the subscripts as a list of integers, as in `torch.einsum(operand1, subscripts1, operand2, subscripts2, ..., [subscripts_out])`. This is equivalent to `torch.einsum('<subscripts1>,<subscripts2>,...->[<subscripts_out>]', operand1, operand2, ...)`.
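For example, in the sublist format each integer stands in for a subscript letter:
```python
import torch

a = torch.randn(3, 4)
b = torch.randn(4, 5)

c1 = torch.einsum('ij,jk->ik', a, b)             # equation form
c2 = torch.einsum(a, [0, 1], b, [1, 2], [0, 2])  # equivalent sublist form
assert torch.allclose(c1, c2)
```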
TODO
- [x] Update documentation
- [x] Add more error checking
- [x] Update tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56625
Reviewed By: zou3519
Differential Revision: D28062616
Pulled By: heitorschueroff
fbshipit-source-id: ec50ad34f127210696e7c545e4c0675166f127dc
Summary:
This PR does several things to relax test tolerances:
- Do not use TF32 in cuda matmul in test_c10d. See https://github.com/pytorch/pytorch/issues/52941.
- Do not use TF32 in cuda matmul in test_linalg. Increase atol for float and cfloat. See https://github.com/pytorch/pytorch/issues/50453
The tolerances are increased because most linear algebra operators are not very stable in single precision.
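A minimal sketch of how TF32 can be disabled around a matmul, assuming the tests toggle the backend flag directly:
```python
import torch

# Force full-precision float32 matmul on CUDA (no TF32 downconversion).
torch.backends.cuda.matmul.allow_tf32 = False
a = torch.randn(64, 64, device='cuda')
b = torch.randn(64, 64, device='cuda')
c = a @ b
```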
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56114
Reviewed By: ailzhang
Differential Revision: D28554467
Pulled By: ngimel
fbshipit-source-id: 90416be8e4c048bedb16903b01315584d344ecdf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56613
Replace `linalg_solve_helper` with `lu_stub` + `lu_solve_stub`.
Once `lu_stub` and `lu_solve_stub` have a cuSOLVER-based code path,
`torch.linalg.solve` will have it as well.
Test Plan: Imported from OSS
Reviewed By: agolynski
Differential Revision: D28379394
Pulled By: mruberry
fbshipit-source-id: b47f66bc1ee12715da11dcffc92e31e67fa8c8f6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58041
The shape of the returned result differed between NumPy and PyTorch for
`ord={-2, 2, None}`. This is now fixed.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D28405147
Pulled By: mruberry
fbshipit-source-id: 30293a017a0c0a7e9e3aabd470386235fef7b6a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58040
This PR uses `torch.linalg.inv_ex` to detect non-invertible
inputs and return a condition number of infinity for them.
Added an OpInfo entry for `torch.linalg.cond`.
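A hedged sketch of the new behavior for a singular input:
```python
import torch

A = torch.zeros(3, 3)  # singular, hence not invertible
# For norms whose condition number requires a matrix inverse, a
# non-invertible input now yields inf instead of raising an error.
print(torch.linalg.cond(A, p=1))  # tensor(inf)
```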
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D28405146
Pulled By: mruberry
fbshipit-source-id: 524b9a38309851fa6461cb787ef3fba5aa7d5328
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58039
The new function has the following signature:
`inv_ex(Tensor input, *, bool check_errors=False) -> (Tensor inverse, Tensor info)`.
When `check_errors=True`, an error is thrown if the matrix is not invertible; when `check_errors=False`, responsibility for checking the result is on the user.
`linalg_inv` is now implemented using calls to `linalg_inv_ex`.
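A minimal usage sketch with the default `check_errors=False`:
```python
import torch

A = torch.randn(3, 3)
inverse, info = torch.linalg.inv_ex(A)
# A nonzero `info` signals a failed inversion; checking is up to the caller.
if info.item() != 0:
    print('matrix is not invertible')
```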
Resolves https://github.com/pytorch/pytorch/issues/25095
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D28405148
Pulled By: mruberry
fbshipit-source-id: b8563a6c59048cb81e206932eb2f6cf489fd8531
Summary:
This one had a tricky usage of `torch.symeig` that had to be replaced. I tested the replacement locally though.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57732
Reviewed By: bdhirsh
Differential Revision: D28328189
Pulled By: mruberry
fbshipit-source-id: 7f000fcbf2b029beabc76e5a89ff158b47977474
Summary:
Backward methods for `torch.lu` and `torch.lu_solve` require the `torch.lu_unpack` method.
However, while `torch.lu` is a Python wrapper over a native function (so its gradient can be implemented via `autograd.Function`),
`torch.lu_solve` is a native function and cannot access `torch.lu_unpack`, which is implemented in Python.
Hence this PR presents a native (ATen) version of `lu_unpack`. With this function it is also possible to update the gradients for `torch.lu` so that backward+JIT is supported (there is no JIT support for `autograd.Function`).
~~The interface for this method is different from the original `torch.lu_unpack`, so it is decided to keep it hidden.~~
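A minimal sketch of the round trip through `lu_unpack`:
```python
import torch

A = torch.randn(3, 3)
LU, pivots = torch.lu(A)
P, L, U = torch.lu_unpack(LU, pivots)
assert torch.allclose(P @ L @ U, A, atol=1e-6)
```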
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46913
Reviewed By: albanD
Differential Revision: D28355725
Pulled By: mruberry
fbshipit-source-id: 281260f3b6e93c15b08b2ba66d5a221314b00e78
Summary:
This PR adds a note to the documentation that `torch.svd` is deprecated, together with an upgrade guide on how to use `torch.linalg.svd` and `torch.linalg.svdvals` (Lezcano's instructions from https://github.com/pytorch/pytorch/issues/57549).
In addition, all usage of the old svd function is replaced with the new one from the torch.linalg module, except for the `at::linalg_pinv` function, which fails the XLA CI build (https://github.com/pytorch/xla/issues/2755, see the failure in draft PR https://github.com/pytorch/pytorch/pull/57772).
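A minimal sketch of the upgrade path, assuming the reduced SVD is wanted:
```python
import torch

A = torch.randn(4, 3)
# Old (deprecated):
U, S, V = torch.svd(A)
# New equivalent (torch.svd returns the reduced SVD by default):
U2, S2, Vh = torch.linalg.svd(A, full_matrices=False)
V2 = Vh.transpose(-2, -1).conj()
# Singular values only:
S3 = torch.linalg.svdvals(A)
```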
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57981
Reviewed By: ngimel
Differential Revision: D28345558
Pulled By: mruberry
fbshipit-source-id: 02dd9ae6efe975026e80ca128e9b91dfc65d7213
Summary:
Backward methods for `torch.lu` and `torch.lu_solve` require the `torch.lu_unpack` method.
However, while `torch.lu` is a Python wrapper over a native function (so its gradient can be implemented via `autograd.Function`),
`torch.lu_solve` is a native function and cannot access `torch.lu_unpack`, which is implemented in Python.
Hence this PR presents a native (ATen) version of `lu_unpack`. With this function it is also possible to update the gradients for `torch.lu` so that backward+JIT is supported (there is no JIT support for `autograd.Function`).
~~The interface for this method is different from the original `torch.lu_unpack`, so it is decided to keep it hidden.~~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46913
Reviewed By: astaff
Differential Revision: D28117714
Pulled By: mruberry
fbshipit-source-id: befd33db12ecc147afacac792418b6f4948fa4a4
Summary:
This PR is focused on the API for `linalg.matrix_norm` and delegates computations to `linalg.norm` for the moment.
The main difference between the norms is their behavior when `dim=None`. In this case:
- `linalg.norm` will compute a vector norm on the flattened input if `ord=None`; otherwise it requires the input to be either 1D or 2D in order to disambiguate between vector and matrix norms
- `linalg.vector_norm` will flatten the input
- `linalg.matrix_norm` will compute the norm over the last two dimensions, treating the input as a batch of matrices
In future PRs, the computations will be moved to `torch.linalg.matrix_norm`, and `torch.norm` and `torch.linalg.norm` will delegate computations to either `linalg.vector_norm` or `linalg.matrix_norm` based on the arguments provided.
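A minimal sketch of the `dim=None` defaults described above, on a 3D input:
```python
import torch

A = torch.randn(2, 3, 4)
torch.linalg.vector_norm(A)   # flattens the input, vector 2-norm
torch.linalg.matrix_norm(A)   # Frobenius norm over the last two dims, batched
torch.linalg.norm(A)          # ord=None: vector norm on the flattened input
# torch.linalg.norm(A, ord='fro')  # raises: explicit ord needs a 1D or 2D input
```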
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57127
Reviewed By: mrshenli
Differential Revision: D28186736
Pulled By: mruberry
fbshipit-source-id: 99ce2da9d1c4df3d9dd82c0a312c9570da5caf25
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57181
Documentation for torch.linalg.svd says:
> The returned decomposition is a named tuple `(U, S, Vh)`
The documentation is correct, while the implementation was wrong.
Renamed `V` -> `Vh`; the `h` stands for Hermitian (conjugate) transpose.
This is a BC-breaking change, but our linalg module is beta, therefore we can do it without a deprecation notice or aliases.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D28142162
Pulled By: mruberry
fbshipit-source-id: 5e6e0ae5a63300f2db1575ca3259df381f8e1a7e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57180
We now have a separate function for computing only the singular values.
The `compute_uv` argument is no longer needed, and it was decided in an
offline discussion to remove it. This is a BC-breaking change, but our
linalg module is beta, therefore we can do it without a deprecation
notice.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D28142163
Pulled By: mruberry
fbshipit-source-id: 3fac1fcae414307ad5748c9d5ff50e0aa4e1b853
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56613
Replace `linalg_solve_helper` with `lu_stub` + `lu_solve_stub`.
Once `lu_stub` and `lu_solve_stub` have a cuSOLVER-based code path,
`torch.linalg.solve` will have it as well.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D28248766
Pulled By: mruberry
fbshipit-source-id: 3003666056533d097d0ad659e0603f59fbfda9aa
Summary:
As per the discussion here: https://github.com/pytorch/pytorch/pull/57127#discussion_r624948215
Note that we cannot remove the optional type from the `dim` parameter because the default is to flatten the input tensor, which cannot be easily captured by a value other than `None`.
### BC Breaking Note
This PR changes the `ord` parameter of `torch.linalg.vector_norm` so that it no longer accepts `None` arguments. The default behavior of `2` is equivalent to the previous default of `None`.
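A minimal sketch of the new defaults:
```python
import torch

x = torch.randn(5)
torch.linalg.vector_norm(x)          # ord now defaults to 2 (Euclidean norm)
torch.linalg.vector_norm(x, ord=2)   # explicit, same result
# torch.linalg.vector_norm(x, ord=None)  # raises: None is no longer accepted
```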
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57662
Reviewed By: albanD, mruberry
Differential Revision: D28228870
Pulled By: heitorschueroff
fbshipit-source-id: 040fd8055bbe013f64d3c8409bbb4b2c87c99d13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57316
CUDA support is implemented using cuSOLVER.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D28242071
Pulled By: mruberry
fbshipit-source-id: 6f0a1c50c21c376d2ee2907bddb618c6a600db1f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57315
This PR ports `torch.ormqr` from TH to ATen.
The CUDA path will be implemented in a follow-up PR.
With the ATen port, support for complex and batched inputs is added.
The tests are rewritten and an OpInfo entry is added.
We can implement the least squares solver with geqrf + ormqr +
triangular_solve, so it's useful to have this function renewed, at least for
internal code.
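A minimal sketch of applying Q from a Householder QR without forming it:
```python
import torch

A = torch.randn(4, 3)
B = torch.randn(4, 3)
a, tau = torch.geqrf(A)  # Householder representation of A's QR
# Compute Q^H @ B directly from the Householder reflectors.
QtB = torch.ormqr(a, tau, B, transpose=True)
```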
Resolves https://github.com/pytorch/pytorch/issues/24748
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D28242070
Pulled By: mruberry
fbshipit-source-id: f070bb6ac2f5a3269b163b22f7354e9089ed3061
Summary:
Testing CUDA 11.3 with the current CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57223
Test Plan:
Relevant CI (11.3) passes!
Disclaimer: skipped `test_inverse_errors_large` for CUDA 11.3 as it failed; the issue is documented at https://github.com/pytorch/pytorch/issues/57482.
Reviewed By: malfet
Differential Revision: D28169393
Pulled By: janeyx99
fbshipit-source-id: 9f5cf7b6737ee6196de92bd80918a5bfbe5510ea
Summary:
The new function has the following signature: `cholesky_ex(Tensor input, *, bool check_errors=False) -> (Tensor L, Tensor infos)`. When `check_errors=True`, an error is thrown if the decomposition fails; when `check_errors=False`, responsibility for checking the decomposition is on the user.
When `check_errors=False`, no host-device memory transfer is needed to check the values of the `info` tensor.
Rewrote the internal code for `torch.linalg.cholesky` and added a `cholesky_stub` dispatch. `linalg_cholesky` is now implemented using calls to `linalg_cholesky_ex`.
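A minimal usage sketch with the default `check_errors=False`:
```python
import torch

A = torch.eye(3)  # trivially positive definite
L, info = torch.linalg.cholesky_ex(A)
# info == 0 means the decomposition succeeded; inspecting it is optional,
# so no host-device synchronization is forced on the caller.
assert info.item() == 0
```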
Resolves https://github.com/pytorch/pytorch/issues/57032.
Ref. https://github.com/pytorch/pytorch/issues/34272, https://github.com/pytorch/pytorch/issues/47608, https://github.com/pytorch/pytorch/issues/47953
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56724
Reviewed By: ngimel
Differential Revision: D27960176
Pulled By: mruberry
fbshipit-source-id: f05f3d5d9b4aa444e41c4eec48ad9a9b6fd5dfa5
Summary:
This test was disabled for ROCm 3.9. With the latest updates, the test passes on ROCm 4.1, hence it is re-enabled in test/test_linalg.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57170
Reviewed By: astaff
Differential Revision: D28118217
Pulled By: mruberry
fbshipit-source-id: 1b830eed944a664c3b1b3e936b87096fef0c0ca2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56257
The CPU and cuSOLVER paths were fixed by refactoring
`_linalg_qr_helper_default`.
Resolves https://github.com/pytorch/pytorch/issues/50576
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D27960157
Pulled By: mruberry
fbshipit-source-id: f923f3067a35e65218889e64c6a886364c3d1759
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54723
Renamed "cond" -> "rcond" to be NumPy compatible. The default value for
rcond was changed to match non-legacy NumPy behavior.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D27993741
Pulled By: mruberry
fbshipit-source-id: a4baf25aca6a8272f1af2f963600866bfda56fb3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54722
SciPy and NumPy operate only on non-batched inputs and return an empty array with shape (0,) if rank(a) != n.
The behavior for non-batched inputs is NumPy- and SciPy-compatible, and the same result is computed.
For batched inputs, if any matrix in the batch has rank less than `n`, an empty tensor is returned.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D27993736
Pulled By: mruberry
fbshipit-source-id: 0d7cff967b322a5e816a23f282b6ce383c4468ef
Summary:
Currently `torch.linalg.matrix_rank` accepts only Python floats for the `tol` argument. This behavior is not NumPy compatible, and this PR adds the ability to pass a Tensor of matrix-wise tolerances.
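A hedged sketch of the new Tensor `tol`, assuming one tolerance per matrix in the batch:
```python
import torch

A = torch.randn(2, 4, 4)          # batch of two matrices
tol = torch.tensor([1e-5, 1e-3])  # per-matrix tolerances
ranks = torch.linalg.matrix_rank(A, tol=tol)
```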
Ref. https://github.com/pytorch/pytorch/issues/42666
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54157
Reviewed By: ezyang
Differential Revision: D27961548
Pulled By: mruberry
fbshipit-source-id: 47318eefa07a7876e6360dae089e5389b9939489
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56249
This PR ports `torch.geqrf` from TH to ATen. The CUDA path will be
implemented in a follow-up PR.
With the ATen port, support for complex and batched inputs is added.
There were no correctness tests; they are added in this PR, along with an
OpInfo entry for this operation.
We can implement the QR decomposition as a composition of geqrf and
orgqr (torch.linalg.householder_product), as sketched below.
We can also implement the least squares solver with geqrf + ormqr +
trtrs. So it's useful to have this function renewed, at least for
internal code.
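A minimal sketch of QR via geqrf + householder_product on a square input:
```python
import torch

A = torch.randn(3, 3)
a, tau = torch.geqrf(A)
Q = torch.linalg.householder_product(a, tau)  # orgqr: assemble Q explicitly
R = a.triu()                                  # the upper triangle of `a` holds R
assert torch.allclose(Q @ R, A, atol=1e-5)
```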
Resolves https://github.com/pytorch/pytorch/issues/24705
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D27907357
Pulled By: mruberry
fbshipit-source-id: 94e1806078977417e7903db76eab9d578305f585
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56022.
Fixes https://github.com/pytorch/pytorch/issues/56316
For `torch.tensordot`,
1. `tensordot`'s out variant now resizes the output tensor provided as the `out` argument if necessary.
2. Added a check to verify if the output tensor provided as the argument for `out` is on the same device as the input tensors.
3. Added a check to verify if the dtype of the result is castable to the dtype of the output tensor provided as an argument for `out`.
4. Because of (2) & (3), `tensordot`'s out variant now [safely casts & copies output](https://github.com/pytorch/pytorch/wiki/Developer-FAQ#how-does-out-work-in-pytorch).
5. `test_tensordot` in `test_linalg.py` had a bug: the output tensor wasn't being created on the same device as the input tensors. It was fixed by simply passing a `device` argument when creating it.
6. Added an `OpInfo` for `tensordot` and modified the `OpInfo` for `inner`.
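A minimal sketch of the fixed `out` variant (see points 1-4 above):
```python
import torch

a = torch.randn(3, 4)
b = torch.randn(4, 5)
out = torch.empty(0)  # wrong shape: now resized to (3, 5) instead of erroring
torch.tensordot(a, b, dims=1, out=out)
assert out.shape == (3, 5)
```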
cc heitorschueroff mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56286
Reviewed By: ngimel
Differential Revision: D27845980
Pulled By: mruberry
fbshipit-source-id: 134ab163f05c31a6900dd65aefc745803019e037
Summary:
After MAGMA was enabled, around 5k new tests are running now.
Of these, 5 tests (each having 4 datatypes) are failing on the latest ROCm
CI with ROCm 4.1. These tests are disabled for now so the ROCm CI does not fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55534
Reviewed By: ZolotukhinM
Differential Revision: D27630085
Pulled By: malfet
fbshipit-source-id: c48d124e6a2b4a4f3c6c4b6ac2bdf6c214f325c7