pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Xinya Zhang	1d857586f1	[ROCM] enable hipSOLVER backend for linalg.ldl_factor (#102665 ) * Add complex dtype support for linalg.ldl_factor * Fixes SWDEV-360139 * Enable the following 19 tests for ROCM + test_decomp.py TestDecompCUDA.test_comprehensive_linalg_ldl_factor_cuda_complex128 + test_decomp.py TestDecompCUDA.test_comprehensive_linalg_ldl_factor_cuda_complex64 + test_decomp.py TestDecompCUDA.test_comprehensive_linalg_ldl_factor_ex_cuda_complex128 + test_decomp.py TestDecompCUDA.test_comprehensive_linalg_ldl_factor_ex_cuda_complex64 + test_meta.py TestMetaCUDA.test_dispatch_meta_linalg_ldl_factor_cuda_complex128 + test_meta.py TestMetaCUDA.test_dispatch_meta_linalg_ldl_factor_cuda_complex64 + test_meta.py TestMetaCUDA.test_dispatch_meta_linalg_ldl_factor_ex_cuda_complex128 + test_meta.py TestMetaCUDA.test_dispatch_meta_linalg_ldl_factor_ex_cuda_complex64 + test_meta.py TestMetaCUDA.test_meta_linalg_ldl_factor_cuda_complex128 + test_ops.py TestCommonCUDA.test_noncontiguous_samples_linalg_ldl_factor_cuda_complex64 + test_ops.py TestCommonCUDA.test_noncontiguous_samples_linalg_ldl_factor_ex_cuda_complex64 + test_ops.py TestCommonCUDA.test_variant_consistency_eager_linalg_ldl_factor_cuda_complex64 + test_ops.py TestCommonCUDA.test_variant_consistency_eager_linalg_ldl_factor_ex_cuda_complex64 + test_ops.py TestMathBitsCUDA.test_conj_view_linalg_ldl_factor_cuda_complex64 + test_ops.py TestMathBitsCUDA.test_conj_view_linalg_ldl_factor_ex_cuda_complex64 + test_ops.py TestMathBitsCUDA.test_neg_conj_view_linalg_ldl_factor_cuda_complex128 + test_ops.py TestMathBitsCUDA.test_neg_conj_view_linalg_ldl_factor_ex_cuda_complex128 + test_ops_jit.py TestJitCUDA.test_variant_consistency_jit_linalg_ldl_factor_cuda_complex64 + test_ops_jit.py TestJitCUDA.test_variant_consistency_jit_linalg_ldl_factor_ex_cuda_complex64 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102665 Approved by: https://github.com/lezcano	2023-06-08 20:05:01 +00:00
Andres Lugo-Reyes	eaffd98880	Enable hipSOLVER in ROCm builds (#97370 ) Enables the hipSolver backend for ROCm builds -------------------------------------------------------------------------- - Minimum ROCm version requirement - 5.3 - Introduces new macro USE_LINALG_SOLVER the controls enablement of both cuSOLVER and hipSOLVER - Adds hipSOLVER API to hipification process - combines hipSOLVER and hipSPARSE mappings into single SPECIAL map that takes priority among normal mappings - Torch api to be moved to hipsolver backend (as opposed to magma) include: torch.svd(), torch.geqrf(), torch.orgqr(), torch.ormqr() - Will enable 100+ linalg unit tests for ROCm Pull Request resolved: https://github.com/pytorch/pytorch/pull/97370 Approved by: https://github.com/malfet	2023-05-31 16:53:23 +00:00
PyTorch MergeBot	a64e97b62c	Revert "[dynamo 3.11] enable other torch 3.11 dynamo-related tests (#99180 )" This reverts commit `aa8dcab1ce`. Reverted https://github.com/pytorch/pytorch/pull/99180 on behalf of https://github.com/huydhn due to Sorry for reverting this, but linux-bionic-py3.11-clang9 test starts to timeout after this taking more than 3h30m. This is probably a landrace ([comment](https://github.com/pytorch/pytorch/pull/99180#issuecomment-1545982256))	2023-05-12 16:18:22 +00:00
William Wen	aa8dcab1ce	[dynamo 3.11] enable other torch 3.11 dynamo-related tests (#99180 ) Notes: - No segfaults observed in any CI tests: dynamo unittests, inductor unittests, dynamo-wrapped pytorch tests. So we remove the warning that using dynamo 3.11 may result in segfaults. - Some dynamo-wrapped pytorch tests hang. They will be skipped in the dynamo-wrapped test suite and will be addressed in a future PR Pull Request resolved: https://github.com/pytorch/pytorch/pull/99180 Approved by: https://github.com/malfet	2023-05-12 07:03:09 +00:00
soulitzer	6a02342131	Check inputs have same dtype in addmm_impl_cpu_ even if input has zero numel (#100274 ) Fixes #99226 When an inputs has zero numel, addmm_impl_cpu_'s check that the inputs have the same dtype are bypassed. This PR adds a check before the early return. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100274 Approved by: https://github.com/ngimel	2023-04-29 00:07:54 +00:00
Irem Yuksel	2504089329	Enable test_linalg_solve_triangular_large (#96182 ) PR to see if test fails after removing skip line Fixes #70111 Pull Request resolved: https://github.com/pytorch/pytorch/pull/96182 Approved by: https://github.com/lezcano	2023-04-28 12:54:27 +00:00
Larry Liu	687afeb686	[dynamo][numpy] Add NumpyTensorVariable to translate ndarray attribute calls to tensor attributes (#95849 ) Issue: #93684 # Problem Reduce graph breaks when dynamo compiles python functions containing numpy functions and ndarray operations. # Design (as I know it) * Use torch_np.ndarray(a wrapper of tensor) to back a `VariableTracker`: `NumpyTensorVariable`. * Translate all attributes and methods calls, on ndarray, to torch_np.ndarray equivalent. This PR adds `NumpyTensorVariable` and supports: 1. tensor to ndarray, ndarray to tensor 2. numpy functions such as numpy.meshgrid() 3. ndarray attributes such as `itemsize`, `stride` Next PR will handle returning `np.ndarray` and add support for ndarray methods Pull Request resolved: https://github.com/pytorch/pytorch/pull/95849 Approved by: https://github.com/ezyang	2023-04-27 16:18:35 +00:00
eqy	2fddcf0fc0	[CUDA][CUDA 11] Remove more CUDA 11 version checks (#92934 ) Working on removing stragglers missed in previous CUDA version < 11.0 cleanup PRs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92934 Approved by: https://github.com/ngimel	2023-03-30 19:49:52 +00:00
Aaron Gokaslan	47dca20d80	[BE] Enable flake8-comprehension rule C417 (#97880 ) Enables flake8-comprehension rule C417. Ruff autogenerated these fixes to the codebase. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97880 Approved by: https://github.com/ezyang, https://github.com/kit1980, https://github.com/albanD	2023-03-30 14:34:24 +00:00
Christian Puhrsch	9d37cefcb0	Resubmit _int_mm (#96685 ) Avoids any changes to gemm_and_bias Pull Request resolved: https://github.com/pytorch/pytorch/pull/96685 Approved by: https://github.com/drisspg, https://github.com/ngimel	2023-03-27 16:14:07 +00:00
haozhe.zhu	fe0afc5852	use accumulate type in BF16 gemm(include dot, mv) ref path (#96074 ) Fix https://github.com/pytorch/pytorch/issues/95125 and https://github.com/pytorch/pytorch/issues/83863 for bf16 accumulation in gemm ref path Pull Request resolved: https://github.com/pytorch/pytorch/pull/96074 Approved by: https://github.com/lezcano, https://github.com/peterbell10	2023-03-23 01:22:59 +00:00
Christian Puhrsch	0a53c9624a	Back out "Add _int_mm to expose cuBLAS int8@int8 -> int32 matmul (#94339 )" (#96885 ) Summary: Backing out _int_mm to expose cuBLAS int8@int8 -> int32 matmul (#94339) Test Plan: CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/96885 Approved by: https://github.com/drisspg	2023-03-16 05:32:55 +00:00
mantaionut	2cbce06fee	Enablee test_inverse_errors_large (#94727 ) Test to see if TestLinAlgCUDA.test_inverse_errors_large_cuda_float64 still fails on CI. The test was not failing in multiple CI runs. I was not able to reproduce the crash locally. Fixes #57482 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94727 Approved by: https://github.com/lezcano	2023-03-13 08:31:41 +00:00
XiaobingSuper	ac77883e48	fix issue of baddbmm when out has nan value for beta=0 (#96086 ) Fix https://github.com/pytorch/pytorch/issues/96037. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96086 Approved by: https://github.com/ngimel, https://github.com/lezcano	2023-03-07 14:54:05 +00:00
Christian Puhrsch	1fe2a9d122	Add _int_mm to expose cuBLAS int8@int8 -> int32 matmul (#94339 ) Add _int_mm primitive that binds cuBLAS int8@int8 -> int32 matmul and that translates to Triton based mm templates under max autotune. This is a very useful first step towards better supporting quantization on the GPU. This is a not a user facing API, but an internal primitive. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94339 Approved by: https://github.com/ngimel, https://github.com/jansel	2023-02-27 20:27:25 +00:00
lezcano	03cc0f587c	Don't create large intermediary tensors in the backward of matmul (#95261 ) Currently, if we multiply a transposed batch of matrices with shape [b, m, n] and a matrix with shape [n, k], when computing the gradient of the matrix, we instantiate a matrix of shape [b, n, k]. This may be a very large matrix. Instead, we fold the batch of matrices into a matrix, which avoids creating any large intermediary tensor. Note that multiplying a batch of matrices and a matrix naturally occurs within an attention module, so this case surely happens in the wild. In particular, this issue was found while investigating the OOMs caused by the improved folding algorithm in the next PR of this stack. See https://github.com/pytorch/pytorch/pull/76828#issuecomment-1432359980 This PR fixes those OOMs and decreases the memory footprint of the backward of matmul. I understand this is a tricky one, so I put it on its own PR to discuss it. Differential Revision: [D43541495](https://our.internmc.facebook.com/intern/diff/D43541495) Pull Request resolved: https://github.com/pytorch/pytorch/pull/95261 Approved by: https://github.com/ezyang	2023-02-27 15:19:09 +00:00
kshitij12345	3b966a6ce3	[autograd] disable backward/grad for complex scalar output (#92753 ) Fixes https://github.com/pytorch/pytorch/issues/92750 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92753 Approved by: https://github.com/ezyang	2023-02-23 11:38:27 +00:00
XiaobingSuper	5730cabdd0	using float type to do the computation of norm reduce for cpu half and bfloat16 dtype (#95166 ) As the title, we should use a higher dtype to compute norm reduce for half and bfloat1 dtype. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95166 Approved by: https://github.com/peterbell10, https://github.com/jgong5, https://github.com/ngimel, https://github.com/lezcano	2023-02-23 05:00:25 +00:00
Nikita Shulga	42b6bcdb13	[BE] Add empty tensor check to _compute_linear_combination (#94245 ) Fixes https://github.com/pytorch/pytorch/issues/94124 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94245 Approved by: https://github.com/lezcano	2023-02-07 11:31:11 +00:00
Ivan Yashchuk	fba13d94a1	Remove deprecated torch.symeig (#70988 ) The time has come to remove deprecated linear algebra related functions. This PR removes `torch.symeig`. - [x] XLA PR: https://github.com/pytorch/xla/pull/4498 Pull Request resolved: https://github.com/pytorch/pytorch/pull/70988 Approved by: https://github.com/lezcano, https://github.com/kit1980, https://github.com/malfet	2023-01-31 11:59:11 +00:00
PyTorch MergeBot	acdd462b1a	Revert "Remove deprecated torch.symeig (#70988 )" This reverts commit `d70ed68162`. Reverted https://github.com/pytorch/pytorch/pull/70988 on behalf of https://github.com/kit1980 due to Failing XLA tests, forward fix unsuccessful	2023-01-24 19:03:40 +00:00
Eddie Yan	0bf7506051	[CUDA] Drop CUDA < 11.0 test flags (#92605 ) Follow-up of #89582 to drop flags like `CUDA11OrLater` in tests. Note that in some places it appears that `TEST_WITH_ROCM` is _implicitly_ guarded against via the `CUDA11OrLater` version check, based on my best-guess of how `torch.version.cuda` would behave in ROCM builds, so I've added `not TEST_WITH_ROCM` in cases where ROCM wasn't previously explicitly allowed. CC @ptrblck @malfet @ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/92605 Approved by: https://github.com/ngimel	2023-01-24 04:34:06 +00:00
Ivan Yashchuk	d70ed68162	Remove deprecated torch.symeig (#70988 ) The time has come to remove deprecated linear algebra related functions. This PR removes `torch.symeig`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/70988 Approved by: https://github.com/lezcano, https://github.com/kit1980	2023-01-23 22:51:40 +00:00
PyTorch MergeBot	0a6053e9b5	Revert "Avoid copies in matmul (#76828 )" This reverts commit `8c2e82b487`. Reverted https://github.com/pytorch/pytorch/pull/76828 on behalf of https://github.com/mehtanirav due to Internal breakages	2023-01-03 23:36:58 +00:00
lezcano	8c2e82b487	Avoid copies in matmul (#76828 ) With this PR, matmul just folds a bmm into a mm o mv if and only if it can achieve so without copying. We add tests for this to make sure that our algorithm to detect this is accurate. For the cases where it was copying before see https://github.com/pytorch/pytorch/pull/75197#discussion_r843413208 https://github.com/pytorch/pytorch/pull/75197#discussion_r863489479 https://github.com/pytorch/pytorch/pull/75197#discussion_r863489805 Fixes https://github.com/pytorch/pytorch/issues/76702 Pull Request resolved: https://github.com/pytorch/pytorch/pull/76828 Approved by: https://github.com/ngimel	2023-01-03 14:18:38 +00:00
PyTorch MergeBot	db2a237763	Revert "Avoid copies in matmul (#76828 )" This reverts commit `0c3659586d`. Reverted https://github.com/pytorch/pytorch/pull/76828 on behalf of https://github.com/lezcano due to Makes functorch tests fail	2023-01-03 12:26:29 +00:00
lezcano	0c3659586d	Avoid copies in matmul (#76828 ) With this PR, matmul just folds a bmm into a mm o mv if and only if it can achieve so without copying. We add tests for this to make sure that our algorithm to detect this is accurate. For the cases where it was copying before see https://github.com/pytorch/pytorch/pull/75197#discussion_r843413208 https://github.com/pytorch/pytorch/pull/75197#discussion_r863489479 https://github.com/pytorch/pytorch/pull/75197#discussion_r863489805 Fixes https://github.com/pytorch/pytorch/issues/76702 Pull Request resolved: https://github.com/pytorch/pytorch/pull/76828 Approved by: https://github.com/ngimel	2023-01-02 20:07:38 +00:00
Jithun Nair	e8e591b72f	Upgrade CI to ROCm5.3 (#88297 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88297 Approved by: https://github.com/malfet	2022-12-14 05:09:56 +00:00
PyTorch MergeBot	af4735d3ad	Revert "Upgrade CI to ROCm5.3 (#88297 )" This reverts commit `181a82ffd2`. Reverted https://github.com/pytorch/pytorch/pull/88297 on behalf of https://github.com/IvanYashchuk due to Tests are unnecessarily skipped on all platforms	2022-12-13 12:23:44 +00:00
Jithun Nair	181a82ffd2	Upgrade CI to ROCm5.3 (#88297 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88297 Approved by: https://github.com/malfet	2022-12-13 04:50:06 +00:00
lezcano	1d6a188d08	Reland Dispatch torch.norm to linalg.vector_norm and linalg.matrix_norm (#81761 ) (#84624 ) Reland https://github.com/pytorch/pytorch/pull/81761 Differential Revision: [D39332292](https://our.internmc.facebook.com/intern/diff/D39332292) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84624 Approved by: https://github.com/kit1980	2022-11-22 07:53:24 +00:00
lezcano	d8506ff42b	Generalize gesvdjBatched to run whith full_matrices==false (#88502 ) As brought up in https://github.com/pytorch/pytorch/issues/86234#issuecomment-1268296036, our heuristic for which SVD backend to choose was not great in some cases. The case in which there could be some improvements is when we have a large batch of very small non-square matrices. This PR, adapts the calling code to gesvdj by creating two temporary square buffers to allow to call gesvdjBatched, and then copies back the result into the output buffers. We then modify the heuristic that chooses between gesvdj and gesvdjBatched. Fixes https://github.com/pytorch/pytorch/issues/86234 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88502 Approved by: https://github.com/IvanYashchuk, https://github.com/nikitaved, https://github.com/mruberry, https://github.com/xwang233	2022-11-07 22:07:48 +00:00
Fang Wang	160118d72a	Add test case for matrix multiply-add with large inputs (#85550 ) Summary: - Added test case for addmm, baddbmm and linear with large inputs - Testing with torch types: float32, float16, bfloat16 Test Plan: Run unit tests with: `buck2 run mode/opt //caffe2/test:linalg_re_cuda` ``` ... test_addmm_baddbmm_large_input_1_10000_10000_10000_cpu_bfloat16 (test_linalg_re_cuda.TestLinalgReCudaCPU) ... skipped 'Only runs on cuda' test_addmm_baddbmm_large_input_1_10000_10000_10000_cpu_float16 (test_linalg_re_cuda.TestLinalgReCudaCPU) ... skipped 'Only runs on cuda' test_addmm_baddbmm_large_input_1_10000_10000_10000_cpu_float32 (test_linalg_re_cuda.TestLinalgReCudaCPU) ... skipped 'Only runs on cuda' test_addmm_baddbmm_large_input_1_10000_1000_10000_cpu_bfloat16 (test_linalg_re_cuda.TestLinalgReCudaCPU) ... skipped 'Only runs on cuda' test_addmm_baddbmm_large_input_1_10000_1000_10000_cpu_float16 (test_linalg_re_cuda.TestLinalgReCudaCPU) ... skipped 'Only runs on cuda' test_addmm_baddbmm_large_input_1_10000_1000_10000_cpu_float32 (test_linalg_re_cuda.TestLinalgReCudaCPU) ... skipped 'Only runs on cuda' test_addmm_baddbmm_large_input_2_1000_1000_1000_cpu_bfloat16 (test_linalg_re_cuda.TestLinalgReCudaCPU) ... skipped 'Only runs on cuda' test_addmm_baddbmm_large_input_2_1000_1000_1000_cpu_float16 (test_linalg_re_cuda.TestLinalgReCudaCPU) ... skipped 'Only runs on cuda' test_addmm_baddbmm_large_input_2_1000_1000_1000_cpu_float32 (test_linalg_re_cuda.TestLinalgReCudaCPU) ... skipped 'Only runs on cuda' test_addmm_baddbmm_large_input_2_100_100_100_cpu_bfloat16 (test_linalg_re_cuda.TestLinalgReCudaCPU) ... skipped 'Only runs on cuda' test_addmm_baddbmm_large_input_2_100_100_100_cpu_float16 (test_linalg_re_cuda.TestLinalgReCudaCPU) ... skipped 'Only runs on cuda' test_addmm_baddbmm_large_input_2_100_100_100_cpu_float32 (test_linalg_re_cuda.TestLinalgReCudaCPU) ... skipped 'Only runs on cuda' test_addmm_baddbmm_large_input_1_10000_10000_10000_cuda_bfloat16 (test_linalg_re_cuda.TestLinalgReCudaCUDA) ... ok test_addmm_baddbmm_large_input_1_10000_10000_10000_cuda_float16 (test_linalg_re_cuda.TestLinalgReCudaCUDA) ... ok test_addmm_baddbmm_large_input_1_10000_10000_10000_cuda_float32 (test_linalg_re_cuda.TestLinalgReCudaCUDA) ... ok test_addmm_baddbmm_large_input_1_10000_1000_10000_cuda_bfloat16 (test_linalg_re_cuda.TestLinalgReCudaCUDA) ... ok test_addmm_baddbmm_large_input_1_10000_1000_10000_cuda_float16 (test_linalg_re_cuda.TestLinalgReCudaCUDA) ... ok test_addmm_baddbmm_large_input_1_10000_1000_10000_cuda_float32 (test_linalg_re_cuda.TestLinalgReCudaCUDA) ... ok test_addmm_baddbmm_large_input_2_1000_1000_1000_cuda_bfloat16 (test_linalg_re_cuda.TestLinalgReCudaCUDA) ... ok test_addmm_baddbmm_large_input_2_1000_1000_1000_cuda_float16 (test_linalg_re_cuda.TestLinalgReCudaCUDA) ... ok test_addmm_baddbmm_large_input_2_1000_1000_1000_cuda_float32 (test_linalg_re_cuda.TestLinalgReCudaCUDA) ... ok test_addmm_baddbmm_large_input_2_100_100_100_cuda_bfloat16 (test_linalg_re_cuda.TestLinalgReCudaCUDA) ... ok test_addmm_baddbmm_large_input_2_100_100_100_cuda_float16 (test_linalg_re_cuda.TestLinalgReCudaCUDA) ... ok test_addmm_baddbmm_large_input_2_100_100_100_cuda_float32 (test_linalg_re_cuda.TestLinalgReCudaCUDA) ... ok ---------------------------------------------------------------------- Ran 24 tests in 63.224s OK (skipped=12) ``` Differential Revision: D39718256 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85550 Approved by: https://github.com/IvanYashchuk, https://github.com/malfet	2022-10-11 17:52:21 +00:00
Jane Xu	a348975e00	Add opteinsum backend to give users control (#86219 ) This achieves the same things as https://github.com/pytorch/pytorch/pull/85908 but using backends instead of kwargs (which breaks torchscript unfortunately). This also does mean we let go of numpy compatibility BUT the wins here are that users can control what opt einsum they wanna do! The backend allows for..well you should just read the docs: ``` .. attribute:: torch.backends.opteinsum.enabled A :class:`bool` that controls whether opt_einsum is enabled (on by default). If so, torch.einsum will use opt_einsum (https://optimized-einsum.readthedocs.io/en/stable/path_finding.html) to calculate an optimal path of contraction for faster performance. .. attribute:: torch.backends.opteinsum.strategy A :class:`str` that specifies which strategies to try when `torch.backends.opteinsum.enabled` is True. By default, torch.einsum will try the "auto" strategy, but the "greedy" and "optimal" strategies are also supported. Note that the "optimal" strategy is factorial on the number of inputs as it tries all possible paths. See more details in opt_einsum's docs (https://optimized-einsum.readthedocs.io/en/stable/path_finding.html). ``` In trying (and failing) to land 85908, I discovered that jit script does NOT actually pull from python's version of einsum (because it cannot support variadic args nor kwargs). Thus I learned that jitted einsum does not subscribe to the new opt_einsum path calculation. Overall, this is fine since jit script is getting deprecated, but where is the best place to document this? ## Test plan: - added tests to CI - locally tested that trying to set the strategy to something invalid will error properly - locally tested that tests will pass even if you don't have opt-einsum - locally tested that setting the strategy when opt-einsum is not there will also error properly Pull Request resolved: https://github.com/pytorch/pytorch/pull/86219 Approved by: https://github.com/soulitzer, https://github.com/malfet	2022-10-05 06:33:25 +00:00
albanD	94da90e41f	LU solve/unpack fix to prevent bad memory usage on CPU (#85922 ) Fixes https://github.com/pytorch/pytorch/issues/77898 Fixes https://github.com/pytorch/pytorch/issues/85026 There is a minor perf impact but: - For lu_solve, the actual compute is going to be more expensive than this O(n) check (ones pass over the other matrices is O(n^2) in any case) - For lu_unpack, the check inside the kernel should be almost free. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85922 Approved by: https://github.com/ngimel, https://github.com/nikitaved	2022-09-30 20:07:08 +00:00
Jane Xu	e7e1cd945f	Add path optimize kwarg to einsum (#84890 ) ## This PR seeks to: - [x] add c++ support for an optimize path - [x] add python opt_einsum path passthrough - [x] add opt_einsum to OSS requirements, but a soft one - [x] show benchmark results here Additional things I've explored + their conclusions: - Delaying the summing over dimensions => added! - The idea here is to not incur kernel calls to `sum` as we try to early sum out in einsum. Thus, we collect all the dimensions that need to be summed together in one contraction + sum at the end instead of summing as we go. While this optimization didn't feel like it made things faster for the random cases we've selected (they all summed 1 dim per contraction), it is a good principle and would help more common use cases that would reduce multiple dimensions at a time (like `bxy,xyi,xyj->bij`). - Caching contract_path based on equation and tensor sizes => dropped :( - The benchmarks were strictly worse for all the cases, and, from scanning the use cases, I observed people do not often call einsum on the same equation/tensor order enough for caching to be justified. I do think caching can be effective in the future, but it would require further investigation. ## Not a part of this PR (but are next steps): - adding opt_einsum package to OSS CI - adding it to internal CI - potentially adding a kwarg path argument to the python API -- if the path is given, we wouldn't have to spend time calculating it, but there would be some time lost validating user input. ## Testing: - Added more tests to CI ## Benchmarking: TL;DRs - torch.einsum with opt_einsum is a definite win for the production case. - torch.einsum with opt_einsum installed is consistently fast, but has an overhead of needing to find the path. If the path is already found/optimal, it will be slightly slower. - The einsum overhead decreases for bigger dimensions. - torch.einsum without opt_einsum installed is comparable to before this commit, with occasional slowness potentially due to not reshaping/squeezing as we contract until the end. - For many of the random generated cases, the dimensions were too similar and small where an optimal order wasn't that much more optimal than just going left to right. However, in production, dimensions are commonly quite distinct (batch size will be small, but the data will be huge). - torch.einsum opt is comparable (slightly faster overall) compared to numpy.einsum opt for the cpu case. This is interesting given that torch.einsum currently spends time computing the path, but numpy.einsum takes it as input. - torch.einsum opt is significantly faster than numpy.einsum opt for the gpu case. This is because numpy doesn't take advantage of GPUs. The following benchmarks were done on an A100 GPU and Linux CPUs. The line in the first chart separates GPU (on top) from CPU, and the line in the second graph separates CPU (on top) and then GPU. Sorry it's flipped 😛 . Production example (see [colab benchmark](https://colab.research.google.com/drive/1V2s4v1dOOKwRvp5T_DC-PNUosOV9FFJx?authuser=1#scrollTo=WZoQkC8Mdt6I) for more context): <img width="1176" alt="image" src="https://user-images.githubusercontent.com/31798555/192012636-9a68bfa7-2601-43b1-afeb-b4e0877db6a4.png"> Randomly generated examples (the same ones as in https://github.com/pytorch/pytorch/pull/60191) <img width="1176" alt="image" src="https://user-images.githubusercontent.com/31798555/192012804-1c639595-b3e6-48c9-a385-ad851c13e1c2.png"> Open below to see old + not super relevant benchmarking results: <details> Benchmark results BEFORE this PR (on Linux -- I will update devices so they are consistent later): <img width="776" alt="image" src="https://user-images.githubusercontent.com/31798555/190807274-18f71fce-556e-47f4-b18c-e0f7d0c0d5aa.png"> Benchmark results with the code on this PR (on my x86 mac): For the CPU internal use case -- ![image](https://user-images.githubusercontent.com/31798555/190801376-6f591b00-cebd-4ca7-bb23-ae8f17f1634e.png) For the general use case -- It looks like numpy opt still does better in several of these random cases, but torch einsum opt is consistently faster than torch.einsum. ![image](https://user-images.githubusercontent.com/31798555/190811730-fbb6797d-af59-4f5a-92da-ba4103372014.png) <details> Pull Request resolved: https://github.com/pytorch/pytorch/pull/84890 Approved by: https://github.com/albanD, https://github.com/soulitzer	2022-09-24 03:47:36 +00:00
Sourav Mandal	70b27e91c7	[pytorch] Skip linalg tests that fail on Meta infra (#85577 ) Summary: test_inverse_errors_large and test_linalg_solve_triangular fail for dtype=float64 when invoked on GPUs on Meta internal testing infra. Skip in Meta internal testing. Test Plan: (observe tests skipped on Meta internal infra) Reviewed By: mikekgfb Differential Revision: D39785331 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85577 Approved by: https://github.com/malfet	2022-09-24 01:02:42 +00:00
Wei Wang	8bd4724f04	Adding a unit test that would gate PRs and prevent reverts, e.g. #83327 (#85442 ) PR #83327 slipped through CI, adding this unit test as part of efforts to minimize future reverts Pull Request resolved: https://github.com/pytorch/pytorch/pull/85442 Approved by: https://github.com/Balandat, https://github.com/mehtanirav	2022-09-23 01:05:17 +00:00
Ivan Yashchuk	539076e2c2	Remove deprecated torch.lstsq (#70980 ) The time has come to remove deprecated linear algebra related functions. This PR removes `torch.lstsq`. There's a note in `tools/codegen/gen.py` about `lstsq` schema in `native_function.yaml` that I will not remove: `87139d8532/tools/codegen/gen.py (L734-L770)` cc @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @Lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/70980 Approved by: https://github.com/lezcano, https://github.com/kit1980	2022-09-23 00:16:55 +00:00
Ivan Yashchuk	bcf93181a0	Remove deprecated torch.matrix_rank (#70981 ) The time has come to remove deprecated linear algebra related functions. This PR removes `torch.matrix_rank`. cc @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @Lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/70981 Approved by: https://github.com/lezcano, https://github.com/kit1980	2022-09-22 17:40:46 +00:00
Sourav Mandal	5aa84c16db	[pytorch] cuBLAS addmm malfunction test (#85432 ) Summary: Re-submit for approved PR that was then reverted: https://github.com/pytorch/pytorch/pull/85084 Create unit test to detect cuBLAS breakage via large differences between CPU and GPU addmm invocations Test Plan: Sample unit test output -- [...] test_cublas_addmm_size_10000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_10000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_10000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_1000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_1000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_1000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_100_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_100_cpu_float16 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_100_cpu_float32 (test_linalg.TestLinalgCPU) ... ok [...] Reviewed By: mikekgfb Differential Revision: D39433029 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85432 Approved by: https://github.com/zrphercule	2022-09-21 22:17:48 +00:00
PyTorch MergeBot	2fb820455c	Revert "[pytorch] cuBLAS addmm malfunction test (#85084 )" This reverts commit `0297c75c14`. Reverted https://github.com/pytorch/pytorch/pull/85084 on behalf of https://github.com/clee2000 due to broke tests on trunk, https://github.com/pytorch/pytorch/actions/runs/3098347639/jobs/5017166419	2022-09-21 16:48:55 +00:00
Sourav Mandal	0297c75c14	[pytorch] cuBLAS addmm malfunction test (#85084 ) Summary: Create unit test to detect cuBLAS breakage via large differences between CPU and GPU addmm invocations Test Plan: Sample unit test output -- [...] test_cublas_addmm_size_10000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_10000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_10000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_1000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_1000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_1000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_100_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_100_cpu_float16 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_100_cpu_float32 (test_linalg.TestLinalgCPU) ... ok [...] Reviewed By: mikekgfb Differential Revision: D39433029 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85084 Approved by: https://github.com/zrphercule	2022-09-21 13:42:13 +00:00
Ivan Yashchuk	01c54ad6de	Remove deprecated torch.eig (#70982 ) The time has come to remove deprecated linear algebra related functions. This PR removes `torch.eig`. cc @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @Lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/70982 Approved by: https://github.com/Lezcano, https://github.com/malfet	2022-09-09 21:31:57 +00:00
PyTorch MergeBot	166dec74b5	Revert "Dispatch torch.norm to linalg.vector_norm and linalg.matrix_norm (#81761 )" This reverts commit `65beff5acb`. Reverted https://github.com/pytorch/pytorch/pull/81761 on behalf of https://github.com/mehtanirav due to Breakages in pytorch/glow	2022-09-06 22:31:14 +00:00
lezcano	65beff5acb	Dispatch torch.norm to linalg.vector_norm and linalg.matrix_norm (#81761 ) `torch.norm` is very odd. Some notable issues are: - The default value of `"fro"` in `torch.norm` has an odd behaviour when `dim=None`. This is handled in the new dispatch - The treatment of the `dtype` argument in `torch.norm` was completely wrong. This should fix it - Some `out=` variants in the previous implementation were also wrong. This should fix those. - This new dispatch should make some paths much faster. For example, `torch.norm(x)` where `x` is complex. I'll try to make the changes in these PRs as incremental as possible as this is a tricky one. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81761 Approved by: https://github.com/ngimel	2022-09-02 19:12:25 +00:00
Mario Lezcano	f5a3515083	Make linalg.inv composite of linalg.solve (#80074 ) The `getri` kernel calls inside `getrs` so we can do so explicitly ourselves and save ourselves from having to maintain an extra kernel. This way we just need to optimise `lu_factor` and `lu_solve` and `inv` will be as efficient as it can be, as it'll be choosing the best backend to perform the factorisation and the best backend (not necessarily the same) to perform the solve. Fixes https://github.com/pytorch/pytorch/issues/77498 The benchmarks: https://github.com/pytorch/pytorch/pull/80074#issuecomment-1164309071 Pull Request resolved: https://github.com/pytorch/pytorch/pull/80074 Approved by: https://github.com/IvanYashchuk, https://github.com/albanD, https://github.com/malfet	2022-08-25 09:28:55 +00:00
PyTorch MergeBot	5321bf52f2	Revert "Make linalg.inv composite of linalg.solve (#80074 )" This reverts commit `4737b33614`. Reverted https://github.com/pytorch/pytorch/pull/80074 on behalf of https://github.com/malfet due to Depends on the changes from https://github.com/pytorch/pytorch/pull/83628	2022-08-25 00:43:00 +00:00
Mario Lezcano	4737b33614	Make linalg.inv composite of linalg.solve (#80074 ) The `getri` kernel calls inside `getrs` so we can do so explicitly ourselves and save ourselves from having to maintain an extra kernel. This way we just need to optimise `lu_factor` and `lu_solve` and `inv` will be as efficient as it can be, as it'll be choosing the best backend to perform the factorisation and the best backend (not necessarily the same) to perform the solve. Fixes https://github.com/pytorch/pytorch/issues/77498 The benchmarks: https://github.com/pytorch/pytorch/pull/80074#issuecomment-1164309071 Pull Request resolved: https://github.com/pytorch/pytorch/pull/80074 Approved by: https://github.com/IvanYashchuk, https://github.com/albanD, https://github.com/malfet	2022-08-24 15:18:56 +00:00
lezcano	0bdcfcb840	Strenghten preconditions of linalg.cross (#83798 ) This makes `linalg.cross` array API complaint (https://github.com/data-apis/array-api/issues/415) and fixes a few bugs. Fixes https://github.com/pytorch/pytorch/issues/77629 Fixes https://github.com/pytorch/pytorch/issues/83756 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83798 Approved by: https://github.com/mruberry	2022-08-24 15:17:12 +00:00
PyTorch MergeBot	bbe803cb35	Revert "Strenghten preconditions of linalg.cross (#83798 )" This reverts commit `7f0198e739`. Reverted https://github.com/pytorch/pytorch/pull/83798 on behalf of https://github.com/janeyx99 due to Sorry, land race caused functorch issues `7f0198e739`	2022-08-23 19:36:43 +00:00
lezcano	7f0198e739	Strenghten preconditions of linalg.cross (#83798 ) This makes `linalg.cross` array API complaint (https://github.com/data-apis/array-api/issues/415) and fixes a few bugs. Fixes https://github.com/pytorch/pytorch/issues/77629 Fixes https://github.com/pytorch/pytorch/issues/83756 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83798 Approved by: https://github.com/mruberry	2022-08-23 18:06:51 +00:00
pbialecki	b4f7e22640	Enable periodic builds for CUDA 11.7 (#81688 ) CC @atalman Pull Request resolved: https://github.com/pytorch/pytorch/pull/81688 Approved by: https://github.com/atalman	2022-08-10 00:03:51 +00:00
albanD	2255911f8a	Make M1 tests green (#82213 ) This is skipping all the failing tests and add a new master job to test on M1 Pull Request resolved: https://github.com/pytorch/pytorch/pull/82213 Approved by: https://github.com/seemethere, https://github.com/soulitzer, https://github.com/malfet	2022-08-05 16:12:08 +00:00
lezcano	c5330183ca	[PrimTorch] Reference for linalg.matrix_norm (#81113 ) As per title. I corrected a thing or two from my previous implementation to make for better errors in some weird edge-cases and have a more clear understanding of when does this function support low_precision types and when it doesn't. We also use the optimisation for bfloat16 within `vector_norm` within this function. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81113 Approved by: https://github.com/ngimel	2022-07-21 23:07:32 +00:00
Nikita Vedeneev	85144e63a9	`matrix_exp`: Make sure `_compute_linear_combinations` result preserves dim of the input. (#81330 ) Fixes https://github.com/pytorch/pytorch/issues/80948. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81330 Approved by: https://github.com/Lezcano, https://github.com/mruberry	2022-07-12 21:21:34 +00:00
lezcano	37a5819665	Make slogdet, linalg.sloget and logdet support metatensors (#79742 ) This PR also adds complex support for logdet, and makes all these functions support out= and be composite depending on one function. We also extend the support of `logdet` to complex numbers and improve the docs of all these functions. We also use `linalg_lu_factor_ex` in these functions, so we remove the synchronisation present before. Pull Request resolved: https://github.com/pytorch/pytorch/pull/79742 Approved by: https://github.com/IvanYashchuk, https://github.com/albanD	2022-07-01 16:09:21 +00:00
lezcano	ff5a588e6e	Port cholesky to structured kernels (#79300 ) Yeah. Pull Request resolved: https://github.com/pytorch/pytorch/pull/79300 Approved by: https://github.com/IvanYashchuk, https://github.com/albanD	2022-06-24 02:37:45 +00:00
lezcano	549a597c00	Port linalg_eigh and linalg_eigvalsh to structured This follows the structure of linalg.svd. Pull Request resolved: https://github.com/pytorch/pytorch/pull/79072 Approved by: https://github.com/IvanYashchuk, https://github.com/albanD	2022-06-14 20:17:01 +00:00
lezcano	9fc2518a8a	Update and improve the heuristics for linalg.lu_solve This PR adds getrf_cublas to the functions considered in the heuristics for lu_solve. Pull Request resolved: https://github.com/pytorch/pytorch/pull/73878 Approved by: https://github.com/nikitaved, https://github.com/IvanYashchuk, https://github.com/mruberry	2022-06-11 04:06:40 +00:00
lezcano	54949a5abc	Simplify and optimize linalg.solve This PR heavily simplifies the code of `linalg.solve`. At the same time, this implementation saves quite a few copies of the input data in some cases (e.g. A is contiguous) We also implement it in such a way that the derivative goes from computing two LU decompositions and two LU solves to no LU decompositions and one LU solves. It also avoids a number of unnecessary copies the derivative was unnecessarily performing (at least the copy of two matrices). On top of this, we add a `left` kw-only arg that allows the user to solve `XA = B` rather concisely. Pull Request resolved: https://github.com/pytorch/pytorch/pull/74046 Approved by: https://github.com/nikitaved, https://github.com/IvanYashchuk, https://github.com/mruberry	2022-06-11 04:06:40 +00:00
lezcano	af6321f3d8	Port linalg_qr to structured This PR simplifies the logic of `linalg.qr` using structured kernels. I also took this chance and merged a few `copy_` operations with other ops. This PR removes a the previous magma implementation as is never faster than that of cusolver and it's rather buggy. This has the side-effect that now `qr` is not supported in Rocm. Ivan confirmed that this is fine, given how incredibly slow was QR on Rocm anyway (we were marking some tests as slow because of this...). This PR also corrects the dispatch in geqrf. Before, if we called it with a matrix for which `input.size(-2) <= 256 && batchCount(input) >= std::max<int64_t>(2, input.size(-2) / 16)` is false, and we have cublas but not cusolver, we would end up calling magma rather than cublas. This is not what the heuristic suggested. Probaly we should benchmark these heuristics again, but that's beyond the scope of this PR. Note. It looks like `torch.geqrf` maybe broken in MAGMA as per the previous comment in `linalg_qr_helper_magma`. IvanYashchuk wdyt? Pull Request resolved: https://github.com/pytorch/pytorch/pull/79054 Approved by: https://github.com/IvanYashchuk, https://github.com/ezyang	2022-06-09 14:41:30 +00:00
lezcano	f7b9a46880	Deprecate torch.lu BC-breaking note: This PR deprecates `torch.lu` in favor of `torch.linalg.lu_factor`. A upgrade guide is added to the documentation for `torch.lu`. Note this PR DOES NOT remove `torch.lu`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/77636 Approved by: https://github.com/malfet	2022-06-07 22:50:14 +00:00
lezcano	c7d6cec078	Add linalg.lu_solve This PR adds `linalg.lu_solve`. While doing so, I found a bug in MAGMA when calling the batched MAGMA backend with trans=True. We work around that by solving the system solving two triangular systems. We also update the heuristics for this function, as they were fairly updated. We found that cuSolver is king, so luckily we do not need to rely on the buggy backend from magma for this function. We added tests testing this function left and right. We also added tests for the different backends. We also activated the tests for AMD, as those should work as well. Fixes https://github.com/pytorch/pytorch/issues/61657 Pull Request resolved: https://github.com/pytorch/pytorch/pull/77634 Approved by: https://github.com/malfet	2022-06-07 22:28:28 +00:00
Xiao Wang	d136852bda	[CUDA][Linalg] Add a `driver=` kwarg to `torch.linalg.svd` and `svdvals`; add cusolver gesvdaStridedBatched driver to svd (#74521 ) [CUDA][Linalg] Add a driver= kwarg to torch.linalg.svd and svdvals; add cusolver gesvdaStridedBatched driver to svd cusolver doc: https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverDN-lt-t-gt-gesvda Todo: - [X] add cusolver `gesvdaStridedBatched` driver - [X] add `driver=` kwarg to `torch.linalg.svd` and `torch.linalg.svdvals` - [X] doc - [X] error out (?) on other non-cusolver use cases: CPU, MAGMA - [X] change svd api in `torch/csrc/api/include/torch/linalg.h` ? Close https://github.com/pytorch/pytorch/issues/41306 Related https://github.com/pytorch/pytorch/issues/75494 Pull Request resolved: https://github.com/pytorch/pytorch/pull/74521 Approved by: https://github.com/Lezcano, https://github.com/IvanYashchuk, https://github.com/mruberry	2022-05-31 16:11:53 +00:00
Jagadish Krishnamoorthy	3ee863cb7c	[ROCm] enable test_lobpcg_ortho_cuda_float64 (#78385 ) Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/78385 Approved by: https://github.com/Lezcano, https://github.com/pruthvistony	2022-05-28 22:46:23 +00:00
lezcano	ff7b6d6b5f	Update linalg.*norm This PR does a number of things: - Move linalg.vector_norm to structured kernels and simplify the logic - Fixes a number of prexisting issues with the dtype kwarg of these ops - Heavily simplifies and corrects the logic of `linalg.matrix_norm` and `linalg.norm` to be consistent with the docs - Before the `_out` versions of these functions were incorrect - Their implementation is now as efficient as expected, as it avoids reimplementing these operations whenever possible. - Deprecates `torch.frobenius_norm` and `torch.nuclear_norm`, as they were exposed in the API and they are apparently being used in mobile (??!!) even though they were not documented and their implementation was slow. - I'd love to get rid of these functions already, but I guess we have to go through their deprecation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76547 Approved by: https://github.com/mruberry	2022-05-18 11:46:50 +00:00
Kulin Seth	e011a8e18b	Enable PyTorch operations on MPS Backend. (#77343 ) Add PyTorch operations to MPS backend. - https://github.com/pytorch/pytorch/issues/77394 Pull Request resolved: https://github.com/pytorch/pytorch/pull/77343 Approved by: https://github.com/albanD	2022-05-13 18:28:53 +00:00
Ivan Yashchuk	890bdf13e1	Remove deprecated torch.solve (#70986 ) The time has come to remove deprecated linear algebra related functions. This PR removes `torch.solve`. cc @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @Lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/70986 Approved by: https://github.com/Lezcano, https://github.com/albanD	2022-05-10 13:44:07 +00:00
PyTorch MergeBot	4ebc4890dd	Revert "Add linalg.lu_solve" This reverts commit `fc5b4a5a33`. Reverted https://github.com/pytorch/pytorch/pull/72935 on behalf of https://github.com/malfet	2022-05-09 19:12:30 +00:00
PyTorch MergeBot	1467e0dd5d	Revert "Deprecate torch.lu" This reverts commit `a5bbfd94fb`. Reverted https://github.com/pytorch/pytorch/pull/73804 on behalf of https://github.com/malfet	2022-05-09 19:06:44 +00:00
lezcano	a5bbfd94fb	Deprecate torch.lu BC-breaking note: This PR deprecates `torch.lu` in favor of `torch.linalg.lu_factor`. A upgrade guide is added to the documentation for `torch.lu`. Note this PR DOES NOT remove `torch.lu`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/73804 Approved by: https://github.com/IvanYashchuk, https://github.com/mruberry	2022-05-05 19:17:11 +00:00
lezcano	fc5b4a5a33	Add linalg.lu_solve This PR adds `linalg.lu_solve`. While doing so, I found a bug in MAGMA when calling the batched MAGMA backend with trans=True. We work around that by solving the system solving two triangular systems. We also update the heuristics for this function, as they were fairly updated. We found that cuSolver is king, so luckily we do not need to rely on the buggy backend from magma for this function. We added tests testing this function left and right. We also added tests for the different backends. We also activated the tests for AMD, as those should work as well. Fixes https://github.com/pytorch/pytorch/issues/61657 Pull Request resolved: https://github.com/pytorch/pytorch/pull/72935 Approved by: https://github.com/IvanYashchuk, https://github.com/mruberry	2022-05-05 19:02:13 +00:00
lezcano	7cb7cd5802	Add linalg.lu This PR modifies `lu_unpack` by: - Using less memory when unpacking `L` and `U` - Fuse the subtraction by `-1` with `unpack_pivots_stub` - Define tensors of the correct types to avoid copies - Port `lu_unpack` to be a strucutred kernel so that its `_out` version does not incur on extra copies Then we implement `linalg.lu` as a structured kernel, as we want to compute its derivative manually. We do so because composing the derivatives of `torch.lu_factor` and `torch.lu_unpack` would be less efficient. This new function and `lu_unpack` comes with all the things it can come: forward and backward ad, decent docs, correctness tests, OpInfo, complex support, support for metatensors and support for vmap and vmap over the gradients. I really hope we don't continue adding more features. This PR also avoids saving some of the tensors that were previously saved unnecessarily for the backward in `lu_factor_ex_backward` and `lu_backward` and does some other general improvements here and there to the forward and backward AD formulae of other related functions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/67833 Approved by: https://github.com/IvanYashchuk, https://github.com/nikitaved, https://github.com/mruberry	2022-05-05 09:17:05 +00:00
lezcano	1a4eea57be	Improve derivative of QR decomposition We derive and implement a more concise rule for the forward and backward derivatives of the QR decomposition. While doing this we: - Fix the composite compliance of `linalg.qr` and we make it support batches - Improve the performance and simplify the implementation of both foward and backward - Avoid saving the input matrix for the backward computation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76115 Approved by: https://github.com/nikitaved, https://github.com/albanD	2022-05-05 09:14:57 +00:00
lezcano	9e34a8241b	Improved matmul tests Let's make sure we don't break anything in the next PRs of the stack. Also some comprehensive testing of matmul on CPU and CUDA was long due. Running this tests we see that the `out=` variant of matmul is broken when used on 4D tensors. This hints what would be the amount of people that use out= variants... Pull Request resolved: https://github.com/pytorch/pytorch/pull/75193 Approved by: https://github.com/ngimel	2022-05-03 16:28:14 +00:00
Ivan Yashchuk	23dcbe3fed	Fix failing test when SciPy is not available for test_ldl_factor This PR moves the SciPy function import under the `if TEST_SCIPY` block. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76657 Approved by: https://github.com/nikitaved	2022-05-02 14:03:36 +00:00
Ivan Yashchuk	8bb7203049	Add torch.linalg.ldl_factor_ex and torch.linalg.ldl_solve This PR adds a function for computing the LDL decomposition and a function that can solve systems of linear equations using this decomposition. The result of `torch.linalg.ldl_factor_ex` is in a compact form and it's required to use it only through `torch.linalg.ldl_solve`. In the future, we could provide `ldl_unpack` function that transforms the compact representation into explicit matrices. Fixes https://github.com/pytorch/pytorch/issues/54847. cc @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @Lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/69828 Approved by: https://github.com/Lezcano, https://github.com/mruberry, https://github.com/albanD	2022-04-28 19:23:37 +00:00
Xiao Sun	a5ffdaf064	bypassed cublasLtMatMul bug after slicing (#76205 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/76205 bypass corner cases in cublasLtMatmul Reviewed By: ngimel Differential Revision: D35716932 fbshipit-source-id: 9e0beaba1cae5f9cec369d25d85ef58e226c5c3c (cherry picked from commit 029b9578b05ceae51fb36eeaa7120855926b8393)	2022-04-24 17:28:31 +00:00
Jeff Daily	e846ef8818	add rocm ciflow/slow workflow Enables additional tests that historically have been missed for ROCm CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/72686 Approved by: https://github.com/seemethere	2022-04-22 17:41:28 +00:00
arindamroy-eng	7478ce187a	ROCM:Unskip more tests for ROCM5.0 Re-enabling more tests which are working on ROCM5.0 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/75353 Approved by: https://github.com/ezyang	2022-04-19 19:45:55 +00:00
Natalia Gimelshein	f120d5be94	remove fp16 support from cpu linalg functions fp16 on cpu produces slow and inaccurate results, see #69969 Pull Request resolved: https://github.com/pytorch/pytorch/pull/75647 Approved by: https://github.com/Lezcano, https://github.com/mruberry	2022-04-18 15:55:38 +00:00
PyTorch MergeBot	9312ee8cd6	Revert "remove fp16 support from cpu linalg functions" This reverts commit `29af58db51`. Reverted https://github.com/pytorch/pytorch/pull/75647 on behalf of https://github.com/ngimel	2022-04-14 21:06:48 +00:00
Natalia Gimelshein	29af58db51	remove fp16 support from cpu linalg functions fp16 on cpu produces slow and inaccurate results, see #69969 Pull Request resolved: https://github.com/pytorch/pytorch/pull/75647 Approved by: https://github.com/Lezcano, https://github.com/mruberry	2022-04-14 18:45:59 +00:00
PyTorch MergeBot	495c5aebb1	Revert "remove fp16 support from cpu linalg functions" This reverts commit `de18c28a4c`. Reverted https://github.com/pytorch/pytorch/pull/75647 on behalf of https://github.com/suo	2022-04-13 18:34:35 +00:00
Natalia Gimelshein	de18c28a4c	remove fp16 support from cpu linalg functions fp16 on cpu produces slow and inaccurate results, see #69969 Pull Request resolved: https://github.com/pytorch/pytorch/pull/75647 Approved by: https://github.com/Lezcano, https://github.com/mruberry	2022-04-13 17:24:22 +00:00
Scott Wolchok	48147675f2	[PyTorch] _addm_activation native function for matmul/bias/activation fusion Pull Request resolved: https://github.com/pytorch/pytorch/pull/74490 Here's an extended version of addmm that takes advantage of cublasLt's fused addmm + relu/gelu support. Differential Revision: [D35019612](https://our.internmc.facebook.com/intern/diff/D35019612/) Approved by: https://github.com/ngimel	2022-04-08 17:54:09 +00:00
Andrey Talman	622cff3e95	Cuda 11.6 Disable failing tests (#75420 ) Summary: This mitigates number of issues with CUDA 11.6 update and updates Linux driver . New issues discovered #[75391](https://github.com/pytorch/pytorch/issues/75391) #[75375](https://github.com/pytorch/pytorch/issues/75375) Old issue present since 11.3 #[57482](https://github.com/pytorch/pytorch/issues/57482) #[70111](https://github.com/pytorch/pytorch/issues/70111) These changes already testsed WIP PR: #[75337](https://github.com/pytorch/pytorch/pull/75337) Pull Request resolved: https://github.com/pytorch/pytorch/pull/75420 Reviewed By: seemethere Differential Revision: D35481973 Pulled By: atalman fbshipit-source-id: 4db00c646e2df4f8650404763963c3b215110f1f (cherry picked from commit 518e19dc361b43273f5bd6bdfff942614e8466f5)	2022-04-07 22:43:15 +00:00
Nikita Shulga	bfac65dfe5	[testing] Update dispatch macros (#74977 ) This PR is reland of #74289 Co-authored-by: Khushi Agrawal <khushiagrawal411@gmail.com>	2022-03-30 14:13:21 -07:00
PyTorch MergeBot	2e4152b118	Revert "[testing] Update dispatch macros" This reverts commit `eed19a0f38`. Reverted https://github.com/pytorch/pytorch/pull/74289 on behalf of https://github.com/malfet	2022-03-30 19:52:37 +00:00
Khushi Agrawal	eed19a0f38	[testing] Update dispatch macros Hi, This PR is the follow-up PR of #71561. (the previous PR had a couple of merge conflicts and was reverted, this PR resolves that). Please take a look. Thanks! cc: @pmeier @mruberry @kshitij12345 Pull Request resolved: https://github.com/pytorch/pytorch/pull/74289 Approved by: https://github.com/pmeier, https://github.com/mruberry	2022-03-30 16:10:16 +00:00
Ivan Yashchuk	ca4ba2ee92	Skip specifying rcond for gelsy driver in tests Fixes https://github.com/pytorch/pytorch/issues/72281 Pull Request resolved: https://github.com/pytorch/pytorch/pull/74638 Approved by: https://github.com/mruberry	2022-03-24 14:55:33 +00:00
David Berard	15c98700ed	Add CPU slow test job (#73748 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73748 This adds CPU-only slow test jobs, which previously would never run. Includes fixes/skips for slow tests which fail (they need to be skipped now because they used to never run) Test Plan: Imported from OSS Reviewed By: malfet Differential Revision: D34628803 Pulled By: davidberard98 fbshipit-source-id: c090ab7bf7bda9e24ec5cdefa6fd35c6310dbac0 (cherry picked from commit 06f7a94a57cc7023e9c5442be8298d20cd011144)	2022-03-23 21:17:27 +00:00
Nikita Shulga	ef066f0832	Revert D34856571: [pytorch][PR] Replace `get_all_` type macros with the ATen dispatch macros. Test Plan: revert-hammer Differential Revision: D34856571 (`3ded7b1da3`) Original commit changeset: 0dca038bcad5 Original Phabricator Diff: D34856571 (`3ded7b1da3`) fbshipit-source-id: 594553fa0b710d78beba59d5d2b646f1f1270386 (cherry picked from commit 8090eb9b12dcf452a9e7dc01792a66fb91b563b6)	2022-03-15 22:07:11 +00:00
Ivan Yashchuk	064206df03	Performance and memory improvements to batched torch.linalg.solve (2nd attempt) (#71756 ) Summary: Previous PR with the same content: https://github.com/pytorch/pytorch/pull/69752. Opening a new PR by request: https://github.com/pytorch/pytorch/pull/69752#issuecomment-1020829812. ------ Previously for single input matrix A and batched matrix B, matrix A was expanded and cloned before computing the LU decomposition and solving the linear system. With this PR the LU decomposition is computed once for a single matrix and then expanded&cloned if required by a backend library call for the linear system solving. Here's a basic comparison: ```python # BEFORE THE PR In [1]: import torch In [2]: a = torch.randn(256, 256) In [3]: b = torch.randn(1024, 256, 2) In [4]: %%timeit ...: torch.linalg.solve(a, b) ...: ...: 329 ms ± 17.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) # WITH THIS PR In [1]: import torch In [2]: a = torch.randn(256, 256) In [3]: b = torch.randn(1024, 256, 2) In [4]: %%timeit ...: torch.linalg.solve(a, b) ...: ...: 21.4 ms ± 23 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ``` Fixes https://github.com/pytorch/pytorch/issues/71406, fixes https://github.com/pytorch/pytorch/issues/71610 Pull Request resolved: https://github.com/pytorch/pytorch/pull/71756 Reviewed By: ngimel Differential Revision: D33771981 Pulled By: mruberry fbshipit-source-id: 0917ee36a3eb622ff75d54787b1bffe26b41cb4a (cherry picked from commit 9c30a05aaa972bc02dfc94c3d2463f0c5ee0c58c)	2022-03-15 21:28:31 +00:00
Khushi Agrawal	3ded7b1da3	Replace `get_all_` type macros with the ATen dispatch macros. (#71561 ) Summary: Hi, Team! The PR is motivated from https://github.com/pytorch/pytorch/pull/71153#discussion_r782446738. It aims to replace `get_all` type macros with the ATen dispatch macros. The files it iterates over are: (Thanks, Lezcano, for the idea!!) <details> <summary> `test/test_autograd.py`</summary> <p> ```python 43:from torch.testing._internal.common_dtype import get_all_dtypes 8506: floating_dt = [dt for dt in get_all_dtypes() if dt.is_floating_point] ``` </p> </details> <details> <summary> `test/test_binary_ufuncs.py`</summary> <p> ```python 26: all_types_and_complex_and, integral_types_and, get_all_dtypes, get_all_int_dtypes, get_all_math_dtypes, 27: get_all_complex_dtypes, get_all_fp_dtypes, 935: dtypes(get_all_dtypes(include_bool=False, include_complex=False)) 1035: dtypes(get_all_dtypes( 1488: dtypes((get_all_dtypes(include_bool=False, include_bfloat16=False))) 1879: dtypes(product(get_all_dtypes(include_complex=False), get_all_dtypes(include_complex=False))) 1887: dtypes((get_all_int_dtypes() + [torch.bool])) 1913: dtypes((get_all_fp_dtypes())) 1941: dtypes((get_all_fp_dtypes())) 1977: dtypes(product(get_all_complex_dtypes(), get_all_dtypes())) 2019: dtypes(product(get_all_fp_dtypes(), get_all_fp_dtypes())) 2048: dtypes(get_all_dtypes()) 2110: dtypes(product(get_all_dtypes(include_complex=False), 2111: get_all_dtypes(include_complex=False))) 2128: types = [torch.bool, torch.bfloat16] + get_all_int_dtypes() 2173: if dtypes[1] in get_all_fp_dtypes(): 2178: dtypes(product(get_all_fp_dtypes(), 2179: get_all_fp_dtypes())) 2260: dtypesIfCUDA(set(get_all_math_dtypes('cuda')) - {torch.complex64, torch.complex128}) 2261: dtypes(set(get_all_math_dtypes('cpu')) - {torch.complex64, torch.complex128}) 2273: dtypesIfCUDA(set(get_all_math_dtypes('cuda')) - {torch.complex64, torch.complex128}) 2274: dtypes(set(get_all_math_dtypes('cpu')) - {torch.complex64, torch.complex128}) 2307: dtypes(get_all_math_dtypes('cpu')) 2319: dtypes(get_all_fp_dtypes(include_bfloat16=False)) 2331: dtypes(get_all_int_dtypes()) 2356: dtypes(get_all_dtypes(include_bfloat16=False, include_bool=False, include_complex=False)) 2393: if dtype in get_all_int_dtypes(): 2614: dtypes(get_all_dtypes()) 2624: dtypes(tuple(itertools.combinations_with_replacement(get_all_dtypes(), 2))) 2806: dtypes(list(product(get_all_dtypes(include_complex=False), 2807: get_all_dtypes(include_complex=False)))) 2866: dtypes(list(product(get_all_complex_dtypes(), 2867: get_all_complex_dtypes()))) 2902: dtypes(product(get_all_dtypes(), get_all_dtypes())) 2906: dtypes(product(get_all_dtypes(), get_all_dtypes())) 2910: dtypes(product(get_all_dtypes(), get_all_dtypes())) 3019: dtypes = [torch.float, torch.double] + get_all_complex_dtypes() 3221: dtypes(get_all_dtypes(include_complex=False)) 3407: dtypes(list(product(get_all_dtypes(include_bool=False), 3408: get_all_dtypes(include_bool=False)))) 3504: dtypes(product(get_all_dtypes(include_complex=False, include_bfloat16=False), 3505: get_all_dtypes(include_complex=False, include_bfloat16=False))) 3516: if x.dtype in get_all_int_dtypes() + [torch.bool]: 3643: dtypes(product(get_all_dtypes(include_complex=False, 3645: get_all_dtypes(include_complex=False, ``` </p> </details> <details> <summary> `test/test_complex.py`</summary> <p> ```python 6:from torch.testing._internal.common_dtype import get_all_complex_dtypes 11: dtypes(get_all_complex_dtypes()) ``` </p> </details> <details> <summary> `test/test_foreach.py`</summary> <p> ```python 18: get_all_dtypes, get_all_int_dtypes, get_all_complex_dtypes, get_all_fp_dtypes, 142: if dtype in get_all_int_dtypes(): 179: disable_fastpath = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool] 201: disable_fastpath = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool] 205: disable_fastpath \|= dtype in get_all_int_dtypes() + [torch.bool] 211: disable_fastpath \|= dtype not in get_all_complex_dtypes() 241: bool_int_div = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool] 246: disable_fastpath \|= dtype in get_all_int_dtypes() + [torch.bool] 248: disable_fastpath \|= dtype not in get_all_complex_dtypes() 250: disable_fastpath \|= True and dtype not in get_all_complex_dtypes() 307: disable_fastpath = dtype in get_all_int_dtypes() + [torch.bool] 365: if opinfo.name == "_foreach_abs" and dtype in get_all_complex_dtypes(): 376: ops(foreach_unary_op_db, dtypes=get_all_dtypes()) 393: dtypes=get_all_dtypes(include_half=True, include_bfloat16=True, include_complex=False)) 401: ops(foreach_minmax_op_db, dtypes=get_all_fp_dtypes(include_bfloat16=True, include_half=True)) 426: if ord in (1, 2) and dtype in torch.testing.get_all_fp_dtypes(): 439: dtypes(get_all_dtypes()) 449: ops(foreach_binary_op_db, dtypes=get_all_dtypes()) 481: ops(foreach_binary_op_db, dtypes=get_all_dtypes()) 536: if dtype in get_all_int_dtypes() + [torch.bool] and foreach_op == torch._foreach_div: 545: ops(foreach_binary_op_db, dtypes=get_all_dtypes()) 637: ops(foreach_pointwise_op_db, allowed_dtypes=get_all_fp_dtypes(include_half=False, include_bfloat16=False)) ``` </p> </details> <details> <summary> `test/test_linalg.py`</summary> <p> ```python 29: all_types, floating_types, floating_and_complex_types, get_all_dtypes, get_all_int_dtypes, get_all_complex_dtypes, 30: get_all_fp_dtypes, 111: dtypes((get_all_dtypes())) 794: float_and_complex_dtypes = get_all_fp_dtypes() + get_all_complex_dtypes() 807: dtypes((get_all_int_dtypes())) 828: dtypes((get_all_fp_dtypes() + get_all_complex_dtypes())) 841: if dtype in get_all_complex_dtypes(): 844: dtypes(itertools.product(get_all_dtypes(), 845: get_all_dtypes())) 855: for dtypes0, dtypes1, dtypes2 in product(get_all_dtypes(), repeat=3): 5607: get_all_fp_dtypes(include_half=not CUDA9, include_bfloat16=(CUDA11OrLater and SM53OrLater))) 5608: dtypes((set(get_all_dtypes()) - {torch.half, torch.bool})) 5644: dtypes((get_all_complex_dtypes() + get_all_fp_dtypes())) 6255: dtypesIfCUDA(get_all_complex_dtypes(), 6256: get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater)), 6292: dtypesIfCUDA(get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater)))) 6323: dtypesIfCUDA(get_all_complex_dtypes(), 6324: get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater)))) 6325: dtypes(get_all_complex_dtypes(), get_all_fp_dtypes()) 6358: dtypesIfCUDA(([torch.float, torch.double] + get_all_complex_dtypes())) 6556: dtypes(get_all_fp_dtypes(), get_all_complex_dtypes()) 6668: dtypes(get_all_fp_dtypes(), get_all_complex_dtypes()) 6741: dtypes(get_all_fp_dtypes(), get_all_complex_dtypes()) ``` </p> </details> <details> <summary> `test/test_nn.py`</summary> <p> ```python 37:from torch.testing._internal.common_dtype import integral_types, get_all_fp_dtypes, get_all_math_dtypes 50: onlyNativeDeviceTypes, deviceCountAtLeast, largeTensorTest, expectedFailureMeta, skipMeta, get_all_device_types, \ 8862: for device in get_all_device_types(): 9629: for dt1 in get_all_math_dtypes(device): 9630: for dt2 in get_all_math_dtypes(device): 9631: for dt3 in get_all_math_dtypes(device): 9648: for input_dtype in get_all_math_dtypes(device): 9664: for input_dtype in get_all_math_dtypes(device): 13015: dtypes(get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM)) 13034: dtypes(get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM)) 13159: dtypes(get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM)) 17400: dtypesIfCUDA(get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM)) 17768: dtypesIfCUDA(get_all_fp_dtypes()) 17773: dtypesIfCUDA(get_all_fp_dtypes()) 17778: dtypesIfCUDA(get_all_fp_dtypes()) 17783: dtypesIfCUDA(get_all_fp_dtypes()) 17788: dtypesIfCUDA(get_all_fp_dtypes()) 17793: dtypesIfCUDA(get_all_fp_dtypes()) 17798: dtypesIfCUDA(get_all_fp_dtypes()) 17963: dtypesIfCUDA(get_all_fp_dtypes()) 17977: dtypesIfCUDA(get_all_fp_dtypes()) 18684: def test_cross_entropy_loss_prob_target_all_reductions(self, device): ``` </p> </details> <details> <summary> `test/test_numpy_interop.py`</summary> <p> ```python 12:from torch.testing._internal.common_dtype import get_all_dtypes 399: dtypes(get_all_dtypes()) ``` </p> </details> <details> <summary> `test/test_ops.py`</summary> <p> ```python 12:from torch.testing._internal.common_dtype import floating_and_complex_types_and, get_all_dtypes 86: for dtype in get_all_dtypes(): ``` </p> </details> <details> <summary> `test/test_reductions.py`</summary> <p> ```python 16: get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_complex_dtypes, get_all_fp_dtypes, 360: allowed_dtypes=get_all_dtypes(include_bfloat16=False)) 366: allowed_dtypes=get_all_dtypes(include_bfloat16=False)) 394: allowed_dtypes=get_all_dtypes(include_bfloat16=False)) 750: for dtype in [dtype for dtype in get_all_math_dtypes('cpu') if dtype != torch.float16]: 1404: dtypes(get_all_dtypes(include_bool=False, include_complex=False)) 1457: dtypes((get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) + 1458: get_all_complex_dtypes())) 1465: return dtype in get_all_int_dtypes() 1494: dtypes((get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False))) 1501: dtypes((get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False))) 1507: dtypes((get_all_complex_dtypes())) 1514: dtypes = list(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False)) 1523: dtypes((get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False))) 1531: if dtype in get_all_fp_dtypes(): 1608: dtypes((get_all_dtypes(include_half=True, include_bfloat16=False, 1837: dtypes(get_all_dtypes(include_bool=False, include_complex=False)) 1855: dtypes((set(get_all_dtypes(include_bool=False, include_complex=False)) - {torch.uint8})) 3219: for dtype in get_all_dtypes(include_half=True, include_bfloat16=False, ``` </p> </details> <details> <summary> `test/test_serialization.py`</summary> <p> ```python 26:from torch.testing._internal.common_dtype import get_all_dtypes 586: for device, dtype in product(devices, get_all_dtypes()): 589: for other_dtype in get_all_dtypes(): ``` </p> </details> <details> <summary> `test/test_shape_ops.py`</summary> <p> ```python 18:from torch.testing._internal.common_dtype import get_all_dtypes 230: dtypes(get_all_dtypes(include_complex=False, include_bool=False, include_half=False, 232: dtypesIfCUDA(get_all_dtypes(include_complex=False, include_bool=False, include_bfloat16=False)) 344: dtypes(get_all_dtypes()) 443: dtypes(get_all_dtypes()) 461: dtypes(get_all_dtypes()) 570: dtypes(get_all_dtypes(include_complex=False)) ``` </p> </details> <details> <summary> `test/test_sort_and_select.py`</summary> <p> ```python 12: all_types, all_types_and, floating_types_and, get_all_dtypes, get_all_int_dtypes, get_all_fp_dtypes, 136: dtypes(set(get_all_dtypes()) - {torch.bool, torch.complex64, torch.complex128}) 231: dtypes(set(get_all_dtypes()) - {torch.bool, torch.complex64, torch.complex128}) 296: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 647: dtypesIfCUDA(get_all_fp_dtypes()) 678: dtypesIfCUDA((get_all_dtypes(include_complex=False, 682: dtypes((get_all_dtypes(include_complex=False, include_bool=False, include_half=False, include_bfloat16=False))) 739: dtypesIfCPU(set(get_all_dtypes()) - {torch.complex64, torch.complex128}) 740: dtypes(set(get_all_dtypes()) - {torch.bfloat16, torch.complex64, torch.complex128}) 799: dtypesIfCPU(set(get_all_dtypes()) - {torch.complex64, torch.complex128}) 800: dtypes(set(get_all_dtypes()) - {torch.bfloat16, torch.complex64, torch.complex128}) ``` </p> </details> <details> <summary> `test/test_sparse.py`</summary> <p> ```python 20:from torch.testing import get_all_complex_dtypes, get_all_fp_dtypes 29: floating_and_complex_types, floating_and_complex_types_and, get_all_dtypes, get_all_int_dtypes, 1963: return dtype in get_all_int_dtypes() 1994: dtypes(get_all_dtypes(include_bool=False, include_half=False, 2103: return dtype in get_all_int_dtypes() 2138: dtypes(get_all_dtypes(include_bool=False, include_half=False, 2626: all_sparse_dtypes = get_all_dtypes(include_complex=True) 2633: all_sparse_dtypes = get_all_dtypes(include_complex=True) 3230: dtypes(get_all_complex_dtypes(), 3231: get_all_fp_dtypes(include_half=False, include_bfloat16=False)) 3234: get_all_fp_dtypes( ``` </p> </details> <details> <summary> `test/test_sparse_csr.py`</summary> <p> ```python 7:from torch.testing import get_all_complex_dtypes, get_all_fp_dtypes, floating_and_complex_types, make_tensor 17:from torch.testing._internal.common_dtype import floating_types, get_all_dtypes 120: dtypes(get_all_dtypes()) 133: dtypes(get_all_dtypes()) 150: dtypes(get_all_dtypes()) 180: dtypes(get_all_dtypes()) 201: dtypes(get_all_dtypes()) 210: dtypes(get_all_dtypes()) 225: dtypes(get_all_dtypes()) 244: dtypes(get_all_dtypes()) 263: dtypes(get_all_dtypes()) 285: dtypes(get_all_dtypes()) 411: dtypes(get_all_dtypes()) 482: dtypes(get_all_dtypes()) 502: dtypes(get_all_dtypes()) 562: dtypes(get_all_dtypes()) 588: dtypesIfCUDA(get_all_complex_dtypes(), 589: get_all_fp_dtypes(include_half=SM53OrLater, include_bfloat16=SM80OrLater)) 745: dtypesIfCUDA(get_all_complex_dtypes(), 746: get_all_fp_dtypes(include_half=SM53OrLater and TEST_CUSPARSE_GENERIC, 765: dtypesIfCUDA(get_all_complex_dtypes(), 766: get_all_fp_dtypes(include_half=SM53OrLater and TEST_CUSPARSE_GENERIC, 801: torch.testing.get_all_fp_dtypes(include_bfloat16=SM80OrLater, 841: torch.testing.get_all_fp_dtypes(include_bfloat16=SM80OrLater, 1182: dtypes(get_all_dtypes()) 1276: dtypes(get_all_dtypes(include_bool=False, include_half=False, include_bfloat16=False)) 1286: dtypes(get_all_dtypes()) ``` </p> </details> <details> <summary> `test/test_tensor_creation_ops.py`</summary> <p> ```python 21: onlyCUDA, skipCPUIf, dtypesIfCUDA, skipMeta, get_all_device_types) 23: get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes 150: for dt in get_all_dtypes(): 160: for dt in get_all_dtypes(): 314: dtypes = [dtype for dtype in get_all_dtypes() if dtype != torch.bfloat16] 1012: dtypes((get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) + 1013: get_all_complex_dtypes())) 1032: dtypes((get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) + 1033: get_all_complex_dtypes())) 1050: dtypes((get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) + 1051: get_all_complex_dtypes())) 1745: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 1779: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 1868: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 1926: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 1954: do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, torch_device) 1956: do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, None) 1957: do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, torch_device) 2538: for device in get_all_device_types(): 2645: for dtype in get_all_dtypes(): 2678: dtypes((get_all_fp_dtypes(include_half=False, include_bfloat16=False) + 2679: get_all_complex_dtypes())) 2716: dtypes(get_all_fp_dtypes(include_half=False, include_bfloat16=False)) 2827: for dt in get_all_dtypes(): 2913: dtypes(get_all_dtypes(include_bool=False, include_half=False)) 2914: dtypesIfCUDA(get_all_dtypes(include_bool=False, include_half=True)) 3028: dtypes((get_all_fp_dtypes() + get_all_complex_dtypes())) 3033: dtypes((get_all_fp_dtypes() + get_all_complex_dtypes())) 3074: dtypes(get_all_dtypes(include_bool=False, include_half=False, include_complex=False)) 3075: dtypesIfCUDA(((get_all_int_dtypes() + [torch.float32, torch.float16, torch.bfloat16]) 3077: else get_all_dtypes(include_bool=False, include_half=True, include_complex=False))) 3873: dtypes(get_all_dtypes()) 3884: dtypes(get_all_dtypes(include_bool=False)) 3916: for other in get_all_dtypes(): 3922: dtypes(get_all_dtypes()) 3932: dtypes(get_all_dtypes(include_bool=False)) 3955: dtypes(get_all_dtypes(include_bool=False)) 3961: dtypes(get_all_dtypes(include_bool=False)) 3965: dtypes(get_all_dtypes()) ``` </p> </details> <details> <summary> `test/test_testing.py`</summary> <p> ```python 25:from torch.testing._internal.common_dtype import get_all_dtypes 31: dtypes((get_all_dtypes(include_half=True, include_bfloat16=False, ``` </p> </details> <details> <summary> `test/test_torch.py`</summary> <p> ```python 51: expectedAlertNondeterministic, get_all_device_types, skipXLA) 57: get_all_fp_dtypes, get_all_int_dtypes, get_all_math_dtypes, get_all_dtypes, get_all_complex_dtypes 296: for d in get_all_device_types(): 323: for device in get_all_device_types(): 324: for dt1 in get_all_dtypes(): 325: for dt2 in get_all_dtypes(): 343: all_dtypes = get_all_dtypes() 350: all_dtypes = get_all_dtypes() 781: for dtype in get_all_dtypes(): 986: for device in get_all_device_types(): 1017: for device in get_all_device_types(): 1018: for dtype in get_all_math_dtypes(device): 2792: for device in get_all_device_types(): 3186: dtypes(get_all_dtypes()) 3195: for error_dtype in get_all_dtypes(): 3203: dtypes(get_all_dtypes()) 3212: for error_dtype in get_all_dtypes(): 4539: dtypes(get_all_fp_dtypes()) 4545: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 4577: dtypes(get_all_fp_dtypes(include_half=False, include_bfloat16=False)) 4578: dtypesIfCPU((get_all_fp_dtypes(include_half=False, include_bfloat16=True))) 4579: dtypesIfCUDA((get_all_fp_dtypes(include_bfloat16=False))) 4599: dtypes((get_all_fp_dtypes(include_half=False, include_bfloat16=False))) 4600: dtypesIfCPU((get_all_dtypes(include_half=False, include_bfloat16=False, include_complex=False))) 4601: dtypesIfCUDA((get_all_dtypes(include_bfloat16=False, include_complex=False))) 4613: for p_dtype in get_all_fp_dtypes(include_half=device.startswith('cuda'), include_bfloat16=False): 4628: dtypes((get_all_fp_dtypes(include_half=False, include_bfloat16=False))) 4629: dtypesIfCUDA((get_all_fp_dtypes(include_bfloat16=False))) 4640: dtypes(get_all_fp_dtypes()) 4723: dtypes(get_all_fp_dtypes()) 4735: dtypes(get_all_fp_dtypes(include_bfloat16=False)) 4736: dtypesIfCUDA(get_all_fp_dtypes()) 4747: dtypes(get_all_fp_dtypes()) 4761: dtypes(get_all_fp_dtypes()) 4771: dtypes(get_all_fp_dtypes()) 4792: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 5302: dtypes(get_all_dtypes(include_bfloat16=False)) 5322: dtypes(get_all_dtypes(include_half=False, include_bfloat16=False)) 5323: dtypesIfCPU(get_all_dtypes(include_bfloat16=False)) 5324: dtypesIfCUDA(get_all_dtypes(include_bfloat16=False)) 5591: for dt in get_all_dtypes(): 5611: for dt in get_all_dtypes(): 5678: for dt in get_all_dtypes(): 5696: dtypesIfCUDA(set(get_all_math_dtypes('cuda'))) 5697: dtypes(set(get_all_math_dtypes('cpu'))) 5746: dtypes(get_all_dtypes()) 5780: dtypes(get_all_dtypes()) 5885: dtypes(get_all_dtypes()) 5902: dtypes(get_all_dtypes()) 5945: dtypes(get_all_dtypes()) 5979: dtypes(get_all_dtypes(include_bool=False)) 6049: dtypes(get_all_dtypes(include_bool=False)) 6092: dtypes((get_all_fp_dtypes(include_bfloat16=False, include_half=False) + 6093: get_all_complex_dtypes())) 6094: dtypesIfCPU(get_all_dtypes()) 6095: dtypesIfCUDA(get_all_dtypes()) 6122: dtypes((get_all_fp_dtypes(include_bfloat16=False, include_half=False) + 6123: get_all_complex_dtypes())) 6124: dtypesIfCPU(get_all_dtypes()) 6125: dtypesIfCUDA(get_all_dtypes()) 6163: dtypes((get_all_fp_dtypes(include_bfloat16=False, include_half=False) + 6164: get_all_complex_dtypes())) 6165: dtypesIfCPU(get_all_dtypes()) 6166: dtypesIfCUDA(get_all_dtypes()) 6190: dtypes((get_all_complex_dtypes() + 6191: get_all_int_dtypes())) 6238: dtypes(get_all_dtypes()) 6323: dtypes(get_all_dtypes()) 6389: dtypes(product(get_all_dtypes(), (torch.uint8, torch.bool))) 6699: dtypesIfCUDA(set(get_all_math_dtypes('cuda'))) 6700: dtypes(set(get_all_math_dtypes('cpu'))) 7452: dtypes(get_all_dtypes(include_bool=False)) 7461: dtypes(get_all_dtypes(include_bool=False)) 7477: dtypes(get_all_dtypes(include_bool=False)) 7496: dtypes(get_all_dtypes(include_bool=False)) 7538: dtypes(get_all_dtypes(include_bool=False)) 8162: dtypes((get_all_int_dtypes() + get_all_fp_dtypes() + 8163: get_all_complex_dtypes())) 8175: dtypes((get_all_int_dtypes() + get_all_fp_dtypes() + 8176: get_all_complex_dtypes())) ``` </p> </details> <details> <summary> `test/test_type_promotion.py`</summary> <p> ```python 14: get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_fp_dtypes 187: for dtype in get_all_dtypes(): 262: dtypes1 = get_all_math_dtypes('cuda') 263: dtypes2 = get_all_math_dtypes(device) 339: dtypes(itertools.product(get_all_dtypes(), get_all_dtypes())) 468: for dt1 in get_all_math_dtypes(device): 469: for dt2 in get_all_math_dtypes(device): 519: for dt1 in get_all_math_dtypes(device): 520: for dt2 in get_all_math_dtypes(device): 528: for dt in get_all_math_dtypes(device): 561: for dtype in get_all_dtypes(): 766: dtypes=get_all_math_dtypes(device)) 771: dtypes=get_all_math_dtypes(device)) 782: dtypes=get_all_math_dtypes(device)) 879: dtypes = get_all_dtypes(include_bfloat16=False) 898: dtypes = get_all_dtypes(include_bfloat16=False, include_bool=False) 965: dtypesIfCUDA(itertools.product(get_all_dtypes(include_bfloat16=False, include_complex=False), 966: get_all_dtypes(include_bfloat16=False, include_complex=False))) 967: dtypes(itertools.product(get_all_dtypes(include_half=False, include_bfloat16=False, 969: get_all_dtypes(include_half=False, include_bfloat16=False, 976: return dtype in get_all_int_dtypes() + [torch.bool] 979: return dtype in get_all_fp_dtypes(include_half=True, include_bfloat16=False) ``` </p> </details> <details> <summary> `test/test_unary_ufuncs.py`</summary> <p> ```python 24: floating_types_and, all_types_and_complex_and, floating_and_complex_types_and, get_all_dtypes, get_all_math_dtypes, 25: get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes 517: dtypes((get_all_int_dtypes() + [torch.bool] + 518: get_all_fp_dtypes(include_bfloat16=False))) 596: dtypes(get_all_fp_dtypes(include_half=True, include_bfloat16=False)) 611: invalid_input_dtypes = get_all_int_dtypes() + \ 612: get_all_complex_dtypes() + \ 619: for dtype in get_all_fp_dtypes(include_half=True, include_bfloat16=False): 1048: dtypes(get_all_math_dtypes('cpu')) 1182: dtypesIfCUDA(get_all_fp_dtypes()) 1190: dtypesIfCUDA(get_all_fp_dtypes()) 1205: dtypesIfCUDA(get_all_fp_dtypes()) 1215: dtypesIfCUDA(get_all_fp_dtypes()) 1307: dtypes((get_all_dtypes(include_bool=False))) 1349: dtypes((get_all_fp_dtypes(include_half=False) + 1350: get_all_complex_dtypes())) 1351: dtypesIfCUDA((get_all_fp_dtypes(include_half=True) + 1352: get_all_complex_dtypes())) ``` </p> </details> <details> <summary> `test/test_view_ops.py`</summary> <p> ```python 19: get_all_dtypes, get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes 124: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 131: dtypes(get_all_dtypes(include_bfloat16=False)) 213: for view_dtype in [get_all_fp_dtypes(), get_all_complex_dtypes()]: 220: dtypes(get_all_dtypes()) 224: for view_dtype in get_all_dtypes(): 305: dtypes(get_all_complex_dtypes(include_complex32=True)) 343: dtypes(get_all_dtypes()) 354: dtypes(get_all_dtypes()) 364: dtypes(get_all_dtypes()) 374: dtypes(get_all_dtypes()) 384: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 395: dtypes(get_all_complex_dtypes()) 426: dtypes(get_all_complex_dtypes()) 451: dtypes(product(get_all_complex_dtypes(), get_all_dtypes())) 1263: dtypes((torch.testing.get_all_dtypes())) 1279: dtypes((torch.testing.get_all_dtypes())) 1405: dtypes((get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) + 1406: get_all_complex_dtypes())) 1471: dtypes(get_all_dtypes(include_bfloat16=False)) 1574: dtypes(get_all_dtypes()) 1601: dtypes(get_all_dtypes(include_bfloat16=False)) 1632: dtypes(*get_all_dtypes(include_bfloat16=False)) 1711: for dt in get_all_dtypes(): 1717: for dt in get_all_dtypes(): 1724: for dt in get_all_dtypes(): ``` </p> </details> I'm looking forward to your viewpoints. Thanks :) cc: mruberry kshitij12345 anjali411 Pull Request resolved: https://github.com/pytorch/pytorch/pull/71561 Reviewed By: samdow Differential Revision: D34856571 Pulled By: mruberry fbshipit-source-id: 0dca038bcad5cf69906245c496d2e61ac3876335 (cherry picked from commit b058f67b4313143efa714ab105f36e74083131b9)	2022-03-15 20:31:41 +00:00
Philip Meier	0973c5a1cc	align signature of make_tensor with other creation ops (#72702 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72702 Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D34457729 Pulled By: mruberry fbshipit-source-id: 83d580c4201eef946dc9cf4b9e28a3d36be55609 (cherry picked from commit aa4cf20fbeb4b795595729b8ac2e6ba7707d8283)	2022-02-25 06:30:31 +00:00
Ivan Yashchuk	29c81bbff5	Fix SVD error code handling for OpenBLAS 0.3.15+ and MKL 2022+ (again) (#72357 ) Summary: This PR was opened as copy of https://github.com/pytorch/pytorch/pull/68812 by request https://github.com/pytorch/pytorch/pull/68812#issuecomment-1030215862. ----- Fixes https://github.com/pytorch/pytorch/issues/67693. Reference LAPACK (used in OpenBLAS) changed info error code for svd when inputs contain non-finite numbers. In PyTorch, we raise an internal assert error for negative `info` error codes because usually, it would indicate the wrong implementation. However, this is not the case with SVD now in newer versions of LAPACK. MKL (tried 2021.4.0) still gives a positive error code for this kind of input. This change aligns with the OpenBLAS and MKL behavior in our code. MKL 2022 has uses the latest reference LAPACK behavior and returns the same `info` as OpenBLAS 0.3.15+ This PR also fixes https://github.com/pytorch/pytorch/issues/71645 that is due to the updated MKL version in CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/72357 Reviewed By: albanD Differential Revision: D34012245 Pulled By: ngimel fbshipit-source-id: 2b66c173cc3458d8c766b542d0d569191cdce310 (cherry picked from commit `fa29e65611`)	2022-02-07 21:36:30 +00:00
Nikita Shulga	bb6b501aa0	Back out "[pytorch][PR] Fix SVD error code handling for OpenBLAS 0.3.15+ and MKL 2022+" (#72063 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72063 Original commit changeset: fd1c86e37e40 Original Phabricator Diff: D33844257 (`2017b404ec`) Test Plan: CI Reviewed By: pbelevich Differential Revision: D33890846 fbshipit-source-id: df9eea497038ec256893a6ce7c3dcd645441b50d (cherry picked from commit `c9bf2ba5e7`)	2022-01-31 18:09:45 +00:00
Ivan Yashchuk	2017b404ec	Fix SVD error code handling for OpenBLAS 0.3.15+ and MKL 2022+ (#68812 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/67693. Reference LAPACK (used in OpenBLAS) changed info error code for svd when inputs contain non-finite numbers. In PyTorch, we raise an internal assert error for negative `info` error codes because usually, it would indicate wrong implementation. However, this is not the case with SVD now in newer versions of LAPACK. MKL (tried 2021.4.0) still gives a positive error code for this kind of input. This change aligns with the OpenBLAS and MKL behavior in our code. UPDATE: MKL 2022 has uses the latest reference LAPACK behavior and returns the same `info` as OpenBLAS 0.3.15+ This PR fixes https://github.com/pytorch/pytorch/issues/71645 that is due to the updated MKL version in CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/68812 Reviewed By: mrshenli Differential Revision: D33844257 Pulled By: ngimel fbshipit-source-id: fd1c86e37e405b330633d039f49dce466391b66e (cherry picked from commit `c00a9bdeb0`)	2022-01-29 00:48:06 +00:00
Pavel Belevich	9413c0cd3e	Revert D32626563: [pytorch][PR] Fix SVD error code handling for OpenBLAS 0.3.15+ and MKL 2022+ Test Plan: revert-hammer Differential Revision: D32626563 (`e06eb286da`) Original commit changeset: 09042f07cdc9 Original Phabricator Diff: D32626563 (`e06eb286da`) fbshipit-source-id: 387681e68121708a97dfe2768297b470fa84c097 (cherry picked from commit `6ad4864d63`)	2022-01-28 13:24:43 +00:00
Ivan Yashchuk	e06eb286da	Fix SVD error code handling for OpenBLAS 0.3.15+ and MKL 2022+ (#68812 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/67693. Reference LAPACK (used in OpenBLAS) changed info error code for svd when inputs contain non-finite numbers. In PyTorch, we raise an internal assert error for negative `info` error codes because usually, it would indicate wrong implementation. However, this is not the case with SVD now in newer versions of LAPACK. MKL (tried 2021.4.0) still gives a positive error code for this kind of input. This change aligns with the OpenBLAS and MKL behavior in our code. UPDATE: MKL 2022 has uses the latest reference LAPACK behavior and returns the same `info` as OpenBLAS 0.3.15+ This PR fixes https://github.com/pytorch/pytorch/issues/71645 that is due to the updated MKL version in CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/68812 Reviewed By: osalpekar Differential Revision: D32626563 Pulled By: ngimel fbshipit-source-id: 09042f07cdc9c24ce1fa5cd6f4483340c7b5b06c (cherry picked from commit `aadf507319`)	2022-01-28 06:33:29 +00:00
lezcano	1407939f69	Remove unnecessary non_contiguous and gradient tests from test_linalg (#68188 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68188 As per title cc mruberry jianyuh nikitaved pearu walterddr IvanYashchuk xwang233 Lezcano Test Plan: Imported from OSS Reviewed By: anjali411 Differential Revision: D33774419 Pulled By: mruberry fbshipit-source-id: d63b599a25a6426463548d632d13a403cad1cc34 (cherry picked from commit `eed47601fa`)	2022-01-27 23:13:17 +00:00
lezcano	8ff1a8fdca	Implement forward AD for linalg.svd and improve svd_backward (#70253 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70253 I included a derivation of the formula in the complex case, as it is particularly tricky. As far as I know, this is the first time this formula is derived in the literature. I also implemented a more efficient and more accurate version of svd_backward. More importantly, I also added a lax check in the complex case making sure the loss function just depends on the subspaces spanned by the pairs of singular vectors, and not their joint phase. cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Test Plan: Imported from OSS Reviewed By: mikaylagawarecki Differential Revision: D33751982 Pulled By: mruberry fbshipit-source-id: c2a4a92a921a732357e99c01ccb563813b1af512 (cherry picked from commit `391319ed8f`)	2022-01-27 18:38:30 +00:00
lezcano	84f1685397	Rewrite svd and linalg.svd as structured kernels (#69827 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69827 In general, the current pattern allows for implementing optimisations for all the backends in a common place (see for example the optimisation for empty matrices). After this PR, `torch.svd` is implemented in terms of `linalg.svd` and `linalg.svdvals`, as expected. This makes it differentiable in the case when `compute_uv=False`, although this is not particularly important, as `torch.svd` will eventually be deprecated. This PR also instantiates smaller `U` / `V` when calling cusolver_gesvdj in the cases when `full_matrices=False` or `compute_uv=False`. The memory for auxiliary `U` and `V` in the cases above, needed for some cuSOLVER routines is allocated raw allocators rather than through fully fledged tensors, as it's just a blob of memory the algorithm requests. As the code is better structured now, it was easier to see that `U` and `Vh` needn't be allocated when calling `svd_cusolver_gesvd`. Now `linalg.svdvals` work as expected wrt the `out=` parameter. Note that in the test `test_svd_memory_allocation` we were passing a tensor of the wrong size and dtype and the test seemed to pass... This PR also changes the backward formula to avoid saving the input matrix, as it's not necessary. In a follow up PR, I will clean the backward formula and make it more numerically stable and efficient. This PR also does a number of memory optimisations here and there, and fixes the call to cusolver_gesvd, which were incorrect for m <= n. To test this path, I compiled the code with a flag to unconditionally execute the `if (!gesvdj_convergence_check.empty())` branch, and all the tests passed. I also took this chance to simplify the tests for these functions in `test_linalg.py`, as we had lots of tests that were testing some functionality that is already currently tested in the corresponding OpInfos. I used xwang233's feature to test both MAGMA and CUDA backends. This is particularly good for SVD, as cuSOLVER is always chosen over MAGMA when available, so testing MAGMA otherwise would be tricky. cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Test Plan: Imported from OSS Reviewed By: mikaylagawarecki Differential Revision: D33751983 Pulled By: mruberry fbshipit-source-id: 11d48d977946345583d33d14fb11a170a7d14fd2 (cherry picked from commit `a1860bd567`)	2022-01-27 18:38:30 +00:00
Peter Bell	4e031419aa	Skip broken svd tests (#71646 ) Summary: Workaround for https://github.com/pytorch/pytorch/issues/71645 breaking CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/71646 Reviewed By: jbschlosser Differential Revision: D33717104 Pulled By: albanD fbshipit-source-id: f12d3895ecadd7000faa203e3d070dc0ee81e2f7 (cherry picked from commit `2b7c234d84`)	2022-01-21 20:47:21 +00:00
lezcano	baeca11a21	Remove random_fullrank_matrix_distinc_singular_value (#68183 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68183 We do so in favour of `make_fullrank_matrices_with_distinct_singular_values` as this latter one not only has an even longer name, but also generates inputs correctly for them to work with the PR that tests noncontig inputs latter in this stack. We also heavily simplified the generation of samples for the SVD, as it was fairly convoluted and it was not generating the inputs correclty for the noncontiguous test. To do the transition, we also needed to fix the following issue, as it was popping up in the tests: Fixes https://github.com/pytorch/pytorch/issues/66856 cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D32684853 Pulled By: mruberry fbshipit-source-id: e88189c8b67dbf592eccdabaf2aa6d2e2f7b95a4	2022-01-05 20:33:37 -08:00
lezcano	a35b4b49d2	Add linalg.lu_factor (#66933 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66933 This PR exposes `torch.lu` as `torch.linalg.lu_factor` and `torch.linalg.lu_factor_ex`. This PR also adds support for matrices with zero elements both in the size of the matrix and the batch. Note that this function simply returns empty tensors of the correct size in this case. We add a test and an OpInfo for the new function. This PR also adds documentation for this new function in line of the documentation in the rest of `torch.linalg`. Fixes https://github.com/pytorch/pytorch/issues/56590 Fixes https://github.com/pytorch/pytorch/issues/64014 cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Test Plan: Imported from OSS Reviewed By: gchanan Differential Revision: D32834069 Pulled By: mruberry fbshipit-source-id: 51ef12535fa91d292f419acf83b800b86ee9c7eb	2022-01-05 20:32:12 -08:00
Sameer Deshmukh	d100d98db8	`torch.linalg` routines return `torch.linalg.LinAlgError` when a numerical error in the computation is found. (#68571 ) Summary: This PR fixes https://github.com/pytorch/pytorch/issues/64785 by introducing a `torch.LinAlgError` for reporting errors caused by bad values in linear algebra routines which should allow users to easily catch errors caused by numerical errors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/68571 Reviewed By: malfet Differential Revision: D33254087 Pulled By: albanD fbshipit-source-id: 94b59000fdb6a9765e397158e526d1f815f18f0f	2021-12-23 10:53:26 -08:00
Jane Xu	c555b7bacb	GHA: Remove caffe2 check in Windows shard 1 smoke tests (#70010 ) Summary: Windows shard 1 hasn't actually been running any tests because the script that does so exited before running the python tests but did not report an error. This has been happening to all windows tests across the board, for example https://github.com/pytorch/pytorch/runs/4526170542?check_suite_focus=true Removing the caffe2.python check passes the smoke tests now. You can observe that the run_test.py file is called in the windows cpu job now https://github.com/pytorch/pytorch/runs/4541331717?check_suite_focus=true Pull Request resolved: https://github.com/pytorch/pytorch/pull/70010 Reviewed By: malfet, seemethere Differential Revision: D33161291 Pulled By: janeyx99 fbshipit-source-id: 85024b0ebb3ac42297684467ee4d0898ecf394de	2021-12-20 16:05:38 -08:00
Xiao Wang	bfe5ad28e6	[Linalg] Add a runtime switch to let pytorch prefer a backend impl in linalg functions on GPU (#67980 ) Summary: Per title. This PR introduces a global flag that lets pytorch prefer one of the many backend implementations while calling linear algebra functions on GPU. Usage: ```python torch.backends.cuda.preferred_linalg_library('cusolver') ``` Available options (str): `'default'`, `'cusolver'`, `'magma'`. Issue https://github.com/pytorch/pytorch/issues/63992 inspired me to write this PR. No heuristic is perfect on all devices, library versions, matrix shapes, workloads, etc. We can obtain better performance if we can conveniently switch linear algebra backends at runtime. Performance of linear algebra operators after this PR should be no worse than before. The flag is set to `'default'` by default, which makes everything the same as before this PR. The implementation of this PR is basically following that of https://github.com/pytorch/pytorch/pull/67790. Pull Request resolved: https://github.com/pytorch/pytorch/pull/67980 Reviewed By: mruberry Differential Revision: D32849457 Pulled By: ngimel fbshipit-source-id: 679fee7744a03af057995aef06316306073010a6	2021-12-03 19:06:30 -08:00
lezcano	cafcf599d0	Deprecate torch.triangular_solve (#63570 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63570 There is a use of `at::triangular_solve_out` in the file `torch/csrc/jit/tensorexpr/external_functions.cpp` that I have not dared to move to `at::linalg_solve_triangular_out`. Deprecation note: This PR deprecates the `torch.triangular_solve` function in favor of `torch.linalg.solve_triangular`. An upgrade guide is added to the documentation for `torch.triangular_solve`. Note that it DOES NOT remove `torch.triangular_solve`, but `torch.triangular_solve` will be removed in a future PyTorch release. cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D32618035 Pulled By: anjali411 fbshipit-source-id: 0bfb48eeb6d96eff3e96e8a14818268cceb93c83	2021-12-02 13:24:55 -08:00
Mike Ruberry	6ae34ea6f8	Revert D32521980: Add linalg.lu_factor Test Plan: revert-hammer Differential Revision: D32521980 (`b10929a14a`) Original commit changeset: 26a49ebd87f8 fbshipit-source-id: e1a6bb9c2ece9bd78190fe17e16a46e3358c5c82	2021-11-28 17:22:15 -08:00
lezcano	b10929a14a	Add linalg.lu_factor (#66933 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66933 This PR exposes `torch.lu` as `torch.linalg.lu_factor` and `torch.linalg.lu_factor_ex`. This PR also adds support for matrices with zero elements both in the size of the matrix and the batch. Note that this function simply returns empty tensors of the correct size in this case. We add a test and an OpInfo for the new function. This PR also adds documentation for this new function in line of the documentation in the rest of `torch.linalg`. Fixes https://github.com/pytorch/pytorch/issues/56590 Fixes https://github.com/pytorch/pytorch/issues/64014 cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D32521980 Pulled By: mruberry fbshipit-source-id: 26a49ebd87f8a41472f8cd4e9de4ddfb7f5581fb	2021-11-27 17:52:48 -08:00
Xiao Wang	2cd48d14ef	Fix `test_svd_errors_and_warnings` warning message when cuda >= 11.5 (#68683 ) Summary: In SVD cusolverDnXgesvd computations, When cuda < 11.5, cusolver raises CUSOLVER_STATUS_EXECUTION_FAILED when input contains nan. When cuda >= 11.5, cusolver normally finishes execution and sets info array indicating convergence issue. Related: https://github.com/pytorch/pytorch/issues/68259 #64533 Pull Request resolved: https://github.com/pytorch/pytorch/pull/68683 Reviewed By: dagitses Differential Revision: D32583576 Pulled By: mruberry fbshipit-source-id: f732872522e0bda2703450ffcc64ae3a0d3f5bc0	2021-11-23 14:16:23 -08:00
lezcano	b46c89d950	Add linalg.solve_triangular (#63568 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63568 This PR adds the first solver with structure to `linalg`. This solver has an API compatible with that of `linalg.solve` preparing these for a possible future merge of the APIs. The new API: - Just returns the solution, rather than the solution and a copy of `A` - Removes the confusing `transpose` argument and replaces it by a correct handling of conj and strides within the call - Adds a `left=True` kwarg. This can be achieved via transposes of the inputs and the result, but it's exposed for convenience. This PR also implements a dataflow that minimises the number of copies needed before calling LAPACK / MAGMA / cuBLAS and takes advantage of the conjugate and neg bits. This algorithm is implemented for `solve_triangular` (which, for this, is the most complex of all the solvers due to the `upper` parameters). Once more solvers are added, we will factor out this calling algorithm, so that all of them can take advantage of it. Given the complexity of this algorithm, we implement some thorough testing. We also added tests for all the backends, which was not done before. We also add forward AD support for `linalg.solve_triangular` and improve the docs of `linalg.solve_triangular`. We also fix a few issues with those of `torch.triangular_solve`. Resolves https://github.com/pytorch/pytorch/issues/54258 Resolves https://github.com/pytorch/pytorch/issues/56327 Resolves https://github.com/pytorch/pytorch/issues/45734 cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D32588230 Pulled By: mruberry fbshipit-source-id: 69e484849deb9ad7bb992cc97905df29c8915910	2021-11-22 12:41:06 -08:00
Jane Xu	9f4e004abd	Revert D32283178: Add linalg.solve_triangular Test Plan: revert-hammer Differential Revision: D32283178 (`0706607abc`) Original commit changeset: deb672e6e52f fbshipit-source-id: d2a3421292147426cc61c2f063b721acf9004755	2021-11-18 14:46:10 -08:00
lezcano	0706607abc	Add linalg.solve_triangular (#63568 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63568 This PR adds the first solver with structure to `linalg`. This solver has an API compatible with that of `linalg.solve` preparing these for a possible future merge of the APIs. The new API: - Just returns the solution, rather than the solution and a copy of `A` - Removes the confusing `transpose` argument and replaces it by a correct handling of conj and strides within the call - Adds a `left=True` kwarg. This can be achieved via transposes of the inputs and the result, but it's exposed for convenience. This PR also implements a dataflow that minimises the number of copies needed before calling LAPACK / MAGMA / cuBLAS and takes advantage of the conjugate and neg bits. This algorithm is implemented for `solve_triangular` (which, for this, is the most complex of all the solvers due to the `upper` parameters). Once more solvers are added, we will factor out this calling algorithm, so that all of them can take advantage of it. Given the complexity of this algorithm, we implement some thorough testing. We also added tests for all the backends, which was not done before. We also add forward AD support for `linalg.solve_triangular` and improve the docs of `linalg.solve_triangular`. We also fix a few issues with those of `torch.triangular_solve`. Resolves https://github.com/pytorch/pytorch/issues/54258 Resolves https://github.com/pytorch/pytorch/issues/56327 Resolves https://github.com/pytorch/pytorch/issues/45734 cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Test Plan: Imported from OSS Reviewed By: zou3519, JacobSzwejbka Differential Revision: D32283178 Pulled By: mruberry fbshipit-source-id: deb672e6e52f58b76536ab4158073927a35e43a8	2021-11-18 09:45:51 -08:00
eqy	391be39575	Use reduced precision switch in `test_addmm_baddbmm_overflow` (#68399 ) Summary: https://github.com/pytorch/pytorch/issues/68125 Checking to see if actually using the switch fixes the test... CC mruberry ngimel ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/68399 Reviewed By: VitalyFedyunin Differential Revision: D32466974 Pulled By: ngimel fbshipit-source-id: aa8643ed913b344977fd103974625c527d20dbb8	2021-11-16 10:50:17 -08:00
lezcano	43874d79e7	Fix failing test due to a bug in NumPy when using OpenBLAS (#67679 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67679 implementations Fixes https://github.com/pytorch/pytorch/issues/67675 cc mruberry Test Plan: Imported from OSS Reviewed By: anjali411 Differential Revision: D32368698 Pulled By: mruberry fbshipit-source-id: 3ea6ebc43c061af2f376cdf5da06884859bbbf53	2021-11-15 08:25:12 -08:00
Sameer Deshmukh	6afb414c21	Nan in linalg eig (#67544 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/61251. As per the comment here (https://github.com/pytorch/pytorch/issues/61251#issuecomment-954676082), a consensus has been reached to raise an error if there is a NaN value in the input when calling `eig()`. This PR implements that feature. cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/67544 Reviewed By: malfet Differential Revision: D32310919 Pulled By: mruberry fbshipit-source-id: fc74a1ae2d929157c2d4c9051e3e9a4bf03dd5be	2021-11-11 14:33:49 -08:00
Anirudh Dagar	b07a11929d	Array API: Add torch.linalg.cross (#63285 ) Summary: ### Create `linalg.cross` Fixes https://github.com/pytorch/pytorch/issues/62810 As discussed in the corresponding issue, this PR adds `cross` to the `linalg` namespace (Note: There is no method variant) which is slightly different in behaviour compared to `torch.cross`. Note: this is NOT an alias as suggested in mruberry's [https://github.com/pytorch/pytorch/issues/62810 comment](https://github.com/pytorch/pytorch/issues/62810#issuecomment-897504372) below > linalg.cross being consistent with the Python Array API (over NumPy) makes sense because NumPy has no linalg.cross. I also think we can implement linalg.cross without immediately deprecating torch.cross, although we should definitely refer users to linalg.cross. Deprecating torch.cross will require additional review. While it's not used often it is used, and it's unclear if users are relying on its unique behavior or not. The current default implementation of `torch.cross` is extremely weird and confusing. This has also been reported multiple times previously. (See https://github.com/pytorch/pytorch/issues/17229, https://github.com/pytorch/pytorch/issues/39310, https://github.com/pytorch/pytorch/issues/41850, https://github.com/pytorch/pytorch/issues/50273) - [x] Add `torch.linalg.cross` with default `dim=-1` - [x] Add OpInfo and other tests for `torch.linalg.cross` - [x] Add broadcasting support to `torch.cross` and `torch.linalg.cross` - [x] Remove out skip from `torch.cross` OpInfo - [x] Add docs for `torch.linalg.cross`. Improve docs for `torch.cross` mentioning `linalg.cross` and the difference between the two. Also adds a warning to `torch.cross`, that it may change in the future (we might want to deprecate it later) --- ### Additional Fixes to `torch.cross` - [x] Fix Doc for Tensor.cross - [x] Fix torch.cross in `torch/overridres.py` While working on `linalg.cross` I noticed these small issues with `torch.cross` itself. [Tensor.cross docs](https://pytorch.org/docs/stable/generated/torch.Tensor.cross.html) still mentions `dim=-1` default which is actually wrong. It should be `dim=None` after the behaviour was updated in PR https://github.com/pytorch/pytorch/issues/17582 but the documentation for the `method` or `function` variant wasn’t updated. Later PR https://github.com/pytorch/pytorch/issues/41850 updated the documentation for the `function` variant i.e `torch.cross` and also added the following warning about the weird behaviour. > If `dim` is not given, it defaults to the first dimension found with the size 3. Note that this might be unexpected. But still, the `Tensor.cross` docs were missed and remained outdated. I’m finally fixing that here. Also fixing `torch/overrides.py` for `torch.cross` as well now, with `dim=None`. To verify according to the docs the default behaviour of `dim=-1` should raise, you can try the following. ```python a = torch.randn(3, 4) b = torch.randn(3, 4) b.cross(a) # this works because the implementation finds 3 in the first dimension and the default behaviour as shown in documentation is actually not true. >>> tensor([[ 0.7171, -1.1059, 0.4162, 1.3026], [ 0.4320, -2.1591, -1.1423, 1.2314], [-0.6034, -1.6592, -0.8016, 1.6467]]) b.cross(a, dim=-1) # this raises as expected since the last dimension doesn't have a 3 >>> RuntimeError: dimension -1 does not have size 3 ``` Please take a closer look (particularly the autograd part, this is the first time I'm dealing with `derivatives.yaml`). If there is something missing, wrong or needs more explanation, please let me know. Looking forward to the feedback. cc mruberry Lezcano IvanYashchuk rgommers Pull Request resolved: https://github.com/pytorch/pytorch/pull/63285 Reviewed By: gchanan Differential Revision: D32313346 Pulled By: mruberry fbshipit-source-id: e68c2687c57367274e8ddb7ef28ee92dcd4c9f2c	2021-11-11 12:49:41 -08:00
Thomas Viehmann	40bedf6206	Fix test_triangular_solve testcase enumeration (#67635 ) Summary: use product instead of zip to cover all cases cc mruberry Pull Request resolved: https://github.com/pytorch/pytorch/pull/67635 Reviewed By: malfet Differential Revision: D32310956 Pulled By: mruberry fbshipit-source-id: 806c3313e2db26d77199d3145b2d5283b6ca3617	2021-11-11 12:49:38 -08:00
Eddie Yan	a5b57c9433	Avoid prematurely casting GEMM parameters `alpha`, `beta` to `scalar_t` (#67633 ) Summary: stas00 uncovered an issue where certain half-precision GEMMs would produce outputs that looked like the result of strange rounding behavior (e.g., `10008.` in place of `10000.`). ptrblck suspected that this was due to the parameters being downcasted to the input types (which would reproduce the problematic output). Indeed, the GEMM and BGEMM cublas wrappers are currently converting the `alpha` and `beta` parameters to `scalar_t` (which potentially is reduced precision) before converting them back to `float`. This PR changes the "ARGTYPE" wrappers to use `acc_t` instead and adds a corresponding test. CC ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/67633 Reviewed By: mruberry Differential Revision: D32076474 Pulled By: ngimel fbshipit-source-id: 2540d9b9d0195c17d07d1161374fb6a5850779d5	2021-11-03 12:01:04 -07:00
kshitij12345	885a8e53ba	replace onlyOnCPUAndCUDA with onlyNativeDeviceTypes (#65201 ) Summary: Reference https://github.com/pytorch/pytorch/issues/53849 Replace `onlyOnCPUandCUDA` with `onlyNativeDeviceTypes` which includes `cpu, cuda and meta`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/65201 Reviewed By: mrshenli Differential Revision: D31299718 Pulled By: mruberry fbshipit-source-id: 2d8356450c035d6a314209ab51b2c237583920fd	2021-11-01 09:22:34 -07:00
Eddie Yan	e01279cc2e	Disable reduced precision reductions for fp16 GEMMs (#67578 ) Summary: It appears that most NVIDIA architectures (well, at least there haven't been many reports of this issue) don't do reduced precision reductions (e.g., reducing in fp16 given fp16 inputs), but this change attempts to ensure that a reduced precision reduction is never done. The included test case currently fails on Volta but passes on Pascal and Ampere; setting this flag causes the test to pass on all three. CC stas00 ngimel ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/67578 Reviewed By: mruberry Differential Revision: D32046030 Pulled By: ngimel fbshipit-source-id: ac9aa8489ad6835f34bd0300c5d6f4ea76f333d1	2021-10-30 21:14:11 -07:00
Ivan Yashchuk	fdc74e2373	Port triangular_solve to structured kernel (#61857 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61857 A few updates to internal code that allow marking triangular_solve as structured. cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D31928687 Pulled By: cpuhrsch fbshipit-source-id: 80a2783c469d5a6194c466ccfa8808fa41c0bdb7	2021-10-26 14:50:00 -07:00
Xiao Wang	204ffd33ee	[CUDA][Linalg] Add gesvd as SVD fallback; optimize SVD gesvdj performance (#64533 ) Summary: Fix https://github.com/pytorch/pytorch/issues/64237 Fix https://github.com/pytorch/pytorch/issues/28293 Fix https://github.com/pytorch/pytorch/issues/4689 See also https://github.com/pytorch/pytorch/issues/47953 cc ngimel jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/64533 Reviewed By: albanD Differential Revision: D31915794 Pulled By: ngimel fbshipit-source-id: 29ea48696531ced8a48474e891a9e2d5f11e9d7a	2021-10-26 10:13:52 -07:00
lezcano	f4dd88489a	Better and more consistent error messages in torch.linalg (#62734 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62734 Following https://github.com/pytorch/pytorch/pull/62715#discussion_r682610788 - squareCheckInputs takes a string with the name of the function - We reuse more functions when checking the inputs The state of the errors in torch.linalg is far from great though. We leave a more comprehensive clean-up for the future. cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Test Plan: Imported from OSS Reviewed By: anjali411 Differential Revision: D31823230 Pulled By: mruberry fbshipit-source-id: eccd531f10d590eb5f9d04a957b7cdcb31c72ea4	2021-10-25 13:24:28 -07:00
Jane Xu	9a00910bf3	[skip ci] Set test owner for test_linalg.py (#66844 ) Summary: Action following https://github.com/pytorch/pytorch/issues/66232 cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/66844 Reviewed By: gchanan Differential Revision: D31761714 Pulled By: janeyx99 fbshipit-source-id: a4c7b239d855707ee6ec1194f57f8a66812b4e99	2021-10-19 13:01:05 -07:00
lezcano	a2e94b80fa	Create linalg.matrix_exp (#62715 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62715 Fixes https://github.com/pytorch/pytorch/issues/61648 Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D31641698 Pulled By: mruberry fbshipit-source-id: 2e2965d14807b6b4fada4b809d539066dd0ba277	2021-10-19 09:07:15 -07:00
Ivan Yashchuk	061baf02bf	Skip failing tests when LAPACK and MAGMA are not available (#64930 ) Summary: Skip failing tests when LAPACK and MAGMA are not available for ` test_linalg.py` and ` test_ops.py`. Note that there's no CI without LAPACK or MAGMA. I verified locally that now it works as expected, but in the future we have no guards against tests failing again for this situation. <details> <summary> test_ops.py failures that are fixed</summary> ``` FAILED test/test_ops.py::TestCommonCPU::test_out_linalg_tensorinv_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestCommonCPU::test_reference_testing_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestCommonCPU::test_reference_testing_linalg_tensorinv_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_linalg_tensorinv_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_triangular_solve_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_triangular_solve_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_linalg_tensorinv_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_triangular_solve_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_linalg_tensorinv_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_triangular_solve_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_linalg_tensorinv_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_triangular_solve_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestJitCPU::test_variant_consistency_jit_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestJitCPU::test_variant_consistency_jit_triangular_solve_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestJitCPU::test_variant_consistency_jit_triangular_solve_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestMathBitsCPU::test_conj_view_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestMathBitsCPU::test_conj_view_triangular_solve_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestMathBitsCPU::test_neg_view_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestMathBitsCPU::test_neg_view_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation ``` </details> <details> <summary> test_linalg.py failures that are fixed</summary> ``` FAILED test/test_linalg.py::TestLinalgCPU::test_norm_dtype_cpu - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCPU::test_norm_matrix_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCPU::test_norm_matrix_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCPU::test_nuclear_norm_axes_small_brute_force_old_cpu - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_eigh_hermitian_grad_meta_complex128 - RuntimeError: Calling torch.linalg.eigh or eigvalsh on a CPU tensor requires compiling PyTorch with LAPACK. Please use PyTorch built with LAPACK support. FAILED test/test_linalg.py::TestLinalgMETA::test_eigh_hermitian_grad_meta_float64 - RuntimeError: Calling torch.linalg.eigh or eigvalsh on a CPU tensor requires compiling PyTorch with LAPACK. Please use PyTorch built with LAPACK support. FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_col_maj_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_col_maj_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_col_maj_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_col_maj_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_col_maj_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_col_maj_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_lowrank_cuda_float64 - RuntimeError: Calling torch.lu on a CUDA tensor requires compiling PyTorch with MAGMA. lease rebuild with MAGMA. FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_col_maj_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_col_maj_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_col_maj_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_col_maj_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_col_maj_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_col_maj_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation ``` </details> Fixes https://github.com/pytorch/pytorch/issues/59662 cc mruberry jianyuh nikitaved pearu walterddr IvanYashchuk xwang233 Lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/64930 Reviewed By: zou3519 Differential Revision: D31739416 Pulled By: mruberry fbshipit-source-id: 153c40d8eeeb094b06816882a7cbb28c681509a9	2021-10-18 21:30:01 -07:00
Natalia Gimelshein	c9c52b760b	test addr type promotion in a single test (#66812 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/66802 Test time goes from 150s to 15s. Pull Request resolved: https://github.com/pytorch/pytorch/pull/66812 Reviewed By: mruberry Differential Revision: D31739299 Pulled By: ngimel fbshipit-source-id: cb6d92ff335f46ee06b2480bdd9143f85865bccf	2021-10-18 21:21:11 -07:00
lezcano	0974215c4d	Prefer mT and mH over transpose(-2, -1) and transpose(-2, -1).conj() (#64181 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64181 This PR replaces all the calls to: - `transpose(-2, -1)` or `transpose(-1, -2)` by `mT()` in C++ and `mT` in Python - `conj().transpose(-2, -1)` or `transpose(-2, -1).conj()` or `conj().transpose(-1, -2)` or `transpose(-1, -2).conj()` by `mH()` in C++ and `mH` in Python. It also simplifies two pieces of code, and fixes one bug where a pair of parentheses were missing in the function `make_symmetric_matrices`. Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D31692896 Pulled By: anjali411 fbshipit-source-id: e9112c42343663d442dc5bd53ff2b492094b434a	2021-10-18 13:02:25 -07:00
Ivan Yashchuk	0d203a16fe	Add relative and absolute tolerances for matrix_rank, pinv (#63102 ) Summary: This pull request introduces new keyword arguments for `torch.linalg.matrix_rank` and `torch.linalg.pinv`: `atol` and `rtol`. Currently, only tensor overload has default values for either `atol` or `rtol`, the float overload requires both arguments to be specified. FC compatibility: https://github.com/pytorch/pytorch/pull/63102#discussion_r710930509 Fixes https://github.com/pytorch/pytorch/issues/54151. Fixes https://github.com/pytorch/pytorch/issues/66618. cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/63102 Reviewed By: H-Huang Differential Revision: D31641456 Pulled By: mruberry fbshipit-source-id: 4c765508ab1657730703e42975fc8c0d0a60eb7c	2021-10-17 22:15:42 -07:00
Richard Barnes	abc022f9c8	Fix torch.cholesky deprecation warning (#66645 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66645 Fixes: ``` test_cholesky_solve_batched_broadcasting_cpu_complex128 (__main__.TestLinalgCPU) ... test_linalg.py:3099: UserWarning: torch.cholesky is deprecated in favor of torch.linalg.cholesky and will be removed in a future PyTorch release. ``` Test Plan: Sandcastle Reviewed By: mruberry Differential Revision: D31635851 fbshipit-source-id: c377eb88d753fb573b3947f0c6ff5df055cb13d8	2021-10-15 13:24:58 -07:00
Michael Suo	22f36353dc	Revert D31137652: [pytorch][PR] Skip failing tests when LAPACK and MAGMA are not available Test Plan: revert-hammer Differential Revision: D31137652 (`dd354117ef`) Original commit changeset: c969f75d7cf1 fbshipit-source-id: bc4cde4eeb5d38ac940ebb471abbd8b9009b3aee	2021-09-30 16:08:57 -07:00
Ivan Yashchuk	dd354117ef	Skip failing tests when LAPACK and MAGMA are not available (#64930 ) Summary: Skip failing tests when LAPACK and MAGMA are not available for ` test_linalg.py` and ` test_ops.py`. Note that there's no CI without LAPACK or MAGMA. I verified locally that now it works as expected, but in the future we have no guards against tests failing again for this situation. <details> <summary> test_ops.py failures that are fixed</summary> ``` FAILED test/test_ops.py::TestCommonCPU::test_out_linalg_tensorinv_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestCommonCPU::test_reference_testing_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestCommonCPU::test_reference_testing_linalg_tensorinv_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_linalg_tensorinv_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_triangular_solve_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_triangular_solve_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_linalg_tensorinv_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_triangular_solve_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_linalg_tensorinv_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_triangular_solve_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_linalg_tensorinv_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_triangular_solve_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestJitCPU::test_variant_consistency_jit_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestJitCPU::test_variant_consistency_jit_triangular_solve_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestJitCPU::test_variant_consistency_jit_triangular_solve_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestMathBitsCPU::test_conj_view_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestMathBitsCPU::test_conj_view_triangular_solve_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestMathBitsCPU::test_neg_view_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_ops.py::TestMathBitsCPU::test_neg_view_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation ``` </details> <details> <summary> test_linalg.py failures that are fixed</summary> ``` FAILED test/test_linalg.py::TestLinalgCPU::test_norm_dtype_cpu - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCPU::test_norm_matrix_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCPU::test_norm_matrix_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCPU::test_nuclear_norm_axes_small_brute_force_old_cpu - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_eigh_hermitian_grad_meta_complex128 - RuntimeError: Calling torch.linalg.eigh or eigvalsh on a CPU tensor requires compiling PyTorch with LAPACK. Please use PyTorch built with LAPACK support. FAILED test/test_linalg.py::TestLinalgMETA::test_eigh_hermitian_grad_meta_float64 - RuntimeError: Calling torch.linalg.eigh or eigvalsh on a CPU tensor requires compiling PyTorch with LAPACK. Please use PyTorch built with LAPACK support. FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_col_maj_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_col_maj_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_col_maj_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_col_maj_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_col_maj_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_col_maj_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_lowrank_cuda_float64 - RuntimeError: Calling torch.lu on a CUDA tensor requires compiling PyTorch with MAGMA. lease rebuild with MAGMA. FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_col_maj_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_col_maj_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_complex128 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_complex64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_col_maj_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_col_maj_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_col_maj_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_col_maj_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation ``` </details> Fixes https://github.com/pytorch/pytorch/issues/59662 cc mruberry jianyuh nikitaved pearu walterddr IvanYashchuk xwang233 Lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/64930 Reviewed By: H-Huang Differential Revision: D31137652 Pulled By: mruberry fbshipit-source-id: c969f75d7cf185765211004a0878e7c8a5d3cbf7	2021-09-29 21:31:14 -07:00
haozhe.zhu	752a820230	Bf16 matmul (#64619 ) Summary: Re-create PR to fix https://github.com/pytorch/pytorch/pull/61891. Drop the support for addbmm. Pull Request resolved: https://github.com/pytorch/pytorch/pull/64619 Reviewed By: jbschlosser Differential Revision: D30902995 Pulled By: VitalyFedyunin fbshipit-source-id: dc318d73adff8f6974c9752d0d097e69276f8206	2021-09-17 10:31:56 -07:00
Natalia Gimelshein	e777e1b01c	Revert D29998114: [pytorch][PR] enable bf16 mkldnn path for gemm Test Plan: revert-hammer Differential Revision: D29998114 (`acc9f9afc8`) Original commit changeset: 459dc5874c63 fbshipit-source-id: 1994623a3afc22a94bd0cf5de766b023185f5238	2021-09-07 18:45:13 -07:00
lezcano	566ee1217f	Use trsm for triangular_solve in CPU (#63567 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63567 The current implementation called trtrs for CPU and trsm for CUDA. See https://github.com/pytorch/pytorch/issues/56326#issuecomment-825496115 for a discussion on the differences between these two functions and why we prefer trsm vs trtrs on CUDA. This PR also exposes the `side` argument of this function which is used in the second PR of this stack to optimise the number copies one needs to make when preparing the arguments to be sent to the backends. It also changes the use of `bool`s to a common enum type to represent whether a matrix is transposed / conj transposed, etc. This makes the API consistent, as before, the behaviour of these functions with `transpose=True` and `conjugate_transpose=True` it was not well defined. Functions to transform this type into the specific types / chars for the different libraries are provided under the names `to_blas`, `to_lapack`, `to_magma`, etc. This is the first of a stack of PRs that aim to improve the performance of `linalg.solve_triangular`. `trsm` has an extra parameter (`side`), which allows to ellide the copy of the triangular matrix in many cases. Fixes https://github.com/pytorch/pytorch/issues/56326 Test Plan: Imported from OSS Reviewed By: malfet Differential Revision: D30566479 Pulled By: mruberry fbshipit-source-id: 3831af9b51e09fbfe272c17c88c21ecf45413212	2021-09-07 17:26:17 -07:00
haozhe.zhu	acc9f9afc8	enable bf16 mkldnn path for gemm (#61891 ) Summary: # Goal: Integrate mkldnn bf16 Gemm to pytorch ## BF16 Suport for mm, addmm, bmm, addbmm, baddbmm, mv, addmv, dot (with mkldnn matmul primitive): https://oneapi-src.github.io/oneDNN/group__dnnl__api__matmul.html For gemm related ops, we keep all inputs under plain format. So we will not introduce opaque tensor for these ops to save mem copy here. ![mkldnn bf16 gemm integration](https://user-images.githubusercontent.com/54701539/126263077-4b5134e1-52a7-4fad-94fb-19e13a0377f6.png) The minimized integration is only dispatch to mkldnn in addmm, but for gemm with 3-D input (with additional dim for"batch") this will call mkldnn gemm for "batch" times. Since mkldnn matmul support input with multiple dims, we directly dispatch to mkldnn gemm in {bmm, addbmm, baddbmm} to reduce the time to create mkldnn memory desc, primitive, etc. For the different definition for "bias" between mkldnn(which must be shape of (1, N)) and pytorch (which can be same shape with gemm result (M, N)), we use a fused sum to handle it. ## User Case: User case is exactly same with before because no opaque tensor's is introduced. Since the pytorch has already support bf16 data type with CPU tensor before, we can leverage the existed bf16 gemm UT. ## Gemm performance gain on CPX 28Cores/Socket: Note: data is collected using PyTorch operator benchmarks: https://github.com/pytorch/pytorch/tree/master/benchmarks/operator_benchmark (with adding bfloat16 dtype) ### use 1 thread on 1 core ### torch.addmm (M, N) * (N, K) + (M, K) \| impl \|16x16x16\|32x32x32\| 64x64x64 \| 128x128x128\| 256x256x256\| 512x512x512\|1024x1024x1024\| \|:---:\|:---:\| :---: \| :---: \| :---: \| :---: \| :---: \| :---: \| \| aten-fp32\| 4.115us\|4.583us\|8.230us\|26.972us\|211.857us\|1.458ms\|11.258ms\| \| aten-bf16 \| 15.812us\| 105.087us\|801.787us\|3.767ms\|20.274ms\|122.440ms\|836.453ms\| \| mkldnn-bf16 \|20.561us \|22.510us\|24.551us\|37.709us\|143.571us\|0.835ms\|5.76ms\| We can see mkldnn-bf16 are better than aten bf16, but for smaller shapes, mkldnn bf16 are not better than aten fp32. This is because onednn overhead, this overhead more like a "constant" overhead and while problems get larger, we can ignore it. Also we are continue optimize the kernel efficiency and decrease the overhead as well. More shapes \| impl \|1x2048x2048\|2048x1x2048\| 2048x2048x1 \| \|:---:\|:---:\| :---: \| :---: \| \| aten-fp32\| 0.640ms\|3.794ms\|0.641ms\| \| aten-bf16 \| 2.924ms\| 3.868ms\|23.413ms\| \| mkldnn-bf16 \|0.335ms \|4.490ms\|0.368ms\| ### use 1 socket (28 thread, 28 core) \| impl \| 256x256x256\| 512x512x512\|1024x1024x1024\| 2048x2048x2048\|4096x4096x4096\| \|:---:\| :---: \| :---: \| :---: \| :---: \| :---: \| \| aten-fp32\| 35.943us \|140.315us\|643.510us\|5.827ms\|41.761ms\| \| mkldnn-bf16 \|53.432us\|114.716us\|421.858us\|2.863ms\|23.029ms\| More shapes \| impl \|128x2048x2048\|2048x128x2048\| 2048x2048x128 \| \|:---:\|:---:\| :---: \| :---: \| \| aten-fp32\| 0.561ms\|0.458ms\|0.406ms\| \| mkldnn-bf16 \|0.369ms \|0.331ms\|0.239ms\| We dose not show aten-bf16 for this case since aten-bf16 always compute as single thread and the performance is extreme poor. The trend for this case is similar for 1 thread on 1 core. Pull Request resolved: https://github.com/pytorch/pytorch/pull/61891 Reviewed By: iramazanli Differential Revision: D29998114 Pulled By: VitalyFedyunin fbshipit-source-id: 459dc5874c638d62f290c96684ca0a694ded4b5a	2021-09-07 13:00:37 -07:00
Philip Meier	26b7ff5aea	deprecate dtype getters from `torch.testing` namespace (#63554 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63554 Following https://github.com/pytorch/pytorch/pull/61840#issuecomment-884087809, this deprecates all the dtype getters publicly exposed in the `torch.testing` namespace. The reason for this twofold: 1. If someone is not familiar with the C++ dispatch macros PyTorch uses, the names are misleading. For example `torch.testing.floating_types()` will only give you `float32` and `float64` skipping `float16` and `bfloat16`. 2. The dtype getters provide very minimal functionality that can be easily emulated by downstream libraries. We thought about [providing an replacement](https://gist.github.com/pmeier/3dfd2e105842ad0de4505068a1a0270a), but ultimately decided against it. The major problem is BC: by keeping it, either the namespace is getting messy again after a new dtype is added or we need to somehow version the return values of the getters. Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D30662206 Pulled By: mruberry fbshipit-source-id: a2bdb10ab02ae665df1b5b76e8afa9af043bbf56	2021-09-07 08:58:51 -07:00
Ivan Yashchuk	32fbeb170d	Update error messages that use LAPACK error codes (#63864 ) Summary: This PR updates the` batchCheckErrors` and `singleCheckErrors` functions so that the error messages are defined only once. `batchCheckErrors` function reuses `singleCheckErrors` now. Fixes https://github.com/pytorch/pytorch/issues/63220, fixes https://github.com/pytorch/pytorch/issues/59779 cc jianyuh nikitaved pearu mruberry heitorschueroff walterddr IvanYashchuk xwang233 Lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/63864 Reviewed By: ngimel Differential Revision: D30672933 Pulled By: mruberry fbshipit-source-id: 0ba37ff98ef278efdb12c3890aa07d687047da7a	2021-09-07 00:05:46 -07:00
anjali411	5d80a48cef	Add fast path for addmm when the inputs are conjugate (#59380 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59380 Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D28898374 Pulled By: anjali411 fbshipit-source-id: eab0e64d37bb57c18b54cabb8e5c00666338ba04	2021-09-01 16:34:02 -07:00
Kushashwa Ravi Shrimali	d37636901e	[Doc] `make_tensor` to `torch.testing` module (#63925 ) Summary: This PR aims to add `make_tensor` to the `torch.testing` module in PyTorch docs. TODOs: * [x] Add examples cc: pmeier mruberry brianjo Pull Request resolved: https://github.com/pytorch/pytorch/pull/63925 Reviewed By: ngimel Differential Revision: D30633487 Pulled By: mruberry fbshipit-source-id: 8e5a1f880c6ece5925b4039fee8122bd739538af	2021-08-30 12:25:40 -07:00
Shen Li	1022443168	Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default Test Plan: revert-hammer Differential Revision: D30279364 (`b004307252`) Original commit changeset: c1ed77dfe43a fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e	2021-08-12 11:45:01 -07:00
Zsolt Dollenstein	b004307252	[codemod][lint][fbcode/c*] Enable BLACK by default Test Plan: manual inspection & sandcastle Reviewed By: zertosh Differential Revision: D30279364 fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a	2021-08-12 10:58:35 -07:00
Raghav Kansal	6d21e36f21	LU solve uses cuBLAS and cuSOLVER for matrices with dim > 1024 (#61815 ) Summary: This PR builds off of https://github.com/pytorch/pytorch/issues/59148 and modifies the `lu_solve` routine to avoid MAGMA for `b` or `lu_data` matrices with any dimension > 1024, since MAGMA has a bug when dealing with such matrices (https://bitbucket.org/icl/magma/issues/19/dgesv_batched-dgetrs_batched-fails-for). Fixes https://github.com/pytorch/pytorch/issues/36921 Fixes https://github.com/pytorch/pytorch/issues/61929 Pull Request resolved: https://github.com/pytorch/pytorch/pull/61815 Reviewed By: anjali411 Differential Revision: D30199618 Pulled By: ngimel fbshipit-source-id: 06870793f697e9c35aaaa8254b8a8b1a38bd3aa9	2021-08-10 11:07:16 -07:00
Rong Rong (AI Infra)	3782f3eced	Enable upper for torch.linalg.cholesky (#62434 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/61988 Pull Request resolved: https://github.com/pytorch/pytorch/pull/62434 Reviewed By: seemethere, tktrungna Differential Revision: D30079806 Pulled By: walterddr fbshipit-source-id: 044efb96525155c9bc7953ac4ad47c1b7c12fb20	2021-08-09 09:28:33 -07:00
Ivan Yashchuk	3c0c1c4ecb	Fix incorrectly sized tensors for svd when full_matrices=False (#62022 ) Summary: Before this PR for m x n input matrix, the return matrices were always allocated as m x m and n x n and then narrowed. This unnecessarily requires a lot of memory that is then discarded. With this PR when `compute_uv=True and full_matrices=False` correctly sized tensors are allocated. Moreover, if `compute_uv=False` U, V matrices are not allocated as they are not needed. However, cusolver's gesvdj routines fail when these matrices are not allocated, which is a bug, so this allocation is done separately in cusolver specific code path. MAGMA doesn't work for this input because it tries to allocate a large matrix internally (ROCm doesn't work as it uses MAGMA). Example error: ``` CUBLAS error: memory mapping error (11) in magma_sgelqf at /opt/conda/conda-bld/magma-cuda110_1598416697386/work/src/sgelqf.cpp:161 CUBLAS error: out of memory (3) in magma_sgeqrf2_gpu at /opt/conda/conda-bld/magma-cuda110_1598416697386/work/src/sgeqrf2_gpu.cpp:145 CUBLAS error: not initialized (1) in magma_sgeqrf2_gpu at /opt/conda/conda-bld/magma-cuda110_1598416697386/work/src/sgeqrf2_gpu.cpp:145 MAGMA error: function-specific error, see documentation (1) in magma_sgeqrf2_gpu at /opt/conda/conda-bld/magma-cuda110_1598416697386/work/src/sgeqrf2_gpu.cpp:145 MAGMA error: function-specific error, see documentation (1) in magma_sgeqrf2_gpu at /opt/conda/conda-bld/magma-cuda110_1598416697386/work/src/sgeqrf2_gpu.cpp:145 python: /opt/conda/conda-bld/magma-cuda110_1598416697386/work/interface_cuda/interface.cpp:806: void magma_queue_create_internal(magma_device_t, magma_queue*, const char, const char*, int): Assertion `queue->dAarray__ != __null' failed. Aborted (core dumped) ``` Fixes https://github.com/pytorch/pytorch/issues/61949. Pull Request resolved: https://github.com/pytorch/pytorch/pull/62022 Reviewed By: heitorschueroff Differential Revision: D29994429 Pulled By: ngimel fbshipit-source-id: c3f7744d7adc5fd6787f6cbb1ec41405f89a6d4c	2021-07-30 10:27:13 -07:00
Xiao Wang	d57ce8cf89	[Linalg] Add cusolver syevjBatched path for torch.linalg.eigh when cuda >= 11.3 U1 (#62003 ) Summary: This PR adds the `cusolverDn<T>SyevjBatched` fuction to the backend of `torch.linalg.eigh` (eigenvalue solver for Hermitian matrix). Using the heuristics from https://github.com/pytorch/pytorch/pull/53040#issuecomment-788264724 and my local tests, the `syevj_batched` path is only used when `batch_size > 1` and `matrix_size <= 32`. This would give us huge performance boost in those cases. Since there were known numerical issues on cusolver `syevj_batched` before cuda 11.3 update 1, this PR only enables the dispatch when cuda version is no less than that. See also https://github.com/pytorch/pytorch/issues/42666 #47953 https://github.com/pytorch/pytorch/issues/53040 Pull Request resolved: https://github.com/pytorch/pytorch/pull/62003 Reviewed By: heitorschueroff Differential Revision: D30006316 Pulled By: ngimel fbshipit-source-id: 3a65c5fc9adbbe776524f8957df5442c3d3aeb8e	2021-07-30 00:35:21 -07:00
Rong Rong (AI Infra)	65ab861ec6	fix mm not correctly report TORCH_CHECK failure issue (#61394 ) Summary: fixes https://github.com/pytorch/pytorch/issues/61291. Pull Request resolved: https://github.com/pytorch/pytorch/pull/61394 Reviewed By: zhouzhuojie, seemethere Differential Revision: D29614208 Pulled By: walterddr fbshipit-source-id: f49a15dde708e30b06059b47fae1cda7c2c3571c	2021-07-12 12:50:51 -07:00
Xiao Wang	c18017190b	Relax some linalg test tolerances (#61101 ) Summary: We are seeing some test failures on A100 machine, though TF32 matmul is not involved in these cases. I tried `svd_lowrank` test. It passed while testing itself, but failed when I run the whole test suite. It's probably some random seed issue. Relax test tolerance would be much easier to do. Some SVD tests failed when we compare CPU float32 vs GPU float32. Since linear algebra are sort of unstable at single precision, comparing two single precision results may give some false positives. So we calculate CPU results in float64 or complex128, which is much more accurate. Pull Request resolved: https://github.com/pytorch/pytorch/pull/61101 Reviewed By: ngimel Differential Revision: D29593483 Pulled By: mruberry fbshipit-source-id: 3df651e3cca1b0effc1a4ae29d4f26b1cb4082ed	2021-07-12 09:17:59 -07:00
gmagogsfm	a46d4212bf	Allow dims=0 in torch.tensordot call (#61331 ) Summary: In one of my previous PRs that rewrite `tensordot` implementation, I mistakenly take empty value of `dims_a` and `dims_b` as illegal values. This turns out to be not true. Empty `dims_a` and `dims_b` are supported, in fact common when `dims` is passed as an integer. This PR removes the unnecessary check. Fixes https://github.com/pytorch/pytorch/issues/61096 Pull Request resolved: https://github.com/pytorch/pytorch/pull/61331 Reviewed By: eellison Differential Revision: D29578910 Pulled By: gmagogsfm fbshipit-source-id: 96e58164491a077ddc7a1d6aa6ccef8c0c9efda2	2021-07-10 17:05:20 -07:00
Ivan Yashchuk	9dd1824741	Fix dispatch keys for eigh, lu_solve (#60945 ) Summary: I added a test to `test_ops.py` that verifies that the op can run correctly from different cuda devices. This test revealed that `linalg_eigh`, `linalg_eigvalsh`, `linalg_matrix_rank`, `linalg_pinv` were failing. `matrix_rank` and `pinv` are calling `eigh` internally. `linalg_eigh` and `lu_solve` internally use dispatch stubs, so they should be registered with `CPU, CUDA` dispatch keys. The generated code includes device guards in this case and the problem is not present. Implemented a better out variant for `eigvalsh` and registered it with `CPU, CUDA` dispatch keys. ~I added a device guard to `linalg_eigh_kernel` as a fix for `eigvalsh` function. This function needs to be registered as CompositeImplicitAutograd, because it calls `at::linalg_eigh` if `at::GradMode::is_enabled()`.~ Fixes https://github.com/pytorch/pytorch/issues/60892. Pull Request resolved: https://github.com/pytorch/pytorch/pull/60945 Reviewed By: mruberry Differential Revision: D29589580 Pulled By: ngimel fbshipit-source-id: 5851605958bdfc3a1a1768263934619449957168	2021-07-07 16:28:22 -07:00
Kurt Mohler	b39770c461	Fix degenerate shape behavior for ord=+/-2 (#60273 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/59198 Pull Request resolved: https://github.com/pytorch/pytorch/pull/60273 Reviewed By: jbschlosser Differential Revision: D29422907 Pulled By: mruberry fbshipit-source-id: 609cd640b0477f90bebca20865e34cbe182d3909	2021-06-30 02:17:26 -07:00
Aswin John Mathews	a53d7f8f7c	Remove test linalg test skips from MAGMA integration (#58232 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/55552; majority of cases in https://github.com/pytorch/pytorch/issues/51303 Tests in torch/testing/_internal/common_methods_invocations.py (tested through test_ops) cannot be fully removed, since the machines seem to be running out of gpu memory during the test, and needs further analysis Pull Request resolved: https://github.com/pytorch/pytorch/pull/58232 Reviewed By: ngimel Differential Revision: D29394021 Pulled By: malfet fbshipit-source-id: f108a70af33beec908ac1c0b58467f8744e6fe87	2021-06-25 11:44:49 -07:00
Philip Meier	0c916c8a4e	up the priority of numpy array comparisons in self.assertEqual (#59067 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/58988. Pull Request resolved: https://github.com/pytorch/pytorch/pull/59067 Reviewed By: jbschlosser Differential Revision: D28986642 Pulled By: heitorschueroff fbshipit-source-id: 3ef2d26b4010fc3519d0a1a020ea446ffeb46ba0	2021-06-22 13:07:07 -07:00
Heitor Schueroff	4caca7a15b	Improved torch.einsum testing and fixed bug (#59731 ) Summary: Improved torch.einsum testing and fixed a bug where lower case letters appeared before upper case letters in the sorted order which is inconsistent with NumPy. Pull Request resolved: https://github.com/pytorch/pytorch/pull/59731 Reviewed By: SplitInfinity, ansley Differential Revision: D29183078 Pulled By: heitorschueroff fbshipit-source-id: a33980d273707da2d60a387a2af2fa41527ddb68	2021-06-17 04:48:47 -07:00
Natalia Gimelshein	9d533ef3ac	Renorm fix (#59615 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/59584 albanD, soulitzer, `renorm` grad was completely busted. Fast gradcheck is definitely not doing its job. Pull Request resolved: https://github.com/pytorch/pytorch/pull/59615 Reviewed By: jbschlosser Differential Revision: D28964271 Pulled By: ngimel fbshipit-source-id: b6878cd24db9189b64b67eb58bd2cd8956cda78a	2021-06-08 14:59:24 -07:00
Mike Ruberry	de40c8e495	Adds remaining OpInfos and removes redundant test generators (#55558 ) Summary: Per title. Pull Request resolved: https://github.com/pytorch/pytorch/pull/55558 Reviewed By: ngimel Differential Revision: D28922522 Pulled By: mruberry fbshipit-source-id: 89cefd93788bc8aa0683f4583cf5caa81aa2dc93	2021-06-06 14:52:26 -07:00
albanD	d095ec75a1	Forward AD formulas batch 2 (#57863 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57863 Test Plan: Imported from OSS Reviewed By: zou3519 Differential Revision: D28387763 Pulled By: albanD fbshipit-source-id: e1b60ab728bb05b9e3323ee0dc7e401aaf5b8817	2021-06-03 07:33:04 -07:00
Ivan Yashchuk	e9e1bb1a4e	Fix device of info tensor for torch.linalg.inv_ex with MAGMA backend (#59223 ) Summary: This PR fixes `torch.linalg.inv_ex` with MAGMA backend. `info` tensor was returned on CPU device even for CUDA inputs. Now it's on the same device as input. Fixes https://github.com/pytorch/pytorch/issues/58769 Pull Request resolved: https://github.com/pytorch/pytorch/pull/59223 Reviewed By: ngimel Differential Revision: D28814876 Pulled By: mruberry fbshipit-source-id: f66c6f06fb8bc305cb2e22b08750a25c8888fb65	2021-06-01 21:49:57 -07:00
Natalia Gimelshein	1871d4e604	avoid explicitly casting low precision inputs to fp32 in norm (#59134 ) Summary: Per title. Now `norm` with fp16/bfloat16 inputs and fp32 outputs on cuda won't do explicit cast Pull Request resolved: https://github.com/pytorch/pytorch/pull/59134 Reviewed By: mruberry Differential Revision: D28775729 Pulled By: ngimel fbshipit-source-id: 896daa4f02e8a817cb7cb99ae8a93c02fa8dd5e9	2021-05-29 00:48:18 -07:00
Heitor Schueroff	72ae924fad	Added sublist support for torch.einsum (#56625 ) Summary: This PR adds an alternative way of calling `torch.einsum`. Instead of specifying the subscripts as letters in the `equation` parameter, one can now specify the subscripts as a list of integers as in `torch.einsum(operand1, subscripts1, operand2, subscripts2, ..., [subscripts_out])`. This would be equivalent to `torch.einsum('<subscripts1>,<subscripts2>,...,->[<subscript_out>]', operand1, operand2, ...)` TODO - [x] Update documentation - [x] Add more error checking - [x] Update tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/56625 Reviewed By: zou3519 Differential Revision: D28062616 Pulled By: heitorschueroff fbshipit-source-id: ec50ad34f127210696e7c545e4c0675166f127dc	2021-05-21 08:36:45 -07:00
Xiao Wang	691c139144	Do not use TF32 matmul in linalg and DDP tests (#56114 ) Summary: This PR does several things to relax test tolerance - Do not use TF32 in cuda matmul in test_c10d. See https://github.com/pytorch/pytorch/issues/52941. - Do not use TF32 in cuda matmul in test_linalg. Increase atol for float and cfloat. See https://github.com/pytorch/pytorch/issues/50453 The tolerance is increased because most linear algebra operators are not that stable in single precision. Pull Request resolved: https://github.com/pytorch/pytorch/pull/56114 Reviewed By: ailzhang Differential Revision: D28554467 Pulled By: ngimel fbshipit-source-id: 90416be8e4c048bedb16903b01315584d344ecdf	2021-05-20 14:01:19 -07:00
Rong Rong (AI Infra)	64d23cc040	Revert D28379394: Update internal code for torch.linalg.solve Test Plan: revert-hammer Differential Revision: D28379394 (`b0833533a7`) Original commit changeset: b47f66bc1ee1 fbshipit-source-id: c81b34f45a1d82a2b1cecc8987048fa1055203d6	2021-05-13 19:49:41 -07:00
Ivan Yashchuk	b0833533a7	Update internal code for torch.linalg.solve (#56613 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56613 Replace linalg_solve_helper with `lu_stub` + `lu_solve_stub`. Once `lu_stub` and `lu_solve_stub` have cuSOLVER-based codepath, `torch.linalg.solve` will have it as well. Test Plan: Imported from OSS Reviewed By: agolynski Differential Revision: D28379394 Pulled By: mruberry fbshipit-source-id: b47f66bc1ee12715da11dcffc92e31e67fa8c8f6	2021-05-13 16:57:29 -07:00
Ivan Yashchuk	5e65428503	Fix NumPy compatibility issue for torch.linalg.cond (#58041 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58041 The shape of the returned result was different for NumPy and PyTorch for `ord={-2, 2, None}`. Now it's fixed. Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D28405147 Pulled By: mruberry fbshipit-source-id: 30293a017a0c0a7e9e3aabd470386235fef7b6a6	2021-05-13 09:42:18 -07:00
Ivan Yashchuk	a49406b331	Fixed batched version of torch.linalg.cond for singular inputs (#58040 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58040 This PR uses `torch.linalg.inv_ex` to determine the non-invertible inputs and return the condition number of infinity for such inputs. Added OpInfo entry for `torch.linalg.cond`. Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D28405146 Pulled By: mruberry fbshipit-source-id: 524b9a38309851fa6461cb787ef3fba5aa7d5328	2021-05-13 09:42:17 -07:00
Ivan Yashchuk	c1430c3425	Add torch.linalg.inv_ex without checking for errors by default (#58039 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58039 The new function has the following signature `inv_ex(Tensor inpit, *, bool check_errors=False) -> (Tensor inverse, Tensor info)`. When `check_errors=True`, an error is thrown if the matrix is not invertible; `check_errors=False` - responsibility for checking the result is on the user. `linalg_inv` is implemented using calls to `linalg_inv_ex` now. Resolves https://github.com/pytorch/pytorch/issues/25095 Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D28405148 Pulled By: mruberry fbshipit-source-id: b8563a6c59048cb81e206932eb2f6cf489fd8531	2021-05-13 09:42:15 -07:00
lezcano	db13119fc4	Deprecate symeig (#57732 ) Summary: This one had a tricky usage of `torch.symeig` that had to be replaced. I tested the replacement locally though. Pull Request resolved: https://github.com/pytorch/pytorch/pull/57732 Reviewed By: bdhirsh Differential Revision: D28328189 Pulled By: mruberry fbshipit-source-id: 7f000fcbf2b029beabc76e5a89ff158b47977474	2021-05-12 02:21:35 -07:00
Nikita Vedeneev	c790fd2bf8	ATen lu_unpack. Required for making `torch.lu_solve` differentiable. (#46913 ) Summary: Backward methods for `torch.lu` and `torch.lu_solve` require the `torch.lu_unpack` method. However, while `torch.lu` is a Python wrapper over a native function, so its gradient is implemented via `autograd.Function`, `torch.lu_solve` is a native function, so it cannot access `torch.lu_unpack` as it is implemented in Python. Hence this PR presents a native (ATen) `lu_unpack` version. It is also possible to update the gradients for `torch.lu` so that backward+JIT is supported (no JIT for `autograd.Function`) with this function. ~~The interface for this method is different from the original `torch.lu_unpack`, so it is decided to keep it hidden.~~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/46913 Reviewed By: albanD Differential Revision: D28355725 Pulled By: mruberry fbshipit-source-id: 281260f3b6e93c15b08b2ba66d5a221314b00e78	2021-05-11 22:53:21 -07:00
Ivan Yashchuk	aaca12bcc2	Deprecate in docs torch.svd and change svd -> linalg_svd (#57981 ) Summary: This PR adds a note to the documentation that torch.svd is deprecated together with an upgrade guide on how to use `torch.linalg.svd` and `torch.linalg.svdvals` (Lezcano's instructions from https://github.com/pytorch/pytorch/issues/57549). In addition, all usage of the old svd function is replaced with a new one from torch.linalg module, except for the `at::linalg_pinv` function, that fails the XLA CI build (https://github.com/pytorch/xla/issues/2755, see failure in draft PR https://github.com/pytorch/pytorch/pull/57772). Pull Request resolved: https://github.com/pytorch/pytorch/pull/57981 Reviewed By: ngimel Differential Revision: D28345558 Pulled By: mruberry fbshipit-source-id: 02dd9ae6efe975026e80ca128e9b91dfc65d7213	2021-05-11 18:04:10 -07:00
Mike Ruberry	3c87fe9b14	Revert D28117714: [pytorch][PR] ATen lu_unpack. Required for making `torch.lu_solve` differentiable. Test Plan: revert-hammer Differential Revision: D28117714 (`5c67d8dfd3`) Original commit changeset: befd33db12ec fbshipit-source-id: 295b2134935542a903a73f90a7998239dfe6cc81	2021-05-09 23:20:06 -07:00
Ivan Yashchuk	d11cce4f5e	Add cuSOLVER path for torch.linalg.lstsq (#57317 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57317 This PR implements QR-based least squares solver using geqrf, ormqr, and triangular_solve operations. Internal code of triangular_solve was fixed to handle correctly larger sized rectangular arrays. Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D28312683 Pulled By: mruberry fbshipit-source-id: dc8ae837a5fb0685d85c8733a47d7d25dc46443a	2021-05-09 21:19:10 -07:00
Nikita Vedeneev	5c67d8dfd3	ATen lu_unpack. Required for making `torch.lu_solve` differentiable. (#46913 ) Summary: Backward methods for `torch.lu` and `torch.lu_solve` require the `torch.lu_unpack` method. However, while `torch.lu` is a Python wrapper over a native function, so its gradient is implemented via `autograd.Function`, `torch.lu_solve` is a native function, so it cannot access `torch.lu_unpack` as it is implemented in Python. Hence this PR presents a native (ATen) `lu_unpack` version. It is also possible to update the gradients for `torch.lu` so that backward+JIT is supported (no JIT for `autograd.Function`) with this function. ~~The interface for this method is different from the original `torch.lu_unpack`, so it is decided to keep it hidden.~~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/46913 Reviewed By: astaff Differential Revision: D28117714 Pulled By: mruberry fbshipit-source-id: befd33db12ecc147afacac792418b6f4948fa4a4	2021-05-09 19:12:56 -07:00
Heitor Schueroff	4cf2c646c2	Added torch.linalg.matrix_norm (#57127 ) Summary: This PR is focused on the API for `linalg.matrix_norm` and delegates computations to `linalg.norm` for the moment. The main difference between the norms is when `dim=None`. In this case - `linalg.norm` will compute a vector norm on the flattened input if `ord=None`, otherwise it requires the input to be either 1D or 2D in order to disambiguate between vector and matrix norm - `linalg.vector_norm` will flatten the input - `linalg.matrix_norm` will compute the norm over the last two dimensions, treating the input as batch of matrices In future PRs, the computations will be moved to `torch.linalg.matrix_norm` and `torch.norm` and `torch.linalg.norm` will delegate computations to either `linalg.vector_norm` or `linalg.matrix_norm` based on the arguments provided. Pull Request resolved: https://github.com/pytorch/pytorch/pull/57127 Reviewed By: mrshenli Differential Revision: D28186736 Pulled By: mruberry fbshipit-source-id: 99ce2da9d1c4df3d9dd82c0a312c9570da5caf25	2021-05-09 04:50:33 -07:00
Ivan Yashchuk	18fed3dfbe	Change name for namedtuple return of torch.linalg.svd (#57181 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57181 Documentation for torch.linalg.svd says: > The returned decomposition is a named tuple `(U, S, Vh)` The documentation is correct while the implementation was wrong. Renamed `V` -> `Vh`. `h` stands for hermitian. This is a BC-breaking change but our linalg module is beta, therefore we can do it without a deprecation notice or aliases. Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D28142162 Pulled By: mruberry fbshipit-source-id: 5e6e0ae5a63300f2db1575ca3259df381f8e1a7e	2021-05-07 15:17:43 -07:00
Ivan Yashchuk	58f32fa5fd	Remove compute_uv flag from torch.linalg.svd (#57180 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57180 We have now a separate function for computing only the singular values. `compute_uv` argument is not needed and it was decided in the offline discussion to remove it. This is a BC-breaking change but our linalg module is beta, therefore we can do it without a deprecation notice. Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D28142163 Pulled By: mruberry fbshipit-source-id: 3fac1fcae414307ad5748c9d5ff50e0aa4e1b853	2021-05-07 15:16:42 -07:00
Sam Estep	023ecc40ad	Revert D28248766: Update internal code for torch.linalg.solve Test Plan: revert-hammer Differential Revision: D28248766 (`5f2925074b`) Original commit changeset: 300366605653 fbshipit-source-id: 316b97791e57f9017d4bf87898aea8dc869cba79	2021-05-07 07:49:16 -07:00
Ivan Yashchuk	5f2925074b	Update internal code for torch.linalg.solve (#56613 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56613 Replace linalg_solve_helper with `lu_stub` + `lu_solve_stub`. Once `lu_stub` and `lu_solve_stub` have cuSOLVER-based codepath, `torch.linalg.solve` will have it as well. Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D28248766 Pulled By: mruberry fbshipit-source-id: 3003666056533d097d0ad659e0603f59fbfda9aa	2021-05-07 03:29:16 -07:00
Heitor Schueroff	1f1e2dab6b	Remove optional type for ord parameter in vector_norm (#57662 ) Summary: As per discussion here https://github.com/pytorch/pytorch/pull/57127#discussion_r624948215 Note that we cannot remove the optional type from the `dim` parameter because the default is to flatten the input tensor which cannot be easily captured by a value other than `None` ### BC Breaking Note This PR changes the `ord` parameter of `torch.linalg.vector_norm` so that it no longer accepts `None` arguments. The default behavior of `2` is equivalent to the previous default of `None`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/57662 Reviewed By: albanD, mruberry Differential Revision: D28228870 Pulled By: heitorschueroff fbshipit-source-id: 040fd8055bbe013f64d3c8409bbb4b2c87c99d13	2021-05-06 17:53:25 -07:00
Sam Estep	72ebdd68e1	Revert D28242069: Add cuSOLVER path for torch.linalg.lstsq Test Plan: revert-hammer Differential Revision: D28242069 (`7b31d4262b`) Original commit changeset: 23979d19ccc7 fbshipit-source-id: edf26a78b3485790deb1a8f53e8c8d3989c28e1b	2021-05-06 09:28:15 -07:00
Ivan Yashchuk	7b31d4262b	Add cuSOLVER path for torch.linalg.lstsq (#57317 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57317 This PR implements QR-based least squares solver using geqrf, ormqr, and triangular_solve operations. Internal code of triangular_solve was fixed to handle correctly larger sized rectangular arrays. Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D28242069 Pulled By: mruberry fbshipit-source-id: 23979d19ccc7f591afa8df4435d0db847e2d0d97	2021-05-06 04:45:55 -07:00
Ivan Yashchuk	35fab44eaf	Add CUDA support for torch.ormqr (#57316 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57316 CUDA support is implemented using cuSOLVER. Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D28242071 Pulled By: mruberry fbshipit-source-id: 6f0a1c50c21c376d2ee2907bddb618c6a600db1f	2021-05-06 04:45:54 -07:00
Ivan Yashchuk	59d794b2c3	Port CPU torch.ormqr to ATen (#57315 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57315 This PR ports `torch.ormqr` from TH to ATen. CUDA path will be implemented in a follow-up PR. With ATen port, support for complex and batched inputs is added. The tests are rewritten and OpInfo entry is added. We can implement the least squares solver with geqrf + ormqr + triangular_solve. So it's useful to have this function renewed at least for the internal code. Resolves https://github.com/pytorch/pytorch/issues/24748 Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D28242070 Pulled By: mruberry fbshipit-source-id: f070bb6ac2f5a3269b163b22f7354e9089ed3061	2021-05-06 04:44:40 -07:00
Jane Xu	76d9070d10	Replace windows CUDA 11.2 CI with 11.3 (#57223 ) Summary: Testing 11.3 with current CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/57223 Test Plan: Relevant CI (11.3) pass! Disclaimer: Skipped test_inverse_errors_large for CUDA 11.3 as it failed. Issue documented at https://github.com/pytorch/pytorch/issues/57482. Reviewed By: malfet Differential Revision: D28169393 Pulled By: janeyx99 fbshipit-source-id: 9f5cf7b6737ee6196de92bd80918a5bfbe5510ea	2021-05-04 14:23:23 -07:00
Shen Li	6bc3ad28a3	Revert D28143091: [pytorch][PR] Add cross OpInfo Test Plan: revert-hammer Differential Revision: D28143091 (`4a872f8539`) Original commit changeset: 0b98226a1811 fbshipit-source-id: eda38923f31ac5a79af5c78077ed0106d904f6da	2021-05-03 09:19:41 -07:00
Mike Ruberry	4a872f8539	Add cross OpInfo (#55483 ) Summary: One of the tasks in https://github.com/pytorch/pytorch/issues/54261. Pull Request resolved: https://github.com/pytorch/pytorch/pull/55483 Reviewed By: ngimel Differential Revision: D28143091 Pulled By: mruberry fbshipit-source-id: 0b98226a1811f61cb90d2248dd4425135a096551	2021-05-02 16:23:02 -07:00
Ivan Yashchuk	75a2a92b02	Add torch.linalg.cholesky_ex without checking for errors by default (#56724 ) Summary: The new function has the following signature `cholesky_ex(Tensor input, *, bool check_errors=False) -> (Tensor L, Tensor infos)`. When `check_errors=True`, an error is thrown if the decomposition fails; `check_errors=False` - responsibility for checking the decomposition is on the user. When `check_errors=False`, we don't have host-device memory transfers for checking the values of the `info` tensor. Rewrote the internal code for `torch.linalg.cholesky`. Added `cholesky_stub` dispatch. `linalg_cholesky` is implemented using calls to `linalg_cholesky_ex` now. Resolves https://github.com/pytorch/pytorch/issues/57032. Ref. https://github.com/pytorch/pytorch/issues/34272, https://github.com/pytorch/pytorch/issues/47608, https://github.com/pytorch/pytorch/issues/47953 Pull Request resolved: https://github.com/pytorch/pytorch/pull/56724 Reviewed By: ngimel Differential Revision: D27960176 Pulled By: mruberry fbshipit-source-id: f05f3d5d9b4aa444e41c4eec48ad9a9b6fd5dfa5	2021-05-01 18:48:27 -07:00
Ivan Yashchuk	2be115336b	Fix torch.ormqr for non Fortran-contiguous inputs (#57314 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57314 Test Plan: Imported from OSS Reviewed By: astaff Differential Revision: D28118029 Pulled By: mruberry fbshipit-source-id: e2ef65093cc5f77769adc7066c76f0607b5559a9	2021-05-01 17:50:06 -07:00
Arindam Roy	6d681d064f	ROCM: Re-enable test_norm_fro_2_equivalence_old (#57170 ) Summary: This test was disabled for ROCM 3.9. With latest updates, the test is passing in ROCM 4.1. Hence enabling this test in test/test_linalg.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/57170 Reviewed By: astaff Differential Revision: D28118217 Pulled By: mruberry fbshipit-source-id: 1b830eed944a664c3b1b3e936b87096fef0c0ca2	2021-05-01 16:41:41 -07:00
Wenlei Xie	20085f6d23	Support auto generation of device check (#56872 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56872 ghstack-source-id: 127914018 Test Plan: auto test Reviewed By: ezyang Differential Revision: D27986429 fbshipit-source-id: 0da8413b0b8e6810fcea27ed1de499f11f68bd1f	2021-05-01 12:02:09 -07:00
Sameer Deshmukh	293830bc19	Fix min() and max() for empty tensors (#52565 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/34907 Pull Request resolved: https://github.com/pytorch/pytorch/pull/52565 Reviewed By: anjali411 Differential Revision: D27999955 Pulled By: ezyang fbshipit-source-id: 30e88cc8d84806198500e3001ecf58fa764536dd	2021-04-30 15:55:10 -07:00
Ivan Yashchuk	f54aa85a6c	Fix MAGMA qr for empty batched inputs (#56257 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56257 CPU and cuSOLVER path were fixed with refactoring of `_linalg_qr_helper_default`. Resolves https://github.com/pytorch/pytorch/issues/50576 Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D27960157 Pulled By: mruberry fbshipit-source-id: f923f3067a35e65218889e64c6a886364c3d1759	2021-04-30 11:15:03 -07:00
Ivan Yashchuk	03962bc7f1	Updated linalg.lstsq with NumPy compatible kwarg rcond (#54723 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54723 Renamed "cond" -> "rcond" to be NumPy compatible. The default value for rcond was changed to match non-legacy NumPy behavior. Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D27993741 Pulled By: mruberry fbshipit-source-id: a4baf25aca6a8272f1af2f963600866bfda56fb3	2021-04-29 09:11:12 -07:00
Ivan Yashchuk	5a02f72fcf	Modified batched residuals return of torch.linalg.lstsq (#54722 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54722 SciPy and NumPy operate only on non-batched input and return an empty array with shape (0,) if rank(a) != n. The behavior for non-batched inputs is NumPy and SciPy compatible and the same result is computed. For batched inputs, if any matrix in the batch has a rank less than `n`, then an empty tensor is returned. Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D27993736 Pulled By: mruberry fbshipit-source-id: 0d7cff967b322a5e816a23f282b6ce383c4468ef	2021-04-29 09:10:12 -07:00
Heitor Schueroff	57e37080cd	Added OpInfo for torch.einsum (#56276 ) Summary: Adds OpInfo testing for torch.einsum. Pull Request resolved: https://github.com/pytorch/pytorch/pull/56276 Reviewed By: mruberry Differential Revision: D27967095 Pulled By: heitorschueroff fbshipit-source-id: 60524273d2ca885e7eeb932db3e7fd697ae5ca8e	2021-04-27 07:39:38 -07:00
Ivan Yashchuk	f84f2063b4	Port CUDA torch.geqrf to ATen (#56251 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56251 This PR ports `torch.geqrf` from TH to ATen for CUDA path. Resolves https://github.com/pytorch/pytorch/issues/24569 Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D27960155 Pulled By: mruberry fbshipit-source-id: a8b010c41d703a5de4bf40b045c89e6b95b5a5ca	2021-04-26 09:50:41 -07:00
Ivan Yashchuk	6ba9fd5963	Added "Tensor tol" overload of torch.linalg.matrix_rank (#54157 ) Summary: Currently `torch.linalg.matrix_rank` accepts only Python's float for `tol=` argument. The current behavior is not NumPy compatible and this PR adds the possibility to pass Tensor for matrix-wise tolerances. Ref. https://github.com/pytorch/pytorch/issues/42666 Pull Request resolved: https://github.com/pytorch/pytorch/pull/54157 Reviewed By: ezyang Differential Revision: D27961548 Pulled By: mruberry fbshipit-source-id: 47318eefa07a7876e6360dae089e5389b9939489	2021-04-26 09:35:40 -07:00
Ivan Yashchuk	d5ff432615	Add torch.linalg.svdvals (#56684 ) Summary: This PR adds `torch.linalg.svdvals(input, out=None)` that computes only the singular values of `input`. Resolves https://github.com/pytorch/pytorch/issues/54155. Pull Request resolved: https://github.com/pytorch/pytorch/pull/56684 Reviewed By: albanD Differential Revision: D27938229 Pulled By: mruberry fbshipit-source-id: 5ea79ad9cccf818df0fbda1f431299ebf8de3798	2021-04-25 03:42:24 -07:00
Ivan Yashchuk	58fcf77712	Port CPU torch.geqrf to ATen (#56249 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56249 This PR ports `torch.geqrf` from TH to ATen. CUDA path will be implemented in a follow-up PR. With ATen port support for complex and batched inputs is added. There were no correctness tests, they are added in this PR and I added OpInfo for this operation. We can implement the QR decomposition as a composition of geqrf and orgqr (torch.linalg.householder_product). Also we can implement the least squares solver with geqrf + ormqr + trtrs. So it's useful to have this function renewed at least for the internal code. Resolves https://github.com/pytorch/pytorch/issues/24705 Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D27907357 Pulled By: mruberry fbshipit-source-id: 94e1806078977417e7903db76eab9d578305f585	2021-04-25 01:17:00 -07:00
Heitor Schueroff	369e8bc4bc	Added support for uppercase letters in torch.einsum (#56475 ) Summary: This PR adds support for upper case letters in `torch.einsum` equation. Addresses PR https://github.com/pytorch/pytorch/pull/55013 here. Pull Request resolved: https://github.com/pytorch/pytorch/pull/56475 Reviewed By: ailzhang Differential Revision: D27948362 Pulled By: heitorschueroff fbshipit-source-id: 51cf57b17c4c23d88fab5343f17ba3bfbe3607a5	2021-04-23 08:13:58 -07:00
Kurt Mohler	1f04494c0e	Consolidate nondeterministic error tests (#55631 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/51498 Pull Request resolved: https://github.com/pytorch/pytorch/pull/55631 Reviewed By: malfet Differential Revision: D27909953 Pulled By: mruberry fbshipit-source-id: 9115b2433f9c276555be55bd51b270a7a2846829	2021-04-22 23:37:01 -07:00
Jeffrey Wan	2ea3c24c06	Disable flaky tests (#56279 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56279 Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D27916606 Pulled By: soulitzer fbshipit-source-id: 60c07024f6eb818f4aa6730a5f9ff90d7bc2b80f	2021-04-22 19:45:41 -07:00
Ivan Yashchuk	3d878dee45	Added out= variant for torch.linalg.lstsq (#54721 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54721 Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D27874711 Pulled By: mruberry fbshipit-source-id: 696ebb6eb0bad81988e9cb7a081388a3a5ab3e2c	2021-04-20 07:09:06 -07:00
Winston Smith	7513455c74	Make tensordot resize output tensor's size if out= argument is specified & make it safely cast & copy output (#56286 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/56022. Fixes https://github.com/pytorch/pytorch/issues/56316 For `torch.tensordot`, 1. `tensordot`'s out variant now resizes the output tensor provided as the `out` argument if necessary. 2. Added a check to verify if the output tensor provided as the argument for `out` is on the same device as the input tensors. 3. Added a check to verify if the dtype of the result is castable to the dtype of the output tensor provided as an argument for `out`. 4. Because of (2) & (3), `tensordot`'s out variant now [safely casts & copies output](https://github.com/pytorch/pytorch/wiki/Developer-FAQ#how-does-out-work-in-pytorch). 5. `test_tensordot` in `test_linalg.py` had a bug - the output tensor wasn't being defined to be on the same device as the input tensors. It was fixed by simply using a `device` argument in its definition. 6. Added an `OpInfo` for `tensordot` and modified the `OpInfo` for `inner`. cc heitorschueroff mruberry Pull Request resolved: https://github.com/pytorch/pytorch/pull/56286 Reviewed By: ngimel Differential Revision: D27845980 Pulled By: mruberry fbshipit-source-id: 134ab163f05c31a6900dd65aefc745803019e037	2021-04-19 04:20:21 -07:00
Kurt Mohler	a3a75bd35e	Add complex autograd support for `torch.cross` (#55854 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/53512 Pull Request resolved: https://github.com/pytorch/pytorch/pull/55854 Reviewed By: nikithamalgifb Differential Revision: D27737571 Pulled By: anjali411 fbshipit-source-id: 38165b952cc4c9213d61c7d98b549b984c154927	2021-04-15 15:07:25 -07:00
Mike Ruberry	399b66c813	Ports logdet from method_tests() to op_db (#55743 ) Summary: Per title. Also updates some tensor construction helpers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/55743 Reviewed By: ngimel Differential Revision: D27702060 Pulled By: mruberry fbshipit-source-id: f64b7bee855733ad1f4fd182819ceec5831d9878	2021-04-11 20:39:16 -07:00
Yukio Siraichi	93bf0ae6fc	Remove legacy constructor calls from pytorch codebase. (#54142 ) Summary: Follow up from https://github.com/pytorch/pytorch/issues/53889 Related to https://github.com/pytorch/pytorch/issues/47112 Removing every occurrence of the legacy constructor call present in PyTorch at: - _docs_ - _benchmarks_ - _test_ - _caffe2_ - _CONTRIBUTING.md_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/54142 Reviewed By: ngimel Differential Revision: D27699450 Pulled By: mruberry fbshipit-source-id: 530aa3f5746cc8bc1407d5d51b2bbd8075e30546	2021-04-11 15:45:17 -07:00
Arindam Roy	0dff0d1537	[ROCM] Disable few tests for Magma (#55534 ) Summary: After MAGMA has been enabled, around 5k new tests are running now. Out of these 5 tests (each having 4 datatypes) are failing on the latest ROCM CI with Rocm 4.1. Disabling these tests for now so the ROCM CI does not fail. Pull Request resolved: https://github.com/pytorch/pytorch/pull/55534 Reviewed By: ZolotukhinM Differential Revision: D27630085 Pulled By: malfet fbshipit-source-id: c48d124e6a2b4a4f3c6c4b6ac2bdf6c214f325c7	2021-04-07 22:22:43 -07:00
Nikita Shulga	add49e7e4e	Enforce PEP263 for PyTorch python codebase (#55346 ) Summary: All python files containing non-ASCII characters should be correctly annotated with `# -- coding: utf-8 --` comment Delete number of superfluous UTF-8 characters, most commonly UTF-8 opening closing quotation mark U+2019 (’) instead of ascii apostrophe ', for example `Module’s`->`Module's` Pull Request resolved: https://github.com/pytorch/pytorch/pull/55346 Reviewed By: samestep Differential Revision: D27582044 Pulled By: malfet fbshipit-source-id: c1cd89655915858ff3a41f675cdfffff795a8e44	2021-04-06 18:31:38 -07:00
Ivan Yashchuk	84d18727bd	Added linalg.eig, linalg.eigvals (#52491 ) Summary: This PR adds `torch.linalg.eig`, and `torch.linalg.eigvals` for NumPy compatibility. MAGMA uses a hybrid CPU-GPU algorithm and doesn't have a GPU interface for the non-symmetric eigendecomposition. It means that it forces us to transfer inputs living in GPU memory to CPU first before calling MAGMA, and then transfer results from MAGMA to CPU. That is rather slow for smaller matrices and MAGMA is faster than CPU path only for matrices larger than 3000x3000. Unfortunately, there is no cuSOLVER function for this operation. Autograd support for `torch.linalg.eig` will be added in a follow-up PR. Ref https://github.com/pytorch/pytorch/issues/42666 Pull Request resolved: https://github.com/pytorch/pytorch/pull/52491 Reviewed By: anjali411 Differential Revision: D27563616 Pulled By: mruberry fbshipit-source-id: b42bb98afcd2ed7625d30bdd71cfc74a7ea57bb5	2021-04-06 13:53:26 -07:00
Heitor Schueroff	d98072b027	Deprecate torch.chain_matmul in favor of torch.linalg.multi_dot (#53453 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53453 Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D27406282 Pulled By: heitorschueroff fbshipit-source-id: b6e715d1b88e0613ee6b6208cb28ba4757e31717	2021-04-01 04:50:51 -07:00
Heitor Schueroff	5d68b3695c	[Relanding] Implemented torch.linalg.multi_dot (#52859 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52859 This reverts commit `92a4ee1cf6`. Added support for bfloat16 for CUDA 11 and removed fast-path for empty input tensors that was affecting autograd graph. Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D27402390 Pulled By: heitorschueroff fbshipit-source-id: 73c5ccf54f3da3d29eb63c9ed3601e2fe6951034	2021-04-01 04:49:05 -07:00
Ivan Yashchuk	854c92078a	Fixed the default size of the workspace array for MAGMA's SVD (#54875 ) Summary: The problem was that MAGMA might not set the value for the optimal size of the workspace array leaving it uninitialized. This is fixed by setting the default value for `wkopt` variable. Fixes https://github.com/pytorch/pytorch/issues/54381 and https://github.com/pytorch/pytorch/issues/53976. Pull Request resolved: https://github.com/pytorch/pytorch/pull/54875 Reviewed By: H-Huang Differential Revision: D27437702 Pulled By: mruberry fbshipit-source-id: bf61555abc4c50e8ef2dae933df24ce4d4fe4527	2021-03-30 19:28:06 -07:00
anjali411	7c8b0f2600	Test torch.chain_matmul for complex dtype (#54885 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54885 Test Plan: Imported from OSS Reviewed By: zou3519 Differential Revision: D27400936 Pulled By: anjali411 fbshipit-source-id: 415d843d7c55f4d84a8e9faab926a4895e1544d0	2021-03-29 13:37:23 -07:00
Edward Yang	1f36ce6e4d	Restore storage on meta tensors; increase meta coverage (#53973 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53973 Two parts to this PR; I had to put them together because adding support for X causes more test code to be exercised, which in turn may require a fix for Y. The first part is restoring the concept of storage to meta tensors. Previously, meta tensors had a nullptr storage (e.g., `meta_tensor.storage()` is an error.) As I was increasing the coverage of meta tensors, I started running into test cases (specifically memory overlap tests) that were failing because not having storage meant I couldn't check for memory overlap. After some discussion, we decided that it would make sense for meta tensors to model this as well (we already model strides, so getting accurate view information also seems useful). This PR does that by: * Rewrite all of the factory functions in MetaTensor.cpp to use the generic versions (which are very carefully written to not actually poke at the data pointer, so everything works out). The key idea here is we give meta tensors a special allocator, MetaAllocator, which always returns a nullptr even if you ask for a nonzero number of bytes. resize_ is also made generic; the normal variant can be used directly rather than having to instruct it to avoid resizing storage * Turn on memory overlap checking in TensorIterator even for meta tensors * Although meta tensors now have storage, the concept of meta storage is NOT exposed to Python land (as it would imply I would have to codegen MetaFloatStorage, MetaDoubleStorage, etc. classes). So `x.storage()` still raises an error and I have a cludge in `__deepcopy__` to break storage sharing upon deep copy (this is wrong, but no tests exercise this at the moment). The second part is adding more support for the most used functions in the test suite. * Inplace operations have very simple meta functions. I added `fill_`, `zero_`, `random_`, `uniform_` and `normal_`. In the case of random, I take advantage of pbelevich's templates for defining random kernels, so that I can reuse the common scaffolding, and then just register a noop stub that actually does the RNG. (Look, another structured kernels tiny variant!) * `copy_` is now implemented. Copying into a meta tensor is always OK, but copying out of a meta tensor raises an error (as we don't know what the "correct" data to copy out is in this case) * `empty_strided` usage from structured kernels now is implemented (TBH, this could have been done as soon as `empty_strided` was added) * Meta was missing in a few places in TensorOptions/DispatchKey utility functions, so I added them * Autograd engine now correctly homes meta tensors with CPU tensors (they have -1 device index so CUDA queues wouldn't work anyway) * `apply_`, `map_` and `map2_` are special cased to no-op on meta tensor self. These count as inplace operations too but they are implemented a little differently. Getting more meta function support triggers a number of bugs in the test suite, which I then fix: - Linear algebra functions sometimes don't report NotImplementedError because they get swallowed by catch all try blocks. This is tracked in https://github.com/pytorch/pytorch/issues/53739 - dlpack obviously doesn't work with meta tensors, I just disabled the test Signed-off-by: Edward Z. Yang <ezyang@fb.com> Differential Revision: D27036572 Test Plan: Imported from OSS Reviewed By: agolynski, bdhirsh Pulled By: ezyang fbshipit-source-id: 7005ecf4feb92a643c37389fdfbd852dbf00ac78	2021-03-29 08:37:46 -07:00
Heitor Schueroff	f9e7f132fb	Added torch.linalg.matrix_power (#52608 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52608 TODO - [x] Add OpInfo - [x] Update documentation - [x] Add more tests and compare against NumPy Test Plan: Imported from OSS Reviewed By: bdhirsh Differential Revision: D27261532 Pulled By: heitorschueroff fbshipit-source-id: c1e4ab297da3683f6d5751be8790602f9dc37b6b	2021-03-23 15:10:06 -07:00
Mike Ruberry	544a996f83	Revert D27155845: [pytorch][PR] Fixed the size of the workspace array in functions calling MAGMA Test Plan: revert-hammer Differential Revision: D27155845 (`04a2506091`) Original commit changeset: 04439bfa82a5 fbshipit-source-id: f45967e94883effbb43d8d0a019596f1f82caa56	2021-03-19 08:27:18 -07:00
Ivan Yashchuk	04a2506091	Fixed the size of the workspace array in functions calling MAGMA (#54009 ) Summary: The size of the workspace arrays should not be less than 1. This PR fixes lstsq calls to LAPACK and MAGMA. Also `max(1, ...)` guards were added to a few other functions (symeig, svd). ROCm testing is enabled for lstsq, pinv, pinverse. Fixes https://github.com/pytorch/pytorch/issues/53976 Pull Request resolved: https://github.com/pytorch/pytorch/pull/54009 Reviewed By: ejguan Differential Revision: D27155845 Pulled By: mruberry fbshipit-source-id: 04439bfa82a5bdbe2297a6d62b6e68ba1c30e4a2	2021-03-18 10:07:45 -07:00
Kurt Mohler	382a47b493	Add torch.linalg.vector_norm function (#51099 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/50214 Pull Request resolved: https://github.com/pytorch/pytorch/pull/51099 Reviewed By: agolynski Differential Revision: D27147360 Pulled By: mruberry fbshipit-source-id: 1056f840e7027ad81971c9d1a9f952ab9648f1b5	2021-03-18 06:41:39 -07:00
Ivan Yashchuk	564456ac44	Added autograd support for torch.orgqr (#52637 ) Summary: This PR adds autograd support for `torch.orgqr`. Since `torch.orgqr` is one of few functions that expose LAPACK's naming and all other linear algebra routines were renamed a long time ago, I also added a new function with a new name and `torch.orgqr` now is an alias for it. The new proposed name is `householder_product`. For a matrix `input` and a vector `tau` LAPACK's orgqr operation takes columns of `input` (called Householder vectors or elementary reflectors) scalars of `tau` that together represent Householder matrices and then the product of these matrices is computed. See https://www.netlib.org/lapack/lug/node128.html. Other linear algebra libraries that I'm aware of do not expose this LAPACK function, so there is some freedom in naming it. It is usually used internally only for QR decomposition, but can be useful for deep learning tasks now when it supports differentiation. Resolves https://github.com/pytorch/pytorch/issues/50104 Pull Request resolved: https://github.com/pytorch/pytorch/pull/52637 Reviewed By: agolynski Differential Revision: D27114246 Pulled By: mruberry fbshipit-source-id: 9ab51efe52aec7c137aa018c7bd486297e4111ce	2021-03-18 05:42:18 -07:00
Edward Yang	c2f41b6b84	Add meta device to generic device testing framework, skip NotImplementedError (#53682 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53682 With this, under the meta device, 101 tests passed and 16953 skipped. It ain't much, but it's a start. Some various bits and bobs: - NotImplementedError suppression at test level is implemented in the same way as CUDA memory leak check, i.e., by wrapping test methods and monkeypatching them back in. - I had to reimplement assertRaises/assertRaisesRegex from scratch to ignore NotImplementedError when _ignore_not_implemented_error is True. The implementation relies on a small amount of private API that hasn't changed since 2010 - expectedAlertNondeterministic doesn't really work so I skipped them all; there's probably a way to do it better I tested this using `pytest --disable-warnings --tb=native -k meta --sw test/*.py` and a pile of extra patches to make collection actually work (lol). Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D26955539 Pulled By: ezyang fbshipit-source-id: ac21c8734562497fdcca3b614a28010bc4c03d74	2021-03-14 20:41:19 -07:00
Mike Ruberry	319ab58e27	Skips test_linalg_lstsq on ROCm (#53977 ) Summary: This test is flaky (tracked in https://github.com/pytorch/pytorch/issues/53976). This PR skips it to let the rest of the ROCm CI run. cc nikitaved Pull Request resolved: https://github.com/pytorch/pytorch/pull/53977 Reviewed By: ngimel Differential Revision: D27036705 Pulled By: mruberry fbshipit-source-id: 5bae741fd2a68f23717cb3a7c8b73e97cfb23b5c	2021-03-14 05:42:39 -07:00
Ivan Yashchuk	7df176b1f9	Added OpInfo-based testing of some linalg functions (#51107 ) Summary: Added OpInfo-based testing of the following linear algebra functions: * cholesky, linalg.cholesky * linalg.eigh * inverse, linalg.inv * qr, linalg.qr * solve The output of `torch.linalg.pinv` for empty inputs was not differentiable, now it's fixed. In some cases, batched grad checks are disabled because it doesn't work well with 0x0 matrices (see https://github.com/pytorch/pytorch/issues/50743#issuecomment-767376085). Ref. https://github.com/pytorch/pytorch/issues/50006 Pull Request resolved: https://github.com/pytorch/pytorch/pull/51107 Reviewed By: albanD Differential Revision: D27006115 Pulled By: mruberry fbshipit-source-id: 3c1d00e3d506948da25d612fb114e6d4a478c5b1	2021-03-14 01:10:02 -08:00
Mike Ruberry	d46978cc55	Refines test_orgqr_* skip (#53975 ) Summary: https://github.com/pytorch/pytorch/pull/51348 added CUDA support for orgqr but only a cuSOLVER path; the orgqr tests, however, were marked to run on builds with either MAGMA or cuSOLVER. This PR addresses the issue by creating a skipCUDAIfNoCusolver decator and applying to the orgqr tests. It triggers ci-all because our CI build with MAGMA but no cuSOLVER is CUDA 9.2, which does run in the typical PR CI. cc IvanYashchuk Pull Request resolved: https://github.com/pytorch/pytorch/pull/53975 Reviewed By: ngimel Differential Revision: D27036683 Pulled By: mruberry fbshipit-source-id: f6c0a3e526bde08c44b119ed2ae5d51fee27e283	2021-03-14 00:41:26 -08:00
Ivan Yashchuk	fe08671756	Added cuBLAS path for torch.triangular_solve (#53147 ) Summary: This PR adds the cuBLAS based path for `torch.triangular_solve` The device dispatching helper function was removed from native_functions.yml, it is replaced with DECLARE/DEFINE_DISPATCH. `magmaTriangularSolve` is removed and replaced with cuBLAS calls, this is not a BC-breaking change because internally MAGMA just calls the same cuBLAS function and doesn't do anything else. Batched cuBLAS is faster than batched MAGMA for matrices of size up until 512x512, after that MAGMA is faster. For batches smaller than ~8 and matrix sizes larger than 64x64 a forloop of cuBLAS calls is faster than batched version. Ref. https://github.com/pytorch/pytorch/issues/47953 Pull Request resolved: https://github.com/pytorch/pytorch/pull/53147 Reviewed By: heitorschueroff Differential Revision: D27007416 Pulled By: mruberry fbshipit-source-id: ddfc190346e6a56b84145ed0a9af67ca9cde3506	2021-03-12 13:38:42 -08:00
Nikita Vedeneev	afa1ff8e04	Implements `torch.linalg.lstsq` (#49093 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/44378 by providing a wider range of drivers similar to what SciPy is doing. The supported CPU drivers are `gels, gelsy, gelsd, gelss`. The CUDA interface has only `gels` implemented but only for overdetermined systems. The current state of this PR: - [x] CPU interface - [x] CUDA interface - [x] CPU tests - [x] CUDA tests - [x] Memory-efficient batch-wise iteration with broadcasting which fixes https://github.com/pytorch/pytorch/issues/49252 - [x] docs Pull Request resolved: https://github.com/pytorch/pytorch/pull/49093 Reviewed By: albanD Differential Revision: D26991788 Pulled By: mruberry fbshipit-source-id: 8af9ada979240b255402f55210c0af1cba6a0a3c	2021-03-12 13:25:55 -08:00
Nikita Vedeneev	8f15a2f052	eig_backward: faster and with complex support (#52875 ) Summary: As per title. Compared to the previous version, it is lighter on the usage of `at::solve` and `at::matmul` methods. Fixes https://github.com/pytorch/pytorch/issues/51621 Pull Request resolved: https://github.com/pytorch/pytorch/pull/52875 Reviewed By: mrshenli Differential Revision: D26768653 Pulled By: anjali411 fbshipit-source-id: aab141968d02587440128003203fed4b94c4c655	2021-03-10 11:33:30 -08:00
Ivan Yashchuk	e937db5dba	Added CUDA support for torch.orgqr (#51348 ) Summary: Update: MAGMA support was dropped from this PR. Only the cuSOLVER path is implemented and it's used exclusively. Original PR message: This PR adds support for CUDA inputs for `torch.orgqr`. CUDA implementation is based on both [cuSOLVER](https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverDN-lt-t-gt-orgqr) and MAGMA. cuSOLVER doesn't have a specialized routine for the batched case. While MAGMA doesn't have a specialized GPU native (without CPU sync) `orgqr`. But MAGMA has implemented (and not documented) the batched GPU native version of `larft` function (for small inputs of size <= 32), which together with `larfb` operation form `orgqr` (see the call graph [here at the end of the page](http://www.netlib.org/lapack/explore-html/da/dba/group__double_o_t_h_e_rcomputational_ga14b45f7374dc8654073aa06879c1c459.html)). So now there are two main codepaths for CUDA inputs (if both MAGMA and cuSOLVER are available): * if `batchsize > 1` and `tau.shape[-1] <= 32` then MAGMA based function is called * else [cuSOLVER's `orgqr`](https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverDN-lt-t-gt-orgqr) is used. If MAGMA is not available then only cuSOLVER is used and vice versa. Documentation updates and possibly a new name for this function will be in a follow-up PR. Ref. https://github.com/pytorch/pytorch/issues/50104 Pull Request resolved: https://github.com/pytorch/pytorch/pull/51348 Reviewed By: heitorschueroff Differential Revision: D26882415 Pulled By: mruberry fbshipit-source-id: 9f91ff962921932777ff108bedc133b55fe22842	2021-03-10 09:59:56 -08:00
mattip	54a2498919	Modify tests to use assertWarnsOnceRegex instead of maybeWarnsRegex (#52387 ) Summary: Related to https://github.com/pytorch/pytorch/issues/50006 Follow on for https://github.com/pytorch/pytorch/issues/48560 to ensure TORCH_WARN_ONCE warnings are caught. Most of this is straight-forward find-and-replace, but I did find one place where the TORCH_WARN_ONCE warning was not wrapped into a python warning. Pull Request resolved: https://github.com/pytorch/pytorch/pull/52387 Reviewed By: albanD Differential Revision: D26773387 Pulled By: mruberry fbshipit-source-id: 5be7efbc8ab4a32ec8437c9c45f3b6c3c328f5dd	2021-03-08 03:32:14 -08:00
Peter Bell	5ebfabb310	MAGMA: Initialize ipiv data to avoid internal memory access violation (#53064 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/51930 Running the reproducer under `cuda-gdb`, I see access violations in either [`zswap_kernel_batched`](`4fd4634f35/magmablas/zgetf2_kernels.cu (lines-276)`) (part of the LU factorization) and other times in [`zlaswp_columnserial_kernel`](`4fd4634f35/magmablas/zlaswp_batched.cu (lines-335)`) (part of the inverse). The common factor between both of these is they use `ipiv` to index into the matrix. My best guess is the `ipiv` indices aren't written when the factorization fails, hence garbage data is used as matrix indices and we get an access violation. Initializing `ipiv` to a known-good value before the factorization fixes the issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/53064 Reviewed By: zhangguanheng66 Differential Revision: D26829053 Pulled By: heitorschueroff fbshipit-source-id: 842854a6ee182f20b2acad0d76d32d27cb51b061	2021-03-05 08:59:27 -08:00
Kyle Chen	bf5e5bf901	[ROCm] Enable test in test_linalg.py, test_optim.py and test_vmap.py … (#52818 ) Summary: Enable test in test_linalg.py, test_optim.py, and test_vmap.py for ROCm because they are passing. Signed-off-by: Kyle Chen <kylechen@amd.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/52818 Reviewed By: H-Huang Differential Revision: D26694091 Pulled By: mruberry fbshipit-source-id: 285d17aa7f271f4d94b5fa9d9f6620de8a70847b	2021-03-04 02:29:45 -08:00
Mike Ruberry	9c2673df46	Revert D26723384: [pytorch][PR] Implements `torch.linalg.lstsq` Test Plan: revert-hammer Differential Revision: D26723384 (`3ac9013235`) Original commit changeset: c9866a95f140 fbshipit-source-id: 3e5263d71facdc91ca09d7dcbbbe3ba818ee2821	2021-03-03 15:24:25 -08:00
Mike Ruberry	20860ab01a	Revert D26727918: [pytorch][PR] Added CUDA support for torch.orgqr Test Plan: revert-hammer Differential Revision: D26727918 (`e29d8477a6`) Original commit changeset: 1c4d15fa76ba fbshipit-source-id: f3d5d6811ab77332a333cd165d69fcd9ecd92dc6	2021-03-03 10:06:49 -08:00
Ivan Yashchuk	926e011cde	Fixed out= variant of linalg.solve (#51968 ) Summary: This PR modifies the behavior of the `linalg_solve_out` variant to match the description here https://github.com/pytorch/pytorch/wiki/Developer-FAQ#how-does-out-work-in-pytorch With this PR result and input tensors must be on the same device and have the same "type kind". It's allowed to pass out tensors with complex dtypes for float inputs. `linalg_solve_out` was broken for batched vector inputs and it's now fixed. Ref. https://github.com/pytorch/pytorch/issues/42666 Pull Request resolved: https://github.com/pytorch/pytorch/pull/51968 Reviewed By: H-Huang Differential Revision: D26728825 Pulled By: mruberry fbshipit-source-id: c06fe937e7f452193b23ba09ca6cfa2703488455	2021-03-02 22:33:19 -08:00
Ivan Yashchuk	e29d8477a6	Added CUDA support for torch.orgqr (#51348 ) Summary: This PR adds support for CUDA inputs for `torch.orgqr`. CUDA implementation is based on both [cuSOLVER](https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverDN-lt-t-gt-orgqr) and MAGMA. cuSOLVER doesn't have a specialized routine for the batched case. While MAGMA doesn't have a specialized GPU native (without CPU sync) `orgqr`. But MAGMA has implemented (and not documented) the batched GPU native version of `larft` function (for small inputs of size <= 32), which together with `larfb` operation form `orgqr` (see the call graph [here at the end of the page](http://www.netlib.org/lapack/explore-html/da/dba/group__double_o_t_h_e_rcomputational_ga14b45f7374dc8654073aa06879c1c459.html)). So now there are two main codepaths for CUDA inputs (if both MAGMA and cuSOLVER are available): * if `batchsize > 1` and `tau.shape[-1] <= 32` then MAGMA based function is called * else [cuSOLVER's `orgqr`](https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverDN-lt-t-gt-orgqr) is used. If MAGMA is not available then only cuSOLVER is used and vice versa. Documentation updates and possibly a new name for this function will be in a follow-up PR. Ref. https://github.com/pytorch/pytorch/issues/50104 Pull Request resolved: https://github.com/pytorch/pytorch/pull/51348 Reviewed By: ngimel Differential Revision: D26727918 Pulled By: mruberry fbshipit-source-id: 1c4d15fa76ba624e341a69a32337a9a16cc01013	2021-03-02 21:34:23 -08:00
Nikita Vedeneev	3ac9013235	Implements `torch.linalg.lstsq` (#49093 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/44378 by providing a wider range of drivers similar to what SciPy is doing. The supported CPU drivers are `gels, gelsy, gelsd, gelss`. The CUDA interface has only `gels` implemented but only for overdetermined systems. The current state of this PR: - [x] CPU interface - [x] CUDA interface - [x] CPU tests - [x] CUDA tests - [x] Memory-efficient batch-wise iteration with broadcasting which fixes https://github.com/pytorch/pytorch/issues/49252 - [x] docs Pull Request resolved: https://github.com/pytorch/pytorch/pull/49093 Reviewed By: H-Huang Differential Revision: D26723384 Pulled By: mruberry fbshipit-source-id: c9866a95f14091955cf42de22f4ac9e2da009713	2021-03-02 19:00:07 -08:00
Ivan Yashchuk	870bac13bc	Fixed out= variant of linalg.inv (#51977 ) Summary: This PR modifies the behavior of the `linalg_inv_out` variant to match the description here https://github.com/pytorch/pytorch/wiki/Developer-FAQ#how-does-out-work-in-pytorch With this PR result and input tensors must be on the same device and have the same "type kind". It's allowed to pass out tensors with complex dtypes for float inputs. Ref. https://github.com/pytorch/pytorch/issues/42666 Pull Request resolved: https://github.com/pytorch/pytorch/pull/51977 Reviewed By: H-Huang Differential Revision: D26725718 Pulled By: mruberry fbshipit-source-id: 2acc2a311328268706ce27ce060fc88fc7416753	2021-03-02 18:45:29 -08:00
Luca Wehrstedt	92a4ee1cf6	Revert D26375734: Implemented torch.linalg.multi_dot Test Plan: revert-hammer Differential Revision: D26375734 (`0396f492b9`) Original commit changeset: 839642692424 fbshipit-source-id: cb64db646010128d802e1930d5e9526c1f7aa6a2	2021-02-25 00:43:57 -08:00
Heitor Schueroff	0396f492b9	Implemented torch.linalg.multi_dot (#51807 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51807 Implemented torch.linalg.multi_dot similar to [numpy.linalg.multi_dot](https://numpy.org/doc/stable/reference/generated/numpy.linalg.multi_dot.html). This function does not support broadcasting or batched inputs at the moment. NOTE numpy.linalg.multi_dot allows the first and last tensors to have more than 2 dimensions despite their docs stating these must be either 1D or 2D. This PR diverges from NumPy in that it enforces this restriction. TODO - [ ] Benchmark against NumPy - [x] Add OpInfo testing - [x] Remove unnecessary copy for out= argument Test Plan: Imported from OSS Reviewed By: nikithamalgifb Differential Revision: D26375734 Pulled By: heitorschueroff fbshipit-source-id: 839642692424c4b1783606c76dd5b34455368f0b	2021-02-24 15:32:30 -08:00
Ivan Yashchuk	7ca9776874	Fixed _out variants of linear algebra functions (#51560 ) Summary: This PR modifies the behavior of `_out` variants to match the description here https://github.com/pytorch/pytorch/wiki/Developer-FAQ#how-does-out-work-in-pytorch With this PR result and input tensors must be on the same device and have the same "type kind". I skipped `qr` and `eig` in this process as they require a bit more work. Functions that can use the provided storage directly do so. If `result` is not empty and not in the batched column-major format or does not have the same type as input then we have to allocate a temporary tensor and copy it. TODO: - [x] Add more tests for same device and valid safe dtype - [x] Move inv and solve changes to separate PRs https://github.com/pytorch/pytorch/pull/51968, https://github.com/pytorch/pytorch/pull/51977 Ref. https://github.com/pytorch/pytorch/issues/42666 Pull Request resolved: https://github.com/pytorch/pytorch/pull/51560 Reviewed By: albanD Differential Revision: D26400734 Pulled By: heitorschueroff fbshipit-source-id: a6201ed7e919c1670c6ff3ef60217d1dbfb72e67	2021-02-19 04:03:35 -08:00
Jeff Daily	70a805a286	[ROCm] skip one more magma test that is flaky (#52064 ) Summary: Skipped hipMAGMA tests are tracked in https://github.com/pytorch/pytorch/issues/51303. Pull Request resolved: https://github.com/pytorch/pytorch/pull/52064 Reviewed By: albanD Differential Revision: D26406745 Pulled By: walterddr fbshipit-source-id: 2405ea06e03450eb22177c2c8b12a366cfbdaa93	2021-02-11 14:02:52 -08:00
Jeff Daily	5dd1568aa3	[ROCm] skip more magma tests (#51915 ) Summary: Additional magma tests have been identified as failing after integrating hipMAGMA into the ROCm builds. Skipping is necessary until they can be fixed properly. This is blocking migration of ROCm CI to 4.0.1. Pull Request resolved: https://github.com/pytorch/pytorch/pull/51915 Reviewed By: izdeby Differential Revision: D26326404 Pulled By: malfet fbshipit-source-id: 558cce66f216f404c0316ab036e2e5637fc99798	2021-02-09 09:14:42 -08:00
Jeff Daily	d02ea9a141	[ROCm] add hipMAGMA support (#51238 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/48831. - CI image is updated to build hipMAGMA from source and set env MAGMA_HOME. - CMake is updated to separate different requirements for CUDA versus ROCm MAGMA. - Some unit tests that become enabled with MAGMA are currently skipped for ROCm due to failures. Fixing these failures will be follow-on work. Pull Request resolved: https://github.com/pytorch/pytorch/pull/51238 Reviewed By: ngimel Differential Revision: D26184918 Pulled By: malfet fbshipit-source-id: ada632f1ae7b413e8cae6543fe931dcd46985821	2021-02-01 22:09:33 -08:00
Ivan Yashchuk	5e09ec6518	Fixed SVD ignoring "some/full_matrices" flag for empty inputs (#51109 ) Summary: For empty inputs `torch.svd` (and `torch.linalg.svd`) was returning incorrect results for `some=True` (`full_matrices=False`). Behaviour on master branch: ```python In [1]: import torch In [2]: a = torch.randn(0, 7) In [3]: a.svd() Out[3]: torch.return_types.svd( U=tensor([], size=(0, 0)), S=tensor([]), V=tensor([[0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0.]])) In [4]: a.svd(some=False) Out[4]: torch.return_types.svd( U=tensor([], size=(0, 0)), S=tensor([]), V=tensor([[0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0.]])) ``` `some` flag is ignored and 7x7 `V` matrix is returned in both cases. `V` should have 7x0 shape when `some=True`. This PR fixes that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/51109 Reviewed By: ngimel Differential Revision: D26170897 Pulled By: mruberry fbshipit-source-id: 664c09ca27bb375fabef2a046d0a09ca57b01aac	2021-02-01 21:51:58 -08:00
Ivan Yashchuk	30675d0921	Added OpInfo-based testing of triangular_solve (#50948 ) Summary: Added OpInfo-based testing of `torch.triangular_solve`. These tests helped to discover that CPU `triangular_solve` wasn't working for empty matrices and for CUDA inputs a warning was printed to the terminal. It is fixed now. CUDA gradgrad checks are skipped. ``` 11.44s call test/test_ops.py::TestGradientsCUDA::test_fn_gradgrad_triangular_solve_cuda_complex128 2.97s call test/test_ops.py::TestGradientsCUDA::test_fn_gradgrad_triangular_solve_cuda_float64 1.60s call test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_triangular_solve_cpu_complex128 1.36s call test/test_ops.py::TestOpInfoCUDA::test_supported_dtypes_triangular_solve_cuda_complex128 1.20s call test/test_ops.py::TestGradientsCUDA::test_fn_grad_triangular_solve_cuda_complex128 0.86s call test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_triangular_solve_cuda_complex64 0.85s call test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_triangular_solve_cuda_complex128 0.81s call test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_triangular_solve_cuda_float64 0.77s call test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_triangular_solve_cuda_float32 0.46s call test/test_ops.py::TestCommonCPU::test_variant_consistency_jit_triangular_solve_cpu_complex128 0.44s call test/test_ops.py::TestCommonCPU::test_variant_consistency_jit_triangular_solve_cpu_complex64 0.44s call test/test_ops.py::TestGradientsCUDA::test_fn_grad_triangular_solve_cuda_float64 0.42s call test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_triangular_solve_cpu_float64 0.40s call test/test_ops.py::TestCommonCPU::test_variant_consistency_jit_triangular_solve_cpu_float32 0.40s call test/test_ops.py::TestCommonCPU::test_variant_consistency_jit_triangular_solve_cpu_float64 0.17s call test/test_ops.py::TestGradientsCPU::test_fn_grad_triangular_solve_cpu_complex128 ``` Ref. https://github.com/pytorch/pytorch/issues/50006 Pull Request resolved: https://github.com/pytorch/pytorch/pull/50948 Reviewed By: ailzhang Differential Revision: D26123998 Pulled By: mruberry fbshipit-source-id: 54136e8fc8a71f107dddb692c5be298c6d5ed168	2021-01-29 10:31:07 -08:00

... 3 4 5 6 7 ...

529 Commits