Fixes #102678. Fixes #102629. Fixes #102558.
hipSOLVER performance on ROCm 5.4.2 and later is no longer a massive bottleneck. Additionally, using MAGMA on ROCm in this case caused test_compare_cpu_linalg_pinv_singular_cuda_float32 to fail; with hipSOLVER, the test now passes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103540
Approved by: https://github.com/lezcano
Enables the hipSOLVER backend for ROCm builds
--------------------------------------------------------------------------
- Minimum ROCm version requirement: 5.3
- Introduces a new macro USE_LINALG_SOLVER that controls enablement of both cuSOLVER and hipSOLVER
- Adds the hipSOLVER API to the hipification process
- Combines hipSOLVER and hipSPARSE mappings into a single SPECIAL map that takes priority over the normal mappings
- Torch APIs to be moved to the hipSOLVER backend (as opposed to MAGMA) include torch.svd(), torch.geqrf(), torch.orgqr(), and torch.ormqr() (see the sketch after this list)
- Will enable 100+ linalg unit tests for ROCm
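As a hedged illustration (not part of the PR), the ops listed above can be exercised on a ROCm build roughly as follows; on ROCm the `"cuda"` device string maps to HIP, and with this change these calls dispatch to hipSOLVER rather than MAGMA.
```python
import torch

# Sketch only: assumes a ROCm (or CUDA) build with a visible GPU.
if torch.cuda.is_available():
    A = torch.randn(128, 64, device="cuda")

    # QR in compact form, then materialize Q explicitly.
    a, tau = torch.geqrf(A)
    Q = torch.orgqr(a, tau)

    # Apply Q to another matrix without forming it explicitly.
    C = torch.randn(128, 32, device="cuda")
    QC = torch.ormqr(a, tau, C)

    # Singular value decomposition.
    U, S, Vh = torch.linalg.svd(A, full_matrices=False)
```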
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97370
Approved by: https://github.com/malfet
Notes:
- No segfaults were observed in any CI tests (dynamo unit tests, inductor unit tests, dynamo-wrapped PyTorch tests), so we remove the warning that using dynamo with Python 3.11 may result in segfaults.
- Some dynamo-wrapped PyTorch tests hang. They will be skipped in the dynamo-wrapped test suite and will be addressed in a future PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99180
Approved by: https://github.com/malfet
Issue: #93684
# Problem
Reduce graph breaks when dynamo compiles python functions containing numpy functions and ndarray operations.
# Design (as I know it)
* Use torch_np.ndarray (a wrapper around a tensor) to back a `VariableTracker`: `NumpyTensorVariable`.
* Translate all attribute and method calls on ndarray to their torch_np.ndarray equivalents.
This PR adds `NumpyTensorVariable` and supports:
1. tensor to ndarray, ndarray to tensor
2. numpy functions such as numpy.meshgrid()
3. ndarray attributes such as `itemsize`, `stride`
The next PR will handle returning `np.ndarray` and add support for ndarray methods.
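As a hedged, illustrative example (not taken from the PR) of the kind of mixed NumPy/ndarray code this work aims to trace with fewer graph breaks:
```python
import numpy as np
import torch

def fn(x: torch.Tensor):
    a = x.numpy()                          # 1. tensor -> ndarray
    xs, ys = np.meshgrid(a, a)             # 2. a numpy function on ndarrays
    nbytes = xs.itemsize * xs.size         # 3. ndarray attribute access
    return torch.from_numpy(xs + ys) * nbytes  # ndarray -> tensor

compiled_fn = torch.compile(fn)            # whether this traces without breaks
out = compiled_fn(torch.arange(4.0))       # depends on how much of the work has landed
```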
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95849
Approved by: https://github.com/ezyang
Add an _int_mm primitive that binds the cuBLAS int8@int8 -> int32 matmul and translates to Triton-based mm templates under max-autotune. This is a very useful first step towards better supporting quantization on the GPU. This is not a user-facing API, but an internal primitive.
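A minimal usage sketch (assuming a CUDA GPU with a supported compute capability; `torch._int_mm` is internal, and its shape constraints, e.g. multiples of 8 for the inner/output dimensions, may vary by version):
```python
import torch

if torch.cuda.is_available():
    a = torch.randint(-128, 127, (64, 32), dtype=torch.int8, device="cuda")
    b = torch.randint(-128, 127, (32, 48), dtype=torch.int8, device="cuda")
    c = torch._int_mm(a, b)          # int8 @ int8 -> int32 via cuBLAS
    assert c.dtype == torch.int32 and c.shape == (64, 48)
```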
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94339
Approved by: https://github.com/ngimel, https://github.com/jansel
Currently, if we multiply a transposed batch of matrices with shape
[b, m, n] and a matrix with shape [n, k], when computing the gradient
of the matrix, we instantiate a tensor of shape [b, n, k]. This may be
a very large tensor. Instead, we fold the batch of matrices into a
matrix, which avoids creating any large intermediary tensor.
Note that multiplying a batch of matrices and a matrix naturally occurs
within an attention module, so this case surely happens in the wild.
In particular, this issue was found while investigating the OOMs caused by the
improved folding algorithm in the next PR of this stack. See https://github.com/pytorch/pytorch/pull/76828#issuecomment-1432359980
This PR fixes those OOMs and decreases the memory footprint of the
backward of matmul.
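A minimal numeric sketch of the folding idea (not the actual autograd code): with A of shape [b, m, n] and B of shape [n, k], the gradient of B can be computed without ever materializing a [b, n, k] tensor by folding the batch dimension into the rows.
```python
import torch

b, m, n, k = 8, 32, 16, 24
A = torch.randn(b, m, n, dtype=torch.float64)
grad_C = torch.randn(b, m, k, dtype=torch.float64)   # gradient of C = A @ B

# Naive: a batched matmul creates a [b, n, k] intermediate, then reduces it.
grad_B_naive = (A.transpose(-2, -1) @ grad_C).sum(dim=0)

# Folded: a single [b*m, n]^T @ [b*m, k] matmul produces [n, k] directly.
grad_B_folded = A.reshape(b * m, n).T @ grad_C.reshape(b * m, k)

torch.testing.assert_close(grad_B_naive, grad_B_folded)
```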
I understand this is a tricky one, so I put it on its own PR to discuss it.
Differential Revision: [D43541495](https://our.internmc.facebook.com/intern/diff/D43541495)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95261
Approved by: https://github.com/ezyang
Follow-up of #89582 to drop flags like `CUDA11OrLater` in tests. Note that in some places it appears that `TEST_WITH_ROCM` is _implicitly_ guarded against via the `CUDA11OrLater` version check, based on my best-guess of how `torch.version.cuda` would behave in ROCM builds, so I've added `not TEST_WITH_ROCM` in cases where ROCM wasn't previously explicitly allowed.
CC @ptrblck @malfet @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92605
Approved by: https://github.com/ngimel
This achieves the same thing as https://github.com/pytorch/pytorch/pull/85908 but uses backends instead of kwargs (which break torchscript, unfortunately). This also means we let go of numpy compatibility, BUT the win here is that users can control which opt_einsum behavior they want!
The backend allows for... well, you should just read the docs:
```
.. attribute:: torch.backends.opteinsum.enabled
A :class:`bool` that controls whether opt_einsum is enabled (on by default). If so,
torch.einsum will use opt_einsum (https://optimized-einsum.readthedocs.io/en/stable/path_finding.html)
to calculate an optimal path of contraction for faster performance.
.. attribute:: torch.backends.opteinsum.strategy
A :class:`str` that specifies which strategies to try when `torch.backends.opteinsum.enabled` is True.
By default, torch.einsum will try the "auto" strategy, but the "greedy" and "optimal" strategies are
also supported. Note that the "optimal" strategy is factorial on the number of inputs as it tries all
possible paths. See more details in opt_einsum's docs
(https://optimized-einsum.readthedocs.io/en/stable/path_finding.html).
```
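A small usage sketch of these knobs (note: the released module is spelled `torch.backends.opt_einsum`, while the quoted docs above use `opteinsum`; treat the exact names as version-dependent):
```python
import torch

if torch.backends.opt_einsum.is_available():        # only if opt_einsum is installed
    torch.backends.opt_einsum.enabled = True         # on by default
    torch.backends.opt_einsum.strategy = "greedy"    # or "auto" (default) / "optimal"

a, b, c = torch.randn(8, 16), torch.randn(16, 32), torch.randn(32, 4)
out = torch.einsum("ij,jk,kl->il", a, b, c)          # path chosen by opt_einsum when enabled
```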
In trying (and failing) to land 85908, I discovered that jit script does NOT actually pull from python's version of einsum (because it cannot support variadic args nor kwargs). Thus I learned that jitted einsum does not subscribe to the new opt_einsum path calculation. Overall, this is fine since jit script is getting deprecated, but where is the best place to document this?
## Test plan:
- added tests to CI
- locally tested that trying to set the strategy to something invalid will error properly
- locally tested that tests will pass even if you don't have opt-einsum
- locally tested that setting the strategy when opt-einsum is not there will also error properly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86219
Approved by: https://github.com/soulitzer, https://github.com/malfet
## This PR seeks to:
- [x] add c++ support for an optimize path
- [x] add python opt_einsum path passthrough
- [x] add opt_einsum to OSS requirements, but a soft one
- [x] show benchmark results here
Additional things I've explored + their conclusions:
- **Delaying the summing over dimensions** => added!
- The idea here is to not incur kernel calls to `sum` as we try to early sum out in einsum. Thus, we collect all the dimensions that need to be summed together in one contraction + sum at the end instead of summing as we go. While this optimization didn't feel like it made things faster for the random cases we've selected (they all summed 1 dim per contraction), it is a good principle and would help more common use cases that would reduce multiple dimensions at a time (like `bxy,xyi,xyj->bij`).
- **Caching contract_path based on equation and tensor sizes** => dropped :(
- The benchmarks were strictly worse for all the cases, and, from scanning the use cases, I observed people do not often call einsum on the same equation/tensor order enough for caching to be justified. I do think caching can be effective in the future, but it would require further investigation.
## Not a part of this PR (but are next steps):
- adding opt_einsum package to OSS CI
- adding it to internal CI
- potentially adding a kwarg path argument to the python API -- if the path is given, we wouldn't have to spend time calculating it, but there would be some time lost validating user input.
## Testing:
- Added more tests to CI
## Benchmarking:
**TL;DRs**
- **torch.einsum with opt_einsum is a definite win for the production case**.
- **torch.einsum with opt_einsum installed is consistently fast, but has an overhead** of needing to find the path. If the path is already found/optimal, it will be slightly slower.
- The einsum overhead decreases for bigger dimensions.
- **torch.einsum without opt_einsum installed is comparable to before this commit**, with occasional slowness potentially due to not reshaping/squeezing as we contract until the end.
- For many of the random generated cases, the dimensions were too similar and small where an optimal order wasn't that much more optimal than just going left to right. However, in production, dimensions are commonly quite distinct (batch size will be small, but the data will be huge).
- **torch.einsum opt is comparable (slightly faster overall) compared to numpy.einsum opt for the cpu case**. This is interesting given that torch.einsum currently spends time computing the path, but numpy.einsum takes it as input.
- **torch.einsum opt is significantly faster than numpy.einsum opt for the gpu case**. This is because numpy doesn't take advantage of GPUs.
The following benchmarks were done on an A100 GPU and Linux CPUs. The line in the first chart separates GPU (on top) from CPU, and the line in the second graph separates CPU (on top) and then GPU. Sorry it's flipped 😛 .
Production example (see [colab benchmark](https://colab.research.google.com/drive/1V2s4v1dOOKwRvp5T_DC-PNUosOV9FFJx?authuser=1#scrollTo=WZoQkC8Mdt6I) for more context):
<img width="1176" alt="image" src="https://user-images.githubusercontent.com/31798555/192012636-9a68bfa7-2601-43b1-afeb-b4e0877db6a4.png">
Randomly generated examples (the same ones as in https://github.com/pytorch/pytorch/pull/60191)
<img width="1176" alt="image" src="https://user-images.githubusercontent.com/31798555/192012804-1c639595-b3e6-48c9-a385-ad851c13e1c2.png">
Open below to see old + not super relevant benchmarking results:
<details>
Benchmark results BEFORE this PR (on Linux -- I will update devices so they are consistent later):
<img width="776" alt="image" src="https://user-images.githubusercontent.com/31798555/190807274-18f71fce-556e-47f4-b18c-e0f7d0c0d5aa.png">
Benchmark results with the code on this PR (on my x86 mac):
For the CPU internal use case --

For the general use case --
It looks like numpy opt still does better in several of these random cases, but torch einsum opt is consistently faster than torch.einsum.

</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84890
Approved by: https://github.com/albanD, https://github.com/soulitzer
Summary: test_inverse_errors_large and test_linalg_solve_triangular fail for dtype=float64 when invoked on GPUs on Meta internal testing infra. Skip in Meta internal testing.
Test Plan: (observe tests skipped on Meta internal infra)
Reviewed By: mikekgfb
Differential Revision: D39785331
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85577
Approved by: https://github.com/malfet
Summary:
Re-submit for approved PR that was then reverted: https://github.com/pytorch/pytorch/pull/85084
Create unit test to detect cuBLAS breakage via large differences between CPU and GPU addmm invocations
Test Plan:
Sample unit test output --
[...]
test_cublas_addmm_size_10000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_10000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_10000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
[...]
Reviewed By: mikekgfb
Differential Revision: D39433029
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85432
Approved by: https://github.com/zrphercule
Summary: Create unit test to detect cuBLAS breakage via large differences between CPU and GPU addmm invocations
Test Plan:
Sample unit test output --
[...]
test_cublas_addmm_size_10000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_10000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_10000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
[...]
Reviewed By: mikekgfb
Differential Revision: D39433029
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85084
Approved by: https://github.com/zrphercule
`torch.norm` is very odd. Some notable issues are:
- The default value of `"fro"` in `torch.norm` has an odd behaviour when `dim=None`. This is handled in the new dispatch
- The treatment of the `dtype` argument in `torch.norm` was completely wrong. This should fix it
- Some `out=` variants in the previous implementation were also wrong. This should fix those.
- This new dispatch should make some paths much faster. For example, `torch.norm(x)` where `x` is complex.
I'll try to make the changes in these PRs as incremental as possible as this is a tricky one.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81761
Approved by: https://github.com/ngimel
As per title. I corrected a thing or two from my previous implementation
to make for better errors in some weird edge cases and to have a clearer
understanding of when this function supports low-precision types and
when it doesn't.
We also use the bfloat16 optimisation from `vector_norm` within
this function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81113
Approved by: https://github.com/ngimel
This PR makes all these functions support out= and be composite,
depending on one function. We also extend the support of `logdet` to
complex numbers and improve the docs of all these functions.
We also use `linalg_lu_factor_ex` in these functions, so we remove the
synchronisation present before.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79742
Approved by: https://github.com/IvanYashchuk, https://github.com/albanD
This PR heavily simplifies the code of `linalg.solve`. At the same time,
this implementation saves quite a few copies of the input data in some
cases (e.g. when A is contiguous).
We also implement it in such a way that the derivative goes from
computing two LU decompositions and two LU solves to no LU
decompositions and one LU solve. It also avoids a number of copies
the derivative was unnecessarily performing (at least the copies of
two matrices).
On top of this, we add a `left` kw-only arg that allows the user to
solve `XA = B` rather concisely.
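A short, hedged sketch of the `left` keyword (not taken from the PR):
```python
import torch

A = torch.randn(4, 4, dtype=torch.float64)
B = torch.randn(3, 4, dtype=torch.float64)

X = torch.linalg.solve(A, B, left=False)   # solves X @ A == B instead of A @ X == B
torch.testing.assert_close(X @ A, B)
```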
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74046
Approved by: https://github.com/nikitaved, https://github.com/IvanYashchuk, https://github.com/mruberry
This PR simplifies the logic of `linalg.qr` using structured kernels. I
also took this chance and merged a few `copy_` operations with other
ops.
This PR removes the previous MAGMA implementation, as it is never faster
than that of cuSOLVER and it's rather buggy. This has the side effect
that now `qr` is not supported on ROCm. Ivan confirmed that this is
fine, given how incredibly slow QR was on ROCm anyway (we were marking
some tests as slow because of this...).
This PR also corrects the dispatch in geqrf. Before, if we called it
with a matrix for which `input.size(-2) <= 256 && batchCount(input) >= std::max<int64_t>(2, input.size(-2) / 16)` is false, and we have cuBLAS but not cuSOLVER, we would end up calling MAGMA rather than cuBLAS. This is not what the heuristic suggested.
Probably we should benchmark these heuristics again, but that's beyond the scope of this PR.
Note. It looks like `torch.geqrf` may be broken in MAGMA as per the
previous comment in `linalg_qr_helper_magma`. IvanYashchuk wdyt?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79054
Approved by: https://github.com/IvanYashchuk, https://github.com/ezyang
**BC-breaking note**:
This PR deprecates `torch.lu` in favor of `torch.linalg.lu_factor`.
An upgrade guide is added to the documentation for `torch.lu`.
Note this PR DOES NOT remove `torch.lu`.
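A minimal sketch of the migration suggested by the deprecation (assuming the current `torch.linalg` names):
```python
import torch

A = torch.randn(3, 3)

# Deprecated:
#   LU, pivots = torch.lu(A)
# Preferred:
LU, pivots = torch.linalg.lu_factor(A)
```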
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77636
Approved by: https://github.com/malfet
This PR adds `linalg.lu_solve`. While doing so, I found a bug in MAGMA
when calling the batched MAGMA backend with trans=True. We work around
that by solving the system via two triangular solves.
We also update the heuristics for this function, as they were fairly
outdated. We found that cuSOLVER is king, so luckily we do not need to
rely on the buggy MAGMA backend for this function.
We added tests testing this function left and right. We also added tests
for the different backends. We also activated the tests for AMD, as
those should work as well.
Fixes https://github.com/pytorch/pytorch/issues/61657
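A hedged usage sketch of the new function, pairing it with `linalg.lu_factor`:
```python
import torch

A = torch.randn(4, 4, dtype=torch.float64)
B = torch.randn(4, 2, dtype=torch.float64)

LU, pivots = torch.linalg.lu_factor(A)
X = torch.linalg.lu_solve(LU, pivots, B)   # solves A @ X == B from the factorization
torch.testing.assert_close(A @ X, B)
```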
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77634
Approved by: https://github.com/malfet
This PR does a number of things:
- Move linalg.vector_norm to structured kernels and simplify the logic
- Fixes a number of pre-existing issues with the dtype kwarg of these ops
- Heavily simplifies and corrects the logic of `linalg.matrix_norm` and `linalg.norm` to be consistent with the docs
- Before, the `_out` versions of these functions were incorrect
- Their implementation is now as efficient as expected, as it avoids reimplementing these operations whenever possible.
- Deprecates `torch.frobenius_norm` and `torch.nuclear_norm`, as they were exposed in the API and are apparently being used in mobile (??!!) even though they were not documented and their implementation was slow (see the sketch after this list).
- I'd love to get rid of these functions already, but I guess we have to go through their deprecation.
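A small illustrative sketch (hedged; not part of the PR) of the `linalg` replacements for the deprecated reductions:
```python
import torch

A = torch.randn(3, 4)
fro = torch.linalg.matrix_norm(A, ord="fro")   # replaces torch.frobenius_norm(A)
nuc = torch.linalg.matrix_norm(A, ord="nuc")   # replaces torch.nuclear_norm(A)
l2  = torch.linalg.vector_norm(A)              # flattens A and takes the 2-norm
```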
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76547
Approved by: https://github.com/mruberry
This PR adds `linalg.lu_solve`. While doing so, I found a bug in MAGMA
when calling the batched MAGMA backend with trans=True. We work around
that by solving the system via two triangular solves.
We also update the heuristics for this function, as they were fairly
outdated. We found that cuSOLVER is king, so luckily we do not need to
rely on the buggy MAGMA backend for this function.
We added tests testing this function left and right. We also added tests
for the different backends. We also activated the tests for AMD, as
those should work as well.
Fixes https://github.com/pytorch/pytorch/issues/61657
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72935
Approved by: https://github.com/IvanYashchuk, https://github.com/mruberry
This PR modifies `lu_unpack` by:
- Using less memory when unpacking `L` and `U`
- Fusing the subtraction by `-1` with `unpack_pivots_stub`
- Defining tensors of the correct types to avoid copies
- Porting `lu_unpack` to be a structured kernel so that its `_out` version
does not incur extra copies
Then we implement `linalg.lu` as a structured kernel, as we want to
compute its derivative manually. We do so because composing the
derivatives of `torch.lu_factor` and `torch.lu_unpack` would be less efficient.
This new function and `lu_unpack` come with all the things they can come
with: forward and backward AD, decent docs, correctness tests, OpInfo,
complex support, support for meta tensors, and support for vmap and vmap
over the gradients.
I really hope we don't continue adding more features.
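As a hedged illustration of how the pieces fit together (the direct factorization versus the compact form plus `lu_unpack`):
```python
import torch

A = torch.randn(4, 4, dtype=torch.float64)

P, L, U = torch.linalg.lu(A)                  # direct: P @ L @ U == A
torch.testing.assert_close(P @ L @ U, A)

LU, pivots = torch.linalg.lu_factor(A)        # compact form...
P2, L2, U2 = torch.lu_unpack(LU, pivots)      # ...unpacked to the same factors
torch.testing.assert_close(P2 @ L2 @ U2, A)
```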
This PR also avoids saving some of the tensors that were previously
saved unnecessarily for the backward in `lu_factor_ex_backward` and
`lu_backward` and does some other general improvements here and there
to the forward and backward AD formulae of other related functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67833
Approved by: https://github.com/IvanYashchuk, https://github.com/nikitaved, https://github.com/mruberry
We derive and implement a more concise rule for the forward and backward
derivatives of the QR decomposition. While doing this we:
- Fix the composite compliance of `linalg.qr` and make it support batches
- Improve the performance and simplify the implementation of both forward and backward
- Avoid saving the input matrix for the backward computation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76115
Approved by: https://github.com/nikitaved, https://github.com/albanD
Let's make sure we don't break anything in the next PRs of the stack.
Also, some comprehensive testing of matmul on CPU and CUDA was long overdue.
Running these tests, we see that the `out=` variant of matmul is broken
when used on 4D tensors. This hints at how many people actually
use the out= variants...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75193
Approved by: https://github.com/ngimel
This PR adds a function for computing the LDL decomposition and a function that can solve systems of linear equations using this decomposition. The result of `torch.linalg.ldl_factor_ex` is in a compact form and is meant to be used only through `torch.linalg.ldl_solve`. In the future, we could provide an `ldl_unpack` function that transforms the compact representation into explicit matrices.
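A hedged sketch of the intended usage (the compact result of `ldl_factor_ex` is consumed only by `ldl_solve`):
```python
import torch

A = torch.randn(4, 4, dtype=torch.float64)
A = A + A.mT                                   # LDL expects a symmetric matrix
B = torch.randn(4, 2, dtype=torch.float64)

LD, pivots, info = torch.linalg.ldl_factor_ex(A)
X = torch.linalg.ldl_solve(LD, pivots, B)      # solves A @ X == B
torch.testing.assert_close(A @ X, B)
```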
Fixes https://github.com/pytorch/pytorch/issues/54847.
cc @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @Lezcano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69828
Approved by: https://github.com/Lezcano, https://github.com/mruberry, https://github.com/albanD
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73748
This adds CPU-only slow test jobs, which previously would never run.
Includes fixes/skips for slow tests which fail (they need to be skipped now because they used to never run)
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D34628803
Pulled By: davidberard98
fbshipit-source-id: c090ab7bf7bda9e24ec5cdefa6fd35c6310dbac0
(cherry picked from commit 06f7a94a57cc7023e9c5442be8298d20cd011144)
Summary:
Previous PR with the same content: https://github.com/pytorch/pytorch/pull/69752. Opening a new PR by request: https://github.com/pytorch/pytorch/pull/69752#issuecomment-1020829812.
------
Previously, for a single input matrix A and a batched matrix B, matrix A was expanded and cloned before computing the LU decomposition and solving the linear system.
With this PR, the LU decomposition is computed once for the single matrix and then expanded & cloned only if required by the backend library call that solves the linear system.
Here's a basic comparison:
```python
# BEFORE THE PR
In [1]: import torch
In [2]: a = torch.randn(256, 256)
In [3]: b = torch.randn(1024, 256, 2)
In [4]: %%timeit
...: torch.linalg.solve(a, b)
...:
...:
329 ms ± 17.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# WITH THIS PR
In [1]: import torch
In [2]: a = torch.randn(256, 256)
In [3]: b = torch.randn(1024, 256, 2)
In [4]: %%timeit
...: torch.linalg.solve(a, b)
...:
...:
21.4 ms ± 23 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
Fixes https://github.com/pytorch/pytorch/issues/71406, fixes https://github.com/pytorch/pytorch/issues/71610
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71756
Reviewed By: ngimel
Differential Revision: D33771981
Pulled By: mruberry
fbshipit-source-id: 0917ee36a3eb622ff75d54787b1bffe26b41cb4a
(cherry picked from commit 9c30a05aaa972bc02dfc94c3d2463f0c5ee0c58c)
Summary:
This PR was opened as copy of https://github.com/pytorch/pytorch/pull/68812 by request https://github.com/pytorch/pytorch/pull/68812#issuecomment-1030215862.
-----
Fixes https://github.com/pytorch/pytorch/issues/67693.
Reference LAPACK (used in OpenBLAS) changed the info error code for SVD when inputs contain non-finite numbers. In PyTorch, we raise an internal assert error for negative `info` error codes because usually that would indicate a wrong implementation. However, this is no longer the case with SVD in newer versions of LAPACK. MKL (tried 2021.4.0) still gives a positive error code for this kind of input. This change aligns our code with the OpenBLAS and MKL behavior.
MKL 2022 uses the latest reference LAPACK behavior and returns the same `info` as OpenBLAS 0.3.15+.
This PR also fixes https://github.com/pytorch/pytorch/issues/71645, which is due to the updated MKL version in CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72357
Reviewed By: albanD
Differential Revision: D34012245
Pulled By: ngimel
fbshipit-source-id: 2b66c173cc3458d8c766b542d0d569191cdce310
(cherry picked from commit fa29e65611)