Commit Graph

381 Commits

Raghav Kansal
6d21e36f21 LU solve uses cuBLAS and cuSOLVER for matrices with dim > 1024 (#61815)
Summary:
This PR builds off of https://github.com/pytorch/pytorch/issues/59148 and modifies the `lu_solve` routine to avoid MAGMA for `b` or `lu_data` matrices with any dimension > 1024, since MAGMA has a bug when dealing with such matrices (https://bitbucket.org/icl/magma/issues/19/dgesv_batched-dgetrs_batched-fails-for).
Fixes https://github.com/pytorch/pytorch/issues/36921
Fixes https://github.com/pytorch/pytorch/issues/61929
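
For illustration, a usage sketch of the affected call pattern (the shapes are chosen to cross the 1024 threshold; the CPU fallback guard is an assumption so the snippet also runs on CUDA-less builds):
```python
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
A = torch.randn(2, 1100, 1100, dtype=torch.float64, device=device)
b = torch.randn(2, 1100, 1, dtype=torch.float64, device=device)
LU, pivots = torch.lu(A)
# With any dimension > 1024, this call now routes to cuBLAS/cuSOLVER on CUDA
# instead of the buggy MAGMA batched kernels.
x = torch.lu_solve(b, LU, pivots)
assert torch.allclose(A @ x, b, atol=1e-8)
```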

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61815

Reviewed By: anjali411

Differential Revision: D30199618

Pulled By: ngimel

fbshipit-source-id: 06870793f697e9c35aaaa8254b8a8b1a38bd3aa9
2021-08-10 11:07:16 -07:00
Rong Rong (AI Infra)
3782f3eced Enable upper for torch.linalg.cholesky (#62434)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61988
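
A minimal sketch of the newly enabled keyword (the identity `U == L.t()` for a real symmetric positive-definite matrix is the standard Cholesky relationship):
```python
import torch

A = torch.randn(3, 3, dtype=torch.float64)
A = A @ A.t() + 3 * torch.eye(3)          # symmetric positive-definite
L = torch.linalg.cholesky(A)              # lower-triangular factor (default)
U = torch.linalg.cholesky(A, upper=True)  # upper-triangular factor, enabled by this PR
assert torch.allclose(U, L.t())
```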

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62434

Reviewed By: seemethere, tktrungna

Differential Revision: D30079806

Pulled By: walterddr

fbshipit-source-id: 044efb96525155c9bc7953ac4ad47c1b7c12fb20
2021-08-09 09:28:33 -07:00
Ivan Yashchuk
3c0c1c4ecb Fix incorrectly sized tensors for svd when full_matrices=False (#62022)
Summary:
Before this PR, for an m x n input matrix the returned matrices were always allocated as m x m and n x n and then narrowed.
This unnecessarily allocates a lot of memory that is then discarded.
With this PR, when `compute_uv=True` and `full_matrices=False`, correctly sized tensors are allocated. Moreover, if `compute_uv=False`, the U and V matrices are not allocated at all, as they are not needed. However, cuSOLVER's gesvdj routines fail when these matrices are not allocated, which is a bug, so this allocation is done separately in the cuSOLVER-specific code path.

MAGMA doesn't work for this input because it tries to allocate a large matrix internally (ROCm doesn't work either, since it uses MAGMA). Example error:
```
CUBLAS error: memory mapping error (11) in magma_sgelqf at /opt/conda/conda-bld/magma-cuda110_1598416697386/work/src/sgelqf.cpp:161
CUBLAS error: out of memory (3) in magma_sgeqrf2_gpu at /opt/conda/conda-bld/magma-cuda110_1598416697386/work/src/sgeqrf2_gpu.cpp:145
CUBLAS error: not initialized (1) in magma_sgeqrf2_gpu at /opt/conda/conda-bld/magma-cuda110_1598416697386/work/src/sgeqrf2_gpu.cpp:145
MAGMA error: function-specific error, see documentation (1) in magma_sgeqrf2_gpu at /opt/conda/conda-bld/magma-cuda110_1598416697386/work/src/sgeqrf2_gpu.cpp:145
MAGMA error: function-specific error, see documentation (1) in magma_sgeqrf2_gpu at /opt/conda/conda-bld/magma-cuda110_1598416697386/work/src/sgeqrf2_gpu.cpp:145
python: /opt/conda/conda-bld/magma-cuda110_1598416697386/work/interface_cuda/interface.cpp:806: void magma_queue_create_internal(magma_device_t, magma_queue**, const char*, const char*, int): Assertion `queue->dAarray__ != __null' failed.
Aborted (core dumped)
```

Fixes https://github.com/pytorch/pytorch/issues/61949.
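
For reference, a sketch of the reduced-SVD shapes that are now allocated directly (sizes follow the standard thin-SVD convention):
```python
import torch

A = torch.randn(6, 4)
U, S, Vh = torch.linalg.svd(A, full_matrices=False)
# Thin factors: U is m x k, S has length k, Vh is k x n with k = min(m, n),
# rather than m x m / n x n buffers narrowed after the fact.
assert U.shape == (6, 4) and S.shape == (4,) and Vh.shape == (4, 4)
```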

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62022

Reviewed By: heitorschueroff

Differential Revision: D29994429

Pulled By: ngimel

fbshipit-source-id: c3f7744d7adc5fd6787f6cbb1ec41405f89a6d4c
2021-07-30 10:27:13 -07:00
Xiao Wang
d57ce8cf89 [Linalg] Add cusolver syevjBatched path for torch.linalg.eigh when cuda >= 11.3 U1 (#62003)
Summary:
This PR adds the `cusolverDn<T>SyevjBatched` function to the backend of `torch.linalg.eigh` (eigenvalue solver for Hermitian matrices). Using the heuristics from https://github.com/pytorch/pytorch/pull/53040#issuecomment-788264724 and my local tests, the `syevj_batched` path is only used when `batch_size > 1` and `matrix_size <= 32`. This gives us a huge performance boost in those cases.

Since there were known numerical issues with cusolver `syevj_batched` before CUDA 11.3 update 1, this PR only enables the dispatch when the CUDA version is at least that.

See also https://github.com/pytorch/pytorch/issues/42666 #47953 https://github.com/pytorch/pytorch/issues/53040
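
A sketch of the batched small-matrix case the new heuristic targets (the CPU fallback is an assumption so the snippet runs anywhere):
```python
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
A = torch.randn(64, 16, 16, device=device)
A = A + A.transpose(-2, -1)  # batch of symmetric matrices
# batch_size > 1 and matrix_size <= 32: the case routed to syevjBatched on CUDA >= 11.3 U1
w, V = torch.linalg.eigh(A)
recon = V @ torch.diag_embed(w) @ V.transpose(-2, -1)
assert torch.allclose(recon, A, atol=1e-3)
```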

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62003

Reviewed By: heitorschueroff

Differential Revision: D30006316

Pulled By: ngimel

fbshipit-source-id: 3a65c5fc9adbbe776524f8957df5442c3d3aeb8e
2021-07-30 00:35:21 -07:00
Rong Rong (AI Infra)
65ab861ec6 fix mm not correctly reporting TORCH_CHECK failure (#61394)
Summary:
fixes https://github.com/pytorch/pytorch/issues/61291.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61394

Reviewed By: zhouzhuojie, seemethere

Differential Revision: D29614208

Pulled By: walterddr

fbshipit-source-id: f49a15dde708e30b06059b47fae1cda7c2c3571c
2021-07-12 12:50:51 -07:00
Xiao Wang
c18017190b Relax some linalg test tolerances (#61101)
Summary:
We are seeing some test failures on an A100 machine, even though TF32 matmul is not involved in these cases.

I tried the `svd_lowrank` test: it passed when run by itself but failed when I ran the whole test suite, probably a random-seed issue. Relaxing the test tolerance is much easier to do.

Some SVD tests failed when comparing CPU float32 against GPU float32 results. Since linear algebra operations are somewhat unstable in single precision, comparing two single-precision results can give false positives, so we now compute the CPU reference in float64 or complex128, which is much more accurate.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61101

Reviewed By: ngimel

Differential Revision: D29593483

Pulled By: mruberry

fbshipit-source-id: 3df651e3cca1b0effc1a4ae29d4f26b1cb4082ed
2021-07-12 09:17:59 -07:00
gmagogsfm
a46d4212bf Allow dims=0 in torch.tensordot call (#61331)
Summary:
In one of my previous PRs that rewrote the `tensordot` implementation, I mistakenly treated empty values of `dims_a` and `dims_b` as illegal. This turns out to be not true: empty `dims_a` and `dims_b` are supported, and are in fact common when `dims` is passed as an integer. This PR removes the unnecessary check.

Fixes https://github.com/pytorch/pytorch/issues/61096
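
A sketch of the now-accepted call: `dims=0` contracts over no dimensions, i.e. an outer product:
```python
import torch

a = torch.randn(2)
b = torch.randn(3)
out = torch.tensordot(a, b, dims=0)  # previously rejected by the removed check
assert out.shape == (2, 3)
assert torch.allclose(out, torch.outer(a, b))
```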

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61331

Reviewed By: eellison

Differential Revision: D29578910

Pulled By: gmagogsfm

fbshipit-source-id: 96e58164491a077ddc7a1d6aa6ccef8c0c9efda2
2021-07-10 17:05:20 -07:00
Ivan Yashchuk
9dd1824741 Fix dispatch keys for eigh, lu_solve (#60945)
Summary:
I added a test to `test_ops.py` that verifies that the op can run correctly from different CUDA devices. This test revealed that `linalg_eigh`, `linalg_eigvalsh`, `linalg_matrix_rank`, `linalg_pinv` were failing. `matrix_rank` and `pinv` call `eigh` internally.

`linalg_eigh` and `lu_solve` internally use dispatch stubs, so they should be registered with `CPU, CUDA` dispatch keys. The generated code includes device guards in this case and the problem is not present.

Implemented a better out variant for `eigvalsh` and registered it with `CPU, CUDA` dispatch keys.

~I added a device guard to `linalg_eigh_kernel` as a fix for `eigvalsh` function. This function needs to be registered as CompositeImplicitAutograd, because it calls `at::linalg_eigh` if `at::GradMode::is_enabled()`.~

Fixes https://github.com/pytorch/pytorch/issues/60892.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60945

Reviewed By: mruberry

Differential Revision: D29589580

Pulled By: ngimel

fbshipit-source-id: 5851605958bdfc3a1a1768263934619449957168
2021-07-07 16:28:22 -07:00
Kurt Mohler
b39770c461 Fix degenerate shape behavior for ord=+/-2 (#60273)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59198

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60273

Reviewed By: jbschlosser

Differential Revision: D29422907

Pulled By: mruberry

fbshipit-source-id: 609cd640b0477f90bebca20865e34cbe182d3909
2021-06-30 02:17:26 -07:00
Aswin John Mathews
a53d7f8f7c Remove linalg test skips from MAGMA integration (#58232)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/55552; majority of cases in https://github.com/pytorch/pytorch/issues/51303

Tests in torch/testing/_internal/common_methods_invocations.py (tested through test_ops) cannot be fully removed, since the machines seem to run out of GPU memory during the test; this needs further analysis.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58232

Reviewed By: ngimel

Differential Revision: D29394021

Pulled By: malfet

fbshipit-source-id: f108a70af33beec908ac1c0b58467f8744e6fe87
2021-06-25 11:44:49 -07:00
Philip Meier
0c916c8a4e up the priority of numpy array comparisons in self.assertEqual (#59067)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58988.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59067

Reviewed By: jbschlosser

Differential Revision: D28986642

Pulled By: heitorschueroff

fbshipit-source-id: 3ef2d26b4010fc3519d0a1a020ea446ffeb46ba0
2021-06-22 13:07:07 -07:00
Heitor Schueroff
4caca7a15b Improved torch.einsum testing and fixed bug (#59731)
Summary:
Improved torch.einsum testing and fixed a bug where lowercase letters appeared before uppercase letters in the sorted order, which is inconsistent with NumPy.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59731

Reviewed By: SplitInfinity, ansley

Differential Revision: D29183078

Pulled By: heitorschueroff

fbshipit-source-id: a33980d273707da2d60a387a2af2fa41527ddb68
2021-06-17 04:48:47 -07:00
Natalia Gimelshein
9d533ef3ac Renorm fix (#59615)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59584
albanD, soulitzer, `renorm` grad was completely busted. Fast gradcheck is definitely not doing its job.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59615

Reviewed By: jbschlosser

Differential Revision: D28964271

Pulled By: ngimel

fbshipit-source-id: b6878cd24db9189b64b67eb58bd2cd8956cda78a
2021-06-08 14:59:24 -07:00
Mike Ruberry
de40c8e495 Adds remaining OpInfos and removes redundant test generators (#55558)
Summary:
Per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55558

Reviewed By: ngimel

Differential Revision: D28922522

Pulled By: mruberry

fbshipit-source-id: 89cefd93788bc8aa0683f4583cf5caa81aa2dc93
2021-06-06 14:52:26 -07:00
albanD
d095ec75a1 Forward AD formulas batch 2 (#57863)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57863

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D28387763

Pulled By: albanD

fbshipit-source-id: e1b60ab728bb05b9e3323ee0dc7e401aaf5b8817
2021-06-03 07:33:04 -07:00
Ivan Yashchuk
e9e1bb1a4e Fix device of info tensor for torch.linalg.inv_ex with MAGMA backend (#59223)
Summary:
This PR fixes `torch.linalg.inv_ex` with the MAGMA backend.
The `info` tensor was returned on the CPU even for CUDA inputs.
Now it's on the same device as the input.

Fixes https://github.com/pytorch/pytorch/issues/58769

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59223

Reviewed By: ngimel

Differential Revision: D28814876

Pulled By: mruberry

fbshipit-source-id: f66c6f06fb8bc305cb2e22b08750a25c8888fb65
2021-06-01 21:49:57 -07:00
Natalia Gimelshein
1871d4e604 avoid explicitly casting low precision inputs to fp32 in norm (#59134)
Summary:
Per title. Now `norm` with fp16/bfloat16 inputs and fp32 outputs on CUDA won't do an explicit cast.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59134

Reviewed By: mruberry

Differential Revision: D28775729

Pulled By: ngimel

fbshipit-source-id: 896daa4f02e8a817cb7cb99ae8a93c02fa8dd5e9
2021-05-29 00:48:18 -07:00
Heitor Schueroff
72ae924fad Added sublist support for torch.einsum (#56625)
Summary:
This PR adds an alternative way of calling `torch.einsum`. Instead of specifying the subscripts as letters in the `equation` parameter, one can now specify the subscripts as a list of integers, as in `torch.einsum(operand1, subscripts1, operand2, subscripts2, ..., [subscripts_out])`. This is equivalent to `torch.einsum('<subscripts1>,<subscripts2>,...->[<subscript_out>]', operand1, operand2, ...)`.

TODO
- [x] Update documentation
- [x] Add more error checking
- [x] Update tests
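
A minimal sketch of the sublist form described above, using integer labels in place of letters:
```python
import torch

a = torch.randn(3, 4)
b = torch.randn(4, 5)
# Equivalent to torch.einsum('ij,jk->ik', a, b)
c = torch.einsum(a, [0, 1], b, [1, 2], [0, 2])
assert torch.allclose(c, a @ b)
```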

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56625

Reviewed By: zou3519

Differential Revision: D28062616

Pulled By: heitorschueroff

fbshipit-source-id: ec50ad34f127210696e7c545e4c0675166f127dc
2021-05-21 08:36:45 -07:00
Xiao Wang
691c139144 Do not use TF32 matmul in linalg and DDP tests (#56114)
Summary:
This PR does several things to relax test tolerances

- Do not use TF32 in cuda matmul in test_c10d. See https://github.com/pytorch/pytorch/issues/52941.
- Do not use TF32 in cuda matmul in test_linalg. Increase atol for float and cfloat. See https://github.com/pytorch/pytorch/issues/50453
    The tolerance is increased because most linear algebra operators are not that stable in single precision.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56114

Reviewed By: ailzhang

Differential Revision: D28554467

Pulled By: ngimel

fbshipit-source-id: 90416be8e4c048bedb16903b01315584d344ecdf
2021-05-20 14:01:19 -07:00
Rong Rong (AI Infra)
64d23cc040 Revert D28379394: Update internal code for torch.linalg.solve
Test Plan: revert-hammer

Differential Revision:
D28379394 (b0833533a7)

Original commit changeset: b47f66bc1ee1

fbshipit-source-id: c81b34f45a1d82a2b1cecc8987048fa1055203d6
2021-05-13 19:49:41 -07:00
Ivan Yashchuk
b0833533a7 Update internal code for torch.linalg.solve (#56613)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56613

Replace linalg_solve_helper with `lu_stub` + `lu_solve_stub`.
Once `lu_stub` and `lu_solve_stub` have cuSOLVER-based codepath,
`torch.linalg.solve` will have it as well.

Test Plan: Imported from OSS

Reviewed By: agolynski

Differential Revision: D28379394

Pulled By: mruberry

fbshipit-source-id: b47f66bc1ee12715da11dcffc92e31e67fa8c8f6
2021-05-13 16:57:29 -07:00
Ivan Yashchuk
5e65428503 Fix NumPy compatibility issue for torch.linalg.cond (#58041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58041

The shape of the returned result was different for NumPy and PyTorch for
`ord={-2, 2, None}`. Now it's fixed.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28405147

Pulled By: mruberry

fbshipit-source-id: 30293a017a0c0a7e9e3aabd470386235fef7b6a6
2021-05-13 09:42:18 -07:00
Ivan Yashchuk
a49406b331 Fixed batched version of torch.linalg.cond for singular inputs (#58040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58040

This PR uses `torch.linalg.inv_ex` to determine the non-invertible
inputs and return the condition number of infinity for such inputs.

Added OpInfo entry for `torch.linalg.cond`.
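
A sketch of the fixed batched behavior for a norm that goes through the matrix inverse (the exact printed values are an assumption based on the construction):
```python
import torch

A = torch.zeros(2, 3, 3)
A[1] = torch.eye(3)
# The singular matrix in the batch now yields inf instead of raising.
print(torch.linalg.cond(A, p=1))  # expected: tensor([inf, 1.])
```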

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28405146

Pulled By: mruberry

fbshipit-source-id: 524b9a38309851fa6461cb787ef3fba5aa7d5328
2021-05-13 09:42:17 -07:00
Ivan Yashchuk
c1430c3425 Add torch.linalg.inv_ex without checking for errors by default (#58039)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58039

The new function has the following signature
`inv_ex(Tensor input, *, bool check_errors=False) -> (Tensor inverse, Tensor info)`.
When `check_errors=True`, an error is thrown if the matrix is not invertible; with `check_errors=False`, responsibility for checking the result is on the user.

`linalg_inv` is implemented using calls to `linalg_inv_ex` now.

Resolves https://github.com/pytorch/pytorch/issues/25095
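
A minimal sketch of the new API under the signature above:
```python
import torch

A = torch.randn(3, 3, dtype=torch.float64) + 3 * torch.eye(3)
inverse, info = torch.linalg.inv_ex(A)  # no error checking by default
assert info.item() == 0                 # 0 means the inversion succeeded
# This call would raise, since the input is singular:
# torch.linalg.inv_ex(torch.zeros(3, 3), check_errors=True)
```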

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28405148

Pulled By: mruberry

fbshipit-source-id: b8563a6c59048cb81e206932eb2f6cf489fd8531
2021-05-13 09:42:15 -07:00
lezcano
db13119fc4 Deprecate symeig (#57732)
Summary:
This one had a tricky usage of `torch.symeig` that had to be replaced. I tested the replacement locally though.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57732

Reviewed By: bdhirsh

Differential Revision: D28328189

Pulled By: mruberry

fbshipit-source-id: 7f000fcbf2b029beabc76e5a89ff158b47977474
2021-05-12 02:21:35 -07:00
Nikita Vedeneev
c790fd2bf8 ATen lu_unpack. Required for making torch.lu_solve differentiable. (#46913)
Summary:
Backward methods for `torch.lu` and `torch.lu_solve` require the `torch.lu_unpack` method.
However, while `torch.lu` is a Python wrapper over a native function (so its gradient can be implemented via `autograd.Function`),
`torch.lu_solve` is a native function, so it cannot access `torch.lu_unpack`, which is implemented in Python.

Hence this PR presents a native (ATen) `lu_unpack` version. With this function it is also possible to update the gradients for `torch.lu` so that backward+JIT is supported (there is no JIT for `autograd.Function`).

~~The interface for this method is different from the original `torch.lu_unpack`, so it is decided to keep it hidden.~~
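
For illustration, a round-trip sketch of the op (API as described in this PR):
```python
import torch

A = torch.randn(3, 3, dtype=torch.float64)
LU, pivots = torch.lu(A)
P, L, U = torch.lu_unpack(LU, pivots)
# The unpacked permutation and triangular factors reconstruct the input.
assert torch.allclose(P @ L @ U, A, atol=1e-12)
```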

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46913

Reviewed By: albanD

Differential Revision: D28355725

Pulled By: mruberry

fbshipit-source-id: 281260f3b6e93c15b08b2ba66d5a221314b00e78
2021-05-11 22:53:21 -07:00
Ivan Yashchuk
aaca12bcc2 Deprecate in docs torch.svd and change svd -> linalg_svd (#57981)
Summary:
This PR adds a note to the documentation that torch.svd is deprecated, together with an upgrade guide on how to use `torch.linalg.svd` and `torch.linalg.svdvals` (Lezcano's instructions from https://github.com/pytorch/pytorch/issues/57549).
In addition, all usage of the old svd function is replaced with the new one from the torch.linalg module, except for the `at::linalg_pinv` function, which fails the XLA CI build (https://github.com/pytorch/xla/issues/2755; see the failure in draft PR https://github.com/pytorch/pytorch/pull/57772).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57981

Reviewed By: ngimel

Differential Revision: D28345558

Pulled By: mruberry

fbshipit-source-id: 02dd9ae6efe975026e80ca128e9b91dfc65d7213
2021-05-11 18:04:10 -07:00
Mike Ruberry
3c87fe9b14 Revert D28117714: [pytorch][PR] ATen lu_unpack. Required for making torch.lu_solve differentiable.
Test Plan: revert-hammer

Differential Revision:
D28117714 (5c67d8dfd3)

Original commit changeset: befd33db12ec

fbshipit-source-id: 295b2134935542a903a73f90a7998239dfe6cc81
2021-05-09 23:20:06 -07:00
Ivan Yashchuk
d11cce4f5e Add cuSOLVER path for torch.linalg.lstsq (#57317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57317

This PR implements a QR-based least squares solver using the geqrf, ormqr, and
triangular_solve operations.

The internal code of triangular_solve was fixed to correctly handle larger
rectangular arrays.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28312683

Pulled By: mruberry

fbshipit-source-id: dc8ae837a5fb0685d85c8733a47d7d25dc46443a
2021-05-09 21:19:10 -07:00
Nikita Vedeneev
5c67d8dfd3 ATen lu_unpack. Required for making torch.lu_solve differentiable. (#46913)
Summary:
Backward methods for `torch.lu` and `torch.lu_solve` require the `torch.lu_unpack` method.
However, while `torch.lu` is a Python wrapper over a native function (so its gradient can be implemented via `autograd.Function`),
`torch.lu_solve` is a native function, so it cannot access `torch.lu_unpack`, which is implemented in Python.

Hence this PR presents a native (ATen) `lu_unpack` version. With this function it is also possible to update the gradients for `torch.lu` so that backward+JIT is supported (there is no JIT for `autograd.Function`).

~~The interface for this method is different from the original `torch.lu_unpack`, so it is decided to keep it hidden.~~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46913

Reviewed By: astaff

Differential Revision: D28117714

Pulled By: mruberry

fbshipit-source-id: befd33db12ecc147afacac792418b6f4948fa4a4
2021-05-09 19:12:56 -07:00
Heitor Schueroff
4cf2c646c2 Added torch.linalg.matrix_norm (#57127)
Summary:
This PR is focused on  the API for `linalg.matrix_norm` and delegates computations to `linalg.norm` for the moment.

The main difference between the norms is when `dim=None`. In this case
- `linalg.norm` will compute a vector norm on the flattened input if `ord=None`, otherwise it requires the input to be either 1D or 2D in order to disambiguate between vector and matrix norm
- `linalg.vector_norm` will flatten the input
- `linalg.matrix_norm` will compute the norm over the last two dimensions, treating the input as batch of matrices

In future PRs, the computations will be moved to `torch.linalg.matrix_norm` and `torch.norm` and `torch.linalg.norm` will delegate computations to either `linalg.vector_norm` or `linalg.matrix_norm` based on the arguments provided.
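
A sketch contrasting the `dim=None` behaviors listed above:
```python
import torch

A = torch.randn(2, 3, 4)                 # batch of two 3x4 matrices
m = torch.linalg.matrix_norm(A)          # norm over the last two dims -> shape (2,)
v = torch.linalg.vector_norm(A)          # flattens the whole input -> scalar
assert m.shape == (2,)
assert v.dim() == 0
assert torch.allclose(m, A.flatten(-2).norm(dim=-1))  # default ord is 'fro'
```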

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57127

Reviewed By: mrshenli

Differential Revision: D28186736

Pulled By: mruberry

fbshipit-source-id: 99ce2da9d1c4df3d9dd82c0a312c9570da5caf25
2021-05-09 04:50:33 -07:00
Ivan Yashchuk
18fed3dfbe Change name for namedtuple return of torch.linalg.svd (#57181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57181

Documentation for torch.linalg.svd says:
> The returned decomposition is a named tuple `(U, S, Vh)`

The documentation is correct while the implementation was wrong.
Renamed `V` -> `Vh`; the `h` stands for Hermitian.
This is a BC-breaking change but our linalg module is beta, therefore we can do it without a deprecation notice or aliases.
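
A sketch of the renamed field on the returned namedtuple:
```python
import torch

A = torch.randn(4, 3, dtype=torch.float64)
res = torch.linalg.svd(A, full_matrices=False)
# The third field is now accessed as .Vh (the conjugate-transposed V), matching the docs.
recon = res.U @ torch.diag(res.S) @ res.Vh
assert torch.allclose(recon, A, atol=1e-12)
```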

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28142162

Pulled By: mruberry

fbshipit-source-id: 5e6e0ae5a63300f2db1575ca3259df381f8e1a7e
2021-05-07 15:17:43 -07:00
Ivan Yashchuk
58f32fa5fd Remove compute_uv flag from torch.linalg.svd (#57180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57180

We have now a separate function for computing only the singular values.
`compute_uv` argument is not needed and it was decided in the
offline discussion to remove it. This is a BC-breaking change but our
linalg module is beta, therefore we can do it without a deprecation
notice.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28142163

Pulled By: mruberry

fbshipit-source-id: 3fac1fcae414307ad5748c9d5ff50e0aa4e1b853
2021-05-07 15:16:42 -07:00
Sam Estep
023ecc40ad Revert D28248766: Update internal code for torch.linalg.solve
Test Plan: revert-hammer

Differential Revision:
D28248766 (5f2925074b)

Original commit changeset: 300366605653

fbshipit-source-id: 316b97791e57f9017d4bf87898aea8dc869cba79
2021-05-07 07:49:16 -07:00
Ivan Yashchuk
5f2925074b Update internal code for torch.linalg.solve (#56613)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56613

Replace linalg_solve_helper with `lu_stub` + `lu_solve_stub`.
Once `lu_stub` and `lu_solve_stub` have cuSOLVER-based codepath,
`torch.linalg.solve` will have it as well.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D28248766

Pulled By: mruberry

fbshipit-source-id: 3003666056533d097d0ad659e0603f59fbfda9aa
2021-05-07 03:29:16 -07:00
Heitor Schueroff
1f1e2dab6b Remove optional type for ord parameter in vector_norm (#57662)
Summary:
As per discussion here https://github.com/pytorch/pytorch/pull/57127#discussion_r624948215

Note that we cannot remove the optional type from the `dim` parameter because the default is to flatten the input tensor, which cannot be easily captured by a value other than `None`.

### BC Breaking Note
This PR changes the `ord` parameter of `torch.linalg.vector_norm` so that it no longer accepts `None` arguments. The default behavior of `2` is equivalent to the previous default of `None`.
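
A sketch of the post-change call (the `ord=None` rejection is as described in the BC note above):
```python
import torch

x = torch.randn(5)
out = torch.linalg.vector_norm(x)  # ord now defaults to 2
assert torch.allclose(out, x.pow(2).sum().sqrt())
# torch.linalg.vector_norm(x, ord=None)  # no longer accepted per this PR
```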

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57662

Reviewed By: albanD, mruberry

Differential Revision: D28228870

Pulled By: heitorschueroff

fbshipit-source-id: 040fd8055bbe013f64d3c8409bbb4b2c87c99d13
2021-05-06 17:53:25 -07:00
Sam Estep
72ebdd68e1 Revert D28242069: Add cuSOLVER path for torch.linalg.lstsq
Test Plan: revert-hammer

Differential Revision:
D28242069 (7b31d4262b)

Original commit changeset: 23979d19ccc7

fbshipit-source-id: edf26a78b3485790deb1a8f53e8c8d3989c28e1b
2021-05-06 09:28:15 -07:00
Ivan Yashchuk
7b31d4262b Add cuSOLVER path for torch.linalg.lstsq (#57317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57317

This PR implements a QR-based least squares solver using the geqrf, ormqr, and
triangular_solve operations.

The internal code of triangular_solve was fixed to correctly handle larger
rectangular arrays.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28242069

Pulled By: mruberry

fbshipit-source-id: 23979d19ccc7f591afa8df4435d0db847e2d0d97
2021-05-06 04:45:55 -07:00
Ivan Yashchuk
35fab44eaf Add CUDA support for torch.ormqr (#57316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57316

CUDA support is implemented using cuSOLVER.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28242071

Pulled By: mruberry

fbshipit-source-id: 6f0a1c50c21c376d2ee2907bddb618c6a600db1f
2021-05-06 04:45:54 -07:00
Ivan Yashchuk
59d794b2c3 Port CPU torch.ormqr to ATen (#57315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57315

This PR ports `torch.ormqr` from TH to ATen.
The CUDA path will be implemented in a follow-up PR.
With the ATen port, support for complex and batched inputs is added.
The tests are rewritten and an OpInfo entry is added.

We can implement the least squares solver with geqrf + ormqr +
triangular_solve. So it's useful to have this function renewed at least for the
internal code.

Resolves https://github.com/pytorch/pytorch/issues/24748

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28242070

Pulled By: mruberry

fbshipit-source-id: f070bb6ac2f5a3269b163b22f7354e9089ed3061
2021-05-06 04:44:40 -07:00
Jane Xu
76d9070d10 Replace windows CUDA 11.2 CI with 11.3 (#57223)
Summary:
Testing 11.3 with current CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57223

Test Plan:
Relevant CI (11.3) pass!

Disclaimer: Skipped test_inverse_errors_large for CUDA 11.3 as it failed. Issue documented at https://github.com/pytorch/pytorch/issues/57482.

Reviewed By: malfet

Differential Revision: D28169393

Pulled By: janeyx99

fbshipit-source-id: 9f5cf7b6737ee6196de92bd80918a5bfbe5510ea
2021-05-04 14:23:23 -07:00
Shen Li
6bc3ad28a3 Revert D28143091: [pytorch][PR] Add cross OpInfo
Test Plan: revert-hammer

Differential Revision:
D28143091 (4a872f8539)

Original commit changeset: 0b98226a1811

fbshipit-source-id: eda38923f31ac5a79af5c78077ed0106d904f6da
2021-05-03 09:19:41 -07:00
Mike Ruberry
4a872f8539 Add cross OpInfo (#55483)
Summary:
One of the tasks in https://github.com/pytorch/pytorch/issues/54261.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55483

Reviewed By: ngimel

Differential Revision: D28143091

Pulled By: mruberry

fbshipit-source-id: 0b98226a1811f61cb90d2248dd4425135a096551
2021-05-02 16:23:02 -07:00
Ivan Yashchuk
75a2a92b02 Add torch.linalg.cholesky_ex without checking for errors by default (#56724)
Summary:
The new function has the following signature: `cholesky_ex(Tensor input, *, bool check_errors=False) -> (Tensor L, Tensor infos)`. When `check_errors=True`, an error is thrown if the decomposition fails; with `check_errors=False`, responsibility for checking the decomposition is on the user.

When `check_errors=False`, we don't have host-device memory transfers for checking the values of the `info` tensor.

Rewrote the internal code for `torch.linalg.cholesky`. Added `cholesky_stub` dispatch. `linalg_cholesky` is implemented using calls to `linalg_cholesky_ex` now.

Resolves https://github.com/pytorch/pytorch/issues/57032.

Ref. https://github.com/pytorch/pytorch/issues/34272, https://github.com/pytorch/pytorch/issues/47608, https://github.com/pytorch/pytorch/issues/47953
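
A minimal sketch of the `_ex` pattern described above, with the caller owning the `info` check:
```python
import torch

A = torch.randn(3, 3, dtype=torch.float64)
A = A @ A.t() + 3 * torch.eye(3)       # symmetric positive-definite
L, info = torch.linalg.cholesky_ex(A)  # default: no error check, no device sync
assert info.item() == 0                # 0 signals a successful factorization
# This call would raise, since the input is not positive-definite:
# torch.linalg.cholesky_ex(torch.ones(3, 3), check_errors=True)
```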

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56724

Reviewed By: ngimel

Differential Revision: D27960176

Pulled By: mruberry

fbshipit-source-id: f05f3d5d9b4aa444e41c4eec48ad9a9b6fd5dfa5
2021-05-01 18:48:27 -07:00
Ivan Yashchuk
2be115336b Fix torch.ormqr for non Fortran-contiguous inputs (#57314)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57314

Test Plan: Imported from OSS

Reviewed By: astaff

Differential Revision: D28118029

Pulled By: mruberry

fbshipit-source-id: e2ef65093cc5f77769adc7066c76f0607b5559a9
2021-05-01 17:50:06 -07:00
Arindam Roy
6d681d064f ROCM: Re-enable test_norm_fro_2_equivalence_old (#57170)
Summary:
This test was disabled for ROCm 3.9. With the latest updates, the test passes on ROCm 4.1, hence it is re-enabled in test/test_linalg.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57170

Reviewed By: astaff

Differential Revision: D28118217

Pulled By: mruberry

fbshipit-source-id: 1b830eed944a664c3b1b3e936b87096fef0c0ca2
2021-05-01 16:41:41 -07:00
Wenlei Xie
20085f6d23 Support auto generation of device check (#56872)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56872

ghstack-source-id: 127914018

Test Plan: auto test

Reviewed By: ezyang

Differential Revision: D27986429

fbshipit-source-id: 0da8413b0b8e6810fcea27ed1de499f11f68bd1f
2021-05-01 12:02:09 -07:00
Sameer Deshmukh
293830bc19 Fix min() and max() for empty tensors (#52565)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/34907

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52565

Reviewed By: anjali411

Differential Revision: D27999955

Pulled By: ezyang

fbshipit-source-id: 30e88cc8d84806198500e3001ecf58fa764536dd
2021-04-30 15:55:10 -07:00
Ivan Yashchuk
f54aa85a6c Fix MAGMA qr for empty batched inputs (#56257)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56257

CPU and cuSOLVER paths were fixed by refactoring
`_linalg_qr_helper_default`.

Resolves https://github.com/pytorch/pytorch/issues/50576

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27960157

Pulled By: mruberry

fbshipit-source-id: f923f3067a35e65218889e64c6a886364c3d1759
2021-04-30 11:15:03 -07:00
Ivan Yashchuk
03962bc7f1 Updated linalg.lstsq with NumPy compatible kwarg rcond (#54723)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54723

Renamed "cond" -> "rcond" to be NumPy compatible. The default value for
rcond was changed to match non-legacy NumPy behavior.
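
A sketch of the NumPy-aligned keyword (the comparison assumes a full-rank system, for which the least-squares solution is unique):
```python
import numpy as np
import torch

A = torch.randn(5, 3, dtype=torch.float64)
b = torch.randn(5, 2, dtype=torch.float64)
sol = torch.linalg.lstsq(A, b, rcond=None).solution  # rcond, as in numpy.linalg.lstsq
ref, *_ = np.linalg.lstsq(A.numpy(), b.numpy(), rcond=None)
assert torch.allclose(sol, torch.from_numpy(ref), atol=1e-10)
```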

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D27993741

Pulled By: mruberry

fbshipit-source-id: a4baf25aca6a8272f1af2f963600866bfda56fb3
2021-04-29 09:11:12 -07:00
Ivan Yashchuk
5a02f72fcf Modified batched residuals return of torch.linalg.lstsq (#54722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54722

SciPy and NumPy operate only on non-batched input and return an empty array with shape (0,) if rank(a) != n.
The behavior for non-batched inputs is NumPy and SciPy compatible and the same result is computed.
For batched inputs, if any matrix in the batch has a rank less than `n`, then an empty tensor is returned.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27993736

Pulled By: mruberry

fbshipit-source-id: 0d7cff967b322a5e816a23f282b6ce383c4468ef
2021-04-29 09:10:12 -07:00
Heitor Schueroff
57e37080cd Added OpInfo for torch.einsum (#56276)
Summary:
Adds OpInfo testing for torch.einsum.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56276

Reviewed By: mruberry

Differential Revision: D27967095

Pulled By: heitorschueroff

fbshipit-source-id: 60524273d2ca885e7eeb932db3e7fd697ae5ca8e
2021-04-27 07:39:38 -07:00
Ivan Yashchuk
f84f2063b4 Port CUDA torch.geqrf to ATen (#56251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56251

This PR ports the CUDA path of `torch.geqrf` from TH to ATen.

Resolves https://github.com/pytorch/pytorch/issues/24569

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27960155

Pulled By: mruberry

fbshipit-source-id: a8b010c41d703a5de4bf40b045c89e6b95b5a5ca
2021-04-26 09:50:41 -07:00
Ivan Yashchuk
6ba9fd5963 Added "Tensor tol" overload of torch.linalg.matrix_rank (#54157)
Summary:
Currently `torch.linalg.matrix_rank` accepts only a Python float for the `tol=` argument. This behavior is not NumPy compatible, so this PR adds the possibility of passing a Tensor for matrix-wise tolerances.

Ref. https://github.com/pytorch/pytorch/issues/42666
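
A sketch of the Tensor-valued tolerance, one entry per matrix in the batch (the expected ranks are assumptions based on the construction):
```python
import torch

A = torch.eye(3).expand(2, 3, 3).contiguous()
A[0, 2, 2] = 1e-9                  # first matrix in the batch is nearly rank-deficient
tol = torch.tensor([1e-6, 1e-12])  # matrix-wise tolerances, broadcast over the batch
print(torch.linalg.matrix_rank(A, tol=tol))  # expected: tensor([2, 3])
```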

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54157

Reviewed By: ezyang

Differential Revision: D27961548

Pulled By: mruberry

fbshipit-source-id: 47318eefa07a7876e6360dae089e5389b9939489
2021-04-26 09:35:40 -07:00
Ivan Yashchuk
d5ff432615 Add torch.linalg.svdvals (#56684)
Summary:
This PR adds `torch.linalg.svdvals(input, out=None)` that computes only the singular values of `input`.

Resolves https://github.com/pytorch/pytorch/issues/54155.
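
A minimal sketch of the new function against the full decomposition:
```python
import torch

A = torch.randn(5, 3, dtype=torch.float64)
S = torch.linalg.svdvals(A)
# Matches the S factor of the thin SVD, without computing U and Vh.
assert torch.allclose(S, torch.linalg.svd(A, full_matrices=False).S, atol=1e-12)
```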

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56684

Reviewed By: albanD

Differential Revision: D27938229

Pulled By: mruberry

fbshipit-source-id: 5ea79ad9cccf818df0fbda1f431299ebf8de3798
2021-04-25 03:42:24 -07:00
Ivan Yashchuk
58fcf77712 Port CPU torch.geqrf to ATen (#56249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56249

This PR ports `torch.geqrf` from TH to ATen. The CUDA path will be
implemented in a follow-up PR.
With the ATen port, support for complex and batched inputs is added.
There were no correctness tests; they are
added in this PR, along with an OpInfo for this operation.

We can implement the QR decomposition as a composition of geqrf and
orgqr (torch.linalg.householder_product).
Also we can implement the least squares solver with geqrf + ormqr +
trtrs. So it's useful to have this function renewed at least for the
internal code.

Resolves https://github.com/pytorch/pytorch/issues/24705

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27907357

Pulled By: mruberry

fbshipit-source-id: 94e1806078977417e7903db76eab9d578305f585
2021-04-25 01:17:00 -07:00
Heitor Schueroff
369e8bc4bc Added support for uppercase letters in torch.einsum (#56475)
Summary:
This PR adds support for upper case letters in `torch.einsum` equation.

Addresses PR https://github.com/pytorch/pytorch/pull/55013 here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56475

Reviewed By: ailzhang

Differential Revision: D27948362

Pulled By: heitorschueroff

fbshipit-source-id: 51cf57b17c4c23d88fab5343f17ba3bfbe3607a5
2021-04-23 08:13:58 -07:00
Kurt Mohler
1f04494c0e Consolidate nondeterministic error tests (#55631)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51498

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55631

Reviewed By: malfet

Differential Revision: D27909953

Pulled By: mruberry

fbshipit-source-id: 9115b2433f9c276555be55bd51b270a7a2846829
2021-04-22 23:37:01 -07:00
Jeffrey Wan
2ea3c24c06 Disable flaky tests (#56279)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56279

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D27916606

Pulled By: soulitzer

fbshipit-source-id: 60c07024f6eb818f4aa6730a5f9ff90d7bc2b80f
2021-04-22 19:45:41 -07:00
Ivan Yashchuk
3d878dee45 Added out= variant for torch.linalg.lstsq (#54721)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54721

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27874711

Pulled By: mruberry

fbshipit-source-id: 696ebb6eb0bad81988e9cb7a081388a3a5ab3e2c
2021-04-20 07:09:06 -07:00
Winston Smith
7513455c74 Make tensordot resize output tensor's size if out= argument is specified & make it safely cast & copy output (#56286)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56022.
Fixes https://github.com/pytorch/pytorch/issues/56316

For `torch.tensordot`,
1. `tensordot`'s out variant now resizes the output tensor provided as the `out` argument if necessary.
2. Added a check to verify if the output tensor provided as the argument for `out` is on the same device as the input tensors.
3. Added a check to verify if the dtype of the result is castable to the dtype of the output tensor provided as an argument for `out`.
4. Because of (2) & (3), `tensordot`'s out variant now [safely casts & copies output](https://github.com/pytorch/pytorch/wiki/Developer-FAQ#how-does-out-work-in-pytorch).
5. `test_tensordot` in `test_linalg.py` had a bug: the output tensor wasn't defined on the same device as the input tensors. It was fixed by simply using a `device` argument in its definition.
6. Added an `OpInfo` for `tensordot` and modified the `OpInfo` for `inner`.

cc heitorschueroff mruberry
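
A sketch of the resizing behavior from item (1) above:
```python
import torch

a = torch.randn(3, 4)
b = torch.randn(4, 5)
out = torch.empty(0)  # wrong size: now resized rather than raising
torch.tensordot(a, b, dims=1, out=out)
assert out.shape == (3, 5)
assert torch.allclose(out, a @ b)
```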

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56286

Reviewed By: ngimel

Differential Revision: D27845980

Pulled By: mruberry

fbshipit-source-id: 134ab163f05c31a6900dd65aefc745803019e037
2021-04-19 04:20:21 -07:00
Kurt Mohler
a3a75bd35e Add complex autograd support for torch.cross (#55854)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53512

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55854

Reviewed By: nikithamalgifb

Differential Revision: D27737571

Pulled By: anjali411

fbshipit-source-id: 38165b952cc4c9213d61c7d98b549b984c154927
2021-04-15 15:07:25 -07:00
Mike Ruberry
399b66c813 Ports logdet from method_tests() to op_db (#55743)
Summary:
Per title. Also updates some tensor construction helpers.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55743

Reviewed By: ngimel

Differential Revision: D27702060

Pulled By: mruberry

fbshipit-source-id: f64b7bee855733ad1f4fd182819ceec5831d9878
2021-04-11 20:39:16 -07:00
Yukio Siraichi
93bf0ae6fc Remove legacy constructor calls from pytorch codebase. (#54142)
Summary:
Follow up from https://github.com/pytorch/pytorch/issues/53889
Related to https://github.com/pytorch/pytorch/issues/47112

Removing every occurrence of the legacy constructor call present in PyTorch at:
- _docs_
- _benchmarks_
- _test_
- _caffe2_
- _CONTRIBUTING.md_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54142

Reviewed By: ngimel

Differential Revision: D27699450

Pulled By: mruberry

fbshipit-source-id: 530aa3f5746cc8bc1407d5d51b2bbd8075e30546
2021-04-11 15:45:17 -07:00
Arindam Roy
0dff0d1537 [ROCM] Disable few tests for Magma (#55534)
Summary:
After MAGMA was enabled, around 5k new tests are running now.
Of these, 5 tests (each having 4 datatypes) are failing on the latest ROCm
CI with ROCm 4.1. Disabling these tests for now so the ROCm CI does not fail.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55534

Reviewed By: ZolotukhinM

Differential Revision: D27630085

Pulled By: malfet

fbshipit-source-id: c48d124e6a2b4a4f3c6c4b6ac2bdf6c214f325c7
2021-04-07 22:22:43 -07:00
Nikita Shulga
add49e7e4e Enforce PEP263 for PyTorch python codebase (#55346)
Summary:
All Python files containing non-ASCII characters should be correctly annotated with a `# -*- coding: utf-8 -*-` comment.

Delete a number of superfluous UTF-8 characters, most commonly the UTF-8 right single quotation mark U+2019 (’) used instead of the ASCII apostrophe ', for example `Module’s`->`Module's`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55346

Reviewed By: samestep

Differential Revision: D27582044

Pulled By: malfet

fbshipit-source-id: c1cd89655915858ff3a41f675cdfffff795a8e44
2021-04-06 18:31:38 -07:00
Ivan Yashchuk
84d18727bd Added linalg.eig, linalg.eigvals (#52491)
Summary:
This PR adds `torch.linalg.eig`, and `torch.linalg.eigvals` for NumPy compatibility.

MAGMA uses a hybrid CPU-GPU algorithm and doesn't have a GPU interface for the non-symmetric eigendecomposition. This forces us to transfer inputs living in GPU memory to the CPU before calling MAGMA, and then transfer the results back to the GPU. That is rather slow for smaller matrices, and MAGMA is faster than the CPU path only for matrices larger than 3000x3000.
Unfortunately, there is no cuSOLVER function for this operation.

Autograd support for `torch.linalg.eig` will be added in a follow-up PR.

Ref https://github.com/pytorch/pytorch/issues/42666
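
A minimal usage sketch (complex output for real input, matching NumPy; the input is assumed diagonalizable, which holds almost surely for random matrices):
```python
import torch

A = torch.randn(3, 3, dtype=torch.float64)
w, V = torch.linalg.eig(A)
# Eigenvalues and eigenvectors come back complex even for real input.
recon = V @ torch.diag(w) @ torch.linalg.inv(V)
assert torch.allclose(recon, A.to(recon.dtype), atol=1e-10)
```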

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52491

Reviewed By: anjali411

Differential Revision: D27563616

Pulled By: mruberry

fbshipit-source-id: b42bb98afcd2ed7625d30bdd71cfc74a7ea57bb5
2021-04-06 13:53:26 -07:00
Heitor Schueroff
d98072b027 Deprecate torch.chain_matmul in favor of torch.linalg.multi_dot (#53453)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53453

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D27406282

Pulled By: heitorschueroff

fbshipit-source-id: b6e715d1b88e0613ee6b6208cb28ba4757e31717
2021-04-01 04:50:51 -07:00
Heitor Schueroff
5d68b3695c [Relanding] Implemented torch.linalg.multi_dot (#52859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52859

This reverts commit 92a4ee1cf6.

Added support for bfloat16 for CUDA 11 and removed the fast path for empty input tensors that was affecting the autograd graph.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D27402390

Pulled By: heitorschueroff

fbshipit-source-id: 73c5ccf54f3da3d29eb63c9ed3601e2fe6951034
2021-04-01 04:49:05 -07:00
Ivan Yashchuk
854c92078a Fixed the default size of the workspace array for MAGMA's SVD (#54875)
Summary:
The problem was that MAGMA might not set the value for the optimal size of the workspace array, leaving it uninitialized. This is fixed by setting a default value for the `wkopt` variable.

Fixes https://github.com/pytorch/pytorch/issues/54381 and https://github.com/pytorch/pytorch/issues/53976.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54875

Reviewed By: H-Huang

Differential Revision: D27437702

Pulled By: mruberry

fbshipit-source-id: bf61555abc4c50e8ef2dae933df24ce4d4fe4527
2021-03-30 19:28:06 -07:00
anjali411
7c8b0f2600 Test torch.chain_matmul for complex dtype (#54885)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54885

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D27400936

Pulled By: anjali411

fbshipit-source-id: 415d843d7c55f4d84a8e9faab926a4895e1544d0
2021-03-29 13:37:23 -07:00
Edward Yang
1f36ce6e4d Restore storage on meta tensors; increase meta coverage (#53973)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53973

Two parts to this PR; I had to put them together because adding support for X causes more test code to be exercised, which in turn may require a fix for Y.

The first part is restoring the concept of storage to meta tensors.  Previously, meta tensors had a nullptr storage (e.g., `meta_tensor.storage()` is an error.) As I was increasing the coverage of meta tensors, I started running into test cases (specifically memory overlap tests) that were failing because not having storage meant I couldn't check for memory overlap. After some discussion, we decided that it would make sense for meta tensors to model this as well (we already model strides, so getting accurate view information also seems useful). This PR does that by:

* Rewrite all of the factory functions in MetaTensor.cpp to use the generic versions (which are very carefully written to not actually poke at the data pointer, so everything works out). The key idea here is we give meta tensors a special allocator, MetaAllocator, which always returns a nullptr even if you ask for a nonzero number of bytes. resize_ is also made generic; the normal variant can be used directly rather than having to instruct it to avoid resizing storage
* Turn on memory overlap checking in TensorIterator even for meta tensors
* Although meta tensors now have storage, the concept of meta storage is NOT exposed to Python land (as it would imply I would have to codegen MetaFloatStorage, MetaDoubleStorage, etc. classes). So `x.storage()` still raises an error and I have a kludge in `__deepcopy__` to break storage sharing upon deep copy (this is wrong, but no tests exercise this at the moment).

The second part is adding more support for the most used functions in the test suite.

* Inplace operations have very simple meta functions. I added `fill_`, `zero_`, `random_`, `uniform_` and `normal_`. In the case of random, I take advantage of pbelevich's templates for defining random kernels, so that I can reuse the common scaffolding, and then just register a noop stub that actually does the RNG. (Look, another structured kernels tiny variant!)
* `copy_` is now implemented. Copying into a meta tensor is always OK, but copying out of a meta tensor raises an error (as we don't know what the "correct" data to copy out is in this case)
* `empty_strided` usage from structured kernels now is implemented (TBH, this could have been done as soon as `empty_strided` was added)
* Meta was missing in a few places in TensorOptions/DispatchKey utility functions, so I added them
* Autograd engine now correctly homes meta tensors with CPU tensors (they have -1 device index so CUDA queues wouldn't work anyway)
* `apply_`, `map_` and `map2_` are special cased to no-op on meta tensor self. These count as inplace operations too but they are implemented a little differently.

Getting more meta function support triggers a number of bugs in the test suite, which I then fix:

- Linear algebra functions sometimes don't report NotImplementedError because it gets swallowed by catch-all try blocks. This is tracked in https://github.com/pytorch/pytorch/issues/53739
- dlpack obviously doesn't work with meta tensors, I just disabled the test
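
A small sketch of the copy semantics described above (behavior as stated in this summary; treat it as illustrative):
```python
import torch

m = torch.empty(2, 3, device='meta')  # shape/stride metadata, no real storage
src = torch.randn(2, 3)
m.copy_(src)       # copying *into* a meta tensor is always OK
try:
    src.copy_(m)   # copying *out of* a meta tensor raises
except Exception as e:
    print('copy out of meta failed as expected:', type(e).__name__)
```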

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D27036572

Test Plan: Imported from OSS

Reviewed By: agolynski, bdhirsh

Pulled By: ezyang

fbshipit-source-id: 7005ecf4feb92a643c37389fdfbd852dbf00ac78
2021-03-29 08:37:46 -07:00
Heitor Schueroff
f9e7f132fb Added torch.linalg.matrix_power (#52608)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52608

**TODO**

- [x] Add OpInfo
- [x] Update documentation
- [x] Add more tests and compare against NumPy

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D27261532

Pulled By: heitorschueroff

fbshipit-source-id: c1e4ab297da3683f6d5751be8790602f9dc37b6b
2021-03-23 15:10:06 -07:00
Mike Ruberry
544a996f83 Revert D27155845: [pytorch][PR] Fixed the size of the workspace array in functions calling MAGMA
Test Plan: revert-hammer

Differential Revision:
D27155845 (04a2506091)

Original commit changeset: 04439bfa82a5

fbshipit-source-id: f45967e94883effbb43d8d0a019596f1f82caa56
2021-03-19 08:27:18 -07:00
Ivan Yashchuk
04a2506091 Fixed the size of the workspace array in functions calling MAGMA (#54009)
Summary:
The size of the workspace arrays should not be less than 1. This PR fixes lstsq calls to LAPACK and MAGMA. Also `max(1, ...)` guards were added to a few other functions (symeig, svd).
ROCm testing is enabled for lstsq, pinv, pinverse.

Fixes https://github.com/pytorch/pytorch/issues/53976

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54009

Reviewed By: ejguan

Differential Revision: D27155845

Pulled By: mruberry

fbshipit-source-id: 04439bfa82a5bdbe2297a6d62b6e68ba1c30e4a2
2021-03-18 10:07:45 -07:00
Kurt Mohler
382a47b493 Add torch.linalg.vector_norm function (#51099)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50214

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51099

Reviewed By: agolynski

Differential Revision: D27147360

Pulled By: mruberry

fbshipit-source-id: 1056f840e7027ad81971c9d1a9f952ab9648f1b5
2021-03-18 06:41:39 -07:00
Ivan Yashchuk
564456ac44 Added autograd support for torch.orgqr (#52637)
Summary:
This PR adds autograd support for `torch.orgqr`.

Since `torch.orgqr` is one of the few functions that expose LAPACK's naming, and all other linear algebra routines were renamed a long time ago, I also added a new function with a new name; `torch.orgqr` is now an alias for it.

The new proposed name is `householder_product`. For a matrix `input` and a vector `tau`, LAPACK's orgqr operation takes the columns of `input` (called Householder vectors or elementary reflectors) and the scalars of `tau`, which together represent Householder matrices, and computes the product of these matrices. See https://www.netlib.org/lapack/lug/node128.html.
Other linear algebra libraries that I'm aware of do not expose this LAPACK function, so there is some freedom in naming it. It is usually used internally only for QR decomposition, but it can be useful for deep learning tasks now that it supports differentiation.

Resolves https://github.com/pytorch/pytorch/issues/50104
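
A sketch of the renamed op and the autograd support this PR adds (the exact match with `linalg.qr`'s Q assumes the LAPACK path, where both reduce to geqrf + orgqr):
```python
import torch

A = torch.randn(4, 3, dtype=torch.float64)
a, tau = torch.geqrf(A)
a.requires_grad_()
tau.requires_grad_()
Q = torch.linalg.householder_product(a, tau)  # torch.orgqr is now an alias for this
Q.sum().backward()                            # differentiable as of this PR
assert torch.allclose(Q.detach(), torch.linalg.qr(A).Q, atol=1e-10)
```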

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52637

Reviewed By: agolynski

Differential Revision: D27114246

Pulled By: mruberry

fbshipit-source-id: 9ab51efe52aec7c137aa018c7bd486297e4111ce
2021-03-18 05:42:18 -07:00
Edward Yang
c2f41b6b84 Add meta device to generic device testing framework, skip NotImplementedError (#53682)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53682

With this, under the meta device, 101 tests passed and 16953 skipped.
It ain't much, but it's a start.

Some various bits and bobs:
- NotImplementedError suppression at test level is implemented
  in the same way as CUDA memory leak check, i.e., by wrapping
  test methods and monkeypatching them back in.
- I had to reimplement assertRaises/assertRaisesRegex from scratch to
  ignore NotImplementedError when _ignore_not_implemented_error is True.
  The implementation relies on a small amount of private API that hasn't
  changed since 2010
- expectedAlertNondeterministic doesn't really work so I skipped them
  all; there's probably a way to do it better

I tested this using `pytest --disable-warnings --tb=native -k meta --sw
test/*.py` and a pile of extra patches to make collection actually work
(lol).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D26955539

Pulled By: ezyang

fbshipit-source-id: ac21c8734562497fdcca3b614a28010bc4c03d74
2021-03-14 20:41:19 -07:00
Mike Ruberry
319ab58e27 Skips test_linalg_lstsq on ROCm (#53977)
Summary:
This test is flaky (tracked in https://github.com/pytorch/pytorch/issues/53976). This PR skips it to let the rest of the ROCm CI run.

cc nikitaved

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53977

Reviewed By: ngimel

Differential Revision: D27036705

Pulled By: mruberry

fbshipit-source-id: 5bae741fd2a68f23717cb3a7c8b73e97cfb23b5c
2021-03-14 05:42:39 -07:00
Ivan Yashchuk
7df176b1f9 Added OpInfo-based testing of some linalg functions (#51107)
Summary:
Added OpInfo-based testing of the following linear algebra functions:
* cholesky, linalg.cholesky
* linalg.eigh
* inverse, linalg.inv
* qr, linalg.qr
* solve

The output of `torch.linalg.pinv` for empty inputs was not differentiable; now it's fixed.

In some cases, batched grad checks are disabled because it doesn't work well with 0x0 matrices (see https://github.com/pytorch/pytorch/issues/50743#issuecomment-767376085).

Ref. https://github.com/pytorch/pytorch/issues/50006

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51107

Reviewed By: albanD

Differential Revision: D27006115

Pulled By: mruberry

fbshipit-source-id: 3c1d00e3d506948da25d612fb114e6d4a478c5b1
2021-03-14 01:10:02 -08:00
Mike Ruberry
d46978cc55 Refines test_orgqr_* skip (#53975)
Summary:
https://github.com/pytorch/pytorch/pull/51348 added CUDA support for orgqr but only a cuSOLVER path; the orgqr tests, however, were marked to run on builds with either MAGMA or cuSOLVER.

This PR addresses the issue by creating a skipCUDAIfNoCusolver decorator and applying it to the orgqr tests. It triggers ci-all because our CI build with MAGMA but no cuSOLVER is CUDA 9.2, which does not run in the typical PR CI.

cc IvanYashchuk

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53975

Reviewed By: ngimel

Differential Revision: D27036683

Pulled By: mruberry

fbshipit-source-id: f6c0a3e526bde08c44b119ed2ae5d51fee27e283
2021-03-14 00:41:26 -08:00
Ivan Yashchuk
fe08671756 Added cuBLAS path for torch.triangular_solve (#53147)
Summary:
This PR adds the cuBLAS based path for `torch.triangular_solve`
The device dispatching helper function was removed from native_functions.yml, it is replaced with DECLARE/DEFINE_DISPATCH.

`magmaTriangularSolve` is removed and replaced with cuBLAS calls, this is not a BC-breaking change because internally MAGMA just calls the same cuBLAS function and doesn't do anything else.

Batched cuBLAS is faster than batched MAGMA for matrices of size up to 512x512; after that, MAGMA is faster. For batches smaller than ~8 and matrix sizes larger than 64x64, a for-loop of cuBLAS calls is faster than the batched version.

Ref. https://github.com/pytorch/pytorch/issues/47953
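
A usage sketch of the op whose CUDA backend is switched here (the diagonal shift just keeps the system well conditioned):
```python
import torch

A = torch.randn(3, 3).triu() + 3 * torch.eye(3)  # well-conditioned upper-triangular
b = torch.randn(3, 2)
x, _ = torch.triangular_solve(b, A, upper=True)
assert torch.allclose(A @ x, b, atol=1e-5)
```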

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53147

Reviewed By: heitorschueroff

Differential Revision: D27007416

Pulled By: mruberry

fbshipit-source-id: ddfc190346e6a56b84145ed0a9af67ca9cde3506
2021-03-12 13:38:42 -08:00
Nikita Vedeneev
afa1ff8e04 Implements torch.linalg.lstsq (#49093)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44378 by providing a wider range of drivers similar to what SciPy is doing.

The supported CPU drivers are `gels, gelsy, gelsd, gelss`.
The CUDA interface has only `gels` implemented, and only for overdetermined systems.

The current state of this PR:
- [x] CPU interface
- [x] CUDA interface
- [x] CPU tests
- [x] CUDA tests
- [x] Memory-efficient batch-wise iteration with broadcasting which fixes https://github.com/pytorch/pytorch/issues/49252
- [x] docs
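
As a sketch of the driver selection above (CPU drivers; `gels` assumes a full-rank system):
```python
import torch

A = torch.randn(6, 4, dtype=torch.float64)
b = torch.randn(6, 1, dtype=torch.float64)
for driver in ('gels', 'gelsy', 'gelsd', 'gelss'):
    x = torch.linalg.lstsq(A, b, driver=driver).solution
    # All drivers satisfy the normal equations A^T (A x - b) = 0.
    residual = A.t() @ (A @ x - b)
    assert torch.allclose(residual, torch.zeros_like(residual), atol=1e-8)
```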

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49093

Reviewed By: albanD

Differential Revision: D26991788

Pulled By: mruberry

fbshipit-source-id: 8af9ada979240b255402f55210c0af1cba6a0a3c
2021-03-12 13:25:55 -08:00
Nikita Vedeneev
8f15a2f052 eig_backward: faster and with complex support (#52875)
Summary:
As per title. Compared to the previous version, it is lighter on the usage of `at::solve` and `at::matmul` methods.

Fixes https://github.com/pytorch/pytorch/issues/51621

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52875

Reviewed By: mrshenli

Differential Revision: D26768653

Pulled By: anjali411

fbshipit-source-id: aab141968d02587440128003203fed4b94c4c655
2021-03-10 11:33:30 -08:00
Ivan Yashchuk
e937db5dba Added CUDA support for torch.orgqr (#51348)
Summary:
**Update:** MAGMA support was dropped from this PR. Only the cuSOLVER path is implemented and it's used exclusively.

**Original PR message:**

This PR adds support for CUDA inputs for `torch.orgqr`.

The CUDA implementation is based on both [cuSOLVER](https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverDN-lt-t-gt-orgqr) and MAGMA. cuSOLVER doesn't have a specialized routine for the batched case, while MAGMA doesn't have a specialized GPU-native (no CPU sync) `orgqr`. However, MAGMA has implemented (though not documented) a batched GPU-native version of the `larft` function (for small inputs of size <= 32), which together with the `larfb` operation forms `orgqr` (see the call graph [here at the end of the page](http://www.netlib.org/lapack/explore-html/da/dba/group__double_o_t_h_e_rcomputational_ga14b45f7374dc8654073aa06879c1c459.html)).

So now there are two main codepaths for CUDA inputs (if both MAGMA and cuSOLVER are available):
* if `batchsize > 1` and `tau.shape[-1] <= 32` then MAGMA based function is called
* else [cuSOLVER's `orgqr`](https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverDN-lt-t-gt-orgqr) is used.

If MAGMA is not available then only cuSOLVER is used and vice versa.

Documentation updates and possibly a new name for this function will be in a follow-up PR.

Ref. https://github.com/pytorch/pytorch/issues/50104

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51348

Reviewed By: heitorschueroff

Differential Revision: D26882415

Pulled By: mruberry

fbshipit-source-id: 9f91ff962921932777ff108bedc133b55fe22842
2021-03-10 09:59:56 -08:00
mattip
54a2498919 Modify tests to use assertWarnsOnceRegex instead of maybeWarnsRegex (#52387)
Summary:
Related to https://github.com/pytorch/pytorch/issues/50006

Follow on for https://github.com/pytorch/pytorch/issues/48560 to ensure TORCH_WARN_ONCE warnings are caught. Most of this is straightforward find-and-replace, but I did find one place where the TORCH_WARN_ONCE warning was not wrapped into a Python warning.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52387

Reviewed By: albanD

Differential Revision: D26773387

Pulled By: mruberry

fbshipit-source-id: 5be7efbc8ab4a32ec8437c9c45f3b6c3c328f5dd
2021-03-08 03:32:14 -08:00
Peter Bell
5ebfabb310 MAGMA: Initialize ipiv data to avoid internal memory access violation (#53064)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51930

Running the reproducer under `cuda-gdb`, I see access violations sometimes in [`zswap_kernel_batched`](4fd4634f35/magmablas/zgetf2_kernels.cu (lines-276)) (part of the LU factorization) and other times in [`zlaswp_columnserial_kernel`](4fd4634f35/magmablas/zlaswp_batched.cu (lines-335)) (part of the inverse).

The common factor between the two is that both use `ipiv` to index into the matrix. My best guess is that the `ipiv` indices aren't written when the factorization fails, so garbage data is used as matrix indices and we get an access violation. Initializing `ipiv` to a known-good value before the factorization fixes the issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53064

Reviewed By: zhangguanheng66

Differential Revision: D26829053

Pulled By: heitorschueroff

fbshipit-source-id: 842854a6ee182f20b2acad0d76d32d27cb51b061
2021-03-05 08:59:27 -08:00
Kyle Chen
bf5e5bf901 [ROCm] Enable test in test_linalg.py, test_optim.py and test_vmap.py … (#52818)
Summary:
Enable tests in test_linalg.py, test_optim.py, and test_vmap.py for ROCm because they are now passing.

Signed-off-by: Kyle Chen <kylechen@amd.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52818

Reviewed By: H-Huang

Differential Revision: D26694091

Pulled By: mruberry

fbshipit-source-id: 285d17aa7f271f4d94b5fa9d9f6620de8a70847b
2021-03-04 02:29:45 -08:00
Mike Ruberry
9c2673df46 Revert D26723384: [pytorch][PR] Implements torch.linalg.lstsq
Test Plan: revert-hammer

Differential Revision:
D26723384 (3ac9013235)

Original commit changeset: c9866a95f140

fbshipit-source-id: 3e5263d71facdc91ca09d7dcbbbe3ba818ee2821
2021-03-03 15:24:25 -08:00
Mike Ruberry
20860ab01a Revert D26727918: [pytorch][PR] Added CUDA support for torch.orgqr
Test Plan: revert-hammer

Differential Revision:
D26727918 (e29d8477a6)

Original commit changeset: 1c4d15fa76ba

fbshipit-source-id: f3d5d6811ab77332a333cd165d69fcd9ecd92dc6
2021-03-03 10:06:49 -08:00
Ivan Yashchuk
926e011cde Fixed out= variant of linalg.solve (#51968)
Summary:
This PR modifies the behavior of the `linalg_solve_out` variant to match the description here https://github.com/pytorch/pytorch/wiki/Developer-FAQ#how-does-out-work-in-pytorch
With this PR, result and input tensors must be on the same device and have the same "type kind".
It's allowed to pass out tensors with complex dtypes for float inputs.

`linalg_solve_out` was broken for batched vector inputs and it's now fixed.
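
A small sketch of the out= contract described above (the complex-out-for-float-in case is allowed per this PR):

```python
import torch

A = torch.randn(3, 3, dtype=torch.float64)
b = torch.randn(3, 3, dtype=torch.float64)

# out must be on the same device as the inputs; a complex dtype for a
# float input is a safe cast and therefore accepted.
out = torch.empty(3, 3, dtype=torch.complex128)
torch.linalg.solve(A, b, out=out)
```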

Ref. https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51968

Reviewed By: H-Huang

Differential Revision: D26728825

Pulled By: mruberry

fbshipit-source-id: c06fe937e7f452193b23ba09ca6cfa2703488455
2021-03-02 22:33:19 -08:00
Ivan Yashchuk
e29d8477a6 Added CUDA support for torch.orgqr (#51348)
Summary:
This PR adds support for CUDA inputs for `torch.orgqr`.

The CUDA implementation is based on both [cuSOLVER](https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverDN-lt-t-gt-orgqr) and MAGMA. cuSOLVER doesn't have a specialized routine for the batched case, while MAGMA doesn't have a specialized GPU-native (without CPU sync) `orgqr`. However, MAGMA has an implemented (but undocumented) batched GPU-native version of the `larft` function (for small inputs of size <= 32), which together with the `larfb` operation forms `orgqr` (see the call graph [here at the end of the page](http://www.netlib.org/lapack/explore-html/da/dba/group__double_o_t_h_e_rcomputational_ga14b45f7374dc8654073aa06879c1c459.html)).

So now there are two main codepaths for CUDA inputs (if both MAGMA and cuSOLVER are available):
* if `batchsize > 1` and `tau.shape[-1] <= 32`, then the MAGMA-based function is called
* else [cuSOLVER's `orgqr`](https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverDN-lt-t-gt-orgqr) is used.

If MAGMA is not available then only cuSOLVER is used and vice versa.

Documentation updates and possibly a new name for this function will be in a follow-up PR.

Ref. https://github.com/pytorch/pytorch/issues/50104

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51348

Reviewed By: ngimel

Differential Revision: D26727918

Pulled By: mruberry

fbshipit-source-id: 1c4d15fa76ba624e341a69a32337a9a16cc01013
2021-03-02 21:34:23 -08:00
Nikita Vedeneev
3ac9013235 Implements torch.linalg.lstsq (#49093)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44378 by providing a wider range of drivers similar to what SciPy is doing.

The supported CPU drivers are `gels, gelsy, gelsd, gelss`.
The CUDA interface has only `gels` implemented but only for overdetermined systems.

The current state of this PR:
- [x] CPU interface
- [x] CUDA interface
- [x] CPU tests
- [x] CUDA tests
- [x] Memory-efficient batch-wise iteration with broadcasting which fixes https://github.com/pytorch/pytorch/issues/49252
- [x] docs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49093

Reviewed By: H-Huang

Differential Revision: D26723384

Pulled By: mruberry

fbshipit-source-id: c9866a95f14091955cf42de22f4ac9e2da009713
2021-03-02 19:00:07 -08:00
Ivan Yashchuk
870bac13bc Fixed out= variant of linalg.inv (#51977)
Summary:
This PR modifies the behavior of the `linalg_inv_out` variant to match the description here https://github.com/pytorch/pytorch/wiki/Developer-FAQ#how-does-out-work-in-pytorch
With this PR, result and input tensors must be on the same device and have the same "type kind".
It's allowed to pass out tensors with complex dtypes for float inputs.

Ref. https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51977

Reviewed By: H-Huang

Differential Revision: D26725718

Pulled By: mruberry

fbshipit-source-id: 2acc2a311328268706ce27ce060fc88fc7416753
2021-03-02 18:45:29 -08:00
Luca Wehrstedt
92a4ee1cf6 Revert D26375734: Implemented torch.linalg.multi_dot
Test Plan: revert-hammer

Differential Revision:
D26375734 (0396f492b9)

Original commit changeset: 839642692424

fbshipit-source-id: cb64db646010128d802e1930d5e9526c1f7aa6a2
2021-02-25 00:43:57 -08:00
Heitor Schueroff
0396f492b9 Implemented torch.linalg.multi_dot (#51807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51807

Implemented torch.linalg.multi_dot similar to [numpy.linalg.multi_dot](https://numpy.org/doc/stable/reference/generated/numpy.linalg.multi_dot.html).

This function does not support broadcasting or batched inputs at the moment.

**NOTE**
numpy.linalg.multi_dot allows the first and last tensors to have more than 2 dimensions, despite its docs stating they must be either 1D or 2D. This PR diverges from NumPy in that it enforces this restriction.
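
A minimal sketch (shapes illustrative) of the chained product this function optimizes:

```python
import torch

A = torch.randn(10, 100)
B = torch.randn(100, 5)
C = torch.randn(5, 50)

# multi_dot picks the cheapest parenthesization, here (A @ B) @ C.
out = torch.linalg.multi_dot([A, B, C])
print(out.shape)  # torch.Size([10, 50])
```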

**TODO**
- [ ] Benchmark against NumPy
- [x] Add OpInfo testing
- [x] Remove unnecessary copy for out= argument

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D26375734

Pulled By: heitorschueroff

fbshipit-source-id: 839642692424c4b1783606c76dd5b34455368f0b
2021-02-24 15:32:30 -08:00
Ivan Yashchuk
7ca9776874 Fixed _out variants of linear algebra functions (#51560)
Summary:
This PR modifies the behavior of `_out` variants to match the description here https://github.com/pytorch/pytorch/wiki/Developer-FAQ#how-does-out-work-in-pytorch
With this PR, result and input tensors must be on the same device and have the same "type kind".

I skipped `qr` and `eig` in this process as they require a bit more work.

Functions that can use the provided storage directly do so. If `result` is not empty and not in the batched column-major format, or does not have the same type as the input, then we have to allocate a temporary tensor and copy it.

TODO:

- [x] Add more tests for same device and valid safe dtype
- [x] Move inv and solve changes to separate PRs https://github.com/pytorch/pytorch/pull/51968, https://github.com/pytorch/pytorch/pull/51977

Ref. https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51560

Reviewed By: albanD

Differential Revision: D26400734

Pulled By: heitorschueroff

fbshipit-source-id: a6201ed7e919c1670c6ff3ef60217d1dbfb72e67
2021-02-19 04:03:35 -08:00
Jeff Daily
70a805a286 [ROCm] skip one more magma test that is flaky (#52064)
Summary:
Skipped hipMAGMA tests are tracked in https://github.com/pytorch/pytorch/issues/51303.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52064

Reviewed By: albanD

Differential Revision: D26406745

Pulled By: walterddr

fbshipit-source-id: 2405ea06e03450eb22177c2c8b12a366cfbdaa93
2021-02-11 14:02:52 -08:00
Jeff Daily
5dd1568aa3 [ROCm] skip more magma tests (#51915)
Summary:
Additional magma tests have been identified as failing after integrating hipMAGMA into the ROCm builds.  Skipping is necessary until they can be fixed properly.  This is blocking migration of ROCm CI to 4.0.1.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51915

Reviewed By: izdeby

Differential Revision: D26326404

Pulled By: malfet

fbshipit-source-id: 558cce66f216f404c0316ab036e2e5637fc99798
2021-02-09 09:14:42 -08:00
Jeff Daily
d02ea9a141 [ROCm] add hipMAGMA support (#51238)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48831.

- CI image is updated to build hipMAGMA from source and set env MAGMA_HOME.
- CMake is updated to separate different requirements for CUDA versus ROCm MAGMA.
- Some unit tests that become enabled with MAGMA are currently skipped for ROCm due to failures.  Fixing these failures will be follow-on work.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51238

Reviewed By: ngimel

Differential Revision: D26184918

Pulled By: malfet

fbshipit-source-id: ada632f1ae7b413e8cae6543fe931dcd46985821
2021-02-01 22:09:33 -08:00
Ivan Yashchuk
5e09ec6518 Fixed SVD ignoring "some/full_matrices" flag for empty inputs (#51109)
Summary:
For empty inputs `torch.svd` (and `torch.linalg.svd`) was returning incorrect results for `some=True` (`full_matrices=False`).
Behaviour on master branch:
```python
In [1]: import torch
In [2]: a = torch.randn(0, 7)
In [3]: a.svd()
Out[3]:
torch.return_types.svd(
U=tensor([], size=(0, 0)),
S=tensor([]),
V=tensor([[0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.]]))
In [4]: a.svd(some=False)
Out[4]:
torch.return_types.svd(
U=tensor([], size=(0, 0)),
S=tensor([]),
V=tensor([[0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.]]))
```
The `some` flag is ignored and a 7x7 `V` matrix is returned in both cases; `V` should have shape 7x0 when `some=True`.

This PR fixes that.
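
The post-fix behavior, sketched for the 0x7 example above:

```python
import torch

a = torch.randn(0, 7)
U, S, V = a.svd()  # some=True is the default
# After the fix, V is 7x0 (reduced) rather than a zero-padded 7x7.
print(U.shape, S.shape, V.shape)  # torch.Size([0, 0]) torch.Size([0]) torch.Size([7, 0])
```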

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51109

Reviewed By: ngimel

Differential Revision: D26170897

Pulled By: mruberry

fbshipit-source-id: 664c09ca27bb375fabef2a046d0a09ca57b01aac
2021-02-01 21:51:58 -08:00
Ivan Yashchuk
30675d0921 Added OpInfo-based testing of triangular_solve (#50948)
Summary:
Added OpInfo-based testing of `torch.triangular_solve`.

These tests helped to discover that CPU `triangular_solve` wasn't working for empty matrices and that for CUDA inputs a warning was printed to the terminal. Both issues are now fixed.

CUDA gradgrad checks are skipped.
```
11.44s call     test/test_ops.py::TestGradientsCUDA::test_fn_gradgrad_triangular_solve_cuda_complex128
2.97s call     test/test_ops.py::TestGradientsCUDA::test_fn_gradgrad_triangular_solve_cuda_float64
1.60s call     test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_triangular_solve_cpu_complex128
1.36s call     test/test_ops.py::TestOpInfoCUDA::test_supported_dtypes_triangular_solve_cuda_complex128
1.20s call     test/test_ops.py::TestGradientsCUDA::test_fn_grad_triangular_solve_cuda_complex128
0.86s call     test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_triangular_solve_cuda_complex64
0.85s call     test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_triangular_solve_cuda_complex128
0.81s call     test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_triangular_solve_cuda_float64
0.77s call     test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_triangular_solve_cuda_float32
0.46s call     test/test_ops.py::TestCommonCPU::test_variant_consistency_jit_triangular_solve_cpu_complex128
0.44s call     test/test_ops.py::TestCommonCPU::test_variant_consistency_jit_triangular_solve_cpu_complex64
0.44s call     test/test_ops.py::TestGradientsCUDA::test_fn_grad_triangular_solve_cuda_float64
0.42s call     test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_triangular_solve_cpu_float64
0.40s call     test/test_ops.py::TestCommonCPU::test_variant_consistency_jit_triangular_solve_cpu_float32
0.40s call     test/test_ops.py::TestCommonCPU::test_variant_consistency_jit_triangular_solve_cpu_float64
0.17s call     test/test_ops.py::TestGradientsCPU::test_fn_grad_triangular_solve_cpu_complex128
```

Ref. https://github.com/pytorch/pytorch/issues/50006

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50948

Reviewed By: ailzhang

Differential Revision: D26123998

Pulled By: mruberry

fbshipit-source-id: 54136e8fc8a71f107dddb692c5be298c6d5ed168
2021-01-29 10:31:07 -08:00
Jeffrey Wan
c0966914bc Internal gradcheck wrapper in testing._internal that sets certain flags to True (#51133)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49409

There are many call sites where gradcheck/gradgradcheck is now implicitly invoked with `check_batched_grad` as True, where it was previously False. Cases fall into two basic categories:
1) the call site was previously using `torch.autograd.gradcheck` but is now changed to use the globally imported function instead
2) the call site was already using the globally imported function, but does not explicitly pass the `check_batched_grad` flag

Only in the _assertGradAndGradgradChecks cases, which are infrequent, did I assume that the author is aware that omitting the flag means not applying check_batched_grad=True (but maybe that is not the case?).

Overall this PR in its current state assumes that unless the author explicitly specified `check_batched_grad=False`, they were just probably not aware of this flag and did not mean to have this flag as False.
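
For reference, a minimal sketch of opting in to the flag at a call site (the function and input are illustrative):

```python
import torch
from torch.autograd import gradcheck

x = torch.randn(3, dtype=torch.double, requires_grad=True)
# The internal wrapper effectively turns this flag on by default; call
# sites that cannot support it must pass check_batched_grad=False explicitly.
assert gradcheck(torch.sin, (x,), check_batched_grad=True)
```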

So far exceptions to the above (as discovered by CI) include:
 - Mkldnn (opaque tensors do not have strides) https://app.circleci.com/pipelines/github/pytorch/pytorch/264416/workflows/e4d87886-6247-4305-8526-2696130aa9a4/jobs/10401882/tests
 - all cases in test_sparse (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407103)
 - all cases in test_overrides (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407236)
 - test_autograd (test_LSTM_grad_and_gradgrad) - (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407235)
 - test_data_parallel (test_data_parallel_buffers_requiring_grad) - *SIGSEGV* (https://app.circleci.com/pipelines/github/pytorch/pytorch/264820/workflows/14d89503-040d-4e3d-9f7b-0bc04833589b/jobs/10422697)
 - test_nn (https://app.circleci.com/pipelines/github/pytorch/pytorch/264919/workflows/df79e3ed-8a31-4a8e-b584-858ee99686ff/jobs/10427315)

A possible TODO is to prevent new tests from invoking the external gradcheck.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51133

Reviewed By: ezyang

Differential Revision: D26147919

Pulled By: soulitzer

fbshipit-source-id: dff883b50f337510a89f391ea2fd87de2d531432
2021-01-29 09:13:37 -08:00
Ivan Yashchuk
6e4746c1ac Port cholesky_inverse to ATen (#50269)
Summary:
Now we can remove `_th_potri`!

Compared to the original TH-based `cholesky_inverse`, complex (https://github.com/pytorch/pytorch/issues/33152) and batched inputs (https://github.com/pytorch/pytorch/issues/7500) are now supported both on CPU and CUDA.
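
A minimal sketch of the newly supported batched complex usage (the Hermitian positive-definite construction is illustrative):

```python
import torch

b = torch.randn(4, 3, 3, dtype=torch.complex128)
A = b @ b.conj().transpose(-2, -1) + torch.eye(3, dtype=torch.complex128)

L = torch.linalg.cholesky(A)        # batched lower-triangular factors
A_inv = torch.cholesky_inverse(L)   # complex and batched inputs now work
print(torch.allclose(A @ A_inv, torch.eye(3, dtype=torch.complex128).expand_as(A), atol=1e-8))
```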

Closes https://github.com/pytorch/pytorch/issues/24685.
Closes https://github.com/pytorch/pytorch/issues/24543.

Ref. https://github.com/pytorch/pytorch/issues/49421, https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50269

Reviewed By: bdhirsh

Differential Revision: D26047548

Pulled By: anjali411

fbshipit-source-id: e4f191e39c684f241b7cb0f4b4c025de082cccef
2021-01-28 16:24:41 -08:00
Scott Wolchok
1321f2bfe6 [PyTorch] Port Caffe2 opti for BatchMatMul batch size 1 to baddbmm (#51057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51057

Caffe2 has an
[optimization](f8eefbdf7a/caffe2/operators/batch_matmul_op.h (L192))
for the case where the batch size is 1 that uses the underlying `gemm`
instead of the `gemm_batched` BLAS function. This diff tries to port that
optimization to `baddbmm_mkl`.
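
The identity that makes the optimization valid, sketched in Python (the C++ change routes the batch-size-1 case to a plain `gemm`; this only illustrates the equivalence):

```python
import torch

inp = torch.randn(1, 4, 6)
b1 = torch.randn(1, 4, 5)
b2 = torch.randn(1, 5, 6)

batched = torch.baddbmm(inp, b1, b2)        # batched path
single = torch.addmm(inp[0], b1[0], b2[0])  # plain gemm on the lone batch
print(torch.allclose(batched[0], single))
```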

Note that I have very little linear algebra background and am just
going off existing code and cblas API documentation, so please
review without assuming I know what I'm doing with the math itself.
ghstack-source-id: 120342923

Reviewed By: hlu1

Differential Revision: D26056613

fbshipit-source-id: feef80344b96601fc2bd0a2e8c8f6b57510d7856
2021-01-27 15:59:57 -08:00
Gao, Xiang
16dd5ca8ab Followup of kron PR (#51045)
Summary:
Followup of https://github.com/pytorch/pytorch/pull/50927

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51045

Reviewed By: mruberry

Differential Revision: D26089204

Pulled By: ngimel

fbshipit-source-id: 77291dd83fba32d6f80a8540910b112a1d85a892
2021-01-27 10:33:05 -08:00
Xiang Gao
ba316a7612 Fix TF32 failures in test_linalg.py (#50453)
Summary:
On Ampere GPUs, matmuls are computed by default with TF32 when the dtype is `torch.float`: https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices, which reduces the precision of the results. However, linear algebra usually needs higher precision, so lots of tests in `test_linalg.py` are failing on Ampere GPUs because of precision issues.

To fix this issue:
- Most linear algebra methods, except for matmuls, should add `NoTF32Guard`
- Expected results in unit tests should compute matmuls using NumPy instead of PyTorch CUDA.
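
For reference, a sketch of the Python-level switch corresponding to the C++ `NoTF32Guard` (available since TF32 support landed):

```python
import torch

# Disable TF32 for matmuls so float32 linear algebra keeps full precision
# on Ampere GPUs, restoring the previous setting afterwards.
prev = torch.backends.cuda.matmul.allow_tf32
torch.backends.cuda.matmul.allow_tf32 = False
try:
    ...  # run precision-sensitive linear algebra here
finally:
    torch.backends.cuda.matmul.allow_tf32 = prev
```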

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50453

Reviewed By: glaringlee

Differential Revision: D26023005

Pulled By: ngimel

fbshipit-source-id: f0ea533494fee322b07925565b57e3b0db2570c5
2021-01-26 19:51:20 -08:00
Xiang Gao
b822aba8ec Enable BFloat support for gemms on arch other than ampere (#50442)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50442

Reviewed By: bdhirsh

Differential Revision: D26044981

Pulled By: mruberry

fbshipit-source-id: 65c42f2c1de8d24e4852a1b5bd8f4b1735b2230e
2021-01-26 11:07:07 -08:00
Antonio Cuni
880f007480 Add torch.eig complex forward (CPU, CUDA) (#49168)
Summary:
Related to issue https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49168

Reviewed By: mrshenli

Differential Revision: D25954027

Pulled By: mruberry

fbshipit-source-id: e429f9587efff5e638bfd0e4de864c06f41c63b1
2021-01-25 21:27:08 -08:00
Ivan Yashchuk
ddf26816d3 Make torch.svd return V, not V.conj() for complex inputs (#51012)
Summary:
**BC-breaking note:**

torch.svd() added support for complex inputs in PyTorch 1.7, but was not documented as doing so. The complex "V" tensor returned was actually the complex conjugate of what's expected. This PR fixes the discrepancy.

This will silently break all users of torch.svd() with complex inputs.

**Original PR Summary:**

This PR resolves https://github.com/pytorch/pytorch/issues/45821.

The problem was that when introducing support for complex inputs in `torch.svd`, it was overlooked that LAPACK/MAGMA returns the conjugate transpose of the V matrix, not just the transpose of V. So `torch.svd` was silently returning U, S, V.conj() instead of U, S, V.

Behavior of `torch.linalg.pinv`, `torch.pinverse` and `torch.linalg.svd` (they depend on `torch.svd`) is not changed in this PR.
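
Post-fix, the usual reconstruction identity holds with the returned `V` directly; a sketch:

```python
import torch

a = torch.randn(4, 4, dtype=torch.complex128)
U, S, V = torch.svd(a)
# A == U @ diag(S) @ V^H now that V (rather than V.conj()) is returned.
recon = U @ torch.diag(S).to(a.dtype) @ V.conj().t()
print(torch.allclose(recon, a))
```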

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51012

Reviewed By: bdhirsh

Differential Revision: D26047593

Pulled By: albanD

fbshipit-source-id: d1e08dbc3aab9ce1150a95806ef3b5da98b5d3ca
2021-01-25 14:06:41 -08:00
Heitor Schueroff
a7cf04ec40 Workaround for MAGMA accessing illegal memory in batched cholesky (#50957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50957

MAGMA has an off-by-one error in its batched Cholesky implementation which causes illegal memory access for certain inputs. The workaround implemented in this PR is to pad the input to MAGMA with 1 extra element.

**Benchmark**
Ran the script below both before and after my PR and got similar results.

*Script*
```
import torch
from torch.utils import benchmark

DTYPE = torch.float32
BATCHSIZE = 512 * 512
MATRIXSIZE = 16

a = torch.eye(MATRIXSIZE, device='cuda', dtype=DTYPE)

t0 = benchmark.Timer(
    stmt='torch.cholesky(a)',
    globals={'a': a},
    label='Single'
)

t1 = benchmark.Timer(
    stmt='torch.cholesky(a)',
    globals={'a': a.expand(BATCHSIZE, -1, -1)},
    label='Batched'
)

print(t0.timeit(100))
print(t1.timeit(100))
```

*Results before*
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7faf9bc63400>
Single
  2.08 ms
  1 measurement, 100 runs , 1 thread
<torch.utils.benchmark.utils.common.Measurement object at 0x7faf9bc63400>
Batched
  7.68 ms
  1 measurement, 100 runs , 1 thread
```

*Results after*
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7faf9bc63400>
Single
  2.10 ms
  1 measurement, 100 runs , 1 thread
<torch.utils.benchmark.utils.common.Measurement object at 0x7faf9bc63400>
Batched
  7.56 ms
  1 measurement, 100 runs , 1 thread
```

Fixes https://github.com/pytorch/pytorch/issues/41394, https://github.com/pytorch/pytorch/issues/26996, https://github.com/pytorch/pytorch/issues/48996

See also https://github.com/pytorch/pytorch/issues/42666, https://github.com/pytorch/pytorch/pull/26789

TODO
 ---
- [x] Benchmark to check for perf regressions

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D26050978

Pulled By: heitorschueroff

fbshipit-source-id: 7a5ba7e34c9d74b58568b2a0c631cc6d7ba63f86
2021-01-25 13:39:24 -08:00
Ivan Yashchuk
627a331257 Port CPU torch.orgqr to ATen (#50502)
Summary:
Now we can remove `_th_orgqr`!

Compared to the original TH-based `orgqr`, complex (https://github.com/pytorch/pytorch/issues/33152) and batched inputs are now supported.
CUDA support will be added in a follow-up PR.

Closes https://github.com/pytorch/pytorch/issues/24747

Ref. https://github.com/pytorch/pytorch/issues/49421, https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50502

Reviewed By: mrshenli

Differential Revision: D25953300

Pulled By: mruberry

fbshipit-source-id: f52a74e1c8f51b5e24f7b461430ca8fc96e4d149
2021-01-25 02:57:05 -08:00
Xiao Wang
186c3da037 Add cusolver gesvdj and gesvdjBatched to the backend of torch.svd (#48436)
Summary:
This PR adds cusolver `gesvdj` and `gesvdjBatched` to the backend of `torch.svd`.

I've tested the performance using CUDA 11.1 on 2070, V100, and A100. The cusolver gesvdj and gesvdjBatched performance is better than MAGMA in all square-matrix cases, so the cusolver backend will replace the MAGMA backend when available.

When both matrix dimensions are no greater than 32, `gesvdjBatched` is used. Otherwise, `gesvdj` is used.

Detailed benchmark is available at https://github.com/xwang233/code-snippet/tree/master/svd.

Some relevant code and discussions
- https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/linalg/svd_op_gpu.cu.cc
- https://github.com/google/jax/blob/master/jaxlib/cusolver.cc
- https://github.com/cupy/cupy/issues/3174
- https://github.com/tensorflow/tensorflow/issues/13603
- https://www.nvidia.com/en-us/on-demand/session/gtcsiliconvalley2019-s9226/

See also https://github.com/pytorch/pytorch/issues/42666 https://github.com/pytorch/pytorch/issues/47953

Close https://github.com/pytorch/pytorch/pull/50516

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48436

Reviewed By: ejguan

Differential Revision: D25977046

Pulled By: heitorschueroff

fbshipit-source-id: c27e705cd29b6fd7c8ac674c1f9f490fa26ee1bf
2021-01-24 15:47:05 -08:00
Xiang Gao
ab331da7ac Rewrite kron with broadcasting at::mul (#50927)
Summary:
Because it is shorter, faster, and does not have the TF32 issue.

Benchmark: https://github.com/zasdfgbnm/things/blob/master/2021Q1/kron.ipynb
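
The broadcasting-multiply formulation, sketched for the 2-D case (index algebra: `kron(a, b)[i*p + k, j*q + l] == a[i, j] * b[k, l]`):

```python
import torch

a = torch.randn(3, 4)
b = torch.randn(5, 2)
m, n = a.shape
p, q = b.shape

kron = (a[:, None, :, None] * b[None, :, None, :]).reshape(m * p, n * q)
print(torch.allclose(kron, torch.kron(a, b)))
```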

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50927

Reviewed By: glaringlee

Differential Revision: D26022385

Pulled By: ngimel

fbshipit-source-id: 513c9e9138c35c70d3a475a8407728af21321dae
2021-01-22 20:58:17 -08:00
Kurt Mohler
8ab1a1495d Rename set_deterministic to use_deterministic_algorithms (#49904)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49100

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49904

Reviewed By: ezyang, mrshenli

Differential Revision: D25956761

Pulled By: mruberry

fbshipit-source-id: 86a59289d50825a0ebbd7c358b483c8d8039ffa6
2021-01-22 11:27:07 -08:00
Kurt Mohler
c082e2184d Add autograd tests for complex matrix norm nuclear and +/-2 (#50746)
Summary:
Also upgrades `linalg.norm`'s autograd and jit tests to `OpInfo`

Fixes https://github.com/pytorch/pytorch/issues/48842

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50746

Reviewed By: mruberry

Differential Revision: D25968246

Pulled By: anjali411

fbshipit-source-id: d457069ddb4caf2a5caed1aa64c791ef0790952c
2021-01-21 15:33:08 -08:00
Richard Zou
884fb48794 Miscellaneous batched grad testing (#50738)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50738

This PR adds batched grad testing for:
- test_linalg.py
- test_unary_ufuncs.py

Future:
- add batched grad testing for test_nn
- enable option for batched grad testing in OpInfo

Test Plan: - run tests

Reviewed By: ejguan

Differential Revision: D25997678

Pulled By: zou3519

fbshipit-source-id: 9a9f6694c041580061bd52b5e45661c872b0b761
2021-01-21 14:26:46 -08:00
Ivan Yashchuk
f9a5ba7398 Added linalg.slogdet (#49194)
Summary:
This PR adds `torch.linalg.slogdet`.

Changes compared to the original torch.slogdet:

- Complex input now works as in NumPy
- Added out= variant (allocates temporary and makes a copy for now)
- Updated `slogdet_backward` to work with complex input
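
A minimal sketch of the complex-input behavior:

```python
import torch

A = torch.randn(3, 3, dtype=torch.complex128)
sign, logabsdet = torch.linalg.slogdet(A)
det = sign * torch.exp(logabsdet)   # reconstruct det(A)
print(sign.dtype, logabsdet.dtype)  # complex128 (unit modulus), float64
```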

Ref. https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49194

Reviewed By: VitalyFedyunin

Differential Revision: D25916959

Pulled By: mruberry

fbshipit-source-id: cf9be8c5c044870200dcce38be48cd0d10e61a48
2021-01-19 07:28:12 -08:00
Ivan Yashchuk
9384d31af5 Added linalg.pinv (#48399)
Summary:
This PR adds `torch.linalg.pinv`.

Changes compared to the original `torch.pinverse`:
 * New kwarg "hermitian": with `hermitian=True` eigendecomposition is used instead of singular value decomposition.
 * `rcond` argument can now be a `Tensor` of appropriate shape to apply matrix-wise clipping of singular values.
 * Added `out=` variant (allocates temporary and makes a copy for now)
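
A minimal sketch of the two new kwargs (values illustrative):

```python
import torch

b = torch.randn(4, 4, dtype=torch.complex128)
A = b + b.conj().t()  # Hermitian input

# hermitian=True uses the cheaper eigendecomposition instead of the SVD.
P = torch.linalg.pinv(A, hermitian=True)

# rcond may be a tensor, broadcast against the singular values, for
# matrix-wise clipping in a batch.
batch = torch.randn(3, 5, 5)
P_batch = torch.linalg.pinv(batch, rcond=torch.tensor([1e-15, 1e-10, 1e-5]).reshape(3, 1))
```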

Ref. https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48399

Reviewed By: zhangguanheng66

Differential Revision: D25869572

Pulled By: mruberry

fbshipit-source-id: 0f330a91d24ba4e4375f648a448b27594e00dead
2021-01-12 06:52:06 -08:00
Ivan Yashchuk
4774c6800b Added linalg.inv (#48261)
Summary:
This PR adds `torch.linalg.inv` for NumPy compatibility.

`linalg_inv_out` uses in-place operations on provided `result` tensor.

I modified `apply_inverse` to accept a tensor of Int instead of a std::vector; that way we can write a function similar to `linalg_inv_out` but without the error checks and device memory synchronization.

I fixed `lda` (the leading-dimension parameter, which is max(1, n)) in many places to handle 0x0 matrices correctly.
Zero batch dimensions are also working and tested.

Ref https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48261

Reviewed By: gchanan

Differential Revision: D25849590

Pulled By: mruberry

fbshipit-source-id: cfee6f1daf7daccbe4612ec68f94db328f327651
2021-01-10 04:00:51 -08:00
Antonio Cuni
b5ab0a7f78 Improve torch.linalg.qr (#50046)
Summary:
This is a follow up of PR https://github.com/pytorch/pytorch/issues/47764 to fix the remaining details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50046

Reviewed By: zou3519

Differential Revision: D25825557

Pulled By: mruberry

fbshipit-source-id: b8e335e02265e73484a99b0189e4cc042828e0a9
2021-01-08 09:52:31 -08:00
Antonio Cuni
5c5abd591d Implement torch.linalg.svd (#45562)
Summary:
This is related to https://github.com/pytorch/pytorch/issues/42666 .
I am opening this PR to have the opportunity to discuss things.
First, we need to consider the differences between `torch.svd` and `numpy.linalg.svd`:

1. `torch.svd` takes `some=True`, while `numpy.linalg.svd` takes `full_matrices=True`, which is effectively the opposite (and with the opposite default, too!)

2. `torch.svd` returns `(U, S, V)`, while `numpy.linalg.svd` returns `(U, S, VT)` (i.e., V transposed).

3. `torch.svd` always returns a 3-tuple; `numpy.linalg.svd` returns only `S` in case `compute_uv==False`

4. `numpy.linalg.svd` also takes an optional `hermitian=False` argument.

I think that the plan is to eventually deprecate `torch.svd` in favor of `torch.linalg.svd`, so this PR does the following:

1. Rename/adapt the old `svd` C++ functions into `linalg_svd`: in particular, now `linalg_svd` takes `full_matrices` and returns `VT`

2. Re-implement the old C++ interface on top of the new (by negating `full_matrices` and transposing `VT`).

3. The C++ version of `linalg_svd` *always* returns a 3-tuple (we can't do anything else). So, there is a python wrapper which manually calls `torch._C._linalg.linalg_svd` to tweak the return value in case `compute_uv==False`.
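
A sketch of the resulting Python-facing behavior (as proposed in this PR; when `compute_uv=False`, the Python wrapper from point 3 adjusts the return value):

```python
import torch

a = torch.randn(5, 3)

# NumPy-style: full_matrices defaults to True, and the third output is V^H.
U, S, Vh = torch.linalg.svd(a, full_matrices=False)
print(U.shape, S.shape, Vh.shape)  # (5, 3), (3,), (3, 3)
```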

Currently, `linalg_svd_backward` is broken because it has not been adapted yet after the `V ==> VT` change, but before continuing and spending more time on it I wanted to make sure that the general approach is fine.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45562

Reviewed By: H-Huang

Differential Revision: D25803557

Pulled By: mruberry

fbshipit-source-id: 4966f314a0ba2ee391bab5cda4563e16275ce91f
2021-01-08 06:46:16 -08:00
Richard Barnes
ec6d29d6fa Drop unused imports from test (#49973)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49973

From
```
./python/libcst/libcst codemod remove_unused_imports.RemoveUnusedImportsWithGlean --no-format caffe2/
```

Test Plan: Standard sandcastle tests

Reviewed By: xush6528

Differential Revision: D25727350

fbshipit-source-id: 237ec4edd85788de920663719173ebec7ddbae1c
2021-01-07 12:09:38 -08:00
Antonio Cuni
361f5ed91d Implement torch.linalg.qr (#47764)
Summary:
I am opening this PR early to have a place to discuss design issues.
The biggest difference between `torch.qr` and `numpy.linalg.qr` is that the former takes a boolean parameter `some=True`, while the latter takes a string parameter `mode='reduced'` which can be one of the following:

`reduced`
this is completely equivalent to `some=True`, and both are the default.

`complete`
this is completely equivalent to `some=False`.

`r`
this returns only `r` instead of a tuple `(q, r)`. We have already decided that we don't want different return types depending on the parameters, so I propose to return `(r, empty_tensor)` instead. I **think** that in this mode it will be impossible to implement the backward pass, so we should raise an appropriate error in that case.

`raw`
in this mode, it returns `(h, tau)` instead of `(q, r)`. Internally, `h` and `tau` are obtained by calling lapack's `dgeqrf` and are later used to compute the actual values of `(q, r)`. The numpy docs suggest that these might be useful for calling other lapack functions, but at the moment none of them is exposed by numpy and I don't know how often this is used in the real world.
I suppose implementing the backward pass needs attention: the most straightforward solution is to use `(h, tau)` to compute `(q, r)` and then use the normal logic for `qr_backward`, but there might be faster alternatives.

`full`, `f`
alias for `reduced`, deprecated since numpy 1.8.0

`economic`, `e`
similar to `raw` but it returns only `h` instead of `(h, tau)`. Deprecated since numpy 1.8.0

To summarize:
  * `reduced`, `complete` and `r` are straightforward to implement.

  * `raw` needs a bit of extra care, but I don't know how high priority it is: since it is rarely used, we might want to not support it right now and maybe implement it in the future?

  * I think we should just leave `full` and `economic` out, and possibly add a note to the docs explaining what you need to use instead
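
A sketch of the modes as eventually exposed (assuming `reduced`, `complete`, and `r` land as proposed):

```python
import torch

a = torch.randn(5, 3)

q, r = torch.linalg.qr(a)                   # mode='reduced' (default): q is 5x3, r is 3x3
q, r = torch.linalg.qr(a, mode='complete')  # q is 5x5, r is 5x3
q, r = torch.linalg.qr(a, mode='r')         # only r is computed; q comes back empty
```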

/cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47764

Reviewed By: ngimel

Differential Revision: D25708870

Pulled By: mruberry

fbshipit-source-id: c25c70a23a02ec4322430d636542041e766ebe1b
2020-12-28 17:28:17 -08:00
Mike Ruberry
5acc27c00a Revert D25690129: [pytorch][PR] Added linalg.inv
Test Plan: revert-hammer

Differential Revision:
D25690129 (8554b58fbd)

Original commit changeset: edb2d03721f2

fbshipit-source-id: 8679ea18e637423d35919544d2b047a62ac3abd8
2020-12-23 15:27:52 -08:00
Ivan Yashchuk
8554b58fbd Added linalg.inv (#48261)
Summary:
This PR adds `torch.linalg.inv` for NumPy compatibility.

`linalg_inv_out` uses in-place operations on provided `result` tensor.

I modified `apply_inverse` to accept a tensor of Int instead of a std::vector; that way we can write a function similar to `linalg_inv_out` but without the error checks and device memory synchronization.

I fixed `lda` (the leading-dimension parameter, which is max(1, n)) in many places to handle 0x0 matrices correctly.
Zero batch dimensions are also working and tested.

Ref https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48261

Reviewed By: ngimel

Differential Revision: D25690129

Pulled By: mruberry

fbshipit-source-id: edb2d03721f22168c42ded8458513cb23dfdc712
2020-12-23 11:29:00 -08:00
Hao Lu
d54cf2aa27 [pt][ATen] Optimize bmm (#49506)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49506

- Get rid of expensive stuff like `TensorArg`, `checkBackend`, `checkSize`, and `TensorAccessor`.
- Add `checkDim` that does not require creating a `TensorArg` which incurs a refcount bump
- Avoid unnecessary calls to `torch.select`, which goes through the dispatcher in the cases we care about, with mat1 and mat2 not permuted or permuted with dims = [0, 2, 1]. The pt version of bmm supports crazy cases like when the inputs are permuted with dims = [1, 2, 0], which is uncommon in SparseNNs.

Test Plan:
Unit test:
```
buck test //caffe2/test:linalg
```

Benchmark with the adindexer model:
```
Before:
I1216 14:02:24.155516 2595800 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0847197. Iters per second: 11803.6
After:
I1216 14:02:26.583878 2595939 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.082051. Iters per second: 12187.5
```

Reviewed By: bwasti

Differential Revision: D25577574

fbshipit-source-id: 8aba69b950e7b4d9d1b14ba837931695a908c068
2020-12-21 22:08:39 -08:00
Ivan Yashchuk
8be205ae13 Added linalg.solve (#48456)
Summary:
This PR adds `torch.linalg.solve`.

`linalg_solve_out` uses in-place operations on the provided result tensor.

I modified `apply_solve` to accept a tensor of Int instead of a std::vector; that way we can write a function similar to `linalg_solve_out` but without the error checks and device memory synchronization.

In comparison to `torch.solve`, this routine accepts 1-dimensional tensors and batches of 1-dimensional tensors for the right-hand-side term; `torch.solve` requires it to be at least 2-dimensional.
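
A minimal sketch of the 1-dimensional right-hand-side support mentioned above:

```python
import torch

A = torch.randn(3, 3)
b = torch.randn(3)  # 1-D rhs is accepted, unlike with torch.solve

x = torch.linalg.solve(A, b)
print(torch.allclose(A @ x, b, atol=1e-5))
```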

Ref. https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48456

Reviewed By: izdeby

Differential Revision: D25562222

Pulled By: mruberry

fbshipit-source-id: a9355c029e2442c2e448b6309511919631f9e43b
2020-12-21 10:11:12 -08:00
Ivan Yashchuk
f5ee619d2a Updated derivative rules for complex svd and pinverse (#47761)
Summary:
Updated `svd_backward` to work correctly for complex-valued inputs.
Updated `common_methods_invocations.py` to take dtype, device arguments for input construction.
Removed `test_pinverse` from `test_autograd.py`; it is replaced by entries in `common_methods_invocations.py`.
Added `svd` and `pinverse` to list of complex tests.

References for complex-valued SVD differentiation:

- https://giggleliu.github.io/2019/04/02/einsumbp.html
- https://arxiv.org/abs/1909.02659

The derived rules assume gauge invariance of loss functions, so the result would not be correct for loss functions that are not gauge invariant.
https://re-ra.xyz/Gauge-Problem-in-Automatic-Differentiation/

The same rule is implemented in Tensorflow and [BackwardsLinalg.jl](https://github.com/GiggleLiu/BackwardsLinalg.jl).

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47761

Reviewed By: ngimel

Differential Revision: D25658897

Pulled By: mruberry

fbshipit-source-id: ba33ecbbea3f592238c01e62c7f193daf22a9d01
2020-12-20 14:39:31 -08:00
Mike Ruberry
f5b68e74d7 Revert D25574962: [pytorch][PR] Updated derivative rules for complex svd and pinverse
Test Plan: revert-hammer

Differential Revision:
D25574962 (9955355853)

Original commit changeset: 832b61303e88

fbshipit-source-id: d73f77f3e51b0f535dad6d21c5bebf8d41a6bfbd
2020-12-17 00:59:43 -08:00
Ivan Yashchuk
9955355853 Updated derivative rules for complex svd and pinverse (#47761)
Summary:
Updated `svd_backward` to work correctly for complex-valued inputs.
Updated `common_methods_invocations.py` to take dtype, device arguments for input construction.
Removed `test_pinverse` from `test_autograd.py`; it is replaced by entries in `common_methods_invocations.py`.
Added `svd` and `pinverse` to list of complex tests.

References for complex-valued SVD differentiation:

- https://giggleliu.github.io/2019/04/02/einsumbp.html
- https://arxiv.org/abs/1909.02659

The derived rules assume gauge invariance of loss functions, so the result would not be correct for loss functions that are not gauge invariant.
https://re-ra.xyz/Gauge-Problem-in-Automatic-Differentiation/

The same rule is implemented in Tensorflow and [BackwardsLinalg.jl](https://github.com/GiggleLiu/BackwardsLinalg.jl).

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47761

Reviewed By: izdeby

Differential Revision: D25574962

Pulled By: mruberry

fbshipit-source-id: 832b61303e883ad3a451b84850ccf0f36763a6f6
2020-12-16 12:32:22 -08:00
Gao, Xiang
48d1ad1ada Reland "Add test for empty tensors for batch matmuls" (#48797)
Summary:
This reverts commit c7746adbc6.

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48797

Reviewed By: mruberry

Differential Revision: D25575264

Pulled By: ngimel

fbshipit-source-id: c7f3b384db833d727bb5bd8a51f1493a13016d09
2020-12-16 11:19:27 -08:00
Heitor Schueroff
45b33c83f1 Revert "Revert D24923679: Fixed einsum compatibility/performance issues (#46398)" (#49189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49189

This reverts commit d307601365 and fixes the bug with diagonals and ellipsis combined.

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D25540722

Pulled By: heitorschueroff

fbshipit-source-id: 86d0c9a7dcfda600b546457dad102af2ff33e353
2020-12-16 10:38:07 -08:00
Kurt Mohler
54f0556ee4 Add missing complex support for torch.norm and torch.linalg.norm (#48284)
Summary:
**BC-breaking note:**

Previously, when given a complex input, `torch.linalg.norm` and `torch.norm` would return a complex output. `torch.linalg.cond` would sometimes return a complex output and sometimes return a real output when given a complex input, depending on its `p` argument. This PR changes this behavior to match `numpy.linalg.norm` and `numpy.linalg.cond`, so that a complex input will result in the downgraded real number type, consistent with NumPy.

**PR Summary:**

The following cases were previously unsupported for complex inputs, and this commit adds support:

- Frobenius norm
- Norm order 2 (vector and matrix)
- CUDA vector norm
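
A sketch of the BC-breaking dtype behavior described above:

```python
import torch

z = torch.randn(3, dtype=torch.complex64)
n = torch.linalg.norm(z)  # vector 2-norm of a complex input
print(n.dtype)            # torch.float32 -- the downgraded real type, as in NumPy
```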

Part of https://github.com/pytorch/pytorch/issues/47833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48284

Reviewed By: H-Huang

Differential Revision: D25420880

Pulled By: mruberry

fbshipit-source-id: 11f6a2f3cad57d66476d30921c3f6ab8f3cd4017
2020-12-10 10:23:45 -08:00
Kurt Mohler
27f7d1c286 Port eig CPU from TH to ATen (#43215)
Summary:
Also consolidates shared logic between `eig` CPU and CUDA implementations

Fixes https://github.com/pytorch/pytorch/issues/24693

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43215

Reviewed By: VitalyFedyunin, zhangguanheng66

Differential Revision: D23862622

Pulled By: ngimel

fbshipit-source-id: ca1002428850520cd74cd5b7ed8cb4d12dbd9c52
2020-12-09 23:27:35 -08:00
X Wang
a849f38222 skip cuda test_cholesky_solve_batched_many_batches due to illegal memory access (#48999)
Summary:
See https://github.com/pytorch/pytorch/issues/48996

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48999

Reviewed By: zhangguanheng66

Differential Revision: D25390070

Pulled By: mruberry

fbshipit-source-id: cf59130f6189ab8c2dade6a6a4de2f69753a5e36
2020-12-09 00:47:55 -08:00
Heitor Schueroff
d307601365 Revert D24923679: Fixed einsum compatibility/performance issues (#46398)
Test Plan: revert-hammer

Differential Revision:
D24923679 (ea2a568cca)

Original commit changeset: 47e48822cd67

fbshipit-source-id: 52f17b66a4aa075d0159bdf1c98616e6098091b8
2020-12-07 11:48:36 -08:00
Heitor Schueroff
ea2a568cca Fixed einsum compatibility/performance issues (#46398) (#47860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47860

This PR makes torch.einsum compatible with numpy.einsum, except for the sublist input option, as requested here https://github.com/pytorch/pytorch/issues/21412. It also fixes two performance issues linked below and adds a check for reducing to torch.dot instead of torch.bmm, which is faster in some cases.

fixes #45854, #37628, #30194, #15671

fixes #41467 with benchmark below
```python
import torch
from torch.utils.benchmark import Timer

a = torch.randn(10000, 100, 101, device='cuda')
b = torch.randn(10000, 101, 3, device='cuda')

c = torch.randn(10000, 100, 1, device='cuda')
d = torch.randn(10000, 100, 1, 3, device='cuda')

print(Timer(
    stmt='torch.einsum("bij,bjf->bif", a, b)',
    globals={'a': a, 'b': b}
).blocked_autorange())

print()

print(Timer(
    stmt='torch.einsum("bic,bicf->bif", c, d)',
    globals={'c': c, 'd': d}
).blocked_autorange())
```
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7fa37c413850>
torch.einsum("bij,bjf->bif", a, b)
  Median: 4.53 ms
  IQR:    0.00 ms (4.53 to 4.53)
  45 measurements, 1 runs per measurement, 1 thread

<torch.utils.benchmark.utils.common.Measurement object at 0x7fa37c413700>
torch.einsum("bic,bicf->bif", c, d)
  Median: 63.86 us
  IQR:    1.52 us (63.22 to 64.73)
  4 measurements, 1000 runs per measurement, 1 thread
```

fixes #32591 with benchmark below
```python
import torch
from torch.utils.benchmark import Timer

a = torch.rand(1, 1, 16, 2, 16, 2, 16, 2, 2, 2, 2, device="cuda")
b = torch.rand(729, 1, 1, 2, 1, 2, 1, 2, 2, 2, 2, device="cuda")

print(Timer(
    stmt='(a * b).sum(dim = (-3, -2, -1))',
    globals={'a': a, 'b': b}
).blocked_autorange())

print()

print(Timer(
    stmt='torch.einsum("...ijk, ...ijk -> ...", a, b)',
    globals={'a': a, 'b': b}
).blocked_autorange())
```
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7efe0de28850>
(a * b).sum(dim = (-3, -2, -1))
  Median: 17.86 ms
  2 measurements, 10 runs per measurement, 1 thread

<torch.utils.benchmark.utils.common.Measurement object at 0x7efe0de286a0>
torch.einsum("...ijk, ...ijk -> ...", a, b)
  Median: 296.11 us
  IQR:    1.38 us (295.42 to 296.81)
  662 measurements, 1 runs per measurement, 1 thread
```

TODO

- [x] add support for ellipsis broadcasting
- [x] fix corner case issues with sumproduct_pair
- [x] update docs and add more comments
- [x] add tests for error cases

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D24923679

Pulled By: heitorschueroff

fbshipit-source-id: 47e48822cd67bbcdadbdfc5ffa25ee8ba4c9620a
2020-12-06 08:02:37 -08:00
Ivan Yashchuk
85121a7a0f Added CUDA support for complex input for torch.cholesky_solve (#47047)
Summary:
`torch.cholesky_solve` now works for complex inputs on GPU.
I moved the existing tests to `test_linalg.py` and modified them to test complex and float32 dtypes.
Differentiation also works correctly with complex inputs now.

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47047

Reviewed By: ngimel

Differential Revision: D24730020

Pulled By: mruberry

fbshipit-source-id: 95402da5789c56e5a682019790985207fa28fa1f
2020-12-05 20:18:30 -08:00
Ivan Yashchuk
cb285080b0 Added computing matrix condition numbers (linalg.cond) (#45832)
Summary:
This PR adds `torch.linalg.cond` for NumPy compatibility.
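
A minimal usage sketch (the `p` values follow `numpy.linalg.cond`):

```python
import torch

A = torch.randn(4, 4, dtype=torch.float64)
print(torch.linalg.cond(A))           # default: 2-norm condition number
print(torch.linalg.cond(A, p='fro'))  # Frobenius-norm variant
```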

Ref https://github.com/pytorch/pytorch/issues/42666.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45832

Reviewed By: ngimel

Differential Revision: D25183690

Pulled By: mruberry

fbshipit-source-id: a727959bfec2bc2dc36df59d9ef79c0534b68194
2020-12-04 02:23:57 -08:00
Heitor Schueroff
c134f32835 Implemented torch.inner (#46716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46716

Implemented torch.inner similar to [numpy.inner](https://numpy.org/doc/stable/reference/generated/numpy.inner.html). For now it's implemented as a composite op.
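
A minimal sketch of the NumPy-style semantics (a contraction over the last dimension of both inputs):

```python
import torch

a = torch.randn(2, 3, 4)
b = torch.randn(5, 4)

out = torch.inner(a, b)
print(out.shape)  # torch.Size([2, 3, 5])
```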

TODO

- [x] Add documentation

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D24860351

Pulled By: heitorschueroff

fbshipit-source-id: de5c82f285893495491fdba73b35634f4d00bac8
2020-12-03 11:37:55 -08:00
Mike Ruberry
c7746adbc6 Revert D24874754: [pytorch][PR] Add test for empty tensors for batch matmuls
Test Plan: revert-hammer

Differential Revision:
D24874754 (5f105e2aa6)

Original commit changeset: 41ba837740ff

fbshipit-source-id: d6cb31cbc4a2a386aab0a5f24710f218f9a561ca
2020-12-03 00:29:07 -08:00
Xiang Gao
5f105e2aa6 Add test for empty tensors for batch matmuls (#47700)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47700

Reviewed By: malfet

Differential Revision: D24874754

Pulled By: ngimel

fbshipit-source-id: 41ba837740ff7d5bd49d5f7277ad2064985aba2f
2020-12-02 20:45:59 -08:00
Nikita Vedeneev
3b25af02a4 matrix_exp + matrix_exp.backward complex support (#48363)
Summary:
As per title. Fixes https://github.com/pytorch/pytorch/issues/48299.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48363

Reviewed By: ejguan

Differential Revision: D25224498

Pulled By: albanD

fbshipit-source-id: 0c80ffb03ccfc46ab86398911edfba0b09049e55
2020-12-02 08:35:14 -08:00
Ivan Yashchuk
e41e780f7a Added support for complex input for torch.lu_solve #2 (#48028)
Summary:
Relanding https://github.com/pytorch/pytorch/pull/46862
There was an issue with the simultaneous merge of two slightly conflicting PRs.

This PR adds `torch.lu_solve` for complex inputs both on CPU and GPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48028

Reviewed By: linbinyu

Differential Revision: D25003700

Pulled By: zou3519

fbshipit-source-id: 24cd1babe9ccdbaa4e2ed23f08a9153d40d0f0cd
2020-12-02 08:13:02 -08:00
Ivan Yashchuk
74330e0497 Added linalg.matrix_rank (#48206)
Summary:
This PR adds `torch.linalg.matrix_rank`.

Changes compared to the original `torch.matrix_rank`:
- input with the complex dtype is supported
- batched input is supported
- "symmetric" kwarg renamed to "hermitian"

Should I update the documentation for `torch.matrix_rank`?

For input with no elements (for example, a 0×0 matrix), the current implementation diverges from NumPy. NumPy stumbles on an undefined max for such input; here I chose to return an appropriately sized tensor of zeros, which I think is mathematically the correct thing to do.
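
A minimal sketch of the changes listed above (the Hermitian construction is illustrative):

```python
import torch

b = torch.randn(2, 4, 4, dtype=torch.complex128)  # batched complex input
A = b @ b.conj().transpose(-2, -1)

print(torch.linalg.matrix_rank(A))                  # per-matrix ranks, shape (2,)
print(torch.linalg.matrix_rank(A, hermitian=True))  # kwarg renamed from "symmetric"
```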

Ref https://github.com/pytorch/pytorch/issues/42666.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48206

Reviewed By: albanD

Differential Revision: D25211965

Pulled By: mruberry

fbshipit-source-id: ae87227150ab2cffa07f37b4a3ab228788701837
2020-12-02 03:29:25 -08:00
Mike Ruberry
36c87f1243 Refactors test_torch.py to be fewer than 10k lines (#47356)
Summary:
Creates multiple new test suites to have fewer tests in test_torch.py, consistent with previous test suite creation like test_unary_ufuncs.py and test_linalg.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47356

Reviewed By: ngimel

Differential Revision: D25202268

Pulled By: mruberry

fbshipit-source-id: 75fde3ca76545d1b32b86d432a5cb7a5ba8f5bb6
2020-11-28 20:11:40 -08:00
Antonio Cuni
344918576c Migrate eig from the TH to Aten (CUDA) (#44105)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24553

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44105

Reviewed By: ngimel

Differential Revision: D25192116

Pulled By: mruberry

fbshipit-source-id: 87f1ba4924b9174bfe0d9e2ab14bbe1c6bae879c
2020-11-27 15:15:48 -08:00
Ivan Yashchuk
4ed7f36ed1 Added linalg.eigh, linalg.eigvalsh (#45526)
Summary:
This PR adds `torch.linalg.eigh`, and `torch.linalg.eigvalsh` for NumPy compatibility.
The current `torch.symeig` uses (on CPU) a different LAPACK routine than NumPy (`syev` vs `syevd`). Even though it shouldn't matter in practice, `torch.linalg.eigh` uses `syevd` (as NumPy does).
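
A minimal usage sketch (the Hermitian construction is illustrative):

```python
import torch

b = torch.randn(4, 4, dtype=torch.complex128)
A = b + b.conj().t()  # Hermitian input

w, v = torch.linalg.eigh(A)  # real eigenvalues in ascending order, plus eigenvectors
print(torch.allclose(v @ torch.diag(w).to(A.dtype) @ v.conj().t(), A))
```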

Ref https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45526

Reviewed By: gchanan

Differential Revision: D25022659

Pulled By: mruberry

fbshipit-source-id: 3676b77a121c4b5abdb712ad06702ac4944e900a
2020-11-22 04:57:28 -08:00
Xiong Wei
ec256ab2f2 implement torch.addr using TensorIterator based kernels (#47664)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47313

This PR implements `torch.addr` function using `TensorIterator` with `cpu_kernel_vec` and `gpu_kernel`.
It helps reduce memory usage, improves performance, and fixes the bug that occurs when `beta` or `alpha` is a complex number.

Todo
- [x] benchmarking `torch.addr` for the change of this PR, as well as the legacy TH implementation used in PyTorch 1.6.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47664

Reviewed By: zhangguanheng66

Differential Revision: D25059693

Pulled By: ngimel

fbshipit-source-id: 20a90824aa4cb2240e81a9f17a9e2f16ae6e3437
2020-11-20 00:21:49 -08:00
Ivan Yashchuk
343b3e5cae Added linalg.tensorinv (#45969)
Summary:
This PR adds `torch.linalg.tensorinv` for NumPy compatibility.
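
A minimal sketch mirroring `numpy.linalg.tensorinv` (the shape is chosen so the first `ind` dimensions match the remaining ones in total size):

```python
import torch

A = torch.eye(4 * 6).reshape(4, 6, 8, 3)  # 4*6 == 8*3 == 24, so A is invertible
Ainv = torch.linalg.tensorinv(A, ind=2)
print(Ainv.shape)  # torch.Size([8, 3, 4, 6])
```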

Ref https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45969

Reviewed By: zhangguanheng66

Differential Revision: D25060568

Pulled By: mruberry

fbshipit-source-id: 3b145ce64e4bd5021bc229f5ffdd791c572673a0
2020-11-19 11:54:50 -08:00
Mike Ruberry
ea1e78a0c5 Revert D24853669: [pytorch][PR] Migrate eig from the TH to Aten (CUDA)
Test Plan: revert-hammer

Differential Revision:
D24853669 (866f8591be)

Original commit changeset: a513242dc7f4

fbshipit-source-id: a0c8c424b61b1e627d9102de6b4c6d0717a6c06d
2020-11-18 16:53:18 -08:00
Antonio Cuni
866f8591be Migrate eig from the TH to Aten (CUDA) (#44105)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24553

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44105

Reviewed By: heitorschueroff

Differential Revision: D24853669

Pulled By: mruberry

fbshipit-source-id: a513242dc7f49f55dbc6046c18d8a9d9aa2aaf8d
2020-11-18 12:10:18 -08:00
Ivan Yashchuk
81b1673a21 Enable complex tests that depend on batched matmul on CUDA (#47910)
Summary:
Now that https://github.com/pytorch/pytorch/pull/42553 is merged, we can delete a bit of code from the tests and enable some of the skipped complex tests.

Unfortunately, `test_pinverse_complex_xfailed` and `test_symeig_complex_xfailed` had bugs, and it wasn't caught automatically that these tests xpass. We need to be careful next time with `unittest.expectedFailure`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47910

Reviewed By: zhangguanheng66

Differential Revision: D25052130

Pulled By: mruberry

fbshipit-source-id: 29512995c024b882f9cb78b7bede77733d5762d0
2020-11-18 10:44:47 -08:00
Ivan Yashchuk
260daf088d Added linalg.cholesky (#46083)
Summary:
This PR adds `torch.linalg.cholesky` function that matches `numpy.linalg.cholesky`.

Fixed `lda` argument to `lapackCholesky` calls.
Added `random_hermitian_pd_matrix` helper function for tests.
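
A minimal usage sketch (the Hermitian positive-definite construction is illustrative):

```python
import torch

b = torch.randn(3, 3, dtype=torch.complex128)
A = b @ b.conj().t() + torch.eye(3, dtype=torch.complex128)  # Hermitian PD

L = torch.linalg.cholesky(A)  # lower-triangular factor, as in NumPy
print(torch.allclose(L @ L.conj().t(), A))
```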

Ref https://github.com/pytorch/pytorch/issues/42666.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46083

Reviewed By: ailzhang

Differential Revision: D24861752

Pulled By: mruberry

fbshipit-source-id: 214dbceb4e8a2c589df209493efd843962d25593
2020-11-13 16:50:40 -08:00
Richard Zou
1c7c612af0 Revert D24543682: [pytorch][PR] Added support for complex input for torch.lu_solve
Test Plan: revert-hammer

Differential Revision:
D24543682 (ffd0003022)

Original commit changeset: 165bde39ef95

fbshipit-source-id: 790b4157fdbc7149aaf0748555efe6daed7e1a23
2020-11-13 08:24:53 -08:00
Ivan Yashchuk
ffd0003022 Added support for complex input for torch.lu_solve (#46862)
Summary:
`torch.lu_solve` now works for complex inputs both on CPU and GPU.
I moved the existing tests to `test_linalg.py` and modified them to test complex dtypes, but I didn't modify/improve the body of the tests.

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46862

Reviewed By: nikithamalgifb

Differential Revision: D24543682

Pulled By: anjali411

fbshipit-source-id: 165bde39ef95cafebf976c5ba4b487297efe8433
2020-11-13 02:35:31 -08:00
Ivan Yashchuk
149190c014 Added CUDA support for complex input for torch.solve (#47045)
Summary:
`torch.solve` now works for complex inputs on GPU.
I moved the existing tests to `test_linalg.py` and modified them to test complex and float32 dtypes.
Differentiation also works correctly with complex inputs.

Fixes https://github.com/pytorch/pytorch/issues/41084
Ref. https://github.com/pytorch/pytorch/issues/33152

anjali411 I hope you don't mind that I took over https://github.com/pytorch/pytorch/pull/42737

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47045

Reviewed By: nikithamalgifb

Differential Revision: D24921503

Pulled By: anjali411

fbshipit-source-id: 4c3fc4f193a84b6e28c43c08672d480715000923
2020-11-12 12:22:59 -08:00
Gregory Chanan
b6cb2caa68 Revert "Fixed einsum compatibility/performance issues (#46398)" (#47821)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47821

This reverts commit a5c65b86ce.

 Conflicts:
	test/test_linalg.py

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D24909923

Pulled By: gchanan

fbshipit-source-id: 9dcf98e7c4a3c7e5aaffe475867fa086f3bb6ff2
2020-11-12 08:11:40 -08:00
Jeff Daily
2df5600155 [ROCm] add skipCUDAIfRocm to test_linalg test_norm_fro_2_equivalence_old (#47809)
Summary:
This test started failing when ROCm CI moved to 3.9.  Skip until triage is complete.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47809

Reviewed By: seemethere

Differential Revision: D24906319

Pulled By: walterddr

fbshipit-source-id: 0c425f3b21190cfbc5e0d1c3f477d834af40f0ca
2020-11-12 07:12:43 -08:00
Ivan Yashchuk
52ec8b9340 Added CUDA support for complex input for torch.triangular_solve (#46916)
Summary:
`torch.triangular_solve` now works for complex inputs on GPU.
I moved the existing tests to `test_linalg.py` and modified them to test complex and float32 dtypes.

Ref. https://github.com/pytorch/pytorch/issues/33152
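
A minimal sketch of the enabled path (illustrative; assumes a CUDA device, with a boosted diagonal to keep the system well conditioned):
```python
import torch

# Solve U x = b for an upper-triangular complex U on the GPU.
u = torch.randn(3, 3, dtype=torch.complex128, device='cuda').triu()
u += 3 * torch.eye(3, dtype=torch.complex128, device='cuda')
b = torch.randn(3, 1, dtype=torch.complex128, device='cuda')
x, _ = torch.triangular_solve(b, u, upper=True)
assert torch.allclose(u @ x, b)
```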

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46916

Reviewed By: navahgar, agolynski

Differential Revision: D24706647

Pulled By: anjali411

fbshipit-source-id: fe780eac93d2ae1b2549539bb385e5fac25213b3
2020-11-11 16:08:11 -08:00
Ivan Yashchuk
a1db5b0f2b Added CUDA support for complex input for torch.inverse #2 (#47595)
Summary:
`torch.inverse` now works for complex inputs on GPU.
Opening a new PR here. The previous PR was merged and reverted due to a bug in tests marked with `slowTest`.
Previous PR https://github.com/pytorch/pytorch/pull/45034

Ref. https://github.com/pytorch/pytorch/issues/33152
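
For illustration (not from the PR; assumes a CUDA device):
```python
import torch

# Invert a complex matrix on the GPU and check A @ A^{-1} == I.
a = torch.randn(4, 4, dtype=torch.complex128, device='cuda')
a_inv = torch.inverse(a)
eye = torch.eye(4, dtype=torch.complex128, device='cuda')
assert torch.allclose(a @ a_inv, eye, atol=1e-8)
```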

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47595

Reviewed By: navahgar

Differential Revision: D24840955

Pulled By: anjali411

fbshipit-source-id: ec49fffdc4b3cb4ae7507270fa24e127be14f59b
2020-11-11 11:06:08 -08:00
Heitor Schueroff
a5c65b86ce Fixed einsum compatibility/performance issues (#46398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46398

This PR makes torch.einsum compatible with numpy.einsum, except for the sublist input option, as requested in https://github.com/pytorch/pytorch/issues/21412. It also fixes two performance issues linked below and adds a check for reducing to torch.dot instead of torch.bmm, which is faster in some cases.

fixes #45854, #37628, #30194, #15671

fixes #41467 with benchmark below
```python
import torch
from torch.utils.benchmark import Timer

a = torch.randn(10000, 100, 101, device='cuda')
b = torch.randn(10000, 101, 3, device='cuda')

c = torch.randn(10000, 100, 1, device='cuda')
d = torch.randn(10000, 100, 1, 3, device='cuda')

print(Timer(
    stmt='torch.einsum("bij,bjf->bif", a, b)',
    globals={'a': a, 'b': b}
).blocked_autorange())

print()

print(Timer(
    stmt='torch.einsum("bic,bicf->bif", c, d)',
    globals={'c': c, 'd': d}
).blocked_autorange())
```
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7fa37c413850>
torch.einsum("bij,bjf->bif", a, b)
  Median: 4.53 ms
  IQR:    0.00 ms (4.53 to 4.53)
  45 measurements, 1 runs per measurement, 1 thread

<torch.utils.benchmark.utils.common.Measurement object at 0x7fa37c413700>
torch.einsum("bic,bicf->bif", c, d)
  Median: 63.86 us
  IQR:    1.52 us (63.22 to 64.73)
  4 measurements, 1000 runs per measurement, 1 thread
```

fixes #32591 with benchmark below
```python
import torch
from torch.utils.benchmark import Timer

a = torch.rand(1, 1, 16, 2, 16, 2, 16, 2, 2, 2, 2, device="cuda")
b = torch.rand(729, 1, 1, 2, 1, 2, 1, 2, 2, 2, 2, device="cuda")

print(Timer(
    stmt='(a * b).sum(dim = (-3, -2, -1))',
    globals={'a': a, 'b': b}
).blocked_autorange())

print()

print(Timer(
    stmt='torch.einsum("...ijk, ...ijk -> ...", a, b)',
    globals={'a': a, 'b': b}
).blocked_autorange())
```
```
<torch.utils.benchmark.utils.common.Measurement object at 0x7efe0de28850>
(a * b).sum(dim = (-3, -2, -1))
  Median: 17.86 ms
  2 measurements, 10 runs per measurement, 1 thread

<torch.utils.benchmark.utils.common.Measurement object at 0x7efe0de286a0>
torch.einsum("...ijk, ...ijk -> ...", a, b)
  Median: 296.11 us
  IQR:    1.38 us (295.42 to 296.81)
  662 measurements, 1 runs per measurement, 1 thread
```

TODO

- [x] add support for ellipsis broadcasting
- [x] fix corner case issues with sumproduct_pair
- [x] update docs and add more comments
- [x] add tests for error cases

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D24860367

Pulled By: heitorschueroff

fbshipit-source-id: 31110ee598fd598a43acccf07929b67daee160f9
2020-11-10 19:38:43 -08:00
Edward Yang
1aeefcdaa6 Revert D24730264: [pytorch][PR] Added CUDA support for complex input for torch.inverse
Test Plan: revert-hammer

Differential Revision:
D24730264 (33acbedace)

Original commit changeset: b9c94ec46301

fbshipit-source-id: beb9263700e9bc92685f74c37c46aa33f3b595b9
2020-11-06 07:28:14 -08:00
Ivan Yashchuk
33acbedace Added CUDA support for complex input for torch.inverse (#45034)
Summary:
`torch.inverse` now works for complex inputs on GPU.
Test cases with complex matrices are xfailed for now. For example, batched matmul does not work with complex yet.

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45034

Reviewed By: zou3519

Differential Revision: D24730264

Pulled By: anjali411

fbshipit-source-id: b9c94ec463012913c117278a884adeee96ea02aa
2020-11-05 16:30:11 -08:00
Heitor Schueroff
a4ba018e57 Updated docs/test for dot and vdot (#47242)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47242

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D24733771

Pulled By: heitorschueroff

fbshipit-source-id: 92e3b0e28e0565918335fa85d52abe5db9eeff57
2020-11-05 06:27:50 -08:00
Nikita Vedeneev
8a3728c819 Make torch.det() support complex input. (#45980)
Summary:
As per title. A minor fix required to make it available for the CPU (`fmod` does not support complex).
For CUDA requires [https://github.com/pytorch/pytorch/issues/45898 ](https://github.com/pytorch/pytorch/pull/45898).
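
A minimal CPU-only sketch of the new behavior (illustrative, not taken from the PR's tests):
```python
import torch

# The determinant is multiplicative, det(A @ B) == det(A) * det(B);
# this now holds for complex inputs as well.
a = torch.randn(3, 3, dtype=torch.complex128)
b = torch.randn(3, 3, dtype=torch.complex128)
assert torch.allclose(torch.det(a @ b), torch.det(a) * torch.det(b))
```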

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45980

Reviewed By: izdeby

Differential Revision: D24539097

Pulled By: anjali411

fbshipit-source-id: 508830dbfd7794ab73e19320d07c69a051c91819
2020-11-04 17:47:03 -08:00
Ivan Yashchuk
f276ab55cd Added Kronecker product of tensors (torch.kron) (#45358)
Summary:
This PR adds a function for calculating the Kronecker product of tensors.
The implementation is based on `at::tensordot` with permutations and reshape.
Tests pass.

TODO:

- [x] Add more test cases
- [x] Write documentation
- [x] Add entry to `common_methods_invocations.py`

Ref. https://github.com/pytorch/pytorch/issues/42666
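
For illustration, a small sketch of what the new function computes (not from the PR):
```python
import torch

# For A of shape (m, n) and B of shape (p, q), torch.kron returns the
# (m*p, n*q) block matrix whose (i, j) block is a[i, j] * B.
a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.eye(2)
expected = torch.tensor([[1., 0., 2., 0.],
                         [0., 1., 0., 2.],
                         [3., 0., 4., 0.],
                         [0., 3., 0., 4.]])
assert torch.allclose(torch.kron(a, b), expected)
```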

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45358

Reviewed By: mrshenli

Differential Revision: D24680755

Pulled By: mruberry

fbshipit-source-id: b1f8694589349986c3abfda3dc1971584932b3fa
2020-11-03 12:41:41 -08:00
Ivan Yashchuk
f629fbe235 Added torch.linalg.tensorsolve (#46142)
Summary:
This PR adds `torch.linalg.tensorsolve` function that matches `numpy.linalg.tensorsolve`.

Ref https://github.com/pytorch/pytorch/issues/42666.
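
A hedged sketch mirroring the `numpy.linalg.tensorsolve` docs example (illustrative only):
```python
import torch

# A is treated as a matrix after flattening, and x is solved so that
# contracting A with x reproduces b.
a = torch.eye(2 * 3 * 4).reshape(2 * 3, 4, 2, 3, 4)
b = torch.randn(2 * 3, 4)
x = torch.linalg.tensorsolve(a, b)     # x.shape == (2, 3, 4)
assert torch.allclose(torch.tensordot(a, x, dims=x.ndim), b)
```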

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46142

Reviewed By: izdeby

Differential Revision: D24539400

Pulled By: mruberry

fbshipit-source-id: 6e38364fe0bc511e739036deb274d9307df119b2
2020-10-29 10:29:28 -07:00
Kurt Mohler
b61671ccd2 Enable dtype arg for torch.linalg.norm with order 'fro' and 'nuc' (#46637)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46255
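
A minimal sketch of the enabled combination (illustrative, not from the PR's tests):
```python
import torch

# The dtype argument now works with the matrix orders 'fro' and 'nuc',
# performing the computation in the requested precision.
a = torch.randn(4, 4, dtype=torch.float32)
fro = torch.linalg.norm(a, ord='fro', dtype=torch.float64)
nuc = torch.linalg.norm(a, ord='nuc', dtype=torch.float64)
assert fro.dtype == torch.float64 and nuc.dtype == torch.float64
```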

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46637

Reviewed By: gchanan

Differential Revision: D24459097

Pulled By: mruberry

fbshipit-source-id: 7f207a23de902c27f8313ee80f452687a97e8f6f
2020-10-26 02:59:00 -07:00
Kurt Mohler
a0a8bc8870 Fix mistakes and increase clarity of norm documentation (#42696)
Summary:
* Removes incorrect statement that "the vector norm will be applied to the last dimension".
* More clearly describe each different combination of `p`, `ord`, and input size.
* Moves norm tests from `test/test_torch.py` to `test/test_linalg.py`
* Adds test ensuring that `p='fro'` and `p=2` give same results for mutually valid inputs

Fixes https://github.com/pytorch/pytorch/issues/41388
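
The equivalence asserted by the new test, in a minimal sketch (illustrative only):
```python
import torch

# For inputs where both are valid, p='fro' and p=2 agree: with dim=None
# torch.norm flattens the input, so both reduce to sqrt(sum(x**2)).
a = torch.randn(3, 4)
assert torch.allclose(torch.norm(a, p='fro'), torch.norm(a, p=2))
```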

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42696

Reviewed By: bwasti

Differential Revision: D23876862

Pulled By: mruberry

fbshipit-source-id: 36f33ccb6706d5fe13f6acf3de8ae14d7fbdff85
2020-10-10 14:12:43 -07:00
Kurt Mohler
d360402f34 Use out variants of functions used by linalg.norm, where possible (#45641)
Summary:
Closes https://github.com/pytorch/pytorch/issues/45669

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45641

Reviewed By: ngimel

Differential Revision: D24186731

Pulled By: mruberry

fbshipit-source-id: 7e3d12ef34704bf461b8de19830e7b2f73f3739b
2020-10-08 10:55:35 -07:00
Nikita Shulga
3a27fc966a Test torch.svd using complex float and double numbers (take 2) (#45795)
Summary:
Adds support for magmaSvd for complex numbers

Fixes use-after-free error in `apply_symeig`
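
An illustrative sketch of the complex path (not taken from the PR's tests):
```python
import torch

# For complex A, torch.svd returns U, S (real), V with A = U S V^H.
a = torch.randn(4, 3, dtype=torch.complex128)
u, s, v = torch.svd(a)
recon = u @ torch.diag(s.to(a.dtype)) @ v.conj().t()
assert torch.allclose(recon, a)
```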

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45795

Reviewed By: ezyang

Differential Revision: D24096955

Pulled By: malfet

fbshipit-source-id: 0d8d8492f89fe722bbd5aed3528f244245b496d0
2020-10-03 11:33:28 -07:00
Mike Ruberry
6417a70465 Updates linalg warning + docs (#45415)
Summary:
Changes the deprecation of norm to a docs-only deprecation, since PyTorch components still rely on norm, and some of its behavior, like automatically flattening tensors, may need to be ported to torch.linalg.norm. The documentation is also updated to clarify that torch.norm and torch.linalg.norm are distinct.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45415

Reviewed By: ngimel

Differential Revision: D23958252

Pulled By: mruberry

fbshipit-source-id: fd54e807c59a2655453a6bcd9f4073cb2c12e8ac
2020-09-28 05:28:42 -07:00
Xiong Wei
241afc9188 Migrate addr from the TH to Aten (CPU) (#44364)
Summary:
Related https://github.com/pytorch/pytorch/issues/24507
Fixes https://github.com/pytorch/pytorch/issues/24666

This PR modernizes the CPU implementation of the vector outer product.
The existing TH implementation of `torch.addr` is migrated to `aten`; `torch.ger` delegates to the `addr` functions to compute the outer product.
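
For illustration, a minimal sketch of the relationship between the two ops (not from the PR):
```python
import torch

# torch.ger(x, y) is the plain outer product; torch.addr(M, x, y)
# computes beta*M + alpha*outer(x, y), so with M = 0 the two agree.
x = torch.tensor([1., 2., 3.])
y = torch.tensor([4., 5.])
m = torch.zeros(3, 2)
assert torch.allclose(torch.addr(m, x, y), torch.ger(x, y))
```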

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44364

Reviewed By: ezyang

Differential Revision: D23866733

Pulled By: mruberry

fbshipit-source-id: 5159ea22f0e3c991123fe7c19cc9beb6ad00301e
2020-09-25 01:18:09 -07:00
Mike Ruberry
95df8657c9 Enables test linalg (#45278)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45271.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45278

Reviewed By: ngimel

Differential Revision: D23926124

Pulled By: mruberry

fbshipit-source-id: 26692597f9a1988e5fa846f97b8430c3689cac27
2020-09-24 23:09:38 -07:00
Kurt Mohler
28a23fce4c Deprecate torch.norm and torch.functional.norm (#44321)
Summary:
Part of https://github.com/pytorch/pytorch/issues/24802

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44321

Reviewed By: mrshenli

Differential Revision: D23617273

Pulled By: mruberry

fbshipit-source-id: 6f88b5cb097fd0acb9cf0e415172c5a86f94e9f2
2020-09-10 01:16:41 -07:00
Kurt Mohler
68297eeb1a Add support for integer dim arg in torch.linalg.norm (#43907)
Summary:
Now that PR https://github.com/pytorch/pytorch/issues/43262 is merged, this works.

Part of https://github.com/pytorch/pytorch/issues/24802
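
A minimal sketch of the new argument form (illustrative only):
```python
import torch

# dim may now be a plain int rather than a tuple.
a = torch.randn(3, 4)
row_norms = torch.linalg.norm(a, dim=1)   # same result as dim=(1,)
assert row_norms.shape == (3,)
```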

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43907

Reviewed By: anjali411

Differential Revision: D23471964

Pulled By: mruberry

fbshipit-source-id: ef2f11f78343fc866f752c9691b0c1fa687353ba
2020-09-05 23:16:36 -07:00
Kurt Mohler
68b9daa9bf Add torch.linalg.norm (#42749)
Summary:
Adds `torch.linalg.norm` function that matches the behavior of `numpy.linalg.norm`.

Additional changes:
* Add support for dimension wrapping in `frobenius_norm` and `nuclear_norm`
* Fix `out` argument behavior for `nuclear_norm`
* Fix issue where `frobenius_norm` allowed duplicates in `dim` argument
* Add `_norm_matrix`

Closes https://github.com/pytorch/pytorch/issues/24802
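
A hedged sketch of the NumPy-compatible defaults (illustrative; the values follow the `numpy.linalg.norm` docs example):
```python
import torch

# Defaults: 2-norm for vectors, Frobenius norm for matrices
# (the same value here, since the matrix holds the same elements).
v = torch.arange(9, dtype=torch.float64) - 4
a = v.reshape(3, 3)
assert torch.allclose(torch.linalg.norm(v), torch.linalg.norm(a))
assert torch.isclose(torch.linalg.norm(v, ord=float('inf')),
                     torch.tensor(4., dtype=torch.float64))
```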

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42749

Reviewed By: ngimel

Differential Revision: D23336234

Pulled By: mruberry

fbshipit-source-id: f0aba3089a3a0bf856aa9c4215e673ff34228fac
2020-08-28 18:28:33 -07:00
Mike Ruberry
bee174dc3f Adds linalg.det alias, fixes outer alias, updates alias testing (#42802)
Summary:
This PR:

- updates test_op_normalization.py, which verifies that aliases are correctly translated in the JIT
- adds torch.linalg.det as an alias for torch.det
- moves the torch.linalg.outer alias to torch.outer (to be consistent with NumPy)

The torch.linalg.outer alias was erroneously put in the linalg namespace as a placeholder, since it's a "linear algebra op" according to NumPy, but the function actually still lives in the main NumPy namespace.

The updates to test_op_normalization are necessary. Previously it was using method_tests to generate tests, and method_tests assumes test suites using it also use the device generic framework, which test_op_normalization did not. For example, some ops require decorators like `skipCPUIfNoLapack`, which only works in device generic test classes. Moving test_op_normalization to the device generic framework also lets these tests run on CPU and CUDA.

Continued reliance on method_tests() is excessive since the test suite is only interested in testing aliasing, and a simpler and more readable `AliasInfo` class is used for the required information. One impedance mismatch between method_tests and the new tests, for example, was how to handle ops in namespaces like torch.linalg.det. In the future this information will likely be folded into a common 'OpInfo' registry in the test suite.

The actual tests performed are similar to what they were previously: a scripted and traced version of the op is run and the test verifies that both graphs do not contain the alias name and do contain the aliased name.

The guidance for adding an alias has been updated accordingly.

cc mattip

Note:

ngimel suggests:
- deprecating and then removing the `torch.ger` name
- reviewing the implementation of `torch.outer`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42802

Reviewed By: zou3519

Differential Revision: D23059883

Pulled By: mruberry

fbshipit-source-id: 11321c2a7fb283a6e7c0d8899849ad7476be42d1
2020-08-11 21:48:31 -07:00
Mike Ruberry
9c8021c0b1 Adds torch.linalg namespace (#42664)
Summary:
This PR adds the `torch.linalg` namespace as part of our continued effort to be more compatible with NumPy. The namespace is tested by adding a single function, `torch.linalg.outer`, and testing it in a new test suite, test_linalg.py. It follows the same pattern as https://github.com/pytorch/pytorch/pull/41911, which added the `torch.fft` namespace.

Future PRs will likely:

- add more functions to torch.linalg
- expand the testing done in test_linalg.py, including legacy functions, like torch.ger
- deprecate existing linalg functions outside of `torch.linalg` in preference to the new namespace

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42664

Reviewed By: ngimel

Differential Revision: D22991019

Pulled By: mruberry

fbshipit-source-id: 39258d9b116a916817b3588f160b141f956e5d0b
2020-08-07 10:18:30 -07:00