### Target and Background
This PR improves the performance of `sampled_addmm` on CPU. It is part of an effort to improve PyG performance on CPU for GNN training/inference.
The current implementation is a reference design that converts the `SparseCSR` tensor to a dense tensor, does the addmm, and converts back to `SparseCSR` again: this is very slow and cannot run most of the datasets under https://github.com/snap-stanford/ogb (converting to dense would trigger `OOM`).
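For context, a minimal usage sketch of the operator via the public `torch.sparse.sampled_addmm` API (shapes here are illustrative, not from the benchmark):
```python
import torch

# sampled_addmm: beta * C + alpha * (A @ B), evaluated only at the
# sparsity pattern of the sparse CSR tensor C.
A = torch.randn(4, 3)
B = torch.randn(3, 5)
C = torch.eye(4, 5).to_sparse_csr()  # the sampling pattern
out = torch.sparse.sampled_addmm(C, A, B, beta=1.0, alpha=1.0)
```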
### Benchmarks
Right now we don't have a hands-on benchmark or workload to test this, since the operator is not used in PyG yet. I fetched the `ogb-products` dataset, where:
* number of nodes: 2.4 * 10^6
* number of edges: 1.26 * 10^8
* number of features: 128
So if we store the **adjacency matrix** as dense, it takes 2.4 * 2.4 * 4 * 10^12 bytes (roughly 23 TB), which would OOM with the current code. I extracted the first 1k rows to compare; the result is a **1100x** speedup:
CPU: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, dual socket, 20 cores per socket.
```
### before: run 1000 rows from the whole dataset
sampled_addmm: running dataset ogb-products first 1000 rows: each iter takes 1212.000 ms!
### after: run 1000 rows from the whole dataset
sampled_addmm: running dataset ogb-products first 1000 rows: each iter takes 1.102 ms!
### after: run the whole dataset
sampled_addmm: running dataset ogb-products (the whole dataset) 2449029 rows: each iter takes 873.306 ms!
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90978
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
This PR extends the `Tensor.to_sparse()` method to `Tensor.to_sparse(layout=None, blocksize=None)` in a backward-compatible manner (`layout=None` means `layout=torch.sparse_coo`).
In addition, the PR adds support for the following conversions:
- a non-hybrid/hybrid COO tensor to a CSR, CSC, or COO tensor
- a CSR tensor with short, bool, byte, char, bfloat16, int, long, or half dtype to a BSR tensor
and fixes the following conversions:
- a hybrid COO tensor to a COO tensor
- a non-batched/batched hybrid BSR tensor to a BSR or BSC tensor
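A minimal sketch of the extended API (a hedged illustration, not the PR's test code):
```python
import torch

dense = torch.tensor([[0., 1.], [2., 0.]])
coo = dense.to_sparse()                          # layout=None -> torch.sparse_coo
csr = dense.to_sparse(layout=torch.sparse_csr)   # strided/COO -> CSR
bsr = dense.to_sparse(layout=torch.sparse_bsr, blocksize=(1, 1))
```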
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89502
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
As per title. This implementation is not the most optimal and could be improved, albeit only with native kernels (i.e., the block matching need not be materialized).
Compared to existing kernels it offers:
- Half-precision float support (in fact, any dtype that supports `matmul` will work).
- Arbitrary block sizes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85551
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
Fixes #84999
This PR
- uses the device option to set the sparse compressed tensor instance device
- enables shape and device inference tests that were disabled due to an oversight
- fixes a bug in shape inference of hybrid tensors
- fixes a bug in `to_sparse_bsr` of a CUDA tensor
- updates tests to catch the above bugs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85240
Approved by: https://github.com/cpuhrsch
A longstanding confusion in the implementation of fake tensor and proxy tensor is what to do about torch.ops.aten.sym_sizes and related calls. In particular, when you have a tensor that (1) has symbolic shapes and (2) has a `__torch_dispatch__` call, previously you would always get `__torch_dispatch__` calls for sizes/strides queries, *even if you didn't request it* via the dispatch kwargs in `make_wrapper_subclass`.
The reason is that we were previously mixing several concepts: "I want to dispatch to Python", "I want to call a virtual method" and "I have dynamic shapes". A single boolean variable controlled all of these things, so it was not possible to understand inside TensorImpl what the user had actually originally requested.
In this PR, we track each of these concepts individually so that we can preserve user intent. Then we combine them into a single "policy" variable that controls whether we can use the fastpath. For the policy to trigger, only one of the exceptional cases needs to be true.
Billing of changes:
* Rename `set_sizes_strides_policy` to `set_custom_sizes_strides`; in general, you cannot DIRECTLY set the policy; you have to set it indirectly via the public functions.
* Some helpers for sizes and strides, since they are more complicated (they use an enum, rather than just bools as is the case for device and layout). `matches_python_custom` is used to test the Python dispatch user ask; `matches_policy` does the policy test (only used in the user-facing functions).
* I reorganized the accessor methods so that they are more logical. This makes the diff noisy, so I recommend reading the final code directly.
* The default custom implementations now more reliably call their `default()` implementations.
* As a bonus refactor, I devirtualized some functions that don't need to be virtual.
* `set_sym_sizes_and_strides` is renamed to `set_sizes_and_strides` to make it easier to use in template contexts; it optionally takes a storage offset now, so you can set all three values at the same time. If you use the SymInt overload but there are no symbolic integers, you get a normal resize.
* This adds `sym_storage_offset`, since we had it in the symbolic shapes branch and there's no reason not to include it (and it reduces merge conflicts).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84641
Approved by: https://github.com/wconstab
Enables:
test_bmm_cuda_float64
test_bmm_deterministic_cuda_float64
test_csr_matvec_cuda_complex128
test_csr_matvec_cuda_complex64
test_csr_matvec_cuda_float32
test_csr_matvec_cuda_float64
To enable the above tests, some more HIP mappings had to be added for the hipification process.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78939
Approved by: https://github.com/pruthvistony, https://github.com/malfet
A TODO in the tests that should have been addressed and removed before the initial PR landed was left in place, leaving holes in the BSR -> dense test coverage. This PR addresses the underlying issue and closes that gap. #8071 introduces more comprehensive test coverage for sparse compressed <-> dense conversion in general.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82120
Approved by: https://github.com/nikitaved, https://github.com/bhosmer
This avoids `TypeError: 'float' object cannot be interpreted as an integer` when trying to create an integer tensor from floating-point values.
Use `c10::checked_convert` to detect overflows during tensor construction from scalars, and modify a sparse_csr test that violated this rule.
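A sketch of the intended behavior (assuming the overflow check surfaces as a `RuntimeError`; the exact message may differ):
```python
import torch

# Creating an integer tensor from a floating-point value works:
t = torch.tensor(1.0, dtype=torch.int64)

# An overflowing scalar is now detected instead of wrapping silently:
try:
    torch.tensor(2**31, dtype=torch.int32)
except RuntimeError as e:
    print(e)  # value cannot be converted to int32 without overflow
```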
Fixes #69319
Tested in #81233
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81372
Approved by: https://github.com/ezyang, https://github.com/ngimel
As per title. Previously this was done by converting to COO.
A better approach could be to use `dense.out_`, but `sparse_csc` is forbidden there for now.
Also, are we fine with implementing very critical operations like `add` via transpositions?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79635
Approved by: https://github.com/cpuhrsch
Adds
- to_sparse_csc for strided input
- to_sparse_csc for COO input
- CSC to strided
- CSC to CSR
- CSC to CSC
Uses SciPy as a reference
Follow-up work is changing transpose to return CSC when passed CSR, and the resulting ripples through our matmul operations.
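A minimal sketch of the added conversions:
```python
import torch

dense = torch.tensor([[0., 1.], [2., 0.]])
csc = dense.to_sparse_csc()                        # strided -> CSC
csc_from_coo = dense.to_sparse().to_sparse_csc()   # COO -> CSC
back = csc.to_dense()                              # CSC -> strided
csr = csc.to_sparse_csr()                          # CSC -> CSR
```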
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77521
Approved by: https://github.com/pearu, https://github.com/anjali411
This PR adds a for-loop around cuSPARSE calls to support batched inputs;
the cuSPARSE function itself doesn't support batched inputs yet.
`mat1` and `mat2` must have the same batch shape. It's allowed to pass
`self` as a single matrix when `mat1` and `mat2` are batched.
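A hedged sketch of the batched call (assuming this refers to `torch.sparse.sampled_addmm`, given the `self`/`mat1`/`mat2` naming; requires a CUDA build since the loop wraps cuSPARSE):
```python
import torch

B, m, k, n = 2, 4, 3, 5
mat1 = torch.randn(B, m, k, device="cuda")
mat2 = torch.randn(B, k, n, device="cuda")
# `self` may stay a single (m, n) CSR matrix while mat1/mat2 are batched:
self_csr = torch.eye(m, n, device="cuda").to_sparse_csr()
out = torch.sparse.sampled_addmm(self_csr, mat1, mat2)
```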
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77243
Approved by: https://github.com/cpuhrsch
`torch.sparse.sampled_addmm` was incorrect for noncontiguous inputs on CUDA.
Unfortunately, the tests overlooked this: noncontiguous inputs were not
tested properly because 1x5 and 5x1 shapes were used.
The block sparse triangular solver on CUDA could return incorrect results if
there was a zero on the diagonal of the sparse matrix. Now it returns NaN.
Tests also revealed that the unitriangular=True flag is not working
correctly on CPU in some cases. That part needs more investigation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76590
Approved by: https://github.com/cpuhrsch
This PR implements `torch.select` for CSR tensors. Currently, it's not possible to select rows or columns of a batched CSR tensor. The non-batched case works fine by converting to COO and calling select. Initially, I implemented raw manipulations of indices, but converting to COO is only slightly slower and more readable.
This PR also enables indexing into a batched CSR tensor with `[x, y, z]`. Assigning is disabled.
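A brief sketch of the non-batched case:
```python
import torch

csr = torch.tensor([[0., 1.], [2., 3.]]).to_sparse_csr()
row = torch.select(csr, 0, 1)   # row 1, via the COO conversion described above
col = torch.select(csr, 1, 0)   # column 0
```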
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76228
Approved by: https://github.com/cpuhrsch
This pull request enables accumulating gradients for the CSR tensor.
Functions that work and are tested:
- tensor.abs()
- tensor.neg()
- tensor.conj_physical()
- torch.addmm
`torch.mm` also works, but tests will be added later.
In addition, this PR makes accessing strides, storage, and contiguity info on a CSR tensor throw an error.
`tensor.to_sparse_csr().to_sparse_csr()` was failing and is now fixed.
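A hedged sketch of gradient accumulation through one of the listed ops (assuming `to_dense` participates in autograd here):
```python
import torch

a = torch.tensor([[0., -1.], [2., 0.]]).to_sparse_csr().requires_grad_()
loss = a.abs().to_dense().sum()   # abs() is one of the tested ops above
loss.backward()
print(a.grad)                     # gradient accumulated on the CSR leaf
```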
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75435
Approved by: https://github.com/cpuhrsch
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73642
A re-land of https://github.com/pytorch/pytorch/pull/73471, which was reverted due to lack of `to_sparse(sparse_dim)` support.
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D34580353
Pulled By: cpuhrsch
fbshipit-source-id: a8a4ea381daeb80d8365fe931af9f55a7e789ea1
(cherry picked from commit 5a3cf8110980e5a10dbb687e87e67d5524ebf2f5)
Summary:
This PR introduces the `cuSolverSP` backend for `linalg.solve` with sparse CSR input matrices. The motivation comes from the issue: https://github.com/pytorch/pytorch/issues/69538.
`cuSolver` provides the [`cusolverSp<t>csrlsvluHost`](https://docs.nvidia.com/cuda/cusolver/index.html#cusolver-lt-t-gt-csrlsvlu) API; a few things to note:
1. As mentioned in the documentation, `only CPU (Host) path is provided.` Profiling shows no GPU compute kernel launches for this path; please see the profiling below.
2. Since only the `host` path is provided, the CPU path uses `csrlsvluHost` (but requires PyTorch to be installed/built with CUDA support).
3. The documentation mentions that reordering can help performance, but it isn't clear by how much. Several reordering options exist, so we stick with `reorder = 0` as the default choice.
`cuSolver` also has the [`csrlsvqr`](https://docs.nvidia.com/cuda/cusolver/index.html#cusolver-lt-t-gt-csrlsvqr) function, which provides a `device` path to solve the linear system. This function is used for the CUDA path in this PR.
**Gist:**
For CPU Path: we call [`csrlsvluHost` function of cuSolver](https://docs.nvidia.com/cuda/cusolver/index.html#cusolver-lt-t-gt-csrlsvlu).
For CUDA Path: we call [`csrlsvqr` function of cuSolver](https://docs.nvidia.com/cuda/cusolver/index.html#cusolver-lt-t-gt-csrlsvqr).
**Profiling:** (on a sparse input tensor of size 1000 x 1000 with a vector of length 1000) for the `csrlsvlu` function, showing that no GPU compute kernels are launched:
```
==3999651== Profiling result:
Type Time(%) Time Calls Avg Min Max Name
GPU activities: 100.00% 2.1440us 1 2.1440us 2.1440us 2.1440us [CUDA memcpy HtoD]
API calls: 99.72% 1.07199s 9 119.11ms 500ns 1.07164s cudaFree
0.11% 1.2182ms 398 3.0600us 140ns 137.94us cuDeviceGetAttribute
0.06% 674.45us 4 168.61us 165.50us 173.64us cuDeviceTotalMem
0.03% 357.07us 4 89.268us 2.7800us 201.89us cudaMalloc
0.03% 309.29us 1 309.29us 309.29us 309.29us cudaGetDeviceProperties
0.01% 160.47us 332 483ns 350ns 3.3300us cudaFuncSetAttribute
0.01% 115.12us 4 28.780us 26.290us 33.410us cuDeviceGetName
0.00% 28.591us 5 5.7180us 440ns 16.921us cudaGetDevice
0.00% 22.061us 4 5.5150us 871ns 18.690us cudaDeviceSynchronize
0.00% 20.370us 18 1.1310us 410ns 6.9900us cudaEventDestroy
0.00% 16.390us 1 16.390us 16.390us 16.390us cudaMemcpy
0.00% 11.540us 2 5.7700us 1.4900us 10.050us cuDeviceGetPCIBusId
0.00% 10.510us 18 583ns 430ns 1.6200us cudaEventCreateWithFlags
0.00% 7.9100us 21 376ns 290ns 700ns cudaDeviceGetAttribute
0.00% 1.4300us 6 238ns 150ns 590ns cuDeviceGet
0.00% 1.2200us 4 305ns 190ns 500ns cuDeviceGetCount
0.00% 900ns 1 900ns 900ns 900ns cuInit
0.00% 860ns 4 215ns 180ns 260ns cuDeviceGetUuid
0.00% 240ns 1 240ns 240ns 240ns cuDriverGetVersion
0.00% 230ns 1 230ns 230ns 230ns cudaGetDeviceCount
```
Script:
```python
import torch

def solve(x, other, out):
    torch.linalg.solve(x, other, out=out)

if __name__ == "__main__":
    dense_inp = torch.randn((1000, 1000), dtype=torch.float64)
    # Set 50% of the values to 0 randomly
    dense_inp = torch.nn.functional.dropout(dense_inp, p=0.5)
    sparse_inp = dense_inp.to_sparse_csr()
    other = torch.randint(100, (1000,), dtype=torch.float64)
    out = torch.randint(1, (1000,), dtype=torch.float64)
    solve(sparse_inp, other, out)
```
The following error is raised when the function is used on CPU with PyTorch built/installed without CUDA support:
```
/home/krshrimali/pytorch/torch/autograd/profiler.py:151: UserWarning: CUDA is not available, disabling CUDA profiling
warn("CUDA is not available, disabling CUDA profiling")
Traceback (most recent call last):
  File "/home/krshrimali/pytorch/test_sp.py", line 17, in <module>
    solve(x, other, out)
  File "/home/krshrimali/pytorch/test_sp.py", line 5, in solve
    torch.linalg.solve(x, other, out=out)
RuntimeError: PyTorch was not built with CUDA support. Please use PyTorch built CUDA support
```
**Performance Comparison** (vs SciPy's [`scipy.sparse.linalg.spsolve`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.spsolve.html)):
Time taken by `scipy.sparse.linalg.spsolve` : 0.595 seconds
On CPU: Time taken by `torch.linalg.solve` : 4.565 seconds
On CUDA: Time taken by `torch.linalg.solve`: 1.838 seconds
The inputs are of dimensions: (17281, 17281) and (17281, 1), and were taken from https://math.nist.gov/MatrixMarket/extreme.html.
Thanks to IvanYashchuk for helping me with the PR, and guiding me through it.
cc: IvanYashchuk pearu nikitaved cpuhrsch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71399
Reviewed By: VitalyFedyunin
Differential Revision: D33767740
Pulled By: cpuhrsch
fbshipit-source-id: a945f065210cd719096eb8d7cdbf8e8937c2fce9
(cherry picked from commit f4f35c17da414e1ca6c6d91402933521857aa1ea)
Summary:
When PyTorch is not built with MKL, or on Windows, there's a native implementation of `torch.addmm` for tensors on CPU. There was a bug where the `beta` value was ignored, causing new tests to fail (see https://github.com/pytorch/pytorch/pull/71949#issuecomment-1024639741).
In addition, I also enabled complex numbers support for this code path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72430
Reviewed By: davidberard98
Differential Revision: D34045670
Pulled By: cpuhrsch
fbshipit-source-id: b2b63f22ba3eea895a31c5c2925b0fb1555d2c6f
(cherry picked from commit ac0a2080bb)
Summary:
The rest of the tests in the CUDA test suite are skipped after GPU context corruption is encountered.
For tests decorated with `expectedFailure`, this creates the false impression that the entire test suite is passing.
Remedy this by suppressing the exception and reporting an unexpected success if `should_stop_early` is true.
Also, print a warning when this happens (to make attribution easier), as well as when the condition is detected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72016
Test Plan:
`python test_ops.py -v -k test_fn_fwgrad_bwgrad_gradient`
Before the change:
```
test_fn_fwgrad_bwgrad_gradient_cpu_complex128 (__main__.TestGradientsCPU) ... ok
test_fn_fwgrad_bwgrad_gradient_cpu_float64 (__main__.TestGradientsCPU) ... ok
test_fn_fwgrad_bwgrad_gradient_cuda_complex128 (__main__.TestGradientsCUDA) ... expected failure
----------------------------------------------------------------------
Ran 3 tests in 0.585s
OK (expected failures=1)
```
After the change:
```
test_fn_fwgrad_bwgrad_gradient_cpu_complex128 (__main__.TestGradientsCPU) ... ok
test_fn_fwgrad_bwgrad_gradient_cpu_float64 (__main__.TestGradientsCPU) ... ok
test_fn_fwgrad_bwgrad_gradient_cuda_complex128 (__main__.TestGradientsCUDA) ... /home/conda/miniconda3/lib/python3.9/site-packages/torch/testing/_internal/common_utils.py:1670: UserWarning: TEST SUITE EARLY TERMINATION due to torch.cuda.synchronize() failed with CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
warn(f"TEST SUITE EARLY TERMINATION due to torch.cuda.synchronize() failed with {rte}")
/home/conda/miniconda3/lib/python3.9/site-packages/torch/testing/_internal/common_device_type.py:382: UserWarning: Suppressed expected failure that resulted in fatal error
warn("Suppressed expected failure that resulted in fatal error")
unexpected success
----------------------------------------------------------------------
Ran 3 tests in 0.595s
FAILED (unexpected successes=1)
```
And `stderr` from XML file contains requested info:
```
/home/conda/miniconda3/lib/python3.9/site-packages/torch/testing/_internal/common_utils.py:1670: UserWarning: TEST SUITE EARLY TERMINATION due to torch.cuda.synchronize() failed with CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
warn(f"TEST SUITE EARLY TERMINATION due to torch.cuda.synchronize() failed with {rte}")
/home/conda/miniconda3/lib/python3.9/site-packages/torch/testing/_internal/common_device_type.py:382: UserWarning: Suppressed expected failure that resulted in fatal error
warn("Suppressed expected failure that resulted in fatal error")
```
Fixes https://github.com/pytorch/pytorch/issues/71973
Reviewed By: janeyx99, ngimel
Differential Revision: D33854287
Pulled By: malfet
fbshipit-source-id: dd0f5a4d2fcd21ebb7ee50ce4ec4914405a812d0
(cherry picked from commit 0c0baf3931)
Summary:
Since there is no rule in PyTorch (Sparse CSR) for filling zeros, it was decided that only those ops will be supported that do not break the 0->0 correspondence. This PR adds a test to ensure this rule is not broken.
`sample_inputs_unary` may or may not generate a zero in the sample input; hence, this separate test is useful for validating the rule and the Sparse CSR support.
cc nikitaved pearu cpuhrsch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70302
Reviewed By: albanD
Differential Revision: D33922501
Pulled By: cpuhrsch
fbshipit-source-id: 10f67a220b95a8e75205345a33744ad536fdcf53
(cherry picked from commit ade9bf7818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68711
This PR adds the ability to multiply a single CSR matrix by a batch of dense matrices.
cc nikitaved pearu cpuhrsch IvanYashchuk ngimel
Test Plan: Imported from OSS
Reviewed By: davidberard98
Differential Revision: D33773319
Pulled By: cpuhrsch
fbshipit-source-id: 1623ce9affbc4fdc6d6130a95c5a42022858b62b
(cherry picked from commit 628c8e366d)
Summary:
This PR enables `test_block_triangular` tests on the CPU.
These tests revealed a problem with how the nnz == 0 case was handled. Now we return a tensor filled with NaNs on both CUDA and CPU.
cc nikitaved pearu cpuhrsch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71304
Reviewed By: davidberard98
Differential Revision: D33600482
Pulled By: cpuhrsch
fbshipit-source-id: d09cb619f8b6e54b9f07eb16765ad1c183c42487
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68083
This PR adds support for `torch.randn_like(sparse_csr_tensor)`.
It creates a new sparse CSR tensor with the same indices but new values drawn from the normal distribution.
In addition, `.normal_()` and `torch.empty_like` were implemented, because `randn_like` is a composite of these two functions.
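A minimal sketch of the added behavior:
```python
import torch

csr = torch.tensor([[0., 1.], [2., 0.]]).to_sparse_csr()
noisy = torch.randn_like(csr)   # same sparsity structure, fresh normal values
assert torch.equal(noisy.crow_indices(), csr.crow_indices())
assert torch.equal(noisy.col_indices(), csr.col_indices())
```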
cc nikitaved pearu cpuhrsch IvanYashchuk
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D33511280
Pulled By: cpuhrsch
fbshipit-source-id: 6129083e8bc6cc5af2e0191294bd5e4e864f6c0e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68710
This PR adds support for block sparse (BSR) matrices in functions that
use the Inspector-Executor MKL Sparse API. At the moment of this PR these are:
* torch.addmm
* torch.addmv
* torch.triangular_solve (once https://github.com/pytorch/pytorch/pull/62180 is merged)
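A hedged sketch of one of these ops with a BSR input (requires an MKL-enabled CPU build; `to_sparse_bsr` is today's public conversion, not necessarily what the PR used):
```python
import torch

bsr = torch.randn(4, 6).relu().to_sparse_csr().to_sparse_bsr((2, 2))
mat = torch.randn(6, 5)
inp = torch.randn(4, 5)
out = torch.addmm(inp, bsr, mat)   # beta * inp + alpha * (bsr @ mat)
```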
cc nikitaved pearu cpuhrsch IvanYashchuk
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D33179486
Pulled By: cpuhrsch
fbshipit-source-id: e1dec0dccdbfed8b280be16b8c11fc9e770d50ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68709
This PR adds support for the triangular solver with a block CSR matrix.
cc nikitaved pearu cpuhrsch IvanYashchuk ngimel
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D33066067
Pulled By: cpuhrsch
fbshipit-source-id: 9eaf1839071e9526be8d8c6d47732b24200f3557
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68007
This PR adds a new function to the sparse module.
`sampled_addmm` computes α * (A @ B) * spy(C) + β * C, where C is a sparse CSR matrix and A, B are dense (strided) matrices.
This function is currently restricted to single 2D matrices; it doesn't support batched inputs.
cc nikitaved pearu cpuhrsch IvanYashchuk
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D32435799
Pulled By: cpuhrsch
fbshipit-source-id: b1ffac795080aef3fa05eaeeded03402bc097392
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68707
This PR adds a path for block CSR matrices to `torch.addmm`. The cuSPARSE interface is restricted to 32-bit indices and square blocks.
My plan is to first make everything work and get tests passing using an unsafe constructor, keeping it all private; then discuss and implement constructors with block information separately, unlocking the functions for wider use. Documentation will come with the update to the constructors.
cc nikitaved pearu cpuhrsch IvanYashchuk ngimel
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D32650366
Pulled By: cpuhrsch
fbshipit-source-id: 430a9627901781ee3d2e2496097b71ec17727d98
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62180
This PR adds CPU dispatch for `triangular_solve` with a sparse CSR matrix.
The implementation uses the MKL Sparse library; if it's not available, a runtime error is thrown.
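A hedged sketch of the CPU path (`torch.triangular_solve` was the user-facing API at the time):
```python
import torch

A = torch.tensor([[1., 0.], [2., 3.]]).to_sparse_csr()  # lower-triangular coefficients
b = torch.randn(2, 2)
x, _ = torch.triangular_solve(b, A, upper=False)
```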
cc nikitaved pearu cpuhrsch IvanYashchuk
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D32581395
Pulled By: cpuhrsch
fbshipit-source-id: 41c7133a0d2754ef60b5a7f1d14aa0bf7680a844
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61536
This PR adds CPU dispatch for `addmv_out` with a sparse CSR matrix.
The implementation uses the MKL Sparse library; if it's not available, a
runtime error is thrown.
Since `structured_delegate` is used, we only need to implement the out variant; the in-place and regular variants are autogenerated.
The MKL descriptor of sparse matrices is implemented in `at::mkl::sparse::MklSparseCsrDescriptor`.
MKL Sparse doesn't allow switching the index type at runtime; it's
predetermined at build time. Only the 32-bit version of MKL was tested
locally, but I expect the 64-bit version to work correctly as well.
When the index type of a PyTorch CSR tensor doesn't match MKL's, the
indices tensor is converted to an MKL-compatible type (`int` vs `int64_t`).
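A minimal sketch of the op on CPU (assumes an MKL-enabled build):
```python
import torch

mat = torch.tensor([[1., 0., 2.], [0., 3., 0.]]).to_sparse_csr()
vec = torch.randn(3)
inp = torch.randn(2)
out = torch.addmv(inp, mat, vec)   # beta * inp + alpha * (mat @ vec)
```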
cc nikitaved pearu cpuhrsch IvanYashchuk
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D32141787
Pulled By: malfet
fbshipit-source-id: b818a0b186aa227982221c3862a594266a58a2a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66401
This PR fixes the case when the result and input tensors have different
strides.
cuSPARSE from CUDA 11.3.1 has a bug: it doesn't use the correct strides to
write the result. This is "fixed" in PyTorch code by copying the input
tensor to a tensor with the same strides as the result tensor.
cc nikitaved pearu cpuhrsch IvanYashchuk ngimel
Test Plan: Imported from OSS
Reviewed By: davidberard98
Differential Revision: D32177966
Pulled By: cpuhrsch
fbshipit-source-id: 118437409df147f04dce02763aff9bfd33f87c63
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63948
This PR adds the `torch.add(a, b, alpha=None, out=out)` variant with `a`, `b`, and
`out` all being sparse CSR tensors.
The underlying cuSPARSE function works only with 32-bit indices, and in
the current implementation the result tensor has 32-bit indices. Input
tensors can have either 64-bit or 32-bit indices.
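A minimal sketch (on CUDA, since the kernel is cuSPARSE-backed):
```python
import torch

a = torch.tensor([[0., 1.], [2., 0.]], device="cuda").to_sparse_csr()
b = torch.tensor([[3., 0.], [0., 4.]], device="cuda").to_sparse_csr()
out = torch.add(a, b, alpha=2.0)   # sparse CSR result, 32-bit indices per above
```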
Fixes https://github.com/pytorch/pytorch/issues/59060
cc nikitaved pearu cpuhrsch IvanYashchuk ngimel
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D31909731
Pulled By: cpuhrsch
fbshipit-source-id: 656f523e3947fec56b2f93c474fb6fd49f0360ca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61858
This PR adds `triangular_solve_out_sparse_csr_cuda`. The operation is
used to compute the solution to a linear system where the coefficient
matrix is triangular.
Structured kernels are used, and the meta function needed some changes to
support the sparse CSR layout. With a sparse matrix input, the `cloned_coefficient`
tensor is a 0-sized tensor.
cc nikitaved pearu cpuhrsch IvanYashchuk ngimel
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D31948435
Pulled By: cpuhrsch
fbshipit-source-id: 7775fece83ca705a26d75f82aead10b956b14bfd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63511
This PR adds the `torch.addmm(c, a, b)` variant with `c`, `a`, and `b` all being CSR tensors.
The underlying cuSPARSE function works only with 32-bit indices, and in
the current implementation the result tensor has 32-bit indices. Input
tensors can have either 64-bit or 32-bit indices.
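A hedged sketch of the all-CSR variant (on CUDA, cuSPARSE-backed):
```python
import torch

a = torch.randn(4, 3, device="cuda").relu().to_sparse_csr()
b = torch.randn(3, 5, device="cuda").relu().to_sparse_csr()
c = torch.randn(4, 5, device="cuda").relu().to_sparse_csr()
out = torch.addmm(c, a, b)   # result is sparse CSR, 32-bit indices per above
```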
cc nikitaved pearu cpuhrsch IvanYashchuk ngimel
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D31809838
Pulled By: cpuhrsch
fbshipit-source-id: 97005dba27d8adcae445eb756bcbd7271061e9b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63510
Sparse CSR matrix resizing behavior:
- If we _increase the number of rows_, the number of specified elements in the matrix remains the same -> the sizes of `col_indices` and `values` don't change, and the size of `crow_indices` becomes `rows + 1`.
- If we _decrease the number of rows_, the number of specified elements becomes `min(nnz, rows*cols)` -> we need to resize `crow_indices` to `rows + 1` and set its last element to `min(nnz, rows*cols)`, and shrink `col_indices` and `values` to `min(nnz, rows*cols)`.
- If we _increase the number of columns_, the number of specified elements and the number of rows remain the same -> nothing needs resizing; just set the new sizes.
- We _cannot decrease the number of columns_ because it would require recomputing `crow_indices`.
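A hedged sketch of the first case (assuming in-place `resize_` is the entry point for these semantics):
```python
import torch

csr = torch.tensor([[0., 1.], [2., 0.]]).to_sparse_csr()
csr.resize_(4, 2)            # more rows: nnz unchanged, crow_indices grows
print(csr.crow_indices())    # rows + 1 == 5 entries
```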
cc nikitaved pearu cpuhrsch IvanYashchuk
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D31796680
Pulled By: cpuhrsch
fbshipit-source-id: 7d8a9701ce06d30a1841f94bba0a057cacea9401
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63509
The primary use of `torch.empty` is to reserve memory for a tensor and set its type, device, and size information. The same is done here for sparse CSR.
`crow_indices` is initialized as an empty tensor of size `num_rows + 1`; `col_indices` and `values` are initialized as empty tensors of size 0.
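A minimal sketch (assuming `torch.empty` accepts the `layout=torch.sparse_csr` keyword as described):
```python
import torch

t = torch.empty(3, 4, layout=torch.sparse_csr)
print(t.crow_indices().shape)  # torch.Size([4]) == num_rows + 1
print(t.col_indices().shape)   # torch.Size([0])
print(t.values().shape)        # torch.Size([0])
```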
cc nikitaved pearu cpuhrsch IvanYashchuk
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D31770359
Pulled By: cpuhrsch
fbshipit-source-id: c83f2a2e0d7514ba24780add1086e1bccf541dd9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66485
The errors for incorrectly sized inputs should match those of the dense
variants of the functions.
Moved `addmm_out_sparse_csr_dense_cuda` from SparseCsrTensorMath.cu and
removed an unnecessary device check.
cc nikitaved pearu cpuhrsch IvanYashchuk
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D31764036
Pulled By: cpuhrsch
fbshipit-source-id: 76900fe9e4a49474695a01f34bad41cb3422321c