Commit Graph

231 Commits

Author SHA1 Message Date
Nikita Shulga
d80fe49de0 [Reland] Add py-3.10 config (#82329)
This is a re-land of #81372 and #81233, except that it does not force the range checks on older Python runtime versions and therefore should not affect the internal workloads that were the reason for the revert; see https://github.com/pytorch/pytorch/pull/81372#issuecomment-1187516464

- [Py3.10] Allow floats to be imported as Long (#81372)
- [CI] Move CUDA-11.6 to Python-3.10 configuration (#81233)
- Leave range-check behavior unchanged for pre-py3.10 runtimes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82329
Approved by: https://github.com/kit1980
2022-07-27 20:22:47 +00:00
Edward Z. Yang
7f7c81c5f9 Add empty_like support for sparse_csc/bsr/bsc (#82310)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82310
Approved by: https://github.com/amjames, https://github.com/nikitaved
2022-07-27 18:59:07 +00:00
PyTorch MergeBot
ec1b3a45ad Revert "[Py3.10] Allow floats to be imported as Long (#81372)"
This reverts commit 69d73345a2.

Reverted https://github.com/pytorch/pytorch/pull/81372 on behalf of https://github.com/DanilBaibak due to breaking the internal build
2022-07-18 14:55:13 +00:00
Nikita Shulga
69d73345a2 [Py3.10] Allow floats to be imported as Long (#81372)
This avoids `TypeError: 'float' object cannot be interpreted as an integer` when trying to create an integer tensor from floating-point values.

Uses `c10::checked_convert` to detect overflows during tensor construction from scalars. Modifies a sparse_csr test that violated this rule.
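
A minimal pure-Python sketch of the kind of range check described above: a float is truncated toward zero and rejected if the result cannot be represented as a 64-bit signed integer. The helper name `checked_float_to_int64` is hypothetical; this is not the `c10::checked_convert` implementation, only an illustration of the idea.

```python
import math

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def checked_float_to_int64(value: float) -> int:
    """Hypothetical helper: truncate toward zero, refusing values outside int64."""
    if math.isnan(value) or math.isinf(value):
        raise OverflowError(f"{value} cannot be converted to int64")
    result = math.trunc(value)  # truncation toward zero, like a C-style float-to-int cast
    if not INT64_MIN <= result <= INT64_MAX:
        raise OverflowError(f"{value} is out of range for int64")
    return result


print(checked_float_to_int64(3.7))   # 3
print(checked_float_to_int64(-2.5))  # -2
```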

Fixes #69319

Tested in #81233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81372
Approved by: https://github.com/ezyang, https://github.com/ngimel
2022-07-15 22:57:58 +00:00
Nikita Vedeneev
880b972841 More efficient indices validations for compressed sparse formats. (#81108)
As per title.

Some of the features:
- Native kernels for both CPU and CUDA, without device syncs.
- If needed, invariant checks 5.1-5.5 could be improved to utilize vectorization. This would require implementing a `Vectorized -> bool` conversion, which is left as a follow-up.
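
For readers unfamiliar with these checks, here is a sequential pure-Python sketch of typical compressed-index invariants for a CSR layout (for CSC, swap the roles of rows and columns). The function name and the exact list of checks are assumptions for illustration; they are not claimed to match the numbered invariants 5.1-5.5 or the PR's parallel CPU/CUDA kernels.

```python
from typing import Sequence


def validate_compressed_indices(
    compressed: Sequence[int],
    plain: Sequence[int],
    n_compressed_dim: int,
    n_plain_dim: int,
) -> None:
    """Raise ValueError if CSR-style (compressed, plain) indices are inconsistent."""
    nnz = len(plain)
    if len(compressed) != n_compressed_dim + 1:
        raise ValueError("compressed indices must have length n_compressed_dim + 1")
    if compressed[0] != 0:
        raise ValueError("compressed indices must start at 0")
    if compressed[-1] != nnz:
        raise ValueError("last compressed index must equal nnz")
    for i in range(n_compressed_dim):
        start, end = compressed[i], compressed[i + 1]
        if start > end:
            raise ValueError(f"compressed indices must be non-decreasing (at {i})")
        segment = plain[start:end]
        for j, p in enumerate(segment):
            if not 0 <= p < n_plain_dim:
                raise ValueError(f"plain index {p} out of range in segment {i}")
            # Within one row (or column) the plain indices must be sorted and unique.
            if j > 0 and segment[j - 1] >= p:
                raise ValueError(f"plain indices must be strictly increasing in segment {i}")


# [[1, 0, 2],
#  [0, 3, 0]] has crow_indices [0, 2, 3] and col_indices [0, 2, 1]
validate_compressed_indices([0, 2, 3], [0, 2, 1], n_compressed_dim=2, n_plain_dim=3)
```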

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81108
Approved by: https://github.com/amjames, https://github.com/pearu, https://github.com/cpuhrsch
2022-07-14 20:36:18 +00:00
Pearu Peterson
d50f4a3c24 Support sparse/dense_dim for Compressed Sparse tensors (#80901)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80901
Approved by: https://github.com/cpuhrsch, https://github.com/nikitaved
2022-07-08 15:49:35 +00:00
Pearu Peterson
d266256621 Support compressed sparse tensors with dense dimensions (#80565)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80565
Approved by: https://github.com/cpuhrsch
2022-07-07 16:21:12 +00:00
PyTorch MergeBot
682c0d2615 Use segment/scatter_reduce to support masked reductions on sparse CSR tensors (mean, amax, amin) (fp only) (#78918)
Follows the design [here](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/sparse/SparseCsrTensorMath.cpp#L804-L837) and [here](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/sparse/SparseCsrTensorMath.cpp#L885-L928) in SparseCsrTensorMath.cpp (already used to implement sum/prod), but uses `segment_reduce`/`scatter_reduce` for the reduction step.
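
The reason a segment reduction fits CSR is that each row's stored values are contiguous and delimited by consecutive entries of `crow_indices`. A pure-Python sketch of a row-wise reduction over only the specified elements follows; the helper name and the fill value used for rows with no stored entries are assumptions for illustration, not PyTorch's behavior.

```python
import math
from typing import Callable, List, Sequence


def csr_row_reduce(
    crow_indices: Sequence[int],
    values: Sequence[float],
    reduce_fn: Callable[[Sequence[float]], float],
    empty_value: float,
) -> List[float]:
    """Reduce each row's stored values; rows without stored values get empty_value."""
    out = []
    for row in range(len(crow_indices) - 1):
        start, end = crow_indices[row], crow_indices[row + 1]
        segment = values[start:end]  # this row's stored values form one contiguous segment
        out.append(reduce_fn(segment) if segment else empty_value)
    return out


crow = [0, 2, 2, 5]                 # row 1 has no stored entries
vals = [1.0, 4.0, -3.0, 2.0, 6.0]
print(csr_row_reduce(crow, vals, max, -math.inf))  # [4.0, -inf, 6.0]
print(csr_row_reduce(crow, vals, min, math.inf))   # [1.0, inf, -3.0]
```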

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78918
Approved by: https://github.com/cpuhrsch
2022-06-30 14:11:53 +00:00
Andrew M. James
9e3677f85d Add support for BSR <-> Strided Conversion (#80354)
Supersedes #78303

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80354
Approved by: https://github.com/cpuhrsch
2022-06-27 21:09:09 +00:00
Pearu Peterson
cde365a7cd Validate Sparse Compressed tensor inputs (#79385)
The validation includes regular tensor inputs, batched tensor inputs, as well as hybrid tensor inputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79385
Approved by: https://github.com/nikitaved, https://github.com/cpuhrsch
2022-06-27 17:19:54 +00:00
Nikita Vedeneev
9ad91cc6e0 optimize to_dense for CSC (#79635)
As per title. Previously this was done by converting to COO.
A better approach could be to use `dense.out_`, but `sparse_csc` is still forbidden.
Also, are we fine with implementing critical operations like `add` via transpositions?
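
Converting CSC to dense directly, without the COO detour, amounts to walking each column's segment of `row_indices`/`values` and scattering into the output. A pure-Python sketch (the function name is hypothetical; this is not the PR's kernel):

```python
from typing import List, Sequence, Tuple


def csc_to_dense(
    ccol_indices: Sequence[int],
    row_indices: Sequence[int],
    values: Sequence[float],
    shape: Tuple[int, int],
) -> List[List[float]]:
    """Scatter CSC entries directly into a dense row-major matrix."""
    n_rows, n_cols = shape
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for col in range(n_cols):
        # Entries of column `col` occupy positions ccol_indices[col]:ccol_indices[col + 1].
        for k in range(ccol_indices[col], ccol_indices[col + 1]):
            dense[row_indices[k]][col] = values[k]
    return dense


# [[1, 0, 2],
#  [0, 3, 0]] stored column by column
print(csc_to_dense([0, 1, 2, 3], [0, 1, 0], [1.0, 3.0, 2.0], (2, 3)))
# [[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]
```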

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79635
Approved by: https://github.com/cpuhrsch
2022-06-21 16:52:16 +00:00
jpvillam
aff7eef476 [ROCm] Enable some sparse tests on ROCm (#77877)
Enabling:
test_sampled_addmm_errors_cuda_complex128
test_sampled_addmm_errors_cuda_complex64
test_sampled_addmm_errors_cuda_float32
test_sampled_addmm_errors_cuda_float64
test_sparse_add_cuda_complex128
test_sparse_add_cuda_complex64

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77877
Approved by: https://github.com/pruthvistony, https://github.com/malfet
2022-06-14 21:11:35 +00:00
Pearu Peterson
fb6749d977 Support CSC/BSR/BSC inputs to unary zero-preserving functions.
In addition, enable testing masked reductions in sparse compressed consistency check.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78173

Approved by: https://github.com/cpuhrsch
2022-06-09 09:46:34 +00:00
Pearu Peterson
8c88a55d44 Fix sparse BSR tensor validation.
Also adds bits to support dense dimensions for Sparse Compressed tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78359

Approved by: https://github.com/cpuhrsch
2022-05-27 13:26:35 +00:00
Christian Puhrsch
b9fb940dec Conversion between SparseBsr and Strided (#78025)
Adds conversion between the strided and SparseBsr layouts.

[Based on code by @bhosmer!](https://colab.research.google.com/drive/1NHWti04TU269dzbRjLfxGxVlzZWo1XLo?usp=sharing)
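
What a strided-to-BSR conversion computes can be sketched in pure Python: tile the matrix into `blocksize` blocks, keep every block that has at least one nonzero, and record block-row pointers and block-column indices. This is an illustrative sketch with a hypothetical function name, not the code this commit is based on; it assumes the matrix shape is divisible by the blocksize.

```python
from typing import List, Sequence, Tuple

Block = List[List[float]]


def dense_to_bsr(
    dense: Sequence[Sequence[float]], blocksize: Tuple[int, int]
) -> Tuple[List[int], List[int], List[Block]]:
    """Split a dense matrix into blocks and keep those with at least one nonzero."""
    n_rows, n_cols = len(dense), len(dense[0])
    br, bc = blocksize
    if n_rows % br or n_cols % bc:
        raise ValueError("matrix shape must be divisible by the blocksize")
    crow_indices, col_indices, values = [0], [], []
    for bi in range(n_rows // br):
        for bj in range(n_cols // bc):
            block = [list(row[bj * bc:(bj + 1) * bc]) for row in dense[bi * br:(bi + 1) * br]]
            if any(x != 0 for block_row in block for x in block_row):
                col_indices.append(bj)
                values.append(block)
        # One pointer per block row, counting the blocks kept so far.
        crow_indices.append(len(col_indices))
    return crow_indices, col_indices, values


dense = [
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 0.0],
]
crow, col, vals = dense_to_bsr(dense, (2, 2))
print(crow)  # [0, 1, 2]
print(col)   # [0, 1]
print(vals)  # [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [5.0, 0.0]]]
```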
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78025
Approved by: https://github.com/pearu, https://github.com/jbschlosser
2022-05-25 15:03:35 +00:00
Christian Puhrsch
a8467de6fa Guard test_sparse_csr.test_mm on CUDA11+ (#77965)
Fixes #77944

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77965
Approved by: https://github.com/albanD, https://github.com/malfet
2022-05-20 16:16:28 +00:00
Christian Puhrsch
ec290949aa Change transpose to return CSC when given CSR, adjust addmm, addmv, mm (#77615)
Changes transpose to return CSC when given CSR and adds CSC support via to_sparse_csr to addmm and addmv.
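
Returning CSC from a CSR transpose is attractive because no data has to move: row `i` of `A` is column `i` of `A^T`, so `A`'s row pointers become `A^T`'s column pointers and `A`'s column indices become `A^T`'s row indices. A pure-Python sketch of that relabeling (the container types and helper names are hypothetical):

```python
from typing import List, NamedTuple, Tuple


class CSR(NamedTuple):
    crow_indices: List[int]
    col_indices: List[int]
    values: List[float]
    shape: Tuple[int, int]


class CSC(NamedTuple):
    ccol_indices: List[int]
    row_indices: List[int]
    values: List[float]
    shape: Tuple[int, int]


def transpose_csr(a: CSR) -> CSC:
    """Return A^T in CSC layout by relabeling A's CSR arrays (no copies)."""
    n_rows, n_cols = a.shape
    return CSC(a.crow_indices, a.col_indices, a.values, (n_cols, n_rows))


def csc_entry(m: CSC, i: int, j: int) -> float:
    """Look up M[i, j] in a CSC matrix (0.0 if not stored)."""
    for k in range(m.ccol_indices[j], m.ccol_indices[j + 1]):
        if m.row_indices[k] == i:
            return m.values[k]
    return 0.0


# A = [[1, 0, 2],
#      [0, 3, 0]]
a = CSR([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0], (2, 3))
at = transpose_csr(a)
print(at.shape)  # (3, 2)
print([[csc_entry(at, i, j) for j in range(2)] for i in range(3)])
# [[1.0, 0.0], [0.0, 3.0], [2.0, 0.0]]
```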
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77615
Approved by: https://github.com/pearu, https://github.com/albanD
2022-05-19 14:17:55 +00:00
Pearu Peterson
8b5f11c61e Support copy_ for Sparse Compressed tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77605

Approved by: https://github.com/cpuhrsch
2022-05-18 21:22:19 +00:00
Christian Puhrsch
e10a002e52 2D Strided to/from CSC, COO to CSC, CSC to CSC conversion. (#77521)
Adds
- to_sparse_csc for strided input
- to_sparse_csc for COO input
- CSC to strided
- CSC to CSR
- CSC to CSC

Uses SciPy as a reference

Follow-up work: change transpose to return CSC when passed CSR, and handle the resulting ripple effects through our matmul operations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77521
Approved by: https://github.com/pearu, https://github.com/anjali411
2022-05-18 14:49:11 +00:00
Pearu Peterson
ccc991ba29 Support str for Sparse Compressed tensors
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77530

Approved by: https://github.com/cpuhrsch
2022-05-18 12:58:54 +00:00
Pearu Peterson
dc882ed33d Add Sparse Compressed tensor support to torch.clone
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77512

Approved by: https://github.com/cpuhrsch
2022-05-17 16:29:41 +00:00
PyTorch MergeBot
0d1329c4ea Revert "Add Sparse Compressed tensor support to torch.clone"
This reverts commit 942f04172a.

Reverted https://github.com/pytorch/pytorch/pull/77512 on behalf of https://github.com/atalman
2022-05-17 14:26:52 +00:00
Pearu Peterson
942f04172a Add Sparse Compressed tensor support to torch.clone
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77512

Approved by: https://github.com/cpuhrsch
2022-05-17 07:32:46 +00:00
PyTorch MergeBot
f1c8e8fa4e Revert "Add Sparse Compressed tensor support to torch.clone"
This reverts commit 20ba6e6935.

Reverted https://github.com/pytorch/pytorch/pull/77512 on behalf of https://github.com/malfet
2022-05-17 00:31:49 +00:00
Christian Puhrsch
89e32f52c7 Change test_sparse_csr test signatures (#77595)
Some consuming tools aren't equipped to split on the "(" and ")" induced by passing tuples to parametrize.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77595
Approved by: https://github.com/malfet
2022-05-17 00:24:08 +00:00
Pearu Peterson
20ba6e6935 Add Sparse Compressed tensor support to torch.clone
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77512

Approved by: https://github.com/cpuhrsch
2022-05-16 22:21:49 +00:00
Pearu Peterson
d76efed578 Add Sparse CSC support to torch.empty
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77508

Approved by: https://github.com/cpuhrsch
2022-05-16 18:53:56 +00:00
Christian Puhrsch
8c608a79b4 Compressed sparse layout conversion stubs (#77489)
This PR unifies sparse layout conversions into a single location and adds stubs that raise a RuntimeError for unsupported conversions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77489
Approved by: https://github.com/pearu, https://github.com/mruberry
2022-05-16 18:37:42 +00:00
Pearu Peterson
88205886d7 Add ccol_indices and row_indices methods.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77503

Approved by: https://github.com/cpuhrsch
2022-05-16 00:23:54 +00:00
Christian Puhrsch
289192199a Add to_sparse_bsr (#77366)
Conversion function from CSR to BSR.

Follow-up work includes:
- Conversion from strided, COO, CSC, BSC
- autograd
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77366
Approved by: https://github.com/IvanYashchuk, https://github.com/mikaylagawarecki
2022-05-13 20:16:03 +00:00
Christian Puhrsch
b250759242 mul(dense, csr), mul(csr, dense) via sparse_mask_csr (#77177)
This adds basic coverage, but it could easily be made more efficient with a native implementation.

Follow-up work includes supporting CSR gradients for strided Tensors.
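
The idea behind computing `mul(dense, csr)` through a sparse mask is that only the CSR operand's stored positions can be nonzero in the product, so the dense operand only needs to be read at those positions and the result keeps the CSR sparsity pattern. A pure-Python sketch (hypothetical function name; positions not stored in the CSR operand are simply treated as zeros in the result):

```python
from typing import List, Sequence, Tuple


def mul_dense_csr(
    dense: Sequence[Sequence[float]],
    crow_indices: Sequence[int],
    col_indices: Sequence[int],
    values: Sequence[float],
) -> Tuple[List[int], List[int], List[float]]:
    """Elementwise dense * CSR, evaluated only at the CSR operand's stored positions."""
    out_values = []
    for row in range(len(crow_indices) - 1):
        for k in range(crow_indices[row], crow_indices[row + 1]):
            # Mask the dense operand with the sparsity pattern, then multiply.
            out_values.append(dense[row][col_indices[k]] * values[k])
    # The result reuses the CSR operand's sparsity pattern.
    return list(crow_indices), list(col_indices), out_values


dense = [[2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
# A = [[1, 0, 2], [0, 3, 0]] in CSR
print(mul_dense_csr(dense, [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]))
# ([0, 2, 3], [0, 2, 1], [2.0, 8.0, 18.0])
```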
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77177
Approved by: https://github.com/nikitaved, https://github.com/mikaylagawarecki
2022-05-12 23:56:10 +00:00
Ivan Yashchuk
09be44de7b Sparse BSR: Enable addmm, addmv, triangular_solve for BSR layout (#77255)
This PR enables `addmm`, `addmv`, `triangular_solve` functions for tensors with `torch.sparse_bsr` layout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77255
Approved by: https://github.com/cpuhrsch
2022-05-12 08:31:44 +00:00
Ivan Yashchuk
d1beda53e8 Sparse CSR CUDA: add batched support for torch.sparse.sampled_addmm
This PR adds a for loop around the cuSPARSE calls to support batched inputs,
since the cuSPARSE function itself doesn't support batched inputs yet.
`mat1` and `mat2` must have the same batch shape. It's allowed to pass
`self` as a single matrix when `mat1` and `mat2` are batched.
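
A pure-Python sketch of the batching strategy described above: a per-matrix `sampled_addmm` wrapped in a loop over the batch, reusing one non-batched `self` for every batch entry. The semantics assumed here follow the documented definition of `torch.sparse.sampled_addmm` (`beta * self + alpha * (mat1 @ mat2)`, evaluated only at `self`'s specified positions); the function names and list-based layout are hypothetical, not the cuSPARSE-backed implementation.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def sampled_addmm(
    crow_indices: Sequence[int],
    col_indices: Sequence[int],
    values: Sequence[float],
    mat1: Matrix,
    mat2: Matrix,
    beta: float = 1.0,
    alpha: float = 1.0,
) -> List[float]:
    """New values for self's pattern: beta * self + alpha * (mat1 @ mat2) at stored positions."""
    inner = len(mat2)
    out = []
    for i in range(len(crow_indices) - 1):
        for k in range(crow_indices[i], crow_indices[i + 1]):
            j = col_indices[k]
            dot = sum(mat1[i][p] * mat2[p][j] for p in range(inner))
            out.append(beta * values[k] + alpha * dot)
    return out


def batched_sampled_addmm(
    crow_indices: Sequence[int],
    col_indices: Sequence[int],
    values: Sequence[float],
    mat1_batch: Sequence[Matrix],
    mat2_batch: Sequence[Matrix],
    beta: float = 1.0,
    alpha: float = 1.0,
) -> List[List[float]]:
    """Loop over the batch, reusing a single (non-batched) self pattern for each entry."""
    if len(mat1_batch) != len(mat2_batch):
        raise ValueError("mat1 and mat2 must have the same batch shape")
    return [
        sampled_addmm(crow_indices, col_indices, values, m1, m2, beta, alpha)
        for m1, m2 in zip(mat1_batch, mat2_batch)
    ]


# self = [[0, 10], [20, 0]] in CSR
mat1_batch = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 4.0]]]
mat2_batch = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
print(batched_sampled_addmm([0, 1, 2], [1, 0], [10.0, 20.0], mat1_batch, mat2_batch))
# [[12.0, 23.0], [11.0, 24.0]]
```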

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77243

Approved by: https://github.com/cpuhrsch
2022-05-12 08:23:38 +00:00
Ivan Yashchuk
545d90f032 Sparse CSR: enable autograd for torch.sparse.addmm and torch.sparse.mm
This PR updates the derivative rule for `torch.sparse.addmm` to work
with CSR sparse matrices. Notably, `torch.sparse.sampled_addmm` is
used in the backward function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76591

Approved by: https://github.com/cpuhrsch
2022-05-11 18:57:40 +00:00
PyTorch MergeBot
f94abd59f7 Revert "Sparse CSR: enable autograd for torch.sparse.addmm and torch.sparse.mm"
This reverts commit 721a8ca697.

Reverted https://github.com/pytorch/pytorch/pull/76591 on behalf of https://github.com/janeyx99
2022-05-10 13:21:46 +00:00
Ivan Yashchuk
721a8ca697 Sparse CSR: enable autograd for torch.sparse.addmm and torch.sparse.mm
This PR updates the derivative rule for `torch.sparse.addmm` to work
with CSR sparse matrices. Notably, `torch.sparse.sampled_addmm` is
used in the backward function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76591

Approved by: https://github.com/cpuhrsch
2022-05-10 08:44:55 +00:00
Ivan Yashchuk
3df0140cbd Sparse CSR: Fix sampled_addmm for noncontiguous inputs and fix block sparse triangular solve
`torch.sparse.sampled_addmm` was incorrect for noncontiguous inputs on CUDA.
Unfortunately, the tests overlooked this: noncontiguous inputs were not
exercised properly because only 1x5 and 5x1 shapes were used.

The block sparse triangular solver on CUDA could return incorrect results if
there was a zero on the diagonal of the sparse matrix. Now it returns nan.
The tests also revealed that the unitriangular=True flag does not work
correctly on CPU in some cases. That part needs more investigation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76590

Approved by: https://github.com/cpuhrsch
2022-05-05 09:00:48 +00:00
Ivan Yashchuk
1335512056 Sparse CSR: Add CPU fallback for sampled_addmm
The `torch.sparse.sampled_addmm` function is used in the backward pass of
`torch.sparse.addmm` and `torch.sparse.mm`, therefore we need a CPU
implementation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76589

Approved by: https://github.com/cpuhrsch
2022-05-04 21:30:43 +00:00
Pearu Peterson
436a7be059 Factory functions for sparse CSC, BSR, and BSC tensors
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76634

Tests for Sparse Compressed factory functions

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76746

Approved by: https://github.com/cpuhrsch
2022-05-04 03:30:41 +00:00
Ivan Yashchuk
d7db6a7b02 Sparse CSR: Add backward for torch.sparse.sampled_addmm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68084

Approved by: https://github.com/cpuhrsch
2022-05-02 17:58:20 +00:00
Ivan Yashchuk
407e8eba8c Enable simple indexing into CSR tensor, add torch.select for CSR
This PR implements `torch.select` for CSR tensors. Currently, it's not possible to select rows or columns of a batched CSR tensor. The non-batched case works by converting to COO and calling select. Initially, I implemented raw manipulations of the indices, but converting to COO is only slightly slower and more readable.

This PR also enables indexing into a batched CSR tensor with `[x, y, z]`. Assignment is disabled.
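
The COO route described above can be sketched in pure Python: expand the row pointers into one explicit row index per stored element (the COO form), then filter on the selected dimension. The helper names are hypothetical, and returning the selected slice as `(indices, values)` lists is an assumption for illustration, not the layout `torch.select` actually returns.

```python
from typing import List, Sequence, Tuple


def csr_to_coo_rows(crow_indices: Sequence[int]) -> List[int]:
    """Expand CSR row pointers into an explicit row index per stored element."""
    rows = []
    for r in range(len(crow_indices) - 1):
        rows.extend([r] * (crow_indices[r + 1] - crow_indices[r]))
    return rows


def select_csr(
    crow_indices: Sequence[int],
    col_indices: Sequence[int],
    values: Sequence[float],
    dim: int,
    index: int,
) -> Tuple[List[int], List[float]]:
    """Select row (dim=0) or column (dim=1) `index` as a sparse vector."""
    rows = csr_to_coo_rows(crow_indices)
    if dim == 0:
        kept = [(c, v) for r, c, v in zip(rows, col_indices, values) if r == index]
    elif dim == 1:
        kept = [(r, v) for r, c, v in zip(rows, col_indices, values) if c == index]
    else:
        raise ValueError("dim must be 0 or 1 for a non-batched CSR matrix")
    return [i for i, _ in kept], [v for _, v in kept]


# A = [[1, 0, 2],
#      [0, 3, 0]]
print(select_csr([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0], dim=0, index=0))  # ([0, 2], [1.0, 2.0])
print(select_csr([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0], dim=1, index=1))  # ([1], [3.0])
```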

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76228
Approved by: https://github.com/cpuhrsch
2022-04-23 02:36:03 +00:00
arindamroy-eng
7478ce187a ROCM:Unskip more tests for ROCM5.0
Re-enabling more tests which are working on ROCM5.0

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75353
Approved by: https://github.com/ezyang
2022-04-19 19:45:55 +00:00
Ivan Yashchuk
bba4780232 Enable autograd wrt sparse CSR tensors
This pull request enables accumulating gradients for the CSR tensor.
Functions that work and are tested:
- tensor.abs()
- tensor.neg()
- tensor.conj_physical()
- torch.addmm

`torch.mm` also works, but tests will be added later.

In addition, this PR makes accessing the strides, storage, or contiguity info of a CSR tensor raise an error.

`tensor.to_sparse_csr().to_sparse_csr()` was failing and is now fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75435
Approved by: https://github.com/cpuhrsch
2022-04-19 18:42:45 +00:00
Pearu Peterson
e9791cd8c9 Validate Sparse Compressed tensor arguments
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75946

Approved by: https://github.com/cpuhrsch
2022-04-18 02:21:22 +00:00
Yukio Siraichi
22a10ce513 Port cat kernel to structured kernels.
Tracking issue: #55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68640

Approved by: https://github.com/ezyang
2022-04-14 17:49:43 +00:00
Ivan Yashchuk
3f1351d1cf Disable strides and contiguity for CSR tensors
This pull request makes accessing the strides, storage, or contiguity info of a CSR tensor raise an error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75499
Approved by: https://github.com/cpuhrsch
2022-04-08 23:15:19 +00:00
Pearu Peterson
e61b2e12e1 Support masked sum on CSR tensors [CPU, CUDA]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72633

Approved by: https://github.com/cpuhrsch
2022-04-08 20:07:18 +00:00
PyTorch MergeBot
31ed77b769 Revert "Support masked sum on CSR tensors [CPU, CUDA]"
This reverts commit 5c28216aea.

Reverted https://github.com/pytorch/pytorch/pull/72633 on behalf of https://github.com/b0noI
2022-04-07 23:34:58 +00:00
Ivan Yashchuk
c7ae23b50e Extend CSR constructor to support batched indices and values
This is the first portion of changes required to enable Batched CSR format described in https://github.com/pytorch/pytorch/issues/60854#batched-CSR-computation.

Currently, only the same batch shape for indices and values is allowed. In the future, we could enable "broadcasting" of indices and batched values, as done in xFormers (dd96b8d8be/xformers/components/attention/_sputnik_sparse.py (L441)).

This PR adds the possibility to construct a batched CSR matrix with `torch.sparse_csr_tensor`; this batched CSR matrix can be converted to a dense tensor with a `.to_dense()` call.
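
To make the batched layout concrete, here is a pure-Python sketch of densifying a batched CSR matrix: each batch entry has its own row pointers, column indices, and values, and indices and values share the same batch size. In this sketch every entry is assumed to store the same number of elements, matching the shared batch shape described above. The function name is hypothetical and this is not the implementation behind `.to_dense()`.

```python
from typing import List, Sequence, Tuple


def batched_csr_to_dense(
    crow_batch: Sequence[Sequence[int]],
    col_batch: Sequence[Sequence[int]],
    values_batch: Sequence[Sequence[float]],
    shape: Tuple[int, int, int],
) -> List[List[List[float]]]:
    """Densify a batch of CSR matrices sharing one (batch, rows, cols) shape."""
    batch, n_rows, n_cols = shape
    if not len(crow_batch) == len(col_batch) == len(values_batch) == batch:
        raise ValueError("indices and values must have the same batch size")
    out = []
    for crow, col, vals in zip(crow_batch, col_batch, values_batch):
        dense = [[0.0] * n_cols for _ in range(n_rows)]
        for i in range(n_rows):
            for k in range(crow[i], crow[i + 1]):
                dense[i][col[k]] = vals[k]
        out.append(dense)
    return out


# Batch of two 2x2 matrices, each with two stored elements:
# [[1, 0], [0, 2]] and [[0, 3], [4, 0]]
print(batched_csr_to_dense(
    [[0, 1, 2], [0, 1, 2]],
    [[0, 1], [1, 0]],
    [[1.0, 2.0], [3.0, 4.0]],
    (2, 2, 2),
))
# [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [4.0, 0.0]]]
```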
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74542
Approved by: https://github.com/cpuhrsch
2022-04-07 17:10:52 +00:00
Pearu Peterson
5c28216aea Support masked sum on CSR tensors [CPU, CUDA]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72633

Approved by: https://github.com/cpuhrsch
2022-04-07 17:08:35 +00:00
PyTorch MergeBot
6d832a7a20 Revert "Extend CSR constructor to support batched indices and values"
This reverts commit eead599039.

Reverted https://github.com/pytorch/pytorch/pull/74542 on behalf of https://github.com/b0noI
2022-04-05 21:39:34 +00:00
Christian Puhrsch
f2a4d49174 torch.mm(dense, sparse_csr)
Fixes #68621
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73686
Approved by: https://github.com/IvanYashchuk, https://github.com/malfet
2022-04-05 17:05:37 +00:00
Ivan Yashchuk
eead599039 Extend CSR constructor to support batched indices and values
This is the first portion of changes required to enable Batched CSR format described in https://github.com/pytorch/pytorch/issues/60854#batched-CSR-computation.

Currently, only the same batch shape for indices and values is allowed. In the future, we could enable "broadcasting" of indices and batched values, as done in xFormers (dd96b8d8be/xformers/components/attention/_sputnik_sparse.py (L441)).

This PR adds the possibility to construct a batched CSR matrix with `torch.sparse_csr_tensor`; this batched CSR matrix can be converted to a dense tensor with a `.to_dense()` call.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74542
Approved by: https://github.com/cpuhrsch
2022-04-04 22:09:44 +00:00
PyTorch MergeBot
f6b9a1d4fb Revert "Support masked sum on CSR tensors [CPU, CUDA]"
This reverts commit cda3f586d0.

Reverted https://github.com/pytorch/pytorch/pull/72633 on behalf of https://github.com/janeyx99
2022-04-04 22:06:19 +00:00
Pearu Peterson
cda3f586d0 Support masked sum on CSR tensors [CPU, CUDA]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72633

Approved by: https://github.com/cpuhrsch
2022-04-04 19:23:45 +00:00
Nikita Shulga
bfac65dfe5 [testing] Update dispatch macros (#74977)
This PR is a reland of #74289.
Co-authored-by: Khushi Agrawal <khushiagrawal411@gmail.com>
2022-03-30 14:13:21 -07:00
PyTorch MergeBot
cc23725e89 Revert "Extend CSR constructor to support batched indices and values"
This reverts commit c074a53002.

Reverted https://github.com/pytorch/pytorch/pull/74542 on behalf of https://github.com/malfet
2022-03-30 19:54:26 +00:00
PyTorch MergeBot
2e4152b118 Revert "[testing] Update dispatch macros"
This reverts commit eed19a0f38.

Reverted https://github.com/pytorch/pytorch/pull/74289 on behalf of https://github.com/malfet
2022-03-30 19:52:37 +00:00
Khushi Agrawal
eed19a0f38 [testing] Update dispatch macros
Hi,
This PR is a follow-up to #71561 (the previous PR had a couple of merge conflicts and was reverted; this PR resolves that).
Please take a look. Thanks!

cc: @pmeier @mruberry @kshitij12345
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74289
Approved by: https://github.com/pmeier, https://github.com/mruberry
2022-03-30 16:10:16 +00:00
Ivan Yashchuk
c074a53002 Extend CSR constructor to support batched indices and values
This is the first portion of changes required to enable Batched CSR format described in https://github.com/pytorch/pytorch/issues/60854#batched-CSR-computation.

Currently, only the same batch shape for indices and values is allowed. In the future, we could enable "broadcasting" of indices and batched values, as done in xFormers (dd96b8d8be/xformers/components/attention/_sputnik_sparse.py (L441)).

This PR adds the possibility to construct a batched CSR matrix with `torch.sparse_csr_tensor`; this batched CSR matrix can be converted to a dense tensor with a `.to_dense()` call.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74542
Approved by: https://github.com/cpuhrsch
2022-03-29 21:20:25 +00:00
Christian Puhrsch
568e02dcd7 Support sum(sparse_csr)
Basic support for summation of CSR. ~~Generalizes structured torch.sum to also support CSR.~~

Follow up work:
- Autograd support
- OpInfo integration
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74766
Approved by: https://github.com/ezyang
2022-03-29 18:44:02 +00:00
Christian Puhrsch
edf2deb81e Add private conversion function from CSR to block CSR
This PR adds a private function that converts a CSR Tensor into a [scipy-style block CSR Tensor](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.bsr_matrix.html#scipy.sparse.bsr_matrix).

It uses the scipy CSR to BSR conversion routines (and credits them accordingly).

The main purpose of this function is to easily create a block CSR Tensor for matrix multiplication.

Follow-up work includes:
- Blocksize support for sparse_csr_tensor
- Parallel CPU kernel
- CUDA kernels
- Faster arg sanitization
- Benchmarking of cuSPARSE backend
- Dense to/from block CSR
- Autograd support
- Column-major blocks
- Block CSR to CSR conversion
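
The commit reuses SciPy's CSR-to-BSR routines. Independently of those, here is a simplified pure-Python sketch of what such a conversion computes: stored entries are grouped by block row and block column, each touched block is materialized as a small dense tile, and block columns are kept sorted within each block row. The function name is hypothetical, and the sketch assumes the shape is divisible by the blocksize; it is not SciPy's or PyTorch's code.

```python
from typing import Dict, List, Sequence, Tuple

Block = List[List[float]]


def csr_to_bsr(
    crow_indices: Sequence[int],
    col_indices: Sequence[int],
    values: Sequence[float],
    shape: Tuple[int, int],
    blocksize: Tuple[int, int],
) -> Tuple[List[int], List[int], List[Block]]:
    """Group CSR entries into (br x bc) blocks; a block exists if it holds any stored entry."""
    n_rows, n_cols = shape
    br, bc = blocksize
    if n_rows % br or n_cols % bc:
        raise ValueError("matrix shape must be divisible by the blocksize")
    bsr_crow, bsr_col, bsr_values = [0], [], []
    for block_row in range(n_rows // br):
        blocks: Dict[int, Block] = {}
        for local_row in range(br):
            row = block_row * br + local_row
            for k in range(crow_indices[row], crow_indices[row + 1]):
                block_col, local_col = divmod(col_indices[k], bc)
                if block_col not in blocks:
                    blocks[block_col] = [[0.0] * bc for _ in range(br)]
                blocks[block_col][local_row][local_col] = values[k]
        for block_col in sorted(blocks):  # sorted block columns within each block row
            bsr_col.append(block_col)
            bsr_values.append(blocks[block_col])
        bsr_crow.append(len(bsr_col))
    return bsr_crow, bsr_col, bsr_values


# [[1, 2, 0, 0],
#  [3, 4, 0, 0],
#  [0, 0, 0, 0],
#  [0, 0, 5, 0]] in CSR
crow, col, vals = csr_to_bsr(
    [0, 2, 4, 4, 5], [0, 1, 0, 1, 2], [1.0, 2.0, 3.0, 4.0, 5.0], (4, 4), (2, 2)
)
print(crow)  # [0, 1, 2]
print(col)   # [0, 1]
print(vals)  # [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [5.0, 0.0]]]
```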
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71582
Approved by: https://github.com/IvanYashchuk, https://github.com/albanD
2022-03-25 21:22:15 +00:00
Christian Puhrsch
7fe0b6a5cd mul(sparse_csr, sparse_csr) using mul(sparse, sparse)
Basic fallback implementation. Let's make this faster once used.

NOTE: This is stacked on top of https://github.com/pytorch/pytorch/pull/74294
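
This commit routes CSR * CSR through the existing `mul(sparse, sparse)` path. As an alternative illustration only (not what the commit does), the elementwise product can also be computed directly on CSR with a row-wise two-pointer merge, because only columns stored in both operands can be nonzero. The sketch assumes both operands have the same shape and sorted, duplicate-free column indices within each row; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

CSRArrays = Tuple[Sequence[int], Sequence[int], Sequence[float]]


def mul_csr_csr(a: CSRArrays, b: CSRArrays) -> Tuple[List[int], List[int], List[float]]:
    """Elementwise product of two same-shape CSR matrices."""
    crow_a, col_a, val_a = a
    crow_b, col_b, val_b = b
    out_crow, out_col, out_val = [0], [], []
    for row in range(len(crow_a) - 1):
        i, i_end = crow_a[row], crow_a[row + 1]
        j, j_end = crow_b[row], crow_b[row + 1]
        # Two-pointer merge: only columns present in both rows survive.
        while i < i_end and j < j_end:
            if col_a[i] == col_b[j]:
                out_col.append(col_a[i])
                out_val.append(val_a[i] * val_b[j])
                i += 1
                j += 1
            elif col_a[i] < col_b[j]:
                i += 1
            else:
                j += 1
        out_crow.append(len(out_col))
    return out_crow, out_col, out_val


# A = [[1, 0, 2], [0, 3, 0]], B = [[4, 5, 0], [0, 6, 7]]
a = ([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0])
b = ([0, 2, 4], [0, 1, 1, 2], [4.0, 5.0, 6.0, 7.0])
print(mul_csr_csr(a, b))  # ([0, 1, 2], [0, 1], [4.0, 18.0])
```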
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74266
Approved by: https://github.com/pearu, https://github.com/malfet
2022-03-25 17:10:33 +00:00
Christian Puhrsch
807b2e190b Move to_sparse_csr to C++
Allows use of to_sparse_csr from C++
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74294
Approved by: https://github.com/ngimel, https://github.com/malfet
2022-03-23 17:17:45 +00:00
Christian Puhrsch
a346a18150 Use assertEqual consistently in test_sparse_csr.py
Let's use the provided comparison infrastructure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74264
Approved by: https://github.com/IvanYashchuk, https://github.com/albanD
2022-03-16 15:19:41 +00:00
Nikita Shulga
ef066f0832 Revert D34856571: [pytorch][PR] Replace get_all_ type macros with the ATen dispatch macros.
Test Plan: revert-hammer

Differential Revision:
D34856571 (3ded7b1da3)

Original commit changeset: 0dca038bcad5

Original Phabricator Diff: D34856571 (3ded7b1da3)

fbshipit-source-id: 594553fa0b710d78beba59d5d2b646f1f1270386
(cherry picked from commit 8090eb9b12dcf452a9e7dc01792a66fb91b563b6)
2022-03-15 22:07:11 +00:00
Khushi Agrawal
3ded7b1da3 Replace get_all_ type macros with the ATen dispatch macros. (#71561)
Summary:
Hi, Team!
The PR is motivated by https://github.com/pytorch/pytorch/pull/71153#discussion_r782446738. It aims to replace the `get_all` type macros with the ATen dispatch macros.

The files it iterates over are: (Thanks, Lezcano, for the idea!!)

<details>
<summary>

`test/test_autograd.py`</summary>

<p>

```python
43:from torch.testing._internal.common_dtype import get_all_dtypes
8506:        floating_dt = [dt for dt in get_all_dtypes() if dt.is_floating_point]
```

</p>
</details>

<details>
<summary>

`test/test_binary_ufuncs.py`</summary>

<p>

```python
26:    all_types_and_complex_and, integral_types_and, get_all_dtypes, get_all_int_dtypes, get_all_math_dtypes,
27:    get_all_complex_dtypes, get_all_fp_dtypes,
935:    dtypes(*get_all_dtypes(include_bool=False, include_complex=False))
1035:    dtypes(*get_all_dtypes(
1488:    dtypes(*(get_all_dtypes(include_bool=False, include_bfloat16=False)))
1879:    dtypes(*product(get_all_dtypes(include_complex=False), get_all_dtypes(include_complex=False)))
1887:    dtypes(*(get_all_int_dtypes() + [torch.bool]))
1913:    dtypes(*(get_all_fp_dtypes()))
1941:    dtypes(*(get_all_fp_dtypes()))
1977:    dtypes(*product(get_all_complex_dtypes(), get_all_dtypes()))
2019:    dtypes(*product(get_all_fp_dtypes(), get_all_fp_dtypes()))
2048:    dtypes(*get_all_dtypes())
2110:    dtypes(*product(get_all_dtypes(include_complex=False),
2111:                     get_all_dtypes(include_complex=False)))
2128:            types = [torch.bool, torch.bfloat16] + get_all_int_dtypes()
2173:        if dtypes[1] in get_all_fp_dtypes():
2178:    dtypes(*product(get_all_fp_dtypes(),
2179:                     get_all_fp_dtypes()))
2260:    dtypesIfCUDA(*set(get_all_math_dtypes('cuda')) - {torch.complex64, torch.complex128})
2261:    dtypes(*set(get_all_math_dtypes('cpu')) - {torch.complex64, torch.complex128})
2273:    dtypesIfCUDA(*set(get_all_math_dtypes('cuda')) - {torch.complex64, torch.complex128})
2274:    dtypes(*set(get_all_math_dtypes('cpu')) - {torch.complex64, torch.complex128})
2307:    dtypes(*get_all_math_dtypes('cpu'))
2319:    dtypes(*get_all_fp_dtypes(include_bfloat16=False))
2331:    dtypes(*get_all_int_dtypes())
2356:    dtypes(*get_all_dtypes(include_bfloat16=False, include_bool=False, include_complex=False))
2393:        if dtype in get_all_int_dtypes():
2614:    dtypes(*get_all_dtypes())
2624:    dtypes(*tuple(itertools.combinations_with_replacement(get_all_dtypes(), 2)))
2806:    dtypes(*list(product(get_all_dtypes(include_complex=False),
2807:                          get_all_dtypes(include_complex=False))))
2866:    dtypes(*list(product(get_all_complex_dtypes(),
2867:                          get_all_complex_dtypes())))
2902:    dtypes(*product(get_all_dtypes(), get_all_dtypes()))
2906:    dtypes(*product(get_all_dtypes(), get_all_dtypes()))
2910:    dtypes(*product(get_all_dtypes(), get_all_dtypes()))
3019:        dtypes = [torch.float, torch.double] + get_all_complex_dtypes()
3221:    dtypes(*get_all_dtypes(include_complex=False))
3407:    dtypes(*list(product(get_all_dtypes(include_bool=False),
3408:                          get_all_dtypes(include_bool=False))))
3504:    dtypes(*product(get_all_dtypes(include_complex=False, include_bfloat16=False),
3505:                     get_all_dtypes(include_complex=False, include_bfloat16=False)))
3516:            if x.dtype in get_all_int_dtypes() + [torch.bool]:
3643:    dtypes(*product(get_all_dtypes(include_complex=False,
3645:                     get_all_dtypes(include_complex=False,
```

</p>
</details>

<details>
<summary>

`test/test_complex.py`</summary>

<p>

```python
6:from torch.testing._internal.common_dtype import get_all_complex_dtypes
11:    dtypes(*get_all_complex_dtypes())
```

</p>
</details>

<details>
<summary>

`test/test_foreach.py`</summary>

<p>

```python
18:    get_all_dtypes, get_all_int_dtypes, get_all_complex_dtypes, get_all_fp_dtypes,
142:            if dtype in get_all_int_dtypes():
179:            disable_fastpath = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool]
201:            disable_fastpath = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool]
205:                disable_fastpath |= dtype in get_all_int_dtypes() + [torch.bool]
211:                disable_fastpath |= dtype not in get_all_complex_dtypes()
241:                bool_int_div = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool]
246:                    disable_fastpath |= dtype in get_all_int_dtypes() + [torch.bool]
248:                    disable_fastpath |= dtype not in get_all_complex_dtypes()
250:                    disable_fastpath |= True and dtype not in get_all_complex_dtypes()
307:        disable_fastpath = dtype in get_all_int_dtypes() + [torch.bool]
365:        if opinfo.name == "_foreach_abs" and dtype in get_all_complex_dtypes():
376:    ops(foreach_unary_op_db, dtypes=get_all_dtypes())
393:         dtypes=get_all_dtypes(include_half=True, include_bfloat16=True, include_complex=False))
401:    ops(foreach_minmax_op_db, dtypes=get_all_fp_dtypes(include_bfloat16=True, include_half=True))
426:            if ord in (1, 2) and dtype in torch.testing.get_all_fp_dtypes():
439:    dtypes(*get_all_dtypes())
449:    ops(foreach_binary_op_db, dtypes=get_all_dtypes())
481:    ops(foreach_binary_op_db, dtypes=get_all_dtypes())
536:            if dtype in get_all_int_dtypes() + [torch.bool] and foreach_op == torch._foreach_div:
545:    ops(foreach_binary_op_db, dtypes=get_all_dtypes())
637:    ops(foreach_pointwise_op_db, allowed_dtypes=get_all_fp_dtypes(include_half=False, include_bfloat16=False))
```

</p>
</details>

<details>
<summary>

`test/test_linalg.py`</summary>

<p>

```python
29:    all_types, floating_types, floating_and_complex_types, get_all_dtypes, get_all_int_dtypes, get_all_complex_dtypes,
30:    get_all_fp_dtypes,
111:    dtypes(*(get_all_dtypes()))
794:        float_and_complex_dtypes = get_all_fp_dtypes() + get_all_complex_dtypes()
807:    dtypes(*(get_all_int_dtypes()))
828:    dtypes(*(get_all_fp_dtypes() + get_all_complex_dtypes()))
841:        if dtype in get_all_complex_dtypes():
844:    dtypes(*itertools.product(get_all_dtypes(),
845:                               get_all_dtypes()))
855:        for dtypes0, dtypes1, dtypes2 in product(get_all_dtypes(), repeat=3):
5607:                  *get_all_fp_dtypes(include_half=not CUDA9, include_bfloat16=(CUDA11OrLater and SM53OrLater)))
5608:    dtypes(*(set(get_all_dtypes()) - {torch.half, torch.bool}))
5644:    dtypes(*(get_all_complex_dtypes() + get_all_fp_dtypes()))
6255:    dtypesIfCUDA(*get_all_complex_dtypes(),
6256:                  *get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater)),
6292:    dtypesIfCUDA(*get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater))))
6323:    dtypesIfCUDA(*get_all_complex_dtypes(),
6324:                  *get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater))))
6325:    dtypes(*get_all_complex_dtypes(), *get_all_fp_dtypes())
6358:    dtypesIfCUDA(*([torch.float, torch.double] + get_all_complex_dtypes()))
6556:    dtypes(*get_all_fp_dtypes(), *get_all_complex_dtypes())
6668:    dtypes(*get_all_fp_dtypes(), *get_all_complex_dtypes())
6741:    dtypes(*get_all_fp_dtypes(), *get_all_complex_dtypes())
```

</p>
</details>

<details>
<summary>

`test/test_nn.py`</summary>

<p>

```python
37:from torch.testing._internal.common_dtype import integral_types, get_all_fp_dtypes, get_all_math_dtypes
50:    onlyNativeDeviceTypes, deviceCountAtLeast, largeTensorTest, expectedFailureMeta, skipMeta, get_all_device_types, \
8862:                for device in get_all_device_types():
9629:            for dt1 in get_all_math_dtypes(device):
9630:                for dt2 in get_all_math_dtypes(device):
9631:                    for dt3 in get_all_math_dtypes(device):
9648:            for input_dtype in get_all_math_dtypes(device):
9664:            for input_dtype in get_all_math_dtypes(device):
13015:    dtypes(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM))
13034:    dtypes(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM))
13159:    dtypes(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM))
17400:    dtypesIfCUDA(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM))
17768:    dtypesIfCUDA(*get_all_fp_dtypes())
17773:    dtypesIfCUDA(*get_all_fp_dtypes())
17778:    dtypesIfCUDA(*get_all_fp_dtypes())
17783:    dtypesIfCUDA(*get_all_fp_dtypes())
17788:    dtypesIfCUDA(*get_all_fp_dtypes())
17793:    dtypesIfCUDA(*get_all_fp_dtypes())
17798:    dtypesIfCUDA(*get_all_fp_dtypes())
17963:    dtypesIfCUDA(*get_all_fp_dtypes())
17977:    dtypesIfCUDA(*get_all_fp_dtypes())
18684:    def test_cross_entropy_loss_prob_target_all_reductions(self, device):
```

</p>
</details>

<details>
<summary>

`test/test_numpy_interop.py`</summary>

<p>

```python
12:from torch.testing._internal.common_dtype import get_all_dtypes
399:    dtypes(*get_all_dtypes())
```

</p>
</details>

<details>
<summary>

`test/test_ops.py`</summary>

<p>

```python
12:from torch.testing._internal.common_dtype import floating_and_complex_types_and, get_all_dtypes
86:        for dtype in get_all_dtypes():
```

</p>
</details>

<details>
<summary>

`test/test_reductions.py`</summary>

<p>

```python
16:    get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_complex_dtypes, get_all_fp_dtypes,
360:         allowed_dtypes=get_all_dtypes(include_bfloat16=False))
366:         allowed_dtypes=get_all_dtypes(include_bfloat16=False))
394:         allowed_dtypes=get_all_dtypes(include_bfloat16=False))
750:        for dtype in [dtype for dtype in get_all_math_dtypes('cpu') if dtype != torch.float16]:
1404:    dtypes(*get_all_dtypes(include_bool=False, include_complex=False))
1457:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1458:              get_all_complex_dtypes()))
1465:            return dtype in get_all_int_dtypes()
1494:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False)))
1501:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False)))
1507:    dtypes(*(get_all_complex_dtypes()))
1514:        dtypes = list(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False))
1523:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False)))
1531:        if dtype in get_all_fp_dtypes():
1608:    dtypes(*(get_all_dtypes(include_half=True, include_bfloat16=False,
1837:    dtypes(*get_all_dtypes(include_bool=False, include_complex=False))
1855:    dtypes(*(set(get_all_dtypes(include_bool=False, include_complex=False)) - {torch.uint8}))
3219:        for dtype in get_all_dtypes(include_half=True, include_bfloat16=False,
```

</p>
</details>

<details>
<summary>

`test/test_serialization.py`</summary>

<p>

```python
26:from torch.testing._internal.common_dtype import get_all_dtypes
586:        for device, dtype in product(devices, get_all_dtypes()):
589:            for other_dtype in get_all_dtypes():
```

</p>
</details>

<details>
<summary>

`test/test_shape_ops.py`</summary>

<p>

```python
18:from torch.testing._internal.common_dtype import get_all_dtypes
230:    dtypes(*get_all_dtypes(include_complex=False, include_bool=False, include_half=False,
232:    dtypesIfCUDA(*get_all_dtypes(include_complex=False, include_bool=False, include_bfloat16=False))
344:    dtypes(*get_all_dtypes())
443:    dtypes(*get_all_dtypes())
461:    dtypes(*get_all_dtypes())
570:    dtypes(*get_all_dtypes(include_complex=False))
```

</p>
</details>

<details>
<summary>

`test/test_sort_and_select.py`</summary>

<p>

```python
12:    all_types, all_types_and, floating_types_and, get_all_dtypes, get_all_int_dtypes, get_all_fp_dtypes,
136:    dtypes(*set(get_all_dtypes()) - {torch.bool, torch.complex64, torch.complex128})
231:    dtypes(*set(get_all_dtypes()) - {torch.bool, torch.complex64, torch.complex128})
296:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
647:    dtypesIfCUDA(*get_all_fp_dtypes())
678:    dtypesIfCUDA(*(get_all_dtypes(include_complex=False,
682:    dtypes(*(get_all_dtypes(include_complex=False, include_bool=False, include_half=False, include_bfloat16=False)))
739:    dtypesIfCPU(*set(get_all_dtypes()) - {torch.complex64, torch.complex128})
740:    dtypes(*set(get_all_dtypes()) - {torch.bfloat16, torch.complex64, torch.complex128})
799:    dtypesIfCPU(*set(get_all_dtypes()) - {torch.complex64, torch.complex128})
800:    dtypes(*set(get_all_dtypes()) - {torch.bfloat16, torch.complex64, torch.complex128})
```

</p>
</details>

<details>
<summary>

`test/test_sparse.py`</summary>

<p>

```python
20:from torch.testing import get_all_complex_dtypes, get_all_fp_dtypes
29:    floating_and_complex_types, floating_and_complex_types_and, get_all_dtypes, get_all_int_dtypes,
1963:            return dtype in get_all_int_dtypes()
1994:    dtypes(*get_all_dtypes(include_bool=False, include_half=False,
2103:            return dtype in get_all_int_dtypes()
2138:    dtypes(*get_all_dtypes(include_bool=False, include_half=False,
2626:        all_sparse_dtypes = get_all_dtypes(include_complex=True)
2633:        all_sparse_dtypes = get_all_dtypes(include_complex=True)
3230:    dtypes(*get_all_complex_dtypes(),
3231:            *get_all_fp_dtypes(include_half=False, include_bfloat16=False))
3234:                  *get_all_fp_dtypes(
```

</p>
</details>

<details>
<summary>

`test/test_sparse_csr.py`</summary>

<p>

```python
7:from torch.testing import get_all_complex_dtypes, get_all_fp_dtypes, floating_and_complex_types, make_tensor
17:from torch.testing._internal.common_dtype import floating_types, get_all_dtypes
120:    dtypes(*get_all_dtypes())
133:    dtypes(*get_all_dtypes())
150:    dtypes(*get_all_dtypes())
180:    dtypes(*get_all_dtypes())
201:    dtypes(*get_all_dtypes())
210:    dtypes(*get_all_dtypes())
225:    dtypes(*get_all_dtypes())
244:    dtypes(*get_all_dtypes())
263:    dtypes(*get_all_dtypes())
285:    dtypes(*get_all_dtypes())
411:    dtypes(*get_all_dtypes())
482:    dtypes(*get_all_dtypes())
502:    dtypes(*get_all_dtypes())
562:    dtypes(*get_all_dtypes())
588:    dtypesIfCUDA(*get_all_complex_dtypes(),
589:                  *get_all_fp_dtypes(include_half=SM53OrLater, include_bfloat16=SM80OrLater))
745:    dtypesIfCUDA(*get_all_complex_dtypes(),
746:                  *get_all_fp_dtypes(include_half=SM53OrLater and TEST_CUSPARSE_GENERIC,
765:    dtypesIfCUDA(*get_all_complex_dtypes(),
766:                  *get_all_fp_dtypes(include_half=SM53OrLater and TEST_CUSPARSE_GENERIC,
801:                  *torch.testing.get_all_fp_dtypes(include_bfloat16=SM80OrLater,
841:                  *torch.testing.get_all_fp_dtypes(include_bfloat16=SM80OrLater,
1182:    dtypes(*get_all_dtypes())
1276:    dtypes(*get_all_dtypes(include_bool=False, include_half=False, include_bfloat16=False))
1286:    dtypes(*get_all_dtypes())
```

</p>
</details>

<details>
<summary>

`test/test_tensor_creation_ops.py`</summary>

<p>

```python
21:    onlyCUDA, skipCPUIf, dtypesIfCUDA, skipMeta, get_all_device_types)
23:    get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes
150:        for dt in get_all_dtypes():
160:        for dt in get_all_dtypes():
314:        dtypes = [dtype for dtype in get_all_dtypes() if dtype != torch.bfloat16]
1012:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1013:              get_all_complex_dtypes()))
1032:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1033:              get_all_complex_dtypes()))
1050:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1051:              get_all_complex_dtypes()))
1745:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
1779:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
1868:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
1926:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
1954:            do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, torch_device)
1956:            do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, None)
1957:            do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, torch_device)
2538:        for device in get_all_device_types():
2645:        for dtype in get_all_dtypes():
2678:    dtypes(*(get_all_fp_dtypes(include_half=False, include_bfloat16=False) +
2679:              get_all_complex_dtypes()))
2716:    dtypes(*get_all_fp_dtypes(include_half=False, include_bfloat16=False))
2827:            for dt in get_all_dtypes():
2913:    dtypes(*get_all_dtypes(include_bool=False, include_half=False))
2914:    dtypesIfCUDA(*get_all_dtypes(include_bool=False, include_half=True))
3028:    dtypes(*(get_all_fp_dtypes() + get_all_complex_dtypes()))
3033:    dtypes(*(get_all_fp_dtypes() + get_all_complex_dtypes()))
3074:    dtypes(*get_all_dtypes(include_bool=False, include_half=False, include_complex=False))
3075:    dtypesIfCUDA(*((get_all_int_dtypes() + [torch.float32, torch.float16, torch.bfloat16])
3077:                    else get_all_dtypes(include_bool=False, include_half=True, include_complex=False)))
3873:    dtypes(*get_all_dtypes())
3884:    dtypes(*get_all_dtypes(include_bool=False))
3916:            for other in get_all_dtypes():
3922:    dtypes(*get_all_dtypes())
3932:    dtypes(*get_all_dtypes(include_bool=False))
3955:    dtypes(*get_all_dtypes(include_bool=False))
3961:    dtypes(*get_all_dtypes(include_bool=False))
3965:    dtypes(*get_all_dtypes())
```

</p>
</details>

<details>
<summary>

`test/test_testing.py`</summary>

<p>

```python
25:from torch.testing._internal.common_dtype import get_all_dtypes
31:    dtypes(*(get_all_dtypes(include_half=True, include_bfloat16=False,
```

</p>
</details>

<details>
<summary>

`test/test_torch.py`</summary>

<p>

```python
51:    expectedAlertNondeterministic, get_all_device_types, skipXLA)
57:    get_all_fp_dtypes, get_all_int_dtypes, get_all_math_dtypes, get_all_dtypes, get_all_complex_dtypes
296:            for d in get_all_device_types():
323:            for device in get_all_device_types():
324:                for dt1 in get_all_dtypes():
325:                    for dt2 in get_all_dtypes():
343:            all_dtypes = get_all_dtypes()
350:            all_dtypes = get_all_dtypes()
781:            for dtype in get_all_dtypes():
986:            for device in get_all_device_types():
1017:            for device in get_all_device_types():
1018:                for dtype in get_all_math_dtypes(device):
2792:            for device in get_all_device_types():
3186:    dtypes(*get_all_dtypes())
3195:        for error_dtype in get_all_dtypes():
3203:    dtypes(*get_all_dtypes())
3212:        for error_dtype in get_all_dtypes():
4539:    dtypes(*get_all_fp_dtypes())
4545:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
4577:    dtypes(*get_all_fp_dtypes(include_half=False, include_bfloat16=False))
4578:    dtypesIfCPU(*(get_all_fp_dtypes(include_half=False, include_bfloat16=True)))
4579:    dtypesIfCUDA(*(get_all_fp_dtypes(include_bfloat16=False)))
4599:    dtypes(*(get_all_fp_dtypes(include_half=False, include_bfloat16=False)))
4600:    dtypesIfCPU(*(get_all_dtypes(include_half=False, include_bfloat16=False, include_complex=False)))
4601:    dtypesIfCUDA(*(get_all_dtypes(include_bfloat16=False, include_complex=False)))
4613:        for p_dtype in get_all_fp_dtypes(include_half=device.startswith('cuda'), include_bfloat16=False):
4628:    dtypes(*(get_all_fp_dtypes(include_half=False, include_bfloat16=False)))
4629:    dtypesIfCUDA(*(get_all_fp_dtypes(include_bfloat16=False)))
4640:    dtypes(*get_all_fp_dtypes())
4723:    dtypes(*get_all_fp_dtypes())
4735:    dtypes(*get_all_fp_dtypes(include_bfloat16=False))
4736:    dtypesIfCUDA(*get_all_fp_dtypes())
4747:    dtypes(*get_all_fp_dtypes())
4761:    dtypes(*get_all_fp_dtypes())
4771:    dtypes(*get_all_fp_dtypes())
4792:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
5302:    dtypes(*get_all_dtypes(include_bfloat16=False))
5322:    dtypes(*get_all_dtypes(include_half=False, include_bfloat16=False))
5323:    dtypesIfCPU(*get_all_dtypes(include_bfloat16=False))
5324:    dtypesIfCUDA(*get_all_dtypes(include_bfloat16=False))
5591:        for dt in get_all_dtypes():
5611:        for dt in get_all_dtypes():
5678:        for dt in get_all_dtypes():
5696:    dtypesIfCUDA(*set(get_all_math_dtypes('cuda')))
5697:    dtypes(*set(get_all_math_dtypes('cpu')))
5746:    dtypes(*get_all_dtypes())
5780:    dtypes(*get_all_dtypes())
5885:    dtypes(*get_all_dtypes())
5902:    dtypes(*get_all_dtypes())
5945:    dtypes(*get_all_dtypes())
5979:    dtypes(*get_all_dtypes(include_bool=False))
6049:    dtypes(*get_all_dtypes(include_bool=False))
6092:    dtypes(*(get_all_fp_dtypes(include_bfloat16=False, include_half=False) +
6093:              get_all_complex_dtypes()))
6094:    dtypesIfCPU(*get_all_dtypes())
6095:    dtypesIfCUDA(*get_all_dtypes())
6122:    dtypes(*(get_all_fp_dtypes(include_bfloat16=False, include_half=False) +
6123:              get_all_complex_dtypes()))
6124:    dtypesIfCPU(*get_all_dtypes())
6125:    dtypesIfCUDA(*get_all_dtypes())
6163:    dtypes(*(get_all_fp_dtypes(include_bfloat16=False, include_half=False) +
6164:              get_all_complex_dtypes()))
6165:    dtypesIfCPU(*get_all_dtypes())
6166:    dtypesIfCUDA(*get_all_dtypes())
6190:    dtypes(*(get_all_complex_dtypes() +
6191:              get_all_int_dtypes()))
6238:    dtypes(*get_all_dtypes())
6323:    dtypes(*get_all_dtypes())
6389:    dtypes(*product(get_all_dtypes(), (torch.uint8, torch.bool)))
6699:    dtypesIfCUDA(*set(get_all_math_dtypes('cuda')))
6700:    dtypes(*set(get_all_math_dtypes('cpu')))
7452:    dtypes(*get_all_dtypes(include_bool=False))
7461:    dtypes(*get_all_dtypes(include_bool=False))
7477:    dtypes(*get_all_dtypes(include_bool=False))
7496:    dtypes(*get_all_dtypes(include_bool=False))
7538:    dtypes(*get_all_dtypes(include_bool=False))
8162:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes() +
8163:              get_all_complex_dtypes()))
8175:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes() +
8176:              get_all_complex_dtypes()))
```

</p>
</details>

<details>
<summary>

`test/test_type_promotion.py`</summary>

<p>

```python
14:    get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_fp_dtypes
187:        for dtype in get_all_dtypes():
262:        dtypes1 = get_all_math_dtypes('cuda')
263:        dtypes2 = get_all_math_dtypes(device)
339:    dtypes(*itertools.product(get_all_dtypes(), get_all_dtypes()))
468:            for dt1 in get_all_math_dtypes(device):
469:                for dt2 in get_all_math_dtypes(device):
519:            for dt1 in get_all_math_dtypes(device):
520:                for dt2 in get_all_math_dtypes(device):
528:        for dt in get_all_math_dtypes(device):
561:        for dtype in get_all_dtypes():
766:                                          dtypes=get_all_math_dtypes(device))
771:                                          dtypes=get_all_math_dtypes(device))
782:                                          dtypes=get_all_math_dtypes(device))
879:        dtypes = get_all_dtypes(include_bfloat16=False)
898:        dtypes = get_all_dtypes(include_bfloat16=False, include_bool=False)
965:    dtypesIfCUDA(*itertools.product(get_all_dtypes(include_bfloat16=False, include_complex=False),
966:                                     get_all_dtypes(include_bfloat16=False, include_complex=False)))
967:    dtypes(*itertools.product(get_all_dtypes(include_half=False, include_bfloat16=False,
969:                               get_all_dtypes(include_half=False, include_bfloat16=False,
976:            return dtype in get_all_int_dtypes() + [torch.bool]
979:            return dtype in get_all_fp_dtypes(include_half=True, include_bfloat16=False)
```

</p>
</details>

<details>
<summary>

`test/test_unary_ufuncs.py`</summary>

<p>

```python
24:    floating_types_and, all_types_and_complex_and, floating_and_complex_types_and, get_all_dtypes, get_all_math_dtypes,
25:    get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes
517:    dtypes(*(get_all_int_dtypes() + [torch.bool] +
518:              get_all_fp_dtypes(include_bfloat16=False)))
596:    dtypes(*get_all_fp_dtypes(include_half=True, include_bfloat16=False))
611:        invalid_input_dtypes = get_all_int_dtypes() + \
612:            get_all_complex_dtypes() + \
619:        for dtype in get_all_fp_dtypes(include_half=True, include_bfloat16=False):
1048:    dtypes(*get_all_math_dtypes('cpu'))
1182:    dtypesIfCUDA(*get_all_fp_dtypes())
1190:    dtypesIfCUDA(*get_all_fp_dtypes())
1205:    dtypesIfCUDA(*get_all_fp_dtypes())
1215:    dtypesIfCUDA(*get_all_fp_dtypes())
1307:    dtypes(*(get_all_dtypes(include_bool=False)))
1349:    dtypes(*(get_all_fp_dtypes(include_half=False) +
1350:              get_all_complex_dtypes()))
1351:    dtypesIfCUDA(*(get_all_fp_dtypes(include_half=True) +
1352:                    get_all_complex_dtypes()))
```

</p>
</details>

<details>
<summary>

`test/test_view_ops.py`</summary>

<p>

```python
19:    get_all_dtypes, get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes
124:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
131:    dtypes(*get_all_dtypes(include_bfloat16=False))
213:            for view_dtype in [*get_all_fp_dtypes(), *get_all_complex_dtypes()]:
220:    dtypes(*get_all_dtypes())
224:        for view_dtype in get_all_dtypes():
305:    dtypes(*get_all_complex_dtypes(include_complex32=True))
343:    dtypes(*get_all_dtypes())
354:    dtypes(*get_all_dtypes())
364:    dtypes(*get_all_dtypes())
374:    dtypes(*get_all_dtypes())
384:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
395:    dtypes(*get_all_complex_dtypes())
426:    dtypes(*get_all_complex_dtypes())
451:    dtypes(*product(get_all_complex_dtypes(), get_all_dtypes()))
1263:    dtypes(*(torch.testing.get_all_dtypes()))
1279:    dtypes(*(torch.testing.get_all_dtypes()))
1405:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1406:              get_all_complex_dtypes()))
1471:    dtypes(*get_all_dtypes(include_bfloat16=False))
1574:    dtypes(*get_all_dtypes())
1601:    dtypes(*get_all_dtypes(include_bfloat16=False))
1632:    dtypes(*get_all_dtypes(include_bfloat16=False))
1711:        for dt in get_all_dtypes():
1717:        for dt in get_all_dtypes():
1724:        for dt in get_all_dtypes():
```

</p>
</details>

I'm looking forward to your viewpoints. Thanks :)

cc: mruberry kshitij12345 anjali411

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71561

Reviewed By: samdow

Differential Revision: D34856571

Pulled By: mruberry

fbshipit-source-id: 0dca038bcad5cf69906245c496d2e61ac3876335
(cherry picked from commit b058f67b4313143efa714ab105f36e74083131b9)
2022-03-15 20:31:41 +00:00
Pearu Peterson
4168c87ed3 Support CSR to COO conversion in to_sparse(2). (#73642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73642

Re-land of https://github.com/pytorch/pytorch/pull/73471, which was reverted because it lacked support for `to_sparse(sparse_dim)`.
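A minimal sketch of the conversion, assuming a PyTorch build that contains this commit. Calling `to_sparse()` or `to_sparse(2)` on a CSR tensor returns a COO tensor that holds the same values. The matrix is an arbitrary example.

```python
import torch

# A small 2 x 3 matrix with three nonzeros (illustrative values).
dense = torch.tensor([[0., 1., 0.],
                      [2., 0., 3.]])
csr = dense.to_sparse_csr()

coo = csr.to_sparse()     # CSR -> COO with the default sparse_dim
coo2 = csr.to_sparse(2)   # CSR -> COO with sparse_dim passed explicitly

# Both results are COO tensors holding the same values as the original matrix.
print(coo.layout, coo2.layout)
print(coo.coalesce().indices())
```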

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D34580353

Pulled By: cpuhrsch

fbshipit-source-id: a8a4ea381daeb80d8365fe931af9f55a7e789ea1
(cherry picked from commit 5a3cf8110980e5a10dbb687e87e67d5524ebf2f5)
2022-03-02 22:33:32 +00:00
Nikita Shulga
8ac7393565 Revert D33767740: [pytorch][PR] Sparse CSR CPU: cuSolverSP backend for linalg.solve
Test Plan: revert-hammer

Differential Revision:
D33767740 (199d9a992c)

Original commit changeset: a945f065210c

Original Phabricator Diff: D33767740 (199d9a992c)

fbshipit-source-id: b7934df18118f8d6d5f165deb5aae9887953ae43
(cherry picked from commit d3ddbb021b227e3638f6f7c22c6eadfa73695e31)
2022-03-01 18:33:23 +00:00
Rohan Varma
95204c4e2b Revert D34503882: Support CSR to COO conversion in to_sparse.
Test Plan: revert-hammer

Differential Revision:
D34503882 (84f4e9c10a)

Original commit changeset: 4a781647a0ae

Original Phabricator Diff: D34503882 (84f4e9c10a)

fbshipit-source-id: cf161171a3b51aa3c0f2b15501956873b1ba29dd
(cherry picked from commit 924c19071713777700087087b27b388eb057d8d9)
2022-03-01 15:33:37 +00:00
Pearu Peterson
84f4e9c10a Support CSR to COO conversion in to_sparse. (#73471)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73471

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D34503882

Pulled By: cpuhrsch

fbshipit-source-id: 4a781647a0ae5d03827406b75b14acc7c48da0b0
(cherry picked from commit fa3dbdc6a8529d19f8a055494436ca1f766807be)
2022-03-01 06:31:52 +00:00
Kushashwa Ravi Shrimali
199d9a992c Sparse CSR CPU: cuSolverSP backend for linalg.solve (#71399)
Summary:
This PR introduces the `cuSolverSP` backend for `linalg.solve` with sparse CSR input matrices. The motivation comes from the issue: https://github.com/pytorch/pytorch/issues/69538.

`cuSolver` provides the [`cusolverSp<t>csrlsvluHost`](https://docs.nvidia.com/cuda/cusolver/index.html#cusolver-lt-t-gt-csrlsvlu) API. A few things to note:

1. As the documentation states, `only CPU (Host) path is provided.` Profiling shows no GPU kernel launches for this call (see the profiling output below).
2. Since only the `host` path is provided, the CPU path uses `csrlsvluHost`, which still requires PyTorch to be built/installed with CUDA support.
3. The documentation says that reordering can improve performance, but it does not make clear how much. Several reordering options exist; we keep `reorder = 0` as the default.

`cuSolver` also provides the [`csrlsvqr`](https://docs.nvidia.com/cuda/cusolver/index.html#cusolver-lt-t-gt-csrlsvqr) function, which has a `device` path for solving the linear system. This PR uses it for the CUDA path.

**Gist:**

- CPU path: we call the [`csrlsvluHost` function of cuSolver](https://docs.nvidia.com/cuda/cusolver/index.html#cusolver-lt-t-gt-csrlsvlu).
- CUDA path: we call the [`csrlsvqr` function of cuSolver](https://docs.nvidia.com/cuda/cusolver/index.html#cusolver-lt-t-gt-csrlsvqr).

**Profiling** of the `csrlsvlu` function on a 1000 x 1000 sparse input and a right-hand-side vector of length 1000, showing that no GPU kernels are launched:

```
==3999651== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:  100.00%  2.1440us         1  2.1440us  2.1440us  2.1440us  [CUDA memcpy HtoD]
      API calls:   99.72%  1.07199s         9  119.11ms     500ns  1.07164s  cudaFree
                    0.11%  1.2182ms       398  3.0600us     140ns  137.94us  cuDeviceGetAttribute
                    0.06%  674.45us         4  168.61us  165.50us  173.64us  cuDeviceTotalMem
                    0.03%  357.07us         4  89.268us  2.7800us  201.89us  cudaMalloc
                    0.03%  309.29us         1  309.29us  309.29us  309.29us  cudaGetDeviceProperties
                    0.01%  160.47us       332     483ns     350ns  3.3300us  cudaFuncSetAttribute
                    0.01%  115.12us         4  28.780us  26.290us  33.410us  cuDeviceGetName
                    0.00%  28.591us         5  5.7180us     440ns  16.921us  cudaGetDevice
                    0.00%  22.061us         4  5.5150us     871ns  18.690us  cudaDeviceSynchronize
                    0.00%  20.370us        18  1.1310us     410ns  6.9900us  cudaEventDestroy
                    0.00%  16.390us         1  16.390us  16.390us  16.390us  cudaMemcpy
                    0.00%  11.540us         2  5.7700us  1.4900us  10.050us  cuDeviceGetPCIBusId
                    0.00%  10.510us        18     583ns     430ns  1.6200us  cudaEventCreateWithFlags
                    0.00%  7.9100us        21     376ns     290ns     700ns  cudaDeviceGetAttribute
                    0.00%  1.4300us         6     238ns     150ns     590ns  cuDeviceGet
                    0.00%  1.2200us         4     305ns     190ns     500ns  cuDeviceGetCount
                    0.00%     900ns         1     900ns     900ns     900ns  cuInit
                    0.00%     860ns         4     215ns     180ns     260ns  cuDeviceGetUuid
                    0.00%     240ns         1     240ns     240ns     240ns  cuDriverGetVersion
                    0.00%     230ns         1     230ns     230ns     230ns  cudaGetDeviceCount
```

Script:

```python
import torch

def solve(x, other, out):
    torch.linalg.solve(x, other, out=out)

if __name__ == "__main__":
    dense_inp = torch.randn((1000, 1000), dtype=torch.float64)
    # Set 50% of the values to 0 randomly
    dense_inp = torch.nn.functional.dropout(dense_inp, p=0.5)
    sparse_inp = dense_inp.to_sparse_csr()

    other = torch.randint(100, (1000,), dtype=torch.float64)
    out = torch.randint(1, (1000,), dtype=torch.float64)

    solve(sparse_inp, other, out)
```

The following error is raised when the function is called on a CPU device with a PyTorch build that lacks CUDA support:

```python
/home/krshrimali/pytorch/torch/autograd/profiler.py:151: UserWarning: CUDA is not available, disabling CUDA profiling
  warn("CUDA is not available, disabling CUDA profiling")
Traceback (most recent call last):
  File "/home/krshrimali/pytorch/test_sp.py", line 17, in <module>
    solve(x, other, out)
  File "/home/krshrimali/pytorch/test_sp.py", line 5, in solve
    torch.linalg.solve(x, other, out=out)
RuntimeError: PyTorch was not built with CUDA support. Please use PyTorch built CUDA support
```

**Performance Comparison** (vs. SciPy's [`scipy.sparse.linalg.spsolve`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.spsolve.html)):

- Time taken by `scipy.sparse.linalg.spsolve`: 0.595 seconds
- On CPU, time taken by `torch.linalg.solve`: 4.565 seconds
- On CUDA, time taken by `torch.linalg.solve`: 1.838 seconds

The inputs have shapes (17281, 17281) and (17281, 1) and were taken from https://math.nist.gov/MatrixMarket/extreme.html.

Thanks to IvanYashchuk for helping me with the PR, and guiding me through it.

cc: IvanYashchuk pearu nikitaved cpuhrsch

cc nikitaved pearu cpuhrsch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71399

Reviewed By: VitalyFedyunin

Differential Revision: D33767740

Pulled By: cpuhrsch

fbshipit-source-id: a945f065210cd719096eb8d7cdbf8e8937c2fce9
(cherry picked from commit f4f35c17da414e1ca6c6d91402933521857aa1ea)
2022-03-01 05:32:35 +00:00
Ivan Yashchuk
0ba3498248 Sparse CSR CPU: implement addmm(dense, sparse, sparse) -> dense (#73076)
Summary:
This PR adds the ability to multiply two sparse matrices and add the product to a dense matrix.
It uses the [MKL spmmd function](https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/blas-and-sparse-blas-routines/inspector-executor-sparse-blas-routines/inspector-executor-sparse-blas-execution-routines/mkl-sparse-spmmd.html); only the CPU path is implemented for now.

Ref. https://github.com/pytorch/pytorch/issues/60858

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73076

Reviewed By: mikaylagawarecki

Differential Revision: D34342993

Pulled By: cpuhrsch

fbshipit-source-id: 5e5ea67cb92fbaa4d4c0eaf61e85019972989a21
(cherry picked from commit 62b8dc730e6a6736f5c03ac09eac5223cd9706cf)
2022-02-26 01:08:45 +00:00
Ivan Yashchuk
ebd93f69db Enable CSR inputs for torch.sparse.mm (#73075)
Summary:
Previously `torch.sparse.mm` supported only COO and dense inputs.

Derivatives with respect to the dense input are supported for sparse_csr x dense -> dense.

Modified the implementation of `torch.sparse.mm` to bind directly to the ATen function.
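A minimal sketch of the newly supported call, assuming a build that includes this change. A CSR left operand times a dense right operand gives a dense result, and `backward()` fills in the gradient of the dense operand only. The matrix values are illustrative.

```python
import torch

# CSR left operand (treated as a constant) and a dense right operand that requires grad.
a = torch.tensor([[0., 2., 0.],
                  [1., 0., 3.]], dtype=torch.float64)
a_csr = a.to_sparse_csr()
b = torch.randn(3, 2, dtype=torch.float64, requires_grad=True)

# sparse_csr x dense -> dense
out = torch.sparse.mm(a_csr, b)

# Gradients flow to the dense operand; d(sum(A @ B))/dB = A^T @ ones.
out.sum().backward()
print(out.layout, b.grad)
```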

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73075

Reviewed By: mikaylagawarecki

Differential Revision: D34342954

Pulled By: cpuhrsch

fbshipit-source-id: a6ed914a0ce28b35276109479109095f7149d32b
(cherry picked from commit 948de1816c46cd087bacbee36dc583cf409813f9)
2022-02-24 04:30:48 +00:00
Pearu Peterson
e785c0a1ab Enable Half/BFloat16 support for to_dense and coalesce methods. (#72397)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72397
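A small sketch of the two methods on a reduced-precision COO tensor, assuming a build that includes this change. Duplicate indices give `coalesce()` something to sum, and the values are small integers, so the bfloat16 result is exact. Only `torch.bfloat16` is shown; `torch.half` is assumed to behave the same way.

```python
import torch

# Two entries share position (0, 1); coalesce() must sum them.
# Small integers are exactly representable in bfloat16, so the check is exact.
indices = torch.tensor([[0, 0, 1],
                        [1, 1, 0]])
values = torch.tensor([1., 2., 4.], dtype=torch.bfloat16)
coo = torch.sparse_coo_tensor(indices, values, size=(2, 2))

coalesced = coo.coalesce()      # merges the duplicate (0, 1) entries
dense = coalesced.to_dense()    # result keeps the bfloat16 dtype
print(coalesced._nnz(), dense.dtype)
```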

Test Plan: Imported from OSS

Reviewed By: jbschlosser, zou3519

Differential Revision: D34286114

Pulled By: cpuhrsch

fbshipit-source-id: a4f7e2abc3b2d37437cbd09d693c1b409bb011b9
(cherry picked from commit 74f94447fc)
2022-02-17 02:54:23 +00:00
Ivan Yashchuk
fb7c4780f9 Add autograd tests for addmm, addmv, mm, mv and CSR matrix input (#71949)
Summary:
This PR adds autograd tests for the `addmm`, `addmv`, `mm`, and `mv` functions that check derivatives with respect to dense inputs.

Currently, neither the autograd engine nor gradcheck can handle functions that map CSR inputs to CSR outputs; I added xfailing tests for that case.
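As a sketch of what checking derivatives with respect to dense inputs looks like in practice, `torch.autograd.gradcheck` can be run on a wrapper that keeps the CSR operand fixed and differentiates only the dense `input` and `mat2` of `torch.addmm`. The wrapper name `addmm_wrt_dense` and the values are hypothetical, not the PR's test code; the CSR-output case that the commit marks as xfail is deliberately not exercised.

```python
import torch
from torch.autograd import gradcheck

# Constant CSR operand; gradcheck only perturbs the dense inputs.
a_csr = torch.tensor([[0., 2., 0.],
                      [1., 0., 3.]], dtype=torch.float64).to_sparse_csr()

def addmm_wrt_dense(c, b):
    # dense + sparse_csr @ dense -> dense
    return torch.addmm(c, a_csr, b)

c = torch.randn(2, 2, dtype=torch.float64, requires_grad=True)
b = torch.randn(3, 2, dtype=torch.float64, requires_grad=True)

# Raises on a mismatch between analytical and numerical gradients, returns True otherwise.
ok = gradcheck(addmm_wrt_dense, (c, b))
print(ok)
```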

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71949

Reviewed By: george-qi

Differential Revision: D33834653

Pulled By: cpuhrsch

fbshipit-source-id: 4144c1547427d4cd6b01495cf45242bb4e914e86
(cherry picked from commit 2cb362283d)
2022-02-11 23:14:02 +00:00
Ivan Yashchuk
ad5a5a9794 Beta value is ignored for sparse torch.addmm with non-MKL build (#72430)
Summary:
When PyTorch is built without MKL, or on Windows, a native implementation of `torch.addmm` is used for CPU tensors. That implementation had a bug where the `beta` value was ignored, which caused new tests to fail (see https://github.com/pytorch/pytorch/pull/71949#issuecomment-1024639741).

In addition, I enabled complex number support for this code path.
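The contract that this fix restores can be checked with a short sketch, assuming a build where `torch.addmm` accepts a CSR `mat1` with dense `input` and `mat2` on CPU. It compares the result against the documented formula `beta * input + alpha * (mat1 @ mat2)`. The shapes, seed, and scalars are arbitrary choices, not taken from the PR's tests.

```python
import torch

torch.manual_seed(0)
a = torch.randn(3, 4, dtype=torch.float64)
a[a.abs() < 0.5] = 0          # zero out some entries so the CSR form is actually sparse
a_csr = a.to_sparse_csr()
b = torch.randn(4, 2, dtype=torch.float64)
c = torch.randn(3, 2, dtype=torch.float64)

beta, alpha = 0.5, 2.0
out = torch.addmm(c, a_csr, b, beta=beta, alpha=alpha)

# addmm's contract: out = beta * c + alpha * (a @ b); beta must not be dropped.
reference = beta * c + alpha * (a @ b)
print(torch.allclose(out, reference))
```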

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72430

Reviewed By: davidberard98

Differential Revision: D34045670

Pulled By: cpuhrsch

fbshipit-source-id: b2b63f22ba3eea895a31c5c2925b0fb1555d2c6f
(cherry picked from commit ac0a2080bb)
2022-02-09 00:32:17 +00:00
Nikita Shulga
38ebb776a4 Fail with unexpected success for fatal errors (#72016)
Summary:
The rest of the tests in the CUDA test suite are skipped once GPU context corruption is encountered.
If this happens in a test decorated with `expectedFailure`, it creates the false impression that the entire test suite is passing.
Remedy this by suppressing the exception when `should_stop_early` is true, so that the test is reported as an unexpected success.
Also print a warning when this happens (to make attribution easier), as well as when the early-stop condition is detected.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72016

Test Plan:
`python test_ops.py -v  -k test_fn_fwgrad_bwgrad_gradient`
Before the change:
```
test_fn_fwgrad_bwgrad_gradient_cpu_complex128 (__main__.TestGradientsCPU) ... ok
test_fn_fwgrad_bwgrad_gradient_cpu_float64 (__main__.TestGradientsCPU) ... ok
test_fn_fwgrad_bwgrad_gradient_cuda_complex128 (__main__.TestGradientsCUDA) ... expected failure

----------------------------------------------------------------------
Ran 3 tests in 0.585s
OK (expected failures=1)
```

After the change:
```
test_fn_fwgrad_bwgrad_gradient_cpu_complex128 (__main__.TestGradientsCPU) ... ok
test_fn_fwgrad_bwgrad_gradient_cpu_float64 (__main__.TestGradientsCPU) ... ok
test_fn_fwgrad_bwgrad_gradient_cuda_complex128 (__main__.TestGradientsCUDA) ... /home/conda/miniconda3/lib/python3.9/site-packages/torch/testing/_internal/common_utils.py:1670: UserWarning: TEST SUITE EARLY TERMINATION due to torch.cuda.synchronize() failed with CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
  warn(f"TEST SUITE EARLY TERMINATION due to torch.cuda.synchronize() failed with {rte}")
/home/conda/miniconda3/lib/python3.9/site-packages/torch/testing/_internal/common_device_type.py:382: UserWarning: Suppressed expected failure that resulted in fatal error
  warn("Suppressed expected failure that resulted in fatal error")
unexpected success

----------------------------------------------------------------------
Ran 3 tests in 0.595s

FAILED (unexpected successes=1)
```
And `stderr` from XML file contains requested info:
```
/home/conda/miniconda3/lib/python3.9/site-packages/torch/testing/_internal/common_utils.py:1670: UserWarning: TEST SUITE EARLY TERMINATION due to torch.cuda.synchronize() failed with CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
  warn(f"TEST SUITE EARLY TERMINATION due to torch.cuda.synchronize() failed with {rte}")
/home/conda/miniconda3/lib/python3.9/site-packages/torch/testing/_internal/common_device_type.py:382: UserWarning: Suppressed expected failure that resulted in fatal error
  warn("Suppressed expected failure that resulted in fatal error")
```

Fixes https://github.com/pytorch/pytorch/issues/71973

Reviewed By: janeyx99, ngimel

Differential Revision: D33854287

Pulled By: malfet

fbshipit-source-id: dd0f5a4d2fcd21ebb7ee50ce4ec4914405a812d0
(cherry picked from commit 0c0baf3931)
2022-02-03 17:49:59 +00:00
Kushashwa Ravi Shrimali
85591dc85d Test 0->0 correspondence for Unary Ops with Sparse CSR inputs (#70302)
Summary:
Since PyTorch has no rule for filling zeros in Sparse CSR tensors, it was decided to support only those ops that preserve the 0->0 correspondence. This PR adds a test to ensure this rule is not broken.

`sample_inputs_unary` may or may not generate a zero in the sample input. Hence, this separate test is useful for validating both the rule and the Sparse CSR support.
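The 0->0 rule can be checked without PyTorch: applying a unary function only to the specified values must give the same matrix as applying it to every dense element. The pure-Python sketch below illustrates that idea on a (crow_indices, col_indices, values) triple; the helper names are hypothetical and this is not the test added by the PR.

```python
import math


def csr_to_dense(crow, col, values, shape):
    """Expand a CSR matrix (crow_indices, col_indices, values) into nested lists."""
    rows, cols = shape
    dense = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for k in range(crow[r], crow[r + 1]):
            dense[r][col[k]] = values[k]
    return dense


def preserves_zero_pattern(fn, crow, col, values, shape):
    """True if applying fn to the specified values only matches applying fn densely."""
    sparse_way = csr_to_dense(crow, col, [fn(v) for v in values], shape)
    dense_way = [[fn(x) for x in row] for row in csr_to_dense(crow, col, values, shape)]
    return sparse_way == dense_way


# 2x2 matrix [[0.5, 0], [0, 2.0]] with two unspecified zeros
crow, col, values = [0, 1, 2], [0, 1], [0.5, 2.0]
print(preserves_zero_pattern(math.sin, crow, col, values, (2, 2)))  # True: sin(0) == 0
print(preserves_zero_pattern(math.cos, crow, col, values, (2, 2)))  # False: cos(0) == 1
```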

cc nikitaved pearu cpuhrsch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70302

Reviewed By: albanD

Differential Revision: D33922501

Pulled By: cpuhrsch

fbshipit-source-id: 10f67a220b95a8e75205345a33744ad536fdcf53
(cherry picked from commit ade9bf7818)
2022-02-03 16:53:27 +00:00
Christian Puhrsch
4a7e07e53e Fix torch.save and detach for CSR Tensor (#71963)
Summary:
Currently saving a CSR Tensor simply fails. This also addresses the segfault encountered in https://github.com/pytorch/pytorch/issues/71652.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71963

Reviewed By: jbschlosser

Differential Revision: D33895938

Pulled By: cpuhrsch

fbshipit-source-id: a333505d3a216705147c2aaaaeb2a0fd0c2a5e43
(cherry picked from commit a88265921c)
2022-02-02 23:59:24 +00:00
Ivan Yashchuk
be2dc8f294 Sparse CSR CUDA: Add torch.baddbmm and torch.bmm (#68711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68711

This PR adds the ability to multiply a single CSR matrix by a batch of dense matrices.
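Semantically, the batched product applies the same CSR matrix to each dense matrix in the batch. A pure-Python sketch of that semantics (hypothetical helper names, nested lists standing in for tensors, not the cuSPARSE-backed kernel):

```python
def csr_matmul(crow, col, values, dense):
    """Multiply an (m x k) CSR matrix by a dense (k x n) matrix given as nested lists."""
    m, n = len(crow) - 1, len(dense[0])
    out = [[0.0] * n for _ in range(m)]
    for r in range(m):
        # only the specified elements of row r contribute
        for idx in range(crow[r], crow[r + 1]):
            c, v = col[idx], values[idx]
            for j in range(n):
                out[r][j] += v * dense[c][j]
    return out


def csr_bmm(crow, col, values, dense_batch):
    """Apply the same CSR matrix to every dense matrix in the batch."""
    return [csr_matmul(crow, col, values, d) for d in dense_batch]


# A = [[1, 0], [0, 2]] in CSR form
crow, col, values = [0, 1, 2], [0, 1], [1.0, 2.0]
batch = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
print(csr_bmm(crow, col, values, batch))
# [[[1.0, 2.0], [6.0, 8.0]], [[5.0, 6.0], [14.0, 16.0]]]
```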

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: davidberard98

Differential Revision: D33773319

Pulled By: cpuhrsch

fbshipit-source-id: 1623ce9affbc4fdc6d6130a95c5a42022858b62b
(cherry picked from commit 628c8e366d)
2022-01-28 07:25:32 +00:00
Ivan Yashchuk
f93ffc9ea8 Sparse CSR: Handle zero matrix consistently for triangular_solve (#71304)
Summary:
This PR enables the `test_block_triangular` tests on the CPU.
These tests revealed a problem with how the nnz==0 case was handled. Now a tensor filled with NaNs is returned on both CUDA and CPU.

cc nikitaved pearu cpuhrsch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71304

Reviewed By: davidberard98

Differential Revision: D33600482

Pulled By: cpuhrsch

fbshipit-source-id: d09cb619f8b6e54b9f07eb16765ad1c183c42487
2022-01-17 13:47:49 -08:00
Ivan Yashchuk
40121456af Sparse CSR: Add torch.randn_like (#68083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68083

This PR adds support for `torch.randn_like(sparse_csr_tensor)`.
It creates a new sparse CSR tensor with the same indices but with new, normally distributed values.

In addition, `.normal_()` and `torch.empty_like` were implemented because `randn_like` is a composite of these two functions.
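The composite structure described above (allocate with `empty_like`, then fill with `normal_`) can be sketched in pure Python on a (crow_indices, col_indices, values) triple. The function names are hypothetical stand-ins, not the ATen implementations:

```python
import random


def empty_like_csr(crow, col, values):
    """Same indices as the input; values are allocated but not meaningful
    (zeros here, since Python lists cannot be left uninitialized)."""
    return list(crow), list(col), [0.0] * len(values)


def normal_(values, rng, mean=0.0, std=1.0):
    """In-place fill of a value list with normally distributed samples."""
    for i in range(len(values)):
        values[i] = rng.gauss(mean, std)
    return values


def randn_like_csr(crow, col, values, rng):
    """Composite: allocate with empty_like_csr, then fill with normal_."""
    new_crow, new_col, new_values = empty_like_csr(crow, col, values)
    normal_(new_values, rng)
    return new_crow, new_col, new_values


rng = random.Random(0)
out_crow, out_col, out_vals = randn_like_csr([0, 1, 3], [0, 0, 1], [1.0, 2.0, 3.0], rng)
print(out_crow, out_col)  # [0, 1, 3] [0, 0, 1]
```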

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33511280

Pulled By: cpuhrsch

fbshipit-source-id: 6129083e8bc6cc5af2e0191294bd5e4e864f6c0e
2022-01-11 18:29:24 -08:00
Pearu Peterson
cfc5519661 Support Sparse CSR transpose. Fix clang-tidy warnings. (#70582)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70582

cc nikitaved pearu cpuhrsch

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33414446

Pulled By: cpuhrsch

fbshipit-source-id: dd0888d9dd3885579e853643a60d13373b5d6b15
2022-01-05 17:41:51 -08:00
Pearu Peterson
ab7d0df449 Support cloning CSR tensors (#70581)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70581

cc nikitaved pearu cpuhrsch

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33413992

Pulled By: cpuhrsch

fbshipit-source-id: 3a576d2c2f26d1edcc8f6932b2dbe2c7c11e9593
2022-01-04 21:41:18 -08:00
Ivan Yashchuk
60eb1e53b2 Sparse CSR CPU: Add block sparse support for MKL path (#68710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68710

This PR adds support for block sparse (BSR) matrices for functions that
use the Inspector-Executor MKL Sparse API. At the time of this PR, these are:
* torch.addmm
* torch.addmv
* torch.triangular_solve (once https://github.com/pytorch/pytorch/pull/62180 is merged)

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D33179486

Pulled By: cpuhrsch

fbshipit-source-id: e1dec0dccdbfed8b280be16b8c11fc9e770d50ae
2021-12-17 10:56:05 -08:00
Ivan Yashchuk
243e135eb4 Sparse CSR CUDA: Add block sparse support for torch.triangular_solve (#68709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68709

This PR adds support for the triangular solver with a block CSR matrix.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D33066067

Pulled By: cpuhrsch

fbshipit-source-id: 9eaf1839071e9526be8d8c6d47732b24200f3557
2021-12-16 13:03:42 -08:00
Peter Bell
6de9f0fc94 OpInfo: Allow sample_inputs_func to be any iterable (#69256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69256

Closes #52486

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32942008

Pulled By: mruberry

fbshipit-source-id: f5b01b0298c0160b0bec6e86e2b6db8cfe746206
2021-12-09 08:37:26 -08:00
Ivan Yashchuk
a8232ee1bc Sparse CSR CUDA: Add block torch.addmv when mat is sparse (#68708)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68708

This PR adds block CSR matrix times dense vector multiplication.
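In a block CSR (BSR) matrix, `crow_indices` and `col_indices` index block rows and block columns, and each stored value is a dense block. The pure-Python sketch below shows the resulting matrix-vector product; the helper name and the nested-list block representation are assumptions for illustration, not the cuSPARSE path:

```python
def bsr_matvec(crow, col, blocks, x, blocksize):
    """y = A @ x for a BSR matrix; blocks[k] is a dense bh x bw block
    located at block row br (via crow) and block column col[k]."""
    bh, bw = blocksize
    n_block_rows = len(crow) - 1
    y = [0.0] * (n_block_rows * bh)
    for br in range(n_block_rows):
        for k in range(crow[br], crow[br + 1]):
            bc, block = col[k], blocks[k]
            for i in range(bh):
                acc = 0.0
                for j in range(bw):
                    acc += block[i][j] * x[bc * bw + j]
                y[br * bh + i] += acc
    return y


# 4x4 matrix with two 2x2 blocks on the block diagonal
crow, col = [0, 1, 2], [0, 1]
blocks = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
print(bsr_matvec(crow, col, blocks, [1, 1, 1, 1], (2, 2)))  # [3.0, 7.0, 11.0, 15.0]
```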

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D32647694

Pulled By: cpuhrsch

fbshipit-source-id: a1c120691c4350284b156fe4259eda684b734b66
2021-12-07 14:02:59 -08:00
Kushashwa Ravi Shrimali
63470f9449 Sparse CSR: Implement unary ufuncs (with 0->0 correspondence) (#69292)
Summary:
This PR attempts to add support for unary ufuncs (with 0->0 correspondence) for Sparse CSR Layout.

Ops supported: `['abs', 'asin', 'asinh', 'atan', 'atanh', 'ceil', 'conj_physical', 'floor', 'log1p', 'neg', 'round', 'sin', 'sinh', 'sign', 'sgn', 'signbit', 'tan', 'tanh', 'trunc', 'expm1', 'sqrt', 'angle', 'isinf', 'isposinf', 'isneginf', 'isnan', 'erf', 'erfinv']`

cc nikitaved pearu cpuhrsch IvanYashchuk peterbell10

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69292

Reviewed By: pbelevich

Differential Revision: D32805514

Pulled By: cpuhrsch

fbshipit-source-id: 9ae20817e77a36d3aa6c5afa532b9dc3b8cf1dd3
2021-12-07 12:07:41 -08:00
Ivan Yashchuk
89a145fd91 Sparse CSR CUDA: Add torch.sparse.sampled_addmm (#68007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68007

This PR adds a new function to the sparse module.
`sampled_addmm` computes α*(A @ B) * spy(C) + β*C, where C is a sparse CSR matrix and A, B are dense (strided) matrices.
This function is currently restricted to single 2D matrices; it doesn't support batched input.
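Because of the `spy(C)` mask, only the entries in C's sparsity pattern need to be computed, and the result reuses C's indices. A pure-Python sketch of the formula (hypothetical helper, nested lists for A and B; not the `torch.sparse.sampled_addmm` implementation):

```python
def sampled_addmm(crow, col, values, A, B, alpha=1.0, beta=1.0):
    """alpha * (A @ B) * spy(C) + beta * C for a CSR matrix C = (crow, col, values).
    Each specified (r, c) needs only one dot product of row r of A and column c of B."""
    out = []
    for r in range(len(crow) - 1):
        for k in range(crow[r], crow[r + 1]):
            c = col[k]
            dot = sum(A[r][t] * B[t][c] for t in range(len(B)))
            out.append(alpha * dot + beta * values[k])
    return list(crow), list(col), out


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]  # A @ B == [[19, 22], [43, 50]]
C = ([0, 1, 2], [0, 1], [1.0, 1.0])  # diagonal sparsity pattern
print(sampled_addmm(*C, A, B, alpha=2.0, beta=0.5))  # ([0, 1, 2], [0, 1], [38.5, 100.5])
```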

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32435799

Pulled By: cpuhrsch

fbshipit-source-id: b1ffac795080aef3fa05eaeeded03402bc097392
2021-11-29 15:43:29 -08:00
Ivan Yashchuk
61a4204d80 Sparse CSR CUDA: Add block torch.addmm when mat1 is sparse (#68707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68707

This PR adds a path for block CSR matrices for `torch.addmm`. The cuSPARSE interface is restricted to 32-bit indices and square blocks.
My plan is to get everything working and the tests passing using an unsafe constructor first, keeping it all private. Then discuss and implement constructors with block information separately, unlocking the functions for wider use. Documentation will come with the update to constructors.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D32650366

Pulled By: cpuhrsch

fbshipit-source-id: 430a9627901781ee3d2e2496097b71ec17727d98
2021-11-29 08:58:49 -08:00
Nikita Shulga
208e109dbf Revert D32633806: Sparse CSR CUDA: Add block torch.addmm when mat1 is sparse
Test Plan: revert-hammer

Differential Revision:
D32633806 (b28ddd72d3)

Original commit changeset: b98db0bd655c

fbshipit-source-id: 1c757628526bb1b88747257fc77d8b9cb996e502
2021-11-24 09:15:17 -08:00
Ivan Yashchuk
b28ddd72d3 Sparse CSR CUDA: Add block torch.addmm when mat1 is sparse (#68707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68707

This PR adds a path for block CSR matrices for `torch.addmm`. The cuSPARSE interface is restricted to 32-bit indices and square blocks.
My plan is to get everything working and the tests passing using an unsafe constructor first, keeping it all private. Then discuss and implement constructors with block information separately, unlocking the functions for wider use. Documentation will come with the update to constructors.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D32633806

Pulled By: cpuhrsch

fbshipit-source-id: b98db0bd655cce651a5da457e78fca08619a5066
2021-11-23 22:55:46 -08:00
Ivan Yashchuk
3b3dc1ade8 Sparse CSR CPU: add triangular_solve_out (#62180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62180

This PR adds CPU dispatch for `triangular_solve` with sparse CSR matrix.
The implementation uses the MKL Sparse library. If it is not available, a runtime error is thrown.

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D32581395

Pulled By: cpuhrsch

fbshipit-source-id: 41c7133a0d2754ef60b5a7f1d14aa0bf7680a844
2021-11-21 21:29:20 -08:00
Kushashwa Ravi Shrimali
833dcaf2d6 Sparse CSR: Add torch.sin (#68123)
Summary:
This PR attempts to add support for `torch.sin` for sparse CSR tensors.

This aims to be a revised implementation (in some form) of https://github.com/pytorch/pytorch/pull/68083, and the implementation aims to be similar to that in [`SparseTensorMath.cpp` file](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/sparse/SparseTensorMath.cpp)

The tests and `empty_like` support for sparse CSR tensors (with a minor correction) are borrowed from https://github.com/pytorch/pytorch/pull/68083 temporarily to assist CI with testing this PR. :)

cc nikitaved pearu cpuhrsch IvanYashchuk krshrimali

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68123

Reviewed By: jbschlosser

Differential Revision: D32533379

Pulled By: cpuhrsch

fbshipit-source-id: eb834d64d16ee12734c77e74fffa4a47614e3dfb
2021-11-18 21:58:09 -08:00
Rok
952ca25daa Sparse CSR: add convert_indices_from_csr_to_coo (#66774)
Summary:
This PR adds conversion from CSR to COO.
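The conversion expands each compressed row pointer range `crow[r]:crow[r+1]` into repeated row index `r`, while column indices carry over unchanged. A pure-Python sketch of the index conversion (the helper name is hypothetical and is not the ATen function's signature):

```python
def csr_to_coo_indices(crow, col):
    """Return [row_indices, col_indices], a 2 x nnz COO index array."""
    rows = []
    for r in range(len(crow) - 1):
        # row r owns the elements crow[r] .. crow[r + 1] - 1
        rows.extend([r] * (crow[r + 1] - crow[r]))
    return [rows, list(col)]


# 3x3 matrix: row 0 has columns 0 and 2, row 1 is empty, row 2 has column 1
print(csr_to_coo_indices([0, 2, 2, 3], [0, 2, 1]))  # [[0, 0, 2], [0, 2, 1]]
```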

Fixes https://github.com/pytorch/pytorch/issues/56959

cc nikitaved pearu cpuhrsch IvanYashchuk gchanan mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66774

Reviewed By: zou3519

Differential Revision: D32288415

Pulled By: cpuhrsch

fbshipit-source-id: 683ba658dc46835fdf3c0e24645c0c2bb243b968
2021-11-17 22:28:30 -08:00
Ivan Yashchuk
affa3f846c Sparse CSR CPU: add torch.addmm (#65606)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65606

This PR adds `torch.addmm(c, a, b, alpha=1.0, beta=0.0, out=out)` variant with `a, b, c, out` all being sparse CSR tensors on CPU.

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32366236

Pulled By: cpuhrsch

fbshipit-source-id: e910bcc96eee99d624b80ee881df3887ab3ba5ac
2021-11-16 17:22:46 -08:00
Ivan Yashchuk
c2642b6465 Sparse CSR CPU: add torch.add with all inputs sparse (#64391)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64391

This PR adds `torch.add(a, b, alpha=None, out=out)` variant with `a, b, out` all being sparse CSR tensors on CPU.

Fixes #59060

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32316562

Pulled By: cpuhrsch

fbshipit-source-id: 384462369007854b5e2e6cb9ae7b320302627c71
2021-11-11 10:02:12 -08:00
Ivan Yashchuk
cbf596bf8e Sparse CSR CPU: add addmv_out (#61536)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61536

This PR adds CPU dispatch for `addmv_out` with Sparse CSR matrix.
The implementation uses the MKL Sparse library. If it is not available, a
runtime error is thrown.
Since structured_delegate is used, we only need to implement the out variant; the in-place and normal variants are autogenerated.

The MKL descriptor for sparse matrices is implemented in `at::mkl::sparse::MklSparseCsrDescriptor`.
MKL Sparse doesn't allow switching the index type at runtime; it is
predetermined at build time. Only the 32-bit version of MKL was tested
locally, but I expect the 64-bit version to work correctly as well.

When the index type of a PyTorch CSR tensor doesn't match MKL's, the
indices tensor is converted to an MKL-compatible type (`int` vs `int64_t`).

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D32141787

Pulled By: malfet

fbshipit-source-id: b818a0b186aa227982221c3862a594266a58a2a6
2021-11-09 12:34:21 -08:00
Ivan Yashchuk
d5d342b237 Sparse CSR CUDA: Support mixed memory format input for triangular_solve (#66401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66401

This PR fixes the case where the result and input tensors have different
strides.
cuSPARSE from CUDA 11.3.1 has a bug: it doesn't use the correct strides to
write the result. This is "fixed" in PyTorch code by copying the input
tensor to a tensor with the same strides as the result tensor.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: davidberard98

Differential Revision: D32177966

Pulled By: cpuhrsch

fbshipit-source-id: 118437409df147f04dce02763aff9bfd33f87c63
2021-11-04 15:34:42 -07:00
Ivan Yashchuk
69f86ecd3a Sparse CSR CUDA: add torch.add with all inputs sparse (#63948)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63948

This PR adds the `torch.add(a, b, alpha=None, out=out)` variant with `a, b,
out` all being sparse CSR tensors.
The underlying cuSPARSE function works only with 32-bit indices, and in
the current implementation the result tensor has 32-bit indices. Input
tensors can have either 64-bit or 32-bit indices.
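Adding two CSR matrices merges their per-row sparsity patterns, so the result's `crow_indices` must be recomputed. A pure-Python sketch of `a + alpha * b` (hypothetical helper on (crow, col, values) triples; it ignores the cuSPARSE 32-bit index restriction and keeps any explicit zeros):

```python
def csr_add(a, b, alpha=1.0):
    """Compute a + alpha * b for two CSR matrices of the same shape.
    The result uses the union of both patterns, with sorted columns per row."""
    a_crow, a_col, a_val = a
    b_crow, b_col, b_val = b
    crow, col, val = [0], [], []
    for r in range(len(a_crow) - 1):
        row = {}  # column index -> accumulated value for row r
        for k in range(a_crow[r], a_crow[r + 1]):
            row[a_col[k]] = row.get(a_col[k], 0.0) + a_val[k]
        for k in range(b_crow[r], b_crow[r + 1]):
            row[b_col[k]] = row.get(b_col[k], 0.0) + alpha * b_val[k]
        for c in sorted(row):
            col.append(c)
            val.append(row[c])
        crow.append(len(col))  # running count gives the compressed row pointer
    return crow, col, val


a = ([0, 1, 2], [0, 1], [1.0, 2.0])  # [[1, 0], [0, 2]]
b = ([0, 1, 2], [1, 1], [3.0, 4.0])  # [[0, 3], [0, 4]]
print(csr_add(a, b, alpha=2.0))  # ([0, 2, 3], [0, 1, 1], [1.0, 6.0, 10.0])
```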

Fixes https://github.com/pytorch/pytorch/issues/59060

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D31909731

Pulled By: cpuhrsch

fbshipit-source-id: 656f523e3947fec56b2f93c474fb6fd49f0360ca
2021-10-29 10:43:05 -07:00
Ivan Yashchuk
bd5e6fe5ac Skip complex128 dtype for test_addmm_sizes_all_sparse_csr Windows test (#67453)
Summary:
Windows CUDA 11.1 periodic CI is failing. See https://github.com/pytorch/pytorch/pull/63511#issuecomment-953940183.
I don't understand, though, why periodic-win-vs2019-cuda11.1-py3 was triggered on the PR while no tests from `test_sparse_csr.py` were run https://github.com/pytorch/pytorch/runs/3975200820?check_suite_focus=true.

cc nikitaved pearu cpuhrsch IvanYashchuk mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67453

Reviewed By: malfet, seemethere, janeyx99

Differential Revision: D31997574

Pulled By: cpuhrsch

fbshipit-source-id: ae8bfb6da865014f39e6ad5675eb17e5a4d39744
2021-10-28 12:24:46 -07:00
Ivan Yashchuk
7c48b9ee25 Sparse CSR CUDA: add triangular_solve_out (#61858)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61858

This PR adds `triangular_solve_out_sparse_csr_cuda`. The operation is
used to compute the solution to a linear system whose coefficient
matrix is triangular.
Structured kernels are used, and the meta function needed some changes to
support the sparse CSR layout. With sparse matrix input, the `cloned_coefficient`
tensor is a 0-sized tensor.
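For a lower-triangular CSR matrix, the solve is forward substitution: row `i` uses only already-computed `x[j]` with `j < i`, plus its diagonal entry. A pure-Python sketch covering only the lower, non-transposed, single right-hand-side case (hypothetical helper; the real operation also handles upper/transposed variants via cuSPARSE):

```python
def csr_lower_triangular_solve(crow, col, values, b):
    """Solve L x = b by forward substitution for a lower-triangular CSR matrix L
    with a nonzero diagonal. Entries above the diagonal are ignored."""
    n = len(crow) - 1
    x = [0.0] * n
    for i in range(n):
        acc, diag = b[i], None
        for k in range(crow[i], crow[i + 1]):
            j = col[k]
            if j < i:
                acc -= values[k] * x[j]  # x[j] is already solved
            elif j == i:
                diag = values[k]
        if diag is None or diag == 0:
            raise ZeroDivisionError(f"zero diagonal in row {i}")
        x[i] = acc / diag
    return x


# L = [[2, 0, 0], [1, 1, 0], [0, 3, 4]], b = [2, 3, 10]
crow, col, values = [0, 1, 3, 5], [0, 0, 1, 1, 2], [2, 1, 1, 3, 4]
print(csr_lower_triangular_solve(crow, col, values, [2, 3, 10]))  # [1.0, 2.0, 1.0]
```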

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D31948435

Pulled By: cpuhrsch

fbshipit-source-id: 7775fece83ca705a26d75f82aead10b956b14bfd
2021-10-27 11:12:20 -07:00
Ivan Yashchuk
700b39a3df Sparse CSR CUDA: add torch.addmm with all inputs sparse (#63511)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63511

This PR adds `torch.addmm(c, a, b)` variant with `c, a, b` all being CSR tensors.
The underlying cuSPARSE function works only with 32-bit indices, and in
the current implementation the result tensor has 32-bit indices. Input
tensors can have either 64-bit or 32-bit indices.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D31809838

Pulled By: cpuhrsch

fbshipit-source-id: 97005dba27d8adcae445eb756bcbd7271061e9b5
2021-10-25 14:32:30 -07:00
Ivan Yashchuk
450221c534 Sparse CSR: Add tensor.resize_ and tensor.copy_ (#63510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63510

Sparse CSR matrix resizing behavior:
If we _increase the number of rows_, the number of specified elements in the matrix remains the same -> the sizes of col_indices and values don't change, and the size of crow_indices becomes `rows+1`.
If we _decrease the number of rows_, the number of specified elements will be `min(nnz, rows*cols)` -> we need to resize `crow_indices` to `rows+1` and set its last element to `min(nnz, rows*cols)`, and decrease the sizes of col_indices and values to `min(nnz, rows*cols)`.
If we _increase the number of columns_, the number of specified elements and the number of rows remain the same -> there is no need to resize anything; we just set the new sizes.
We _cannot decrease the number of columns_ because that would require recomputing `crow_indices`.
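The resizing rules can be expressed as bookkeeping on a (crow, col, values) triple. In this pure-Python sketch the helper name is hypothetical, `rows*cols` is taken from the new shape, and newly added rows are filled as empty (repeating nnz in `crow_indices`); that last choice is an assumption, since the rules only fix the sizes, not the new contents.

```python
def csr_resize(crow, col, values, shape, new_shape):
    """Return resized (crow, col, values, shape) following the row/column rules."""
    rows, cols = shape
    new_rows, new_cols = new_shape
    if new_cols < cols:
        # shrinking columns would require recomputing crow_indices
        raise RuntimeError("cannot decrease the number of columns of a CSR matrix")
    crow, col, values = list(crow), list(col), list(values)
    nnz = crow[-1]
    if new_rows > rows:
        # nnz unchanged; crow grows to new_rows + 1 (assumption: new rows are empty)
        crow += [nnz] * (new_rows - rows)
    elif new_rows < rows:
        new_nnz = min(nnz, new_rows * new_cols)
        crow = crow[: new_rows + 1]
        crow[-1] = new_nnz
        col, values = col[:new_nnz], values[:new_nnz]
    # increasing the number of columns only changes the recorded shape
    return crow, col, values, (new_rows, new_cols)


crow, col, vals = [0, 3, 6], [0, 1, 2, 0, 1, 2], [1, 2, 3, 4, 5, 6]  # full 2x3
print(csr_resize(crow, col, vals, (2, 3), (1, 3)))  # ([0, 3], [0, 1, 2], [1, 2, 3], (1, 3))
```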

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D31796680

Pulled By: cpuhrsch

fbshipit-source-id: 7d8a9701ce06d30a1841f94bba0a057cacea9401
2021-10-20 14:19:04 -07:00
Jane Xu
793f366e34 [skip ci] Set test owners for sparse tests (#66863)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc nikitaved pearu cpuhrsch IvanYashchuk

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66863

Reviewed By: anjali411

Differential Revision: D31771126

Pulled By: janeyx99

fbshipit-source-id: 6cb5ca0557e8555f6a09b3e607ff8888e505486e
2021-10-20 10:12:13 -07:00
Ivan Yashchuk
bd4d5cb14c Sparse CSR: Add torch.empty (#63509)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63509

The primary use of `torch.empty` is to reserve memory for a tensor and set its type, device, and size information. The same is done here for SparseCSR.
`crow_indices` is initialized as an empty tensor of size `num_rows + 1`. `col_indices` and `values` are initialized as empty tensors of size 0.

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D31770359

Pulled By: cpuhrsch

fbshipit-source-id: c83f2a2e0d7514ba24780add1086e1bccf541dd9
2021-10-19 15:59:07 -07:00
Ivan Yashchuk
3488a85a76 Sparse CSR CUDA: fix input checks for addmm and mm (#66485)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66485

The errors for incorrectly sized inputs should match the dense variants
of functions.
Moved addmm_out_sparse_csr_dense_cuda from SparseCsrTensorMath.cu and
removed unnecessary device check.

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D31764036

Pulled By: cpuhrsch

fbshipit-source-id: 76900fe9e4a49474695a01f34bad41cb3422321c
2021-10-19 12:01:11 -07:00
Ivan Yashchuk
08f3823647 Sparse CSR CUDA: add addmv_out (#61407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61407

This PR adds `addmv_out_sparse_csr_cuda`. The operation is used to
compute matrix-vector multiplication. Since structured_delegate is used
we only need to implement the out variant, the in-place and normal
variants are autogenerated.
Working on this PR revealed that float16 (and probably bfloat16) inputs
do not work correctly in cuSPARSE; therefore, for this case `addmm` is
used with squeezes and unsqueezes.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31584499

Pulled By: ngimel

fbshipit-source-id: 4c507791471ada88969116b88eeaaba7a7536431
2021-10-12 20:06:56 -07:00
Ivan Yashchuk
541eb1db63 Add cuSPARSE descriptors and update CSR addmm (#60838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60838

Rewrote `addmm_out_sparse_csr_dense_cuda` implementation using new cusparse descriptors.

`addmm` now works without conversions with both 32-bit and 64-bit indices.
The dense tensors can have a row- or column-major layout. If the dense tensors are a contiguous slice of a larger tensor, the storage is used directly without temporary copies.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D30643191

Pulled By: cpuhrsch

fbshipit-source-id: 5555f5b59b288daa3a3987d322a93dada63b46c8
2021-09-30 11:32:51 -07:00
Philip Meier
26b7ff5aea deprecate dtype getters from torch.testing namespace (#63554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63554

Following https://github.com/pytorch/pytorch/pull/61840#issuecomment-884087809, this deprecates all the dtype getters publicly exposed in the `torch.testing` namespace. The reason for this is twofold:

1. If someone is not familiar with the C++ dispatch macros PyTorch uses, the names are misleading. For example `torch.testing.floating_types()` will only give you `float32` and `float64` skipping `float16` and `bfloat16`.
2. The dtype getters provide very minimal functionality that can be easily emulated by downstream libraries.

We thought about [providing a replacement](https://gist.github.com/pmeier/3dfd2e105842ad0de4505068a1a0270a), but ultimately decided against it. The major problem is BC: if we keep it, either the namespace gets messy again whenever a new dtype is added, or we need to somehow version the return values of the getters.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D30662206

Pulled By: mruberry

fbshipit-source-id: a2bdb10ab02ae665df1b5b76e8afa9af043bbf56
2021-09-07 08:58:51 -07:00
Kushashwa Ravi Shrimali
d37636901e [Doc] make_tensor to torch.testing module (#63925)
Summary:
This PR aims to add `make_tensor` to the `torch.testing` module in PyTorch docs.

TODOs:

* [x] Add examples

cc: pmeier mruberry brianjo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63925

Reviewed By: ngimel

Differential Revision: D30633487

Pulled By: mruberry

fbshipit-source-id: 8e5a1f880c6ece5925b4039fee8122bd739538af
2021-08-30 12:25:40 -07:00
Shen Li
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
Zsolt Dollenstein
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
rusty1s
82123758ba _convert_coo_to_csr CPP and CUDA functionality (#61838)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57381 and improves https://github.com/pytorch/pytorch/pull/61340 via dedicated `coo_to_csr` functionalities.
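Going the other way, from sorted COO row indices to `crow_indices`, `crow[r]` is the number of entries whose row index is less than `r`. That is a left-side binary search, analogous to `torch.bucketize`. A pure-Python sketch using `bisect` (hypothetical helper, not the ATen `_convert_coo_to_csr` code):

```python
from bisect import bisect_left


def coo_rows_to_crow(row_indices, num_rows):
    """Build crow_indices from sorted COO row indices.
    bisect_left(row_indices, r) counts the entries whose row is < r."""
    return [bisect_left(row_indices, r) for r in range(num_rows + 1)]


print(coo_rows_to_crow([0, 0, 2], 3))  # [0, 2, 2, 3]
```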

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61838

Reviewed By: ezyang

Differential Revision: D30132736

Pulled By: cpuhrsch

fbshipit-source-id: a1fd074c0d70366a524d219a620b94f8bed71d7c
2021-08-11 11:37:20 -07:00
rusty1s
457a0b63bf use torch.bucketize into_sparse_csr implementation (+ additional tests) (#61340)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57381

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61340

Reviewed By: bhosmer

Differential Revision: D29601393

Pulled By: cpuhrsch

fbshipit-source-id: 4ca1f013d96e8716f0e658e0cd685d9aa0d98a5c
2021-07-20 15:44:25 -07:00
Ivan Yashchuk
7011513d23 Enable sparse_csr.to_dense() for bool, float16, bfloat16 and complex (#60657)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60657

Fixes https://github.com/pytorch/pytorch/issues/60648

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29408102

Pulled By: cpuhrsch

fbshipit-source-id: 406505c1c52c0eada934833f9723f58fa67e9256
2021-07-07 19:29:19 -07:00
Pearu Peterson
374278f431 Improved sparse CSR tensor sampling method (#60283)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59379

The improved sparse CSR tensor sampling method is described in https://pearu.github.io/csr_sampling.html and features:
- for specified `nnz`, one gets a CSR sample with the same `nnz`
- variability of the number of specified columns per row is maximized
- `crow_indices` content is randomized
- the `col_indices` content of a given row is sorted and filled with unique values (see also https://github.com/pytorch/pytorch/issues/60277)
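Two of the listed properties (exactly `nnz` specified elements, and sorted unique `col_indices` per row) can be met by sampling `nnz` distinct flat positions uniformly. The pure-Python sketch below does that. It is a simpler swapped-in technique, not the method from the linked write-up: it does not maximize the variability of the per-row counts. The function name is hypothetical.

```python
import random


def sample_csr_indices(n_rows, n_cols, nnz, rng):
    """Draw CSR indices with exactly nnz specified elements; each row gets
    sorted, unique column indices. Uniform sampling, not the linked method."""
    if not 0 <= nnz <= n_rows * n_cols:
        raise ValueError("nnz must be between 0 and n_rows * n_cols")
    # distinct flat positions, sorted so rows and columns come out in order
    flat = sorted(rng.sample(range(n_rows * n_cols), nnz))
    crow = [0] * (n_rows + 1)
    col = []
    for p in flat:
        r, c = divmod(p, n_cols)
        crow[r + 1] += 1
        col.append(c)
    for r in range(n_rows):
        crow[r + 1] += crow[r]  # prefix sum turns counts into row pointers
    return crow, col


crow, col = sample_csr_indices(4, 5, 7, random.Random(0))
print(crow[-1], len(col))  # 7 7
```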

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60283

Reviewed By: bhosmer

Differential Revision: D29492605

Pulled By: cpuhrsch

fbshipit-source-id: 8d875b7c2b0573a9ab37047c6d8fe8b540295ce1
2021-07-01 13:26:19 -07:00
Joel Schlosser
03b5a225a7 Test parametrization for instantiated device-specific tests (#60233)
Summary:
The `ops` decorator provides a way to parameterize a test across a given list of ops. This would be useful for modules as well (e.g. a `modules` decorator), but the mechanism by which this is accomplished is specific to ops. In detail, the `ops` decorator tags a test function with the needed metadata (list of ops, `dtypes`), and the actual tests are generated according to this metadata during the call to `instantiate_device_type_tests()`.

This PR makes this mechanism more generic, allowing for test parameterization across arbitrary dimensions. This makes a `modules` decorator (or any similar type of decorator) straightforward to implement without changes to the device-specific test instantiation logic.

One caveat is that, since this is implemented where the old `ops` decorator was (within `instantiate_device_type_tests()`), this only works for tests instantiated using the device-specific instantiation logic. Longer term, even device-specific test instantiation could be treated as an optional parameterization across device types, but this PR takes a low-risk approach for now. In practice, this just means that a `device` kwarg is required for all test signatures used with the mechanism.

The `ops` decorator has been refactored to use the generic mechanism and works the same as before, with one difference: when `OpDTypes.none` is specified, the test signature no longer needs an unused `dtype` kwarg. This is a nice bonus that demonstrates the added flexibility of a generic parameterization mechanism. The refactored form also has the bonus that all op-specific test generation logic is contained within the `ops` decorator class, improving readability.

Behind the scenes, the generic mechanism is a base decorator class (`_TestParameterizer`) from which `ops` derives. The core functionality is in the `_parameterize_test()` method, which takes in a test function and returns a generator that produces parameterized tests, including names and parameter kwargs to pass to them. Using the `ops` decorator results in a set of op-specific tests from a given generic test.
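The tag-then-expand pattern described above can be illustrated with a small pure-Python sketch: a base decorator only attaches a generator of `(name_suffix, kwargs)` pairs, and a separate instantiation step expands each tagged test, always passing the required `device` kwarg. All names here (`ParameterizerBase`, `parametrize_over`, `instantiate_tests`) are hypothetical and are not the PyTorch `_TestParameterizer` or `instantiate_device_type_tests()` code.

```python
class ParameterizerBase:
    """Hypothetical base decorator: it only tags the test function; the
    actual tests are generated later by instantiate_tests()."""

    def __call__(self, fn):
        fn.parameterize_fn = self._parameterize_test
        return fn

    def _parameterize_test(self, test):
        """Yield (name_suffix, kwargs) pairs for each generated test."""
        raise NotImplementedError


class parametrize_over(ParameterizerBase):
    """Hypothetical decorator parameterizing a test over one keyword argument."""

    def __init__(self, name, values):
        self.name, self.values = name, values

    def _parameterize_test(self, test):
        for value in self.values:
            yield str(value), {self.name: value}


def instantiate_tests(cls, device):
    """Replace each tagged test with one method per parameter set; every
    generated test receives the required `device` kwarg."""
    for attr in list(vars(cls)):
        fn = vars(cls)[attr]
        gen = getattr(fn, "parameterize_fn", None)
        if gen is None:
            continue
        for suffix, kwargs in gen(fn):
            def make_test(fn=fn, kwargs=kwargs):
                return lambda self: fn(self, device=device, **kwargs)
            setattr(cls, f"{attr}_{suffix}_{device}", make_test())
        delattr(cls, attr)
    return cls


class ExampleTests:
    @parametrize_over("n", [1, 2])
    def test_square(self, device, n):
        return device, n * n


instantiate_tests(ExampleTests, "cpu")
print(ExampleTests().test_square_2_cpu())  # ('cpu', 4)
```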

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60233

Reviewed By: iramazanli

Differential Revision: D29494995

Pulled By: jbschlosser

fbshipit-source-id: a14446488c106094fafcaa75ccf8e9e3faf33bfc
2021-06-30 18:50:22 -07:00
Ivan Yashchuk
c5f0692b6e Sparse CSR: increase dtype test coverage (#60656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60656

This PR uses `torch.testing.get_all_dtypes()` for dtype parametrisation
of tests in `test_sparse_csr.py`. It adds the bool, half, bfloat16, and complex
dtypes, which were previously excluded from the tests. `torch.complex32` is omitted due
to lack of coverage and lack of specialized `AT_DISPATCH...`.
The process of adding more dtypes to the tests revealed that `.to_dense()`
doesn't work for all dtypes.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29408058

Pulled By: cpuhrsch

fbshipit-source-id: 319b6f51b9786d6957d508f51657657a6d00267a
2021-06-25 17:11:21 -07:00
Alexander
2d8f0d966f CUDA support in the CSR layout: CUDA addmm/matvec (#59012)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59012

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D28719631

Pulled By: bhosmer

fbshipit-source-id: 43e2004a61e114aeb0a7c6ad8a25fedda238c6da
2021-06-01 21:16:42 -07:00
Alexander
41054f2ab5 CUDA support in the CSR layout: sparse_to_dense/add_sparse_csr (#59011)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59011

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D28719550

Pulled By: bhosmer

fbshipit-source-id: 530c7cd1b20ae6d8865fd414afaf6fab27a643e6
2021-05-27 20:59:22 -07:00
Alexander
b435a27fb7 CUDA support in the CSR layout: constructors (#59010)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59010

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D28719287

Pulled By: bhosmer

fbshipit-source-id: fbb5784ccb5ce19dcca1f2f95c4ee16f9b7680c4
2021-05-26 16:39:43 -07:00
Alban Desmaison
032d6b0643 Revert D28112689: CUDA support in the CSR layout: constructors
Test Plan: revert-hammer

Differential Revision:
D28112689 (1416e57465)

Original commit changeset: f825cd4bce40

fbshipit-source-id: 421fc590797ac5fab6a55ac6f213361fbba7cd5b
2021-05-26 06:15:05 -07:00
Alexander
1416e57465 CUDA support in the CSR layout: constructors (#57274)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57274

Test Plan: Imported from OSS

Reviewed By: astaff

Differential Revision: D28112689

Pulled By: bhosmer

fbshipit-source-id: f825cd4bce402dd4c3f71db88854f77830b687b8
2021-05-26 01:36:20 -07:00
Alexander
1fca1545d4 fixing csr addmm bug (#58768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58768

Fixes gh-58757

This PR fixes the CPU version of the addmm op. For context, before this PR only CSR @ vector was supported. I found a minor bug in `addmm_out_sparse_csr_dense_cpu` in the non-MKL code path, which this PR fixes.

Moreover, I discovered a limitation in the current MKL implementation: it only works well (output error within an acceptable tolerance) with square matrices. I looked into this issue in depth, and it may be a limitation of the MKL API.

To test this behavior, I used this [gist code](https://gist.github.com/aocsa/0606e833cd16a8bfb7d37a5fbb3a5b14), which is based on [this benchmark](https://github.com/baidu-research/DeepBench/blob/master/code/intel/spmm/spmm_bench.cpp).

As you can see, the output error (last column) is acceptable when the matrices are square and unacceptable when they are not. I reported the issue here: https://github.com/pytorch/pytorch/issues/58770
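For reference, addmm computes `beta * input + alpha * (mat1 @ mat2)`. A minimal pure-Python sketch of that computation for a CSR `mat1` and a dense `mat2` follows; it is the kind of slow reference one might compare results against when checking non-square shapes. It is not the code in `addmm_out_sparse_csr_dense_cpu`: the function name `csr_addmm_reference` and the list-of-lists representation are hypothetical, chosen only for illustration.

```python
def csr_addmm_reference(inp, crow_indices, col_indices, values, mat2,
                        beta=1.0, alpha=1.0):
    """Hypothetical reference: beta * inp + alpha * (A @ mat2), A in CSR form.

    inp:          dense m x n matrix (list of lists)
    crow_indices: length m + 1; row i's nonzeros occupy positions
                  crow_indices[i] .. crow_indices[i + 1] - 1
    col_indices:  column index of each nonzero
    values:       value of each nonzero
    mat2:         dense k x n matrix (list of lists); A is m x k
    """
    m = len(crow_indices) - 1
    n = len(mat2[0]) if mat2 else 0
    if beta == 0:
        # Like torch.addmm, ignore inp entirely when beta is 0,
        # so nan/inf values in inp are not propagated.
        out = [[0.0] * n for _ in range(m)]
    else:
        out = [[beta * inp[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        # Walk only the stored nonzeros of row i.
        for p in range(crow_indices[i], crow_indices[i + 1]):
            scale = alpha * values[p]
            row_b = mat2[col_indices[p]]
            for j in range(n):
                out[i][j] += scale * row_b[j]
    return out


# Non-square case: A = [[1, 0, 2], [0, 3, 0]] (2 x 3), mat2 is 3 x 2.
result = csr_addmm_reference([[1, 1], [1, 1]], [0, 2, 3], [0, 2, 1],
                             [1.0, 2.0, 3.0], [[1, 2], [3, 4], [5, 6]],
                             beta=2.0)
print(result)  # [[13.0, 16.0], [11.0, 14.0]]
```

Because the loop only touches stored nonzeros, its cost scales with the number of nonzeros times the width of `mat2`, and nothing in it assumes `m == k`.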

Looking forward to your comments.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D28629563

Pulled By: malfet

fbshipit-source-id: 5ee00ae667336e0d9301e5117057213f472cbc86
2021-05-24 09:54:07 -07:00
Rong Rong (AI Infra)
a70020465b adding test_sparse_csr to run_test (#58666)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58632.

Added several skips related to test asserts and MKL; these will be addressed in a separate PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58666

Reviewed By: seemethere, janeyx99

Differential Revision: D28607966

Pulled By: walterddr

fbshipit-source-id: 066d4afce2672e4026334528233e69f68da04965
2021-05-22 13:17:46 -07:00
Nikita Shulga
abb215e229 Fix dtype inference in sparse_csr_tensor_ctor (#58631)
Summary:
A `NULL` return from `PyObject_GetAttrString` should never be ignored without handling the exception, as the behavior of subsequent Python C API calls is undefined until `PyErr_Fetch` or `PyErr_Clear` is called.

Ignoring it accidentally led to the `list` type being incorrectly identified as `Tensor`.

Fixes https://github.com/pytorch/pytorch/issues/58520

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58631

Reviewed By: albanD

Differential Revision: D28559454

Pulled By: malfet

fbshipit-source-id: 46f044b5f0f94264779a6108474d04a8ba851c53
2021-05-20 08:02:05 -07:00
Alexander
18c89a904b Modernize test-suite in sparse tensor CSR (#56392)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56392

Fixes for gh-56371 and gh-56369

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D27913212

Pulled By: mruberry

fbshipit-source-id: 2c78fe9fa4b6c6b566d9eb01f71e6016d672a545
2021-04-27 15:22:17 -07:00
Sameer Deshmukh
5fb1142702 Add CSR (compressed sparse row) layout for sparse tensors (#50937)
Summary:
Implement compressed sparse row format. Derived from the GCS implementation at https://github.com/pytorch/pytorch/pull/44190
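The CSR layout stores a matrix as three arrays: `crow_indices` (length rows + 1, where row i's entries occupy positions `crow_indices[i]` up to `crow_indices[i + 1]`), `col_indices`, and `values`. These are the names PyTorch's CSR tensors use for their components. A minimal pure-Python sketch of converting a dense matrix to and from this layout follows; the helpers `dense_to_csr` and `csr_to_dense` are hypothetical illustrations, not functions from this PR.

```python
def dense_to_csr(dense):
    """Hypothetical helper: list-of-lists matrix -> (crow_indices, col_indices, values)."""
    crow_indices = [0]
    col_indices = []
    values = []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                col_indices.append(j)
                values.append(v)
        # Each row pointer records how many nonzeros precede the next row.
        crow_indices.append(len(values))
    return crow_indices, col_indices, values


def csr_to_dense(crow_indices, col_indices, values, ncols):
    """Hypothetical helper: rebuild the dense matrix from its CSR components."""
    nrows = len(crow_indices) - 1
    dense = [[0] * ncols for _ in range(nrows)]
    for i in range(nrows):
        for p in range(crow_indices[i], crow_indices[i + 1]):
            dense[i][col_indices[p]] = values[p]
    return dense


m = [[1, 0, 2],
     [0, 0, 0],
     [0, 3, 0]]
print(dense_to_csr(m))  # ([0, 2, 2, 3], [0, 2, 1], [1, 2, 3])
```

An empty row (the middle row above) shows up as two equal consecutive entries in `crow_indices`, which is why row lengths are `crow_indices[i + 1] - crow_indices[i]`.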

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50937

Reviewed By: mrshenli

Differential Revision: D27439865

Pulled By: ezyang

fbshipit-source-id: 3ba3dcb9679505b980ff6a5f513e913bbae2fb1d
2021-04-12 10:09:12 -07:00