Commit Graph

274 Commits

Author SHA1 Message Date
PyTorch MergeBot
c5836153f5 Revert "optimize sampled_addmm performance on CPU (SparseCSR) (#90978)"
This reverts commit 645fb217c0.

Reverted https://github.com/pytorch/pytorch/pull/90978 on behalf of https://github.com/seemethere due to This broke internal builds for android due to the new file added being missing in build_variables.bzl
2023-01-11 20:12:12 +00:00
Pearu Peterson
b9a035c1c5 Add check-sparse-tensor-invariants flag to Context. (#90849)
This PR adds "check sparse tensor invariants" flag to Context that when enabled will trigger sparse tensor data invariants checks in unsafe methods of constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to UI:

- `torch.enable_check_sparse_tensor_invariants` and `torch.is_check_sparse_tensor_invariants_enabled` functions to globally enable/disable the invariant checks and to retrieve the state of the feature, respectively
- `torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden.

The PR also fixes https://github.com/pytorch/pytorch/issues/90833

# Main issue

*The following content is outdated after merging the PRs in this ghstack but kept for the record.*

The importance of this feature is that when enabling the invariants checks by default, say, via

<details>

```
$ git diff
diff --git a/torch/__init__.py b/torch/__init__.py
index c8543057c7..19a91d0482 100644
--- a/torch/__init__.py
+++ b/torch/__init__.py
@@ -1239,3 +1239,8 @@ if 'TORCH_CUDA_SANITIZER' in os.environ:

 # Populate magic methods on SymInt and SymFloat
 import torch.fx.experimental.symbolic_shapes
+
+# temporarily enable sparse tensor arguments validation in unsafe
+# constructors:
+
+torch._C._set_check_sparse_tensor_invariants(True)
```

</details>

a massive number of test failures/errors occur in test_sparse_csr.py tests:
```
$ pytest -sv test/test_sparse_csr.py
<snip>
==== 4293 failed, 1557 passed, 237 skipped, 2744 errors in 69.71s (0:01:09) ====
```
that means that we are silently constructing sparse compressed tensors that do not satisfy the sparse tensor invariants. In particular, the following errors are raised:

```
AssertionError: "resize_as_sparse_compressed_tensor_: self and src must have the same layout" does not match "expected values to be a strided and contiguous tensor"

RuntimeError: CUDA error: device-side assert triggered

RuntimeError: `col_indices[..., crow_indices[..., i - 1]:crow_indices[..., i]] for all i = 1, ..., nrows are sorted and distinct along the last dimension values` is not satisfied.

RuntimeError: expected col_indices to be a strided and contiguous tensor

RuntimeError: expected row_indices to be a strided and contiguous tensor

RuntimeError: expected values to be a strided and contiguous tensor

RuntimeError: for_each: failed to synchronize: cudaErrorAssert: device-side assert triggered

RuntimeError: tensor dimensionality must be sum of batch, base, and dense dimensionalities (=0 + 2 + 0) but got 3
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90849
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2023-01-11 01:05:14 +00:00
mingfeima
645fb217c0 optimize sampled_addmm performance on CPU (SparseCSR) (#90978)
### Target and Background
This PR is improving the performance of `sampled_addmm` on CPU device. This is part of effort for improving PyG performance on CPU for GNN training/inference.

The current implementation is a reference design which converts `SparseCSR` tensor back to dense tensor and then do the addmm and convert back to `SparseCSR` again: this is going to be very slow and won't be able to run most of the datasets under https://github.com/snap-stanford/ogb (convert to dense would trigger `OOM`).

### Benchmarks

Right now we don't have any hands-on benchmark or workload to test this since this operator is not used in PyG yet. I fetched the dataset from `ogb-products` where:

* number of nodes: 2.4 * 10^6
* number of edges: 1.26 * 10^8
* number of features: 128

So if we store the **adjacency matrix** is dense, it is going to be 2.4 * 2.4 * 4 * 10^12 bytes, this will be OOB on current code. I abstract the first 1k rows to compare, **1100x** speedup:

CPU: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, dual socket, 20 cores per socket.
```
### before: run 1000 rows from the whole dataset
sampled_addmm: running dataset ogb-products first 1000 rows: each iter takes 1212.000 ms!

### after: run 1000 rows from the whole dataset
sampled_addmm: running dataset ogb-products first 1000 rows: each iter takes 1.102 ms!

### after: run the whole dataset
sampled_addmm: running dataset ogb-products (the whole dataset) 2449029 rows: each iter takes 873.306 ms!
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90978
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2023-01-10 22:13:35 +00:00
Pearu Peterson
cdc30048e5 Fix numel() result after resizing a sparse compressed tensor. (#91831)
Fixes #91830

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91831
Approved by: https://github.com/cpuhrsch
2023-01-10 18:21:07 +00:00
Pearu Peterson
b797a24259 Support indices contiguity per batch and non-contiguous values in sparse compressed tensors (#91243)
Fixes https://github.com/pytorch/pytorch/issues/91062

With this PR, all reported failures in https://github.com/pytorch/pytorch/pull/90849 are resolved (modulo test_bmm that uses an unorthodox way to construct a batch CSR tensor).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91243
Approved by: https://github.com/nikitaved, https://github.com/amjames, https://github.com/lezcano
2023-01-02 18:08:46 +00:00
Kurt Mohler
08a47549af Rename Tensor._storage to Tensor.untyped_storage and update docs (#91414)
Fixes #89224

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91414
Approved by: https://github.com/ezyang
2022-12-28 19:21:34 +00:00
Nikita Vedeneev
4c5928e387 Fix for mul(compressed, wrapped scalar) (#91239)
Fixes https://github.com/pytorch/pytorch/issues/90819.

The path with `Scalar` should have been picked up by the dispatcher, but still the path with a 0-dim wrapped scalar was broken.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91239
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2022-12-22 13:11:13 +00:00
Pearu Peterson
01e7f46215 Ensure sorted indices from the CSR->BSR conversion (#90918)
Fixes https://github.com/pytorch/pytorch/issues/90910

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90918
Approved by: https://github.com/cpuhrsch
2022-12-16 15:49:48 +00:00
Nikita Vedeneev
c2c14f9597 Sparse compressed mm: fix for orthogonal inputs (#90917)
Fixes https://github.com/pytorch/pytorch/issues/90836
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90917
Approved by: https://github.com/cpuhrsch
2022-12-16 13:08:00 +00:00
Nikita Vedeneev
4dd3de23dd Sparse compressed mm: fix for empty inputs (#90763)
Fixes [#90693
](https://github.com/pytorch/pytorch/issues/90693)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90763
Approved by: https://github.com/cpuhrsch
2022-12-16 12:33:57 +00:00
Pearu Peterson
76c6dfeaa6 Add layout and blocksize arguments to Tensor.to_sparse method (#89502)
This PR extends the `Tensor.to_sparse()` method to `Tensor.to_sparse(layout=None, blocksize=None)` in a BC manner (`layout=None` means `layout=torch.sparse_coo`).

In addition, the PR adds support for the following conversions:
- non-hybrid/hybrid COO tensor to CSR or CSC or a COO tensor
- short, bool, byte, char, bfloat16, int, long, half CSR tensor to a BSR tensor

and fixes the following conversions:
- hybrid COO to COO tensor
- non-batch/batch hybrid BSR to BSR or BSC tensor

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89502
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2022-11-30 20:21:10 +00:00
Pearu Peterson
296e1ba4d0 Row and column select support for block compressed sparse tensors (#88733)
As in the title:

- Support `select` and `select_copy` on block sparse compressed tensors
- Fixes incorrect results when selecting dense dimensions

The PR also improves the performance of indexing sparse compressed tensors considerably:

<details>

Before:

```python
In [3]: a=torch.rand((1000, 1000)).to_sparse_csr()

In [4]: %timeit a.select(0, 0)
606 µs ± 4.27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [5]: %timeit a.select(1, 0)
527 µs ± 57.7 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit a[0, 0]
617 µs ± 3.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [7]: a = a.cuda()

In [8]: %timeit a.select(0, 0); torch.cuda.synchronize();
1.19 ms ± 137 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [9]: %timeit a.select(1, 0); torch.cuda.synchronize();
1.2 ms ± 119 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [10]: %timeit a[0, 0]; torch.cuda.synchronize();
1.23 ms ± 482 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

This PR:

```python
In [3]: a=torch.rand((1000, 1000)).to_sparse_csr()

In [4]: %timeit a.select(0, 0)
4.75 µs ± 8.94 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [5]: %timeit a.select(1, 0)
565 µs ± 156 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit a[0, 0]
13.1 µs ± 435 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [7]: a = a.cuda()

In [8]: %timeit a.select(0, 0); torch.cuda.synchronize();
21.6 µs ± 23.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [9]: %timeit a.select(1, 0); torch.cuda.synchronize();
1.15 ms ± 3.13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [10]: %timeit a[0, 0]; torch.cuda.synchronize();
63.7 µs ± 2.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88733
Approved by: https://github.com/nikitaved, https://github.com/amjames, https://github.com/cpuhrsch
2022-11-30 11:15:56 +00:00
Pearu Peterson
90bed8874f Generator of tensor inputs with variable layout and structure (batch/non-batch, hybrid/non-hybrid, block/non-block) (#88914)
This PR introduces `TestCase.generate_simple_inputs` method that is an improved and generalized version of the `TestSparseCompressed._generate_small_inputs` method.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88914
Approved by: https://github.com/cpuhrsch
2022-11-30 02:13:33 +00:00
Pearu Peterson
50e2e4faf3 Sparse CSC/BSR/BSC serialization and pickle support (#89553)
Fixes https://github.com/pytorch/pytorch/issues/89497

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89553
Approved by: https://github.com/cpuhrsch
2022-11-23 20:56:48 +00:00
Andrew M. James
a41f70603a Round out rad2deg sparse support (#88442)
- Add sparse coo dispatch
- Modify backward to work with sparse compressed layouts
- Enable sparse_compressed autograd testing
- Correct layout support attributes on OpInfo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88442
Approved by: https://github.com/cpuhrsch
2022-11-17 06:00:23 +00:00
Nikita Vedeneev
8dc3353b0b add to(dtype) support for all sparse compressed formats (#89055)
Fixes [#88419](https://github.com/pytorch/pytorch/issues/88419)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89055
Approved by: https://github.com/cpuhrsch
2022-11-15 21:16:18 +00:00
Kazuaki Ishizaki
03296844aa Fix typos in messages under aten (#88964)
This PR fixes typos of messages and parms in c++ source files under `aten` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88964
Approved by: https://github.com/lezcano
2022-11-14 09:50:50 +00:00
Andrew M. James
ff6770a9a1 enable backward for log1p (sparse layouts) (#88155)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88155
Approved by: https://github.com/cpuhrsch
2022-11-04 20:59:26 +00:00
Andrew M. James
6938dd0b2c Support sparse inputs to deg2rad (#88156)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88156
Approved by: https://github.com/cpuhrsch
2022-11-04 20:59:26 +00:00
Andrew M. James
1964d8c34f Enable sparse_csr autograd testing for relu (#88154)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88154
Approved by: https://github.com/cpuhrsch
2022-11-04 20:59:23 +00:00
Andrew M. James
f03302ba49 Add sparse layout support for torch.frac (#88153)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88153
Approved by: https://github.com/cpuhrsch
2022-11-04 20:59:22 +00:00
Andrew M. James
b2dfd20260 Remove BSC conversion skip from TestSparseCompressed.test_consistency (#88152)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88152
Approved by: https://github.com/cpuhrsch
2022-11-01 22:18:56 +00:00
Andrew M. James
d044b4cc58 Update torch.abs and torch.positive opinfos to reflect sparse support (#88151)
cc @nikitaved @pearu @cpuhrsch @bhosmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88151
Approved by: https://github.com/cpuhrsch
2022-11-01 22:18:56 +00:00
Ivan Yashchuk
51ea441862 Upcast to fp32 in test_addmm_block ref_half_bfloat16 (#86682)
Fixes https://github.com/pytorch/pytorch/issues/86681
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86682
Approved by: https://github.com/nikitaved
2022-10-11 16:39:57 +00:00
nikitaved
e15a48def7 (bsr/csr) x dense mm (#85551)
As per title. This implementation is not the most optimal and could be improved albeit with native kernels (i.e. block matching need not be materialized).

Compared to existing kernels it offers:

- Half float support (In fact, any dtype that supports `matmul` will work).
- Arbitrary block sizes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85551
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2022-09-29 17:12:04 +00:00
Andrew M. James
8a926b3187 Enable CSC @ CSC addmm (#85379)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85379
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2022-09-27 19:49:31 +00:00
Andrew M. James
bb5001ce3d Enable dense x bsc mm/addmm (#85308)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85308
Approved by: https://github.com/pearu
2022-09-27 19:49:31 +00:00
Andrew M. James
aaef5d8f2c sparse mm/addmm enable dense x csc, csc x dense and simplify layout check logic. (#85307)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85307
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2022-09-27 16:46:28 +00:00
Andrew M. James
f64857189d resize_as_sparse support all compressed layouts (#85378)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85378
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2022-09-27 06:59:18 +00:00
George Qi
686555b663 [maskedtensor] port torch/_masked into torch/masked (#85515)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85515
Approved by: https://github.com/cpuhrsch
2022-09-26 23:41:13 +00:00
Sean Ross-Ross
a4c94f0739 Fix cuda issue with sparse.sampled_addmm (#85194)
fixes https://github.com/pytorch/pytorch/issues/85169

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85194
Approved by: https://github.com/amjames, https://github.com/nikitaved
2022-09-23 20:52:23 +00:00
nikitaved
0278a141fc csr <-> csc, csc <-> csc, bsr <-> bsc, bsc <-> bsc, bsr <-> bsr conversions (#85091)
As per title. Required to enable a wider selection of sparse formats for `nn.functional.linear`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85091
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2022-09-21 20:10:26 +00:00
Pearu Peterson
f0b06c64c8 Fix bugs in sparse compressed tensor shape and device inference (#85240)
Fixes #84999

This PR
- uses device option to set sparse compressed tensor instance device
- enables shape and device inference tests that was disabled due to an oversight
- fixes a bug in shape inference of hybrid tensors
- fixes a bug in to_sparse_bsr of a cuda tensor
- updates tests that catch the above bugs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85240
Approved by: https://github.com/cpuhrsch
2022-09-19 18:10:37 +00:00
Edward Z. Yang
c5a8946e40 Revert "Revert "Redo how custom/python_custom methods on TensorImpl work (#84796)" (#84806)
This reverts commit ca3b2bfbe3.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84806
Approved by: https://github.com/Chillee
2022-09-10 06:17:35 +00:00
Eli Uriegas
ca3b2bfbe3 Revert "Redo how custom/python_custom methods on TensorImpl work (#84796)
This reverts commit 591b75bf98.

Manual revert of https://github.com/pytorch/pytorch/pull/84641

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84796
Approved by: https://github.com/izaitsevfb
2022-09-10 00:18:13 +00:00
Edward Z. Yang
591b75bf98 Redo how custom/python_custom methods on TensorImpl work (#84641)
A longstanding confusion in the implementation of fake tensor and proxy tensor is what to do about torch.ops.aten.sym_sizes and related calls. In particular, when you have a tensor that (1) has symbolic shapes and (2) has a `__torch_dispatch__` call, previously, you would always get `__torch_dispatch__` calls for sizes/strides query, *even if you didn't request it* via the dispatch kwargs in `make_wrapper_subclass`.

The reason for this is because we were previously mixing several concepts: "I want to dispatch to Python", "I want to call a virtual method" and "I have dynamic shapes". A single boolean variable controlled all of these things, and so it was not possible to understand inside TensorImpl what the user had actually originally requested.

In this PR, we track each of these concepts individually so that we can preserve user intent. Then, we combine these into a single "policy" variable that controls whether or not we can use the fastpath or not. For the policy to trigger, we only need one of the exceptional cases to be true.

Billing of changes:
* Rename `set_sizes_strides_policy` to `set_custom_sizes_strides`; in general, you cannot DIRECTLY set policy; you have to indirectly set it by the public functions.
* Some helpers for sizes and strides, since it's more complicated (as it is an enum, rather than just bools as is the case for device and layout). `matches_python_custom` is used to test the Python dispatch user ask. `matches_policy` does the policy test (only used in the user facing functions.)
* I reorged the accessor methods so that they are more logical. This makes the diff bad, so I recommend reading the final code directly.
* The default custom implementations now more reliably call their default() implementations
* As bonus refactor, I devirtualized some functions that don't need to be virtual
* `set_sym_sizes_and_strides` is renamed to `set_sizes_and_strides` to make it easier to use in template contexts; it optionally takes a storage offset now so you can set all three values at the same time. If you use the SymInt overload but there are no symbolic integers, we give you a normal resize.
* This adds `sym_storage_offset` since we had that in the symbolic shapes branch and there's no reason not to put it in (and it reduces merge conflicts)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84641
Approved by: https://github.com/wconstab
2022-09-09 13:41:13 +00:00
Andrew M. James
9b115c7bd3 Sparse Compressed Transpose add support for Batch dims and BSR/BSC layouts (#82122)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82122
Approved by: https://github.com/bhosmer
2022-09-02 17:42:58 +00:00
Andrew M. James
0192a34910 Dense -> CSC support batch dimensions (#83086)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83086
Approved by: https://github.com/bhosmer, https://github.com/nikitaved
2022-09-02 17:42:58 +00:00
Andrew M. James
f0e5b73364 Dense -> CSR support batch dimensions (#83084)
Only requires changes to the dense->sparse pathway. The reverse already has support.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83084
Approved by: https://github.com/bhosmer, https://github.com/nikitaved
2022-09-02 17:42:58 +00:00
Andrew M. James
8778f33744 Dense <-> bsc conversions (#80781)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80781
Approved by: https://github.com/bhosmer, https://github.com/nikitaved
2022-09-01 16:01:58 +00:00
jpvillam
247468baf0 [ROCm] More Sparse UTs enablement and more hipification mappings. (#78939)
Enables:

 test_bmm_cuda_float64
 test_bmm_deterministic_cuda_float64
 test_csr_matvec_cuda_complex128
 test_csr_matvec_cuda_complex64
 test_csr_matvec_cuda_float32
 test_csr_matvec_cuda_float64

To enable the above tests had to add some more hip mappings for the hipification process.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78939
Approved by: https://github.com/pruthvistony, https://github.com/malfet
2022-08-23 13:54:09 +00:00
Andrew M. James
eebcb9117a Fix BSR->Dense Batched Bug (#82120)
A todo in the tests which should have been removed and addressed before the initial PR landed was left, and so left holes in testing BSR-> Dense. This addresses the underlying issue and removes the hole in test coverage. #8071 Introduces more comprehensive test coverage for sparse compressed <-> Dense conversion in general.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82120
Approved by: https://github.com/nikitaved, https://github.com/bhosmer
2022-08-06 02:24:20 +00:00
Andrew M. James
0e0dfaa057 Add support for select of batch dims for all sparse compressed formats. (#82119)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82119
Approved by: https://github.com/nikitaved, https://github.com/bhosmer
2022-08-06 02:24:20 +00:00
Nikita Shulga
d80fe49de0 [Reland] Add py-3.10 config (#82329)
This is a re-land of #81372 and #81233 with the exception that it does not force the range-checks on older Python runtime versions and as such should not affect the internal workloads, which were the reason for revert, see https://github.com/pytorch/pytorch/pull/81372#issuecomment-1187516464

- [Py3.10] Allow floats to be imported as Long (#81372)
- [CI] Move CUDA-11.6 to Python-3.10 configuration (#81233)
- Don't do anything about range checks for pre-py3.10
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82329
Approved by: https://github.com/kit1980
2022-07-27 20:22:47 +00:00
Edward Z. Yang
7f7c81c5f9 Add empty_like support for sparse_csc/bsr/bsc (#82310)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82310
Approved by: https://github.com/amjames, https://github.com/nikitaved
2022-07-27 18:59:07 +00:00
PyTorch MergeBot
ec1b3a45ad Revert "[Py3.10] Allow floats to be imported as Long (#81372)"
This reverts commit 69d73345a2.

Reverted https://github.com/pytorch/pytorch/pull/81372 on behalf of https://github.com/DanilBaibak due to Break internal build
2022-07-18 14:55:13 +00:00
Nikita Shulga
69d73345a2 [Py3.10] Allow floats to be imported as Long (#81372)
Thus avoiding `TypeError: 'float' object cannot be interpreted as an integer` when trying to create integer tensor from floating point values

Use `c10::checked_convert` to detect overflows during tensor construction from scalars. Modify sparse_csr test that violated this rule

Fixes #69319

Tested in #81233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81372
Approved by: https://github.com/ezyang, https://github.com/ngimel
2022-07-15 22:57:58 +00:00
Nikita Vedeneev
880b972841 More efficient indices validations for compressed sparse formats. (#81108)
As per title.

Some of the features:
- native kernels both for the CPU and CUDA without device syncs.
- If needed, invariant checks 5.1 - 5.5 could be improved to utilize vectorization. This will require implementing a conversion `Vectorized -> bool`. That's a follow-up.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81108
Approved by: https://github.com/amjames, https://github.com/pearu, https://github.com/cpuhrsch
2022-07-14 20:36:18 +00:00
Pearu Peterson
d50f4a3c24 Support sparse/dense_dim for Compressed Sparse tensors (#80901)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80901
Approved by: https://github.com/cpuhrsch, https://github.com/nikitaved
2022-07-08 15:49:35 +00:00
Pearu Peterson
d266256621 Support compressed sparse tensors with dense dimensions (#80565)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80565
Approved by: https://github.com/cpuhrsch
2022-07-07 16:21:12 +00:00
PyTorch MergeBot
682c0d2615 Use segment/scatter_reduce to support masked reductions on sparse CSR tensors (mean, amax, amin) (fp only) (#78918)
Follows design  [here](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/sparse/SparseCsrTensorMath.cpp#L804-L837) and [here](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/sparse/SparseCsrTensorMath.cpp#L885-L928) from SparseCsrTensorMath.cpp (which has already been used to implement sum/prod) but use `segment_reduce`/`scatter_reduce` for reduction step

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78918
Approved by: https://github.com/cpuhrsch
2022-06-30 14:11:53 +00:00
Andrew M. James
9e3677f85d Add support for BSR <-> Strided Conversion (#80354)
Supersedes #78303

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80354
Approved by: https://github.com/cpuhrsch
2022-06-27 21:09:09 +00:00
Pearu Peterson
cde365a7cd Validate Sparse Compressed tensor inputs (#79385)
The validation includes regular tensor inputs, batched tensor inputs, as well as hybrid tensor inputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79385
Approved by: https://github.com/nikitaved, https://github.com/cpuhrsch
2022-06-27 17:19:54 +00:00
Nikita Vedeneev
9ad91cc6e0 optimize to_dense for CSC (#79635)
As per title. Previously it was done via converting to COO.
A better approach could be using `dense.out_`, but `sparse_csc` is yet forbidden.
And are we fine with implementing very critical operations like `add` via transpositions?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79635
Approved by: https://github.com/cpuhrsch
2022-06-21 16:52:16 +00:00
jpvillam
aff7eef476 [ROCm] Enable some sparse tests on ROCm (#77877)
Enabling:
test_sampled_addmm_errors_cuda_complex128
test_sampled_addmm_errors_cuda_complex64
test_sampled_addmm_errors_cuda_float32
test_sampled_addmm_errors_cuda_float64
test_sparse_add_cuda_complex128
test_sparse_add_cuda_complex64

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77877
Approved by: https://github.com/pruthvistony, https://github.com/malfet
2022-06-14 21:11:35 +00:00
Pearu Peterson
fb6749d977 Support CSC/BSR/BSC inputs to unary zero-preserving functions.
In addition, enable testing masked reductions in sparse compressed consistency check.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78173

Approved by: https://github.com/cpuhrsch
2022-06-09 09:46:34 +00:00
Pearu Peterson
8c88a55d44 Fix sparse BSR tensor validation.
Also adds bits to support dense dimensions for Sparse Compressed tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78359

Approved by: https://github.com/cpuhrsch
2022-05-27 13:26:35 +00:00
Christian Puhrsch
b9fb940dec Conversion between SparseBsr and Strided (#78025)
Adds conversion between the strided and SparseBsr layout

[Based on code by @bhosmer!](https://colab.research.google.com/drive/1NHWti04TU269dzbRjLfxGxVlzZWo1XLo?usp=sharing)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78025
Approved by: https://github.com/pearu, https://github.com/jbschlosser
2022-05-25 15:03:35 +00:00
Christian Puhrsch
a8467de6fa Guard test_sparse_csr.test_mm on CUDA11+ (#77965)
Fixes #77944

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77965
Approved by: https://github.com/albanD, https://github.com/malfet
2022-05-20 16:16:28 +00:00
Christian Puhrsch
ec290949aa Change transpose to return CSC when given CSR, adjust addmm, addmv, mm (#77615)
Changes transpose to return CSC when given CSR and adds CSC support via to_sparse_csr to addmm and addmv.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77615
Approved by: https://github.com/pearu, https://github.com/albanD
2022-05-19 14:17:55 +00:00
Pearu Peterson
8b5f11c61e Support copy_ for Sparse Compressed tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77605

Approved by: https://github.com/cpuhrsch
2022-05-18 21:22:19 +00:00
Christian Puhrsch
e10a002e52 2D Strided to/from CSC, COO to CSC, CSC to CSC conversion. (#77521)
Adds
- to_sparse_csc for strided input
- to_sparse_csc for COO input
- CSC to strided
- CSC to CSR
- CSC to CSC

Uses SciPy as a reference

Follow up work is changing transpose to return CSC when passed CSR and the resulting ripples through our matmul operations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77521
Approved by: https://github.com/pearu, https://github.com/anjali411
2022-05-18 14:49:11 +00:00
Pearu Peterson
ccc991ba29 Support str for Sparse Compressed tensors
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77530

Approved by: https://github.com/cpuhrsch
2022-05-18 12:58:54 +00:00
Pearu Peterson
dc882ed33d Add Sparse Compressed tensor support to torch.clone
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77512

Approved by: https://github.com/cpuhrsch
2022-05-17 16:29:41 +00:00
PyTorch MergeBot
0d1329c4ea Revert "Add Sparse Compressed tensor support to torch.clone"
This reverts commit 942f04172a.

Reverted https://github.com/pytorch/pytorch/pull/77512 on behalf of https://github.com/atalman
2022-05-17 14:26:52 +00:00
Pearu Peterson
942f04172a Add Sparse Compressed tensor support to torch.clone
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77512

Approved by: https://github.com/cpuhrsch
2022-05-17 07:32:46 +00:00
PyTorch MergeBot
f1c8e8fa4e Revert "Add Sparse Compressed tensor support to torch.clone"
This reverts commit 20ba6e6935.

Reverted https://github.com/pytorch/pytorch/pull/77512 on behalf of https://github.com/malfet
2022-05-17 00:31:49 +00:00
Christian Puhrsch
89e32f52c7 Change test_sparse_csr test signatures (#77595)
Some consuming tools aren't equipped to split on the "(" and ")" induced by passing tuples to parametrize.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77595
Approved by: https://github.com/malfet
2022-05-17 00:24:08 +00:00
Pearu Peterson
20ba6e6935 Add Sparse Compressed tensor support to torch.clone
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77512

Approved by: https://github.com/cpuhrsch
2022-05-16 22:21:49 +00:00
Pearu Peterson
d76efed578 Add Sparse CSC support to torch.empty
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77508

Approved by: https://github.com/cpuhrsch
2022-05-16 18:53:56 +00:00
Christian Puhrsch
8c608a79b4 Compressed sparse layout conversion stubs (#77489)
This PR unifies sparse layout conversions into a single location and adds stubs to raise a Runtime error for unsupported conversions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77489
Approved by: https://github.com/pearu, https://github.com/mruberry
2022-05-16 18:37:42 +00:00
Pearu Peterson
88205886d7 Add ccol_indices and row_indices methods.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77503

Approved by: https://github.com/cpuhrsch
2022-05-16 00:23:54 +00:00
Christian Puhrsch
289192199a Add to_sparse_bsr (#77366)
Conversion function of CSR to BSR.

Follow up work includes
- Conversion from strided, COO, CSC, BSC
- autograd
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77366
Approved by: https://github.com/IvanYashchuk, https://github.com/mikaylagawarecki
2022-05-13 20:16:03 +00:00
Christian Puhrsch
b250759242 mul(dense, csr), mul(csr, dense) via sparse_mask_csr (#77177)
This adds basic coverage, but can be easily made more efficient by providing a native implementation.

Follow up work includes supporting CSR gradients for strided Tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77177
Approved by: https://github.com/nikitaved, https://github.com/mikaylagawarecki
2022-05-12 23:56:10 +00:00
Ivan Yashchuk
09be44de7b Sparse BSR: Enable addmm, addmv, triangular_solve for BSR layout (#77255)
This PR enables `addmm`, `addmv`, `triangular_solve` functions for tensors with `torch.sparse_bsr` layout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77255
Approved by: https://github.com/cpuhrsch
2022-05-12 08:31:44 +00:00
Ivan Yashchuk
d1beda53e8 Sparse CSR CUDA: add batched support for torch.sparse.sampled_addmm
This PR adds a forloop around cuSPARSE calls to support batched inputs.
cuSPARSE function itself doesn't support batched inputs yet.
`mat1` and `mat2` must have the same batch shape. It's allowed to pass
`self` as a single matrix when `mat1` and `mat2` are batched.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77243

Approved by: https://github.com/cpuhrsch
2022-05-12 08:23:38 +00:00
Ivan Yashchuk
545d90f032 Sparse CSR: enable autograd for torch.sparse.addmm and torch.sparse.mm
This PR updates the derivative rule for `torch.sparse.addmm` to be
working with CSR sparse matrix. Notably `torch.sparse.sampled_addmm` is
used in the backward function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76591

Approved by: https://github.com/cpuhrsch
2022-05-11 18:57:40 +00:00
PyTorch MergeBot
f94abd59f7 Revert "Sparse CSR: enable autograd for torch.sparse.addmm and torch.sparse.mm"
This reverts commit 721a8ca697.

Reverted https://github.com/pytorch/pytorch/pull/76591 on behalf of https://github.com/janeyx99
2022-05-10 13:21:46 +00:00
Ivan Yashchuk
721a8ca697 Sparse CSR: enable autograd for torch.sparse.addmm and torch.sparse.mm
This PR updates the derivative rule for `torch.sparse.addmm` to be
working with CSR sparse matrix. Notably `torch.sparse.sampled_addmm` is
used in the backward function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76591

Approved by: https://github.com/cpuhrsch
2022-05-10 08:44:55 +00:00
Ivan Yashchuk
3df0140cbd Sparse CSR: Fix sampled_addmm for noncontiguous inputs and fix block sparse triangular solve
`torch.sparse.sampled_addmm` was incorrect for noncontiguous inputs on CUDA.
Unfortnately, it was overlooked in the tests that noncontiguous inputs
are not tested properly because 1x5, 5x1 shapes were used.

Block sparse triangular solver on CUDA could return incorrect results if
there's a zero on the diagonal in the sparse matrix. Now it returns nan.
Tests also revealed that unitriangular=True flag is not working
correctly on CPU in some cases. That part needs more investigation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76590

Approved by: https://github.com/cpuhrsch
2022-05-05 09:00:48 +00:00
Ivan Yashchuk
1335512056 Sparse CSR: Add CPU fallback for sampled_addmm
`torch.sparse.sampled_addmm` function is used in backward for
`torch.sparse.addmm` and `torch.sparse.mm` therefore we need a CPU
implementation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76589

Approved by: https://github.com/cpuhrsch
2022-05-04 21:30:43 +00:00
Pearu Peterson
436a7be059 Factory functions for sparse CSC, BSR, and BSC tensors
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76634

Tests for Sparse Compressed factory functions

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76746

Approved by: https://github.com/cpuhrsch
2022-05-04 03:30:41 +00:00
Ivan Yashchuk
d7db6a7b02 Sparse CSR: Add backward for torch.sparse.sampled_addmm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68084

Approved by: https://github.com/cpuhrsch
2022-05-02 17:58:20 +00:00
Ivan Yashchuk
407e8eba8c Enable simple indexing into CSR tensor, add torch.select for CSR
This PR implements `torch.select` for CSR tensors. Currently, it's not possible to select rows or columns for batched CSR. The non-batched case works fine by converting to COO and calling select. Initially, I implemented raw manipulations of indices but converting to COO is only slightly slower and more readable.

This PR also enables indexing into batched CSR tensor with `[x, y, z]`. Assigning is disabled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76228
Approved by: https://github.com/cpuhrsch
2022-04-23 02:36:03 +00:00
arindamroy-eng
7478ce187a ROCM:Unskip more tests for ROCM5.0
Re-enabling more tests which are working on ROCM5.0

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75353
Approved by: https://github.com/ezyang
2022-04-19 19:45:55 +00:00
Ivan Yashchuk
bba4780232 Enable autograd wrt sparse CSR tensors
This pull request enables accumulating gradients for the CSR tensor.
Functions that work and are tested:
- tensor.abs()
- tensor.neg()
- tensor.conj_physical()
- torch.addmm

`torch.mm` also works, but tests will be added later.

In addition, this PR adds throwing an error when trying to access strides, storage, and contiguity info on a CSR tensor.

`tensor.to_sparse_csr().to_sparse_csr()` was failing and now fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75435
Approved by: https://github.com/cpuhrsch
2022-04-19 18:42:45 +00:00
Pearu Peterson
e9791cd8c9 Validate Sparse Compressed tensor arguments
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75946

Approved by: https://github.com/cpuhrsch
2022-04-18 02:21:22 +00:00
Yukio Siraichi
22a10ce513 Port cat kernel to structured kernels.
Tracking issue: #55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68640

Approved by: https://github.com/ezyang
2022-04-14 17:49:43 +00:00
Ivan Yashchuk
3f1351d1cf Disable strides and contiguity for CSR tensors
This pull request adds throwing an error when trying to access the strides, storage, and contiguity info of a CSR tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75499
Approved by: https://github.com/cpuhrsch
2022-04-08 23:15:19 +00:00
Pearu Peterson
e61b2e12e1 Support masked sum on CSR tensors [CPU, CUDA]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72633

Approved by: https://github.com/cpuhrsch
2022-04-08 20:07:18 +00:00
PyTorch MergeBot
31ed77b769 Revert "Support masked sum on CSR tensors [CPU, CUDA]"
This reverts commit 5c28216aea.

Reverted https://github.com/pytorch/pytorch/pull/72633 on behalf of https://github.com/b0noI
2022-04-07 23:34:58 +00:00
Ivan Yashchuk
c7ae23b50e Extend CSR constructor to support batched indices and values
This is the first portion of changes required to enable Batched CSR format described in https://github.com/pytorch/pytorch/issues/60854#batched-CSR-computation.

Currently, only the same batch shape for indices and values is allowed. In the future, we could enable "broadcasting" of indices and batched values, as done in xFormers (dd96b8d8be/xformers/components/attention/_sputnik_sparse.py (L441)).

This PR adds possibility to construct a batched CSR matrix with `torch.sparse_csr_tensor` and this batched CSR can be converted to a dense tensor with a `.to_dense()` call.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74542
Approved by: https://github.com/cpuhrsch
2022-04-07 17:10:52 +00:00
Pearu Peterson
5c28216aea Support masked sum on CSR tensors [CPU, CUDA]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72633

Approved by: https://github.com/cpuhrsch
2022-04-07 17:08:35 +00:00
PyTorch MergeBot
6d832a7a20 Revert "Extend CSR constructor to support batched indices and values"
This reverts commit eead599039.

Reverted https://github.com/pytorch/pytorch/pull/74542 on behalf of https://github.com/b0noI
2022-04-05 21:39:34 +00:00
Christian Puhrsch
f2a4d49174 torch.mm(dense, sparse_csr)
Fixes #68621
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73686
Approved by: https://github.com/IvanYashchuk, https://github.com/malfet
2022-04-05 17:05:37 +00:00
Ivan Yashchuk
eead599039 Extend CSR constructor to support batched indices and values
This is the first portion of changes required to enable Batched CSR format described in https://github.com/pytorch/pytorch/issues/60854#batched-CSR-computation.

Currently, only the same batch shape for indices and values is allowed. In the future, we could enable "broadcasting" of indices and batched values, as done in xFormers (dd96b8d8be/xformers/components/attention/_sputnik_sparse.py (L441)).

This PR adds possibility to construct a batched CSR matrix with `torch.sparse_csr_tensor` and this batched CSR can be converted to a dense tensor with a `.to_dense()` call.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74542
Approved by: https://github.com/cpuhrsch
2022-04-04 22:09:44 +00:00
PyTorch MergeBot
f6b9a1d4fb Revert "Support masked sum on CSR tensors [CPU, CUDA]"
This reverts commit cda3f586d0.

Reverted https://github.com/pytorch/pytorch/pull/72633 on behalf of https://github.com/janeyx99
2022-04-04 22:06:19 +00:00
Pearu Peterson
cda3f586d0 Support masked sum on CSR tensors [CPU, CUDA]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72633

Approved by: https://github.com/cpuhrsch
2022-04-04 19:23:45 +00:00
Nikita Shulga
bfac65dfe5
[testing] Update dispatch macros (#74977)
This PR is reland of #74289 
Co-authored-by: Khushi Agrawal <khushiagrawal411@gmail.com>
2022-03-30 14:13:21 -07:00
PyTorch MergeBot
cc23725e89 Revert "Extend CSR constructor to support batched indices and values"
This reverts commit c074a53002.

Reverted https://github.com/pytorch/pytorch/pull/74542 on behalf of https://github.com/malfet
2022-03-30 19:54:26 +00:00
PyTorch MergeBot
2e4152b118 Revert "[testing] Update dispatch macros"
This reverts commit eed19a0f38.

Reverted https://github.com/pytorch/pytorch/pull/74289 on behalf of https://github.com/malfet
2022-03-30 19:52:37 +00:00
Khushi Agrawal
eed19a0f38 [testing] Update dispatch macros
Hi,
This PR is the follow-up PR of #71561. (the previous PR had a couple of merge conflicts and was reverted, this PR resolves that).
Please take a look. Thanks!

cc: @pmeier @mruberry @kshitij12345
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74289
Approved by: https://github.com/pmeier, https://github.com/mruberry
2022-03-30 16:10:16 +00:00
Ivan Yashchuk
c074a53002 Extend CSR constructor to support batched indices and values
This is the first portion of changes required to enable Batched CSR format described in https://github.com/pytorch/pytorch/issues/60854#batched-CSR-computation.

Currently, only the same batch shape for indices and values is allowed. In the future, we could enable "broadcasting" of indices and batched values, as done in xFormers (dd96b8d8be/xformers/components/attention/_sputnik_sparse.py (L441)).

This PR adds possibility to construct a batched CSR matrix with `torch.sparse_csr_tensor` and this batched CSR can be converted to a dense tensor with a `.to_dense()` call.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74542
Approved by: https://github.com/cpuhrsch
2022-03-29 21:20:25 +00:00
Christian Puhrsch
568e02dcd7 Support sum(sparse_csr)
Basic support for summation of CSR. ~~Generalizes structured torch.sum to also support CSR.~~

Follow up work:
- Autograd support
- OpInfo integration
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74766
Approved by: https://github.com/ezyang
2022-03-29 18:44:02 +00:00
Christian Puhrsch
edf2deb81e Add private conversion function from CSR to block CSR
This PR adds a private function that converts a CSR Tensor into a [scipy-style block CSR Tensor](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.bsr_matrix.html#scipy.sparse.bsr_matrix).

It uses the scipy CSR to BSR conversion routines (and credits them accordingly).

The main purpose of this function is to easily create a block CSR Tensor for matrix multiplication.

Follow up work includes
- Blocksize support for sparse_csr_tensor
- Parallel CPU kernel
- CUDA kernels
- Faster arg sanitization
- Benchmarking of cuSPARSE backend
- Dense to/from block CSR
- Autograd support
- Column-major blocks
- Block CSR to CSR conversion
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71582
Approved by: https://github.com/IvanYashchuk, https://github.com/albanD
2022-03-25 21:22:15 +00:00
Christian Puhrsch
7fe0b6a5cd mul(sparse_csr, sparse_csr) using mul(sparse, sparse)
Basic fallback implementation. Let's make this faster once used.

NOTE: This is stacked on top of https://github.com/pytorch/pytorch/pull/74294
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74266
Approved by: https://github.com/pearu, https://github.com/malfet
2022-03-25 17:10:33 +00:00
Christian Puhrsch
807b2e190b Move to_sparse_csr to C++
Allows use of to_sparse_csr from C++
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74294
Approved by: https://github.com/ngimel, https://github.com/malfet
2022-03-23 17:17:45 +00:00
Christian Puhrsch
a346a18150 Use assertEqual consistently in test_sparse_csr.py
Let's use the provided comparison infrastructure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74264
Approved by: https://github.com/IvanYashchuk, https://github.com/albanD
2022-03-16 15:19:41 +00:00
Nikita Shulga
ef066f0832 Revert D34856571: [pytorch][PR] Replace get_all_ type macros with the ATen dispatch macros.
Test Plan: revert-hammer

Differential Revision:
D34856571 (3ded7b1da3)

Original commit changeset: 0dca038bcad5

Original Phabricator Diff: D34856571 (3ded7b1da3)

fbshipit-source-id: 594553fa0b710d78beba59d5d2b646f1f1270386
(cherry picked from commit 8090eb9b12dcf452a9e7dc01792a66fb91b563b6)
2022-03-15 22:07:11 +00:00
Khushi Agrawal
3ded7b1da3 Replace get_all_ type macros with the ATen dispatch macros. (#71561)
Summary:
Hi, Team!
The PR is motivated from https://github.com/pytorch/pytorch/pull/71153#discussion_r782446738. It aims to replace `get_all` type macros with the ATen dispatch macros.

The files it iterates over are: (Thanks, Lezcano, for the idea!!)

<details>
<summary>

`test/test_autograd.py`</summary>

<p>

```python
43:from torch.testing._internal.common_dtype import get_all_dtypes
8506:        floating_dt = [dt for dt in get_all_dtypes() if dt.is_floating_point]
```

</p>
</details>

<details>
<summary>

`test/test_binary_ufuncs.py`</summary>

<p>

```python
26:    all_types_and_complex_and, integral_types_and, get_all_dtypes, get_all_int_dtypes, get_all_math_dtypes,
27:    get_all_complex_dtypes, get_all_fp_dtypes,
935:    dtypes(*get_all_dtypes(include_bool=False, include_complex=False))
1035:    dtypes(*get_all_dtypes(
1488:    dtypes(*(get_all_dtypes(include_bool=False, include_bfloat16=False)))
1879:    dtypes(*product(get_all_dtypes(include_complex=False), get_all_dtypes(include_complex=False)))
1887:    dtypes(*(get_all_int_dtypes() + [torch.bool]))
1913:    dtypes(*(get_all_fp_dtypes()))
1941:    dtypes(*(get_all_fp_dtypes()))
1977:    dtypes(*product(get_all_complex_dtypes(), get_all_dtypes()))
2019:    dtypes(*product(get_all_fp_dtypes(), get_all_fp_dtypes()))
2048:    dtypes(*get_all_dtypes())
2110:    dtypes(*product(get_all_dtypes(include_complex=False),
2111:                     get_all_dtypes(include_complex=False)))
2128:            types = [torch.bool, torch.bfloat16] + get_all_int_dtypes()
2173:        if dtypes[1] in get_all_fp_dtypes():
2178:    dtypes(*product(get_all_fp_dtypes(),
2179:                     get_all_fp_dtypes()))
2260:    dtypesIfCUDA(*set(get_all_math_dtypes('cuda')) - {torch.complex64, torch.complex128})
2261:    dtypes(*set(get_all_math_dtypes('cpu')) - {torch.complex64, torch.complex128})
2273:    dtypesIfCUDA(*set(get_all_math_dtypes('cuda')) - {torch.complex64, torch.complex128})
2274:    dtypes(*set(get_all_math_dtypes('cpu')) - {torch.complex64, torch.complex128})
2307:    dtypes(*get_all_math_dtypes('cpu'))
2319:    dtypes(*get_all_fp_dtypes(include_bfloat16=False))
2331:    dtypes(*get_all_int_dtypes())
2356:    dtypes(*get_all_dtypes(include_bfloat16=False, include_bool=False, include_complex=False))
2393:        if dtype in get_all_int_dtypes():
2614:    dtypes(*get_all_dtypes())
2624:    dtypes(*tuple(itertools.combinations_with_replacement(get_all_dtypes(), 2)))
2806:    dtypes(*list(product(get_all_dtypes(include_complex=False),
2807:                          get_all_dtypes(include_complex=False))))
2866:    dtypes(*list(product(get_all_complex_dtypes(),
2867:                          get_all_complex_dtypes())))
2902:    dtypes(*product(get_all_dtypes(), get_all_dtypes()))
2906:    dtypes(*product(get_all_dtypes(), get_all_dtypes()))
2910:    dtypes(*product(get_all_dtypes(), get_all_dtypes()))
3019:        dtypes = [torch.float, torch.double] + get_all_complex_dtypes()
3221:    dtypes(*get_all_dtypes(include_complex=False))
3407:    dtypes(*list(product(get_all_dtypes(include_bool=False),
3408:                          get_all_dtypes(include_bool=False))))
3504:    dtypes(*product(get_all_dtypes(include_complex=False, include_bfloat16=False),
3505:                     get_all_dtypes(include_complex=False, include_bfloat16=False)))
3516:            if x.dtype in get_all_int_dtypes() + [torch.bool]:
3643:    dtypes(*product(get_all_dtypes(include_complex=False,
3645:                     get_all_dtypes(include_complex=False,
```

</p>
</details>

<details>
<summary>

`test/test_complex.py`</summary>

<p>

```python
6:from torch.testing._internal.common_dtype import get_all_complex_dtypes
11:    dtypes(*get_all_complex_dtypes())
```

</p>
</details>

<details>
<summary>

`test/test_foreach.py`</summary>

<p>

```python
18:    get_all_dtypes, get_all_int_dtypes, get_all_complex_dtypes, get_all_fp_dtypes,
142:            if dtype in get_all_int_dtypes():
179:            disable_fastpath = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool]
201:            disable_fastpath = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool]
205:                disable_fastpath |= dtype in get_all_int_dtypes() + [torch.bool]
211:                disable_fastpath |= dtype not in get_all_complex_dtypes()
241:                bool_int_div = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool]
246:                    disable_fastpath |= dtype in get_all_int_dtypes() + [torch.bool]
248:                    disable_fastpath |= dtype not in get_all_complex_dtypes()
250:                    disable_fastpath |= True and dtype not in get_all_complex_dtypes()
307:        disable_fastpath = dtype in get_all_int_dtypes() + [torch.bool]
365:        if opinfo.name == "_foreach_abs" and dtype in get_all_complex_dtypes():
376:    ops(foreach_unary_op_db, dtypes=get_all_dtypes())
393:         dtypes=get_all_dtypes(include_half=True, include_bfloat16=True, include_complex=False))
401:    ops(foreach_minmax_op_db, dtypes=get_all_fp_dtypes(include_bfloat16=True, include_half=True))
426:            if ord in (1, 2) and dtype in torch.testing.get_all_fp_dtypes():
439:    dtypes(*get_all_dtypes())
449:    ops(foreach_binary_op_db, dtypes=get_all_dtypes())
481:    ops(foreach_binary_op_db, dtypes=get_all_dtypes())
536:            if dtype in get_all_int_dtypes() + [torch.bool] and foreach_op == torch._foreach_div:
545:    ops(foreach_binary_op_db, dtypes=get_all_dtypes())
637:    ops(foreach_pointwise_op_db, allowed_dtypes=get_all_fp_dtypes(include_half=False, include_bfloat16=False))
```

</p>
</details>

<details>
<summary>

`test/test_linalg.py`</summary>

<p>

```python
29:    all_types, floating_types, floating_and_complex_types, get_all_dtypes, get_all_int_dtypes, get_all_complex_dtypes,
30:    get_all_fp_dtypes,
111:    dtypes(*(get_all_dtypes()))
794:        float_and_complex_dtypes = get_all_fp_dtypes() + get_all_complex_dtypes()
807:    dtypes(*(get_all_int_dtypes()))
828:    dtypes(*(get_all_fp_dtypes() + get_all_complex_dtypes()))
841:        if dtype in get_all_complex_dtypes():
844:    dtypes(*itertools.product(get_all_dtypes(),
845:                               get_all_dtypes()))
855:        for dtypes0, dtypes1, dtypes2 in product(get_all_dtypes(), repeat=3):
5607:                  *get_all_fp_dtypes(include_half=not CUDA9, include_bfloat16=(CUDA11OrLater and SM53OrLater)))
5608:    dtypes(*(set(get_all_dtypes()) - {torch.half, torch.bool}))
5644:    dtypes(*(get_all_complex_dtypes() + get_all_fp_dtypes()))
6255:    dtypesIfCUDA(*get_all_complex_dtypes(),
6256:                  *get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater)),
6292:    dtypesIfCUDA(*get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater))))
6323:    dtypesIfCUDA(*get_all_complex_dtypes(),
6324:                  *get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater))))
6325:    dtypes(*get_all_complex_dtypes(), *get_all_fp_dtypes())
6358:    dtypesIfCUDA(*([torch.float, torch.double] + get_all_complex_dtypes()))
6556:    dtypes(*get_all_fp_dtypes(), *get_all_complex_dtypes())
6668:    dtypes(*get_all_fp_dtypes(), *get_all_complex_dtypes())
6741:    dtypes(*get_all_fp_dtypes(), *get_all_complex_dtypes())
```

</p>
</details>

<details>
<summary>

`test/test_nn.py`</summary>

<p>

```python
37:from torch.testing._internal.common_dtype import integral_types, get_all_fp_dtypes, get_all_math_dtypes
50:    onlyNativeDeviceTypes, deviceCountAtLeast, largeTensorTest, expectedFailureMeta, skipMeta, get_all_device_types, \
8862:                for device in get_all_device_types():
9629:            for dt1 in get_all_math_dtypes(device):
9630:                for dt2 in get_all_math_dtypes(device):
9631:                    for dt3 in get_all_math_dtypes(device):
9648:            for input_dtype in get_all_math_dtypes(device):
9664:            for input_dtype in get_all_math_dtypes(device):
13015:    dtypes(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM))
13034:    dtypes(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM))
13159:    dtypes(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM))
17400:    dtypesIfCUDA(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM))
17768:    dtypesIfCUDA(*get_all_fp_dtypes())
17773:    dtypesIfCUDA(*get_all_fp_dtypes())
17778:    dtypesIfCUDA(*get_all_fp_dtypes())
17783:    dtypesIfCUDA(*get_all_fp_dtypes())
17788:    dtypesIfCUDA(*get_all_fp_dtypes())
17793:    dtypesIfCUDA(*get_all_fp_dtypes())
17798:    dtypesIfCUDA(*get_all_fp_dtypes())
17963:    dtypesIfCUDA(*get_all_fp_dtypes())
17977:    dtypesIfCUDA(*get_all_fp_dtypes())
18684:    def test_cross_entropy_loss_prob_target_all_reductions(self, device):
```

</p>
</details>

<details>
<summary>

`test/test_numpy_interop.py`</summary>

<p>

```python
12:from torch.testing._internal.common_dtype import get_all_dtypes
399:    dtypes(*get_all_dtypes())
```

</p>
</details>

<details>
<summary>

`test/test_ops.py`</summary>

<p>

```python
12:from torch.testing._internal.common_dtype import floating_and_complex_types_and, get_all_dtypes
86:        for dtype in get_all_dtypes():
```

</p>
</details>

<details>
<summary>

`test/test_reductions.py`</summary>

<p>

```python
16:    get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_complex_dtypes, get_all_fp_dtypes,
360:         allowed_dtypes=get_all_dtypes(include_bfloat16=False))
366:         allowed_dtypes=get_all_dtypes(include_bfloat16=False))
394:         allowed_dtypes=get_all_dtypes(include_bfloat16=False))
750:        for dtype in [dtype for dtype in get_all_math_dtypes('cpu') if dtype != torch.float16]:
1404:    dtypes(*get_all_dtypes(include_bool=False, include_complex=False))
1457:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1458:              get_all_complex_dtypes()))
1465:            return dtype in get_all_int_dtypes()
1494:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False)))
1501:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False)))
1507:    dtypes(*(get_all_complex_dtypes()))
1514:        dtypes = list(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False))
1523:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False)))
1531:        if dtype in get_all_fp_dtypes():
1608:    dtypes(*(get_all_dtypes(include_half=True, include_bfloat16=False,
1837:    dtypes(*get_all_dtypes(include_bool=False, include_complex=False))
1855:    dtypes(*(set(get_all_dtypes(include_bool=False, include_complex=False)) - {torch.uint8}))
3219:        for dtype in get_all_dtypes(include_half=True, include_bfloat16=False,
```

</p>
</details>

<details>
<summary>

`test/test_serialization.py`</summary>

<p>

```python
26:from torch.testing._internal.common_dtype import get_all_dtypes
586:        for device, dtype in product(devices, get_all_dtypes()):
589:            for other_dtype in get_all_dtypes():
```

</p>
</details>

<details>
<summary>

`test/test_shape_ops.py`</summary>

<p>

```python
18:from torch.testing._internal.common_dtype import get_all_dtypes
230:    dtypes(*get_all_dtypes(include_complex=False, include_bool=False, include_half=False,
232:    dtypesIfCUDA(*get_all_dtypes(include_complex=False, include_bool=False, include_bfloat16=False))
344:    dtypes(*get_all_dtypes())
443:    dtypes(*get_all_dtypes())
461:    dtypes(*get_all_dtypes())
570:    dtypes(*get_all_dtypes(include_complex=False))
```

</p>
</details>

<details>
<summary>

`test/test_sort_and_select.py`</summary>

<p>

```python
12:    all_types, all_types_and, floating_types_and, get_all_dtypes, get_all_int_dtypes, get_all_fp_dtypes,
136:    dtypes(*set(get_all_dtypes()) - {torch.bool, torch.complex64, torch.complex128})
231:    dtypes(*set(get_all_dtypes()) - {torch.bool, torch.complex64, torch.complex128})
296:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
647:    dtypesIfCUDA(*get_all_fp_dtypes())
678:    dtypesIfCUDA(*(get_all_dtypes(include_complex=False,
682:    dtypes(*(get_all_dtypes(include_complex=False, include_bool=False, include_half=False, include_bfloat16=False)))
739:    dtypesIfCPU(*set(get_all_dtypes()) - {torch.complex64, torch.complex128})
740:    dtypes(*set(get_all_dtypes()) - {torch.bfloat16, torch.complex64, torch.complex128})
799:    dtypesIfCPU(*set(get_all_dtypes()) - {torch.complex64, torch.complex128})
800:    dtypes(*set(get_all_dtypes()) - {torch.bfloat16, torch.complex64, torch.complex128})
```

</p>
</details>

<details>
<summary>

`test/test_sparse.py`</summary>

<p>

```python
20:from torch.testing import get_all_complex_dtypes, get_all_fp_dtypes
29:    floating_and_complex_types, floating_and_complex_types_and, get_all_dtypes, get_all_int_dtypes,
1963:            return dtype in get_all_int_dtypes()
1994:    dtypes(*get_all_dtypes(include_bool=False, include_half=False,
2103:            return dtype in get_all_int_dtypes()
2138:    dtypes(*get_all_dtypes(include_bool=False, include_half=False,
2626:        all_sparse_dtypes = get_all_dtypes(include_complex=True)
2633:        all_sparse_dtypes = get_all_dtypes(include_complex=True)
3230:    dtypes(*get_all_complex_dtypes(),
3231:            *get_all_fp_dtypes(include_half=False, include_bfloat16=False))
3234:                  *get_all_fp_dtypes(
```

</p>
</details>

<details>
<summary>

`test/test_sparse_csr.py`</summary>

<p>

```python
7:from torch.testing import get_all_complex_dtypes, get_all_fp_dtypes, floating_and_complex_types, make_tensor
17:from torch.testing._internal.common_dtype import floating_types, get_all_dtypes
120:    dtypes(*get_all_dtypes())
133:    dtypes(*get_all_dtypes())
150:    dtypes(*get_all_dtypes())
180:    dtypes(*get_all_dtypes())
201:    dtypes(*get_all_dtypes())
210:    dtypes(*get_all_dtypes())
225:    dtypes(*get_all_dtypes())
244:    dtypes(*get_all_dtypes())
263:    dtypes(*get_all_dtypes())
285:    dtypes(*get_all_dtypes())
411:    dtypes(*get_all_dtypes())
482:    dtypes(*get_all_dtypes())
502:    dtypes(*get_all_dtypes())
562:    dtypes(*get_all_dtypes())
588:    dtypesIfCUDA(*get_all_complex_dtypes(),
589:                  *get_all_fp_dtypes(include_half=SM53OrLater, include_bfloat16=SM80OrLater))
745:    dtypesIfCUDA(*get_all_complex_dtypes(),
746:                  *get_all_fp_dtypes(include_half=SM53OrLater and TEST_CUSPARSE_GENERIC,
765:    dtypesIfCUDA(*get_all_complex_dtypes(),
766:                  *get_all_fp_dtypes(include_half=SM53OrLater and TEST_CUSPARSE_GENERIC,
801:                  *torch.testing.get_all_fp_dtypes(include_bfloat16=SM80OrLater,
841:                  *torch.testing.get_all_fp_dtypes(include_bfloat16=SM80OrLater,
1182:    dtypes(*get_all_dtypes())
1276:    dtypes(*get_all_dtypes(include_bool=False, include_half=False, include_bfloat16=False))
1286:    dtypes(*get_all_dtypes())
```

</p>
</details>

<details>
<summary>

`test/test_tensor_creation_ops.py`</summary>

<p>

```python
21:    onlyCUDA, skipCPUIf, dtypesIfCUDA, skipMeta, get_all_device_types)
23:    get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes
150:        for dt in get_all_dtypes():
160:        for dt in get_all_dtypes():
314:        dtypes = [dtype for dtype in get_all_dtypes() if dtype != torch.bfloat16]
1012:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1013:              get_all_complex_dtypes()))
1032:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1033:              get_all_complex_dtypes()))
1050:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1051:              get_all_complex_dtypes()))
1745:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
1779:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
1868:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
1926:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
1954:            do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, torch_device)
1956:            do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, None)
1957:            do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, torch_device)
2538:        for device in get_all_device_types():
2645:        for dtype in get_all_dtypes():
2678:    dtypes(*(get_all_fp_dtypes(include_half=False, include_bfloat16=False) +
2679:              get_all_complex_dtypes()))
2716:    dtypes(*get_all_fp_dtypes(include_half=False, include_bfloat16=False))
2827:            for dt in get_all_dtypes():
2913:    dtypes(*get_all_dtypes(include_bool=False, include_half=False))
2914:    dtypesIfCUDA(*get_all_dtypes(include_bool=False, include_half=True))
3028:    dtypes(*(get_all_fp_dtypes() + get_all_complex_dtypes()))
3033:    dtypes(*(get_all_fp_dtypes() + get_all_complex_dtypes()))
3074:    dtypes(*get_all_dtypes(include_bool=False, include_half=False, include_complex=False))
3075:    dtypesIfCUDA(*((get_all_int_dtypes() + [torch.float32, torch.float16, torch.bfloat16])
3077:                    else get_all_dtypes(include_bool=False, include_half=True, include_complex=False)))
3873:    dtypes(*get_all_dtypes())
3884:    dtypes(*get_all_dtypes(include_bool=False))
3916:            for other in get_all_dtypes():
3922:    dtypes(*get_all_dtypes())
3932:    dtypes(*get_all_dtypes(include_bool=False))
3955:    dtypes(*get_all_dtypes(include_bool=False))
3961:    dtypes(*get_all_dtypes(include_bool=False))
3965:    dtypes(*get_all_dtypes())
```

</p>
</details>

<details>
<summary>

`test/test_testing.py`</summary>

<p>

```python
25:from torch.testing._internal.common_dtype import get_all_dtypes
31:    dtypes(*(get_all_dtypes(include_half=True, include_bfloat16=False,
```

</p>
</details>

<details>
<summary>

`test/test_torch.py`</summary>

<p>

```python
51:    expectedAlertNondeterministic, get_all_device_types, skipXLA)
57:    get_all_fp_dtypes, get_all_int_dtypes, get_all_math_dtypes, get_all_dtypes, get_all_complex_dtypes
296:            for d in get_all_device_types():
323:            for device in get_all_device_types():
324:                for dt1 in get_all_dtypes():
325:                    for dt2 in get_all_dtypes():
343:            all_dtypes = get_all_dtypes()
350:            all_dtypes = get_all_dtypes()
781:            for dtype in get_all_dtypes():
986:            for device in get_all_device_types():
1017:            for device in get_all_device_types():
1018:                for dtype in get_all_math_dtypes(device):
2792:            for device in get_all_device_types():
3186:    dtypes(*get_all_dtypes())
3195:        for error_dtype in get_all_dtypes():
3203:    dtypes(*get_all_dtypes())
3212:        for error_dtype in get_all_dtypes():
4539:    dtypes(*get_all_fp_dtypes())
4545:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
4577:    dtypes(*get_all_fp_dtypes(include_half=False, include_bfloat16=False))
4578:    dtypesIfCPU(*(get_all_fp_dtypes(include_half=False, include_bfloat16=True)))
4579:    dtypesIfCUDA(*(get_all_fp_dtypes(include_bfloat16=False)))
4599:    dtypes(*(get_all_fp_dtypes(include_half=False, include_bfloat16=False)))
4600:    dtypesIfCPU(*(get_all_dtypes(include_half=False, include_bfloat16=False, include_complex=False)))
4601:    dtypesIfCUDA(*(get_all_dtypes(include_bfloat16=False, include_complex=False)))
4613:        for p_dtype in get_all_fp_dtypes(include_half=device.startswith('cuda'), include_bfloat16=False):
4628:    dtypes(*(get_all_fp_dtypes(include_half=False, include_bfloat16=False)))
4629:    dtypesIfCUDA(*(get_all_fp_dtypes(include_bfloat16=False)))
4640:    dtypes(*get_all_fp_dtypes())
4723:    dtypes(*get_all_fp_dtypes())
4735:    dtypes(*get_all_fp_dtypes(include_bfloat16=False))
4736:    dtypesIfCUDA(*get_all_fp_dtypes())
4747:    dtypes(*get_all_fp_dtypes())
4761:    dtypes(*get_all_fp_dtypes())
4771:    dtypes(*get_all_fp_dtypes())
4792:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
5302:    dtypes(*get_all_dtypes(include_bfloat16=False))
5322:    dtypes(*get_all_dtypes(include_half=False, include_bfloat16=False))
5323:    dtypesIfCPU(*get_all_dtypes(include_bfloat16=False))
5324:    dtypesIfCUDA(*get_all_dtypes(include_bfloat16=False))
5591:        for dt in get_all_dtypes():
5611:        for dt in get_all_dtypes():
5678:        for dt in get_all_dtypes():
5696:    dtypesIfCUDA(*set(get_all_math_dtypes('cuda')))
5697:    dtypes(*set(get_all_math_dtypes('cpu')))
5746:    dtypes(*get_all_dtypes())
5780:    dtypes(*get_all_dtypes())
5885:    dtypes(*get_all_dtypes())
5902:    dtypes(*get_all_dtypes())
5945:    dtypes(*get_all_dtypes())
5979:    dtypes(*get_all_dtypes(include_bool=False))
6049:    dtypes(*get_all_dtypes(include_bool=False))
6092:    dtypes(*(get_all_fp_dtypes(include_bfloat16=False, include_half=False) +
6093:              get_all_complex_dtypes()))
6094:    dtypesIfCPU(*get_all_dtypes())
6095:    dtypesIfCUDA(*get_all_dtypes())
6122:    dtypes(*(get_all_fp_dtypes(include_bfloat16=False, include_half=False) +
6123:              get_all_complex_dtypes()))
6124:    dtypesIfCPU(*get_all_dtypes())
6125:    dtypesIfCUDA(*get_all_dtypes())
6163:    dtypes(*(get_all_fp_dtypes(include_bfloat16=False, include_half=False) +
6164:              get_all_complex_dtypes()))
6165:    dtypesIfCPU(*get_all_dtypes())
6166:    dtypesIfCUDA(*get_all_dtypes())
6190:    dtypes(*(get_all_complex_dtypes() +
6191:              get_all_int_dtypes()))
6238:    dtypes(*get_all_dtypes())
6323:    dtypes(*get_all_dtypes())
6389:    dtypes(*product(get_all_dtypes(), (torch.uint8, torch.bool)))
6699:    dtypesIfCUDA(*set(get_all_math_dtypes('cuda')))
6700:    dtypes(*set(get_all_math_dtypes('cpu')))
7452:    dtypes(*get_all_dtypes(include_bool=False))
7461:    dtypes(*get_all_dtypes(include_bool=False))
7477:    dtypes(*get_all_dtypes(include_bool=False))
7496:    dtypes(*get_all_dtypes(include_bool=False))
7538:    dtypes(*get_all_dtypes(include_bool=False))
8162:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes() +
8163:              get_all_complex_dtypes()))
8175:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes() +
8176:              get_all_complex_dtypes()))
```

</p>
</details>

<details>
<summary>

`test/test_type_promotion.py`</summary>

<p>

```python
14:    get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_fp_dtypes
187:        for dtype in get_all_dtypes():
262:        dtypes1 = get_all_math_dtypes('cuda')
263:        dtypes2 = get_all_math_dtypes(device)
339:    dtypes(*itertools.product(get_all_dtypes(), get_all_dtypes()))
468:            for dt1 in get_all_math_dtypes(device):
469:                for dt2 in get_all_math_dtypes(device):
519:            for dt1 in get_all_math_dtypes(device):
520:                for dt2 in get_all_math_dtypes(device):
528:        for dt in get_all_math_dtypes(device):
561:        for dtype in get_all_dtypes():
766:                                          dtypes=get_all_math_dtypes(device))
771:                                          dtypes=get_all_math_dtypes(device))
782:                                          dtypes=get_all_math_dtypes(device))
879:        dtypes = get_all_dtypes(include_bfloat16=False)
898:        dtypes = get_all_dtypes(include_bfloat16=False, include_bool=False)
965:    dtypesIfCUDA(*itertools.product(get_all_dtypes(include_bfloat16=False, include_complex=False),
966:                                     get_all_dtypes(include_bfloat16=False, include_complex=False)))
967:    dtypes(*itertools.product(get_all_dtypes(include_half=False, include_bfloat16=False,
969:                               get_all_dtypes(include_half=False, include_bfloat16=False,
976:            return dtype in get_all_int_dtypes() + [torch.bool]
979:            return dtype in get_all_fp_dtypes(include_half=True, include_bfloat16=False)
```

</p>
</details>

<details>
<summary>

`test/test_unary_ufuncs.py`</summary>

<p>

```python
24:    floating_types_and, all_types_and_complex_and, floating_and_complex_types_and, get_all_dtypes, get_all_math_dtypes,
25:    get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes
517:    dtypes(*(get_all_int_dtypes() + [torch.bool] +
518:              get_all_fp_dtypes(include_bfloat16=False)))
596:    dtypes(*get_all_fp_dtypes(include_half=True, include_bfloat16=False))
611:        invalid_input_dtypes = get_all_int_dtypes() + \
612:            get_all_complex_dtypes() + \
619:        for dtype in get_all_fp_dtypes(include_half=True, include_bfloat16=False):
1048:    dtypes(*get_all_math_dtypes('cpu'))
1182:    dtypesIfCUDA(*get_all_fp_dtypes())
1190:    dtypesIfCUDA(*get_all_fp_dtypes())
1205:    dtypesIfCUDA(*get_all_fp_dtypes())
1215:    dtypesIfCUDA(*get_all_fp_dtypes())
1307:    dtypes(*(get_all_dtypes(include_bool=False)))
1349:    dtypes(*(get_all_fp_dtypes(include_half=False) +
1350:              get_all_complex_dtypes()))
1351:    dtypesIfCUDA(*(get_all_fp_dtypes(include_half=True) +
1352:                    get_all_complex_dtypes()))
```

</p>
</details>

<details>
<summary>

`test/test_view_ops.py`</summary>

<p>

```python
19:    get_all_dtypes, get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes
124:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
131:    dtypes(*get_all_dtypes(include_bfloat16=False))
213:            for view_dtype in [*get_all_fp_dtypes(), *get_all_complex_dtypes()]:
220:    dtypes(*get_all_dtypes())
224:        for view_dtype in get_all_dtypes():
305:    dtypes(*get_all_complex_dtypes(include_complex32=True))
343:    dtypes(*get_all_dtypes())
354:    dtypes(*get_all_dtypes())
364:    dtypes(*get_all_dtypes())
374:    dtypes(*get_all_dtypes())
384:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
395:    dtypes(*get_all_complex_dtypes())
426:    dtypes(*get_all_complex_dtypes())
451:    dtypes(*product(get_all_complex_dtypes(), get_all_dtypes()))
1263:    dtypes(*(torch.testing.get_all_dtypes()))
1279:    dtypes(*(torch.testing.get_all_dtypes()))
1405:    dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1406:              get_all_complex_dtypes()))
1471:    dtypes(*get_all_dtypes(include_bfloat16=False))
1574:    dtypes(*get_all_dtypes())
1601:    dtypes(*get_all_dtypes(include_bfloat16=False))
1632:    dtypes(*get_all_dtypes(include_bfloat16=False))
1711:        for dt in get_all_dtypes():
1717:        for dt in get_all_dtypes():
1724:        for dt in get_all_dtypes():
```

</p>
</details>

I'm looking forward to your viewpoints. Thanks :)

cc: mruberry kshitij12345 anjali411

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71561

Reviewed By: samdow

Differential Revision: D34856571

Pulled By: mruberry

fbshipit-source-id: 0dca038bcad5cf69906245c496d2e61ac3876335
(cherry picked from commit b058f67b4313143efa714ab105f36e74083131b9)
2022-03-15 20:31:41 +00:00
Pearu Peterson
4168c87ed3 Support CSR to COO conversion in to_sparse(2). (#73642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73642

Former https://github.com/pytorch/pytorch/pull/73471 that was reverted due to lack of `to_sparse(sparse_dim)` support.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D34580353

Pulled By: cpuhrsch

fbshipit-source-id: a8a4ea381daeb80d8365fe931af9f55a7e789ea1
(cherry picked from commit 5a3cf8110980e5a10dbb687e87e67d5524ebf2f5)
2022-03-02 22:33:32 +00:00
Nikita Shulga
8ac7393565 Revert D33767740: [pytorch][PR] Sparse CSR CPU: cuSolverSP backend for linalg.solve
Test Plan: revert-hammer

Differential Revision:
D33767740 (199d9a992c)

Original commit changeset: a945f065210c

Original Phabricator Diff: D33767740 (199d9a992c)

fbshipit-source-id: b7934df18118f8d6d5f165deb5aae9887953ae43
(cherry picked from commit d3ddbb021b227e3638f6f7c22c6eadfa73695e31)
2022-03-01 18:33:23 +00:00
Rohan Varma
95204c4e2b Revert D34503882: Support CSR to COO conversion in to_sparse.
Test Plan: revert-hammer

Differential Revision:
D34503882 (84f4e9c10a)

Original commit changeset: 4a781647a0ae

Original Phabricator Diff: D34503882 (84f4e9c10a)

fbshipit-source-id: cf161171a3b51aa3c0f2b15501956873b1ba29dd
(cherry picked from commit 924c19071713777700087087b27b388eb057d8d9)
2022-03-01 15:33:37 +00:00
Pearu Peterson
84f4e9c10a Support CSR to COO conversion in to_sparse. (#73471)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73471

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D34503882

Pulled By: cpuhrsch

fbshipit-source-id: 4a781647a0ae5d03827406b75b14acc7c48da0b0
(cherry picked from commit fa3dbdc6a8529d19f8a055494436ca1f766807be)
2022-03-01 06:31:52 +00:00
Kushashwa Ravi Shrimali
199d9a992c Sparse CSR CPU: cuSolverSP backend for linalg.solve (#71399)
Summary:
This PR introduces the `cuSolverSP` backend for `linalg.solve` with sparse CSR input matrices. The motivation comes from the issue: https://github.com/pytorch/pytorch/issues/69538.

`cuSolver` provides [`cusolverSp<t>csrlsvluHost`](https://docs.nvidia.com/cuda/cusolver/index.html#cusolver-lt-t-gt-csrlsvlu) API, a few things to note:

1. As mentioned in the documentation: `only CPU (Host) path is provided.` From the profiling, there doesn't seem to be any GPU kernel launch for optimization, please see the profiling below.
2. Since only `host` path is provided, the CPU path uses `csrlsvluHost` (but requires PyTorch to be installed/built with CUDA support).
3. The documentation mentions reordering helps optimize stuff, but it isn't clear how it affects the performance. There are options for reordering, so we stick to `reorder = 0` as the default choice.

`cuSolver` has [`csrlsvqr`](https://docs.nvidia.com/cuda/cusolver/index.html#cusolver-lt-t-gt-csrlsvqr) function which provides a `device` path to solve the linear system. This function is used for the CUDA path in this PR.

**Gist:**

For CPU Path: we call [`csrlsvluHost` function of cuSolver](https://docs.nvidia.com/cuda/cusolver/index.html#cusolver-lt-t-gt-csrlsvlu).
For CUDA Path: we call [`csrlsvqr` function of cuSolver](https://docs.nvidia.com/cuda/cusolver/index.html#cusolver-lt-t-gt-csrlsvqr).

**Profiling:** (On sparse input tensor of size 1000 x 1000, with a vector of shape length 1000), for `csrlsvlu` function (to show no GPU optimization)

```cpp
==3999651== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:  100.00%  2.1440us         1  2.1440us  2.1440us  2.1440us  [CUDA memcpy HtoD]
      API calls:   99.72%  1.07199s         9  119.11ms     500ns  1.07164s  cudaFree
                    0.11%  1.2182ms       398  3.0600us     140ns  137.94us  cuDeviceGetAttribute
                    0.06%  674.45us         4  168.61us  165.50us  173.64us  cuDeviceTotalMem
                    0.03%  357.07us         4  89.268us  2.7800us  201.89us  cudaMalloc
                    0.03%  309.29us         1  309.29us  309.29us  309.29us  cudaGetDeviceProperties
                    0.01%  160.47us       332     483ns     350ns  3.3300us  cudaFuncSetAttribute
                    0.01%  115.12us         4  28.780us  26.290us  33.410us  cuDeviceGetName
                    0.00%  28.591us         5  5.7180us     440ns  16.921us  cudaGetDevice
                    0.00%  22.061us         4  5.5150us     871ns  18.690us  cudaDeviceSynchronize
                    0.00%  20.370us        18  1.1310us     410ns  6.9900us  cudaEventDestroy
                    0.00%  16.390us         1  16.390us  16.390us  16.390us  cudaMemcpy
                    0.00%  11.540us         2  5.7700us  1.4900us  10.050us  cuDeviceGetPCIBusId
                    0.00%  10.510us        18     583ns     430ns  1.6200us  cudaEventCreateWithFlags
                    0.00%  7.9100us        21     376ns     290ns     700ns  cudaDeviceGetAttribute
                    0.00%  1.4300us         6     238ns     150ns     590ns  cuDeviceGet
                    0.00%  1.2200us         4     305ns     190ns     500ns  cuDeviceGetCount
                    0.00%     900ns         1     900ns     900ns     900ns  cuInit
                    0.00%     860ns         4     215ns     180ns     260ns  cuDeviceGetUuid
                    0.00%     240ns         1     240ns     240ns     240ns  cuDriverGetVersion
                    0.00%     230ns         1     230ns     230ns     230ns  cudaGetDeviceCount
```

Script:

```python
import torch

def solve(x, other, out):
    torch.linalg.solve(x, other, out=out)

if __name__ == "__main__":
    dense_inp = torch.randn((1000, 1000), dtype=torch.float64)
    # Set 50% of the values to 0 randomly
    dense_inp = torch.nn.functional.dropout(dense_inp, p=0.5)
    sparse_inp = dense_inp.to_sparse_csr()

    other = torch.randint(100, (1000,), dtype=torch.float64)
    out = torch.randint(1, (1000,), dtype=torch.float64)

    solve(sparse_inp, other, out)
```

The following error is raised when the function is used on a CPU device with PyTorch built/installed without CUDA support:
* When built without CUDA support:

```python
/home/krshrimali/pytorch/torch/autograd/profiler.py:151: UserWarning: CUDA is not available, disabling CUDA profiling
  warn("CUDA is not available, disabling CUDA profiling")
Traceback (most recent call last):
  File "/home/krshrimali/pytorch/test_sp.py", line 17, in <module>
    solve(x, other, out)
  File "/home/krshrimali/pytorch/test_sp.py", line 5, in solve
    torch.linalg.solve(x, other, out=out)
RuntimeError: PyTorch was not built with CUDA support. Please use PyTorch built CUDA support
```

**Performance Comparison** (vs SciPy's [`scipy.sparse.linalg.spsolve`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.spsolve.html):

Time taken by `scipy.sparse.linalg.spsolve` : 0.595 seconds

On CPU: Time taken by `torch.linalg.solve` : 4.565 seconds
On CUDA: Time taken by `torch.linalg.solve`: 1.838 seconds

The inputs are of dimensions: (17281, 17281) and (17281, 1), and were taken from https://math.nist.gov/MatrixMarket/extreme.html.

Thanks to IvanYashchuk for helping me with the PR, and guiding me through it.

cc: IvanYashchuk pearu nikitaved cpuhrsch

cc nikitaved pearu cpuhrsch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71399

Reviewed By: VitalyFedyunin

Differential Revision: D33767740

Pulled By: cpuhrsch

fbshipit-source-id: a945f065210cd719096eb8d7cdbf8e8937c2fce9
(cherry picked from commit f4f35c17da414e1ca6c6d91402933521857aa1ea)
2022-03-01 05:32:35 +00:00
Ivan Yashchuk
0ba3498248 Sparse CSR CPU: implement addmm(dense, sparse, sparse) -> dense (#73076)
Summary:
This PR adds a possibility to multiply two sparse matrices and add the result of a product to a dense matrix.
It uses [MKL spmmd function](https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/blas-and-sparse-blas-routines/inspector-executor-sparse-blas-routines/inspector-executor-sparse-blas-execution-routines/mkl-sparse-spmmd.html) and only CPU path is implemented for now.

Ref. https://github.com/pytorch/pytorch/issues/60858

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73076

Reviewed By: mikaylagawarecki

Differential Revision: D34342993

Pulled By: cpuhrsch

fbshipit-source-id: 5e5ea67cb92fbaa4d4c0eaf61e85019972989a21
(cherry picked from commit 62b8dc730e6a6736f5c03ac09eac5223cd9706cf)
2022-02-26 01:08:45 +00:00
Ivan Yashchuk
ebd93f69db Enable CSR inputs for torch.sparse.mm (#73075)
Summary:
Previously `torch.sparse.mm` supported only COO and dense inputs.

Computing derivatives works wrt dense input for sparse_csr x dense -> dense

Modified implementation of `torch.sparse.mm` to be directly bound to ATen function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73075

Reviewed By: mikaylagawarecki

Differential Revision: D34342954

Pulled By: cpuhrsch

fbshipit-source-id: a6ed914a0ce28b35276109479109095f7149d32b
(cherry picked from commit 948de1816c46cd087bacbee36dc583cf409813f9)
2022-02-24 04:30:48 +00:00
Pearu Peterson
e785c0a1ab Enable Half/BFloat16 support for to_dense and coalesce methods. (#72397)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72397

Test Plan: Imported from OSS

Reviewed By: jbschlosser, zou3519

Differential Revision: D34286114

Pulled By: cpuhrsch

fbshipit-source-id: a4f7e2abc3b2d37437cbd09d693c1b409bb011b9
(cherry picked from commit 74f94447fc)
2022-02-17 02:54:23 +00:00
Ivan Yashchuk
fb7c4780f9 Add autograd tests for addmm, addmv, mm, mv and CSR matrix input (#71949)
Summary:
This PR adds autograd tests for `addmm, addmv, mm, mv` functions that check computing derivatives wrt dense inputs.

Currently, neither autograd engine, nor gradcheck can work with CSR inputs<->CSR outputs. I added xfailing tests for that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71949

Reviewed By: george-qi

Differential Revision: D33834653

Pulled By: cpuhrsch

fbshipit-source-id: 4144c1547427d4cd6b01495cf45242bb4e914e86
(cherry picked from commit 2cb362283d)
2022-02-11 23:14:02 +00:00
Ivan Yashchuk
ad5a5a9794 Beta value is ignored for sparse torch.addmm with non-MKL build (#72430)
Summary:
When PyTorch is not built with MKL or on Windows there's a native implementation of `torch.addmm` for tensors on CPU. There was a bug that `beta` value was ignored, causing new tests to fail (see https://github.com/pytorch/pytorch/pull/71949#issuecomment-1024639741).

In addition, I also enabled complex numbers support for this code path.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72430

Reviewed By: davidberard98

Differential Revision: D34045670

Pulled By: cpuhrsch

fbshipit-source-id: b2b63f22ba3eea895a31c5c2925b0fb1555d2c6f
(cherry picked from commit ac0a2080bb)
2022-02-09 00:32:17 +00:00
Nikita Shulga
38ebb776a4 Fail with unexpected success for fatal errors (#72016)
Summary:
Rest of the tests from CUDA testuite is skipped after GPU context corruption is encountered.
For tests decorated with `expectedFailure` creates false impression that entire testsuite is passing.
Remedy it by suppressing the exception and printing the warning about unexpected success if `should_stop_early` is true
Also, prints warning when this happens (to make attribution easier) as well as when this condition is detected.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72016

Test Plan:
`python test_ops.py -v  -k test_fn_fwgrad_bwgrad_gradient`
Before the change:
```
test_fn_fwgrad_bwgrad_gradient_cpu_complex128 (__main__.TestGradientsCPU) ... ok
test_fn_fwgrad_bwgrad_gradient_cpu_float64 (__main__.TestGradientsCPU) ... ok
test_fn_fwgrad_bwgrad_gradient_cuda_complex128 (__main__.TestGradientsCUDA) ... expected failure

----------------------------------------------------------------------
Ran 3 tests in 0.585s
OK (expected failures=1)
```

After the change:
```
test_fn_fwgrad_bwgrad_gradient_cpu_complex128 (__main__.TestGradientsCPU) ... ok
test_fn_fwgrad_bwgrad_gradient_cpu_float64 (__main__.TestGradientsCPU) ... ok
test_fn_fwgrad_bwgrad_gradient_cuda_complex128 (__main__.TestGradientsCUDA) ... /home/conda/miniconda3/lib/python3.9/site-packages/torch/testing/_internal/common_utils.py:1670: UserWarning: TEST SUITE EARLY TERMINATION due to torch.cuda.synchronize() failed with CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
  warn(f"TEST SUITE EARLY TERMINATION due to torch.cuda.synchronize() failed with {rte}")
/home/conda/miniconda3/lib/python3.9/site-packages/torch/testing/_internal/common_device_type.py:382: UserWarning: Suppressed expected failure that resulted in fatal error
  warn("Suppressed expected failure that resulted in fatal error")
unexpected success

----------------------------------------------------------------------
Ran 3 tests in 0.595s

FAILED (unexpected successes=1)
```
And `stderr` from XML file contains requested info:
```
/home/conda/miniconda3/lib/python3.9/site-packages/torch/testing/_internal/common_utils.py:1670: UserWarning: TEST SUITE EARLY TERMINATION due to torch.cuda.synchronize() failed with CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
  warn(f"TEST SUITE EARLY TERMINATION due to torch.cuda.synchronize() failed with {rte}")
/home/conda/miniconda3/lib/python3.9/site-packages/torch/testing/_internal/common_device_type.py:382: UserWarning: Suppressed expected failure that resulted in fatal error
  warn("Suppressed expected failure that resulted in fatal error")
```

Fixes https://github.com/pytorch/pytorch/issues/71973

Reviewed By: janeyx99, ngimel

Differential Revision: D33854287

Pulled By: malfet

fbshipit-source-id: dd0f5a4d2fcd21ebb7ee50ce4ec4914405a812d0
(cherry picked from commit 0c0baf3931)
2022-02-03 17:49:59 +00:00
Kushashwa Ravi Shrimali
85591dc85d Test 0->0 correspondence for Unary Ops with Sparse CSR inputs (#70302)
Summary:
Since there is no rule in PyTorch (Sparse CSR) for filling zeros, it was decided that only those ops will be supported which do not break 0->0 correspondence. To ensure that this rule is not broken, this PR aims to add a test to ensure this rule is not broken.

`sample_inputs_unary` may or may not generate a zero in the sample input. Hence, this separate test is good for validating the rule, and the support for Sparse CSR.

cc nikitaved pearu cpuhrsch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70302

Reviewed By: albanD

Differential Revision: D33922501

Pulled By: cpuhrsch

fbshipit-source-id: 10f67a220b95a8e75205345a33744ad536fdcf53
(cherry picked from commit ade9bf7818)
2022-02-03 16:53:27 +00:00
Christian Puhrsch
4a7e07e53e Fix torch.save and detach for CSR Tensor (#71963)
Summary:
Currently saving a CSR Tensor simply fails. This also addresses the segfault encountered in https://github.com/pytorch/pytorch/issues/71652.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71963

Reviewed By: jbschlosser

Differential Revision: D33895938

Pulled By: cpuhrsch

fbshipit-source-id: a333505d3a216705147c2aaaaeb2a0fd0c2a5e43
(cherry picked from commit a88265921c)
2022-02-02 23:59:24 +00:00
Ivan Yashchuk
be2dc8f294 Sparse CSR CUDA: Add torch.baddbmm and torch.bmm (#68711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68711

This PR adds possibility to multiply a single CSR matrix by a batch of dense matrices.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: davidberard98

Differential Revision: D33773319

Pulled By: cpuhrsch

fbshipit-source-id: 1623ce9affbc4fdc6d6130a95c5a42022858b62b
(cherry picked from commit 628c8e366d)
2022-01-28 07:25:32 +00:00
Ivan Yashchuk
f93ffc9ea8 Sparse CSR: Handle zero matrix consistently for triangular_solve (#71304)
Summary:
This PR enables `test_block_triangular` tests on the CPU.
These tests revealed that there was a problem with how the nnz==0 case is handled. Now we return a tensor filled with NaNs both on CUDA and CPU.

cc nikitaved pearu cpuhrsch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71304

Reviewed By: davidberard98

Differential Revision: D33600482

Pulled By: cpuhrsch

fbshipit-source-id: d09cb619f8b6e54b9f07eb16765ad1c183c42487
2022-01-17 13:47:49 -08:00
Ivan Yashchuk
40121456af Sparse CSR: Add torch.randn_like (#68083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68083

This PR adds support for `torch.randn_like(sparse_csr_tensor)`.
It creates a new sparse csr tensor with same indices but different values that are normally distributed.

In addition `.normal_()` and `torch.empty_like` were implemented because `randn_like` is a composite of these two functions.

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33511280

Pulled By: cpuhrsch

fbshipit-source-id: 6129083e8bc6cc5af2e0191294bd5e4e864f6c0e
2022-01-11 18:29:24 -08:00
Pearu Peterson
cfc5519661 Support Sparse CSR transpose. Fix clang-tidy warnings. (#70582)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70582

cc nikitaved pearu cpuhrsch

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33414446

Pulled By: cpuhrsch

fbshipit-source-id: dd0888d9dd3885579e853643a60d13373b5d6b15
2022-01-05 17:41:51 -08:00
Pearu Peterson
ab7d0df449 Support cloning CSR tensors (#70581)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70581

cc nikitaved pearu cpuhrsch

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33413992

Pulled By: cpuhrsch

fbshipit-source-id: 3a576d2c2f26d1edcc8f6932b2dbe2c7c11e9593
2022-01-04 21:41:18 -08:00
Ivan Yashchuk
60eb1e53b2 Sparse CSR CPU: Add block sparse support for MKL path (#68710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68710

This PR adds support for block sparse (BSR) matrices for functions that
use Inspector-Executor MKL Sparse API. At the moment of this PR it's:
* torch.addmm
* torch.addmv
* torch.triangular_solve (once https://github.com/pytorch/pytorch/pull/62180 is merged)

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D33179486

Pulled By: cpuhrsch

fbshipit-source-id: e1dec0dccdbfed8b280be16b8c11fc9e770d50ae
2021-12-17 10:56:05 -08:00
Ivan Yashchuk
243e135eb4 Sparse CSR CUDA: Add block sparse support for torch.triangular_solve (#68709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68709

This PR adds support for triangular solver with a block CSR matrix.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D33066067

Pulled By: cpuhrsch

fbshipit-source-id: 9eaf1839071e9526be8d8c6d47732b24200f3557
2021-12-16 13:03:42 -08:00
Peter Bell
6de9f0fc94 OpInfo: Allow sample_inputs_func to be any iterable (#69256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69256

Closes #52486

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32942008

Pulled By: mruberry

fbshipit-source-id: f5b01b0298c0160b0bec6e86e2b6db8cfe746206
2021-12-09 08:37:26 -08:00
Ivan Yashchuk
a8232ee1bc Sparse CSR CUDA: Add block torch.addmv when mat is sparse (#68708)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68708

This PR adds block CSR matrix times dense vector multiplication.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D32647694

Pulled By: cpuhrsch

fbshipit-source-id: a1c120691c4350284b156fe4259eda684b734b66
2021-12-07 14:02:59 -08:00
Kushashwa Ravi Shrimali
63470f9449 Sparse CSR: Implement unary ufuncs (with 0->0 correspondence) (#69292)
Summary:
This PR attempts to add support for unary ufuncs (with 0->0 correspondence) for Sparse CSR Layout.

Ops supported: `['abs', 'asin', 'asinh', 'atan', 'atanh', 'ceil', 'conj_physical', 'floor', 'log1p', 'neg', 'round', 'sin', 'sinh', 'sign', 'sgn', 'signbit', 'tan', 'tanh', 'trunc', 'expm1', 'sqrt', 'angle', 'isinf', 'isposinf', 'isneginf', 'isnan', 'erf', 'erfinv']`

cc nikitaved pearu cpuhrsch IvanYashchuk peterbell10

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69292

Reviewed By: pbelevich

Differential Revision: D32805514

Pulled By: cpuhrsch

fbshipit-source-id: 9ae20817e77a36d3aa6c5afa532b9dc3b8cf1dd3
2021-12-07 12:07:41 -08:00
Ivan Yashchuk
89a145fd91 Sparse CSR CUDA: Add torch.sparse.sampled_addmm (#68007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68007

This PR adds a new function to the sparse module.
`sampled_addmm` computes α*(A @ B) * spy(C) + β*C, where C is a sparse CSR matrix and A, B are dense (strided) matrices.
This function is currently restricted to single 2D matrices, it doesn't support batched input.

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32435799

Pulled By: cpuhrsch

fbshipit-source-id: b1ffac795080aef3fa05eaeeded03402bc097392
2021-11-29 15:43:29 -08:00
Ivan Yashchuk
61a4204d80 Sparse CSR CUDA: Add block torch.addmm when mat1 is sparse (#68707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68707

This PR adds a path for block CSR matrices for `torch.addmm`. cuSPARSE interface is restricted to 32-bit indices and square blocks.
My plan is to make everything work and tests passing using an unsafe constructor first, keeping it all private. Then discuss & implement constructors with block information separately unlocking the functions for wider use. Documentation will come with the update to constructors.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D32650366

Pulled By: cpuhrsch

fbshipit-source-id: 430a9627901781ee3d2e2496097b71ec17727d98
2021-11-29 08:58:49 -08:00
Nikita Shulga
208e109dbf Revert D32633806: Sparse CSR CUDA: Add block torch.addmm when mat1 is sparse
Test Plan: revert-hammer

Differential Revision:
D32633806 (b28ddd72d3)

Original commit changeset: b98db0bd655c

fbshipit-source-id: 1c757628526bb1b88747257fc77d8b9cb996e502
2021-11-24 09:15:17 -08:00
Ivan Yashchuk
b28ddd72d3 Sparse CSR CUDA: Add block torch.addmm when mat1 is sparse (#68707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68707

This PR adds a path for block CSR matrices for `torch.addmm`. cuSPARSE interface is restricted to 32-bit indices and square blocks.
My plan is to make everything work and tests passing using an unsafe constructor first, keeping it all private. Then discuss & implement constructors with block information separately unlocking the functions for wider use. Documentation will come with the update to constructors.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D32633806

Pulled By: cpuhrsch

fbshipit-source-id: b98db0bd655cce651a5da457e78fca08619a5066
2021-11-23 22:55:46 -08:00
Ivan Yashchuk
3b3dc1ade8 Sparse CSR CPU: add triangular_solve_out (#62180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62180

This PR adds CPU dispatch for `triangular_solve` with sparse CSR matrix.
The implementation uses MKL Sparse library. If it's not available then a runtime error is thrown.

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D32581395

Pulled By: cpuhrsch

fbshipit-source-id: 41c7133a0d2754ef60b5a7f1d14aa0bf7680a844
2021-11-21 21:29:20 -08:00
Kushashwa Ravi Shrimali
833dcaf2d6 Sparse CSR: Add torch.sin (#68123)
Summary:
This PR attempts to add support for `torch.sin` for sparse CSR tensors.

This aims to be a revised implementation (in some form) of https://github.com/pytorch/pytorch/pull/68083, and the implementation aims to be similar to that in [`SparseTensorMath.cpp` file](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/sparse/SparseTensorMath.cpp)

The tests and `empty_like` support for sparse CSR tensors (with a minor correction) are borrowed from https://github.com/pytorch/pytorch/pull/68083 temporarily to assist CI with testing this PR. :)

cc nikitaved pearu cpuhrsch IvanYashchuk krshrimali

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68123

Reviewed By: jbschlosser

Differential Revision: D32533379

Pulled By: cpuhrsch

fbshipit-source-id: eb834d64d16ee12734c77e74fffa4a47614e3dfb
2021-11-18 21:58:09 -08:00
Rok
952ca25daa Sparse CSR: add convert_indices_from_csr_to_coo (#66774)
Summary:
This PR adds conversion from CSR to COO.

Fixes https://github.com/pytorch/pytorch/issues/56959

cc nikitaved pearu cpuhrsch IvanYashchuk gchanan mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66774

Reviewed By: zou3519

Differential Revision: D32288415

Pulled By: cpuhrsch

fbshipit-source-id: 683ba658dc46835fdf3c0e24645c0c2bb243b968
2021-11-17 22:28:30 -08:00
Ivan Yashchuk
affa3f846c Sparse CSR CPU: add torch.addmm (#65606)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65606

This PR adds `torch.addmm(c, a, b, alpha=1.0, beta=0.0, out=out)` variant with `a, b, c, out` all being sparse CSR tensors on CPU.

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32366236

Pulled By: cpuhrsch

fbshipit-source-id: e910bcc96eee99d624b80ee881df3887ab3ba5ac
2021-11-16 17:22:46 -08:00
Ivan Yashchuk
c2642b6465 Sparse CSR CPU: add torch.add with all inputs sparse (#64391)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64391

This PR adds `torch.add(a, b, alpha=None, out=out)` variant with `a, b, out` all being sparse CSR tensors on CPU.

Fixes #59060

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32316562

Pulled By: cpuhrsch

fbshipit-source-id: 384462369007854b5e2e6cb9ae7b320302627c71
2021-11-11 10:02:12 -08:00
Ivan Yashchuk
cbf596bf8e Sparse CSR CPU: add addmv_out (#61536)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61536

This PR adds CPU dispatch for `addmv_out` with Sparse CSR matrix.
The implementation uses MKL Sparse library. If it's not available then a
runtime error is thrown.
Since structured_delegate is used we only need to implement the out variant, the in-place and normal variants are autogenerated.

MKL descriptor of sparse matrices is implemented in `at::mkl::sparse::MklSparseCsrDescriptor`.
MKL Sparse doesn't allow switching indices type in runtime, it's
predetermined in build time. Only 32-bit version of MKL was tested
locally, but I expect 64-bit version to work correctly as well.

When indices type of PyTorch CSR tensor doesn't match with MKL's,
indices tensor is converted to MKL compatible type (`int` vs `int64_t`).

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D32141787

Pulled By: malfet

fbshipit-source-id: b818a0b186aa227982221c3862a594266a58a2a6
2021-11-09 12:34:21 -08:00
Ivan Yashchuk
d5d342b237 Sparse CSR CUDA: Support mixed memory format input for triangular_solve (#66401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66401

This PR fixes the case when result and input tensors have different
strides.
cuSPARSE from CUDA 11.3.1 has a bug: it doesn't use correct strides to
write the result. This is "fixed" in PyTorch code by copying the input
tensor to a tensor with same strides as result tensor has.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: davidberard98

Differential Revision: D32177966

Pulled By: cpuhrsch

fbshipit-source-id: 118437409df147f04dce02763aff9bfd33f87c63
2021-11-04 15:34:42 -07:00
Ivan Yashchuk
69f86ecd3a Sparse CSR CUDA: add torch.add with all inputs sparse (#63948)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63948

This PR adds `torch.add(a, b, alpha=None, out=out)` variant with `a, b,
out` all being sparse CSR tensors.
The underlying cuSPARSE function works only with 32-bit indices, and in
the current implementation, the result tensor has 32-bit indices. Input
tensors can have both 64-bit and 32-bit indices tensors.

Fixes https://github.com/pytorch/pytorch/issues/59060

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D31909731

Pulled By: cpuhrsch

fbshipit-source-id: 656f523e3947fec56b2f93c474fb6fd49f0360ca
2021-10-29 10:43:05 -07:00
Ivan Yashchuk
bd5e6fe5ac Skip complex128 dtype for test_addmm_sizes_all_sparse_csr Windows test (#67453)
Summary:
Windows CUDA 11.1 periodic CI is failing. See https://github.com/pytorch/pytorch/pull/63511#issuecomment-953940183.
I don't understand though why periodic-win-vs2019-cuda11.1-py3 was triggered on the PR, but no test from `test_sparse_csr.py` were run https://github.com/pytorch/pytorch/runs/3975200820?check_suite_focus=true.

cc nikitaved pearu cpuhrsch IvanYashchuk mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67453

Reviewed By: malfet, seemethere, janeyx99

Differential Revision: D31997574

Pulled By: cpuhrsch

fbshipit-source-id: ae8bfb6da865014f39e6ad5675eb17e5a4d39744
2021-10-28 12:24:46 -07:00
Ivan Yashchuk
7c48b9ee25 Sparse CSR CUDA: add triangular_solve_out (#61858)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61858

This PR adds `triangular_solve_out_sparse_csr_cuda`. The operation is
used to comput the solution to the linear system where coefficient
matrix is triangular.
Structured kernels are used and the meta function needed some changes to
support sparse csr layout. With sparse matrix input the `cloned_coefficient`
tensor is 0-sized tensor.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D31948435

Pulled By: cpuhrsch

fbshipit-source-id: 7775fece83ca705a26d75f82aead10b956b14bfd
2021-10-27 11:12:20 -07:00
Ivan Yashchuk
700b39a3df Sparse CSR CUDA: add torch.addmm with all inputs sparse (#63511)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63511

This PR adds `torch.addmm(c, a, b)` variant with `c, a, b` all being CSR tensors.
The underlying cuSPARSE function works only with 32-bit indices, and in
the current implementation the result tensor has 32-bit indices. Input
tensors can have both 64-bit and 32-bit indices tensors.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D31809838

Pulled By: cpuhrsch

fbshipit-source-id: 97005dba27d8adcae445eb756bcbd7271061e9b5
2021-10-25 14:32:30 -07:00
Ivan Yashchuk
450221c534 Sparse CSR: Add tensor.resize_ and tensor.copy_ (#63510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63510

Sparse CSR matrix resizing behavior:
If we _increase the number of rows_ the number of specified elements in the matrix remains the same -> the size of col_indices, values doesn't change, the size of crow_indices becomes `rows+1`.
If we _decrease the number of rows_ the number of specified elements will be `min(nnz, rows*cols)` -> need to resize `crow_indices` to `rows+1` and set the last element to `min(nnz, rows*cols)`; decrease the size of col_indices and values to `min(nnz, rows*cols)`.
If we _increase the number of columns_ the number of specified elements in the matrix remains the same, the number of rows remains the same -> no need to resize anything, just set new sizes.
We _cannot decrease the number of columns_ because it would require recomputing `crow_indices`.

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D31796680

Pulled By: cpuhrsch

fbshipit-source-id: 7d8a9701ce06d30a1841f94bba0a057cacea9401
2021-10-20 14:19:04 -07:00
Jane Xu
793f366e34 [skip ci] Set test owners for sparse tests (#66863)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc nikitaved pearu cpuhrsch IvanYashchuk

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66863

Reviewed By: anjali411

Differential Revision: D31771126

Pulled By: janeyx99

fbshipit-source-id: 6cb5ca0557e8555f6a09b3e607ff8888e505486e
2021-10-20 10:12:13 -07:00