231 Commits

Author SHA1 Message Date
Nikita Vedeneev
8383b5c488 Improve bsr @ strided performance in baddmm for bfloat16/half with Triton kernels. (#88078)
As per title.

Additionally we also introduce support for:
- Rectangular block sizes which are powers of 2 and at least 16 (triton's `dot` limitation).
- Batch support with broadcasting for either of the arguments.
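
The block-size constraint in the list above can be written as a small check. This is only an illustrative sketch: `is_supported_blocksize` is a hypothetical helper, not a function from PyTorch or Triton, and it encodes just the two stated conditions (each block dimension is a power of 2 and at least 16).

```python
from typing import Tuple


def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def is_supported_blocksize(blocksize: Tuple[int, int], minimum: int = 16) -> bool:
    """Hypothetical check: both block dimensions are powers of 2 and >= minimum."""
    return all(is_power_of_two(b) and b >= minimum for b in blocksize)


print(is_supported_blocksize((16, 32)))  # rectangular block, both dims valid
print(is_supported_blocksize((8, 16)))   # 8 is below the minimum
print(is_supported_blocksize((16, 48)))  # 48 is not a power of 2
```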

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88078
Approved by: https://github.com/cpuhrsch
2023-01-19 03:14:54 +00:00
PyTorch MergeBot
89f1ad08b4 Revert "Improve bsr @ strided performance in baddmm for bfloat16/half with Triton kernels. (#88078)"
This reverts commit 7f256fff77.

Reverted https://github.com/pytorch/pytorch/pull/88078 on behalf of https://github.com/huydhn due to This breaks lint 7f256fff77
2023-01-17 22:14:37 +00:00
Nikita Vedeneev
7f256fff77 Improve bsr @ strided performance in baddmm for bfloat16/half with Triton kernels. (#88078)
As per title.

Additionally we also introduce support for:
- Rectangular block sizes which are powers of 2 and at least 16 (triton's `dot` limitation).
- Batch support with broadcasting for either of the arguments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88078
Approved by: https://github.com/cpuhrsch
2023-01-17 21:43:20 +00:00
Pearu Peterson
b3e4f5029b Add check-sparse-tensor-invariants flag to Context - 2nd try. (#92094)
This PR is a copy of https://github.com/pytorch/pytorch/pull/90849, whose merge was reverted.

The PR adds "check sparse tensor invariants" flag to Context that when enabled will trigger sparse tensor data invariants checks in unsafe methods of constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to UI:

`torch.sparse.check_sparse_tensor_invariants` class provides different ways to enable/disable the invariant checking.

`torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden.
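
The interaction between the global flag and the per-call `check_invariants` argument can be pictured with a plain-Python sketch. Everything below is hypothetical and does not reproduce PyTorch's internals: `_check_enabled`, `check_invariants_scope` and `should_check` are stand-in names used only to show a global default that an explicit argument overrides for one call.

```python
from contextlib import contextmanager
from typing import Optional

_check_enabled = False  # hypothetical stand-in for the global flag


@contextmanager
def check_invariants_scope(enable: bool):
    """Temporarily set the global flag and restore the previous state on exit."""
    global _check_enabled
    previous = _check_enabled
    _check_enabled = enable
    try:
        yield
    finally:
        _check_enabled = previous


def should_check(check_invariants: Optional[bool] = None) -> bool:
    """An explicit argument wins; None falls back to the global flag."""
    if check_invariants is None:
        return _check_enabled
    return check_invariants


with check_invariants_scope(True):
    print(should_check())       # follows the global flag
    print(should_check(False))  # the explicit argument overrides it
print(should_check())           # the previous global state is restored
```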

The PR fixes https://github.com/pytorch/pytorch/issues/90833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92094
Approved by: https://github.com/cpuhrsch
2023-01-13 14:50:33 +00:00
mingfeima
3ab58fd5ed optimize sampled_addmm performance on CPU (SparseCSR) (#90978)
### Target and Background
This PR improves the performance of `sampled_addmm` on CPU. It is part of the effort to improve PyG performance on CPU for GNN training/inference.

The current implementation is a reference design that converts the `SparseCSR` tensor to a dense tensor, performs the addmm, and converts the result back to `SparseCSR`. This is very slow and cannot run most of the datasets under https://github.com/snap-stanford/ogb (converting to dense would trigger `OOM`).
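
To make the contrast with the dense round trip concrete, here is a plain-Python sketch of the sampled computation. It assumes the documented semantics of `torch.sparse.sampled_addmm`, namely `beta * input + alpha * (mat1 @ mat2)` evaluated only at the stored entries of `input`. The function `sampled_addmm_csr` is a hypothetical illustration, not the kernel added by this PR.

```python
def sampled_addmm_csr(crow, col, values, mat1, mat2, beta=1.0, alpha=1.0):
    """Return the new values array for beta * input + alpha * (mat1 @ mat2).

    `crow`, `col`, `values` are the CSR arrays of `input`; `mat1` (m x k) and
    `mat2` (k x n) are nested lists. Only stored entries are computed, so the
    sparsity pattern (crow, col) is unchanged and no dense m x n matrix is built.
    """
    k = len(mat2)
    out = []
    for i in range(len(crow) - 1):
        for p in range(crow[i], crow[i + 1]):
            j = col[p]
            dot = sum(mat1[i][t] * mat2[t][j] for t in range(k))
            out.append(beta * values[p] + alpha * dot)
    return out


# 2 x 2 input with stored entries at (0, 1), (1, 0) and (1, 1).
crow, col, values = [0, 1, 3], [1, 0, 1], [1.0, 2.0, 3.0]
mat1 = [[1, 2], [3, 4]]
mat2 = [[5, 6], [7, 8]]
print(sampled_addmm_csr(crow, col, values, mat1, mat2))
```

The work is proportional to the number of stored entries times `k`, rather than to `m * n * k`.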

### Benchmarks

Right now we don't have any hands-on benchmark or workload to test this since this operator is not used in PyG yet. I fetched the dataset from `ogb-products` where:

* number of nodes: 2.4 * 10^6
* number of edges: 1.26 * 10^8
* number of features: 128

If the **adjacency matrix** were stored as dense, it would take 2.4 * 2.4 * 4 * 10^12 bytes, which would cause an OOM with the current code. I extracted the first 1k rows to compare and measured a **1100x** speedup:

CPU: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, dual socket, 20 cores per socket.
```
### before: run 1000 rows from the whole dataset
sampled_addmm: running dataset ogb-products first 1000 rows: each iter takes 1212.000 ms!

### after: run 1000 rows from the whole dataset
sampled_addmm: running dataset ogb-products first 1000 rows: each iter takes 1.102 ms!

### after: run the whole dataset
sampled_addmm: running dataset ogb-products (the whole dataset) 2449029 rows: each iter takes 873.306 ms!
```
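
The two headline numbers can be rechecked with a few lines of arithmetic. The sketch assumes 4-byte elements, which is how the factor of 4 in the message is read here; the timings are copied from the log above.

```python
nodes = 2.4e6        # number of nodes quoted in the message
bytes_per_elem = 4   # assumption: 4-byte (float32) elements
dense_bytes = nodes * nodes * bytes_per_elem
print(f"dense adjacency matrix: {dense_bytes:.3e} bytes (~{dense_bytes / 1e12:.1f} TB)")

before_ms, after_ms = 1212.000, 1.102  # per-iteration timings on the first 1000 rows
speedup = before_ms / after_ms
print(f"speedup on the first 1000 rows: {speedup:.0f}x")
```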

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90978
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2023-01-12 12:04:07 +00:00
PyTorch MergeBot
c7a22bb7c7 Revert "Add check-sparse-tensor-invariants flag to Context. (#90849)"
This reverts commit b9a035c1c5.

Reverted https://github.com/pytorch/pytorch/pull/90849 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-01-12 09:58:16 +00:00
Aleksandar Samardžić
8612ec5b90 Implement hybrid sparse to/from dense conversions. (#90177)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90177
Approved by: https://github.com/cpuhrsch, https://github.com/pearu
2023-01-12 03:31:30 +00:00
PyTorch MergeBot
c5836153f5 Revert "optimize sampled_addmm performance on CPU (SparseCSR) (#90978)"
This reverts commit 645fb217c0.

Reverted https://github.com/pytorch/pytorch/pull/90978 on behalf of https://github.com/seemethere due to This broke internal builds for android due to the new file added being missing in build_variables.bzl
2023-01-11 20:12:12 +00:00
Pearu Peterson
b9a035c1c5 Add check-sparse-tensor-invariants flag to Context. (#90849)
This PR adds "check sparse tensor invariants" flag to Context that when enabled will trigger sparse tensor data invariants checks in unsafe methods of constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to UI:

- `torch.enable_check_sparse_tensor_invariants` and `torch.is_check_sparse_tensor_invariants_enabled` functions to globally enable/disable the invariant checks and to retrieve the state of the feature, respectively
- `torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden.

The PR also fixes https://github.com/pytorch/pytorch/issues/90833

# Main issue

*The following content is outdated after merging the PRs in this ghstack but kept for the record.*

The importance of this feature is that when enabling the invariants checks by default, say, via

<details>

```
$ git diff
diff --git a/torch/__init__.py b/torch/__init__.py
index c8543057c7..19a91d0482 100644
--- a/torch/__init__.py
+++ b/torch/__init__.py
@@ -1239,3 +1239,8 @@ if 'TORCH_CUDA_SANITIZER' in os.environ:

 # Populate magic methods on SymInt and SymFloat
 import torch.fx.experimental.symbolic_shapes
+
+# temporarily enable sparse tensor arguments validation in unsafe
+# constructors:
+
+torch._C._set_check_sparse_tensor_invariants(True)
```

</details>

a massive number of test failures/errors occur in test_sparse_csr.py tests:
```
$ pytest -sv test/test_sparse_csr.py
<snip>
==== 4293 failed, 1557 passed, 237 skipped, 2744 errors in 69.71s (0:01:09) ====
```
that means that we are silently constructing sparse compressed tensors that do not satisfy the sparse tensor invariants. In particular, the following errors are raised:

```
AssertionError: "resize_as_sparse_compressed_tensor_: self and src must have the same layout" does not match "expected values to be a strided and contiguous tensor"

RuntimeError: CUDA error: device-side assert triggered

RuntimeError: `col_indices[..., crow_indices[..., i - 1]:crow_indices[..., i]] for all i = 1, ..., nrows are sorted and distinct along the last dimension values` is not satisfied.

RuntimeError: expected col_indices to be a strided and contiguous tensor

RuntimeError: expected row_indices to be a strided and contiguous tensor

RuntimeError: expected values to be a strided and contiguous tensor

RuntimeError: for_each: failed to synchronize: cudaErrorAssert: device-side assert triggered

RuntimeError: tensor dimensionality must be sum of batch, base, and dense dimensionalities (=0 + 2 + 0) but got 3
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90849
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2023-01-11 01:05:14 +00:00
mingfeima
645fb217c0 optimize sampled_addmm performance on CPU (SparseCSR) (#90978)
### Target and Background
This PR improves the performance of `sampled_addmm` on CPU. It is part of the effort to improve PyG performance on CPU for GNN training/inference.

The current implementation is a reference design that converts the `SparseCSR` tensor to a dense tensor, performs the addmm, and converts the result back to `SparseCSR`. This is very slow and cannot run most of the datasets under https://github.com/snap-stanford/ogb (converting to dense would trigger `OOM`).

### Benchmarks

Right now we don't have any hands-on benchmark or workload to test this since this operator is not used in PyG yet. I fetched the dataset from `ogb-products` where:

* number of nodes: 2.4 * 10^6
* number of edges: 1.26 * 10^8
* number of features: 128

If the **adjacency matrix** were stored as dense, it would take 2.4 * 2.4 * 4 * 10^12 bytes, which would cause an OOM with the current code. I extracted the first 1k rows to compare and measured a **1100x** speedup:

CPU: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, dual socket, 20 cores per socket.
```
### before: run 1000 rows from the whole dataset
sampled_addmm: running dataset ogb-products first 1000 rows: each iter takes 1212.000 ms!

### after: run 1000 rows from the whole dataset
sampled_addmm: running dataset ogb-products first 1000 rows: each iter takes 1.102 ms!

### after: run the whole dataset
sampled_addmm: running dataset ogb-products (the whole dataset) 2449029 rows: each iter takes 873.306 ms!
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90978
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2023-01-10 22:13:35 +00:00
Pearu Peterson
cdc30048e5 Fix numel() result after resizing a sparse compressed tensor. (#91831)
Fixes #91830

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91831
Approved by: https://github.com/cpuhrsch
2023-01-10 18:21:07 +00:00
Pearu Peterson
b797a24259 Support indices contiguity per batch and non-contiguous values in sparse compressed tensors (#91243)
Fixes https://github.com/pytorch/pytorch/issues/91062

With this PR, all reported failures in https://github.com/pytorch/pytorch/pull/90849 are resolved (modulo test_bmm that uses an unorthodox way to construct a batch CSR tensor).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91243
Approved by: https://github.com/nikitaved, https://github.com/amjames, https://github.com/lezcano
2023-01-02 18:08:46 +00:00
Kurt Mohler
08a47549af Rename Tensor._storage to Tensor.untyped_storage and update docs (#91414)
Fixes #89224

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91414
Approved by: https://github.com/ezyang
2022-12-28 19:21:34 +00:00
Nikita Vedeneev
4c5928e387 Fix for mul(compressed, wrapped scalar) (#91239)
Fixes https://github.com/pytorch/pytorch/issues/90819.

The path with `Scalar` should have been picked up by the dispatcher, but still the path with a 0-dim wrapped scalar was broken.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91239
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2022-12-22 13:11:13 +00:00
Pearu Peterson
01e7f46215 Ensure sorted indices from the CSR->BSR conversion (#90918)
Fixes https://github.com/pytorch/pytorch/issues/90910

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90918
Approved by: https://github.com/cpuhrsch
2022-12-16 15:49:48 +00:00
Nikita Vedeneev
c2c14f9597 Sparse compressed mm: fix for orthogonal inputs (#90917)
Fixes https://github.com/pytorch/pytorch/issues/90836
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90917
Approved by: https://github.com/cpuhrsch
2022-12-16 13:08:00 +00:00
Nikita Vedeneev
4dd3de23dd Sparse compressed mm: fix for empty inputs (#90763)
Fixes [#90693](https://github.com/pytorch/pytorch/issues/90693)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90763
Approved by: https://github.com/cpuhrsch
2022-12-16 12:33:57 +00:00
Pearu Peterson
76c6dfeaa6 Add layout and blocksize arguments to Tensor.to_sparse method (#89502)
This PR extends the `Tensor.to_sparse()` method to `Tensor.to_sparse(layout=None, blocksize=None)` in a backward-compatible manner (`layout=None` means `layout=torch.sparse_coo`).

In addition, the PR adds support for the following conversions:
- non-hybrid/hybrid COO tensor to CSR or CSC or a COO tensor
- short, bool, byte, char, bfloat16, int, long, half CSR tensor to a BSR tensor

and fixes the following conversions:
- hybrid COO to COO tensor
- non-batch/batch hybrid BSR to BSR or BSC tensor
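
The role of `blocksize` in a block layout can be illustrated with a plain-Python dense-to-BSR conversion. This is a sketch under simple assumptions (2-D input, shape divisible by the block size, a block is stored if any of its elements is nonzero). `dense_to_bsr` is a hypothetical name and this is not how `Tensor.to_sparse` is implemented.

```python
def dense_to_bsr(dense, blocksize):
    """Convert a 2-D nested list to BSR arrays (crow_indices, col_indices, values)."""
    bh, bw = blocksize
    nrows, ncols = len(dense), len(dense[0])
    if nrows % bh or ncols % bw:
        raise ValueError("matrix shape must be divisible by the block size")
    crow, col, values = [0], [], []
    for bi in range(nrows // bh):
        for bj in range(ncols // bw):
            block = [row[bj * bw:(bj + 1) * bw] for row in dense[bi * bh:(bi + 1) * bh]]
            if any(v != 0 for r in block for v in r):
                col.append(bj)        # block-column index of the stored block
                values.append(block)  # the whole block is kept, zeros included
        crow.append(len(col))         # running count of stored blocks per block row
    return crow, col, values


dense = [
    [1, 0, 0, 0],
    [0, 2, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 3],
]
print(dense_to_bsr(dense, (2, 2)))
```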

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89502
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2022-11-30 20:21:10 +00:00
Pearu Peterson
296e1ba4d0 Row and column select support for block compressed sparse tensors (#88733)
As in the title:

- Support `select` and `select_copy` on block sparse compressed tensors
- Fixes incorrect results when selecting dense dimensions

The PR also improves the performance of indexing sparse compressed tensors considerably:

<details>

Before:

```python
In [3]: a=torch.rand((1000, 1000)).to_sparse_csr()

In [4]: %timeit a.select(0, 0)
606 µs ± 4.27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [5]: %timeit a.select(1, 0)
527 µs ± 57.7 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit a[0, 0]
617 µs ± 3.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [7]: a = a.cuda()

In [8]: %timeit a.select(0, 0); torch.cuda.synchronize();
1.19 ms ± 137 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [9]: %timeit a.select(1, 0); torch.cuda.synchronize();
1.2 ms ± 119 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [10]: %timeit a[0, 0]; torch.cuda.synchronize();
1.23 ms ± 482 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

This PR:

```python
In [3]: a=torch.rand((1000, 1000)).to_sparse_csr()

In [4]: %timeit a.select(0, 0)
4.75 µs ± 8.94 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [5]: %timeit a.select(1, 0)
565 µs ± 156 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit a[0, 0]
13.1 µs ± 435 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [7]: a = a.cuda()

In [8]: %timeit a.select(0, 0); torch.cuda.synchronize();
21.6 µs ± 23.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [9]: %timeit a.select(1, 0); torch.cuda.synchronize();
1.15 ms ± 3.13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [10]: %timeit a[0, 0]; torch.cuda.synchronize();
63.7 µs ± 2.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

</details>
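
In the timings above, `select(0, 0)` becomes much faster while `select(1, 0)` stays roughly the same. One possible reading, which is an interpretation and not something the commit message states, is that a row of a CSR matrix is a contiguous slice of the index and value arrays, whereas a column can have entries in any row. The plain-Python sketch below shows that asymmetry; both function names are hypothetical.

```python
def csr_select_row(crow, col, values, i):
    """Row i is one contiguous slice of col/values, so no search is needed."""
    start, end = crow[i], crow[i + 1]
    return col[start:end], values[start:end]


def csr_select_col(crow, col, values, j):
    """Column j may have an entry in any row, so every row is inspected."""
    rows, vals = [], []
    for i in range(len(crow) - 1):
        for p in range(crow[i], crow[i + 1]):
            if col[p] == j:
                rows.append(i)
                vals.append(values[p])
    return rows, vals


# CSR arrays of [[1, 0, 2], [0, 0, 3], [4, 5, 0]].
crow, col, values = [0, 2, 3, 5], [0, 2, 2, 0, 1], [1, 2, 3, 4, 5]
print(csr_select_row(crow, col, values, 0))
print(csr_select_col(crow, col, values, 2))
```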

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88733
Approved by: https://github.com/nikitaved, https://github.com/amjames, https://github.com/cpuhrsch
2022-11-30 11:15:56 +00:00
Pearu Peterson
90bed8874f Generator of tensor inputs with variable layout and structure (batch/non-batch, hybrid/non-hybrid, block/non-block) (#88914)
This PR introduces `TestCase.generate_simple_inputs` method that is an improved and generalized version of the `TestSparseCompressed._generate_small_inputs` method.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88914
Approved by: https://github.com/cpuhrsch
2022-11-30 02:13:33 +00:00
Pearu Peterson
50e2e4faf3 Sparse CSC/BSR/BSC serialization and pickle support (#89553)
Fixes https://github.com/pytorch/pytorch/issues/89497

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89553
Approved by: https://github.com/cpuhrsch
2022-11-23 20:56:48 +00:00
Andrew M. James
a41f70603a Round out rad2deg sparse support (#88442)
- Add sparse coo dispatch
- Modify backward to work with sparse compressed layouts
- Enable sparse_compressed autograd testing
- Correct layout support attributes on OpInfo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88442
Approved by: https://github.com/cpuhrsch
2022-11-17 06:00:23 +00:00
Nikita Vedeneev
8dc3353b0b add to(dtype) support for all sparse compressed formats (#89055)
Fixes [#88419](https://github.com/pytorch/pytorch/issues/88419)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89055
Approved by: https://github.com/cpuhrsch
2022-11-15 21:16:18 +00:00
Kazuaki Ishizaki
03296844aa Fix typos in messages under aten (#88964)
This PR fixes typos in messages and params in C++ source files under the `aten` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88964
Approved by: https://github.com/lezcano
2022-11-14 09:50:50 +00:00
Andrew M. James
ff6770a9a1 enable backward for log1p (sparse layouts) (#88155)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88155
Approved by: https://github.com/cpuhrsch
2022-11-04 20:59:26 +00:00
Andrew M. James
6938dd0b2c Support sparse inputs to deg2rad (#88156)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88156
Approved by: https://github.com/cpuhrsch
2022-11-04 20:59:26 +00:00
Andrew M. James
1964d8c34f Enable sparse_csr autograd testing for relu (#88154)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88154
Approved by: https://github.com/cpuhrsch
2022-11-04 20:59:23 +00:00
Andrew M. James
f03302ba49 Add sparse layout support for torch.frac (#88153)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88153
Approved by: https://github.com/cpuhrsch
2022-11-04 20:59:22 +00:00
Andrew M. James
b2dfd20260 Remove BSC conversion skip from TestSparseCompressed.test_consistency (#88152)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88152
Approved by: https://github.com/cpuhrsch
2022-11-01 22:18:56 +00:00
Andrew M. James
d044b4cc58 Update torch.abs and torch.positive opinfos to reflect sparse support (#88151)
cc @nikitaved @pearu @cpuhrsch @bhosmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88151
Approved by: https://github.com/cpuhrsch
2022-11-01 22:18:56 +00:00
Ivan Yashchuk
51ea441862 Upcast to fp32 in test_addmm_block ref_half_bfloat16 (#86682)
Fixes https://github.com/pytorch/pytorch/issues/86681
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86682
Approved by: https://github.com/nikitaved
2022-10-11 16:39:57 +00:00
nikitaved
e15a48def7 (bsr/csr) x dense mm (#85551)
As per title. This implementation is not the most optimal and could be improved albeit with native kernels (i.e. block matching need not be materialized).

Compared to existing kernels it offers:

- Half float support (In fact, any dtype that supports `matmul` will work).
- Arbitrary block sizes.
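
As a rough picture of what a block-sparse times dense product computes, here is a plain-Python sketch that works for any block size. It is not the implementation from this PR: `bsr_matmul_dense` is a hypothetical name, the inputs are nested lists, and no attempt is made at efficiency.

```python
def bsr_matmul_dense(crow, col, blocks, blocksize, dense):
    """Multiply a BSR matrix by a dense matrix given as nested lists."""
    bh, bw = blocksize
    n_block_rows = len(crow) - 1
    n_out_cols = len(dense[0])
    out = [[0.0] * n_out_cols for _ in range(n_block_rows * bh)]
    for bi in range(n_block_rows):
        for p in range(crow[bi], crow[bi + 1]):
            bj, block = col[p], blocks[p]
            # Accumulate block @ dense[bj*bw:(bj+1)*bw] into the bi-th row panel.
            for r in range(bh):
                for t in range(bw):
                    a = block[r][t]
                    for c in range(n_out_cols):
                        out[bi * bh + r][c] += a * dense[bj * bw + t][c]
    return out


# BSR form (2 x 2 blocks) of [[1,0,0,0],[0,2,0,0],[0,0,0,0],[0,0,0,3]].
crow, col = [0, 1, 2], [0, 1]
blocks = [[[1, 0], [0, 2]], [[0, 0], [0, 3]]]
dense = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(bsr_matmul_dense(crow, col, blocks, (2, 2), dense))
```
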
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85551
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2022-09-29 17:12:04 +00:00
Andrew M. James
8a926b3187 Enable CSC @ CSC addmm (#85379)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85379
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2022-09-27 19:49:31 +00:00
Andrew M. James
bb5001ce3d Enable dense x bsc mm/addmm (#85308)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85308
Approved by: https://github.com/pearu
2022-09-27 19:49:31 +00:00
Andrew M. James
aaef5d8f2c sparse mm/addmm enable dense x csc, csc x dense and simplify layout check logic. (#85307)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85307
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2022-09-27 16:46:28 +00:00
Andrew M. James
f64857189d resize_as_sparse support all compressed layouts (#85378)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85378
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2022-09-27 06:59:18 +00:00
George Qi
686555b663 [maskedtensor] port torch/_masked into torch/masked (#85515)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85515
Approved by: https://github.com/cpuhrsch
2022-09-26 23:41:13 +00:00
Sean Ross-Ross
a4c94f0739 Fix cuda issue with sparse.sampled_addmm (#85194)
fixes https://github.com/pytorch/pytorch/issues/85169

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85194
Approved by: https://github.com/amjames, https://github.com/nikitaved
2022-09-23 20:52:23 +00:00
nikitaved
0278a141fc csr <-> csc, csc <-> csc, bsr <-> bsc, bsc <-> bsc, bsr <-> bsr conversions (#85091)
As per title. Required to enable a wider selection of sparse formats for `nn.functional.linear`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85091
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2022-09-21 20:10:26 +00:00
Pearu Peterson
f0b06c64c8 Fix bugs in sparse compressed tensor shape and device inference (#85240)
Fixes #84999

This PR
- uses device option to set sparse compressed tensor instance device
- enables shape and device inference tests that were disabled due to an oversight
- fixes a bug in shape inference of hybrid tensors
- fixes a bug in to_sparse_bsr of a cuda tensor
- updates tests that catch the above bugs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85240
Approved by: https://github.com/cpuhrsch
2022-09-19 18:10:37 +00:00
Edward Z. Yang
c5a8946e40 Revert "Revert "Redo how custom/python_custom methods on TensorImpl work (#84796)" (#84806)
This reverts commit ca3b2bfbe3.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84806
Approved by: https://github.com/Chillee
2022-09-10 06:17:35 +00:00
Eli Uriegas
ca3b2bfbe3 Revert "Redo how custom/python_custom methods on TensorImpl work (#84796)
This reverts commit 591b75bf98.

Manual revert of https://github.com/pytorch/pytorch/pull/84641

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84796
Approved by: https://github.com/izaitsevfb
2022-09-10 00:18:13 +00:00
Edward Z. Yang
591b75bf98 Redo how custom/python_custom methods on TensorImpl work (#84641)
A longstanding confusion in the implementation of fake tensor and proxy tensor is what to do about torch.ops.aten.sym_sizes and related calls. In particular, when you have a tensor that (1) has symbolic shapes and (2) has a `__torch_dispatch__` call, previously, you would always get `__torch_dispatch__` calls for sizes/strides query, *even if you didn't request it* via the dispatch kwargs in `make_wrapper_subclass`.

The reason for this is because we were previously mixing several concepts: "I want to dispatch to Python", "I want to call a virtual method" and "I have dynamic shapes". A single boolean variable controlled all of these things, and so it was not possible to understand inside TensorImpl what the user had actually originally requested.

In this PR, we track each of these concepts individually so that we can preserve user intent. Then, we combine these into a single "policy" variable that controls whether we can use the fastpath. For the policy to trigger, we only need one of the exceptional cases to be true.

Billing of changes:
* Rename `set_sizes_strides_policy` to `set_custom_sizes_strides`; in general, you cannot DIRECTLY set policy; you have to indirectly set it by the public functions.
* Some helpers for sizes and strides, since it's more complicated (as it is an enum, rather than just bools as is the case for device and layout). `matches_python_custom` is used to test the Python dispatch user ask. `matches_policy` does the policy test (only used in the user facing functions.)
* I reorged the accessor methods so that they are more logical. This makes the diff bad, so I recommend reading the final code directly.
* The default custom implementations now more reliably call their default() implementations
* As bonus refactor, I devirtualized some functions that don't need to be virtual
* `set_sym_sizes_and_strides` is renamed to `set_sizes_and_strides` to make it easier to use in template contexts; it optionally takes a storage offset now so you can set all three values at the same time. If you use the SymInt overload but there are no symbolic integers, we give you a normal resize.
* This adds `sym_storage_offset` since we had that in the symbolic shapes branch and there's no reason not to put it in (and it reduces merge conflicts)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84641
Approved by: https://github.com/wconstab
2022-09-09 13:41:13 +00:00
Andrew M. James
9b115c7bd3 Sparse Compressed Transpose add support for Batch dims and BSR/BSC layouts (#82122)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82122
Approved by: https://github.com/bhosmer
2022-09-02 17:42:58 +00:00
Andrew M. James
0192a34910 Dense -> CSC support batch dimensions (#83086)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83086
Approved by: https://github.com/bhosmer, https://github.com/nikitaved
2022-09-02 17:42:58 +00:00
Andrew M. James
f0e5b73364 Dense -> CSR support batch dimensions (#83084)
Only requires changes to the dense->sparse pathway. The reverse already has support.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83084
Approved by: https://github.com/bhosmer, https://github.com/nikitaved
2022-09-02 17:42:58 +00:00
Andrew M. James
8778f33744 Dense <-> bsc conversions (#80781)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80781
Approved by: https://github.com/bhosmer, https://github.com/nikitaved
2022-09-01 16:01:58 +00:00
jpvillam
247468baf0 [ROCm] More Sparse UTs enablement and more hipification mappings. (#78939)
Enables:

 test_bmm_cuda_float64
 test_bmm_deterministic_cuda_float64
 test_csr_matvec_cuda_complex128
 test_csr_matvec_cuda_complex64
 test_csr_matvec_cuda_float32
 test_csr_matvec_cuda_float64

To enable the above tests, some more hip mappings had to be added for the hipification process.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78939
Approved by: https://github.com/pruthvistony, https://github.com/malfet
2022-08-23 13:54:09 +00:00
Andrew M. James
eebcb9117a Fix BSR->Dense Batched Bug (#82120)
A TODO in the tests, which should have been addressed and removed before the initial PR landed, was left in place and left holes in the testing of BSR -> Dense. This PR addresses the underlying issue and removes the hole in test coverage. #8071 introduces more comprehensive test coverage for sparse compressed <-> Dense conversion in general.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82120
Approved by: https://github.com/nikitaved, https://github.com/bhosmer
2022-08-06 02:24:20 +00:00
Andrew M. James
0e0dfaa057 Add support for select of batch dims for all sparse compressed formats. (#82119)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82119
Approved by: https://github.com/nikitaved, https://github.com/bhosmer
2022-08-06 02:24:20 +00:00
Nikita Shulga
d80fe49de0 [Reland] Add py-3.10 config (#82329)
This is a re-land of #81372 and #81233 with the exception that it does not force the range-checks on older Python runtime versions and as such should not affect the internal workloads, which were the reason for revert, see https://github.com/pytorch/pytorch/pull/81372#issuecomment-1187516464

- [Py3.10] Allow floats to be imported as Long (#81372)
- [CI] Move CUDA-11.6 to Python-3.10 configuration (#81233)
- Don't do anything about range checks for pre-py3.10
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82329
Approved by: https://github.com/kit1980
2022-07-27 20:22:47 +00:00
Edward Z. Yang
7f7c81c5f9 Add empty_like support for sparse_csc/bsr/bsc (#82310)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82310
Approved by: https://github.com/amjames, https://github.com/nikitaved
2022-07-27 18:59:07 +00:00
PyTorch MergeBot
ec1b3a45ad Revert "[Py3.10] Allow floats to be imported as Long (#81372)"
This reverts commit 69d73345a2.

Reverted https://github.com/pytorch/pytorch/pull/81372 on behalf of https://github.com/DanilBaibak due to Break internal build
2022-07-18 14:55:13 +00:00
Nikita Shulga
69d73345a2 [Py3.10] Allow floats to be imported as Long (#81372)
Thus avoiding `TypeError: 'float' object cannot be interpreted as an integer` when trying to create an integer tensor from floating point values

Use `c10::checked_convert` to detect overflows during tensor construction from scalars. Modify sparse_csr test that violated this rule
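
The idea of a checked conversion can be sketched in Python. `c10::checked_convert` is a C++ utility and the code below does not reproduce it; `checked_to_int64` is a hypothetical analogue that converts a Python number to an integer and rejects results outside the int64 range. The truncation toward zero is Python's `int()` behaviour and is assumed here only for illustration.

```python
INT64_MIN, INT64_MAX = -(2 ** 63), 2 ** 63 - 1


def checked_to_int64(value):
    """Convert a Python number to an int, rejecting values outside the int64 range."""
    result = int(value)  # truncates floats toward zero, e.g. 2.7 -> 2
    if not INT64_MIN <= result <= INT64_MAX:
        raise OverflowError(f"{value!r} does not fit in int64")
    return result


print(checked_to_int64(3.0))
try:
    checked_to_int64(1e30)
except OverflowError as exc:
    print(exc)
```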

Fixes #69319

Tested in #81233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81372
Approved by: https://github.com/ezyang, https://github.com/ngimel
2022-07-15 22:57:58 +00:00
Nikita Vedeneev
880b972841 More efficient indices validations for compressed sparse formats. (#81108)
As per title.

Some of the features:
- native kernels both for the CPU and CUDA without device syncs.
- If needed, invariant checks 5.1 - 5.5 could be improved to utilize vectorization. This will require implementing a conversion `Vectorized -> bool`. That's a follow-up.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81108
Approved by: https://github.com/amjames, https://github.com/pearu, https://github.com/cpuhrsch
2022-07-14 20:36:18 +00:00
Pearu Peterson
d50f4a3c24 Support sparse/dense_dim for Compressed Sparse tensors (#80901)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80901
Approved by: https://github.com/cpuhrsch, https://github.com/nikitaved
2022-07-08 15:49:35 +00:00
Pearu Peterson
d266256621 Support compressed sparse tensors with dense dimensions (#80565)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80565
Approved by: https://github.com/cpuhrsch
2022-07-07 16:21:12 +00:00
PyTorch MergeBot
682c0d2615 Use segment/scatter_reduce to support masked reductions on sparse CSR tensors (mean, amax, amin) (fp only) (#78918)
Follows the design [here](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/sparse/SparseCsrTensorMath.cpp#L804-L837) and [here](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/sparse/SparseCsrTensorMath.cpp#L885-L928) from SparseCsrTensorMath.cpp (which has already been used to implement sum/prod) but uses `segment_reduce`/`scatter_reduce` for the reduction step
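
A segment-style reduction over CSR rows can be sketched in plain Python: `crow_indices` delimits one segment of `values` per row, and the reduction is applied to each segment. `csr_row_reduce` is a hypothetical helper, and returning `None` for rows without stored entries is an assumption made for this sketch, not the behaviour defined by the PR.

```python
def csr_row_reduce(crow, values, op):
    """Apply `op` to the stored values of each row; empty rows yield None."""
    out = []
    for i in range(len(crow) - 1):
        segment = values[crow[i]:crow[i + 1]]
        out.append(op(segment) if segment else None)
    return out


def mean(xs):
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


# Three rows with 2, 0 and 3 stored entries respectively.
crow, values = [0, 2, 2, 5], [1.0, 3.0, 2.0, 4.0, 6.0]
print(csr_row_reduce(crow, values, max))   # amax over stored entries
print(csr_row_reduce(crow, values, min))   # amin over stored entries
print(csr_row_reduce(crow, values, mean))  # mean over stored entries
```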

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78918
Approved by: https://github.com/cpuhrsch
2022-06-30 14:11:53 +00:00
Andrew M. James
9e3677f85d Add support for BSR <-> Strided Conversion (#80354)
Supersedes #78303

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80354
Approved by: https://github.com/cpuhrsch
2022-06-27 21:09:09 +00:00
Pearu Peterson
cde365a7cd Validate Sparse Compressed tensor inputs (#79385)
The validation includes regular tensor inputs, batched tensor inputs, as well as hybrid tensor inputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79385
Approved by: https://github.com/nikitaved, https://github.com/cpuhrsch
2022-06-27 17:19:54 +00:00
Nikita Vedeneev
9ad91cc6e0 optimize to_dense for CSC (#79635)
As per title. Previously it was done via converting to COO.
A better approach could be using `dense.out_`, but `sparse_csc` is still forbidden.
And are we fine with implementing very critical operations like `add` via transpositions?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79635
Approved by: https://github.com/cpuhrsch
2022-06-21 16:52:16 +00:00
jpvillam
aff7eef476 [ROCm] Enable some sparse tests on ROCm (#77877)
Enabling:
test_sampled_addmm_errors_cuda_complex128
test_sampled_addmm_errors_cuda_complex64
test_sampled_addmm_errors_cuda_float32
test_sampled_addmm_errors_cuda_float64
test_sparse_add_cuda_complex128
test_sparse_add_cuda_complex64

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77877
Approved by: https://github.com/pruthvistony, https://github.com/malfet
2022-06-14 21:11:35 +00:00
Pearu Peterson
fb6749d977 Support CSC/BSR/BSC inputs to unary zero-preserving functions.
In addition, enable testing masked reductions in sparse compressed consistency check.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78173

Approved by: https://github.com/cpuhrsch
2022-06-09 09:46:34 +00:00
Pearu Peterson
8c88a55d44 Fix sparse BSR tensor validation.
Also adds bits to support dense dimensions for Sparse Compressed tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78359

Approved by: https://github.com/cpuhrsch
2022-05-27 13:26:35 +00:00
Christian Puhrsch
b9fb940dec Conversion between SparseBsr and Strided (#78025)
Adds conversion between the strided and SparseBsr layout

[Based on code by @bhosmer!](https://colab.research.google.com/drive/1NHWti04TU269dzbRjLfxGxVlzZWo1XLo?usp=sharing)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78025
Approved by: https://github.com/pearu, https://github.com/jbschlosser
2022-05-25 15:03:35 +00:00
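To illustrate what the strided/BSR conversion in b9fb940dec involves, here is a minimal, dependency-free sketch of the idea. It is not PyTorch's implementation: the helper names `dense_to_bsr` and `bsr_to_dense` are hypothetical, matrices are plain nested lists, and a block is assumed to be stored whenever it contains any nonzero element.

```python
def dense_to_bsr(dense, blocksize):
    """Convert a 2D nested-list matrix to BSR arrays (illustrative helper)."""
    br, bc = blocksize
    nrows, ncols = len(dense), len(dense[0])
    assert nrows % br == 0 and ncols % bc == 0, "shape must be divisible by blocksize"
    crow_indices, col_indices, values = [0], [], []
    for bi in range(nrows // br):
        for bj in range(ncols // bc):
            # Slice out the (bi, bj) block as a small nested list.
            block = [row[bj * bc:(bj + 1) * bc] for row in dense[bi * br:(bi + 1) * br]]
            if any(v != 0 for row in block for v in row):
                col_indices.append(bj)
                values.append(block)
        # Running count of stored blocks closes this block row.
        crow_indices.append(len(col_indices))
    return crow_indices, col_indices, values


def bsr_to_dense(crow_indices, col_indices, values, shape, blocksize):
    """Scatter the stored blocks back into a zero-filled dense matrix."""
    br, bc = blocksize
    dense = [[0] * shape[1] for _ in range(shape[0])]
    for bi in range(len(crow_indices) - 1):
        for k in range(crow_indices[bi], crow_indices[bi + 1]):
            bj = col_indices[k]
            for r in range(br):
                for c in range(bc):
                    dense[bi * br + r][bj * bc + c] = values[k][r][c]
    return dense


dense = [
    [1, 2, 0, 0],
    [3, 4, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 5, 0],
]
crow, col, vals = dense_to_bsr(dense, (2, 2))
roundtrip = bsr_to_dense(crow, col, vals, (4, 4), (2, 2))
```

Note that the second stored block keeps three explicit zeros next to the 5: BSR trades some storage for regular, dense blocks.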
Christian Puhrsch
a8467de6fa Guard test_sparse_csr.test_mm on CUDA11+ (#77965)
Fixes #77944

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77965
Approved by: https://github.com/albanD, https://github.com/malfet
2022-05-20 16:16:28 +00:00
Christian Puhrsch
ec290949aa Change transpose to return CSC when given CSR, adjust addmm, addmv, mm (#77615)
Changes transpose to return CSC when given CSR and adds CSC support via to_sparse_csr to addmm and addmv.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77615
Approved by: https://github.com/pearu, https://github.com/albanD
2022-05-19 14:17:55 +00:00
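One way to see why CSC is a natural result for the transpose of a CSR tensor (ec290949aa): the three CSR arrays of a matrix, read as CSC arrays, describe its transpose. The sketch below checks this with plain Python lists. The helper names are hypothetical, and whether PyTorch reuses the arrays without copying is not stated in the commit message.

```python
def csr_to_dense(crow_indices, col_indices, values, shape):
    """Expand CSR arrays into a nested-list dense matrix."""
    dense = [[0] * shape[1] for _ in range(shape[0])]
    for i in range(shape[0]):
        for k in range(crow_indices[i], crow_indices[i + 1]):
            dense[i][col_indices[k]] = values[k]
    return dense


def csc_to_dense(ccol_indices, row_indices, values, shape):
    """Expand CSC arrays into a nested-list dense matrix."""
    dense = [[0] * shape[1] for _ in range(shape[0])]
    for j in range(shape[1]):
        for k in range(ccol_indices[j], ccol_indices[j + 1]):
            dense[row_indices[k]][j] = values[k]
    return dense


# A 2x3 matrix stored in CSR form.
crow, col, vals = [0, 2, 3], [0, 2, 1], [10, 20, 30]
a = csr_to_dense(crow, col, vals, (2, 3))

# The same three arrays, interpreted as CSC of a 3x2 matrix.
a_t = csc_to_dense(crow, col, vals, (3, 2))

# Reference transpose computed directly from the dense matrix.
expected_t = [list(r) for r in zip(*a)]
```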
Pearu Peterson
8b5f11c61e Support copy_ for Sparse Compressed tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77605

Approved by: https://github.com/cpuhrsch
2022-05-18 21:22:19 +00:00
Christian Puhrsch
e10a002e52 2D Strided to/from CSC, COO to CSC, CSC to CSC conversion. (#77521)
Adds
- to_sparse_csc for strided input
- to_sparse_csc for COO input
- CSC to strided
- CSC to CSR
- CSC to CSC

Uses SciPy as a reference

Follow-up work is changing transpose to return CSC when passed CSR, and handling the resulting ripples through our matmul operations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77521
Approved by: https://github.com/pearu, https://github.com/anjali411
2022-05-18 14:49:11 +00:00
Pearu Peterson
ccc991ba29 Support str for Sparse Compressed tensors
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77530

Approved by: https://github.com/cpuhrsch
2022-05-18 12:58:54 +00:00
Pearu Peterson
dc882ed33d Add Sparse Compressed tensor support to torch.clone
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77512

Approved by: https://github.com/cpuhrsch
2022-05-17 16:29:41 +00:00
PyTorch MergeBot
0d1329c4ea Revert "Add Sparse Compressed tensor support to torch.clone"
This reverts commit 942f04172a.

Reverted https://github.com/pytorch/pytorch/pull/77512 on behalf of https://github.com/atalman
2022-05-17 14:26:52 +00:00
Pearu Peterson
942f04172a Add Sparse Compressed tensor support to torch.clone
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77512

Approved by: https://github.com/cpuhrsch
2022-05-17 07:32:46 +00:00
PyTorch MergeBot
f1c8e8fa4e Revert "Add Sparse Compressed tensor support to torch.clone"
This reverts commit 20ba6e6935.

Reverted https://github.com/pytorch/pytorch/pull/77512 on behalf of https://github.com/malfet
2022-05-17 00:31:49 +00:00
Christian Puhrsch
89e32f52c7 Change test_sparse_csr test signatures (#77595)
Some consuming tools aren't equipped to split on the "(" and ")" induced by passing tuples to parametrize.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77595
Approved by: https://github.com/malfet
2022-05-17 00:24:08 +00:00
Pearu Peterson
20ba6e6935 Add Sparse Compressed tensor support to torch.clone
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77512

Approved by: https://github.com/cpuhrsch
2022-05-16 22:21:49 +00:00
Pearu Peterson
d76efed578 Add Sparse CSC support to torch.empty
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77508

Approved by: https://github.com/cpuhrsch
2022-05-16 18:53:56 +00:00
Christian Puhrsch
8c608a79b4 Compressed sparse layout conversion stubs (#77489)
This PR unifies sparse layout conversions into a single location and adds stubs to raise a Runtime error for unsupported conversions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77489
Approved by: https://github.com/pearu, https://github.com/mruberry
2022-05-16 18:37:42 +00:00
Pearu Peterson
88205886d7 Add ccol_indices and row_indices methods.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77503

Approved by: https://github.com/cpuhrsch
2022-05-16 00:23:54 +00:00
Christian Puhrsch
289192199a Add to_sparse_bsr (#77366)
Conversion function of CSR to BSR.

Follow up work includes
- Conversion from strided, COO, CSC, BSC
- autograd
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77366
Approved by: https://github.com/IvanYashchuk, https://github.com/mikaylagawarecki
2022-05-13 20:16:03 +00:00
Christian Puhrsch
b250759242 mul(dense, csr), mul(csr, dense) via sparse_mask_csr (#77177)
This adds basic coverage, but can be easily made more efficient by providing a native implementation.

Follow up work includes supporting CSR gradients for strided Tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77177
Approved by: https://github.com/nikitaved, https://github.com/mikaylagawarecki
2022-05-12 23:56:10 +00:00
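The masking idea behind `mul(dense, csr)` in b250759242 can be sketched as follows: only the dense entries at the CSR operand's stored positions contribute, so the result keeps the same sparsity pattern. This is an illustrative pure-Python version, not the `sparse_mask_csr` code path itself; `mul_dense_csr` is a hypothetical name.

```python
def mul_dense_csr(dense, crow_indices, col_indices, values):
    """Elementwise product of a dense matrix and a CSR matrix.

    Returns CSR arrays with the same pattern as the sparse operand.
    """
    out_values = []
    for i in range(len(crow_indices) - 1):
        for k in range(crow_indices[i], crow_indices[i + 1]):
            # Sample the dense operand at the stored position (i, col_indices[k]).
            out_values.append(dense[i][col_indices[k]] * values[k])
    return list(crow_indices), list(col_indices), out_values


dense = [[1, 2, 3], [4, 5, 6]]
crow, col, vals = [0, 2, 3], [0, 2, 1], [10, 20, 30]
out_crow, out_col, out_vals = mul_dense_csr(dense, crow, col, vals)
```

The sketch treats unspecified entries of the sparse operand as zero, so dense entries outside the pattern never appear in the output.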
Ivan Yashchuk
09be44de7b Sparse BSR: Enable addmm, addmv, triangular_solve for BSR layout (#77255)
This PR enables `addmm`, `addmv`, `triangular_solve` functions for tensors with `torch.sparse_bsr` layout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77255
Approved by: https://github.com/cpuhrsch
2022-05-12 08:31:44 +00:00
Ivan Yashchuk
d1beda53e8 Sparse CSR CUDA: add batched support for torch.sparse.sampled_addmm
This PR adds a for loop around the cuSPARSE calls to support batched inputs.
The cuSPARSE function itself doesn't support batched inputs yet.
`mat1` and `mat2` must have the same batch shape. It's allowed to pass
`self` as a single matrix when `mat1` and `mat2` are batched.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77243

Approved by: https://github.com/cpuhrsch
2022-05-12 08:23:38 +00:00
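The batching strategy described in d1beda53e8 (loop over the batch, call a non-batched kernel each time) can be sketched in plain Python. The sketch assumes the documented semantics of `torch.sparse.sampled_addmm`, namely `beta * self + alpha * (mat1 @ mat2)` evaluated only at the positions in `self`'s sparsity pattern. The function names and list-based layout are illustrative, and only the case of a single shared `self` matrix is shown.

```python
def sampled_addmm(crow, col, values, mat1, mat2, beta=1.0, alpha=1.0):
    """Non-batched reference: returns the output values on self's CSR pattern."""
    out = []
    inner = len(mat2)  # shared inner dimension of mat1 @ mat2
    for i in range(len(crow) - 1):
        for k in range(crow[i], crow[i + 1]):
            j = col[k]
            # Only the (i, j) entry of mat1 @ mat2 is ever computed.
            dot = sum(mat1[i][p] * mat2[p][j] for p in range(inner))
            out.append(beta * values[k] + alpha * dot)
    return out


def batched_sampled_addmm(crow, col, values, mat1_batch, mat2_batch, beta=1.0, alpha=1.0):
    """Loop over the batch, reusing a single `self` pattern for every matrix pair."""
    assert len(mat1_batch) == len(mat2_batch), "mat1 and mat2 need the same batch shape"
    return [
        sampled_addmm(crow, col, values, m1, m2, beta, alpha)
        for m1, m2 in zip(mat1_batch, mat2_batch)
    ]


# `self`: a 2x2 matrix with stored entries at (0, 1) and (1, 0).
crow, col, values = [0, 1, 2], [1, 0], [1.0, 2.0]
mat1_batch = [[[1, 2], [3, 4]], [[1, 0], [0, 1]]]
mat2_batch = [[[1, 0], [0, 1]], [[5, 6], [7, 8]]]
result = batched_sampled_addmm(crow, col, values, mat1_batch, mat2_batch)
```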
Ivan Yashchuk
545d90f032 Sparse CSR: enable autograd for torch.sparse.addmm and torch.sparse.mm
This PR updates the derivative rule for `torch.sparse.addmm` to
work with CSR sparse matrices. Notably, `torch.sparse.sampled_addmm` is
used in the backward function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76591

Approved by: https://github.com/cpuhrsch
2022-05-11 18:57:40 +00:00
PyTorch MergeBot
f94abd59f7 Revert "Sparse CSR: enable autograd for torch.sparse.addmm and torch.sparse.mm"
This reverts commit 721a8ca697.

Reverted https://github.com/pytorch/pytorch/pull/76591 on behalf of https://github.com/janeyx99
2022-05-10 13:21:46 +00:00
Ivan Yashchuk
721a8ca697 Sparse CSR: enable autograd for torch.sparse.addmm and torch.sparse.mm
This PR updates the derivative rule for `torch.sparse.addmm` to
work with CSR sparse matrices. Notably, `torch.sparse.sampled_addmm` is
used in the backward function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76591

Approved by: https://github.com/cpuhrsch
2022-05-10 08:44:55 +00:00
Ivan Yashchuk
3df0140cbd Sparse CSR: Fix sampled_addmm for noncontiguous inputs and fix block sparse triangular solve
`torch.sparse.sampled_addmm` was incorrect for noncontiguous inputs on CUDA.
Unfortunately, the tests did not catch this: they used 1x5 and 5x1 shapes,
which do not properly exercise noncontiguous inputs.

The block sparse triangular solver on CUDA could return incorrect results if
there is a zero on the diagonal of the sparse matrix. Now it returns nan.
Tests also revealed that the `unitriangular=True` flag is not working
correctly on CPU in some cases. That part needs more investigation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76590

Approved by: https://github.com/cpuhrsch
2022-05-05 09:00:48 +00:00
Ivan Yashchuk
1335512056 Sparse CSR: Add CPU fallback for sampled_addmm
`torch.sparse.sampled_addmm` function is used in backward for
`torch.sparse.addmm` and `torch.sparse.mm` therefore we need a CPU
implementation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76589

Approved by: https://github.com/cpuhrsch
2022-05-04 21:30:43 +00:00
Pearu Peterson
436a7be059 Factory functions for sparse CSC, BSR, and BSC tensors
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76634

Tests for Sparse Compressed factory functions

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76746

Approved by: https://github.com/cpuhrsch
2022-05-04 03:30:41 +00:00
Ivan Yashchuk
d7db6a7b02 Sparse CSR: Add backward for torch.sparse.sampled_addmm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68084

Approved by: https://github.com/cpuhrsch
2022-05-02 17:58:20 +00:00
Ivan Yashchuk
407e8eba8c Enable simple indexing into CSR tensor, add torch.select for CSR
This PR implements `torch.select` for CSR tensors. Currently, it's not possible to select rows or columns for batched CSR. The non-batched case works fine by converting to COO and calling select. Initially, I implemented raw manipulations of indices but converting to COO is only slightly slower and more readable.

This PR also enables indexing into batched CSR tensor with `[x, y, z]`. Assigning is disabled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76228
Approved by: https://github.com/cpuhrsch
2022-04-23 02:36:03 +00:00
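The "convert to COO, then select" approach mentioned in 407e8eba8c can be sketched for the non-batched case as below. This is an illustrative pure-Python version with hypothetical helper names. It returns the selected row or column as a 1-D sparse vector (index list plus value list), which is an assumption about the output representation and not necessarily what PyTorch returns.

```python
def csr_to_coo(crow_indices, col_indices, values):
    """Expand compressed row pointers into one explicit row index per stored value."""
    rows = []
    for i in range(len(crow_indices) - 1):
        rows.extend([i] * (crow_indices[i + 1] - crow_indices[i]))
    return rows, list(col_indices), list(values)


def select(crow_indices, col_indices, values, dim, index):
    """Select row `index` (dim=0) or column `index` (dim=1) of a 2D CSR matrix."""
    rows, cols, vals = csr_to_coo(crow_indices, col_indices, values)
    if dim == 0:
        kept = [(c, v) for r, c, v in zip(rows, cols, vals) if r == index]
    elif dim == 1:
        kept = [(r, v) for r, c, v in zip(rows, cols, vals) if c == index]
    else:
        raise ValueError("dim must be 0 or 1 for a 2D matrix")
    return [i for i, _ in kept], [v for _, v in kept]


# 2x3 matrix [[10, 0, 20], [0, 30, 0]] in CSR form.
crow, col, vals = [0, 2, 3], [0, 2, 1], [10, 20, 30]
row0 = select(crow, col, vals, dim=0, index=0)
col1 = select(crow, col, vals, dim=1, index=1)
```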
arindamroy-eng
7478ce187a ROCM:Unskip more tests for ROCM5.0
Re-enabling more tests which are working on ROCM5.0

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75353
Approved by: https://github.com/ezyang
2022-04-19 19:45:55 +00:00
Ivan Yashchuk
bba4780232 Enable autograd wrt sparse CSR tensors
This pull request enables accumulating gradients for the CSR tensor.
Functions that work and are tested:
- tensor.abs()
- tensor.neg()
- tensor.conj_physical()
- torch.addmm

`torch.mm` also works, but tests will be added later.

In addition, this PR adds throwing an error when trying to access strides, storage, and contiguity info on a CSR tensor.

`tensor.to_sparse_csr().to_sparse_csr()` was failing and is now fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75435
Approved by: https://github.com/cpuhrsch
2022-04-19 18:42:45 +00:00
Pearu Peterson
e9791cd8c9 Validate Sparse Compressed tensor arguments
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75946

Approved by: https://github.com/cpuhrsch
2022-04-18 02:21:22 +00:00
Yukio Siraichi
22a10ce513 Port cat kernel to structured kernels.
Tracking issue: #55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68640

Approved by: https://github.com/ezyang
2022-04-14 17:49:43 +00:00
Ivan Yashchuk
3f1351d1cf Disable strides and contiguity for CSR tensors
This pull request adds throwing an error when trying to access the strides, storage, and contiguity info of a CSR tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75499
Approved by: https://github.com/cpuhrsch
2022-04-08 23:15:19 +00:00
Pearu Peterson
e61b2e12e1 Support masked sum on CSR tensors [CPU, CUDA]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72633

Approved by: https://github.com/cpuhrsch
2022-04-08 20:07:18 +00:00
PyTorch MergeBot
31ed77b769 Revert "Support masked sum on CSR tensors [CPU, CUDA]"
This reverts commit 5c28216aea.

Reverted https://github.com/pytorch/pytorch/pull/72633 on behalf of https://github.com/b0noI
2022-04-07 23:34:58 +00:00
Ivan Yashchuk
c7ae23b50e Extend CSR constructor to support batched indices and values
This is the first portion of changes required to enable Batched CSR format described in https://github.com/pytorch/pytorch/issues/60854#batched-CSR-computation.

Currently, only the same batch shape for indices and values is allowed. In the future, we could enable "broadcasting" of indices and batched values, as done in xFormers (dd96b8d8be/xformers/components/attention/_sputnik_sparse.py (L441)).

This PR adds the ability to construct a batched CSR matrix with `torch.sparse_csr_tensor`; the resulting batched CSR tensor can be converted to a dense tensor with a `.to_dense()` call.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74542
Approved by: https://github.com/cpuhrsch
2022-04-07 17:10:52 +00:00
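As a rough picture of the batched CSR layout introduced in c7ae23b50e, the sketch below stores one set of CSR arrays per batch entry and expands them to dense. It uses nested Python lists in place of tensors, and `batched_csr_to_dense` is a hypothetical helper, not the `.to_dense()` implementation. Because the index and value arrays share the same batch shape, every batch entry here has the same number of stored elements.

```python
def batched_csr_to_dense(crow_indices, col_indices, values, shape):
    """Expand batched CSR arrays into a (batch, nrows, ncols) nested list.

    crow_indices: batch x (nrows + 1), col_indices and values: batch x nnz.
    """
    batch, nrows, ncols = shape
    out = []
    for b in range(batch):
        dense = [[0] * ncols for _ in range(nrows)]
        for i in range(nrows):
            for k in range(crow_indices[b][i], crow_indices[b][i + 1]):
                dense[i][col_indices[b][k]] = values[b][k]
        out.append(dense)
    return out


# Two 2x2 matrices, each with two stored elements.
crow = [[0, 1, 2], [0, 2, 2]]
col = [[0, 1], [0, 1]]
vals = [[1, 2], [3, 4]]
dense = batched_csr_to_dense(crow, col, vals, (2, 2, 2))
```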
Pearu Peterson
5c28216aea Support masked sum on CSR tensors [CPU, CUDA]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72633

Approved by: https://github.com/cpuhrsch
2022-04-07 17:08:35 +00:00