Commit Graph

59 Commits

Author SHA1 Message Date
Xuehai Pan
2ccfd14e23 [BE] fix typos in docs/ (#156080)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156080
Approved by: https://github.com/cyyever, https://github.com/albanD
2025-06-21 02:47:32 +00:00
Nikita Shulga
f7c09f864a [Docs] Reformat sparse example (#154785)
Not sure why, but rst fails to colorize multiline inputs, though it works fine for single-line commands.
Test plan:
| [Before](https://docs.pytorch.org/docs/main/sparse.html#construction)  | [After](https://docs-preview.pytorch.org/pytorch/pytorch/154785/sparse.html#construction) |
| ------------- | ------------- |
| <img width="466" alt="image" src="https://github.com/user-attachments/assets/96a5c52a-1804-4d05-a5cf-c10221aaddf6" />  | <img width="477" alt="image" src="https://github.com/user-attachments/assets/99565288-5c0b-4e8e-bd60-f016ebc207b5" />  |

Fixes https://github.com/pytorch/pytorch/issues/154779

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154785
Approved by: https://github.com/janeyx99, https://github.com/Skylion007
2025-06-01 20:56:14 +00:00
Zitong Zhan
90c821814e SparseCsrCUDA: cuDSS backend for linalg.solve (#129856)
This PR switches to the cuDSS library and has the same purpose as #127692, which is to add Sparse CSR tensor support to linalg.solve.
Fixes #69538

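Minimal usage example:
```
import torch

if __name__ == '__main__':
    # Build a small symmetric matrix A = M^T M from a random 4 x 3 matrix M
    spd = torch.rand(4, 3)
    A = spd.T @ spd
    b = torch.rand(3).to(torch.float64).cuda()
    # Convert A to the sparse CSR layout, in float64, on the GPU
    A = A.to_sparse_csr().to(torch.float64).cuda()

    # Solve A x = b and print the norm of the residual
    x = torch.linalg.solve(A, b)
    print((A @ x - b).norm())
```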

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129856
Approved by: https://github.com/amjames, https://github.com/lezcano, https://github.com/huydhn

Co-authored-by: Zihang Fang <zhfang1108@gmail.com>
Co-authored-by: Huy Do <huydhn@gmail.com>
2024-08-22 07:57:30 +00:00
redwrasse
63a0a65df9 Define 'zero-preserving unary functions' in docs (#130804)
Make explicit the definition of 'zero-preserving unary functions' in the sparse tensors documentation.
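
As an illustration of the definition: a unary function f is zero-preserving when f(0) = 0, so applying it only to the stored values of a sparse tensor gives the same result as applying it to every element. The sketch below is dependency-free and uses a hypothetical dict-based sparse vector rather than the `torch.sparse` API; the helper names are illustrative assumptions.

```python
import math


def is_zero_preserving(f):
    """Return True when f maps 0 to 0."""
    return f(0.0) == 0.0


def apply_to_stored(f, sparse_vec):
    """Apply f only to the explicitly stored entries of a {index: value} vector."""
    return {i: f(v) for i, v in sparse_vec.items()}


def to_dense(sparse_vec, size):
    """Expand a {index: value} vector into a dense list, filling gaps with 0.0."""
    return [sparse_vec.get(i, 0.0) for i in range(size)]


# Hypothetical sparse vector of length 6 with two stored entries.
vec = {1: 0.5, 4: 2.0}
size = 6

# sin(0) == 0, so transforming only the stored values matches the dense result.
sparse_result = to_dense(apply_to_stored(math.sin, vec), size)
dense_result = [math.sin(x) for x in to_dense(vec, size)]

# cos(0) == 1, so cos is not zero-preserving: implicit zeros would become ones.
sin_ok = is_zero_preserving(math.sin)
cos_ok = is_zero_preserving(math.cos)
```

A function such as `cos` would turn every implicit zero into a non-zero value, so the result could no longer be stored sparsely without changing its meaning.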

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130804
Approved by: https://github.com/soulitzer
2024-07-18 13:30:29 +00:00
Nathan
ae983d2d6e Fix typo in sparse.rst (#121826)
Change word "on" to "one" when talking in the third person.

Fixes #121770
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121826
Approved by: https://github.com/janeyx99
2024-03-19 00:17:19 +00:00
Linus
5f2ff29569 Fix typo in https://pytorch.org/docs/stable/sparse.html (#115282)
Fixes #111473

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115282
Approved by: https://github.com/svekars
2023-12-08 18:31:33 +00:00
Kazuaki Ishizaki
aa3629ee3e Fix typo under docs directory (#110359)
This PR fixes typos in `.rst` files under the docs directory

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110359
Approved by: https://github.com/kit1980
2023-10-03 16:36:05 +00:00
Pearu Peterson
c5ad44be1d Add torch.sparse.as_sparse_gradcheck decorator of gradcheck that allows gradcheck input function to receive and return sparse tensors (#107150)
Compared to #104848, this PR goes a step further: when the enable_sparse_support decorator is applied to `torch.autograd.gradcheck`, the resulting callable is equivalent to `torch.autograd.gradcheck` with the extra feature of supporting functions that take sparse tensors as input and/or return sparse tensors.

At the same time, the underlying call to `torch.autograd.gradcheck` will operate on strided tensors only. This basically means that torch/autograd/gradcheck.py can be cleaned up by removing the code that deals with sparse tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107150
Approved by: https://github.com/albanD, https://github.com/amjames, https://github.com/cpuhrsch
ghstack dependencies: #107638, #107777
2023-08-26 07:24:31 +00:00
Aleksandar Samardžić
d7e6040efa Update sparse semi-structured linear operator (#104608)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104608
Approved by: https://github.com/cpuhrsch
2023-07-13 23:52:39 +00:00
Aleksandar Samardžić
fc2f87b281 Add semi-structured sparse conversions (#103830)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103830
Approved by: https://github.com/amjames, https://github.com/jcaip, https://github.com/cpuhrsch
2023-07-13 21:09:09 +00:00
Jesse Cai
2da6cae43c [core][pruning][sparse][feature] SparseSemiStructured tensor subclass (#102135)
This PR adds in support for semi-structured sparsity via a tensor
subclass. It currently uses the CUTLASS kernels merged in PR #100881.

In the future we plan to add in cuSPARSELt support (see the other PRs in
the stack), which will give us larger performance gains.

This PR adds in 2 things:
- a Tensor subclass, `SparseSemiStructuredTensor`, to store the
  sparse tensor in compressed form and override `__torch_dispatch__`.
- a conversion function that takes in a dense tensor and a
  semi-structured sparse bool mask and creates an instance of the
  subclass.

**SparseSemiStructuredTensor**

The subclass stores the dense tensor in a contiguous flattened tensor
for future compatibility with cuSPARSELt, which expects this format.
Note that the CUTLASS kernels do not have this limitation, as the
specified values and the metadata are passed separately in
`_structured_sparse_linear`. In the future we can use the cuSPARSELt bindings
[here](https://github.com/pytorch/pytorch/pull/103700) for faster matmul, better dtype coverage, and relaxed shape
constraints.

Since we currently don't have a way to go back from the sparse
representation to the dense representation, and we store the weights in
compressed form, we don't have a great way to handle .t().

Instead, we keep track of how often we've called transpose on our
tensor, and if it's an unexpected number we throw an error. When the first
argument is sparse, we expect an even number of calls to transpose,
while when the second argument is sparse, we expect an odd number of
calls. This is because we support second argument sparse matrix
multiplications by using transpose properties.
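
The transpose property referred to here is (A B)^T = B^T A^T, which lets a product with the sparse operand second be rewritten with the sparse operand first: `dense @ sparse == (sparse.T @ dense.T).T`. A minimal dependency-free sketch, with nested lists standing in for tensors (the helper names are illustrative and not part of PyTorch):

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


dense = [[1, 2], [3, 4], [5, 6]]        # 3 x 2
sparse = [[0, 7, 0, 1], [2, 0, 3, 0]]   # 2 x 4, stands in for the sparse operand

# Sparse operand second, computed directly.
direct = matmul(dense, sparse)

# Same product with the (transposed) sparse operand first.
via_transpose = transpose(matmul(transpose(sparse), transpose(dense)))
```

In the rewritten form the sparse operand appears transposed once, which is consistent with the odd transpose count expected when the second argument is sparse.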

**to_sparse_semi_structured**

This is a conversion function to convert a dense tensor and a
semi-structured sparse bool mask into a subclass. Currently, we must
pass in a bool mask because it cannot be inferred: there may be
additional zero elements in the dense tensor, so `tensor != 0` is not
necessarily 2:4 sparse.

Once we add either a method to derive the mask from the dense tensor or
cuSPARSELt, we will no longer need to pass in the mask. cuSPARSELt has its
own helper functions to create the metadata mask.
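
To make the mask problem concrete, the following sketch assumes that a valid 2:4 mask selects exactly two of every four consecutive elements. It is plain Python with illustrative names, not the `torch.sparse` API, and shows how a kept value that happens to be zero breaks mask inference from `tensor != 0`:

```python
def is_2_4_mask(mask_row):
    """Check that every contiguous group of 4 entries selects exactly 2."""
    if len(mask_row) % 4 != 0:
        return False
    return all(sum(mask_row[i:i + 4]) == 2 for i in range(0, len(mask_row), 4))


# Explicit mask: keep the last two of every four elements.
explicit_mask = [False, False, True, True]

# Pruned values under that mask; one of the kept values happens to be zero.
pruned_row = [0.0, 0.0, 0.0, 0.9]

# Inferring the mask from the non-zero pattern selects only one element.
inferred_mask = [v != 0 for v in pruned_row]

explicit_ok = is_2_4_mask(explicit_mask)
inferred_ok = is_2_4_mask(inferred_mask)
```

Here `explicit_ok` is true while `inferred_ok` is false, which is why the mask has to be passed in separately.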

**User Details**

We have implemented support for the following ops for `torch.float16`
and `torch.int8`:
```
torch.addmm(bias, dense, sparse.t())
torch.mm(dense, sparse)
torch.mm(sparse, dense)
aten.linear.default
aten.t.default
aten.t.detach
```

The end user interface to accelerate an nn.Linear module with the
subclass would look like this:

```
import torch
from torch import nn
from torch.sparse import to_sparse_semi_structured

# 2:4 mask that keeps the last two of every four elements, shape (128, 128)
mask = torch.Tensor([0, 0, 1, 1]).tile(128, 32).cuda().bool()
linear = nn.Linear(128, 128).half().cuda()

# Replace the dense weight with its semi-structured sparse counterpart
linear.weight = nn.Parameter(to_sparse_semi_structured(linear.weight,
                                                       mask=mask))
```

This also updates tests and the `torch.sparse` module docstring to
reflect these changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102135
Approved by: https://github.com/albanD
2023-06-27 19:21:06 +00:00
PyTorch MergeBot
b76a040b18 Revert "[core][pruning][sparse][feature] SparseSemiStructured tensor subclass (#102135)"
This reverts commit aea771de30.

Reverted https://github.com/pytorch/pytorch/pull/102135 on behalf of https://github.com/huydhn due to test_sparse_semi_structured.py::TestSparseSemiStructuredCUDA::test_mm_sparse_first_NT_cuda_int8 is still failing CUDA trunk jobs aea771de30 ([comment](https://github.com/pytorch/pytorch/pull/102135#issuecomment-1608744110))
2023-06-27 03:49:31 +00:00
Jesse Cai
aea771de30 [core][pruning][sparse][feature] SparseSemiStructured tensor subclass (#102135)
This PR adds in support for semi-structured sparsity via a tensor
subclass. It currently uses the CUTLASS kernels merged in PR #100881.

In the future we plan to add in cuSPARSELt support (see the other PRs in
the stack), which will give us larger performance gains.

This PR adds in 2 things:
- a Tensor subclass, `SparseSemiStructuredTensor`, to store the
  sparse tensor in compressed form and override `__torch_dispatch__`.
- a conversion function that takes in a dense tensor and a
  semi-structured sparse bool mask and creates an instance of the
  subclass.

**SparseSemiStructuredTensor**

The subclass stores the dense tensor in a contiguous flattened tensor
for future compatibility with cuSPARSELt, which expects this format.
Note that the CUTLASS kernels do not have this limitation, as the
specified values and the metadata are passed separately in
`_structured_sparse_linear`. In the future we can use the cuSPARSELt bindings
[here](https://github.com/pytorch/pytorch/pull/103700) for faster matmul, better dtype coverage, and relaxed shape
constraints.

Since we currently don't have a way to go back from the sparse
representation to the dense representation, and we store the weights in
compressed form, we don't have a great way to handle .t().

Instead, we keep track of how often we've called transpose on our
tensor, and if it's an unexpected number we throw an error. When the first
argument is sparse, we expect an even number of calls to transpose,
while when the second argument is sparse, we expect an odd number of
calls. This is because we support second argument sparse matrix
multiplications by using transpose properties.

**to_sparse_semi_structured**

This is a conversion function to convert a dense tensor and a
semi-structured sparse bool mask into a subclass. Currently, we must
pass in a bool mask because it cannot be inferred: there may be
additional zero elements in the dense tensor, so `tensor != 0` is not
necessarily 2:4 sparse.

Once we add either a method to derive the mask from the dense tensor or
cuSPARSELt, we will no longer need to pass in the mask. cuSPARSELt has its
own helper functions to create the metadata mask.

**User Details**

We have implemented support for the following ops for `torch.float16`
and `torch.int8`:
```
torch.addmm(bias, dense, sparse.t())
torch.mm(dense, sparse)
torch.mm(sparse, dense)
aten.linear.default
aten.t.default
aten.t.detach
```

The end user interface to accelerate an nn.Linear module with the
subclass would look like this:

```
import torch
from torch import nn
from torch.sparse import to_sparse_semi_structured

# 2:4 mask that keeps the last two of every four elements, shape (128, 128)
mask = torch.Tensor([0, 0, 1, 1]).tile(128, 32).cuda().bool()
linear = nn.Linear(128, 128).half().cuda()

# Replace the dense weight with its semi-structured sparse counterpart
linear.weight = nn.Parameter(to_sparse_semi_structured(linear.weight,
                                                       mask=mask))
```

This also updates tests and the `torch.sparse` module docstring to
reflect these changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102135
Approved by: https://github.com/albanD
2023-06-27 02:37:00 +00:00
PyTorch MergeBot
bfa08a1c67 Revert "[core][pruning][sparse][feature] SparseSemiStructured tensor subclass (#102135)"
This reverts commit cf5262a84f.

Reverted https://github.com/pytorch/pytorch/pull/102135 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but test_sparse_semi_structured.py::TestSparseSemiStructuredCUDA::test_mm_sparse_first_NT_cuda_int8 is failing CUDA trunk jobs cf5262a84f. This looks like a landrace ([comment](https://github.com/pytorch/pytorch/pull/102135#issuecomment-1608423849))
2023-06-26 22:54:16 +00:00
Jesse Cai
cf5262a84f [core][pruning][sparse][feature] SparseSemiStructured tensor subclass (#102135)
This PR adds in support for semi-structured sparsity via a tensor
subclass. It currently uses the CUTLASS kernels merged in PR #100881.

In the future we plan to add in cuSPARSELt support (see the other PRs in
the stack), which will give us larger performance gains.

This PR adds in 2 things:
- a Tensor subclass, `SparseSemiStructuredTensor`, to store the
  sparse tensor in compressed form and override `__torch_dispatch__`.
- a conversion function that takes in a dense tensor and a
  semi-structured sparse bool mask and creates an instance of the
  subclass.

**SparseSemiStructuredTensor**

The subclass stores the dense tensor in a contiguous flattened tensor
for future compatibility with cuSPARSELt, which expects this format.
Note that the CUTLASS kernels do not have this limitation, as the
specified values and the metadata are passed separately in
`_structured_sparse_linear`. In the future we can use the cuSPARSELt bindings
[here](https://github.com/pytorch/pytorch/pull/103700) for faster matmul, better dtype coverage, and relaxed shape
constraints.

Since we currently don't have a way to go back from the sparse
representation to the dense representation, and we store the weights in
compressed form, we don't have a great way to handle .t().

Instead, we keep track of how often we've called transpose on our
tensor, and if it's an unexpected number we throw an error. When the first
argument is sparse, we expect an even number of calls to transpose,
while when the second argument is sparse, we expect an odd number of
calls. This is because we support second argument sparse matrix
multiplications by using transpose properties.

**to_sparse_semi_structured**

This is a conversion function to convert a dense tensor and a
semi-structured sparse bool mask into a subclass. Currently, we must
pass in a bool mask because it cannot be inferred: there may be
additional zero elements in the dense tensor, so `tensor != 0` is not
necessarily 2:4 sparse.

Once we add either a method to derive the mask from the dense tensor or
cuSPARSELt, we will no longer need to pass in the mask. cuSPARSELt has its
own helper functions to create the metadata mask.

**User Details**

We have implemented support for the following ops for `torch.float16`
and `torch.int8`:
```
torch.addmm(bias, dense, sparse.t())
torch.mm(dense, sparse)
torch.mm(sparse, dense)
aten.linear.default
aten.t.default
aten.t.detach
```

The end user interface to accelerate an nn.Linear module with the
subclass would look like this:

```
import torch
from torch import nn
from torch.sparse import to_sparse_semi_structured

# 2:4 mask that keeps the last two of every four elements, shape (128, 128)
mask = torch.Tensor([0, 0, 1, 1]).tile(128, 32).cuda().bool()
linear = nn.Linear(128, 128).half().cuda()

# Replace the dense weight with its semi-structured sparse counterpart
linear.weight = nn.Parameter(to_sparse_semi_structured(linear.weight,
                                                       mask=mask))
```

This also updates tests and the `torch.sparse` module docstring to
reflect these changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102135
Approved by: https://github.com/albanD
2023-06-26 21:30:43 +00:00
akhilkedia
129a1bc715 Minor error in docs regarding execution time (#93258)
The previous sentence seemed to imply that sparse may not always be helpful, i.e., your execution time may increase when using sparse. But the docs mentioned otherwise.

A simple re-ordering of two words in the documentation to better align with the contextual sentiment.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93258
Approved by: https://github.com/cpuhrsch
2023-01-31 23:32:42 +00:00
Pearu Peterson
b3e4f5029b Add check-sparse-tensor-invariants flag to Context - 2nd try. (#92094)
This PR is a copy of https://github.com/pytorch/pytorch/pull/90849, whose merge was reverted.

The PR adds a "check sparse tensor invariants" flag to Context that, when enabled, triggers sparse tensor data invariant checks in unsafe methods of constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to the UI:

`torch.sparse.check_sparse_tensor_invariants` class provides different ways to enable/disable the invariant checking.

`torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden.

The PR fixes https://github.com/pytorch/pytorch/issues/90833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92094
Approved by: https://github.com/cpuhrsch
2023-01-13 14:50:33 +00:00
PyTorch MergeBot
c7a22bb7c7 Revert "Add check-sparse-tensor-invariants flag to Context. (#90849)"
This reverts commit b9a035c1c5.

Reverted https://github.com/pytorch/pytorch/pull/90849 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-01-12 09:58:16 +00:00
Pearu Peterson
b9a035c1c5 Add check-sparse-tensor-invariants flag to Context. (#90849)
This PR adds a "check sparse tensor invariants" flag to Context that, when enabled, triggers sparse tensor data invariant checks in unsafe methods of constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to the UI:

- `torch.enable_check_sparse_tensor_invariants` and `torch.is_check_sparse_tensor_invariants_enabled` functions to globally enable/disable the invariant checks and to retrieve the state of the feature, respectively
- `torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden.

The PR also fixes https://github.com/pytorch/pytorch/issues/90833

# Main issue

*The following content is outdated after merging the PRs in this ghstack but kept for the record.*

The importance of this feature is that when enabling the invariants checks by default, say, via

<details>

```
$ git diff
diff --git a/torch/__init__.py b/torch/__init__.py
index c8543057c7..19a91d0482 100644
--- a/torch/__init__.py
+++ b/torch/__init__.py
@@ -1239,3 +1239,8 @@ if 'TORCH_CUDA_SANITIZER' in os.environ:

 # Populate magic methods on SymInt and SymFloat
 import torch.fx.experimental.symbolic_shapes
+
+# temporarily enable sparse tensor arguments validation in unsafe
+# constructors:
+
+torch._C._set_check_sparse_tensor_invariants(True)
```

</details>

a massive number of test failures/errors occur in test_sparse_csr.py tests:
```
$ pytest -sv test/test_sparse_csr.py
<snip>
==== 4293 failed, 1557 passed, 237 skipped, 2744 errors in 69.71s (0:01:09) ====
```
This means that we are silently constructing sparse compressed tensors that do not satisfy the sparse tensor invariants. In particular, the following errors are raised:

```
AssertionError: "resize_as_sparse_compressed_tensor_: self and src must have the same layout" does not match "expected values to be a strided and contiguous tensor"

RuntimeError: CUDA error: device-side assert triggered

RuntimeError: `col_indices[..., crow_indices[..., i - 1]:crow_indices[..., i]] for all i = 1, ..., nrows are sorted and distinct along the last dimension values` is not satisfied.

RuntimeError: expected col_indices to be a strided and contiguous tensor

RuntimeError: expected row_indices to be a strided and contiguous tensor

RuntimeError: expected values to be a strided and contiguous tensor

RuntimeError: for_each: failed to synchronize: cudaErrorAssert: device-side assert triggered

RuntimeError: tensor dimensionality must be sum of batch, base, and dense dimensionalities (=0 + 2 + 0) but got 3
```
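
The "sorted and distinct" invariant quoted in the errors above can be sketched for the non-batched case as follows. This is an illustrative, dependency-free check written for this note, not the validation code PyTorch runs; the function name and the sample data are assumptions.

```python
def col_indices_sorted_and_distinct(crow_indices, col_indices):
    """Check that each row's slice of col_indices is strictly increasing.

    Row i (1-based) owns col_indices[crow_indices[i - 1]:crow_indices[i]].
    Strictly increasing means both sorted and free of duplicates.
    """
    nrows = len(crow_indices) - 1
    for i in range(1, nrows + 1):
        row = col_indices[crow_indices[i - 1]:crow_indices[i]]
        if any(a >= b for a, b in zip(row, row[1:])):
            return False
    return True


# Two rows: row 0 stores two entries, row 1 stores one entry.
crow = [0, 2, 3]

good_cols = [0, 2, 1]   # row 0 -> [0, 2], row 1 -> [1]
unsorted_cols = [2, 0, 1]   # row 0 -> [2, 0] is not sorted
duplicate_cols = [1, 1, 0]  # row 0 -> [1, 1] repeats a column

good = col_indices_sorted_and_distinct(crow, good_cols)
unsorted = col_indices_sorted_and_distinct(crow, unsorted_cols)
duplicate = col_indices_sorted_and_distinct(crow, duplicate_cols)
```

Only the first set of column indices passes; the other two are the kind of input that an unchecked constructor would accept silently.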

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90849
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2023-01-11 01:05:14 +00:00
Kazuaki Ishizaki
a5f04e9a91 Fix typos in .md and .rst files (#88962)
This PR fixes typos `Github` in `.md` and `.rst` files.
`Github` -> `GitHub`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88962
Approved by: https://github.com/kit1980
2022-11-17 03:37:02 +00:00
Kazuaki Ishizaki
72ec1b5fc1 Fix typo under docs directory (#87583)
This PR fixes typos in `.rst` files under the docs directory

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87583
Approved by: https://github.com/kit1980
2022-10-24 23:52:44 +00:00
Christian Puhrsch
e8c4adf3c3 Add torch.sparse overview section (#85265)
The goal of this section is to provide a general overview of how PyTorch handles sparsity for readers who are already familiar with sparse matrices and their operators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85265
Approved by: https://github.com/jisaacso
2022-10-18 21:07:57 +00:00
Eddie Yan
25725fd624 (Re-open) Adds cudaMallocAsync as an alternative backend for the CUDA allocator (#82682)
Rebased version of @mcarilli 's cudaMallocAsync #65365 for continued testing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82682
Approved by: https://github.com/ngimel
2022-10-12 03:44:21 +00:00
Kazuaki Ishizaki
bc57306bdd Fix typo under docs directory and RELEASE.md (#85896)
This PR fixes typos in rst files under the docs directory and `RELEASE.md`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85896
Approved by: https://github.com/kit1980
2022-09-29 21:41:59 +00:00
Pearu Peterson
ff5399e528 Revise sparse docs regarding Sparse Compressed tensors (#82108)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82108
Approved by: https://github.com/bhosmer
2022-07-29 18:15:09 +00:00
Andrew M. James
5a4c9e8394 Add spdiags sparse matrix initialization (#78439)
Similar to [scipy.sparse.spdiags](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.spdiags.html#scipy-sparse-spdiags)

Part of #70926

In other functions (e.g. [torch.diagonal](https://pytorch.org/docs/stable/generated/torch.diagonal.html#torch.diagonal)), diagonals of a tensor are referenced using the offset and the two dimensions that the diagonal is taken with respect to.

The reference implementation from scipy only considers matrix output. Even if we only support 2-D output at first, it may be useful to consider how the dimensions corresponding to each diagonal would be specified for higher-dimensional output.

The proposed torch signature implies that all offsets refer to the diagonals with respect to the only two dimensions of the output:

```
torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, int[] shape, Layout? layout=None) -> SparseTensor
```
Above it is required that: `diagonals.ndimension() == 2`, `offsets.ndimension() == 1`, `offsets.shape[0] == diagonals.shape[0]` and `len(shape) == 2`.

This would need to be altered for the case where `len(shape)` > 2. One option is:
```
torch.sparse.spdiags(Tensor[] diagonals, IntTensor[] offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor
```

Here `offsets` and `diagonals` become lists of tensors, and the `IntTensor dims` argument is introduced. This would require that `len(diagonals) == len(offsets) == dims.shape[0]`, `dims.ndimension() == 2` and `dims.shape[1] == 2`. The same restrictions as in the 2-D case above also apply to the elements of `diagonals` and `offsets` pairwise (that is, `diagonals[i].ndimension() == 2`, `offsets[i].ndimension() == 1` and `offsets[i].shape[0] == diagonals[i].shape[0]` for all i). This form of the signature would construct the sparse result by placing the values from `diagonals[i][j]` into the diagonal with offset `offsets[i][j]` taken with respect to dimensions `dims[i]`. The specialization back to the original signature for the 2-D case could be seen as allowing the single row of `dims` to default to `[0, 1]` when only one `diagonals`/`offsets` pair is provided and `shape` is 2-D. This option allows the rows of an input element `diagonals[i]` to have a different length, which may be appropriate as the max length of a diagonal along different dimension pairs will be different.

Another option is to specify the dimensions the diagonal is taken with respect to for each offset. This signature would look like:

```
torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor
```
Here, `diagonals` is still 2-D with dimension 0 matching the length of the 1-D `offsets`, and the tensor input `dims` is also 2-D with dimension 0 matching the length of the 1-D `offsets` and the second dimension fixed at `2`. In this case the sparse result is constructed by placing the elements from `diagonals[i]` into the output diagonal `output.diagonal(offsets[i], dim0=dims[i][0], dim1=dims[i][1])` (with some additional consideration that makes it more complicated than simply assigning to that view). The specialization from this back to the 2-D form could be seen as assuming `dims = [[0, 1], [0, 1]... len(offsets) times ]` when `len(shape) == 2`.

In both proposed signatures for the N-D case, the specialization back to the 2-D signature is a bit of a stretch for typical default-argument logic; however, I think the first is the better choice as it offers more flexibility.

I think some discussion is required about:
- [x] Should the N-D output case be implemented from the outset
- [x] If not, should the future addition of the N-D output case be considered when designing the interface.
- [x] Other thoughts on the signature which includes the `dims` information for the N-D output case.

**Resolution**: Since no one has requested N-D output support, I think it is fine to restrict this to sparse matrix generation. Should a request for N-D support come later, an overload accepting the additional `dims` could be added.
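
For reference, the 2-D behaviour that the resolution settles on can be sketched in plain Python. This is an illustration only: it assumes the scipy convention (entry `j` of a stored diagonal lands in column `j`), returns a dense nested list instead of a sparse tensor, and `spdiags_dense` is a hypothetical name, not the `torch.sparse.spdiags` implementation.

```python
def spdiags_dense(diagonals, offsets, shape):
    """Place row-wise stored diagonals into a dense matrix of the given shape.

    diagonals: list of rows, one stored diagonal per row.
    offsets:   one offset per diagonal (0 = main, >0 above, <0 below).
    shape:     (nrows, ncols) of the output matrix.
    """
    if len(shape) != 2:
        raise ValueError("shape must have exactly two entries")
    if len(offsets) != len(diagonals):
        raise ValueError("need exactly one offset per row of diagonals")
    nrows, ncols = shape
    out = [[0] * ncols for _ in range(nrows)]
    for diag, offset in zip(diagonals, offsets):
        for col, value in enumerate(diag):
            row = col - offset
            # Entries that fall outside the output matrix are dropped.
            if 0 <= row < nrows and col < ncols:
                out[row][col] = value
    return out


diagonals = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
result = spdiags_dense(diagonals, [0, 1, -1], (3, 3))
for line in result:
    print(line)
```

With these inputs the main diagonal receives `0, 1, 2`, the first superdiagonal receives `4, 5`, and the first subdiagonal receives `6, 7`; the entries `3` and `8` fall outside the 3 x 3 output and are dropped.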

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78439
Approved by: https://github.com/nikitaved, https://github.com/cpuhrsch, https://github.com/pearu
2022-07-01 01:11:54 +00:00
PyTorch MergeBot
56e3bc5215 Revert "Add spdiags sparse matrix initialization (#78439)"
This reverts commit cfb2034b65.

Reverted https://github.com/pytorch/pytorch/pull/78439 on behalf of https://github.com/suo due to broke windows builds, see: cfb2034b65
2022-06-30 21:04:36 +00:00
Andrew M. James
cfb2034b65 Add spdiags sparse matrix initialization (#78439)
Similar to [scipy.sparse.spdiags](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.spdiags.html#scipy-sparse-spdiags)

Part of #70926

In other functions (e.g. [torch.diagonal](https://pytorch.org/docs/stable/generated/torch.diagonal.html#torch.diagonal)), diagonals of a tensor are referenced using the offset and the two dimensions that the diagonal is taken with respect to.

The reference implementation from scipy only considers matrix output. Even if we only support 2-D output at first, it may be useful to consider how the dimensions corresponding to each diagonal would be specified for higher-dimensional output.

The proposed torch signature implies that all offsets refer to the diagonals with respect to the only two dimensions of the output:

```
torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, int[] shape, Layout? layout=None) -> SparseTensor
```
Above it is required that: `diagonals.ndimension() == 2`, `offsets.ndimension() == 1`, `offsets.shape[0] == diagonals.shape[0]` and `len(shape) == 2`.

This would need to be altered for the case where `len(shape)` > 2. One option is:
```
torch.sparse.spdiags(Tensor[] diagonals, IntTensor[] offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor
```

Here `offsets` and `diagonals` become lists of tensors, and the `IntTensor dims` argument is introduced. This would require that `len(diagonals) == len(offsets) == dims.shape[0]`, `dims.ndimension() == 2` and `dims.shape[1] == 2`. The same restrictions as in the 2-D case above also apply to the elements of `diagonals` and `offsets` pairwise (that is, `diagonals[i].ndimension() == 2`, `offsets[i].ndimension() == 1` and `offsets[i].shape[0] == diagonals[i].shape[0]` for all i). This form of the signature would construct the sparse result by placing the values from `diagonals[i][j]` into the diagonal with offset `offsets[i][j]` taken with respect to dimensions `dims[i]`. The specialization back to the original signature for the 2-D case could be seen as allowing the single row of `dims` to default to `[0, 1]` when only one `diagonals`/`offsets` pair is provided and `shape` is 2-D. This option allows the rows of an input element `diagonals[i]` to have a different length, which may be appropriate as the max length of a diagonal along different dimension pairs will be different.

Another option is to specify the dimensions the diagonal is taken with respect to for each offset. This signature would look like:

```
torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor
```
Here, `diagonals` is still 2-D with dimension 0 matching the length of the 1-D `offsets`, and the tensor input `dims` is also 2-D with dimension 0 matching the length of the 1-D `offsets` and the second dimension fixed at `2`. In this case the sparse result is constructed by placing the elements from `diagonals[i]` into the output diagonal `output.diagonal(offsets[i], dim0=dims[i][0], dim1=dims[i][1])` (with some additional consideration that makes it more complicated than simply assigning to that view). The specialization from this back to the 2-D form could be seen as assuming `dims = [[0, 1], [0, 1]... len(offsets) times ]` when `len(shape) == 2`.

In both proposed signatures for the N-D case, the specialization back to the 2-D signature is a bit of a stretch for typical default-argument logic; however, I think the first is the better choice as it offers more flexibility.

I think some discussion is required about:
- [x] Should the N-D output case be implemented from the outset
- [x] If not, should the future addition of the N-D output case be considered when designing the interface.
- [x] Other thoughts on the signature which includes the `dims` information for the N-D output case.

**Resolution**: Since no one has requested N-D output support, I think it is fine to restrict this to sparse matrix generation. Should a request for N-D support come later, an overload accepting the additional `dims` could be added.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78439
Approved by: https://github.com/nikitaved, https://github.com/cpuhrsch, https://github.com/pearu
2022-06-30 19:54:47 +00:00
Alban Desmaison
734281c3d6 Cleanup all module references in doc (#73983)
Summary:
Working towards https://docs.google.com/document/d/10yx2-4gs0gTMOimVS403MnoAWkqitS8TUHX73PN8EjE/edit?pli=1#

This PR:
- Ensure that all the submodules are listed in a rst file (this ensures they are considered by the coverage tool)
- Remove some long-deprecated code that just errors out on import
- Remove the allow list altogether to ensure nothing gets added back there

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73983

Reviewed By: anjali411

Differential Revision: D34787908

Pulled By: albanD

fbshipit-source-id: 163ce61e133b12b2f2e1cbe374f979e3d6858db7
(cherry picked from commit c9edfead7a01dc45bfc24eaf7220d2a84ab1f62e)
2022-03-10 22:26:29 +00:00
Nikita Shulga
8ac7393565 Revert D33767740: [pytorch][PR] Sparse CSR CPU: cuSolverSP backend for linalg.solve
Test Plan: revert-hammer

Differential Revision:
D33767740 (199d9a992c)

Original commit changeset: a945f065210c

Original Phabricator Diff: D33767740 (199d9a992c)

fbshipit-source-id: b7934df18118f8d6d5f165deb5aae9887953ae43
(cherry picked from commit d3ddbb021b227e3638f6f7c22c6eadfa73695e31)
2022-03-01 18:33:23 +00:00
Kushashwa Ravi Shrimali
199d9a992c Sparse CSR CPU: cuSolverSP backend for linalg.solve (#71399)
Summary:
This PR introduces the `cuSolverSP` backend for `linalg.solve` with sparse CSR input matrices. The motivation comes from the issue: https://github.com/pytorch/pytorch/issues/69538.

`cuSolver` provides [`cusolverSp<t>csrlsvluHost`](https://docs.nvidia.com/cuda/cusolver/index.html#cusolver-lt-t-gt-csrlsvlu) API, a few things to note:

1. As mentioned in the documentation: `only CPU (Host) path is provided.` From the profiling, there doesn't seem to be any GPU kernel launch for optimization; please see the profiling below.
2. Since only the `host` path is provided, the CPU path uses `csrlsvluHost` (but requires PyTorch to be installed/built with CUDA support).
3. The documentation mentions that reordering can help, but it isn't clear how it affects the performance. There are several options for reordering, so we stick to `reorder = 0` as the default choice.

`cuSolver` has [`csrlsvqr`](https://docs.nvidia.com/cuda/cusolver/index.html#cusolver-lt-t-gt-csrlsvqr) function which provides a `device` path to solve the linear system. This function is used for the CUDA path in this PR.

**Gist:**

For CPU Path: we call [`csrlsvluHost` function of cuSolver](https://docs.nvidia.com/cuda/cusolver/index.html#cusolver-lt-t-gt-csrlsvlu).
For CUDA Path: we call [`csrlsvqr` function of cuSolver](https://docs.nvidia.com/cuda/cusolver/index.html#cusolver-lt-t-gt-csrlsvqr).

**Profiling:** (on a sparse input tensor of size 1000 x 1000, with a vector of length 1000), for the `csrlsvlu` function (to show no GPU optimization)

```text
==3999651== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:  100.00%  2.1440us         1  2.1440us  2.1440us  2.1440us  [CUDA memcpy HtoD]
      API calls:   99.72%  1.07199s         9  119.11ms     500ns  1.07164s  cudaFree
                    0.11%  1.2182ms       398  3.0600us     140ns  137.94us  cuDeviceGetAttribute
                    0.06%  674.45us         4  168.61us  165.50us  173.64us  cuDeviceTotalMem
                    0.03%  357.07us         4  89.268us  2.7800us  201.89us  cudaMalloc
                    0.03%  309.29us         1  309.29us  309.29us  309.29us  cudaGetDeviceProperties
                    0.01%  160.47us       332     483ns     350ns  3.3300us  cudaFuncSetAttribute
                    0.01%  115.12us         4  28.780us  26.290us  33.410us  cuDeviceGetName
                    0.00%  28.591us         5  5.7180us     440ns  16.921us  cudaGetDevice
                    0.00%  22.061us         4  5.5150us     871ns  18.690us  cudaDeviceSynchronize
                    0.00%  20.370us        18  1.1310us     410ns  6.9900us  cudaEventDestroy
                    0.00%  16.390us         1  16.390us  16.390us  16.390us  cudaMemcpy
                    0.00%  11.540us         2  5.7700us  1.4900us  10.050us  cuDeviceGetPCIBusId
                    0.00%  10.510us        18     583ns     430ns  1.6200us  cudaEventCreateWithFlags
                    0.00%  7.9100us        21     376ns     290ns     700ns  cudaDeviceGetAttribute
                    0.00%  1.4300us         6     238ns     150ns     590ns  cuDeviceGet
                    0.00%  1.2200us         4     305ns     190ns     500ns  cuDeviceGetCount
                    0.00%     900ns         1     900ns     900ns     900ns  cuInit
                    0.00%     860ns         4     215ns     180ns     260ns  cuDeviceGetUuid
                    0.00%     240ns         1     240ns     240ns     240ns  cuDriverGetVersion
                    0.00%     230ns         1     230ns     230ns     230ns  cudaGetDeviceCount
```

Script:

```python
import torch

def solve(x, other, out):
    torch.linalg.solve(x, other, out=out)

if __name__ == "__main__":
    dense_inp = torch.randn((1000, 1000), dtype=torch.float64)
    # Set 50% of the values to 0 randomly
    dense_inp = torch.nn.functional.dropout(dense_inp, p=0.5)
    sparse_inp = dense_inp.to_sparse_csr()

    other = torch.randint(100, (1000,), dtype=torch.float64)
    out = torch.randint(1, (1000,), dtype=torch.float64)

    solve(sparse_inp, other, out)
```

The following error is raised when the function is used on a CPU device with PyTorch built/installed without CUDA support:

```python
/home/krshrimali/pytorch/torch/autograd/profiler.py:151: UserWarning: CUDA is not available, disabling CUDA profiling
  warn("CUDA is not available, disabling CUDA profiling")
Traceback (most recent call last):
  File "/home/krshrimali/pytorch/test_sp.py", line 17, in <module>
    solve(x, other, out)
  File "/home/krshrimali/pytorch/test_sp.py", line 5, in solve
    torch.linalg.solve(x, other, out=out)
RuntimeError: PyTorch was not built with CUDA support. Please use PyTorch built CUDA support
```

**Performance Comparison** (vs SciPy's [`scipy.sparse.linalg.spsolve`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.spsolve.html)):

Time taken by `scipy.sparse.linalg.spsolve` : 0.595 seconds

On CPU: Time taken by `torch.linalg.solve` : 4.565 seconds
On CUDA: Time taken by `torch.linalg.solve`: 1.838 seconds

The inputs are of dimensions: (17281, 17281) and (17281, 1), and were taken from https://math.nist.gov/MatrixMarket/extreme.html.

Thanks to IvanYashchuk for helping me with the PR, and guiding me through it.

cc: IvanYashchuk pearu nikitaved cpuhrsch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71399

Reviewed By: VitalyFedyunin

Differential Revision: D33767740

Pulled By: cpuhrsch

fbshipit-source-id: a945f065210cd719096eb8d7cdbf8e8937c2fce9
(cherry picked from commit f4f35c17da414e1ca6c6d91402933521857aa1ea)
2022-03-01 05:32:35 +00:00
Ivan Yashchuk
8cdcc1181c Add missing entry for sampled_addmm in sparse.rst (#72312)
Summary:
Let's make the documentation for `torch.sparse.sampled_addmm` searchable in the PyTorch documentation.
This PR shall be cherry-picked for the next 1.11 release.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72312

Reviewed By: davidberard98

Differential Revision: D34045230

Pulled By: cpuhrsch

fbshipit-source-id: c1b1dc907443284857f48c8ce1efab22c6701bbe
(cherry picked from commit 225929ecf2)
2022-02-08 00:07:20 +00:00
Steven Morad
cfc1117591 Update sparse.rst to warn about _values() (#71088)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70357

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71088

Reviewed By: jbschlosser

Differential Revision: D33511207

Pulled By: cpuhrsch

fbshipit-source-id: 9d0c5445842ed96999eb88445cbea7ae284b1a6f
2022-01-10 12:43:46 -08:00
Rok
952ca25daa Sparse CSR: add convert_indices_from_csr_to_coo (#66774)
Summary:
This PR adds conversion from CSR to COO.
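
As a rough illustration of what a CSR-to-COO index conversion involves, the sketch below expands compressed row pointers into one explicit row index per stored element. It is plain Python written for this note, not the ATen kernel added by the PR; the helper name `csr_to_coo_indices` and the sample data are assumptions.

```python
def csr_to_coo_indices(crow_indices, col_indices):
    """Expand CSR row pointers into explicit COO (row, col) index lists.

    crow_indices has one entry per row plus one; consecutive entries
    delimit the slice of col_indices that belongs to each row.
    """
    row_indices = []
    for row in range(len(crow_indices) - 1):
        count = crow_indices[row + 1] - crow_indices[row]
        # Repeat the row number once per stored element in that row.
        row_indices.extend([row] * count)
    return row_indices, list(col_indices)


# Hypothetical 3x3 matrix with stored elements at (0, 0), (0, 2) and (2, 1).
crow = [0, 2, 2, 3]
col = [0, 2, 1]
rows, cols = csr_to_coo_indices(crow, col)
print(rows, cols)
```

The column indices are reused unchanged; only the row dimension needs to be decompressed.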

Fixes https://github.com/pytorch/pytorch/issues/56959

cc nikitaved pearu cpuhrsch IvanYashchuk gchanan mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66774

Reviewed By: zou3519

Differential Revision: D32288415

Pulled By: cpuhrsch

fbshipit-source-id: 683ba658dc46835fdf3c0e24645c0c2bb243b968
2021-11-17 22:28:30 -08:00
Sameer Deshmukh
5fb1142702 Add CSR (compressed sparse row) layout for sparse tensors (#50937)
Summary:
Implement compressed sparse row format. Derived from the GCS implementation at https://github.com/pytorch/pytorch/pull/44190
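
For readers unfamiliar with the layout, here is a minimal pure-Python sketch of how a dense matrix maps onto the three CSR arrays. The function `dense_to_csr` is a hypothetical helper for illustration only; it borrows the names `crow_indices`, `col_indices` and `values` from the PyTorch accessors but does not reflect how the PR implements the format.

```python
def dense_to_csr(matrix):
    """Return (crow_indices, col_indices, values) for a list-of-lists matrix."""
    crow_indices = [0]
    col_indices = []
    values = []
    for row in matrix:
        for col, value in enumerate(row):
            if value != 0:
                col_indices.append(col)
                values.append(value)
        # After each row, record how many elements have been stored so far.
        crow_indices.append(len(values))
    return crow_indices, col_indices, values


dense = [
    [1, 0, 2],
    [0, 0, 0],
    [0, 3, 0],
]
crow_indices, col_indices, values = dense_to_csr(dense)
print(crow_indices, col_indices, values)
```

An empty row shows up as two equal consecutive entries in `crow_indices`, which is what makes the row dimension "compressed".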

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50937

Reviewed By: mrshenli

Differential Revision: D27439865

Pulled By: ezyang

fbshipit-source-id: 3ba3dcb9679505b980ff6a5f513e913bbae2fb1d
2021-04-12 10:09:12 -07:00
mattip
7d56de1834 DOC: use autosummary on tensors.rst (#55042)
Summary:
Related to https://github.com/pytorch/pytorch/issues/52256

Splits tensors into a table-of-contents page and many sub-pages, one for each function

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55042

Reviewed By: mrshenli

Differential Revision: D27628688

Pulled By: zou3519

fbshipit-source-id: 08e87700a8e7d5b3fba3f1949e29e988a42bf2c6
2021-04-08 06:44:23 -07:00
Natalia Gimelshein
6c0bf28da6 [wip] doc_fix (#51825)
Summary:
tries to fix doc_test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51825

Reviewed By: bertmaher

Differential Revision: D26295583

Pulled By: ngimel

fbshipit-source-id: 13f6e7f1675d810adfd4abd2d579e2812fe54c80
2021-02-06 11:36:36 -08:00
Himangshu
4ff1823fac Add Sparse support for torch.sqrt (#50088)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50088

Reviewed By: mrshenli

Differential Revision: D25894003

Pulled By: ezyang

fbshipit-source-id: 93688c33b2f9a355c331d6edb3e402935223f75b
2021-01-19 20:19:07 -08:00
Pearu Peterson
905ed3c840 Revised sparse tensor documentation. (#45400)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44635.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45400

Reviewed By: ezyang

Differential Revision: D24359410

Pulled By: mruberry

fbshipit-source-id: 37c691a49a7b0042c7a298e0ed1226702b097c8b
2020-10-22 02:07:54 -07:00
Jessica Lin
2e6e8d557c Update docs feature classifications (#39966)
Summary:
Update the following feature classifications in docs to align with the changes:
1. [High Level Autograd APIs](https://pytorch.org/docs/stable/autograd.html#functional-higher-level-api): Beta (was experimental)
2. [Eager Mode Quantization](https://pytorch.org/docs/stable/quantization.html): Beta (was experimental)
3. [Named Tensors](https://pytorch.org/docs/stable/named_tensor.html): Prototype (was experimental)
4. [TorchScript/RPC](https://pytorch.org/docs/stable/rpc.html#rpc): Prototype (was experimental)
5. [Channels Last Memory Layout](https://pytorch.org/docs/stable/tensor_attributes.html#torch-memory-format): Beta (was experimental)
6. [Custom C++ Classes](https://pytorch.org/docs/stable/cpp_index.html): Beta (was experimental)
7. [Torch.Sparse](https://pytorch.org/docs/stable/sparse.html): Beta (was experimental)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39966

Differential Revision: D22213217

Pulled By: jlin27

fbshipit-source-id: dc49337cbc7026ed8dcac506fc60029dc3add854
2020-06-24 15:35:59 -07:00
zou3519
e5d6b75319 Bag of documentation fixes; fix more sphinx warnings (#27850)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27850

Many of these are real problems in the documentation (i.e., link or
bullet point doesn't display correctly).

Test Plan: - built and viewed the documentation for each change locally.

Differential Revision: D17908123

Pulled By: zou3519

fbshipit-source-id: 65c92a352c89b90fb6b508c388b0874233a3817a
2019-10-15 07:31:14 -07:00
M. Doosti Lakhani
1777eb2ed9 fix typo: toDense --> to_dense #25706 (#25832)
Summary:
Only fixes a minor typo in [torch.sparse.FloatTensor docs](https://pytorch.org/docs/stable/sparse.html#torch.sparse.FloatTensor.toDense).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25832

Differential Revision: D17276700

Pulled By: soumith

fbshipit-source-id: cf3d550d5756b000a4e864170ecd4b31826b40f8
2019-09-09 18:27:03 -07:00
Wei Yang
5ee8312b63 sparse.mm(), reland #14526 (#14661)
Summary:
- reland reverted PR #14526 with doc fixes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14661

Differential Revision: D13289047

Pulled By: weiyangfb

fbshipit-source-id: 5b843a11a58b56aeada3af2680a27cf89ecef4d8
2018-12-03 10:39:27 -08:00
Alyssa Wang
1c21dc6e16 Revert D13252990: [pytorch][PR] [sparse] sparse.mm(S, D)
Differential Revision:
D13252990

Original commit changeset: 8fdb14144405

fbshipit-source-id: 49b8b0759a6e647854689962ffa72a205b4a2088
2018-11-30 18:53:47 -08:00
Wei Yang
c3a2b1e155 sparse.mm(S, D) (#14526)
Summary:
- add `sparse.mm(S, D)` with backward
- for `sparse.addmm()`, relax input constraint so that sparse matrix input doesn't have to be coalesced
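
To see why a sparse-times-dense product does not need coalesced input, consider the following pure-Python sketch. It is an assumption-laden illustration, not the ATen implementation: `coo_mm` is a hypothetical helper, and the sparse operand is represented as a list of `(row, col)` pairs with matching values.

```python
def coo_mm(indices, values, shape, dense):
    """Multiply a COO sparse matrix (possibly uncoalesced) by a dense matrix."""
    n_rows, n_inner = shape
    assert len(dense) == n_inner, "inner dimensions must match"
    n_cols = len(dense[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for (i, k), v in zip(indices, values):
        # Each stored element contributes v * dense[k, :] to output row i.
        # Duplicate (i, k) entries simply accumulate, so no coalesce is needed.
        for j in range(n_cols):
            out[i][j] += v * dense[k][j]
    return out


# The entry (0, 1) appears twice, so the sparse matrix is [[0, 3], [4, 0]].
indices = [(0, 1), (0, 1), (1, 0)]
values = [1.0, 2.0, 4.0]
dense = [[1.0, 2.0], [3.0, 4.0]]
result = coo_mm(indices, values, (2, 2), dense)
print(result)
```

Because the product is linear in the stored values, accumulating duplicates in the output gives the same result as merging them first.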
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14526

Reviewed By: ezyang

Differential Revision: D13252990

Pulled By: weiyangfb

fbshipit-source-id: 8fdb14144405a2122d4b8447ad4055cd0330e6e8
2018-11-30 14:15:34 -08:00
Wei Yang
be7c618fd7 torch.sparse.sum() (#12430)
Summary:
- to fix #12241
- add `_sparse_sum()` to ATen, and expose it as `torch.sparse.sum()`; `SparseTensor.sum()` is not supported currently
- this PR depends on #11253, and will need to be updated once it lands
- [x] implement forward
- [x] implement backward
- performance [benchmark script](https://gist.github.com/weiyangfb/f4c55c88b6092ef8f7e348f6b9ad8946#file-sparse_sum_benchmark-py):
  - summing all dims is fastest for sparse tensors
  - when the input is sparse enough (nnz = 0.1%), the sum of a sparse tensor is faster than dense on CPU, but not necessarily on CUDA
  - CUDA backward is comparable (<2x) between `sum several dims` and `sum all dims` in sparse
  - CPU backward uses binary search and is still slow in sparse; it takes `5x` the time in `sum [0, 2, 3] dims` vs `sum all dims`
    - optimize CUDA backward for now
      - using thrust for sort and binary search, but runtime not improved
  - both CPU and CUDA forward are slow in sparse (`sum several dims` vs `sum all dims`), at most `20x` slower on CPU, and `10x` on CUDA
    - improve CPU and CUDA forward kernels
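
One way to picture why summing over all dims is cheaper than summing over a subset of sparse dims is the pure-Python sketch below. It is only an illustration under simplifying assumptions (scalar values, no dense dims); `coo_sum` is a hypothetical helper and not the `_sparse_sum()` kernel from this PR.

```python
from collections import defaultdict


def coo_sum(indices, values, ndim, sum_dims):
    """Sum a COO tensor over sum_dims; returns {remaining_index: value}."""
    keep = [d for d in range(ndim) if d not in sum_dims]
    acc = defaultdict(float)
    for idx, v in zip(indices, values):
        # Drop the summed dims from the index; entries that now share an
        # index collide and are accumulated (this is the coalesce-like step).
        acc[tuple(idx[d] for d in keep)] += v
    return dict(acc)


indices = [(0, 0), (0, 2), (1, 2)]
values = [1.0, 2.0, 3.0]

sum_dim1 = coo_sum(indices, values, ndim=2, sum_dims=[1])
sum_all = coo_sum(indices, values, ndim=2, sum_dims=[0, 1])
print(sum_dim1, sum_all)
```

When every dim is summed, the remaining index is always empty, so the work reduces to adding up the stored values with no index bookkeeping.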

(nnz, sizes, sum_dims, keepdim, sum all or dims, bk=backward) | CPU (sparse vs dense) | CUDA(sparse vs dense)
-- | -- | --
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumAll) | 8.77 µs vs 72.9 µs | 42.5 µs vs 108 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumD) | 112 µs vs 4.47 ms | 484 µs vs 407 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumAll, bk) | 141 µs vs 148 µs | 647 µs vs 231 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumD, bk) | 235 µs vs 1.23 ms | 781 µs vs 213 µs
(1000,   [1000, 1000, 2, 2], [2, 3], False, sumD) | 48.5 µs vs 360 µs | 160 µs vs 2.03 ms
(1000,   [1000, 1000, 2, 2], [2, 3], False, sumD, bk) | 258 µs vs 1.22 ms | 798 µs vs 224 µs
(1000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD) | 204 µs vs 882 µs | 443 µs vs 133 µs
(1000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD, bk) | 709 µs vs 1.15 ms | 893 µs vs 202 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumAll) | 39.8 µs vs 81 µs | 42.4 µs vs 113 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumD) | 747 µs vs 4.7 ms | 2.4 ms vs 414 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumAll, bk) | 1.04 ms vs 126 µs | 5.03 ms vs 231 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumD, bk) | 1.12 ms vs 1.24 ms | 5.99 ms vs 213 µs
(10000,   [1000, 1000, 2, 2], [2, 3], False, sumD) | 133 µs vs 366 µs | 463 µs vs 2.03 ms
(10000,   [1000, 1000, 2, 2], [2, 3], False, sumD, bk) | 1.56 ms vs 1.22 ms | 6.11 ms vs 229 µs
(10000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD) | 1.53 ms vs 799 µs | 824 µs vs 134 µs
(10000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD, bk) | 5.15 ms vs 1.09 ms | 7.02 ms vs 205 µs

- after improving CPU and CUDA forward kernels
  - in the `(1000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD)` forward, CPU takes ~~`171 µs`~~, of which `130 µs` is spent on `coalesce()`; for CUDA, total time is ~~`331 µs`~~, of which `141 µs` is spent on `coalesce()`. We need to reduce time in places outside `coalesce()`.
  - after a few simple tweaks, the forward is now at most `10x` slower on CPU, and `7x` on CUDA. The time taken by `sum dense dims only [2, 3]` is `~2x` that of `sum all dims`. The speed of `sum all sparse dims [0, 1]` is on par with `sum all dims`

(nnz,   sizes, sum_dims, keepdim, sum all or dims, bk=backward) | CPU (sparse vs dense) | CUDA(sparse vs dense)
-- | -- | --
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumAll) | 7 µs vs 69.5 µs | 31.5 µs vs 61.6 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumD) | 11.3 µs vs 4.72 ms | 35.2 µs vs 285 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumAll, bk) | 197 µs vs 124 µs | 857 µs vs 134 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumD, bk) | 124 µs vs 833 µs | 796 µs vs 106 µs
(1000,   [1000, 1000, 2, 2], [2, 3], False, sumD) | 20.5 µs vs 213 µs | 39.4 µs vs 1.24 ms
(1000,   [1000, 1000, 2, 2], [2, 3], False, sumD, bk) | 131 µs vs 830 µs | 881 µs vs 132 µs
(1000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD) | 95.8 µs vs 409 µs | 246 µs vs 87.2 µs
(1000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD, bk) | 624 µs vs 820 µs | 953 µs vs 124 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumAll) | 45.3 µs vs 72.9 µs | 33.9 µs vs 57.2 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumD) | 81.4 µs vs 4.49 ms | 39.7 µs vs 280 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumAll, bk) | 984 µs vs 111 µs | 6.41 ms vs 121 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumD, bk) | 1.45 ms vs 828 µs | 6.77 ms vs 113 µs
(10000,   [1000, 1000, 2, 2], [2, 3], False, sumD) | 74.9 µs vs 209 µs | 37.7 µs vs 1.23 ms
(10000,   [1000, 1000, 2, 2], [2, 3], False, sumD, bk) | 1.48 ms vs 845 µs | 6.96 ms vs 132 µs
(10000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD) | 1.14 ms vs 411 µs | 252 µs vs 87.8 µs
(10000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD, bk) | 4.53 ms vs 851 µs | 7.12 ms vs 128 µs

- the CUDA backward of sparse takes very long, with large variance (for nnz=10000, it normally takes 6-7 ms). To improve the backward of sparse ops, we will need to debug in places other than the CUDA kernels. Here is a benchmark of `torch.copy_()`:
```
>>> import torch
>>> d = [1000, 1000, 2, 2]
>>> nnz = 10000
>>> I = torch.cat([torch.randint(0, d[0], size=(nnz,)),
...                torch.randint(0, d[1], size=(nnz,))], 0).reshape(2, nnz)
>>> V = torch.randn(nnz, d[2], d[3])
>>> size = torch.Size(d)
>>> S = torch.sparse_coo_tensor(I, V, size).coalesce().cuda()
>>> S2 = torch.sparse_coo_tensor(I, V, size).coalesce().cuda().requires_grad_()
>>> data = S2.clone()
>>> S.copy_(S2)
>>> y = S * 2
>>> torch.cuda.synchronize()
>>> %timeit y.backward(data, retain_graph=True); torch.cuda.synchronize()
7.07 ms ± 3.06 ms per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12430

Differential Revision: D12878313

Pulled By: weiyangfb

fbshipit-source-id: e16dc7681ba41fdabf4838cf05e491ca9108c6fe
2018-11-28 02:19:12 -08:00
Wei Yang
50bc9dc9c3 fix doc for sparse.addmm (#14403)
Summary:
- fixing the doc issue in sparse.addmm

================ before change ==================
![image](https://user-images.githubusercontent.com/38509346/49063994-2f10fe80-f1ce-11e8-9ccc-54241bc45f0b.png)
![image](https://user-images.githubusercontent.com/38509346/49064064-641d5100-f1ce-11e8-865a-7227be7156ef.png)

================ post change ==================
![image](https://user-images.githubusercontent.com/38509346/49064078-76978a80-f1ce-11e8-8f38-f1f8ac9ce63b.png)
![image](https://user-images.githubusercontent.com/38509346/49064085-7bf4d500-f1ce-11e8-8a0d-bf9e5460d21f.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14403

Differential Revision: D13216582

Pulled By: weiyangfb

fbshipit-source-id: 52e0a20c6b341c37cfb31f281be3afe2a52ca532
2018-11-27 10:24:18 -08:00
Wei Yang
12558019a8 backward for sparse.addmm(D, S, D, alpha, beta) -> D (#13345)
Summary:
- introduce `sparse.addmm()` with backward for sparse matrix input for https://github.com/pytorch/pytorch/issues/12308
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13345

Differential Revision: D13094070

Pulled By: weiyangfb

fbshipit-source-id: 136c08c3ca9bafb20577b60dd43d31c3e5cd5461
2018-11-26 17:47:48 -08:00
Doug Friedman
c2f8f5076c add narrow() support for sparse tensors re: #8853 (#11342)
Summary:
Couple questions:

1) I used the log1p implementation in #8969 as a guide, especially for testing.  I'm not sure what the ```skipIfROCM``` annotation is for, so I'm unsure if I need it for my test.

2) I implemented the branching logic in the narrow function itself; is this the right place to do so?  I noticed that there are a number of places where sparse-specific logic is handled with just an if statement in this file.  Or should I implement a separate dispatch in native_functions.yaml as in the log1p?

And of course, happy to make any other updates/changes that I may have missed as well.  This is my first PR to the project.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11342

Differential Revision: D9978430

Pulled By: weiyangfb

fbshipit-source-id: e73dc20302ab58925afb19e609e31f4a38c634ad
2018-09-26 12:24:54 -07:00
Peter Goldsborough
fb4e8088f3 Remove methods that start with an underscore from at::Tensor (#11152)
Summary:
This PR cleans up the `at::Tensor` class by removing all methods that start with an underscore in favor of functions in the `at::` namespace. This greatly cleans up the `Tensor` class and makes it clearer what is the public and non-public API.

For this I changed `native_functions.yaml` and `Declarations.cwrap` to make all underscore methods `variant: function` (or add such a statement to begin with), and then fixed all code locations using the underscore methods.

ezyang colesbury gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11152

Differential Revision: D9683607

Pulled By: goldsborough

fbshipit-source-id: 97f869f788fa56639c05a439e2a33be49f10f543
2018-09-07 11:55:11 -07:00