Commit Graph

4 Commits

Author SHA1 Message Date
Sameer Deshmukh
5fb1142702 Add CSR (compressed sparse row) layout for sparse tensors (#50937)
Summary:
Implement compressed sparse row format. Derived from the GCS implementation at https://github.com/pytorch/pytorch/pull/44190
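
For illustration, a minimal sketch of building a CSR tensor (assuming the `torch.sparse_csr_tensor` constructor; the exact public API introduced by this PR may differ):

```
import torch

# Sketch only: construct a 3x3 CSR tensor equivalent to
# [[1, 0, 2],
#  [0, 0, 3],
#  [4, 5, 0]]
crow_indices = torch.tensor([0, 2, 3, 5])     # row i owns values[crow[i]:crow[i+1]]
col_indices = torch.tensor([0, 2, 2, 0, 1])   # column index of each stored value
values = torch.tensor([1., 2., 3., 4., 5.])

csr = torch.sparse_csr_tensor(crow_indices, col_indices, values, size=(3, 3))
print(csr.to_dense())
```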

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50937

Reviewed By: mrshenli

Differential Revision: D27439865

Pulled By: ezyang

fbshipit-source-id: 3ba3dcb9679505b980ff6a5f513e913bbae2fb1d
2021-04-12 10:09:12 -07:00
Alexander
39f50f468d matmul performance benchmarks (#51647)
Summary:
A minor PR following up on the previous PR about sparse benchmarking utils: https://github.com/pytorch/pytorch/pull/48397

Fixes https://github.com/pytorch/pytorch/issues/44634: performance benchmarks for matrix-matrix and matrix-vector ops (dense-sparse, sparse-sparse, and compared to dense-dense).

I ran all benchmarks on a machine with 2x RTX 8000 GPUs and a 24-core AMD 2970WX, using the `DLMC/magnitude_pruning` dataset at different sparsity levels.
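
For context, a minimal sketch of how one such timing comparison could be set up with `torch.utils.benchmark` (illustrative only; the numbers below come from the sparse benchmarking utilities in the PR linked above and the DLMC dataset):

```
import torch
import torch.utils.benchmark as benchmark

# Illustrative sketch: time dense@dense vs. sparse@sparse at one sparsity level.
n, density = 2048, 0.1
dense = torch.randn(n, n)
sparse = dense.clone()
sparse[torch.rand(n, n) > density] = 0      # zero out ~90% of the entries
sparse = sparse.to_sparse()

t_dense = benchmark.Timer(stmt="a @ a", globals={"a": dense})
t_sparse = benchmark.Timer(stmt="torch.sparse.mm(s, s)",
                           globals={"torch": torch, "s": sparse})

print(t_dense.blocked_autorange())
print(t_sparse.blocked_autorange())
```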

 ---
<details><summary>forward tests (expand for details)</summary>

- `sparse@sparse`
```
[------------------------------- cpu:matmul-forward -------------------------------]
                           |   0.5   |   0.7   |   0.8   |   0.9   |  0.95  |   0.98
1 threads: -------------------------------------------------------------------------
      torch:dense@dense    |  108.1  |  100.5  |  101.3  |  108.4  |  98.4  |  187.4
      torch:sparse@sparse  |  659.1  |  368.8  |  156.5  |   53.3  |  26.8  |   14.9
      scipy:sparse@sparse  |  565.1  |  233.9  |  130.2  |   23.1  |  21.6  |   15.2

Times are in milliseconds (ms).

[----------------------------------- cuda:matmul-forward -----------------------------------]
                           |    0.5    |    0.7    |   0.8    |   0.9    |   0.95   |   0.98
1 threads: ----------------------------------------------------------------------------------
      torch:dense@dense    |   2243.5  |   4392.5  |  4419.8  |  2272.3  |  4433.9  |  8920.1
      torch:sparse@sparse  |  21369.2  |  11877.6  |  7339.2  |  1787.2  |  1335.1  |   845.7

Times are in microseconds (us).

```
- `sparse@dense`
```
[------------------------------- cpu:matmul-forward -------------------------------]
                          |   0.5   |   0.7   |   0.8   |   0.9   |   0.95  |   0.98
1 threads: -------------------------------------------------------------------------
      torch:dense@dense   |  105.8  |  103.8  |  103.0  |  104.4  |  104.4  |  197.0
      torch:sparse@dense  |  119.9  |  102.4  |   84.0  |   19.7  |   16.8  |   11.6
      scipy:sparse@dense  |  906.5  |  799.6  |  697.8  |  182.2  |  165.5  |  135.4

Times are in milliseconds (ms).

[------------------------- cuda:matmul-forward --------------------------]
                          |  0.5  |  0.7  |  0.8  |  0.9  |  0.95  |  0.98
1 threads: ---------------------------------------------------------------
      torch:dense@dense   |  2.2  |  4.4  |  4.4  |  2.3  |  4.5   |  2.3
      torch:sparse@dense  |  5.7  |  6.6  |  4.5  |  1.4  |  1.4   |  1.3

Times are in milliseconds (ms).

```
- `sparse@vector`
```
[----------------------------------- cpu:matmul-forward ----------------------------------]
                           |    0.5    |   0.7    |   0.8    |   0.9    |   0.95   |   0.98
1 threads: --------------------------------------------------------------------------------
      torch:dense@vector   |    510.6  |   505.8  |   759.6  |   782.1  |   682.4  |  764.6
      torch:sparse@vector  |  10122.8  |  6241.1  |  7935.6  |  2076.3  |  1049.5  |  826.3
      scipy:sparse@vector  |   1756.7  |  1033.9  |   678.2  |   343.5  |   168.5  |   65.4

Times are in microseconds (us).

[-------------------------------- cuda:matmul-forward --------------------------------]
                           |   0.5    |   0.7    |   0.8   |   0.9   |   0.95  |   0.98
1 threads: ----------------------------------------------------------------------------
      torch:dense@vector   |    36.1  |    21.5  |   21.6  |   21.5  |   21.6  |   21.5
      torch:sparse@vector  |  1099.2  |  1289.4  |  775.7  |  327.1  |  285.4  |  274.0

Times are in microseconds (us).

```
</details>

 ---
<details><summary>backward tests (expand for details)</summary>

- `sparse@sparse`
```
[--------------------------------- cpu:matmul-backward ---------------------------------]
                           |   0.5    |   0.7    |   0.8    |   0.9    |   0.95  |   0.98
1 threads: ------------------------------------------------------------------------------
      torch:dense@dense    |   246.1  |   315.0  |   306.9  |   168.6  |  290.6  |  146.9
      torch:sparse@sparse  |  6417.5  |  4393.7  |  3012.7  |  1029.4  |  908.0  |  650.7

Times are in microseconds (us).

[----------------------------- cuda:matmul-backward -----------------------------]
                           |   0.5   |   0.7   |   0.8   |  0.9   |  0.95  |  0.98
1 threads: -----------------------------------------------------------------------
      torch:dense@dense    |    6.7  |   13.3  |   13.3  |   6.9  |  13.5  |   6.9
      torch:sparse@sparse  |  143.7  |  143.4  |  119.6  |  29.5  |  29.1  |  10.9

Times are in microseconds (us).

```
- `sparse@dense`
```
[------------------------------ cpu:matmul-backward -------------------------------]
                          |   0.5   |   0.7   |   0.8   |   0.9   |   0.95  |   0.98
1 threads: -------------------------------------------------------------------------
      torch:dense@dense   |  185.9  |  304.8  |  305.8  |  169.9  |  308.7  |  168.4
      torch:sparse@dense  |  407.9  |  345.8  |  274.6  |  114.2  |  163.6  |  230.5

Times are in milliseconds (ms).

[--------------------------- cuda:matmul-backward --------------------------]
                          |  0.5   |  0.7   |  0.8   |  0.9  |  0.95  |  0.98
1 threads: ------------------------------------------------------------------
      torch:dense@dense   |   6.7  |  13.3  |  13.3  |  6.9  |  13.4  |   6.9
      torch:sparse@dense  |  16.7  |  19.0  |  15.1  |  6.3  |   8.2  |  12.7

Times are in milliseconds (ms).

```
</details>

Kindly review this PR. cc mruberry, ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51647

Reviewed By: albanD

Differential Revision: D27007809

Pulled By: mruberry

fbshipit-source-id: 8c1922cb3280027ca5e3eef31bfa20500c548cfd
2021-03-14 00:25:45 -08:00
Sam Estep
8c798e0622 Forbid trailing whitespace (#53406)
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857

These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
  - `GLOSSARY.md`
  - `aten/src/ATen/core/op_registration/README.md`
  - `scripts/README.md`
  - `torch/csrc/jit/codegen/fuser/README.md`

The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```

I looked over the auto-generated changes and didn't see anything that looked problematic.
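
For reference, a rough Python equivalent of the trailing-whitespace check (purely illustrative; the actual lint is the step added to `.github/workflows/lint.yml`):

```
import subprocess
import sys

# Illustrative sketch: list tracked text files containing trailing whitespace,
# mirroring the git grep invocation above, and fail if any are found.
result = subprocess.run(
    ["git", "grep", "-I", "-l", " $", "--", ".",
     ":(exclude)**/contrib/**", ":(exclude)third_party"],
    capture_output=True, text=True,
)
if result.stdout.strip():
    print("Files with trailing whitespace:")
    print(result.stdout, end="")
    sys.exit(1)
```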

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406

Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377

This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348

Reviewed By: walterddr, seemethere

Differential Revision: D26856620

Pulled By: samestep

fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
2021-03-05 17:22:55 -08:00
Alexander
44ce0b8883 Sparse-sparse matrix multiplication (CPU/CUDA) (#39526)
Summary:
This PR implements matrix multiplication support for 2-d sparse tensors using the COO sparse format.

The current implementation of `torch.sparse.mm` supports the configuration
`torch.sparse.mm(sparse_matrix1, sparse_matrix2.to_dense())`, but this can consume a lot of memory when sparse_matrix2 is large.

This implementation extends the `torch.sparse.mm` function to support `torch.sparse.mm(sparse_matrix1, sparse_matrix2)`.
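
A minimal usage sketch of the extended behavior (assuming the post-PR API, where both operands may be sparse COO tensors):

```
import torch

# Sketch only: both operands stay sparse; no .to_dense() on the second matrix.
a = torch.randn(1000, 1000)
b = torch.randn(1000, 1000)
a[a.abs() < 1.5] = 0    # ~87% of entries become zero
b[b.abs() < 1.5] = 0

c = torch.sparse.mm(a.to_sparse(), b.to_sparse())   # sparse @ sparse -> sparse COO
print(c.shape, c._nnz())
```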

Resolves [#20988](https://github.com/pytorch/pytorch/issues/20988) for CPU/CUDA.

- [x] sparse matmul
  - [x] CPU/CUDA C++ implementation
  - [x] unittests
  - [x] update torch.sparse.mm documentation
  - [x] autograd support

The CPU sparse-sparse matmul was implemented using the work "Sparse Matrix Multiplication Package (SMMP)" as a reference. The GPU sparse-sparse matmul is based on cuSPARSE, with separate code paths for CUSPARSE_VERSION >= 11 and for older cuSPARSE versions. Both the CPU and CUDA implementations rely on a sparse-sparse matmul algorithm that uses the CSR indices format, as it is one of the fastest algorithms.

Here are the latest benchmark results (script is here) for torch.sparse.mm (CUDA), torch.sparse.mm (CPU), and scipy; values are float32 scalars:

size | density | sparse.mm(CUDA) | sparse.mm(CPU) | scipy_coo_matmul
-- | -- | -- | -- | --
(32, 10000) | 0.01 | 822.7 | 79.4 | 704.1
(32, 10000) | 0.05 | 1741.1 | 402.6 | 1155.3
(32, 10000) | 0.1 | 2956.8 | 840.8 | 1885.4
(32, 10000) | 0.25 | 6417.7 | 2832.3 | 4665.2
(512, 10000) | 0.01 | 1010.2 | 3941.3 | 26937.7
(512, 10000) | 0.05 | 2216.2 | 26903.8 | 57343.7
(512, 10000) | 0.1 | 4868.4 | 87773.7 | 117477.0
(512, 10000) | 0.25 | 16639.3 | 608105.0 | 624290.4
(1024, 10000) | 0.01 | 1224.8 | 13088.1 | 110379.2
(1024, 10000) | 0.05 | 3897.5 | 94783.9 | 236541.8
(1024, 10000) | 0.1 | 10559.1 | 405312.5 | 525483.4
(1024, 10000) | 0.25 | 57456.3 | 2424337.5 | 2729318.7

A new backward algorithm was implemented using only `sparse @ sparse` and `sparse_mask` operations. Here are some benchmark results:

```
[------------------------- sparse.mm-backward -------------------------]
                            |   sparse.backward   |  dense.backward
 -----------------------------------------------------------------------
      (32, 10000) | 0.01    |            13.5          |         2.4
      (32, 10000) | 0.05    |            52.3          |         2.4
      (512, 10000) | 0.01   |          1016.8          |       491.5
      (512, 10000) | 0.05   |          1604.3          |       492.3
      (1024, 10000) | 0.01  |          2384.1          |      1963.7
      (1024, 10000) | 0.05  |          3965.8          |      1951.9
```
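
For intuition, a conceptual sketch of the masked gradient described above (dense intermediates are used here only for readability; the actual implementation keeps everything in `sparse @ sparse` / `sparse_mask` form):

```
import torch

# Conceptual sketch, not the PR's implementation: for C = A @ B with sparse A, B,
# the gradients are the usual dense matmul gradients restricted (via sparse_mask)
# to the sparsity patterns of A and B.
def masked_matmul_grads(grad_c, a, b):
    grad_a = (grad_c @ b.to_dense().t()).sparse_mask(a.coalesce())
    grad_b = (a.to_dense().t() @ grad_c).sparse_mask(b.coalesce())
    return grad_a, grad_b

a = torch.randn(4, 5).relu().to_sparse()
b = torch.randn(5, 3).relu().to_sparse()
grad_c = torch.randn(4, 3)
grad_a, grad_b = masked_matmul_grads(grad_c, a, b)
```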

I added new benchmark tests. They now use a real dataset from recent studies [1, 2] with different sparsity levels.

```
[---------------------------------- matmul ---------------------------------]
                        |   0.5   |  0.7   |  0.8   |  0.9   |  0.95  |  0.98
1 threads: ------------------------------------------------------------------
  (cpu)   torch         |    5.4  |   5.4  |   5.2  |   5.3  |   5.3  |   5.4
          torch.sparse  |  122.2  |  51.9  |  27.5  |  11.4  |   4.9  |   1.8
          scipy         |  150.1  |  87.4  |  69.2  |  56.8  |  38.4  |  17.1
  (cuda)  torch         |    1.3  |   1.1  |   1.1  |   1.1  |   1.1  |   1.1
          torch.sparse  |   20.0  |   8.4  |   5.1  |   2.5  |   1.5  |   1.1

[----------------------------------- backward -----------------------------------]
                        |   0.5   |   0.7   |   0.8   |   0.9   |   0.95  |   0.98
1 threads: -----------------------------------------------------------------------
  (cpu)   torch         |   17.7  |   17.9  |   17.7  |   17.7  |   17.6  |   17.9
          torch.sparse  |  672.9  |  432.6  |  327.5  |  230.8  |  176.7  |  116.7
  (cuda)  torch         |    3.8  |    3.6  |    3.5  |    3.5  |    3.6  |    3.5
          torch.sparse  |   68.8  |   46.2  |   35.6  |   24.2  |   17.8  |   11.9

Times are in milliseconds (ms).
```

In summary, the new `sparse @ sparse` backward algorithm is the better choice: it is aimed more at saving memory than at raw performance, and it outperforms the other options tested before.

## **References**

1. Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen. **Sparse GPU Kernels for Deep Learning.**  Proceedings of the International Conference for High Performance Computing, 2020. [https://github.com/google-research/google-research/tree/master/sgk](https://github.com/google-research/google-research/tree/master/sgk)
2. Trevor Gale, Erich Elsen, Sara Hooker. **The State of Sparsity in Deep Neural Networks.** [https://github.com/google-research/google-research/tree/master/state_of_sparsity](https://github.com/google-research/google-research/tree/master/state_of_sparsity)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39526

Reviewed By: mruberry

Differential Revision: D25661239

Pulled By: ngimel

fbshipit-source-id: b515ecd66d25f347d637e159d51aa45fb43b6938
2020-12-21 11:53:55 -08:00