Summary:
- `sparse_mask(D, S)` is useful for implementing the backward of `sparse_addmm()`
- the previous `sparse_mask(D, S)` CPU kernel was not parallelized
- this PR speeds up the CPU kernel for two separate cases (a semantics sketch follows the benchmarks below):
- `D.dim == S.sparse_dim`: parallelize the existing kernel
- `D.dim > S.sparse_dim`: follow the approach of the CUDA kernel implementation
- performance:
`D.dim == S.sparse_dim`
```
>>> nnz = 100000
>>> dims = [1000, 1000]
>>> I = torch.cat([torch.randint(0, dims[0], size=(nnz,)),
...                torch.randint(0, dims[1], size=(nnz,))], 0).reshape(2, nnz)
>>> V = torch.randn(nnz)
>>> size = torch.Size(dims)
>>> S = torch.sparse_coo_tensor(I, V, size).coalesce()
>>> D = torch.randn(dims)
>>> %timeit D.sparse_mask(S)
======= before change =======
6.4 ms ± 684 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
======= after change =======
333 µs ± 89.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
`D.dim > S.sparse_dim`
```
>>> nnz = 100000
>>> dims = [1000, 1000, 2, 2]
>>> I = torch.cat([torch.randint(0, dims[0], size=(nnz,)),
...                torch.randint(0, dims[1], size=(nnz,))], 0).reshape(2, nnz)
>>> V = torch.randn(nnz, dims[2], dims[3])
>>> size = torch.Size(dims)
>>> S = torch.sparse_coo_tensor(I, V, size).coalesce()
>>> D = torch.randn(dims)
>>> %timeit D.sparse_mask(S)
======= before change =======
495 ms ± 41.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
======= after change =======
594 µs ± 68.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
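For readers unfamiliar with the operator, below is a minimal Python sketch of the semantics `sparse_mask` implements in both cases above. `reference_sparse_mask` is a hypothetical helper for illustration, not the actual ATen kernel; the `D.dim > S.sparse_dim` branch uses a flatten-and-`index_select` gather, which is one plausible reading of "use the CUDA kernel implementation" in the summary.
```
import torch

def reference_sparse_mask(D, S):
    """Return a sparse tensor with S's indices and D's values at those indices.

    Assumes S is a COO sparse tensor; this is a reference sketch, not the
    production kernel.
    """
    S = S.coalesce()
    idx = S._indices()              # shape: (sparse_dim, nnz)
    sparse_dim = S.sparse_dim()
    if D.dim() == sparse_dim:
        # Case 1: pick D's scalar entries at S's coordinates.
        vals = D[tuple(idx[d] for d in range(sparse_dim))]
    else:
        # Case 2 (D.dim > S.sparse_dim): flatten the sparse dims of D and
        # gather whole dense slices with index_select.
        sizes = D.shape[:sparse_dim]
        flat_index = torch.zeros_like(idx[0])
        for d in range(sparse_dim):
            flat_index = flat_index * sizes[d] + idx[d]
        D_flat = D.reshape(-1, *D.shape[sparse_dim:])
        vals = D_flat.index_select(0, flat_index)
    return torch.sparse_coo_tensor(idx, vals, D.shape)
```
On the benchmark shapes above, `reference_sparse_mask(D, S).to_dense()` should match `D.sparse_mask(S).to_dense()`.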
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13290
Differential Revision: D12878336
Pulled By: weiyangfb
fbshipit-source-id: 10b5981af382f7c6095a42c0fee7297d6438ce37