pytorch/docs/source
Wei Yang 5dd153b1c2 speed up torch.sparse_mask() cpu kernel (#13290)
Summary:
- `sparse_mask(D, S)` is useful for implementing the backward of `sparse_addmm()`
- the previous `sparse_mask(D, S)` CPU kernel was not parallelized
- this PR speeds up the CPU kernel for two separate cases:
  - `D.dim == S.sparse_dim`: simply parallelize the kernel
  - `D.dim > S.sparse_dim`: follow the CUDA kernel implementation (a reference sketch of the masking semantics follows the benchmarks below)
- performance:

`D.dim == S.sparse_dim`
```
>>> import torch
>>> nnz = 100000
>>> dims = [1000, 1000]
>>> I = torch.cat([torch.randint(0, dims[0], size=(nnz,)),
...                torch.randint(0, dims[1], size=(nnz,))], 0).reshape(2, nnz)
>>> V = torch.randn(nnz)
>>> size = torch.Size(dims)

>>> S = torch.sparse_coo_tensor(I, V, size).coalesce()
>>> D = torch.randn(dims)

>>> %timeit D.sparse_mask(S)

======= before change =======
6.4 ms ± 684 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

======= after change =======
333 µs ± 89.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

`D.dim > S.sparse_dim`
```
>>> import torch
>>> nnz = 100000
>>> dims = [1000, 1000, 2, 2]
>>> I = torch.cat([torch.randint(0, dims[0], size=(nnz,)),
...                torch.randint(0, dims[1], size=(nnz,))], 0).reshape(2, nnz)
>>> V = torch.randn(nnz, dims[2], dims[3])
>>> size = torch.Size(dims)

>>> S = torch.sparse_coo_tensor(I, V, size).coalesce()
>>> D = torch.randn(dims)
>>> %timeit D.sparse_mask(S)

======= before change =======
495 ms ± 41.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

======= after change =======
594 µs ± 68.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
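For reference, the masking semantics behind both cases can be written in a few lines of Python. The sketch below is a hypothetical helper (`sparse_mask_reference` is not part of the API and is not the actual ATen kernel): it flattens the leading sparse dimensions of `D` into row offsets and gathers with `index_select`, which is the flatten-and-gather strategy the CUDA kernel uses and which this PR brings to the CPU path for `D.dim > S.sparse_dim`.
```
import torch

def sparse_mask_reference(D, S):
    # Hypothetical reference for D.sparse_mask(S): keep only the entries of
    # dense D at the indices of sparse S.
    S = S.coalesce()
    indices = S.indices()                    # shape: (sparse_dim, nnz)
    sparse_dim, nnz = indices.shape
    # Flatten the leading sparse dims of D into linear row offsets.
    flat = torch.zeros(nnz, dtype=torch.long, device=D.device)
    for d in range(sparse_dim):
        flat = flat * D.shape[d] + indices[d]
    # For D.dim == S.sparse_dim this selects scalars; for D.dim > S.sparse_dim
    # it selects whole dense slices of shape D.shape[sparse_dim:].
    values = D.reshape(-1, *D.shape[sparse_dim:]).index_select(0, flat)
    return torch.sparse_coo_tensor(indices, values, D.shape)
```
On the coalesced tensors from the benchmarks above, `sparse_mask_reference(D, S).values()` should match `D.sparse_mask(S).values()`; the PR parallelizes this gather for the first case and reuses the flatten-and-`index_select` approach for the second.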
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13290

Differential Revision: D12878336

Pulled By: weiyangfb

fbshipit-source-id: 10b5981af382f7c6095a42c0fee7297d6438ce37
2018-11-07 20:02:17 -08:00
_static move flags to c10 (#12144) 2018-10-04 02:09:56 -07:00
_templates Add Google pixel code 2018-10-23 13:26:37 -07:00
notes Try to fix randomness.rst formatting again 2018-10-18 19:18:49 -07:00
scripts Add CELU activation to pytorch (#8551) 2018-08-01 07:54:44 -07:00
autograd.rst Add autograd automatic anomaly detection (#7677) 2018-06-11 21:26:17 -04:00
bottleneck.rst [docs] Clarify more CUDA profiling gotchas in bottleneck docs (#6763) 2018-04-19 13:15:27 -04:00
checkpoint.rst [docs] Fix some sphinx warnings (#6764) 2018-04-19 12:37:42 -04:00
conf.py Remove outdated css and font files in html docs (#13699) 2018-11-07 16:31:28 -08:00
cpp_extension.rst Inline JIT C++ Extensions (#7059) 2018-04-30 11:48:44 -04:00
cuda_deterministic_backward.rst Amend nondeterminism notes (#12217) 2018-10-16 23:59:26 -07:00
cuda_deterministic.rst Amend nondeterminism notes (#12217) 2018-10-16 23:59:26 -07:00
cuda.rst Fix Python docs for broadcast and braodcast_coalesced (#4727) 2018-01-19 10:57:20 -05:00
cudnn_deterministic.rst Amend nondeterminism notes (#12217) 2018-10-16 23:59:26 -07:00
cudnn_persistent_rnn.rst don't copy weight gradients in rnn (#12600) 2018-10-12 13:34:10 -07:00
data.rst add fold example and add nn.Fold/nn.Unfold and F.fold/F.unfold to doc (#8600) 2018-06-18 09:36:42 -04:00
distributed_deprecated.rst Documentation for c10d: torch.distributed and deprecate the old distributed doc (#11450) 2018-09-11 02:10:28 -07:00
distributed.rst Rename DistBackend -> Backend (#11830) 2018-11-07 11:58:12 -08:00
distributions.rst NegativeBinomial distribution (#9345) 2018-08-01 08:39:25 -07:00
dlpack.rst document torch.utils.dlpack (#9343) 2018-07-11 07:46:09 -07:00
ffi.rst Improve ffi utils (#479) 2017-01-18 11:17:01 -05:00
hub.rst Hub Implementation (#12228) 2018-10-29 18:43:14 -07:00
index.rst Hub Implementation (#12228) 2018-10-29 18:43:14 -07:00
jit.rst Fixes for Torch Script C++ API (#11682) 2018-09-17 09:54:50 -07:00
legacy.rst Add anything in torch.legacy docs 2017-01-16 12:59:47 -05:00
model_zoo.rst Add model_zoo utility torch torch.utils (#424) 2017-01-09 13:16:58 -05:00
multiprocessing.rst Typofix 2017-10-13 01:31:22 +02:00
nn.rst Add DistributedDataParallelCPU to doc 2018-10-21 11:20:11 -07:00
onnx.rst Add trigonometry functions to docs/source/onnx.rst 2018-09-12 12:10:01 -07:00
optim.rst Add Cosine Annealing LR Scheduler (#3311) 2017-12-18 02:43:08 -05:00
sparse.rst add narrow() support for sparse tensors re: #8853 (#11342) 2018-09-26 12:24:54 -07:00
storage.rst Start documenting torch.Tensor (#377) 2016-12-30 01:21:34 -05:00
tensor_attributes.rst Update device docs (#6887) 2018-04-23 19:04:20 -04:00
tensors.rst speed up torch.sparse_mask() cpu kernel (#13290) 2018-11-07 20:02:17 -08:00
torch.rst Add diag_embed to ATen and torch (#12447) 2018-11-05 08:55:28 -08:00
type_info.rst Added a default constructor for torch.finfo. 2018-10-23 09:03:24 -07:00