pytorch/docs/source
Wei Yang 5dd153b1c2 speed up torch.sparse_mask() cpu kernel (#13290)
Summary:
- `sparse_mask(D, S)` is useful for implementing the backward of `sparse_addmm()`
- the previous `sparse_mask(D, S)` CPU kernel was not parallelized
- this PR speeds up the CPU kernel for two separate cases:
  - `D.dim == S.sparse_dim`: simply parallelize the kernel
  - `D.dim > S.sparse_dim`: follow the CUDA kernel implementation (a reference sketch of the masking semantics follows the benchmarks below)
- performance:

`D.dim == S.sparse_dim`
```
>>> import torch
>>> nnz = 100000
>>> dims = [1000, 1000]
>>> I = torch.cat([torch.randint(0, dims[0], size=(nnz,)),
...                torch.randint(0, dims[1], size=(nnz,))], 0).reshape(2, nnz)
>>> V = torch.randn(nnz)
>>> size = torch.Size(dims)

>>> S = torch.sparse_coo_tensor(I, V, size).coalesce()
>>> D = torch.randn(dims)

>>> %timeit D.sparse_mask(S)

======= before change =======
6.4 ms ± 684 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

======= after change =======
333 µs ± 89.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

`D.dim > S.sparse_dim`
```
>>> import torch
>>> nnz = 100000
>>> dims = [1000, 1000, 2, 2]
>>> I = torch.cat([torch.randint(0, dims[0], size=(nnz,)),
...                torch.randint(0, dims[1], size=(nnz,))], 0).reshape(2, nnz)
>>> V = torch.randn(nnz, dims[2], dims[3])
>>> size = torch.Size(dims)

>>> S = torch.sparse_coo_tensor(I, V, size).coalesce()
>>> D = torch.randn(dims)
>>> %timeit D.sparse_mask(S)

======= before change =======
495 ms ± 41.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

======= after change =======
594 µs ± 68.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
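For reference, the masking semantics behind both cases can be written in a few lines of Python. The sketch below is a hypothetical helper (`sparse_mask_reference` is not part of the API and is not the actual ATen kernel): it flattens the leading sparse dimensions of `D` into row offsets and gathers with `index_select`, which is the flatten-and-gather strategy the CUDA kernel uses and which this PR brings to the CPU path for `D.dim > S.sparse_dim`.
```
import torch

def sparse_mask_reference(D, S):
    # Hypothetical reference for D.sparse_mask(S): keep only the entries of
    # dense D at the indices of sparse S.
    S = S.coalesce()
    indices = S.indices()                    # shape: (sparse_dim, nnz)
    sparse_dim, nnz = indices.shape
    # Flatten the leading sparse dims of D into linear row offsets.
    flat = torch.zeros(nnz, dtype=torch.long, device=D.device)
    for d in range(sparse_dim):
        flat = flat * D.shape[d] + indices[d]
    # For D.dim == S.sparse_dim this selects scalars; for D.dim > S.sparse_dim
    # it selects whole dense slices of shape D.shape[sparse_dim:].
    values = D.reshape(-1, *D.shape[sparse_dim:]).index_select(0, flat)
    return torch.sparse_coo_tensor(indices, values, D.shape)
```
On the coalesced tensors from the benchmarks above, `sparse_mask_reference(D, S).values()` should match `D.sparse_mask(S).values()`; the PR parallelizes this gather for the first case and reuses the flatten-and-`index_select` approach for the second.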
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13290

Differential Revision: D12878336

Pulled By: weiyangfb

fbshipit-source-id: 10b5981af382f7c6095a42c0fee7297d6438ce37
2018-11-07 20:02:17 -08:00
_static move flags to c10 (#12144) 2018-10-04 02:09:56 -07:00
_templates Add Google pixel code 2018-10-23 13:26:37 -07:00
notes Try to fix randomness.rst formatting again 2018-10-18 19:18:49 -07:00
scripts Add CELU activation to pytorch (#8551) 2018-08-01 07:54:44 -07:00
autograd.rst Add autograd automatic anomaly detection (#7677) 2018-06-11 21:26:17 -04:00
bottleneck.rst [docs] Clarify more CUDA profiling gotchas in bottleneck docs (#6763) 2018-04-19 13:15:27 -04:00
checkpoint.rst [docs] Fix some sphinx warnings (#6764) 2018-04-19 12:37:42 -04:00
conf.py Remove outdated css and font files in html docs (#13699) 2018-11-07 16:31:28 -08:00
cpp_extension.rst Inline JIT C++ Extensions (#7059) 2018-04-30 11:48:44 -04:00
cuda_deterministic_backward.rst Amend nondeterminism notes (#12217) 2018-10-16 23:59:26 -07:00
cuda_deterministic.rst Amend nondeterminism notes (#12217) 2018-10-16 23:59:26 -07:00
cuda.rst Fix Python docs for broadcast and braodcast_coalesced (#4727) 2018-01-19 10:57:20 -05:00
cudnn_deterministic.rst Amend nondeterminism notes (#12217) 2018-10-16 23:59:26 -07:00
cudnn_persistent_rnn.rst don't copy weight gradients in rnn (#12600) 2018-10-12 13:34:10 -07:00
data.rst add fold example and add nn.Fold/nn.Unfold and F.fold/F.unfold to doc (#8600) 2018-06-18 09:36:42 -04:00
distributed_deprecated.rst Documentation for c10d: torch.distributed and deprecate the old distributed doc (#11450) 2018-09-11 02:10:28 -07:00
distributed.rst Rename DistBackend -> Backend (#11830) 2018-11-07 11:58:12 -08:00
distributions.rst NegativeBinomial distribution (#9345) 2018-08-01 08:39:25 -07:00
dlpack.rst document torch.utils.dlpack (#9343) 2018-07-11 07:46:09 -07:00
ffi.rst Improve ffi utils (#479) 2017-01-18 11:17:01 -05:00
hub.rst Hub Implementation (#12228) 2018-10-29 18:43:14 -07:00
index.rst Hub Implementation (#12228) 2018-10-29 18:43:14 -07:00
jit.rst Fixes for Torch Script C++ API (#11682) 2018-09-17 09:54:50 -07:00
legacy.rst Add anything in torch.legacy docs 2017-01-16 12:59:47 -05:00
model_zoo.rst Add model_zoo utility torch torch.utils (#424) 2017-01-09 13:16:58 -05:00
multiprocessing.rst Typofix 2017-10-13 01:31:22 +02:00
nn.rst Add DistributedDataParallelCPU to doc 2018-10-21 11:20:11 -07:00
onnx.rst Add trigonometry functions to docs/source/onnx.rst 2018-09-12 12:10:01 -07:00
optim.rst Add Cosine Annealing LR Scheduler (#3311) 2017-12-18 02:43:08 -05:00
sparse.rst add narrow() support for sparse tensors re: #8853 (#11342) 2018-09-26 12:24:54 -07:00
storage.rst Start documenting torch.Tensor (#377) 2016-12-30 01:21:34 -05:00
tensor_attributes.rst Update device docs (#6887) 2018-04-23 19:04:20 -04:00
tensors.rst speed up torch.sparse_mask() cpu kernel (#13290) 2018-11-07 20:02:17 -08:00
torch.rst Add diag_embed to ATen and torch (#12447) 2018-11-05 08:55:28 -08:00
type_info.rst Added a default constructor for torch.finfo. 2018-10-23 09:03:24 -07:00