pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Aleksandar Samardžić	c3a893c659	Implement adding bias vector into structured sparse linear operator (#100881 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100881 Approved by: https://github.com/cpuhrsch	2023-05-17 05:46:22 +00:00
Pearu Peterson	65b15be04c	Fix incorrect sparse_dim in COO.zero_() and in binary operations with zero-sized COO operands (#98292 ) Fixes https://github.com/pytorch/pytorch/issues/97627 Pull Request resolved: https://github.com/pytorch/pytorch/pull/98292 Approved by: https://github.com/nikitaved, https://github.com/cpuhrsch, https://github.com/amjames	2023-05-11 19:05:34 +00:00
Aleksandar Samardžić	a8c2cd1039	Add CUTLASS-based MM for structured sparse linear operator (#100485 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100485 Approved by: https://github.com/cpuhrsch	2023-05-09 21:05:15 +00:00
Pearu Peterson	92a7640b76	Add mul tests with sparse sample inputs (#100393 ) This PR implements sparse sample inputs and error inputs for mul OpInfo. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100393 Approved by: https://github.com/amjames, https://github.com/cpuhrsch	2023-05-09 16:13:14 +00:00
Larry Liu	687afeb686	[dynamo][numpy] Add NumpyTensorVariable to translate ndarray attribute calls to tensor attributes (#95849 ) Issue: #93684

# Problem

Reduce graph breaks when dynamo compiles Python functions containing numpy functions and ndarray operations.

# Design (as I know it)

* Use torch_np.ndarray (a wrapper of tensor) to back a `VariableTracker`: `NumpyTensorVariable`.
* Translate all attribute and method calls on ndarray to their torch_np.ndarray equivalents.

This PR adds `NumpyTensorVariable` and supports:

1. tensor to ndarray, ndarray to tensor
2. numpy functions such as numpy.meshgrid()
3. ndarray attributes such as `itemsize`, `stride`

The next PR will handle returning `np.ndarray` and add support for ndarray methods.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95849 Approved by: https://github.com/ezyang	2023-04-27 16:18:35 +00:00
Ilia Taraban	a1074ddf51	Enable cadd_sparse for BFloat16 on CPU (#96767 ) Enabling cadd_sparse operation for BFloat16 on CPU to support BFloat16 operations in GNN libraries. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96767 Approved by: https://github.com/jgong5, https://github.com/cpuhrsch	2023-04-14 19:50:49 +00:00
eqy	2fddcf0fc0	[CUDA][CUDA 11] Remove more CUDA 11 version checks (#92934 ) Working on removing stragglers missed in previous CUDA version < 11.0 cleanup PRs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92934 Approved by: https://github.com/ngimel	2023-03-30 19:49:52 +00:00
Pearu Peterson	9d5ac03b9a	Deprecate gradcheck check_sparse_nnz argument as duplicate of masked argument (#97187 ) As in the title. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97187 Approved by: https://github.com/soulitzer	2023-03-22 14:11:03 +00:00
Huy Do	679dec847e	Use is_available instead of device_count to check for CUDA availability (#97043 ) Some tests incorrectly use the number of GPU devices (`torch.cuda.device_count() > 0`) to check for CUDA availability instead of the default `torch.cuda.is_available()` call. This makes these tests more brittle when encountering infra flakiness on G5 runners using A10G, for example [test_pytorch_np](https://hud.pytorch.org/failure/FAILED%20test_tensorboard.py%3A%3ATestTensorBoardPyTorchNumpy%3A%3Atest_pytorch_np%20-%20RuntimeError%3A%20No%20CUDA%20GPUs%20are%20available). The underlying problem is that GPU devices could crash on these runners. While the root cause is unclear and we will try upgrading to a new NVIDIA driver (https://github.com/pytorch/pytorch/pull/96904) to see if it helps, we can also make these tests more resilient by using the correct check, so that they are skipped correctly when a GPU crashes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97043 Approved by: https://github.com/clee2000	2023-03-18 00:39:42 +00:00
Pearu Peterson	2abcafcfd8	Add masked_grad kw argument to to_dense (#96095 ) As in the title.

The `masked_grad` kw argument is required for `to_dense` backward to distinguish the expected semantics of sparse tensors. `masked_grad=True` means that the `to_dense` backward will apply a mask to the returned gradient where the mask is defined by the input indices. The default semantics implies `masked_grad==True` for BC but see the [comment](https://github.com/pytorch/pytorch/pull/96095/files#diff-d4df180433a09071e891d552426911c227b30ae9b8a8e56da31046e7ecb1afbeR501-R513) in `to_dense_backward`.

As a consequence, existing code that is run through the autograd engine must replace `.to_dense()` calls with `.to_dense(masked_grad=False)`. For example,

```python
torch.autograd.gradcheck(lambda x: torch.sum(x, [0]).to_dense())
torch.autograd.gradcheck(lambda x: torch.sparse.sum(x, [0]).to_dense())
```

(recall, gradcheck has `masked=False` as default) must be updated to

```python
torch.autograd.gradcheck(lambda x: torch.sum(x, [0]).to_dense(masked_grad=False))
torch.autograd.gradcheck(lambda x: torch.sparse.sum(x, [0]).to_dense(masked_grad=True), masked=True)
```

Fixes https://github.com/pytorch/pytorch/issues/95550 Pull Request resolved: https://github.com/pytorch/pytorch/pull/96095 Approved by: https://github.com/cpuhrsch	2023-03-16 21:38:11 +00:00
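The difference between the two `masked_grad` settings described above can be modeled without PyTorch. The sketch below is a hypothetical pure-Python illustration (the function name and the dict-based sparse representation are assumptions, not PyTorch APIs): with masked semantics the gradient keeps only the input's specified indices, while non-masked semantics treats unspecified elements as zeros that still receive gradient.

```python
def to_dense_backward_sketch(indices, grad_dense, masked_grad):
    """Hypothetical model of the gradient reaching a sparse input of to_dense().

    indices     -- list of (row, col) pairs specified in the sparse input
    grad_dense  -- 2-D list holding the gradient w.r.t. the dense output
    masked_grad -- True: keep only entries at `indices` (masked semantics);
                   False: keep every dense position (non-masked semantics)
    Returns a dict {(row, col): gradient value}.
    """
    if masked_grad:
        # Masked semantics: unspecified elements are not part of the input,
        # so the gradient is restricted to the input's indices.
        return {(i, j): grad_dense[i][j] for (i, j) in indices}
    # Non-masked semantics: unspecified elements are implicit zeros that
    # still receive gradient, so all dense positions are kept.
    return {
        (i, j): grad_dense[i][j]
        for i in range(len(grad_dense))
        for j in range(len(grad_dense[0]))
    }


grad = [[1.0, 2.0], [3.0, 4.0]]
print(to_dense_backward_sketch([(0, 1), (1, 0)], grad, masked_grad=True))
# {(0, 1): 2.0, (1, 0): 3.0}
```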
Nikita Vedeneev	0b5040b329	sparse_mask: remove syncs by removing calls to coalesce (#94406 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94406 Approved by: https://github.com/cpuhrsch, https://github.com/pearu	2023-03-13 16:30:27 +00:00
Andrew M. James	2bcc0e9e18	Expand sparse.softmax zero nnz tests to cover cases of previously reported FPE. (#95646 )

- Test cases with zero `nnz` added for `sparse.log_softmax`.
- Test cases with zero `nnz` for both `sparse.log_softmax` and `torch.sparse_softmax` expanded to cover the backward pass.

These test additions confirm that #95371 and #82107 are resolved. Fixes #82107 #95371 Pull Request resolved: https://github.com/pytorch/pytorch/pull/95646 Approved by: https://github.com/cpuhrsch, https://github.com/pearu, https://github.com/nikitaved	2023-03-01 17:26:51 +00:00
Pearu Peterson	b89fda51cd	Implement sparse semantics support in gradcheck (2nd try) (#95405 ) Replaces https://github.com/pytorch/pytorch/pull/94714 that was reverted due to https://github.com/pytorch/pytorch/pull/94714#issuecomment-1442355648 Pull Request resolved: https://github.com/pytorch/pytorch/pull/95405 Approved by: https://github.com/albanD	2023-02-27 17:48:02 +00:00
Pearu Peterson	0c0694495b	Fix a bug in nesting check_sparse_tensor_invariants context managers (#95372 ) As in the title. The bug was reported in https://github.com/pytorch/pytorch/pull/94728#discussion_r1108892366 and has the following reproducer:

```python
>>> import torch
>>> check_ctx = torch.sparse.check_sparse_tensor_invariants(True)
>>> no_check_ctx = torch.sparse.check_sparse_tensor_invariants(False)
>>> with check_ctx:
...   assert torch.sparse.check_sparse_tensor_invariants.is_enabled()
...   with no_check_ctx:
...     assert not torch.sparse.check_sparse_tensor_invariants.is_enabled()
...   assert torch.sparse.check_sparse_tensor_invariants.is_enabled()
...
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
AssertionError
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95372 Approved by: https://github.com/cpuhrsch	2023-02-23 18:22:13 +00:00
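For the nesting assertions in the reproducer above to hold, each context manager has to restore whatever state was active when it was entered. The following is a hypothetical pure-Python sketch of that pattern (the class name `InvariantCheckFlag` is made up, and the actual fix in the PR may be implemented differently):

```python
class InvariantCheckFlag:
    """Hypothetical nestable on/off switch; not torch.sparse's implementation."""

    _enabled = False  # global state shared by all instances

    def __init__(self, enable):
        self.enable = enable
        self._saved = []  # stack of previous states, so an instance can be re-entered

    @classmethod
    def is_enabled(cls):
        return cls._enabled

    def __enter__(self):
        # Save the state that was active before entering ...
        self._saved.append(InvariantCheckFlag._enabled)
        InvariantCheckFlag._enabled = self.enable
        return self

    def __exit__(self, exc_type, exc, tb):
        # ... and restore exactly that state on exit, rather than a fixed default.
        InvariantCheckFlag._enabled = self._saved.pop()
        return False  # do not swallow exceptions


# Same nesting pattern as the reproducer.
check_ctx = InvariantCheckFlag(True)
no_check_ctx = InvariantCheckFlag(False)
with check_ctx:
    assert InvariantCheckFlag.is_enabled()
    with no_check_ctx:
        assert not InvariantCheckFlag.is_enabled()
    assert InvariantCheckFlag.is_enabled()
assert not InvariantCheckFlag.is_enabled()
```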
Zain Rizvi	808879ec8b	Revert "Implement sparse semantics support in gradcheck (#94714 )" (#95386 ) This reverts commit `7ac511c29a` from https://github.com/pytorch/pytorch/pull/94714 since it breaks periodic. Git thinks there's a merge conflict due to an unfortunately located newline deletion, so reverting this one manually Details behind the failure in https://github.com/pytorch/pytorch/pull/94714#issuecomment-1442160593 Pull Request resolved: https://github.com/pytorch/pytorch/pull/95386 Approved by: https://github.com/clee2000	2023-02-23 18:02:37 +00:00
Pearu Peterson	cece63f197	Add warn-once deprecation warning to legacy sparse constructors (#94850 ) Addresses https://github.com/pytorch/pytorch/issues/68323#issuecomment-1425174341 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94850 Approved by: https://github.com/amjames, https://github.com/cpuhrsch	2023-02-23 15:05:12 +00:00
kshitij12345	3b966a6ce3	[autograd] disable backward/grad for complex scalar output (#92753 ) Fixes https://github.com/pytorch/pytorch/issues/92750 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92753 Approved by: https://github.com/ezyang	2023-02-23 11:38:27 +00:00
Pearu Peterson	7ac511c29a	Implement sparse semantics support in gradcheck (#94714 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94714 Approved by: https://github.com/soulitzer, https://github.com/albanD	2023-02-22 20:03:25 +00:00
Nikita Vedeneev	3ace14eb8b	[Bug fix] sparse_mask: wrong intersection on CUDA (#94829 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94829 Approved by: https://github.com/cpuhrsch	2023-02-15 13:22:39 +00:00
Xuehai Pan	046e88a291	[BE] [3/3] Rewrite `super()` calls in test (#94592 ) Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.

- #94587
- #94588
- #94592

Also, methods with only a `super()` call are removed:

```diff
 class MyModule(nn.Module):
-    def __init__(self):
-        super().__init__()
-
     def forward(self, ...):
         ...
```

Some cases that change the semantics should be kept unchanged. E.g.:

`f152a79be9/caffe2/python/net_printer.py (L184-L190)`
`f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94592 Approved by: https://github.com/ezyang, https://github.com/seemethere	2023-02-12 22:20:53 +00:00
Aaron Gokaslan	3d82d8d0ed	[BE] Enable more flake8-comprehensions checks (#94601 ) I applied some flake8 fixes and enabled checking for them in the linter. I also enabled some checks for my previous comprehensions PR. This is a follow up to #94323 where I enable the flake8 checkers for the fixes I made and fix a few more of them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94601 Approved by: https://github.com/ezyang	2023-02-10 23:40:29 +00:00
Huy Do	c53bd0dd30	Mitigate broken test_coalesce_reference_cycle test on dynamo (#94622 ) The test has been disabled and shows up on https://github.com/pytorch/test-infra/blob/generated-stats/stats/disabled-tests-condensed.json, but then the JSON file downloaded by the runner doesn't seem to have it. Disable it explicitly to keep trunk green while investigating. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94622 Approved by: https://github.com/weiwangmeta	2023-02-10 21:59:36 +00:00
PyTorch MergeBot	76ed1a81d1	Revert "COO intersection kernel: respect value intersection order (#92242 )" This reverts commit `b07c839b70`. Reverted https://github.com/pytorch/pytorch/pull/92242 on behalf of https://github.com/jeanschmidt due to breaking vs17	2023-02-09 14:44:32 +00:00
Aleksandar Samardžić	e1f17b3530	Add CSR->BSC and CSC->BSR conversions (#93301 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/93301 Approved by: https://github.com/cpuhrsch	2023-02-07 19:22:05 +00:00
Nikita Vedeneev	b07c839b70	COO intersection kernel: respect value intersection order (#92242 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92242 Approved by: https://github.com/cpuhrsch, https://github.com/amjames	2023-02-07 17:05:28 +00:00
Nikita Vedeneev	994f85d639	sparse_mask: extend lhs to sparse COO tensors (#92248 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92248 Approved by: https://github.com/cpuhrsch, https://github.com/pearu	2023-02-01 09:00:07 +00:00
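`Tensor.sparse_mask(mask)` returns a sparse tensor whose indices are those of `mask` and whose values are taken from `self`; the values of `mask` are ignored. The commit above extends the left-hand side to sparse COO tensors. A hypothetical pure-Python sketch of those semantics (dicts stand in for COO tensors, and treating positions the sparse source does not specify as zero is an assumption of this sketch):

```python
def sparse_mask(source, mask_indices):
    """Hypothetical sketch of sparse_mask semantics.

    source       -- a dense 2-D list, or a sparse dict {(row, col): value}
    mask_indices -- the specified indices of the mask (mask values are ignored)
    Returns a dict whose keys are exactly the mask's indices.
    """
    if isinstance(source, dict):
        # Sparse lhs: positions the source does not specify are implicit zeros.
        return {idx: source.get(idx, 0) for idx in mask_indices}
    # Dense lhs: read the value at every masked position.
    return {(r, c): source[r][c] for (r, c) in mask_indices}


print(sparse_mask([[1, 2], [3, 4]], [(0, 1), (1, 1)]))        # {(0, 1): 2, (1, 1): 4}
print(sparse_mask({(0, 1): 5, (1, 0): 6}, [(0, 1), (1, 1)]))  # {(0, 1): 5, (1, 1): 0}
```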
Aleksandar Samardžić	53f7fb9a22	Add CSC->BSC conversion (#92307 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92307 Approved by: https://github.com/cpuhrsch	2023-01-30 17:03:36 +00:00
Pearu Peterson	65d6802e2f	Improve error messages for sparse methods on tensors with unsupported backends/layouts. (#93149 ) Fixes https://github.com/pytorch/pytorch/issues/92790 Pull Request resolved: https://github.com/pytorch/pytorch/pull/93149 Approved by: https://github.com/cpuhrsch	2023-01-27 19:50:23 +00:00
Pearu Peterson	0e92bbe5b1	Add sparse COO tensor support to torch.sum(dim=..., keepdim=...) (#92979 ) Fixes #92757, #86232 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92979 Approved by: https://github.com/cpuhrsch	2023-01-26 18:42:51 +00:00
Eddie Yan	0bf7506051	[CUDA] Drop CUDA < 11.0 test flags (#92605 ) Follow-up of #89582 to drop flags like `CUDA11OrLater` in tests. Note that in some places it appears that `TEST_WITH_ROCM` is _implicitly_ guarded against via the `CUDA11OrLater` version check, based on my best-guess of how `torch.version.cuda` would behave in ROCM builds, so I've added `not TEST_WITH_ROCM` in cases where ROCM wasn't previously explicitly allowed. CC @ptrblck @malfet @ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/92605 Approved by: https://github.com/ngimel	2023-01-24 04:34:06 +00:00
Nikita Vedeneev	9f381c9b7f	sparse_sparse_matmul: simplify backward (#91712 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/91712 Approved by: https://github.com/albanD	2023-01-23 19:24:28 +00:00
Yanbo Liang	0ab4ab9f8d	[Dynamo] Fix calling UserDefinedObject.func should pass self object (#92050 ) Fixes #90834 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92050 Approved by: https://github.com/jansel	2023-01-21 05:47:01 +00:00
Pearu Peterson	b3e4f5029b	Add check-sparse-tensor-invariants flag to Context - 2nd try. (#92094 ) This PR is a copy of https://github.com/pytorch/pytorch/pull/90849, whose merge was reverted. The PR adds a "check sparse tensor invariants" flag to Context that, when enabled, will trigger sparse tensor data invariants checks in unsafe methods of constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to UI: the `torch.sparse.check_sparse_tensor_invariants` class provides different ways to enable/disable the invariant checking, and the `torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden. The PR fixes https://github.com/pytorch/pytorch/issues/90833 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92094 Approved by: https://github.com/cpuhrsch	2023-01-13 14:50:33 +00:00
PyTorch MergeBot	c7a22bb7c7	Revert "Add check-sparse-tensor-invariants flag to Context. (#90849 )" This reverts commit `b9a035c1c5`. Reverted https://github.com/pytorch/pytorch/pull/90849 on behalf of https://github.com/DanilBaibak due to Break internal build	2023-01-12 09:58:16 +00:00
Aleksandar Samardžić	8612ec5b90	Implement hybrid sparse to/from dense conversions. (#90177 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90177 Approved by: https://github.com/cpuhrsch, https://github.com/pearu	2023-01-12 03:31:30 +00:00
min-jean-cho	af242eedfb	[Inductor] Added aten.uniform_ decomp (#90869 ) Fixes #90815 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90869 Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/lezcano, https://github.com/ngimel, https://github.com/albanD	2023-01-11 23:23:42 +00:00
Pearu Peterson	b9a035c1c5	Add check-sparse-tensor-invariants flag to Context. (#90849 ) This PR adds a "check sparse tensor invariants" flag to Context that, when enabled, will trigger sparse tensor data invariants checks in unsafe methods of constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to UI:

- `torch.enable_check_sparse_tensor_invariants` and `torch.is_check_sparse_tensor_invariants_enabled` functions to globally enable/disable the invariant checks and to retrieve the state of the feature, respectively
- `torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden.

The PR also fixes https://github.com/pytorch/pytorch/issues/90833

# Main issue

The following content is outdated after merging the PRs in this ghstack but kept for the record.

The importance of this feature is that when enabling the invariants checks by default, say, via

<details>

```
$ git diff
diff --git a/torch/__init__.py b/torch/__init__.py
index c8543057c7..19a91d0482 100644
--- a/torch/__init__.py
+++ b/torch/__init__.py
@@ -1239,3 +1239,8 @@ if 'TORCH_CUDA_SANITIZER' in os.environ:
 
 # Populate magic methods on SymInt and SymFloat
 import torch.fx.experimental.symbolic_shapes
+
+# temporarily enable sparse tensor arguments validation in unsafe
+# constructors:
+
+torch._C._set_check_sparse_tensor_invariants(True)
```

</details>

a massive number of test failures/errors occur in test_sparse_csr.py tests:

```
$ pytest -sv test/test_sparse_csr.py
<snip>
==== 4293 failed, 1557 passed, 237 skipped, 2744 errors in 69.71s (0:01:09) ====
```

which means that we are silently constructing sparse compressed tensors that do not satisfy the sparse tensor invariants. In particular, the following errors are raised:

```
AssertionError: "resize_as_sparse_compressed_tensor_: self and src must have the same layout" does not match "expected values to be a strided and contiguous tensor"
RuntimeError: CUDA error: device-side assert triggered
RuntimeError: `col_indices[..., crow_indices[..., i - 1]:crow_indices[..., i]] for all i = 1, ..., nrows are sorted and distinct along the last dimension values` is not satisfied.
RuntimeError: expected col_indices to be a strided and contiguous tensor
RuntimeError: expected row_indices to be a strided and contiguous tensor
RuntimeError: expected values to be a strided and contiguous tensor
RuntimeError: for_each: failed to synchronize: cudaErrorAssert: device-side assert triggered
RuntimeError: tensor dimensionality must be sum of batch, base, and dense dimensionalities (=0 + 2 + 0) but got 3
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90849 Approved by: https://github.com/amjames, https://github.com/cpuhrsch	2023-01-11 01:05:14 +00:00
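Several of the invariants named in the error messages of the commit above (for example, that the column indices of each row are sorted and distinct) can be illustrated with a small checker. This is a hypothetical pure-Python sketch for a single 2-D CSR tensor without batch or dense dimensions; the function name is made up and it is not PyTorch's validation code.

```python
def check_csr_invariants(crow_indices, col_indices, values, shape):
    """Hypothetical check of (non-batch, non-hybrid) CSR invariants.

    Raises ValueError describing the first violated invariant.
    """
    nrows, ncols = shape
    nnz = len(col_indices)
    if len(crow_indices) != nrows + 1:
        raise ValueError("crow_indices must have nrows + 1 entries")
    if crow_indices[0] != 0:
        raise ValueError("crow_indices[0] must be 0")
    if crow_indices[-1] != nnz:
        raise ValueError("crow_indices[-1] must equal nnz")
    if len(values) != nnz:
        raise ValueError("values and col_indices must have the same length")
    for i in range(1, nrows + 1):
        start, end = crow_indices[i - 1], crow_indices[i]
        if start > end:
            raise ValueError("crow_indices must be non-decreasing")
        row = col_indices[start:end]
        if any(not 0 <= c < ncols for c in row):
            raise ValueError("col_indices out of bounds")
        # Column indices within a row must be strictly increasing,
        # i.e. sorted and distinct.
        if any(a >= b for a, b in zip(row, row[1:])):
            raise ValueError(f"col_indices of row {i - 1} are not sorted and distinct")


# The 2x3 matrix [[1, 0, 2], [0, 3, 0]] in valid CSR form passes silently.
check_csr_invariants([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0], (2, 3))
try:
    # Same data, but row 0 lists its columns out of order.
    check_csr_invariants([0, 2, 3], [2, 0, 1], [1.0, 2.0, 3.0], (2, 3))
except ValueError as e:
    print(e)  # col_indices of row 0 are not sorted and distinct
```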
anjali411	c887837ec3	Reland "Fix dynamo handling for tensor attributes: T, H, mT, mH (#90463 )" (#91897 ) This reverts commit `84266ae670`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91897 Approved by: https://github.com/ngimel	2023-01-10 08:16:07 +00:00
PyTorch MergeBot	84266ae670	Revert "Fix dynamo handling for tensor attributes: T, H, mT, mH (#90463 )" This reverts commit `9945a78a94`. Reverted https://github.com/pytorch/pytorch/pull/90463 on behalf of https://github.com/ZainRizvi due to This is causing test failures: FAILED inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_pinv_singular_cuda_float64 - RuntimeError: unexpected success linalg.pinv.singular, torch.float64, cuda	2023-01-09 16:43:36 +00:00
anjali411	9945a78a94	Fix dynamo handling for tensor attributes: T, H, mT, mH (#90463 ) Fixes https://github.com/pytorch/pytorch/issues/88843 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90463 Approved by: https://github.com/ngimel	2023-01-09 04:11:23 +00:00
Nikita Vedeneev	7ef7c57ae7	CSC/BSC -> COO coalesce fix (#91440 ) Fixes https://github.com/pytorch/pytorch/issues/91010. CSC and BSC sparse formats are not inherently `coalesced`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91440 Approved by: https://github.com/pearu, https://github.com/amjames, https://github.com/cpuhrsch	2023-01-03 18:42:39 +00:00
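The point that CSC data is not inherently `coalesced` can be seen with a small example: expanding CSC arrays column by column yields COO entries ordered by column, not in the row-major order a coalesced COO tensor requires. The following hypothetical pure-Python sketch (function names are made up) shows this for the 2x2 matrix `[[1, 2], [3, 0]]`.

```python
def csc_to_coo(ccol_indices, row_indices, values):
    """Hypothetical sketch: expand CSC arrays into COO (row, col, value) triples."""
    triples = []
    for col in range(len(ccol_indices) - 1):
        for k in range(ccol_indices[col], ccol_indices[col + 1]):
            triples.append((row_indices[k], col, values[k]))
    return triples


def is_row_major_sorted(triples):
    """Coalesced COO indices are distinct and in row-major order."""
    keys = [(r, c) for r, c, _ in triples]
    return all(a < b for a, b in zip(keys, keys[1:]))


# [[1, 2], [3, 0]] in CSC form: column 0 holds rows 0 and 1, column 1 holds row 0.
coo = csc_to_coo([0, 2, 3], [0, 1, 0], [1, 3, 2])
print(coo)                       # [(0, 0, 1), (1, 0, 3), (0, 1, 2)]
print(is_row_major_sorted(coo))  # False: the result must be sorted before it is coalesced
print(sorted(coo))               # [(0, 0, 1), (0, 1, 2), (1, 0, 3)]
```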
Pearu Peterson	b797a24259	Support indices contiguity per batch and non-contiguous values in sparse compressed tensors (#91243 ) Fixes https://github.com/pytorch/pytorch/issues/91062 With this PR, all reported failures in https://github.com/pytorch/pytorch/pull/90849 are resolved (modulo test_bmm that uses an unorthodox way to construct a batch CSR tensor). Pull Request resolved: https://github.com/pytorch/pytorch/pull/91243 Approved by: https://github.com/nikitaved, https://github.com/amjames, https://github.com/lezcano	2023-01-02 18:08:46 +00:00
Nikita Vedeneev	1768a28a20	`COO @ COO`: fix to always produce coalesced outputs. (#91094 ) Fixes [#90516](https://github.com/pytorch/pytorch/issues/90516) Fixes [#90538](https://github.com/pytorch/pytorch/issues/90538) Pull Request resolved: https://github.com/pytorch/pytorch/pull/91094 Approved by: https://github.com/pearu	2022-12-27 21:32:14 +00:00
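A sparse matrix product naturally produces several partial products for the same output position, so the result has to be coalesced: duplicate indices summed and indices sorted in row-major order. The sketch below is a hypothetical pure-Python illustration of that requirement (entries are `((row, col), value)` pairs and both function names are made up); it is not the kernel changed by this commit.

```python
from collections import defaultdict


def coalesce(indices, values):
    """Hypothetical sketch of COO coalescing: sum duplicates, sort row-major."""
    acc = defaultdict(float)
    for idx, v in zip(indices, values):
        acc[idx] += v
    keys = sorted(acc)
    return keys, [acc[k] for k in keys]


def coo_matmul(a, b):
    """Hypothetical COO @ COO product; a and b are lists of ((row, col), value)."""
    partial = []
    for (i, k), av in a:
        for (k2, j), bv in b:
            if k == k2:
                # Several k may contribute to the same (i, j): duplicates appear here.
                partial.append(((i, j), av * bv))
    return coalesce([p[0] for p in partial], [p[1] for p in partial])


# Row vector [1, 2] times column vector [3, 4]^T gives a single entry 1*3 + 2*4.
print(coo_matmul([((0, 0), 1.0), ((0, 1), 2.0)], [((0, 0), 3.0), ((1, 0), 4.0)]))
# ([(0, 0)], [11.0])
```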
Pearu Peterson	8004f934cd	Fix CSR with int32 indices to CSC conversion (#91061 ) Fixes https://github.com/pytorch/pytorch/issues/91007 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91061 Approved by: https://github.com/nikitaved	2022-12-18 13:53:25 +00:00
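For context on the conversion fixed in the commit above, a CSR to CSC conversion can be written as a counting-sort transpose: count entries per column, prefix-sum the counts into `ccol_indices`, then scatter each entry into its column slot. The following is a hypothetical pure-Python sketch; it ignores index dtypes (the int32 detail of the fix) and does not reproduce PyTorch's kernel.

```python
def csr_to_csc(crow_indices, col_indices, values, ncols):
    """Hypothetical sketch of a CSR -> CSC conversion (counting-sort transpose)."""
    nnz = len(col_indices)
    # 1. Count entries per column, then prefix-sum into ccol_indices.
    ccol_indices = [0] * (ncols + 1)
    for c in col_indices:
        ccol_indices[c + 1] += 1
    for c in range(ncols):
        ccol_indices[c + 1] += ccol_indices[c]
    # 2. Scatter entries into their column slots. Visiting rows in increasing
    #    order keeps the row indices within each column sorted.
    next_slot = ccol_indices[:-1]  # slicing already makes a copy
    row_indices = [0] * nnz
    csc_values = [None] * nnz
    for row in range(len(crow_indices) - 1):
        for k in range(crow_indices[row], crow_indices[row + 1]):
            dst = next_slot[col_indices[k]]
            row_indices[dst] = row
            csc_values[dst] = values[k]
            next_slot[col_indices[k]] += 1
    return ccol_indices, row_indices, csc_values


# [[1, 2], [3, 0]] in CSR form.
print(csr_to_csc([0, 2, 3], [0, 1, 0], [1, 2, 3], ncols=2))
# ([0, 2, 3], [0, 1, 0], [1, 3, 2])
```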
Pearu Peterson	01e7f46215	Ensure sorted indices from the CSR->BSR conversion (#90918 ) Fixes https://github.com/pytorch/pytorch/issues/90910 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90918 Approved by: https://github.com/cpuhrsch	2022-12-16 15:49:48 +00:00
Edward Z. Yang	e686a442b4	If a torch.* returns non-Tensor, make this unimplemented rather than assert. (#89918 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89918 Approved by: https://github.com/albanD	2022-12-15 21:53:54 +00:00
Pearu Peterson	a60d712010	Support (non-batch) BSR/BSC to COO sparse tensor conversions (#90718 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90718 Approved by: https://github.com/cpuhrsch	2022-12-14 05:37:05 +00:00
Pearu Peterson	76c6dfeaa6	Add layout and blocksize arguments to Tensor.to_sparse method (#89502 ) This PR extends the `Tensor.to_sparse()` method to `Tensor.to_sparse(layout=None, blocksize=None)` in a BC manner (`layout=None` means `layout=torch.sparse_coo`).

In addition, the PR adds support for the following conversions:

- non-hybrid/hybrid COO tensor to CSR or CSC or a COO tensor
- short, bool, byte, char, bfloat16, int, long, half CSR tensor to a BSR tensor

and fixes the following conversions:

- hybrid COO to COO tensor
- non-batch/batch hybrid BSR to BSR or BSC tensor

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89502 Approved by: https://github.com/amjames, https://github.com/cpuhrsch	2022-11-30 20:21:10 +00:00
Pearu Peterson	296e1ba4d0	Row and column select support for block compressed sparse tensors (#88733 ) As in the title:

- Support `select` and `select_copy` on block sparse compressed tensors
- Fixes incorrect results when selecting dense dimensions

The PR also improves the performance of indexing sparse compressed tensors considerably:

<details>

Before:

```python
In [3]: a=torch.rand((1000, 1000)).to_sparse_csr()

In [4]: %timeit a.select(0, 0)
606 µs ± 4.27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [5]: %timeit a.select(1, 0)
527 µs ± 57.7 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit a[0, 0]
617 µs ± 3.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [7]: a = a.cuda()

In [8]: %timeit a.select(0, 0); torch.cuda.synchronize();
1.19 ms ± 137 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [9]: %timeit a.select(1, 0); torch.cuda.synchronize();
1.2 ms ± 119 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [10]: %timeit a[0, 0]; torch.cuda.synchronize();
1.23 ms ± 482 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

This PR:

```python
In [3]: a=torch.rand((1000, 1000)).to_sparse_csr()

In [4]: %timeit a.select(0, 0)
4.75 µs ± 8.94 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [5]: %timeit a.select(1, 0)
565 µs ± 156 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit a[0, 0]
13.1 µs ± 435 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [7]: a = a.cuda()

In [8]: %timeit a.select(0, 0); torch.cuda.synchronize();
21.6 µs ± 23.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [9]: %timeit a.select(1, 0); torch.cuda.synchronize();
1.15 ms ± 3.13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [10]: %timeit a[0, 0]; torch.cuda.synchronize();
63.7 µs ± 2.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88733 Approved by: https://github.com/nikitaved, https://github.com/amjames, https://github.com/cpuhrsch	2022-11-30 11:15:56 +00:00
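In the timings of the commit above, row selection (`select(0, 0)`) became much faster, while column selection (`select(1, 0)`) stayed roughly unchanged. One plausible reason for this asymmetry is the CSR layout itself. In the hypothetical pure-Python sketch below (function names are made up, and this is not PyTorch's kernel), a row is a contiguous slice found through two `crow_indices` lookups, whereas a column has to be searched for in every row.

```python
def csr_select_row(crow_indices, col_indices, values, row):
    """Cost O(nnz in the row): the row's entries form one contiguous slice."""
    start, end = crow_indices[row], crow_indices[row + 1]
    return list(zip(col_indices[start:end], values[start:end]))


def csr_select_col(crow_indices, col_indices, values, col):
    """Cost O(total nnz) here: every row is scanned for the column index.

    A linear scan keeps the sketch simple; since column indices are sorted
    within each row, a per-row binary search would also work.
    """
    out = []
    for row in range(len(crow_indices) - 1):
        for k in range(crow_indices[row], crow_indices[row + 1]):
            if col_indices[k] == col:
                out.append((row, values[k]))
    return out


# [[1, 2], [3, 0]] in CSR form.
print(csr_select_row([0, 2, 3], [0, 1, 0], [1, 2, 3], 0))  # [(0, 1), (1, 2)]  (col, value)
print(csr_select_col([0, 2, 3], [0, 1, 0], [1, 2, 3], 0))  # [(0, 1), (1, 3)]  (row, value)
```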
Pearu Peterson	90bed8874f	Generator of tensor inputs with variable layout and structure (batch/non-batch, hybrid/non-hybrid, block/non-block) (#88914 ) This PR introduces `TestCase.generate_simple_inputs` method that is an improved and generalized version of the `TestSparseCompressed._generate_small_inputs` method. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88914 Approved by: https://github.com/cpuhrsch	2022-11-30 02:13:33 +00:00
Kazuaki Ishizaki	088f2fa567	Fix typos in messages under test (#89121 ) This PR fixes typos of messages in `.cpp` and `.py` files under test directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89121 Approved by: https://github.com/mruberry, https://github.com/kit1980	2022-11-17 01:55:03 +00:00
Andrew M. James	ff6770a9a1	enable backward for log1p (sparse layouts) (#88155 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88155 Approved by: https://github.com/cpuhrsch	2022-11-04 20:59:26 +00:00
jpvillam	1e1b045128	[ROCM] Enable Sparse Pickle Test (#82729 ) Missed stream context for serialization

### Description

Missing ROCm stream context on memory operations for serialization

### Testing

Ran the sparse pickle test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82729 Approved by: https://github.com/ngimel	2022-10-27 15:11:28 +00:00
Pearu Peterson	88b882cd1c	Support sum on a sparse COO tensor. (#86300 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86300 Approved by: https://github.com/cpuhrsch	2022-10-06 18:39:28 +00:00
George Qi	686555b663	[maskedtensor] port torch/_masked into torch/masked (#85515 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85515 Approved by: https://github.com/cpuhrsch	2022-09-26 23:41:13 +00:00
Elias Ellison	bcc544e9d7	Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417 Approved by: https://github.com/ezyang	2022-09-26 17:08:14 +00:00
nikitaved	12ae3bea43	Faster mul(sparse, sparse) with broadcasting in dense dims. (#85336 ) This is a combo PR of https://github.com/pytorch/pytorch/pull/84929 and ~https://github.com/pytorch/pytorch/pull/83428~. Preliminary benchmarks (square matrices of shape (n, n)). <details> <summary>Script</summary> ```python import torch import math from IPython import get_ipython from itertools import product, repeat import pickle from torch.utils.benchmark import Timer, Compare torch.manual_seed(13) problem_dims = ( # n > nnz (10000, 100), (100000, 1000), (1000000, 10000), # n < nnz (10, 100), (10, 1000), (10, 10000), (100, 1000), (100, 10000), (1000, 10000), (1000, 100000), (1000, 1000000), #(1000000, 1000000000), ) name = "PR" device = "cuda" results = [] for n, nnz in problem_dims: def gen_tensor(coalesce=False): shape = (n, n) nrows, ncols = shape rowidx = torch.randint(low=0, high=nrows, size=(nnz,), device=device) colidx = torch.randint(low=0, high=ncols, size=(nnz,), device=device) itemidx = torch.vstack((rowidx, colidx)) xvalues = torch.randn(nnz, device=device) itemidx = torch.hstack((itemidx, itemidx)) xvalues = torch.hstack((xvalues, xvalues)) res = torch.sparse_coo_tensor(itemidx, xvalues, size=shape) if coalesce: return res.coalesce() else: return res for x_coalesce, y_coalesce in product(repeat((True, False), 2)): x = gen_tensor(x_coalesce) y = gen_tensor(y_coalesce) smtp = "x y" timer = Timer(smtp, globals=globals(), label="coo.mul", description=f"{name}: mul, device: {device}", sub_label=f"n={n}, nnz={nnz}, coalesce=({x_coalesce, y_coalesce})", num_threads=torch.get_num_threads()) results.append(timer.blocked_autorange()) compare = Compare(results) compare.trim_significant_figures() compare.print() with open(f"{name}_{device}_mul.pickle", 'wb') as f: pickle.dump(results, f) ``` </details> <details> <summary>Gather results</summary> ```python import pickle from torch.utils.benchmark import Timer, Compare files = [ "PR", "master" ] device = 'cuda' timers = [] 
for name in files:
    with open("{}_{}_mul.pickle".format(name, device), 'rb') as f:
        timers += pickle.load(f)

compare = Compare(timers)
compare.trim_significant_figures()
compare.print()
```
</details>
<details>
<summary>CUDA</summary>

```
[------------------------------------------------- coo.mul -------------------------------------------------]
| PR: mul, device: cuda | master: mul, device: cuda
24 threads: -------------------------------------------------------------------------------------------------
n=10000, nnz=100, coalesce=((True, True)) | 95 | 91
n=10000, nnz=100, coalesce=((True, False)) | 87 | 242
n=10000, nnz=100, coalesce=((False, True)) | 87 | 226
n=10000, nnz=100, coalesce=((False, False)) | 130 | 371
n=100000, nnz=1000, coalesce=((True, True)) | 100 | 521
n=100000, nnz=1000, coalesce=((True, False)) | 90 | 649
n=100000, nnz=1000, coalesce=((False, True)) | 100 | 659
n=100000, nnz=1000, coalesce=((False, False)) | 200 | 781
n=1000000, nnz=10000, coalesce=((True, True)) | 100 | 4861
n=1000000, nnz=10000, coalesce=((True, False)) | 100 | 5012
n=1000000, nnz=10000, coalesce=((False, True)) | 98 | 5010
n=1000000, nnz=10000, coalesce=((False, False)) | 384 | 5174
n=10, nnz=100, coalesce=((True, True)) | 100 | 79
n=10, nnz=100, coalesce=((True, False)) | 100 | 221
n=10, nnz=100, coalesce=((False, True)) | 100 | 221
n=10, nnz=100, coalesce=((False, False)) | 100 | 350
n=10, nnz=1000, coalesce=((True, True)) | 100 | 100
n=10, nnz=1000, coalesce=((True, False)) | 100 | 240
n=10, nnz=1000, coalesce=((False, True)) | 100 | 254
n=10, nnz=1000, coalesce=((False, False)) | 100 | 392
n=10, nnz=10000, coalesce=((True, True)) | 100 | 110
n=10, nnz=10000, coalesce=((True, False)) | 110 | 286
n=10, nnz=10000, coalesce=((False, True)) | 110 | 286
n=10, nnz=10000, coalesce=((False, False)) | 271 | 455
n=100, nnz=1000, coalesce=((True, True)) | 110 | 851
n=100, nnz=1000, coalesce=((True, False)) | 110 | 1000
n=100, nnz=1000, coalesce=((False, True)) | 110 | 990
n=100, nnz=1000, coalesce=((False, False)) | 140 | 1124
n=100, nnz=10000, coalesce=((True, True)) | 110 | 5137
n=100, nnz=10000, coalesce=((True, False)) | 110 | 5391
n=100, nnz=10000, coalesce=((False, True)) | 100 | 5405
n=100, nnz=10000, coalesce=((False, False)) | 249 | 5539
n=1000, nnz=10000, coalesce=((True, True)) | 100 | 8598
n=1000, nnz=10000, coalesce=((True, False)) | 100 | 8800
n=1000, nnz=10000, coalesce=((False, True)) | 100 | 8782
n=1000, nnz=10000, coalesce=((False, False)) | 255 | 8956
n=1000, nnz=100000, coalesce=((True, True)) | 120 | 84500
n=1000, nnz=100000, coalesce=((True, False)) | 200 | 88560
n=1000, nnz=100000, coalesce=((False, True)) | 160 | 89000
n=1000, nnz=100000, coalesce=((False, False)) | 373 | 89000
n=1000, nnz=1000000, coalesce=((True, True)) | 312 | 606400
n=1000, nnz=1000000, coalesce=((True, False)) | 1340 | 609200
n=1000, nnz=1000000, coalesce=((False, True)) | 1340 | 609100
n=1000, nnz=1000000, coalesce=((False, False)) | 4408 | 611400

Times are in microseconds (us).
```
</details>
<details>
<summary>CPU</summary>

```
[------------------------------------------------ coo.mul ------------------------------------------------]
| PR: mul, device: cpu | master: mul, device: cpu
24 threads: -----------------------------------------------------------------------------------------------
n=10000, nnz=100, coalesce=((True, True)) | 8 | 8
n=10000, nnz=100, coalesce=((True, False)) | 32 | 34
n=10000, nnz=100, coalesce=((False, True)) | 32 | 34
n=10000, nnz=100, coalesce=((False, False)) | 41 | 56
n=100000, nnz=1000, coalesce=((True, True)) | 24 | 24
n=100000, nnz=1000, coalesce=((True, False)) | 90 | 100
n=100000, nnz=1000, coalesce=((False, True)) | 87 | 100
n=100000, nnz=1000, coalesce=((False, False)) | 231 | 255
n=1000000, nnz=10000, coalesce=((True, True)) | 190 | 200
n=1000000, nnz=10000, coalesce=((True, False)) | 908 | 2023
n=1000000, nnz=10000, coalesce=((False, True)) | 800 | 2036
n=1000000, nnz=10000, coalesce=((False, False)) | 3684 | 3989
n=10, nnz=100, coalesce=((True, True)) | 8 | 7
n=10, nnz=100, coalesce=((True, False)) | 34 | 30
n=10, nnz=100, coalesce=((False, True)) | 33 | 30
n=10, nnz=100, coalesce=((False, False)) | 44 | 50
n=10, nnz=1000, coalesce=((True, True)) | 8 | 7
n=10, nnz=1000, coalesce=((True, False)) | 100 | 100
n=10, nnz=1000, coalesce=((False, True)) | 130 | 100
n=10, nnz=1000, coalesce=((False, False)) | 746 | 210
n=10, nnz=10000, coalesce=((True, True)) | 8 | 7
n=10, nnz=10000, coalesce=((True, False)) | 1000 | 1500
n=10, nnz=10000, coalesce=((False, True)) | 1000 | 1510
n=10, nnz=10000, coalesce=((False, False)) | 3063 | 2457
n=100, nnz=1000, coalesce=((True, True)) | 25 | 25
n=100, nnz=1000, coalesce=((True, False)) | 180 | 130
n=100, nnz=1000, coalesce=((False, True)) | 200 | 130
n=100, nnz=1000, coalesce=((False, False)) | 271 | 255
n=100, nnz=10000, coalesce=((True, True)) | 100 | 100
n=100, nnz=10000, coalesce=((True, False)) | 2444 | 2290
n=100, nnz=10000, coalesce=((False, True)) | 2455 | 2357
n=100, nnz=10000, coalesce=((False, False)) | 5316 | 3783
n=1000, nnz=10000, coalesce=((True, True)) | 204 | 211
n=1000, nnz=10000, coalesce=((True, False)) | 2457 | 2480
n=1000, nnz=10000, coalesce=((False, True)) | 2448 | 2539
n=1000, nnz=10000, coalesce=((False, False)) | 3665 | 4801
n=1000, nnz=100000, coalesce=((True, True)) | 2293 | 2374
n=1000, nnz=100000, coalesce=((True, False)) | 9000 | 24620
n=1000, nnz=100000, coalesce=((False, True)) | 8000 | 25080
n=1000, nnz=100000, coalesce=((False, False)) | 26500 | 47650
n=1000, nnz=1000000, coalesce=((True, True)) | 10000 | 13000
n=1000, nnz=1000000, coalesce=((True, False)) | 80000 | 362200
n=1000, nnz=1000000, coalesce=((False, True)) | 78050 | 392600
n=1000, nnz=1000000, coalesce=((False, False)) | 312100 | 766900

Times are in microseconds (us).
```
</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85336 Approved by: https://github.com/cpuhrsch	2022-09-23 23:31:19 +00:00
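When reading these tables it can help to turn each row into a master/PR time ratio. The sketch below hard-codes a few rows copied from the CUDA table of this benchmark (times in microseconds); `rows` and `speedups` are hypothetical names used only for illustration.

```python
# A few (sub_label, PR time, master time) rows copied from the CUDA table, in us.
rows = [
    ("n=10000, nnz=100, coalesce=((True, True))", 95, 91),
    ("n=1000000, nnz=10000, coalesce=((True, True))", 100, 4861),
    ("n=1000, nnz=1000000, coalesce=((True, True))", 312, 606400),
    ("n=1000, nnz=1000000, coalesce=((False, False))", 4408, 611400),
]


def speedups(rows):
    """Return {label: master_time / pr_time}; a value above 1 means the PR is faster."""
    return {label: master / pr for label, pr, master in rows}


for label, ratio in speedups(rows).items():
    print(f"{label}: {ratio:.1f}x")
```

A ratio slightly below 1, as in the first row, only says the PR was not faster for that small coalesced case; such small differences may well be within measurement noise.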
PyTorch MergeBot	d10de31cc8	Revert "Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417 )" This reverts commit `78afa0cf0c`. Reverted https://github.com/pytorch/pytorch/pull/85417 on behalf of https://github.com/clee2000 due to broke tests on trunk `78afa0cf0c`	2022-09-23 17:21:43 +00:00
Elias Ellison	78afa0cf0c	Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417 Approved by: https://github.com/ezyang	2022-09-23 15:50:03 +00:00
PyTorch MergeBot	5043457a8e	Revert "Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417 )" This reverts commit `9c77083965`. Reverted https://github.com/pytorch/pytorch/pull/85417 on behalf of https://github.com/clee2000 due to broke tests on trunk (and pull somehow) `9c77083965`	2022-09-22 15:44:38 +00:00
Elias Ellison	9c77083965	Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417 Approved by: https://github.com/ezyang	2022-09-22 13:03:57 +00:00
Elias Ellison	d9aa6dfe88	Add Fake Cross Ref Mode, migrate sparse to it (#85382 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85382 Approved by: https://github.com/ezyang	2022-09-21 17:15:47 +00:00
PyTorch MergeBot	81620c3360	Revert "Faster mul(sparse, sparse) with broadcasting in dense dims. (#83428 )" This reverts commit `d49943bda8`. Reverted https://github.com/pytorch/pytorch/pull/83428 on behalf of https://github.com/osalpekar due to Reverted because __restrict symbol not supported by certain MSVC compilers, leading to undefined symbol error at compilation time	2022-09-17 06:53:11 +00:00
nikitaved	d49943bda8	Faster mul(sparse, sparse) with broadcasting in dense dims. (#83428 ) Preliminary benchmarks (square matrices of shape (n, n)).
<details>
<summary>Script</summary>

```python
import pickle
from itertools import product, repeat

import torch
from torch.utils.benchmark import Timer, Compare

torch.manual_seed(13)

# specifies (n, nnz)
problem_dims = (
    # n > nnz
    (10000, 100),
    (100000, 1000),
    (1000000, 10000),
    # n < nnz
    (10, 100),
    (10, 1000),
    (10, 10000),
    (100, 1000),
    (100, 10000),
    (1000, 10000),
    (1000, 100000),
    (1000, 1000000),
    # (1000000, 1000000000),
)

name = "PR"
device = "cuda"
results = []

for n, nnz in problem_dims:
    def gen_tensor(coalesce=False):
        shape = (n, n)
        nrows, ncols = shape
        rowidx = torch.randint(low=0, high=nrows, size=(nnz,), device=device)
        colidx = torch.randint(low=0, high=ncols, size=(nnz,), device=device)
        itemidx = torch.vstack((rowidx, colidx))
        xvalues = torch.randn(nnz, device=device)
        # Duplicate every entry so that the uncoalesced case has work to do.
        itemidx = torch.hstack((itemidx, itemidx))
        xvalues = torch.hstack((xvalues, xvalues))
        res = torch.sparse_coo_tensor(itemidx, xvalues, size=shape)
        if coalesce:
            return res.coalesce()
        else:
            return res

    # All four (x_coalesce, y_coalesce) combinations.
    for x_coalesce, y_coalesce in product(*repeat((True, False), 2)):
        x = gen_tensor(x_coalesce)
        y = gen_tensor(y_coalesce)
        smtp = "x * y"
        timer = Timer(smtp,
                      globals=globals(),
                      label="coo.mul",
                      description=f"{name}: mul, device: {device}",
                      sub_label=f"n={n}, nnz={nnz}, coalesce=({x_coalesce, y_coalesce})",
                      num_threads=torch.get_num_threads())
        results.append(timer.blocked_autorange())

compare = Compare(results)
compare.trim_significant_figures()
compare.print()

with open(f"{name}_{device}_mul.pickle", 'wb') as f:
    pickle.dump(results, f)
```
</details>
<details>
<summary>Gather results</summary>

```python
import pickle
from torch.utils.benchmark import Timer, Compare

files = [
    "PR",
    "master"
]
device = 'cuda'

timers = []
for name in files:
    with open("{}_{}_mul.pickle".format(name, device), 'rb') as f:
        timers += pickle.load(f)

compare = Compare(timers)
compare.trim_significant_figures()
compare.print()
```
</details>
<details>
<summary>CUDA</summary>

```
[------------------------------------------------- coo.mul -------------------------------------------------]
| PR: mul, device: cuda | master: mul, device: cuda
24 threads: -------------------------------------------------------------------------------------------------
n=10000, nnz=100, coalesce=((True, True)) | 95 | 91
n=10000, nnz=100, coalesce=((True, False)) | 87 | 242
n=10000, nnz=100, coalesce=((False, True)) | 87 | 226
n=10000, nnz=100, coalesce=((False, False)) | 130 | 371
n=100000, nnz=1000, coalesce=((True, True)) | 100 | 521
n=100000, nnz=1000, coalesce=((True, False)) | 90 | 649
n=100000, nnz=1000, coalesce=((False, True)) | 100 | 659
n=100000, nnz=1000, coalesce=((False, False)) | 200 | 781
n=1000000, nnz=10000, coalesce=((True, True)) | 100 | 4861
n=1000000, nnz=10000, coalesce=((True, False)) | 100 | 5012
n=1000000, nnz=10000, coalesce=((False, True)) | 98 | 5010
n=1000000, nnz=10000, coalesce=((False, False)) | 384 | 5174
n=10, nnz=100, coalesce=((True, True)) | 100 | 79
n=10, nnz=100, coalesce=((True, False)) | 100 | 221
n=10, nnz=100, coalesce=((False, True)) | 100 | 221
n=10, nnz=100, coalesce=((False, False)) | 100 | 350
n=10, nnz=1000, coalesce=((True, True)) | 100 | 100
n=10, nnz=1000, coalesce=((True, False)) | 100 | 240
n=10, nnz=1000, coalesce=((False, True)) | 100 | 254
n=10, nnz=1000, coalesce=((False, False)) | 100 | 392
n=10, nnz=10000, coalesce=((True, True)) | 100 | 110
n=10, nnz=10000, coalesce=((True, False)) | 110 | 286
n=10, nnz=10000, coalesce=((False, True)) | 110 | 286
n=10, nnz=10000, coalesce=((False, False)) | 271 | 455
n=100, nnz=1000, coalesce=((True, True)) | 110 | 851
n=100, nnz=1000, coalesce=((True, False)) | 110 | 1000
n=100, nnz=1000, coalesce=((False, True)) | 110 | 990
n=100, nnz=1000, coalesce=((False, False)) | 140 | 1124
n=100, nnz=10000, coalesce=((True, True)) | 110 | 5137
n=100, nnz=10000, coalesce=((True, False)) | 110 | 5391
n=100, nnz=10000, coalesce=((False, True)) | 100 | 5405
n=100, nnz=10000, coalesce=((False, False)) | 249 | 5539
n=1000, nnz=10000, coalesce=((True, True)) | 100 | 8598
n=1000, nnz=10000, coalesce=((True, False)) | 100 | 8800
n=1000, nnz=10000, coalesce=((False, True)) | 100 | 8782
n=1000, nnz=10000, coalesce=((False, False)) | 255 | 8956
n=1000, nnz=100000, coalesce=((True, True)) | 120 | 84500
n=1000, nnz=100000, coalesce=((True, False)) | 200 | 88560
n=1000, nnz=100000, coalesce=((False, True)) | 160 | 89000
n=1000, nnz=100000, coalesce=((False, False)) | 373 | 89000
n=1000, nnz=1000000, coalesce=((True, True)) | 312 | 606400
n=1000, nnz=1000000, coalesce=((True, False)) | 1340 | 609200
n=1000, nnz=1000000, coalesce=((False, True)) | 1340 | 609100
n=1000, nnz=1000000, coalesce=((False, False)) | 4408 | 611400

Times are in microseconds (us).
```
</details>
<details>
<summary>CPU</summary>

```
[------------------------------------------------ coo.mul ------------------------------------------------]
| PR: mul, device: cpu | master: mul, device: cpu
24 threads: -----------------------------------------------------------------------------------------------
n=10000, nnz=100, coalesce=((True, True)) | 8 | 8
n=10000, nnz=100, coalesce=((True, False)) | 32 | 34
n=10000, nnz=100, coalesce=((False, True)) | 32 | 34
n=10000, nnz=100, coalesce=((False, False)) | 41 | 56
n=100000, nnz=1000, coalesce=((True, True)) | 24 | 24
n=100000, nnz=1000, coalesce=((True, False)) | 90 | 100
n=100000, nnz=1000, coalesce=((False, True)) | 87 | 100
n=100000, nnz=1000, coalesce=((False, False)) | 231 | 255
n=1000000, nnz=10000, coalesce=((True, True)) | 190 | 200
n=1000000, nnz=10000, coalesce=((True, False)) | 908 | 2023
n=1000000, nnz=10000, coalesce=((False, True)) | 800 | 2036
n=1000000, nnz=10000, coalesce=((False, False)) | 3684 | 3989
n=10, nnz=100, coalesce=((True, True)) | 8 | 7
n=10, nnz=100, coalesce=((True, False)) | 34 | 30
n=10, nnz=100, coalesce=((False, True)) | 33 | 30
n=10, nnz=100, coalesce=((False, False)) | 44 | 50
n=10, nnz=1000, coalesce=((True, True)) | 8 | 7
n=10, nnz=1000, coalesce=((True, False)) | 100 | 100
n=10, nnz=1000, coalesce=((False, True)) | 130 | 100
n=10, nnz=1000, coalesce=((False, False)) | 746 | 210
n=10, nnz=10000, coalesce=((True, True)) | 8 | 7
n=10, nnz=10000, coalesce=((True, False)) | 1000 | 1500
n=10, nnz=10000, coalesce=((False, True)) | 1000 | 1510
n=10, nnz=10000, coalesce=((False, False)) | 3063 | 2457
n=100, nnz=1000, coalesce=((True, True)) | 25 | 25
n=100, nnz=1000, coalesce=((True, False)) | 180 | 130
n=100, nnz=1000, coalesce=((False, True)) | 200 | 130
n=100, nnz=1000, coalesce=((False, False)) | 271 | 255
n=100, nnz=10000, coalesce=((True, True)) | 100 | 100
n=100, nnz=10000, coalesce=((True, False)) | 2444 | 2290
n=100, nnz=10000, coalesce=((False, True)) | 2455 | 2357
n=100, nnz=10000, coalesce=((False, False)) | 5316 | 3783
n=1000, nnz=10000, coalesce=((True, True)) | 204 | 211
n=1000, nnz=10000, coalesce=((True, False)) | 2457 | 2480
n=1000, nnz=10000, coalesce=((False, True)) | 2448 | 2539
n=1000, nnz=10000, coalesce=((False, False)) | 3665 | 4801
n=1000, nnz=100000, coalesce=((True, True)) | 2293 | 2374
n=1000, nnz=100000, coalesce=((True, False)) | 9000 | 24620
n=1000, nnz=100000, coalesce=((False, True)) | 8000 | 25080
n=1000, nnz=100000, coalesce=((False, False)) | 26500 | 47650
n=1000, nnz=1000000, coalesce=((True, True)) | 10000 | 13000
n=1000, nnz=1000000, coalesce=((True, False)) | 80000 | 362200
n=1000, nnz=1000000, coalesce=((False, True)) | 78050 | 392600
n=1000, nnz=1000000, coalesce=((False, False)) | 312100 | 766900

Times are in microseconds (us).
```
</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83428 Approved by: https://github.com/cpuhrsch	2022-09-16 00:28:40 +00:00
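The benchmark builds uncoalesced inputs by concatenating each index/value set with itself, so every coordinate is stored twice. The semantics that `mul(sparse, sparse)` has to respect can be sketched as a plain-Python reference model: duplicates are summed first, and only coordinates present in both operands can be nonzero in the product. This is not PyTorch's kernel, which works on index/value tensors; `coalesce` and `sparse_mul` here are hypothetical helpers over lists of coordinate tuples.

```python
from collections import defaultdict


def coalesce(indices, values):
    """Sum values at duplicate coordinates and sort coordinates lexicographically."""
    acc = defaultdict(float)
    for idx, v in zip(indices, values):
        acc[idx] += v
    coords = sorted(acc)
    return coords, [acc[c] for c in coords]


def sparse_mul(a, b):
    """Elementwise product of two COO operands given as (indices, values) pairs.

    Both sides are coalesced first: multiplying duplicates pairwise would be
    wrong, because each stored coordinate's value is the sum of its duplicates.
    """
    a_idx, a_val = coalesce(*a)
    b_map = dict(zip(*coalesce(*b)))
    out_idx, out_val = [], []
    for idx, v in zip(a_idx, a_val):
        if idx in b_map:  # intersection of the two sparsity patterns
            out_idx.append(idx)
            out_val.append(v * b_map[idx])
    return out_idx, out_val


# Like the benchmark's gen_tensor, every entry of x appears twice.
x = ([(0, 1), (2, 2), (0, 1), (2, 2)], [1.0, 3.0, 1.0, 3.0])
y = ([(0, 1), (1, 0)], [5.0, 7.0])
print(sparse_mul(x, y))  # ([(0, 1)], [10.0])
```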
Edward Z. Yang	c5a8946e40	Revert "Revert "Redo how custom/python_custom methods on TensorImpl work (#84796 )" (#84806 ) This reverts commit `ca3b2bfbe3`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84806 Approved by: https://github.com/Chillee	2022-09-10 06:17:35 +00:00
Eli Uriegas	ca3b2bfbe3	Revert "Redo how custom/python_custom methods on TensorImpl work (#84796 ) This reverts commit `591b75bf98`. Manual revert of https://github.com/pytorch/pytorch/pull/84641 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84796 Approved by: https://github.com/izaitsevfb	2022-09-10 00:18:13 +00:00
Edward Z. Yang	591b75bf98	Redo how custom/python_custom methods on TensorImpl work (#84641 ) A longstanding confusion in the implementation of fake tensor and proxy tensor is what to do about torch.ops.aten.sym_sizes and related calls. In particular, when you have a tensor that (1) has symbolic shapes and (2) has a `__torch_dispatch__` call, previously you would always get `__torch_dispatch__` calls for sizes/strides queries, even if you didn't request them via the dispatch kwargs in `make_wrapper_subclass`. This is because we were previously mixing several concepts: "I want to dispatch to Python", "I want to call a virtual method" and "I have dynamic shapes". A single boolean variable controlled all of these things, so it was not possible to tell inside TensorImpl what the user had originally requested. In this PR, we track each of these concepts individually so that we can preserve user intent. Then, we combine them into a single "policy" variable that controls whether or not we can use the fastpath. For the policy to trigger, only one of the exceptional cases needs to be true. Billing of changes:
* Rename `set_sizes_strides_policy` to `set_custom_sizes_strides`; in general, you cannot DIRECTLY set the policy; you have to set it indirectly through the public functions.
* Some helpers for sizes and strides, since they are more complicated (the policy is an enum, rather than just bools as is the case for device and layout). `matches_python_custom` is used to test the Python dispatch user ask. `matches_policy` does the policy test (only used in the user-facing functions).
* I reorganized the accessor methods so that they are more logical. This makes the diff bad, so I recommend reading the final code directly.
* The default custom implementations now more reliably call their default() implementations.
* As a bonus refactor, I devirtualized some functions that don't need to be virtual.
* `set_sym_sizes_and_strides` is renamed to `set_sizes_and_strides` to make it easier to use in template contexts; it now optionally takes a storage offset, so you can set all three values at the same time. If you use the SymInt overload but there are no symbolic integers, we give you a normal resize.
* This adds `sym_storage_offset`, since we had that in the symbolic shapes branch and there's no reason not to put it in (and it reduces merge conflicts).

Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/84641 Approved by: https://github.com/wconstab	2022-09-09 13:41:13 +00:00
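The idea of recording each request separately and deriving one policy from them can be modeled in a few lines. This is a toy sketch under stated assumptions: the three ordered levels echo the enum-valued sizes/strides policy described above, but `SizesStridesPolicy`, `combined_policy`, `needs_slow_path` and the `max` combination rule are illustrative choices, not quoted from `TensorImpl`.

```python
from enum import IntEnum


class SizesStridesPolicy(IntEnum):
    """Toy ordered policy: a larger value means more metadata queries are customized."""
    DEFAULT = 0         # fast path: sizes and strides are read directly
    CUSTOM_STRIDES = 1  # stride-related queries take the slow path
    CUSTOM_SIZES = 2    # size and stride queries both take the slow path


def combined_policy(custom, python_custom):
    """Hypothetical rule: keep both requests and let the stricter one win.

    `custom` models a C++ subclass overriding virtual accessors; `python_custom`
    models a Python subclass asking for __torch_dispatch__ on size/stride
    queries. Either request alone is enough to leave the fast path.
    """
    return max(custom, python_custom)


def needs_slow_path(policy, query_level):
    """True if a query at `query_level` cannot use the fast path under `policy`."""
    return policy >= query_level


# A Python subclass that only customized strides: size queries stay fast.
p = combined_policy(SizesStridesPolicy.DEFAULT, SizesStridesPolicy.CUSTOM_STRIDES)
print(p.name)  # CUSTOM_STRIDES
print(needs_slow_path(p, SizesStridesPolicy.CUSTOM_SIZES))    # False
print(needs_slow_path(p, SizesStridesPolicy.CUSTOM_STRIDES))  # True
```

Because the user's individual requests are stored separately, a query such as "did the Python subclass ask for custom sizes?" can still be answered even after the policies are merged.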
Elias Ellison	15c5baf878	Throw on data dependent ops (#83567 ) Previously, we would trace through the following with no error, even though the output shape depends on the data of `y`:
```
from torch.fx.experimental.proxy_tensor import make_fx
import torch

def f(x, y):
    return x[0, y:]
```
Now, we throw on the conversion of `y` to an integer. It would be nice not to break on constant tensors, but I'll do that in the next PR (Edit: done with https://github.com/pytorch/pytorch/pull/84387). Sketching out how that would work (and keep in mind this is applicable to Dynamo tracing and not just AOT Autograd), I think you would need to:
- hold strong refs to a set of constant tensors, and only allow them to be captured from `lift_fresh.copy`
- when you run a mutable op, either remove it from the set of constant tensors or run the operator for real
- limit to small constant tensors

Anything else? Pull Request resolved: https://github.com/pytorch/pytorch/pull/83567 Approved by: https://github.com/ezyang	2022-09-07 02:37:00 +00:00
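The failure mode can be mimicked without PyTorch. In the toy model below, a "traced" scalar refuses conversion to an integer, so any slice bound that depends on it fails loudly instead of silently baking in one concrete value. `TracedScalar` and `DataDependentError` are hypothetical names; in PyTorch the error comes from the proxy/fake tensor machinery, not from code like this.

```python
class DataDependentError(RuntimeError):
    """Raised when tracing would need the concrete value of traced data."""


class TracedScalar:
    """Hypothetical stand-in for a traced 0-dim tensor whose value is unknown."""

    def __index__(self):
        # Slicing needs a concrete int, so the output shape would depend on data.
        raise DataDependentError("output shape depends on the value of a traced tensor")

    __int__ = __index__


def f(x, y):
    return x[0][y:]  # nested-list analogue of the tensor expression x[0, y:]


print(f([[1, 2, 3]], 1))  # [2, 3]: y is concrete, so the shape is known
try:
    f([[1, 2, 3]], TracedScalar())
except DataDependentError as e:
    print("refused to trace:", e)
```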
Andrew M. James	6dc9223c8b	Sparse_coo: Be more aggressive in setting coalesced True to avoid surprising behaviors (#82426 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82426 Approved by: https://github.com/pearu, https://github.com/bhosmer	2022-09-01 17:46:51 +00:00
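A COO tensor is coalesced when its coordinates are sorted lexicographically and contain no duplicates. The commit title does not spell out which cases the PR covers, so the plain-Python sketch below only shows the invariant itself; `is_coalesced` is a hypothetical helper over coordinate tuples, and the remark about `nnz <= 1` is an observation about the invariant, not a description of the PR's exact rules.

```python
def is_coalesced(indices):
    """Return True if COO coordinates are strictly increasing (sorted, no duplicates).

    `indices` is a list of coordinate tuples; tuple comparison is lexicographic.
    With zero or one entry the condition holds trivially, so such a tensor can
    be marked coalesced without doing any work.
    """
    return all(a < b for a, b in zip(indices, indices[1:]))


print(is_coalesced([]))                  # True
print(is_coalesced([(3, 1)]))            # True
print(is_coalesced([(0, 1), (1, 0)]))    # True
print(is_coalesced([(1, 0), (0, 1)]))    # False: not sorted
print(is_coalesced([(0, 1), (0, 1)]))    # False: duplicate coordinate
```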
jpvillam	247468baf0	[ROCm] More Sparse UTs enablement and more hipification mappings. (#78939 ) Enables:
- test_bmm_cuda_float64
- test_bmm_deterministic_cuda_float64
- test_csr_matvec_cuda_complex128
- test_csr_matvec_cuda_complex64
- test_csr_matvec_cuda_float32
- test_csr_matvec_cuda_float64

To enable the above tests, some more HIP mappings had to be added for the hipification process. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78939 Approved by: https://github.com/pruthvistony, https://github.com/malfet	2022-08-23 13:54:09 +00:00
Brian Hirsh	0c24af4985	Always allow tensor metadata changes (#83590 ) Make it so that it is valid to set metadata after detach calls, like `x.detach().resize_(...)`. This technically lifts some restrictions around `.data`. This PR means that you can now technically call `x.data.resize_(...)`, which can now directly resize `x` instead of erroring. My understanding: Before the tensor-variable merge, when `x` and `x.data` were really different tensors, you could resize `x.data` independently of `x`, and during the merge, this error was added to avoid silent confusing behavior changes. It was agreed that this error has been around long enough (several years) that it's acceptable to drop. cc @albanD @ezyang. (Ed already had a prototype PR [here](https://github.com/pytorch/pytorch/pull/83545) - I ended up making one to try to slog through test failures). Pull Request resolved: https://github.com/pytorch/pytorch/pull/83590 Approved by: https://github.com/ezyang	2022-08-19 23:30:43 +00:00
nikitaved	b60dc2eb43	`mul`: sparse-dense + sparse-sparse with 0-dims support take 2. (#82962 ) This one is a copy of https://github.com/pytorch/pytorch/pull/81556 and https://github.com/pytorch/pytorch/pull/82717, which got reverted due to issues with torchvision. CC @kit1980, could you please take over from here? Pull Request resolved: https://github.com/pytorch/pytorch/pull/82962 Approved by: https://github.com/kit1980	2022-08-11 23:34:58 +00:00
PyTorch MergeBot	45291c7ec8	Revert "Implement `mul(dense, sparse), mul(sparse, dense)` for sparse COO tensors. (#81556 )" This reverts commit `edd2f6daa7`. Reverted https://github.com/pytorch/pytorch/pull/81556 on behalf of https://github.com/kit1980 due to Broken internal test, S286911	2022-08-05 19:39:01 +00:00
PyTorch MergeBot	796fba02fe	Revert "Implement and extend `mul(sparse, sparse)` to work with 0-dim arguments on either side. (#82717 )" This reverts commit `3ab54b971f`. Reverted https://github.com/pytorch/pytorch/pull/82717 on behalf of https://github.com/kit1980 due to Broken internal test, S286911	2022-08-05 19:35:35 +00:00
Nikita Vedeneev	3ab54b971f	Implement and extend `mul(sparse, sparse)` to work with 0-dim arguments on either side. (#82717 ) Extends https://github.com/pytorch/pytorch/pull/81556 by bringing in some missing functionality implemented in master. It also improves on master by allowing arbitrary 0-dim arguments, coalesced or not, on either side of the operation. Master, for example, would fail on 0-dim non-coalesced inputs. CC @datumbox, @osalpekar. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82717 Approved by: https://github.com/amjames, https://github.com/bhosmer	2022-08-04 17:46:23 +00:00
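The non-coalesced 0-dim case is easy to get wrong. A 0-dim COO tensor has no index dimensions, so all of its stored values refer to the same (empty) coordinate, and its value is their sum. The plain-Python sketch below models that reasoning; `zero_dim_value` and `mul_by_0dim` are hypothetical helpers, not the PyTorch kernel.

```python
def zero_dim_value(values):
    """Value of a 0-dim COO tensor: every stored entry shares the empty
    coordinate, so duplicates sum together (and nnz == 0 means zero)."""
    return sum(values)


def mul_by_0dim(indices, values, scalar_values):
    """Multiply an N-d COO operand (coalesced or not) by a 0-dim COO operand.

    Scaling each stored value, duplicates included, is correct by
    distributivity: (v1 + v2) * s == v1 * s + v2 * s.
    """
    s = zero_dim_value(scalar_values)
    return indices, [v * s for v in values]


# An uncoalesced 0-dim operand with two stored values: its value is 0.5 + 1.5 = 2.0.
print(mul_by_0dim([(0,), (0,), (2,)], [1.0, 2.0, 4.0], [0.5, 1.5]))
# ([(0,), (0,), (2,)], [2.0, 4.0, 8.0])
```

Using only the first stored value of the 0-dim operand would give `0.5` instead of `2.0` here, which is exactly the kind of mistake uncoalesced inputs expose.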
Edward Z. Yang	42fefd4403	Sparse fake tensor support (#82172 ) Add support for sparse fake tensors.
- The testing strategy is to run a fake tensor cross ref test on `test_sparse.py`. This is necessary because OpInfo sparse coverage is completely nonexistent. We could have tried to turn on cross ref testing globally for all files, but that would be very time consuming, and the tests I'm interested in are mostly in this file. There are some exclusions in testing for things that don't work.
- I make the fake tensor converter raise an `UnsupportedFakeTensorException` if the meta converter fails to do a conversion (which can happen in a relatively large number of situations).
- I relax fake tensor invariants so that you can make a fake tensor from a meta tensor. This is useful because in the cross ref test we sometimes operate on meta tensors.
- Fake tensor wrapping is improved to handle the case when a function doesn't return any tensors.
- The meta converter is taught how to convert sparse tensors to meta.

There's still a little more cleanup that needs to be done, but this is good for review. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/82172 Approved by: https://github.com/eellison	2022-08-03 14:29:36 +00:00
Nikita Vedeneev	edd2f6daa7	Implement `mul(dense, sparse), mul(sparse, dense)` for sparse COO tensors. (#81556 ) As per title. Implemented with broadcasting and in-place support. Follow-up: backward implementation. Fixes https://github.com/pytorch/pytorch/issues/3158 Fixes https://github.com/pytorch/pytorch/issues/4456 Fixes https://github.com/pytorch/pytorch/issues/46307 Pull Request resolved: https://github.com/pytorch/pytorch/pull/81556 Approved by: https://github.com/amjames, https://github.com/cpuhrsch	2022-07-29 15:15:27 +00:00
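Multiplying a sparse COO tensor by a dense one only needs to visit the stored entries of the sparse operand, because every unstored position is zero (ignoring non-finite dense values). The plain-Python sketch below shows that idea with a broadcast dense operand. The helper names and the nested-list representation are hypothetical, and for brevity only the dense side broadcasts; this is not the PyTorch kernel.

```python
def broadcast_index(idx, dense_shape):
    """Map a result coordinate to a dense-operand coordinate: size-1 dims broadcast.

    Assumes the dense operand has the same number of dims as the sparse one.
    """
    return tuple(0 if size == 1 else i for i, size in zip(idx, dense_shape))


def mul_sparse_dense(indices, values, dense, dense_shape):
    """COO (indices, values) times a dense nested list; the result stays sparse.

    Only stored coordinates are visited, so the result keeps the sparsity
    pattern of the sparse operand.
    """
    out = []
    for idx, v in zip(indices, values):
        d = dense
        for i in broadcast_index(idx, dense_shape):
            d = d[i]
        out.append(v * d)
    return indices, out


# A 5x3 sparse operand with two stored entries; a (1, 3) dense row broadcast over rows.
dense = [[10.0, 20.0, 30.0]]
print(mul_sparse_dense([(0, 2), (4, 1)], [1.0, 2.0], dense, (1, 3)))
# ([(0, 2), (4, 1)], [30.0, 40.0])
```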
Nikita Vedeneev	18d0e533da	fix silent type promotion for sparse COO tensors with `select` (#82215 ) Fixes https://github.com/pytorch/pytorch/issues/82150. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82215 Approved by: https://github.com/amjames, https://github.com/cpuhrsch	2022-07-27 12:24:06 +00:00
Christian Puhrsch	6ab1fe19ee	torch.sparse.softmax avoid div by zero and invalid kernel launch parameters (#82149 )
### Description
Small changes needed to handle inputs with nnz == 0.
### Issue
https://github.com/pytorch/pytorch/issues/82107
### Testing
Added additional test coverage to reproduce the bug reported in the issue. Tested resulting values by converting with `to_dense`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82149 Approved by: https://github.com/jbschlosser, https://github.com/ezyang	2022-07-25 23:10:58 +00:00
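For reference, `torch.sparse.softmax` treats unspecified elements as negative infinity, so each slice's softmax runs over its stored entries only. The plain-Python model of that definition below (a hypothetical helper over a coalesced 2-D COO matrix, not the CUDA kernel) shows why an `nnz == 0` input should simply yield an empty result.

```python
import math
from collections import defaultdict


def sparse_softmax_rows(indices, values):
    """Softmax over dim 1 of a 2-D COO matrix, treating unstored entries as -inf.

    indices: list of (row, col); values: list of floats (assumed coalesced).
    Returns the output values in the same order as the input. An empty input
    gives an empty output, and every nonempty row's sum is at least 1 (its
    maximum entry contributes exp(0) == 1), so no division by zero occurs.
    """
    row_max = defaultdict(lambda: -math.inf)
    for (r, _), v in zip(indices, values):
        row_max[r] = max(row_max[r], v)
    # Subtract each row's maximum for numerical stability.
    exps = [math.exp(v - row_max[r]) for (r, _), v in zip(indices, values)]
    row_sum = defaultdict(float)
    for (r, _), e in zip(indices, exps):
        row_sum[r] += e
    return [e / row_sum[r] for (r, _), e in zip(indices, exps)]


print(sparse_softmax_rows([], []))  # []
print(sparse_softmax_rows([(0, 0), (0, 3), (2, 1)], [0.0, 0.0, 5.0]))  # [0.5, 0.5, 1.0]
```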
PyTorch MergeBot	6e9b0dcdc4	Revert "Implement `mul(dense, sparse), mul(sparse, dense)` for sparse COO tensors. (#81556 )" This reverts commit `cc5b01651f`. Reverted https://github.com/pytorch/pytorch/pull/81556 on behalf of https://github.com/jeanschmidt due to breaking internal builds	2022-07-22 11:20:11 +00:00
Nikita Vedeneev	cc5b01651f	Implement `mul(dense, sparse), mul(sparse, dense)` for sparse COO tensors. (#81556 ) As per title. Implemented with broadcasting and in-place support. Follow-up: backward implementation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81556 Approved by: https://github.com/amjames, https://github.com/cpuhrsch	2022-07-22 04:55:48 +00:00
Edward Z. Yang	44193f6b5d	Add basic support for sparse meta tensors (#81800 ) Coverage is by no means complete; we'll drive more coverage using appropriate cross-ref tests. This is just enough to get construction and querying working. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/81800 Approved by: https://github.com/cpuhrsch, https://github.com/bdhirsh	2022-07-21 21:23:57 +00:00
Andrew M. James	5a4c9e8394	Add spdiags sparse matrix initialization (#78439 ) Similar to [scipy.sparse.spdiags](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.spdiags.html#scipy-sparse-spdiags). Part of #70926.

In other functions (e.g. [torch.diagonal](https://pytorch.org/docs/stable/generated/torch.diagonal.html#torch.diagonal)), diagonals of a tensor are referenced using the offset and the two dimensions that the diagonal is taken with respect to. Here, the reference implementation from SciPy only considers matrix output, so we may only support 2-D output at first. Even so, it may be useful to consider how the dimensions corresponding to each diagonal would be specified for higher-dimensional output.

The proposed torch signature implies that all offsets refer to diagonals with respect to the only two dimensions of the output:
```
torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, int[] shape, Layout? layout=None) -> SparseTensor
```
Above, it is required that `diagonals.ndimension() == 2`, `offsets.ndimension() == 1`, `offsets.shape[0] == diagonals.shape[0]` and `len(shape) == 2`.

This would need to be altered for the case where `len(shape) > 2`. One option is:
```
torch.sparse.spdiags(Tensor[] diagonals, IntTensor[] offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor
```
Here `offsets` and `diagonals` become lists of tensors, and the `IntTensor dims` argument is introduced. This would require that `len(diagonals) == len(offsets) == dims.shape[0]`, `dims.ndimension() == 2` and `dims.shape[1] == 2`; the restrictions of the 2-D case above also apply pairwise to the elements of `diagonals` and `offsets` (that is, `diagonals[i].ndimension() == 2`, `offsets[i].ndimension() == 1` and `offsets[i].shape[0] == diagonals[i].shape[0]` for all `i`). This form of the signature would construct the sparse result by placing the values from `diagonals[i][j]` into the diagonal with offset `offsets[i][j]` taken with respect to dimensions `dims[i]`. The specialization back to the original signature for the 2-D case could be seen as allowing the single row of `dims` to default to `[0, 1]` when only one `diagonals`/`offsets` pair is provided and `shape` is 2-D. This option allows each input element `diagonals[i]` to have a different row length, which may be appropriate since the maximum length of a diagonal differs between dimension pairs.

Another option is to specify, for each offset, the dimensions the diagonal is taken with respect to. This signature would look like:
```
torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor
```
Here, `diagonals` is still 2-D with dimension 0 matching the length of the 1-D `offsets`, and the tensor input `dims` is also 2-D, with dimension 0 matching the length of `offsets` and the second dimension fixed at `2`. In this case, the sparse result is constructed by placing the elements of `diagonals[i]` into the output diagonal `output.diagonal(offsets[i], dim0=dims[i][0], dim1=dims[i][1])` (with some additional considerations that make it more complicated than simply assigning to that view). The specialization from this back to the 2-D form could be seen as assuming `dims = [[0, 1], [0, 1], ...]` (repeated `len(offsets)` times) when `len(shape) == 2`.

In both proposed signatures for the N-D case, the specialization back to the 2-D signature is a bit of a stretch for typical default-argument logic; however, I think the first is the better choice, as it offers more flexibility.

I think some discussion is required about:
- [x] Should the N-D output case be implemented from the outset?
- [x] If not, should the future addition of the N-D output case be considered when designing the interface?
- [x] Other thoughts on the signature which includes the `dims` information for the N-D output case.

Resolution: Since no one has requested N-D output support, I think it is fine to restrict this to sparse matrix generation. Should a request for N-D support come later, an overload accepting the additional `dims` could be added. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78439 Approved by: https://github.com/nikitaved, https://github.com/cpuhrsch, https://github.com/pearu	2022-07-01 01:11:54 +00:00
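To make the 2-D semantics concrete: in the SciPy convention that this function is modeled on, row `k` of `diagonals` supplies the diagonal with offset `offsets[k]`, and values are indexed by output column, so entry `(i, j)` with `j - i == offsets[k]` takes `diagonals[k][j]`. The dense, plain-Python reference below is a sketch under that assumption; `spdiags_dense` is a hypothetical name, and the real function returns a sparse tensor in the requested layout.

```python
def spdiags_dense(diagonals, offsets, shape):
    """Dense reference for placing rows of `diagonals` onto matrix diagonals.

    Entry (i, j) on the diagonal with offset k = j - i takes diagonals[row][j],
    i.e. values are indexed by output column (SciPy-style convention).
    Values whose target falls outside the matrix are dropped.
    Assumes the offsets are distinct.
    """
    m, n = shape
    out = [[0] * n for _ in range(m)]
    for row, k in zip(diagonals, offsets):
        # Valid columns j satisfy 0 <= j < n, 0 <= j - k < m and j < len(row).
        for j in range(max(0, k), min(n, m + k, len(row))):
            out[j - k][j] = row[j]
    return out


for r in spdiags_dense([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [0, -1, 1], (3, 3)):
    print(r)
# [1, 8, 0]
# [4, 2, 9]
# [0, 5, 3]
```

Note that `7` and `6` are dropped: under the column-indexed convention they would land at row `-1` and row `3`, both outside the 3x3 output.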
PyTorch MergeBot	56e3bc5215	Revert "Add spdiags sparse matrix initialization (#78439 )" This reverts commit `cfb2034b65`. Reverted https://github.com/pytorch/pytorch/pull/78439 on behalf of https://github.com/suo due to broke windows builds, see: `cfb2034b65`	2022-06-30 21:04:36 +00:00
Andrew M. James	cfb2034b65	Add spdiags sparse matrix initialization (#78439 ) Similar to [scipy.sparse.spdiags](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.spdiags.html#scipy-sparse-spdiags)

Part of #70926

In other functions (i.e. [torch.diagonal](https://pytorch.org/docs/stable/generated/torch.diagonal.html#torch.diagonal)), diagonals of a tensor are referenced using the offset and the two dimensions that the diagonal is taken with respect to. Here the reference implementation from scipy only considers matrix output, so even if we only support 2-D output at first, it may be useful to consider how the dimensions corresponding to each diagonal would be specified for higher-dimensional output.

The proposed torch signature implies that all offsets refer to the diagonals with respect to the only two dimensions of the output:

```
torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, int[] shape, Layout? layout=None) -> SparseTensor
```

Above, it is required that `diagonals.ndimension() == 2`, `offsets.ndimension() == 1`, `offsets.shape[0] == diagonals.shape[0]` and `len(shape) == 2`.

This would need to be altered for the case where `len(shape) > 2`. One option is:

```
torch.sparse.spdiags(Tensor[] diagonals, IntTensor[] offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor
```

Here `offsets` and `diagonals` become lists of tensors, and the `IntTensor dims` argument is introduced. This would require that `len(diagonals) == len(offsets) == dims.shape[0]`, `dims.ndimension() == 2` and `dims.shape[1] == 2`. The same restrictions as in the 2-D case above also apply pairwise to the elements of `diagonals` and `offsets` (that is, `diagonals[i].ndimension() == 2`, `offsets[i].ndimension() == 1` and `offsets[i].shape[0] == diagonals[i].shape[0]` for all `i`).

This form of the signature would construct the sparse result by placing the values from `diagonals[i][j]` into the diagonal with offset `offsets[i][j]` taken with respect to dimensions `dims[i]`. The specialization back to the original signature for the 2-D case could be seen as letting the single row of `dims` default to `[0, 1]` when only one `diagonals`/`offsets` pair is provided and `shape` is 2-D. This option allows the rows of an input element `diagonals[i]` to have different lengths. That may be appropriate because the maximum length of a diagonal differs between dimension pairs.

Another option is to specify, for each offset, the dimensions the diagonal is taken with respect to. This signature would look like:

```
torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor
```

Here, `diagonals` is still 2-D, with dimension 0 matching the length of the 1-D `offsets`. The tensor input `dims` is also 2-D, with dimension 0 matching the length of `offsets` and the second dimension fixed at `2`. In this case the sparse result is constructed by placing the elements from `diagonals[i]` into the output diagonal `output.diagonal(offsets[i], dim0=dims[i][0], dim1=dims[i][1])`. Some additional considerations make this more complicated than simply assigning to that view. The specialization from this back to the 2-D form could be seen as assuming `dims = [[0, 1], [0, 1], ... len(offsets) times]` when `len(shape) == 2`.

In both proposed signatures for the N-D case, the specialization back to the 2-D signature is a bit of a stretch for typical default-argument logic. However, I think the first is the better choice as it offers more flexibility.

I think some discussion is required about:

- [x] Should the N-D output case be implemented from the outset?
- [x] If not, should the future addition of the N-D output case be considered when designing the interface?
- [x] Other thoughts on the signature which includes the `dims` information for the N-D output case.

Resolution: Since no one has requested N-D output support, I think it is fine to restrict this to sparse matrix generation. Should a request for N-D support come later, an overload accepting the additional `dims` could be added.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78439 Approved by: https://github.com/nikitaved, https://github.com/cpuhrsch, https://github.com/pearu	2022-06-30 19:54:47 +00:00
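The 2-D placement rule can be sketched in plain Python. This minimal sketch assumes the column-indexed convention of `scipy.sparse.spdiags`: `diagonals[k][j]` lands at row `j - offsets[k]`, column `j`, and out-of-range entries are dropped. The helpers `spdiags_coo` and `to_dense` are hypothetical names. They return COO-style index and value lists rather than a torch tensor.

```python
from typing import List, Sequence, Tuple


def spdiags_coo(
    diagonals: Sequence[Sequence[float]],
    offsets: Sequence[int],
    shape: Tuple[int, int],
) -> Tuple[List[Tuple[int, int]], List[float]]:
    """Hypothetical helper: build COO (row, col) indices and values.

    Assumes the scipy.sparse.spdiags convention: diagonals[k][j] is stored
    at (j - offsets[k], j); entries falling outside `shape` are dropped.
    """
    # Mirror the 2-D constraints listed in the commit message.
    if len(offsets) != len(diagonals):
        raise ValueError("offsets and diagonals must have the same length")
    if len(shape) != 2:
        raise ValueError("only 2-D output is supported")
    nrows, ncols = shape
    indices: List[Tuple[int, int]] = []
    values: List[float] = []
    for diag, off in zip(diagonals, offsets):
        # Entry j of each input row always maps to output column j.
        for j in range(min(ncols, len(diag))):
            i = j - off
            if 0 <= i < nrows:
                indices.append((i, j))
                values.append(diag[j])
    return indices, values


def to_dense(indices, values, shape):
    """Scatter COO entries into a dense list of lists, summing duplicates."""
    nrows, ncols = shape
    out = [[0] * ncols for _ in range(nrows)]
    for (i, j), v in zip(indices, values):
        out[i][j] += v
    return out
```

For example, `spdiags_coo([[0, 1, 2], [3, 4, 5], [6, 7, 8]], [0, 1, -1], (3, 3))` densifies to `[[0, 4, 0], [6, 1, 5], [0, 7, 2]]`. In PyTorch releases that include this commit, `torch.sparse.spdiags(torch.arange(9).reshape(3, 3), torch.tensor([0, 1, -1]), (3, 3)).to_dense()` should give the same matrix.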
Christian Puhrsch	5da776dd08	[Resubmission] fix mul_out CUDA config for COO tensors (#80254 ) Fixes https://github.com/pytorch/pytorch/issues/79914 Duplicate of https://github.com/pytorch/pytorch/pull/79937 . I wasn't able to push changes to the existing PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80254 Approved by: https://github.com/eellison	2022-06-28 00:47:03 +00:00
Nikita Vedeneev	417677bf62	`permute` for COO sparse tensors (#79707 ) As per title. Partial implementation of https://github.com/pytorch/pytorch/issues/78422. We cannot satisfy the view semantics once operated over sparse dims. Pull Request resolved: https://github.com/pytorch/pytorch/pull/79707 Approved by: https://github.com/cpuhrsch	2022-06-25 08:49:58 +00:00
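One way to see why `permute` over sparse dims cannot keep view semantics is a plain-Python sketch. It assumes an all-sparse COO tensor stored as a list of coordinate tuples plus a values list; the helper `coo_permute` is hypothetical. Permuting rebuilds every coordinate tuple into new storage. The lexicographic order that a coalesced tensor relies on is generally lost, so the result has to be marked unsorted.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[int, ...]


def coo_permute(indices: List[Coord], values: List[float],
                shape: Coord, dims: Sequence[int]):
    """Hypothetical helper: permute an all-sparse COO tensor.

    Returns (new_indices, new_values, new_shape, is_sorted). The coordinate
    tuples are rebuilt, so the result is a copy rather than a view.
    """
    if sorted(dims) != list(range(len(shape))):
        raise ValueError("dims must be a permutation of range(ndim)")
    new_shape = tuple(shape[d] for d in dims)
    new_indices = [tuple(c[d] for d in dims) for c in indices]
    # Coalesced COO keeps coordinates in lexicographic order; after a
    # permutation of sparse dims that order usually no longer holds.
    is_sorted = all(a <= b for a, b in zip(new_indices, new_indices[1:]))
    return new_indices, list(values), new_shape, is_sorted
```

Transposing the sorted coordinates `[(0, 1), (1, 0), (1, 2)]` of a `(2, 3)` tensor yields `[(1, 0), (0, 1), (2, 1)]` with shape `(3, 2)`. Those coordinates are no longer sorted.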
Nikita Vedeneev	03cf01bdc0	`index_select` for COO CUDA tensors. (#77551 ) Brings a native CUDA implementation for `index_select`. On master, CUDA tensors are silently converted to CPU to provide CUDA support. The case `nnz >> size` could be optimized similarly to how https://github.com/pytorch/pytorch/pull/72710 does it.

Some benchmarks:

<details>
<summary>PR/torch_sparse/master</summary>

```
[------------------------------- cuda coo.index_select -------------------------------]
                                                   |    PR | torch_sparse |  master
32 threads: ---------------------------------------------------------------------------
      n=10000, nnz=100, index_len=100, dim=0       |    96 |          327 |      70
      n=10000, nnz=100, index_len=100, dim=1       |   120 |          505 |      74
      n=10000, nnz=100, index_len=1000, dim=0      |    90 |          333 |      93
      n=10000, nnz=100, index_len=1000, dim=1      |   120 |          499 |      98
      n=10000, nnz=100, index_len=10000, dim=0     |    92 |          331 |     350
      n=10000, nnz=100, index_len=10000, dim=1     |   100 |          506 |     352
      n=100000, nnz=1000, index_len=100, dim=0     |    53 |          274 |      60
      n=100000, nnz=1000, index_len=100, dim=1     |    90 |          368 |      71
      n=100000, nnz=1000, index_len=1000, dim=0    |    93 |          332 |     100
      n=100000, nnz=1000, index_len=1000, dim=1    |   130 |          501 |     140
      n=100000, nnz=1000, index_len=10000, dim=0   |   100 |          341 |     522
      n=100000, nnz=1000, index_len=10000, dim=1   |   130 |          530 |     549
      n=1000000, nnz=10000, index_len=100, dim=0   |    90 |          429 |     110
      n=1000000, nnz=10000, index_len=100, dim=1   |   296 |          810 |     355
      n=1000000, nnz=10000, index_len=1000, dim=0  |   100 |          435 |     170
      n=1000000, nnz=10000, index_len=1000, dim=1  |   309 |          830 |     548
      n=1000000, nnz=10000, index_len=10000, dim=0 |   110 |          446 |     750
      n=1000000, nnz=10000, index_len=10000, dim=1 |   310 |          830 |    1000
      n=10, nnz=100, index_len=100, dim=0          |    90 |          333 |      74
      n=10, nnz=100, index_len=100, dim=1          |   100 |          497 |      78
      n=10, nnz=100, index_len=1000, dim=0         |    90 |          329 |     140
      n=10, nnz=100, index_len=1000, dim=1         |   100 |          800 |     100
      n=10, nnz=100, index_len=10000, dim=0        |    93 |          340 |     900
      n=10, nnz=100, index_len=10000, dim=1        |   120 |          800 |     489
      n=10, nnz=1000, index_len=100, dim=0         |    90 |          321 |     140
      n=10, nnz=1000, index_len=100, dim=1         |   100 |          680 |     140
      n=10, nnz=1000, index_len=1000, dim=0        |   110 |          349 |     670
      n=10, nnz=1000, index_len=1000, dim=1        |   130 |          740 |     800
      n=10, nnz=1000, index_len=10000, dim=0       |   302 |          503 |    4882
      n=10, nnz=1000, index_len=10000, dim=1       |   325 |         2257 |    5262
      n=10, nnz=10000, index_len=100, dim=0        |   229 |          349 |     810
      n=10, nnz=10000, index_len=100, dim=1        |   433 |          870 |     700
      n=10, nnz=10000, index_len=1000, dim=0       |   666 |          502 |    5581
      n=10, nnz=10000, index_len=1000, dim=1       |   826 |         2379 |    4820
      n=10, nnz=10000, index_len=10000, dim=0      |  2534 |         2700 |   80000
      n=10, nnz=10000, index_len=10000, dim=1      |  2723 |        18540 |   80000
      n=100, nnz=1000, index_len=100, dim=0        |    94 |          324 |     110
      n=100, nnz=1000, index_len=100, dim=1        |   100 |          499 |     110
      n=100, nnz=1000, index_len=1000, dim=0       |    96 |          337 |     150
      n=100, nnz=1000, index_len=1000, dim=1       |   130 |          800 |     140
      n=100, nnz=1000, index_len=10000, dim=0      |   100 |          346 |     900
      n=100, nnz=1000, index_len=10000, dim=1      |   130 |          760 |     900
      n=100, nnz=10000, index_len=100, dim=0       |    90 |          323 |     190
      n=100, nnz=10000, index_len=100, dim=1       |   279 |          800 |     180
      n=100, nnz=10000, index_len=1000, dim=0      |   110 |          339 |     781
      n=100, nnz=10000, index_len=1000, dim=1      |   294 |          870 |     800
      n=100, nnz=10000, index_len=10000, dim=0     |   315 |          505 |    6264
      n=100, nnz=10000, index_len=10000, dim=1     |   497 |         2398 |    5404
      n=1000, nnz=10000, index_len=100, dim=0      |    90 |          333 |     160
      n=1000, nnz=10000, index_len=100, dim=1      |   279 |          635 |     150
      n=1000, nnz=10000, index_len=1000, dim=0     |   100 |          328 |     215
      n=1000, nnz=10000, index_len=1000, dim=1     |   287 |          810 |     207
      n=1000, nnz=10000, index_len=10000, dim=0    |   100 |          339 |     900
      n=1000, nnz=10000, index_len=10000, dim=1    |   291 |          880 |    1000
      n=1000, nnz=100000, index_len=100, dim=0     |    92 |          358 |     435
      n=1000, nnz=100000, index_len=100, dim=1     |   302 |          900 |     530
      n=1000, nnz=100000, index_len=1000, dim=0    |   130 |          360 |    1000
      n=1000, nnz=100000, index_len=1000, dim=1    |   329 |          930 |    1200
      n=1000, nnz=100000, index_len=10000, dim=0   |   343 |          530 |    7000
      n=1000, nnz=100000, index_len=10000, dim=1   |   545 |         2446 |    6100
      n=1000, nnz=1000000, index_len=100, dim=0    |   355 |          394 |    2210
      n=1000, nnz=1000000, index_len=100, dim=1    |  1660 |         2276 |    2674
      n=1000, nnz=1000000, index_len=1000, dim=0   |   877 |          574 |    6700
      n=1000, nnz=1000000, index_len=1000, dim=1   |  2449 |         3782 |    9000
      n=1000, nnz=1000000, index_len=10000, dim=0  |  3112 |         2931 |   57000
      n=1000, nnz=1000000, index_len=10000, dim=1  |  7340 |        20220 |   65700

Times are in microseconds (us).
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77551 Approved by: https://github.com/cpuhrsch	2022-06-01 17:39:03 +00:00
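For reference, the operation being benchmarked can be written as a naive plain-Python sketch; it is not the CUDA kernel from this PR. It assumes a 2-D COO tensor given as parallel `rows`, `cols` and `values` lists, and `coo_index_select` is a hypothetical name. Each position `p` in `index` selects every non-zero whose coordinate along `dim` equals `index[p]` and relabels that coordinate to `p`. Repeated entries in `index` therefore duplicate those non-zeros.

```python
from typing import List, Sequence, Tuple


def coo_index_select(rows: Sequence[int], cols: Sequence[int],
                     values: Sequence[float], shape: Tuple[int, int],
                     dim: int, index: Sequence[int]):
    """Naive O(nnz * len(index)) reference for index_select on 2-D COO."""
    if dim not in (0, 1):
        raise ValueError("dim must be 0 or 1")
    out_rows: List[int] = []
    out_cols: List[int] = []
    out_vals: List[float] = []
    for p, target in enumerate(index):
        for r, c, v in zip(rows, cols, values):
            coord = r if dim == 0 else c
            if coord == target:
                # Relabel the selected coordinate to its position in `index`.
                out_rows.append(p if dim == 0 else r)
                out_cols.append(c if dim == 0 else p)
                out_vals.append(v)
    new_shape = (len(index), shape[1]) if dim == 0 else (shape[0], len(index))
    return out_rows, out_cols, out_vals, new_shape
```

The nested loop makes the cost proportional to `nnz * len(index)`. That product is the quantity the benchmark's `nnz` and `index_len` columns vary.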
Mike Ruberry	089203f8bc	Updates floor_divide to perform floor division (#78411 ) Fixes https://github.com/pytorch/pytorch/issues/43874 This PR changes floor_divide to perform floor division instead of truncation division. This is a BC-breaking change, but it's a "bug fix," and we've already warned users for several releases this behavior would change. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78411 Approved by: https://github.com/ngimel	2022-05-29 21:28:45 +00:00
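The two rounding modes differ only when the operands have opposite signs. This plain-Python illustration needs no PyTorch. `math.floor` rounds the quotient toward negative infinity, which is the behavior `floor_divide` adopts here. `math.trunc` rounds toward zero, which is the old truncation behavior.

```python
import math


def trunc_div(a: float, b: float) -> int:
    """Truncation division: round the quotient toward zero (old behavior)."""
    # a / b is adequate for the small values used here; very large
    # integers can lose precision in float division.
    return math.trunc(a / b)


def floor_div(a: float, b: float) -> int:
    """Floor division: round the quotient toward negative infinity."""
    return math.floor(a / b)


for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    print(f"{a:>3} / {b:>2}: trunc={trunc_div(a, b):>3}  floor={floor_div(a, b):>3}")
```

In a PyTorch build that includes this change, `torch.floor_divide(torch.tensor(-7), 2)` should give `-4`. The previous result, `-3`, remains available through `torch.div(..., rounding_mode='trunc')`.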
Nikita Vedeneev	00a1fb64bb	Faster `index_select` for sparse COO tensors on CPU. (#72710 ) Fixes https://github.com/pytorch/pytorch/issues/72212.

This PR improves the previous algorithm in complexity. It also utilizes the structure of the problem and parallelizes computations when possible.

Benchmark results.

<details>
<summary>Testing script</summary>

```python
import torch
import math
from IPython import get_ipython
from itertools import product
import pickle
from torch.utils.benchmark import Timer, Compare

torch.manual_seed(13)
#torch.set_num_threads(1)
ipython = get_ipython()

index_sizes = (100, 1000, 10000)
# specifies (n, nnz)
problem_dims = (
    # n > nnz
    (10000, 100),
    (100000, 1000),
    (1000000, 10000),
    # n < nnz
    (10, 100),
    (10, 1000),
    (10, 10000),
    (100, 1000),
    (100, 10000),
    (1000, 10000),
    (1000, 100000),
    (1000, 1000000),
    #(1000000, 1000000000),
)

def f(t, d, index):
    s = torch_sparse.SparseTensor.from_torch_sparse_coo_tensor(t)
    ss = s.index_select(d, index)
    return ss.coo()

name = "PR"
results = []

for (n, nnz), m in product(problem_dims, index_sizes):
    for d in (0, 1):
        if nnz < n:
            shape = (n, n)
        else:
            shape = (n, nnz // n) if d == 0 else (nnz // n, n)
        nrows, ncols = shape
        rowidx = torch.randint(low=0, high=nrows, size=(nnz,))
        colidx = torch.randint(low=0, high=ncols, size=(nnz,))
        itemidx = torch.vstack((rowidx, colidx))
        xvalues = torch.randn(nnz)
        index = torch.randint(low=0, high=n, size=(m,))

        SparseX = torch.sparse_coo_tensor(itemidx, xvalues, size=shape).coalesce()
        smtp = "SparseX.index_select(d, index)"

        timer = Timer(smtp,
                      globals=globals(),
                      label="coo.index_select",
                      description=f"{name}: coo.index_select",
                      sub_label=f"n={n}, nnz={nnz}, index_len={m}, dim={d}",
                      num_threads=torch.get_num_threads())
        results.append(timer.blocked_autorange())

compare = Compare(results)
compare.trim_significant_figures()
compare.print()

with open(f"{name}_index_select.pickle", 'wb') as f:
    pickle.dump(results, f)
```

</details>

<details>
<summary>Gather results</summary>

```python
import pickle
from torch.utils.benchmark import Timer, Compare

files = [
    "PR",
    "torch_sparse",
    "master"
]

timers = []
for name in files:
    with open("{}_index_select.pickle".format(name), 'rb') as f:
        timers += pickle.load(f)

compare = Compare(timers)
compare.trim_significant_figures()
compare.print()
```

</details>

<details>
<summary>PR/torch_sparse/master runtime comparison</summary>

```
[----------------------------------- coo.index_select ----------------------------------]
                                                   |    PR | torch_sparse |  master
32 threads: -----------------------------------------------------------------------------
      n=10000, nnz=100, index_len=100, dim=0       |    14 |          140 |      10
      n=10000, nnz=100, index_len=100, dim=1       |    14 |          200 |      10
      n=10000, nnz=100, index_len=1000, dim=0      |    30 |          180 |      38
      n=10000, nnz=100, index_len=1000, dim=1      |    34 |          240 |      38
      n=10000, nnz=100, index_len=10000, dim=0     |   278 |          460 |     330
      n=10000, nnz=100, index_len=10000, dim=1     |   275 |          516 |     330
      n=100000, nnz=1000, index_len=100, dim=0     |    16 |          290 |      31
      n=100000, nnz=1000, index_len=100, dim=1     |    26 |          390 |      31
      n=100000, nnz=1000, index_len=1000, dim=0    |    45 |          405 |     263
      n=100000, nnz=1000, index_len=1000, dim=1    |    73 |          500 |     261
      n=100000, nnz=1000, index_len=10000, dim=0   |   444 |          783 |    2570
      n=100000, nnz=1000, index_len=10000, dim=1   |   470 |          890 |    2590
      n=1000000, nnz=10000, index_len=100, dim=0   |    25 |         2400 |     270
      n=1000000, nnz=10000, index_len=100, dim=1   |   270 |         4000 |     269
      n=1000000, nnz=10000, index_len=1000, dim=0  |    74 |         2600 |    2620
      n=1000000, nnz=10000, index_len=1000, dim=1  |   464 |         3600 |    2640
      n=1000000, nnz=10000, index_len=10000, dim=0 |   635 |         3300 |   26400
      n=1000000, nnz=10000, index_len=10000, dim=1 |  1000 |         3960 |   26400
      n=10, nnz=100, index_len=100, dim=0          |    16 |          137 |      16
      n=10, nnz=100, index_len=100, dim=1          |    16 |          220 |      16
      n=10, nnz=100, index_len=1000, dim=0         |    63 |          238 |      81
      n=10, nnz=100, index_len=1000, dim=1         |    60 |          698 |      78
      n=10, nnz=100, index_len=10000, dim=0        |   480 |          940 |     862
      n=10, nnz=100, index_len=10000, dim=1        |   330 |         4930 |    1070
      n=10, nnz=1000, index_len=100, dim=0         |    60 |          200 |      73
      n=10, nnz=1000, index_len=100, dim=1         |    56 |          683 |      70
      n=10, nnz=1000, index_len=1000, dim=0        |   480 |          530 |    1050
      n=10, nnz=1000, index_len=1000, dim=1        |   330 |         4550 |    1368
      n=10, nnz=1000, index_len=10000, dim=0       |  3100 |         2900 |    9300
      n=10, nnz=1000, index_len=10000, dim=1       |  3400 |        46000 |    9100
      n=10, nnz=10000, index_len=100, dim=0        |   400 |          453 |     857
      n=10, nnz=10000, index_len=100, dim=1        |   400 |         4070 |    1730
      n=10, nnz=10000, index_len=1000, dim=0       |  2840 |         2600 |   13900
      n=10, nnz=10000, index_len=1000, dim=1       |  3700 |        40600 |   16000
      n=10, nnz=10000, index_len=10000, dim=0      | 83200 |        67400 |  160000
      n=10, nnz=10000, index_len=10000, dim=1      | 68000 |       528000 |  190000
      n=100, nnz=1000, index_len=100, dim=0        |    46 |          148 |      31
      n=100, nnz=1000, index_len=100, dim=1        |    45 |          242 |      37
      n=100, nnz=1000, index_len=1000, dim=0       |    68 |          248 |     240
      n=100, nnz=1000, index_len=1000, dim=1       |    66 |          755 |     290
      n=100, nnz=1000, index_len=10000, dim=0      |   370 |          802 |    2250
      n=100, nnz=1000, index_len=10000, dim=1      |   372 |         5430 |    2770
      n=100, nnz=10000, index_len=100, dim=0       |    82 |          210 |     224
      n=100, nnz=10000, index_len=100, dim=1       |    74 |          986 |     270
      n=100, nnz=10000, index_len=1000, dim=0      |   350 |          618 |    2600
      n=100, nnz=10000, index_len=1000, dim=1      |   370 |         4660 |    4560
      n=100, nnz=10000, index_len=10000, dim=0     |  3000 |         3400 |   41680
      n=100, nnz=10000, index_len=10000, dim=1     |  5000 |        47500 |   30400
      n=1000, nnz=10000, index_len=100, dim=0      |    71 |          160 |     185
      n=1000, nnz=10000, index_len=100, dim=1      |    64 |          516 |     190
      n=1000, nnz=10000, index_len=1000, dim=0     |   100 |          249 |    1740
      n=1000, nnz=10000, index_len=1000, dim=1     |    98 |         1030 |    1770
      n=1000, nnz=10000, index_len=10000, dim=0    |   600 |          808 |   18300
      n=1000, nnz=10000, index_len=10000, dim=1    |   663 |         5300 |   18500
      n=1000, nnz=100000, index_len=100, dim=0     |   160 |          258 |    1890
      n=1000, nnz=100000, index_len=100, dim=1     |   200 |         3620 |    2050
      n=1000, nnz=100000, index_len=1000, dim=0    |   500 |          580 |   18700
      n=1000, nnz=100000, index_len=1000, dim=1    |   640 |         7550 |   30000
      n=1000, nnz=100000, index_len=10000, dim=0   |  3400 |         3260 |  186000
      n=1000, nnz=100000, index_len=10000, dim=1   |  3600 |        49600 |  194000
      n=1000, nnz=1000000, index_len=100, dim=0    |   517 |          957 |   18700
      n=1000, nnz=1000000, index_len=100, dim=1    |   680 |        39600 |   37600
      n=1000, nnz=1000000, index_len=1000, dim=0   |  3600 |         4500 |  186000
      n=1000, nnz=1000000, index_len=1000, dim=1   |  5800 |        76400 |  190000
      n=1000, nnz=1000000, index_len=10000, dim=0  | 50000 |        67900 | 1800000
      n=1000, nnz=1000000, index_len=10000, dim=1  | 45000 |       570000 | 1900000

Times are in microseconds (us).
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72710 Approved by: https://github.com/pearu, https://github.com/cpuhrsch	2022-05-10 16:33:13 +00:00
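A naive scan costs `nnz * len(index)`. One way to avoid that is to bucket the non-zeros by their coordinate along `dim` once, then walk `index`. The sketch below does this with a plain-Python hash map. That choice is our own illustrative assumption, not the algorithm implemented in #72710, which is parallelized C++. The name `coo_index_select_grouped` is hypothetical. The sketch runs in roughly O(nnz + len(index) + output size).

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def coo_index_select_grouped(rows: Sequence[int], cols: Sequence[int],
                             values: Sequence[float], shape: Tuple[int, int],
                             dim: int, index: Sequence[int]):
    """index_select on 2-D COO via one bucketing pass over the non-zeros."""
    if dim not in (0, 1):
        raise ValueError("dim must be 0 or 1")
    keys = rows if dim == 0 else cols
    # Map each coordinate along `dim` to the positions of its non-zeros.
    buckets: Dict[int, List[int]] = defaultdict(list)
    for pos, key in enumerate(keys):
        buckets[key].append(pos)
    out_rows: List[int] = []
    out_cols: List[int] = []
    out_vals: List[float] = []
    for p, target in enumerate(index):
        # .get() does not insert missing keys into the defaultdict.
        for pos in buckets.get(target, ()):
            out_rows.append(p if dim == 0 else rows[pos])
            out_cols.append(cols[pos] if dim == 0 else p)
            out_vals.append(values[pos])
    new_shape = (len(index), shape[1]) if dim == 0 else (shape[0], len(index))
    return out_rows, out_cols, out_vals, new_shape
```

Each non-zero is visited once while building the buckets. Each index entry then touches only the non-zeros it actually selects. Within each selected slice, the output keeps the input order of the non-zeros.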
PyTorch MergeBot	8d67972b14	Revert "Faster `index_select` for sparse COO tensors on CPU. (#72710 )" This reverts commit `ce3857e73c`. Reverted https://github.com/pytorch/pytorch/pull/72710 on behalf of https://github.com/malfet	2022-05-10 14:43:05 +00:00
Nikita Vedeneev	ce3857e73c	Faster `index_select` for sparse COO tensors on CPU. (#72710 ) Fixes https://github.com/pytorch/pytorch/issues/72212.

This PR improves the previous algorithm in complexity. It also utilizes the structure of the problem and parallelizes computations when possible.

Benchmark results.

<details>
<summary>Testing script</summary>

```python
import torch
import math
from IPython import get_ipython
from itertools import product
import pickle
from torch.utils.benchmark import Timer, Compare

torch.manual_seed(13)
#torch.set_num_threads(1)
ipython = get_ipython()

index_sizes = (100, 1000, 10000)
# specifies (n, nnz)
problem_dims = (
    # n > nnz
    (10000, 100),
    (100000, 1000),
    (1000000, 10000),
    # n < nnz
    (10, 100),
    (10, 1000),
    (10, 10000),
    (100, 1000),
    (100, 10000),
    (1000, 10000),
    (1000, 100000),
    (1000, 1000000),
    #(1000000, 1000000000),
)

def f(t, d, index):
    s = torch_sparse.SparseTensor.from_torch_sparse_coo_tensor(t)
    ss = s.index_select(d, index)
    return ss.coo()

name = "PR"
results = []

for (n, nnz), m in product(problem_dims, index_sizes):
    for d in (0, 1):
        if nnz < n:
            shape = (n, n)
        else:
            shape = (n, nnz // n) if d == 0 else (nnz // n, n)
        nrows, ncols = shape
        rowidx = torch.randint(low=0, high=nrows, size=(nnz,))
        colidx = torch.randint(low=0, high=ncols, size=(nnz,))
        itemidx = torch.vstack((rowidx, colidx))
        xvalues = torch.randn(nnz)
        index = torch.randint(low=0, high=n, size=(m,))

        SparseX = torch.sparse_coo_tensor(itemidx, xvalues, size=shape).coalesce()
        smtp = "SparseX.index_select(d, index)"

        timer = Timer(smtp,
                      globals=globals(),
                      label="coo.index_select",
                      description=f"{name}: coo.index_select",
                      sub_label=f"n={n}, nnz={nnz}, index_len={m}, dim={d}",
                      num_threads=torch.get_num_threads())
        results.append(timer.blocked_autorange())

compare = Compare(results)
compare.trim_significant_figures()
compare.print()

with open(f"{name}_index_select.pickle", 'wb') as f:
    pickle.dump(results, f)
```

</details>

<details>
<summary>Gather results</summary>

```python
import pickle
from torch.utils.benchmark import Timer, Compare

files = [
    "PR",
    "torch_sparse",
    "master"
]

timers = []
for name in files:
    with open("{}_index_select.pickle".format(name), 'rb') as f:
        timers += pickle.load(f)

compare = Compare(timers)
compare.trim_significant_figures()
compare.print()
```

</details>

<details>
<summary>PR/torch_sparse/master runtime comparison</summary>

```
[----------------------------------- coo.index_select ----------------------------------]
                                                   |    PR | torch_sparse |  master
32 threads: -----------------------------------------------------------------------------
      n=10000, nnz=100, index_len=100, dim=0       |    14 |          140 |      10
      n=10000, nnz=100, index_len=100, dim=1       |    14 |          200 |      10
      n=10000, nnz=100, index_len=1000, dim=0      |    30 |          180 |      38
      n=10000, nnz=100, index_len=1000, dim=1      |    34 |          240 |      38
      n=10000, nnz=100, index_len=10000, dim=0     |   278 |          460 |     330
      n=10000, nnz=100, index_len=10000, dim=1     |   275 |          516 |     330
      n=100000, nnz=1000, index_len=100, dim=0     |    16 |          290 |      31
      n=100000, nnz=1000, index_len=100, dim=1     |    26 |          390 |      31
      n=100000, nnz=1000, index_len=1000, dim=0    |    45 |          405 |     263
      n=100000, nnz=1000, index_len=1000, dim=1    |    73 |          500 |     261
      n=100000, nnz=1000, index_len=10000, dim=0   |   444 |          783 |    2570
      n=100000, nnz=1000, index_len=10000, dim=1   |   470 |          890 |    2590
      n=1000000, nnz=10000, index_len=100, dim=0   |    25 |         2400 |     270
      n=1000000, nnz=10000, index_len=100, dim=1   |   270 |         4000 |     269
      n=1000000, nnz=10000, index_len=1000, dim=0  |    74 |         2600 |    2620
      n=1000000, nnz=10000, index_len=1000, dim=1  |   464 |         3600 |    2640
      n=1000000, nnz=10000, index_len=10000, dim=0 |   635 |         3300 |   26400
      n=1000000, nnz=10000, index_len=10000, dim=1 |  1000 |         3960 |   26400
      n=10, nnz=100, index_len=100, dim=0          |    16 |          137 |      16
      n=10, nnz=100, index_len=100, dim=1          |    16 |          220 |      16
      n=10, nnz=100, index_len=1000, dim=0         |    63 |          238 |      81
      n=10, nnz=100, index_len=1000, dim=1         |    60 |          698 |      78
      n=10, nnz=100, index_len=10000, dim=0        |   480 |          940 |     862
      n=10, nnz=100, index_len=10000, dim=1        |   330 |         4930 |    1070
      n=10, nnz=1000, index_len=100, dim=0         |    60 |          200 |      73
      n=10, nnz=1000, index_len=100, dim=1         |    56 |          683 |      70
      n=10, nnz=1000, index_len=1000, dim=0        |   480 |          530 |    1050
      n=10, nnz=1000, index_len=1000, dim=1        |   330 |         4550 |    1368
      n=10, nnz=1000, index_len=10000, dim=0       |  3100 |         2900 |    9300
      n=10, nnz=1000, index_len=10000, dim=1       |  3400 |        46000 |    9100
      n=10, nnz=10000, index_len=100, dim=0        |   400 |          453 |     857
      n=10, nnz=10000, index_len=100, dim=1        |   400 |         4070 |    1730
      n=10, nnz=10000, index_len=1000, dim=0       |  2840 |         2600 |   13900
      n=10, nnz=10000, index_len=1000, dim=1       |  3700 |        40600 |   16000
      n=10, nnz=10000, index_len=10000, dim=0      | 83200 |        67400 |  160000
      n=10, nnz=10000, index_len=10000, dim=1      | 68000 |       528000 |  190000
      n=100, nnz=1000, index_len=100, dim=0        |    46 |          148 |      31
      n=100, nnz=1000, index_len=100, dim=1        |    45 |          242 |      37
      n=100, nnz=1000, index_len=1000, dim=0       |    68 |          248 |     240
      n=100, nnz=1000, index_len=1000, dim=1       |    66 |          755 |     290
      n=100, nnz=1000, index_len=10000, dim=0      |   370 |          802 |    2250
      n=100, nnz=1000, index_len=10000, dim=1      |   372 |         5430 |    2770
      n=100, nnz=10000, index_len=100, dim=0       |    82 |          210 |     224
      n=100, nnz=10000, index_len=100, dim=1       |    74 |          986 |     270
      n=100, nnz=10000, index_len=1000, dim=0      |   350 |          618 |    2600
      n=100, nnz=10000, index_len=1000, dim=1      |   370 |         4660 |    4560
      n=100, nnz=10000, index_len=10000, dim=0     |  3000 |         3400 |   41680
      n=100, nnz=10000, index_len=10000, dim=1     |  5000 |        47500 |   30400
      n=1000, nnz=10000, index_len=100, dim=0      |    71 |          160 |     185
      n=1000, nnz=10000, index_len=100, dim=1      |    64 |          516 |     190
      n=1000, nnz=10000, index_len=1000, dim=0     |   100 |          249 |    1740
      n=1000, nnz=10000, index_len=1000, dim=1     |    98 |         1030 |    1770
      n=1000, nnz=10000, index_len=10000, dim=0    |   600 |          808 |   18300
      n=1000, nnz=10000, index_len=10000, dim=1    |   663 |         5300 |   18500
      n=1000, nnz=100000, index_len=100, dim=0     |   160 |          258 |    1890
      n=1000, nnz=100000, index_len=100, dim=1     |   200 |         3620 |    2050
      n=1000, nnz=100000, index_len=1000, dim=0    |   500 |          580 |   18700
      n=1000, nnz=100000, index_len=1000, dim=1    |   640 |         7550 |   30000
      n=1000, nnz=100000, index_len=10000, dim=0   |  3400 |         3260 |  186000
      n=1000, nnz=100000, index_len=10000, dim=1   |  3600 |        49600 |  194000
      n=1000, nnz=1000000, index_len=100, dim=0    |   517 |          957 |   18700
      n=1000, nnz=1000000, index_len=100, dim=1    |   680 |        39600 |   37600
      n=1000, nnz=1000000, index_len=1000, dim=0   |  3600 |         4500 |  186000
      n=1000, nnz=1000000, index_len=1000, dim=1   |  5800 |        76400 |  190000
      n=1000, nnz=1000000, index_len=10000, dim=0  | 50000 |        67900 | 1800000
      n=1000, nnz=1000000, index_len=10000, dim=1  | 45000 |       570000 | 1900000

Times are in microseconds (us).
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72710 Approved by: https://github.com/pearu, https://github.com/cpuhrsch	2022-05-09 19:59:39 +00:00
Jane Xu	6d9dbd3391	Manually skip test_sparse_addmm as disable code is not working for now (#77076 ) Related to https://github.com/pytorch/pytorch/issues/73145 It was previously skipped for Linux and Windows, but mac has become a problem as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/77076 Approved by: https://github.com/ezyang	2022-05-09 13:54:29 +00:00
Mikayla Gawarecki	0adf070574	Use scatter_reduce to support masked reductions on sparse COO tensors (sum, prod, amin, amax) Pull Request resolved: https://github.com/pytorch/pytorch/pull/75454 Approved by: https://github.com/cpuhrsch	2022-05-06 15:40:22 +00:00
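The following plain-Python sketch illustrates the idea in this commit title. A stand-in for `scatter_reduce` combines COO values into per-row slots, so only specified elements take part in `sum`, `prod`, `amin` and `amax`. The helper `scatter_reduce_1d` is an assumption for illustration, as is the convention of leaving rows with no specified entries as `None`. Neither is PyTorch's `Tensor.scatter_reduce` signature or its masked-reduction behavior for empty rows.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Binary combiners for the reductions named in the commit title.
_COMBINE: Dict[str, Callable[[float, float], float]] = {
    "sum": lambda a, b: a + b,
    "prod": lambda a, b: a * b,
    "amin": min,
    "amax": max,
}


def scatter_reduce_1d(src: Sequence[float], index: Sequence[int],
                      size: int, reduce: str) -> List[Optional[float]]:
    """out[index[i]] = reduce(out[index[i]], src[i]); untouched slots stay None."""
    combine = _COMBINE[reduce]
    out: List[Optional[float]] = [None] * size
    for v, i in zip(src, index):
        out[i] = v if out[i] is None else combine(out[i], v)
    return out


# Row-wise reductions of a 3 x 3 COO matrix with specified entries
# (0, 0) = 2.0, (0, 2) = 3.0 and (2, 1) = -5.0; row 1 has none.
rows = [0, 0, 2]
vals = [2.0, 3.0, -5.0]
for name in ("sum", "prod", "amin", "amax"):
    print(name, scatter_reduce_1d(vals, rows, 3, name))
```

Over its specified entries, row 0 has an `amin` of 2.0. A dense reduction that counted the implicit zero at (0, 1) would give 0.0 instead. This is the distinction a masked reduction on sparse input has to preserve.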
PyTorch MergeBot	381e08309f	Revert "Use scatter_reduce to support masked reductions on sparse COO tensors (sum, prod, amin, amax)" This reverts commit `fc2a2e8b72`. Reverted https://github.com/pytorch/pytorch/pull/75454 on behalf of https://github.com/b0noI	2022-05-04 22:31:31 +00:00
Mikayla Gawarecki	fc2a2e8b72	Use scatter_reduce to support masked reductions on sparse COO tensors (sum, prod, amin, amax) Pull Request resolved: https://github.com/pytorch/pytorch/pull/75454 Approved by: https://github.com/cpuhrsch	2022-05-03 23:17:07 +00:00
arindamroy-eng	7478ce187a	ROCM:Unskip more tests for ROCM5.0 Re-enabling more tests which are working on ROCM5.0 Pull Request resolved: https://github.com/pytorch/pytorch/pull/75353 Approved by: https://github.com/ezyang	2022-04-19 19:45:55 +00:00
Pearu Peterson	a98b4666e0	Enable test_sparse_mask for Windows Pull Request resolved: https://github.com/pytorch/pytorch/pull/75189 Approved by: https://github.com/cpuhrsch	2022-04-11 17:21:29 +00:00
Brian Hirsh	1b7d7d9327	Reland: "free up dispatch key space (in C++)" (#74963 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74963

This is a re-land of D35192346 (`9872a06d77`) and D35192317 (`a9216cde6c`), which together are a diff that changes the internal representation of `DispatchKeySet` in pytorch core to free up the number of dispatch keys that we have available. See a more detailed description of the design in the original PR: https://github.com/pytorch/pytorch/pull/69633.

The original PR broke Milan workflows, which use a pytorch mobile build, and manifested as a memory corruption bug inside of `liboacrmerged.so`.

Background: Existing Mobile Optimization

Pytorch mobile builds have an existing optimization (here `cc23725e89/c10/core/DispatchKey.h (L382)` and here `cc23725e89/aten/src/ATen/core/dispatch/OperatorEntry.h (L214)`), which works as follows:

Every operator in pytorch has a "dispatch table" of function pointers, corresponding to all of the (up to 64) different kernels that we might dispatch to when we run an operator in pytorch (autograd, cpu, cuda, complex number support, etc).

In mobile builds, the size of that table is shrunk from 64 to 8 to save a bunch of space, because mobile doesn't end up using the functionality associated with most dispatch keys.

The dispatcher also has a notion of "fallback kernels", which are kernels that you can register to a particular dispatch key, but should be able to work for "any operator". The array of fallback kernels is defined here: `cc23725e89/aten/src/ATen/core/dispatch/Dispatcher.h (L294)`.

The mobile-optimization currently does not extend to this array (it wouldn't be that useful anyway because there is only one array of fallback kernels globally - vs. there is a separate dispatch table of function pointers per operator). So the per-operator tables on mobile are size 8, while the fallback table is size 64.

The Bug

This PR actually makes it difficult to enable that optimization separately for the per-operator arrays vs. the fallback array, and incidentally shrunk the size of the fallback array from 64 to 8 for mobile (that happened on this line: https://github.com/pytorch/pytorch/pull/69633/files#diff-f735cd7aa68f15b624100cbc4bb3b5ea76ffc7c9d3bec3b0ccabaa09609e5319R294).

That isn't a problem by itself (since mobile doesn't actually use any of the fallbacks that can no longer be stored). However, pytorch core will still register all of those fallback kernels on startup in mobile builds, even if they aren't used. When we tried to register one of those fallbacks on startup, it would try to dump the kernel somewhere in memory past the bounds of the (now smaller) array inside of the `Dispatcher` object, `backendFallbackKernels_`.

Why didn't this problem show up in OSS CI? Why didn't it break other internal mobile workflows aside from Milan?

Ideally, this failure would show up as part of the OSS signal on GitHub, since we already have mobile OSS builds. Given that it was another memory corruption issue that only affected Milan (subset of mobile), I'm not sure what's specific about Milan's builds that caused it only to manifest there. dreiss I wonder if there's another flavor of mobile builds we could run in OSS CI that could potentially help catch this?

The debugging experience was pretty difficult

Debugging the Milan-specific failure was made difficult by the following:

(1) lack of CI
- the original Milan failure didn't surface on my original diff, because the Milan job(s) that failed weren't triggered to run on pytorch changes. There's probably a balance to strike here, since those jobs will only be useful if they aren't flaky, and if they can produce reliable failure logs for debugging.

(2) It's difficult to get a repro.
- my work laptop doesn't have the right specs to run the Milan development workflow (not enough disk space)
- There is an existing OnDemand workflow for Milan, but it appears to be relatively new, and after a bunch of help from MarcioPorto, we ran into issues forwarding the log output from Milan tests on the emulator back to the terminal (see the original discussion here: https://fb.workplace.com/groups/OnDemandFRL/permalink/1424937774645433/)

(3) Lack of stack-traces.
- Most Milan failures didn't include actionable stack traces. phding generously helped me debug by running my suggested patches locally, and reporting back if there were any failures. The failing test didn't include a stack trace though (just the line where the crash appeared), so I ended up making some educated guesses about what the issue was based on the area of the crash.

ghstack-source-id: 152688542

Test Plan: Confirmed with phding that the broken Milan workflow from the previous version of this diff is now passing.

Reviewed By: phding, albanD

Differential Revision: D35222806

fbshipit-source-id: 0ad115a0f768bc8ea5d4c203b2990254c7092d30
(cherry picked from commit 002b91966f11fd55ab3fa3801b636fa39a6dd12c)	2022-03-31 21:52:38 +00:00
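The failure mode described above can be modeled in a few lines. This is a hypothetical Python model: `DispatchTables`, `register_fallback` and the integer key numbering are invented for illustration and are not PyTorch's C++ types. It uses the two table sizes from the text, 64 slots normally and 8 per operator on mobile. Shrinking the fallback table along with the per-operator tables makes startup registration of a high-numbered fallback key overflow. The C++ write past `backendFallbackKernels_` silently corrupted memory; here an explicit bounds check turns the same mistake into an error.

```python
from typing import Callable, List, Optional

FULL_TABLE_SIZE = 64   # dispatch slots in a full build (per the commit text)
MOBILE_TABLE_SIZE = 8  # shrunken per-operator table size on mobile (per the text)

Kernel = Callable[..., object]


class DispatchTables:
    """Hypothetical model of per-operator and global fallback kernel tables."""

    def __init__(self, mobile: bool, shrink_fallbacks: bool = False) -> None:
        op_size = MOBILE_TABLE_SIZE if mobile else FULL_TABLE_SIZE
        # The bug: shrinking the fallback table together with the op tables.
        fb_size = op_size if shrink_fallbacks else FULL_TABLE_SIZE
        self.op_table: List[Optional[Kernel]] = [None] * op_size
        self.fallbacks: List[Optional[Kernel]] = [None] * fb_size

    def register_fallback(self, key: int, kernel: Kernel) -> None:
        # Explicit check that the unchecked C++ write did not perform.
        if not 0 <= key < len(self.fallbacks):
            raise IndexError(
                f"fallback key {key} outside table of size {len(self.fallbacks)}")
        self.fallbacks[key] = kernel
```

Keeping the fallback table at full size, which is the default here, lets every fallback register even when per-operator tables are small. That matches the pre-regression layout the commit describes.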
Nikita Shulga	bfac65dfe5	[testing] Update dispatch macros (#74977 ) This PR is a reland of #74289 Co-authored-by: Khushi Agrawal <khushiagrawal411@gmail.com>	2022-03-30 14:13:21 -07:00


398 Commits