pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Pearu Peterson	2abcafcfd8	Add masked_grad kw argument to to_dense (#96095) As in the title. The `masked_grad` kw argument is required for `to_dense` backward to distinguish the expected semantics of sparse tensors. `masked_grad=True` means that the `to_dense` backward will apply a mask to the returned gradient where the mask is defined by the input indices. The default semantics implies `masked_grad==True` for BC but see the [comment](https://github.com/pytorch/pytorch/pull/96095/files#diff-d4df180433a09071e891d552426911c227b30ae9b8a8e56da31046e7ecb1afbeR501-R513) in `to_dense_backward`. As a consequence, existing code that is run through the autograd engine must replace `.to_dense()` calls with `.to_dense(masked_grad=False)`. For example,
```python
torch.autograd.gradcheck(lambda x: torch.sum(x, [0]).to_dense())
torch.autograd.gradcheck(lambda x: torch.sparse.sum(x, [0]).to_dense())
```
(recall, gradcheck has `masked=False` as default) must be updated to
```python
torch.autograd.gradcheck(lambda x: torch.sum(x, [0]).to_dense(masked_grad=False))
torch.autograd.gradcheck(lambda x: torch.sparse.sum(x, [0]).to_dense(masked_grad=True), masked=True)
```
Fixes https://github.com/pytorch/pytorch/issues/95550 Pull Request resolved: https://github.com/pytorch/pytorch/pull/96095 Approved by: https://github.com/cpuhrsch	2023-03-16 21:38:11 +00:00
Nikita Vedeneev	0b5040b329	sparse_mask: remove syncs by removing calls to coalesce (#94406) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94406 Approved by: https://github.com/cpuhrsch, https://github.com/pearu	2023-03-13 16:30:27 +00:00
Andrew M. James	2bcc0e9e18	Expand sparse.softmax zero nnz tests to cover cases of previously reported FPE. (#95646) - Test cases with zero `nnz` added for `sparse.log_softmax`. - Test cases with zero `nnz` for both `sparse.log_softmax` and `torch.sparse_softmax` expanded to cover the backward pass. These test additions prove the resolution of #95371 and #82107. Fixes #82107 #95371 Pull Request resolved: https://github.com/pytorch/pytorch/pull/95646 Approved by: https://github.com/cpuhrsch, https://github.com/pearu, https://github.com/nikitaved	2023-03-01 17:26:51 +00:00
Pearu Peterson	b89fda51cd	Implement sparse semantics support in gradcheck (2nd try) (#95405) Replaces https://github.com/pytorch/pytorch/pull/94714 that was reverted due to https://github.com/pytorch/pytorch/pull/94714#issuecomment-1442355648 Pull Request resolved: https://github.com/pytorch/pytorch/pull/95405 Approved by: https://github.com/albanD	2023-02-27 17:48:02 +00:00
Pearu Peterson	0c0694495b	Fix a bug in nesting check_sparse_tensor_invariants context managers (#95372) As in the title. The bug was reported in https://github.com/pytorch/pytorch/pull/94728#discussion_r1108892366 and has the following reproducer:
```python
>>> import torch
>>> check_ctx = torch.sparse.check_sparse_tensor_invariants(True)
>>> no_check_ctx = torch.sparse.check_sparse_tensor_invariants(False)
>>> with check_ctx:
...   assert torch.sparse.check_sparse_tensor_invariants.is_enabled()
...   with no_check_ctx:
...     assert not torch.sparse.check_sparse_tensor_invariants.is_enabled()
...   assert torch.sparse.check_sparse_tensor_invariants.is_enabled()
...
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
AssertionError
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95372 Approved by: https://github.com/cpuhrsch	2023-02-23 18:22:13 +00:00
Zain Rizvi	808879ec8b	Revert "Implement sparse semantics support in gradcheck (#94714)" (#95386) This reverts commit `7ac511c29a` from https://github.com/pytorch/pytorch/pull/94714 since it breaks periodic. Git thinks there's a merge conflict due to an unfortunately located newline deletion, so reverting this one manually. Details behind the failure in https://github.com/pytorch/pytorch/pull/94714#issuecomment-1442160593 Pull Request resolved: https://github.com/pytorch/pytorch/pull/95386 Approved by: https://github.com/clee2000	2023-02-23 18:02:37 +00:00
Pearu Peterson	cece63f197	Add warn-once deprecation warning to legacy sparse constructors (#94850) Addresses https://github.com/pytorch/pytorch/issues/68323#issuecomment-1425174341 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94850 Approved by: https://github.com/amjames, https://github.com/cpuhrsch	2023-02-23 15:05:12 +00:00
kshitij12345	3b966a6ce3	[autograd] disable backward/grad for complex scalar output (#92753) Fixes https://github.com/pytorch/pytorch/issues/92750 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92753 Approved by: https://github.com/ezyang	2023-02-23 11:38:27 +00:00
Pearu Peterson	7ac511c29a	Implement sparse semantics support in gradcheck (#94714) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94714 Approved by: https://github.com/soulitzer, https://github.com/albanD	2023-02-22 20:03:25 +00:00
Nikita Vedeneev	3ace14eb8b	[Bug fix] sparse_mask: wrong intersection on CUDA (#94829) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94829 Approved by: https://github.com/cpuhrsch	2023-02-15 13:22:39 +00:00
Xuehai Pan	046e88a291	[BE] [3/3] Rewrite `super()` calls in test (#94592) Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.
- #94587
- #94588
- #94592
Also, methods with only a `super()` call are removed:
```diff
 class MyModule(nn.Module):
-    def __init__(self):
-        super().__init__()
-
     def forward(self, ...):
         ...
```
Some cases that change the semantics should be kept unchanged. E.g.: `f152a79be9/caffe2/python/net_printer.py (L184-L190)` `f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/94592 Approved by: https://github.com/ezyang, https://github.com/seemethere	2023-02-12 22:20:53 +00:00
Aaron Gokaslan	3d82d8d0ed	[BE] Enable more flake8-comprehensions checks (#94601) I applied some flake8 fixes and enabled checking for them in the linter. I also enabled some checks for my previous comprehensions PR. This is a follow-up to #94323 where I enable the flake8 checkers for the fixes I made and fix a few more of them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94601 Approved by: https://github.com/ezyang	2023-02-10 23:40:29 +00:00
Huy Do	c53bd0dd30	Mitigate broken test_coalesce_reference_cycle test on dynamo (#94622) The test has been disabled and shows up on https://github.com/pytorch/test-infra/blob/generated-stats/stats/disabled-tests-condensed.json, but then the JSON file downloaded by the runner doesn't seem to have it. Disable it explicitly to keep trunk green while investigating. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94622 Approved by: https://github.com/weiwangmeta	2023-02-10 21:59:36 +00:00
PyTorch MergeBot	76ed1a81d1	Revert "COO intersection kernel: respect value intersection order (#92242)" This reverts commit `b07c839b70`. Reverted https://github.com/pytorch/pytorch/pull/92242 on behalf of https://github.com/jeanschmidt due to breaking vs17	2023-02-09 14:44:32 +00:00
Aleksandar Samardžić	e1f17b3530	Add CSR->BSC and CSC->BSR conversions (#93301) Pull Request resolved: https://github.com/pytorch/pytorch/pull/93301 Approved by: https://github.com/cpuhrsch	2023-02-07 19:22:05 +00:00
Nikita Vedeneev	b07c839b70	COO intersection kernel: respect value intersection order (#92242) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92242 Approved by: https://github.com/cpuhrsch, https://github.com/amjames	2023-02-07 17:05:28 +00:00
Nikita Vedeneev	994f85d639	sparse_mask: extend lhs to sparse COO tensors (#92248) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92248 Approved by: https://github.com/cpuhrsch, https://github.com/pearu	2023-02-01 09:00:07 +00:00
Aleksandar Samardžić	53f7fb9a22	Add CSC->BSC conversion (#92307) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92307 Approved by: https://github.com/cpuhrsch	2023-01-30 17:03:36 +00:00
Pearu Peterson	65d6802e2f	Improve error messages for sparse methods on tensors with unsupported backends/layouts. (#93149) Fixes https://github.com/pytorch/pytorch/issues/92790 Pull Request resolved: https://github.com/pytorch/pytorch/pull/93149 Approved by: https://github.com/cpuhrsch	2023-01-27 19:50:23 +00:00
Pearu Peterson	0e92bbe5b1	Add sparse COO tensor support to torch.sum(dim=..., keepdim=...) (#92979) Fixes #92757, #86232 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92979 Approved by: https://github.com/cpuhrsch	2023-01-26 18:42:51 +00:00
Eddie Yan	0bf7506051	[CUDA] Drop CUDA < 11.0 test flags (#92605) Follow-up of #89582 to drop flags like `CUDA11OrLater` in tests. Note that in some places it appears that `TEST_WITH_ROCM` is _implicitly_ guarded against via the `CUDA11OrLater` version check, based on my best guess of how `torch.version.cuda` would behave in ROCM builds, so I've added `not TEST_WITH_ROCM` in cases where ROCM wasn't previously explicitly allowed. CC @ptrblck @malfet @ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/92605 Approved by: https://github.com/ngimel	2023-01-24 04:34:06 +00:00
Nikita Vedeneev	9f381c9b7f	sparse_sparse_matmul: simplify backward (#91712) Pull Request resolved: https://github.com/pytorch/pytorch/pull/91712 Approved by: https://github.com/albanD	2023-01-23 19:24:28 +00:00
Yanbo Liang	0ab4ab9f8d	[Dynamo] Fix calling UserDefinedObject.func should pass self object (#92050) Fixes #90834 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92050 Approved by: https://github.com/jansel	2023-01-21 05:47:01 +00:00
Pearu Peterson	b3e4f5029b	Add check-sparse-tensor-invariants flag to Context - 2nd try. (#92094) This PR is a copy of https://github.com/pytorch/pytorch/pull/90849, whose merge was reverted. The PR adds a "check sparse tensor invariants" flag to Context that, when enabled, will trigger sparse tensor data invariants checks in unsafe methods of constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to the UI: the `torch.sparse.check_sparse_tensor_invariants` class provides different ways to enable/disable the invariant checking. `torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden. The PR fixes https://github.com/pytorch/pytorch/issues/90833 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92094 Approved by: https://github.com/cpuhrsch	2023-01-13 14:50:33 +00:00
PyTorch MergeBot	c7a22bb7c7	Revert "Add check-sparse-tensor-invariants flag to Context. (#90849)" This reverts commit `b9a035c1c5`. Reverted https://github.com/pytorch/pytorch/pull/90849 on behalf of https://github.com/DanilBaibak due to Break internal build	2023-01-12 09:58:16 +00:00
Aleksandar Samardžić	8612ec5b90	Implement hybrid sparse to/from dense conversions. (#90177) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90177 Approved by: https://github.com/cpuhrsch, https://github.com/pearu	2023-01-12 03:31:30 +00:00
min-jean-cho	af242eedfb	[Inductor] Added aten.uniform_ decomp (#90869) Fixes #90815 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90869 Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/lezcano, https://github.com/ngimel, https://github.com/albanD	2023-01-11 23:23:42 +00:00
Pearu Peterson	b9a035c1c5	Add check-sparse-tensor-invariants flag to Context. (#90849) This PR adds a "check sparse tensor invariants" flag to Context that, when enabled, will trigger sparse tensor data invariants checks in unsafe methods of constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to the UI:
- `torch.enable_check_sparse_tensor_invariants` and `torch.is_check_sparse_tensor_invariants_enabled` functions to globally enable/disable the invariant checks and to retrieve the state of the feature, respectively
- `torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden.
The PR also fixes https://github.com/pytorch/pytorch/issues/90833
# Main issue
The following content is outdated after merging the PRs in this ghstack but kept for the record. The importance of this feature is that when enabling the invariants checks by default, say, via
<details>
```
$ git diff
diff --git a/torch/__init__.py b/torch/__init__.py
index c8543057c7..19a91d0482 100644
--- a/torch/__init__.py
+++ b/torch/__init__.py
@@ -1239,3 +1239,8 @@ if 'TORCH_CUDA_SANITIZER' in os.environ:
 # Populate magic methods on SymInt and SymFloat
 import torch.fx.experimental.symbolic_shapes
+
+# temporarily enable sparse tensor arguments validation in unsafe
+# constructors:
+
+torch._C._set_check_sparse_tensor_invariants(True)
```
</details>
a massive number of test failures/errors occur in test_sparse_csr.py tests:
```
$ pytest -sv test/test_sparse_csr.py
<snip>
==== 4293 failed, 1557 passed, 237 skipped, 2744 errors in 69.71s (0:01:09) ====
```
This means that we are silently constructing sparse compressed tensors that do not satisfy the sparse tensor invariants. In particular, the following errors are raised:
```
AssertionError: "resize_as_sparse_compressed_tensor_: self and src must have the same layout" does not match "expected values to be a strided and contiguous tensor"
RuntimeError: CUDA error: device-side assert triggered
RuntimeError: `col_indices[..., crow_indices[..., i - 1]:crow_indices[..., i]] for all i = 1, ..., nrows are sorted and distinct along the last dimension values` is not satisfied.
RuntimeError: expected col_indices to be a strided and contiguous tensor
RuntimeError: expected row_indices to be a strided and contiguous tensor
RuntimeError: expected values to be a strided and contiguous tensor
RuntimeError: for_each: failed to synchronize: cudaErrorAssert: device-side assert triggered
RuntimeError: tensor dimensionality must be sum of batch, base, and dense dimensionalities (=0 + 2 + 0) but got 3
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90849 Approved by: https://github.com/amjames, https://github.com/cpuhrsch	2023-01-11 01:05:14 +00:00
anjali411	c887837ec3	Reland "Fix dynamo handling for tensor attributes: T, H, mT, mH (#90463)" (#91897) This reverts commit `84266ae670`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91897 Approved by: https://github.com/ngimel	2023-01-10 08:16:07 +00:00
PyTorch MergeBot	84266ae670	Revert "Fix dynamo handling for tensor attributes: T, H, mT, mH (#90463)" This reverts commit `9945a78a94`. Reverted https://github.com/pytorch/pytorch/pull/90463 on behalf of https://github.com/ZainRizvi due to This is causing test failures: FAILED inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_pinv_singular_cuda_float64 - RuntimeError: unexpected success linalg.pinv.singular, torch.float64, cuda	2023-01-09 16:43:36 +00:00
anjali411	9945a78a94	Fix dynamo handling for tensor attributes: T, H, mT, mH (#90463) Fixes https://github.com/pytorch/pytorch/issues/88843 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90463 Approved by: https://github.com/ngimel	2023-01-09 04:11:23 +00:00
Nikita Vedeneev	7ef7c57ae7	CSC/BSC -> COO coalesce fix (#91440) Fixes https://github.com/pytorch/pytorch/issues/91010. CSC and BSC sparse formats are not inherently `coalesced`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91440 Approved by: https://github.com/pearu, https://github.com/amjames, https://github.com/cpuhrsch	2023-01-03 18:42:39 +00:00
Pearu Peterson	b797a24259	Support indices contiguity per batch and non-contiguous values in sparse compressed tensors (#91243) Fixes https://github.com/pytorch/pytorch/issues/91062 With this PR, all reported failures in https://github.com/pytorch/pytorch/pull/90849 are resolved (modulo test_bmm that uses an unorthodox way to construct a batch CSR tensor). Pull Request resolved: https://github.com/pytorch/pytorch/pull/91243 Approved by: https://github.com/nikitaved, https://github.com/amjames, https://github.com/lezcano	2023-01-02 18:08:46 +00:00
Nikita Vedeneev	1768a28a20	`COO @ COO`: fix to always produce coalesced outputs. (#91094) Fixes [#90516](https://github.com/pytorch/pytorch/issues/90516) Fixes [#90538](https://github.com/pytorch/pytorch/issues/90538) Pull Request resolved: https://github.com/pytorch/pytorch/pull/91094 Approved by: https://github.com/pearu	2022-12-27 21:32:14 +00:00
Pearu Peterson	8004f934cd	Fix CSR with int32 indices to CSC conversion (#91061) Fixes https://github.com/pytorch/pytorch/issues/91007 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91061 Approved by: https://github.com/nikitaved	2022-12-18 13:53:25 +00:00
Pearu Peterson	01e7f46215	Ensure sorted indices from the CSR->BSR conversion (#90918) Fixes https://github.com/pytorch/pytorch/issues/90910 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90918 Approved by: https://github.com/cpuhrsch	2022-12-16 15:49:48 +00:00
Edward Z. Yang	e686a442b4	If a torch.* returns non-Tensor, make this unimplemented rather than assert. (#89918) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89918 Approved by: https://github.com/albanD	2022-12-15 21:53:54 +00:00
Pearu Peterson	a60d712010	Support (non-batch) BSR/BSC to COO sparse tensor conversions (#90718) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90718 Approved by: https://github.com/cpuhrsch	2022-12-14 05:37:05 +00:00
Pearu Peterson	76c6dfeaa6	Add layout and blocksize arguments to Tensor.to_sparse method (#89502) This PR extends the `Tensor.to_sparse()` method to `Tensor.to_sparse(layout=None, blocksize=None)` in a BC manner (`layout=None` means `layout=torch.sparse_coo`). In addition, the PR adds support for the following conversions: - non-hybrid/hybrid COO tensor to CSR or CSC or a COO tensor - short, bool, byte, char, bfloat16, int, long, half CSR tensor to a BSR tensor and fixes the following conversions: - hybrid COO to COO tensor - non-batch/batch hybrid BSR to BSR or BSC tensor Pull Request resolved: https://github.com/pytorch/pytorch/pull/89502 Approved by: https://github.com/amjames, https://github.com/cpuhrsch	2022-11-30 20:21:10 +00:00
Pearu Peterson	296e1ba4d0	Row and column select support for block compressed sparse tensors (#88733) As in the title:
- Support `select` and `select_copy` on block sparse compressed tensors
- Fixes incorrect results when selecting dense dimensions
The PR also improves the performance of indexing sparse compressed tensors considerably:
<details>
Before:
```python
In [3]: a=torch.rand((1000, 1000)).to_sparse_csr()
In [4]: %timeit a.select(0, 0)
606 µs ± 4.27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [5]: %timeit a.select(1, 0)
527 µs ± 57.7 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [6]: %timeit a[0, 0]
617 µs ± 3.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [7]: a = a.cuda()
In [8]: %timeit a.select(0, 0); torch.cuda.synchronize();
1.19 ms ± 137 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [9]: %timeit a.select(1, 0); torch.cuda.synchronize();
1.2 ms ± 119 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [10]: %timeit a[0, 0]; torch.cuda.synchronize();
1.23 ms ± 482 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
This PR:
```python
In [3]: a=torch.rand((1000, 1000)).to_sparse_csr()
In [4]: %timeit a.select(0, 0)
4.75 µs ± 8.94 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [5]: %timeit a.select(1, 0)
565 µs ± 156 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [6]: %timeit a[0, 0]
13.1 µs ± 435 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [7]: a = a.cuda()
In [8]: %timeit a.select(0, 0); torch.cuda.synchronize();
21.6 µs ± 23.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [9]: %timeit a.select(1, 0); torch.cuda.synchronize();
1.15 ms ± 3.13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [10]: %timeit a[0, 0]; torch.cuda.synchronize();
63.7 µs ± 2.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88733 Approved by: https://github.com/nikitaved, https://github.com/amjames, https://github.com/cpuhrsch	2022-11-30 11:15:56 +00:00
Pearu Peterson	90bed8874f	Generator of tensor inputs with variable layout and structure (batch/non-batch, hybrid/non-hybrid, block/non-block) (#88914) This PR introduces `TestCase.generate_simple_inputs` method that is an improved and generalized version of the `TestSparseCompressed._generate_small_inputs` method. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88914 Approved by: https://github.com/cpuhrsch	2022-11-30 02:13:33 +00:00
Kazuaki Ishizaki	088f2fa567	Fix typos in messages under test (#89121) This PR fixes typos of messages in `.cpp` and `.py` files under test directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89121 Approved by: https://github.com/mruberry, https://github.com/kit1980	2022-11-17 01:55:03 +00:00
Andrew M. James	ff6770a9a1	enable backward for log1p (sparse layouts) (#88155) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88155 Approved by: https://github.com/cpuhrsch	2022-11-04 20:59:26 +00:00
jpvillam	1e1b045128	[ROCM] Enable Sparse Pickle Test (#82729) Missed stream context for serialization ### Description Missing ROCm stream context on memory operations for serialization ### Testing Ran the sparse pickle test Pull Request resolved: https://github.com/pytorch/pytorch/pull/82729 Approved by: https://github.com/ngimel	2022-10-27 15:11:28 +00:00
Pearu Peterson	88b882cd1c	Support sum on a sparse COO tensor. (#86300) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86300 Approved by: https://github.com/cpuhrsch	2022-10-06 18:39:28 +00:00
George Qi	686555b663	[maskedtensor] port torch/_masked into torch/masked (#85515) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85515 Approved by: https://github.com/cpuhrsch	2022-09-26 23:41:13 +00:00
Elias Ellison	bcc544e9d7	Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417 Approved by: https://github.com/ezyang	2022-09-26 17:08:14 +00:00
nikitaved	12ae3bea43	Faster mul(sparse, sparse) with broadcasting in dense dims. (#85336) This is a combo PR of https://github.com/pytorch/pytorch/pull/84929 and ~https://github.com/pytorch/pytorch/pull/83428~. Preliminary benchmarks (square matrices of shape (n, n)).
<details> <summary>Script</summary>
```python
import torch
import math
from IPython import get_ipython
from itertools import product, repeat
import pickle
from torch.utils.benchmark import Timer, Compare

torch.manual_seed(13)

problem_dims = (
    # n > nnz
    (10000, 100),
    (100000, 1000),
    (1000000, 10000),
    # n < nnz
    (10, 100),
    (10, 1000),
    (10, 10000),
    (100, 1000),
    (100, 10000),
    (1000, 10000),
    (1000, 100000),
    (1000, 1000000),
    #(1000000, 1000000000),
)

name = "PR"
device = "cuda"
results = []

for n, nnz in problem_dims:
    def gen_tensor(coalesce=False):
        shape = (n, n)
        nrows, ncols = shape
        rowidx = torch.randint(low=0, high=nrows, size=(nnz,), device=device)
        colidx = torch.randint(low=0, high=ncols, size=(nnz,), device=device)
        itemidx = torch.vstack((rowidx, colidx))
        xvalues = torch.randn(nnz, device=device)
        itemidx = torch.hstack((itemidx, itemidx))
        xvalues = torch.hstack((xvalues, xvalues))
        res = torch.sparse_coo_tensor(itemidx, xvalues, size=shape)
        if coalesce:
            return res.coalesce()
        else:
            return res

    for x_coalesce, y_coalesce in product(*repeat((True, False), 2)):
        x = gen_tensor(x_coalesce)
        y = gen_tensor(y_coalesce)
        smtp = "x * y"
        timer = Timer(smtp,
                      globals=globals(),
                      label="coo.mul",
                      description=f"{name}: mul, device: {device}",
                      sub_label=f"n={n}, nnz={nnz}, coalesce=({x_coalesce, y_coalesce})",
                      num_threads=torch.get_num_threads())
        results.append(timer.blocked_autorange())

compare = Compare(results)
compare.trim_significant_figures()
compare.print()

with open(f"{name}_{device}_mul.pickle", 'wb') as f:
    pickle.dump(results, f)
```
</details>
<details> <summary>Gather results</summary>
```python
import pickle
from torch.utils.benchmark import Timer, Compare

files = [
    "PR",
    "master"
]

device = 'cuda'

timers = []
for name in files:
    with open("{}_{}_mul.pickle".format(name, device), 'rb') as f:
        timers += pickle.load(f)

compare = Compare(timers)
compare.trim_significant_figures()
compare.print()
```
</details>
<details> <summary>CUDA</summary>
```
[------------------------------------------------- coo.mul -------------------------------------------------]
                                                 |  PR: mul, device: cuda  |  master: mul, device: cuda
24 threads: -------------------------------------------------------------------------------------------------
n=10000, nnz=100, coalesce=((True, True))        |            95           |              91
n=10000, nnz=100, coalesce=((True, False))       |            87           |             242
n=10000, nnz=100, coalesce=((False, True))       |            87           |             226
n=10000, nnz=100, coalesce=((False, False))      |           130           |             371
n=100000, nnz=1000, coalesce=((True, True))      |           100           |             521
n=100000, nnz=1000, coalesce=((True, False))     |            90           |             649
n=100000, nnz=1000, coalesce=((False, True))     |           100           |             659
n=100000, nnz=1000, coalesce=((False, False))    |           200           |             781
n=1000000, nnz=10000, coalesce=((True, True))    |           100           |            4861
n=1000000, nnz=10000, coalesce=((True, False))   |           100           |            5012
n=1000000, nnz=10000, coalesce=((False, True))   |            98           |            5010
n=1000000, nnz=10000, coalesce=((False, False))  |           384           |            5174
n=10, nnz=100, coalesce=((True, True))           |           100           |              79
n=10, nnz=100, coalesce=((True, False))          |           100           |             221
n=10, nnz=100, coalesce=((False, True))          |           100           |             221
n=10, nnz=100, coalesce=((False, False))         |           100           |             350
n=10, nnz=1000, coalesce=((True, True))          |           100           |             100
n=10, nnz=1000, coalesce=((True, False))         |           100           |             240
n=10, nnz=1000, coalesce=((False, True))         |           100           |             254
n=10, nnz=1000, coalesce=((False, False))        |           100           |             392
n=10, nnz=10000, coalesce=((True, True))         |           100           |             110
n=10, nnz=10000, coalesce=((True, False))        |           110           |             286
n=10, nnz=10000, coalesce=((False, True))        |           110           |             286
n=10, nnz=10000, coalesce=((False, False))       |           271           |             455
n=100, nnz=1000, coalesce=((True, True))         |           110           |             851
n=100, nnz=1000, coalesce=((True, False))        |           110           |            1000
n=100, nnz=1000, coalesce=((False, True))        |           110           |             990
n=100, nnz=1000, coalesce=((False, False))       |           140           |            1124
n=100, nnz=10000, coalesce=((True, True))        |           110           |            5137
n=100, nnz=10000, coalesce=((True, False))       |           110           |            5391
n=100, nnz=10000, coalesce=((False, True))       |           100           |            5405
n=100, nnz=10000, coalesce=((False, False))      |           249           |            5539
n=1000, nnz=10000, coalesce=((True, True))       |           100           |            8598
n=1000, nnz=10000, coalesce=((True, False))      |           100           |            8800
n=1000, nnz=10000, coalesce=((False, True))      |           100           |            8782
n=1000, nnz=10000, coalesce=((False, False))     |           255           |            8956
n=1000, nnz=100000, coalesce=((True, True))      |           120           |           84500
n=1000, nnz=100000, coalesce=((True, False))     |           200           |           88560
n=1000, nnz=100000, coalesce=((False, True))     |           160           |           89000
n=1000, nnz=100000, coalesce=((False, False))    |           373           |           89000
n=1000, nnz=1000000, coalesce=((True, True))     |           312           |          606400
n=1000, nnz=1000000, coalesce=((True, False))    |          1340           |          609200
n=1000, nnz=1000000, coalesce=((False, True))    |          1340           |          609100
n=1000, nnz=1000000, coalesce=((False, False))   |          4408           |          611400

Times are in microseconds (us).
```
</details>
<details> <summary>CPU</summary>
```
[------------------------------------------------ coo.mul ------------------------------------------------]
                                                 |  PR: mul, device: cpu   |  master: mul, device: cpu
24 threads: -----------------------------------------------------------------------------------------------
n=10000, nnz=100, coalesce=((True, True))        |             8           |               8
n=10000, nnz=100, coalesce=((True, False))       |            32           |              34
n=10000, nnz=100, coalesce=((False, True))       |            32           |              34
n=10000, nnz=100, coalesce=((False, False))      |            41           |              56
n=100000, nnz=1000, coalesce=((True, True))      |            24           |              24
n=100000, nnz=1000, coalesce=((True, False))     |            90           |             100
n=100000, nnz=1000, coalesce=((False, True))     |            87           |             100
n=100000, nnz=1000, coalesce=((False, False))    |           231           |             255
n=1000000, nnz=10000, coalesce=((True, True))    |           190           |             200
n=1000000, nnz=10000, coalesce=((True, False))   |           908           |            2023
n=1000000, nnz=10000, coalesce=((False, True))   |           800           |            2036
n=1000000, nnz=10000, coalesce=((False, False))  |          3684           |            3989
n=10, nnz=100, coalesce=((True, True))           |             8           |               7
n=10, nnz=100, coalesce=((True, False))          |            34           |              30
n=10, nnz=100, coalesce=((False, True))          |            33           |              30
n=10, nnz=100, coalesce=((False, False))         |            44           |              50
n=10, nnz=1000, coalesce=((True, True))          |             8           |               7
n=10, nnz=1000, coalesce=((True, False))         |           100           |             100
n=10, nnz=1000, coalesce=((False, True))         |           130           |             100
n=10, nnz=1000, coalesce=((False, False))        |           746           |             210
n=10, nnz=10000, coalesce=((True, True))         |             8           |               7
n=10, nnz=10000, coalesce=((True, False))        |          1000           |            1500
n=10, nnz=10000, coalesce=((False, True))        |          1000           |            1510
n=10, nnz=10000, coalesce=((False, False))       |          3063           |            2457
n=100, nnz=1000, coalesce=((True, True))         |            25           |              25
n=100, nnz=1000, coalesce=((True, False))        |           180           |             130
n=100, nnz=1000, coalesce=((False, True))        |           200           |             130
n=100, nnz=1000, coalesce=((False, False))       |           271           |             255
n=100, nnz=10000, coalesce=((True, True))        |           100           |             100
n=100, nnz=10000, coalesce=((True, False))       |          2444           |            2290
n=100, nnz=10000, coalesce=((False, True))       |          2455           |            2357
n=100, nnz=10000, coalesce=((False, False))      |          5316           |            3783
n=1000, nnz=10000, coalesce=((True, True))       |           204           |             211
n=1000, nnz=10000, coalesce=((True, False))      |          2457           |            2480
n=1000, nnz=10000, coalesce=((False, True))      |          2448           |            2539
n=1000, nnz=10000, coalesce=((False, False))     |          3665           |            4801
n=1000, nnz=100000, coalesce=((True, True))      |          2293           |            2374
n=1000, nnz=100000, coalesce=((True, False))     |          9000           |           24620
n=1000, nnz=100000, coalesce=((False, True))     |          8000           |           25080
n=1000, nnz=100000, coalesce=((False, False))    |         26500           |           47650
n=1000, nnz=1000000, coalesce=((True, True))     |         10000           |           13000
n=1000, nnz=1000000, coalesce=((True, False))    |         80000           |          362200
n=1000, nnz=1000000, coalesce=((False, True))    |         78050           |          392600
n=1000, nnz=1000000, coalesce=((False, False))   |        312100           |          766900

Times are in microseconds (us).
```
</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85336 Approved by: https://github.com/cpuhrsch	2022-09-23 23:31:19 +00:00
PyTorch MergeBot	d10de31cc8	Revert "Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)" This reverts commit `78afa0cf0c`. Reverted https://github.com/pytorch/pytorch/pull/85417 on behalf of https://github.com/clee2000 due to broke tests on trunk `78afa0cf0c`	2022-09-23 17:21:43 +00:00
Elias Ellison	78afa0cf0c	Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417 Approved by: https://github.com/ezyang	2022-09-23 15:50:03 +00:00
PyTorch MergeBot	5043457a8e	Revert "Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)" This reverts commit `9c77083965`. Reverted https://github.com/pytorch/pytorch/pull/85417 on behalf of https://github.com/clee2000 due to broke tests on trunk (and pull somehow) `9c77083965`	2022-09-22 15:44:38 +00:00
Elias Ellison	9c77083965	Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417 Approved by: https://github.com/ezyang	2022-09-22 13:03:57 +00:00
Elias Ellison	d9aa6dfe88	Add Fake Cross Ref Mode, migrate sparse to it (#85382) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85382 Approved by: https://github.com/ezyang	2022-09-21 17:15:47 +00:00
PyTorch MergeBot	81620c3360	Revert "Faster mul(sparse, sparse) with broadcasting in dense dims. (#83428)" This reverts commit `d49943bda8`. Reverted https://github.com/pytorch/pytorch/pull/83428 on behalf of https://github.com/osalpekar due to Reverted because __restrict symbol not supported by certain MSVC compilers, leading to undefined symbol error at compilation time	2022-09-17 06:53:11 +00:00
nikitaved	d49943bda8	Faster mul(sparse, sparse) with broadcasting in dense dims. (#83428 ) Preliminary benchmarks (square matrices of shape (n, n)). <details> <summary>Script</summary> ```python import torch import math from IPython import get_ipython from itertools import product, repeat import pickle from torch.utils.benchmark import Timer, Compare torch.manual_seed(13) # specifies (n, nnz) problem_dims = ( # n > nnz (10000, 100), (100000, 1000), (1000000, 10000), # n < nnz (10, 100), (10, 1000), (10, 10000), (100, 1000), (100, 10000), (1000, 10000), (1000, 100000), (1000, 1000000), #(1000000, 1000000000), ) name = "PR" device = "cuda" results = [] for n, nnz in problem_dims: def gen_tensor(coalesce=False): shape = (n, n) nrows, ncols = shape rowidx = torch.randint(low=0, high=nrows, size=(nnz,), device=device) colidx = torch.randint(low=0, high=ncols, size=(nnz,), device=device) itemidx = torch.vstack((rowidx, colidx)) xvalues = torch.randn(nnz, device=device) itemidx = torch.hstack((itemidx, itemidx)) xvalues = torch.hstack((xvalues, xvalues)) res = torch.sparse_coo_tensor(itemidx, xvalues, size=shape) if coalesce: return res.coalesce() else: return res for x_coalesce, y_coalesce in product(*repeat((True, False), 2)): x = gen_tensor(x_coalesce) y = gen_tensor(y_coalesce) smtp = "x * y" timer = Timer(smtp, globals=globals(), label="coo.mul", description=f"{name}: mul, device: {device}", sub_label=f"n={n}, nnz={nnz}, coalesce=({x_coalesce, y_coalesce})", num_threads=torch.get_num_threads()) results.append(timer.blocked_autorange()) compare = Compare(results) compare.trim_significant_figures() compare.print() with open(f"{name}_{device}_mul.pickle", 'wb') as f: pickle.dump(results, f) ``` </details> <details> <summary>Gather results</summary> ```python import pickle from torch.utils.benchmark import Timer, Compare files = [ "PR", "master" ] device = 'cuda' timers = [] for name in files: with open("{}_{}_mul.pickle".format(name, device), 'rb') as f: timers += 
pickle.load(f) compare = Compare(timers) compare.trim_significant_figures() compare.print() ``` </details> <details> <summary>CUDA</summary> ``` [------------------------------------------------- coo.mul -------------------------------------------------] \| PR: mul, device: cuda \| master: mul, device: cuda 24 threads: ------------------------------------------------------------------------------------------------- n=10000, nnz=100, coalesce=((True, True)) \| 95 \| 91 n=10000, nnz=100, coalesce=((True, False)) \| 87 \| 242 n=10000, nnz=100, coalesce=((False, True)) \| 87 \| 226 n=10000, nnz=100, coalesce=((False, False)) \| 130 \| 371 n=100000, nnz=1000, coalesce=((True, True)) \| 100 \| 521 n=100000, nnz=1000, coalesce=((True, False)) \| 90 \| 649 n=100000, nnz=1000, coalesce=((False, True)) \| 100 \| 659 n=100000, nnz=1000, coalesce=((False, False)) \| 200 \| 781 n=1000000, nnz=10000, coalesce=((True, True)) \| 100 \| 4861 n=1000000, nnz=10000, coalesce=((True, False)) \| 100 \| 5012 n=1000000, nnz=10000, coalesce=((False, True)) \| 98 \| 5010 n=1000000, nnz=10000, coalesce=((False, False)) \| 384 \| 5174 n=10, nnz=100, coalesce=((True, True)) \| 100 \| 79 n=10, nnz=100, coalesce=((True, False)) \| 100 \| 221 n=10, nnz=100, coalesce=((False, True)) \| 100 \| 221 n=10, nnz=100, coalesce=((False, False)) \| 100 \| 350 n=10, nnz=1000, coalesce=((True, True)) \| 100 \| 100 n=10, nnz=1000, coalesce=((True, False)) \| 100 \| 240 n=10, nnz=1000, coalesce=((False, True)) \| 100 \| 254 n=10, nnz=1000, coalesce=((False, False)) \| 100 \| 392 n=10, nnz=10000, coalesce=((True, True)) \| 100 \| 110 n=10, nnz=10000, coalesce=((True, False)) \| 110 \| 286 n=10, nnz=10000, coalesce=((False, True)) \| 110 \| 286 n=10, nnz=10000, coalesce=((False, False)) \| 271 \| 455 n=100, nnz=1000, coalesce=((True, True)) \| 110 \| 851 n=100, nnz=1000, coalesce=((True, False)) \| 110 \| 1000 n=100, nnz=1000, coalesce=((False, True)) \| 110 \| 990 n=100, nnz=1000, coalesce=((False, False)) \| 
140 \| 1124 n=100, nnz=10000, coalesce=((True, True)) \| 110 \| 5137 n=100, nnz=10000, coalesce=((True, False)) \| 110 \| 5391 n=100, nnz=10000, coalesce=((False, True)) \| 100 \| 5405 n=100, nnz=10000, coalesce=((False, False)) \| 249 \| 5539 n=1000, nnz=10000, coalesce=((True, True)) \| 100 \| 8598 n=1000, nnz=10000, coalesce=((True, False)) \| 100 \| 8800 n=1000, nnz=10000, coalesce=((False, True)) \| 100 \| 8782 n=1000, nnz=10000, coalesce=((False, False)) \| 255 \| 8956 n=1000, nnz=100000, coalesce=((True, True)) \| 120 \| 84500 n=1000, nnz=100000, coalesce=((True, False)) \| 200 \| 88560 n=1000, nnz=100000, coalesce=((False, True)) \| 160 \| 89000 n=1000, nnz=100000, coalesce=((False, False)) \| 373 \| 89000 n=1000, nnz=1000000, coalesce=((True, True)) \| 312 \| 606400 n=1000, nnz=1000000, coalesce=((True, False)) \| 1340 \| 609200 n=1000, nnz=1000000, coalesce=((False, True)) \| 1340 \| 609100 n=1000, nnz=1000000, coalesce=((False, False)) \| 4408 \| 611400 Times are in microseconds (us). 
``` </details> <details> <summary>CPU</summary> ``` [------------------------------------------------ coo.mul ------------------------------------------------] \| PR: mul, device: cpu \| master: mul, device: cpu 24 threads: ----------------------------------------------------------------------------------------------- n=10000, nnz=100, coalesce=((True, True)) \| 8 \| 8 n=10000, nnz=100, coalesce=((True, False)) \| 32 \| 34 n=10000, nnz=100, coalesce=((False, True)) \| 32 \| 34 n=10000, nnz=100, coalesce=((False, False)) \| 41 \| 56 n=100000, nnz=1000, coalesce=((True, True)) \| 24 \| 24 n=100000, nnz=1000, coalesce=((True, False)) \| 90 \| 100 n=100000, nnz=1000, coalesce=((False, True)) \| 87 \| 100 n=100000, nnz=1000, coalesce=((False, False)) \| 231 \| 255 n=1000000, nnz=10000, coalesce=((True, True)) \| 190 \| 200 n=1000000, nnz=10000, coalesce=((True, False)) \| 908 \| 2023 n=1000000, nnz=10000, coalesce=((False, True)) \| 800 \| 2036 n=1000000, nnz=10000, coalesce=((False, False)) \| 3684 \| 3989 n=10, nnz=100, coalesce=((True, True)) \| 8 \| 7 n=10, nnz=100, coalesce=((True, False)) \| 34 \| 30 n=10, nnz=100, coalesce=((False, True)) \| 33 \| 30 n=10, nnz=100, coalesce=((False, False)) \| 44 \| 50 n=10, nnz=1000, coalesce=((True, True)) \| 8 \| 7 n=10, nnz=1000, coalesce=((True, False)) \| 100 \| 100 n=10, nnz=1000, coalesce=((False, True)) \| 130 \| 100 n=10, nnz=1000, coalesce=((False, False)) \| 746 \| 210 n=10, nnz=10000, coalesce=((True, True)) \| 8 \| 7 n=10, nnz=10000, coalesce=((True, False)) \| 1000 \| 1500 n=10, nnz=10000, coalesce=((False, True)) \| 1000 \| 1510 n=10, nnz=10000, coalesce=((False, False)) \| 3063 \| 2457 n=100, nnz=1000, coalesce=((True, True)) \| 25 \| 25 n=100, nnz=1000, coalesce=((True, False)) \| 180 \| 130 n=100, nnz=1000, coalesce=((False, True)) \| 200 \| 130 n=100, nnz=1000, coalesce=((False, False)) \| 271 \| 255 n=100, nnz=10000, coalesce=((True, True)) \| 100 \| 100 n=100, nnz=10000, coalesce=((True, False)) \| 2444 \| 
2290 n=100, nnz=10000, coalesce=((False, True)) \| 2455 \| 2357 n=100, nnz=10000, coalesce=((False, False)) \| 5316 \| 3783 n=1000, nnz=10000, coalesce=((True, True)) \| 204 \| 211 n=1000, nnz=10000, coalesce=((True, False)) \| 2457 \| 2480 n=1000, nnz=10000, coalesce=((False, True)) \| 2448 \| 2539 n=1000, nnz=10000, coalesce=((False, False)) \| 3665 \| 4801 n=1000, nnz=100000, coalesce=((True, True)) \| 2293 \| 2374 n=1000, nnz=100000, coalesce=((True, False)) \| 9000 \| 24620 n=1000, nnz=100000, coalesce=((False, True)) \| 8000 \| 25080 n=1000, nnz=100000, coalesce=((False, False)) \| 26500 \| 47650 n=1000, nnz=1000000, coalesce=((True, True)) \| 10000 \| 13000 n=1000, nnz=1000000, coalesce=((True, False)) \| 80000 \| 362200 n=1000, nnz=1000000, coalesce=((False, True)) \| 78050 \| 392600 n=1000, nnz=1000000, coalesce=((False, False)) \| 312100 \| 766900 Times are in microseconds (us). ``` </details> Pull Request resolved: https://github.com/pytorch/pytorch/pull/83428 Approved by: https://github.com/cpuhrsch	2022-09-16 00:28:40 +00:00
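The benchmark script above deliberately duplicates every coordinate (`torch.hstack((itemidx, itemidx))`), so uncoalesced inputs carry repeated indices that have to be summed before multiplying, because `(a + a) * (b + b)` is not `a * b + a * b`. The pure-Python sketch below shows that reference semantics for elementwise multiplication of two same-shape 2-D COO matrices. It assumes a hypothetical representation of a COO matrix as a list of `((row, col), value)` pairs and is only an illustration, not the algorithm implemented in this PR.

```python
from collections import defaultdict


def coalesce(entries):
    """Sum values that share a coordinate (what COO coalescing does)."""
    acc = defaultdict(float)
    for coord, value in entries:
        acc[coord] += value
    # Sort by coordinate so the result is in canonical row-major order.
    return sorted(acc.items())


def coo_mul(x_entries, y_entries):
    """Elementwise product of two COO matrices of the same shape.

    Only coordinates present in both (coalesced) operands can be nonzero,
    so the result is built from the intersection of the two index sets.
    """
    x = dict(coalesce(x_entries))
    y = dict(coalesce(y_entries))
    common = sorted(x.keys() & y.keys())
    return [(coord, x[coord] * y[coord]) for coord in common]


# Each coordinate appears twice, mirroring the duplicated indices in the benchmark.
x = [((0, 1), 2.0), ((2, 2), 1.0), ((0, 1), 2.0), ((2, 2), 1.0)]
y = [((0, 1), 3.0), ((1, 0), 5.0), ((0, 1), 3.0), ((1, 0), 5.0)]
print(coo_mul(x, y))  # [((0, 1), 24.0)]
```

Multiplying the duplicated entries pairwise without coalescing first would give `2 * (2.0 * 3.0) = 12.0` instead of the correct `4.0 * 6.0 = 24.0`, which is one reason uncoalesced inputs cost extra work.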
Edward Z. Yang	c5a8946e40	Revert "Revert "Redo how custom/python_custom methods on TensorImpl work (#84796 )" (#84806 ) This reverts commit `ca3b2bfbe3`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84806 Approved by: https://github.com/Chillee	2022-09-10 06:17:35 +00:00
Eli Uriegas	ca3b2bfbe3	Revert "Redo how custom/python_custom methods on TensorImpl work (#84796 ) This reverts commit `591b75bf98`. Manual revert of https://github.com/pytorch/pytorch/pull/84641 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84796 Approved by: https://github.com/izaitsevfb	2022-09-10 00:18:13 +00:00
Edward Z. Yang	591b75bf98	Redo how custom/python_custom methods on TensorImpl work (#84641 ) A longstanding confusion in the implementation of fake tensor and proxy tensor is what to do about torch.ops.aten.sym_sizes and related calls. In particular, when you have a tensor that (1) has symbolic shapes and (2) has a `__torch_dispatch__` call, previously, you would always get `__torch_dispatch__` calls for sizes/strides queries, even if you didn't request them via the dispatch kwargs in `make_wrapper_subclass`. The reason for this is that we were previously mixing several concepts: "I want to dispatch to Python", "I want to call a virtual method" and "I have dynamic shapes". A single boolean variable controlled all of these things, and so it was not possible to understand inside TensorImpl what the user had actually originally requested. In this PR, we track each of these concepts individually so that we can preserve user intent. Then, we combine these into a single "policy" variable that controls whether we can use the fastpath. For the policy to trigger, we only need one of the exceptional cases to be true. Billing of changes: * Rename `set_sizes_strides_policy` to `set_custom_sizes_strides`; in general, you cannot DIRECTLY set policy; you have to set it indirectly through the public functions. * Some helpers for sizes and strides, since it's more complicated (as it is an enum, rather than just bools as is the case for device and layout). `matches_python_custom` is used to test the Python dispatch user ask. `matches_policy` does the policy test (only used in the user facing functions.) * I reorganized the accessor methods so that they are more logical. This makes the diff bad, so I recommend reading the final code directly. 
* The default custom implementations now more reliably call their default() implementations * As a bonus refactor, I devirtualized some functions that don't need to be virtual * `set_sym_sizes_and_strides` is renamed to `set_sizes_and_strides` to make it easier to use in template contexts; it optionally takes a storage offset now so you can set all three values at the same time. If you use the SymInt overload but there are no symbolic integers, we give you a normal resize. * This adds `sym_storage_offset` since we had that in the symbolic shapes branch and there's no reason not to put it in (and it reduces merge conflicts) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/84641 Approved by: https://github.com/wconstab	2022-09-09 13:41:13 +00:00
Elias Ellison	15c5baf878	Throw on data dependent ops (#83567 ) Previously, we would trace through the following with no error: ``` from torch.fx.experimental.proxy_tensor import make_fx import torch def f(x, y): return x[0, y:] ``` even though the output shape is dependent on the data of `y`. Now, throw on the conversion of `y` to an integer. It would be nice to not break on constant tensors but I'll do that as the next PR (Edit: done with https://github.com/pytorch/pytorch/pull/84387). Sketching out how that would work (and keep in mind this is applicable to Dynamo tracing and not just AOT Autograd), I think you would need to: - hold strong refs to a set of constant tensors, and only allow them to be captured from `lift_fresh.copy` - when you run a mutable op, either remove it from the set of constant tensors or run the operator for real - limit to small constant tensors Anything else? Pull Request resolved: https://github.com/pytorch/pytorch/pull/83567 Approved by: https://github.com/ezyang	2022-09-07 02:37:00 +00:00
Andrew M. James	6dc9223c8b	Sparse_coo: Be more aggressive in setting coalesced True to avoid surprising behaviors (#82426 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82426 Approved by: https://github.com/pearu, https://github.com/bhosmer	2022-09-01 17:46:51 +00:00
jpvillam	247468baf0	[ROCm] More Sparse UTs enablement and more hipification mappings. (#78939 ) Enables: test_bmm_cuda_float64 test_bmm_deterministic_cuda_float64 test_csr_matvec_cuda_complex128 test_csr_matvec_cuda_complex64 test_csr_matvec_cuda_float32 test_csr_matvec_cuda_float64 To enable the above tests had to add some more hip mappings for the hipification process. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78939 Approved by: https://github.com/pruthvistony, https://github.com/malfet	2022-08-23 13:54:09 +00:00
Brian Hirsh	0c24af4985	Always allow tensor metadata changes (#83590 ) Make it so that it is valid to set metadata after detach calls, like `x.detach().resize_(...)`. This technically lifts some restrictions around `.data`. This PR means that you can now technically call `x.data.resize_(...)`, which can now directly resize `x` instead of erroring. My understanding: Before the tensor-variable merge, when `x` and `x.data` were really different tensors, you could resize `x.data` independently of `x`, and during the merge, this error was added to avoid silent confusing behavior changes. It was agreed that this error has been around long enough (several years) that it's acceptable to drop. cc @albanD @ezyang. (Ed already had a prototype PR [here](https://github.com/pytorch/pytorch/pull/83545) - I ended up making one to try to slog through test failures). Pull Request resolved: https://github.com/pytorch/pytorch/pull/83590 Approved by: https://github.com/ezyang	2022-08-19 23:30:43 +00:00
nikitaved	b60dc2eb43	`mul`: sparse-dense + sparse-sparse with 0-dims support take 2. (#82962 ) This one is a copy of https://github.com/pytorch/pytorch/pull/81556 https://github.com/pytorch/pytorch/pull/82717 These got reverted due to issues with torchvision. CC @kit1980 , could you please take over from here? Pull Request resolved: https://github.com/pytorch/pytorch/pull/82962 Approved by: https://github.com/kit1980	2022-08-11 23:34:58 +00:00
PyTorch MergeBot	45291c7ec8	Revert "Implement `mul(dense, sparse), mul(sparse, dense)` for sparse COO tensors. (#81556 )" This reverts commit `edd2f6daa7`. Reverted https://github.com/pytorch/pytorch/pull/81556 on behalf of https://github.com/kit1980 due to Broken internal test, S286911	2022-08-05 19:39:01 +00:00
PyTorch MergeBot	796fba02fe	Revert "Implement and extend `mul(sparse, sparse)` to work with 0-dim arguments on either side. (#82717 )" This reverts commit `3ab54b971f`. Reverted https://github.com/pytorch/pytorch/pull/82717 on behalf of https://github.com/kit1980 due to Broken internal test, S286911	2022-08-05 19:35:35 +00:00
Nikita Vedeneev	3ab54b971f	Implement and extend `mul(sparse, sparse)` to work with 0-dim arguments on either side. (#82717 ) Extends https://github.com/pytorch/pytorch/pull/81556 by bringing in some missing functionality already implemented in master. It also improves on master by allowing arbitrary 0-dim arguments, coalesced or not, on either side of the operation. Master, for example, would fail on 0-dim non-coalesced inputs. CC @datumbox, @osalpekar . Pull Request resolved: https://github.com/pytorch/pytorch/pull/82717 Approved by: https://github.com/amjames, https://github.com/bhosmer	2022-08-04 17:46:23 +00:00
Edward Z. Yang	42fefd4403	Sparse fake tensor support (#82172 ) Add support for sparse fake tensors. - The testing strategy is to run a fake tensor cross ref test on `test_sparse.py`. This is necessary because OpInfo sparse coverage is completely nonexistent. We could have tried to turn on cross ref testing globally for all files, but that would be very time consuming and the tests I'm interested in are mostly in this file. There are some exclusions in testing for things that don't work. - I made the fake tensor converter raise an UnsupportedFakeTensorException if the meta converter fails to do a conversion (which can happen in a relatively large number of situations). - I relax fake tensor invariants so that you can make a fake tensor from a meta tensor. This is useful because in the cross ref test sometimes we operate on meta tensors. - Fake tensor wrapping is improved to handle the case when a function doesn't return any tensors - Meta converter is taught how to convert sparse tensors to meta There's still a little more cleanup that needs to be done, but this is good for review. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/82172 Approved by: https://github.com/eellison	2022-08-03 14:29:36 +00:00
Nikita Vedeneev	edd2f6daa7	Implement `mul(dense, sparse), mul(sparse, dense)` for sparse COO tensors. (#81556 ) As per title. Implemented with broadcasting and in-place support. Follow-up : Backward implementation. Fixes https://github.com/pytorch/pytorch/issues/3158 Fixes https://github.com/pytorch/pytorch/issues/4456 Fixes https://github.com/pytorch/pytorch/issues/46307 Pull Request resolved: https://github.com/pytorch/pytorch/pull/81556 Approved by: https://github.com/amjames, https://github.com/cpuhrsch	2022-07-29 15:15:27 +00:00
Nikita Vedeneev	18d0e533da	fix silent type promotion for sparse COO tensors with `select` (#82215 ) Fixes https://github.com/pytorch/pytorch/issues/82150. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82215 Approved by: https://github.com/amjames, https://github.com/cpuhrsch	2022-07-27 12:24:06 +00:00
Christian Puhrsch	6ab1fe19ee	torch.sparse.softmax avoid div by zero and invalid kernel launch parameters (#82149 ) ### Description Small changes needed to deal with nnz 0 inputs. ### Issue https://github.com/pytorch/pytorch/issues/82107 ### Testing Added additional test coverage to reproduce the bug reported in the issue. Tested resulting values by converting with `to_dense`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82149 Approved by: https://github.com/jbschlosser, https://github.com/ezyang	2022-07-25 23:10:58 +00:00
PyTorch MergeBot	6e9b0dcdc4	Revert "Implement `mul(dense, sparse), mul(sparse, dense)` for sparse COO tensors. (#81556 )" This reverts commit `cc5b01651f`. Reverted https://github.com/pytorch/pytorch/pull/81556 on behalf of https://github.com/jeanschmidt due to breaking internal builds	2022-07-22 11:20:11 +00:00
Nikita Vedeneev	cc5b01651f	Implement `mul(dense, sparse), mul(sparse, dense)` for sparse COO tensors. (#81556 ) As per title. Implemented with broadcasting and in-place support. Follow-up : Backward implementation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81556 Approved by: https://github.com/amjames, https://github.com/cpuhrsch	2022-07-22 04:55:48 +00:00
Edward Z. Yang	44193f6b5d	Add basic support for sparse meta tensors (#81800 ) Coverage is by no means complete; we'll drive more coverage using appropriate cross-ref tests. This is just enough to get construction and querying working. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/81800 Approved by: https://github.com/cpuhrsch, https://github.com/bdhirsh	2022-07-21 21:23:57 +00:00
Andrew M. James	5a4c9e8394	Add spdiags sparse matrix initialization (#78439 ) Similar to [scipy.sparse.spdiags](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.spdiags.html#scipy-sparse-spdiags). Part of #70926 In other functions (i.e. [torch.diagonal](https://pytorch.org/docs/stable/generated/torch.diagonal.html#torch.diagonal)) diagonals of a tensor are referenced using the offset and the two dimensions that the diagonal is taken with respect to. Here the reference implementation from scipy only considers matrix output, so even if we only support 2-d output at first, it may be useful to consider how the dimensions corresponding to each diagonal would be specified for higher dimensional output. The proposed torch signature implies that all offsets refer to the diagonals with respect to the only two dimensions of the output: ``` torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, int[] shape, Layout? layout=None) -> SparseTensor ``` Above it is required that: `diagonals.ndimension() == 2`, `offsets.ndimension() == 1`, `offsets.shape[0] == diagonals.shape[0]` and `len(shape) == 2`. This would need to be altered for the case where `len(shape)` > 2. One option is: ``` torch.sparse.spdiags(Tensor[] diagonals, IntTensor[] offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor ``` Here `offsets` and `diagonals` become lists of tensors, and the `IntTensor dims` argument is introduced. This would require that `len(diagonals) == len(offsets) == dims.shape[0]`, `dims.ndimension() == 2` and `dims.shape[1] == 2`; also, the same restrictions as the 2d case above apply to the elements of `diagonals` and `offsets` pairwise (that is `diagonals[i].ndimension() == 2`, `offsets[i].ndimension() == 1` and `offsets[i].shape[0] == diagonals[i].shape[0]` for all i). 
This form of the signature would construct the sparse result by placing the values from `diagonals[i][j]` into the diagonal with offset `offsets[i][j]` taken with respect to dimensions `dims[i]`. The specialization back to the original signature for the 2d case could be seen as allowing the single row of dims to default to `[0, 1]` when only one `diagonals`, `offsets` pair is provided and the shape is 2-d. This option allows each input element `diagonals[i]` to have a different row length, which may be appropriate since the maximum length of a diagonal differs between dimension pairs. Another option is to specify the dimensions the diagonal is taken with respect to for each offset. This signature would look like: ``` torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor ``` Here, `diagonals` is still 2-D with dimension 0 matching the length of 1-D `offsets`, and the tensor input `dims` is also 2-D with dimension 0 matching the length of 1-D `offsets` and the second dimension fixed at `2`. In this case the sparse result is constructed by placing the elements from `diagonals[i]` into the output diagonal `output.diagonal(offsets[i], dim0=dims[i][0], dim1=dims[i][1])` (with some additional consideration that makes it more complicated than simply assigning to that view). The specialization from this back to the 2-D form could be seen as assuming `dims = [[0, 1], [0, 1]... len(offsets) times ]` when `len(shape) == 2`. In both proposed signatures for the N-D case, the specialization back to the 2-D signature is a bit of a stretch for typical default-argument logic; however, I think the first is the better choice as it offers more flexibility. I think some discussion is required about: - [x] Should the N-D output case be implemented from the outset - [x] If not, should the future addition of the N-D output case be considered when designing the interface. 
- [x] Other thoughts on the signature which includes the `dims` information for the N-D output case. Resolution: Since no one has requested N-D output support, I think it is fine to restrict this to sparse matrix generation. Should a request for N-D support come later, an overload accepting the additional `dims` could be added. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78439 Approved by: https://github.com/nikitaved, https://github.com/cpuhrsch, https://github.com/pearu	2022-07-01 01:11:54 +00:00
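The 2-D placement rule can be checked by hand with a small sketch. Assuming the convention of `scipy.sparse.spdiags`, which the commit cites as its model, the value `diagonals[i][j]` lands in column `j` of the diagonal with offset `offsets[i]`, i.e. at position `(j - offsets[i], j)`, and values that fall outside the output shape are dropped. The function names `spdiags_coo` and `to_dense` below are hypothetical helpers written in pure Python, not part of `torch.sparse`, and distinct offsets are assumed.

```python
def spdiags_coo(diagonals, offsets, shape):
    """Return {(row, col): value} for a scipy-style spdiags construction.

    diagonals: list of rows (one row per offset); offsets: list of ints;
    shape: (nrows, ncols). Offset 0 is the main diagonal, negative offsets
    are below it and positive offsets are above it.
    """
    nrows, ncols = shape
    entries = {}
    for row_values, k in zip(diagonals, offsets):
        # Column j of this diagonal row is placed at (j - k, j).
        for j, value in enumerate(row_values):
            r = j - k
            if 0 <= r < nrows and 0 <= j < ncols:
                entries[(r, j)] = value
    return entries


def to_dense(entries, shape):
    """Expand {(row, col): value} into a nested-list dense matrix."""
    nrows, ncols = shape
    dense = [[0] * ncols for _ in range(nrows)]
    for (r, c), value in entries.items():
        dense[r][c] = value
    return dense


diags = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(to_dense(spdiags_coo(diags, [0, -1, 1], (3, 3)), (3, 3)))
# [[0, 7, 0], [3, 1, 8], [0, 4, 2]]
```

Note that with this convention the last value of a sub-diagonal row (here `5`) and the first value of a super-diagonal row (here `6`) are never used for a square output.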
PyTorch MergeBot	56e3bc5215	Revert "Add spdiags sparse matrix initialization (#78439 )" This reverts commit `cfb2034b65`. Reverted https://github.com/pytorch/pytorch/pull/78439 on behalf of https://github.com/suo due to broke windows builds, see: `cfb2034b65`	2022-06-30 21:04:36 +00:00
Andrew M. James	cfb2034b65	Add spdiags sparse matrix initialization (#78439 ) Similar to [scipy.sparse.spdiags](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.spdiags.html#scipy-sparse-spdiags). Part of #70926 In other functions (i.e. [torch.diagonal](https://pytorch.org/docs/stable/generated/torch.diagonal.html#torch.diagonal)) diagonals of a tensor are referenced using the offset and the two dimensions that the diagonal is taken with respect to. Here the reference implementation from scipy only considers matrix output, so even if we only support 2-d output at first, it may be useful to consider how the dimensions corresponding to each diagonal would be specified for higher dimensional output. The proposed torch signature implies that all offsets refer to the diagonals with respect to the only two dimensions of the output: ``` torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, int[] shape, Layout? layout=None) -> SparseTensor ``` Above it is required that: `diagonals.ndimension() == 2`, `offsets.ndimension() == 1`, `offsets.shape[0] == diagonals.shape[0]` and `len(shape) == 2`. This would need to be altered for the case where `len(shape)` > 2. One option is: ``` torch.sparse.spdiags(Tensor[] diagonals, IntTensor[] offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor ``` Here `offsets` and `diagonals` become lists of tensors, and the `IntTensor dims` argument is introduced. This would require that `len(diagonals) == len(offsets) == dims.shape[0]`, `dims.ndimension() == 2` and `dims.shape[1] == 2`; also, the same restrictions as the 2d case above apply to the elements of `diagonals` and `offsets` pairwise (that is `diagonals[i].ndimension() == 2`, `offsets[i].ndimension() == 1` and `offsets[i].shape[0] == diagonals[i].shape[0]` for all i). 
This form of the signature would construct the sparse result by placing the values from `diagonals[i][j]` into the diagonal with offset `offsets[i][j]` taken with respect to dimensions `dims[i]`. The specialization back to the original signature for the 2d case could be seen as allowing the single row of dims to default to `[0, 1]` when only one `diagonals`, `offsets` pair is provided and the shape is 2-d. This option allows each input element `diagonals[i]` to have a different row length, which may be appropriate since the maximum length of a diagonal differs between dimension pairs. Another option is to specify the dimensions the diagonal is taken with respect to for each offset. This signature would look like: ``` torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor ``` Here, `diagonals` is still 2-D with dimension 0 matching the length of 1-D `offsets`, and the tensor input `dims` is also 2-D with dimension 0 matching the length of 1-D `offsets` and the second dimension fixed at `2`. In this case the sparse result is constructed by placing the elements from `diagonals[i]` into the output diagonal `output.diagonal(offsets[i], dim0=dims[i][0], dim1=dims[i][1])` (with some additional consideration that makes it more complicated than simply assigning to that view). The specialization from this back to the 2-D form could be seen as assuming `dims = [[0, 1], [0, 1]... len(offsets) times ]` when `len(shape) == 2`. In both proposed signatures for the N-D case, the specialization back to the 2-D signature is a bit of a stretch for typical default-argument logic; however, I think the first is the better choice as it offers more flexibility. I think some discussion is required about: - [x] Should the N-D output case be implemented from the outset - [x] If not, should the future addition of the N-D output case be considered when designing the interface. 
- [x] Other thoughts on the signature which includes the `dims` information for the N-D output case. Resolution: Since no one has requested N-D output support, I think it is fine to restrict this to sparse matrix generation. Should a request for N-D support come later, an overload accepting the additional `dims` could be added. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78439 Approved by: https://github.com/nikitaved, https://github.com/cpuhrsch, https://github.com/pearu	2022-06-30 19:54:47 +00:00
Christian Puhrsch	5da776dd08	[Resubmission] fix mul_out CUDA config for COO tensors (#80254 ) Fixes https://github.com/pytorch/pytorch/issues/79914 Duplicate of https://github.com/pytorch/pytorch/pull/79937 . I wasn't able to push changes to the existing PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80254 Approved by: https://github.com/eellison	2022-06-28 00:47:03 +00:00
Nikita Vedeneev	417677bf62	`permute` for COO sparse tensors (#79707 ) As per title. Partial implementation of https://github.com/pytorch/pytorch/issues/78422. View semantics cannot be satisfied once the permutation involves sparse dims. Pull Request resolved: https://github.com/pytorch/pytorch/pull/79707 Approved by: https://github.com/cpuhrsch	2022-06-25 08:49:58 +00:00
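Why permuting sparse dims rules out a view can be seen in a small sketch. Assuming a hypothetical COO representation as a list of coordinate tuples plus a list of values (all dimensions sparse), permuting dimensions rewrites every coordinate tuple, so new index data has to be materialized, and the rewritten indices are generally no longer in sorted (coalesced) order. This is a pure-Python illustration of the idea, not the PyTorch implementation; `permute_coo` is a hypothetical name.

```python
def permute_coo(indices, values, shape, dims):
    """Permute a COO tensor whose dimensions are all sparse.

    indices: list of coordinate tuples; values: list of values;
    dims: a permutation of range(len(shape)).
    """
    if sorted(dims) != list(range(len(shape))):
        raise ValueError("dims must be a permutation of the tensor's dimensions")
    # Every coordinate tuple is rebuilt, so the index storage is new data,
    # not a strided view of the original indices.
    new_indices = [tuple(idx[d] for d in dims) for idx in indices]
    new_shape = tuple(shape[d] for d in dims)
    return new_indices, list(values), new_shape


idx = [(0, 2), (1, 0), (1, 1)]  # sorted, i.e. in coalesced order
new_idx, vals, new_shape = permute_coo(idx, [10, 20, 30], (2, 3), (1, 0))
print(new_idx, new_shape)          # [(2, 0), (0, 1), (1, 1)] (3, 2)
print(new_idx == sorted(new_idx))  # False: the permuted indices are unsorted
```

The values list is unchanged here, which hints at why permuting only dense dimensions is a different, easier case.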
Nikita Vedeneev	03cf01bdc0	`index_select` for COO CUDA tensors. (#77551 ) Brings a native CUDA implementation of `index_select`. On master, CUDA inputs are handled only by silently converting them to CPU. The `nnz >> size` case could be optimized similarly to how https://github.com/pytorch/pytorch/pull/72710 does it. Some benchmarks:

<details>
<summary>PR/torch_sparse/master</summary>

```
[------------------------------- cuda coo.index_select -------------------------------]
| PR | torch_sparse | master
32 threads: ---------------------------------------------------------------------------
n=10000, nnz=100, index_len=100, dim=0 | 96 | 327 | 70
n=10000, nnz=100, index_len=100, dim=1 | 120 | 505 | 74
n=10000, nnz=100, index_len=1000, dim=0 | 90 | 333 | 93
n=10000, nnz=100, index_len=1000, dim=1 | 120 | 499 | 98
n=10000, nnz=100, index_len=10000, dim=0 | 92 | 331 | 350
n=10000, nnz=100, index_len=10000, dim=1 | 100 | 506 | 352
n=100000, nnz=1000, index_len=100, dim=0 | 53 | 274 | 60
n=100000, nnz=1000, index_len=100, dim=1 | 90 | 368 | 71
n=100000, nnz=1000, index_len=1000, dim=0 | 93 | 332 | 100
n=100000, nnz=1000, index_len=1000, dim=1 | 130 | 501 | 140
n=100000, nnz=1000, index_len=10000, dim=0 | 100 | 341 | 522
n=100000, nnz=1000, index_len=10000, dim=1 | 130 | 530 | 549
n=1000000, nnz=10000, index_len=100, dim=0 | 90 | 429 | 110
n=1000000, nnz=10000, index_len=100, dim=1 | 296 | 810 | 355
n=1000000, nnz=10000, index_len=1000, dim=0 | 100 | 435 | 170
n=1000000, nnz=10000, index_len=1000, dim=1 | 309 | 830 | 548
n=1000000, nnz=10000, index_len=10000, dim=0 | 110 | 446 | 750
n=1000000, nnz=10000, index_len=10000, dim=1 | 310 | 830 | 1000
n=10, nnz=100, index_len=100, dim=0 | 90 | 333 | 74
n=10, nnz=100, index_len=100, dim=1 | 100 | 497 | 78
n=10, nnz=100, index_len=1000, dim=0 | 90 | 329 | 140
n=10, nnz=100, index_len=1000, dim=1 | 100 | 800 | 100
n=10, nnz=100, index_len=10000, dim=0 | 93 | 340 | 900
n=10, nnz=100, index_len=10000, dim=1 | 120 | 800 | 489
n=10, nnz=1000, index_len=100, dim=0 | 90 | 321 | 140
n=10, nnz=1000, index_len=100, dim=1 | 100 | 680 | 140
n=10, nnz=1000, index_len=1000, dim=0 | 110 | 349 | 670
n=10, nnz=1000, index_len=1000, dim=1 | 130 | 740 | 800
n=10, nnz=1000, index_len=10000, dim=0 | 302 | 503 | 4882
n=10, nnz=1000, index_len=10000, dim=1 | 325 | 2257 | 5262
n=10, nnz=10000, index_len=100, dim=0 | 229 | 349 | 810
n=10, nnz=10000, index_len=100, dim=1 | 433 | 870 | 700
n=10, nnz=10000, index_len=1000, dim=0 | 666 | 502 | 5581
n=10, nnz=10000, index_len=1000, dim=1 | 826 | 2379 | 4820
n=10, nnz=10000, index_len=10000, dim=0 | 2534 | 2700 | 80000
n=10, nnz=10000, index_len=10000, dim=1 | 2723 | 18540 | 80000
n=100, nnz=1000, index_len=100, dim=0 | 94 | 324 | 110
n=100, nnz=1000, index_len=100, dim=1 | 100 | 499 | 110
n=100, nnz=1000, index_len=1000, dim=0 | 96 | 337 | 150
n=100, nnz=1000, index_len=1000, dim=1 | 130 | 800 | 140
n=100, nnz=1000, index_len=10000, dim=0 | 100 | 346 | 900
n=100, nnz=1000, index_len=10000, dim=1 | 130 | 760 | 900
n=100, nnz=10000, index_len=100, dim=0 | 90 | 323 | 190
n=100, nnz=10000, index_len=100, dim=1 | 279 | 800 | 180
n=100, nnz=10000, index_len=1000, dim=0 | 110 | 339 | 781
n=100, nnz=10000, index_len=1000, dim=1 | 294 | 870 | 800
n=100, nnz=10000, index_len=10000, dim=0 | 315 | 505 | 6264
n=100, nnz=10000, index_len=10000, dim=1 | 497 | 2398 | 5404
n=1000, nnz=10000, index_len=100, dim=0 | 90 | 333 | 160
n=1000, nnz=10000, index_len=100, dim=1 | 279 | 635 | 150
n=1000, nnz=10000, index_len=1000, dim=0 | 100 | 328 | 215
n=1000, nnz=10000, index_len=1000, dim=1 | 287 | 810 | 207
n=1000, nnz=10000, index_len=10000, dim=0 | 100 | 339 | 900
n=1000, nnz=10000, index_len=10000, dim=1 | 291 | 880 | 1000
n=1000, nnz=100000, index_len=100, dim=0 | 92 | 358 | 435
n=1000, nnz=100000, index_len=100, dim=1 | 302 | 900 | 530
n=1000, nnz=100000, index_len=1000, dim=0 | 130 | 360 | 1000
n=1000, nnz=100000, index_len=1000, dim=1 | 329 | 930 | 1200
n=1000, nnz=100000, index_len=10000, dim=0 | 343 | 530 | 7000
n=1000, nnz=100000, index_len=10000, dim=1 | 545 | 2446 | 6100
n=1000, nnz=1000000, index_len=100, dim=0 | 355 | 394 | 2210
n=1000, nnz=1000000, index_len=100, dim=1 | 1660 | 2276 | 2674
n=1000, nnz=1000000, index_len=1000, dim=0 | 877 | 574 | 6700
n=1000, nnz=1000000, index_len=1000, dim=1 | 2449 | 3782 | 9000
n=1000, nnz=1000000, index_len=10000, dim=0 | 3112 | 2931 | 57000
n=1000, nnz=1000000, index_len=10000, dim=1 | 7340 | 20220 | 65700

Times are in microseconds (us).
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77551 Approved by: https://github.com/cpuhrsch	2022-06-01 17:39:03 +00:00
Mike Ruberry	089203f8bc	Updates floor_divide to perform floor division (#78411 ) Fixes https://github.com/pytorch/pytorch/issues/43874 This PR changes floor_divide to perform floor division instead of truncation division. This is a BC-breaking change, but it's a "bug fix," and we've already warned users for several releases this behavior would change. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78411 Approved by: https://github.com/ngimel	2022-05-29 21:28:45 +00:00
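A minimal pure-Python sketch of the semantic change, restricted to integer operands (the helper names `floor_divide_ref` and `trunc_divide` are hypothetical; this is not PyTorch code). The two rounding rules agree whenever the exact quotient is non-negative and differ by exactly one when it is negative and not an integer. Code that relied on the old behavior can request truncation explicitly with `torch.div(a, b, rounding_mode='trunc')`, while `rounding_mode='floor'` matches the new `floor_divide`.

```python
def floor_divide_ref(a: int, b: int) -> int:
    """Round the quotient toward negative infinity (new floor_divide semantics)."""
    # Python's // operator already implements floor division for ints.
    return a // b


def trunc_divide(a: int, b: int) -> int:
    """Round the quotient toward zero (old, truncating floor_divide semantics)."""
    q = a // b
    # Floor and truncation differ only when the exact quotient is negative
    # and not an integer; in that case truncation is one larger than floor.
    if q < 0 and q * b != a:
        q += 1
    return q


for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    print(f"{a:>3} / {b:>2}: trunc={trunc_divide(a, b):>2}, floor={floor_divide_ref(a, b):>2}")
```

For `-7 / 2` the old truncating result is `-3`, while floor division gives `-4`; for `7 / 2` both give `3`.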
Nikita Vedeneev	00a1fb64bb	Faster `index_select` for sparse COO tensors on CPU. (#72710 ) Fixes https://github.com/pytorch/pytorch/issues/72212. This PR improves the asymptotic complexity of the previous algorithm. It also exploits the structure of the problem and parallelizes computation where possible. Benchmark results:

<details>
<summary>Testing script</summary>

```python
import torch
import math
from IPython import get_ipython
from itertools import product
import pickle
from torch.utils.benchmark import Timer, Compare

torch.manual_seed(13)
#torch.set_num_threads(1)
ipython = get_ipython()

index_sizes = (100, 1000, 10000)
# specifies (n, nnz)
problem_dims = (
    # n > nnz
    (10000, 100),
    (100000, 1000),
    (1000000, 10000),
    # n < nnz
    (10, 100),
    (10, 1000),
    (10, 10000),
    (100, 1000),
    (100, 10000),
    (1000, 10000),
    (1000, 100000),
    (1000, 1000000),
    #(1000000, 1000000000),
)

# Presumably used for the torch_sparse baseline run; calling it requires `import torch_sparse`.
def f(t, d, index):
    s = torch_sparse.SparseTensor.from_torch_sparse_coo_tensor(t)
    ss = s.index_select(d, index)
    return ss.coo()

name = "PR"
results = []
for (n, nnz), m in product(problem_dims, index_sizes):
    for d in (0, 1):
        # Pick a shape whose size along dim d is n, so indices in [0, n) are valid.
        if nnz < n:
            shape = (n, n)
        else:
            shape = (n, nnz // n) if d == 0 else (nnz // n, n)
        nrows, ncols = shape
        rowidx = torch.randint(low=0, high=nrows, size=(nnz,))
        colidx = torch.randint(low=0, high=ncols, size=(nnz,))
        itemidx = torch.vstack((rowidx, colidx))
        xvalues = torch.randn(nnz)
        index = torch.randint(low=0, high=n, size=(m,))
        SparseX = torch.sparse_coo_tensor(itemidx, xvalues, size=shape).coalesce()
        smtp = "SparseX.index_select(d, index)"
        timer = Timer(smtp,
                      globals=globals(),
                      label="coo.index_select",
                      description=f"{name}: coo.index_select",
                      sub_label=f"n={n}, nnz={nnz}, index_len={m}, dim={d}",
                      num_threads=torch.get_num_threads())
        results.append(timer.blocked_autorange())

compare = Compare(results)
compare.trim_significant_figures()
compare.print()

# Save the measurements so the runs can be compared later.
with open(f"{name}_index_select.pickle", 'wb') as fh:
    pickle.dump(results, fh)
```

</details>

<details>
<summary>Gather results</summary>

```python
import pickle
from torch.utils.benchmark import Timer, Compare

files = [
    "PR",
    "torch_sparse",
    "master"
]

timers = []
for name in files:
    with open("{}_index_select.pickle".format(name), 'rb') as f:
        timers += pickle.load(f)

compare = Compare(timers)
compare.trim_significant_figures()
compare.print()
```

</details>

<details>
<summary>PR/torch_sparse/master runtime comparison</summary>

```
[----------------------------------- coo.index_select ----------------------------------]
| PR | torch_sparse | master
32 threads: -----------------------------------------------------------------------------
n=10000, nnz=100, index_len=100, dim=0 | 14 | 140 | 10
n=10000, nnz=100, index_len=100, dim=1 | 14 | 200 | 10
n=10000, nnz=100, index_len=1000, dim=0 | 30 | 180 | 38
n=10000, nnz=100, index_len=1000, dim=1 | 34 | 240 | 38
n=10000, nnz=100, index_len=10000, dim=0 | 278 | 460 | 330
n=10000, nnz=100, index_len=10000, dim=1 | 275 | 516 | 330
n=100000, nnz=1000, index_len=100, dim=0 | 16 | 290 | 31
n=100000, nnz=1000, index_len=100, dim=1 | 26 | 390 | 31
n=100000, nnz=1000, index_len=1000, dim=0 | 45 | 405 | 263
n=100000, nnz=1000, index_len=1000, dim=1 | 73 | 500 | 261
n=100000, nnz=1000, index_len=10000, dim=0 | 444 | 783 | 2570
n=100000, nnz=1000, index_len=10000, dim=1 | 470 | 890 | 2590
n=1000000, nnz=10000, index_len=100, dim=0 | 25 | 2400 | 270
n=1000000, nnz=10000, index_len=100, dim=1 | 270 | 4000 | 269
n=1000000, nnz=10000, index_len=1000, dim=0 | 74 | 2600 | 2620
n=1000000, nnz=10000, index_len=1000, dim=1 | 464 | 3600 | 2640
n=1000000, nnz=10000, index_len=10000, dim=0 | 635 | 3300 | 26400
n=1000000, nnz=10000, index_len=10000, dim=1 | 1000 | 3960 | 26400
n=10, nnz=100, index_len=100, dim=0 | 16 | 137 | 16
n=10, nnz=100, index_len=100, dim=1 | 16 | 220 | 16
n=10, nnz=100, index_len=1000, dim=0 | 63 | 238 | 81
n=10, nnz=100, index_len=1000, dim=1 | 60 | 698 | 78
n=10, nnz=100, index_len=10000, dim=0 | 480 | 940 | 862
n=10, nnz=100, index_len=10000, dim=1 | 330 | 4930 | 1070
n=10, nnz=1000, index_len=100, dim=0 | 60 | 200 | 73
n=10, nnz=1000, index_len=100, dim=1 | 56 | 683 | 70
n=10, nnz=1000, index_len=1000, dim=0 | 480 | 530 | 1050
n=10, nnz=1000, index_len=1000, dim=1 | 330 | 4550 | 1368
n=10, nnz=1000, index_len=10000, dim=0 | 3100 | 2900 | 9300
n=10, nnz=1000, index_len=10000, dim=1 | 3400 | 46000 | 9100
n=10, nnz=10000, index_len=100, dim=0 | 400 | 453 | 857
n=10, nnz=10000, index_len=100, dim=1 | 400 | 4070 | 1730
n=10, nnz=10000, index_len=1000, dim=0 | 2840 | 2600 | 13900
n=10, nnz=10000, index_len=1000, dim=1 | 3700 | 40600 | 16000
n=10, nnz=10000, index_len=10000, dim=0 | 83200 | 67400 | 160000
n=10, nnz=10000, index_len=10000, dim=1 | 68000 | 528000 | 190000
n=100, nnz=1000, index_len=100, dim=0 | 46 | 148 | 31
n=100, nnz=1000, index_len=100, dim=1 | 45 | 242 | 37
n=100, nnz=1000, index_len=1000, dim=0 | 68 | 248 | 240
n=100, nnz=1000, index_len=1000, dim=1 | 66 | 755 | 290
n=100, nnz=1000, index_len=10000, dim=0 | 370 | 802 | 2250
n=100, nnz=1000, index_len=10000, dim=1 | 372 | 5430 | 2770
n=100, nnz=10000, index_len=100, dim=0 | 82 | 210 | 224
n=100, nnz=10000, index_len=100, dim=1 | 74 | 986 | 270
n=100, nnz=10000, index_len=1000, dim=0 | 350 | 618 | 2600
n=100, nnz=10000, index_len=1000, dim=1 | 370 | 4660 | 4560
n=100, nnz=10000, index_len=10000, dim=0 | 3000 | 3400 | 41680
n=100, nnz=10000, index_len=10000, dim=1 | 5000 | 47500 | 30400
n=1000, nnz=10000, index_len=100, dim=0 | 71 | 160 | 185
n=1000, nnz=10000, index_len=100, dim=1 | 64 | 516 | 190
n=1000, nnz=10000, index_len=1000, dim=0 | 100 | 249 | 1740
n=1000, nnz=10000, index_len=1000, dim=1 | 98 | 1030 | 1770
n=1000, nnz=10000, index_len=10000, dim=0 | 600 | 808 | 18300
n=1000, nnz=10000, index_len=10000, dim=1 | 663 | 5300 | 18500
n=1000, nnz=100000, index_len=100, dim=0 | 160 | 258 | 1890
n=1000, nnz=100000, index_len=100, dim=1 | 200 | 3620 | 2050
n=1000, nnz=100000, index_len=1000, dim=0 | 500 | 580 | 18700
n=1000, nnz=100000, index_len=1000, dim=1 | 640 | 7550 | 30000
n=1000, nnz=100000, index_len=10000, dim=0 | 3400 | 3260 | 186000
n=1000, nnz=100000, index_len=10000, dim=1 | 3600 | 49600 | 194000
n=1000, nnz=1000000, index_len=100, dim=0 | 517 | 957 | 18700
n=1000, nnz=1000000, index_len=100, dim=1 | 680 | 39600 | 37600
n=1000, nnz=1000000, index_len=1000, dim=0 | 3600 | 4500 | 186000
n=1000, nnz=1000000, index_len=1000, dim=1 | 5800 | 76400 | 190000
n=1000, nnz=1000000, index_len=10000, dim=0 | 50000 | 67900 | 1800000
n=1000, nnz=1000000, index_len=10000, dim=1 | 45000 | 570000 | 1900000

Times are in microseconds (us).
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72710 Approved by: https://github.com/pearu, https://github.com/cpuhrsch	2022-05-10 16:33:13 +00:00
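The commit does not spell out its algorithm, so the following is only a pure-Python sketch of what `index_select` computes on a 2-D COO tensor (the `x.index_select(dim, index)` call timed in the testing script). It assumes a hash-map bucketing strategy that runs in O(nnz + len(index) + output nnz); that is a stand-in for illustration, not the PR's parallel C++ kernel, and `coo_index_select` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def coo_index_select(
    indices: Sequence[Sequence[int]],
    values: Sequence[float],
    dim: int,
    index: Sequence[int],
) -> Tuple[List[List[int]], List[float]]:
    """Reference semantics of index_select on a 2-D COO tensor.

    `indices` is 2 x nnz (row coordinates, then column coordinates). The
    output has len(index) slices along `dim`; slice j is a copy of input
    slice index[j], so repeated entries in `index` duplicate slices.
    Bounds checking of `index` is omitted in this sketch.
    """
    # Pass 1, O(nnz): bucket nonzero positions by their coordinate along dim.
    buckets: Dict[int, List[int]] = defaultdict(list)
    for k, coord in enumerate(indices[dim]):
        buckets[coord].append(k)

    other = 1 - dim
    out_indices: List[List[int]] = [[], []]
    out_values: List[float] = []
    # Pass 2, O(len(index) + output nnz): emit one copy per requested slice.
    for j, i in enumerate(index):
        for k in buckets.get(i, []):
            out_indices[dim].append(j)
            out_indices[other].append(indices[other][k])
            out_values.append(values[k])
    return out_indices, out_values


# 4 x 4 tensor with nonzeros (0,2)=10, (1,0)=20, (1,3)=30, (3,1)=40; select rows 1, 3, 1.
idx, vals = coo_index_select([[0, 1, 1, 3], [2, 0, 3, 1]], [10.0, 20.0, 30.0, 40.0], 0, [1, 3, 1])
print(idx, vals)  # -> [[0, 0, 1, 2, 2], [0, 3, 1, 0, 3]] [20.0, 30.0, 40.0, 20.0, 30.0]
```

The result has shape `(len(index), 4)`, here `(3, 4)`. When selecting along `dim=1`, the emitted entries are not necessarily sorted by row, so a real implementation has to decide whether to coalesce the output.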
PyTorch MergeBot	8d67972b14	Revert "Faster `index_select` for sparse COO tensors on CPU. (#72710 )" This reverts commit `ce3857e73c`. Reverted https://github.com/pytorch/pytorch/pull/72710 on behalf of https://github.com/malfet	2022-05-10 14:43:05 +00:00
Nikita Vedeneev	ce3857e73c	Faster `index_select` for sparse COO tensors on CPU. (#72710 ) Fixes https://github.com/pytorch/pytorch/issues/72212. This PR improves the asymptotic complexity of the previous algorithm. It also exploits the structure of the problem and parallelizes computation where possible. Benchmark results:

<details>
<summary>Testing script</summary>

```python
import torch
import math
from IPython import get_ipython
from itertools import product
import pickle
from torch.utils.benchmark import Timer, Compare

torch.manual_seed(13)
#torch.set_num_threads(1)
ipython = get_ipython()

index_sizes = (100, 1000, 10000)
# specifies (n, nnz)
problem_dims = (
    # n > nnz
    (10000, 100),
    (100000, 1000),
    (1000000, 10000),
    # n < nnz
    (10, 100),
    (10, 1000),
    (10, 10000),
    (100, 1000),
    (100, 10000),
    (1000, 10000),
    (1000, 100000),
    (1000, 1000000),
    #(1000000, 1000000000),
)

# Presumably used for the torch_sparse baseline run; calling it requires `import torch_sparse`.
def f(t, d, index):
    s = torch_sparse.SparseTensor.from_torch_sparse_coo_tensor(t)
    ss = s.index_select(d, index)
    return ss.coo()

name = "PR"
results = []
for (n, nnz), m in product(problem_dims, index_sizes):
    for d in (0, 1):
        # Pick a shape whose size along dim d is n, so indices in [0, n) are valid.
        if nnz < n:
            shape = (n, n)
        else:
            shape = (n, nnz // n) if d == 0 else (nnz // n, n)
        nrows, ncols = shape
        rowidx = torch.randint(low=0, high=nrows, size=(nnz,))
        colidx = torch.randint(low=0, high=ncols, size=(nnz,))
        itemidx = torch.vstack((rowidx, colidx))
        xvalues = torch.randn(nnz)
        index = torch.randint(low=0, high=n, size=(m,))
        SparseX = torch.sparse_coo_tensor(itemidx, xvalues, size=shape).coalesce()
        smtp = "SparseX.index_select(d, index)"
        timer = Timer(smtp,
                      globals=globals(),
                      label="coo.index_select",
                      description=f"{name}: coo.index_select",
                      sub_label=f"n={n}, nnz={nnz}, index_len={m}, dim={d}",
                      num_threads=torch.get_num_threads())
        results.append(timer.blocked_autorange())

compare = Compare(results)
compare.trim_significant_figures()
compare.print()

# Save the measurements so the runs can be compared later.
with open(f"{name}_index_select.pickle", 'wb') as fh:
    pickle.dump(results, fh)
```

</details>

<details>
<summary>Gather results</summary>

```python
import pickle
from torch.utils.benchmark import Timer, Compare

files = [
    "PR",
    "torch_sparse",
    "master"
]

timers = []
for name in files:
    with open("{}_index_select.pickle".format(name), 'rb') as f:
        timers += pickle.load(f)

compare = Compare(timers)
compare.trim_significant_figures()
compare.print()
```

</details>

<details>
<summary>PR/torch_sparse/master runtime comparison</summary>

```
[----------------------------------- coo.index_select ----------------------------------]
| PR | torch_sparse | master
32 threads: -----------------------------------------------------------------------------
n=10000, nnz=100, index_len=100, dim=0 | 14 | 140 | 10
n=10000, nnz=100, index_len=100, dim=1 | 14 | 200 | 10
n=10000, nnz=100, index_len=1000, dim=0 | 30 | 180 | 38
n=10000, nnz=100, index_len=1000, dim=1 | 34 | 240 | 38
n=10000, nnz=100, index_len=10000, dim=0 | 278 | 460 | 330
n=10000, nnz=100, index_len=10000, dim=1 | 275 | 516 | 330
n=100000, nnz=1000, index_len=100, dim=0 | 16 | 290 | 31
n=100000, nnz=1000, index_len=100, dim=1 | 26 | 390 | 31
n=100000, nnz=1000, index_len=1000, dim=0 | 45 | 405 | 263
n=100000, nnz=1000, index_len=1000, dim=1 | 73 | 500 | 261
n=100000, nnz=1000, index_len=10000, dim=0 | 444 | 783 | 2570
n=100000, nnz=1000, index_len=10000, dim=1 | 470 | 890 | 2590
n=1000000, nnz=10000, index_len=100, dim=0 | 25 | 2400 | 270
n=1000000, nnz=10000, index_len=100, dim=1 | 270 | 4000 | 269
n=1000000, nnz=10000, index_len=1000, dim=0 | 74 | 2600 | 2620
n=1000000, nnz=10000, index_len=1000, dim=1 | 464 | 3600 | 2640
n=1000000, nnz=10000, index_len=10000, dim=0 | 635 | 3300 | 26400
n=1000000, nnz=10000, index_len=10000, dim=1 | 1000 | 3960 | 26400
n=10, nnz=100, index_len=100, dim=0 | 16 | 137 | 16
n=10, nnz=100, index_len=100, dim=1 | 16 | 220 | 16
n=10, nnz=100, index_len=1000, dim=0 | 63 | 238 | 81
n=10, nnz=100, index_len=1000, dim=1 | 60 | 698 | 78
n=10, nnz=100, index_len=10000, dim=0 | 480 | 940 | 862
n=10, nnz=100, index_len=10000, dim=1 | 330 | 4930 | 1070
n=10, nnz=1000, index_len=100, dim=0 | 60 | 200 | 73
n=10, nnz=1000, index_len=100, dim=1 | 56 | 683 | 70
n=10, nnz=1000, index_len=1000, dim=0 | 480 | 530 | 1050
n=10, nnz=1000, index_len=1000, dim=1 | 330 | 4550 | 1368
n=10, nnz=1000, index_len=10000, dim=0 | 3100 | 2900 | 9300
n=10, nnz=1000, index_len=10000, dim=1 | 3400 | 46000 | 9100
n=10, nnz=10000, index_len=100, dim=0 | 400 | 453 | 857
n=10, nnz=10000, index_len=100, dim=1 | 400 | 4070 | 1730
n=10, nnz=10000, index_len=1000, dim=0 | 2840 | 2600 | 13900
n=10, nnz=10000, index_len=1000, dim=1 | 3700 | 40600 | 16000
n=10, nnz=10000, index_len=10000, dim=0 | 83200 | 67400 | 160000
n=10, nnz=10000, index_len=10000, dim=1 | 68000 | 528000 | 190000
n=100, nnz=1000, index_len=100, dim=0 | 46 | 148 | 31
n=100, nnz=1000, index_len=100, dim=1 | 45 | 242 | 37
n=100, nnz=1000, index_len=1000, dim=0 | 68 | 248 | 240
n=100, nnz=1000, index_len=1000, dim=1 | 66 | 755 | 290
n=100, nnz=1000, index_len=10000, dim=0 | 370 | 802 | 2250
n=100, nnz=1000, index_len=10000, dim=1 | 372 | 5430 | 2770
n=100, nnz=10000, index_len=100, dim=0 | 82 | 210 | 224
n=100, nnz=10000, index_len=100, dim=1 | 74 | 986 | 270
n=100, nnz=10000, index_len=1000, dim=0 | 350 | 618 | 2600
n=100, nnz=10000, index_len=1000, dim=1 | 370 | 4660 | 4560
n=100, nnz=10000, index_len=10000, dim=0 | 3000 | 3400 | 41680
n=100, nnz=10000, index_len=10000, dim=1 | 5000 | 47500 | 30400
n=1000, nnz=10000, index_len=100, dim=0 | 71 | 160 | 185
n=1000, nnz=10000, index_len=100, dim=1 | 64 | 516 | 190
n=1000, nnz=10000, index_len=1000, dim=0 | 100 | 249 | 1740
n=1000, nnz=10000, index_len=1000, dim=1 | 98 | 1030 | 1770
n=1000, nnz=10000, index_len=10000, dim=0 | 600 | 808 | 18300
n=1000, nnz=10000, index_len=10000, dim=1 | 663 | 5300 | 18500
n=1000, nnz=100000, index_len=100, dim=0 | 160 | 258 | 1890
n=1000, nnz=100000, index_len=100, dim=1 | 200 | 3620 | 2050
n=1000, nnz=100000, index_len=1000, dim=0 | 500 | 580 | 18700
n=1000, nnz=100000, index_len=1000, dim=1 | 640 | 7550 | 30000
n=1000, nnz=100000, index_len=10000, dim=0 | 3400 | 3260 | 186000
n=1000, nnz=100000, index_len=10000, dim=1 | 3600 | 49600 | 194000
n=1000, nnz=1000000, index_len=100, dim=0 | 517 | 957 | 18700
n=1000, nnz=1000000, index_len=100, dim=1 | 680 | 39600 | 37600
n=1000, nnz=1000000, index_len=1000, dim=0 | 3600 | 4500 | 186000
n=1000, nnz=1000000, index_len=1000, dim=1 | 5800 | 76400 | 190000
n=1000, nnz=1000000, index_len=10000, dim=0 | 50000 | 67900 | 1800000
n=1000, nnz=1000000, index_len=10000, dim=1 | 45000 | 570000 | 1900000

Times are in microseconds (us).
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72710 Approved by: https://github.com/pearu, https://github.com/cpuhrsch	2022-05-09 19:59:39 +00:00
Jane Xu	6d9dbd3391	Manually skip test_sparse_addmm as disable code is not working for now (#77076 ) Related to https://github.com/pytorch/pytorch/issues/73145 It was previously skipped for Linux and Windows, but mac has become a problem as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/77076 Approved by: https://github.com/ezyang	2022-05-09 13:54:29 +00:00
Mikayla Gawarecki	0adf070574	Use scatter_reduce to support masked reductions on sparse COO tensors (sum, prod, amin, amax) Pull Request resolved: https://github.com/pytorch/pytorch/pull/75454 Approved by: https://github.com/cpuhrsch	2022-05-06 15:40:22 +00:00
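The commit message gives only its title, so here is a hedged pure-Python sketch of the general idea behind using a scatter-reduce for masked reductions: to reduce a 2-D COO tensor along dim 1, scatter each specified value into the output slot of its row and combine it with the chosen reduction, so unspecified elements never take part. The helper `scatter_reduce` is hypothetical (it is not `torch.Tensor.scatter_reduce`), and reporting rows with no specified elements as `None` is a choice made only for this sketch; the commit does not say how PyTorch represents such rows.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Binary combiners for the four reductions named in the commit title.
REDUCERS: Dict[str, Callable[[float, float], float]] = {
    "sum": lambda a, b: a + b,
    "prod": lambda a, b: a * b,
    "amin": min,
    "amax": max,
}


def scatter_reduce(
    src: Sequence[float], index: Sequence[int], out_size: int, reduce: str
) -> List[Optional[float]]:
    """Combine src[k] into out[index[k]]; slots that receive nothing stay None."""
    combine = REDUCERS[reduce]
    out: List[Optional[float]] = [None] * out_size
    for value, i in zip(src, index):
        out[i] = value if out[i] is None else combine(out[i], value)
    return out


# 3 x 3 COO tensor: (0,0)=2, (0,2)=-1, (1,1)=5; row 2 has no specified elements.
rows = [0, 0, 1]
vals = [2.0, -1.0, 5.0]
# Masked amin along dim 1: only specified elements are considered.
print(scatter_reduce(vals, rows, 3, "amin"))  # -> [-1.0, 5.0, None]
```

By contrast, a dense `amin` over the same matrix, with unspecified entries treated as zeros, would give `[-1, 0, 0]`; the difference is exactly what masked semantics is meant to capture.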
PyTorch MergeBot	381e08309f	Revert "Use scatter_reduce to support masked reductions on sparse COO tensors (sum, prod, amin, amax)" This reverts commit `fc2a2e8b72`. Reverted https://github.com/pytorch/pytorch/pull/75454 on behalf of https://github.com/b0noI	2022-05-04 22:31:31 +00:00
Mikayla Gawarecki	fc2a2e8b72	Use scatter_reduce to support masked reductions on sparse COO tensors (sum, prod, amin, amax) Pull Request resolved: https://github.com/pytorch/pytorch/pull/75454 Approved by: https://github.com/cpuhrsch	2022-05-03 23:17:07 +00:00
arindamroy-eng	7478ce187a	ROCM:Unskip more tests for ROCM5.0 Re-enabling more tests which are working on ROCM5.0 Pull Request resolved: https://github.com/pytorch/pytorch/pull/75353 Approved by: https://github.com/ezyang	2022-04-19 19:45:55 +00:00
Pearu Peterson	a98b4666e0	Enable test_sparse_mask for Windows Pull Request resolved: https://github.com/pytorch/pytorch/pull/75189 Approved by: https://github.com/cpuhrsch	2022-04-11 17:21:29 +00:00
Brian Hirsh	1b7d7d9327	Reland: "free up dispatch key space (in C++)" (#74963 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74963 This is a re-land of D35192346 (`9872a06d77`) and D35192317 (`a9216cde6c`), which together are a diff that changes the internal representation of `DispatchKeySet` in pytorch core to free up the number of dispatch keys that we have available. See a more detailed description of the design in the original PR: https://github.com/pytorch/pytorch/pull/69633. The original PR broke Milan workflows, which use a pytorch mobile build, and manifested as a memory corruption bug inside of `liboacrmerged.so`. Background: Existing Mobile Optimization Pytorch mobile builds have an existing optimization (here `cc23725e89/c10/core/DispatchKey.h (L382)` and here `cc23725e89/aten/src/ATen/core/dispatch/OperatorEntry.h (L214)`), which works as follows: Every operator in pytorch has a "dispatch table" of function pointers, corresponding to all of the (up to 64) different kernels that we might dispatch to when we run an operator in pytorch (autograd, cpu, cuda, complex number support, etc). In mobile builds, the size of that table is shrunk from 64 to 8 to save a bunch of space, because mobile doesn't end up using the functionality associated with most dispatch keys. The dispatcher also has a notion of "fallback kernels", which are kernels that you can register to a particular dispatch key, but should be able to work for "any operator". The array of fallback kernels is defined here: `cc23725e89/aten/src/ATen/core/dispatch/Dispatcher.h (L294)`. The mobile-optimization currently does not extend to this array (it wouldn't be that useful anyway because there is only one array of fallback kernels globally - vs. there is a separate dispatch table of function pointers per operator). So the per-operator tables on mobile are size 8, while the fallback table is size 64. 
The Bug This PR actually made it difficult to enable that optimization separately for the per-operator arrays vs. the fallback array, and incidentally shrank the size of the fallback array from 64 to 8 for mobile (that happened on this line: https://github.com/pytorch/pytorch/pull/69633/files#diff-f735cd7aa68f15b624100cbc4bb3b5ea76ffc7c9d3bec3b0ccabaa09609e5319R294). That isn't a problem by itself (since mobile doesn't actually use any of the fallbacks that can no longer be stored). However, pytorch core will still register all of those fallback kernels on startup in mobile builds, even if they aren't used. When we tried to register one of those fallbacks on startup, it would try to write the kernel somewhere in memory past the bounds of the (now smaller) array inside of the `Dispatcher` object, `backendFallbackKernels_`. Why didn't this problem show up in OSS CI? Why didn't it break other internal mobile workflows aside from Milan? Ideally, this failure would show up as part of the OSS signal on GitHub, since we already have mobile OSS builds. Given that it was another memory corruption issue that only affected Milan (subset of mobile), I'm not sure what's specific about Milan's builds that caused it only to manifest there. dreiss I wonder if there's another flavor of mobile builds we could run in OSS CI that could potentially help catch this? The debugging experience was pretty difficult Debugging the Milan-specific failure was made difficult by the following: (1) lack of CI - the original Milan failure didn't surface on my original diff, because the Milan job(s) that failed weren't triggered to run on pytorch changes. There's probably a balance to strike here, since those jobs will only be useful if they aren't flaky, and if they can produce reliable failure logs for debugging. (2) It's difficult to get a repro. 
- my work laptop doesn't have the right specs to run the Milan development workflow (not enough disk space) - There is an existing OnDemand workflow for Milan, but it appears to be relatively new, and after a bunch of help from MarcioPorto, we ran into issues forwarding the log output from Milan tests on the emulator back to the terminal (see the original discussion here: https://fb.workplace.com/groups/OnDemandFRL/permalink/1424937774645433/) (3) Lack of stack-traces. - Most Milan failures didn't include actionable stack traces. phding generously helped me debug by running my suggested patches locally, and reporting back if there were any failures. The failing test didn't include a stack trace though (just the line where the crash appeared), so I ended up making some educated guesses about what the issue was based on the area of the crash. ghstack-source-id: 152688542 Test Plan: Confirmed with phding that the broken Milan workflow from the previous version of this diff is now passing. Reviewed By: phding, albanD Differential Revision: D35222806 fbshipit-source-id: 0ad115a0f768bc8ea5d4c203b2990254c7092d30 (cherry picked from commit 002b91966f11fd55ab3fa3801b636fa39a6dd12c)	2022-03-31 21:52:38 +00:00
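The failure mode described in the Background and The Bug paragraphs of this commit can be modeled in a few lines. This is a toy Python sketch under stated assumptions taken from the commit text: 64 dispatch keys, a fallback table shrunk to 8 entries on mobile, and startup code that registers a fallback for every key. `ToyDispatcher`, `register_fallback`, and `register_all_fallbacks` are hypothetical names that do not mirror c10's actual `Dispatcher` API. In C++ an unchecked write past the end of a fixed-size array is undefined behavior, which is how the real bug corrupted memory; the explicit bounds check here turns it into an immediate, descriptive error.

```python
from typing import Callable, List, Optional

NUM_DISPATCH_KEYS = 64  # full key space, per the commit text
MOBILE_TABLE_SIZE = 8   # shrunken table size used by mobile builds

Kernel = Callable[..., object]


class ToyDispatcher:
    """Hypothetical stand-in for a dispatcher with one global fallback table."""

    def __init__(self, fallback_table_size: int) -> None:
        self.backend_fallback_kernels: List[Optional[Kernel]] = [None] * fallback_table_size

    def register_fallback(self, key: int, kernel: Kernel) -> None:
        size = len(self.backend_fallback_kernels)
        # An unchecked C++ array write here would silently corrupt memory.
        if not 0 <= key < size:
            raise IndexError(f"dispatch key {key} out of range for fallback table of size {size}")
        self.backend_fallback_kernels[key] = kernel


def register_all_fallbacks(dispatcher: ToyDispatcher) -> int:
    """Mimic startup: register a fallback for every key, used or not."""
    for key in range(NUM_DISPATCH_KEYS):
        dispatcher.register_fallback(key, lambda *args, **kwargs: None)
    return NUM_DISPATCH_KEYS


register_all_fallbacks(ToyDispatcher(NUM_DISPATCH_KEYS))  # fine: table matches the key space
try:
    register_all_fallbacks(ToyDispatcher(MOBILE_TABLE_SIZE))
except IndexError as err:
    print("caught:", err)  # fails at key 8 instead of writing out of bounds
```

Sizing the fallback table independently of the per-operator tables, or checking the key against the table size at registration time, would both prevent the out-of-bounds write in this model.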
Nikita Shulga	bfac65dfe5	[testing] Update dispatch macros (#74977 ) This PR is reland of #74289 Co-authored-by: Khushi Agrawal <khushiagrawal411@gmail.com>	2022-03-30 14:13:21 -07:00
PyTorch MergeBot	2e4152b118	Revert "[testing] Update dispatch macros" This reverts commit `eed19a0f38`. Reverted https://github.com/pytorch/pytorch/pull/74289 on behalf of https://github.com/malfet	2022-03-30 19:52:37 +00:00
Khushi Agrawal	eed19a0f38	[testing] Update dispatch macros Hi, This PR is the follow-up PR of #71561. (the previous PR had a couple of merge conflicts and was reverted, this PR resolves that). Please take a look. Thanks! cc: @pmeier @mruberry @kshitij12345 Pull Request resolved: https://github.com/pytorch/pytorch/pull/74289 Approved by: https://github.com/pmeier, https://github.com/mruberry	2022-03-30 16:10:16 +00:00
Brian Hirsh	9872a06d77	Back out "free up dispatch key space (in C++)" (#74859 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74859 Original commit changeset: 6d1dd0fd8144 Original Phabricator Diff: D34227616 (`2cbddc0e9b`) ghstack-source-id: 152381077 (Note: this ignores all push blocking failures!) Test Plan: Test on Milan with "get weather utterance" buck build fbsourcefbandroid/mode/opt fbsourcefbandroid/mode/milan_build_rdk //fbandroid/apps/wearable/system/speechservice:speechservice_target30_xhdpi_armv7_release_debug_keystore -c pt.has_backtaces=1 Reviewed By: phding Differential Revision: D35192346 fbshipit-source-id: b962de5d5effaf23f9aa8afd3ef36f8c6383de5b (cherry picked from commit 913e3027a11457aaa2d97a9d89ebc6133b14213c)	2022-03-29 15:39:17 +00:00
Christian Puhrsch	e55b73d65a	Add strided layout support for to_dense Fixes #59958 Pull Request resolved: https://github.com/pytorch/pytorch/pull/74486 Approved by: https://github.com/pearu, https://github.com/suo	2022-03-29 00:12:48 +00:00
Pearu Peterson	ebeea9e2ea	Support masked sum on sparse COO tensors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/71239 Approved by: https://github.com/cpuhrsch	2022-03-25 18:26:39 +00:00
Brian Hirsh	2cbddc0e9b	free up dispatch key space (in C++) (#72827 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72827 Reland of D34034848 (`6690256021`) ghstack-source-id: 152161452 Test Plan: Confirm that Milan tests are passing Reviewed By: ezyang Differential Revision: D34227616 fbshipit-source-id: 6d1dd0fd8144dfbd9e194cd7564cce017e7db968 (cherry picked from commit e5c1b29fedd5c2a0bad810cedc94aa784136b6aa)	2022-03-25 17:04:51 +00:00
Nikita Shulga	ef066f0832	Revert D34856571: [pytorch][PR] Replace `get_all_` type macros with the ATen dispatch macros. Test Plan: revert-hammer Differential Revision: D34856571 (`3ded7b1da3`) Original commit changeset: 0dca038bcad5 Original Phabricator Diff: D34856571 (`3ded7b1da3`) fbshipit-source-id: 594553fa0b710d78beba59d5d2b646f1f1270386 (cherry picked from commit 8090eb9b12dcf452a9e7dc01792a66fb91b563b6)	2022-03-15 22:07:11 +00:00
Khushi Agrawal	3ded7b1da3	Replace `get_all_` type macros with the ATen dispatch macros. (#71561 ) Summary: Hi, Team! The PR is motivated from https://github.com/pytorch/pytorch/pull/71153#discussion_r782446738. It aims to replace `get_all` type macros with the ATen dispatch macros. The files it iterates over are: (Thanks, Lezcano, for the idea!!) <details> <summary> `test/test_autograd.py`</summary> <p> ```python 43:from torch.testing._internal.common_dtype import get_all_dtypes 8506: floating_dt = [dt for dt in get_all_dtypes() if dt.is_floating_point] ``` </p> </details> <details> <summary> `test/test_binary_ufuncs.py`</summary> <p> ```python 26: all_types_and_complex_and, integral_types_and, get_all_dtypes, get_all_int_dtypes, get_all_math_dtypes, 27: get_all_complex_dtypes, get_all_fp_dtypes, 935: dtypes(get_all_dtypes(include_bool=False, include_complex=False)) 1035: dtypes(get_all_dtypes( 1488: dtypes((get_all_dtypes(include_bool=False, include_bfloat16=False))) 1879: dtypes(product(get_all_dtypes(include_complex=False), get_all_dtypes(include_complex=False))) 1887: dtypes((get_all_int_dtypes() + [torch.bool])) 1913: dtypes((get_all_fp_dtypes())) 1941: dtypes((get_all_fp_dtypes())) 1977: dtypes(product(get_all_complex_dtypes(), get_all_dtypes())) 2019: dtypes(product(get_all_fp_dtypes(), get_all_fp_dtypes())) 2048: dtypes(get_all_dtypes()) 2110: dtypes(product(get_all_dtypes(include_complex=False), 2111: get_all_dtypes(include_complex=False))) 2128: types = [torch.bool, torch.bfloat16] + get_all_int_dtypes() 2173: if dtypes[1] in get_all_fp_dtypes(): 2178: dtypes(product(get_all_fp_dtypes(), 2179: get_all_fp_dtypes())) 2260: dtypesIfCUDA(set(get_all_math_dtypes('cuda')) - {torch.complex64, torch.complex128}) 2261: dtypes(set(get_all_math_dtypes('cpu')) - {torch.complex64, torch.complex128}) 2273: dtypesIfCUDA(set(get_all_math_dtypes('cuda')) - {torch.complex64, torch.complex128}) 2274: dtypes(set(get_all_math_dtypes('cpu')) - {torch.complex64, 
torch.complex128}) 2307: dtypes(get_all_math_dtypes('cpu')) 2319: dtypes(get_all_fp_dtypes(include_bfloat16=False)) 2331: dtypes(get_all_int_dtypes()) 2356: dtypes(get_all_dtypes(include_bfloat16=False, include_bool=False, include_complex=False)) 2393: if dtype in get_all_int_dtypes(): 2614: dtypes(get_all_dtypes()) 2624: dtypes(tuple(itertools.combinations_with_replacement(get_all_dtypes(), 2))) 2806: dtypes(list(product(get_all_dtypes(include_complex=False), 2807: get_all_dtypes(include_complex=False)))) 2866: dtypes(list(product(get_all_complex_dtypes(), 2867: get_all_complex_dtypes()))) 2902: dtypes(product(get_all_dtypes(), get_all_dtypes())) 2906: dtypes(product(get_all_dtypes(), get_all_dtypes())) 2910: dtypes(product(get_all_dtypes(), get_all_dtypes())) 3019: dtypes = [torch.float, torch.double] + get_all_complex_dtypes() 3221: dtypes(get_all_dtypes(include_complex=False)) 3407: dtypes(list(product(get_all_dtypes(include_bool=False), 3408: get_all_dtypes(include_bool=False)))) 3504: dtypes(product(get_all_dtypes(include_complex=False, include_bfloat16=False), 3505: get_all_dtypes(include_complex=False, include_bfloat16=False))) 3516: if x.dtype in get_all_int_dtypes() + [torch.bool]: 3643: dtypes(product(get_all_dtypes(include_complex=False, 3645: get_all_dtypes(include_complex=False, ``` </p> </details> <details> <summary> `test/test_complex.py`</summary> <p> ```python 6:from torch.testing._internal.common_dtype import get_all_complex_dtypes 11: dtypes(get_all_complex_dtypes()) ``` </p> </details> <details> <summary> `test/test_foreach.py`</summary> <p> ```python 18: get_all_dtypes, get_all_int_dtypes, get_all_complex_dtypes, get_all_fp_dtypes, 142: if dtype in get_all_int_dtypes(): 179: disable_fastpath = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool] 201: disable_fastpath = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool] 205: disable_fastpath |= dtype in get_all_int_dtypes() + [torch.bool] 211: disable_fastpath 
|= dtype not in get_all_complex_dtypes() 241: bool_int_div = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool] 246: disable_fastpath |= dtype in get_all_int_dtypes() + [torch.bool] 248: disable_fastpath |= dtype not in get_all_complex_dtypes() 250: disable_fastpath |= True and dtype not in get_all_complex_dtypes() 307: disable_fastpath = dtype in get_all_int_dtypes() + [torch.bool] 365: if opinfo.name == "_foreach_abs" and dtype in get_all_complex_dtypes(): 376: ops(foreach_unary_op_db, dtypes=get_all_dtypes()) 393: dtypes=get_all_dtypes(include_half=True, include_bfloat16=True, include_complex=False)) 401: ops(foreach_minmax_op_db, dtypes=get_all_fp_dtypes(include_bfloat16=True, include_half=True)) 426: if ord in (1, 2) and dtype in torch.testing.get_all_fp_dtypes(): 439: dtypes(get_all_dtypes()) 449: ops(foreach_binary_op_db, dtypes=get_all_dtypes()) 481: ops(foreach_binary_op_db, dtypes=get_all_dtypes()) 536: if dtype in get_all_int_dtypes() + [torch.bool] and foreach_op == torch._foreach_div: 545: ops(foreach_binary_op_db, dtypes=get_all_dtypes()) 637: ops(foreach_pointwise_op_db, allowed_dtypes=get_all_fp_dtypes(include_half=False, include_bfloat16=False)) ``` </p> </details> <details> <summary> `test/test_linalg.py`</summary> <p> ```python 29: all_types, floating_types, floating_and_complex_types, get_all_dtypes, get_all_int_dtypes, get_all_complex_dtypes, 30: get_all_fp_dtypes, 111: dtypes((get_all_dtypes())) 794: float_and_complex_dtypes = get_all_fp_dtypes() + get_all_complex_dtypes() 807: dtypes((get_all_int_dtypes())) 828: dtypes((get_all_fp_dtypes() + get_all_complex_dtypes())) 841: if dtype in get_all_complex_dtypes(): 844: dtypes(itertools.product(get_all_dtypes(), 845: get_all_dtypes())) 855: for dtypes0, dtypes1, dtypes2 in product(get_all_dtypes(), repeat=3): 5607: get_all_fp_dtypes(include_half=not CUDA9, include_bfloat16=(CUDA11OrLater and SM53OrLater))) 5608: dtypes((set(get_all_dtypes()) - {torch.half, torch.bool})) 5644: 
dtypes((get_all_complex_dtypes() + get_all_fp_dtypes())) 6255: dtypesIfCUDA(get_all_complex_dtypes(), 6256: get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater)), 6292: dtypesIfCUDA(get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater)))) 6323: dtypesIfCUDA(get_all_complex_dtypes(), 6324: get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater)))) 6325: dtypes(get_all_complex_dtypes(), get_all_fp_dtypes()) 6358: dtypesIfCUDA(([torch.float, torch.double] + get_all_complex_dtypes())) 6556: dtypes(get_all_fp_dtypes(), get_all_complex_dtypes()) 6668: dtypes(get_all_fp_dtypes(), get_all_complex_dtypes()) 6741: dtypes(get_all_fp_dtypes(), get_all_complex_dtypes()) ``` </p> </details> <details> <summary> `test/test_nn.py`</summary> <p> ```python 37:from torch.testing._internal.common_dtype import integral_types, get_all_fp_dtypes, get_all_math_dtypes 50: onlyNativeDeviceTypes, deviceCountAtLeast, largeTensorTest, expectedFailureMeta, skipMeta, get_all_device_types, \ 8862: for device in get_all_device_types(): 9629: for dt1 in get_all_math_dtypes(device): 9630: for dt2 in get_all_math_dtypes(device): 9631: for dt3 in get_all_math_dtypes(device): 9648: for input_dtype in get_all_math_dtypes(device): 9664: for input_dtype in get_all_math_dtypes(device): 13015: dtypes(get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM)) 13034: dtypes(get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM)) 13159: dtypes(get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM)) 17400: dtypesIfCUDA(get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM)) 17768: dtypesIfCUDA(get_all_fp_dtypes()) 17773: dtypesIfCUDA(get_all_fp_dtypes()) 17778: dtypesIfCUDA(get_all_fp_dtypes()) 17783: dtypesIfCUDA(get_all_fp_dtypes()) 17788: dtypesIfCUDA(get_all_fp_dtypes()) 17793: dtypesIfCUDA(get_all_fp_dtypes()) 17798: dtypesIfCUDA(get_all_fp_dtypes()) 17963: dtypesIfCUDA(get_all_fp_dtypes()) 17977: dtypesIfCUDA(get_all_fp_dtypes()) 
18684: def test_cross_entropy_loss_prob_target_all_reductions(self, device): ``` </p> </details> <details> <summary> `test/test_numpy_interop.py`</summary> <p> ```python 12:from torch.testing._internal.common_dtype import get_all_dtypes 399: dtypes(get_all_dtypes()) ``` </p> </details> <details> <summary> `test/test_ops.py`</summary> <p> ```python 12:from torch.testing._internal.common_dtype import floating_and_complex_types_and, get_all_dtypes 86: for dtype in get_all_dtypes(): ``` </p> </details> <details> <summary> `test/test_reductions.py`</summary> <p> ```python 16: get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_complex_dtypes, get_all_fp_dtypes, 360: allowed_dtypes=get_all_dtypes(include_bfloat16=False)) 366: allowed_dtypes=get_all_dtypes(include_bfloat16=False)) 394: allowed_dtypes=get_all_dtypes(include_bfloat16=False)) 750: for dtype in [dtype for dtype in get_all_math_dtypes('cpu') if dtype != torch.float16]: 1404: dtypes(get_all_dtypes(include_bool=False, include_complex=False)) 1457: dtypes((get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) + 1458: get_all_complex_dtypes())) 1465: return dtype in get_all_int_dtypes() 1494: dtypes((get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False))) 1501: dtypes((get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False))) 1507: dtypes((get_all_complex_dtypes())) 1514: dtypes = list(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False)) 1523: dtypes((get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False))) 1531: if dtype in get_all_fp_dtypes(): 1608: dtypes((get_all_dtypes(include_half=True, include_bfloat16=False, 1837: dtypes(get_all_dtypes(include_bool=False, include_complex=False)) 1855: dtypes((set(get_all_dtypes(include_bool=False, include_complex=False)) - {torch.uint8})) 3219: for dtype in get_all_dtypes(include_half=True, include_bfloat16=False, ``` </p> </details> <details> <summary> `test/test_serialization.py`</summary> <p> ```python 
26:from torch.testing._internal.common_dtype import get_all_dtypes 586: for device, dtype in product(devices, get_all_dtypes()): 589: for other_dtype in get_all_dtypes(): ``` </p> </details> <details> <summary> `test/test_shape_ops.py`</summary> <p> ```python 18:from torch.testing._internal.common_dtype import get_all_dtypes 230: dtypes(get_all_dtypes(include_complex=False, include_bool=False, include_half=False, 232: dtypesIfCUDA(get_all_dtypes(include_complex=False, include_bool=False, include_bfloat16=False)) 344: dtypes(get_all_dtypes()) 443: dtypes(get_all_dtypes()) 461: dtypes(get_all_dtypes()) 570: dtypes(get_all_dtypes(include_complex=False)) ``` </p> </details> <details> <summary> `test/test_sort_and_select.py`</summary> <p> ```python 12: all_types, all_types_and, floating_types_and, get_all_dtypes, get_all_int_dtypes, get_all_fp_dtypes, 136: dtypes(set(get_all_dtypes()) - {torch.bool, torch.complex64, torch.complex128}) 231: dtypes(set(get_all_dtypes()) - {torch.bool, torch.complex64, torch.complex128}) 296: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 647: dtypesIfCUDA(get_all_fp_dtypes()) 678: dtypesIfCUDA((get_all_dtypes(include_complex=False, 682: dtypes((get_all_dtypes(include_complex=False, include_bool=False, include_half=False, include_bfloat16=False))) 739: dtypesIfCPU(set(get_all_dtypes()) - {torch.complex64, torch.complex128}) 740: dtypes(set(get_all_dtypes()) - {torch.bfloat16, torch.complex64, torch.complex128}) 799: dtypesIfCPU(set(get_all_dtypes()) - {torch.complex64, torch.complex128}) 800: dtypes(set(get_all_dtypes()) - {torch.bfloat16, torch.complex64, torch.complex128}) ``` </p> </details> <details> <summary> `test/test_sparse.py`</summary> <p> ```python 20:from torch.testing import get_all_complex_dtypes, get_all_fp_dtypes 29: floating_and_complex_types, floating_and_complex_types_and, get_all_dtypes, get_all_int_dtypes, 1963: return dtype in get_all_int_dtypes() 1994: dtypes(get_all_dtypes(include_bool=False, 
include_half=False, 2103: return dtype in get_all_int_dtypes() 2138: dtypes(get_all_dtypes(include_bool=False, include_half=False, 2626: all_sparse_dtypes = get_all_dtypes(include_complex=True) 2633: all_sparse_dtypes = get_all_dtypes(include_complex=True) 3230: dtypes(get_all_complex_dtypes(), 3231: get_all_fp_dtypes(include_half=False, include_bfloat16=False)) 3234: get_all_fp_dtypes( ``` </p> </details> <details> <summary> `test/test_sparse_csr.py`</summary> <p> ```python 7:from torch.testing import get_all_complex_dtypes, get_all_fp_dtypes, floating_and_complex_types, make_tensor 17:from torch.testing._internal.common_dtype import floating_types, get_all_dtypes 120: dtypes(get_all_dtypes()) 133: dtypes(get_all_dtypes()) 150: dtypes(get_all_dtypes()) 180: dtypes(get_all_dtypes()) 201: dtypes(get_all_dtypes()) 210: dtypes(get_all_dtypes()) 225: dtypes(get_all_dtypes()) 244: dtypes(get_all_dtypes()) 263: dtypes(get_all_dtypes()) 285: dtypes(get_all_dtypes()) 411: dtypes(get_all_dtypes()) 482: dtypes(get_all_dtypes()) 502: dtypes(get_all_dtypes()) 562: dtypes(get_all_dtypes()) 588: dtypesIfCUDA(get_all_complex_dtypes(), 589: get_all_fp_dtypes(include_half=SM53OrLater, include_bfloat16=SM80OrLater)) 745: dtypesIfCUDA(get_all_complex_dtypes(), 746: get_all_fp_dtypes(include_half=SM53OrLater and TEST_CUSPARSE_GENERIC, 765: dtypesIfCUDA(get_all_complex_dtypes(), 766: get_all_fp_dtypes(include_half=SM53OrLater and TEST_CUSPARSE_GENERIC, 801: torch.testing.get_all_fp_dtypes(include_bfloat16=SM80OrLater, 841: torch.testing.get_all_fp_dtypes(include_bfloat16=SM80OrLater, 1182: dtypes(get_all_dtypes()) 1276: dtypes(get_all_dtypes(include_bool=False, include_half=False, include_bfloat16=False)) 1286: dtypes(get_all_dtypes()) ``` </p> </details> <details> <summary> `test/test_tensor_creation_ops.py`</summary> <p> ```python 21: onlyCUDA, skipCPUIf, dtypesIfCUDA, skipMeta, get_all_device_types) 23: get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_fp_dtypes, 
get_all_complex_dtypes 150: for dt in get_all_dtypes(): 160: for dt in get_all_dtypes(): 314: dtypes = [dtype for dtype in get_all_dtypes() if dtype != torch.bfloat16] 1012: dtypes((get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) + 1013: get_all_complex_dtypes())) 1032: dtypes((get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) + 1033: get_all_complex_dtypes())) 1050: dtypes((get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) + 1051: get_all_complex_dtypes())) 1745: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 1779: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 1868: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 1926: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 1954: do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, torch_device) 1956: do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, None) 1957: do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, torch_device) 2538: for device in get_all_device_types(): 2645: for dtype in get_all_dtypes(): 2678: dtypes((get_all_fp_dtypes(include_half=False, include_bfloat16=False) + 2679: get_all_complex_dtypes())) 2716: dtypes(get_all_fp_dtypes(include_half=False, include_bfloat16=False)) 2827: for dt in get_all_dtypes(): 2913: dtypes(get_all_dtypes(include_bool=False, include_half=False)) 2914: dtypesIfCUDA(get_all_dtypes(include_bool=False, include_half=True)) 3028: dtypes((get_all_fp_dtypes() + get_all_complex_dtypes())) 3033: dtypes((get_all_fp_dtypes() + get_all_complex_dtypes())) 3074: dtypes(get_all_dtypes(include_bool=False, include_half=False, include_complex=False)) 3075: dtypesIfCUDA(((get_all_int_dtypes() + [torch.float32, torch.float16, torch.bfloat16]) 3077: else get_all_dtypes(include_bool=False, include_half=True, include_complex=False))) 3873: dtypes(get_all_dtypes()) 3884: dtypes(get_all_dtypes(include_bool=False)) 3916: for other in get_all_dtypes(): 3922: dtypes(get_all_dtypes()) 
3932: dtypes(get_all_dtypes(include_bool=False)) 3955: dtypes(get_all_dtypes(include_bool=False)) 3961: dtypes(get_all_dtypes(include_bool=False)) 3965: dtypes(get_all_dtypes()) ``` </p> </details> <details> <summary> `test/test_testing.py`</summary> <p> ```python 25:from torch.testing._internal.common_dtype import get_all_dtypes 31: dtypes((get_all_dtypes(include_half=True, include_bfloat16=False, ``` </p> </details> <details> <summary> `test/test_torch.py`</summary> <p> ```python 51: expectedAlertNondeterministic, get_all_device_types, skipXLA) 57: get_all_fp_dtypes, get_all_int_dtypes, get_all_math_dtypes, get_all_dtypes, get_all_complex_dtypes 296: for d in get_all_device_types(): 323: for device in get_all_device_types(): 324: for dt1 in get_all_dtypes(): 325: for dt2 in get_all_dtypes(): 343: all_dtypes = get_all_dtypes() 350: all_dtypes = get_all_dtypes() 781: for dtype in get_all_dtypes(): 986: for device in get_all_device_types(): 1017: for device in get_all_device_types(): 1018: for dtype in get_all_math_dtypes(device): 2792: for device in get_all_device_types(): 3186: dtypes(get_all_dtypes()) 3195: for error_dtype in get_all_dtypes(): 3203: dtypes(get_all_dtypes()) 3212: for error_dtype in get_all_dtypes(): 4539: dtypes(get_all_fp_dtypes()) 4545: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 4577: dtypes(get_all_fp_dtypes(include_half=False, include_bfloat16=False)) 4578: dtypesIfCPU((get_all_fp_dtypes(include_half=False, include_bfloat16=True))) 4579: dtypesIfCUDA((get_all_fp_dtypes(include_bfloat16=False))) 4599: dtypes((get_all_fp_dtypes(include_half=False, include_bfloat16=False))) 4600: dtypesIfCPU((get_all_dtypes(include_half=False, include_bfloat16=False, include_complex=False))) 4601: dtypesIfCUDA((get_all_dtypes(include_bfloat16=False, include_complex=False))) 4613: for p_dtype in get_all_fp_dtypes(include_half=device.startswith('cuda'), include_bfloat16=False): 4628: dtypes((get_all_fp_dtypes(include_half=False, include_bfloat16=False))) 
4629: dtypesIfCUDA((get_all_fp_dtypes(include_bfloat16=False))) 4640: dtypes(get_all_fp_dtypes()) 4723: dtypes(get_all_fp_dtypes()) 4735: dtypes(get_all_fp_dtypes(include_bfloat16=False)) 4736: dtypesIfCUDA(get_all_fp_dtypes()) 4747: dtypes(get_all_fp_dtypes()) 4761: dtypes(get_all_fp_dtypes()) 4771: dtypes(get_all_fp_dtypes()) 4792: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 5302: dtypes(get_all_dtypes(include_bfloat16=False)) 5322: dtypes(get_all_dtypes(include_half=False, include_bfloat16=False)) 5323: dtypesIfCPU(get_all_dtypes(include_bfloat16=False)) 5324: dtypesIfCUDA(get_all_dtypes(include_bfloat16=False)) 5591: for dt in get_all_dtypes(): 5611: for dt in get_all_dtypes(): 5678: for dt in get_all_dtypes(): 5696: dtypesIfCUDA(set(get_all_math_dtypes('cuda'))) 5697: dtypes(set(get_all_math_dtypes('cpu'))) 5746: dtypes(get_all_dtypes()) 5780: dtypes(get_all_dtypes()) 5885: dtypes(get_all_dtypes()) 5902: dtypes(get_all_dtypes()) 5945: dtypes(get_all_dtypes()) 5979: dtypes(get_all_dtypes(include_bool=False)) 6049: dtypes(get_all_dtypes(include_bool=False)) 6092: dtypes((get_all_fp_dtypes(include_bfloat16=False, include_half=False) + 6093: get_all_complex_dtypes())) 6094: dtypesIfCPU(get_all_dtypes()) 6095: dtypesIfCUDA(get_all_dtypes()) 6122: dtypes((get_all_fp_dtypes(include_bfloat16=False, include_half=False) + 6123: get_all_complex_dtypes())) 6124: dtypesIfCPU(get_all_dtypes()) 6125: dtypesIfCUDA(get_all_dtypes()) 6163: dtypes((get_all_fp_dtypes(include_bfloat16=False, include_half=False) + 6164: get_all_complex_dtypes())) 6165: dtypesIfCPU(get_all_dtypes()) 6166: dtypesIfCUDA(get_all_dtypes()) 6190: dtypes((get_all_complex_dtypes() + 6191: get_all_int_dtypes())) 6238: dtypes(get_all_dtypes()) 6323: dtypes(get_all_dtypes()) 6389: dtypes(product(get_all_dtypes(), (torch.uint8, torch.bool))) 6699: dtypesIfCUDA(set(get_all_math_dtypes('cuda'))) 6700: dtypes(set(get_all_math_dtypes('cpu'))) 7452: dtypes(get_all_dtypes(include_bool=False)) 7461: 
dtypes(get_all_dtypes(include_bool=False)) 7477: dtypes(get_all_dtypes(include_bool=False)) 7496: dtypes(get_all_dtypes(include_bool=False)) 7538: dtypes(get_all_dtypes(include_bool=False)) 8162: dtypes((get_all_int_dtypes() + get_all_fp_dtypes() + 8163: get_all_complex_dtypes())) 8175: dtypes((get_all_int_dtypes() + get_all_fp_dtypes() + 8176: get_all_complex_dtypes())) ``` </p> </details> <details> <summary> `test/test_type_promotion.py`</summary> <p> ```python 14: get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_fp_dtypes 187: for dtype in get_all_dtypes(): 262: dtypes1 = get_all_math_dtypes('cuda') 263: dtypes2 = get_all_math_dtypes(device) 339: dtypes(itertools.product(get_all_dtypes(), get_all_dtypes())) 468: for dt1 in get_all_math_dtypes(device): 469: for dt2 in get_all_math_dtypes(device): 519: for dt1 in get_all_math_dtypes(device): 520: for dt2 in get_all_math_dtypes(device): 528: for dt in get_all_math_dtypes(device): 561: for dtype in get_all_dtypes(): 766: dtypes=get_all_math_dtypes(device)) 771: dtypes=get_all_math_dtypes(device)) 782: dtypes=get_all_math_dtypes(device)) 879: dtypes = get_all_dtypes(include_bfloat16=False) 898: dtypes = get_all_dtypes(include_bfloat16=False, include_bool=False) 965: dtypesIfCUDA(itertools.product(get_all_dtypes(include_bfloat16=False, include_complex=False), 966: get_all_dtypes(include_bfloat16=False, include_complex=False))) 967: dtypes(itertools.product(get_all_dtypes(include_half=False, include_bfloat16=False, 969: get_all_dtypes(include_half=False, include_bfloat16=False, 976: return dtype in get_all_int_dtypes() + [torch.bool] 979: return dtype in get_all_fp_dtypes(include_half=True, include_bfloat16=False) ``` </p> </details> <details> <summary> `test/test_unary_ufuncs.py`</summary> <p> ```python 24: floating_types_and, all_types_and_complex_and, floating_and_complex_types_and, get_all_dtypes, get_all_math_dtypes, 25: get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes 517: 
dtypes((get_all_int_dtypes() + [torch.bool] + 518: get_all_fp_dtypes(include_bfloat16=False))) 596: dtypes(get_all_fp_dtypes(include_half=True, include_bfloat16=False)) 611: invalid_input_dtypes = get_all_int_dtypes() + \ 612: get_all_complex_dtypes() + \ 619: for dtype in get_all_fp_dtypes(include_half=True, include_bfloat16=False): 1048: dtypes(get_all_math_dtypes('cpu')) 1182: dtypesIfCUDA(get_all_fp_dtypes()) 1190: dtypesIfCUDA(get_all_fp_dtypes()) 1205: dtypesIfCUDA(get_all_fp_dtypes()) 1215: dtypesIfCUDA(get_all_fp_dtypes()) 1307: dtypes((get_all_dtypes(include_bool=False))) 1349: dtypes((get_all_fp_dtypes(include_half=False) + 1350: get_all_complex_dtypes())) 1351: dtypesIfCUDA((get_all_fp_dtypes(include_half=True) + 1352: get_all_complex_dtypes())) ``` </p> </details> <details> <summary> `test/test_view_ops.py`</summary> <p> ```python 19: get_all_dtypes, get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes 124: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 131: dtypes(get_all_dtypes(include_bfloat16=False)) 213: for view_dtype in [get_all_fp_dtypes(), get_all_complex_dtypes()]: 220: dtypes(get_all_dtypes()) 224: for view_dtype in get_all_dtypes(): 305: dtypes(get_all_complex_dtypes(include_complex32=True)) 343: dtypes(get_all_dtypes()) 354: dtypes(get_all_dtypes()) 364: dtypes(get_all_dtypes()) 374: dtypes(get_all_dtypes()) 384: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 395: dtypes(get_all_complex_dtypes()) 426: dtypes(get_all_complex_dtypes()) 451: dtypes(product(get_all_complex_dtypes(), get_all_dtypes())) 1263: dtypes((torch.testing.get_all_dtypes())) 1279: dtypes((torch.testing.get_all_dtypes())) 1405: dtypes((get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) + 1406: get_all_complex_dtypes())) 1471: dtypes(get_all_dtypes(include_bfloat16=False)) 1574: dtypes(get_all_dtypes()) 1601: dtypes(get_all_dtypes(include_bfloat16=False)) 1632: dtypes(*get_all_dtypes(include_bfloat16=False)) 1711: for dt in 
get_all_dtypes(): 1717: for dt in get_all_dtypes(): 1724: for dt in get_all_dtypes(): ``` </p> </details> I'm looking forward to your viewpoints. Thanks :) cc: mruberry kshitij12345 anjali411 Pull Request resolved: https://github.com/pytorch/pytorch/pull/71561 Reviewed By: samdow Differential Revision: D34856571 Pulled By: mruberry fbshipit-source-id: 0dca038bcad5cf69906245c496d2e61ac3876335 (cherry picked from commit b058f67b4313143efa714ab105f36e74083131b9)	2022-03-15 20:31:41 +00:00
Pearu Peterson	a5dcc0c378	Enable test_coalesce_cuda_bfloat16 (#73158 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73158 Fixes #72893 Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D34515679 Pulled By: cpuhrsch fbshipit-source-id: 049f8ddf53023b78e1b48e15bbd3cdc58b6bf692 (cherry picked from commit 28a44ca56f66bfaaf14a049856b7d89fec8cd838)	2022-02-28 19:34:20 +00:00
Pearu Peterson	3c932c345b	Fix test_Sparse_to_Sparse_copy__cuda_bfloat16 failure (#73157 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73157 Fixes #72892 Test Plan: Imported from OSS Reviewed By: george-qi Differential Revision: D34398986 Pulled By: cpuhrsch fbshipit-source-id: 20214be1859354fb18a306e8d1de9852a898c485 (cherry picked from commit c1816ef0cf8834149bebcc11f4402f0eedfae6f7)	2022-02-28 05:33:50 +00:00
Pearu Peterson	16cd6853e1	Fix test_sparse_addmm_...float16 and test_sparse_matmul_...float16 test failures (#73155 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73155 Fixes #73145 Test Plan: Imported from OSS Reviewed By: mikaylagawarecki Differential Revision: D34398935 Pulled By: cpuhrsch fbshipit-source-id: b1e852f25b0888b37d9c9c1418ddf344ac8f0a04 (cherry picked from commit d63c977fb39c7dcb3f3d083edc4b25cd2d6c2ec4)	2022-02-26 05:30:36 +00:00
Pearu Peterson	4c522643e7	Fix CUDA error when multiplying sparse hybrid tensors with zero dense dimensions (#73428 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73428 Fixes https://github.com/pytorch/pytorch/issues/73363 Test Plan: Imported from OSS Reviewed By: george-qi Differential Revision: D34478521 Pulled By: cpuhrsch fbshipit-source-id: cbc83f223a14c92ed8b284e5e2a8aab390e2bc5c (cherry picked from commit 9d7ecc848228f9a5b1761f9d3653d3cca49e0244)	2022-02-26 01:08:45 +00:00
Philip Meier	0973c5a1cc	align signature of make_tensor with other creation ops (#72702 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72702 Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D34457729 Pulled By: mruberry fbshipit-source-id: 83d580c4201eef946dc9cf4b9e28a3d36be55609 (cherry picked from commit aa4cf20fbeb4b795595729b8ac2e6ba7707d8283)	2022-02-25 06:30:31 +00:00
Rohan Varma	c3d79ac422	Manually skip sparse tests because they were not properly disabled by automation Differential Revision: [D34456851](https://our.internmc.facebook.com/intern/diff/D34456851/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/73374	2022-02-24 20:26:02 +00:00
Alban Desmaison	49444bb501	Revert D34400588: [pytorch][PR] super setUp call missing in TestSparse Test Plan: revert-hammer Differential Revision: D34400588 (`555b215a90`) Original commit changeset: 40ac1c56918d Original Phabricator Diff: D34400588 (`555b215a90`) fbshipit-source-id: 0375279d06cc7a9d612bd70cc4c042cb3319a5fc (cherry picked from commit 7cd3d2da907e6f0882f56c8843d50586756a2fe6)	2022-02-24 14:34:01 +00:00
Jane Xu	555b215a90	super setUp call missing in TestSparse (#73217 ) Summary: Should fix the fact that Sparse tests are not rightly disabled https://github.com/pytorch/pytorch/issues/73145#issuecomment-1046952585 Pull Request resolved: https://github.com/pytorch/pytorch/pull/73217 Reviewed By: atalman Differential Revision: D34400588 Pulled By: janeyx99 fbshipit-source-id: 40ac1c56918d5c47debf962a2bd218a325626ad8 (cherry picked from commit e63dae284ba9056567fcaffc54d1aa38151c0a12)	2022-02-23 19:36:50 +00:00
Nikita Shulga	5dad19fef0	Back out "[pytorch][PR] add BFloat16 sparse operators on CPU: copy, coalesce, sparse_mask, ad…" Summary: Original commit changeset: f1274125234a Original Phabricator Diff: D34343016 (`c6f56599bb`) Test Plan: Abovementioned PR regressed OSS CI Reviewed By: atalman Differential Revision: D34379703 fbshipit-source-id: bc624cfd86249dde2fac635d9b66f08f86b4aed9 (cherry picked from commit `e52827f1ae`)	2022-02-21 18:31:51 +00:00
Jiayi Sun	c6f56599bb	add BFloat16 sparse operators on CPU: copy, coalesce, sparse_mask, ad… (#72846 ) Summary: …d_out, addmm Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/72846 Reviewed By: mikaylagawarecki Differential Revision: D34343016 Pulled By: cpuhrsch fbshipit-source-id: f1274125234a3bacbb7a38fc642fbf5c9786d435 (cherry picked from commit `c819456abf`)	2022-02-19 01:33:51 +00:00
Pearu Peterson	e785c0a1ab	Enable Half/BFloat16 support for to_dense and coalesce methods. (#72397 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72397 Test Plan: Imported from OSS Reviewed By: jbschlosser, zou3519 Differential Revision: D34286114 Pulled By: cpuhrsch fbshipit-source-id: a4f7e2abc3b2d37437cbd09d693c1b409bb011b9 (cherry picked from commit `74f94447fc`)	2022-02-17 02:54:23 +00:00
Philip Meier	b5f2574f36	no longer coalesce sparse COO tensors before comparison (#69751 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69751 cc nikitaved pearu cpuhrsch IvanYashchuk Test Plan: Imported from OSS Reviewed By: zou3519 Differential Revision: D34262453 Pulled By: ezyang fbshipit-source-id: e2e62d2aa03fc569d2951c880960b256f5dc4aaa (cherry picked from commit `cb6b0ef719`)	2022-02-17 02:33:08 +00:00
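Coalescing comes up in several of the entries above (Half/BFloat16 support for `coalesce`, `test_coalesce_cuda_bfloat16`, and this change to comparisons). As a rough plain-Python model, and not PyTorch's implementation, a COO tensor is coalesced by sorting its index tuples and merging duplicate indices by summing their values; the dense tensor it represents does not change. The helpers `coalesce` and `to_dense_2d` are hypothetical names used only for this sketch.

```python
from typing import Dict, List, Tuple

Index = Tuple[int, ...]


def coalesce(indices: List[Index], values: List[float]) -> Tuple[List[Index], List[float]]:
    """Sort COO indices lexicographically and sum the values of duplicates."""
    merged: Dict[Index, float] = {}
    for idx, val in zip(indices, values):
        merged[idx] = merged.get(idx, 0.0) + val
    out_indices = sorted(merged)
    return out_indices, [merged[i] for i in out_indices]


def to_dense_2d(indices: List[Index], values: List[float], shape: Tuple[int, int]) -> List[List[float]]:
    """Materialize a 2-D COO tensor; duplicate entries accumulate."""
    rows, cols = shape
    dense = [[0.0] * cols for _ in range(rows)]
    for (i, j), v in zip(indices, values):
        dense[i][j] += v
    return dense


# Uncoalesced input: (1, 0) is specified twice and the indices are unsorted.
idx = [(1, 0), (0, 2), (1, 0)]
val = [1.0, 5.0, 2.0]
c_idx, c_val = coalesce(idx, val)
print(c_idx, c_val)  # [(0, 2), (1, 0)] [5.0, 3.0]

# Coalescing does not change the dense tensor being represented.
assert to_dense_2d(idx, val, (2, 3)) == to_dense_2d(c_idx, c_val, (2, 3))
```

This also shows why two sparse tensors can hold different index/value lists yet represent the same dense tensor.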
Brian Hirsh	22ccf448e8	Revert D34034848: free up dispatch key space (in C++) Test Plan: revert-hammer Differential Revision: D34034848 (`6690256021`) Original commit changeset: 9677ee2c0a1a Original Phabricator Diff: D34034848 (`6690256021`) fbshipit-source-id: fd50943d915ef813bb9f9ab278fb582429eea3b1 (cherry picked from commit `3acefee1cd`)	2022-02-14 23:29:00 +00:00
Brian Hirsh	6690256021	free up dispatch key space (in C++) (#72402 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72402 The original PR had an array-out-of-bounds access in `DispatchKeyExtractor.cpp`, that wasn't caught by ASAN and appeared to only manifest in a subset of android internal tests. After fixing the OOB access (and adding more asserts), I confirmed that the android internal test passes. Reland of D33255193 (`20b8653dfa`) ghstack-source-id: 148830728 Test Plan: Steps to test: (1) connect to a mobile OD (2) run `one_world android emulator android-29` in a terminal to start the android emulator (3) In a separate terminal, run the test: `buck test //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test -c test.external_runner=tpx -- --regex 'testBIXRayModel.*PyTorchBIXRayInstrumentationTest' --force-remote-execution --run-disabled` I also ran `buck test fbandroid/mode/dbg //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test`, which failed before and passed after the PR. Reviewed By: albanD Differential Revision: D34034848 fbshipit-source-id: 9677ee2c0a1afd1183896f7055009445712523c5 (cherry picked from commit `9ab9b12d35`)	2022-02-14 16:02:29 +00:00
Jacob Szwejbka	791e7df7d9	Back out "free up dispatch key space (in C++)" Summary: I think this diff stack broke all the related tasks below. Test Plan: For our failing tests: buck test //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test -c test.external_runner=tpx -- --regex 'testBIXRayModel.*PyTorchBIXRayInstrumentationTest' --force-remote-execution --run-disabled For the ubn: Not really sure what to do, trying to build the app and see if I can use an effect? Reviewed By: shoumikhin Differential Revision: D34018849 fbshipit-source-id: 3571718cb6621931af931b494e0a70d6e0164e65 (cherry picked from commit `3cc63cb2ea`)	2022-02-05 01:25:42 +00:00
Brian Hirsh	20b8653dfa	free up dispatch key space (in C++) (#69633 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69633 Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D33255193 Pulled By: bdhirsh fbshipit-source-id: 79773e9c15bf4f2f27675121a49ff5ffd1375238 (cherry picked from commit `eac0b13005`)	2022-02-04 17:57:38 +00:00
Pearu Peterson	214f4bf2ff	Support sparse.sum on empty sparse tensor (#71091 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71091 Fixes https://github.com/pytorch/pytorch/issues/65394 The masked sum on a full input tensor (of any layout) with an all-true mask is the same as the sum on the strided input tensor (after applying `to_dense` to sparse inputs). Since masked sum uses `torch.sparse.sum` then, for the simplicity of masked reductions implementations, its reduction behavior ought to be defined by the behavior of the `torch.sum`. This PR implements the behavioral connection with respect to the directional summation of empty sparse tensors that correspond to all-zero strided tensors. cc nikitaved pearu cpuhrsch Test Plan: Imported from OSS Reviewed By: davidberard98 Differential Revision: D33651750 Pulled By: cpuhrsch fbshipit-source-id: 703891bff88c8da6270b4272f5d2da81688db67d (cherry picked from commit `53f97e80f7`)	2022-01-19 18:58:08 +00:00
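The rule stated in this entry, that a sparse reduction should agree with `torch.sum` on the strided (`to_dense()`) counterpart, including when the sparse tensor has no specified elements, can be modelled in plain Python. This sketch assumes a 2-D COO tensor stored as index/value lists; `sparse_sum_dim0`, `dense_sum_dim0` and `to_dense_2d` are hypothetical helpers, not PyTorch APIs.

```python
from typing import List, Tuple

Index2 = Tuple[int, int]


def to_dense_2d(indices: List[Index2], values: List[float], shape: Tuple[int, int]) -> List[List[float]]:
    """Materialize a 2-D COO tensor as nested lists."""
    rows, cols = shape
    dense = [[0.0] * cols for _ in range(rows)]
    for (i, j), v in zip(indices, values):
        dense[i][j] += v
    return dense


def dense_sum_dim0(dense: List[List[float]], cols: int) -> List[float]:
    """Reference reduction over dim 0 of a strided (dense) tensor."""
    return [sum((row[j] for row in dense), 0.0) for j in range(cols)]


def sparse_sum_dim0(indices: List[Index2], values: List[float], shape: Tuple[int, int]) -> List[float]:
    """Reduce a 2-D COO tensor over dim 0 without densifying it.

    With nnz == 0 the loop body never runs and the result is all zeros of
    shape (cols,), matching the reduction of an all-zero dense tensor.
    """
    _, cols = shape
    out = [0.0] * cols
    for (_, j), v in zip(indices, values):
        out[j] += v
    return out


shape = (2, 3)
for indices, values in [([], []), ([(0, 1), (1, 1), (1, 2)], [2.0, 3.0, 4.0])]:
    expected = dense_sum_dim0(to_dense_2d(indices, values, shape), shape[1])
    assert sparse_sum_dim0(indices, values, shape) == expected
print(sparse_sum_dim0([], [], shape))  # [0.0, 0.0, 0.0]
```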
Pearu Peterson	677fab6d1d	Support broadcast_to on sparse COO tensors (#71073 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71073 cc nikitaved pearu cpuhrsch Test Plan: Imported from OSS Reviewed By: mikaylagawarecki Differential Revision: D33645744 Pulled By: cpuhrsch fbshipit-source-id: 4775c9636c4e868022a8c1bbfec93e351d1cf885 (cherry picked from commit `640f21e09a`)	2022-01-19 04:33:41 +00:00
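One way to model broadcasting a sparse COO tensor is to replicate each specified element along every dimension that is expanded from size 1 (or newly prepended), with the usual alignment of trailing dimensions. The plain-Python sketch below models those semantics only; `broadcast_coo` is a hypothetical helper, and the index ordering it produces is not claimed to match what `torch.broadcast_to` returns for sparse inputs.

```python
from itertools import product
from typing import List, Sequence, Tuple

Index = Tuple[int, ...]


def broadcast_coo(
    indices: List[Index],
    values: List[float],
    shape: Sequence[int],
    new_shape: Sequence[int],
) -> Tuple[List[Index], List[float]]:
    """Broadcast a COO tensor to new_shape by replicating specified elements."""
    ndiff = len(new_shape) - len(shape)
    if ndiff < 0:
        raise ValueError("cannot broadcast to fewer dimensions")
    # Align trailing dimensions by prepending size-1 dimensions.
    padded = (1,) * ndiff + tuple(shape)
    for old, new in zip(padded, new_shape):
        if old != new and old != 1:
            raise ValueError(f"cannot broadcast size {old} to {new}")
    out_indices: List[Index] = []
    out_values: List[float] = []
    for idx, v in zip(indices, values):
        padded_idx = (0,) * ndiff + tuple(idx)
        # An expanded dimension takes every position; the others keep the index.
        choices = [
            range(new) if old == 1 and new != 1 else (i,)
            for i, old, new in zip(padded_idx, padded, new_shape)
        ]
        for new_idx in product(*choices):
            out_indices.append(new_idx)
            out_values.append(v)
    return out_indices, out_values


# A (1, 3) tensor with one specified element at (0, 2), broadcast to (2, 3).
print(broadcast_coo([(0, 2)], [7.0], (1, 3), (2, 3)))
# ([(0, 2), (1, 2)], [7.0, 7.0])
```

Note that, unlike a strided `expand`, this model grows the number of specified elements by the product of the expanded sizes.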
Pearu Peterson	e7602a1e30	Fix multiplication of 0-D sparse tensors (#70749 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70749 Fixes https://github.com/pytorch/pytorch/issues/65396 and a clang-tidy error. cc nikitaved pearu cpuhrsch Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D33439136 Pulled By: cpuhrsch fbshipit-source-id: 45ec58de7c18db183f891431d4a26e98fd0e924a	2022-01-06 13:36:46 -08:00
Peter Bell	6de9f0fc94	OpInfo: Allow sample_inputs_func to be any iterable (#69256 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69256 Closes #52486 Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D32942008 Pulled By: mruberry fbshipit-source-id: f5b01b0298c0160b0bec6e86e2b6db8cfe746206	2021-12-09 08:37:26 -08:00
Peter Bell	1da1707568	Sparse: Implement simple unary ufuncs operators (#68887 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68887 Closes #46988, closes #46987, closes #46761 By "simple" I mean operators that map 0->0 so we can implement it by just re-dispatching on the values tensor. That does mean we have `sin` but not `cos` for example, but without fill value support this is the best that can be done. Most of these don't support autograd because the derivative formulas use unsupported operators. cc nikitaved pearu cpuhrsch IvanYashchuk Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D32734911 Pulled By: cpuhrsch fbshipit-source-id: 203ab105799f3d2d682b01ca3d6b18e7c994776a	2021-12-01 05:43:19 -08:00
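The "simple" criterion described in this entry can be stated as a check: an elementwise function may be applied to the stored values of a sparse tensor alone only if it maps 0 to 0, since otherwise every implicit zero would have to change as well (which is why `sin` qualifies and `cos` does not). The sketch below is a plain-Python model with a hypothetical `apply_unary_to_values` helper; the runtime `fn(0) == 0` check is for illustration and is not how PyTorch selects these operators.

```python
import math
from typing import Callable, List, Tuple

Index = Tuple[int, ...]


def apply_unary_to_values(
    indices: List[Index], values: List[float], fn: Callable[[float], float]
) -> Tuple[List[Index], List[float]]:
    """Apply fn to the specified values only, keeping the sparsity pattern.

    This is only valid when fn(0) == 0; otherwise every unspecified (implicit
    zero) element would also have to change, and the result would be dense.
    """
    if fn(0.0) != 0.0:
        raise ValueError(f"{getattr(fn, '__name__', fn)} does not map 0 to 0")
    return list(indices), [fn(v) for v in values]


idx, vals = [(0, 1), (2, 0)], [0.5, -1.0]
print(apply_unary_to_values(idx, vals, math.sin))  # sparsity pattern unchanged

try:
    apply_unary_to_values(idx, vals, math.cos)  # cos(0) == 1, so not "simple"
except ValueError as err:
    print(err)  # cos does not map 0 to 0
```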
Eli Uriegas	251686fc4c	Revert D32706197: Sparse: Implement simple unary ufuncs operators Test Plan: revert-hammer Differential Revision: D32706197 (`fbaa19a6fa`) Original commit changeset: 65e1acb36457 fbshipit-source-id: 45c4b486f9eee200d5a1f6d46d267617124f8a5e	2021-11-30 10:50:12 -08:00
Peter Bell	fbaa19a6fa	Sparse: Implement simple unary ufuncs operators (#68887 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68887 Closes #46988, closes #46987, closes #46761 By "simple" I mean operators that map 0->0 so we can implement it by just re-dispatching on the values tensor. That does mean we have `sin` but not `cos` for example, but without fill value support this is the best that can be done. Most of these don't support autograd because the derivative formulas use unsupported operators. cc nikitaved pearu cpuhrsch IvanYashchuk Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D32706197 Pulled By: cpuhrsch fbshipit-source-id: 65e1acb3645737ca7bdb7f2db739d8e118906f4b	2021-11-30 00:30:30 -08:00
Peter Bell	f5fa91ba2e	Sparse: Add additional opinfo tests (#68886 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68886 cc nikitaved pearu cpuhrsch IvanYashchuk Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D32697933 Pulled By: cpuhrsch fbshipit-source-id: fffdd1bc663cc1bc49abe8cf3680982d1cb497bc	2021-11-29 12:49:20 -08:00
Vinnam Kim	f89572f417	Add feature: zeros_like() from a dense tensor to a sparse tensor (#68108 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/67904. - Create a sparse tensor when the sparse layout is given even if the input tensor is not sparse. cc nikitaved pearu cpuhrsch IvanYashchuk Pull Request resolved: https://github.com/pytorch/pytorch/pull/68108 Reviewed By: anjali411 Differential Revision: D32316269 Pulled By: cpuhrsch fbshipit-source-id: 923dbd4dc7c74f51f7cdbafb2375a30271a6a886	2021-11-11 08:54:15 -08:00
Jane Xu	793f366e34	[skip ci] Set test owners for sparse tests (#66863 ) Summary: Action following https://github.com/pytorch/pytorch/issues/66232 cc nikitaved pearu cpuhrsch IvanYashchuk Pull Request resolved: https://github.com/pytorch/pytorch/pull/66863 Reviewed By: anjali411 Differential Revision: D31771126 Pulled By: janeyx99 fbshipit-source-id: 6cb5ca0557e8555f6a09b3e607ff8888e505486e	2021-10-20 10:12:13 -07:00
lezcano	0974215c4d	Prefer mT and mH over transpose(-2, -1) and transpose(-2, -1).conj() (#64181 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64181 This PR replaces all the calls to: - `transpose(-2, -1)` or `transpose(-1, -2)` by `mT()` in C++ and `mT` in Python - `conj().transpose(-2, -1)` or `transpose(-2, -1).conj()` or `conj().transpose(-1, -2)` or `transpose(-1, -2).conj()` by `mH()` in C++ and `mH` in Python. It also simplifies two pieces of code, and fixes one bug where a pair of parentheses were missing in the function `make_symmetric_matrices`. Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D31692896 Pulled By: anjali411 fbshipit-source-id: e9112c42343663d442dc5bd53ff2b492094b434a	2021-10-18 13:02:25 -07:00
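As described in this entry, `x.mT` stands for `x.transpose(-2, -1)` and `x.mH` for `x.transpose(-2, -1).conj()`. The sketch below models those semantics on a batch of matrices stored as nested Python lists, so it runs without PyTorch; `mT` and `mH` here are hypothetical free functions, not the tensor attributes themselves.

```python
from typing import List

Matrix = List[List[complex]]


def mT(batch: List[Matrix]) -> List[Matrix]:
    """Transpose the last two dimensions of a batch of matrices."""
    return [[list(col) for col in zip(*m)] for m in batch]


def mH(batch: List[Matrix]) -> List[Matrix]:
    """Conjugate transpose of the last two dimensions (transpose, then conj)."""
    return [[[x.conjugate() for x in col] for col in zip(*m)] for m in batch]


# One 3x2 complex matrix in a batch of size 1.
a = [[[1 + 2j, 3], [4, 5 - 1j], [6j, 7]]]
print(mT(a))  # [[[(1+2j), 4, 6j], [3, (5-1j), 7]]]
print(mH(a))  # [[[(1-2j), 4, -6j], [3, (5+1j), 7]]]
```

Leading (batch) dimensions are left alone, which is the point of preferring `mT`/`mH` over spelling out `transpose(-2, -1)` each time.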
Yukio Siraichi	c829cb6840	Port `min` kernel to structured kernels. (#61450 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61450 Tracking issue: #55070 Test Plan: Imported from OSS Reviewed By: saketh-are Differential Revision: D29741713 Pulled By: bdhirsh fbshipit-source-id: 2c107752a90fd39cfb55e08aaf3541bd484a5fc3	2021-09-28 14:03:54 -07:00
Ivan Yashchuk	1fec9cd76b	[Fixed] Enable Half, BFloat16, and Complex dtypes for coo-coo sparse matmul [CUDA] (#59980 ) Summary: This PR enables Half, BFloat16, ComplexFloat, and ComplexDouble support for matrix-matrix multiplication of COO sparse matrices. The change is applied only to CUDA 11+ builds. `cusparseSpGEMM` also supports `CUDA_C_16F` (complex float16) and `CUDA_C_16BF` (complex bfloat16). PyTorch also supports the complex float16 dtype (`ScalarType::ComplexHalf`), but there is no convenient dispatch, so this dtype is omitted in this PR. cc nikitaved pearu cpuhrsch IvanYashchuk ezyang anjali411 dylanbespalko mruberry Lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/59980 Reviewed By: ngimel Differential Revision: D30994115 Pulled By: cpuhrsch fbshipit-source-id: 4f55b99e8e25079d6273b4edf95ad6fa85aeaf24	2021-09-21 13:03:40 -07:00
Philip Meier	26b7ff5aea	deprecate dtype getters from `torch.testing` namespace (#63554 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63554 Following https://github.com/pytorch/pytorch/pull/61840#issuecomment-884087809, this deprecates all the dtype getters publicly exposed in the `torch.testing` namespace. The reason for this is twofold: 1. If someone is not familiar with the C++ dispatch macros PyTorch uses, the names are misleading. For example, `torch.testing.floating_types()` will only give you `float32` and `float64`, skipping `float16` and `bfloat16`. 2. The dtype getters provide very minimal functionality that can be easily emulated by downstream libraries. We thought about [providing a replacement](https://gist.github.com/pmeier/3dfd2e105842ad0de4505068a1a0270a), but ultimately decided against it. The major problem is BC: by keeping it, either the namespace is getting messy again after a new dtype is added or we need to somehow version the return values of the getters. Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D30662206 Pulled By: mruberry fbshipit-source-id: a2bdb10ab02ae665df1b5b76e8afa9af043bbf56	2021-09-07 08:58:51 -07:00
Richard Zou	92b31b59af	Revert D29699456: [pytorch][PR] Enable Half, BFloat16, and Complex dtypes for coo-coo sparse matmul [CUDA] Test Plan: revert-hammer Differential Revision: D29699456 (`ad4848565e`) Original commit changeset: 407ae53392ac fbshipit-source-id: b6c70ba8bb28c0c38de47857030b69792a8470de	2021-09-01 07:32:24 -07:00
Saketh Are	83e28a7d28	Use stacklevel for floordiv deprecation warnings (#64034 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/60548 `Tensor.__floordiv__` was indirectly deprecated by deprecation of `torch.floor_divide` (see https://github.com/pytorch/pytorch/issues/43874). Deprecating it directly provides clearer feedback. Repro: ``` import torch x = torch.tensor(0) x // 1 ``` Before this change, a deprecation warning was triggered within the C++ implementation of floor_divide: ``` UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:571.) return torch.floor_divide(self, other) ``` After this change, the warning instead cites the user's offending line of Python code: ``` UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). x // 1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/64034 Reviewed By: mruberry Differential Revision: D30658010 Pulled By: saketh-are fbshipit-source-id: b0e6c5008d741897509d102f4a89efb47de4aa2a	2021-08-31 11:27:56 -07:00
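A minimal pure-Python sketch of the two ideas in the commit above: `warnings.warn(..., stacklevel=2)` attributes a warning to the caller's `x // 1` line instead of the line inside the operator, and the 'trunc' and 'floor' rounding modes differ only when the quotient is negative. `LegacyInt`, `trunc_div`, and `floor_div` are hypothetical illustration names, not PyTorch APIs; the actual commit changes how `Tensor.__floordiv__` warns.

```python
import warnings


def trunc_div(a: int, b: int) -> int:
    """Quotient rounded toward zero, the behavior of rounding_mode='trunc'."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def floor_div(a: int, b: int) -> int:
    """Quotient rounded toward negative infinity, the behavior of rounding_mode='floor'."""
    return a // b


class LegacyInt:
    """Hypothetical toy integer whose `//` still rounds toward zero, like the old floor_divide."""

    def __init__(self, value: int) -> None:
        self.value = value

    def __floordiv__(self, other: int) -> int:
        # stacklevel=2 attributes the warning to the caller's `a // b` line;
        # stacklevel=1 (the default) would point at this method body instead.
        warnings.warn(
            "__floordiv__ is deprecated; it rounds toward 0 like 'trunc', not 'floor'.",
            UserWarning,
            stacklevel=2,
        )
        return trunc_div(self.value, other)
```

For example, `trunc_div(-7, 2)` is `-3` while `floor_div(-7, 2)` is `-4`, which is the "incorrect rounding for negative values" the warning text refers to.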
Ivan Yashchuk	ad4848565e	Enable Half, BFloat16, and Complex dtypes for coo-coo sparse matmul [CUDA] (#59980 ) Summary: This PR enables Half, BFloat16, ComplexFloat, and ComplexDouble support for matrix-matrix multiplication of COO sparse matrices. The change is applied only to CUDA 11+ builds. `cusparseSpGEMM` also supports `CUDA_C_16F` (complex float16) and `CUDA_C_16BF` (complex bfloat16). PyTorch also supports the complex float16 dtype (`ScalarType::ComplexHalf`), but there is no convenient dispatch, so this dtype is omitted in this PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/59980 Reviewed By: ngimel Differential Revision: D29699456 Pulled By: cpuhrsch fbshipit-source-id: 407ae53392acb2f92396a62a57cbaeb0fe6e950b	2021-08-30 15:06:25 -07:00
Kushashwa Ravi Shrimali	d37636901e	[Doc] `make_tensor` to `torch.testing` module (#63925 ) Summary: This PR aims to add `make_tensor` to the `torch.testing` module in PyTorch docs. TODOs: * [x] Add examples cc: pmeier mruberry brianjo Pull Request resolved: https://github.com/pytorch/pytorch/pull/63925 Reviewed By: ngimel Differential Revision: D30633487 Pulled By: mruberry fbshipit-source-id: 8e5a1f880c6ece5925b4039fee8122bd739538af	2021-08-30 12:25:40 -07:00
Shen Li	1022443168	Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default Test Plan: revert-hammer Differential Revision: D30279364 (`b004307252`) Original commit changeset: c1ed77dfe43a fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e	2021-08-12 11:45:01 -07:00
Zsolt Dollenstein	b004307252	[codemod][lint][fbcode/c*] Enable BLACK by default Test Plan: manual inspection & sandcastle Reviewed By: zertosh Differential Revision: D30279364 fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a	2021-08-12 10:58:35 -07:00
mattip	c8eda919a4	test, fix sparse * dense exceptions and corner case (#61723 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/59916 This fixes two problems with sparse multiplication - 0d-dense * sparse was creating a non-sparse output and failing. - dense * sparse or sparse * dense is not supported, but would emit an unhelpful error message <details> <summary> unhelpful error message </summary> Traceback (most recent call last): File "<stdin>", line 1, in <module> NotImplementedError: Could not run 'aten::_nnz' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_nnz' is only available for these backends: [SparseCPU, SparseCUDA, SparseCsrCPU, SparseCsrCUDA, BackendSelect, Python, Named, Conjugate, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradXPU, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, UNKNOWN_TENSOR_TYPE_ID, Autocast, Batched, VmapMode]. SparseCPU: registered at aten/src/ATen/RegisterSparseCPU.cpp:961 [kernel] SparseCUDA: registered at aten/src/ATen/RegisterSparseCUDA.cpp:1092 [kernel] SparseCsrCPU: registered at aten/src/ATen/RegisterSparseCsrCPU.cpp:202 [kernel] SparseCsrCUDA: registered at aten/src/ATen/RegisterSparseCsrCUDA.cpp:229 [kernel] BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback] Python: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:38 [backend fallback] Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback] Conjugate: registered at ../aten/src/ATen/ConjugateFallback.cpp:118 [backend fallback] ADInplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:60 [backend fallback] AutogradOther: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel] AutogradCPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel] AutogradCUDA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel] AutogradXLA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel] AutogradXPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel] AutogradMLC: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel] AutogradHPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel] AutogradNestedTensor: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel] AutogradPrivateUse1: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel] AutogradPrivateUse2: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel] AutogradPrivateUse3: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel] Tracer: registered at ../torch/csrc/autograd/generated/TraceType_2.cpp:10254 [kernel] UNKNOWN_TENSOR_TYPE_ID: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:446 [backend fallback] Autocast: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:285 [backend fallback] Batched: registered at ../aten/src/ATen/BatchingRegistrations.cpp:1016 [backend fallback] VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback] </details> Also added tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/61723 Reviewed By: ezyang Differential Revision: D29962639 Pulled By: cpuhrsch fbshipit-source-id: 5455680ddfa91d5cc9925174d0fd3107c40f5b06	2021-08-05 11:27:12 -07:00
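The two behaviors the commit above describes can be sketched with a hypothetical pure-Python stand-in for a sparse COO matrix. `ToyCOO` is an illustration name, not a PyTorch class, and a plain Python number plays the role of a 0-d dense tensor. A scalar operand keeps the result sparse by scaling only the stored values, while an unsupported dense operand fails immediately with a message that names the problem instead of surfacing an internal dispatcher error.

```python
from typing import Dict, Tuple, Union

Number = Union[int, float]


class ToyCOO:
    """Hypothetical minimal sparse matrix: a dict mapping (row, col) to a stored value."""

    def __init__(self, shape: Tuple[int, int], entries: Dict[Tuple[int, int], Number]) -> None:
        self.shape = shape
        self.entries = dict(entries)

    def __mul__(self, other: object) -> "ToyCOO":
        if isinstance(other, (int, float)):
            # Scalar (0-d) operand: the result stays sparse with the same sparsity pattern.
            return ToyCOO(self.shape, {ij: v * other for ij, v in self.entries.items()})
        # Any other dense operand is rejected up front with an explicit message.
        raise NotImplementedError(
            f"sparse * dense is not supported for operands of type {type(other).__name__}; "
            "only scalar (0-d) operands keep the result sparse"
        )

    # Multiplication by a scalar commutes, so `2 * s` behaves like `s * 2`.
    __rmul__ = __mul__
```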
Kurt Mohler	87334c40a7	Remove torch._bmm and remove torch.bmm deterministic arg documentation (#61629 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/61571 Pull Request resolved: https://github.com/pytorch/pytorch/pull/61629 Reviewed By: mrshenli Differential Revision: D29774486 Pulled By: albanD fbshipit-source-id: bfc9119c478f0244d5be681bcf4954a3eb97e542	2021-07-20 10:55:43 -07:00
Anjali Chourdia	287603f51c	Revert D29698486: [pytorch][PR] Remove torch._bmm and remove torch.bmm deterministic arg documentation Test Plan: revert-hammer Differential Revision: D29698486 (`328606699f`) Original commit changeset: 5af2d3803ab1 fbshipit-source-id: ce954c13196b1fb8277d61a686ac351d3bf13903	2021-07-16 11:02:09 -07:00
Kurt Mohler	328606699f	Remove torch._bmm and remove torch.bmm deterministic arg documentation (#61629 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/61571 Pull Request resolved: https://github.com/pytorch/pytorch/pull/61629 Reviewed By: zou3519 Differential Revision: D29698486 Pulled By: albanD fbshipit-source-id: 5af2d3803ab1eb093616bcfc7e074d8b57ef6958	2021-07-16 09:18:34 -07:00
Joel Schlosser	03b5a225a7	Test parametrization for instantiated device-specific tests (#60233 ) Summary: The `ops` decorator provides a way to parameterize a test across a given list of ops. This would be useful for modules as well (e.g. a `modules` decorator), but the mechanism by which this is accomplished is specific to ops. In detail, the `ops` decorator tags a test function with the metadata needed (list of ops, `dtypes`) and the actual tests are generated according to this metadata during the call to `instantiate_device_type_tests()`. This PR makes this mechanism more generic, allowing for test parameterization across arbitrary dimensions. This makes a `modules` decorator (or any similar type of decorator) straightforward to implement without changes to the device-specific test instantiation logic. One caveat is that, since this is implemented where the old `ops` decorator was (within `instantiate_device_type_tests()`), this only works for tests instantiated using the device-specific instantiation logic. Longer term, even device-specific test instantiation could be treated as an optional parameterization across device types, but this PR takes a low-risk approach for now. In practice, this just means that a `device` kwarg is required for all test signatures used with the mechanism. The `ops` decorator has been refactored to use the generic mechanism and works the same as before, with one difference: when `OpDTypes.none` is specified, the test signature no longer needs an unused `dtype` kwarg. This is a nice bonus that demonstrates the added flexibility of a generic parameterization mechanism. The refactored form also has the bonus that all op-specific test generation logic is contained within the `ops` decorator class, improving readability. Behind the scenes, the generic mechanism is a base decorator class (`_TestParameterizer`) from which `ops` derives. The core functionality is in the `_parameterize_test()` method, which takes a test function and returns a generator that produces parameterized tests, including names and parameter kwargs to pass to them. Using the `ops` decorator results in a set of op-specific tests from a given generic test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/60233 Reviewed By: iramazanli Differential Revision: D29494995 Pulled By: jbschlosser fbshipit-source-id: a14446488c106094fafcaa75ccf8e9e3faf33bfc	2021-06-30 18:50:22 -07:00
Ivan Yashchuk	90303157ab	Enable complex dtypes for coo_sparse-coo_sparse matmul [CPU] (#59554 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59554 This PR enables complex number support for matrix-matrix multiplication of COO sparse matrices. Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D28968309 Pulled By: anjali411 fbshipit-source-id: 4fd471e76a5584366aabc86c08b4564667ee54ca	2021-06-08 19:34:41 -07:00
Ivan Yashchuk	acc47357b5	Fix torch.conj for zero-dimensional sparse coo matrix (#59553 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59553 Added a test for 0x0 sparse coo input for sparse_unary_ufuncs. This test fails for `conj` on master. Modified `unsupportedTypes` for test_sparse_consistency: complex dtypes pass, but float16 doesn't pass for `conj` because `to_dense()` doesn't work with float16. Fixes https://github.com/pytorch/pytorch/issues/59549 Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D28968215 Pulled By: anjali411 fbshipit-source-id: 44e99f0ce4aa45b760d79995a021e6139f064fea	2021-06-08 15:46:49 -07:00
Peter Bell	99f2000a99	Migrate nonzero from TH to ATen (CPU) (#59149 ) Summary: Resubmit of https://github.com/pytorch/pytorch/issues/58811, Closes gh-24745 The existing PR (gh-50655) has been stalled because `TensorIterator` doesn't guarantee iteration order in the same way that `TH_TENSOR_APPLY` does. For contiguous test cases this isn't an issue; but it breaks down for example with channels last format. I resolve this by adding a new `TensorIteratorConfig` parameter, `enforce_linear_iteration`, which disables dimension reordering. I've also added a test case for non-contiguous tensors to verify this works. This PR also significantly improves performance by adding multithreading support to the algorithm. As part of this, I wrote a custom `count_nonzero` that gives per-thread counts which is necessary to write the outputs in the right location. | Shape | Before | After (1 thread) | After (8 threads) | |:----------:|--------:|-----------------:|------------------:| | 256,128,32 | 2610 us | 2150 us | 551 us | | 128,128,32 | 1250 us | 1020 us | 197 us | | 64,128,32 | 581 us | 495 us | 99 us | | 32,128,32 | 292 us | 255 us | 83 us | | 16,128,32 | 147 us | 126 us | 75 us | | 8,128,32 | 75 us | 65 us | 65 us | | 4,128,32 | 39 us | 33 us | 33 us | | 2,128,32 | 20 us | 18 us | 18 us | | 1,128,32 | 11 us | 9 us | 9 us | Pull Request resolved: https://github.com/pytorch/pytorch/pull/59149 Reviewed By: mruberry Differential Revision: D28817466 Pulled By: ngimel fbshipit-source-id: f08f6c003c339368fd53dabd28e9ada9e59de732	2021-06-02 12:26:29 -07:00
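The per-thread counting idea in the commit above can be sketched as a two-pass algorithm. Each worker first counts the nonzeros in its own contiguous chunk, an exclusive prefix sum of those counts gives every worker its starting offset in the output, and each worker then writes its indices into a disjoint slice, so the result keeps linear order without any locking. This is a hypothetical pure-Python illustration (`parallel_nonzero` is not an ATen function) over a flat sequence that returns flat indices; because of Python's GIL it demonstrates the indexing logic, not the speedups reported for the C++ kernel.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate
from typing import List, Sequence, Tuple


def parallel_nonzero(values: Sequence[float], num_chunks: int = 4) -> List[int]:
    """Return the flat indices of nonzero entries, computed chunk by chunk."""
    num_chunks = max(1, num_chunks)
    n = len(values)
    # Contiguous chunks, so concatenating chunk outputs preserves linear order.
    bounds: List[Tuple[int, int]] = [
        (n * k // num_chunks, n * (k + 1) // num_chunks) for k in range(num_chunks)
    ]

    # Pass 1: each worker counts the nonzeros in its own chunk.
    def count(span: Tuple[int, int]) -> int:
        lo, hi = span
        return sum(1 for i in range(lo, hi) if values[i] != 0)

    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        counts = list(pool.map(count, bounds))

    # Exclusive prefix sum: offsets[k] is where chunk k starts writing.
    offsets = [0, *accumulate(counts)][:-1]
    out = [0] * sum(counts)

    # Pass 2: each worker writes its indices into its own disjoint output slice.
    def fill(k: int) -> None:
        lo, hi = bounds[k]
        pos = offsets[k]
        for i in range(lo, hi):
            if values[i] != 0:
                out[pos] = i
                pos += 1

    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        list(pool.map(fill, range(num_chunks)))  # list() re-raises any worker exception
    return out
```

For instance, `parallel_nonzero([0, 3, 0, 0, 5, 1, 0, 2], 3)` returns `[1, 4, 5, 7]`, the same as a sequential scan.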
Natalia Gimelshein	657b75d155	Revert D28700259: [pytorch][PR] Migrate nonzero from TH to ATen (CPU) Test Plan: revert-hammer Differential Revision: D28700259 (`95b1bc1009`) Original commit changeset: 9b279ca7c36d fbshipit-source-id: 267afe63376be598d24c862e02e3b4b3ea75f77c	2021-05-27 20:07:30 -07:00
Peter Bell	95b1bc1009	Migrate nonzero from TH to ATen (CPU) (#58811 ) Summary: Closes gh-24745 The existing PR (gh-50655) has been stalled because `TensorIterator` doesn't guarantee iteration order in the same way that `TH_TENSOR_APPLY` does. For contiguous test cases this isn't an issue; but it breaks down for example with channels last format. I resolve this by adding a new `TensorIteratorConfig` parameter, `enforce_linear_iteration`, which disables dimension reordering. I've also added a test case for non-contiguous tensors to verify this works. This PR also significantly improves performance by adding multithreading support to the algorithm. As part of this, I wrote a custom `count_nonzero` that gives per-thread counts which is necessary to write the outputs in the right location. | Shape | Before | After (1 thread) | After (8 threads) | |:----------:|--------:|-----------------:|------------------:| | 256,128,32 | 2610 us | 2220 us | 496 us | | 128,128,32 | 1250 us | 976 us | 175 us | | 64,128,32 | 581 us | 486 us | 88 us | | 32,128,32 | 292 us | 245 us | 80 us | | 16,128,32 | 147 us | 120 us | 71 us | | 8,128,32 | 75 us | 61 us | 61 us | | 4,128,32 | 39 us | 32 us | 32 us | | 2,128,32 | 20 us | 17 us | 17 us | | 1,128,32 | 11 us | 9 us | 9 us | Pull Request resolved: https://github.com/pytorch/pytorch/pull/58811 Reviewed By: anjali411 Differential Revision: D28700259 Pulled By: ngimel fbshipit-source-id: 9b279ca7c36d8e348b7e5e4be0dd159e05aee159	2021-05-27 10:06:54 -07:00
Pearu Peterson	be4ba29d49	Detect overflow in numel of sparse COO tensor (#57492 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/57416 Pull Request resolved: https://github.com/pytorch/pytorch/pull/57492 Reviewed By: albanD Differential Revision: D28273649 Pulled By: mruberry fbshipit-source-id: 08ba50509556df1981d7ede025d84a836d2e8e5e	2021-05-25 22:16:21 -07:00
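The commit message above does not describe how the overflow is detected, so the following is only a minimal sketch of one standard approach, assuming the element count is stored as a signed 64-bit integer. Before each multiplication it checks `numel > INT64_MAX // s`, which for positive integers is equivalent to `numel * s > INT64_MAX`, so the product is never formed once it would wrap. `checked_numel` is a hypothetical name, and returning 0 as soon as any dimension is 0 is a design choice of this sketch, not documented PyTorch behavior.

```python
from typing import Sequence

INT64_MAX = 2**63 - 1  # assumption: numel is held in a signed 64-bit integer


def checked_numel(sizes: Sequence[int]) -> int:
    """Multiply the dimension sizes, raising instead of silently wrapping past int64."""
    if any(s < 0 for s in sizes):
        raise ValueError(f"negative dimension in {list(sizes)}")
    if 0 in sizes:
        # Design choice for this sketch: any zero-sized dimension makes the tensor empty.
        return 0
    numel = 1
    for s in sizes:
        # For positive ints: numel * s > INT64_MAX  <=>  numel > INT64_MAX // s.
        if numel > INT64_MAX // s:
            raise OverflowError(f"numel of shape {list(sizes)} exceeds int64")
        numel *= s
    return numel
```

For example, a shape of `(2**31, 2**31)` gives `2**62`, which fits, while `(2**32, 2**32)` raises `OverflowError` because the true product `2**64` exceeds the int64 range.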
Alexander	6f2c0cccdd	New: sparse complex: add linear algebra, addmm (#57129 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57129 Test Plan: Imported from OSS Reviewed By: janeyx99, astaff Differential Revision: D28112701 Pulled By: ezyang fbshipit-source-id: 1b253453dc19e908fb18d0b1a83738243e0a8d59	2021-05-07 05:37:48 -07:00
Alexander	a911c4fc1c	New: Initial support for sparse complex tensors constructors for CPU/CUDA (#57125 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57125 I'm opening this PR to solve the last issue reported before merging PR #54153 (https://github.com/pytorch/pytorch/pull/54153#issuecomment-827997616). Solves gh-50690 Test Plan: Imported from OSS Reviewed By: astaff Differential Revision: D28112702 Pulled By: ezyang fbshipit-source-id: 915681954edb14b7c19c3ffe641af2d2e6649576	2021-05-07 05:36:41 -07:00
Peter Bell	a5288a0244	Sparse support for division rounding_mode argument (#51989 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51989 Test Plan: Imported from OSS Reviewed By: astaff Differential Revision: D28118114 Pulled By: mruberry fbshipit-source-id: 2a76ee55c3845552e57e93d54628ce3c2fab3399	2021-05-01 17:37:25 -07:00
Mike Ruberry	7bcce2acb9	Revert D27765618: Initial support for sparse complex tensors constructors for CPU/CUDA Test Plan: revert-hammer Differential Revision: D27765618 (`daef60c3b7`) Original commit changeset: a9cdd31d5c7a fbshipit-source-id: f700d5db7ff8930b9158460b5a77f68a35e212a4	2021-04-27 15:48:51 -07:00


439 Commits