Commit Graph

398 Commits

Author SHA1 Message Date
Aleksandar Samardžić
c3a893c659 Implement adding bias vector into structured sparse linear operator (#100881)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100881
Approved by: https://github.com/cpuhrsch
2023-05-17 05:46:22 +00:00
Pearu Peterson
65b15be04c Fix incorrect sparse_dim in COO.zero_() and in binary operations with zero-sized COO operands (#98292)
Fixes https://github.com/pytorch/pytorch/issues/97627

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98292
Approved by: https://github.com/nikitaved, https://github.com/cpuhrsch, https://github.com/amjames
2023-05-11 19:05:34 +00:00
Aleksandar Samardžić
a8c2cd1039 Add CUTLASS-based MM for structured sparse linear operator (#100485)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100485
Approved by: https://github.com/cpuhrsch
2023-05-09 21:05:15 +00:00
Pearu Peterson
92a7640b76 Add mul tests with sparse sample inputs (#100393)
This PR implements sparse sample inputs and error inputs for mul OpInfo.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100393
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2023-05-09 16:13:14 +00:00
Larry Liu
687afeb686 [dynamo][numpy] Add NumpyTensorVariable to translate ndarray attribute calls to tensor attributes (#95849)
Issue: #93684

# Problem

Reduce graph breaks when dynamo compiles python functions containing numpy functions and ndarray operations.

# Design (as I know it)

* Use torch_np.ndarray(a wrapper of tensor) to back a `VariableTracker`: `NumpyTensorVariable`.
* Translate all attributes and methods calls, on ndarray, to torch_np.ndarray equivalent.

This PR adds `NumpyTensorVariable` and supports:
1.  tensor to ndarray, ndarray to tensor
2. numpy functions such as numpy.meshgrid()
3. ndarray attributes such as `itemsize`, `stride`

Next PR will handle returning `np.ndarray` and add support for ndarray methods
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95849
Approved by: https://github.com/ezyang
2023-04-27 16:18:35 +00:00
Ilia Taraban
a1074ddf51 Enable cadd_sparse for BFloat16 on CPU (#96767)
Enabling **cadd_sparse** operation for BFloat16 on CPU to support BFloat16 operations in GNN libraries.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96767
Approved by: https://github.com/jgong5, https://github.com/cpuhrsch
2023-04-14 19:50:49 +00:00
eqy
2fddcf0fc0 [CUDA][CUDA 11] Remove more CUDA 11 version checks (#92934)
Working on removing stragglers missed in previous CUDA version < 11.0 cleanup PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92934
Approved by: https://github.com/ngimel
2023-03-30 19:49:52 +00:00
Pearu Peterson
9d5ac03b9a Deprecate gradcheck check_sparse_nnz argument as duplicate of masked argument (#97187)
As in the title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97187
Approved by: https://github.com/soulitzer
2023-03-22 14:11:03 +00:00
Huy Do
679dec847e Use is_available instead of device_count to check for CUDA availability (#97043)
There are some tests that incorrectly uses the number of GPU devices `torch.cuda.device_count() > 0` to check for CUDA availability instead of the default `torch.cuda.is_available()` call.  This makes these tests more brittle when encountering infra flakiness on G5 runner using A10G, for example [test_pytorch_np](https://hud.pytorch.org/failure/FAILED%20test_tensorboard.py%3A%3ATestTensorBoardPyTorchNumpy%3A%3Atest_pytorch_np%20-%20RuntimeError%3A%20No%20CUDA%20GPUs%20are%20available).

The underlying problem is that GPU devices could crash on these runner.  While the root cause for that is unclear and we will try to upgrade to a new NVIDIA driver https://github.com/pytorch/pytorch/pull/96904 to see if it helps, we can also make these tests more resilient by using the correct check to skip tests correctly when GPU crashes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97043
Approved by: https://github.com/clee2000
2023-03-18 00:39:42 +00:00
Pearu Peterson
2abcafcfd8 Add masked_grad kw argument to to_dense (#96095)
As in the title.

The `masked_grad` kw argument is required for `to_dense` backward to distinguish the expected semantics of sparse tensors. `masked_grad=True` means that the `to_dense` backward will apply a mask to the returned gradient where the mask is defined by the input indices. The default semantics implies `masked_grad==True` for BC but see the [comment](https://github.com/pytorch/pytorch/pull/96095/files#diff-d4df180433a09071e891d552426911c227b30ae9b8a8e56da31046e7ecb1afbeR501-R513) in `to_dense_backward`.

As a consequence, existing code that is run through autograd engine must replace `.to_dense()` calls with `.to_dense(masked_grad=False)`. For example,
```python
torch.autograd.gradcheck(lambda x: torch.sum(x, [0]).to_dense())
torch.autograd.gradcheck(lambda x: torch.sparse.sum(x, [0]).to_dense())
```
(recall, gradcheck has `masked=False` as default) must be updated to
```python
torch.autograd.gradcheck(lambda x: torch.sum(x, [0]).to_dense(masked_grad=False))
torch.autograd.gradcheck(lambda x: torch.sparse.sum(x, [0]).to_dense(masked_grad=True), masked=True)
```

Fixes https://github.com/pytorch/pytorch/issues/95550

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96095
Approved by: https://github.com/cpuhrsch
2023-03-16 21:38:11 +00:00
Nikita Vedeneev
0b5040b329 sparse_mask: remove syncs by removing calls to coalesce (#94406)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94406
Approved by: https://github.com/cpuhrsch, https://github.com/pearu
2023-03-13 16:30:27 +00:00
Andrew M. James
2bcc0e9e18 Expand sparse.softmax zero nnz tests to cover cases of previously reported FPE. (#95646)
- Test cases with zero `nnz` added for `sparse.log_softmax`.
- Test cases with zero `nnz` for both `sparse.log_softmax` and
`torch.sparse_softmax` expanded to cover the backward pass.

These test additions prove resolution to #95371 and #82107.

Fixes #82107 #95371

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95646
Approved by: https://github.com/cpuhrsch, https://github.com/pearu, https://github.com/nikitaved
2023-03-01 17:26:51 +00:00
Pearu Peterson
b89fda51cd Implement sparse semantics support in gradcheck (2nd try) (#95405)
Replaces https://github.com/pytorch/pytorch/pull/94714 that was reverted due to https://github.com/pytorch/pytorch/pull/94714#issuecomment-1442355648

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95405
Approved by: https://github.com/albanD
2023-02-27 17:48:02 +00:00
Pearu Peterson
0c0694495b Fix a bug in nesting check_sparse_tensor_invariants context managers (#95372)
As in the title. The bug was reported in https://github.com/pytorch/pytorch/pull/94728#discussion_r1108892366 and has the following reproducer:
```python
>>> import torch
>>> check_ctx = torch.sparse.check_sparse_tensor_invariants(True)
>>> no_check_ctx = torch.sparse.check_sparse_tensor_invariants(False)
>>> with check_ctx:
...   assert torch.sparse.check_sparse_tensor_invariants.is_enabled()
...   with no_check_ctx:
...     assert not torch.sparse.check_sparse_tensor_invariants.is_enabled()
...   assert torch.sparse.check_sparse_tensor_invariants.is_enabled()
...
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
AssertionError
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95372
Approved by: https://github.com/cpuhrsch
2023-02-23 18:22:13 +00:00
Zain Rizvi
808879ec8b Revert "Implement sparse semantics support in gradcheck (#94714)" (#95386)
This reverts commit 7ac511c29a from https://github.com/pytorch/pytorch/pull/94714 since it breaks periodic.

Git thinks there's a merge conflict due to an unfortunately located newline deletion, so reverting this one manually

Details behind the failure in https://github.com/pytorch/pytorch/pull/94714#issuecomment-1442160593
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95386
Approved by: https://github.com/clee2000
2023-02-23 18:02:37 +00:00
Pearu Peterson
cece63f197 Add warn-once deprecation warning to legacy sparse constructors (#94850)
Addresses https://github.com/pytorch/pytorch/issues/68323#issuecomment-1425174341

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94850
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2023-02-23 15:05:12 +00:00
kshitij12345
3b966a6ce3 [autograd] disable backward/grad for complex scalar output (#92753)
Fixes https://github.com/pytorch/pytorch/issues/92750

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92753
Approved by: https://github.com/ezyang
2023-02-23 11:38:27 +00:00
Pearu Peterson
7ac511c29a Implement sparse semantics support in gradcheck (#94714)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94714
Approved by: https://github.com/soulitzer, https://github.com/albanD
2023-02-22 20:03:25 +00:00
Nikita Vedeneev
3ace14eb8b [Bug fix] sparse_mask: wrong intersection on CUDA (#94829)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94829
Approved by: https://github.com/cpuhrsch
2023-02-15 13:22:39 +00:00
Xuehai Pan
046e88a291 [BE] [3/3] Rewrite super() calls in test (#94592)
Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.

- #94587
- #94588
- #94592

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```

Some cases that change the semantics should be kept unchanged. E.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94592
Approved by: https://github.com/ezyang, https://github.com/seemethere
2023-02-12 22:20:53 +00:00
Aaron Gokaslan
3d82d8d0ed [BE] Enable more flake8-comprehensions checks (#94601)
I applied some flake8 fixes and enabled checking for them in the linter. I also enabled some checks for my previous comprehensions PR.

This is a follow up to #94323 where I enable the flake8 checkers for the fixes I made and fix a few more of them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94601
Approved by: https://github.com/ezyang
2023-02-10 23:40:29 +00:00
Huy Do
c53bd0dd30 Mitigate broken test_coalesce_reference_cycle test on dynamo (#94622)
The test has been disabled and shows up on https://github.com/pytorch/test-infra/blob/generated-stats/stats/disabled-tests-condensed.json, but then the JSON file downloaded by the runner doesn't seem to have it.

Disable it explicitly to keep trunk green while investigating.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94622
Approved by: https://github.com/weiwangmeta
2023-02-10 21:59:36 +00:00
PyTorch MergeBot
76ed1a81d1 Revert "COO intersection kernel: respect value intersection order (#92242)"
This reverts commit b07c839b70.

Reverted https://github.com/pytorch/pytorch/pull/92242 on behalf of https://github.com/jeanschmidt due to breaking vs17
2023-02-09 14:44:32 +00:00
Aleksandar Samardžić
e1f17b3530 Add CSR->BSC and CSC->BSR conversions (#93301)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93301
Approved by: https://github.com/cpuhrsch
2023-02-07 19:22:05 +00:00
Nikita Vedeneev
b07c839b70 COO intersection kernel: respect value intersection order (#92242)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92242
Approved by: https://github.com/cpuhrsch, https://github.com/amjames
2023-02-07 17:05:28 +00:00
Nikita Vedeneev
994f85d639 sparse_mask: extend lhs to sparse COO tensors (#92248)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92248
Approved by: https://github.com/cpuhrsch, https://github.com/pearu
2023-02-01 09:00:07 +00:00
Aleksandar Samardžić
53f7fb9a22 Add CSC->BSC conversion (#92307)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92307
Approved by: https://github.com/cpuhrsch
2023-01-30 17:03:36 +00:00
Pearu Peterson
65d6802e2f Improve error messages for sparse methods on tensors with unsupported backends/layouts. (#93149)
Fixes https://github.com/pytorch/pytorch/issues/92790

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93149
Approved by: https://github.com/cpuhrsch
2023-01-27 19:50:23 +00:00
Pearu Peterson
0e92bbe5b1 Add sparse COO tensor support to torch.sum(dim=..., keepdim=...) (#92979)
Fixes #92757, #86232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92979
Approved by: https://github.com/cpuhrsch
2023-01-26 18:42:51 +00:00
Eddie Yan
0bf7506051 [CUDA] Drop CUDA < 11.0 test flags (#92605)
Follow-up of #89582 to drop flags like `CUDA11OrLater` in tests. Note that in some places it appears that `TEST_WITH_ROCM` is _implicitly_ guarded against via the `CUDA11OrLater` version check, based on my best-guess of how `torch.version.cuda` would behave in ROCM builds, so I've added `not TEST_WITH_ROCM` in cases where ROCM wasn't previously explicitly allowed.

CC @ptrblck @malfet @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92605
Approved by: https://github.com/ngimel
2023-01-24 04:34:06 +00:00
Nikita Vedeneev
9f381c9b7f sparse_sparse_matmul: simplify backward (#91712)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91712
Approved by: https://github.com/albanD
2023-01-23 19:24:28 +00:00
Yanbo Liang
0ab4ab9f8d [Dynamo] Fix calling UserDefinedObject.func should pass self object (#92050)
Fixes #90834

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92050
Approved by: https://github.com/jansel
2023-01-21 05:47:01 +00:00
Pearu Peterson
b3e4f5029b Add check-sparse-tensor-invariants flag to Context - 2nd try. (#92094)
This PR is a copy of https://github.com/pytorch/pytorch/pull/90849 that merge was reverted.

The PR adds "check sparse tensor invariants" flag to Context that when enabled will trigger sparse tensor data invariants checks in unsafe methods of constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to UI:

`torch.sparse.check_sparse_tensor_invariants` class provides different ways to enable/disable the invariant checking.

`torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden.

The PR fixes https://github.com/pytorch/pytorch/issues/90833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92094
Approved by: https://github.com/cpuhrsch
2023-01-13 14:50:33 +00:00
PyTorch MergeBot
c7a22bb7c7 Revert "Add check-sparse-tensor-invariants flag to Context. (#90849)"
This reverts commit b9a035c1c5.

Reverted https://github.com/pytorch/pytorch/pull/90849 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-01-12 09:58:16 +00:00
Aleksandar Samardžić
8612ec5b90 Implement hybrid sparse to/from dense conversions. (#90177)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90177
Approved by: https://github.com/cpuhrsch, https://github.com/pearu
2023-01-12 03:31:30 +00:00
min-jean-cho
af242eedfb [Inductor] Added aten.uniform_ decomp (#90869)
Fixes #90815

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90869
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/lezcano, https://github.com/ngimel, https://github.com/albanD
2023-01-11 23:23:42 +00:00
Pearu Peterson
b9a035c1c5 Add check-sparse-tensor-invariants flag to Context. (#90849)
This PR adds "check sparse tensor invariants" flag to Context that when enabled will trigger sparse tensor data invariants checks in unsafe methods of constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to UI:

- `torch.enable_check_sparse_tensor_invariants` and `torch.is_check_sparse_tensor_invariants_enabled` functions to globally enable/disable the invariant checks and to retrieve the state of the feature, respectively
- `torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden.

The PR also fixes https://github.com/pytorch/pytorch/issues/90833

# Main issue

*The following content is outdated after merging the PRs in this ghstack but kept for the record.*

The importance of this feature is that when enabling the invariants checks by default, say, via

<details>

```
$ git diff
diff --git a/torch/__init__.py b/torch/__init__.py
index c8543057c7..19a91d0482 100644
--- a/torch/__init__.py
+++ b/torch/__init__.py
@@ -1239,3 +1239,8 @@ if 'TORCH_CUDA_SANITIZER' in os.environ:

 # Populate magic methods on SymInt and SymFloat
 import torch.fx.experimental.symbolic_shapes
+
+# temporarily enable sparse tensor arguments validation in unsafe
+# constructors:
+
+torch._C._set_check_sparse_tensor_invariants(True)
```

</details>

a massive number of test failures/errors occur in test_sparse_csr.py tests:
```
$ pytest -sv test/test_sparse_csr.py
<snip>
==== 4293 failed, 1557 passed, 237 skipped, 2744 errors in 69.71s (0:01:09) ====
```
that means that we are silently constructing sparse compressed tensors that do not satisfy the sparse tensor invariants. In particular, the following errors are raised:

```
AssertionError: "resize_as_sparse_compressed_tensor_: self and src must have the same layout" does not match "expected values to be a strided and contiguous tensor"

RuntimeError: CUDA error: device-side assert triggered

RuntimeError: `col_indices[..., crow_indices[..., i - 1]:crow_indices[..., i]] for all i = 1, ..., nrows are sorted and distinct along the last dimension values` is not satisfied.

RuntimeError: expected col_indices to be a strided and contiguous tensor

RuntimeError: expected row_indices to be a strided and contiguous tensor

RuntimeError: expected values to be a strided and contiguous tensor

RuntimeError: for_each: failed to synchronize: cudaErrorAssert: device-side assert triggered

RuntimeError: tensor dimensionality must be sum of batch, base, and dense dimensionalities (=0 + 2 + 0) but got 3
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90849
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2023-01-11 01:05:14 +00:00
anjali411
c887837ec3 Reland "Fix dynamo handling for tensor attributes: T, H, mT, mH (#90463)" (#91897)
This reverts commit 84266ae670.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91897
Approved by: https://github.com/ngimel
2023-01-10 08:16:07 +00:00
PyTorch MergeBot
84266ae670 Revert "Fix dynamo handling for tensor attributes: T, H, mT, mH (#90463)"
This reverts commit 9945a78a94.

Reverted https://github.com/pytorch/pytorch/pull/90463 on behalf of https://github.com/ZainRizvi due to This is causing test failures: FAILED inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_pinv_singular_cuda_float64 - RuntimeError: unexpected success linalg.pinv.singular, torch.float64, cuda
2023-01-09 16:43:36 +00:00
anjali411
9945a78a94 Fix dynamo handling for tensor attributes: T, H, mT, mH (#90463)
Fixes https://github.com/pytorch/pytorch/issues/88843

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90463
Approved by: https://github.com/ngimel
2023-01-09 04:11:23 +00:00
Nikita Vedeneev
7ef7c57ae7 CSC/BSC -> COO coalesce fix (#91440)
Fixes https://github.com/pytorch/pytorch/issues/91010.

CSC and BSC sparse formats are not inherently `coalesced`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91440
Approved by: https://github.com/pearu, https://github.com/amjames, https://github.com/cpuhrsch
2023-01-03 18:42:39 +00:00
Pearu Peterson
b797a24259 Support indices contiguity per batch and non-contiguous values in sparse compressed tensors (#91243)
Fixes https://github.com/pytorch/pytorch/issues/91062

With this PR, all reported failures in https://github.com/pytorch/pytorch/pull/90849 are resolved (modulo test_bmm that uses an unorthodox way to construct a batch CSR tensor).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91243
Approved by: https://github.com/nikitaved, https://github.com/amjames, https://github.com/lezcano
2023-01-02 18:08:46 +00:00
Nikita Vedeneev
1768a28a20 COO @ COO: fix to always produce coalesced outputs. (#91094)
Fixes [#90516](https://github.com/pytorch/pytorch/issues/90516)
Fixes [#90538](https://github.com/pytorch/pytorch/issues/90538)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91094
Approved by: https://github.com/pearu
2022-12-27 21:32:14 +00:00
Pearu Peterson
8004f934cd Fix CSR with int32 indices to CSC conversion (#91061)
Fixes https://github.com/pytorch/pytorch/issues/91007

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91061
Approved by: https://github.com/nikitaved
2022-12-18 13:53:25 +00:00
Pearu Peterson
01e7f46215 Ensure sorted indices from the CSR->BSR conversion (#90918)
Fixes https://github.com/pytorch/pytorch/issues/90910

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90918
Approved by: https://github.com/cpuhrsch
2022-12-16 15:49:48 +00:00
Edward Z. Yang
e686a442b4 If a torch.* returns non-Tensor, make this unimplemented rather than assert. (#89918)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89918
Approved by: https://github.com/albanD
2022-12-15 21:53:54 +00:00
Pearu Peterson
a60d712010 Support (non-batch) BSR/BSC to COO sparse tensor conversions (#90718)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90718
Approved by: https://github.com/cpuhrsch
2022-12-14 05:37:05 +00:00
Pearu Peterson
76c6dfeaa6 Add layout and blocksize arguments to Tensor.to_sparse method (#89502)
This PR extends the `Tensor.to_sparse()` method to `Tensor.to_sparse(layout=None, blocksize=None)` in a BC manner (`layout=None` means `layout=torch.sparse_coo`).

In addition, the PR adds support for the following conversions:
- non-hybrid/hybrid COO tensor to CSR or CSC or a COO tensor
- short, bool, byte, char, bfloat16, int, long, half CSR tensor to a BSR tensor

and fixes the following conversions:
- hybrid COO to COO tensor
- non-batch/batch hybrid BSR to BSR or BSC tensor

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89502
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2022-11-30 20:21:10 +00:00
Pearu Peterson
296e1ba4d0 Row and column select support for block compressed sparse tensors (#88733)
As in the title:

- Support `select` and `select_copy` on block sparse compressed tensors
- Fixes incorrect results when selecting dense dimensions

The PR also improves the performance of indexing sparse compressed tensors considerably:

<details>

Before:

```python
In [3]: a=torch.rand((1000, 1000)).to_sparse_csr()

In [4]: %timeit a.select(0, 0)
606 µs ± 4.27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [5]: %timeit a.select(1, 0)
527 µs ± 57.7 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit a[0, 0]
617 µs ± 3.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [7]: a = a.cuda()

In [8]: %timeit a.select(0, 0); torch.cuda.synchronize();
1.19 ms ± 137 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [9]: %timeit a.select(1, 0); torch.cuda.synchronize();
1.2 ms ± 119 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [10]: %timeit a[0, 0]; torch.cuda.synchronize();
1.23 ms ± 482 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

This PR:

```python
In [3]: a=torch.rand((1000, 1000)).to_sparse_csr()

In [4]: %timeit a.select(0, 0)
4.75 µs ± 8.94 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [5]: %timeit a.select(1, 0)
565 µs ± 156 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit a[0, 0]
13.1 µs ± 435 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [7]: a = a.cuda()

In [8]: %timeit a.select(0, 0); torch.cuda.synchronize();
21.6 µs ± 23.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [9]: %timeit a.select(1, 0); torch.cuda.synchronize();
1.15 ms ± 3.13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [10]: %timeit a[0, 0]; torch.cuda.synchronize();
63.7 µs ± 2.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88733
Approved by: https://github.com/nikitaved, https://github.com/amjames, https://github.com/cpuhrsch
2022-11-30 11:15:56 +00:00
Pearu Peterson
90bed8874f Generator of tensor inputs with variable layout and structure (batch/non-batch, hybrid/non-hybrid, block/non-block) (#88914)
This PR introduces `TestCase.generate_simple_inputs` method that is an improved and generalized version of the `TestSparseCompressed._generate_small_inputs` method.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88914
Approved by: https://github.com/cpuhrsch
2022-11-30 02:13:33 +00:00
Kazuaki Ishizaki
088f2fa567 Fix typos in messages under test (#89121)
This PR fixes typos of messages in `.cpp` and `.py` files under test directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89121
Approved by: https://github.com/mruberry, https://github.com/kit1980
2022-11-17 01:55:03 +00:00
Andrew M. James
ff6770a9a1 enable backward for log1p (sparse layouts) (#88155)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88155
Approved by: https://github.com/cpuhrsch
2022-11-04 20:59:26 +00:00
jpvillam
1e1b045128 [ROCM] Enable Sparse Pickle Test (#82729)
Missed stream context for serialization

### Description
Missing ROCm stream context on memory operations for serialization

### Testing
Ran the sparse pickle test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82729
Approved by: https://github.com/ngimel
2022-10-27 15:11:28 +00:00
Pearu Peterson
88b882cd1c Support sum on a sparse COO tensor. (#86300)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86300
Approved by: https://github.com/cpuhrsch
2022-10-06 18:39:28 +00:00
George Qi
686555b663 [maskedtensor] port torch/_masked into torch/masked (#85515)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85515
Approved by: https://github.com/cpuhrsch
2022-09-26 23:41:13 +00:00
Elias Ellison
bcc544e9d7 Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417
Approved by: https://github.com/ezyang
2022-09-26 17:08:14 +00:00
nikitaved
12ae3bea43 Faster mul(sparse, sparse) with broadcasting in dense dims. (#85336)
This is a combo PR of https://github.com/pytorch/pytorch/pull/84929 and ~https://github.com/pytorch/pytorch/pull/83428~.

Preliminary benchmarks (square matrices of shape (n, n)).

<details>

<summary>Script</summary>

```python
import torch
import math
from IPython import get_ipython
from itertools import product, repeat
import pickle
from torch.utils.benchmark import Timer, Compare

torch.manual_seed(13)

problem_dims = (
    # n > nnz
    (10000, 100),
    (100000, 1000),
    (1000000, 10000),
    # n < nnz
    (10, 100),
    (10, 1000),
    (10, 10000),
    (100, 1000),
    (100, 10000),
    (1000, 10000),
    (1000, 100000),
    (1000, 1000000),
    #(1000000, 1000000000),
)

name = "PR"
device = "cuda"
results = []

for n, nnz in problem_dims:
    def gen_tensor(coalesce=False):
        shape = (n, n)
        nrows, ncols = shape
        rowidx = torch.randint(low=0, high=nrows, size=(nnz,), device=device)
        colidx = torch.randint(low=0, high=ncols, size=(nnz,), device=device)
        itemidx = torch.vstack((rowidx, colidx))
        xvalues = torch.randn(nnz, device=device)
        itemidx = torch.hstack((itemidx, itemidx))
        xvalues = torch.hstack((xvalues, xvalues))
        res = torch.sparse_coo_tensor(itemidx, xvalues, size=shape)
        if coalesce:
            return res.coalesce()
        else:
            return res

    for x_coalesce, y_coalesce in product(*repeat((True, False), 2)):
        x = gen_tensor(x_coalesce)
        y = gen_tensor(y_coalesce)
        smtp = "x * y"
        timer = Timer(smtp,
                      globals=globals(),
                      label="coo.mul",
                      description=f"{name}: mul, device: {device}",
                      sub_label=f"n={n}, nnz={nnz}, coalesce=({x_coalesce, y_coalesce})",
                      num_threads=torch.get_num_threads())
        results.append(timer.blocked_autorange())

compare = Compare(results)
compare.trim_significant_figures()
compare.print()

with open(f"{name}_{device}_mul.pickle", 'wb') as f:
    pickle.dump(results, f)

```

</details>

<details>

<summary>Gather results</summary>

```python
import pickle
from torch.utils.benchmark import Timer, Compare

files = [
        "PR",
        "master"
        ]

device = 'cuda'

timers = []
for name in files:
    with open("{}_{}_mul.pickle".format(name, device), 'rb') as f:
        timers += pickle.load(f)

compare = Compare(timers)
compare.trim_significant_figures()
compare.print()

```

</details>

<details>

<summary>CUDA</summary>

```
[------------------------------------------------- coo.mul -------------------------------------------------]
                                                       |  PR: mul, device: cuda  |  master: mul, device: cuda
24 threads: -------------------------------------------------------------------------------------------------
      n=10000, nnz=100, coalesce=((True, True))        |             95          |                91
      n=10000, nnz=100, coalesce=((True, False))       |             87          |               242
      n=10000, nnz=100, coalesce=((False, True))       |             87          |               226
      n=10000, nnz=100, coalesce=((False, False))      |            130          |               371
      n=100000, nnz=1000, coalesce=((True, True))      |            100          |               521
      n=100000, nnz=1000, coalesce=((True, False))     |             90          |               649
      n=100000, nnz=1000, coalesce=((False, True))     |            100          |               659
      n=100000, nnz=1000, coalesce=((False, False))    |            200          |               781
      n=1000000, nnz=10000, coalesce=((True, True))    |            100          |              4861
      n=1000000, nnz=10000, coalesce=((True, False))   |            100          |              5012
      n=1000000, nnz=10000, coalesce=((False, True))   |             98          |              5010
      n=1000000, nnz=10000, coalesce=((False, False))  |            384          |              5174
      n=10, nnz=100, coalesce=((True, True))           |            100          |                79
      n=10, nnz=100, coalesce=((True, False))          |            100          |               221
      n=10, nnz=100, coalesce=((False, True))          |            100          |               221
      n=10, nnz=100, coalesce=((False, False))         |            100          |               350
      n=10, nnz=1000, coalesce=((True, True))          |            100          |               100
      n=10, nnz=1000, coalesce=((True, False))         |            100          |               240
      n=10, nnz=1000, coalesce=((False, True))         |            100          |               254
      n=10, nnz=1000, coalesce=((False, False))        |            100          |               392
      n=10, nnz=10000, coalesce=((True, True))         |            100          |               110
      n=10, nnz=10000, coalesce=((True, False))        |            110          |               286
      n=10, nnz=10000, coalesce=((False, True))        |            110          |               286
      n=10, nnz=10000, coalesce=((False, False))       |            271          |               455
      n=100, nnz=1000, coalesce=((True, True))         |            110          |               851
      n=100, nnz=1000, coalesce=((True, False))        |            110          |              1000
      n=100, nnz=1000, coalesce=((False, True))        |            110          |               990
      n=100, nnz=1000, coalesce=((False, False))       |            140          |              1124
      n=100, nnz=10000, coalesce=((True, True))        |            110          |              5137
      n=100, nnz=10000, coalesce=((True, False))       |            110          |              5391
      n=100, nnz=10000, coalesce=((False, True))       |            100          |              5405
      n=100, nnz=10000, coalesce=((False, False))      |            249          |              5539
      n=1000, nnz=10000, coalesce=((True, True))       |            100          |              8598
      n=1000, nnz=10000, coalesce=((True, False))      |            100          |              8800
      n=1000, nnz=10000, coalesce=((False, True))      |            100          |              8782
      n=1000, nnz=10000, coalesce=((False, False))     |            255          |              8956
      n=1000, nnz=100000, coalesce=((True, True))      |            120          |             84500
      n=1000, nnz=100000, coalesce=((True, False))     |            200          |             88560
      n=1000, nnz=100000, coalesce=((False, True))     |            160          |             89000
      n=1000, nnz=100000, coalesce=((False, False))    |            373          |             89000
      n=1000, nnz=1000000, coalesce=((True, True))     |            312          |            606400
      n=1000, nnz=1000000, coalesce=((True, False))    |           1340          |            609200
      n=1000, nnz=1000000, coalesce=((False, True))    |           1340          |            609100
      n=1000, nnz=1000000, coalesce=((False, False))   |           4408          |            611400

Times are in microseconds (us).
```

</details>

<details>

<summary>CPU</summary>

```
[------------------------------------------------ coo.mul ------------------------------------------------]
                                                       |  PR: mul, device: cpu  |  master: mul, device: cpu
24 threads: -----------------------------------------------------------------------------------------------
      n=10000, nnz=100, coalesce=((True, True))        |              8         |                8
      n=10000, nnz=100, coalesce=((True, False))       |             32         |               34
      n=10000, nnz=100, coalesce=((False, True))       |             32         |               34
      n=10000, nnz=100, coalesce=((False, False))      |             41         |               56
      n=100000, nnz=1000, coalesce=((True, True))      |             24         |               24
      n=100000, nnz=1000, coalesce=((True, False))     |             90         |              100
      n=100000, nnz=1000, coalesce=((False, True))     |             87         |              100
      n=100000, nnz=1000, coalesce=((False, False))    |            231         |              255
      n=1000000, nnz=10000, coalesce=((True, True))    |            190         |              200
      n=1000000, nnz=10000, coalesce=((True, False))   |            908         |             2023
      n=1000000, nnz=10000, coalesce=((False, True))   |            800         |             2036
      n=1000000, nnz=10000, coalesce=((False, False))  |           3684         |             3989
      n=10, nnz=100, coalesce=((True, True))           |              8         |                7
      n=10, nnz=100, coalesce=((True, False))          |             34         |               30
      n=10, nnz=100, coalesce=((False, True))          |             33         |               30
      n=10, nnz=100, coalesce=((False, False))         |             44         |               50
      n=10, nnz=1000, coalesce=((True, True))          |              8         |                7
      n=10, nnz=1000, coalesce=((True, False))         |            100         |              100
      n=10, nnz=1000, coalesce=((False, True))         |            130         |              100
      n=10, nnz=1000, coalesce=((False, False))        |            746         |              210
      n=10, nnz=10000, coalesce=((True, True))         |              8         |                7
      n=10, nnz=10000, coalesce=((True, False))        |           1000         |             1500
      n=10, nnz=10000, coalesce=((False, True))        |           1000         |             1510
      n=10, nnz=10000, coalesce=((False, False))       |           3063         |             2457
      n=100, nnz=1000, coalesce=((True, True))         |             25         |               25
      n=100, nnz=1000, coalesce=((True, False))        |            180         |              130
      n=100, nnz=1000, coalesce=((False, True))        |            200         |              130
      n=100, nnz=1000, coalesce=((False, False))       |            271         |              255
      n=100, nnz=10000, coalesce=((True, True))        |            100         |              100
      n=100, nnz=10000, coalesce=((True, False))       |           2444         |             2290
      n=100, nnz=10000, coalesce=((False, True))       |           2455         |             2357
      n=100, nnz=10000, coalesce=((False, False))      |           5316         |             3783
      n=1000, nnz=10000, coalesce=((True, True))       |            204         |              211
      n=1000, nnz=10000, coalesce=((True, False))      |           2457         |             2480
      n=1000, nnz=10000, coalesce=((False, True))      |           2448         |             2539
      n=1000, nnz=10000, coalesce=((False, False))     |           3665         |             4801
      n=1000, nnz=100000, coalesce=((True, True))      |           2293         |             2374
      n=1000, nnz=100000, coalesce=((True, False))     |           9000         |            24620
      n=1000, nnz=100000, coalesce=((False, True))     |           8000         |            25080
      n=1000, nnz=100000, coalesce=((False, False))    |          26500         |            47650
      n=1000, nnz=1000000, coalesce=((True, True))     |          10000         |            13000
      n=1000, nnz=1000000, coalesce=((True, False))    |          80000         |           362200
      n=1000, nnz=1000000, coalesce=((False, True))    |          78050         |           392600
      n=1000, nnz=1000000, coalesce=((False, False))   |         312100         |           766900

Times are in microseconds (us).
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85336
Approved by: https://github.com/cpuhrsch
2022-09-23 23:31:19 +00:00
PyTorch MergeBot
d10de31cc8 Revert "Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)"
This reverts commit 78afa0cf0c.

Reverted https://github.com/pytorch/pytorch/pull/85417 on behalf of https://github.com/clee2000 due to broke tests on trunk 78afa0cf0c
2022-09-23 17:21:43 +00:00
Elias Ellison
78afa0cf0c Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417
Approved by: https://github.com/ezyang
2022-09-23 15:50:03 +00:00
PyTorch MergeBot
5043457a8e Revert "Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)"
This reverts commit 9c77083965.

Reverted https://github.com/pytorch/pytorch/pull/85417 on behalf of https://github.com/clee2000 due to broke tests on trunk (and pull somehow) 9c77083965
2022-09-22 15:44:38 +00:00
Elias Ellison
9c77083965 Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417
Approved by: https://github.com/ezyang
2022-09-22 13:03:57 +00:00
Elias Ellison
d9aa6dfe88 Add Fake Cross Ref Mode, migrate sparse to it (#85382)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85382
Approved by: https://github.com/ezyang
2022-09-21 17:15:47 +00:00
PyTorch MergeBot
81620c3360 Revert "Faster mul(sparse, sparse) with broadcasting in dense dims. (#83428)"
This reverts commit d49943bda8.

Reverted https://github.com/pytorch/pytorch/pull/83428 on behalf of https://github.com/osalpekar due to Reverted because __restrict symbol not supported by certain MSVC compilers, leading to undefined symbol error at compilation time
2022-09-17 06:53:11 +00:00
nikitaved
d49943bda8 Faster mul(sparse, sparse) with broadcasting in dense dims. (#83428)
Preliminary benchmarks (square matrices of shape (n, n)).

<details>

<summary>Script</summary>

```python
import torch
import math
from IPython import get_ipython
from itertools import product, repeat
import pickle
from torch.utils.benchmark import Timer, Compare

torch.manual_seed(13)

# specifies (n, nnz)
problem_dims = (
    # n > nnz
    (10000, 100),
    (100000, 1000),
    (1000000, 10000),
    # n < nnz
    (10, 100),
    (10, 1000),
    (10, 10000),
    (100, 1000),
    (100, 10000),
    (1000, 10000),
    (1000, 100000),
    (1000, 1000000),
    #(1000000, 1000000000),
)

name = "PR"
device = "cuda"
results = []

for n, nnz in problem_dims:
    def gen_tensor(coalesce=False):
        shape = (n, n)
        nrows, ncols = shape
        rowidx = torch.randint(low=0, high=nrows, size=(nnz,), device=device)
        colidx = torch.randint(low=0, high=ncols, size=(nnz,), device=device)
        itemidx = torch.vstack((rowidx, colidx))
        xvalues = torch.randn(nnz, device=device)
        itemidx = torch.hstack((itemidx, itemidx))
        xvalues = torch.hstack((xvalues, xvalues))
        res = torch.sparse_coo_tensor(itemidx, xvalues, size=shape)
        if coalesce:
            return res.coalesce()
        else:
            return res

    for x_coalesce, y_coalesce in product(*repeat((True, False), 2)):
        x = gen_tensor(x_coalesce)
        y = gen_tensor(y_coalesce)
        smtp = "x * y"
        timer = Timer(smtp,
                      globals=globals(),
                      label="coo.mul",
                      description=f"{name}: mul, device: {device}",
                      sub_label=f"n={n}, nnz={nnz}, coalesce=({x_coalesce, y_coalesce})",
                      num_threads=torch.get_num_threads())
        results.append(timer.blocked_autorange())

compare = Compare(results)
compare.trim_significant_figures()
compare.print()

with open(f"{name}_{device}_mul.pickle", 'wb') as f:
    pickle.dump(results, f)

```

</details>

<details>

<summary>Gather results</summary>

```python
import pickle
from torch.utils.benchmark import Timer, Compare

files = [
        "PR",
        "master"
        ]

device = 'cuda'

timers = []
for name in files:
    with open("{}_{}_mul.pickle".format(name, device), 'rb') as f:
        timers += pickle.load(f)

compare = Compare(timers)
compare.trim_significant_figures()
compare.print()

```

</details>

<details>

<summary>CUDA</summary>

```
[------------------------------------------------- coo.mul -------------------------------------------------]
                                                       |  PR: mul, device: cuda  |  master: mul, device: cuda
24 threads: -------------------------------------------------------------------------------------------------
      n=10000, nnz=100, coalesce=((True, True))        |             95          |                91
      n=10000, nnz=100, coalesce=((True, False))       |             87          |               242
      n=10000, nnz=100, coalesce=((False, True))       |             87          |               226
      n=10000, nnz=100, coalesce=((False, False))      |            130          |               371
      n=100000, nnz=1000, coalesce=((True, True))      |            100          |               521
      n=100000, nnz=1000, coalesce=((True, False))     |             90          |               649
      n=100000, nnz=1000, coalesce=((False, True))     |            100          |               659
      n=100000, nnz=1000, coalesce=((False, False))    |            200          |               781
      n=1000000, nnz=10000, coalesce=((True, True))    |            100          |              4861
      n=1000000, nnz=10000, coalesce=((True, False))   |            100          |              5012
      n=1000000, nnz=10000, coalesce=((False, True))   |             98          |              5010
      n=1000000, nnz=10000, coalesce=((False, False))  |            384          |              5174
      n=10, nnz=100, coalesce=((True, True))           |            100          |                79
      n=10, nnz=100, coalesce=((True, False))          |            100          |               221
      n=10, nnz=100, coalesce=((False, True))          |            100          |               221
      n=10, nnz=100, coalesce=((False, False))         |            100          |               350
      n=10, nnz=1000, coalesce=((True, True))          |            100          |               100
      n=10, nnz=1000, coalesce=((True, False))         |            100          |               240
      n=10, nnz=1000, coalesce=((False, True))         |            100          |               254
      n=10, nnz=1000, coalesce=((False, False))        |            100          |               392
      n=10, nnz=10000, coalesce=((True, True))         |            100          |               110
      n=10, nnz=10000, coalesce=((True, False))        |            110          |               286
      n=10, nnz=10000, coalesce=((False, True))        |            110          |               286
      n=10, nnz=10000, coalesce=((False, False))       |            271          |               455
      n=100, nnz=1000, coalesce=((True, True))         |            110          |               851
      n=100, nnz=1000, coalesce=((True, False))        |            110          |              1000
      n=100, nnz=1000, coalesce=((False, True))        |            110          |               990
      n=100, nnz=1000, coalesce=((False, False))       |            140          |              1124
      n=100, nnz=10000, coalesce=((True, True))        |            110          |              5137
      n=100, nnz=10000, coalesce=((True, False))       |            110          |              5391
      n=100, nnz=10000, coalesce=((False, True))       |            100          |              5405
      n=100, nnz=10000, coalesce=((False, False))      |            249          |              5539
      n=1000, nnz=10000, coalesce=((True, True))       |            100          |              8598
      n=1000, nnz=10000, coalesce=((True, False))      |            100          |              8800
      n=1000, nnz=10000, coalesce=((False, True))      |            100          |              8782
      n=1000, nnz=10000, coalesce=((False, False))     |            255          |              8956
      n=1000, nnz=100000, coalesce=((True, True))      |            120          |             84500
      n=1000, nnz=100000, coalesce=((True, False))     |            200          |             88560
      n=1000, nnz=100000, coalesce=((False, True))     |            160          |             89000
      n=1000, nnz=100000, coalesce=((False, False))    |            373          |             89000
      n=1000, nnz=1000000, coalesce=((True, True))     |            312          |            606400
      n=1000, nnz=1000000, coalesce=((True, False))    |           1340          |            609200
      n=1000, nnz=1000000, coalesce=((False, True))    |           1340          |            609100
      n=1000, nnz=1000000, coalesce=((False, False))   |           4408          |            611400

Times are in microseconds (us).
```

</details>

<details>

<summary>CPU</summary>

```
[------------------------------------------------ coo.mul ------------------------------------------------]
                                                       |  PR: mul, device: cpu  |  master: mul, device: cpu
24 threads: -----------------------------------------------------------------------------------------------
      n=10000, nnz=100, coalesce=((True, True))        |              8         |                8
      n=10000, nnz=100, coalesce=((True, False))       |             32         |               34
      n=10000, nnz=100, coalesce=((False, True))       |             32         |               34
      n=10000, nnz=100, coalesce=((False, False))      |             41         |               56
      n=100000, nnz=1000, coalesce=((True, True))      |             24         |               24
      n=100000, nnz=1000, coalesce=((True, False))     |             90         |              100
      n=100000, nnz=1000, coalesce=((False, True))     |             87         |              100
      n=100000, nnz=1000, coalesce=((False, False))    |            231         |              255
      n=1000000, nnz=10000, coalesce=((True, True))    |            190         |              200
      n=1000000, nnz=10000, coalesce=((True, False))   |            908         |             2023
      n=1000000, nnz=10000, coalesce=((False, True))   |            800         |             2036
      n=1000000, nnz=10000, coalesce=((False, False))  |           3684         |             3989
      n=10, nnz=100, coalesce=((True, True))           |              8         |                7
      n=10, nnz=100, coalesce=((True, False))          |             34         |               30
      n=10, nnz=100, coalesce=((False, True))          |             33         |               30
      n=10, nnz=100, coalesce=((False, False))         |             44         |               50
      n=10, nnz=1000, coalesce=((True, True))          |              8         |                7
      n=10, nnz=1000, coalesce=((True, False))         |            100         |              100
      n=10, nnz=1000, coalesce=((False, True))         |            130         |              100
      n=10, nnz=1000, coalesce=((False, False))        |            746         |              210
      n=10, nnz=10000, coalesce=((True, True))         |              8         |                7
      n=10, nnz=10000, coalesce=((True, False))        |           1000         |             1500
      n=10, nnz=10000, coalesce=((False, True))        |           1000         |             1510
      n=10, nnz=10000, coalesce=((False, False))       |           3063         |             2457
      n=100, nnz=1000, coalesce=((True, True))         |             25         |               25
      n=100, nnz=1000, coalesce=((True, False))        |            180         |              130
      n=100, nnz=1000, coalesce=((False, True))        |            200         |              130
      n=100, nnz=1000, coalesce=((False, False))       |            271         |              255
      n=100, nnz=10000, coalesce=((True, True))        |            100         |              100
      n=100, nnz=10000, coalesce=((True, False))       |           2444         |             2290
      n=100, nnz=10000, coalesce=((False, True))       |           2455         |             2357
      n=100, nnz=10000, coalesce=((False, False))      |           5316         |             3783
      n=1000, nnz=10000, coalesce=((True, True))       |            204         |              211
      n=1000, nnz=10000, coalesce=((True, False))      |           2457         |             2480
      n=1000, nnz=10000, coalesce=((False, True))      |           2448         |             2539
      n=1000, nnz=10000, coalesce=((False, False))     |           3665         |             4801
      n=1000, nnz=100000, coalesce=((True, True))      |           2293         |             2374
      n=1000, nnz=100000, coalesce=((True, False))     |           9000         |            24620
      n=1000, nnz=100000, coalesce=((False, True))     |           8000         |            25080
      n=1000, nnz=100000, coalesce=((False, False))    |          26500         |            47650
      n=1000, nnz=1000000, coalesce=((True, True))     |          10000         |            13000
      n=1000, nnz=1000000, coalesce=((True, False))    |          80000         |           362200
      n=1000, nnz=1000000, coalesce=((False, True))    |          78050         |           392600
      n=1000, nnz=1000000, coalesce=((False, False))   |         312100         |           766900

Times are in microseconds (us).
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83428
Approved by: https://github.com/cpuhrsch
2022-09-16 00:28:40 +00:00
Edward Z. Yang
c5a8946e40 Revert "Revert "Redo how custom/python_custom methods on TensorImpl work (#84796)" (#84806)
This reverts commit ca3b2bfbe3.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84806
Approved by: https://github.com/Chillee
2022-09-10 06:17:35 +00:00
Eli Uriegas
ca3b2bfbe3 Revert "Redo how custom/python_custom methods on TensorImpl work (#84796)
This reverts commit 591b75bf98.

Manual revert of https://github.com/pytorch/pytorch/pull/84641

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84796
Approved by: https://github.com/izaitsevfb
2022-09-10 00:18:13 +00:00
Edward Z. Yang
591b75bf98 Redo how custom/python_custom methods on TensorImpl work (#84641)
A longstanding confusion in the implementation of fake tensor and proxy tensor is what to do about torch.ops.aten.sym_sizes and related calls. In particular, when you have a tensor that (1) has symbolic shapes and (2) has a `__torch_dispatch__` call, previously, you would always get `__torch_dispatch__` calls for sizes/strides query, *even if you didn't request it* via the dispatch kwargs in `make_wrapper_subclass`.

The reason for this is because we were previously mixing several concepts: "I want to dispatch to Python", "I want to call a virtual method" and "I have dynamic shapes". A single boolean variable controlled all of these things, and so it was not possible to understand inside TensorImpl what the user had actually originally requested.

In this PR, we track each of these concepts individually so that we can preserve user intent. Then, we combine these into a single "policy" variable that controls whether or not we can use the fastpath or not. For the policy to trigger, we only need one of the exceptional cases to be true.

Billing of changes:
* Rename `set_sizes_strides_policy` to `set_custom_sizes_strides`; in general, you cannot DIRECTLY set policy; you have to indirectly set it by the public functions.
* Some helpers for sizes and strides, since it's more complicated (as it is an enum, rather than just bools as is the case for device and layout). `matches_python_custom` is used to test the Python dispatch user ask. `matches_policy` does the policy test (only used in the user facing functions.)
* I reorged the accessor methods so that they are more logical. This makes the diff bad, so I recommend reading the final code directly.
* The default custom implementations now more reliably call their default() implementations
* As bonus refactor, I devirtualized some functions that don't need to be virtual
* `set_sym_sizes_and_strides` is renamed to `set_sizes_and_strides` to make it easier to use in template contexts; it optionally takes a storage offset now so you can set all three values at the same time. If you use the SymInt overload but there are no symbolic integers, we give you a normal resize.
* This adds `sym_storage_offset` since we had that in the symbolic shapes branch and there's no reason not to put it in (and it reduces merge conflicts)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84641
Approved by: https://github.com/wconstab
2022-09-09 13:41:13 +00:00
Elias Ellison
15c5baf878 Throw on data dependent ops (#83567)
Previously, we would trace through the following with no error:
```
from torch.fx.experimental.proxy_tensor import make_fx
import torch

def f(x, y):
    return x[0, y:]
```

Even though the output shape is dependent on the data of `y`.  Now, throw on the conversion of `y` to an integer.

It would be nice to not break on constant tensors but I'll do that as the next PR (Edit: done with https://github.com/pytorch/pytorch/pull/84387). Sketching out how that would work (and keep in mind this is applicable Dynamo tracing and not just AOT Autograd)

I think to do that you would need to :
- hold strong refs to a set of constant tensors, and only allow them to be captured from `lift_fresh.copy`
- when you run a mutable op, either remove it from the set of constant tensors or run the operator for real
- limit to small constant tensors
Anything else ?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83567
Approved by: https://github.com/ezyang
2022-09-07 02:37:00 +00:00
Andrew M. James
6dc9223c8b Sparse_coo: Be more agressive in setting coalesced True to avoid suprising behaviors (#82426)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82426
Approved by: https://github.com/pearu, https://github.com/bhosmer
2022-09-01 17:46:51 +00:00
jpvillam
247468baf0 [ROCm] More Sparse UTs enablement and more hipification mappings. (#78939)
Enables:

 test_bmm_cuda_float64
 test_bmm_deterministic_cuda_float64
 test_csr_matvec_cuda_complex128
 test_csr_matvec_cuda_complex64
 test_csr_matvec_cuda_float32
 test_csr_matvec_cuda_float64

To enable the above tests had to add some more hip mappings for the hipification process.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78939
Approved by: https://github.com/pruthvistony, https://github.com/malfet
2022-08-23 13:54:09 +00:00
Brian Hirsh
0c24af4985 Always allow tensor metadata changes (#83590)
Make it so that it is valid to set metadata after detach calls, like `x.detach().resize_(...)`.

This technically lifts some restrictions around `.data`. This PR means that you can now technically call `x.data.resize_(...)`, which can now directly resize `x` instead of erroring.

My understanding: Before the tensor-variable merge, when `x` and `x.data` were really different tensors, you could resize `x.data` independently of `x`, and during the merge, this error was added to avoid silent confusing behavior changes.

It was agreed that this error has been around long enough (several years) that it's acceptable to drop.  cc @albanD @ezyang.

(Ed already had a prototype PR [here](https://github.com/pytorch/pytorch/pull/83545) - I ended up making one to try to slog through test failures).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83590
Approved by: https://github.com/ezyang
2022-08-19 23:30:43 +00:00
nikitaved
b60dc2eb43 mul: sparse-dense + sparse-sparse with 0-dims support take 2. (#82962)
This one is a copy of
https://github.com/pytorch/pytorch/pull/81556
https://github.com/pytorch/pytorch/pull/82717
These got reverted due to issues with torchvision.

CC @kit1980 , could you please take over from here?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82962
Approved by: https://github.com/kit1980
2022-08-11 23:34:58 +00:00
PyTorch MergeBot
45291c7ec8 Revert "Implement mul(dense, sparse), mul(sparse, dense) for sparse COO tensors. (#81556)"
This reverts commit edd2f6daa7.

Reverted https://github.com/pytorch/pytorch/pull/81556 on behalf of https://github.com/kit1980 due to Broken internal test, S286911
2022-08-05 19:39:01 +00:00
PyTorch MergeBot
796fba02fe Revert "Implement and extend mul(sparse, sparse) to work with 0-dim arguments on either side. (#82717)"
This reverts commit 3ab54b971f.

Reverted https://github.com/pytorch/pytorch/pull/82717 on behalf of https://github.com/kit1980 due to Broken internal test, S286911
2022-08-05 19:35:35 +00:00
Nikita Vedeneev
3ab54b971f Implement and extend mul(sparse, sparse) to work with 0-dim arguments on either side. (#82717)
Extends https://github.com/pytorch/pytorch/pull/81556 by bringing some missing functionality implemented in master.
Also, improves on master to allow arbitrary 0-dim coalesced or not arguments to be on either side of the operation.
Master, for example, would fail on 0-dim non-coalesced inputs.

CC @datumbox, @osalpekar .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82717
Approved by: https://github.com/amjames, https://github.com/bhosmer
2022-08-04 17:46:23 +00:00
Edward Z. Yang
42fefd4403 Sparse fake tensor support (#82172)
Add support for sparse fake tensors.

- The testing strategy is to run a fake tensor cross ref test on `test_sparse.py`. This is necessary because OpInfo sparse coverage is completely nonexistent. We could have tried to turn on cross ref testing globally for all files, but that would be very time consuming and the tests I'm interested in are mostly in this file. There are some exclusions in testing for things that don't work.
- I make fake tensor converter raise a UnsupportedFakeTensorException if the meta converter fails to do a conversion (which can happen in a relatively large number of situations).
- I relax fake tensor invariants so that you can make a fake tensor from a meta tensor. This is useful because in the cross ref test sometimes we operate on meta tensors.
- Fake tensor wrapping is improved to handle the case when a function doesn't return any tensors
- Meta converter is taught how to convert sparse tensors to meta

There's still a little more cleanup that needs to be done, but this is good for review.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82172
Approved by: https://github.com/eellison
2022-08-03 14:29:36 +00:00
Nikita Vedeneev
edd2f6daa7 Implement mul(dense, sparse), mul(sparse, dense) for sparse COO tensors. (#81556)
As per title. Implemented with broadcasting and in-place support.
Follow-up : Backward implementation.

Fixes https://github.com/pytorch/pytorch/issues/3158
Fixes https://github.com/pytorch/pytorch/issues/4456
Fixes https://github.com/pytorch/pytorch/issues/46307
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81556
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2022-07-29 15:15:27 +00:00
Nikita Vedeneev
18d0e533da fix silent type promition for sparse COO tensors with select (#82215)
Fixes https://github.com/pytorch/pytorch/issues/82150.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82215
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2022-07-27 12:24:06 +00:00
Christian Puhrsch
6ab1fe19ee torch.sparse.softmax avoid div by zero and invalid kernel launch parameters (#82149)
### Description
Small changes needed to deal with nnz 0 inputs.

### Issue
https://github.com/pytorch/pytorch/issues/82107

### Testing
Added additional test coverage to reproduce bug reported in issue. Tested resulting values by conversion `to_dense`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82149
Approved by: https://github.com/jbschlosser, https://github.com/ezyang
2022-07-25 23:10:58 +00:00
PyTorch MergeBot
6e9b0dcdc4 Revert "Implement mul(dense, sparse), mul(sparse, dense) for sparse COO tensors. (#81556)"
This reverts commit cc5b01651f.

Reverted https://github.com/pytorch/pytorch/pull/81556 on behalf of https://github.com/jeanschmidt due to breaking internal builds
2022-07-22 11:20:11 +00:00
Nikita Vedeneev
cc5b01651f Implement mul(dense, sparse), mul(sparse, dense) for sparse COO tensors. (#81556)
As per title. Implemented with broadcasting and in-place support.
Follow-up : Backward implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81556
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2022-07-22 04:55:48 +00:00
Edward Z. Yang
44193f6b5d Add basic support for sparse meta tensors (#81800)
Coverage is by no means complete, we'll drive more coverage using
an appropriate cross-ref tests; this is just enough to get construction
and querying working.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81800
Approved by: https://github.com/cpuhrsch, https://github.com/bdhirsh
2022-07-21 21:23:57 +00:00
Andrew M. James
5a4c9e8394 Add spdiags sparse matrix initialization (#78439)
Similar to [scipy.sparse.spdiags](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.spdiags.html#scipy-sparse-spdiags)

Part of #70926

In other functions (ie (torch.diagonal)[https://pytorch.org/docs/stable/generated/torch.diagonal.html#torch.diagonal]) diagonals of a tensor are referenced using the offset and the two dimensions that the diagonal is taken with respect to.

Here the reference implementation from scipy is only considering matrix output, so even if we only support 2-d output at first. It may be useful to consider how the dimensions corresponding to each diagonal would be specified for higher dimensional output.

The proposed torch signature implies that all offsets refer to the diagonals with respect to the only two dimensions of the output:

```
torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, int[] shape, Layout? layout=None) -> SparseTensor
```
 Above it is required that: `diagonals.ndimension() == 2`, `offsets.ndimensions() == 1`, `offsets.shape[0] == diagonals.shape[0]` and `len(shape) == 2`.

This would need to be altered for the case where `len(shape)` > 2. One options is:
```
torch.sparse.spdiags(Tensor[] diagonals, IntTensor[] offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor
```

Here `offsets` and `diagonals` becomes lists of tensors, and the `IntTensor dims` argument is introduced. This would require that `len(diagonals) == len(offsets) == dims.shape[0]`, `dims.ndimension() == 2` and `dims.shape[1] == 2` also the same restrictions as the 2d case above apply to the elements of `diagonals` and `offsets` pairwise (that is `diagonals[i].ndimension() == 2`, `offsets[i].ndimension() == 1` and `offsets[i].shape[0] == diagonals[i].shape[0]` for all i). This form of the signature would construct the sparse result by placing the values from `diagonals[i][j]` into the diagonal with offset `offset[i][j]` taken with respect to dimensions `dims[i]`. The specialization back to the original signature for the 2d case could be seen as allowing the single row of dims to default to `[0, 1]` when there is only one `diagonals`, `offsets` provided, and shape is `2-d`. This option allows the rows of an input element `diagonals[i]` to have a different length which may be appropriate as the max length of a diagonal along different dimension pairs will be different.

Another option is to specify the dimensions the diagonal is taken with respect to for each offset. This signature would look like:

```
torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor
```
Here, `diagonals` is still 2-D with dimension 0 matching the length of 1-D `offsets` and the tensor input `dims` is also 2-D with dimension 0 matching the length of 1-D `offsets` and the second dimension being fixed at `2` in this case the sparse result is constructed by placing the elements from `diagonals[i]` into the output diagonal `output.diagonal(offset[i], dim0=dims[i][0], dim1=dims[i][1])` (with some additional consideration that makes it more complicated than simply asigning to that view). The specialization from this back to the 2-D form could be seen as assuming `dims = [[0, 1], [0, 1]... len(offsets) times ]` when `len shape==2`.

In both proposed signatures for the N-D case the specialization back to the 2-D signature is a bit of a stretch for your typical default arguments logic, however I think the first is better choice as it offers more flexibility.

I think some discussion is required about:
- [x] Should the N-D output case be implemented from the outset
- [x] If not, should the future addition of the N-D output case be considered when designing the interface.
- [x] Other thoughts on the signature which includes the `dims` information for the N-D output case.

**Resolution**: Since no one has offered a request for N-D output support, I think is fine to restrict this to sparse matrix generation. Should a request for N-D support come later, an overload accepting the additional `dims` could be added.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78439
Approved by: https://github.com/nikitaved, https://github.com/cpuhrsch, https://github.com/pearu
2022-07-01 01:11:54 +00:00
PyTorch MergeBot
56e3bc5215 Revert "Add spdiags sparse matrix initialization (#78439)"
This reverts commit cfb2034b65.

Reverted https://github.com/pytorch/pytorch/pull/78439 on behalf of https://github.com/suo due to broke windows builds, see: cfb2034b65
2022-06-30 21:04:36 +00:00
Andrew M. James
cfb2034b65 Add spdiags sparse matrix initialization (#78439)
Similar to [scipy.sparse.spdiags](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.spdiags.html#scipy-sparse-spdiags)

Part of #70926

In other functions (ie (torch.diagonal)[https://pytorch.org/docs/stable/generated/torch.diagonal.html#torch.diagonal]) diagonals of a tensor are referenced using the offset and the two dimensions that the diagonal is taken with respect to.

Here the reference implementation from scipy is only considering matrix output, so even if we only support 2-d output at first. It may be useful to consider how the dimensions corresponding to each diagonal would be specified for higher dimensional output.

The proposed torch signature implies that all offsets refer to the diagonals with respect to the only two dimensions of the output:

```
torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, int[] shape, Layout? layout=None) -> SparseTensor
```
 Above it is required that: `diagonals.ndimension() == 2`, `offsets.ndimensions() == 1`, `offsets.shape[0] == diagonals.shape[0]` and `len(shape) == 2`.

This would need to be altered for the case where `len(shape)` > 2. One options is:
```
torch.sparse.spdiags(Tensor[] diagonals, IntTensor[] offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor
```

Here `offsets` and `diagonals` becomes lists of tensors, and the `IntTensor dims` argument is introduced. This would require that `len(diagonals) == len(offsets) == dims.shape[0]`, `dims.ndimension() == 2` and `dims.shape[1] == 2` also the same restrictions as the 2d case above apply to the elements of `diagonals` and `offsets` pairwise (that is `diagonals[i].ndimension() == 2`, `offsets[i].ndimension() == 1` and `offsets[i].shape[0] == diagonals[i].shape[0]` for all i). This form of the signature would construct the sparse result by placing the values from `diagonals[i][j]` into the diagonal with offset `offset[i][j]` taken with respect to dimensions `dims[i]`. The specialization back to the original signature for the 2d case could be seen as allowing the single row of dims to default to `[0, 1]` when there is only one `diagonals`, `offsets` provided, and shape is `2-d`. This option allows the rows of an input element `diagonals[i]` to have a different length which may be appropriate as the max length of a diagonal along different dimension pairs will be different.

Another option is to specify the dimensions the diagonal is taken with respect to for each offset. This signature would look like:

```
torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor
```
Here, `diagonals` is still 2-D with dimension 0 matching the length of 1-D `offsets` and the tensor input `dims` is also 2-D with dimension 0 matching the length of 1-D `offsets` and the second dimension being fixed at `2` in this case the sparse result is constructed by placing the elements from `diagonals[i]` into the output diagonal `output.diagonal(offset[i], dim0=dims[i][0], dim1=dims[i][1])` (with some additional consideration that makes it more complicated than simply asigning to that view). The specialization from this back to the 2-D form could be seen as assuming `dims = [[0, 1], [0, 1]... len(offsets) times ]` when `len shape==2`.

In both proposed signatures for the N-D case the specialization back to the 2-D signature is a bit of a stretch for your typical default arguments logic, however I think the first is better choice as it offers more flexibility.

I think some discussion is required about:
- [x] Should the N-D output case be implemented from the outset
- [x] If not, should the future addition of the N-D output case be considered when designing the interface.
- [x] Other thoughts on the signature which includes the `dims` information for the N-D output case.

**Resolution**: Since no one has offered a request for N-D output support, I think is fine to restrict this to sparse matrix generation. Should a request for N-D support come later, an overload accepting the additional `dims` could be added.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78439
Approved by: https://github.com/nikitaved, https://github.com/cpuhrsch, https://github.com/pearu
2022-06-30 19:54:47 +00:00
Christian Puhrsch
5da776dd08 [Resubmission] fix mul_out CUDA config for COO tensors (#80254)
Fixes https://github.com/pytorch/pytorch/issues/79914

Duplicate of https://github.com/pytorch/pytorch/pull/79937 . I wasn't able to push changes to the existing PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80254
Approved by: https://github.com/eellison
2022-06-28 00:47:03 +00:00
Nikita Vedeneev
417677bf62 permute for COO sparse tensors (#79707)
As per title. Partial implementation of https://github.com/pytorch/pytorch/issues/78422.
We cannot satisfy the view semantics once operated over sparse dims.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79707
Approved by: https://github.com/cpuhrsch
2022-06-25 08:49:58 +00:00
Nikita Vedeneev
03cf01bdc0 index_select for COO CUDA tensors. (#77551)
Brings a native CUDA implementation for `index_select`. Master silently converts CUDA tensors to CPU for CUDA support.

Case `nnz >> size` could be optimized similar to how https://github.com/pytorch/pytorch/pull/72710 is doing that.

Some benchmarks:
<details>

<summary>PR/torch_sparse/master</summary>

```
[------------------------------- cuda coo.index_select -------------------------------]
                                                    |   PR   |  torch_sparse  |  master
32 threads: ---------------------------------------------------------------------------
      n=10000, nnz=100, index_len=100, dim=0        |    96  |       327      |     70
      n=10000, nnz=100, index_len=100, dim=1        |   120  |       505      |     74
      n=10000, nnz=100, index_len=1000, dim=0       |    90  |       333      |     93
      n=10000, nnz=100, index_len=1000, dim=1       |   120  |       499      |     98
      n=10000, nnz=100, index_len=10000, dim=0      |    92  |       331      |    350
      n=10000, nnz=100, index_len=10000, dim=1      |   100  |       506      |    352
      n=100000, nnz=1000, index_len=100, dim=0      |    53  |       274      |     60
      n=100000, nnz=1000, index_len=100, dim=1      |    90  |       368      |     71
      n=100000, nnz=1000, index_len=1000, dim=0     |    93  |       332      |    100
      n=100000, nnz=1000, index_len=1000, dim=1     |   130  |       501      |    140
      n=100000, nnz=1000, index_len=10000, dim=0    |   100  |       341      |    522
      n=100000, nnz=1000, index_len=10000, dim=1    |   130  |       530      |    549
      n=1000000, nnz=10000, index_len=100, dim=0    |    90  |       429      |    110
      n=1000000, nnz=10000, index_len=100, dim=1    |   296  |       810      |    355
      n=1000000, nnz=10000, index_len=1000, dim=0   |   100  |       435      |    170
      n=1000000, nnz=10000, index_len=1000, dim=1   |   309  |       830      |    548
      n=1000000, nnz=10000, index_len=10000, dim=0  |   110  |       446      |    750
      n=1000000, nnz=10000, index_len=10000, dim=1  |   310  |       830      |   1000
      n=10, nnz=100, index_len=100, dim=0           |    90  |       333      |     74
      n=10, nnz=100, index_len=100, dim=1           |   100  |       497      |     78
      n=10, nnz=100, index_len=1000, dim=0          |    90  |       329      |    140
      n=10, nnz=100, index_len=1000, dim=1          |   100  |       800      |    100
      n=10, nnz=100, index_len=10000, dim=0         |    93  |       340      |    900
      n=10, nnz=100, index_len=10000, dim=1         |   120  |       800      |    489
      n=10, nnz=1000, index_len=100, dim=0          |    90  |       321      |    140
      n=10, nnz=1000, index_len=100, dim=1          |   100  |       680      |    140
      n=10, nnz=1000, index_len=1000, dim=0         |   110  |       349      |    670
      n=10, nnz=1000, index_len=1000, dim=1         |   130  |       740      |    800
      n=10, nnz=1000, index_len=10000, dim=0        |   302  |       503      |   4882
      n=10, nnz=1000, index_len=10000, dim=1        |   325  |      2257      |   5262
      n=10, nnz=10000, index_len=100, dim=0         |   229  |       349      |    810
      n=10, nnz=10000, index_len=100, dim=1         |   433  |       870      |    700
      n=10, nnz=10000, index_len=1000, dim=0        |   666  |       502      |   5581
      n=10, nnz=10000, index_len=1000, dim=1        |   826  |      2379      |   4820
      n=10, nnz=10000, index_len=10000, dim=0       |  2534  |      2700      |  80000
      n=10, nnz=10000, index_len=10000, dim=1       |  2723  |     18540      |  80000
      n=100, nnz=1000, index_len=100, dim=0         |    94  |       324      |    110
      n=100, nnz=1000, index_len=100, dim=1         |   100  |       499      |    110
      n=100, nnz=1000, index_len=1000, dim=0        |    96  |       337      |    150
      n=100, nnz=1000, index_len=1000, dim=1        |   130  |       800      |    140
      n=100, nnz=1000, index_len=10000, dim=0       |   100  |       346      |    900
      n=100, nnz=1000, index_len=10000, dim=1       |   130  |       760      |    900
      n=100, nnz=10000, index_len=100, dim=0        |    90  |       323      |    190
      n=100, nnz=10000, index_len=100, dim=1        |   279  |       800      |    180
      n=100, nnz=10000, index_len=1000, dim=0       |   110  |       339      |    781
      n=100, nnz=10000, index_len=1000, dim=1       |   294  |       870      |    800
      n=100, nnz=10000, index_len=10000, dim=0      |   315  |       505      |   6264
      n=100, nnz=10000, index_len=10000, dim=1      |   497  |      2398      |   5404
      n=1000, nnz=10000, index_len=100, dim=0       |    90  |       333      |    160
      n=1000, nnz=10000, index_len=100, dim=1       |   279  |       635      |    150
      n=1000, nnz=10000, index_len=1000, dim=0      |   100  |       328      |    215
      n=1000, nnz=10000, index_len=1000, dim=1      |   287  |       810      |    207
      n=1000, nnz=10000, index_len=10000, dim=0     |   100  |       339      |    900
      n=1000, nnz=10000, index_len=10000, dim=1     |   291  |       880      |   1000
      n=1000, nnz=100000, index_len=100, dim=0      |    92  |       358      |    435
      n=1000, nnz=100000, index_len=100, dim=1      |   302  |       900      |    530
      n=1000, nnz=100000, index_len=1000, dim=0     |   130  |       360      |   1000
      n=1000, nnz=100000, index_len=1000, dim=1     |   329  |       930      |   1200
      n=1000, nnz=100000, index_len=10000, dim=0    |   343  |       530      |   7000
      n=1000, nnz=100000, index_len=10000, dim=1    |   545  |      2446      |   6100
      n=1000, nnz=1000000, index_len=100, dim=0     |   355  |       394      |   2210
      n=1000, nnz=1000000, index_len=100, dim=1     |  1660  |      2276      |   2674
      n=1000, nnz=1000000, index_len=1000, dim=0    |   877  |       574      |   6700
      n=1000, nnz=1000000, index_len=1000, dim=1    |  2449  |      3782      |   9000
      n=1000, nnz=1000000, index_len=10000, dim=0   |  3112  |      2931      |  57000
      n=1000, nnz=1000000, index_len=10000, dim=1   |  7340  |     20220      |  65700

Times are in microseconds (us).

```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77551
Approved by: https://github.com/cpuhrsch
2022-06-01 17:39:03 +00:00
Mike Ruberry
089203f8bc Updates floor_divide to perform floor division (#78411)
Fixes https://github.com/pytorch/pytorch/issues/43874

This PR changes floor_divide to perform floor division instead of truncation division.

This is a BC-breaking change, but it's a "bug fix," and we've already warned users for several releases this behavior would change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78411
Approved by: https://github.com/ngimel
2022-05-29 21:28:45 +00:00
Nikita Vedeneev
00a1fb64bb Faster index_select for sparse COO tensors on CPU. (#72710)
Fixes https://github.com/pytorch/pytorch/issues/72212.

This PR improves the previous algorithm in complexity. It also utilizes the structure of the problem and parallelizes computations when possible.

Benchmark results.

<details>

<summary>Testing script</summary>

```python
import torch
import math
from IPython import get_ipython
from itertools import product
import pickle
from torch.utils.benchmark import Timer, Compare

torch.manual_seed(13)
#torch.set_num_threads(1)
ipython = get_ipython()

index_sizes = (100, 1000, 10000)
# specifies (n, nnz)
problem_dims = (
    # n > nnz
    (10000, 100),
    (100000, 1000),
    (1000000, 10000),
    # n < nnz
    (10, 100),
    (10, 1000),
    (10, 10000),
    (100, 1000),
    (100, 10000),
    (1000, 10000),
    (1000, 100000),
    (1000, 1000000),
    #(1000000, 1000000000),
)

def f(t, d, index):
    s = torch_sparse.SparseTensor.from_torch_sparse_coo_tensor(t)
    ss = s.index_select(d, index)
    return ss.coo()

name = "PR"
results = []

for (n, nnz), m in product(problem_dims, index_sizes):
    for d in (0, 1):
        if nnz < n:
            shape = (n, n)
        else:
            shape = (n, nnz // n) if d == 0 else (nnz // n, n)
        nrows, ncols = shape
        rowidx = torch.randint(low=0, high=nrows, size=(nnz,))
        colidx = torch.randint(low=0, high=ncols, size=(nnz,))
        itemidx = torch.vstack((rowidx, colidx))
        xvalues = torch.randn(nnz)
        index = torch.randint(low=0, high=n, size=(m,))

        SparseX = torch.sparse_coo_tensor(itemidx, xvalues, size=shape).coalesce()
        smtp = "SparseX.index_select(d, index)"
        timer = Timer(smtp,
                      globals=globals(),
                      label="coo.index_select",
                      description=f"{name}: coo.index_select",
                      sub_label=f"n={n}, nnz={nnz}, index_len={m}, dim={d}",
                      num_threads=torch.get_num_threads())
        results.append(timer.blocked_autorange())

compare = Compare(results)
compare.trim_significant_figures()
compare.print()

with open(f"{name}_index_select.pickle", 'wb') as f:
    pickle.dump(results, f)

```

</details>

<details>

<summary>Gather results</summary>

```python
import pickle
from torch.utils.benchmark import Timer, Compare

files = [
        "PR",
        "torch_sparse",
        "master"
        ]

timers = []
for name in files:
    with open("{}_index_select.pickle".format(name), 'rb') as f:
        timers += pickle.load(f)

compare = Compare(timers)
compare.trim_significant_figures()
compare.print()

```

</details>

<details>

<summary>PR/torch_sparse/master runtime comparison</summary>

```
[----------------------------------- coo.index_select ----------------------------------]
                                                    |    PR   |  torch_sparse  |   master
32 threads: -----------------------------------------------------------------------------
      n=10000, nnz=100, index_len=100, dim=0        |     14  |        140     |       10
      n=10000, nnz=100, index_len=100, dim=1        |     14  |        200     |       10
      n=10000, nnz=100, index_len=1000, dim=0       |     30  |        180     |       38
      n=10000, nnz=100, index_len=1000, dim=1       |     34  |        240     |       38
      n=10000, nnz=100, index_len=10000, dim=0      |    278  |        460     |      330
      n=10000, nnz=100, index_len=10000, dim=1      |    275  |        516     |      330
      n=100000, nnz=1000, index_len=100, dim=0      |     16  |        290     |       31
      n=100000, nnz=1000, index_len=100, dim=1      |     26  |        390     |       31
      n=100000, nnz=1000, index_len=1000, dim=0     |     45  |        405     |      263
      n=100000, nnz=1000, index_len=1000, dim=1     |     73  |        500     |      261
      n=100000, nnz=1000, index_len=10000, dim=0    |    444  |        783     |     2570
      n=100000, nnz=1000, index_len=10000, dim=1    |    470  |        890     |     2590
      n=1000000, nnz=10000, index_len=100, dim=0    |     25  |       2400     |      270
      n=1000000, nnz=10000, index_len=100, dim=1    |    270  |       4000     |      269
      n=1000000, nnz=10000, index_len=1000, dim=0   |     74  |       2600     |     2620
      n=1000000, nnz=10000, index_len=1000, dim=1   |    464  |       3600     |     2640
      n=1000000, nnz=10000, index_len=10000, dim=0  |    635  |       3300     |    26400
      n=1000000, nnz=10000, index_len=10000, dim=1  |   1000  |       3960     |    26400
      n=10, nnz=100, index_len=100, dim=0           |     16  |        137     |       16
      n=10, nnz=100, index_len=100, dim=1           |     16  |        220     |       16
      n=10, nnz=100, index_len=1000, dim=0          |     63  |        238     |       81
      n=10, nnz=100, index_len=1000, dim=1          |     60  |        698     |       78
      n=10, nnz=100, index_len=10000, dim=0         |    480  |        940     |      862
      n=10, nnz=100, index_len=10000, dim=1         |    330  |       4930     |     1070
      n=10, nnz=1000, index_len=100, dim=0          |     60  |        200     |       73
      n=10, nnz=1000, index_len=100, dim=1          |     56  |        683     |       70
      n=10, nnz=1000, index_len=1000, dim=0         |    480  |        530     |     1050
      n=10, nnz=1000, index_len=1000, dim=1         |    330  |       4550     |     1368
      n=10, nnz=1000, index_len=10000, dim=0        |   3100  |       2900     |     9300
      n=10, nnz=1000, index_len=10000, dim=1        |   3400  |      46000     |     9100
      n=10, nnz=10000, index_len=100, dim=0         |    400  |        453     |      857
      n=10, nnz=10000, index_len=100, dim=1         |    400  |       4070     |     1730
      n=10, nnz=10000, index_len=1000, dim=0        |   2840  |       2600     |    13900
      n=10, nnz=10000, index_len=1000, dim=1        |   3700  |      40600     |    16000
      n=10, nnz=10000, index_len=10000, dim=0       |  83200  |      67400     |   160000
      n=10, nnz=10000, index_len=10000, dim=1       |  68000  |     528000     |   190000
      n=100, nnz=1000, index_len=100, dim=0         |     46  |        148     |       31
      n=100, nnz=1000, index_len=100, dim=1         |     45  |        242     |       37
      n=100, nnz=1000, index_len=1000, dim=0        |     68  |        248     |      240
      n=100, nnz=1000, index_len=1000, dim=1        |     66  |        755     |      290
      n=100, nnz=1000, index_len=10000, dim=0       |    370  |        802     |     2250
      n=100, nnz=1000, index_len=10000, dim=1       |    372  |       5430     |     2770
      n=100, nnz=10000, index_len=100, dim=0        |     82  |        210     |      224
      n=100, nnz=10000, index_len=100, dim=1        |     74  |        986     |      270
      n=100, nnz=10000, index_len=1000, dim=0       |    350  |        618     |     2600
      n=100, nnz=10000, index_len=1000, dim=1       |    370  |       4660     |     4560
      n=100, nnz=10000, index_len=10000, dim=0      |   3000  |       3400     |    41680
      n=100, nnz=10000, index_len=10000, dim=1      |   5000  |      47500     |    30400
      n=1000, nnz=10000, index_len=100, dim=0       |     71  |        160     |      185
      n=1000, nnz=10000, index_len=100, dim=1       |     64  |        516     |      190
      n=1000, nnz=10000, index_len=1000, dim=0      |    100  |        249     |     1740
      n=1000, nnz=10000, index_len=1000, dim=1      |     98  |       1030     |     1770
      n=1000, nnz=10000, index_len=10000, dim=0     |    600  |        808     |    18300
      n=1000, nnz=10000, index_len=10000, dim=1     |    663  |       5300     |    18500
      n=1000, nnz=100000, index_len=100, dim=0      |    160  |        258     |     1890
      n=1000, nnz=100000, index_len=100, dim=1      |    200  |       3620     |     2050
      n=1000, nnz=100000, index_len=1000, dim=0     |    500  |        580     |    18700
      n=1000, nnz=100000, index_len=1000, dim=1     |    640  |       7550     |    30000
      n=1000, nnz=100000, index_len=10000, dim=0    |   3400  |       3260     |   186000
      n=1000, nnz=100000, index_len=10000, dim=1    |   3600  |      49600     |   194000
      n=1000, nnz=1000000, index_len=100, dim=0     |    517  |        957     |    18700
      n=1000, nnz=1000000, index_len=100, dim=1     |    680  |      39600     |    37600
      n=1000, nnz=1000000, index_len=1000, dim=0    |   3600  |       4500     |   186000
      n=1000, nnz=1000000, index_len=1000, dim=1    |   5800  |      76400     |   190000
      n=1000, nnz=1000000, index_len=10000, dim=0   |  50000  |      67900     |  1800000
      n=1000, nnz=1000000, index_len=10000, dim=1   |  45000  |     570000     |  1900000

Times are in microseconds (us).

```

</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72710
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2022-05-10 16:33:13 +00:00
PyTorch MergeBot
8d67972b14 Revert "Faster index_select for sparse COO tensors on CPU. (#72710)"
This reverts commit ce3857e73c.

Reverted https://github.com/pytorch/pytorch/pull/72710 on behalf of https://github.com/malfet
2022-05-10 14:43:05 +00:00
Nikita Vedeneev
ce3857e73c Faster index_select for sparse COO tensors on CPU. (#72710)
Fixes https://github.com/pytorch/pytorch/issues/72212.

This PR improves the previous algorithm in complexity. It also utilizes the structure of the problem and parallelizes computations when possible.

Benchmark results.

<details>

<summary>Testing script</summary>

```python
import torch
import math
from IPython import get_ipython
from itertools import product
import pickle
from torch.utils.benchmark import Timer, Compare

torch.manual_seed(13)
#torch.set_num_threads(1)
ipython = get_ipython()

index_sizes = (100, 1000, 10000)
# specifies (n, nnz)
problem_dims = (
    # n > nnz
    (10000, 100),
    (100000, 1000),
    (1000000, 10000),
    # n < nnz
    (10, 100),
    (10, 1000),
    (10, 10000),
    (100, 1000),
    (100, 10000),
    (1000, 10000),
    (1000, 100000),
    (1000, 1000000),
    #(1000000, 1000000000),
)

def f(t, d, index):
    s = torch_sparse.SparseTensor.from_torch_sparse_coo_tensor(t)
    ss = s.index_select(d, index)
    return ss.coo()

name = "PR"
results = []

for (n, nnz), m in product(problem_dims, index_sizes):
    for d in (0, 1):
        if nnz < n:
            shape = (n, n)
        else:
            shape = (n, nnz // n) if d == 0 else (nnz // n, n)
        nrows, ncols = shape
        rowidx = torch.randint(low=0, high=nrows, size=(nnz,))
        colidx = torch.randint(low=0, high=ncols, size=(nnz,))
        itemidx = torch.vstack((rowidx, colidx))
        xvalues = torch.randn(nnz)
        index = torch.randint(low=0, high=n, size=(m,))

        SparseX = torch.sparse_coo_tensor(itemidx, xvalues, size=shape).coalesce()
        smtp = "SparseX.index_select(d, index)"
        timer = Timer(smtp,
                      globals=globals(),
                      label="coo.index_select",
                      description=f"{name}: coo.index_select",
                      sub_label=f"n={n}, nnz={nnz}, index_len={m}, dim={d}",
                      num_threads=torch.get_num_threads())
        results.append(timer.blocked_autorange())

compare = Compare(results)
compare.trim_significant_figures()
compare.print()

with open(f"{name}_index_select.pickle", 'wb') as f:
    pickle.dump(results, f)

```

</details>

<details>

<summary>Gather results</summary>

```python
import pickle
from torch.utils.benchmark import Timer, Compare

files = [
        "PR",
        "torch_sparse",
        "master"
        ]

timers = []
for name in files:
    with open("{}_index_select.pickle".format(name), 'rb') as f:
        timers += pickle.load(f)

compare = Compare(timers)
compare.trim_significant_figures()
compare.print()

```

</details>

<details>

<summary>PR/torch_sparse/master runtime comparison</summary>

```
[----------------------------------- coo.index_select ----------------------------------]
                                                    |    PR   |  torch_sparse  |   master
32 threads: -----------------------------------------------------------------------------
      n=10000, nnz=100, index_len=100, dim=0        |     14  |        140     |       10
      n=10000, nnz=100, index_len=100, dim=1        |     14  |        200     |       10
      n=10000, nnz=100, index_len=1000, dim=0       |     30  |        180     |       38
      n=10000, nnz=100, index_len=1000, dim=1       |     34  |        240     |       38
      n=10000, nnz=100, index_len=10000, dim=0      |    278  |        460     |      330
      n=10000, nnz=100, index_len=10000, dim=1      |    275  |        516     |      330
      n=100000, nnz=1000, index_len=100, dim=0      |     16  |        290     |       31
      n=100000, nnz=1000, index_len=100, dim=1      |     26  |        390     |       31
      n=100000, nnz=1000, index_len=1000, dim=0     |     45  |        405     |      263
      n=100000, nnz=1000, index_len=1000, dim=1     |     73  |        500     |      261
      n=100000, nnz=1000, index_len=10000, dim=0    |    444  |        783     |     2570
      n=100000, nnz=1000, index_len=10000, dim=1    |    470  |        890     |     2590
      n=1000000, nnz=10000, index_len=100, dim=0    |     25  |       2400     |      270
      n=1000000, nnz=10000, index_len=100, dim=1    |    270  |       4000     |      269
      n=1000000, nnz=10000, index_len=1000, dim=0   |     74  |       2600     |     2620
      n=1000000, nnz=10000, index_len=1000, dim=1   |    464  |       3600     |     2640
      n=1000000, nnz=10000, index_len=10000, dim=0  |    635  |       3300     |    26400
      n=1000000, nnz=10000, index_len=10000, dim=1  |   1000  |       3960     |    26400
      n=10, nnz=100, index_len=100, dim=0           |     16  |        137     |       16
      n=10, nnz=100, index_len=100, dim=1           |     16  |        220     |       16
      n=10, nnz=100, index_len=1000, dim=0          |     63  |        238     |       81
      n=10, nnz=100, index_len=1000, dim=1          |     60  |        698     |       78
      n=10, nnz=100, index_len=10000, dim=0         |    480  |        940     |      862
      n=10, nnz=100, index_len=10000, dim=1         |    330  |       4930     |     1070
      n=10, nnz=1000, index_len=100, dim=0          |     60  |        200     |       73
      n=10, nnz=1000, index_len=100, dim=1          |     56  |        683     |       70
      n=10, nnz=1000, index_len=1000, dim=0         |    480  |        530     |     1050
      n=10, nnz=1000, index_len=1000, dim=1         |    330  |       4550     |     1368
      n=10, nnz=1000, index_len=10000, dim=0        |   3100  |       2900     |     9300
      n=10, nnz=1000, index_len=10000, dim=1        |   3400  |      46000     |     9100
      n=10, nnz=10000, index_len=100, dim=0         |    400  |        453     |      857
      n=10, nnz=10000, index_len=100, dim=1         |    400  |       4070     |     1730
      n=10, nnz=10000, index_len=1000, dim=0        |   2840  |       2600     |    13900
      n=10, nnz=10000, index_len=1000, dim=1        |   3700  |      40600     |    16000
      n=10, nnz=10000, index_len=10000, dim=0       |  83200  |      67400     |   160000
      n=10, nnz=10000, index_len=10000, dim=1       |  68000  |     528000     |   190000
      n=100, nnz=1000, index_len=100, dim=0         |     46  |        148     |       31
      n=100, nnz=1000, index_len=100, dim=1         |     45  |        242     |       37
      n=100, nnz=1000, index_len=1000, dim=0        |     68  |        248     |      240
      n=100, nnz=1000, index_len=1000, dim=1        |     66  |        755     |      290
      n=100, nnz=1000, index_len=10000, dim=0       |    370  |        802     |     2250
      n=100, nnz=1000, index_len=10000, dim=1       |    372  |       5430     |     2770
      n=100, nnz=10000, index_len=100, dim=0        |     82  |        210     |      224
      n=100, nnz=10000, index_len=100, dim=1        |     74  |        986     |      270
      n=100, nnz=10000, index_len=1000, dim=0       |    350  |        618     |     2600
      n=100, nnz=10000, index_len=1000, dim=1       |    370  |       4660     |     4560
      n=100, nnz=10000, index_len=10000, dim=0      |   3000  |       3400     |    41680
      n=100, nnz=10000, index_len=10000, dim=1      |   5000  |      47500     |    30400
      n=1000, nnz=10000, index_len=100, dim=0       |     71  |        160     |      185
      n=1000, nnz=10000, index_len=100, dim=1       |     64  |        516     |      190
      n=1000, nnz=10000, index_len=1000, dim=0      |    100  |        249     |     1740
      n=1000, nnz=10000, index_len=1000, dim=1      |     98  |       1030     |     1770
      n=1000, nnz=10000, index_len=10000, dim=0     |    600  |        808     |    18300
      n=1000, nnz=10000, index_len=10000, dim=1     |    663  |       5300     |    18500
      n=1000, nnz=100000, index_len=100, dim=0      |    160  |        258     |     1890
      n=1000, nnz=100000, index_len=100, dim=1      |    200  |       3620     |     2050
      n=1000, nnz=100000, index_len=1000, dim=0     |    500  |        580     |    18700
      n=1000, nnz=100000, index_len=1000, dim=1     |    640  |       7550     |    30000
      n=1000, nnz=100000, index_len=10000, dim=0    |   3400  |       3260     |   186000
      n=1000, nnz=100000, index_len=10000, dim=1    |   3600  |      49600     |   194000
      n=1000, nnz=1000000, index_len=100, dim=0     |    517  |        957     |    18700
      n=1000, nnz=1000000, index_len=100, dim=1     |    680  |      39600     |    37600
      n=1000, nnz=1000000, index_len=1000, dim=0    |   3600  |       4500     |   186000
      n=1000, nnz=1000000, index_len=1000, dim=1    |   5800  |      76400     |   190000
      n=1000, nnz=1000000, index_len=10000, dim=0   |  50000  |      67900     |  1800000
      n=1000, nnz=1000000, index_len=10000, dim=1   |  45000  |     570000     |  1900000

Times are in microseconds (us).

```

</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72710
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2022-05-09 19:59:39 +00:00
Jane Xu
6d9dbd3391 Manually skip test_sparse_addmm as disable code is not working for now (#77076)
Related to https://github.com/pytorch/pytorch/issues/73145

It was previously skipped for Linux and Windows, but mac has become a problem as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77076
Approved by: https://github.com/ezyang
2022-05-09 13:54:29 +00:00
Mikayla Gawarecki
0adf070574 Use scatter_reduce to support masked reductions on sparse COO tensors (sum, prod, amin, amax)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75454

Approved by: https://github.com/cpuhrsch
2022-05-06 15:40:22 +00:00
PyTorch MergeBot
381e08309f Revert "Use scatter_reduce to support masked reductions on sparse COO tensors (sum, prod, amin, amax)"
This reverts commit fc2a2e8b72.

Reverted https://github.com/pytorch/pytorch/pull/75454 on behalf of https://github.com/b0noI
2022-05-04 22:31:31 +00:00
Mikayla Gawarecki
fc2a2e8b72 Use scatter_reduce to support masked reductions on sparse COO tensors (sum, prod, amin, amax)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75454

Approved by: https://github.com/cpuhrsch
2022-05-03 23:17:07 +00:00
arindamroy-eng
7478ce187a ROCM:Unskip more tests for ROCM5.0
Re-enabling more tests which are working on ROCM5.0

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75353
Approved by: https://github.com/ezyang
2022-04-19 19:45:55 +00:00
Pearu Peterson
a98b4666e0 Enable test_sparse_mask for Windows
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75189

Approved by: https://github.com/cpuhrsch
2022-04-11 17:21:29 +00:00
Brian Hirsh
1b7d7d9327 Reland: "free up dispatch key space (in C++)" (#74963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74963

This is a re-land of D35192346 (9872a06d77) and D35192317 (a9216cde6c), which together are a diff that changes the internal representation of `DispatchKeySet` in pytorch core to free up the number of dispatch keys that we have available. See a more detailed description of the design in the original PR: https://github.com/pytorch/pytorch/pull/69633.

The original PR broke Milan workflows, which use a pytorch mobile build, and manifested as a memory corruption bug inside of `liboacrmerged.so`.

**Background: Existing Mobile Optimization**
Pytorch mobile builds have an existing optimization (here cc23725e89/c10/core/DispatchKey.h (L382) and here cc23725e89/aten/src/ATen/core/dispatch/OperatorEntry.h (L214)), which works as follows:

Every operator in pytorch has a "dispatch table" of function pointers, corresponding to all of the (up to 64) different kernels that we might dispatch to when we run an operator in pytorch (autograd, cpu, cuda, complex number support, etc).

In mobile builds, the size of that table is shrunk from 64 to 8 to save a bunch of space, because mobile doesn't end up using the functionality associated with most dispatch keys.

The dispatcher also has a notion of "fallback kernels", which are kernels that you can register to a particular dispatch key, but should be able to work for "any operator". The array of fallback kernels is defined here: cc23725e89/aten/src/ATen/core/dispatch/Dispatcher.h (L294).

The mobile-optimization currently does **not** extend to this array (it wouldn't be that useful anyway because there is only one array of fallback kernels globally - vs. there is a separate dispatch table of function pointers per operator). So the per-operator tables on mobile are size 8, while the fallback table is size 64.

**The Bug**
This PR actually makes it difficult to enable that optimization separately for the per-operator arrays vs. the fallback array, and incidentally shrunk the size of the fallback array from 64 to 8 for mobile (that happened on this line: https://github.com/pytorch/pytorch/pull/69633/files#diff-f735cd7aa68f15b624100cbc4bb3b5ea76ffc7c9d3bec3b0ccabaa09609e5319R294).

That isn't a problem by itself (since mobile doesn't actually use any of the fallbacks that can no longer be stored). However, pytorch core will still register all of those fallback kernels on startup in mobile builds, even if they aren't used. When we tried to register one of those fallbacks on startup, it would try to dump the kernel somewhere in memory past the bounds of the (now smaller) array inside of the `Dispatcher` object, `backendFallbackKernels_`.

**Why didn't this problem show up in OSS CI? Why didn't it break other internal mobile workflows aside from Milan?**

Ideally, this failure would show up as part of the OSS signal on GitHub, since we already have mobile OSS builds. Given that it was another memory corruption issue that only affected Milan (subset of mobile), I'm not sure what's specific about Milan's builds that caused it only to manifest there. dreiss I wonder if there's another flavor of mobile builds we could run in OSS CI that could potentially help catch this?

**The debugging experience was pretty difficult**

Debugging the Milan-specific failure was made difficult by the following:

(1) lack of CI
- the original Milan failure didn't surface on my original diff, because the Milan job(s) that failed weren't triggered to run on pytorch changes. There's probably a balance to strike here, since those jobs will only be useful if they aren't flaky, and if they can produce reliable failure logs for debugging.

(2) It's difficult to get a repro.
- my work laptop doesn't have the right specs to run the Milan development workflow (not enough disk space)
- There is an existing OnDemand workflow for Milan, but it appears to be relatively new, and after a bunch of help from MarcioPorto, we ran into issues forwarding the log output from Milan tests on the emulator back to the terminal (see the original discussion here: https://fb.workplace.com/groups/OnDemandFRL/permalink/1424937774645433/)

(3) Lack of stack-traces.
- Most Milan failures didn't include actionable stack traces. phding generously helped me debug by running my suggested patches locally, and reporting back if there were any failures. The failing test didn't include a stack trace though (just the line where the crash appeared), so I ended up making some educated guesses about what the issue was based on the area of the crash.
ghstack-source-id: 152688542

Test Plan: Confirmed with phding that the broken Milan workflow from the previous version of this diff is now passing.

Reviewed By: phding, albanD

Differential Revision: D35222806

fbshipit-source-id: 0ad115a0f768bc8ea5d4c203b2990254c7092d30
(cherry picked from commit 002b91966f11fd55ab3fa3801b636fa39a6dd12c)
2022-03-31 21:52:38 +00:00
Nikita Shulga
bfac65dfe5
[testing] Update dispatch macros (#74977)
This PR is reland of #74289 
Co-authored-by: Khushi Agrawal <khushiagrawal411@gmail.com>
2022-03-30 14:13:21 -07:00