Commit Graph

328 Commits

Author SHA1 Message Date
Aaron Gokaslan
3d82d8d0ed [BE] Enable more flake8-comprehensions checks (#94601)
I applied some flake8 fixes and enabled checking for them in the linter. I also enabled some checks for my previous comprehensions PR.

This is a follow-up to #94323: it enables the flake8 checkers for the fixes I made there and fixes a few more of them.
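
For illustration, here is a hedged sketch of the kind of comprehension rewrites flake8-comprehensions flags (representative rules, not necessarily the exact ones touched by this PR):

```python
# C400: unnecessary generator passed to list()
squares = list(x * x for x in range(10))   # flagged
squares = [x * x for x in range(10)]       # preferred

# C419: unnecessary list comprehension inside any()/all()
found = any([x > 5 for x in range(10)])    # flagged
found = any(x > 5 for x in range(10))      # preferred
```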

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94601
Approved by: https://github.com/ezyang
2023-02-10 23:40:29 +00:00
Huy Do
c53bd0dd30 Mitigate broken test_coalesce_reference_cycle test on dynamo (#94622)
The test has been disabled and shows up in https://github.com/pytorch/test-infra/blob/generated-stats/stats/disabled-tests-condensed.json, but the JSON file downloaded by the runner doesn't seem to include it.

Disable it explicitly to keep trunk green while investigating.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94622
Approved by: https://github.com/weiwangmeta
2023-02-10 21:59:36 +00:00
PyTorch MergeBot
76ed1a81d1 Revert "COO intersection kernel: respect value intersection order (#92242)"
This reverts commit b07c839b70.

Reverted https://github.com/pytorch/pytorch/pull/92242 on behalf of https://github.com/jeanschmidt due to breaking vs17
2023-02-09 14:44:32 +00:00
Aleksandar Samardžić
e1f17b3530 Add CSR->BSC and CSC->BSR conversions (#93301)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93301
Approved by: https://github.com/cpuhrsch
2023-02-07 19:22:05 +00:00
Nikita Vedeneev
b07c839b70 COO intersection kernel: respect value intersection order (#92242)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92242
Approved by: https://github.com/cpuhrsch, https://github.com/amjames
2023-02-07 17:05:28 +00:00
Nikita Vedeneev
994f85d639 sparse_mask: extend lhs to sparse COO tensors (#92248)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92248
Approved by: https://github.com/cpuhrsch, https://github.com/pearu
2023-02-01 09:00:07 +00:00
Aleksandar Samardžić
53f7fb9a22 Add CSC->BSC conversion (#92307)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92307
Approved by: https://github.com/cpuhrsch
2023-01-30 17:03:36 +00:00
Pearu Peterson
65d6802e2f Improve error messages for sparse methods on tensors with unsupported backends/layouts. (#93149)
Fixes https://github.com/pytorch/pytorch/issues/92790

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93149
Approved by: https://github.com/cpuhrsch
2023-01-27 19:50:23 +00:00
Pearu Peterson
0e92bbe5b1 Add sparse COO tensor support to torch.sum(dim=..., keepdim=...) (#92979)
Fixes #92757, #86232
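
A minimal usage sketch of what this adds (the tensor contents are illustrative):

```python
import torch

i = torch.tensor([[0, 1, 1], [2, 0, 2]])
v = torch.tensor([3., 4., 5.])
s = torch.sparse_coo_tensor(i, v, (2, 3))

# dim=/keepdim= reductions now work on sparse COO inputs
out = torch.sum(s, dim=1, keepdim=True)  # shape (2, 1)
print(out)
```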

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92979
Approved by: https://github.com/cpuhrsch
2023-01-26 18:42:51 +00:00
Eddie Yan
0bf7506051 [CUDA] Drop CUDA < 11.0 test flags (#92605)
Follow-up of #89582 to drop flags like `CUDA11OrLater` in tests. Note that in some places `TEST_WITH_ROCM` appears to be _implicitly_ guarded against via the `CUDA11OrLater` version check (based on my best guess of how `torch.version.cuda` would behave in ROCm builds), so I've added `not TEST_WITH_ROCM` in cases where ROCm wasn't previously explicitly allowed.
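
A hedged sketch of the guard pattern described above (`test_something` and its placement are illustrative, not code from this PR):

```python
import unittest
import torch

TEST_WITH_ROCM = torch.version.hip is not None

# before: `CUDA11OrLater` implicitly skipped ROCm builds, since
# torch.version.cuda is None there; after: the ROCm guard is explicit
@unittest.skipIf(TEST_WITH_ROCM, "not supported on ROCm builds")
def test_something():
    ...
```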

CC @ptrblck @malfet @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92605
Approved by: https://github.com/ngimel
2023-01-24 04:34:06 +00:00
Nikita Vedeneev
9f381c9b7f sparse_sparse_matmul: simplify backward (#91712)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91712
Approved by: https://github.com/albanD
2023-01-23 19:24:28 +00:00
Yanbo Liang
0ab4ab9f8d [Dynamo] Fix calling UserDefinedObject.func should pass self object (#92050)
Fixes #90834

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92050
Approved by: https://github.com/jansel
2023-01-21 05:47:01 +00:00
Pearu Peterson
b3e4f5029b Add check-sparse-tensor-invariants flag to Context - 2nd try. (#92094)
This PR is a copy of https://github.com/pytorch/pytorch/pull/90849, whose merge was reverted.

The PR adds a "check sparse tensor invariants" flag to Context that, when enabled, triggers sparse tensor data invariant checks in the unsafe methods of constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to the UI:

The `torch.sparse.check_sparse_tensor_invariants` class provides different ways to enable/disable the invariant checking.

`torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden.

The PR fixes https://github.com/pytorch/pytorch/issues/90833
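
A minimal sketch of the two toggles described above (tensor contents are illustrative):

```python
import torch

i = torch.tensor([[0, 1], [2, 0]])
v = torch.tensor([1., 2.])

# per-call override via the new check_invariants argument
s = torch.sparse_coo_tensor(i, v, (2, 3), check_invariants=True)

# scoped enabling via the class, usable as a context manager
with torch.sparse.check_sparse_tensor_invariants():
    assert torch.sparse.check_sparse_tensor_invariants.is_enabled()
    t = torch.sparse_coo_tensor(i, v, (2, 3))
```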

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92094
Approved by: https://github.com/cpuhrsch
2023-01-13 14:50:33 +00:00
PyTorch MergeBot
c7a22bb7c7 Revert "Add check-sparse-tensor-invariants flag to Context. (#90849)"
This reverts commit b9a035c1c5.

Reverted https://github.com/pytorch/pytorch/pull/90849 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-01-12 09:58:16 +00:00
Aleksandar Samardžić
8612ec5b90 Implement hybrid sparse to/from dense conversions. (#90177)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90177
Approved by: https://github.com/cpuhrsch, https://github.com/pearu
2023-01-12 03:31:30 +00:00
min-jean-cho
af242eedfb [Inductor] Added aten.uniform_ decomp (#90869)
Fixes #90815

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90869
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/lezcano, https://github.com/ngimel, https://github.com/albanD
2023-01-11 23:23:42 +00:00
Pearu Peterson
b9a035c1c5 Add check-sparse-tensor-invariants flag to Context. (#90849)
This PR adds a "check sparse tensor invariants" flag to Context that, when enabled, triggers sparse tensor data invariant checks in the unsafe methods of constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to the UI:

- `torch.enable_check_sparse_tensor_invariants` and `torch.is_check_sparse_tensor_invariants_enabled` functions to globally enable/disable the invariant checks and to retrieve the state of the feature, respectively
- `torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden.

The PR also fixes https://github.com/pytorch/pytorch/issues/90833

# Main issue

*The following content is outdated after merging the PRs in this ghstack but kept for the record.*

The importance of this feature is that when the invariant checks are enabled by default, say, via

<details>

```
$ git diff
diff --git a/torch/__init__.py b/torch/__init__.py
index c8543057c7..19a91d0482 100644
--- a/torch/__init__.py
+++ b/torch/__init__.py
@@ -1239,3 +1239,8 @@ if 'TORCH_CUDA_SANITIZER' in os.environ:

 # Populate magic methods on SymInt and SymFloat
 import torch.fx.experimental.symbolic_shapes
+
+# temporarily enable sparse tensor arguments validation in unsafe
+# constructors:
+
+torch._C._set_check_sparse_tensor_invariants(True)
```

</details>

a massive number of test failures/errors occur in test_sparse_csr.py tests:
```
$ pytest -sv test/test_sparse_csr.py
<snip>
==== 4293 failed, 1557 passed, 237 skipped, 2744 errors in 69.71s (0:01:09) ====
```
This means that we are silently constructing sparse compressed tensors that do not satisfy the sparse tensor invariants. In particular, the following errors are raised:

```
AssertionError: "resize_as_sparse_compressed_tensor_: self and src must have the same layout" does not match "expected values to be a strided and contiguous tensor"

RuntimeError: CUDA error: device-side assert triggered

RuntimeError: `col_indices[..., crow_indices[..., i - 1]:crow_indices[..., i]] for all i = 1, ..., nrows are sorted and distinct along the last dimension values` is not satisfied.

RuntimeError: expected col_indices to be a strided and contiguous tensor

RuntimeError: expected row_indices to be a strided and contiguous tensor

RuntimeError: expected values to be a strided and contiguous tensor

RuntimeError: for_each: failed to synchronize: cudaErrorAssert: device-side assert triggered

RuntimeError: tensor dimensionality must be sum of batch, base, and dense dimensionalities (=0 + 2 + 0) but got 3
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90849
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2023-01-11 01:05:14 +00:00
anjali411
c887837ec3 Reland "Fix dynamo handling for tensor attributes: T, H, mT, mH (#90463)" (#91897)
This reverts commit 84266ae670.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91897
Approved by: https://github.com/ngimel
2023-01-10 08:16:07 +00:00
PyTorch MergeBot
84266ae670 Revert "Fix dynamo handling for tensor attributes: T, H, mT, mH (#90463)"
This reverts commit 9945a78a94.

Reverted https://github.com/pytorch/pytorch/pull/90463 on behalf of https://github.com/ZainRizvi due to This is causing test failures: FAILED inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_pinv_singular_cuda_float64 - RuntimeError: unexpected success linalg.pinv.singular, torch.float64, cuda
2023-01-09 16:43:36 +00:00
anjali411
9945a78a94 Fix dynamo handling for tensor attributes: T, H, mT, mH (#90463)
Fixes https://github.com/pytorch/pytorch/issues/88843

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90463
Approved by: https://github.com/ngimel
2023-01-09 04:11:23 +00:00
Nikita Vedeneev
7ef7c57ae7 CSC/BSC -> COO coalesce fix (#91440)
Fixes https://github.com/pytorch/pytorch/issues/91010.

CSC and BSC sparse formats are not inherently `coalesced`.
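
A hedged illustration (relying on the layout-aware `Tensor.to_sparse()` from #89502 below): converting CSC back to COO must not blindly mark the result coalesced, because CSC orders entries by column rather than the row-major order that COO's coalesced invariant requires.

```python
import torch

a = torch.tensor([[0., 1.],
                  [2., 0.]]).to_sparse_csc()
coo = a.to_sparse()          # CSC -> COO
# may legitimately be False until .coalesce() is called
print(coo.is_coalesced())
```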

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91440
Approved by: https://github.com/pearu, https://github.com/amjames, https://github.com/cpuhrsch
2023-01-03 18:42:39 +00:00
Pearu Peterson
b797a24259 Support indices contiguity per batch and non-contiguous values in sparse compressed tensors (#91243)
Fixes https://github.com/pytorch/pytorch/issues/91062

With this PR, all failures reported in https://github.com/pytorch/pytorch/pull/90849 are resolved (modulo test_bmm, which uses an unorthodox way to construct a batched CSR tensor).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91243
Approved by: https://github.com/nikitaved, https://github.com/amjames, https://github.com/lezcano
2023-01-02 18:08:46 +00:00
Nikita Vedeneev
1768a28a20 COO @ COO: fix to always produce coalesced outputs. (#91094)
Fixes [#90516](https://github.com/pytorch/pytorch/issues/90516)
Fixes [#90538](https://github.com/pytorch/pytorch/issues/90538)
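
A hedged sketch of the fixed behavior (the inputs are illustrative):

```python
import torch

a = torch.eye(3).to_sparse()
b = torch.eye(3).to_sparse()

c = torch.sparse.mm(a, b)    # COO @ COO
print(c.is_coalesced())      # expected True after this fix
```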

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91094
Approved by: https://github.com/pearu
2022-12-27 21:32:14 +00:00
Pearu Peterson
8004f934cd Fix CSR with int32 indices to CSC conversion (#91061)
Fixes https://github.com/pytorch/pytorch/issues/91007

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91061
Approved by: https://github.com/nikitaved
2022-12-18 13:53:25 +00:00
Pearu Peterson
01e7f46215 Ensure sorted indices from the CSR->BSR conversion (#90918)
Fixes https://github.com/pytorch/pytorch/issues/90910

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90918
Approved by: https://github.com/cpuhrsch
2022-12-16 15:49:48 +00:00
Edward Z. Yang
e686a442b4 If a torch.* returns non-Tensor, make this unimplemented rather than assert. (#89918)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89918
Approved by: https://github.com/albanD
2022-12-15 21:53:54 +00:00
Pearu Peterson
a60d712010 Support (non-batch) BSR/BSC to COO sparse tensor conversions (#90718)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90718
Approved by: https://github.com/cpuhrsch
2022-12-14 05:37:05 +00:00
Pearu Peterson
76c6dfeaa6 Add layout and blocksize arguments to Tensor.to_sparse method (#89502)
This PR extends the `Tensor.to_sparse()` method to `Tensor.to_sparse(layout=None, blocksize=None)` in a backward-compatible manner (`layout=None` means `layout=torch.sparse_coo`). A usage sketch follows the lists below.

In addition, the PR adds support for the following conversions:
- non-hybrid/hybrid COO tensors to CSR, CSC, or COO tensors
- CSR tensors with short, bool, byte, char, bfloat16, half, int, or long values to BSR tensors

and fixes the following conversions:
- hybrid COO to COO tensor
- non-batch/batch hybrid BSR to BSR or BSC tensor
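
A usage sketch of the extended signature (the values are illustrative):

```python
import torch

x = torch.tensor([[0., 1., 0., 0.],
                  [2., 0., 0., 3.]])

coo = x.to_sparse()                                   # BC: defaults to COO
csr = x.to_sparse(layout=torch.sparse_csr)
bsr = x.to_sparse(layout=torch.sparse_bsr, blocksize=(1, 2))
print(coo.layout, csr.layout, bsr.layout)
```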

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89502
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2022-11-30 20:21:10 +00:00
Pearu Peterson
296e1ba4d0 Row and column select support for block compressed sparse tensors (#88733)
As in the title:

- Support `select` and `select_copy` on block sparse compressed tensors
- Fixes incorrect results when selecting dense dimensions

The PR also improves the performance of indexing sparse compressed tensors considerably:

<details>

Before:

```python
In [3]: a=torch.rand((1000, 1000)).to_sparse_csr()

In [4]: %timeit a.select(0, 0)
606 µs ± 4.27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [5]: %timeit a.select(1, 0)
527 µs ± 57.7 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit a[0, 0]
617 µs ± 3.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [7]: a = a.cuda()

In [8]: %timeit a.select(0, 0); torch.cuda.synchronize();
1.19 ms ± 137 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [9]: %timeit a.select(1, 0); torch.cuda.synchronize();
1.2 ms ± 119 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [10]: %timeit a[0, 0]; torch.cuda.synchronize();
1.23 ms ± 482 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

This PR:

```python
In [3]: a=torch.rand((1000, 1000)).to_sparse_csr()

In [4]: %timeit a.select(0, 0)
4.75 µs ± 8.94 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [5]: %timeit a.select(1, 0)
565 µs ± 156 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit a[0, 0]
13.1 µs ± 435 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [7]: a = a.cuda()

In [8]: %timeit a.select(0, 0); torch.cuda.synchronize();
21.6 µs ± 23.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [9]: %timeit a.select(1, 0); torch.cuda.synchronize();
1.15 ms ± 3.13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [10]: %timeit a[0, 0]; torch.cuda.synchronize();
63.7 µs ± 2.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88733
Approved by: https://github.com/nikitaved, https://github.com/amjames, https://github.com/cpuhrsch
2022-11-30 11:15:56 +00:00
Pearu Peterson
90bed8874f Generator of tensor inputs with variable layout and structure (batch/non-batch, hybrid/non-hybrid, block/non-block) (#88914)
This PR introduces `TestCase.generate_simple_inputs` method that is an improved and generalized version of the `TestSparseCompressed._generate_small_inputs` method.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88914
Approved by: https://github.com/cpuhrsch
2022-11-30 02:13:33 +00:00
Kazuaki Ishizaki
088f2fa567 Fix typos in messages under test (#89121)
This PR fixes typos in messages in `.cpp` and `.py` files under the test directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89121
Approved by: https://github.com/mruberry, https://github.com/kit1980
2022-11-17 01:55:03 +00:00
Andrew M. James
ff6770a9a1 enable backward for log1p (sparse layouts) (#88155)
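
A hedged sketch of what this enables (driving backward through the sparse-aware sum is an illustrative choice, not code from this PR):

```python
import torch

x = torch.tensor([[0., 1.],
                  [2., 0.]]).to_sparse().requires_grad_()
y = torch.log1p(x)            # backward now supported for sparse layouts
torch.sparse.sum(y).backward()
print(x.grad)
```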
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88155
Approved by: https://github.com/cpuhrsch
2022-11-04 20:59:26 +00:00
jpvillam
1e1b045128 [ROCM] Enable Sparse Pickle Test (#82729)
Missing stream context for serialization

### Description
The ROCm stream context was missing on memory operations used for serialization.

### Testing
Ran the sparse pickle test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82729
Approved by: https://github.com/ngimel
2022-10-27 15:11:28 +00:00
Pearu Peterson
88b882cd1c Support sum on a sparse COO tensor. (#86300)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86300
Approved by: https://github.com/cpuhrsch
2022-10-06 18:39:28 +00:00
George Qi
686555b663 [maskedtensor] port torch/_masked into torch/masked (#85515)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85515
Approved by: https://github.com/cpuhrsch
2022-09-26 23:41:13 +00:00
Elias Ellison
bcc544e9d7 Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417
Approved by: https://github.com/ezyang
2022-09-26 17:08:14 +00:00
nikitaved
12ae3bea43 Faster mul(sparse, sparse) with broadcasting in dense dims. (#85336)
This is a combo PR of https://github.com/pytorch/pytorch/pull/84929 and ~https://github.com/pytorch/pytorch/pull/83428~.

Preliminary benchmarks (square matrices of shape (n, n)).

<details>

<summary>Script</summary>

```python
import torch
import math
from IPython import get_ipython
from itertools import product, repeat
import pickle
from torch.utils.benchmark import Timer, Compare

torch.manual_seed(13)

problem_dims = (
    # n > nnz
    (10000, 100),
    (100000, 1000),
    (1000000, 10000),
    # n < nnz
    (10, 100),
    (10, 1000),
    (10, 10000),
    (100, 1000),
    (100, 10000),
    (1000, 10000),
    (1000, 100000),
    (1000, 1000000),
    #(1000000, 1000000000),
)

name = "PR"
device = "cuda"
results = []

for n, nnz in problem_dims:
    def gen_tensor(coalesce=False):
        shape = (n, n)
        nrows, ncols = shape
        rowidx = torch.randint(low=0, high=nrows, size=(nnz,), device=device)
        colidx = torch.randint(low=0, high=ncols, size=(nnz,), device=device)
        itemidx = torch.vstack((rowidx, colidx))
        xvalues = torch.randn(nnz, device=device)
        itemidx = torch.hstack((itemidx, itemidx))
        xvalues = torch.hstack((xvalues, xvalues))
        res = torch.sparse_coo_tensor(itemidx, xvalues, size=shape)
        if coalesce:
            return res.coalesce()
        else:
            return res

    for x_coalesce, y_coalesce in product(*repeat((True, False), 2)):
        x = gen_tensor(x_coalesce)
        y = gen_tensor(y_coalesce)
        smtp = "x * y"
        timer = Timer(smtp,
                      globals=globals(),
                      label="coo.mul",
                      description=f"{name}: mul, device: {device}",
                      sub_label=f"n={n}, nnz={nnz}, coalesce=({x_coalesce, y_coalesce})",
                      num_threads=torch.get_num_threads())
        results.append(timer.blocked_autorange())

compare = Compare(results)
compare.trim_significant_figures()
compare.print()

with open(f"{name}_{device}_mul.pickle", 'wb') as f:
    pickle.dump(results, f)

```

</details>

<details>

<summary>Gather results</summary>

```python
import pickle
from torch.utils.benchmark import Timer, Compare

files = [
        "PR",
        "master"
        ]

device = 'cuda'

timers = []
for name in files:
    with open("{}_{}_mul.pickle".format(name, device), 'rb') as f:
        timers += pickle.load(f)

compare = Compare(timers)
compare.trim_significant_figures()
compare.print()

```

</details>

<details>

<summary>CUDA</summary>

```
[------------------------------------------------- coo.mul -------------------------------------------------]
                                                       |  PR: mul, device: cuda  |  master: mul, device: cuda
24 threads: -------------------------------------------------------------------------------------------------
      n=10000, nnz=100, coalesce=((True, True))        |             95          |                91
      n=10000, nnz=100, coalesce=((True, False))       |             87          |               242
      n=10000, nnz=100, coalesce=((False, True))       |             87          |               226
      n=10000, nnz=100, coalesce=((False, False))      |            130          |               371
      n=100000, nnz=1000, coalesce=((True, True))      |            100          |               521
      n=100000, nnz=1000, coalesce=((True, False))     |             90          |               649
      n=100000, nnz=1000, coalesce=((False, True))     |            100          |               659
      n=100000, nnz=1000, coalesce=((False, False))    |            200          |               781
      n=1000000, nnz=10000, coalesce=((True, True))    |            100          |              4861
      n=1000000, nnz=10000, coalesce=((True, False))   |            100          |              5012
      n=1000000, nnz=10000, coalesce=((False, True))   |             98          |              5010
      n=1000000, nnz=10000, coalesce=((False, False))  |            384          |              5174
      n=10, nnz=100, coalesce=((True, True))           |            100          |                79
      n=10, nnz=100, coalesce=((True, False))          |            100          |               221
      n=10, nnz=100, coalesce=((False, True))          |            100          |               221
      n=10, nnz=100, coalesce=((False, False))         |            100          |               350
      n=10, nnz=1000, coalesce=((True, True))          |            100          |               100
      n=10, nnz=1000, coalesce=((True, False))         |            100          |               240
      n=10, nnz=1000, coalesce=((False, True))         |            100          |               254
      n=10, nnz=1000, coalesce=((False, False))        |            100          |               392
      n=10, nnz=10000, coalesce=((True, True))         |            100          |               110
      n=10, nnz=10000, coalesce=((True, False))        |            110          |               286
      n=10, nnz=10000, coalesce=((False, True))        |            110          |               286
      n=10, nnz=10000, coalesce=((False, False))       |            271          |               455
      n=100, nnz=1000, coalesce=((True, True))         |            110          |               851
      n=100, nnz=1000, coalesce=((True, False))        |            110          |              1000
      n=100, nnz=1000, coalesce=((False, True))        |            110          |               990
      n=100, nnz=1000, coalesce=((False, False))       |            140          |              1124
      n=100, nnz=10000, coalesce=((True, True))        |            110          |              5137
      n=100, nnz=10000, coalesce=((True, False))       |            110          |              5391
      n=100, nnz=10000, coalesce=((False, True))       |            100          |              5405
      n=100, nnz=10000, coalesce=((False, False))      |            249          |              5539
      n=1000, nnz=10000, coalesce=((True, True))       |            100          |              8598
      n=1000, nnz=10000, coalesce=((True, False))      |            100          |              8800
      n=1000, nnz=10000, coalesce=((False, True))      |            100          |              8782
      n=1000, nnz=10000, coalesce=((False, False))     |            255          |              8956
      n=1000, nnz=100000, coalesce=((True, True))      |            120          |             84500
      n=1000, nnz=100000, coalesce=((True, False))     |            200          |             88560
      n=1000, nnz=100000, coalesce=((False, True))     |            160          |             89000
      n=1000, nnz=100000, coalesce=((False, False))    |            373          |             89000
      n=1000, nnz=1000000, coalesce=((True, True))     |            312          |            606400
      n=1000, nnz=1000000, coalesce=((True, False))    |           1340          |            609200
      n=1000, nnz=1000000, coalesce=((False, True))    |           1340          |            609100
      n=1000, nnz=1000000, coalesce=((False, False))   |           4408          |            611400

Times are in microseconds (us).
```

</details>

<details>

<summary>CPU</summary>

```
[------------------------------------------------ coo.mul ------------------------------------------------]
                                                       |  PR: mul, device: cpu  |  master: mul, device: cpu
24 threads: -----------------------------------------------------------------------------------------------
      n=10000, nnz=100, coalesce=((True, True))        |              8         |                8
      n=10000, nnz=100, coalesce=((True, False))       |             32         |               34
      n=10000, nnz=100, coalesce=((False, True))       |             32         |               34
      n=10000, nnz=100, coalesce=((False, False))      |             41         |               56
      n=100000, nnz=1000, coalesce=((True, True))      |             24         |               24
      n=100000, nnz=1000, coalesce=((True, False))     |             90         |              100
      n=100000, nnz=1000, coalesce=((False, True))     |             87         |              100
      n=100000, nnz=1000, coalesce=((False, False))    |            231         |              255
      n=1000000, nnz=10000, coalesce=((True, True))    |            190         |              200
      n=1000000, nnz=10000, coalesce=((True, False))   |            908         |             2023
      n=1000000, nnz=10000, coalesce=((False, True))   |            800         |             2036
      n=1000000, nnz=10000, coalesce=((False, False))  |           3684         |             3989
      n=10, nnz=100, coalesce=((True, True))           |              8         |                7
      n=10, nnz=100, coalesce=((True, False))          |             34         |               30
      n=10, nnz=100, coalesce=((False, True))          |             33         |               30
      n=10, nnz=100, coalesce=((False, False))         |             44         |               50
      n=10, nnz=1000, coalesce=((True, True))          |              8         |                7
      n=10, nnz=1000, coalesce=((True, False))         |            100         |              100
      n=10, nnz=1000, coalesce=((False, True))         |            130         |              100
      n=10, nnz=1000, coalesce=((False, False))        |            746         |              210
      n=10, nnz=10000, coalesce=((True, True))         |              8         |                7
      n=10, nnz=10000, coalesce=((True, False))        |           1000         |             1500
      n=10, nnz=10000, coalesce=((False, True))        |           1000         |             1510
      n=10, nnz=10000, coalesce=((False, False))       |           3063         |             2457
      n=100, nnz=1000, coalesce=((True, True))         |             25         |               25
      n=100, nnz=1000, coalesce=((True, False))        |            180         |              130
      n=100, nnz=1000, coalesce=((False, True))        |            200         |              130
      n=100, nnz=1000, coalesce=((False, False))       |            271         |              255
      n=100, nnz=10000, coalesce=((True, True))        |            100         |              100
      n=100, nnz=10000, coalesce=((True, False))       |           2444         |             2290
      n=100, nnz=10000, coalesce=((False, True))       |           2455         |             2357
      n=100, nnz=10000, coalesce=((False, False))      |           5316         |             3783
      n=1000, nnz=10000, coalesce=((True, True))       |            204         |              211
      n=1000, nnz=10000, coalesce=((True, False))      |           2457         |             2480
      n=1000, nnz=10000, coalesce=((False, True))      |           2448         |             2539
      n=1000, nnz=10000, coalesce=((False, False))     |           3665         |             4801
      n=1000, nnz=100000, coalesce=((True, True))      |           2293         |             2374
      n=1000, nnz=100000, coalesce=((True, False))     |           9000         |            24620
      n=1000, nnz=100000, coalesce=((False, True))     |           8000         |            25080
      n=1000, nnz=100000, coalesce=((False, False))    |          26500         |            47650
      n=1000, nnz=1000000, coalesce=((True, True))     |          10000         |            13000
      n=1000, nnz=1000000, coalesce=((True, False))    |          80000         |           362200
      n=1000, nnz=1000000, coalesce=((False, True))    |          78050         |           392600
      n=1000, nnz=1000000, coalesce=((False, False))   |         312100         |           766900

Times are in microseconds (us).
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85336
Approved by: https://github.com/cpuhrsch
2022-09-23 23:31:19 +00:00
PyTorch MergeBot
d10de31cc8 Revert "Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)"
This reverts commit 78afa0cf0c.

Reverted https://github.com/pytorch/pytorch/pull/85417 on behalf of https://github.com/clee2000 due to broke tests on trunk 78afa0cf0c
2022-09-23 17:21:43 +00:00
Elias Ellison
78afa0cf0c Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417
Approved by: https://github.com/ezyang
2022-09-23 15:50:03 +00:00
PyTorch MergeBot
5043457a8e Revert "Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)"
This reverts commit 9c77083965.

Reverted https://github.com/pytorch/pytorch/pull/85417 on behalf of https://github.com/clee2000 due to broke tests on trunk (and pull somehow) 9c77083965
2022-09-22 15:44:38 +00:00
Elias Ellison
9c77083965 Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417
Approved by: https://github.com/ezyang
2022-09-22 13:03:57 +00:00
Elias Ellison
d9aa6dfe88 Add Fake Cross Ref Mode, migrate sparse to it (#85382)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85382
Approved by: https://github.com/ezyang
2022-09-21 17:15:47 +00:00
PyTorch MergeBot
81620c3360 Revert "Faster mul(sparse, sparse) with broadcasting in dense dims. (#83428)"
This reverts commit d49943bda8.

Reverted https://github.com/pytorch/pytorch/pull/83428 on behalf of https://github.com/osalpekar due to the `__restrict` symbol not being supported by certain MSVC compilers, leading to an undefined symbol error at compilation time
2022-09-17 06:53:11 +00:00
nikitaved
d49943bda8 Faster mul(sparse, sparse) with broadcasting in dense dims. (#83428)
Preliminary benchmarks (square matrices of shape (n, n)).

<details>

<summary>Script</summary>

```python
import torch
import math
from IPython import get_ipython
from itertools import product, repeat
import pickle
from torch.utils.benchmark import Timer, Compare

torch.manual_seed(13)

# specifies (n, nnz)
problem_dims = (
    # n > nnz
    (10000, 100),
    (100000, 1000),
    (1000000, 10000),
    # n < nnz
    (10, 100),
    (10, 1000),
    (10, 10000),
    (100, 1000),
    (100, 10000),
    (1000, 10000),
    (1000, 100000),
    (1000, 1000000),
    #(1000000, 1000000000),
)

name = "PR"
device = "cuda"
results = []

for n, nnz in problem_dims:
    def gen_tensor(coalesce=False):
        shape = (n, n)
        nrows, ncols = shape
        rowidx = torch.randint(low=0, high=nrows, size=(nnz,), device=device)
        colidx = torch.randint(low=0, high=ncols, size=(nnz,), device=device)
        itemidx = torch.vstack((rowidx, colidx))
        xvalues = torch.randn(nnz, device=device)
        itemidx = torch.hstack((itemidx, itemidx))
        xvalues = torch.hstack((xvalues, xvalues))
        res = torch.sparse_coo_tensor(itemidx, xvalues, size=shape)
        if coalesce:
            return res.coalesce()
        else:
            return res

    for x_coalesce, y_coalesce in product(*repeat((True, False), 2)):
        x = gen_tensor(x_coalesce)
        y = gen_tensor(y_coalesce)
        smtp = "x * y"
        timer = Timer(smtp,
                      globals=globals(),
                      label="coo.mul",
                      description=f"{name}: mul, device: {device}",
                      sub_label=f"n={n}, nnz={nnz}, coalesce=({x_coalesce, y_coalesce})",
                      num_threads=torch.get_num_threads())
        results.append(timer.blocked_autorange())

compare = Compare(results)
compare.trim_significant_figures()
compare.print()

with open(f"{name}_{device}_mul.pickle", 'wb') as f:
    pickle.dump(results, f)

```

</details>

<details>

<summary>Gather results</summary>

```python
import pickle
from torch.utils.benchmark import Timer, Compare

files = [
        "PR",
        "master"
        ]

device = 'cuda'

timers = []
for name in files:
    with open("{}_{}_mul.pickle".format(name, device), 'rb') as f:
        timers += pickle.load(f)

compare = Compare(timers)
compare.trim_significant_figures()
compare.print()

```

</details>

<details>

<summary>CUDA</summary>

```
[------------------------------------------------- coo.mul -------------------------------------------------]
                                                       |  PR: mul, device: cuda  |  master: mul, device: cuda
24 threads: -------------------------------------------------------------------------------------------------
      n=10000, nnz=100, coalesce=((True, True))        |             95          |                91
      n=10000, nnz=100, coalesce=((True, False))       |             87          |               242
      n=10000, nnz=100, coalesce=((False, True))       |             87          |               226
      n=10000, nnz=100, coalesce=((False, False))      |            130          |               371
      n=100000, nnz=1000, coalesce=((True, True))      |            100          |               521
      n=100000, nnz=1000, coalesce=((True, False))     |             90          |               649
      n=100000, nnz=1000, coalesce=((False, True))     |            100          |               659
      n=100000, nnz=1000, coalesce=((False, False))    |            200          |               781
      n=1000000, nnz=10000, coalesce=((True, True))    |            100          |              4861
      n=1000000, nnz=10000, coalesce=((True, False))   |            100          |              5012
      n=1000000, nnz=10000, coalesce=((False, True))   |             98          |              5010
      n=1000000, nnz=10000, coalesce=((False, False))  |            384          |              5174
      n=10, nnz=100, coalesce=((True, True))           |            100          |                79
      n=10, nnz=100, coalesce=((True, False))          |            100          |               221
      n=10, nnz=100, coalesce=((False, True))          |            100          |               221
      n=10, nnz=100, coalesce=((False, False))         |            100          |               350
      n=10, nnz=1000, coalesce=((True, True))          |            100          |               100
      n=10, nnz=1000, coalesce=((True, False))         |            100          |               240
      n=10, nnz=1000, coalesce=((False, True))         |            100          |               254
      n=10, nnz=1000, coalesce=((False, False))        |            100          |               392
      n=10, nnz=10000, coalesce=((True, True))         |            100          |               110
      n=10, nnz=10000, coalesce=((True, False))        |            110          |               286
      n=10, nnz=10000, coalesce=((False, True))        |            110          |               286
      n=10, nnz=10000, coalesce=((False, False))       |            271          |               455
      n=100, nnz=1000, coalesce=((True, True))         |            110          |               851
      n=100, nnz=1000, coalesce=((True, False))        |            110          |              1000
      n=100, nnz=1000, coalesce=((False, True))        |            110          |               990
      n=100, nnz=1000, coalesce=((False, False))       |            140          |              1124
      n=100, nnz=10000, coalesce=((True, True))        |            110          |              5137
      n=100, nnz=10000, coalesce=((True, False))       |            110          |              5391
      n=100, nnz=10000, coalesce=((False, True))       |            100          |              5405
      n=100, nnz=10000, coalesce=((False, False))      |            249          |              5539
      n=1000, nnz=10000, coalesce=((True, True))       |            100          |              8598
      n=1000, nnz=10000, coalesce=((True, False))      |            100          |              8800
      n=1000, nnz=10000, coalesce=((False, True))      |            100          |              8782
      n=1000, nnz=10000, coalesce=((False, False))     |            255          |              8956
      n=1000, nnz=100000, coalesce=((True, True))      |            120          |             84500
      n=1000, nnz=100000, coalesce=((True, False))     |            200          |             88560
      n=1000, nnz=100000, coalesce=((False, True))     |            160          |             89000
      n=1000, nnz=100000, coalesce=((False, False))    |            373          |             89000
      n=1000, nnz=1000000, coalesce=((True, True))     |            312          |            606400
      n=1000, nnz=1000000, coalesce=((True, False))    |           1340          |            609200
      n=1000, nnz=1000000, coalesce=((False, True))    |           1340          |            609100
      n=1000, nnz=1000000, coalesce=((False, False))   |           4408          |            611400

Times are in microseconds (us).
```

</details>

<details>

<summary>CPU</summary>

```
[------------------------------------------------ coo.mul ------------------------------------------------]
                                                       |  PR: mul, device: cpu  |  master: mul, device: cpu
24 threads: -----------------------------------------------------------------------------------------------
      n=10000, nnz=100, coalesce=((True, True))        |              8         |                8
      n=10000, nnz=100, coalesce=((True, False))       |             32         |               34
      n=10000, nnz=100, coalesce=((False, True))       |             32         |               34
      n=10000, nnz=100, coalesce=((False, False))      |             41         |               56
      n=100000, nnz=1000, coalesce=((True, True))      |             24         |               24
      n=100000, nnz=1000, coalesce=((True, False))     |             90         |              100
      n=100000, nnz=1000, coalesce=((False, True))     |             87         |              100
      n=100000, nnz=1000, coalesce=((False, False))    |            231         |              255
      n=1000000, nnz=10000, coalesce=((True, True))    |            190         |              200
      n=1000000, nnz=10000, coalesce=((True, False))   |            908         |             2023
      n=1000000, nnz=10000, coalesce=((False, True))   |            800         |             2036
      n=1000000, nnz=10000, coalesce=((False, False))  |           3684         |             3989
      n=10, nnz=100, coalesce=((True, True))           |              8         |                7
      n=10, nnz=100, coalesce=((True, False))          |             34         |               30
      n=10, nnz=100, coalesce=((False, True))          |             33         |               30
      n=10, nnz=100, coalesce=((False, False))         |             44         |               50
      n=10, nnz=1000, coalesce=((True, True))          |              8         |                7
      n=10, nnz=1000, coalesce=((True, False))         |            100         |              100
      n=10, nnz=1000, coalesce=((False, True))         |            130         |              100
      n=10, nnz=1000, coalesce=((False, False))        |            746         |              210
      n=10, nnz=10000, coalesce=((True, True))         |              8         |                7
      n=10, nnz=10000, coalesce=((True, False))        |           1000         |             1500
      n=10, nnz=10000, coalesce=((False, True))        |           1000         |             1510
      n=10, nnz=10000, coalesce=((False, False))       |           3063         |             2457
      n=100, nnz=1000, coalesce=((True, True))         |             25         |               25
      n=100, nnz=1000, coalesce=((True, False))        |            180         |              130
      n=100, nnz=1000, coalesce=((False, True))        |            200         |              130
      n=100, nnz=1000, coalesce=((False, False))       |            271         |              255
      n=100, nnz=10000, coalesce=((True, True))        |            100         |              100
      n=100, nnz=10000, coalesce=((True, False))       |           2444         |             2290
      n=100, nnz=10000, coalesce=((False, True))       |           2455         |             2357
      n=100, nnz=10000, coalesce=((False, False))      |           5316         |             3783
      n=1000, nnz=10000, coalesce=((True, True))       |            204         |              211
      n=1000, nnz=10000, coalesce=((True, False))      |           2457         |             2480
      n=1000, nnz=10000, coalesce=((False, True))      |           2448         |             2539
      n=1000, nnz=10000, coalesce=((False, False))     |           3665         |             4801
      n=1000, nnz=100000, coalesce=((True, True))      |           2293         |             2374
      n=1000, nnz=100000, coalesce=((True, False))     |           9000         |            24620
      n=1000, nnz=100000, coalesce=((False, True))     |           8000         |            25080
      n=1000, nnz=100000, coalesce=((False, False))    |          26500         |            47650
      n=1000, nnz=1000000, coalesce=((True, True))     |          10000         |            13000
      n=1000, nnz=1000000, coalesce=((True, False))    |          80000         |           362200
      n=1000, nnz=1000000, coalesce=((False, True))    |          78050         |           392600
      n=1000, nnz=1000000, coalesce=((False, False))   |         312100         |           766900

Times are in microseconds (us).
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83428
Approved by: https://github.com/cpuhrsch
2022-09-16 00:28:40 +00:00
Edward Z. Yang
c5a8946e40 Revert "Revert "Redo how custom/python_custom methods on TensorImpl work (#84796)" (#84806)
This reverts commit ca3b2bfbe3.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84806
Approved by: https://github.com/Chillee
2022-09-10 06:17:35 +00:00
Eli Uriegas
ca3b2bfbe3 Revert "Redo how custom/python_custom methods on TensorImpl work (#84796)
This reverts commit 591b75bf98.

Manual revert of https://github.com/pytorch/pytorch/pull/84641

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84796
Approved by: https://github.com/izaitsevfb
2022-09-10 00:18:13 +00:00
Edward Z. Yang
591b75bf98 Redo how custom/python_custom methods on TensorImpl work (#84641)
A longstanding confusion in the implementation of fake tensor and proxy tensor is what to do about torch.ops.aten.sym_sizes and related calls. In particular, when you have a tensor that (1) has symbolic shapes and (2) has a `__torch_dispatch__` implementation, you would previously always get `__torch_dispatch__` calls for sizes/strides queries, *even if you didn't request them* via the dispatch kwargs in `make_wrapper_subclass`.

The reason is that we were previously mixing several concepts: "I want to dispatch to Python", "I want to call a virtual method", and "I have dynamic shapes". A single boolean variable controlled all of these things, so it was not possible to understand inside TensorImpl what the user had actually requested.

In this PR, we track each of these concepts individually so that we can preserve user intent. Then we combine them into a single "policy" variable that controls whether or not we can use the fastpath. For the policy to trigger, we only need one of the exceptional cases to be true. (A small illustrative sketch follows the billing of changes below.)

Billing of changes:
* Rename `set_sizes_strides_policy` to `set_custom_sizes_strides`; in general, you cannot DIRECTLY set policy; you have to set it indirectly via the public functions.
* Some helpers for sizes and strides, since it's more complicated (as it is an enum, rather than just bools as is the case for device and layout). `matches_python_custom` is used to test the Python dispatch user ask. `matches_policy` does the policy test (only used in the user facing functions.)
* I reorged the accessor methods so that they are more logical. This makes the diff bad, so I recommend reading the final code directly.
* The default custom implementations now more reliably call their default() implementations
* As bonus refactor, I devirtualized some functions that don't need to be virtual
* `set_sym_sizes_and_strides` is renamed to `set_sizes_and_strides` to make it easier to use in template contexts; it optionally takes a storage offset now so you can set all three values at the same time. If you use the SymInt overload but there are no symbolic integers, we give you a normal resize.
* This adds `sym_storage_offset` since we had that in the symbolic shapes branch and there's no reason not to put it in (and it reduces merge conflicts)
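
Referenced from the description above, a hypothetical Python-side sketch (the subclass and the kwarg usage assume the present-day `_make_wrapper_subclass` API): only an explicitly requested sizes/strides policy opts size queries out of the C++ fastpath.

```python
import torch

class SizesDispatchTensor(torch.Tensor):
    @staticmethod
    def __new__(cls, elem):
        # requesting the "sizes" policy is what routes size/stride
        # queries to __torch_dispatch__; omit it and they stay on
        # the C++ fastpath
        return torch.Tensor._make_wrapper_subclass(
            cls, elem.shape, dtype=elem.dtype, device=elem.device,
            dispatch_sizes_strides_policy="sizes",
        )

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        print(f"dispatched: {func}")
        raise NotImplementedError(f"{func} not handled in this sketch")
```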

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84641
Approved by: https://github.com/wconstab
2022-09-09 13:41:13 +00:00
Elias Ellison
15c5baf878 Throw on data dependent ops (#83567)
Previously, we would trace through the following with no error:
```
from torch.fx.experimental.proxy_tensor import make_fx
import torch

def f(x, y):
    return x[0, y:]
```

This traced successfully even though the output shape depends on the data of `y`. Now we throw on the conversion of `y` to an integer.
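
A hedged repro sketch, repeating the function above for self-containment (assuming `make_fx`'s fake tracing mode; the exact exception type may differ):

```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx

def f(x, y):
    return x[0, y:]

try:
    make_fx(f, tracing_mode="fake")(torch.randn(3, 4), torch.tensor(1))
except Exception as e:
    # converting the traced `y` to a Python int for the slice is
    # data-dependent and now raises instead of tracing through
    print(type(e).__name__, e)
```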

It would be nice not to break on constant tensors, but I'll do that in the next PR (Edit: done with https://github.com/pytorch/pytorch/pull/84387). Sketching out how that would work (and keep in mind this is applicable to Dynamo tracing, not just AOT Autograd):

I think to do that you would need to:
- hold strong refs to a set of constant tensors, and only allow them to be captured from `lift_fresh.copy`
- when you run a mutable op, either remove it from the set of constant tensors or run the operator for real
- limit to small constant tensors

Anything else?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83567
Approved by: https://github.com/ezyang
2022-09-07 02:37:00 +00:00
Andrew M. James
6dc9223c8b Sparse_coo: Be more aggressive in setting coalesced True to avoid surprising behaviors (#82426)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82426
Approved by: https://github.com/pearu, https://github.com/bhosmer
2022-09-01 17:46:51 +00:00
jpvillam
247468baf0 [ROCm] More Sparse UTs enablement and more hipification mappings. (#78939)
Enables:

 test_bmm_cuda_float64
 test_bmm_deterministic_cuda_float64
 test_csr_matvec_cuda_complex128
 test_csr_matvec_cuda_complex64
 test_csr_matvec_cuda_float32
 test_csr_matvec_cuda_float64

To enable the above tests, we had to add some more HIP mappings for the hipification process.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78939
Approved by: https://github.com/pruthvistony, https://github.com/malfet
2022-08-23 13:54:09 +00:00