Nikita Vedeneev
3ace14eb8b
[Bug fix] sparse_mask: wrong intersection on CUDA ( #94829 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94829
Approved by: https://github.com/cpuhrsch
2023-02-15 13:22:39 +00:00
Xuehai Pan
046e88a291
[BE] [3/3] Rewrite super() calls in test ( #94592 )
...
Rewrite calls to the Python built-in class `super()` to the zero-argument form. Only non-semantic changes are applied.
- #94587
- #94588
- #94592
Also, methods with only a `super()` call are removed:
```diff
class MyModule(nn.Module):
- def __init__(self):
- super().__init__()
-
def forward(self, ...):
...
```
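For reference, a minimal sketch of the rewrite itself (the `MyModule` classes here are illustrative, not from the PR):
```python
import torch.nn as nn

# Before: two-argument Python 2 style call
class MyModuleOld(nn.Module):
    def __init__(self):
        super(MyModuleOld, self).__init__()

# After: the zero-argument form produced by the rewrite
class MyModuleNew(nn.Module):
    def __init__(self):
        super().__init__()
```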
Cases where the rewrite would change the semantics are kept unchanged, e.g.:
f152a79be9/caffe2/python/net_printer.py (L184-L190)
f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94592
Approved by: https://github.com/ezyang , https://github.com/seemethere
2023-02-12 22:20:53 +00:00
Aaron Gokaslan
3d82d8d0ed
[BE] Enable more flake8-comprehensions checks ( #94601 )
...
I applied some flake8 fixes and enabled checking for them in the linter. I also enabled some checks for my previous comprehensions PR.
This is a follow-up to #94323 where I enable the flake8 checkers for the fixes I made and fix a few more of them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94601
Approved by: https://github.com/ezyang
2023-02-10 23:40:29 +00:00
Huy Do
c53bd0dd30
Mitigate broken test_coalesce_reference_cycle test on dynamo ( #94622 )
...
The test has been disabled and shows up in https://github.com/pytorch/test-infra/blob/generated-stats/stats/disabled-tests-condensed.json, but the JSON file downloaded by the runner doesn't seem to include it.
Disable it explicitly to keep trunk green while investigating.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94622
Approved by: https://github.com/weiwangmeta
2023-02-10 21:59:36 +00:00
PyTorch MergeBot
76ed1a81d1
Revert "COO intersection kernel: respect value intersection order ( #92242 )"
...
This reverts commit b07c839b70 .
Reverted https://github.com/pytorch/pytorch/pull/92242 on behalf of https://github.com/jeanschmidt due to breaking vs17
2023-02-09 14:44:32 +00:00
Aleksandar Samardžić
e1f17b3530
Add CSR->BSC and CSC->BSR conversions ( #93301 )
...
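A minimal usage sketch of the added conversions (the input and the 2x2 blocksize are illustrative assumptions):
```python
import torch

bsc = torch.eye(4).to_sparse_csr().to_sparse_bsc((2, 2))  # CSR -> BSC
bsr = torch.eye(4).to_sparse_csc().to_sparse_bsr((2, 2))  # CSC -> BSR
```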
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93301
Approved by: https://github.com/cpuhrsch
2023-02-07 19:22:05 +00:00
Nikita Vedeneev
b07c839b70
COO intersection kernel: respect value intersection order ( #92242 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92242
Approved by: https://github.com/cpuhrsch , https://github.com/amjames
2023-02-07 17:05:28 +00:00
Nikita Vedeneev
994f85d639
sparse_mask: extend lhs to sparse COO tensors ( #92248 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92248
Approved by: https://github.com/cpuhrsch , https://github.com/pearu
2023-02-01 09:00:07 +00:00
Aleksandar Samardžić
53f7fb9a22
Add CSC->BSC conversion ( #92307 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92307
Approved by: https://github.com/cpuhrsch
2023-01-30 17:03:36 +00:00
Pearu Peterson
65d6802e2f
Improve error messages for sparse methods on tensors with unsupported backends/layouts. ( #93149 )
...
Fixes https://github.com/pytorch/pytorch/issues/92790
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93149
Approved by: https://github.com/cpuhrsch
2023-01-27 19:50:23 +00:00
Pearu Peterson
0e92bbe5b1
Add sparse COO tensor support to torch.sum(dim=..., keepdim=...) ( #92979 )
...
Fixes #92757 , #86232
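A minimal usage sketch (input values are illustrative):
```python
import torch

s = torch.tensor([[0., 1.], [2., 3.]]).to_sparse()
# Reducing over an explicit dim with keepdim is now supported for sparse COO.
out = torch.sum(s, dim=0, keepdim=True)  # sparse result of shape (1, 2)
```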
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92979
Approved by: https://github.com/cpuhrsch
2023-01-26 18:42:51 +00:00
Eddie Yan
0bf7506051
[CUDA] Drop CUDA < 11.0 test flags ( #92605 )
...
Follow-up of #89582 to drop flags like `CUDA11OrLater` in tests. Note that in some places `TEST_WITH_ROCM` appears to be _implicitly_ guarded against via the `CUDA11OrLater` version check, based on my best guess of how `torch.version.cuda` would behave in ROCm builds, so I've added `not TEST_WITH_ROCM` in cases where ROCm wasn't previously explicitly allowed.
CC @ptrblck @malfet @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92605
Approved by: https://github.com/ngimel
2023-01-24 04:34:06 +00:00
Nikita Vedeneev
9f381c9b7f
sparse_sparse_matmul: simplify backward ( #91712 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91712
Approved by: https://github.com/albanD
2023-01-23 19:24:28 +00:00
Yanbo Liang
0ab4ab9f8d
[Dynamo] Fix calling UserDefinedObject.func should pass self object ( #92050 )
...
Fixes #90834
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92050
Approved by: https://github.com/jansel
2023-01-21 05:47:01 +00:00
Pearu Peterson
b3e4f5029b
Add check-sparse-tensor-invariants flag to Context - 2nd try. ( #92094 )
...
This PR is a copy of https://github.com/pytorch/pytorch/pull/90849 whose merge was reverted.
The PR adds a "check sparse tensor invariants" flag to Context that, when enabled, triggers sparse tensor data invariant checks in unsafe methods of constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to the UI:
- The `torch.sparse.check_sparse_tensor_invariants` class provides different ways to enable/disable the invariant checking.
- `torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden.
The PR fixes https://github.com/pytorch/pytorch/issues/90833
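A minimal usage sketch of the two entry points described above (indices and values are illustrative):
```python
import torch

i = torch.tensor([[0, 1], [0, 1]])
v = torch.tensor([1., 2.])

# Per-call: opt into invariant checking for a single constructor call.
a = torch.sparse_coo_tensor(i, v, (2, 2), check_invariants=True)

# Scoped: enable checking for a region of code via the context manager.
with torch.sparse.check_sparse_tensor_invariants():
    b = torch.sparse_coo_tensor(i, v, (2, 2))
```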
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92094
Approved by: https://github.com/cpuhrsch
2023-01-13 14:50:33 +00:00
PyTorch MergeBot
c7a22bb7c7
Revert "Add check-sparse-tensor-invariants flag to Context. ( #90849 )"
...
This reverts commit b9a035c1c5 .
Reverted https://github.com/pytorch/pytorch/pull/90849 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-01-12 09:58:16 +00:00
Aleksandar Samardžić
8612ec5b90
Implement hybrid sparse to/from dense conversions. ( #90177 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90177
Approved by: https://github.com/cpuhrsch , https://github.com/pearu
2023-01-12 03:31:30 +00:00
min-jean-cho
af242eedfb
[Inductor] Added aten.uniform_ decomp ( #90869 )
...
Fixes #90815
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90869
Approved by: https://github.com/jgong5 , https://github.com/jansel , https://github.com/lezcano , https://github.com/ngimel , https://github.com/albanD
2023-01-11 23:23:42 +00:00
Pearu Peterson
b9a035c1c5
Add check-sparse-tensor-invariants flag to Context. ( #90849 )
...
This PR adds a "check sparse tensor invariants" flag to Context that, when enabled, triggers sparse tensor data invariant checks in unsafe methods of constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to the UI:
- `torch.enable_check_sparse_tensor_invariants` and `torch.is_check_sparse_tensor_invariants_enabled` functions to globally enable/disable the invariant checks and to retrieve the state of the feature, respectively
- `torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden.
The PR also fixes https://github.com/pytorch/pytorch/issues/90833
# Main issue
*The following content is outdated after merging the PRs in this ghstack but kept for the record.*
The importance of this feature is that when enabling the invariant checks by default, say, via
<details>
```
$ git diff
diff --git a/torch/__init__.py b/torch/__init__.py
index c8543057c7..19a91d0482 100644
--- a/torch/__init__.py
+++ b/torch/__init__.py
@@ -1239,3 +1239,8 @@ if 'TORCH_CUDA_SANITIZER' in os.environ:
# Populate magic methods on SymInt and SymFloat
import torch.fx.experimental.symbolic_shapes
+
+# temporarily enable sparse tensor arguments validation in unsafe
+# constructors:
+
+torch._C._set_check_sparse_tensor_invariants(True)
```
</details>
a massive number of test failures/errors occur in test_sparse_csr.py tests:
```
$ pytest -sv test/test_sparse_csr.py
<snip>
==== 4293 failed, 1557 passed, 237 skipped, 2744 errors in 69.71s (0:01:09) ====
```
This means that we are silently constructing sparse compressed tensors that do not satisfy the sparse tensor invariants. In particular, the following errors are raised:
```
AssertionError: "resize_as_sparse_compressed_tensor_: self and src must have the same layout" does not match "expected values to be a strided and contiguous tensor"
RuntimeError: CUDA error: device-side assert triggered
RuntimeError: `col_indices[..., crow_indices[..., i - 1]:crow_indices[..., i]] for all i = 1, ..., nrows are sorted and distinct along the last dimension values` is not satisfied.
RuntimeError: expected col_indices to be a strided and contiguous tensor
RuntimeError: expected row_indices to be a strided and contiguous tensor
RuntimeError: expected values to be a strided and contiguous tensor
RuntimeError: for_each: failed to synchronize: cudaErrorAssert: device-side assert triggered
RuntimeError: tensor dimensionality must be sum of batch, base, and dense dimensionalities (=0 + 2 + 0) but got 3
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90849
Approved by: https://github.com/amjames , https://github.com/cpuhrsch
2023-01-11 01:05:14 +00:00
anjali411
c887837ec3
Reland "Fix dynamo handling for tensor attributes: T, H, mT, mH ( #90463 )" ( #91897 )
...
This reverts commit 84266ae670 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91897
Approved by: https://github.com/ngimel
2023-01-10 08:16:07 +00:00
PyTorch MergeBot
84266ae670
Revert "Fix dynamo handling for tensor attributes: T, H, mT, mH ( #90463 )"
...
This reverts commit 9945a78a94 .
Reverted https://github.com/pytorch/pytorch/pull/90463 on behalf of https://github.com/ZainRizvi due to This is causing test failures: FAILED inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_pinv_singular_cuda_float64 - RuntimeError: unexpected success linalg.pinv.singular, torch.float64, cuda
2023-01-09 16:43:36 +00:00
anjali411
9945a78a94
Fix dynamo handling for tensor attributes: T, H, mT, mH ( #90463 )
...
Fixes https://github.com/pytorch/pytorch/issues/88843
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90463
Approved by: https://github.com/ngimel
2023-01-09 04:11:23 +00:00
Nikita Vedeneev
7ef7c57ae7
CSC/BSC -> COO coalesce fix ( #91440 )
...
Fixes https://github.com/pytorch/pytorch/issues/91010 .
The CSC and BSC sparse formats are not inherently `coalesced`.
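A minimal sketch of the affected path (the input is illustrative):
```python
import torch

csc = torch.eye(3).to_sparse_csc()
coo = csc.to_sparse()  # CSC -> COO: the result must not be assumed coalesced
coo = coo.coalesce()   # coalesce explicitly when the invariant is needed
```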
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91440
Approved by: https://github.com/pearu , https://github.com/amjames , https://github.com/cpuhrsch
2023-01-03 18:42:39 +00:00
Pearu Peterson
b797a24259
Support indices contiguity per batch and non-contiguous values in sparse compressed tensors ( #91243 )
...
Fixes https://github.com/pytorch/pytorch/issues/91062
With this PR, all reported failures in https://github.com/pytorch/pytorch/pull/90849 are resolved (modulo test_bmm that uses an unorthodox way to construct a batch CSR tensor).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91243
Approved by: https://github.com/nikitaved , https://github.com/amjames , https://github.com/lezcano
2023-01-02 18:08:46 +00:00
Nikita Vedeneev
1768a28a20
COO @ COO: fix to always produce coalesced outputs. (#91094 )
...
Fixes [#90516 ](https://github.com/pytorch/pytorch/issues/90516 )
Fixes [#90538 ](https://github.com/pytorch/pytorch/issues/90538 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91094
Approved by: https://github.com/pearu
2022-12-27 21:32:14 +00:00
Pearu Peterson
8004f934cd
Fix CSR with int32 indices to CSC conversion ( #91061 )
...
Fixes https://github.com/pytorch/pytorch/issues/91007
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91061
Approved by: https://github.com/nikitaved
2022-12-18 13:53:25 +00:00
Pearu Peterson
01e7f46215
Ensure sorted indices from the CSR->BSR conversion ( #90918 )
...
Fixes https://github.com/pytorch/pytorch/issues/90910
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90918
Approved by: https://github.com/cpuhrsch
2022-12-16 15:49:48 +00:00
Edward Z. Yang
e686a442b4
If a torch.* returns non-Tensor, make this unimplemented rather than assert. ( #89918 )
...
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89918
Approved by: https://github.com/albanD
2022-12-15 21:53:54 +00:00
Pearu Peterson
a60d712010
Support (non-batch) BSR/BSC to COO sparse tensor conversions ( #90718 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90718
Approved by: https://github.com/cpuhrsch
2022-12-14 05:37:05 +00:00
Pearu Peterson
76c6dfeaa6
Add layout and blocksize arguments to Tensor.to_sparse method ( #89502 )
...
This PR extends the `Tensor.to_sparse()` method to `Tensor.to_sparse(layout=None, blocksize=None)` in a BC manner (`layout=None` means `layout=torch.sparse_coo`).
In addition, the PR adds support for the following conversions:
- non-hybrid/hybrid COO tensor to a CSR, CSC, or COO tensor
- short, bool, byte, char, bfloat16, int, long, half CSR tensor to a BSR tensor
and fixes the following conversions:
- hybrid COO to COO tensor
- non-batch/batch hybrid BSR to BSR or BSC tensor
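A minimal sketch of the extended signature described above (inputs are illustrative):
```python
import torch

d = torch.eye(4)
coo = d.to_sparse()                                   # layout=None -> sparse COO (BC)
csr = d.to_sparse(layout=torch.sparse_csr)
bsr = d.to_sparse(layout=torch.sparse_bsr, blocksize=(2, 2))
```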
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89502
Approved by: https://github.com/amjames , https://github.com/cpuhrsch
2022-11-30 20:21:10 +00:00
Pearu Peterson
296e1ba4d0
Row and column select support for block compressed sparse tensors ( #88733 )
...
As in the title:
- Support `select` and `select_copy` on block sparse compressed tensors
- Fixes incorrect results when selecting dense dimensions
The PR also improves the performance of indexing sparse compressed tensors considerably:
<details>
Before:
```python
In [3]: a=torch.rand((1000, 1000)).to_sparse_csr()
In [4]: %timeit a.select(0, 0)
606 µs ± 4.27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [5]: %timeit a.select(1, 0)
527 µs ± 57.7 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [6]: %timeit a[0, 0]
617 µs ± 3.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [7]: a = a.cuda()
In [8]: %timeit a.select(0, 0); torch.cuda.synchronize();
1.19 ms ± 137 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [9]: %timeit a.select(1, 0); torch.cuda.synchronize();
1.2 ms ± 119 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [10]: %timeit a[0, 0]; torch.cuda.synchronize();
1.23 ms ± 482 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
This PR:
```python
In [3]: a=torch.rand((1000, 1000)).to_sparse_csr()
In [4]: %timeit a.select(0, 0)
4.75 µs ± 8.94 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [5]: %timeit a.select(1, 0)
565 µs ± 156 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [6]: %timeit a[0, 0]
13.1 µs ± 435 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [7]: a = a.cuda()
In [8]: %timeit a.select(0, 0); torch.cuda.synchronize();
21.6 µs ± 23.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [9]: %timeit a.select(1, 0); torch.cuda.synchronize();
1.15 ms ± 3.13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [10]: %timeit a[0, 0]; torch.cuda.synchronize();
63.7 µs ± 2.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88733
Approved by: https://github.com/nikitaved , https://github.com/amjames , https://github.com/cpuhrsch
2022-11-30 11:15:56 +00:00
Pearu Peterson
90bed8874f
Generator of tensor inputs with variable layout and structure (batch/non-batch, hybrid/non-hybrid, block/non-block) ( #88914 )
...
This PR introduces the `TestCase.generate_simple_inputs` method, an improved and generalized version of the `TestSparseCompressed._generate_small_inputs` method.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88914
Approved by: https://github.com/cpuhrsch
2022-11-30 02:13:33 +00:00
Kazuaki Ishizaki
088f2fa567
Fix typos in messages under test ( #89121 )
...
This PR fixes typos in messages in `.cpp` and `.py` files under the test directory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89121
Approved by: https://github.com/mruberry , https://github.com/kit1980
2022-11-17 01:55:03 +00:00
Andrew M. James
ff6770a9a1
enable backward for log1p (sparse layouts) ( #88155 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88155
Approved by: https://github.com/cpuhrsch
2022-11-04 20:59:26 +00:00
jpvillam
1e1b045128
[ROCM] Enable Sparse Pickle Test ( #82729 )
...
Missed stream context for serialization
### Description
Missing ROCm stream context on memory operations for serialization
### Testing
Ran the sparse pickle test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82729
Approved by: https://github.com/ngimel
2022-10-27 15:11:28 +00:00
Pearu Peterson
88b882cd1c
Support sum on a sparse COO tensor. ( #86300 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86300
Approved by: https://github.com/cpuhrsch
2022-10-06 18:39:28 +00:00
George Qi
686555b663
[maskedtensor] port torch/_masked into torch/masked ( #85515 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85515
Approved by: https://github.com/cpuhrsch
2022-09-26 23:41:13 +00:00
Elias Ellison
bcc544e9d7
Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp ( #85417 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417
Approved by: https://github.com/ezyang
2022-09-26 17:08:14 +00:00
nikitaved
12ae3bea43
Faster mul(sparse, sparse) with broadcasting in dense dims. ( #85336 )
...
This is a combo PR of https://github.com/pytorch/pytorch/pull/84929 and ~https://github.com/pytorch/pytorch/pull/83428~ .
Preliminary benchmarks (square matrices of shape (n, n)).
<details>
<summary>Script</summary>
```python
import torch
import math
from IPython import get_ipython
from itertools import product, repeat
import pickle
from torch.utils.benchmark import Timer, Compare
torch.manual_seed(13)
problem_dims = (
    # n > nnz
    (10000, 100),
    (100000, 1000),
    (1000000, 10000),
    # n < nnz
    (10, 100),
    (10, 1000),
    (10, 10000),
    (100, 1000),
    (100, 10000),
    (1000, 10000),
    (1000, 100000),
    (1000, 1000000),
    #(1000000, 1000000000),
)
name = "PR"
device = "cuda"
results = []
for n, nnz in problem_dims:
    def gen_tensor(coalesce=False):
        shape = (n, n)
        nrows, ncols = shape
        rowidx = torch.randint(low=0, high=nrows, size=(nnz,), device=device)
        colidx = torch.randint(low=0, high=ncols, size=(nnz,), device=device)
        itemidx = torch.vstack((rowidx, colidx))
        xvalues = torch.randn(nnz, device=device)
        itemidx = torch.hstack((itemidx, itemidx))
        xvalues = torch.hstack((xvalues, xvalues))
        res = torch.sparse_coo_tensor(itemidx, xvalues, size=shape)
        if coalesce:
            return res.coalesce()
        else:
            return res
    for x_coalesce, y_coalesce in product(*repeat((True, False), 2)):
        x = gen_tensor(x_coalesce)
        y = gen_tensor(y_coalesce)
        smtp = "x * y"
        timer = Timer(smtp,
                      globals=globals(),
                      label="coo.mul",
                      description=f"{name}: mul, device: {device}",
                      sub_label=f"n={n}, nnz={nnz}, coalesce=({x_coalesce, y_coalesce})",
                      num_threads=torch.get_num_threads())
        results.append(timer.blocked_autorange())
compare = Compare(results)
compare.trim_significant_figures()
compare.print()
with open(f"{name}_{device}_mul.pickle", 'wb') as f:
    pickle.dump(results, f)
```
</details>
<details>
<summary>Gather results</summary>
```python
import pickle
from torch.utils.benchmark import Timer, Compare
files = [
    "PR",
    "master"
]
device = 'cuda'
timers = []
for name in files:
    with open("{}_{}_mul.pickle".format(name, device), 'rb') as f:
        timers += pickle.load(f)
compare = Compare(timers)
compare.trim_significant_figures()
compare.print()
```
</details>
<details>
<summary>CUDA</summary>
```
[------------------------------------------------- coo.mul -------------------------------------------------]
| PR: mul, device: cuda | master: mul, device: cuda
24 threads: -------------------------------------------------------------------------------------------------
n=10000, nnz=100, coalesce=((True, True)) | 95 | 91
n=10000, nnz=100, coalesce=((True, False)) | 87 | 242
n=10000, nnz=100, coalesce=((False, True)) | 87 | 226
n=10000, nnz=100, coalesce=((False, False)) | 130 | 371
n=100000, nnz=1000, coalesce=((True, True)) | 100 | 521
n=100000, nnz=1000, coalesce=((True, False)) | 90 | 649
n=100000, nnz=1000, coalesce=((False, True)) | 100 | 659
n=100000, nnz=1000, coalesce=((False, False)) | 200 | 781
n=1000000, nnz=10000, coalesce=((True, True)) | 100 | 4861
n=1000000, nnz=10000, coalesce=((True, False)) | 100 | 5012
n=1000000, nnz=10000, coalesce=((False, True)) | 98 | 5010
n=1000000, nnz=10000, coalesce=((False, False)) | 384 | 5174
n=10, nnz=100, coalesce=((True, True)) | 100 | 79
n=10, nnz=100, coalesce=((True, False)) | 100 | 221
n=10, nnz=100, coalesce=((False, True)) | 100 | 221
n=10, nnz=100, coalesce=((False, False)) | 100 | 350
n=10, nnz=1000, coalesce=((True, True)) | 100 | 100
n=10, nnz=1000, coalesce=((True, False)) | 100 | 240
n=10, nnz=1000, coalesce=((False, True)) | 100 | 254
n=10, nnz=1000, coalesce=((False, False)) | 100 | 392
n=10, nnz=10000, coalesce=((True, True)) | 100 | 110
n=10, nnz=10000, coalesce=((True, False)) | 110 | 286
n=10, nnz=10000, coalesce=((False, True)) | 110 | 286
n=10, nnz=10000, coalesce=((False, False)) | 271 | 455
n=100, nnz=1000, coalesce=((True, True)) | 110 | 851
n=100, nnz=1000, coalesce=((True, False)) | 110 | 1000
n=100, nnz=1000, coalesce=((False, True)) | 110 | 990
n=100, nnz=1000, coalesce=((False, False)) | 140 | 1124
n=100, nnz=10000, coalesce=((True, True)) | 110 | 5137
n=100, nnz=10000, coalesce=((True, False)) | 110 | 5391
n=100, nnz=10000, coalesce=((False, True)) | 100 | 5405
n=100, nnz=10000, coalesce=((False, False)) | 249 | 5539
n=1000, nnz=10000, coalesce=((True, True)) | 100 | 8598
n=1000, nnz=10000, coalesce=((True, False)) | 100 | 8800
n=1000, nnz=10000, coalesce=((False, True)) | 100 | 8782
n=1000, nnz=10000, coalesce=((False, False)) | 255 | 8956
n=1000, nnz=100000, coalesce=((True, True)) | 120 | 84500
n=1000, nnz=100000, coalesce=((True, False)) | 200 | 88560
n=1000, nnz=100000, coalesce=((False, True)) | 160 | 89000
n=1000, nnz=100000, coalesce=((False, False)) | 373 | 89000
n=1000, nnz=1000000, coalesce=((True, True)) | 312 | 606400
n=1000, nnz=1000000, coalesce=((True, False)) | 1340 | 609200
n=1000, nnz=1000000, coalesce=((False, True)) | 1340 | 609100
n=1000, nnz=1000000, coalesce=((False, False)) | 4408 | 611400
Times are in microseconds (us).
```
</details>
<details>
<summary>CPU</summary>
```
[------------------------------------------------ coo.mul ------------------------------------------------]
| PR: mul, device: cpu | master: mul, device: cpu
24 threads: -----------------------------------------------------------------------------------------------
n=10000, nnz=100, coalesce=((True, True)) | 8 | 8
n=10000, nnz=100, coalesce=((True, False)) | 32 | 34
n=10000, nnz=100, coalesce=((False, True)) | 32 | 34
n=10000, nnz=100, coalesce=((False, False)) | 41 | 56
n=100000, nnz=1000, coalesce=((True, True)) | 24 | 24
n=100000, nnz=1000, coalesce=((True, False)) | 90 | 100
n=100000, nnz=1000, coalesce=((False, True)) | 87 | 100
n=100000, nnz=1000, coalesce=((False, False)) | 231 | 255
n=1000000, nnz=10000, coalesce=((True, True)) | 190 | 200
n=1000000, nnz=10000, coalesce=((True, False)) | 908 | 2023
n=1000000, nnz=10000, coalesce=((False, True)) | 800 | 2036
n=1000000, nnz=10000, coalesce=((False, False)) | 3684 | 3989
n=10, nnz=100, coalesce=((True, True)) | 8 | 7
n=10, nnz=100, coalesce=((True, False)) | 34 | 30
n=10, nnz=100, coalesce=((False, True)) | 33 | 30
n=10, nnz=100, coalesce=((False, False)) | 44 | 50
n=10, nnz=1000, coalesce=((True, True)) | 8 | 7
n=10, nnz=1000, coalesce=((True, False)) | 100 | 100
n=10, nnz=1000, coalesce=((False, True)) | 130 | 100
n=10, nnz=1000, coalesce=((False, False)) | 746 | 210
n=10, nnz=10000, coalesce=((True, True)) | 8 | 7
n=10, nnz=10000, coalesce=((True, False)) | 1000 | 1500
n=10, nnz=10000, coalesce=((False, True)) | 1000 | 1510
n=10, nnz=10000, coalesce=((False, False)) | 3063 | 2457
n=100, nnz=1000, coalesce=((True, True)) | 25 | 25
n=100, nnz=1000, coalesce=((True, False)) | 180 | 130
n=100, nnz=1000, coalesce=((False, True)) | 200 | 130
n=100, nnz=1000, coalesce=((False, False)) | 271 | 255
n=100, nnz=10000, coalesce=((True, True)) | 100 | 100
n=100, nnz=10000, coalesce=((True, False)) | 2444 | 2290
n=100, nnz=10000, coalesce=((False, True)) | 2455 | 2357
n=100, nnz=10000, coalesce=((False, False)) | 5316 | 3783
n=1000, nnz=10000, coalesce=((True, True)) | 204 | 211
n=1000, nnz=10000, coalesce=((True, False)) | 2457 | 2480
n=1000, nnz=10000, coalesce=((False, True)) | 2448 | 2539
n=1000, nnz=10000, coalesce=((False, False)) | 3665 | 4801
n=1000, nnz=100000, coalesce=((True, True)) | 2293 | 2374
n=1000, nnz=100000, coalesce=((True, False)) | 9000 | 24620
n=1000, nnz=100000, coalesce=((False, True)) | 8000 | 25080
n=1000, nnz=100000, coalesce=((False, False)) | 26500 | 47650
n=1000, nnz=1000000, coalesce=((True, True)) | 10000 | 13000
n=1000, nnz=1000000, coalesce=((True, False)) | 80000 | 362200
n=1000, nnz=1000000, coalesce=((False, True)) | 78050 | 392600
n=1000, nnz=1000000, coalesce=((False, False)) | 312100 | 766900
Times are in microseconds (us).
```
</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85336
Approved by: https://github.com/cpuhrsch
2022-09-23 23:31:19 +00:00
PyTorch MergeBot
d10de31cc8
Revert "Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp ( #85417 )"
...
This reverts commit 78afa0cf0c .
Reverted https://github.com/pytorch/pytorch/pull/85417 on behalf of https://github.com/clee2000 due to broke tests on trunk 78afa0cf0c
2022-09-23 17:21:43 +00:00
Elias Ellison
78afa0cf0c
Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp ( #85417 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417
Approved by: https://github.com/ezyang
2022-09-23 15:50:03 +00:00
PyTorch MergeBot
5043457a8e
Revert "Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp ( #85417 )"
...
This reverts commit 9c77083965 .
Reverted https://github.com/pytorch/pytorch/pull/85417 on behalf of https://github.com/clee2000 due to broke tests on trunk (and pull somehow) 9c77083965
2022-09-22 15:44:38 +00:00
Elias Ellison
9c77083965
Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp ( #85417 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417
Approved by: https://github.com/ezyang
2022-09-22 13:03:57 +00:00
Elias Ellison
d9aa6dfe88
Add Fake Cross Ref Mode, migrate sparse to it ( #85382 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85382
Approved by: https://github.com/ezyang
2022-09-21 17:15:47 +00:00
PyTorch MergeBot
81620c3360
Revert "Faster mul(sparse, sparse) with broadcasting in dense dims. ( #83428 )"
...
This reverts commit d49943bda8 .
Reverted https://github.com/pytorch/pytorch/pull/83428 on behalf of https://github.com/osalpekar due to Reverted because __restrict symbol not supported by certain MSVC compilers, leading to undefined symbol error at compilation time
2022-09-17 06:53:11 +00:00
nikitaved
d49943bda8
Faster mul(sparse, sparse) with broadcasting in dense dims. ( #83428 )
...
Preliminary benchmarks (square matrices of shape (n, n)).
<details>
<summary>Script</summary>
```python
import torch
import math
from IPython import get_ipython
from itertools import product, repeat
import pickle
from torch.utils.benchmark import Timer, Compare
torch.manual_seed(13)
# specifies (n, nnz)
problem_dims = (
    # n > nnz
    (10000, 100),
    (100000, 1000),
    (1000000, 10000),
    # n < nnz
    (10, 100),
    (10, 1000),
    (10, 10000),
    (100, 1000),
    (100, 10000),
    (1000, 10000),
    (1000, 100000),
    (1000, 1000000),
    #(1000000, 1000000000),
)
name = "PR"
device = "cuda"
results = []
for n, nnz in problem_dims:
    def gen_tensor(coalesce=False):
        shape = (n, n)
        nrows, ncols = shape
        rowidx = torch.randint(low=0, high=nrows, size=(nnz,), device=device)
        colidx = torch.randint(low=0, high=ncols, size=(nnz,), device=device)
        itemidx = torch.vstack((rowidx, colidx))
        xvalues = torch.randn(nnz, device=device)
        itemidx = torch.hstack((itemidx, itemidx))
        xvalues = torch.hstack((xvalues, xvalues))
        res = torch.sparse_coo_tensor(itemidx, xvalues, size=shape)
        if coalesce:
            return res.coalesce()
        else:
            return res
    for x_coalesce, y_coalesce in product(*repeat((True, False), 2)):
        x = gen_tensor(x_coalesce)
        y = gen_tensor(y_coalesce)
        smtp = "x * y"
        timer = Timer(smtp,
                      globals=globals(),
                      label="coo.mul",
                      description=f"{name}: mul, device: {device}",
                      sub_label=f"n={n}, nnz={nnz}, coalesce=({x_coalesce, y_coalesce})",
                      num_threads=torch.get_num_threads())
        results.append(timer.blocked_autorange())
compare = Compare(results)
compare.trim_significant_figures()
compare.print()
with open(f"{name}_{device}_mul.pickle", 'wb') as f:
    pickle.dump(results, f)
```
</details>
<details>
<summary>Gather results</summary>
```python
import pickle
from torch.utils.benchmark import Timer, Compare
files = [
    "PR",
    "master"
]
device = 'cuda'
timers = []
for name in files:
    with open("{}_{}_mul.pickle".format(name, device), 'rb') as f:
        timers += pickle.load(f)
compare = Compare(timers)
compare.trim_significant_figures()
compare.print()
```
</details>
<details>
<summary>CUDA</summary>
```
[------------------------------------------------- coo.mul -------------------------------------------------]
| PR: mul, device: cuda | master: mul, device: cuda
24 threads: -------------------------------------------------------------------------------------------------
n=10000, nnz=100, coalesce=((True, True)) | 95 | 91
n=10000, nnz=100, coalesce=((True, False)) | 87 | 242
n=10000, nnz=100, coalesce=((False, True)) | 87 | 226
n=10000, nnz=100, coalesce=((False, False)) | 130 | 371
n=100000, nnz=1000, coalesce=((True, True)) | 100 | 521
n=100000, nnz=1000, coalesce=((True, False)) | 90 | 649
n=100000, nnz=1000, coalesce=((False, True)) | 100 | 659
n=100000, nnz=1000, coalesce=((False, False)) | 200 | 781
n=1000000, nnz=10000, coalesce=((True, True)) | 100 | 4861
n=1000000, nnz=10000, coalesce=((True, False)) | 100 | 5012
n=1000000, nnz=10000, coalesce=((False, True)) | 98 | 5010
n=1000000, nnz=10000, coalesce=((False, False)) | 384 | 5174
n=10, nnz=100, coalesce=((True, True)) | 100 | 79
n=10, nnz=100, coalesce=((True, False)) | 100 | 221
n=10, nnz=100, coalesce=((False, True)) | 100 | 221
n=10, nnz=100, coalesce=((False, False)) | 100 | 350
n=10, nnz=1000, coalesce=((True, True)) | 100 | 100
n=10, nnz=1000, coalesce=((True, False)) | 100 | 240
n=10, nnz=1000, coalesce=((False, True)) | 100 | 254
n=10, nnz=1000, coalesce=((False, False)) | 100 | 392
n=10, nnz=10000, coalesce=((True, True)) | 100 | 110
n=10, nnz=10000, coalesce=((True, False)) | 110 | 286
n=10, nnz=10000, coalesce=((False, True)) | 110 | 286
n=10, nnz=10000, coalesce=((False, False)) | 271 | 455
n=100, nnz=1000, coalesce=((True, True)) | 110 | 851
n=100, nnz=1000, coalesce=((True, False)) | 110 | 1000
n=100, nnz=1000, coalesce=((False, True)) | 110 | 990
n=100, nnz=1000, coalesce=((False, False)) | 140 | 1124
n=100, nnz=10000, coalesce=((True, True)) | 110 | 5137
n=100, nnz=10000, coalesce=((True, False)) | 110 | 5391
n=100, nnz=10000, coalesce=((False, True)) | 100 | 5405
n=100, nnz=10000, coalesce=((False, False)) | 249 | 5539
n=1000, nnz=10000, coalesce=((True, True)) | 100 | 8598
n=1000, nnz=10000, coalesce=((True, False)) | 100 | 8800
n=1000, nnz=10000, coalesce=((False, True)) | 100 | 8782
n=1000, nnz=10000, coalesce=((False, False)) | 255 | 8956
n=1000, nnz=100000, coalesce=((True, True)) | 120 | 84500
n=1000, nnz=100000, coalesce=((True, False)) | 200 | 88560
n=1000, nnz=100000, coalesce=((False, True)) | 160 | 89000
n=1000, nnz=100000, coalesce=((False, False)) | 373 | 89000
n=1000, nnz=1000000, coalesce=((True, True)) | 312 | 606400
n=1000, nnz=1000000, coalesce=((True, False)) | 1340 | 609200
n=1000, nnz=1000000, coalesce=((False, True)) | 1340 | 609100
n=1000, nnz=1000000, coalesce=((False, False)) | 4408 | 611400
Times are in microseconds (us).
```
</details>
<details>
<summary>CPU</summary>
```
[------------------------------------------------ coo.mul ------------------------------------------------]
| PR: mul, device: cpu | master: mul, device: cpu
24 threads: -----------------------------------------------------------------------------------------------
n=10000, nnz=100, coalesce=((True, True)) | 8 | 8
n=10000, nnz=100, coalesce=((True, False)) | 32 | 34
n=10000, nnz=100, coalesce=((False, True)) | 32 | 34
n=10000, nnz=100, coalesce=((False, False)) | 41 | 56
n=100000, nnz=1000, coalesce=((True, True)) | 24 | 24
n=100000, nnz=1000, coalesce=((True, False)) | 90 | 100
n=100000, nnz=1000, coalesce=((False, True)) | 87 | 100
n=100000, nnz=1000, coalesce=((False, False)) | 231 | 255
n=1000000, nnz=10000, coalesce=((True, True)) | 190 | 200
n=1000000, nnz=10000, coalesce=((True, False)) | 908 | 2023
n=1000000, nnz=10000, coalesce=((False, True)) | 800 | 2036
n=1000000, nnz=10000, coalesce=((False, False)) | 3684 | 3989
n=10, nnz=100, coalesce=((True, True)) | 8 | 7
n=10, nnz=100, coalesce=((True, False)) | 34 | 30
n=10, nnz=100, coalesce=((False, True)) | 33 | 30
n=10, nnz=100, coalesce=((False, False)) | 44 | 50
n=10, nnz=1000, coalesce=((True, True)) | 8 | 7
n=10, nnz=1000, coalesce=((True, False)) | 100 | 100
n=10, nnz=1000, coalesce=((False, True)) | 130 | 100
n=10, nnz=1000, coalesce=((False, False)) | 746 | 210
n=10, nnz=10000, coalesce=((True, True)) | 8 | 7
n=10, nnz=10000, coalesce=((True, False)) | 1000 | 1500
n=10, nnz=10000, coalesce=((False, True)) | 1000 | 1510
n=10, nnz=10000, coalesce=((False, False)) | 3063 | 2457
n=100, nnz=1000, coalesce=((True, True)) | 25 | 25
n=100, nnz=1000, coalesce=((True, False)) | 180 | 130
n=100, nnz=1000, coalesce=((False, True)) | 200 | 130
n=100, nnz=1000, coalesce=((False, False)) | 271 | 255
n=100, nnz=10000, coalesce=((True, True)) | 100 | 100
n=100, nnz=10000, coalesce=((True, False)) | 2444 | 2290
n=100, nnz=10000, coalesce=((False, True)) | 2455 | 2357
n=100, nnz=10000, coalesce=((False, False)) | 5316 | 3783
n=1000, nnz=10000, coalesce=((True, True)) | 204 | 211
n=1000, nnz=10000, coalesce=((True, False)) | 2457 | 2480
n=1000, nnz=10000, coalesce=((False, True)) | 2448 | 2539
n=1000, nnz=10000, coalesce=((False, False)) | 3665 | 4801
n=1000, nnz=100000, coalesce=((True, True)) | 2293 | 2374
n=1000, nnz=100000, coalesce=((True, False)) | 9000 | 24620
n=1000, nnz=100000, coalesce=((False, True)) | 8000 | 25080
n=1000, nnz=100000, coalesce=((False, False)) | 26500 | 47650
n=1000, nnz=1000000, coalesce=((True, True)) | 10000 | 13000
n=1000, nnz=1000000, coalesce=((True, False)) | 80000 | 362200
n=1000, nnz=1000000, coalesce=((False, True)) | 78050 | 392600
n=1000, nnz=1000000, coalesce=((False, False)) | 312100 | 766900
Times are in microseconds (us).
```
</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83428
Approved by: https://github.com/cpuhrsch
2022-09-16 00:28:40 +00:00
Edward Z. Yang
c5a8946e40
Revert "Revert "Redo how custom/python_custom methods on TensorImpl work ( #84796 )" ( #84806 )
...
This reverts commit ca3b2bfbe3 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84806
Approved by: https://github.com/Chillee
2022-09-10 06:17:35 +00:00
Eli Uriegas
ca3b2bfbe3
Revert "Redo how custom/python_custom methods on TensorImpl work ( #84796 )
...
This reverts commit 591b75bf98 .
Manual revert of https://github.com/pytorch/pytorch/pull/84641
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84796
Approved by: https://github.com/izaitsevfb
2022-09-10 00:18:13 +00:00
Edward Z. Yang
591b75bf98
Redo how custom/python_custom methods on TensorImpl work ( #84641 )
...
A longstanding confusion in the implementation of fake tensor and proxy tensor is what to do about torch.ops.aten.sym_sizes and related calls. In particular, when you have a tensor that (1) has symbolic shapes and (2) has a `__torch_dispatch__` call, previously, you would always get `__torch_dispatch__` calls for sizes/strides query, *even if you didn't request it* via the dispatch kwargs in `make_wrapper_subclass`.
The reason for this is because we were previously mixing several concepts: "I want to dispatch to Python", "I want to call a virtual method" and "I have dynamic shapes". A single boolean variable controlled all of these things, and so it was not possible to understand inside TensorImpl what the user had actually originally requested.
In this PR, we track each of these concepts individually so that we can preserve user intent. Then, we combine these into a single "policy" variable that controls whether or not we can use the fastpath or not. For the policy to trigger, we only need one of the exceptional cases to be true.
Billing of changes:
* Rename `set_sizes_strides_policy` to `set_custom_sizes_strides`; in general, you cannot DIRECTLY set policy; you have to indirectly set it by the public functions.
* Some helpers for sizes and strides, since it's more complicated (as it is an enum, rather than just bools as is the case for device and layout). `matches_python_custom` is used to test the Python dispatch user ask. `matches_policy` does the policy test (only used in the user facing functions.)
* I reorged the accessor methods so that they are more logical. This makes the diff bad, so I recommend reading the final code directly.
* The default custom implementations now more reliably call their default() implementations
* As bonus refactor, I devirtualized some functions that don't need to be virtual
* `set_sym_sizes_and_strides` is renamed to `set_sizes_and_strides` to make it easier to use in template contexts; it optionally takes a storage offset now so you can set all three values at the same time. If you use the SymInt overload but there are no symbolic integers, we give you a normal resize.
* This adds `sym_storage_offset` since we had that in the symbolic shapes branch and there's no reason not to put it in (and it reduces merge conflicts)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84641
Approved by: https://github.com/wconstab
2022-09-09 13:41:13 +00:00
Elias Ellison
15c5baf878
Throw on data dependent ops ( #83567 )
...
Previously, we would trace through the following with no error:
```
from torch.fx.experimental.proxy_tensor import make_fx
import torch
def f(x, y):
    return x[0, y:]
```
This traced even though the output shape depends on the data of `y`. Now, we throw on the conversion of `y` to an integer.
It would be nice to not break on constant tensors but I'll do that as the next PR (Edit: done with https://github.com/pytorch/pytorch/pull/84387 ). Sketching out how that would work (and keep in mind this is applicable to Dynamo tracing and not just AOT Autograd):
I think to do that you would need to:
- hold strong refs to a set of constant tensors, and only allow them to be captured from `lift_fresh.copy`
- when you run a mutable op, either remove it from the set of constant tensors or run the operator for real
- limit to small constant tensors
Anything else?
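A hedged repro sketch of the new behavior (the exact exception type and message are not claimed here):
```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx

def f(x, y):
    return x[0, y:]

# Building the slice requires converting `y` to a Python int, which is
# data-dependent; tracing now raises instead of silently specializing.
try:
    make_fx(f)(torch.randn(3, 3), torch.tensor(1))
except Exception as e:
    print(type(e).__name__, e)
```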
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83567
Approved by: https://github.com/ezyang
2022-09-07 02:37:00 +00:00
Andrew M. James
6dc9223c8b
Sparse_coo: Be more aggressive in setting coalesced True to avoid surprising behaviors ( #82426 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82426
Approved by: https://github.com/pearu , https://github.com/bhosmer
2022-09-01 17:46:51 +00:00
jpvillam
247468baf0
[ROCm] More Sparse UTs enablement and more hipification mappings. ( #78939 )
...
Enables:
test_bmm_cuda_float64
test_bmm_deterministic_cuda_float64
test_csr_matvec_cuda_complex128
test_csr_matvec_cuda_complex64
test_csr_matvec_cuda_float32
test_csr_matvec_cuda_float64
To enable the above tests, some more hip mappings had to be added for the hipification process.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78939
Approved by: https://github.com/pruthvistony , https://github.com/malfet
2022-08-23 13:54:09 +00:00
Brian Hirsh
0c24af4985
Always allow tensor metadata changes ( #83590 )
...
Make it so that it is valid to set metadata after detach calls, like `x.detach().resize_(...)`.
This technically lifts some restrictions around `.data`. This PR means that you can now technically call `x.data.resize_(...)`, which can now directly resize `x` instead of erroring.
My understanding: Before the tensor-variable merge, when `x` and `x.data` were really different tensors, you could resize `x.data` independently of `x`, and during the merge, this error was added to avoid silent confusing behavior changes.
It was agreed that this error has been around long enough (several years) that it's acceptable to drop. cc @albanD @ezyang.
(Ed already had a prototype PR [here](https://github.com/pytorch/pytorch/pull/83545 ) - I ended up making one to try to slog through test failures).
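A minimal sketch of the now-permitted pattern:
```python
import torch

x = torch.zeros(3)
# Previously an error; metadata changes through `.data` / `.detach()` are now
# allowed and directly resize `x`.
x.data.resize_(5)
print(x.shape)  # torch.Size([5])
```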
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83590
Approved by: https://github.com/ezyang
2022-08-19 23:30:43 +00:00
nikitaved
b60dc2eb43
mul: sparse-dense + sparse-sparse with 0-dims support take 2. (#82962 )
...
This one is a copy of
https://github.com/pytorch/pytorch/pull/81556
https://github.com/pytorch/pytorch/pull/82717
These got reverted due to issues with torchvision.
CC @kit1980 , could you please take over from here?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82962
Approved by: https://github.com/kit1980
2022-08-11 23:34:58 +00:00
PyTorch MergeBot
45291c7ec8
Revert "Implement mul(dense, sparse), mul(sparse, dense) for sparse COO tensors. ( #81556 )"
...
This reverts commit edd2f6daa7 .
Reverted https://github.com/pytorch/pytorch/pull/81556 on behalf of https://github.com/kit1980 due to Broken internal test, S286911
2022-08-05 19:39:01 +00:00
PyTorch MergeBot
796fba02fe
Revert "Implement and extend mul(sparse, sparse) to work with 0-dim arguments on either side. ( #82717 )"
...
This reverts commit 3ab54b971f .
Reverted https://github.com/pytorch/pytorch/pull/82717 on behalf of https://github.com/kit1980 due to Broken internal test, S286911
2022-08-05 19:35:35 +00:00
Nikita Vedeneev
3ab54b971f
Implement and extend mul(sparse, sparse) to work with 0-dim arguments on either side. ( #82717 )
...
Extends https://github.com/pytorch/pytorch/pull/81556 by bringing some missing functionality implemented in master.
Also improves on master by allowing arbitrary 0-dim arguments, coalesced or not, on either side of the operation.
Master, for example, would fail on 0-dim non-coalesced inputs.
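A minimal sketch of the extended behavior (values are illustrative):
```python
import torch

s = torch.tensor([[1., 0.], [0., 2.]]).to_sparse()
z = torch.tensor(3.).to_sparse()   # 0-dim sparse argument
out1 = (s * z).to_dense()          # 0-dim on the right
out2 = (z * s).to_dense()          # 0-dim on the left
```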
CC @datumbox, @osalpekar .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82717
Approved by: https://github.com/amjames , https://github.com/bhosmer
2022-08-04 17:46:23 +00:00
Edward Z. Yang
42fefd4403
Sparse fake tensor support ( #82172 )
...
Add support for sparse fake tensors.
- The testing strategy is to run a fake tensor cross ref test on `test_sparse.py`. This is necessary because OpInfo sparse coverage is completely nonexistent. We could have tried to turn on cross ref testing globally for all files, but that would be very time consuming and the tests I'm interested in are mostly in this file. There are some exclusions in testing for things that don't work.
- I make fake tensor converter raise a UnsupportedFakeTensorException if the meta converter fails to do a conversion (which can happen in a relatively large number of situations).
- I relax fake tensor invariants so that you can make a fake tensor from a meta tensor. This is useful because in the cross ref test sometimes we operate on meta tensors.
- Fake tensor wrapping is improved to handle the case when a function doesn't return any tensors
- Meta converter is taught how to convert sparse tensors to meta
There's still a little more cleanup that needs to be done, but this is good for review.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82172
Approved by: https://github.com/eellison
2022-08-03 14:29:36 +00:00
Nikita Vedeneev
edd2f6daa7
Implement mul(dense, sparse), mul(sparse, dense) for sparse COO tensors. ( #81556 )
...
As per title. Implemented with broadcasting and in-place support.
Follow-up: backward implementation.
Fixes https://github.com/pytorch/pytorch/issues/3158
Fixes https://github.com/pytorch/pytorch/issues/4456
Fixes https://github.com/pytorch/pytorch/issues/46307
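A minimal sketch of the new kernels (the broadcast shape is an illustrative assumption):
```python
import torch

s = torch.tensor([[0., 1.], [2., 0.]]).to_sparse()
d = torch.tensor([10., 100.])   # dense operand, broadcast over rows
out1 = torch.mul(s, d)          # mul(sparse, dense) -> sparse
out2 = torch.mul(d, s)          # mul(dense, sparse) -> sparse
s.mul_(d)                       # in-place variant
```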
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81556
Approved by: https://github.com/amjames , https://github.com/cpuhrsch
2022-07-29 15:15:27 +00:00
Nikita Vedeneev
18d0e533da
fix silent type promotion for sparse COO tensors with select ( #82215 )
...
Fixes https://github.com/pytorch/pytorch/issues/82150 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82215
Approved by: https://github.com/amjames , https://github.com/cpuhrsch
2022-07-27 12:24:06 +00:00
Christian Puhrsch
6ab1fe19ee
torch.sparse.softmax avoid div by zero and invalid kernel launch parameters ( #82149 )
...
### Description
Small changes needed to deal with nnz == 0 inputs.
### Issue
https://github.com/pytorch/pytorch/issues/82107
### Testing
Added additional test coverage to reproduce bug reported in issue. Tested resulting values by conversion `to_dense`.
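A hedged repro sketch of the nnz == 0 case described above:
```python
import torch

# An input whose sparse representation has no specified elements (nnz == 0).
i = torch.empty((2, 0), dtype=torch.int64)
v = torch.empty((0,))
s = torch.sparse_coo_tensor(i, v, (3, 3))
out = torch.sparse.softmax(s, dim=0)  # previously hit div-by-zero / bad launch params
```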
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82149
Approved by: https://github.com/jbschlosser , https://github.com/ezyang
2022-07-25 23:10:58 +00:00
PyTorch MergeBot
6e9b0dcdc4
Revert "Implement mul(dense, sparse), mul(sparse, dense) for sparse COO tensors. ( #81556 )"
...
This reverts commit cc5b01651f .
Reverted https://github.com/pytorch/pytorch/pull/81556 on behalf of https://github.com/jeanschmidt due to breaking internal builds
2022-07-22 11:20:11 +00:00
Nikita Vedeneev
cc5b01651f
Implement mul(dense, sparse), mul(sparse, dense) for sparse COO tensors. ( #81556 )
...
As per title. Implemented with broadcasting and in-place support.
Follow-up: backward implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81556
Approved by: https://github.com/amjames , https://github.com/cpuhrsch
2022-07-22 04:55:48 +00:00
Edward Z. Yang
44193f6b5d
Add basic support for sparse meta tensors ( #81800 )
...
Coverage is by no means complete; we'll drive more coverage using appropriate cross-ref tests. This is just enough to get construction and querying working.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81800
Approved by: https://github.com/cpuhrsch , https://github.com/bdhirsh
2022-07-21 21:23:57 +00:00
Andrew M. James
5a4c9e8394
Add spdiags sparse matrix initialization ( #78439 )
...
Similar to [scipy.sparse.spdiags](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.spdiags.html#scipy-sparse-spdiags )
Part of #70926
In other functions (i.e. [torch.diagonal](https://pytorch.org/docs/stable/generated/torch.diagonal.html#torch.diagonal)) diagonals of a tensor are referenced using the offset and the two dimensions that the diagonal is taken with respect to.
Here the reference implementation from scipy only considers matrix output, so we only support 2-d output at first. It may still be useful to consider how the dimensions corresponding to each diagonal would be specified for higher-dimensional output.
The proposed torch signature implies that all offsets refer to the diagonals with respect to the only two dimensions of the output:
```
torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, int[] shape, Layout? layout=None) -> SparseTensor
```
Above it is required that: `diagonals.ndimension() == 2`, `offsets.ndimension() == 1`, `offsets.shape[0] == diagonals.shape[0]` and `len(shape) == 2`.
This would need to be altered for the case where `len(shape)` > 2. One option is:
```
torch.sparse.spdiags(Tensor[] diagonals, IntTensor[] offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor
```
Here `offsets` and `diagonals` become lists of tensors, and the `IntTensor dims` argument is introduced. This would require that `len(diagonals) == len(offsets) == dims.shape[0]`, `dims.ndimension() == 2` and `dims.shape[1] == 2`; the same restrictions as in the 2d case above also apply to the elements of `diagonals` and `offsets` pairwise (that is, `diagonals[i].ndimension() == 2`, `offsets[i].ndimension() == 1` and `offsets[i].shape[0] == diagonals[i].shape[0]` for all i). This form of the signature would construct the sparse result by placing the values from `diagonals[i][j]` into the diagonal with offset `offset[i][j]` taken with respect to dimensions `dims[i]`. The specialization back to the original signature for the 2d case could be seen as allowing the single row of dims to default to `[0, 1]` when there is only one `diagonals`, `offsets` provided, and the shape is 2-d. This option allows the rows of an input element `diagonals[i]` to have different lengths, which may be appropriate as the max length of a diagonal along different dimension pairs will differ.
Another option is to specify the dimensions the diagonal is taken with respect to for each offset. This signature would look like:
```
torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor
```
Here, `diagonals` is still 2-D with dimension 0 matching the length of 1-D `offsets`, and the tensor input `dims` is also 2-D with dimension 0 matching the length of 1-D `offsets` and the second dimension fixed at `2`. In this case the sparse result is constructed by placing the elements from `diagonals[i]` into the output diagonal `output.diagonal(offset[i], dim0=dims[i][0], dim1=dims[i][1])` (with some additional consideration that makes it more complicated than simply assigning to that view). The specialization from this back to the 2-D form could be seen as assuming `dims = [[0, 1], [0, 1]... len(offsets) times ]` when `len(shape) == 2`.
In both proposed signatures for the N-D case the specialization back to the 2-D signature is a bit of a stretch for your typical default-arguments logic; however, I think the first is the better choice as it offers more flexibility.
I think some discussion is required about:
- [x] Should the N-D output case be implemented from the outset
- [x] If not, should the future addition of the N-D output case be considered when designing the interface.
- [x] Other thoughts on the signature which includes the `dims` information for the N-D output case.
**Resolution**: Since no one has offered a request for N-D output support, I think it is fine to restrict this to sparse matrix generation. Should a request for N-D support come later, an overload accepting the additional `dims` could be added.
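A minimal usage sketch of the 2-d signature settled on above (values are illustrative):
```python
import torch

diags = torch.tensor([[1., 2., 3.],
                      [4., 5., 6.]])
offsets = torch.tensor([0, -1])
# Row i of `diags` is placed on the diagonal with offset `offsets[i]`,
# following the scipy.sparse.spdiags convention.
s = torch.sparse.spdiags(diags, offsets, (3, 3))
print(s.to_dense())
```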
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78439
Approved by: https://github.com/nikitaved , https://github.com/cpuhrsch , https://github.com/pearu
2022-07-01 01:11:54 +00:00
PyTorch MergeBot
56e3bc5215
Revert "Add spdiags sparse matrix initialization ( #78439 )"
...
This reverts commit cfb2034b65 .
Reverted https://github.com/pytorch/pytorch/pull/78439 on behalf of https://github.com/suo due to broke windows builds, see: cfb2034b65
2022-06-30 21:04:36 +00:00
Andrew M. James
cfb2034b65
Add spdiags sparse matrix initialization ( #78439 )
...
Similar to [scipy.sparse.spdiags](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.spdiags.html#scipy-sparse-spdiags )
Part of #70926
In other functions (i.e. [torch.diagonal](https://pytorch.org/docs/stable/generated/torch.diagonal.html#torch.diagonal)) diagonals of a tensor are referenced using the offset and the two dimensions that the diagonal is taken with respect to.
Here the reference implementation from scipy only considers matrix output, so we only support 2-d output at first. It may still be useful to consider how the dimensions corresponding to each diagonal would be specified for higher-dimensional output.
The proposed torch signature implies that all offsets refer to the diagonals with respect to the only two dimensions of the output:
```
torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, int[] shape, Layout? layout=None) -> SparseTensor
```
Above it is required that: `diagonals.ndimension() == 2`, `offsets.ndimension() == 1`, `offsets.shape[0] == diagonals.shape[0]` and `len(shape) == 2`.
This would need to be altered for the case where `len(shape)` > 2. One option is:
```
torch.sparse.spdiags(Tensor[] diagonals, IntTensor[] offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor
```
Here `offsets` and `diagonals` become lists of tensors, and the `IntTensor dims` argument is introduced. This would require that `len(diagonals) == len(offsets) == dims.shape[0]`, `dims.ndimension() == 2` and `dims.shape[1] == 2`; the same restrictions as in the 2d case above also apply to the elements of `diagonals` and `offsets` pairwise (that is, `diagonals[i].ndimension() == 2`, `offsets[i].ndimension() == 1` and `offsets[i].shape[0] == diagonals[i].shape[0]` for all i). This form of the signature would construct the sparse result by placing the values from `diagonals[i][j]` into the diagonal with offset `offset[i][j]` taken with respect to dimensions `dims[i]`. The specialization back to the original signature for the 2d case could be seen as allowing the single row of dims to default to `[0, 1]` when there is only one `diagonals`, `offsets` provided, and the shape is 2-d. This option allows the rows of an input element `diagonals[i]` to have different lengths, which may be appropriate as the max length of a diagonal along different dimension pairs will differ.
Another option is to specify the dimensions the diagonal is taken with respect to for each offset. This signature would look like:
```
torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor
```
Here, `diagonals` is still 2-D with dimension 0 matching the length of 1-D `offsets`, and the tensor input `dims` is also 2-D with dimension 0 matching the length of 1-D `offsets` and the second dimension fixed at `2`. In this case the sparse result is constructed by placing the elements from `diagonals[i]` into the output diagonal `output.diagonal(offset[i], dim0=dims[i][0], dim1=dims[i][1])` (with some additional consideration that makes it more complicated than simply assigning to that view). The specialization from this back to the 2-D form could be seen as assuming `dims = [[0, 1], [0, 1]... len(offsets) times ]` when `len(shape) == 2`.
In both proposed signatures for the N-D case the specialization back to the 2-D signature is a bit of a stretch for your typical default-arguments logic; however, I think the first is the better choice as it offers more flexibility.
I think some discussion is required about:
- [x] Should the N-D output case be implemented from the outset
- [x] If not, should the future addition of the N-D output case be considered when designing the interface.
- [x] Other thoughts on the signature which includes the `dims` information for the N-D output case.
**Resolution**: Since no one has offered a request for N-D output support, I think it is fine to restrict this to sparse matrix generation. Should a request for N-D support come later, an overload accepting the additional `dims` could be added.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78439
Approved by: https://github.com/nikitaved , https://github.com/cpuhrsch , https://github.com/pearu
2022-06-30 19:54:47 +00:00
Christian Puhrsch
5da776dd08
[Resubmission] fix mul_out CUDA config for COO tensors ( #80254 )
...
Fixes https://github.com/pytorch/pytorch/issues/79914
Duplicate of https://github.com/pytorch/pytorch/pull/79937 . I wasn't able to push changes to the existing PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80254
Approved by: https://github.com/eellison
2022-06-28 00:47:03 +00:00
Nikita Vedeneev
417677bf62
permute for COO sparse tensors (#79707 )
...
As per title. Partial implementation of https://github.com/pytorch/pytorch/issues/78422 .
We cannot satisfy the view semantics once we operate over sparse dims.
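A minimal sketch (dims are illustrative; results over sparse dims are copies, not views):
```python
import torch

s = torch.randn(2, 3, 4).to_sparse(2)  # 2 sparse dims + 1 dense dim
p = s.permute(1, 0, 2)                 # permutes the two sparse dims
```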
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79707
Approved by: https://github.com/cpuhrsch
2022-06-25 08:49:58 +00:00
Nikita Vedeneev
03cf01bdc0
index_select for COO CUDA tensors. (#77551 )
...
Brings a native CUDA implementation for `index_select`. On master, CUDA tensors are silently converted to CPU (and back) to provide CUDA support.
The `nnz >> size` case could be optimized along the lines of https://github.com/pytorch/pytorch/pull/72710 .
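A hedged sketch of the benchmarked call (requires a CUDA build; with this PR the kernel runs natively on GPU instead of round-tripping through CPU):
```python
import torch

s = torch.eye(4, device="cuda").to_sparse().coalesce()
idx = torch.tensor([0, 2], device="cuda")
rows = s.index_select(0, idx)  # result is still a sparse COO tensor
```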
Some benchmarks:
<details>
<summary>PR/torch_sparse/master</summary>
```
[------------------------------- cuda coo.index_select -------------------------------]
| PR | torch_sparse | master
32 threads: ---------------------------------------------------------------------------
n=10000, nnz=100, index_len=100, dim=0 | 96 | 327 | 70
n=10000, nnz=100, index_len=100, dim=1 | 120 | 505 | 74
n=10000, nnz=100, index_len=1000, dim=0 | 90 | 333 | 93
n=10000, nnz=100, index_len=1000, dim=1 | 120 | 499 | 98
n=10000, nnz=100, index_len=10000, dim=0 | 92 | 331 | 350
n=10000, nnz=100, index_len=10000, dim=1 | 100 | 506 | 352
n=100000, nnz=1000, index_len=100, dim=0 | 53 | 274 | 60
n=100000, nnz=1000, index_len=100, dim=1 | 90 | 368 | 71
n=100000, nnz=1000, index_len=1000, dim=0 | 93 | 332 | 100
n=100000, nnz=1000, index_len=1000, dim=1 | 130 | 501 | 140
n=100000, nnz=1000, index_len=10000, dim=0 | 100 | 341 | 522
n=100000, nnz=1000, index_len=10000, dim=1 | 130 | 530 | 549
n=1000000, nnz=10000, index_len=100, dim=0 | 90 | 429 | 110
n=1000000, nnz=10000, index_len=100, dim=1 | 296 | 810 | 355
n=1000000, nnz=10000, index_len=1000, dim=0 | 100 | 435 | 170
n=1000000, nnz=10000, index_len=1000, dim=1 | 309 | 830 | 548
n=1000000, nnz=10000, index_len=10000, dim=0 | 110 | 446 | 750
n=1000000, nnz=10000, index_len=10000, dim=1 | 310 | 830 | 1000
n=10, nnz=100, index_len=100, dim=0 | 90 | 333 | 74
n=10, nnz=100, index_len=100, dim=1 | 100 | 497 | 78
n=10, nnz=100, index_len=1000, dim=0 | 90 | 329 | 140
n=10, nnz=100, index_len=1000, dim=1 | 100 | 800 | 100
n=10, nnz=100, index_len=10000, dim=0 | 93 | 340 | 900
n=10, nnz=100, index_len=10000, dim=1 | 120 | 800 | 489
n=10, nnz=1000, index_len=100, dim=0 | 90 | 321 | 140
n=10, nnz=1000, index_len=100, dim=1 | 100 | 680 | 140
n=10, nnz=1000, index_len=1000, dim=0 | 110 | 349 | 670
n=10, nnz=1000, index_len=1000, dim=1 | 130 | 740 | 800
n=10, nnz=1000, index_len=10000, dim=0 | 302 | 503 | 4882
n=10, nnz=1000, index_len=10000, dim=1 | 325 | 2257 | 5262
n=10, nnz=10000, index_len=100, dim=0 | 229 | 349 | 810
n=10, nnz=10000, index_len=100, dim=1 | 433 | 870 | 700
n=10, nnz=10000, index_len=1000, dim=0 | 666 | 502 | 5581
n=10, nnz=10000, index_len=1000, dim=1 | 826 | 2379 | 4820
n=10, nnz=10000, index_len=10000, dim=0 | 2534 | 2700 | 80000
n=10, nnz=10000, index_len=10000, dim=1 | 2723 | 18540 | 80000
n=100, nnz=1000, index_len=100, dim=0 | 94 | 324 | 110
n=100, nnz=1000, index_len=100, dim=1 | 100 | 499 | 110
n=100, nnz=1000, index_len=1000, dim=0 | 96 | 337 | 150
n=100, nnz=1000, index_len=1000, dim=1 | 130 | 800 | 140
n=100, nnz=1000, index_len=10000, dim=0 | 100 | 346 | 900
n=100, nnz=1000, index_len=10000, dim=1 | 130 | 760 | 900
n=100, nnz=10000, index_len=100, dim=0 | 90 | 323 | 190
n=100, nnz=10000, index_len=100, dim=1 | 279 | 800 | 180
n=100, nnz=10000, index_len=1000, dim=0 | 110 | 339 | 781
n=100, nnz=10000, index_len=1000, dim=1 | 294 | 870 | 800
n=100, nnz=10000, index_len=10000, dim=0 | 315 | 505 | 6264
n=100, nnz=10000, index_len=10000, dim=1 | 497 | 2398 | 5404
n=1000, nnz=10000, index_len=100, dim=0 | 90 | 333 | 160
n=1000, nnz=10000, index_len=100, dim=1 | 279 | 635 | 150
n=1000, nnz=10000, index_len=1000, dim=0 | 100 | 328 | 215
n=1000, nnz=10000, index_len=1000, dim=1 | 287 | 810 | 207
n=1000, nnz=10000, index_len=10000, dim=0 | 100 | 339 | 900
n=1000, nnz=10000, index_len=10000, dim=1 | 291 | 880 | 1000
n=1000, nnz=100000, index_len=100, dim=0 | 92 | 358 | 435
n=1000, nnz=100000, index_len=100, dim=1 | 302 | 900 | 530
n=1000, nnz=100000, index_len=1000, dim=0 | 130 | 360 | 1000
n=1000, nnz=100000, index_len=1000, dim=1 | 329 | 930 | 1200
n=1000, nnz=100000, index_len=10000, dim=0 | 343 | 530 | 7000
n=1000, nnz=100000, index_len=10000, dim=1 | 545 | 2446 | 6100
n=1000, nnz=1000000, index_len=100, dim=0 | 355 | 394 | 2210
n=1000, nnz=1000000, index_len=100, dim=1 | 1660 | 2276 | 2674
n=1000, nnz=1000000, index_len=1000, dim=0 | 877 | 574 | 6700
n=1000, nnz=1000000, index_len=1000, dim=1 | 2449 | 3782 | 9000
n=1000, nnz=1000000, index_len=10000, dim=0 | 3112 | 2931 | 57000
n=1000, nnz=1000000, index_len=10000, dim=1 | 7340 | 20220 | 65700
Times are in microseconds (us).
```
</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77551
Approved by: https://github.com/cpuhrsch
2022-06-01 17:39:03 +00:00
Mike Ruberry
089203f8bc
Updates floor_divide to perform floor division ( #78411 )
...
Fixes https://github.com/pytorch/pytorch/issues/43874
This PR changes floor_divide to perform floor division instead of truncation division.
This is a BC-breaking change, but it's a "bug fix," and we've already warned users for several releases that this behavior would change.
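The behavior change in one line: floor division rounds toward negative infinity (matching Python's `//`), whereas truncation rounded toward zero.
```python
import torch

torch.floor_divide(torch.tensor(-5.0), torch.tensor(2.0))  # -3.0 now; -2.0 under truncation
```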
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78411
Approved by: https://github.com/ngimel
2022-05-29 21:28:45 +00:00
Nikita Vedeneev
00a1fb64bb
Faster index_select for sparse COO tensors on CPU. ( #72710 )
...
Fixes https://github.com/pytorch/pytorch/issues/72212 .
This PR improves on the previous algorithm's complexity. It also exploits the structure of the problem and parallelizes computations when possible.
Benchmark results.
<details>
<summary>Testing script</summary>
```python
import torch
import math
from IPython import get_ipython
from itertools import product
import pickle
from torch.utils.benchmark import Timer, Compare

torch.manual_seed(13)
#torch.set_num_threads(1)
ipython = get_ipython()

index_sizes = (100, 1000, 10000)
# specifies (n, nnz)
problem_dims = (
    # n > nnz
    (10000, 100),
    (100000, 1000),
    (1000000, 10000),
    # n < nnz
    (10, 100),
    (10, 1000),
    (10, 10000),
    (100, 1000),
    (100, 10000),
    (1000, 10000),
    (1000, 100000),
    (1000, 1000000),
    #(1000000, 1000000000),
)

def f(t, d, index):
    # Used only for the torch_sparse comparison run; requires `import torch_sparse`.
    s = torch_sparse.SparseTensor.from_torch_sparse_coo_tensor(t)
    ss = s.index_select(d, index)
    return ss.coo()

name = "PR"
results = []
for (n, nnz), m in product(problem_dims, index_sizes):
    for d in (0, 1):
        if nnz < n:
            shape = (n, n)
        else:
            shape = (n, nnz // n) if d == 0 else (nnz // n, n)
        nrows, ncols = shape
        rowidx = torch.randint(low=0, high=nrows, size=(nnz,))
        colidx = torch.randint(low=0, high=ncols, size=(nnz,))
        itemidx = torch.vstack((rowidx, colidx))
        xvalues = torch.randn(nnz)
        index = torch.randint(low=0, high=n, size=(m,))
        SparseX = torch.sparse_coo_tensor(itemidx, xvalues, size=shape).coalesce()
        smtp = "SparseX.index_select(d, index)"
        timer = Timer(smtp,
                      globals=globals(),
                      label="coo.index_select",
                      description=f"{name}: coo.index_select",
                      sub_label=f"n={n}, nnz={nnz}, index_len={m}, dim={d}",
                      num_threads=torch.get_num_threads())
        results.append(timer.blocked_autorange())

compare = Compare(results)
compare.trim_significant_figures()
compare.print()

with open(f"{name}_index_select.pickle", 'wb') as f:
    pickle.dump(results, f)
```
</details>
<details>
<summary>Gather results</summary>
```python
import pickle
from torch.utils.benchmark import Timer, Compare

files = [
    "PR",
    "torch_sparse",
    "master",
]

timers = []
for name in files:
    with open("{}_index_select.pickle".format(name), 'rb') as f:
        timers += pickle.load(f)

compare = Compare(timers)
compare.trim_significant_figures()
compare.print()
```
</details>
<details>
<summary>PR/torch_sparse/master runtime comparison</summary>
```
[----------------------------------- coo.index_select ----------------------------------]
| PR | torch_sparse | master
32 threads: -----------------------------------------------------------------------------
n=10000, nnz=100, index_len=100, dim=0 | 14 | 140 | 10
n=10000, nnz=100, index_len=100, dim=1 | 14 | 200 | 10
n=10000, nnz=100, index_len=1000, dim=0 | 30 | 180 | 38
n=10000, nnz=100, index_len=1000, dim=1 | 34 | 240 | 38
n=10000, nnz=100, index_len=10000, dim=0 | 278 | 460 | 330
n=10000, nnz=100, index_len=10000, dim=1 | 275 | 516 | 330
n=100000, nnz=1000, index_len=100, dim=0 | 16 | 290 | 31
n=100000, nnz=1000, index_len=100, dim=1 | 26 | 390 | 31
n=100000, nnz=1000, index_len=1000, dim=0 | 45 | 405 | 263
n=100000, nnz=1000, index_len=1000, dim=1 | 73 | 500 | 261
n=100000, nnz=1000, index_len=10000, dim=0 | 444 | 783 | 2570
n=100000, nnz=1000, index_len=10000, dim=1 | 470 | 890 | 2590
n=1000000, nnz=10000, index_len=100, dim=0 | 25 | 2400 | 270
n=1000000, nnz=10000, index_len=100, dim=1 | 270 | 4000 | 269
n=1000000, nnz=10000, index_len=1000, dim=0 | 74 | 2600 | 2620
n=1000000, nnz=10000, index_len=1000, dim=1 | 464 | 3600 | 2640
n=1000000, nnz=10000, index_len=10000, dim=0 | 635 | 3300 | 26400
n=1000000, nnz=10000, index_len=10000, dim=1 | 1000 | 3960 | 26400
n=10, nnz=100, index_len=100, dim=0 | 16 | 137 | 16
n=10, nnz=100, index_len=100, dim=1 | 16 | 220 | 16
n=10, nnz=100, index_len=1000, dim=0 | 63 | 238 | 81
n=10, nnz=100, index_len=1000, dim=1 | 60 | 698 | 78
n=10, nnz=100, index_len=10000, dim=0 | 480 | 940 | 862
n=10, nnz=100, index_len=10000, dim=1 | 330 | 4930 | 1070
n=10, nnz=1000, index_len=100, dim=0 | 60 | 200 | 73
n=10, nnz=1000, index_len=100, dim=1 | 56 | 683 | 70
n=10, nnz=1000, index_len=1000, dim=0 | 480 | 530 | 1050
n=10, nnz=1000, index_len=1000, dim=1 | 330 | 4550 | 1368
n=10, nnz=1000, index_len=10000, dim=0 | 3100 | 2900 | 9300
n=10, nnz=1000, index_len=10000, dim=1 | 3400 | 46000 | 9100
n=10, nnz=10000, index_len=100, dim=0 | 400 | 453 | 857
n=10, nnz=10000, index_len=100, dim=1 | 400 | 4070 | 1730
n=10, nnz=10000, index_len=1000, dim=0 | 2840 | 2600 | 13900
n=10, nnz=10000, index_len=1000, dim=1 | 3700 | 40600 | 16000
n=10, nnz=10000, index_len=10000, dim=0 | 83200 | 67400 | 160000
n=10, nnz=10000, index_len=10000, dim=1 | 68000 | 528000 | 190000
n=100, nnz=1000, index_len=100, dim=0 | 46 | 148 | 31
n=100, nnz=1000, index_len=100, dim=1 | 45 | 242 | 37
n=100, nnz=1000, index_len=1000, dim=0 | 68 | 248 | 240
n=100, nnz=1000, index_len=1000, dim=1 | 66 | 755 | 290
n=100, nnz=1000, index_len=10000, dim=0 | 370 | 802 | 2250
n=100, nnz=1000, index_len=10000, dim=1 | 372 | 5430 | 2770
n=100, nnz=10000, index_len=100, dim=0 | 82 | 210 | 224
n=100, nnz=10000, index_len=100, dim=1 | 74 | 986 | 270
n=100, nnz=10000, index_len=1000, dim=0 | 350 | 618 | 2600
n=100, nnz=10000, index_len=1000, dim=1 | 370 | 4660 | 4560
n=100, nnz=10000, index_len=10000, dim=0 | 3000 | 3400 | 41680
n=100, nnz=10000, index_len=10000, dim=1 | 5000 | 47500 | 30400
n=1000, nnz=10000, index_len=100, dim=0 | 71 | 160 | 185
n=1000, nnz=10000, index_len=100, dim=1 | 64 | 516 | 190
n=1000, nnz=10000, index_len=1000, dim=0 | 100 | 249 | 1740
n=1000, nnz=10000, index_len=1000, dim=1 | 98 | 1030 | 1770
n=1000, nnz=10000, index_len=10000, dim=0 | 600 | 808 | 18300
n=1000, nnz=10000, index_len=10000, dim=1 | 663 | 5300 | 18500
n=1000, nnz=100000, index_len=100, dim=0 | 160 | 258 | 1890
n=1000, nnz=100000, index_len=100, dim=1 | 200 | 3620 | 2050
n=1000, nnz=100000, index_len=1000, dim=0 | 500 | 580 | 18700
n=1000, nnz=100000, index_len=1000, dim=1 | 640 | 7550 | 30000
n=1000, nnz=100000, index_len=10000, dim=0 | 3400 | 3260 | 186000
n=1000, nnz=100000, index_len=10000, dim=1 | 3600 | 49600 | 194000
n=1000, nnz=1000000, index_len=100, dim=0 | 517 | 957 | 18700
n=1000, nnz=1000000, index_len=100, dim=1 | 680 | 39600 | 37600
n=1000, nnz=1000000, index_len=1000, dim=0 | 3600 | 4500 | 186000
n=1000, nnz=1000000, index_len=1000, dim=1 | 5800 | 76400 | 190000
n=1000, nnz=1000000, index_len=10000, dim=0 | 50000 | 67900 | 1800000
n=1000, nnz=1000000, index_len=10000, dim=1 | 45000 | 570000 | 1900000
Times are in microseconds (us).
```
</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72710
Approved by: https://github.com/pearu , https://github.com/cpuhrsch
2022-05-10 16:33:13 +00:00
PyTorch MergeBot
8d67972b14
Revert "Faster index_select for sparse COO tensors on CPU. ( #72710 )"
...
This reverts commit ce3857e73c .
Reverted https://github.com/pytorch/pytorch/pull/72710 on behalf of https://github.com/malfet
2022-05-10 14:43:05 +00:00
Nikita Vedeneev
ce3857e73c
Faster index_select for sparse COO tensors on CPU. ( #72710 )
...
Fixes https://github.com/pytorch/pytorch/issues/72212 .
This PR improves on the previous algorithm's complexity. It also exploits the structure of the problem and parallelizes computations when possible.
Benchmark results.
<details>
<summary>Testing script</summary>
```python
import torch
import math
from IPython import get_ipython
from itertools import product
import pickle
from torch.utils.benchmark import Timer, Compare

torch.manual_seed(13)
#torch.set_num_threads(1)
ipython = get_ipython()

index_sizes = (100, 1000, 10000)
# specifies (n, nnz)
problem_dims = (
    # n > nnz
    (10000, 100),
    (100000, 1000),
    (1000000, 10000),
    # n < nnz
    (10, 100),
    (10, 1000),
    (10, 10000),
    (100, 1000),
    (100, 10000),
    (1000, 10000),
    (1000, 100000),
    (1000, 1000000),
    #(1000000, 1000000000),
)

def f(t, d, index):
    # Used only for the torch_sparse comparison run; requires `import torch_sparse`.
    s = torch_sparse.SparseTensor.from_torch_sparse_coo_tensor(t)
    ss = s.index_select(d, index)
    return ss.coo()

name = "PR"
results = []
for (n, nnz), m in product(problem_dims, index_sizes):
    for d in (0, 1):
        if nnz < n:
            shape = (n, n)
        else:
            shape = (n, nnz // n) if d == 0 else (nnz // n, n)
        nrows, ncols = shape
        rowidx = torch.randint(low=0, high=nrows, size=(nnz,))
        colidx = torch.randint(low=0, high=ncols, size=(nnz,))
        itemidx = torch.vstack((rowidx, colidx))
        xvalues = torch.randn(nnz)
        index = torch.randint(low=0, high=n, size=(m,))
        SparseX = torch.sparse_coo_tensor(itemidx, xvalues, size=shape).coalesce()
        smtp = "SparseX.index_select(d, index)"
        timer = Timer(smtp,
                      globals=globals(),
                      label="coo.index_select",
                      description=f"{name}: coo.index_select",
                      sub_label=f"n={n}, nnz={nnz}, index_len={m}, dim={d}",
                      num_threads=torch.get_num_threads())
        results.append(timer.blocked_autorange())

compare = Compare(results)
compare.trim_significant_figures()
compare.print()

with open(f"{name}_index_select.pickle", 'wb') as f:
    pickle.dump(results, f)
```
</details>
<details>
<summary>Gather results</summary>
```python
import pickle
from torch.utils.benchmark import Timer, Compare

files = [
    "PR",
    "torch_sparse",
    "master",
]

timers = []
for name in files:
    with open("{}_index_select.pickle".format(name), 'rb') as f:
        timers += pickle.load(f)

compare = Compare(timers)
compare.trim_significant_figures()
compare.print()
```
</details>
<details>
<summary>PR/torch_sparse/master runtime comparison</summary>
```
[----------------------------------- coo.index_select ----------------------------------]
| PR | torch_sparse | master
32 threads: -----------------------------------------------------------------------------
n=10000, nnz=100, index_len=100, dim=0 | 14 | 140 | 10
n=10000, nnz=100, index_len=100, dim=1 | 14 | 200 | 10
n=10000, nnz=100, index_len=1000, dim=0 | 30 | 180 | 38
n=10000, nnz=100, index_len=1000, dim=1 | 34 | 240 | 38
n=10000, nnz=100, index_len=10000, dim=0 | 278 | 460 | 330
n=10000, nnz=100, index_len=10000, dim=1 | 275 | 516 | 330
n=100000, nnz=1000, index_len=100, dim=0 | 16 | 290 | 31
n=100000, nnz=1000, index_len=100, dim=1 | 26 | 390 | 31
n=100000, nnz=1000, index_len=1000, dim=0 | 45 | 405 | 263
n=100000, nnz=1000, index_len=1000, dim=1 | 73 | 500 | 261
n=100000, nnz=1000, index_len=10000, dim=0 | 444 | 783 | 2570
n=100000, nnz=1000, index_len=10000, dim=1 | 470 | 890 | 2590
n=1000000, nnz=10000, index_len=100, dim=0 | 25 | 2400 | 270
n=1000000, nnz=10000, index_len=100, dim=1 | 270 | 4000 | 269
n=1000000, nnz=10000, index_len=1000, dim=0 | 74 | 2600 | 2620
n=1000000, nnz=10000, index_len=1000, dim=1 | 464 | 3600 | 2640
n=1000000, nnz=10000, index_len=10000, dim=0 | 635 | 3300 | 26400
n=1000000, nnz=10000, index_len=10000, dim=1 | 1000 | 3960 | 26400
n=10, nnz=100, index_len=100, dim=0 | 16 | 137 | 16
n=10, nnz=100, index_len=100, dim=1 | 16 | 220 | 16
n=10, nnz=100, index_len=1000, dim=0 | 63 | 238 | 81
n=10, nnz=100, index_len=1000, dim=1 | 60 | 698 | 78
n=10, nnz=100, index_len=10000, dim=0 | 480 | 940 | 862
n=10, nnz=100, index_len=10000, dim=1 | 330 | 4930 | 1070
n=10, nnz=1000, index_len=100, dim=0 | 60 | 200 | 73
n=10, nnz=1000, index_len=100, dim=1 | 56 | 683 | 70
n=10, nnz=1000, index_len=1000, dim=0 | 480 | 530 | 1050
n=10, nnz=1000, index_len=1000, dim=1 | 330 | 4550 | 1368
n=10, nnz=1000, index_len=10000, dim=0 | 3100 | 2900 | 9300
n=10, nnz=1000, index_len=10000, dim=1 | 3400 | 46000 | 9100
n=10, nnz=10000, index_len=100, dim=0 | 400 | 453 | 857
n=10, nnz=10000, index_len=100, dim=1 | 400 | 4070 | 1730
n=10, nnz=10000, index_len=1000, dim=0 | 2840 | 2600 | 13900
n=10, nnz=10000, index_len=1000, dim=1 | 3700 | 40600 | 16000
n=10, nnz=10000, index_len=10000, dim=0 | 83200 | 67400 | 160000
n=10, nnz=10000, index_len=10000, dim=1 | 68000 | 528000 | 190000
n=100, nnz=1000, index_len=100, dim=0 | 46 | 148 | 31
n=100, nnz=1000, index_len=100, dim=1 | 45 | 242 | 37
n=100, nnz=1000, index_len=1000, dim=0 | 68 | 248 | 240
n=100, nnz=1000, index_len=1000, dim=1 | 66 | 755 | 290
n=100, nnz=1000, index_len=10000, dim=0 | 370 | 802 | 2250
n=100, nnz=1000, index_len=10000, dim=1 | 372 | 5430 | 2770
n=100, nnz=10000, index_len=100, dim=0 | 82 | 210 | 224
n=100, nnz=10000, index_len=100, dim=1 | 74 | 986 | 270
n=100, nnz=10000, index_len=1000, dim=0 | 350 | 618 | 2600
n=100, nnz=10000, index_len=1000, dim=1 | 370 | 4660 | 4560
n=100, nnz=10000, index_len=10000, dim=0 | 3000 | 3400 | 41680
n=100, nnz=10000, index_len=10000, dim=1 | 5000 | 47500 | 30400
n=1000, nnz=10000, index_len=100, dim=0 | 71 | 160 | 185
n=1000, nnz=10000, index_len=100, dim=1 | 64 | 516 | 190
n=1000, nnz=10000, index_len=1000, dim=0 | 100 | 249 | 1740
n=1000, nnz=10000, index_len=1000, dim=1 | 98 | 1030 | 1770
n=1000, nnz=10000, index_len=10000, dim=0 | 600 | 808 | 18300
n=1000, nnz=10000, index_len=10000, dim=1 | 663 | 5300 | 18500
n=1000, nnz=100000, index_len=100, dim=0 | 160 | 258 | 1890
n=1000, nnz=100000, index_len=100, dim=1 | 200 | 3620 | 2050
n=1000, nnz=100000, index_len=1000, dim=0 | 500 | 580 | 18700
n=1000, nnz=100000, index_len=1000, dim=1 | 640 | 7550 | 30000
n=1000, nnz=100000, index_len=10000, dim=0 | 3400 | 3260 | 186000
n=1000, nnz=100000, index_len=10000, dim=1 | 3600 | 49600 | 194000
n=1000, nnz=1000000, index_len=100, dim=0 | 517 | 957 | 18700
n=1000, nnz=1000000, index_len=100, dim=1 | 680 | 39600 | 37600
n=1000, nnz=1000000, index_len=1000, dim=0 | 3600 | 4500 | 186000
n=1000, nnz=1000000, index_len=1000, dim=1 | 5800 | 76400 | 190000
n=1000, nnz=1000000, index_len=10000, dim=0 | 50000 | 67900 | 1800000
n=1000, nnz=1000000, index_len=10000, dim=1 | 45000 | 570000 | 1900000
Times are in microseconds (us).
```
</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72710
Approved by: https://github.com/pearu , https://github.com/cpuhrsch
2022-05-09 19:59:39 +00:00
Jane Xu
6d9dbd3391
Manually skip test_sparse_addmm as disable code is not working for now ( #77076 )
...
Related to https://github.com/pytorch/pytorch/issues/73145
It was previously skipped on Linux and Windows, but macOS has become a problem as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77076
Approved by: https://github.com/ezyang
2022-05-09 13:54:29 +00:00
Mikayla Gawarecki
0adf070574
Use scatter_reduce to support masked reductions on sparse COO tensors (sum, prod, amin, amax)
...
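A toy model of the approach named in the title (not the PR's actual implementation): a masked sum over dim=1 of a 2-D COO tensor, computed by scatter-reducing the values grouped by their row index.
```python
import torch

a = torch.tensor([[0., 2.], [3., 0.]]).to_sparse().coalesce()
rows = a.indices()[0]                       # row index of each stored value
out = torch.zeros(a.shape[0]).scatter_reduce(
    0, rows, a.values(), reduce="sum", include_self=False)
print(out)  # tensor([2., 3.]); swap reduce= for "prod"/"amin"/"amax"
```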
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75454
Approved by: https://github.com/cpuhrsch
2022-05-06 15:40:22 +00:00
PyTorch MergeBot
381e08309f
Revert "Use scatter_reduce to support masked reductions on sparse COO tensors (sum, prod, amin, amax)"
...
This reverts commit fc2a2e8b72 .
Reverted https://github.com/pytorch/pytorch/pull/75454 on behalf of https://github.com/b0noI
2022-05-04 22:31:31 +00:00
Mikayla Gawarecki
fc2a2e8b72
Use scatter_reduce to support masked reductions on sparse COO tensors (sum, prod, amin, amax)
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75454
Approved by: https://github.com/cpuhrsch
2022-05-03 23:17:07 +00:00
arindamroy-eng
7478ce187a
ROCM:Unskip more tests for ROCM5.0
...
Re-enabling more tests that now pass on ROCm 5.0.
Fixes #ISSUE_NUMBER
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75353
Approved by: https://github.com/ezyang
2022-04-19 19:45:55 +00:00
Pearu Peterson
a98b4666e0
Enable test_sparse_mask for Windows
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75189
Approved by: https://github.com/cpuhrsch
2022-04-11 17:21:29 +00:00
Brian Hirsh
1b7d7d9327
Reland: "free up dispatch key space (in C++)" ( #74963 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74963
This is a re-land of D35192346 (9872a06d77 ) and D35192317 (a9216cde6c ), which together change the internal representation of `DispatchKeySet` in pytorch core to free up dispatch-key space. See a more detailed description of the design in the original PR: https://github.com/pytorch/pytorch/pull/69633 .
The original PR broke Milan workflows, which use a pytorch mobile build, and manifested as a memory corruption bug inside of `liboacrmerged.so`.
**Background: Existing Mobile Optimization**
Pytorch mobile builds have an existing optimization (here cc23725e89/c10/core/DispatchKey.h (L382) and here cc23725e89/aten/src/ATen/core/dispatch/OperatorEntry.h (L214) ), which works as follows:
Every operator in pytorch has a "dispatch table" of function pointers, corresponding to all of the (up to 64) different kernels that we might dispatch to when we run an operator in pytorch (autograd, cpu, cuda, complex number support, etc).
In mobile builds, the size of that table is shrunk from 64 to 8 to save a bunch of space, because mobile doesn't end up using the functionality associated with most dispatch keys.
The dispatcher also has a notion of "fallback kernels", which are kernels that you can register to a particular dispatch key, but should be able to work for "any operator". The array of fallback kernels is defined here: cc23725e89/aten/src/ATen/core/dispatch/Dispatcher.h (L294) .
The mobile optimization currently does **not** extend to this array (it wouldn't be that useful anyway, because there is only one global array of fallback kernels, whereas there is a separate dispatch table of function pointers per operator). So the per-operator tables on mobile are size 8, while the fallback table is size 64.
**The Bug**
This PR makes it difficult to enable that optimization separately for the per-operator arrays vs. the fallback array, and it incidentally shrank the fallback array from 64 to 8 entries for mobile (that happened on this line: https://github.com/pytorch/pytorch/pull/69633/files#diff-f735cd7aa68f15b624100cbc4bb3b5ea76ffc7c9d3bec3b0ccabaa09609e5319R294 ).
That isn't a problem by itself (since mobile doesn't actually use any of the fallbacks that can no longer be stored). However, pytorch core will still register all of those fallback kernels on startup in mobile builds, even if they aren't used. When we tried to register one of those fallbacks on startup, it would try to dump the kernel somewhere in memory past the bounds of the (now smaller) array inside of the `Dispatcher` object, `backendFallbackKernels_`.
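To make the failure mode concrete, a toy Python model of the mismatch (purely illustrative; the real code is C++ in `aten/src/ATen/core/dispatch/Dispatcher.h`):
```python
NUM_KEYS_FULL = 64
NUM_KEYS_MOBILE = 8

backend_fallback_kernels = [None] * NUM_KEYS_MOBILE   # array shrunk on mobile

for key in range(NUM_KEYS_FULL):                      # startup still walks all 64 keys
    backend_fallback_kernels[key] = f"fallback_{key}" # IndexError at key 8 in Python;
    # the C++ equivalent is a silent out-of-bounds write (memory corruption)
```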
**Why didn't this problem show up in OSS CI? Why didn't it break other internal mobile workflows aside from Milan?**
Ideally, this failure would show up as part of the OSS signal on GitHub, since we already have mobile OSS builds. Given that it was another memory corruption issue that only affected Milan (subset of mobile), I'm not sure what's specific about Milan's builds that caused it only to manifest there. dreiss I wonder if there's another flavor of mobile builds we could run in OSS CI that could potentially help catch this?
**The debugging experience was pretty difficult**
Debugging the Milan-specific failure was made difficult by the following:
(1) lack of CI
- the original Milan failure didn't surface on my original diff, because the Milan job(s) that failed weren't triggered to run on pytorch changes. There's probably a balance to strike here, since those jobs will only be useful if they aren't flaky, and if they can produce reliable failure logs for debugging.
(2) It's difficult to get a repro.
- my work laptop doesn't have the right specs to run the Milan development workflow (not enough disk space)
- There is an existing OnDemand workflow for Milan, but it appears to be relatively new, and after a bunch of help from MarcioPorto, we ran into issues forwarding the log output from Milan tests on the emulator back to the terminal (see the original discussion here: https://fb.workplace.com/groups/OnDemandFRL/permalink/1424937774645433/ )
(3) Lack of stack-traces.
- Most Milan failures didn't include actionable stack traces. phding generously helped me debug by running my suggested patches locally, and reporting back if there were any failures. The failing test didn't include a stack trace though (just the line where the crash appeared), so I ended up making some educated guesses about what the issue was based on the area of the crash.
ghstack-source-id: 152688542
Test Plan: Confirmed with phding that the broken Milan workflow from the previous version of this diff is now passing.
Reviewed By: phding, albanD
Differential Revision: D35222806
fbshipit-source-id: 0ad115a0f768bc8ea5d4c203b2990254c7092d30
(cherry picked from commit 002b91966f11fd55ab3fa3801b636fa39a6dd12c)
2022-03-31 21:52:38 +00:00
Nikita Shulga
bfac65dfe5
[testing] Update dispatch macros ( #74977 )
...
This PR is reland of #74289
Co-authored-by: Khushi Agrawal <khushiagrawal411@gmail.com>
2022-03-30 14:13:21 -07:00
PyTorch MergeBot
2e4152b118
Revert "[testing] Update dispatch macros"
...
This reverts commit eed19a0f38 .
Reverted https://github.com/pytorch/pytorch/pull/74289 on behalf of https://github.com/malfet
2022-03-30 19:52:37 +00:00
Khushi Agrawal
eed19a0f38
[testing] Update dispatch macros
...
Hi,
This PR is the follow-up PR of #71561 . (the previous PR had a couple of merge conflicts and was reverted, this PR resolves that).
Please take a look. Thanks!
cc: @pmeier @mruberry @kshitij12345
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74289
Approved by: https://github.com/pmeier , https://github.com/mruberry
2022-03-30 16:10:16 +00:00
Brian Hirsh
9872a06d77
Back out "free up dispatch key space (in C++)" ( #74859 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74859
Original commit changeset: 6d1dd0fd8144
Original Phabricator Diff: D34227616 (2cbddc0e9b )
ghstack-source-id: 152381077
(Note: this ignores all push blocking failures!)
Test Plan:
Test on Milan with "get weather utterance"
buck build fbsource//fbandroid/mode/opt fbsource//fbandroid/mode/milan_build_rdk //fbandroid/apps/wearable/system/speechservice:speechservice_target30_xhdpi_armv7_release_debug_keystore -c pt.has_backtaces=1
Reviewed By: phding
Differential Revision: D35192346
fbshipit-source-id: b962de5d5effaf23f9aa8afd3ef36f8c6383de5b
(cherry picked from commit 913e3027a11457aaa2d97a9d89ebc6133b14213c)
2022-03-29 15:39:17 +00:00
Christian Puhrsch
e55b73d65a
Add strided layout support for to_dense
...
Fixes #59958
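A minimal illustration, assuming the change lets `to_dense()` accept already-strided input (where it is effectively a no-op):
```python
import torch

t = torch.randn(2, 2)                       # already strided
assert t.to_dense().layout == torch.strided # no longer an error on strided input
```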
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74486
Approved by: https://github.com/pearu , https://github.com/suo
2022-03-29 00:12:48 +00:00
Pearu Peterson
ebeea9e2ea
Support masked sum on sparse COO tensors.
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71239
Approved by: https://github.com/cpuhrsch
2022-03-25 18:26:39 +00:00
Brian Hirsh
2cbddc0e9b
free up dispatch key space (in C++) ( #72827 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72827
Reland of D34034848 (6690256021 )
ghstack-source-id: 152161452
Test Plan: Confirm that Milan tests are passing
Reviewed By: ezyang
Differential Revision: D34227616
fbshipit-source-id: 6d1dd0fd8144dfbd9e194cd7564cce017e7db968
(cherry picked from commit e5c1b29fedd5c2a0bad810cedc94aa784136b6aa)
2022-03-25 17:04:51 +00:00
Nikita Shulga
ef066f0832
Revert D34856571: [pytorch][PR] Replace get_all_ type macros with the ATen dispatch macros.
...
Test Plan: revert-hammer
Differential Revision:
D34856571 (3ded7b1da3 )
Original commit changeset: 0dca038bcad5
Original Phabricator Diff: D34856571 (3ded7b1da3 )
fbshipit-source-id: 594553fa0b710d78beba59d5d2b646f1f1270386
(cherry picked from commit 8090eb9b12dcf452a9e7dc01792a66fb91b563b6)
2022-03-15 22:07:11 +00:00
Khushi Agrawal
3ded7b1da3
Replace get_all_ type macros with the ATen dispatch macros. ( #71561 )
...
Summary:
Hi, Team!
The PR is motivated by https://github.com/pytorch/pytorch/pull/71153#discussion_r782446738 . It aims to replace the `get_all_*` type macros with the ATen dispatch macros; a sketch of the replacement pattern follows the file list below.
The files it iterates over are: (Thanks, Lezcano, for the idea!!)
<details>
<summary>
`test/test_autograd.py`</summary>
<p>
```python
43:from torch.testing._internal.common_dtype import get_all_dtypes
8506: floating_dt = [dt for dt in get_all_dtypes() if dt.is_floating_point]
```
</p>
</details>
<details>
<summary>
`test/test_binary_ufuncs.py`</summary>
<p>
```python
26: all_types_and_complex_and, integral_types_and, get_all_dtypes, get_all_int_dtypes, get_all_math_dtypes,
27: get_all_complex_dtypes, get_all_fp_dtypes,
935: dtypes(*get_all_dtypes(include_bool=False, include_complex=False))
1035: dtypes(*get_all_dtypes(
1488: dtypes(*(get_all_dtypes(include_bool=False, include_bfloat16=False)))
1879: dtypes(*product(get_all_dtypes(include_complex=False), get_all_dtypes(include_complex=False)))
1887: dtypes(*(get_all_int_dtypes() + [torch.bool]))
1913: dtypes(*(get_all_fp_dtypes()))
1941: dtypes(*(get_all_fp_dtypes()))
1977: dtypes(*product(get_all_complex_dtypes(), get_all_dtypes()))
2019: dtypes(*product(get_all_fp_dtypes(), get_all_fp_dtypes()))
2048: dtypes(*get_all_dtypes())
2110: dtypes(*product(get_all_dtypes(include_complex=False),
2111: get_all_dtypes(include_complex=False)))
2128: types = [torch.bool, torch.bfloat16] + get_all_int_dtypes()
2173: if dtypes[1] in get_all_fp_dtypes():
2178: dtypes(*product(get_all_fp_dtypes(),
2179: get_all_fp_dtypes()))
2260: dtypesIfCUDA(*set(get_all_math_dtypes('cuda')) - {torch.complex64, torch.complex128})
2261: dtypes(*set(get_all_math_dtypes('cpu')) - {torch.complex64, torch.complex128})
2273: dtypesIfCUDA(*set(get_all_math_dtypes('cuda')) - {torch.complex64, torch.complex128})
2274: dtypes(*set(get_all_math_dtypes('cpu')) - {torch.complex64, torch.complex128})
2307: dtypes(*get_all_math_dtypes('cpu'))
2319: dtypes(*get_all_fp_dtypes(include_bfloat16=False))
2331: dtypes(*get_all_int_dtypes())
2356: dtypes(*get_all_dtypes(include_bfloat16=False, include_bool=False, include_complex=False))
2393: if dtype in get_all_int_dtypes():
2614: dtypes(*get_all_dtypes())
2624: dtypes(*tuple(itertools.combinations_with_replacement(get_all_dtypes(), 2)))
2806: dtypes(*list(product(get_all_dtypes(include_complex=False),
2807: get_all_dtypes(include_complex=False))))
2866: dtypes(*list(product(get_all_complex_dtypes(),
2867: get_all_complex_dtypes())))
2902: dtypes(*product(get_all_dtypes(), get_all_dtypes()))
2906: dtypes(*product(get_all_dtypes(), get_all_dtypes()))
2910: dtypes(*product(get_all_dtypes(), get_all_dtypes()))
3019: dtypes = [torch.float, torch.double] + get_all_complex_dtypes()
3221: dtypes(*get_all_dtypes(include_complex=False))
3407: dtypes(*list(product(get_all_dtypes(include_bool=False),
3408: get_all_dtypes(include_bool=False))))
3504: dtypes(*product(get_all_dtypes(include_complex=False, include_bfloat16=False),
3505: get_all_dtypes(include_complex=False, include_bfloat16=False)))
3516: if x.dtype in get_all_int_dtypes() + [torch.bool]:
3643: dtypes(*product(get_all_dtypes(include_complex=False,
3645: get_all_dtypes(include_complex=False,
```
</p>
</details>
<details>
<summary>
`test/test_complex.py`</summary>
<p>
```python
6:from torch.testing._internal.common_dtype import get_all_complex_dtypes
11: dtypes(*get_all_complex_dtypes())
```
</p>
</details>
<details>
<summary>
`test/test_foreach.py`</summary>
<p>
```python
18: get_all_dtypes, get_all_int_dtypes, get_all_complex_dtypes, get_all_fp_dtypes,
142: if dtype in get_all_int_dtypes():
179: disable_fastpath = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool]
201: disable_fastpath = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool]
205: disable_fastpath |= dtype in get_all_int_dtypes() + [torch.bool]
211: disable_fastpath |= dtype not in get_all_complex_dtypes()
241: bool_int_div = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool]
246: disable_fastpath |= dtype in get_all_int_dtypes() + [torch.bool]
248: disable_fastpath |= dtype not in get_all_complex_dtypes()
250: disable_fastpath |= True and dtype not in get_all_complex_dtypes()
307: disable_fastpath = dtype in get_all_int_dtypes() + [torch.bool]
365: if opinfo.name == "_foreach_abs" and dtype in get_all_complex_dtypes():
376: ops(foreach_unary_op_db, dtypes=get_all_dtypes())
393: dtypes=get_all_dtypes(include_half=True, include_bfloat16=True, include_complex=False))
401: ops(foreach_minmax_op_db, dtypes=get_all_fp_dtypes(include_bfloat16=True, include_half=True))
426: if ord in (1, 2) and dtype in torch.testing.get_all_fp_dtypes():
439: dtypes(*get_all_dtypes())
449: ops(foreach_binary_op_db, dtypes=get_all_dtypes())
481: ops(foreach_binary_op_db, dtypes=get_all_dtypes())
536: if dtype in get_all_int_dtypes() + [torch.bool] and foreach_op == torch._foreach_div:
545: ops(foreach_binary_op_db, dtypes=get_all_dtypes())
637: ops(foreach_pointwise_op_db, allowed_dtypes=get_all_fp_dtypes(include_half=False, include_bfloat16=False))
```
</p>
</details>
<details>
<summary>
`test/test_linalg.py`</summary>
<p>
```python
29: all_types, floating_types, floating_and_complex_types, get_all_dtypes, get_all_int_dtypes, get_all_complex_dtypes,
30: get_all_fp_dtypes,
111: dtypes(*(get_all_dtypes()))
794: float_and_complex_dtypes = get_all_fp_dtypes() + get_all_complex_dtypes()
807: dtypes(*(get_all_int_dtypes()))
828: dtypes(*(get_all_fp_dtypes() + get_all_complex_dtypes()))
841: if dtype in get_all_complex_dtypes():
844: dtypes(*itertools.product(get_all_dtypes(),
845: get_all_dtypes()))
855: for dtypes0, dtypes1, dtypes2 in product(get_all_dtypes(), repeat=3):
5607: *get_all_fp_dtypes(include_half=not CUDA9, include_bfloat16=(CUDA11OrLater and SM53OrLater)))
5608: dtypes(*(set(get_all_dtypes()) - {torch.half, torch.bool}))
5644: dtypes(*(get_all_complex_dtypes() + get_all_fp_dtypes()))
6255: dtypesIfCUDA(*get_all_complex_dtypes(),
6256: *get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater)),
6292: dtypesIfCUDA(*get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater))))
6323: dtypesIfCUDA(*get_all_complex_dtypes(),
6324: *get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater))))
6325: dtypes(*get_all_complex_dtypes(), *get_all_fp_dtypes())
6358: dtypesIfCUDA(*([torch.float, torch.double] + get_all_complex_dtypes()))
6556: dtypes(*get_all_fp_dtypes(), *get_all_complex_dtypes())
6668: dtypes(*get_all_fp_dtypes(), *get_all_complex_dtypes())
6741: dtypes(*get_all_fp_dtypes(), *get_all_complex_dtypes())
```
</p>
</details>
<details>
<summary>
`test/test_nn.py`</summary>
<p>
```python
37:from torch.testing._internal.common_dtype import integral_types, get_all_fp_dtypes, get_all_math_dtypes
50: onlyNativeDeviceTypes, deviceCountAtLeast, largeTensorTest, expectedFailureMeta, skipMeta, get_all_device_types, \
8862: for device in get_all_device_types():
9629: for dt1 in get_all_math_dtypes(device):
9630: for dt2 in get_all_math_dtypes(device):
9631: for dt3 in get_all_math_dtypes(device):
9648: for input_dtype in get_all_math_dtypes(device):
9664: for input_dtype in get_all_math_dtypes(device):
13015: dtypes(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM))
13034: dtypes(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM))
13159: dtypes(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM))
17400: dtypesIfCUDA(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM))
17768: dtypesIfCUDA(*get_all_fp_dtypes())
17773: dtypesIfCUDA(*get_all_fp_dtypes())
17778: dtypesIfCUDA(*get_all_fp_dtypes())
17783: dtypesIfCUDA(*get_all_fp_dtypes())
17788: dtypesIfCUDA(*get_all_fp_dtypes())
17793: dtypesIfCUDA(*get_all_fp_dtypes())
17798: dtypesIfCUDA(*get_all_fp_dtypes())
17963: dtypesIfCUDA(*get_all_fp_dtypes())
17977: dtypesIfCUDA(*get_all_fp_dtypes())
18684: def test_cross_entropy_loss_prob_target_all_reductions(self, device):
```
</p>
</details>
<details>
<summary>
`test/test_numpy_interop.py`</summary>
<p>
```python
12:from torch.testing._internal.common_dtype import get_all_dtypes
399: dtypes(*get_all_dtypes())
```
</p>
</details>
<details>
<summary>
`test/test_ops.py`</summary>
<p>
```python
12:from torch.testing._internal.common_dtype import floating_and_complex_types_and, get_all_dtypes
86: for dtype in get_all_dtypes():
```
</p>
</details>
<details>
<summary>
`test/test_reductions.py`</summary>
<p>
```python
16: get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_complex_dtypes, get_all_fp_dtypes,
360: allowed_dtypes=get_all_dtypes(include_bfloat16=False))
366: allowed_dtypes=get_all_dtypes(include_bfloat16=False))
394: allowed_dtypes=get_all_dtypes(include_bfloat16=False))
750: for dtype in [dtype for dtype in get_all_math_dtypes('cpu') if dtype != torch.float16]:
1404: dtypes(*get_all_dtypes(include_bool=False, include_complex=False))
1457: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1458: get_all_complex_dtypes()))
1465: return dtype in get_all_int_dtypes()
1494: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False)))
1501: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False)))
1507: dtypes(*(get_all_complex_dtypes()))
1514: dtypes = list(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False))
1523: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False)))
1531: if dtype in get_all_fp_dtypes():
1608: dtypes(*(get_all_dtypes(include_half=True, include_bfloat16=False,
1837: dtypes(*get_all_dtypes(include_bool=False, include_complex=False))
1855: dtypes(*(set(get_all_dtypes(include_bool=False, include_complex=False)) - {torch.uint8}))
3219: for dtype in get_all_dtypes(include_half=True, include_bfloat16=False,
```
</p>
</details>
<details>
<summary>
`test/test_serialization.py`</summary>
<p>
```python
26:from torch.testing._internal.common_dtype import get_all_dtypes
586: for device, dtype in product(devices, get_all_dtypes()):
589: for other_dtype in get_all_dtypes():
```
</p>
</details>
<details>
<summary>
`test/test_shape_ops.py`</summary>
<p>
```python
18:from torch.testing._internal.common_dtype import get_all_dtypes
230: dtypes(*get_all_dtypes(include_complex=False, include_bool=False, include_half=False,
232: dtypesIfCUDA(*get_all_dtypes(include_complex=False, include_bool=False, include_bfloat16=False))
344: dtypes(*get_all_dtypes())
443: dtypes(*get_all_dtypes())
461: dtypes(*get_all_dtypes())
570: dtypes(*get_all_dtypes(include_complex=False))
```
</p>
</details>
<details>
<summary>
`test/test_sort_and_select.py`</summary>
<p>
```python
12: all_types, all_types_and, floating_types_and, get_all_dtypes, get_all_int_dtypes, get_all_fp_dtypes,
136: dtypes(*set(get_all_dtypes()) - {torch.bool, torch.complex64, torch.complex128})
231: dtypes(*set(get_all_dtypes()) - {torch.bool, torch.complex64, torch.complex128})
296: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
647: dtypesIfCUDA(*get_all_fp_dtypes())
678: dtypesIfCUDA(*(get_all_dtypes(include_complex=False,
682: dtypes(*(get_all_dtypes(include_complex=False, include_bool=False, include_half=False, include_bfloat16=False)))
739: dtypesIfCPU(*set(get_all_dtypes()) - {torch.complex64, torch.complex128})
740: dtypes(*set(get_all_dtypes()) - {torch.bfloat16, torch.complex64, torch.complex128})
799: dtypesIfCPU(*set(get_all_dtypes()) - {torch.complex64, torch.complex128})
800: dtypes(*set(get_all_dtypes()) - {torch.bfloat16, torch.complex64, torch.complex128})
```
</p>
</details>
<details>
<summary>
`test/test_sparse.py`</summary>
<p>
```python
20:from torch.testing import get_all_complex_dtypes, get_all_fp_dtypes
29: floating_and_complex_types, floating_and_complex_types_and, get_all_dtypes, get_all_int_dtypes,
1963: return dtype in get_all_int_dtypes()
1994: dtypes(*get_all_dtypes(include_bool=False, include_half=False,
2103: return dtype in get_all_int_dtypes()
2138: dtypes(*get_all_dtypes(include_bool=False, include_half=False,
2626: all_sparse_dtypes = get_all_dtypes(include_complex=True)
2633: all_sparse_dtypes = get_all_dtypes(include_complex=True)
3230: dtypes(*get_all_complex_dtypes(),
3231: *get_all_fp_dtypes(include_half=False, include_bfloat16=False))
3234: *get_all_fp_dtypes(
```
</p>
</details>
<details>
<summary>
`test/test_sparse_csr.py`</summary>
<p>
```python
7:from torch.testing import get_all_complex_dtypes, get_all_fp_dtypes, floating_and_complex_types, make_tensor
17:from torch.testing._internal.common_dtype import floating_types, get_all_dtypes
120: dtypes(*get_all_dtypes())
133: dtypes(*get_all_dtypes())
150: dtypes(*get_all_dtypes())
180: dtypes(*get_all_dtypes())
201: dtypes(*get_all_dtypes())
210: dtypes(*get_all_dtypes())
225: dtypes(*get_all_dtypes())
244: dtypes(*get_all_dtypes())
263: dtypes(*get_all_dtypes())
285: dtypes(*get_all_dtypes())
411: dtypes(*get_all_dtypes())
482: dtypes(*get_all_dtypes())
502: dtypes(*get_all_dtypes())
562: dtypes(*get_all_dtypes())
588: dtypesIfCUDA(*get_all_complex_dtypes(),
589: *get_all_fp_dtypes(include_half=SM53OrLater, include_bfloat16=SM80OrLater))
745: dtypesIfCUDA(*get_all_complex_dtypes(),
746: *get_all_fp_dtypes(include_half=SM53OrLater and TEST_CUSPARSE_GENERIC,
765: dtypesIfCUDA(*get_all_complex_dtypes(),
766: *get_all_fp_dtypes(include_half=SM53OrLater and TEST_CUSPARSE_GENERIC,
801: *torch.testing.get_all_fp_dtypes(include_bfloat16=SM80OrLater,
841: *torch.testing.get_all_fp_dtypes(include_bfloat16=SM80OrLater,
1182: dtypes(*get_all_dtypes())
1276: dtypes(*get_all_dtypes(include_bool=False, include_half=False, include_bfloat16=False))
1286: dtypes(*get_all_dtypes())
```
</p>
</details>
<details>
<summary>
`test/test_tensor_creation_ops.py`</summary>
<p>
```python
21: onlyCUDA, skipCPUIf, dtypesIfCUDA, skipMeta, get_all_device_types)
23: get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes
150: for dt in get_all_dtypes():
160: for dt in get_all_dtypes():
314: dtypes = [dtype for dtype in get_all_dtypes() if dtype != torch.bfloat16]
1012: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1013: get_all_complex_dtypes()))
1032: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1033: get_all_complex_dtypes()))
1050: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1051: get_all_complex_dtypes()))
1745: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
1779: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
1868: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
1926: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
1954: do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, torch_device)
1956: do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, None)
1957: do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, torch_device)
2538: for device in get_all_device_types():
2645: for dtype in get_all_dtypes():
2678: dtypes(*(get_all_fp_dtypes(include_half=False, include_bfloat16=False) +
2679: get_all_complex_dtypes()))
2716: dtypes(*get_all_fp_dtypes(include_half=False, include_bfloat16=False))
2827: for dt in get_all_dtypes():
2913: dtypes(*get_all_dtypes(include_bool=False, include_half=False))
2914: dtypesIfCUDA(*get_all_dtypes(include_bool=False, include_half=True))
3028: dtypes(*(get_all_fp_dtypes() + get_all_complex_dtypes()))
3033: dtypes(*(get_all_fp_dtypes() + get_all_complex_dtypes()))
3074: dtypes(*get_all_dtypes(include_bool=False, include_half=False, include_complex=False))
3075: dtypesIfCUDA(*((get_all_int_dtypes() + [torch.float32, torch.float16, torch.bfloat16])
3077: else get_all_dtypes(include_bool=False, include_half=True, include_complex=False)))
3873: dtypes(*get_all_dtypes())
3884: dtypes(*get_all_dtypes(include_bool=False))
3916: for other in get_all_dtypes():
3922: dtypes(*get_all_dtypes())
3932: dtypes(*get_all_dtypes(include_bool=False))
3955: dtypes(*get_all_dtypes(include_bool=False))
3961: dtypes(*get_all_dtypes(include_bool=False))
3965: dtypes(*get_all_dtypes())
```
</p>
</details>
<details>
<summary>
`test/test_testing.py`</summary>
<p>
```python
25:from torch.testing._internal.common_dtype import get_all_dtypes
31: dtypes(*(get_all_dtypes(include_half=True, include_bfloat16=False,
```
</p>
</details>
<details>
<summary>
`test/test_torch.py`</summary>
<p>
```python
51: expectedAlertNondeterministic, get_all_device_types, skipXLA)
57: get_all_fp_dtypes, get_all_int_dtypes, get_all_math_dtypes, get_all_dtypes, get_all_complex_dtypes
296: for d in get_all_device_types():
323: for device in get_all_device_types():
324: for dt1 in get_all_dtypes():
325: for dt2 in get_all_dtypes():
343: all_dtypes = get_all_dtypes()
350: all_dtypes = get_all_dtypes()
781: for dtype in get_all_dtypes():
986: for device in get_all_device_types():
1017: for device in get_all_device_types():
1018: for dtype in get_all_math_dtypes(device):
2792: for device in get_all_device_types():
3186: dtypes(*get_all_dtypes())
3195: for error_dtype in get_all_dtypes():
3203: dtypes(*get_all_dtypes())
3212: for error_dtype in get_all_dtypes():
4539: dtypes(*get_all_fp_dtypes())
4545: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
4577: dtypes(*get_all_fp_dtypes(include_half=False, include_bfloat16=False))
4578: dtypesIfCPU(*(get_all_fp_dtypes(include_half=False, include_bfloat16=True)))
4579: dtypesIfCUDA(*(get_all_fp_dtypes(include_bfloat16=False)))
4599: dtypes(*(get_all_fp_dtypes(include_half=False, include_bfloat16=False)))
4600: dtypesIfCPU(*(get_all_dtypes(include_half=False, include_bfloat16=False, include_complex=False)))
4601: dtypesIfCUDA(*(get_all_dtypes(include_bfloat16=False, include_complex=False)))
4613: for p_dtype in get_all_fp_dtypes(include_half=device.startswith('cuda'), include_bfloat16=False):
4628: dtypes(*(get_all_fp_dtypes(include_half=False, include_bfloat16=False)))
4629: dtypesIfCUDA(*(get_all_fp_dtypes(include_bfloat16=False)))
4640: dtypes(*get_all_fp_dtypes())
4723: dtypes(*get_all_fp_dtypes())
4735: dtypes(*get_all_fp_dtypes(include_bfloat16=False))
4736: dtypesIfCUDA(*get_all_fp_dtypes())
4747: dtypes(*get_all_fp_dtypes())
4761: dtypes(*get_all_fp_dtypes())
4771: dtypes(*get_all_fp_dtypes())
4792: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
5302: dtypes(*get_all_dtypes(include_bfloat16=False))
5322: dtypes(*get_all_dtypes(include_half=False, include_bfloat16=False))
5323: dtypesIfCPU(*get_all_dtypes(include_bfloat16=False))
5324: dtypesIfCUDA(*get_all_dtypes(include_bfloat16=False))
5591: for dt in get_all_dtypes():
5611: for dt in get_all_dtypes():
5678: for dt in get_all_dtypes():
5696: dtypesIfCUDA(*set(get_all_math_dtypes('cuda')))
5697: dtypes(*set(get_all_math_dtypes('cpu')))
5746: dtypes(*get_all_dtypes())
5780: dtypes(*get_all_dtypes())
5885: dtypes(*get_all_dtypes())
5902: dtypes(*get_all_dtypes())
5945: dtypes(*get_all_dtypes())
5979: dtypes(*get_all_dtypes(include_bool=False))
6049: dtypes(*get_all_dtypes(include_bool=False))
6092: dtypes(*(get_all_fp_dtypes(include_bfloat16=False, include_half=False) +
6093: get_all_complex_dtypes()))
6094: dtypesIfCPU(*get_all_dtypes())
6095: dtypesIfCUDA(*get_all_dtypes())
6122: dtypes(*(get_all_fp_dtypes(include_bfloat16=False, include_half=False) +
6123: get_all_complex_dtypes()))
6124: dtypesIfCPU(*get_all_dtypes())
6125: dtypesIfCUDA(*get_all_dtypes())
6163: dtypes(*(get_all_fp_dtypes(include_bfloat16=False, include_half=False) +
6164: get_all_complex_dtypes()))
6165: dtypesIfCPU(*get_all_dtypes())
6166: dtypesIfCUDA(*get_all_dtypes())
6190: dtypes(*(get_all_complex_dtypes() +
6191: get_all_int_dtypes()))
6238: dtypes(*get_all_dtypes())
6323: dtypes(*get_all_dtypes())
6389: dtypes(*product(get_all_dtypes(), (torch.uint8, torch.bool)))
6699: dtypesIfCUDA(*set(get_all_math_dtypes('cuda')))
6700: dtypes(*set(get_all_math_dtypes('cpu')))
7452: dtypes(*get_all_dtypes(include_bool=False))
7461: dtypes(*get_all_dtypes(include_bool=False))
7477: dtypes(*get_all_dtypes(include_bool=False))
7496: dtypes(*get_all_dtypes(include_bool=False))
7538: dtypes(*get_all_dtypes(include_bool=False))
8162: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes() +
8163: get_all_complex_dtypes()))
8175: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes() +
8176: get_all_complex_dtypes()))
```
</p>
</details>
<details>
<summary>
`test/test_type_promotion.py`</summary>
<p>
```python
14: get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_fp_dtypes
187: for dtype in get_all_dtypes():
262: dtypes1 = get_all_math_dtypes('cuda')
263: dtypes2 = get_all_math_dtypes(device)
339: dtypes(*itertools.product(get_all_dtypes(), get_all_dtypes()))
468: for dt1 in get_all_math_dtypes(device):
469: for dt2 in get_all_math_dtypes(device):
519: for dt1 in get_all_math_dtypes(device):
520: for dt2 in get_all_math_dtypes(device):
528: for dt in get_all_math_dtypes(device):
561: for dtype in get_all_dtypes():
766: dtypes=get_all_math_dtypes(device))
771: dtypes=get_all_math_dtypes(device))
782: dtypes=get_all_math_dtypes(device))
879: dtypes = get_all_dtypes(include_bfloat16=False)
898: dtypes = get_all_dtypes(include_bfloat16=False, include_bool=False)
965: dtypesIfCUDA(*itertools.product(get_all_dtypes(include_bfloat16=False, include_complex=False),
966: get_all_dtypes(include_bfloat16=False, include_complex=False)))
967: dtypes(*itertools.product(get_all_dtypes(include_half=False, include_bfloat16=False,
969: get_all_dtypes(include_half=False, include_bfloat16=False,
976: return dtype in get_all_int_dtypes() + [torch.bool]
979: return dtype in get_all_fp_dtypes(include_half=True, include_bfloat16=False)
```
</p>
</details>
<details>
<summary>
`test/test_unary_ufuncs.py`</summary>
<p>
```python
24: floating_types_and, all_types_and_complex_and, floating_and_complex_types_and, get_all_dtypes, get_all_math_dtypes,
25: get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes
517: dtypes(*(get_all_int_dtypes() + [torch.bool] +
518: get_all_fp_dtypes(include_bfloat16=False)))
596: dtypes(*get_all_fp_dtypes(include_half=True, include_bfloat16=False))
611: invalid_input_dtypes = get_all_int_dtypes() + \
612: get_all_complex_dtypes() + \
619: for dtype in get_all_fp_dtypes(include_half=True, include_bfloat16=False):
1048: dtypes(*get_all_math_dtypes('cpu'))
1182: dtypesIfCUDA(*get_all_fp_dtypes())
1190: dtypesIfCUDA(*get_all_fp_dtypes())
1205: dtypesIfCUDA(*get_all_fp_dtypes())
1215: dtypesIfCUDA(*get_all_fp_dtypes())
1307: dtypes(*(get_all_dtypes(include_bool=False)))
1349: dtypes(*(get_all_fp_dtypes(include_half=False) +
1350: get_all_complex_dtypes()))
1351: dtypesIfCUDA(*(get_all_fp_dtypes(include_half=True) +
1352: get_all_complex_dtypes()))
```
</p>
</details>
<details>
<summary>
`test/test_view_ops.py`</summary>
<p>
```python
19: get_all_dtypes, get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes
124: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
131: dtypes(*get_all_dtypes(include_bfloat16=False))
213: for view_dtype in [*get_all_fp_dtypes(), *get_all_complex_dtypes()]:
220: dtypes(*get_all_dtypes())
224: for view_dtype in get_all_dtypes():
305: dtypes(*get_all_complex_dtypes(include_complex32=True))
343: dtypes(*get_all_dtypes())
354: dtypes(*get_all_dtypes())
364: dtypes(*get_all_dtypes())
374: dtypes(*get_all_dtypes())
384: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
395: dtypes(*get_all_complex_dtypes())
426: dtypes(*get_all_complex_dtypes())
451: dtypes(*product(get_all_complex_dtypes(), get_all_dtypes()))
1263: dtypes(*(torch.testing.get_all_dtypes()))
1279: dtypes(*(torch.testing.get_all_dtypes()))
1405: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1406: get_all_complex_dtypes()))
1471: dtypes(*get_all_dtypes(include_bfloat16=False))
1574: dtypes(*get_all_dtypes())
1601: dtypes(*get_all_dtypes(include_bfloat16=False))
1632: dtypes(*get_all_dtypes(include_bfloat16=False))
1711: for dt in get_all_dtypes():
1717: for dt in get_all_dtypes():
1724: for dt in get_all_dtypes():
```
</p>
</details>
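As referenced above, a hedged sketch of the kind of rewrite applied across these files (the decorator and helper come from `torch.testing._internal`; the test class itself is hypothetical):
```python
import torch
from torch.testing._internal.common_device_type import dtypes
from torch.testing._internal.common_dtype import all_types_and_complex_and

class TestExample:
    # before: @dtypes(*get_all_dtypes())
    @dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool))
    def test_op(self, device, dtype):
        ...
```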
I'm looking forward to your viewpoints. Thanks :)
cc: mruberry kshitij12345 anjali411
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71561
Reviewed By: samdow
Differential Revision: D34856571
Pulled By: mruberry
fbshipit-source-id: 0dca038bcad5cf69906245c496d2e61ac3876335
(cherry picked from commit b058f67b4313143efa714ab105f36e74083131b9)
2022-03-15 20:31:41 +00:00
Pearu Peterson
a5dcc0c378
Enable test_coalesce_cuda_bfloat16 ( #73158 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73158
Fixes #72893
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D34515679
Pulled By: cpuhrsch
fbshipit-source-id: 049f8ddf53023b78e1b48e15bbd3cdc58b6bf692
(cherry picked from commit 28a44ca56f66bfaaf14a049856b7d89fec8cd838)
2022-02-28 19:34:20 +00:00
Pearu Peterson
3c932c345b
Fix test_Sparse_to_Sparse_copy__cuda_bfloat16 failure ( #73157 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73157
Fixes #72892
Test Plan: Imported from OSS
Reviewed By: george-qi
Differential Revision: D34398986
Pulled By: cpuhrsch
fbshipit-source-id: 20214be1859354fb18a306e8d1de9852a898c485
(cherry picked from commit c1816ef0cf8834149bebcc11f4402f0eedfae6f7)
2022-02-28 05:33:50 +00:00
Pearu Peterson
16cd6853e1
Fix test_sparse_addmm_...float16 and test_sparse_matmul_...float16 test failures ( #73155 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73155
Fixes #73145
Test Plan: Imported from OSS
Reviewed By: mikaylagawarecki
Differential Revision: D34398935
Pulled By: cpuhrsch
fbshipit-source-id: b1e852f25b0888b37d9c9c1418ddf344ac8f0a04
(cherry picked from commit d63c977fb39c7dcb3f3d083edc4b25cd2d6c2ec4)
2022-02-26 05:30:36 +00:00
Pearu Peterson
4c522643e7
Fix CUDA error when multiplying sparse hybrid tensors with zero dense dimensions ( #73428 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73428
Fixes https://github.com/pytorch/pytorch/issues/73363
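A guess at the failing case based on the title (exact shapes assumed): a hybrid COO tensor whose dense part has a zero-sized dimension.
```python
import torch

i = torch.tensor([[0, 1]], device="cuda")
v = torch.empty(2, 0, device="cuda")                   # nnz=2, dense dim of size 0
a = torch.sparse_coo_tensor(i, v, (3, 0)).coalesce()   # 1 sparse dim + 1 dense dim
b = a * a                                              # raised a CUDA error before this fix
```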
Test Plan: Imported from OSS
Reviewed By: george-qi
Differential Revision: D34478521
Pulled By: cpuhrsch
fbshipit-source-id: cbc83f223a14c92ed8b284e5e2a8aab390e2bc5c
(cherry picked from commit 9d7ecc848228f9a5b1761f9d3653d3cca49e0244)
2022-02-26 01:08:45 +00:00
Philip Meier
0973c5a1cc
align signature of make_tensor with other creation ops ( #72702 )
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72702
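A hedged sketch of the aligned signature (per the current `torch.testing.make_tensor`): the shape is passed positionally, like `torch.zeros` / `torch.randn`, with `dtype` and `device` as keywords.
```python
import torch
from torch.testing import make_tensor

t = make_tensor(2, 3, dtype=torch.float32, device="cpu")
```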
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D34457729
Pulled By: mruberry
fbshipit-source-id: 83d580c4201eef946dc9cf4b9e28a3d36be55609
(cherry picked from commit aa4cf20fbeb4b795595729b8ac2e6ba7707d8283)
2022-02-25 06:30:31 +00:00
Rohan Varma
c3d79ac422
Manual skip sparse tests
...
Skipping these manually because they are not being properly disabled by the automation.
Differential Revision: [D34456851](https://our.internmc.facebook.com/intern/diff/D34456851/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73374
2022-02-24 20:26:02 +00:00
Alban Desmaison
49444bb501
Revert D34400588: [pytorch][PR] super setUp call missing in TestSparse
...
Test Plan: revert-hammer
Differential Revision:
D34400588 (555b215a90 )
Original commit changeset: 40ac1c56918d
Original Phabricator Diff: D34400588 (555b215a90 )
fbshipit-source-id: 0375279d06cc7a9d612bd70cc4c042cb3319a5fc
(cherry picked from commit 7cd3d2da907e6f0882f56c8843d50586756a2fe6)
2022-02-24 14:34:01 +00:00
Jane Xu
555b215a90
super setUp call missing in TestSparse ( #73217 )
...
Summary:
Should fix Sparse tests not being properly disabled; see https://github.com/pytorch/pytorch/issues/73145#issuecomment-1046952585
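A minimal sketch of the fix: without the `super()` call, the disabled-test checks in `TestCase.setUp` never run for `TestSparse`.
```python
from torch.testing._internal.common_utils import TestCase

class TestSparse(TestCase):
    def setUp(self):
        super().setUp()  # was missing; runs the test-disabling machinery
        # ... existing per-test setup ...
```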
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73217
Reviewed By: atalman
Differential Revision: D34400588
Pulled By: janeyx99
fbshipit-source-id: 40ac1c56918d5c47debf962a2bd218a325626ad8
(cherry picked from commit e63dae284ba9056567fcaffc54d1aa38151c0a12)
2022-02-23 19:36:50 +00:00
Nikita Shulga
5dad19fef0
Back out "[pytorch][PR] add BFloat16 sparse operators on CPU: copy, coalesce, sparse_mask, ad…"
...
Summary:
Original commit changeset: f1274125234a
Original Phabricator Diff: D34343016 (c6f56599bb )
Test Plan: The above-mentioned PR regressed OSS CI.
Reviewed By: atalman
Differential Revision: D34379703
fbshipit-source-id: bc624cfd86249dde2fac635d9b66f08f86b4aed9
(cherry picked from commit e52827f1ae )
2022-02-21 18:31:51 +00:00
Jiayi Sun
c6f56599bb
add BFloat16 sparse operators on CPU: copy, coalesce, sparse_mask, ad… ( #72846 )
...
Summary:
…d_out, addmm
Fixes #ISSUE_NUMBER
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72846
Reviewed By: mikaylagawarecki
Differential Revision: D34343016
Pulled By: cpuhrsch
fbshipit-source-id: f1274125234a3bacbb7a38fc642fbf5c9786d435
(cherry picked from commit c819456abf )
2022-02-19 01:33:51 +00:00