Commit Graph

328 Commits

Author SHA1 Message Date
Aaron Gokaslan
3d82d8d0ed [BE] Enable more flake8-comprehensions checks (#94601)
I applied some flake8 fixes and enabled checking for them in the linter. I also enabled some checks for my previous comprehensions PR.

This is a follow-up to #94323: it enables the flake8 checkers for the fixes I made there and fixes a few more of them.
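
For illustration, here is a hedged sketch of the kind of comprehension rewrites flake8-comprehensions flags (representative rules, not necessarily the exact ones touched by this PR):

```python
# C400: unnecessary generator passed to list()
squares = list(x * x for x in range(10))   # flagged
squares = [x * x for x in range(10)]       # preferred

# C419: unnecessary list comprehension inside any()/all()
found = any([x > 5 for x in range(10)])    # flagged
found = any(x > 5 for x in range(10))      # preferred
```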

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94601
Approved by: https://github.com/ezyang
2023-02-10 23:40:29 +00:00
Huy Do
c53bd0dd30 Mitigate broken test_coalesce_reference_cycle test on dynamo (#94622)
The test has been disabled and shows up in https://github.com/pytorch/test-infra/blob/generated-stats/stats/disabled-tests-condensed.json, but the JSON file downloaded by the runner doesn't seem to include it.

Disable it explicitly to keep trunk green while investigating.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94622
Approved by: https://github.com/weiwangmeta
2023-02-10 21:59:36 +00:00
PyTorch MergeBot
76ed1a81d1 Revert "COO intersection kernel: respect value intersection order (#92242)"
This reverts commit b07c839b70.

Reverted https://github.com/pytorch/pytorch/pull/92242 on behalf of https://github.com/jeanschmidt due to breaking vs17
2023-02-09 14:44:32 +00:00
Aleksandar Samardžić
e1f17b3530 Add CSR->BSC and CSC->BSR conversions (#93301)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93301
Approved by: https://github.com/cpuhrsch
2023-02-07 19:22:05 +00:00
Nikita Vedeneev
b07c839b70 COO intersection kernel: respect value intersection order (#92242)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92242
Approved by: https://github.com/cpuhrsch, https://github.com/amjames
2023-02-07 17:05:28 +00:00
Nikita Vedeneev
994f85d639 sparse_mask: extend lhs to sparse COO tensors (#92248)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92248
Approved by: https://github.com/cpuhrsch, https://github.com/pearu
2023-02-01 09:00:07 +00:00
Aleksandar Samardžić
53f7fb9a22 Add CSC->BSC conversion (#92307)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92307
Approved by: https://github.com/cpuhrsch
2023-01-30 17:03:36 +00:00
Pearu Peterson
65d6802e2f Improve error messages for sparse methods on tensors with unsupported backends/layouts. (#93149)
Fixes https://github.com/pytorch/pytorch/issues/92790

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93149
Approved by: https://github.com/cpuhrsch
2023-01-27 19:50:23 +00:00
Pearu Peterson
0e92bbe5b1 Add sparse COO tensor support to torch.sum(dim=..., keepdim=...) (#92979)
Fixes #92757, #86232
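
A minimal usage sketch of what this adds (the tensor contents are illustrative):

```python
import torch

i = torch.tensor([[0, 1, 1], [2, 0, 2]])
v = torch.tensor([3., 4., 5.])
s = torch.sparse_coo_tensor(i, v, (2, 3))

# dim=/keepdim= reductions now work on sparse COO inputs
out = torch.sum(s, dim=1, keepdim=True)  # shape (2, 1)
print(out)
```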

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92979
Approved by: https://github.com/cpuhrsch
2023-01-26 18:42:51 +00:00
Eddie Yan
0bf7506051 [CUDA] Drop CUDA < 11.0 test flags (#92605)
Follow-up of #89582 to drop flags like `CUDA11OrLater` in tests. Note that in some places `TEST_WITH_ROCM` appears to be _implicitly_ guarded against via the `CUDA11OrLater` version check (based on my best guess of how `torch.version.cuda` would behave in ROCm builds), so I've added `not TEST_WITH_ROCM` in cases where ROCm wasn't previously explicitly allowed.
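
A hedged sketch of the guard pattern described above (`test_something` and its placement are illustrative, not code from this PR):

```python
import unittest
import torch

TEST_WITH_ROCM = torch.version.hip is not None

# before: `CUDA11OrLater` implicitly skipped ROCm builds, since
# torch.version.cuda is None there; after: the ROCm guard is explicit
@unittest.skipIf(TEST_WITH_ROCM, "not supported on ROCm builds")
def test_something():
    ...
```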

CC @ptrblck @malfet @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92605
Approved by: https://github.com/ngimel
2023-01-24 04:34:06 +00:00
Nikita Vedeneev
9f381c9b7f sparse_sparse_matmul: simplify backward (#91712)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91712
Approved by: https://github.com/albanD
2023-01-23 19:24:28 +00:00
Yanbo Liang
0ab4ab9f8d [Dynamo] Fix calling UserDefinedObject.func should pass self object (#92050)
Fixes #90834

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92050
Approved by: https://github.com/jansel
2023-01-21 05:47:01 +00:00
Pearu Peterson
b3e4f5029b Add check-sparse-tensor-invariants flag to Context - 2nd try. (#92094)
This PR is a copy of https://github.com/pytorch/pytorch/pull/90849, whose merge was reverted.

The PR adds a "check sparse tensor invariants" flag to Context that, when enabled, triggers sparse tensor data invariant checks in the unsafe methods of constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to the UI:

The `torch.sparse.check_sparse_tensor_invariants` class provides different ways to enable/disable the invariant checking.

`torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden.

The PR fixes https://github.com/pytorch/pytorch/issues/90833
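
A minimal sketch of the two toggles described above (tensor contents are illustrative):

```python
import torch

i = torch.tensor([[0, 1], [2, 0]])
v = torch.tensor([1., 2.])

# per-call override via the new check_invariants argument
s = torch.sparse_coo_tensor(i, v, (2, 3), check_invariants=True)

# scoped enabling via the class, usable as a context manager
with torch.sparse.check_sparse_tensor_invariants():
    assert torch.sparse.check_sparse_tensor_invariants.is_enabled()
    t = torch.sparse_coo_tensor(i, v, (2, 3))
```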

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92094
Approved by: https://github.com/cpuhrsch
2023-01-13 14:50:33 +00:00
PyTorch MergeBot
c7a22bb7c7 Revert "Add check-sparse-tensor-invariants flag to Context. (#90849)"
This reverts commit b9a035c1c5.

Reverted https://github.com/pytorch/pytorch/pull/90849 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-01-12 09:58:16 +00:00
Aleksandar Samardžić
8612ec5b90 Implement hybrid sparse to/from dense conversions. (#90177)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90177
Approved by: https://github.com/cpuhrsch, https://github.com/pearu
2023-01-12 03:31:30 +00:00
min-jean-cho
af242eedfb [Inductor] Added aten.uniform_ decomp (#90869)
Fixes #90815

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90869
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/lezcano, https://github.com/ngimel, https://github.com/albanD
2023-01-11 23:23:42 +00:00
Pearu Peterson
b9a035c1c5 Add check-sparse-tensor-invariants flag to Context. (#90849)
This PR adds a "check sparse tensor invariants" flag to Context that, when enabled, triggers sparse tensor data invariant checks in the unsafe methods of constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to the UI:

- `torch.enable_check_sparse_tensor_invariants` and `torch.is_check_sparse_tensor_invariants_enabled` functions to globally enable/disable the invariant checks and to retrieve the state of the feature, respectively
- `torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden.

The PR also fixes https://github.com/pytorch/pytorch/issues/90833

# Main issue

*The following content is outdated after merging the PRs in this ghstack but kept for the record.*

The importance of this feature is that when the invariant checks are enabled by default, say, via

<details>

```
$ git diff
diff --git a/torch/__init__.py b/torch/__init__.py
index c8543057c7..19a91d0482 100644
--- a/torch/__init__.py
+++ b/torch/__init__.py
@@ -1239,3 +1239,8 @@ if 'TORCH_CUDA_SANITIZER' in os.environ:

 # Populate magic methods on SymInt and SymFloat
 import torch.fx.experimental.symbolic_shapes
+
+# temporarily enable sparse tensor arguments validation in unsafe
+# constructors:
+
+torch._C._set_check_sparse_tensor_invariants(True)
```

</details>

a massive number of test failures/errors occur in test_sparse_csr.py tests:
```
$ pytest -sv test/test_sparse_csr.py
<snip>
==== 4293 failed, 1557 passed, 237 skipped, 2744 errors in 69.71s (0:01:09) ====
```
This means that we are silently constructing sparse compressed tensors that do not satisfy the sparse tensor invariants. In particular, the following errors are raised:

```
AssertionError: "resize_as_sparse_compressed_tensor_: self and src must have the same layout" does not match "expected values to be a strided and contiguous tensor"

RuntimeError: CUDA error: device-side assert triggered

RuntimeError: `col_indices[..., crow_indices[..., i - 1]:crow_indices[..., i]] for all i = 1, ..., nrows are sorted and distinct along the last dimension values` is not satisfied.

RuntimeError: expected col_indices to be a strided and contiguous tensor

RuntimeError: expected row_indices to be a strided and contiguous tensor

RuntimeError: expected values to be a strided and contiguous tensor

RuntimeError: for_each: failed to synchronize: cudaErrorAssert: device-side assert triggered

RuntimeError: tensor dimensionality must be sum of batch, base, and dense dimensionalities (=0 + 2 + 0) but got 3
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90849
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2023-01-11 01:05:14 +00:00
anjali411
c887837ec3 Reland "Fix dynamo handling for tensor attributes: T, H, mT, mH (#90463)" (#91897)
This reverts commit 84266ae670.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91897
Approved by: https://github.com/ngimel
2023-01-10 08:16:07 +00:00
PyTorch MergeBot
84266ae670 Revert "Fix dynamo handling for tensor attributes: T, H, mT, mH (#90463)"
This reverts commit 9945a78a94.

Reverted https://github.com/pytorch/pytorch/pull/90463 on behalf of https://github.com/ZainRizvi due to This is causing test failures: FAILED inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_pinv_singular_cuda_float64 - RuntimeError: unexpected success linalg.pinv.singular, torch.float64, cuda
2023-01-09 16:43:36 +00:00
anjali411
9945a78a94 Fix dynamo handling for tensor attributes: T, H, mT, mH (#90463)
Fixes https://github.com/pytorch/pytorch/issues/88843

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90463
Approved by: https://github.com/ngimel
2023-01-09 04:11:23 +00:00
Nikita Vedeneev
7ef7c57ae7 CSC/BSC -> COO coalesce fix (#91440)
Fixes https://github.com/pytorch/pytorch/issues/91010.

CSC and BSC sparse formats are not inherently `coalesced`.
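
A hedged illustration (relying on the layout-aware `Tensor.to_sparse()` from #89502 below): converting CSC back to COO must not blindly mark the result coalesced, because CSC orders entries by column rather than the row-major order that COO's coalesced invariant requires.

```python
import torch

a = torch.tensor([[0., 1.],
                  [2., 0.]]).to_sparse_csc()
coo = a.to_sparse()          # CSC -> COO
# may legitimately be False until .coalesce() is called
print(coo.is_coalesced())
```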

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91440
Approved by: https://github.com/pearu, https://github.com/amjames, https://github.com/cpuhrsch
2023-01-03 18:42:39 +00:00
Pearu Peterson
b797a24259 Support indices contiguity per batch and non-contiguous values in sparse compressed tensors (#91243)
Fixes https://github.com/pytorch/pytorch/issues/91062

With this PR, all failures reported in https://github.com/pytorch/pytorch/pull/90849 are resolved (modulo test_bmm, which uses an unorthodox way to construct a batched CSR tensor).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91243
Approved by: https://github.com/nikitaved, https://github.com/amjames, https://github.com/lezcano
2023-01-02 18:08:46 +00:00
Nikita Vedeneev
1768a28a20 COO @ COO: fix to always produce coalesced outputs. (#91094)
Fixes [#90516](https://github.com/pytorch/pytorch/issues/90516)
Fixes [#90538](https://github.com/pytorch/pytorch/issues/90538)
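
A hedged sketch of the fixed behavior (the inputs are illustrative):

```python
import torch

a = torch.eye(3).to_sparse()
b = torch.eye(3).to_sparse()

c = torch.sparse.mm(a, b)    # COO @ COO
print(c.is_coalesced())      # expected True after this fix
```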

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91094
Approved by: https://github.com/pearu
2022-12-27 21:32:14 +00:00
Pearu Peterson
8004f934cd Fix CSR with int32 indices to CSC conversion (#91061)
Fixes https://github.com/pytorch/pytorch/issues/91007

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91061
Approved by: https://github.com/nikitaved
2022-12-18 13:53:25 +00:00
Pearu Peterson
01e7f46215 Ensure sorted indices from the CSR->BSR conversion (#90918)
Fixes https://github.com/pytorch/pytorch/issues/90910

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90918
Approved by: https://github.com/cpuhrsch
2022-12-16 15:49:48 +00:00
Edward Z. Yang
e686a442b4 If a torch.* returns non-Tensor, make this unimplemented rather than assert. (#89918)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89918
Approved by: https://github.com/albanD
2022-12-15 21:53:54 +00:00
Pearu Peterson
a60d712010 Support (non-batch) BSR/BSC to COO sparse tensor conversions (#90718)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90718
Approved by: https://github.com/cpuhrsch
2022-12-14 05:37:05 +00:00
Pearu Peterson
76c6dfeaa6 Add layout and blocksize arguments to Tensor.to_sparse method (#89502)
This PR extends the `Tensor.to_sparse()` method to `Tensor.to_sparse(layout=None, blocksize=None)` in a backward-compatible manner (`layout=None` means `layout=torch.sparse_coo`). A usage sketch follows the lists below.

In addition, the PR adds support for the following conversions:
- non-hybrid/hybrid COO tensors to CSR, CSC, or COO tensors
- CSR tensors with short, bool, byte, char, bfloat16, half, int, or long values to BSR tensors

and fixes the following conversions:
- hybrid COO to COO tensor
- non-batch/batch hybrid BSR to BSR or BSC tensor
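
A usage sketch of the extended signature (the values are illustrative):

```python
import torch

x = torch.tensor([[0., 1., 0., 0.],
                  [2., 0., 0., 3.]])

coo = x.to_sparse()                                   # BC: defaults to COO
csr = x.to_sparse(layout=torch.sparse_csr)
bsr = x.to_sparse(layout=torch.sparse_bsr, blocksize=(1, 2))
print(coo.layout, csr.layout, bsr.layout)
```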

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89502
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2022-11-30 20:21:10 +00:00
Pearu Peterson
296e1ba4d0 Row and column select support for block compressed sparse tensors (#88733)
As in the title:

- Support `select` and `select_copy` on block sparse compressed tensors
- Fixes incorrect results when selecting dense dimensions

The PR also improves the performance of indexing sparse compressed tensors considerably:

<details>

Before:

```python
In [3]: a=torch.rand((1000, 1000)).to_sparse_csr()

In [4]: %timeit a.select(0, 0)
606 µs ± 4.27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [5]: %timeit a.select(1, 0)
527 µs ± 57.7 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit a[0, 0]
617 µs ± 3.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [7]: a = a.cuda()

In [8]: %timeit a.select(0, 0); torch.cuda.synchronize();
1.19 ms ± 137 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [9]: %timeit a.select(1, 0); torch.cuda.synchronize();
1.2 ms ± 119 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [10]: %timeit a[0, 0]; torch.cuda.synchronize();
1.23 ms ± 482 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

This PR:

```python
In [3]: a=torch.rand((1000, 1000)).to_sparse_csr()

In [4]: %timeit a.select(0, 0)
4.75 µs ± 8.94 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [5]: %timeit a.select(1, 0)
565 µs ± 156 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit a[0, 0]
13.1 µs ± 435 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [7]: a = a.cuda()

In [8]: %timeit a.select(0, 0); torch.cuda.synchronize();
21.6 µs ± 23.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [9]: %timeit a.select(1, 0); torch.cuda.synchronize();
1.15 ms ± 3.13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [10]: %timeit a[0, 0]; torch.cuda.synchronize();
63.7 µs ± 2.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88733
Approved by: https://github.com/nikitaved, https://github.com/amjames, https://github.com/cpuhrsch
2022-11-30 11:15:56 +00:00
Pearu Peterson
90bed8874f Generator of tensor inputs with variable layout and structure (batch/non-batch, hybrid/non-hybrid, block/non-block) (#88914)
This PR introduces `TestCase.generate_simple_inputs` method that is an improved and generalized version of the `TestSparseCompressed._generate_small_inputs` method.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88914
Approved by: https://github.com/cpuhrsch
2022-11-30 02:13:33 +00:00
Kazuaki Ishizaki
088f2fa567 Fix typos in messages under test (#89121)
This PR fixes typos in messages in `.cpp` and `.py` files under the test directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89121
Approved by: https://github.com/mruberry, https://github.com/kit1980
2022-11-17 01:55:03 +00:00
Andrew M. James
ff6770a9a1 enable backward for log1p (sparse layouts) (#88155)
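
A hedged sketch of what this enables (driving backward through the sparse-aware sum is an illustrative choice, not code from this PR):

```python
import torch

x = torch.tensor([[0., 1.],
                  [2., 0.]]).to_sparse().requires_grad_()
y = torch.log1p(x)            # backward now supported for sparse layouts
torch.sparse.sum(y).backward()
print(x.grad)
```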
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88155
Approved by: https://github.com/cpuhrsch
2022-11-04 20:59:26 +00:00
jpvillam
1e1b045128 [ROCM] Enable Sparse Pickle Test (#82729)
Missing stream context for serialization

### Description
The ROCm stream context was missing on memory operations used for serialization.

### Testing
Ran the sparse pickle test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82729
Approved by: https://github.com/ngimel
2022-10-27 15:11:28 +00:00
Pearu Peterson
88b882cd1c Support sum on a sparse COO tensor. (#86300)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86300
Approved by: https://github.com/cpuhrsch
2022-10-06 18:39:28 +00:00
George Qi
686555b663 [maskedtensor] port torch/_masked into torch/masked (#85515)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85515
Approved by: https://github.com/cpuhrsch
2022-09-26 23:41:13 +00:00
Elias Ellison
bcc544e9d7 Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417
Approved by: https://github.com/ezyang
2022-09-26 17:08:14 +00:00
nikitaved
12ae3bea43 Faster mul(sparse, sparse) with broadcasting in dense dims. (#85336)
This is a combo PR of https://github.com/pytorch/pytorch/pull/84929 and ~https://github.com/pytorch/pytorch/pull/83428~.

Preliminary benchmarks (square matrices of shape (n, n)).

<details>

<summary>Script</summary>

```python
import torch
import math
from IPython import get_ipython
from itertools import product, repeat
import pickle
from torch.utils.benchmark import Timer, Compare

torch.manual_seed(13)

problem_dims = (
    # n > nnz
    (10000, 100),
    (100000, 1000),
    (1000000, 10000),
    # n < nnz
    (10, 100),
    (10, 1000),
    (10, 10000),
    (100, 1000),
    (100, 10000),
    (1000, 10000),
    (1000, 100000),
    (1000, 1000000),
    #(1000000, 1000000000),
)

name = "PR"
device = "cuda"
results = []

for n, nnz in problem_dims:
    def gen_tensor(coalesce=False):
        shape = (n, n)
        nrows, ncols = shape
        rowidx = torch.randint(low=0, high=nrows, size=(nnz,), device=device)
        colidx = torch.randint(low=0, high=ncols, size=(nnz,), device=device)
        itemidx = torch.vstack((rowidx, colidx))
        xvalues = torch.randn(nnz, device=device)
        itemidx = torch.hstack((itemidx, itemidx))
        xvalues = torch.hstack((xvalues, xvalues))
        res = torch.sparse_coo_tensor(itemidx, xvalues, size=shape)
        if coalesce:
            return res.coalesce()
        else:
            return res

    for x_coalesce, y_coalesce in product(*repeat((True, False), 2)):
        x = gen_tensor(x_coalesce)
        y = gen_tensor(y_coalesce)
        smtp = "x * y"
        timer = Timer(smtp,
                      globals=globals(),
                      label="coo.mul",
                      description=f"{name}: mul, device: {device}",
                      sub_label=f"n={n}, nnz={nnz}, coalesce=({x_coalesce, y_coalesce})",
                      num_threads=torch.get_num_threads())
        results.append(timer.blocked_autorange())

compare = Compare(results)
compare.trim_significant_figures()
compare.print()

with open(f"{name}_{device}_mul.pickle", 'wb') as f:
    pickle.dump(results, f)

```

</details>

<details>

<summary>Gather results</summary>

```python
import pickle
from torch.utils.benchmark import Timer, Compare

files = [
        "PR",
        "master"
        ]

device = 'cuda'

timers = []
for name in files:
    with open("{}_{}_mul.pickle".format(name, device), 'rb') as f:
        timers += pickle.load(f)

compare = Compare(timers)
compare.trim_significant_figures()
compare.print()

```

</details>

<details>

<summary>CUDA</summary>

```
[------------------------------------------------- coo.mul -------------------------------------------------]
                                                       |  PR: mul, device: cuda  |  master: mul, device: cuda
24 threads: -------------------------------------------------------------------------------------------------
      n=10000, nnz=100, coalesce=((True, True))        |             95          |                91
      n=10000, nnz=100, coalesce=((True, False))       |             87          |               242
      n=10000, nnz=100, coalesce=((False, True))       |             87          |               226
      n=10000, nnz=100, coalesce=((False, False))      |            130          |               371
      n=100000, nnz=1000, coalesce=((True, True))      |            100          |               521
      n=100000, nnz=1000, coalesce=((True, False))     |             90          |               649
      n=100000, nnz=1000, coalesce=((False, True))     |            100          |               659
      n=100000, nnz=1000, coalesce=((False, False))    |            200          |               781
      n=1000000, nnz=10000, coalesce=((True, True))    |            100          |              4861
      n=1000000, nnz=10000, coalesce=((True, False))   |            100          |              5012
      n=1000000, nnz=10000, coalesce=((False, True))   |             98          |              5010
      n=1000000, nnz=10000, coalesce=((False, False))  |            384          |              5174
      n=10, nnz=100, coalesce=((True, True))           |            100          |                79
      n=10, nnz=100, coalesce=((True, False))          |            100          |               221
      n=10, nnz=100, coalesce=((False, True))          |            100          |               221
      n=10, nnz=100, coalesce=((False, False))         |            100          |               350
      n=10, nnz=1000, coalesce=((True, True))          |            100          |               100
      n=10, nnz=1000, coalesce=((True, False))         |            100          |               240
      n=10, nnz=1000, coalesce=((False, True))         |            100          |               254
      n=10, nnz=1000, coalesce=((False, False))        |            100          |               392
      n=10, nnz=10000, coalesce=((True, True))         |            100          |               110
      n=10, nnz=10000, coalesce=((True, False))        |            110          |               286
      n=10, nnz=10000, coalesce=((False, True))        |            110          |               286
      n=10, nnz=10000, coalesce=((False, False))       |            271          |               455
      n=100, nnz=1000, coalesce=((True, True))         |            110          |               851
      n=100, nnz=1000, coalesce=((True, False))        |            110          |              1000
      n=100, nnz=1000, coalesce=((False, True))        |            110          |               990
      n=100, nnz=1000, coalesce=((False, False))       |            140          |              1124
      n=100, nnz=10000, coalesce=((True, True))        |            110          |              5137
      n=100, nnz=10000, coalesce=((True, False))       |            110          |              5391
      n=100, nnz=10000, coalesce=((False, True))       |            100          |              5405
      n=100, nnz=10000, coalesce=((False, False))      |            249          |              5539
      n=1000, nnz=10000, coalesce=((True, True))       |            100          |              8598
      n=1000, nnz=10000, coalesce=((True, False))      |            100          |              8800
      n=1000, nnz=10000, coalesce=((False, True))      |            100          |              8782
      n=1000, nnz=10000, coalesce=((False, False))     |            255          |              8956
      n=1000, nnz=100000, coalesce=((True, True))      |            120          |             84500
      n=1000, nnz=100000, coalesce=((True, False))     |            200          |             88560
      n=1000, nnz=100000, coalesce=((False, True))     |            160          |             89000
      n=1000, nnz=100000, coalesce=((False, False))    |            373          |             89000
      n=1000, nnz=1000000, coalesce=((True, True))     |            312          |            606400
      n=1000, nnz=1000000, coalesce=((True, False))    |           1340          |            609200
      n=1000, nnz=1000000, coalesce=((False, True))    |           1340          |            609100
      n=1000, nnz=1000000, coalesce=((False, False))   |           4408          |            611400

Times are in microseconds (us).
```

</details>

<details>

<summary>CPU</summary>

```
[------------------------------------------------ coo.mul ------------------------------------------------]
                                                       |  PR: mul, device: cpu  |  master: mul, device: cpu
24 threads: -----------------------------------------------------------------------------------------------
      n=10000, nnz=100, coalesce=((True, True))        |              8         |                8
      n=10000, nnz=100, coalesce=((True, False))       |             32         |               34
      n=10000, nnz=100, coalesce=((False, True))       |             32         |               34
      n=10000, nnz=100, coalesce=((False, False))      |             41         |               56
      n=100000, nnz=1000, coalesce=((True, True))      |             24         |               24
      n=100000, nnz=1000, coalesce=((True, False))     |             90         |              100
      n=100000, nnz=1000, coalesce=((False, True))     |             87         |              100
      n=100000, nnz=1000, coalesce=((False, False))    |            231         |              255
      n=1000000, nnz=10000, coalesce=((True, True))    |            190         |              200
      n=1000000, nnz=10000, coalesce=((True, False))   |            908         |             2023
      n=1000000, nnz=10000, coalesce=((False, True))   |            800         |             2036
      n=1000000, nnz=10000, coalesce=((False, False))  |           3684         |             3989
      n=10, nnz=100, coalesce=((True, True))           |              8         |                7
      n=10, nnz=100, coalesce=((True, False))          |             34         |               30
      n=10, nnz=100, coalesce=((False, True))          |             33         |               30
      n=10, nnz=100, coalesce=((False, False))         |             44         |               50
      n=10, nnz=1000, coalesce=((True, True))          |              8         |                7
      n=10, nnz=1000, coalesce=((True, False))         |            100         |              100
      n=10, nnz=1000, coalesce=((False, True))         |            130         |              100
      n=10, nnz=1000, coalesce=((False, False))        |            746         |              210
      n=10, nnz=10000, coalesce=((True, True))         |              8         |                7
      n=10, nnz=10000, coalesce=((True, False))        |           1000         |             1500
      n=10, nnz=10000, coalesce=((False, True))        |           1000         |             1510
      n=10, nnz=10000, coalesce=((False, False))       |           3063         |             2457
      n=100, nnz=1000, coalesce=((True, True))         |             25         |               25
      n=100, nnz=1000, coalesce=((True, False))        |            180         |              130
      n=100, nnz=1000, coalesce=((False, True))        |            200         |              130
      n=100, nnz=1000, coalesce=((False, False))       |            271         |              255
      n=100, nnz=10000, coalesce=((True, True))        |            100         |              100
      n=100, nnz=10000, coalesce=((True, False))       |           2444         |             2290
      n=100, nnz=10000, coalesce=((False, True))       |           2455         |             2357
      n=100, nnz=10000, coalesce=((False, False))      |           5316         |             3783
      n=1000, nnz=10000, coalesce=((True, True))       |            204         |              211
      n=1000, nnz=10000, coalesce=((True, False))      |           2457         |             2480
      n=1000, nnz=10000, coalesce=((False, True))      |           2448         |             2539
      n=1000, nnz=10000, coalesce=((False, False))     |           3665         |             4801
      n=1000, nnz=100000, coalesce=((True, True))      |           2293         |             2374
      n=1000, nnz=100000, coalesce=((True, False))     |           9000         |            24620
      n=1000, nnz=100000, coalesce=((False, True))     |           8000         |            25080
      n=1000, nnz=100000, coalesce=((False, False))    |          26500         |            47650
      n=1000, nnz=1000000, coalesce=((True, True))     |          10000         |            13000
      n=1000, nnz=1000000, coalesce=((True, False))    |          80000         |           362200
      n=1000, nnz=1000000, coalesce=((False, True))    |          78050         |           392600
      n=1000, nnz=1000000, coalesce=((False, False))   |         312100         |           766900

Times are in microseconds (us).
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85336
Approved by: https://github.com/cpuhrsch
2022-09-23 23:31:19 +00:00
PyTorch MergeBot
d10de31cc8 Revert "Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)"
This reverts commit 78afa0cf0c.

Reverted https://github.com/pytorch/pytorch/pull/85417 on behalf of https://github.com/clee2000 due to broke tests on trunk 78afa0cf0c
2022-09-23 17:21:43 +00:00
Elias Ellison
78afa0cf0c Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417
Approved by: https://github.com/ezyang
2022-09-23 15:50:03 +00:00
PyTorch MergeBot
5043457a8e Revert "Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)"
This reverts commit 9c77083965.

Reverted https://github.com/pytorch/pytorch/pull/85417 on behalf of https://github.com/clee2000 due to broke tests on trunk (and pull somehow) 9c77083965
2022-09-22 15:44:38 +00:00
Elias Ellison
9c77083965 Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417
Approved by: https://github.com/ezyang
2022-09-22 13:03:57 +00:00
Elias Ellison
d9aa6dfe88 Add Fake Cross Ref Mode, migrate sparse to it (#85382)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85382
Approved by: https://github.com/ezyang
2022-09-21 17:15:47 +00:00
PyTorch MergeBot
81620c3360 Revert "Faster mul(sparse, sparse) with broadcasting in dense dims. (#83428)"
This reverts commit d49943bda8.

Reverted https://github.com/pytorch/pytorch/pull/83428 on behalf of https://github.com/osalpekar due to the `__restrict` symbol not being supported by certain MSVC compilers, leading to an undefined symbol error at compilation time
2022-09-17 06:53:11 +00:00
nikitaved
d49943bda8 Faster mul(sparse, sparse) with broadcasting in dense dims. (#83428)
Preliminary benchmarks (square matrices of shape (n, n)).

<details>

<summary>Script</summary>

```python
import torch
import math
from IPython import get_ipython
from itertools import product, repeat
import pickle
from torch.utils.benchmark import Timer, Compare

torch.manual_seed(13)

# specifies (n, nnz)
problem_dims = (
    # n > nnz
    (10000, 100),
    (100000, 1000),
    (1000000, 10000),
    # n < nnz
    (10, 100),
    (10, 1000),
    (10, 10000),
    (100, 1000),
    (100, 10000),
    (1000, 10000),
    (1000, 100000),
    (1000, 1000000),
    #(1000000, 1000000000),
)

name = "PR"
device = "cuda"
results = []

for n, nnz in problem_dims:
    def gen_tensor(coalesce=False):
        shape = (n, n)
        nrows, ncols = shape
        rowidx = torch.randint(low=0, high=nrows, size=(nnz,), device=device)
        colidx = torch.randint(low=0, high=ncols, size=(nnz,), device=device)
        itemidx = torch.vstack((rowidx, colidx))
        xvalues = torch.randn(nnz, device=device)
        itemidx = torch.hstack((itemidx, itemidx))
        xvalues = torch.hstack((xvalues, xvalues))
        res = torch.sparse_coo_tensor(itemidx, xvalues, size=shape)
        if coalesce:
            return res.coalesce()
        else:
            return res

    for x_coalesce, y_coalesce in product(*repeat((True, False), 2)):
        x = gen_tensor(x_coalesce)
        y = gen_tensor(y_coalesce)
        smtp = "x * y"
        timer = Timer(smtp,
                      globals=globals(),
                      label="coo.mul",
                      description=f"{name}: mul, device: {device}",
                      sub_label=f"n={n}, nnz={nnz}, coalesce=({x_coalesce, y_coalesce})",
                      num_threads=torch.get_num_threads())
        results.append(timer.blocked_autorange())

compare = Compare(results)
compare.trim_significant_figures()
compare.print()

with open(f"{name}_{device}_mul.pickle", 'wb') as f:
    pickle.dump(results, f)

```

</details>

<details>

<summary>Gather results</summary>

```python
import pickle
from torch.utils.benchmark import Timer, Compare

files = [
        "PR",
        "master"
        ]

device = 'cuda'

timers = []
for name in files:
    with open("{}_{}_mul.pickle".format(name, device), 'rb') as f:
        timers += pickle.load(f)

compare = Compare(timers)
compare.trim_significant_figures()
compare.print()

```

</details>

<details>

<summary>CUDA</summary>

```
[------------------------------------------------- coo.mul -------------------------------------------------]
                                                       |  PR: mul, device: cuda  |  master: mul, device: cuda
24 threads: -------------------------------------------------------------------------------------------------
      n=10000, nnz=100, coalesce=((True, True))        |             95          |                91
      n=10000, nnz=100, coalesce=((True, False))       |             87          |               242
      n=10000, nnz=100, coalesce=((False, True))       |             87          |               226
      n=10000, nnz=100, coalesce=((False, False))      |            130          |               371
      n=100000, nnz=1000, coalesce=((True, True))      |            100          |               521
      n=100000, nnz=1000, coalesce=((True, False))     |             90          |               649
      n=100000, nnz=1000, coalesce=((False, True))     |            100          |               659
      n=100000, nnz=1000, coalesce=((False, False))    |            200          |               781
      n=1000000, nnz=10000, coalesce=((True, True))    |            100          |              4861
      n=1000000, nnz=10000, coalesce=((True, False))   |            100          |              5012
      n=1000000, nnz=10000, coalesce=((False, True))   |             98          |              5010
      n=1000000, nnz=10000, coalesce=((False, False))  |            384          |              5174
      n=10, nnz=100, coalesce=((True, True))           |            100          |                79
      n=10, nnz=100, coalesce=((True, False))          |            100          |               221
      n=10, nnz=100, coalesce=((False, True))          |            100          |               221
      n=10, nnz=100, coalesce=((False, False))         |            100          |               350
      n=10, nnz=1000, coalesce=((True, True))          |            100          |               100
      n=10, nnz=1000, coalesce=((True, False))         |            100          |               240
      n=10, nnz=1000, coalesce=((False, True))         |            100          |               254
      n=10, nnz=1000, coalesce=((False, False))        |            100          |               392
      n=10, nnz=10000, coalesce=((True, True))         |            100          |               110
      n=10, nnz=10000, coalesce=((True, False))        |            110          |               286
      n=10, nnz=10000, coalesce=((False, True))        |            110          |               286
      n=10, nnz=10000, coalesce=((False, False))       |            271          |               455
      n=100, nnz=1000, coalesce=((True, True))         |            110          |               851
      n=100, nnz=1000, coalesce=((True, False))        |            110          |              1000
      n=100, nnz=1000, coalesce=((False, True))        |            110          |               990
      n=100, nnz=1000, coalesce=((False, False))       |            140          |              1124
      n=100, nnz=10000, coalesce=((True, True))        |            110          |              5137
      n=100, nnz=10000, coalesce=((True, False))       |            110          |              5391
      n=100, nnz=10000, coalesce=((False, True))       |            100          |              5405
      n=100, nnz=10000, coalesce=((False, False))      |            249          |              5539
      n=1000, nnz=10000, coalesce=((True, True))       |            100          |              8598
      n=1000, nnz=10000, coalesce=((True, False))      |            100          |              8800
      n=1000, nnz=10000, coalesce=((False, True))      |            100          |              8782
      n=1000, nnz=10000, coalesce=((False, False))     |            255          |              8956
      n=1000, nnz=100000, coalesce=((True, True))      |            120          |             84500
      n=1000, nnz=100000, coalesce=((True, False))     |            200          |             88560
      n=1000, nnz=100000, coalesce=((False, True))     |            160          |             89000
      n=1000, nnz=100000, coalesce=((False, False))    |            373          |             89000
      n=1000, nnz=1000000, coalesce=((True, True))     |            312          |            606400
      n=1000, nnz=1000000, coalesce=((True, False))    |           1340          |            609200
      n=1000, nnz=1000000, coalesce=((False, True))    |           1340          |            609100
      n=1000, nnz=1000000, coalesce=((False, False))   |           4408          |            611400

Times are in microseconds (us).
```

</details>

<details>

<summary>CPU</summary>

```
[------------------------------------------------ coo.mul ------------------------------------------------]
                                                       |  PR: mul, device: cpu  |  master: mul, device: cpu
24 threads: -----------------------------------------------------------------------------------------------
      n=10000, nnz=100, coalesce=((True, True))        |              8         |                8
      n=10000, nnz=100, coalesce=((True, False))       |             32         |               34
      n=10000, nnz=100, coalesce=((False, True))       |             32         |               34
      n=10000, nnz=100, coalesce=((False, False))      |             41         |               56
      n=100000, nnz=1000, coalesce=((True, True))      |             24         |               24
      n=100000, nnz=1000, coalesce=((True, False))     |             90         |              100
      n=100000, nnz=1000, coalesce=((False, True))     |             87         |              100
      n=100000, nnz=1000, coalesce=((False, False))    |            231         |              255
      n=1000000, nnz=10000, coalesce=((True, True))    |            190         |              200
      n=1000000, nnz=10000, coalesce=((True, False))   |            908         |             2023
      n=1000000, nnz=10000, coalesce=((False, True))   |            800         |             2036
      n=1000000, nnz=10000, coalesce=((False, False))  |           3684         |             3989
      n=10, nnz=100, coalesce=((True, True))           |              8         |                7
      n=10, nnz=100, coalesce=((True, False))          |             34         |               30
      n=10, nnz=100, coalesce=((False, True))          |             33         |               30
      n=10, nnz=100, coalesce=((False, False))         |             44         |               50
      n=10, nnz=1000, coalesce=((True, True))          |              8         |                7
      n=10, nnz=1000, coalesce=((True, False))         |            100         |              100
      n=10, nnz=1000, coalesce=((False, True))         |            130         |              100
      n=10, nnz=1000, coalesce=((False, False))        |            746         |              210
      n=10, nnz=10000, coalesce=((True, True))         |              8         |                7
      n=10, nnz=10000, coalesce=((True, False))        |           1000         |             1500
      n=10, nnz=10000, coalesce=((False, True))        |           1000         |             1510
      n=10, nnz=10000, coalesce=((False, False))       |           3063         |             2457
      n=100, nnz=1000, coalesce=((True, True))         |             25         |               25
      n=100, nnz=1000, coalesce=((True, False))        |            180         |              130
      n=100, nnz=1000, coalesce=((False, True))        |            200         |              130
      n=100, nnz=1000, coalesce=((False, False))       |            271         |              255
      n=100, nnz=10000, coalesce=((True, True))        |            100         |              100
      n=100, nnz=10000, coalesce=((True, False))       |           2444         |             2290
      n=100, nnz=10000, coalesce=((False, True))       |           2455         |             2357
      n=100, nnz=10000, coalesce=((False, False))      |           5316         |             3783
      n=1000, nnz=10000, coalesce=((True, True))       |            204         |              211
      n=1000, nnz=10000, coalesce=((True, False))      |           2457         |             2480
      n=1000, nnz=10000, coalesce=((False, True))      |           2448         |             2539
      n=1000, nnz=10000, coalesce=((False, False))     |           3665         |             4801
      n=1000, nnz=100000, coalesce=((True, True))      |           2293         |             2374
      n=1000, nnz=100000, coalesce=((True, False))     |           9000         |            24620
      n=1000, nnz=100000, coalesce=((False, True))     |           8000         |            25080
      n=1000, nnz=100000, coalesce=((False, False))    |          26500         |            47650
      n=1000, nnz=1000000, coalesce=((True, True))     |          10000         |            13000
      n=1000, nnz=1000000, coalesce=((True, False))    |          80000         |           362200
      n=1000, nnz=1000000, coalesce=((False, True))    |          78050         |           392600
      n=1000, nnz=1000000, coalesce=((False, False))   |         312100         |           766900

Times are in microseconds (us).
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83428
Approved by: https://github.com/cpuhrsch
2022-09-16 00:28:40 +00:00
Edward Z. Yang
c5a8946e40 Revert "Revert "Redo how custom/python_custom methods on TensorImpl work (#84796)" (#84806)
This reverts commit ca3b2bfbe3.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84806
Approved by: https://github.com/Chillee
2022-09-10 06:17:35 +00:00
Eli Uriegas
ca3b2bfbe3 Revert "Redo how custom/python_custom methods on TensorImpl work (#84796)
This reverts commit 591b75bf98.

Manual revert of https://github.com/pytorch/pytorch/pull/84641

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84796
Approved by: https://github.com/izaitsevfb
2022-09-10 00:18:13 +00:00
Edward Z. Yang
591b75bf98 Redo how custom/python_custom methods on TensorImpl work (#84641)
A longstanding confusion in the implementation of fake tensor and proxy tensor is what to do about torch.ops.aten.sym_sizes and related calls. In particular, when you have a tensor that (1) has symbolic shapes and (2) has a `__torch_dispatch__` implementation, you would previously always get `__torch_dispatch__` calls for sizes/strides queries, *even if you didn't request them* via the dispatch kwargs in `make_wrapper_subclass`.

The reason is that we were previously mixing several concepts: "I want to dispatch to Python", "I want to call a virtual method", and "I have dynamic shapes". A single boolean variable controlled all of these things, so it was not possible to understand inside TensorImpl what the user had actually requested.

In this PR, we track each of these concepts individually so that we can preserve user intent. Then we combine them into a single "policy" variable that controls whether or not we can use the fastpath. For the policy to trigger, we only need one of the exceptional cases to be true. (A small illustrative sketch follows the billing of changes below.)

Billing of changes:
* Rename `set_sizes_strides_policy` to `set_custom_sizes_strides`; in general, you cannot DIRECTLY set policy; you have to set it indirectly via the public functions.
* Some helpers for sizes and strides, since it's more complicated (as it is an enum, rather than just bools as is the case for device and layout). `matches_python_custom` is used to test the Python dispatch user ask. `matches_policy` does the policy test (only used in the user facing functions.)
* I reorged the accessor methods so that they are more logical. This makes the diff bad, so I recommend reading the final code directly.
* The default custom implementations now more reliably call their default() implementations
* As bonus refactor, I devirtualized some functions that don't need to be virtual
* `set_sym_sizes_and_strides` is renamed to `set_sizes_and_strides` to make it easier to use in template contexts; it optionally takes a storage offset now so you can set all three values at the same time. If you use the SymInt overload but there are no symbolic integers, we give you a normal resize.
* This adds `sym_storage_offset` since we had that in the symbolic shapes branch and there's no reason not to put it in (and it reduces merge conflicts)
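
Referenced from the description above, a hypothetical Python-side sketch (the subclass and the kwarg usage assume the present-day `_make_wrapper_subclass` API): only an explicitly requested sizes/strides policy opts size queries out of the C++ fastpath.

```python
import torch

class SizesDispatchTensor(torch.Tensor):
    @staticmethod
    def __new__(cls, elem):
        # requesting the "sizes" policy is what routes size/stride
        # queries to __torch_dispatch__; omit it and they stay on
        # the C++ fastpath
        return torch.Tensor._make_wrapper_subclass(
            cls, elem.shape, dtype=elem.dtype, device=elem.device,
            dispatch_sizes_strides_policy="sizes",
        )

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        print(f"dispatched: {func}")
        raise NotImplementedError(f"{func} not handled in this sketch")
```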

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84641
Approved by: https://github.com/wconstab
2022-09-09 13:41:13 +00:00
Elias Ellison
15c5baf878 Throw on data dependent ops (#83567)
Previously, we would trace through the following with no error:
```
from torch.fx.experimental.proxy_tensor import make_fx
import torch

def f(x, y):
    return x[0, y:]
```

This traced successfully even though the output shape depends on the data of `y`. Now we throw on the conversion of `y` to an integer.
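
A hedged repro sketch, repeating the function above for self-containment (assuming `make_fx`'s fake tracing mode; the exact exception type may differ):

```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx

def f(x, y):
    return x[0, y:]

try:
    make_fx(f, tracing_mode="fake")(torch.randn(3, 4), torch.tensor(1))
except Exception as e:
    # converting the traced `y` to a Python int for the slice is
    # data-dependent and now raises instead of tracing through
    print(type(e).__name__, e)
```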

It would be nice not to break on constant tensors, but I'll do that in the next PR (Edit: done with https://github.com/pytorch/pytorch/pull/84387). Sketching out how that would work (and keep in mind this is applicable to Dynamo tracing, not just AOT Autograd):

I think to do that you would need to:
- hold strong refs to a set of constant tensors, and only allow them to be captured from `lift_fresh.copy`
- when you run a mutable op, either remove it from the set of constant tensors or run the operator for real
- limit to small constant tensors

Anything else?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83567
Approved by: https://github.com/ezyang
2022-09-07 02:37:00 +00:00
Andrew M. James
6dc9223c8b Sparse_coo: Be more aggressive in setting coalesced True to avoid surprising behaviors (#82426)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82426
Approved by: https://github.com/pearu, https://github.com/bhosmer
2022-09-01 17:46:51 +00:00
jpvillam
247468baf0 [ROCm] More Sparse UTs enablement and more hipification mappings. (#78939)
Enables:

 test_bmm_cuda_float64
 test_bmm_deterministic_cuda_float64
 test_csr_matvec_cuda_complex128
 test_csr_matvec_cuda_complex64
 test_csr_matvec_cuda_float32
 test_csr_matvec_cuda_float64

To enable the above tests, we had to add some more HIP mappings for the hipification process.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78939
Approved by: https://github.com/pruthvistony, https://github.com/malfet
2022-08-23 13:54:09 +00:00