pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
yanbing-j	da7675621e	Optimize scatter_add/scatter_reduce in BFloat16/Half data type in CPU backend (#103427 ) ### Description This PR optimizes scatter_add/scatter_reduce for the BFloat16/Half data types in the CPU backend, which is one task in https://github.com/pyg-team/pytorch_geometric/issues/7057. The main idea is to create a buffer among threads that accumulates intermediate data in fp32. Next step: - [x] Add benchmarks - [x] Extend to Half - [x] Simplify code ### Performance test (Updated) BFloat16 tested on Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz with jemalloc and iomp Single socket (40C) ![image](https://github.com/pytorch/pytorch/assets/61222868/4b4342f1-8cc3-46f7-81f5-651becd9b1e3) Single core ![image](https://github.com/pytorch/pytorch/assets/61222868/09e5f700-2c2e-4208-979e-74b85474dea6) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103427 Approved by: https://github.com/mingfeima, https://github.com/albanD	2023-07-06 01:23:56 +00:00
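The fp32-accumulation idea in the commit above can be illustrated at the Python level. This is a minimal sketch, not the PR's C++ kernel: the helper name `scatter_add_fp32_accum` is hypothetical, and the code simply upcasts the BFloat16 input, scatter-adds in float32, and rounds back to BFloat16 once at the end.

```python
import torch


def scatter_add_fp32_accum(n, index, src):
    """Hypothetical helper (not the PR's kernel): scatter-add a low-precision
    1-D `src` into a length-`n` output, accumulating in float32 and rounding
    back to `src.dtype` only once at the end."""
    acc = torch.zeros(n, dtype=torch.float32)
    acc.scatter_add_(0, index, src.float())  # every partial sum stays in fp32
    return acc.to(src.dtype)  # one final rounding to bfloat16/half


torch.manual_seed(0)
n = 4
index = torch.randint(0, n, (10_000,))
src = torch.full((10_000,), 0.01, dtype=torch.bfloat16)

out = scatter_add_fp32_accum(n, index, src)
# float64 reference computed over the same bfloat16 input values
ref = torch.zeros(n, dtype=torch.float64).scatter_add_(0, index, src.double())
```

Rounding once at the end keeps the error near a single bfloat16 rounding step. Repeatedly adding 0.01 to a running bfloat16 sum would instead stop growing once the sum's spacing exceeds 0.02.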
Andrew M. James	5364366f8c	Sparse Compressed mm avoid creating temp sparse (#104062 ) When mm forwards to addmm, it creates a zeroed-out `self`; this tensor should take its options from the result, not from one of the sparse arguments. The bug led to an error when calling linear with an `out` kwarg. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104062 Approved by: https://github.com/nikitaved, https://github.com/pearu	2023-06-26 16:45:04 +00:00
Aleksandar Samardžić	09fdea8564	Fix autograd issue with identity conversions (#92022 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92022 Approved by: https://github.com/pearu, https://github.com/mtaaooby, https://github.com/amjames, https://github.com/cpuhrsch	2023-06-21 21:23:03 +00:00
Nikita Vedeneev	39a22e2791	softmax: Triton kernel for BSR inputs (#102095 ) Implements `softmax` Triton kernel for BSR inputs. So far, only over `dim=-1`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102095 Approved by: https://github.com/cpuhrsch	2023-06-21 01:23:27 +00:00
Pearu Peterson	cbe270d233	Fix zeros_like for sparse tensors with batch dimensions. Add opinfo-based tests to like-functions. (#101215 ) Fixes #101078 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101215 Approved by: https://github.com/cpuhrsch	2023-06-13 16:02:10 +00:00
Xiao Wang	6340aa5d58	Skip test test_triton_bsr_dense_bmm if not TEST_WITH_TORCHINDUCTOR [v2] (#102660 ) Test was originally skipped in https://github.com/pytorch/pytorch/pull/98462 Not sure why it was removed in https://github.com/pytorch/pytorch/pull/94825 Now the test hits CUDA illegal memory access on H100 again after https://github.com/pytorch/pytorch/pull/101163 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102660 Approved by: https://github.com/zou3519	2023-06-01 20:36:45 +00:00
Pearu Peterson	9f97b7c43b	Add integer overflow checks for large compressed tensor dimensions and nnz (#102530 ) With the previous PR allowing large compressed tensors (dimensions larger than `2**31 - 1`), sparse compressed tensor invariants checks may give false-positive results:
```python
>>> nnz=2**31
>>> torch.sparse.check_sparse_tensor_invariants.enable()
>>> torch.sparse_csr_tensor(torch.arange(nnz+1, dtype=torch.int32), torch.zeros(nnz, dtype=torch.int32), torch.ones(nnz), (nnz, 1))
tensor(crow_indices=tensor([ 0, 1, 2, ..., 2147483646, 2147483647, -2147483648]), col_indices=tensor([0, 0, 0, ..., 0, 0, 0]), values=tensor([1., 1., 1., ..., 1., 1., 1.]), size=(2147483648, 1), nnz=2147483648, layout=torch.sparse_csr)
```
(notice that the last entry in `crow_indices` is invalid) or raise a bogus exception as in
```python
>>> torch.sparse_csr_tensor(torch.arange(nnz+1, dtype=torch.int32), torch.arange(nnz, dtype=torch.int32), torch.ones(nnz), (nnz, 1))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: `0 <= col_indices < ncols` is not satisfied.
```
(notice that `col_indices` is actually valid). This PR fixes the above-reported bugs by introducing integer overflow checks for sparse compressed tensors dimensions as well as nnz. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102530 Approved by: https://github.com/nikitaved	2023-05-31 15:34:08 +00:00
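The overflow above follows from `torch.int32` topping out at `2**31 - 1`. In CSR, the last entry of `crow_indices` equals nnz, so nnz itself must fit in the index dtype. The sketch below shows a conservative check of that kind. The helper `check_fits_index_dtype` is hypothetical and does not mirror the exact rules PyTorch enforces.

```python
import torch


def check_fits_index_dtype(index_dtype, nnz, shape):
    """Hypothetical, conservative check: raise if nnz or any dimension cannot
    be stored in `index_dtype`. For CSR, crow_indices[-1] == nnz, so nnz
    itself must be representable."""
    limit = torch.iinfo(index_dtype).max
    named = [("nnz", nnz)] + [(f"dim {i}", d) for i, d in enumerate(shape)]
    for name, value in named:
        if value > limit:
            raise OverflowError(
                f"{name}={value} does not fit in {index_dtype} (max {limit})"
            )


# int32 can hold at most 2**31 - 1, exactly the boundary in the commit above.
check_fits_index_dtype(torch.int32, 2**31 - 1, (2**31 - 1, 1))  # passes
check_fits_index_dtype(torch.int64, 2**31, (2**31, 1))  # int64 indices are fine
```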
Nikita Vedeneev	d80d3b18d0	nn.Linear with BSR inputs: spare the user from explicit Triton kernel registrations (#98403 ) <!-- copilot:summary --> ### <samp>🤖 Generated by Copilot at 08f7a6a</samp> This pull request adds support for triton kernels in `torch` and `torch/cuda`, and refactors and tests the existing triton kernel for BSR matrix multiplication. It also adds a test case to ensure that importing `torch` does not implicitly import `triton`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/98403 Approved by: https://github.com/malfet, https://github.com/cpuhrsch	2023-05-31 13:09:45 +00:00
Pearu Peterson	fcbdbd6682	Fix silent nnz overflow for large sparse compressed tensors. (#102523 ) Fixes https://github.com/pytorch/pytorch/issues/102520 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102523 Approved by: https://github.com/nikitaved, https://github.com/cpuhrsch	2023-05-30 16:58:01 +00:00
Nikita Vedeneev	6c7410ddc3	sampled_addmm: BSR support (#101163 ) This PR implements a `sampled_addmm` kernel that works with a BSR mask. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101163 Approved by: https://github.com/cpuhrsch	2023-05-25 12:33:50 +00:00
Nikita Vedeneev	346e1f512f	sparse compressed validation: allow empty-batched inputs (#101180 ) Fixes https://github.com/pytorch/pytorch/issues/101179. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101180 Approved by: https://github.com/pearu, https://github.com/cpuhrsch	2023-05-11 20:30:20 +00:00
Nikita Vedeneev	dd2c22f4bb	bsr_dense_bmm(): enable more precise float32 support with float64 accumulators (#100882 ) Float64 is there in Triton! This PR increases precision for float32 inputs with float64 accumulation dtype. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100882 Approved by: https://github.com/cpuhrsch	2023-05-11 11:22:55 +00:00
Pearu Peterson	92a7640b76	Add mul tests with sparse sample inputs (#100393 ) This PR implements sparse sample inputs and error inputs for mul OpInfo. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100393 Approved by: https://github.com/amjames, https://github.com/cpuhrsch	2023-05-09 16:13:14 +00:00
Nikita Vedeneev	0141a242fd	bsr_dense_bmm(): remove sparse_rowspace kernel and some dead code (#100876 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100876 Approved by: https://github.com/cpuhrsch, https://github.com/Skylion007	2023-05-09 16:12:11 +00:00
Nikita Vedeneev	c4bc259f00	bsr_dense_mm(): better test coverage (#100543 ) This PR improves test coverage for `bsr_dense_mm` by: - ~~enabling correctness tests for `float32`~~. - extending and testing input correctness checks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100543 Approved by: https://github.com/cpuhrsch, https://github.com/malfet	2023-05-09 09:26:02 +00:00
Pearu Peterson	3ae0e23b90	Fix sum OpInfo for sparse sample inputs and assert coverage for sparse-enabled operators (#100391 ) This PR enables sum tests for sparse sample inputs. Previously, the tests existed but were never run because the sum OpInfo instance was created without specifying `supports_sparse_=True`. To avoid such mistakes in the future, the following PR https://github.com/pytorch/pytorch/pull/100392 enables the `supports_sparse_` flags automatically when OpInfo creation specifies `sample_inputs_sparse_*_func`. In addition, the PR applies several fixes to sum tests for sparse sample inputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100391 Approved by: https://github.com/cpuhrsch	2023-05-03 02:04:39 +00:00
Nikita Vedeneev	1adb6fa922	nn.Linear: dispatch to bsr_dense_mm for half and bfloat16 (#94825 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94825 Approved by: https://github.com/albanD, https://github.com/cpuhrsch	2023-04-15 13:38:42 +00:00
Xiao Wang	bd83b205cc	Skip test test_triton_bsr_dense_bmm if not TEST_WITH_TORCHINDUCTOR (#98462 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/98462 Approved by: https://github.com/zou3519	2023-04-10 21:21:06 +00:00
eqy	2fddcf0fc0	[CUDA][CUDA 11] Remove more CUDA 11 version checks (#92934 ) Working on removing stragglers missed in previous CUDA version < 11.0 cleanup PRs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92934 Approved by: https://github.com/ngimel	2023-03-30 19:49:52 +00:00
Aaron Gokaslan	47dca20d80	[BE] Enable flake8-comprehension rule C417 (#97880 ) Enables flake8-comprehension rule C417. Ruff autogenerated these fixes to the codebase. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97880 Approved by: https://github.com/ezyang, https://github.com/kit1980, https://github.com/albanD	2023-03-30 14:34:24 +00:00
Sergii Dymchenko	5ab50cf048	Fix shoud/shoudl typos (#97930 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97930 Approved by: https://github.com/clee2000	2023-03-30 08:27:16 +00:00
Nikita Shulga	2c16b73a1b	Remove comma from parametrized test name (#97844 ) Uses the `name_fn` argument of the `@parametrize` decorator, as the internal test runner can't figure out how to parse those names; otherwise this is a no-op. For those with intern access, see [T149211516](https://www.internalfb.com/intern/tasks/?t=149211516) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97844 Approved by: https://github.com/weiwangmeta	2023-03-29 14:20:13 +00:00
Nikita Shulga	b443198966	Fix sparse addmv ref impl for non-contig tensors (#97730 ) Fix logic in `test_block_addmm` that tested the op against itself rather than against a dense implementation, by implementing a `ref_addvm` function that converts the tensor back to dense before multiplying it with a vector. Fix the reference implementation by passing strides for the vector and result. (Not sure whether it would be more efficient to iterate over the strided tensor or to request a dense copy, as the MKL implementation does.) Print a more verbose error message if values differ. Fixes https://github.com/pytorch/pytorch/issues/97629 , https://github.com/pytorch/pytorch/issues/97589 , https://github.com/pytorch/pytorch/issues/97563 Pull Request resolved: https://github.com/pytorch/pytorch/pull/97730 Approved by: https://github.com/cpuhrsch	2023-03-28 20:46:32 +00:00
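The commit above describes checking a block-sparse `addmv` against a dense computation and respecting vector strides. The following is a pure-Python sketch of a BSR reference `addmv`, written only for illustration. The name `bsr_addmv_ref` is hypothetical and this is not PyTorch's reference implementation. It loops over the stored blocks and slices the (possibly non-contiguous) vector directly, then compares against `torch.addmv` on the dense matrix.

```python
import torch


def bsr_addmv_ref(y, A, x, beta=1.0, alpha=1.0):
    """Hypothetical reference: beta * y + alpha * (A @ x) for a 2-D BSR
    tensor A, looping over stored blocks. Slicing x works for any stride."""
    crow, col = A.crow_indices(), A.col_indices()
    blocks = A.values()  # shape (nnz_blocks, br, bc)
    br, bc = blocks.shape[1], blocks.shape[2]
    acc = torch.zeros_like(y)
    for i in range(crow.numel() - 1):  # block row i
        for k in range(crow[i].item(), crow[i + 1].item()):
            j = col[k].item()  # block column j
            acc[i * br:(i + 1) * br] += blocks[k] @ x[j * bc:(j + 1) * bc]
    return beta * y + alpha * acc


torch.manual_seed(0)
dense = torch.randn(4, 6)
dense[0:2, 2:4] = 0  # one all-zero 2x2 block
A = dense.to_sparse_bsr((2, 2))
x = torch.randn(12)[::2]  # non-contiguous vector (stride 2), length 6
y = torch.randn(4)
got = bsr_addmv_ref(y, A, x, beta=0.5, alpha=2.0)
expected = torch.addmv(y, dense, x, beta=0.5, alpha=2.0)  # dense ground truth
```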
Nikita Shulga	ad5d81adda	[Sparse] Add reference implementation for addmv (#97353 ) Partially addresses the problem raised in https://github.com/pytorch/pytorch/issues/96972 Add `test_addmv` and enable `test_block_addmv` on all platforms (so the test can be run on M1). TODO: Make sure that test_block_addmv non-contiguous mode actually generates non-contiguous inputs; right now it probably does not, as the test passes assuming values are contiguous. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97353 Approved by: https://github.com/cpuhrsch	2023-03-24 06:14:32 +00:00
haozhe.zhu	fe0afc5852	use accumulate type in BF16 gemm(include dot, mv) ref path (#96074 ) Fix https://github.com/pytorch/pytorch/issues/95125 and https://github.com/pytorch/pytorch/issues/83863 for bf16 accumulation in gemm ref path Pull Request resolved: https://github.com/pytorch/pytorch/pull/96074 Approved by: https://github.com/lezcano, https://github.com/peterbell10	2023-03-23 01:22:59 +00:00
Nikita Vedeneev	55cf7eef86	add/add_ for sparse compressed formats: fix silent index downcast int64 -> int32 (#95294 ) Fixes https://github.com/pytorch/pytorch/issues/95224. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95294 Approved by: https://github.com/cpuhrsch, https://github.com/amjames	2023-03-10 17:51:40 +00:00
Nikita Vedeneev	98a4d74a68	COO intersection primitives: performance improvement (#96094 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96094 Approved by: https://github.com/pearu	2023-03-07 13:21:29 +00:00
Nikita Vedeneev	d809020fc8	Triton kernel for bsr @ dense (#94823 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94823 Approved by: https://github.com/cpuhrsch, https://github.com/malfet	2023-03-03 15:11:28 +00:00
PyTorch MergeBot	d7637801d3	Revert "COO intersection primitives: performance improvement (#92976 )" This reverts commit `b033594943`. Reverted https://github.com/pytorch/pytorch/pull/92976 on behalf of https://github.com/seemethere due to Need to revert this so I can revert https://github.com/pytorch/pytorch/pull/94048 cleanly	2023-03-03 01:38:56 +00:00
Nikita Vedeneev	b033594943	COO intersection primitives: performance improvement (#92976 ) This PR improves COO intersection primitives by: * making it sync-less (dims <= 8, can be changed to any value that fits stack). * improving performance with much less kernel calls. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92976 Approved by: https://github.com/cpuhrsch, https://github.com/pearu	2023-03-02 17:42:39 +00:00
Nikita Vedeneev	325b43661e	add/add_ for compressed sparse inputs: bypass BLAS in some trivial cases (#95293 ) In `add(self, other, out=...)` we can bypass calls to BLAS in cases when `self == other == out` and `self == other`. This PR fixes the repro from https://github.com/pytorch/pytorch/issues/94966, but the issue is still present when `x.add_(x)` is replaced, say, with `x = x.clone().add_(x)`. Could that be a synchronization issue? CC @IvanYashchuk . Pull Request resolved: https://github.com/pytorch/pytorch/pull/95293 Approved by: https://github.com/cpuhrsch	2023-02-27 16:06:02 +00:00
mingfeima	c620ece726	port sparse_mm.reduce to pytorch and optimize it on CPU (#83727 ) ### Motivation of this PR This patch migrates `spmm_reduce` from `torch-sparse` (a 3rd party dependency for PyG) to `torch`, in response to the initial proposal for fusing Gather-Apply-Scatter (GAS) in Message Passing for GNN inference/training: https://github.com/pytorch/pytorch/issues/71300 GAS is the major step of Message Passing; its behavior can be classified into 2 kinds depending on the storage type of `EdgeIndex`, which records the connections of nodes: * COO: the hotspot is `scatter_reduce` * CSR: the hotspot is `spmm_reduce` The reduce type can be chosen from: "sum", "mean", "max", "min". `torch.sparse.mm` is extended with a `reduce` argument, which maps to `torch.sparse_mm.reduce` internally. `sparse_mm_reduce` is registered under the TensorTypeId of `SparseCsrCPU`, and this operator requires an internal interface `_sparse_mm_reduce_impl` which has dual outputs: * `out` - the actual output * `arg_out` - records output indices in the non zero elements if the reduce type is "max" or "min"; this is only useful for training, so it is not calculated for inference. ### Performance Benchmark on GCN for ogbn-products on Xeon single socket: the workload is improved by `4.3x` with this patch. The performance benefit for training will be bigger: the original backward impl for `sum|mean` is sequential, and the original backward impl for `max|min` is not fused. 
#### before:
```
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
Name                           Self CPU %    Self CPU      CPU total %   CPU total     CPU time avg  # of Calls
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
torch_sparse::spmm_sum         97.09%        56.086s       97.09%        56.088s       6.232s        9
aten::linear                   0.00%         85.000us      1.38%         795.485ms     88.387ms      9
aten::matmul                   0.00%         57.000us      1.38%         795.260ms     88.362ms      9
aten::mm                       1.38%         795.201ms     1.38%         795.203ms     88.356ms      9
aten::relu                     0.00%         50.000us      0.76%         440.434ms     73.406ms      6
aten::clamp_min                0.76%         440.384ms     0.76%         440.384ms     73.397ms      6
aten::add_                     0.57%         327.801ms     0.57%         327.801ms     36.422ms      9
aten::log_softmax              0.00%         23.000us      0.10%         55.503ms      18.501ms      3
```
#### after
```
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
Name                           Self CPU %    Self CPU      CPU total %   CPU total     CPU time avg  # of Calls
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
aten::spmm_sum                 87.35%        11.826s       87.36%        11.827s       1.314s        9
aten::linear                   0.00%         92.000us      5.87%         794.451ms     88.272ms      9
aten::matmul                   0.00%         62.000us      5.87%         794.208ms     88.245ms      9
aten::mm                       5.87%         794.143ms     5.87%         794.146ms     88.238ms      9
aten::relu                     0.00%         53.000us      3.35%         452.977ms     75.496ms      6
aten::clamp_min                3.35%         452.924ms     3.35%         452.924ms     75.487ms      6
aten::add_                     2.58%         348.663ms     2.58%         348.663ms     38.740ms      9
aten::argmax                   0.42%         57.473ms      0.42%         57.475ms      14.369ms      4
aten::log_softmax              0.00%         22.000us      0.39%         52.605ms      17.535ms      3
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83727 Approved by: https://github.com/jgong5, https://github.com/cpuhrsch, https://github.com/rusty1s, https://github.com/pearu	2023-02-10 15:56:40 +00:00
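To make the four reduce types in the commit above concrete, here is a slow, pure-Python sketch of the semantics as we understand them. For each row `i` of a CSR matrix `A`, reduce `A[i, j] * B[j]` over the stored columns `j`. The function `spmm_reduce_ref` is hypothetical and is not the PyTorch kernel. Returning 0 for rows with no stored entries is an assumption of this sketch.

```python
import torch


def spmm_reduce_ref(A, B, reduce="sum"):
    """Hypothetical reference: out[i] = reduce over stored (i, j) of
    A[i, j] * B[j], for a 2-D CSR tensor A and a strided matrix B.
    Rows of A with no stored entries produce zeros (an assumption here)."""
    crow, col, val = A.crow_indices(), A.col_indices(), A.values()
    out = torch.zeros(A.shape[0], B.shape[1], dtype=B.dtype)
    for i in range(A.shape[0]):
        start, end = crow[i].item(), crow[i + 1].item()
        if start == end:
            continue
        rows = val[start:end, None] * B[col[start:end]]  # (nnz_in_row, n)
        if reduce == "sum":
            out[i] = rows.sum(0)
        elif reduce == "mean":
            out[i] = rows.mean(0)
        elif reduce == "max":
            out[i] = rows.amax(0)
        elif reduce == "min":
            out[i] = rows.amin(0)
        else:
            raise ValueError(f"unsupported reduce: {reduce}")
    return out


dense = torch.tensor([[1.0, 0.0, 2.0], [0.0, 0.0, 0.0], [0.0, 3.0, 0.0]])
A = dense.to_sparse_csr()
B = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

With `reduce="sum"`, the sketch matches the ordinary product `dense @ B`. The other reduce types differ only in how each row's scaled rows of `B` are combined.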
Aleksandar Samardžić	e1f17b3530	Add CSR->BSC and CSC->BSR conversions (#93301 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/93301 Approved by: https://github.com/cpuhrsch	2023-02-07 19:22:05 +00:00
Nikita Vedeneev	bb6af061a0	`torch.triangular_solve` for CSR: materialize diagonal elements when `unitriangular=True`. (#93352 ) Fixes https://github.com/pytorch/pytorch/issues/88890 A temporary fix until MKL is fixed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/93352 Approved by: https://github.com/cpuhrsch	2023-01-31 16:33:57 +00:00
Aleksandar Samardžić	53f7fb9a22	Add CSC->BSC conversion (#92307 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92307 Approved by: https://github.com/cpuhrsch	2023-01-30 17:03:36 +00:00
Pearu Peterson	65d6802e2f	Improve error messages for sparse methods on tensors with unsupported backends/layouts. (#93149 ) Fixes https://github.com/pytorch/pytorch/issues/92790 Pull Request resolved: https://github.com/pytorch/pytorch/pull/93149 Approved by: https://github.com/cpuhrsch	2023-01-27 19:50:23 +00:00
PyTorch MergeBot	7012d985fa	Revert "Improve `bsr @ strided` performance in `baddmm` for `bfloat16/half` with Triton kernels. (#88078 )" This reverts commit `46f16b9363`. Reverted https://github.com/pytorch/pytorch/pull/88078 on behalf of https://github.com/ZainRizvi due to Causing a test to fail consistently: test_decomp.py::HasDecompTest::test_has_decomposition	2023-01-26 16:22:29 +00:00
Nikita Vedeneev	46f16b9363	Improve `bsr @ strided` performance in `baddmm` for `bfloat16/half` with Triton kernels. (#88078 ) As per title. Additionally we also introduce support for: - Rectangular block sizes which are powers of 2 and at least 16 (triton's `dot` limitation). - Batch support with broadcasting for either of the arguments. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88078 Approved by: https://github.com/cpuhrsch	2023-01-26 07:58:27 +00:00
Eddie Yan	0bf7506051	[CUDA] Drop CUDA < 11.0 test flags (#92605 ) Follow-up of #89582 to drop flags like `CUDA11OrLater` in tests. Note that in some places it appears that `TEST_WITH_ROCM` is _implicitly_ guarded against via the `CUDA11OrLater` version check, based on my best-guess of how `torch.version.cuda` would behave in ROCM builds, so I've added `not TEST_WITH_ROCM` in cases where ROCM wasn't previously explicitly allowed. CC @ptrblck @malfet @ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/92605 Approved by: https://github.com/ngimel	2023-01-24 04:34:06 +00:00
Yanbo Liang	0ab4ab9f8d	[Dynamo] Fix calling UserDefinedObject.func should pass self object (#92050 ) Fixes #90834 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92050 Approved by: https://github.com/jansel	2023-01-21 05:47:01 +00:00
PyTorch MergeBot	60bf851931	Revert "Improve `bsr @ strided` performance in `baddmm` for `bfloat16/half` with Triton kernels. (#88078 )" This reverts commit `8383b5c488`. Reverted https://github.com/pytorch/pytorch/pull/88078 on behalf of https://github.com/malfet due to This seems to have broke sm_86 testing, see https://hud.pytorch.org/hud/pytorch/pytorch/master/1?per_page=50&name_filter=sm86%20%2F%20test%20(default%2C%203	2023-01-19 23:37:59 +00:00
Nikita Vedeneev	8383b5c488	Improve `bsr @ strided` performance in `baddmm` for `bfloat16/half` with Triton kernels. (#88078 ) As per title. Additionally we also introduce support for: - Rectangular block sizes which are powers of 2 and at least 16 (triton's `dot` limitation). - Batch support with broadcasting for either of the arguments. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88078 Approved by: https://github.com/cpuhrsch	2023-01-19 03:14:54 +00:00
PyTorch MergeBot	89f1ad08b4	Revert "Improve `bsr @ strided` performance in `baddmm` for `bfloat16/half` with Triton kernels. (#88078 )" This reverts commit `7f256fff77`. Reverted https://github.com/pytorch/pytorch/pull/88078 on behalf of https://github.com/huydhn due to This breaks lint `7f256fff77`	2023-01-17 22:14:37 +00:00
Nikita Vedeneev	7f256fff77	Improve `bsr @ strided` performance in `baddmm` for `bfloat16/half` with Triton kernels. (#88078 ) As per title. Additionally we also introduce support for: - Rectangular block sizes which are powers of 2 and at least 16 (triton's `dot` limitation). - Batch support with broadcasting for either of the arguments. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88078 Approved by: https://github.com/cpuhrsch	2023-01-17 21:43:20 +00:00
Pearu Peterson	b3e4f5029b	Add check-sparse-tensor-invariants flag to Context - 2nd try. (#92094 ) This PR is a copy of https://github.com/pytorch/pytorch/pull/90849, whose merge was reverted. The PR adds a "check sparse tensor invariants" flag to Context that, when enabled, triggers sparse tensor data invariant checks in unsafe methods of constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to UI: `torch.sparse.check_sparse_tensor_invariants` class provides different ways to enable/disable the invariant checking. `torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden. The PR fixes https://github.com/pytorch/pytorch/issues/90833 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92094 Approved by: https://github.com/cpuhrsch	2023-01-13 14:50:33 +00:00
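A short sketch of the two opt-in styles described in the commit above: the per-call `check_invariants` argument and the `torch.sparse.check_sparse_tensor_invariants` class used as a context manager. It assumes a PyTorch release that ships this interface and that checks are disabled by default. The tensors are made-up toy inputs.

```python
import torch

crow = torch.tensor([0, 1, 2])
bad_col = torch.tensor([0, 5])  # 5 is out of range for a 2-column matrix
good_col = torch.tensor([0, 1])
vals = torch.tensor([1.0, 2.0])

# Per-call opt-in: the constructor validates its arguments.
try:
    torch.sparse_csr_tensor(crow, bad_col, vals, (2, 2), check_invariants=True)
    bad_rejected = False
except RuntimeError:
    bad_rejected = True

# Scoped opt-in: checks are enabled only inside the context manager.
with torch.sparse.check_sparse_tensor_invariants():
    enabled_inside = torch.sparse.check_sparse_tensor_invariants.is_enabled()
    good = torch.sparse_csr_tensor(crow, good_col, vals, (2, 2))
enabled_after = torch.sparse.check_sparse_tensor_invariants.is_enabled()
```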
mingfeima	3ab58fd5ed	optimize sampled_addmm performance on CPU (SparseCSR) (#90978 ) ### Target and Background This PR improves the performance of `sampled_addmm` on CPU device. This is part of the effort to improve PyG performance on CPU for GNN training/inference. The current implementation is a reference design which converts the `SparseCSR` tensor back to a dense tensor, does the addmm, and then converts back to `SparseCSR` again: this is very slow and cannot run most of the datasets under https://github.com/snap-stanford/ogb (converting to dense would trigger `OOM`). ### Benchmarks Right now we don't have any hands-on benchmark or workload to test this since this operator is not used in PyG yet. I fetched the dataset from `ogb-products` where: * number of nodes: 2.4 * 10^6 * number of edges: 1.26 * 10^8 * number of features: 128 So if we store the adjacency matrix as dense, it is going to be 2.4 * 2.4 * 4 * 10^12 bytes, which will be OOM with the current code. I extracted the first 1k rows to compare, showing a 1100x speedup: CPU: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, dual socket, 20 cores per socket.
```
### before: run 1000 rows from the whole dataset
sampled_addmm: running dataset ogb-products first 1000 rows: each iter takes 1212.000 ms!
### after: run 1000 rows from the whole dataset
sampled_addmm: running dataset ogb-products first 1000 rows: each iter takes 1.102 ms!
### after: run the whole dataset
sampled_addmm: running dataset ogb-products (the whole dataset) 2449029 rows: each iter takes 873.306 ms!
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90978 Approved by: https://github.com/pearu, https://github.com/cpuhrsch	2023-01-12 12:04:07 +00:00
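The "reference design" criticized in the commit above computes `beta * input + alpha * (mat1 @ mat2)` densely and then keeps only the entries at `input`'s stored positions. The small sketch below follows that dense approach. `sampled_addmm_dense_ref` is a hypothetical name, and the code is not the code that was replaced. It needs O(n^2) memory, which is why it cannot scale to ogb-products, but it is handy for cross-checking `torch.sparse.sampled_addmm` on toy inputs.

```python
import torch


def sampled_addmm_dense_ref(inp, mat1, mat2, beta=1.0, alpha=1.0):
    """Hypothetical dense reference for a 2-D CSR `inp`: compute
    beta * inp + alpha * (mat1 @ mat2) densely, then keep only the entries
    at inp's stored positions. Memory is O(n^2): small inputs only."""
    mask = torch.sparse_csr_tensor(
        inp.crow_indices(), inp.col_indices(),
        torch.ones_like(inp.values()), inp.shape,
    ).to_dense().bool()
    full = beta * inp.to_dense() + alpha * (mat1 @ mat2)
    return torch.where(mask, full, torch.zeros_like(full))


torch.manual_seed(0)
pattern = torch.tensor([[1.0, 0.0, 2.0], [0.0, 0.0, 3.0], [4.0, 0.0, 0.0]])
inp = pattern.to_sparse_csr()
mat1, mat2 = torch.randn(3, 5), torch.randn(5, 3)
expected = sampled_addmm_dense_ref(inp, mat1, mat2, beta=0.5, alpha=2.0)
```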
PyTorch MergeBot	c7a22bb7c7	Revert "Add check-sparse-tensor-invariants flag to Context. (#90849 )" This reverts commit `b9a035c1c5`. Reverted https://github.com/pytorch/pytorch/pull/90849 on behalf of https://github.com/DanilBaibak due to Break internal build	2023-01-12 09:58:16 +00:00
Aleksandar Samardžić	8612ec5b90	Implement hybrid sparse to/from dense conversions. (#90177 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90177 Approved by: https://github.com/cpuhrsch, https://github.com/pearu	2023-01-12 03:31:30 +00:00
PyTorch MergeBot	c5836153f5	Revert "optimize sampled_addmm performance on CPU (SparseCSR) (#90978 )" This reverts commit `645fb217c0`. Reverted https://github.com/pytorch/pytorch/pull/90978 on behalf of https://github.com/seemethere due to This broke internal builds for android due to the new file added being missing in build_variables.bzl	2023-01-11 20:12:12 +00:00
Pearu Peterson	b9a035c1c5	Add check-sparse-tensor-invariants flag to Context. (#90849 ) This PR adds "check sparse tensor invariants" flag to Context that when enabled will trigger sparse tensor data invariants checks in unsafe methods of constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to UI: - `torch.enable_check_sparse_tensor_invariants` and `torch.is_check_sparse_tensor_invariants_enabled` functions to globally enable/disable the invariant checks and to retrieve the state of the feature, respectively - `torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden. The PR also fixes https://github.com/pytorch/pytorch/issues/90833 # Main issue The following content is outdated after merging the PRs in this ghstack but kept for the record. The importance of this feature is that when enabling the invariants checks by default, say, via <details>
```
$ git diff
diff --git a/torch/__init__.py b/torch/__init__.py
index c8543057c7..19a91d0482 100644
--- a/torch/__init__.py
+++ b/torch/__init__.py
@@ -1239,3 +1239,8 @@ if 'TORCH_CUDA_SANITIZER' in os.environ:

 # Populate magic methods on SymInt and SymFloat
 import torch.fx.experimental.symbolic_shapes
+
+# temporarily enable sparse tensor arguments validation in unsafe
+# constructors:
+
+torch._C._set_check_sparse_tensor_invariants(True)
```
</details> a massive number of test failures/errors occur in test_sparse_csr.py tests:
```
$ pytest -sv test/test_sparse_csr.py
<snip>
==== 4293 failed, 1557 passed, 237 skipped, 2744 errors in 69.71s (0:01:09) ====
```
that means that we are silently constructing sparse compressed tensors that do not satisfy the sparse tensor invariants. 
In particular, the following errors are raised:
```
AssertionError: "resize_as_sparse_compressed_tensor_: self and src must have the same layout" does not match "expected values to be a strided and contiguous tensor"
RuntimeError: CUDA error: device-side assert triggered
RuntimeError: `col_indices[..., crow_indices[..., i - 1]:crow_indices[..., i]] for all i = 1, ..., nrows are sorted and distinct along the last dimension values` is not satisfied.
RuntimeError: expected col_indices to be a strided and contiguous tensor
RuntimeError: expected row_indices to be a strided and contiguous tensor
RuntimeError: expected values to be a strided and contiguous tensor
RuntimeError: for_each: failed to synchronize: cudaErrorAssert: device-side assert triggered
RuntimeError: tensor dimensionality must be sum of batch, base, and dense dimensionalities (=0 + 2 + 0) but got 3
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90849 Approved by: https://github.com/amjames, https://github.com/cpuhrsch	2023-01-11 01:05:14 +00:00
mingfeima	645fb217c0	optimize sampled_addmm performance on CPU (SparseCSR) (#90978 ) ### Target and Background This PR improves the performance of `sampled_addmm` on CPU device. This is part of the effort to improve PyG performance on CPU for GNN training/inference. The current implementation is a reference design which converts the `SparseCSR` tensor back to a dense tensor, does the addmm, and then converts back to `SparseCSR` again: this is very slow and cannot run most of the datasets under https://github.com/snap-stanford/ogb (converting to dense would trigger `OOM`). ### Benchmarks Right now we don't have any hands-on benchmark or workload to test this since this operator is not used in PyG yet. I fetched the dataset from `ogb-products` where: * number of nodes: 2.4 * 10^6 * number of edges: 1.26 * 10^8 * number of features: 128 So if we store the adjacency matrix as dense, it is going to be 2.4 * 2.4 * 4 * 10^12 bytes, which will be OOM with the current code. I extracted the first 1k rows to compare, showing a 1100x speedup: CPU: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, dual socket, 20 cores per socket.
```
### before: run 1000 rows from the whole dataset
sampled_addmm: running dataset ogb-products first 1000 rows: each iter takes 1212.000 ms!
### after: run 1000 rows from the whole dataset
sampled_addmm: running dataset ogb-products first 1000 rows: each iter takes 1.102 ms!
### after: run the whole dataset
sampled_addmm: running dataset ogb-products (the whole dataset) 2449029 rows: each iter takes 873.306 ms!
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90978 Approved by: https://github.com/pearu, https://github.com/cpuhrsch	2023-01-10 22:13:35 +00:00
Pearu Peterson	cdc30048e5	Fix numel() result after resizing a sparse compressed tensor. (#91831 ) Fixes #91830 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91831 Approved by: https://github.com/cpuhrsch	2023-01-10 18:21:07 +00:00
Pearu Peterson	b797a24259	Support indices contiguity per batch and non-contiguous values in sparse compressed tensors (#91243 ) Fixes https://github.com/pytorch/pytorch/issues/91062 With this PR, all reported failures in https://github.com/pytorch/pytorch/pull/90849 are resolved (modulo test_bmm that uses an unorthodox way to construct a batch CSR tensor). Pull Request resolved: https://github.com/pytorch/pytorch/pull/91243 Approved by: https://github.com/nikitaved, https://github.com/amjames, https://github.com/lezcano	2023-01-02 18:08:46 +00:00
Kurt Mohler	08a47549af	Rename `Tensor._storage` to `Tensor.untyped_storage` and update docs (#91414 ) Fixes #89224 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91414 Approved by: https://github.com/ezyang	2022-12-28 19:21:34 +00:00
Nikita Vedeneev	4c5928e387	Fix for `mul(compressed, wrapped scalar)` (#91239 ) Fixes https://github.com/pytorch/pytorch/issues/90819. The path with `Scalar` should have been picked up by the dispatcher, but still the path with a 0-dim wrapped scalar was broken. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91239 Approved by: https://github.com/pearu, https://github.com/cpuhrsch	2022-12-22 13:11:13 +00:00
Pearu Peterson	01e7f46215	Ensure sorted indices from the CSR->BSR conversion (#90918 ) Fixes https://github.com/pytorch/pytorch/issues/90910 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90918 Approved by: https://github.com/cpuhrsch	2022-12-16 15:49:48 +00:00
Nikita Vedeneev	c2c14f9597	Sparse compressed mm: fix for orthogonal inputs (#90917 ) Fixes https://github.com/pytorch/pytorch/issues/90836 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90917 Approved by: https://github.com/cpuhrsch	2022-12-16 13:08:00 +00:00
Nikita Vedeneev	4dd3de23dd	Sparse compressed mm: fix for empty inputs (#90763 ) Fixes [#90693 ](https://github.com/pytorch/pytorch/issues/90693) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90763 Approved by: https://github.com/cpuhrsch	2022-12-16 12:33:57 +00:00
Pearu Peterson	76c6dfeaa6	Add layout and blocksize arguments to Tensor.to_sparse method (#89502 ) This PR extends the `Tensor.to_sparse()` method to `Tensor.to_sparse(layout=None, blocksize=None)` in a BC manner (`layout=None` means `layout=torch.sparse_coo`). In addition, the PR adds support for the following conversions: - non-hybrid/hybrid COO tensor to CSR or CSC or a COO tensor - short, bool, byte, char, bfloat16, int, long, half CSR tensor to a BSR tensor and fixes the following conversions: - hybrid COO to COO tensor - non-batch/batch hybrid BSR to BSR or BSC tensor Pull Request resolved: https://github.com/pytorch/pytorch/pull/89502 Approved by: https://github.com/amjames, https://github.com/cpuhrsch	2022-11-30 20:21:10 +00:00
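The `layout` argument described above selects the target storage format. As a rough illustration of what a CSR target actually stores, here is a minimal pure-Python sketch that builds the three CSR arrays from a 2-D list. The helper name `dense_to_csr` is hypothetical, and this is not how PyTorch implements the conversion.

```python
def dense_to_csr(dense):
    """Return (crow_indices, col_indices, values) for a 2-D list of numbers.

    Hypothetical helper: a pure-Python illustration of the CSR layout only.
    """
    crow_indices = [0]
    col_indices = []
    values = []
    for row in dense:
        for col, v in enumerate(row):
            if v != 0:  # only nonzero entries are stored
                col_indices.append(col)
                values.append(v)
        # crow_indices[i + 1] is the running count of stored entries after row i
        crow_indices.append(len(values))
    return crow_indices, col_indices, values


crow, col, val = dense_to_csr([[1, 0, 2], [0, 0, 0], [0, 3, 4]])
print(crow)  # [0, 2, 2, 4]
print(col)   # [0, 2, 1, 2]
print(val)   # [1, 2, 3, 4]
```

In PyTorch itself, the corresponding call would be `dense_tensor.to_sparse(layout=torch.sparse_csr)`, with the three arrays exposed through `crow_indices()`, `col_indices()` and `values()`.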
Pearu Peterson	296e1ba4d0	Row and column select support for block compressed sparse tensors (#88733 ) As in the title: - Support `select` and `select_copy` on block sparse compressed tensors - Fixes incorrect results when selecting dense dimensions The PR also improves the performance of indexing sparse compressed tensors considerably: <details> Before: ```python In [3]: a=torch.rand((1000, 1000)).to_sparse_csr() In [4]: %timeit a.select(0, 0) 606 µs ± 4.27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) In [5]: %timeit a.select(1, 0) 527 µs ± 57.7 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) In [6]: %timeit a[0, 0] 617 µs ± 3.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) In [7]: a = a.cuda() In [8]: %timeit a.select(0, 0); torch.cuda.synchronize(); 1.19 ms ± 137 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) In [9]: %timeit a.select(1, 0); torch.cuda.synchronize(); 1.2 ms ± 119 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) In [10]: %timeit a[0, 0]; torch.cuda.synchronize(); 1.23 ms ± 482 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) ``` This PR: ```python In [3]: a=torch.rand((1000, 1000)).to_sparse_csr() In [4]: %timeit a.select(0, 0) 4.75 µs ± 8.94 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) In [5]: %timeit a.select(1, 0) 565 µs ± 156 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) In [6]: %timeit a[0, 0] 13.1 µs ± 435 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) In [7]: a = a.cuda() In [8]: %timeit a.select(0, 0); torch.cuda.synchronize(); 21.6 µs ± 23.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) In [9]: %timeit a.select(1, 0); torch.cuda.synchronize(); 1.15 ms ± 3.13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) In [10]: %timeit a[0, 0]; torch.cuda.synchronize(); 63.7 µs ± 2.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) ``` </details> Pull Request resolved: https://github.com/pytorch/pytorch/pull/88733 Approved by: https://github.com/nikitaved, https://github.com/amjames, https://github.com/cpuhrsch	2022-11-30 11:15:56 +00:00
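The timings above show row selection (`select(0, ...)`) speeding up far more than column selection (`select(1, ...)`). A pure-Python sketch of the CSR layout suggests one plausible reason: a row is a contiguous slice located directly through `crow_indices`, while a column has to be searched for in every row. The helpers `select_row` and `select_col` are hypothetical illustrations, not the kernels from this PR.

```python
def select_row(crow_indices, col_indices, values, ncols, i):
    """Return row i as a dense list; only row i's slice is read."""
    row = [0] * ncols
    for k in range(crow_indices[i], crow_indices[i + 1]):
        row[col_indices[k]] = values[k]
    return row


def select_col(crow_indices, col_indices, values, j):
    """Return column j as a dense list; every row's slice must be scanned."""
    nrows = len(crow_indices) - 1
    col = [0] * nrows
    for i in range(nrows):
        for k in range(crow_indices[i], crow_indices[i + 1]):
            if col_indices[k] == j:
                col[i] = values[k]
                break  # assumes column indices are unique within a row
    return col


# CSR arrays for the dense matrix [[1, 0, 2], [0, 0, 0], [0, 3, 4]]
crow, cols, vals = [0, 2, 2, 4], [0, 2, 1, 2], [1, 2, 3, 4]
print(select_row(crow, cols, vals, 3, 2))  # [0, 3, 4]
print(select_col(crow, cols, vals, 2))     # [2, 0, 4]
```

Row selection touches only the stored entries of one row, whereas column selection visits all `nnz` entries, which is the asymmetry one would expect from a row-compressed layout.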
Pearu Peterson	90bed8874f	Generator of tensor inputs with variable layout and structure (batch/non-batch, hybrid/non-hybrid, block/non-block) (#88914 ) This PR introduces `TestCase.generate_simple_inputs` method that is an improved and generalized version of the `TestSparseCompressed._generate_small_inputs` method. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88914 Approved by: https://github.com/cpuhrsch	2022-11-30 02:13:33 +00:00
Pearu Peterson	50e2e4faf3	Sparse CSC/BSR/BSC serialization and pickle support (#89553 ) Fixes https://github.com/pytorch/pytorch/issues/89497 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89553 Approved by: https://github.com/cpuhrsch	2022-11-23 20:56:48 +00:00
Andrew M. James	a41f70603a	Round out rad2deg sparse support (#88442 ) - Add sparse coo dispatch - Modify backward to work with sparse compressed layouts - Enable sparse_compressed autograd testing - Correct layout support attributes on OpInfo Pull Request resolved: https://github.com/pytorch/pytorch/pull/88442 Approved by: https://github.com/cpuhrsch	2022-11-17 06:00:23 +00:00
Nikita Vedeneev	8dc3353b0b	add `to(dtype)` support for all sparse compressed formats (#89055 ) Fixes [#88419](https://github.com/pytorch/pytorch/issues/88419) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89055 Approved by: https://github.com/cpuhrsch	2022-11-15 21:16:18 +00:00
Kazuaki Ishizaki	03296844aa	Fix typos in messages under aten (#88964 ) This PR fixes typos of messages and parms in c++ source files under `aten` directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88964 Approved by: https://github.com/lezcano	2022-11-14 09:50:50 +00:00
Andrew M. James	ff6770a9a1	enable backward for log1p (sparse layouts) (#88155 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88155 Approved by: https://github.com/cpuhrsch	2022-11-04 20:59:26 +00:00
Andrew M. James	6938dd0b2c	Support sparse inputs to deg2rad (#88156 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88156 Approved by: https://github.com/cpuhrsch	2022-11-04 20:59:26 +00:00
Andrew M. James	1964d8c34f	Enable sparse_csr autograd testing for relu (#88154 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88154 Approved by: https://github.com/cpuhrsch	2022-11-04 20:59:23 +00:00
Andrew M. James	f03302ba49	Add sparse layout support for torch.frac (#88153 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88153 Approved by: https://github.com/cpuhrsch	2022-11-04 20:59:22 +00:00
Andrew M. James	b2dfd20260	Remove BSC conversion skip from TestSparseCompressed.test_consistency (#88152 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88152 Approved by: https://github.com/cpuhrsch	2022-11-01 22:18:56 +00:00
Andrew M. James	d044b4cc58	Update torch.abs and torch.positive opinfos to reflect sparse support (#88151 ) cc @nikitaved @pearu @cpuhrsch @bhosmer Pull Request resolved: https://github.com/pytorch/pytorch/pull/88151 Approved by: https://github.com/cpuhrsch	2022-11-01 22:18:56 +00:00
Ivan Yashchuk	51ea441862	Upcast to fp32 in test_addmm_block ref_half_bfloat16 (#86682 ) Fixes https://github.com/pytorch/pytorch/issues/86681 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86682 Approved by: https://github.com/nikitaved	2022-10-11 16:39:57 +00:00
nikitaved	e15a48def7	(bsr/csr) x dense mm (#85551 ) As per title. This implementation is not the most optimal and could be improved albeit with native kernels (i.e. block matching need not be materialized). Compared to existing kernels it offers: - Half float support (In fact, any dtype that supports `matmul` will work). - Arbitrary block sizes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85551 Approved by: https://github.com/amjames, https://github.com/cpuhrsch	2022-09-29 17:12:04 +00:00
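For intuition about sparse-times-dense products, here is a minimal pure-Python sketch of a non-blocked CSR @ dense multiply. Each stored element `A[i, c]` scales row `c` of the dense operand and accumulates into output row `i`, so the work is proportional to `nnz` times the number of output columns. The helper name is hypothetical, and the blocked implementation from this PR is not reproduced here.

```python
def csr_matmul_dense(crow_indices, col_indices, values, dense):
    """Multiply a CSR matrix by a dense 2-D list and return a dense 2-D list."""
    nrows = len(crow_indices) - 1
    ncols_out = len(dense[0])
    out = [[0] * ncols_out for _ in range(nrows)]
    for i in range(nrows):
        for p in range(crow_indices[i], crow_indices[i + 1]):
            c = col_indices[p]
            a_ic = values[p]
            b_row = dense[c]  # A[i, c] scales row c of the dense operand
            for j in range(ncols_out):
                out[i][j] += a_ic * b_row[j]
    return out


# A = [[1, 0, 2], [0, 0, 0], [0, 3, 4]] in CSR form, times a 3 x 2 dense matrix
print(csr_matmul_dense([0, 2, 2, 4], [0, 2, 1, 2], [1, 2, 3, 4],
                       [[1, 0], [0, 1], [1, 1]]))  # [[3, 2], [0, 0], [4, 7]]
```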
Andrew M. James	8a926b3187	Enable CSC @ CSC addmm (#85379 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85379 Approved by: https://github.com/pearu, https://github.com/cpuhrsch	2022-09-27 19:49:31 +00:00
Andrew M. James	bb5001ce3d	Enable dense x bsc mm/addmm (#85308 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85308 Approved by: https://github.com/pearu	2022-09-27 19:49:31 +00:00
Andrew M. James	aaef5d8f2c	sparse mm/addmm enable dense x csc, csc x dense and simplify layout check logic. (#85307 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85307 Approved by: https://github.com/pearu, https://github.com/cpuhrsch	2022-09-27 16:46:28 +00:00
Andrew M. James	f64857189d	resize_as_sparse support all compressed layouts (#85378 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85378 Approved by: https://github.com/pearu, https://github.com/cpuhrsch	2022-09-27 06:59:18 +00:00
George Qi	686555b663	[maskedtensor] port torch/_masked into torch/masked (#85515 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85515 Approved by: https://github.com/cpuhrsch	2022-09-26 23:41:13 +00:00
Sean Ross-Ross	a4c94f0739	Fix cuda issue with sparse.sampled_addmm (#85194 ) fixes https://github.com/pytorch/pytorch/issues/85169 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85194 Approved by: https://github.com/amjames, https://github.com/nikitaved	2022-09-23 20:52:23 +00:00
nikitaved	0278a141fc	csr <-> csc, csc <-> csc, bsr <-> bsc, bsc <-> bsc, bsr <-> bsr conversions (#85091 ) As per title. Required to enable a wider selection of sparse formats for `nn.functional.linear`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85091 Approved by: https://github.com/amjames, https://github.com/cpuhrsch	2022-09-21 20:10:26 +00:00
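Converting CSR to CSC amounts to regrouping the stored elements by column. A common approach, sketched here in pure Python as an assumption rather than a description of this PR's kernel, is a counting sort: count the entries per column, prefix-sum the counts into column pointers, then scatter each element into its column's next free slot. Because rows are visited in order, the row indices within each column come out sorted.

```python
def csr_to_csc(crow_indices, col_indices, values, ncols):
    """Return (ccol_indices, row_indices, values) for the same matrix in CSC."""
    nnz = len(values)
    # Count stored entries per column, shifted by one for the prefix sum.
    ccol_indices = [0] * (ncols + 1)
    for c in col_indices:
        ccol_indices[c + 1] += 1
    for c in range(ncols):
        ccol_indices[c + 1] += ccol_indices[c]

    row_indices = [0] * nnz
    out_values = [0] * nnz
    next_slot = ccol_indices[:-1]  # list copy: next free position in each column
    nrows = len(crow_indices) - 1
    for r in range(nrows):
        for k in range(crow_indices[r], crow_indices[r + 1]):
            c = col_indices[k]
            dst = next_slot[c]
            row_indices[dst] = r
            out_values[dst] = values[k]
            next_slot[c] += 1
    return ccol_indices, row_indices, out_values


# CSR arrays for [[1, 0, 2], [0, 0, 0], [0, 3, 4]]
print(csr_to_csc([0, 2, 2, 4], [0, 2, 1, 2], [1, 2, 3, 4], 3))
# ([0, 1, 2, 4], [0, 2, 0, 2], [1, 3, 2, 4])
```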
Pearu Peterson	f0b06c64c8	Fix bugs in sparse compressed tensor shape and device inference (#85240 ) Fixes #84999 This PR - uses device option to set sparse compressed tensor instance device - enables shape and device inference tests that was disabled due to an oversight - fixes a bug in shape inference of hybrid tensors - fixes a bug in to_sparse_bsr of a cuda tensor - updates tests that catch the above bugs Pull Request resolved: https://github.com/pytorch/pytorch/pull/85240 Approved by: https://github.com/cpuhrsch	2022-09-19 18:10:37 +00:00
Edward Z. Yang	c5a8946e40	Revert "Revert "Redo how custom/python_custom methods on TensorImpl work (#84796 )" (#84806 ) This reverts commit `ca3b2bfbe3`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84806 Approved by: https://github.com/Chillee	2022-09-10 06:17:35 +00:00
Eli Uriegas	ca3b2bfbe3	Revert "Redo how custom/python_custom methods on TensorImpl work (#84796 ) This reverts commit `591b75bf98`. Manual revert of https://github.com/pytorch/pytorch/pull/84641 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/84796 Approved by: https://github.com/izaitsevfb	2022-09-10 00:18:13 +00:00
Edward Z. Yang	591b75bf98	Redo how custom/python_custom methods on TensorImpl work (#84641 ) A longstanding confusion in the implementation of fake tensor and proxy tensor is what to do about torch.ops.aten.sym_sizes and related calls. In particular, when you have a tensor that (1) has symbolic shapes and (2) has a `__torch_dispatch__` call, previously, you would always get `__torch_dispatch__` calls for sizes/strides query, even if you didn't request it via the dispatch kwargs in `make_wrapper_subclass`. The reason for this is because we were previously mixing several concepts: "I want to dispatch to Python", "I want to call a virtual method" and "I have dynamic shapes". A single boolean variable controlled all of these things, and so it was not possible to understand inside TensorImpl what the user had actually originally requested. In this PR, we track each of these concepts individually so that we can preserve user intent. Then, we combine these into a single "policy" variable that controls whether or not we can use the fastpath or not. For the policy to trigger, we only need one of the exceptional cases to be true. Billing of changes: * Rename `set_sizes_strides_policy` to `set_custom_sizes_strides`; in general, you cannot DIRECTLY set policy; you have to indirectly set it by the public functions. * Some helpers for sizes and strides, since it's more complicated (as it is an enum, rather than just bools as is the case for device and layout). `matches_python_custom` is used to test the Python dispatch user ask. `matches_policy` does the policy test (only used in the user facing functions.) * I reorged the accessor methods so that they are more logical. This makes the diff bad, so I recommend reading the final code directly. * The default custom implementations now more reliably call their default() implementations * As bonus refactor, I devirtualized some functions that don't need to be virtual * `set_sym_sizes_and_strides` is renamed to `set_sizes_and_strides` to make it easier to use in template contexts; it optionally takes a storage offset now so you can set all three values at the same time. If you use the SymInt overload but there are no symbolic integers, we give you a normal resize. * This adds `sym_storage_offset` since we had that in the symbolic shapes branch and there's no reason not to put it in (and it reduces merge conflicts) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/84641 Approved by: https://github.com/wconstab	2022-09-09 13:41:13 +00:00
Andrew M. James	9b115c7bd3	Sparse Compressed Transpose add support for Batch dims and BSR/BSC layouts (#82122 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82122 Approved by: https://github.com/bhosmer	2022-09-02 17:42:58 +00:00
Andrew M. James	0192a34910	Dense -> CSC support batch dimensions (#83086 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83086 Approved by: https://github.com/bhosmer, https://github.com/nikitaved	2022-09-02 17:42:58 +00:00
Andrew M. James	f0e5b73364	Dense -> CSR support batch dimensions (#83084 ) Only requires changes to the dense->sparse pathway. The reverse already has support. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83084 Approved by: https://github.com/bhosmer, https://github.com/nikitaved	2022-09-02 17:42:58 +00:00
Andrew M. James	8778f33744	Dense <-> bsc conversions (#80781 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/80781 Approved by: https://github.com/bhosmer, https://github.com/nikitaved	2022-09-01 16:01:58 +00:00
jpvillam	247468baf0	[ROCm] More Sparse UTs enablement and more hipification mappings. (#78939 ) Enables: test_bmm_cuda_float64 test_bmm_deterministic_cuda_float64 test_csr_matvec_cuda_complex128 test_csr_matvec_cuda_complex64 test_csr_matvec_cuda_float32 test_csr_matvec_cuda_float64 To enable the above tests had to add some more hip mappings for the hipification process. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78939 Approved by: https://github.com/pruthvistony, https://github.com/malfet	2022-08-23 13:54:09 +00:00
Andrew M. James	eebcb9117a	Fix BSR->Dense Batched Bug (#82120 ) A todo in the tests which should have been removed and addressed before the initial PR landed was left, and so left holes in testing BSR-> Dense. This addresses the underlying issue and removes the hole in test coverage. #8071 Introduces more comprehensive test coverage for sparse compressed <-> Dense conversion in general. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82120 Approved by: https://github.com/nikitaved, https://github.com/bhosmer	2022-08-06 02:24:20 +00:00
Andrew M. James	0e0dfaa057	Add support for `select` of batch dims for all sparse compressed formats. (#82119 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82119 Approved by: https://github.com/nikitaved, https://github.com/bhosmer	2022-08-06 02:24:20 +00:00
Nikita Shulga	d80fe49de0	[Reland] Add py-3.10 config (#82329 ) This is a re-land of #81372 and #81233 with the exception that it does not force the range-checks on older Python runtime versions and as such should not affect the internal workloads, which were the reason for revert, see https://github.com/pytorch/pytorch/pull/81372#issuecomment-1187516464 - [Py3.10] Allow floats to be imported as Long (#81372) - [CI] Move CUDA-11.6 to Python-3.10 configuration (#81233) - Don't do anything about range checks for pre-py3.10 Pull Request resolved: https://github.com/pytorch/pytorch/pull/82329 Approved by: https://github.com/kit1980	2022-07-27 20:22:47 +00:00
Edward Z. Yang	7f7c81c5f9	Add empty_like support for sparse_csc/bsr/bsc (#82310 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/82310 Approved by: https://github.com/amjames, https://github.com/nikitaved	2022-07-27 18:59:07 +00:00
PyTorch MergeBot	ec1b3a45ad	Revert "[Py3.10] Allow floats to be imported as Long (#81372 )" This reverts commit `69d73345a2`. Reverted https://github.com/pytorch/pytorch/pull/81372 on behalf of https://github.com/DanilBaibak due to Break internal build	2022-07-18 14:55:13 +00:00
Nikita Shulga	69d73345a2	[Py3.10] Allow floats to be imported as Long (#81372 ) Thus avoiding `TypeError: 'float' object cannot be interpreted as an integer` when trying to create integer tensor from floating point values Use `c10::checked_convert` to detect overflows during tensor construction from scalars. Modify sparse_csr test that violated this rule Fixes #69319 Tested in #81233 Pull Request resolved: https://github.com/pytorch/pytorch/pull/81372 Approved by: https://github.com/ezyang, https://github.com/ngimel	2022-07-15 22:57:58 +00:00
Nikita Vedeneev	880b972841	More efficient indices validations for compressed sparse formats. (#81108 ) As per title. Some of the features: - native kernels both for the CPU and CUDA without device syncs. - If needed, invariant checks 5.1 - 5.5 could be improved to utilize vectorization. This will require implementing a conversion `Vectorized -> bool`. That's a follow-up. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81108 Approved by: https://github.com/amjames, https://github.com/pearu, https://github.com/cpuhrsch	2022-07-14 20:36:18 +00:00
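The invariants such validations enforce for a CSR tensor can be written down compactly. The pure-Python sketch below checks the usual conditions, assuming that column indices must be sorted and unique within each row. The function name and the grouping of the checks are illustrative only and are not a reproduction of the numbered invariants 5.1 to 5.5 referenced in the PR.

```python
def check_csr_invariants(crow_indices, col_indices, nrows, ncols):
    """Raise ValueError if the CSR index arrays are inconsistent (hypothetical helper)."""
    if len(crow_indices) != nrows + 1:
        raise ValueError("crow_indices must have nrows + 1 entries")
    if crow_indices[0] != 0:
        raise ValueError("crow_indices[0] must be 0")
    if crow_indices[-1] != len(col_indices):
        raise ValueError("crow_indices[-1] must equal nnz")
    for i in range(nrows):
        start, end = crow_indices[i], crow_indices[i + 1]
        if end < start:
            raise ValueError(f"crow_indices must be non-decreasing (row {i})")
        row_cols = col_indices[start:end]
        for c in row_cols:
            if not 0 <= c < ncols:
                raise ValueError(f"column index {c} out of range in row {i}")
        # Assumption: column indices are sorted and unique within a row.
        if any(a >= b for a, b in zip(row_cols, row_cols[1:])):
            raise ValueError(f"column indices in row {i} must be strictly increasing")


check_csr_invariants([0, 2, 2, 4], [0, 2, 1, 2], nrows=3, ncols=3)  # passes silently
try:
    check_csr_invariants([0, 2, 2, 4], [2, 0, 1, 2], nrows=3, ncols=3)
except ValueError as err:
    print(err)  # column indices in row 0 must be strictly increasing
```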
Pearu Peterson	d50f4a3c24	Support sparse/dense_dim for Compressed Sparse tensors (#80901 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/80901 Approved by: https://github.com/cpuhrsch, https://github.com/nikitaved	2022-07-08 15:49:35 +00:00
Pearu Peterson	d266256621	Support compressed sparse tensors with dense dimensions (#80565 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/80565 Approved by: https://github.com/cpuhrsch	2022-07-07 16:21:12 +00:00
PyTorch MergeBot	682c0d2615	Use segment/scatter_reduce to support masked reductions on sparse CSR tensors (mean, amax, amin) (fp only) (#78918 ) Follows design [here](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/sparse/SparseCsrTensorMath.cpp#L804-L837) and [here](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/sparse/SparseCsrTensorMath.cpp#L885-L928) from SparseCsrTensorMath.cpp (which has already been used to implement sum/prod) but use `segment_reduce`/`scatter_reduce` for reduction step Pull Request resolved: https://github.com/pytorch/pytorch/pull/78918 Approved by: https://github.com/cpuhrsch	2022-06-30 14:11:53 +00:00
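Because `crow_indices` delimits each row's stored values as a contiguous segment, a row-wise reduction over a CSR tensor can be expressed as a segment reduction over `values`. The pure-Python sketch below illustrates the idea for `mean`, `amax` and `amin`. Treating unspecified elements as masked out, and returning `None` for rows with no stored values, are assumptions of this sketch rather than documented PyTorch behavior.

```python
def csr_row_reduce(crow_indices, values, reduce):
    """Reduce each row's stored values; rows with no stored values yield None."""
    out = []
    for i in range(len(crow_indices) - 1):
        segment = values[crow_indices[i]:crow_indices[i + 1]]
        if not segment:
            out.append(None)  # fully masked row: nothing to reduce
        elif reduce == "mean":
            out.append(sum(segment) / len(segment))
        elif reduce == "amax":
            out.append(max(segment))
        elif reduce == "amin":
            out.append(min(segment))
        else:
            raise ValueError(f"unsupported reduce: {reduce}")
    return out


crow, vals = [0, 2, 2, 4], [1.0, 2.0, 3.0, 4.0]
print(csr_row_reduce(crow, vals, "mean"))  # [1.5, None, 3.5]
print(csr_row_reduce(crow, vals, "amax"))  # [2.0, None, 4.0]
```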
Andrew M. James	9e3677f85d	Add support for BSR <-> Strided Conversion (#80354 ) Supersedes #78303 Pull Request resolved: https://github.com/pytorch/pytorch/pull/80354 Approved by: https://github.com/cpuhrsch	2022-06-27 21:09:09 +00:00
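BSR applies the CSR scheme to fixed-size blocks instead of individual elements: `crow_indices` and `col_indices` index block rows and block columns, and each stored value is a dense block. Below is a minimal pure-Python sketch of a strided-to-BSR conversion, assuming the matrix shape is divisible by the block size and that a block is stored only if it contains a nonzero entry. The helper name `dense_to_bsr` is hypothetical.

```python
def dense_to_bsr(dense, blocksize):
    """Return (crow_indices, col_indices, blocks) for a 2-D list (hypothetical helper)."""
    br, bc = blocksize
    nrows, ncols = len(dense), len(dense[0])
    if nrows % br or ncols % bc:
        raise ValueError("shape must be divisible by blocksize")
    crow_indices = [0]
    col_indices = []
    blocks = []  # each entry is a br x bc nested list
    for bi in range(nrows // br):
        for bj in range(ncols // bc):
            block = [row[bj * bc:(bj + 1) * bc] for row in dense[bi * br:(bi + 1) * br]]
            if any(v != 0 for r in block for v in r):  # store non-empty blocks only
                col_indices.append(bj)
                blocks.append(block)
        crow_indices.append(len(blocks))  # running block count after block row bi
    return crow_indices, col_indices, blocks


crow, cols, blocks = dense_to_bsr(
    [[1, 2, 0, 0], [3, 4, 0, 0], [0, 0, 0, 0], [0, 0, 5, 0]], (2, 2))
print(crow)    # [0, 1, 2]
print(cols)    # [0, 1]
print(blocks)  # [[[1, 2], [3, 4]], [[0, 0], [5, 0]]]
```

On PyTorch tensors, `Tensor.to_sparse_bsr(blocksize)` performs the corresponding conversion.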

272 Commits