pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-08 07:39:33 +01:00

Author	SHA1	Message	Date
Dmitry Nikolaev	656134c38f	[ROCm] enable complex128 in test_addmm_sizes_all_sparse_csr for rocm for trivial (k,n,m) cases (#120504 ) This PR enables `test_addmm_sizes_all_sparse_csr_k__n__m_*_cuda_complex128` for ROCm for trivial cases (m or n or k = 0) CUSPARSE_SPMM_COMPLEX128_SUPPORTED also used for `test_addmm_all_sparse_csr` and ` test_sparse_matmul` and both of them are skipped for ROCm by `@skipIfRocm` or `@skipCUDAIf(not _check_cusparse_spgemm_available())` Pull Request resolved: https://github.com/pytorch/pytorch/pull/120504 Approved by: https://github.com/jithunnair-amd, https://github.com/ezyang	2024-03-12 07:29:57 +00:00
Dmitry Nikolaev	c7328602ed	[ROCm] enable tests test_sampled_addmm_autograd_cuda_*, test_sample… (#117501 ) These tests PASS on ROCM 5.6+ now: - test_sampled_addmm_autograd_cuda_complex128 - test_sampled_addmm_autograd_cuda_complex64 - test_sampled_addmm_autograd_cuda_float32 - test_sampled_addmm_autograd_cuda_float64 - test_sampled_addmm_cuda_complex128 - test_sampled_addmm_cuda_complex64 - test_sampled_addmm_cuda_float32 - test_sampled_addmm_cuda_float64 - test_autograd_dense_output_addmm_cuda_float64 - test_autograd_dense_output_addmv_cuda_float64 - test_autograd_dense_output_mv_cuda_float64 @pruthvistony @jithunnair-amd Pull Request resolved: https://github.com/pytorch/pytorch/pull/117501 Approved by: https://github.com/pruthvistony, https://github.com/jeffdaily, https://github.com/malfet	2024-02-22 17:24:25 +00:00
Peter Bell	3a8bf25fdd	[SparseCsr] Remove triton sdpa skip after triton pin update (#109601 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109601 Approved by: https://github.com/desertfire, https://github.com/amjames	2024-02-08 16:40:25 +00:00
Aaron Orenstein	c6c851102f	Fix test_compressed_layout_conversions_coverage to check BSC format (#117951 ) test_compressed_layout_conversions_coverage verifies torch's conversions between different memory layouts using numpy as a reference. Since numpy doesn't support BSC format it just skipped that. Instead fake it by using a transposed BSR format. Pull Request resolved: https://github.com/pytorch/pytorch/pull/117951 Approved by: https://github.com/zou3519	2024-02-03 08:10:15 +00:00
Jeff Daily	a27a6e8cf1	[ROCm] skip test_sparse_csr test_triton_bsr_softmax_cuda (#118006 ) The tests were taking too long and leading to CI timeouts. Pull Request resolved: https://github.com/pytorch/pytorch/pull/118006 Approved by: https://github.com/huydhn	2024-01-23 00:09:42 +00:00
rzou	9dbe4eae82	[codemod] markDynamoStrictTest batch 14 (#117133 ) [codemod] markDynamoStrictTest test_utils [codemod] markDynamoStrictTest test_unary_ufuncs [codemod] markDynamoStrictTest test_sparse_semi_structured [codemod] markDynamoStrictTest test_sparse_csr [codemod] markDynamoStrictTest test_sparse [codemod] markDynamoStrictTest test_reductions [codemod] markDynamoStrictTest test_proxy_tensor [codemod] markDynamoStrictTest test_prims [codemod] markDynamoStrictTest test_maskedtensor [codemod] markDynamoStrictTest test_masked [codemod] markDynamoStrictTest test_legacy_vmap [codemod] markDynamoStrictTest test_binary_ufuncs Pull Request resolved: https://github.com/pytorch/pytorch/pull/117133 Approved by: https://github.com/voznesenskym ghstack dependencies: #117114, #117127, #117128, #117129	2024-01-11 04:28:57 +00:00
Jack Taylor	db79ceb110	[ROCm] Enabling additional UTs on ROCm (#115738 ) Unskips mostly for dynamo/inductor UT. Pull Request resolved: https://github.com/pytorch/pytorch/pull/115738 Approved by: https://github.com/jithunnair-amd, https://github.com/malfet	2024-01-09 08:36:07 +00:00
Nikita Shulga	4bfaa6bc25	[MPS] Fix addmm (#116547 ) Remove weird logic for designating matrices as transposed if sizes match (which is always true if square matrices are multiplied with each other), which resulted in `torch.addmm` returning a transposed matrix compared to `torch.mm`, see below: ``` % python -c "import torch;torch.set_default_device('mps');a=torch.eye(2);b=torch.arange(4.0).reshape(2, 2);print(a@b);print(torch.addmm(torch.zeros(2, 2), a,b))" tensor([[0., 1.], [2., 3.]], device='mps:0') tensor([[0., 2.], [1., 3.]], device='mps:0') ``` Fixes introduced to `torch.mm` in https://github.com/pytorch/pytorch/pull/77462 suggest that this is not needed. Modify `sample_inputs_addmm` to test `torch.addmm` with square matrices, but skip this config for `test_autograd_dense_output_addmm`, see https://github.com/pytorch/pytorch/issues/116565 TODO: probably tweak tolerances, as `test_output_match_addmm_cpu_float16` fails with 2x2 matrices, but passes using 3x3 ones with errors slightly exceeding the tolerance Fixes https://github.com/pytorch/pytorch/issues/116331 Pull Request resolved: https://github.com/pytorch/pytorch/pull/116547 Approved by: https://github.com/albanD, https://github.com/Skylion007	2023-12-31 02:28:59 +00:00
Andrew M. James	4b97ed2ed8	[SparseCompressed] support csc layout for add sparse/dense. (#115433 ) `add` when passed one sparse and one dense argument will error if the sparse argument does not have csr layout. This PR modifies the underlying algorithm to be generic on the compressed dimension handling both csr and csc. The functions are renamed to use the `sparse_compressed` qualifier rather than `sparse_csr` Fixes: #114807 Pull Request resolved: https://github.com/pytorch/pytorch/pull/115433 Approved by: https://github.com/cpuhrsch, https://github.com/pearu ghstack dependencies: #115432	2023-12-22 01:47:55 +00:00
Andrew M. James	910baa3a03	[SparseCompressed] Support `add(sparse_compressed, dense)` (#115432 ) Addition involving sparse compressed and dense arguments is implemented requiring that the dense tensor be on the LHS. This change adds support for the other pattern `sparse + dense` by permuting arguments. Pull Request resolved: https://github.com/pytorch/pytorch/pull/115432 Approved by: https://github.com/cpuhrsch, https://github.com/pearu	2023-12-22 01:47:55 +00:00
Aaron Gokaslan	6de28e92d2	[BE]: Apply FURB118 (prev): replaces unnecessary lambdas with operator. (#116027 ) This replaces a bunch of unnecessary lambdas with the operator package. This is semantically equivalent, but the operator package is faster, and arguably more readable. When the FURB rules are taken out of preview, I will enable it as a ruff check. Pull Request resolved: https://github.com/pytorch/pytorch/pull/116027 Approved by: https://github.com/malfet	2023-12-20 19:35:08 +00:00
Pearu Peterson	d72d99e591	Fix sparse compressed tensor invariants checks when nnz==0 (#115826 ) Fixes https://github.com/pytorch/pytorch/issues/115755 This PR is a step toward deprecating `torch.empty(..., layout=<sparse compressed tensor layout>)` that usage should be minimized as it will produce invalid tensors, see also https://github.com/pytorch/pytorch/issues/90695 . Pull Request resolved: https://github.com/pytorch/pytorch/pull/115826 Approved by: https://github.com/cpuhrsch, https://github.com/amjames	2023-12-20 12:16:07 +00:00
Pearu Peterson	419f2ca3e3	Fix a crash in sparse compressed tensor invariants check when nnz == 0 (#115825 ) Fixes python crash example from https://github.com/pytorch/pytorch/issues/115755 Pull Request resolved: https://github.com/pytorch/pytorch/pull/115825 Approved by: https://github.com/cpuhrsch	2023-12-17 17:36:15 +00:00
Pearu Peterson	32286512cc	Add tune_bsr_dense_addmm as an API to find optimal triton kernel parameters for bsr_dense_addmm (#115499 ) As in the title. In addition: - improve the algorithm for finding a minimum of operation timings: break the inner loop early when the next minimum candidate is found - add tests and fix bugs Pull Request resolved: https://github.com/pytorch/pytorch/pull/115499 Approved by: https://github.com/cpuhrsch	2023-12-12 16:44:51 +00:00
PyTorch MergeBot	d7180161b5	Revert "[SparseCsr] Remove triton sdpa skip after triton pin update (#109601 )" This reverts commit `f64b10803f`. Reverted https://github.com/pytorch/pytorch/pull/109601 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing in trunk with this error ZeroDivisionError: integer division or modulo by zero ([comment](https://github.com/pytorch/pytorch/pull/109601#issuecomment-1847784383))	2023-12-08 20:12:53 +00:00
Peter Bell	f64b10803f	[SparseCsr] Remove triton sdpa skip after triton pin update (#109601 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109601 Approved by: https://github.com/desertfire, https://github.com/amjames	2023-12-08 15:49:16 +00:00
Alexander Grund	ca15671c30	Fix failing test_invalid_input_csr_large (#114940 ) The test introduced in #102530 has a bug: Construction of `crow_indices` raises an exception: "value cannot be converted to type int32 without overflow" which is obviously correct. This makes the test fail which is supposed to check for an overflow in nnz. Fix by making the construction of `crow_indices` pass although with an invalid value which would error later but triggers the correct check. Given that I'm not sure it is even worth checking for an overflow in nnz: - `crow_indices[..., -1] == nnz` is already enforced - this can only hold if `crow_indices` is able to hold `nnz` without overflow - `col_indices` has to be of the same type as `crow_indices` - Hence the type of `col_indices` has to be able to hold the value of `nnz` So in conclusion: The situation being checked for cannot reasonably occur CC @pearu as the test author for additional insight Pull Request resolved: https://github.com/pytorch/pytorch/pull/114940 Approved by: https://github.com/pearu, https://github.com/cpuhrsch	2023-12-08 11:55:21 +00:00
Pearu Peterson	12085914b8	Replace bsr_dense_mm triton kernel with bsr_dense_addm triton kernel (#115030 ) The `bsr_dense_addmm` triton kernel introduced in https://github.com/pytorch/pytorch/pull/114595 is a generalization of `bsr_dense_mm` triton kernel and a more efficient version of it because it uses an extra kernel parameter `SPLIT_N` that has notable effect to performance for r.h.s operand with a larger number of columns. This PR eliminates the `bsr_dense_mm` triton kernel in favor of using `bsr_dense_addmm` triton kernel. The performance increase of `bsr_dense_mm` is as follows (float16, `NVIDIA A100-SXM4-80GB`): - with 16x16 blocks, the average/maximal speed up is 50/71 % - with 32x32 blocks, the average/maximal speed up is 30/63 % - with 64x64 blocks, the average/maximal speed up is 12/26 % - with 128x128 blocks, the average/maximal speed up is 7/17 % Pull Request resolved: https://github.com/pytorch/pytorch/pull/115030 Approved by: https://github.com/cpuhrsch	2023-12-05 22:29:24 +00:00
Pearu Peterson	4ba37e1804	Add tests for bsr_dense_addmm and bsr_dense_mm triton kernels (#114800 ) As in the title. In addition, - resolve https://github.com/pytorch/pytorch/pull/114757#discussion_r1409547917 re triton-contiguous inputs - support non-contiguous inputs and outputs in triton kernels - fix a couple of minor bugs Pull Request resolved: https://github.com/pytorch/pytorch/pull/114800 Approved by: https://github.com/cpuhrsch	2023-12-04 22:07:47 +00:00
Jason Ansel	9664190952	[dynamo] Eagerly install guards (#111415 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/111415 Approved by: https://github.com/voznesenskym ghstack dependencies: #111306	2023-11-07 19:55:19 +00:00
Andrew M. James	0bd2955f15	Memory leak from bsr_scatter_mm_indices_data argument cache (#112301 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/112301 Approved by: https://github.com/cpuhrsch, https://github.com/pearu	2023-11-02 18:43:10 +00:00
Pearu Peterson	cf6041e942	Use weakref in storing tensors as keys (follow-up to #111470 ) (#112076 ) This PR addresses the discussion items in https://github.com/pytorch/pytorch/pull/111470#discussion_r1369008167, that is, - use weakref when storing tensors as keys, - add `storage_offset` to the key data, - and revise the description of the `TensorAsKey` utility. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112076 Approved by: https://github.com/cpuhrsch ghstack dependencies: #112154	2023-10-30 19:16:05 +00:00
Pearu Peterson	b969c675f5	Add batched dimensions support to the second operand of bsr_scatter_mm (#111796 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/111796 Approved by: https://github.com/cpuhrsch ghstack dependencies: #110396, #111470, #111489, #111760	2023-10-23 23:52:49 +00:00
Pearu Peterson	d4708a6da7	Add scatter_mm and bsr_scatter_mm operations. (#110396 ) This PR introduces `scatter_mm` operation (compute `mm` of arbitrary pairs of tensors given in batches of tensors) that is used to implement `bsr_scatter_mm` that is equivalent to `bsr_dense_mm` (the `mm` operation on bsr and strided tensors). The implementation is provided both in Triton (when tensor dimensions are multiples of 16) and in PyTorch (otherwise). The figures below illustrate the performance differences of `bsr_scatter_mm` and `bsr_dense_mm` (GPU: `NVIDIA GeForce RTX 2060 SUPER`). The first figure represents the performance equilibrium point in BSR tensor sparsity at which value `bsr_scatter_mm` or `bsr_dense_mm` have the same performance characteristics as `torch.matmul`. The second figure represents speedups from using `bsr_scatter_mm` at its performance equilibrium points with respect to `bsr_dense_mm`. <img src="https://github.com/pytorch/pytorch/assets/402156/526d182e-937f-4812-a6c4-904f52d6d5ab" width="48%"> <img src="https://github.com/pytorch/pytorch/assets/402156/ccb606ab-1f3f-4133-887c-b56285f4f168" width="48%"> The same figures for GPU card `NVIDIA A100-SXM4-80GB`: <img src="https://github.com/pytorch/pytorch/assets/402156/25466f1d-df34-4d1c-a975-afb478e4d9f0" width="48%"> <img src="https://github.com/pytorch/pytorch/assets/402156/6ada91f0-a20f-4f0d-8a48-1f4ccc60d08e" width="48%"> In sum: - `bsr_scatter_mm` is about 2x faster than `bsr_dense_mm` for small block sizes of 16 and 32 and large tensors [GPU: `NVIDIA GeForce RTX 2060 SUPER`]. - `bsr_scatter_mm` is up to 2x faster than `bsr_dense_mm` for small block sizes of 16 and large tensors [GPU: `NVIDIA A100-SXM4-80GB`]. - `bsr_dense_mm` is up to 20 % faster than `bsr_scatter_mm` for block sizes of 64 or larger [GPU: `NVIDIA GeForce RTX 2060 SUPER`]. - However, `bsr_dense_mm` fails with `OutOfResources` exception for block sizes of 256 or larger whereas `bsr_scatter_mm` succeeds. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110396 Approved by: https://github.com/cpuhrsch	2023-10-23 19:45:30 +00:00
Evgeni Burovski	48989bc820	trace frames with np.ndarray (#110512 ) Fixes #109604 Resubmit gh-109715 + several skips and small fixes to make tests pass. The main fix here is by @ysiraichi : previously, dynamo did not resume tracing numpy ndarrays after a graph break. While at it, fix several small issues Yukio's fix uncovers: - graph break gracefully on numpy dtypes which do not map to torch.dtypes (uint16 etc) - recognize array scalars in dynamo, treat them as 0D ndarrays - make sure that iterating over torch.ndarray generates arrays not bare tensors Pull Request resolved: https://github.com/pytorch/pytorch/pull/110512 Approved by: https://github.com/lezcano	2023-10-15 00:56:10 +00:00
Oguz Ulgen	1df14f1bf8	Move has_triton to top level triton utils so that dynamo can also access (#109832 ) it without creating cyclic dependencies Pull Request resolved: https://github.com/pytorch/pytorch/pull/109832 Approved by: https://github.com/zou3519	2023-09-22 19:33:41 +00:00
Shunting Zhang	e68b3ad14f	update triton pin with needed inductor change (#107722 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107722 Approved by: https://github.com/jansel, https://github.com/cpuhrsch	2023-08-29 04:31:44 +00:00
Pearu Peterson	d7c0c5de2d	Set crow_indices outputs as non-differentiable. (#107447 ) Fixes https://github.com/pytorch/pytorch/issues/107083 Pull Request resolved: https://github.com/pytorch/pytorch/pull/107447 Approved by: https://github.com/cpuhrsch	2023-08-21 19:52:32 +00:00
rraminen	239578beff	[ROCm] Enable a few bfloat16 unit tests (#105177 ) Currently a few unit tests from test_matmul_cuda and test_sparse_csr test suites are being skipped on ROCm. This PR is to enable the following unit tests on ROCm (~30 UTs): test_cublas_baddbmm_large_input_* (__main__.TestMatmulCudaCUDA) test_addmm_sizes_all_sparse_csr* (__main__.TestSparseCSRCUDA) when m==0 or n==0 or k==0 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105177 Approved by: https://github.com/pruthvistony, https://github.com/jithunnair-amd, https://github.com/malfet	2023-08-03 21:17:19 +00:00
yanbing-j	a54043516f	Add SparseCsrCPU and SparseCsrCUDA dispatch to sum.dim_IntList (#99292 ) This PR is to add support of sum.dim_IntList for Sparse Tensor, which is exposed in https://github.com/pytorch/pytorch/issues/98796. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99292 Approved by: https://github.com/mingfeima, https://github.com/rusty1s, https://github.com/cpuhrsch	2023-07-24 17:30:58 +00:00
Justin Chu	73e1455327	[BE] Enable ruff's UP rules and autoformat test/ (#105434 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105434 Approved by: https://github.com/albanD	2023-07-19 20:36:06 +00:00
nikitaved	44c8515d0d	SDPA: frontend for BSR masks (#104042 ) This PR implements a (yet private) frontend for scaled_dot_product_attention that works with BSR `attn_mask`. This function is directly comparable (with suitable masks) with `torch.nn.functional.scaled_dot_product_attention` once `attn_mask.dtype == torch.bool`, but its behavior is different when `attn_mask.dtype != torch.bool`. This is because `torch.nn.functional.scaled_dot_product_attention` assumes that irrelevant values are supposed to be filled with `-inf`, while the selected ones should be `0`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104042 Approved by: https://github.com/amjames, https://github.com/cpuhrsch	2023-07-13 18:01:21 +00:00
yanbing-j	053654b9cf	Optimize scatter_add/scatter_reduce in BFloat16/Half data type in CPU backend (#103427 ) ### Description This PR is to optimize scatter_add/scatter_reduce of BFloat16/Half data type in CPU backend, which is one task in https://github.com/pyg-team/pytorch_geometric/issues/7057. Main point is creating a buffer among threads to accumulate intermediate data as fp32 data type. Next step: - [x] Add benchmarks - [x] Extend to Half - [x] Simplify code ### Performance test (Updated) Test BFloat16 in Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz With jemalloc and iomp Single socket (40C) ![image](https://github.com/pytorch/pytorch/assets/61222868/4b4342f1-8cc3-46f7-81f5-651becd9b1e3) Single core ![image](https://github.com/pytorch/pytorch/assets/61222868/09e5f700-2c2e-4208-979e-74b85474dea6) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103427 Approved by: https://github.com/mingfeima, https://github.com/albanD	2023-07-13 09:34:29 +00:00
PyTorch MergeBot	f8aedf1efe	Revert "Optimize scatter_add/scatter_reduce in BFloat16/Half data type in CPU backend (#103427 )" This reverts commit `da7675621e`. Reverted https://github.com/pytorch/pytorch/pull/103427 on behalf of https://github.com/clee2000 due to sorry but it looks like this pr broke test_scatter_gather_ops.py::TestScatterGatherCPU::test_scatter_expanded_index_cpu_bfloat16 on periodic parallelnative testing `da7675621e` https://github.com/pytorch/pytorch/actions/runs/5477783108/jobs/9977608393 ([comment](https://github.com/pytorch/pytorch/pull/103427#issuecomment-1624008753))	2023-07-06 17:02:03 +00:00
yanbing-j	da7675621e	Optimize scatter_add/scatter_reduce in BFloat16/Half data type in CPU backend (#103427 ) ### Description This PR is to optimize scatter_add/scatter_reduce of BFloat16/Half data type in CPU backend, which is one task in https://github.com/pyg-team/pytorch_geometric/issues/7057. Main point is creating a buffer among threads to accumulate intermediate data as fp32 data type. Next step: - [x] Add benchmarks - [x] Extend to Half - [x] Simplify code ### Performance test (Updated) Test BFloat16 in Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz With jemalloc and iomp Single socket (40C) ![image](https://github.com/pytorch/pytorch/assets/61222868/4b4342f1-8cc3-46f7-81f5-651becd9b1e3) Single core ![image](https://github.com/pytorch/pytorch/assets/61222868/09e5f700-2c2e-4208-979e-74b85474dea6) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103427 Approved by: https://github.com/mingfeima, https://github.com/albanD	2023-07-06 01:23:56 +00:00
Andrew M. James	5364366f8c	Sparse Compressed mm avoid creating temp sparse (#104062 ) When mm forwards to addmm it creates a zeroed out self this tensor should take options from the result not one of the sparse arguments. The bug was leading to an error when calling linear with an `out` kwarg. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104062 Approved by: https://github.com/nikitaved, https://github.com/pearu	2023-06-26 16:45:04 +00:00
Aleksandar Samardžić	09fdea8564	Fix autograd issue with identity conversions (#92022 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92022 Approved by: https://github.com/pearu, https://github.com/mtaaooby, https://github.com/amjames, https://github.com/cpuhrsch	2023-06-21 21:23:03 +00:00
Nikita Vedeneev	39a22e2791	softmax: Triton kernel for BSR inputs (#102095 ) Implements `softmax` Triton kernel for BSR inputs. So far, only over `dim=-1`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102095 Approved by: https://github.com/cpuhrsch	2023-06-21 01:23:27 +00:00
Pearu Peterson	cbe270d233	Fix zeros_like for sparse tensors with batch dimensions. Add opinfo-based tests to like-functions. (#101215 ) Fixes #101078 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101215 Approved by: https://github.com/cpuhrsch	2023-06-13 16:02:10 +00:00
Xiao Wang	6340aa5d58	Skip test test_triton_bsr_dense_bmm if not TEST_WITH_TORCHINDUCTOR [v2] (#102660 ) Test was originally skipped in https://github.com/pytorch/pytorch/pull/98462 Not sure why it was removed in https://github.com/pytorch/pytorch/pull/94825 Now the test hits CUDA illegal memory access on H100 again after https://github.com/pytorch/pytorch/pull/101163 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102660 Approved by: https://github.com/zou3519	2023-06-01 20:36:45 +00:00
Pearu Peterson	9f97b7c43b	Add integer overflow checks for large compressed tensor dimensions and nnz (#102530 ) With the previous PR allowing large compressed tensors (dimensions larger than `2 ** 31 - 1`), sparse compressed tensor invariants checks may give false-positive results: ```python >>> nnz=2**31 >>> torch.sparse.check_sparse_tensor_invariants.enable() >>> torch.sparse_csr_tensor(torch.arange(nnz+1, dtype=torch.int32), torch.zeros(nnz, dtype=torch.int32), torch.ones(nnz), (nnz, 1)) tensor(crow_indices=tensor([ 0, 1, 2, ..., 2147483646, 2147483647, -2147483648]), col_indices=tensor([0, 0, 0, ..., 0, 0, 0]), values=tensor([1., 1., 1., ..., 1., 1., 1.]), size=(2147483648, 1), nnz=2147483648, layout=torch.sparse_csr) ``` (notice that the last entry in `crow_indices` is invalid) or raise a bogus exception as in ```python >>> torch.sparse_csr_tensor(torch.arange(nnz+1, dtype=torch.int32), torch.arange(nnz, dtype=torch.int32), torch.ones(nnz), (nnz, 1)) Traceback (most recent call last): File "<stdin>", line 1, in <module> RuntimeError: `0 <= col_indices < ncols` is not satisfied. ``` (notice that `col_indices` is actually valid). This PR fixes the above-reported bugs by introducing integer overflow checks for sparse compressed tensors dimensions as well as nnz. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102530 Approved by: https://github.com/nikitaved	2023-05-31 15:34:08 +00:00
Nikita Vedeneev	d80d3b18d0	nn.Linear with BSR inputs: spare the user from explicit Triton kernel registrations (#98403 ) <!-- copilot:summary --> ### <samp>🤖 Generated by Copilot at 08f7a6a</samp> This pull request adds support for triton kernels in `torch` and `torch/cuda`, and refactors and tests the existing triton kernel for BSR matrix multiplication. It also adds a test case to ensure that importing `torch` does not implicitly import `triton`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/98403 Approved by: https://github.com/malfet, https://github.com/cpuhrsch	2023-05-31 13:09:45 +00:00
Pearu Peterson	fcbdbd6682	Fix silent nnz overflow for large sparse compressed tensors. (#102523 ) Fixes https://github.com/pytorch/pytorch/issues/102520 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102523 Approved by: https://github.com/nikitaved, https://github.com/cpuhrsch	2023-05-30 16:58:01 +00:00
Nikita Vedeneev	6c7410ddc3	sampled_addmm: BSR support (#101163 ) This PR implements a `sampled_addmm` kernel that works with a BSR mask. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101163 Approved by: https://github.com/cpuhrsch	2023-05-25 12:33:50 +00:00
Nikita Vedeneev	346e1f512f	sparse compressed validation: allow empty-batched inputs (#101180 ) Fixes https://github.com/pytorch/pytorch/issues/101179. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101180 Approved by: https://github.com/pearu, https://github.com/cpuhrsch	2023-05-11 20:30:20 +00:00
Nikita Vedeneev	dd2c22f4bb	bsr_dense_bmm(): enable more precise float32 support with float64 accumulators (#100882 ) Float64 is there in Triton! This PR increases precision for float32 inputs with float64 accumulation dtype. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100882 Approved by: https://github.com/cpuhrsch	2023-05-11 11:22:55 +00:00
Pearu Peterson	92a7640b76	Add mul tests with sparse sample inputs (#100393 ) This PR implements sparse sample inputs and error inputs for mul OpInfo. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100393 Approved by: https://github.com/amjames, https://github.com/cpuhrsch	2023-05-09 16:13:14 +00:00
Nikita Vedeneev	0141a242fd	bsr_dense_bmm(): remove sparse_rowspace kernel and some dead code (#100876 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100876 Approved by: https://github.com/cpuhrsch, https://github.com/Skylion007	2023-05-09 16:12:11 +00:00
Nikita Vedeneev	c4bc259f00	bsr_dense_mm(): better test coverage (#100543 ) This PR improves test coverage for `bsr_dense_mm` by: - ~~enabling correctness tests for `float32`~~. - extending and testing input correctness checks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100543 Approved by: https://github.com/cpuhrsch, https://github.com/malfet	2023-05-09 09:26:02 +00:00
Pearu Peterson	3ae0e23b90	Fix sum OpInfo for sparse sample inputs and assert coverage for sparse-enabled operators (#100391 ) This PR enables sum tests for sparse sample inputs. Previously, the tests existed but were never run because the sum OpInfo instance was created without specifying `supports_sparse_=True`. To avoid such mistakes in the future, the following PR https://github.com/pytorch/pytorch/pull/100392 enables the `supports_sparse_` flags automatically when OpInfo creation specifies `sample_inputs_sparse_*_func`. In addition, the PR applies several fixes to sum tests for sparse sample inputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100391 Approved by: https://github.com/cpuhrsch	2023-05-03 02:04:39 +00:00
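
The int32 wraparound described in commit 9f97b7c43b (#102530), where the last `crow_indices` entry shows up as -2147483648 once nnz reaches 2**31, can be reproduced without PyTorch. The sketch below is only an illustration using the standard library: the helper names `wrap_int32` and `fits_int32` are hypothetical and are not part of PyTorch or of the PR, and the range check is an assumed stand-in for the kind of overflow check the commit message mentions, not its actual implementation.

```python
import ctypes

INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Return `value` as stored in a signed 32-bit integer.

    ctypes integer types do no overflow checking, so out-of-range
    values wrap around in two's complement.
    """
    return ctypes.c_int32(value).value


def fits_int32(value: int) -> bool:
    """Hypothetical check: can `value` be held by int32 without wrapping?"""
    return INT32_MIN <= value <= INT32_MAX


nnz = 2**31
# The last crow_indices entry must equal nnz; as int32 it wraps to a negative value.
print(wrap_int32(nnz))
# A range check applied before the conversion reports the problem instead.
print(fits_int32(nnz))
```

The same reasoning underlies the argument in commit ca15671c30 (#114940): if `crow_indices[..., -1] == nnz` must hold, the index dtype has to be able to represent nnz in the first place.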

256 Commits