pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Oguz Ulgen	1df14f1bf8	Move has_triton to top level triton utils so that dynamo can also access it without creating cyclic dependencies (#109832 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109832 Approved by: https://github.com/zou3519	2023-09-22 19:33:41 +00:00
Shunting Zhang	e68b3ad14f	update triton pin with needed inductor change (#107722 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107722 Approved by: https://github.com/jansel, https://github.com/cpuhrsch	2023-08-29 04:31:44 +00:00
Pearu Peterson	d7c0c5de2d	Set crow_indices outputs as non-differentiable. (#107447 ) Fixes https://github.com/pytorch/pytorch/issues/107083 Pull Request resolved: https://github.com/pytorch/pytorch/pull/107447 Approved by: https://github.com/cpuhrsch	2023-08-21 19:52:32 +00:00
rraminen	239578beff	[ROCm] Enable a few bfloat16 unit tests (#105177 ) Currently a few unit tests from the test_matmul_cuda and test_sparse_csr test suites are skipped on ROCm. This PR enables the following unit tests on ROCm (~30 UTs): test_cublas_baddbmm_large_input_* (__main__.TestMatmulCudaCUDA); test_addmm_sizes_all_sparse_csr* (__main__.TestSparseCSRCUDA) when m==0 or n==0 or k==0. Pull Request resolved: https://github.com/pytorch/pytorch/pull/105177 Approved by: https://github.com/pruthvistony, https://github.com/jithunnair-amd, https://github.com/malfet	2023-08-03 21:17:19 +00:00
yanbing-j	a54043516f	Add SparseCsrCPU and SparseCsrCUDA dispatch to sum.dim_IntList (#99292 ) This PR adds support for sum.dim_IntList on sparse CSR tensors, a gap exposed in https://github.com/pytorch/pytorch/issues/98796. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99292 Approved by: https://github.com/mingfeima, https://github.com/rusty1s, https://github.com/cpuhrsch	2023-07-24 17:30:58 +00:00
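To make concrete what a dimension-wise `sum` means for a 2-D CSR matrix, here is a minimal pure-Python sketch using plain lists for `crow_indices`, `col_indices` and `values`. The helper name `csr_sum_dim` is hypothetical; this illustrates the arithmetic only and is not the SparseCsrCPU/SparseCsrCUDA kernel added by the commit.

```python
from typing import List


def csr_sum_dim(crow_indices: List[int], col_indices: List[int],
                values: List[float], ncols: int, dim: int) -> List[float]:
    """Sum a 2-D CSR matrix over `dim` (0: collapse rows, 1: collapse columns)."""
    nrows = len(crow_indices) - 1
    if dim == 1:
        # Row i's stored values are the slice values[crow[i]:crow[i + 1]].
        return [sum(values[crow_indices[i]:crow_indices[i + 1]], 0.0)
                for i in range(nrows)]
    if dim == 0:
        # Each stored value is added into the slot of its column.
        out = [0.0] * ncols
        for c, v in zip(col_indices, values):
            out[c] += v
        return out
    raise ValueError("dim must be 0 or 1 for a 2-D matrix")


# [[1, 0, 2],
#  [0, 0, 0],
#  [0, 3, 0]]
crow, col, vals = [0, 2, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
print(csr_sum_dim(crow, col, vals, ncols=3, dim=1))  # [3.0, 0.0, 3.0]
print(csr_sum_dim(crow, col, vals, ncols=3, dim=0))  # [1.0, 3.0, 2.0]
```

Summing along rows only needs `crow_indices` slices, while summing down columns is a scatter-add keyed by `col_indices`.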
Justin Chu	73e1455327	[BE] Enable ruff's UP rules and autoformat test/ (#105434 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105434 Approved by: https://github.com/albanD	2023-07-19 20:36:06 +00:00
nikitaved	44c8515d0d	SDPA: frontend for BSR masks (#104042 ) This PR implements a (still private) frontend for scaled_dot_product_attention that works with a BSR `attn_mask`. This function is directly comparable (with suitable masks) with `torch.nn.functional.scaled_dot_product_attention` when `attn_mask.dtype == torch.bool`, but its behavior differs when `attn_mask.dtype != torch.bool`. This is because `torch.nn.functional.scaled_dot_product_attention` assumes that irrelevant values are filled with `-inf`, while the selected ones are `0`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104042 Approved by: https://github.com/amjames, https://github.com/cpuhrsch	2023-07-13 18:01:21 +00:00
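The mask convention in this message can be checked on a single row of attention scores. In this pure-Python sketch (helper names hypothetical), a boolean mask means "attend only where True", and the equivalent floating-point mask is additive: `0` for selected positions and `-inf` for irrelevant ones, so both give the same softmax.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


def bool_to_additive(mask: List[bool]) -> List[float]:
    # True ("attend") -> 0.0, False ("irrelevant") -> -inf
    return [0.0 if keep else -math.inf for keep in mask]


def masked_softmax(scores: List[float], mask: List[bool]) -> List[float]:
    # Boolean semantics: only positions where mask is True take part.
    probs = iter(softmax([s for s, keep in zip(scores, mask) if keep]))
    return [next(probs) if keep else 0.0 for keep in mask]


scores = [0.5, 2.0, -1.0, 1.5]
mask = [True, False, True, True]
additive = softmax([s + a for s, a in zip(scores, bool_to_additive(mask))])
boolean = masked_softmax(scores, mask)
print(all(abs(x - y) < 1e-12 for x, y in zip(additive, boolean)))  # True
```

A float mask holding other values (for example `1.0` for selected entries) would shift the scores instead of selecting them, which is why the two APIs only agree for boolean masks.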
yanbing-j	053654b9cf	Optimize scatter_add/scatter_reduce in BFloat16/Half data type in CPU backend (#103427 ) ### Description This PR is to optimize scatter_add/scatter_reduce of BFloat16/Half data type in CPU backend, which is one task in https://github.com/pyg-team/pytorch_geometric/issues/7057. Main point is creating a buffer among threads to accumulate intermediate data as fp32 data type. Next step: - [x] Add benchmarks - [x] Extend to Half - [x] Simplify code ### Performance test (Updated) Test BFloat16 in Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz With jemalloc and iomp Single socket (40C) ![image](https://github.com/pytorch/pytorch/assets/61222868/4b4342f1-8cc3-46f7-81f5-651becd9b1e3) Single core ![image](https://github.com/pytorch/pytorch/assets/61222868/09e5f700-2c2e-4208-979e-74b85474dea6) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103427 Approved by: https://github.com/mingfeima, https://github.com/albanD	2023-07-13 09:34:29 +00:00
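The accumulation idea in this message (intermediate results kept in fp32 rather than BFloat16) can be illustrated without a CPU kernel. The sketch below assumes one private fp32 accumulator per thread, simulates the threads sequentially, and emulates bfloat16 and float32 rounding with `struct`. All helper names are hypothetical, and this is not the ATen implementation.

```python
import struct
from typing import List


def to_f32(x: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to bfloat16 by rounding the float32 bit pattern to its top 16 bits
    (round-half-to-even; NaN/overflow handling omitted)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def scatter_add_bf16_naive(size: int, index: List[int], src: List[float]) -> List[float]:
    out = [0.0] * size
    for i, s in zip(index, src):
        out[i] = to_bf16(out[i] + s)  # every partial sum is rounded to bf16
    return out


def scatter_add_bf16_buffered(size: int, index: List[int], src: List[float],
                              num_threads: int = 4) -> List[float]:
    chunk = -(-len(src) // num_threads)  # ceil division
    buffers = []
    for t in range(num_threads):  # threads are simulated sequentially
        buf = [0.0] * size  # this thread's private fp32 accumulator
        lo, hi = t * chunk, (t + 1) * chunk
        for i, s in zip(index[lo:hi], src[lo:hi]):
            buf[i] = to_f32(buf[i] + s)
        buffers.append(buf)
    # Reduce the per-thread buffers in fp32, then round once to bf16.
    out = []
    for j in range(size):
        acc = 0.0
        for buf in buffers:
            acc = to_f32(acc + buf[j])
        out.append(to_bf16(acc))
    return out


src = [to_bf16(0.001)] * 1000
index = [0] * 1000
print(scatter_add_bf16_naive(1, index, src))     # [0.5]
print(scatter_add_bf16_buffered(1, index, src))  # [1.0]
```

The naive version stalls at 0.5 because, once the running sum reaches that value, half the bfloat16 spacing (2**-9) is larger than each increment, so every addition rounds back down. The fp32 buffers keep the increments and round only once at the end, giving a result close to the exact sum of about 0.99945.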
PyTorch MergeBot	f8aedf1efe	Revert "Optimize scatter_add/scatter_reduce in BFloat16/Half data type in CPU backend (#103427 )" This reverts commit `da7675621e`. Reverted https://github.com/pytorch/pytorch/pull/103427 on behalf of https://github.com/clee2000 due to sorry but it looks like this pr broke test_scatter_gather_ops.py::TestScatterGatherCPU::test_scatter_expanded_index_cpu_bfloat16 on periodic parallelnative testing `da7675621e` https://github.com/pytorch/pytorch/actions/runs/5477783108/jobs/9977608393 ([comment](https://github.com/pytorch/pytorch/pull/103427#issuecomment-1624008753))	2023-07-06 17:02:03 +00:00
yanbing-j	da7675621e	Optimize scatter_add/scatter_reduce in BFloat16/Half data type in CPU backend (#103427 ) ### Description This PR is to optimize scatter_add/scatter_reduce of BFloat16/Half data type in CPU backend, which is one task in https://github.com/pyg-team/pytorch_geometric/issues/7057. Main point is creating a buffer among threads to accumulate intermediate data as fp32 data type. Next step: - [x] Add benchmarks - [x] Extend to Half - [x] Simplify code ### Performance test (Updated) Test BFloat16 in Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz With jemalloc and iomp Single socket (40C) ![image](https://github.com/pytorch/pytorch/assets/61222868/4b4342f1-8cc3-46f7-81f5-651becd9b1e3) Single core ![image](https://github.com/pytorch/pytorch/assets/61222868/09e5f700-2c2e-4208-979e-74b85474dea6) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103427 Approved by: https://github.com/mingfeima, https://github.com/albanD	2023-07-06 01:23:56 +00:00
Andrew M. James	5364366f8c	Sparse Compressed mm avoid creating temp sparse (#104062 ) When mm forwards to addmm, it creates a zeroed-out `self`; this tensor should take its options from the result, not from one of the sparse arguments. The bug led to an error when calling linear with an `out` kwarg. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104062 Approved by: https://github.com/nikitaved, https://github.com/pearu	2023-06-26 16:45:04 +00:00
Aleksandar Samardžić	09fdea8564	Fix autograd issue with identity conversions (#92022 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92022 Approved by: https://github.com/pearu, https://github.com/mtaaooby, https://github.com/amjames, https://github.com/cpuhrsch	2023-06-21 21:23:03 +00:00
Nikita Vedeneev	39a22e2791	softmax: Triton kernel for BSR inputs (#102095 ) Implements `softmax` Triton kernel for BSR inputs. So far, only over `dim=-1`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102095 Approved by: https://github.com/cpuhrsch	2023-06-21 01:23:27 +00:00
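A row-wise (`dim=-1`) softmax on a compressed sparse matrix only needs to normalize each row's stored values. This pure-Python sketch uses CSR (i.e. 1x1 blocks) instead of BSR for brevity. It assumes the usual sparse-softmax convention that unspecified elements are treated as `-inf` and therefore do not take part. The helper name is hypothetical; this is not the Triton kernel.

```python
import math
from typing import List


def csr_row_softmax(crow_indices: List[int], values: List[float]) -> List[float]:
    """Softmax over dim=-1 of a CSR matrix; returns new values for the same pattern."""
    out = list(values)
    for i in range(len(crow_indices) - 1):
        start, end = crow_indices[i], crow_indices[i + 1]
        if start == end:
            continue  # empty row: nothing stored, nothing to normalize
        row_max = max(values[start:end])  # for numerical stability
        exps = [math.exp(v - row_max) for v in values[start:end]]
        total = sum(exps)
        out[start:end] = [e / total for e in exps]
    return out


# Row 0 stores two equal scores, row 1 is empty, row 2 stores one score.
print(csr_row_softmax([0, 2, 2, 3], [1.0, 1.0, 5.0]))  # [0.5, 0.5, 1.0]
```

Because the sparsity pattern is unchanged, `crow_indices` and `col_indices` can be reused as-is and only `values` is rewritten.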
Pearu Peterson	cbe270d233	Fix zeros_like for sparse tensors with batch dimensions. Add opinfo-based tests to like-functions. (#101215 ) Fixes #101078 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101215 Approved by: https://github.com/cpuhrsch	2023-06-13 16:02:10 +00:00
Xiao Wang	6340aa5d58	Skip test test_triton_bsr_dense_bmm if not TEST_WITH_TORCHINDUCTOR [v2] (#102660 ) The test was originally skipped in https://github.com/pytorch/pytorch/pull/98462 , but the skip was removed, for unclear reasons, in https://github.com/pytorch/pytorch/pull/94825 . Now the test hits a CUDA illegal memory access on H100 again after https://github.com/pytorch/pytorch/pull/101163 . Pull Request resolved: https://github.com/pytorch/pytorch/pull/102660 Approved by: https://github.com/zou3519	2023-06-01 20:36:45 +00:00
Pearu Peterson	9f97b7c43b	Add integer overflow checks for large compressed tensor dimensions and nnz (#102530 ) With the previous PR allowing large compressed tensors (dimensions larger than `2**31 - 1`), sparse compressed tensor invariants checks may give false-positive results:
```python
>>> nnz=2**31
>>> torch.sparse.check_sparse_tensor_invariants.enable()
>>> torch.sparse_csr_tensor(torch.arange(nnz+1, dtype=torch.int32), torch.zeros(nnz, dtype=torch.int32), torch.ones(nnz), (nnz, 1))
tensor(crow_indices=tensor([ 0, 1, 2, ..., 2147483646, 2147483647, -2147483648]),
       col_indices=tensor([0, 0, 0, ..., 0, 0, 0]),
       values=tensor([1., 1., 1., ..., 1., 1., 1.]), size=(2147483648, 1),
       nnz=2147483648, layout=torch.sparse_csr)
```
(notice that the last entry in `crow_indices` is invalid) or raise a bogus exception as in
```python
>>> torch.sparse_csr_tensor(torch.arange(nnz+1, dtype=torch.int32), torch.arange(nnz, dtype=torch.int32), torch.ones(nnz), (nnz, 1))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: `0 <= col_indices < ncols` is not satisfied.
```
(notice that `col_indices` is actually valid). This PR fixes the above-reported bugs by introducing integer overflow checks for sparse compressed tensors dimensions as well as nnz. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102530 Approved by: https://github.com/nikitaved	2023-05-31 15:34:08 +00:00
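The bogus `-2147483648` in the first repro of this commit comes from two's-complement wrap-around when `2**31` is stored in an int32 index tensor. Here is a pure-Python sketch of that wrap-around and of an up-front check. Following the PR description, it checks both the dimensions and nnz. The helper names and the error message are hypothetical, not the actual PyTorch checks.

```python
def wrap_int32(x: int) -> int:
    """Emulate storing an arbitrary Python int into a two's-complement int32."""
    return (x + 2**31) % 2**32 - 2**31


def check_fits_index_dtype(nrows: int, ncols: int, nnz: int, index_bits: int = 32) -> None:
    """Raise if nnz or a dimension cannot be represented by the index dtype."""
    limit = 2 ** (index_bits - 1) - 1
    for name, value in (("nnz", nnz), ("nrows", nrows), ("ncols", ncols)):
        if value > limit:
            raise OverflowError(f"{name}={value} exceeds the int{index_bits} maximum {limit}")


nnz = 2**31
# Last entry of an int32 arange(nnz + 1), as in the commit's first repro:
print(wrap_int32(nnz))  # -2147483648
check_fits_index_dtype(nrows=10, ncols=10, nnz=5)  # passes silently
try:
    check_fits_index_dtype(nrows=nnz, ncols=1, nnz=nnz)
except OverflowError as exc:
    print(exc)  # nnz=2147483648 exceeds the int32 maximum 2147483647
```

Checking the sizes before the index tensors are built avoids having to interpret wrapped values in later invariant checks.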
Nikita Vedeneev	d80d3b18d0	nn.Linear with BSR inputs: spare the user from explicit Triton kernel registrations (#98403 ) 🤖 Generated by Copilot at 08f7a6a: This pull request adds support for Triton kernels in `torch` and `torch/cuda`, and refactors and tests the existing Triton kernel for BSR matrix multiplication. It also adds a test case to ensure that importing `torch` does not implicitly import `triton`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/98403 Approved by: https://github.com/malfet, https://github.com/cpuhrsch	2023-05-31 13:09:45 +00:00
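One way to ask "is Triton available?" without importing it as a side effect is `importlib.util.find_spec`. For a top-level name, this locates the module without executing it. The sketch below is a hypothetical helper (`triton_is_installed`), not PyTorch's own `has_triton`, which may check more than this (for example device support).

```python
import functools
import importlib.util


@functools.lru_cache(maxsize=None)
def triton_is_installed() -> bool:
    # find_spec locates a top-level module without executing it,
    # so this check does not add "triton" to sys.modules.
    return importlib.util.find_spec("triton") is not None


print(triton_is_installed())  # True or False depending on the environment
```

A test in the spirit of the one described above can then assert that `"triton" not in sys.modules` after such a check.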
Pearu Peterson	fcbdbd6682	Fix silent nnz overflow for large sparse compressed tensors. (#102523 ) Fixes https://github.com/pytorch/pytorch/issues/102520 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102523 Approved by: https://github.com/nikitaved, https://github.com/cpuhrsch	2023-05-30 16:58:01 +00:00
Nikita Vedeneev	6c7410ddc3	sampled_addmm: BSR support (#101163 ) This PR implements a `sampled_addmm` kernel that works with a BSR mask. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101163 Approved by: https://github.com/cpuhrsch	2023-05-25 12:33:50 +00:00
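Assuming the semantics documented for `torch.sparse.sampled_addmm` (compute `beta * input + alpha * (mat1 @ mat2)` only at the positions of `input`'s sparsity pattern), a pure-Python sketch looks like the following. It uses a CSR mask (1x1 blocks) instead of BSR. The helper name is hypothetical, and this is not the Triton kernel from the PR.

```python
from typing import List


def csr_sampled_addmm(crow_indices: List[int], col_indices: List[int], values: List[float],
                      mat1: List[List[float]], mat2: List[List[float]],
                      beta: float = 1.0, alpha: float = 1.0) -> List[float]:
    """Return new CSR values: beta * input + alpha * (mat1 @ mat2), sampled at input's pattern."""
    inner = len(mat2)
    out = []
    for i in range(len(crow_indices) - 1):
        for k in range(crow_indices[i], crow_indices[i + 1]):
            j = col_indices[k]
            # Only the (i, j) dot products selected by the mask are computed.
            dot = sum(mat1[i][t] * mat2[t][j] for t in range(inner))
            out.append(beta * values[k] + alpha * dot)
    return out


mat1 = [[1.0, 2.0], [3.0, 4.0]]
mat2 = [[5.0, 6.0], [7.0, 8.0]]  # mat1 @ mat2 == [[19, 22], [43, 50]]
# Mask pattern: entries (0, 1) and (1, 0), with stored values 1.0 and 2.0.
print(csr_sampled_addmm([0, 1, 2], [1, 0], [1.0, 2.0], mat1, mat2))  # [23.0, 45.0]
```

The cost scales with the number of stored mask entries rather than with the full output size, which is the point of sampling.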
Nikita Vedeneev	346e1f512f	sparse compressed validation: allow empty-batched inputs (#101180 ) Fixes https://github.com/pytorch/pytorch/issues/101179. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101180 Approved by: https://github.com/pearu, https://github.com/cpuhrsch	2023-05-11 20:30:20 +00:00
Nikita Vedeneev	dd2c22f4bb	bsr_dense_bmm(): enable more precise float32 support with float64 accumulators (#100882 ) Float64 is there in Triton! This PR increases precision for float32 inputs with float64 accumulation dtype. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100882 Approved by: https://github.com/cpuhrsch	2023-05-11 11:22:55 +00:00
Pearu Peterson	92a7640b76	Add mul tests with sparse sample inputs (#100393 ) This PR implements sparse sample inputs and error inputs for mul OpInfo. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100393 Approved by: https://github.com/amjames, https://github.com/cpuhrsch	2023-05-09 16:13:14 +00:00
Nikita Vedeneev	0141a242fd	bsr_dense_bmm(): remove sparse_rowspace kernel and some dead code (#100876 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100876 Approved by: https://github.com/cpuhrsch, https://github.com/Skylion007	2023-05-09 16:12:11 +00:00
Nikita Vedeneev	c4bc259f00	bsr_dense_mm(): better test coverage (#100543 ) This PR improves test coverage for `bsr_dense_mm` by: - ~~enabling correctness tests for `float32`~~. - extending and testing input correctness checks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100543 Approved by: https://github.com/cpuhrsch, https://github.com/malfet	2023-05-09 09:26:02 +00:00
Pearu Peterson	3ae0e23b90	Fix sum OpInfo for sparse sample inputs and assert coverage for sparse-enabled operators (#100391 ) This PR enables sum tests for sparse sample inputs. Previously, the tests existed but were never run because the sum OpInfo instance was created without specifying `supports_sparse_=True`. To avoid such mistakes in the future, the following PR https://github.com/pytorch/pytorch/pull/100392 enables the `supports_sparse_` flags automatically when OpInfo creation specifies `sample_inputs_sparse_*_func`. In addition, the PR applies several fixes to sum tests for sparse sample inputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100391 Approved by: https://github.com/cpuhrsch	2023-05-03 02:04:39 +00:00
Nikita Vedeneev	1adb6fa922	nn.Linear: dispatch to bsr_dense_mm for half and bfloat16 (#94825 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94825 Approved by: https://github.com/albanD, https://github.com/cpuhrsch	2023-04-15 13:38:42 +00:00
Xiao Wang	bd83b205cc	Skip test test_triton_bsr_dense_bmm if not TEST_WITH_TORCHINDUCTOR (#98462 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98462 Approved by: https://github.com/zou3519	2023-04-10 21:21:06 +00:00
eqy	2fddcf0fc0	[CUDA][CUDA 11] Remove more CUDA 11 version checks (#92934 ) Working on removing stragglers missed in previous CUDA version < 11.0 cleanup PRs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92934 Approved by: https://github.com/ngimel	2023-03-30 19:49:52 +00:00
Aaron Gokaslan	47dca20d80	[BE] Enable flake8-comprehension rule C417 (#97880 ) Enables flake8-comprehension rule C417. Ruff autogenerated these fixes to the codebase. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97880 Approved by: https://github.com/ezyang, https://github.com/kit1980, https://github.com/albanD	2023-03-30 14:34:24 +00:00
Sergii Dymchenko	5ab50cf048	Fix shoud/shoudl typos (#97930 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97930 Approved by: https://github.com/clee2000	2023-03-30 08:27:16 +00:00
Nikita Shulga	2c16b73a1b	Remove comma from parametrized test name (#97844 ) Uses the `name_fn` argument of the `@parametrize` decorator, because the internal test runner can't parse test names containing commas; otherwise this is a no-op. For those with intern access, see [T149211516](https://www.internalfb.com/intern/tasks/?t=149211516) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97844 Approved by: https://github.com/weiwangmeta	2023-03-29 14:20:13 +00:00
Nikita Shulga	b443198966	Fix sparse addmv ref impl for non-contig tensors (#97730 ) Fix logic in `test_block_addmm` that tested the op against itself rather than against a dense implementation, by implementing a `ref_addvm` function that converts the tensor back to dense before multiplying it with the vector. Fix the reference implementation by passing strides for the vector and the result. (Not sure whether it would be more efficient to iterate over a strided tensor or to request a dense copy, as the MKL implementation does.) Print a more verbose error message if values differ. Fixes https://github.com/pytorch/pytorch/issues/97629 , https://github.com/pytorch/pytorch/issues/97589 , https://github.com/pytorch/pytorch/issues/97563 Pull Request resolved: https://github.com/pytorch/pytorch/pull/97730 Approved by: https://github.com/cpuhrsch	2023-03-28 20:46:32 +00:00
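Both ideas in this fix can be shown in pure Python. The sketch has a CSR `addmv` that reads the vector and writes the result through explicit strides, plus a dense reference to compare against. The helper names are hypothetical, and flat Python lists stand in for strided storage. This is not the ATen reference implementation.

```python
from typing import List


def csr_addmv_strided(crow: List[int], col: List[int], vals: List[float],
                      x_storage: List[float], x_stride: int,
                      y_storage: List[float], y_stride: int,
                      beta: float = 1.0, alpha: float = 1.0) -> None:
    """In place: y = beta * y + alpha * (A @ x), with x and y addressed via strides."""
    for i in range(len(crow) - 1):
        acc = 0.0
        for k in range(crow[i], crow[i + 1]):
            acc += vals[k] * x_storage[col[k] * x_stride]  # logical x[col[k]]
        y_storage[i * y_stride] = beta * y_storage[i * y_stride] + alpha * acc


def dense_addmv(dense: List[List[float]], x: List[float], y: List[float],
                beta: float = 1.0, alpha: float = 1.0) -> List[float]:
    """Dense reference used to check the sparse version."""
    return [beta * yi + alpha * sum(a * b for a, b in zip(row, x))
            for row, yi in zip(dense, y)]


# A = [[1, 0, 2], [0, 3, 0]] in CSR form.
crow, col, vals = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
x_storage = [1.0, -9.0, 2.0, -9.0, 3.0]  # x = [1, 2, 3] with stride 2
y_storage = [10.0, 0.0, 0.0, 20.0]       # y = [10, 20] with stride 3
csr_addmv_strided(crow, col, vals, x_storage, 2, y_storage, 3)
print([y_storage[0], y_storage[3]])  # [17.0, 26.0]
print(dense_addmv([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]], [1.0, 2.0, 3.0], [10.0, 20.0]))  # [17.0, 26.0]
```

Ignoring the strides here would read the `-9.0` padding as vector elements and write into the padding of `y`. Comparing against a dense reference, rather than against the op itself, catches exactly that.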
Nikita Shulga	ad5d81adda	[Sparse] Add reference implementation for addmv (#97353 ) Partially addresses the problem raised in https://github.com/pytorch/pytorch/issues/96972 Add `test_addmv` and enable `test_block_addmv` on all platforms (so the test can be run on M1). TODO: Make sure that the non-contiguous mode of test_block_addmv actually generates non-contiguous inputs; right now it probably does not, since the test passes while assuming values are contiguous. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97353 Approved by: https://github.com/cpuhrsch	2023-03-24 06:14:32 +00:00
haozhe.zhu	fe0afc5852	use accumulate type in BF16 gemm(include dot, mv) ref path (#96074 ) Fix https://github.com/pytorch/pytorch/issues/95125 and https://github.com/pytorch/pytorch/issues/83863 for bf16 accumulation in gemm ref path Pull Request resolved: https://github.com/pytorch/pytorch/pull/96074 Approved by: https://github.com/lezcano, https://github.com/peterbell10	2023-03-23 01:22:59 +00:00
Nikita Vedeneev	55cf7eef86	add/add_ for sparse compressed formats: fix silent index downcast int64 -> int32 (#95294 ) Fixes https://github.com/pytorch/pytorch/issues/95224. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95294 Approved by: https://github.com/cpuhrsch, https://github.com/amjames	2023-03-10 17:51:40 +00:00
Nikita Vedeneev	98a4d74a68	COO intersection primitives: performance improvement (#96094 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96094 Approved by: https://github.com/pearu	2023-03-07 13:21:29 +00:00
Nikita Vedeneev	d809020fc8	Triton kernel for bsr @ dense (#94823 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94823 Approved by: https://github.com/cpuhrsch, https://github.com/malfet	2023-03-03 15:11:28 +00:00
PyTorch MergeBot	d7637801d3	Revert "COO intersection primitives: performance improvement (#92976 )" This reverts commit `b033594943`. Reverted https://github.com/pytorch/pytorch/pull/92976 on behalf of https://github.com/seemethere due to Need to revert this so I can revert https://github.com/pytorch/pytorch/pull/94048 cleanly	2023-03-03 01:38:56 +00:00
Nikita Vedeneev	b033594943	COO intersection primitives: performance improvement (#92976 ) This PR improves COO intersection primitives by: * making them sync-less (for dims <= 8; the limit can be changed to any value that fits on the stack). * improving performance with far fewer kernel calls. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92976 Approved by: https://github.com/cpuhrsch, https://github.com/pearu	2023-03-02 17:42:39 +00:00
Nikita Vedeneev	325b43661e	add/add_ for compressed sparse inputs: bypass BLAS in some trivial cases (#95293 ) In `add(self, other, out=...)` we can bypass calls to BLAS in cases when `self == other == out` and `self == other`. This PR fixes the repro from https://github.com/pytorch/pytorch/issues/94966, but the issue is still present when `x.add_(x)` is replaced, say, with `x = x.clone().add_(x)`. Could that be a synchronization issue? CC @IvanYashchuk . Pull Request resolved: https://github.com/pytorch/pytorch/pull/95293 Approved by: https://github.com/cpuhrsch	2023-02-27 16:06:02 +00:00
mingfeima	c620ece726	port sparse_mm.reduce to pytorch and optimize it on CPU (#83727 ) ### Motivation of this PR This patch migrates `spmm_reduce` from `torch-sparse` (a 3rd-party dependency of PyG) to `torch`, in response to the initial proposal for fusing Gather, Apply, Scatter (GAS) in the message passing of GNN inference/training: https://github.com/pytorch/pytorch/issues/71300 GAS is the major step of message passing; its behavior falls into 2 kinds depending on the storage type of `EdgeIndex`, which records the connections between nodes: * COO: the hotspot is `scatter_reduce` * CSR: the hotspot is `spmm_reduce` The reduce type can be chosen from "sum", "mean", "max", "min". This extends `torch.sparse.mm` with a `reduce` argument, which maps to `torch.sparse_mm.reduce` internally. `sparse_mm_reduce` is registered under the TensorTypeId of `SparseCsrCPU`, and this operator requires an internal interface `_sparse_mm_reduce_impl` which has dual outputs: * `out` - the actual output * `arg_out` - records output indices in the non-zero elements if the reduce type is "max" or "min"; this is only useful for training, so it is not calculated for inference. ### Performance Benchmark on GCN for ogbn-products on a single Xeon socket: the workload is improved by `4.3x` with this patch. The performance benefit for training will be bigger: the original backward impl for `sum|mean` is sequential, and the original backward impl for `max|min` is not fused. 
#### before:
```
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                         Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
       torch_sparse::spmm_sum        97.09%       56.086s        97.09%       56.088s        6.232s             9
                 aten::linear         0.00%      85.000us         1.38%     795.485ms      88.387ms             9
                 aten::matmul         0.00%      57.000us         1.38%     795.260ms      88.362ms             9
                     aten::mm         1.38%     795.201ms         1.38%     795.203ms      88.356ms             9
                   aten::relu         0.00%      50.000us         0.76%     440.434ms      73.406ms             6
              aten::clamp_min         0.76%     440.384ms         0.76%     440.384ms      73.397ms             6
                   aten::add_         0.57%     327.801ms         0.57%     327.801ms      36.422ms             9
            aten::log_softmax         0.00%      23.000us         0.10%      55.503ms      18.501ms             3
```
#### after
```
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                         Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
               aten::spmm_sum        87.35%       11.826s        87.36%       11.827s        1.314s             9
                 aten::linear         0.00%      92.000us         5.87%     794.451ms      88.272ms             9
                 aten::matmul         0.00%      62.000us         5.87%     794.208ms      88.245ms             9
                     aten::mm         5.87%     794.143ms         5.87%     794.146ms      88.238ms             9
                   aten::relu         0.00%      53.000us         3.35%     452.977ms      75.496ms             6
              aten::clamp_min         3.35%     452.924ms         3.35%     452.924ms      75.487ms             6
                   aten::add_         2.58%     348.663ms         2.58%     348.663ms      38.740ms             9
                 aten::argmax         0.42%      57.473ms         0.42%      57.475ms      14.369ms             4
            aten::log_softmax         0.00%      22.000us         0.39%      52.605ms      17.535ms             3
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83727 Approved by: https://github.com/jgong5, https://github.com/cpuhrsch, https://github.com/rusty1s, https://github.com/pearu	2023-02-10 15:56:40 +00:00
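The CSR `spmm_reduce` computation described in this commit can be written directly. For each row of a CSR matrix, every stored value scales the matching row of the dense operand, and the products are combined with "sum", "mean", "max" or "min". For "max"/"min", the position of the winning nonzero is also recorded, in the spirit of `arg_out`. This pure-Python sketch uses a hypothetical helper name, and it assumes two details of its own: empty rows yield `0` and use `-1` as the `arg_out` sentinel. It is not the ATen kernel or its exact output conventions.

```python
from typing import List, Optional, Tuple


def csr_spmm_reduce(crow: List[int], col: List[int], vals: List[float],
                    dense: List[List[float]], reduce: str = "sum",
                    ) -> Tuple[List[List[float]], Optional[List[List[int]]]]:
    """out[i][c] = reduce over nonzeros k of row i of vals[k] * dense[col[k]][c].

    For "max"/"min" also returns arg_out: the position k (into vals) of the
    selected nonzero, or -1 for an empty row (the sentinel is an assumption).
    """
    if reduce not in ("sum", "mean", "max", "min"):
        raise ValueError(f"unsupported reduce: {reduce!r}")
    track_arg = reduce in ("max", "min")
    out, arg_out = [], []
    for i in range(len(crow) - 1):
        row_out, row_arg = [], []
        for c in range(len(dense[0])):
            prods = [(vals[k] * dense[col[k]][c], k) for k in range(crow[i], crow[i + 1])]
            if not prods:
                row_out.append(0.0)  # assumption: empty rows produce 0
                row_arg.append(-1)
            elif reduce == "sum":
                row_out.append(sum(p for p, _ in prods))
            elif reduce == "mean":
                row_out.append(sum(p for p, _ in prods) / len(prods))
            else:
                pick = max if reduce == "max" else min
                best, k_best = pick(prods, key=lambda pk: pk[0])
                row_out.append(best)
                row_arg.append(k_best)
        out.append(row_out)
        arg_out.append(row_arg)
    return out, (arg_out if track_arg else None)


# A = [[1, 2, 0], [0, 0, 0], [0, 0, 3]] in CSR form.
crow, col, vals = [0, 2, 2, 3], [0, 1, 2], [1.0, 2.0, 3.0]
dense = [[1.0, -1.0], [4.0, 1.0], [2.0, 2.0]]
print(csr_spmm_reduce(crow, col, vals, dense, "sum")[0])  # [[9.0, 1.0], [0.0, 0.0], [6.0, 6.0]]
print(csr_spmm_reduce(crow, col, vals, dense, "max"))
# ([[8.0, 2.0], [0.0, 0.0], [6.0, 6.0]], [[1, 1], [-1, -1], [2, 2]])
```

The recorded argmax/argmin positions let a backward pass route gradients only to the selected nonzeros. This matches the message's note that `arg_out` is useful for training but can be skipped during inference.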
Aleksandar Samardžić	e1f17b3530	Add CSR->BSC and CSC->BSR conversions (#93301 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/93301 Approved by: https://github.com/cpuhrsch	2023-02-07 19:22:05 +00:00
Nikita Vedeneev	bb6af061a0	`torch.triangular_solve` for CSR: materialize diagonal elements when `unitriangular=True`. (#93352 ) Fixes https://github.com/pytorch/pytorch/issues/88890 A temporary fix until MKL is fixed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/93352 Approved by: https://github.com/cpuhrsch	2023-01-31 16:33:57 +00:00
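The workaround in this commit is to materialize the diagonal when `unitriangular=True`. This sketch assumes the underlying problem is a solver that needs diagonal entries to be physically stored. Under that assumption, the diagonal is written explicitly as `1.0` (overwriting any stored value, since a unit-triangular solve ignores it) before a plain forward substitution on a lower-triangular CSR matrix. Helper names are hypothetical, and the solver is not MKL's.

```python
from typing import List, Tuple


def materialize_unit_diagonal(crow: List[int], col: List[int], vals: List[float],
                              ) -> Tuple[List[int], List[int], List[float]]:
    """Return CSR arrays in which every diagonal entry is stored explicitly as 1.0.

    With unitriangular=True the stored diagonal (if any) is ignored and taken
    to be 1, so existing diagonal values are overwritten rather than kept.
    """
    new_crow, new_col, new_vals = [0], [], []
    for i in range(len(crow) - 1):
        entries = {col[k]: vals[k] for k in range(crow[i], crow[i + 1])}
        entries[i] = 1.0
        for j in sorted(entries):
            new_col.append(j)
            new_vals.append(entries[j])
        new_crow.append(len(new_col))
    return new_crow, new_col, new_vals


def csr_lower_solve(crow: List[int], col: List[int], vals: List[float],
                    b: List[float]) -> List[float]:
    """Forward substitution for a lower-triangular CSR matrix that must store its diagonal."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        acc, diag = b[i], None
        for k in range(crow[i], crow[i + 1]):
            if col[k] < i:
                acc -= vals[k] * x[col[k]]
            elif col[k] == i:
                diag = vals[k]
        if diag is None:
            raise ValueError(f"row {i} has no stored diagonal entry")
        x[i] = acc / diag
    return x


# Strictly lower part of [[1, 0, 0], [2, 1, 0], [0, 3, 1]]; the unit diagonal is implicit.
crow, col, vals = [0, 0, 1, 2], [0, 1], [2.0, 3.0]
print(csr_lower_solve(*materialize_unit_diagonal(crow, col, vals), [1.0, 4.0, 9.0]))  # [1.0, 2.0, 3.0]
```

Calling `csr_lower_solve` on the original arrays raises, because row 0 stores nothing; this mimics a solver that cannot infer the implicit unit diagonal.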
Aleksandar Samardžić	53f7fb9a22	Add CSC->BSC conversion (#92307 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92307 Approved by: https://github.com/cpuhrsch	2023-01-30 17:03:36 +00:00
Pearu Peterson	65d6802e2f	Improve error messages for sparse methods on tensors with unsupported backends/layouts. (#93149 ) Fixes https://github.com/pytorch/pytorch/issues/92790 Pull Request resolved: https://github.com/pytorch/pytorch/pull/93149 Approved by: https://github.com/cpuhrsch	2023-01-27 19:50:23 +00:00
PyTorch MergeBot	7012d985fa	Revert "Improve `bsr @ strided` performance in `baddmm` for `bfloat16/half` with Triton kernels. (#88078 )" This reverts commit `46f16b9363`. Reverted https://github.com/pytorch/pytorch/pull/88078 on behalf of https://github.com/ZainRizvi due to Causing a test to fail consistently: test_decomp.py::HasDecompTest::test_has_decomposition	2023-01-26 16:22:29 +00:00
Nikita Vedeneev	46f16b9363	Improve `bsr @ strided` performance in `baddmm` for `bfloat16/half` with Triton kernels. (#88078 ) As per title. Additionally we also introduce support for: - Rectangular block sizes which are powers of 2 and at least 16 (triton's `dot` limitation). - Batch support with broadcasting for either of the arguments. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88078 Approved by: https://github.com/cpuhrsch	2023-01-26 07:58:27 +00:00
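The block-size constraint stated here (rectangular blocks allowed, but each side a power of 2 and at least 16) is easy to validate up front. This hypothetical helper sketches such a check; it is not code from the PR.

```python
def is_supported_block_size(block_rows: int, block_cols: int) -> bool:
    """Both block dimensions must be powers of 2 and at least 16; rectangular is fine."""
    def ok(n: int) -> bool:
        return n >= 16 and (n & (n - 1)) == 0  # power-of-two bit trick
    return ok(block_rows) and ok(block_cols)


print([is_supported_block_size(*bs) for bs in [(16, 16), (16, 64), (8, 16), (24, 32)]])
# [True, True, False, False]
```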
Eddie Yan	0bf7506051	[CUDA] Drop CUDA < 11.0 test flags (#92605 ) Follow-up of #89582 to drop flags like `CUDA11OrLater` in tests. Note that in some places it appears that `TEST_WITH_ROCM` is _implicitly_ guarded against via the `CUDA11OrLater` version check, based on my best-guess of how `torch.version.cuda` would behave in ROCM builds, so I've added `not TEST_WITH_ROCM` in cases where ROCM wasn't previously explicitly allowed. CC @ptrblck @malfet @ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/92605 Approved by: https://github.com/ngimel	2023-01-24 04:34:06 +00:00
Yanbo Liang	0ab4ab9f8d	[Dynamo] Fix calling UserDefinedObject.func should pass self object (#92050 ) Fixes #90834 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92050 Approved by: https://github.com/jansel	2023-01-21 05:47:01 +00:00
PyTorch MergeBot	60bf851931	Revert "Improve `bsr @ strided` performance in `baddmm` for `bfloat16/half` with Triton kernels. (#88078 )" This reverts commit `8383b5c488`. Reverted https://github.com/pytorch/pytorch/pull/88078 on behalf of https://github.com/malfet due to This seems to have broke sm_86 testing, see https://hud.pytorch.org/hud/pytorch/pytorch/master/1?per_page=50&name_filter=sm86%20%2F%20test%20(default%2C%203	2023-01-19 23:37:59 +00:00


231 Commits