Summary:
Currently only `num_warps` and `num_stages` are supported as kernel options for inductor auto-tuning with `TritonTemplate`. To allow warp specialization, the kernel options should also accept `num_consumer_groups` and `num_buffers_warp_spec`.
Test Plan:
## Unit test
Added `test_triton_template_warp_specialization` to verify that the generated kernel contains configs for `num_consumer_groups` and `num_buffers_warp_spec`.
## Functional Testing
Specific to flex attention:
```
import torch
from torch.nn.attention.flex_attention import flex_attention
from triton.testing import do_bench
make_tensor = lambda: torch.rand(8, 16, 8192, 128, device="cuda", dtype=torch.bfloat16)
q, k, v = make_tensor(), make_tensor(), make_tensor()
flex_compiled = torch.compile(flex_attention, fullgraph=True)
print(do_bench(lambda: flex_compiled(q, k, v, kernel_options={"num_warps": 4})))
```
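A hedged sketch of the warp-specialized variant reported as "with warp-spec" below (option names taken from the summary above; the specific values are illustrative, not tuned):
```
# Same q, k, v and flex_compiled as above; only kernel_options changes.
print(do_bench(lambda: flex_compiled(
    q, k, v,
    kernel_options={
        "num_warps": 4,
        "num_consumer_groups": 2,
        "num_buffers_warp_spec": 3,
    },
)))
```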
triton do_bench results:
- default compile: 15.176783561706543
- with warp-spec: 9.452800750732422
## Extra notes
- generated triton kernel using `TORCH_LOGS=output_code`: P1740612877
- TTGIR for fused kernel: P1740614685
Differential Revision: D70212243
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148503
Approved by: https://github.com/eellison
Summary:
Relands D69965761 / https://github.com/pytorch/pytorch/pull/147583
Before this PR, calling a triton kernel would look like:
```py
kernel.run(a, b, xnumel, grid=grid(xnumel), stream=stream0)
```
where `grid=` was passed as a callable (a function closure). This PR removes the grid argument:
```py
kernel.run(a, b, xnumel, stream=stream0)
```
Instead, the grid computation is now included in the kernel launcher, with something like:
```py
def launcher(in_ptr0, out_ptr0, xnumel, stream):
grid_0 = ((xnumel + 1023) >> 10)
grid_1 = 1
grid_2 = 1
runner(grid_0, grid_1, grid_2, stream, function, metadata, None, launch_enter_hook, launch_exit_hook, in_ptr0, out_ptr0, xnumel)
```
This should be faster, since we remove multiple function/dict calls and are able to specialize the grid computation for each `triton.Config`.
It also allows us to unify the handling of grids between the Python and C++ wrapper code. Before this, C++ wrapper code didn't actually support dynamic grid sizes and instead burned in a static grid.
This unification allows this PR to be a net deletion of code.
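For context, a minimal sketch of what the removed `grid=` closure roughly computed (a hypothetical helper for illustration, not the actual inductor implementation); the generated launcher above now inlines this arithmetic and can specialize it per `triton.Config`:
```py
# Illustrative old-style grid closure (hypothetical; assumes XBLOCK=1024).
def grid(xnumel, XBLOCK=1024):
    def closure(meta=None):
        # Ceil-divide the element count by the block size; the new launcher
        # emits the equivalent `(xnumel + 1023) >> 10` directly in the code.
        return ((xnumel + XBLOCK - 1) // XBLOCK, 1, 1)
    return closure
```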
Differential Revision: D70471332
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148305
Approved by: https://github.com/shunting314, https://github.com/eellison
#147620 enabled `force_shape_pad` for the triton kernel benchmark. Intel GPU supports this scenario, so this PR enables the corresponding test case; otherwise there would be a test regression for Intel GPU now that #147620 has landed.
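For reference, a minimal usage sketch of the config flag involved (assumes an XPU-enabled build; the op and shapes are illustrative):
```py
import torch
import torch._inductor.config as inductor_config

# Always pad matmul shapes instead of relying on the benchmarking heuristic.
inductor_config.force_shape_pad = True

compiled = torch.compile(lambda a, b: a @ b)
out = compiled(
    torch.randn(1024, 1000, device="xpu"),
    torch.randn(1000, 512, device="xpu"),
)
```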
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148237
Approved by: https://github.com/jansel
Before this PR, calling a triton kernel would look like:
```py
kernel.run(a, b, xnumel, grid=grid(xnumel), stream=stream0)
```
where `grid=` was passed as a callable (a function closure). This PR removes the grid argument:
```py
kernel.run(a, b, xnumel, stream=stream0)
```
Instead, the grid computation is now included in the kernel launcher, with something like:
```py
def launcher(in_ptr0, out_ptr0, xnumel, stream):
grid_0 = ((xnumel + 1023) >> 10)
grid_1 = 1
grid_2 = 1
runner(grid_0, grid_1, grid_2, stream, function, metadata, None, launch_enter_hook, launch_exit_hook, in_ptr0, out_ptr0, xnumel)
```
This should be faster, since we remove multiple function/dict calls and are able to specialize the grid computation for each `triton.Config`.
It also allows us to unify the handling of grids between the Python and C++ wrapper code. Before this, C++ wrapper code didn't actually support dynamic grid sizes and instead burned in a static grid.
This unification allows this PR to be a net deletion of code.
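As a side note on the generated launcher arithmetic, `(xnumel + 1023) >> 10` is just ceil-division by the block size 1024 used in the example above; a quick self-contained check:
```py
# Verify the shift form matches ceil-division by XBLOCK = 1024.
XBLOCK = 1024
for xnumel in (1, 1023, 1024, 1025, 10_000_000):
    assert (xnumel + 1023) >> 10 == -(-xnumel // XBLOCK)
```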
Note the attached diff contains some minor fbcode-only changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147583
Approved by: https://github.com/eellison, https://github.com/shunting314
This PR adds functional support for max-autotune on XPU. The current triton templates and configurations are not yet well optimized for XPU, so performance is not ready. The `mm_plus_mm` template also has accuracy issues in some cases. We will address these issues in follow-up PRs.
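As a usage sketch (assumes an XPU-enabled build; the op, shapes, and dtype are illustrative), max-autotune can be exercised on XPU with:
```py
import torch

def mm(a, b):
    return a @ b

# mode="max-autotune" enables inductor autotuning, including triton templates.
compiled = torch.compile(mm, mode="max-autotune")
a = torch.randn(2048, 2048, device="xpu", dtype=torch.float16)
b = torch.randn(2048, 2048, device="xpu", dtype=torch.float16)
out = compiled(a, b)
```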
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143266
Approved by: https://github.com/EikanWang, https://github.com/jansel
Summary: For the second pass, we don't have to rerun the whole inductor flow. This PR moves that second pass to codegen time. This change not only speeds up compilation, but also removes the kernel-scheduling inconsistency between the two passes. A future improvement is to make the second pass reuse the scheduler and do only the wrapper codegen.
This is a copy of https://github.com/pytorch/pytorch/pull/113762 to land in github first.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114067
Approved by: https://github.com/chenyang78
Enables additional inductor UTs on ROCm and removes outdated skips.
I have also removed a group of expected failures in `test_torchinductor_opinfo` which now pass on both CUDA and ROCm:
```
- # The following 3 tests fail on CUDA with AssertionError: expected size 5==5, stride 5==1 at dim=0
- # linalg._svd's return value has different strides on CUDA vs CPU which causes this
- # In test_meta.py there is a mechanism to skipping strides checks for some ops
- # (including _linalg_svd), possibly we should have something similar here
- "linalg.cond": {f32, f64},
- "linalg.svdvals": {f32, f64},
- "linalg.matrix_rank": {f32, f64},
- "linalg.svd": {f32, f64},
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104624
Approved by: https://github.com/malfet
Summary:
Add a new path in `post_grad.py` for replacing addmm + ReLU / GELU activation with the corresponding `_addmm_activation` call (with `use_gelu=False` or `True`, respectively). The replacement is done only when `max_autotune_gemm=False` and the activation is fusible.
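For illustration, a minimal sketch of the kind of graph this targets (a hypothetical toy function, not the matcher code itself); under the conditions above it is expected to lower to `aten._addmm_activation` with `use_gelu=False`, matching the `FallbackKernel` lines in the test output below:
```py
import torch

def addmm_relu(bias, mat1, mat2):
    # addmm followed by a fusible activation (ReLU here; GELU maps to use_gelu=True).
    return torch.relu(torch.addmm(bias, mat1, mat2))

compiled = torch.compile(addmm_relu)
out = compiled(torch.randn(16), torch.randn(8, 32), torch.randn(32, 16))
```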
Test Plan:
```
$ python test/inductor/test_pattern_matcher.py -k test_addmm_activation -v
(__main__.TestPaternMatcher.test_addmm_activation) ... /data/users/aakhundov/pytorch/torch/_inductor/compile_fx.py:128: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
  warnings.warn(
Using FallbackKernel: aten._addmm_activation.default
Using FallbackKernel: aten._addmm_activation.default
/data/users/aakhundov/pytorch/torch/_dynamo/eval_frame.py:373: UserWarning: changing options to `torch.compile()` may require calling `torch._dynamo.reset()` to take effect
  warnings.warn(
frames [('total', 1), ('ok', 1)]
stats [('calls_captured', 2), ('unique_graphs', 1)]
aot_autograd [('total', 1), ('ok', 1)]
inductor []
ok

----------------------------------------------------------------------
Ran 1 test in 13.415s

OK
```
Reviewers: @eellison
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104132
Approved by: https://github.com/eellison, https://github.com/jansel