pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Jason Ansel	b040dc3a53	Reland: [inductor] Simplify grid handling (#148305 ) Summary: Relands D69965761 / https://github.com/pytorch/pytorch/pull/147583 Before this PR, calling a triton kernel would look like: ```py kernel.run(a, b, xnumel, grid=grid(xnumel), stream=stream0) ``` where the `grid=` was passed as a callable (function closure) arg. This PR removes the grid arg: ```py kernel.run(a, b, xnumel, stream=stream0) ``` instead now the grid computation is included in the kernel launcher, with something like: ```py def launcher(in_ptr0, out_ptr0, xnumel, stream): grid_0 = ((xnumel + 1023) >> 10) grid_1 = 1 grid_2 = 1 runner(grid_0, grid_1, grid_2, stream, function, metadata, None, launch_enter_hook, launch_exit_hook, in_ptr0, out_ptr0, xnumel) ``` This should be faster, since we remove multiple function/dict calls and are able to specialize the grid computation for each `triton.Config`. It also allows us to unify the handling of grids between the Python and C++ wrapper code. Before this, C++ wrapper code didn't actually support dynamic grid sizes and instead burned in a static grid. This unification allows this PR to be a net deletion of code. Differential [disconnected] Revision: D70471332 Pull Request resolved: https://github.com/pytorch/pytorch/pull/148305 Approved by: https://github.com/shunting314, https://github.com/eellison	2025-03-12 15:52:16 +00:00
PyTorch MergeBot	5ada4e6a53	Revert "Reland: [inductor] Simplify grid handling (#148305 )" This reverts commit `8d08b49015`. Reverted https://github.com/pytorch/pytorch/pull/148305 on behalf of https://github.com/jithunnair-amd due to Broke ROCm CI ([comment](https://github.com/pytorch/pytorch/pull/148305#issuecomment-2718177044))	2025-03-12 14:58:43 +00:00
Jason Ansel	8d08b49015	Reland: [inductor] Simplify grid handling (#148305 ) Summary: Relands D69965761 / https://github.com/pytorch/pytorch/pull/147583 Before this PR, calling a triton kernel would look like: ```py kernel.run(a, b, xnumel, grid=grid(xnumel), stream=stream0) ``` where the `grid=` was passed as a callable (function closure) arg. This PR removes the grid arg: ```py kernel.run(a, b, xnumel, stream=stream0) ``` instead now the grid computation is included in the kernel launcher, with something like: ```py def launcher(in_ptr0, out_ptr0, xnumel, stream): grid_0 = ((xnumel + 1023) >> 10) grid_1 = 1 grid_2 = 1 runner(grid_0, grid_1, grid_2, stream, function, metadata, None, launch_enter_hook, launch_exit_hook, in_ptr0, out_ptr0, xnumel) ``` This should be faster, since we remove multiple function/dict calls and are able to specialize the grid computation for each `triton.Config`. It also allows us to unify the handling of grids between the Python and C++ wrapper code. Before this, C++ wrapper code didn't actually support dynamic grid sizes and instead burned in a static grid. This unification allows this PR to be a net deletion of code. Differential Revision: D70471332 Pull Request resolved: https://github.com/pytorch/pytorch/pull/148305 Approved by: https://github.com/shunting314, https://github.com/eellison	2025-03-11 18:51:06 +00:00
PyTorch MergeBot	608377d341	Revert "[import][inductor] Simplify grid handling (#147583 )" This reverts commit `b59776d857`. Reverted https://github.com/pytorch/pytorch/pull/147583 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/147583#issuecomment-2693016036))	2025-03-03 00:49:32 +00:00
Jason Ansel	b59776d857	[import][inductor] Simplify grid handling (#147583 ) Before this PR, calling a triton kernel would look like: ```py kernel.run(a, b, xnumel, grid=grid(xnumel), stream=stream0) ``` where the `grid=` was passed as a callable (function closure) arg. This PR removes the grid arg: ```py kernel.run(a, b, xnumel, stream=stream0) ``` instead now the grid computation is included in the kernel launcher, with something like: ```py def launcher(in_ptr0, out_ptr0, xnumel, stream): grid_0 = ((xnumel + 1023) >> 10) grid_1 = 1 grid_2 = 1 runner(grid_0, grid_1, grid_2, stream, function, metadata, None, launch_enter_hook, launch_exit_hook, in_ptr0, out_ptr0, xnumel) ``` This should be faster, since we remove multiple function/dict calls and are able to specialize the grid computation for each `triton.Config`. It also allows us to unify the handling of grids between the Python and C++ wrapper code. Before this, C++ wrapper code didn't actually support dynamic grid sizes and instead burned in a static grid. This unification allows this PR to be a net deletion of code. Note the attached diff contains some minor fbcode-only changes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147583 Approved by: https://github.com/eellison, https://github.com/shunting314	2025-03-02 07:31:07 +00:00
Xuehai Pan	1cb4e2df65	[BE][PYFMT] migrate PYFMT for `torch._inductor` to `ruff format` (#144550 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144550 Approved by: https://github.com/jansel	2025-02-28 13:33:19 +00:00
David Berard	26f19539ad	[triton 3.3] cpp_wrapper: add a global_scratch arg (#148051 ) Following triton # 4916, the generated cubin expects a global_scratch argument to support on-device TMA. We believe this is the source of many of the "invalid argument" failures on AOTI/cpp_wrapper tests. AFAIK, we don't use on-device TMA in Inductor as of now, so it should be safe to use a nullptr for the scratch space. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148051 Approved by: https://github.com/YUNQIUGUO	2025-02-27 10:13:57 +00:00
Mwiza Kunda	8cb8722979	[inductor][triton] Ignore block ptr advances for removed buffers (#147193 ) block ptr advancements should also be deferrered conditional on the associated buffer not being removed. For example, if `FusedSchedulerNode(op0-op1)` has a store in `SchedulerNode` `op0` that is read in `op1`, the store and associated block ptr that would be created for `op0` in isolation is no longer needed. Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/147193 Approved by: https://github.com/jansel Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>	2025-02-27 03:37:33 +00:00
PyTorch MergeBot	0d31c621a3	Revert "[inductor][triton] Ignore block ptr advances for removed buffers (#147193 )" This reverts commit `17766b7aad`. Reverted https://github.com/pytorch/pytorch/pull/147193 on behalf of https://github.com/wdvr due to failing tests on trunk - see below ([comment](https://github.com/pytorch/pytorch/pull/147193#issuecomment-2683286358))	2025-02-25 21:04:04 +00:00
Mwiza	17766b7aad	[inductor][triton] Ignore block ptr advances for removed buffers (#147193 ) block ptr advancements should also be deferrered conditional on the associated buffer not being removed. For example, if `FusedSchedulerNode(op0-op1)` has a store in `SchedulerNode` `op0` that is read in `op1`, the store and associated block ptr that would be created for `op0` in isolation is no longer needed. Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/147193 Approved by: https://github.com/jansel Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>	2025-02-25 19:14:55 +00:00
Aaron Orenstein	db4ce78d46	PEP585: More UP006 fixes (#146392 ) This should be the final PR before we can enable RUFF UP006. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146392 Approved by: https://github.com/justinchuby, https://github.com/albanD, https://github.com/Skylion007	2025-02-20 06:18:13 +00:00
leslie-fang-intel	9e0b3e9b6c	[Inductor] Fix Inplace Buffer inner name conflict (#147199 ) Summary Fix issue: https://github.com/pytorch/pytorch/issues/146975, when create `InplacedBuffer` inner name, we only count the number of unique `InplacedBuffer` or `RemovedArg`. The name may have conflict, for example reported in this issue ``` ---- make inplace create, input_name is: buf22; output_name is: buf27; buf.inner_name is: in_out_ptr2 dict_values([ InplacedBuffer(inner_name='in_out_ptr0', other_names=['buf6', 'buf11']), InplacedBuffer(inner_name='in_out_ptr0', other_names=['buf6', 'buf11']), InplacedBuffer(inner_name='in_out_ptr1', other_names=['buf24', 'buf26']), InplacedBuffer(inner_name='in_out_ptr1', other_names=['buf24', 'buf26'])]) ---- make inplace create, input_name is: buf0; output_name is: buf3; buf.inner_name is: in_out_ptr2 dict_values([ <torch._inductor.codegen.common.RemovedArg object at 0x7fbf75516350>, <torch._inductor.codegen.common.RemovedArg object at 0x7fbf75516350>, <torch._inductor.codegen.common.RemovedArg object at 0x7fbf75516350>, <torch._inductor.codegen.common.RemovedArg object at 0x7fbf75516350>, InplacedBuffer(inner_name='in_out_ptr2', other_names=['buf22', 'buf27', 'buf31', 'buf33']), InplacedBuffer(inner_name='in_out_ptr2', other_names=['buf22', 'buf27', 'buf31', 'buf33']) <torch._inductor.codegen.common.RemovedArg object at 0x7fbf75516350>, InplacedBuffer(inner_name='in_out_ptr2', other_names=['buf22', 'buf27', 'buf31', 'buf33']), InplacedBuffer(inner_name='in_out_ptr2', other_names=['buf22', 'buf27', 'buf31', 'buf33']) ]) ``` - The first time create `in_out_ptr2`, there are 2 unique `InplacedBuffer` - The second time create `in_out_ptr2`, there is 1 `RemovedArg` and 1 unique `InplacedBuffer` They are 2 different `InplacedBuffer`, but with same name `in_out_ptr2`. In this PR, we fix this regression by counting the number of `RemovedArg`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147199 Approved by: https://github.com/jansel	2025-02-15 08:31:06 +00:00
Henry Tsang	20a9938069	try print stacktrace for error (#147061 ) Differential Revision: D69573525 Pull Request resolved: https://github.com/pytorch/pytorch/pull/147061 Approved by: https://github.com/Skylion007	2025-02-14 18:28:03 +00:00
Ding, Yi	b18e3c01aa	[Inductor] Unifiy Low Precision FP Legalization for to_dtype_bitcast & constant (#144646 ) The upcast in `to_dtype_bitcast()` breaks following operations that only works with the target type (I uses `bitwise_and` in the updated UT). ![image](https://github.com/user-attachments/assets/77a6f3b6-b5e7-4ed8-ab65-09d76f077376) This PR fixes this problem. Let's check the CI results to make sure it doesn't bring accuracy problems. - Unified the type promotion of low-precision FP operations in the legalize func, grouping ops into sources (whose results may be promoted) and sinks (whose input may be cast back). (The term of _sink_ and _source_ are from [graph theory](https://en.wikipedia.org/wiki/Directed_graph#Indegree_and_outdegree).) ## Test ```bash pytest -vs test/inductor/test_torchinductor.py::CpuTests::test_float16_to_int16_cpu pytest -vs test/inductor/test_torchinductor.py::CpuTests::test_bfloat16_to_int16_cpu pytest -vs test/inductor/test_torchinductor.py::CpuTests::test_float32_to_int32_cpu ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/144646 Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/jansel	2025-02-11 19:45:04 +00:00
Jason Ansel	d35f6b2339	[inductor] Minor compile time optimizations in DefaultHandler (#146282 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146282 Approved by: https://github.com/shunting314 ghstack dependencies: #146252, #146254, #146255, #146257	2025-02-08 18:00:40 +00:00
Jason Ansel	06604c4ec1	[inductor] Refactor op handlers part 5 (#146257 ) This makes OpHandler just a normal class using inheritance, and removes typing workarounds needed because it wasn't Pull Request resolved: https://github.com/pytorch/pytorch/pull/146257 Approved by: https://github.com/shunting314 ghstack dependencies: #146252, #146254, #146255	2025-02-08 18:00:30 +00:00
Jason Ansel	403db2faee	[inductor] Refactor op handlers part 4 (#146255 ) This replaces the `__getattr__()` pattern used in remaining OpHandlers with a `DefaultHandler` class defined in part 2. Some compile time wins from this as well: ``` 2025-02-02T19:46:32.2033010Z 2025-02-02T19:46:32.2036607Z WIN: benchmark ('add_loop_inductor', 'compile_time_instruction_count') failed, actual result 29633182927 is -1.71% lower than expected 30150000000 ±1.50% please update the expected results. 2025-02-02T19:46:32.2037575Z 2025-02-02T19:46:32.2037907Z please update all results that changed significantly, and not only the failed ones 2025-02-02T19:46:32.2039291Z PASS: benchmark ('add_loop_inductor_dynamic_gpu', 'compile_time_instruction_count') pass, actual result 43986879172 -1.02% is within expected 44440000000 ±2.50% 2025-02-02T19:46:32.2040131Z 2025-02-02T19:46:32.2041180Z WIN: benchmark ('add_loop_inductor_gpu', 'compile_time_instruction_count') failed, actual result 26246225695 is -1.85% lower than expected 26740000000 ±1.50% please update the expected results. 2025-02-02T19:46:32.2042188Z ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/146255 Approved by: https://github.com/shunting314 ghstack dependencies: #146252, #146254	2025-02-08 18:00:17 +00:00
Jason Ansel	71498aeae3	[inductor] Refactor op handlers part 2 (#146252 ) This replaces the `__getattr__()` pattern used in (some) OpHandlers with a `DefaultHandler` class that has an implementation of every op that calls `self._default()`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146252 Approved by: https://github.com/yanboliang	2025-02-08 18:00:00 +00:00
eellison	71e8a2bda4	Expand inductor codegen dtype asserts, fix scan (#146067 ) We were codegening intermediary dtype asserts in some places but not all. expands assertions, fixes newly failing assertion in `TORCHINDUCTOR_COMPILE_THREADS=1 TORCH_LOGS="output_code" PYTORCH_OPINFO_SAMPLE_INPUT_INDEX=1 python test/inductor/test_torchinductor_opinfo.py TestInductorOpInfoCUDA.test_comprehensive_logcumsumexp_cuda_float16` for scan. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146067 Approved by: https://github.com/shunting314, https://github.com/jansel	2025-02-07 06:35:47 +00:00
PyTorch MergeBot	e0cf519ade	Revert "[inductor] Refactor op handlers part 2 (#146252 )" This reverts commit `13f0436abd`. Reverted https://github.com/pytorch/pytorch/pull/146252 on behalf of https://github.com/atalman due to Sorry need to revert, failing internally ([comment](https://github.com/pytorch/pytorch/pull/146252#issuecomment-2638305417))	2025-02-06 00:04:04 +00:00
PyTorch MergeBot	68304dba7a	Revert "[inductor] Refactor op handlers part 4 (#146255 )" This reverts commit `7aced455c5`. Reverted https://github.com/pytorch/pytorch/pull/146255 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/146252 ([comment](https://github.com/pytorch/pytorch/pull/146255#issuecomment-2638258089))	2025-02-05 23:24:20 +00:00
PyTorch MergeBot	49effa0deb	Revert "[inductor] Refactor op handlers part 5 (#146257 )" This reverts commit `d3dd3eeb7f`. Reverted https://github.com/pytorch/pytorch/pull/146257 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/146252 ([comment](https://github.com/pytorch/pytorch/pull/146257#issuecomment-2638251994))	2025-02-05 23:20:38 +00:00
PyTorch MergeBot	93e1e6e07c	Revert "[inductor] Minor compile time optimizations in DefaultHandler (#146282 )" This reverts commit `b8a529cca1`. Reverted https://github.com/pytorch/pytorch/pull/146282 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/146252 ([comment](https://github.com/pytorch/pytorch/pull/146282#issuecomment-2638239575))	2025-02-05 23:13:08 +00:00
Jason Ansel	b8a529cca1	[inductor] Minor compile time optimizations in DefaultHandler (#146282 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146282 Approved by: https://github.com/shunting314 ghstack dependencies: #146225, #146226, #146235, #146252, #146254, #146255, #146257	2025-02-04 23:36:34 +00:00
Jason Ansel	d3dd3eeb7f	[inductor] Refactor op handlers part 5 (#146257 ) This makes OpHandler just a normal class using inheritance, and removes typing workarounds needed because it wasn't Pull Request resolved: https://github.com/pytorch/pytorch/pull/146257 Approved by: https://github.com/shunting314 ghstack dependencies: #146225, #146226, #146235, #146252, #146254, #146255	2025-02-04 23:36:25 +00:00
Jason Ansel	7aced455c5	[inductor] Refactor op handlers part 4 (#146255 ) This replaces the `__getattr__()` pattern used in remaining OpHandlers with a `DefaultHandler` class defined in part 2. Some compile time wins from this as well: ``` 2025-02-02T19:46:32.2033010Z 2025-02-02T19:46:32.2036607Z WIN: benchmark ('add_loop_inductor', 'compile_time_instruction_count') failed, actual result 29633182927 is -1.71% lower than expected 30150000000 ±1.50% please update the expected results. 2025-02-02T19:46:32.2037575Z 2025-02-02T19:46:32.2037907Z please update all results that changed significantly, and not only the failed ones 2025-02-02T19:46:32.2039291Z PASS: benchmark ('add_loop_inductor_dynamic_gpu', 'compile_time_instruction_count') pass, actual result 43986879172 -1.02% is within expected 44440000000 ±2.50% 2025-02-02T19:46:32.2040131Z 2025-02-02T19:46:32.2041180Z WIN: benchmark ('add_loop_inductor_gpu', 'compile_time_instruction_count') failed, actual result 26246225695 is -1.85% lower than expected 26740000000 ±1.50% please update the expected results. 2025-02-02T19:46:32.2042188Z ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/146255 Approved by: https://github.com/shunting314 ghstack dependencies: #146225, #146226, #146235, #146252, #146254	2025-02-04 23:36:17 +00:00
Jason Ansel	13f0436abd	[inductor] Refactor op handlers part 2 (#146252 ) This replaces the `__getattr__()` pattern used in (some) OpHandlers with a `DefaultHandler` class that has an implementation of every op that calls `self._default()`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146252 Approved by: https://github.com/yanboliang ghstack dependencies: #146225, #146226, #146235	2025-02-04 23:36:01 +00:00
Jason Ansel	67be5953fe	[inductor] Refactor op handlers part 1 (#146235 ) This enforces the invariant that every backend implements the same set of ops and removes a layer of indirection for BasicMathOps. Interestingly this is a small compile time win: ``` ... WIN: benchmark ('add_loop_inductor', 'compile_time_instruction_count') failed, actual result 30151159301 is -6.13% lower than expected 32120000000 ±1.50% please update the expected results. please update all results that changed significantly, and not only the failed ones PASS: benchmark ('add_loop_inductor_dynamic_gpu', 'compile_time_instruction_count') pass, actual result 44447549162 -1.69% is within expected 45210000000 ±2.50% WIN: benchmark ('add_loop_inductor_gpu', 'compile_time_instruction_count') failed, actual result 26743557195 is -2.25% lower than expected 27360000000 ±1.50% please update the expected results. please update all results that changed significantly, and not only the failed ones PASS: benchmark ('basic_modules_ListOfLinears_eager', 'compile_time_instruction_count') pass, actual result 945129734 +0.93% is within expected 936400000 ±1.50% WIN: benchmark ('basic_modules_ListOfLinears_inductor', 'compile_time_instruction_count') failed, actual result 18984384503 is -3.19% lower than expected 19610000000 ±1.50% please update the expected results. please update all results that changed significantly, and not only the failed ones WIN: benchmark ('basic_modules_ListOfLinears_inductor_gpu_force_shape_pad', 'compile_time_instruction_count') failed, actual result 17258025389 is -1.94% lower than expected 17600000000 ±1.50% please update the expected results. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/146235 Approved by: https://github.com/shunting314 ghstack dependencies: #146225, #146226	2025-02-04 23:35:53 +00:00
Jason Ansel	ed03f9ca10	[inductor] Refactor CSEProxy into global scope (#146226 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146226 Approved by: https://github.com/shunting314 ghstack dependencies: #146225	2025-02-04 23:35:43 +00:00
Jason Ansel	5cac550ddf	[inductor] Finish typing common.py (#146225 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146225 Approved by: https://github.com/Skylion007	2025-02-04 23:35:33 +00:00
Jason Ansel	e9f6e273e7	[inductor] Add typing to common.CSE (#145993 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145993 Approved by: https://github.com/yanboliang ghstack dependencies: #145916	2025-02-04 16:05:39 +00:00
Jason Ansel	7a5239afd7	[inductor] Add typing to common.KernelArgs (#145916 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145916 Approved by: https://github.com/yanboliang	2025-02-04 16:05:39 +00:00
PyTorch MergeBot	7f796eb8b7	Revert "[inductor] Add typing to common.KernelArgs (#145916 )" This reverts commit `68cf36d5ab`. Reverted https://github.com/pytorch/pytorch/pull/145916 on behalf of https://github.com/atalman due to Failing internally, please see associated diff ([comment](https://github.com/pytorch/pytorch/pull/145916#issuecomment-2632715678))	2025-02-04 03:07:12 +00:00
PyTorch MergeBot	d3c7e4bb9c	Revert "[inductor] Add typing to common.CSE (#145993 )" This reverts commit `8c657ae4be`. Reverted https://github.com/pytorch/pytorch/pull/145993 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/145916 ([comment](https://github.com/pytorch/pytorch/pull/145993#issuecomment-2632712384))	2025-02-04 03:04:01 +00:00
PyTorch MergeBot	ecbc725fad	Revert "[inductor] Finish typing common.py (#146225 )" This reverts commit `3a67c0e48d`. Reverted https://github.com/pytorch/pytorch/pull/146225 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/145916 ([comment](https://github.com/pytorch/pytorch/pull/146225#issuecomment-2632709707))	2025-02-04 03:01:36 +00:00
PyTorch MergeBot	0061eb5b70	Revert "[inductor] Refactor CSEProxy into global scope (#146226 )" This reverts commit `18380ab877`. Reverted https://github.com/pytorch/pytorch/pull/146226 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/145916 ([comment](https://github.com/pytorch/pytorch/pull/146226#issuecomment-2632707618))	2025-02-04 02:58:50 +00:00
PyTorch MergeBot	2f40f789da	Revert "[inductor] Refactor op handlers part 1 (#146235 )" This reverts commit `204be4e0a2`. Reverted https://github.com/pytorch/pytorch/pull/146235 on behalf of https://github.com/atalman due to Breaks lint, sorry: Definition of polygamma in base class MetalOverrides is incompatible with definition in base class OpsHandler. Please rebase fix lint and reland ([comment](https://github.com/pytorch/pytorch/pull/146235#issuecomment-2632444514))	2025-02-04 00:00:08 +00:00
Jason Ansel	204be4e0a2	[inductor] Refactor op handlers part 1 (#146235 ) This enforces the invariant that every backend implements the same set of ops and removes a layer of indirection for BasicMathOps. Interestingly this is a small compile time win: ``` ... WIN: benchmark ('add_loop_inductor', 'compile_time_instruction_count') failed, actual result 30151159301 is -6.13% lower than expected 32120000000 ±1.50% please update the expected results. please update all results that changed significantly, and not only the failed ones PASS: benchmark ('add_loop_inductor_dynamic_gpu', 'compile_time_instruction_count') pass, actual result 44447549162 -1.69% is within expected 45210000000 ±2.50% WIN: benchmark ('add_loop_inductor_gpu', 'compile_time_instruction_count') failed, actual result 26743557195 is -2.25% lower than expected 27360000000 ±1.50% please update the expected results. please update all results that changed significantly, and not only the failed ones PASS: benchmark ('basic_modules_ListOfLinears_eager', 'compile_time_instruction_count') pass, actual result 945129734 +0.93% is within expected 936400000 ±1.50% WIN: benchmark ('basic_modules_ListOfLinears_inductor', 'compile_time_instruction_count') failed, actual result 18984384503 is -3.19% lower than expected 19610000000 ±1.50% please update the expected results. please update all results that changed significantly, and not only the failed ones WIN: benchmark ('basic_modules_ListOfLinears_inductor_gpu_force_shape_pad', 'compile_time_instruction_count') failed, actual result 17258025389 is -1.94% lower than expected 17600000000 ±1.50% please update the expected results. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/146235 Approved by: https://github.com/shunting314 ghstack dependencies: #146225, #146226	2025-02-03 23:15:13 +00:00
Jason Ansel	18380ab877	[inductor] Refactor CSEProxy into global scope (#146226 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146226 Approved by: https://github.com/shunting314 ghstack dependencies: #146225	2025-02-03 23:15:13 +00:00
Jason Ansel	3a67c0e48d	[inductor] Finish typing common.py (#146225 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146225 Approved by: https://github.com/Skylion007	2025-02-01 22:53:35 +00:00
Jason Ansel	8c657ae4be	[inductor] Add typing to common.CSE (#145993 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145993 Approved by: https://github.com/yanboliang ghstack dependencies: #145913, #145914, #145915, #145916	2025-02-01 16:34:18 +00:00
Jason Ansel	68cf36d5ab	[inductor] Add typing to common.KernelArgs (#145916 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145916 Approved by: https://github.com/yanboliang ghstack dependencies: #145913, #145914, #145915	2025-02-01 16:34:18 +00:00
Jason Ansel	8e56d713c9	[inductor] Add typing to common.OpDecompositions (#145915 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145915 Approved by: https://github.com/yanboliang ghstack dependencies: #145913, #145914	2025-02-01 16:34:11 +00:00
Jason Ansel	79f9f62e3a	[inductor] Combine regexp checks in OpOverrides.paren (#145914 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145914 Approved by: https://github.com/Skylion007 ghstack dependencies: #145913	2025-02-01 16:34:03 +00:00
Jason Ansel	4c004caa76	[inductor] Add types to DeviceOpOverrides (#145913 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145913 Approved by: https://github.com/Skylion007	2025-02-01 16:33:49 +00:00
bglass@quansight.com	40ccb7a86d	cpp_wrapper: Move #includes to per-device header files (#145932 ) Summary: This prepares us for the next PR in the stack, where we introduce pre-compiled per-device header files to save compilation time. Reland https://github.com/pytorch/pytorch/pull/143909 after merge conflicts. Co-authored-by: Benjamin Glass <[bglass@quansight.com](mailto:bglass@quansight.com)> Differential Revision: D68656960 Pulled By: benjaminglass1 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145932 Approved by: https://github.com/yushangdi, https://github.com/benjaminglass1 Co-authored-by: bglass@quansight.com <bglass@quansight.com>	2025-01-29 21:08:45 +00:00
David Berard	2e8c080ab1	[inductor][4/N] triton support post-#5512, fix constexpr signatures (#145583 ) Prior to this PR, constexprs were appearing in signatures as `{.. "XBLOCK : tl.constexpr": "constexpr"}` when they really should appear as `{.. "XBLOCK": "constexpr"}`. This PR represents the argument names as ArgName objects, which can optionally be marked as constexpr. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145583 Approved by: https://github.com/jansel	2025-01-29 05:46:05 +00:00
Jason Ansel	78a94c9114	[inductor] Remove type ignores from scheduler.py (#145712 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145712 Approved by: https://github.com/yanboliang, https://github.com/Skylion007 ghstack dependencies: #145692	2025-01-28 01:44:32 +00:00
Jason Ansel	2df2f9d895	[inductor] Change type of get_backend_features to OrderedSet (#145692 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145692 Approved by: https://github.com/yanboliang	2025-01-28 01:44:32 +00:00
Jason Ansel	e90cf4abcf	[inductor] Add some typing to common.py (#145691 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145691 Approved by: https://github.com/malfet ghstack dependencies: #145690	2025-01-27 06:27:13 +00:00

1 2 3 4 5 ...

386 Commits