Commit Graph

386 Commits

Author SHA1 Message Date
Jason Ansel
b040dc3a53 Reland: [inductor] Simplify grid handling (#148305)
Summary:
Relands D69965761 / https://github.com/pytorch/pytorch/pull/147583

Before this PR, calling a triton kernel would look like:
```py
kernel.run(a, b, xnumel, grid=grid(xnumel), stream=stream0)
```
where the `grid=` was passed as a callable (function closure) arg.  This PR removes the grid arg:
```py
kernel.run(a, b, xnumel, stream=stream0)
```
Instead, the grid computation is now included in the kernel launcher, with something like:
```py
def launcher(in_ptr0, out_ptr0, xnumel, stream):
    grid_0 = ((xnumel + 1023) >> 10)
    grid_1 = 1
    grid_2 = 1
    runner(grid_0, grid_1, grid_2, stream, function, metadata, None, launch_enter_hook, launch_exit_hook, in_ptr0, out_ptr0, xnumel)
```

This should be faster, since we remove multiple function/dict calls and are able to specialize the grid computation for each `triton.Config`.
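
As an illustration of the specialized grid expression, the shift in the launcher above is a ceiling division by the block size. This is a minimal sketch, assuming a block size of 1024 (implied by `+ 1023` and `>> 10`); `ceildiv` and `grid_for` are hypothetical helper names, not Inductor functions.

```python
def ceildiv(numel: int, block: int) -> int:
    # Ceiling division using only integer arithmetic.
    return -(-numel // block)


def grid_for(xnumel: int) -> tuple[int, int, int]:
    # Same expression as the generated launcher: add (block - 1), then
    # shift right by log2(block). Assumes block == 1024 (hypothetical config).
    grid_0 = (xnumel + 1023) >> 10
    return (grid_0, 1, 1)


# The shift form agrees with ceiling division for these sizes.
for n in (1, 1023, 1024, 1025, 4096):
    assert grid_for(n)[0] == ceildiv(n, 1024)
```

Because the block size is fixed per `triton.Config`, an expression like this can be emitted as a constant shift instead of being computed by a generic grid closure at call time.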

It also allows us to unify the handling of grids between the Python and C++ wrapper code.  Before this, C++ wrapper code didn't actually support dynamic grid sizes and instead burned in a static grid.

This unification allows this PR to be a net deletion of code.

Differential Revision: D70471332

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148305
Approved by: https://github.com/shunting314, https://github.com/eellison
2025-03-12 15:52:16 +00:00
PyTorch MergeBot
5ada4e6a53 Revert "Reland: [inductor] Simplify grid handling (#148305)"
This reverts commit 8d08b49015.

Reverted https://github.com/pytorch/pytorch/pull/148305 on behalf of https://github.com/jithunnair-amd due to Broke ROCm CI ([comment](https://github.com/pytorch/pytorch/pull/148305#issuecomment-2718177044))
2025-03-12 14:58:43 +00:00
Jason Ansel
8d08b49015 Reland: [inductor] Simplify grid handling (#148305)
Summary:
Relands D69965761 / https://github.com/pytorch/pytorch/pull/147583

Before this PR, calling a triton kernel would look like:
```py
kernel.run(a, b, xnumel, grid=grid(xnumel), stream=stream0)
```
where the `grid=` was passed as a callable (function closure) arg.  This PR removes the grid arg:
```py
kernel.run(a, b, xnumel, stream=stream0)
```
Instead, the grid computation is now included in the kernel launcher, with something like:
```py
def launcher(in_ptr0, out_ptr0, xnumel, stream):
    grid_0 = ((xnumel + 1023) >> 10)
    grid_1 = 1
    grid_2 = 1
    runner(grid_0, grid_1, grid_2, stream, function, metadata, None, launch_enter_hook, launch_exit_hook, in_ptr0, out_ptr0, xnumel)
```

This should be faster, since we remove multiple function/dict calls and are able to specialize the grid computation for each `triton.Config`.

It also allows us to unify the handling of grids between the Python and C++ wrapper code.  Before this, C++ wrapper code didn't actually support dynamic grid sizes and instead burned in a static grid.

This unification allows this PR to be a net deletion of code.

Differential Revision: D70471332

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148305
Approved by: https://github.com/shunting314, https://github.com/eellison
2025-03-11 18:51:06 +00:00
PyTorch MergeBot
608377d341 Revert "[import][inductor] Simplify grid handling (#147583)"
This reverts commit b59776d857.

Reverted https://github.com/pytorch/pytorch/pull/147583 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/147583#issuecomment-2693016036))
2025-03-03 00:49:32 +00:00
Jason Ansel
b59776d857 [import][inductor] Simplify grid handling (#147583)
Before this PR, calling a triton kernel would look like:
```py
kernel.run(a, b, xnumel, grid=grid(xnumel), stream=stream0)
```
where the `grid=` was passed as a callable (function closure) arg.  This PR removes the grid arg:
```py
kernel.run(a, b, xnumel, stream=stream0)
```
Instead, the grid computation is now included in the kernel launcher, with something like:
```py
def launcher(in_ptr0, out_ptr0, xnumel, stream):
    grid_0 = ((xnumel + 1023) >> 10)
    grid_1 = 1
    grid_2 = 1
    runner(grid_0, grid_1, grid_2, stream, function, metadata, None, launch_enter_hook, launch_exit_hook, in_ptr0, out_ptr0, xnumel)
```

This should be faster, since we remove multiple function/dict calls and are able to specialize the grid computation for each `triton.Config`.

It also allows us to unify the handling of grids between the Python and C++ wrapper code.  Before this, C++ wrapper code didn't actually support dynamic grid sizes and instead burned in a static grid.

This unification allows this PR to be a net deletion of code.

Note the attached diff contains some minor fbcode-only changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147583
Approved by: https://github.com/eellison, https://github.com/shunting314
2025-03-02 07:31:07 +00:00
Xuehai Pan
1cb4e2df65 [BE][PYFMT] migrate PYFMT for torch._inductor to ruff format (#144550)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144550
Approved by: https://github.com/jansel
2025-02-28 13:33:19 +00:00
David Berard
26f19539ad [triton 3.3] cpp_wrapper: add a global_scratch arg (#148051)
Following triton # 4916, the generated cubin expects a global_scratch argument to support on-device TMA. We believe this is the source of many of the "invalid argument" failures on AOTI/cpp_wrapper tests. AFAIK, we don't use on-device TMA in Inductor as of now, so it should be safe to use a nullptr for the scratch space.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148051
Approved by: https://github.com/YUNQIUGUO
2025-02-27 10:13:57 +00:00
Mwiza Kunda
8cb8722979 [inductor][triton] Ignore block ptr advances for removed buffers (#147193)
Block ptr advancements should also be deferred, conditional on the associated buffer not being removed. For example, if `FusedSchedulerNode(op0-op1)` has a store in `SchedulerNode` `op0` that is read in `op1`, the store and associated block ptr that would be created for `op0` in isolation are no longer needed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147193
Approved by: https://github.com/jansel

Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
2025-02-27 03:37:33 +00:00
PyTorch MergeBot
0d31c621a3 Revert "[inductor][triton] Ignore block ptr advances for removed buffers (#147193)"
This reverts commit 17766b7aad.

Reverted https://github.com/pytorch/pytorch/pull/147193 on behalf of https://github.com/wdvr due to failing tests on trunk - see below ([comment](https://github.com/pytorch/pytorch/pull/147193#issuecomment-2683286358))
2025-02-25 21:04:04 +00:00
Mwiza
17766b7aad [inductor][triton] Ignore block ptr advances for removed buffers (#147193)
Block ptr advancements should also be deferred, conditional on the associated buffer not being removed. For example, if `FusedSchedulerNode(op0-op1)` has a store in `SchedulerNode` `op0` that is read in `op1`, the store and associated block ptr that would be created for `op0` in isolation are no longer needed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147193
Approved by: https://github.com/jansel

Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
2025-02-25 19:14:55 +00:00
Aaron Orenstein
db4ce78d46 PEP585: More UP006 fixes (#146392)
This should be the final PR before we can enable RUFF UP006.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146392
Approved by: https://github.com/justinchuby, https://github.com/albanD, https://github.com/Skylion007
2025-02-20 06:18:13 +00:00
leslie-fang-intel
9e0b3e9b6c [Inductor] Fix Inplace Buffer inner name conflict (#147199)
**Summary**
Fixes https://github.com/pytorch/pytorch/issues/146975. When creating an `InplacedBuffer` inner name, we only count the number of unique `InplacedBuffer` or `RemovedArg` values, so the name may conflict, as in the example reported in this issue:

```
---- make inplace create, input_name is: buf22; output_name is: buf27; buf.inner_name is: in_out_ptr2
dict_values([
InplacedBuffer(inner_name='in_out_ptr0', other_names=['buf6', 'buf11']),
InplacedBuffer(inner_name='in_out_ptr0', other_names=['buf6', 'buf11']),
InplacedBuffer(inner_name='in_out_ptr1', other_names=['buf24', 'buf26']),
InplacedBuffer(inner_name='in_out_ptr1', other_names=['buf24', 'buf26'])])

---- make inplace create, input_name is: buf0; output_name is: buf3; buf.inner_name is: in_out_ptr2
dict_values([
<torch._inductor.codegen.common.RemovedArg object at 0x7fbf75516350>,
<torch._inductor.codegen.common.RemovedArg object at 0x7fbf75516350>,
<torch._inductor.codegen.common.RemovedArg object at 0x7fbf75516350>,
<torch._inductor.codegen.common.RemovedArg object at 0x7fbf75516350>,
InplacedBuffer(inner_name='in_out_ptr2', other_names=['buf22', 'buf27', 'buf31', 'buf33']),
InplacedBuffer(inner_name='in_out_ptr2', other_names=['buf22', 'buf27', 'buf31', 'buf33'])
<torch._inductor.codegen.common.RemovedArg object at 0x7fbf75516350>,
InplacedBuffer(inner_name='in_out_ptr2', other_names=['buf22', 'buf27', 'buf31', 'buf33']),
InplacedBuffer(inner_name='in_out_ptr2', other_names=['buf22', 'buf27', 'buf31', 'buf33'])
])
```

- The first time `in_out_ptr2` is created, there are 2 unique `InplacedBuffer`s.

- The second time `in_out_ptr2` is created, there is 1 `RemovedArg` and 1 unique `InplacedBuffer`.

These are 2 different `InplacedBuffer`s that share the name `in_out_ptr2`. This PR fixes the regression by counting the number of `RemovedArg` entries.
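
The collision can be reproduced with a small stand-alone model. This is only a sketch under assumptions: the classes below are simplified stand-ins for the real ones in `torch._inductor.codegen.common`, and `buggy_inner_name` / `fixed_inner_name` are hypothetical helpers that illustrate the two counting strategies rather than the actual Inductor code.

```python
from dataclasses import dataclass


class RemovedArg:
    """Stand-in for the sentinel that marks a removed argument."""


REMOVED = RemovedArg()  # one shared sentinel object, as in the log above


@dataclass
class InplacedBuffer:
    inner_name: str
    other_names: list


def buggy_inner_name(values) -> str:
    # Counts distinct objects, so any number of REMOVED entries counts once.
    return f"in_out_ptr{len({id(v) for v in values})}"


def fixed_inner_name(values) -> str:
    # Counts distinct live buffers plus every removed entry (assumed fix).
    alive = {id(v) for v in values if not isinstance(v, RemovedArg)}
    removed = [v for v in values if isinstance(v, RemovedArg)]
    return f"in_out_ptr{len(alive) + len(removed)}"


# First creation: two distinct live buffers, each stored under two keys.
b0 = InplacedBuffer("in_out_ptr0", ["buf6", "buf11"])
b1 = InplacedBuffer("in_out_ptr1", ["buf24", "buf26"])
b2 = InplacedBuffer(buggy_inner_name([b0, b0, b1, b1]), ["buf22", "buf27"])
assert b2.inner_name == "in_out_ptr2"

# Later: the older buffers were removed and b2 is still alive.
later = [REMOVED, REMOVED, REMOVED, REMOVED, b2, b2]
assert buggy_inner_name(later) == b2.inner_name  # name collision
assert fixed_inner_name(later) != b2.inner_name  # no collision
```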

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147199
Approved by: https://github.com/jansel
2025-02-15 08:31:06 +00:00
Henry Tsang
20a9938069 try print stacktrace for error (#147061)
Differential Revision: D69573525

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147061
Approved by: https://github.com/Skylion007
2025-02-14 18:28:03 +00:00
Ding, Yi
b18e3c01aa [Inductor] Unifiy Low Precision FP Legalization for to_dtype_bitcast & constant (#144646)
The upcast in `to_dtype_bitcast()` breaks subsequent operations that only work with the target type (I use `bitwise_and` in the updated UT).
![image](https://github.com/user-attachments/assets/77a6f3b6-b5e7-4ed8-ab65-09d76f077376)

This PR fixes this problem. Let's check the CI results to make sure it doesn't bring accuracy problems.

- Unified the type promotion of low-precision FP operations in the legalize func, grouping ops into sources (whose results may be promoted) and sinks (whose inputs may be cast back). (The terms _sink_ and _source_ come from [graph theory](https://en.wikipedia.org/wiki/Directed_graph#Indegree_and_outdegree).)

## Test
```bash
pytest -vs test/inductor/test_torchinductor.py::CpuTests::test_float16_to_int16_cpu
pytest -vs test/inductor/test_torchinductor.py::CpuTests::test_bfloat16_to_int16_cpu
pytest -vs test/inductor/test_torchinductor.py::CpuTests::test_float32_to_int32_cpu
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144646
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/jansel
2025-02-11 19:45:04 +00:00
Jason Ansel
d35f6b2339 [inductor] Minor compile time optimizations in DefaultHandler (#146282)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146282
Approved by: https://github.com/shunting314
ghstack dependencies: #146252, #146254, #146255, #146257
2025-02-08 18:00:40 +00:00
Jason Ansel
06604c4ec1 [inductor] Refactor op handlers part 5 (#146257)
This makes OpHandler just a normal class using inheritance, and removes typing workarounds needed because it wasn't one.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146257
Approved by: https://github.com/shunting314
ghstack dependencies: #146252, #146254, #146255
2025-02-08 18:00:30 +00:00
Jason Ansel
403db2faee [inductor] Refactor op handlers part 4 (#146255)
This replaces the `__getattr__()` pattern used in remaining OpHandlers with a `DefaultHandler` class defined in part 2.

Some compile time wins from this as well:
```
2025-02-02T19:46:32.2033010Z
2025-02-02T19:46:32.2036607Z WIN: benchmark ('add_loop_inductor', 'compile_time_instruction_count') failed, actual result 29633182927 is -1.71% lower than expected 30150000000 ±1.50% please update the expected results.
2025-02-02T19:46:32.2037575Z
2025-02-02T19:46:32.2037907Z please update all results that changed significantly, and not only the failed ones
2025-02-02T19:46:32.2039291Z PASS: benchmark ('add_loop_inductor_dynamic_gpu', 'compile_time_instruction_count') pass, actual result 43986879172 -1.02% is within expected 44440000000 ±2.50%
2025-02-02T19:46:32.2040131Z
2025-02-02T19:46:32.2041180Z WIN: benchmark ('add_loop_inductor_gpu', 'compile_time_instruction_count') failed, actual result 26246225695 is -1.85% lower than expected 26740000000 ±1.50% please update the expected results.
2025-02-02T19:46:32.2042188Z
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146255
Approved by: https://github.com/shunting314
ghstack dependencies: #146252, #146254
2025-02-08 18:00:17 +00:00
Jason Ansel
71498aeae3 [inductor] Refactor op handlers part 2 (#146252)
This replaces the `__getattr__()` pattern used in (some) OpHandlers with a `DefaultHandler` class that has an implementation of every op that calls `self._default()`.
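
The difference between the two patterns can be sketched as follows. This is an illustrative toy, not the Inductor classes: `GetattrHandler`, `FormatHandler`, and the two-op surface (`add`, `mul`) are assumptions made for the example; only the `DefaultHandler` name and the `self._default()` hook come from the description above.

```python
class GetattrHandler:
    """Old pattern: any attribute name is resolved dynamically at call time."""

    def __getattr__(self, name):
        def inner(*args, **kwargs):
            return self._default(name, args, kwargs)

        return inner

    def _default(self, name, args, kwargs):
        return f"{name}({', '.join(map(str, args))})"


class DefaultHandler:
    """New pattern: every op is a real method that forwards to _default()."""

    def _default(self, name, args, kwargs):
        raise NotImplementedError

    def add(self, a, b):
        return self._default("add", (a, b), {})

    def mul(self, a, b):
        return self._default("mul", (a, b), {})


class FormatHandler(DefaultHandler):
    # Subclasses override only the fallback; explicit ops are inherited.
    def _default(self, name, args, kwargs):
        return f"{name}({', '.join(map(str, args))})"


assert FormatHandler().add("x", "y") == GetattrHandler().add("x", "y")

# A misspelled op fails loudly with explicit methods...
try:
    FormatHandler().ad("x", "y")
    typo_caught = False
except AttributeError:
    typo_caught = True

# ...but is silently accepted by the __getattr__ pattern.
silently_accepted = GetattrHandler().ad("x", "y")
```

With explicit methods, type checkers can see the full op surface and normal attribute lookup finds each op directly, without falling back to `__getattr__` and building a closure on every call.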

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146252
Approved by: https://github.com/yanboliang
2025-02-08 18:00:00 +00:00
eellison
71e8a2bda4 Expand inductor codegen dtype asserts, fix scan (#146067)
We were codegening intermediary dtype asserts in some places but not all. This PR expands the assertions and fixes a newly failing assertion in

`TORCHINDUCTOR_COMPILE_THREADS=1 TORCH_LOGS="output_code" PYTORCH_OPINFO_SAMPLE_INPUT_INDEX=1 python test/inductor/test_torchinductor_opinfo.py TestInductorOpInfoCUDA.test_comprehensive_logcumsumexp_cuda_float16` for scan.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146067
Approved by: https://github.com/shunting314, https://github.com/jansel
2025-02-07 06:35:47 +00:00
PyTorch MergeBot
e0cf519ade Revert "[inductor] Refactor op handlers part 2 (#146252)"
This reverts commit 13f0436abd.

Reverted https://github.com/pytorch/pytorch/pull/146252 on behalf of https://github.com/atalman due to Sorry need to revert, failing internally ([comment](https://github.com/pytorch/pytorch/pull/146252#issuecomment-2638305417))
2025-02-06 00:04:04 +00:00
PyTorch MergeBot
68304dba7a Revert "[inductor] Refactor op handlers part 4 (#146255)"
This reverts commit 7aced455c5.

Reverted https://github.com/pytorch/pytorch/pull/146255 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/146252 ([comment](https://github.com/pytorch/pytorch/pull/146255#issuecomment-2638258089))
2025-02-05 23:24:20 +00:00
PyTorch MergeBot
49effa0deb Revert "[inductor] Refactor op handlers part 5 (#146257)"
This reverts commit d3dd3eeb7f.

Reverted https://github.com/pytorch/pytorch/pull/146257 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/146252 ([comment](https://github.com/pytorch/pytorch/pull/146257#issuecomment-2638251994))
2025-02-05 23:20:38 +00:00
PyTorch MergeBot
93e1e6e07c Revert "[inductor] Minor compile time optimizations in DefaultHandler (#146282)"
This reverts commit b8a529cca1.

Reverted https://github.com/pytorch/pytorch/pull/146282 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/146252 ([comment](https://github.com/pytorch/pytorch/pull/146282#issuecomment-2638239575))
2025-02-05 23:13:08 +00:00
Jason Ansel
b8a529cca1 [inductor] Minor compile time optimizations in DefaultHandler (#146282)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146282
Approved by: https://github.com/shunting314
ghstack dependencies: #146225, #146226, #146235, #146252, #146254, #146255, #146257
2025-02-04 23:36:34 +00:00
Jason Ansel
d3dd3eeb7f [inductor] Refactor op handlers part 5 (#146257)
This makes OpHandler just a normal class using inheritance, and removes typing workarounds needed because it wasn't one.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146257
Approved by: https://github.com/shunting314
ghstack dependencies: #146225, #146226, #146235, #146252, #146254, #146255
2025-02-04 23:36:25 +00:00
Jason Ansel
7aced455c5 [inductor] Refactor op handlers part 4 (#146255)
This replaces the `__getattr__()` pattern used in remaining OpHandlers with a `DefaultHandler` class defined in part 2.

Some compile time wins from this as well:
```
2025-02-02T19:46:32.2033010Z
2025-02-02T19:46:32.2036607Z WIN: benchmark ('add_loop_inductor', 'compile_time_instruction_count') failed, actual result 29633182927 is -1.71% lower than expected 30150000000 ±1.50% please update the expected results.
2025-02-02T19:46:32.2037575Z
2025-02-02T19:46:32.2037907Z please update all results that changed significantly, and not only the failed ones
2025-02-02T19:46:32.2039291Z PASS: benchmark ('add_loop_inductor_dynamic_gpu', 'compile_time_instruction_count') pass, actual result 43986879172 -1.02% is within expected 44440000000 ±2.50%
2025-02-02T19:46:32.2040131Z
2025-02-02T19:46:32.2041180Z WIN: benchmark ('add_loop_inductor_gpu', 'compile_time_instruction_count') failed, actual result 26246225695 is -1.85% lower than expected 26740000000 ±1.50% please update the expected results.
2025-02-02T19:46:32.2042188Z
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146255
Approved by: https://github.com/shunting314
ghstack dependencies: #146225, #146226, #146235, #146252, #146254
2025-02-04 23:36:17 +00:00
Jason Ansel
13f0436abd [inductor] Refactor op handlers part 2 (#146252)
This replaces the `__getattr__()` pattern used in (some) OpHandlers with a `DefaultHandler` class that has an implementation of every op that calls `self._default()`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146252
Approved by: https://github.com/yanboliang
ghstack dependencies: #146225, #146226, #146235
2025-02-04 23:36:01 +00:00
Jason Ansel
67be5953fe [inductor] Refactor op handlers part 1 (#146235)
This enforces the invariant that every backend implements the same set of ops and removes a layer of indirection for BasicMathOps.

Interestingly this is a small compile time win:
```
...
WIN: benchmark ('add_loop_inductor', 'compile_time_instruction_count') failed, actual result 30151159301 is -6.13% lower than expected 32120000000 ±1.50% please update the expected results.

please update all results that changed significantly, and not only the failed ones
PASS: benchmark ('add_loop_inductor_dynamic_gpu', 'compile_time_instruction_count') pass, actual result 44447549162 -1.69% is within expected 45210000000 ±2.50%

WIN: benchmark ('add_loop_inductor_gpu', 'compile_time_instruction_count') failed, actual result 26743557195 is -2.25% lower than expected 27360000000 ±1.50% please update the expected results.

please update all results that changed significantly, and not only the failed ones
PASS: benchmark ('basic_modules_ListOfLinears_eager', 'compile_time_instruction_count') pass, actual result 945129734 +0.93% is within expected 936400000 ±1.50%

WIN: benchmark ('basic_modules_ListOfLinears_inductor', 'compile_time_instruction_count') failed, actual result 18984384503 is -3.19% lower than expected 19610000000 ±1.50% please update the expected results.

please update all results that changed significantly, and not only the failed ones
WIN: benchmark ('basic_modules_ListOfLinears_inductor_gpu_force_shape_pad', 'compile_time_instruction_count') failed, actual result 17258025389 is -1.94% lower than expected 17600000000 ±1.50% please update the expected results.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146235
Approved by: https://github.com/shunting314
ghstack dependencies: #146225, #146226
2025-02-04 23:35:53 +00:00
Jason Ansel
ed03f9ca10 [inductor] Refactor CSEProxy into global scope (#146226)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146226
Approved by: https://github.com/shunting314
ghstack dependencies: #146225
2025-02-04 23:35:43 +00:00
Jason Ansel
5cac550ddf [inductor] Finish typing common.py (#146225)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146225
Approved by: https://github.com/Skylion007
2025-02-04 23:35:33 +00:00
Jason Ansel
e9f6e273e7 [inductor] Add typing to common.CSE (#145993)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145993
Approved by: https://github.com/yanboliang
ghstack dependencies: #145916
2025-02-04 16:05:39 +00:00
Jason Ansel
7a5239afd7 [inductor] Add typing to common.KernelArgs (#145916)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145916
Approved by: https://github.com/yanboliang
2025-02-04 16:05:39 +00:00
PyTorch MergeBot
7f796eb8b7 Revert "[inductor] Add typing to common.KernelArgs (#145916)"
This reverts commit 68cf36d5ab.

Reverted https://github.com/pytorch/pytorch/pull/145916 on behalf of https://github.com/atalman due to Failing internally, please see associated diff ([comment](https://github.com/pytorch/pytorch/pull/145916#issuecomment-2632715678))
2025-02-04 03:07:12 +00:00
PyTorch MergeBot
d3c7e4bb9c Revert "[inductor] Add typing to common.CSE (#145993)"
This reverts commit 8c657ae4be.

Reverted https://github.com/pytorch/pytorch/pull/145993 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/145916 ([comment](https://github.com/pytorch/pytorch/pull/145993#issuecomment-2632712384))
2025-02-04 03:04:01 +00:00
PyTorch MergeBot
ecbc725fad Revert "[inductor] Finish typing common.py (#146225)"
This reverts commit 3a67c0e48d.

Reverted https://github.com/pytorch/pytorch/pull/146225 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/145916 ([comment](https://github.com/pytorch/pytorch/pull/146225#issuecomment-2632709707))
2025-02-04 03:01:36 +00:00
PyTorch MergeBot
0061eb5b70 Revert "[inductor] Refactor CSEProxy into global scope (#146226)"
This reverts commit 18380ab877.

Reverted https://github.com/pytorch/pytorch/pull/146226 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/145916 ([comment](https://github.com/pytorch/pytorch/pull/146226#issuecomment-2632707618))
2025-02-04 02:58:50 +00:00
PyTorch MergeBot
2f40f789da Revert "[inductor] Refactor op handlers part 1 (#146235)"
This reverts commit 204be4e0a2.

Reverted https://github.com/pytorch/pytorch/pull/146235 on behalf of https://github.com/atalman due to Breaks lint, sorry: Definition of polygamma in base class MetalOverrides is incompatible with definition in base class OpsHandler. Please rebase fix lint and reland ([comment](https://github.com/pytorch/pytorch/pull/146235#issuecomment-2632444514))
2025-02-04 00:00:08 +00:00
Jason Ansel
204be4e0a2 [inductor] Refactor op handlers part 1 (#146235)
This enforces the invariant that every backend implements the same set of ops and removes a layer of indirection for BasicMathOps.

Interestingly this is a small compile time win:
```
...
WIN: benchmark ('add_loop_inductor', 'compile_time_instruction_count') failed, actual result 30151159301 is -6.13% lower than expected 32120000000 ±1.50% please update the expected results.

please update all results that changed significantly, and not only the failed ones
PASS: benchmark ('add_loop_inductor_dynamic_gpu', 'compile_time_instruction_count') pass, actual result 44447549162 -1.69% is within expected 45210000000 ±2.50%

WIN: benchmark ('add_loop_inductor_gpu', 'compile_time_instruction_count') failed, actual result 26743557195 is -2.25% lower than expected 27360000000 ±1.50% please update the expected results.

please update all results that changed significantly, and not only the failed ones
PASS: benchmark ('basic_modules_ListOfLinears_eager', 'compile_time_instruction_count') pass, actual result 945129734 +0.93% is within expected 936400000 ±1.50%

WIN: benchmark ('basic_modules_ListOfLinears_inductor', 'compile_time_instruction_count') failed, actual result 18984384503 is -3.19% lower than expected 19610000000 ±1.50% please update the expected results.

please update all results that changed significantly, and not only the failed ones
WIN: benchmark ('basic_modules_ListOfLinears_inductor_gpu_force_shape_pad', 'compile_time_instruction_count') failed, actual result 17258025389 is -1.94% lower than expected 17600000000 ±1.50% please update the expected results.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146235
Approved by: https://github.com/shunting314
ghstack dependencies: #146225, #146226
2025-02-03 23:15:13 +00:00
Jason Ansel
18380ab877 [inductor] Refactor CSEProxy into global scope (#146226)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146226
Approved by: https://github.com/shunting314
ghstack dependencies: #146225
2025-02-03 23:15:13 +00:00
Jason Ansel
3a67c0e48d [inductor] Finish typing common.py (#146225)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146225
Approved by: https://github.com/Skylion007
2025-02-01 22:53:35 +00:00
Jason Ansel
8c657ae4be [inductor] Add typing to common.CSE (#145993)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145993
Approved by: https://github.com/yanboliang
ghstack dependencies: #145913, #145914, #145915, #145916
2025-02-01 16:34:18 +00:00
Jason Ansel
68cf36d5ab [inductor] Add typing to common.KernelArgs (#145916)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145916
Approved by: https://github.com/yanboliang
ghstack dependencies: #145913, #145914, #145915
2025-02-01 16:34:18 +00:00
Jason Ansel
8e56d713c9 [inductor] Add typing to common.OpDecompositions (#145915)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145915
Approved by: https://github.com/yanboliang
ghstack dependencies: #145913, #145914
2025-02-01 16:34:11 +00:00
Jason Ansel
79f9f62e3a [inductor] Combine regexp checks in OpOverrides.paren (#145914)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145914
Approved by: https://github.com/Skylion007
ghstack dependencies: #145913
2025-02-01 16:34:03 +00:00
Jason Ansel
4c004caa76 [inductor] Add types to DeviceOpOverrides (#145913)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145913
Approved by: https://github.com/Skylion007
2025-02-01 16:33:49 +00:00
bglass@quansight.com
40ccb7a86d cpp_wrapper: Move #includes to per-device header files (#145932)
Summary:
This prepares us for the next PR in the stack, where we introduce pre-compiled per-device header files to save compilation time.

Reland https://github.com/pytorch/pytorch/pull/143909 after merge conflicts.

Co-authored-by: Benjamin Glass <bglass@quansight.com>

Differential Revision: D68656960

Pulled By: benjaminglass1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145932
Approved by: https://github.com/yushangdi, https://github.com/benjaminglass1

Co-authored-by: bglass@quansight.com <bglass@quansight.com>
2025-01-29 21:08:45 +00:00
David Berard
2e8c080ab1 [inductor][4/N] triton support post-#5512, fix constexpr signatures (#145583)
Prior to this PR, constexprs were appearing in signatures as `{.. "XBLOCK : tl.constexpr": "constexpr"}` when they really should appear as `{.. "XBLOCK": "constexpr"}`.

This PR represents the argument names as ArgName objects, which can optionally be marked as constexpr.
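
One way to picture the idea is a small value type that keeps the bare name separate from the annotation. This is a sketch under assumptions: the field `is_constexpr` and the method `full_name()` are illustrative choices and may not match the actual `ArgName` definition in Inductor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArgName:
    name: str
    is_constexpr: bool = False  # assumed flag name, for illustration

    def full_name(self) -> str:
        # Text for the kernel's `def` line, including the annotation.
        suffix = " : tl.constexpr" if self.is_constexpr else ""
        return f"{self.name}{suffix}"


args = [
    ArgName("in_ptr0"),
    ArgName("xnumel"),
    ArgName("XBLOCK", is_constexpr=True),
]

# The annotated form is used only where source text is generated...
def_line = ", ".join(a.full_name() for a in args)

# ...while signature keys use the bare name.
signature = {a.name: "constexpr" for a in args if a.is_constexpr}

assert def_line == "in_ptr0, xnumel, XBLOCK : tl.constexpr"
assert signature == {"XBLOCK": "constexpr"}
```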

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145583
Approved by: https://github.com/jansel
2025-01-29 05:46:05 +00:00
Jason Ansel
78a94c9114 [inductor] Remove type ignores from scheduler.py (#145712)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145712
Approved by: https://github.com/yanboliang, https://github.com/Skylion007
ghstack dependencies: #145692
2025-01-28 01:44:32 +00:00
Jason Ansel
2df2f9d895 [inductor] Change type of get_backend_features to OrderedSet (#145692)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145692
Approved by: https://github.com/yanboliang
2025-01-28 01:44:32 +00:00
Jason Ansel
e90cf4abcf [inductor] Add some typing to common.py (#145691)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145691
Approved by: https://github.com/malfet
ghstack dependencies: #145690
2025-01-27 06:27:13 +00:00