pytorch/torch/_inductor/codegen
Jason Ansel b040dc3a53 Reland: [inductor] Simplify grid handling (#148305)
Summary:
Relands D69965761 / https://github.com/pytorch/pytorch/pull/147583

Before this PR, calling a Triton kernel would look like:
```py
kernel.run(a, b, xnumel, grid=grid(xnumel), stream=stream0)
```
where `grid=` was passed as a callable (a function closure). This PR removes the `grid` argument:
```py
kernel.run(a, b, xnumel, stream=stream0)
```
Instead, the grid computation is now included in the kernel launcher, which looks something like:
```py
def launcher(in_ptr0, out_ptr0, xnumel, stream):
    grid_0 = ((xnumel + 1023) >> 10)
    grid_1 = 1
    grid_2 = 1
    runner(grid_0, grid_1, grid_2, stream, function, metadata, None, launch_enter_hook, launch_exit_hook, in_ptr0, out_ptr0, xnumel)
```
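
As a small check of the arithmetic in `grid_0`: the expression `(xnumel + 1023) >> 10` is a ceiling division by 1024 for non-negative `xnumel`. The sketch below assumes a block size of 1024 (implied by the constants in the launcher, not stated in this summary); `ceildiv` and `grid_0_shift` are illustrative names, not Inductor functions.

```python
def ceildiv(n: int, d: int) -> int:
    """Ceiling division for non-negative n and positive d."""
    return -(-n // d)


def grid_0_shift(xnumel: int) -> int:
    # Same expression as in the launcher, assuming a block size of 1024 == 2**10.
    return (xnumel + 1023) >> 10


# The shift form and the ceiling division agree for these non-negative sizes.
for xnumel in (0, 1, 1023, 1024, 1025, 1_000_000):
    assert grid_0_shift(xnumel) == ceildiv(xnumel, 1024)
```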

This should be faster, since we remove multiple function/dict calls and are able to specialize the grid computation for each `triton.Config`.
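
One way to picture per-config specialization is sketched below. It is only an illustration, not Inductor's actual code generation: `make_launcher` and `fake_runner` are hypothetical names, and the runner signature is simplified. Each block size gets its own launcher with the grid arithmetic baked in as constants.

```python
def make_launcher(xblock: int, runner):
    """Hypothetical helper: build a launcher specialized for one block size."""
    src = (
        "def launcher(in_ptr0, out_ptr0, xnumel, stream):\n"
        f"    grid_0 = (xnumel + {xblock - 1}) // {xblock}\n"
        "    grid_1 = 1\n"
        "    grid_2 = 1\n"
        "    return runner(grid_0, grid_1, grid_2, stream, in_ptr0, out_ptr0, xnumel)\n"
    )
    scope = {"runner": runner}
    exec(src, scope)  # compile the specialized source into a real function
    return scope["launcher"]


def fake_runner(grid_0, grid_1, grid_2, stream, *args):
    # Stand-in for the real kernel launch; it only reports the grid it received.
    return (grid_0, grid_1, grid_2)


# One launcher per (assumed) block size, built once and reused on every call.
launchers = {xblock: make_launcher(xblock, fake_runner) for xblock in (256, 1024)}
result_256 = launchers[256](None, None, 2000, None)
result_1024 = launchers[1024](None, None, 2000, None)
```

With this shape, the call site no longer passes a grid closure; the launcher selected for a config already knows how to compute its own grid from `xnumel`.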

It also allows us to unify the handling of grids between the Python and C++ wrapper code.  Before this, C++ wrapper code didn't actually support dynamic grid sizes and instead burned in a static grid.

This unification allows this PR to be a net deletion of code.

Differential [disconnected] Revision: D70471332

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148305
Approved by: https://github.com/shunting314, https://github.com/eellison
2025-03-12 15:52:16 +00:00
aoti_runtime Fix for AOTI + CUDAGraphs when calling from Python (#148601) 2025-03-08 02:44:14 +00:00
cuda Reland: [inductor] Simplify grid handling (#148305) 2025-03-12 15:52:16 +00:00
rocm Reland: [inductor] Simplify grid handling (#148305) 2025-03-12 15:52:16 +00:00
xpu Reland: [inductor] Simplify grid handling (#148305) 2025-03-12 15:52:16 +00:00
__init__.py
aoti_hipify_utils.py remove allow-untyped-defs from _inductor/codegen/aoti_hipify_utils.py (#143916) 2024-12-27 23:25:37 +00:00
block_analysis.py [inductor][triton] Block ptr analysis fix assert on matched index expression (#148446) 2025-03-10 05:26:55 +00:00
common.py Reland: [inductor] Simplify grid handling (#148305) 2025-03-12 15:52:16 +00:00
cpp_bmm_template.py [inductor][cpu] Move VNNI weight packing into AMX GEMM kernel for contiguous BMM weights (#146843) 2025-02-21 21:46:00 +00:00
cpp_flex_attention_template.py [Inductor][CPP] Avoid transpose with cpp micro-gemm for FlexAttention (#147069) 2025-03-03 15:22:11 +00:00
cpp_gemm_template.py [inductor][cpu] Fix error with FlexibleLayout weights in BMM (#148188) 2025-03-05 01:05:05 +00:00
cpp_grouped_gemm_template.py [BE][PYFMT] migrate PYFMT for torch._inductor to ruff format (#144550) 2025-02-28 13:33:19 +00:00
cpp_micro_gemm.py [Inductor][CPP] Avoid transpose with cpp micro-gemm for FlexAttention (#147069) 2025-03-03 15:22:11 +00:00
cpp_prefix.h [Inductor][CPP] fix store mode atomic add (#147961) 2025-02-26 14:04:34 +00:00
cpp_template_kernel.py Reland: [inductor] Simplify grid handling (#148305) 2025-03-12 15:52:16 +00:00
cpp_template.py [BE][PYFMT] migrate PYFMT for torch._inductor to ruff format (#144550) 2025-02-28 13:33:19 +00:00
cpp_utils.py [AOTI] build CPU CPP kernels at O3, and all other code at O1 (#148587) 2025-03-05 22:47:46 +00:00
cpp_wrapper_cpu_array_ref.py Reland: [inductor] Simplify grid handling (#148305) 2025-03-12 15:52:16 +00:00
cpp_wrapper_cpu.py Reland: [inductor] Simplify grid handling (#148305) 2025-03-12 15:52:16 +00:00
cpp_wrapper_gpu.py Reland: [inductor] Simplify grid handling (#148305) 2025-03-12 15:52:16 +00:00
cpp.py Reland: [inductor] Simplify grid handling (#148305) 2025-03-12 15:52:16 +00:00
cpu_device_op_overrides.py [inductor] Add types to DeviceOpOverrides (#145913) 2025-02-01 16:33:49 +00:00
cuda_combined_scheduling.py Revert "Use the device interface for detecting Triton availability (#139171)" 2025-03-11 18:49:21 +00:00
debug_utils.py Reland: [inductor] Simplify grid handling (#148305) 2025-03-12 15:52:16 +00:00
halide.py Reland: [inductor] Simplify grid handling (#148305) 2025-03-12 15:52:16 +00:00
memory_planning.py [BE][PYFMT] migrate PYFMT for torch._inductor to ruff format (#144550) 2025-02-28 13:33:19 +00:00
mps_device_op_overrides.py [inductor] Add types to DeviceOpOverrides (#145913) 2025-02-01 16:33:49 +00:00
mps.py Reland: [inductor] Simplify grid handling (#148305) 2025-03-12 15:52:16 +00:00
multi_kernel.py Reland: [inductor] Simplify grid handling (#148305) 2025-03-12 15:52:16 +00:00
simd_kernel_features.py [BE][Ez]: Use itertools.chain.from_iterable when possible (#148190) 2025-03-06 20:37:06 +00:00
simd.py Reland: [inductor] Simplify grid handling (#148305) 2025-03-12 15:52:16 +00:00
triton_combo_kernel.py Reland: [inductor] Simplify grid handling (#148305) 2025-03-12 15:52:16 +00:00
triton_split_scan.py Reland: [inductor] Simplify grid handling (#148305) 2025-03-12 15:52:16 +00:00
triton_utils.py [BE][PYFMT] migrate PYFMT for torch._inductor to ruff format (#144550) 2025-02-28 13:33:19 +00:00
triton.py Reland: [inductor] Simplify grid handling (#148305) 2025-03-12 15:52:16 +00:00
wrapper.py Reland: [inductor] Simplify grid handling (#148305) 2025-03-12 15:52:16 +00:00