pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Max Podkorytov 7ef2c62fd3 [ROCm][Inductor][CK] Add ck-tile based universal gemm kernels to torch.mm autotune choices (#152341)	2025-05-21 23:59:16 +00:00

This PR adds code generation for CK-tile based universal gemm kernels to the CK backend for Inductor, and adds these kernels to autotune choices.

Unlike legacy-CK based kernels (which are generated by parsing the CK instances from CK library), we generate the set of instances by manually specifying the tuning parameters.

This PR introduces a new template for code generation, and compilation/autotuning is handled by the existing infrastructure.

Points of discussion:

* For simplicity and reduced coupling with CK, the instance filter checks only data type and layout, and doesn't check the alignment requirement - meaning that more instances will be compiled than necessary - while keeping the code generation independent from internal CK logic which checks the alignment validity at runtime
* CK-tile instances are enabled whenever legacy-CK instances are enabled. A config knob could be introduced to differentiate between the instance types if that's needed
* Whether gemm problem size K is ever dynamic, since whenever it's not a compile-time constant, we need to perform a runtime dispatch between several kernels

Testing

Use the existing tests in `test/inductor/test_ck_backend.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152341
Approved by: https://github.com/chenyang78
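The instance generation and filtering described in the commit message can be pictured with a minimal sketch. It is not the code from the PR: `TileGemmInstance`, `gen_instances`, `filter_instances`, the layout strings, and the tile sizes are all hypothetical names and values chosen for illustration. It only shows the stated idea of enumerating instances from manually specified tuning parameters and then filtering on data type and layout alone, leaving alignment unchecked.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class TileGemmInstance:
    """Hypothetical description of one kernel instance (not the PR's class)."""
    dtype: str    # e.g. "f16" (illustrative value)
    layout: str   # e.g. "RR" for row-major A and B (illustrative encoding)
    tile_m: int
    tile_n: int
    tile_k: int


def gen_instances(dtypes, layouts, tiles):
    """Enumerate instances from manually specified tuning parameters."""
    return [
        TileGemmInstance(dtype, layout, *tile)
        for dtype, layout, tile in product(dtypes, layouts, tiles)
    ]


def filter_instances(instances, dtype, layout):
    """Keep instances matching data type and layout only.

    Alignment is deliberately not checked here, so some instances that
    survive the filter may still be rejected later at runtime.
    """
    return [i for i in instances if i.dtype == dtype and i.layout == layout]


# Illustrative tuning parameters (assumed values, not taken from the PR).
all_instances = gen_instances(
    dtypes=["f16", "bf16"],
    layouts=["RR", "RC"],
    tiles=[(256, 256, 32), (128, 128, 64)],
)
candidates = filter_instances(all_instances, dtype="f16", layout="RR")
```

Under these assumptions every tile shape for the matching data type and layout becomes an autotune candidate, which is the trade-off the first discussion point names: more instances are compiled than strictly necessary, in exchange for not duplicating CK's alignment logic.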
aoti_runtime	[AOTInductor] Fix clang-tidy warnings in wrapper (#153197)	2025-05-12 22:35:59 +00:00
cuda	[cutlass backend] Add serializer for cutlass ops (#153894)	2025-05-21 22:01:40 +00:00
rocm	[ROCm][Inductor][CK] Add ck-tile based universal gemm kernels to torch.mm autotune choices (#152341)	2025-05-21 23:59:16 +00:00
xpu	Reland: [inductor] Simplify grid handling (#148305)	2025-03-12 15:52:16 +00:00
__init__.py
aoti_hipify_utils.py	remove allow-untyped-defs from _inductor/codegen/aoti_hipify_utils.py (#143916)	2024-12-27 23:25:37 +00:00
block_analysis.py	[inductor][triton] Block ptr analysis fix assert on matched index expression (#148446)	2025-03-10 05:26:55 +00:00
common.py	[aoti] Initial Metal support (#153959)	2025-05-21 21:55:59 +00:00
cpp_bmm_template.py	[inductor][cpu] Move VNNI weight packing into AMX GEMM kernel for contiguous BMM weights (#146843)	2025-02-21 21:46:00 +00:00
cpp_flex_attention_template.py	Add pack support and use micro gemm for Half flex attention on CPU (#151530)	2025-04-29 07:24:00 +00:00
cpp_gemm_template.py	[Inductor-CPU] Faster int8 WoQ GEMM for small M with explicit prefetching and different outer loops (#149373)	2025-05-15 11:55:58 +00:00
cpp_grouped_gemm_template.py	[BE][PYFMT] migrate PYFMT for `torch._inductor` to `ruff format` (#144550)	2025-02-28 13:33:19 +00:00
cpp_micro_gemm.py	[Inductor-CPU] Faster int8 WoQ GEMM for small M with explicit prefetching and different outer loops (#149373)	2025-05-15 11:55:58 +00:00
cpp_template_kernel.py	[Inductor-CPU] Faster int8 WoQ GEMM for small M with explicit prefetching and different outer loops (#149373)	2025-05-15 11:55:58 +00:00
cpp_template.py	codecache: Remove cpp_prefix.h duplication per build, then precompile it (#144293)	2025-05-16 17:41:36 +00:00
cpp_utils.py	[aoti] Initial Metal support (#153959)	2025-05-21 21:55:59 +00:00
cpp_wrapper_cpu_array_ref.py	[AOTInductor] Fix clang-tidy warnings in wrapper (#153197)	2025-05-12 22:35:59 +00:00
cpp_wrapper_cpu.py	[aoti] Initial Metal support (#153959)	2025-05-21 21:55:59 +00:00
cpp_wrapper_gpu.py	[AOTI][XPU] Embed SPRI-V files into .so (#153924)	2025-05-20 17:38:53 +00:00
cpp_wrapper_mps.py	[aoti] Initial Metal support (#153959)	2025-05-21 21:55:59 +00:00
cpp.py	codecache: Remove cpp_prefix.h duplication per build, then precompile it (#144293)	2025-05-16 17:41:36 +00:00
cpu_device_op_overrides.py	[inductor] Add types to DeviceOpOverrides (#145913)	2025-02-01 16:33:49 +00:00
cuda_combined_scheduling.py	[Cutlass] E2E Tests for EVT (#152815)	2025-05-17 12:29:10 +00:00
debug_utils.py	[Inductor] Refactor wrapper codegen to use Wrapper IR. (#150458)	2025-04-15 17:28:36 +00:00
halide.py	[BE]: Update ruff to 0.11.8 (#153249)	2025-05-12 18:30:52 +00:00
memory_planning.py	[BE][PYFMT] migrate PYFMT for `torch._inductor` to `ruff format` (#144550)	2025-02-28 13:33:19 +00:00
mps_device_op_overrides.py	[aoti] Initial Metal support (#153959)	2025-05-21 21:55:59 +00:00
mps.py	[aoti] Initial Metal support (#153959)	2025-05-21 21:55:59 +00:00
multi_kernel.py	[Inductor] Refactor wrapper codegen to use Wrapper IR. (#150458)	2025-04-15 17:28:36 +00:00
simd_kernel_features.py	[BE][Ez]: Use itertools.chain.from_iterable when possible (#148190)	2025-03-06 20:37:06 +00:00
simd.py	[Cutlass] Integrate EVT into CUDACPPScheduling (#150906)	2025-05-07 23:09:02 +00:00
subgraph.py	[Inductor] Subgraph check output strides (#153755)	2025-05-20 16:07:18 +00:00
triton_combo_kernel.py	Reland: [inductor] Simplify grid handling (#148305)	2025-03-12 15:52:16 +00:00
triton_split_scan.py	Reland: [inductor] Simplify grid handling (#148305)	2025-03-12 15:52:16 +00:00
triton_utils.py	Fix broken URLs (#152237)	2025-04-27 09:56:42 +00:00
triton.py	Fix broken URLs (#152237)	2025-04-27 09:56:42 +00:00
wrapper_fxir.py	[Inductor] Optimize grid calculation by using // instead of FloorDiv (#153230)	2025-05-12 18:08:52 +00:00
wrapper.py	[aoti] skip input symbol codegen for sympy expr w/ many symbols (#152579)	2025-05-07 01:18:09 +00:00