pytorch/torch/_inductor/kernel
Max Podkorytov 7ef2c62fd3 [ROCm][Inductor][CK] Add ck-tile based universal gemm kernels to torch.mm autotune choices (#152341)
This PR adds code generation for CK-tile based universal gemm kernels to the CK backend for Inductor, and adds these kernels to autotune choices.

Unlike the legacy-CK based kernels, whose instances are generated by parsing the instances shipped in the CK library, the CK-tile instances are generated from manually specified tuning parameters.
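As a rough illustration of what "manually specifying the tuning parameters" can look like, the sketch below enumerates a Cartesian product of hand-written value lists. Every name here (`TileGemmInstance`, `enumerate_instances`, the field names, and the value lists) is hypothetical and chosen for the example; none of it is taken from the PR or from the CK-tile API.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class TileGemmInstance:
    # Hypothetical tuning parameters; the real CK-tile parameter set differs.
    tile_m: int
    tile_n: int
    tile_k: int
    pipeline: str


def enumerate_instances():
    """Build the instance set from hand-written value lists (assumed values)."""
    tile_m = [128, 256]
    tile_n = [128, 256]
    tile_k = [32, 64]
    pipelines = ["mem", "compute"]
    return [
        TileGemmInstance(m, n, k, p)
        for m, n, k, p in product(tile_m, tile_n, tile_k, pipelines)
    ]


instances = enumerate_instances()
```

The point of the design is that the instance list lives entirely on the code-generation side, so nothing has to be parsed out of the CK library sources.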

This PR introduces a new template for code generation, and compilation/autotuning is handled by the existing infrastructure.

Points of discussion:

* For simplicity and reduced coupling with CK, the instance filter checks only data type and layout, not the alignment requirement. As a result, more instances are compiled than necessary, but code generation stays independent of the internal CK logic that checks alignment validity at runtime
* CK-tile instances are enabled whenever legacy-CK instances are enabled. A config knob could be introduced to differentiate between the instance types if that's needed
* Whether the gemm problem size K is ever dynamic: whenever it is not a compile-time constant, a runtime dispatch between several kernels is needed
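The first and third points can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in the PR: `Instance`, `filter_instances`, `dispatch_on_k`, the layout strings, and the divisibility rule used for dispatch are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    # Hypothetical description of a generated kernel instance.
    dtype: str        # e.g. "fp16"
    layout: str       # e.g. "RCR" (assumed encoding of the A/B/C layouts)
    k_per_block: int  # assumed K tile size


def filter_instances(instances, dtype, layout):
    """Keep instances matching dtype and layout only.

    Alignment is deliberately not checked here, so some surviving
    instances may still be rejected later by a runtime validity check.
    """
    return [i for i in instances if i.dtype == dtype and i.layout == layout]


def dispatch_on_k(candidates, k):
    """Pick a kernel once K is known at runtime.

    Assumed rule for the sketch: prefer the largest K tile that divides K.
    Returns None when no candidate applies.
    """
    for inst in sorted(candidates, key=lambda i: i.k_per_block, reverse=True):
        if k % inst.k_per_block == 0:
            return inst
    return None
```

When K is a compile-time constant, the choice made in `dispatch_on_k` could be resolved during code generation instead, which is why the question of whether K is ever dynamic matters.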

**Testing**

Use the existing tests in `test/inductor/test_ck_backend.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152341
Approved by: https://github.com/chenyang78
2025-05-21 23:59:16 +00:00
__init__.py Delete Mixed MM Special Casing (#147151) 2025-02-25 04:29:54 +00:00
bmm.py [Inductor] Add decomposeK as an autotuning choice for mm (#150654) 2025-05-03 02:23:54 +00:00
conv.py Fix broken URLs (#152237) 2025-04-27 09:56:42 +00:00
flex_attention.py [pytorch][triton] flex attention fwd kernel with TMA loads (#151923) (#152460) 2025-05-15 04:49:32 +00:00
flex_decoding.py [pytorch][triton] flex attention fwd kernel with TMA loads (#151923) (#152460) 2025-05-15 04:49:32 +00:00
mm_common.py [cutlass backend] Add (limited) bmm dynamic shape support (#152393) 2025-04-30 04:36:24 +00:00
mm_plus_mm.py Fix broken URLs (#152237) 2025-04-27 09:56:42 +00:00
mm_scaled_grouped.py [inductor] Generate synthetic offsets appropriately for autotuning _scaled_grouped_mm (#152968) 2025-05-08 21:07:04 +00:00
mm.py [ROCm][Inductor][CK] Add ck-tile based universal gemm kernels to torch.mm autotune choices (#152341) 2025-05-21 23:59:16 +00:00