pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00


Max Podkorytov 7ef2c62fd3 [ROCm][Inductor][CK] Add ck-tile based universal gemm kernels to torch.mm autotune choices (#152341)

This PR adds code generation for CK-tile based universal gemm kernels to the CK backend for Inductor, and adds these kernels to autotune choices.

Unlike legacy-CK based kernels (which are generated by parsing the CK instances from CK library), we generate the set of instances by manually specifying the tuning parameters.

This PR introduces a new template for code generation, and compilation/autotuning is handled by the existing infrastructure.

Points of discussion:

* For simplicity and reduced coupling with CK, the instance filter checks only data type and layout, and doesn't check the alignment requirement - meaning that more instances will be compiled than necessary - while keeping the code generation independent from internal CK logic which checks the alignment validity at runtime
* CK-tile instances are enabled whenever legacy-CK instances are enabled. A config knob could be introduced to differentiate between the instance types if that's needed
* Whether gemm problem size K is ever dynamic, since whenever it's not a compile-time constant, we need to perform a runtime dispatch between several kernels

Testing

Use the existing tests in `test/inductor/test_ck_backend.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152341
Approved by: https://github.com/chenyang78

2025-05-21 23:59:16 +00:00
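The commit message describes generating instances from manually specified tuning parameters and filtering them by data type and layout only. A minimal sketch of that idea follows. It is not the code from the PR: the names `TileGemmInstance`, `generate_instances`, and `filter_instances`, as well as the tile sizes and layout strings, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class TileGemmInstance:
    # Hypothetical record of one kernel configuration (not the PR's class).
    dtype: str
    layout_a: str
    layout_b: str
    tile_m: int
    tile_n: int
    tile_k: int


def generate_instances(dtypes, layouts, tile_sizes):
    """Enumerate the cross product of hand-picked tuning parameters."""
    return [
        TileGemmInstance(dtype, la, lb, tm, tn, tk)
        for dtype, (la, lb), (tm, tn, tk) in product(dtypes, layouts, tile_sizes)
    ]


def filter_instances(instances, dtype, layout_a, layout_b):
    """Keep instances matching data type and layout only.

    Alignment is deliberately not checked here, mirroring the trade-off
    described in the commit message: some surviving instances may be
    rejected later, when the kernel library validates alignment at runtime.
    """
    return [
        inst
        for inst in instances
        if inst.dtype == dtype
        and inst.layout_a == layout_a
        and inst.layout_b == layout_b
    ]


# Illustrative parameter grid (values are assumptions, not taken from the PR).
candidates = generate_instances(
    dtypes=["fp16", "bf16"],
    layouts=[("Row", "Col"), ("Row", "Row")],
    tile_sizes=[(256, 256, 32), (128, 128, 64)],
)
selected = filter_instances(candidates, "fp16", "Row", "Col")
```

Skipping the alignment check keeps a filter like this independent of the kernel library's internals, at the cost of compiling some candidates that can never run for the given problem.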
__init__.py
ck_conv_template.py	[ROCm][Inductor][CK] Add ck-tile based universal gemm kernels to torch.mm autotune choices (#152341)	2025-05-21 23:59:16 +00:00
ck_template.py	[inductor][ck] kBatch parametrized (#147885)	2025-02-26 07:28:19 +00:00
ck_tile_template.py	[ROCm][Inductor][CK] Add ck-tile based universal gemm kernels to torch.mm autotune choices (#152341)	2025-05-21 23:59:16 +00:00
ck_tile_universal_gemm_template.py	[ROCm][Inductor][CK] Add ck-tile based universal gemm kernels to torch.mm autotune choices (#152341)	2025-05-21 23:59:16 +00:00
ck_universal_gemm_template.py	[ROCm][Inductor][CK] Add ck-tile based universal gemm kernels to torch.mm autotune choices (#152341)	2025-05-21 23:59:16 +00:00
compile_command.py	[ROCm][Inductor][CK] Add ck-tile based universal gemm kernels to torch.mm autotune choices (#152341)	2025-05-21 23:59:16 +00:00
rocm_benchmark_request.py	Rename "output_tensor" -> "out" in autotune_process.py (#153169)	2025-05-13 14:18:29 +00:00
rocm_cpp_scheduling.py	[BE][PYFMT] migrate PYFMT for `torch._inductor` to `ruff format` (#144550)	2025-02-28 13:33:19 +00:00
rocm_kernel.py	[AOTI][reland2] Remove typedef for half and bfloat16 (#153467)	2025-05-14 02:37:18 +00:00
rocm_template_buffer.py	PEP585 update - torch/_inductor/codegen (#145106)	2025-01-18 06:56:03 +00:00
rocm_template.py	[ROCm][Inductor][CK] Add ck-tile based universal gemm kernels to torch.mm autotune choices (#152341)	2025-05-21 23:59:16 +00:00
rocm_utils.py	[AOTI][reland2] Remove typedef for half and bfloat16 (#153467)	2025-05-14 02:37:18 +00:00