pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00


Mwiza Kunda 0e6d44df3f Add heuristic to fail block pointer match early (#144681)

This PR adds a heuristic to potentially fail the block pointer match early. Expressions like the one below take a long time to match using sympy (e.g. > 100 seconds):

```python
# torch._inductor.config.triton.use_block_ptr = True
# torch._inductor.config.triton.prefer_nd_tiling = True
# Expression from pytest -k test_max_pool2d1_dynamic_shapes_cuda:
((xindex//ps1))*((s2 - 3//2))**2 + 2*((xindex//ps1))*((s2 - 3//2)) + ((xindex//ps1)) + ((s2 - 3//2))*(ModularIndexing(xindex, ps0, ps0)) + (ModularIndexing(xindex, 1, ps0)) + (ModularIndexing(xindex, ps0, ps0))
```

Additionally, the heuristic for the number of dimensions based on the indexing expression is refined to only add dimensions for FloorDiv(index, denom) and ModularIndexing(index, denom, modulo), instead of also including FloorDiv/ModularIndexing expressions that don't involve the index.

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144681
Approved by: https://github.com/jansel	2025-01-16 21:57:30 +00:00
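The refined dimension heuristic described in the commit message can be illustrated with a small sympy sketch. `FloorDiv` and `ModularIndexing` below are hypothetical stand-ins defined as unevaluated sympy functions, not Inductor's own classes, and the two counting helpers are illustrative only, not the code that ships in Inductor. The expression is loosely modeled on the max_pool2d example, reading its size term as `FloorDiv(s2 - 3, 2)` (an assumption). The point it shows: the size term does not involve `xindex`, so the refined rule no longer counts it as a dimension.

```python
import sympy


# Hypothetical stand-ins for Inductor's FloorDiv and ModularIndexing sympy
# functions. With no `eval` classmethod they stay unevaluated, which is all
# this sketch needs.
class FloorDiv(sympy.Function):
    nargs = 2  # FloorDiv(base, divisor)


class ModularIndexing(sympy.Function):
    nargs = 3  # ModularIndexing(base, divisor, modulus)


INDEXING_FUNCS = (FloorDiv, ModularIndexing)


def count_all_terms(expr: sympy.Expr) -> int:
    """Unrefined rule: count every distinct FloorDiv/ModularIndexing term."""
    return len({node for node in sympy.preorder_traversal(expr)
                if isinstance(node, INDEXING_FUNCS)})


def count_index_dependent_terms(expr: sympy.Expr, index: sympy.Symbol) -> int:
    """Refined rule: count only terms whose first argument involves `index`."""
    return len({node for node in sympy.preorder_traversal(expr)
                if isinstance(node, INDEXING_FUNCS) and node.args[0].has(index)})


xindex, ps0, ps1, s2 = sympy.symbols("xindex ps0 ps1 s2", integer=True, positive=True)
row = FloorDiv(xindex, ps1)   # depends on xindex
size = FloorDiv(s2 - 3, 2)    # depends only on the size symbol s2
expr = (row * size**2 + 2 * row * size + row
        + size * ModularIndexing(xindex, ps0, ps0)
        + ModularIndexing(xindex, 1, ps0)
        + ModularIndexing(xindex, ps0, ps0))

print(count_all_terms(expr))                      # 4
print(count_index_dependent_terms(expr, xindex))  # 3
```

Using a set of traversed nodes counts each distinct term once, so the repeated `ModularIndexing(xindex, ps0, ps0)` contributes a single candidate dimension.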
aoti_runtime	cpp_wrapper: Move #includes to per-device header files (#143909)	2025-01-15 21:14:02 +00:00
cuda	cpp_wrapper: Move #includes to per-device header files (#143909)	2025-01-15 21:14:02 +00:00
rocm	[ROCm][Inductor][CK] hackfix for segfault in addmm op (#144519)	2025-01-10 19:29:14 +00:00
xpu	cpp_wrapper: Move #includes to per-device header files (#143909)	2025-01-15 21:14:02 +00:00
__init__.py
aoti_hipify_utils.py	remove allow-untyped-defs from _inductor/codegen/aoti_hipify_utils.py (#143916)	2024-12-27 23:25:37 +00:00
block_analysis.py	Add heuristic to fail block pointer match early (#144681)	2025-01-16 21:57:30 +00:00
common.py	cpp_wrapper: Move #includes to per-device header files (#143909)	2025-01-15 21:14:02 +00:00
cpp_bmm_template.py	[inductor][cpu] Fix bmm b_index for dynamic expressions in inductor autotuner (#143141)	2025-01-05 18:02:37 +00:00
cpp_flex_attention_template.py	Remove is_reduced_floating_point from namespace std (#144502)	2025-01-10 03:24:10 +00:00
cpp_gemm_template.py	[Inductor][CPP] Enable Grouped GEMM Template (#143796)	2025-01-14 05:59:07 +00:00
cpp_grouped_gemm_template.py	[Inductor][CPP] Enable Epilogue Fusion for Grouped GEMM Template (#143897)	2025-01-14 06:07:50 +00:00
cpp_micro_gemm.py	[Fix]: Enable support for Arm Neon & SVE support for FP32 Gemm Wrapper (#144327)	2025-01-14 17:52:00 +00:00
cpp_prefix.h	Remove is_reduced_floating_point from namespace std (#144502)	2025-01-10 03:24:10 +00:00
cpp_template_kernel.py	[Inductor][CPP] Enable Epilogue Fusion for Grouped GEMM Template (#143897)	2025-01-14 06:07:50 +00:00
cpp_template.py	[Inductor][CPP] Enable Grouped GEMM Template (#143796)	2025-01-14 05:59:07 +00:00
cpp_utils.py	Migrate from Tuple -> tuple in torch/_inductor (#144264)	2025-01-07 03:27:27 +00:00
cpp_wrapper_cpu_array_ref.py	cpp_wrapper: Move #includes to per-device header files (#143909)	2025-01-15 21:14:02 +00:00
cpp_wrapper_cpu.py	cpp_wrapper: Move #includes to per-device header files (#143909)	2025-01-15 21:14:02 +00:00
cpp_wrapper_gpu.py	cpp_wrapper: Move #includes to per-device header files (#143909)	2025-01-15 21:14:02 +00:00
cpp.py	[Inductor][CPP] Enable Epilogue Fusion for Grouped GEMM Template (#143897)	2025-01-14 06:07:50 +00:00
cpu_device_op_overrides.py	remove allow-untyped-defs from _inductor/codegen/cpu_device_op_overrides.py (#143881)	2024-12-27 04:10:47 +00:00
cuda_combined_scheduling.py	Prologue Fusion (#134532)	2024-12-13 04:18:25 +00:00
debug_utils.py	cpp_wrapper: Move #includes to per-device header files (#143909)	2025-01-15 21:14:02 +00:00
halide.py	Migrate from Tuple -> tuple in torch/_inductor (#144264)	2025-01-07 03:27:27 +00:00
memory_planning.py	[inductor] Replace set by OrderedSet (#138466)	2024-12-13 16:08:45 +00:00
mps_device_op_overrides.py	[Inductor] Add MPS device op overrides (#143892)	2024-12-28 02:11:45 +00:00
mps.py	[MPSInductor] Fix codegen regression (#144924)	2025-01-16 02:12:42 +00:00
multi_kernel.py	Revert "Use absolute path `path.resolve()` -> `path.absolute()` (#129409)"	2025-01-04 14:17:20 +00:00
simd_kernel_features.py	Skip L1 cache for single-use buffers (#143115)	2025-01-07 19:35:40 +00:00
simd.py	[Inductor] Restrict ND tiling analysis to MemoryDeps (#144497)	2025-01-11 05:16:47 +00:00
triton_combo_kernel.py	Migrate from Tuple -> tuple in torch/_inductor (#144264)	2025-01-07 03:27:27 +00:00
triton_split_scan.py	[inductor] Replace set by OrderedSet (#138466)	2024-12-13 16:08:45 +00:00
triton_utils.py	[inductor] Move V.graph.scheduler.current_device to V.graph.current_device (#138252)	2024-10-18 23:05:54 +00:00
triton.py	Add heuristic to fail block pointer match early (#144681)	2025-01-16 21:57:30 +00:00
wrapper.py	cpp_wrapper: Move #includes to per-device header files (#143909)	2025-01-15 21:14:02 +00:00