pytorch/torch/_inductor/codegen
Xiangyang (Mark) Guo 156a377f4c [AOTI][CPP] add flag TORCHINDUCTOR_CPP_FORCE_INLINE_KERNEL (#157949)
Summary: Add the flag TORCHINDUCTOR_CPP_FORCE_INLINE_KERNEL, which force-inlines the kernel function when TORCHINDUCTOR_CPP_FORCE_INLINE_KERNEL=1 is set. It is disabled by default because forced inlining may increase build time.
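A minimal usage sketch, assuming the flag is read from the environment before Inductor compiles any CPP kernels (the model/compile step shown in the comment is illustrative, not part of this commit):

```python
import os

# Enable forced inlining of generated CPP kernels. Per the commit summary, this
# is disabled by default because force inlining may increase build time, so set
# it explicitly and only when faster generated kernels outweigh longer builds.
# It must be set before TorchInductor compiles the kernel.
os.environ["TORCHINDUCTOR_CPP_FORCE_INLINE_KERNEL"] = "1"

# Hypothetical follow-up step (requires a PyTorch build containing this patch):
#   compiled_model = torch.compile(model)  # CPP kernels are now force-inlined
```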

Differential Revision: D77915987

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157949
Approved by: https://github.com/desertfire
2025-07-15 10:51:43 +00:00
aoti_runtime [AOTI] Save data sizes to constants_info (#154534) 2025-05-29 06:39:13 +00:00
cuda [cutlass backend] cache a few things for codegen and properties (#158158) 2025-07-15 00:18:31 +00:00
rocm [ROCm][Inductor][CK] update API for gemm-multiD change (#156122) 2025-07-10 23:12:20 +00:00
xpu [user triton] AOT inductor support for device-side TMA (#155896) 2025-06-27 04:28:04 +00:00
__init__.py
aoti_hipify_utils.py [BE][3/16] fix typos in torch/ (torch/_inductor/) (#156313) 2025-06-23 02:57:12 +00:00
block_analysis.py [Inductor] Restrict block analysis to only match integer dims and strides (#149615) 2025-06-24 22:43:12 +00:00
common.py [user triton] AOT inductor support for device-side TMA (#155896) 2025-06-27 04:28:04 +00:00
cpp_bmm_template.py
cpp_flex_attention_template.py [inductor] Add typing to _inductor/ir.py (#149958) 2025-06-30 15:56:35 +00:00
cpp_gemm_template.py [inductor] Add typing to _inductor/ir.py (#149958) 2025-06-30 15:56:35 +00:00
cpp_grouped_gemm_template.py
cpp_micro_gemm.py [Pyrefly][Refactor] Replace dict() calls with literal dict syntax for improved readability (#157735) 2025-07-08 18:10:33 +00:00
cpp_template_kernel.py Revert "[Inductor] Set the default value of min_chunk_size to 512 (#150762)" 2025-07-14 16:58:13 +00:00
cpp_template.py codecache: Remove cpp_prefix.h duplication per build, then precompile it (#144293) 2025-05-16 17:41:36 +00:00
cpp_utils.py [aoti] Initial Metal support (#153959) 2025-05-23 05:45:35 +00:00
cpp_wrapper_cpu_array_ref.py [inductor] Add typing to _inductor/ir.py (#149958) 2025-06-30 15:56:35 +00:00
cpp_wrapper_cpu.py [AOTI] codegen for static linkage (#157129) 2025-07-10 16:03:50 +00:00
cpp_wrapper_gpu.py [user triton] AOT inductor support for device-side TMA (#155896) 2025-06-27 04:28:04 +00:00
cpp_wrapper_mps.py [aoti][mps] Fix deduplication of kernels (#156843) 2025-06-26 21:03:05 +00:00
cpp.py [AOTI][CPP] add flag TORCHINDUCTOR_CPP_FORCE_INLINE_KERNEL (#157949) 2025-07-15 10:51:43 +00:00
cpu_device_op_overrides.py
cuda_combined_scheduling.py multi-kernel matmuls based on varying hint sizes (#156628) 2025-07-12 15:08:21 +00:00
debug_utils.py [Inductor] Refactor wrapper codegen to use Wrapper IR. (#150458) 2025-04-15 17:28:36 +00:00
halide.py [inductor] more size_hint_or_throw usage (#157394) 2025-07-02 20:20:59 +00:00
memory_planning.py
mps_device_op_overrides.py [aoti] Initial Metal support (#153959) 2025-05-23 05:45:35 +00:00
mps.py [MPS] Add shifted_chebyshev_polynomial_[tuvw] (#157488) 2025-07-03 15:48:37 +00:00
multi_kernel.py multi-kernel matmuls based on varying hint sizes (#156628) 2025-07-12 15:08:21 +00:00
simd_kernel_features.py Replace runtime type parameterization (#155221) 2025-06-05 21:43:54 +00:00
simd.py multi-kernel matmuls based on varying hint sizes (#156628) 2025-07-12 15:08:21 +00:00
subgraph.py [inductor] Add typing to _inductor/ir.py (#149958) 2025-06-30 15:56:35 +00:00
triton_combo_kernel.py [BE][3/16] fix typos in torch/ (torch/_inductor/) (#156313) 2025-06-23 02:57:12 +00:00
triton_split_scan.py Reland: [inductor] Simplify grid handling (#148305) 2025-03-12 15:52:16 +00:00
triton_utils.py Unify dynamic shapes APIs naming 2 (expect_true and check) attempt2 (#156518) 2025-06-24 21:01:38 +00:00
triton.py multi-kernel matmuls based on varying hint sizes (#156628) 2025-07-12 15:08:21 +00:00
wrapper_fxir.py [Inductor] Support precomputed size args in the FX backend. (#157758) 2025-07-08 23:22:17 +00:00
wrapper.py [AOTI] Add device guard when launching autotune kernels (#158034) 2025-07-11 02:34:31 +00:00