
# torch._inductor.runtime

This folder contains the code needed at runtime by the output code of Inductor. The output code of Inductor imports `torch` and `torch._inductor.runtime`, but should not import from any other files in `torch._inductor.*`. Note that this directory also contains the code needed to actually perform Triton compilation, which is not needed in the final runtime execution of the kernels.
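
As a rough illustration of that import discipline (the module names below come from this directory; the exact preamble Inductor emits varies by version and kernel), generated output code is limited to imports like:

```python
# Sketch of the import discipline for Inductor output code (illustrative only).
import torch

# Allowed: the runtime package shipped for generated code.
from torch._inductor.runtime import triton_helpers, triton_heuristics

# Not allowed: reaching into the rest of torch._inductor.*, e.g.
# `from torch._inductor import ir` or other compile-time modules.
```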

The runtime includes generated Triton/C++ code, which is compiled (sometimes in parallel) when the output code of Inductor is imported. It also includes the autotuning code and the heuristics that decide the block sizes of generated code.
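
As a conceptual sketch of what block-size autotuning means (a toy stand-in, not the actual implementation in `triton_heuristics.py` or `coordinate_descent_tuner.py`), the idea is to benchmark a kernel over candidate block sizes and keep the fastest:

```python
import time

def pick_block_size(run_kernel, candidates=(64, 128, 256, 512, 1024)):
    """Toy stand-in for block-size autotuning: time each candidate and keep
    the fastest. The real autotuner benchmarks with device events, caches its
    results, and seeds the search with heuristics based on kernel size hints."""
    best_block, best_time = candidates[0], float("inf")
    for block in candidates:
        start = time.perf_counter()
        run_kernel(block)          # launch the kernel with this block size
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_block, best_time = block, elapsed
    return best_block
```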

One of the original motivations for this directory split was so that the Triton compile subprocesses could access Triton and our compiler support code while mocking out most of torch, which can take seconds to import (sometimes longer than a Triton compile itself). An abandoned prototype of this can be found here.