pytorch/torch/_inductor/codegen
anwang cd68559d04 [Inductor] Support native Inductor as backend for MTIA (#158526)
This diff/PR includes the changes to support native Inductor integration for MTIA. The goal is to support `torch.compile(backend="inductor")` for MTIA. Inductor should generate code (Triton kernels plus Python wrapper code) similar to what it produces for CUDA, and the Triton kernels can be launched eagerly.

The changes include:
- Add MTIA device interfaces used by Dynamo and Inductor, including device, stream, and event APIs.
- Add required `torch.mtia` APIs, such as `is_bf16_supported`, `memory_allocated`, and `set_stream_by_id`.
- Add MTIA-specific codegen logic, for example loading the MTIA dynamic library.
- Other changes necessary to integrate with Inductor codegen, following other devices such as CUDA and XPU.
- Integrate with the [empty_strided_mtia](https://www.internalfb.com/code/fbsource/[0d017d3a4a1bdff7253f9c66a9f38e77bd62166b]/fbcode/caffe2/aten/src/ATen/native/mtia/EmptyTensor.cpp?lines=49%2C63%2C71%2C74%2C78) API that we’ve added for the new MTIA ATen backend.
- A change in the Inductor runtime to avoid re-initializing the MTIADriver.
- BUCK changes to include ATen-mtia in Inductor, and to use the `-USE_MTIA` preprocessor flag.
- Update `test_mnist_e2e.py` to cover native Inductor as a backend, using the `--use_native_inductor` flag.
- Add a personal script (`scripts/anwang/run_native_inductor_script.py`) for testing purposes.
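The device-interface bullet above can be illustrated with a minimal, pure-Python sketch of the registry pattern Dynamo and Inductor use to look up per-device APIs. This is a hypothetical mock with no `torch` dependency: the function names mirror the shape of `torch._dynamo.device_interface`, but the `MtiaInterface` class and its return values here are illustrative assumptions, not the real MTIA implementation.

```python
# Hypothetical sketch of the device-interface registry pattern: backends
# register a class keyed by device type, and the compiler stack looks it up.
# Names follow the shape of torch._dynamo.device_interface; the MTIA
# specifics below are placeholders for illustration only.

_device_interfaces: dict = {}


def register_interface_for_device(device_type: str, interface: type) -> None:
    """Associate a device type string (e.g. "mtia") with its interface class."""
    _device_interfaces[device_type] = interface


def get_interface_for_device(device_type: str) -> type:
    """Return the registered interface, or fail loudly for unknown devices."""
    if device_type not in _device_interfaces:
        raise NotImplementedError(f"no interface registered for {device_type!r}")
    return _device_interfaces[device_type]


class MtiaInterface:
    """Illustrative stand-in exposing the kind of APIs the bullets mention."""

    @staticmethod
    def is_available() -> bool:
        return False  # no real hardware behind this sketch

    @staticmethod
    def is_bf16_supported() -> bool:
        return True  # assumed value, purely for illustration


register_interface_for_device("mtia", MtiaInterface)
iface = get_interface_for_device("mtia")
```

The point of the registry is that Dynamo and Inductor stay device-agnostic: adding a backend means registering one class, not threading device checks through the compiler.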

Note:
- This approach (option 3) aims to provide a PyTorch-native path for Inductor integration on MTIA, minimizing onboarding overhead. The downside is that it doesn't leverage MTIA-specific graph optimizations and incurs eager kernel-launch overhead.
- MTIA will support another approach (option 2), based on WrapperFxCodegen, to provide the best performance. We should be able to reuse the fundamental changes from this diff for option 2, such as the device interfaces and stream/event APIs, especially since WrapperFxCodegen inherits from PythonWrapperCodegen.
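The reuse argument in the note rests on the class hierarchy: because the FX-based codegen subclasses the Python wrapper codegen, device-level plumbing added for option 3 carries over. A toy sketch, with hypothetical method names (the real Inductor classes share only the `WrapperFxCodegen`/`PythonWrapperCodegen` names and inheritance relationship):

```python
# Toy model of why option 2 can reuse option 3's groundwork: the subclass
# overrides how kernel calls are emitted while inheriting everything else.
# Method names and bodies are invented for illustration.

class PythonWrapperCodegen:
    """Base wrapper codegen: emits Python code that launches kernels eagerly."""

    def write_header(self) -> str:
        return "# generated wrapper"

    def generate_kernel_call(self, kernel_name: str) -> str:
        return f"{kernel_name}.run(args)"


class WrapperFxCodegen(PythonWrapperCodegen):
    """FX-based codegen (option 2): inherits the device-agnostic pieces,
    so device interfaces and stream/event support added for option 3
    carry over; only the emission strategy changes."""

    def generate_kernel_call(self, kernel_name: str) -> str:
        # Record an FX node instead of emitting a direct eager launch.
        return f"graph.call_function({kernel_name})"
```

In this toy model, `write_header` (and any device plumbing it would stand in for) is written once in the base class and inherited unchanged by the FX variant.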

Internal:
References:
- [post for context](https://fb.workplace.com/groups/mtiasw/permalink/1718377262384606/)
- [Inductor integration discussion(option 1/2/3)](https://docs.google.com/document/d/1p6363OXtVIRv1hPoaKlRSK3j-iir3QIbDd5bjyqCNig/edit?tab=t.0#heading=h.7s4ns6wcnhmb)
- [Project design doc(option 3)](https://docs.google.com/document/d/1jXUmhgoV9WvkMf-bcY3Od_kK9K_RDOdgHdt1LoQ5Tc4/edit?tab=t.0#heading=h.y43gwdqlv46w)
- [early prototyping diff](https://www.internalfb.com/diff/D75110196)
- [MPS integration PR](https://github.com/pytorch/pytorch/pull/153959)
- [empty_strided_xpu PR](https://github.com/pytorch/pytorch/pull/126678)

Differential Revision: [D78458745](https://our.internmc.facebook.com/intern/diff/D78458745/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158526
Approved by: https://github.com/blaine-rister, https://github.com/jansel, https://github.com/eellison
2025-07-26 08:16:34 +00:00
aoti_runtime [AOTI] Save data sizes to constants_info (#154534) 2025-05-29 06:39:13 +00:00
cuda [ci][cutlass backend] Add ci for cutlass backend tests (#156626) 2025-07-22 05:18:13 +00:00
mtia [Inductor] Support native Inductor as backend for MTIA (#158526) 2025-07-26 08:16:34 +00:00
rocm [ROCm][Inductor][CK] update API for gemm-multiD change (#156122) 2025-07-10 23:12:20 +00:00
xpu [user triton] AOT inductor support for device-side TMA (#155896) 2025-06-27 04:28:04 +00:00
__init__.py
aoti_hipify_utils.py [BE][3/16] fix typos in torch/ (torch/_inductor/) (#156313) 2025-06-23 02:57:12 +00:00
block_analysis.py [Inductor] Restrict block analysis to only match integer dims and strides (#149615) 2025-06-24 22:43:12 +00:00
common.py [Inductor] Support native Inductor as backend for MTIA (#158526) 2025-07-26 08:16:34 +00:00
cpp_bmm_template.py
cpp_flex_attention_template.py [Inductor] Set the default value of min_chunk_size to 512 (#150762) 2025-07-21 12:46:05 +00:00
cpp_gemm_template.py [inductor] Add typing to _inductor/ir.py (#149958) 2025-06-30 15:56:35 +00:00
cpp_grouped_gemm_template.py
cpp_micro_gemm.py [Pyrefly][Refactor] Replace dict() calls with literal dict syntax for improved readability (#157735) 2025-07-08 18:10:33 +00:00
cpp_template_kernel.py [Inductor] Set the default value of min_chunk_size to 512 (#150762) 2025-07-21 12:46:05 +00:00
cpp_template.py codecache: Remove cpp_prefix.h duplication per build, then precompile it (#144293) 2025-05-16 17:41:36 +00:00
cpp_utils.py [aoti] Initial Metal support (#153959) 2025-05-23 05:45:35 +00:00
cpp_wrapper_cpu_array_ref.py [inductor] Add typing to _inductor/ir.py (#149958) 2025-06-30 15:56:35 +00:00
cpp_wrapper_cpu.py DDE-Free select with unbacked index. (#157605) 2025-07-24 20:08:05 +00:00
cpp_wrapper_gpu.py [user triton] AOT inductor support for device-side TMA (#155896) 2025-06-27 04:28:04 +00:00
cpp_wrapper_mps.py [aoti][mps] Improve tabbing in cpp generation (#158351) 2025-07-23 00:54:53 +00:00
cpp.py Refactor Provenance Tracking (#158399) 2025-07-17 00:23:00 +00:00
cpu_device_op_overrides.py
cuda_combined_scheduling.py multi-kernel matmuls based on varying hint sizes (#156628) 2025-07-12 15:08:21 +00:00
debug_utils.py [Inductor] Refactor wrapper codegen to use Wrapper IR. (#150458) 2025-04-15 17:28:36 +00:00
halide.py [inductor] more size_hint_or_throw usage (#157394) 2025-07-02 20:20:59 +00:00
memory_planning.py
mps_device_op_overrides.py [aoti] Initial Metal support (#153959) 2025-05-23 05:45:35 +00:00
mps.py [aoti][mps] Fix cpu kernel generation (#158350) 2025-07-23 00:54:53 +00:00
multi_kernel.py multi-kernel matmuls based on varying hint sizes (#156628) 2025-07-12 15:08:21 +00:00
python_wrapper_mtia.py [Inductor] Support native Inductor as backend for MTIA (#158526) 2025-07-26 08:16:34 +00:00
simd_kernel_features.py Replace runtime type parameterization (#155221) 2025-06-05 21:43:54 +00:00
simd.py [inductor][templates] Finalize all registered hooks (#157270) 2025-07-20 22:07:32 +00:00
subgraph.py [inductor] Add typing to _inductor/ir.py (#149958) 2025-06-30 15:56:35 +00:00
triton_combo_kernel.py [BE][3/16] fix typos in torch/ (torch/_inductor/) (#156313) 2025-06-23 02:57:12 +00:00
triton_split_scan.py
triton_utils.py [Inductor] Fix a user-defined Triton kernel bool param codegen issue (#158845) 2025-07-24 00:19:27 +00:00
triton.py Revert "[AOTI] Add more default options to compile_standalone (#158560)" 2025-07-22 16:20:17 +00:00
wrapper_fxir.py [Inductor] Support precomputed size args in the FX backend. (#157758) 2025-07-08 23:22:17 +00:00
wrapper.py [Inductor] Support native Inductor as backend for MTIA (#158526) 2025-07-26 08:16:34 +00:00