pytorch/torch/_inductor/codegen
eellison 566ceb3e7e Refactor dtype propagation (#139945)
A couple of changes.

- Tries to reuse dtype propagation rules that were already registered in inductor. These were present both with `pointwise_overrides_data` and the `boolean_ops` list. Additionally, the registration of pointwise ops already specified dtype propagation rules. Saves those registrations and reuses them later.

- Factors out `get_promoted_dtype`, which uses `functools.lru_cache`, so that it takes non-CSEVariable args; CSEVariable args will not work with the functools cache.
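
The caching pattern in the second bullet can be sketched roughly as follows. This is not the Inductor code: `Var`, `_ORDER`, `promote` and `promoted_dtype_of` are hypothetical names, dtypes are plain strings, and the promotion rule is a simplified linear order rather than PyTorch's real type promotion. `Var` is deliberately unhashable here as an assumption, to stand in for an argument that `functools.lru_cache` cannot use as a key.

```python
import functools
from dataclasses import dataclass

# Hypothetical, simplified promotion order (lowest to highest).
# Real PyTorch type promotion is more involved than a linear order.
_ORDER = ("bool", "int32", "int64", "float16", "float32", "float64")


@dataclass
class Var:
    """Hypothetical stand-in for a codegen variable.

    A dataclass with eq=True and frozen=False has __hash__ set to None,
    so instances cannot be used as lru_cache keys.
    """

    name: str
    dtype: str


@functools.lru_cache(maxsize=None)
def promote(*dtypes: str) -> str:
    """Cached core: accepts only hashable dtype keys, never Var objects."""
    return max(dtypes, key=_ORDER.index)


def promoted_dtype_of(*args) -> str:
    """Uncached wrapper: reduce each argument to a hashable dtype key first."""
    keys = tuple(a.dtype if isinstance(a, Var) else a for a in args)
    return promote(*keys)
```

Under these assumptions, `promoted_dtype_of(Var("tmp0", "int32"), Var("tmp1", "float32"))` goes through the cached `promote("int32", "float32")`, while passing a `Var` straight to `promote` raises `TypeError` because it is unhashable.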

Tests get added later in the stack when everything is implemented.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139945
Approved by: https://github.com/blaine-rister, https://github.com/arui-meta, https://github.com/ezyang
2024-11-27 16:57:02 +00:00
..
aoti_runtime [AOTI][refactor] Separate header codegen (#138882) 2024-10-27 14:14:27 +00:00
cuda [CUTLASS] Lift shape & stride information as kernel args (#138611) 2024-11-25 17:52:33 +00:00
rocm [ROCm][Inductor][CK] Enable scaled mm with bias in gemm max autotune with CK backend (#140674) 2024-11-15 22:08:38 +00:00
xpu [AOTI XPU] Enable Cpp wraper for Intel GPU. (#135318) 2024-11-26 11:51:32 +00:00
__init__.py
aoti_hipify_utils.py [BE][Easy][16/19] enforce style for empty lines in import segments in torch/_i*/ (#129768) 2024-07-20 16:20:58 +00:00
common.py Refactor dtype propagation (#139945) 2024-11-27 16:57:02 +00:00
cpp_gemm_template.py [Inductor][CPP] Extract common functions to be reused in other CPP Template (#141554) 2024-11-27 09:52:18 +00:00
cpp_micro_gemm.py Simplify & rectify dequantized B buffer loading for AMX GEMM micro-kernel for WoQ int8 case (#140258) 2024-11-22 01:34:06 +00:00
cpp_prefix.h std::value/std::type -> std::_v/std::_t (#138746) 2024-10-26 20:59:24 +00:00
cpp_template_kernel.py [inductor] Add typing to ir.py 2 (#140915) 2024-11-22 04:56:54 +00:00
cpp_template.py [AOTI] Remove the non-ABI-compatible mode (part 1) (#138009) 2024-10-17 02:48:26 +00:00
cpp_utils.py Move Sympy printers to torch/utils/_sympy/printers.py (#140597) 2024-11-26 18:11:00 +00:00
cpp_wrapper_cpu_array_ref.py [inductor] Refactor ir.Layout into ir.OutputSpec (#140910) 2024-11-21 20:01:57 +00:00
cpp_wrapper_cpu.py [AOTI XPU] Enable Cpp wraper for Intel GPU. (#135318) 2024-11-26 11:51:32 +00:00
cpp_wrapper_gpu.py [inductor] Refactor MutableBox to make IRNode typing easier (#140895) 2024-11-20 19:50:46 +00:00
cpp.py [inductor] modify the heuristic for loop split optimization (#137550) 2024-11-25 09:16:30 +00:00
cpu_device_op_overrides.py Add Triton CPU as an Inductor backend (#133408) 2024-09-30 20:24:52 +00:00
cuda_combined_scheduling.py [BE]: Update mypy to 1.11.2 (#133816) 2024-09-16 19:44:11 +00:00
debug_utils.py [BE]: Apply PERF401 autofixes from ruff (#140980) 2024-11-20 17:52:07 +00:00
halide.py Move Sympy printers to torch/utils/_sympy/printers.py (#140597) 2024-11-26 18:11:00 +00:00
memory_planning.py [inductor] Generalize WorkspaceArg for graph-level semaphores (#138170) 2024-10-18 23:05:54 +00:00
multi_kernel.py [BE]: Apply PERF401 autofixes from ruff (#140980) 2024-11-20 17:52:07 +00:00
simd_kernel_features.py [inductor] Refactor reduction type choices into V.choices (#139585) 2024-11-17 16:10:37 +00:00
simd.py [inductor] Refactor ir.Layout into ir.OutputSpec (#140910) 2024-11-21 20:01:57 +00:00
triton_combo_kernel.py [BE]: Apply PERF401 autofixes from ruff (#140980) 2024-11-20 17:52:07 +00:00
triton_split_scan.py [inductor] Support fixed triton configs defined at compile time (#140217) 2024-11-17 16:10:37 +00:00
triton_utils.py [inductor] Move V.graph.scheduler.current_device to V.graph.current_device (#138252) 2024-10-18 23:05:54 +00:00
triton.py Refactor dtype propagation (#139945) 2024-11-27 16:57:02 +00:00
wrapper.py Revert "[Inductor] Inplacing with Donated Buffer (#140113)" 2024-11-26 21:20:59 +00:00