pytorch/torch/_inductor/codegen
David Berard 62bac07981 [inductor][triton] support profile_scratch launcher arg (#159772)
This adds support for Triton after https://github.com/triton-lang/triton/pull/7258 landed. That PR adds a new argument to all Triton kernels: a profile_scratch argument, similar to the existing global_scratch argument. This PR updates the static CUDA launcher and the AOTI kernel callers to pass this argument when calling Triton kernels.
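A minimal sketch of the launcher-side change described above: the launcher appends trailing scratch pointers that the compiled Triton kernel expects, gated on the Triton version. All names here (`build_launcher_args`, the `(3, 4)` version gate) are illustrative assumptions, not the actual PyTorch or Triton API.

```python
# Hypothetical sketch, not the real static_cuda_launcher code: append the
# trailing scratch arguments a compiled Triton kernel expects. Before
# triton-lang/triton#7258 kernels took only a global_scratch argument;
# after it, they also take a profile_scratch argument.

def build_launcher_args(user_args, triton_version,
                        global_scratch=None, profile_scratch=None):
    """Return the full argument list passed to the compiled kernel.

    `triton_version` is a (major, minor) tuple; the (3, 4) cutoff below
    is an assumed, illustrative version gate.
    """
    args = list(user_args)
    args.append(global_scratch)        # trailing global_scratch pointer
    if triton_version >= (3, 4):       # assumed gate for the new argument
        args.append(profile_scratch)   # new trailing profile_scratch pointer
    return args
```

The key point is that both the static CUDA launcher and the AOTI-generated callers must agree with the kernel on the trailing-argument layout, or the kernel reads garbage for its scratch pointers.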

Tests: https://github.com/pytorch/pytorch/pull/159158. I also verified these tests locally with Triton 3.2, 3.3, and 3.4.

Fixes:
* static_cuda_launcher (test/repro: `python tools/dynamo/verify_dynamo.py`)
* AOTI calling logic (test/repro: `TORCHINDUCTOR_CPP_WRAPPER=1 python test/inductor/test_torchinductor_opinfo.py -k test_comprehensive_linalg_vander_cuda_float32`)

Differential Revision: [D79825121](https://our.internmc.facebook.com/intern/diff/D79825121)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159772
Approved by: https://github.com/NikhilAPatel, https://github.com/eellison
2025-08-08 14:27:38 +00:00
aoti_runtime [AOTI] Save data sizes to constants_info (#154534) 2025-05-29 06:39:13 +00:00
cuda [inductor][triton] support profile_scratch launcher arg (#159772) 2025-08-08 14:27:38 +00:00
mtia [Re-land][Inductor] Support native Inductor as backend for MTIA (#159211) 2025-07-29 17:03:24 +00:00
rocm Remove unnecessary "# noqa: set_linter" comments (#159467) 2025-08-06 21:31:52 +00:00
xpu [inductor][triton] support profile_scratch launcher arg (#159772) 2025-08-08 14:27:38 +00:00
__init__.py
aoti_hipify_utils.py [BE][3/16] fix typos in torch/ (torch/_inductor/) (#156313) 2025-06-23 02:57:12 +00:00
block_analysis.py [Inductor] Restrict block analysis to only match integer dims and strides (#149615) 2025-06-24 22:43:12 +00:00
common.py [inductor][triton] support profile_scratch launcher arg (#159772) 2025-08-08 14:27:38 +00:00
cpp_bmm_template.py [inductor][cpu] Move VNNI weight packing into AMX GEMM kernel for contiguous BMM weights (#146843) 2025-02-21 21:46:00 +00:00
cpp_flex_attention_template.py [Inductor] Set the default value of min_chunk_size to 512 (#150762) 2025-07-21 12:46:05 +00:00
cpp_gemm_template.py [inductor] Add typing to _inductor/ir.py (#149958) 2025-06-30 15:56:35 +00:00
cpp_grouped_gemm_template.py [BE][PYFMT] migrate PYFMT for torch._inductor to ruff format (#144550) 2025-02-28 13:33:19 +00:00
cpp_micro_gemm.py [Pyrefly][Refactor] Replace dict() calls with literal dict syntax for improved readability (#157735) 2025-07-08 18:10:33 +00:00
cpp_template_kernel.py [Inductor] Set the default value of min_chunk_size to 512 (#150762) 2025-07-21 12:46:05 +00:00
cpp_template.py codecache: Remove cpp_prefix.h duplication per build, then precompile it (#144293) 2025-05-16 17:41:36 +00:00
cpp_utils.py [aoti] Initial Metal support (#153959) 2025-05-23 05:45:35 +00:00
cpp_wrapper_cpu_array_ref.py [inductor] allocate non-blocking copy destinations in pinned memory (#155121) (#158758) 2025-08-07 17:07:26 +00:00
cpp_wrapper_cpu.py [inductor] allocate non-blocking copy destinations in pinned memory (#155121) (#158758) 2025-08-07 17:07:26 +00:00
cpp_wrapper_gpu.py [inductor][triton] support profile_scratch launcher arg (#159772) 2025-08-08 14:27:38 +00:00
cpp_wrapper_mps.py [aoti][mps] Initialize mps kernels first (#159753) 2025-08-06 07:54:29 +00:00
cpp.py [inductor] [cpu] fix the dype hardcoded to int64 in store_reduction (#157904) 2025-08-07 08:03:05 +00:00
cpu_device_op_overrides.py [inductor] Add types to DeviceOpOverrides (#145913) 2025-02-01 16:33:49 +00:00
cuda_combined_scheduling.py multi-kernel matmuls based on varying hint sizes (#156628) 2025-07-12 15:08:21 +00:00
debug_utils.py [Inductor] Refactor wrapper codegen to use Wrapper IR. (#150458) 2025-04-15 17:28:36 +00:00
halide.py [inductor] more size_hint_or_throw usage (#157394) 2025-07-02 20:20:59 +00:00
memory_planning.py [BE][PYFMT] migrate PYFMT for torch._inductor to ruff format (#144550) 2025-02-28 13:33:19 +00:00
mps_device_op_overrides.py [aoti] Initial Metal support (#153959) 2025-05-23 05:45:35 +00:00
mps.py [aoti][mps] Initialize mps kernels first (#159753) 2025-08-06 07:54:29 +00:00
multi_kernel.py multi-kernel matmuls based on varying hint sizes (#156628) 2025-07-12 15:08:21 +00:00
python_wrapper_mtia.py [Re-land][Inductor] Support native Inductor as backend for MTIA (#159211) 2025-07-29 17:03:24 +00:00
simd_kernel_features.py Replace runtime type parameterization (#155221) 2025-06-05 21:43:54 +00:00
simd.py [typing] Constrain OrderedSet generic to be Hashable (#159684) 2025-08-04 18:08:01 +00:00
subgraph.py [inductor] Add typing to _inductor/ir.py (#149958) 2025-06-30 15:56:35 +00:00
triton_combo_kernel.py [BE][3/16] fix typos in torch/ (torch/_inductor/) (#156313) 2025-06-23 02:57:12 +00:00
triton_split_scan.py Reland: [inductor] Simplify grid handling (#148305) 2025-03-12 15:52:16 +00:00
triton_utils.py [Inductor] Fix a user-defined Triton kernel bool param codegen issue (#158845) 2025-07-24 00:19:27 +00:00
triton.py Remove unnecessary "# noqa: set_linter" comments (#159467) 2025-08-06 21:31:52 +00:00
wrapper_fxir.py Fix launch grid calculation (#159497) 2025-08-02 01:12:58 +00:00
wrapper.py [inductor] allocate non-blocking copy destinations in pinned memory (#155121) (#158758) 2025-08-07 17:07:26 +00:00