pytorch/torch
David Berard 62bac07981 [inductor][triton] support profile_scratch launcher arg (#159772)
This adds support for Triton after https://github.com/triton-lang/triton/pull/7258 landed. That PR adds a new argument to every Triton kernel: a profile_scratch argument, similar to global_scratch. This PR updates the static CUDA launcher and the AOTI kernel callers to pass these arguments when calling Triton kernels.
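
For illustration only, here is a minimal sketch of the version-gating idea: append the extra scratch pointers only when the compiled kernel's metadata declares them, so older Triton builds keep the old calling convention. The metadata field names (`global_scratch_size`, `profile_scratch_size`) and the flat pointer-style argument list are assumptions made for this sketch, not the actual static_cuda_launcher or AOTI code:

```python
import torch

def build_launch_args(kernel_metadata, user_args):
    """Return a flat argument list with optional scratch pointers appended.

    Hypothetical sketch: real launchers also need to keep the scratch buffers
    alive until the kernel launch completes.
    """
    args = list(user_args)
    # Kernels compiled by newer Triton declare scratch requirements in their
    # metadata; older Triton versions lack the corresponding field, so nothing
    # extra is appended and the previous calling convention is preserved.
    for field in ("global_scratch_size", "profile_scratch_size"):
        if hasattr(kernel_metadata, field):
            size = getattr(kernel_metadata, field) or 0
            if size > 0:
                scratch = torch.empty(size, dtype=torch.uint8, device="cuda")
                args.append(scratch.data_ptr())
            else:
                args.append(0)  # null pointer stands in for unused scratch
    return args
```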

Tests: https://github.com/pytorch/pytorch/pull/159158. I also verified these tests locally with Triton 3.2, 3.3, and 3.4.

Fixes:
* static_cuda_launcher (test/repro: `python tools/dynamo/verify_dynamo.py`)
* AOTI calling logic (test/repro: `TORCHINDUCTOR_CPP_WRAPPER=1 python test/inductor/test_torchinductor_opinfo.py -k test_comprehensive_linalg_vander_cuda_float32`)

Differential Revision: [D79825121](https://our.internmc.facebook.com/intern/diff/D79825121)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159772
Approved by: https://github.com/NikhilAPatel, https://github.com/eellison
2025-08-08 14:27:38 +00:00
_awaits
_C Revert "Add unified memory APIs for torch.accelerator (#152932)" 2025-08-07 16:34:36 +00:00
_C_flatbuffer
_custom_op [BE]: ruff PLC0207 - use maxsplit kwarg (#160107) 2025-08-08 03:14:59 +00:00
_decomp (should_fold) gso to guard_or_false when checking whether to fold 3d bmm into 2d mm (#159184) 2025-07-30 03:12:14 +00:00
_dispatch Improve torch.ops typing (#154555) 2025-06-22 15:52:27 +00:00
_dynamo Fix infinite loop when iterating over an empty zip (#159673) 2025-08-08 02:50:21 +00:00
_export [Export Schema] Remove deviceAllocationMap field (#159653) 2025-08-07 07:31:42 +00:00
_functorch [MTIA] Allow users who know what they are doing to ignore all device mismatches in tracing and take a preferred device. (#159931) 2025-08-07 22:37:15 +00:00
_higher_order_ops [HOP, map] Rework of map autograd to the new interface (#153343) 2025-08-06 23:02:42 +00:00
_inductor [inductor][triton] support profile_scratch launcher arg (#159772) 2025-08-08 14:27:38 +00:00
_lazy [BE][2/16] fix typos in torch/ (torch/_*/) (#156312) 2025-07-12 05:47:06 +00:00
_library [inductor] respect layout tags for ops with registered lowerings (#159134) 2025-07-31 21:29:40 +00:00
_logging fix logging setup issue for Windows (#159887) 2025-08-05 23:44:38 +00:00
_numpy Fix torch._numpy to match NumPy when empty ellipsis causes advanced indexing separation (#158297) 2025-07-16 08:11:53 +00:00
_prims [BE]: ruff PLC0207 - use maxsplit kwarg (#160107) 2025-08-08 03:14:59 +00:00
_prims_common [Dynamo][Better Engineering] Add typing annotations to guard and source (#158397) (#159491) 2025-07-30 22:57:50 +00:00
_refs [BE][2/16] fix typos in torch/ (torch/_*/) (#156312) 2025-07-12 05:47:06 +00:00
_strobelight [BE][2/16] fix typos in torch/ (torch/_*/) (#156312) 2025-07-12 05:47:06 +00:00
_subclasses [MTIA] Allow users who know what they are doing to ignore all device mismatches in tracing and take a preferred device. (#159931) 2025-08-07 22:37:15 +00:00
_vendor
accelerator Revert "Add unified memory APIs for torch.accelerator (#152932)" 2025-08-07 16:34:36 +00:00
amp Fix autocast context manager when there is exception (#159565) 2025-08-01 02:12:24 +00:00
ao [BE]: ruff PLC0207 - use maxsplit kwarg (#160107) 2025-08-08 03:14:59 +00:00
autograd Fix types in graphs.py (#158192) 2025-07-15 19:49:38 +00:00
backends fixed typo (#159451) 2025-07-30 17:41:30 +00:00
compiler Add torch compile force disable caches alias (#158072) 2025-08-02 23:23:17 +00:00
contrib
cpu
csrc Extend torch function support to ALL arguments, not just scalar type (but not insides of list) (#145089) 2025-08-07 23:43:53 +00:00
cuda Revert "Add unified memory APIs for torch.accelerator (#152932)" 2025-08-07 16:34:36 +00:00
distributed [SymmMem] Send tensors with unerased type information to NVSHMEM Triton kernels (#159788) 2025-08-08 05:20:42 +00:00
distributions [BE][1/16] fix typos in torch/ (#156311) 2025-07-09 11:02:22 +00:00
export [export] Apply move_to_device_pass to all submodules (#159992) 2025-08-07 18:51:15 +00:00
fft [BE][PYFMT] migrate PYFMT for torch/[e-n]*/ to ruff format (#144553) 2025-06-17 08:18:47 +00:00
func
futures Simplify the base classes of _PyFutureMeta (#157757) 2025-07-08 15:39:56 +00:00
fx [BE]: ruff PLC0207 - use maxsplit kwarg (#160107) 2025-08-08 03:14:59 +00:00
headeronly [Reland] Migrate ScalarType to headeronly (#159911) 2025-08-06 07:36:37 +00:00
jit [4/n] Remove references to TorchScript in PyTorch docs (#158317) 2025-07-16 20:01:34 +00:00
legacy
lib [2/N] Fix cppcoreguidelines-init-variables suppression (#146237) 2025-06-19 23:26:42 +00:00
linalg
masked Fix MaskedTensor to device ignored mask (#151205) 2025-07-21 21:44:49 +00:00
monitor
mps [BE][12/16] fix typos in torch/ (#156602) 2025-07-02 22:55:29 +00:00
mtia [Re-land][Inductor] Support native Inductor as backend for MTIA (#159211) 2025-07-29 17:03:24 +00:00
multiprocessing [BE][12/16] fix typos in torch/ (#156602) 2025-07-02 22:55:29 +00:00
nativert turn on execution frame cleanup by default (#160110) 2025-08-08 02:13:48 +00:00
nested Add minimal nn.functional.log_softmax support for NestedTensor (#159662) 2025-08-06 20:34:02 +00:00
nn Allow register_buffer with Tensor-like object (#159455) 2025-08-01 15:31:38 +00:00
onnx Make onnx export SDPA match aten behavior (#159973) 2025-08-07 04:06:07 +00:00
optim Detach tensor before clone in SGD optimiser and other code (#159204) 2025-07-27 03:31:12 +00:00
package [BE][PYFMT] migrate PYFMT for torch/[p-z]*/ to ruff format (#144552) 2025-08-07 00:09:56 +00:00
profiler [BE][PYFMT] migrate PYFMT for torch/[p-z]*/ to ruff format (#144552) 2025-08-07 00:09:56 +00:00
quantization [BE][PYFMT] migrate PYFMT for torch/[p-z]*/ to ruff format (#144552) 2025-08-07 00:09:56 +00:00
signal [BE][PYFMT] migrate PYFMT for torch/[p-z]*/ to ruff format (#144552) 2025-08-07 00:09:56 +00:00
sparse [BE][PYFMT] migrate PYFMT for torch/[p-z]*/ to ruff format (#144552) 2025-08-07 00:09:56 +00:00
special [BE][PYFMT] migrate PYFMT for torch/[p-z]*/ to ruff format (#144552) 2025-08-07 00:09:56 +00:00
testing [BE]: ruff PLC0207 - use maxsplit kwarg (#160107) 2025-08-08 03:14:59 +00:00
utils dataclass pytree fix (#159916) 2025-08-07 08:22:41 +00:00
xpu [BE][PYFMT] migrate PYFMT for torch/[p-z]*/ to ruff format (#144552) 2025-08-07 00:09:56 +00:00
__config__.py
__future__.py
__init__.py [BE] remove torch deploy - conditionals (#158288) 2025-07-29 17:40:49 +00:00
_appdirs.py
_classes.py remove allow-untyped-defs from torch/_classes.py (#157231) 2025-07-08 00:11:52 +00:00
_compile.py [precompile] Ensure @disable()-ed function won't trigger recompile from precompile bytecode. (#155363) 2025-06-10 16:13:38 +00:00
_custom_ops.py
_environment.py
_guards.py [Dynamo][Better Engineering] Typing torch/_dynamo/guards.py (#159315) 2025-08-06 21:52:14 +00:00
_jit_internal.py [BE][1/16] fix typos in torch/ (#156311) 2025-07-09 11:02:22 +00:00
_linalg_utils.py Update is_sparse doc to mention that it is sparse_coo specific (#157378) 2025-07-09 18:22:14 +00:00
_lobpcg.py [BE][1/16] fix typos in torch/ (#156311) 2025-07-09 11:02:22 +00:00
_lowrank.py [BE][1/16] fix typos in torch/ (#156311) 2025-07-09 11:02:22 +00:00
_meta_registrations.py Add meta kernel for sdpa_math_for_mps (#159695) 2025-08-05 22:27:06 +00:00
_namedtensor_internals.py
_ops.py [BE] remove torch deploy - conditionals (#158288) 2025-07-29 17:40:49 +00:00
_python_dispatcher.py Typo fixes for "overridden" in comments and function names (#155944) 2025-06-14 03:37:38 +00:00
_size_docs.py
_sources.py
_storage_docs.py Fix docstring for torch.UntypedStorage.from_file (#155067) 2025-06-05 14:30:49 +00:00
_streambase.py
_tensor_docs.py Add missing optional for tensor ops (#159028) 2025-07-25 04:36:55 +00:00
_tensor_str.py Fix max_width computation in _tensor_str._Formatter (#126859) 2025-08-01 15:05:41 +00:00
_tensor.py [MPS] Enable dlpack integration (#158888) 2025-07-24 18:05:41 +00:00
_thread_safe_fork.py
_torch_docs.py Update the signature and test of torch.hamming_window() (#152682) 2025-08-04 17:50:42 +00:00
_utils_internal.py Wire in pt2_triton_builds (#159897) 2025-08-06 07:39:51 +00:00
_utils.py [BE][1/16] fix typos in torch/ (#156311) 2025-07-09 11:02:22 +00:00
_VF.py
_vmap_internals.py
_weights_only_unpickler.py
CMakeLists.txt Migrate c10/macros/cmake_macros.h.in to torch/headeronly (#158035) 2025-07-15 19:52:59 +00:00
custom_class_detail.h
custom_class.h [BE][1/16] fix typos in torch/ (#156311) 2025-07-09 11:02:22 +00:00
extension.h
functional.py Fix atleast_{1,2,3}d() with no arguments description (#156042) 2025-07-28 06:25:23 +00:00
header_only_apis.txt [Reland] Migrate ScalarType to headeronly (#159911) 2025-08-06 07:36:37 +00:00
hub.py [BE][1/16] fix typos in torch/ (#156311) 2025-07-09 11:02:22 +00:00
library.h [BE][1/16] fix typos in torch/ (#156311) 2025-07-09 11:02:22 +00:00
library.py [BE] remove torch deploy - conditionals (#158288) 2025-07-29 17:40:49 +00:00
overrides.py Add basic torch.hash_tensor op (#154149) 2025-07-23 22:28:03 +00:00
py.typed
quasirandom.py
random.py
return_types.py
script.h
serialization.py Reduce random reads for offset metadata when calling torch.load under FakeTensorMode (#157931) 2025-07-17 22:17:52 +00:00
storage.py mypy 1.16.0 (#155821) 2025-06-14 18:18:43 +00:00
torch_version.py
types.py
version.py.tpl