pytorch/torch/utils
Jason Ansel b040dc3a53 Reland: [inductor] Simplify grid handling (#148305)
Summary:
Relands D69965761 / https://github.com/pytorch/pytorch/pull/147583

Before this PR, calling a triton kernel would look like:
```py
kernel.run(a, b, xnumel, grid=grid(xnumel), stream=stream0)
```
where `grid=` received a callable (a function closure). This PR removes the `grid` argument:
```py
kernel.run(a, b, xnumel, stream=stream0)
```
Instead, the grid computation is now included in the kernel launcher, with something like:
```py
def launcher(in_ptr0, out_ptr0, xnumel, stream):
    grid_0 = ((xnumel + 1023) >> 10)  # ceil(xnumel / 1024): number of 1024-wide blocks
    grid_1 = 1
    grid_2 = 1
    runner(grid_0, grid_1, grid_2, stream, function, metadata, None, launch_enter_hook, launch_exit_hook, in_ptr0, out_ptr0, xnumel)
```
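The `grid_0` expression in the launcher above is a ceiling division written with a shift. Adding 1023 before shifting right by 10 rounds `xnumel / 1024` up, so a trailing partial block is still launched. Here is a minimal sketch that generalizes this to any power-of-two block size. The helper name `num_blocks` is hypothetical and is not part of Inductor:

```python
import math


def num_blocks(xnumel: int, xblock: int = 1024) -> int:
    """Number of xblock-wide blocks needed to cover xnumel elements (hypothetical helper)."""
    shift = xblock.bit_length() - 1  # log2(xblock) when xblock is a power of two
    if xblock <= 0 or 1 << shift != xblock:
        raise ValueError("xblock must be a positive power of two")
    # Adding (xblock - 1) before shifting turns floor division into ceiling
    # division; with xblock=1024 this is exactly ((xnumel + 1023) >> 10).
    return (xnumel + xblock - 1) >> shift


# The shift form agrees with an explicit ceiling over a range of sizes.
for n in range(5000):
    assert num_blocks(n) == math.ceil(n / 1024)
```

For example, `num_blocks(1025)` is 2: one full block and one partially filled block.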

This should be faster, since we remove multiple function/dict calls and are able to specialize the grid computation for each `triton.Config`.
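The following minimal, self-contained sketch makes the "specialize per `triton.Config`" point concrete by contrasting the two styles. It assumes a 1-D kernel whose only tunable is `XBLOCK`. The names `old_style_grid`, `make_launcher`, and the `meta` dict are hypothetical stand-ins, not Inductor's actual code. The generated launcher also returns the grid instead of calling a real Triton runner.

```python
def old_style_grid(xnumel):
    """Closure in the spirit of the old `grid=grid(xnumel)` argument: the
    block size is looked up in a per-config dict on every launch."""
    def grid_fn(meta):
        return (-(-xnumel // meta["XBLOCK"]), 1, 1)  # ceil(xnumel / XBLOCK)
    return grid_fn


def make_launcher(xblock):
    """Generate a launcher with XBLOCK baked into the grid formula, so no
    closure call or dict lookup is needed for the grid at launch time."""
    shift = xblock.bit_length() - 1
    assert xblock > 0 and 1 << shift == xblock, "XBLOCK must be a power of two"
    src = (
        "def launcher(xnumel):\n"
        f"    grid_0 = ((xnumel + {xblock - 1}) >> {shift})\n"
        "    grid_1 = 1\n"
        "    grid_2 = 1\n"
        "    return (grid_0, grid_1, grid_2)\n"
    )
    namespace = {}
    exec(src, namespace)
    return namespace["launcher"]


# One specialized launcher per config; both styles compute the same grid.
configs = [{"XBLOCK": 128}, {"XBLOCK": 1024}]
launchers = {cfg["XBLOCK"]: make_launcher(cfg["XBLOCK"]) for cfg in configs}
for cfg in configs:
    for xnumel in (1, 1000, 4097):
        assert launchers[cfg["XBLOCK"]](xnumel) == old_style_grid(xnumel)(cfg)
```

Generating source once per config costs a little extra codegen work. In exchange, the launch path computes the grid with plain arithmetic on constants, which is the kind of saving described above.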

It also allows us to unify the handling of grids between the Python and C++ wrapper code.  Before this, C++ wrapper code didn't actually support dynamic grid sizes and instead burned in a static grid.
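Plain arithmetic shows why a burned-in grid is a problem for dynamic sizes. This sketch assumes the static grid was computed once from a single size known at code-generation time: a hypothetical hint of 4096 elements, with the 1024-wide block from the launcher example. If the kernel is later called with more elements, the fixed grid launches too few blocks. With fewer elements, it launches extra blocks that do no useful work. A grid computed from `xnumel` at launch time always matches the input. All names here are hypothetical.

```python
XBLOCK = 1024  # assumed block size, matching the launcher example


def grid_for(xnumel: int) -> tuple[int, int, int]:
    # ceil(xnumel / XBLOCK) blocks along x, one along y and z
    return ((xnumel + XBLOCK - 1) // XBLOCK, 1, 1)


def elements_covered(grid: tuple[int, int, int]) -> int:
    return grid[0] * XBLOCK


# Static: evaluated once from a size hint, then reused for every call.
STATIC_GRID = grid_for(4096)

# With 5000 elements the static grid covers only 4 * 1024 = 4096 of them,
# while a grid recomputed from the actual size covers all of them.
assert elements_covered(STATIC_GRID) < 5000
assert elements_covered(grid_for(5000)) >= 5000
```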

This unification allows this PR to be a net deletion of code.

Differential Revision: D70471332

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148305
Approved by: https://github.com/shunting314, https://github.com/eellison
2025-03-12 15:52:16 +00:00

| Name | Last commit message | Last commit date |
| --- | --- | --- |
| `_strobelight` | [BE]: Enable ruff rule SIM113 (#147290) | 2025-02-16 22:41:16 +00:00 |
| `_sympy` | Add ccode for FloorDiv (#148727) | 2025-03-10 14:00:18 +00:00 |
| `backcompat` | | |
| `benchmark` | [BE][Ez]: Use itertools.chain.from_iterable when possible (#148190) | 2025-03-06 20:37:06 +00:00 |
| `bottleneck` | PEP585 update - torch/utils (#145201) | 2025-01-21 21:04:10 +00:00 |
| `data` | Move get accelerator to use build time flags when possible (#146098) | 2025-03-10 13:17:58 +00:00 |
| `hipify` | [ROCm] OCP FP8 Support for new GPUs (#146632) | 2025-02-24 22:47:52 +00:00 |
| `jit` | PEP585 update - torch/utils (#145201) | 2025-01-21 21:04:10 +00:00 |
| `model_dump` | PEP585: More UP006 fixes (#146392) | 2025-02-20 06:18:13 +00:00 |
| `serialization` | Make record/storage alignment in torch.save configurable (#147788) | 2025-03-06 12:04:46 +00:00 |
| `tensorboard` | Define `__all__` for torch.utils.tensorboard (#147550) | 2025-02-28 23:06:11 +00:00 |
| `viz` | Fix ReferenceError: weakly-referenced object no longer exists in cycle detector (#146922) | 2025-02-24 22:27:39 +00:00 |
| `__init__.py` | | |
| `_appending_byte_serializer.py` | Add AppendingByteSerializer class (#148226) | 2025-03-02 08:20:58 +00:00 |
| `_backport_slots.py` | PEP585 update - torch/utils (#145201) | 2025-01-21 21:04:10 +00:00 |
| `_config_module.py` | Enable ruff rule S324 (#147665) | 2025-02-25 18:27:34 +00:00 |
| `_config_typing.pyi` | | |
| `_content_store.py` | Revert "Use the device interface for detecting Triton availability (#139171)" | 2025-03-11 18:49:21 +00:00 |
| `_contextlib.py` | | |
| `_cpp_embed_headers.py` | [BE] Strip #pragma once when embedding the headers (#146871) | 2025-02-11 16:49:00 +00:00 |
| `_cpp_extension_versioner.py` | xpu: support sycl with torch.utils.cpp_extension APIs (#132945) | 2025-02-16 16:50:59 +00:00 |
| `_cxx_pytree.py` | Revert "[pytree] add APIs to determine a class is a namedtuple or PyStructSequence (#113257)" | 2025-03-10 17:19:21 +00:00 |
| `_device.py` | Revert "Fix torch.normal ignores default_device (#144070)" | 2025-01-14 17:41:58 +00:00 |
| `_exposed_in.py` | | |
| `_filelock.py` | | |
| `_foreach_utils.py` | [HPU] Add hpu to fused kernels supported devices (#148666) | 2025-03-07 04:28:33 +00:00 |
| `_freeze.py` | PEP585 update - torch/utils (#145201) | 2025-01-21 21:04:10 +00:00 |
| `_functools.py` | | |
| `_get_clean_triton.py` | Reland: [inductor] Simplify grid handling (#148305) | 2025-03-12 15:52:16 +00:00 |
| `_import_utils.py` | | |
| `_mode_utils.py` | | |
| `_ordered_set.py` | [BE]: Make OrderedSet reversible (#146904) | 2025-02-13 15:11:48 +00:00 |
| `_python_dispatch.py` | PEP585 update - torch/utils (#145201) | 2025-01-21 21:04:10 +00:00 |
| `_pytree.py` | Revert "[pytree] add APIs to determine a class is a namedtuple or PyStructSequence (#113257)" | 2025-03-10 17:19:21 +00:00 |
| `_stats.py` | PEP585 update - torch/utils (#145201) | 2025-01-21 21:04:10 +00:00 |
| `_thunk.py` | | |
| `_traceback.py` | PEP585 update - torch/utils (#145201) | 2025-01-21 21:04:10 +00:00 |
| `_triton.py` | Revert "Use the device interface for detecting Triton availability (#139171)" | 2025-03-11 18:49:21 +00:00 |
| `_typing_utils.py` | Revert "Fix type annotation of Linear.bias (#142326)" | 2025-01-26 03:41:00 +00:00 |
| `_zip.py` | | |
| `backend_registration.py` | PEP585 update - torch/utils (#145201) | 2025-01-21 21:04:10 +00:00 |
| `bundled_inputs.py` | PEP585 update - torch/utils (#145201) | 2025-01-21 21:04:10 +00:00 |
| `checkpoint.py` | | |
| `collect_env.py` | Revert "Collect packages with importlib in collect_env (#144616)" | 2025-01-13 03:11:04 +00:00 |
| `cpp_backtrace.py` | | |
| `cpp_extension.py` | doc/xpu: align description of SyclExtension with CPP/CUDA (#147988) | 2025-03-04 04:17:36 +00:00 |
| `deterministic.py` | | |
| `dlpack.py` | | |
| `file_baton.py` | | |
| `flop_counter.py` | [NJT] fix flop counter for SDPA & test (#147032) | 2025-02-13 07:14:58 +00:00 |
| `hooks.py` | PEP585 update - torch/utils (#145201) | 2025-01-21 21:04:10 +00:00 |
| `mkldnn.py` | | |
| `mobile_optimizer.py` | PEP585 update - torch/utils (#145201) | 2025-01-21 21:04:10 +00:00 |
| `model_zoo.py` | | |
| `module_tracker.py` | PEP585 update - torch/utils (#145201) | 2025-01-21 21:04:10 +00:00 |
| `show_pickle.py` | Use typing.IO[bytes] instead of io.BytesIO in annotations (#144994) | 2025-01-27 18:08:07 +00:00 |
| `throughput_benchmark.py` | | |
| `weak.py` | | |