pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Prachi Gupta b8f4dc5a9f [ROCm] opportunistic fastatomics for ReduceAdd operations for MI300 GPUs (#146264 )	2025-04-22 21:55:40 +00:00

In this approach, we catch any lanes within a wave that perform fast atomics to the same destination address and compute their sum on the CU. This yields a 3x improvement in scatter_add performance and a 2x improvement in index_select.

scatter_add performance on MI300X (median time in ms, as returned by `triton.testing.do_bench`):

dtype | Baseline (before optimizations) | Opportunistic fastatomics
------|---------------------------------|--------------------------
f32   | 1.389425039                     | 0.430447996
fp16  | 2.195472956                     | 0.779729486
bf16  | 2.194051027                     | 0.784599513

Measured with the following reproducer:

```
import torch
import triton


def main():
    dtype = torch.float32
    dim = 1305301
    # On ROCm builds, PyTorch exposes AMD GPUs through the "cuda" device type.
    a = torch.rand(100, device="cuda", dtype=dtype)
    index = torch.randint(0, 100, (dim,), device="cuda")
    src = torch.rand(dim, device="cuda", dtype=dtype)
    print("=" * 20)
    print(
        triton.testing.do_bench(
            lambda: a.scatter_add(0, index, src),
            return_mode="median",
        )
    )
    print("=" * 20)


if __name__ == "__main__":
    main()
```

Co-authored by: @amd-hhashemi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146264
Approved by: https://github.com/jeffdaily, https://github.com/mxz297
Co-authored-by: Hashem Hashemi <hashem.hashemi@amd.com>
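The same-address aggregation described in this commit can be illustrated with a small pure-Python simulation. This is a hypothetical sketch, not the HIP kernel from #146264: the helper names `aggregate_wave_atomics` and `scatter_add_simulated` are made up for illustration, the 64-lane wave size is an assumed MI300 wavefront width, and a Python dict stands in for the lanes discovering which peers share a destination address.

```python
from collections import defaultdict


def aggregate_wave_atomics(addresses, values):
    """Combine one wave's atomic adds that target the same address.

    Each position in `addresses`/`values` models one lane. Lanes sharing a
    destination are summed locally, so only one atomic add per unique
    address has to be issued. Returns (partial_sums, atomics_issued).
    """
    if len(addresses) != len(values):
        raise ValueError("every lane needs exactly one address and one value")
    partial_sums = defaultdict(float)
    for address, value in zip(addresses, values):
        partial_sums[address] += value
    return dict(partial_sums), len(partial_sums)


def scatter_add_simulated(dest, index, src, wave_size=64):
    """Simulated 1-D scatter_add: dest[index[i]] += src[i] for every i.

    The work is split into waves of `wave_size` lanes (64 is an assumed
    MI300 wavefront width). Returns (result, total_atomics_issued).
    """
    result = list(dest)  # leave the caller's list untouched
    total_atomics = 0
    for start in range(0, len(index), wave_size):
        sums, issued = aggregate_wave_atomics(
            index[start:start + wave_size], src[start:start + wave_size]
        )
        for address, partial in sums.items():
            result[address] += partial  # stands in for one hardware atomic add
        total_atomics += issued
    return result, total_atomics


if __name__ == "__main__":
    index = [0, 1, 0, 0, 3, 1]
    src = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    result, atomics = scatter_add_simulated([0.0] * 4, index, src)
    print(result, atomics)  # prints [8.0, 8.0, 0.0, 5.0] 3 (6 atomics without aggregation)
```

In this toy input a single wave issues 3 atomic adds instead of 6. The saving grows with the collision rate, which is high in the reproducer above, where about 1.3M indices map onto only 100 destinations. Because lanes are summed before the atomic, floating-point additions happen in a different order than a lane-by-lane loop, so low-order bits can differ.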
_strobelight
_sympy	Add ccode for CeilToInt and IntTrueDiv (#151375 )	2025-04-16 16:47:55 +00:00
backcompat
benchmark	[BE][Ez]: Use itertools.chain.from_iterable when possible (#148190 )	2025-03-06 20:37:06 +00:00
bottleneck
data	Optimize dataloader Self typing (#146816 )	2025-04-08 03:52:23 +00:00
hipify	[fbgemm_gpu] Incorporate Torch DSA (#151148 )	2025-04-15 11:34:04 +00:00
jit
model_dump
serialization	Make record/storage alignment in torch.save configurable (#147788 )	2025-03-06 12:04:46 +00:00
tensorboard	Define `__all__` for `torch.utils.tensorboard` (#147550 )	2025-02-28 23:06:11 +00:00
viz	Fix `ReferenceError: weakly-referenced object no longer exists` in cycle detector (#146922 )	2025-02-24 22:27:39 +00:00
__init__.py
_appending_byte_serializer.py	[MegaCache] Encode key in base64 (#151472 )	2025-04-17 17:12:22 +00:00
_backport_slots.py
_config_module.py	Revert "[dynamo] context manager/decorator for dynamo config patching during tracing (#150586 )"	2025-04-16 16:13:47 +00:00
_config_typing.pyi
_content_store.py	Revert "Use the device interface for detecting Triton availability (#139171 )"	2025-03-11 18:49:21 +00:00
_contextlib.py
_cpp_embed_headers.py
_cpp_extension_versioner.py
_cxx_pytree.py	Gracefully handle optree less than minimum version, part 2 (#151257 )	2025-04-15 13:08:26 +00:00
_device.py	Remove torch functions that do not support device arguments from _device_constructor (#150290 )	2025-04-08 15:13:55 +00:00
_exposed_in.py
_filelock.py
_foreach_utils.py	[HPU] Add hpu to fused kernels supported devices (#148666 )	2025-03-07 04:28:33 +00:00
_freeze.py
_functools.py
_get_clean_triton.py	Reland: [inductor] Simplify grid handling (#148305 )	2025-03-12 15:52:16 +00:00
_import_utils.py
_mode_utils.py
_ordered_set.py
_python_dispatch.py
_pytree.py	Gracefully handle optree less than minimum version, part 2 (#151257 )	2025-04-15 13:08:26 +00:00
_stats.py
_thunk.py
_traceback.py
_triton.py	[Inductor] Remove triton dtype patch which has landed (#149611 )	2025-04-10 03:42:55 +00:00
_typing_utils.py
_zip.py
backend_registration.py
bundled_inputs.py
checkpoint.py
collect_env.py	collect_env: gracefully handle no pip (#151607 )	2025-04-18 12:28:58 +00:00
cpp_backtrace.py
cpp_extension.py	[ROCm] opportunistic fastatomics for ReduceAdd operations for MI300 GPUs (#146264 )	2025-04-22 21:55:40 +00:00
deterministic.py
dlpack.py	Add `__all__` for `torch.utils.dlpack` (#149026 )	2025-04-11 22:03:24 +00:00
file_baton.py	Warn user of existing lock file to avoid infinite waiting (#149382 )	2025-04-15 20:25:29 +00:00
flop_counter.py
hooks.py
mkldnn.py
mobile_optimizer.py
model_zoo.py
module_tracker.py
show_pickle.py
throughput_benchmark.py
weak.py