pytorch/torch/_inductor/codegen
Elias Ellison f18f0c70ab Dont clone unmutated args in triton autotuning (#89519)
Improves first-run memory compression on pytorch_struct from .55 to .73. However, this doesn't totally eliminate the overhead from autotuning; any other pointers on where the remaining overhead comes from would be great.
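For context, a minimal sketch of the idea (the helper name and call site below are hypothetical illustrations, not the actual inductor code): before each autotuning benchmark run, defensively copy only the arguments the kernel mutates, instead of cloning every argument, so large read-only inputs are not duplicated.

```python
import torch

def clone_mutated_args(args, mutated_indices):
    """Clone only the tensors the kernel writes to, so each benchmark run
    sees fresh data without copying every (possibly large) argument."""
    cloned = list(args)
    for i in mutated_indices:
        if isinstance(cloned[i], torch.Tensor):
            cloned[i] = cloned[i].clone()
    return cloned

# Hypothetical usage: only `out` (index 2) is written by the kernel,
# so only it needs a defensive copy before benchmarking.
a = torch.randn(1024)
b = torch.randn(1024)
out = torch.empty(1024)
bench_args = clone_mutated_args((a, b, out), mutated_indices=[2])
```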

Edit: I think it's just the Triton cache clearing in 44f577984d/python/triton/testing.py (L159).
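That line refers to the pattern in Triton's `do_bench`, which clears the GPU L2 cache between timed runs by zeroing a large scratch buffer. A paraphrased sketch of that pattern (buffer size and names recalled from memory and may not match that revision exactly; requires a CUDA device):

```python
import torch

# Large scratch buffer; zeroing it evicts prior data from the GPU's L2 cache.
cache = torch.empty(int(256e6), dtype=torch.int8, device="cuda")

def bench_once(kernel_call):
    cache.zero_()   # flush L2 so the timed run starts with a cold cache
    kernel_call()   # the kernel launch being timed
```

Clearing the cache makes each measurement start cold, which helps timing fidelity but adds per-run cost during autotuning.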

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89519
Approved by: https://github.com/ngimel, https://github.com/jansel
2022-11-23 22:00:03 +00:00
__init__.py
autotuner.py
common.py [inductor] Fix nan handling for aten.sign (#88937) 2022-11-21 20:56:40 +00:00
cpp_prefix.h Support masked_fill to address the GPT2 performance issue (#89274) 2022-11-22 04:12:43 +00:00
cpp.py [inductor] generate nan in the cpp backend (#89289) 2022-11-22 15:54:04 +00:00
triton_conv_delta_x_hwc.j2
triton_conv_delta_x.j2
triton_mm.j2
triton_template.py [inductor] fix could not find as_strided with config.triton.mm=triton (#88946) 2022-11-15 00:48:49 +00:00
triton.py Dont clone unmutated args in triton autotuning (#89519) 2022-11-23 22:00:03 +00:00
wrapper.py Have kernel names include fused ops (#88624) 2022-11-10 21:38:06 +00:00