pytorch/torch/_inductor/codegen
Bert Maher d3d85e1c3b Emit torch.cuda.synchronize() after every kernel call in inductor (#90472)
Debugging illegal memory access is hard; even CUDA_LAUNCH_BLOCKING=1
and C10_CUDA_KERNEL_LAUNCH_CHECK don't necessarily yield a stack trace
pointing at the right kernel.  This diff adds a config option to force a
CUDA synchronize after every kernel call in inductor, for debugging those
tricky cases.
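As a rough sketch of the idea (names here are illustrative, not the actual inductor config or codegen API): with the debug option on, the wrapper code generator appends a device-wide synchronize after each kernel launch, so an illegal memory access surfaces at the offending call site rather than at some later, unrelated point.

```python
# Hypothetical sketch of the codegen behavior; DEBUG_SYNC_AFTER_KERNEL
# stands in for the real inductor config option added by this PR.
DEBUG_SYNC_AFTER_KERNEL = True

def emit_kernel_call(lines, kernel_name, args):
    """Append a kernel call to the generated wrapper source and, in debug
    mode, a torch.cuda.synchronize() right after it."""
    lines.append(f"{kernel_name}({', '.join(args)})")
    if DEBUG_SYNC_AFTER_KERNEL:
        # Blocks until the kernel finishes, so an async CUDA fault is
        # reported here instead of at a later, unrelated launch.
        lines.append("torch.cuda.synchronize()")
    return lines

wrapper = emit_kernel_call([], "triton_fused_add_relu_0", ["buf0", "buf1"])
```

The cost is serializing every launch, which is why this is a debug-only switch rather than the default.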

Differential Revision: [D41744967](https://our.internmc.facebook.com/intern/diff/D41744967/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90472
Approved by: https://github.com/jansel
2022-12-12 04:35:10 +00:00
__init__.py
autotuner.py
common.py Revert "[inductor] New approach for computing triton load/store masks (#89566)" 2022-12-09 19:36:25 +00:00
cpp_prefix.h Support masked_fill to address the GPT2 performance issue (#89274) 2022-11-22 04:12:43 +00:00
cpp.py Emit torch.cuda.synchronize() after every kernel call in inductor (#90472) 2022-12-12 04:35:10 +00:00
triton_conv_delta_x_hwc.j2
triton_conv_delta_x.j2
triton_mm.j2
triton_template.py [inductor] fix could not find as_strided with config.triton.mm=triton (#88946) 2022-11-15 00:48:49 +00:00
triton.py Emit torch.cuda.synchronize() after every kernel call in inductor (#90472) 2022-12-12 04:35:10 +00:00
wrapper.py Emit torch.cuda.synchronize() after every kernel call in inductor (#90472) 2022-12-12 04:35:10 +00:00