pytorch/torch/_inductor/codegen
Bert Maher d3d85e1c3b Emit torch.cuda.synchronize() after every kernel call in inductor (#90472)
Debugging illegal memory access is hard; even CUDA_LAUNCH_BLOCKING=1
and C10_CUDA_KERNEL_LAUNCH_CHECK don't necessarily yield a stack trace
pointing at the right kernel.  This diff adds a config option to force a
CUDA synchronize after every kernel call in inductor, for debugging those
tricky cases.
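As a rough sketch of the idea (names here are illustrative, not the actual inductor config or codegen API): with the debug option on, the wrapper code generator appends a device-wide synchronize after each kernel launch, so an illegal memory access surfaces at the offending call site rather than at some later, unrelated point.

```python
# Hypothetical sketch of the codegen behavior; DEBUG_SYNC_AFTER_KERNEL
# stands in for the real inductor config option added by this PR.
DEBUG_SYNC_AFTER_KERNEL = True

def emit_kernel_call(lines, kernel_name, args):
    """Append a kernel call to the generated wrapper source and, in debug
    mode, a torch.cuda.synchronize() right after it."""
    lines.append(f"{kernel_name}({', '.join(args)})")
    if DEBUG_SYNC_AFTER_KERNEL:
        # Blocks until the kernel finishes, so an async CUDA fault is
        # reported here instead of at a later, unrelated launch.
        lines.append("torch.cuda.synchronize()")
    return lines

wrapper = emit_kernel_call([], "triton_fused_add_relu_0", ["buf0", "buf1"])
```

The cost is serializing every launch, which is why this is a debug-only switch rather than the default.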

Differential Revision: [D41744967](https://our.internmc.facebook.com/intern/diff/D41744967/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90472
Approved by: https://github.com/jansel
2022-12-12 04:35:10 +00:00
__init__.py
autotuner.py
common.py Revert "[inductor] New approach for computing triton load/store masks (#89566)" 2022-12-09 19:36:25 +00:00
cpp_prefix.h Support masked_fill to address the GPT2 performance issue (#89274) 2022-11-22 04:12:43 +00:00
cpp.py Emit torch.cuda.synchronize() after every kernel call in inductor (#90472) 2022-12-12 04:35:10 +00:00
triton_conv_delta_x_hwc.j2
triton_conv_delta_x.j2
triton_mm.j2
triton_template.py [inductor] fix could not find as_strided with config.triton.mm=triton (#88946) 2022-11-15 00:48:49 +00:00
triton.py Emit torch.cuda.synchronize() after every kernel call in inductor (#90472) 2022-12-12 04:35:10 +00:00
wrapper.py Emit torch.cuda.synchronize() after every kernel call in inductor (#90472) 2022-12-12 04:35:10 +00:00