mirror of
https://github.com/zebrajr/pytorch.git
synced 2025-12-07 00:21:07 +01:00
Debugging illegal memory access is hard; even setting CUDA_LAUNCH_BLOCKING=1 and using C10_CUDA_KERNEL_LAUNCH_CHECK doesn't necessarily guarantee that you'll get a stack trace pointing to the right kernel. This diff adds a config option to force a CUDA synchronize after every kernel call in inductor, for debugging those tricky cases. Differential Revision: [D41744967](https://our.internmc.facebook.com/intern/diff/D41744967/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90472 Approved by: https://github.com/jansel |
||
|---|---|---|
| __init__.py | ||
| autotuner.py | ||
| common.py | ||
| cpp_prefix.h | ||
| cpp.py | ||
| triton_conv_delta_x_hwc.j2 | ||
| triton_conv_delta_x.j2 | ||
| triton_mm.j2 | ||
| triton_template.py | ||
| triton.py | ||
| wrapper.py | ||
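
The commit message above describes forcing a synchronize after every kernel call so that an asynchronous failure is reported at the kernel that caused it. The following is a minimal, torch-free sketch of that idea, not inductor's actual implementation: `FakeStream`, `run_kernels`, and the flag name `sync_after_each_kernel` are all hypothetical names invented for illustration, and the toy stream only imitates the property that errors surface at synchronization time rather than at launch time.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Kernel = Tuple[str, Callable[[], None]]


@dataclass
class FakeStream:
    """Toy stand-in for an asynchronous stream (hypothetical, not a CUDA API).

    Launches are only queued; a faulty kernel raises when the queue is drained.
    """
    pending: List[Kernel] = field(default_factory=list)

    def launch(self, name: str, fn: Callable[[], None]) -> None:
        self.pending.append((name, fn))

    def synchronize(self) -> None:
        queued, self.pending = self.pending, []
        for _name, fn in queued:
            fn()  # errors show up here, not at launch time


def run_kernels(stream: FakeStream, kernels: List[Kernel],
                sync_after_each_kernel: bool = False) -> Optional[str]:
    """Return the call site where an error was observed, or None if all succeeded."""
    for name, fn in kernels:
        stream.launch(name, fn)
        if sync_after_each_kernel:  # hypothetical debug flag
            try:
                stream.synchronize()
            except RuntimeError:
                return name  # error is attributed to the kernel just launched
    try:
        stream.synchronize()
    except RuntimeError:
        return "final synchronize"  # error seen far from the faulty launch
    return None


def ok() -> None:
    pass


def bad() -> None:
    raise RuntimeError("illegal memory access (simulated)")


kernels = [("kernel_a", ok), ("kernel_b", bad), ("kernel_c", ok)]
without_flag = run_kernels(FakeStream(), kernels)
with_flag = run_kernels(FakeStream(), kernels, sync_after_each_kernel=True)
```

Without the flag, the simulated failure is only observed at the final synchronize; with it, the failure is observed right after the launch of the faulty kernel, at the cost of serializing every launch.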