pytorch/torch/headeronly
Sudarshan Raghunathan ad3a56ab98 Add a compile-time flag to trigger verbose logging for device-side asserts (#166171)
Summary:
Using `CUDA_KERNEL_ASSERT_PRINTF` inside kernels allows us to log invalid values to the console (that can be in turn used to surface _hopefully_ more clearer error messages).

This does have an impact in the number of registers needed for the values being logged (I confirmed via diffing PTX that there is no other impact relative to using `__assert_fail`)

To avoid causing perf bottlenecks, this change adds a compile-time switch to enable more verbose errors in some of the common kernels that cause DSAs. There is also a Buck config that can be used to configure this switch more conveniently.

## Alternatives considered
I considered making the behavior of `CUDA_KERNEL_ASSERT_PRINTF` controllable via a compile-time macro instead of writing another wrapper for it but there are kernels where the extra register pressure is not as severe and in those cases, having more useful error messages by default is pretty useful.

Test Plan:
## Simple Python Driver:
```
# scatter_errors.py
import torch
def main() -> None:
    a = torch.rand(128, device="cuda:0")
    idx = torch.randint(0, 128, (100,), device="cuda:0")
    idx[0] = 9999
    b = torch.scatter(a, 0, idx, 555.0)
    print(b)
```

When running normally via:
```
$ buck2 run @//mode/opt  :scatter_errors
```
we see the followng DSA message:
```
fbcode/caffe2/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:410: operator(): block: [0,0,0], thread: [0,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
```

Running via:
```
$  buck2 run @//mode/opt -c fbcode.c10_enable_verbose_assert=1 :scatter_errors
```
however produces:
```
[CUDA_KERNEL_ASSERT] fbcode/caffe2/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:410: operator(): block: [0,0,0], thread: [0,0,0]: Assertion failed: `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"`: Expected 0 <= idx_dim < index_size (128), but got idx_dim = 9999
```

Differential Revision: D85185987

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166171
Approved by: https://github.com/ngimel
2025-10-30 19:43:46 +00:00
..
core Hide APIs in torch::headeronly (#166079) 2025-10-25 00:18:26 +00:00
cpu/vec Hide APIs in torch::headeronly (#166079) 2025-10-25 00:18:26 +00:00
macros Add a compile-time flag to trigger verbose logging for device-side asserts (#166171) 2025-10-30 19:43:46 +00:00
util Hide APIs in torch::headeronly (#166079) 2025-10-25 00:18:26 +00:00
BUCK.oss Migrate c10/macros/cmake_macros.h.in to torch/headeronly (#158035) 2025-07-15 19:52:59 +00:00
BUILD.bazel Migrate c10/macros/cmake_macros.h.in to torch/headeronly (#158035) 2025-07-15 19:52:59 +00:00
build.bzl Move version.h to torch/headeronly (#164381) 2025-10-07 17:47:30 +00:00
CMakeLists.txt [Reland] Migrate ScalarType to headeronly (#159911) 2025-08-06 07:36:37 +00:00
ovrsource_defs.bzl Move version.h to torch/headeronly (#164381) 2025-10-07 17:47:30 +00:00
version.h.in Add TORCH_TARGET_VERSION for stable ABI (#164356) 2025-10-29 15:41:28 +00:00