pytorch/caffe2/core/macros.h.in
hongxyan 66a76516bf [ROCm] Disabling Kernel Asserts for ROCm by default - fix and clean up and refactoring (#114660)
Related to #103973  #110532 #108404 #94891

**Context:**
As commented in 6ae0554d11/cmake/Dependencies.cmake (L1198)
Kernel asserts are enabled by default for CUDA and disabled for ROCm.
However it is somewhat broken, and Kernel assert was still enabled for ROCm.

Disabling kernel assert is also needed for users who do not have PCIe atomics support. These community users have verified that disabling the kernel assert in PyTorch/ROCm platform fixed their pytorch workflow, like torch.sum script, stable-diffusion. (see the related issues)

**Changes:**

This pull request serves the following purposes:
* Refactor and clean up the logic,  make it simpler for ROCm to enable and disable Kernel Asserts
* Fix the bug that Kernel Asserts for ROCm was not disabled by default.

Specifically,
- Renamed `TORCH_DISABLE_GPU_ASSERTS` to `C10_USE_ROCM_KERNEL_ASSERT` for the following reasons:
(1) This variable only applies to ROCm.
(2) The new name is more align with #define CUDA_KERNEL_ASSERT function.
(3) With USE_ in front of the name, we can easily control it with environment variable to turn on and off this feature during build (e.g. `USE_ROCM_KERNEL_ASSERT=1 python setup.py develop` will enable kernel assert for ROCm build).
- Get rid of the `ROCM_FORCE_ENABLE_GPU_ASSERTS' to simplify the logic and make it easier to understand and maintain
- Added `#cmakedefine` to carry over the CMake variable to C++

**Tests:**
(1) build with default mode and verify that USE_ROCM_KERNEL_ASSERT  is OFF(0), and kernel assert is disabled:

```
python setup.py develop
```
Verify CMakeCache.txt has correct value.
```
/xxxx/pytorch/build$ grep USE_ROCM_KERNEL_ASSERT CMakeCache.txt
USE_ROCM_KERNEL_ASSERT:BOOL=0
```
Tested the following code in ROCm build and CUDA build, and expected the return code differently.

```
subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"])
```
This piece of code is adapted from below unit test to get around the limitation that this unit test now was skipped for ROCm. (We will check to enable this unit test in the future)

```
python test/test_cuda_expandable_segments.py -k test_fixed_cuda_assert_async
```

Ran the following script, expecting r ==0 since the CUDA_KERNEL_ASSERT is defined as nothing:
```
>> import sys
>>> import subprocess
>>> r=subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"])
>>> r
0
```

(2) Enable the kernel assert by building with USE_ROCM_KERNEL_ASSERT=1, or USE_ROCM_KERNEL_ASSERT=ON
```
USE_ROCM_KERNEL_ASSERT=1 python setup.py develop
```

Verify `USE_ROCM_KERNEL_ASSERT` is `1`
```
/xxxx/pytorch/build$ grep USE_ROCM_KERNEL_ASSERT CMakeCache.txt
USE_ROCM_KERNEL_ASSERT:BOOL=1
```

Run the assert test, and expected return code not equal to 0.

```
>> import sys
>>> import subprocess
>>> r=subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"])
>>>/xxxx/pytorch/aten/src/ATen/native/hip/TensorCompare.hip:108: _assert_async_cuda_kernel: Device-side assertion `input[0] != 0' failed.
:0:rocdevice.cpp            :2690: 2435301199202 us: [pid:206019 tid:0x7f6cf0a77700] Callback: Queue 0x7f64e8400000 aborting with error : HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception. code: 0x1016

>>> r
-6
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114660
Approved by: https://github.com/jeffdaily, https://github.com/malfet, https://github.com/jithunnair-amd
2023-12-13 15:44:53 +00:00

73 lines
2.6 KiB
C

// Automatically generated header file for caffe2 macros. These
// macros are used to build the Caffe2 binary, and if you are
// building a dependent library, they will need to be set as well
// for your program to link correctly.
#pragma once
#cmakedefine CAFFE2_BUILD_SHARED_LIBS
#cmakedefine CAFFE2_FORCE_FALLBACK_CUDA_MPI
#cmakedefine CAFFE2_HAS_MKL_DNN
#cmakedefine CAFFE2_HAS_MKL_SGEMM_PACK
#cmakedefine CAFFE2_PERF_WITH_AVX
#cmakedefine CAFFE2_PERF_WITH_AVX2
#cmakedefine CAFFE2_PERF_WITH_AVX512
#cmakedefine CAFFE2_THREADPOOL_MAIN_IMBALANCE
#cmakedefine CAFFE2_THREADPOOL_STATS
#cmakedefine CAFFE2_USE_EXCEPTION_PTR
#cmakedefine CAFFE2_USE_ACCELERATE
#cmakedefine CAFFE2_USE_CUDNN
#cmakedefine CAFFE2_USE_EIGEN_FOR_BLAS
#cmakedefine CAFFE2_USE_FBCODE
#cmakedefine CAFFE2_USE_GOOGLE_GLOG
#cmakedefine CAFFE2_USE_LITE_PROTO
#cmakedefine CAFFE2_USE_MKL
#cmakedefine USE_MKLDNN
#cmakedefine CAFFE2_USE_NVTX
#cmakedefine CAFFE2_USE_ITT
#cmakedefine CAFFE2_USE_TRT
#ifndef EIGEN_MPL2_ONLY
#cmakedefine EIGEN_MPL2_ONLY
#endif
// Useful build settings that are recorded in the compiled binary
// torch.__build__.show()
#define CAFFE2_BUILD_STRINGS { \
{"TORCH_VERSION", "${TORCH_VERSION}"}, \
{"CXX_COMPILER", "${CMAKE_CXX_COMPILER}"}, \
{"CXX_FLAGS", "${CMAKE_CXX_FLAGS}"}, \
{"BUILD_TYPE", "${CMAKE_BUILD_TYPE}"}, \
{"BLAS_INFO", "${BLAS_INFO}"}, \
{"LAPACK_INFO", "${LAPACK_INFO}"}, \
{"USE_CUDA", "${USE_CUDA}"}, \
{"USE_ROCM", "${USE_ROCM}"}, \
{"CUDA_VERSION", "${CUDA_VERSION}"}, \
{"ROCM_VERSION", "${ROCM_VERSION}"}, \
{"USE_CUDNN", "${USE_CUDNN}"}, \
{"CUDNN_VERSION", "${CUDNN_VERSION}"}, \
{"USE_NCCL", "${USE_NCCL}"}, \
{"USE_MPI", "${USE_MPI}"}, \
{"USE_GFLAGS", "${USE_GFLAGS}"}, \
{"USE_GLOG", "${USE_GLOG}"}, \
{"USE_GLOO", "${USE_GLOI}"}, \
{"USE_NNPACK", "${USE_NNPACK}"}, \
{"USE_OPENMP", "${USE_OPENMP}"}, \
{"FORCE_FALLBACK_CUDA_MPI", "${CAFFE2_FORCE_FALLBACK_CUDA_MPI}"}, \
{"HAS_MKL_DNN", "${CAFFE2_HAS_MKL_DNN}"}, \
{"HAS_MKL_SGEMM_PACK", "${CAFFE2_HAS_MKL_SGEMM_PACK}"}, \
{"PERF_WITH_AVX", "${CAFFE2_PERF_WITH_AVX}"}, \
{"PERF_WITH_AVX2", "${CAFFE2_PERF_WITH_AVX2}"}, \
{"PERF_WITH_AVX512", "${CAFFE2_PERF_WITH_AVX512}"}, \
{"USE_EXCEPTION_PTR", "${CAFFE2_USE_EXCEPTION_PTR}"}, \
{"USE_ACCELERATE", "${CAFFE2_USE_ACCELERATE}"}, \
{"USE_EIGEN_FOR_BLAS", "${CAFFE2_USE_EIGEN_FOR_BLAS}"}, \
{"USE_LITE_PROTO", "${CAFFE2_USE_LITE_PROTO}"}, \
{"USE_MKL", "${CAFFE2_USE_MKL}"}, \
{"USE_MKLDNN", "${USE_MKLDNN}"}, \
{"USE_NVTX", "${CAFFE2_USE_NVTX}"}, \
{"USE_ITT", "${CAFFE2_USE_ITT}"}, \
{"USE_TRT", "${CAFFE2_USE_TRT}"}, \
{"USE_ROCM_KERNEL_ASSERT", "${USE_ROCM_KERNEL_ASSERT}"}, \
}