pytorch/caffe2/core
hongxyan 66a76516bf [ROCm] Disabling Kernel Asserts for ROCm by default - fix and clean up and refactoring (#114660)
Related to #103973  #110532 #108404 #94891

**Context:**
As commented in 6ae0554d11/cmake/Dependencies.cmake (L1198),
kernel asserts are enabled by default for CUDA and disabled for ROCm.
However, this logic was somewhat broken, and kernel asserts were still enabled for ROCm.

Disabling kernel asserts is also needed for users who do not have PCIe atomics support. These community users have verified that disabling kernel asserts in PyTorch on the ROCm platform fixed their PyTorch workflows, such as torch.sum scripts and stable-diffusion (see the related issues).

**Changes:**

This pull request serves the following purposes:
* Refactor and clean up the logic, making it simpler to enable and disable kernel asserts for ROCm.
* Fix the bug that kernel asserts for ROCm were not disabled by default.

Specifically,
- Renamed `TORCH_DISABLE_GPU_ASSERTS` to `C10_USE_ROCM_KERNEL_ASSERT` for the following reasons:
(1) This variable only applies to ROCm.
(2) The new name aligns better with the `#define CUDA_KERNEL_ASSERT` macro.
(3) With `USE_` in front of the name, we can easily control this feature with an environment variable at build time (e.g. `USE_ROCM_KERNEL_ASSERT=1 python setup.py develop` enables kernel asserts for a ROCm build).
- Got rid of `ROCM_FORCE_ENABLE_GPU_ASSERTS` to simplify the logic and make it easier to understand and maintain.
- Added `#cmakedefine` to carry the CMake variable over to C++, as sketched below.
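
For illustration, below is a minimal sketch of how a `#cmakedefine` in a configured header can gate the kernel-assert macro. The header layout, the `USE_ROCM` guard, and the macro body are simplified assumptions for this sketch, not necessarily the exact code in this PR:

```
// macros.h.in (template processed by CMake's configure_file()):
// when USE_ROCM_KERNEL_ASSERT is ON, the next line becomes
// "#define C10_USE_ROCM_KERNEL_ASSERT"; when OFF, it is commented out.
#cmakedefine C10_USE_ROCM_KERNEL_ASSERT

// In a consuming C++ header (simplified, illustrative only):
#if defined(USE_ROCM) && !defined(C10_USE_ROCM_KERNEL_ASSERT)
// ROCm build with kernel asserts disabled: the macro expands to nothing,
// so device-side assertions are compiled out entirely.
#define CUDA_KERNEL_ASSERT(cond)
#else
// Kernel asserts enabled: a failed condition aborts the kernel.
#define CUDA_KERNEL_ASSERT(cond)                        \
  if (!(cond)) {                                        \
    __assert_fail(#cond, __FILE__, __LINE__, __func__); \
  }
#endif
```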

**Tests:**
(1) Build in the default mode and verify that `USE_ROCM_KERNEL_ASSERT` is OFF (0) and kernel asserts are disabled:

```
python setup.py develop
```
Verify that CMakeCache.txt has the correct value.
```
/xxxx/pytorch/build$ grep USE_ROCM_KERNEL_ASSERT CMakeCache.txt
USE_ROCM_KERNEL_ASSERT:BOOL=0
```
Tested the following code in both a ROCm build and a CUDA build, expecting different return codes.

```
subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"])
```
This piece of code is adapted from the unit test below to work around the fact that this unit test is currently skipped for ROCm. (We will look into enabling this unit test in the future.)

```
python test/test_cuda_expandable_segments.py -k test_fixed_cuda_assert_async
```
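
For reference, here is a rough sketch of the kind of device-side check this test exercises; the kernel name and signature are illustrative and not the actual PyTorch kernel:

```
// Illustrative CUDA/HIP kernel: when kernel asserts are compiled in, a zero
// input triggers a device-side assertion failure and the process aborts
// (nonzero return code). When CUDA_KERNEL_ASSERT expands to nothing, the
// check is compiled out and the process exits normally (return code 0).
template <typename scalar_t>
__global__ void assert_nonzero_kernel(const scalar_t* input) {
  CUDA_KERNEL_ASSERT(input[0] != 0);
}
```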

Ran the following script, expecting `r == 0` since `CUDA_KERNEL_ASSERT` is defined as nothing:
```
>>> import sys
>>> import subprocess
>>> r=subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"])
>>> r
0
```

(2) Enable kernel asserts by building with `USE_ROCM_KERNEL_ASSERT=1` or `USE_ROCM_KERNEL_ASSERT=ON`:
```
USE_ROCM_KERNEL_ASSERT=1 python setup.py develop
```

Verify `USE_ROCM_KERNEL_ASSERT` is `1`
```
/xxxx/pytorch/build$ grep USE_ROCM_KERNEL_ASSERT CMakeCache.txt
USE_ROCM_KERNEL_ASSERT:BOOL=1
```

Run the assert test, expecting a nonzero return code (here -6, i.e. the subprocess was terminated by SIGABRT).

```
>>> import sys
>>> import subprocess
>>> r=subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"])
>>>/xxxx/pytorch/aten/src/ATen/native/hip/TensorCompare.hip:108: _assert_async_cuda_kernel: Device-side assertion `input[0] != 0' failed.
:0:rocdevice.cpp            :2690: 2435301199202 us: [pid:206019 tid:0x7f6cf0a77700] Callback: Queue 0x7f64e8400000 aborting with error : HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception. code: 0x1016

>>> r
-6
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114660
Approved by: https://github.com/jeffdaily, https://github.com/malfet, https://github.com/jithunnair-amd
2023-12-13 15:44:53 +00:00
hip Replace all CHECK_ and DCHECK_ with TORCH_* macros (#82032) 2022-07-26 01:20:44 +00:00
nomnigraph [BE] Enforce missing override keyword (#104032) 2023-06-24 02:34:24 +00:00
__init__.py
allocator.cc
allocator.h
blob_gpu_test.cc
blob_serialization_gpu.cc
blob_serialization.cc [caffe2] Don't copy Tensor dims during deserialization (#79471) 2022-07-12 21:36:26 +00:00
blob_serialization.h [caffe2] Don't copy Tensor dims during deserialization (#79471) 2022-07-12 21:36:26 +00:00
blob_serializer_base.h
blob_stats.cc
blob_stats.h
blob_test.cc Fix sign-compare in caffe2 cpp tests 2022-04-05 00:08:05 +00:00
blob.h [caffe2] Micro-optimizations in BlobGetMutableTensor (#98103) 2023-04-10 19:43:02 +00:00
CMakeLists.txt Remove caffe2 mobile (#84338) 2022-09-08 01:49:55 +00:00
common_cudnn.cc Replace all CHECK_ and DCHECK_ with TORCH_* macros (#82032) 2022-07-26 01:20:44 +00:00
common_cudnn.h
common_gpu.cc [ROCm] use hipblas instead of rocblas (#105881) 2023-07-31 20:42:55 +00:00
common_gpu.h [CUDA] Drop CUDA 10 support (#89582) 2023-01-05 05:11:53 +00:00
common_omp.h
common_test.cc [Reland] Eliminate invocations of c10::stoi,c10::stod,c10::stoull,c10::stoll (#109566) 2023-09-19 07:15:25 +00:00
common.cc Disable avoid-non-const-global-variables lint check (#62008) 2021-07-22 18:04:40 -07:00
common.h
context_base.cc Disable avoid-non-const-global-variables lint check (#62008) 2021-07-22 18:04:40 -07:00
context_base.h
context_gpu_test.cc Replace all CHECK_ and DCHECK_ with TORCH_* macros (#82032) 2022-07-26 01:20:44 +00:00
context_gpu.cu Revert "Add function to materialize COW storages (#113396)" 2023-11-20 10:26:01 +00:00
context_gpu.h [caffe2] dont call cudnnDestroy on thread exit (crashes on windows with cuda 11/12) (#95382) 2023-03-10 06:42:51 +00:00
context_test.cc cleanup unused include (#93359) 2023-02-04 02:15:50 +00:00
context.cc Disable avoid-non-const-global-variables lint check (#62008) 2021-07-22 18:04:40 -07:00
context.h use irange for loops 2 (#66746) 2021-12-10 04:26:23 -08:00
cudnn_wrappers.h Replace all CHECK_ and DCHECK_ with TORCH_* macros (#82032) 2022-07-26 01:20:44 +00:00
db.cc Disable avoid-non-const-global-variables lint check (#62008) 2021-07-22 18:04:40 -07:00
db.h use irange for loops 2 (#66746) 2021-12-10 04:26:23 -08:00
distributions_stubs.h
event_cpu.h
event_gpu_test.cc
event_gpu.cc cast return of cudaGetLastError() to void when discarding (#62518) 2021-08-03 11:17:22 -07:00
event_test.cc Disable avoid-non-const-global-variables lint check (#62008) 2021-07-22 18:04:40 -07:00
event.cc Revert "Use missing-prototypes in torch_cpu (#103725)" 2023-06-22 18:30:31 +00:00
event.h
export_c10_op_to_caffe2.cc Revert "Use missing-prototypes in torch_cpu (#103725)" 2023-06-22 18:30:31 +00:00
export_c10_op_to_caffe2.h use irange for loops 2 (#66746) 2021-12-10 04:26:23 -08:00
export_caffe2_op_to_c10.h Revert "Use missing-prototypes in torch_cpu (#103725)" 2023-06-22 18:30:31 +00:00
flags.h
graph_test.cc Disable avoid-non-const-global-variables lint check (#62008) 2021-07-22 18:04:40 -07:00
graph.cc
graph.h
init_denormals.cc Disable avoid-non-const-global-variables lint check (#62008) 2021-07-22 18:04:40 -07:00
init_intrinsics_check.cc Revert "Use missing-prototypes in torch_cpu (#103725)" 2023-06-22 18:30:31 +00:00
init_omp.cc Revert "Use missing-prototypes in torch_cpu (#103725)" 2023-06-22 18:30:31 +00:00
init_test.cc Disable avoid-non-const-global-variables lint check (#62008) 2021-07-22 18:04:40 -07:00
init.cc Revert "Use missing-prototypes in torch_cpu (#103725)" 2023-06-22 18:30:31 +00:00
init.h
int8_serialization.cc Disable avoid-non-const-global-variables lint check (#62008) 2021-07-22 18:04:40 -07:00
logging.h
macros.h
macros.h.in [ROCm] Disabling Kernel Asserts for ROCm by default - fix and clean up and refactoring (#114660) 2023-12-13 15:44:53 +00:00
memonger.cc
memonger.h
module_test.cc Disable avoid-non-const-global-variables lint check (#62008) 2021-07-22 18:04:40 -07:00
module.cc
module.h
net_async_base.cc Disable avoid-non-const-global-variables lint check (#62008) 2021-07-22 18:04:40 -07:00
net_async_base.h
net_async_scheduling.cc Disable avoid-non-const-global-variables lint check (#62008) 2021-07-22 18:04:40 -07:00
net_async_scheduling.h
net_async_task_future.cc
net_async_task_future.h
net_async_task_graph.cc
net_async_task_graph.h
net_async_task.cc
net_async_task.h
net_async_tracing_test.cc Disable avoid-non-const-global-variables lint check (#62008) 2021-07-22 18:04:40 -07:00
net_async_tracing.cc [Reland] Eliminate invocations of c10::stoi,c10::stod,c10::stoull,c10::stoll (#109566) 2023-09-19 07:15:25 +00:00
net_async_tracing.h
net_dag_utils_test.cc Fix warnings (#62930) 2021-08-11 14:07:10 -07:00
net_dag_utils.cc [caffe2] Replace CAFFE_ prefixes in static_tracepoint.h macros with TORCH_ (#106380) 2023-08-03 21:51:36 +00:00
net_dag_utils.h
net_gpu_test.cc Replace all CHECK_ and DCHECK_ with TORCH_* macros (#82032) 2022-07-26 01:20:44 +00:00
net_parallel.cc Disable avoid-non-const-global-variables lint check (#62008) 2021-07-22 18:04:40 -07:00
net_parallel.h
net_simple_refcount_test.cc Disable avoid-non-const-global-variables lint check (#62008) 2021-07-22 18:04:40 -07:00
net_simple_refcount.cc [caffe2] Replace CAFFE_ prefixes in static_tracepoint.h macros with TORCH_ (#106380) 2023-08-03 21:51:36 +00:00
net_simple_refcount.h
net_simple.cc [caffe2] Replace CAFFE_ prefixes in static_tracepoint.h macros with TORCH_ (#106380) 2023-08-03 21:51:36 +00:00
net_simple.h
net_test.cc Replace all CHECK_ and DCHECK_ with TORCH_* macros (#82032) 2022-07-26 01:20:44 +00:00
net.cc Disable avoid-non-const-global-variables lint check (#62008) 2021-07-22 18:04:40 -07:00
net.h
numa.cc
numa.h
observer_test.cc Disable avoid-non-const-global-variables lint check (#62008) 2021-07-22 18:04:40 -07:00
observer.h
operator_gpu_test.cc
operator_gradient.h
operator_schema_test.cc Disable avoid-non-const-global-variables lint check (#62008) 2021-07-22 18:04:40 -07:00
operator_schema.cc [PyTorch] Remove unnecessary iostream includes in headers (#61500) 2021-08-19 18:54:51 -07:00
operator_schema.h Revert "Use missing-prototypes in torch_cpu (#103725)" 2023-06-22 18:30:31 +00:00
operator_test.cc Disable avoid-non-const-global-variables lint check (#62008) 2021-07-22 18:04:40 -07:00
operator.cc [caffe2] Remove OperatorBase::newstyle_outputs_ (#67093) 2023-01-23 22:41:59 +00:00
operator.h [BE] Enforce missing override keyword (#104032) 2023-06-24 02:34:24 +00:00
parallel_net_test.cc Replace all CHECK_ and DCHECK_ with TORCH_* macros (#82032) 2022-07-26 01:20:44 +00:00
plan_executor_test.cc Fix broken caffe2 test: PlanExecutorTest.BlockingErrorPlan (#64401) 2021-09-02 08:30:29 -07:00
plan_executor.cc some reference and move fixes (#95942) 2023-03-10 03:44:09 +00:00
plan_executor.h
prof_dag_counters.cc turn on -Werror=type-limits in our Bazel CPU build 2022-06-10 10:04:08 +00:00
prof_dag_counters.h
qtensor_serialization.cc Disable avoid-non-const-global-variables lint check (#62008) 2021-07-22 18:04:40 -07:00
qtensor_serialization.h [BE] Enforce missing override keyword (#104032) 2023-06-24 02:34:24 +00:00
qtensor.cc
qtensor.h Replace all CHECK_ and DCHECK_ with TORCH_* macros (#82032) 2022-07-26 01:20:44 +00:00
scope_guard.h
serialization_test.cc Fix sign-compare in caffe2 cpp tests 2022-04-05 00:08:05 +00:00
stats_test.cc Disable avoid-non-const-global-variables lint check (#62008) 2021-07-22 18:04:40 -07:00
stats.cc
stats.h [caffe2] Replace CAFFE_ prefixes in static_tracepoint.h macros with TORCH_ (#106380) 2023-08-03 21:51:36 +00:00
storage.h
tensor_impl.h
tensor_int8.cc
tensor_int8.h
tensor.cc Revert "Use missing-prototypes in torch_cpu (#103725)" 2023-06-22 18:30:31 +00:00
tensor.h [PyTorch] Further reduce cost of TypeMeta::_typeMetaData (by 10x!) (#98105) 2023-04-12 17:44:48 +00:00
test_utils.cc
test_utils.h use irange for loops 2 (#66746) 2021-12-10 04:26:23 -08:00
timer_test.cc Disable avoid-non-const-global-variables lint check (#62008) 2021-07-22 18:04:40 -07:00
timer.h
transform_test.cc Fix sign-compare in caffe2 cpp tests 2022-04-05 00:08:05 +00:00
transform.cc Revert "Use missing-prototypes in torch_cpu (#103725)" 2023-06-22 18:30:31 +00:00
transform.h
types.cc Speed up DataTypeToTypeMeta (#66113) 2021-10-07 08:06:09 -07:00
types.h Speed up DataTypeToTypeMeta (#66113) 2021-10-07 08:06:09 -07:00
workspace_test.cc Disable avoid-non-const-global-variables lint check (#62008) 2021-07-22 18:04:40 -07:00
workspace.cc Disable avoid-non-const-global-variables lint check (#62008) 2021-07-22 18:04:40 -07:00
workspace.h