pytorch/c10/cuda/impl
Edward Yang 515238e0a5 Unify cudaGetDeviceCount implementations. (#18445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18445
ghimport-source-id: 30d018737bf6989bc68b7e3676f44e0ca6141fde

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18242 Test running a CUDA build on CPU machine.
* **#18445 Unify cudaGetDeviceCount implementations.**

I did this by searching for every call to cudaGetDeviceCount and
methodically replacing each one with c10::cuda::device_count()
or at::cuda::device_count().

The point of doing this: the various implementations differed wildly
in how they handled an error returned by cudaGetDeviceCount.
The final standardized behavior is that **all errors are swallowed** and
we return a device count of zero.  This indirectly fixes running CUDA builds
on CPU, which was broken in #17847.
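
A minimal sketch of the "swallow every error, report zero devices" behavior
described above. It is not the code from this PR: to stay buildable without
CUDA it uses hypothetical stand-ins (FakeCudaError, fake_query_no_driver,
fake_query_ok, device_count_sketch) in place of cudaError_t,
cudaGetDeviceCount and c10::cuda::device_count().

```cpp
// Hypothetical stand-in for cudaError_t so the sketch builds without CUDA.
enum FakeCudaError { kFakeSuccess = 0, kFakeFailure = 1 };

// Hypothetical stand-in for the signature of cudaGetDeviceCount.
using DeviceCountQuery = FakeCudaError (*)(int*);

// Simulates a CUDA build running on a CPU-only machine: the query fails
// and leaves a meaningless value in the output parameter.
inline FakeCudaError fake_query_no_driver(int* count) {
  *count = -1;
  return kFakeFailure;
}

// Simulates a healthy machine with four devices.
inline FakeCudaError fake_query_ok(int* count) {
  *count = 4;
  return kFakeSuccess;
}

// Assumed shape of the unified helper: any error is swallowed and
// reported to the caller as "zero devices"; nothing is thrown.
inline int device_count_sketch(DeviceCountQuery query) noexcept {
  int count = 0;
  const FakeCudaError err = query(&count);
  if (err != kFakeSuccess) {
    return 0;  // swallow the error; do not trust the output parameter
  }
  return count;
}
```

With this shape, callers can treat "no driver" and "no devices" the same way
and never need a try/catch around the device count query.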

I added 'noexcept' to the 'deviceCount' virtual method on DeviceGuardImpl.
This is a BC-breaking change for anyone inheriting from DeviceGuardImpl,
but all you need to do is put 'noexcept' on your override. The fixed
override is still backwards compatible with older libtorch, because an
override may be 'noexcept' even when the base method is not.
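
A sketch of what the fix looks like for a downstream subclass. The class
names here (GuardImplSketch, MyBackendGuardImpl) and the plain int return
type are hypothetical simplifications, not the real c10 interface; only the
placement of 'noexcept' is the point.

```cpp
// Hypothetical, simplified stand-in for the guard interface after this change:
// the virtual method is declared noexcept.
struct GuardImplSketch {
  virtual ~GuardImplSketch() = default;
  virtual int deviceCount() const noexcept = 0;
};

// A downstream backend. Without 'noexcept' on the override this would no
// longer compile, because an override may not have a looser exception
// specification than the method it overrides.
struct MyBackendGuardImpl final : GuardImplSketch {
  int deviceCount() const noexcept override {
    return 2;  // illustrative value only
  }
};
```

The same override also compiles against a base class whose method lacks
'noexcept', since tightening the exception specification in a derived class
is allowed.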

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14612189

fbshipit-source-id: 3c8d186e3dd623c0e27625212c7ce30f75d943cb
2019-03-26 09:50:14 -07:00
cuda_cmake_macros.h.in Add c10 cuda library. (#13900) 2018-11-19 08:20:07 -08:00
CUDAGuardImpl.cpp Move CUDAGuard, CUDAStream and CUDAGuardImpl to c10/cuda (#14248) 2018-12-12 11:24:26 -08:00
CUDAGuardImpl.h Unify cudaGetDeviceCount implementations. (#18445) 2019-03-26 09:50:14 -07:00
CUDATest.cpp Catch cudaError_t return val (nodiscard in rocm) (#16399) 2019-02-11 13:18:36 -08:00
CUDATest.h Add c10 cuda library. (#13900) 2018-11-19 08:20:07 -08:00