c10/cuda is a core library with CUDA functionality. It is distinguished from c10 in that it links against the CUDA library, but like c10 it doesn't contain any kernels, and consists solely of core functionality that is generally useful when writing CUDA code; for example, C++ wrappers for the CUDA C API.

**Important notes for developers.** If you want to add files or functionality to this folder, TAKE NOTE. The code in this folder is very special, because on our AMD GPU build, we transpile it into c10/hip to provide a ROCm environment. Thus, if you write:

```cpp
// c10/cuda/CUDAFoo.h
namespace c10 { namespace cuda {

void my_func();

}}
```

this will get transpiled into:

```cpp
// c10/hip/HIPFoo.h
namespace c10 { namespace hip {

void my_func();

}}
```

Thus, if you add new functionality to c10, you must also update `C10_MAPPINGS` in torch/utils/hipify/cuda_to_hip_mappings.py so that occurrences of cuda::my_func are transpiled to hip::my_func. (At the moment, we do NOT have a catch-all cuda:: to hip:: namespace conversion, as not all cuda namespaces are converted to hip::, even though c10's are.)
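At its core, this mapping-driven transpilation is a textual substitution pass over the source. The sketch below is a minimal illustration of that idea only; the mapping entries and the `hipify` helper are hypothetical stand-ins, not the real table or tool in torch/utils/hipify (which handles many more constructs and entry kinds):

```python
# Hypothetical mapping entries in the spirit of C10_MAPPINGS
# (illustrative only; see torch/utils/hipify/cuda_to_hip_mappings.py
# for the real table and its entry format).
MAPPINGS = {
    "c10/cuda/CUDAFoo.h": "c10/hip/HIPFoo.h",
    "cuda::my_func": "hip::my_func",
}

def hipify(source: str) -> str:
    """Naive textual pass: substitute each mapped CUDA name with its HIP name.

    Longer keys are applied first so a full path like "c10/cuda/CUDAFoo.h"
    is rewritten as a unit rather than piecemeal by shorter keys.
    """
    for cuda_name in sorted(MAPPINGS, key=len, reverse=True):
        source = source.replace(cuda_name, MAPPINGS[cuda_name])
    return source

caller = "#include <c10/cuda/CUDAFoo.h>\nc10::cuda::my_func();"
print(hipify(caller))
# #include <c10/hip/HIPFoo.h>
# c10::hip::my_func();
```

This is also why a missing `C10_MAPPINGS` entry fails the way it does: any cuda:: identifier without a mapping passes through the substitution unchanged and shows up as an unresolved symbol in the HIP build.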

Transpilation inside this folder is controlled by CAFFE2_SPECIFIC_MAPPINGS (oddly enough.) C10_MAPPINGS apply to ALL source files.

If you add a new directory to this folder, you MUST update both c10/cuda/CMakeLists.txt and c10/hip/CMakeLists.txt.