pytorch/c10/cuda

| File | Latest commit | Date |
|---|---|---|
| impl/ | Improve error message for CUDAGuardImpl, MPSGuardImpl, XPUGuardImpl (#149838) | 2025-03-25 |
| test/ | [ROCm][Windows] Fix offload gpu arch list in tests (#155212) | 2025-06-05 |
| BUILD.bazel | | |
| build.bzl | | |
| CMakeLists.txt | Use torch_compile_options for c10 libraries (#147821) | 2025-03-18 |
| CUDAAlgorithm.h | [c10] Use nested namespace in c10/cuda (#116464) | 2023-12-27 |
| CUDAAllocatorConfig.cpp | [CUDA] Reuse blocks with record_stream during CUDA Graph capture in the CUDACachingAllocator (#158352) | 2025-09-04 |
| CUDAAllocatorConfig.h | [CUDA] Reuse blocks with record_stream during CUDA Graph capture in the CUDACachingAllocator (#158352) | 2025-09-04 |
| CUDACachingAllocator.cpp | [PyTorch CCA] Add an API to get expandable segment sizes (#163771) | 2025-10-01 |
| CUDACachingAllocator.h | [PyTorch CCA] Add an API to get expandable segment sizes (#163771) | 2025-10-01 |
| CUDADeviceAssertion.h | Suppress -Wunused-function for DSA (#150735) | 2025-04-07 |
| CUDADeviceAssertionHost.cpp | Enable -Wunused on torch targets (#150077) | 2025-05-02 |
| CUDADeviceAssertionHost.h | [BE] fix typos in c10/ (#156078) | 2025-06-18 |
| CUDAException.cpp | [BE] Preserve caller source location in the error message (#162808) | 2025-09-15 |
| CUDAException.h | [BE] Preserve caller source location in the error message (#162808) | 2025-09-15 |
| CUDAFunctions.cpp | [BE] Fix '_WIN32' is not defined warning (#162516) | 2025-09-10 |
| CUDAFunctions.h | check if USE_ROCM is defined (#158571) | 2025-07-17 |
| CUDAGraphsC10Utils.h | Add DeviceAllocator as the base device allocator (#138222) | 2025-08-08 |
| CUDAGuard.h | Enable more readability-redundant checks (#143963) | 2024-12-30 |
| CUDAMacros.h | Revert "Increase C10_COMPILE_TIME_MAX_GPUS to 128 (#144138)" | 2025-01-14 |
| CUDAMallocAsyncAllocator.cpp | [PyTorch CCA] Add an API to get expandable segment sizes (#163771) | 2025-10-01 |
| CUDAMathCompat.h | [Reland] [5/N] Change static functions in headers to inline (#131010) | 2024-07-18 |
| CUDAMiscFunctions.cpp | Use cuda error code instead of error text in get_cuda_error_help (#158688) | 2025-07-21 |
| CUDAMiscFunctions.h | Use cuda error code instead of error text in get_cuda_error_help (#158688) | 2025-07-21 |
| CUDAStream.cpp | Replace TORCH_INTERNAL_ASSERT with TORCH_CHECK (#160411) | 2025-08-13 |
| CUDAStream.h | [Clang-tidy header][24/N] Fix clang-tidy warnings on c10/cuda/*.{cpp,h} (#120781) | 2024-03-15 |
| driver_api.cpp | [CI] Add basic CUDA 13.0 periodic test (#161013) | 2025-08-29 |
| driver_api.h | fabric detection - fix build on an old toolkit (#160984) | 2025-08-19 |
| README.md | | |

c10/cuda is a core library with CUDA functionality. It is distinguished from c10 in that it links against the CUDA library, but like c10 it doesn't contain any kernels, and consists solely of core functionality that is generally useful when writing CUDA code; for example, C++ wrappers for the CUDA C API.
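
To give a feel for what those wrappers look like in use, here is a minimal sketch (illustrative only, not an exhaustive tour of the API; the helper `zero_buffer_on_device` is invented for this example):

```cpp
// Minimal sketch: using c10/cuda's C++ wrappers instead of raw CUDA C calls.
// zero_buffer_on_device is a hypothetical helper written for this example.
#include <c10/cuda/CUDAException.h>  // C10_CUDA_CHECK
#include <c10/cuda/CUDAGuard.h>      // c10::cuda::CUDAGuard (RAII device switch)
#include <c10/cuda/CUDAStream.h>     // c10::cuda::CUDAStream, getCurrentCUDAStream

#include <cuda_runtime.h>
#include <cstddef>

void zero_buffer_on_device(void* buf, size_t nbytes, c10::DeviceIndex device) {
  // Switch the current device to `device` for this scope; restored on exit.
  c10::cuda::CUDAGuard guard(device);

  // c10/cuda wraps raw cudaStream_t handles in c10::cuda::CUDAStream.
  c10::cuda::CUDAStream stream = c10::cuda::getCurrentCUDAStream(device);

  // C10_CUDA_CHECK turns a failing cudaError_t into a thrown c10::Error,
  // so callers don't hand-roll error handling for the CUDA C API.
  C10_CUDA_CHECK(cudaMemsetAsync(buf, 0, nbytes, stream.stream()));
}
```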

Important notes for developers. If you want to add files or functionality to this folder, TAKE NOTE. The code in this folder is very special, because on our AMD GPU build, we transpile it into c10/hip to provide a ROCm environment. Thus, if you write:

```cpp
// c10/cuda/CUDAFoo.h
namespace c10 { namespace cuda {

void my_func();

}}
```

this will get transpiled into:

```cpp
// c10/hip/HIPFoo.h
namespace c10 { namespace hip {

void my_func();

}}
```

Thus, if you add new functionality to c10, you must also update `C10_MAPPINGS` in `torch/utils/hipify/cuda_to_hip_mappings.py` to transpile occurrences of `cuda::my_func` to `hip::my_func`. (At the moment, we do NOT have a catch-all `cuda::` to `hip::` namespace conversion, as not all `cuda` namespaces are converted to `hip::`, even though c10's are.)
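
To make the effect of the mapping concrete at a call site, here is a hedged sketch reusing the hypothetical `my_func` and `CUDAFoo.h` from the example above; the rewrite shown in the comments is an assumption about what suitable mapping entries would produce, not a quote of the actual mapping file:

```cpp
// Sketch of a caller written against the CUDA spelling. On the ROCm build,
// hipify rewrites the hipified copy of this file, roughly:
//   #include <c10/cuda/CUDAFoo.h>  ->  #include <c10/hip/HIPFoo.h>
//   c10::cuda::my_func()           ->  c10::hip::my_func()
// If the corresponding mapping entries are missing, the c10::cuda:: spelling
// survives and the ROCm build fails against the nonexistent CUDA symbols.
#include <c10/cuda/CUDAFoo.h>  // hypothetical header from the example above

void call_my_func() {
  c10::cuda::my_func();
}
```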

Transpilation inside this folder is controlled by `CAFFE2_SPECIFIC_MAPPINGS` (oddly enough). `C10_MAPPINGS` apply to ALL source files.

If you add a new directory to this folder, you MUST update both `c10/cuda/CMakeLists.txt` and `c10/hip/CMakeLists.txt`.