pytorch/c10/cuda
Latest commit: 401fa87ace "make only current thread allocate to pool in NcclPG" (#153990) by Natalia Gimelshein, 2025-05-21 21:57:37 +00:00
Follow-up to #153356; fixes NCCL allocation to pool.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153990
Approved by: https://github.com/kwen2501
| Name | Last commit message | Last commit date |
| --- | --- | --- |
| impl | Improve error message for CUDAGuardImpl, MPSGuardImpl, XPUGuardImpl (#149838) | 2025-03-25 07:29:53 +00:00 |
| test | Fix some CMake issues (#153686) | 2025-05-19 00:31:34 +00:00 |
| BUILD.bazel | create //c10/cuda library (#70863) | 2022-02-03 19:17:18 +00:00 |
| build.bzl | Support expandable_segments:True in fbcode for caching allocator | 2023-05-02 11:12:39 -07:00 |
| CMakeLists.txt | Use torch_compile_options for c10 libraries (#147821) | 2025-03-18 01:54:23 +00:00 |
| CUDAAlgorithm.h | [c10] Use nested namespace in c10/cuda (#116464) | 2023-12-27 23:14:00 +00:00 |
| CUDAAllocatorConfig.cpp | [Environment Variable][Rebase] Use thread-safe getenv functions (#140200) | 2025-05-02 00:41:49 +00:00 |
| CUDAAllocatorConfig.h | [Environment Variable][Rebase] Use thread-safe getenv functions (#140200) | 2025-05-02 00:41:49 +00:00 |
| CUDACachingAllocator.cpp | make only current thread allocate to pool in NcclPG (#153990) | 2025-05-21 21:57:37 +00:00 |
| CUDACachingAllocator.h | make only current thread allocate to pool in NcclPG (#153990) | 2025-05-21 21:57:37 +00:00 |
| CUDADeviceAssertion.h | Suppress -Wunused-function for DSA (#150735) | 2025-04-07 01:47:35 +00:00 |
| CUDADeviceAssertionHost.cpp | Enable -Wunused on torch targets (#150077) | 2025-05-02 07:14:19 +00:00 |
| CUDADeviceAssertionHost.h | [pytorch] Expose c10_retrieve_device_side_assertion_info() for use by external code (#153211) | 2025-05-10 01:08:45 +00:00 |
| CUDAException.cpp | Enable -Wunused on torch targets (#150077) | 2025-05-02 07:14:19 +00:00 |
| CUDAException.h | C10_UNUSED to [[maybe_unused]] (#6357) (#138364) | 2024-10-19 13:17:43 +00:00 |
| CUDAFunctions.cpp | Enable misc-use-internal-linkage check and apply fixes (#148948) | 2025-03-12 14:22:56 +00:00 |
| CUDAFunctions.h | Revert "use copy2d in h2d/d2h copy when possible (#146256)" | 2025-02-25 07:06:38 +00:00 |
| CUDAGraphsC10Utils.h | [4/N] Fix cppcoreguidelines-special-member-functions warnings (#139027) | 2024-10-29 00:18:18 +00:00 |
| CUDAGuard.h | Enable more readability-redundant checks (#143963) | 2024-12-30 14:49:33 +00:00 |
| CUDAMacros.h | Revert "Increase C10_COMPILE_TIME_MAX_GPUS to 128 (#144138)" | 2025-01-14 19:04:12 +00:00 |
| CUDAMallocAsyncAllocator.cpp | [ROCm] enable HIPMallocAsyncAllocator (#149145) | 2025-03-19 23:42:35 +00:00 |
| CUDAMathCompat.h | [Reland] [5/N] Change static functions in headers to inline (#131010) | 2024-07-18 15:53:48 +00:00 |
| CUDAMiscFunctions.cpp | [Reland][Environment Variable][3/N] Use thread-safe getenv functions (#137942) | 2024-10-15 07:47:24 +00:00 |
| CUDAMiscFunctions.h | [c10] Use nested namespace in c10/cuda (#116464) | 2023-12-27 23:14:00 +00:00 |
| CUDAStream.cpp | Revert "Increase C10_COMPILE_TIME_MAX_GPUS to 128 (#144138)" | 2025-01-14 19:04:12 +00:00 |
| CUDAStream.h | [Clang-tidy header][24/N] Fix clang-tidy warnings on c10/cuda/*.{cpp,h} (#120781) | 2024-03-15 05:03:22 +00:00 |
| driver_api.cpp | [SymmetricMemory] introduce multicast support, multimem_all_reduce_ and multimem_one_shot_all_reduce (#133424) | 2024-08-23 20:09:20 +00:00 |
| driver_api.h | set CUDA_MODULE_LOADING for older drivers only (#152695) | 2025-05-20 19:34:40 +00:00 |
README.md

c10/cuda is a core library with CUDA functionality. It is distinguished from c10 in that it links against the CUDA library, but like c10 it doesn't contain any kernels, and consists solely of core functionality that is generally useful when writing CUDA code; for example, C++ wrappers for the CUDA C API.
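
As a sketch of the kind of wrapper that lives here (illustrative only, not an excerpt of an actual c10/cuda header; it assumes the `C10_CUDA_CHECK` macro from `c10/cuda/CUDAException.h`, and `device_count_sketch` is a made-up name):

```cpp
// Illustrative sketch, not an actual c10/cuda header.
// C10_CUDA_CHECK (from c10/cuda/CUDAException.h) turns a failing cudaError_t
// into a C++ exception with error context; device_count_sketch is a made-up name.
#include <c10/cuda/CUDAException.h>
#include <cuda_runtime_api.h>

namespace c10 { namespace cuda {

// Thin C++ wrapper over a raw CUDA C API call: query the number of visible
// devices and surface failures as exceptions instead of raw error codes.
inline int device_count_sketch() {
  int count = 0;
  C10_CUDA_CHECK(cudaGetDeviceCount(&count));
  return count;
}

}} // namespace c10::cuda
```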

**Important notes for developers.** If you want to add files or functionality to this folder, TAKE NOTE. The code in this folder is very special, because on our AMD GPU build, we transpile it into c10/hip to provide a ROCm environment. Thus, if you write:

```cpp
// c10/cuda/CUDAFoo.h
namespace c10 { namespace cuda {

void my_func();

}}
```

this will get transpiled into:

```cpp
// c10/hip/HIPFoo.h
namespace c10 { namespace hip {

void my_func();

}}
```

Thus, if you add new functionality to c10, you must also update `C10_MAPPINGS` in `torch/utils/hipify/cuda_to_hip_mappings.py` to transpile occurrences of `cuda::my_func` to `hip::my_func`. (At the moment, we do NOT have a catch-all `cuda::` to `hip::` namespace conversion, as not all `cuda::` namespaces are converted to `hip::`, even though c10's are.)
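
Concretely, once the mapping exists, hipify rewrites call sites as well as the declaration. A hypothetical caller inside c10 (sketch only; `CUDAFoo.h` and `my_func` are the placeholder names from the example above, and the exact include rewriting depends on the mapping entries):

```cpp
// Hypothetical caller on the CUDA build, e.g. somewhere under c10/cuda/.
#include <c10/cuda/CUDAFoo.h>

void call_foo() {
  c10::cuda::my_func();
}
```

becomes, roughly, on the ROCm build:

```cpp
// Same caller after hipify, living under c10/hip/.
#include <c10/hip/HIPFoo.h>

void call_foo() {
  c10::hip::my_func();
}
```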

Transpilation inside this folder is controlled by `CAFFE2_SPECIFIC_MAPPINGS` (oddly enough). `C10_MAPPINGS` apply to ALL source files.

If you add a new directory to this folder, you MUST update both `c10/cuda/CMakeLists.txt` and `c10/hip/CMakeLists.txt`.