pytorch/c10/cuda
sryap 32461ed319 Pool cudaEvents in CUDACachingAllocator (#78279)
Summary:
cudaEventCreate/cudaEventDestroy can be expensive, especially when the process is making many other CUDA API calls.

Pool the `cudaEvent_t` objects so that we create them once and reuse them as much as possible.
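
The gist is to keep returned events on a free list instead of destroying them. A minimal sketch of such a pool (illustrative only; the actual code in CUDACachingAllocator.cpp differs, e.g. in thread-safety and per-device handling):

```cpp
#include <cuda_runtime.h>
#include <vector>

// Illustrative cudaEvent_t pool: events are created lazily and recycled
// instead of being destroyed after every use (error handling omitted).
class EventPool {
 public:
  // Reuse a pooled event if one is available; otherwise create a new one.
  cudaEvent_t acquire() {
    if (!free_events_.empty()) {
      cudaEvent_t e = free_events_.back();
      free_events_.pop_back();
      return e;
    }
    cudaEvent_t e = nullptr;
    cudaEventCreateWithFlags(&e, cudaEventDisableTiming);
    return e;
  }

  // Return an event to the pool instead of calling cudaEventDestroy.
  void release(cudaEvent_t e) {
    free_events_.push_back(e);
  }

  // Events are only destroyed when the pool itself is torn down.
  ~EventPool() {
    for (cudaEvent_t e : free_events_) {
      cudaEventDestroy(e);
    }
  }

 private:
  std::vector<cudaEvent_t> free_events_;
};
```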

Test Plan:
Unit tests to check the functionality.

Manual performance testing shows that this diff improves performance:

|  | create_event_internal (us) | free_event_internal/destructor (us) | insert_events (us) | process_events (us) |
| --- | --- | --- | --- | --- |
| baseline | 2.411 | 2.647 | 3.968 | 0.321 |
| this diff | 0.115 | 0.147 | 2.846 | 0.262 |
| speed up | 20.9x | 18.0x | 1.4x | 1.2x |

Differential Revision: D35729059

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78279
Approved by: https://github.com/jianyuh
2022-06-07 18:26:45 +00:00
| Name | Last commit message | Last commit date |
| --- | --- | --- |
| impl | Check all CUDA API calls for errors in caffe2/c10/ (#74918) | 2022-03-30 17:13:02 +00:00 |
| test | move //c10/cuda/test to shared build structure (#71429) | 2022-02-03 22:33:41 +00:00 |
| BUILD.bazel | create //c10/cuda library (#70863) | 2022-02-03 19:17:18 +00:00 |
| build.bzl | create //c10/cuda library (#70863) | 2022-02-03 19:17:18 +00:00 |
| CMakeLists.txt | Update CMake and use native CUDA language support (#62445) | 2021-10-11 09:05:48 -07:00 |
| CUDACachingAllocator.cpp | Pool cudaEvents in CUDACachingAllocator (#78279) | 2022-06-07 18:26:45 +00:00 |
| CUDACachingAllocator.h | [CUDACachingAlloc/GPUInference] Implement garbage collection without GPU sync (#74261) | 2022-03-21 18:46:02 +00:00 |
| CUDAException.h | Add additional CUDA error handling macros (#74865) | 2022-03-29 18:03:03 +00:00 |
| CUDAFunctions.cpp | Check all CUDA API calls for errors in caffe2/c10/ (#74918) | 2022-03-30 17:13:02 +00:00 |
| CUDAFunctions.h | [ROCm] Update HIP_VERSION to TORCH_HIP_VERSION (#62786) | 2021-08-13 15:00:43 -07:00 |
| CUDAGraphsC10Utils.h | [ROCm] Changes not to rely on CUDA_VERSION or HIP_VERSION (#65610) | 2021-09-29 09:55:43 -07:00 |
| CUDAGuard.h | [PyTorch] Autoformat c10 (#56830) | 2021-04-30 21:23:28 -07:00 |
| CUDAMacros.h | use c10/macros/cmake_macros.h in fbcode build (#70851) | 2022-01-19 20:56:12 +00:00 |
| CUDAMathCompat.h | avoid CPU std::copysign segfault when compiling on arm64 (take-2) (#55608) | 2021-04-08 11:34:09 -07:00 |
| CUDAMiscFunctions.cpp | wrap cudaStreamSynchronize calls (#61889) | 2021-07-21 19:30:52 -07:00 |
| CUDAMiscFunctions.h | wrap cudaStreamSynchronize calls (#61889) | 2021-07-21 19:30:52 -07:00 |
| CUDAStream.cpp | [reland] Allow external CUDA streams to be set as current (#66324) | 2021-10-11 02:41:43 -07:00 |
| CUDAStream.h | Check all CUDA API calls for errors in caffe2/c10/ (#74918) | 2022-03-30 17:13:02 +00:00 |
README.md

c10/cuda is a core library with CUDA functionality. It is distinguished from c10 in that it links against the CUDA library, but like c10 it doesn't contain any kernels, and consists solely of core functionality that is generally useful when writing CUDA code; for example, C++ wrappers for the CUDA C API.
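
For a flavor of what such a wrapper looks like, here is a hedged sketch of an error-checked C++ wrapper around one CUDA runtime call (the namespace, function name, and error-handling style are illustrative, not the exact c10::cuda API):

```cpp
#include <cuda_runtime.h>
#include <stdexcept>
#include <string>

namespace example { namespace cuda {

// Thin wrapper over cudaGetDeviceCount that converts the C-style error
// code into a C++ exception.
inline int device_count() {
  int count = 0;
  cudaError_t err = cudaGetDeviceCount(&count);
  if (err != cudaSuccess) {
    throw std::runtime_error(
        std::string("cudaGetDeviceCount failed: ") + cudaGetErrorString(err));
  }
  return count;
}

}} // namespace example::cuda
```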

**Important notes for developers.** If you want to add files or functionality to this folder, TAKE NOTE. The code in this folder is very special because, on our AMD GPU build, we transpile it into c10/hip to provide a ROCm environment. Thus, if you write:

```cpp
// c10/cuda/CUDAFoo.h
namespace c10 { namespace cuda {

void my_func();

}}
```

this will get transpiled into:

```cpp
// c10/hip/HIPFoo.h
namespace c10 { namespace hip {

void my_func();

}}
```

Thus, if you add new functionality to c10, you must also update `C10_MAPPINGS` in torch/utils/hipify/cuda_to_hip_mappings.py so that occurrences of `cuda::my_func` are transpiled to `hip::my_func`. (At the moment, we do NOT have a catch-all `cuda::` to `hip::` namespace conversion, as not all `cuda` namespaces are converted to `hip::`, even though c10's are.)

Transpilation inside this folder is controlled by `CAFFE2_SPECIFIC_MAPPINGS` (oddly enough). `C10_MAPPINGS` apply to ALL source files.

If you add a new directory to this folder, you MUST update both c10/cuda/CMakeLists.txt and c10/hip/CMakeLists.txt.