pytorch/c10/cuda
Bert Maher 03342af3a3 Add env variable to bypass CUDACachingAllocator for debugging (#45294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45294

While tracking down a recent memory corruption bug, we found that
cuda-memcheck wasn't finding the bad accesses; ngimel pointed out that
this is because we use a caching allocator, so many "out of bounds"
accesses land in a valid slab.

This PR adds a runtime knob (`PYTORCH_NO_CUDA_MEMORY_CACHING`) that, when set,
bypasses the caching allocator's caching logic so that allocations go straight
to cudaMalloc.  This way, cuda-memcheck will actually work.
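
As a rough sketch of the mechanism (names here are illustrative; the real
logic lives in CUDACachingAllocator.cpp), the allocator can read the
environment variable once and, when it is set, skip the cache entirely:

```
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative sketch only: check the env var once and cache the answer.
static bool forceUncachedAllocator() {
  static bool force_uncached =
      std::getenv("PYTORCH_NO_CUDA_MEMORY_CACHING") != nullptr;
  return force_uncached;
}

void* allocate(size_t size) {
  void* ptr = nullptr;
  if (forceUncachedAllocator()) {
    // Every allocation is a fresh cudaMalloc, so cuda-memcheck sees real
    // allocation boundaries instead of one large cached slab.
    // (Error checking omitted for brevity.)
    cudaMalloc(&ptr, size);
    return ptr;
  }
  // ... otherwise fall through to the usual caching-allocator path ...
  return ptr;
}
```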

Test Plan:
Insert some memory errors and run a test under cuda-memcheck;
observe that cuda-memcheck flags an error where expected.

Specifically, I removed the output-masking logic here:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/cuda_codegen.cpp#L819-L826

And ran:
```
PYTORCH_NO_CUDA_MEMORY_CACHING=1 cuda-memcheck pytest -k test_superslomo test_jit_fuser_te.py
```

Reviewed By: ngimel

Differential Revision: D23964734

Pulled By: bertmaher

fbshipit-source-id: 04efd11e8aff037b9edde80c70585cb820ee6e39
2020-09-28 11:40:04 -07:00
| File | Last commit | Date |
| --- | --- | --- |
| impl | Call uncheckedSetDevice in ~InlineDeviceGuard only when device index are different (#35438) | 2020-03-30 13:13:17 -07:00 |
| test | Formatting cmake (to lowercase without space for if/elseif/else/endif) (#35521) | 2020-03-27 14:25:17 -07:00 |
| CMakeLists.txt | [c10/cuda] Reorganize device_count() and robustly surface ASAN warnings (#42249) | 2020-08-05 11:39:31 -07:00 |
| CUDACachingAllocator.cpp | Add env variable to bypass CUDACachingAllocator for debugging (#45294) | 2020-09-28 11:40:04 -07:00 |
| CUDACachingAllocator.h | Add per-device allocator object in CUDACachingAllocator (#37567) | 2020-05-11 06:44:44 -07:00 |
| CUDAException.h | Creates Torch-friendly Event class and adds Stream tracking to autograd (#25130) | 2019-09-01 12:37:52 -07:00 |
| CUDAFunctions.cpp | [c10/cuda] Reorganize device_count() and robustly surface ASAN warnings (#42249) | 2020-08-05 11:39:31 -07:00 |
| CUDAFunctions.h | [c10/cuda] Reorganize device_count() and robustly surface ASAN warnings (#42249) | 2020-08-05 11:39:31 -07:00 |
| CUDAGuard.h | Move files to/from c10/core and c10/util (#15316) | 2019-01-10 16:22:22 -08:00 |
| CUDAMacros.h | Don't call cudaStreamDestroy at destruction time (#15692) | 2019-01-11 12:36:41 -08:00 |
| CUDAMathCompat.h | [takeover] BTRS algorithm for fast/efficient binomial sampling (#36858) | 2020-04-22 15:53:41 -07:00 |
| CUDAStream.cpp | Don't call cudaStreamDestroy at destruction time (#15692) | 2019-01-11 12:36:41 -08:00 |
| CUDAStream.h | Add CUDA11 build and test (#40452) | 2020-06-30 13:50:44 -07:00 |
| README.md | Move hipify to torch/utils to bundle them into torch package (#27425) | 2019-10-07 17:25:45 -07:00 |

c10/cuda is a core library with CUDA functionality. It is distinguished from c10 in that it links against the CUDA library, but like c10 it doesn't contain any kernels, and consists solely of core functionality that is generally useful when writing CUDA code; for example, C++ wrappers for the CUDA C API.
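
For example, a minimal sketch of what using these wrappers looks like
(illustrative only; it assumes the `device_count`/`set_device` wrappers and
the `C10_CUDA_CHECK` macro declared in CUDAFunctions.h and CUDAException.h):

```
#include <c10/cuda/CUDAException.h>
#include <c10/cuda/CUDAFunctions.h>
#include <cuda_runtime.h>

int main() {
  // C++ wrapper around cudaGetDeviceCount.
  if (c10::cuda::device_count() > 0) {
    // C++ wrapper around cudaSetDevice.
    c10::cuda::set_device(0);
    // C10_CUDA_CHECK converts a failing cudaError_t into a C++ exception.
    C10_CUDA_CHECK(cudaDeviceSynchronize());
  }
  return 0;
}
```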

**Important notes for developers.** If you want to add files or functionality to this folder, TAKE NOTE. The code in this folder is very special, because on our AMD GPU build, we transpile it into c10/hip to provide a ROCm environment. Thus, if you write:

```
// c10/cuda/CUDAFoo.h
namespace c10 { namespace cuda {

void my_func();

}}
```

this will get transpiled into:

```
// c10/hip/HIPFoo.h
namespace c10 { namespace hip {

void my_func();

}}
```

Thus, if you add new functionality to c10, you must also update `C10_MAPPINGS` in `torch/utils/hipify/cuda_to_hip_mappings.py` to transpile occurrences of `cuda::my_func` to `hip::my_func`. (At the moment, we do NOT have a catch-all `cuda::` to `hip::` namespace conversion, as not all `cuda::` namespaces are converted to `hip::`, even though c10's are.)

Transpilation inside this folder is controlled by `CAFFE2_SPECIFIC_MAPPINGS` (oddly enough). `C10_MAPPINGS` apply to ALL source files.

If you add a new directory to this folder, you MUST update both `c10/cuda/CMakeLists.txt` and `c10/hip/CMakeLists.txt`.