Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45294

While tracking down a recent memory corruption bug we found that cuda-memcheck wasn't finding the bad accesses, and ngimel pointed out that it's because we use a caching allocator, so a lot of "out of bounds" accesses land in a valid slab. This PR adds a runtime knob (`PYTORCH_NO_CUDA_MEMORY_CACHING`) that, when set, bypasses the caching allocator's caching logic so that allocations go straight to cudaMalloc. This way, cuda-memcheck will actually work.

Test Plan: Insert some memory errors and run a test under cuda-memcheck; observe that cuda-memcheck flags an error where expected. Specifically, I removed the output-masking logic here: https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/cuda_codegen.cpp#L819-L826

And ran:

```
PYTORCH_NO_CUDA_MEMORY_CACHING=1 cuda-memcheck pytest -k test_superslomo test_jit_fuser_te.py
```

Reviewed By: ngimel

Differential Revision: D23964734

Pulled By: bertmaher

fbshipit-source-id: 04efd11e8aff037b9edde80c70585cb820ee6e39
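For readers unfamiliar with why caching hides these errors, here is a minimal sketch of how an environment knob like this can route allocations around a cache so that cuda-memcheck sees real cudaMalloc/cudaFree boundaries. This is an illustrative assumption, not PyTorch's actual CUDACachingAllocator code; the names `force_uncached`, `raw_alloc`, and `raw_free` are hypothetical.

```cpp
// Sketch only (hypothetical names, not the real CUDACachingAllocator):
// when PYTORCH_NO_CUDA_MEMORY_CACHING is set, skip the cache entirely.
#include <cstdlib>
#include <cuda_runtime.h>

// Read the environment variable once; any value disables caching here.
static bool force_uncached() {
  static const bool force =
      std::getenv("PYTORCH_NO_CUDA_MEMORY_CACHING") != nullptr;
  return force;
}

void* raw_alloc(size_t nbytes) {
  void* ptr = nullptr;
  if (force_uncached()) {
    // Straight to the driver: each request gets its own allocation, so an
    // out-of-bounds access cannot land inside a neighboring cached block.
    cudaMalloc(&ptr, nbytes);
  } else {
    // Normal path: serve the request from the allocator's cached blocks
    // (elided in this sketch; shown as a plain cudaMalloc placeholder).
    cudaMalloc(&ptr, nbytes);
  }
  return ptr;
}

void raw_free(void* ptr) {
  if (force_uncached()) {
    cudaFree(ptr);  // return memory to the driver immediately
  } else {
    // Normal path: keep the block cached for reuse (elided; placeholder).
    cudaFree(ptr);
  }
}
```

With a bypass like this in place, running a test under cuda-memcheck with the variable set (as in the Test Plan command above) reports the faulting access instead of letting it land silently inside a cached slab. The trade-off is speed, since every allocation pays the full cudaMalloc/cudaFree cost, so the knob is intended for debugging only.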