Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45294

While tracking down a recent memory corruption bug we found that cuda-memcheck wasn't finding the bad accesses, and ngimel pointed out that it's because we use a caching allocator, so a lot of "out of bounds" accesses land in a valid slab. This PR adds a runtime knob (`PYTORCH_NO_CUDA_MEMORY_CACHING`) that, when set, bypasses the caching allocator's caching logic so that allocations go straight to cudaMalloc. This way, cuda-memcheck will actually work.

Test Plan: Insert some memory errors and run a test under cuda-memcheck; observe that cuda-memcheck flags an error where expected. Specifically, I removed the output-masking logic here: https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/cuda_codegen.cpp#L819-L826

And ran:

```
PYTORCH_NO_CUDA_MEMORY_CACHING=1 cuda-memcheck pytest -k test_superslomo test_jit_fuser_te.py
```

Reviewed By: ngimel

Differential Revision: D23964734

Pulled By: bertmaher

fbshipit-source-id: 04efd11e8aff037b9edde80c70585cb820ee6e39
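For readers unfamiliar with why caching hides these errors, here is a minimal sketch of how an environment knob like this can route allocations around a cache so that cuda-memcheck sees real cudaMalloc/cudaFree boundaries. This is an illustrative assumption, not PyTorch's actual CUDACachingAllocator code; the names `force_uncached`, `raw_alloc`, and `raw_free` are hypothetical.

```cpp
// Sketch only (hypothetical names, not the real CUDACachingAllocator):
// when PYTORCH_NO_CUDA_MEMORY_CACHING is set, skip the cache entirely.
#include <cstdlib>
#include <cuda_runtime.h>

// Read the environment variable once; any value disables caching here.
static bool force_uncached() {
  static const bool force =
      std::getenv("PYTORCH_NO_CUDA_MEMORY_CACHING") != nullptr;
  return force;
}

void* raw_alloc(size_t nbytes) {
  void* ptr = nullptr;
  if (force_uncached()) {
    // Straight to the driver: each request gets its own allocation, so an
    // out-of-bounds access cannot land inside a neighboring cached block.
    cudaMalloc(&ptr, nbytes);
  } else {
    // Normal path: serve the request from the allocator's cached blocks
    // (elided in this sketch; shown as a plain cudaMalloc placeholder).
    cudaMalloc(&ptr, nbytes);
  }
  return ptr;
}

void raw_free(void* ptr) {
  if (force_uncached()) {
    cudaFree(ptr);  // return memory to the driver immediately
  } else {
    // Normal path: keep the block cached for reuse (elided; placeholder).
    cudaFree(ptr);
  }
}
```

With a bypass like this in place, running a test under cuda-memcheck with the variable set (as in the Test Plan command above) reports the faulting access instead of letting it land silently inside a cached slab. The trade-off is speed, since every allocation pays the full cudaMalloc/cudaFree cost, so the knob is intended for debugging only.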