Summary:
Currently, C10_CUDA_CHECK only reports the source location of the check implementation itself in the raised CUDAException, like below:
```
Exception raised from c10_cuda_check_implementation at fbcode/caffe2/c10/cuda/CUDAException.cpp:44
```
which is not terribly useful.
Checking the original diff D39619861, which introduced c10_cuda_check_implementation, the original macro reported the caller's source location correctly; the refactor into c10_cuda_check_implementation broke it.
This diff propagates the caller's source location to c10_cuda_check_implementation to fix the issue.
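In sketch form (hypothetical helper name, simplified signature; the real code lives in c10/cuda/CUDAException.cpp), the fix forwards the location the macro already captures into the raised error instead of discarding it:
```
#include <c10/util/Exception.h>
#include <cuda_runtime.h>
#include <cstdint>
#include <string>

// Sketch: raise the error with the *caller's* file/func/line, which
// the C10_CUDA_CHECK macro captured via __FILE__/__func__/__LINE__,
// so the "Exception raised from ..." frame points at the CUDA call
// site rather than at this implementation function.
inline void cuda_check_impl_sketch(
    cudaError_t err, const char* file, const char* func, uint32_t line) {
  if (err != cudaSuccess) {
    throw c10::Error(
        /*source_location=*/{func, file, line},
        std::string("CUDA error: ") + cudaGetErrorString(err));
  }
}
```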
Test Plan:
CI
Observed desired error message after the change:
```
CUDA error: an illegal memory access was encountered
Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Device-side assertion tracking was not enabled by user.
Exception raised from operator() at fbcode/sigrid/predictor/aed/AedContainer.cpp:659 (most recent call first):
```
Note the last line reports actual caller location.
Reviewed By: Raymo111
Differential Revision: D81880552
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162808
Approved by: https://github.com/janeyx99
Summary:
This diff introduces a set of changes that makes it possible for the host to receive assertion failures from CUDA devices. This includes the introduction of:
**`CUDA_KERNEL_ASSERT2`**
A preprocessor macro to be used within a CUDA kernel that, upon an assertion failure, writes the assertion message, file, line number, and possibly other information to UVM (Managed memory). Once this is done, the original assertion is triggered, which places the GPU in a Bad State requiring recovery. In my tests, data written to UVM appears there before the GPU reaches the Bad State and is still accessible from the host after the GPU is in this state.
Messages are written to a multi-message buffer which can, in theory, hold many assertion failures. I've done this as a precaution in case there are several, but I don't actually know whether that is possible and a simpler design which holds only a single message may well be all that is necessary.
**`TORCH_DSA_KERNEL_ARGS`**
This preprocessor macro is added as an _argument_ to a kernel function's signature. It expands to supply the standardized names of all the arguments needed by `C10_CUDA_COMMUNICATING_KERNEL_ASSERTION` to handle device-side assertions. This includes, e.g., the name of the pointer to the UVM memory the assertion would be written to. This macro abstracts the arguments so there is a single point of change if the system needs to be modified.
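As a rough illustration (the kernel itself is hypothetical, and the header path is my reading of this PR), a kernel opts in by adding the macro to its signature and asserting with `CUDA_KERNEL_ASSERT2`:
```
#include <c10/cuda/CUDADeviceAssertion.h>

// Hypothetical kernel: TORCH_DSA_KERNEL_ARGS expands to the extra,
// standardized parameters (UVM assertion-buffer pointer, generation
// number, ...) that CUDA_KERNEL_ASSERT2 expects to find in scope.
__global__ void divide_kernel(
    float* out, const float* num, const float* den, int n,
    TORCH_DSA_KERNEL_ARGS) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) {
    return;
  }
  // On failure this writes the message, file, and line number to UVM
  // and then triggers the underlying assertion, trapping the GPU.
  CUDA_KERNEL_ASSERT2(den[i] != 0.0f);
  out[i] = num[i] / den[i];
}
```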
**`c10::cuda::get_global_cuda_kernel_launch_registry()`**
This host-side function returns a singleton object that manages the host's part of the device-side assertions. Upon allocation, the singleton allocates sufficient UVM (Managed) memory to hold information about several device-side assertion failures. The singleton also provides methods for getting the current traceback (used to identify when a kernel was launched). To avoid consuming all the host's memory, the singleton stores launches in a circular buffer; a unique "generation number" is used to ensure that kernel launch failures map to their actual launch points (in the case that the circular buffer wraps before the failure is detected).
**`TORCH_DSA_KERNEL_LAUNCH`**
This host-side preprocessor macro replaces the standard
```
kernel_name<<<blocks, threads, shmem, stream>>>(args)
```
invocation with
```
TORCH_DSA_KERNEL_LAUNCH(kernel_name, blocks, threads, shmem, stream, args);
```
Internally, it fetches the UVM (Managed) pointer and generation number from the singleton and appends these to the standard argument list. It also checks that the kernel launched correctly. This abstraction over kernel launches can be modified to provide additional safety/logging.
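Continuing the hypothetical kernel above (stream shown as a c10 stream wrapper, which is my assumption about what the macro expects):
```
#include <c10/cuda/CUDAStream.h>

// Launches the hypothetical divide_kernel above; previously this would
// have been a raw divide_kernel<<<blocks, threads, 0, stream>>>(...)
// followed by a hand-written launch check.
void launch_divide(float* out, const float* num, const float* den,
                   int n, c10::cuda::CUDAStream stream) {
  const dim3 threads(256);
  const dim3 blocks((n + 255) / 256);
  // The macro takes the kernel name first, fetches the UVM buffer
  // pointer and generation number from the singleton, appends them to
  // the argument list for TORCH_DSA_KERNEL_ARGS, and then verifies
  // that the launch succeeded.
  TORCH_DSA_KERNEL_LAUNCH(divide_kernel, blocks, threads, /*shmem=*/0,
                          stream, out, num, den, n);
}
```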
**`c10::cuda::c10_retrieve_device_side_assertion_info`**
This host-side function checks, when called, that no kernel assertions have occurred. If one has, it raises an exception with:
1. Information (file, line number) about which kernel was launched.
2. Information (file, line number, message) about the device-side assertion.
3. Information (file, line number) about where the failure was detected.
**Checking for device-side assertions**
Device-side assertions are most likely to be noticed by the host when a CUDA API call such as `cudaDeviceSynchronize` is made and fails with a `cudaError_t` indicating
> CUDA error: device-side assert triggered
Therefore, we rewrite `C10_CUDA_CHECK()` to include a call to `c10_retrieve_device_side_assertion_info()`. To make the code cleaner, most of the logic of `C10_CUDA_CHECK()` is now contained within a new function `c10_cuda_check_implementation()` to which `C10_CUDA_CHECK` passes the preprocessor information about filenames, function names, and line numbers. (In C++20 we can use `std::source_location` to eliminate macros entirely!)
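In sketch form (simplified; the real macro and signature live in c10/cuda/CUDAException.h):
```
// Only the cheap error capture stays inline; the preprocessor-captured
// location is handed to the out-of-line implementation, which formats
// the message, appends any device-side assertion report, and throws.
#define C10_CUDA_CHECK(EXPR)                      \
  do {                                            \
    const cudaError_t __err = EXPR;               \
    c10::cuda::c10_cuda_check_implementation(     \
        static_cast<int32_t>(__err),              \
        __FILE__, __func__, __LINE__,             \
        /*include_device_assertions=*/true);      \
  } while (0)
```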
# Notes on special cases
* Multiple assertions from the same block are recorded
* Multiple assertions from different blocks are recorded
* Launching kernels from many threads on many streams seems to be handled correctly
* If two processes are using the same GPU and one of them fails with a device-side assertion, the other process continues without issue
* X Multiple assertions from separate kernels on different streams seem to be recorded, but we can't reproduce the test condition
* X Multiple assertions from separate devices should all be shown upon exit, but we've been unable to generate a test that produces this condition
Differential Revision: D37621532
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84609
Approved by: https://github.com/ezyang, https://github.com/malfet
This marks CUDA exception checks as unlikely, which might have a positive performance impact.
It further isolates part of `C10_CUDA_CHECK` into a separate function and file so that the code can be made more expressive in subsequent diffs without bloating functions that use the check or creating readability issues.
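For illustration, a minimal sketch of the branch hint (`C10_UNLIKELY` is the existing `__builtin_expect` wrapper from c10/macros/Macros.h; the checking function here is hypothetical):
```
#include <cuda_runtime.h>
#include <stdexcept>

#define C10_UNLIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 0))

void check_sketch(cudaError_t err) {
  // The compiler lays the throwing branch out of the hot path,
  // keeping the common success case fall-through and compact.
  if (C10_UNLIKELY(err != cudaSuccess)) {
    throw std::runtime_error("CUDA error");
  }
}
```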
Test Plan: Sandcastle
Differential Revision: D39619861
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85256
Approved by: https://github.com/ezyang, https://github.com/ngimel
Summary:
Introduces additional ways of handling CUDA errors that allow automated linters to detect if errors are being handled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74865
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D35194530
fbshipit-source-id: f4fe61594edbfd81e97a4b605935961b893df167
(cherry picked from commit 919ddf677c5b9b46c5e493ed64346a5f2527bf08)
Summary:
This is a first step towards creating a context manager that errors out on synchronizing calls.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61889
Reviewed By: albanD
Differential Revision: D29805280
Pulled By: ngimel
fbshipit-source-id: b66400fbe0941b7daa51e6b30abe27b9cccd4e8a
Summary:
After the change, async error warnings look as follows:
```
$ python -c "import torch;torch.eye(3,3,device='cuda:777')"
Traceback (most recent call last):
File "<string>", line 1, in <module>
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59467
Reviewed By: ngimel
Differential Revision: D28904360
Pulled By: malfet
fbshipit-source-id: 2a8fa5affed5b4ffcaa602c8ab2669061cde7db0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57609
Throw c10::CudaError for CUDA exceptions, for better classification of errors.
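A hedged sketch of what the classification enables at call sites (exception type spelled as in this summary; the exact header and hierarchy are not shown here):
```
#include <torch/torch.h>

void run_with_cuda_error_handling() {
  try {
    auto t = torch::eye(3, torch::TensorOptions().device(torch::kCUDA));
  } catch (const c10::CudaError& e) {
    // CUDA-specific handling: log device health, retry on CPU, ...
  } catch (const c10::Error& e) {
    // any other c10 failure
  }
}
```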
Test Plan: Test locally by running some workflows
Reviewed By: dzhulgakov
Differential Revision: D28209356
fbshipit-source-id: 19a5fc8548433238dc224ea81a5f63a945fc5cc3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56830
Opt into formatting on GitHub and format everything. This is a trial run before turning on formatting for more and eventually all of the codebase.
Test Plan: CI
Reviewed By: zertosh
Differential Revision: D27979080
fbshipit-source-id: a80f0c48691c08ae8ca0af06377b87e6a2351151
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48615
Rename the macro from `TORCH_CUDA_KERNEL_LAUNCH_CHECK` to `C10_CUDA_KERNEL_LAUNCH_CHECK`, since it is now accessible through c10, not just torch.
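Usage is unchanged apart from the prefix; the macro goes immediately after a raw kernel launch (the kernel here is hypothetical):
```
#include <c10/cuda/CUDAException.h>

__global__ void fill_kernel(float* out, float value, int n) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    out[i] = value;
  }
}

void fill(float* out, float value, int n, cudaStream_t stream) {
  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;
  fill_kernel<<<blocks, threads, 0, stream>>>(out, value, n);
  // Raises immediately if the launch itself failed (bad configuration,
  // sticky prior error, ...) instead of at some later CUDA call.
  C10_CUDA_KERNEL_LAUNCH_CHECK();
}
```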
Test Plan:
```
buck build //caffe2/caffe2:caffe2_cu
buck build //caffe2/aten:ATen-cu
buck test //caffe2/test:kernel_launch_checks -- --print-passing-details
```
Reviewed By: jianyuh
Differential Revision: D25228727
fbshipit-source-id: 9c65feb3d0ea3fbd31f1dcaecdb88ef0534f9121
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48277
We move `TORCH_CUDA_KERNEL_LAUNCH_CHECK` from `//caffe2/aten/src/ATen/cuda/Exceptions.h` to `//caffe2/c10/cuda/CUDAException.h`.
The primary reason is to allow us to use this macro in other subdirectories of //caffe2, not just in ATen. Refer to D24309971 (353e7f940f) for context.
An example of this use case is D24868557, where we add these checks to `//caffe2/caffe2/sgd`.
Also, this should not affect current files, because `Exceptions.h` includes `CUDAException.h`.
Test Plan:
```
buck build //caffe2/aten:ATen-cu
```
- https://fburl.com/buck/oq3rxbir
Also wait for sandcastle tests.
Reviewed By: ngimel
Differential Revision: D25101720
fbshipit-source-id: e2b05b39ff1413a21e64949e26ca24c8f7d0400f
Summary:
This is achieved by using `cuDevicePrimaryCtxGetState` to check whether a primary context exists on a device, without creating one. It is not too slow; a benchmark of a single call (CUDA 10.1, Titan Xp, driver 415.27) gives:
```
------------------------------------------------------------------
Benchmark                         Time           CPU    Iterations
------------------------------------------------------------------
BM_cuDevicePrimaryCtxGetState   301 ns        301 ns       2319746
```
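A minimal sketch of the check using the standard CUDA driver API (the wrapper function name is illustrative):
```
#include <cuda.h>

// Returns true iff `dev` already has an active primary context,
// without creating one the way a runtime-API call would.
bool has_primary_context(CUdevice dev) {
  unsigned int flags = 0;
  int active = 0;
  const CUresult status = cuDevicePrimaryCtxGetState(dev, &flags, &active);
  return status == CUDA_SUCCESS && active != 0;
}
```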
Commits:
1. Add `CUDAHooks::getDeviceWithPrimaryContext`, which returns a device index with a primary context (if one exists).
Link `c10/cuda` against `libcuda` for device API calls.
2. Use `getDeviceWithPrimaryContext` to check primary context in `pin_memory`.
Fix `OptionalDeviceGuard` doc.
3. Refactor `test_cuda_primary_ctx.py` to support multiple tests.
Add test for this in that file.
Fixes https://github.com/pytorch/pytorch/issues/21081.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22229
Differential Revision: D16170194
Pulled By: zou3519
fbshipit-source-id: 485a45f211b7844c9e69c63f3b3b75194a796c5d
Summary:
Having the NVRTC stub in ATen is necessary to call driver APIs in ATen. This is currently blocking https://github.com/pytorch/pytorch/pull/22229.
`DynamicLibrary` is also moved as it is used in the stub code, and seems general enough.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22362
Differential Revision: D16131787
Pulled By: ezyang
fbshipit-source-id: add2ee8a8865229578aa00001a00d5a6671e0e73