Yuanyuan Chen
0f0b4bf029
[1/N] Remove unused header inclusion ( #165763 )
...
This PR removes unused header inclusions from C++ files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165763
Approved by: https://github.com/Skylion007
2025-10-18 05:23:11 +00:00
Raymond Li
08540b13c6
Use cuda error code instead of error text in get_cuda_error_help ( #158688 )
...
Use cudaError_t and switch on the enum, so the logic is not affected by upstream changes to the error-message wording.
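A minimal C++ sketch of the approach (the cases and hint text here are illustrative, not the exact PR code): switching on the cudaError_t enum keeps the hints stable even if a CUDA release rewords its error strings.
```
#include <cuda_runtime.h>
#include <string>

// Illustrative sketch: map selected cudaError_t values to extra guidance.
// Matching on the enum rather than on the error text means the hints keep
// working even if upstream rewords its error messages.
std::string get_cuda_error_help(cudaError_t error) {
  switch (error) {
    case cudaErrorInvalidDevice:
      return "GPU device may be out of range, do you have enough GPUs?";
    case cudaErrorMemoryAllocation:
      return "GPU ran out of memory; consider reducing memory use or freeing cached blocks.";
    default:
      return "";
  }
}
```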
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158688
Approved by: https://github.com/q10 , https://github.com/aorenste
2025-07-21 23:34:50 +00:00
Raymond Li
24b49b9881
[Fix] Rework CUDA error explanation framework to be less destructive … ( #158484 )
...
…in fbsource
Fix-forward for #158395
Added `std::string c10::cuda::get_cuda_error_help(const char* error_string)` to provide a framework for appending clarifying messages to CUDA errors.
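A hedged sketch of what such a helper could look like, assuming it matches on the raw error string (only the function signature comes from the commit; the matching rule and hint text are illustrative):
```
#include <cstring>
#include <string>

namespace c10::cuda {

// Illustrative only: append a clarifying hint based on the raw error text.
// The real matching logic lives in the PR; this shows the overall shape.
std::string get_cuda_error_help(const char* error_string) {
  if (std::strstr(error_string, "invalid device ordinal") != nullptr) {
    return "GPU device may be out of range, do you have enough GPUs?";
  }
  return "";
}

} // namespace c10::cuda
```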
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158484
Approved by: https://github.com/aorenste
2025-07-17 03:36:47 +00:00
Raymond Li
55d888a616
Add framework for explanations for common CUDA errors ( #158395 )
...
As frequently requested in user groups.
Test plan:
```
import torch
a = torch.randn(10000)
device = torch.device('cuda:1')
a = a.to(device)
```
Before:
```
Traceback (most recent call last):
File "/data/users/raymo/pytorch/test/cuda.py", line 6, in <module>
a = a.to(device)
^^^^^^^^^^^^
torch.AcceleratorError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
After:
```
Traceback (most recent call last):
File "/data/users/raymo/pytorch/test/cuda.py", line 6, in <module>
a = a.to(device)
^^^^^^^^^^^^
torch.AcceleratorError: CUDA error: invalid device ordinal
GPU device may be out of range, do you have enough GPUs?
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158395
Approved by: https://github.com/aorenste
Co-authored-by: Aaron Orenstein <aorenste@fb.com>
2025-07-16 12:31:18 +00:00
cyy
8c860aef0d
[Reland][Environment Variable][3/N] Use thread-safe getenv functions ( #137942 )
...
Reland of #137328 , which was reverted due to reverting a dependent PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137942
Approved by: https://github.com/eqy
2024-10-15 07:47:24 +00:00
PyTorch MergeBot
df0c2f5cae
Revert "[Environment Variable][3/N] Use thread-safe getenv wrapper ( #137328 )"
...
This reverts commit 25ac5652d0 .
Reverted https://github.com/pytorch/pytorch/pull/137328 on behalf of https://github.com/clee2000 due to need to revert this in order to revert #133896 , please rebase and reland, sorry for the churn ([comment](https://github.com/pytorch/pytorch/pull/137328#issuecomment-2412143739 ))
2024-10-14 20:22:26 +00:00
cyyever
25ac5652d0
[Environment Variable][3/N] Use thread-safe getenv wrapper ( #137328 )
...
Follows #124485
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137328
Approved by: https://github.com/eqy
2024-10-11 23:23:57 +00:00
chilli
392dc45597
Made FlexAttention rewrite getitem calls to use aten.index in score_mod ( #124799 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124799
Approved by: https://github.com/drisspg
ghstack dependencies: #124444
2024-04-26 17:22:13 +00:00
PyTorch MergeBot
e913f77c60
Revert "Made FlexAttention rewrite getitem calls to use aten.index in score_mod ( #124799 )"
...
This reverts commit 9bccafc31c .
Reverted https://github.com/pytorch/pytorch/pull/124799 on behalf of https://github.com/clee2000 due to broke tests but only on crossref https://github.com/pytorch/pytorch/actions/runs/8841521519/job/24279075171 , added no td label so it'll actually run this time ([comment](https://github.com/pytorch/pytorch/pull/124799#issuecomment-2078530797 ))
2024-04-26 02:35:14 +00:00
chilli
9bccafc31c
Made FlexAttention rewrite getitem calls to use aten.index in score_mod ( #124799 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124799
Approved by: https://github.com/drisspg
ghstack dependencies: #124444
2024-04-26 01:02:28 +00:00
PyTorch MergeBot
678662a557
Revert "Made FlexAttention rewrite getitem calls to use aten.index in score_mod ( #124799 )"
...
This reverts commit acc4cbea39 .
Reverted https://github.com/pytorch/pytorch/pull/124799 on behalf of https://github.com/jeanschmidt due to checking if this diff introduced regressions on linux-focal-py3.11-clang10 and linux-focal-py3.8-clang10 ([comment](https://github.com/pytorch/pytorch/pull/124799#issuecomment-2076756876 ))
2024-04-25 09:29:57 +00:00
chilli
acc4cbea39
Made FlexAttention rewrite getitem calls to use aten.index in score_mod ( #124799 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124799
Approved by: https://github.com/drisspg
2024-04-25 06:19:55 +00:00
PyTorch MergeBot
277ab8a4c0
Revert "[Environment Variable][1/N] Use thread-safe env variable API in c10 ( #119449 )"
...
This reverts commit a56e057814 .
Reverted https://github.com/pytorch/pytorch/pull/119449 on behalf of https://github.com/jeanschmidt due to Broken internal signals, @albanD please help get this sorted :) ([comment](https://github.com/pytorch/pytorch/pull/119449#issuecomment-2069716129 ))
2024-04-22 14:44:44 +00:00
cyy
a56e057814
[Environment Variable][1/N] Use thread-safe env variable API in c10 ( #119449 )
...
This PR begins the effort to wrap the thread-unsafe getenv and set_env functions in an RW mutex.
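A minimal sketch of the pattern, with illustrative names (safe_getenv/safe_setenv are not the PR's identifiers): readers take a shared lock and writers take an exclusive lock, so concurrent lookups never race an environment update.
```
#include <cstdlib>
#include <optional>
#include <shared_mutex>
#include <string>

namespace {
std::shared_mutex& env_mutex() {
  static std::shared_mutex m;
  return m;
}
} // namespace

// Illustrative wrappers: guard the thread-unsafe getenv/setenv pair with an
// RW mutex so many readers may proceed concurrently but never race a writer.
std::optional<std::string> safe_getenv(const char* name) {
  std::shared_lock<std::shared_mutex> lock(env_mutex());
  const char* value = std::getenv(name);
  if (value == nullptr) {
    return std::nullopt;
  }
  return std::string(value);
}

bool safe_setenv(const char* name, const char* value) {
  std::unique_lock<std::shared_mutex> lock(env_mutex());
  return ::setenv(name, value, /*overwrite=*/1) == 0;  // POSIX setenv
}
```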
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119449
Approved by: https://github.com/malfet , https://github.com/albanD
2024-04-19 13:39:41 +00:00
PyTorch MergeBot
61bc188f42
Revert "[Environment Variable][1/N] Use thread-safe env variable API in c10 ( #119449 )"
...
This reverts commit b51f66c195 .
Reverted https://github.com/pytorch/pytorch/pull/119449 on behalf of https://github.com/malfet due to Broke gcc9 builds ([comment](https://github.com/pytorch/pytorch/pull/119449#issuecomment-2064936414 ))
2024-04-18 18:53:59 +00:00
cyy
b51f66c195
[Environment Variable][1/N] Use thread-safe env variable API in c10 ( #119449 )
...
This PR begins the effort to wrap the thread-unsafe getenv and set_env functions in an RW mutex.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119449
Approved by: https://github.com/albanD
2024-04-18 13:35:48 +00:00
PyTorch MergeBot
f5049de242
Revert "[Environment Variable][1/N] Use thread-safe env variable API in c10 ( #119449 )"
...
This reverts commit 5bef127c2e .
Reverted https://github.com/pytorch/pytorch/pull/119449 on behalf of https://github.com/PaliC due to you're using TORCH_INTERNAL_ASSERT incorrectly ([comment](https://github.com/pytorch/pytorch/pull/119449#issuecomment-2062696010 ))
2024-04-17 23:44:00 +00:00
cyy
5bef127c2e
[Environment Variable][1/N] Use thread-safe env variable API in c10 ( #119449 )
...
This PR begins the effort to wrap the thread-unsafe getenv and set_env functions in an RW mutex.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119449
Approved by: https://github.com/albanD
2024-04-16 04:39:20 +00:00
cyy
fb10e13000
[Clang-tidy header][24/N] Fix clang-tidy warnings on c10/cuda/*.{cpp,h} ( #120781 )
...
This PR starts cleaning up clang-tidy warnings in the code under c10/cuda.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120781
Approved by: https://github.com/ezyang
2024-03-15 05:03:22 +00:00
cyy
4a019047ad
Enable nested namespace check in clang-tidy ( #118506 )
...
It is time to enable nested namespaces in the code.
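For context, the check enforces C++17's nested namespace definitions (presumably clang-tidy's modernize-concat-nested-namespaces); a tiny before/after:
```
// Before: one block per namespace level
namespace c10 {
namespace cuda {
void foo();
} // namespace cuda
} // namespace c10

// After: C++17 nested namespace definition
namespace c10::cuda {
void foo();
} // namespace c10::cuda
```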
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118506
Approved by: https://github.com/albanD
2024-01-31 00:32:35 +00:00
PyTorch MergeBot
9c7391ea36
Revert " [1/N] Apply clang-tidy to c10 cuda files ( #111137 )"
...
This reverts commit 43b023694e .
Reverted https://github.com/pytorch/pytorch/pull/111137 on behalf of https://github.com/malfet due to Was reverted internally due to the failures in torch.cuda.memory_stats(device=0) (presumably) ([comment](https://github.com/pytorch/pytorch/pull/111137#issuecomment-1769274103 ))
2023-10-18 20:32:53 +00:00
cyy
43b023694e
[1/N] Apply clang-tidy to c10 cuda files ( #111137 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111137
Approved by: https://github.com/zou3519 , https://github.com/Skylion007
2023-10-17 04:52:50 +00:00
Richard Barnes
6f749fd171
Fixes to DSA infra ( #91835 )
...
Differential Revision: D42397325
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91835
Approved by: https://github.com/soumith
2023-01-12 21:54:26 +00:00
Zachary DeVito
f56ce8dbad
[allocator] Move getFreeMutex ( #87237 )
...
It isn't used by the allocators at all, and this change makes that clearer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87237
Approved by: https://github.com/wconstab
2022-10-19 18:00:40 +00:00
Natalia Gimelshein
6284d2a82b
wrap cudaStreamSynchronize calls ( #61889 )
...
Summary:
This is a first step towards creating a context manager that errors out on synchronizing calls.
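A hedged sketch of the wrapping idea (the names and the flag are illustrative, not the PR's API): route stream synchronization through one function so a future context manager can flag or reject synchronizing calls.
```
#include <cuda_runtime.h>
#include <atomic>
#include <stdexcept>

// Illustrative only: a single choke point for stream synchronization.
// A Python-level context manager could flip this flag to make any
// synchronizing call raise instead of silently blocking.
std::atomic<bool> error_on_sync{false};

cudaError_t stream_synchronize(cudaStream_t stream) {
  if (error_on_sync.load()) {
    throw std::runtime_error("called a synchronizing CUDA operation");
  }
  return cudaStreamSynchronize(stream);
}
```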
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61889
Reviewed By: albanD
Differential Revision: D29805280
Pulled By: ngimel
fbshipit-source-id: b66400fbe0941b7daa51e6b30abe27b9cccd4e8a
2021-07-21 19:30:52 -07:00