Commit Graph

23 Commits

PyTorch MergeBot
c6329524d8 Revert "Add magic TORCH_MAKE_PYBIND_ENUM_FASTER macro (#163527)"
This reverts commit 50c0550f5a.

Reverted https://github.com/pytorch/pytorch/pull/163527 on behalf of https://github.com/swolchok due to breaking import torch in debug builds, see #164297 ([comment](https://github.com/pytorch/pytorch/pull/163527#issuecomment-3361919142))
2025-10-02 15:42:42 +00:00
Scott Wolchok
50c0550f5a Add magic TORCH_MAKE_PYBIND_ENUM_FASTER macro (#163527)
See the comment on the macro definition. In short, pybind11 3.x
added `py::native_enum`, and supporting that new way to bind enums
added overhead on the critical path for calling functions that take
regular old `py::enum_`s as arguments (for example, `__eq__`).
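
For context, a minimal sketch of a plain `py::enum_` binding of the kind
affected (illustrative names; the actual macro and enums live in the
PyTorch sources):

```
#include <pybind11/pybind11.h>
namespace py = pybind11;

enum class Color { Red, Green };

PYBIND11_MODULE(example, m) {
  // Under pybind11 3.x, argument conversion for enums bound this way also
  // has to consider py::native_enum; that extra check is the overhead the
  // macro works around.
  py::enum_<Color>(m, "Color")
      .value("Red", Color::Red)
      .value("Green", Color::Green);
}
```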

Differential Revision: [D82873169](https://our.internmc.facebook.com/intern/diff/D82873169/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163527
Approved by: https://github.com/ezyang
2025-09-26 17:59:22 +00:00
PyTorch MergeBot
9a883007a2 Revert "Implement cuda graphs implementation of torch.cond and torch.while_loop (#140979)"
This reverts commit c7515da7b0.

Reverted https://github.com/pytorch/pytorch/pull/140979 on behalf of https://github.com/huydhn due to This change has been reported to break internal code ([comment](https://github.com/pytorch/pytorch/pull/140979#issuecomment-2657361940))
2025-02-13 18:04:26 +00:00
Daniel Galvez
c7515da7b0 Implement cuda graphs implementation of torch.cond and torch.while_loop (#140979)
This is a new PR for #130386, which went stale and was closed. Since I force-pushed to that branch in order to rebase it on top of main, the PR can no longer be reopened, per https://github.com/isaacs/github/issues/361.

I fixed the possibly-not-warmed-up problem described here: https://github.com/pytorch/pytorch/pull/130386/files#r1690856534

Since this work began, torch.cond and torch.while_loop have apparently gained support for backward passes. I will look into what it would take to support that here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140979
Approved by: https://github.com/eqy, https://github.com/eellison
2025-02-11 18:16:15 +00:00
cyy
f95c71867e [9/N] Fix extra warnings brought by clang-tidy-17 (#139286)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139286
Approved by: https://github.com/ezyang
2024-10-31 05:20:31 +00:00
Will Constable
df59084012 Drop GIL around cudart APIs (#132520)
Noticed a hang where the stuck thread was blocked on a cudaHostUnregister
call, probably due to an internal CUDA deadlock caused by something else;
it was holding the GIL at the time and so blocked other Python threads.

As far as I can tell, the cudart APIs neither require the GIL to be held
nor are marked as thread-unsafe.
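
A minimal sketch of the pattern, assuming a pybind11 binding (names are
illustrative, not the exact PyTorch code):

```
#include <cstdint>
#include <cuda_runtime.h>
#include <pybind11/pybind11.h>
namespace py = pybind11;

PYBIND11_MODULE(example, m) {
  // Release the GIL for the duration of a potentially blocking cudart call
  // so other Python threads can keep running even if the call stalls.
  m.def("host_unregister", [](uintptr_t ptr) {
    py::gil_scoped_release no_gil;
    return static_cast<int>(cudaHostUnregister(reinterpret_cast<void*>(ptr)));
  });
}
```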
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132520
Approved by: https://github.com/LucasLLC, https://github.com/kirtiteja
2024-08-05 17:04:01 +00:00
cyy
3cd6a21e8f [DeviceIndex][6/N] Use DeviceIndex in more places (#120133)
This PR follows the series of patches beginning with #119142 and fixes various XPU and Python-related methods to use DeviceIndex.
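
As a hypothetical illustration of the change's shape (the helper name is
made up for this sketch):

```
#include <c10/core/Device.h>

// Device identifiers travel as c10::DeviceIndex (a narrow integer typedef)
// rather than a plain int or int64_t.
c10::Device makeCudaDevice(c10::DeviceIndex index) {
  return c10::Device(c10::DeviceType::CUDA, index);
}
```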
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120133
Approved by: https://github.com/Skylion007
2024-02-21 06:24:23 +00:00
Nikita Shulga
dfd441a12c [BE] Use nested namespaces in torch/csrc/cuda (#106928)
### <samp>🤖 Generated by Copilot at 6b1dde1</samp>

> _`namespace` syntax_
> _Simplified with C++17_
> _Code is more readable_
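
The change itself is mechanical; a sketch using a declaration from this
directory:

```
#include <Python.h>

// Before (pre-C++17 nesting):
// namespace torch { namespace cuda { namespace shared {
// void initCudartBindings(PyObject* module);
// }}}

// After (C++17 nested namespace definition):
namespace torch::cuda::shared {
void initCudartBindings(PyObject* module);
} // namespace torch::cuda::shared
```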

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106928
Approved by: https://github.com/huydhn, https://github.com/izaitsevfb
2023-08-10 03:56:09 +00:00
Jianyu Huang
63b8ecc415 [CUDA12] Make PyTorch compatible with CUDA 12 (#91118)
Fixes the failure when building PyTorch from source with CUDA 12:

```
In file included from /home/jianyuhuang/Work/Github/pytorch/c10/cuda/CUDAFunctions.h:12,
                 from /home/jianyuhuang/Work/Github/pytorch/c10/cuda/CUDAStream.h:10,
                 from /home/jianyuhuang/Work/Github/pytorch/c10/cuda/CUDAGraphsC10Utils.h:3,
                 from /home/jianyuhuang/Work/Github/pytorch/aten/src/ATen/cuda/CUDAGraph.h:5,
                 from /home/jianyuhuang/Work/Github/pytorch/aten/src/ATen/cuda/CUDAGraph.cpp:2:
/home/jianyuhuang/Work/Github/pytorch/aten/src/ATen/cuda/CUDAGraph.cpp: In member function ‘void at::cuda::CUDAGraph::capture_end()’:
/home/jianyuhuang/Work/Github/pytorch/aten/src/ATen/cuda/CUDAGraph.cpp:168:75: warning: converting to non-pointer type ‘long long unsigned int’ from NULL [-Wconversion-null]
     AT_CUDA_CHECK(cudaGraphInstantiate(&graph_exec_, graph_, NULL, NULL, 0));
                                                                           ^
/home/jianyuhuang/Work/Github/pytorch/c10/cuda/CUDAException.h:31:42: note: in definition of macro ‘C10_CUDA_CHECK’
     C10_UNUSED const cudaError_t __err = EXPR;                           \
                                          ^~~~
/home/jianyuhuang/Work/Github/pytorch/aten/src/ATen/cuda/CUDAGraph.cpp:168:5: note: in expansion of macro ‘AT_CUDA_CHECK’
     AT_CUDA_CHECK(cudaGraphInstantiate(&graph_exec_, graph_, NULL, NULL, 0));
     ^~~~~~~~~~~~~
/home/jianyuhuang/Work/Github/pytorch/aten/src/ATen/cuda/CUDAGraph.cpp:168:75: error: too many arguments to function ‘cudaError_t cudaGraphInstantiate(CUgraphExec_st**, cudaGraph_t, long long unsigned int)’
     AT_CUDA_CHECK(cudaGraphInstantiate(&graph_exec_, graph_, NULL, NULL, 0));
                                                                           ^
/home/jianyuhuang/Work/Github/pytorch/c10/cuda/CUDAException.h:31:42: note: in definition of macro ‘C10_CUDA_CHECK’
     C10_UNUSED const cudaError_t __err = EXPR;                           \
                                          ^~~~
/home/jianyuhuang/Work/Github/pytorch/aten/src/ATen/cuda/CUDAGraph.cpp:168:5: note: in expansion of macro ‘AT_CUDA_CHECK’
     AT_CUDA_CHECK(cudaGraphInstantiate(&graph_exec_, graph_, NULL, NULL, 0));
     ^~~~~~~~~~~~~
In file included from /home/jianyuhuang/Work/Github/pytorch/c10/cuda/CUDAStream.h:6,
                 from /home/jianyuhuang/Work/Github/pytorch/c10/cuda/CUDAGraphsC10Utils.h:3,
                 from /home/jianyuhuang/Work/Github/pytorch/aten/src/ATen/cuda/CUDAGraph.h:5,
                 from /home/jianyuhuang/Work/Github/pytorch/aten/src/ATen/cuda/CUDAGraph.cpp:2:
/usr/local/cuda/include/cuda_runtime_api.h:11439:39: note: declared here
 extern __host__ cudaError_t CUDARTAPI cudaGraphInstantiate(cudaGraphExec_t *pGraphExec, cudaGraph_t graph, unsigned long long flags __dv(0));
                                       ^~~~~~~~~~~~~~~~~~~~
ninja: build stopped: subcommand failed.
```

```
/home/jianyuhuang/Work/Github/pytorch/torch/csrc/cuda/shared/cudart.cpp: In function ‘void torch::cuda::shared::initCudartBindings(PyObject*)’:
/home/jianyuhuang/Work/Github/pytorch/torch/csrc/cuda/shared/cudart.cpp:34:13: error: ‘cudaOutputMode_t’ was not declared in this scope
   py::enum_<cudaOutputMode_t>(
             ^~~~~~~~~~~~~~~~
/home/jianyuhuang/Work/Github/pytorch/torch/csrc/cuda/shared/cudart.cpp:34:13: note: suggested alternative: ‘cudaGraphNode_t’
   py::enum_<cudaOutputMode_t>(
             ^~~~~~~~~~~~~~~~
             cudaGraphNode_t
/home/jianyuhuang/Work/Github/pytorch/torch/csrc/cuda/shared/cudart.cpp:34:29: error: template argument 1 is invalid
   py::enum_<cudaOutputMode_t>(
                             ^
/home/jianyuhuang/Work/Github/pytorch/torch/csrc/cuda/shared/cudart.cpp:38:30: error: ‘cudaKeyValuePair’ was not declared in this scope
       .value("KeyValuePair", cudaKeyValuePair)
                              ^~~~~~~~~~~~~~~~
/home/jianyuhuang/Work/Github/pytorch/torch/csrc/cuda/shared/cudart.cpp:39:21: error: ‘cudaCSV’ was not declared in this scope
       .value("CSV", cudaCSV);
                     ^~~~~~~
/home/jianyuhuang/Work/Github/pytorch/torch/csrc/cuda/shared/cudart.cpp:39:21: note: suggested alternative: ‘cudart’
       .value("CSV", cudaCSV);
                     ^~~~~~~
                     cudart
/home/jianyuhuang/Work/Github/pytorch/torch/csrc/cuda/shared/cudart.cpp:99:7: error: ‘cudaProfilerInitialize’ was not declared in this scope
       cudaProfilerInitialize);
       ^~~~~~~~~~~~~~~~~~~~~~
/home/jianyuhuang/Work/Github/pytorch/torch/csrc/cuda/shared/cudart.cpp:99:7: note: suggested alternative: ‘cudaProfilerStart’
       cudaProfilerInitialize);
       ^~~~~~~~~~~~~~~~~~~~~~
       cudaProfilerStart
ninja: build stopped: subcommand failed.
```

After these fixes, CUDA 12 builds of PyTorch succeed with the standard OSS instructions:

```
USE_CUDA=1 python setup.py develop 2>&1 | tee compile.log
```
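
The `cudaGraphInstantiate` fix follows the usual version-guard pattern; a
hedged sketch (the exact guard used in the PR may differ):

```
#include <cuda_runtime.h>

cudaError_t instantiate(cudaGraphExec_t* exec, cudaGraph_t graph) {
#if defined(CUDA_VERSION) && CUDA_VERSION >= 12000
  // CUDA 12 dropped the error-node and log-buffer parameters.
  return cudaGraphInstantiate(exec, graph, 0);
#else
  return cudaGraphInstantiate(exec, graph, nullptr, nullptr, 0);
#endif
}
```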
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91118
Approved by: https://github.com/ngimel, https://github.com/brad-mengchi
2022-12-20 10:58:53 +00:00
Richard Barnes
3ece9fb45d Check all CUDA API calls for errors in torch/ (#81560)
Summary:
Original commit changeset: 0bb770d2cdb2

Original Phabricator Diff: D35194935 (79e5b053b6)

Differential Revision: D35291874

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81560
Approved by: https://github.com/ezyang
2022-10-28 00:40:48 +00:00
Jeff Daily
263c05c918 [ROCm] work-around missing hipProfilerStart/Stop (#82778)
### Description
cudaProfilerStart and cudaProfilerStop are deprecated but exposed by torch.cuda.cudart().  HIP has corresponding functions stubbed out, hipProfilerStart and hipProfilerStop, but they return hipErrorNotSupported.  Profiling in HIP is supported, but not via these deprecated APIs.

See https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__PROFILER__DEPRECATED.html.

These functions are used indirectly by one or more unit tests that pass once the non-functional HIP APIs are replaced with a dummy implementation.
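
A hedged sketch of that pattern (names are illustrative, not the exact
diff):

```
#ifdef USE_ROCM
#include <hip/hip_runtime.h>

namespace {
// Stand-in for the unsupported hipProfilerStart/hipProfilerStop: report
// success so code paths exercising the deprecated profiler APIs still run.
hipError_t dummyProfilerSwitch() {
  return hipSuccess;
}
} // namespace
#endif
```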

### Testing
Unskipped a related unit test, run by ciflow/trunk.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82778
Approved by: https://github.com/ezyang
2022-08-08 18:25:13 +00:00
Michael Suo
30fb2c4aba [lint] autoformat test/cpp and torch/csrc
Let's have some fun.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78828

Approved by: https://github.com/ezyang
2022-06-11 21:11:16 +00:00
Nikita Shulga
c40a009d66 Revert D35194935: Check all CUDA API calls for errors in torch/
Test Plan: revert-hammer

Differential Revision:
D35194935 (79e5b053b6)

Original commit changeset: f5ec5be87cdf

Original Phabricator Diff: D35194935 (79e5b053b6)

fbshipit-source-id: 0bb770d2cdb29b8e724c0b6a125c748f363d3358
(cherry picked from commit 04e5a73da4a53b0ec296f3df2c85626d19290c1f)
2022-03-31 05:48:30 +00:00
Richard Barnes
79e5b053b6 Check all CUDA API calls for errors in torch/ (#74923)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74923

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D35194935

fbshipit-source-id: f5ec5be87cdf775eb9c99f8c3baed6b0366dda49
(cherry picked from commit 7284c4ed7d57261d4936055e0c1a3f8f911fb1f0)
2022-03-31 05:08:55 +00:00
Shubhorup Biswas
e505f06a79 More folders in clang-tidy (#74908)
Summary:
1. Added the folders torch/csrc/onnx and torch/csrc/cuda to clang-tidy.
2. Fixed clang-tidy violations in torch/csrc/cuda.
3. Fixed the clang-tidy Python runner by adding Python import paths.

Fixes some of https://github.com/pytorch/pytorch/issues/62011

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74908

Reviewed By: atalman

Differential Revision: D35221843

Pulled By: shahofblah

fbshipit-source-id: f0d1f066550b383aa48449b12d194009977c0bd8
(cherry picked from commit 830186a673f432c9f3558f3e9cf1cd4294fa0fb0)
2022-03-29 22:59:16 +00:00
Mike Ruberry
dc87cf5fe1 Fixes mem_get_info when querying on a device other than the current device (#69640)
Summary:
Also fixes the documentation failing to appear, and adds a test validating that the op works properly across multiple devices.
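
A plausible shape for the fix, sketched with a device guard (an assumption;
the actual PR may differ in details):

```
#include <c10/cuda/CUDAException.h>
#include <c10/cuda/CUDAGuard.h>
#include <cuda_runtime.h>
#include <utility>

// cudaMemGetInfo reports on the *current* device, so switch to the queried
// device first; the guard restores the previous device on scope exit.
std::pair<size_t, size_t> memGetInfo(c10::DeviceIndex device) {
  c10::cuda::CUDAGuard guard(device);
  size_t free_bytes = 0, total_bytes = 0;
  C10_CUDA_CHECK(cudaMemGetInfo(&free_bytes, &total_bytes));
  return {free_bytes, total_bytes};
}
```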

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69640

Reviewed By: ngimel

Differential Revision: D32965391

Pulled By: mruberry

fbshipit-source-id: 4fe502809b353464da8edf62d92ca9863804f08e
2021-12-08 23:04:30 -08:00
Pruthvi Madugundu
085e2f7bdd [ROCm] Changes not to rely on CUDA_VERSION or HIP_VERSION (#65610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610

- Replace HIP_PLATFORM_HCC with USE_ROCM (see the sketch after this list).
- Don't rely on CUDA_VERSION or HIP_VERSION; use USE_ROCM and ROCM_VERSION instead.

- In the next PR:
   - Will remove the mapping from CUDA_VERSION to HIP_VERSION, and from CUDA to HIP, in hipify.
   - HIP_PLATFORM_HCC is deprecated, so will add HIP_PLATFORM_AMD to support HIP host code compilation with gcc.
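
A minimal sketch of the guard change (the surrounding code is hypothetical):

```
// Before: ROCm-specific paths keyed off the platform macro.
// #ifdef __HIP_PLATFORM_HCC__
//   ...HIP-only code...
// #endif

// After: key off the build flag, independent of CUDA_VERSION/HIP_VERSION.
#ifdef USE_ROCM
// ...HIP-only code...
#endif
```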

cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd

Reviewed By: jbschlosser

Differential Revision: D30909053

Pulled By: ezyang

fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06
2021-09-29 09:55:43 -07:00
Emilio Castillo
f9ec86a6c6 External stream (#59527)
Summary:
Previous is https://github.com/pytorch/pytorch/issues/57781

We now add two CUDA bindings, avoiding the ctypes usage that caused a
Windows issue. However, we still use ctypes to allocate the stream and
create its pointer (we could do this with a 0-dim tensor instead if that
feels better).
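
A hedged sketch of wrapping an externally created stream on the C++ side
(whether this PR introduced exactly this entry point is an assumption;
`getStreamFromExternal` is the present-day helper):

```
#include <c10/cuda/CUDAStream.h>
#include <cuda_runtime.h>

// Wrap a raw cudaStream_t created outside PyTorch so it can be used as a
// c10::cuda::CUDAStream on the given device.
c10::cuda::CUDAStream wrapExternal(cudaStream_t raw, c10::DeviceIndex device) {
  return c10::cuda::getStreamFromExternal(raw, device);
}
```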

CC. ezyang rgommers ngimel mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59527

Reviewed By: albanD

Differential Revision: D29053062

Pulled By: ezyang

fbshipit-source-id: 661e7e58de98b1bdb7a0871808cd41d91fe8f13f
2021-06-14 13:46:11 -07:00
Corey Lammie
b4b95fc87a Expose cudaMemGetInfo (#58635)
Summary:
This PR resolves the second issue outlined in https://github.com/pytorch/pytorch/issues/58376, which has previously been discussed in https://github.com/pytorch/pytorch/issues/50722.

`cudaMemGetInfo` is bound/exposed to the Python API. An example function call is provided below:

```
device_free, device_total = torch.cuda.mem_get_info(torch.device('cuda:0'))
print(device_free, device_total)
```

In `CUDACachingAllocator.cpp`, in contrast to my initial PR, the newly defined function `std::pair<size_t, size_t> raw_cuda_mem_get_info(int device)` has been moved from the `CUDACaching` namespace to the `cuda` namespace. In addition, as suggested by ezyang, `det` has been removed from all function names.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58635

Reviewed By: zou3519

Differential Revision: D28649093

Pulled By: ezyang

fbshipit-source-id: d8b7c53e52cf73f35495d8651863c5bb408d7a6a
2021-05-25 14:58:35 -07:00
Edward Yang
da4033d32a Make cudaHostRegister actually useful on cudart. (#45159)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45159

By default, pybind11 binds void* to be capsules.  After a lot of
Googling, I have concluded that this is not actually useful:
you can't actually create a capsule from Python land, and our
data_ptr() function returns an int, which means that the
function is effectively unusable.  It didn't help that we had no
tests exercising it.

I've replaced the void* with uintptr_t, so that we now accept int
(and you can pass data_ptr() in directly).  I'm not sure if we
should make these functions accept ctypes types; unfortunately,
pybind11 doesn't seem to have any easy way to do this.
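
A minimal sketch of the resulting binding shape (illustrative, not the
exact PyTorch code):

```
#include <cstdint>
#include <cuda_runtime.h>
#include <pybind11/pybind11.h>
namespace py = pybind11;

PYBIND11_MODULE(example, m) {
  // Accept uintptr_t so Python callers can pass tensor.data_ptr() (an int)
  // directly; convert back to void* for the CUDA runtime.
  m.def("cuda_host_register",
        [](uintptr_t ptr, size_t size, unsigned int flags) {
          return static_cast<int>(
              cudaHostRegister(reinterpret_cast<void*>(ptr), size, flags));
        });
}
```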

Fixes #43006

Also added cudaHostUnregister which was requested.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D23849731

Pulled By: ezyang

fbshipit-source-id: 8a79986f3aa9546abbd2a6a5828329ae90fd298f
2020-09-23 11:05:44 -07:00
Luca Wehrstedt
c20426f86d Fix torch.cuda.check_error type errors (#41330)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41330

`torch.cuda.check_error` is annotated as taking an `int` argument, but running `torch.cuda.check_error(34)` would give:
```
TypeError: cudaGetErrorString(): incompatible function arguments. The following argument types are supported:
    1. (arg0: torch._C._cudart.cudaError) -> str

Invoked with: 34
```
Even if one explicitly cast the argument, running `torch.cuda.check_error(torch._C._cudart.cudaError(34))` would give:
```
AttributeError: 'str' object has no attribute 'decode'
```

This PR fixes both issues (thus allowing `check_error` to be called with an un-cast int) and adds a test.
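
One way the int-accepting path can look on the binding side, as a hedged
sketch (not the exact diff):

```
#include <cuda_runtime.h>
#include <pybind11/pybind11.h>
#include <string>
namespace py = pybind11;

PYBIND11_MODULE(example, m) {
  // Accept a plain int error code and return a Python str, avoiding both
  // the enum-only signature and the bytes-vs-str decode pitfall.
  m.def("cudaGetErrorString", [](int error) {
    return std::string(cudaGetErrorString(static_cast<cudaError_t>(error)));
  });
}
```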
ghstack-source-id: 107628709

Test Plan: Unit tests

Reviewed By: ezyang

Differential Revision: D22500549

fbshipit-source-id: 9170c1e466dd554d471e928b26eb472a712da9e1
2020-07-14 00:47:14 -07:00
Edward Yang
940e678da9 Add back cudaHostRegister to cudart API. (#34665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34665

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20493861

Pulled By: ezyang

fbshipit-source-id: 4215e3037a16be460f20cfc2859be5ee074128d3
2020-03-17 13:30:39 -07:00
Peter Bell
5fc5cf6571 Stop using ctypes to interface with CUDA libraries. (#33678)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33016; continuation of https://github.com/pytorch/pytorch/issues/31160.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33678

Differential Revision: D20249187

Pulled By: ezyang

fbshipit-source-id: 172ce4a0fee7fbe01436a421d1af22ef6173b6ed
2020-03-11 07:22:46 -07:00