Fixes:
- https://github.com/pytorch/pytorch/issues/101138
**Description**
The PR enhances error handling in `_check_cuda_version` by verifying the existence of the `nvcc` executable before invoking `subprocess.check_output`. If `nvcc` is missing, a `FileNotFoundError` is raised with a clear message, guiding users to check their CUDA installation and path configuration.
**Testing**
Manually tested with and without `nvcc` present in the expected path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148671
Approved by: https://github.com/malfet
In the kernelBot leaderboard we support people competing with custom cuda extensions via `load_inline()`, however even on toy kernels this can result in cold starts of up to 90s - this feature is primarily responsible for us having to double our timeout values
I performed an investigation here https://github.com/msaroufim/load_inline_slow and the primary cause was that torch/extension.h and torch/types.h add in about 5,000 header files https://github.com/msaroufim/load_inline_slow/blob/main/header-analysis
So we introduce a mode `no_implicit_headers` which forces users to be explicit about exactly what they want to add. There's a proper test meant to be used in a CLI and a pytest test that's not terribly helpful
Then there's still an open question around what's the most minimal example implementation we can provide. For the baseline kernel we're showing here, it takes about 1 min to compile
1. There's using TensorBase.h (finicky to get right but can get compilation times down to 7s)
2. Just using Tensor.h (down to 15s)
3. Using Shim.h (did not try yet since the syntax is verbose relative to cuda)
This is my take so far https://gist.github.com/msaroufim/079a8d08ffebd0f91a1c2247eb0ce9e0 for a minimal implementation at 15s but @malfet has a simpler one at only 5s
There's more things I'd like to try moving forward like nvrtc and fancier compilation flags. Typical advice around using precompiled headers does not apply to us because we are mostly interested in cold starts where we tear down the machine after running a kernel
Also in a future PR I'd like to fix issue I've noticed with load_inline
1. It needs a force recompilation mode, I was using this quite a bit myself
2. The cache does not take into account changes in environment so the best way to force a recompilation is to change some string in the file
3. Instead of relying on pybind, can we use TORCH_LIBRARY instead
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149480
Approved by: https://github.com/malfet
- Updated HIP flags for Windows (removed non Windows flags on Windows case, added runtime library)
- Set hipcc call for Windows case
- Removed CUDA flags (not used in ROCm) on Windows
- Updated Windows compiler (added case when using ROCm on Windows)
- Fixed path issue in hipify_python
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147382
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Fixes https://github.com/ROCm/hip/issues/3764.
Fixes and improvements to CUDA->HIP flag conversion for CPP extensions
- Log flag conversion for debugging purposes.
- Fix cases where it should not touch the -I flags or cases where CUDA appears more than once by replacing only the first instance.
- Fix case where nvcc key may not exist
- Fix case where hipify should ignore flag values and only touch the flag itself
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149245
Approved by: https://github.com/jeffdaily
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
This patch adds support for sycl kernels build via `torch.utils.cpp_extension.load`, `torch.utils.cpp_extension.load_inline` and (new) `class SyclExtension` APIs. Files having `.sycl` extension are considered to have sycl kernels and are compiled with `icpx` (dpc++ sycl compiler from Intel). Files with other extensions, `.cpp`, `.cu`, are handled as before. API supports building sycl along with other file types into single extension.
Note that `.sycl` file extension is a PyTorch convention for files containing sycl code which I propose to adopt. We did follow up with compiler team to introduce such file extension in the compiler, but they are opposed to this. At the same time discussion around sycl file extension and adding sycl language support into such tools as cmake is ongoing. Eventually cmake also considers to introduce some file extension convention for sycl. I hope we can further influence cmake and compiler communities to broader adopt `.sycl` file extension.
By default SYCL kernels are compiled for all Intel GPU devices for which pytorch native aten SYCL kernels are compiled. At the moment `pvc,xe-lpg`. This behavior can be overridden by setting `TORCH_XPU_ARCH_LIST` environment variables to the comma separated list of desired devices to compile for.
Fixes: #132944
CC: @gujinghui @EikanWang @fengyuan14 @guangyey @jgong5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132945
Approved by: https://github.com/albanD, https://github.com/guangyey, https://github.com/malfet
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
This patch adds support for sycl kernels build via `torch.utils.cpp_extension.load`, `torch.utils.cpp_extension.load_inline` and (new) `class SyclExtension` APIs. Files having `.sycl` extension are considered to have sycl kernels and are compiled with `icpx` (dpc++ sycl compiler from Intel). Files with other extensions, `.cpp`, `.cu`, are handled as before. API supports building sycl along with other file types into single extension.
Note that `.sycl` file extension is a PyTorch convention for files containing sycl code which I propose to adopt. We did follow up with compiler team to introduce such file extension in the compiler, but they are opposed to this. At the same time discussion around sycl file extension and adding sycl language support into such tools as cmake is ongoing. Eventually cmake also considers to introduce some file extension convention for sycl. I hope we can further influence cmake and compiler communities to broader adopt `.sycl` file extension.
By default SYCL kernels are compiled for all Intel GPU devices for which pytorch native aten SYCL kernels are compiled. At the moment `pvc,xe-lpg`. This behavior can be overridden by setting `TORCH_XPU_ARCH_LIST` environment variables to the comma separated list of desired devices to compile for.
Fixes: #132944
CC: @gujinghui @EikanWang @fengyuan14 @guangyey @jgong5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132945
Approved by: https://github.com/albanD, https://github.com/guangyey
This could be BC breaking, because there was a period of time when we use py_limited_api=True but don't enforce the flag, and now that we will start enforcing the flag, people's custom extensions may fail to build.
This is strictly still better behavior, as it is sketchy to claim CPython agnosticism without the flag, but calling this out as potential people yelling at us. Ways to mitigate this risk + reasons this may not be too big a deal:
- People haven't known about py_limited_api for extensions much due to lack of docs from python so usage is low right now
- My current tutorial is in store to make new users of py_limited_api pass this flag, so it'd be a noop for them.
Test plan:
* Locally i'm confident as I tried rebuilding ao with this change and it reliably failed (cuz importing torch/extension.h is a nono)
* Unit test wise, the normal python_agnostic one I added should work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145764
Approved by: https://github.com/ezyang, https://github.com/zou3519, https://github.com/albanD
While working on conda-forge integration, I needed to look at the way the include paths are calculated, and noticed an avoidable duplication between `torch/utils/cpp_extension.py` and `torch/_inductor/cpp_builder.py`. The latter already imports the former anyway, so simply reuse the same function.
Furthermore, remove long-obsolete include-paths. AFAICT, the `/TH` headers have not existed since pytorch 1.11.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145480
Approved by: https://github.com/ezyang
Getting tested with ao, but now there is a real test i added.
## What does this PR do?
We want to allow custom PyTorch extensions to be able to build one wheel for multiple Python versions, in other words, achieve python agnosticism. It turns out that there is such a way that setuptools/Python provides already! Namely, if the user promises to use only the Python limited API in their extension, they can pass in `py_limited_api` to their Extension class and to the bdist_wheel command (with a min python version) in order to build 1 wheel that will suffice across multiple Python versions.
Sounds lovely! Why don't people do that already with PyTorch? Well 2 things. This workflow is hardly documented (even searching for python agnostic specifically does not reveal many answers) so I'd expect that people simply don't know about it. But even if they did, _PyTorch_ custom Extensions would still not work because we always link torch_python, which does not abide by py_limited_api rules.
So this is where this PR comes in! We respect when the user specifies py_limited_api and skip linking torch_python under that condition, allowing users to enroll in the provided functionality I just described.
## How do I know this PR works?
I manually tested my silly little ultra_norm locally (with `import python_agnostic`) and wrote a test case for the extension showing that
- torch_python doesn't show up in the ldd tree
- no Py- symbols show up
It may be a little confusing that our test case is actually python-free (more clean than python-agnostic) but it is sufficient (and not necessary) towards showing that this change works.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138088
Approved by: https://github.com/ezyang, https://github.com/albanD
Over time, a large number of the existing type ignores have become irrelevant/unused/dead as a result of improvements in annotations and type checking.
Having these `# type: ignore` linger around is not ideal for two reasons:
- They are verbose/ugly syntatically.
- They could hide genuine bugs in the future, if a refactoring would actually introduce a bug but it gets hidden by the ignore.
I'm counting over 1500 unused ignores already. This is a first PR that removes some of them. Note that I haven't touched type ignores that looked "conditional" like the import challenge mentioned in https://github.com/pytorch/pytorch/pull/60006#issuecomment-2480604728. I will address these at a later point, and eventually would enable `warn_unused_ignores = True` in the mypy configuration as discussed in that comment to prevent accumulating more dead ignores going forward.
This PR should have no effect on runtime at all.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142325
Approved by: https://github.com/Skylion007, https://github.com/janeyx99
Newer versions of distutils no longer import `_msvccompiler` upon init(on Windows platform, that was not the case on other platforms even before 74), but it's still accessible if one chooses to import it directly.
Test plan:
```
% python -c 'from setuptools import distutils; print(distutils.__version__, hasattr(distutils, "_msvccompiler")); from distutils import _msvccompiler; import setuptools; print(setuptools.__version__, _msvccompiler.__file__)'
3.10.9 False
65.5.0 /usr/local/fbcode/platform010/Python3.10.framework/Versions/3.10/lib/python3.10/site-packages/setuptools/_distutils/_msvccompiler.py
```
and
```
% python -c 'from setuptools import distutils; print(distutils.__version__, hasattr(distutils, "_msvccompiler")); from distutils import _msvccompiler; import setuptools; print(setuptools.__version__, _msvccompiler.__file__)'
3.13.0 False
75.6.0 /Users/malfet/py312-venv/lib/python3.13/site-packages/setuptools/_distutils/_msvccompiler.py
```
Gave up trying to appease the linker, so rewrote it as following function:
```python
def _get_vc_env(vc_arch: str) -> dict[str, str]:
try:
from setuptools import distutils # type: ignore[import]
return distutils._msvccompiler._get_vc_env(vc_arch) # type: ignore[no-any-return]
except AttributeError:
from setuptools._distutils import _msvccompiler #type: ignore[import]
return _msvccompiler._get_vc_env(vc_arch) # type: ignore[no-any-return]
```
This PR also undoes setuptools version restriction introduced by https://github.com/pytorch/pytorch/pull/136489 as premise for restriction is incorrect
Fixes https://github.com/pytorch/pytorch/issues/141319
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141363
Approved by: https://github.com/huydhn, https://github.com/atalman
This PR enables Pytorch for Windows on Arm64 - CPU only.
Currently, there aren't any checks in place to build and test for Windows on Arm64, but we're working to implement those as soon as possible.
We recommend using [Arm Performance Libraries (APL)](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Libraries) as a BLAS option, which is introduced in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133088
Approved by: https://github.com/malfet
Co-authored-by: cristian panaite <panaite.cristian2000@gmail.com>
Co-authored-by: Stefan-Alin Pahontu <56953855+alinpahontu2912@users.noreply.github.com>
Co-authored-by: Ozan Aydin <148207261+ozanMSFT@users.noreply.github.com>
This is to avoid cache confusion between normal vs pydebug vs nogil builds in cpp extensions which can lead to catastrophic ABI issues.
This is rare today for people to run both normal and pydebug on the same machine, but we expect quite a few people will run normal and nogil on the same machine going forward.
This is tested locally by running each version alternatively.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136890
Approved by: https://github.com/colesbury