PR #90689 replaces NVTX with NVTX3. However, the torch::nvtoolsext target was created only when the third-party NVTX was used.
This is clearly a logical error. We now move the creation code out of the branch so that it covers all cases. This should fix the issues reported in the comments of #90689.
It would be better to move the configurations of the failed FRL jobs into CI tests so that we can catch such issues early, before merging.
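A minimal sketch of the shape of the fix (option and path names here are illustrative, not the exact code in the CMake files): the imported target is created unconditionally, and only its include directory depends on the branch.

```cmake
# Sketch: create torch::nvtoolsext in all cases, then point it at
# whichever NVTX headers apply on this configuration.
add_library(torch::nvtoolsext INTERFACE IMPORTED)
if(USE_SYSTEM_NVTX)  # hypothetical option name
  set_property(TARGET torch::nvtoolsext PROPERTY
               INTERFACE_INCLUDE_DIRECTORIES "${NVTX3_INCLUDE_DIR}")
else()
  set_property(TARGET torch::nvtoolsext PROPERTY
               INTERFACE_INCLUDE_DIRECTORIES
               "${PROJECT_SOURCE_DIR}/third_party/NVTX/c/include")
endif()
```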
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97582
Approved by: https://github.com/peterbell10
Summary:
This stack of PR's integrates cuSPARSELt into PyTorch.
This PR adds support for cuSPARSELt into the build process.
It adds in a new flag, USE_CUSPARSELT that defaults to false.
When USE_CUSPARSELT=1 is specified, the user can also specify
CUSPARSELT_ROOT, which defines the path to the library.
Compiling pytorch with cusparselt support can be done as follows:
```
USE_CUSPARSELT=1
CUSPARSELT_ROOT=/path/to/cusparselt
python setup.py develop
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103700
Approved by: https://github.com/albanD
We greatly simplify the handling of OpenMP in CMake by using the caffe2::openmp target throughout. We follow the old behavior by defaulting to the MKL OMP library, and detect OMP flags otherwise.
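With everything behind one interface target, consumers only need to link it; a hedged sketch (the caffe2::openmp target name is from the PR, the consuming target is illustrative):

```cmake
# All OpenMP handling (MKL OMP vs. detected compiler flags) lives behind
# a single interface target; libraries just link against it instead of
# duplicating the flag-detection logic.
target_link_libraries(torch_cpu PRIVATE caffe2::openmp)
```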
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91576
Approved by: https://github.com/malfet
The main changes are:
1. Remove outdated checks for old compiler versions that cannot support C++17.
2. Remove outdated CMake checks, since we now require CMake 3.18.
3. Remove outdated CUDA checks, since we are moving to CUDA 11.
Almost all changes are in CMake files, for easy auditing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90599
Approved by: https://github.com/soumith
Summary:
Fixes : https://github.com/pytorch/pytorch/issues/73377
We migrated to CUDA 11.3 as the default toolkit in 1.9, so it's time to stop these builds (especially considering the forward-compatibility guarantee across CUDA 11.x drivers).
Hence we are removing CUDA 11.1 support. We should also clean up old CUDA-related code from our builder and pytorch repos, making the scripts a little cleaner.
We have code that references CUDA 9.2, 10.1, 11.0, 11.1, and 11.2, and none of these are currently used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73514
Reviewed By: janeyx99
Differential Revision: D34551989
Pulled By: atalman
fbshipit-source-id: 9ceaaa9b25ad49689986f4b29a26d20370d9d011
(cherry picked from commit fe109c62daf429e9053c03f6e374568ba23cd041)
Summary:
Remove forcing CUDNN_STATIC when CAFFE2_STATIC_LINK_CUDA is set.
We are transitioning to dynamic loading for multiple PyTorch dependencies, and cuDNN is the first step in this transition, so we want to stop forcing cuDNN to link statically and instead load it dynamically.
Tested using following workflow:
https://github.com/pytorch/pytorch/actions/runs/1790666862
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72290
Reviewed By: albanD
Differential Revision: D34003793
Pulled By: atalman
fbshipit-source-id: 41bda7ac019a612ee53ceb18d1e372b1bb3cb68e
(cherry picked from commit 4a01940e68)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62445
PyTorch currently uses the old style of compiling CUDA in CMake which is just a
bunch of scripts in `FindCUDA.cmake`. Newer versions support CUDA natively as
a language just like C++ or C.
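With native CUDA language support, the old FindCUDA macros can in principle be replaced by something like the following (an illustrative sketch, not PyTorch's actual CMake):

```cmake
cmake_minimum_required(VERSION 3.18)
project(example LANGUAGES CXX CUDA)

# .cu files are compiled by CMake's first-class CUDA support;
# no cuda_add_library()/FindCUDA.cmake script machinery is required.
add_library(kernels STATIC kernels.cu)
set_target_properties(kernels PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
target_compile_features(kernels PUBLIC cuda_std_14)
```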
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D31503350
fbshipit-source-id: 2ee817edc9698531ae1b87eda3ad271ee459fd55
Summary:
This is a PR on the build system that provides support for cross-compiling on Jetson platforms.
The major change is:
1. Disable try-runs for cross compiling in `COMPILER_WORKS`, `BLAS`, and `CUDA`, since try-runs cannot be performed in a cross-compile setup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59764
Reviewed By: soulitzer
Differential Revision: D29524363
Pulled By: malfet
fbshipit-source-id: f06d1ad30b704c9a17d77db686c65c0754db07b8
Summary:
This is only important for builds where cuDNN is linked statically into libtorch_cpu.
Before this PR, PyTorch wheels often accidentally contained several partial copies of the cudnn_static library.
Splitting the interface into a header-only target (cudnn-public) and a library+headers target (cudnn-private) prevents this from happening.
This is a preliminary step towards optionally linking the whole cudnn library, to work around the issue reported in https://github.com/pytorch/pytorch/issues/50153
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59721
Reviewed By: ngimel
Differential Revision: D29000967
Pulled By: malfet
fbshipit-source-id: f054df92b265e9494076ab16c247427b39da9336
Summary:
Library linking order matters during static linking.
Not sure whether it's a bug or a feature, but if cublas is referenced
before cuDNN, it will be partially statically linked into the library,
even if it is not used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58287
Reviewed By: janeyx99
Differential Revision: D28433165
Pulled By: malfet
fbshipit-source-id: 8dffa0533075126dc383428f838f7d048074205c
Summary:
Fixes the following error during static linking, by enforcing that the cudart dependency is placed after cublasLt:
```
/usr/bin/ld: /usr/local/cuda/lib64/libcublasLt_static.a(libcublasLt_static.a.o): undefined reference to symbol 'cudaStreamWaitEvent@libcudart.so.11.0'
/usr/local/cuda/lib64/libcudart.so: error adding symbols: DSO missing from command line
```
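The fix amounts to ordering the static link line so that cudart comes after its users; a sketch with illustrative target and variable names:

```cmake
# With static archives, the linker resolves symbols left to right: the
# library that *provides* cudaStreamWaitEvent must appear after the one
# that *uses* it.
target_link_libraries(torch_cuda PRIVATE
  ${CUDA_cublasLt_static_LIBRARY}  # uses cudaStreamWaitEvent
  ${CUDA_cudart_LIBRARY})          # provides it, so it must come last
```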
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52509
Reviewed By: janeyx99
Differential Revision: D26547622
Pulled By: malfet
fbshipit-source-id: 4e17f18cf0ab5479a549299faf2583a79fbda4b9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52184
`auditwheel` inserts the first 8 characters of a library's sha256 checksum into its name before relocating it into the wheel package. This change adds logic for computing the same short sha256 and embedding it into LazyNVRTC as an alternative name for libnvrtc.so.
Fixes https://github.com/pytorch/pytorch/issues/52075
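The short hash can be reproduced in a few lines (a hedged sketch; the actual logic lives in C++ inside LazyNVRTC, this just illustrates the computation):

```python
import hashlib

def short_sha256(path: str) -> str:
    """First 8 hex characters of a file's SHA-256 checksum, matching
    the tag auditwheel inserts into a relocated library's name."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large shared libraries don't need to fit in memory.
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()[:8]
```

LazyNVRTC can then try both the plain library name and the hashed variant when resolving the library at runtime.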
Test Plan: Imported from OSS
Reviewed By: seemethere
Differential Revision: D26417403
Pulled By: malfet
fbshipit-source-id: e366dd22e95e219979f6c2fa39acb11585b34c72
Summary:
Necessary to ensure correct link order, especially if libraries are
linked statically. Otherwise, one might run into:
```
/usr/bin/ld: /usr/local/cuda/lib64/libcublasLt_static.a(libcublasLt_static.a.o): undefined reference to symbol 'cudaStreamWaitEvent@libcudart.so.11.0'
/usr/local/cuda/lib64/libcudart.so: error adding symbols: DSO missing from command line
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52243
Reviewed By: seemethere, ngimel
Differential Revision: D26437159
Pulled By: malfet
fbshipit-source-id: 33b8bb5040bda10537833f3ad737f535488452ea
Summary:
This makes the command line shorter.
Also updates `randomtemp`, whose previous version had a limitation that the length of an argument could not exceed 260 characters.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45842
Reviewed By: albanD
Differential Revision: D24137088
Pulled By: ezyang
fbshipit-source-id: f0b4240735306e302eb3887f54a2b7af83c9f5dc
Summary:
If arguments in `set_target_properties` are not separated by whitespace, CMake raises a warning:
```
CMake Warning (dev) at cmake/public/cuda.cmake:269:
Syntax Warning in cmake code at column 54
Argument not separated from preceding token by whitespace.
```
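The offending pattern and its fix look roughly like this (target and variable names are hypothetical, just illustrating the warning):

```cmake
# Before (triggers the warning): the property name and its value are fused
# together with no whitespace between the tokens.
#   set_target_properties(foo PROPERTIES IMPORTED_LOCATION"${FOO_LIB}")

# After: every argument separated by whitespace.
set_target_properties(foo PROPERTIES IMPORTED_LOCATION "${FOO_LIB}")
```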
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42707
Reviewed By: ailzhang
Differential Revision: D22988055
Pulled By: malfet
fbshipit-source-id: c3744f23b383d603788cd36f89a8286a46b6c00f
Summary:
This PR intends to fix https://github.com/pytorch/pytorch/issues/32983.
The initial (one-line) diff causes statically linked cudnn symbols in `libtorch_cuda.so` to have local linkage (such that they shouldn't be visible to external libraries during dynamic linking at load time), at least in my source build on Ubuntu 20.04.
Procedure I used to verify:
```
export USE_STATIC_CUDNN=ON
python3 setup.py install
...
```
then
```
mcarilli@mcarilli-desktop:~/Desktop/mcarilli_github/pytorch/torch/lib$ nm libtorch_cuda.so | grep cudnnCreate
00000000031ff540 t cudnnCreate
00000000031fbe70 t cudnnCreateActivationDescriptor
```
Before the diff they were marked with capital `T`s indicating external linkage.
Caveats:
- The fix is gcc-specific afaik. I have no idea how to enable it for Windows or other compilers.
- Hiding the cudnn symbols will break external C++ applications that rely on linking `libtorch.so` to supply cudnn symbol definitions. IMO this is "off menu" usage so I don't think it's a major concern. Hiding the symbols _won't_ break applications that call cudnn indirectly through torch functions, which IMO is the "on menu" way.
- I know _very little_ about the build system. The diff's intent is to add a link option that applies to any PyTorch `.so`s that statically link cudnn, and does so on Linux only. I'm blindly following soumith's recommendation https://github.com/pytorch/pytorch/issues/32983#issuecomment-662056151, and post-checking the built libs (I also added `set(CMAKE_VERBOSE_MAKEFILE ON)` to the top-level CMakeLists.txt at one point to confirm `-Wl,--exclude-libs,libcudnn_static.a` was picked up by the command that linked `libtorch_cuda.so`).
- https://github.com/pytorch/pytorch/issues/32983 (which used a Pytorch 1.4 binary build) complained about `libtorch.so`, not `libtorch_cuda.so`:
```
nvpohanh@ubuntu:~$ nm /usr/local/lib/python3.5/dist-packages/torch/lib/libtorch.so | grep ' cudnnCreate'
000000000f479c30 T cudnnCreate
000000000f475ff0 T cudnnCreateActivationDescriptor
```
In my source build, `libtorch.so` ends up small, containing no cudnn symbols (this is true with or without the PR's diff), which contradicts https://github.com/pytorch/pytorch/issues/32983. Maybe the symbol organization (what goes in `libtorch.so` vs `libtorch_cuda/cpu/whatever.so`) changed since 1.4. Or maybe the symbol organization is different for source vs binary builds, in which case I have no idea if this PR's diff has the same effect for a binary build.
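The one-line diff can be sketched as follows (the consuming target name is illustrative; the flag is GNU-ld specific, as noted in the caveats above):

```cmake
if(CMAKE_SYSTEM_NAME STREQUAL "Linux" AND USE_STATIC_CUDNN)
  # Give every symbol pulled in from the static cuDNN archive local
  # (hidden) linkage, so it is not re-exported by libtorch_cuda.so.
  target_link_options(torch_cuda PRIVATE
    "-Wl,--exclude-libs,libcudnn_static.a")
endif()
```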
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41986
Reviewed By: glaringlee
Differential Revision: D22934926
Pulled By: malfet
fbshipit-source-id: 711475834e0f8148f0e5f2fe28fca5f138ef494b
Summary:
As explained in https://github.com/pytorch/pytorch/issues/41922, using `if(NOT ${var})` is usually wrong and can lead to issues like https://github.com/pytorch/pytorch/issues/41922, where the condition is wrongly evaluated to FALSE instead of TRUE. Instead, the unevaluated variable name should be used in all cases; see the CMake documentation for details.
This fixes the `NOT ${var}` cases via a simple regexp replacement. It seems `pybind11_PREFER_third_party` is the only variable really prone to causing an issue, as all the others are set. However, because CMake evaluates unquoted strings in `if` conditions as variable names, I recommend never using unquoted `${var}` in an `if` condition. A similar regexp-based replacement could be done on the whole codebase, but since that would make a lot of changes I didn't include it now. Also, `if(${var})` will likely lead to a parser error if `var` is unset, instead of merely a wrong result.
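The unset-variable case mentioned above can be shown in a small self-contained example:

```cmake
# pybind11_PREFER_third_party is never set on this code path.

# Wrong: ${...} expands to nothing before if() parses its arguments,
# leaving "if(NOT )" -- a CMake error ("if given arguments: NOT").
#   if(NOT ${pybind11_PREFER_third_party})

# Right: if() looks the variable up by name; an unset variable is
# false, so NOT it is TRUE, as intended.
if(NOT pybind11_PREFER_third_party)
  message(STATUS "taken, as intended")
endif()
```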
Fixes https://github.com/pytorch/pytorch/issues/41922
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41924
Reviewed By: seemethere
Differential Revision: D22700229
Pulled By: mrshenli
fbshipit-source-id: e2b3466039e4312887543c2e988270547a91c439
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39277
This PR contains initial changes that makes PyTorch build with Ampere GPU, CUDA 11, and cuDNN 8.
TF32 related features will not be included in this PR.
Test Plan: Imported from OSS
Differential Revision: D21832814
Pulled By: malfet
fbshipit-source-id: 37f9c6827e0c26ae3e303580f666584230832d06
Summary:
On Windows, when you call unsupported functions like `std::pow`, `std::isnan`, or `std::isinf` in a device function and compile, a warning is thrown:
```
kernel.cu
kernel.cu(39): warning: calling a __host__ function from a __host__ __device__ function is not allowed
kernel.cu(42): warning: calling a __host__ function from a __host__ __device__ function is not allowed
kernel.cu(39): warning: calling a __host__ function("isnan<double> ") from a __host__ __device__ function("test_") is not allowed
kernel.cu(42): warning: calling a __host__ function("isinf<double> ") from a __host__ __device__ function("test_") is not allowed
```
However, those calls lead to runtime errors; see https://github.com/pytorch/pytorch/pull/36749#issuecomment-619239788 and https://github.com/pytorch/pytorch/issues/31108. So we should treat them as errors.
Previously, the situation was even worse because these warnings were turned off entirely by passing `-w`.
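nvcc can promote this diagnostic class to a hard error; a sketch of the kind of flag involved (hedged: the PR's exact flag plumbing may differ):

```cmake
# Promote the "calling a __host__ function from a __host__ __device__
# function" warning to an error (nvcc's cross-execution-space-call
# diagnostic class).
string(APPEND CMAKE_CUDA_FLAGS " --Werror cross-execution-space-call")
```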
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37302
Differential Revision: D21297207
Pulled By: ngimel
fbshipit-source-id: 822b8a98c10e54c38319674763b6681db21c1021
Summary:
We should not rely on asynchronous exceptions. Catching only C++ exceptions is more sensible, and we may get a boost in both space (1163 MB -> 1073 MB, 0.92x) and performance (51m -> 49m, 0.96x).
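On MSVC this corresponds to compiling with /EHsc (synchronous, C++-only exception handling) rather than /EHa (which also catches structured/asynchronous exceptions); a sketch of the flag change:

```cmake
if(MSVC)
  # /EHa makes catch(...) also swallow SEH (asynchronous) exceptions and
  # inhibits some optimizations; /EHsc catches C++ exceptions only.
  string(REPLACE "/EHa" "/EHsc" CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS}")
endif()
```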
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37235
Differential Revision: D21256918
Pulled By: ezyang
fbshipit-source-id: 572ee96f2e4c48ad13f83409e4e113483b3a457a
Summary:
Fix https://github.com/pytorch/pytorch/issues/33928. Basically just moves the dependency into a new imported target.
I'm not sure whether this modification will affect other parts; please test it thoroughly.
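An imported target of the shape involved (all names here are illustrative; the real target and library come from the PR):

```cmake
# Wrap the dependency in an imported target so consumers pick it up
# transitively, instead of a raw library path baked into each link line.
add_library(caffe2::extra_dep UNKNOWN IMPORTED)
set_target_properties(caffe2::extra_dep PROPERTIES
    IMPORTED_LOCATION "${EXTRA_DEP_LIBRARY}")
target_link_libraries(torch_cuda PRIVATE caffe2::extra_dep)
```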
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37310
Differential Revision: D21263066
Pulled By: ezyang
fbshipit-source-id: 7dc38f578d7e9bcb491ef5e122106fb66a33156f