Not sure, why was it excluded previous (oversight I guess).
Also, please note, that `clang++` is already considered acceptable compiler (as it ends with `g++` ;))
<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at 55aa7db</samp>
> _`clang` or `gcc`, we don't care what you use_
> _We'll build our extensions with the tools we choose_
> _Don't try to stop us with your version string_
> _We'll update our logic and make our code sing_
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103349
Approved by: https://github.com/seemethere
When we need to link extra libs, we should notice that 64-bit CUDA may be installed in "lib", not in "lib64".
<!--
copilot:summary
-->
### <samp>🤖 Generated by Copilot at 05c1ca6</samp>
Improve CUDA compatibility in `torch.utils.cpp_extension` by checking for `lib64` or `lib` directory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101285
Approved by: https://github.com/ezyang, https://github.com/malfet
Currently if `setuptools<49.4.0` and there is a minor version mismatch `_check_cuda_version` fails with a misleading non-actionable error:
```
2023-03-24T20:21:35.0625644Z RuntimeError:
2023-03-24T20:21:35.0628441Z The detected CUDA version (11.2) mismatches the version that was used to compile
2023-03-24T20:21:35.0630681Z PyTorch (11.3). Please make sure to use the same CUDA versions.
```
This condition shouldn't be failing since minor version match isn't required.
It fails because the other condition to have a certain version of `setuptools` isn't met. But that condition is written in a comment (!!!). So this PR changes it to actually tell the user how to fix the problem.
While at it, I adjusted the version number as a lower `setuptools>=49.4.0` is sufficient for this to work.
Thanks.
p.s. this problem manifests on `nvidia/cuda:11.2.2-cudnn8-devel-ubuntu20.04` docker image.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97602
Approved by: https://github.com/ezyang
Merges startswith, endswith calls to into a single call that feeds in a tuple. Not only are these calls more readable, but it will be more efficient as it iterates through each string only once.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96754
Approved by: https://github.com/ezyang
These warnings are disabled to avoid long log on Windows tests. They are also disabled on CMake buildings currently.
'/wd4624': MSVC complains "destructor was implicitly defined as delete" on c10::optional and other templates
'/wd4076': "unexpected tokens following preprocessor directive - expected a newline" on some header
'/wd4068': "The compiler ignored an unrecognized [pragma]"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95933
Approved by: https://github.com/ezyang
Applies the remaining flake8-comprehension fixes and checks. This changes replace all remaining unnecessary generator expressions with list/dict/set comprehensions which are more succinct, performant, and better supported by our torch.jit compiler. It also removes useless generators such as 'set(a for a in b)`, resolving it into just the set call.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676
Approved by: https://github.com/ezyang
Optimize unnecessary collection cast calls, unnecessary calls to list, tuple, and dict, and simplify calls to the sorted builtin. This should strictly improve speed and improve readability.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94323
Approved by: https://github.com/albanD
The main changes are:
1. Remove outdated checks for old compiler versions because they can't support C++17.
2. Remove outdated CMake checks because it now requires 3.18.
3. Remove outdated CUDA checks because we are moving to CUDA 11.
Almost all changes are in CMake files for easy audition.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90599
Approved by: https://github.com/soumith
Switch GCC/Clang max versions to be exclusive as the `include/crt/host_config.h` checks the major version only for the upper bound. This allows to be less restrictive and match the checks in the aforementioned header.
Also update the versions using that header in the CUDA SDKs.
Follow up to #82860
I noticed this as PyTorch 1.12.1 with CUDA 11.3.1 and GCC 10.3 was failing in the `test_cpp_extensions*` tests.
Example for CUDA 11.3.1 from the SDK header:
```
#if __GNUC__ > 11
// Error out
...
#if (__clang_major__ >= 12) || (__clang_major__ < 3) || ((__clang_major__ == 3) && (__clang_minor__ < 3))
// Error out
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86360
Approved by: https://github.com/ezyang
As we are linking with cuDNN and cuBLAS dynamically for all configs anyway, as statically linked cuDNN is different library than dynamically linked one, increases default memory footprint, etc, and libtorch_cuda even if compiled for all GPU architectures is no longer approaching 2Gb binary size limit, so BUILD_SPLIT_CUDA can go away.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87502
Approved by: https://github.com/atalman
The function checks the output of e.g. `c++ -v` for "gcc version". But on another locale than English it might be "gcc-Version" which makes the check fail.
This causes the function to wrongly return false on systems where `c++` is a hardlink to `g++` and the current locale returns another output format.
Fix this by setting `LC_ALL=C`.
I found this as `test_utils.py` was failing in `test_cpp_compiler_is_ok`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85891
Approved by: https://github.com/ezyang
This is a new version of #15648 based on the latest master branch.
Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR.
In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I marked as skip are causing segfaults. Running xdoctest results in 293 failed, 201 passed tests. The next commits will be to disable those tests. (unfortunately I don't have a tool that will insert the `#xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.)
Fixes https://github.com/pytorch/pytorch/issues/71105
@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797
Approved by: https://github.com/ezyang
### Description
<!-- What did you change and why was it needed? -->
Listed in the commit message:
> The user may want to use `python3 -c "..."` to get the torch library
> path and the include path. Printing messages to stdout will mess up
> the output.
I'm using the command:
```bash
LIBTORCH_PATH="$(
python3 -c 'print(":".join(__import__("torch.utils.cpp_extension", fromlist=[None]).library_paths()))'
)"
export LD_LIBRARY_PATH="${LIBTORCH_PATH}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
```
To let the command line tools find the torch shared libraries. I think this would be a common use case for users who writing C/C++ extensions.
I got:
```console
$ LIBTORCH_PATH="$(python3 -c 'print(":".join(__import__("torch.utils.cpp_extension", fromlist=[None]).library_paths()))')"
$ export LD_LIBRARY_PATH="${LIBTORCH_PATH}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
$ echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}"
LD_LIBRARY_PATH=No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-11.6'
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/torch/lib:/usr/local/cuda-11.6/lib64:
$ ls -alh "${LIBTORCH_PATH}"
ls: cannot access 'No CUDA runtime is found, using CUDA_HOME='\''/usr/local/cuda-11.6'\'''$'\n''/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/torch/lib': No such file or directory
```
This PR prints messages in `torch.utils.cpp_extension` to `stderr`, which allows users to get correct result using `VAR="$(python3 -c '...')"`
### Issue
<!-- Link to Issue ticket or RFP -->
N/A
### Testing
<!-- How did you test your change? -->
N/A
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82097
Approved by: https://github.com/ezyang
*even if no GPUs are available*
When building PyTorch extensions for ROCm Pytorch, if the user doesn't specify a list of archs using PYTORCH_ROCM_ARCH env var, we would like to use the list of gfx archs that PyTorch was built for as the default value. To do this successfully even in an environment where no GPUs are available eg. a build-only CPU node, we need to be able to get the list of archs. `torch.cuda.get_arch_list()` doesn't work here because it calls `torch.cuda.available()` first: 0922cc024e/torch/cuda/__init__.py (L463), which will return `False` if no GPUs are available, resulting in an empty list being returned by `torch.cuda.get_arch_list()`. To get around this issue, we call the underlying API `torch._C._cuda_getArchFlags()`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80498
Approved by: https://github.com/ezyang, https://github.com/malfet
Summary:
hip/hip_runtime.h and libamdhip64.so may be required to compile
extension such as torch_ucc. They are in $ROCM_HOME/hip by default,
and may not be symlinked to $ROCM_HOME/include and $ROCM_HOME/lib.
This commit defines $ROCM_HOME/hip as $HIP_HOME, and adds its include
and lib paths when building hipified extension.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75548
Test Plan:
## Verify OSS pytorch + TorchUCC on an AMD GPU machine (MI100)
- step 1. Install OSS pytorch
```
export ROCM_PATH=/opt/rocm-4.5.2
git clone https://github.com/pytorch/pytorch.git
cd pytorch
python3 tools/amd_build/build_amd.py
USE_NCCL=0 USE_RCCL=0 USE_KINETO=0 with-proxy python3 setup.py develop
USE_NCCL=0 USE_RCCL=0 USE_KINETO=0 with-proxy python3 setup.py install
```
- step2. Install torchUCC extension
```
# /opt/rocm-4.5.2/include/hip does not exist, need include /opt/rocm-4.5.2/hip/include at compile time
export ROCM_PATH=/opt/rocm-4.5.2
export RCCL_INSTALL_DIR=/opt/rccl-rocm-rel-4.4-rdc
git clone https://github.com/facebookresearch/torch_ucc.git
cd torch_ucc
UCX_HOME=$RCCL_INSTALL_DIR UCC_HOME=$RCCL_INSTALL_DIR WITH_CUDA=$ROCM_PATH python setup.py
```
Build log before fix (error "hip/hip_runtime.h: No such file or directory"): P493038915
Build log after fix: P493037572
Reviewed By: ezyang
Differential Revision: D35506098
Pulled By: minsii
fbshipit-source-id: 76cbb6d4eaa6549a00898c9d9ebaca47a55330e9
(cherry picked from commit d684c080edf1fbd293e3321151976812c1da8533)
Summary:
Adding documentation about compiling extension with CUDA 11.5 and Windows
Example of failure: https://github.com/pytorch/pytorch/runs/4408796098?check_suite_focus=true
Note: Don't use torch/extension.h In CUDA 11.5 under windows in your C++ code:
Use aten instead of torch interface in all cuda 11.5 code under windows. It has been failing with errors, due to a bug in nvcc.
Example use:
>>> #include <ATen/ATen.h>
>>> at::Tensor SigmoidAlphaBlendForwardCuda(....)
Instead of:
>>> #include <torch/extension.h>
>>> torch::Tensor SigmoidAlphaBlendForwardCuda(...)
Currently open issue for nvcc bug: https://github.com/pytorch/pytorch/issues/69460
Complete Workaround code example: cb170ac024
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73013
Reviewed By: malfet, seemethere
Differential Revision: D34306134
Pulled By: atalman
fbshipit-source-id: 3c5b9d7a89c91bd1920dc63dbd356e45dc48a8bd
(cherry picked from commit 87098e7f17)
Summary:
Remove all hardcoded AMD gfx targets
PyTorch build and Magma build will use rocm_agent_enumerator as
backup if PYTORCH_ROCM_ARCH env var is not defined
PyTorch extensions will use same gfx targets as the PyTorch build,
unless PYTORCH_ROCM_ARCH env var is defined
torch.cuda.get_arch_list() now works for ROCm builds
PyTorch CI dockers will continue to be built for gfx900 and gfx906 for now.
PYTORCH_ROCM_ARCH env var can be a space or semicolon separated list of gfx archs eg. "gfx900 gfx906" or "gfx900;gfx906"
cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61706
Reviewed By: seemethere
Differential Revision: D32735862
Pulled By: malfet
fbshipit-source-id: 3170e445e738e3ce373203e1e4ae99c84e645d7d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610
- Replace HIP_PLATFORM_HCC with USE_ROCM
- Dont rely on CUDA_VERSION or HIP_VERSION and use USE_ROCM and ROCM_VERSION.
- In the next PR
- Will be removing the mapping from CUDA_VERSION to HIP_VERSION and CUDA to HIP in hipify.
- HIP_PLATFORM_HCC is deprecated, so will add HIP_PLATFORM_AMD to support HIP host code compilation on gcc.
cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd
Reviewed By: jbschlosser
Differential Revision: D30909053
Pulled By: ezyang
fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06
Summary:
I think this may be related to https://app.circleci.com/pipelines/github/pytorch/vision/9352/workflows/9c8afb1c-6157-4c82-a5c8-105c5adac57d/jobs/687003
Apparently `pkg_resource.parse_version` returns a type of `pkg_resources.extern.packaging.version.Version` instead of `packaging.version.Version` and seems on some older version of the setuptools it doesn't support `.major/minor` operation. changing it back to using `packaging.version.parse`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61053
Test Plan: CI
Reviewed By: samestep
Differential Revision: D29494322
Pulled By: walterddr
fbshipit-source-id: 294572a10b167677440d7404e5ebe007ab59d299
Summary:
[distutils](https://docs.python.org/3/library/distutils.html) is on its way out and will be deprecated-on-import for Python 3.10+ and removed in Python 3.12 (see [PEP 632](https://www.python.org/dev/peps/pep-0632/)). There's no reason for us to keep it around since all the functionality we want from it can be found in `setuptools` / `sysconfig`. `setuptools` includes a copy of most of `distutils` (which is fine to use according to the PEP), that it uses under the hood, so this PR also uses that in some places.
Fixes#56527
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57040
Pulled By: driazati
Reviewed By: nikithamalgifb
Differential Revision: D28051356
fbshipit-source-id: 1ca312219032540e755593e50da0c9e23c62d720
Summary:
- This change is required to handle the case when hipcc is
updated to the latest using update-alternatives.
- Update-alternatives support for few ROCm binaries is available
from ROCm 4.1 onwards.
- This change doesnt not affect any previous versions of ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55968
Reviewed By: mruberry
Differential Revision: D27785123
Pulled By: ezyang
fbshipit-source-id: 8467e468d8d51277fab9b0c8cbd57e80bbcfc7f7
Summary:
Allows extensions to override ROCm gfx arch targets. Reuses the same env var used during cmake build for consistency.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54341
Reviewed By: bdhirsh
Differential Revision: D27244010
Pulled By: heitorschueroff
fbshipit-source-id: 279e1a41ee395a0596aa7f696b6e908cf7f5bb83
Summary:
This fixes the previous erroring out by adding stricter conditions in cpp_extension.py.
To test, run a split torch_cuda build on Windows with export BUILD_SPLIT_CUDA=ON && python setup.py develop and then run the following test: python test/test_utils.py TestStandaloneCPPJIT.test_load_standalone. It should pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51596
Reviewed By: malfet
Differential Revision: D26213816
Pulled By: janeyx99
fbshipit-source-id: a752ce7f9ab9d73dcf56f952bed2f2e040614443
Summary:
On all non-Windows platforms we should use 'posix_prefix' schema to discover location of Python.h header
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51586
Reviewed By: ezyang
Differential Revision: D26208684
Pulled By: malfet
fbshipit-source-id: bafa6d79de42231629960c642d535f1fcf7a427f
Summary:
Because of the size of our `libtorch_cuda.so`, linking with other hefty binaries presents a problem where 32bit relocation markers are too small and end up overflowing. This PR attempts to break up `torch_cuda` into `torch_cuda_cu` and `torch_cuda_cpp`.
`torch_cuda_cu`: all the files previously in `Caffe2_GPU_SRCS` that are
* pure `.cu` files in `aten`match
* all the BLAS files
* all the THC files, except for THCAllocator.cpp, THCCachingHostAllocator.cpp and THCGeneral.cpp
* all files in`detail`
* LegacyDefinitions.cpp and LegacyTHFunctionsCUDA.cpp
* Register*CUDA.cpp
* CUDAHooks.cpp
* CUDASolver.cpp
* TensorShapeCUDA.cpp
`torch_cuda_cpp`: all other files in `Caffe2_GPU_SRCS`
Accordingly, TORCH_CUDA_API and TORCH_CUDA_BUILD_MAIN_LIB usages are getting split as well to TORCH_CUDA_CU_API and TORCH_CUDA_CPP_API.
To test this locally, you can run `export BUILD_SPLIT_CUDA=ON && python setup.py develop`. In your `build/lib` folder, you should find binaries for both `torch_cuda_cpp` and `torch_cuda_cu`. To see that the SPLIT_CUDA option was toggled, you can grep the Summary of running cmake and make sure `Split CUDA` is ON.
This build option is tested on CI for CUDA 11.1 builds (linux for now, but windows soon).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49050
Reviewed By: walterddr
Differential Revision: D26114310
Pulled By: janeyx99
fbshipit-source-id: 0180f2519abb5a9cdde16a6fb7dd3171cff687a6
Summary:
Bugs:
1) would introduce -I* in compile commands
2) wouldn't hipify source code directly in build_dir, only one level down or more
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50703
Reviewed By: mrshenli
Differential Revision: D25949070
Pulled By: ngimel
fbshipit-source-id: 018c2a056b68019a922e20e5db2eb8435ad147fe
Summary:
_resubmission of gh-49654, which was reverted due to a cross-merge conflict_
This caught one incorrect annotation in `cpp_extension.load`.
xref gh-16574.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50278
Reviewed By: walterddr
Differential Revision: D25865278
Pulled By: ezyang
fbshipit-source-id: 25489191628af5cf9468136db36f5a0f72d9d54d
Summary:
tldr: current version of `is_ninja_available` of `torch/utils/cpp_extension.py` fails to run in the recent incarnations of pip w/ new build isolation feature which is now a default. This PR fixes this problem.
The full story follows:
--------------------------
Currently trying to build https://github.com/facebookresearch/fairscale/ which builds cuda extensions fails with the recent pip versions. The build is failing to perform `is_ninja_available`, which runs a simple subprocess to run `ninja --version` but does it with some /dev/null stream override which seems to break with the new pip versions. Currently I have `pip==20.3.3`. The recent pip performs build isolation which first fetches all dependencies to somewhere under /tmp/pip-install-xyz and then builds the package.
If I build:
```
pip install fairscale --no-build-isolation
```
everything works.
When building normally (i.e. without `--no-build-isolation`), the failure is a long long trace,
<details>
<summary>Full log</summary>
<pre>
pip install fairscale
Collecting fairscale
Downloading fairscale-0.1.1.tar.gz (83 kB)
|████████████████████████████████| 83 kB 562 kB/s
Installing build dependencies ... done
Getting requirements to build wheel ... error
ERROR: Command errored out with exit status 1:
command: /home/stas/anaconda3/envs/main-38/bin/python /home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpjvw00c7v
cwd: /tmp/pip-install-1wq9f8fp/fairscale_347f218384a64f24b8d5ce846641213e
Complete output (55 lines):
running egg_info
writing fairscale.egg-info/PKG-INFO
writing dependency_links to fairscale.egg-info/dependency_links.txt
writing requirements to fairscale.egg-info/requires.txt
writing top-level names to fairscale.egg-info/top_level.txt
Traceback (most recent call last):
File "/home/stas/anaconda3/envs/main-38/bin/ninja", line 5, in <module>
from ninja import ninja
ModuleNotFoundError: No module named 'ninja'
Traceback (most recent call last):
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py", line 280, in <module>
main()
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py", line 263, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py", line 114, in get_requires_for_build_wheel
return hook(config_settings)
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 149, in get_requires_for_build_wheel
return self._get_build_requires(
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 130, in _get_build_requires
self.run_setup()
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 145, in run_setup
exec(compile(code, __file__, 'exec'), locals())
File "setup.py", line 56, in <module>
setuptools.setup(
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/core.py", line 148, in setup
dist.run_commands()
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 298, in run
self.find_sources()
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 305, in find_sources
mm.run()
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 536, in run
self.add_defaults()
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 572, in add_defaults
sdist.add_defaults(self)
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/command/sdist.py", line 228, in add_defaults
self._add_defaults_ext()
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/command/sdist.py", line 311, in _add_defaults_ext
build_ext = self.get_finalized_command('build_ext')
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/cmd.py", line 298, in get_finalized_command
cmd_obj = self.distribution.get_command_obj(command, create)
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/dist.py", line 858, in get_command_obj
cmd_obj = self.command_obj[command] = klass(self)
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 351, in __init__
if not is_ninja_available():
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1310, in is_ninja_available
subprocess.check_call('ninja --version'.split(), stdout=devnull)
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/subprocess.py", line 364, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ninja', '--version']' returned non-zero exit status 1.
----------------------------------------
ERROR: Command errored out with exit status 1: /home/stas/anaconda3/envs/main-38/bin/python /home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpjvw00c7v Check the logs for full command output.
</pre>
</details>
and the middle of it is what we want:
```
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 351, in __init__
if not is_ninja_available():
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1310, in is_ninja_available
subprocess.check_call('ninja --version'.split(), stdout=devnull)
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/subprocess.py", line 364, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ninja', '--version']' returned non-zero exit status 1.
```
For some reason pytorch fails to run this simple code:
```
# torch/utils/cpp_extension.py
def is_ninja_available():
r'''
Returns ``True`` if the `ninja <https://ninja-build.org/>`_ build system is
available on the system, ``False`` otherwise.
'''
with open(os.devnull, 'wb') as devnull:
try:
subprocess.check_call('ninja --version'.split(), stdout=devnull)
except OSError:
return False
else:
return True
```
I suspect that pip does something to `os.devnull` and that's why it fails.
This PR proposes a simpler code which doesn't rely on anything but `subprocess.check_output`:
```
def is_ninja_available():
r'''
Returns ``True`` if the `ninja <https://ninja-build.org/>`_ build system is
available on the system, ``False`` otherwise.
'''
try:
subprocess.check_output('ninja --version'.split())
except Exception:
return False
else:
return True
```
which doesn't use `os.devnull` and performs the same function. There could be a whole bunch of different exceptions there I think, so I went for the generic one - we don't care why it failed, since this function's only purpose is to suggest whether ninja can be used or not.
Let's check
```
python -c "import torch.utils.cpp_extension; print(torch.utils.cpp_extension.is_ninja_available())"
True
```
Look ma - no std noise to take care of. (i.e. no need for /dev/null).
I was editing the installed environment-wide `cpp_extension.py` file directly, so didn't need to tweak `PYTHONPATH` - I made sure to replace `'ninja --version'.` with something that should fail and I did get `False` for the above command line.
I next did a somewhat elaborate cheat to re-package an already existing binary wheel with this corrected version of `cpp_extension.py`, rather than building from source:
```
mkdir /tmp/pytorch-local-channel
cd /tmp/pytorch-local-channel
# get the latest nightly wheel
wget https://download.pytorch.org/whl/nightly/cu110/torch-1.8.0.dev20201215%2Bcu110-cp38-cp38-linux_x86_64.whl
# unpack it
unzip torch-1.8.0.dev20201215+cu110-cp38-cp38-linux_x86_64.whl
# edit torch/utils/cpp_extension.py to fix the python code with the new version as in this PR
emacs torch/utils/cpp_extension.py &
# pack the files back
zip -r torch-1.8.0.dev20201215+cu110-cp38-cp38-linux_x86_64.whl caffe2 torch torch-1.8.0.dev20201215+cu110.dist-info
```
Now I tell pip to use my local channel, plus `--pre` for it to pick up the pre-release as an acceptable wheel
```
# install using this local channel
git clone https://github.com/facebookresearch/fairscale/
cd fairscale
pip install -v --disable-pip-version-check -e . -f file:///tmp/pytorch-local-channel --pre
```
and voila all works.
```
[...]
Successfully installed fairscale
```
I noticed a whole bunch of ninja not found errors in the log, which I think is the same problem with other parts of the build system packages which also use this old check copied all over various projects and build tools, and which the recent pip breaks.
```
writing manifest file '/tmp/pip-modern-metadata-_nsdesbq/fairscale.egg-info/SOURCES.txt'
Traceback (most recent call last):
File "/home/stas/anaconda3/envs/main-38/bin/ninja", line 5, in <module>
from ninja import ninja
ModuleNotFoundError: No module named 'ninja'
[...]
/tmp/pip-build-env-fqflyevr/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py:364: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
```
but these don't prevent from the build completing and installing.
I suppose these need to be identified and reported to various other projects, but that's another story.
The new pip does something to `os.devnull` I think which breaks any code relying on it - I haven't tried to figure out what happens to that stream object, but this PR which removes its usage solves the problem.
Also do notice that:
```
git clone https://github.com/facebookresearch/fairscale/
cd fairscale
python setup.py bdist_wheel
pip install dist/fairscale-0.1.1-cp38-cp38-linux_x86_64.whl
```
works too. So it is really a pip issue.
Apologies if the notes are too many, I tried to give the complete picture and probably other projects will need those details as well.
Thank you for reading.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49443
Reviewed By: mruberry
Differential Revision: D25592109
Pulled By: ezyang
fbshipit-source-id: bfce4420c28b614ead48e9686f4153c6e0fbe8b7
Summary:
I am submitting this PR on behalf of Janne Hellsten(nurpax) from NVIDIA, for the convenience of CLA. Thanks Janne a lot for the contribution!
Currently, the ninja build decides whether to rebuild a .cu file or not pretty randomly. And there are actually two issues:
First, the arch list in the building command is ordered randomly. When the order changes, it will unconditionally rebuild regardless of the timestamp.
Second, the header files are not included in the dependency list, so if the header file changes, it is possible that ninja will not rebuild.
This PR fixes both issues. The fix for the second issue requires nvcc >= 10.2. nvcc < 10.2 can still build CUDA extension as it used to be, but it will be unable to see the changes in header files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49344
Reviewed By: glaringlee
Differential Revision: D25540157
Pulled By: ezyang
fbshipit-source-id: 197541690d7f25e3ac5ebe3188beb1f131a4c51f
Summary:
Currently CUDAExtension assumes that all cards are of the same type on the same machine and builds the extension with compute capability of the 0th card. This breaks later at runtime if the machine has cards of different types.
Specifically resulting in:
```
RuntimeError: CUDA error: no kernel image is available for execution on the device
```
when the cards of the types that weren't compiled for are used. (and the error is far from telling what the problem is to the uninitiated)
My current setup is:
```
$ CUDA_VISIBLE_DEVICES=0 python -c "import torch; print(torch.cuda.get_device_capability())"
(8, 6)
$ CUDA_VISIBLE_DEVICES=1 python -c "import torch; print(torch.cuda.get_device_capability())"
(6, 1)
```
but the extension was getting built with `-gencode=arch=compute_80,code=sm_80`.
This PR:
* [x] introduces a loop over all visible at build time devices to ensure the extension will run on all of them (it sorts the new list generated by the loop, so that the output is easier to debug should a card with lower capacity come last)
* [x] adds `+PTX` to the last entry of ccs derived from local cards (`if not _arch_list:`) to support other archs
* [x] adds a digest of my conversation with ptrblck on slack in the form of docs which hopefully can help others know which archs to support, how to override defaults, when and how to add PTX, etc.
Please kindly review that my prose is clear and easy to understand.
ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48891
Reviewed By: ngimel
Differential Revision: D25358285
Pulled By: ezyang
fbshipit-source-id: 8160f3adebffbc8e592ddfcc3adf153a9dc91557
Summary:
[Refiled version of earlier PR https://github.com/pytorch/pytorch/issues/45451]
This PR revamps the hipify module in PyTorch to overcome a long list of shortcomings in the original implementation. However, these improvements are applied only when using hipify to build PyTorch extensions, not for PyTorch or Caffe2 itself.
Correspondingly, changes are made to cpp_extension.py to match these improvements.
The list of improvements to hipify is as follows:
1. Hipify files in the same directory as the original file, unless there's a "cuda" subdirectory in the original file path, in which case the hipified file will be in the corresponding file path with "hip" subdirectory instead of "cuda".
2. Never hipify the file in-place if changes are introduced due to hipification i.e. always ensure the hipified file either resides in a different folder or has a different filename compared to the original file.
3. Prevent re-hipification of already hipified files. This avoids creation of unnecessary "hip/hip" etc. subdirectories and additional files which have no actual use.
4. Do not write out hipified versions of files if they are identical to the original file. This results in a cleaner output directory, with minimal number of hipified files created.
5. Update header rewrite logic so that it accounts for the previous improvement.
6. Update header rewrite logic so it respects the rules for finding header files depending on whether "" or <> is used.
7. Return a dictionary of mappings of original file paths to hipified file paths from hipify function.
8. Introduce a version for hipify module to allow extensions to contain back-compatible code that targets a specific point in PyTorch where the hipify functionality changed.
9. Update cuda_to_hip_mappings.py to account for the ROCm component subdirectories inside /opt/rocm/include. This also results in cleanup of the Caffe2_HIP_INCLUDE path to remove unnecessary additions to the include path.
The list of changes to cpp_extension.py is as follows:
1. Call hipify when building a CUDAExtension for ROCm.
2. Prune the list of source files to CUDAExtension to include only the hipified versions of any source files in the list (if both original and hipified versions of the source file are in the list)
3. Add subdirectories of /opt/rocm/include to the include path for extensions, so that ROCm headers for subcomponent libraries are found automatically
cc jeffdaily sunway513 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48715
Reviewed By: bdhirsh
Differential Revision: D25272824
Pulled By: ezyang
fbshipit-source-id: 8bba68b27e41ca742781e1c4d7b07c6f985f040e
Summary:
They removed the specific function in Python 3.9 so we should just
remake the function here and use our own instead of relying on hidden
functions from the stdlib
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Fixes https://github.com/pytorch/pytorch/issues/48617
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48618
Reviewed By: samestep
Differential Revision: D25230281
Pulled By: seemethere
fbshipit-source-id: 57216af40a4ae4dc8bafcf40d2eb3ba793b9b6e2
Summary:
This PR revamps the hipify module in PyTorch to overcome a long list of shortcomings in the original implementation. However, these improvements are applied only when using hipify to build PyTorch extensions, **not for PyTorch or Caffe2 itself**.
Correspondingly, changes are made to `cpp_extension.py` to match these improvements.
The list of improvements to hipify is as follows:
1. Hipify files in the same directory as the original file, unless there's a "cuda" subdirectory in the original file path, in which case the hipified file will be in the corresponding file path with "hip" subdirectory instead of "cuda".
2. Never hipify the file in-place if changes are introduced due to hipification i.e. always ensure the hipified file either resides in a different folder or has a different filename compared to the original file.
3. Prevent re-hipification of already hipified files. This avoids creation of unnecessary "hip/hip" etc. subdirectories and additional files which have no actual use.
4. Do not write out hipified versions of files if they are identical to the original file. This results in a cleaner output directory, with minimal number of hipified files created.
5. Update header rewrite logic so that it accounts for the previous improvement.
6. Update header rewrite logic so it respects the rules for finding header files depending on whether `""` or `<>` is used.
7. Return a dictionary of mappings of original file paths to hipified file paths from `hipify` function.
8. Introduce a version for hipify module to allow extensions to contain back-compatible code that targets a specific point in PyTorch where the hipify functionality changed.
9. Update `cuda_to_hip_mappings.py` to account for the ROCm component subdirectories inside `/opt/rocm/include`. This also results in cleanup of the `Caffe2_HIP_INCLUDE` path to remove unnecessary additions to the include path.
The list of changes to `cpp_extension.py` is as follows:
1. Call `hipify` when building a CUDAExtension for ROCm.
2. Prune the list of source files to CUDAExtension to include only the hipified versions of any source files in the list (if both original and hipified versions of the source file are in the list)
3. Add subdirectories of /opt/rocm/include to the include path for extensions, so that ROCm headers for subcomponent libraries are found automatically
cc jeffdaily sunway513 hgaspar lcskrishna ashishfarmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45451
Reviewed By: ezyang
Differential Revision: D24924736
Pulled By: malfet
fbshipit-source-id: 4af42b8ff4f21c3782dedb8719b8f9f86b34bd2d
Summary:
I think these can be safely removed since the min version of supported Python is now 3.6
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47822
Reviewed By: smessmer
Differential Revision: D24954936
Pulled By: ezyang
fbshipit-source-id: 5d4b2aeb78fc97d7ee4abaf5fb2aae21bf765e8b
Summary:
Preserve PYBIND11 (63ce3fbde8) configuration options in `torch._C._PYBIND11 (63ce3fbde8)_COMPILER_TYPE` and use them when building extensions
Also, use f-strings in `torch.utils.cpp_extension`
"Fixes" https://github.com/pytorch/pytorch/issues/46367
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46415
Reviewed By: VitalyFedyunin
Differential Revision: D24605949
Pulled By: malfet
fbshipit-source-id: 87340f2ed5308266a46ef8f0317316227dab9d4d
Summary:
Plus two minor fixes to `torch/csrc/Module.cpp`:
- Use iterator of type `Py_ssize_t` for array indexing in `THPModule_initNames`
- Fix clang-tidy warning of unneeded defaultGenerator copy by capturing it as `const auto&`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47025
Reviewed By: samestep
Differential Revision: D24605907
Pulled By: malfet
fbshipit-source-id: c276567d320758fa8b6f4bd64ff46d2ea5d40eff
Summary:
Fixes issues when building certain PyTorch extensions where the cpp files do NOT compile if flags such as `__HIP_NO_HALF_CONVERSIONS__` are defined.
cc jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46273
Reviewed By: zou3519
Differential Revision: D24422463
Pulled By: ezyang
fbshipit-source-id: 7a43d1f7d59c95589963532ef3bd3c68cb8262be
Summary:
This is the common behavior when one builds PyTorch (or any other CUDA project) using CMake, so it should be held true for Torch CUDA extensions as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43931
Reviewed By: ezyang, seemethere
Differential Revision: D23441793
Pulled By: malfet
fbshipit-source-id: 1af392107a94840331014fda970ef640dc094ae4
Summary:
Fix typos in torch.utils/_benchmark/README.md
Add empty __init__.py to examples folder to make example invocations from README.md correct
Fixed uniform distribution logic generation when mixval and maxval are None
Fixes https://github.com/pytorch/pytorch/issues/42984
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42960
Reviewed By: seemethere
Differential Revision: D23095399
Pulled By: malfet
fbshipit-source-id: 0546ce7299b157d9a1f8634340024b10c4b7e7de
Summary:
Previously we did not link against amdhip64 (roughly equivalent to cudart). Apparently, the recent RTDL_GLOBAL fixes prevent the extensions from finding the symbols needed for launching kernels.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41257
Reviewed By: zou3519
Differential Revision: D22573288
Pulled By: ezyang
fbshipit-source-id: 89f9329b2097df26785e2f67e236d60984d40fdd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40837
As ninja has accurate dependency tracking, if there is nothing to do,
then we will very quickly noop. But this is important for correctness:
if a change was made to a header that is not listed explicitly in
the distutils Extension, then distutils will come to the wrong
conclusion about whether or not recompilation is needed (but Ninja
will work it out.)
This caused https://github.com/pytorch/vision/issues/2367
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D22340930
Pulled By: ezyang
fbshipit-source-id: 481b74f6e2cc78159d2a74d413751cf7cf16f592
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39277
This PR contains initial changes that makes PyTorch build with Ampere GPU, CUDA 11, and cuDNN 8.
TF32 related features will not be included in this PR.
Test Plan: Imported from OSS
Differential Revision: D21832814
Pulled By: malfet
fbshipit-source-id: 37f9c6827e0c26ae3e303580f666584230832d06
Summary:
This PR adds the following changes:
1. It sets the default extension build to use ninja
2. Adds HIPCC flags to the host code compile string for ninja builds. This is needed when host code makes HIP API calls
cc: ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38939
Differential Revision: D21721905
Pulled By: ezyang
fbshipit-source-id: 75206838315a79850ecf86a78391a31ba5ee97cb
Summary:
This pull request adds a check for ROCm environment and skips adding CUDA specific flags for the scenario when a pytorch extension is built on ROCm.
ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38047
Differential Revision: D21470507
Pulled By: ezyang
fbshipit-source-id: 5af2d7235e306c7aa9a5f7fc8760025417383069
Summary:
This pull request enables ahead of time compilation of HIPExtensions with ninja by setting appropriate compilation flags for ROCm environment. Also, this enables the unit test for testing cuda_extensions on ROCm as well as removing test for ahead of time compilation of extensions with ninja from ROCM_BLACKLIST
ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37800
Differential Revision: D21408148
Pulled By: soumith
fbshipit-source-id: 146f4ffb3418f3534e6ce86805d3fe9c3eae84e1
Summary:
As described in the issue (https://github.com/pytorch/pytorch/issues/33701) the compiler check
for building cpp extensions does not work with ccache.
In this case we check compiler -v to determine which
compiler is actually used and check it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37293
Differential Revision: D21256913
Pulled By: ezyang
fbshipit-source-id: 5483a10cc2dbcff98a7f069ea9dbc0c12b6502dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35615
Python 2 has reached end-of-life and is no longer supported by PyTorch.
Now we can clean up a lot of cruft that we put in place to support it.
These changes were all done manually, and I skipped anything that seemed
like it would take more than a few seconds, so I think it makes sense to
review it manually as well (though using side-by-side view and ignoring
whitespace change might be helpful).
Test Plan: CI
Differential Revision: D20842886
Pulled By: dreiss
fbshipit-source-id: 8cad4e87c45895e7ce3938a88e61157a79504aed
Summary:
This enables cpp_extensions.load/load_inline. This works by hipify-ing cuda sources.
Also enable tests.
CuDNN/MIOpen extensions aren't yet supported, I propose to not do this in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35897
Differential Revision: D20983279
Pulled By: ezyang
fbshipit-source-id: a5d0f5ac592d04488a6a46522c58e2ee0a6fd57c
Summary:
Otherwise, it will print some message when hipcc is not found.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35789
Differential Revision: D20793089
Pulled By: ezyang
fbshipit-source-id: 4b3cb29fb1d74a1931603ee01e669013ccae9685
Summary:
The current config on `master` yields the following errors when build from source on Windows with CMake and Visual Studio 2019.
```
Severity Code Description Project File Line Suppression State
Error LNK2001 unresolved external symbol \?warp_size@cuda@at@YAHXZ\ torch D:\AI\pytorch\build_libtorch\caffe2\LINK 1
Severity Code Description Project File Line Suppression State
Error LNK1120 1 unresolved externals torch D:\AI\pytorch\build_libtorch\bin\Release\torch.dll 1
Severity Code Description Project File Line Suppression State
Error LNK2001 unresolved external symbol \?warp_size@cuda@at@YAHXZ\ caffe2_observers D:\AI\pytorch\build_libtorch\modules\observers\LINK 1
Severity Code Description Project File Line Suppression State
Error LNK1120 1 unresolved externals caffe2_observers D:\AI\pytorch\build_libtorch\bin\Release\caffe2_observers.dll 1
Severity Code Description Project File Line Suppression State
Error LNK2001 unresolved external symbol \?warp_size@cuda@at@YAHXZ\ caffe2_detectron_ops_gpu D:\AI\pytorch\build_libtorch\modules\detectron\LINK 1
Severity Code Description Project File Line Suppression State
Error LNK1120 1 unresolved externals caffe2_detectron_ops_gpu D:\AI\pytorch\build_libtorch\bin\Release\caffe2_detectron_ops_gpu.dll 1
```
This change at least fixes the above errors in that specific setting. Do you think it makes sense to get this merged or will it break other settings?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35659
Differential Revision: D20735907
Pulled By: ezyang
fbshipit-source-id: eb8fa1e69aaaa5af2da3a76963ddc910bb716479
Summary:
Otherwise, VC++ will warn that every exposed C++ symbol, for example:
```
include\c10/core/impl/LocalDispatchKeySet.h(53): warning C4251: 'c10::impl::LocalDispatchKeySet::included_': class 'c10::DispatchKeySet' needs to have dll-interface to be used by clients of struct 'c10::impl::LocalDispatchKeySet'
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35272
Test Plan: CI
Differential Revision: D20623005
Pulled By: malfet
fbshipit-source-id: b635b674159bb9654e4e1a1af4394c4f36fe35bd
Summary:
This pull request has changes for:
1. Enabling a torch module with HIP code to be compiled by cpp_extensions.py
2. Fixes for hipify module to be able to be used by a torch extension
cc: ezyang iotamudelta jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32669
Differential Revision: D20033893
Pulled By: zou3519
fbshipit-source-id: fd6ddc8cdcd3930f41008636bb2bc9dd26cdb008
Summary:
Closes https://github.com/pytorch/pytorch/issues/30027
The idea here is that you can bind a function with `pybind11` in a single line and without modifying the function:
```cpp
m.def("foo", foo, py::call_guard<torch::PyWarningHandler>());
```
Where warnings are handled by the [`call_guard`](https://pybind11.readthedocs.io/en/stable/advanced/functions.html#call-guard) and exceptions are handled by the `pybind11` exception translator. To do this, I have added support for handling C++ exceptions in `torch::PyWarningHandler`'s destructor without setting the python error state before hand.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30588
Differential Revision: D19905626
Pulled By: albanD
fbshipit-source-id: 90c0a5e298b123cc0c8ab9c52c91be4e96ea47c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33346Fixes#33091
This PR lets users control the number of workers that cpp extensions
uses through the environment variable `MAX_JOBS`. If the environment
variable is a non-negative integer we use that many threads; otherwise,
ninja falls back to the default.
I chose to use the name `MAX_JOBS` because we use it in PyTorch already
to control the number of workers PyTorch builds with. There is a risk
that users of cpp extensions already have `MAX_JOBS` set but we are
hoping that that risk is small and/or it means semantically the same
thing.
Test Plan: - tested locally
Differential Revision: D19911645
Pulled By: zou3519
fbshipit-source-id: d20ed42de4f845499ed38f1a1c73e9ccb620f780
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32495
Background
------------------------------
Previously, ninja was used to compile+link inline cpp_extensions and
ahead-of-time cpp_extensions were compiled with distutils. This PR adds
the ability to compile (but not link) ahead-of-time cpp_extensions with ninja.
The main motivation for this is to speed up cpp_extension builds: distutils
does not make use of parallelism. With this PR, using the new option, on my machine,
- torchvision compilation goes from 3m43s to 49s
- nestedtensor compilation goes from 2m0s to 28s.
User-facing changes
------------------------------
I added a `use_ninja` flag to BuildExtension. This defaults to
`True`. When `use_ninja` is True:
- it will attempt to use ninja.
- If we cannot use ninja, then this throws a warning and falls back to
distutils.
- Situations we cannot use ninja: Windows (NYI, I'll open a new issue
for this), if ninja cannot be found on the system.
Implementation Details
------------------------------
This PR makes this change in two steps. Please me know if it would be
easier to review this if I split this up into a stacked diff.
Those changes are:
1) refactor _write_ninja_file to separate the policy (what compiler flags
to pass) from the mechanism (how to write the ninja file and do compilation).
2) call _write_ninja_file and _run_ninja_build while building
ahead-of-time cpp_extensions. These are only used to compile objects;
distutils still handles the linking.
Change 1: refactor _write_ninja_file to seperate policy from mechanism
- I split _write_ninja_file into: _write_ninja_file and
_write_ninja_file_to_build_library
- I renamed _build_extension_module to _run_ninja_build
Change 2: Call _write_ninja_file while building ahead-of-time
cpp_extensions
- _write_ninja_file_and_compile_objects calls _write_ninja_file to only
build object files.
- We monkey-patch distutils.CCompiler.compile to call
_write_ninja_files_and_compile_objects
- distutils still handles the linking step. The linking step is not a
bottleneck so it was not a concern.
- This change only works on unix-based systems. Our code for windows
goes down a different codepath and I did not want to mess with that.
- If a system does not support ninja, we raise a warning and fall back
to the original compilation path.
Test Plan
------------------------------
Adhoc testing
- I built torchvision using pytorch master and printed out the build
commands. Next, I used this branch to build torchvision and looked at
the ninja file. I compared the ninja file with the build commands and
asserted that they were functionally the same.
- I repeated the above for pytorch/nestedtensor.
PyTorch test suite
- I split `test_cpp_extensions` into `test_cpp_extensions_aot` and
`test_cpp_extensions_jit`. The AOT (ahead-of-time) version tests
ahead-of-time and the JIT version tests just-in-time (not to be confused
with TorchScript)
- `test_cpp_extensions_aot` gets run TWICE by run_test.py, once with
a module that was built with ninja, and once with a module that was
built without ninja.
- run_test.py asserts that when we are building with use_ninja=True,
ninja is actually available on the system.
Test Plan: Imported from OSS
Differential Revision: D19730432
Pulled By: zou3519
fbshipit-source-id: 819590d01cf65e8da5a1e8019b8b3084792fee90
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31161
Previously, it wasn't necessary to specify `DT_NEEDED` in C++ extensions on Linux (aka pass `-l` flags) because all of the symbols would have already been loaded with `RTLD_GLOBAL`, so there wouldn't be any undefined symbols. But when we switch to loading `_C` with `RTLD_LOCAL`, it's now necessary for all the C++ extensions to know what libraries to link with. The resulting code is clearer and more uniform, so it's wins all around.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19262578
Pulled By: ezyang
fbshipit-source-id: a893cc96f2e9aad1c064a6de4f7ccf79257dec3f
Summary:
The original `check-and-act` style can raise `FileExistsError` when multiple processes are jit-compiling the extension on the same node.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30956
Differential Revision: D19262570
Pulled By: ezyang
fbshipit-source-id: bb18c72e42648770b47f9378ac7c3929c3c03efc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31047
Changelist:
- remove BUILD_NAMEDTENSOR from .cu files
- remove BUILD_NAMEDTENSOR special handling in function_wrapper.py
- remove BUILD_NAMEDTENSOR from cpp_extension.py. This code actually
did nothing because we always compile with BUILD_NAMEDTENSOR.
Test Plan: - run tests
Differential Revision: D18908442
Pulled By: zou3519
fbshipit-source-id: b239e24de58580adaf3cef573350773a38b1e4f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30315
The new structure is that libtorch_cpu contains the bulk of our
code, and libtorch depends on libtorch_cpu and libtorch_cuda.
This is a reland of https://github.com/pytorch/pytorch/pull/29731 but
I've extracted all of the prep work into separate PRs which can be
landed before this one.
Some things of note:
* torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library)
* The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774
In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This lead to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/libprotobuf.a(arena.cc.o) is referenced by DSO"
* A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly
* I had to torch_cpu/torch_cuda caffe2_interface_library so that they get whole-archived linked into torch when you statically link. And I had to do this in an *exported* fashion because torch needs to depend on torch_cpu_library. In the end I exported everything and removed the redefinition in the Caffe2Config.cmake. However, I am not too sure why the old code did it in this way in the first place; however, it doesn't seem to have broken anything to switch it this way.
* There's some uses of `__HIP_PLATFORM_HCC__` still in `torch_cpu` code, so I had to apply it to that library too (UGH). This manifests as a failer when trying to run the CUDA fuser. This doesn't really matter substantively right now because we still in-place HIPify, but it would be good to fix eventually. This was a bit difficult to debug because of an unrelated HIP bug, see https://github.com/ROCm-Developer-Tools/HIP/issues/1706Fixes#27215 (as our libraries are smaller), and executes on
part of the plan in #29235.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18790941
Pulled By: ezyang
fbshipit-source-id: 01296f6089d3de5e8365251b490c51e694f2d6c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29731
The new structure is that libtorch_cpu contains the bulk of our
code, and libtorch depends on libtorch_cpu and libtorch_cuda.
Some subtleties about the patch:
- There were a few functions that crossed CPU-CUDA boundary without API macros. I just added them, easy enough. An inverse situation was aten/src/THC/THCTensorRandom.cu where we weren't supposed to put API macros directly in a cpp file.
- DispatchStub wasn't getting all of its symbols related to static members on DispatchStub exported properly. I tried a few fixes but in the end I just moved everyone off using DispatchStub to dispatch CUDA/HIP (so they just use normal dispatch for those cases.) Additionally, there were some mistakes where people incorrectly were failing to actually import the declaration of the dispatch stub, so added includes for those cases.
- torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library)
- The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774
- In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This lead to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/l
ibprotobuf.a(arena.cc.o) is referenced by DSO"
- A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly. This situation also happens with custom C++ extensions.
- There's a ROCm compiler bug where extern "C" on functions is not respected. There's a little workaround to handle this.
- Because I was too lazy to check if HIPify was converting TORCH_CUDA_API into TORCH_HIP_API, I just made it so HIP build also triggers the TORCH_CUDA_API macro. Eventually, we should translate and keep the nature of TORCH_CUDA_API constant in all cases.
Fixes#27215 (as our libraries are smaller), and executes on
part of the plan in #29235.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18632773
Pulled By: ezyang
fbshipit-source-id: ea717c81e0d7554ede1dc404108603455a81da82
Summary:
This is a follow-up to gh-23408. No longer supported are any arches < 3.5 (numbers + 'Fermi' and 'Kepler+Tegra').
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24442
Differential Revision: D16889283
Pulled By: ezyang
fbshipit-source-id: 3c0c35d51b7ac7642d1be7ab4b0f260ac93b60c9