It doesn't make sense to set this (on import!): CUDA cannot be used with PyTorch in this case, yet it leads to messages like
> No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
when CUDA happens to be installed, which is confusing at best.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106310
Approved by: https://github.com/ezyang
The default rendering of these code snippets renders the `TORCH_CUDA_ARCH_LIST` values with typographic quotes which prevent the examples from being directly copyable. Use code style for the two extension examples.
Fixes #112763
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112764
Approved by: https://github.com/malfet
- rename `__HIP_PLATFORM_HCC__` to `__HIP_PLATFORM_AMD__`
- rename `HIP_HCC_FLAGS` to `HIP_CLANG_FLAGS`
- rename `PYTORCH_HIP_HCC_LIBRARIES` to `PYTORCH_HIP_LIBRARIES`
- workaround in tools/amd_build/build_amd.py until submodules are updated
These symbols have had a long deprecation cycle and will finally be removed in ROCm 6.0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111975
Approved by: https://github.com/ezyang, https://github.com/hongxiayang
Did some easy fixes from enabling TRY200. Most of these seem like oversights rather than intentional choices. The proper way to silence an intentional case is `from None`, which notes that you considered whether the exception should carry the cause and decided against it.
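For illustration, a minimal sketch of the two patterns the rule distinguishes. The function names and messages are hypothetical and not taken from the PyTorch codebase.

```python
def parse_port(text: str) -> int:
    """Convert text to a port number, chaining the original error."""
    try:
        return int(text)
    except ValueError as err:
        # Explicit chaining: the ValueError is preserved as __cause__.
        raise RuntimeError(f"invalid port: {text!r}") from err


def lookup(table: dict, key: str) -> str:
    """Look up a key, deliberately hiding the internal KeyError."""
    try:
        return table[key]
    except KeyError:
        # `from None` records that dropping the cause was a conscious choice.
        raise LookupError(f"unknown setting: {key}") from None
```

In the first function the traceback shows both exceptions; in the second only the `LookupError` is reported.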
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111496
Approved by: https://github.com/malfet
The CUDA architecture flags from TORCH_CUDA_ARCH_LIST will be skipped if the TORCH_EXTENSION_NAME includes the substring "arch". A C++ Extension should be allowed to have any name. I just manually skip the TORCH_EXTENSION_NAME flag when checking if one of the flags is "arch". There is probably a better fix, but I'll leave this to experts.
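A rough sketch of the idea, assuming the extension name reaches the compiler as a `-DTORCH_EXTENSION_NAME=...` define. The helper name `has_user_arch_flag` is hypothetical and this is not the actual code in `cpp_extension.py`.

```python
def has_user_arch_flag(cflags):
    """Return True if any flag other than the extension-name define mentions 'arch'."""
    for flag in cflags:
        # Skip the define carrying the extension name: the name itself
        # may legitimately contain the substring "arch".
        if "TORCH_EXTENSION_NAME" in flag:
            continue
        if "arch" in flag:
            return True
    return False
```

With this check, an extension called `my_search_ext` no longer suppresses the flags derived from TORCH_CUDA_ARCH_LIST, while an explicit `-gencode=arch=...` flag still does.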
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111211
Approved by: https://github.com/ezyang
On Linux, CUDA header dependencies are not correctly tracked. After you modify a CUDA header, affected CUDA files won't be rebuilt. This PR will fix this problem.
```console
$ ninja -t deps
rep_penalty.o: #deps 2, deps mtime 1693956351892493247 (VALID)
/home/qc/Workspace/NotMe/exllama/exllama_ext/cpu_func/rep_penalty.cpp
/home/qc/Workspace/NotMe/exllama/exllama_ext/cpu_func/rep_penalty.h
rms_norm.cuda.o: #deps 0, deps mtime 1693961188871054130 (VALID)
rope.cuda.o: #deps 0, deps mtime 1693961188954388632 (VALID)
cuda_buffers.cuda.o: #deps 0, deps mtime 1693961188797719768 (VALID)
...
```
Historically, this line of code has been changed twice. It was first implemented in #49344 without an `if IS_WINDOWS` check, just like now. Then #56015 added `if IS_WINDOWS` for an unknown reason. That PR has no description, so I don't know what bug its author encountered. I don't think there is any bug with these flags on Linux, at least today. CMake generates exactly the same flags for CUDA.
```ninja
#############################################
# Rule for compiling CUDA files.
rule CUDA_COMPILER__cpp_cuda_unscanned_Debug
depfile = $DEP_FILE
deps = gcc
command = ${LAUNCHER}${CODE_CHECK}/opt/cuda/bin/nvcc -forward-unknown-to-host-compiler $DEFINES $INCLUDES $FLAGS -MD -MT $out -MF $DEP_FILE -x cu -c $in -o $out
description = Building CUDA object $out
```
where `-MD` is short for `--generate-dependencies-with-compile` and `-MF` is short for `--dependency-output`. This can be verified with `nvcc --help`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108613
Approved by: https://github.com/ezyang
This updates ruff to 0.285, which is faster, better, and fixes a bunch of false negatives with regard to f-strings.
I also enabled RUF017 which looks for accidental quadratic list summation. Luckily, seems like there are no instances of it in our codebase, so enabling it so that it stays like that. :)
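As a small illustration of the pattern RUF017 looks for (the variable names are made up for this example):

```python
import itertools

chunks = [[1, 2], [3], [4, 5, 6]]

# Quadratic: every `+` copies the list accumulated so far.
flat_slow = sum(chunks, [])

# Linear alternative: chain the sublists without intermediate copies.
flat_fast = list(itertools.chain.from_iterable(chunks))
```

Both produce the same list; only the second scales linearly with the total number of elements.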
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
This PR fixes the circular issue during hipification process by introducing current_state to track whether a file is processed for hipification. (Iterative DFS)
The issue arises when two header files try to include themselves, which leads to a circular recursion or an infinite loop.
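The approach can be sketched roughly as follows. This is a simplified illustration, not the hipify implementation: `State`, `process_includes`, and the `includes` mapping (a stand-in for parsing real `#include` lines) are assumptions made for the example.

```python
from enum import Enum


class State(Enum):
    INITIALIZED = 1  # visit started, not yet finished
    DONE = 2         # fully processed


def process_includes(root, includes):
    """Visit `root` and everything it transitively includes exactly once."""
    state = {}
    order = []
    stack = [root]  # explicit stack instead of recursion
    while stack:
        path = stack.pop()
        if path in state:
            # Already seen: this is what breaks a.h -> b.h -> a.h cycles.
            continue
        state[path] = State.INITIALIZED
        order.append(path)
        for dep in includes.get(path, []):
            if dep not in state:
                stack.append(dep)
        state[path] = State.DONE
    return order
```

Because visited files are tracked in `state` and the traversal uses an explicit stack, a cycle can neither recurse without bound nor loop forever.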
Fixes related issues such as:
https://github.com/pytorch/pytorch/issues/93827
https://github.com/ROCmSoftwarePlatform/hipify_torch/issues/39
Error log:
```
File "/opt/conda/lib/python3.8/posixpath.py", line 471, in relpath
start_list = [x for x in abspath(start).split(sep) if x]
File "/opt/conda/lib/python3.8/posixpath.py", line 375, in abspath
if not isabs(path):
File "/opt/conda/lib/python3.8/posixpath.py", line 63, in isabs
sep = _get_sep(s)
File "/opt/conda/lib/python3.8/posixpath.py", line 42, in _get_sep
if isinstance(path, bytes):
RecursionError: maximum recursion depth exceeded while calling a Python object
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104085
Approved by: https://github.com/jithunnair-amd, https://github.com/malfet
Not sure why it was excluded previously (an oversight, I guess).
Also, please note that `clang++` is already considered an acceptable compiler (as it ends with `g++` ;))
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103349
Approved by: https://github.com/seemethere
When we need to link extra libs, we should notice that 64-bit CUDA may be installed in "lib", not in "lib64".
### <samp>🤖 Generated by Copilot at 05c1ca6</samp>
Improve CUDA compatibility in `torch.utils.cpp_extension` by checking for `lib64` or `lib` directory.
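A minimal sketch of such a check, assuming the CUDA install root is passed in as `cuda_home`. The helper name `cuda_lib_dir` is hypothetical.

```python
import os


def cuda_lib_dir(cuda_home):
    """Return the CUDA library directory, preferring lib64 and falling back to lib."""
    lib_dir = "lib64"
    if (not os.path.isdir(os.path.join(cuda_home, lib_dir))
            and os.path.isdir(os.path.join(cuda_home, "lib"))):
        # 64-bit CUDA may be installed under "lib" rather than "lib64".
        lib_dir = "lib"
    return os.path.join(cuda_home, lib_dir)
```

If neither directory exists, the sketch still returns the `lib64` path, so the caller sees the conventional default.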
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101285
Approved by: https://github.com/ezyang, https://github.com/malfet
Currently, if `setuptools<49.4.0` and there is a minor version mismatch, `_check_cuda_version` fails with a misleading, non-actionable error:
```
2023-03-24T20:21:35.0625644Z RuntimeError:
2023-03-24T20:21:35.0628441Z The detected CUDA version (11.2) mismatches the version that was used to compile
2023-03-24T20:21:35.0630681Z PyTorch (11.3). Please make sure to use the same CUDA versions.
```
This condition shouldn't be failing since minor version match isn't required.
It fails because the other condition to have a certain version of `setuptools` isn't met. But that condition is written in a comment (!!!). So this PR changes it to actually tell the user how to fix the problem.
While at it, I adjusted the version number as a lower `setuptools>=49.4.0` is sufficient for this to work.
Thanks.
p.s. this problem manifests on `nvidia/cuda:11.2.2-cudnn8-devel-ubuntu20.04` docker image.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97602
Approved by: https://github.com/ezyang
Merges `startswith` and `endswith` calls into a single call that takes a tuple. Not only are these calls more readable, they are also more efficient, as each string is iterated through only once.
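For example (the helper below is illustrative only, not a function from the codebase):

```python
def is_cuda_source(path: str) -> bool:
    # Before: path.endswith(".cu") or path.endswith(".cuh")
    return path.endswith((".cu", ".cuh"))
```

`str.startswith` accepts a tuple of prefixes in the same way.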
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96754
Approved by: https://github.com/ezyang
These warnings are disabled to avoid long logs in Windows tests. They are currently also disabled in CMake builds.
- `/wd4624`: MSVC complains "destructor was implicitly defined as delete" on c10::optional and other templates
- `/wd4076`: "unexpected tokens following preprocessor directive - expected a newline" on some headers
- `/wd4068`: "The compiler ignored an unrecognized [pragma]"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95933
Approved by: https://github.com/ezyang
Applies the remaining flake8-comprehension fixes and checks. This change replaces all remaining unnecessary generator expressions with list/dict/set comprehensions, which are more succinct, performant, and better supported by our torch.jit compiler. It also removes useless generators such as `set(a for a in b)`, resolving them into just the set call.
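A few before/after pairs of the kind these checks rewrite (the data is made up for the example):

```python
names = ["a.cpp", "b.cu", "a.cpp"]

# Before: set(n for n in names)  (the generator adds nothing)
unique = set(names)

# Before: dict((n, len(n)) for n in names)
lengths = {n: len(n) for n in names}

# Before: list(n.upper() for n in names)
upper = [n.upper() for n in names]
```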
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676
Approved by: https://github.com/ezyang
Optimize unnecessary collection cast calls, unnecessary calls to list, tuple, and dict, and simplify calls to the sorted builtin. This should strictly improve speed and improve readability.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94323
Approved by: https://github.com/albanD
The main changes are:
1. Remove outdated checks for old compiler versions because they can't support C++17.
2. Remove outdated CMake checks because it now requires 3.18.
3. Remove outdated CUDA checks because we are moving to CUDA 11.
Almost all changes are in CMake files for easy auditing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90599
Approved by: https://github.com/soumith
Switch the GCC/Clang max versions to be exclusive, as `include/crt/host_config.h` checks only the major version for the upper bound. This makes the check less restrictive and matches the checks in the aforementioned header.
Also update the versions using that header in the CUDA SDKs.
Follow up to #82860
I noticed this as PyTorch 1.12.1 with CUDA 11.3.1 and GCC 10.3 was failing in the `test_cpp_extensions*` tests.
Example for CUDA 11.3.1 from the SDK header:
```
#if __GNUC__ > 11
// Error out
...
#if (__clang_major__ >= 12) || (__clang_major__ < 3) || ((__clang_major__ == 3) && (__clang_minor__ < 3))
// Error out
```
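To make the exclusive upper bound concrete, here is a small sketch using version tuples. The helper name and the lower bound in `GCC_BOUNDS` are hypothetical; only the upper bound mirrors the `__GNUC__ > 11` check quoted above.

```python
def compiler_version_ok(version, min_inclusive, max_exclusive):
    """Check min_inclusive <= version < max_exclusive on version tuples."""
    return min_inclusive <= version < max_exclusive


# `#if __GNUC__ > 11` errors out, so every GCC 11.x is accepted and 12 is not.
# The lower bound here is a placeholder for illustration.
GCC_BOUNDS = ((5, 0, 0), (12, 0))
```

With an inclusive maximum such as `(11, 0)`, GCC 11.3 would be rejected even though the header accepts it; an exclusive `(12, 0)` avoids that.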
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86360
Approved by: https://github.com/ezyang
BUILD_SPLIT_CUDA can go away: we link with cuDNN and cuBLAS dynamically for all configs anyway (statically linked cuDNN is a different library than the dynamically linked one, increases the default memory footprint, etc.), and libtorch_cuda, even when compiled for all GPU architectures, no longer approaches the 2 GB binary size limit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87502
Approved by: https://github.com/atalman
The function checks the output of e.g. `c++ -v` for "gcc version". But in a locale other than English the output might be "gcc-Version", which makes the check fail.
This causes the function to wrongly return false on systems where `c++` is a hardlink to `g++` and the current locale produces a different output format.
Fix this by setting `LC_ALL=C`.
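A sketch of how a command can be run with a fixed locale. The helper name `run_in_c_locale` is an assumption for this example and not the function touched by the PR.

```python
import os
import subprocess


def run_in_c_locale(cmd):
    """Run `cmd` with LC_ALL=C so tool output is not localized; return its text output."""
    env = os.environ.copy()
    env["LC_ALL"] = "C"
    result = subprocess.run(
        cmd,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # `c++ -v` writes its banner to stderr
        text=True,
    )
    return result.stdout
```

A check such as `"gcc version" in run_in_c_locale(["c++", "-v"])` then no longer depends on the user's locale.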
I found this as `test_utils.py` was failing in `test_cpp_compiler_is_ok`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85891
Approved by: https://github.com/ezyang
This is a new version of #15648 based on the latest master branch.
Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR.
In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I marked as skip are causing segfaults. Running xdoctest results in 293 failed, 201 passed tests. The next commits will be to disable those tests. (unfortunately I don't have a tool that will insert the `#xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.)
Fixes https://github.com/pytorch/pytorch/issues/71105
@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797
Approved by: https://github.com/ezyang
### Description
Listed in the commit message:
> The user may want to use `python3 -c "..."` to get the torch library
> path and the include path. Printing messages to stdout will mess up
> the output.
I'm using the command:
```bash
LIBTORCH_PATH="$(
python3 -c 'print(":".join(__import__("torch.utils.cpp_extension", fromlist=[None]).library_paths()))'
)"
export LD_LIBRARY_PATH="${LIBTORCH_PATH}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
```
This lets command-line tools find the torch shared libraries. I think this is a common use case for users writing C/C++ extensions.
I got:
```console
$ LIBTORCH_PATH="$(python3 -c 'print(":".join(__import__("torch.utils.cpp_extension", fromlist=[None]).library_paths()))')"
$ export LD_LIBRARY_PATH="${LIBTORCH_PATH}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
$ echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}"
LD_LIBRARY_PATH=No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-11.6'
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/torch/lib:/usr/local/cuda-11.6/lib64:
$ ls -alh "${LIBTORCH_PATH}"
ls: cannot access 'No CUDA runtime is found, using CUDA_HOME='\''/usr/local/cuda-11.6'\'''$'\n''/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/torch/lib': No such file or directory
```
This PR prints messages in `torch.utils.cpp_extension` to `stderr`, which allows users to get the correct result using `VAR="$(python3 -c '...')"`.
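The principle, in a minimal sketch (the function, message, and path below are placeholders, not the real `library_paths()` implementation):

```python
import sys


def find_library_paths():
    # Diagnostics go to stderr so they never pollute captured stdout.
    print("No CUDA runtime is found (example diagnostic)", file=sys.stderr)
    return ["/path/to/torch/lib"]  # hypothetical path
```

When a script prints `":".join(find_library_paths())`, a shell command substitution captures only the path, while the diagnostic still shows up on the terminal.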
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82097
Approved by: https://github.com/ezyang
*even if no GPUs are available*
When building PyTorch extensions for ROCm PyTorch, if the user doesn't specify a list of archs using the PYTORCH_ROCM_ARCH env var, we would like to use the list of gfx archs that PyTorch was built for as the default value. To do this successfully even in an environment where no GPUs are available, e.g. a build-only CPU node, we need to be able to get the list of archs. `torch.cuda.get_arch_list()` doesn't work here because it calls `torch.cuda.is_available()` first: 0922cc024e/torch/cuda/__init__.py (L463), which returns `False` if no GPUs are available, resulting in an empty list being returned by `torch.cuda.get_arch_list()`. To get around this issue, we call the underlying API `torch._C._cuda_getArchFlags()`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80498
Approved by: https://github.com/ezyang, https://github.com/malfet
Summary:
hip/hip_runtime.h and libamdhip64.so may be required to compile
extension such as torch_ucc. They are in $ROCM_HOME/hip by default,
and may not be symlinked to $ROCM_HOME/include and $ROCM_HOME/lib.
This commit defines $ROCM_HOME/hip as $HIP_HOME, and adds its include
and lib paths when building hipified extension.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75548
Test Plan:
## Verify OSS pytorch + TorchUCC on an AMD GPU machine (MI100)
- step 1. Install OSS pytorch
```
export ROCM_PATH=/opt/rocm-4.5.2
git clone https://github.com/pytorch/pytorch.git
cd pytorch
python3 tools/amd_build/build_amd.py
USE_NCCL=0 USE_RCCL=0 USE_KINETO=0 with-proxy python3 setup.py develop
USE_NCCL=0 USE_RCCL=0 USE_KINETO=0 with-proxy python3 setup.py install
```
- step 2. Install torchUCC extension
```
# /opt/rocm-4.5.2/include/hip does not exist, need include /opt/rocm-4.5.2/hip/include at compile time
export ROCM_PATH=/opt/rocm-4.5.2
export RCCL_INSTALL_DIR=/opt/rccl-rocm-rel-4.4-rdc
git clone https://github.com/facebookresearch/torch_ucc.git
cd torch_ucc
UCX_HOME=$RCCL_INSTALL_DIR UCC_HOME=$RCCL_INSTALL_DIR WITH_CUDA=$ROCM_PATH python setup.py
```
Build log before fix (error "hip/hip_runtime.h: No such file or directory"): P493038915
Build log after fix: P493037572
Reviewed By: ezyang
Differential Revision: D35506098
Pulled By: minsii
fbshipit-source-id: 76cbb6d4eaa6549a00898c9d9ebaca47a55330e9
(cherry picked from commit d684c080edf1fbd293e3321151976812c1da8533)
Summary:
Adding documentation about compiling extension with CUDA 11.5 and Windows
Example of failure: https://github.com/pytorch/pytorch/runs/4408796098?check_suite_focus=true
Note: don't use `torch/extension.h` in your C++ code with CUDA 11.5 on Windows.
Use the ATen interface instead of the torch interface in all CUDA 11.5 code on Windows. The torch interface has been failing with errors due to a bug in nvcc.
Example use:
>>> #include <ATen/ATen.h>
>>> at::Tensor SigmoidAlphaBlendForwardCuda(....)
Instead of:
>>> #include <torch/extension.h>
>>> torch::Tensor SigmoidAlphaBlendForwardCuda(...)
Currently open issue for nvcc bug: https://github.com/pytorch/pytorch/issues/69460
Complete Workaround code example: cb170ac024
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73013
Reviewed By: malfet, seemethere
Differential Revision: D34306134
Pulled By: atalman
fbshipit-source-id: 3c5b9d7a89c91bd1920dc63dbd356e45dc48a8bd
(cherry picked from commit 87098e7f17)
Summary:
Remove all hardcoded AMD gfx targets
PyTorch build and Magma build will use rocm_agent_enumerator as
backup if PYTORCH_ROCM_ARCH env var is not defined
PyTorch extensions will use same gfx targets as the PyTorch build,
unless PYTORCH_ROCM_ARCH env var is defined
torch.cuda.get_arch_list() now works for ROCm builds
PyTorch CI dockers will continue to be built for gfx900 and gfx906 for now.
PYTORCH_ROCM_ARCH env var can be a space or semicolon separated list of gfx archs eg. "gfx900 gfx906" or "gfx900;gfx906"
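Parsing both separator styles could look like the sketch below. The helper name `rocm_archs_from_env` and its `default` parameter are assumptions for illustration; the real build logic lives in CMake and `cpp_extension.py`.

```python
import os
import re


def rocm_archs_from_env(default=()):
    """Return gfx archs from PYTORCH_ROCM_ARCH, or `default` if it is unset or empty."""
    raw = os.environ.get("PYTORCH_ROCM_ARCH", "")
    # Accept spaces or semicolons as separators and drop empty pieces.
    archs = [a for a in re.split(r"[ ;]+", raw) if a]
    return archs or list(default)
```

Here `default` stands in for the list of archs PyTorch itself was built for.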
cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61706
Reviewed By: seemethere
Differential Revision: D32735862
Pulled By: malfet
fbshipit-source-id: 3170e445e738e3ce373203e1e4ae99c84e645d7d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610
- Replace HIP_PLATFORM_HCC with USE_ROCM
- Don't rely on CUDA_VERSION or HIP_VERSION; use USE_ROCM and ROCM_VERSION instead.
- In the next PR:
    - Remove the mapping from CUDA_VERSION to HIP_VERSION and from CUDA to HIP in hipify.
    - HIP_PLATFORM_HCC is deprecated, so add HIP_PLATFORM_AMD to support HIP host code compilation on gcc.
cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd
Reviewed By: jbschlosser
Differential Revision: D30909053
Pulled By: ezyang
fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06
Summary:
I think this may be related to https://app.circleci.com/pipelines/github/pytorch/vision/9352/workflows/9c8afb1c-6157-4c82-a5c8-105c5adac57d/jobs/687003
Apparently `pkg_resources.parse_version` returns a `pkg_resources.extern.packaging.version.Version` instead of a `packaging.version.Version`, and on some older versions of setuptools it doesn't seem to support the `.major`/`.minor` attributes. Changing it back to use `packaging.version.parse`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61053
Test Plan: CI
Reviewed By: samestep
Differential Revision: D29494322
Pulled By: walterddr
fbshipit-source-id: 294572a10b167677440d7404e5ebe007ab59d299
Summary:
[distutils](https://docs.python.org/3/library/distutils.html) is on its way out and will be deprecated-on-import for Python 3.10+ and removed in Python 3.12 (see [PEP 632](https://www.python.org/dev/peps/pep-0632/)). There's no reason for us to keep it around since all the functionality we want from it can be found in `setuptools` / `sysconfig`. `setuptools` includes a copy of most of `distutils` (which is fine to use according to the PEP), that it uses under the hood, so this PR also uses that in some places.
Fixes #56527
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57040
Pulled By: driazati
Reviewed By: nikithamalgifb
Differential Revision: D28051356
fbshipit-source-id: 1ca312219032540e755593e50da0c9e23c62d720
Summary:
- This change is required to handle the case when hipcc is
updated to the latest using update-alternatives.
- Update-alternatives support for few ROCm binaries is available
from ROCm 4.1 onwards.
- This change does not affect any previous versions of ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55968
Reviewed By: mruberry
Differential Revision: D27785123
Pulled By: ezyang
fbshipit-source-id: 8467e468d8d51277fab9b0c8cbd57e80bbcfc7f7
Summary:
Allows extensions to override ROCm gfx arch targets. Reuses the same env var used during cmake build for consistency.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54341
Reviewed By: bdhirsh
Differential Revision: D27244010
Pulled By: heitorschueroff
fbshipit-source-id: 279e1a41ee395a0596aa7f696b6e908cf7f5bb83
Summary:
This fixes the previous erroring out by adding stricter conditions in cpp_extension.py.
To test, run a split torch_cuda build on Windows with `export BUILD_SPLIT_CUDA=ON && python setup.py develop` and then run the following test: `python test/test_utils.py TestStandaloneCPPJIT.test_load_standalone`. It should pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51596
Reviewed By: malfet
Differential Revision: D26213816
Pulled By: janeyx99
fbshipit-source-id: a752ce7f9ab9d73dcf56f952bed2f2e040614443
Summary:
On all non-Windows platforms we should use the 'posix_prefix' scheme to discover the location of the Python.h header.
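With the standard library's `sysconfig`, this selection can be sketched as follows. The helper name and the `IS_WINDOWS` definition are assumptions for the example.

```python
import sys
import sysconfig

IS_WINDOWS = sys.platform == "win32"


def python_include_dir():
    """Return the directory expected to contain Python.h for this interpreter."""
    scheme = "nt" if IS_WINDOWS else "posix_prefix"
    return sysconfig.get_path("include", scheme=scheme)
```

Passing the scheme explicitly avoids depending on whatever default scheme the platform or distribution configures.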
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51586
Reviewed By: ezyang
Differential Revision: D26208684
Pulled By: malfet
fbshipit-source-id: bafa6d79de42231629960c642d535f1fcf7a427f
Summary:
Because of the size of our `libtorch_cuda.so`, linking with other hefty binaries presents a problem where 32bit relocation markers are too small and end up overflowing. This PR attempts to break up `torch_cuda` into `torch_cuda_cu` and `torch_cuda_cpp`.
`torch_cuda_cu`: all the files previously in `Caffe2_GPU_SRCS` that are
* pure `.cu` files in `aten`
* all the BLAS files
* all the THC files, except for THCAllocator.cpp, THCCachingHostAllocator.cpp and THCGeneral.cpp
* all files in `detail`
* LegacyDefinitions.cpp and LegacyTHFunctionsCUDA.cpp
* Register*CUDA.cpp
* CUDAHooks.cpp
* CUDASolver.cpp
* TensorShapeCUDA.cpp
`torch_cuda_cpp`: all other files in `Caffe2_GPU_SRCS`
Accordingly, TORCH_CUDA_API and TORCH_CUDA_BUILD_MAIN_LIB usages are getting split as well to TORCH_CUDA_CU_API and TORCH_CUDA_CPP_API.
To test this locally, you can run `export BUILD_SPLIT_CUDA=ON && python setup.py develop`. In your `build/lib` folder, you should find binaries for both `torch_cuda_cpp` and `torch_cuda_cu`. To see that the SPLIT_CUDA option was toggled, you can grep the Summary of running cmake and make sure `Split CUDA` is ON.
This build option is tested on CI for CUDA 11.1 builds (linux for now, but windows soon).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49050
Reviewed By: walterddr
Differential Revision: D26114310
Pulled By: janeyx99
fbshipit-source-id: 0180f2519abb5a9cdde16a6fb7dd3171cff687a6
Summary:
Bugs:
1) would introduce -I* in compile commands
2) wouldn't hipify source code directly in build_dir, only one level down or more
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50703
Reviewed By: mrshenli
Differential Revision: D25949070
Pulled By: ngimel
fbshipit-source-id: 018c2a056b68019a922e20e5db2eb8435ad147fe
Summary:
_resubmission of gh-49654, which was reverted due to a cross-merge conflict_
This caught one incorrect annotation in `cpp_extension.load`.
xref gh-16574.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50278
Reviewed By: walterddr
Differential Revision: D25865278
Pulled By: ezyang
fbshipit-source-id: 25489191628af5cf9468136db36f5a0f72d9d54d
Summary:
tldr: current version of `is_ninja_available` of `torch/utils/cpp_extension.py` fails to run in the recent incarnations of pip w/ new build isolation feature which is now a default. This PR fixes this problem.
The full story follows:
--------------------------
Currently trying to build https://github.com/facebookresearch/fairscale/ which builds cuda extensions fails with the recent pip versions. The build is failing to perform `is_ninja_available`, which runs a simple subprocess to run `ninja --version` but does it with some /dev/null stream override which seems to break with the new pip versions. Currently I have `pip==20.3.3`. The recent pip performs build isolation which first fetches all dependencies to somewhere under /tmp/pip-install-xyz and then builds the package.
If I build:
```
pip install fairscale --no-build-isolation
```
everything works.
When building normally (i.e. without `--no-build-isolation`), the failure is a long long trace,
<details>
<summary>Full log</summary>
<pre>
pip install fairscale
Collecting fairscale
Downloading fairscale-0.1.1.tar.gz (83 kB)
|████████████████████████████████| 83 kB 562 kB/s
Installing build dependencies ... done
Getting requirements to build wheel ... error
ERROR: Command errored out with exit status 1:
command: /home/stas/anaconda3/envs/main-38/bin/python /home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpjvw00c7v
cwd: /tmp/pip-install-1wq9f8fp/fairscale_347f218384a64f24b8d5ce846641213e
Complete output (55 lines):
running egg_info
writing fairscale.egg-info/PKG-INFO
writing dependency_links to fairscale.egg-info/dependency_links.txt
writing requirements to fairscale.egg-info/requires.txt
writing top-level names to fairscale.egg-info/top_level.txt
Traceback (most recent call last):
File "/home/stas/anaconda3/envs/main-38/bin/ninja", line 5, in <module>
from ninja import ninja
ModuleNotFoundError: No module named 'ninja'
Traceback (most recent call last):
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py", line 280, in <module>
main()
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py", line 263, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py", line 114, in get_requires_for_build_wheel
return hook(config_settings)
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 149, in get_requires_for_build_wheel
return self._get_build_requires(
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 130, in _get_build_requires
self.run_setup()
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 145, in run_setup
exec(compile(code, __file__, 'exec'), locals())
File "setup.py", line 56, in <module>
setuptools.setup(
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/core.py", line 148, in setup
dist.run_commands()
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 298, in run
self.find_sources()
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 305, in find_sources
mm.run()
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 536, in run
self.add_defaults()
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 572, in add_defaults
sdist.add_defaults(self)
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/command/sdist.py", line 228, in add_defaults
self._add_defaults_ext()
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/command/sdist.py", line 311, in _add_defaults_ext
build_ext = self.get_finalized_command('build_ext')
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/cmd.py", line 298, in get_finalized_command
cmd_obj = self.distribution.get_command_obj(command, create)
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/dist.py", line 858, in get_command_obj
cmd_obj = self.command_obj[command] = klass(self)
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 351, in __init__
if not is_ninja_available():
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1310, in is_ninja_available
subprocess.check_call('ninja --version'.split(), stdout=devnull)
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/subprocess.py", line 364, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ninja', '--version']' returned non-zero exit status 1.
----------------------------------------
ERROR: Command errored out with exit status 1: /home/stas/anaconda3/envs/main-38/bin/python /home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpjvw00c7v Check the logs for full command output.
</pre>
</details>
and the middle of it is what we want:
```
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 351, in __init__
if not is_ninja_available():
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1310, in is_ninja_available
subprocess.check_call('ninja --version'.split(), stdout=devnull)
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/subprocess.py", line 364, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ninja', '--version']' returned non-zero exit status 1.
```
For some reason pytorch fails to run this simple code:
```
# torch/utils/cpp_extension.py
def is_ninja_available():
r'''
Returns ``True`` if the `ninja <https://ninja-build.org/>`_ build system is
available on the system, ``False`` otherwise.
'''
with open(os.devnull, 'wb') as devnull:
try:
subprocess.check_call('ninja --version'.split(), stdout=devnull)
except OSError:
return False
else:
return True
```
I suspect that pip does something to `os.devnull` and that's why it fails.
This PR proposes a simpler code which doesn't rely on anything but `subprocess.check_output`:
```
def is_ninja_available():
r'''
Returns ``True`` if the `ninja <https://ninja-build.org/>`_ build system is
available on the system, ``False`` otherwise.
'''
try:
subprocess.check_output('ninja --version'.split())
except Exception:
return False
else:
return True
```
which doesn't use `os.devnull` and performs the same function. There could be a whole bunch of different exceptions there I think, so I went for the generic one - we don't care why it failed, since this function's only purpose is to suggest whether ninja can be used or not.
Let's check
```
python -c "import torch.utils.cpp_extension; print(torch.utils.cpp_extension.is_ninja_available())"
True
```
Look ma - no std noise to take care of. (i.e. no need for /dev/null).
I was editing the installed environment-wide `cpp_extension.py` file directly, so didn't need to tweak `PYTHONPATH` - I made sure to replace `'ninja --version'.` with something that should fail and I did get `False` for the above command line.
I next did a somewhat elaborate cheat to re-package an already existing binary wheel with this corrected version of `cpp_extension.py`, rather than building from source:
```
mkdir /tmp/pytorch-local-channel
cd /tmp/pytorch-local-channel
# get the latest nightly wheel
wget https://download.pytorch.org/whl/nightly/cu110/torch-1.8.0.dev20201215%2Bcu110-cp38-cp38-linux_x86_64.whl
# unpack it
unzip torch-1.8.0.dev20201215+cu110-cp38-cp38-linux_x86_64.whl
# edit torch/utils/cpp_extension.py to fix the python code with the new version as in this PR
emacs torch/utils/cpp_extension.py &
# pack the files back
zip -r torch-1.8.0.dev20201215+cu110-cp38-cp38-linux_x86_64.whl caffe2 torch torch-1.8.0.dev20201215+cu110.dist-info
```
Now I tell pip to use my local channel, plus `--pre` for it to pick up the pre-release as an acceptable wheel
```
# install using this local channel
git clone https://github.com/facebookresearch/fairscale/
cd fairscale
pip install -v --disable-pip-version-check -e . -f file:///tmp/pytorch-local-channel --pre
```
and voila all works.
```
[...]
Successfully installed fairscale
```
I noticed a whole bunch of "ninja not found" errors in the log. I think this is the same problem in other build-system packages, which also use this old check (copied across various projects and build tools) and which the recent pip breaks.
```
writing manifest file '/tmp/pip-modern-metadata-_nsdesbq/fairscale.egg-info/SOURCES.txt'
Traceback (most recent call last):
File "/home/stas/anaconda3/envs/main-38/bin/ninja", line 5, in <module>
from ninja import ninja
ModuleNotFoundError: No module named 'ninja'
[...]
/tmp/pip-build-env-fqflyevr/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py:364: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
```
but these don't prevent the build from completing and installing.
I suppose these need to be identified and reported to various other projects, but that's another story.
I think the new pip does something to `os.devnull` that breaks any code relying on it. I haven't tried to figure out what happens to that stream object, but this PR, which removes its usage, solves the problem.
Also do notice that:
```
git clone https://github.com/facebookresearch/fairscale/
cd fairscale
python setup.py bdist_wheel
pip install dist/fairscale-0.1.1-cp38-cp38-linux_x86_64.whl
```
works too. So it is really a pip issue.
Apologies if the notes are too many, I tried to give the complete picture and probably other projects will need those details as well.
Thank you for reading.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49443
Reviewed By: mruberry
Differential Revision: D25592109
Pulled By: ezyang
fbshipit-source-id: bfce4420c28b614ead48e9686f4153c6e0fbe8b7
Summary:
I am submitting this PR on behalf of Janne Hellsten(nurpax) from NVIDIA, for the convenience of CLA. Thanks Janne a lot for the contribution!
Currently, the ninja build decides whether to rebuild a .cu file more or less at random. There are actually two issues:
First, the arch list in the build command is ordered randomly. When the order changes, ninja rebuilds unconditionally, regardless of the timestamp.
Second, header files are not included in the dependency list, so if a header file changes, ninja may not rebuild.
This PR fixes both issues. The fix for the second issue requires nvcc >= 10.2. nvcc < 10.2 can still build CUDA extensions as before, but it will be unable to see changes in header files.
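A minimal sketch of the two ideas, assuming the arch list arrives as strings such as `'8.6'`. Both function names are hypothetical and the code is not taken from the PR. The dependency flags are the nvcc options available from CUDA 10.2 onwards, and `$out.d` is assumed to be the depfile path used in the ninja rule.

```python
def gencode_flags(arch_list):
    """Build -gencode flags in a stable order from e.g. ['8.6', '6.1']."""
    # De-duplicate, then sort numerically so the command line is identical
    # between runs and ninja does not see a spurious change.
    ordered = sorted(set(arch_list), key=lambda a: tuple(int(p) for p in a.split(".")))
    flags = []
    for arch in ordered:
        num = arch.replace(".", "")
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
    return flags


def header_dependency_flags(nvcc_version):
    """Return nvcc flags that emit a depfile, or [] for nvcc older than 10.2."""
    if nvcc_version >= (10, 2):
        return ["--generate-dependencies-with-compile", "--dependency-output", "$out.d"]
    # Older nvcc: build as before, without header tracking.
    return []


print(gencode_flags(["8.6", "6.1", "8.6"]))
print(header_dependency_flags((10, 1)))
```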
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49344
Reviewed By: glaringlee
Differential Revision: D25540157
Pulled By: ezyang
fbshipit-source-id: 197541690d7f25e3ac5ebe3188beb1f131a4c51f
Summary:
Currently CUDAExtension assumes that all cards are of the same type on the same machine and builds the extension with compute capability of the 0th card. This breaks later at runtime if the machine has cards of different types.
Specifically resulting in:
```
RuntimeError: CUDA error: no kernel image is available for execution on the device
```
when cards of a type that the extension wasn't compiled for are used (and the error does little to tell the uninitiated what the problem is).
My current setup is:
```
$ CUDA_VISIBLE_DEVICES=0 python -c "import torch; print(torch.cuda.get_device_capability())"
(8, 6)
$ CUDA_VISIBLE_DEVICES=1 python -c "import torch; print(torch.cuda.get_device_capability())"
(6, 1)
```
but the extension was getting built with `-gencode=arch=compute_80,code=sm_80`.
This PR:
* [x] introduces a loop over all devices visible at build time to ensure the extension will run on all of them (it sorts the new list generated by the loop, so that the output is easier to debug should a card with a lower compute capability come last)
* [x] adds `+PTX` to the last entry of ccs derived from local cards (`if not _arch_list:`) to support other archs
* [x] adds a digest of my conversation with ptrblck on slack in the form of docs which hopefully can help others know which archs to support, how to override defaults, when and how to add PTX, etc.
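The first two items can be sketched as a pure function over the capability tuples reported for each visible device. The name `arch_list_from_capabilities` is hypothetical, and this is an illustration of the idea rather than the code in the PR.

```python
def arch_list_from_capabilities(capabilities):
    """Turn [(8, 6), (6, 1)] into ['6.1', '8.6+PTX']."""
    # De-duplicate and sort so the lowest compute capability comes first.
    unique = sorted(set(capabilities))
    archs = [f"{major}.{minor}" for major, minor in unique]
    if archs:
        # PTX on the highest entry lets newer, unlisted GPUs JIT-compile the kernels.
        archs[-1] += "+PTX"
    return archs


print(arch_list_from_capabilities([(8, 6), (6, 1)]))  # ['6.1', '8.6+PTX']
```

On a real machine the input could be gathered with `torch.cuda.get_device_capability(i)` for each `i` in `range(torch.cuda.device_count())`.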
Please kindly review that my prose is clear and easy to understand.
ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48891
Reviewed By: ngimel
Differential Revision: D25358285
Pulled By: ezyang
fbshipit-source-id: 8160f3adebffbc8e592ddfcc3adf153a9dc91557
Summary:
[Refiled version of earlier PR https://github.com/pytorch/pytorch/issues/45451]
This PR revamps the hipify module in PyTorch to overcome a long list of shortcomings in the original implementation. However, these improvements are applied only when using hipify to build PyTorch extensions, not for PyTorch or Caffe2 itself.
Correspondingly, changes are made to cpp_extension.py to match these improvements.
The list of improvements to hipify is as follows:
1. Hipify files in the same directory as the original file, unless there's a "cuda" subdirectory in the original file path, in which case the hipified file will be in the corresponding file path with "hip" subdirectory instead of "cuda".
2. Never hipify the file in-place if changes are introduced due to hipification i.e. always ensure the hipified file either resides in a different folder or has a different filename compared to the original file.
3. Prevent re-hipification of already hipified files. This avoids creation of unnecessary "hip/hip" etc. subdirectories and additional files which have no actual use.
4. Do not write out hipified versions of files if they are identical to the original file. This results in a cleaner output directory, with minimal number of hipified files created.
5. Update header rewrite logic so that it accounts for the previous improvement.
6. Update header rewrite logic so it respects the rules for finding header files depending on whether "" or <> is used.
7. Return a dictionary of mappings of original file paths to hipified file paths from hipify function.
8. Introduce a version for hipify module to allow extensions to contain back-compatible code that targets a specific point in PyTorch where the hipify functionality changed.
9. Update cuda_to_hip_mappings.py to account for the ROCm component subdirectories inside /opt/rocm/include. This also results in cleanup of the Caffe2_HIP_INCLUDE path to remove unnecessary additions to the include path.
The list of changes to cpp_extension.py is as follows:
1. Call hipify when building a CUDAExtension for ROCm.
2. Prune the list of source files to CUDAExtension to include only the hipified versions of any source files in the list (if both original and hipified versions of the source file are in the list)
3. Add subdirectories of /opt/rocm/include to the include path for extensions, so that ROCm headers for subcomponent libraries are found automatically
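As a rough illustration of hipify improvements 1 and 2, the output path could be chosen as follows. The function name `hipified_path` and the `_hip` filename suffix are assumptions made for this sketch and are not taken from the hipify module.

```python
import posixpath


def hipified_path(path):
    """Map an original source path to a hypothetical hipified output path."""
    dirname, filename = posixpath.split(path)
    parts = dirname.split("/")
    if "cuda" in parts:
        # Improvement 1: mirror the file under "hip" instead of "cuda".
        parts = ["hip" if p == "cuda" else p for p in parts]
        return posixpath.join("/".join(parts), filename)
    # Improvement 2: same directory, so the filename must differ from the
    # original (the "_hip" suffix is an assumption of this sketch).
    root, ext = posixpath.splitext(filename)
    return posixpath.join(dirname, root + "_hip" + ext)


print(hipified_path("ext/cuda/kernel.cu"))  # ext/hip/kernel.cu
print(hipified_path("ext/kernel.cu"))       # ext/kernel_hip.cu
```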
cc jeffdaily sunway513 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48715
Reviewed By: bdhirsh
Differential Revision: D25272824
Pulled By: ezyang
fbshipit-source-id: 8bba68b27e41ca742781e1c4d7b07c6f985f040e
Summary:
They removed the specific function in Python 3.9 so we should just
remake the function here and use our own instead of relying on hidden
functions from the stdlib
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Fixes https://github.com/pytorch/pytorch/issues/48617
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48618
Reviewed By: samestep
Differential Revision: D25230281
Pulled By: seemethere
fbshipit-source-id: 57216af40a4ae4dc8bafcf40d2eb3ba793b9b6e2
Summary:
This PR revamps the hipify module in PyTorch to overcome a long list of shortcomings in the original implementation. However, these improvements are applied only when using hipify to build PyTorch extensions, **not for PyTorch or Caffe2 itself**.
Correspondingly, changes are made to `cpp_extension.py` to match these improvements.
The list of improvements to hipify is as follows:
1. Hipify files in the same directory as the original file, unless there's a "cuda" subdirectory in the original file path, in which case the hipified file will be in the corresponding file path with "hip" subdirectory instead of "cuda".
2. Never hipify the file in-place if changes are introduced due to hipification i.e. always ensure the hipified file either resides in a different folder or has a different filename compared to the original file.
3. Prevent re-hipification of already hipified files. This avoids creation of unnecessary "hip/hip" etc. subdirectories and additional files which have no actual use.
4. Do not write out hipified versions of files if they are identical to the original file. This results in a cleaner output directory, with minimal number of hipified files created.
5. Update header rewrite logic so that it accounts for the previous improvement.
6. Update header rewrite logic so it respects the rules for finding header files depending on whether `""` or `<>` is used.
7. Return a dictionary of mappings of original file paths to hipified file paths from `hipify` function.
8. Introduce a version for hipify module to allow extensions to contain back-compatible code that targets a specific point in PyTorch where the hipify functionality changed.
9. Update `cuda_to_hip_mappings.py` to account for the ROCm component subdirectories inside `/opt/rocm/include`. This also results in cleanup of the `Caffe2_HIP_INCLUDE` path to remove unnecessary additions to the include path.
The list of changes to `cpp_extension.py` is as follows:
1. Call `hipify` when building a CUDAExtension for ROCm.
2. Prune the list of source files to CUDAExtension to include only the hipified versions of any source files in the list (if both original and hipified versions of the source file are in the list)
3. Add subdirectories of /opt/rocm/include to the include path for extensions, so that ROCm headers for subcomponent libraries are found automatically
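Item 2 of the `cpp_extension.py` changes can be sketched with the dictionary that `hipify` returns (improvement 7). The helper name `prune_sources` is hypothetical, and the mapping is assumed to go from original path to hipified path.

```python
def prune_sources(sources, hipify_map):
    """Keep one entry per source, preferring the hipified path when one exists."""
    pruned = []
    for src in sources:
        # Files that were not hipified map to themselves.
        target = hipify_map.get(src, src)
        if target not in pruned:
            pruned.append(target)
    return pruned


sources = ["ext/a.cpp", "ext/cuda/k.cu", "ext/hip/k.cu"]
mapping = {"ext/cuda/k.cu": "ext/hip/k.cu"}
print(prune_sources(sources, mapping))  # ['ext/a.cpp', 'ext/hip/k.cu']
```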
cc jeffdaily sunway513 hgaspar lcskrishna ashishfarmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45451
Reviewed By: ezyang
Differential Revision: D24924736
Pulled By: malfet
fbshipit-source-id: 4af42b8ff4f21c3782dedb8719b8f9f86b34bd2d
Summary:
I think these can be safely removed since the min version of supported Python is now 3.6
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47822
Reviewed By: smessmer
Differential Revision: D24954936
Pulled By: ezyang
fbshipit-source-id: 5d4b2aeb78fc97d7ee4abaf5fb2aae21bf765e8b
Summary:
Preserve PYBIND11 configuration options in `torch._C._PYBIND11_COMPILER_TYPE` and use them when building extensions
Also, use f-strings in `torch.utils.cpp_extension`
"Fixes" https://github.com/pytorch/pytorch/issues/46367
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46415
Reviewed By: VitalyFedyunin
Differential Revision: D24605949
Pulled By: malfet
fbshipit-source-id: 87340f2ed5308266a46ef8f0317316227dab9d4d
Summary:
Plus two minor fixes to `torch/csrc/Module.cpp`:
- Use iterator of type `Py_ssize_t` for array indexing in `THPModule_initNames`
- Fix clang-tidy warning of unneeded defaultGenerator copy by capturing it as `const auto&`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47025
Reviewed By: samestep
Differential Revision: D24605907
Pulled By: malfet
fbshipit-source-id: c276567d320758fa8b6f4bd64ff46d2ea5d40eff
Summary:
Fixes issues when building certain PyTorch extensions where the cpp files do NOT compile if flags such as `__HIP_NO_HALF_CONVERSIONS__` are defined.
cc jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46273
Reviewed By: zou3519
Differential Revision: D24422463
Pulled By: ezyang
fbshipit-source-id: 7a43d1f7d59c95589963532ef3bd3c68cb8262be
Summary:
This is the common behavior when one builds PyTorch (or any other CUDA project) using CMake, so it should be held true for Torch CUDA extensions as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43931
Reviewed By: ezyang, seemethere
Differential Revision: D23441793
Pulled By: malfet
fbshipit-source-id: 1af392107a94840331014fda970ef640dc094ae4
Summary:
Fix typos in torch/utils/_benchmark/README.md
Add empty __init__.py to examples folder to make example invocations from README.md correct
Fixed uniform distribution generation logic when minval and maxval are None
Fixes https://github.com/pytorch/pytorch/issues/42984
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42960
Reviewed By: seemethere
Differential Revision: D23095399
Pulled By: malfet
fbshipit-source-id: 0546ce7299b157d9a1f8634340024b10c4b7e7de
Summary:
Previously we did not link against amdhip64 (roughly equivalent to cudart). Apparently, the recent RTLD_GLOBAL fixes prevent the extensions from finding the symbols needed for launching kernels.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41257
Reviewed By: zou3519
Differential Revision: D22573288
Pulled By: ezyang
fbshipit-source-id: 89f9329b2097df26785e2f67e236d60984d40fdd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40837
As ninja has accurate dependency tracking, if there is nothing to do,
then we will very quickly noop. But this is important for correctness:
if a change was made to a header that is not listed explicitly in
the distutils Extension, then distutils will come to the wrong
conclusion about whether or not recompilation is needed (but Ninja
will work it out.)
This caused https://github.com/pytorch/vision/issues/2367
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D22340930
Pulled By: ezyang
fbshipit-source-id: 481b74f6e2cc78159d2a74d413751cf7cf16f592
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39277
This PR contains initial changes that make PyTorch build with Ampere GPU, CUDA 11, and cuDNN 8.
TF32 related features will not be included in this PR.
Test Plan: Imported from OSS
Differential Revision: D21832814
Pulled By: malfet
fbshipit-source-id: 37f9c6827e0c26ae3e303580f666584230832d06
Summary:
This PR adds the following changes:
1. It sets the default extension build to use ninja
2. Adds HIPCC flags to the host code compile string for ninja builds. This is needed when host code makes HIP API calls
cc: ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38939
Differential Revision: D21721905
Pulled By: ezyang
fbshipit-source-id: 75206838315a79850ecf86a78391a31ba5ee97cb
Summary:
This pull request adds a check for ROCm environment and skips adding CUDA specific flags for the scenario when a pytorch extension is built on ROCm.
ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38047
Differential Revision: D21470507
Pulled By: ezyang
fbshipit-source-id: 5af2d7235e306c7aa9a5f7fc8760025417383069
Summary:
This pull request enables ahead-of-time compilation of HIPExtensions with ninja by setting appropriate compilation flags for the ROCm environment. It also enables the unit test for cuda_extensions on ROCm and removes the test for ahead-of-time compilation of extensions with ninja from ROCM_BLACKLIST.
ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37800
Differential Revision: D21408148
Pulled By: soumith
fbshipit-source-id: 146f4ffb3418f3534e6ce86805d3fe9c3eae84e1
Summary:
As described in the issue (https://github.com/pytorch/pytorch/issues/33701) the compiler check
for building cpp extensions does not work with ccache.
In this case we check compiler -v to determine which
compiler is actually used and check it.
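A minimal sketch of the idea, assuming the text printed by `<compiler> -v` contains a line such as `gcc version ...` or `clang version ...`. The function name `compiler_family` is hypothetical and this is not the check implemented in the PR.

```python
import re


def compiler_family(version_output):
    """Guess the compiler family from the text printed by ``<compiler> -v``."""
    # Matches e.g. "gcc version 9.3.0 ..." or "Apple clang version 12.0.0 ...".
    match = re.search(r"\b(gcc|clang) version\b", version_output)
    return match.group(1) if match else None


sample = "Using built-in specs.\ngcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)\n"
print(compiler_family(sample))  # gcc
```

Compilers typically print this text on stderr, so a real caller would capture it with something like `subprocess.check_output([compiler, "-v"], stderr=subprocess.STDOUT)` and decode the result before parsing.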
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37293
Differential Revision: D21256913
Pulled By: ezyang
fbshipit-source-id: 5483a10cc2dbcff98a7f069ea9dbc0c12b6502dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35615
Python 2 has reached end-of-life and is no longer supported by PyTorch.
Now we can clean up a lot of cruft that we put in place to support it.
These changes were all done manually, and I skipped anything that seemed
like it would take more than a few seconds, so I think it makes sense to
review it manually as well (though using side-by-side view and ignoring
whitespace change might be helpful).
Test Plan: CI
Differential Revision: D20842886
Pulled By: dreiss
fbshipit-source-id: 8cad4e87c45895e7ce3938a88e61157a79504aed
Summary:
This enables cpp_extensions.load/load_inline. This works by hipify-ing cuda sources.
Also enable tests.
CuDNN/MIOpen extensions aren't yet supported; I propose not to do this in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35897
Differential Revision: D20983279
Pulled By: ezyang
fbshipit-source-id: a5d0f5ac592d04488a6a46522c58e2ee0a6fd57c
Summary:
Otherwise, it will print some message when hipcc is not found.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35789
Differential Revision: D20793089
Pulled By: ezyang
fbshipit-source-id: 4b3cb29fb1d74a1931603ee01e669013ccae9685
Summary:
The current config on `master` yields the following errors when building from source on Windows with CMake and Visual Studio 2019.
```
Severity Code Description Project File Line Suppression State
Error LNK2001 unresolved external symbol \?warp_size@cuda@at@YAHXZ\ torch D:\AI\pytorch\build_libtorch\caffe2\LINK 1
Severity Code Description Project File Line Suppression State
Error LNK1120 1 unresolved externals torch D:\AI\pytorch\build_libtorch\bin\Release\torch.dll 1
Severity Code Description Project File Line Suppression State
Error LNK2001 unresolved external symbol \?warp_size@cuda@at@YAHXZ\ caffe2_observers D:\AI\pytorch\build_libtorch\modules\observers\LINK 1
Severity Code Description Project File Line Suppression State
Error LNK1120 1 unresolved externals caffe2_observers D:\AI\pytorch\build_libtorch\bin\Release\caffe2_observers.dll 1
Severity Code Description Project File Line Suppression State
Error LNK2001 unresolved external symbol \?warp_size@cuda@at@YAHXZ\ caffe2_detectron_ops_gpu D:\AI\pytorch\build_libtorch\modules\detectron\LINK 1
Severity Code Description Project File Line Suppression State
Error LNK1120 1 unresolved externals caffe2_detectron_ops_gpu D:\AI\pytorch\build_libtorch\bin\Release\caffe2_detectron_ops_gpu.dll 1
```
This change at least fixes the above errors in that specific setting. Do you think it makes sense to get this merged or will it break other settings?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35659
Differential Revision: D20735907
Pulled By: ezyang
fbshipit-source-id: eb8fa1e69aaaa5af2da3a76963ddc910bb716479
Summary:
Otherwise, VC++ will emit a warning for every exposed C++ symbol, for example:
```
include\c10/core/impl/LocalDispatchKeySet.h(53): warning C4251: 'c10::impl::LocalDispatchKeySet::included_': class 'c10::DispatchKeySet' needs to have dll-interface to be used by clients of struct 'c10::impl::LocalDispatchKeySet'
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35272
Test Plan: CI
Differential Revision: D20623005
Pulled By: malfet
fbshipit-source-id: b635b674159bb9654e4e1a1af4394c4f36fe35bd