Summary:
tl;dr: the current version of `is_ninja_available` in `torch/utils/cpp_extension.py` fails to run under recent versions of pip, whose new build-isolation feature is now the default. This PR fixes this problem.
The full story follows:
--------------------------
Currently, trying to build https://github.com/facebookresearch/fairscale/ (which builds CUDA extensions) fails with recent pip versions. The build fails in `is_ninja_available`, which runs `ninja --version` in a simple subprocess but redirects its output to /dev/null, and that redirection seems to break under the new pip versions. I currently have `pip==20.3.3`. Recent pip performs build isolation: it first fetches all dependencies to somewhere under /tmp/pip-install-xyz and then builds the package.
If I build:
```
pip install fairscale --no-build-isolation
```
everything works.
When building normally (i.e. without `--no-build-isolation`), the failure is a very long trace:
<details>
<summary>Full log</summary>
<pre>
pip install fairscale
Collecting fairscale
Downloading fairscale-0.1.1.tar.gz (83 kB)
|████████████████████████████████| 83 kB 562 kB/s
Installing build dependencies ... done
Getting requirements to build wheel ... error
ERROR: Command errored out with exit status 1:
command: /home/stas/anaconda3/envs/main-38/bin/python /home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpjvw00c7v
cwd: /tmp/pip-install-1wq9f8fp/fairscale_347f218384a64f24b8d5ce846641213e
Complete output (55 lines):
running egg_info
writing fairscale.egg-info/PKG-INFO
writing dependency_links to fairscale.egg-info/dependency_links.txt
writing requirements to fairscale.egg-info/requires.txt
writing top-level names to fairscale.egg-info/top_level.txt
Traceback (most recent call last):
File "/home/stas/anaconda3/envs/main-38/bin/ninja", line 5, in <module>
from ninja import ninja
ModuleNotFoundError: No module named 'ninja'
Traceback (most recent call last):
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py", line 280, in <module>
main()
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py", line 263, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py", line 114, in get_requires_for_build_wheel
return hook(config_settings)
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 149, in get_requires_for_build_wheel
return self._get_build_requires(
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 130, in _get_build_requires
self.run_setup()
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 145, in run_setup
exec(compile(code, __file__, 'exec'), locals())
File "setup.py", line 56, in <module>
setuptools.setup(
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/core.py", line 148, in setup
dist.run_commands()
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 298, in run
self.find_sources()
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 305, in find_sources
mm.run()
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 536, in run
self.add_defaults()
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 572, in add_defaults
sdist.add_defaults(self)
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/command/sdist.py", line 228, in add_defaults
self._add_defaults_ext()
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/command/sdist.py", line 311, in _add_defaults_ext
build_ext = self.get_finalized_command('build_ext')
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/cmd.py", line 298, in get_finalized_command
cmd_obj = self.distribution.get_command_obj(command, create)
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/dist.py", line 858, in get_command_obj
cmd_obj = self.command_obj[command] = klass(self)
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 351, in __init__
if not is_ninja_available():
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1310, in is_ninja_available
subprocess.check_call('ninja --version'.split(), stdout=devnull)
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/subprocess.py", line 364, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ninja', '--version']' returned non-zero exit status 1.
----------------------------------------
ERROR: Command errored out with exit status 1: /home/stas/anaconda3/envs/main-38/bin/python /home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpjvw00c7v Check the logs for full command output.
</pre>
</details>
The relevant part is in the middle:
```
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 351, in __init__
if not is_ninja_available():
File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1310, in is_ninja_available
subprocess.check_call('ninja --version'.split(), stdout=devnull)
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/subprocess.py", line 364, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ninja', '--version']' returned non-zero exit status 1.
```
For some reason pytorch fails to run this simple code:
```
# torch/utils/cpp_extension.py
def is_ninja_available():
    r'''
    Returns ``True`` if the `ninja <https://ninja-build.org/>`_ build system is
    available on the system, ``False`` otherwise.
    '''
    with open(os.devnull, 'wb') as devnull:
        try:
            subprocess.check_call('ninja --version'.split(), stdout=devnull)
        except OSError:
            return False
        else:
            return True
```
I suspect that pip does something to `os.devnull` and that's why it fails.
This PR proposes a simpler code which doesn't rely on anything but `subprocess.check_output`:
```
def is_ninja_available():
    r'''
    Returns ``True`` if the `ninja <https://ninja-build.org/>`_ build system is
    available on the system, ``False`` otherwise.
    '''
    try:
        subprocess.check_output('ninja --version'.split())
    except Exception:
        return False
    else:
        return True
```
This version doesn't use `os.devnull` and performs the same function. A whole bunch of different exceptions could be raised there, so I went for the generic one - we don't care why it failed, since this function's only purpose is to report whether ninja can be used or not.
Let's check:
```
python -c "import torch.utils.cpp_extension; print(torch.utils.cpp_extension.is_ninja_available())"
True
```
Look ma - no stdout noise to take care of (i.e. no need for /dev/null).
I was editing the environment-wide installed `cpp_extension.py` file directly, so I didn't need to tweak `PYTHONPATH`. To check the failure path, I replaced `'ninja --version'` with a command that should fail, and the command line above then printed `False`.
I next did a somewhat elaborate cheat to re-package an already existing binary wheel with this corrected version of `cpp_extension.py`, rather than building from source:
```
mkdir /tmp/pytorch-local-channel
cd /tmp/pytorch-local-channel
# get the latest nightly wheel
wget https://download.pytorch.org/whl/nightly/cu110/torch-1.8.0.dev20201215%2Bcu110-cp38-cp38-linux_x86_64.whl
# unpack it
unzip torch-1.8.0.dev20201215+cu110-cp38-cp38-linux_x86_64.whl
# edit torch/utils/cpp_extension.py to fix the python code with the new version as in this PR
emacs torch/utils/cpp_extension.py &
# pack the files back
zip -r torch-1.8.0.dev20201215+cu110-cp38-cp38-linux_x86_64.whl caffe2 torch torch-1.8.0.dev20201215+cu110.dist-info
```
Now I tell pip to use my local channel, plus `--pre` for it to pick up the pre-release as an acceptable wheel
```
# install using this local channel
git clone https://github.com/facebookresearch/fairscale/
cd fairscale
pip install -v --disable-pip-version-check -e . -f file:///tmp/pytorch-local-channel --pre
```
and voila all works.
```
[...]
Successfully installed fairscale
```
I noticed a whole bunch of "ninja not found" errors in the log, which I think come from other parts of the build toolchain that use the same old check, copied across various projects and build tools, and which the recent pip also breaks:
```
writing manifest file '/tmp/pip-modern-metadata-_nsdesbq/fairscale.egg-info/SOURCES.txt'
Traceback (most recent call last):
File "/home/stas/anaconda3/envs/main-38/bin/ninja", line 5, in <module>
from ninja import ninja
ModuleNotFoundError: No module named 'ninja'
[...]
/tmp/pip-build-env-fqflyevr/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py:364: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
```
but these don't prevent the build from completing and installing.
I suppose these need to be identified and reported to various other projects, but that's another story.
I think the new pip does something to `os.devnull` that breaks any code relying on it. I haven't tried to figure out what happens to that stream object, but this PR, which removes its usage, solves the problem.
Also do notice that:
```
git clone https://github.com/facebookresearch/fairscale/
cd fairscale
python setup.py bdist_wheel
pip install dist/fairscale-0.1.1-cp38-cp38-linux_x86_64.whl
```
works too. So it is really a pip issue.
Apologies if the notes are too many, I tried to give the complete picture and probably other projects will need those details as well.
Thank you for reading.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49443
Reviewed By: mruberry
Differential Revision: D25592109
Pulled By: ezyang
fbshipit-source-id: bfce4420c28b614ead48e9686f4153c6e0fbe8b7
Summary:
I am submitting this PR on behalf of Janne Hellsten(nurpax) from NVIDIA, for the convenience of CLA. Thanks Janne a lot for the contribution!
Currently, the ninja build decides whether or not to rebuild a .cu file pretty much at random. There are actually two issues:
First, the arch list in the build command is ordered randomly. When the order changes, ninja rebuilds unconditionally, regardless of timestamps.
Second, the header files are not included in the dependency list, so if a header file changes, it is possible that ninja will not rebuild.
This PR fixes both issues. The fix for the second issue requires nvcc >= 10.2; nvcc < 10.2 can still build CUDA extensions as before, but it will not see changes in header files.
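The first fix essentially amounts to emitting the `-gencode` flags in a sorted, deterministic order. A minimal sketch of the idea (hypothetical snippet, not the actual `cpp_extension.py` code):
```python
# Emit arch flags in sorted order so the ninja command line is stable
# across runs and does not trigger spurious rebuilds.
archs = {"sm_70", "sm_61", "sm_80"}  # e.g. derived from TORCH_CUDA_ARCH_LIST or local GPUs
flags = [f"-gencode=arch=compute_{a[3:]},code={a}" for a in sorted(archs)]
print(flags)
# ['-gencode=arch=compute_61,code=sm_61',
#  '-gencode=arch=compute_70,code=sm_70',
#  '-gencode=arch=compute_80,code=sm_80']
```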
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49344
Reviewed By: glaringlee
Differential Revision: D25540157
Pulled By: ezyang
fbshipit-source-id: 197541690d7f25e3ac5ebe3188beb1f131a4c51f
Summary:
Currently CUDAExtension assumes that all cards on the same machine are of the same type and builds the extension with the compute capability of the 0th card. This breaks later at runtime if the machine has cards of different types.
Specifically resulting in:
```
RuntimeError: CUDA error: no kernel image is available for execution on the device
```
when a card of a type that wasn't compiled for is used (and the error message is far from telling the uninitiated what the problem is).
My current setup is:
```
$ CUDA_VISIBLE_DEVICES=0 python -c "import torch; print(torch.cuda.get_device_capability())"
(8, 6)
$ CUDA_VISIBLE_DEVICES=1 python -c "import torch; print(torch.cuda.get_device_capability())"
(6, 1)
```
but the extension was getting built with `-gencode=arch=compute_80,code=sm_80`.
This PR:
* [x] introduces a loop over all devices visible at build time to ensure the extension will run on all of them (it sorts the new list generated by the loop, so that the output is easier to debug should a card with lower capability come last); a minimal sketch of the idea follows this list
* [x] adds `+PTX` to the last entry of ccs derived from local cards (`if not _arch_list:`) to support other archs
* [x] adds a digest of my conversation with ptrblck on slack in the form of docs which hopefully can help others know which archs to support, how to override defaults, when and how to add PTX, etc.
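For illustration, a minimal sketch of the loop over visible devices (it uses the public `torch.cuda` APIs and assumes at least one visible GPU; the actual logic in `cpp_extension.py` handles more cases):
```python
import torch

# collect the capability of every card visible at build time, de-duplicated and sorted
caps = sorted({torch.cuda.get_device_capability(i)
               for i in range(torch.cuda.device_count())})
arch_list = [f"{major}.{minor}" for major, minor in caps]
arch_list[-1] += "+PTX"  # also emit PTX so newer, unlisted archs can JIT-compile
print(arch_list)  # e.g. ['6.1', '8.6+PTX'] on the setup described above
```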
Please kindly review that my prose is clear and easy to understand.
ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48891
Reviewed By: ngimel
Differential Revision: D25358285
Pulled By: ezyang
fbshipit-source-id: 8160f3adebffbc8e592ddfcc3adf153a9dc91557
Summary:
[Refiled version of earlier PR https://github.com/pytorch/pytorch/issues/45451]
This PR revamps the hipify module in PyTorch to overcome a long list of shortcomings in the original implementation. However, these improvements are applied only when using hipify to build PyTorch extensions, not for PyTorch or Caffe2 itself.
Correspondingly, changes are made to cpp_extension.py to match these improvements.
The list of improvements to hipify is as follows:
1. Hipify files in the same directory as the original file, unless there's a "cuda" subdirectory in the original file path, in which case the hipified file will be in the corresponding file path with a "hip" subdirectory instead of "cuda" (see the sketch after this list).
2. Never hipify the file in-place if changes are introduced due to hipification i.e. always ensure the hipified file either resides in a different folder or has a different filename compared to the original file.
3. Prevent re-hipification of already hipified files. This avoids creation of unnecessary "hip/hip" etc. subdirectories and additional files which have no actual use.
4. Do not write out hipified versions of files if they are identical to the original file. This results in a cleaner output directory, with minimal number of hipified files created.
5. Update header rewrite logic so that it accounts for the previous improvement.
6. Update header rewrite logic so it respects the rules for finding header files depending on whether "" or <> is used.
7. Return a dictionary of mappings of original file paths to hipified file paths from hipify function.
8. Introduce a version for hipify module to allow extensions to contain back-compatible code that targets a specific point in PyTorch where the hipify functionality changed.
9. Update cuda_to_hip_mappings.py to account for the ROCm component subdirectories inside /opt/rocm/include. This also results in cleanup of the Caffe2_HIP_INCLUDE path to remove unnecessary additions to the include path.
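To illustrate improvement 1, here is a hypothetical sketch of the output-path rule (the helper name and the exact rename used in the "same directory, different filename" case are illustrative, not the real hipify code):
```python
import os

def hipified_path(path):
    dirname, fname = os.path.split(path)
    parts = dirname.split(os.sep)
    if "cuda" in parts:
        # mirror the original tree, swapping the "cuda" directory for "hip"
        new_dir = os.sep.join("hip" if p == "cuda" else p for p in parts)
        return os.path.join(new_dir, fname)
    # otherwise keep the file next to the original, under a different name
    root, ext = os.path.splitext(fname)
    return os.path.join(dirname, root + "_hip" + ext)

print(hipified_path(os.path.join("csrc", "cuda", "kernel.cu")))  # csrc/hip/kernel.cu
print(hipified_path(os.path.join("csrc", "kernel.cu")))          # csrc/kernel_hip.cu
```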
The list of changes to cpp_extension.py is as follows:
1. Call hipify when building a CUDAExtension for ROCm.
2. Prune the list of source files passed to CUDAExtension so that, when both the original and hipified versions of a source file are in the list, only the hipified version is kept
3. Add subdirectories of /opt/rocm/include to the include path for extensions, so that ROCm headers for subcomponent libraries are found automatically
cc jeffdaily sunway513 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48715
Reviewed By: bdhirsh
Differential Revision: D25272824
Pulled By: ezyang
fbshipit-source-id: 8bba68b27e41ca742781e1c4d7b07c6f985f040e
Summary:
The specific function was removed in Python 3.9, so we should just
re-implement it here and use our own version instead of relying on hidden
functions from the stdlib.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Fixes https://github.com/pytorch/pytorch/issues/48617
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48618
Reviewed By: samestep
Differential Revision: D25230281
Pulled By: seemethere
fbshipit-source-id: 57216af40a4ae4dc8bafcf40d2eb3ba793b9b6e2
Summary:
This PR revamps the hipify module in PyTorch to overcome a long list of shortcomings in the original implementation. However, these improvements are applied only when using hipify to build PyTorch extensions, **not for PyTorch or Caffe2 itself**.
Correspondingly, changes are made to `cpp_extension.py` to match these improvements.
The list of improvements to hipify is as follows:
1. Hipify files in the same directory as the original file, unless there's a "cuda" subdirectory in the original file path, in which case the hipified file will be in the corresponding file path with "hip" subdirectory instead of "cuda".
2. Never hipify the file in-place if changes are introduced due to hipification i.e. always ensure the hipified file either resides in a different folder or has a different filename compared to the original file.
3. Prevent re-hipification of already hipified files. This avoids creation of unnecessary "hip/hip" etc. subdirectories and additional files which have no actual use.
4. Do not write out hipified versions of files if they are identical to the original file. This results in a cleaner output directory, with minimal number of hipified files created.
5. Update header rewrite logic so that it accounts for the previous improvement.
6. Update header rewrite logic so it respects the rules for finding header files depending on whether `""` or `<>` is used.
7. Return a dictionary of mappings of original file paths to hipified file paths from `hipify` function.
8. Introduce a version for hipify module to allow extensions to contain back-compatible code that targets a specific point in PyTorch where the hipify functionality changed.
9. Update `cuda_to_hip_mappings.py` to account for the ROCm component subdirectories inside `/opt/rocm/include`. This also results in cleanup of the `Caffe2_HIP_INCLUDE` path to remove unnecessary additions to the include path.
The list of changes to `cpp_extension.py` is as follows:
1. Call `hipify` when building a CUDAExtension for ROCm.
2. Prune the list of source files passed to CUDAExtension so that, when both the original and hipified versions of a source file are in the list, only the hipified version is kept
3. Add subdirectories of /opt/rocm/include to the include path for extensions, so that ROCm headers for subcomponent libraries are found automatically
cc jeffdaily sunway513 hgaspar lcskrishna ashishfarmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45451
Reviewed By: ezyang
Differential Revision: D24924736
Pulled By: malfet
fbshipit-source-id: 4af42b8ff4f21c3782dedb8719b8f9f86b34bd2d
Summary:
I think these can be safely removed since the min version of supported Python is now 3.6
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47822
Reviewed By: smessmer
Differential Revision: D24954936
Pulled By: ezyang
fbshipit-source-id: 5d4b2aeb78fc97d7ee4abaf5fb2aae21bf765e8b
Summary:
Preserve pybind11 configuration options in `torch._C._PYBIND11_COMPILER_TYPE` and use them when building extensions
Also, use f-strings in `torch.utils.cpp_extension`
"Fixes" https://github.com/pytorch/pytorch/issues/46367
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46415
Reviewed By: VitalyFedyunin
Differential Revision: D24605949
Pulled By: malfet
fbshipit-source-id: 87340f2ed5308266a46ef8f0317316227dab9d4d
Summary:
Plus two minor fixes to `torch/csrc/Module.cpp`:
- Use iterator of type `Py_ssize_t` for array indexing in `THPModule_initNames`
- Fix clang-tidy warning of unneeded defaultGenerator copy by capturing it as `const auto&`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47025
Reviewed By: samestep
Differential Revision: D24605907
Pulled By: malfet
fbshipit-source-id: c276567d320758fa8b6f4bd64ff46d2ea5d40eff
Summary:
Fixes issues when building certain PyTorch extensions where the cpp files do NOT compile if flags such as `__HIP_NO_HALF_CONVERSIONS__` are defined.
cc jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46273
Reviewed By: zou3519
Differential Revision: D24422463
Pulled By: ezyang
fbshipit-source-id: 7a43d1f7d59c95589963532ef3bd3c68cb8262be
Summary:
This is the common behavior when one builds PyTorch (or any other CUDA project) using CMake, so it should be held true for Torch CUDA extensions as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43931
Reviewed By: ezyang, seemethere
Differential Revision: D23441793
Pulled By: malfet
fbshipit-source-id: 1af392107a94840331014fda970ef640dc094ae4
Summary:
Fix typos in torch.utils/_benchmark/README.md
Add empty __init__.py to examples folder to make example invocations from README.md correct
Fixed uniform distribution logic generation when minval and maxval are None
Fixes https://github.com/pytorch/pytorch/issues/42984
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42960
Reviewed By: seemethere
Differential Revision: D23095399
Pulled By: malfet
fbshipit-source-id: 0546ce7299b157d9a1f8634340024b10c4b7e7de
Summary:
Previously we did not link against amdhip64 (roughly equivalent to cudart). Apparently, the recent RTLD_GLOBAL fixes prevent the extensions from finding the symbols needed for launching kernels.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41257
Reviewed By: zou3519
Differential Revision: D22573288
Pulled By: ezyang
fbshipit-source-id: 89f9329b2097df26785e2f67e236d60984d40fdd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40837
As ninja has accurate dependency tracking, if there is nothing to do,
then we will very quickly noop. But this is important for correctness:
if a change was made to a header that is not listed explicitly in
the distutils Extension, then distutils will come to the wrong
conclusion about whether or not recompilation is needed (but Ninja
will work it out.)
This caused https://github.com/pytorch/vision/issues/2367
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D22340930
Pulled By: ezyang
fbshipit-source-id: 481b74f6e2cc78159d2a74d413751cf7cf16f592
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39277
This PR contains initial changes that makes PyTorch build with Ampere GPU, CUDA 11, and cuDNN 8.
TF32 related features will not be included in this PR.
Test Plan: Imported from OSS
Differential Revision: D21832814
Pulled By: malfet
fbshipit-source-id: 37f9c6827e0c26ae3e303580f666584230832d06
Summary:
This PR adds the following changes:
1. It sets the default extension build to use ninja
2. Adds HIPCC flags to the host code compile string for ninja builds. This is needed when host code makes HIP API calls
cc: ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38939
Differential Revision: D21721905
Pulled By: ezyang
fbshipit-source-id: 75206838315a79850ecf86a78391a31ba5ee97cb
Summary:
This pull request adds a check for ROCm environment and skips adding CUDA specific flags for the scenario when a pytorch extension is built on ROCm.
ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38047
Differential Revision: D21470507
Pulled By: ezyang
fbshipit-source-id: 5af2d7235e306c7aa9a5f7fc8760025417383069
Summary:
This pull request enables ahead-of-time compilation of HIPExtensions with ninja by setting the appropriate compilation flags for the ROCm environment. It also enables the unit test for cuda_extensions on ROCm and removes the test for ahead-of-time compilation of extensions with ninja from ROCM_BLACKLIST.
ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37800
Differential Revision: D21408148
Pulled By: soumith
fbshipit-source-id: 146f4ffb3418f3534e6ce86805d3fe9c3eae84e1
Summary:
As described in the issue (https://github.com/pytorch/pytorch/issues/33701), the compiler check
for building cpp extensions does not work with ccache.
In that case we run `compiler -v` to determine which
compiler is actually being used and check that instead.
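Roughly, the idea is the following (a sketch, assuming a gcc/clang-style toolchain that prints its identification for `-v`; not the exact code in `cpp_extension.py`):
```python
import subprocess

def compiler_identification(compiler):
    # with a ccache wrapper, `-v` still surfaces the identification of the
    # compiler that actually does the compilation
    proc = subprocess.run([compiler, "-v"], capture_output=True, text=True)
    return proc.stdout + proc.stderr

info = compiler_identification("c++")
print("gcc version" in info or "clang version" in info)
```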
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37293
Differential Revision: D21256913
Pulled By: ezyang
fbshipit-source-id: 5483a10cc2dbcff98a7f069ea9dbc0c12b6502dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35615
Python 2 has reached end-of-life and is no longer supported by PyTorch.
Now we can clean up a lot of cruft that we put in place to support it.
These changes were all done manually, and I skipped anything that seemed
like it would take more than a few seconds, so I think it makes sense to
review it manually as well (though using side-by-side view and ignoring
whitespace change might be helpful).
Test Plan: CI
Differential Revision: D20842886
Pulled By: dreiss
fbshipit-source-id: 8cad4e87c45895e7ce3938a88e61157a79504aed
Summary:
This enables cpp_extensions.load/load_inline; it works by hipify-ing the CUDA sources.
It also enables the corresponding tests.
CuDNN/MIOpen extensions aren't supported yet; I propose not doing that in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35897
Differential Revision: D20983279
Pulled By: ezyang
fbshipit-source-id: a5d0f5ac592d04488a6a46522c58e2ee0a6fd57c
Summary:
Otherwise, it will print some message when hipcc is not found.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35789
Differential Revision: D20793089
Pulled By: ezyang
fbshipit-source-id: 4b3cb29fb1d74a1931603ee01e669013ccae9685
Summary:
The current config on `master` yields the following errors when built from source on Windows with CMake and Visual Studio 2019.
```
Severity Code Description Project File Line Suppression State
Error LNK2001 unresolved external symbol \?warp_size@cuda@at@YAHXZ\ torch D:\AI\pytorch\build_libtorch\caffe2\LINK 1
Severity Code Description Project File Line Suppression State
Error LNK1120 1 unresolved externals torch D:\AI\pytorch\build_libtorch\bin\Release\torch.dll 1
Severity Code Description Project File Line Suppression State
Error LNK2001 unresolved external symbol \?warp_size@cuda@at@YAHXZ\ caffe2_observers D:\AI\pytorch\build_libtorch\modules\observers\LINK 1
Severity Code Description Project File Line Suppression State
Error LNK1120 1 unresolved externals caffe2_observers D:\AI\pytorch\build_libtorch\bin\Release\caffe2_observers.dll 1
Severity Code Description Project File Line Suppression State
Error LNK2001 unresolved external symbol \?warp_size@cuda@at@YAHXZ\ caffe2_detectron_ops_gpu D:\AI\pytorch\build_libtorch\modules\detectron\LINK 1
Severity Code Description Project File Line Suppression State
Error LNK1120 1 unresolved externals caffe2_detectron_ops_gpu D:\AI\pytorch\build_libtorch\bin\Release\caffe2_detectron_ops_gpu.dll 1
```
This change at least fixes the above errors in that specific setting. Do you think it makes sense to get this merged or will it break other settings?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35659
Differential Revision: D20735907
Pulled By: ezyang
fbshipit-source-id: eb8fa1e69aaaa5af2da3a76963ddc910bb716479
Summary:
Otherwise, VC++ will warn about every exposed C++ symbol, for example:
```
include\c10/core/impl/LocalDispatchKeySet.h(53): warning C4251: 'c10::impl::LocalDispatchKeySet::included_': class 'c10::DispatchKeySet' needs to have dll-interface to be used by clients of struct 'c10::impl::LocalDispatchKeySet'
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35272
Test Plan: CI
Differential Revision: D20623005
Pulled By: malfet
fbshipit-source-id: b635b674159bb9654e4e1a1af4394c4f36fe35bd
Summary:
This pull request has changes for:
1. Enabling a torch module with HIP code to be compiled by cpp_extensions.py
2. Fixes for hipify module to be able to be used by a torch extension
cc: ezyang iotamudelta jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32669
Differential Revision: D20033893
Pulled By: zou3519
fbshipit-source-id: fd6ddc8cdcd3930f41008636bb2bc9dd26cdb008
Summary:
Closes https://github.com/pytorch/pytorch/issues/30027
The idea here is that you can bind a function with `pybind11` in a single line and without modifying the function:
```cpp
m.def("foo", foo, py::call_guard<torch::PyWarningHandler>());
```
Where warnings are handled by the [`call_guard`](https://pybind11.readthedocs.io/en/stable/advanced/functions.html#call-guard) and exceptions are handled by the `pybind11` exception translator. To do this, I have added support for handling C++ exceptions in `torch::PyWarningHandler`'s destructor without setting the python error state before hand.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30588
Differential Revision: D19905626
Pulled By: albanD
fbshipit-source-id: 90c0a5e298b123cc0c8ab9c52c91be4e96ea47c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33346
Fixes #33091
This PR lets users control the number of workers that cpp extensions
uses through the environment variable `MAX_JOBS`. If the environment
variable is a non-negative integer we use that many threads; otherwise,
ninja falls back to the default.
I chose to use the name `MAX_JOBS` because we use it in PyTorch already
to control the number of workers PyTorch builds with. There is a risk
that users of cpp extensions already have `MAX_JOBS` set but we are
hoping that that risk is small and/or it means semantically the same
thing.
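A minimal sketch of the behavior (hypothetical helper, not the exact code in `cpp_extension.py`):
```python
import os

def ninja_build_command(build_dir):
    command = ["ninja", "-v"]
    max_jobs = os.environ.get("MAX_JOBS")
    if max_jobs is not None and max_jobs.isdigit():
        # MAX_JOBS is a non-negative integer: cap ninja's parallelism explicitly
        command += ["-j", max_jobs]
    # otherwise let ninja pick its default number of jobs
    return command + ["-C", build_dir]

print(ninja_build_command("/tmp/my_ext_build"))  # path is a placeholder
```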
Test Plan: - tested locally
Differential Revision: D19911645
Pulled By: zou3519
fbshipit-source-id: d20ed42de4f845499ed38f1a1c73e9ccb620f780
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32495
Background
------------------------------
Previously, ninja was used to compile+link inline cpp_extensions and
ahead-of-time cpp_extensions were compiled with distutils. This PR adds
the ability to compile (but not link) ahead-of-time cpp_extensions with ninja.
The main motivation for this is to speed up cpp_extension builds: distutils
does not make use of parallelism. With this PR, using the new option, on my machine,
- torchvision compilation goes from 3m43s to 49s
- nestedtensor compilation goes from 2m0s to 28s.
User-facing changes
------------------------------
I added a `use_ninja` flag to BuildExtension. This defaults to
`True`. When `use_ninja` is True:
- it will attempt to use ninja.
- If we cannot use ninja, then this throws a warning and falls back to
distutils.
- Situations we cannot use ninja: Windows (NYI, I'll open a new issue
for this), if ninja cannot be found on the system.
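For illustration, opting out of ninja in a `setup.py` might look like this (a sketch; the extension name and sources are placeholders):
```python
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name="my_ext",
    ext_modules=[CppExtension("my_ext", ["my_ext.cpp"])],
    # use_ninja defaults to True; passing False forces the distutils path
    cmdclass={"build_ext": BuildExtension.with_options(use_ninja=False)},
)
```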
Implementation Details
------------------------------
This PR makes this change in two steps. Please let me know if it would be
easier to review this if I split this up into a stacked diff.
Those changes are:
1) refactor _write_ninja_file to separate the policy (what compiler flags
to pass) from the mechanism (how to write the ninja file and do compilation).
2) call _write_ninja_file and _run_ninja_build while building
ahead-of-time cpp_extensions. These are only used to compile objects;
distutils still handles the linking.
Change 1: refactor _write_ninja_file to separate policy from mechanism
- I split _write_ninja_file into: _write_ninja_file and
_write_ninja_file_to_build_library
- I renamed _build_extension_module to _run_ninja_build
Change 2: Call _write_ninja_file while building ahead-of-time
cpp_extensions
- _write_ninja_file_and_compile_objects calls _write_ninja_file to only
build object files.
- We monkey-patch distutils.CCompiler.compile to call
_write_ninja_file_and_compile_objects
- distutils still handles the linking step. The linking step is not a
bottleneck so it was not a concern.
- This change only works on unix-based systems. Our code for windows
goes down a different codepath and I did not want to mess with that.
- If a system does not support ninja, we raise a warning and fall back
to the original compilation path.
Test Plan
------------------------------
Adhoc testing
- I built torchvision using pytorch master and printed out the build
commands. Next, I used this branch to build torchvision and looked at
the ninja file. I compared the ninja file with the build commands and
asserted that they were functionally the same.
- I repeated the above for pytorch/nestedtensor.
PyTorch test suite
- I split `test_cpp_extensions` into `test_cpp_extensions_aot` and
`test_cpp_extensions_jit`. The AOT (ahead-of-time) version tests
ahead-of-time and the JIT version tests just-in-time (not to be confused
with TorchScript)
- `test_cpp_extensions_aot` gets run TWICE by run_test.py, once with
a module that was built with ninja, and once with a module that was
built without ninja.
- run_test.py asserts that when we are building with use_ninja=True,
ninja is actually available on the system.
Test Plan: Imported from OSS
Differential Revision: D19730432
Pulled By: zou3519
fbshipit-source-id: 819590d01cf65e8da5a1e8019b8b3084792fee90
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31161
Previously, it wasn't necessary to specify `DT_NEEDED` in C++ extensions on Linux (aka pass `-l` flags) because all of the symbols would have already been loaded with `RTLD_GLOBAL`, so there wouldn't be any undefined symbols. But when we switch to loading `_C` with `RTLD_LOCAL`, it's now necessary for all the C++ extensions to know what libraries to link with. The resulting code is clearer and more uniform, so it's a win all around.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19262578
Pulled By: ezyang
fbshipit-source-id: a893cc96f2e9aad1c064a6de4f7ccf79257dec3f
Summary:
The original `check-and-act` style can raise `FileExistsError` when multiple processes are jit-compiling the extension on the same node.
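The safer pattern is roughly the following (a sketch, not the exact code in `cpp_extension.py`):
```python
import os

def ensure_build_dir(path):
    try:
        os.makedirs(path)
    except FileExistsError:
        # another process jit-compiling the same extension created it first;
        # that's fine, the directory exists either way
        pass
```
Equivalently, `os.makedirs(path, exist_ok=True)` avoids the race altogether.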
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30956
Differential Revision: D19262570
Pulled By: ezyang
fbshipit-source-id: bb18c72e42648770b47f9378ac7c3929c3c03efc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31047
Changelist:
- remove BUILD_NAMEDTENSOR from .cu files
- remove BUILD_NAMEDTENSOR special handling in function_wrapper.py
- remove BUILD_NAMEDTENSOR from cpp_extension.py. This code actually
did nothing because we always compile with BUILD_NAMEDTENSOR.
Test Plan: - run tests
Differential Revision: D18908442
Pulled By: zou3519
fbshipit-source-id: b239e24de58580adaf3cef573350773a38b1e4f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30315
The new structure is that libtorch_cpu contains the bulk of our
code, and libtorch depends on libtorch_cpu and libtorch_cuda.
This is a reland of https://github.com/pytorch/pytorch/pull/29731 but
I've extracted all of the prep work into separate PRs which can be
landed before this one.
Some things of note:
* torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library)
* The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774
* In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This led to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/libprotobuf.a(arena.cc.o) is referenced by DSO"
* A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly
* I had to make torch_cpu/torch_cuda use caffe2_interface_library so that they get whole-archive linked into torch when you statically link. And I had to do this in an *exported* fashion because torch needs to depend on torch_cpu_library. In the end I exported everything and removed the redefinition in Caffe2Config.cmake. I am not too sure why the old code did it this way in the first place, but it doesn't seem to have broken anything to switch it.
* There's some use of `__HIP_PLATFORM_HCC__` still in `torch_cpu` code, so I had to apply it to that library too (UGH). This manifests as a failure when trying to run the CUDA fuser. It doesn't really matter substantively right now because we still in-place HIPify, but it would be good to fix eventually. This was a bit difficult to debug because of an unrelated HIP bug, see https://github.com/ROCm-Developer-Tools/HIP/issues/1706
Fixes #27215 (as our libraries are smaller), and executes on part of the plan in #29235.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18790941
Pulled By: ezyang
fbshipit-source-id: 01296f6089d3de5e8365251b490c51e694f2d6c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29731
The new structure is that libtorch_cpu contains the bulk of our
code, and libtorch depends on libtorch_cpu and libtorch_cuda.
Some subtleties about the patch:
- There were a few functions that crossed CPU-CUDA boundary without API macros. I just added them, easy enough. An inverse situation was aten/src/THC/THCTensorRandom.cu where we weren't supposed to put API macros directly in a cpp file.
- DispatchStub wasn't getting all of its symbols related to static members on DispatchStub exported properly. I tried a few fixes but in the end I just moved everyone off using DispatchStub to dispatch CUDA/HIP (so they just use normal dispatch for those cases.) Additionally, there were some mistakes where people incorrectly were failing to actually import the declaration of the dispatch stub, so added includes for those cases.
- torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library)
- The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774
- In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This led to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/libprotobuf.a(arena.cc.o) is referenced by DSO"
- A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly. This situation also happens with custom C++ extensions.
- There's a ROCm compiler bug where extern "C" on functions is not respected. There's a little workaround to handle this.
- Because I was too lazy to check if HIPify was converting TORCH_CUDA_API into TORCH_HIP_API, I just made it so HIP build also triggers the TORCH_CUDA_API macro. Eventually, we should translate and keep the nature of TORCH_CUDA_API constant in all cases.
Fixes #27215 (as our libraries are smaller), and executes on
part of the plan in #29235.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18632773
Pulled By: ezyang
fbshipit-source-id: ea717c81e0d7554ede1dc404108603455a81da82
Summary:
This is a follow-up to gh-23408. No longer supported are any arches < 3.5 (numbers + 'Fermi' and 'Kepler+Tegra').
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24442
Differential Revision: D16889283
Pulled By: ezyang
fbshipit-source-id: 3c0c35d51b7ac7642d1be7ab4b0f260ac93b60c9
Summary:
The old behavior was to always use `sm_30`. The new behavior is:
- For building via a setup.py, check if `'arch'` is in `extra_compile_args`. If so, don't change anything.
- If `TORCH_CUDA_ARCH_LIST` is set, respect that (can be 1 or more arches)
- Otherwise, query device capability and use that.
To test this, for example on a machine with `torch` installed for py37:
```
$ git clone https://github.com/pytorch/extension-cpp.git
$ cd extension-cpp/cuda
$ python setup.py install
$ cuobjdump --list-elf build/lib.linux-x86_64-3.7/lltm_cuda.cpython-37m-x86_64-linux-gnu.so
ELF file 1: lltm.1.sm_61.cubin
```
Existing tests in `test_cpp_extension.py` for `load_inline` and for compiling via `setup.py` in test/cpp_extensions/ cover this.
Closes gh-18657
EDIT: some more tests:
```
from torch.utils.cpp_extension import load
lltm = load(name='lltm', sources=['lltm_cuda.cpp', 'lltm_cuda_kernel.cu'])
```
```
# with TORCH_CUDA_ARCH_LIST undefined or an empty string
$ cuobjdump --list-elf /tmp/torch_extensions/lltm/lltm.so
ELF file 1: lltm.1.sm_61.cubin
# with TORCH_CUDA_ARCH_LIST = "3.5 5.2 6.0 6.1 7.0+PTX"
$ cuobjdump --list-elf build/lib.linux-x86_64-3.7/lltm_cuda.cpython-37m-x86_64-linux-gnu.so
ELF file 1: lltm_cuda.cpython-37m-x86_64-linux-gnu.1.sm_35.cubin
ELF file 2: lltm_cuda.cpython-37m-x86_64-linux-gnu.2.sm_52.cubin
ELF file 3: lltm_cuda.cpython-37m-x86_64-linux-gnu.3.sm_60.cubin
ELF file 4: lltm_cuda.cpython-37m-x86_64-linux-gnu.4.sm_61.cubin
ELF file 5: lltm_cuda.cpython-37m-x86_64-linux-gnu.5.sm_70.cubin
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23408
Differential Revision: D16784110
Pulled By: soumith
fbshipit-source-id: 69ba09e235e4f906b959fd20322c69303240ee7e
Summary:
Closes gh-16955.
Closes https://github.com/pytorch/vision/issues/977
On Linux both `lib64` and `lib` may be present (symlinked). The reports
seem to all be about macOS, but it seems like this is also possibly more
robust on Linux and can't hurt. So not treating platforms differently.
Note that Eigen has a similar check in its CMake:
```
if(CUDA_64_BIT_DEVICE_CODE AND (EXISTS "${CUDA_TOOLKIT_ROOT_DIR}/lib64"))
link_directories("${CUDA_TOOLKIT_ROOT_DIR}/lib64")
else()
link_directories("${CUDA_TOOLKIT_ROOT_DIR}/lib")
endif()
```
There may be other issues for building from source on macOS, can't test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23491
Differential Revision: D16538973
Pulled By: soumith
fbshipit-source-id: cc309347b7d16e718e06878d3824d0a6e40b1019
Summary:
(1) Add `COMMON_MSVC_FLAGS` to the flags in the ninja codepath
(2) Add `/EHsc` to `COMMON_MSVC_FLAGS`
(3) Remove `-fPIC` and `-std=c++11` from the flags in the windows codepath
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23472
Differential Revision: D16532993
Pulled By: soumith
fbshipit-source-id: bc2d983f5f8b4eae9c7385bf170f155679e92e87
Summary:
The conda compilers are gcc/c++ 7.3.0, but have custom version strings
for clarity:
x86_64-conda_cos6-linux-gnu-cc
x86_64-conda_cos6-linux-gnu-c++
Using these compilers to build a C++ or CUDA extension now gives this warning (unnecessarily):
```
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (/home/rgommers/anaconda3/envs/pytorch-nightly/bin/x86_64-conda_cos6-linux-gnu-c++) is not compatible with the compiler Pytorch was
built with for this platform, which is g++ on linux.
...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23396
Differential Revision: D16500637
Pulled By: soumith
fbshipit-source-id: 5b2fc3593e22e9a7d07dc2c0456dbb4934ffddb2
Summary:
Align the behavior of `torch.utils.cpp_extension.CUDA_HOME` with that of `tools.setup_helpers.cuda.CUDA_HOME`.
Specifically, I swapped the positions of guess 2 and guess 3 in `torch.utils.cpp_extension.CUDA_HOME`.
Fixes https://github.com/pytorch/pytorch/issues/22844
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22845
Differential Revision: D16276241
Pulled By: zou3519
fbshipit-source-id: 3b62b439b2f794a6f3637a5fee58991f430985fe
Summary:
Currently the build system accepts USE_NAMEDTENSOR from the environment
variable and turns it into NAMEDTENSOR_ENABLED when passing to CMake.
This discrepancy does not seem necessary and complicates the build
system. The naming of this build option is also semantically incorrect
("BUILD_" vis-a-vis "USE_"). This commit eradicates this issue before it
is made into a stable release.
The support of NO_NAMEDTENSOR is also removed, since PyTorch has been
quite inconsistent about "NO_*" build options.
---
Note: All environment variables with their names starting with `BUILD_` are currently automatically passed to CMake with no need of an additional wrapper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22360
Differential Revision: D16074509
Pulled By: zou3519
fbshipit-source-id: dc316287e26192118f3c99b945454bc50535b2ae
Summary:
This renames the CMake `caffe2` target to `torch`, as well as renaming `caffe2_gpu` to `torch_gpu` (and likewise for other gpu target variants). Many intermediate variables that don't manifest as artifacts of the build remain for now with the "caffe2" name; a complete purge of `caffe2` from CMake variable names is beyond the scope of this PR.
The shell `libtorch` library that had been introduced as a stopgap in https://github.com/pytorch/pytorch/issues/17783 is again flattened in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20774
Differential Revision: D15769965
Pulled By: kostmo
fbshipit-source-id: b86e8c410099f90be0468e30176207d3ad40c821
Summary:
This PR is an intermediate step toward the ultimate goal of eliminating "caffe2" in favor of "torch". This PR moves all of the files that had constituted "libtorch.so" into the "libcaffe2.so" library, and wraps "libcaffe2.so" with a shell library named "libtorch.so". This means that, for now, `caffe2/CMakeLists.txt` becomes a lot bigger, and `torch/CMakeLists.txt` becomes smaller.
The torch Python bindings (`torch_python.so`) still remain in `torch/CMakeLists.txt`.
The follow-up to this PR will rename references to `caffe2` to `torch`, and flatten the shell into one library.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17783
Differential Revision: D15284178
Pulled By: kostmo
fbshipit-source-id: a08387d735ae20652527ced4e69fd75b8ff88b05
Summary:
Previously, when a user built PyTorch from source, but set the version string manually to be binary-formatted, it would've simply used CXX11_ABI=0 incorrectly.
We have this information available at runtime with `torch._C._GLIBCXX_USE_CXX11_ABI`, so this PR improves the situation by simply using that information.
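In other words, the extension build can derive the ABI flag from the running libtorch rather than from the version string; roughly:
```python
import torch

# True/False depending on how the installed libtorch was compiled
abi = int(torch._C._GLIBCXX_USE_CXX11_ABI)
cxx_flag = f"-D_GLIBCXX_USE_CXX11_ABI={abi}"
print(cxx_flag)  # e.g. -D_GLIBCXX_USE_CXX11_ABI=1 for a cxx11-ABI build
```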
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18994
Differential Revision: D14839393
Pulled By: soumith
fbshipit-source-id: ca92e0810b29ffe688be82326e02a64a5649a3ad
Summary:
Hi. It seems that when building C++ extensions with CUDA for Windows, the `extra_cuda_cflags` options are not properly forwarded to `nvcc`.
Use of extra CUDA options is necessary to build, for instance, InplaceABN (https://github.com/mapillary/inplace_abn), which requires the `--expt-extended-lambda` option.
This PR adds one line that correctly appends `extra_cuda_cflags`.
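For reference, passing such flags to a JIT-built extension looks roughly like this (a sketch; the extension name and source files are placeholders):
```python
from torch.utils.cpp_extension import load

inplace_abn = load(
    name="inplace_abn",
    sources=["inplace_abn.cpp", "inplace_abn_kernel.cu"],
    # forwarded to nvcc; needed here because the kernels use extended lambdas
    extra_cuda_cflags=["--expt-extended-lambda"],
)
```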
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18638
Differential Revision: D14704270
Pulled By: ezyang
fbshipit-source-id: e1e330d193d9afd5707a5437a74c0499460d2b90
Summary:
...because gcc will have failures with very strange error messages
if you do.
This affects people with Debian/Ubuntu-provided NVCC, the PR should
not change anything for anyone else.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18127
Differential Revision: D14504386
Pulled By: soumith
fbshipit-source-id: 1aea168723cdc71cdcfffb3193ee116108ae755e
Summary:
Rehash of previous attempts. This tries a different approach where we accept the install as specified in cmake (leaving bin/ include/ and lib/ alone), and then try to adjust the rest of the files to this more standard layout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16414
Differential Revision: D13863635
Pulled By: zdevito
fbshipit-source-id: 23725f5c64d7509bf3ca8f472dcdcad074de9828
Summary:
I fixed a very small extra parenthesis in a doctest.
I'm also going to use this issue as a place to propose the eventual inclusion of xdoctest (a pip installable library I wrote) in pytorch's test suite. I think there are a lot of problems with Python's built in doctest module, and I've built xdoctest to fix them. I would love for my project to get some exposure and its addition to PyTorch may benefit both projects. Please see the readme for more details on what xdoctest brings to the table over the builtin doctest module: https://github.com/Erotemic/xdoctest
I came across this small syntax error when working on ensuring xdoctest was compatible with pytorch. It isn't 100% there yet, but I'm working on it. My goal is to ensure that xdoctest is 100% compatible with all of torch's doctest out-of-the-box before writing up the PR. I'm also airing the idea out-loud before I commit too much time into this (or get my hopes up), so I'm attaching this little blurb to a no-brainer-merge PR to (1) demonstrate a little bit of value (because xdoctest flagged this syntax error) and (2) see how its received.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15646
Differential Revision: D13606111
Pulled By: soumith
fbshipit-source-id: d4492801a38ee0ae64ea0326a83239cee4d811a4
Summary:
This PR enables C++ frontend modules to be bound into Python and added as submodules of Python modules. For this, I added lots of pybind11 bindings for the `torch::nn::Module` class, and modified the `torch.nn.Module` class in Python to have a new Metaclass that makes `isinstance(m, torch.nn.Module)` return true when `m` is a C++ frontend module. The methods and fields of C++ modules are bound in such a way that they work seamlessly as submodules of Python modules for most operations (one exception I know of: calling `.to()` ends up calling `.apply()` on each submodule with a Python lambda, which cannot be used in C++ -- this may require small changes on Python side).
I've added quite a bunch of tests to verify the bindings and equality with Python. I think I should also try out adding a C++ module as part of some large PyTorch module, like a WLM or something, and see if everything works smoothly.
The next step for inter-op across our system is ScriptModule <-> C++ Frontend Module inter-op. I think this will then also allow using C++ frontend modules from TorchScript.
apaszke zdevito
CC dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13481
Differential Revision: D12981996
Pulled By: goldsborough
fbshipit-source-id: 147370d3596ebb0e94c82cec92993a148fee50a7
Summary:
When using `setuptools` to build a Python extension, setuptools will automatically add an ABI suffix like `cpython-37m-x86_64-linux-gnu` to the shared library name when using Python 3. This is required for extensions meant to be imported as Python modules. When we use setuptools to build shared libraries not meant as Python modules, for example libraries that define and register TorchScript custom ops, having your library called `my_ops.cpython-37m-x86_64-linux-gnu.so` is a bit annoying compared to just `my_ops.so`, especially since you have to reference the library name when loading it with `torch.ops.load_library` in Python.
This PR fixes this by adding a `with_options` class method to the `torch.utils.cpp_extension.BuildExtension` which allows configuring the `BuildExtension`. In this case, the first option we add is `no_python_abi_suffix`, which we then use in `get_ext_filename` (override from `setuptools.build_ext`) to throw away the ABI suffix.
I've added a test `setup.py` in a `no_python_abi_suffix_test` folder.
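A minimal `setup.py` using the new option might look like this (a sketch; the names are placeholders):
```python
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name="my_ops",
    ext_modules=[CppExtension("my_ops", ["my_ops.cpp"])],
    # drop the cpython-*-x86_64-linux-gnu ABI tag so the result is just my_ops.so
    cmdclass={"build_ext": BuildExtension.with_options(no_python_abi_suffix=True)},
)
```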
Fixes https://github.com/pytorch/pytorch/issues/14188
t-vi fmassa soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14130
Differential Revision: D13216575
Pulled By: goldsborough
fbshipit-source-id: 67dc345c1278a1a4ee4ca907d848bc1fb4956cfa
Summary:
For custom TorchScript operators, `torch.ops.load_library` must be used and passed the path to the shared library containing the custom ops. Our C++ extensions stuff generally is meant to build a Python module and import it. This PR changes `torch.utils.cpp_extension.load` to have an option to just return the shared library path instead of importing it as a Python module, so you can then pass it to `torch.ops.load_library`. This means folks can re-use `torch.utils.cpp_extension.load` and `torch.utils.cpp_extension.load_inline` to even write their custom ops inline. I think t-vi and fmassa will appreciate this.
soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13941
Differential Revision: D13110592
Pulled By: goldsborough
fbshipit-source-id: 37756307dbf80a81d2ed550e67c8743dca01dc20
Summary:
This is the next minimal step towards moving _C into cmake. For now,
leave _C in setup.py, but reduce it to an empty stub file. All of its
sources are now part of the new torch-python cmake target.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12742
Reviewed By: soumith
Differential Revision: D13089691
Pulled By: anderspapitto
fbshipit-source-id: 1c746fda33cfebb26e02a7f0781fefa8b0d86385
Summary:
We weren't running C++ extensions tests in CI.
Also, let's error hard when `ninja` is not available instead of skipping C++ extensions tests.
Fixes https://github.com/pytorch/pytorch/issues/13622
ezyang soumith yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13646
Differential Revision: D12961468
Pulled By: goldsborough
fbshipit-source-id: 917c8a14063dc40e6ab79a0f7d345ae2d3566ba4
Summary:
In TorchScript and C++ extensions we currently advocate a mix of `torch::` and `at::` namespace usage. In the C++ frontend I had instead exported all symbols from `at::` and some from `c10::` into the `torch::` namespace. This is far, far easier for users to understand, and also avoid bugs around creating tensors vs. variables. The same should from now on be true for the TorchScript C++ API (for running and loading models) and all C++ extensions.
Note that since we're just talking about typedefs, this change does not break any existing code.
Once this lands I will update stuff in `pytorch/tutorials` too.
zdevito ezyang gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13523
Differential Revision: D12942787
Pulled By: goldsborough
fbshipit-source-id: 76058936bd8707b33d9e5bbc2d0705fc3d820763
Summary:
I found a bug when compiling CUDA files while installing the maskrcnn-benchmark library.
`python setup.py build develop` throws the error:
```
File "/usr/local/lib/python2.7/dist-packages/torch/utils/cpp_extension.py", line 214, in unix_wrap_compile
original_compile(obj, src, ext, cc_args, cflags, pp_opts)
File "/usr/lib/python2.7/distutils/unixccompiler.py", line 125, in _compile
self.spawn(compiler_so + cc_args + [src, '-o', obj] +
TypeError: coercing to Unicode: need string or buffer, list found
```
For more information, please see [issue](https://github.com/facebookresearch/maskrcnn-benchmark/issues/99).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13509
Differential Revision: D12902675
Pulled By: soumith
fbshipit-source-id: b9149f5de21ae29f94670cb2bbc93fa368f4e0f7
Summary:
This PR makes the ABI compatibility check for C++ extensions more robust by resolving the real path of the compiler binary, such that e.g. `"c++"` is resolved to the path of g++. This more robust than assuming that `c++ --version` will contain the word "gcc".
CC jcjohnson
Closes #10114
soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13092
Differential Revision: D12810448
Pulled By: goldsborough
fbshipit-source-id: 6ac460e24496c0d8933b410401702363870b7568
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13141
This is an example diff to show what lint rules are being applied.
Reviewed By: mingzhe09088
Differential Revision: D10858478
fbshipit-source-id: cbeb013f10f755b0095478adf79366e7cf7836ff
Summary:
There are still a few work to be done:
- Move logging and unify AT_WARN with LOG(ERROR).
- A few header files are still being plumbed through, need cleaning.
- caffe2::EnforceNotMet aliasing is not done yet.
- need to unify the macros. See c10/util/Exception.h
This is mainly a codemod and not causing functional changes. If you find your job failing and trace back to this diff, usually it can be fixed by the following approaches:
(1) add //caffe2/c10:c10 to your dependency (or transitive dependency).
(2) change objects such as at::Error, at::Optional to the c10 namespace.
(3) change functions to the c10 namespace. Especially, caffe2::MakeString is not overridden by the unified c10::str function. Nothing else changes.
Please kindly consider not reverting this diff - it involves multiple rounds of rebasing and the fix is usually simple. Contact jiayq@ or AI Platform Dev for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12354
Reviewed By: orionr
Differential Revision: D10238910
Pulled By: Yangqing
fbshipit-source-id: 7794d5bf2797ab0ca6ebaccaa2f7ebbd50ff8f32
Summary:
Currently the C++ API and C++ extensions are effectively two different, entirely orthogonal code paths. This PR unifies the C++ API with the C++ extension API by adding an element of Python binding support to the C++ API. This means the `torch/torch.h` included by C++ extensions, which currently routes to `torch/csrc/torch.h`, can now be rerouted to `torch/csrc/api/include/torch/torch.h` -- i.e. the main C++ API header. This header then includes Python binding support conditioned on a define (`TORCH_WITH_PYTHON_BINDINGS`), *which is only passed when building a C++ extension*.
Currently stacked on top of https://github.com/pytorch/pytorch/pull/11498
Why is this useful?
1. One less codepath. In particular, there has been trouble again and again due to the two `torch/torch.h` header files and ambiguity when both ended up in the include path. This is now fixed.
2. I have found that it is quite common to want to bind a C++ API module back into Python. This could be for simple experimentation, or to have your training loop in Python but your models in C++. This PR makes this easier by adding pybind11 support to the C++ API.
3. The C++ extension API simply becomes richer by gaining access to the C++ API headers.
soumith ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11510
Reviewed By: ezyang
Differential Revision: D9998835
Pulled By: goldsborough
fbshipit-source-id: 7a94b44a9d7e0377b7f1cfc99ba2060874d51535
Summary:
Python never closes shared libraries it `dlopen`s. This means that calling `load` or `load_inline` (i.e. building a JIT C++ extension) with the same C++ extension name twice in the same Python process will never re-load the library, even if the compiled source code and the underlying shared library have changed. The only way to circumvent this is to create a new library and load it under a new module name.
I fix this, of course, by introducing a layer of indirection. Loading a JIT C++ extension now goes through an `ExtensionVersioner`, which hashes the contents of the source files as well as the build flags, and if this hash changed, bumps an internal version stored for each module name. A bump in the version will result in the ninja file being edited and a new shared library, effectively a new C++ extension, being compiled. For this, `_v<version>` is appended to the extension name for all versions greater than zero.
One caveat is that if you were to update your code many times and always re-load it in the same process, you may end up with quite a lot of shared library objects in your extension's folder under `/tmp`. I imagine this isn't too bad, since extensions are typically small and there isn't really a good way for us to garbage collect old libraries, since we don't know what still has handles to them.
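A rough sketch of that versioning scheme (the hashing details and method names here are illustrative assumptions, not the exact implementation):
```
import hashlib

class ExtensionVersioner:
    def __init__(self):
        self.entries = {}  # extension name -> (source/flags hash, version)

    def bump_version_if_changed(self, name, source_files, build_flags):
        hasher = hashlib.sha256()
        for flag in build_flags:
            hasher.update(flag.encode())
        for path in source_files:
            with open(path, 'rb') as f:
                hasher.update(f.read())
        digest = hasher.hexdigest()

        old_digest, version = self.entries.get(name, (None, 0))
        if old_digest is not None and old_digest != digest:
            version += 1  # changed sources/flags -> rebuild under a new name
        self.entries[name] = (digest, version)
        # Version 0 keeps the plain name; later versions get a "_v<version>" suffix.
        return name if version == 0 else '{}_v{}'.format(name, version)
```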
Fixes https://github.com/pytorch/pytorch/issues/11398
ezyang gchanan soumith fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11725
Differential Revision: D9948244
Pulled By: goldsborough
fbshipit-source-id: 695bbdc1f1597c5e4306a45cd8ba46f15c941383
Summary:
Two improvements to C++ extensions:
1. In verbose mode, show the ninja build output (the exact compile commands, very useful)
2. When raising an error, don't show the `CalledProcessError` that shows ninja failing, only show the `RuntimeError` with the captured stdout
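A minimal sketch of what these two behaviours amount to, assuming ninja is driven through `subprocess` (helper name and messages are illustrative):
```
import subprocess

def build_with_ninja(build_dir, verbose=False):
    try:
        # Capture ninja's output; with -v it contains the exact compile commands.
        output = subprocess.check_output(
            ['ninja', '-v'], cwd=build_dir,
            stderr=subprocess.STDOUT, universal_newlines=True)
    except subprocess.CalledProcessError as e:
        # Surface only a RuntimeError carrying the captured output;
        # "from None" drops the chained CalledProcessError traceback.
        raise RuntimeError('Error building extension:\n' + (e.output or '')) from None
    if verbose:
        print(output)
```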
soumith fmassa ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11724
Differential Revision: D9922459
Pulled By: goldsborough
fbshipit-source-id: 5b319bf24348eabfe5f4c55d6d8e799b9abe523a
Summary:
A couple fixes I deem necessary to the TorchScript C++ API after writing the tutorial:
1. When I was creating the custom op API, I created `torch/op.h` as the one-stop header for creating custom ops. I now notice that there is no good header for the TorchScript C++ story altogether, i.e. when you just want to load a script module in C++ without any custom ops necessarily. The `torch/op.h` header suits that purpose just as well of course, but I think we should rename it to `torch/script.h`, which seems like a great name for this feature.
2. The current API for the CMake we provided was that we defined a bunch of variables like `TORCH_LIBRARY_DIRS` and `TORCH_INCLUDES` and then expected users to add those variables to their targets. We also had a CMake function that did that for you automatically. I now realized a much smarter way of doing this is to create an `IMPORTED` target for the libtorch library in CMake, and then add all this stuff to the link interface of that target. Then all downstream users have to do is `target_link_libraries(my_target torch)` and they get all the proper includes, libraries and compiler flags added to their target. This means we can get rid of the CMake function and all that stuff. orionr AFAIK this is a much, much better way of doing all of this, no?
3. Since we distribute libtorch with `-D_GLIBCXX_USE_CXX11_ABI=0`, dependent libraries must set this flag too. I now add this to the interface compile options of this imported target.
4. Fixes to JIT docs.
These could likely be 4 different PRs but given the release I wouldn't mind landing them all asap.
zdevito dzhulgakov soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11682
Differential Revision: D9839431
Pulled By: goldsborough
fbshipit-source-id: fdc47b95f83f22d53e1995aa683e09613b4bfe65
Summary:
I noticed warnings from within pybind11 being shown when building C++ extensions. This can be avoided by including non-user-supplied headers with `-isystem` instead of `-I`.
I hope this works on Windows.
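Roughly, the flag construction becomes something like this (a sketch; the real directory handling differs):
```
def include_flags(user_dirs, system_dirs):
    # User-supplied headers keep -I; bundled headers (pybind11, torch, ...)
    # use -isystem so warnings originating inside them are suppressed.
    return ['-I' + d for d in user_dirs] + ['-isystem' + d for d in system_dirs]
```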
soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11459
Differential Revision: D9764444
Pulled By: goldsborough
fbshipit-source-id: b288572106078f347f0342f158f9e2b63a58c235
Summary:
Currently we assume the cuDNN includes and libraries can be found under the `CUDA_HOME` root. But this is not always true. So we now support a `CUDNN_HOME`/`CUDNN_PATH` environment variable that can have its own `/include` and `/lib64` folders.
This means cudnn extensions now also get support on the FAIR cluster.
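A minimal sketch of the lookup order described above (helper name is illustrative):
```
import os

def cudnn_dirs(cuda_home):
    # Prefer an explicit CUDNN_HOME / CUDNN_PATH; fall back to the CUDA root.
    cudnn_home = (os.environ.get('CUDNN_HOME')
                  or os.environ.get('CUDNN_PATH')
                  or cuda_home)
    return os.path.join(cudnn_home, 'include'), os.path.join(cudnn_home, 'lib64')
```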
soumith fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10922
Differential Revision: D9526856
Pulled By: goldsborough
fbshipit-source-id: 5c64a5ff7cd428eb736381c24736006b21f8b6db
Summary:
Prior to this diff, there have been two ways of compiling the bulk of the torch codebase. There was no interaction between them - you had to pick one or the other.
1) with setup.py. This method
- used the setuptools C extension functionality
- worked on all platforms
- did not build test_jit/test_api binaries
- did not include the C++ api
- always included python functionality
- produced _C.so
2) with cpp_build. This method
- used CMake
- did not support Windows or ROCM
- was capable of building the test binaries
- included the C++ api
- did not build the python functionality
- produced libtorch.so
This diff combines the two.
1) cpp_build/CMakeLists.txt has become torch/CMakeLists.txt. This build
- is CMake-based
- works on all platforms
- builds the test binaries
- includes the C++ api
- does not include the python functionality
- produces libtorch.so
2) the setup.py build
- compiles the python functionality
- calls into the CMake build to build libtorch.so
- produces _C.so, which has a dependency on libtorch.so
In terms of code changes, this mostly means extending the cmake build to support the full variety of environments and platforms. There are also a small number of changes related to the fact that there are now two shared objects - in particular, windows requires annotating some symbols with dllimport/dllexport, and doesn't allow exposing thread_local globals directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8792
Reviewed By: ezyang
Differential Revision: D8764181
Pulled By: anderspapitto
fbshipit-source-id: abec43834f739049da25f4583a0794b38eb0a94f
Summary:
Any flags linking libraries only take effect on inputs preceding them,
so we have to call `$cxx $in $ldflags -o $out` instead of the other way
around.
This was probably not detected so far since the torch libraries are
already loaded when loading JIT-compiled extensions, so this only has an
effect on third-party libraries.
This also matches our behavior on windows.
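In other words, the link command is assembled with the inputs before the linker flags, roughly like this (a sketch, not the actual ninja rule emission):
```
def link_command(cxx, objects, ldflags, out):
    # Objects must precede -L/-l flags: a library flag only resolves symbols
    # for the inputs listed before it on the command line.
    return [cxx] + objects + ldflags + ['-o', out]

# e.g. link_command('c++', ['my_ext.o'], ['-L/path/to/torch/lib', '-ltorch'], 'my_ext.so')
```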
Closes https://github.com/pytorch/pytorch/pull/9021
Reviewed By: soumith
Differential Revision: D8694049
Pulled By: ezyang
fbshipit-source-id: e35745fc3b89bf39c14f07ce90d6bd18e6a3d7cc
* Have PyTorch depend on minimal libcaffe2.so instead of libATen.so
* Build ATen tests as a part of Caffe2 build
* Hopefully cufft and nvcc fPIC fixes
* Make ATen install components optional
* Add tests back for ATen and fix TH build
* Fixes for test_install.sh script
* Fixes for cpp_build/build_all.sh
* Fixes for aten/tools/run_tests.sh
* Switch ATen cmake calls to USE_CUDA instead of NO_CUDA
* Attempt at fix for aten/tools/run_tests.sh
* Fix typo in last commit
* Fix valgrind call after pushd
* Be forgiving about USE_CUDA disable like PyTorch
* More fixes on the install side
* Link all libcaffe2 during test run
* Make cuDNN optional for ATen right now
* Potential fix for non-CUDA builds
* Use NCCL_ROOT_DIR environment variable
* Pass -fPIC through nvcc to base compiler/linker
* Remove THCUNN.h requirement for libtorch gen
* Add Mac test for -Wmaybe-uninitialized
* Potential Windows and Mac fixes
* Move MSVC target props to shared function
* Disable cpp_build/libtorch tests on Mac
* Disable sleef for Windows builds
* Move protos under BUILD_CAFFE2
* Remove space from linker flags passed with -Wl
* Remove ATen from Caffe2 dep libs since directly included
* Potential Windows fixes
* Preserve options while sleef builds
* Force BUILD_SHARED_LIBS flag for Caffe2 builds
* Set DYLD_LIBRARY_PATH and LD_LIBRARY_PATH for Mac testing
* Pass TORCH_CUDA_ARCH_LIST directly in cuda.cmake
* Fixes for the last two changes
* Potential fix for Mac build failure
* Switch Caffe2 to build_caffe2 dir to not conflict
* Cleanup FindMKL.cmake
* Another attempt at Mac cpp_build fix
* Clear cpp-build directory for Mac builds
* Disable test in Mac build/test to match cmake
* Split libATen.so into libATen_cpu.so and libATen_cuda.so
Previously, ATen could be built with either CPU-only support, or
CPU/CUDA support, but only via a compile-time flag, requiring
two separate builds. This means that if you have a program which
indirectly uses a CPU-only build of ATen, and a CPU/CUDA-build of
ATen, you're gonna have a bad time. And you might want a CPU-only
build of ATen, because it is 15M (versus the 300M of a CUDA build).
This commit splits libATen.so into two libraries, CPU/CUDA, so
that it's not necessary to do a full rebuild to get CPU-only
support; instead, if you link against libATen_cpu.so only, you
are CPU-only; if you additionally link/dlopen libATen_cuda.so,
this enables CUDA support. This brings ATen's dynamic library
structure more similar to Caffe2's. libATen.so is no more
(this is BC BREAKING)
The general principle for how this works is that we introduce
a *hooks* interface, which introduces a dynamic dispatch indirection
between a call site and implementation site of CUDA functionality,
mediated by a static initialization registry. This means that we can continue
to, for example, lazily initialize CUDA from Context (a core, CPU class) without
having a direct dependency on the CUDA bits. Instead, we look up
in the registry if, e.g., CUDA hooks have been loaded (this loading
process happens at static initialization time), and if they
have been we dynamically dispatch to this class. We similarly use
the hooks interface to handle Variable registration.
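As a rough Python analogy of that hooks/registry pattern (the real mechanism is C++ static initialization; all names below are illustrative):
```
# Registry filled in at "static initialization" (import) time.
_hooks_registry = {}

def register_hooks(backend, factory):
    _hooks_registry[backend] = factory

class Context:
    def cuda_hooks(self):
        # The call site only consults the registry; it never references the
        # CUDA implementation directly, so a CPU-only process still works.
        factory = _hooks_registry.get('cuda')
        if factory is None:
            raise RuntimeError('CUDA hooks not registered (CUDA library not loaded)')
        return factory()

# The CUDA library registers its hooks when (and only when) it is loaded, e.g.:
# register_hooks('cuda', CUDAHooks)
```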
We introduce a new invariant: if the backend of a type has not
been initialized (e.g., its library has not been dlopened; for
CUDA, this also includes CUDA initialization), then the Type
pointers in the context registry are NULL. If you access the
registry directly you must maintain this invariant.
There are a few potholes along the way. I document them here:
- Previously, PyTorch maintained a separate registry for variable
types, because no provision for them was made in the Context's
type_registry. Now that we have the hooks mechanism, we can easily
have PyTorch register variables in the main registry. The code
has been refactored accordingly.
- There is a subtle ordering issue between Variable and CUDA.
We permit libATen_cuda.so and PyTorch to be loaded in either
order (in practice, CUDA is always loaded "after" PyTorch, because
it is lazily initialized.) This means that, when CUDA types are
loaded, we must subsequently also initialize their Variable equivalents.
Appropriate hooks were added to VariableHooks to make this possible;
similarly, getVariableHooks() is not referentially transparent, and
will change behavior after Variables are loaded. (This is different
to CUDAHooks, which is "burned in" after you try to initialize CUDA.)
- The cmake is adjusted to separate dependencies into either CPU
or CUDA dependencies. The generator scripts are adjusted to either
generate a file as a CUDA (cuda_file_manager) or CPU file (file_manager).
- I changed all native functions which were CUDA-only (the cudnn functions)
to have dispatches for CUDA only (making it permissible to not specify
all dispatch options.) This uncovered a bug in how we were handling
native functions which dispatch on a Type argument; I introduced a new
self_ty keyword to handle this case. I'm not 100% happy about it
but it fixed my problem.
This also exposed the fact that set_history incompletely handles
heterogenous return tuples combining Tensor and TensorList. I
swapped this codegen to use flatten() (at the possible cost of
a slight perf regression, since we're allocating another vector now
in this code path).
- thc_state is no longer a public member of Context; use getTHCState() instead
- This PR comes with Registry from Caffe2, for handling static initialization.
I needed to make a bunch of fixes to Registry to make it more portable
- No more ##__VA_ARGS__ token pasting; instead, it is mandatory to pass at
least one argument to the var-args. CUDAHooks and VariableHooks pass a nullary
struct CUDAHooksArgs/VariableHooksArgs to solve the problem. We must get rid of
token pasting because it does not work with MSVC.
- It seems MSVC is not willing to generate code for constructors of template
classes at use sites which cross DLL boundaries. So we explicitly instantiate
the class to get around the problem. This involved tweaks to the boilerplate
generating macros, and also required us to shuffle around namespaces a bit,
because you can't specialize a template unless you are in the same namespace as
the template.
- Insertion of AT_API to appropriate places where the registry must be exported
- We have a general problem which is that on recent Ubuntu distributions,
--as-needed is enabled for shared libraries, which is a problem (cc @apaszke, who was
worrying about this in #7160; see also #7160 (comment)). For now, I've hacked
this up in the PR to pass -Wl,--no-as-needed to all of the spots necessary to
make CI work, but a more sustainable solution is to attempt to dlopen
libATen_cuda.so when CUDA functionality is requested.
- The JIT tests somehow manage to try to touch CUDA without loading libATen_cuda.so. So
we pass -Wl,--no-as-needed when linking libATen_cuda.so to _C.so
- There is a very subtle linking issue with LAPACK, which is solved by making sure libATen_cuda.so links against LAPACK. There's a comment in aten/src/ATen/CMakeLists.txt about this as well as a follow-up bug at #7353
- autogradpp used AT_CUDA_ENABLED directly. We've expunged these uses and added
a few more things to CUDAHooks (getNumGPUs)
- Added manualSeedAll to Generator so that we can invoke it polymorphically (it
only does something different for CUDAGenerator)
- There's a new cuda/CUDAConfig.h header for CUDA-only ifdef macros (AT_CUDNN_ENABLED, most prominently)
- CUDAHooks/VariableHooks structs live in at namespace because Registry's
namespace support is not good enough to handle it otherwise (see Registry
changes above)
- There's some modest moving around of native functions in ReduceOps and
UnaryOps to get the CUDA-only function implementations into separate files, so
they are only compiled into libATen_cuda.so. sspaddmm needed a separate CUDA
function due to object linkage boundaries.
- Some direct uses of native functions in CUDA code has to go away, since these
functions are not exported, so you have to go through the dispatcher
(at::native::empty_like to at::empty_like)
- Code in THC/THCS/THCUNN now properly use THC_API macro instead of TH_API
(which matters now that TH and THC are not in the same library)
- Added code debt in torch/_thnn/utils.py and other THNN parsing code to handle
both TH_API and THC_API
- TensorUtils.h is now properly exported with AT_API
- Dead uses of TH_EXPORTS and co expunged; we now use ATen_cpu_exports and
ATen_cuda_exports (new, in ATenCUDAGeneral.h) consistently
- Fix some incorrect type annotations on _cudnn_rnn_backward, where we didn't
declare a type as possibly undefined when we should have. We didn't catch this
previously because optional annotations are not tested on "pass-through" native
ATen ops (which don't have dispatch). Upstream issue at #7316
- There's a new cmake macro aten_compile_options for applying all of our
per-target compile time options. We use this on the cpu and cuda libraries.
- test/test_cpp_extensions.py can be run directly by invoking in Python,
assuming you've set up your PYTHONPATH correctly
- type_from_string does some new funny business to only query for all valid CUDA
types (which causes CUDA initialization) when we see "torch.cuda." in the
requested string
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Last mile libtorch fixes
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* pedantic fix
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Rename autograd namespace to torch and change torch.h into python.h
* Include torch.h instead of python.h in test/cpp/api
* Change some mentions of torch.h to python.h in C++ extensions
* Set paths directly, without find_path
Adds ability to JIT compile C++ extensions from strings
>>> from torch.utils.cpp_extension import load_inline
>>> source = '''
at::Tensor sin_add(at::Tensor x, at::Tensor y) {
  return x.sin() + y.sin();
}
'''
>>> module = load_inline(name='inline_extension', cpp_sources=source, functions='sin_add')
Fixes #7012
* Inline JIT C++ Extensions
* jit_compile_sources -> jit_compile
* Split up test into CUDA and non-CUDA parts
* Documentation fixes
* Implement prologue and epilogue generation
* Remove extra newline
* Only create the CUDA source file when cuda_sources is passed
* Add support for dotted names in CPP Extensions
* Modify tests for cpp extensions
Test that dotted names work
* Py2 fixes
* Make run_test cpp_extensions Win-compatible
* Create FileBaton to synchronize distributed JIT C++ extension builds
* Move FileBaton to its own file
* Autoformat code
* Respect verbose flag in cpp_extension._prepare_ldflags
* Change cpp_extensions.py to make it work on Windows
* Fix linting
* Show python paths
* Debug
* Debug 1
* set PYTHONPATH
* Add ATen into library
* expose essential libs and functions, and copy _C.lib
* Specify dir in header
* Update check_abi for MSVC
* Activate cl environment to compile cpp extensions
* change version string
* Redirect stderr to stdout
* Add monkey patch for windows
* Remove unnecessary self
* Fix various issues
* Append necessary flags
* add /MD flag to cuda
* Install ninja
* Use THP_API instead of THP_CLASS
* Beautify the paths
* Revert "Use THP_API instead of THP_CLASS"
This reverts commit dd7e74c44db48e4c5f85bb8e3c698ff9de71ba2d.
* Use THP_API instead of THP_CLASS(new)
* Also pass torch includes to nvcc build
* Export ATen/cuda headers with install
* Refactor flags common to C++ and CUDA
* Improve tests for C++/CUDA extensions
* Export .cuh files under THC
* Refactor and clean cpp_extension.py slightly
* Include ATen in cuda extension test
* Clarifying comment in cuda_extension.cu
* Replace cuda_extension.cu with cuda_extension_kernel.cu in setup.py
* Copy compile args in C++ extension and add second kernel
* Conditionally add -std=c++11 to cuda_flags
* Also export cuDNN headers
* Add comment about deepcopy
This PR adds support for convenient CUDA integration in our C++ extension mechanism. This mainly involved figuring out how to get setuptools to use nvcc for CUDA files and the regular C++ compiler for C++ files. I've added a mixed C++/CUDA test case which works great.
I've also added a CUDAExtension and CppExtension function that constructs a setuptools.Extension with "usually the right" arguments, which reduces the required boilerplate to write an extension even more. Especially for CUDA, where library_dir (CUDA_HOME/lib64) and libraries (cudart) have to be specified as well.
Next step is to enable this with our "JIT" mechanism.
NOTE: I've had to write a small find_cuda_home function to find the CUDA install directory. This logic is kind of a duplicate of tools/setup_helpers/cuda.py, but that's not available in the shipped PyTorch distribution. The function is also fairly short. Let me know if it's fine to duplicate this logic.
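For reference, a typical `setup.py` using these helpers looks roughly like this (package and file names are hypothetical):
```
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name='my_cuda_ext',
    ext_modules=[
        CUDAExtension('my_cuda_ext', [
            'my_ext.cpp',         # built with the host C++ compiler
            'my_ext_kernel.cu',   # built with nvcc
        ]),
    ],
    cmdclass={'build_ext': BuildExtension},
)
```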
* CUDA support for C++ extensions with setuptools
* Remove printf in CUDA test kernel
* Remove -arch flag in test/cpp_extensions/setup.py
* Put wrap_compile into BuildExtension
* Add guesses for CUDA_HOME directory
* export PATH to CUDA location in test.sh
* On Python2, sys.platform has the linux version number