* Change cpp_extensions.py to make it work on Windows
* Fix linting
* Show python paths
* Debug
* Debug 1
* set PYTHONPATH
* Add ATen into library
* expose essential libs and functions, and copy _C.lib
* Specify dir in header
* Update check_abi for MSVC
* Activate cl environment to compile cpp extensions
* change version string
* Redirect stderr to stdout
* Add monkey patch for windows
* Remove unnecessary self
* Fix various issues
* Append necessary flags
* add /MD flag to cuda
* Install ninja
* Use THP_API instead of THP_CLASS
* Beautify the paths
* Revert "Use THP_API instead of THP_CLASS"
This reverts commit dd7e74c44db48e4c5f85bb8e3c698ff9de71ba2d.
* Use THP_API instead of THP_CLASS(new)
* Also pass torch includes to nvcc build
* Export ATen/cuda headers with install
* Refactor flags common to C++ and CUDA
* Improve tests for C++/CUDA extensions
* Export .cuh files under THC
* Refactor and clean cpp_extension.py slightly
* Include ATen in cuda extension test
* Clarifying comment in cuda_extension.cu
* Replace cuda_extension.cu with cuda_extension_kernel.cu in setup.py
* Copy compile args in C++ extension and add second kernel
* Conditionally add -std=c++11 to cuda_flags
* Also export cuDNN headers
* Add comment about deepcopy
This PR adds support for convenient CUDA integration in our C++ extension mechanism. This mainly involved figuring out how to get setuptools to use nvcc for CUDA files and the regular C++ compiler for C++ files. I've added a mixed C++/CUDA test case which works great.
I've also added a CUDAExtension and CppExtension function that constructs a setuptools.Extension with "usually the right" arguments, which reduces the required boilerplate to write an extension even more. Especially for CUDA, where library_dir (CUDA_HOME/lib64) and libraries (cudart) have to be specified as well.
Next step is to enable this with our "JIT" mechanism.
NOTE: I've had to write a small find_cuda_home function to find the CUDA install directory. This logic is kind of a duplicate of tools/setup_helpers/cuda.py, but that's not available in the shipped PyTorch distribution. The function is also fairly short. Let me know if it's fine to duplicate this logic.
* CUDA support for C++ extensions with setuptools
* Remove printf in CUDA test kernel
* Remove -arch flag in test/cpp_extensions/setup.py
* Put wrap_compile into BuildExtension
* Add guesses for CUDA_HOME directory
* export PATH to CUDA location in test.sh
* On Python2, sys.platform has the linux version number