Commit Graph

25 Commits

Author SHA1 Message Date
Peter Goldsborough
cf9b80720d Dont emit warning for ABI incompatibility when PyTorch was built from source (#7681) 2018-05-19 20:25:52 +01:00
Peter Goldsborough
8f42bb65b3
Be more lenient w.r.t. flag processing in C++ extensions (#7621) 2018-05-16 18:17:18 -04:00
Edward Z. Yang
64834f6fb8
Split libATen.so into libATen_cpu.so and libATen_cuda.so (#7275)
* Split libATen.so into libATen_cpu.so and libATen_cuda.so

Previously, ATen could be built with either CPU-only support, or
CPU/CUDA support, but only via a compile-time flag, requiring
two separate builds.  This means that if you have a program which
indirectly uses a CPU-only build of ATen, and a CPU/CUDA-build of
ATen, you're gonna have a bad time.  And you might want a CPU-only
build of ATen, because it is 15M (versus the 300M of a CUDA build).

This commit splits libATen.so into two libraries, CPU/CUDA, so
that it's not necessary to do a full rebuild to get CPU-only
support; instead, if you link against libATen_cpu.so only, you
are CPU-only; if you additionally link/dlopen libATen_cuda.so,
this enables CUDA support.  This brings ATen's dynamic library
structure more similar to Caffe2's.  libATen.so is no more
(this is BC BREAKING)

The general principle for how this works is that we introduce
a *hooks* interface, which introduces a dynamic dispatch indirection
between a call site and implementation site of CUDA functionality,
mediated by a static initialization registry.  This means that we can continue
to, for example, lazily initialize CUDA from Context (a core, CPU class) without
having a direct dependency on the CUDA bits.  Instead, we look up
in the registry if, e.g., CUDA hooks have been loaded (this loading
process happens at static initialization time), and if they
have been we dynamic dispatch to this class.  We similarly use
the hooks interface to handle Variable registration.

We introduce a new invariant: if the backend of a type has not
been initialized (e.g., it's library has not been dlopened; for
CUDA, this also includes CUDA initialization), then the Type
pointers in the context registry are NULL.  If you access the
registry directly you must maintain this invariant.

There are a few potholes along the way.  I document them here:

- Previously, PyTorch maintained a separate registry for variable
  types, because no provision for them was made in the Context's
  type_registry.  Now that we have the hooks mechanism, we can easily
  have PyTorch register variables in the main registry.  The code
  has been refactored accordingly.

- There is a subtle ordering issue between Variable and CUDA.
  We permit libATen_cuda.so and PyTorch to be loaded in either
  order (in practice, CUDA is always loaded "after" PyTorch, because
  it is lazily initialized.)  This means that, when CUDA types are
  loaded, we must subsequently also initialize their Variable equivalents.
  Appropriate hooks were added to VariableHooks to make this possible;
  similarly, getVariableHooks() is not referentially transparent, and
  will change behavior after Variables are loaded.  (This is different
  to CUDAHooks, which is "burned in" after you try to initialize CUDA.)

- The cmake is adjusted to separate dependencies into either CPU
  or CUDA dependencies.  The generator scripts are adjusted to either
  generate a file as a CUDA (cuda_file_manager) or CPU file (file_manager).

- I changed all native functions which were CUDA-only (the cudnn functions)
  to have dispatches for CUDA only (making it permissible to not specify
  all dispatch options.)  This uncovered a bug in how we were handling
  native functions which dispatch on a Type argument; I introduced a new
  self_ty keyword to handle this case.  I'm not 100% happy about it
  but it fixed my problem.

  This also exposed the fact that set_history incompletely handles
  heterogenous return tuples combining Tensor and TensorList.  I
  swapped this codegen to use flatten() (at the possible cost of
  a slight perf regression, since we're allocating another vector now
  in this code path).

- thc_state is no longer a public member of Context; use getTHCState() instead

- This PR comes with Registry from Caffe2, for handling static initialization.
  I needed to make a bunch of fixes to Registry to make it more portable

  - No more ##__VA_ARGS__ token pasting; instead, it is mandatory to pass at
    least one argument to the var-args. CUDAHooks and VariableHooks pass a nullary
    struct CUDAHooksArgs/VariableHooksArgs to solve the problem. We must get rid of
    token pasting because it does not work with MSVC.

  - It seems MSVC is not willing to generate code for constructors of template
    classes at use sites which cross DLL boundaries. So we explicitly instantiate
    the class to get around the problem. This involved tweaks to the boilerplate
    generating macros, and also required us to shuffle around namespaces a bit,
    because you can't specialize a template unless you are in the same namespace as
    the template.
  - Insertion of AT_API to appropriate places where the registry must be exported

- We have a general problem which is that on recent Ubuntu distributions,
  --as-needed is enabled for shared libraries, which is (cc @apaszke who was
  worrying about this in #7160 see also #7160 (comment)). For now, I've hacked
  this up in the PR to pass -Wl,--no-as-needed to all of the spots necessary to
  make CI work, but a more sustainable solution is to attempt to dlopen
  libATen_cuda.so when CUDA functionality is requested.

    - The JIT tests somehow manage to try to touch CUDA without loading libATen_cuda.so. So
      we pass -Wl,--no-as-needed when linking libATen_cuda.so to _C.so

- There is a very subtle linking issue with lapack, which is solved by making sure libATen_cuda.so links against LAPACK. There's a comment in aten/src/ATen/CMakeLists.txt about htis as well as a follow up bug at #7353

- autogradpp used AT_CUDA_ENABLED directly. We've expunged these uses and added
  a few more things to CUDAHooks (getNumGPUs)

- Added manualSeedAll to Generator so that we can invoke it polymorphically (it
  only does something different for CUDAGenerator)

- There's a new cuda/CUDAConfig.h header for CUDA-only ifdef macros (AT_CUDNN_ENABLED, most prominently)

- CUDAHooks/VariableHooks structs live in at namespace because Registry's
  namespace support is not good enough to handle it otherwise (see Registry
  changes above)

- There's some modest moving around of native functions in ReduceOps and
  UnaryOps to get the CUDA-only function implementations into separate files, so
  they are only compiled into libATen_cuda.so. sspaddmm needed a separate CUDA
  function due to object linkage boundaries.

- Some direct uses of native functions in CUDA code has to go away, since these
  functions are not exported, so you have to go through the dispatcher
  (at::native::empty_like to at::empty_like)

- Code in THC/THCS/THCUNN now properly use THC_API macro instead of TH_API
  (which matters now that TH and THC are not in the same library)

- Added code debt in torch/_thnn/utils.py and other THNN parsing code to handle
  both TH_API and THC_API

- TensorUtils.h is now properly exported with AT_API

- Dead uses of TH_EXPORTS and co expunged; we now use ATen_cpu_exports and
  ATen_cuda_exports (new, in ATenCUDAGeneral.h) consistently

- Fix some incorrect type annotations on _cudnn_rnn_backward, where we didn't
  declare a type as possibly undefined when we should have. We didn't catch this
  previously because optional annotations are not tested on "pass-through" native
  ATen ops (which don't have dispatch). Upstream issue at #7316

- There's a new cmake macro aten_compile_options for applying all of our
  per-target compile time options. We use this on the cpu and cuda libraries.

- test/test_cpp_extensions.py can be run directly by invoking in Python,
  assuming you've setup your PYTHONPATH setup correctly

- type_from_string does some new funny business to only query for all valid CUDA
  types (which causes CUDA initialization) when we see "torch.cuda." in the
  requested string

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Last mile libtorch fixes

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* pedantic fix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-10 10:28:33 -07:00
Peter Goldsborough
54a4867675
Bring back C++ extension torch.h (#7310)
* Bring back C++ extension torch.h

* Fix python.h include in python_tensor.cpp
2018-05-05 14:06:27 -07:00
Peter Goldsborough
67d0d14908
Rename autograd namespace to torch and change torch.h into python.h (#7267)
* Rename autograd namespace to torch and change torch.h into python.h

* Include torch.h instead of python.h in test/cpp/api

* Change some mentions of torch.h to python.h in C++ extensions

* Set paths directly, without find_path
2018-05-04 08:04:57 -07:00
Peter Goldsborough
50218a25e7
[EASY] Document load_inline (#7101)
* Document load_inline

* Link to tests for examples

* Links in RestructuredText are weird
2018-04-30 14:36:41 -07:00
Peter Goldsborough
b70b7a80d4 Inline JIT C++ Extensions (#7059)
Adds ability to JIT compile C++ extensions from strings

>>> from torch.utils.cpp_extension import load_inline
>>> source = '''
    at::Tensor sin_add(at::Tensor x, at::Tensor y) {
      return x.sin() + y.sin();
    }
'''
>>> module = load_inline(name='inline_extension', cpp_sources=source, functions='sin_add')
Fixes #7012

* Inline JIT C++ Extensions

* jit_compile_sources -> jit_compile

* Split up test into CUDA and non-CUDA parts

* Documentation fixes

* Implement prologue and epilogue generation

* Remove extra newline

* Only create the CUDA source file when cuda_sources is passed
2018-04-30 11:48:44 -04:00
Francisco Massa
b240cc9b87
Add support for dotted names in CPP Extensions (#6986)
* Add support for dotted names in CPP Extensions

* Modify tests for cpp extensions

Test that dotted names work

* Py2 fixes

* Make run_test cpp_extensions Win-compatible
2018-04-29 18:10:03 +02:00
Peter Goldsborough
3d4d39ce30
Also check compiler ABI compatibility when JIT compiling (#7015) 2018-04-27 08:19:17 +01:00
Peter Goldsborough
75ccfb321b Fix cpp_extensins.py (#6722) 2018-04-18 22:43:12 -04:00
Peter Goldsborough
c14b62fca2 Create FileBaton to synchronize distributed JIT C++ extension builds (#6684)
* Create FileBaton to synchronize distributed JIT C++ extension builds

* Move FileBaton to its own file

* Autoformat code

* Respect verbose flag in cpp_extension._prepare_ldflags
2018-04-18 18:07:03 -04:00
ngimel
96e2140ffb Check for g++ also in check_compiler_ABI (#6711)
Otherwise a spurious warning is generated. @goldsborough
2018-04-18 20:30:52 +01:00
Peter Goldsborough
ae592b4999
Louder warning for C++ extensions (#6435) 2018-04-10 12:47:39 -07:00
Kento NOZAWA
3b58b859b2 Fix typos in docs (#6389) 2018-04-07 12:41:15 -04:00
peterjc123
63af898d46 Fix extension test on Windows (#5548)
* Change cpp_extensions.py to make it work on Windows

* Fix linting

* Show python paths

* Debug

* Debug 1

* set PYTHONPATH

* Add ATen into library

* expose essential libs and functions, and copy _C.lib

* Specify dir in header

* Update check_abi for MSVC

* Activate cl environment to compile cpp extensions

* change version string

* Redirect stderr to stdout

* Add monkey patch for windows

* Remove unnecessary self

* Fix various issues

* Append necessary flags

* add /MD flag to cuda

* Install ninja

* Use THP_API instead of THP_CLASS

* Beautify the paths

* Revert "Use THP_API instead of THP_CLASS"

This reverts commit dd7e74c44db48e4c5f85bb8e3c698ff9de71ba2d.

* Use THP_API instead of THP_CLASS(new)
2018-04-02 13:53:25 -04:00
Peter Goldsborough
792daeb422 Enable documentation for C++ extensions on the website (#5597) 2018-03-07 14:07:26 +01:00
Peter Goldsborough
b10fcca5f0 Install cuda headers in ATen build (#5474) 2018-02-28 19:36:41 -08:00
Peter Goldsborough
008ba18c5b Improve CUDA extension support (#5324)
* Also pass torch includes to nvcc build

* Export ATen/cuda headers with install

* Refactor flags common to C++ and CUDA

* Improve tests for C++/CUDA extensions

* Export .cuh files under THC

* Refactor and clean cpp_extension.py slightly

* Include ATen in cuda extension test

* Clarifying comment in cuda_extension.cu

* Replace cuda_extension.cu with cuda_extension_kernel.cu in setup.py

* Copy compile args in C++ extension and add second kernel

* Conditionally add -std=c++11 to cuda_flags

* Also export cuDNN headers

* Add comment about deepcopy
2018-02-23 10:15:30 -05:00
Peter Goldsborough
22fe542b8e Use TORCH_EXTENSION_NAME macro to avoid mismatched module/extension name (#5277)
* Warn users about mismatched module/extension name

* Define TORCH_EXTENSION_NAME macro
2018-02-16 22:31:04 -05:00
Peter Goldsborough
fe72037c68 Add CUDA support for JIT-compiling C++ extensions (#5226) 2018-02-15 15:50:01 -05:00
Peter Goldsborough
1b71e78d13 CUDA support for C++ extensions with setuptools (#5207)
This PR adds support for convenient CUDA integration in our C++ extension mechanism. This mainly involved figuring out how to get setuptools to use nvcc for CUDA files and the regular C++ compiler for C++ files. I've added a mixed C++/CUDA test case which works great.

I've also added a CUDAExtension and CppExtension function that constructs a setuptools.Extension with "usually the right" arguments, which reduces the required boilerplate to write an extension even more. Especially for CUDA, where library_dir (CUDA_HOME/lib64) and libraries (cudart) have to be specified as well.

Next step is to enable this with our "JIT" mechanism.

NOTE: I've had to write a small find_cuda_home function to find the CUDA install directory. This logic is kind of a duplicate of tools/setup_helpers/cuda.py, but that's not available in the shipped PyTorch distribution. The function is also fairly short. Let me know if it's fine to duplicate this logic.

* CUDA support for C++ extensions with setuptools

* Remove printf in CUDA test kernel

* Remove -arch flag in test/cpp_extensions/setup.py

* Put wrap_compile into BuildExtension

* Add guesses for CUDA_HOME directory

* export PATH to CUDA location in test.sh

* On Python2, sys.platform has the linux version number
2018-02-13 15:02:50 -08:00
Peter Goldsborough
390d542db2 Modify .jenkins/test.sh to install ninja 2018-02-01 22:42:07 -08:00
Peter Goldsborough
733ce9529e [cpp-extensions] Implement torch.utils.cpp_extensions.load() 2018-02-01 22:42:07 -08:00
Peter Goldsborough
9e36f979c9 Add ABI compatibility check to cpp_extensions.py 2018-02-01 16:19:03 -08:00
Peter Goldsborough
1262fba8e7 [cpp extensions] Create torch.h and update setup.py 2018-02-01 16:19:03 -08:00