Commit Graph

210 Commits

Author SHA1 Message Date
Sam Gross
60c03bc09c
Implement apply_, map_, and map2_ in Variable (#4057) 2017-12-07 14:48:56 -05:00
Sam Gross
d0cabbde74
Implement Variable.from_numpy (#4043)
Implements from_numpy using ATen tensors. Variable.from_numpy is a
convenient placeholder for the variant that returns Variables until we
merge Tensor and Variable.

The behavior is slightly changed:

 - from_numpy() on an empty array now returns an empty tensor instead of
   throwing an exception. The shape may not be preserved.
 - CharTensor(ndarray) used to throw an exception. It now copies the
   ndarray. Copying is implemented via ATen toType.
2017-12-06 14:08:56 -05:00
Sam Gross
38f13447bc
Implement Variable.tolist() (#4038)
Tensor.tolist() now dispatches through Variable.tolist() so that we only
have one code path to test until we merge Variable and Tensor.
2017-12-06 12:35:05 -05:00
Sam Gross
5241cdf546
Implement Variable.numpy() (#4006)
Implement Variable.numpy() and dispatch Tensor.numpy() through Variable.numpy()

Variable.numpy() is disallowed on variables that require grad.
2017-12-05 14:24:11 -05:00
Zachary DeVito
9e46fca424 Use ninja as the cmake backend as well. 2017-12-04 14:16:26 -05:00
Zach DeVito
f72fe0624d Add a CPU Fuser (single core)
This adds a simple fusion backend for the CPU.
* Refactors CompiledFusionFunction to have two subclasses that handle
  the compilation details of each backend.
* emit-compile-link-run cycle for the CPU
* simple single core loop to run the operation
* lift CUDA-only restrictions in the fuser, checks that fusion groups
  are only on a single backend.
2017-12-04 14:13:44 -05:00
Zach DeVito
710f6d6958 Fix warnings and add alert to enable ninja when developing. 2017-12-03 04:49:41 +01:00
Edward Z. Yang
1c0fbd27a1
CuDNN bindings rewrite (into ATen) (#3666)
* Comprehensive rewrite of Torch CuDNN bindings / a bit of ATen infra

The executive summary is that this moves the torch/csrc/cudnn
library into ATen, adding a number of new cudnn_ methods to ATen
for batchnorm, convolution, affine grid generator and grid sampler.

ATen infra changes:

- TensorGeometry was moved to ATen
- TensorGeometry was modified to make its interface resemble that of
  Tensor; in particular, sizes is no longer a field, it's a method.
- AT_CUDA_ENABLED macro is set via ATen/Config.h header which is
  generated at cmake configure time.
  Fixes https://github.com/zdevito/ATen/issues/168
- Change AT_CUDA_ENABLED macro to be a function macro, so that we
  error if it is not defined
- Introduce a new TensorArg class, which is a Tensor plus a little
  metadata.  This helps us give good error messages when checking
  dimensions/shapes of tensors.
  Fixes https://github.com/zdevito/ATen/issues/169
- Also introduce a TensorGeometryArg class, for when you don't
  need the actual tensor data (which is most of the time.)
- Add ATen/Check.h, which contains a number of utility functions
  for testing shapes, types and devices of input tensors.  This
  will be particulary useful for native methods, which don't get
  code generated input testing code.  These functions take a
  'CheckedFrom' argument, at the moment just a string, which
  specifies some extra information about what function was
  doing the actual checking; this greatly improves error messages.
    - Many check functions take initializer lists, which let you
      test that all tensors have some property.  This API is
      peculiar, in that we IGNORE undefined tensors in this case.
      This is handled by filterDefined.
- Add AT_CUDNN_ENABLED macro
- CuDNN linking from ATen was improved; for example, we now actually
  add the CuDNN headers to our include path.
- Add some missing override specifiers to some methods
- We now actually build tests with CUDA functionality accessible
  (previously, AT_CUDA_ENABLED was not defined, meaning that
  the headers were missing all CUDA-only functionality.)
- Native functions now support giving explicit names to return
  outputs in yaml.  This makes it possible to hook into the NN
  autogenerated derivatives codepath using native functions.

CuDNN rewrite changes:

- torch/csrc/cudnn now uses ATen (rather than passing around
  THVoidTensor) and lives in ATen.  This lets us remove tensorPointer
  shenanigans.  The functions are exposed to ATen as native functions
  described in aten/src/ATen/cudnn/cuDNN.yaml
- ATen now builds and links against CuDNN when enabled.  The cmake
  package script was taken from Caffe2.
- Some header reorganization was done to help reduce dependencies
  on headers (this reorg is no longer used but I've kept it)
- Rename CHECK to CUDNN_CHECK
- Rip out old shape/type testing code in favor of modern ATen/Check.h
  interface using TensorArg.  In many cases, increase the robustness of
  the checking code.
- Change the inputs of the public facing functions, so that they can
  be bound by ATen
  - Delete THCState*; this is retrieved from the global ATen context
  - Delete cudnnHandle_t, this is retrieved from the global Handles.h
  - Delete cudnnDataType_t, this is retrieved from the Tensor type
  - Delete Convolution class, instead its constituent arguments are
    passed individually
- Change functions to return tensors, rather than take an appropriately
  sized output tensor as an input.
- Redo how transposed convolution / backward convolution is implemented
  (knock on effect of returning tensors).  Previously it was assumed
  that you would always pass an appropriately sized output tensor, but
  we don't want to do this anymore.  For backwards, we instead give
  the desired output tensor (input, really) size, because that is
  readily available.  For *transposed* convolution, however, we take
  output_padding, and otherwise do the shape calculation.
- Redo how legacy group convolution is implemented (knock on effect from
  porting cudnn to ATen.)  Previously, group convolution was implemented
  by manually constructing sizes and strides and then outputting
  appropriate, with macros switching between individual groups and
  all-at-once based on CuDNN version.  Now, the code looks exactly what
  you'd expect: there's a top-level wrapping function that supports
  group convolution no matter the version of CuDNN, and a low-level
  wrapper which supports only what CuDNN supports.  The top-level
  function conditions on CuDNN version, and invokes the low-level
  interface 1 or n times.
- There is now a debugging printer for tensor descriptors.
- Convolution struct is replaced with ConvolutionArgs, which is not
  part of the public API but is used internally to conveniently
  pass around all of the arguments needed for Convolution.
- Add some constexprs for well-known dimensions, reduce amount of
  magic numbers in code.
- Put 'deterministic' in to ConvParams.  Fixes #3659
- Lots more comments.
- Some pessimizations, in the name of code clarity:
  - The descriptors are initialized on every invocation of convolution
    forward/backward.  Previously, the descriptors were cached, so that
    you didn't have to initialize them again on backwards.  This is
    difficult to support in the ATen interface so I didn't support it.
  - Legacy group convolution initializes its workspace for *every* group
    it performs.  I did not feel motivated to fix this because the
    legacy codepath is already quite slow.
- Affine grid generator and grid sampler automatically call contiguous
  on their arguments as necessary.
- Batchnorm input checking is greatly beefed up, it now checks for
  the following input characteristics:
    - Definedness
    - GPU location
    - Type
    - Contiguity
    - Size

PyTorch binding code changes

- batchnorm now uses consistent var/data naming
- batchnorm and convolution make use of new ATen bindings
- Affine grid generator and grid sampler make use of ATen CuDNN
  bindings via derivatives.yaml.  This means I had to restructure
  the code a little, since the THNN bindings still go through
  a legacy Python class.
- I fixed some warnings:
  - s/friend class/friend struct/ on InterpreterStateImpl
  - Removed pessimizing move 'detached' in torch/csrc/autograd/variable.cpp
  - Removed unused pack_list on Scalar

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

GCC 4.8 buildfix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Add TensorGeometry to ATen.h

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

CUDNN_CHECK

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Update TODO comment

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Delete return in cudnn_grid_sampler

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

s/cudnnSetStreamToCurrent/setCuDNNStreamToCurrent/g

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Don't allocate a new vector when filtering defined.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Remove Check overloads, convert to pass references.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Some more microbenchmarking.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-30 23:06:58 -05:00
Zachary DeVito
70ca83793d Add support to emit compile_commands.json from CMake/ninja
files.
2017-11-30 13:47:27 -05:00
Zachary DeVito
0e54c3a989 Significantly speed up the incremental build.
This commit adds code to setup.py to use ninja to manage
C++ and code generator dependencies rather than use raw setuptools.
This is based on similar code added to ONNX.

Enabled optionally when ninja is installed.

On my computer speed for a do-nothing build drops from 10s to 1.5 seconds.
Speed of other compilation steps is significantly improved as well.
Dependencies are tracked correctly so the need for ccache is reduced.
2017-11-30 13:47:27 -05:00
Zachary DeVito
929a11f920
Add interpreter support for Handles/PythonOp/CppOp (#3866)
* Add interpreter support for Handles/PythonOp/CppOp

This treats Handles as a first-class type in the interpreter
since this turned out to be conceptually simpler than treating
them as a separate concept, which requires a second channel for
register allocating and moving data from one op to the next.

Notes:
* The refcounting nature of tensors is factored into its own base type
so that it can be shared with other refcounted types such as handle.
* Some methods redundant with TensorBase have been deleted from Tensor
* The interpreter uses raw refcounted handles. In addition to being
able to treat Tensors and Handles as the same base object, it removes
a lot of redundant refcounting as objects moved from tensors to input/
output lists.
* aten_dispatch has been updated to work directly on the raw refcounted
lists to avoid refcounting and duplicate lists.
* Removing jit_closure.cpp, The interpreter can now handle all pathways.

* Functions like `unsafeToTensorShare` describe how
ownership transfers in the interpreter. The `Steal` variants
take rvalue references as arguments, and invalidate those
arguments to prevent potential problems.
* Make TensorTemporary is not a  subtype relationship because it is too easy to
do something horribly unsafe:

```
  void foo(at::Tensor bar) {
    // bar destructor call release on a temporary!
  }

  foo(TensorTemporary(retainable)); // structure slicing!
```
2017-11-29 11:38:57 -05:00
Sam Gross
4518793aa2
Implement indexing in ATen (#3725)
Implements basic and advanced indexing using ATen tensors/variables.
Basic indexing is translated at the Python-binding level
(python_variable_indexing.cpp) to slice/squeeze/unsqueeze/select calls.
Advanced indexing is implemented in ATen in terms of take() and put()
calls.
2017-11-21 13:19:00 -05:00
Scott Stevenson
a9ef76b9c6 Reflect renaming of OS X to macOS (#3795) 2017-11-20 16:52:10 -05:00
Adam Paszke
3e4a777e44
Correct JIT interpreter autograd function (#3760) 2017-11-19 21:48:22 +01:00
Zachary DeVito
cc7f09a372
Add cudaEvent support to the profiler (#3734)
* Add cudaEvent support to the profiler

This adds the ability to record cuda timings using cudaEventRecord
in the profiler. Since it doesn't require nvprof it is easier
to run than the nvprof path.

This also records a thread id for each event, which will make
tracing results easier to understand

* Add flow arrows from cpu to cuda event

* Fix no cuda build

* Review comments

* Move CUDA checks to one place
2017-11-16 13:58:09 -08:00
Soumith Chintala
99037d627d
fix OSX cuda build (#3722) 2017-11-15 16:38:18 -05:00
Zachary DeVito
e43ff32192
Add a JIT interpreter (#3634)
* Add a JIT interpreter

The separate interpreter is used to graphs with a lower overhead than
converting them to autograd graphs. Some notes:

* does not support Handles/PythonOp/CppOp, these will be in a future commit
* jit_closure.cpp still exists and we fall back to it for now when
  cannot handle something because of PythonOp/CppOp
* In order to support retain_graph=True, the interpreter can be cloned,
  creating a copy that can be run with different arguments. This is
  assumed to be the non-standard case so cloning is not particularly optimized.
  No tensor _data_ is copied, but the at::Tensor list in the interpreter is.
  If we hit problems, there is a lot we could do (such as register allocation)
  to minimize the stuff that needs to be copied.
* Uses a pImpl pattern to keep implementation details out of its header file.
* Modifies the way getTensorOp works so that it reads/writes to already-existing
  vectors, this prevents needing to realloc these buffers each time.
* Timings are here: https://gist.github.com/zdevito/5a20ac29fb1b9e449e693b67dc478127
  This reduces overhead to about the same as running it in python.
  It is about 10us faster to run the same thing using ATen directly.

* Code Mod

Interpreter -> InterpreterState
Function -> Code

Add other requested comments.

* RegList -> ListHandle<T>

Change the RegList functions to be safer by identifying the type of
each argument list, and checking that list insert does not try
to add to two different lists at once.

* Use exactly equal for interp tests
2017-11-13 22:09:53 -08:00
Sam Gross
4fa94793dd Bump version in master (#3605) 2017-11-11 18:49:19 -05:00
peter
7160fb0801 Fix setup scripts for Windows CUDA builds 2017-11-11 13:05:35 +01:00
Adam Paszke
1f1612ee37 Move _CompiledMixin to C++ 2017-11-10 16:31:44 +01:00
Soumith Chintala
285ce10dbe
fix linking order of nvrtc to force no-as-needed (#3583) 2017-11-08 22:05:09 -05:00
Edward Z. Yang
d2784b6e5b Link ATen against CuDNN when available. (#3582)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-08 20:20:53 -05:00
peterjc123
aa911939a3 Improve Windows Compatibility (for csrc/scripts) (#2941) 2017-11-08 19:51:35 +01:00
Adam Paszke
621fbd5c4e Move flattening/unflattening JIT logic to C 2017-11-06 19:42:44 -05:00
Sam Gross
fde355f7d4
Allow in-place operations on views (#3384)
Allow in-place operations on views

Adds VariableViewImpl, a subclass of VariableImpl which has a pointer to
the base Variable on which it is a view. In-place operations on views
change the grad_fn of the base.

Note that in-place operations only work on views that are the first output of the function that created them. All C++/ATen implemented functions have this behavior, but it's possible to write Python-implemented autograd functions that do not. In-place operations on these view will raise an exception.

Fixes #3313
2017-11-06 18:19:56 -05:00
Zach DeVito
f6dac327df build fixes 2017-11-02 19:53:36 -04:00
Zach DeVito
88d56cc198 fix setup.py paths 2017-11-02 19:53:36 -04:00
Zach DeVito
5aa5b572e4 update build so that all of TH* is in libATen 2017-11-02 19:53:36 -04:00
Sam Gross
afdf50cafe
Move jit/assert.h to csrc/assertions.h (#3442)
I've kept JIT_ASSERT as an alias to TORCH_ASSERT, which we can use throughout the C++ code.
2017-11-02 13:26:51 -04:00
Soumith Chintala
fc7a68d147 fix lint 2017-11-02 07:36:58 -04:00
Soumith Chintala
4108feb27d fix OSX cuda build 2017-11-02 07:15:24 -04:00
Trevor Killeen
0e38d3bbb3 remove thpp library (#3405) 2017-11-01 11:57:09 -04:00
Trevor Killeen
b544882335 ATen in THD (Part I) (#2288)
* enable size from ATen type

* temp commit aten thd

* port copy, math

* port random

* changes after rebase

* lapack bind

* thd and csrc compile

* fix min/max reductions in DataChannelTCP

* clean up changes

* re-enable tensor constructors

* port MPI to at::Tensor

* fix storage methods to not cast to thpp storage ptrs
2017-11-01 09:59:02 -04:00
Edward Z. Yang
d4abaa4b9e Move ONNX broadcast fusion into separate ONNX pass, fixes verbose printing.
This breaks a lot of the onnx-pytorch tests because the abstraction
barriers are not respected.  I'll spin up a patch for that separately.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-01 09:49:53 -04:00
Soumith Chintala
91af122d43 add no-as-needed for THRTC 2017-11-01 04:25:42 -07:00
Soumith Chintala
88d9ebc850
lazy-load nvrtc and libcuda (#3408) 2017-11-01 06:07:03 -04:00
Adam Cécile
a5dbc254f8 if git is not installed at all, no subprocess exception will be raised (#3379) 2017-10-30 18:37:12 -04:00
Edward Z. Yang
40f7f6e095 Improve handling of 'expand' (broadcasting) in JIT and ONNX
The pieces:

- I improved the lint / asserts to catch some bugs which I
  committed while working on my export.  There are two new
  properties which the linter checks now:

    (1) "Anticipated uses".  If a node says that is used by
    M, M better appear later in the topsort.  Previously,
    we only checked if it was in all_nodes.

    (2) If you are a select node, you better be a multi-type node;
    if you're not a select node, you better not be!  And you
    should never have an input that is multi-type.

- There is a new peephole optimization pass, for simple, local
  transformations to graphs.  Right now, it implements a simple
  optimization: remove 'expand' invocations that are no-ops
  (the size before matches the size after), but we can add other
  things to it later.  I needed this for ONNX because no-op expands
  show up in the left-hand argument, which we don't support.

- There is now a broadcast fuser, which fuses ATen expand ops
  into broadcastable ONNX ops (Add, Div, Mul, Pow, Sub, Gemm.)
  It only fuses when the original size is a suffix of the new
  size, as per the ONNX spec.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-29 23:50:34 -04:00
Maxim Berman
7b00adf5d3 Add CUDNN_LIB_DIR in rpath (#3255)
* Add CUDNN_LIB_DIR in link -rpath

* insert CUDNN_LIB_PATH in front of rpath
2017-10-28 00:13:53 -04:00
Adam Paszke
61afb0d519 Autogenerate ATen dispatch for JIT nodes 2017-10-27 02:40:09 +05:30
Sam Gross
67839ce7bc Delete unused Softmax code (#3220)
Softmax and LogSoftmax are automatically bound and dispatched through
VariableType.
2017-10-21 20:51:27 +02:00
Edward Z. Yang
67612cba09 Add -Wno-missing-braces
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-19 23:04:19 -04:00
Sam Gross
f1f64c8d07 Generate autograd functions for NN / more refactors (#3136)
Generate autograd functions for NN and implement more derivatives in derivatives.yaml

A big refactor of gen_variable_type.py
2017-10-19 15:03:26 -04:00
Adam Paszke
98e67448fa Large Softmax and LogSoftmax refactor
- Cleaned up THNN and THCUNN code and kernels
- Improved THCUNN kernel performance 5x, making it match cuDNN performance
- Added support for computing softmax over arbitrary dims
  NOTE: The default dim for 3D inputs is now 1 (used to be 0)
- Both functions now accept inputs with arbitrarily many dimensions
- Autograd functions no longer save the input (it's unnecessary)
- Added cuDNN bindings for softmax, but they are unused as THCUNN
  matches or even exceeds cuDNN performance
2017-10-19 19:51:10 +02:00
Trevor Killeen
dcb457fdd9 add support for using nnpack when installed via conda (#3155)
* add support for using nnpack when installed via conda

* unify nnpack discovery between conda and user
2017-10-18 20:11:13 +02:00
Richard Zou
0f4ae13f05 Better cudnn version checking (#3132) 2017-10-16 20:59:18 +02:00
Richard Zou
1322f9a272 Add cudnn version to torch.version 2017-10-13 23:58:25 +02:00
Francisco Massa
f093545919 Add compiled CUDA version in torch.version.cuda 2017-10-10 10:16:14 -04:00
Soumith Chintala
efe91fb9c1 delete redundant python nccl code 2017-10-09 22:24:18 -04:00
Soumith Chintala
4d62933529 add initial NCCL C bindings 2017-10-09 22:24:18 -04:00