Commit Graph

194 Commits

Author SHA1 Message Date
Zachary DeVito
e43ff32192 Add a JIT interpreter (#3634)
* Add a JIT interpreter

The separate interpreter is used to run graphs with lower overhead than
converting them to autograd graphs. Some notes:

* does not support Handles/PythonOp/CppOp; these will come in a future commit
* jit_closure.cpp still exists, and for now we fall back to it when the
  interpreter cannot handle something because of PythonOp/CppOp
* In order to support retain_graph=True, the interpreter can be cloned,
  creating a copy that can be run with different arguments. This is
  assumed to be the non-standard case so cloning is not particularly optimized.
  No tensor _data_ is copied, but the at::Tensor list in the interpreter is.
  If we hit problems, there is a lot we could do (such as register allocation)
  to minimize the stuff that needs to be copied.
* Uses a pImpl pattern to keep implementation details out of its header file.
* Modifies the way getTensorOp works so that it reads/writes to already-existing
  vectors, which avoids reallocating these buffers each time.
* Timings are here: https://gist.github.com/zdevito/5a20ac29fb1b9e449e693b67dc478127
  This reduces overhead to about the same as running it in Python.
  It is about 10us faster to run the same thing using ATen directly.
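A minimal Python sketch of the register-list design and cloning described above. Everything here is hypothetical: the real `InterpreterState` and `Code` are C++ classes, and the `run`/`clone` methods, the `(op, input_regs, output_regs)` instruction format, and the reused scratch buffers are illustrative assumptions. Integers stand in for tensors.

```python
class Code:
    """Hypothetical compiled graph: a flat list of instructions.

    Each instruction is (op, input_regs, output_regs); op reads a list of
    input values and appends its results to a caller-provided output list.
    """

    def __init__(self, instructions, num_registers, input_regs, output_regs):
        self.instructions = instructions
        self.num_registers = num_registers
        self.input_regs = input_regs
        self.output_regs = output_regs


class InterpreterState:
    def __init__(self, code):
        self.code = code  # shared between clones, never mutated
        self.registers = [None] * code.num_registers
        # Scratch buffers are cleared and reused for every instruction
        # instead of being reallocated (cf. the getTensorOp change).
        self._inputs = []
        self._outputs = []

    def run(self, args):
        for reg, value in zip(self.code.input_regs, args):
            self.registers[reg] = value
        for op, in_regs, out_regs in self.code.instructions:
            self._inputs.clear()
            self._outputs.clear()
            self._inputs.extend(self.registers[r] for r in in_regs)
            op(self._inputs, self._outputs)
            for reg, value in zip(out_regs, self._outputs):
                self.registers[reg] = value
        return [self.registers[r] for r in self.code.output_regs]

    def clone(self):
        # Copy only the register list; the values it holds (tensors in the
        # real interpreter) are shared, not duplicated.
        other = InterpreterState(self.code)
        other.registers = list(self.registers)
        return other


def add(inputs, outputs):
    outputs.append(inputs[0] + inputs[1])


def mul(inputs, outputs):
    outputs.append(inputs[0] * inputs[1])


# Registers: 0 = x, 1 = y, 2 = x + y, 3 = (x + y) * y
code = Code([(add, [0, 1], [2]), (mul, [2, 1], [3])],
            num_registers=4, input_regs=[0, 1], output_regs=[3])
state = InterpreterState(code)
first = state.run([2, 3])
other_state = state.clone()
second = other_state.run([1, 1])
```

Running the clone with different arguments leaves the original state's registers untouched, which is the property retain_graph=True needs.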

* Code Mod

Interpreter -> InterpreterState
Function -> Code

Add other requested comments.

* RegList -> ListHandle<T>

Change the RegList functions to be safer by identifying the type of
each argument list, and checking that list insert does not try
to add to two different lists at once.

* Use exactly equal for interp tests
2017-11-13 22:09:53 -08:00
Sam Gross
4fa94793dd Bump version in master (#3605) 2017-11-11 18:49:19 -05:00
peter
7160fb0801 Fix setup scripts for Windows CUDA builds 2017-11-11 13:05:35 +01:00
Adam Paszke
1f1612ee37 Move _CompiledMixin to C++ 2017-11-10 16:31:44 +01:00
Soumith Chintala
285ce10dbe fix linking order of nvrtc to force no-as-needed (#3583) 2017-11-08 22:05:09 -05:00
Edward Z. Yang
d2784b6e5b Link ATen against CuDNN when available. (#3582)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-08 20:20:53 -05:00
peterjc123
aa911939a3 Improve Windows Compatibility (for csrc/scripts) (#2941) 2017-11-08 19:51:35 +01:00
Adam Paszke
621fbd5c4e Move flattening/unflattening JIT logic to C 2017-11-06 19:42:44 -05:00
Sam Gross
fde355f7d4 Allow in-place operations on views (#3384)
Allow in-place operations on views

Adds VariableViewImpl, a subclass of VariableImpl which has a pointer to
the base Variable on which it is a view. In-place operations on views
change the grad_fn of the base.

Note that in-place operations only work on views that are the first output of the function that created them. All C++/ATen implemented functions have this behavior, but it's possible to write Python-implemented autograd functions that do not. In-place operations on these views will raise an exception.
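A toy Python sketch of the idea, assuming plain lists for tensor storage and string labels for autograd nodes. `VariableView`, `values`, `add_`, and the labels `InPlaceOnView`/`AddBackward` are hypothetical stand-ins, not the C++ `VariableViewImpl` or the node names PyTorch actually uses. It shows the two rules above: the view keeps a pointer to its base, an in-place op on the view rewrites the base's history, and views that are not output 0 are rejected.

```python
class Variable:
    def __init__(self, data, grad_fn="Leaf"):
        self.data = data        # a plain list stands in for tensor storage
        self.grad_fn = grad_fn  # a string label stands in for an autograd node


class VariableView(Variable):
    """Hypothetical analogue of VariableViewImpl: keeps a pointer to its base."""

    def __init__(self, base, start, stop, output_nr=0):
        # A view owns no storage; it reads and writes base.data[start:stop].
        super().__init__(data=None, grad_fn="ViewBackward")
        self.base = base
        self.start = start
        self.stop = stop
        self.output_nr = output_nr  # which output of the creating function

    def values(self):
        return self.base.data[self.start:self.stop]

    def add_(self, value):
        # Only views that are the first output of their creating function
        # support in-place ops; anything else is rejected.
        if self.output_nr != 0:
            raise RuntimeError("in-place op on a view that is not output 0")
        for i in range(self.start, self.stop):
            self.base.data[i] += value
        # The in-place op rewrites the history of the base, not just the view.
        self.base.grad_fn = f"InPlaceOnView({self.base.grad_fn})"
        self.grad_fn = f"AddBackward({self.grad_fn})"
        return self


base = Variable([1, 2, 3, 4])
view = VariableView(base, 1, 3)
view.add_(10)
```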

Fixes #3313
2017-11-06 18:19:56 -05:00
Zach DeVito
f6dac327df build fixes 2017-11-02 19:53:36 -04:00
Zach DeVito
88d56cc198 fix setup.py paths 2017-11-02 19:53:36 -04:00
Zach DeVito
5aa5b572e4 update build so that all of TH* is in libATen 2017-11-02 19:53:36 -04:00
Sam Gross
afdf50cafe Move jit/assert.h to csrc/assertions.h (#3442)
I've kept JIT_ASSERT as an alias to TORCH_ASSERT, which we can use throughout the C++ code.
2017-11-02 13:26:51 -04:00
Soumith Chintala
fc7a68d147 fix lint 2017-11-02 07:36:58 -04:00
Soumith Chintala
4108feb27d fix OSX cuda build 2017-11-02 07:15:24 -04:00
Trevor Killeen
0e38d3bbb3 remove thpp library (#3405) 2017-11-01 11:57:09 -04:00
Trevor Killeen
b544882335 ATen in THD (Part I) (#2288)
* enable size from ATen type

* temp commit aten thd

* port copy, math

* port random

* changes after rebase

* lapack bind

* thd and csrc compile

* fix min/max reductions in DataChannelTCP

* clean up changes

* re-enable tensor constructors

* port MPI to at::Tensor

* fix storage methods to not cast to thpp storage ptrs
2017-11-01 09:59:02 -04:00
Edward Z. Yang
d4abaa4b9e Move ONNX broadcast fusion into separate ONNX pass, fixes verbose printing.
This breaks a lot of the onnx-pytorch tests because the abstraction
barriers are not respected.  I'll spin up a patch for that separately.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-01 09:49:53 -04:00
Soumith Chintala
91af122d43 add no-as-needed for THRTC 2017-11-01 04:25:42 -07:00
Soumith Chintala
88d9ebc850 lazy-load nvrtc and libcuda (#3408) 2017-11-01 06:07:03 -04:00
Adam Cécile
a5dbc254f8 if git is not installed at all, no subprocess exception will be raised (#3379) 2017-10-30 18:37:12 -04:00
Edward Z. Yang
40f7f6e095 Improve handling of 'expand' (broadcasting) in JIT and ONNX
The pieces:

- I improved the lint / asserts to catch some bugs which I
  committed while working on my export.  There are two new
  properties which the linter checks now:

    (1) "Anticipated uses".  If a node says that it is used by
    M, M better appear later in the topsort.  Previously,
    we only checked if it was in all_nodes.

    (2) If you are a select node, you better be a multi-type node;
    if you're not a select node, you better not be!  And you
    should never have an input that is multi-type.

- There is a new peephole optimization pass, for simple, local
  transformations to graphs.  Right now, it implements a simple
  optimization: remove 'expand' invocations that are no-ops
  (the size before matches the size after), but we can add other
  things to it later.  I needed this for ONNX because no-op expands
  show up in the left-hand argument, which we don't support.

- There is now a broadcast fuser, which fuses ATen expand ops
  into broadcastable ONNX ops (Add, Div, Mul, Pow, Sub, Gemm).
  It only fuses when the original size is a suffix of the new
  size, as per the ONNX spec.
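A rough Python sketch of the peephole pass and the broadcast fuser, on a toy graph representation that is an assumption (the real passes are C++ over the JIT IR). It removes `expand` nodes whose size before matches the size after, then folds an `expand` feeding the right-hand argument of a broadcastable op only when the original size is a suffix of the new size. `Node`, `eliminate_noop_expands`, `is_suffix`, and `fuse_broadcasts` are hypothetical names; Gemm and the exact ONNX attribute handling are omitted.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    kind: str
    inputs: list
    in_size: tuple = ()   # only meaningful for "expand" nodes
    out_size: tuple = ()
    broadcast: bool = False


def eliminate_noop_expands(nodes):
    """Drop expands whose size before equals the size after; rewire users."""
    replacement = {}
    kept = []
    for n in nodes:  # nodes are assumed to be in topological order
        n.inputs = [replacement.get(i, i) for i in n.inputs]
        if n.kind == "expand" and n.in_size == n.out_size:
            replacement[n.name] = n.inputs[0]
        else:
            kept.append(n)
    return kept


def is_suffix(orig_size, new_size):
    """True if orig_size equals the trailing dims of new_size, e.g. (3,) vs (2, 3)."""
    k = len(orig_size)
    return k <= len(new_size) and tuple(new_size[len(new_size) - k:]) == tuple(orig_size)


BROADCASTABLE = {"Add", "Div", "Mul", "Pow", "Sub"}  # Gemm omitted in this sketch


def fuse_broadcasts(nodes):
    """Fold an expand feeding the right-hand input into a broadcasting op."""
    by_name = {n.name: n for n in nodes}
    for n in nodes:
        if n.kind not in BROADCASTABLE or len(n.inputs) != 2:
            continue
        rhs = by_name.get(n.inputs[1])
        if rhs is not None and rhs.kind == "expand" and is_suffix(rhs.in_size, rhs.out_size):
            n.inputs[1] = rhs.inputs[0]
            n.broadcast = True
    return nodes


graph = [
    Node("x", "input", []),
    Node("b", "input", []),
    Node("e1", "expand", ["x"], (2, 3), (2, 3)),  # no-op expand
    Node("e2", "expand", ["b"], (3,), (2, 3)),    # real broadcast
    Node("y", "Add", ["e1", "e2"]),
]
graph = fuse_broadcasts(eliminate_noop_expands(graph))
```

After both passes `y` reads `x` and `b` directly with broadcasting enabled; the now-unused `e2` is left for a dead-code pass (not shown).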

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-29 23:50:34 -04:00
Maxim Berman
7b00adf5d3 Add CUDNN_LIB_DIR in rpath (#3255)
* Add CUDNN_LIB_DIR in link -rpath

* insert CUDNN_LIB_PATH in front of rpath
2017-10-28 00:13:53 -04:00
Adam Paszke
61afb0d519 Autogenerate ATen dispatch for JIT nodes 2017-10-27 02:40:09 +05:30
Sam Gross
67839ce7bc Delete unused Softmax code (#3220)
Softmax and LogSoftmax are automatically bound and dispatched through
VariableType.
2017-10-21 20:51:27 +02:00
Edward Z. Yang
67612cba09 Add -Wno-missing-braces
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-19 23:04:19 -04:00
Sam Gross
f1f64c8d07 Generate autograd functions for NN / more refactors (#3136)
Generate autograd functions for NN and implement more derivatives in derivatives.yaml

A big refactor of gen_variable_type.py
2017-10-19 15:03:26 -04:00
Adam Paszke
98e67448fa Large Softmax and LogSoftmax refactor
- Cleaned up THNN and THCUNN code and kernels
- Improved THCUNN kernel performance 5x, making it match cuDNN performance
- Added support for computing softmax over arbitrary dims
  NOTE: The default dim for 3D inputs is now 1 (used to be 0)
- Both functions now accept inputs with arbitrarily many dimensions
- Autograd functions no longer save the input (it's unnecessary)
- Added cuDNN bindings for softmax, but they are unused as THCUNN
  matches or even exceeds cuDNN performance
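Why the autograd functions can drop the saved input: for y = softmax(x) with incoming gradient g, the gradient with respect to x is dx_j = y_j * (g_j - sum_i g_i * y_i), and for z = log_softmax(x) it is dx_j = g_j - exp(z_j) * sum_i g_i. Both depend only on the output. Below is a 1-D pure-Python check of these formulas against central finite differences; the function names are hypothetical and this is not the THNN/THCUNN kernel code. Softmax over an arbitrary dim applies the same formulas independently along that dim.

```python
import math


def softmax(x):
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_backward(grad, output):
    # dx_j = y_j * (g_j - sum_i g_i * y_i): needs only the saved output y
    dot = sum(g * y for g, y in zip(grad, output))
    return [y * (g - dot) for g, y in zip(grad, output)]


def log_softmax_backward(grad, output):
    # output is z = log_softmax(x); dx_j = g_j - exp(z_j) * sum_i g_i
    total = sum(grad)
    return [g - math.exp(z) * total for g, z in zip(grad, output)]


def numerical_grad(f, x, eps=1e-6):
    """Central-difference gradient of a scalar function f at x."""
    result = []
    for j in range(len(x)):
        up = x[:j] + [x[j] + eps] + x[j + 1:]
        down = x[:j] + [x[j] - eps] + x[j + 1:]
        result.append((f(up) - f(down)) / (2 * eps))
    return result


x = [0.5, -1.0, 2.0]
g = [1.0, 2.0, -0.5]
y = softmax(x)
z = [math.log(v) for v in y]
analytic = softmax_backward(g, y)
numeric = numerical_grad(lambda v: sum(a * b for a, b in zip(g, softmax(v))), x)
analytic_log = log_softmax_backward(g, z)
numeric_log = numerical_grad(
    lambda v: sum(a * math.log(b) for a, b in zip(g, softmax(v))), x)
```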
2017-10-19 19:51:10 +02:00
Trevor Killeen
dcb457fdd9 add support for using nnpack when installed via conda (#3155)
* add support for using nnpack when installed via conda

* unify nnpack discovery between conda and user
2017-10-18 20:11:13 +02:00
Richard Zou
0f4ae13f05 Better cudnn version checking (#3132) 2017-10-16 20:59:18 +02:00
Richard Zou
1322f9a272 Add cudnn version to torch.version 2017-10-13 23:58:25 +02:00
Francisco Massa
f093545919 Add compiled CUDA version in torch.version.cuda 2017-10-10 10:16:14 -04:00
Soumith Chintala
efe91fb9c1 delete redundant python nccl code 2017-10-09 22:24:18 -04:00
Soumith Chintala
4d62933529 add initial NCCL C bindings 2017-10-09 22:24:18 -04:00
Soumith Chintala
b7e258f81e link specific versioned System NCCL, rather than generic file 2017-10-09 22:24:18 -04:00
Trevor Killeen
029252fb3b NNPACK bindings for Convolution (#2826)
* skeleton commit for building and linking nnpack library in PyTorch

* first stab at conv forward binding + integration

* bind NNPACK gradient kernels

* move nnpack forward, input gradient calls deeper

* nnpack conv api mimics nn

* fix symbol error; use memory across calls

* clean up warnings, add shape checking, thread safety, configurable thread specification

* add batch size threshold, also bind for single-element batch for the future
2017-10-04 13:48:14 -04:00
Adam Paszke
437d3af7bf Add CUDNN_INCLUDE_DIR before CUDA directories in setup.py 2017-10-03 10:06:47 -04:00
Sam Gross
de757805fc Implement some autograd functions using ATen (#2805)
This adds some generated autograd functions implemented in C++, which
are generated from derivatives.yaml. It also generates Python bindings
for the Variable methods. The generated files are:

 Functions.cpp/h: subclasses of torch::autograd::Function
 VariableType.cpp/h: The at::Type for autograd Variables
 python_variable_methods.cpp: Python bindings to torch::autograd::Variable
 python_variable_methods_dispatch.h: wrapper which releases the GIL and sets the
     CUDA device
 python_functions.cpp/h: exposes generated autograd functions as Python
     objects

The generated functions are mostly shadowed by the definitions in
variable.py. We'll remove the Python implementations in favor of the
generated C++ implementations in a subsequent commit.
2017-09-26 17:08:00 -04:00
Adam Paszke
b7849662b5 Always regenerate nn wrappers after rebuilding THNN and THCUNN 2017-09-25 23:21:30 -04:00
Adam Paszke
411e1469e0 Add tools for autograd profiling 2017-09-25 23:21:30 -04:00
Soumith Chintala
f4eca7c94d make CUDA_HOME take precedence over all other CUDA detection methods (#2863) 2017-09-25 18:17:40 -04:00
Soumith Chintala
5be06230f9 cleanup external NCCL detection, add NCCL_ROOT_DIR / NCCL_LIB_DIR mechanism 2017-09-25 11:28:59 -04:00
Edward Z. Yang
bf9ab91779 Indicate if the last invocation of setup.py was debug or not.
How to use:

    import torch.version
    print(torch.version.debug)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-22 18:33:47 -04:00
Lu Fang
0a1ac8bfe5 create a cse pass, with very naive support. 2017-09-22 17:06:27 -04:00
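The commit message for 0a1ac8bfe5 gives no details of the algorithm, so the following is only a guess at what "very naive" common subexpression elimination can look like. It assumes a toy IR of `(name, op, inputs)` tuples in topological order with every op pure; `naive_cse` is a hypothetical name. Two nodes are merged when they apply the same op to the same (already deduplicated) inputs, with no reasoning about commutativity.

```python
def naive_cse(nodes):
    """Merge nodes that apply the same op to the same inputs.

    nodes: list of (name, op, inputs) tuples in topological order; names
    that are not produced by any node are treated as graph inputs.
    Assumes every op is pure (deterministic, no side effects).
    """
    seen = {}     # (op, inputs) -> name of the first node computing it
    alias = {}    # eliminated node name -> surviving equivalent
    result = []
    for name, op, inputs in nodes:
        inputs = tuple(alias.get(i, i) for i in inputs)
        key = (op, inputs)
        if key in seen:
            alias[name] = seen[key]
        else:
            seen[key] = name
            result.append((name, op, inputs))
    return result, alias


graph = [
    ("c", "add", ("x", "y")),
    ("d", "add", ("x", "y")),  # duplicate of c
    ("e", "mul", ("c", "d")),
    ("f", "add", ("y", "x")),  # not merged: no commutativity reasoning
]
deduped, alias = naive_cse(graph)
```

Any graph outputs that named an eliminated node would need to be renamed through `alias`.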
Edward Z. Yang
670ec4bc59 Split Type into its own header file.
No other substantive changes.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-20 12:24:27 -04:00
Adam Paszke
28828e033f Make certain functions traceable 2017-09-19 10:53:32 -04:00
Adam Paszke
b708b6de8d Add ONNX pass (JIT trace initialization) 2017-09-19 10:53:32 -04:00
Adam Paszke
0e53fe3a41 Put ONNX files where they belong 2017-09-19 10:53:32 -04:00
Adam Paszke
8dae433de8 Move JIT passes to a separate directory 2017-09-19 10:53:32 -04:00
Sam Gross
80d229b0e7 Refactor THPUtils_invalidArguments into separate file 2017-09-13 19:18:02 -04:00