Commit Graph

200 Commits

Author SHA1 Message Date
Zachary DeVito
929a11f920
Add interpreter support for Handles/PythonOp/CppOp (#3866)
* Add interpreter support for Handles/PythonOp/CppOp

This treats Handles as a first-class type in the interpreter
since this turned out to be conceptually simpler than treating
them as a separate concept, which requires a second channel for
register allocating and moving data from one op to the next.

Notes:
* The refcounting nature of tensors is factored into its own base type
so that it can be shared with other refcounted types such as handle.
* Some methods redundant with TensorBase have been deleted from Tensor
* The interpreter uses raw refcounted handles. In addition to being
able to treat Tensors and Handles as the same base object, it removes
a lot of redundant refcounting as objects moved from tensors to input/
output lists.
* aten_dispatch has been updated to work directly on the raw refcounted
lists to avoid refcounting and duplicate lists.
* Removing jit_closure.cpp, The interpreter can now handle all pathways.

* Functions like `unsafeToTensorShare` describe how
ownership transfers in the interpreter. The `Steal` variants
take rvalue references as arguments, and invalidate those
arguments to prevent potential problems.
* Make TensorTemporary is not a  subtype relationship because it is too easy to
do something horribly unsafe:

```
  void foo(at::Tensor bar) {
    // bar destructor call release on a temporary!
  }

  foo(TensorTemporary(retainable)); // structure slicing!
```
2017-11-29 11:38:57 -05:00
Sam Gross
4518793aa2
Implement indexing in ATen (#3725)
Implements basic and advanced indexing using ATen tensors/variables.
Basic indexing is translated at the Python-binding level
(python_variable_indexing.cpp) to slice/squeeze/unsqueeze/select calls.
Advanced indexing is implemented in ATen in terms of take() and put()
calls.
2017-11-21 13:19:00 -05:00
Scott Stevenson
a9ef76b9c6 Reflect renaming of OS X to macOS (#3795) 2017-11-20 16:52:10 -05:00
Adam Paszke
3e4a777e44
Correct JIT interpreter autograd function (#3760) 2017-11-19 21:48:22 +01:00
Zachary DeVito
cc7f09a372
Add cudaEvent support to the profiler (#3734)
* Add cudaEvent support to the profiler

This adds the ability to record cuda timings using cudaEventRecord
in the profiler. Since it doesn't require nvprof it is easier
to run than the nvprof path.

This also records a thread id for each event, which will make
tracing results easier to understand

* Add flow arrows from cpu to cuda event

* Fix no cuda build

* Review comments

* Move CUDA checks to one place
2017-11-16 13:58:09 -08:00
Soumith Chintala
99037d627d
fix OSX cuda build (#3722) 2017-11-15 16:38:18 -05:00
Zachary DeVito
e43ff32192
Add a JIT interpreter (#3634)
* Add a JIT interpreter

The separate interpreter is used to graphs with a lower overhead than
converting them to autograd graphs. Some notes:

* does not support Handles/PythonOp/CppOp, these will be in a future commit
* jit_closure.cpp still exists and we fall back to it for now when
  cannot handle something because of PythonOp/CppOp
* In order to support retain_graph=True, the interpreter can be cloned,
  creating a copy that can be run with different arguments. This is
  assumed to be the non-standard case so cloning is not particularly optimized.
  No tensor _data_ is copied, but the at::Tensor list in the interpreter is.
  If we hit problems, there is a lot we could do (such as register allocation)
  to minimize the stuff that needs to be copied.
* Uses a pImpl pattern to keep implementation details out of its header file.
* Modifies the way getTensorOp works so that it reads/writes to already-existing
  vectors, this prevents needing to realloc these buffers each time.
* Timings are here: https://gist.github.com/zdevito/5a20ac29fb1b9e449e693b67dc478127
  This reduces overhead to about the same as running it in python.
  It is about 10us faster to run the same thing using ATen directly.

* Code Mod

Interpreter -> InterpreterState
Function -> Code

Add other requested comments.

* RegList -> ListHandle<T>

Change the RegList functions to be safer by identifying the type of
each argument list, and checking that list insert does not try
to add to two different lists at once.

* Use exactly equal for interp tests
2017-11-13 22:09:53 -08:00
Sam Gross
4fa94793dd Bump version in master (#3605) 2017-11-11 18:49:19 -05:00
peter
7160fb0801 Fix setup scripts for Windows CUDA builds 2017-11-11 13:05:35 +01:00
Adam Paszke
1f1612ee37 Move _CompiledMixin to C++ 2017-11-10 16:31:44 +01:00
Soumith Chintala
285ce10dbe
fix linking order of nvrtc to force no-as-needed (#3583) 2017-11-08 22:05:09 -05:00
Edward Z. Yang
d2784b6e5b Link ATen against CuDNN when available. (#3582)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-08 20:20:53 -05:00
peterjc123
aa911939a3 Improve Windows Compatibility (for csrc/scripts) (#2941) 2017-11-08 19:51:35 +01:00
Adam Paszke
621fbd5c4e Move flattening/unflattening JIT logic to C 2017-11-06 19:42:44 -05:00
Sam Gross
fde355f7d4
Allow in-place operations on views (#3384)
Allow in-place operations on views

Adds VariableViewImpl, a subclass of VariableImpl which has a pointer to
the base Variable on which it is a view. In-place operations on views
change the grad_fn of the base.

Note that in-place operations only work on views that are the first output of the function that created them. All C++/ATen implemented functions have this behavior, but it's possible to write Python-implemented autograd functions that do not. In-place operations on these view will raise an exception.

Fixes #3313
2017-11-06 18:19:56 -05:00
Zach DeVito
f6dac327df build fixes 2017-11-02 19:53:36 -04:00
Zach DeVito
88d56cc198 fix setup.py paths 2017-11-02 19:53:36 -04:00
Zach DeVito
5aa5b572e4 update build so that all of TH* is in libATen 2017-11-02 19:53:36 -04:00
Sam Gross
afdf50cafe
Move jit/assert.h to csrc/assertions.h (#3442)
I've kept JIT_ASSERT as an alias to TORCH_ASSERT, which we can use throughout the C++ code.
2017-11-02 13:26:51 -04:00
Soumith Chintala
fc7a68d147 fix lint 2017-11-02 07:36:58 -04:00
Soumith Chintala
4108feb27d fix OSX cuda build 2017-11-02 07:15:24 -04:00
Trevor Killeen
0e38d3bbb3 remove thpp library (#3405) 2017-11-01 11:57:09 -04:00
Trevor Killeen
b544882335 ATen in THD (Part I) (#2288)
* enable size from ATen type

* temp commit aten thd

* port copy, math

* port random

* changes after rebase

* lapack bind

* thd and csrc compile

* fix min/max reductions in DataChannelTCP

* clean up changes

* re-enable tensor constructors

* port MPI to at::Tensor

* fix storage methods to not cast to thpp storage ptrs
2017-11-01 09:59:02 -04:00
Edward Z. Yang
d4abaa4b9e Move ONNX broadcast fusion into separate ONNX pass, fixes verbose printing.
This breaks a lot of the onnx-pytorch tests because the abstraction
barriers are not respected.  I'll spin up a patch for that separately.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-01 09:49:53 -04:00
Soumith Chintala
91af122d43 add no-as-needed for THRTC 2017-11-01 04:25:42 -07:00
Soumith Chintala
88d9ebc850
lazy-load nvrtc and libcuda (#3408) 2017-11-01 06:07:03 -04:00
Adam Cécile
a5dbc254f8 if git is not installed at all, no subprocess exception will be raised (#3379) 2017-10-30 18:37:12 -04:00
Edward Z. Yang
40f7f6e095 Improve handling of 'expand' (broadcasting) in JIT and ONNX
The pieces:

- I improved the lint / asserts to catch some bugs which I
  committed while working on my export.  There are two new
  properties which the linter checks now:

    (1) "Anticipated uses".  If a node says that is used by
    M, M better appear later in the topsort.  Previously,
    we only checked if it was in all_nodes.

    (2) If you are a select node, you better be a multi-type node;
    if you're not a select node, you better not be!  And you
    should never have an input that is multi-type.

- There is a new peephole optimization pass, for simple, local
  transformations to graphs.  Right now, it implements a simple
  optimization: remove 'expand' invocations that are no-ops
  (the size before matches the size after), but we can add other
  things to it later.  I needed this for ONNX because no-op expands
  show up in the left-hand argument, which we don't support.

- There is now a broadcast fuser, which fuses ATen expand ops
  into broadcastable ONNX ops (Add, Div, Mul, Pow, Sub, Gemm.)
  It only fuses when the original size is a suffix of the new
  size, as per the ONNX spec.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-29 23:50:34 -04:00
Maxim Berman
7b00adf5d3 Add CUDNN_LIB_DIR in rpath (#3255)
* Add CUDNN_LIB_DIR in link -rpath

* insert CUDNN_LIB_PATH in front of rpath
2017-10-28 00:13:53 -04:00
Adam Paszke
61afb0d519 Autogenerate ATen dispatch for JIT nodes 2017-10-27 02:40:09 +05:30
Sam Gross
67839ce7bc Delete unused Softmax code (#3220)
Softmax and LogSoftmax are automatically bound and dispatched through
VariableType.
2017-10-21 20:51:27 +02:00
Edward Z. Yang
67612cba09 Add -Wno-missing-braces
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-19 23:04:19 -04:00
Sam Gross
f1f64c8d07 Generate autograd functions for NN / more refactors (#3136)
Generate autograd functions for NN and implement more derivatives in derivatives.yaml

A big refactor of gen_variable_type.py
2017-10-19 15:03:26 -04:00
Adam Paszke
98e67448fa Large Softmax and LogSoftmax refactor
- Cleaned up THNN and THCUNN code and kernels
- Improved THCUNN kernel performance 5x, making it match cuDNN performance
- Added support for computing softmax over arbitrary dims
  NOTE: The default dim for 3D inputs is now 1 (used to be 0)
- Both functions now accept inputs with arbitrarily many dimensions
- Autograd functions no longer save the input (it's unnecessary)
- Added cuDNN bindings for softmax, but they are unused as THCUNN
  matches or even exceeds cuDNN performance
2017-10-19 19:51:10 +02:00
Trevor Killeen
dcb457fdd9 add support for using nnpack when installed via conda (#3155)
* add support for using nnpack when installed via conda

* unify nnpack discovery between conda and user
2017-10-18 20:11:13 +02:00
Richard Zou
0f4ae13f05 Better cudnn version checking (#3132) 2017-10-16 20:59:18 +02:00
Richard Zou
1322f9a272 Add cudnn version to torch.version 2017-10-13 23:58:25 +02:00
Francisco Massa
f093545919 Add compiled CUDA version in torch.version.cuda 2017-10-10 10:16:14 -04:00
Soumith Chintala
efe91fb9c1 delete redundant python nccl code 2017-10-09 22:24:18 -04:00
Soumith Chintala
4d62933529 add initial NCCL C bindings 2017-10-09 22:24:18 -04:00
Soumith Chintala
b7e258f81e link specific versioned System NCCL, rather than generic file 2017-10-09 22:24:18 -04:00
Trevor Killeen
029252fb3b NNPACK bindings for Convolution (#2826)
* skeleton commit for building and linking nnpack library in PyTorch

* first stab at conv forward binding + integration

* bind NNPACK gradient kernels

* move nnpack forward, input gradient calls deeper

* nnpack conv api mimics nn

* fix symbol error; use memory across calls

* clean up warnings, add shape checking, thread safety, configurable thread specification

* add batch size threshold, also bind for single-element batch for the future
2017-10-04 13:48:14 -04:00
Adam Paszke
437d3af7bf Add CUDNN_INCLUDE_DIR before CUDA directories in setup.py 2017-10-03 10:06:47 -04:00
Sam Gross
de757805fc Implement some autograd functions using ATen (#2805)
This adds some generated autograd functions implemented in C++, which
are generated from derivatives.yaml. It also generates Python bindings
for the Variable methods. The generated files are:

 Functions.cpp/h: subclasses of torch::autograd::Function
 VariableType.cpp/h: The at::Type for autograd Variables
 python_variable_methods.cpp: Python bindings to torch::autograd::Variable
 python_variable_methods_dispatch.h: wrapper which releases GIL and sets the
     CUDA device
 python_functions.cpp/h: exposes generated autograd functions as Python
     objects

The generated functions are mostly shadowed by the definitions in
variable.py. We'll remove the Python implementations in favor of the
generated C++ implementations in a subsequent commit.
2017-09-26 17:08:00 -04:00
Adam Paszke
b7849662b5 Always regenerate nn wrappers after rebuilding THNN and THCUNN 2017-09-25 23:21:30 -04:00
Adam Paszke
411e1469e0 Add tools for autograd profiling 2017-09-25 23:21:30 -04:00
Soumith Chintala
f4eca7c94d make CUDA_HOME take precedence over all other CUDA detection methods (#2863) 2017-09-25 18:17:40 -04:00
Soumith Chintala
5be06230f9 cleanup external NCCL detection, add NCCL_ROOT_DIR / NCCL_LIB_DIR mechanism 2017-09-25 11:28:59 -04:00
Edward Z. Yang
bf9ab91779 Indicate if the last invocation of setup.py was debug or not.
How to use:

    import torch.version
    print(torch.version.debug)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-22 18:33:47 -04:00
Lu Fang
0a1ac8bfe5 create a cse pass, with very naive support. 2017-09-22 17:06:27 -04:00