Commit Graph

745 Commits

Author SHA1 Message Date
Soumith Chintala
99037d627d
fix OSX cuda build (#3722) 2017-11-15 16:38:18 -05:00
Zachary DeVito
e43ff32192
Add a JIT interpreter (#3634)
* Add a JIT interpreter

The separate interpreter is used to run graphs with lower overhead than
converting them to autograd graphs. Some notes:

* does not support Handles/PythonOp/CppOp, these will be in a future commit
* jit_closure.cpp still exists and we fall back to it for now when we
  cannot handle something because of PythonOp/CppOp
* In order to support retain_graph=True, the interpreter can be cloned,
  creating a copy that can be run with different arguments. This is
  assumed to be the non-standard case so cloning is not particularly optimized.
  No tensor _data_ is copied, but the at::Tensor list in the interpreter is.
  If we hit problems, there is a lot we could do (such as register allocation)
  to minimize the stuff that needs to be copied.
* Uses a pImpl pattern to keep implementation details out of its header file.
* Modifies the way getTensorOp works so that it reads/writes to already-existing
  vectors; this avoids having to realloc these buffers each time.
* Timings are here: https://gist.github.com/zdevito/5a20ac29fb1b9e449e693b67dc478127
  This reduces overhead to about the same as running it in Python.
  Running the same thing directly through ATen is still about 10us faster.

* Code Mod

Interpreter -> InterpreterState
Function -> Code

Add other requested comments.

* RegList -> ListHandle<T>

Change the RegList functions to be safer by identifying the type of
each argument list, and checking that list insert does not try
to add to two different lists at once.

* Use exactly equal for interp tests
2017-11-13 22:09:53 -08:00
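A minimal sketch (not part of the commit) of the retain_graph=True pattern the cloned interpreter above has to support:

    import torch
    from torch.autograd import Variable  # Variable API of this era

    x = Variable(torch.randn(3), requires_grad=True)
    y = (x * x).sum()

    y.backward(retain_graph=True)  # graph state is retained, so it must be re-runnable
    y.backward()                   # the second backward re-runs the same graph
    print(x.grad)                  # gradients from both passes accumulate here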
Sam Gross
4fa94793dd Bump version in master (#3605) 2017-11-11 18:49:19 -05:00
peter
7160fb0801 Fix setup scripts for Windows CUDA builds 2017-11-11 13:05:35 +01:00
Adam Paszke
1f1612ee37 Move _CompiledMixin to C++ 2017-11-10 16:31:44 +01:00
Soumith Chintala
285ce10dbe
fix linking order of nvrtc to force no-as-needed (#3583) 2017-11-08 22:05:09 -05:00
Edward Z. Yang
d2784b6e5b Link ATen against CuDNN when available. (#3582)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-08 20:20:53 -05:00
peterjc123
aa911939a3 Improve Windows Compatibility (for csrc/scripts) (#2941) 2017-11-08 19:51:35 +01:00
Adam Paszke
621fbd5c4e Move flattening/unflattening JIT logic to C 2017-11-06 19:42:44 -05:00
Sam Gross
fde355f7d4
Allow in-place operations on views (#3384)
Allow in-place operations on views

Adds VariableViewImpl, a subclass of VariableImpl which has a pointer to
the base Variable on which it is a view. In-place operations on views
change the grad_fn of the base.

Note that in-place operations only work on views that are the first output of the function that created them. All C++/ATen implemented functions have this behavior, but it's possible to write Python-implemented autograd functions that do not. In-place operations on these views will raise an exception.

Fixes #3313
2017-11-06 18:19:56 -05:00
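A minimal sketch (assumed usage, not from the commit) of the behavior the commit above enables:

    import torch
    from torch.autograd import Variable

    base = Variable(torch.ones(4), requires_grad=True)
    v = base.view(2, 2)   # v is a view; `base` is its base Variable
    v.mul_(2)             # in-place op on the view updates base's grad_fn

    v.sum().backward()
    print(base.grad)      # gradients flow back through the in-place op to the base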
Zach DeVito
f6dac327df build fixes 2017-11-02 19:53:36 -04:00
Zach DeVito
88d56cc198 fix setup.py paths 2017-11-02 19:53:36 -04:00
Zach DeVito
5aa5b572e4 update build so that all of TH* is in libATen 2017-11-02 19:53:36 -04:00
Sam Gross
afdf50cafe
Move jit/assert.h to csrc/assertions.h (#3442)
I've kept JIT_ASSERT as an alias to TORCH_ASSERT, which we can use throughout the C++ code.
2017-11-02 13:26:51 -04:00
Soumith Chintala
fc7a68d147 fix lint 2017-11-02 07:36:58 -04:00
Soumith Chintala
4108feb27d fix OSX cuda build 2017-11-02 07:15:24 -04:00
Trevor Killeen
0e38d3bbb3 remove thpp library (#3405) 2017-11-01 11:57:09 -04:00
Trevor Killeen
b544882335 ATen in THD (Part I) (#2288)
* enable size from ATen type

* temp commit aten thd

* port copy, math

* port random

* changes after rebase

* lapack bind

* thd and csrc compile

* fix min/max reductions in DataChannelTCP

* clean up changes

* re-enable tensor constructors

* port MPI to at::Tensor

* fix storage methods to not cast to thpp storage ptrs
2017-11-01 09:59:02 -04:00
Edward Z. Yang
d4abaa4b9e Move ONNX broadcast fusion into separate ONNX pass, fixes verbose printing.
This breaks a lot of the onnx-pytorch tests because the abstraction
barriers are not respected.  I'll spin up a patch for that separately.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-01 09:49:53 -04:00
Soumith Chintala
91af122d43 add no-as-needed for THRTC 2017-11-01 04:25:42 -07:00
Soumith Chintala
88d9ebc850
lazy-load nvrtc and libcuda (#3408) 2017-11-01 06:07:03 -04:00
Adam Cécile
a5dbc254f8 if git is not installed at all, no subprocess exception will be raised (#3379) 2017-10-30 18:37:12 -04:00
Edward Z. Yang
40f7f6e095 Improve handling of 'expand' (broadcasting) in JIT and ONNX
The pieces:

- I improved the lint / asserts to catch some bugs which I
  committed while working on my export.  There are two new
  properties which the linter checks now:

    (1) "Anticipated uses".  If a node says that is used by
    M, M better appear later in the topsort.  Previously,
    we only checked if it was in all_nodes.

    (2) If you are a select node, you better be a multi-type node;
    if you're not a select node, you better not be!  And you
    should never have an input that is multi-type.

- There is a new peephole optimization pass, for simple, local
  transformations to graphs.  Right now, it implements a simple
  optimization: remove 'expand' invocations that are no-ops
  (the size before matches the size after), but we can add other
  things to it later.  I needed this for ONNX because no-op expands
  show up in the left-hand argument, which we don't support.

- There is now a broadcast fuser, which fuses ATen expand ops
  into broadcastable ONNX ops (Add, Div, Mul, Pow, Sub, Gemm.)
  It only fuses when the original size is a suffix of the new
  size, as per the ONNX spec.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-29 23:50:34 -04:00
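A sketch (hypothetical shapes, not from the commit) of the no-op expand the new peephole pass removes, and of the suffix-broadcast case the fuser handles:

    import torch
    from torch.autograd import Variable

    x = Variable(torch.randn(3, 4))
    y = x.expand(3, 4)    # size before matches size after: a no-op expand,
                          # dropped from the traced graph by the peephole pass
    b = Variable(torch.randn(4))
    z = x + b             # (4,) is a suffix of (3, 4), so this broadcast can be
                          # fused into a broadcastable ONNX Add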
Maxim Berman
7b00adf5d3 Add CUDNN_LIB_DIR in rpath (#3255)
* Add CUDNN_LIB_DIR in link -rpath

* insert CUDNN_LIB_PATH in front of rpath
2017-10-28 00:13:53 -04:00
Adam Paszke
61afb0d519 Autogenerate ATen dispatch for JIT nodes 2017-10-27 02:40:09 +05:30
Sam Gross
67839ce7bc Delete unused Softmax code (#3220)
Softmax and LogSoftmax are automatically bound and dispatched through
VariableType.
2017-10-21 20:51:27 +02:00
Edward Z. Yang
67612cba09 Add -Wno-missing-braces
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-19 23:04:19 -04:00
Sam Gross
f1f64c8d07 Generate autograd functions for NN / more refactors (#3136)
Generate autograd functions for NN and implement more derivatives in derivatives.yaml

A big refactor of gen_variable_type.py
2017-10-19 15:03:26 -04:00
Adam Paszke
98e67448fa Large Softmax and LogSoftmax refactor
- Cleaned up THNN and THCUNN code and kernels
- Improved THCUNN kernel performance 5x, making it match cuDNN performance
- Added support for computing softmax over arbitrary dims
  NOTE: The default dim for 3D inputs is now 1 (used to be 0)
- Both functions now accept inputs with arbitrarily many dimensions
- Autograd functions no longer save the input (it's unnecessary)
- Added cuDNN bindings for softmax, but they are unused as THCUNN
  matches or even exceeds cuDNN performance
2017-10-19 19:51:10 +02:00
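A short sketch (assumed call signature after this refactor) of softmax over an arbitrary dim:

    import torch
    import torch.nn.functional as F
    from torch.autograd import Variable

    x = Variable(torch.randn(2, 3, 4))
    p = F.softmax(x, dim=1)    # softmax over an arbitrary dimension
                               # (per the note above, the default dim for 3D inputs is now 1)
    print(p.sum(dim=1))        # every slice along dim 1 sums to 1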
Trevor Killeen
dcb457fdd9 add support for using nnpack when installed via conda (#3155)
* add support for using nnpack when installed via conda

* unify nnpack discovery between conda and user
2017-10-18 20:11:13 +02:00
Richard Zou
0f4ae13f05 Better cudnn version checking (#3132) 2017-10-16 20:59:18 +02:00
Richard Zou
1322f9a272 Add cudnn version to torch.version 2017-10-13 23:58:25 +02:00
Francisco Massa
f093545919 Add compiled CUDA version in torch.version.cuda 2017-10-10 10:16:14 -04:00
Soumith Chintala
efe91fb9c1 delete redundant python nccl code 2017-10-09 22:24:18 -04:00
Soumith Chintala
4d62933529 add initial NCCL C bindings 2017-10-09 22:24:18 -04:00
Soumith Chintala
b7e258f81e link specific versioned System NCCL, rather than generic file 2017-10-09 22:24:18 -04:00
Trevor Killeen
029252fb3b NNPACK bindings for Convolution (#2826)
* skeleton commit for building and linking nnpack library in PyTorch

* first stab at conv forward binding + integration

* bind NNPACK gradient kernels

* move nnpack forward, input gradient calls deeper

* nnpack conv api mimics nn

* fix symbol error; use memory across calls

* clean up warnings, add shape checking, thread safety, configurable thread specification

* add batch size threshold, also bind for single-element batch for the future
2017-10-04 13:48:14 -04:00
Adam Paszke
437d3af7bf Add CUDNN_INCLUDE_DIR before CUDA directories in setup.py 2017-10-03 10:06:47 -04:00
Sam Gross
de757805fc Implement some autograd functions using ATen (#2805)
This adds some generated autograd functions implemented in C++, which
are generated from derivatives.yaml. It also generates Python bindings
for the Variable methods. The generated files are:

 Functions.cpp/h: subclasses of torch::autograd::Function
 VariableType.cpp/h: The at::Type for autograd Variables
 python_variable_methods.cpp: Python bindings to torch::autograd::Variable
 python_variable_methods_dispatch.h: wrapper which releases GIL and sets the
     CUDA device
 python_functions.cpp/h: exposes generated autograd functions as Python
     objects

The generated functions are mostly shadowed by the definitions in
variable.py. We'll remove the Python implementations in favor of the
generated C++ implementations in a subsequent commit.
2017-09-26 17:08:00 -04:00
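A hedged illustration (not from the commit) of the kind of ops the generated code covers; as noted above, the Python definitions in variable.py may still shadow them at this point:

    import torch
    from torch.autograd import Variable

    a = Variable(torch.randn(3), requires_grad=True)
    b = a.add(1).mul(2)    # ops of this kind get generated C++ autograd Functions
    b.sum().backward()     # with derivatives taken from derivatives.yaml
    print(a.grad)          # == 2 for every element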
Adam Paszke
b7849662b5 Always regenerate nn wrappers after rebuilding THNN and THCUNN 2017-09-25 23:21:30 -04:00
Adam Paszke
411e1469e0 Add tools for autograd profiling 2017-09-25 23:21:30 -04:00
Soumith Chintala
f4eca7c94d make CUDA_HOME take precedence over all other CUDA detection methods (#2863) 2017-09-25 18:17:40 -04:00
Soumith Chintala
5be06230f9 cleanup external NCCL detection, add NCCL_ROOT_DIR / NCCL_LIB_DIR mechanism 2017-09-25 11:28:59 -04:00
Edward Z. Yang
bf9ab91779 Indicate if the last invocation of setup.py was debug or not.
How to use:

    import torch.version
    print(torch.version.debug)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-22 18:33:47 -04:00
Lu Fang
0a1ac8bfe5 create a cse pass, with very naive support. 2017-09-22 17:06:27 -04:00
Edward Z. Yang
670ec4bc59 Split Type into its own header file.
No other substantive changes.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-20 12:24:27 -04:00
Adam Paszke
28828e033f Make certain functions traceable 2017-09-19 10:53:32 -04:00
Adam Paszke
b708b6de8d Add ONNX pass (JIT trace initialization) 2017-09-19 10:53:32 -04:00
Adam Paszke
0e53fe3a41 Put ONNX files where they belong 2017-09-19 10:53:32 -04:00
Adam Paszke
8dae433de8 Move JIT passes to a separate directory 2017-09-19 10:53:32 -04:00
Sam Gross
80d229b0e7 Refactor THPUtils_invalidArguments into separate file 2017-09-13 19:18:02 -04:00
Peter Ruch
0a9f93e43c add env var for python executable 2017-09-13 17:49:08 -04:00
Soumith Chintala
19cfda761c write THD link libraries to text file and read it in setup.py to link dependencies correctly (#2711) 2017-09-12 20:56:36 -04:00
Sam Gross
1290e586fb Use at::Tensor based autograd Variable (#2676)
Variable is now a subclass of at::Tensor backed by a VariableImpl* pImpl. The implementation of the ATen functions is defined in the auto-generated VariableType.h/cpp file.

Currently, only functions which fall through to the base type, such as sizes() and isCuda(), are implemented. Differentiable ops like add() and mul() will be added in a subsequent PR.
2017-09-12 11:36:01 -04:00
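A small sketch of the fall-through behavior the commit above mentions (only such pass-through queries are wired up at this point):

    import torch
    from torch.autograd import Variable

    v = Variable(torch.randn(2, 3))
    print(v.size())   # served by the underlying ATen-backed VariableImpl
    print(v.dim())    # likewise a pass-through query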
Soumith Chintala
cf2c7ca998 add THPP linkage when building THD (#2687) 2017-09-11 08:53:38 -04:00
Edward Z. Yang
459cc5a346 Check for nanopb and pybind11 submodules as well. (#2660)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-07 13:24:31 -04:00
Soumith Chintala
84095f9512 add linux guard 2017-09-07 11:57:49 -04:00
Soumith Chintala
894c05fd22 fix static linkage and make THD statically linked 2017-09-07 11:54:18 -04:00
Zach DeVito
6d8d5bab4c Codemod Toffee -> ONNX, toffee -> onnx. Change file names to match 2017-09-06 13:45:39 -04:00
Edward Z. Yang
d59714e3b1 Code review comment changes.
- Reduce setup.py diff.
- Expunge WITH_TOFFEE from codebase.
- Elaborate on a comment.
- Move gen_toffee.sh to tools
- Delete densenet test.
- Use 'using' to inherit a constructor.
- Delete outdated comment.
- Comment about why primspecs can return fewer outputs.
- Remove dead, commented out includes.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
7ac6d67a4e Add nanopb to list of dep_libs.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Adam Paszke
594f98ce16 Support multi-stage AutogradClosures 2017-09-05 17:48:55 -04:00
Edward Z. Yang
605ef38831 Explicitly override CMAKE_DEBUG_POSTFIX for nanopb build.
If it's not set explicitly, CMAKE_DEBUG_POSTFIX defaults to 'd', which means the
static library gets a different name when built in debug mode.
This is annoying because if you build in debug mode, the
library ends up in a different place.  Rather than teach the build system
to find the correct name, just set this POSTFIX so names don't change.

Also, update setup.py to look for the non-debug archive.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
de6ef65be5 Port to nanopb.
General strategy:
- nanopb is statically linked into PyTorch.  It must be built
  with -fPIC.
- Generated nanopb files for toffee.proto are checked into
  our repo.
- Because nanopb generated protobufs are C only, we wrote a
  wrapper around it to give a Google C++ style interface.
  More on this shortly.

How does the wrapper work?
- It's called "micropb" because it is less small than nanopb :)
- nanopb requires all variable-length fields to be written out
  using a "callbacks" mechanism.
- We wrote pre-canned callbacks for all of the types ToffeeIR
  writes out and lists; these are micropb_callback and
  micropb_callback_list.  These operate simply by dynamically
  allocating and storing the data to be written out in
  data (this defeats the purpose of the callback mechanism,
  but it's easy to implement)
- Finally some boilerplate to actually implement the wrapper
  classes and have owning pointers to the actual data.

Testing strategy:
- Take the serialized protobuf from nanopb, parse it again
  with ToffeeIR and print it.  Worked with all of test_jit.py!
  These tests don't run without 'toffee' being installed.

TODO:
- Update CI to install ToffeeIR, so we can run the Toffee tests
  in CI
- Update E2E with Caffe2 tests so that they work with new stuff.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Zach DeVito
a3fdb281d1 Python wrapper for Node IR using pybind11
Supports almost all of the IR API.
2017-09-05 17:48:55 -04:00
Adam Paszke
fa308b3183 Improve backward tracing 2017-09-05 17:48:55 -04:00
Zach DeVito
57b7370aab switch NodeKind over to Symbol type. 2017-09-05 17:48:55 -04:00
Zach DeVito
d7d74428a3 batchnorm hacking 2017-09-05 17:48:55 -04:00
Edward Z. Yang
db79be82ab Move Toffee for C++ functions back to autograd.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
e1b345d81b More alexnet things as primspec.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
6f6fe177f1 Make Toffee optional. Unbreaks CI.
The general strategy:

- We put all the toffee files in torch/csrc/toffee; they will only be
  added when toffee is enabled

- Toffee is enabled if torch/lib/ToffeeIR is present (since we
  don't have a submodule/subtree thing going on)

- The most prevalent place you will need to use WITH_TOFFEE is for
  primspec definitions on C++ autograd functions.  There is a
  macro HAS_PRIMSPEC to ameliorate optionally defining primspec()
  virtual overrides on Function classes.  HasPrimspec is always
  available but will be a zero field class when Toffee is disabled.

NB: We might revert this commit in the future if we figure out a way
to unconditionally enable Toffee that everyone likes.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
4b1f182199 Disable C++ Python conversion code.
We want all the conversion code to live in one place. Away it goes!

This means that alexnet protobuf no longer works.  It will start working
again when we port changes.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
dd58b145c3 Toffee graph exporting for PyTorch.
This commit adds a new exporter pass which takes a graph and returns
a string of the human-readable protobuf representation of a model.

We have two strategies for how conversions are implemented:

- If a Python autograd function has a primspec static method, we invoke
  it to get the Toffee conversion.  Use torch.toffee.op to generate the
  format expected to be returned.  The particular data representation is opaque
  and subject to change in the future.

- Otherwise, there's a giant if statement in the exporter, which manually
  uses the JIT IR C++ API and Toffee IR C++ protobuf API to convert.

You must check out a copy of the ToffeeIR repo
https://github.com/ProjectToffee/ToffeeIR at torch/lib; at the moment
we don't have a subtree/submodule set up.

Technical debt in this commit:

- To get protobuf headers in scope, we unconditionally add $CONDA_PREFIX/include
  to the include path.  This needs to be replaced with a more robust mechanism.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Adam Paszke
7f60a18293 Add initial support for backward tracing 2017-09-05 17:48:55 -04:00
Adam Paszke
1c4538e017 Trace C functions 2017-09-05 17:48:55 -04:00
Adam Paszke
233a66dcbe Remove SimpleMap from JIT IR 2017-09-05 17:48:55 -04:00
Zach DeVito
f5e414862a cuda guards for fusion compiler 2017-09-05 17:48:55 -04:00
Zach DeVito
50e51eaa7f Fusion of simple map operations using nvrtc.
The approach is based on THC's pointwiseApply{1,2,3} family of kernels,
but doesn't have any dependencies on that code.

Adjacent contiguous dimensions of input tensors are compressed to reduce the complexity of indexing math.
For the completely contiguous case, the indexing logic simplifies to just the linear index.

In simple tests, this code matched or beat the equivalent from THC.
2017-09-05 17:48:55 -04:00
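A sketch (hypothetical shapes; needs a CUDA device) of the simple map operations the fuser above targets:

    import torch
    from torch.autograd import Variable

    x = Variable(torch.randn(1024).cuda())
    y = Variable(torch.randn(1024).cuda())
    # A chain of pointwise ops like this is the pattern the fusion compiler can
    # compile into a single kernel via NVRTC instead of one kernel launch per op.
    z = (x * y + x).sigmoid()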
Adam Paszke
f270973937 Add JIT IR -> Autograd IR converter 2017-09-05 17:48:55 -04:00
Zach DeVito
48945a435d IR modifications to make mutation possible. Nodes are kept in an intrusive doubly-linked list. Methods added to manipulate inputs etc. 2017-09-05 17:48:55 -04:00
Edward Z. Yang
8215860d2f Add an assert wrapper for easy porting.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Adam Paszke
ea05ac8f41 Move JIT-related files to jit dir. Remove IR interpreter 2017-09-05 17:48:55 -04:00
Zach DeVito
1325fa511c JIT IR including use-def chains and updated comments. 2017-09-05 17:48:55 -04:00
Edward Z. Yang
a797ab9343 Rewrite AST to a new, more functional representation.
Previously, our AST was a DAG, where shared Nodes indicated a computation
should be reused.  This commit rewrites the IR into a new functional
representation which represents sharing explicitly using variable
bindings.

We offer a few justifications for this new style:

1. The new representation is not all that different from the
old one; it is about as easy to construct, and the lack of an
explicit graph doesn't negatively impact our ability to interpret
the graph, since we've chosen, as a matter of design, to NOT have
the IR participate in the actual execution of a graph.

2. The new let-binding representation has an implicit ordering,
which we can use to conveniently keep track of the original order
in which the trace was recorded.  This automatically gives us a topsort,
and an easier-to-read textual representation of our
IR:

  %14 = Embedding %11, %0, -1, None, 2, False, False
  %15 = Dropout %14, 0.2, True, False
  %16 = Index %12, 0
  %17 = Index %12, 1
  %18 = Index %13, 0
  %19 = Index %13, 1
  %20 = Index %15, 0
  %21 = Linear %20, %1, %3
  %22 = Linear %16, %2, %4

3. It moves us closer to a Futhark style language
(http://futhark-lang.org/publications/pldi17.pdf).

Major aspects of the diff

- Node is replaced with Expr and Arg, a pair of mutually recursive
  structures which represent our new language.  In BNF, the language
  looks like this:

    a ::= c | %i
    e ::= %i, ... = e
        | PyOp e, ...
        | Ret %i, ...

  Technically, Ret is not actually a return (no control flow is involved);
  it just tuples up a series of tensors (identified by variables).

  One important invariant is that locals are always tensors; they
  are never constants (this is asymmetric with Args.)

- Arguments support Python constants.  This is an important piece because
  many operators take extra Python literals like integers and tuples in
  order to specify extra parameters about how an operator operates.  Adding
  this was essential to getting word_language_model to work.

- As both Expr and Arg have multiple variants, there is new infrastructure
  for doing case analysis on the variants using ExprVisitor and ArgVisitor.  The
  strategy here is adapted from WebAssembly's visitors, although we have
  generalized to permit arbitrary argument forwarding, which is necessary
  to support tail-recursive visitor calls.  TCO is important because our
  interpreter may recurse arbitrarily deep into a stack of nested lets.
  If users wish, they can also manually case on the type tag.

- Tracing is now turned on and off using _tracer_enter/_tracer_exit in
  torch._C.  _tracer_enter accepts a list of variables which are to be
  treated as arguments; _tracer_exit accepts the list of traced variables
  which should be returned when you reexecute the trace, and returns
  the trace expression which can be reexecuted.  GlobalTracingState
  is a global variable which tracks whether or not we are tracing.

- You use run_forward to execute a trace on some set of parameters.

- While tracing, variables keep track, via trace_local, of the
  names of their corresponding variables in the IR.

Here is a simple runner which leaks memory but can be used to JIT models:

  import torch.autograd.function as F
  import torch._C
  from torch.autograd import Variable  # needed for Variable._execution_engine below

  def jit(model):
      import types
      real_forward = model.forward
      def forward(self, *args):
          def flatten(x):
              return tuple(F._iter_variables(x))
          if not hasattr(self, "saved_trace"):
              torch._C._tracer_enter(tuple(self.parameters()) + flatten(args))
              out = real_forward(*args)
              self.saved_trace = torch._C._tracer_exit(flatten(out))
              self.saved_outs = out
              return out
          else:
              flat_out = Variable._execution_engine.run_forward(self.saved_trace, tuple(self.parameters()) + flatten(args))
              return F._unflatten(flat_out, self.saved_outs)

Major problems:

- Sanity checking is spotty at best, especially when users pass in variables.

- The interpreter leaks tensor memory from the store.  When we add back def-use
  we should be able to deallocate tensors as soon as we know they are no longer
  necessary.

- The interpreter needs to reach feature parity with the old execution engine.
  From there, we need to see if backwards can be subsumed as well.

- I still have no confidence that memory is managed correctly everywhere.
  This requires a close look.

- Rather than return an *open* expression as a trace, we should return a
  *lambda* instead, which knows about how many formal parameters it
  requires.

- The IR is not introspectable from Python at the moment, but this is simply a
  matter of implementing all the binding code.

- The tracer is NOT reentrant (you can't trace while you're inside a trace.)
  Furthermore, no sanity checking is done if you try to incorrectly reuse
  things from one trace in another.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
e1b7872fc2 Make it possible to access IR from Python.
Also, add a new trace_fn field to attach forward IR to Variables.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
c5faaf69d8 Initial IR representation for forward trace.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
hongyi-zhang
bf013f4c99 fix Python 2 gloo install (#2597) 2017-09-02 20:05:37 -04:00
Edward Z. Yang
a03e5cb409 Remind users to submodule update.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-08-30 16:14:38 -04:00
Sam Gross
966fdbd93a Add commands to re-build individual libraries. (#2506)
When working on a PyTorch dependency we often want to rebuild only that
dependency and the Python extension. You can now do that by running:

  python setup.py build_thc

to only re-build THC
2017-08-23 07:16:05 -04:00
Thomas Viehmann
7c04f11d88 search for ldconfig in /sbin for nccl detection (#2276) 2017-08-03 05:32:21 +05:30
Zachary DeVito
43c944acbd Remove dead THPP code that has been replaced with ATen objects. (#2235)
THPP usage is now isolated in THD.
2017-07-29 08:07:41 +05:30
Trevor Killeen
c304d04fc6 Replace thpp::Tensor with ATen Tensor in autograd csrc (#2170) 2017-07-28 10:18:37 -04:00
Soumith Chintala
ea6f9a26b8 fix version number 2017-07-20 13:30:53 -04:00
Soumith Chintala
09abaa2189 make keepdim backcompat warnings emit in autograd as well (#2157) 2017-07-20 01:48:05 -04:00
Soumith Chintala
a5c2546c0f version bump 2017-07-19 12:34:43 -07:00
Soumith Chintala
b660303a16 Static linking against libstdc++ in Binary Build mode 2017-07-19 12:19:36 -04:00
Soumith Chintala
169ca67a4e Adding Spatial Transformers w/CuDNN support 2017-07-12 14:32:06 -04:00
Zach DeVito
ab3d85c410 add build commands for ATen 2017-07-11 10:35:03 -04:00
Trevor Killeen
6df23b418d mark tools as excluded in find_packages (#1915) 2017-06-29 13:49:56 -04:00
Trevor Killeen
cb4eaa9c5d TensorLib/Aten --> changes required in pytorch 2017-06-22 12:55:55 -04:00
gchanan
a64560c22e Remove flattening for torch.dot (#1781) 2017-06-16 02:15:33 +02:00
Edward Z. Yang
3ada9da808 Make csrc -Werror clean. (#1795)
Primary things I had to fix:

- Suppress _XOPEN_SOURCE warnings by ensuring that Python.h is included
  first, because it always unconditionally defines this macro.

- Turn off strict aliasing, because Python 2 doesn't work with strict
  aliasing.

- Work around a setuptools bug, where it incorrectly passes
  -Wstrict-prototypes to C++ compilers (where this doesn't make
  any sense)

To compile csrc with -Werror, run `CFLAGS="-Werror" python setup.py build_ext`

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 20:18:09 -04:00
Adam Paszke
714351ff39 Officially enable process-group mode 2017-06-12 22:02:11 -04:00
Gregory Chanan
65b23f146e Add broadcasting support for copy_, simplify code generation by moving a lot of currently generated code to expand_utils. 2017-06-11 05:37:59 -04:00
Gregory Chanan
6a40acb4f0 Add Broadcast plugin. 2017-06-11 05:37:59 -04:00
Edward Z. Yang
ba690d5607 Add support for NVTX functions. (#1748) 2017-06-10 18:26:58 +02:00
Adam Paszke
8ea7c87c29 Improve init methods 2017-06-02 23:42:11 +02:00
Adam Paszke
702a2e3bc5 Make Variables not subclass Function anymore
Because of this, Variables can no longer appear in the graph.
Every usage of a leaf Variable will leave an AccumulateGrad
function that has no outputs, but modifies var.grad as a side
effect.
2017-05-01 16:44:56 -04:00
Adam Paszke
2ca787fcf4 Refactor attribute names in autograd 2017-05-01 16:44:56 -04:00
Soumith Chintala
2197e4c766 version bump 2017-05-01 15:54:52 -04:00
Adam Paszke
9169f60a84 Parallelize TensorMethods.cpp builds (#1400) 2017-04-29 09:07:21 -04:00
Soumith Chintala
24e5a9057e Revert "Parallelize TensorMethods.cpp builds (#1364)" (#1390)
This reverts commit 060048bcd8.
2017-04-28 07:59:40 -04:00
Adam Paszke
060048bcd8 Parallelize TensorMethods.cpp builds (#1364) 2017-04-28 07:45:21 -04:00
albanD
f0c7124420 Allow support for negative dimension argument for all functions 2017-04-06 16:37:00 -07:00
Soumith Chintala
1c391f6f93 bump version 2017-03-29 10:08:34 -04:00
Sam Gross
b9379cfab7 Use cuDNN and NCCL symbols from _C library (#1017)
This ensures that we use the same library at the C++ level and with
Python ctypes. It moves the search for the correct library from
run time to compile time.
2017-03-16 16:10:17 -04:00
Low Kian Seong
2f5c215d34 Update setup.py (#981)
Adding `description` to `setup.py`
2017-03-11 12:14:07 -05:00
Sam Gross
15a9fbdedb Merge pull request #881 from colesbury/parallelize_backwards
Parallelize autograd backwards
2017-03-06 16:57:19 -05:00
soumith
76f7d749e4 bump version 2017-03-05 08:49:52 -08:00
Sam Gross
34ce58c909 Parallelize backwards 2017-03-03 11:26:00 -08:00
Adam Paszke
0db9c63300 Use library_dirs in setup.py 2017-02-20 23:28:31 -08:00
Adam Paszke
1bdc28161a Add torch.__version__ 2017-02-17 10:40:08 +05:30
Dr. Kashif Rasul
8d90ab2d9b compile with cudart (#737) 2017-02-14 06:40:35 +05:30
Sam Gross
bd5303010d Refactor autograd package to separate Python dependencies. (#662)
The core autograd Variable, Function, and Engine no longer depend on the
Python API. This lets us implement functions in C++. In the future, we
can also multithread the engine and release the GIL for most of the
non-Python backwards.
2017-02-13 16:00:16 -08:00
Sam Gross
f8fb25e0a2 Add generic bindings to THNN and THCUNN (#645)
Adds bindings using thpp::Tensor to THNN and THCUNN. This allows calling
into those APIs without knowing the concrete types of the tensor
arguments.
2017-01-31 13:23:02 -05:00
Adam Paszke
79232c24e2 Fixes after rebase 2017-01-31 01:58:09 +01:00
Janusz Marcinkiewicz
76520512e7 DataChannel tests rewrite (#42); DataChannel isend and irecv implementation (#44) 2017-01-31 01:58:09 +01:00
Adam Paszke
60d1852c7b Major improvements to master-worker mode
* Fixed all undefined symbol errors
* Implemented storage interface and THStorage class
* RPC improvements
* Code refactor
2017-01-31 01:58:09 +01:00
Filip Binkiewicz
9fc3c5e4d2 THDTensor constructors implemented + some minor fixes 2017-01-31 01:58:09 +01:00
Adam Paszke
55632d81d2 Add Python wrappers for process group mode 2017-01-31 01:58:09 +01:00
Adam Paszke
9c411513bf Patch distutils crash when linking with ccache 2017-01-28 00:28:33 +01:00
Luke Yeager
2ad967dbe4 Fix pep8 in setup.py with "autopep8 -i setup.py" 2017-01-25 22:23:22 -05:00
Sam Gross
c9db9c2317 Add C++ tensor library (from THD fork) (#526) 2017-01-20 15:23:34 -05:00
Sam Gross
9302f860ae Remove unused file TensorDocstrings.cpp (#481)
Tensor docstrings are created in _tensor_docs.py
2017-01-18 13:34:40 -05:00
soumith
57a2ccf777 PYTORCH_BUILD_VERSION to setup.py 2017-01-17 17:51:16 -08:00
soumith
e4812b3903 add binary version to setup.py 2017-01-17 14:14:01 -08:00
Sam Gross
fd92470e23 Add cuDNN bindings for BatchNorm (#421) 2017-01-07 15:35:24 -05:00
Zeming Lin
59d66e6963 Sparse Library (#333) 2017-01-05 00:43:41 +01:00
Soumith Chintala
6a2785aef7 remove link_prefix from linker arguments (#395) 2017-01-02 12:37:52 -05:00
Soumith Chintala
b650a45b9c fix botched merge in setup.py 2016-12-31 16:55:53 -05:00
Soumith Chintala
b5dc36f278 explicitly linking against v1 libs to avoid lua-torch conflicts (#386) 2016-12-31 10:30:36 -05:00
Adam Paszke
08d346df9c Print libraries used for building the extension 2016-12-15 00:47:55 +01:00
Adam Paszke
28f0cf6cee Add docstring support to cwrap (#295) 2016-12-11 23:25:14 +01:00
Adam Paszke
cb849524f3 Improve cuDNN detection at build time 2016-12-01 23:14:41 +01:00
Adam Paszke
ebc70f7919 Look for libcudart in default CUDA installation paths (#195) 2016-11-02 19:36:10 -04:00
Adam Paszke
ef557761dd Allow to not use all function outputs in autograd 2016-10-31 22:47:09 +01:00
Sam Gross
ad5fdef6ac Make every user-visible Tensor have a Storage (#179) 2016-10-31 12:12:22 -04:00
Sam Gross
f2d7e94948 Use torch.Size for Tensor sizes and tuple for strides
See issue #20

The torch.Size class is a tuple subclass which distinguishes sizes from
other tuples so that torch.Tensor(size) is interpreted as size instead
of data.
2016-10-28 19:37:09 +02:00
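A short illustration of the distinction the commit above describes:

    import torch

    t = torch.Tensor(2, 3)                 # constructor args interpreted as a size
    print(t.size())                        # torch.Size([2, 3])
    print(isinstance(t.size(), tuple))     # True: torch.Size is a tuple subclass
    print(t.stride())                      # strides stay a plain tuple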
Sam Gross
ad2d413c0b Add C++ bindings for cuDNN (#167)
The overhead of the Python ctypes bindings was high enough that it slowed
down multi-GPU training when using 4+ Maxwell GPUs.
2016-10-26 19:51:48 -04:00
Soumith Chintala
140c65e52b fixing python setup.py clean 2016-10-21 23:20:02 -04:00
Sam Gross
79ead42ade Add CUDA Stream and Event API (#133) 2016-10-18 12:15:57 -04:00
Adam Paszke
0325e2f646 Major autograd refactor
Improves autograd performance by more than 2x and fixes a couple
of bugs. All core functions have been moved to C.
2016-10-13 17:17:49 -07:00
Adam Paszke
2acee24332 Add keyword argument support to most tensor functions 2016-10-13 12:32:04 -04:00
Adam Paszke
96f61bff30 Add LAPACK functions 2016-10-08 20:37:37 -07:00
Sam Gross
e8a5f00866 Auto GPU for CUNN (#71) 2016-09-30 14:04:53 -04:00
Adam Paszke
941cf4e63d Add ffi utils for user C extensions 2016-09-29 09:35:56 -07:00
Sam Gross
cb5d4e836f Lazy load CUDA and THNN modules (#64) 2016-09-28 19:29:53 -04:00
Adam Paszke
52ed57352a Free GIL in C functions 2016-09-27 15:22:20 -07:00
Soumith Chintala
1cf87e8a0b OSX + Python 2 build fixes 2016-09-25 19:26:13 -04:00
Adam Paszke
ddf1598ef8 Add a method for catching exceptions thrown in ctypes 2016-09-25 12:25:54 -07:00
Adam Paszke
06ab3f962f Refactor _C extension to export some utilities 2016-09-21 08:36:54 -07:00
soumith
65d4055366 adding static linking on binary builds 2016-09-13 10:34:13 -07:00
Sam Gross
1486d880b0 Add Storage.from_buffer
from_buffer is similar to numpy's frombuffer. It decodes a Python
buffer object into a Storage object. For byte and char storages, it
simply copies the bytes.
2016-09-07 15:32:33 -07:00
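A small usage sketch of the new API:

    import torch

    s = torch.ByteStorage.from_buffer(b"hello")  # copies the raw bytes into a Storage
    print(len(s))    # 5
    print(s[0])      # 104, the byte value of 'h'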
Soumith Chintala
4cffa2219a build fixes for OSX 2016-09-06 22:06:06 -04:00
Adam Paszke
f9d186d33a Add initial version of multiprocessing module 2016-08-31 19:46:08 -07:00
Adam Paszke
686e8d32e2 Add torch.save and torch.load 2016-08-23 07:51:55 -07:00
Adam Paszke
8d933cbfc4 Fixes for OS X 2016-08-22 22:45:35 -04:00
Adam Paszke
4c51a523c8 Add super basic CUDA autodetection 2016-08-19 14:23:53 -07:00
Adam Paszke
b06c000478 Fix <3.5 compatibility and travis configuration 2016-08-16 21:11:10 -07:00
Adam Paszke
207d6ae60d Override build commands in setup.py 2016-08-14 20:47:27 -07:00
Adam Paszke
1902bc0bfb Interface with numpy 2016-08-13 20:19:17 -07:00
Adam Paszke
9fff8e7392 Fixes for changes in libs 2016-08-12 22:02:57 -07:00
Adam Paszke
ef7364b80e Fix Python 2.7 compatibility 2016-08-12 18:26:10 -07:00
Adam Paszke
12bed8dc0d Add CUDA device selection 2016-08-12 07:46:46 -07:00
Adam Paszke
e9f9fd3727 Major refactor 2016-08-10 09:24:53 -07:00
Adam Paszke
652a31b714 Add build scripts for libraries 2016-08-04 14:12:31 -07:00
Adam Paszke
6df0ae5d35 Add cunn 2016-08-02 09:20:18 -07:00
Adam Paszke
2f342af22f Move optim to legacy 2016-08-01 12:01:46 -04:00
Adam Paszke
ae40bcd58c Base for nn conversion 2016-07-22 22:21:29 -04:00
Adam Paszke
554a1d8336 Add optim 2016-07-21 16:42:06 -04:00
Adam Paszke
bc7bd7a8b3 Add unit tests and fix detected bugs 2016-07-21 13:46:59 -04:00
Adam Paszke
3a44259b32 Add support for CUDA 2016-07-19 10:45:59 -04:00
Adam Paszke
cf90bee8af Enable parallel builds 2016-07-18 23:56:50 -04:00
Adam Paszke
3cec305524 Restructure python code 2016-06-23 22:55:05 +02:00
Adam Paszke
077bfbde03 Add all constructors for Tensor and Storage 2016-06-19 23:45:41 +02:00
Adam Paszke
4f66ea42af Add random-related Tensor methods 2016-06-18 21:36:10 +02:00
Soumith Chintala
5ee3358a92 python 2 support 2016-06-08 19:14:57 -04:00
Adam Paszke
449ac4ca2a Add torch.* functions 2016-05-09 19:14:40 +02:00
Adam Paszke
7567a0bb13 Add cwrap 2016-05-07 15:28:13 +02:00
Adam Paszke
c3b3df9f22 Add utilities and clean up Tensor wrappers 2016-05-06 15:04:57 +02:00
Adam Paszke
842e1b6358 Add exception handling 2016-05-05 20:58:13 +02:00
Adam Paszke
f4b3554d9e Refactor generic/Tensor.c and add Short objects 2016-05-03 21:20:54 +02:00
Adam Paszke
690d470c71 Add Storage.py template 2016-05-03 15:13:12 +02:00
Adam Paszke
b0d90e3688 Add templated __init__ 2016-05-02 23:54:59 +02:00
Adam Paszke
731041cb6a Initial commit 2016-05-02 23:19:57 +02:00