Commit Graph

133 Commits

Author SHA1 Message Date
Edward Z. Yang
605ef38831 Explicitly override CMAKE_DEBUG_POSTFIX for nanopb build.
If it's not set, CMAKE_DEBUG_POSTFIX sets it to 'd' which means the
static library gets named something different when built in debug mode.
This is annoying because it means if you build in debug mode, the
library is in a different place.  Rather than teach the build system
to find the correct name, just set this POSTFIX so names don't change.

Also, update setup.py to look for the non-debug archive.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
de6ef65be5 Port to nanopb.
General strategy:
- nanopb is statically linked into PyTorch.  It must be built
  with -fPIC.
- Generated nanopb files for toffee.proto are checked into
  our repo.
- Because nanopb generated protobufs are C only, we wrote a
  wrapper around it to give a Google C++ style interface.
  More on this shortly.

How does the wrapper work?
- It's called "micropb" becaues it is less small than nanopb :)
- nanopb requires all variable-length fields to be written out
  using a "callbacks" mechanism.
- We wrote pre-canned callbacks for all of the types ToffeeIR
  writes out and lists; these are micropb_callback and
  micropb_callback_list.  These operate simply by dynamically
  allocating and storing the data to be written out in
  data (this defeats the purpose of the callback mechanism,
  but it's easy to implement)
- Finally some boilerplate to actually implement the wrapper
  classes and have owning pointers to the actual data.

Testing strategy:
- Take the serialized protobuf from nanopb, parse it again
  with ToffeeIR and print it.  Worked with all of test_jit.py!
  These tests don't run without 'toffee' being installed.

TODO:
- Update CI to install ToffeeIR, so we can run the Toffee tests
  in CI
- Update E2E with Caffe2 tests so that they work with new stuff.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Zach DeVito
a3fdb281d1 Python wrapper for Node IR using pybind11
Supports almost all of the IR API.
2017-09-05 17:48:55 -04:00
Adam Paszke
fa308b3183 Improve backward tracing 2017-09-05 17:48:55 -04:00
Zach DeVito
57b7370aab switch NodeKind over to Symbol type. 2017-09-05 17:48:55 -04:00
Zach DeVito
d7d74428a3 batchnorm hacking 2017-09-05 17:48:55 -04:00
Edward Z. Yang
db79be82ab Move Toffee for C++ functions back to autograd.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
e1b345d81b More alexnet things as primspec.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
6f6fe177f1 Make Toffee optional. Unbreaks CI.
The general strategy:

- We put all the toffee files in torch/csrc/toffee; they will only be
  added when toffee is enabled

- Toffee is enabled if torch/lib/ToffeeIR is present (since we
  don't have a submodule/subtree thing going on)

- The most prevalant place you will need to use WITH_TOFFEE is for
  primspec definitions on C++ autograd functions.  There is a
  macro HAS_PRIMSPEC to ameliorate optionally defining primspec()
  virtual overrides on Function classes.  HasPrimspec is always
  available but will be a zero field class when Toffee is disabled.

NB: We might revert this commit in the future if we figure out a way
to unconditionally enable Toffee that everyone likes.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
4b1f182199 Disable C++ Python conversion code.
We want all the conversion code to live in one place. Away it goes!

This means that alexnet protobuf no longer works.  It will start working
again when we port changes.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
dd58b145c3 Toffee graph exporting for PyTorch.
This commit adds a new exporter pass which takes a graph and returns
a string of the human-readable protobuf representation of a model.

We have two strategies for how conversions are implemented:

- If a Python autograd function has a primspec static method, we invoke
  it to get the Toffee conversion.  Use torch.toffee.op to generate the
  format expected to be returned.  The particular data representation is opaque
  and subject to change in the future.

- Otherwise, there's a giant if statement in the exporter, which manually
  uses the JIT IR C++ API and Toffee IR C++ protobuf API to convert.

You must check out a copy of the ToffeeIR repo
https://github.com/ProjectToffee/ToffeeIR at torch/lib; at the moment
we don't have a subtree/submodule set up.

Technical debt in this commit:

- To get protobuf headers in scope, we unconditionally add $CONDA_PREFIX/include
  to the include path.  This needs to be replaced with a more robust mechanism.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Adam Paszke
7f60a18293 Add initial support for backward tracing 2017-09-05 17:48:55 -04:00
Adam Paszke
1c4538e017 Trace C functions 2017-09-05 17:48:55 -04:00
Adam Paszke
233a66dcbe Remove SimpleMap from JIT IR 2017-09-05 17:48:55 -04:00
Zach DeVito
f5e414862a cuda guards for fusion compiler 2017-09-05 17:48:55 -04:00
Zach DeVito
50e51eaa7f Fusion of simple map operations using nvrtc.
Approach is based on the approach of THC's pointwiseApply{1,2,3} family of kernels,
but doesn't have any dependencies on that code.

Adjacent contiguous dimensions of input tensors are compressed to reduce the complexity of indexing math.
For the completely contiguous case, the indexing logic simplifies to just the linear index.

In simple tests, this code matched or beat the equivalent from THC.
2017-09-05 17:48:55 -04:00
Adam Paszke
f270973937 Add JIT IR -> Autograd IR converter 2017-09-05 17:48:55 -04:00
Zach DeVito
48945a435d IR modifications to make mutatation possible. Nodes are in intrusive doubly-linked list. Methods added to manipulate inputs etc. 2017-09-05 17:48:55 -04:00
Edward Z. Yang
8215860d2f Add an assert wrapper for easy porting.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Adam Paszke
ea05ac8f41 Move JIT-related files to jit dir. Remove IR interpreter 2017-09-05 17:48:55 -04:00
Zach DeVito
1325fa511c JIT IR including use-def chains and updated comments. 2017-09-05 17:48:55 -04:00
Edward Z. Yang
a797ab9343 Rewrite AST to a new, more functional representation.
Previously, our AST was a DAG, where shared Nodes indicated a computation
should be reused.  This commit rewrites the IR into a new functional
representation which represents sharing explicitly using variable
bindings.

We offer a few justifications for this new style:

1. The new representation is not all that different from the
old one; it is about as easy to construct, and the lack of an
explicit graph doesn't negatively impact our ability to interpret
the graph, since we've chosen, as a matter of design, to NOT have
the IR participate in the actual execution of a graph.

2. The new let-binding representation has an implicit ordering,
which we can use to conveniently keep track of the original order
the trace showed up as.  This automatically gives us a topsort,
and gives us an easier to read textual representation of our
IR:

  %14 = Embedding %11, %0, -1, None, 2, False, False
  %15 = Dropout %14, 0.2, True, False
  %16 = Index %12, 0
  %17 = Index %12, 1
  %18 = Index %13, 0
  %19 = Index %13, 1
  %20 = Index %15, 0
  %21 = Linear %20, %1, %3
  %22 = Linear %16, %2, %4

3. It moves us closer to a Futhark style language
(http://futhark-lang.org/publications/pldi17.pdf).

Major aspects of the diff

- Node is replaced with Expr and Arg, a pair of mutually recursive
  structures which represent our new language.  In BNF, the language
  looks like this:

    a ::= c | %i
    e ::= %i, ... = e
        | PyOp e, ...
        | Ret %i, ...

  Technically, Ret is not actually a return (no control flow is involved),
  it just tuples up a series of tensors (identified by variables).

  One important invariant is that locals are always tensors; they
  are never constants (this is asymmetric with Args.)

- Arguments support Python constants.  This is an important piece because
  many operators take extra Python literals like integers and tuples in
  order to specify extra parameters about how an operator operates.  Adding
  this was essential to getting word_language_model to work.

- As both Expr and Arg have multiple variants, there is new infrastructure
  for doing case on the variants using ExprVisitor and ArgVisitor.  The
  strategy here is adapted from WebAssembly's visitors, although we have
  generalized to permit arbitrary argument forwarding, which is necessary
  to support tail-recursive visitor calls.  TCO is important because our
  interpreter may recurse arbitrarily deep into a stack of nested lets.
  If users wish, they can also manually case on the type tag.

- Tracing is now turned on and off using _tracer_enter/_tracer_exit in
  torch._C.  _tracer_enter accepts a list of variables which are to be
  treated as arguments; _tracer_exit accepts the list of traced variables
  which should be returned when you reexecute the trace, and returns
  the trace expression which can be reexecuted.  GlobalTracingState
  is a global variable which tracks whether or not we are tracing or not.

- You use run_forward to execute a trace on some set of parameters.

- When under tracing, variables keep track, via trace_local, what the
  name of their variables in the IR are.

Here is a simple runner which leaks memory but can be used to JIT models:

  import torch.autograd.function as F
  import torch._C

  def jit(model):
      import types
      real_forward = model.forward
      def forward(self, *args):
          def flatten(x):
              return tuple(F._iter_variables(x))
          if not hasattr(self, "saved_trace"):
              torch._C._tracer_enter(tuple(self.parameters()) + flatten(args))
              out = real_forward(*args)
              self.saved_trace = torch._C._tracer_exit(flatten(out))
              self.saved_outs = out
              return out
          else:
              flat_out = Variable._execution_engine.run_forward(self.saved_trace, tuple(self.parameters()) + flatten(args))
              return F._unflatten(flat_out, self.saved_outs)

Major problems:

- Sanity checking is spotty at best, especially when users pass in variables.

- The interpreter leaks tensor memory from the store.  When we add back def-use
  we should be able to deallocate tensors as soon as we know they are no longer
  necessary.

- The interpreter needs to reach feature parity with the old execution engine.
  From there, we need to see if backwards can be subsumed as well.

- I still have no confidence in having memory managed everything correctly.
  This requires a close look.

- Rather than return an *open* expression as a trace, we should return a
  *lambda* instead, which knows about how many formal parameters it
  requires.

- The IR is not introspectable from Python at the moment, but this is simply a
  matter of implementing all the binding code.

- The tracer is NOT reentrant (you can't trace while you're inside a trace.)
  Furthermore, no sanity checking is done if you try to incorrectly reuse
  things from one trace in another.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
e1b7872fc2 Make it possible to access IR from Python.
Also, add a new trace_fn field to attach forward IR to Variables.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
c5faaf69d8 Initial IR representation for forward trace.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
hongyi-zhang
bf013f4c99 fix Python 2 gloo install (#2597) 2017-09-02 20:05:37 -04:00
Edward Z. Yang
a03e5cb409 Remind users to submodule update.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-08-30 16:14:38 -04:00
Sam Gross
966fdbd93a Add commands to re-build individual libraries. (#2506)
When working on PyTorch dependencies we often want to rebuild only that
dependency and the Python extension. You can now do that by running:

  python setup.py build_thc

to only re-build THC
2017-08-23 07:16:05 -04:00
Thomas Viehmann
7c04f11d88 search for ldconfig in /sbin for nccl detection (#2276) 2017-08-03 05:32:21 +05:30
Zachary DeVito
43c944acbd Remove dead THPP code that has been replaced with ATen objects. (#2235)
THPP usage is now isolated in THD.
2017-07-29 08:07:41 +05:30
Trevor Killeen
c304d04fc6 Replace thpp::Tensor with ATen Tensor in autograd csrc (#2170) 2017-07-28 10:18:37 -04:00
Soumith Chintala
ea6f9a26b8 fix version number 2017-07-20 13:30:53 -04:00
Soumith Chintala
09abaa2189 make keepdim backcompat warnings emit in autograd as well (#2157) 2017-07-20 01:48:05 -04:00
Soumith Chintala
a5c2546c0f version bump 2017-07-19 12:34:43 -07:00
Soumith Chintala
b660303a16 Static linking against libstdc++ in Binary Build mode 2017-07-19 12:19:36 -04:00
Soumith Chintala
169ca67a4e Adding Spatial Transformers w/CuDNN support 2017-07-12 14:32:06 -04:00
Zach DeVito
ab3d85c410 add build commands for ATen 2017-07-11 10:35:03 -04:00
Trevor Killeen
6df23b418d mark tools as excluded in find_packages (#1915) 2017-06-29 13:49:56 -04:00
Trevor Killeen
cb4eaa9c5d TensorLib/Aten --> changes required in pytorch 2017-06-22 12:55:55 -04:00
gchanan
a64560c22e Remove flattening for torch.dot (#1781) 2017-06-16 02:15:33 +02:00
Edward Z. Yang
3ada9da808 Make csrc -Werror clean. (#1795)
Primary things I had to fix:

- Suppress _XOPEN_SOURCE warnings by ensuring that Python.h is included
  first, because it always unconditionally defines this macro.

- Turn off strict aliasing, because Python 2 doesn't work with strict
  aliasing.

- Workaround setuptools bug, where it's incorrectly passing
  -Wstrict-prototypes to C++ compilers (where this doesn't make
  any sense)

To compile csrc with -Werror, run `CFLAGS="-Werror" python setup.py build_ext`

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 20:18:09 -04:00
Adam Paszke
714351ff39 Officially enable process-group mode 2017-06-12 22:02:11 -04:00
Gregory Chanan
65b23f146e Add broadcasting support for copy_, simplify code generation by moving a lot of currently generated code to expand_utils. 2017-06-11 05:37:59 -04:00
Gregory Chanan
6a40acb4f0 Add Broadcast plugin. 2017-06-11 05:37:59 -04:00
Edward Z. Yang
ba690d5607 Add support for NVTX functions. (#1748) 2017-06-10 18:26:58 +02:00
Adam Paszke
8ea7c87c29 Improve init methods 2017-06-02 23:42:11 +02:00
Adam Paszke
702a2e3bc5 Make Variables not subclass Function anymore
Because of this Variables can no longer appear in the graph.
Every usage of a leaf Variable will leave an AccumulateGrad
function that has no outputs, but modifies var.grad as a side
effect.
2017-05-01 16:44:56 -04:00
Adam Paszke
2ca787fcf4 Refactor attribute names in autograd 2017-05-01 16:44:56 -04:00
Soumith Chintala
2197e4c766 version bump 2017-05-01 15:54:52 -04:00
Adam Paszke
9169f60a84 Parallelize TensorMethods.cpp builds (#1400) 2017-04-29 09:07:21 -04:00
Soumith Chintala
24e5a9057e Revert "Parallelize TensorMethods.cpp builds (#1364)" (#1390)
This reverts commit 060048bcd8.
2017-04-28 07:59:40 -04:00