Commit Graph

1913 Commits

Author SHA1 Message Date
Zachary DeVito
733e2967b1
Allow __constant__ values in a ScriptModule to be used as attributes for builtin functions (#7017)
* Allow `__constant__` values in a ScriptModule to be used as attributes for builtin functions
* Fix bugs in @script loops

1. While loops run shape propagation multiple times until the shapes have converged.
There were two bugs here. (a) The 'changed' condition was not checking whether it actually
changed the output; instead it would mark changed = true whenever the two inputs were different.
This is incorrect because the output of the block and the input of the block may always have different shapes.
Now it actually checks whether it is about to change the output entry that it is writing to.
(b) Expand nodes were being inserted into the graph even inside the while loop body. However, if
we iteratively discover that the input shape to one of these expands is actually dynamic, then
it was incorrect to insert the expand in the first place. This changes it so that we only insert expands
after we have converged on the shapes.

2. The way deleteExtraInputs removed loop-carried dependencies was unsafe because it would look up
Value* elements in the loop body's environment that had been invalidated when deleteExtraInputs
removed another input to the loop. This changes the way deleteExtraInputs works so that it never has to
read a value out of the loop body's environment, avoiding the invalidated pointers.
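
A minimal sketch (not from the original commit) of what the `__constant__` change enables, using a hypothetical `Sum` module and the ScriptModule API of this era:

```
import torch

class Sum(torch.jit.ScriptModule):
    __constants__ = ['dim']

    def __init__(self, dim):
        super(Sum, self).__init__()
        self.dim = dim  # a Python int marked constant via __constants__

    @torch.jit.script_method
    def forward(self, x):
        # the constant can now be used as an attribute (keyword argument) of a builtin
        return torch.sum(x, dim=self.dim)
```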
2018-04-27 17:44:17 -07:00
gchanan
a6bfa16c17
torch.arange: add numpy-style type inference. (#7016)
* torch.arange: add numpy-style type inference.

This is a backwards-compatibility breaking change.

* Fix flake8.

* Use at::optional.

* Remove unneeded header files.

* Use reference wrapper.

* Update arange for test.

* Address review comments.
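
A rough illustration of the new inference rule (my reading of the commit, not text from it):

```
import torch

a = torch.arange(5)            # all-integer arguments now produce an integral result
b = torch.arange(5.0)          # a floating-point argument keeps a floating-point result
c = torch.arange(0, 2.5, 0.5)  # any float among start/end/step promotes the result to float
```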
2018-04-27 15:11:45 -04:00
Zachary DeVito
b2581c0289 Workaround in onnx to get transposes into init_nets (#6924)
* Workaround in onnx to get transposes into init_nets

This adds a pass to ONNX so that it can speculate Transpose
operators so that ONNX's split pass can put them into an init_net

Also fixes a potential bug in onnx peephole where an optimization
across blocks might move a Value and violate scoping.

* Perform shape propagation when embedding a program into a trace.

This ensures the trace still has type information specific to that trace, which will help onnx export succeed in more cases.
2018-04-26 11:04:17 -04:00
Zachary DeVito
b7487d42a0
Workaround to make PythonOps traced with torch.jit.trace work correctly. (#6738)
The long-term fix is to remove the handle-creating pathways and
remove all the modes from PythonOp, making it into an op that simply
calls a PyObject. Right now ONNX expects PythonOp to hold a
nn.Function, not a generic callable, so completely removing the legacy
pathway will also require changes to how ONNX symbolics are found.
2018-04-24 17:21:00 -07:00
Zachary DeVito
0b5910f77e
[jit][script] Fix a bug combining sizes/unsized tensors (#6882)
* [jit][script] Fix a bug combining sizes/unsized tensors

This adds an isSubtypeOf method to reflect that sized tensors are a subtype
of Dynamic (unsized) tensors, and updates the typechecking code to reflect this
relationship.

* Add index_select to shape prop
2018-04-24 14:04:18 -07:00
Sam Gross
9765bb5f1e Revert "Fix performance regression of simple indexing cases (#6793)" (#6886)
This reverts commit 8a016693c0.
2018-04-23 22:22:12 -04:00
Zachary DeVito
b8ada7380a
Tuple literal and cat support (#6691)
* Support list and tuple literals: adds support for `[a, b]`, `(a, b)`, and trailing-comma tuples like `a,`

* Allow non-tensors to reach emitBuiltinCall; each SugaredValue::call
is now responsible for checking the types of its inputs.

Add support for calling cat with a tuple to emitBuiltinOp
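
A short sketch of the new behavior (hypothetical example, assuming the `@torch.jit.script` decorator of this era):

```
import torch

@torch.jit.script
def interleave(x, y):
    parts = (x, y, x)                # tuple literal
    return torch.cat(parts, dim=0)   # cat now accepts the tuple directly
```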
2018-04-23 10:58:07 -07:00
James Reed
814f791f2b [JIT][script] Improve error reporting for tuple type mismatch (#6819)
Previously we would see errors like:

variable 'states' previously has type (Tensor, Tensor, Tensor, Tensor, Tensor, Tensor) but is now being assigned to a value of type (Tensor, Tensor, Tensor, Tensor, Tensor, Tensor)
because the default case in the diagnostic printout was "Tensor", so different types could print identically. This adds a virtual member function to each Type class that returns a human-readable string for better error reporting.

* Improve error reporting for tuple type mismatch

* Add better Tensor printout
2018-04-22 13:54:52 -04:00
James Reed
ef76e24f60
[JIT][script][ONNX] ScriptModule ONNX export + ONNX export for control flow nodes (#6608)
* ScriptModule ONNX export

* ScriptModule ONNX export

* Export for control flow nodes

* Add pretty-print capability for ONNX export testing

* Update tests and handling of multiple GraphProto names

* Maybe bugfix?

* factor out code from export and pretty print
2018-04-19 23:45:03 -07:00
Zachary DeVito
c420297545
[jit][script] Constants python int now turn into Long (#6728)
This matches the behavior of literals.
2018-04-19 21:33:29 -07:00
gchanan
8a016693c0
Fix performance regression of simple indexing cases (#6793)
* Fix performance regression on simple cases of indexing

Dispatches to the old kernels

* Adapt JIT test

The test was expected to fail, but due to the change in the previous diff, it would now dispatch to index_select, which succeeds. I modified the function to go through the advanced indexing codepath

* Only do checks once, properly AutoNoGil, AutoGPU.
2018-04-19 23:41:44 -04:00
Zachary DeVito
a3f3817fbd
[jit][script] Allow variables to be defined in if statements (#6675)
We now allow variables defined inside an if statement to be used after
the if statement, as long as they are defined unconditionally in every branch.
This supports a larger subset of Python programs than we supported before.
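
A hypothetical example of the newly allowed pattern (a sketch, not from the commit):

```
import torch

@torch.jit.script
def pick(x, y):
    if x.sum() > y.sum():
        z = x * 2
    else:
        z = y * 2
    # z is defined unconditionally (in both branches), so it may be used here
    return z + 1
```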
2018-04-19 11:32:31 -07:00
gchanan
e1f5d80d5c
Eliminate handle_zero_dim when broadcasting is applied earlier. (#6683)
* Eliminate handle_zero_dim when broadcasting is applied earlier.

This ends up not actually doing anything unless all the broadcasted tensors are scalars;
in that case only it produced inconsistent behavior, because the type promotion rules are different.

This is better solved with real type promotion logic.

* Change type of script comparison to long.

* Fix jit tests.

* Fix cpp jit test by being consistent about long-vs-float.

* Consistent float and long.

* Use int64_t rather than long.
2018-04-18 23:37:54 -04:00
Zachary DeVito
f656301526 Allow traces to call @script functions (#6642)
This adds the ability to trace script functions while preserving their
control flow. When the trace encounters a script function it inlines
the graph of the function into the trace rather than tracing the
function itself.
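
A sketch of the intended usage, assuming the decorator-style torch.jit.trace of this era (the exact call form here is an assumption):

```
import torch

@torch.jit.script
def scale_by_sign(x):
    if x.sum() > 0.0:
        y = x * 2.0
    else:
        y = x * 0.5
    return y

@torch.jit.trace(torch.rand(3))      # decorator form taking example inputs (assumed API)
def traced(x):
    # the script graph, branch included, is inlined into the trace
    # rather than traced (which would bake in a single branch)
    return scale_by_sign(x) + 1.0
```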
2018-04-17 15:19:16 -04:00
Zachary DeVito
ee240aa00c
Allow script_methods to be defined out of order (#6341)
This modifies the registration process so that all script methods
in a ScriptModule are defined at once.

Method gains a `method_creator` callback that gets invoked when the
method is first called to define it if it has not already been defined.
Recursive cycles in this `method_creator` are checked.

This approach was chosen over first creating all the graphs and then
inlining the call sites because it will combine better with type
propagation for non-tensor types like tuples. e.g.

```
a = foo(b)
return bar(*a)
```
2018-04-16 15:19:05 -07:00
James Reed
e8d2f05931
[JIT] Switch JIT passes to take a graph rather than TracingState (#6598)
* Switch JIT passes to take a graph rather than TracingState

* Add pybind11 binding for ONNX pass from graph

* Fix canonicalize pass

* address comment

* Switch ToONNX to explicitly return new graph

* optimize_graph instead of optimize_trace
2018-04-13 17:38:22 -07:00
Zachary DeVito
825ce7f196
[jit][script] Allow tuples to be re-assigned (#6538)
* Allow tuples to be re-assigned

This commit improves our support of tuples by making them more first-class.
In particular, it allows tuples to be re-assigned across loops and ifs.
It does this by making them first-class values in the Graph IR, and then
removing the tuples in a LowerTuples pass.

An alternative approach would have added more support for desugaring tuples
in the Environment object as they were emitted. Instead,
the current approach was chosen anticipating a future when tuples are
fully supported (including the interpreter). In that future, the current
code can be completely reused, with the LowerTuples pass just becoming
an optimization that removes unneeded tuple allocations.
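
A minimal sketch of a tuple being re-assigned across loop iterations (hypothetical example):

```
import torch

@torch.jit.script
def rotate(x, y, n):
    pair = (x, y)
    for i in range(n):
        a, b = pair
        pair = (b, a + b)   # the tuple is re-assigned on every iteration
    return pair
```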
2018-04-13 17:34:50 -07:00
James Reed
3b0204d43c
[JIT] Hacky: Staged symbolics for RNN nodes (#6297)
* Staged symbolic for RNN modules

* Move function to symbolic.py

* Add comments, improve tests, fixup logic
2018-04-12 16:29:25 -07:00
Tongzhou Wang
e01569afd7 Restore allow_unused functionality (#6553) 2018-04-12 21:30:42 +02:00
Zachary DeVito
8995ddda05
[jit][script] Check that each builtin returns the right number of values. (#6492)
* Fixes to the way script handles multiple values, and other minor fixes.

This commit improves our handling of operators that return multiple values.
Builtins are now checked so that they return the right number of values,
and support for TupleValue is extended to all things that can return
multiple values.

This resolves issues where the compiler accepted things like:

  a, b = c + c

This would cause the interpreter to crash. Now each operator knows
how many results it will produce and can check it against the number
of requested outputs.

Notes:
* Allow True/False literals in constant expressions
* make handling of keyword constants more consistent to support True/False
* make parsing constants match the way we construct constants from python
* improve the error messages when accessing bad graph attributes.
* switch findTensorOp to return an optional.
* check that attribute types are correct in findTensorOp
* Check the correct number of outputs for builtins

This also changes emitExpr to return a single SugaredValue

Rather than possibly returning multiple values, emitExpr now
always returns a single value, which _might_ be a tuple. This approach
more closely follows Python, making the code easier to follow.

Checks for returning the right number of values are now located in
the assignment operator, and occur when unpacking the tuple.

We still pass `n_binders` to function calls so that calls into python
know how many values they should return.
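
A sketch of the kind of program that is now rejected at compile time instead of crashing the interpreter (the exception type is assumed):

```
import torch

try:
    @torch.jit.script
    def bad(c):
        a, b = c + c    # c + c produces one value; unpacking it into two is an error
        return a
except RuntimeError as e:
    print('rejected at compile time:', e)
```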
2018-04-12 10:32:49 -07:00
James Reed
ad5d421554
[JIT] Implement staged symbolics for pack_padded_sequence/pad_packed_sequence (#6256)
* Unit test for pack_padded tracing

* Move monkeypatching stuff

* Switch symbolic

* Fix stack traces and update test

* Fixup and confirm e2e working

* lint

* Move monkeypatch back to onnx

* Address comments

* remove extraneous import

* Add gradient checking

* lint

* Address comments

* improve test case
2018-04-10 11:30:50 -07:00
James Reed
1533155c4e
[JIT][script] Implement compile-time tuples & starred unpacking (#6214)
* Something that works

* Tuple sugared value

* Works with commenting out input size check

* support string frontend

* Initial starred assignment

* Fix parser

* Fixup tests

* clang-format

* fix rebase error

* lint

* move star assign test to string frontend to make py2 happy

* Py2 fix: parse starargs from Call node

* Address some comments

* Fixup merge

* Remove overloaded unary operators

* Bugfix and test case

* Address a few more comments

* asValues -> asTuple

* Remove unrolledFor stuff

* Fixup getValues

* Pass CallsiteDescriptor struct and have different behavior for different call types

* Address comments and lint

* some type checks

* Address comments

* lint

* Fix mistake
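
A rough sketch (Python 3 syntax; hypothetical example) of the compile-time tuple and starred-unpacking support described above:

```
import torch

@torch.jit.script
def rotate_left(x, y, z):
    first, *rest = (x, y, z)   # starred assignment over a compile-time tuple
    a, b = rest
    return a, b, first
```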
2018-04-09 19:34:51 -07:00
Adam Paszke
c1cd6eab9f Handle broadcasting in the JIT (#6084)
* Add size checks to JIT's fuser

* Handle broadcasting in shape propagation pass

* Fix build errors and add tests
2018-04-05 17:07:52 -07:00
Zachary DeVito
5ab30eedf3
Add __constants__ to Script modules (#6092)
Like `__slots__`, the `__constants__` property changes the set/getattr behavior of a script module for the listed keys so that they behave as constants.
This enables script methods to use them in ways that are otherwise not allowed.

* Python numbers/bools can be inlined as constants in script code.
* Lists of numbers can be iterated over using for loops (see the sketch below)
* nn.ModuleLists can be used in for loops as well, unrolling their content.
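
As a rough illustration of the constant-list iteration point above (a sketch using a hypothetical Poly module, not code from the commit):

```
import torch

class Poly(torch.jit.ScriptModule):
    __constants__ = ['coeffs']

    def __init__(self):
        super(Poly, self).__init__()
        self.coeffs = [1, 2, 3]     # a constant list of numbers

    @torch.jit.script_method
    def forward(self, x):
        out = x * 0
        for c in self.coeffs:       # unrolled at compile time over the constant list
            out = out * x + c
        return out
```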
2018-04-05 11:31:43 -07:00
James Reed
9f49be51ec Fix argument checking for inlining a module (#6207) 2018-04-02 23:14:04 -04:00
Adam Paszke
da6c3c90d9
Relax constraints on return statements in the script (#6070)
Script functions can now have no return statements, empty
return statements, or return one or more values.

Additionally fix the lexer to always emit TK_NEWLINE before
TK_DEDENT, which simplifies the parser.
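
A small sketch of the relaxed rules (hypothetical functions):

```
import torch

@torch.jit.script
def no_return(x):
    y = x + 1            # no return statement is required any more

@torch.jit.script
def sum_and_diff(x, y):
    return x + y, x - y  # returning more than one value is also allowed
```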
2018-03-31 18:35:33 +02:00
James Reed
5fe3c406f2 Experimental support for different ONNX export types (#6016)
Allows you to export an ONNX model as:

Protobuf file (this is what we have now)
Uncompressed zip archive
Compressed zip archive
Directory

* Experimental support for different ONNX export types

* Remove a copy

* Add comment

* Add test cases

* lint

* fix bug

* address comments
2018-03-30 15:30:38 -04:00
Richard Zou
1807bacd65 Fix printing of unknown binop operator in torchscript (#6069)
Before, using an unknown binary operator like `@`:
```
import torch
@torch.jit.script
def mm(x, y):
    return x @ y

x = torch.randn(4, 3)
y = torch.randn(3, 2)
mm(x, y)
```
resulted in [this not-so-readable trace](https://gist.github.com/zou3519/052b8998108c4bc0fe0e7c85c6f5758e).

Now, it tells the user that the problem is an unknown binary operator:
```
NotSupportedError: unsupported binary operator: MatMult
@torch.jit.script
def mm(x, y):
    return x @ y
            ~~~ <--- HERE
```
2018-03-28 19:41:45 +02:00
Zachary DeVito
0f198fa723
Add additional script module functionality (#6033)
* allow calls to non-script methods, allow python non-script attributes in methods
* add test to make sure submodules are not reassigned
* Test that we can change python attributes
2018-03-27 23:37:56 -07:00
Richard Zou
5d628db0a2 Deprecate ctx.saved_variables via python warning. (#5923)
* Deprecate ctx.saved_variables via python warning.

Advises replacing saved_variables with saved_tensors.
Also replaces all instances of ctx.saved_variables with ctx.saved_tensors in the
codebase.

Test by running:
```
import torch
from torch.autograd import Function

class MyFunction(Function):
    @staticmethod
    def forward(ctx, tensor1, tensor2):
        ctx.save_for_backward(tensor1, tensor2)
        return tensor1 + tensor2

    @staticmethod
    def backward(ctx, grad_output):
        var1, var2 = ctx.saved_variables
        return (grad_output, grad_output)

x = torch.randn((3, 3), requires_grad=True)
y = torch.randn((3, 3), requires_grad=True)
model = MyFunction()
model.apply(x, y).sum().backward()
```
and assert the warning shows up.

* Address comments

* Add deprecation test for saved_variables
2018-03-26 14:13:45 -04:00
James Reed
213fa61706 Implement range for loop in script (#5827)
* Implement range for loop in script

* Fix handling of boolean constants

* Use WithInsertPoint

* Allow dynamic max trip count

* fix symbols

* Fix argument order

* fix test

* Add insert{Input,Output} APIs and use them

* Factor out condition stuff

* clang-format

* Address remaining comments

* Fix tests

* Implement script in AST frontend
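
A quick sketch of a range loop with a dynamic trip count (hypothetical example):

```
import torch

@torch.jit.script
def repeat_add(x, n):
    acc = x
    for i in range(n):   # the max trip count may be dynamic
        acc = acc + x
    return acc
```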
2018-03-23 11:55:32 -04:00
Adam Paszke
a58f2d242a
Test both Python and string JIT frontends (#5891) 2018-03-22 16:58:36 +01:00
Zachary DeVito
c8d1ec02be
[jit] Have ScriptModule inherit from Module (#5769)
* Have ScriptModule inherit from Module
  This is accomplished by creating replacement _parameters, _buffers,
  and _modules which implement the OrderedDict APIs but which
  actually get/set their members inside script::Module
* Merge TracedModule with ScriptModule
* Move logic of attribute handling into Python bindings rather than
  make script::Module handle it. This was redundant with nn.Module,
  which already handles attributes.
* Make TracedModule a subclass of ScriptModule
* Move handling of attribute kind logic into bindings.
* Allow ScriptModule to contain non-script module submodules.
2018-03-22 00:17:49 -04:00
Adam Paszke
e6ac93b817 Add support for number and list literals in Python frontend (#5843) 2018-03-17 10:22:23 -04:00
Edward Z. Yang
acc409396b
Namespaced symbols (#5820)
* Namespaced symbols

- Our interned strings now have structure, "ns::symname" rather than just
  "symname" before.  We support efficient namespace testing for uniques
  by encoding the namespace in one byte in the Symbol internal representation.
  See torch/csrc/jit/interned_strings.h for a more in-depth implementation
  discussion.

- All uses of ksymbol are now attr::symbol (or some appropriate namespace).
  The valid namespaces are prim, attr, onnx and aten.

- Symbol is bound in Python as a qualified string "attr::symbol", EXCEPT for the
  attribute setting/getting API, whose symbols must always be attr
  symbols; they get special cased to assume strings are passed.
  There's a little bit of naughtiness in the implementation, maybe you know
  how to solve it.

- However, the g.op() convenience function assumes that you're generating
  ONNX operators, unless you explicitly qualify.

- All ATen operators and nodes have built-in interned strings generated
  for them, so you should never have to write a string literal ever again.
  The tracing code is adjusted to use it.

- ONNX exporter now properly tests to see that all operators are in
  onnx namespace before accepting the export.  This is way more
  robust than the previous exporter, which would be willing to
  export capitalized operators which were not actually ONNX operators.

- A slight organizational change for symbolic.py; this module now ONLY
  contains aten operators.  In particular, the exporter for Constant
  has moved into utils.py (along with Undefined, from the C++ side),
  since primitive ops get "special treatment."

- The un-inplacing logic in recording is more robust, so that we don't
  delete a trailing underscore from __and__.  This never affected us
  before because we didn't have any tests for it.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-16 13:36:11 -04:00
Adam Paszke
eeb90d9c95
Add a Number node to the JIT AST and unify script syntax with Python (#5716) 2018-03-15 20:56:23 +01:00
Adam Paszke
c66111e79b
Desugar torch.* and F.* functions in JIT script (#5784) 2018-03-15 12:02:31 +01:00
Adam Paszke
694bee1f7e
Fix the rule for Assign in JIT's Python frontend (#5793) 2018-03-15 09:14:03 +01:00
Zachary DeVito
41285edbb6 [jit] add a compiled script module (#5630)
* Add script::Module C++ class to represent script modules.
* Switch AST -> IR conversion to work on Modules/Methods rather than raw graphs;
function-only AST -> IR conversion is just a simplified case where there is
only one module with a single method and no parameters.
* Introduce SugaredValue in compiler.h to represent values in scope in a script
function that are not first-class and that get desugared. This is used to
represent the module's self parameter, as well as Python function calls
and method calls on tensors.
* Provide a Python ScriptModule that provides a nice API on top of script::Module,
allowing for the definition of script modules with methods, parameters,
and submodules.
Not in this PR but intended for the future:

* ScriptModule actually subclasses nn.Module, with most methods implemented
* Unification of TracedModule and ScriptModule functionality into one container class.

Detailed changelog:

* Switch compiler over to using Module, but don't
use them yet.

* Remove intermediate attribute encoding in compiler

* Create SugaredValue object to handle resolution
of compiled module.

* switch to_ir to modules, implement Select

* hacky python wrappers

* Private ScriptModule

* Add `define` to script module

* Attributes use TK_LIST_LITERAL

this anticipates adding a real list literal expression to the language.

* Add a metaclass to make sure script stubs are registered

* Add a test

* Doc createResolutionCallback

* Docs and minor editing

* Address PR comments

* Document

* Fix unicode issue
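
A minimal sketch of the resulting Python-side API, assuming the subclass-plus-@torch.jit.script_method style this PR introduces (the module and names here are hypothetical):

```
import torch

class Affine(torch.jit.ScriptModule):
    def __init__(self):
        super(Affine, self).__init__()
        self.weight = torch.nn.Parameter(torch.rand(3, 3))   # parameters live on script::Module

    @torch.jit.script_method
    def forward(self, x):
        # `self` is a SugaredValue: parameters and submodules are resolved through it
        return torch.mm(x, self.weight)
```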
2018-03-12 09:52:40 -04:00
Adam Paszke
e4c303f373
Defer shape analysis failures until runtime (#5574) 2018-03-09 18:43:03 +01:00
Adam Paszke
5597aba868
Add return statement to the JIT AST (#5578) 2018-03-06 13:14:53 +01:00
James Reed
60415cf0d2 Big batch of fixes for JIT (#5517)
* Check if node output matches in shape propagation

* Fix list attributes and view shape propagation

* fix inferred shapes for view

* Fix shape inference for integrally typed tensors

* Fixes for concat in control flow

* Fix print
2018-03-02 15:03:44 -08:00
Adam Paszke
4afd62db09 Add TracedModule to the JIT (#5409) 2018-02-28 22:50:50 -08:00
James Reed
55c64e5243 Add Python function calls to JIT script (#5445)
* Add Python function calls to script
* Script compiler gains a `Resolver` object that runs when it does not understand a function call. This decouples the python resolution from the conversion to IR.
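
A short sketch of a script function calling back into ordinary Python (hypothetical helper):

```
import torch

def python_helper(x):
    # plain Python; the compiler's Resolver turns this call into a PythonOp
    return x.clamp(min=0)

@torch.jit.script
def fn(x, y):
    z = python_helper(x)
    return z + y
```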
2018-02-28 19:45:04 -08:00
Zachary DeVito
39608b0180
Add source information to IR nodes (#5449)
* Add source information to IR nodes

SourceRange information from the script is now propagated to IR nodes.
This information is only used in two places for now: the interpreter
wraps errors that occur when an instruction executes, and shape
propagation now reports errors on the line where it fails:

    Traceback (most recent call last):
      File "test/test_jit.py", line 1655, in test_script_error
        bar(Variable(torch.rand(10), requires_grad=True), Variable(torch.rand(9), requires_grad=True))
    RuntimeError:
    The size of tensor a (10) must match the size of tensor b (9) at non-singleton dimension 0:
    @torch.jit.script
    def bar(c, b):
        return c / b
               ~~~~~ <--- HERE

In the future, shape propagation should really not report any size
errors and instead just not propagate shapes and let the actual
execution fail. However, this is hard to accomplish while we still
depend on running the op to do shape propagation.
2018-02-28 17:06:18 -08:00
Zachary DeVito
05269b582b
[JIT] Support shape propagation with control-flow (#5391)
Support shape propagation with control-flow

* This allows us to enable optimization in the GraphExecutor for most
  script tests.
* Changes Type to always be present (non-null) on a Value, removing `hasType()`
  and `typeOption()`. A new type kind 'DynamicType' now represents when
  a specific type has not been determined.
* If/Loop nodes propagate shapes/types in the simple cases where types of
  outputs do not change depending on where control flows. In other
  cases, we propagate DynamicType to indicate we do not know what
  the shape will be.
* Remove the `cond` input to the body of Loop to simplify handling in
  interpreter and shape propagation.
* Bugfix for zero-dim contiguousStridesOf
2018-02-26 15:24:05 -08:00
Zachary DeVito
c6d47f6386
add @torch.jit.script, @torch.jit.compile, torch.jit.CompilationUnit(str) (#5367)
* torch.jit.trace annotation now creates a GraphExecutor

The other torch.jit.trace, which was used for testing purposes and for onnx to get the trace graph, is now called torch.jit.get_trace_graph.

* @script annotation, and compilation unit for strings
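
A small sketch of the two new entry points (hypothetical function bodies):

```
import torch

@torch.jit.script
def double(a):
    return a + a

cu = torch.jit.CompilationUnit('''
def triple(a):
    return a + a + a
''')

x = torch.rand(3)
print(double(x), cu.triple(x))
```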
2018-02-26 13:22:45 -08:00
Adam Paszke
a0118533ef
Add a print() function to the JIT script (#5274)
Additionally:
- add support for calling functions that are not methods in the Python frontend
- add an end-to-end test for the Python frontend
- add a capture_stdout helper for checking that `print` actually works
2018-02-24 11:15:55 +01:00
Sam Gross
30ec06c140
Merge Variable and Tensor classes (#5225)
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.

To keep the PR to a reasonable size, I've left most of the unused tensor
code. Subsequent PRs will remove the dead code, clean-up calls to
torch.autograd.Variable, and rename Variable to Tensor everywhere.

There are some breaking changes because Variable and Tensors had
slightly different semantics. There's a list of those changes here:

 https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
2018-02-23 18:03:31 -05:00
Zachary DeVito
8904616028
add control flow to interpreter (#5293)
* Use stacks in the interpreter/aten_dispatch

Rather than have separate input/output lists,
the interpreter now works using a single stack.
Operators in the interpreter push/pop from the stack.
This allows ownership of tensors to transfer directly to an operator,
and an operator can drop the reference to a tensors as soon as it is
no longer needed. This is important for the GraphExecutor op,
which recursively runs the interpreter.

Once autograd is updated to pass variables to Function by value,
we will be able to ensure that we release ownership as soon as possible.

This commit also switches the interpreter to use a fake
tensor 'ContainerTensor' rather than at::Retainable to hold non-tensor
data in the interpreter. This allows us to use std::vector<at::Tensor>
for all registers, which is significantly less confusing than the
OwnedRetainables struct it was replacing.

* Add If and Loop to interpreter

* Preprocess loop to calculate where references to tensor should be dropped
* Add control instructions JumpZ/JumpNZ/Jump
* Switch from explicitly having stage structs to having a single list
  of instructions with Store/Load instructions to take values off the
  initial stack
* Make the interpreter tests executable rather than use expect files
* add a flag to interpreter code so that constants are variables
  if the interpreter is running on variables.

* Add tensor_as to its own file
2018-02-22 19:56:15 -08:00
Tongzhou Wang
1848cad108 [ready] Layer Normalization (#4922)
* at::maybe_data_ptr and Check.h => TensorUtils.h

* THNN support for optional BN running_*

* ATen support for optional BN running_*

* Python nn.* support for optional BN running_*; Improve IN and BN doc

* Add tests for IN and BN new option

* Layer Norm

* Fix LRN doc

* functional interface for LN and IN

* Layer norm tests

* fix BN double backward returning undefined tensors

* fix jit test using wrong dim inputs for BN

* add/improve BN, IN and LN GPU tests with half type

* Update docs to be consistent with Conv notation
Fix onnx
Clarified onnx symbolic wrapper

* fix typo

* Address comments
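
A brief eager-mode usage sketch of the new layer normalization module (illustrative, not from the commit):

```
import torch
import torch.nn as nn

layer_norm = nn.LayerNorm(20)   # normalize over the last dimension of size 20
x = torch.randn(5, 20)
y = layer_norm(x)               # per-sample normalization followed by a learned affine transform
```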
2018-02-22 11:56:41 -05:00
James Reed
5eefe87d4e Emit ternary if in script compiler (#5291) 2018-02-18 09:53:13 +00:00
James Reed
3ffd6ffa7d while and if for experimental JIT script (#5176)
This commit adds while and if support to the experimental script frontend, following the design of ONNX.
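
A minimal sketch of while and if in the script frontend (a hypothetical example, written against the frontend as it stabilized over the following weeks):

```
import torch

@torch.jit.script
def loop_example(x, y):
    if x.sum() > y.sum():
        z = x - y
    else:
        z = y - x
    while z.sum() > y.sum():
        z = z * 0.5
    return z
```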
2018-02-16 15:30:18 -08:00
Adam Paszke
cb2fd39fdd
Add Python frontend to the JIT (#5190) 2018-02-15 22:53:19 +01:00
Adam Paszke
99474d28b8 Fix compound assignment in JIT script (#5178) 2018-02-12 09:12:28 -08:00
bddppq
3e85613751 Experimental jit script (#5074) 2018-02-07 20:43:45 +01:00
Sam Gross
895aebac08
Use Variable instead of Tensor in Function.forward (#4786)
The Tensor and Variable classes are being merged.
autograd.Function.forward is now called on Variables, but with "no-grad"
mode (torch.no_grad()) enabled.

One benefit is that we no longer have to explicitly track shared
storages.
2018-02-06 17:24:27 -05:00
Zachary DeVito
c308e03f3e
Initial GraphExecutor Implementation. (#4982)
This adds the initial implementation of the graph executor for the new JIT design. It includes a few Python tests ensuring that the no-grad, backward, and double-backward cases work for simple examples and some corner cases. More work needs to be done to optimize performance, as there are many extra copies and places where we hold onto variables longer than we should. These are noted in the comments.
2018-02-02 17:45:59 -08:00
Adam Paszke
ae903ca61a
Fix JIT tracing in autograd codegen (#4941) 2018-01-31 00:14:36 +01:00
peterjc123
c011c8b5a6 Enable fixed tests again in Windows (#4928) 2018-01-30 16:33:49 +01:00
Zachary DeVito
0ae5498079 [JIT] add create_autodiff_subgraphs (#4822)
This pass splits differentiable subgraphs into their own Node,
similar to a fusion group.

This initial implementation does not create optimal subgraphs, but
it works well in the case where most things are differentiable,
and has the building blocks (`mergeNodes`) to extend to the
better implementation.
2018-01-23 23:46:54 -05:00
Edward Z. Yang
27505e6429 Fix #4480 by tracing inputs before running function. (#4807)
* Fix #4480 by tracing inputs before running function.

The DCE trick says that if I have y = f(x), and f is internally implemented as
g, it's OK to trace both g and f. Recall the tracing algorithm is:

    enter f(x)
    compute its result y
    trace y = f(x)
    return from f

So when you run the example above, you'll do this:

    # suppose x is mapped to %1
    enter f(x)
    enter g(x)
    result of g is y
    trace y = g(x a.k.a. %1) (mapping y to %2)
    return from g
    result of f is y
    trace y = f(x a.k.a. %1) (remapping y to %3)
    return from f

and end up with a trace like this:

    %2 = g(%1)
    %3 = f(%1)

... only %3 is live, because %2 was killed from the mapping...  Subsequent DCE
will eliminate the invocation of g and you'll only see f in the final trace.

However, if f and g are inplace functions, the machinery breaks:

    # suppose x is mapped to %1
    enter f(x)
    enter g(x)
    result of g is x
    trace x = g(x a.k.a. %1) (remapping x to %2)
    return from g
    result of f is x
    trace x = f(x a.k.a. %2) (remapping x to %3)
    return from f
    resulting in:

    %2 = g(%1)
    %3 = f(%2) # OOPS

This commit changes the strategy so we instead do this:

    enter f(x)
    trace f(x)
    compute its result y
    trace y = f(x)  (computed above)
    return from f

Now we get the correct Value before it is overwritten.
Here is what the new trace code looks like:

    jit::tracer::PreTraceInfo trace_info;
    if (jit::tracer::isTracing( self, index )) {
      trace_info = jit::tracer::preRecordTrace( "index_fill", { self, index } );
      setattr(trace_info.n, jit::Symbol("dim"), dim);
      setattr(trace_info.n, jit::Symbol("value"), value);
    }
    baseType->index_fill_(self_, dim, index_, value);
    increment_version(self);
    rebase_history(self, grad_fn);
    if (trace_info.state != nullptr) {
      jit::tracer::postRecordTrace( trace_info,  { self } );
    }

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Revert "Hot patch ONNX _run_symbolic_function"

This reverts commit d1c973fee1.

* lintfix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Add missing expect file

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-23 18:06:55 -05:00
Adam Paszke
1a02d3ae86
Implement MM fusion (MM with add reduction tree) (#4615)
Implement MM fusion (MM with add reduction tree)

A tree where leaves are matrix multiplies and inner
vertices are adds can be computed as a single mm.
Such subgraph often appear in backward if a single weight
is reused multiple times (e.g. in RNNs).

NOTE: this seems to be slightly slower on the GPU than the
naive implementation, but it's a huge win on the CPU
(think 100x lower overhead)
2018-01-17 21:36:21 +01:00
Luca Antiga
040336f5dc Further fix to tracing scope (#4558)
* Set missing temporary scope in callPySymbolicMethod

* Use expected traces in all scope tests
2018-01-09 15:57:40 -05:00
Zach DeVito
674ddf6b91 Fix multi-gpu fuser bug
cuModuleLoad is only valid for a single device so we need to
compile for the particular device that the fusion group will run on.
CompiledFunction already specializes different traces for tensors,
so we just need to have fusion_compiler produce the cuFunction on
the right device.
2018-01-08 15:04:22 -08:00
Luca Antiga
d3612a5914 Fix tracking of tracing scopes during ONNX pass (#4524)
* Fix tracking of tracing scopes during ONNX pass

* Use ResourceGuard to manage setting a temporary current scope in Graph

* Add tests for ONNX pass scopes

* Remove unused num_classes argument
2018-01-08 12:20:52 -05:00
Edward Z. Yang
dc76db349e Delete a pile of dead code (#4295)
* Delete obsolete basic ops.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* More deletion.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Delete some unused utilities.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Delete dead apply_fn

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Delete CppFunction symbolic support.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Delete ForwardFunction

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Batchnorm is 'working'

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-04 09:21:54 -05:00
Adam Paszke
2d2b157d25 Handle repeated outputs in the tracer 2018-01-03 17:29:27 +01:00
Adam Paszke
e6cbe84bf6 Handle repeated inputs in JIT tracer 2018-01-03 17:29:27 +01:00
Edward Z. Yang
5f7c5502b8
Further improvements to ATen convolution (#4287)
- Rename THNN convolution to have thnn_ prefix.
- Propagate CuDNN benchmark and deterministic to at::Context
- Add 'convolution', 'convNd' and 'conv_transposeNd' native wrappers, with defaults
  The conv_transposeNd wrappers are updated to have the same argument
  order as Python.
- torch.nn.functional directly dispatches to the native wrappers
- Make it possible to turn off tracing for some native wrappers, so I don't
  have to write symbolics for all the functions above
- Spectral ops can now make use of CuDNN convolution if possible
- Better commentary on cudnn_batch_norm
- Turn on DCE for all JIT tests.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-21 13:03:43 -05:00
Zachary DeVito
674d7d1f8e Allow compiled functions to call compiled functions. (#4286) 2017-12-20 19:02:59 -05:00
Adam Paszke
efb6feb242 Make the JIT interpreter handle unused inputs correctly 2017-12-20 11:27:40 -08:00
Edward Z. Yang
6daf34ce7b Don't mark index as traceable, and other improvements (#4249)
* Improve 'untraced variable' message, add failing test.
* Make index traceable.
2017-12-20 11:25:59 -08:00
Zachary DeVito
766312b7f2 Further relax VariableFlags, ... and fix bugs (#4244)
* Further relax VariableFlags

* Allow a requires_grad=True trace to be used for a requires_grad=False
  input by computing the gradient but then not connecting it to the
  input.
* Enable CSE to de-duplicate WLM backwards pass code which calls sum twice.
* Fix a bug in the interpreter that frees a register too early when
  it appears twice in a use list.

* [fuser] Follow all outputs to check if fusion is safe

This bug was introduced when we allowed fusion groups
to fuse together. Previously producers were forced to have a single
output, but now producers that are fusion groups can have multiple outputs.
So now we check the uses of all the outputs of a producer.

* [JIT] Fix handling of undefined inputs

It is not legal to call .data() on variable objects whose tensors
are undefined.
2017-12-20 10:36:22 -05:00
Will Feng
1681d07199 Disable tests and fix issues with Windows CUDA build (#4251) 2017-12-20 11:30:21 +01:00
Sam Gross
d605058212
Replace Variable.volatile with torch.no_grad() (#3970)
This removes volatile from Variable. The functionality is mostly
replaced by a global (thread-local) flag, which is controlled by
torch.set_grad_enabled() and the context manager torch.no_grad().

In C++, the flag is exposed through GradMode::is_enabled() and GradMode::set_enabled()

Fixes #3627
2017-12-18 15:46:13 -05:00
Zachary DeVito
0e804ae042 [jit.compile] add a jit_debug_info method (#4205)
This method prints a bunch of useful debug information including
the traces that have been record, their shapes, and the traced
graphs associated with them.
2017-12-16 13:26:28 -05:00
Edward Z. Yang
6d72c82985
Trace ATen native functions as themselves, not their implementations. (#4127)
* Trace ATen non-primitive functions as themselves, not their implementations.

Previously, if I invoked an ATen non-primitive function foo, which in turn
called subfoo, I would always see 'subfoo' in the trace (e.g., tracing
'inlines' all of these operations.)  Such inlining is bad for ONNX
(and can be bad for optimization) as it prevents high-level
optimizations from taking advantage of the structure.  It might
be right to inline, but give the optimizer a chance to work before
inlining happens!

The implementation here is surprisingly simple, because it uses
the "DCE trick".  Essentially, it doesn't matter if the constituent
calls perform tracing, because you can always trace it again, and
override the trace nodes associated with the returned variables.
The original trace becomes dead and can be DCE'd.

While implementing this, I also refactored how 'isTracing' and
'trace_outputs' works:

- isTracing was previously a single function with overloads for
  both Tensor and Variable arguments.  Unfortunately, such overloads
  are not safe, because of how C++ implicit conversions work.  You
  would think that C++ should never confuse an overload for
  Variable with ArrayRef<Tensor>, but this is exactly what can
  happen: Tensor is convertible to both Variable and ArrayRef<Tensor>,
  thus it's ambiguous and C++ doesn't like it.  The last time I ran
  into this problem, I applied initializer lists to everything and
  called it a day.  A more robust fix is to separate out the
  Variable and Tensor overloads, which I have done in this patch.

- trace_outputs was fed as an initializer list, which doesn't work
  when you have heterogenous inputs.  So instead we first feed
  everything through 'flatten', which has overloads for each of the
  argument patterns in ATen, which then goes on to the recordTrace
  (which takes an ArrayRef).  This is *no less efficient*, because
  we were allocating a vector anyway (to do the conversion from
  vector of Tensor to vector of Variable).

This fixes mean that 'index' can properly be traced... although the
JIT still does not support it.  A failing test case has been added to
this effect.

Some knock-on effects:

- The fuser now knows about chunk as well as split.  They're pretty
  similar so there is no problem.

- There is a new 'canonicalize' pass in the JIT which renumbers a graph
  so that all structurally equivalent graphs render the same.

- We run DCE before the fuser tests, to make sure dead nodes don't
  block fusion.

- There are new ONNX exports for the newly introduced higher level ATen
  operations.  This includes type_as (no-op case only), chunk, select.

Zach didn't like the extra use of 'native' in the new codegen, so
we've introduced a new concept, 'abstract'.  An abstract function
is one that is implemented in derived types (e.g., CPUDoubleType),
where as a concrete one is implemented in the base type (Type).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-15 13:50:32 -05:00
Will Feng
db446d69ca Fix issues with Windows 7 & 10 CPU build (#4065) 2017-12-15 10:14:43 +01:00
Zach DeVito
f72fe0624d Add a CPU Fuser (single core)
This adds a simple fusion backend for the CPU.
* Refactors CompiledFusionFunction to have two subclasses that handle
  the compilation details of each backend.
* emit-compile-link-run cycle for the CPU
* simple single core loop to run the operation
* lift CUDA-only restrictions in the fuser, checks that fusion groups
  are only on a single backend.
2017-12-04 14:13:44 -05:00
Luca Antiga
4eb8e12765 Introduce scopes during tracing (#3016) 2017-12-04 09:19:06 -08:00
Zachary DeVito
929a11f920
Add interpreter support for Handles/PythonOp/CppOp (#3866)
* Add interpreter support for Handles/PythonOp/CppOp

This treats Handles as a first-class type in the interpreter
since this turned out to be conceptually simpler than treating
them as a separate concept, which requires a second channel for
register allocating and moving data from one op to the next.

Notes:
* The refcounting nature of tensors is factored into its own base type
so that it can be shared with other refcounted types such as handle.
* Some methods redundant with TensorBase have been deleted from Tensor
* The interpreter uses raw refcounted handles. In addition to being
able to treat Tensors and Handles as the same base object, it removes
a lot of redundant refcounting as objects moved from tensors to input/
output lists.
* aten_dispatch has been updated to work directly on the raw refcounted
lists to avoid refcounting and duplicate lists.
* Removing jit_closure.cpp, The interpreter can now handle all pathways.

* Functions like `unsafeToTensorShare` describe how
ownership transfers in the interpreter. The `Steal` variants
take rvalue references as arguments, and invalidate those
arguments to prevent potential problems.
* Make TensorTemporary not a subtype of Tensor, because it is too easy to
do something horribly unsafe:

```
  void foo(at::Tensor bar) {
    // bar's destructor calls release on a temporary!
  }

  foo(TensorTemporary(retainable)); // structure slicing!
```
2017-11-29 11:38:57 -05:00
Adam Paszke
669a99b595 Remove as much of Python from JIT hot path as possible 2017-11-27 11:42:47 +01:00
Soumith Chintala
c5e8048f58
add error checking for FusionCompiler around old CUDA versions (#3753)
* add error checking for FusionCompiler around old CUDA versions

* improve error message
2017-11-21 18:27:12 -05:00
Adam Paszke
3e4a777e44
Correct JIT interpreter autograd function (#3760) 2017-11-19 21:48:22 +01:00
Edward Z. Yang
689cf7d480
Reduce nondeterminism in test_jit (#3561)
Occasionally Travis builds would fail on these two tests.
It's not entirely clear where this nondeterminism is coming
from.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-17 19:48:59 +08:00
Adam Paszke
d1fb8fdf03 Improve IODescriptors in JIT arg checking 2017-11-17 00:13:02 +01:00
Zach DeVito
ef4b19f767 Refactor ir.h to distinguish Nodes and Values
This commit adds a Value type similar to the one @ezyang suggested a while
ago for handling multi-return nodes.

Previously if we had a graph like:

  a = op1(b)
  c, d = op2(a)

Then its in-memory format would look like:

  %0 = op1(b)
  %1 = op2(%0)
  %2 = select(%1, 0)
  %3 = select(%1, 1)

Select nodes were used only to handle the multi-output case. In the
single-output case ops referred directly to their uses.

This required special handling for the single- and multi- output cases,
and was confusing when used with ONNX which distinguishes values (the
inputs/outputs of a node) from the nodes themselves (e.g. a Conv).

This commit adds the Node/Value distinction to the IR. In the example
above, `a`, `b`, `c`, and `d` are now Value objects, while `op1` and
`op2` are now Node objects. Inputs/Outputs to the graph are values.

* Nodes now always have multiple outputs, accessible through their `output()`
  method.
* Methods exist for adding/removing outputs from a node.
* Nodes own their output Values, destroying a node destroys its outputs and it
is only valid to destroy a node when no uses of its outputs remain.
* Unlike select, Values do not appear in the nodes list.
* The method `node()` on `Value` retrieves its defining node. Calling it
is always valid. For inputs, its kind is "Param". Like "Return" there is a single Param
node representing all inputs.
* For single-output Nodes, the method `output()` retrieves the single
output Value, asserting that the node is in-fact single output.
* Functions are the same, but some functions like `type()` have moved to
Value.
* `replaceAllUsesWith` is now sanely defined for both Values and Nodes.
In the case of Nodes, it replaces all outputs of the node with the outputs
of the replacement node.
* stage is defined both on Node/Value. This is because Inputs require a stage.
* Apart from changing data types from Node->Value most passes remain the same.
  Things that previously assumed single-output nodes now have to call output()
  to get the node.
* This removes the uses = [...] field in the outputs because it was
getting confusing even before this commit when uses would refer to nodes,
but we print the names of Values. The lint pass validates the use list,
so printing it out seems less necessary.
2017-11-15 11:47:18 -08:00
Adam Paszke
3bb2308a89 Minor JIT improvements (#3703)
* Record autograd profiler events in JIT

* Fix the graph fuser

It was supposed to only work for float inputs, but worked
for all types _except_ float.
2017-11-14 21:23:31 -08:00
Adam Paszke
1f1612ee37 Move _CompiledMixin to C++ 2017-11-10 16:31:44 +01:00
Adam Paszke
621fbd5c4e Move flattening/unflattening JIT logic to C 2017-11-06 19:42:44 -05:00
Edward Z. Yang
9ca8b321f5 Skip cpp tests if CUDA not available.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-02 02:21:10 -04:00
Edward Z. Yang
247d50e2ad Improve const-correctness of JIT.
This started off as a minor fix based on Adam's question, "why is printing
a graph not const" and snowballed into a giant yak shaving exercise.

- The Graph and Node APIs now uniformly enforce deep constness; e.g., if you
  get a const Node* or const Graph*, it is not possible to get a non-const
  Node*/Graph* somewhere else in the graph (even though the member variables
  of these are non-const.  Hooray for private access specifier.)

- A big pile of functions got const versions, most notably the printing
  functions, and functions for accessing inputs().

- REALLY IMPORTANT, BC-BREAKING CHANGE: inputs() now returns a COPY of the
  inputs, rather than a reference to the underlying.  I was forced to do this
  because there is no way to portably turn a std::vector<Node*> into a
  std::vector<const Node*>, which is necessary to provide a const-correct
  version of inputs() that enforces deep const-correctness.  I then justified
  this choice to myself with the observation that outputs() returned a
  copy (by necessity), so this makes the API more uniform.

  But making this change uncovered two very subtle bugs:

    1. If you change functions from returning a reference to returning a copy,
       the idiom node->inputs().begin() is no longer valid, because the memory
       the iterator points to immediately becomes invalid.  THIS SUCKS.
       Honestly, we should add a lint rule rejecting calling begin()/end() on
       temporaries because this is very dangerous.  To excise this pattern from
       the codebase, I added begin() and end() methods to Graph, so that we got
       rid of the graph->nodes().begin() idiom, which happens to be sound,
       despite not returning a reference, because graph_node_list is a
       non-owning reference.

    2. pybind11 doesn't handle std::vector<Node*> cast out of the box.
       Fortunately, I found a simple fix in the GitHub issues tracker
       that involved adding an extra type converter.  And yes, this
       does mean that outputs() in Python never worked correctly.

- New const_graph_node_list, which is a graph_node_list that gives you const
  Node*

There are some more miscellaneous improvements:

- Applied CR comment fixes on export.cpp; using replaceInput, and renaming
  variables for clarity.

- assertValidInput helper method added, and applied to replaceInput

- Use an explicit function to print THPObjectPtr, otherwise we get
  the wrong overload.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-01 09:49:53 -04:00
Zachary DeVito
8cc30e4895 Fix the Fusion Pass (#3362)
* update fuser to match ATen-formatted JIT ops

* fix concat optimizations and add test

* allow onnx export to work with single-export functions

* fix onnx handling of multi-return nodes.

* nits, format, vision test update

* fix add constant

* fix driver init issues

* Add missing Neg symbolic.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-31 13:44:13 -04:00
Adam Paszke
fa0f3cf98a Re-enable and fix most JIT tests 2017-10-27 02:40:09 +05:30
Edward Z. Yang
53fe804322 Make ONNX work with new C++ autograd world.
The general strategy is there is a new module, torch.onnx.symbolic, which
contains a function for every ATen method name with the ONNX translation.
While implementing this, I took the opportunity to expunge all references
of 'g' from the public API; instead, it is managed by a global variable in
torch.onnx which tracks the "current graph".

Other changes:

- If you pass a Tensor to op as an argument, it will now automatically be
  converted into a Constant ONNX node.  This lets us remove needing to
  implement ONNX

- Rename value to other, wherever there is both a Scalar and Tensor overload.
  This way, keyword dispatch can work uniformly in both cases.

- Deleted any autograd Function classes that both had a symbolic and were ported
  to the new C++ autograd implementation.  There may still be some straggling
  classes that didn't have symbolic.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-20 15:38:01 -04:00
Edward Z. Yang
f709199c49 Make test_jit more robust about compilation.
It's pretty easy to accidentally fail to actually compile
a JITed region, which means that we have accidentally failed
to have test coverage for a number of features.  This adds
a secret _assert_compiled kwarg, which will raise an error
if we don't actually hit the compiled codepath.

This is not intended to be user visible; we have some other
ideas for handle this case.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-14 12:04:40 -04:00
Edward Z. Yang
f7f37306e4 New torch.jit.verify function for verify once-backward.
A few notes about the implementation:
- Need to plumb 'devices' through to the 'fork_rng' calls.  You definitely
  want these; it makes verify run A LOT faster
- New keyword argument for compiled model execution, '_force_trace', which
  forces us to retrace a model.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-10 11:46:40 -04:00
Edward Z. Yang
6fbdf40284 Translate addmm into Gemm operator / fix alpha-beta mixup / execute in JIT.
The alpha/beta naming in addmm was flipped; this commit fixes that
problem.  It also fixes the ONNX export of alpha/beta parameters.
Finally, it supports executing matmul in the JIT.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-03 17:23:43 -04:00
Adam Paszke
b6b41c829a Add inplace checks in JIT 2017-10-03 10:20:58 -04:00
Edward Z. Yang
954e9e370c Uncurry trace.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-28 12:34:35 -04:00
Edward Z. Yang
600fcf2f04 Delete params.
We have decided we are not going to support it.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-28 12:34:35 -04:00
Edward Z. Yang
0ad6c2d59c Lintfix.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-28 12:34:35 -04:00
Edward Z. Yang
db3349faa3 Support class decorator syntax; remove instance compilation.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-28 12:34:35 -04:00
Edward Z. Yang
0c40305ddd Rewrite torch.jit interface.
torch.jit now contains two user-facing functions: compile and trace
(corresponding to what was previously trace/traced and record_trace).
The non-curried versions of these functions have been eliminated, so
that there is only one function in the API (we *must* have the
curried versions, since these enable their use as decorators).  There is
detailed usage documentation in the docblocks for these methods.

This comes with a complete rewrite of the internals of torch.jit, in the process
fixing a number of bugs.  Key points of the new implementation:

- compile and trace both always return a Module representing the wrapped
  with compilation/tracing underlying function/module.  This makes handling
  of the function/module cases more uniform, as we can think of the function
  case as creating an on-the-fly module with the parameters explicitly
  specified by the user.  For technical reasons, we now *require* any parameters
  in the function case to be honest-to-goodness Parameters (gory details:
  you can't register a Variable as a Parameter to a Module, but you can't
  create a Parameter from a Variable while sharing the same underlying
  identity.)

- Flattening and unflattening is done a lot more uniformly.  We now have
  a _flatten and _unflatten function which are inverses of each other:
  _flatten always returns both the flat, tuple of Variables, *as well as*
  the "proto" (now referred in the code as the "struct") from which we
  can unflatten the variables.  Low level functions like 'raw_trace'
  always work with the flattened inputs/outputs, which keeps their logic
  simple.

- JIT trace keying now also includes the "struct" of the input arguments.
  This is a step towards accepting non-Variable arguments in functions,
  although flatten/unflatten don't currently support it.

- TraceForKey (previously TraceInfo) has had its API reworked to have
  less degrees of freedom when you are interacting with it.

TODO: Verify, timing, and trace dumping have been temporarily excised.  I
plan on adding them back.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-28 12:34:35 -04:00
Edward Z. Yang
c08395e290 Give a better error message when we hit a legacy function.
We now include the type name of the legacy function implementing
class.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-25 12:26:07 -04:00
Lu Fang
f256f686b5 Remove device comparison TODO mark, change the whitelist to a blacklist on node kind checking 2017-09-22 17:06:27 -04:00
Lu Fang
18a1d272bf Add attributes comparison, fixed several issues, more interesting test case. 2017-09-22 17:06:27 -04:00
Lu Fang
0a1ac8bfe5 create a cse pass, with very naive support. 2017-09-22 17:06:27 -04:00
Edward Z. Yang
8d19319fa7 Documentation for FusionGroup and Eval requested by @houseroad (#2808)
Plus a test for Eval nodes in the IR, since we hadn't actually
covered this case now that some nodes are transparently traceable.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-21 17:14:56 -04:00
Edward Z. Yang
794e52bb1c Make cloneFrom() copy all metadata; use createClone() as much as possible.
To be honest, this was the whole point of this refactor set.

I noticed that in a lot of code, we were repeatedly copying lots of metadata
from old nodes to new nodes.  This was quite concerning because I wanted to
add some more metadata (alias information) and I didn't want to have to
get it right in all cases.  Plus, in a lot of cases we were forgetting
to set more optional properties like debug names when we "copied".

To solve this, I first made cloneFrom() copy all of this metadata.  Then,
I searched for all occurrences of setType() (a proxy for "I'm cloning this
node), looked for cases where we really were morally doing a copy, and rewrote
the code to use cloneFrom() instead, allowing us to drop explicit setType()
(and getting more metadata preservation in the process.)

Finally, I refactored tryToMoveChunk.  The code is modestly longer,
but the new version has the nice property that the initialization of
selects for input_chunk are next to the creation of the node (as opposed
to delayed for later.)  I also added a lot more comments for invariants
I noticed when I was working on the code.

One minor extra change: TensorType grew a new constructor and a withSizesStride
"immutable setter" which returns a new copy of TensorType with different info.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-20 12:24:27 -04:00
Adam Paszke
a7c4152302 Prune null edges in Eval nodes 2017-09-19 10:53:32 -04:00
Adam Paszke
28828e033f Make certain functions traceable 2017-09-19 10:53:32 -04:00
Adam Paszke
214eef5e5d Record device information in TensorType and check it in the fuser 2017-09-19 10:53:32 -04:00
Adam Paszke
b708b6de8d Add ONNX pass (JIT trace initialization) 2017-09-19 10:53:32 -04:00
Adam Paszke
6b60f31081 Fix bugs in AutogradClosure 2017-09-19 10:53:32 -04:00
Adam Paszke
aafa35e0b5 Fix bugs in Traceable
The previous refactor introduced a few problems, like not saving the
output proto and not using the flattened inputs when querying
the key.
2017-09-19 10:53:32 -04:00
Adam Paszke
d90cd88fb7 Improve next_functions handling in tracer and JIT closure
Added extra logic that records edges of previous stages and allows
JIT closures to copy next_functions for next stages.
2017-09-06 21:35:50 -04:00
Adam Paszke
3b1dfcb51c Add trace flag checking in backward passes too 2017-09-06 21:35:50 -04:00
Adam Paszke
ea888c1905 Check input flags in Traceable 2017-09-06 21:35:50 -04:00
Adam Paszke
230721e198 Support calling traced functions multiple times in forward
* Variables now hold a list of ValueTracingStates and can participate
in multiple traces.

* Refactored Traceable to maintain a list of traces, and only stop
tracing once it records all stages
2017-09-06 21:35:50 -04:00
Adam Paszke
fdbef1cfb0 Traces can now expire 2017-09-06 21:35:50 -04:00
Edward Z. Yang
57eb8bd288 Frontend refactor, and some documentation.
- BC BREAKING: export now also takes a mandatory file-ish argument, specifying
  the file to export the protobuf to.  I rewrote the tests to use BytesIO to
  get out the string so they could parse it again.

- BC BREAKING: export no longer returns the tensors that were computed.  To
  get these, use the internal _export function.

- Multiple inputs to models are now supported by passing a tuple to input.
  (Old API of a single Variable still works.)

- Keyword arguments to models are now supported via kwargs keyword arg.

- Renamed embed_params to export_params, and it now defaults to True.

- Toffee tests now live in their own test_toffee.py file.  I had to
  rename a pile of expect files for this.

- Removed defunct torch.toffee imports from autograd to solve module import
  cycle.

- Helper function _with_file_like to abstract over opening file-ish arguments,
  taken from torch.save()

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
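
A hedged usage sketch of the refactored export frontend described above. The entry point is written as torch.toffee.export because the commit mentions the torch.toffee module, but the exact path and signature at this revision are assumptions; the file-ish argument, BytesIO in tests, tuple inputs, and export_params come from the commit text.

    import io
    import torch
    import torch.nn as nn
    from torch.autograd import Variable

    model = nn.Linear(4, 2)
    x = Variable(torch.randn(1, 4))

    # export() now requires a file-ish argument; BytesIO works well in tests
    # because the serialized protobuf can be read back out as a string.
    f = io.BytesIO()
    torch.toffee.export(model, x, f, export_params=True)  # assumed entry point
    proto_bytes = f.getvalue()

    # Multiple model inputs are passed as a tuple, e.g.
    #   torch.toffee.export(model, (a, b), f)
    # and keyword arguments to the model go through the kwargs keyword arg.
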
Edward Z. Yang
3ef2ec6153 Actually correctly handle non-float exports.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Adam Paszke
c537aebf5a Always run DCE in Traceable 2017-09-05 17:48:55 -04:00
Edward Z. Yang
d59714e3b1 Code review comment changes.
- Reduce setup.py diff.
- Expunge WITH_TOFFEE from codebase.
- Elaborate on a comment.
- Move gen_toffee.sh to tools
- Delete densenet test.
- Use 'using' to inherit a constructor.
- Delete outdated comment.
- Comment about why primspecs can return fewer outputs.
- Remove dead, commented out includes.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
2e266837f5 Port TracingState to pybind11, new export() method.
Along the way I added converters for Variable and TracingInput.  Variable should
probably be moved to a more widely known spot.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Adam Paszke
d8e2ab632e Add support for Constant nodes in AutogradClosureFactory 2017-09-05 17:48:55 -04:00
Adam Paszke
594f98ce16 Support multi-stage AutogradClosures 2017-09-05 17:48:55 -04:00
Edward Z. Yang
b2e305e390 Lint after ToffeeIR, and subsequent fallout.
I realized we weren't running the linter after ToToffeeIR, so
I added a lint call.  It thus emerged that the current implementation
was using "Unused" nodes that were not added to the graph,
which was tripping the lint.  I fixed this a few ways:

- BatchNorm and Conv primspecs were returning dead "unused" nodes
  for their (implicit) handle parameters.  I removed them because
  setOutputs handles this already, and a dead unused node which
  is not attached to the graph violates the "no dead nodes"
  invariant.

- OK, but MaxPool actually needs to return an unused node for
  the output which is supported by PyTorch but not Toffee; we need
  to error if this output is subsequently used in the trace.
  The new strategy is to have MaxPool's primspec return a None
  at the unused position, and then immediately *check* if there
  are any uses of that output.  If there are, that's an error!

- I needed to adjust the Select invariant in the exporter loop:
  only if a Select node has *uses* is it mandatory for it to be
  defined in env.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
82efbe349b Handle batchnorm properly.
Basic idea:
- Pass buffers (marked as non-Variable tensors) as input variables to
  the trace.   Every buffer gets represented as an input variable
  to the trace, and we remember a correspondence of the underlying
  TH pointer and an input variable in the trace.
- When we initially trace a function, we DO NOT record the buffers
  as edges.  This is so autograd doesn't have to know anything about buffers.
  If we ever turn buffers into requires_grad=False parameters, then
  this problem goes away.
- When we primspec the buffer, NOW we reach into the cached buffers
  (now appropriately named) and gin up the buffer information we need.

Other things:
- CppOp execution is now supported (but lightly tested) using
  SimpleEval (thanks @apaszke!)

Todo:
- E2E tests need to have their hacks removed.
- Figure out what is going on with backwards

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
de6ef65be5 Port to nanopb.
General strategy:
- nanopb is statically linked into PyTorch.  It must be built
  with -fPIC.
- Generated nanopb files for toffee.proto are checked into
  our repo.
- Because nanopb generated protobufs are C only, we wrote a
  wrapper around it to give a Google C++ style interface.
  More on this shortly.

How does the wrapper work?
- It's called "micropb" because it is less small than nanopb :)
- nanopb requires all variable-length fields to be written out
  using a "callbacks" mechanism.
- We wrote pre-canned callbacks for all of the types ToffeeIR
  writes out, and for lists of them; these are micropb_callback and
  micropb_callback_list.  These operate simply by dynamically
  allocating and storing the data to be written out in
  data (this defeats the purpose of the callback mechanism,
  but it's easy to implement)
- Finally some boilerplate to actually implement the wrapper
  classes and have owning pointers to the actual data.

Testing strategy:
- Take the serialized protobuf from nanopb, parse it again
  with ToffeeIR and print it.  Worked with all of test_jit.py!
  These tests don't run without 'toffee' being installed.

TODO:
- Update CI to install ToffeeIR, so we can run the Toffee tests
  in CI
- Update E2E with Caffe2 tests so that they work with new stuff.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Zach DeVito
bad5717e15 add ability to specify initial values for inputs 2017-09-05 17:48:55 -04:00
Edward Z. Yang
f062e06c91 Make null Variables on convolution and batchnorm work.
This addresses the case when bias is disabled, which occurs in torchvision's
alexnet and densenet.

The general strategy is this:

- When we encounter a null variable, we turn this into a Constant
  node with an undefined at::Tensor

- Toffee exports for BatchNorm and Conv have special cases for bias,
  checking if they are provided by a Constant node with undefined
  value, and just omit the input if so.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Zach DeVito
a60d9bd022 Bind Attributes in Python IR, and add a test for the Python IR binding 2017-09-05 17:48:55 -04:00
Zach DeVito
a3fdb281d1 Python wrapper for Node IR using pybind11
Supports almost all of the IR API.
2017-09-05 17:48:55 -04:00
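
The binding's surface has changed over time, so the snippet below uses the present-day descendant of this wrapper (torch.jit.trace, Graph.nodes(), Node.kind()) purely to illustrate the kind of IR inspection it enables; the 2017 API differed in its details.

    import torch

    def f(x):
        return x * 2 + 1

    traced = torch.jit.trace(f, torch.randn(3))
    for node in traced.graph.nodes():
        # Walk the traced IR from Python and inspect each node's kind,
        # e.g. the multiply and add produced by the function above.
        print(node.kind())
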
Adam Paszke
fa308b3183 Improve backward tracing 2017-09-05 17:48:55 -04:00
Edward Z. Yang
ee2ba279f2 Working Reshape op
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
35f1cb462d Invert negation.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
6f6fe177f1 Make Toffee optional. Unbreaks CI.
The general strategy:

- We put all the toffee files in torch/csrc/toffee; they will only be
  added when toffee is enabled

- Toffee is enabled if torch/lib/ToffeeIR is present (since we
  don't have a submodule/subtree thing going on)

- The most prevalent place you will need to use WITH_TOFFEE is for
  primspec definitions on C++ autograd functions.  There is a
  macro HAS_PRIMSPEC to simplify optionally defining primspec()
  virtual overrides on Function classes.  HasPrimspec is always
  available but will be a zero-field class when Toffee is disabled.

NB: We might revert this commit in the future if we figure out a way
to unconditionally enable Toffee that everyone likes.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
dd58b145c3 Toffee graph exporting for PyTorch.
This commit adds a new exporter pass which takes a graph and returns
a string of the human-readable protobuf representation of a model.

We have two strategies for how conversions are implemented:

- If a Python autograd function has a primspec static method, we invoke
  it to get the Toffee conversion.  Use torch.toffee.op to generate the
  format expected to be returned.  The particular data representation is opaque
  and subject to change in the future.

- Otherwise, there's a giant if statement in the exporter, which manually
  uses the JIT IR C++ API and Toffee IR C++ protobuf API to convert.

You must check out a copy of the ToffeeIR repo
https://github.com/ProjectToffee/ToffeeIR at torch/lib; at the moment
we don't have a subtree/submodule set up.

Technical debt in this commit:

- To get protobuf headers in scope, we unconditionally add $CONDA_PREFIX/include
  to the include path.  This needs to be replaced with a more robust mechanism.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
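
A hedged sketch of the primspec hook described above. torch.toffee.op is named in the commit, but the signature of primspec, the operator name, and the call shape here are assumptions, not the verified API of this revision.

    import torch
    from torch.autograd import Function

    class MyRelu(Function):
        @staticmethod
        def primspec(input):
            # Return the Toffee conversion for this autograd function.
            # Operator name and call shape are illustrative only.
            return torch.toffee.op("Relu", input)

        @staticmethod
        def forward(ctx, input):
            # backward omitted; this sketch only shows where primspec lives.
            return input.clamp(min=0)
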
Adam Paszke
890c2071f0 PR comments 2017-09-05 17:48:55 -04:00
Zach DeVito
e91966a0b4 Unify our tracing API into a single interface for functions/models.
The API works on either functions or models, taking an extra parameter argument
so that functions can pass in additional variables to trace.

Other behavior is folded into boolean options:

time - collect stats for our own perf debugging
verify - run the original code, and check it is within threshold
optimize - run optimization (currently off until the FusionGroups PR is accepted).
enabled - flag to turn off tracing so you can check timing of stuff that cannot be traced.
2017-09-05 17:48:55 -04:00
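
A hedged sketch of how the unified interface might be called with the four options listed above; the entry-point name and keyword spellings are assumptions about this revision, only the option meanings come from the commit.

    import torch
    import torch.nn as nn
    from torch.autograd import Variable

    model = nn.Linear(4, 2)
    x = Variable(torch.randn(1, 4))

    # Hypothetical call shape: one wrapper for functions and models alike.
    traced_model = torch.jit.trace(model,
                                   time=False,      # collect perf stats
                                   verify=True,     # also run the original, compare within threshold
                                   optimize=False,  # optimization pass (off for now)
                                   enabled=True)    # False bypasses tracing entirely
    y = traced_model(x)
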
Adam Paszke
7f60a18293 Add initial support for backward tracing 2017-09-05 17:48:55 -04:00
Edward Z. Yang
eb730f8321 Inplace test.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
4a1bbc01ac Fix #41.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
21c0ad9702 Test case that we fail legacy traces
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
453b0fac03 Always print diffs, no matter how large.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Adam Paszke
1c4538e017 Trace C functions 2017-09-05 17:48:55 -04:00
Adam Paszke
bdcbbeaf68 Remove GlobalTracingState 2017-09-05 17:48:55 -04:00
Edward Z. Yang
b158aaf6b4 Make linter an optimization pass.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Adam Paszke
2dc3ef73ae Lint 2017-09-05 17:48:55 -04:00
Adam Paszke
3e0f1608fe Capture Variables that are not inputs as constants 2017-09-05 17:48:55 -04:00
Adam Paszke
233a66dcbe Remove SimpleMap from JIT IR 2017-09-05 17:48:55 -04:00
Zach DeVito
50e51eaa7f Fusion of simple map operations using nvrtc.
The approach is based on THC's pointwiseApply{1,2,3} family of kernels,
but doesn't have any dependencies on that code.

Adjacent contiguous dimensions of input tensors are compressed to reduce the complexity of indexing math.
For the completely contiguous case, the indexing logic simplifies to just the linear index.

In simple tests, this code matched or beat the equivalent from THC.
2017-09-05 17:48:55 -04:00
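
The dimension-compression step mentioned above is simple to state concretely. A small Python sketch (illustrative, not the fuser's actual C++ code): adjacent dimensions are merged whenever the outer dimension strides exactly over the inner one, so a fully contiguous tensor collapses to a single linear index.

    def compress_dims(sizes, strides):
        # Merge adjacent dimensions when the outer one strides exactly over
        # the inner one, i.e. strides[i] == strides[i+1] * sizes[i+1].
        out_sizes, out_strides = [sizes[0]], [strides[0]]
        for size, stride in zip(sizes[1:], strides[1:]):
            if out_strides[-1] == stride * size:
                out_sizes[-1] *= size       # fold into the previous dimension
                out_strides[-1] = stride
            else:
                out_sizes.append(size)
                out_strides.append(stride)
        return out_sizes, out_strides

    # A contiguous 2x3x4 tensor collapses to one dimension of 24 elements:
    print(compress_dims([2, 3, 4], [12, 4, 1]))   # ([24], [1])
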
Adam Paszke
a4086508c6 Enable tests 2017-09-05 17:48:55 -04:00
Adam Paszke
f270973937 Add JIT IR -> Autograd IR converter 2017-09-05 17:48:55 -04:00
Adam Paszke
e186d16e6b Apply JIT optimizations from Python 2017-09-05 17:48:55 -04:00
Adam Paszke
72659bcdef Minor code cleanup 2017-09-05 17:48:55 -04:00
Edward Z. Yang
57d65a99bb Add LSTM fusion test.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
e238f3cada Very simple accept/golden test framework for JIT trees.
- To test whether or not a multiline string matches some expected
  value, you can use assertExpected.  This tests that the string
  matches the content stored at a file based on the name of the
  test (and an optional subname parameter you can pass if you
  want to call assertExpected multiple times).

- Suppose you make a change that modifies the output in a big way.
  Instead of manually going through and updating each test, you instead
  run python test/test_jit.py --accept.  This updates all of the expected
  outputs.  You can now review them one-by-one and make sure your
  changes make sense.

We can add more features later (e.g., munging the output to make it
more stable, more sanity checking) but this is just to get us started
testing.  One thing to watch out for is that accept tests on intermediate
representation can be a bit wobbly: it is *extremely* important that
people be able to read the IR.  It may be worth introducing niceties
to the printer in order to ensure this is the case.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
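
A hedged, self-contained sketch of the accept-test pattern described above; the expect-file naming and flag handling are illustrative, not the exact helper that lives in the PyTorch test harness.

    import os
    import sys
    import unittest

    ACCEPT = "--accept" in sys.argv

    class ExpectTestCase(unittest.TestCase):
        def assertExpected(self, actual, subname=""):
            # The golden output lives in a file derived from the test name.
            name = self.id().split(".")[-1] + ("-" + subname if subname else "")
            path = os.path.join("expect", name + ".expect")
            if ACCEPT:
                os.makedirs("expect", exist_ok=True)
                with open(path, "w") as f:
                    f.write(actual)          # record/refresh the expected output
            elif not os.path.exists(path):
                self.fail("no expect file %s; re-run with --accept" % path)
            else:
                with open(path) as f:
                    self.assertMultiLineEqual(f.read(), actual)
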
Adam Paszke
55c9e0258e Make the linter happy 2017-09-05 17:48:55 -04:00
Edward Z. Yang
2ced918063 Add a very simple visual (non-automated) test.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00