Commit Graph

9 Commits

Author SHA1 Message Date
Adam Paszke
17148f891f Fix a leak in JIT interpreter 2018-01-03 13:44:49 -05:00
Zachary DeVito
766312b7f2 Further relax VariableFlags, ... and fix bugs (#4244)
* Further relax VariableFlags

* Allow a requires_grad=True trace to be used for a requires_grad=False
  input by computing the gradient but then not connecting it to the
  input (see the sketch below).
* Enable CSE to de-duplicate WLM backward-pass code that calls sum twice.
* Fix a bug in the interpreter that frees a register too early when
  it appears twice in a use list.
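
The sketch referenced above (hedged: this assumes the current torch.jit.trace
entry point and a toy function f, neither of which is from this commit):

```
import torch

def f(x):
    return x * x

# Trace with a requires_grad=True input...
traced = torch.jit.trace(f, torch.randn(3, requires_grad=True))

# ...then reuse the trace for a requires_grad=False input: the gradient
# is still computed internally, just never connected to this input.
out = traced(torch.randn(3))
```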

* [fuser] Follow all outputs to check if fusion is safe

This bug was introduced when we allowed fusion groups
to fuse together. Previously producers were forced to have a single
output, but now producers that are fusion groups can have multiple outputs.
So now we check the uses of all the outputs of a producer.
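
The fix in sketch form (hypothetical Python mirroring the described check,
not the fuser's actual code):

```
def all_outputs_fusable(producer, consumer):
    # producer/consumer are node-like objects; each output's .uses lists
    # the nodes that consume it (illustrative structure only).
    for out in producer.outputs:
        for use in out.uses:
            if use is not consumer:
                return False  # an output escapes the planned fusion group
    return True
```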

* [JIT] Fix handling of undefined inputs

It is not legal to call .data() on variable objects whose tensors
are undefined.
2017-12-20 10:36:22 -05:00
Sam Gross
d605058212 Replace Variable.volatile with torch.no_grad() (#3970)
This removes volatile from Variable. The functionality is mostly
replaced by a global (thread-local) flag, which is controlled by
torch.set_grad_enabled() and the context manager torch.no_grad().

In C++, the flag is exposed through GradMode::is_enabled() and GradMode::set_enabled().
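
A minimal sketch of the replacement API (current torch names; illustrative
only):

```
import torch

x = torch.ones(2, 2, requires_grad=True)

with torch.no_grad():           # gradient recording disabled in this block
    y = x * 2
assert not y.requires_grad      # y is cut off from the graph

torch.set_grad_enabled(True)    # the global (thread-local) flag
```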

Fixes #3627
2017-12-18 15:46:13 -05:00
Zachary DeVito
84b7daadb2 Relax verify of VariableFlags (#4191)
* Fix another leak in pybind11 code.

This time caused by an upstream pybind11 bug:

https://github.com/pybind/pybind11/pull/1216

This change causes the code to go down a non-buggy pathway.

* Relax verify of VariableFlags

If we trace with a defined tensor but then see a run with an undefined
tensor, we now allow that run to happen, replacing the tensor with
zeros.

This also fixes a bug where stage 0 tensors were not
checked against their verify flags.

This change does _not_ handle all bad situations that can happen.
For instance, if the first thing traced has an undefined tensor but
the tensor is defined in a later run, then it will fail because the
graph itself does not contain the trace for the derivative of the tensor.
However, it is possible to work around this latter case by
dry-running the function:

   z = Variable(..., requires_grad=True)  # ensure every input is defined
   x, y = f(z)
   (x.sum() + y.sum()).backward()          # records the derivative trace
2017-12-15 12:57:31 -05:00
Zach DeVito
f72fe0624d Add a CPU Fuser (single core)
This adds a simple fusion backend for the CPU.
* Refactors CompiledFusionFunction to have two subclasses that handle
  the compilation details of each backend.
* Adds an emit-compile-link-run cycle for the CPU.
* Runs the operation with a simple single-core loop.
* Lifts the CUDA-only restrictions in the fuser and checks that fusion
  groups are only on a single backend.
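
As a rough illustration of what fusion buys (plain Python standing in for the
generated code; not the fuser's actual output):

```
import torch

x = torch.randn(8)
y = torch.randn(8)

# Unfused: each elementwise op makes a separate pass over the data.
out = torch.sigmoid(x * y) + y

# Fused: one single-core loop evaluates the whole expression per element.
fused = torch.empty(8)
for i in range(8):
    fused[i] = torch.sigmoid(x[i] * y[i]) + y[i]
```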
2017-12-04 14:13:44 -05:00
Zachary DeVito
929a11f920 Add interpreter support for Handles/PythonOp/CppOp (#3866)
* Add interpreter support for Handles/PythonOp/CppOp

This treats Handles as a first-class type in the interpreter,
since this turned out to be conceptually simpler than treating
them as a separate concept, which would require a second channel for
register allocation and for moving data from one op to the next.

Notes:
* The refcounting nature of tensors is factored into its own base type
so that it can be shared with other refcounted types such as Handle (see
the sketch after these notes).
* Some methods redundant with TensorBase have been deleted from Tensor.
* The interpreter uses raw refcounted handles. In addition to letting it
treat Tensors and Handles as the same base object, this removes
a lot of redundant refcounting as objects move from tensors to input/
output lists.
* aten_dispatch has been updated to work directly on the raw refcounted
lists to avoid refcounting and duplicate lists.
* Removes jit_closure.cpp; the interpreter can now handle all pathways.
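
A rough analogy for that shared refcounted base (hypothetical Python standing
in for the C++ class hierarchy; names are illustrative):

```
class Retainable:
    # Shared refcounted base, so Tensors and Handles can move through the
    # interpreter's register and input/output lists uniformly.
    def __init__(self):
        self.refcount = 1

    def retain(self):
        self.refcount += 1

    def release(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.destroy()

    def destroy(self):
        pass  # subclasses free their payload here

class TensorImpl(Retainable):
    pass

class HandleImpl(Retainable):
    pass
```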

* Functions like `unsafeToTensorShare` describe how
ownership transfers in the interpreter. The `Steal` variants
take rvalue references as arguments, and invalidate those
arguments to prevent potential problems.
* Make TensorTemporary not a subtype of Tensor, because a subtype
relationship makes it too easy to do something horribly unsafe:

```
  void foo(at::Tensor bar) {
    // bar's destructor calls release() on a temporary!
  }

  foo(TensorTemporary(retainable)); // object slicing!
```
2017-11-29 11:38:57 -05:00
Zach DeVito
ef4b19f767 Refactor ir.h to distinguish Nodes and Values
This commit adds a Value type similar to the one @ezyang suggested a while
ago for handling multi-return nodes.

Previously if we had a graph like:

  a = op1(b)
  c, d = op2(a)

Then its in-memory format would look like:

  %0 = op1(b)
  %1 = op2(%0)
  %2 = select(%1, 0)
  %3 = select(%1, 1)

Select nodes were used only to handle the multi-output case. In the
single-output case ops referred directly to their uses.

This required special handling for the single- and multi- output cases,
and was confusing when used with ONNX which distinguishes values (the
inputs/outputs of a node) from the nodes themselves (e.g. a Conv).

This commit adds the Node/Value distinction to the IR. In the example
above, `a`, `b`, `c`, and `d` are now Value objects, while `op1` and
`op2` are now Node objects. Inputs/Outputs to the graph are values.
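
A minimal sketch of the distinction (hypothetical Python mirroring the design
described here; the real implementation is C++ in ir.h):

```
class Value:
    # An SSA value: one input/output of a Node.
    def __init__(self, node, offset):
        self.node = node      # defining Node; always valid to ask for
        self.offset = offset  # which output of that node this is
        self.uses = []

class Node:
    # An operator; it owns its output Values.
    def __init__(self, kind, inputs=()):
        self.kind = kind            # e.g. "op1", "Param", "Return"
        self.inputs = list(inputs)  # Values consumed by this node
        self.outputs = []           # Values produced (and owned) by it

    def add_output(self):
        v = Value(self, len(self.outputs))
        self.outputs.append(v)
        return v

    def output(self):
        # Convenience for single-output nodes.
        assert len(self.outputs) == 1
        return self.outputs[0]
```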

* Nodes now always have multiple outputs, accessible through their `outputs()`
  method.
* Methods exist for adding/removing outputs from a node.
* Nodes own their output Values; destroying a node destroys its outputs, and
it is only valid to destroy a node when no uses of its outputs remain.
* Unlike select, Values do not appear in the nodes list.
* The method `node()` on `Value` retrieves its defining node; calling it
is always valid. For graph inputs the kind is "Param"; like "Return", there
is a single Param node representing all inputs.
* For single-output Nodes, the method `output()` retrieves the single
output Value, asserting that the node is in-fact single output.
* Functions are mostly the same, but some, like `type()`, have moved to
Value.
* `replaceAllUsesWith` is now sanely defined for both Values and Nodes.
In the case of Nodes, it replaces all outputs of the node with the outputs
of the replacement node.
* stage is defined on both Node and Value, because inputs require a stage.
* Apart from changing data types from Node to Value, most passes remain the
  same. Things that previously assumed single-output nodes now have to call
  output() to get the Value.
* This removes the uses = [...] field from the printed outputs because it was
getting confusing even before this commit: uses referred to nodes, but we
print the names of Values. The lint pass validates the use list, so printing
it out seems less necessary.
2017-11-15 11:47:18 -08:00
Adam Paszke
3bb2308a89 Minor JIT improvements (#3703)
* Record autograd profiler events in JIT
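
For reference, a minimal sketch of reading those events (assuming the
torch.autograd.profiler API; illustrative only):

```
import torch
from torch.autograd import profiler

x = torch.randn(4, 4, requires_grad=True)
with profiler.profile() as prof:
    y = (x * x).sum()
    y.backward()
print(prof.key_averages())  # JIT-recorded events appear alongside the rest
```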

* Fix the graph fuser

It was supposed to only work for float inputs, but worked
for all types _except_ float.
2017-11-14 21:23:31 -08:00
Zachary DeVito
e43ff32192 Add a JIT interpreter (#3634)
* Add a JIT interpreter

The separate interpreter is used to run graphs with lower overhead than
converting them to autograd graphs. Some notes:

* does not support Handles/PythonOp/CppOp; these will come in a future commit
* jit_closure.cpp still exists and we fall back to it for now when we
  cannot handle something because of PythonOp/CppOp
* In order to support retain_graph=True, the interpreter can be cloned,
  creating a copy that can be run with different arguments (see the sketch
  after these notes). This is assumed to be the non-standard case, so cloning
  is not particularly optimized. No tensor _data_ is copied, but the
  at::Tensor list in the interpreter is. If we hit problems, there is a lot
  we could do (such as register allocation) to minimize the stuff that
  needs to be copied.
* Uses a pImpl pattern to keep implementation details out of its header file.
* Modifies the way getTensorOp works so that it reads/writes to
  already-existing vectors; this avoids having to realloc these buffers
  each time.
* Timings are here: https://gist.github.com/zdevito/5a20ac29fb1b9e449e693b67dc478127
  This reduces overhead to about the same as running it in Python.
  It is about 10us faster to run the same thing using ATen directly.
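
The sketch referenced above: retain_graph=True is what forces the clone,
since the same interpreter must be runnable twice (current torch names;
illustrative only):

```
import torch

x = torch.randn(3, requires_grad=True)
y = (x * x).sum()
y.backward(retain_graph=True)  # keep the graph alive for another run
y.backward()                   # a second backward over the same graph
```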

* Code Mod

Interpreter -> InterpreterState
Function -> Code

Add other requested comments.

* RegList -> ListHandle<T>

Change the RegList functions to be safer by identifying the type of
each argument list, and checking that list insert does not try
to add to two different lists at once.
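
A rough Python analogue of the safer list building (hypothetical; the real
ListHandle<T> is C++ inside the interpreter):

```
class ListHandle:
    # One typed argument list, stored as a (start, size) span in a flat
    # register vector (stand-in for the C++ ListHandle<T>).
    def __init__(self):
        self.start = None
        self.size = 0

def list_insert(registers, handle, reg):
    if handle.start is None:
        handle.start = len(registers)
    # Lists must be built contiguously: fail loudly if an insert tries
    # to grow two different lists at once.
    assert handle.start + handle.size == len(registers), \
        "insert would interleave two lists"
    registers.append(reg)
    handle.size += 1
```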

* Use exact equality for interpreter tests
2017-11-13 22:09:53 -08:00