Commit Graph

96 Commits

Peter Goldsborough
2d5fbe6e0d Improve Variable interface (#5127)
* Improve Variable interface

* Address comments from @apaszke and @colesbury

* string ::operator= is not noexcept

* Remove ir.h from tracer_state.h to improve build times

* Make Variable a struct and pack SavedVariable fields

* Implement as_variable_ref

* grad_fn_ptr() -> grad_fn_unsafe()

* Reduce hackiness of set_type hack

* Include variable.h and edge.h in tracer_state.h because it uses them

* class Variable -> struct Variable because Windows can't even

* Make Variable::output_nr uint32_t instead of int

* Add comment about tracing state

* Replace more static_cast<Variable&> uses and improve docs

* Remove SavedVariable destructor and construct members in init list

* Clarify docs for Variable

* Variable::set_version -> set_version_counter
2018-02-12 23:26:26 -05:00
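A rough C++ sketch of the reshaped interface this commit describes; the member names follow the bullets above, but the field layout and the VersionCounter type are assumptions for illustration, not the real variable.h:

```
#include <cstdint>
#include <memory>
#include <utility>

struct Function;        // autograd function node (forward declaration)
struct VersionCounter;  // shared version-tracking object (assumed name)

// Hypothetical sketch of the reworked Variable.
struct Variable {
  // Renamed accessor: grad_fn_ptr() -> grad_fn_unsafe() (raw, non-owning).
  Function* grad_fn_unsafe() const noexcept { return grad_fn_.get(); }

  // output_nr is now uint32_t instead of int.
  uint32_t output_nr() const noexcept { return output_nr_; }

  // Renamed mutator: set_version -> set_version_counter.
  void set_version_counter(std::shared_ptr<VersionCounter> counter) {
    version_counter_ = std::move(counter);
  }

 private:
  std::shared_ptr<Function> grad_fn_;
  std::shared_ptr<VersionCounter> version_counter_;
  uint32_t output_nr_ = 0;
};
```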
Adam Paszke
0ef10385b2
Make Python functions respect grad mode (#5184) 2018-02-13 01:27:36 +01:00
Peter Goldsborough
f38b6f611e Replace NULL with nullptr in autograd (#5162) 2018-02-12 12:01:52 -08:00
Peter Goldsborough
25e946bf78 Replace edge_type with Edge and create Variable::gradient_edge() (#5030) 2018-02-07 10:50:42 -08:00
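A minimal sketch of what an Edge type like this could look like; the two-field shape follows the commit title, and the exact member names are assumptions:

```
#include <cstdint>
#include <memory>
#include <utility>

struct Function;  // autograd function node (forward declaration)

// Sketch: an Edge identifies "input number input_nr of function" in the
// autograd graph, replacing the anonymous pair-style edge_type.
struct Edge {
  Edge(std::shared_ptr<Function> function_, uint32_t input_nr_) noexcept
      : function(std::move(function_)), input_nr(input_nr_) {}

  std::shared_ptr<Function> function;  // the function this edge points at
  uint32_t input_nr;                   // which input of that function
};
```

Variable::gradient_edge() then presumably bundles a variable's grad_fn and output_nr into one such value instead of an untyped pair.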
Sam Gross
895aebac08
Use Variable instead of Tensor in Function.forward (#4786)
The Tensor and Variable classes are being merged.
autograd.Function.forward is now called on Variables, but with "no-grad"
mode (torch.no_grad()) enabled.

One benefit is that we no longer have to explicitly track shared
storages.
2018-02-06 17:24:27 -05:00
Peter Goldsborough
61b5ea85d4 Remove FunctionFlags (#5018) 2018-02-03 20:57:39 -05:00
peterjc123
77ea2f26d8 Add build support for Python 2.7 using MSVC (#4226) 2017-12-20 15:07:25 +01:00
Richard Zou
cf2e088c9a Translate None to zeros for old-style autograd functions (#4242) 2017-12-20 14:03:56 +01:00
Sam Gross
d605058212
Replace Variable.volatile with torch.no_grad() (#3970)
This removes volatile from Variable. The functionality is mostly
replaced by a global (thread-local) flag, which is controlled by
torch.set_grad_enabled() and the context manager torch.no_grad().

In C++, the flag is exposed through GradMode::is_enabled() and GradMode::set_enabled()

Fixes #3627
2017-12-18 15:46:13 -05:00
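GradMode::is_enabled() and GradMode::set_enabled() are named in the commit; below is a minimal sketch of how a thread-local flag like that works, with an illustrative (not necessarily real) RAII guard playing the role of torch.no_grad():

```
// Sketch of a thread-local grad-mode flag.
struct GradMode {
  static bool is_enabled() { return enabled; }
  static void set_enabled(bool value) { enabled = value; }

 private:
  static thread_local bool enabled;
};
thread_local bool GradMode::enabled = true;

// Hypothetical scope guard: grad mode is off for the guard's lifetime,
// mirroring the torch.no_grad() context manager.
struct NoGradGuard {
  NoGradGuard() : prev_(GradMode::is_enabled()) { GradMode::set_enabled(false); }
  ~NoGradGuard() { GradMode::set_enabled(prev_); }

 private:
  bool prev_;
};
```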
Edward Z. Yang
6d72c82985
Trace ATen native functions as themselves, not their implementations. (#4127)
* Trace ATen non-primitive functions as themselves, not their implementations.

Previously, if I invoked an ATen non-primitive function foo, which in turn
called subfoo, I would always see 'subfoo' in the trace (e.g., tracing
'inlines' all of these operations.)  Such inlining is bad for ONNX
(and can be bad for optimization) as it prevents high-level
optimizations from taking advantage of the structure.  It might
be right to inline, but give the optimizer a chance to work before
inlining happens!

The implementation here is surprisingly simple, because it uses
the "DCE trick".  Essentially, it doesn't matter if the constituent
calls perform tracing, because you can always trace it again, and
override the trace nodes associated with the returned variables.
The original trace becomes dead and can be DCE'd.

While implementing this, I also refactored how 'isTracing' and
'trace_outputs' works:

- isTracing was previously a single function with overloads for
  both Tensor and Variable arguments.  Unfortunately, such overloads
  are not safe, because of how C++ implicit conversions work.  You
  would think that C++ should never confuse an overload for
  Variable with ArrayRef<Tensor>, but this is exactly what can
  happen: Tensor is convertible to both Variable and ArrayRef<Tensor>,
  thus it's ambiguous and C++ doesn't like it.  The last time I ran
  into this problem, I applied initializer lists to everything and
  called it a day.  A more robust fix is to separate out the
  Variable and Tensor overloads, which I have done in this patch.

- trace_outputs was fed as an initializer list, which doesn't work
  when you have heterogeneous inputs.  So instead we first feed
  everything through 'flatten', which has overloads for each of the
  argument patterns in ATen, and which then calls recordTrace
  (which takes an ArrayRef).  This is *no less efficient*, because
  we were allocating a vector anyway (to do the conversion from
  vector of Tensor to vector of Variable).

These fixes mean that 'index' can properly be traced... although the
JIT still does not support it.  A failing test case has been added to
this effect.

Some knock-on effects:

- The fuser now knows about chunk as well as split.  They're pretty
  similar so there is no problem.

- There is a new 'canonicalize' pass in the JIT which renumbers a graph
  so that all structurally equivalent graphs render the same.

- We run DCE before the fuser tests, to make sure dead nodes don't
  block fusion.

- There are new ONNX exports for the newly introduced higher level ATen
  operations.  This includes type_as (no-op case only), chunk, select.

Zach didn't like the extra use of 'native' in the new codegen, so
we've introduced a new concept, 'abstract'.  An abstract function
is one that is implemented in derived types (e.g., CPUDoubleType),
whereas a concrete one is implemented in the base type (Type).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-15 13:50:32 -05:00
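The overload hazard this commit describes can be reproduced with simple stand-ins (toy types for illustration, not the real ATen classes):

```
struct Tensor {};

struct Variable {
  /*implicit*/ Variable(const Tensor&) {}  // Tensor converts to Variable
};

template <typename T>
struct ArrayRef {
  /*implicit*/ ArrayRef(const T&) {}  // a single T converts to ArrayRef<T>
};

bool isTracing(const Variable&) { return false; }
bool isTracing(ArrayRef<Tensor>) { return false; }

void demo() {
  Tensor t;
  // isTracing(t);  // error: ambiguous -- Tensor -> Variable and
  //                // Tensor -> ArrayRef<Tensor> are both single
  //                // user-defined conversions of equal rank.
}
```

Splitting the Variable and Tensor overloads into separate functions, as the patch does, removes the ambiguous call site entirely.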
Edward Z. Yang
8a254a0271 Port batchnorm_double_backward to ATen.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-13 17:19:47 -05:00
Simeon Monov
3c709f5b26 Add HANDLE_TH_ERRORS for THPFunction_saved_variables and THPFunction_saved_tensors
SavedVariable.unpack() may throw std::runtime_error, which can lead to
program termination with SIGABRT if the exception is never handled
in Python.

Fixes #3860
2017-11-29 22:54:27 +01:00
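A simplified sketch of what a HANDLE_TH_ERRORS-style macro pair accomplishes at the CPython boundary: translate the C++ exception into a Python exception instead of letting it escape and abort the process. The function body and its failure condition here are illustrative only:

```
#include <Python.h>
#include <stdexcept>

#define HANDLE_TH_ERRORS try {
#define END_HANDLE_TH_ERRORS                        \
  } catch (const std::exception& e) {               \
    PyErr_SetString(PyExc_RuntimeError, e.what());  \
    return nullptr;                                 \
  }

static PyObject* THPFunction_saved_variables_sketch(PyObject* self, PyObject* args) {
  HANDLE_TH_ERRORS
  // SavedVariable.unpack() may throw std::runtime_error; without the macro
  // pair the exception would escape the C API and SIGABRT the process.
  if (args == nullptr) {
    throw std::runtime_error("trying to unpack a saved variable after backward");
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}
```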
Zachary DeVito
929a11f920
Add interpreter support for Handles/PythonOp/CppOp (#3866)
* Add interpreter support for Handles/PythonOp/CppOp

This treats Handles as a first-class type in the interpreter
since this turned out to be conceptually simpler than treating
them as a separate concept, which would require a second channel for
register allocation and for moving data from one op to the next.

Notes:
* The refcounting nature of tensors is factored into its own base type
so that it can be shared with other refcounted types such as Handle.
* Some methods redundant with TensorBase have been deleted from Tensor
* The interpreter uses raw refcounted handles. In addition to letting us
treat Tensors and Handles as the same base object, this removes
a lot of redundant refcounting as objects are moved from tensors to
input/output lists.
* aten_dispatch has been updated to work directly on the raw refcounted
lists to avoid refcounting and duplicate lists.
* Remove jit_closure.cpp; the interpreter can now handle all pathways.

* Functions like `unsafeToTensorShare` describe how
ownership transfers in the interpreter. The `Steal` variants
take rvalue references as arguments, and invalidate those
arguments to prevent potential problems.
* TensorTemporary deliberately does not have a subtype relationship with
Tensor, because that makes it too easy to do something horribly unsafe:

```
  void foo(at::Tensor bar) {
    // bar's destructor calls release on a temporary!
  }

  foo(TensorTemporary(retainable)); // structure slicing!
```
2017-11-29 11:38:57 -05:00
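A sketch of the factoring described in the first notes above: the refcounting core of Tensor pulled into a base type that Handles can share, plus a "Steal"-style transfer in the spirit of the `Steal` variants mentioned in the commit (all names here are assumptions):

```
#include <atomic>

// Shared refcounting base for Tensor- and Handle-like objects.
struct Retainable {
  void retain() { refcount_.fetch_add(1, std::memory_order_relaxed); }
  void release() {
    if (refcount_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
      delete this;
    }
  }
  virtual ~Retainable() = default;

 private:
  std::atomic<int> refcount_{1};
};

// A "Steal" variant: takes the pointer by rvalue reference and nulls it out,
// so ownership moves without a retain/release pair and the argument cannot
// be used again by mistake.
inline Retainable* stealRetainable(Retainable*&& ptr) {
  Retainable* out = ptr;
  ptr = nullptr;
  return out;
}
```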
Sam Gross
0a434ff685 Remove Function::is_executable (#3907)
* Remove Function::is_executable

Ensure that grad_fn is null if requires_grad is false.

* Assert that grad_fn implies requires_grad=True
2017-11-28 18:29:27 -08:00
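A tiny sketch of the invariant this commit establishes (illustrative struct, not the real VariableImpl):

```
#include <cassert>
#include <memory>

struct Function;  // autograd function node (forward declaration)

struct VariableState {
  std::shared_ptr<Function> grad_fn;  // must be null if requires_grad is false
  bool requires_grad = false;

  void check_invariant() const {
    assert(!grad_fn || requires_grad);  // grad_fn implies requires_grad=True
  }
};
```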
Adam Paszke
cf407213f9 Clean up stochastic function related dead code (#3782) 2017-11-20 12:44:45 -05:00
Gregory Chanan
9a2b54e08b [ATen] Rename isCuda -> is_cuda. 2017-11-15 18:33:07 -08:00
Sam Gross
b09d66e60d
Fix a reference cycle when in-place ops on views save the output (#3679)
Previously, an in-place operation that saves its output (such as
relu/threshold) would create a reference cycle when applied to a
view. Two cycles were created:

1) The cycle base.grad_fn.fn.input_.base
   base.grad_fn is a CopySlices
   base.grad_fn.fn is ThresholdBackward
   base.grad_fn.fn.input_ is a SavedVariable with base pointing to base

2) The cycle base.grad_fn.fn.input_.grad_fn.next_functions[0]
   base.grad_fn.fn.input_.grad_fn is AsStridedBackward
   and next_functions[0] points to base.grad_fn

Generally, we avoid cycles because the AD graph is mostly immutable. Two
notable exceptions are:

a) Variable.grad_fn can change to point to a new grad_fn
b) SavedVariables in a function can be set after the function is created

The first case is not a problem if grad_fns do not hold strong references
to Variables. Removing "base" from SavedVariable removes the strong ref.

For the second case, we need to avoid saving the grad_fn of outputs. We
were incorrectly saving the grad_fns of outputs when they were the
result of in-place ops on views.
2017-11-15 15:19:41 -05:00
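One way to express the fix for cycle (1), heavily simplified: a SavedVariable that keeps the saved data but no strong reference back into the graph. The weak pointer and all field names here are assumptions for illustration, not the actual data layout:

```
#include <memory>

struct Function {};    // grad_fn node (stand-in)
struct TensorData {};  // stand-in for the saved tensor payload

struct SavedVariableSketch {
  TensorData data;  // the saved contents themselves: safe, no graph pointers

  // Weak, not shared: saving a function's own output must not keep that
  // function (and therefore the cycle through base.grad_fn) alive.
  std::weak_ptr<Function> weak_grad_fn;

  // Note what is absent: the old strong `base` Variable pointer, which was
  // the edge that closed cycle (1) in the commit message.
};
```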
Zach DeVito
ef4b19f767 Refactor ir.h to distinguish Nodes and Values
This commit adds a Value type similar to the one @ezyang suggested a while
ago for handling multi-return nodes.

Previously if we had a graph like:

  a = op1(b)
  c, d = op2(a)

Then its in-memory format would look like:

  %0 = op1(b)
  %1 = op2(%0)
  %2 = select(%1, 0)
  %3 = select(%1, 1)

Select nodes were used only to handle the multi-output case. In the
single-output case ops referred directly to their uses.

This required special handling for the single- and multi- output cases,
and was confusing when used with ONNX which distinguishes values (the
inputs/outputs of a node) from the nodes themselves (e.g. a Conv).

This commit adds the Node/Value distinction to the IR. In the example
above, `a`, `b`, `c`, and `d` are now Value objects, while `op1` and
`op2` are now Node objects. Inputs/Outputs to the graph are values.

* Nodes now always have multiple outputs, accessible through their `output()`
  method.
* Methods exist for adding/removing outputs from a node.
* Nodes own their output Values; destroying a node destroys its outputs, and it
is only valid to destroy a node when no uses of its outputs remain.
* Unlike select, Values do not appear in the nodes list.
* The method `node()` on `Value` retrieves its defining node; calling it
is always valid. For graph inputs, the defining node's kind is "Param".
Like "Return", there is a single Param node representing all inputs.
* For single-output Nodes, the method `output()` retrieves the single
output Value, asserting that the node is in-fact single output.
* Most functions are unchanged, but some, like `type()`, have moved to
Value.
* `replaceAllUsesWith` is now sanely defined for both Values and Nodes.
In the case of Nodes, it replaces all outputs of the node with the outputs
of the replacement node.
* stage is defined on both Node and Value, because inputs require a stage.
* Apart from changing data types from Node->Value most passes remain the same.
  Things that previously assumed single-output nodes now have to call output()
  to get the value.
* This removes the uses = [...] field from the printed output, because it was
getting confusing even before this commit: uses refer to nodes,
but we print the names of Values. The lint pass validates the use list,
so printing it out seems less necessary.
2017-11-15 11:47:18 -08:00
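A condensed sketch of the Node/Value split (a simplification for illustration, not the real ir.h):

```
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Node;

// A Value is an input/output of a Node; it always has a defining node.
struct Value {
  Node* producer = nullptr;  // for graph inputs this is the single Param node
  size_t offset = 0;         // which output of the producer this is

  Node* node() const { return producer; }  // always valid to call
};

// A Node computes; it owns its output Values and consumes others' Values.
struct Node {
  std::vector<Value*> inputs;
  std::vector<std::unique_ptr<Value>> outputs;

  // Convenience for single-output nodes, asserting there is exactly one.
  Value* output() {
    assert(outputs.size() == 1);
    return outputs.front().get();
  }
};
```

In the example graph above, op2 becomes a single Node with two output Values (`c` and `d`), and the select nodes disappear.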
Sam Gross
fde355f7d4
Allow in-place operations on views (#3384)

Adds VariableViewImpl, a subclass of VariableImpl which has a pointer to
the base Variable on which it is a view. In-place operations on views
change the grad_fn of the base.

Note that in-place operations only work on views that are the first output of the function that created them. All C++/ATen implemented functions have this behavior, but it's possible to write Python-implemented autograd functions that do not. In-place operations on such views will raise an exception.

Fixes #3313
2017-11-06 18:19:56 -05:00
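Roughly the shape of the mechanism described above (field names assumed for illustration):

```
#include <memory>

struct Function;  // grad_fn node (forward declaration)

struct VariableImpl {
  std::shared_ptr<Function> grad_fn;
  virtual ~VariableImpl() = default;
};

// A view holds on to its base; an in-place op on the view rewrites the
// base's grad_fn (e.g. to a CopySlices node) so gradients reach the base.
struct VariableViewImpl : VariableImpl {
  std::shared_ptr<VariableImpl> base;
};
```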
Sam Gross
3003ebe67a
Replace None grad_inputs with zero tensors in some cases (#3433)

In Python-implemented autograd functions, we sometimes return None as
the grad_input if the output is marked "non-differentiable". This
replaces those None values with zero-filled Variables if the
corresponding input has requires_grad=True.

C++-implemented autograd functions expect their inputs (grad_outputs) to
be defined if they're executed. They always return non-null grad_inputs
if should_compute_output(i) is true. This could lead to segfaults if a
subsequent Python-implemented function returned None.

See #3412, #3241
2017-11-02 17:23:25 -04:00
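A sketch of the substitution in ATen terms (fill_missing_grads is a hypothetical helper for illustration, not a real function in the codebase):

```
#include <ATen/ATen.h>
#include <vector>

// Wherever a Python-implemented function produced None (an undefined tensor)
// for a grad_input whose input has requires_grad=True, materialize zeros so
// downstream C++-implemented functions never see an undefined grad.
std::vector<at::Tensor> fill_missing_grads(
    std::vector<at::Tensor> grads,
    const std::vector<at::Tensor>& inputs,
    const std::vector<bool>& requires_grad) {
  for (size_t i = 0; i < grads.size(); ++i) {
    if (!grads[i].defined() && requires_grad[i]) {
      grads[i] = at::zeros_like(inputs[i]);  // None -> zero-filled tensor
    }
  }
  return grads;
}
```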
Adam Paszke
fa0f3cf98a Re-enable and fix most JIT tests 2017-10-27 02:40:09 +05:30
Sam Gross
e970d35091 Make VariableVersion refcounting thread-safe (#3184)
I've also made the version counter and the "live" reference count
atomic.

Note that it's not safe to set the version counter (operator=) from
multiple threads, because shared_ptr assignment isn't thread-safe.
Currently, the only call sites to these functions are on newly created
variables before they can be accessed from other threads.

See #3111
2017-10-19 17:22:01 -04:00
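A sketch along the lines described: version bumps and reads are atomic, while reassigning the shared counter itself stays restricted to freshly created variables (the type names here are assumptions):

```
#include <atomic>
#include <cstdint>
#include <memory>

struct VersionBlock {
  std::atomic<uint32_t> version{0};
  std::atomic<int> live_refs{1};  // the "live" reference count, also atomic
};

struct VariableVersion {
  std::shared_ptr<VersionBlock> block = std::make_shared<VersionBlock>();

  void bump() { block->version.fetch_add(1, std::memory_order_relaxed); }
  uint32_t current() const {
    return block->version.load(std::memory_order_relaxed);
  }

  // NOT thread-safe: shared_ptr assignment may race. Only call on newly
  // created variables, before they escape to other threads.
  VariableVersion& operator=(const VariableVersion& other) {
    block = other.block;
    return *this;
  }
};
```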
Edward Z. Yang
66bb3d6dec Remove incorrect comment that join_with is symmetric.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-13 01:31:22 +02:00
Adam Paszke
b6b41c829a Add inplace checks in JIT 2017-10-03 10:20:58 -04:00
Adam Paszke
411e1469e0 Add tools for autograd profiling 2017-09-25 23:21:30 -04:00
Edward Z. Yang
c08395e290 Give a better error message when we hit a legacy function.
We now include the type name of the class implementing the legacy
function.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-25 12:26:07 -04:00
Edward Z. Yang
6efd797376 Document unchecked invariant.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-20 12:24:27 -04:00
Edward Z. Yang
25c2b7d8b2 Some minor extra comments on python_function
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-20 12:24:27 -04:00
Adam Paszke
c536da7064 Remove TensorMeta 2017-09-19 10:53:32 -04:00
Adam Paszke
ba6e652c02 Add simple mode to Eval 2017-09-19 10:53:32 -04:00
Edward Z. Yang
1f80dd03bd Track change of Variable from shared_ptr to ATen style tensor
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-19 10:53:32 -04:00
Adam Paszke
28828e033f Make certain functions traceable 2017-09-19 10:53:32 -04:00
Adam Paszke
4d1ed4ec42 Assign traces before saving Variables 2017-09-19 10:53:32 -04:00
Adam Paszke
964b731af3 Try to handle NULL Variables in the tracer 2017-09-19 10:53:32 -04:00
Adam Paszke
ddd417faf0 Fix non-CUDA builds after Windows PRs (#2760) 2017-09-17 02:02:52 -04:00
Sam Gross
1290e586fb Use at::Tensor based autograd Variable (#2676)
Variable is now a subclass of at::Tensor backed by a VariableImpl* pImpl. The implementation of the ATen functions is defined in the auto-generated VariableType.h/cpp file.

Currently, only functions which fall through to the base type, such as sizes() and isCuda(), are implemented. Differentiable ops like add() and mul() will be added in a subsequent PR.
2017-09-12 11:36:01 -04:00
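The layering this commit describes, condensed into simplified stand-ins (real calls dispatch through the generated VariableType, which is not modeled here):

```
// Simplified stand-ins for the at::Tensor / Variable layering.
struct TensorImpl {
  virtual ~TensorImpl() = default;
};

struct Tensor {
  TensorImpl* pImpl = nullptr;  // at::Tensor's pointer-to-implementation
};

// VariableImpl extends the tensor implementation with autograd state.
struct VariableImpl : TensorImpl {
  bool requires_grad = false;
  // grad_fn, version counter, hooks, ...
};

// A Variable is a Tensor whose pImpl is (known to be) a VariableImpl.
struct Variable : Tensor {
  VariableImpl* get() const { return static_cast<VariableImpl*>(pImpl); }
};
```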
Adam Paszke
965a349bbd Record context edges in the JIT 2017-09-05 17:48:55 -04:00
Adam Paszke
9f97291408 Make tracer thread-safe 2017-09-05 17:48:55 -04:00
Adam Paszke
fa308b3183 Improve backward tracing 2017-09-05 17:48:55 -04:00
Zach DeVito
55cd9f37d1 remove Select, and NodeWithKind 2017-09-05 17:48:55 -04:00
Zach DeVito
24cdb897d6 starting removing nodes by removing Return 2017-09-05 17:48:55 -04:00
Zach DeVito
b037efa92c prep for removing node subtypes 2017-09-05 17:48:55 -04:00
Adam Paszke
7f60a18293 Add initial support for backward tracing 2017-09-05 17:48:55 -04:00
Edward Z. Yang
4a1bbc01ac Fix #41.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
765b0bf137 Make in-place work again.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Adam Paszke
1c4538e017 Trace C functions 2017-09-05 17:48:55 -04:00
Adam Paszke
bdcbbeaf68 Remove GlobalTracingState 2017-09-05 17:48:55 -04:00
Edward Z. Yang
c931feaad0 Elaborate on NB a little 2017-09-05 17:48:55 -04:00
Adam Paszke
3e0f1608fe Capture Variables that are not inputs as constants 2017-09-05 17:48:55 -04:00
Adam Paszke
af21c6b018 Add Node type to JIT IR
Rewrite Type as a class hierarchy

PR comments + rebase fixes
2017-09-05 17:48:55 -04:00