ExportProxy was a mechanism to reuse the code that supported exporting
autograd Functions to support overriding arbitrary python
functions. However, it had some serious downsides
- only works on some functions (all args must be Variable)
- complicated
- bad error messages in some cases
Instead, just expose enough functionality to python to perform the
necessary logic explicitly.
* Trace ATen non-primitive functions as themselves, not their implementations.
Previously, if I invoked an ATen non-primitive function foo, which in turn
called subfoo, I would always see 'subfoo' in the trace (e.g., tracing
'inlines' all of these operations.) Such inlining is bad for ONNX
(and can be bad for optimization) as it prevents high-level
optimizations from taking advantage of the structure. It might
be right to inline, but give the optimizer a chance to work before
inlining happens!
The implementation here is surprisingly simple, because it uses
the "DCE trick". Essentially, it doesn't matter if the constituent
calls perform tracing, because you can always trace it again, and
override the trace nodes associated with the returned variables.
The original trace becomes dead and can be DCE'd.
While implementing this, I also refactored how 'isTracing' and
'trace_outputs' works:
- isTracing was previously a single function with overloads for
both Tensor and Variable arguments. Unfortunately, such overloads
are not safe, because of how C++ implicit conversions work. You
would think that C++ should never confuse an overload for
Variable with ArrayRef<Tensor>, but this is exactly what can
happen: Tensor is convertible to both Variable and ArrayRef<Tensor>,
thus it's ambiguous and C++ doesn't like it. The last time I ran
into this problem, I applied initializer lists to everything and
called it a day. A more robust fix is to separate out the
Variable and Tensor overloads, which I have done in this patch.
- trace_outputs was fed as an initializer list, which doesn't work
when you have heterogenous inputs. So instead we first feed
everything through 'flatten', which has overloads for each of the
argument patterns in ATen, which then goes on to the recordTrace
(which takes an ArrayRef). This is *no less efficient*, because
we were allocating a vector anyway (to do the conversion from
vector of Tensor to vector of Variable).
This fixes mean that 'index' can properly be traced... although the
JIT still does not support it. A failing test case has been added to
this effect.
Some knock-on effects:
- The fuser now knows about chunk as well as split. They're pretty
similar so there is no problem.
- There is a new 'canonicalize' pass in the JIT which renumbers a graph
so that all structurally equivalent graphs render the same.
- We run DCE before the fuser tests, to make sure dead nodes don't
block fusion.
- There are new ONNX exports for the newly introduced higher level ATen
operations. This includes type_as (no-op case only), chunk, select.
Zach didn't like the extra use of 'native' in the new codegen, so
we've introduced a new concept, 'abstract'. An abstract function
is one that is implemented in derived types (e.g., CPUDoubleType),
where as a concrete one is implemented in the base type (Type).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Variable is now a subclass of at::Tensor backed by a VariableImpl* pImpl. The implementation of the ATen functions is defined in the auto-generated VariableType.h/cpp file.
Currently, only functions which fall through to the base type, such as sizes() and isCuda() are implemented. Differentiable ops like add() and mul() will be added in a subsequent PR.
* Variables now hold a list of ValueTracingStates and can participate
in multiple traces.
* Refactored Traceable to maintain a list of traces, and only stop
tracing once it records all stages
Along the way I added converters for Variable and TracingInput. Variable should
probably be moved to a more widely known spot.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Basic idea:
- Pass buffers (marked as non-Variable tensors) as input variables to
the trace. Every buffer gets represented as an input variable
to the trace, and we remember a correspondence of the underlying
TH pointer and an input variable in the trace.
- When we initially trace a function, we DO NOT record the buffers
as edges. This is so autograd doesn't have to know anything about buffers.
If we ever turn buffers into requires_grad=False parameters, then
this problem goes away.
- When we primspec the buffer, NOW we reach into the cached buffers
(now appropriately named) and gin up the buffer information we need.
Other things:
- CppOp execution is now supported (but lightly tested) using
SimpleEval (thanks @apaszke!)
Todo:
- E2E tests need to have their hacks removed.
- Figure out what is going on with backwards
Signed-off-by: Edward Z. Yang <ezyang@fb.com>