* Created TensorOptions
Storing the type in TensorOptions to solve the Variable problem
Created convenience creation functions for TensorOptions and added tests
Converted zeros to TensorOptions
Converted rand to TensorOptions
Fix codegen for TensorOptions and multiple arguments
Put TensorOptions convenience functions into torch namespace too
All factory functions except *_like support TensorOptions
Integrated with recent JIT changes
Support *_like functions
Fix in place modification
Some cleanups and fixes
Support sparse_coo_tensor
Fix bug in Type.cpp
Fix .empty calls in C++ API
Fix bug in Type.cpp
Trying to fix device placement
Make AutoGPU CPU compatible
Remove some auto_gpu.h uses
Fixing some headers
Fix some remaining CUDA/AutoGPU issues
Fix some AutoGPU uses
Fixes to dispatch_tensor_conversion
Reset version of new variables to zero
Implemented parsing device strings
Random fixes to tests
Self review cleanups
flake8
Undo changes to variable.{h,cpp} because they fail on gcc7.2
Add [cuda] tag to tensor_options_cuda.cpp
Move AutoGPU::set_index_from into .cpp file because Windows is stupid and sucks
Fix linker error in AutoGPU.cpp
Fix bad merge conflict in native_functions.yaml
Fixed caffe2/contrib/aten
Fix new window functions added to TensorFactories.cpp
* Removed torch::TensorOptions
Added code to generate wrapper functions for factory methods
Add implicit constructor from Backend to TensorOptions
Remove Var() from C++ API and use torch:: functions
Use torch:: functions more subtly in C++ API
Make AutoGPU::set_device more exception safe
Check status directly in DynamicCUDAHooksInterface
Rename AutoGPU to DeviceGuard
Removed set_requires_grad from python_variables.h and warn appropriately in Variable::set_requires_grad
remove python_default_init: self.type()
Add back original factory functions, but with deprecation warnings
Disable DeviceGuard for a couple functions in ATen
Remove print statement
Fix DeviceGuard construction from undefined tensor
Fixing CUDA device compiler issues
Moved as many methods as possible into header files
Dont generate python functions for deprecated factories
Remove merge conflict artefact
Fix tensor_options_cuda.cpp
Fix set_requires_grad not being checked
Fix tensor_new.h
TEMPORARILY put some methods in .cpp files to see if it solves issues on windows and mac
Fix bug in DeviceGuard.h
Missing includes
TEMPORARILY moving a few more methods into .cpp to see if it fixes windows
Fixing linker errors
* Fix up SummaryOps to use new factories
Undo device agnostic behavior of DeviceGuard
Use -1 instead of optional for default device index
Also move DeviceGuard methods into header
Fixes around device index after optional -> int32_t switch
Fix use of DeviceGuard in new_with_tensor_copy
Fix tensor_options.cpp
* Fix Type::copy(
* Remove test_non_float_params from ONNX tests
* Set requires_grad=False in ONNX tests that use ints
* Put layout/dtype/device on Tensor
* Post merge fixes
* Change behavior of DeviceGuard to match AutoGPU
* Fix C++ API integration tests
* Fix flip functions
* docstring support for @script and @script_method
* make it python2 compatible
* improve according to review
* improve build_stmts
* use filter instead of list comprehension
* improve the way wrap is handled for script_method
* stash the original method instead
* allow dynamic attr for ScriptMethod and GraphExecutor
* a bit comment on build_Expr
* remove _build_wrap
* a bit improve on comments
* rename to __original_methods
* should be _original_methods
* Factor python dependency out of interpreter
* Remove NO_PYTHON for the autograd engine
If there is no python bindings, then a default Engine is constructed
the first time it is requested.
If the python libraries are loaded, then they override the default
accessor and the default engine becomes a python Engine.
Note: it is possible for two engines to be generated if a non-python
one gets created before the python bindings are loaded. This case
is rare, and just results in additional threads being spawned.
* Fixing AlexNet test which is skipped in CI
This adds the ability to trace script functions while preserving their
control flow. When the trace encounters a script function it inlines
the graph of the function into the trace rather than tracing the
function itself.
* Switch JIT passes to take a graph rather than TracingState
* Add pybind11 binding for ONNX pass from graph
* Fix canonicalize pass
* address comment
* Switch ToONNX to explicitly return new graph
* optimize_graph instead of optimize_trace
Script functions can now have no return statements, empty
return statements, or return one or more values.
Additionally fix the lexer to always emit TK_NEWLINE before
TK_DEDENT, which simplifies the parser.
* Have ScriptModule inherit from Module
This is accomplished by created replacement _parameters, _buffers,
and _modules which implement the OrderedDict APIs but which
actually get/set their members inside script::Module
* Merge TracedModule with ScriptModule
* Move logic of attribute handling into Python bindings rather than
make script::Module handle it. This was redundant with nn.Module,
which already handles attribute.
* Make TracedModule a subclass of ScriptModule
* Move handling of attribute kind logic into bindings.
* Allow ScriptModule to contain non-script module submodules.
* torch.jit.trace annotation now creates a GraphExecutor
The other torch.jit.trace, which was used for testing purposes and for onnx to get the trace graph, is now called torch.jit. torch.jit.get_trace_graph.
* @script annotation, and compilation unit for strings
Additionally:
- add support for calling functions that are not methods in the Python frontend
- add an end-to-end test for the Python frontend
- add a capture_stdout helper for checking that `print` actually works
This adds the initial implementation of graph executor for the new JIT design. It includes a few python tests ensuring that nograd, backward, and double-backward cases work for simple examples and some corner cases. More work needs to be done to performance optimize as there are many extra copies and places where we hold onto variables longer than we should. These are noted in the comments.
This pass splits differentiable subgraphs into their own Node,
similar to a fusion group.
This initial implementation does not create optimal subgraphs, but
it works well in the case where most things are differentiable,
and has the building blocks (`mergeNodes`) to extend to the
better implementation.
* Trace ATen non-primitive functions as themselves, not their implementations.
Previously, if I invoked an ATen non-primitive function foo, which in turn
called subfoo, I would always see 'subfoo' in the trace (e.g., tracing
'inlines' all of these operations.) Such inlining is bad for ONNX
(and can be bad for optimization) as it prevents high-level
optimizations from taking advantage of the structure. It might
be right to inline, but give the optimizer a chance to work before
inlining happens!
The implementation here is surprisingly simple, because it uses
the "DCE trick". Essentially, it doesn't matter if the constituent
calls perform tracing, because you can always trace it again, and
override the trace nodes associated with the returned variables.
The original trace becomes dead and can be DCE'd.
While implementing this, I also refactored how 'isTracing' and
'trace_outputs' works:
- isTracing was previously a single function with overloads for
both Tensor and Variable arguments. Unfortunately, such overloads
are not safe, because of how C++ implicit conversions work. You
would think that C++ should never confuse an overload for
Variable with ArrayRef<Tensor>, but this is exactly what can
happen: Tensor is convertible to both Variable and ArrayRef<Tensor>,
thus it's ambiguous and C++ doesn't like it. The last time I ran
into this problem, I applied initializer lists to everything and
called it a day. A more robust fix is to separate out the
Variable and Tensor overloads, which I have done in this patch.
- trace_outputs was fed as an initializer list, which doesn't work
when you have heterogenous inputs. So instead we first feed
everything through 'flatten', which has overloads for each of the
argument patterns in ATen, which then goes on to the recordTrace
(which takes an ArrayRef). This is *no less efficient*, because
we were allocating a vector anyway (to do the conversion from
vector of Tensor to vector of Variable).
This fixes mean that 'index' can properly be traced... although the
JIT still does not support it. A failing test case has been added to
this effect.
Some knock-on effects:
- The fuser now knows about chunk as well as split. They're pretty
similar so there is no problem.
- There is a new 'canonicalize' pass in the JIT which renumbers a graph
so that all structurally equivalent graphs render the same.
- We run DCE before the fuser tests, to make sure dead nodes don't
block fusion.
- There are new ONNX exports for the newly introduced higher level ATen
operations. This includes type_as (no-op case only), chunk, select.
Zach didn't like the extra use of 'native' in the new codegen, so
we've introduced a new concept, 'abstract'. An abstract function
is one that is implemented in derived types (e.g., CPUDoubleType),
where as a concrete one is implemented in the base type (Type).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This breaks a lot of the onnx-pytorch tests because the abstraction
barriers are not respected. I'll spin up a patch for that separately.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
The pieces:
- I improved the lint / asserts to catch some bugs which I
committed while working on my export. There are two new
properties which the linter checks now:
(1) "Anticipated uses". If a node says that is used by
M, M better appear later in the topsort. Previously,
we only checked if it was in all_nodes.
(2) If you are a select node, you better be a multi-type node;
if you're not a select node, you better not be! And you
should never have an input that is multi-type.
- There is a new peephole optimization pass, for simple, local
transformations to graphs. Right now, it implements a simple
optimization: remove 'expand' invocations that are no-ops
(the size before matches the size after), but we can add other
things to it later. I needed this for ONNX because no-op expands
show up in the left-hand argument, which we don't support.
- There is now a broadcast fuser, which fuses ATen expand ops
into broadcastable ONNX ops (Add, Div, Mul, Pow, Sub, Gemm.)
It only fuses when the original size is a suffix of the new
size, as per the ONNX spec.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
- Reduce setup.py diff.
- Expunge WITH_TOFFEE from codebase.
- Elaborate on a comment.
- Move gen_toffee.sh to tools
- Delete densenet test.
- Use 'using' to inherit a constructor.
- Delete outdated comment.
- Comment about why primspecs can return fewer outputs.
- Remove dead, commented out includes.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Along the way I added converters for Variable and TracingInput. Variable should
probably be moved to a more widely known spot.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Basic idea:
- Pass buffers (marked as non-Variable tensors) as input variables to
the trace. Every buffer gets represented as an input variable
to the trace, and we remember a correspondence of the underlying
TH pointer and an input variable in the trace.
- When we initially trace a function, we DO NOT record the buffers
as edges. This is so autograd doesn't have to know anything about buffers.
If we ever turn buffers into requires_grad=False parameters, then
this problem goes away.
- When we primspec the buffer, NOW we reach into the cached buffers
(now appropriately named) and gin up the buffer information we need.
Other things:
- CppOp execution is now supported (but lightly tested) using
SimpleEval (thanks @apaszke!)
Todo:
- E2E tests need to have their hacks removed.
- Figure out what is going on with backwards
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
The general strategy:
- We put all the toffee files in torch/csrc/toffee; they will only be
added when toffee is enabled
- Toffee is enabled if torch/lib/ToffeeIR is present (since we
don't have a submodule/subtree thing going on)
- The most prevalant place you will need to use WITH_TOFFEE is for
primspec definitions on C++ autograd functions. There is a
macro HAS_PRIMSPEC to ameliorate optionally defining primspec()
virtual overrides on Function classes. HasPrimspec is always
available but will be a zero field class when Toffee is disabled.
NB: We might revert this commit in the future if we figure out a way
to unconditionally enable Toffee that everyone likes.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This commit adds a new exporter pass which takes a graph and returns
a string of the human-readable protobuf representation of a model.
We have two strategies for how conversions are implemented:
- If a Python autograd function has a primspec static method, we invoke
it to get the Toffee conversion. Use torch.toffee.op to generate the
format expected to be returned. The particular data representation is opaque
and subject to change in the future.
- Otherwise, there's a giant if statement in the exporter, which manually
uses the JIT IR C++ API and Toffee IR C++ protobuf API to convert.
You must check out a copy of the ToffeeIR repo
https://github.com/ProjectToffee/ToffeeIR at torch/lib; at the moment
we don't have a subtree/submodule set up.
Technical debt in this commit:
- To get protobuf headers in scope, we unconditionally add $CONDA_PREFIX/include
to the include path. This needs to be replaced with a more robust mechanism.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>