Summary:
Adding a constant propagation pass to the JIT. I have added examples to the expect files.
There are a couple of special cases which have not been implemented here. IF nodes with constant conditions can be inlined with the correct block. WHILE nodes can be removed if the condition is false. I have added a test for each case in test_jit.py file as expected failures.
To be consistent with DCE, python ops & CPP ops are treated as not having side-effects.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8808
Reviewed By: wanchaol
Differential Revision: D8906770
Pulled By: eellison
fbshipit-source-id: 10ad796d89f80b843566c9ddad6a0abd1f3dc74c
Summary:
This is a follow-up to https://github.com/pytorch/pytorch/pull/9794 that contains only the serialization library and exposes a cleaner API. This should later be incorporated into the module export code
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9900
Reviewed By: zdevito
Differential Revision: D9021057
Pulled By: jamesr66a
fbshipit-source-id: 01af74a7fdd1b90b2f5484644c3121d8ba9eb3b3
Summary:
This is blocking the IR operator unification, because I need to be able to pass scalars to backward functions.
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9763
Reviewed By: zou3519
Differential Revision: D8978457
Pulled By: apaszke
fbshipit-source-id: 570b4c3409322459cb0f2592069730a7d586ab20
Summary:
This PR adds machinery to cache the schema in an IR node, and allows lookups of (possibly) constant inputs by their names (instead of position). The new methods are:
- `at::optional<T> get<T>(Symbol name)` - if the argument called name is a constant, then casts it to type `T` and returns it. If it's not constant returns `nullopt`. Raises an error if there's no argument with that name.
- `at::optional<IValue> get<T>(Symbol name)` - like above, but packs the result in an IValue
- `Value* getValue(Symbol name)` - retrieves a `Value*` for an argument (no need to know its position).
All above functions currently inspect the attributes as well, but that's only so that I could start using them in other places in the JIT without disrupting our current functionality. I wanted this diff to be a preparation that doesn't change the semantics too much, and so both the tracer and script create nodes with attributes. The next PR will put that to a stop, and hopefully the changes we need to make to other components will be simpler thanks to what I did here.
One more thing I'd like to do before actually stopping creating the non-attributed nodes is to have a convenient way of creating a schema programmatically, matching nodes against it, and creating them without having to pack inputs into flat argument lists (which is quite error prone).
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9505
Reviewed By: ezyang
Differential Revision: D8915496
Pulled By: apaszke
fbshipit-source-id: 39d14fc9a9d73d8494f128367bf70357dbba83f5
Summary:
Use decorator `torch.jit.batch` to implement auto-batching (call `to_batch` pass to do IR tranformation).
- `to_batch` pass: "to_batch.h/cpp" in csrc/jit/passess to transform a graph to a new batched graph.
- Write several basic operators for BatchTensor (add, mul, sigmoid, tanh, mm, matmul, select).
- Register the operators in a lookup table `<std::string, std::shared_ptr<Graph>>`. (use the Graph to replace the original node in IR graph)
Move BatchTensor in python from torch.BatchTensor to torch.jit.BatchTensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9198
Reviewed By: zdevito
Differential Revision: D8744466
Pulled By: ChunliF
fbshipit-source-id: 9ea56a30f55cb870f13a2069a47cc635419763ff
Summary:
This is a series of two commits that should probably be read separately. They are stacked on top of #9018 since the second commit requires it for correctness.
Commit 1
=======
This commit is the first in a series that will clean up how we handle declaring operators and intrinsics in the JIT to make it more modular and readable. This introduces readable declarations that can be used to register operators and switches gen_jit_dispatch to generate this schema. A follow up PR will remove the dispatch keys like "add-3" and resolve ops directly based on the registered schema, further simplifying the generation process.
* Switches schema over to parsed declarations, in the future this will allow something like:
```
registry.register_intrinsic("foo(Tensor a, Tensor b) -> Tensor", [](Stack& stack) {
...
})
```
This will allow the scalable registration of intrinsics for lists, tuples, and other ops, as long as meta-data for these ops (e.g. derivatives and size propagation routines).
The declarations resemble those used by PythonArgParser but have been singificantly cleaned up to minimize the number of types that can appear in the declaration. We should strive to get the other parts of PyTorch switched over to this restricted declaration set when possible, but it is too much to do in a single PR. My hope is that eventually we will use a very similar language to describe declarations in C10, and this can serve as a guide for that.
Parsing is done using the script lexer, so it is very robust to whitespace and extensible for future types.
This removes the other way we encoded schema, and makes it easier to see what schema are registered.
Current generated declarations: https://gist.github.com/zdevito/a96a17766fb3a098d69a91ee00abaaf6
* Switches how we handle attempting to use an integer in the place of a fixed-sized int list, such as in conv (e.g. 'int[3] stride=1'). Now that we can statically distinguish between int and Tensor, we handle the expansion as an implicit conversion in the compiler. This allows us to simplify the interpreter since it no longer needs to handle the conversion itself.
* Schema declarations have been changed so that they match the type system in the IR exactly. In particular, attribute_info which was used by liftConstantAttributes has been dropped and constant attributes are lifted purely based on the type of the input. Type conversions in compiler have been simplified due to this change.
* Error highlighting in ErrorReport now only reports at most 20 lines of code, to make reading where an error occurred easier.
Commit 2
=======
This commit unifies aten_dispatch and aten_schema into a single Operator object that both contains schema and implementation information. In the future we can use this object to also contain functionality like shape prop and autodiff needed by all operators. Operators are registered globally, and dispatch logic uses the schema information to figure out which variant to use. Descriptor keys, a frequent source of inscrutable debug errors, have been removed.
* Introduce Operator, to replace TensorOp. Unlike TensorOp, we use Operator for all op implementations, including primitives that may occur in the graphs. The only exceptions are ops that are only known to the interpreter like jumps, and GraphExecutors where we need to record additional debug info.
* Adds a global registry for Operator implementations. aten_dispatch.cpp turns into register_aten_ops.cpp, which registers all the Operators for aten with the operator registry. register_prim_ops.cpp now contains the implementations for primitive operators that used to be in the interpreter. This means that it is now safe to use `getOperation(node)` to lookup the true interpreter function for the node, which will simplify const-propagation passes.
* Remove addInterpreterOpHandler in favor of global operator registry.
* Instead of descriptors, we match Node arguments directly against FunctionSchema describing expected inputs in `matchSchema`. `matchSchema` knows how parse both attributes and positional inputs from a node and match it to the appropriate registered operator. Debug error messages when we try to run an invalid operator are significantly improved: they now automatically display the schema for the op with the same name that are registered.
* Merge aten_schema into regsiter_aten_ops. Each Operator takes a string schema which is parsed to determine when to dispatch to that op.
* Cleans up gen_jit_dispatch.py now that we do not need to write out descriptors. In particular, skip_scalar_overloads can be removed since Richard's code sorts declarations to put Tensor, Tensor declarations first.
* remove matchSchemaAndLiftConstantAttributes and use emitBuiltinCall instead to remove code duplication
* refactor stack manipulation functions into a separate header file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8885
Reviewed By: jamesr66a
Differential Revision: D8751048
Pulled By: zdevito
fbshipit-source-id: 312aabfbf88307c5f6ab947b6caf691468b94557
Summary:
add two small bindings to recently added attributes.
Also want to leave a reference gist here: https://gist.github.com/soumith/8102ef39530bac09070912b1a5401d0f
It showcases:
- traced a module
- symbolically differentiated the forward graph, to get a forward, backward graph
- executed the subsequent forward + backward graphs correctly
- compared the jit vs non-jit results
Closes https://github.com/pytorch/pytorch/pull/8890
Reviewed By: ezyang
Differential Revision: D8677663
Pulled By: soumith
fbshipit-source-id: a29919c05baad997cd7fb7df718f933a83035118
Summary:
Add BatchTensor class
- construct from data, mask, dims or construct from list of tensors
- can return a list of tensors from an BatchTensor class
next step: do IR level transformation and operators
Closes https://github.com/pytorch/pytorch/pull/8922
Differential Revision: D8668986
Pulled By: ChunliF
fbshipit-source-id: 8b24d2a9f46a3b42dbb397e99e9e059dfb2b326e
Addresses #8177
A design doc can be found here: [gist](https://gist.github.com/zou3519/4b7f13f03cc9f3612bd9363e6405fa0a) version or [quip](https://fb.quip.com/azL1AqUckBdo) version
General approach:
- Add NumberType, FloatType, IntType to represent Python numbers, floats and ints.
- Emit these types for python literals
- Change aten_schema such that Scalars are NumberType, int64_t and bool are IntType.
- Emit aten::type_as, prim::NumToTensor, and prim::TensorToNum nodes for tensor-number math. (see examples below)
- Erase NumberType, prim::NumToTensor, and prim::TensorToNum for ONNX export
### Tensor/number math
```
import torch
@torch.jit.script
def fn(x):
return x + 1
```
```
graph(%x : Dynamic) {
%1 : int = prim::Constant[value={1}]()
%2 : Dynamic = prim::NumToTensor(%1)
%3 : Dynamic = aten::type_as(%2, %x)
%4 : Dynamic = aten::add[alpha={1}](%x, %4)
return (%5);
}
```
### Number/Number Math
```
import torch
@torch.jit.script
def fn(zero):
c = 1 + 1
return zero + c
```
```
graph(%zero : Dynamic) {
%1 : int = prim::Constant[value={1}]()
%2 : int = prim::Constant[value={1}]()
%3 : Dynamic = prim::num_to_tensor(%1)
%4 : Dynamic = prim::num_to_tensor(%2)
%5 : Dynamic = aten::add[alpha={1}](%3, %4)
%c : int = prim::TensorToNum(%6) # this is the result of the addition
...
return (%13);
}
```
List of squashed commits:
* Introduce Python Number types
Added: IntType, FloatType, NumberType with
IntType <: NumberType
FloatType <: NumberType
Changed aten_schema so arguments have corresponding types
* Emit a NumberType for python literals.
Also emit a NumberType for Scalar default values.
* Add prim::NumToTensor and prim::TensorToNum
* Add DynamicType -> NumberType implicit cast for bc
* Better ensureTensor error message
* Add ensureTensorOrNumber. Allow passing Number to some functions
Like the range() construct and slices
* Patch IntList to work.
IntList is still a DynamicType in the frontend: a tensor gets built from
a List[int].
Also, IntList[1] is a "union between int and IntList" the way it is
implemented. If the frontend sees an int being passed for an IntList[1]
arg, it converts it to a tensor as well.
* Enforce some order on schemas to avoid overload ambiguity
add(Tensor, Tensor) should appear earlier than add(Tensor, Scalar). This
matches the order in which python_arg_parser parses its arguments.
* Disable std_dim and var_dim tests.
With the new schema information, std(input, keepdim) and std(input, dim)
are ambiguous. This will need to be fixed at a later date.
* Add NumberType erasure pass.
This is used for ONNX export and to ensure that NumberType information
doesn't reach the interpreter
* Add support for mixed tensor/number math ops.
* Tests for new functionality.
Includes:
- Tensor/number math
- number/number math
- EraseNumberTypes pass test
* Patch tests
Update expect tests for:
- decompose_addmm
- loop unrolling tests
Because python numbers are now NumberType, they cannot be returned by
functions anymore. Work around this by using "torch.full", or by adding
a tensor([0]) (taken from FIXME_zerol()). Both approaches are used
because torch.full is more readable, but it is broken in some cases.
* Add erase_number_types to torch/CMakeLists.txt
* Move math back to emitSimpleExpr from emitSugaredExpr
* Remove some dead lines
* Renable some excluded script/trace tests that are fixed.
* Move some tests to expected failure
* Address some comments (more addressing to come)
* Erase relevant aten::type_as nodes in EraseNumberTypes
I also changed it so that EraseNumberTypes is only called for ONNX
export. It is no longer used to prevent
prim::NumToTensor/prim::TensorToNum from reaching shape_analysis or
interpreter.cpp.
shape_analysis infers the type of the output of these nodes to be the
same as their input.
intepreter.cpp treats both of these nodes as no-ops.
* Add reminder to fix std/var
* Call EraseNumberTypes only when exporting a script module
* Update expects after rebase
* Created TensorOptions
Storing the type in TensorOptions to solve the Variable problem
Created convenience creation functions for TensorOptions and added tests
Converted zeros to TensorOptions
Converted rand to TensorOptions
Fix codegen for TensorOptions and multiple arguments
Put TensorOptions convenience functions into torch namespace too
All factory functions except *_like support TensorOptions
Integrated with recent JIT changes
Support *_like functions
Fix in place modification
Some cleanups and fixes
Support sparse_coo_tensor
Fix bug in Type.cpp
Fix .empty calls in C++ API
Fix bug in Type.cpp
Trying to fix device placement
Make AutoGPU CPU compatible
Remove some auto_gpu.h uses
Fixing some headers
Fix some remaining CUDA/AutoGPU issues
Fix some AutoGPU uses
Fixes to dispatch_tensor_conversion
Reset version of new variables to zero
Implemented parsing device strings
Random fixes to tests
Self review cleanups
flake8
Undo changes to variable.{h,cpp} because they fail on gcc7.2
Add [cuda] tag to tensor_options_cuda.cpp
Move AutoGPU::set_index_from into .cpp file because Windows is stupid and sucks
Fix linker error in AutoGPU.cpp
Fix bad merge conflict in native_functions.yaml
Fixed caffe2/contrib/aten
Fix new window functions added to TensorFactories.cpp
* Removed torch::TensorOptions
Added code to generate wrapper functions for factory methods
Add implicit constructor from Backend to TensorOptions
Remove Var() from C++ API and use torch:: functions
Use torch:: functions more subtly in C++ API
Make AutoGPU::set_device more exception safe
Check status directly in DynamicCUDAHooksInterface
Rename AutoGPU to DeviceGuard
Removed set_requires_grad from python_variables.h and warn appropriately in Variable::set_requires_grad
remove python_default_init: self.type()
Add back original factory functions, but with deprecation warnings
Disable DeviceGuard for a couple functions in ATen
Remove print statement
Fix DeviceGuard construction from undefined tensor
Fixing CUDA device compiler issues
Moved as many methods as possible into header files
Dont generate python functions for deprecated factories
Remove merge conflict artefact
Fix tensor_options_cuda.cpp
Fix set_requires_grad not being checked
Fix tensor_new.h
TEMPORARILY put some methods in .cpp files to see if it solves issues on windows and mac
Fix bug in DeviceGuard.h
Missing includes
TEMPORARILY moving a few more methods into .cpp to see if it fixes windows
Fixing linker errors
* Fix up SummaryOps to use new factories
Undo device agnostic behavior of DeviceGuard
Use -1 instead of optional for default device index
Also move DeviceGuard methods into header
Fixes around device index after optional -> int32_t switch
Fix use of DeviceGuard in new_with_tensor_copy
Fix tensor_options.cpp
* Fix Type::copy(
* Remove test_non_float_params from ONNX tests
* Set requires_grad=False in ONNX tests that use ints
* Put layout/dtype/device on Tensor
* Post merge fixes
* Change behavior of DeviceGuard to match AutoGPU
* Fix C++ API integration tests
* Fix flip functions
* docstring support for @script and @script_method
* make it python2 compatible
* improve according to review
* improve build_stmts
* use filter instead of list comprehension
* improve the way wrap is handled for script_method
* stash the original method instead
* allow dynamic attr for ScriptMethod and GraphExecutor
* a bit comment on build_Expr
* remove _build_wrap
* a bit improve on comments
* rename to __original_methods
* should be _original_methods
* Factor python dependency out of interpreter
* Remove NO_PYTHON for the autograd engine
If there is no python bindings, then a default Engine is constructed
the first time it is requested.
If the python libraries are loaded, then they override the default
accessor and the default engine becomes a python Engine.
Note: it is possible for two engines to be generated if a non-python
one gets created before the python bindings are loaded. This case
is rare, and just results in additional threads being spawned.
* Fixing AlexNet test which is skipped in CI
This adds the ability to trace script functions while preserving their
control flow. When the trace encounters a script function it inlines
the graph of the function into the trace rather than tracing the
function itself.
* Switch JIT passes to take a graph rather than TracingState
* Add pybind11 binding for ONNX pass from graph
* Fix canonicalize pass
* address comment
* Switch ToONNX to explicitly return new graph
* optimize_graph instead of optimize_trace
Script functions can now have no return statements, empty
return statements, or return one or more values.
Additionally fix the lexer to always emit TK_NEWLINE before
TK_DEDENT, which simplifies the parser.
* Have ScriptModule inherit from Module
This is accomplished by created replacement _parameters, _buffers,
and _modules which implement the OrderedDict APIs but which
actually get/set their members inside script::Module
* Merge TracedModule with ScriptModule
* Move logic of attribute handling into Python bindings rather than
make script::Module handle it. This was redundant with nn.Module,
which already handles attribute.
* Make TracedModule a subclass of ScriptModule
* Move handling of attribute kind logic into bindings.
* Allow ScriptModule to contain non-script module submodules.
* torch.jit.trace annotation now creates a GraphExecutor
The other torch.jit.trace, which was used for testing purposes and for onnx to get the trace graph, is now called torch.jit. torch.jit.get_trace_graph.
* @script annotation, and compilation unit for strings
Additionally:
- add support for calling functions that are not methods in the Python frontend
- add an end-to-end test for the Python frontend
- add a capture_stdout helper for checking that `print` actually works
This adds the initial implementation of graph executor for the new JIT design. It includes a few python tests ensuring that nograd, backward, and double-backward cases work for simple examples and some corner cases. More work needs to be done to performance optimize as there are many extra copies and places where we hold onto variables longer than we should. These are noted in the comments.
This pass splits differentiable subgraphs into their own Node,
similar to a fusion group.
This initial implementation does not create optimal subgraphs, but
it works well in the case where most things are differentiable,
and has the building blocks (`mergeNodes`) to extend to the
better implementation.
* Trace ATen non-primitive functions as themselves, not their implementations.
Previously, if I invoked an ATen non-primitive function foo, which in turn
called subfoo, I would always see 'subfoo' in the trace (e.g., tracing
'inlines' all of these operations.) Such inlining is bad for ONNX
(and can be bad for optimization) as it prevents high-level
optimizations from taking advantage of the structure. It might
be right to inline, but give the optimizer a chance to work before
inlining happens!
The implementation here is surprisingly simple, because it uses
the "DCE trick". Essentially, it doesn't matter if the constituent
calls perform tracing, because you can always trace it again, and
override the trace nodes associated with the returned variables.
The original trace becomes dead and can be DCE'd.
While implementing this, I also refactored how 'isTracing' and
'trace_outputs' works:
- isTracing was previously a single function with overloads for
both Tensor and Variable arguments. Unfortunately, such overloads
are not safe, because of how C++ implicit conversions work. You
would think that C++ should never confuse an overload for
Variable with ArrayRef<Tensor>, but this is exactly what can
happen: Tensor is convertible to both Variable and ArrayRef<Tensor>,
thus it's ambiguous and C++ doesn't like it. The last time I ran
into this problem, I applied initializer lists to everything and
called it a day. A more robust fix is to separate out the
Variable and Tensor overloads, which I have done in this patch.
- trace_outputs was fed as an initializer list, which doesn't work
when you have heterogenous inputs. So instead we first feed
everything through 'flatten', which has overloads for each of the
argument patterns in ATen, which then goes on to the recordTrace
(which takes an ArrayRef). This is *no less efficient*, because
we were allocating a vector anyway (to do the conversion from
vector of Tensor to vector of Variable).
This fixes mean that 'index' can properly be traced... although the
JIT still does not support it. A failing test case has been added to
this effect.
Some knock-on effects:
- The fuser now knows about chunk as well as split. They're pretty
similar so there is no problem.
- There is a new 'canonicalize' pass in the JIT which renumbers a graph
so that all structurally equivalent graphs render the same.
- We run DCE before the fuser tests, to make sure dead nodes don't
block fusion.
- There are new ONNX exports for the newly introduced higher level ATen
operations. This includes type_as (no-op case only), chunk, select.
Zach didn't like the extra use of 'native' in the new codegen, so
we've introduced a new concept, 'abstract'. An abstract function
is one that is implemented in derived types (e.g., CPUDoubleType),
where as a concrete one is implemented in the base type (Type).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This breaks a lot of the onnx-pytorch tests because the abstraction
barriers are not respected. I'll spin up a patch for that separately.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
The pieces:
- I improved the lint / asserts to catch some bugs which I
committed while working on my export. There are two new
properties which the linter checks now:
(1) "Anticipated uses". If a node says that is used by
M, M better appear later in the topsort. Previously,
we only checked if it was in all_nodes.
(2) If you are a select node, you better be a multi-type node;
if you're not a select node, you better not be! And you
should never have an input that is multi-type.
- There is a new peephole optimization pass, for simple, local
transformations to graphs. Right now, it implements a simple
optimization: remove 'expand' invocations that are no-ops
(the size before matches the size after), but we can add other
things to it later. I needed this for ONNX because no-op expands
show up in the left-hand argument, which we don't support.
- There is now a broadcast fuser, which fuses ATen expand ops
into broadcastable ONNX ops (Add, Div, Mul, Pow, Sub, Gemm.)
It only fuses when the original size is a suffix of the new
size, as per the ONNX spec.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
- Reduce setup.py diff.
- Expunge WITH_TOFFEE from codebase.
- Elaborate on a comment.
- Move gen_toffee.sh to tools
- Delete densenet test.
- Use 'using' to inherit a constructor.
- Delete outdated comment.
- Comment about why primspecs can return fewer outputs.
- Remove dead, commented out includes.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>