* Remove addValues and use WithInsertPoint
* Use blocks to simplify differentiate
Using @ezyang's suggestion, this change uses a block rather than
staging annotations to represent the reverse pass. This allows us
to reuse the machinery to copy graphs/blocks to extract the
reverse pass concisely.
This also change the input order of Gradients df to:
[output vjps][temporary vjps][captures]
In addition to being simpler to generate in this order, it also
will allow ExecutionPlan to append the captures onto the already-
existing input list of vjps that are given by the autograd,
rather than have to prepend them, which should be slightly cheaper.
* Enforce that input capture are before outputs
This changes the Gradient struct to enforce that input
captures appear before output captures in the capture list,
which makes it easier to use in ExecutionPlan.
This adds the initial implementation of graph executor for the new JIT design. It includes a few python tests ensuring that nograd, backward, and double-backward cases work for simple examples and some corner cases. More work needs to be done to performance optimize as there are many extra copies and places where we hold onto variables longer than we should. These are noted in the comments.