Commit Graph

14 Commits

Author SHA1 Message Date
Zachary DeVito
e43ff32192
Add a JIT interpreter (#3634)
* Add a JIT interpreter

The separate interpreter is used to run graphs with lower overhead than
converting them to autograd graphs. Some notes:

* Does not support Handles/PythonOp/CppOp; these will come in a future commit
* jit_closure.cpp still exists, and for now we fall back to it when the
  interpreter cannot handle something because of PythonOp/CppOp
* In order to support retain_graph=True, the interpreter can be cloned,
  creating a copy that can be run with different arguments. This is
  assumed to be the non-standard case so cloning is not particularly optimized.
  No tensor _data_ is copied, but the at::Tensor list in the interpreter is.
  If we hit problems, there is a lot we could do (such as register allocation)
  to minimize the stuff that needs to be copied.
* Uses a pImpl pattern to keep implementation details out of its header file.
* Modifies the way getTensorOp works so that it reads/writes to already-existing
  vectors, which avoids reallocating these buffers on each call.
* Timings are here: https://gist.github.com/zdevito/5a20ac29fb1b9e449e693b67dc478127
  This reduces overhead to about the same as running it in Python.
  It is about 10us faster to run the same thing using ATen directly.

* Code Mod

Interpreter -> InterpreterState
Function -> Code

Add other requested comments.

* RegList -> ListHandle<T>

Change the RegList functions to be safer by identifying the type of
each argument list, and checking that list insert does not try
to add to two different lists at once.

* Use exactly equal for interp tests
2017-11-13 22:09:53 -08:00
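
The split between shared code and clonable interpreter state described in the commit above can be sketched roughly as follows. This is a minimal illustration in Python, not the commit's C++ implementation: the names `Code` and `InterpreterState` come from the commit message, while `Instruction`, the register layout, and every method shown here are hypothetical. It assumes that cloning copies only the list of value references, mirroring the note that no tensor data is copied.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Instruction:
    # Hypothetical instruction format: apply `op` to the input registers
    # and store the result in the output register.
    op: Callable[..., object]
    inputs: Tuple[int, ...]
    output: int


@dataclass(frozen=True)
class Code:
    # Immutable program description, shared by every InterpreterState.
    instructions: Tuple[Instruction, ...]
    num_registers: int
    input_registers: Tuple[int, ...]
    output_register: int


class InterpreterState:
    def __init__(self, code: Code):
        self.code = code
        self.registers = [None] * code.num_registers

    def clone(self) -> "InterpreterState":
        # Share the Code object; copy only the register list.
        # The copy is shallow, so the values themselves are not duplicated.
        other = InterpreterState(self.code)
        other.registers = list(self.registers)
        return other

    def run(self, *args):
        for reg, value in zip(self.code.input_registers, args):
            self.registers[reg] = value
        for inst in self.code.instructions:
            operands = [self.registers[i] for i in inst.inputs]
            self.registers[inst.output] = inst.op(*operands)
        return self.registers[self.code.output_register]


# Program computing (a + b) * a with registers 0 and 1 as inputs.
code = Code(
    instructions=(
        Instruction(operator.add, (0, 1), 2),
        Instruction(operator.mul, (2, 0), 3),
    ),
    num_registers=4,
    input_registers=(0, 1),
    output_register=3,
)
state = InterpreterState(code)
copy = state.clone()
result_a = state.run(2, 3)
result_b = copy.run(4, 5)
```

Because each state owns its own register list, the clone can run with different arguments without disturbing the original, which is the property needed for something like retain_graph=True.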
Zachary DeVito
25d3c25f50 add more fusable nodes to the graph compiler (#3559) 2017-11-08 22:58:08 -05:00
Zachary DeVito
8cc30e4895 Fix the Fusion Pass (#3362)
* update fuser to match ATen-formatted JIT ops

* fix concat optimizations and add test

* allow onnx export to work with single-export functions

* fix onnx handling of multi-return nodes.

* nits, format, vision test update

* fix add constant

* fix driver init issues

* Add missing Neg symbolic.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-31 13:44:13 -04:00
Zach DeVito
a194e66186 allow Concat operators to be the final operator in a fusion group, and update the fusion compiler to support code that includes final concats 2017-09-20 12:24:27 -04:00
Adam Paszke
fe5c644f81 Handle AddConstant in fusion compiler 2017-09-19 10:53:32 -04:00
Adam Paszke
8a605ce766 Minor refactor of fusion compiler 2017-09-19 10:53:32 -04:00
Edward Z. Yang
b2e7438ead Move disallow_copy into utils.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Zach DeVito
57b7370aab switch NodeKind over to Symbol type. 2017-09-05 17:48:55 -04:00
Zach DeVito
c8b303e853 guard dump, guard cuda 2017-09-05 17:48:55 -04:00
Zach DeVito
f4b7178b59 track scalar type 2017-09-05 17:48:55 -04:00
Zach DeVito
b6175eb54d enable fusion group execution in autograd closure. implement chunk. propagate type information through fusion optimization. 2017-09-05 17:48:55 -04:00
Adam Paszke
233a66dcbe Remove SimpleMap from JIT IR 2017-09-05 17:48:55 -04:00
Edward Z. Yang
ea4aaa6b0b Document TemplateEnv & PR fixes 2017-09-05 17:48:55 -04:00
Zach DeVito
50e51eaa7f Fusion of simple map operations using nvrtc.
The approach is based on THC's pointwiseApply{1,2,3} family of kernels,
but has no dependencies on that code.

Adjacent contiguous dimensions of input tensors are compressed to reduce the complexity of indexing math.
For the completely contiguous case, the indexing logic simplifies to just the linear index.

In simple tests, this code matched or beat the equivalent from THC.
2017-09-05 17:48:55 -04:00
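
The dimension compression mentioned in the fusion commit above can be illustrated with a small sketch. This is written in Python for readability and is not the code from the commit; the function names `compress_dims` and `offset` are hypothetical. It assumes the usual sizes/strides tensor layout, where two adjacent dimensions can be merged when the outer stride equals the inner size times the inner stride.

```python
def compress_dims(sizes, strides):
    """Merge adjacent dimensions that are contiguous with each other."""
    out_sizes, out_strides = [], []
    for size, stride in zip(sizes, strides):
        if size == 1:
            continue  # a size-1 dimension never contributes to the offset
        if out_sizes and out_strides[-1] == size * stride:
            # The previous (outer) dimension steps over exactly one full
            # span of this dimension, so the two collapse into one.
            out_sizes[-1] *= size
            out_strides[-1] = stride
        else:
            out_sizes.append(size)
            out_strides.append(stride)
    if not out_sizes:
        return [1], [1]  # every dimension had size 1: a single element
    return out_sizes, out_strides


def offset(linear_index, sizes, strides):
    """Convert a linear element index into a memory offset."""
    off = 0
    for size, stride in zip(reversed(sizes), reversed(strides)):
        off += (linear_index % size) * stride
        linear_index //= size
    return off


# Fully contiguous 2x3x4 tensor: collapses to one dimension of stride 1,
# so the offset is just the linear index.
contiguous = compress_dims([2, 3, 4], [12, 4, 1])

# A 2x3x4 view into a 2x6x4 buffer: only the inner two dimensions merge.
sliced = compress_dims([2, 3, 4], [24, 4, 1])

# A transposed 4x3 view: nothing can be merged.
transposed = compress_dims([4, 3], [1, 4])
```

Fewer dimensions mean fewer modulo and divide operations per element in the indexing loop, which is the saving the commit message refers to.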