pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Zachary DeVito	0ae5498079	[JIT] add create_autodiff_subgraphs (#4822 ) This pass splits differentiable subgraphs into their own Node, similar to a fusion group. This initial implementation does not create optimal subgraphs, but it works well in the case where most things are differentiable, and has the building blocks (`mergeNodes`) to extend to the better implementation.	2018-01-23 23:46:54 -05:00
Adam Paszke	e6cbe84bf6	Handle repeated inputs in JIT tracer	2018-01-03 17:29:27 +01:00
Edward Z. Yang	6d72c82985	Trace ATen native functions as themselves, not their implementations. (#4127 ) * Trace ATen non-primitive functions as themselves, not their implementations. Previously, if I invoked an ATen non-primitive function foo, which in turn called subfoo, I would always see 'subfoo' in the trace (e.g., tracing 'inlines' all of these operations.) Such inlining is bad for ONNX (and can be bad for optimization) as it prevents high-level optimizations from taking advantage of the structure. It might be right to inline, but give the optimizer a chance to work before inlining happens! The implementation here is surprisingly simple, because it uses the "DCE trick". Essentially, it doesn't matter if the constituent calls perform tracing, because you can always trace it again, and override the trace nodes associated with the returned variables. The original trace becomes dead and can be DCE'd. While implementing this, I also refactored how 'isTracing' and 'trace_outputs' works: - isTracing was previously a single function with overloads for both Tensor and Variable arguments. Unfortunately, such overloads are not safe, because of how C++ implicit conversions work. You would think that C++ should never confuse an overload for Variable with ArrayRef<Tensor>, but this is exactly what can happen: Tensor is convertible to both Variable and ArrayRef<Tensor>, thus it's ambiguous and C++ doesn't like it. The last time I ran into this problem, I applied initializer lists to everything and called it a day. A more robust fix is to separate out the Variable and Tensor overloads, which I have done in this patch. - trace_outputs was fed as an initializer list, which doesn't work when you have heterogenous inputs. So instead we first feed everything through 'flatten', which has overloads for each of the argument patterns in ATen, which then goes on to the recordTrace (which takes an ArrayRef). This is no less efficient, because we were allocating a vector anyway (to do the conversion from vector of Tensor to vector of Variable). This fixes mean that 'index' can properly be traced... although the JIT still does not support it. A failing test case has been added to this effect. Some knock-on effects: - The fuser now knows about chunk as well as split. They're pretty similar so there is no problem. - There is a new 'canonicalize' pass in the JIT which renumbers a graph so that all structurally equivalent graphs render the same. - We run DCE before the fuser tests, to make sure dead nodes don't block fusion. - There are new ONNX exports for the newly introduced higher level ATen operations. This includes type_as (no-op case only), chunk, select. Zach didn't like the extra use of 'native' in the new codegen, so we've introduced a new concept, 'abstract'. An abstract function is one that is implemented in derived types (e.g., CPUDoubleType), where as a concrete one is implemented in the base type (Type). Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-12-15 13:50:32 -05:00
Adam Paszke	d1fb8fdf03	Improve IODescriptors in JIT arg checking	2017-11-17 00:13:02 +01:00
Adam Paszke	1f1612ee37	Move _CompiledMixin to C++	2017-11-10 16:31:44 +01:00
Adam Paszke	621fbd5c4e	Move flattening/unflattening JIT logic to C	2017-11-06 19:42:44 -05:00
Edward Z. Yang	d4abaa4b9e	Move ONNX broadcast fusion into separate ONNX pass, fixes verbose printing. This breaks a lot of the onnx-pytorch tests because the abstraction barriers are not respected. I'll spin up a patch for that separately. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-11-01 09:49:53 -04:00
Edward Z. Yang	40f7f6e095	Improve handling of 'expand' (broadcasting) in JIT and ONNX The pieces: - I improved the lint / asserts to catch some bugs which I committed while working on my export. There are two new properties which the linter checks now: (1) "Anticipated uses". If a node says that is used by M, M better appear later in the topsort. Previously, we only checked if it was in all_nodes. (2) If you are a select node, you better be a multi-type node; if you're not a select node, you better not be! And you should never have an input that is multi-type. - There is a new peephole optimization pass, for simple, local transformations to graphs. Right now, it implements a simple optimization: remove 'expand' invocations that are no-ops (the size before matches the size after), but we can add other things to it later. I needed this for ONNX because no-op expands show up in the left-hand argument, which we don't support. - There is now a broadcast fuser, which fuses ATen expand ops into broadcastable ONNX ops (Add, Div, Mul, Pow, Sub, Gemm.) It only fuses when the original size is a suffix of the new size, as per the ONNX spec. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-10-29 23:50:34 -04:00
Lu Fang	0a1ac8bfe5	create a cse pass, with very naive support.	2017-09-22 17:06:27 -04:00
Adam Paszke	b708b6de8d	Add ONNX pass (JIT trace initialization)	2017-09-19 10:53:32 -04:00
Adam Paszke	0e53fe3a41	Put ONNX files where they belong	2017-09-19 10:53:32 -04:00
Adam Paszke	8dae433de8	Move JIT passes to a separate directory	2017-09-19 10:53:32 -04:00
Zach DeVito	6d8d5bab4c	Codemod Toffee -> ONNX, toffee -> onnx. Change file names to match	2017-09-06 13:45:39 -04:00
Adam Paszke	c537aebf5a	Always run DCE in Traceable	2017-09-05 17:48:55 -04:00
Edward Z. Yang	d59714e3b1	Code review comment changes. - Reduce setup.py diff. - Expunge WITH_TOFFEE from codebase. - Elaborate on a comment. - Move gen_toffee.sh to tools - Delete densenet test. - Use 'using' to inherit a constructor. - Delete outdated comment. - Comment about why primspecs can return fewer outputs. - Remove dead, commented out includes. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Edward Z. Yang	2e266837f5	Port TracingState to pybind11, new export() method. Along the way I added converters for Variable and TracingInput. Variable should probably be moved to a more widely known spot. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Adam Paszke	594f98ce16	Support multi-stage AutogradClosures	2017-09-05 17:48:55 -04:00
Edward Z. Yang	82efbe349b	Handle batchnorm properly. Basic idea: - Pass buffers (marked as non-Variable tensors) as input variables to the trace. Every buffer gets represented as an input variable to the trace, and we remember a correspondence of the underlying TH pointer and an input variable in the trace. - When we initially trace a function, we DO NOT record the buffers as edges. This is so autograd doesn't have to know anything about buffers. If we ever turn buffers into requires_grad=False parameters, then this problem goes away. - When we primspec the buffer, NOW we reach into the cached buffers (now appropriately named) and gin up the buffer information we need. Other things: - CppOp execution is now supported (but lightly tested) using SimpleEval (thanks @apaszke!) Todo: - E2E tests need to have their hacks removed. - Figure out what is going on with backwards Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Zach DeVito	bad5717e15	add ability to specify initial values for inputs	2017-09-05 17:48:55 -04:00
Zach DeVito	a3fdb281d1	Python wrapper for Node IR using pybind11 Supports almost all of the IR API.	2017-09-05 17:48:55 -04:00
Edward Z. Yang	e1b345d81b	More alexnet things as primspec. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Edward Z. Yang	6f6fe177f1	Make Toffee optional. Unbreaks CI. The general strategy: - We put all the toffee files in torch/csrc/toffee; they will only be added when toffee is enabled - Toffee is enabled if torch/lib/ToffeeIR is present (since we don't have a submodule/subtree thing going on) - The most prevalant place you will need to use WITH_TOFFEE is for primspec definitions on C++ autograd functions. There is a macro HAS_PRIMSPEC to ameliorate optionally defining primspec() virtual overrides on Function classes. HasPrimspec is always available but will be a zero field class when Toffee is disabled. NB: We might revert this commit in the future if we figure out a way to unconditionally enable Toffee that everyone likes. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Edward Z. Yang	dd58b145c3	Toffee graph exporting for PyTorch. This commit adds a new exporter pass which takes a graph and returns a string of the human-readable protobuf representation of a model. We have two strategies for how conversions are implemented: - If a Python autograd function has a primspec static method, we invoke it to get the Toffee conversion. Use torch.toffee.op to generate the format expected to be returned. The particular data representation is opaque and subject to change in the future. - Otherwise, there's a giant if statement in the exporter, which manually uses the JIT IR C++ API and Toffee IR C++ protobuf API to convert. You must check out a copy of the ToffeeIR repo https://github.com/ProjectToffee/ToffeeIR at torch/lib; at the moment we don't have a subtree/submodule set up. Technical debt in this commit: - To get protobuf headers in scope, we unconditionally add $CONDA_PREFIX/include to the include path. This needs to be replaced with a more robust mechanism. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Adam Paszke	890c2071f0	PR comments	2017-09-05 17:48:55 -04:00
Adam Paszke	7f60a18293	Add initial support for backward tracing	2017-09-05 17:48:55 -04:00
Adam Paszke	1c4538e017	Trace C functions	2017-09-05 17:48:55 -04:00
Adam Paszke	bdcbbeaf68	Remove GlobalTracingState	2017-09-05 17:48:55 -04:00
Edward Z. Yang	b158aaf6b4	Make linter an optimization pass. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Edward Z. Yang	3016f459d2	Partial lint pass. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Zach DeVito	50e51eaa7f	Fusion of simple map operations using nvrtc. Approach is based on the approach of THC's pointwiseApply{1,2,3} family of kernels, but doesn't have any dependencies on that code. Adjacent contiguous dimensions of input tensors are compressed to reduce the complexity of indexing math. For the completely contiguous case, the indexing logic simplifies to just the linear index. In simple tests, this code matched or beat the equivalent from THC.	2017-09-05 17:48:55 -04:00
Adam Paszke	f270973937	Add JIT IR -> Autograd IR converter	2017-09-05 17:48:55 -04:00
Adam Paszke	e186d16e6b	Apply JIT optimizations form Python	2017-09-05 17:48:55 -04:00
Zach DeVito	48945a435d	IR modifications to make mutatation possible. Nodes are in intrusive doubly-linked list. Methods added to manipulate inputs etc.	2017-09-05 17:48:55 -04:00
Adam Paszke	6be47ec907	Minor fixes and improvements	2017-09-05 17:48:55 -04:00
Adam Paszke	ea05ac8f41	Move JIT-related files to jit dir. Remove IR interpreter	2017-09-05 17:48:55 -04:00

35 Commits