Additionally:
- add support for calling functions that are not methods in the Python frontend
- add an end-to-end test for the Python frontend
- add a capture_stdout helper for checking that `print` actually works
* Use stacks in the interpreter/aten_dispatch
Rather than have separate input/output lists,
the interpreter now works using a single stack.
Operators in the interpreter push/pop from the stack.
This allows ownership of tensors to transfer directly to an operator,
and an operator can drop the reference to a tensor as soon as it is
no longer needed. This is important for the GraphExecutor op,
which recursively runs the interpreter.
Once autograd is updated to pass variables to Function by value,
we will be able to ensure that we release ownership as soon as possible.
This commit also switches the interpreter to use a fake
tensor 'ContainerTensor' rather than at::Retainable to hold non-tensor
data in the interpreter. This allows us to use std::vector<at::Tensor>
for all registers, which is significantly less confusing than the
OwnedRetainables struct it was replacing.
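Below is a minimal, illustrative sketch of the stack discipline (the types here are stand-ins, not the actual interpreter classes): each op pops the inputs it now owns off the single stack and pushes its outputs, so a tensor's last reference can be released as soon as its final consumer has run.
```
// Illustrative sketch only: a stack-based dispatch loop where ops take
// ownership of their inputs by popping them and publish results by pushing.
#include <functional>
#include <memory>
#include <vector>

struct FakeTensor { std::vector<float> data; };
using TensorPtr = std::shared_ptr<FakeTensor>;  // stands in for at::Tensor
using Stack = std::vector<TensorPtr>;           // single stack for all values
using Operation = std::function<void(Stack&)>;  // push/pop interface

TensorPtr pop(Stack& s) {
  TensorPtr t = std::move(s.back());  // move out: the op now owns the tensor
  s.pop_back();                       // the stack's reference is already gone
  return t;
}

// An op that adds two tensors; its inputs can be freed as soon as it returns.
Operation add_op = [](Stack& stack) {
  TensorPtr b = pop(stack);
  TensorPtr a = pop(stack);
  auto out = std::make_shared<FakeTensor>();
  out->data.resize(a->data.size());
  for (size_t i = 0; i < a->data.size(); ++i)
    out->data[i] = a->data[i] + b->data[i];
  stack.push_back(std::move(out));
  // a and b go out of scope here: ownership was transferred to this op,
  // so their storage can be released without waiting for the whole run.
};

int main() {
  Stack stack;
  stack.push_back(std::make_shared<FakeTensor>(FakeTensor{{1, 2, 3}}));
  stack.push_back(std::make_shared<FakeTensor>(FakeTensor{{4, 5, 6}}));
  add_op(stack);  // result is left on the stack
}
```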
* Add If and Loop to interpreter
* Preprocess loop to calculate where references to tensors should be dropped
* Add control instructions JumpZ/JumpNZ/Jump
* Switch from explicitly having stage structs to having a single list
of instructions with Store/Load instructions to take values off the
initial stack
* Make the interpreter tests executable rather than use expect files
* Add a flag to interpreter code so that constants are variables
if the interpreter is running on variables.
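As a rough sketch of the lowering described above (the opcode names mirror the commit, but the encoding and types here are invented for illustration), If/Loop can be flattened into a single instruction list driven by Jump/JumpZ/JumpNZ, with Load/Store moving values between the stack and registers:
```
#include <cstdio>
#include <vector>

enum class OpCode { LoadConst, Load, Store, Add, JumpZ, JumpNZ, Jump, Print, Halt };
struct Instr { OpCode op; int arg; };

void run(const std::vector<Instr>& code) {
  std::vector<int> stack, regs(16, 0);
  size_t pc = 0;
  while (true) {
    const Instr& i = code[pc];
    switch (i.op) {
      case OpCode::LoadConst: stack.push_back(i.arg); ++pc; break;
      case OpCode::Load:  stack.push_back(regs[i.arg]); ++pc; break;
      case OpCode::Store: regs[i.arg] = stack.back(); stack.pop_back(); ++pc; break;
      case OpCode::Add: { int b = stack.back(); stack.pop_back();
                          stack.back() += b; ++pc; break; }
      case OpCode::JumpZ:  { int c = stack.back(); stack.pop_back();
                             pc = (c == 0) ? (size_t)i.arg : pc + 1; break; }
      case OpCode::JumpNZ: { int c = stack.back(); stack.pop_back();
                             pc = (c != 0) ? (size_t)i.arg : pc + 1; break; }
      case OpCode::Jump: pc = (size_t)i.arg; break;
      case OpCode::Print: std::printf("%d\n", stack.back()); stack.pop_back(); ++pc; break;
      case OpCode::Halt: return;
    }
  }
}

int main() {
  // acc = 0; counter = 3; while (counter != 0) { acc += counter; counter -= 1; } print acc
  std::vector<Instr> prog = {
      {OpCode::LoadConst, 0}, {OpCode::Store, 0},   // 0-1: acc = 0
      {OpCode::LoadConst, 3}, {OpCode::Store, 1},   // 2-3: counter = 3
      {OpCode::Load, 1},      {OpCode::JumpZ, 15},  // 4-5: exit when counter == 0
      {OpCode::Load, 0},      {OpCode::Load, 1},
      {OpCode::Add, 0},       {OpCode::Store, 0},   // 6-9: acc += counter
      {OpCode::Load, 1},      {OpCode::LoadConst, -1},
      {OpCode::Add, 0},       {OpCode::Store, 1},   // 10-13: counter -= 1
      {OpCode::Jump, 4},                            // 14: back to loop header
      {OpCode::Load, 0},      {OpCode::Print, 0},   // 15-16: prints 6
      {OpCode::Halt, 0},
  };
  run(prog);
}
```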
* Add tensor_as to its own file
* Improve Function interface
* Undo tracer changes
* Fix bug in VariableType.set_history
* Rename function_counter and sequence_number to sequence_nr
* Clarify Function documentation
* Replace swap_next_edges with next_edges() getter
* Bring back set_gradient_edge
* Simplify special.cpp
* add_gradient_edge -> create_gradient_edge
* Add mutable getters for pre/post hooks
* Use make_variable with Edge
* Remove remove_gradient_edge in favor of detach_
* Fix documentation and remove create_gradient_edge friend method
* Canonicalize some includes
* Improve Variable interface
* Address comments from @apaszke and @colesbury
* string::operator= is not noexcept
* Remove ir.h from tracer_state.h to improve build times
* Make Variable a struct and pack SavedVariable fields
* Implement as_variable_ref
* grad_fn_ptr() -> grad_fn_unsafe()
* Reduce hackiness of set_type hack
* Include variable.h and edge.h in tracer_state.h because it uses them
* class Variable -> struct Variable because Windows can't even
* Make Variable::output_nr uint32_t instead of int
* Add comment about tracing state
* Replaced more static_cast<Variable&> and improve docs
* Remove SavedVariable destructor and construct members in init list
* Clarify docs for Variable
* Variable::set_version -> set_version_counter
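For orientation, here is a heavily simplified sketch of the Edge idea the bullets above refer to (a gradient edge pairs a Function with an input index); the declarations are illustrative stand-ins, not the real torch::autograd headers:
```
// Conceptual sketch of "Edge": an edge in the autograd graph is a
// (function, input number) pair. Names are simplified stand-ins.
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

struct Function;  // a node in the autograd graph

struct Edge {
  std::shared_ptr<Function> function;  // which Function the gradient flows into
  uint32_t input_nr = 0;               // which input of that Function
  bool is_valid() const { return function != nullptr; }
};

struct Function {
  uint64_t sequence_nr = 0;        // replaces function_counter/sequence_number
  std::vector<Edge> next_edges_;   // where this Function sends its gradients
  const std::vector<Edge>& next_edges() const { return next_edges_; }  // getter
};

struct Variable {
  uint32_t output_nr = 0;  // which output of grad_fn this variable is
  std::shared_ptr<Function> grad_fn;
  // The "gradient edge" of a variable: its grad_fn plus its output index.
  Edge gradient_edge() const { return Edge{grad_fn, output_nr}; }
  void set_gradient_edge(Edge e) {
    grad_fn = std::move(e.function);
    output_nr = e.input_nr;
  }
};

int main() {
  auto add_fn = std::make_shared<Function>();
  Variable result;
  result.set_gradient_edge(Edge{add_fn, 0});  // result is output 0 of add_fn
  return result.gradient_edge().is_valid() ? 0 : 1;
}
```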
This adds the initial implementation of the graph executor for the new JIT design. It includes a few Python tests ensuring that the no-grad, backward, and double-backward cases work for simple examples and some corner cases. More work needs to be done on performance: there are many extra copies and places where we hold onto variables longer than we should. These are noted in the comments.
Previously the side-effect-free grad calculation was performed
using callbacks that could also override the decision to run a
function. However, this had a few problems: for example, it forced us to
iterate over pretty much all functions in the graph and drop their buffers.
This patch improves the mechanism by adding explicit support for this
kind of evaluation in execute(). It's safer, and the algorithm used to
decide which nodes have to be evaluated was replaced with a faster one.
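The sketch below illustrates the kind of reverse-reachability pass implied here (simplified stand-in types, not the real autograd engine): starting from the grad_fns of the requested outputs, mark only the functions from which a requested input's accumulator is reachable, and evaluate just those.
```
#include <functional>
#include <unordered_map>
#include <unordered_set>
#include <vector>

struct Fn { std::vector<Fn*> next; };  // edges toward the inputs (leaves)

// Mark every function from which at least one of the requested target
// functions is reachable; only those need to be evaluated.
std::unordered_set<Fn*> compute_needed(
    const std::vector<Fn*>& roots,             // grad_fns of the outputs
    const std::unordered_set<Fn*>& targets) {  // accumulators of the inputs
  std::unordered_map<Fn*, bool> reaches;       // memoized "reaches a target?"
  std::unordered_set<Fn*> needed;
  std::function<bool(Fn*)> visit = [&](Fn* fn) -> bool {
    auto it = reaches.find(fn);
    if (it != reaches.end()) return it->second;   // already decided
    bool ok = targets.count(fn) > 0;
    for (Fn* next : fn->next) ok = visit(next) || ok;
    reaches[fn] = ok;
    if (ok) needed.insert(fn);
    return ok;
  };
  for (Fn* root : roots) visit(root);
  return needed;
}

int main() {
  Fn leaf, mid{{&leaf}}, other, root{{&mid, &other}};
  auto needed = compute_needed({&root}, {&leaf});
  // needed == {root, mid}: `other` never reaches the requested input
  return needed.count(&other) == 0 ? 0 : 1;
}
```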
* Further relax VariableFlags
* Allow a requires_grad=True trace to be used for a requires_grad=False
input by computing the gradient but not connecting it to the
input.
* Enable CSE to de-duplicate WLM backwards pass code which calls sum twice.
* Fix a bug in the interpreter that frees a register too early when
it appears twice in a use list.
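A sketch of the last-use computation implied by that fix (toy types, not the actual interpreter code): scan the instruction list backwards and schedule each register to be dropped only after its final use, skipping duplicate occurrences within a single use list.
```
#include <unordered_set>
#include <vector>

struct Instr {
  std::vector<int> uses;        // registers this instruction reads
  std::vector<int> drop_after;  // registers whose reference can be dropped here
};

void compute_drops(std::vector<Instr>& program) {
  std::unordered_set<int> seen;  // registers already claimed by a later use
  for (auto it = program.rbegin(); it != program.rend(); ++it) {
    for (int reg : it->uses) {
      // insert() fails for a register already seen later in the program *or*
      // earlier in this same use list, so a duplicate use never frees early.
      if (!seen.insert(reg).second) continue;
      it->drop_after.push_back(reg);  // safe to release after this instruction
    }
  }
}

int main() {
  // Instruction 1 uses register 0 twice; it must be dropped once, after its
  // final use, not after the first occurrence.
  std::vector<Instr> prog = {{{0, 1}, {}}, {{0, 0}, {}}};
  compute_drops(prog);
  return prog[1].drop_after.size() == 1 ? 0 : 1;  // reg 0 dropped after instr 1
}
```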
* [fuser] Follow all outputs to check if fusion is safe
This bug was introduced when we allowed fusion groups
to fuse together. Previously producers were forced to have a single
output, but now producers that are fusion groups can have multiple outputs.
So now we check the uses of all the outputs of a producer.
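Roughly, the check now looks like the following sketch (stand-in types, not the real JIT IR classes): fusion is considered safe only if every use of every output of the producer is the consumer itself.
```
#include <vector>

struct Node;
struct Use { Node* user; };
struct Value { std::vector<Use> uses; };
struct Node { std::vector<Value*> outputs; };

bool all_users_are_consumer(const Node* producer, const Node* consumer) {
  // Previously only a single output was inspected; now every output's
  // use list is followed.
  for (const Value* out : producer->outputs)
    for (const Use& use : out->uses)
      if (use.user != consumer) return false;
  return true;
}

int main() {
  Value out0, out1;
  Node consumer, other;
  out0.uses = {{&consumer}};
  out1.uses = {{&consumer}, {&other}};  // a second consumer blocks fusion
  Node producer{{&out0, &out1}};
  return all_users_are_consumer(&producer, &consumer) ? 1 : 0;  // not safe -> 0
}
```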
* [JIT] Fix handling of undefined inputs
It is not legal to call .data() on variable objects whose tensors
are undefined.
This removes volatile from Variable. The functionality is mostly
replaced by a global (thread-local) flag, which is controlled by
torch.set_grad_enabled() and the context manager torch.no_grad().
In C++, the flag is exposed through GradMode::is_enabled() and GradMode::set_enabled().
Fixes #3627
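A simplified sketch of the thread-local flag and an RAII guard comparable to the Python context manager (not the actual ATen/autograd code):
```
#include <cassert>

struct GradMode {
  static bool is_enabled() { return enabled; }
  static void set_enabled(bool value) { enabled = value; }
 private:
  static thread_local bool enabled;
};
thread_local bool GradMode::enabled = true;

// C++ analogue of `with torch.no_grad():` - restores the previous mode on exit.
struct NoGradGuard {
  NoGradGuard() : prev(GradMode::is_enabled()) { GradMode::set_enabled(false); }
  ~NoGradGuard() { GradMode::set_enabled(prev); }
  bool prev;
};

int main() {
  assert(GradMode::is_enabled());
  {
    NoGradGuard guard;               // gradients disabled in this scope
    assert(!GradMode::is_enabled());
  }
  assert(GradMode::is_enabled());    // restored automatically
}
```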
* Fix another leak in pybind11 code.
This time caused by an upstream pybind11 bug:
https://github.com/pybind/pybind11/pull/1216
This change causes the code to go down a non-buggy pathway.
* Relax verify of VariableFlags
If we trace with a defined tensor but see a run with an undefined
tensor, we now allow that run to happen, replacing the tensor with
zeros.
This also fixes a bug where stage 0 tensors were not
checked against their verify flags.
This change does _not_ handle all bad situations that can happen.
For instance, if the first thing traced has an undefined tensor but
a later tensor is defined, then it will fail because the graph itself
does not contain the trace for the derivative of that tensor.
However, it is possible to work around this latter case by
dry-running the function:
```
z = Variable(..., requires_grad=True)
x, y = f(z)
(x.sum() + y.sum()).backward()
```
This adds a simple fusion backend for the CPU.
* Refactors CompiledFusionFunction to have two subclasses that handle
the compilation details of each backend.
* emit-compile-link-run cycle for the CPU
* simple single core loop to run the operation
* lift CUDA-only restrictions in the fuser; check that fusion groups
are only on a single backend.
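For illustration, a POSIX-only toy version of such an emit-compile-link-run cycle might look like the following (the file names, kernel signature, and helper are invented for this sketch; it is not the fuser's actual code path):
```
// Writes a kernel to a .cpp file, compiles it to a shared object with the
// system compiler, dlopens it, and runs a simple single-threaded loop.
#include <cstdio>
#include <cstdlib>
#include <dlfcn.h>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

using KernelFn = void (*)(const float*, const float*, float*, long);

KernelFn emit_compile_link(const std::string& name, const std::string& body) {
  std::string src = "/tmp/" + name + ".cpp";
  std::string lib = "/tmp/" + name + ".so";
  std::ofstream(src) << "extern \"C\" void " << name
                     << "(const float* a, const float* b, float* out, long n) {\n"
                     << "  for (long i = 0; i < n; ++i) {\n"
                     << "    " << body << "\n"
                     << "  }\n"
                     << "}\n";
  std::string cmd = "c++ -O2 -shared -fPIC -o " + lib + " " + src;
  if (std::system(cmd.c_str()) != 0) throw std::runtime_error("compile failed");
  void* handle = dlopen(lib.c_str(), RTLD_NOW);
  if (!handle) throw std::runtime_error(dlerror());
  // Casting void* from dlsym to a function pointer is the usual POSIX idiom.
  return reinterpret_cast<KernelFn>(dlsym(handle, name.c_str()));
}

int main() {
  // "Fused" a*b + a, run with a simple single-core loop.
  auto fn = emit_compile_link("fused_mul_add", "out[i] = a[i] * b[i] + a[i];");
  std::vector<float> a{1, 2, 3}, b{4, 5, 6}, out(3);
  fn(a.data(), b.data(), out.data(), 3);
  std::printf("%f %f %f\n", out[0], out[1], out[2]);
}
```
(On older glibc you would also link with -ldl for dlopen/dlsym.)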
* Add interpreter support for Handles/PythonOp/CppOp
This treats Handles as a first-class type in the interpreter
since this turned out to be conceptually simpler than treating
them as a separate concept, which requires a second channel for
register allocation and moving data from one op to the next.
Notes:
* The refcounting nature of tensors is factored into its own base type
so that it can be shared with other refcounted types such as handle.
* Some methods redundant with TensorBase have been deleted from Tensor
* The interpreter uses raw refcounted handles. In addition to being
able to treat Tensors and Handles as the same base object, it removes
a lot of redundant refcounting as objects are moved from tensors to input/
output lists.
* aten_dispatch has been updated to work directly on the raw refcounted
lists to avoid refcounting and duplicate lists.
* Remove jit_closure.cpp; the interpreter can now handle all pathways.
* Functions like `unsafeToTensorShare` describe how
ownership transfers in the interpreter. The `Steal` variants
take rvalue references as arguments, and invalidate those
arguments to prevent potential problems.
* Make TensorTemporary not a subtype of Tensor, because the subtype
relationship made it too easy to do something horribly unsafe:
```
void foo(at::Tensor bar) {
  // bar's destructor calls release() on a temporary!
}
foo(TensorTemporary(retainable)); // structure slicing!
```
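To make the Share/Steal convention concrete, here is a toy sketch (the helper names echo `unsafeToTensorShare`/`...Steal`, but the types and everything else are invented): the Share variant retains on behalf of the new owner, while the Steal variant consumes an rvalue reference and nulls it out so accidental reuse fails immediately.
```
#include <atomic>
#include <cassert>
#include <utility>

struct Retainable {
  std::atomic<int> refcount{1};
  void retain() { ++refcount; }
  void release() { if (--refcount == 0) delete this; }
  virtual ~Retainable() = default;
};

struct Handle {
  Retainable* ptr = nullptr;
};

// Share: the caller keeps its reference; we add one for the new owner.
Retainable* toRetainableShare(const Handle& h) {
  h.ptr->retain();
  return h.ptr;
}

// Steal: takes an rvalue reference, transfers the existing reference, and
// nulls the source so accidental reuse is caught immediately.
Retainable* toRetainableSteal(Handle&& h) {
  Retainable* p = h.ptr;
  h.ptr = nullptr;
  return p;
}

int main() {
  Handle h{new Retainable()};
  Retainable* shared = toRetainableShare(h);  // refcount is now 2
  shared->release();
  Retainable* stolen = toRetainableSteal(std::move(h));
  assert(h.ptr == nullptr);                   // the argument was invalidated
  stolen->release();                          // refcount reaches 0, freed
}
```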
This commit adds a Value type similar to the one @ezyang suggested a while
ago for handling multi-return nodes.
Previously if we had a graph like:
```
a = op1(b)
c, d = op2(a)
```
then its in-memory format would look like:
```
%0 = op1(b)
%1 = op2(%0)
%2 = select(%1, 0)
%3 = select(%1, 1)
```
Select nodes were used only to handle the multi-output case. In the
single-output case ops referred directly to their uses.
This required special handling for the single- and multi- output cases,
and was confusing when used with ONNX which distinguishes values (the
inputs/outputs of a node) from the nodes themselves (e.g. a Conv).
This commit adds the Node/Value distinction to the IR. In the example
above, `a`, `b`, `c`, and `d` are now Value objects, while `op1` and
`op2` are now Node objects. Inputs/Outputs to the graph are values.
* Nodes now always have multiple outputs, accessible through their `output()`
method.
* Methods exist for adding/removing outputs from a node.
* Nodes own their output Values, destroying a node destroys its outputs and it
is only valid to destroy a node when no uses of its outputs remain.
* Unlike select, Values do not appear in the nodes list.
* The method `node()` on `Value` retrieves its defining node. Calling it
is always valid. For graph inputs the defining node's kind is "Param"; as with
"Return", there is a single Param node representing all inputs.
* For single-output Nodes, the method `output()` retrieves the single
output Value, asserting that the node is in-fact single output.
* Most member functions are the same, but some, like `type()`, have moved to
Value.
* `replaceAllUsesWith` is now sanely defined for both Values and Nodes.
In the case of Nodes, it replaces all outputs of the node with the outputs
of the replacement node.
* stage is defined on both Node and Value, because inputs require a stage.
* Apart from changing data types from Node to Value, most passes remain the same.
Things that previously assumed single-output nodes now have to call output()
to get the Value.
* This removes the uses = [...] field from the printed output because it was
getting confusing even before this commit, when uses referred to nodes
but we printed the names of Values. The lint pass validates the use list,
so printing it out seems less necessary.
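A heavily simplified sketch of the resulting ownership and accessor structure (illustrative only, not the real torch::jit classes):
```
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct Node;

struct Value {
  Node* node_;             // defining node; always valid (inputs use a Param node)
  size_t offset_;          // which output of node_ this is
  Node* node() const { return node_; }
};

struct Node {
  std::string kind;                              // e.g. "Conv", "Param", "Return"
  std::vector<std::unique_ptr<Value>> outputs_;  // a Node owns its output Values
  std::vector<Value*> inputs_;

  Value* addOutput() {
    outputs_.push_back(std::unique_ptr<Value>(new Value{this, outputs_.size()}));
    return outputs_.back().get();
  }
  // Convenience accessor for the common single-output case.
  Value* output() {
    assert(outputs_.size() == 1 && "output() requires a single-output node");
    return outputs_[0].get();
  }
};

int main() {
  Node param{"Param"};                 // a single Param node represents all inputs
  Value* b = param.addOutput();

  Node op1{"op1"};                     // a = op1(b)
  op1.inputs_.push_back(b);
  Value* a = op1.addOutput();

  Node op2{"op2"};                     // c, d = op2(a): no select nodes needed
  op2.inputs_.push_back(a);
  Value* c = op2.addOutput();
  Value* d = op2.addOutput();

  assert(op1.output() == a);           // single-output convenience
  assert(c->node() == &op2 && d->node() == &op2);
  assert(b->node()->kind == "Param");  // inputs are defined by the Param node
}
```
In this sketch, the example graph from above needs no select nodes: `c` and `d` are simply the two output Values of the `op2` Node.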
* Add a JIT interpreter
The separate interpreter is used to run graphs with lower overhead than
converting them to autograd graphs. Some notes:
* does not support Handles/PythonOp/CppOp, these will be in a future commit
* jit_closure.cpp still exists, and we fall back to it for now when we
cannot handle something because of PythonOp/CppOp
* In order to support retain_graph=True, the interpreter can be cloned,
creating a copy that can be run with different arguments. This is
assumed to be the non-standard case so cloning is not particularly optimized.
No tensor _data_ is copied, but the at::Tensor list in the interpreter is.
If we hit problems, there is a lot we could do (such as register allocation)
to minimize the stuff that needs to be copied.
* Uses a pImpl pattern to keep implementation details out of its header file.
* Modifies the way getTensorOp works so that it reads/writes to already-existing
vectors; this avoids having to realloc these buffers each time.
* Timings are here: https://gist.github.com/zdevito/5a20ac29fb1b9e449e693b67dc478127
This reduces overhead to about the same as running it in Python.
It is about 10us faster to run the same thing using ATen directly.
* Code Mod
Interpreter -> InterpreterState
Function -> Code
Add other requested comments.
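As a sketch of the resulting split (simplified stand-ins, with a trivial integer "program" instead of real instructions): `Code` owns the preprocessed program behind a pImpl and is shared across runs, while `InterpreterState` holds the per-run mutable state and can be cloned cheaply for the retain_graph=True case.
```
#include <memory>
#include <vector>

// --- header: no implementation details exposed ------------------------------
struct CodeImpl;  // defined only below (would live in the .cpp file)
struct Code {
  explicit Code(std::vector<int> program);
  ~Code();
  std::shared_ptr<CodeImpl> pImpl;  // preprocessed program, shared by all runs
};

struct InterpreterStateImpl;
struct InterpreterState {
  explicit InterpreterState(const Code& code);
  InterpreterState clone() const;  // copy per-run state (for retain_graph=True)
  void run(std::vector<int>& stack);
  std::shared_ptr<InterpreterStateImpl> pImpl;
};

// --- implementation ----------------------------------------------------------
struct CodeImpl { std::vector<int> program; };
struct InterpreterStateImpl {
  std::shared_ptr<CodeImpl> code;  // shared, never copied by clone()
  std::vector<int> registers;      // per-run state, copied by clone()
};

Code::Code(std::vector<int> program)
    : pImpl(std::make_shared<CodeImpl>(CodeImpl{std::move(program)})) {}
Code::~Code() = default;

InterpreterState::InterpreterState(const Code& code)
    : pImpl(std::make_shared<InterpreterStateImpl>(
          InterpreterStateImpl{code.pImpl, {}})) {}

InterpreterState InterpreterState::clone() const {
  InterpreterState copy(*this);
  copy.pImpl = std::make_shared<InterpreterStateImpl>(*pImpl);  // deep-copy state
  return copy;
}

void InterpreterState::run(std::vector<int>& stack) {
  // Trivial "program": add each entry to the top of the stack, record it.
  for (int v : pImpl->code->program) {
    stack.back() += v;
    pImpl->registers.push_back(v);  // mutable per-run state
  }
}

int main() {
  Code code({1, 2, 3});
  InterpreterState state(code);
  InterpreterState retained = state.clone();  // no program data is copied
  std::vector<int> stack{0};
  state.run(stack);     // stack.back() == 6
  retained.run(stack);  // the clone can be run independently later
}
```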
* RegList -> ListHandle<T>
Change the RegList functions to be safer by identifying the type of
each argument list, and checking that list inserts do not try
to add to two different lists at once.
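A toy sketch of that idea (names and layout invented for illustration): each `ListHandle<T>` is a typed slice of one flat register array, and inserts are checked so that only the list currently being built can grow.
```
#include <cassert>
#include <vector>

struct Tensor {};  // stand-in element types for the handle's tag
struct IntArg {};

template <typename T>
struct ListHandle {
  int start = 0;  // offset of this list's slice in the flat register array
  int size = 0;
};

struct ListBuilder {
  std::vector<int> registers;        // the flat storage all lists index into
  const void* open_list = nullptr;   // the one list currently being appended to

  template <typename T>
  void listBegin(ListHandle<T>& list) {
    assert(open_list == nullptr && "finish the previous list first");
    open_list = &list;
    list.start = static_cast<int>(registers.size());
    list.size = 0;
  }
  template <typename T>
  void listInsert(ListHandle<T>& list, int reg) {
    // Appending to anything but the open list would interleave two slices.
    assert(open_list == &list && "cannot insert into two lists at once");
    registers.push_back(reg);
    ++list.size;
  }
  template <typename T>
  void listEnd(ListHandle<T>& list) {
    assert(open_list == &list);
    open_list = nullptr;
  }
};

int main() {
  ListBuilder b;
  ListHandle<Tensor> inputs;
  b.listBegin(inputs);
  b.listInsert(inputs, 0);
  b.listInsert(inputs, 1);
  b.listEnd(inputs);

  ListHandle<IntArg> sizes;  // a differently-typed list, built afterwards
  b.listBegin(sizes);
  b.listInsert(sizes, 2);
  b.listEnd(sizes);
}
```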
* Use exactly equal for interp tests