pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Richard Zou	cf2e088c9a	Translate None to zeros for old-style autograd functions (#4242)	2017-12-20 14:03:56 +01:00
Sam Gross	d605058212	Replace Variable.volatile with torch.no_grad() (#3970) This removes volatile from Variable. The functionality is mostly replaced by a global (thread-local) flag, which is controlled by torch.set_grad_enabled() and the context manager torch.no_grad(). In C++, the flag is exposed through GradMode::is_enabled() and GradMode::set_enabled(). Fixes #3627	2017-12-18 15:46:13 -05:00
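The thread-local flag described in d605058212 can be modeled in a few lines of plain Python. This is only a sketch of the idea, not PyTorch's implementation: `_grad_mode` is a hypothetical stand-in for the C++ GradMode state, and the three functions merely mirror the names mentioned in the commit message.

```python
import threading
from contextlib import contextmanager

# Hypothetical stand-in for the C++ GradMode flag: one value per thread.
_grad_mode = threading.local()

def is_grad_enabled():
    # Threads that never touched the flag see the default (enabled).
    return getattr(_grad_mode, "enabled", True)

def set_grad_enabled(enabled):
    _grad_mode.enabled = bool(enabled)

@contextmanager
def no_grad():
    # Disable gradient tracking inside the block, then restore the old value.
    previous = is_grad_enabled()
    set_grad_enabled(False)
    try:
        yield
    finally:
        set_grad_enabled(previous)
```

Because the flag is thread-local, entering `no_grad()` in one thread leaves every other thread's setting untouched.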
Edward Z. Yang	6d72c82985	Trace ATen native functions as themselves, not their implementations. (#4127) * Trace ATen non-primitive functions as themselves, not their implementations. Previously, if I invoked an ATen non-primitive function foo, which in turn called subfoo, I would always see 'subfoo' in the trace (i.e., tracing 'inlines' all of these operations). Such inlining is bad for ONNX (and can be bad for optimization) as it prevents high-level optimizations from taking advantage of the structure. It might be right to inline, but give the optimizer a chance to work before inlining happens! The implementation here is surprisingly simple, because it uses the "DCE trick". Essentially, it doesn't matter if the constituent calls perform tracing, because you can always trace it again, and override the trace nodes associated with the returned variables. The original trace becomes dead and can be DCE'd. While implementing this, I also refactored how 'isTracing' and 'trace_outputs' work: - isTracing was previously a single function with overloads for both Tensor and Variable arguments. Unfortunately, such overloads are not safe, because of how C++ implicit conversions work. You would think that C++ should never confuse an overload for Variable with ArrayRef<Tensor>, but this is exactly what can happen: Tensor is convertible to both Variable and ArrayRef<Tensor>, thus it's ambiguous and C++ doesn't like it. The last time I ran into this problem, I applied initializer lists to everything and called it a day. A more robust fix is to separate out the Variable and Tensor overloads, which I have done in this patch. - trace_outputs was fed as an initializer list, which doesn't work when you have heterogeneous inputs. So instead we first feed everything through 'flatten', which has overloads for each of the argument patterns in ATen, which then goes on to the recordTrace (which takes an ArrayRef). 
This is no less efficient, because we were allocating a vector anyway (to do the conversion from vector of Tensor to vector of Variable). These fixes mean that 'index' can properly be traced... although the JIT still does not support it. A failing test case has been added to this effect. Some knock-on effects: - The fuser now knows about chunk as well as split. They're pretty similar so there is no problem. - There is a new 'canonicalize' pass in the JIT which renumbers a graph so that all structurally equivalent graphs render the same. - We run DCE before the fuser tests, to make sure dead nodes don't block fusion. - There are new ONNX exports for the newly introduced higher level ATen operations. This includes type_as (no-op case only), chunk, select. Zach didn't like the extra use of 'native' in the new codegen, so we've introduced a new concept, 'abstract'. An abstract function is one that is implemented in derived types (e.g., CPUDoubleType), whereas a concrete one is implemented in the base type (Type). Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-12-15 13:50:32 -05:00
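The "DCE trick" from 6d72c82985 can be illustrated with a toy tracer. This is a sketch under simplifying assumptions, not the actual tracer: `TraceNode`, `Var`, `record`, and `dce` are hypothetical names, and the trace is just a Python list.

```python
class TraceNode:
    """One recorded operation; `inputs` are the nodes that produced its arguments."""
    def __init__(self, name, inputs):
        self.name = name
        self.inputs = inputs

class Var:
    """Toy variable that remembers which trace node produced it."""
    def __init__(self, producer=None):
        self.producer = producer

def record(trace, name, args):
    node = TraceNode(name, [a.producer for a in args if a.producer is not None])
    trace.append(node)
    return node

def subfoo(trace, x):
    return Var(record(trace, "subfoo", [x]))

def foo(trace, x):
    out = subfoo(trace, x)                     # the inner call still traces itself
    out.producer = record(trace, "foo", [x])   # override: the output now belongs to 'foo'
    return out

def dce(trace, outputs):
    """Keep only the nodes reachable from the producers of `outputs`."""
    live, stack = set(), [v.producer for v in outputs]
    while stack:
        node = stack.pop()
        if id(node) not in live:
            live.add(id(node))
            stack.extend(node.inputs)
    return [n for n in trace if id(n) in live]

trace = []
x = Var(record(trace, "input", []))
y = foo(trace, x)
pruned = dce(trace, [y])
```

After the override nothing refers to the 'subfoo' node any more, so dead-code elimination drops it and the pruned trace contains only 'input' and 'foo'.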
Edward Z. Yang	8a254a0271	Port batchnorm_double_backward to ATen. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-12-13 17:19:47 -05:00
Simeon Monov	3c709f5b26	Add HANDLE_TH_ERRORS for THPFunction_saved_variables and THPFunction_saved_tensors. SavedVariable.unpack() may throw std::runtime_error, which may lead to program termination with SIGABRT without the exception being handled in Python. Fixes #3860	2017-11-29 22:54:27 +01:00
Zachary DeVito	929a11f920	Add interpreter support for Handles/PythonOp/CppOp (#3866) This treats Handles as a first-class type in the interpreter since this turned out to be conceptually simpler than treating them as a separate concept, which requires a second channel for register allocating and moving data from one op to the next. Notes: * The refcounting nature of tensors is factored into its own base type so that it can be shared with other refcounted types such as handle. * Some methods redundant with TensorBase have been deleted from Tensor. * The interpreter uses raw refcounted handles. In addition to being able to treat Tensors and Handles as the same base object, it removes a lot of redundant refcounting as objects moved from tensors to input/output lists. * aten_dispatch has been updated to work directly on the raw refcounted lists to avoid refcounting and duplicate lists. * Removing jit_closure.cpp; the interpreter can now handle all pathways. * Functions like `unsafeToTensorShare` describe how ownership transfers in the interpreter. The `Steal` variants take rvalue references as arguments, and invalidate those arguments to prevent potential problems. * TensorTemporary is not a subtype of Tensor, because a subtype relationship makes it too easy to do something horribly unsafe:
```
void foo(at::Tensor bar) {
  // bar's destructor calls release on a temporary!
}
foo(TensorTemporary(retainable)); // structure slicing!
```
	2017-11-29 11:38:57 -05:00
Sam Gross	0a434ff685	Remove Function::is_executable (#3907) * Ensure that grad_fn is null if requires_grad is false. * Assert that grad_fn implies requires_grad=True	2017-11-28 18:29:27 -08:00
Adam Paszke	cf407213f9	Clean up stochastic function related dead code (#3782 )	2017-11-20 12:44:45 -05:00
Gregory Chanan	9a2b54e08b	[ATen] Rename isCuda -> is_cuda.	2017-11-15 18:33:07 -08:00
Sam Gross	b09d66e60d	Fix a reference cycle when in-place ops on views save the output (#3679) Previously, an in-place operation that saves its output (such as relu/threshold) would create a reference cycle when applied to a view. There were two cycles created: 1) The cycle base.grad_fn.fn.input_.base: base.grad_fn is a CopySlices, base.grad_fn.fn is ThresholdBackward, and base.grad_fn.fn.input_ is a SavedVariable with base pointing to base. 2) The cycle base.grad_fn.fn.input_.grad_fn.next_functions[0]: base.grad_fn.fn.input_.grad_fn is AsStridedBackward and next_functions[0] points to base.grad_fn. Generally, we avoid cycles because the AD graph is mostly immutable. Two notable exceptions are: a) Variable.grad_fn can change to point to a new grad_fn; b) SavedVariables in a function can be set after the function is created. The first case is not a problem if grad_fns do not hold strong references to Variables. Removing "base" from SavedVariable removes the strong ref. For the second case, we need to avoid saving the grad_fn of outputs. We were incorrectly saving the grad_fns of outputs when they were the result of in-place ops on views.	2017-11-15 15:19:41 -05:00
Zach DeVito	ef4b19f767	Refactor ir.h to distinguish Nodes and Values This commit adds a Value type similar to the one @ezyang suggested a while ago for handling multi-return nodes. Previously if we had a graph like: a = op1(b); c, d = op2(a) then its in-memory format would look like: %0 = op1(b); %1 = op2(%0); %2 = select(%1, 0); %3 = select(%1, 1). Select nodes were used only to handle the multi-output case. In the single-output case ops referred directly to their uses. This required special handling for the single- and multi- output cases, and was confusing when used with ONNX which distinguishes values (the inputs/outputs of a node) from the nodes themselves (e.g. a Conv). This commit adds the Node/Value distinction to the IR. In the example above, `a`, `b`, `c`, and `d` are now Value objects, while `op1` and `op2` are now Node objects. Inputs/Outputs to the graph are values. * Nodes now always have multiple outputs, accessible through their `output()` method. * Methods exist for adding/removing outputs from a node. * Nodes own their output Values, destroying a node destroys its outputs and it is only valid to destroy a node when no uses of its outputs remain. * Unlike select, Values do not appear in the nodes list. * The method `node()` on `Value` retrieves its defining node. Calling it is always valid. For inputs, its kind is "Param". Like "Return" there is a single Param node representing all inputs. * For single-output Nodes, the method `output()` retrieves the single output Value, asserting that the node is in-fact single output. * Functions are the same, but some functions like `type()` have moved to Value. * `replaceAllUsesWith` is now sanely defined for both Values and Nodes. In the case of Nodes, it replaces all outputs of the node with the outputs of the replacement node. * stage is defined both on Node/Value. This is because Inputs require a stage. * Apart from changing data types from Node->Value most passes remain the same. 
Things that previously assumed single-output nodes now have to call output() to get the node. * This removes the uses = [...] field in the outputs because it was getting confusing even before this commit when uses would refer to nodes, but we print the names of Values. The lint pass validates the use list, so printing it out seems less necessary.	2017-11-15 11:47:18 -08:00
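The Node/Value split described in ef4b19f767 can be sketched in Python as follows. This models only the relationships named in the commit message (nodes own their output values, `node()` returns the defining node, `output()` asserts a single output); the class layout, the `uses` bookkeeping, and `replace_all_uses_with` are illustrative assumptions, not the C++ code in ir.h.

```python
class Value:
    """An input or output of a node; knows its defining node and its uses."""
    def __init__(self, node):
        self._node = node
        self.uses = []          # list of (user_node, input_index)

    def node(self):
        return self._node

    def replace_all_uses_with(self, new_value):
        # Redirect every user of this value to new_value.
        for user, index in self.uses:
            user.inputs[index] = new_value
            new_value.uses.append((user, index))
        self.uses = []

class Node:
    """An operation; owns one Value per output."""
    def __init__(self, kind, inputs=(), num_outputs=1):
        self.kind = kind
        self.inputs = list(inputs)
        for index, value in enumerate(self.inputs):
            value.uses.append((self, index))
        self.outputs = [Value(self) for _ in range(num_outputs)]

    def output(self):
        assert len(self.outputs) == 1, "node is not single-output"
        return self.outputs[0]

# The graph from the commit message: a = op1(b); c, d = op2(a)
param = Node("Param")                  # single Param node stands for graph inputs
b = param.output()
op1 = Node("op1", [b])
a = op1.output()
op2 = Node("op2", [a], num_outputs=2)
c, d = op2.outputs
```

No select nodes are needed here: `c` and `d` are simply the two output values of `op2`, and `b.node()` is the Param node.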
Sam Gross	fde355f7d4	Allow in-place operations on views (#3384) Adds VariableViewImpl, a subclass of VariableImpl which has a pointer to the base Variable on which it is a view. In-place operations on views change the grad_fn of the base. Note that in-place operations only work on views that are the first output of the function that created them. All C++/ATen implemented functions have this behavior, but it's possible to write Python-implemented autograd functions that do not. In-place operations on these views will raise an exception. Fixes #3313	2017-11-06 18:19:56 -05:00
Sam Gross	3003ebe67a	Replace None grad_inputs with zero tensors in some cases (#3433) In Python-implemented autograd functions, we sometimes return None as the grad_input if the output is marked "non-differentiable". This replaces those None values with zero-filled Variables if the corresponding input has requires_grad=True. C++ implemented autograd functions expect the input (grad_outputs) to be defined if they're executed. They always return non-null grad_inputs if should_compute_output(i) is true. This could lead to segfaults if a subsequent Python-implemented function returned None. See #3412, #3241	2017-11-02 17:23:25 -04:00
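The None-to-zeros substitution in 3003ebe67a amounts to the following rule. This is a toy model that uses plain lists in place of tensors; `replace_none_grads` and the dictionary layout of `inputs` are hypothetical and serve only to show the condition under which a None gradient is filled in.

```python
def replace_none_grads(grad_inputs, inputs):
    """Return grad_inputs with None swapped for zeros where the input requires grad."""
    fixed = []
    for grad, inp in zip(grad_inputs, inputs):
        if grad is None and inp["requires_grad"]:
            # Zero-filled gradient with the same length as the input data.
            grad = [0.0] * len(inp["data"])
        fixed.append(grad)
    return fixed

inputs = [
    {"data": [1.0, 2.0, 3.0], "requires_grad": True},
    {"data": [4.0, 5.0], "requires_grad": False},
    {"data": [6.0], "requires_grad": True},
]
grads = replace_none_grads([None, None, [0.5]], inputs)
```

Only the first entry is replaced: the second input does not require a gradient, so its None is left alone, and the third already has a defined gradient.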
Adam Paszke	fa0f3cf98a	Re-enable and fix most JIT tests	2017-10-27 02:40:09 +05:30
Sam Gross	e970d35091	Make VariableVersion refcounting thread-safe (#3184) I've also made the version counter and the "live" reference count atomics. Note that it's not safe to set the version counter (operator=) from multiple threads, because shared_ptr assignment isn't thread safe. Currently, the only call sites to these functions are on newly created variables before they can be accessed from other threads. See #3111	2017-10-19 17:22:01 -04:00
Edward Z. Yang	66bb3d6dec	Remove incorrect comment that join_with is symmetric. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-10-13 01:31:22 +02:00
Adam Paszke	b6b41c829a	Add inplace checks in JIT	2017-10-03 10:20:58 -04:00
Adam Paszke	411e1469e0	Add tools for autograd profiling	2017-09-25 23:21:30 -04:00
Edward Z. Yang	c08395e290	Give a better error message when we hit a legacy function. We now include the type name of the legacy function implementing class. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-25 12:26:07 -04:00
Edward Z. Yang	6efd797376	Document unchecked invariant. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-20 12:24:27 -04:00
Edward Z. Yang	25c2b7d8b2	Some minor extra comments on python_function Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-20 12:24:27 -04:00
Adam Paszke	c536da7064	Remove TensorMeta	2017-09-19 10:53:32 -04:00
Adam Paszke	ba6e652c02	Add simple mode to Eval	2017-09-19 10:53:32 -04:00
Edward Z. Yang	1f80dd03bd	Track change of Variable from shared_ptr to ATen style tensor Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-19 10:53:32 -04:00
Adam Paszke	28828e033f	Make certain functions traceable	2017-09-19 10:53:32 -04:00
Adam Paszke	4d1ed4ec42	Assign traces before saving Variables	2017-09-19 10:53:32 -04:00
Adam Paszke	964b731af3	Try to handle NULL Variables in the tracer	2017-09-19 10:53:32 -04:00
Adam Paszke	ddd417faf0	Fix non-CUDA builds after Windows PRs (#2760)	2017-09-17 02:02:52 -04:00
Sam Gross	1290e586fb	Use at::Tensor based autograd Variable (#2676) Variable is now a subclass of at::Tensor backed by a VariableImpl* pImpl. The implementation of the ATen functions is defined in the auto-generated VariableType.h/cpp file. Currently, only functions which fall through to the base type, such as sizes() and isCuda(), are implemented. Differentiable ops like add() and mul() will be added in a subsequent PR.	2017-09-12 11:36:01 -04:00
Adam Paszke	965a349bbd	Record context edges in the JIT	2017-09-05 17:48:55 -04:00
Adam Paszke	9f97291408	Make tracer thread-safe	2017-09-05 17:48:55 -04:00
Adam Paszke	fa308b3183	Improve backward tracing	2017-09-05 17:48:55 -04:00
Zach DeVito	55cd9f37d1	remove Select, and NodeWithKind	2017-09-05 17:48:55 -04:00
Zach DeVito	24cdb897d6	starting removing nodes by removing Return	2017-09-05 17:48:55 -04:00
Zach DeVito	b037efa92c	prep for removing node subtypes	2017-09-05 17:48:55 -04:00
Adam Paszke	7f60a18293	Add initial support for backward tracing	2017-09-05 17:48:55 -04:00
Edward Z. Yang	4a1bbc01ac	Fix #41. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Edward Z. Yang	765b0bf137	Make in-place work again. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Adam Paszke	1c4538e017	Trace C functions	2017-09-05 17:48:55 -04:00
Adam Paszke	bdcbbeaf68	Remove GlobalTracingState	2017-09-05 17:48:55 -04:00
Edward Z. Yang	c931feaad0	Elaborate on NB a little	2017-09-05 17:48:55 -04:00
Adam Paszke	3e0f1608fe	Capture Variables that are not inputs as constants	2017-09-05 17:48:55 -04:00
Adam Paszke	af21c6b018	Add Node type to JIT IR Rewrite Type as a class hierarchy PR comments + rebase fixes	2017-09-05 17:48:55 -04:00
Adam Paszke	f270973937	Add JIT IR -> Autograd IR converter	2017-09-05 17:48:55 -04:00
Adam Paszke	9662cffd26	Use std::list in JIT IR	2017-09-05 17:48:55 -04:00
Adam Paszke	3dcbba1f35	Keep Variable mapping as part of TracingState	2017-09-05 17:48:55 -04:00
Adam Paszke	6be47ec907	Minor fixes and improvements	2017-09-05 17:48:55 -04:00
Adam Paszke	ea05ac8f41	Move JIT-related files to jit dir. Remove IR interpreter	2017-09-05 17:48:55 -04:00
Zach DeVito	1325fa511c	JIT IR including use-def chains and updated comments.	2017-09-05 17:48:55 -04:00
Zach DeVito	7c083b00f8	refcounting for Node/Value	2017-09-05 17:48:55 -04:00


89 Commits