pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Sam Gross	12229afd00	Record shape and type in autograd to validate gradients (#8168) The check that the gradient is defined is currently disabled because TestJit.test_ge_optimized will trigger the error.	2018-06-06 18:09:53 -04:00
Zachary DeVito	d985cf46f1	Add workaround to fix include warnings in Python 2 builds. (#6716)	2018-04-24 12:30:19 -07:00
Peter Goldsborough	702a7f3864	Improve Function interface (#5221) * Improve Function interface * Undo tracer changes * Fix bug in VariableType.set_history * Rename function_counter and sequence_number to sequence_nr * Clarify Function documentation * Replace swap_next_edges with next_edges() getter * Bring back set_gradient_edge * Simplify special.cpp * add_gradient_edge -> create_gradient_edge * Add mutable getters for pre/post hooks * Use make_variable with Edge * Remove remove_gradient_edge in favor of detach_ * Fix documentation and remove create_gradient_edge friend method * Canonicalize some includes	2018-02-21 16:37:52 -05:00
Peter Goldsborough	f38b6f611e	Replace NULL with nullptr in autograd (#5162)	2018-02-12 12:01:52 -08:00
Sam Gross	895aebac08	Use Variable instead of Tensor in Function.forward (#4786) The Tensor and Variable classes are being merged. autograd.Function.forward is now called on Variables, but with "no-grad" mode (torch.no_grad()) enabled. One benefit is that we no longer have to explicitly track shared storages.	2018-02-06 17:24:27 -05:00
Edward Z. Yang	8a254a0271	Port batchnorm_double_backward to ATen. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-12-13 17:19:47 -05:00
Adam Paszke	cf407213f9	Clean up stochastic function related dead code (#3782)	2017-11-20 12:44:45 -05:00
Sam Gross	3003ebe67a	Replace None grad_inputs with zero tensors in some cases (#3433) Replace None grad_inputs with zero tensors in some cases In Python-implemented autograd functions, we sometimes return None as the grad_input if the output is marked "non-differentiable". This replaces those None values with zero-filled Variables if the corresponding input has requires_grad=True. C++ implemented autograd functions expect the input (grad_outputs) to be defined if they're executed. They always return non-null grad_inputs if should_compute_output(i) is true. This could lead to segfaults if a subsequent Python-implemented function returned None. See #3412, #3241	2017-11-02 17:23:25 -04:00
Adam Paszke	28828e033f	Make certain functions traceable	2017-09-19 10:53:32 -04:00
Sam Gross	1290e586fb	Use at::Tensor based autograd Variable (#2676) Variable is now a subclass of at::Tensor backed by a VariableImpl* pImpl. The implementation of the ATen functions is defined in the auto-generated VariableType.h/cpp file. Currently, only functions which fall through to the base type, such as sizes() and isCuda() are implemented. Differentiable ops like add() and mul() will be added in a subsequent PR.	2017-09-12 11:36:01 -04:00
Adam Paszke	1c4538e017	Trace C functions	2017-09-05 17:48:55 -04:00
Adam Paszke	f270973937	Add JIT IR -> Autograd IR converter	2017-09-05 17:48:55 -04:00
Adam Paszke	6be47ec907	Minor fixes and improvements	2017-09-05 17:48:55 -04:00
Zach DeVito	1325fa511c	JIT IR including use-def chains and updated comments.	2017-09-05 17:48:55 -04:00
Zach DeVito	f369f8e80d	simplify IR	2017-09-05 17:48:55 -04:00
Edward Z. Yang	4979359800	Add graphs, trace them. It is not an /expression/ we trace, but it is a /graph/: that is, a closed expression which knows its parameters. Knowing the list of parameters is helpful and helps remove a hack when interpreting. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Edward Z. Yang	8ab905b769	Remove unused output_list. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Edward Z. Yang	a797ab9343	Rewrite AST to a new, more functional representation. Previously, our AST was a DAG, where shared Nodes indicated a computation should be reused. This commit rewrites the IR into a new functional representation which represents sharing explicitly using variable bindings. We offer a few justifications for this new style: 1. The new representation is not all that different from the old one; it is about as easy to construct, and the lack of an explicit graph doesn't negatively impact our ability to interpret the graph, since we've chosen, as a matter of design, to NOT have the IR participate in the actual execution of a graph. 2. The new let-binding representation has an implicit ordering, which we can use to conveniently keep track of the original order the trace showed up as. This automatically gives us a topsort, and gives us an easier to read textual representation of our IR: %14 = Embedding %11, %0, -1, None, 2, False, False %15 = Dropout %14, 0.2, True, False %16 = Index %12, 0 %17 = Index %12, 1 %18 = Index %13, 0 %19 = Index %13, 1 %20 = Index %15, 0 %21 = Linear %20, %1, %3 %22 = Linear %16, %2, %4 3. It moves us closer to a Futhark style language (http://futhark-lang.org/publications/pldi17.pdf). Major aspects of the diff - Node is replaced with Expr and Arg, a pair of mutually recursive structures which represent our new language. In BNF, the language looks like this: a ::= c | %i e ::= %i, ... = e | PyOp e, ... | Ret %i, ... Technically, Ret is not actually a return (no control flow is involved), it just tuples up a series of tensors (identified by variables). One important invariant is that locals are always tensors; they are never constants (this is asymmetric with Args.) - Arguments support Python constants. This is an important piece because many operators take extra Python literals like integers and tuples in order to specify extra parameters about how an operator operates. 
Adding this was essential to getting word_language_model to work. - As both Expr and Arg have multiple variants, there is new infrastructure for doing case on the variants using ExprVisitor and ArgVisitor. The strategy here is adapted from WebAssembly's visitors, although we have generalized to permit arbitrary argument forwarding, which is necessary to support tail-recursive visitor calls. TCO is important because our interpreter may recurse arbitrarily deep into a stack of nested lets. If users wish, they can also manually case on the type tag. - Tracing is now turned on and off using _tracer_enter/_tracer_exit in torch._C. _tracer_enter accepts a list of variables which are to be treated as arguments; _tracer_exit accepts the list of traced variables which should be returned when you reexecute the trace, and returns the trace expression which can be reexecuted. GlobalTracingState is a global variable which tracks whether or not we are tracing. - You use run_forward to execute a trace on some set of parameters. - When under tracing, each variable keeps track, via trace_local, of its name in the IR. Here is a simple runner which leaks memory but can be used to JIT models: import torch.autograd.function as F import torch._C def jit(model): import types real_forward = model.forward def forward(self, args): def flatten(x): return tuple(F._iter_variables(x)) if not hasattr(self, "saved_trace"): torch._C._tracer_enter(tuple(self.parameters()) + flatten(args)) out = real_forward(args) self.saved_trace = torch._C._tracer_exit(flatten(out)) self.saved_outs = out return out else: flat_out = Variable._execution_engine.run_forward(self.saved_trace, tuple(self.parameters()) + flatten(args)) return F._unflatten(flat_out, self.saved_outs) Major problems: - Sanity checking is spotty at best, especially when users pass in variables. - The interpreter leaks tensor memory from the store. 
When we add back def-use we should be able to deallocate tensors as soon as we know they are no longer necessary. - The interpreter needs to reach feature parity with the old execution engine. From there, we need to see if backwards can be subsumed as well. - I am still not confident that I have managed memory correctly everywhere. This requires a close look. - Rather than return an open expression as a trace, we should return a lambda instead, which knows how many formal parameters it requires. - The IR is not introspectable from Python at the moment, but this is simply a matter of implementing all the binding code. - The tracer is NOT reentrant (you can't trace while you're inside a trace). Furthermore, no sanity checking is done if you try to incorrectly reuse things from one trace in another. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Edward Z. Yang	1e8bf12b3a	Add an inefficient but working evaluator for forward traces. Simple test: import torch from torch.autograd import Variable import torch._C as _C x = Variable(torch.Tensor([4]), requires_grad=True) y = Variable(torch.Tensor([7]), requires_grad=True) z = x * y z.sum().backward() print(x.grad) print(y.grad) x.data[0] = 2 y.data[0] = 3 (z,) = z._execution_engine.run_forward((x, y), (z,)) z.sum().backward() print(x.grad) print(y.grad) Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Trevor Killeen	c304d04fc6	Replace thpp::Tensor with ATen Tensor in autograd csrc (#2170)	2017-07-28 10:18:37 -04:00
gchanan	925208af72	Implement BatchNorm double backwards (#2207) * Implement BatchNorm double backwards as a python function called directly from C++. This will be converted to C++ code once ATen is integrated with autograd. * Some performance improvements via inplace ops and reusing calculations.	2017-07-27 06:00:31 +05:30
Sam Gross	eba3dc8561	Fix gc_refs assertion failure (#1705) * Fix gc_refs assertion failure Ensure that each THPVariable -> THPFunction reference contributes one ref count to the THPFunction by creating a new shared_ptr for each ref. Because multiple shared_ptrs can again manage a single THPFunction, it's not safe to use std::weak_ptr where it may point to a PyFunction. It's still safe to use weak_ptr for grad_accumulator since these are never PyFunctions. Fixes #1626 * Remove stale comment	2017-06-02 21:08:50 -04:00
Edward Z. Yang	1f3ff5ced2	Miscellaneous documentation around autograd. (#1577) * Miscellaneous documentation around autograd. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-05-17 19:19:24 -04:00
Adam Paszke	5f15a9e0cb	Add a note about THPFunction_asFunction	2017-05-06 14:28:32 -07:00
Adam Paszke	72e8190994	Use at most one shared_ptr block at a time to manage THPFunctions (#1454) * Fix failing ln in build_all.sh * Use at most one shared_ptr block at a time to manage THPFunctions	2017-05-03 08:15:36 -04:00
Adam Paszke	20aa5b066f	Convert some of the functions to new format Also, fix a lot of issues that appeared after the previous commits.	2017-05-01 16:44:56 -04:00
Adam Paszke	de9998e198	Add support for the new Function format	2017-05-01 16:44:56 -04:00
Adam Paszke	2ca787fcf4	Refactor attribute names in autograd	2017-05-01 16:44:56 -04:00
Sam Gross	5073132837	Implement 'pre' and 'post' hooks at the C++ autograd level	2017-03-06 12:47:53 -08:00
Sam Gross	34ce58c909	Parallelize backwards	2017-03-03 11:26:00 -08:00
Sam Gross	bd5303010d	Refactor autograd package to separate Python dependencies. (#662) The core autograd Variable, Function, and Engine no longer depend on the Python API. This lets us implement functions in C++. In the future, we can also multithread the engine and release the GIL for most of the non-Python backwards.	2017-02-13 16:00:16 -08:00

31 Commits
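
Commit a797ab9343 describes a let-binding IR: arguments are either constants or locals (`a ::= c | %i`), each step binds one or more locals to the result of an operator, and `Ret` tuples up the final locals. Commit 1e8bf12b3a adds `run_forward` to re-execute a trace on new inputs. The toy interpreter below is a minimal pure-Python sketch of that idea, not PyTorch's code. The `Const`, `Local`, `Let`, and `Ret` classes, the `OPS` table, and this `run_forward` signature are all hypothetical, and plain Python values stand in for tensors. Because every binding comes after the bindings it reads, a single top-to-bottom pass is already a topological order, which is the property the commit message highlights.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple, Union


# Hypothetical argument forms, following "a ::= c | %i".
@dataclass(frozen=True)
class Const:
    value: Any  # a Python literal such as an int or a tuple


@dataclass(frozen=True)
class Local:
    index: int  # refers to the local %index


Arg = Union[Const, Local]


@dataclass(frozen=True)
class Let:
    outputs: Tuple[int, ...]  # locals %i, ... bound by this step
    op: str                   # operator name, looked up in OPS
    args: Tuple[Arg, ...]


@dataclass(frozen=True)
class Ret:
    results: Tuple[int, ...]  # locals to tuple up; no control flow involved


# Hypothetical operator table; every op returns a tuple of outputs.
OPS: Dict[str, Callable[..., Tuple[Any, ...]]] = {
    "Add": lambda x, y: (x + y,),
    "Mul": lambda x, y: (x * y,),
    "Split": lambda xs, i: (xs[:i], xs[i:]),
}


def run_forward(params: List[Any], body: List[Let], ret: Ret) -> Tuple[Any, ...]:
    """Bind params to %0..%n-1, evaluate each Let in order, return the Ret locals."""
    env: Dict[int, Any] = dict(enumerate(params))

    def eval_arg(a: Arg) -> Any:
        return a.value if isinstance(a, Const) else env[a.index]

    # Binding order is already a topological order, so one forward pass suffices.
    for step in body:
        values = OPS[step.op](*(eval_arg(a) for a in step.args))
        if len(values) != len(step.outputs):
            raise ValueError(f"{step.op} produced {len(values)} values, expected {len(step.outputs)}")
        for idx, value in zip(step.outputs, values):
            env[idx] = value
    return tuple(env[i] for i in ret.results)


# %2 = Mul %0, %1 ; %3 = Add %2, 1 ; Ret %3
prog = [Let((2,), "Mul", (Local(0), Local(1))),
        Let((3,), "Add", (Local(2), Const(1)))]
print(run_forward([4, 7], prog, Ret((3,))))  # (29,)
print(run_forward([2, 3], prog, Ret((3,))))  # (7,)
```

Re-running the same trace with new inputs does not require re-tracing, loosely mirroring the `x.data[0] = 2`, `y.data[0] = 3` re-execution in the 1e8bf12b3a test. The sketch walks the bindings in a loop rather than recursing into nested lets, so deep traces do not hit Python's recursion limit. The commit message handles the same concern in C++ by relying on tail calls in its visitors.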
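
Two autograd commits in the log describe related checks. Commit 12229afd00 records each input's shape and type so that gradients can be validated. Commit 3003ebe67a returns zero-filled gradients instead of None when the corresponding input has requires_grad=True. The code below sketches both checks together in pure Python under stated assumptions. It is not PyTorch's C++ implementation: `InputMeta`, `Grad`, `zeros_like_meta`, and `validate_grads` are hypothetical names, a `Grad` is a stand-in for a tensor (shape, dtype string, flat data), and the exact checks real autograd performs may differ.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class InputMeta:
    """Hypothetical record of what the forward pass saw for one input."""
    shape: Tuple[int, ...]
    dtype: str
    requires_grad: bool


@dataclass
class Grad:
    """Hypothetical stand-in for a tensor: shape, dtype, flat row-major data."""
    shape: Tuple[int, ...]
    dtype: str
    data: List[float]


def zeros_like_meta(meta: InputMeta) -> Grad:
    # prod(()) == 1, so a scalar shape () gets a single zero.
    return Grad(meta.shape, meta.dtype, [0.0] * prod(meta.shape))


def validate_grads(metas: Sequence[InputMeta],
                   grads: Sequence[Optional[Grad]]) -> List[Optional[Grad]]:
    """Check each gradient against its recorded input metadata."""
    if len(grads) != len(metas):
        raise ValueError(f"expected {len(metas)} gradients, got {len(grads)}")
    out: List[Optional[Grad]] = []
    for i, (meta, g) in enumerate(zip(metas, grads)):
        if g is None:
            # Downstream code may assume a defined gradient for inputs that
            # need one, so substitute zeros instead of passing None along.
            out.append(zeros_like_meta(meta) if meta.requires_grad else None)
            continue
        if g.shape != meta.shape:
            raise ValueError(f"grad {i}: shape {g.shape} does not match input shape {meta.shape}")
        if g.dtype != meta.dtype:
            raise ValueError(f"grad {i}: dtype {g.dtype} does not match input dtype {meta.dtype}")
        out.append(g)
    return out


metas = [InputMeta((2,), "float32", True), InputMeta((3,), "float32", False)]
fixed = validate_grads(metas, [None, None])
print(fixed[0].data, fixed[1])  # [0.0, 0.0] None
```

Substituting zeros at this boundary keeps downstream consumers from receiving None. In the 3003ebe67a message, those consumers are C++ functions that expect defined grad_outputs. Inputs that do not require grad keep None, so the sketch only allocates zeros where a gradient will actually be consumed.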