Commit Graph

50 Commits

Author SHA1 Message Date
Edward Yang
173f224570 Turn on F401: Unused import warning. (#18598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**

This was requested by someone at Facebook; this lint is turned
on for Facebook by default.  "Sure, why not."

I had to noqa a number of imports in __init__.  Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it.  Left for future work.

Be careful!  flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments.  flake8-3 will
report such an import as unused; flake8-2 will not.  For now, I just
noqa'd all these sites.
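
For illustration, a minimal sketch of the two suppression patterns described above (the imports and function here are stand-ins, not taken from the PR):

```python
# Imported only for the "# type:" comment below: flake8-3 flags it
# as unused (F401), flake8-2 does not, so the site gets a noqa.
from typing import List  # noqa: F401

# Stand-in for a re-export in an __init__.py, silenced with noqa
# instead of declaring __all__.
from os import path  # noqa: F401


def total(xs):
    # type: (List[int]) -> int
    return sum(xs)


print(total([1, 2, 3]))  # prints 6
```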

All the changes were done by hand.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14687478

fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
2019-03-30 09:01:17 -07:00
Xiang Gao
df614371c7 Mention Jacobian-vector product in the doc of torch.autograd (#15197)
Summary:
A friend of mine is learning deep learning and pytorch, and he is confused by the following piece of code from the tutorial https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#gradients :

```python
x = torch.randn(3, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:
    y = y * 2

print(y)

gradients = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(gradients)

print(x.grad)
```

He doesn't know where the following line comes from:
```python
gradients = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
```

What are we computing? Why don't we compute "the gradient of `y` w.r.t `x`"?

In the tutorial, it only says
> You can do many crazy things with autograd!

Which does not explain anything. It can be hard for beginners of deep learning to understand why we would ever run backward with an external gradient fed in, and what doing so means. So I modified the tutorial in https://github.com/pytorch/tutorials/pull/385
and the docstring correspondingly in this PR, explaining the Jacobian vector product. Please review this PR and https://github.com/pytorch/tutorials/pull/385 together.
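
For context, a minimal sketch of what that `gradients` argument does (the variable name `v` is ours): for a non-scalar `y`, `y.backward(v)` computes the product of `v` with the Jacobian `J = dy/dx` and accumulates it into `x.grad`, rather than "the gradient of `y` w.r.t `x`" itself:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2  # Jacobian J = dy/dx = 2 * I, so the product is 2 * v

v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)  # Jacobian-vector product, not dy/dx itself

print(x.grad)                         # tensor([0.2000, 2.0000, 0.0002])
print(torch.allclose(x.grad, 2 * v))  # True
```
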
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15197

Differential Revision: D13476513

Pulled By: soumith

fbshipit-source-id: bee62282e9ab72403247384e4063bcdf59d40c3c
2018-12-15 00:10:30 -08:00
albanD
78e3259bbe Add autograd automatic anomaly detection (#7677)
* add autograd automatic anomaly detection

* python 3 string support

* Fix non python build

* fix typo in doc

* better test and naming fix

* fix no python build and python object handling

* fix missing checks

* clean NO_PYTHON build

* Remove unwanted changes
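
For reference, a minimal usage sketch of the feature (the NaN-producing computation is ours), assuming the context-manager spelling `torch.autograd.detect_anomaly()` this PR introduces:

```python
import torch

x = torch.tensor([1.0, -1.0], requires_grad=True)

with torch.autograd.detect_anomaly():
    y = torch.sqrt(x).sum()  # sqrt(-1.0) produces a NaN in the forward pass
    try:
        y.backward()  # anomaly mode raises here, pointing at the forward op
    except RuntimeError as err:
        print(err)    # names the backward function that produced the NaN
```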
2018-06-11 21:26:17 -04:00
Tongzhou Wang
9af3a80cff Docs for gradcheck and gradgradcheck; expose gradgradcheck (#8166)
* Docs for gradcheck and gradgradcheck; expose gradgradcheck

* address comments
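
A minimal usage sketch of the two checkers (the function and inputs are ours): both compare analytical gradients against finite differences, and need double precision for the default tolerances:

```python
import torch
from torch.autograd import gradcheck, gradgradcheck

def f(x, w):
    return (x @ w).tanh()

x = torch.randn(4, 3, dtype=torch.double, requires_grad=True)
w = torch.randn(3, 2, dtype=torch.double, requires_grad=True)

print(gradcheck(f, (x, w)))      # checks first derivatives;  True
print(gradgradcheck(f, (x, w)))  # checks second derivatives; True
```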
2018-06-06 13:59:55 -04:00
Tongzhou Wang
1c01eabd3c Codemod to update our codebase to 0.4 standard (#6641)
* Codemod to update our codebase to 0.4 standard

* Update some of the test scripts

* remove Variable in test_clip_grad_value

* fix _symbolic_override_wrapper_maker
2018-04-17 22:06:54 -04:00
Tongzhou Wang
e01569afd7 Restore allow_unused functionality (#6553) 2018-04-12 21:30:42 +02:00
Priya Goyal
e3196e0ea8 [Re-checkpointing] Autograd container for trading compute for memory (#6467)
* Autograd container for trading compute for memory

* add a unit test for checkpoint

* address comments

* address review comments

* adding some docs for the checkpoint api

* more comments

* more comments

* repro bug

* Fix a subtle bug/apply some review comments

* Update checkpoint.py

* Run everything in grad mode

* fix flake and chunk=1

* use imperative backward as per discussion

* remove Variable and also add models and test for models

* Add a simple thread local variable to check for autograd grad mode

* remove models and models test after debugging

* address review comments

* address more comments

* address more comments
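
A minimal usage sketch of the resulting API (the model and sizes are ours), assuming the `torch.utils.checkpoint` spelling: the wrapped segment's activations are dropped during forward and recomputed during backward:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint, checkpoint_sequential

model = nn.Sequential(nn.Linear(128, 128), nn.ReLU(),
                      nn.Linear(128, 128), nn.ReLU())
x = torch.randn(32, 128, requires_grad=True)

y = checkpoint(model, x)                # recompute the whole segment
z = checkpoint_sequential(model, 2, x)  # split into 2 checkpointed chunks

(y.sum() + z.sum()).backward()
```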
2018-04-10 15:26:24 -04:00
Richard Zou
1449c9f754 Update autograd docs (#5907)
* Update autograd docs

* Deprecate 'grad_variables' in backward().

Advise to replace with 'grad_tensors'.

* Resolve saved_variables/saved_tensors

* Tensor section

* Address comments

* Address comments

* Address comments
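
A sketch of the renamed keyword (tensors are ours): the free function `torch.autograd.backward` now takes `grad_tensors` where `grad_variables` is the deprecated spelling:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x ** 2

v = torch.ones(3)
torch.autograd.backward([y], grad_tensors=[v])  # was: grad_variables=[v]
print(x.grad)  # equals 2 * x
```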
2018-03-30 15:33:11 -04:00
Thomas Viehmann
a33aeed1dc Add set_grad_enabled as context manager and function (#5555) 2018-03-09 11:36:56 +01:00
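
A sketch of the two spellings this commit adds (the tensor is ours): `torch.set_grad_enabled` works both as a context manager and as a plain function toggling the thread-local flag:

```python
import torch

x = torch.ones(3, requires_grad=True)

with torch.set_grad_enabled(False):  # context-manager form
    y = x * 2
print(y.requires_grad)               # False

torch.set_grad_enabled(False)        # function form
z = x * 2
print(z.requires_grad)               # False
torch.set_grad_enabled(True)         # restore the default
```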
Sam Gross
54b4cdeffa Replace all uses of 'Tensor or Variable' with 'Tensor' (#5508)
Replace all uses of 'Tensor or Variable'  and 'Variable or Tensor' with 'Tensor'
2018-03-02 14:26:11 -05:00
Edward Z. Yang
f064c5aa33 Expunge all occurrences of torch._C._VariableFunctions (#5525)
Some of the call-sites now look a little hokey with this
removed, saving that for another patch.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-02 12:19:44 -05:00
Sam Gross
70ba50c3d4 Remove some uses of torch.is_tensor in favor of isinstance (#5473) 2018-03-02 06:17:38 -05:00
gchanan
da894901ef Deprecate variable factory, use torch.tensor instead (#5476)
* Remove usages of torch.autograd.variable; use torch.tensor instead.

* Deprecate torch.autograd.variable.

* Remove unused sample_scalar.
2018-03-01 10:58:16 -05:00
gchanan
611c771fc8 Introduce torch.tensor (was torch.autograd.variable). (#5419)
* Introduce torch.tensor (was torch.autograd.variable).

* Get rid of torch.variable usages.

* Use more precise name.
2018-02-26 19:10:29 -05:00
gchanan
9390f7d3d6 Implement a (data-only) Variable factory (#4753)
* Implement a (data-only) Variable factory.

Implements a function, torch.autograd.variable, that is modeled after np.array.  The main difference between it and new() and
the tensor constructors is that it interprets a python number as data, i.e. as a 0-dimensional tensor (we currently don't expose
that at the pytorch level, so it will temporarily end up as a 1-dimensional tensor), rather than as a size.

The main difference currently between torch.autograd.variable and np.array is that torch.autograd.variable is stricter, e.g.
passing a PyFloat when an integral type is the default tensor type will result in an error; np.array basically lets anything
through (floating-point / integral mismatch, overflow, etc).  This is to keep it consistent with Variable.new when called with
a sequence, although we can loosen the checks later.

This will be renamed to torch.tensor once we merge Variable and tensor.

* Address review comments.
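
For illustration, the data-vs-size distinction described above, in the `torch.tensor` spelling the factory was eventually renamed to (once 0-dimensional tensors were exposed):

```python
import torch

print(torch.tensor(5).dim())  # 0: the number 5 is interpreted as data
print(torch.Tensor(5).shape)  # torch.Size([5]): here 5 is a size
print(torch.tensor([1, 2]))   # tensor([1, 2]): a sequence is data
```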
2018-01-22 18:14:22 -05:00
Adam Paszke
79d15c52cb Improve the engine support for functional graph execution (#4690)
Previously the side-effect free grad calculation was performed
using callbacks that could also override the decision to run a
function. However this had a few problems e.g. it forced us to iterate
over pretty much all functions in the graph and drop their buffers.

This patch improves the mechanism, by adding explicit support for this
kind of evaluation in execute(). It's safer, and the algorithm used to
decide which nodes have to be evaluated was replaced with a faster one.
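
For context, this side-effect-free path is what backs `torch.autograd.grad`: gradients are returned directly instead of accumulated into `.grad`, and only the subgraph needed for the requested inputs is evaluated. A sketch (tensors ours):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = (x ** 2).sum()

(gx,) = torch.autograd.grad(y, x)  # returns the gradient directly
print(gx)                          # equals 2 * x
print(x.grad)                      # None: x.grad was never touched
```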
2018-01-18 11:20:30 +01:00
Sam Gross
d605058212 Replace Variable.volatile with torch.no_grad() (#3970)
This removes volatile from Variable. The functionality is mostly
replaced by a global (thread-local) flag, which is controlled by
torch.set_grad_enabled() and the context manager torch.no_grad().

In C++, the flag is exposed through GradMode::is_enabled() and GradMode::set_enabled()

Fixes #3627
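
A sketch of the replacement idiom (the tensor is ours): instead of marking a Variable volatile, inference code wraps the computation in `torch.no_grad()`:

```python
import torch

x = torch.ones(3, requires_grad=True)

with torch.no_grad():   # replaces Variable(data, volatile=True)
    y = x * 2

print(y.requires_grad)  # False: no graph was recorded
```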
2017-12-18 15:46:13 -05:00
Adam Paszke
cf407213f9 Clean up stochastic function related dead code (#3782) 2017-11-20 12:44:45 -05:00
Ozan Çağlayan
dd6d04ddf2 doc: Normalize all true/false in docstrings to `True|False` (#3593)
* doc: Normalize all true/false in docstrings to ``True|False``

This makes them more apparent in the documentation.

* doc: fix flake8
2017-11-09 08:12:29 -05:00
Adam Paszke
411e1469e0 Add tools for autograd profiling 2017-09-25 23:21:30 -04:00
Adam Paszke
3128218397 Allow specifying unused inputs to torch.autograd.grad (#2859) 2017-09-25 14:42:33 -04:00
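
A sketch of the flag this commit adds (tensors ours): without `allow_unused=True`, requesting the gradient of an input the output does not depend on is an error; with it, that gradient comes back as `None`:

```python
import torch

x = torch.randn(3, requires_grad=True)
z = torch.randn(3, requires_grad=True)  # plays no role in y
y = (x * 2).sum()

gx, gz = torch.autograd.grad(y, (x, z), allow_unused=True)
print(gx)  # tensor([2., 2., 2.])
print(gz)  # None: y does not depend on z
```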
Adam Paszke
6be47ec907 Minor fixes and improvements 2017-09-05 17:48:55 -04:00
Edward Z. Yang
4979359800 Add graphs, trace them.
It is not an /expression/ we trace, but a /graph/: that is,
a closed expression which knows its parameters.  Knowing the list
of parameters is useful, and it removes a hack when interpreting.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
3055b69f63 Refactor Arg class away.
Although ANF style developments traditionally stratifies syntactic
classes into atomic (Arg) and complex (Expr) expressions, where
atomic expressions could be variables, constants or lambdas, Zach has
successfully convinced me that we should do away with the variant here and
always require arguments to be variables.  There are a few reasons for
this:

1) Tensor constants, not currently supported, could be modeled using a
"Constant" instruction, removing the need for them to be representable
directly inline.  An inline constant is marginally more convenient
for peephole optimizations, but since we have gone full ANF, we are going
to need to be able to see across def-uses in any case, and it is not
too much worse to need to handle constants this way.  By the way,
Swift Intermediate Language also made a similar choice, see
the slide on "Literal Instructions" in
http://llvm.org/devmtg/2015-10/slides/GroffLattner-SILHighLevelIR.pdf

2) Scalar constants, which are quite important for passing non-tensor
arguments to Python operators, are now stored out-of-band as NON
first-class values.  This more closely matches the ToffeeIR design,
and makes it clear what parameters are "first class" (tensors only)
and which ones are not.  However, we need to be able to unswizzle
the separate scalar/tensor lists into a unified list in the correct
format; this is what PyFunctionCConv is for.

Also, Locals got renamed into Tuple.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
a797ab9343 Rewrite AST to a new, more functional representation.
Previously, our AST was a DAG, where shared Nodes indicated a computation
should be reused.  This commit rewrites the IR into a new functional
representation which represents sharing explicitly using variable
bindings.

We offer a few justifications for this new style:

1. The new representation is not all that different from the
old one; it is about as easy to construct, and the lack of an
explicit graph doesn't negatively impact our ability to interpret
the graph, since we've chosen, as a matter of design, to NOT have
the IR participate in the actual execution of a graph.

2. The new let-binding representation has an implicit ordering,
which we can use to conveniently keep track of the original order
the trace showed up as.  This automatically gives us a topsort,
and gives us an easier to read textual representation of our
IR:

  %14 = Embedding %11, %0, -1, None, 2, False, False
  %15 = Dropout %14, 0.2, True, False
  %16 = Index %12, 0
  %17 = Index %12, 1
  %18 = Index %13, 0
  %19 = Index %13, 1
  %20 = Index %15, 0
  %21 = Linear %20, %1, %3
  %22 = Linear %16, %2, %4

3. It moves us closer to a Futhark style language
(http://futhark-lang.org/publications/pldi17.pdf).

Major aspects of the diff

- Node is replaced with Expr and Arg, a pair of mutually recursive
  structures which represent our new language.  In BNF, the language
  looks like this:

    a ::= c | %i
    e ::= %i, ... = e
        | PyOp e, ...
        | Ret %i, ...

  Technically, Ret is not actually a return (no control flow is involved);
  it just tuples up a series of tensors (identified by variables).

  One important invariant is that locals are always tensors; they
  are never constants (this is asymmetric with Args.)

- Arguments support Python constants.  This is an important piece because
  many operators take extra Python literals like integers and tuples in
  order to specify extra parameters about how an operator operates.  Adding
  this was essential to getting word_language_model to work.

- As both Expr and Arg have multiple variants, there is new infrastructure
  for doing case on the variants using ExprVisitor and ArgVisitor.  The
  strategy here is adapted from WebAssembly's visitors, although we have
  generalized to permit arbitrary argument forwarding, which is necessary
  to support tail-recursive visitor calls.  TCO is important because our
  interpreter may recurse arbitrarily deep into a stack of nested lets.
  If users wish, they can also manually case on the type tag.

- Tracing is now turned on and off using _tracer_enter/_tracer_exit in
  torch._C.  _tracer_enter accepts a list of variables which are to be
  treated as arguments; _tracer_exit accepts the list of traced variables
  which should be returned when you reexecute the trace, and returns
  the trace expression which can be reexecuted.  GlobalTracingState
  is a global variable which tracks whether or not we are tracing or not.

- You use run_forward to execute a trace on some set of parameters.

- When under tracing, variables keep track, via trace_local, what the
  name of their variables in the IR are.

Here is a simple runner which leaks memory but can be used to JIT models:

  import torch.autograd.function as F
  import torch._C
  from torch.autograd import Variable  # needed for run_forward below

  def jit(model):
      import types
      real_forward = model.forward
      def forward(self, *args):
          def flatten(x):
              return tuple(F._iter_variables(x))
          if not hasattr(self, "saved_trace"):
              # First call: record a trace of the real forward pass.
              torch._C._tracer_enter(tuple(self.parameters()) + flatten(args))
              out = real_forward(*args)
              self.saved_trace = torch._C._tracer_exit(flatten(out))
              self.saved_outs = out
              return out
          else:
              # Later calls: re-execute the saved trace on the new inputs.
              flat_out = Variable._execution_engine.run_forward(self.saved_trace, tuple(self.parameters()) + flatten(args))
              return F._unflatten(flat_out, self.saved_outs)
      # Patch the model so calls go through the tracing wrapper.
      model.forward = types.MethodType(forward, model)
      return model

Major problems:

- Sanity checking is spotty at best, especially when users pass in variables.

- The interpreter leaks tensor memory from the store.  When we add back def-use
  we should be able to deallocate tensors as soon as we know they are no longer
  necessary.

- The interpreter needs to reach feature parity with the old execution engine.
  From there, we need to see if backwards can be subsumed as well.

- I still have no confidence in having memory managed everything correctly.
  This requires a close look.

- Rather than return an *open* expression as a trace, we should return a
  *lambda* instead, which knows about how many formal parameters it
  requires.

- The IR is not introspectable from Python at the moment, but this is simply a
  matter of implementing all the binding code.

- The tracer is NOT reentrant (you can't trace while you're inside a trace.)
  Furthermore, no sanity checking is done if you try to incorrectly reuse
  things from one trace in another.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
e1b7872fc2 Make it possible to access IR from Python.
Also, add a new trace_fn field to attach forward IR to Variables.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Zhou Mo
2c07f88ea3 Fix typos. 2017-08-25 14:27:07 -04:00
Gregory Chanan
50c208a50b Revert "Fix typos."
This reverts commit 4622b33952.
2017-08-10 13:57:00 -04:00
Zhou Mo
4622b33952 Fix typos. 2017-08-08 11:05:38 -04:00
Adam Paszke
575a4a98e0 Remove assertions with side effects 2017-07-20 01:45:57 -04:00
albanD
849fb1f7e3 Fix when running with python -O (#2120) 2017-07-16 13:51:14 -04:00
albanD
3dbece7eb5 clean tests 2017-06-17 11:11:48 -04:00
albanD
f945fbc3dd add gradgradcheck and conv double backward tests 2017-06-17 11:11:48 -04:00
Adam Paszke
e7220380bc Add new flags to Variable.backward 2017-05-10 16:43:14 +02:00
Naim Kabir
93094294ba function backward attempted to multiply tuple by variables (#1459)
One-line fix: changed it to multiply the grad_variables by
len(variables) when grad_variables is None.
2017-05-03 13:12:21 -04:00
Adam Paszke
e1278d4ee2 Fix typo in autograd docs 2017-05-03 03:11:55 -07:00
Adam Paszke
5c7453447f Fix bugs, rename differentiate to grad, make it more flexible 2017-05-01 16:44:56 -04:00
Adam Paszke
e5db8f98be Add torch.autograd.differentiate 2017-05-01 16:44:56 -04:00
Adam Paszke
2ca787fcf4 Refactor attribute names in autograd 2017-05-01 16:44:56 -04:00
Sergey Zagoruyko
819d4b2b83 Add finite differences gradcheck (#851) 2017-02-26 08:35:24 -05:00
Luke Yeager
e7c1e6a8e3 [pep8] Fix most lint automatically with autopep8
Here's the command I used to invoke autopep8 (in parallel!):

    git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i

Several rules are ignored in setup.cfg. The goal is to let autopep8
handle everything which it can handle safely, and to disable any rules
which are tricky or controversial to address. We may want to come back
and re-enable some of these rules later, but I'm trying to make this
patch as safe as possible.

Also configures flake8 to match pep8's behavior.

Also configures TravisCI to check the whole project for lint.
2017-01-28 01:15:51 +01:00
Adam Paszke
a1fa995044 Fixes and improvements (#593)
* Fix error in ELU backward

* Add --seed flag for tests

* Add test for BatchNorm eval

* Fix autograd.backward docs

* Support cc flags in cuDNN search

* Fix IndexSelect backward formula
2017-01-25 22:21:49 -05:00
Adam Paszke
bc6a71b1f5 Add Function docs 2016-12-30 00:15:06 -05:00
Adam Paszke
26f1e2ca9c Add basic autograd docs 2016-12-30 00:15:06 -05:00
Adam Paszke
b140e70b58 Add autograd.backward (#341) 2016-12-26 19:10:35 -05:00
Adam Paszke
8a70067b92 Add support for stochastic functions in autograd (#294) 2016-12-16 13:14:37 +01:00
Adam Lerer
f88c3e9c12 fix some missing features in pytorch needed for RNNs 2016-10-23 20:23:48 -07:00
Adam Lerer
942ca477a6 Copying weights for CUDNN 2016-10-23 20:23:48 -07:00
Adam Paszke
0325e2f646 Major autograd refactor
Improves autograd performance by more than 2x and fixes a couple
of bugs. All core functions have been moved to C.
2016-10-13 17:17:49 -07:00
Adam Paszke
53f00ae429 Add autograd 2016-08-19 14:24:07 -07:00