Summary:
load_state_dict includes a recursive inner function `load` that captures
Tensors through the closed-over variable `state_dict`. Because it is
recursive, it also captures itself, creating a reference cycle.
This change breaks the reference cycle so that any Tensors in `state_dict`
can be collected immediately instead of waiting until the next GC cycle.
Alternatively, we could have passed `state_dict` and `metadata` as
arguments to `load` to prevent the capture of Tensors. (That would still
result in cyclic garbage, just not cyclic garbage containing Tensors.)
See:
https://github.com/pytorch/pytorch/issues/20199#issuecomment-491089004
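For illustration, a minimal sketch of the pattern and the fix (the per-parameter copy loop is a stand-in for the real per-module loading logic, not the actual implementation):
```
import torch.nn as nn

# The recursive inner function closes over `state_dict` and over itself, so the
# closure forms a reference cycle that keeps the Tensors alive until the next GC pass.
def load_state_dict_sketch(module, state_dict):
    def load(m, prefix=''):
        for name, param in m._parameters.items():
            key = prefix + name
            if param is not None and key in state_dict:
                param.data.copy_(state_dict[key])
        for name, child in m._modules.items():
            if child is not None:
                load(child, prefix + name + '.')   # `load` captures itself

    load(module)
    # Break the cycle so the closure (and the captured Tensors) is freed by
    # reference counting instead of waiting for the garbage collector.
    del load


load_state_dict_sketch(nn.Linear(3, 4), nn.Linear(3, 4).state_dict())
```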
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20397
Differential Revision: D15414834
Pulled By: colesbury
fbshipit-source-id: 4c2275a08b2d8043deb3779db28be03bda15872d
Summary:
Added the ">>>" Python interpreter prompt (three greater-than symbols) so that the edited lines appear as code, not comments/output, in the documentation. Normally the interpreter would display "..." when expecting a block, but I'm not sure how that would render on the PyTorch docs website. Other code examples use the ">>>" prompt as well, so I used it here too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19347
Differential Revision: D14986154
Pulled By: soumith
fbshipit-source-id: 8f4d07d71ff7777b46c459837f350eb0a1f17e84
Summary:
Return `missing_keys` and `unexpected_keys` from `load_state_dict` so that the user can handle them when strict mode is off; also removed an unused variable.
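A brief usage sketch of the returned keys (field names follow the summary above; shown with strict mode off):
```
import torch.nn as nn

model = nn.Linear(3, 4)
state = model.state_dict()
state['extra'] = state['bias'].clone()   # simulate a stale key in a checkpoint
del state['bias']                        # simulate a missing key

# With strict=False the mismatch no longer has to be ignored silently;
# the caller can inspect what did not match.
result = model.load_state_dict(state, strict=False)
print(result.missing_keys)      # ['bias']
print(result.unexpected_keys)   # ['extra']
```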
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18668
Differential Revision: D14782073
Pulled By: ezyang
fbshipit-source-id: ab3b855eb77bb7422594d971988067e86eef20f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17618
Based on the code, we only add keys to `missing_keys` and `unexpected_keys` if `strict` is `True`. The documentation is confusing.
This diff also fixes one flake8 warning.
Reviewed By: ailzhang
Differential Revision: D14280593
fbshipit-source-id: d368f5596bdf74ff62ee4d28d79120f5af91e0a3
Summary:
Previously this would fail with the error message:
```
ValueError: Auto nesting doesn't know how to process an input object of type dict. Accepted types: Tensors, or lists/tuples of them
```
It turns out we're not using the line that causes this error (or any side effect of that line), so removing it fixes the issue. Also cleaned up some related dead code (cc apaszke to make sure the code isn't useful in some way).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16616
Differential Revision: D13908352
Pulled By: suo
fbshipit-source-id: 27094f1f4ea0af215b901f7ed3520e94fbc587b3
Summary:
without this "if", code below will throw error " Linear' object has no attribute '_buffers' "
And with this if, error would be "cannot assign buffer before Module.\_\_init\_\_() call", which I think it's more accurate, just like register_parameter.
```
import math
import torch
from torch.nn.parameter import Parameter
from torch.nn import functional as F
from torch.nn import Module


class Linear(Module):
    def __init__(self, in_features, out_features, bias=True):
        self.in_features = in_features
        self.out_features = out_features
        self.register_buffer('test', torch.Tensor(out_features, in_features))
        self.weight = Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = Parameter(torch.Tensor(out_features))
        else:
            self.register_parameter('bias', None)
        super(Linear, self).__init__()
        self.reset_parameters()

    def reset_parameters(self):
        stdv = 1. / math.sqrt(self.weight.size(1))
        self.weight.data.uniform_(-stdv, stdv)
        if self.bias is not None:
            self.bias.data.uniform_(-stdv, stdv)

    def forward(self, input):
        return F.linear(input, self.weight, self.bias)

    def extra_repr(self):
        return 'in_features={}, out_features={}, bias={}'.format(
            self.in_features, self.out_features, self.bias is not None
        )


linear = Linear(3, 4)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16110
Differential Revision: D13715839
Pulled By: soumith
fbshipit-source-id: c300eff0a8655aade448354cf489a592f7db722a
Summary:
Fixes an issue that arose from https://github.com/pytorch/pytorch/pull/13481 where `.share_memory()` couldn't be called. Effectively undoes all changes to `nn.Module` from that PR and solves the relevant problem in a different way (the goal was to be able to call `._apply()` on the Python wrapper for a C++ module).
soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15305
Differential Revision: D13493937
Pulled By: goldsborough
fbshipit-source-id: 4cb8687f90fc8709a536c5e7eacd0dc8edf6f750
Summary:
This PR enables C++ frontend modules to be bound into Python and added as submodules of Python modules. For this, I added lots of pybind11 bindings for the `torch::nn::Module` class, and modified the `torch.nn.Module` class in Python to have a new metaclass that makes `isinstance(m, torch.nn.Module)` return true when `m` is a C++ frontend module. The methods and fields of C++ modules are bound in such a way that they work seamlessly as submodules of Python modules for most operations (one exception I know of: calling `.to()` ends up calling `.apply()` on each submodule with a Python lambda, which cannot be used in C++ -- this may require small changes on the Python side).
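A rough sketch of the metaclass idea (`CppModuleBase` is a placeholder for the pybind11-bound base class, not the real name):
```
# `CppModuleBase` stands in for the pybind11 binding of torch::nn::Module.
class CppModuleBase(object):
    pass


class ModuleMeta(type):
    def __instancecheck__(cls, instance):
        # isinstance(m, Module) should also hold for bound C++ frontend modules.
        return isinstance(instance, CppModuleBase) or super().__instancecheck__(instance)


class Module(metaclass=ModuleMeta):
    pass


assert isinstance(CppModuleBase(), Module)   # True without Python-level inheritance
```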
I've added quite a few tests to verify the bindings and equality with Python. I think I should also try adding a C++ module as part of some large PyTorch module, like a WLM or something, and see if everything works smoothly.
The next step for inter-op across our system is ScriptModule <-> C++ Frontend Module inter-op. I think this will then also allow using C++ frontend modules from TorchScript.
apaszke zdevito
CC dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13481
Differential Revision: D12981996
Pulled By: goldsborough
fbshipit-source-id: 147370d3596ebb0e94c82cec92993a148fee50a7
Summary:
Add to the Tensor doc info about `.device`, `.is_cuda`, `.requires_grad`, `.is_leaf` and `.grad`.
Update the `register_backward_hook` doc with a warning stating that it does not work in all cases.
Add support in the `_add_docstr` function to add docstring to attributes.
There is an explicit cast here, but I am not sure how to handle it properly. The doc field for getsetdescr is documented as a `const char *` (like all other doc fields in descriptor objects) in the CPython online documentation, but in the code it is the only one that is not const.
I assumed this is a bug in the code, since it does not follow the documentation or the convention of the other descriptors, so I cast away the const.
EDIT: the online doc I was looking at is for 3.7, and in that version both the code and the doc are const. For older versions, both are non-const.
Please let me know if this should not be done, and, if it should, whether there is a cleaner way to do it!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14339
Differential Revision: D13243266
Pulled By: ezyang
fbshipit-source-id: 75b7838f7cd6c8dc72b0c61950e7a971baefaeeb
Summary:
Problems with SN and DP after #12671:
1. In eval mode, `weight_orig` does not get the correct gradient (#12737).
Fix: keep the `v` vector around as a buffer and always calculate `W = W_orig / (u @ W_orig @ v)`, even in eval.
2. In training mode, the `weight` buffer of the parallelized module is never updated if someone touches `weight_orig` and/or `weight` and makes them stop sharing storage, so in `eval` the weight used is wrong.
Fix: Make `weight` not a buffer anymore and always calculate it as above.
3. #12671 changed SN to update `u` in-place to make DP work correctly, but that breaks backward through two forwards (e.g., the common GAN loss `D(real) - D(fake)`) because the vectors needed to backprop through the 1st forward are changed in the 2nd forward.
Fix: This PR clones `u` and `v` before using them (see the sketch below).
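A minimal sketch of the weight computation described in these fixes, assuming `weight_orig`, `u` and `v` name the stored tensors as in the description (not the actual spectral_norm internals):
```
import torch

def compute_weight(weight_orig, u, v):
    # Clone so that backward through two forwards (e.g. D(real) - D(fake))
    # does not see in-place updates of the power-iteration vectors.
    u, v = u.clone(), v.clone()
    w_mat = weight_orig.flatten(1)              # view the weight as a 2-D matrix
    sigma = torch.dot(u, torch.mv(w_mat, v))    # sigma = u @ W_orig @ v
    return weight_orig / sigma                  # W = W_orig / sigma, even in eval
```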
To maintain BC, I added a hook interface for producing and loading state_dict. This is ugly and we should really have a better interface for spectral_norm, but for the purpose of fixing this issue I am making this patch. Even with a better interface, a BC mechanism for loading legacy state_dicts would still be needed.
cc crcrpar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13350
Differential Revision: D12931044
Pulled By: SsnL
fbshipit-source-id: 8be6f934eaa62414d76d2c644dedd7e1b7eb31ef
Summary:
I spent a couple of minutes trying to understand which shape corresponds to the checkpoint and which one to the model.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12870
Differential Revision: D10466600
Pulled By: SsnL
fbshipit-source-id: 3b68530b1b756462a2acd59e3a033ff633567a6b
Summary:
In the state dict loading code, the error message referring to the shape of the loaded parameters and the parameters in the initialised model had the format arguments in the wrong order. Swapped them round to fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11200
Differential Revision: D9631160
Pulled By: SsnL
fbshipit-source-id: 03d9446303bd417fef67027b10d7a27de06486be
Summary:
This commit adds the ``buffers()`` and ``named_buffers()`` methods as
analogues of ``parameters()`` and ``named_parameters()``.
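A brief usage sketch of the new methods (mirroring ``parameters()`` / ``named_parameters()``):
```
import torch.nn as nn

bn = nn.BatchNorm1d(4)

# named_buffers() yields (name, tensor) pairs; buffers() yields only the tensors.
for name, buf in bn.named_buffers():
    print(name, tuple(buf.shape))       # e.g. running_mean (4,), running_var (4,)
n_buffer_elements = sum(buf.numel() for buf in bn.buffers())
```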
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10554
Reviewed By: SsnL
Differential Revision: D9367762
Pulled By: jma127
fbshipit-source-id: f2042e46a7e833dce40cb41681dbd80d7885c74e
Summary:
This PR fixes #9743.
Adds backward-compatibility support for loading a checkpoint from 0.3.* with 1-dim tensors; they are now 0-dim tensors in 0.4+.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9781
Differential Revision: D8988196
Pulled By: ailzhang
fbshipit-source-id: a7a1bc771d597394208430575d5a4d23b9653fef
Summary:
As in the title. Lets us simplify a lot of code.
Depends on #9363, so please review only the last commit.
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9414
Reviewed By: zdevito
Differential Revision: D8836496
Pulled By: apaszke
fbshipit-source-id: 9b3c3d1f001a9dc522f8478abc005b6b86cfa3e3
* Add non_blocking to Tensor/Module.to
* flake8
* Add argparse tests
* cpp parse
* Use C++ parser
* use a common parse function with Tensor.to
* fix test_jit
* use THPObjectPtr
* increase refcount for None, True, and False
* address comments
* address comments
* Add version counter to module, change load_state_dict to use load_local_state_dict, which does class-specific loading
* Clarifies version number in docs
* fix jit tests
* fix state_dict tests
* typo
* fix ddp
* exclude version numbers from state dict entries
* Fix jit test and empty modules
* address comments
* test for "."
* revert the private version change in state_dict
* make IN case a hard error
* fix not reporting an error on an unexpected submodule
* address comments
* disallow empty string in name and remove trailing dot
* Codemod to update our codebase to 0.4 standard
* Update some of the test scripts
* remove Variable in test_clip_grad_value
* fix _symbolic_override_wrapper_maker
This PR enables users to print extra information for their subclassed nn.Module.
For now I simply insert the user-defined string at the end of the module name, which should be discussed in this PR.
Before this PR, users had to redefine __repr__ and copy-paste the source code from Module. A usage sketch follows the change list below.
* Add support for extra information on Module
* Rewrite the repr method of Module
* Fix flake8
* Change the __repr__ to get_extra_repr in Linear
* Fix extra new-line for empty line
* Add test for __repr__ method
* Fix bug of block string indent
* Add indent for multi-line repr test.
* Address review comments
* Update tutorial for creating nn.Module
* Fix flake8, add extra_repr of bilinear
* Refactor DropoutNd
* Change to extra_repr in some Modules
* Fix flake8
* Refactor padding modules
* Refactor pooling module
* Fix typo
* Change to extra_repr
* Fix bug for GroupNorm
* Fix bug for LayerNorm
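A brief usage sketch of the resulting API, assuming the final method name is extra_repr (as in the later items above):
```
import torch.nn as nn

class Scale(nn.Module):
    def __init__(self, factor):
        super(Scale, self).__init__()
        self.factor = factor

    def forward(self, x):
        return x * self.factor

    def extra_repr(self):
        # Extra information shown inside the parentheses of repr(module),
        # instead of overriding __repr__ and copying Module's source.
        return 'factor={}'.format(self.factor)

print(Scale(2.0))   # Scale(factor=2.0)
```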
Previously, it was being implicitly imported via the import of
torch.onnx.
This is no longer the case, and it is a hacky thing to depend on anyway,
so import it explicitly.
* Improve documentation
1. Add formula for erf, erfinv
2. Make exp, expm1 similar to log, log1p
3. Symbol change in ge, le, ne, isnan
* Fix minor nit in the docstring
* More doc improvements
1. Added some formulae
2. Complete scanning till "Other Operations" in Tensor docs
* Add more changes
1. Modify all torch.Tensor wherever required
* Fix Conv docs
1. Fix minor nits in the references for LAPACK routines
* Improve Pooling docs
1. Fix lint error
* Improve docs for RNN, Normalization and Padding
1. Fix flake8 error for pooling
* Final fixes for torch.nn.* docs.
1. Improve Loss Function documentation
2. Improve Vision Layers documentation
* Fix lint error
* Improve docstrings in torch.nn.init
* Fix lint error
* Fix minor error in torch.nn.init.sparse
* Fix Activation and Utils Docs
1. Fix Math Errors
2. Add explicit clean to Makefile in docs to prevent running graph generation script
while cleaning
3. Fix utils docs
* Make PYCMD a Makefile argument, clear up prints in the build_activation_images.py
* Fix batch norm doc error
The nn.* counterpart of #5443. Mostly removed the Variable wrapper. Also added docs for nn.RReLU.
Note that torch.randn(*, requires_grad=True) isn't documented until #5462 is done.
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.
To keep the PR at a reasonable size, I've left most of the unused tensor
code. Subsequent PRs will remove the dead code, clean up calls to
torch.autograd.Variable, and rename Variable to Tensor everywhere.
There are some breaking changes because Variable and Tensors had
slightly different semantics. There's a list of those changes here:
https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
This removes volatile from Variable. The functionality is mostly
replaced by a global (thread-local) flag, which is controlled by
torch.set_grad_enabled() and the context manager torch.no_grad().
In C++, the flag is exposed through GradMode::is_enabled() and GradMode::set_enabled()
Fixes #3627
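A brief usage sketch of the replacement flag, using the interfaces named above:
```
import torch

x = torch.ones(2, 2, requires_grad=True)

with torch.no_grad():            # context manager replaces volatile=True
    y = x * 2
print(y.requires_grad)           # False

torch.set_grad_enabled(False)    # thread-local global switch
z = x * 2
print(z.requires_grad)           # False
torch.set_grad_enabled(True)
```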
* Avoid casting integer params and buffers to float(), double() and half()
* Add test for immune integer buffers
* Fix documentation for float(), double() and half()
* Fix test
* made it explicit in the docstring of Module.register_forward_hook() that the hook(s) will be called AFTER calling forward().
* added "every time" in docstring of Module.register_forward_pre_hook()
* Add weight normalization implementation
This adds forward "pre-hooks", which get called before the module's
forward() method. Weight norm is implemented as a hook that recalculates
the weight variable from weight_g and weight_v on every iteration (see the
sketch after this list).
Based on @rtqichen's implementation.
* Specify return type
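A rough sketch of the pre-hook mechanism described above; this is not the torch.nn.utils.weight_norm code, and the manual weight_g / weight_v setup is purely illustrative:
```
import torch
import torch.nn as nn

def recompute_weight(module, inputs):
    # Runs before every forward(): rebuild `weight` from magnitude and direction.
    g, v = module.weight_g, module.weight_v
    module.weight = g * v / v.norm()

linear = nn.Linear(3, 4)
del linear._parameters['weight']       # hand ownership of `weight` to the hook
linear.weight_g = torch.ones(())       # magnitude (a single scalar here)
linear.weight_v = torch.randn(4, 3)    # direction
linear.register_forward_pre_hook(recompute_weight)

out = linear(torch.randn(2, 3))        # pre-hook fires first, then forward()
```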
A module that returns a non-standard data structure currently breaks
due to checks for backward hooks. This refactors the code slightly so
that it will only break when backward hooks are actually present.
We were keying hooks by RemovableHandle id. However, we don't hold onto
handles, and ids of dead objects can be reused. This replaces id(handle)
with a global counter.
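A minimal sketch of the global-counter idea, using a simplified handle class rather than the actual torch.utils.hooks implementation:
```
import itertools

class RemovableHandle(object):
    _next_id = itertools.count()   # monotonically increasing, never reused

    def __init__(self, hooks_dict):
        self.hooks_dict = hooks_dict
        self.id = next(RemovableHandle._next_id)

    def remove(self):
        # Safe even if the hook was already removed.
        self.hooks_dict.pop(self.id, None)

hooks = {}
handle = RemovableHandle(hooks)
hooks[handle.id] = lambda *args: None   # register keyed by the counter value
handle.remove()                         # unregister without relying on id(handle)
```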
The core autograd Variable, Function, and Engine no longer depend on the
Python API. This lets us implement functions in C++. In the future, we
can also multithread the engine and release the GIL for most of the
non-Python backwards.
Here's the command I used to invoke autopep8 (in parallel!):
git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i
Several rules are ignored in setup.cfg. The goal is to let autopep8
handle everything which it can handle safely, and to disable any rules
which are tricky or controversial to address. We may want to come back
and re-enable some of these rules later, but I'm trying to make this
patch as safe as possible.
Also configures flake8 to match pep8's behavior.
Also configures TravisCI to check the whole project for lint.
The load_state_dict() function now raises an error if the argument
state_dict has extra keys or is missing keys.
Previously, load_state_dict() ignored extra and missing keys, which made
it hard to notice when you load an invalid state_dict. This could
happen, for example, if you save the state_dict for a DataParallel, but
load it into a single model.
The state_dict() function now only includes the Tensor data from the
parameters, which reduces checkpoint size by not saving gradients.
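A short sketch of the DataParallel mismatch mentioned above (the exact exception type may differ between versions):
```
import torch.nn as nn

wrapped = nn.DataParallel(nn.Linear(3, 4))
state = wrapped.state_dict()     # keys are prefixed: "module.weight", "module.bias"

try:
    # Loading into a bare model now raises instead of silently ignoring
    # the missing and unexpected keys.
    nn.Linear(3, 4).load_state_dict(state)
except (RuntimeError, KeyError) as e:
    print(e)
```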
The register hook calls now return an object that can be used to remove
the hook. For example,
>>> h = module.register_forward_hook(callback)
>>> h.remove() # removes hook
Or as a context manager:
>>> with module.register_forward_hook(callback):
... pass
This makes it easier for libraries to use hooks without worrying about
name collisions.
This hooks into the (internal) ForkingPickler class in multiprocessing
to reduce tensors, storages, and CUDA events instead of our queue from
joblib. This makes it easier to use the standard multiprocessing classes
in later versions of Python.
This also exposes:
- Tensor/Storage.share_memory_()
- Module.share_memory()
These methods move the CPU tensors and storages to shared memory. If
you're using the "fork" method of multiprocessing, these objects can be
directly inherited instead of serialized through a queue.
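A brief usage sketch of the exposed methods listed above:
```
import torch
import torch.nn as nn

t = torch.zeros(3)
t.share_memory_()            # move the underlying storage to shared memory
print(t.is_shared())         # True

net = nn.Linear(3, 1)
net.share_memory()           # same, for all of the module's parameters and buffers
# With the "fork" start method, child processes inherit these objects directly
# instead of receiving them through a serialized queue.
```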
Uses the assignment syntax to get deterministic ordering of parameters.
The ordering of parameters using the constructor syntax is
non-deterministic because kwargs use dict() in Python 3.5 and earlier.
modules(): returns an iterator over all modules in the network
children(): returns an iterator over immediate children
Also fix __getitem__ in Sequential
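A brief usage sketch of the two iterators and of Sequential indexing:
```
import torch.nn as nn

net = nn.Sequential(nn.Linear(3, 4), nn.Sequential(nn.ReLU(), nn.Linear(4, 2)))

print(len(list(net.modules())))    # 5: net itself, Linear, inner Sequential, ReLU, Linear
print(len(list(net.children())))   # 2: only the immediate children
print(net[0])                      # first child: the Linear(3, 4) module
```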
* _forward is renamed forward since users should override it
* some __call__ overrides are changed to forward
* functions which return a single variable are changed to return that
variable instead of a one-element tuple