I see https://github.com/pytorch/pytorch/issues/53103 says this might be problematic, but I'm a bit confused at this point, because it looks like ModuleList does in fact already adhere to the Sequence API.
The big win here is that for homogeneous ModuleLists, you now get typing for individual members, e.g.
`ModuleList([Linear(), Linear(), Linear()])[1]` properly has type `Linear`.
If this looks good, I can do a follow-up PR to do the same for `ModuleDict` and `Parameter[List,Dict]`.
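As a rough illustration of what per-member typing means, here is a minimal sketch of a generic container with `Sequence`-style `__getitem__` overloads. `TypedList` and `Layer` are hypothetical stand-ins invented for this example; this is not the `ModuleList` code from the PR.

```python
from typing import Generic, Iterable, Iterator, List, TypeVar, Union, overload

T = TypeVar("T")


class Layer:
    """Hypothetical stand-in for a module such as Linear."""

    def __init__(self, name: str) -> None:
        self.name = name


class TypedList(Generic[T]):
    """Toy container: an int index yields T, a slice yields TypedList[T]."""

    def __init__(self, items: Iterable[T] = ()) -> None:
        self._items: List[T] = list(items)

    @overload
    def __getitem__(self, idx: int) -> T: ...

    @overload
    def __getitem__(self, idx: slice) -> "TypedList[T]": ...

    def __getitem__(self, idx: Union[int, slice]) -> Union[T, "TypedList[T]"]:
        if isinstance(idx, slice):
            return TypedList(self._items[idx])
        return self._items[idx]

    def __len__(self) -> int:
        return len(self._items)

    def __iter__(self) -> Iterator[T]:
        return iter(self._items)


layers = TypedList([Layer("a"), Layer("b"), Layer("c")])
second = layers[1]   # a type checker should infer Layer here
tail = layers[1:]    # and TypedList[Layer] here
```

With a homogeneous list, attribute access such as `second.name` can then be checked statically instead of falling back to an untyped base class.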
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89135
Approved by: https://github.com/albanD
## Problem
When models have a lot of complex repeated layers, `print(module)` output becomes unwieldy. For example, the current `__repr__` output for `t5-small` is `715` lines long.
## Solution
With the better `__repr__` it becomes `135` lines. For `t5-large`, the current `__repr__` prints `1411` lines; the better `__repr__` prints `135`, the same number as for `t5-small`, because most of the layers are just repeated. For `EleutherAI/gpt-j-6B` the number of lines drops from `483` to just `24`.
Here's how it works: when consecutive ModuleList items have exactly the same `__repr__`, instead of printing each of them it prints `f"{N} x {repr(item)}"`. The current code supports cases where the same ModuleList has multiple runs of repeating items, which is especially useful when the first/last layer of a block is different from the rest of them.
Better `__repr__` should make model prints smaller, more beautiful and significantly more useful by highlighting the difference between repeated blocks instead of losing it in a wall of text.
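The collapsing described above can be sketched in plain Python. This is a simplified illustration built on `itertools.groupby`, not the code from this PR; `collapse_repeats` is a hypothetical helper that works on a list of already-computed repr strings.

```python
from itertools import groupby
from typing import List


def collapse_repeats(reprs: List[str]) -> List[str]:
    """Collapse runs of consecutive identical repr strings into 'N x repr' lines."""
    lines = []
    index = 0
    for text, group in groupby(reprs):
        count = len(list(group))
        if count == 1:
            # A lone item keeps the usual "(index): repr" form
            lines.append(f"({index}): {text}")
        else:
            # A run of identical items becomes "(start-end): N x repr"
            lines.append(f"({index}-{index + count - 1}): {count} x {text}")
        index += count
    return lines


summary = collapse_repeats(["First()", "Block()", "Block()", "Block()", "Last()"])
```

Because only consecutive duplicates are merged, a distinct first or last layer stays visible on its own line.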
## Motivating real-life example
You can try it out in this [colab notebook](https://colab.research.google.com/drive/1PscpX_K1UemIDotl2raC4QMy_pTqDq7p?usp=sharing).
The current `__repr__` output of gpt-j-6b is too big to include in full in this PR description:
```
GPTJModel(
  (wte): Embedding(50400, 4096)
  (drop): Dropout(p=0.0, inplace=False)
  (h): ModuleList(
    (0): GPTJBlock(
      (ln_1): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
      (attn): GPTJAttention(
        (attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_dropout): Dropout(p=0.0, inplace=False)
        (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (out_proj): Linear(in_features=4096, out_features=4096, bias=False)
      )
      (mlp): GPTJMLP(
        (fc_in): Linear(in_features=4096, out_features=16384, bias=True)
        (fc_out): Linear(in_features=16384, out_features=4096, bias=True)
        (act): NewGELUActivation()
        (dropout): Dropout(p=0.0, inplace=False)
      )
    )
    (1): GPTJBlock(
      (ln_1): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
      (attn): GPTJAttention(
        (attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_dropout): Dropout(p=0.0, inplace=False)
        (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (out_proj): Linear(in_features=4096, out_features=4096, bias=False)
      )
      (mlp): GPTJMLP(
        (fc_in): Linear(in_features=4096, out_features=16384, bias=True)
        (fc_out): Linear(in_features=16384, out_features=4096, bias=True)
        (act): NewGELUActivation()
        (dropout): Dropout(p=0.0, inplace=False)
      )
    )
    (2): GPTJBlock(
...
```
Better `__repr__` output looks like this:
```
GPTJModel(
  (wte): Embedding(50400, 4096)
  (drop): Dropout(p=0.0, inplace=False)
  (h): ModuleList(
    (0-27): 28 x GPTJBlock(
      (ln_1): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
      (attn): GPTJAttention(
        (attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_dropout): Dropout(p=0.0, inplace=False)
        (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (out_proj): Linear(in_features=4096, out_features=4096, bias=False)
      )
      (mlp): GPTJMLP(
        (fc_in): Linear(in_features=4096, out_features=16384, bias=True)
        (fc_out): Linear(in_features=16384, out_features=4096, bias=True)
        (act): NewGELUActivation()
        (dropout): Dropout(p=0.0, inplace=False)
      )
    )
  )
  (ln_f): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90452
Approved by: https://github.com/albanD
Summary:
The only difference with plain list/dict now is that nn.Parameters are
handled specially and registered as parameters properly.
test_nn and parametrization work locally.
Will see in CI if DP is fixed as well.
Tentative fix for https://github.com/pytorch/pytorch/issues/36035
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70499
Reviewed By: jbschlosser, alexeib
Differential Revision: D34005332
Pulled By: albanD
fbshipit-source-id: 7e76b0873d0fec345cb537e2a6ecba0258e662b9
(cherry picked from commit dc1e6f8d86)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68476
We implemented all of the following `dict` methods for `ParameterDict`:
- `get`
- `setdefault`
- `popitem`
- `fromkeys`
- `copy`
- `__or__`
- `__ior__`
- `__reversed__`
- `__ror__`
The behavior of these new methods matches the expected behavior of python `dict` as defined by the language itself: https://docs.python.org/3/library/stdtypes.html#typesmapping
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69403
Reviewed By: albanD
Differential Revision: D33187111
Pulled By: jbschlosser
fbshipit-source-id: ecaa493837dbc9d8566ddbb113b898997e2debcb
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857
These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
- `GLOSSARY.md`
- `aten/src/ATen/core/op_registration/README.md`
- `scripts/README.md`
- `torch/csrc/jit/codegen/fuser/README.md`
The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```
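For readers without `gsed`, the substitution `s/ *$//` can be approximated in Python as below. This is only a sketch of the per-file transformation; `strip_trailing_spaces` is a hypothetical helper, and it does not reproduce the `git grep` file selection or the exclusions.

```python
import re


def strip_trailing_spaces(text: str) -> str:
    """Remove runs of spaces at the end of every line (tabs are left alone)."""
    return re.sub(r" +$", "", text, flags=re.MULTILINE)


cleaned = strip_trailing_spaces("a  \nb\t \nc\n")
```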
I looked over the auto-generated changes and didn't see anything that looked problematic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406
Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377
This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348
Reviewed By: walterddr, seemethere
Differential Revision: D26856620
Pulled By: samestep
fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46983.
The solution is based on two components:
1. The introduction of the `_initialized` attribute. This is used in the ParameterList/Dict creation methods `__init__` (introduced in https://github.com/pytorch/pytorch/issues/47772) and `__setstate__` so that setting general `Module` attributes there does not trigger warnings.
2. The introduction of the `not hasattr(self, key)` check to avoid triggering warnings when changing general `Module` attributes such as `.training` during the `train()` and `eval()` methods.
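The two checks can be sketched together in a toy class. `ToyParamList` and `Param` are hypothetical stand-ins written for this illustration, and the warning text is made up; the sketch only mirrors the logic described in the two points above.

```python
import warnings


class Param:
    """Hypothetical stand-in for nn.Parameter."""


class ToyParamList:
    """Toy container that warns on unexpected attribute assignment."""

    def __init__(self):
        # Until _initialized is True, __setattr__ stays silent
        self._initialized = False
        self.training = True
        self._initialized = True

    def __setattr__(self, key, value):
        if getattr(self, "_initialized", False):
            # Existing attributes (such as `training`) and parameters are fine
            if not hasattr(self, key) and not isinstance(value, Param):
                warnings.warn(f"Setting attribute {key!r} is not supported.")
        super().__setattr__(key, value)

    def eval(self):
        # Re-assigns an existing attribute, so no warning is raised
        self.training = False
        return self
```

In this sketch, `ToyParamList().eval()` is silent, while assigning a brand-new non-parameter attribute after construction emits a warning.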
Tests related to the fix are added.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48315
Reviewed By: mrshenli
Differential Revision: D25130217
Pulled By: albanD
fbshipit-source-id: 79e2abf1eab616f5de74f75f370c2fe149bed4cb
Summary:
Fixes https://github.com/pytorch/pytorch/issues/40227.
Removed the sorting operations in the ModuleDict class and updated the docstring.
Also removed a sort operation in the corresponding unit test, which would otherwise cause the test to fail.
BC note: in Python 3.6 and later, a plain dict preserves the insertion order of its keys.
Example:
On Python 3.6+, a user who initializes a ModuleDict from a plain Python dict:
    {
        "b": torch.nn.MaxPool2d(3),
        "a": torch.nn.MaxPool2d(3)
    }
gets a ModuleDict that preserves the order:
    ModuleDict(
      (b): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
      (a): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
    )
On Python 3.5, the same input could instead produce:
    ModuleDict(
      (a): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
      (b): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
    )
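The ordering difference can be reproduced with plain dicts. This is a small sketch assuming Python 3.7+, where insertion order is guaranteed; `sorted_keys` and `insertion_keys` are hypothetical helpers, and the string values merely stand in for modules.

```python
from typing import Dict, List


def sorted_keys(modules: Dict[str, str]) -> List[str]:
    """Key order when entries are sorted before insertion (the removed behavior)."""
    return sorted(modules)


def insertion_keys(modules: Dict[str, str]) -> List[str]:
    """Key order when the dict's own insertion order is kept."""
    return list(modules)


layers = {"b": "MaxPool2d(3)", "a": "MaxPool2d(3)"}
```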
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40905
Differential Revision: D22357480
Pulled By: albanD
fbshipit-source-id: 0e2502769647bb64f404978243ca1ebe5346d573
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38211
Just because the annotations are inline doesn't mean the files type
check; most of the newly annotated files have type errors and I
added exclusions for them in mypy.ini. The payoff of moving
all of these modules inline is I can delete the relevant code
generation logic for the pyi files (which was adding ignore
annotations that weren't actually relevant anymore).
For the most part the translation was completely mechanical, but there
were two hairy issues. First, I needed to work around a Python 3.6 and
earlier bug where Generic has a nontrivial metaclass. This fix is in
torch/jit/__init__.py. Second, in module.py, we need to apply the same
fix for avoiding contravariance checks that the pyi file used to have;
this is done by declaring forward as a variable (rather than a
function), which appears to be sufficient to get mypy to not
contravariantly check input arguments.
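The "forward as a variable" trick can be sketched as follows. `ToyModule`, `Scale`, and the placeholder function are hypothetical names for this example, and the claim about how a type checker treats the declaration is the one made in the message above, not something this sketch verifies.

```python
from typing import Any, Callable


def _forward_unimplemented(self: Any, *inputs: Any) -> None:
    """Placeholder used until a subclass supplies its own forward."""
    raise NotImplementedError("subclasses must define forward")


class ToyModule:
    # Declared as a variable, not a method, so that a checker such as mypy
    # has no fixed base-class signature to compare subclass overrides against.
    forward: Callable[..., Any] = _forward_unimplemented

    def __call__(self, *inputs: Any) -> Any:
        return self.forward(*inputs)


class Scale(ToyModule):
    def forward(self, x: float) -> float:  # narrower signature than the base
        return 2.0 * x


doubled = Scale()(3.0)
```

At runtime nothing changes: the function stored on the class still binds as a method, so calling a base instance raises `NotImplementedError` as before.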
Because we aren't actually typechecking these modules in most
cases, it is inevitable that some of these type annotations are wrong.
I slavishly copied the old annotations from the pyi files unless there
was an obvious correction I could make. These annotations will probably
need fixing up later.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21497397
Pulled By: ezyang
fbshipit-source-id: 2b08bacc152c48f074e7edc4ee5dce1b77d83702
Summary:
Container `Module`s, including `ModuleList`, `ParameterList` and `ParameterDict`, should not be called like a regular `Module`.
This PR adds error messages for these special modules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29991
Differential Revision: D19698535
Pulled By: ezyang
fbshipit-source-id: fe156a0bbb033041086734b38f8c6fde034829bf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28988
Make ModuleList, Sequential, ModuleDict go through the same pathway as other modules, cleaning up a bunch of code and allowing them to define custom forwards and other methods.
EDIT: Previously, we would ignore an nn.Sequential attribute if it was not in `__constants__` ("did you forget to add it to Constants"). This PR scripts it even if it is not in `__constants__`. Is that what we want?
Test Plan: Imported from OSS
Differential Revision: D18402821
Pulled By: eellison
fbshipit-source-id: dd4f28fb0df0d1ba4ad1b3bc34ba141959a433f7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28987
We have `__iter__` defined on nn.ModuleList. Chainer's `Sequential` defines `__iter__`. This will also be helpful in modules which extend `nn.Sequential` and define a custom forward, because they can use the `for x in self` syntax that is supported in both python & TorchScript.
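A toy version of the pattern, without torch: a container that defines `__iter__`, and a subclass whose custom forward uses `for layer in self`. `ToySequential` and `SkipSequential` are hypothetical classes, and plain callables stand in for modules.

```python
class ToySequential:
    """Toy container that runs its layers in order."""

    def __init__(self, *layers):
        self._layers = list(layers)

    def __iter__(self):
        return iter(self._layers)

    def forward(self, x):
        for layer in self:
            x = layer(x)
        return x


class SkipSequential(ToySequential):
    """Custom forward: add the input back after running every layer."""

    def forward(self, x):
        out = x
        for layer in self:   # relies only on __iter__ from the base class
            out = layer(out)
        return x + out


plain = ToySequential(lambda v: v + 1, lambda v: v * 2).forward(3)
skipped = SkipSequential(lambda v: v + 1, lambda v: v * 2).forward(3)
```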
Test Plan: Imported from OSS
Differential Revision: D18402822
Pulled By: eellison
fbshipit-source-id: 1ece0f891a9d37f401e232320f58b056d5481856
Summary:
Add support for nn.ModuleDict in script. This is needed to support torchvision.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25715
Differential Revision: D17301826
Pulled By: eellison
fbshipit-source-id: 541b5477e980f519a8c3bbb1be91dac227f6d00f
Summary:
PR to update the shape notation for all of the torch.nn modules to take a unified form. The goal is to make these definitions machine-readable, and thus checkable, by unifying the style across all of the different modules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15741
Differential Revision: D13709601
Pulled By: ezyang
fbshipit-source-id: fb89a03903fdf0cd0dcf76f3e469b8582b2f3634
Summary:
Simple change to allow `__getitem__(slice)` on ModuleList subclasses to return an instance of the subclass rather than a plain ModuleList.
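The idea can be shown with a toy list container that builds slices through `self.__class__`. `ToyModuleList` and `Encoder` are hypothetical names, and the sketch assumes the subclass constructor accepts an iterable of items.

```python
class ToyModuleList:
    """Toy list container whose slices keep the subclass type."""

    def __init__(self, modules=()):
        self._modules = list(modules)

    def __len__(self):
        return len(self._modules)

    def __getitem__(self, idx):
        if isinstance(idx, slice):
            # self.__class__ rather than ToyModuleList, so subclasses survive slicing
            return self.__class__(self._modules[idx])
        return self._modules[idx]


class Encoder(ToyModuleList):
    pass


tail = Encoder(["a", "b", "c"])[1:]
```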
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11694
Differential Revision: D9892824
Pulled By: ezyang
fbshipit-source-id: b75e9c196487f55cb93f0dab6c20d850e8e759ff
This PR enables users to print extra information about their subclassed nn.Module.
For now I simply insert the user-defined string at the end of the module name, which should be discussed in this PR.
Before this PR, users had to redefine `__repr__` and copy and paste the source code from Module.
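A minimal sketch of the hook, without torch: the base class owns `__repr__` and subclasses only override `extra_repr`. `ToyModule`, `ToyLinear`, and the `children` dict are hypothetical and much simpler than the real Module.

```python
def _indent(text: str, spaces: int = 2) -> str:
    """Indent every line after the first, for nested reprs."""
    first, *rest = text.split("\n")
    return "\n".join([first] + [" " * spaces + line for line in rest])


class ToyModule:
    def __init__(self):
        self.children = {}

    def extra_repr(self) -> str:
        """Subclasses return their one-line extra information here."""
        return ""

    def __repr__(self) -> str:
        name = type(self).__name__
        extra = self.extra_repr()
        child_lines = [
            f"({key}): {_indent(repr(child))}" for key, child in self.children.items()
        ]
        if not child_lines:
            return f"{name}({extra})"
        lines = ([extra] if extra else []) + child_lines
        return name + "(\n  " + "\n  ".join(lines) + "\n)"


class ToyLinear(ToyModule):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features

    def extra_repr(self) -> str:
        return f"in_features={self.in_features}, out_features={self.out_features}"


net = ToyModule()
net.children["fc"] = ToyLinear(2, 3)
```

The subclass never touches `__repr__`, which is the copy-and-paste step this change is meant to remove.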
* Add support for extra information on Module
* Rewrite the repr method of Module
* Fix flake8
* Change the __repr__ to get_extra_repr in Linear
* Fix extra new-line for empty line
* Add test for __repr__ method
* Fix bug of block string indent
* Add indent for multi-line repr test.
* Address review comments
* Update tutorial for creating nn.Module
* Fix flake8, add extra_repr of bilinear
* Refactor DropoutNd
* Change to extra_repr in some Modules
* Fix flake8
* Refactor padding modules
* Refactor pooling module
* Fix typo
* Change to extra_repr
* Fix bug for GroupNorm
* Fix bug for LayerNorm