I approved https://github.com/pytorch/pytorch/pull/110850, which did the following:
Previously:
`num_batches_tracked` not in `state_dict` when doing `m.load_state_dict(state_dict)` --> always overwrite the module's `num_batches_tracked` in `_load_from_state_dict` with a 0 cpu tensor
Now:
`num_batches_tracked` not in `state_dict` when doing `m.load_state_dict(state_dict)` --> only overwrite the module's `num_batches_tracked` in `_load_from_state_dict` with a 0 cpu tensor if the module does not have `num_batches_tracked`
This causes the following issue:
```
with torch.device('meta'):
m = BatchNorm(...)
m.load_state_dict(state_dict, assign=True)
```
If `num_batches_tracked` is not in `state_dict`, the module's `num_batches_tracked` is already present on the meta device, so it is not overwritten with a 0 cpu tensor. When compiling, this error is raised:
```
AssertionError: Does not support mixing cuda+meta
```
I am not sure whether an explicit check for the meta device makes sense as a fix; I will add tests if this approach is OK.
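To make the three behaviors easier to compare, here is a minimal pure-Python sketch of the decision logic described above. It is a simplified model, not the PyTorch implementation: `FakeTensor`, `resolve_num_batches_tracked`, and the policy names are hypothetical stand-ins introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor; only the device matters here."""
    device: str
    value: int = 0


def resolve_num_batches_tracked(
    state_dict: Dict[str, FakeTensor],
    current: Optional[FakeTensor],
    policy: str,
) -> FakeTensor:
    """Return the buffer the module ends up with after loading (simplified model)."""
    key = "num_batches_tracked"
    if key in state_dict:
        # The fallback only matters when the key is missing from the checkpoint.
        return state_dict[key]
    zero_cpu = FakeTensor(device="cpu", value=0)
    if policy == "before_110850":
        # Old behavior: always overwrite with a 0 cpu tensor.
        return zero_cpu
    if policy == "after_110850":
        # New behavior: overwrite only if the module has no such buffer.
        return current if current is not None else zero_cpu
    if policy == "with_meta_check":
        # Proposed fix: also overwrite a buffer that lives on the meta device.
        if current is None or current.device == "meta":
            return zero_cpu
        return current
    raise ValueError(f"unknown policy: {policy}")


meta_buffer = FakeTensor(device="meta")
print(resolve_num_batches_tracked({}, meta_buffer, "after_110850").device)     # meta
print(resolve_num_batches_tracked({}, meta_buffer, "with_meta_check").device)  # cpu
```

Under the "after_110850" policy the meta buffer survives loading, which is the situation that later leads to the device-mixing assertion; the meta check restores the 0 cpu fallback for that case only.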
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115285
Approved by: https://github.com/albanD
Using mypy in code that depends on pytorch, I noticed that the type annotation doesn't allow a device ordinal.
`error: Argument "device" to "to_empty" of "Module" has incompatible type "int"; expected "str | device" [arg-type]`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113647
Approved by: https://github.com/albanD
Fixes #112599
Fixed pydocstyle errors in the following files. The remaining errors are missing docstrings at the module level and on methods within each module (`forward()`, `reset_parameters`, `__init__`, etc.).
pydocstyle torch/nn/modules/pooling.py --count
before: 49
after: 29
**remaining errors:**
```
torch/nn/modules/pooling.py:1 at module level:
D100: Missing docstring in public module
torch/nn/modules/pooling.py:90 in public method `forward`:
D102: Missing docstring in public method
torch/nn/modules/pooling.py:163 in public method `forward`:
D102: Missing docstring in public method
torch/nn/modules/pooling.py:240 in public method `forward`:
D102: Missing docstring in public method
torch/nn/modules/pooling.py:315 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/pooling.py:321 in public method `forward`:
D102: Missing docstring in public method
torch/nn/modules/pooling.py:402 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/pooling.py:408 in public method `forward`:
D102: Missing docstring in public method
torch/nn/modules/pooling.py:472 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/pooling.py:478 in public method `forward`:
D102: Missing docstring in public method
torch/nn/modules/pooling.py:541 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/pooling.py:550 in public method `forward`:
D102: Missing docstring in public method
torch/nn/modules/pooling.py:620 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/pooling.py:630 in public method `forward`:
D102: Missing docstring in public method
torch/nn/modules/pooling.py:706 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/pooling.py:716 in public method `forward`:
D102: Missing docstring in public method
torch/nn/modules/pooling.py:720 in public method `__setstate__`:
D105: Missing docstring in magic method
torch/nn/modules/pooling.py:774 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/pooling.py:792 in public method `forward`:
D102: Missing docstring in public method
torch/nn/modules/pooling.py:845 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/pooling.py:863 in public method `forward`:
D102: Missing docstring in public method
torch/nn/modules/pooling.py:925 in public method `forward`:
D102: Missing docstring in public method
torch/nn/modules/pooling.py:979 in public method `forward`:
D102: Missing docstring in public method
torch/nn/modules/pooling.py:1026 in public method `forward`:
D102: Missing docstring in public method
torch/nn/modules/pooling.py:1068 in public method `forward`:
D102: Missing docstring in public method
torch/nn/modules/pooling.py:1111 in public method `forward`:
D102: Missing docstring in public method
torch/nn/modules/pooling.py:1150 in public method `forward`:
D102: Missing docstring in public method
torch/nn/modules/pooling.py:1189 in public method `forward`:
D102: Missing docstring in public method
torch/nn/modules/pooling.py:1228 in public method `forward`:
D102: Missing docstring in public method
```
pydocstyle torch/nn/modules/upsampling.py --count
before: 14
after: 7
**remaining:**
```
torch/nn/modules/upsampling.py:1 at module level:
D100: Missing docstring in public module
torch/nn/modules/upsampling.py:142 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/upsampling.py:156 in public method `forward`:
D102: Missing docstring in public method
torch/nn/modules/upsampling.py:160 in public method `__setstate__`:
D105: Missing docstring in magic method
torch/nn/modules/upsampling.py:166 in public method `extra_repr`:
D102: Missing docstring in public method
torch/nn/modules/upsampling.py:216 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/upsampling.py:263 in public method `__init__`:
D107: Missing docstring in __init__
```
pydocstyle torch/nn/modules/rnn.py --count
before: 47
after: 40
**remaining**
```
torch/nn/modules/rnn.py:1 at module level:
D100: Missing docstring in public module
torch/nn/modules/rnn.py:59 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/rnn.py:160 in public method `__setattr__`:
D105: Missing docstring in magic method
torch/nn/modules/rnn.py:225 in public method `reset_parameters`:
D102: Missing docstring in public method
torch/nn/modules/rnn.py:230 in public method `check_input`:
D102: Missing docstring in public method
torch/nn/modules/rnn.py:242 in public method `get_expected_hidden_size`:
D102: Missing docstring in public method
torch/nn/modules/rnn.py:256 in public method `check_hidden_size`:
D102: Missing docstring in public method
torch/nn/modules/rnn.py:272 in public method `check_forward_args`:
D102: Missing docstring in public method
torch/nn/modules/rnn.py:278 in public method `permute_hidden`:
D102: Missing docstring in public method
torch/nn/modules/rnn.py:284 in public method `extra_repr`:
D102: Missing docstring in public method
torch/nn/modules/rnn.py:305 in public method `__getstate__`:
D105: Missing docstring in magic method
torch/nn/modules/rnn.py:313 in public method `__setstate__`:
D105: Missing docstring in magic method
torch/nn/modules/rnn.py:355 in public method `all_weights`:
D102: Missing docstring in public method
torch/nn/modules/rnn.py:471 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/rnn.py:478 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/rnn.py:481 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/rnn.py:503 in public method `forward` (skipping F811):
D102: Missing docstring in public method
torch/nn/modules/rnn.py:762 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/rnn.py:768 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/rnn.py:771 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/rnn.py:774 in public method `get_expected_cell_size`:
D102: Missing docstring in public method
torch/nn/modules/rnn.py:786 in public method `check_forward_args`:
D102: Missing docstring in public method
torch/nn/modules/rnn.py:798 in public method `permute_hidden`:
D102: Missing docstring in public method
torch/nn/modules/rnn.py:809 in public method `forward` (skipping F811):
D102: Missing docstring in public method
torch/nn/modules/rnn.py:820 in public method `forward` (skipping F811):
D102: Missing docstring in public method
torch/nn/modules/rnn.py:1030 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/rnn.py:1036 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/rnn.py:1039 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/rnn.py:1046 in public method `forward` (skipping F811):
D102: Missing docstring in public method
torch/nn/modules/rnn.py:1054 in public method `forward` (skipping F811):
D102: Missing docstring in public method
torch/nn/modules/rnn.py:1123 in public class `RNNCellBase`:
D101: Missing docstring in public class
torch/nn/modules/rnn.py:1134 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/rnn.py:1152 in public method `extra_repr`:
D102: Missing docstring in public method
torch/nn/modules/rnn.py:1160 in public method `reset_parameters`:
D102: Missing docstring in public method
torch/nn/modules/rnn.py:1224 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/rnn.py:1230 in public method `forward`:
D102: Missing docstring in public method
torch/nn/modules/rnn.py:1327 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/rnn.py:1332 in public method `forward`:
D102: Missing docstring in public method
torch/nn/modules/rnn.py:1422 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/rnn.py:1427 in public method `forward`:
D102: Missing docstring in public method
```
pydocstyle torch/nn/modules/pixelshuffle.py --count
before: 13
after: 8
**remaining:**
```
torch/nn/modules/pixelshuffle.py:1 at module level:
D100: Missing docstring in public module
torch/nn/modules/pixelshuffle.py:52 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/pixelshuffle.py:56 in public method `forward`:
D102: Missing docstring in public method
torch/nn/modules/pixelshuffle.py:59 in public method `extra_repr`:
D102: Missing docstring in public method
torch/nn/modules/pixelshuffle.py:105 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/pixelshuffle.py:109 in public method `forward`:
D102: Missing docstring in public method
torch/nn/modules/pixelshuffle.py:112 in public method `extra_repr`:
D102: Missing docstring in public method
```
pydocstyle torch/nn/modules/sparse.py --count
before: 14
after: 8
**remaining errors:**
```
torch/nn/modules/sparse.py:1 at module level:
D100: Missing docstring in public module
torch/nn/modules/sparse.py:124 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/sparse.py:153 in public method `reset_parameters`:
D102: Missing docstring in public method
torch/nn/modules/sparse.py:162 in public method `forward`:
D102: Missing docstring in public method
torch/nn/modules/sparse.py:167 in public method `extra_repr`:
D102: Missing docstring in public method
torch/nn/modules/sparse.py:320 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/modules/sparse.py:350 in public method `reset_parameters`:
D102: Missing docstring in public method
torch/nn/modules/sparse.py:396 in public method `extra_repr`:
D102: Missing docstring in public method
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113177
Approved by: https://github.com/ezyang
Fixes #112601
```
pydocstyle torch/nn/modules/module.py --count
```
On master:
115
After my changes on this PR:
8
The remaining 8 are due to missing docstrings at the module level and in the magic methods:
```
torch/nn/modules/module.py:1 at module level:
D100: Missing docstring in public module
torch/nn/modules/module.py:1635 in public method `__getstate__`:
D105: Missing docstring in magic method
torch/nn/modules/module.py:1640 in public method `__setstate__`:
D105: Missing docstring in magic method
torch/nn/modules/module.py:1674 in public method `__getattr__`:
D105: Missing docstring in magic method
torch/nn/modules/module.py:1689 in public method `__setattr__`:
D105: Missing docstring in magic method
torch/nn/modules/module.py:1748 in public method `__delattr__`:
D105: Missing docstring in magic method
torch/nn/modules/module.py:2480 in public method `__repr__`:
D105: Missing docstring in magic method
torch/nn/modules/module.py:2505 in public method `__dir__`:
D105: Missing docstring in magic method
```
Should I add them too? Happy to do it, I just wasn't sure if you wanted these documented. Please let me know.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112674
Approved by: https://github.com/mikaylagawarecki
When converting BN to SyncBN, the SyncBN module's training flag should match the training flag of the original BN module. The motivation: if the original model has the training flag set on some BN modules and not on others, converting to SyncBN should not change that behavior.
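The intended behavior can be sketched with a small stand-in module tree. This is an illustrative model only, assuming a recursive conversion that copies the flag per module; `Node` and `convert_sync_bn` are hypothetical names and not the PyTorch API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a module: a kind, a training flag, and children."""
    kind: str  # "bn", "sync_bn", or any other label
    training: bool = True
    children: List["Node"] = field(default_factory=list)


def convert_sync_bn(node: Node) -> Node:
    """Recursively replace "bn" nodes with "sync_bn", copying each training flag."""
    kind = "sync_bn" if node.kind == "bn" else node.kind
    return Node(
        kind=kind,
        training=node.training,  # keep the original per-module flag
        children=[convert_sync_bn(child) for child in node.children],
    )


model = Node("container", children=[Node("bn", training=False), Node("bn", training=True)])
converted = convert_sync_bn(model)
print([(c.kind, c.training) for c in converted.children])
# [('sync_bn', False), ('sync_bn', True)]
```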
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111998
Approved by: https://github.com/mikaylagawarecki
This pull request addresses an inconsistency in the representation of the Hadamard product across PyTorch documentation. Currently, the notation varies among different modules:
- In `torch.nn.LSTM` documentation the Hadamard product is represented with $\odot$
- In `torch.nn.GRU` documentation the Hadamard product is represented with $*$
- In `torch.nn.LSTMCell` documentation the Hadamard product is represented with $*$
- In `torch.nn.GRUCell` documentation the Hadamard product is represented with $*$
- In `torch.ao.nn.quantized.dynamic.GRU` documentation the Hadamard product is represented with $*$
This PR proposes consistently representing the Hadamard product throughout the documentation to enhance clarity and align with established standards.
The notation $\odot$ will be uniformly adopted, following the convention in the [Deep Learning Book](https://www.deeplearningbook.org/contents/linear_algebra.html).
**Changes Made:**
- Modified `torch.nn.GRU` documentation to represent the Hadamard product with $\odot$
- Modified `torch.nn.LSTMCell` documentation to represent the Hadamard product with $\odot$
- Modified `torch.nn.GRUCell` documentation to represent the Hadamard product with $\odot$
- Modified `torch.ao.nn.quantized.dynamic.GRU` documentation to represent the Hadamard product with $\odot$
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111763
Approved by: https://github.com/albanD
Fixes #106698
Also added a check to the Python API, because the current error message
```
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/sehoon/pytorch-latest/torch/nn/modules/adaptive.py", line 128, in __init__
or (min(cutoffs) <= 0) \
ValueError: min() arg is an empty sequence
```
is not very comprehensible.
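A rough sketch of the kind of up-front check meant here: reject an empty `cutoffs` before `min()` is ever called. The function name `validate_cutoffs`, the exact set of conditions, and the message wording are assumptions for illustration, not the text used in PyTorch.

```python
from typing import Sequence


def validate_cutoffs(cutoffs: Sequence[int], n_classes: int) -> None:
    """Hypothetical validator: fail early with a readable message."""
    cutoffs = list(cutoffs)
    if len(cutoffs) == 0:
        # Without this check, min(cutoffs) below raises
        # "min() arg is an empty sequence".
        raise ValueError("cutoffs must contain at least one value")
    if (
        cutoffs != sorted(cutoffs)
        or min(cutoffs) <= 0
        or max(cutoffs) > n_classes - 1
        or len(set(cutoffs)) != len(cutoffs)
    ):
        raise ValueError(
            "cutoffs must be unique, positive, increasing integers "
            "between 1 and n_classes - 1"
        )


try:
    validate_cutoffs([], n_classes=10)
except ValueError as e:
    print(e)  # cutoffs must contain at least one value
```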
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106777
Approved by: https://github.com/albanD
Addresses [issue #106085](https://github.com/pytorch/pytorch/issues/106085).
In `torch/nn/modules/rnn.py`:
- Adds documentation string to RNNBase class.
- Adds parameters to `__init__` methods for the RNN, LSTM, and GRU classes.
- Adds type annotations to `__init__` methods for RNN, LSTM, and GRU.
In `torch/ao/nn/quantized/dynamic/modules/rnn.py`:
- Adds type specifications to `_FLOAT_MODULE` attributes in RNNBase, RNN, LSTM, and GRU classes.
> This resolves a `mypy` assignment error `Incompatible types in assignment (expression has type "Type[LSTM]", base class "RNNBase" defined the type as "Type[RNNBase]")` that seemed to be a result of fully specified type annotations in `torch/nn/modules/rnn.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106222
Approved by: https://github.com/mikaylagawarecki
Fixes #92000
The documentation at https://pytorch.org/docs/stable/generated/torch.nn.MultiLabelSoftMarginLoss.html#multilabelsoftmarginloss states:
> label targets padded by -1 ensuring same shape as the input.
However, the shapes of the input and target tensors are compared, and an exception is raised if they differ in either dimension 0 or 1, meaning the label targets are never padded. See the code snippet below and the resulting output. The documentation is therefore adjusted to:
> label targets must have the same shape as the input.
```
import torch
import torch.nn as nn
# Create some example data
input = torch.tensor(
[
[0.8, 0.2, -0.5],
[0.1, 0.9, 0.3],
]
)
target1 = torch.tensor(
[
[1, 0, 1],
[0, 1, 1],
[0, 1, 1],
]
)
target2 = torch.tensor(
[
[1, 0],
[0, 1],
]
)
target3 = torch.tensor(
[
[1, 0, 1],
[0, 1, 1],
]
)
loss_func = nn.MultiLabelSoftMarginLoss()
try:
loss = loss_func(input, target1).item()
except RuntimeError as e:
print('target1 ', e)
try:
loss = loss_func(input, target2).item()
except RuntimeError as e:
print('target2 ', e)
loss = loss_func(input, target3).item()
print('target3 ', loss)
```
output:
```
target1 The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 0
target2 The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 1
target3 0.6305370926856995
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107817
Approved by: https://github.com/mikaylagawarecki
This updates ruff to 0.285, which is faster, better, and fixes a bunch of false negatives with regard to f-strings.
I also enabled RUF017, which looks for accidental quadratic list summation. Luckily, there seem to be no instances of it in our codebase, so I am enabling it to keep it that way. :)
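For reference, a minimal example of the pattern RUF017 targets, next to a linear-time alternative. The variable names are illustrative only.

```python
from itertools import chain

nested = [[1, 2], [3], [4, 5, 6]]

# Flagged pattern: every `+` copies the accumulated list, so total work
# grows quadratically with the number of elements.
flat_quadratic = sum(nested, [])

# Linear alternative: iterate over the sublists once.
flat_linear = list(chain.from_iterable(nested))

print(flat_quadratic == flat_linear)  # True
```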
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
Summary:
We are seeing that the `aten._native_multi_head_attention` op (not in the core ATen op set) is left in the exported graph and causes problems downstream at runtime.
Two proposed solutions:
1. Disable the fast path while tracing so that the non-optimized path is used and gets decomposed; that way, the offending op won't show up in the exported graph
2. Add a decomp rule for `aten._native_multi_head_attention`
After discussing with kimishpatel and bdhirsh, option 1 is preferred, and we verified that it could immediately unblock the critical model enablement work for PP.
Test Plan: CI
Differential Revision: D48169806
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106824
Approved by: https://github.com/kimishpatel
Summary:
Make `is_causal` hint flags available for the top-level transformer module.
It's debatable whether this is useful: at present we autodetect causal masks for src and tgt masks in the transformer encoder and decoder, respectively. Making `is_causal` flags available would enable users to short-cut this check by asserting whether their mask is causal or not.
I am putting this diff up for discussion, not as a solution. Not doing anything may be the right solution, unless there is strong (data-driven) user demand. Update: it appears the consensus is to move ahead with this, as per the discussions below.
@cpuhrsch @mikaylagawarecki @jbschlosser @janEbert
Test Plan: sandcastle
Differential Revision: D47373260
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106143
Approved by: https://github.com/mikaylagawarecki
- Enabled LSTM weight prepack in inductor.
- Added a mkldnn decomposition for lstm which won't change for different `seq_lens`. With the previous decomposition, for dynamic shapes use case where `seq_lens` changes, the graph will be different.
- Extended several inductor utility functions to support `List(Tensor)` as input. Previously those functions only supported `Tensor` input.
**Update 2023-07-26:**
- https://github.com/pytorch/pytorch/pull/103851 has moved CPU weight packing to be after AOTAutograd. Fixed the support in this PR to follow the same way (mainly in 3b207f7f1c (diff-6dffed1ade0ba3e887f9a4eafa3bfcec267ab2365b8adcb91bd391f49b3fd2e3)).
LSTM is decomposed in `aten.mkldnn_rnn_layer` by layer and by direction. The weight prepack is done at the `mkldnn_rnn_layer` level.
- Add a fix in the rnn `__getstate__` function in case we need to recompile an `LSTM` module.
When compiling the module, the weights tensors which are the `named_parameters` of the module are converted to `functional_tensor` here:
76fb72e24a/torch/nn/utils/stateless.py (L125-L128)
The forward function of LSTM will be called:
76fb72e24a/torch/_functorch/aot_autograd.py (L3379-L3381)
In the forward function, the `_flat_weights` are updated to be the same as the weights, thus becoming `functional_tensor`:
76fb72e24a/torch/nn/modules/rnn.py (L775-L778)
The weights tensors are converted back to the original tensors (which are not `functional_tensor` anymore) before exiting the `_reparametrize_module` context here:
76fb72e24a/torch/nn/utils/stateless.py (L130-L142)
But since `_flat_weights` is not in the `named_parameters` of the module, it's still `functional_tensor` ([link of the parameters that will be converted to functional and reverted back](76fb72e24a/torch/_functorch/aot_autograd.py (L3695-L3698))).
At this moment, if we need to recompile the model, `deepcopy` will be called:
76fb72e24a/torch/_dynamo/utils.py (L915-L917)
It will report `UnImplemented` since we have a `functional_tensor` (`_flat_weights`), which triggers a graph break that we do not expect:
76fb72e24a/torch/_subclasses/meta_utils.py (L514)
Added a fix in `__getstate__` to update `_flat_weights` if the weights have changed. The fix is covered in the `test_lstm_packed` UT.
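The idea behind the fix, refreshing a cached weight list from the live attributes before the state is copied, can be sketched in plain Python. This is a simplified model with a hypothetical `CachedWeights` class, not the actual `torch.nn` RNN code; the stale entry stands in for a leftover `functional_tensor`.

```python
import copy


class CachedWeights:
    """Hypothetical module-like class that caches its weights in a flat list."""

    def __init__(self):
        self._flat_weights_names = ["weight_ih", "weight_hh"]
        self.weight_ih = [1.0, 2.0]
        self.weight_hh = [3.0]
        self._refresh_flat_weights()

    def _refresh_flat_weights(self):
        # Rebuild the cache from the attributes it is supposed to mirror.
        self._flat_weights = [getattr(self, name) for name in self._flat_weights_names]

    def __getstate__(self):
        # copy.deepcopy and pickle call this; refresh first so a stale cache
        # is never copied.
        self._refresh_flat_weights()
        return self.__dict__.copy()


m = CachedWeights()
m._flat_weights = ["stale wrapper"]  # simulate a cache that was not reverted
clone = copy.deepcopy(m)
print(clone._flat_weights)  # [[1.0, 2.0], [3.0]]
```

Because the refresh happens inside `__getstate__`, callers such as `deepcopy` need no special handling; the trade-off is that reading the state now has a small side effect on the original object.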
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103071
Approved by: https://github.com/jgong5, https://github.com/jansel