Addresses [issue #106085](https://github.com/pytorch/pytorch/issues/106085).
In `torch/nn/modules/rnn.py`:
- Adds a documentation string to the RNNBase class.
- Adds parameters to the `__init__` methods for the RNN, LSTM, and GRU classes.
- Adds type annotations to the `__init__` methods for RNN, LSTM, and GRU.
In `torch/ao/nn/quantized/dynamic/modules/rnn.py`:
- Adds type specifications to `_FLOAT_MODULE` attributes in RNNBase, RNN, LSTM, and GRU classes.
> This resolves a `mypy` assignment error `Incompatible types in assignment (expression has type "Type[LSTM]", base class "RNNBase" defined the type as "Type[RNNBase]")` that seemed to result from the fully specified type annotations in `torch/nn/modules/rnn.py`.
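Below is a minimal sketch (not the exact diff) of the annotation pattern this resolves: declaring `_FLOAT_MODULE` on the base class with a type wide enough to cover the subclasses lets each subclass assign its own float module without the assignment error.
```
# Illustrative sketch only, assuming a widened ClassVar annotation; the real
# annotations live in torch/ao/nn/quantized/dynamic/modules/rnn.py.
from typing import ClassVar, Type

import torch.nn as nn

class RNNBase:
    _FLOAT_MODULE: ClassVar[Type[nn.RNNBase]] = nn.RNNBase

class LSTM(RNNBase):
    # OK under the widened annotation: nn.LSTM is a subclass of nn.RNNBase.
    _FLOAT_MODULE: ClassVar[Type[nn.RNNBase]] = nn.LSTM
```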
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106222
Approved by: https://github.com/mikaylagawarecki
Fixes #92000
The documentation at https://pytorch.org/docs/stable/generated/torch.nn.MultiLabelSoftMarginLoss.html#multilabelsoftmarginloss states:
> label targets padded by -1 ensuring same shape as the input.
However, the shapes of the input and target tensors are compared, and an exception is raised if they differ in either dimension 0 or 1, meaning the label targets are never padded. See the code snippet below and the resulting output. The documentation is therefore adjusted to:
> label targets must have the same shape as the input.
```
import torch
import torch.nn as nn

# Create some example data
input = torch.tensor(
    [
        [0.8, 0.2, -0.5],
        [0.1, 0.9, 0.3],
    ]
)
target1 = torch.tensor(
    [
        [1, 0, 1],
        [0, 1, 1],
        [0, 1, 1],
    ]
)
target2 = torch.tensor(
    [
        [1, 0],
        [0, 1],
    ]
)
target3 = torch.tensor(
    [
        [1, 0, 1],
        [0, 1, 1],
    ]
)

loss_func = nn.MultiLabelSoftMarginLoss()

# target1 has an extra row: shapes differ in dimension 0
try:
    loss = loss_func(input, target1).item()
except RuntimeError as e:
    print('target1 ', e)

# target2 has one column too few: shapes differ in dimension 1
try:
    loss = loss_func(input, target2).item()
except RuntimeError as e:
    print('target2 ', e)

# target3 matches the input shape, so the loss is computed
loss = loss_func(input, target3).item()
print('target3 ', loss)
```
output:
```
target1 The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 0
target2 The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 1
target3 0.6305370926856995
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107817
Approved by: https://github.com/mikaylagawarecki
This updates ruff to 0.285, which is faster, better, and fixes a bunch of false negatives with regard to f-strings.
I also enabled RUF017, which looks for accidental quadratic list summation. Luckily, there seem to be no instances of it in our codebase, so enabling it now keeps things that way. :)
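For context, a small illustration (names made up) of the pattern RUF017 flags: summing lists with `sum()` copies the accumulator at every step, which is quadratic, while `itertools.chain` is linear.
```
# Illustration of accidental quadratic list summation (what RUF017 flags).
import itertools

batches = [[1, 2], [3, 4], [5, 6]]

flat_slow = sum(batches, [])                               # flagged: O(n^2) copying
flat_fast = list(itertools.chain.from_iterable(batches))   # linear alternative

assert flat_slow == flat_fast == [1, 2, 3, 4, 5, 6]
```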
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
Summary:
We are seeing that the `aten._native_multi_head_attention` op (not in the core ATen op set) is left in the exported graph and causes problems downstream at runtime.
Two proposed solutions:
1. Disable the fast path while tracing to leverage the non-optimized path and its decomposition; that way, the blamed op won't show up in the exported graph
2. Add a decomp rule for `aten._native_multi_head_attention`
After discussing with kimishpatel and bdhirsh, option 1 is preferred, and it has been verified to immediately unblock the critical model enablement work for PP.
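As a hedged sketch (not part of this change), here is one way to check which ATen ops land in an exported graph for a multi-head attention module; whether the non-core op shows up depends on whether the fast path was taken while tracing, as discussed above.
```
# Hypothetical check, not from this diff: list the call_function targets in an
# exported MultiheadAttention graph and see if the non-core op is present.
import torch
from torch.export import export

mha = torch.nn.MultiheadAttention(embed_dim=16, num_heads=2, batch_first=True).eval()
x = torch.randn(1, 4, 16)

ep = export(mha, (x, x, x))
targets = {node.target for node in ep.graph.nodes if node.op == "call_function"}
print(torch.ops.aten._native_multi_head_attention.default in targets)
```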
Test Plan: CI
Differential Revision: D48169806
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106824
Approved by: https://github.com/kimishpatel
Summary:
Make `is_causal` hint flags available for the top-level transformer module.
It's debatable whether this is useful -- at present we autodetect causal masks for the src and tgt masks in the transformer encoder and decoder, respectively. Making `is_causal` flags available would enable users to short-cut this check by asserting whether or not their mask is causal.
I am putting this diff up for discussion, not as a solution. Not doing anything may be the right solution, unless there is strong (data-driven) user demand -- it appears the consensus is to move ahead with this, as per the discussions below.
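A rough usage sketch of the hint under discussion (the flag already exists on the encoder's forward; the top-level names proposed here may differ in the final diff): passing `is_causal=True` asserts that the supplied mask is causal so the autodetection can be skipped.
```
# Sketch only: assert that the mask is causal instead of relying on autodetection.
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

src = torch.randn(8, 10, 32)  # (batch, seq, feature)
causal_mask = nn.Transformer.generate_square_subsequent_mask(10)

out = encoder(src, mask=causal_mask, is_causal=True)
```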
@cpuhrsch @mikaylagawarecki @jbschlosser @janEbert
Test Plan: sandcastle
Differential Revision: D47373260
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106143
Approved by: https://github.com/mikaylagawarecki
- Enabled LSTM weight prepack in inductor.
- Added an mkldnn decomposition for LSTM which won't change for different `seq_lens`. With the previous decomposition, for the dynamic-shapes use case where `seq_lens` changes, the graph would be different.
- Extended several inductor utility functions to support `List[Tensor]` as input. Previously those functions only supported `Tensor` input. (A minimal compile-and-run sketch follows this list.)
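A minimal compile-and-run sketch of the scenario covered by this PR (sizes and inputs are made up): compiling an `nn.LSTM` on CPU with inductor, where the weight prepack above applies.
```
# Minimal sketch, assuming a CPU inductor build: compile an LSTM so the mkldnn
# path can prepack the weights.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=16, hidden_size=32, num_layers=2, batch_first=True).eval()
compiled = torch.compile(lstm)

x = torch.randn(4, 10, 16)  # (batch, seq, feature)
with torch.no_grad():
    out, (h, c) = compiled(x)
print(out.shape)  # torch.Size([4, 10, 32])
```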
**Update 2023-07-26:**
- https://github.com/pytorch/pytorch/pull/103851 has moved CPU weight packing to after AOTAutograd. The support in this PR has been fixed to follow the same approach (mainly in 3b207f7f1c (diff-6dffed1ade0ba3e887f9a4eafa3bfcec267ab2365b8adcb91bd391f49b3fd2e3)).
LSTM is decomposed into `aten.mkldnn_rnn_layer` by layer and by direction. The weight prepack is done at the `mkldnn_rnn_layer` level.
- Added a fix in the rnn `__get_state__` function for the case where we need to recompile an `LSTM` module.
When compiling the module, the weight tensors, which are the `named_parameters` of the module, are converted to `functional_tensor` here:
76fb72e24a/torch/nn/utils/stateless.py (L125-L128)
The forward function of LSTM will be called:
76fb72e24a/torch/_functorch/aot_autograd.py (L3379-L3381)
In the forward function, the `_flat_weights` are updated to be the same as the weights, thus becoming `functional_tensor`:
76fb72e24a/torch/nn/modules/rnn.py (L775-L778)
The weight tensors are converted back to the original tensors (which are no longer `functional_tensor`) before exiting the `_reparametrize_module` context here:
76fb72e24a/torch/nn/utils/stateless.py (L130-L142)
But since `_flat_weights` is not in the `named_parameters` of the module, it's still `functional_tensor` ([link of the parameters that will be converted to functional and reverted back](76fb72e24a/torch/_functorch/aot_autograd.py (L3695-L3698))).
At this moment, if we need to recompile the model, `deepcopy` will be called:
76fb72e24a/torch/_dynamo/utils.py (L915-L917)
And it will report `UnImplemented` since we have a `functional_tensor` (`_flat_weights`) and will trigger a graph break, which is not what we expect:
76fb72e24a/torch/_subclasses/meta_utils.py (L514)
Added a fix in `__get_state__` to update `_flat_weights` whenever the weights have changed, to address this issue. The fix is covered in the `test_lstm_packed` UT.
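A schematic of the kind of fix described (illustrative only; the actual change is in this PR, and Python spells the pickling hook `__getstate__`): rebuild `_flat_weights` from the module's current attributes when the state is captured so stale functionalized tensors are not carried into the `deepcopy`.
```
# Illustrative schematic, not the PR's exact code: refresh _flat_weights from
# the module's current attributes so it does not hold stale functional tensors.
def __getstate__(self):
    state = self.__dict__.copy()
    state["_flat_weights"] = [
        getattr(self, name) if hasattr(self, name) else None
        for name in self._flat_weights_names
    ]
    return state
```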
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103071
Approved by: https://github.com/jgong5, https://github.com/jansel