Commit Graph

1433 Commits

Author SHA1 Message Date
isdanni
2f7bb18def [Doc] Add padding size constraint in nn.ReflectionPad2d (#115995)
Fixes #115532
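For illustration, a quick sketch of the constraint being documented (shapes are arbitrary; reflection padding must be smaller than the corresponding input dimension):

```
import torch
import torch.nn as nn

x = torch.randn(1, 1, 3, 3)
out = nn.ReflectionPad2d(2)(x)   # ok: padding (2) < input size (3); output is 1x1x7x7
# nn.ReflectionPad2d(3)(x)       # raises: padding must be less than the input dimension
```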

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115995
Approved by: https://github.com/mikaylagawarecki
2023-12-18 21:29:14 +00:00
Mikayla Gawarecki
6d5fe07659 Fix numpy warning when importing torch without numpy installed (#115867)
Fixes #115638

I verified locally that with no numpy installed, the warning no longer occurs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115867
Approved by: https://github.com/soulitzer
2023-12-15 02:22:12 +00:00
Wongboo
68f74dd162 Add python and C++ support for LPPool3d (#114199)
Add Python and C++ support for LPPool3d. Fixes #114114
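A usage sketch of the new module (assuming its signature mirrors `nn.LPPool2d`; sizes are arbitrary):

```
import torch
import torch.nn as nn

pool = nn.LPPool3d(norm_type=2, kernel_size=2)   # stride defaults to kernel_size
out = pool(torch.randn(1, 4, 8, 8, 8))
print(out.shape)   # torch.Size([1, 4, 4, 4, 4])
```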

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114199
Approved by: https://github.com/mikaylagawarecki
2023-12-08 18:18:44 +00:00
Mikayla Gawarecki
f5919335db Fix _load_from_state_dict for num_batches_tracked in batchnorm (#115285)
I approved https://github.com/pytorch/pytorch/pull/110850 which did the following

Previously:
`num_batches_tracked` not in state_dict when doing `m.load_state_dict(state_dict)` --> always overwrite module's `num_batches_tracked` in `load_from_state_dict` with a 0 cpu tensor

Now:
`num_batches_tracked` not in state_dict loaded when doing `m.load_state_dict(state_dict)` --> only overwrite module's `num_batches_tracked`  in `load_from_state_dict` with a 0 cpu tensor if module does not have `num_batches_tracked`

This causes the following issue:

```
with torch.device('meta'):
     m = BatchNorm(...)
m.load_state_dict(state_dict, assign=True)
```

If `num_batches_tracked` is not in `state_dict`, since the module's `num_batches_tracked` is present on the meta device, it is not overwritten with a 0 cpu tensor. When compiling, this error is raised:

```
AssertionError: Does not support mixing cuda+meta
```

I am not sure whether the explicit check for meta device makes sense as a fix; I will add testing if this fix is OK.
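For context, a runnable version of the repro above (the module and shapes are illustrative; the expected device reflects the intended behavior once this is fixed):

```
import torch
import torch.nn as nn

with torch.device("meta"):
    m = nn.BatchNorm2d(4)

state_dict = nn.BatchNorm2d(4).state_dict()
state_dict.pop("num_batches_tracked")   # simulate a checkpoint without the counter

m.load_state_dict(state_dict, assign=True)
print(m.num_batches_tracked.device)     # should be cpu, not left on the meta device
```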

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115285
Approved by: https://github.com/albanD
2023-12-07 22:48:26 +00:00
Linus
7201edc0a5 Fix RNN class constructor signature (#115341)
Fixes #114617

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115341
Approved by: https://github.com/mikaylagawarecki
2023-12-07 19:46:33 +00:00
Aaron Gokaslan
ea7d70aecc [BE]: ruff FURB136: replace ternary with min/max (preview) (#114382)
Replaces ternary if/else statements with simple min/max calls where appropriate.
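An illustrative example of the rewrite this rule performs:

```
a, b = 3, 7

# before: ternary expression
highest = a if a > b else b

# after: what FURB136 suggests
highest = max(a, b)
print(highest)   # 7
```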
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114382
Approved by: https://github.com/albanD
2023-11-22 22:10:01 +00:00
Brian Vaughan
dbb96ef30d improve annotation of device parameters where a device ordinal is allowed (#113647)
Using mypy in code that depends on pytorch, I noticed that the type annotation doesn't allow a device ordinal.

`error: Argument "device" to "to_empty" of "Module" has incompatible type "int"; expected "str | device"  [arg-type]`
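A small sketch of the call pattern behind the error above (a bare CUDA ordinal, which is valid at runtime and now also accepted by the annotation):

```
import torch
import torch.nn as nn

with torch.device("meta"):
    m = nn.Linear(4, 4)

# valid at runtime (interpreted as cuda:0, so it needs a GPU to actually run);
# previously mypy rejected the int because the annotation only allowed str | device
m.to_empty(device=0)
```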

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113647
Approved by: https://github.com/albanD
2023-11-17 14:41:22 +00:00
zabboud
53e7de4b65 Issue 112599 - fix pydocstyle errors (#113177)
Fixes #112599

Fixed errors relating to pydocstyle in the following files. The remaining errors are related to docstrings at the module level and at methods within each module (`forward()`, `reset_parameters`, `__init__`, etc.).

pydocstyle torch/nn/modules/pooling.py --count
before: 49
after: 29

**remaining errors:**
```
torch/nn/modules/pooling.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/modules/pooling.py:90 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:163 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:240 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:315 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/pooling.py:321 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:402 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/pooling.py:408 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:472 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/pooling.py:478 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:541 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/pooling.py:550 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:620 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/pooling.py:630 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:706 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/pooling.py:716 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:720 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/nn/modules/pooling.py:774 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/pooling.py:792 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:845 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/pooling.py:863 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:925 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:979 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:1026 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:1068 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:1111 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:1150 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:1189 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:1228 in public method `forward`:
        D102: Missing docstring in public method
```

pydocstyle torch/nn/modules/upsampling.py --count
before: 14
after: 7

**remaining:**
```
torch/nn/modules/upsampling.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/modules/upsampling.py:142 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/upsampling.py:156 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/upsampling.py:160 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/nn/modules/upsampling.py:166 in public method `extra_repr`:
        D102: Missing docstring in public method
torch/nn/modules/upsampling.py:216 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/upsampling.py:263 in public method `__init__`:
        D107: Missing docstring in __init__
```

pydocstyle torch/nn/modules/rnn.py --count
before: 47
after: 40

**remaining**
```
torch/nn/modules/rnn.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/modules/rnn.py:59 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:160 in public method `__setattr__`:
        D105: Missing docstring in magic method
torch/nn/modules/rnn.py:225 in public method `reset_parameters`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:230 in public method `check_input`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:242 in public method `get_expected_hidden_size`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:256 in public method `check_hidden_size`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:272 in public method `check_forward_args`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:278 in public method `permute_hidden`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:284 in public method `extra_repr`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:305 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/nn/modules/rnn.py:313 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/nn/modules/rnn.py:355 in public method `all_weights`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:471 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:478 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:481 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:503 in public method `forward` (skipping F811):
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:762 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:768 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:771 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:774 in public method `get_expected_cell_size`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:786 in public method `check_forward_args`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:798 in public method `permute_hidden`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:809 in public method `forward` (skipping F811):
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:820 in public method `forward` (skipping F811):
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:1030 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:1036 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:1039 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:1046 in public method `forward` (skipping F811):
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:1054 in public method `forward` (skipping F811):
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:1123 in public class `RNNCellBase`:
        D101: Missing docstring in public class
torch/nn/modules/rnn.py:1134 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:1152 in public method `extra_repr`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:1160 in public method `reset_parameters`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:1224 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:1230 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:1327 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:1332 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:1422 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:1427 in public method `forward`:
        D102: Missing docstring in public method
```

pydocstyle torch/nn/modules/pixelshuffle.py --count
before: 13
after: 8

**remaining:**
```
torch/nn/modules/pixelshuffle.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/modules/pixelshuffle.py:52 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/pixelshuffle.py:56 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pixelshuffle.py:59 in public method `extra_repr`:
        D102: Missing docstring in public method
torch/nn/modules/pixelshuffle.py:105 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/pixelshuffle.py:109 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pixelshuffle.py:112 in public method `extra_repr`:
        D102: Missing docstring in public method
```

pydocstyle torch/nn/modules/sparse.py --count
before: 14
after: 8

**remaining errors:**
```
torch/nn/modules/sparse.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/modules/sparse.py:124 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/sparse.py:153 in public method `reset_parameters`:
        D102: Missing docstring in public method
torch/nn/modules/sparse.py:162 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/sparse.py:167 in public method `extra_repr`:
        D102: Missing docstring in public method
torch/nn/modules/sparse.py:320 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/sparse.py:350 in public method `reset_parameters`:
        D102: Missing docstring in public method
torch/nn/modules/sparse.py:396 in public method `extra_repr`:
        D102: Missing docstring in public method
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113177
Approved by: https://github.com/ezyang
2023-11-14 20:55:22 +00:00
markstur
5540d276ce Fix docstring errors in container.py, _functions.py, transformer.py, comm.py, parallel_apply.py, data_parallel.py, scatter_gather.py (#113250)
Fix docstring errors in container.py, _functions.py, transformer.py, comm.py, parallel_apply.py, data_parallel.py, scatter_gather.py

Fixes #112603

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113250
Approved by: https://github.com/mikaylagawarecki
2023-11-10 21:07:25 +00:00
Alperen ÜNLÜ
cb233dada4 Fix docstrings on torch/nn/modules (#113260)
Fixes #112598

## Description
Fixes the docstrings on following files.

```bash
pydocstyle path-to-file --count
```
| File                                  |  Count  |
| ------------------------------------- | ------- |
| torch/nn/modules/adaptive.py          |  20 -> 4 |
| torch/nn/modules/channelshuffle.py    |  7 -> 4 |
| torch/nn/modules/conv.py              |  37 -> 25 |
| torch/nn/modules/distance.py          |  7 -> 5 |
| torch/nn/modules/dropout.py           |  17 -> 7 |
| torch/nn/modules/flatten.py           |  10 -> 7 |
| torch/nn/modules/fold.py              |  11 -> 7 |
| torch/nn/modules/instancenorm.py      |  13 -> 1 |
| torch/nn/modules/lazy.py              |  11 -> 2 |
| torch/nn/modules/linear.py            |  20 -> 14 |
| torch/nn/modules/normalization.py     |  25 -> 16 |
| torch/nn/modules/padding.py           |  33 -> 19 |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113260
Approved by: https://github.com/mikaylagawarecki
2023-11-10 18:22:48 +00:00
Edward Z. Yang
b4dbb02d46 Adjust _list_with_default to also work with SymInt input (#113073)
Fixes https://github.com/pytorch/pytorch/issues/112496

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113073
Approved by: https://github.com/jbschlosser
2023-11-07 00:59:25 +00:00
Adrian Wälchli
157bda1bf0 Fix pydocstyle errors in torch/nn/module (#112674)
Fixes  #112601

```
pydocstyle torch/nn/modules/module.py  --count
```
On master:
115
After my changes on this PR:
8

The remaining 8 are due to missing docstrings in the magic methods:
```
torch/nn/modules/module.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/modules/module.py:1635 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/nn/modules/module.py:1640 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/nn/modules/module.py:1674 in public method `__getattr__`:
        D105: Missing docstring in magic method
torch/nn/modules/module.py:1689 in public method `__setattr__`:
        D105: Missing docstring in magic method
torch/nn/modules/module.py:1748 in public method `__delattr__`:
        D105: Missing docstring in magic method
torch/nn/modules/module.py:2480 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/nn/modules/module.py:2505 in public method `__dir__`:
        D105: Missing docstring in magic method

```

Should I add them too? Happy to do it, I just wasn't sure if you wanted these documented. Please let me know.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112674
Approved by: https://github.com/mikaylagawarecki
2023-11-02 20:40:56 +00:00
XiaobingSuper
395614c1a4 keep sync bn training flag same with converted bn's training flag (#111998)
When converting BN to SyncBatchNorm, we need to keep the SyncBatchNorm's training flag the same as the original BN's flag. The motivation: if the given model has some BN layers set to training mode and others not, converting to SyncBatchNorm should not change that behavior.
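A minimal sketch of the behavior being preserved (layer sizes are arbitrary; the expected output reflects this change):

```
import torch.nn as nn

model = nn.Sequential(nn.BatchNorm2d(4), nn.BatchNorm2d(4))
model[0].eval()   # one BN frozen, the other left in training mode

sync_model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
print(sync_model[0].training, sync_model[1].training)   # expected: False True
```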

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111998
Approved by: https://github.com/mikaylagawarecki
2023-10-26 08:18:08 +00:00
FFFrog
0e0f6a248d Fix num_batches_tracked of BatchNorm when load_state_dict (#110850)
Fixes #110361

As the title says.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110850
Approved by: https://github.com/mikaylagawarecki
2023-10-24 04:20:38 +00:00
Federico Galatolo
d118531733 Use \odot everywhere instead of mixing \odot and * for the Hadamard product (#111763)
This pull request addresses an inconsistency in the representation of the Hadamard product across PyTorch documentation. Currently, the notation varies among different modules:

- In `torch.nn.LSTM` documentation the Hadamard product is represented with $\odot$
- In `torch.nn.GRU` documentation the Hadamard product is represented with $*$
- In `torch.nn.LSTMCell` documentation the Hadamard product is represented with $*$
- In `torch.nn.GRUCell` documentation the Hadamard product is represented with $*$
- In `torch.ao.nn.quantized.dynamic.GRU` documentation the Hadamard product is represented with $*$

This PR proposes consistently representing the Hadamard product throughout the documentation to enhance clarity and align with established standards.
The notation $\odot$ will be uniformly adopted, following the convention in the [Deep Learning Book](https://www.deeplearningbook.org/contents/linear_algebra.html).
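For example, with the unified notation the LSTM cell-state and hidden-state updates read $c_t = f_t \odot c_{t-1} + i_t \odot g_t$ and $h_t = o_t \odot \tanh(c_t)$.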

**Changes Made:**

- Modified `torch.nn.GRU` documentation to represent the Hadamard product with $\odot$
- Modified `torch.nn.LSTMCell` documentation to represent the Hadamard product with $\odot$
- Modified `torch.nn.GRUCell` documentation to represent the Hadamard product with $\odot$
- Modified `torch.ao.nn.quantized.dynamic.GRU` documentation to represent the Hadamard product with $\odot$

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111763
Approved by: https://github.com/albanD
2023-10-22 21:01:35 +00:00
Aniket Patil
6f06832219 Fixed typo in activation.py (#111358)
liner -> linear
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111358
Approved by: https://github.com/mikaylagawarecki
2023-10-16 20:36:55 +00:00
isdanni
382327bd0e [BE] Enable Ruff's Flake8 PYI034 (#111105)
Enable [non-self-return-type (PYI034)](https://docs.astral.sh/ruff/rules/non-self-return-type/#non-self-return-type-pyi034)

Link: #110950

**EDIT**: to newly added reviewers, please ignore the request, it's due to a rebase error 😅

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111105
Approved by: https://github.com/Skylion007
2023-10-13 21:19:53 +00:00
isdanni
6c7013a3dc [Doc] Add weight dtype in torch.nn.CrossEntropyLoss (#110998)
Fixes #101213

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110998
Approved by: https://github.com/albanD
2023-10-11 19:52:13 +00:00
Sehoon Kim
c36b31d530 torch::nn::AdaptiveLogSoftmaxWithLoss: check length of cutoffs (#106777)
Fixes #106698

Also added a check for the Python API, because the current error message
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sehoon/pytorch-latest/torch/nn/modules/adaptive.py", line 128, in __init__
    or (min(cutoffs) <= 0) \
ValueError: min() arg is an empty sequence
```
is not very comprehensible.
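A minimal repro of the confusing failure (sizes are arbitrary):

```
import torch.nn as nn

try:
    # an empty cutoffs list used to die inside min() with the unhelpful
    # "min() arg is an empty sequence" traceback shown above
    nn.AdaptiveLogSoftmaxWithLoss(in_features=16, n_classes=100, cutoffs=[])
except ValueError as e:
    print(e)
```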

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106777
Approved by: https://github.com/albanD
2023-10-05 05:35:47 +00:00
FFFrog
bc3f0d341a LazyBatchNorm{1-3}d support dict&set (#109015)
Fixes #105292

As the title says, LazyBatchNorm didn't support dict & set; this keeps it consistent with BatchNorm{1-3}d.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109015
Approved by: https://github.com/mikaylagawarecki
2023-09-12 09:09:59 +00:00
FFFrog
003c5bb156 Add checks to num_layers for RNN, LSTM, GRU (#108853)
Fixes #108223

As the title says.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108853
Approved by: https://github.com/mikaylagawarecki
2023-09-09 19:33:52 +00:00
Randolf Scholz
8391e3fba4 fixed nn.Module.to type hint (#108767)
Fixes #108675

- [x] adds `str` as option for `device`
- [x] use `typing_extensions.Self` instead of `T` (see the sketch below).
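A sketch of the typing pattern (not the exact PyTorch stub) showing why `Self` preserves the subclass type:

```
from __future__ import annotations
from typing_extensions import Self
import torch

# Sketch only; the real nn.Module.to overloads are more involved.
class Module:
    def to(self, device: int | str | torch.device | None = None) -> Self:
        return self

class MyModule(Module):
    pass

m: MyModule = MyModule().to("cpu")   # with Self, mypy keeps the subclass type
```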

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108767
Approved by: https://github.com/ezyang
2023-09-08 02:40:53 +00:00
Minh-Long Luu (刘明龙)
95f268e426 Add examples for nn.CosineEmbeddingLoss (#108215)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108215
Approved by: https://github.com/mikaylagawarecki
2023-08-31 20:01:24 +00:00
hasteinmetz
b535ed2c1a Update to RNN documentation (issue #106085) (#106222)
Addresses [issue #106085](https://github.com/pytorch/pytorch/issues/106085).

In `torch/nn/modules/rnn.py`:
- Adds documentation string to RNNBase class.
- Adds parameters to __init__ methods for the RNN, LSTM, and GRU classes.
- Adds type annotations to __init__ methods for RNN, LSTM, and GRU.

In `torch/ao/nn/quantized/dynamic/modules/rnn.py`:
- Adds type specifications to `_FLOAT_MODULE` attributes in RNNBase, RNN, LSTM, and GRU classes.
> This resolves a `mypy` assignment error `Incompatible types in assignment (expression has type "Type[LSTM]", base class "RNNBase" defined the type as "Type[RNNBase]")` that seemed to be a result of fully specified type annotations in `torch/nn/modules/rnn.py`.
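A sketch of the usual fix for this class of mypy error (class names here are hypothetical; only the annotation pattern is the point, and it may differ from the exact change in the PR):

```
from typing import ClassVar, Type

import torch.nn as nn

class QuantizableRNNBase:
    # annotating the attribute with the base class lets subclasses assign
    # more specific module types without an incompatible-assignment error
    _FLOAT_MODULE: ClassVar[Type[nn.RNNBase]] = nn.RNNBase

class QuantizableLSTM(QuantizableRNNBase):
    _FLOAT_MODULE: ClassVar[Type[nn.RNNBase]] = nn.LSTM
```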
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106222
Approved by: https://github.com/mikaylagawarecki
2023-08-31 00:50:32 +00:00
Mikayla Gawarecki
584a01b650 Fix LayerNorm(bias=False) error (#108060)
Fixes #108048

- [ ] Cherry pick this [here](https://github.com/pytorch/pytorch/issues/108055)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108060
Approved by: https://github.com/jbschlosser, https://github.com/albanD, https://github.com/malfet
2023-08-28 18:23:13 +00:00
dvorst
91a674ccd4 Fix docstring for shape of target for MultiLabelSoftMarginLoss (#107817)
Fixes #92000

The documentation at https://pytorch.org/docs/stable/generated/torch.nn.MultiLabelSoftMarginLoss.html#multilabelsoftmarginloss states:
> label targets padded by -1 ensuring same shape as the input.

However, the shapes of the input and target tensors are compared, and an exception is raised if they differ in either dimension 0 or 1, meaning the label targets are never padded. See the code snippet below and the resulting output. The documentation is therefore adjusted to:
> label targets must have the same shape as the input.

```
import torch
import torch.nn as nn

# Create some example data
input = torch.tensor(
    [
        [0.8, 0.2, -0.5],
        [0.1, 0.9, 0.3],
    ]
)
target1 = torch.tensor(
    [
        [1, 0, 1],
        [0, 1, 1],
        [0, 1, 1],
    ]
)
target2 = torch.tensor(
    [
        [1, 0],
        [0, 1],
    ]
)
target3 = torch.tensor(
    [
        [1, 0, 1],
        [0, 1, 1],
    ]
)
loss_func = nn.MultiLabelSoftMarginLoss()
try:
    loss = loss_func(input, target1).item()
except RuntimeError as e:
    print('target1 ', e)
try:
    loss = loss_func(input, target2).item()
except RuntimeError as e:
    print('target2 ', e)
loss = loss_func(input, target3).item()
print('target3 ', loss)
```

output:
```
target1  The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 0
target2  The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 1
target3  0.6305370926856995
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107817
Approved by: https://github.com/mikaylagawarecki
2023-08-24 15:13:46 +00:00
Mikayla Gawarecki
48b1208e05 Disable nn.MHA fastpath for floating point masks (#107641)
Fixes https://github.com/pytorch/pytorch/issues/107084 by disabling the fast path when floating point masks (which should be additive) are passed

- [We claim in our docs for MHA that float masks will be added to the attention](https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html) (be it `key_padding_mask` or `attn_mask`)
- We always canonicalize any mask at the start of MHA in python by converting it to float
- my understanding from Driss is that SDPA properly supports additive masking, but there are many special cases for mask shape for MHA that don't work properly currently (BxT, TxT), so [we're turning this off for now](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/transformers/cuda/attention.cu#L531-L532)
- More broadly, the problem isn't with the SDPA path, but that things are broken for the path it falls back to
- Right now mha "fast path" code with non-None masks is always going through [this path](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/transformers/cuda/attention.cu#L554-L640) that has a call to `masked_softmax` that [converts the masks back to bool](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/transformers/attention.cpp#L154-L156)
- the implication here is that **additive floating point attn_mask and additive key_padding_mask to the nn.MHA fastpath are broken** (see the sketch after this list)
- This wasn't broken for the user in [#107084](https://github.com/pytorch/pytorch/issues/107084) in 1.13.1 because of [this check which bypassed the fast path if attn_mask was defined](https://github.com/pytorch/pytorch/blob/v1.13.1/torch/nn/modules/activation.py#L1096-L1097) (as Driss pointed out, though, additive key_padding_mask with the fast path was probably broken in 1.13.1)
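For reference, a minimal example of the additive float `attn_mask` usage in question (shapes are illustrative; in inference mode this is the kind of call that could otherwise take the fast path):

```
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True).eval()
q = torch.randn(2, 4, 8)                  # (batch, seq, embed)

# additive float mask: 0 keeps a position, -inf masks it out
attn_mask = torch.zeros(4, 4)
attn_mask[:, -1] = float("-inf")

with torch.no_grad():
    out, _ = mha(q, q, q, attn_mask=attn_mask)
print(out.shape)                          # torch.Size([2, 4, 8])
```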

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107641
Approved by: https://github.com/drisspg, https://github.com/jbschlosser
2023-08-23 15:08:18 +00:00
Aaron Gokaslan
660e8060ad [BE]: Update ruff to 0.285 (#107519)
This updates ruff to 0.285, which is faster, better, and fixes a bunch of false negatives with regard to f-strings.

I also enabled RUF017, which looks for accidental quadratic list summation. Luckily, it seems there are no instances of it in our codebase, so I'm enabling it so that it stays that way. :)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
2023-08-22 23:16:38 +00:00
PyTorch MergeBot
d59a6864fb Revert "[BE]: Update ruff to 0.285 (#107519)"
This reverts commit 88ab3e4322.

Reverted https://github.com/pytorch/pytorch/pull/107519 on behalf of https://github.com/ZainRizvi due to Sorry, but this PR breaks internal tests. @ezyang, can you please help them get unblocked? It seems like one of the strings was prob accidentally modified ([comment](https://github.com/pytorch/pytorch/pull/107519#issuecomment-1688833480))
2023-08-22 19:53:32 +00:00
Aaron Gokaslan
88ab3e4322 [BE]: Update ruff to 0.285 (#107519)
This updates ruff to 0.285, which is faster, better, and fixes a bunch of false negatives with regard to f-strings.

I also enabled RUF017, which looks for accidental quadratic list summation. Luckily, it seems there are no instances of it in our codebase, so I'm enabling it so that it stays that way. :)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
2023-08-20 01:36:18 +00:00
박종의 (PARK, Jongeui)
af0ed25ea8 Change >= in the GRU and the LSTM document to \ge (#107379)
Change >= in the GRU document to \ge
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107379
Approved by: https://github.com/ezyang
2023-08-18 20:44:51 +00:00
FFFrog
2d2d43d9fb add more check on LSTMCell (#107380)
Just like #107223, the ``LSTMCell`` operator has the same problems as ``GRUCell``; this adds some checks and related tests to fix it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107380
Approved by: https://github.com/ezyang
2023-08-18 20:44:17 +00:00
FFFrog
a4229690e3 Add Some Checks about dim (#107223)
Fixes #106769

As mentioned in [GRUCell](https://pytorch.org/docs/stable/generated/torch.nn.GRUCell.html#grucell), `hidden` should have the same dimension as `input`, and the dimension should be either `1D` or `2D`.

Other aspects, such as the batch sizes of `input` and `hidden` matching, `input`'s dim 1 matching `input_size`, and `hidden`'s dim 1 matching `hidden_size`, are already verified in `C++`.
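A minimal sketch of the documented contract (sizes are arbitrary):

```
import torch
import torch.nn as nn

cell = nn.GRUCell(input_size=10, hidden_size=20)

x = torch.randn(3, 10)     # (batch, input_size)
hx = torch.randn(3, 20)    # (batch, hidden_size) -- same number of dims as the input
out = cell(x, hx)
print(out.shape)           # torch.Size([3, 20])
# mixing a 2D input with a 1D hidden state is the kind of mismatch the new checks reject
```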
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107223
Approved by: https://github.com/albanD
2023-08-16 22:03:31 +00:00
Mikayla Gawarecki
b08b0c915f [easy] Fix docs for sd calculation in BatchNorm1d/3d for consistency with BatchNorm2d (#107308)
Fixes https://github.com/pytorch/pytorch/issues/100048

BatchNorm2d docs were updated in https://github.com/pytorch/pytorch/pull/97974. There have been a number of issues filed due to confusion about this so I think we should fix before branch cut

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107308
Approved by: https://github.com/albanD
2023-08-16 21:51:02 +00:00
Guang Yang
0b57581dec [pytorch] Disable fast path in MultiheadAttention in Export (#106824)
Summary:
We are seeing that the `aten._native_multi_head_attention` op (not in the core ATen op set) is left in the exported graph and causes problems downstream at runtime.

Two proposed solutions:
 1. Disable the fast path while tracing to leverage the non-optimized path and get the decomposition; that way, the blamed op won't show up in the exported graph
 2. Add a decomp rule for `aten._native_multi_head_attention`

After discussing with kimishpatel and bdhirsh, option 1 is preferred; we verified it could immediately unblock the critical model enablement work for PP.

Test Plan: CI

Differential Revision: D48169806

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106824
Approved by: https://github.com/kimishpatel
2023-08-10 00:18:37 +00:00
Jason Lu
bc88028e8e Back out "Reland "Make adding buffers more like adding parameters (#104069)" (#106224)" (#106743)
Summary:
Original commit changeset: 81319beb97f3

Original Phabricator Diff: D47961182

Test Plan: revert to maintain backward compat with legacy ads_dper3 production package. Read details in: S357822

Reviewed By: atuljangra

Differential Revision: D48131623

@diff-train-skip-merge
(D48131623 landed internally)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106743
Approved by: https://github.com/malfet
2023-08-08 15:27:34 +00:00
Mikayla Gawarecki
1317dbf176 Reland "Add nn.CircularPad{*}d for consistency + fix no_batch_dim support (#106148)" (#106632)
The previous one was reverted because the PR stacked under it, https://github.com/pytorch/pytorch/pull/106147, which added error checking to Pad variants, was reverted: internally some people pass 2D inputs to ZeroPad2d (which should actually take 3D or 4D inputs :)). But to my understanding there wasn't actually anything this PR itself was breaking.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106632
Approved by: https://github.com/albanD
2023-08-07 20:10:25 +00:00
Mikayla Gawarecki
786977c647 [easy] Add reset_parameters for nn.PRelu (#106507)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106507
Approved by: https://github.com/albanD
2023-08-04 23:22:42 +00:00
Michael Gschwind
63d45275f4 is causal hints for transformer (#106143)
Summary:
make is_causal hint flags available for the top level transformer module.

It's debatable whether this is useful -- at present we autodetect causal masks for src and tgt masks in the transformer encoder and decoder, respectively. Making is_causal flags available would enable users to short-cut this check by asserting whether their mask is causal or not.

I am putting this diff up for discussion, not as a solution. Not doing anything may be the right solution unless there is strong (data-driven) user demand -- it appears the consensus is to move ahead with this, as per the discussions below.
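For illustration, the kind of top-level call this would enable (a sketch only, assuming the flags land as `src_is_causal`/`tgt_is_causal`, mirroring the existing encoder/decoder arguments; sizes are arbitrary):

```
import torch
import torch.nn as nn

model = nn.Transformer(d_model=16, nhead=2, num_encoder_layers=1, num_decoder_layers=1)
src = torch.rand(5, 2, 16)   # (seq, batch, d_model)
tgt = torch.rand(5, 2, 16)

tgt_mask = model.generate_square_subsequent_mask(5)
out = model(src, tgt, tgt_mask=tgt_mask, tgt_is_causal=True)   # assert the mask is causal
print(out.shape)             # torch.Size([5, 2, 16])
```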

@cpuhrsch @mikaylagawarecki @jbschlosser @janEbert

Test Plan: sandcastle

Differential Revision: D47373260

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106143
Approved by: https://github.com/mikaylagawarecki
2023-08-04 14:16:48 +00:00
Michael Gschwind
3db255020b Clarify the clarification (#106358)
Summary: Clarify the clarification

Differential Revision: D47941982

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106358
Approved by: https://github.com/mikaylagawarecki
2023-08-03 16:58:36 +00:00
PyTorch MergeBot
dfcfd5cedb Revert "Add nn.CircularPad{*}d for consistency + fix no_batch_dim support (#106148)"
This reverts commit 87d2536971.

Reverted https://github.com/pytorch/pytorch/pull/106148 on behalf of https://github.com/malfet due to Reverting as dependent PR https://github.com/pytorch/pytorch/pull/106147 was reverted as well ([comment](https://github.com/pytorch/pytorch/pull/106148#issuecomment-1662344543))
2023-08-02 14:46:00 +00:00
PyTorch MergeBot
d83b887f2a Revert "Add error checking for padding modules (#106147)"
This reverts commit 0547b6279d.

Reverted https://github.com/pytorch/pytorch/pull/106147 on behalf of https://github.com/jeanschmidt due to sadly it is breaking internal builds, and I can't coordinate a FF due to timezone differences ([comment](https://github.com/pytorch/pytorch/pull/106147#issuecomment-1661870970))
2023-08-02 09:37:40 +00:00
Danni Li
5e3aca6c5c [BE] Input check for torch.nn.MultiheadAttention (#106363)
Summary: Check `embed_dim` and `num_heads` of `torch.nn.MultiheadAttention`.
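A sketch of the kind of input this check catches (assuming the usual constraint that `embed_dim` must be divisible by `num_heads`):

```
import torch.nn as nn

try:
    nn.MultiheadAttention(embed_dim=10, num_heads=3)   # 10 is not divisible by 3
except (AssertionError, ValueError) as e:
    print(e)
```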

Test Plan: Please see GitHub Actions.

Differential Revision: D47943134

Fix: #105630

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106363
Approved by: https://github.com/mikaylagawarecki
2023-08-01 23:28:23 +00:00
Danni Li
1a6f1d816d [Doc] Add proj_size < hidden_size in LSTM (#106364)
Summary:
Add a parameter constraint to the RNNBase doc: `proj_size` has to be smaller than `hidden_size`.

Ref:
ceea08a986/torch/nn/modules/rnn.py (L83)

ceea08a986/torch/nn/modules/rnn.py (L458)
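A quick illustration of the constraint (sizes are arbitrary):

```
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, proj_size=8)   # ok: proj_size < hidden_size
out, (h, c) = lstm(torch.randn(5, 3, 10))
print(h.shape, c.shape)   # torch.Size([1, 3, 8]) torch.Size([1, 3, 20])

# nn.LSTM(input_size=10, hidden_size=20, proj_size=20)       # raises: proj_size must be smaller than hidden_size
```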

Test Plan: Please see GitHub Actions.

Differential Revision: D47943365

Fix: #105628

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106364
Approved by: https://github.com/mikaylagawarecki
2023-08-01 18:58:27 +00:00
Mikayla Gawarecki
87d2536971 Add nn.CircularPad{*}d for consistency + fix no_batch_dim support (#106148)
Fixes #105749 and https://github.com/pytorch/pytorch/issues/95320

tl;dr: input should always be `[N, C, H (, W, D)]`, where only the H, W, and D dimensions get circular padding. So in the 2D case where the user wants both dimensions to be padded, they should `.unsqueeze(0)` (as is the case for `Reflection/ReplicationPad`), but we didn't document this for circular padding. [This seems to be the old docstring](277b05014a/torch/nn/functional.py (L4689)) that was somehow lost.
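A minimal sketch of the unsqueeze pattern described above, using the `nn.CircularPad2d` module this PR adds (values are illustrative):

```
import torch
import torch.nn as nn

x = torch.arange(9.).reshape(3, 3)                       # plain 2D input; both dims should wrap
padded = nn.CircularPad2d(1)(x.unsqueeze(0)).squeeze(0)  # add the channel dim, pad H and W, drop it again
print(padded.shape)                                      # torch.Size([5, 5])
```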

Fixes no_batch_dim support https://github.com/pytorch/pytorch/issues/104860

- Adds missing documentation for circular padding
- Adds missing CircularPad modules
- Migrates legacy test_nn tests from circular padding to ModuleInfo
- Adds no_batch_dim support + sample inputs that test this

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106148
Approved by: https://github.com/albanD
ghstack dependencies: #106325, #106147
2023-08-01 12:49:58 +00:00
Mikayla Gawarecki
0547b6279d Add error checking for padding modules (#106147)
Fixes https://github.com/pytorch/pytorch/issues/105627

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106147
Approved by: https://github.com/albanD
ghstack dependencies: #106325
2023-08-01 12:49:58 +00:00
Mikayla Gawarecki
d8e5f2aa6d Reland "Make adding buffers more like adding parameters (#104069)" (#106224)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106224
Approved by: https://github.com/atalman, https://github.com/albanD
2023-07-31 17:18:56 +00:00
chunyuan
cb6c3cbc91 inductor: enable weight prepack for LSTM (#103071)
- Enabled LSTM weight prepack in inductor.
- Added an mkldnn decomposition for LSTM which won't change for different `seq_lens`. With the previous decomposition, for the dynamic-shapes use case where `seq_lens` changes, the graph would be different.
- Extended several inductor utility functions to support `List[Tensor]` as input. Previously those functions only supported `Tensor` input.

**Update 2023-07-26:**
- https://github.com/pytorch/pytorch/pull/103851 has moved CPU weight packing to be after AOTAutograd. Fixed the support in this PR to follow the same way (mainly in 3b207f7f1c (diff-6dffed1ade0ba3e887f9a4eafa3bfcec267ab2365b8adcb91bd391f49b3fd2e3)).
LSTM is decomposed in `aten.mkldnn_rnn_layer` by layer and by direction. The weight prepack is done at the `mkldnn_rnn_layer` level.
- Added a fix in the rnn `__getstate__` function in case we need to recompile an `LSTM` module.
When compiling the module, the weights tensors which are the `named_parameters` of the module are converted to `functional_tensor` here:
76fb72e24a/torch/nn/utils/stateless.py (L125-L128)
The forward function of LSTM will be called:
76fb72e24a/torch/_functorch/aot_autograd.py (L3379-L3381)
In the forward function, the `_flat_weights` are updated to be the same as the weights, thus becoming `functional_tensor`:
76fb72e24a/torch/nn/modules/rnn.py (L775-L778)
The weights tensors are converted back to the original tensors (which are not `functional_tensor` anymore) before exiting the `_reparametrize_module` context here:
76fb72e24a/torch/nn/utils/stateless.py (L130-L142)
But since `_flat_weights` is not in the `named_parameters` of the module, it's still `functional_tensor` ([link of the parameters that will be converted to functional and reverted back](76fb72e24a/torch/_functorch/aot_autograd.py (L3695-L3698))).
At this moment, if we need to recompile the model, `deepcopy` will be called:
76fb72e24a/torch/_dynamo/utils.py (L915-L917)
And it will report `UnImplemented` since we have `functional_tensor` (`_flat_weights`) and will trigger graph break which is not what we expect:
76fb72e24a/torch/_subclasses/meta_utils.py (L514)
Added a fix in `__getstate__` to update `_flat_weights` if the weights have ever changed, to fix this issue. The fix is covered in the `test_lstm_packed` UT.
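For context, the user-level flow that exercises this path (a minimal sketch; the weight prepacking itself happens inside the inductor CPU backend):

```
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=16, hidden_size=32, num_layers=1).eval()
compiled = torch.compile(lstm)    # inductor is the default backend

x = torch.randn(8, 4, 16)         # (seq, batch, input_size)
with torch.no_grad():
    out, (h, c) = compiled(x)
print(out.shape)                  # torch.Size([8, 4, 32])
```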

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103071
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-07-28 13:54:32 +00:00
Michael Gschwind
723bc136a1 Add context for warning about batch_first (#106139)
Summary: Add context for warning about batch_first

Test Plan: sandcastle github

Differential Revision: D47809651

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106139
Approved by: https://github.com/mikaylagawarecki
2023-07-27 23:02:05 +00:00
Mikayla Gawarecki
ca7ece9b50 [easy] improve hint on error message in nn.Module.load_state_dict (#106042)
Fix #105963

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106042
Approved by: https://github.com/albanD
2023-07-27 19:56:02 +00:00