This patch effectively ignores `traceable_tensor_subclasses`, allowing
Dynamo to always try tracing into the `__torch_function__` of tensor
subclasses. This helps us with 2 things:
1. allowing users to directly benefit from better compilation of tensor
subclasses just by upgrading PyTorch, without having to change legacy
library code (see earlier patches in the stack for examples); a minimal
example of the kind of subclass this now covers is shown after this list.
2. potentially exposing more issues in compiling tensor subclasses, so we
can get signals and improve them.
As a consequence, this patch exposes and fixes 2 subtle bugs:
1. In `build_torch_function_fn`, we could get
`torch._C._disabled_torch_function_impl`, either because we have a
`Parameter` subclass without a `__torch_function__` override, or because
we have a tensor subclass with a `__torch_dispatch__` override. We graph
break on this for now, and plan to add support -- the logic for
simulating `torch._C._disabled_torch_function_impl` is already in
`SuperVariable`; we just need to reuse it.
2. Sometimes we create a `SyntheticLocalSource` and need to remove all the
guards installed on it, but we only removed the ones whose source
_is_ the created synthetic source `s` and forgot about chained
sources like `s.foo`. This showed up as
`SYNTHETIC_LOCAL['tmp_0'].__torch_function__.__func__`.
Differential Revision: [D71906141](https://our.internmc.facebook.com/intern/diff/D71906141)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149792
Approved by: https://github.com/jansel, https://github.com/mlazos
ghstack dependencies: #149482, #149483, #149484
Fixes #103425
## Changes
- Add a doc note that the size values `must be > 0`
- Add validation for the `in1_features` parameter (a rough sketch of the check appears after the repro below)
Currently, only `in1_features` causes a runtime error; adding checks for `in2_features` and `out_features` as well might be somewhat BC-breaking.
```python
import torch
from torch import nn
class lenet(nn.Module):
    def __init__(self):
        super(lenet, self).__init__()
        self.conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5, stride=1)
        # Raises an error; with `in1_features=1, in2_features=0, out_features=0` there is no error
        self.linear = nn.Bilinear(in1_features=0, in2_features=0, out_features=0)

    def forward(self, x):
        # 1st block
        x = self.conv(x)
        x = self.linear(x)
        return x


if __name__ == '__main__':
    net = lenet()
```
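For reference, a rough sketch of the kind of check this adds (the real validation lives in `nn.Bilinear.__init__`; the exact message and placement may differ):
```python
from torch import nn

# Hypothetical helper illustrating the check added for `in1_features`;
# per the note above, `in2_features`/`out_features` are left unchecked for BC reasons.
def make_bilinear(in1_features: int, in2_features: int, out_features: int) -> nn.Bilinear:
    if in1_features <= 0:
        raise ValueError(f"in1_features must be > 0, but got {in1_features}")
    return nn.Bilinear(in1_features, in2_features, out_features)
```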
## Test Result
```bash
pytest test/test_nn.py -k test_bilinear -vv
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149018
Approved by: https://github.com/mikaylagawarecki
The current implementation reads as: we only actually use the "python_reducer" config if the DDP forward is compiled. Otherwise, we silently fall back to the C++ reducer with no DDPOptimizer.
I'm changing this behavior to always use the Python reducer if the config is specified.
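For context, a minimal sketch of the usage this affects (the config knob spelling, `torch._dynamo.config.optimize_ddp = "python_reducer"`, is my assumption of what the PR refers to):
```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumed spelling of the config; check torch._dynamo.config for the authoritative name.
torch._dynamo.config.optimize_ddp = "python_reducer"

if __name__ == "__main__":
    # Single-process group just so DDP can be constructed in this sketch.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)

    # Only the inner module is compiled; DDP.forward itself is not.
    model = DDP(torch.compile(nn.Linear(8, 8)))
    out = model(torch.randn(4, 8))
    # Per this change, the Python reducer should now be used here instead of
    # silently falling back to the C++ reducer.
    out.sum().backward()

    dist.destroy_process_group()
```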
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147123
Approved by: https://github.com/fegin
This enables a check that a class which only inherits from immutable classes like `str`, `tuple`, and `NamedTuple` also defines `__slots__`, so its instances don't allocate memory unnecessarily. It also ensures contributors think about how they define classes that subclass `NamedTuple` and `str`, of which we have many in our codebase.
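For example, a minimal illustration of the pattern the lint enforces:
```python
from typing import NamedTuple

class Point(NamedTuple):
    x: int
    y: int

# Flagged: subclassing an immutable class without declaring __slots__
# gives every instance an unnecessary __dict__.
class LabeledPointBad(Point):
    pass

# Preferred: declare (empty) __slots__ so no per-instance __dict__ is allocated.
class LabeledPointGood(Point):
    __slots__ = ()
```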
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146276
Approved by: https://github.com/aorenste
The `CrossEntropyLoss` function requires that targets given as class indices are provided as a long dtype and targets given as class probabilities are provided as a float dtype.
`CrossEntropyLoss` distinguishes the two scenarios (indices vs. probabilities) by comparing shapes: when the input and target shapes are the same, the target is treated as probabilities; otherwise it is used as class indices, as already covered in the doc. The related code is here:
https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/LossNLL.cpp#L624
I think the current documentation is great, but it can still confuse users about dtypes, as reported in the issue, so this PR adds a bit more clarification.
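For example, the two target conventions side by side:
```python
import torch
import torch.nn as nn

loss = nn.CrossEntropyLoss()
logits = torch.randn(4, 5)                        # (batch, num_classes)

# Class indices: target shape (batch,), dtype long.
target_indices = torch.tensor([0, 3, 1, 4])
loss(logits, target_indices)

# Class probabilities: target shape matches the input, dtype float.
target_probs = torch.softmax(torch.randn(4, 5), dim=1)
loss(logits, target_probs)
```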
Fixes #137188
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145444
Approved by: https://github.com/mikaylagawarecki
Currently the `bias` attribute of `torch.nn.Linear` (and `Bilinear`) is typed incorrectly, because it relies on the implicit `Module.__getattr__` which types it as `Tensor | Module`. This has two issues:
- It hides the fact that `bias` is optional and can be `None`, which in turn can hide actual bugs on the user side.
- It blurs the type by including `Module` in the union, which can require unnecessary `isinstance(linear.bias, Tensor)` checks on the user side.
This PR types the `bias` attribute explicitly to fix these issues.
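Conceptually, the change amounts to declaring the attribute explicitly, roughly like this (a sketch; the exact annotation in the diff may differ):
```python
from typing import Optional

import torch
from torch import Tensor

class MyLinear(torch.nn.Module):
    weight: Tensor
    bias: Optional[Tensor]  # the Optional is visible, and Module is no longer in the union
```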
CC @ezyang @Skylion007
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142326
Approved by: https://github.com/ezyang
# Summary
- Adds support for non-power-of-2 head_dim by launching blocks with head_dim rounded up to the next valid power of 2 (a tiny sketch of the rounding is shown after this list).
- The other option I considered was building up the final dot products with smaller blocks; that would probably work, but for the sake of code complexity I'm going with this option for now.
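A tiny sketch of the rounding mentioned above (not the kernel code itself):
```python
# Pad head_dim up to the next power of 2 for the launched block; the padded
# lanes are masked out inside the kernel.
def next_power_of_2(n: int) -> int:
    return 1 << (n - 1).bit_length()

assert next_power_of_2(96) == 128   # e.g. head_dim=96 runs in blocks sized for 128
assert next_power_of_2(64) == 64    # powers of 2 are unchanged
```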
### Corollary
We had a bug in our backwards kernel where we were using `index_k` instead of `index_v`. This should have shown up for the `qk_head_dim != v_head_dim` cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133495
Approved by: https://github.com/Chillee