Mikayla Gawarecki
671553bd23
Update documentation wording for transformer-related layers ( #155123 )
...
<img width="947" alt="Screenshot 2025-06-04 at 1 33 53 PM" src="https://github.com/user-attachments/assets/4dbb66b3-43f4-4d04-afb5-dc80cec0f2cd " />
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155123
Approved by: https://github.com/albanD , https://github.com/jbschlosser
2025-06-04 22:20:32 +00:00
zeshengzong
31d12b3955
Fix avg_pool2d param kernel_size description ( #154353 )
...
Fixes part of #153149
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154353
Approved by: https://github.com/colesbury
2025-06-04 11:55:01 +00:00
ILCSFNO
a69da90a9f
Add padding limit for avg_poolnd and AvgPoolNd ( #152680 )
...
Fixes #152156
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152680
Approved by: https://github.com/mikaylagawarecki
2025-05-04 17:25:22 +00:00
zeshengzong
7e2081fa93
Optimize interpolate saturate description ( #151304 )
...
Fixes #108225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151304
Approved by: https://github.com/albanD
Co-authored-by: albanD <desmaison.alban@gmail.com>
2025-04-17 18:34:29 +00:00
Olaf Lipinski
0a6e1d6b9b
Expand docs for nn.functional, and make the wording consistent ( #148436 )
...
Expands the docs for the loss functions, and makes the wording consistent.
Fixes #148353
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148436
Approved by: https://github.com/albanD
2025-04-14 19:37:12 +00:00
jPorterDosch
3e9f4f3f78
docs: allow empty targets tensor in ctc_loss ( #151080 )
...
docs: allow an empty targets tensor in ctc_loss when target_lengths are zero, as described in the issue below.
Fixes #150995
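A minimal sketch of the call pattern the updated docs describe (shapes and the blank index are illustrative assumptions):
```python
import torch
import torch.nn.functional as F

# Illustrative shapes: T=50 time steps, N=2 batch items, C=20 classes, blank index 0.
log_probs = torch.randn(50, 2, 20).log_softmax(2)
targets = torch.tensor([], dtype=torch.long)         # empty targets tensor
input_lengths = torch.full((2,), 50, dtype=torch.long)
target_lengths = torch.zeros(2, dtype=torch.long)    # every target length is zero

# With all target_lengths zero, an empty targets tensor is valid per the updated docs.
loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths)
print(loss)
```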
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151080
Approved by: https://github.com/albanD
2025-04-12 05:26:54 +00:00
zeshengzong
4a545eb85d
Fix torch.nn.functional.one_hot param num_classes optional description ( #146470 )
...
The `torch.nn.functional.one_hot` [documentation](https://pytorch.org/docs/stable/generated/torch.nn.functional.one_hot.html) describes the `num_classes` parameter as required, but the function can be called without passing it.

```python
>>> import torch
>>> a = torch.arange(0, 5) % 3 # [0,1,2,0,1]
>>> torch.nn.functional.one_hot(a)
tensor([[1, 0, 0],
[0, 1, 0],
[0, 0, 1],
[1, 0, 0],
[0, 1, 0]])
```
`num_classes` has a default value of -1:
93d98aca31/aten/src/ATen/native/native_functions.yaml (L6154-L6157)
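For comparison, a small follow-up showing the explicit form for the same input (the wider one-hot rows are padded with zeros):
```python
>>> torch.nn.functional.one_hot(a, num_classes=5)  # explicit width
tensor([[1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0]])
```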
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146470
Approved by: https://github.com/albanD
2025-02-06 07:48:05 +00:00
Aaron Orenstein
0afd335174
PEP585 update - torch/nn torch/optim torch/package torch/profiler torch/serialization torch/sparse torch/xpu ( #145175 )
...
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145175
Approved by: https://github.com/bobrenjc93
2025-01-21 16:57:27 +00:00
PyTorch MergeBot
5fd881a5b6
Revert "PEP585 update - torch/nn torch/optim torch/package torch/profiler torch/serialization torch/sparse torch/xpu ( #145175 )"
...
This reverts commit 54a00af2c6 .
Reverted https://github.com/pytorch/pytorch/pull/145175 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to break some trunk tests ([comment](https://github.com/pytorch/pytorch/pull/145175#issuecomment-2603418267 ))
2025-01-21 00:49:55 +00:00
Aaron Orenstein
54a00af2c6
PEP585 update - torch/nn torch/optim torch/package torch/profiler torch/serialization torch/sparse torch/xpu ( #145175 )
...
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145175
Approved by: https://github.com/bobrenjc93
2025-01-20 22:32:59 +00:00
cyy
d87aad6877
[5/N] Apply Ruff fixes and pyupgrade to Python 3.9 ( #144205 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144205
Approved by: https://github.com/albanD
2025-01-15 04:00:47 +00:00
Mikayla Gawarecki
b8f383107e
Link to transformer tutorial in transformer docs ( #144425 )
...
<img width="1045" alt="Screenshot 2025-01-08 at 4 50 20 PM" src="https://github.com/user-attachments/assets/05adfecb-8a23-4c48-9a2c-50c5b3f886b0 " />
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144425
Approved by: https://github.com/albanD
2025-01-09 17:42:09 +00:00
Angela Yi
a9d84875a9
Fix mha torch._check in jit tracing ( #142059 )
...
Test Plan: `buck2 run @//mode/dev-nosan //mobile-vision/d2go/projects_oss/detr:tests -- -r test_detr_fbnet_export`
Differential Revision: D66769339
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142059
Approved by: https://github.com/ezyang
2024-12-05 18:38:17 +00:00
angelayi
80705d3abf
Convert assert to torch._check in MHA ( #141918 )
...
Fixes https://github.com/pytorch/pytorch/issues/139610
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141918
Approved by: https://github.com/ezyang
2024-12-03 21:58:02 +00:00
Michael Diggin
723498aab8
Gaussian nll loss scalar variance support ( #138931 )
...
Fixes #138747
Adds support for `variance` being either a Tensor or a float in `gaussian_nll_loss`, avoiding a CPU-GPU sync point in the loss function when the variance is a static tensor such as `<scalar> * torch.ones_like(input)`.
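A minimal sketch of the two call forms, assuming a PyTorch build that includes this change (the variance is passed positionally):
```python
import torch
import torch.nn.functional as F

input = torch.randn(5, 2, requires_grad=True)
target = torch.randn(5, 2)

# Tensor variance (previous requirement): materializes a ones_like tensor.
loss_tensor_var = F.gaussian_nll_loss(input, target, 2.0 * torch.ones_like(input))

# Scalar variance (added here): same result without building the tensor,
# avoiding the CPU-GPU sync described above when running on an accelerator.
loss_scalar_var = F.gaussian_nll_loss(input, target, 2.0)

print(loss_tensor_var, loss_scalar_var)
```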
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138931
Approved by: https://github.com/mikaylagawarecki
2024-11-21 18:20:09 +00:00
Donald Tolley
c1e7d85ce6
Add Weighted Loss Functions to PyTorch : WMSE, WMAE, and Weighted Huber Loss ( #132049 )
...
#### Summary
This pull request introduces new weighted loss functions to the PyTorch library: `weighted_huber_loss`, `wmse_loss`, and `wmae_loss`. These functions allow for precise control over the influence of each sample during training, important for imbalanced data or when certain samples are more significant than others.
#### Changes
- **`weighted_huber_loss`**: Huber loss modified to incorporate weights, providing a balance between L1 and L2 loss based on the `delta` parameter.
- **`wmse_loss`** (Weighted Mean Squared Error): Applies weights to the standard MSE loss, useful for emphasizing certain samples in regression tasks.
- **`wmae_loss`** (Weighted Mean Absolute Error): Adjusts MAE loss calculation by including weights, ideal for datasets with outliers.
#### Code Details
- **Input Validation**: Ensures `input`, `target`, and `weights` tensors match in size to prevent broadcasting errors.
- **Reduction Options**: Supports `none`, `mean`, and `sum` reductions to suit various computational needs.
- **Backward Compatibility**: Maintains support for deprecated arguments `size_average` and `reduce`, while encouraging use of the `reduction` argument.
#### Usage Example
```python
import torch
from torch.nn.functional import weighted_huber_loss  # exposed from torch.nn.functional by this PR
input = torch.tensor([0.5, 2.5, 2.0], dtype=torch.float32)
target = torch.tensor([0.0, 2.0, 1.5], dtype=torch.float32)
weights = torch.tensor([1.0, 0.5, 1.5], dtype=torch.float32)
loss = weighted_huber_loss(input, target, weights, delta=1.0)
print(loss)
```
---
Feedback on these implementations is welcome; please let me know if further modifications are required.
Resolves #132465
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132049
Approved by: https://github.com/mikaylagawarecki
Co-authored-by: mikaylagawarecki <mikaylagawarecki@gmail.com>
2024-10-31 21:59:43 +00:00
Tom Ritchford
c0582fd0f8
Remove unused Python variables in torch/[b-z]* ( #136963 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136963
Approved by: https://github.com/ezyang
2024-10-19 16:45:22 +00:00
Joel Schlosser
83a3ee0699
Support embedding_bag() with NJT input ( #135888 )
...
Fixes #93843
`EmbeddingBag()` / `embedding_bag()` support 1D inputs with offsets to handle raggedness. NJT is a natural fit here as it already maintains offsets of the same form. This PR updates the python-side to support NJT and adds corresponding OpInfo-based NJT tests.
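For reference, the existing 1D-input-plus-offsets form looks roughly like the sketch below (table size and indices are made up); an NJT input carries equivalent offsets internally, which is what this PR hooks up on the Python side.
```python
import torch
import torch.nn.functional as F

weight = torch.randn(10, 3)                           # embedding table: 10 rows, dim 3
flat_input = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])   # two ragged bags, flattened
offsets = torch.tensor([0, 4])                        # bag 0 = flat_input[0:4], bag 1 = flat_input[4:]

# Classic ragged handling: a 1D index tensor plus explicit offsets.
out = F.embedding_bag(flat_input, weight, offsets)
print(out.shape)  # torch.Size([2, 3]), one pooled embedding per bag
```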
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135888
Approved by: https://github.com/cpuhrsch
2024-09-23 17:35:19 +00:00
xingyunjohn1
e6c3f58584
Fix example: Address broadcasting error in the addition of `attn_bias… ( #135427 )
...
Fix example: Address broadcasting error in the addition of `attn_bias` and `attn_mask`, and correct device assignment for newly created variables in the method.
1. Adding `attn_bias += attn_mask` results in a broadcasting error. `attn_bias` has the expected shape (L, S), so the output should also have shape (L, S); but when the input has shape (N, num_heads, L, S), broadcasting produces an (N, num_heads, L, S) result that cannot be written back in place into the (L, S) `attn_bias`.
2. `attn_bias` is a newly created variable within the method, but it is not placed on the correct device.
**This is my retry of PR #130209 . The PR has been merged into commit `d4a79d4a7c746068d25fe5cf9333495561f4ce1f`, but the modifications were overwritten by subsequent commits.**
Co-authored-by: mikaylagawarecki <mikaylagawarecki@gmail.com>
@mikaylagawarecki provided a more elegant implementation.
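A condensed sketch of the corrected reference pattern (only the two touched lines are annotated; everything else is paraphrased from the documented example):
```python
import math
import torch

def sdpa_reference(query, key, value, attn_mask=None, scale=None):
    L, S = query.size(-2), key.size(-2)
    scale_factor = 1 / math.sqrt(query.size(-1)) if scale is None else scale
    # Fix 2: create attn_bias with the same dtype and device as the inputs.
    attn_bias = torch.zeros(L, S, dtype=query.dtype, device=query.device)
    if attn_mask is not None:
        if attn_mask.dtype == torch.bool:
            attn_bias = attn_bias.masked_fill(attn_mask.logical_not(), float("-inf"))
        else:
            # Fix 1: out-of-place add, so an (N, num_heads, L, S) mask broadcasts into
            # a new tensor instead of failing to write back into the (L, S) attn_bias.
            attn_bias = attn_mask + attn_bias
    attn_weight = query @ key.transpose(-2, -1) * scale_factor
    attn_weight += attn_bias
    attn_weight = torch.softmax(attn_weight, dim=-1)
    return attn_weight @ value
```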
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135427
Approved by: https://github.com/ezyang
2024-09-09 03:47:34 +00:00
drisspg
85fa019697
[Docs] Fix call to deprecated function ( #135037 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135037
Approved by: https://github.com/janeyx99 , https://github.com/jbschlosser
2024-09-03 20:57:11 +00:00
Apurva Jain
8bc5ef563e
Grouped Query Attention ( #132689 )
...
### Approach: Using the current function declaration
**Constraint:** Q_Heads % KV_Heads == 0
**Major change:**
- Added a new argument `enable_gqa: bool` to the sdpa function call
- It gives the third-to-last (head) dimension an additional meaning.
Sample use cases this would enable:
LLama3
```
# LLama3 8b call to SDPA
query = torch.rand(batch, 32, seq_len_q, D)
key = torch.rand(batch, 8, seq_len_kv, D)
value = torch.rand(batch, 8, seq_len_kv, D)
output = scaled_dot_product_attention(query, key, value, is_causal=True, enable_gqa=True)
# Output Shape
(batch, 32, seq_len_q, D)
```
### Design Choice:
- Check that Query.size(-3) == Key.size(-3) == Value.size(-3), or that Query.size(-3) % Key.size(-3) == 0
- The function adjusts the key and value tensors to match the query tensor's head count using repeat_interleave when the numbers of heads differ, enabling correct and efficient computation in attention mechanisms (see the sketch after this list).
- By default the enable_gqa flag is set to False, which ensures that regular sdpa functionality remains unchanged.
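A hedged sketch of what that expansion amounts to (illustrative shapes; `enable_gqa` requires a build containing this change, and the closeness check may need a looser tolerance depending on which backend is selected):
```python
import torch
import torch.nn.functional as F

batch, D = 2, 64
query = torch.rand(batch, 32, 128, D)   # 32 query heads
key = torch.rand(batch, 8, 128, D)      # 8 KV heads; 32 % 8 == 0
value = torch.rand(batch, 8, 128, D)

# GQA path: SDPA handles the mismatched head counts itself.
out = F.scaled_dot_product_attention(query, key, value, is_causal=True, enable_gqa=True)

# Reference: manually expand K/V heads with repeat_interleave, then run regular SDPA.
reps = query.size(-3) // key.size(-3)
key_exp = key.repeat_interleave(reps, dim=-3)
value_exp = value.repeat_interleave(reps, dim=-3)
ref = F.scaled_dot_product_attention(query, key_exp, value_exp, is_causal=True)

print(out.shape, torch.allclose(out, ref, atol=1e-5))
```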
### Benchmarks:
- **sdpa.py: #130634**
For different batch sizes, enable_gqa=True shows a substantial improvement in the runtime of sdpa
| batch_size | q_num_heads | kv_num_heads | q_seq_len | kv_seq_len | embed_dim | forward_time when enable_gqa=True | forward_time when enable_gqa=False |
| ------------ | ------------- | -------------- | ----------- | ------------ | ----------- | ----------- | ---------------- |
| 1 | 32 | 8 | 2048 | 2048 | 2048 | 100.71 | 119.70 |
| 8 | 32 | 8 | 2048 | 2048 | 2048 | 539.78 | 628.83 |
| 16 | 32 | 8 | 2048 | 2048 | 2048 | 1056.81 | 1225.48 |
| 32 | 32 | 8 | 2048 | 2048 | 2048 | 2099.54 | 2440.45 |

- **TorchTitan: https://github.com/pytorch/torchtitan/pull/458 **
Differential Revision: D60772086
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132689
Approved by: https://github.com/drisspg
2024-08-07 05:35:36 +00:00
Jianyu Huang
c7cfa51721
Always use high precision for SDPA math backend ( #128922 )
...
Summary:
feikou observed large numerical gaps when using the math backend on AMD and NVIDIA GPUs, mainly because the intermediate accumulated/materialized values are not kept in higher-precision FP32.
Since the math backend is expected to be slower anyway, and is expected to produce the correct reference result, it is worth upcasting FP16/BF16 inputs to FP32, doing the computation in FP32/TF32, and then downcasting the FP32 output back to FP16/BF16.
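A hedged sketch of that upcast/compute/downcast pattern, forcing the math backend via the sdpa_kernel context manager (assumes a CUDA device; the manual version is only an approximation of the internal behavior):
```python
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

q = torch.randn(2, 8, 128, 64, dtype=torch.float16, device="cuda")
k, v = torch.randn_like(q), torch.randn_like(q)

# Math backend as the reference: with this change, FP16/BF16 inputs are upcast to
# FP32, the computation runs in FP32/TF32, and the result is downcast back.
with sdpa_kernel(SDPBackend.MATH):
    out_math = F.scaled_dot_product_attention(q, k, v)

# Rough manual equivalent of that pattern.
out_manual = F.scaled_dot_product_attention(q.float(), k.float(), v.float()).to(q.dtype)
print(out_math.dtype, (out_math - out_manual).abs().max())
```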
Differential Revision: D58710805
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128922
Approved by: https://github.com/xw285cornell , https://github.com/drisspg
2024-08-04 23:58:14 +00:00
PyTorch MergeBot
bcb4f7c172
Revert "Grouped Query Attention ( #128898 )"
...
This reverts commit 6b28af1b79 .
Reverted https://github.com/pytorch/pytorch/pull/128898 on behalf of https://github.com/ZainRizvi due to Sorry, this broke a bunch of tests internally. See D60638265 ([comment](https://github.com/pytorch/pytorch/pull/128898#issuecomment-2265961038 ))
2024-08-02 18:58:46 +00:00
PyTorch MergeBot
59b73079a0
Revert "Always use high precision for SDPA math backend ( #128922 )"
...
This reverts commit fbf3bc0a60 .
Reverted https://github.com/pytorch/pytorch/pull/128922 on behalf of https://github.com/ZainRizvi due to Sorry, but this PR has a dependency on another PR (https://github.com/pytorch/pytorch/pull/128898 ) that has to be reverted ([comment](https://github.com/pytorch/pytorch/pull/128922#issuecomment-2265949958 ))
2024-08-02 18:46:50 +00:00
Jianyu Huang
fbf3bc0a60
Always use high precision for SDPA math backend ( #128922 )
...
Summary:
feikou observed large numerical gaps when using the math backend on AMD and NVIDIA GPUs, mainly because the intermediate accumulated/materialized values are not kept in higher-precision FP32.
Since the math backend is expected to be slower anyway, and is expected to produce the correct reference result, it is worth upcasting FP16/BF16 inputs to FP32, doing the computation in FP32/TF32, and then downcasting the FP32 output back to FP16/BF16.
Differential Revision: D58710805
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128922
Approved by: https://github.com/xw285cornell , https://github.com/drisspg
2024-08-01 18:55:48 +00:00
jainapurva
6b28af1b79
Grouped Query Attention ( #128898 )
...
### Approach: Using the current function declaration
**Constraint:** Q_Heads % KV_Heads == 0
**Major change:**
- Added a new argument `enable_gqa: bool` to the sdpa function call
- It gives the third-to-last (head) dimension an additional meaning.
Sample use cases this would enable:
LLama3
```
# LLama3 8b call to SDPA
query = torch.rand(batch, 32, seq_len_q, D)
key = torch.rand(batch, 8, seq_len_kv, D)
value = torch.rand(batch, 8, seq_len_kv, D)
output = scaled_dot_product_attention(query, key, value, is_causal=True, enable_gqa=True)
# Output Shape
(batch, 32, seq_len_q, D)
```
### Design Choice:
- Check that Query.size(-3) == Key.size(-3) == Value.size(-3), or that Query.size(-3) % Key.size(-3) == 0
- The function adjusts the key and value tensors to match the query tensor's head count using repeat_interleave when the numbers of heads differ, enabling correct and efficient computation in attention mechanisms.
- By default the enable_gqa flag is set to False, which ensures that regular sdpa functionality remains unchanged.
### Benchmarks:
- **sdpa.py: #130634**
For different batch sizes, enable_gqa=True shows a substantial improvement in the runtime of sdpa
| batch_size | q_num_heads | kv_num_heads | q_seq_len | kv_seq_len | embed_dim | forward_time when enable_gqa=True | forward_time when enable_gqa=False |
| ------------ | ------------- | -------------- | ----------- | ------------ | ----------- | ----------- | ---------------- |
| 1 | 32 | 8 | 2048 | 2048 | 2048 | 100.71 | 119.70 |
| 8 | 32 | 8 | 2048 | 2048 | 2048 | 539.78 | 628.83 |
| 16 | 32 | 8 | 2048 | 2048 | 2048 | 1056.81 | 1225.48 |
| 32 | 32 | 8 | 2048 | 2048 | 2048 | 2099.54 | 2440.45 |

- **TorchTitan: https://github.com/pytorch/torchtitan/pull/458 **
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128898
Approved by: https://github.com/drisspg
2024-07-31 22:58:51 +00:00
PyTorch MergeBot
499ead96ff
Revert "Grouped Query Attention ( #128898 )"
...
This reverts commit d039b14207 .
Reverted https://github.com/pytorch/pytorch/pull/128898 on behalf of https://github.com/albanD due to Broken test on main ([comment](https://github.com/pytorch/pytorch/pull/128898#issuecomment-2258314481 ))
2024-07-30 13:11:24 +00:00
jainapurva
d039b14207
Grouped Query Attention ( #128898 )
...
### Approach: Using the current function declaration
**Constraint:** Q_Heads % KV_Heads == 0
**Major change:**
- Added a new argument `enable_gqa: bool` to the sdpa function call
- It gives the third-to-last (head) dimension an additional meaning.
Sample use cases this would enable:
LLama3
```
# LLama3 8b call to SDPA
query = torch.rand(batch, 32, seq_len_q, D)
key = torch.rand(batch, 8, seq_len_kv, D)
value = torch.rand(batch, 8, seq_len_kv, D)
output = scaled_dot_product_attention(query, key, value, is_causal=True, enable_gqa=True)
# Output Shape
(batch, 32, seq_len_q, D)
```
### Design Choice:
- Check that Query.size(-3) == Key.size(-3) == Value.size(-3), or that Query.size(-3) % Key.size(-3) == 0
- The function adjusts the key and value tensors to match the query tensor's head count using repeat_interleave when the numbers of heads differ, enabling correct and efficient computation in attention mechanisms.
- By default the enable_gqa flag is set to False, which ensures that regular sdpa functionality remains unchanged.
### Benchmarks:
- **sdpa.py: #130634**
For different batch sizes, enable_gqa=True shows a substantial improvement in the runtime of sdpa
| batch_size | q_num_heads | kv_num_heads | q_seq_len | kv_seq_len | embed_dim | forward_time when enable_gqa=True | forward_time when enable_gqa=False |
| ------------ | ------------- | -------------- | ----------- | ------------ | ----------- | ----------- | ---------------- |
| 1 | 32 | 8 | 2048 | 2048 | 2048 | 100.71 | 119.70 |
| 8 | 32 | 8 | 2048 | 2048 | 2048 | 539.78 | 628.83 |
| 16 | 32 | 8 | 2048 | 2048 | 2048 | 1056.81 | 1225.48 |
| 32 | 32 | 8 | 2048 | 2048 | 2048 | 2099.54 | 2440.45 |

- **TorchTitan: https://github.com/pytorch/torchtitan/pull/458 **
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128898
Approved by: https://github.com/drisspg
2024-07-29 21:49:06 +00:00
xingyunjohn1
d4a79d4a7c
Fix an example: Resolve broadcasting error in attn_bias and attn_mask… ( #130209 )
...
Fix an example: Resolve broadcasting error in attn_bias and attn_mask addition, fix device assignment for newly created variables in method
1. `attn_bias += attn_mask` causes a broadcasting error. Because `attn_bias` has shape (L, S), the output is also expected to have shape (L, S); but when the input has shape (N, num_heads, L, S), broadcasting is triggered and the result has shape (N, num_heads, L, S), which cannot be written back in place.
2. `attn_bias` is a newly created variable in the method and is not assigned to the correct device.
**This is my retry of #130200.** I used the wrong account in that PR.
Co-authored-by: mikaylagawarecki <mikaylagawarecki@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130209
Approved by: https://github.com/mikaylagawarecki
2024-07-19 15:23:22 +00:00
Ma, Jing1
52cb9abb1d
Add deterministic support in nn.functional.interpolate for XPU ( #129864 )
...
Neither CUDA nor XPU has a native deterministic implementation of `aten::upsample_bilinear` and `aten::replication_pad`. CUDA uses an operator-decomposition path in the frontend hook `nn.functional.interpolate` as its deterministic implementation; this PR has the XPU backend use the same solution.
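A hedged sketch of how this surfaces to users (the device selection below is an assumption; on builds without XPU it falls back to CPU):
```python
import torch
import torch.nn.functional as F

# With deterministic algorithms enabled, bilinear interpolation takes the decomposed
# (deterministic) frontend path on CUDA, and, with this change, on XPU as well.
torch.use_deterministic_algorithms(True)

device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cpu"
x = torch.randn(1, 3, 8, 8, device=device)
y = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
print(y.shape)  # torch.Size([1, 3, 16, 16])
```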
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129864
Approved by: https://github.com/dvrogozh , https://github.com/albanD , https://github.com/EikanWang
2024-07-19 02:15:42 +00:00
Xuehai Pan
662e9e1076
[BE] enable UFMT for torch/nn/functional.py ( #128592 )
...
Part of #123062
- #123062
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128592
Approved by: https://github.com/mikaylagawarecki
2024-06-24 06:24:12 +00:00
PyTorch MergeBot
cc8193c707
Revert "[BE] enable UFMT for torch/nn/functional.py ( #128592 )"
...
This reverts commit f6e6e55fa7 .
Reverted https://github.com/pytorch/pytorch/pull/128592 on behalf of https://github.com/fbgheith due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/128592#issuecomment-2181783936 ))
2024-06-21 00:44:16 +00:00
Xuehai Pan
f6e6e55fa7
[BE] enable UFMT for torch/nn/functional.py ( #128592 )
...
Part of #123062
- #123062
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128592
Approved by: https://github.com/mikaylagawarecki
ghstack dependencies: #128596 , #128594
2024-06-17 16:29:29 +00:00
Xuehai Pan
67ef2683d9
[BE] wrap deprecated function/class with typing_extensions.deprecated ( #127689 )
...
Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing.
Note that only warnings whose messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.
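A small sketch of the two patterns (function names are illustrative):
```python
import warnings
from typing_extensions import deprecated

# Preferred: annotate the callable itself.
@deprecated("old_api() is deprecated, use new_api() instead", category=FutureWarning)
def old_api():
    ...

# Fallback when the warning must stay inline: make the category explicit.
def legacy_path():
    warnings.warn("legacy_path() is deprecated, use new_path() instead", category=FutureWarning)
```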
Resolves #126888
- #126888
This PR is split from PR #126898 .
- #126898
------
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127689
Approved by: https://github.com/Skylion007
2024-06-02 12:30:43 +00:00
PyTorch MergeBot
033e733021
Revert "[BE] wrap deprecated function/class with typing_extensions.deprecated ( #126898 )"
...
This reverts commit 749a132fb0 .
Reverted https://github.com/pytorch/pytorch/pull/126898 on behalf of https://github.com/fbgheith due to switching typing-extensions=4.3.0 to 4.9.0 causes internal failure ([comment](https://github.com/pytorch/pytorch/pull/126898#issuecomment-2142884456 ))
2024-05-31 19:47:24 +00:00
lancerts
ff65b18fcf
Update the is_causal explanation in the SDPA doc ( #127209 )
...
Fixes #126873
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127209
Approved by: https://github.com/drisspg
2024-05-29 18:53:17 +00:00
Xuehai Pan
749a132fb0
[BE] wrap deprecated function/class with typing_extensions.deprecated ( #126898 )
...
Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing.
Note that only warnings whose messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.
UPDATE: Use `FutureWarning` instead of `DeprecationWarning`.
Resolves #126888
- #126888
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126898
Approved by: https://github.com/albanD
2024-05-29 12:09:27 +00:00
Joel Schlosser
d15920a7d0
Warn SDPA users about dropout behavior ( #126294 )
...
Fixes #124464
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126294
Approved by: https://github.com/mikaylagawarecki , https://github.com/drisspg
2024-05-15 20:58:23 +00:00
Noam Siegel
a03b9a2189
fix: typo ( #125226 )
...
Fixes spelling error: spacial is an incorrect spelling of spatial
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125226
Approved by: https://github.com/Skylion007
2024-04-30 16:57:39 +00:00
Aaron Gokaslan
5a1216bb2e
[BE]: Update ruff to 0.4.1 ( #124549 )
...
Update ruff to 0.4.1.
This version fixes a lot of false negatives and false positives, is 20-40% faster, and has various other bug fixes.
Below is a before and after table showing the execution time of ruff lint and ruff format in milliseconds courtesy of https://astral.sh/blog/ruff-v0.4.0
| Repository | Linter (v0.3) | Linter (v0.4) | Formatter (v0.3) | Formatter (v0.4) |
|----------------------------------------------------|---------------|---------------|------------------|------------------|
| [pytorch/pytorch](https://github.com/pytorch/pytorch ) | 328.7 | 251.8 | 351.1 | 274.9 |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124549
Approved by: https://github.com/ezyang
2024-04-21 14:06:23 +00:00
Dmitry Ulyanov
c8e117fb76
Tiny comments improvement ( #123426 )
...
Fixed a typo in `functional.py` and moved comment line to correct place in `transformer.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123426
Approved by: https://github.com/mikaylagawarecki
2024-04-05 17:25:42 +00:00
Mikayla Gawarecki
487b6d40ec
Add RMSNorm module ( #121364 )
...
Similar to dbeed9724b/torchmultimodal/modules/layers/normalizations.py (L51)
**The implementation here is not optimized and we welcome pull requests to improve this**
- Use `normalized_shape` instead of singular integer `dim` to be aligned with the `nn.LayerNorm` implementation
- Remove the [upcast to float and downcast](dbeed9724b/torchmultimodal/modules/layers/normalizations.py (L73))
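A minimal usage sketch of the resulting module (shapes are illustrative):
```python
import torch
from torch import nn

batch, seq_len, dim = 2, 16, 64
x = torch.randn(batch, seq_len, dim)

# normalized_shape mirrors nn.LayerNorm: normalize over the trailing `dim` features.
rms_norm = nn.RMSNorm(normalized_shape=dim, eps=1e-6)
y = rms_norm(x)
print(y.shape)  # torch.Size([2, 16, 64])
```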
Differential Revision: [D55485840](https://our.internmc.facebook.com/intern/diff/D55485840 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121364
Approved by: https://github.com/albanD
2024-03-29 18:05:28 +00:00
PyTorch MergeBot
8698121636
Revert "Add RMSNorm module ( #121364 )"
...
This reverts commit a7306de0dc .
Reverted https://github.com/pytorch/pytorch/pull/121364 on behalf of https://github.com/atalman due to Broke internal tests ([comment](https://github.com/pytorch/pytorch/pull/121364#issuecomment-2025502007 ))
2024-03-28 15:31:10 +00:00
Mikayla Gawarecki
a7306de0dc
Add RMSNorm module ( #121364 )
...
Similar to dbeed9724b/torchmultimodal/modules/layers/normalizations.py (L51)
**The implementation here is not optimized and we welcome pull requests to improve this**
- Use `normalized_shape` instead of singular integer `dim` to be aligned with the `nn.LayerNorm` implementation
- Remove the [upcast to float and downcast](dbeed9724b/torchmultimodal/modules/layers/normalizations.py (L73))
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121364
Approved by: https://github.com/albanD
2024-03-27 21:39:30 +00:00
Gonçalo Rua
139647d317
Fix #83241: torch.nn.TripletMarginLoss allowed margin less than or equal to 0 ( #121978 )
...
The documentation states that the `margin` parameter of torch.nn.TripletMarginLoss must be greater than 0, yet any value was being accepted. Also fixed torch.nn.TripletMarginWithDistanceLoss, which had the same problem, and added an error test input for the new ValueError.
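A small sketch of the resulting behavior (whether the ValueError fires at construction or at call time follows the PR; the try block covers both):
```python
import torch
from torch import nn

anchor = torch.randn(8, 128)
positive = torch.randn(8, 128)
negative = torch.randn(8, 128)

loss_fn = nn.TripletMarginLoss(margin=1.0)   # valid: margin > 0
print(loss_fn(anchor, positive, negative))

try:
    bad = nn.TripletMarginLoss(margin=0.0)   # margin <= 0 is now rejected
    bad(anchor, positive, negative)
except ValueError as err:
    print(err)
```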
Fixes #83241
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121978
Approved by: https://github.com/mikaylagawarecki
2024-03-19 23:19:11 +00:00
João Gouveia
1afa8e0985
Fix #83153: torch.nn.hardtanh allowed min_val to be greater than max_val ( #121627 )
...
Fixes #83153
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121627
Approved by: https://github.com/albanD
2024-03-15 00:57:45 +00:00
drisspg
f5391dad82
Update docs to point to new sdpa_kernel context manager ( #121180 )
...
# Summary
Updates the SDPA docs to fix some small inaccuracies and to point to the new sdpa_kernel context manager. The enum-like SDPBackend type bound from C++ does not render its fields for some reason, so they are listed manually for now.
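The context manager the docs now point to is used roughly like this (assumes a CUDA device with flash-attention support):
```python
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Restrict SDPA to a single backend; SDPBackend members include
# FLASH_ATTENTION, EFFICIENT_ATTENTION, and MATH.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)
```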
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121180
Approved by: https://github.com/mikaylagawarecki
2024-03-05 22:19:48 +00:00
lancerts
67c97a9aad
Fix the scaled dot-product attention doc ( #120859 )
...
Fixes #120810
The code verifies the broadcast behavior (from the issue),
```
import torch
B = 3
S = 5
L = 7
E = 16
EV = 32
additional_batches = [2, 4]
query_shape = [B] + additional_batches + [L, E]
key_shape = [B] + additional_batches + [S, E]
value_shape = [B] + additional_batches + [S, EV]
query = torch.rand(*query_shape)
key = torch.rand(*key_shape)
value = torch.rand(*value_shape)
mask = torch.zeros((1, 1, S), dtype=torch.bool)
mask[:, :, S // 2 :] = True
# query.to("cuda")
# key.to("cuda")
# value.to("cuda")
# mask.to("cuda")
attention = torch.nn.functional.scaled_dot_product_attention(query, key, value, mask)
print(f"query shape = {query.shape}")
print(f"key shape = {key.shape}")
print(f"value shape = {value.shape}")
print(f"mask shape = {mask.shape}")
print(f"attention shape = {attention.shape}")
#in both CPU and cuda, output shape is:
# query shape = torch.Size([3, 2, 4, 7, 16])
# key shape = torch.Size([3, 2, 4, 5, 16])
# value shape = torch.Size([3, 2, 4, 5, 32])
# mask shape = torch.Size([1, 1, 5])
# attention shape = torch.Size([3, 2, 4, 7, 32])
## test add is broadcasting mask to query@(key.mT)
res = query@(key.mT)
print(res.shape)
res2 = torch.add(res, mask)
print(res2.shape)
```
At code level, in the default backend,
ab38354887/aten/src/ATen/native/transformers/attention.cpp (L735)
the add operation broadcasts the `attn_mask` against `auto attn = at::matmul(query, key.transpose(-2, -1) * scaling_factor);`
- Changed the doc in [torch/nn/functional.py](https://github.com/pytorch/pytorch/pull/120859/files#diff-c358c214f663ba0c8b9c6846fbe0042fa29494cf02fe4714a17dcd0d268b035b ).
- Also fixed a few inconsistencies in the cpp comments.
@mikaylagawarecki
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120859
Approved by: https://github.com/drisspg
2024-03-01 02:54:08 +00:00
Isuru Fernando
435063aa89
Decomposition for upsample_linear{1d, 3d} ( #114774 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114774
Approved by: https://github.com/lezcano , https://github.com/vfdev-5 , https://github.com/peterbell10
2024-02-27 11:57:45 +00:00
Lei Mao
91d1d2c421
Make MHA Query Scaling Behaviors Consistent ( #119323 )
...
The multi-head attention (MHA) query scaling behaviors are not consistent when [`need_weights`](8ac9b20d4b/torch/nn/modules/activation.py (L1073) ) values are different.
On the current main, when `need_weights = True`, the query scaling is performed with a [division](8ac9b20d4b/torch/nn/functional.py (L5434)) and is exported as a `Div` operator in ONNX. When `need_weights = False`, the query scaling is performed with a [multiplication](422b4271ae/aten/src/ATen/native/transformers/attention.cpp (L711)) and is exported as a `Mul` operator in ONNX, as defined in the [PyTorch ONNX Symbolics](422b4271ae/torch/onnx/symbolic_opset14.py (L177)).
We should make the query scaling behaviors consistent. On most platforms, multiplication performs no worse than division, so we should use multiplication consistently for both `need_weights = True` and `need_weights = False`.
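In sketch form (shapes are illustrative), the change amounts to expressing the scaling as a multiplication in both paths:
```python
import torch

E = 64                        # head dimension
q = torch.randn(2, 8, 128, E)

scaled_div = q / (E ** 0.5)   # old need_weights=True path, exported as ONNX Div
scaled_mul = q * (E ** -0.5)  # need_weights=False path, now used consistently, exported as ONNX Mul

print(torch.allclose(scaled_div, scaled_mul))
```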
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119323
Approved by: https://github.com/mikaylagawarecki , https://github.com/albanD
2024-02-07 18:42:57 +00:00