Commit Graph

720 Commits

Author SHA1 Message Date
zeshengzong
82dc3457e0 Add load_state_dict hint doc about invoke order work with lr_scheduler (#149942)
Fixes #119168

## Test Result

![image](https://github.com/user-attachments/assets/edb8124c-f103-475a-b903-20fbc71fdea6)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149942
Approved by: https://github.com/janeyx99

Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
2025-05-15 01:07:36 +00:00
Aaron Gokaslan
f05b38aa26 [BE]: Improve decorator typing for Optimizer subclasses (#153374)
Improves typing so that all the optimizer subclasses (which all of them that subtype step) do not erase their type signature when this decorator is used. Now *kwarg values and returns will propogate

This complements @tsunghsienlee PR #153367  as the type signature of step() was being erased on all the optimizer subclasses by this untyped decorator

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153374
Approved by: https://github.com/janeyx99, https://github.com/tsunghsienlee
2025-05-12 22:55:25 +00:00
Tsung-Hsien Lee
ea4b65ab60 Fix the type hint of step() with default value (#153367)
Summary: Because the default value of `closure` is `None`, this fixes the situation when `step()`. The previous typing (https://github.com/pytorch/pytorch/pull/102593) could only be used as `step(closure=None)` and `step(None)`.

Test Plan: contbuild & OSS CI

Differential Revision: D74560785

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153367
Approved by: https://github.com/cyyever, https://github.com/Skylion007, https://github.com/janeyx99
2025-05-12 15:52:59 +00:00
Jacobgoss30
6a8006472e Fix doc cosineannealinglr 152081 (#152936)
## Summary

This PR updates the docstring for `CosineAnnealingLR` to accurately reflect its recursive learning rate schedule. The previous docstring displayed only the SGDR closed-form expression, which doesn't match the actual recursive implementation in code.

Changes:

- Added the recursive update formula used in `get_lr()`
- Retained the original closed-form SGDR expression for reference
- Clarified that warm restarts are not implemented in this scheduler

This addresses confusion raised in issue #152081.

## Related issue

[#152081](https://github.com/pytorch/pytorch/issues/152081)

## Testing

Doc-only change. Ran pre-commit to verify formatting.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152936
Approved by: https://github.com/janeyx99
2025-05-08 17:25:30 +00:00
Jane Xu
3bc69cc08d Document that dampening is skipped in SGD momentum first step (#152833)
Pointed out by https://x.com/hi_tysam/status/1917318692276174977/photo/2.

It would be BC breaking to change this behavior 7 years after it has been decided, so we are documenting it first at the very least.

<img width="642" alt="image" src="https://github.com/user-attachments/assets/3febcb07-e0ed-44a1-bd3b-a8e685711cb4" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152833
Approved by: https://github.com/albanD
2025-05-05 20:07:23 +00:00
kyo
a21090a38c Fix incorrect citation of authors in documentation (#145209)
This PR corrects the citation of Adafactor authors "Noam Shazeer" and "Mitchell Stern" in the documentation.
The current text incorrectly lists them as "Shazeer, Noam, and Mitchell Stern," which seems to be a result of a data parsing issue of some reference manager(s) [as you can find many papers with the same issue](https://www.google.com/search?q=%22Shazeer%2C+Noam%2C+and+Mitchell+Stern%22).
The updated citation follows standard conventions for author names.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145209
Approved by: https://github.com/janeyx99
2025-05-05 17:45:05 +00:00
dscamiss
9a9cc48c65 Update SGD documentation to match implementation (#149884)
Fixes #149476

This PR updates the pseudocode description of the SGD optimizer to better match the implementation.

Updated pseudocode:

![image](https://github.com/user-attachments/assets/2d7bc618-0408-4909-b835-af6465736918)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149884
Approved by: https://github.com/janeyx99
2025-05-05 16:06:17 +00:00
zeshengzong
eb69f4e609 Add lr_lambda type check in MultiplicativeLR (#151973)
Fixes #81554

## TestResult

### Before

```python
In [3]: import torch
   ...: class SimpleLinearModel(torch.nn.Module):
   ...:     def __init__(self):
   ...:         super(SimpleLinearModel, self).__init__()
   ...:         self.linear = torch.nn.Linear(10, 1)
   ...:
   ...:     def forward(self, x):
   ...:         return self.linear(x)
   ...:
   ...: net = SimpleLinearModel()
   ...: optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
   ...: scheduler = torch.optim.lr_scheduler.MultiplicativeLR(optimizer, 0.95)
   ...: for i in range(10):
   ...:     print(i, scheduler.get_last_lr())
   ...:     scheduler.step()
TypeError: 'float' object is not callable

### After

```python
   ...: scheduler = torch.optim.lr_scheduler.MultiplicativeLR(optimizer, 0.95)
TypeError: lr_lambda should be a function, but got float
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151973
Approved by: https://github.com/janeyx99
2025-04-29 08:21:41 +00:00
zeshengzong
c81d8c231c Fix CosineAnnealingWarmRestarts reset T_cur (#151289)
Fixes #88791

## Test Result

```python
pytest test/optim/test_lrscheduler.py -k test_CosineAnnealingWarmRestarts
```

![image](https://github.com/user-attachments/assets/75ad238c-f319-47dc-bf2d-da05b0879b84)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151289
Approved by: https://github.com/janeyx99
2025-04-28 23:02:55 +00:00
Anthony Shoumikhin
7cae7902a2 Add scripts to check xrefs and urls (#151844)
Traverses the docs and code to find any broken links
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151844
Approved by: https://github.com/huydhn
2025-04-28 09:30:07 +00:00
Jane Xu
dccc41581a Include other accelerators in capturable docstr for optimizers (#149770)
Fixes #149722

@ILCSFNO is this better?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149770
Approved by: https://github.com/albanD
2025-04-24 20:38:42 +00:00
zeshengzong
25803d3a22 Optimize typing in lr_scheduler.py (#151219)
## Changes

- Add typing annotation in `lr_scheduler.py`

## Test Result

```bash
pytest test/optim/test_lrscheduler.py -vv
```

![image](https://github.com/user-attachments/assets/34a91965-ff3a-462a-9ab0-b46ad4b290e9)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151219
Approved by: https://github.com/janeyx99
2025-04-15 01:00:13 +00:00
zeshengzong
5eebcb991a Add scripts to generate plots of LRSchedulers (#149189)
Fixes #92007

## Changes

- Add script to generate plots for `lr_scheduler`
- Add plots to `lr_scheduler` docs
- Add example section if it missing in `lr_scheduler` docs

## Test Result

### LambdaLR

![image](https://github.com/user-attachments/assets/37fc0894-e2ec-48f2-a2d6-3514e51e1ea2)

### MultiplicativeLR

![image](https://github.com/user-attachments/assets/2122b3a0-a4ce-42c7-bb45-559c1fc73e0f)

### StepLR

![image](https://github.com/user-attachments/assets/47bc9d96-4b60-4586-a000-f213583bbe8f)

### MultiStepLR

![image](https://github.com/user-attachments/assets/c822b849-d5be-4b94-aa7a-0017a2c9ff15)

### ConstantLR

![image](https://github.com/user-attachments/assets/83107cdd-7b00-44a6-b09d-e8ee849b4a12)

### LinearLR

![image](https://github.com/user-attachments/assets/60190105-691a-4101-8966-5b0c396093a4)

### ExponentialLR

![image](https://github.com/user-attachments/assets/dfcbcbca-89e5-4a2f-b1bd-33e25d2405ec)

### PolynomialLR

![image](https://github.com/user-attachments/assets/7c3d4fce-c846-40a0-b62e-f3e81c7e08bd)

### CosineAnnealingLR

![image](https://github.com/user-attachments/assets/26712769-dde9-4faa-b61b-e23c51daef50)

### ChainedScheduler

![image](https://github.com/user-attachments/assets/20734a8b-e939-424f-b45a-773f86f020b1)

### SequentialLR

![image](https://github.com/user-attachments/assets/2cd3ed67-2a0a-4c42-9ad2-e0be090d3751)

### ReduceLROnPlateau

![image](https://github.com/user-attachments/assets/b77f641e-4810-450d-b2cd-8b3f134ea188)

### CyclicLR

![image](https://github.com/user-attachments/assets/29b8666f-41b3-45e4-9159-6929074e6108)

### OneCycleLR

![image](https://github.com/user-attachments/assets/d5b683ef-41e8-4ca8-9fe8-0f1e6b433866)

### CosineAnnealingWarmRestarts

![image](https://github.com/user-attachments/assets/1d45ea80-dea8-494d-a8ab-e9cfc94c55d6)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149189
Approved by: https://github.com/janeyx99
2025-04-14 09:53:38 +00:00
zeshengzong
304633152c Clean up duplicated code in lr_scheduler (#150984)
## Changes

- Remove duplicated code in `ReduceLROnPlateau`
- Remove redundant `noqa` comment

## Test Result

```bash
pytest test/optim/test_lrscheduler.py
```

![image](https://github.com/user-attachments/assets/37f91f31-0e77-4abf-9dd1-75538c0f0792)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150984
Approved by: https://github.com/janeyx99
2025-04-13 09:18:50 +00:00
Isalia20
49f6cce736 [MPS] grad scaler (#150255)
Fixes #142397

Basic implementation is done. What's left:
- [x] Different dtype/device tensors in the TensorList
- [x] fast path for grouping the foreach kernel
- [x] Tests

Regarding tests, I found some tests in `test/test_torch.py` for GradScaler but I couldn't figure out what is the best way to enable the test for MPS device.

By removing `@onlyNativeDeviceTypes`, one enables the tests for MPS but also enables tests for all other devices which are not included in the native device types. If I put:
`instantiate_device_type_tests(TestTorchDeviceType, globals(), allow_mps=True)`

This enables lots of tests in that class for MPS which were not(?) being tested before? This part needs some clarification

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150255
Approved by: https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-04-06 17:06:55 +00:00
Tony-Y
78715a181f Convert Tensor lr to 0-dim as needed for the optimizer to normally work (#145674)
Fixes #145461

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145674
Approved by: https://github.com/janeyx99

Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
2025-03-17 23:07:05 +00:00
zeshengzong
fb1b7ec173 Remove deprecate method and attirbute in LRScheduler (#147301)
Following [#99270 suggestion](https://github.com/pytorch/pytorch/issues/99270#issuecomment-1511656408), remove deprecate method `LRScheduler.print_lr`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147301
Approved by: https://github.com/janeyx99
2025-03-05 05:30:19 +00:00
zeshengzong
c0ee62573a [Easy][optim] Add LBFGS params optional desc (#147579)
[LBFGS docs](https://pytorch.org/docs/stable/generated/torch.optim.LBFGS.html#torch.optim.LBFGS) missing `optional` description for params in compare with other optimizer docs, like [Adam](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html)

## Test Result

### Before

![image](https://github.com/user-attachments/assets/34877490-16b4-4c68-bf6c-405bae563352)

### After

![image](https://github.com/user-attachments/assets/7fba94c8-7091-47b8-bdf1-ca7d779a027f)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147579
Approved by: https://github.com/janeyx99
2025-02-21 19:38:10 +00:00
PyTorch MergeBot
302f56a1f2 Revert "Fix non-bitwise type annotations for Tensor operators (see #145838) (#146845)"
This reverts commit 59b7e52ad8.

Reverted https://github.com/pytorch/pytorch/pull/146845 on behalf of https://github.com/jeanschmidt due to Seems to break a few code dependencies in multiple places ([comment](https://github.com/pytorch/pytorch/pull/146845#issuecomment-2666656834))
2025-02-18 19:01:27 +00:00
Tom Ritchford
59b7e52ad8 Fix non-bitwise type annotations for Tensor operators (see #145838) (#146845)
Fix https://github.com/pytorch/pytorch/issues/145838

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146845
Approved by: https://github.com/Skylion007
2025-02-17 22:42:16 +00:00
Aaron Gokaslan
292af3cc89 [BE][Ez]: ISC001 Auto concatenate implicit one line strings (#146408)
Apply ruff rule about implicit string concatenation, this autofixes strings that are all the same type and on the same line. These lines are broken up likely as the result of autoformatters in the past. All fixes are automated using the autofixes in ISC001.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146408
Approved by: https://github.com/justinchuby, https://github.com/janeyx99
2025-02-04 19:07:04 +00:00
Aaron Orenstein
7178b827d7 PEP585: Missed conversions (#145342)
Differential Revision: [D68785969](https://our.internmc.facebook.com/intern/diff/D68785969)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145342
Approved by: https://github.com/bobrenjc93
2025-01-29 05:24:36 +00:00
Aaron Orenstein
0afd335174 PEP585 update - torch/nn torch/optim torch/package torch/profiler torch/serialization torch/sparse torch/xpu (#145175)
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145175
Approved by: https://github.com/bobrenjc93
2025-01-21 16:57:27 +00:00
PyTorch MergeBot
5fd881a5b6 Revert "PEP585 update - torch/nn torch/optim torch/package torch/profiler torch/serialization torch/sparse torch/xpu (#145175)"
This reverts commit 54a00af2c6.

Reverted https://github.com/pytorch/pytorch/pull/145175 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to break some trunk tests ([comment](https://github.com/pytorch/pytorch/pull/145175#issuecomment-2603418267))
2025-01-21 00:49:55 +00:00
Aaron Orenstein
54a00af2c6 PEP585 update - torch/nn torch/optim torch/package torch/profiler torch/serialization torch/sparse torch/xpu (#145175)
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145175
Approved by: https://github.com/bobrenjc93
2025-01-20 22:32:59 +00:00
Jane Xu
3908be676c Fix loading older state_dict into AdamW after refactor (#144972)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144972
Approved by: https://github.com/albanD
2025-01-16 19:50:31 +00:00
Jane Xu
e32d2bf853 Document decoupled_weight_decay for Adam for consistency with N/RAdam (#144984)
Followup from #144972 and #143710

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144984
Approved by: https://github.com/albanD
2025-01-16 18:58:29 +00:00
PyTorch MergeBot
154185dcd0 Revert "Removed unused _RequiredParameter (#144771)"
This reverts commit 6a5f895e54.

Reverted https://github.com/pytorch/pytorch/pull/144771 on behalf of https://github.com/malfet due to It broke number of cpuinductor tests ([comment](https://github.com/pytorch/pytorch/pull/144771#issuecomment-2593293542))
2025-01-15 15:51:33 +00:00
Piergiacomo De Marchi
6a5f895e54 Removed unused _RequiredParameter (#144771)
As per this [discussion](https://discuss.pytorch.org/t/a-question-about-requiredparameter/137977), I figured that `_RequiredParameter` is no longer used.

The `required` object was initially introduced in this [PR](4db6667923) as the `SGD` optimizer did not offer a default value for the learning rate. However there isn't a single place in the code base using `_RequiredParameter`, nor `required`. I am therefore removing unused `_RequiredParameter` and `required`.

Everything not included in this PR is Not a Contribution.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144771
Approved by: https://github.com/janeyx99
2025-01-15 04:11:17 +00:00
Aaron Orenstein
45ef3309e3 [BE] typing for decorators (#144161)
Summary:
Untyped decorators strip annotations from the decorated items.

- _compile
- _inductor/fx_passes/post_grad
- _inductor/lowering
- _library/custom_ops
- _meta_registrations
- _ops
- _refs/nn/functional
- ao/quantization/quantizer/xnnpack_quantizer_utils
- distributed/_composable/contract
- fx/experimental/graph_gradual_typechecker
- fx/experimental/migrate_gradual_types/constraint_generator
- optim/optimizer
- signal/windows/windows
- testing/_internal/common_device_type
- torch/_inductor/decomposition
- utils/flop_counter

Test Plan: unit tests

Differential Revision: D62302684

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144161
Approved by: https://github.com/Skylion007, https://github.com/albanD
2025-01-04 16:40:09 +00:00
Jane Xu
7b69f7b449 Clarify what we mean by decoupled weight decay in the *AdamWs (#144101)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144101
Approved by: https://github.com/albanD
2025-01-03 19:06:00 +00:00
emmettbicker
92d8965082 Adding support for differentiable lr, weight_decay, and betas in Adam/AdamW (#143726)
Third PR in a series of PRs to broaden differentiable optimizer support w/ @janeyx99 (sorry for pinging over the holidays! I just wanted to put this one out but I am definitely not asking for review or anything like that rn)

This is also going to probably be my last PR before the holidays!

Note: This is a branch of #143710 -- I've never worked on a branch of a branch before so I wasn't sure about the protocol so I thought I'd just made the PR and wait until that one gets merged.

This is adding support for differentiable lr, weight_decay, and betas to Adam and AdamW (but after refactoring AdamW into an Adam subclass, it's really just changing code in torch/optim/adam.py)

I had one main thing I was wondering about, which is that adam already has a differentiable flag built in, so I have code like this
```py
if differentiable and isinstance(beta2, Tensor):
    if beta2.requires_grad:
        exp_avg_sq.mul_(beta2).addcmul_(grad, grad.conj().mul(1 - beta2))
    else:
        exp_avg_sq.mul_(beta2).addcmul_(grad, grad.conj(), value=1 - beta2)
else:
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad.conj(), value=1 - beta2)
```
That I could definitely simplify to just
```py
if differentiable and isinstance(beta2, Tensor):
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad.conj().mul(1 - beta2))
else:
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad.conj(), value=1 - beta2)
```

It would definitely be a little slower in the case that it's differentiable but doesn't need a grad for beta2, but the code would also be a lot more clear and I'm debating speed vs future code usability.

Also the line in the above example:
```py
exp_avg_sq.mul_(beta2).addcmul_(grad, grad.conj().mul(1 - beta2))
```
was concerning to me because it is considerably more expensive than `value=1 - beta2`, but I couldn't think of a better way to do it.

Further work on #141832

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143726
Approved by: https://github.com/janeyx99
2024-12-30 01:11:57 +00:00
Emmett Bicker
0de661dc27 Add support for differentiable weight decay (#143679)
(Actual) second PR in a larger project to broaden support for differentiable optimizers with @janeyx99!

In this PR, I did a lot of pattern matching from the previous PR to add support for differentiable weight_decay.

And also added a single new line on line 359 (previously line 352) to make the code from the last PR a little easier to read

Continuation of progress on #141832

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143679
Approved by: https://github.com/janeyx99

Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
2024-12-27 23:14:43 +00:00
emmettbicker
6ccb8ed186 Refactor AdamW into Adam (heavily inspired by tfsingh) (#143710)
Fixes #104899

Refactors AdamW into Adam by making AdamW a subclass of Adam. Additionally adds a test to assert that the added parameter `decoupled_weight_decay` is True in AdamW and also updates test_defaults_changed_to_foreach to account for the differences in module location for AdamW.

Heavily heavily inspired by #118857 by @tfsingh

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143710
Approved by: https://github.com/janeyx99
2024-12-23 23:27:28 +00:00
emmettbicker
0b2c47962c Add support for differentiable LR in SGD + test v2.0 (#143510)
Second PR in a larger project to broader support for differentiable optimizers with @janeyx99 ! The first one had an issue near the end so this is the second PR on that subject. See #143122 for the development up until this point.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143510
Approved by: https://github.com/janeyx99
2024-12-19 21:04:44 +00:00
Tony-Y
61a835ec53 Corrected description of AMSGrad algorithm (#142351)
Fixes #142323

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142351
Approved by: https://github.com/janeyx99
2024-12-19 16:24:19 +00:00
Fabian Keller
5e8e1d725a Remove some unused type ignores (round 1) (#142325)
Over time, a large number of the existing type ignores have become irrelevant/unused/dead as a result of improvements in annotations and type checking.

Having these `# type: ignore` linger around is not ideal for two reasons:

- They are verbose/ugly syntatically.
- They could hide genuine bugs in the future, if a refactoring would actually introduce a bug but it gets hidden by the ignore.

I'm counting over 1500 unused ignores already. This is a first PR that removes some of them. Note that I haven't touched type ignores that looked "conditional" like the import challenge mentioned in https://github.com/pytorch/pytorch/pull/60006#issuecomment-2480604728. I will address these at a later point, and eventually would enable `warn_unused_ignores = True` in the mypy configuration as discussed in that comment to prevent accumulating more dead ignores going forward.

This PR should have no effect on runtime at all.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142325
Approved by: https://github.com/Skylion007, https://github.com/janeyx99
2024-12-09 18:23:46 +00:00
Xuehai Pan
e1196dfe51 Deprecate torch._utils.is_compiling() (#127690)
This PR is split from PR #126898.

- #126898

------

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127690
Approved by: https://github.com/Skylion007, https://github.com/malfet
2024-12-08 22:55:36 +00:00
UV
7597ab6370 Corrected AMSGrad max equation in Adam and AdamW (#142051)
Fixes #142041

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142051
Approved by: https://github.com/janeyx99
2024-12-06 21:55:26 +00:00
Aaron Gokaslan
08db735629 [BE]: Update mypy to 1.13.0 (#140808)
Update mypy to 1.13.0 . Should hopefully reduce linting time. Has support for orjson cache serialization which should improve mypy cache perf if orjson is installed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140808
Approved by: https://github.com/ezyang, https://github.com/malfet
2024-12-03 02:50:10 +00:00
PyTorch MergeBot
daa77f3d9f Revert "[BE]: Update mypy to 1.13.0 (#140808)"
This reverts commit 00134d68af.

Reverted https://github.com/pytorch/pytorch/pull/140808 on behalf of https://github.com/huydhn due to This is failing a distributed test in trunk, target determination missed this test and did not run it on PR ([comment](https://github.com/pytorch/pytorch/pull/140808#issuecomment-2512788426))
2024-12-02 20:47:43 +00:00
Aaron Gokaslan
00134d68af [BE]: Update mypy to 1.13.0 (#140808)
Update mypy to 1.13.0 . Should hopefully reduce linting time. Has support for orjson cache serialization which should improve mypy cache perf if orjson is installed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140808
Approved by: https://github.com/ezyang, https://github.com/malfet
2024-12-02 18:47:54 +00:00
Michael Lazos
1fd4757fdc Support tensor betas in Adam and AdamW (#134171)
Adds support for beta1 and beta2 to be wrapped in tensor for Adam and AdamW.

Fixes https://github.com/pytorch/pytorch/issues/133898

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134171
Approved by: https://github.com/janeyx99
2024-11-15 21:55:55 +00:00
Masaki Kozuki
6a368b3fc5 Add ScalarList overload to _foreach_lerp (#134482)
Related:
- https://github.com/pytorch/pytorch/issues/133367

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134482
Approved by: https://github.com/janeyx99
2024-11-12 19:03:41 +00:00
Masaki Kozuki
71d8bb7ede implement torch._foreach_rsqrt (#134574)
Related:
- #133367 c

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134574
Approved by: https://github.com/eqy, https://github.com/janeyx99
2024-11-12 15:34:35 +00:00
PyTorch MergeBot
1d28b8b6d5 Revert "Deprecate torch._utils.is_compiling() and torch._dynamo.external_utils.is_compiling() (#127690)"
This reverts commit e84d1121ad.

Reverted https://github.com/pytorch/pytorch/pull/127690 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally. More details in D65483292 ([comment](https://github.com/pytorch/pytorch/pull/127690#issuecomment-2458381056))
2024-11-05 23:10:38 +00:00
Xuehai Pan
e84d1121ad Deprecate torch._utils.is_compiling() and torch._dynamo.external_utils.is_compiling() (#127690)
This PR is split from PR #126898.

- #126898

------

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127690
Approved by: https://github.com/Skylion007, https://github.com/malfet
2024-11-05 10:44:56 +00:00
bskrlj
8e27833e30 Ensure SWA boundary conditions w.r.t. definition (#133773)
According to the documentation, decay is a number in [0,1] range,[ i.e.](https://pytorch.org/docs/stable/optim.html)
```
Decay is a parameter between 0 and 1 that controls how fast the averaged parameters are decayed. If not provided to get_ema_multi_avg_fn, the default is 0.999.
```
An inspection of `swa_utils.py`  indicates there are no checks for invalid values of `decay`. Adding asserts as suggested in this PR ensures valid compute range (one way to enforce correct behavior, there are perhaps more suitable ones). Papers `torch` cites for reference idea/implementation also consider exclusively this range (e.g., https://arxiv.org/pdf/2310.04415).

Fixes https://github.com/pytorch/pytorch/issues/133772

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133773
Approved by: https://github.com/janeyx99
2024-10-31 18:24:08 +00:00
Tom Ritchford
c0582fd0f8 Remove unused Python variables in torch/[b-z]* (#136963)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136963
Approved by: https://github.com/ezyang
2024-10-19 16:45:22 +00:00
Matt Pitkin
8a5dd7f59b Allow SequentialLR to include ChainedScheduler (#133450)
This fixes #132745 and allows a `SequentialLR` to include schedulers that are compound scheduler types (i.e., a `ChainedScheduler`), which contain a list of schedulers in a `_schedulers` attribute.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133450
Approved by: https://github.com/janeyx99
2024-10-18 02:29:38 +00:00