Hi, our team noticed that using `CyclicLR` causes a problem with freeing GPU memory (this probably also happens without a GPU, but GPU training was our use case). After a `CyclicLR` is initialized, GPU memory is not released even after the model, optimizer and scheduler go out of scope (i.e. no external references to them remain). This is because the `__init__` method of `CyclicLR` stores a reference to one of its own bound methods, creating a reference cycle that is not freed until the cyclic garbage collector runs, e.g. when `gc.collect()` is called manually. This is a problem if people want to test multiple models in one run of a script: after testing the first model, the second one fails with a `CUDA out of memory` error because the first one has not been freed.
I propose a simple fix using `weakref`, similar to what the `_LRScheduler` base class does, but if you have any comments I am happy to change it.
Here is the code to reproduce the bug:
```
import torch
import weakref
from transformers import DetrForObjectDetection
class X:
    def __init__(self, optimizer):
        self.optimizer = optimizer
        # Will cause cyclic reference.
        self.func = self.dummy
        # Will work as expected, memory cleared after instance count is zero.
        # self.func = weakref.WeakMethod(self.dummy)

    def dummy(self, x):
        return 1.


def test():
    model = DetrForObjectDetection.from_pretrained('facebook/detr-resnet-50')
    model.to('cuda')
    optimizer = torch.optim.Adam(model.parameters())
    x = X(optimizer)
test()
print(f'{torch.cuda.memory_reserved()}, {torch.cuda.memory_allocated()}') # Should print (<some memory>, 0), but with cyclic reference, it will print (<some memory>, <some memory>).
```
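The mechanism can be reproduced without a GPU. The sketch below is a minimal, pure-Python illustration (the classes `Strong` and `Weak` and the helper `freed_without_gc` are hypothetical, not code from this PR): with the cyclic garbage collector disabled, an object that stores its own bound method is not freed when its last external reference is dropped, while one that stores a `weakref.WeakMethod` is.

```python
import gc
import weakref


class Strong:
    def __init__(self):
        # The bound method holds a strong reference back to self: a reference cycle.
        self.func = self.dummy

    def dummy(self, x):
        return 1.0


class Weak:
    def __init__(self):
        # WeakMethod refers to self only weakly, so no cycle is created.
        self.func = weakref.WeakMethod(self.dummy)

    def dummy(self, x):
        return 1.0


def freed_without_gc(cls):
    """Return True if an instance of cls is freed by reference counting alone."""
    gc.disable()  # keep the cyclic collector from running during the check
    try:
        obj = cls()
        ref = weakref.ref(obj)
        del obj  # drop the only external reference
        return ref() is None
    finally:
        gc.enable()
        gc.collect()  # clean up any cycle left behind


print(freed_without_gc(Strong))  # False
print(freed_without_gc(Weak))    # True
```

With `WeakMethod`, the stored method is called as `self.func()(x)`: calling the `WeakMethod` first returns the bound method (or `None` once the instance is gone).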
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85462
Approved by: https://github.com/albanD
Hello there 👋
As discussed in #84485, this PR makes the LR schedulers in PyTorch more flexible about the optimizers they wrap. Currently, they are incompatible with optimizers whose number of betas is not 2. This PR fixes that with minimal modifications.
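A minimal sketch of the idea, assuming (as for Adam-style optimizers) that the momentum being cycled is the first entry of a param group's `betas`. The helper `set_first_beta` is hypothetical and not necessarily the exact code in this PR: instead of rebuilding a fixed two-element tuple, it replaces only the first beta and keeps the rest, whatever the tuple's length.

```python
def set_first_beta(param_group, momentum):
    """Replace only the first beta, keeping any remaining betas unchanged."""
    betas = param_group["betas"]
    param_group["betas"] = (momentum, *betas[1:])


adam_like = {"betas": (0.9, 0.999)}
three_betas = {"betas": (0.9, 0.999, 0.9999)}  # hypothetical optimizer with three betas
set_first_beta(adam_like, 0.85)
set_first_beta(three_betas, 0.85)
print(adam_like["betas"])    # (0.85, 0.999)
print(three_betas["betas"])  # (0.85, 0.999, 0.9999)
```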
Fixes #84485
Any feedback is welcome!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84486
Approved by: https://github.com/Lezcano, https://github.com/soulitzer
This is a new version of #15648 based on the latest master branch.
Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR.
In my initial commit, I do the bare minimum to get something running, with failing dashboards. The few tests that I marked as skip are causing segfaults. Running xdoctest results in 293 failed and 201 passed tests. The next commits will disable those tests. (Unfortunately, I don't have a tool that will insert the `#xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.)
Fixes https://github.com/pytorch/pytorch/issues/71105
@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797
Approved by: https://github.com/ezyang
When multiple learning rate schedulers were used together, the order in which they were initialized was not taken into account. This is a problem when they are initialized sequentially (as one might intuitively do).
Each scheduler calls `step()` on initialization and sets the `lr` in its optimizer's `param_groups`. This means that step 0 uses the `lr` set by the very last scheduler (when schedulers are initialized sequentially) instead of the one set by the first scheduler.
The fix in this PR addresses this bug by decrementing the schedulers' `last_epoch` values and then calling the appropriate scheduler on initialization, so that the `last_epoch` values are the same after the step. This ensures that the correct scheduler is the one setting the `lr` values in the optimizer's `param_groups`.
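To make the ordering issue concrete, here is a pure-Python toy model (the `ToyScheduler` class and `build` helper are hypothetical, not PyTorch code). It assumes each scheduler computes `lr = initial_lr * factor(epoch)` and steps once in its constructor, and it applies the fix described above: reset the lr, decrement every `last_epoch`, then step only the first scheduler.

```python
class ToyScheduler:
    """Hypothetical stand-in for an LR scheduler: lr = initial_lr * factor(epoch)."""

    def __init__(self, param_group, factor):
        self.param_group = param_group
        self.factor = factor
        self.last_epoch = -1
        self.step()  # like PyTorch schedulers, take one step on construction

    def step(self):
        self.last_epoch += 1
        self.param_group["lr"] = self.param_group["initial_lr"] * self.factor(self.last_epoch)


def build(apply_fix):
    group = {"lr": 1.0, "initial_lr": 1.0}
    warmup = ToyScheduler(group, lambda epoch: 0.1)  # should control step 0
    decay = ToyScheduler(group, lambda epoch: 0.9 ** epoch)
    schedulers = [warmup, decay]
    if apply_fix:
        group["lr"] = group["initial_lr"]  # reset to the initial lr
        for s in schedulers:
            s.last_epoch -= 1  # "undo" each constructor step
        schedulers[0].step()  # re-step only the first scheduler
    return group["lr"], [s.last_epoch for s in schedulers]


print(build(apply_fix=False))  # (1.0, [0, 0]): lr set by the last scheduler
print(build(apply_fix=True))   # (0.1, [0, -1]): lr set by the first scheduler
```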
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72856
Approved by: https://github.com/jbschlosser
### Goal
Fixes https://github.com/pytorch/pytorch/issues/79720
### Approach
Replace `Chains list of learning rate schedulers. It takes a list of chainable learning rate schedulers and performs consecutive step() functions` **`belong`** `to them by just one call.` with `Chains list of learning rate schedulers. It takes a list of chainable learning rate schedulers and performs consecutive step() functions` **`belonging`** `to them by just one call.`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79775
Approved by: https://github.com/albanD
Fixes #60265
The initial LR for this scheduler is not consistent when a new instance is created with `last_epoch != -1`.
Perhaps we could refactor the tests to check `last_epoch != -1` for the schedulers that can recreate their state from the current epoch?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60339
Approved by: https://github.com/albanD
Summary:
- ~optimizer isn't required for `SequentialLR` since it's already present in the schedulers. Trying to match the signature of it with `ChainedScheduler`.~
- ~`verbose` isn't really used anywhere so removed it.~
Updated missing docs and added a small check.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69817
Reviewed By: ngimel
Differential Revision: D33069589
Pulled By: albanD
fbshipit-source-id: f015105a35a2ca39fe94c70acdfd55cdf5601419
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67601.
As simple a fix as I could make it. I even managed to delete some testing code!
I checked calling `super()` and, as I had feared, it doesn't work out of the box, so perhaps that ought to be revisited later.
As it stands, https://github.com/pytorch/pytorch/issues/20124 still applies to the chained scheduler, but I think this change is still an improvement.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68010
Reviewed By: zou3519
Differential Revision: D32278139
Pulled By: albanD
fbshipit-source-id: 4c6f9f1b2822affdf63a6d22ddfdbcb1c6afd579
Summary:
The final learning rate should be 0.05 like the lr used as the argument for the optimizer and not 0.005.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67840
Reviewed By: jbschlosser
Differential Revision: D32187091
Pulled By: albanD
fbshipit-source-id: 8aff691bba3896a847d7b9d9d669a65f67a6f066
Summary:
## 🐛 Bug
'CosineAnnealingWarmRestarts' object has no attribute 'T_cur'.
In the constructor of `CosineAnnealingWarmRestarts`, we call the constructor of the parent class (`_LRScheduler`), which in turn calls the `step` method of `CosineAnnealingWarmRestarts`.
That method tries to update the object's attribute `T_cur`, which is not defined yet, so it raises the error.
This only happens when the `last_epoch` argument is 0 or greater when initializing `CosineAnnealingWarmRestarts`.

## To Reproduce
Steps to reproduce the behavior:
1. Pass 0 as the `last_epoch` argument, OR
1. Pass a positive integer as the `last_epoch` argument.
## Expected behavior
I expected the `CosineAnnealingWarmRestarts` object to simply be initialized, without an error.
## Environment
PyTorch version: 1.9.0+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.2
Libc version: glibc-2.31
Python version: 3.8.10 [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.8.0-59-generic-x86_64-with-glibc2.29
Is CUDA available: False
CUDA runtime version: No CUDA
## Additional context
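This bug can be fixed by moving the initialization of `self.T_cur` above the `super(CosineAnnealingWarmRestarts, self).__init__()` line, so that `self.T_cur` is already defined when `step` runs. Since `self.last_epoch` is only set by the parent constructor, the moved line has to use the `last_epoch` argument instead.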
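The failure comes from initialization order, which can be reproduced with a small pure-Python model. The classes below are hypothetical stand-ins, not the PyTorch code: the parent constructor calls `step()`, and `step()` only reads `T_cur` when resuming (`last_epoch >= 0`), matching the behavior described in the report.

```python
class ToyLRScheduler:
    """Hypothetical stand-in for _LRScheduler: its constructor calls self.step()."""

    def __init__(self, last_epoch=-1):
        self.last_epoch = last_epoch
        self.step()

    def step(self):
        # Simplified stand-in for the subclass's step(); it lives here only to keep the sketch short.
        if self.last_epoch < 0:
            self.T_cur = 0  # fresh start: T_cur is only assigned
        else:
            self.T_cur = self.T_cur + 1  # resuming: T_cur is read first
        self.last_epoch += 1


class Broken(ToyLRScheduler):
    def __init__(self, last_epoch=-1):
        super().__init__(last_epoch)  # step() runs before T_cur exists
        self.T_cur = self.last_epoch


class Fixed(ToyLRScheduler):
    def __init__(self, last_epoch=-1):
        self.T_cur = last_epoch  # define T_cur before the parent constructor calls step()
        super().__init__(last_epoch)


print(Broken(last_epoch=-1).T_cur)  # 0: works, step() never reads T_cur
print(Fixed(last_epoch=0).T_cur)    # 1
try:
    Broken(last_epoch=0)
except AttributeError as err:
    print("AttributeError:", err)
```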
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64758
Reviewed By: ezyang
Differential Revision: D31113694
Pulled By: jbschlosser
fbshipit-source-id: 98c0e292291775895dc3566fda011f2d6696f721
Summary:
Partially resolves https://github.com/pytorch/vision/issues/4281
In this PR we propose a new scheduler, `SequentialLR`, which enables a list of different schedulers to be called during different periods of the training process.
The main motivation for this scheduler is the recently gained popularity of a warm-up phase at the start of training. It has been shown that taking small steps in the initial stages of training can help convergence get faster.
With the help of `SequentialLR`, we mainly enable using a small constant (or linearly increasing) learning rate followed by the actual target learning rate scheduler.
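```python
import torch
from torch.optim.lr_scheduler import ConstantLR, ExponentialLR, SequentialLR

model = torch.nn.Linear(2, 2)  # placeholder model and optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler1 = ConstantLR(optimizer, factor=0.1, total_iters=2)
scheduler2 = ExponentialLR(optimizer, gamma=0.9)
scheduler = SequentialLR(optimizer, schedulers=[scheduler1, scheduler2], milestones=[5])
for epoch in range(100):
    # train(...) and validate(...) go here
    optimizer.step()
    scheduler.step()
```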
This code snippet uses `ConstantLR` for the first 5 epochs and switches to `ExponentialLR` for the following epochs.
This scheduler can be used to call any group of schedulers one after another. The main caveat is that every time we switch to a new scheduler, we assume that the new scheduler starts from the beginning (the zeroth epoch).
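The milestone bookkeeping can be sketched in pure Python. This is a simplified model, assuming each scheduler's local epoch restarts at zero at its milestone as described above; `active_scheduler` is a hypothetical helper, not part of `torch.optim`.

```python
from bisect import bisect_right


def active_scheduler(milestones, epoch):
    """Return (index of the active scheduler, epoch counted from that scheduler's start)."""
    idx = bisect_right(milestones, epoch)
    start = milestones[idx - 1] if idx > 0 else 0
    return idx, epoch - start


for epoch in [0, 4, 5, 7]:
    print(epoch, active_scheduler([5], epoch))
# 0 (0, 0)
# 4 (0, 4)
# 5 (1, 0)
# 7 (1, 2)
```

Using `bisect_right` means the epoch equal to a milestone already belongs to the next scheduler, which then starts at local epoch 0.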
We also add Chained Scheduler to `optim.rst` and `lr_scheduler.pyi` files here.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64037
Reviewed By: albanD
Differential Revision: D30841099
Pulled By: iramazanli
fbshipit-source-id: 94f7d352066ee108eef8cda5f0dcb07f4d371751
Summary:
Partially unblocks https://github.com/pytorch/vision/issues/4281
Previously, we added warm-up schedulers to PyTorch core in PR https://github.com/pytorch/pytorch/pull/60836, with two modes of execution, linear and constant, depending on the warm-up function.
In this PR we change this interface to a more direct form by separating the linear and constant modes into separate schedulers. In particular,
```Python
scheduler1 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")
scheduler2 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="linear")
```
will look like
```Python
scheduler1 = ConstantLR(optimizer, warmup_factor=0.1, warmup_iters=5)
scheduler2 = LinearLR(optimizer, warmup_factor=0.1, warmup_iters=5)
```
correspondingly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64395
Reviewed By: datumbox
Differential Revision: D30753688
Pulled By: iramazanli
fbshipit-source-id: e47f86d12033f80982ddf1faf5b46873adb4f324
Summary:
In this PR we introduce `ChainedScheduler`, which was initially proposed in the discussion https://github.com/pytorch/pytorch/pull/26423#discussion_r329976246 .
The idea is to provide a user-friendly chaining method for schedulers, especially when many of them are involved and we want a clean, easy-to-read interface. This method will be even more crucial once CompositeSchedulers and schedulers for different types of parameters are involved.
The immediate application of `ChainedScheduler` is expected in the TorchVision library, to combine WarmUpLR and MultiStepLR https://github.com/pytorch/vision/blob/master/references/video_classification/scheduler.py#L5 . However, we expect this method to be applicable in many other use cases as well.
### Example
The usage is as simple as below:
```python
sched = ChainedScheduler([ExponentialLR(self.opt, gamma=0.9),
                          WarmUpLR(self.opt, warmup_factor=0.2, warmup_iters=4, warmup_method="constant"),
                          StepLR(self.opt, gamma=0.1, step_size=3)])
```
Then calling
```python
sched.step()
```
would call `step()` on all three schedulers consecutively.
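A minimal pure-Python model of chaining (the `Toy*` classes are hypothetical, not the `torch.optim` implementation). It assumes each chainable scheduler updates the *current* lr in place, so the effects of consecutive `step()` calls compose multiplicatively:

```python
class ToyExponentialLR:
    """Hypothetical chainable scheduler: multiplies the current lr by gamma each step."""

    def __init__(self, group, gamma):
        self.group, self.gamma = group, gamma

    def step(self):
        self.group["lr"] *= self.gamma


class ToyStepLR:
    """Hypothetical chainable scheduler: multiplies the current lr by gamma every step_size steps."""

    def __init__(self, group, step_size, gamma):
        self.group, self.step_size, self.gamma = group, step_size, gamma
        self.last_epoch = 0

    def step(self):
        self.last_epoch += 1
        if self.last_epoch % self.step_size == 0:
            self.group["lr"] *= self.gamma


class ToyChainedScheduler:
    """Calls step() on every wrapped scheduler, in order, with a single call."""

    def __init__(self, schedulers):
        self.schedulers = list(schedulers)

    def step(self):
        for scheduler in self.schedulers:
            scheduler.step()


group = {"lr": 1.0}
sched = ToyChainedScheduler([ToyExponentialLR(group, gamma=0.9),
                             ToyStepLR(group, step_size=3, gamma=0.1)])
for _ in range(3):
    sched.step()
print(group["lr"])  # about 0.9 ** 3 * 0.1 = 0.0729
```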
Partially resolves https://github.com/pytorch/vision/issues/4281
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63491
Reviewed By: datumbox, mruberry
Differential Revision: D30576180
Pulled By: iramazanli
fbshipit-source-id: b43f0749f55faab25079641b7d91c21a891a87e4
Summary:
As discussed in https://github.com/pytorch/pytorch/pull/60836#issuecomment-899084092, we have observed an obstacle to chaining some types of learning rate schedulers. In particular, we observed that
* some of the learning rate schedulers return the initial learning rates at epoch 0 as
```
return self.base_lrs
```
* This can be a problem when two schedulers are chained as
```
scheduler1.step()
scheduler2.step()
```
In particular, we completely ignore the effect of scheduler1 at epoch 0. This would not be an issue if scheduler1 had no effect at epoch 0, as is the case for many schedulers; however, for schedulers such as warm-up schedulers, whose multiplicative factor at epoch 0 is smaller than 1, this can lead to undesired behavior.
The following code snippet illustrates the problem:
## Reproducing the bug
```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR, ExponentialLR
model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 1.0)
scheduler1 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")
scheduler2 = ExponentialLR(optimizer, gamma=0.9)
for epoch in range(10):
    print(epoch, scheduler2.get_last_lr()[0])
    optimizer.step()
    scheduler1.step()
    scheduler2.step()
```
### Current Result
```
0 1.0
1 0.9
2 0.81
3 0.7290000000000001
4 0.6561000000000001
5 5.904900000000001
6 5.314410000000001
7 4.782969000000001
8 4.304672100000001
9 3.874204890000001
```
### Expected Result
```
0 1.0
1 0.9
2 0.81
3 0.7290000000000001
4 0.6561000000000001
5 0.5904900000000001
6 0.5314410000000001
7 0.4782969000000001
8 0.4304672100000001
9 0.3874204890000001
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63457
Reviewed By: datumbox
Differential Revision: D30424160
Pulled By: iramazanli
fbshipit-source-id: 3e15af8d278c872cd6f53406b55f4d3ce5002867
Summary:
Learning rate warm-up was initially discussed by Goyal et al. in the paper https://arxiv.org/pdf/1706.02677.pdf .
In Section 2.2 of the paper, they discuss and propose warming up the learning rate in order to prevent large variance/noise in the learning rate. The idea has since been discussed further in the following papers:
* Akilesh Gotmare et al.: https://arxiv.org/abs/1810.13243
* Bernstein et al.: http://proceedings.mlr.press/v80/bernstein18a/bernstein18a.pdf
* Liyuan Liu et al.: https://arxiv.org/pdf/1908.03265.pdf
There are two popular types of learning rate warm-up:
* Constant warmup (start with a very small constant learning rate)
* Linear warmup (start with a small learning rate and gradually increase it)
In this PR we add warm-up as a learning rate scheduler. Note that learning rate schedulers are chainable, which means that the warm-up scheduler can be combined with any other learning rate scheduler to build more sophisticated schedules.
## Linear Warmup
Linear warmup multiplies the learning rate by a predefined constant, `warmup_factor`, in the first epoch (epoch 0), and then increases this multiplier linearly to one over `warmup_iters` epochs. Hence the multiplier at the i-th step is:
`warmup_factor + (1 - warmup_factor) * i / warmup_iters`
Moreover, the ratio of this quantity at step i to its value at step i-1 is
`1 + (1.0 - warmup_factor) / [warmup_iters * warmup_factor + (i - 1) * (1 - warmup_factor)]`
which is what the `get_lr()` method uses in our implementation. Below is an example showing how to use the linear warmup scheduler and how it behaves.
```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR
model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=10, warmup_method="linear")
for epoch in range(15):
    print(epoch, scheduler.get_last_lr()[0])
    optimizer.step()
    scheduler.step()
```
```
0 0.010000000000000002
1 0.019000000000000003
2 0.028000000000000008
3 0.03700000000000001
4 0.04600000000000001
5 0.055000000000000014
6 0.06400000000000002
7 0.07300000000000002
8 0.08200000000000003
9 0.09100000000000004
10 0.10000000000000005
11 0.10000000000000005
12 0.10000000000000005
13 0.10000000000000005
14 0.10000000000000005
```
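The recursive formula can be checked against the closed form with a short pure-Python sketch, independent of PyTorch (the function names are illustrative only). Starting from `base_lr * warmup_factor` and multiplying by the ratio at each step up to `warmup_iters` should reproduce `base_lr` times the closed-form multiplier, consistent with the printed values above.

```python
def closed_form(i, warmup_factor, warmup_iters):
    """Multiplier at step i: linear from warmup_factor up to 1, then constant at 1."""
    if i >= warmup_iters:
        return 1.0
    return warmup_factor + (1.0 - warmup_factor) * i / warmup_iters


def ratio(i, warmup_factor, warmup_iters):
    """Multiplier at step i divided by the multiplier at step i - 1 (for 1 <= i <= warmup_iters)."""
    return 1.0 + (1.0 - warmup_factor) / (
        warmup_iters * warmup_factor + (i - 1) * (1.0 - warmup_factor))


def chained_lrs(base_lr, warmup_factor, warmup_iters, steps):
    """Build the schedule step by step, multiplying the current lr by the ratio."""
    lrs = [base_lr * warmup_factor]
    for i in range(1, steps):
        lr = lrs[-1]
        if i <= warmup_iters:
            lr *= ratio(i, warmup_factor, warmup_iters)
        lrs.append(lr)
    return lrs


lrs = chained_lrs(0.1, warmup_factor=0.1, warmup_iters=10, steps=15)
for i, lr in enumerate(lrs):
    assert abs(lr - 0.1 * closed_form(i, 0.1, 10)) < 1e-12
print([round(lr, 3) for lr in lrs])
```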
## Constant Warmup
Constant warmup has a straightforward idea: multiply the learning rate by `warmup_factor` until we reach epoch `warmup_iters`, then do nothing in the following epochs.
```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR
model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")
for epoch in range(10):
    print(epoch, scheduler.get_last_lr()[0])
    optimizer.step()
    scheduler.step()
```
```
0 0.010000000000000002
1 0.010000000000000002
2 0.010000000000000002
3 0.010000000000000002
4 0.010000000000000002
5 0.10000000000000002
6 0.10000000000000002
7 0.10000000000000002
8 0.10000000000000002
9 0.10000000000000002
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60836
Reviewed By: saketh-are
Differential Revision: D29537615
Pulled By: iramazanli
fbshipit-source-id: d910946027acc52663b301f9c56ade686e62cb69
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46405, https://github.com/pytorch/pytorch/issues/43352
I updated the docstring in the local file (function level comments). Do I also need to edit somewhere else or recompile docstrings?
Also, although I didn't change any types here, how is the typing documentation (for IDE type checking) generated and used?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46813
Reviewed By: ezyang
Differential Revision: D24923112
Pulled By: vincentqb
fbshipit-source-id: be7818e0d4593bfc5d74023b9c361ac2a538589a
Summary:
Fixes https://github.com/pytorch/pytorch/issues/40362
The new `three_phase` option provides a way of constructing schedules according to the scheme recommended in [Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates](https://arxiv.org/abs/1708.07120).
Note that this change maintains backwards compatibility, and as a result the default behaviour of OneCycleLR remains quite counter-intuitive.
vincentqb
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42715
Reviewed By: heitorschueroff
Differential Revision: D24289744
Pulled By: vincentqb
fbshipit-source-id: e4aad87880716bb14613c0aa8631e43b04a93e5c
Summary:
The subclass sets `self.last_epoch` even though it is already set in the parent class's `__init__`. Why would we need to set `last_epoch` twice? I think calling `super` resets `last_epoch` anyway, so I am not sure why we would want to include this in the subclass. Am I missing something?
For the record, I am just a PyTorch enthusiast. I hope my question isn't totally silly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44613
Reviewed By: albanD
Differential Revision: D23691770
Pulled By: mrshenli
fbshipit-source-id: 080d9acda86e1a2bfaafe2c6fcb8fc1544f8cf8a
Summary:
When an error is raised and `__exit__` in a context manager returns `True`, the error is suppressed; otherwise the error is re-raised. `__exit__` should therefore not return a value, so that the default behavior of the context manager is kept.
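A minimal pure-Python illustration of these `__exit__` semantics (the class names and the `error_escapes` helper are hypothetical; only the standard context-manager protocol is assumed):

```python
class Suppressing:
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return True  # a truthy return value swallows the exception


class Transparent:
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        pass  # implicitly returns None (falsy), so the exception propagates


def error_escapes(cm_class):
    """Return True if a TypeError raised inside the with-block reaches the caller."""
    try:
        with cm_class():
            raise TypeError("get_lr() missing 1 required positional argument: 'step'")
    except TypeError:
        return True
    return False


print(error_escapes(Suppressing))  # False
print(error_escapes(Transparent))  # True
```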
Fixes https://github.com/pytorch/pytorch/issues/32639. The `get_lr` function was overridden with a function taking an epoch parameter, which is not allowed. However, the relevant error was not being raised.
```python
In [1]: import torch
   ...:
   ...: class MultiStepLR(torch.optim.lr_scheduler._LRScheduler):
   ...:     def __init__(self, optimizer, gamma, milestones, last_epoch = -1):
   ...:         self.init_lr = [group['lr'] for group in optimizer.param_groups]
   ...:         self.gamma = gamma
   ...:         self.milestones = milestones
   ...:         super().__init__(optimizer, last_epoch)
   ...:
   ...:     def get_lr(self, step):
   ...:         global_step = self.last_epoch #iteration number in pytorch
   ...:         gamma_power = ([0] + [i + 1 for i, m in enumerate(self.milestones) if global_step >= m])[-1]
   ...:         return [init_lr * (self.gamma ** gamma_power) for init_lr in self.init_lr]
   ...:
   ...: optimizer = torch.optim.SGD([torch.rand(1)], lr = 1)
   ...: scheduler = MultiStepLR(optimizer, gamma = 1, milestones = [10, 20])
```
```
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-7fad6ba050b0> in <module>
14
15 optimizer = torch.optim.SGD([torch.rand(1)], lr = 1)
---> 16 scheduler = MultiStepLR(optimizer, gamma = 1, milestones = [10, 20])
<ipython-input-1-7fad6ba050b0> in __init__(self, optimizer, gamma, milestones, last_epoch)
6 self.gamma = gamma
7 self.milestones = milestones
----> 8 super().__init__(optimizer, last_epoch)
9
10 def get_lr(self, step):
~/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/optim/lr_scheduler.py in __init__(self, optimizer, last_epoch)
75 self._step_count = 0
76
---> 77 self.step()
78
79 def state_dict(self):
~/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/optim/lr_scheduler.py in step(self, epoch)
141 print("1a")
142 # try:
--> 143 values = self.get_lr()
144 # except TypeError:
145 # raise RuntimeError
TypeError: get_lr() missing 1 required positional argument: 'step'
```
May be related to https://github.com/pytorch/pytorch/issues/32898.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32997
Differential Revision: D19737731
Pulled By: vincentqb
fbshipit-source-id: 5cf84beada69b91f91e36b20c3278e9920343655
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27254
`MultiplicativeLR` consumes a function providing the multiplicative factor at each epoch. It mimics `LambdaLR` in its syntax.
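To make the difference from `LambdaLR` concrete, here is a pure-Python model of the two update rules (a sketch, not the `torch.optim` code; it assumes `LambdaLR` sets `lr = base_lr * lr_lambda(epoch)` while `MultiplicativeLR` sets `lr = lr * lr_lambda(epoch)` for every epoch after the first):

```python
def lambda_lr(base_lr, lr_lambda, epochs):
    """LambdaLR-style: the factor scales the *initial* lr."""
    return [base_lr * lr_lambda(epoch) for epoch in range(epochs)]


def multiplicative_lr(base_lr, lr_lambda, epochs):
    """MultiplicativeLR-style: from epoch 1 on, the factor scales the *current* lr."""
    lrs = [base_lr]
    for epoch in range(1, epochs):
        lrs.append(lrs[-1] * lr_lambda(epoch))
    return lrs


def halve(epoch):
    return 0.5


print(lambda_lr(1.0, halve, 4))          # [0.5, 0.5, 0.5, 0.5]
print(multiplicative_lr(1.0, halve, 4))  # [1.0, 0.5, 0.25, 0.125]
```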
Test Plan: Imported from OSS
Differential Revision: D17728088
Pulled By: vincentqb
fbshipit-source-id: 1c4a8e19a4f24c87b5efccda01630c8a970dc5c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27850
Many of these are real problems in the documentation (i.e., link or
bullet point doesn't display correctly).
Test Plan: - built and viewed the documentation for each change locally.
Differential Revision: D17908123
Pulled By: zou3519
fbshipit-source-id: 65c92a352c89b90fb6b508c388b0874233a3817a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26423
Enable chainable schedulers as requested in #13022 by implementing the changes mentioned below from [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513370208).
* Changing the behavior of schedulers to the chainable formula when available
* Using the closed form whenever epoch is different from None until the next release with a deprecation warning
* Making `get_computed_values` the supported way of obtaining the last computed learning rate by the scheduler (see [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513940729) for new syntax)
* Returning a deprecation warning when invoking the undocumented get_lr function (see [comment](https://github.com/pytorch/pytorch/pull/21800#discussion_r294305485)) referring to `get_computed_values`, and deprecating it in the next release.
* `CosineAnnealingWarmRestarts` still takes an epoch parameter, as it is the only scheduler whose mechanics rely on fractional epochs
* `MultiplicativeLR` consumes a function providing the multiplicative factor at each epoch. It mimics `LambdaLR` in its syntax.
# #20527
### Before
The user calls scheduler with a constant epoch either across loops or in the same loop.
```
import torch.optim as optim
from torch import nn
conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)
# Scheduler with sometimes-constant epoch number
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:
    lr_scheduler.step(epoch)
    print(optimizer.param_groups[0]['lr'])
```
### After
If the user wants to step only when the epoch number changes, the check is now done manually:
```
import torch.optim as optim
from torch import nn
conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)
last_epoch = -1
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:
    # Check if epoch number has changed manually
    if epoch-last_epoch > 0:
        lr_scheduler.step()
    last_epoch = epoch
    print(epoch, lr_scheduler.get_computed_values())
```
# #22107
### Before
```
import torch
from torchvision.models import resnet18
net = resnet18()
optimizer = torch.optim.SGD(net.parameters(), 0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)
for i in range(10):
    # Scheduler computes and returns new learning rate, leading to unexpected behavior
    print(i, scheduler.get_lr())
    scheduler.step()
```
### After
```
import torch
from torchvision.models import resnet18
net = resnet18()
optimizer = torch.optim.SGD(net.parameters(), 0.1)
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)
for i in range(10):
    # Returns last computed learning rate by scheduler
    print(i, lr_scheduler.get_computed_values())
    lr_scheduler.step()
```
# ghstack
This contains the changes from #24352. Opening again since they were reverted.
This reverts commit 1c477b7e1f.
Test Plan: Imported from OSS
Differential Revision: D17460427
Pulled By: vincentqb
fbshipit-source-id: 8c10f4e7246d6756ac91df734e8bed65bdef63c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24352
Enable chainable schedulers as requested in #13022 by implementing the changes mentioned below from [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513370208).
* Changing the behavior of schedulers to the chainable formula when available
* Using the closed form whenever epoch is different from None until the next release with a deprecation warning
* Making `get_computed_values` the supported way of obtaining the last computed learning rate by the scheduler (see [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513940729) for new syntax)
* Returning a deprecation warning when invoking the undocumented get_lr function (see [comment](https://github.com/pytorch/pytorch/pull/21800#discussion_r294305485)) referring to `get_computed_values`, and deprecating it in the next release.
* `CosineAnnealingWarmRestarts` still takes an epoch parameter, as it is the only scheduler whose mechanics rely on fractional epochs
* `MultiplicativeLR` consumes a function providing the multiplicative factor at each epoch. It mimics `LambdaLR` in its syntax.
# #20527
### Before
The user calls scheduler with a constant epoch either across loops or in the same loop.
```
import torch.optim as optim
from torch import nn
conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)
# Scheduler with sometimes-constant epoch number
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:
    lr_scheduler.step(epoch)
    print(optimizer.param_groups[0]['lr'])
```
### After
If the user wants to step only when the epoch number changes, the check is now done manually:
```
import torch.optim as optim
from torch import nn
conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)
last_epoch = -1
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:
    # Check if epoch number has changed manually
    if epoch-last_epoch > 0:
        lr_scheduler.step()
    last_epoch = epoch
    print(epoch, lr_scheduler.get_computed_values())
```
# #22107
### Before
```
import torch
from torchvision.models import resnet18
net = resnet18()
optimizer = torch.optim.SGD(net.parameters(), 0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)
for i in range(10):
    # Scheduler computes and returns new learning rate, leading to unexpected behavior
    print(i, scheduler.get_lr())
    scheduler.step()
```
### After
```
import torch
from torchvision.models import resnet18
net = resnet18()
optimizer = torch.optim.SGD(net.parameters(), 0.1)
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)
for i in range(10):
    # Returns last computed learning rate by scheduler
    print(i, lr_scheduler.get_computed_values())
    lr_scheduler.step()
```
Test Plan: Imported from OSS
Differential Revision: D17349760
Pulled By: vincentqb
fbshipit-source-id: 0a6ac01e2a6b45000bc6f9df732033dd81f0d89f
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/23480.
I only verified that the schedule reaches the restart at the expected step, as specified in the issue; it would be good to have someone else verify correctness here.
Script:
```
import torch

scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(torch.optim.SGD([torch.randn(1, requires_grad=True)], lr=0.5), T_0=1, T_mult=2)
for i in range(9):
    print(i)
    print(scheduler.get_lr())
    scheduler.step()
```
Output:
```
0
[0.5]
1
[0.5]
2
[0.25]
3
[0.5]
4
[0.42677669529663687]
5
[0.25]
6
[0.07322330470336313]
7
[0.5]
8
[0.4809698831278217]
```
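To cross-check the output above, the sketch below evaluates the cosine-annealing-with-warm-restarts (SGDR) formula, `eta_min + (base_lr - eta_min) * (1 + cos(pi * T_cur / T_i)) / 2`, in pure Python, independently of PyTorch. It assumes `eta_min = 0` (the default) and that a restart happens as soon as `T_cur` reaches `T_i`, after which `T_i` is multiplied by `T_mult`; `warm_restarts` is a hypothetical helper.

```python
import math


def warm_restarts(base_lr, T_0, T_mult, steps, eta_min=0.0):
    """Cosine annealing with warm restarts (SGDR), evaluated at integer epochs."""
    lrs = []
    T_i, T_cur = T_0, 0
    for _ in range(steps):
        lrs.append(eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * T_cur / T_i)) / 2)
        T_cur += 1
        if T_cur >= T_i:  # restart: begin a new period that is T_mult times longer
            T_cur -= T_i
            T_i *= T_mult
    return lrs


for epoch, lr in enumerate(warm_restarts(0.5, T_0=1, T_mult=2, steps=9)):
    print(epoch, lr)
```

With `T_0=1` and `T_mult=2`, the periods have lengths 1, 2, 4, 8, ..., so restarts (lr back at 0.5) fall at epochs 0, 1, 3 and 7.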
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23833
Differential Revision: D16657251
Pulled By: gchanan
fbshipit-source-id: 713973cb7cbfc85dc333641cbe9feaf917718eb9
Summary:
Resolves issue https://github.com/pytorch/pytorch/issues/19003
The author of this issue also asked that `cycle_momentum` default to `False` if the optimizer does not have a momentum parameter, but I'm not sure what the best way to do this would be. Silently changing the value based on the optimizer may confuse the user in some cases (say the user explicitly set `cycle_momentum=True` but doesn't know that the Adam optimizer doesn't use momentum).
Maybe printing a warning when switching this argument's value would suffice?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20401
Differential Revision: D15765463
Pulled By: ezyang
fbshipit-source-id: 88ddabd9e960c46f3471f37ea46013e6b4137eaf
Summary:
This PR addresses the problem described in the comment: https://github.com/pytorch/pytorch/pull/20203#issuecomment-499231276
and the previously coded bad behaviour:
- a warning was raised every time an LR scheduler was initialized
Now the code checks that:
- on the second call of `lr_scheduler.step`, `optimizer.step` has already been called; otherwise it raises a warning (as was done in #20203)
- if the optimizer's `step` is overridden, it raises a different warning once, to make the user aware of the new call order
`opt.step()` -> `lrs.step()`, since we cannot check it in that case.
Now tests check that
- at initialization (`lrs = StepLR(...)`) there are no warnings
- if we replace `optimizer.step` by something else (similarly to the [code of nvidia/apex](https://github.com/NVIDIA/apex/blob/master/apex/amp/_process_optimizer.py#L287)) there is another warning raised.
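The call-order check can be sketched with a hypothetical pure-Python optimizer and scheduler that both count their `step()` calls (`ToyOptimizer` and `ToyScheduler` are illustrative stand-ins, and the override detection is omitted): the scheduler's constructor steps once without warning, and only the second scheduler call warns if the optimizer has not stepped yet.

```python
import warnings


class ToyOptimizer:
    def __init__(self):
        self._step_count = 0

    def step(self):
        self._step_count += 1


class ToyScheduler:
    def __init__(self, optimizer):
        self.optimizer = optimizer
        self._step_count = 0
        self.step()  # the initial step in __init__ must not warn

    def step(self):
        self._step_count += 1
        # Only on the second call, warn if optimizer.step() has never been called.
        if self._step_count == 2 and self.optimizer._step_count < 1:
            warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`.")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    scheduler = ToyScheduler(ToyOptimizer())
    scheduler.step()  # old pattern: scheduler stepped before the optimizer
print(len(caught))  # 1
```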
cc ezyang
PS. Honestly, I would say that a lot of overhead is introduced for simple warnings. I hope all these checks will be removed in `1.2.0` or a later version...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21460
Differential Revision: D15701776
Pulled By: ezyang
fbshipit-source-id: eac5712b9146d9d3392a30f6339cd33d90c497c7
Summary:
Fixes a typo in the CyclicLR docs by adding the `lr_scheduler` module to the path, and adds the other required arguments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21021
Differential Revision: D15530109
Pulled By: soumith
fbshipit-source-id: 98781bdab8d82465257229e50fa3bd0015da1286
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20880
This clarifies how the momentum parameters should be used.
Reviewed By: soumith
Differential Revision: D15482450
fbshipit-source-id: e3649a38876c5912cb101d8e404abca7c3431766
Summary:
Class attributes should preferably be explicitly initialized within
the __init__() call. Otherwise, overriding step() is
prone to bugs.
This patch partially reverts #7889
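A small sketch of the failure mode, using hypothetical classes rather than the real schedulers: when the base class only creates `last_epoch` inside `step()`, a subclass that overrides `step()` never gets the attribute, whereas initializing it in `__init__` makes the override safe.

```python
class LazyBase:
    """Hypothetical base class that only creates last_epoch inside step()."""

    def __init__(self):
        self.step()

    def step(self, epoch=None):
        if epoch is None:
            epoch = getattr(self, "last_epoch", -1) + 1
        self.last_epoch = epoch  # the attribute is only ever created here


class EagerBase:
    """Same logic, but last_epoch is initialized explicitly in __init__."""

    def __init__(self, last_epoch=-1):
        self.last_epoch = last_epoch
        self.step()

    def step(self):
        self.last_epoch += 1


class LazySub(LazyBase):
    def step(self, epoch=None):
        self.last_epoch += 1  # AttributeError: LazyBase.step() never ran


class EagerSub(EagerBase):
    def step(self):
        self.last_epoch += 1  # fine: EagerBase.__init__ already set it
```

Constructing `LazySub()` raises `AttributeError`, while `EagerSub()` ends with `last_epoch == 0`.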
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20059
Differential Revision: D15195747
Pulled By: soumith
fbshipit-source-id: 3d1a51d8c725d6f14e3e91ee94c7bc7a7d6c1713
Summary:
Because of a merge error with master in #15042, I'm opening a new PR for ezyang.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17226
Differential Revision: D14418145
Pulled By: mrshenli
fbshipit-source-id: 099ba225b28e6aba71760b81b2153ad1c40fbaae
Summary:
Added the formula for the corner case. Updated unit tests.
Fixes #17913
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19180
Differential Revision: D14942023
Pulled By: ezyang
fbshipit-source-id: 167c109b97a7830d5b24541dc91e4788d531feec
Summary:
Hello everyone :) !!
I've found that the lr_scheduler is initialized with `last_epoch` set to -1.
As a result, even after the first step (not the one in `__init__`, but the first explicit call to the scheduler's `step`),
the learning rate of the scheduler's optimizer stays at its previous value.
```python
>>> import torch
>>> cc = torch.nn.Conv2d(10,10,3)
>>> myinitial_lr = 0.1
>>> myoptimizer = torch.optim.Adam(cc.parameters(), lr=myinitial_lr)
>>> mylrdecay = 0.5
>>> myscheduler = torch.optim.lr_scheduler.ExponentialLR(myoptimizer,mylrdecay)
>>> myscheduler.get_lr()
[0.2] # because get_lr computes the lr as 0.1 * 0.5**-1
>>> myscheduler.optimizer.param_groups[0]["lr"]
0.1 # this is not consistent with get_lr value
>>> myscheduler.last_epoch
-1
>>> myscheduler.step()
>>> myscheduler.get_lr()
[0.1] # this should be the value right after init, not after the first step
>>> myscheduler.optimizer.param_groups[0]["lr"]
0.1 # since this is after the first step, it should have been decayed to 0.05
>>> myscheduler.last_epoch
0
>>> myscheduler.step()
>>> myscheduler.last_epoch
1
>>> myscheduler.get_lr()
[0.05]
>>> myscheduler.optimizer.param_groups[0]["lr"]
0.05
>>> myscheduler.last_epoch
1
```
The first problem is that, even right after the lr_scheduler is initialized, you get inconsistent parameter values.
The second problem is that you are stuck with the same learning rate for the first 2 epochs if the lr_scheduler's `step` is not called at the beginning of the epoch loop.
Of course, you can avoid this by calling the lr_scheduler's `step` at the beginning,
but I don't think this is proper usage since, in the case of the optimizer, `step` is called at the end of the iteration loop.
I've simply avoided all of the above issues by setting `last_epoch` to 0 after initialization.
This also makes sense when you initialize with a `last_epoch` value other than -1.
For example, if you want to initialize with last epoch 10,
the lr should not be decayed one step further, which is what the previous code does
because `last_epoch` gets +1 before
base_lr * self.gamma ** self.last_epoch
is evaluated. Instead, it should be set to the exact value for step 10.
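A minimal sketch of the convention proposed above, using a hypothetical `ToyExponentialLR` that operates on a plain dict standing in for an optimizer param group (this is not PyTorch's code, and the proposal was later partially reverted). After construction the lr equals `base_lr * gamma ** last_epoch` for the given `last_epoch`, with no extra +1, and -1 is treated as epoch 0.

```python
class ToyExponentialLR:
    """Hypothetical sketch of the proposed last_epoch convention."""

    def __init__(self, param_group, gamma, last_epoch=-1):
        # Assumption: param_group["lr"] still holds the base (initial) lr.
        self.param_group = param_group
        self.base_lr = param_group["lr"]
        self.gamma = gamma
        self.last_epoch = max(last_epoch, 0)  # -1 means "start at epoch 0"
        self._apply()

    def _apply(self):
        # Closed form quoted above: base_lr * gamma ** last_epoch
        self.param_group["lr"] = self.base_lr * self.gamma ** self.last_epoch

    def step(self):
        # Called at the end of each epoch, after optimizer.step().
        self.last_epoch += 1
        self._apply()
```

With `{"lr": 0.1}` and `gamma=0.5`, the lr is 0.1 right after construction and 0.05 after the first `step()`; initializing with `last_epoch=10` gives exactly `0.1 * 0.5 ** 10`.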
I hope this fix finds its way in with all your help :)
I'm really looking forward to, and excited about, becoming a contributor to PyTorch!
Pytorch Rocks!!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/7889
Differential Revision: D15012769
Pulled By: ezyang
fbshipit-source-id: 258fc3009ea7b7390a3cf2e8a3682eafb506b08b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**
This was requested by someone at Facebook; this lint is turned
on for Facebook by default. "Sure, why not."
I had to noqa a number of imports in __init__. Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it. Left for future work.
Be careful! flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments. flake8-3 will
report an import unused; flake8-2 will not. For now, I just
noqa'd all these sites.
All the changes were done by hand.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14687478
fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
Summary:
This implements a cyclical learning rate (CLR) schedule with an optional inverse cyclical momentum. More info about CLR: https://github.com/bckenstler/CLR
This is finishing what #2016 started. Resolves #1909.
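As a rough illustration of the schedule, here is the "triangular" policy formula as written in the CLR reference repository linked above, plus an inverse momentum cycle. The function names are hypothetical and this is a sketch, not the `CyclicLR` source: the lr rises linearly from `base_lr` to `max_lr` over `step_size` iterations and falls back, while momentum moves the opposite way.

```python
import math


def _cycle_position(iteration, step_size):
    """Return x in [0, 1]: 1 at the bottom of a cycle, 0 at its peak."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    return abs(iteration / step_size - 2 * cycle + 1)


def triangular_lr(iteration, base_lr, max_lr, step_size):
    x = _cycle_position(iteration, step_size)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)


def inverse_momentum(iteration, base_momentum, max_momentum, step_size):
    # Momentum is at max_momentum when the lr is at base_lr, and vice versa.
    x = _cycle_position(iteration, step_size)
    return max_momentum - (max_momentum - base_momentum) * max(0.0, 1 - x)
```

With `step_size=4`, the lr is at `base_lr` for iterations 0 and 8, at `max_lr` for iteration 4, and halfway between for iteration 2.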
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18001
Differential Revision: D14451845
Pulled By: sampepose
fbshipit-source-id: 8f682e0c3dee3a73bd2b14cc93fcf5f0e836b8c9
Summary:
Modified step_lr for StepLR, MultiStepLR, ExponentialLR and CosineAnnealingLR. In this way, multiple schedulers can be used simultaneously to modify the learning rates.
Related issue: https://github.com/pytorch/pytorch/issues/13022
Added unit tests combining multiple schedulers.
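Why this enables chaining can be shown with hypothetical toy schedulers operating on a dict in place of a param group: schedulers that multiply the *current* lr compose, while one that recomputes the lr from its own `base_lr` overwrites whatever the other scheduler did.

```python
class ChainableStepLR:
    """Hypothetical: multiplies the current lr by gamma every step_size epochs."""

    def __init__(self, group, step_size, gamma):
        self.group, self.step_size, self.gamma, self.epoch = group, step_size, gamma, 0

    def step(self):
        self.epoch += 1
        if self.epoch % self.step_size == 0:
            self.group["lr"] *= self.gamma


class ChainableExponentialLR:
    """Hypothetical: multiplies the current lr by gamma every epoch."""

    def __init__(self, group, gamma):
        self.group, self.gamma = group, gamma

    def step(self):
        self.group["lr"] *= self.gamma


class ClosedFormExponentialLR:
    """For contrast: recomputes lr from base_lr, discarding other changes."""

    def __init__(self, group, gamma):
        self.group, self.gamma, self.base_lr, self.epoch = group, gamma, group["lr"], 0

    def step(self):
        self.epoch += 1
        self.group["lr"] = self.base_lr * self.gamma ** self.epoch


def run(schedulers, epochs):
    for _ in range(epochs):
        for scheduler in schedulers:
            scheduler.step()
```

After 4 epochs starting from lr 1.0, the chainable pair yields `0.1**2 * 0.5**4`, whereas pairing the step decay with the closed-form scheduler leaves only `0.5**4`.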
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14010
Reviewed By: ezyang
Differential Revision: D13494941
Pulled By: chandlerzuo
fbshipit-source-id: 7561270245639ba1f2c00748f8e4a5f7dec7160c
Summary:
I opened an issue explaining some of my frustrations with the current state of schedulers.
While most points that I raised in [that issue](https://github.com/pytorch/pytorch/issues/8741#issuecomment-404449697) need to be discussed more thoroughly before being implemented, there are some that are not so difficult to fix.
This PR changes the way the LambdaLR scheduler gets serialized:
> The lr_lambda functions are only saved if they are callable objects (which can be stateful).
> There is no point in saving functions/lambdas as you need their definition before unpickling and they are stateless.
This has the big advantage that the scheduler is serializable, even if you use lambda functions or locally defined functions (aka a function in a function).
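A minimal sketch of this serialization rule with hypothetical helpers `lambdas_state_dict` and `load_lambdas_state_dict` (illustrative names, not the scheduler's methods): plain functions and lambdas are stored as `None`, and callable objects contribute a copy of their `__dict__`, so the resulting state is picklable even when lambdas are used.

```python
import pickle
import types


class WarmupFactor:
    """Hypothetical stateful callable: its attributes are worth saving."""

    def __init__(self, warmup_epochs):
        self.warmup_epochs = warmup_epochs

    def __call__(self, epoch):
        return min(1.0, (epoch + 1) / self.warmup_epochs)


def lambdas_state_dict(lr_lambdas):
    """Save callable objects' attributes; plain functions/lambdas become None."""
    return [None if isinstance(fn, types.FunctionType) else fn.__dict__.copy()
            for fn in lr_lambdas]


def load_lambdas_state_dict(lr_lambdas, saved):
    """Restore saved attributes onto freshly constructed callables."""
    for fn, state in zip(lr_lambdas, saved):
        if state is not None:
            fn.__dict__.update(state)
```

The user re-creates the lambdas in code before loading; only the callable object's state travels through `pickle`.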
Does this functionality need any unit tests?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9927
Differential Revision: D9055505
Pulled By: soumith
fbshipit-source-id: 6c1cec588beedd098ec7d2bce6a9add27f29e48f
* Clarify patience in ReduceLROnPlateau docs
It's unclear which definition of patience we have. The two ways to
interpret it are:
- How many bad epochs can you see before you start considering changing the learning rate.
- How many bad epochs can you see before you change the learning rate.
This PR clarifies the docs with an example. If `patience = 2`, then
after 2 bad epochs, we begin considering changing the learning rate.
After seeing one more epoch (the 3rd epoch), if that epoch is also bad,
then we change the learning rate after it.
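The `patience` semantics described above, sketched as a hypothetical function for `mode='min'` that ignores `threshold` and `cooldown` for simplicity: the lr is reduced only when the number of consecutive bad epochs becomes strictly greater than `patience`.

```python
def epochs_with_lr_reduction(losses, patience):
    """Hypothetical sketch: epoch indices after which the lr would be reduced."""
    best = float("inf")
    num_bad_epochs = 0
    reductions = []
    for epoch, loss in enumerate(losses):
        if loss < best:
            best = loss
            num_bad_epochs = 0
        else:
            num_bad_epochs += 1
        if num_bad_epochs > patience:  # strictly more than `patience` bad epochs
            reductions.append(epoch)
            num_bad_epochs = 0  # start counting again after a reduction
    return reductions
```

With `patience=2` and losses `[10, 11, 12, 13]`, epochs 1 and 2 are bad without a reduction, and the reduction happens after epoch 3 (the third bad epoch).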
* address comments
* Fix LaTeX rendering in CosineAnnealingLR
Backslashes were interpreted by Python as escapes in the string, so \frac
turned into a form feed (\f) followed by "rac", which is not a valid LaTeX command.
This could be fixed with double backslashes, but the easiest solution is to
just use a raw (r) docstring.
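A quick check of the escape problem: in a normal string literal `\f` is the form-feed escape, so the backslash and `f` collapse into one control character, while a raw string keeps `\frac` intact for Sphinx/MathJax. The function name below is hypothetical.

```python
plain = "\frac{1}{2}"  # "\f" is parsed as a form-feed escape
raw = r"\frac{1}{2}"   # raw string keeps the backslash


def cosine_doc_example():
    r"""Hypothetical docstring: :math:`\frac{1}{2}(1 + \cos(\pi t / T))`."""
```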
* Fix sphinx warnings for LRN doc headings
* Move LRN docstring from __init__ to class level
The docstring was not rendered by sphinx at
http://pytorch.org/docs/master/nn.html#torch.nn.LocalResponseNorm
because it was in the constructor.
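The difference is visible from Python itself: a docstring placed in `__init__` belongs to the method, and the class's own `__doc__` stays `None`, which is what Sphinx's `autoclass` renders under its default `autoclass_content = "class"` setting. The class names below are hypothetical.

```python
class DocInInit:
    def __init__(self, size):
        """Applies local response normalization over an input signal."""
        self.size = size


class DocAtClassLevel:
    """Applies local response normalization over an input signal."""

    def __init__(self, size):
        self.size = size
```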
* Remove superfluous backticks from LRN formula