Commit Graph

212 Commits

Author SHA1 Message Date
Alban Desmaison
46b252b83a Revert D24262885: [pytorch][PR] Added foreach_zero_ API
Test Plan: revert-hammer

Differential Revision:
D24262885 (8e37dcb1f3)

Original commit changeset: 144c283dd009

fbshipit-source-id: 451b202e23bc1fcb11b20d26c11d9a1329789d22
2020-10-28 06:48:59 -07:00
iurii zdebskyi
8e37dcb1f3 Added foreach_zero_ API (#46215)
Summary:
Added foreach_zero_(TensorList) API

Tested via unit tests
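
A minimal usage sketch (illustrative addition, assuming the `torch._foreach_zero_` binding behaves as described in this commit):

```python
import torch

tensors = [torch.randn(3) for _ in range(4)]
torch._foreach_zero_(tensors)   # zeros every tensor in the list, in place
assert all(float(t.abs().sum()) == 0.0 for t in tensors)
```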

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46215

Reviewed By: zhangguanheng66

Differential Revision: D24262885

Pulled By: izdeby

fbshipit-source-id: 144c283dd00924083096d6d92eb9085cbd6097d3
2020-10-27 18:03:34 -07:00
Alexander Grund
5b0f400488 Replace list(map(...)) constructs by list comprehensions (#46461)
Summary:
As discussed in https://github.com/pytorch/pytorch/issues/46392 this makes the code more readable and possibly more performant.

It also fixes a bug this change surfaced, where the argument order of `map` was swapped: 030a24906e (diff-5bb26bd3a23ee3bb540aeadcc0385df2a4e48de39f87ed9ea76b21990738fe98L1537-R1537)

Fixes https://github.com/pytorch/pytorch/issues/46392
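
For illustration (not from the PR), the kind of rewrite applied; the comprehension also makes `map` argument-order mistakes impossible:

```python
values = range(5)

# before: list(map(...)) needs a callable and its argument order is easy to confuse
squares_map = list(map(lambda x: x * x, values))

# after: the comprehension reads left to right, no extra lambda
squares_comp = [x * x for x in values]

assert squares_map == squares_comp
```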

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46461

Reviewed By: ailzhang

Differential Revision: D24367015

Pulled By: ezyang

fbshipit-source-id: d55a67933cc22346b00544c9671f09982ad920e7
2020-10-19 18:42:49 -07:00
Iurii Zdebskyi
e7564b076c Refactor scalar list APIs to use overloads (#45673)
Summary:
Refactor foreach APIs to use overloads in case of scalar list inputs.
Tested via unit tests.
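
An illustrative call of a scalar-list overload (a sketch, assuming `torch._foreach_add` accepts a list of scalars after this refactor):

```python
import torch

tensors = [torch.ones(2) for _ in range(3)]
# one scalar per tensor in the list
result = torch._foreach_add(tensors, [1.0, 2.0, 3.0])
# result[i] == tensors[i] + scalars[i]
```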

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45673

Reviewed By: heitorschueroff

Differential Revision: D24053424

Pulled By: izdeby

fbshipit-source-id: 35976cc50b4acfe228a32ed26cede579d5621cde
2020-10-19 09:28:49 -07:00
Aiden Nibali
2bc6caa9e4 Add three-phase option to OneCycleLR (#42715)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/40362

The new `three_phase` option provides a way of constructing schedules according to the scheme recommended in [Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates](https://arxiv.org/abs/1708.07120).

Note that this change maintains backwards compatibility, and as a result the default behaviour of OneCycleLR remains quite counter-intuitive.
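
A short sketch of the new option (illustrative, not from the PR):

```python
import torch
from torch.optim.lr_scheduler import OneCycleLR

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# three_phase=True: linear warm-up, a symmetric cool-down, then a final phase
# that anneals the learning rate far below the initial value
scheduler = OneCycleLR(optimizer, max_lr=0.1, total_steps=100, three_phase=True)

for _ in range(100):
    optimizer.step()
    scheduler.step()
```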

vincentqb

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42715

Reviewed By: heitorschueroff

Differential Revision: D24289744

Pulled By: vincentqb

fbshipit-source-id: e4aad87880716bb14613c0aa8631e43b04a93e5c
2020-10-14 15:05:14 -07:00
Iurii Zdebskyi
8a074af929 Added scalar lists APIs for addcdiv and addcmul (#45932)
Summary:
1) Added new APIs:
 _foreach_addcdiv(Tensor(a!)[] self, Tensor[] tensor1, Tensor[] tensor2, float[] scalars)
 _foreach_addcdiv_(Tensor(a!)[] self, Tensor[] tensor1, Tensor[] tensor2, float[] scalars)
 _foreach_addcmul(Tensor(a!)[] self, Tensor[] tensor1, Tensor[] tensor2, float[] scalars)
 _foreach_addcmul_(Tensor(a!)[] self, Tensor[] tensor1, Tensor[] tensor2, float[] scalars)

2) Updated optimizers to use new APIs

Tested via unit tests
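
An illustrative call (a sketch, assuming the scalar-list variants listed above):

```python
import torch

params  = [torch.zeros(3) for _ in range(2)]
tensor1 = [torch.ones(3) for _ in range(2)]
tensor2 = [torch.full((3,), 2.0) for _ in range(2)]

# in place: params[i] += scalars[i] * tensor1[i] / tensor2[i]
torch._foreach_addcdiv_(params, tensor1, tensor2, [1.0, 0.5])
```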

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45932

Reviewed By: navahgar

Differential Revision: D24150306

Pulled By: izdeby

fbshipit-source-id: c2e65dedc95d9d81a2fdd116e41df0accb0b6f26
2020-10-14 08:12:37 -07:00
Iurii Zdebskyi
1a57b390e8 Add torch._foreach_maximum(TensorList, TensorList) & torch._foreach_minimum(TensorList, TensorList) APIs (#45692)
Summary:
- Adding torch._foreach_maximum(TensorList, TensorList) API
- Adding torch._foreach_minimum(TensorList, TensorList) API
- Updated Adam/AdamW optimizers

Tested via unit tests
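
Illustrative calls (not from the PR), assuming the bindings as named in the title:

```python
import torch

a = [torch.tensor([1.0, 5.0]), torch.tensor([2.0])]
b = [torch.tensor([3.0, 4.0]), torch.tensor([1.0])]

torch._foreach_maximum(a, b)   # [tensor([3., 5.]), tensor([2.])]
torch._foreach_minimum(a, b)   # [tensor([1., 4.]), tensor([1.])]
```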

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45692

Reviewed By: anjali411

Differential Revision: D24142464

Pulled By: izdeby

fbshipit-source-id: 6a4fc343a1613cb1e26c8398450ac9cea0a2eb51
2020-10-13 09:22:30 -07:00
Iurii Zdebskyi
8c309fc052 Add more tests for mt optimizers (#45475)
Summary:
Add more test cases for mt optimizers and fix Adam/AdamW

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45475

Reviewed By: soumith

Differential Revision: D23982727

Pulled By: izdeby

fbshipit-source-id: 4b24d37bd52a2fa3719d3e3a5dcf3b96990b0f5b
2020-09-28 23:59:58 -07:00
Iurii Zdebskyi
722faeb2a4 [RELAND] Added optimizers based on multi tensor apply (#45408)
Summary:
Original PR: https://github.com/pytorch/pytorch/pull/45299. The present PR fixes the minor bugs that caused the revert.

This adds a new namespace, `torch.optim._multi_tensor`, with updated optimizers that use the `_foreach` APIs to improve performance significantly.
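
Swapping an optimizer in is a one-line change (a sketch, assuming the private `torch.optim._multi_tensor` namespace is importable in the build at hand):

```python
import torch
import torch.optim._multi_tensor as mt_optim

model = torch.nn.Linear(10, 10)
# same constructor arguments as torch.optim.Adam; only the import changes
optimizer = mt_optim.Adam(model.parameters(), lr=1e-3)
```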

### Tests
- updated existing tests to use both optimizers
- added `test_multi_tensor_optimizers` test to verify correctness.

### Perf results

**Adam**
timeit: 42.69 ms --> 10.16 ms
autorange: 41.96 ms --> 10.28 ms

**AdamW**
timeit: 51.38 ms --> 15.63 ms
autorange: 50.82 ms --> 16.07 ms

**SGD**
timeit: 6.28 ms --> 4.40 ms
autorange: 6.13 ms --> 4.73 ms

**RMSprop**
timeit: 28.63 ms --> 5.89 ms
autorange: 28.27 ms -->  5.76 ms

**Rprop**
timeit: 213.30 --> 178.42
autorange: 212.03 --> 178.03

**ASGD**
timeit: 21.67 --> 9.33
autorange: 21.64 --> 9.27

**Adamax**
timeit: 55.60 --> 48.29
autorange: 55.22 -> 49.13

**Perf script used**

```python
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torch.utils._benchmark as benchmark_utils

device = "cuda"
model = torchvision.models.resnet.resnet101(pretrained=True).to(device)
criterion = nn.CrossEntropyLoss()

optimizer = optim.SGD(model.parameters(), lr=1e-3)  # <----- optimizer under test:
                                                    # compare optim.SGD vs optim._multi_tensor.SGD
target = torch.empty(128, dtype=torch.long, device=device).random_(5)

# One forward/backward pass so every parameter has a gradient before timing step().
optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device=device, requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss = loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="optimizer.step()",
        globals=globals(),
        label=str(optimizer),
    )

    for i in range(1):
        print(f"Run: {i}\n{'-' * 40}")
        print(f"timeit:\n{timer.timeit(1000)}\n")
        print(f"autorange:\n{timer.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45408

Reviewed By: gchanan

Differential Revision: D23956680

Pulled By: izdeby

fbshipit-source-id: c5eab7bf5fce14a287c15cead1cdc26e42cfed94
2020-09-28 13:14:04 -07:00
Mike Ruberry
54a253fded Revert D23931987: Added optimizers based on multi tensor apply
Test Plan: revert-hammer

Differential Revision:
D23931987 (2b21e7767e)

Original commit changeset: 582134ef2d40

fbshipit-source-id: ffd500aea55fda34155442fb15e2529cb9c00100
2020-09-26 18:11:54 -07:00
Iurii Zdebskyi
2b21e7767e Added optimizers based on multi tensor apply (#45299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45299

This PR adds a new namespace, `torch.optim._multi_tensor`, with updated optimizers that use the `_foreach` APIs to improve performance significantly.

### Tests
- updated existing tests to use both optimizers
- added `test_multi_tensor_optimizers` test to verify correctness.

### Perf results

**Adam**
timeit: 42.69 ms --> 10.16 ms
autorange: 41.96 ms --> 10.28 ms

**AdamW**
timeit: 51.38 ms --> 15.63 ms
autorange: 50.82 ms --> 16.07 ms

**SGD**
timeit: 6.28 ms --> 4.40 ms
autorange: 6.13 ms --> 4.73 ms

**RMSprop**
timeit: 28.63 ms --> 5.89 ms
autorange: 28.27 ms -->  5.76 ms

**Rprop**
timeit: 213.30 --> 178.42
autorange: 212.03 --> 178.03

**ASGD**
timeit: 21.67 --> 9.33
autorange: 21.64 --> 9.27

**Adamax**
timeit: 55.60 --> 48.29
autorange: 55.22 -> 49.13

**Perf script used**

```python
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torch.utils._benchmark as benchmark_utils

device = "cuda"
model = torchvision.models.resnet.resnet101(pretrained=True).to(device)
criterion = nn.CrossEntropyLoss()

optimizer = optim.SGD(model.parameters(), lr=1e-3)  # <----- optimizer under test:
                                                    # compare optim.SGD vs optim._multi_tensor.SGD
target = torch.empty(128, dtype=torch.long, device=device).random_(5)

# One forward/backward pass so every parameter has a gradient before timing step().
optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device=device, requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss = loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="optimizer.step()",
        globals=globals(),
        label=str(optimizer),
    )

    for i in range(1):
        print(f"Run: {i}\n{'-' * 40}")
        print(f"timeit:\n{timer.timeit(1000)}\n")
        print(f"autorange:\n{timer.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23931987

Pulled By: izdeby

fbshipit-source-id: 582134ef2d402909d27d89a45c5b588fb7130ea1
2020-09-26 12:17:43 -07:00
Wanchao Liang
32c355af5b [dist_optim] introduce distributed functional optimizer (#45221)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45221

This PR introduces a distributed functional optimizer, so that the
distributed optimizer can reuse the functional optimizer APIs while
maintaining its own state. This enables a TorchScript-compatible
functional optimizer when using the distributed optimizer, which helps
get rid of the GIL and improves the overall performance of training,
especially distributed model-parallel training.

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D23935256

Pulled By: wanchaol

fbshipit-source-id: 59b6d77ff4693ab24a6e1cbb6740bcf614cc624a
2020-09-25 17:13:10 -07:00
Wanchao Liang
08caf15502 [optimizer] refactor Adam to use functional API (#44791)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44791

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D23935257

Pulled By: wanchaol

fbshipit-source-id: 6f6e22a9287f5515d2e4e6abd4dee2fe7e17b945
2020-09-25 17:13:08 -07:00
Wanchao Liang
0444c372e1 [optimizer] introduce optimizer functional API, refactor Adagrad (#44715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44715

We have provided a nice and intuitive API in Python. But in the context of large scale distributed training (e.g. Distributed Model Parallel), users often want to use multithreaded training instead of multiprocess training as it provides better resource utilization and efficiency.

This PR introduces a functional optimizer concept (similar to the concept of `nn.functional`): we split an optimizer into two parts, (1) optimizer state management and (2) optimizer computation, and expose the computation part as a separate functional API available to internal and OSS developers. Callers of the functional API maintain their own state and call the functional API directly. The end-user API stays the same, while the functional API is TorchScript friendly and can be used by the distributed optimizer to speed up training without the GIL.
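
A deliberately simplified sketch of that split; the function name and signature below are hypothetical illustrations of the concept, not the actual private API:

```python
import torch
from typing import List

def functional_sgd_step(params: List[torch.Tensor],
                        grads: List[torch.Tensor],
                        lr: float) -> None:
    """Pure computation: the caller owns the parameters, gradients, and any state."""
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(g, alpha=-lr)

w = torch.nn.Parameter(torch.randn(3))
(w * w).sum().backward()
functional_sgd_step([w], [w.grad], lr=0.1)   # state management stays with the caller
```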

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D23935258

Pulled By: wanchaol

fbshipit-source-id: d2a5228439edb3bc64f7771af2bb9e891847136a
2020-09-25 17:10:26 -07:00
Michael Carilli
3e6bb5233f Reference amp tutorial (recipe) from core amp docs (#44725)
Summary:
https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html is live.  Core amp docs should reference it.

Also, I fixed some typos in the `zero_grad` docs that we overlooked when git was behaving weirdly during ngimel's merge of https://github.com/pytorch/pytorch/pull/44423.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44725

Reviewed By: mruberry

Differential Revision: D23723807

Pulled By: ngimel

fbshipit-source-id: ca0b76365f8ca908bd978e3b38bf81857fa6c2a3
2020-09-16 11:37:58 -07:00
Kent Gauen
2efc618f19 lr_scheduler.py redundant code (#44613)
Summary:
The subclass sets `self.last_epoch` even though it is already set in the parent class's `__init__`. Why would we need to set `last_epoch` twice? Calling `super().__init__` resets `last_epoch` anyway, so I am not sure why we would want to keep this in the subclass. Am I missing something?

For the record, I am just a PyTorch enthusiast. I hope my question isn't totally silly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44613

Reviewed By: albanD

Differential Revision: D23691770

Pulled By: mrshenli

fbshipit-source-id: 080d9acda86e1a2bfaafe2c6fcb8fc1544f8cf8a
2020-09-15 20:28:39 -07:00
Xiang Gao
6bc77f4d35 Use amax/maximum instead of max in optimizers (#43797)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43797
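
The distinction the title refers to, illustrated (added here, not from the PR):

```python
import torch

a = torch.tensor([1.0, 4.0])
b = torch.tensor([3.0, 2.0])

torch.maximum(a, b)   # elementwise maximum: tensor([3., 4.])
a.amax()              # reduction without indices: tensor(4.)
a.max()               # overloaded: a reduction here, elementwise when given another tensor
```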

Reviewed By: malfet

Differential Revision: D23406641

Pulled By: mruberry

fbshipit-source-id: 0cd075124aa6533b21375fe2c90c44a5d05ad6e6
2020-09-15 10:39:40 -07:00
taiyuanz
c515881137 Add reset_grad() function (#44423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44423

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42754

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23010859

Pulled By: ngimel

fbshipit-source-id: 56eec43eba88b98cbf714841813977c68f983564
2020-09-09 22:05:45 -07:00
Randall Hunt
24eea364f7 Check SparseAdam params are dense on init (#41966) (#43668)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41966

Raises a ValueError if the user attempts to create a SparseAdam optimizer with sparse parameter tensors.
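
Illustration of the new check (a sketch; the exact error message may differ):

```python
import torch

dense  = torch.randn(2, 2, requires_grad=True)
sparse = torch.randn(2, 2).to_sparse().requires_grad_()

torch.optim.SparseAdam([dense])    # fine: dense parameters (sparse gradients come later)
torch.optim.SparseAdam([sparse])   # now raises ValueError at construction time
```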

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43668

Reviewed By: glaringlee

Differential Revision: D23388109

Pulled By: ranman

fbshipit-source-id: 1fbcc7527d49eac6fae9ce51b3307c609a6ca38b
2020-09-01 14:25:59 -07:00
NTT123
103887892c Fix "non-negative integer" error messages (#42734)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42662

Use "positive integer" error message for consistency with: 17f76f9a78/torch/optim/lr_scheduler.py (L958-L959)
ad7133d3c1/torch/utils/data/sampler.py (L102-L104)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42734

Reviewed By: zdevito

Differential Revision: D23039575

Pulled By: smessmer

fbshipit-source-id: 1be1e0caa868891540ecdbe6f471a6cd51c40ede
2020-08-10 19:39:37 -07:00
Vincent Quenneville-Belair
7221a3d1aa enable torch.optim.swa_utils.SWALR (#42574)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42435

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42574

Reviewed By: zou3519

Differential Revision: D22949369

Pulled By: vincentqb

fbshipit-source-id: f2f319ec94a97e0afe4d4327c866504ae632a986
2020-08-05 12:37:45 -07:00
Yanli Zhao
79cfd85987 grad detach_ only when it has grad_fn in zero_grad call (#41283)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41283

In optimizer.zero_grad(), detach_ is only needed to avoid a memory leak when the grad has a grad_fn, so add a check and call grad.detach_() only when the grad has a grad_fn.
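
A sketch of the guarded logic described above (illustrative, not the exact source):

```python
def zero_grad(optimizer):
    for group in optimizer.param_groups:
        for p in group['params']:
            if p.grad is not None:
                if p.grad.grad_fn is not None:
                    p.grad.detach_()              # only detach when there is a graph to break
                else:
                    p.grad.requires_grad_(False)
                p.grad.zero_()
```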
ghstack-source-id: 108702289

Test Plan: unit test

Reviewed By: mrshenli

Differential Revision: D22487315

fbshipit-source-id: 861909b15c8497f1da57f092d8963d4920c85e38
2020-07-29 11:40:13 -07:00
YifanShenSZ
e7ed0b3fae Avoid zero division in _cubic_interpolate (#42093)
Summary:
I encountered a zero-division problem when using LBFGS:

```
File "/home/yshen/anaconda3/lib/python3.7/site-packages/torch/optim/lbfgs.py", line 118, in _strong_wolfe
    bracket[1], bracket_f[1], bracket_gtd[1])
File "/home/yshen/anaconda3/lib/python3.7/site-packages/torch/optim/lbfgs.py", line 21, in _cubic_interpolate
    d1 = g1 + g2 - 3 * (f1 - f2) / (x1 - x2)
ZeroDivisionError: float division by zero
```

My solution is to check whether the line-search bracket has become too small before calling _cubic_interpolate.
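
A sketch of such a guard (illustrative; `cubic_interpolate` is passed in here rather than naming the private helper):

```python
def safe_interpolate(x1, f1, g1, x2, f2, g2, cubic_interpolate):
    # if the bracket has (numerically) collapsed, bisect instead of
    # dividing by x1 - x2 inside the cubic interpolation
    if abs(x2 - x1) < 1e-10 * max(abs(x1), abs(x2), 1.0):
        return (x1 + x2) / 2.0
    return cubic_interpolate(x1, f1, g1, x2, f2, g2)
```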

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42093

Reviewed By: pbelevich

Differential Revision: D22770667

Pulled By: mrshenli

fbshipit-source-id: f8fdfcbd3fd530235901d255208fef8005bf898c
2020-07-28 08:32:00 -07:00
mariosasko
4281240cb5 Raise error for duplicate params in param group #40967 (#41597)
Summary:
This PR fixes an issue in https://github.com/pytorch/pytorch/issues/40967 where duplicate parameters across different parameter groups are not allowed, but duplicates inside the same parameter group are accepted. After this PR, both cases are treated equally and raise `ValueError`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41597

Reviewed By: zou3519

Differential Revision: D22608019

Pulled By: vincentqb

fbshipit-source-id: 6df41dac62b80db042cfefa6e53fb021b49f4399
2020-07-27 12:25:52 -07:00
Zhijian Liu
7646f3c77f Fix type annotation for CosineAnnealingLR (#41866)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41866

Reviewed By: izdeby

Differential Revision: D22703576

Pulled By: mrshenli

fbshipit-source-id: 10a0f593ffaaae82a2923a42815c36793a9043d5
2020-07-23 15:56:50 -07:00
guol-fnst
17f76f9a78 Verbose param for schedulers that don't have it #38726 (#41580)
Summary:
Verbose param for schedulers that don't have it https://github.com/pytorch/pytorch/issues/38726

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41580

Reviewed By: izdeby

Differential Revision: D22671163

Pulled By: vincentqb

fbshipit-source-id: 53a6c9e929141d411b6846bc25f3fe7f46fdf3be
2020-07-23 09:57:33 -07:00
Jeong Ukjae
e831299bae Fix typing error of torch/optim/lr_scheduler.pyi (#41775)
Summary:
* add a type stub for `_LRScheduler.get_last_lr`.
* remove `CosineAnnealingWarmRestarts.step` because its signature is the same as `_LRScheduler`'s.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41775

Reviewed By: izdeby

Differential Revision: D22649350

Pulled By: vincentqb

fbshipit-source-id: 5355dd062a5af437f4fc153244dda793a2382e7e
2020-07-23 09:30:32 -07:00
farhadrgh
4b4273a04e Update Adam documentation (#41679)
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/41477

The Adam implementation does L2 regularization, not decoupled weight decay. However, the change requested in https://github.com/pytorch/pytorch/issues/41477 was motivated by line 12 of Algorithm 2 in the [Decoupled Weight Decay Regularization](https://arxiv.org/pdf/1711.05101.pdf) paper.
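
For reference (added here, not part of the original message), the difference the docs now spell out, with step size alpha and weight-decay factor lambda:

```latex
% L2 regularization (what torch.optim.Adam implements): decay enters the gradient
% before the adaptive step
g_t \leftarrow g_t + \lambda\,\theta_{t-1}, \qquad
\theta_t \leftarrow \theta_{t-1} - \alpha\,\hat m_t / (\sqrt{\hat v_t} + \epsilon)

% Decoupled weight decay (AdamW, line 12 of Algorithm 2 in the paper)
\theta_t \leftarrow \theta_{t-1} - \alpha\left(\hat m_t / (\sqrt{\hat v_t} + \epsilon) + \lambda\,\theta_{t-1}\right)
```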

Please let me know if you have other suggestions about how to deliver this info in the docs.
cc ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41679

Reviewed By: izdeby

Differential Revision: D22671329

Pulled By: vincentqb

fbshipit-source-id: 2caf60e4f62fe31f29aa35a9532d1c6895a24224
2020-07-23 09:25:41 -07:00
wudenggang
9600ed9af3 typo fixes (#41632)
Summary:
typo fixes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41632

Reviewed By: ezyang

Differential Revision: D22617827

Pulled By: mrshenli

fbshipit-source-id: c2bfcb7cc36913a8dd32f13fc9adc3aa0a9b682f
2020-07-20 07:23:00 -07:00
Edward Leardi
6b50874cb7 Fix HTTP links in documentation to HTTPS (#40878)
Summary:
I ran `make linkcheck` using `sphinx.builders.linkcheck` on the documentation and noticed a few links weren't using HTTPS so I quickly updated them all.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40878

Differential Revision: D22404647

Pulled By: ngimel

fbshipit-source-id: 9c9756db59197304023fddc28f252314f6cf4af3
2020-07-06 20:05:21 -07:00
vfdev
a6a2dd14ea Fix typo in warning message (#39854)
Summary:
Fix typo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39854

Reviewed By: ezyang

Differential Revision: D22193544

Pulled By: zou3519

fbshipit-source-id: 04b9f59da7b6ba0649fc6d315adcf20685e10930
2020-06-23 16:47:35 -07:00
Ram Rachum
f6b9848c25 Use chain.from_iterable in optimizer.py (#40156)
Summary:
This is a faster and more idiomatic way of using `itertools.chain`. Instead of computing all the items in the iterable and storing them in memory, they are computed one-by-one and never stored as a huge list. This can save on both runtime and memory space.
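
Illustration of the difference (not from the PR):

```python
from itertools import chain

param_groups = [[1, 2], [3], [4, 5]]

# chain(*param_groups) unpacks the outer iterable eagerly at the call site;
# chain.from_iterable consumes it lazily, one group at a time
flat = list(chain.from_iterable(param_groups))
assert flat == [1, 2, 3, 4, 5]
```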
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40156

Reviewed By: ezyang

Differential Revision: D22189038

Pulled By: vincentqb

fbshipit-source-id: 160b2c27f442686821a6ea541e1f48f4a846c186
2020-06-23 14:07:05 -07:00
Alex Hedges
a3c87c4922 Make Optimizer.state_dict() deterministic (#37347)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/36831.

Instead of using `id()`, an arbitrary yet consistent order-based index is used instead. This results in a deterministic output between runs.

I am not the biggest fan of using `nonlocal` (it appears to be used sparingly in the codebase) to get `start_index` between calls to `pack_group()`, but the alternatives had larger issues:
- Using the last value added to `param_mappings` would be ideal, but that only works if `dict` iteration order is consistent, and PyTorch currently supports Python <3.7.
- Using the maximum value added to `param_mappings` wouldn't have that issue but would not be constant time.

For testing, I confirmed that `test_optim.py` works before and after these changes. Randomizing the indices in `param_mappings` causes the tests to fail, which is further evidence these changes work. I'm not 100% sure these tests are sufficient, but they're a start.
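
A hedged sketch of the order-based mapping described above (`pack_group` and `param_mappings` follow the description; this is not the exact source):

```python
def index_params(param_groups):
    param_mappings = {}
    start_index = 0

    def pack_group(group):
        nonlocal start_index
        # enumerate in iteration order, so the index is consistent between runs
        param_mappings.update({id(p): i
                               for i, p in enumerate(group['params'], start_index)
                               if id(p) not in param_mappings})
        start_index += len(group['params'])
        return [param_mappings[id(p)] for p in group['params']]

    return [pack_group(g) for g in param_groups]
```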
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37347

Differential Revision: D21353820

Pulled By: vincentqb

fbshipit-source-id: e549f1f154833a461b1f4df6d07ad509aab34ea1
2020-06-01 15:32:02 -07:00
Ralf Gommers
9fe8243536 Fix minor issue in type stub for Optimizer (#38067)
Summary:
Closes gh-23731
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38067

Differential Revision: D21471021

Pulled By: ezyang

fbshipit-source-id: 8e7ee7f437bfa8e78a47ac6cf572b0fc9b5c6939
2020-05-07 20:11:40 -07:00
Bartosz Gasiorzewski
867e05921f Fix multiple issues with type annotations (#36358)
Summary:
- added tests that showcase the problems
- fixed the problems

These changes would allow me to remove many "# type: ignore" comments in my codebase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36358

Differential Revision: D21230704

Pulled By: ezyang

fbshipit-source-id: e6d475a0aa1fb40258fa0231ade28c38108355fb
2020-04-29 11:16:39 -07:00
Pavel Izmailov
22ac071d9a Add SWA to PyTorch mainline (#35032)
Summary:
This PR is based on the issue https://github.com/pytorch/pytorch/issues/29994#issue-524418771 and the discussion in the previous version of the PR https://github.com/pytorch/pytorch/pull/30559. Specifically, I followed the interface outlined in this [comment](https://github.com/pytorch/pytorch/pull/30559#issuecomment-574864768).

## Structure
- `torch/optim/swa_utils.py` contains the implementation of  `AveragedModel` class, `SWALR` learning rate scheduler and `update_bn` utility
- `test/test_optim.py` contains unit tests for the three components of SWA
- `torch/optim/swa_utils.pyi` describes the interface of `torch/optim/swa_utils.py`

The new implementation consists of
- `AveragedModel` class; this class creates a copy of a given model and allows computing running averages of its parameters.
- `SWALR` learning rate scheduler; after a certain number of epochs it switches to a constant learning rate; this scheduler is meant to be chained with other schedulers.
- `update_bn` utility; updates the Batch Normalization activation statistics for a given model and dataloader; this utility is meant to be applied to `AveragedModel` instances.

For `update_bn` I simplified the implementation compared to the [original PR](https://github.com/pytorch/pytorch/pull/30559) according to the suggestions by vadimkantorov.

## Example
```python
loader, optimizer, model = ...
swa_model = torch.optim.swa_utils.AveragedModel(model)
# You can use custom averaging functions with the `avg_fn` parameter
ema_avg = lambda p_avg, p, n_avg: 0.1 * p_avg + 0.9 * p
ema_model = torch.optim.swa_utils.AveragedModel(model,
                                    avg_fn=ema_avg)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
                                    T_max=300)
swa_start = 160
swa_scheduler = SWALR(optimizer, start_epoch=swa_start, swa_lr=0.05)

for i in range(300):
     for input, target in loader:
         optimizer.zero_grad()
         loss_fn(model(input), target).backward()
         optimizer.step()
         scheduler.step()
         swa_scheduler.step()

     if i > swa_start:
         swa_model.update_parameters(model)

# Update bn statistics for the swa_model at the end
torch.optim.swa_utils.update_bn(loader, swa_model)
```

UPDATED:
```python3
loader, optimizer, model, loss_fn = ...
swa_model = torch.optim.swa_utils.AveragedModel(model)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)
swa_start = 160
swa_scheduler = SWALR(optimizer, swa_lr=0.05)
for i in range(300):
     for input, target in loader:
         optimizer.zero_grad()
         loss_fn(model(input), target).backward()
         optimizer.step()
     if i > swa_start:
         swa_model.update_parameters(model)
         swa_scheduler.step()
     else:
         scheduler.step()

# Update bn statistics for the swa_model at the end
torch.optim.swa_utils.update_bn(loader, swa_model)
```

Fixes https://github.com/pytorch/pytorch/issues/29994
cc soumith vincentqb andrewgordonwilson vadimkantorov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35032

Differential Revision: D21079606

Pulled By: vincentqb

fbshipit-source-id: e07f5e821f72ada63789814c2dcbdc31f0160c37
2020-04-27 07:42:19 -07:00
Masaki Kozuki
7403545518 Fix exception message of torch.optim.AdamW. (#36088)
Summary:
PyTorch does not implement `SparseAdamW`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36088

Differential Revision: D20932357

Pulled By: gchanan

fbshipit-source-id: 49e5b72c34ff8ce0deb6b3807662b8b7d67d959f
2020-04-09 08:02:10 -07:00
lordeddard
2de4f245c6 Fix typo in documentation (#34581)
Summary:
Update the parameter description of `total_steps` in `OneCycleLR`. References https://github.com/pytorch/pytorch/issues/34531
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34581

Differential Revision: D20386306

Pulled By: albanD

fbshipit-source-id: f8b424a01760e8f5d4de5367b6c60fb342019689
2020-03-11 13:57:10 -07:00
Vincent Quenneville-Belair
be3bc1deb1 convert counter back to list #33229 (#33356)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33229
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33356

Differential Revision: D20003196

Pulled By: vincentqb

fbshipit-source-id: 96f9e0fc7e99a7c2e202f932d1a2ffa158afad92
2020-03-10 15:46:24 -07:00
prajjwal1
b1bd950a4d Fixed stub for AdamW (#34299)
Summary:
Fixes [https://github.com/pytorch/pytorch/issues/33757](https://github.com/pytorch/pytorch/issues/33757)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34299

Differential Revision: D20337844

Pulled By: ezyang

fbshipit-source-id: 54bf174a09b8db9bf6e0c3c717730dd7c795d76b
2020-03-09 08:45:51 -07:00
albanD
6e2bb1c054 End of the .data removal in torch/optim (#34211)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34211

Test Plan: Imported from OSS

Differential Revision: D20248684

Pulled By: albanD

fbshipit-source-id: 2294bfa41b82ff47f000bc98860780f59d7d4421
2020-03-09 06:40:39 -07:00
Eleanor Dwight Holland
6a97777f72 Remove use of .data from optimizers (#33640)
Summary:
Removes all uses of `.data` from optimizers.

Or tries to.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33640

Reviewed By: vincentqb

Differential Revision: D20203216

Pulled By: albanD

fbshipit-source-id: 9bfe78bbed00fd4aaa690801cff0201f0bd680a0
2020-03-03 13:21:55 -08:00
HearyShen
edd5c009f7 fix docs mistakes in lr_scheduler.MultiplicativeLR (#33805)
Summary:
This PR addresses the issue [The docs of `MultiplicativeLR` use `LambdaLR` as an example](https://github.com/pytorch/pytorch/issues/33752#issue-570374087).

https://github.com/pytorch/pytorch/issues/33752
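
For reference (added illustration), the scheduler the docs now demonstrate with its own class:

```python
import torch
from torch.optim.lr_scheduler import MultiplicativeLR

optimizer = torch.optim.SGD([torch.nn.Parameter(torch.randn(2))], lr=0.1)
# multiplies the current learning rate by the returned factor at every step
scheduler = MultiplicativeLR(optimizer, lr_lambda=lambda epoch: 0.95)
```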
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33805

Differential Revision: D20121314

Pulled By: mruberry

fbshipit-source-id: 5afa63bbe83d35ce4e55705b8cbd96326a907651
2020-02-27 14:11:57 -08:00
JeongUkJae
b10761d890 fix type stub errors (#33762)
Summary:
I've been using PyTorch with type hints, and I found errors that can be easily fixed, so I'm creating this PR to fix these type bugs.

I expected the code below to type-check without any errors.

```python
import torch
from torch.nn import Linear
from torch.autograd import Variable
from torch.optim import AdamW
from torch.utils import hooks

# nn.Module should have training attribute
module = Linear(10, 20)
module.training

# torch should have dtype bfloat16
tensor2 = torch.tensor([1,2,3], dtype=torch.bfloat16)

# torch.Tensor.cuda should accept int or str value
torch.randn(5).cuda(1)
torch.tensor(5).cuda('cuda:0')

# optimizer should have default attribute
module = Linear(10, 20)
print(AdamW(module.weight).default)

# torch.Tensor should have these boolean attributes
torch.tensor([1]).is_sparse
torch.tensor([1]).is_quantized
torch.tensor([1]).is_mkldnn

# Size class should be a tuple of ints
a, b = torch.tensor([[1,2,3]]).size()

# check modules can be accessed
torch.nn.parallel
torch.autograd.profiler
torch.multiprocessing
torch.sparse
torch.onnx
torch.jit
torch.hub
torch.random
torch.distributions
torch.quantization
torch.__config__
torch.__future__

torch.ops
torch.classes

# Variable class's constructor should return Tensor
def fn_to_test_variable(t: torch.Tensor):
    return None

v = Variable(torch.tensor(1))
fn_to_test_variable(v)

# check RemovableHandle attributes can be accessed
handle = hooks.RemovableHandle({})
handle.id
handle.next_id

# check torch function hints
torch.is_grad_enabled()
```

But the current master branch raises errors (I checked with pyright):

```
$ pyright test.py
Searching for source files
Found 1 source file
test.py
  12:45 - error: 'bfloat16' is not a known member of module
  15:21 - error: Argument of type 'Literal[1]' cannot be assigned to parameter 'device' of type 'Optional[device]'
  'int' is incompatible with 'device'
  Cannot assign to 'None'
  16:22 - error: Argument of type 'Literal['cuda:0']' cannot be assigned to parameter 'device' of type 'Optional[device]'
  'str' is incompatible with 'device'
  Cannot assign to 'None'
  23:19 - error: Cannot access member 'is_sparse' for type 'Tensor'
  Member 'is_sparse' is unknown
  24:19 - error: Cannot access member 'is_quantized' for type 'Tensor'
  Member 'is_quantized' is unknown
  25:19 - error: Cannot access member 'is_mkldnn' for type 'Tensor'
  Member 'is_mkldnn' is unknown
  32:7 - error: 'autograd' is not a known member of module
  33:7 - error: 'multiprocessing' is not a known member of module
  34:7 - error: 'sparse' is not a known member of module
  35:7 - error: 'onnx' is not a known member of module
  36:7 - error: 'jit' is not a known member of module
  37:7 - error: 'hub' is not a known member of module
  38:7 - error: 'random' is not a known member of module
  39:7 - error: 'distributions' is not a known member of module
  40:7 - error: 'quantization' is not a known member of module
  41:7 - error: '__config__' is not a known member of module
  42:7 - error: '__future__' is not a known member of module
  44:7 - error: 'ops' is not a known member of module
  45:7 - error: 'classes' is not a known member of module
  60:7 - error: 'is_grad_enabled' is not a known member of module
20 errors, 0 warnings
Completed in 1.436sec
```

The list below is not flagged as errors, but I think these are errors too:

* `nn.Module.training` is not typed as a boolean
* the return type of `torch.Tensor.size()` is `Tuple[Unknown]`.

 ---

related issues.

https://github.com/pytorch/pytorch/issues/23731, https://github.com/pytorch/pytorch/issues/32824, https://github.com/pytorch/pytorch/issues/31753
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33762

Differential Revision: D20118884

Pulled By: albanD

fbshipit-source-id: 41557d66674a11b8e7503a48476d4cdd0f278eab
2020-02-27 06:58:53 -08:00
Xiao Wang
c1dd70688a Fix deprecated python "add" calls (#33428)
Summary:
This PR fixes the Python `add` calls that used the deprecated signature `add(Scalar, Tensor)`; the alternative signature `add(Tensor, alpha=Scalar)` is used instead.
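
The signature change, illustrated (not from the PR):

```python
import torch

p = torch.ones(3)
d = torch.ones(3)
lr = 0.1

# deprecated: p.add_(-lr, d)   # add(Scalar, Tensor)
p.add_(d, alpha=-lr)           # preferred: add(Tensor, alpha=Scalar)
```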

cc csarofeen zasdfgbnm ptrblck ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33428

Differential Revision: D20002534

Pulled By: vincentqb

fbshipit-source-id: 81f2dd6170a47a9b53a17e5817c26e70d8afa130
2020-02-26 09:02:31 -08:00
Hong Xu
a6a72ac68f Fix all occurrences of C416. (#33429)
Summary:
C416: Unnecessary (list/set) comprehension - rewrite using list/set().

See https://pypi.org/project/flake8-comprehensions/
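
An example of the pattern the check flags (illustrative):

```python
items = range(5)

# flagged by C416: the comprehension only copies the iterable
copied = [x for x in items]
unique = {x for x in items}

# preferred rewrites
copied = list(items)
unique = set(items)
```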
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33429

Differential Revision: D19972858

Pulled By: ezyang

fbshipit-source-id: faac042a94c59d737bd5ae983121a0a029346e23
2020-02-21 08:32:22 -08:00
Nikolay Novik
d19a50bf27 Add missing weight_decay parameter validation for Adam and AdamW (#33126)
Summary:
Adam and AdamW are missing parameter validation for `weight_decay`; the other optimizers already have this check.
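
Illustration of the added check (a sketch; the exact message may differ):

```python
import torch

w = torch.nn.Parameter(torch.randn(2))

torch.optim.Adam([w], lr=1e-3, weight_decay=0.01)    # fine
torch.optim.Adam([w], lr=1e-3, weight_decay=-0.01)   # now raises ValueError, as SGD already does
```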
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33126

Differential Revision: D19860366

Pulled By: vincentqb

fbshipit-source-id: 286d7dc90e2f4ccf6540638286d2fe17939648fc
2020-02-20 11:11:51 -08:00
Edgar Andrés Margffoy Tuay
cdf381c967 Fix LambdaLR scheduler side effects (#32848)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/32756
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32848

Differential Revision: D19859736

Pulled By: vincentqb

fbshipit-source-id: 43b3cbb2b6bed208c75aad37aebc2a8a9565fe0d
2020-02-20 11:09:56 -08:00
Jeong Ukjae
879cf0b15a fix typing bug of LambdaLR.__init__ (#33271)
Summary:
## problem

```python
class LambdaLR(_LRScheduler):
    """Sets the learning rate of each parameter group to the initial lr
    times a given function. When last_epoch=-1, sets initial lr as lr.

    Args:
        optimizer (Optimizer): Wrapped optimizer.
        lr_lambda (function or list): A function which computes a multiplicative
            factor given an integer parameter epoch, or a list of such
            functions, one for each group in optimizer.param_groups.
        last_epoch (int): The index of last epoch. Default: -1.

    Example:
        >>> # Assuming optimizer has two groups.
        >>> lambda1 = lambda epoch: epoch // 30
        >>> lambda2 = lambda epoch: 0.95 ** epoch
        >>> scheduler = LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])
        >>> for epoch in range(100):
        >>>     train(...)
        >>>     validate(...)
        >>>     scheduler.step()
    """
```

`LambdaLR` takes a lambda that takes an int and returns a float, or a list of such lambdas, one for each group in `optimizer.param_groups`.

## related issue

Resolves https://github.com/pytorch/pytorch/issues/32645
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33271

Differential Revision: D19878665

Pulled By: vincentqb

fbshipit-source-id: 50b16caea13de5a3cbd187e688369f33500499d0
2020-02-18 09:10:00 -08:00
cshesse
1487137c5b add missing default value for LRScheduler.step() (#32411)
Summary:
see also other type errors in https://github.com/pytorch/pytorch/pull/30576 and https://github.com/pytorch/pytorch/pull/30441
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32411

Differential Revision: D19697245

Pulled By: ezyang

fbshipit-source-id: d0295d747541adec5d6fad646f4cf4bb2f04abf5
2020-02-11 20:34:33 -08:00