Commit Graph

125 Commits

Author SHA1 Message Date
cyy
54cb61f7d9 enable ASAN on some tests (#103647)
Enabling more tests on ASAN, meanwhile we disable float-divide-by-zero and float-cast-overflow, both are disabled because they are also disabled by default in latest clang.
The following cited doc explains the reasons.
```
-fsanitize=float-cast-overflow: Conversion to, from, or between floating-point types
which would overflow the destination. Because the range of representable values
for all floating-point types supported by Clang is [-inf, +inf], the only cases detected are
conversions from floating point to integer types.
-fsanitize=float-divide-by-zero: Floating point division by zero.
This is undefined per the C and C++ standards,
 but is defined by Clang (and by ISO/IEC/IEEE 60559 / IEEE 754) as producing
either an infinity or NaN value,
so is not included in -fsanitize=undefined.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103647
Approved by: https://github.com/kit1980
2023-06-28 02:17:14 +00:00
Ali Moezzi
8c3958eddc Fix lr_scheduler serialization contains bound methods issue (#102627)
Fixes #42376
`torch.save` serializes bound methods inside LR scheduler resulting in large serialized file.

Test cases include checking file size, checking if the `anneal_func` is bounded and file is loaded correctly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102627
Approved by: https://github.com/albanD
2023-06-23 03:53:15 +00:00
Jane Xu
fa893f3f58 Fix optim state_dict casting to allow step to cast to CPU (#102619)
I'm guessing this should fix https://github.com/pytorch/pytorch/pull/88015#issuecomment-1569523106 but am waiting on @ychfan to supply more details so I could write a good test case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102619
Approved by: https://github.com/albanD
2023-06-13 00:46:40 +00:00
Jane Xu
4a5d56b74c Disable dynamo'd test_optim entirely (#103323)
See issue https://github.com/pytorch/pytorch/issues/103322.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103323
Approved by: https://github.com/DanilBaibak, https://github.com/atalman, https://github.com/malfet
2023-06-09 16:06:36 +00:00
Michael Lazos
0769a50a5f Disable dynamo on some opt methods and differentiable optimizer tests (#103066)
- Disables dynamo on the differentiable optimizer tests
- Disables dynamo on some test methods which expose a very rare dynamo edge case
- Disables dynamo on export/save optimizer state methods because it shouldn't trace those anyway.

I have a draft PR to fix the two tests marked skip due to unsupported mutation of step.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103066
Approved by: https://github.com/janeyx99, https://github.com/malfet
2023-06-07 03:50:42 +00:00
Catherine Lee
08c4a442fd Dont run test files that are already run in test_optim (#103017)
they get run twice on accident
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103017
Approved by: https://github.com/janeyx99
2023-06-06 17:31:21 +00:00
Michael Lazos
c75e064dd6 Disallow _foreach_utils.py, but allow it to be inlined (#102221)
This function should not be allowed, but should be inlineable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102221
Approved by: https://github.com/anijain2305
2023-06-02 05:14:09 +00:00
Masaki Kozuki
401109a243 Use int64_t for indexing in multi_tensor_apply (#101760)
Fixes #101449

I found it better to either imitate the combo of `TensorIterator::can_use_32bit_indexing` and `TensorIterator::with_32bit_indexing` or adroitly choose the index type depending on `Tensor::numel` in the future.

---

Used `nsys nvprof` to casually see the effect of `int64_t` indexing:

```python
import torch

params = [
    {"params": [torch.randn(32, 32, device="cuda") for _ in range(100)]},
    {"params": [torch.randn(32, 32, device="cuda") for _ in range(100)]},
]
grads = [
    [torch.randn(32, 32, device="cuda") for _ in range(100)],
    [torch.randn(32, 32, device="cuda") for _ in range(100)],
]
optimizer = torch.optim.Adam(params, fused=True)

for _ in range(100):
    for i, param_groups in enumerate(params):
        for p, g in zip(param_groups["params"], grads[i]):
            p.grad = g
        optimizer.step()
        optimizer.zero_grad()
```

Environment
```
Collecting environment information...
PyTorch version: 2.1.0a0+gitf994d0b
Is debug build: False
CUDA used to build PyTorch: 12.1

Python version: 3.10.9 (main, May 17 2023, 00:46:40) [GCC 11.3.0] (64-bit runtime)
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A100-SXM4-80GB
```

---

- `multi_tensor_apply_kernel<at::native::<unnamed>::FusedOptimizerTensor` -> 1.02x
- `multi_tensor_apply_kernel<at::native::<unnamed>::TensorListMetadata<(in…` -> 1.04x

Current main branch:

```
 Time (%)  Total Time (ns)  Instances  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)                                                  Name
 --------  ---------------  ---------  --------  --------  --------  --------  -----------  ----------------------------------------------------------------------------------------------------
     64.9          5787610        600    9646.0    9632.0      9503      9888         52.9  void at::native::<unnamed>::multi_tensor_apply_kernel<at::native::<unnamed>::FusedOptimizerTensorLi…
...
      8.1           720575        200    3602.9    3584.0      3551      4320         63.4  void at::native::<unnamed>::multi_tensor_apply_kernel<at::native::<unnamed>::TensorListMetadata<(in…
```

this PR:

```
 Time (%)  Total Time (ns)  Instances  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)                                                  Name
 --------  ---------------  ---------  --------  --------  --------  --------  -----------  ----------------------------------------------------------------------------------------------------
     65.0          5876847        600    9794.7    9792.0      9632     10080         58.1  void at::native::<unnamed>::multi_tensor_apply_kernel<at::native::<unnamed>::FusedOptimizerTensorLi…
...
      8.3           748313        200    3741.6    3744.0      3711      4479         60.0  void at::native::<unnamed>::multi_tensor_apply_kernel<at::native::<unnamed>::TensorListMetadata<(in…
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101760
Approved by: https://github.com/ngimel
2023-06-01 20:55:09 +00:00
Masaki Kozuki
2fcc2002fa Handle tail 0-size tensor appropriately in MultiTensorApply (#100811)
Fixes #100701

It seems like we don't call `multi_tensor_apply_kernel` at all if the input tensor lists are small and their last tensors are zero-size as per e.g. ca9f55f79d/aten/src/ATen/native/cuda/MultiTensorApply.cuh (L100-L102)
which was introduced in 05943712a4.

This PR special cases the last zero-size tensors so that we won't be negligent.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100811
Approved by: https://github.com/ngimel
2023-05-12 20:26:45 +00:00
Jane Xu
a53cda1ddc [optim][BE] split test file into logical parts: SWA, LR, optim (#101100)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101100
Approved by: https://github.com/albanD
2023-05-12 16:41:44 +00:00
Alban Desmaison
d3b29afbb6 Remove old code that is unused in test/ (#66331)
Summary:
.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66331

Reviewed By: gchanan

Differential Revision: D31533549

Pulled By: albanD

fbshipit-source-id: 5addd11edc4199a88f10f0ff236be59ec2289903
2021-10-11 08:45:24 -07:00
Shen Li
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
Zsolt Dollenstein
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
Ilqar Ramazanli
63219f1f9f To add Rectified Adam Algorithm to Optimizers (#58968)
Summary:
Fixes : https://github.com/pytorch/pytorch/issues/24892

In the paper : https://arxiv.org/pdf/1908.03265.pdf  Liyuan Liu et al. suggested a new optimization algorithm with an essence of similar to Adam Algorithm.

It has been discussed in the paper that, without warmup heuristic, in the early stage of adaptive optimization / learning algorithms sometimes we can get undesirable large variance which can slow overall convergence process.

Authors proposed the idea of rectification of variance of adaptive learning rate when it is expected to be high.

Differing from the paper, we selected variance tractability cut-off as 5 instead of 4. This adjustment is common practice, and could be found in the code-repository and also tensorflow swift optim library as well :

2f03dd1970/radam/radam.py (L156)

f51ee4618d/Sources/TensorFlow/Optimizers/MomentumBased.swift (L638)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58968

Reviewed By: vincentqb

Differential Revision: D29310601

Pulled By: iramazanli

fbshipit-source-id: b7bd487f72f1074f266687fd9c0c6be264a748a9
2021-06-23 18:27:57 -07:00
Ilqar Ramazanli
e8690dacb2 To add Nesterov Adam Algorithm to Optimizers (#59009)
Summary:
Fixes : https://github.com/pytorch/pytorch/issues/5804

In the paper : https://openreview.net/forum?id=OM0jvwB8jIp57ZJjtNEZ  Timothy Dozat suggested a new optimization algorithm with an essence of combination of NAG and Adam algorithms.

It is known that the idea of momentum can be improved with the Nesterov acceleration in optimization algorithms, and Dozat is investigating to apply this idea to momentum component of Adam algorithm. Author provided experiment evidence in their work to show excellence of the idea.

In this PR we are implementing the proposed algorithm NAdam in the mentioned paper. Author has a preliminary work http://cs229.stanford.edu/proj2015/054_report.pdf  where he shows the decay base constant should be taken as 0.96 which we also followed the same phenomenon here in this implementation similar to Keras. Moreover, implementation / coding practice have been followed similar to Keras in some other places as well:

f9d3868495/tensorflow/python/keras/optimizer_v2/nadam.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59009

Reviewed By: gchanan, vincentqb

Differential Revision: D29220375

Pulled By: iramazanli

fbshipit-source-id: 4b4bb4b15f7e16f7527f368bbf4207ed345751aa
2021-06-23 08:21:43 -07:00
Sam Estep
1abf45e37f Revert D29241736: [pytorch][PR] To add Rectified Adam Algorithm to Optimizers
Test Plan: revert-hammer

Differential Revision:
D29241736 (0d2a936176)

Original commit changeset: 288b9b1f3125

fbshipit-source-id: 56c4ec98647c6f1822b130726741a1c9ca193670
2021-06-22 12:08:31 -07:00
Ilqar Ramazanli
0d2a936176 To add Rectified Adam Algorithm to Optimizers (#58968)
Summary:
Fixes : https://github.com/pytorch/pytorch/issues/24892

In the paper : https://arxiv.org/pdf/1908.03265.pdf  Liyuan Liu et al. suggested a new optimization algorithm with an essence of similar to Adam Algorithm.

It has been discussed in the paper that, without warmup heuristic, in the early stage of adaptive optimization / learning algorithms sometimes we can get undesirable large variance which can slow overall convergence process.

Authors proposed the idea of rectification of variance of adaptive learning rate when it is expected to be high.

Differing from the paper, we selected variance tractability cut-off as 5 instead of 4. This adjustment is common practice, and could be found in the code-repository and also tensorflow swift optim library as well :

2f03dd1970/radam/radam.py (L156)

f51ee4618d/Sources/TensorFlow/Optimizers/MomentumBased.swift (L638)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58968

Reviewed By: gchanan

Differential Revision: D29241736

Pulled By: iramazanli

fbshipit-source-id: 288b9b1f3125fdc6c7a7bb23fde1ea5c201c0448
2021-06-22 10:38:41 -07:00
Sam Estep
5bcbbf5373 Lint trailing newlines (#54737)
Summary:
*Context:* https://github.com/pytorch/pytorch/issues/53406 added a lint for trailing whitespace at the ends of lines. However, in order to pass FB-internal lints, that PR also had to normalize the trailing newlines in four of the files it touched. This PR adds an OSS lint to normalize trailing newlines.

The changes to the following files (made in 54847d0adb9be71be4979cead3d9d4c02160e4cd) are the only manually-written parts of this PR:

- `.github/workflows/lint.yml`
- `mypy-strict.ini`
- `tools/README.md`
- `tools/test/test_trailing_newlines.py`
- `tools/trailing_newlines.py`

I would have liked to make this just a shell one-liner like the other three similar lints, but nothing I could find quite fit the bill. Specifically, all the answers I tried from the following Stack Overflow questions were far too slow (at least a minute and a half to run on this entire repository):

- [How to detect file ends in newline?](https://stackoverflow.com/q/38746)
- [How do I find files that do not end with a newline/linefeed?](https://stackoverflow.com/q/4631068)
- [How to list all files in the Git index without newline at end of file](https://stackoverflow.com/q/27624800)
- [Linux - check if there is an empty line at the end of a file [duplicate]](https://stackoverflow.com/q/34943632)
- [git ensure newline at end of each file](https://stackoverflow.com/q/57770972)

To avoid giving false positives during the few days after this PR is merged, we should probably only merge it after https://github.com/pytorch/pytorch/issues/54967.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54737

Test Plan:
Running the shell script from the "Ensure correct trailing newlines" step in the `quick-checks` job of `.github/workflows/lint.yml` should print no output and exit in a fraction of a second with a status of 0. That was not the case prior to this PR, as shown by this failing GHA workflow run on an earlier draft of this PR:

- https://github.com/pytorch/pytorch/runs/2197446987?check_suite_focus=true

In contrast, this run (after correcting the trailing newlines in this PR) succeeded:

- https://github.com/pytorch/pytorch/pull/54737/checks?check_run_id=2197553241

To unit-test `tools/trailing_newlines.py` itself (this is run as part of our "Test tools" GitHub Actions workflow):
```
python tools/test/test_trailing_newlines.py
```

Reviewed By: malfet

Differential Revision: D27409736

Pulled By: samestep

fbshipit-source-id: 46f565227046b39f68349bbd5633105b2d2e9b19
2021-03-30 13:09:52 -07:00
Michael Acar
a4b2f3e213 Implement AdamW optimizer (#21250)
Summary:
# What is this?
This is an implementation of the AdamW optimizer as implemented in [the fastai library](803894051b/fastai/callback.py) and as initially introduced in the paper [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101). It decouples the weight decay regularization step from the optimization step during training.

There have already been several abortive attempts to push this into pytorch in some form or fashion: https://github.com/pytorch/pytorch/pull/17468, https://github.com/pytorch/pytorch/pull/10866, https://github.com/pytorch/pytorch/pull/3740, https://github.com/pytorch/pytorch/pull/4429. Hopefully this one goes through.
# Why is this important?
Via a simple reparameterization, it can be shown that L2 regularization has a weight decay effect in the case of SGD optimization. Because of this, L2 regularization became synonymous with the concept of weight decay. However, it can be shown that the equivalence of L2 regularization and weight decay breaks down for more complex adaptive optimization schemes. It was shown in the paper [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101) that this is the reason why models trained with SGD achieve better generalization than those trained with Adam. Weight decay is a very effective regularizer. L2 regularization, in and of itself, is much less effective. By explicitly decaying the weights, we can achieve state-of-the-art results while also taking advantage of the quick convergence properties that adaptive optimization schemes have.
# How was this tested?
There were test cases added to `test_optim.py` and I also ran a [little experiment](https://gist.github.com/mjacar/0c9809b96513daff84fe3d9938f08638) to validate that this implementation is equivalent to the fastai implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21250

Differential Revision: D16060339

Pulled By: vincentqb

fbshipit-source-id: ded7cc9cfd3fde81f655b9ffb3e3d6b3543a4709
2019-07-02 09:09:10 -07:00
Edward Yang
173f224570 Turn on F401: Unused import warning. (#18598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**

This was requested by someone at Facebook; this lint is turned
on for Facebook by default.  "Sure, why not."

I had to noqa a number of imports in __init__.  Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it.  Left for future work.

Be careful!  flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments.  flake8-3 will
report an import unused; flake8-2 will not.  For now, I just
noqa'd all these sites.

All the changes were done by hand.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14687478

fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
2019-03-30 09:01:17 -07:00
theFool
fca77c9e25 Correct gradient of rosenbrock (#3881) 2017-11-26 11:49:04 -05:00
Soumith Chintala
d4c9a3782b billinear -> bilinear, docs for upsampling, improved docs for Unpooling, pep8 tests fix (#617)
* billinear -> bilinear, docs for upsampling, improved docs for Unpooling, pep8 tests fix
2017-01-30 05:08:48 +05:30
Luke Yeager
e7c1e6a8e3 [pep8] Fix most lint automatically with autopep8
Here's the command I used to invoke autopep8 (in parallel!):

    git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i

Several rules are ignored in setup.cfg. The goal is to let autopep8
handle everything which it can handle safely, and to disable any rules
which are tricky or controversial to address. We may want to come back
and re-enable some of these rules later, but I'm trying to make this
patch as safe as possible.

Also configures flake8 to match pep8's behavior.

Also configures TravisCI to check the whole project for lint.
2017-01-28 01:15:51 +01:00
Adam Paszke
f8ae34706e Port L-BFGS from Lua optim 2017-01-22 18:02:40 -05:00
soumith
3d80b1371a adding optim tests 2016-08-12 16:46:18 -07:00