Commit Graph

64 Commits

Author SHA1 Message Date
Jon Chuang
62de29d06f [optim] be explicit about CPU scalar tensor dtypes (#111008)
Fixes https://github.com/pytorch/pytorch/issues/110940

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111008
Approved by: https://github.com/janeyx99
2023-11-21 22:44:50 +00:00
pilot-j
a2552d5521 Fixed docstring errors inside torch/cuda/ and torch/optim/ (Docathon H2) (#112964)
Fixes #112592
1) **File: torch/cuda/random.py**
```
Before:
/content/pytorch/torch/cuda/random.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/cuda/random.py:21 in public function `get_rng_state`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/content/pytorch/torch/cuda/random.py:43 in public function `get_rng_state_all`:
        D202: No blank lines allowed after function docstring (found 1)
/content/pytorch/torch/cuda/random.py:43 in public function `get_rng_state_all`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/content/pytorch/torch/cuda/random.py:54 in public function `set_rng_state`:
        D401: First line should be in imperative mood (perhaps 'Set', not 'Sets')
/content/pytorch/torch/cuda/random.py:79 in public function `set_rng_state_all`:
        D208: Docstring is over-indented
/content/pytorch/torch/cuda/random.py:79 in public function `set_rng_state_all`:
        D209: Multi-line docstring closing quotes should be on a separate line
/content/pytorch/torch/cuda/random.py:79 in public function `set_rng_state_all`:
        D401: First line should be in imperative mood (perhaps 'Set', not 'Sets')
/content/pytorch/torch/cuda/random.py:79 in public function `set_rng_state_all`:
        D414: Section has no content ('Args')
/content/pytorch/torch/cuda/random.py:88 in public function `manual_seed`:
        D205: 1 blank line required between summary line and description (found 0)
/content/pytorch/torch/cuda/random.py:88 in public function `manual_seed`:
        D401: First line should be in imperative mood (perhaps 'Set', not 'Sets')
/content/pytorch/torch/cuda/random.py:110 in public function `manual_seed_all`:
        D205: 1 blank line required between summary line and description (found 0)
/content/pytorch/torch/cuda/random.py:110 in public function `manual_seed_all`:
        D401: First line should be in imperative mood (perhaps 'Set', not 'Sets')
/content/pytorch/torch/cuda/random.py:128 in public function `seed`:
        D205: 1 blank line required between summary line and description (found 0)
/content/pytorch/torch/cuda/random.py:128 in public function `seed`:
        D401: First line should be in imperative mood (perhaps 'Set', not 'Sets')
/content/pytorch/torch/cuda/random.py:146 in public function `seed_all`:
        D205: 1 blank line required between summary line and description (found 0)
/content/pytorch/torch/cuda/random.py:146 in public function `seed_all`:
        D401: First line should be in imperative mood (perhaps 'Set', not 'Sets')
/content/pytorch/torch/cuda/random.py:167 in public function `initial_seed`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
18
```

```
After:
/content/pytorch/torch/cuda/random.py:1 at module level:
        D100: Missing docstring in public module
1

```
2) **File: torch/cuda/amp/autocast_mode.py**
```
Before: /content/pytorch/torch/cuda/amp/autocast_mode.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/cuda/amp/autocast_mode.py:18 in public class `autocast`:
        D205: 1 blank line required between summary line and description (found 0)
/content/pytorch/torch/cuda/amp/autocast_mode.py:23 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/cuda/amp/autocast_mode.py:38 in public method `__enter__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/cuda/amp/autocast_mode.py:44 in public method `__exit__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/cuda/amp/autocast_mode.py:49 in public method `__call__`:
        D102: Missing docstring in public method
/content/pytorch/torch/cuda/amp/autocast_mode.py:90 in public function `custom_fwd`:
        D205: 1 blank line required between summary line and description (found 0)
/content/pytorch/torch/cuda/amp/autocast_mode.py:90 in public function `custom_fwd`:
        D400: First line should end with a period (not 'f')
/content/pytorch/torch/cuda/amp/autocast_mode.py:90 in public function `custom_fwd`:
        D401: First line should be in imperative mood; try rephrasing (found 'Helper')
/content/pytorch/torch/cuda/amp/autocast_mode.py:130 in public function `custom_bwd`:
        D205: 1 blank line required between summary line and description (found 0)
/content/pytorch/torch/cuda/amp/autocast_mode.py:130 in public function `custom_bwd`:
        D400: First line should end with a period (not 'f')
/content/pytorch/torch/cuda/amp/autocast_mode.py:130 in public function `custom_bwd`:
        D401: First line should be in imperative mood; try rephrasing (found 'Helper')
12
```
```
After:
/content/pytorch/torch/cuda/amp/autocast_mode.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/cuda/amp/autocast_mode.py:23 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/cuda/amp/autocast_mode.py:38 in public method `__enter__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/cuda/amp/autocast_mode.py:44 in public method `__exit__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/cuda/amp/autocast_mode.py:49 in public method `__call__`:
        D102: Missing docstring in public method
5
```

3)  **File: torch/cuda/amp/grad_scaler.py**
```
Before: /content/pytorch/torch/cuda/amp/grad_scaler.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/cuda/amp/grad_scaler.py:17 in private class `_MultiDeviceReplicator`:
        D200: One-line docstring should fit on one line with quotes (found 3)
/content/pytorch/torch/cuda/amp/grad_scaler.py:39 in public class `OptState`:
        D101: Missing docstring in public class
/content/pytorch/torch/cuda/amp/grad_scaler.py:50 in public class `GradScaler`:
        D205: 1 blank line required between summary line and description (found 0)
/content/pytorch/torch/cuda/amp/grad_scaler.py:50 in public class `GradScaler`:
        D400: First line should end with a period (not 'g')
/content/pytorch/torch/cuda/amp/grad_scaler.py:115 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/cuda/amp/grad_scaler.py:354 in public method `step`:
        D400: First line should end with a period (not ':')
/content/pytorch/torch/cuda/amp/grad_scaler.py:456 in public method `update`:
        D401: First line should be in imperative mood (perhaps 'Update', not 'Updates')
/content/pytorch/torch/cuda/amp/grad_scaler.py:529 in public method `get_scale`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/content/pytorch/torch/cuda/amp/grad_scaler.py:544 in public method `get_growth_factor`:
        D200: One-line docstring should fit on one line with quotes (found 3)
/content/pytorch/torch/cuda/amp/grad_scaler.py:544 in public method `get_growth_factor`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/content/pytorch/torch/cuda/amp/grad_scaler.py:550 in public method `set_growth_factor`:
        D205: 1 blank line required between summary line and description (found 0)
/content/pytorch/torch/cuda/amp/grad_scaler.py:550 in public method `set_growth_factor`:
        D400: First line should end with a period (not ':')
/content/pytorch/torch/cuda/amp/grad_scaler.py:557 in public method `get_backoff_factor`:
        D200: One-line docstring should fit on one line with quotes (found 3)
/content/pytorch/torch/cuda/amp/grad_scaler.py:557 in public method `get_backoff_factor`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/content/pytorch/torch/cuda/amp/grad_scaler.py:563 in public method `set_backoff_factor`:
        D205: 1 blank line required between summary line and description (found 0)
/content/pytorch/torch/cuda/amp/grad_scaler.py:563 in public method `set_backoff_factor`:
        D400: First line should end with a period (not ':')
/content/pytorch/torch/cuda/amp/grad_scaler.py:570 in public method `get_growth_interval`:
        D200: One-line docstring should fit on one line with quotes (found 3)
/content/pytorch/torch/cuda/amp/grad_scaler.py:570 in public method `get_growth_interval`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/content/pytorch/torch/cuda/amp/grad_scaler.py:576 in public method `set_growth_interval`:
        D205: 1 blank line required between summary line and description (found 0)
/content/pytorch/torch/cuda/amp/grad_scaler.py:576 in public method `set_growth_interval`:
        D400: First line should end with a period (not ':')
/content/pytorch/torch/cuda/amp/grad_scaler.py:592 in public method `is_enabled`:
        D200: One-line docstring should fit on one line with quotes (found 3)
/content/pytorch/torch/cuda/amp/grad_scaler.py:592 in public method `is_enabled`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/content/pytorch/torch/cuda/amp/grad_scaler.py:598 in public method `state_dict`:
        D400: First line should end with a period (not ':')
/content/pytorch/torch/cuda/amp/grad_scaler.py:598 in public method `state_dict`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/content/pytorch/torch/cuda/amp/grad_scaler.py:624 in public method `load_state_dict`:
        D401: First line should be in imperative mood (perhaps 'Load', not 'Loads')
/content/pytorch/torch/cuda/amp/grad_scaler.py:649 in public method `__getstate__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/cuda/amp/grad_scaler.py:665 in public method `__setstate__`:
        D105: Missing docstring in magic method
28
```
```
After:
/content/pytorch/torch/cuda/amp/grad_scaler.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/cuda/amp/grad_scaler.py:40 in public class `OptState`:
        D101: Missing docstring in public class
/content/pytorch/torch/cuda/amp/grad_scaler.py:117 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/cuda/amp/grad_scaler.py:647 in public method `__getstate__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/cuda/amp/grad_scaler.py:663 in public method `__setstate__`:
        D105: Missing docstring in magic method
5
```
4) **File: torch/optim/_functional.py**
```
Before:
/content/pytorch/torch/optim/_functional.py:1 at module level:
        D400: First line should end with a period (not 'e')
1
```
```
After:
0

```
5) **File: torch/optim/__init__.py**
```
Before:
/content/pytorch/torch/optim/__init__.py:1 at module level:
        D205: 1 blank line required between summary line and description (found 0)
1
```
```
After:
0

```
6) **File: torch/optim/lbfgs.py**
```
Before:
/content/pytorch/torch/optim/lbfgs.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/lbfgs.py:185 in public class `LBFGS`:
        D205: 1 blank line required between summary line and description (found 0)
/content/pytorch/torch/optim/lbfgs.py:185 in public class `LBFGS`:
        D400: First line should end with a period (not 'c')
/content/pytorch/torch/optim/lbfgs.py:215 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/lbfgs.py:285 in public method `step`:
        D401: First line should be in imperative mood (perhaps 'Perform', not 'Performs')
5
```
```
After:
/content/pytorch/torch/optim/lbfgs.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/lbfgs.py:217 in public method `__init__`:
        D107: Missing docstring in __init__
2
```
7)**File: torch/optim/sparse_adam.py**
```
Before: /content/pytorch/torch/optim/sparse_adam.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/sparse_adam.py:7 in public class `SparseAdam`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/sparse_adam.py:8 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/sparse_adam.py:40 in public method `step`:
        D401: First line should be in imperative mood (perhaps 'Perform', not 'Performs')
4
```
```
After:
/content/pytorch/torch/optim/sparse_adam.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/sparse_adam.py:7 in public class `SparseAdam`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/sparse_adam.py:8 in public method `__init__`:
        D107: Missing docstring in __init__
3
```
8) **File:torch/optim/adadelta.py**
```
Before:
/content/pytorch/torch/optim/adadelta.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/adadelta.py:11 in public class `Adadelta`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/adadelta.py:12 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/adadelta.py:44 in public method `__setstate__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/optim/adadelta.py:82 in public method `step`:
        D401: First line should be in imperative mood (perhaps 'Perform', not 'Performs')
/content/pytorch/torch/optim/adadelta.py:193 in public function `adadelta`:
        D202: No blank lines allowed after function docstring (found 1)
6
```
```
After:
/content/pytorch/torch/optim/adadelta.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/adadelta.py:11 in public class `Adadelta`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/adadelta.py:12 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/adadelta.py:44 in public method `__setstate__`:
        D105: Missing docstring in magic method
4
```
9) **File: torch/optim/adagrad.py**
```
Before:
/content/pytorch/torch/optim/adagrad.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/adagrad.py:11 in public class `Adagrad`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/adagrad.py:12 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/adagrad.py:63 in public method `__setstate__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/optim/adagrad.py:78 in public method `share_memory`:
        D102: Missing docstring in public method
/content/pytorch/torch/optim/adagrad.py:100 in public method `step`:
        D401: First line should be in imperative mood (perhaps 'Perform', not 'Performs')
/content/pytorch/torch/optim/adagrad.py:201 in public function `adagrad`:
        D202: No blank lines allowed after function docstring (found 1)
7
```
```
After:
/content/pytorch/torch/optim/adagrad.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/adagrad.py:11 in public class `Adagrad`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/adagrad.py:12 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/adagrad.py:63 in public method `__setstate__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/optim/adagrad.py:78 in public method `share_memory`:
        D102: Missing docstring in public method
5
```
10) **File: torch/optim/adam.py**
```
Before:
/content/pytorch/torch/optim/adam.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/adam.py:14 in public class `Adam`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/adam.py:15 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/adam.py:65 in public method `__setstate__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/optim/adam.py:135 in public method `step`:
        D401: First line should be in imperative mood (perhaps 'Perform', not 'Performs')
/content/pytorch/torch/optim/adam.py:281 in public function `adam`:
        D202: No blank lines allowed after function docstring (found 1)
/content/pytorch/torch/optim/adam.py:281 in public function `adam`:
        D205: 1 blank line required between summary line and description (found 0)
7
```
```
After:
/content/pytorch/torch/optim/adam.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/adam.py:14 in public class `Adam`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/adam.py:15 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/adam.py:65 in public method `__setstate__`:
        D105: Missing docstring in magic method
4

```
11) **File: torch/optim/adamax.py**
```
Before:
/content/pytorch/torch/optim/adamax.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/adamax.py:12 in public class `Adamax`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/adamax.py:13 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/adamax.py:47 in public method `__setstate__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/optim/adamax.py:91 in public method `step`:
        D401: First line should be in imperative mood (perhaps 'Perform', not 'Performs')
/content/pytorch/torch/optim/adamax.py:203 in public function `adamax`:
        D202: No blank lines allowed after function docstring (found 1)
6
```
```
After:
/content/pytorch/torch/optim/adamax.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/adamax.py:12 in public class `Adamax`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/adamax.py:13 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/adamax.py:47 in public method `__setstate__`:
        D105: Missing docstring in magic method
4
```
12) **File: torch/optim/adamw.py**
```
Before:
/content/pytorch/torch/optim/adamw.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/adamw.py:12 in public class `AdamW`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/adamw.py:13 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/adamw.py:73 in public method `__setstate__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/optim/adamw.py:153 in public method `step`:
        D401: First line should be in imperative mood (perhaps 'Perform', not 'Performs')
/content/pytorch/torch/optim/adamw.py:304 in public function `adamw`:
        D202: No blank lines allowed after function docstring (found 1)
6

```
```
After:
/content/pytorch/torch/optim/adamw.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/adamw.py:12 in public class `AdamW`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/adamw.py:13 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/adamw.py:73 in public method `__setstate__`:
        D105: Missing docstring in magic method
4

```
13) **File: torch/optim/asgd.py**
```
Before:
/content/pytorch/torch/optim/asgd.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/asgd.py:17 in public class `ASGD`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/asgd.py:18 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/asgd.py:52 in public method `__setstate__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/optim/asgd.py:107 in public method `step`:
        D401: First line should be in imperative mood (perhaps 'Perform', not 'Performs')
/content/pytorch/torch/optim/asgd.py:195 in public function `asgd`:
        D202: No blank lines allowed after function docstring (found 1)
6
```
```
After:
/content/pytorch/torch/optim/asgd.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/asgd.py:17 in public class `ASGD`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/asgd.py:18 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/asgd.py:52 in public method `__setstate__`:
        D105: Missing docstring in magic method
4
```
Resolved docstring errors as listed. I initially changed in the main branch of forked repo which caused changes to appear in my PR to other issue. I have fixed that and hope this PR won't have any conflicts.
Kindly review @svekars @jbschlosser.
In case of any other issues please let me know. Thanks!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112964
Approved by: https://github.com/kit1980
2023-11-13 22:16:44 +00:00
Jon Chuang
954cba2ede [optim/dynamo] shortcut adagrad with has_complex (#112722)
Follow up to https://github.com/pytorch/pytorch/pull/110706, it was missed as depended on another fix

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112722
Approved by: https://github.com/albanD
2023-11-02 16:50:45 +00:00
Jane Xu
93a9b1314b Make step() faster by passing in a tensor vs scalar 1 (#111084)
This is the culminated result of https://github.com/pytorch/pytorch/pull/110954#issuecomment-1758520411.

We are making the code slightly more complicated to gain some perf in minimizing calls to `.copy_()` and `.to()`.

### Code
```
import torch
with torch.cuda.device(0):
    steps = [torch.zeros((), device="cpu", dtype=torch.float32) for i in range(1000)]

    with torch.profiler.profile(
        activities=[
            torch.profiler.ProfilerActivity.CPU,
            torch.profiler.ProfilerActivity.CUDA,
        ]
    ) as p:
        # New code:
        # step_device = steps[0].device
        # one = torch.tensor(1.0, device=step_device) if str(step_device) == "cpu" else 1
        # torch._foreach_add_(steps, one, 1.0)

        # Old code:
        torch._foreach_add_(steps, 1)

    print(p.key_averages().table(sort_by="cpu_time_total"))
```

### Profiles
**with old code**
```
-------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                     Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-------------------------  ------------  ------------  ------------  ------------  ------------  ------------
      aten::_foreach_add_        35.31%      52.089ms        99.99%     147.495ms     147.495ms             1
               aten::add_        25.05%      36.949ms        64.68%      95.406ms      95.406us          1000
                 aten::to         3.97%       5.852ms        39.63%      58.457ms      58.457us          1000
           aten::_to_copy        10.11%      14.917ms        35.66%      52.605ms      52.605us          1000
              aten::copy_        21.65%      31.939ms        21.65%      31.939ms      31.939us          1000
      aten::empty_strided         3.90%       5.749ms         3.90%       5.749ms       5.749us          1000
    cudaDeviceSynchronize         0.01%      18.000us         0.01%      18.000us      18.000us             1
-------------------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 147.513ms
```

**with new code**
```
-------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                     Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-------------------------  ------------  ------------  ------------  ------------  ------------  ------------
      aten::_foreach_add_        55.06%      49.963ms        99.86%      90.625ms      90.625ms             1
               aten::add_        44.81%      40.662ms        44.81%      40.662ms      40.662us          1000
            aten::detach_         0.01%       8.000us         0.05%      45.000us      45.000us             1
                  detach_         0.04%      37.000us         0.04%      37.000us      37.000us             1
              aten::empty         0.03%      30.000us         0.03%      30.000us      30.000us             1
                 aten::to         0.03%      23.000us         0.03%      23.000us      23.000us             1
    cudaDeviceSynchronize         0.02%      22.000us         0.02%      22.000us      22.000us             1
         aten::lift_fresh         0.01%       6.000us         0.01%       6.000us       6.000us             1
-------------------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 90.751ms
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111084
Approved by: https://github.com/albanD
ghstack dependencies: #111079
2023-10-20 01:34:08 +00:00
Jon Chuang
df7d01aed5 perf(inductor): use for loop with shortcut in Optimizers to speedup against list comprehensions (e.g. complex conversion) (#110613)
Fully fixes: https://github.com/pytorch/pytorch/issues/110506

Depends: https://github.com/pytorch/pytorch/pull/110607
Potential merge conflicts:
- https://github.com/pytorch/pytorch/pull/110339
- https://github.com/pytorch/pytorch/pull/110345
- https://github.com/pytorch/pytorch/pull/110454

Related:
- https://github.com/pytorch/pytorch/issues/110606 (we can apply the improvements here orthogonally to the complex support)

### Results

Benchmark: 100 params.

Breakdowns (float32, dynamo):
```
Adagrad: this PR: 4.4s, main: 8.8s
Adam: this PR: 2.1s, main: 9.8s
AdamW: this PR: 2.5s, main: 8.2s
ASGD: this PR: 3.1s, main: 8.5s
RMSProp: this PR: 1.3s, main: 4.2s
RProp: this PR: 6.7s, main: 14.9s
```

Notes:
1. Adagrad is still slow due to `_get_value` list comprehension. Can be fixed in https://github.com/pytorch/pytorch/pull/110339/files by utilizing capturable path
2. Adamax is not actually compiled (it is currently disabled).
3. Inductor compile time is quite variable. We calculate dynamo by subtracting `call_user_compiler` from `compile_inner` timing.

<details>

This PR:
```
Adagrad (torch.float32): 28.47496461868286s
Adagrad (torch.complex64): 29.379547357559204s
Adam (torch.float32): 17.334211587905884s
Adam (torch.complex64): 29.637500524520874s
Adamax (torch.float32): 2.4749321937561035s
Adamax (torch.complex64): 3.1997995376586914s
AdamW (torch.float32): 18.06532859802246s
AdamW (torch.complex64): 28.25661015510559s
ASGD (torch.float32): 23.70255398750305s
ASGD (torch.complex64): 25.33756995201111s
RMSprop (torch.float32): 7.964028596878052s
RMSprop (torch.complex64): 12.909599781036377s
Rprop (torch.float32): 30.512362003326416s
Rprop (torch.complex64): 44.74405765533447s
```

Main
```
Adagrad (torch.float32): 26.919506072998047s
Adagrad (torch.complex64): 35.190622091293335s
Adam (torch.float32): 25.715000867843628s
Adam (torch.complex64): 24.17716670036316s
Adamax (torch.float32): 2.4404726028442383s
Adamax (torch.complex64): 3.3538928031921387s
AdamW (torch.float32): 25.2022807598114s
AdamW (torch.complex64): 28.915700912475586s
ASGD (torch.float32): 24.108731985092163s
ASGD (torch.complex64): 26.589075088500977s
RMSprop (torch.float32): 10.781344175338745s
RMSprop (torch.complex64): 15.136352777481079s
Rprop (torch.float32): 42.46482181549072s
Rprop (torch.complex64): 48.28277635574341s
```

Seems that it doesn't help the complex case by much (but that's not the majority case). torch.float32 is generally positive, when it does not show drastic improvement / regresses, it is due to inductor variance (by manually inspecting the logs).

</details>

### Benchmark Script
```python
import torch
import time
from torch.optim import Adagrad, Adam, Adamax, AdamW, ASGD, RMSprop, Rprop

OPTIMS = [Adagrad, Adam, Adamax, AdamW, ASGD, RMSprop, Rprop]
DTYPES = [torch.float, torch.cfloat]

NUM_PARAMS = 100
kwargs = { "lr": 0.01, "foreach": True }
summary = []

for optim_cls in OPTIMS:
    for dtype in DTYPES:
        torch._dynamo.reset()
        # torch._inductor.metrics.reset()
        input = torch.ones([10, 10], dtype=dtype, device="cuda:0")
        model = torch.nn.Sequential(
            *[torch.nn.Linear(10, 10, dtype=dtype, device="cuda:0") for _ in range(NUM_PARAMS)]
        )

        model(input).sum().abs().backward()
        opt_compiled = optim_cls(model.parameters(), **kwargs)
        compiled_step = torch.compile(opt_compiled.step)

        with torch.set_grad_enabled(False):
            start_time = time.time()
            compiled_step()
            summary.append(f"{optim_cls.__name__} ({dtype}): {time.time() - start_time}s")

        print(optim_cls, kwargs, dtype, torch._dynamo.utils.compile_times())

for s in summary:
    print(s)
```

CC: @janeyx99 @mlazos
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110613
Approved by: https://github.com/janeyx99
2023-10-05 23:10:52 +00:00
Jon Chuang
c99de9f37c fix(optim): adagrad sparse multitensor incorrect early exit (#110454)
Fixes https://github.com/pytorch/pytorch/issues/110444#issuecomment-1745181530

This PR:
Passes

Main:
```
test/optim/test_optim.py::TestOptim::test_adagrad_sparse FAILED [0.0058s]

==================================================================================================================================== FAILURES =====================================================================================================================================
__________________________________________________________________________________________________________________________ TestOptim.test_adagrad_sparse __________________________________________________________________________________________________________________________
Traceback (most recent call last):
  File "/home/jonch/Desktop/Programming/mlsys/pytorch/test/optim/test_optim.py", line 1448, in test_adagrad_sparse
    self._test_rosenbrock_sparse(
  File "/home/jonch/Desktop/Programming/mlsys/pytorch/test/optim/test_optim.py", line 128, in _test_rosenbrock_sparse
    self.assertEqual(params, params_c, atol=1e-6, rtol=1e-6)
  File "/home/jonch/Desktop/Programming/mlsys/pytorch/torch/testing/_internal/common_utils.py", line 3309, in assertEqual
    raise error_metas.pop()[0].to_error(
AssertionError: Tensor-likes are not close!

Mismatched elements: 1 / 2 (50.0%)
Greatest absolute difference: 0.09999999999993325 at index (1,) (up to 1e-06 allowed)
Greatest relative difference: 0.06249999999996089 at index (1,) (up to 1e-06 allowed)

```

CC: @janeyx99
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110454
Approved by: https://github.com/janeyx99
2023-10-05 20:37:57 +00:00
Aaron Gokaslan
6d43c89f37 [BE]: Update Ruff to 0.0.280 (#105724)
Removes unusued loop values in python dictionary iteration. Automated fix from Ruff master

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105724
Approved by: https://github.com/ezyang, https://github.com/janeyx99
2023-07-22 23:03:34 +00:00
Justin Chu
4cc1745b13 [BE] f-stringify torch/ and scripts (#105538)
This PR is a follow up on the pyupgrade series to convert more strings to use f-strings using `flynt`.

- https://docs.python.org/3/reference/lexical_analysis.html#f-strings
- https://pypi.org/project/flynt/

Command used:

```
flynt torch/ -ll 120
flynt scripts/ -ll 120
flynt tools/ -ll 120
```

and excluded `collect_env.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105538
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-07-21 19:35:24 +00:00
Justin Chu
3721fa5612 [BE] Enable ruff's UP rules and autoformat optim/ (#105426)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105426
Approved by: https://github.com/malfet, https://github.com/albanD, https://github.com/aaronenyeshi, https://github.com/janeyx99
2023-07-18 21:07:43 +00:00
Jane Xu
ea6a563a8c [foreach][Adagrad] Minimize intermediates=2 to decrease peak memory (#104988)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104988
Approved by: https://github.com/albanD, https://github.com/Skylion007
2023-07-12 03:09:17 +00:00
Nikita Shulga
6d2887cc06 Reland "Move tensor grouping to ATen" (#103912)
This is a reland of https://github.com/pytorch/pytorch/pull/100007 with a build fix for Windows debug builds.
`at::native::ParamsHash` only works on structs with standard layout, but `std::string` isn't one in Visual C++ debug builds, which one can easily verified by running something like:
```cpp
#define _DEBUG
#include <type_traits>
#include <string>
static_assert(std::is_standard_layout_v<std::string>, "Oh noes");
```
If above conditon is not met, instead of printing a static_assert output, VC++ raises a very cryptic compilation errors,  see https://github.com/pytorch/pytorch/pull/100007#discussion_r1227116292 for more detail.

Also, using `std::hash` for string should result in a faster hash function.

(cherry picked from commit 74b7a6c75e)

<!--
copilot:summary
-->
### <samp>🤖 Generated by Copilot at 5914771</samp>

This pull request introduces a new function `_group_tensors_by_device_and_dtype` that can group tensors by their device and dtype, and updates the `foreach` utilities and several optimizers to use this function. The goal is to improve the performance, readability, and compatibility of the code that handles tensors with different properties. The pull request also adds a test case and type annotations for the new function, and some error checks for the `fused` argument in Adam and AdamW.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103912
Approved by: https://github.com/janeyx99
2023-06-21 09:26:33 +00:00
PyTorch MergeBot
0cb5bc3b04 Revert "Move tensor grouping to ATen (#100007)"
This reverts commit 74b7a6c75e.

Reverted https://github.com/pytorch/pytorch/pull/100007 on behalf of https://github.com/izaitsevfb due to Breaks internal builds, see D46629727 ([comment](https://github.com/pytorch/pytorch/pull/100007#issuecomment-1587861598))
2023-06-12 18:30:33 +00:00
Masaki Kozuki
74b7a6c75e Move tensor grouping to ATen (#100007)
rel: #94344
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100007
Approved by: https://github.com/janeyx99
2023-06-09 15:44:46 +00:00
Michael Lazos
4da88447ea Disable grouping by dtype and device if compiling (#102771)
Disable grouping if we are compiling, this happens during lowering
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102771
Approved by: https://github.com/janeyx99
2023-06-02 21:04:49 +00:00
Jane Xu
75cb99e549 [optim] Widen the cases for defaulting to foreach (#95820)
Big OOP correction continued. Also added a test this time to verify the defaulting was as expected.

The key here is realizing that the grouping for foreach already assumes that the non-param tensorlists follow suit in dtype and device, so it is too narrow to check that _all_ tensors were on CUDA. The main leeway this allowed was state_steps, which are sometimes cpu tensors. Since foreach _can_ handle cpu tensors, this should not introduce breakage.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95820
Approved by: https://github.com/albanD
2023-03-02 04:15:33 +00:00
Jane Xu
097679478e [optim] Set defaults to foreach, NOT fused (#95241)
Rolling back the default change for Adam and rectifying the docs to reflect that AdamW never defaulted to fused.

Since our fused implementations are relatively newer, let's give them a longer bake-in time before flipping the switch for every user.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95241
Approved by: https://github.com/ngimel
2023-02-22 04:47:32 +00:00
Xuehai Pan
5b1cedacde [BE] [2/3] Rewrite super() calls in functorch and torch (#94588)
Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.

- #94587
- #94588
- #94592

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```

Some cases that change the semantics should be kept unchanged. E.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94588
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-10 21:16:33 +00:00
Jane Xu
4fc19e1a71 [optim][adam] use fastest impl whenever possible, add util (#93184)
This allows it so that ONLY when the users don't set anything for foreach or fused do we switch the default and cascades adam so that we default to fused, then foreach, then single-tensor.

To clarify:
* if the user puts True in foreach _only_, it will run the foreach implementation.
* if the user puts True in fused _only_, it will run the fused implementation.
* if the user puts True in foreach AND for fused, it will run the fused implementation.

And:
* if the user puts False in foreach _only_, it will run the single tensor implementation.
* if the user puts False in fused _only_, it will still run the single tensor implementation.
* if the user puts False in foreach AND for fused, it will run the single tensor implementation.

I also didn't trust myself that much with the helper function, so I ran some local asserts on _default_to_fused_or_foreach. The only point left to really test is the type(p) -- torch.Tensor but I think the distributed tests will catch that in CI.
```
cuda_only_fp_list = [
    torch.rand((1, 2), device="cuda", dtype=torch.float32),
    torch.rand((1, 2), device="cuda", dtype=torch.float64),
    torch.rand((1, 2), device="cuda", dtype=torch.float16),
    torch.rand((1, 2), device="cuda", dtype=torch.bfloat16),
]

cuda_only_int_list = [
    torch.randint(1024, (1, 2), device="cuda", dtype=torch.int64),
]

cpu_list = [
    torch.rand((1, 2), device="cpu", dtype=torch.float32),
    torch.rand((1, 2), device="cpu", dtype=torch.float64),
    torch.rand((1, 2), device="cpu", dtype=torch.float16),
]

none_list = [None]

# differentiable should always make it return false for both
assert _default_to_fused_or_foreach([cuda_only_fp_list], True, True) == (False, False)
assert _default_to_fused_or_foreach([cuda_only_fp_list], True, False) == (False, False)

# cpu lists should always make it return false for both
assert _default_to_fused_or_foreach([cuda_only_fp_list, cpu_list], False, True) == (False, False)
assert _default_to_fused_or_foreach([cpu_list], False, True) == (False, False)
assert _default_to_fused_or_foreach([cuda_only_fp_list, cpu_list], False, False) == (False, False)
assert _default_to_fused_or_foreach([cpu_list], False, False) == (False, False)

# has fused triggers correctly
assert _default_to_fused_or_foreach([cuda_only_fp_list], False, True) == (True, False)
assert _default_to_fused_or_foreach([cuda_only_fp_list], False, False) == (False, True)

# ints always goes to foreach
assert _default_to_fused_or_foreach([cuda_only_fp_list, cuda_only_int_list], False, True) == (False, True)
assert _default_to_fused_or_foreach([cuda_only_fp_list, cuda_only_int_list], False, False) == (False, True)

# Nones don't error
assert _default_to_fused_or_foreach([cuda_only_fp_list, none_list], False, True) == (True, False)
assert _default_to_fused_or_foreach([cuda_only_fp_list, cuda_only_int_list, none_list], False, True) == (False, True)
assert _default_to_fused_or_foreach([none_list], False, True) == (True, False)
assert _default_to_fused_or_foreach([none_list], False, False) == (False, True)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93184
Approved by: https://github.com/albanD
2023-01-30 19:58:55 +00:00
Jane Xu
9b4a778420 [optim][adagrad] default to foreach when CUDA + differentiable=False (#92716)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92716
Approved by: https://github.com/albanD
2023-01-21 05:31:22 +00:00
Jane Xu
de0375e79d [optim][foreach] Do NOT inplace modify gradients (#92706)
SGD and ASGD already had out-of-place grads.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92706
Approved by: https://github.com/ngimel, https://github.com/albanD
2023-01-21 00:12:28 +00:00
Jane Xu
b2ca2c8662 [optim][adagrad] group tensors in foreach to maximize perf (#92362)
another one
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92362
Approved by: https://github.com/albanD
2023-01-20 16:24:39 +00:00
Jane Xu
0070c546b5 [BE][optim] abstract out docstrings, add differentiable docs (#92336)
1. abstract out common doc strings --> I'm sure there are more, but let this be a first step.
2. Add differentiable docs to those who are actually differentiable
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92336
Approved by: https://github.com/albanD
2023-01-18 15:09:28 +00:00
Soumith Chintala
06326a7721 [optim] skip .item calls in all optimizers when compiling with dynamo (#88173)
@mlazos: skips `item()` calls if compiling with dynamo, by defining a helper function `_get_value` which either returns the result of `.item()` or the scalar cpu tensor if compiling with dynamo. This was done because removing `item()` calls significantly regresses eager perf. Additionally, `_dispatch_sqrt` calls the appropriate sqrt function (math.sqrt, or torch.sqrt).

Fixes https://github.com/pytorch/torchdynamo/issues/1083

This PR will no longer be needed once symint support is default.

This PR closes all remaining graph breaks in the optimizers (!!)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88173
Approved by: https://github.com/albanD
2022-12-12 17:32:35 +00:00
Michael Lazos
c63afb283c Disable dynamo on optimizer lazy initialization (#89902)
Helps with https://github.com/pytorch/torchdynamo/issues/1803

Separate out the group initialization and disable dynamo on it

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89902
Approved by: https://github.com/soumith, https://github.com/albanD
2022-12-02 01:15:11 +00:00
Michael Lazos
3d47c74cfe Update code style for optimizer code (#89862)
Separating out whitespace-only changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89862
Approved by: https://github.com/albanD, https://github.com/soumith
2022-11-30 00:53:05 +00:00
Emilio Castillo
aacb9f3ac6 Make Adadelta,Adagrad & Adamax differentiable (#86096)
Continuing the differentiable optimizers support

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86096
Approved by: https://github.com/janeyx99
2022-10-12 23:16:29 +00:00
ProGamerGov
71d50f4f89 Change docstring type callable to Callable for consistency (#82487)
### Description

Across PyTorch's docstrings, both `callable` and `Callable` for variable types. The Callable should be capitalized as we are referring to the `Callable` type, and not the Python `callable()` function.

### Testing

There shouldn't be any testing required.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82487
Approved by: https://github.com/albanD
2022-08-01 17:26:09 +00:00
anjali411
bda04e9f5e Add __all__ for torch.optim and torch.nn.modules modules (#80237)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80237
Approved by: https://github.com/albanD
2022-06-24 21:34:10 +00:00
Sergii Dymchenko
de7219e8a7 Use generators with all/any in torch/optim (#78142)
Generator comprehensions with any/all are less verbose and potentially help to save memory/CPU : https://eklitzke.org/generator-comprehensions-and-using-any-and-all-in-python

To make JIT work with this change, I added code to convert GeneratorExp to ListComp. So the whole PR is basically NoOp for JIT, but potentially memory and speed improvement for eager mode.

Also I removed a test from test/jit/test_parametrization.py. The test was bad and had a TODO to actually implement and just tested that UnsupportedNodeError is thrown, and with GeneratorExp support a different error would be thrown.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78142
Approved by: https://github.com/malfet, https://github.com/albanD
2022-06-24 17:23:45 +00:00
Rob Zinkov
6642e88ad2 Adding maximize flag to Adagrad
This adds maximize to Adagrad (#68052) along with updates the respective tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75968
Approved by: https://github.com/albanD
2022-04-20 08:29:03 +00:00
Mikayla Gawarecki
dabfea8363 Optim foreach cleanup for Adagrad (#69981)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69981

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767863

Pulled By: mikaylagawarecki

fbshipit-source-id: 1c99abe4ac4eb2a9eb896dff4837b539b94f68e7
(cherry picked from commit 61c28d0645)
2022-02-09 16:52:12 +00:00
Mikayla Gawarecki
7176c92687 [optim] update step in functional and pass state_steps instead of state (#71333)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71333

Updated
- Adagrad
- Adamax
- Adam
- AdamW
- RAdam
make multi_tensor functionals take `state_steps: List[Tensor]` instead of taking `states: List[Dict]`
make `state_steps: List[int]s -> state_steps:List[Tensor]` where each is a Singleton tensor so step can be updated within the functional

(NAdam and ASGD) were updated in separate diffs to fold their handling of state into the functionals

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767872

Pulled By: mikaylagawarecki

fbshipit-source-id: 9baa7cafb6375eab839917df9287c65a437891f2
(cherry picked from commit 831c02b3d0)
2022-02-08 16:51:19 +00:00
Christopher Gray Howard
acb340de75 [Pytorch][Bootcamp] Add fixes and vanilla testing for Adagrad non-vectorized and vectorized optimizers to handle complex numbers (#66671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66671

Made changes in the step function of the vectorized and non-vectorized adagrad optimizers to handle complex numbers as two real numbers as per 65711 on github
ghstack-source-id: 141442350

Test Plan:
buck test mode/dev caffe2/test:optim -- 'test_adagrad_complex'
https://pxl.cl/1Rd44

Reviewed By: albanD

Differential Revision: D31673503

fbshipit-source-id: 90a0d0c69b556716e2d17c59ce80f09c750fc464
2021-10-25 10:13:21 -07:00
Ilqar Ramazanli
d4b09dbab3 [doc][hackathon] To add Adagrad Optimizer to the documentation (#63254)
Summary:
It has been discussed before that adding description of Optimization algorithms to PyTorch Core documentation may result in a nice Optimization research tutorial. In the following tracking issue we mentioned about all the necessary algorithms and links to the originally published paper  https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding description of Adagrad to the documentation.  For more details, we refer to the paper
http://jmlr.org/papers/v12/duchi11a.html

<img width="658" alt="AdaGradAlgo" src="https://user-images.githubusercontent.com/73658284/132743276-a52ea3fb-70a5-4788-94b7-f99367907a26.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63254

Reviewed By: albanD

Differential Revision: D30852139

Pulled By: iramazanli

fbshipit-source-id: 9e496560a97e92be8386585b01d9bd3bba4b0c66
2021-09-09 15:41:29 -07:00
Wanchao Liang
4611387608 [optim] take kw-only argument for functional optim APIs (#56185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56185

ghstack-source-id: 126670123

Reviewed By: albanD

Differential Revision: D27802169

fbshipit-source-id: f5e1cb2046dcdeecf5f6b0f70892828bf0adb22f
2021-04-15 20:08:04 -07:00
Vincent Quenneville-Belair
50d903f19f [optim] make functional api be private (#51316) (#51665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51665

This reverts commit 896f82aa92.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D26232608

Pulled By: vincentqb

fbshipit-source-id: ca006baf4fb672c11c1bb003c39a29cbadb63dd3
2021-02-03 17:59:05 -08:00
Vincent Quenneville-Belair
896f82aa92 [optim] make functional api be private (#51316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51316

Make optim functional API be private until we release with beta

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D26213469

fbshipit-source-id: b0fd001a8362ec1c152250bcd57c7205ed893107
2021-02-03 09:29:33 -08:00
Samuel Marks
e6779d4357 [*.py] Rename "Arguments:" to "Args:" (#49736)
Summary:
I've written custom parsers and emitters for everything from docstrings to classes and functions. However, I recently came across an issue when I was parsing/generating from the TensorFlow codebase: inconsistent use of `Args:` and `Arguments:` in its docstrings.

```sh
(pytorch#c348fae)$ for name in 'Args:' 'Arguments:'; do
    printf '%-10s %04d\n' "$name" "$(rg -IFtpy --count-matches "$name" | paste -s -d+ -- | bc)"; done
Args:      1095
Arguments: 0336
```

It is easy enough to extend my parsers to support both variants, however it looks like `Arguments:` is wrong anyway, as per:

  - https://google.github.io/styleguide/pyguide.html#doc-function-args @ [`ddccc0f`](https://github.com/google/styleguide/blob/ddccc0f/pyguide.md)

  - https://chromium.googlesource.com/chromiumos/docs/+/master/styleguide/python.md#describing-arguments-in-docstrings @ [`9fc0fc0`](https://chromium.googlesource.com/chromiumos/docs/+/9fc0fc0/styleguide/python.md)

  - https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html @ [`c0ae8e3`](https://github.com/sphinx-contrib/napoleon/blob/c0ae8e3/docs/source/example_google.rst)

Therefore, only `Args:` is valid. This PR replaces them throughout the codebase.

PS: For related PRs, see tensorflow/tensorflow/pull/45420

PPS: The trackbacks automatically appearing below are sending the same changes to other repositories in the [PyTorch](https://github.com/pytorch) organisation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49736

Reviewed By: albanD

Differential Revision: D25710534

Pulled By: soumith

fbshipit-source-id: 61e8ff01abb433e9f78185c2d1d0cbd7c22c1619
2020-12-28 09:34:47 -08:00
Wanchao Liang
0444c372e1 [optimizer] introduce optimizer functional API, refactor Adagrad (#44715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44715

We have provided a nice and intuitive API in Python. But in the context of large scale distributed training (e.g. Distributed Model Parallel), users often want to use multithreaded training instead of multiprocess training as it provides better resource utilization and efficiency.

This PR introduces functional optimizer concept (that is similar to the concept of `nn.functional`), we split optimizer into two parts: 1. optimizer state management 2. optimizer computation. We expose the computation part as a separate functional API that is available to be used by internal and OSS developers, the caller of the functional API will maintain their own states in order to directly calls the functional API. While maintaining the end user API be the same, the functional API is TorchScript friendly, and could be used by the distributed optimizer to speed up the training without GIL.

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D23935258

Pulled By: wanchaol

fbshipit-source-id: d2a5228439edb3bc64f7771af2bb9e891847136a
2020-09-25 17:10:26 -07:00
albanD
6e2bb1c054 End of the .data removal in torch/optim (#34211)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34211

Test Plan: Imported from OSS

Differential Revision: D20248684

Pulled By: albanD

fbshipit-source-id: 2294bfa41b82ff47f000bc98860780f59d7d4421
2020-03-09 06:40:39 -07:00
Eleanor Dwight Holland
6a97777f72 Remove use of .data from optimizers (#33640)
Summary:
Removes all uses of `.data` from optimizers.

Or tries to.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33640

Reviewed By: vincentqb

Differential Revision: D20203216

Pulled By: albanD

fbshipit-source-id: 9bfe78bbed00fd4aaa690801cff0201f0bd680a0
2020-03-03 13:21:55 -08:00
Xiao Wang
c1dd70688a Fix deprecated python "add" calls (#33428)
Summary:
This PR fixed those python "add" calls using deprecated signature `add(Scalar, Tensor)`. The alternative signature `add(Tensor, alpha = Scalar)` is used.

cc csarofeen zasdfgbnm ptrblck ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33428

Differential Revision: D20002534

Pulled By: vincentqb

fbshipit-source-id: 81f2dd6170a47a9b53a17e5817c26e70d8afa130
2020-02-26 09:02:31 -08:00
Vitaly Fedyunin
877c96cddf explicitly provide memory format when calling to *_like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30008

Test Plan: Imported from OSS

Differential Revision: D18575981

Pulled By: VitalyFedyunin

fbshipit-source-id: ec3418257089ad57913932be1a8608cd20ce054c
2019-11-19 16:19:29 -08:00
Roy Li
14ac7a1d87 Add epsilon argument to Adagrad optimizer (#24980)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24980

We'll need this internally, so just updating the open source version. the other optimizers have this argument anyways.

Test Plan: Imported from OSS

Differential Revision: D16945279

Pulled By: li-roy

fbshipit-source-id: 0b8cc86f15387cd65660747899d3d7dd870cff27
2019-08-21 16:36:51 -07:00
Soumith Chintala
75754beca3 Revert D14577575: [pytorch][PR] Fix lack of state init for adagrad and add share_memory flag
Differential Revision:
D14577575

Original commit changeset: 12440079ac96

fbshipit-source-id: 935106385e608471dc280fc61cfedf19d330812d
2019-04-26 15:43:04 -07:00
Kaiyu Shi
444f792fa6 Fix lack of state init for adagrad and add share_memory flag (#17679)
Summary:
The current code initialize the `state` in `__init__` method, but the initialization process is not invoked in `add_parameter_group`.

I followed the same approach in other Optimizers to init the `state`.

```python
import torch

emb = torch.nn.Embedding(10,10)
emb2 = torch.nn.Embedding(10,10)

optim = torch.optim.Adagrad(emb.parameters())
print(optim.state[emb.weight])  # already initialized

optim.add_param_group({'params': emb2.parameters()})
print(optim.state[emb2.weight])  # empty dict

loss = emb2.weight.sum() + emb.weight.sum()
loss.backward()
optim.step()  # raised KeyError
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17679

Differential Revision: D14577575

Pulled By: ezyang

fbshipit-source-id: 12440079ac964b9eedad48e393d47f558babe300
2019-04-23 12:22:19 -07:00
Peter Goldsborough
fb4e8088f3 Remove methods that start with an underscore from at::Tensor (#11152)
Summary:
This PR cleans up the `at::Tensor` class by removing all methods that start with an underscore in favor of functions in the `at::` namespace. This greatly cleans up the `Tensor` class and makes it clearer what is the public and non-public API.

For this I changed `native_functions.yaml` and `Declarations.cwrap` to make all underscore methods `variant: function` (or add such a statement to begin with), and then fixed all code locations using the underscore methods.

ezyang colesbury gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11152

Differential Revision: D9683607

Pulled By: goldsborough

fbshipit-source-id: 97f869f788fa56639c05a439e2a33be49f10f543
2018-09-07 11:55:11 -07:00
Atul Kumar
3e83e3abfe Adding initial_accumulator_value parameter to Adagrad (#6616) 2018-04-16 22:12:36 +02:00
lazypanda1
063946d2b3 Added parameter range checks for all optimizers (#6000) 2018-03-28 11:22:23 +02:00
SsnL
f76d6c029c Sparse Adam optimizer for sparse gradients (#3137)
* sparse adam

* Favor dense addition over sparse_mask
2017-11-06 14:20:51 -05:00