Fixed type hints for CosineAnnealingWarmRestarts:
- `T_mult` is not `Optional[int]` but just `int`
- `eta_min` is not `Optional[float]` but just `float`
- removed `step` method specific annotation as it is compatible with the base class
e132f09e88/torch/optim/lr_scheduler.py (L1365-L1375)
Otherwise, a computation like `self.T_i * self.T_mult` in `self.step` does not type-check:
```
error: Unsupported operand types for * ("int" and "None")
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102067
Approved by: https://github.com/janeyx99
This PR proposes an optimized way to do Exponential Moving Average (EMA), which is faster than the current way using `swa_utils.AveragedModel` described in https://pytorch.org/docs/stable/optim.html#custom-averaging-strategies.
This implementation is asynchronous, and is built as an optimizer wrapper so that the EMA weight update happens without any additional CPU/GPU sync, just after optimizer steps, and with limited code changes.
Example usage:
```
model = Model().to(device)
opt = torch.optim.Adam(model.parameters())
opt = EMAOptimizer(opt, device, 0.9999)

for epoch in range(epochs):
    training_loop(model, opt)

    regular_eval_accuracy = evaluate(model)

    with opt.swap_ema_weights():
        ema_eval_accuracy = evaluate(model)
```
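For intuition, here is a minimal sketch of the EMA update such a wrapper performs after each `optimizer.step()` (illustrative only, not this PR's implementation; it relies on the private `torch._foreach_*` ops):
```python
import torch

@torch.no_grad()
def ema_update(ema_params, model_params, decay=0.9999):
    # ema <- decay * ema + (1 - decay) * param, batched across all parameters
    torch._foreach_mul_(ema_params, decay)
    torch._foreach_add_(ema_params, model_params, alpha=1.0 - decay)
```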
Here are some benchmarks (time per iteration) on various torchvision models:
|model|iteration time with this PR|iteration time with swa_utils.AveragedModel|iteration speedup|
|-----|-----|-----|-----|
|regnet_x_1_6gf|62.73 |67.998 |1.08 |
|regnet_x_3_2gf|101.75 |109.422 |1.08 |
|regnet_x_400mf|25.13 |32.005 |1.27 |
|regnet_x_800mf|33.01 |37.466 |1.13 |
|regnet_x_8gf|128.13 |134.868 |1.05 |
|regnet_y_16gf|252.91 |261.292 |1.03 |
|regnet_y_1_6gf|72.14 |84.22 |1.17 |
|regnet_y_3_2gf|99.99 |109.296 |1.09 |
|regnet_y_400mf|29.53 |36.506 |1.24 |
|regnet_y_800mf|37.82 |43.634 |1.15 |
|regnet_y_8gf|196.63 |203.317 |1.03 |
|resnet101|128.80 |137.434 |1.07 |
|resnet152|182.85 |196.498 |1.07 |
|resnet18|29.06 |29.975 |1.03 |
|resnet34|50.73 |53.443 |1.05 |
|resnet50|76.88 |80.602 |1.05 |
|resnext101_32x8d|277.29 |280.759 |1.01 |
|resnext101_64x4d|269.56 |281.052 |1.04 |
|resnext50_32x4d|100.73 |101.102 |1.00 |
|shufflenet_v2_x0_5|10.56 |15.419 |1.46 |
|shufflenet_v2_x1_0|13.11 |18.525 |1.41 |
|shufflenet_v2_x1_5|18.05 |23.132 |1.28 |
|shufflenet_v2_x2_0|25.04 |30.008 |1.20 |
|squeezenet1_1|14.26 |14.325 |1.00 |
|swin_b|264.52 |274.613 |1.04 |
|swin_s|180.66 |188.914 |1.05 |
|swin_t|108.62 |112.632 |1.04 |
|swin_v2_s|220.29 |231.153 |1.05 |
|swin_v2_t|127.27 |133.586 |1.05 |
|vgg11|95.52 |103.714 |1.09 |
|vgg11_bn|106.49 |120.711 |1.13 |
|vgg13|132.94 |147.063 |1.11 |
|vgg13_bn|149.73 |165.256 |1.10 |
|vgg16|158.19 |172.865 |1.09 |
|vgg16_bn|177.04 |192.888 |1.09 |
|vgg19|184.76 |194.194 |1.05 |
|vgg19_bn|203.30 |213.334 |1.05 |
|vit_b_16|217.31 |219.748 |1.01 |
|vit_b_32|69.47 |75.692 |1.09 |
|vit_l_32|223.20 |258.487 |1.16 |
|wide_resnet101_2|267.38 |279.836 |1.05 |
|wide_resnet50_2|145.06 |154.918 |1.07 |
You can see that in all cases it is faster than using `AveragedModel`. In fact, in many cases adding EMA does not add any overhead, since the computation is hidden behind the usual iteration flow.
This is a similar implementation to the one currently in [NVIDIA NeMo](https://github.com/NVIDIA/NeMo).
If the team is interested in merging this, let me know and I'll add some documentation similar to `swa_utils` and tests.
Credits to @szmigacz for the implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94820
Approved by: https://github.com/janeyx99
Fixes #95781.
The cause seems to be that the current implementation doesn't correctly pass `found_inf` when `grad_scale` is `None`. As a result, parameters can be mistakenly updated by gradients that contain invalid elements, i.e. nan or inf.
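A conceptual sketch of the invariant being fixed (not the fused kernel's actual code path; the function and names here are illustrative): a nonzero `found_inf` must suppress the parameter update even when `grad_scale` is `None`.
```python
import torch

def maybe_apply_update(param, update, grad_scale=None, found_inf=None):
    # Unscale only if a scale was provided ...
    if grad_scale is not None:
        update = update / grad_scale
    # ... but always honor found_inf, regardless of grad_scale.
    if found_inf is not None and found_inf.item() != 0:
        return  # gradients contained inf/nan this step; skip the update
    param.add_(update, alpha=-1.0)
```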
Related #94060
I forgot about this wrong handling after #94344
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95847
Approved by: https://github.com/janeyx99
Big OOP correction continued. Also added a test this time to verify the defaulting was as expected.
The key here is realizing that the grouping for foreach already assumes that the non-param tensorlists follow suit in dtype and device, so it is too narrow to check that _all_ tensors were on CUDA. The main leeway this allowed was state_steps, which are sometimes cpu tensors. Since foreach _can_ handle cpu tensors, this should not introduce breakage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95820
Approved by: https://github.com/albanD
Changes:
- #95200
1. Recognize `.py.in` and `.pyi.in` files as Python in VS Code for a better development experience.
2. Fix deep setting merge in `tools/vscode_settings.py`.
- #95267
3. Use `NamedTuple` rather than `namedtuple + __annotations__` for `torch.nn.utils.rnn.PackedSequence_`:
`namedtuple + __annotations__`:
```python
from collections import namedtuple
from typing import Optional

import torch

PackedSequence_ = namedtuple('PackedSequence_',
                             ['data', 'batch_sizes', 'sorted_indices', 'unsorted_indices'])

# type annotation for PackedSequence_ to make it compatible with TorchScript
PackedSequence_.__annotations__ = {'data': torch.Tensor, 'batch_sizes': torch.Tensor,
                                   'sorted_indices': Optional[torch.Tensor],
                                   'unsorted_indices': Optional[torch.Tensor]}
```
`NamedTuple` (Python 3.6+):
```python
from typing import NamedTuple, Optional

import torch

class PackedSequence_(NamedTuple):
    data: torch.Tensor
    batch_sizes: torch.Tensor
    sorted_indices: Optional[torch.Tensor]
    unsorted_indices: Optional[torch.Tensor]
```
- this PR: #95268
4. Sort import statements and remove unnecessary imports in `.pyi`, `.pyi.in` files.
5. Format `.pyi`, `.pyi.in` files and remove unnecessary ellipsis `...` in type stubs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95268
Approved by: https://github.com/huydhn
Rolling back the default change for Adam and rectifying the docs to reflect that AdamW never defaulted to fused.
Since our fused implementations are relatively newer, let's give them a longer bake-in time before flipping the switch for every user.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95241
Approved by: https://github.com/ngimel
Remove unnecessary collection casts (redundant calls to `list`, `tuple`, and `dict`) and simplify calls to the `sorted` builtin. This should strictly improve speed and readability.
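Hypothetical before/after examples of the kinds of simplifications made:
```python
d = {"b": 2, "a": 1}
xs = (3, 1, 2)

sorted(list(d.keys()))  # before: redundant list() and .keys() calls
sorted(d)               # after: sorted() accepts any iterable and returns a list

tuple(list(xs))         # before: unnecessary intermediate list
tuple(xs)               # after
```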
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94323
Approved by: https://github.com/albanD
With this change, ONLY when the user doesn't set anything for foreach or fused do we switch the default; Adam then cascades so that we default to fused, then foreach, then single-tensor.
To clarify (see also the short sketch after the lists below):
* if the user puts True in foreach _only_, it will run the foreach implementation.
* if the user puts True in fused _only_, it will run the fused implementation.
* if the user puts True in foreach AND for fused, it will run the fused implementation.
And:
* if the user puts False in foreach _only_, it will run the single tensor implementation.
* if the user puts False in fused _only_, it will still run the single tensor implementation.
* if the user puts False in foreach AND for fused, it will run the single tensor implementation.
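A short sketch of the resulting user-facing behavior (the CUDA placeholder params are for illustration only):
```python
import torch

params = [torch.randn(2, 2, device="cuda", requires_grad=True)]

torch.optim.Adam(params, foreach=True)   # explicit: foreach implementation
torch.optim.Adam(params, fused=True)     # explicit: fused implementation
torch.optim.Adam(params, foreach=False)  # explicit: single-tensor implementation
# Nothing set: the new cascade picks fused, then foreach, then single-tensor,
# depending on what the param/state tensors support.
torch.optim.Adam(params)
```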
I also didn't trust myself that much with the helper function, so I ran some local asserts on `_default_to_fused_or_foreach`. The only point left to really test is the `type(p) == torch.Tensor` check, but I think the distributed tests will catch that in CI.
```
import torch

# Private helper; its exact import location may differ across PyTorch versions.
from torch.optim.optimizer import _default_to_fused_or_foreach

cuda_only_fp_list = [
    torch.rand((1, 2), device="cuda", dtype=torch.float32),
    torch.rand((1, 2), device="cuda", dtype=torch.float64),
    torch.rand((1, 2), device="cuda", dtype=torch.float16),
    torch.rand((1, 2), device="cuda", dtype=torch.bfloat16),
]
cuda_only_int_list = [
    torch.randint(1024, (1, 2), device="cuda", dtype=torch.int64),
]
cpu_list = [
    torch.rand((1, 2), device="cpu", dtype=torch.float32),
    torch.rand((1, 2), device="cpu", dtype=torch.float64),
    torch.rand((1, 2), device="cpu", dtype=torch.float16),
]
none_list = [None]
# differentiable should always make it return false for both
assert _default_to_fused_or_foreach([cuda_only_fp_list], True, True) == (False, False)
assert _default_to_fused_or_foreach([cuda_only_fp_list], True, False) == (False, False)
# cpu lists should always make it return false for both
assert _default_to_fused_or_foreach([cuda_only_fp_list, cpu_list], False, True) == (False, False)
assert _default_to_fused_or_foreach([cpu_list], False, True) == (False, False)
assert _default_to_fused_or_foreach([cuda_only_fp_list, cpu_list], False, False) == (False, False)
assert _default_to_fused_or_foreach([cpu_list], False, False) == (False, False)
# has fused triggers correctly
assert _default_to_fused_or_foreach([cuda_only_fp_list], False, True) == (True, False)
assert _default_to_fused_or_foreach([cuda_only_fp_list], False, False) == (False, True)
# ints always goes to foreach
assert _default_to_fused_or_foreach([cuda_only_fp_list, cuda_only_int_list], False, True) == (False, True)
assert _default_to_fused_or_foreach([cuda_only_fp_list, cuda_only_int_list], False, False) == (False, True)
# Nones don't error
assert _default_to_fused_or_foreach([cuda_only_fp_list, none_list], False, True) == (True, False)
assert _default_to_fused_or_foreach([cuda_only_fp_list, cuda_only_int_list, none_list], False, True) == (False, True)
assert _default_to_fused_or_foreach([none_list], False, True) == (True, False)
assert _default_to_fused_or_foreach([none_list], False, False) == (False, True)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93184
Approved by: https://github.com/albanD
Attempts to fix #92656
BC-breaking! This changes the default of `zero_grad` in optim and in nn to set grads to None by default instead of zero tensors. We are changing the default because there are proven perf wins and existing code has typically not regressed due to this change. (This note will probably need to be fleshed out more.)
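Code that relied on zeroed gradient tensors can opt back into the old behavior via `set_to_none=False`; a minimal illustration:
```python
import torch

model = torch.nn.Linear(2, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

model(torch.randn(1, 2)).sum().backward()
opt.zero_grad()                    # new default (set_to_none=True)
print(model.weight.grad)           # None

model(torch.randn(1, 2)).sum().backward()
opt.zero_grad(set_to_none=False)   # old behavior
print(model.weight.grad)           # tensor of zeros
```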
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92731
Approved by: https://github.com/ngimel
Previously, foreach Adadelta sent tensors down the slow path if they were not all of the same dtype and on the same device.
This PR adds grouping for adadelta optimizer so that it would run foreach in batches, allowing more users to benefit from foreach perf.
Of course, we should ensure that the new implementation works, so there are new tests to ensure this behavior is not broken.
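A minimal sketch of the grouping idea, using a hand-rolled helper as an assumption (PyTorch's internal grouping utility may differ):
```python
from collections import defaultdict

import torch

def group_by_device_and_dtype(params, grads):
    # Batch tensors into homogeneous (device, dtype) groups so that each group
    # can take the fast torch._foreach_* path instead of a per-tensor loop.
    groups = defaultdict(lambda: ([], []))
    for p, g in zip(params, grads):
        key = (p.device, p.dtype)
        groups[key][0].append(p)
        groups[key][1].append(g)
    return groups
```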
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92048
Approved by: https://github.com/albanD
This PR adds the `_profile_using_dynolog` function to `torch/__init__.py`. The `_profile_using_dynolog` function allows registering the optimizer step post hook, which is required to collect iteration-based traces using dynolog.
Other related changes for tests to pass:
1. Updated `optimizer.pyi`
1. Updated `overrides.py`
1. The test `test_kineto_profiler_multiple_steppers` in `test_profiler.py` has been broken down into two cases:
- `test_kineto_profiler_multiple_steppers_with_override_True` : this test uses the override argument
- `test_kineto_profiler_multiple_steppers_with_override_False` : this test uses the environment variable
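For context, a sketch of the public hook mechanism this builds on (using `Optimizer.register_step_post_hook`; this is not the new `_profile_using_dynolog` code itself):
```python
import torch

model = torch.nn.Linear(4, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def on_step(optimizer, args, kwargs):
    # e.g. notify an external profiler that one training iteration finished
    on_step.count += 1
on_step.count = 0

handle = opt.register_step_post_hook(on_step)
model(torch.randn(2, 4)).sum().backward()
opt.step()
handle.remove()
```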
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90101
Approved by: https://github.com/albanD
@mlazos: skips `item()` calls when compiling with dynamo, by defining a helper function `_get_value` which returns either the result of `.item()` or, when compiling with dynamo, the scalar cpu tensor itself. This was done because removing `item()` calls entirely significantly regresses eager perf. Additionally, `_dispatch_sqrt` calls the appropriate sqrt function (math.sqrt or torch.sqrt).
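A rough sketch of what such helpers look like (this uses the public `torch.compiler.is_compiling()`; the actual internal check may differ):
```python
import math

import torch

def _get_value(x):
    # Under torch.compile, keep the 0-dim tensor to avoid a graph break;
    # in eager mode, .item() is significantly faster downstream.
    if torch.compiler.is_compiling():
        return x
    return x.item()

def _dispatch_sqrt(x):
    # Use torch.sqrt for tensors and math.sqrt for Python scalars.
    if isinstance(x, torch.Tensor):
        return torch.sqrt(x)
    return math.sqrt(x)
```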
Fixes https://github.com/pytorch/torchdynamo/issues/1083
This PR will no longer be needed once symint support is default.
This PR closes all remaining graph breaks in the optimizers (!!)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88173
Approved by: https://github.com/albanD
Doing some tests with all Optimizer and LRScheduler classes in the optim package, I noticed a couple of mistakes in type annotations, so I created a pull request to fix them.
- In the Optimizer class, the parameter is incorrectly named `default` instead of `defaults` in the pyi file
- In the SGD class, types for `maximize` and `differentiable` are missing from both the py and pyi files
I don't know if there is a plan to move all types from pyi to py files, so I wasn't too sure where to fix what.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90216
Approved by: https://github.com/janeyx99
Fixes#84053
As described in the issue, AveragedModel deep-copies the model during initialization, which means the buffers in the averaged model are not updated together with the model.
One solution is to copy the source model's buffers into the averaged model every time `update_parameters` is called.
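A minimal sketch of that idea (not the exact code added here), assuming both modules have identical buffer layouts:
```python
import torch

@torch.no_grad()
def sync_buffers(averaged_module: torch.nn.Module, src_module: torch.nn.Module):
    # Copy running stats (e.g. BatchNorm buffers) from the live model into the
    # averaged copy on every update_parameters() call.
    for avg_buf, buf in zip(averaged_module.buffers(), src_module.buffers()):
        avg_buf.copy_(buf)
```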
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84054
Approved by: https://github.com/samdow
Hi, we noticed in our team that by using CyclicLR, there is a problem with memory clearance on the GPU (it is probably the case without the GPU as well, but that was our use case). After initializing CyclicLR, GPU memory is not cleared even after the model, optimizer and scheduler are out of scope (e.g. their reference counts are zero). This is because the `__init__` method inside `CyclicLR` creates references to its own methods, and they will not get removed until `gc.collect()` is called manually. This is a problem if people want to test multiple models in one run of a script: after testing the first model, the second one will fail with a `CUDA out of memory` error because the first one was not cleared from memory.
I propose a simple fix using `weakref`, similarly to the `_LRScheduler` base class, but if you have any comments I am happy to change it.
Here is the code to reproduce the bug:
```
import torch
import weakref
from transformers import DetrForObjectDetection
class X:
    def __init__(self, optimizer):
        self.optimizer = optimizer
        # Will cause cyclic reference.
        self.func = self.dummy
        # Will work as expected, memory cleared after instance count is zero.
        # self.func = weakref.WeakMethod(self.dummy)

    def dummy(self, x):
        return 1.

def test():
    model = DetrForObjectDetection.from_pretrained('facebook/detr-resnet-50')
    model.to('cuda')
    optimizer = torch.optim.Adam(model.parameters())
    x = X(optimizer)

test()
print(f'{torch.cuda.memory_reserved()}, {torch.cuda.memory_allocated()}') # Should print (<some memory>, 0), but with cyclic reference, it will print (<some memory>, <some memory>).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85462
Approved by: https://github.com/albanD
Hello there 👋
As discussed in #84485, this PR enables more flexibility on the optimizers that are wrapped by LR schedulers in PyTorch. Currently, it is incompatible with optimizers that have a number of betas different from 2. This PR fixes that with minimal modifications.
Fixes #84485
Any feedback is welcome!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84486
Approved by: https://github.com/Lezcano, https://github.com/soulitzer
This is a new version of #15648 based on the latest master branch.
Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR.
In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I marked as skip are causing segfaults. Running xdoctest results in 293 failed, 201 passed tests. The next commits will be to disable those tests. (unfortunately I don't have a tool that will insert the `#xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.)
Fixes https://github.com/pytorch/pytorch/issues/71105
@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797
Approved by: https://github.com/ezyang
### Description
Across PyTorch's docstrings, both `callable` and `Callable` are used for variable types. `Callable` should be capitalized when we are referring to the `Callable` type, and not the Python `callable()` function.
### Testing
There shouldn't be any testing required.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82487
Approved by: https://github.com/albanD
### Description
PR #80336 introduced a new parameter to the Sparse Adam optimizer. The new parameter is accessed inside the `step` method of the optimizer. If we deserialize an optimizer that was serialized before this change and run it, it fails in `step` when it tries to access the missing parameter.
I have added a workaround to set a default value in case the parameter is unavailable in the optimizer.
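A hypothetical illustration of such a workaround (the hyperparameter name `maximize` is an assumption, standing in for the newly introduced parameter):
```python
import torch

params = [torch.zeros(2, requires_grad=True)]
optimizer = torch.optim.SparseAdam(params)  # stands in for an older, deserialized optimizer

# Backfill a default so step() can safely read the new setting.
for group in optimizer.param_groups:
    group.setdefault("maximize", False)
```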
### Testing
* Testing on PyTorch CI
* Manual validation against existing serialized models to make sure they continue to work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82273
Approved by: https://github.com/mehtanirav, https://github.com/albanD
Adds the `differentiable` argument, a method for updating parameters in an existing optimizer, and a template for testing the differentiability of multiple optimizers.
This is all based on discussions with @albanD & @jbschlosser
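A small illustration of the new argument (semantics as described in the optim docs: with `differentiable=True`, `step()` runs with autograd enabled so the update can stay on the graph):
```python
import torch

w = torch.randn(3, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1, differentiable=True)
# With differentiable=False (the default), step() runs under no_grad and the
# update cannot be differentiated through.
```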
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80938
Approved by: https://github.com/albanD
Generator comprehensions with any/all are less verbose and potentially save memory/CPU: https://eklitzke.org/generator-comprehensions-and-using-any-and-all-in-python
To make JIT work with this change, I added code to convert GeneratorExp to ListComp, so the whole PR is basically a no-op for JIT but a potential memory and speed improvement for eager mode.
I also removed a test from test/jit/test_parametrization.py. The test was bad (it had a TODO to actually implement it) and only checked that UnsupportedNodeError is thrown; with GeneratorExp support, a different error would be thrown.
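For reference, the flavor of change this makes:
```python
xs = range(10_000)

# Before: the list comprehension materializes every element first.
all([x >= 0 for x in xs])

# After: the generator expression lets all()/any() short-circuit and skips the
# intermediate list.
all(x >= 0 for x in xs)
```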
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78142
Approved by: https://github.com/malfet, https://github.com/albanD
This is causing issues if the user has the step on cuda for a good reason.
These asserts cause code that used to run just fine to fail.
Note that this is a pretty bad thing to do for performance though so it is ok to try and push users away from doing it.
For the 1.12.1 milestone: this is not asking for a dot release to fix this (as this is bad practice anyways). But it would be a great thing to add if we do one: it is very low risk and will prevent breakage for users.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80222
Approved by: https://github.com/jbschlosser, https://github.com/ngimel
What was happening is that when we have multiple learning rate schedulers, the order in which they are being initialized is not being taken into account. This is a problem if they were being initialized in sequential order (as one might intuitively do).
Each scheduler calls `step()` on initialization and sets the `lr` in its optimizer's `params_groups`. However, this means that step 0 will be using the `lr` that was set by the very last scheduler (in the case of initializing schedulers sequentially) instead of the first scheduler.
The fix in this PR addresses the above bug by performing a call to the appropriate scheduler on initialization, after decrementing the `last_epoch` values in order to keep them the same post-step. This ensures that the correct scheduler is the one setting the `lr` values for the optimizer's `param_groups`.
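A hypothetical setup illustrating the ordering problem (using `SequentialLR` here is an assumption; the text above only says "multiple learning rate schedulers"):
```python
import torch

opt = torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=1.0)
warmup = torch.optim.lr_scheduler.ConstantLR(opt, factor=0.1, total_iters=2)
decay = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.5)
sched = torch.optim.lr_scheduler.SequentialLR(opt, [warmup, decay], milestones=[2])

# Per the description above, before the fix the lr used at step 0 could be the
# one written by the last-constructed scheduler (`decay`) rather than `warmup`.
print(opt.param_groups[0]["lr"])
```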
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72856
Approved by: https://github.com/jbschlosser
### Goal
Fixes https://github.com/pytorch/pytorch/issues/79720
### Approach
replace `Chains list of learning rate schedulers. It takes a list of chainable learning rate schedulers and performs consecutive step() functions` **`belong`** `to them by just one call.` with `Chains list of learning rate schedulers. It takes a list of chainable learning rate schedulers and performs consecutive step() functions` **`belonging`** `to them by just one call.`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79775
Approved by: https://github.com/albanD
Near term fix for https://github.com/pytorch/pytorch/issues/76368.
Q. Why does the user need to request `capturable=True` in the optimizer constructor? Why can't capture safety be completely automatic?
A. We need to set up capture-safe (device-side) state variables before capture. If we don't, and step() internally detects capture is underway, it's too late: the best we could do is create a device state variable and copy the current CPU value into it, which is not something we want baked into the graph.
Q. Ok, why not just do the capture-safe approach with device-side state variables all the time?
A. It incurs several more kernel launches per parameter, which could really add up and regress cpu overhead for ungraphed step()s. If the optimizer won't be captured, we should allow step() to stick with its current cpu-side state handling.
Q. But cuda RNG is a stateful thing that maintains its state on the cpu outside of capture and replay, and we capture it automatically. Why can't we do the same thing here?
A. The graph object can handle RNG generator increments because its capture_begin, capture_end, and replay() methods can see and access the generator object. But the graph object has no explicit knowledge of or access to optimizer steps in its capture scope. We could let the user tell the graph object which optimizers will be stepped in its scope, i.e. something like
```python
graph.will_use_optimizer(opt)
graph.capture_begin()
...
```
but that seems clunkier than an optimizer constructor arg.
I'm open to other ideas, but right now I think constructor arg is necessary and the least bad approach.
Long term, https://github.com/pytorch/pytorch/issues/71274 is a better fix.
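For reference, a sketch of how the flag is meant to be used, following the whole-network capture pattern from the CUDA Graphs docs (sizes and hyperparameters are placeholders):
```python
import torch

model = torch.nn.Linear(10, 10, device="cuda")
loss_fn = torch.nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3, capturable=True)

static_x = torch.randn(8, 10, device="cuda")
static_y = torch.randn(8, 10, device="cuda")

# Warm up on a side stream so grads and optimizer state exist before capture.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        opt.zero_grad(set_to_none=True)
        loss_fn(model(static_x), static_y).backward()
        opt.step()
torch.cuda.current_stream().wait_stream(s)

# Capture one full iteration; g.replay() reruns it on the static tensors.
opt.zero_grad(set_to_none=True)
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_loss = loss_fn(model(static_x), static_y)
    static_loss.backward()
    opt.step()

for _ in range(10):
    # Refill static_x / static_y in place with new data, then:
    g.replay()
```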
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77862
Approved by: https://github.com/ezyang
Fixes #60265
The initial LR for this scheduler is not consistent when a new instance is created with `last_epoch != -1`.
Maybe we can refactor the testing code to test `last_epoch != -1` in schedulers that can recreate their state from the current epoch?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60339
Approved by: https://github.com/albanD
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73215
Fixing an issue in optimizers from `_multi_tensor`, for `sgd_mt` introduced in 2cb03e926f
Reviewed By: mikaylagawarecki
Differential Revision: D34389034
fbshipit-source-id: ede153d52dca15909c6c022853589707f18dc8d1
(cherry picked from commit cc8a58e584)