Summary:
Fixes https://github.com/pytorch/pytorch/issues/46480 -- for SGD.
## Notes:
- I have modified the existing tests to take a new `constructor_accepts_maximize` flag. When this is set to true, the ` _test_basic_cases_template` function will test both maximizing and minimizing the sample function.
- This was the clearest way I could think of testing the changes -- I would appreciate feedback on this strategy.
## Work to be done:
[] I need to update the docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67847
Reviewed By: H-Huang
Differential Revision: D32252631
Pulled By: albanD
fbshipit-source-id: 27915a3cc2d18b7e4d17bfc2d666fe7d2cfdf9a4
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47441
To give user more information about python level functions in profiler traces, we propose to instrument on the following functions:
```
_BaseDataLoaderIter.__next__
Optimizer.step
Optimizer.zero_grad
```
Because the record_function already uses if (!active) to check whether the profiler is enabled, so we don't explicitly call torch.autograd._profiler_enabled() before each instrument.
Acknowledgement: nbcsm, guotuofeng, gunandrose4u , guyang3532 , mszhanyi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47655
Reviewed By: smessmer
Differential Revision: D24960386
Pulled By: ilia-cher
fbshipit-source-id: 2eb655789e2e2f506e1b8f95ad3d470c83281102
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41283
in optimizer.zero_grad(), detach_ is useful to avoid memory leak only when grad has grad_fn, so add check to call grad.detach_ only when the grad has grad_fn in zero_grad() function
ghstack-source-id: 108702289
Test Plan: unit test
Reviewed By: mrshenli
Differential Revision: D22487315
fbshipit-source-id: 861909b15c8497f1da57f092d8963d4920c85e38
Summary:
This PR fixes an issue in https://github.com/pytorch/pytorch/issues/40967 where duplicate parameters across different parameter groups are not allowed, but duplicates inside the same parameter group are accepted. After this PR, both cases are treated equally and raise `ValueError`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41597
Reviewed By: zou3519
Differential Revision: D22608019
Pulled By: vincentqb
fbshipit-source-id: 6df41dac62b80db042cfefa6e53fb021b49f4399
Summary:
This is a faster and more idiomatic way of using `itertools.chain`. Instead of computing all the items in the iterable and storing them in memory, they are computed one-by-one and never stored as a huge list. This can save on both runtime and memory space.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40156
Reviewed By: ezyang
Differential Revision: D22189038
Pulled By: vincentqb
fbshipit-source-id: 160b2c27f442686821a6ea541e1f48f4a846c186
Summary:
Fixes https://github.com/pytorch/pytorch/issues/36831.
Instead of using `id()`, an arbitrary yet consistent order-based index is used instead. This results in a deterministic output between runs.
I am not the biggest fan of using `nonlocal` (it appears to be used sparingly in the codebase) to get `start_index` between calls to `pack_group()`, but the alternatives had larger issues:
- Using the last value added to `param_mappings` would be ideal, but that only works if `dict` iteration order is consistent, and PyTorch currently supports Python <3.7.
- Using the maximum value added to `param_mappings` wouldn't have that issue but would not be constant time.
For testing, I confirmed that `test_optim.py` works before and after these changes. Randomizing the indices in `param_mappings` causes the tests to fail, which is further evidence these changes work. I'm not 100% if these tests are sufficient, but they're a start.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37347
Differential Revision: D21353820
Pulled By: vincentqb
fbshipit-source-id: e549f1f154833a461b1f4df6d07ad509aab34ea1
Summary:
Resubmit #20698 which got messed up.
Idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a simple very lightweight mechanism to do so - only first invocation of a trigger point would be logged. This is significantly more lightweight than #18235 and thus we can allow to put logging in e.g. TensorImpl.
Also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking with libtorch.so will not cause any logging. Further suggestions of what to log are welcomed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20745
Differential Revision: D15429196
Pulled By: dzhulgakov
fbshipit-source-id: a5e41a709a65b7ebccc6b95f93854e583cf20aca
Summary:
Add the defaults field to the copied object.
Prior to this patch, optimizer.__getattr__ has excluded the defaults
attribute of optimizer source object, required by some LR schedulers. (e.g. CyclicLR with momentum)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19308
Differential Revision: D15012801
Pulled By: soumith
fbshipit-source-id: 95801b269f6f9d78d531d4fed95c973b280cc96f
Summary:
Small change -- the benefit is that the docs will show
``<required parameter>`` instead of ``<object object>``
for these required parameters.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13202
Reviewed By: SsnL
Differential Revision: D12826252
Pulled By: jma127
fbshipit-source-id: 5f2c8495e5c56920377e4e012b8711e8f2a6e30e
* Codemod to update our codebase to 0.4 standard
* Update some of the test scri[ts
* remove Variable in test_clip_grad_value
* fix _symbolic_override_wrapper_maker
* Block set from param_group['params']
This might cause `list(params)` to output in random order. In this case, in `load_state_dict()`, `id_map` would not be matched correctly.
* Update Error Message
* Add Warning on Optimizer Docs
* Update optimizer.py
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.
To keep the PR to a reasonable size, I've left most of the unused tensor
code. Subsequent PRs will remove the dead code, clean-up calls to
torch.autograd.Variable, and rename Variable to Tensor everywhere.
There are some breaking changes because Variable and Tensors had
slightly different semantics. There's a list of those changes here:
https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
This removes volatile from Variable. The functionality is mostly
replaced by a global (thread-local) flag, which is controlled by
torch.set_grad_enabled() and the context manager torch.no_grad().
In C++, the flag is exposed through GradMode::is_enabled() and GradMode::set_enabled()
Fixes#3627
When I use the named_parametes to modify the lr and weight decay, I will face a bug. Because the value of the named_parameters return is torch.nn.paramter.Parameter, not a generator of the Parameter.
Here's the command I used to invoke autopep8 (in parallel!):
git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i
Several rules are ignored in setup.cfg. The goal is to let autopep8
handle everything which it can handle safely, and to disable any rules
which are tricky or controversial to address. We may want to come back
and re-enable some of these rules later, but I'm trying to make this
patch as safe as possible.
Also configures flake8 to match pep8's behavior.
Also configures TravisCI to check the whole project for lint.