torch.optim
===================================

.. automodule:: torch.optim

How to use an optimizer
-----------------------

To use :mod:`torch.optim` you have to construct an optimizer object that will hold
the current state and will update the parameters based on the computed gradients.

Constructing it
^^^^^^^^^^^^^^^

To construct an :class:`Optimizer` you have to give it an iterable containing the
parameters (all should be :class:`~torch.autograd.Variable` s) to optimize. Then,
you can specify optimizer-specific options such as the learning rate, weight decay, etc.

Example::

    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    optimizer = optim.Adam([var1, var2], lr=0.0001)
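
The snippets above assume that :mod:`torch.optim` has been imported as ``optim`` and
that ``model`` is an :class:`~torch.nn.Module`. A minimal self-contained sketch (the
tiny two-layer model and the hyperparameter values are purely illustrative, not part
of the original example)::

    import torch.nn as nn
    import torch.optim as optim

    # Any nn.Module works; this small network is only for illustration.
    model = nn.Sequential(
        nn.Linear(10, 20),
        nn.ReLU(),
        nn.Linear(20, 2),
    )

    # model.parameters() yields the iterable of parameters the optimizer will update.
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)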

Per-parameter options
^^^^^^^^^^^^^^^^^^^^^

:class:`Optimizer` s also support specifying per-parameter options. To do this, instead
of passing an iterable of :class:`~torch.autograd.Variable` s, pass in an iterable of
:class:`dict` s. Each of them will define a separate parameter group, and should contain
a ``params`` key, containing a list of parameters belonging to it. Other keys
should match the keyword arguments accepted by the optimizers, and will be used
as optimization options for this group.

.. note::

    You can still pass options as keyword arguments. They will be used as
    defaults, in the groups that didn't override them. This is useful when you
    only want to vary a single option, while keeping all others consistent
    between parameter groups.

For example, this is very useful when one wants to specify per-layer learning rates::

    optim.SGD([
        {'params': model.base.parameters()},
        {'params': model.classifier.parameters(), 'lr': 1e-3}
    ], lr=1e-2, momentum=0.9)

This means that ``model.base``'s parameters will use the default learning rate of ``1e-2``,
``model.classifier``'s parameters will use a learning rate of ``1e-3``, and a momentum of
``0.9`` will be used for all parameters.
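
The resulting groups can be inspected through the optimizer's ``param_groups``
attribute, a list with one dict of resolved options per group. A short sketch,
reusing the hypothetical ``model`` with ``base`` and ``classifier`` submodules
from the example above::

    optimizer = optim.SGD([
        {'params': model.base.parameters()},
        {'params': model.classifier.parameters(), 'lr': 1e-3}
    ], lr=1e-2, momentum=0.9)

    # Group 0 falls back to the default lr=1e-2, group 1 overrides it with lr=1e-3;
    # both groups share momentum=0.9.
    for i, group in enumerate(optimizer.param_groups):
        print(i, group['lr'], group['momentum'])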

Taking an optimization step
^^^^^^^^^^^^^^^^^^^^^^^^^^^

All optimizers implement a :func:`~Optimizer.step` method that updates the
parameters. It can be used in two ways:

``optimizer.step()``
~~~~~~~~~~~~~~~~~~~~

This is a simplified version supported by most optimizers. The function can be
called once the gradients are computed using e.g.
:func:`~torch.autograd.Variable.backward`.

Example::

    for input, target in dataset:
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()

``optimizer.step(closure)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some optimization algorithms such as Conjugate Gradient and LBFGS need to
reevaluate the function multiple times, so you have to pass in a closure that
allows them to recompute your model. The closure should clear the gradients,
compute the loss, and return it.

Example::

    for input, target in dataset:
        def closure():
            optimizer.zero_grad()
            output = model(input)
            loss = loss_fn(output, target)
            loss.backward()
            return loss
        optimizer.step(closure)
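
As a concrete, self-contained illustration (not taken from the example above; the toy
data, model, and ``lr`` value are assumptions made for this sketch), :class:`LBFGS`
can be driven with such a closure, calling it several times inside a single ``step``::

    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torch.autograd import Variable

    # Toy regression data and a single linear layer, purely for illustration.
    inputs = Variable(torch.randn(64, 3))
    targets = Variable(torch.randn(64, 1))
    model = nn.Linear(3, 1)
    loss_fn = nn.MSELoss()

    optimizer = optim.LBFGS(model.parameters(), lr=0.1)

    for _ in range(10):
        def closure():
            # Clear gradients, recompute the loss, backpropagate, and return the loss
            # so LBFGS can re-evaluate the objective as many times as it needs.
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            return loss
        optimizer.step(closure)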

Algorithms
----------

.. autoclass:: Optimizer
    :members:
.. autoclass:: Adadelta
    :members:
.. autoclass:: Adagrad
    :members:
.. autoclass:: Adam
    :members:
.. autoclass:: Adamax
    :members:
.. autoclass:: ASGD
    :members:
.. autoclass:: LBFGS
    :members:
.. autoclass:: RMSprop
    :members:
.. autoclass:: Rprop
    :members:
.. autoclass:: SGD
    :members: