mirror of https://github.com/zebrajr/tensorflow.git (synced 2025-12-06 12:20:11 +01:00)
Make the RMSPropOptimizer docstring more explicit about sparse vs. dense
PiperOrigin-RevId: 165335237
This commit is contained in:
parent
75a9c4b5c8
commit
03a33c08dd
@@ -63,9 +63,17 @@ class RMSPropOptimizer(optimizer.Optimizer):
                name="RMSProp"):
     """Construct a new RMSProp optimizer.
 
-    Note that in dense implement of this algorithm, m_t and v_t will
-    update even if g is zero, but in sparse implement, m_t and v_t
-    will not update in iterations g is zero.
+    Note that in the dense implementation of this algorithm, variables and their
+    corresponding accumulators (momentum, gradient moving average, square
+    gradient moving average) will be updated even if the gradient is zero
+    (i.e. accumulators will decay, momentum will be applied). The sparse
+    implementation (used when the gradient is an `IndexedSlices` object,
+    typically because of `tf.gather` or an embedding lookup in the forward pass)
+    will not update variable slices or their accumulators unless those slices
+    were used in the forward pass (nor is there an "eventual" correction to
+    account for these omitted updates). This leads to more efficient updates for
+    large embedding lookup tables (where most of the slices are not accessed in
+    a particular graph execution), but differs from the published algorithm.
 
     Args:
       learning_rate: A Tensor or a floating point value. The learning rate.
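The dense-vs-sparse distinction the new docstring describes can be sketched in plain NumPy. The helper names below are hypothetical and this is a simplification of TensorFlow's actual RMSProp kernels (which are fused ops and also support a centered variant), but it shows the behavior the docstring warns about: a dense step decays accumulators and applies momentum even on rows whose gradient is zero, while a sparse step leaves untouched slices exactly as they were.

```python
import numpy as np

def rmsprop_dense(var, ms, mom, grad, lr=0.1, decay=0.9, momentum=0.9, eps=1e-10):
    """Dense step: every row's accumulators move, even where grad == 0
    (the mean square decays and momentum keeps being applied)."""
    ms[:] = decay * ms + (1.0 - decay) * grad ** 2
    mom[:] = momentum * mom + lr * grad / np.sqrt(ms + eps)
    var[:] -= mom

def rmsprop_sparse(var, ms, mom, grad_values, indices,
                   lr=0.1, decay=0.9, momentum=0.9, eps=1e-10):
    """Sparse step: only the slices listed in `indices` (those that appeared
    in the forward pass, e.g. via a gather or embedding lookup) are touched;
    every other row of the variable and its accumulators is left as-is."""
    ms[indices] = decay * ms[indices] + (1.0 - decay) * grad_values ** 2
    mom[indices] = momentum * mom[indices] + lr * grad_values / np.sqrt(ms[indices] + eps)
    var[indices] -= mom[indices]

# Two identical 3-row "embedding tables"; only row 0 has a nonzero gradient.
var_d = np.ones((3, 2)); ms_d = np.full((3, 2), 0.5); mom_d = np.full((3, 2), 0.2)
var_s = var_d.copy();    ms_s = ms_d.copy();          mom_s = mom_d.copy()

dense_grad = np.zeros((3, 2))
dense_grad[0] = 1.0
rmsprop_dense(var_d, ms_d, mom_d, dense_grad)
rmsprop_sparse(var_s, ms_s, mom_s, dense_grad[[0]], indices=np.array([0]))

# Row 0 (touched by both) matches; rows 1-2 diverge: the dense step decayed
# their accumulators and applied momentum, the sparse step left them alone.
```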