mirror of https://github.com/zebrajr/tensorflow.git (synced 2025-12-06 12:20:11 +01:00)
Make the RMSPropOptimizer docstring more explicit about sparse vs. dense
PiperOrigin-RevId: 165335237
This commit is contained in:
parent
75a9c4b5c8
commit
03a33c08dd
@@ -63,9 +63,17 @@ class RMSPropOptimizer(optimizer.Optimizer):
                name="RMSProp"):
     """Construct a new RMSProp optimizer.
 
-    Note that in dense implement of this algorithm, m_t and v_t will
-    update even if g is zero, but in sparse implement, m_t and v_t
-    will not update in iterations g is zero.
+    Note that in the dense implementation of this algorithm, variables and their
+    corresponding accumulators (momentum, gradient moving average, square
+    gradient moving average) will be updated even if the gradient is zero
+    (i.e. accumulators will decay, momentum will be applied). The sparse
+    implementation (used when the gradient is an `IndexedSlices` object,
+    typically because of `tf.gather` or an embedding lookup in the forward pass)
+    will not update variable slices or their accumulators unless those slices
+    were used in the forward pass (nor is there an "eventual" correction to
+    account for these omitted updates). This leads to more efficient updates for
+    large embedding lookup tables (where most of the slices are not accessed in
+    a particular graph execution), but differs from the published algorithm.
 
     Args:
       learning_rate: A Tensor or a floating point value. The learning rate.
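The dense-vs-sparse distinction the new docstring describes can be sketched in plain NumPy. The helper names below are hypothetical and this is a simplification of TensorFlow's actual RMSProp kernels (which are fused ops and also support a centered variant), but it shows the behavior the docstring warns about: a dense step decays accumulators and applies momentum even on rows whose gradient is zero, while a sparse step leaves untouched slices exactly as they were.

```python
import numpy as np

def rmsprop_dense(var, ms, mom, grad, lr=0.1, decay=0.9, momentum=0.9, eps=1e-10):
    """Dense step: every row's accumulators move, even where grad == 0
    (the mean square decays and momentum keeps being applied)."""
    ms[:] = decay * ms + (1.0 - decay) * grad ** 2
    mom[:] = momentum * mom + lr * grad / np.sqrt(ms + eps)
    var[:] -= mom

def rmsprop_sparse(var, ms, mom, grad_values, indices,
                   lr=0.1, decay=0.9, momentum=0.9, eps=1e-10):
    """Sparse step: only the slices listed in `indices` (those that appeared
    in the forward pass, e.g. via a gather or embedding lookup) are touched;
    every other row of the variable and its accumulators is left as-is."""
    ms[indices] = decay * ms[indices] + (1.0 - decay) * grad_values ** 2
    mom[indices] = momentum * mom[indices] + lr * grad_values / np.sqrt(ms[indices] + eps)
    var[indices] -= mom[indices]

# Two identical 3-row "embedding tables"; only row 0 has a nonzero gradient.
var_d = np.ones((3, 2)); ms_d = np.full((3, 2), 0.5); mom_d = np.full((3, 2), 0.2)
var_s = var_d.copy();    ms_s = ms_d.copy();          mom_s = mom_d.copy()

dense_grad = np.zeros((3, 2))
dense_grad[0] = 1.0
rmsprop_dense(var_d, ms_d, mom_d, dense_grad)
rmsprop_sparse(var_s, ms_s, mom_s, dense_grad[[0]], indices=np.array([0]))

# Row 0 (touched by both) matches; rows 1-2 diverge: the dense step decayed
# their accumulators and applied momentum, the sparse step left them alone.
```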