zeshengzong 2025-04-17 16:47:35 +00:00 committed by PyTorch MergeBot
parent c3a18f6126
commit fe90a5c140


@@ -197,12 +197,12 @@ def clip_grad_norm_(
         parameters (Iterable[Tensor] or Tensor): an iterable of Tensors or a
             single Tensor that will have gradients normalized
         max_norm (float): max norm of the gradients
-        norm_type (float): type of the used p-norm. Can be ``'inf'`` for
-            infinity norm.
-        error_if_nonfinite (bool): if True, an error is thrown if the total
+        norm_type (float, optional): type of the used p-norm. Can be ``'inf'`` for
+            infinity norm. Default: 2.0
+        error_if_nonfinite (bool, optional): if True, an error is thrown if the total
             norm of the gradients from :attr:`parameters` is ``nan``,
-            ``inf``, or ``-inf``. Default: False (will switch to True in the future)
-        foreach (bool): use the faster foreach-based implementation.
+            ``inf``, or ``-inf``. Default: False
+        foreach (bool, optional): use the faster foreach-based implementation.
             If ``None``, use the foreach implementation for CUDA and CPU native tensors and silently
             fall back to the slow implementation for other device types.
             Default: ``None``
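
For context, a minimal sketch of how the documented defaults look at the call site of clip_grad_norm_; the toy linear model and the max_norm value are illustrative and not part of this patch:

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
model(torch.randn(8, 4)).sum().backward()  # populate .grad on the parameters

# norm_type=2.0 and error_if_nonfinite=False are the defaults this patch documents;
# foreach=None lets PyTorch pick the fast foreach path on CUDA/CPU native tensors.
total_norm = torch.nn.utils.clip_grad_norm_(
    model.parameters(),
    max_norm=1.0,
    norm_type=2.0,
    error_if_nonfinite=False,
    foreach=None,
)
print(total_norm)  # total norm of the gradients before clipping, as a tensor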
@@ -258,7 +258,7 @@ def clip_grad_value_(
         clip_value (float): maximum allowed value of the gradients.
             The gradients are clipped in the range
             :math:`\left[\text{-clip\_value}, \text{clip\_value}\right]`
-        foreach (bool): use the faster foreach-based implementation
+        foreach (bool, optional): use the faster foreach-based implementation
             If ``None``, use the foreach implementation for CUDA and CPU native tensors and
             silently fall back to the slow implementation for other device types.
             Default: ``None``
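
And a matching sketch for clip_grad_value_, again with an illustrative model and clip_value rather than anything prescribed by the patch:

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
model(torch.randn(8, 4)).sum().backward()  # populate .grad on the parameters

# Clamps every gradient element into [-0.5, 0.5] in place;
# foreach=None (the documented default) selects the foreach path where supported.
torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=0.5, foreach=None)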