Targeted documentation updates in autograd.functional (#72111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72111
For the vectorize flag:
- Advertises the use of functorch
For autograd.functional.jvp:
- Advertises the use of functorch and the low-level jvp API, both of
  which will be more performant than the double backprop trick.
Test Plan: view docs
Reviewed By: albanD
Differential Revision: D33918065
Pulled By: zou3519
fbshipit-source-id: 6e19699aa94f0e023ccda0dc40551ad6d932b7c7
(cherry picked from commit b4662ceb99)
commit f99147dec0
parent a60e2ae037
docs/source/autograd.rst
@@ -14,7 +14,7 @@ Automatic differentiation package - torch.autograd
     backward
     grad
 
-.. forward-mode-ad:
+.. _forward-mode-ad:
 
 Forward-mode Automatic Differentiation
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch/autograd/functional.py
@@ -323,6 +323,13 @@ def jvp(func, inputs, v=None, create_graph=False, strict=False):
         jvp (tuple of Tensors or Tensor): result of the dot product with
             the same shape as the output.
 
+    Note:
+        ``autograd.functional.jvp`` computes the jvp by using the backward of
+        the backward (sometimes called the double backwards trick). This is not
+        the most performant way of computing the jvp. Please consider using
+        `functorch's jvp <https://github.com/pytorch/functorch#jvp>`_
+        or the :ref:`low-level forward-mode AD API <forward-mode-ad>` instead.
+
     Example:
 
         >>> def exp_reducer(x):
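For readers who have not seen the double backwards trick the added note mentions, here is a minimal sketch of how a jvp can be recovered from two reverse-mode passes; the function f, the input x, and the tangent v are illustrative stand-ins, not anything taken from the patch.

import torch
from torch.autograd.functional import jvp as functional_jvp

def f(x):
    return x.exp().sum(dim=1)

x = torch.randn(2, 2, requires_grad=True)
v = torch.ones(2, 2)  # tangent, same shape as the input

out = f(x)
# First backward: with a dummy cotangent u that itself requires grad, this
# computes the vjp J^T u and, thanks to create_graph=True, keeps the result
# differentiable with respect to u.
u = torch.zeros_like(out, requires_grad=True)
vjp_u, = torch.autograd.grad(out, x, grad_outputs=u, create_graph=True)
# Second backward: J^T u is linear in u, so differentiating it w.r.t. u with
# grad_outputs=v yields (J^T)^T v = J v, i.e. the jvp.
jvp_result, = torch.autograd.grad(vjp_u, u, grad_outputs=v)

# Should match the public API, up to floating-point noise.
_, jvp_ref = functional_jvp(f, x, v)
print(torch.allclose(jvp_result, jvp_ref))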
@@ -345,10 +352,6 @@ def jvp(func, inputs, v=None, create_graph=False, strict=False):
        (tensor([2.2399, 2.5005]),
         tensor([5., 5.]))
 
-    Note:
-        The jvp is currently computed by using the backward of the backward
-        (sometimes called the double backwards trick) as we don't have support
-        for forward mode AD in PyTorch at the moment.
     """
 
     with torch.enable_grad():
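The old note removed above is superseded by the new one, which points at two faster alternatives. A rough sketch of both follows, reusing the docstring's exp_reducer; the functorch lines assume the separate functorch package is installed, and forward-mode operator coverage was still being filled in at the time.

import torch
import torch.autograd.forward_ad as fwAD

def exp_reducer(x):
    return x.exp().sum(dim=1)

x = torch.rand(4, 4)
v = torch.ones(4, 4)  # tangent, same shape as the input

# Alternative 1: the low-level forward-mode AD API that the :ref: in the new
# note points to. A dual tensor carries the primal and its tangent through the
# computation in a single forward pass.
with fwAD.dual_level():
    dual_x = fwAD.make_dual(x, v)
    dual_out = exp_reducer(dual_x)
    out, jvp_out = fwAD.unpack_dual(dual_out)  # tangent of the output is the jvp

# Alternative 2: functorch's jvp (requires installing functorch separately).
from functorch import jvp as ft_jvp
out2, jvp_out2 = ft_jvp(exp_reducer, (x,), (v,))

Both compute the jvp in one forward-mode pass rather than one forward plus two backward passes, which is where the performance claim in the summary comes from.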
@@ -494,8 +497,11 @@ def jacobian(func, inputs, create_graph=False, strict=False, vectorize=False, st
             independent of it. If ``False``, we return a Tensor of zeros as the
             jacobian for said inputs, which is the expected mathematical value.
             Defaults to ``False``.
-        vectorize (bool, optional): This feature is experimental, please use at
-            your own risk. When computing the jacobian, usually we invoke
+        vectorize (bool, optional): This feature is experimental.
+            Please consider using
+            `functorch's jacrev or jacfwd <https://github.com/pytorch/functorch#what-are-the-transforms>`_
+            instead if you are looking for something less experimental and more performant.
+            When computing the jacobian, usually we invoke
             ``autograd.grad`` once per row of the jacobian. If this flag is
             ``True``, we perform only a single ``autograd.grad`` call with
             ``batched_grad=True`` which uses the vmap prototype feature.
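For context on the trade-off the new wording describes, a hedged sketch contrasting the experimental ``vectorize`` path with the functorch transforms it recommends; f and x are illustrative, and the functorch import assumes the package is installed.

import torch
from torch.autograd.functional import jacobian

def f(x):
    return x.exp().sum(dim=1)

x = torch.randn(3, 4)

# Experimental path: a single batched autograd.grad call (batched_grad=True
# under the hood) instead of one autograd.grad call per row of the jacobian.
jac_vectorized = jacobian(f, x, vectorize=True)

# functorch alternatives (separate install): reverse- and forward-mode transforms.
from functorch import jacrev, jacfwd
jac_rev = jacrev(f)(x)  # usually better when the output is smaller than the input
jac_fwd = jacfwd(f)(x)  # usually better when the input is smaller than the output

print(torch.allclose(jac_vectorized, jac_rev))  # same values either way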
@@ -701,8 +707,11 @@ def hessian(func, inputs, create_graph=False, strict=False, vectorize=False, out
             such that all the outputs are independent of it. If ``False``, we return a Tensor of zeros as the
             hessian for said inputs, which is the expected mathematical value.
             Defaults to ``False``.
-        vectorize (bool, optional): This feature is experimental, please use at
-            your own risk. When computing the hessian, usually we invoke
+        vectorize (bool, optional): This feature is experimental.
+            Please consider using
+            `functorch <https://github.com/pytorch/functorch#what-are-the-transforms>`_
+            instead if you are looking for something less experimental and more performant.
+            When computing the hessian, usually we invoke
             ``autograd.grad`` once per row of the hessian. If this flag is
             ``True``, we use the vmap prototype feature as the backend to
             vectorize calls to ``autograd.grad`` so we only invoke it once
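And the same comparison for the hessian; pow_reducer mirrors the existing docstring example, and ft_hessian is only a local alias to keep the two hessian names apart.

import torch
from torch.autograd.functional import hessian

def pow_reducer(x):
    return x.pow(3).sum()

x = torch.randn(2, 2)

# Experimental path: vmap-backed vectorization of the per-row autograd.grad calls.
hess_vectorized = hessian(pow_reducer, x, vectorize=True)

# functorch alternative (separate install): its hessian composes jacfwd over jacrev.
from functorch import hessian as ft_hessian
hess_functorch = ft_hessian(pow_reducer)(x)

print(torch.allclose(hess_vectorized, hess_functorch))  # same (2, 2, 2, 2) hessian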