Targeted documentation updates in autograd.functional (#72111)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72111

For vectorize flag:
- Advertises the use of functorch

For autograd.functional.jvp:
- Advertises the use of functorch and the low-level jvp API, both of
which will be more performant than the double backprop trick.

Test Plan: - view docs

Reviewed By: albanD

Differential Revision: D33918065

Pulled By: zou3519

fbshipit-source-id: 6e19699aa94f0e023ccda0dc40551ad6d932b7c7
(cherry picked from commit b4662ceb99)
Richard Zou 2022-02-01 19:15:17 -08:00 committed by PyTorch MergeBot
parent a60e2ae037
commit f99147dec0
2 changed files with 18 additions and 9 deletions


@@ -14,7 +14,7 @@ Automatic differentiation package - torch.autograd
 backward
 grad
-.. forward-mode-ad:
+.. _forward-mode-ad:
 Forward-mode Automatic Differentiation
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


@@ -323,6 +323,13 @@ def jvp(func, inputs, v=None, create_graph=False, strict=False):
 jvp (tuple of Tensors or Tensor): result of the dot product with
 the same shape as the output.
+Note:
+``autograd.functional.jvp`` computes the jvp by using the backward of
+the backward (sometimes called the double backwards trick). This is not
+the most performant way of computing the jvp. Please consider using
+`functorch's jvp <https://github.com/pytorch/functorch#jvp>`_
+or the :ref:`low-level forward-mode AD API <forward-mode-ad>` instead.
 Example:
 >>> def exp_reducer(x):
@@ -345,10 +352,6 @@ def jvp(func, inputs, v=None, create_graph=False, strict=False):
 (tensor([2.2399, 2.5005]),
 tensor([5., 5.]))
-Note:
-The jvp is currently computed by using the backward of the backward
-(sometimes called the double backwards trick) as we don't have support
-for forward mode AD in PyTorch at the moment.
 """
 with torch.enable_grad():
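
The note added above points readers away from the double-backward computation and toward two alternatives. As a rough illustrative sketch (not part of this diff), assuming a PyTorch build with forward-mode AD support for the ops involved and, for the commented-out lines, that functorch is installed:

    import torch
    import torch.autograd.forward_ad as fwAD
    from torch.autograd.functional import jvp

    def exp_reducer(x):
        return x.exp().sum(dim=1)

    inputs = torch.rand(4, 4)
    v = torch.ones(4, 4)

    # Double-backward based jvp, as documented in this docstring.
    _, jvp_result = jvp(exp_reducer, inputs, v)

    # Low-level forward-mode AD API: a single forward pass with dual tensors.
    with fwAD.dual_level():
        dual_input = fwAD.make_dual(inputs, v)
        dual_output = exp_reducer(dual_input)
        fw_jvp = fwAD.unpack_dual(dual_output).tangent

    # functorch alternative (assumes functorch is installed):
    # from functorch import jvp as ft_jvp
    # _, ft_jvp_result = ft_jvp(exp_reducer, (inputs,), (v,))
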
@@ -494,8 +497,11 @@ def jacobian(func, inputs, create_graph=False, strict=False, vectorize=False, st
 independent of it. If ``False``, we return a Tensor of zeros as the
 jacobian for said inputs, which is the expected mathematical value.
 Defaults to ``False``.
-vectorize (bool, optional): This feature is experimental, please use at
-your own risk. When computing the jacobian, usually we invoke
+vectorize (bool, optional): This feature is experimental.
+Please consider using
+`functorch's jacrev or jacfwd <https://github.com/pytorch/functorch#what-are-the-transforms>`_
+instead if you are looking for something less experimental and more performant.
+When computing the jacobian, usually we invoke
 ``autograd.grad`` once per row of the jacobian. If this flag is
 ``True``, we perform only a single ``autograd.grad`` call with
 ``batched_grad=True`` which uses the vmap prototype feature.
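
To make the two code paths behind ``vectorize`` concrete, a minimal caller-side sketch (illustrative only; the functorch lines assume functorch is installed):

    import torch
    from torch.autograd.functional import jacobian

    def exp_reducer(x):
        return x.exp().sum(dim=1)

    inputs = torch.rand(4, 4)

    # Default path: one autograd.grad call per row of the jacobian.
    jac_loop = jacobian(exp_reducer, inputs)

    # Experimental path: a single autograd.grad call with batched_grad=True,
    # which relies on the vmap prototype feature.
    jac_vec = jacobian(exp_reducer, inputs, vectorize=True)

    assert torch.allclose(jac_loop, jac_vec)

    # functorch alternative named in the docstring (assumes functorch is installed):
    # from functorch import jacrev
    # jac_ft = jacrev(exp_reducer)(inputs)
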
@@ -701,8 +707,11 @@ def hessian(func, inputs, create_graph=False, strict=False, vectorize=False, out
 such that all the outputs are independent of it. If ``False``, we return a Tensor of zeros as the
 hessian for said inputs, which is the expected mathematical value.
 Defaults to ``False``.
-vectorize (bool, optional): This feature is experimental, please use at
-your own risk. When computing the hessian, usually we invoke
+vectorize (bool, optional): This feature is experimental.
+Please consider using
+`functorch <https://github.com/pytorch/functorch#what-are-the-transforms>`_
+instead if you are looking for something less experimental and more performant.
+When computing the hessian, usually we invoke
 ``autograd.grad`` once per row of the hessian. If this flag is
 ``True``, we use the vmap prototype feature as the backend to
 vectorize calls to ``autograd.grad`` so we only invoke it once
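
Similarly for ``hessian``, a minimal sketch of the flag's effect from the caller's side (illustrative only; the functorch lines assume functorch is installed):

    import torch
    from torch.autograd.functional import hessian

    def pow_reducer(x):
        return x.pow(3).sum()

    inputs = torch.rand(2, 2)

    # Default path: one autograd.grad call per row of the hessian.
    hess_loop = hessian(pow_reducer, inputs)

    # Experimental path: autograd.grad calls vectorized through the vmap prototype.
    hess_vec = hessian(pow_reducer, inputs, vectorize=True)

    assert torch.allclose(hess_loop, hess_vec)

    # functorch alternative (assumes functorch is installed):
    # from functorch import hessian as ft_hessian
    # hess_ft = ft_hessian(pow_reducer)(inputs)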