Targeted documentation updates in autograd.functional (#72111)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72111

For vectorize flag:
- Advertises the use of functorch

For autograd.functional.jvp:
- Advertises the use of functorch and the low-level jvp API, both of
which will be more performant than the double backprop trick.

Test Plan: - view docs

Reviewed By: albanD

Differential Revision: D33918065

Pulled By: zou3519

fbshipit-source-id: 6e19699aa94f0e023ccda0dc40551ad6d932b7c7
(cherry picked from commit b4662ceb99)
Richard Zou 2022-02-01 19:15:17 -08:00 committed by PyTorch MergeBot
parent a60e2ae037
commit f99147dec0
2 changed files with 18 additions and 9 deletions


@@ -14,7 +14,7 @@ Automatic differentiation package - torch.autograd
 backward
 grad
-.. forward-mode-ad:
+.. _forward-mode-ad:
 Forward-mode Automatic Differentiation
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


@@ -323,6 +323,13 @@ def jvp(func, inputs, v=None, create_graph=False, strict=False):
 jvp (tuple of Tensors or Tensor): result of the dot product with
 the same shape as the output.
+Note:
+``autograd.functional.jvp`` computes the jvp by using the backward of
+the backward (sometimes called the double backwards trick). This is not
+the most performant way of computing the jvp. Please consider using
+`functorch's jvp <https://github.com/pytorch/functorch#jvp>`_
+or the :ref:`low-level forward-mode AD API <forward-mode-ad>` instead.
 Example:
 >>> def exp_reducer(x):
@@ -345,10 +352,6 @@ def jvp(func, inputs, v=None, create_graph=False, strict=False):
 (tensor([2.2399, 2.5005]),
 tensor([5., 5.]))
-Note:
-The jvp is currently computed by using the backward of the backward
-(sometimes called the double backwards trick) as we don't have support
-for forward mode AD in PyTorch at the moment.
 """
 with torch.enable_grad():
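
The note added above points readers away from the double-backward computation and toward two alternatives. As a rough illustrative sketch (not part of this diff), assuming a PyTorch build with forward-mode AD support for the ops involved and, for the commented-out lines, that functorch is installed:

    import torch
    import torch.autograd.forward_ad as fwAD
    from torch.autograd.functional import jvp

    def exp_reducer(x):
        return x.exp().sum(dim=1)

    inputs = torch.rand(4, 4)
    v = torch.ones(4, 4)

    # Double-backward based jvp, as documented in this docstring.
    _, jvp_result = jvp(exp_reducer, inputs, v)

    # Low-level forward-mode AD API: a single forward pass with dual tensors.
    with fwAD.dual_level():
        dual_input = fwAD.make_dual(inputs, v)
        dual_output = exp_reducer(dual_input)
        fw_jvp = fwAD.unpack_dual(dual_output).tangent

    # functorch alternative (assumes functorch is installed):
    # from functorch import jvp as ft_jvp
    # _, ft_jvp_result = ft_jvp(exp_reducer, (inputs,), (v,))
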
@@ -494,8 +497,11 @@ def jacobian(func, inputs, create_graph=False, strict=False, vectorize=False, st
 independent of it. If ``False``, we return a Tensor of zeros as the
 jacobian for said inputs, which is the expected mathematical value.
 Defaults to ``False``.
-vectorize (bool, optional): This feature is experimental, please use at
-your own risk. When computing the jacobian, usually we invoke
+vectorize (bool, optional): This feature is experimental.
+Please consider using
+`functorch's jacrev or jacfwd <https://github.com/pytorch/functorch#what-are-the-transforms>`_
+instead if you are looking for something less experimental and more performant.
+When computing the jacobian, usually we invoke
 ``autograd.grad`` once per row of the jacobian. If this flag is
 ``True``, we perform only a single ``autograd.grad`` call with
 ``batched_grad=True`` which uses the vmap prototype feature.
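
To make the two code paths behind ``vectorize`` concrete, a minimal caller-side sketch (illustrative only; the functorch lines assume functorch is installed):

    import torch
    from torch.autograd.functional import jacobian

    def exp_reducer(x):
        return x.exp().sum(dim=1)

    inputs = torch.rand(4, 4)

    # Default path: one autograd.grad call per row of the jacobian.
    jac_loop = jacobian(exp_reducer, inputs)

    # Experimental path: a single autograd.grad call with batched_grad=True,
    # which relies on the vmap prototype feature.
    jac_vec = jacobian(exp_reducer, inputs, vectorize=True)

    assert torch.allclose(jac_loop, jac_vec)

    # functorch alternative named in the docstring (assumes functorch is installed):
    # from functorch import jacrev
    # jac_ft = jacrev(exp_reducer)(inputs)
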
@@ -701,8 +707,11 @@ def hessian(func, inputs, create_graph=False, strict=False, vectorize=False, out
 such that all the outputs are independent of it. If ``False``, we return a Tensor of zeros as the
 hessian for said inputs, which is the expected mathematical value.
 Defaults to ``False``.
-vectorize (bool, optional): This feature is experimental, please use at
-your own risk. When computing the hessian, usually we invoke
+vectorize (bool, optional): This feature is experimental.
+Please consider using
+`functorch <https://github.com/pytorch/functorch#what-are-the-transforms>`_
+instead if you are looking for something less experimental and more performant.
+When computing the hessian, usually we invoke
 ``autograd.grad`` once per row of the hessian. If this flag is
 ``True``, we use the vmap prototype feature as the backend to
 vectorize calls to ``autograd.grad`` so we only invoke it once
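
Similarly for ``hessian``, a minimal sketch of the flag's effect from the caller's side (illustrative only; the functorch lines assume functorch is installed):

    import torch
    from torch.autograd.functional import hessian

    def pow_reducer(x):
        return x.pow(3).sum()

    inputs = torch.rand(2, 2)

    # Default path: one autograd.grad call per row of the hessian.
    hess_loop = hessian(pow_reducer, inputs)

    # Experimental path: autograd.grad calls vectorized through the vmap prototype.
    hess_vec = hessian(pow_reducer, inputs, vectorize=True)

    assert torch.allclose(hess_loop, hess_vec)

    # functorch alternative (assumes functorch is installed):
    # from functorch import hessian as ft_hessian
    # hess_ft = ft_hessian(pow_reducer)(inputs)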