[docs] Fix backticks in docs (#60474)

Summary:
There is a very common error when writing docs: one forgets to write a matching `` ` ``, and something like ``:attr:`x`` ends up rendered verbatim in the docs. This PR fixes most (all?) of these errors (and a few others).

I found these by running ``grep -r ">[^#<][^<]*\`"`` on the `docs/build/html/generated` folder. The regex matches rendered text that immediately follows an HTML tag, does not start with `#` (since Python comments in example code may legitimately contain backticks), and contains a backtick.

This regex has not produced any false positives in the current codebase, so I am inclined to suggest that we add this check to the CI. Would this be possible / reasonable / easy to do, @malfet?
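If such a check were added to CI, a minimal sketch (my assumption of how it might look, not something that exists in the repo) could scan the generated HTML and fail on any match:

```python
# Hypothetical CI helper (a sketch, not part of this PR): fail the build if any
# rendered HTML page contains a stray backtick outside Python comments.
import pathlib
import re
import sys

# Same idea as the grep above: text right after an HTML tag that does not start
# with '#' and contains a backtick.
PATTERN = re.compile(r">[^#<][^<]*`")
GENERATED = pathlib.Path("docs/build/html/generated")

bad = []
for page in GENERATED.rglob("*.html"):
    for lineno, line in enumerate(page.read_text(encoding="utf-8").splitlines(), 1):
        if PATTERN.search(line):
            bad.append(f"{page}:{lineno}: {line.strip()}")

if bad:
    print("Possible unmatched backticks in rendered docs:")
    print("\n".join(bad))
    sys.exit(1)
```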

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60474

Reviewed By: mrshenli

Differential Revision: D29309633

Pulled By: albanD

fbshipit-source-id: 9621e0e9f87590cea060dd084fa367442b6bd046
lezcano 2021-06-24 06:26:23 -07:00 committed by Facebook GitHub Bot
parent bb9e1150ea
commit 4e347f1242
15 changed files with 76 additions and 71 deletions

@ -1751,7 +1751,7 @@ add_docstr_all('index_add_',
r"""
index_add_(dim, index, tensor, *, alpha=1) -> Tensor
Accumulate the elements of attr:`alpha` times :attr:`tensor` into the :attr:`self`
Accumulate the elements of :attr:`alpha` times :attr:`tensor` into the :attr:`self`
tensor by adding to the indices in the order given in :attr:`index`. For example,
if ``dim == 0``, ``index[i] == j``, and ``alpha=-1``, then the ``i``\ th row of
:attr:`tensor` is subtracted from the ``j``\ th row of :attr:`self`.

@ -4642,7 +4642,7 @@ Multiplies :attr:`input` by 2**:attr:`other`.
Typically this function is used to construct floating point numbers by multiplying
mantissas in :attr:`input` with integral powers of two created from the exponents
in :attr:'other'.
in :attr:`other`.
Args:
{input}
@ -5242,7 +5242,7 @@ remaining :math:`m - n` rows of that column.
last `m - n` columns in the case `m > n`. In :func:`torch.linalg.lstsq`, the residuals
are in the field 'residuals' of the returned named tuple.
Unpacking the solution as``X = torch.lstsq(B, A).solution[:A.size(1)]`` should be replaced with
Unpacking the solution as ``X = torch.lstsq(B, A).solution[:A.size(1)]`` should be replaced with
.. code:: python
@ -5671,10 +5671,7 @@ dimension(s) :attr:`dim`.
while ``max(dim)``/``min(dim)`` propagates gradient only to a single
index in the source tensor.
If :attr:`keepdim is ``True``, the output tensors are of the same size
as :attr:`input` except in the dimension(s) :attr:`dim` where they are of size 1.
Otherwise, :attr:`dim`s are squeezed (see :func:`torch.squeeze`), resulting
in the output tensors having fewer dimension than :attr:`input`.
{keepdim_details}
Args:
{input}
@ -6132,10 +6129,7 @@ dimension(s) :attr:`dim`.
while ``max(dim)``/``min(dim)`` propagates gradient only to a single
index in the source tensor.
If :attr:`keepdim` is ``True``, the output tensors are of the same size as
:attr:`input` except in the dimension(s) :attr:`dim` where they are of size 1.
Otherwise, :attr:`dim`s are squeezed (see :func:`torch.squeeze`), resulting in
the output tensors having fewer dimensions than :attr:`input`.
{keepdim_details}
Args:
{input}
@ -6677,7 +6671,7 @@ add_docstr(torch.narrow,
narrow(input, dim, start, length) -> Tensor
Returns a new tensor that is a narrowed version of :attr:`input` tensor. The
dimension :attr:`dim` is input from :attr:`start` to :attr:`start + length`. The
dimension :attr:`dim` is input from :attr:`start` to ``start + length``. The
returned tensor and :attr:`input` tensor share the same underlying storage.
Args:
@ -6704,7 +6698,7 @@ nan_to_num(input, nan=0.0, posinf=None, neginf=None, *, out=None) -> Tensor
Replaces :literal:`NaN`, positive infinity, and negative infinity values in :attr:`input`
with the values specified by :attr:`nan`, :attr:`posinf`, and :attr:`neginf`, respectively.
By default, :literal:`NaN`s are replaced with zero, positive infinity is replaced with the
By default, :literal:`NaN`\ s are replaced with zero, positive infinity is replaced with the
greatest finite value representable by :attr:`input`'s dtype, and negative infinity
is replaced with the least finite value representable by :attr:`input`'s dtype.
@ -6837,7 +6831,7 @@ nonzero(input, *, out=None, as_tuple=False) -> LongTensor or tuple of LongTensor
When :attr:`input` is on CUDA, :func:`torch.nonzero() <torch.nonzero>` causes
host-device synchronization.
**When** :attr:`as_tuple` **is ``False`` (default)**:
**When** :attr:`as_tuple` **is** ``False`` **(default)**:
Returns a tensor containing the indices of all non-zero elements of
:attr:`input`. Each row in the result contains the indices of a non-zero
@ -6848,7 +6842,7 @@ If :attr:`input` has :math:`n` dimensions, then the resulting indices tensor
:attr:`out` is of size :math:`(z \times n)`, where :math:`z` is the total number of
non-zero elements in the :attr:`input` tensor.
**When** :attr:`as_tuple` **is ``True``**:
**When** :attr:`as_tuple` **is** ``True``:
Returns a tuple of 1-D tensors, one for each dimension in :attr:`input`,
each containing the indices (in that dimension) of all non-zero elements of
@ -10205,9 +10199,9 @@ The operation is defined as:
Arguments:
condition (BoolTensor): When True (nonzero), yield x, otherwise yield y
x (Tensor or Scalar): value (if :attr:x is a scalar) or values selected at indices
x (Tensor or Scalar): value (if :attr:`x` is a scalar) or values selected at indices
where :attr:`condition` is ``True``
y (Tensor or Scalar): value (if :attr:x is a scalar) or values selected at indices
y (Tensor or Scalar): value (if :attr:`y` is a scalar) or values selected at indices
where :attr:`condition` is ``False``
Returns:

@ -23,11 +23,11 @@ class Categorical(Distribution):
relative probability vectors.
.. note:: The `probs` argument must be non-negative, finite and have a non-zero sum,
and it will be normalized to sum to 1 along the last dimension. attr:`probs`
and it will be normalized to sum to 1 along the last dimension. :attr:`probs`
will return this normalized value.
The `logits` argument will be interpreted as unnormalized log probabilities
and can therefore be any real number. It will likewise be normalized so that
the resulting probabilities sum to 1 along the last dimension. attr:`logits`
the resulting probabilities sum to 1 along the last dimension. :attr:`logits`
will return this normalized value.
See also: :func:`torch.multinomial`

@ -16,11 +16,11 @@ class Multinomial(Distribution):
called (see example below)
.. note:: The `probs` argument must be non-negative, finite and have a non-zero sum,
and it will be normalized to sum to 1 along the last dimension. attr:`probs`
and it will be normalized to sum to 1 along the last dimension. :attr:`probs`
will return this normalized value.
The `logits` argument will be interpreted as unnormalized log probabilities
and can therefore be any real number. It will likewise be normalized so that
the resulting probabilities sum to 1 along the last dimension. attr:`logits`
the resulting probabilities sum to 1 along the last dimension. :attr:`logits`
will return this normalized value.
- :meth:`sample` requires a single shared `total_count` for all

@ -12,11 +12,11 @@ class OneHotCategorical(Distribution):
Samples are one-hot coded vectors of size ``probs.size(-1)``.
.. note:: The `probs` argument must be non-negative, finite and have a non-zero sum,
and it will be normalized to sum to 1 along the last dimension. attr:`probs`
and it will be normalized to sum to 1 along the last dimension. :attr:`probs`
will return this normalized value.
The `logits` argument will be interpreted as unnormalized log probabilities
and can therefore be any real number. It will likewise be normalized so that
the resulting probabilities sum to 1 along the last dimension. attr:`logits`
the resulting probabilities sum to 1 along the last dimension. :attr:`logits`
will return this normalized value.
See also: :func:`torch.distributions.Categorical` for specifications of

@ -214,7 +214,7 @@ def einsum(*args):
As of PyTorch 1.10 :func:`torch.einsum` also supports the sublist format (see examples below). In this format,
subscripts for each operand are specified by sublists, list of integers in the range [0, 52). These sublists
follow their operands, and an extra sublist can appear at the end of the input to specify the output's
subscripts., e.g.`torch.einsum(op1, sublist1, op2, sublist2, ..., [subslist_out])`. Python's `Ellipsis` object
subscripts., e.g. `torch.einsum(op1, sublist1, op2, sublist2, ..., [subslist_out])`. Python's `Ellipsis` object
may be provided in a sublist to enable broadcasting as described in the Equation section above.
Args:
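For illustration only (an aside, not part of this diff): the sublist format described above can be used like the following, assuming a PyTorch version (1.10+) where `torch.einsum` accepts interleaved operand/sublist arguments.

```python
import torch

# Sublist form equivalent to torch.einsum("ij,jk->ik", a, b), i.e. a matrix product.
a = torch.randn(3, 4)
b = torch.randn(4, 5)
out = torch.einsum(a, [0, 1], b, [1, 2], [0, 2])
print(out.shape)  # torch.Size([3, 5])
```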
@ -1286,7 +1286,7 @@ def norm(input, p="fro", dim=None, keepdim=False, out=None, dtype=None): # noqa
:attr:`dim` = ``None`` and :attr:`out` = ``None``.
dtype (:class:`torch.dtype`, optional): the desired data type of
returned tensor. If specified, the input tensor is casted to
:attr:'dtype' while performing the operation. Default: None.
:attr:`dtype` while performing the operation. Default: None.
.. note::
Even though ``p='fro'`` supports any number of dimensions, the true

@ -82,7 +82,8 @@ def annotate(the_type, the_value):
Though TorchScript can infer correct type for most Python expressions, there are some cases where
type inference can be wrong, including:
- Empty containers like `[]` and `{}`, which TorchScript assumes to be container of `Tensor`s
- Empty containers like `[]` and `{}`, which TorchScript assumes to be container of `Tensor`
- Optional types like `Optional[T]` but assigned a valid value of type `T`, TorchScript would assume
it is type `T` rather than `Optional[T]`

@ -27,10 +27,12 @@ def fork(func, *args, **kwargs):
Asynchronous execution will only occur when run in TorchScript. If run in pure python,
`fork` will not execute in parallel. `fork` will also not execute in parallel when invoked
while tracing, however the `fork` and `wait` calls will be captured in the exported IR Graph.
Warning:
.. warning::
`fork` tasks will execute non-deterministically. We recommend only spawning
parallel fork tasks for pure functions that do not modify their inputs,
module attributes, or global state.
Args:
func (callable or torch.nn.Module): A Python function or `torch.nn.Module`
that will be invoked. If executed in TorchScript, it will execute asynchronously,

@ -73,7 +73,8 @@ Attribute.__doc__ = """
Though TorchScript can infer correct type for most Python expressions, there are some cases where
type inference can be wrong, including:
- Empty containers like `[]` and `{}`, which TorchScript assumes to be container of `Tensor`s
- Empty containers like `[]` and `{}`, which TorchScript assumes to be container of `Tensor`
- Optional types like `Optional[T]` but assigned a valid value of type `T`, TorchScript would assume
it is type `T` rather than `Optional[T]`

@ -1212,7 +1212,7 @@ If :attr:`A` is complex valued, it computes the norm of :attr:`A`\ `.abs()`
Supports input of float, double, cfloat and cdouble dtypes.
This function does not necessarily treat multidimensonal attr:`A` as a batch of
This function does not necessarily treat multidimensonal :attr:`A` as a batch of
vectors, instead:
- If :attr:`dim`\ `= None`, :attr:`A` will be flattened before the norm is computed.
@ -1223,15 +1223,15 @@ This behavior is for consistency with :func:`torch.linalg.norm`.
:attr:`ord` defines the vector norm that is computed. The following norms are supported:
====================== ========================================================
====================== ===============================
:attr:`ord` vector norm
====================== ========================================================
====================== ===============================
`2` (default) `2`-norm (see below)
`inf` `max(abs(x))`
`-inf` `min(abs(x))`
`0` `sum(x != 0)`
other `int` or `float` `sum(abs(x)^{ord})^{(1 / ord)}`
====================== ========================================================
====================== ===============================
where `inf` refers to `float('inf')`, NumPy's `inf` object, or any equivalent object.
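As an aside (not part of this diff), the `ord` values tabulated above behave as follows, assuming a PyTorch build that ships `torch.linalg.vector_norm`:

```python
import torch

x = torch.tensor([3.0, -4.0, 0.0])
print(torch.linalg.vector_norm(x))                    # default 2-norm: 5.0
print(torch.linalg.vector_norm(x, ord=float("inf")))  # max(abs(x)): 4.0
print(torch.linalg.vector_norm(x, ord=0))             # sum(x != 0): 2.0
```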

@ -2556,7 +2556,7 @@ def poisson_nll_loss(
is set to ``False``, the losses are instead summed for each minibatch. Ignored
when reduce is ``False``. Default: ``True``
eps (float, optional): Small value to avoid evaluation of :math:`\log(0)` when
:attr:`log_input`=``False``. Default: 1e-8
:attr:`log_input`\ =\ ``False``. Default: 1e-8
reduce (bool, optional): Deprecated (see :attr:`reduction`). By default, the
losses are averaged or summed over observations for each minibatch depending
on :attr:`size_average`. When :attr:`reduce` is ``False``, returns a loss per

@ -763,7 +763,7 @@ class SyncBatchNorm(_BatchNorm):
:class:`torch.nn.SyncBatchNorm` layers.
Args:
module (nn.Module): module containing one or more attr:`BatchNorm*D` layers
module (nn.Module): module containing one or more :attr:`BatchNorm*D` layers
process_group (optional): process group to scope synchronization,
default is the whole world

@ -768,7 +768,7 @@ class Module:
.. function:: to(memory_format=torch.channels_last)
Its signature is similar to :meth:`torch.Tensor.to`, but only accepts
floating point or complex :attr:`dtype`s. In addition, this method will
floating point or complex :attr:`dtype`\ s. In addition, this method will
only cast the floating point or complex parameters and buffers to :attr:`dtype`
(if given). The integral parameters and buffers will be moved
:attr:`device`, if that is given, but with dtypes unchanged. When

@ -668,7 +668,7 @@ class RandomStructured(BasePruningMethod):
class LnStructured(BasePruningMethod):
r"""Prune entire (currently unpruned) channels in a tensor based on their
Ln-norm.
L\ ``n``-norm.
Args:
amount (int or float): quantity of channels to prune.
@ -695,7 +695,7 @@ class LnStructured(BasePruningMethod):
Starting from a base ``default_mask`` (which should be a mask of ones
if the tensor has not been pruned yet), generate a mask to apply on
top of the ``default_mask`` by zeroing out the channels along the
specified dim with the lowest Ln-norm.
specified dim with the lowest L\ ``n``-norm.
Args:
t (torch.Tensor): tensor representing the parameter to prune
@ -824,6 +824,7 @@ def identity(module, name):
parameter called ``name`` in ``module`` without actually pruning any
units. Modifies module in place (and also return the modified module)
by:
1) adding a named buffer called ``name+'_mask'`` corresponding to the
binary mask applied to the parameter ``name`` by the pruning method.
2) replacing the parameter ``name`` by its pruned version, while the
@ -855,8 +856,9 @@ def random_unstructured(module, name, amount):
by removing the specified ``amount`` of (currently unpruned) units
selected at random.
Modifies module in place (and also return the modified module) by:
1) adding a named buffer called ``name+'_mask'`` corresponding to the
binary mask applied to the parameter `name` by the pruning method.
binary mask applied to the parameter ``name`` by the pruning method.
2) replacing the parameter ``name`` by its pruned version, while the
original (unpruned) parameter is stored in a new parameter named
``name+'_orig'``.
@ -889,6 +891,7 @@ def l1_unstructured(module, name, amount, importance_scores=None):
lowest L1-norm.
Modifies module in place (and also return the modified module)
by:
1) adding a named buffer called ``name+'_mask'`` corresponding to the
binary mask applied to the parameter ``name`` by the pruning method.
2) replacing the parameter ``name`` by its pruned version, while the
@ -929,6 +932,7 @@ def random_structured(module, name, amount, dim):
along the specified ``dim`` selected at random.
Modifies module in place (and also return the modified module)
by:
1) adding a named buffer called ``name+'_mask'`` corresponding to the
binary mask applied to the parameter ``name`` by the pruning method.
2) replacing the parameter ``name`` by its pruned version, while the
@ -963,9 +967,10 @@ def random_structured(module, name, amount, dim):
def ln_structured(module, name, amount, n, dim, importance_scores=None):
r"""Prunes tensor corresponding to parameter called ``name`` in ``module``
by removing the specified ``amount`` of (currently unpruned) channels
along the specified ``dim`` with the lowest L``n``-norm.
along the specified ``dim`` with the lowest L\ ``n``-norm.
Modifies module in place (and also return the modified module)
by:
1) adding a named buffer called ``name+'_mask'`` corresponding to the
binary mask applied to the parameter ``name`` by the pruning method.
2) replacing the parameter ``name`` by its pruned version, while the
@ -1008,6 +1013,7 @@ def global_unstructured(parameters, pruning_method, importance_scores=None, **kw
Globally prunes tensors corresponding to all parameters in ``parameters``
by applying the specified ``pruning_method``.
Modifies modules in place by:
1) adding a named buffer called ``name+'_mask'`` corresponding to the
binary mask applied to the parameter ``name`` by the pruning method.
2) replacing the parameter ``name`` by its pruned version, while the
@ -1127,6 +1133,7 @@ def custom_from_mask(module, name, mask):
by applying the pre-computed mask in ``mask``.
Modifies module in place (and also return the modified module)
by:
1) adding a named buffer called ``name+'_mask'`` corresponding to the
binary mask applied to the parameter ``name`` by the pruning method.
2) replacing the parameter ``name`` by its pruned version, while the

@ -11,20 +11,20 @@ class _LearnableFakeQuantize(torch.quantization.FakeQuantizeBase):
In addition to the attributes in the original FakeQuantize module, the _LearnableFakeQuantize
module also includes the following attributes to support quantization parameter learning.
* :attr: `channel_len` defines the length of the channel when initializing scale and zero point
* :attr:`channel_len` defines the length of the channel when initializing scale and zero point
for the per channel case.
* :attr: `use_grad_scaling` defines the flag for whether the gradients for scale and zero point are
* :attr:`use_grad_scaling` defines the flag for whether the gradients for scale and zero point are
normalized by the constant, which is proportional to the square root of the number of
elements in the tensor. The related literature justifying the use of this particular constant
can be found here: https://openreview.net/pdf?id=rkgO66VKDS.
* :attr: `fake_quant_enabled` defines the flag for enabling fake quantization on the output.
* :attr:`fake_quant_enabled` defines the flag for enabling fake quantization on the output.
* :attr: `static_enabled` defines the flag for using observer's static estimation for
* :attr:`static_enabled` defines the flag for using observer's static estimation for
scale and zero point.
* attr: `learning_enabled` defines the flag for enabling backpropagation for scale and zero point.
* :attr:`learning_enabled` defines the flag for enabling backpropagation for scale and zero point.
"""
def __init__(self, observer, quant_min=0, quant_max=255, scale=1., zero_point=0., channel_len=-1,
use_grad_scaling=False, **observer_kwargs):