[docs] Fix backticks in docs (#60474)

Summary:
There is a very common error when writing docs: one forgets to write a matching `` ` ``, and something like ``:attr:`x`` ends up rendered verbatim in the docs. This PR fixes most (all?) of these errors (and a few others).

I found these by running ``grep -r ">[^#<][^<]*\`"`` on the `docs/build/html/generated` folder. The regex matches rendered text that immediately follows an HTML tag, does not start with `#` (since Python comments in example code may legitimately contain backticks), and contains a backtick.

This regex has not produced any false positives in the current codebase, so I am inclined to suggest that we add this check to the CI. Would this be possible / reasonable / easy to do, @malfet?
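If such a check were added to CI, a minimal sketch (my assumption of how it might look, not something that exists in the repo) could scan the generated HTML and fail on any match:

```python
# Hypothetical CI helper (a sketch, not part of this PR): fail the build if any
# rendered HTML page contains a stray backtick outside Python comments.
import pathlib
import re
import sys

# Same idea as the grep above: text right after an HTML tag that does not start
# with '#' and contains a backtick.
PATTERN = re.compile(r">[^#<][^<]*`")
GENERATED = pathlib.Path("docs/build/html/generated")

bad = []
for page in GENERATED.rglob("*.html"):
    for lineno, line in enumerate(page.read_text(encoding="utf-8").splitlines(), 1):
        if PATTERN.search(line):
            bad.append(f"{page}:{lineno}: {line.strip()}")

if bad:
    print("Possible unmatched backticks in rendered docs:")
    print("\n".join(bad))
    sys.exit(1)
```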

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60474

Reviewed By: mrshenli

Differential Revision: D29309633

Pulled By: albanD

fbshipit-source-id: 9621e0e9f87590cea060dd084fa367442b6bd046
lezcano 2021-06-24 06:26:23 -07:00 committed by Facebook GitHub Bot
parent bb9e1150ea
commit 4e347f1242
15 changed files with 76 additions and 71 deletions

@ -1751,7 +1751,7 @@ add_docstr_all('index_add_',
r"""
index_add_(dim, index, tensor, *, alpha=1) -> Tensor
Accumulate the elements of attr:`alpha` times :attr:`tensor` into the :attr:`self`
Accumulate the elements of :attr:`alpha` times :attr:`tensor` into the :attr:`self`
tensor by adding to the indices in the order given in :attr:`index`. For example,
if ``dim == 0``, ``index[i] == j``, and ``alpha=-1``, then the ``i``\ th row of
:attr:`tensor` is subtracted from the ``j``\ th row of :attr:`self`.

@ -4642,7 +4642,7 @@ Multiplies :attr:`input` by 2**:attr:`other`.
Typically this function is used to construct floating point numbers by multiplying
mantissas in :attr:`input` with integral powers of two created from the exponents
in :attr:'other'.
in :attr:`other`.
Args:
{input}
@ -5242,7 +5242,7 @@ remaining :math:`m - n` rows of that column.
last `m - n` columns in the case `m > n`. In :func:`torch.linalg.lstsq`, the residuals
are in the field 'residuals' of the returned named tuple.
Unpacking the solution as``X = torch.lstsq(B, A).solution[:A.size(1)]`` should be replaced with
Unpacking the solution as ``X = torch.lstsq(B, A).solution[:A.size(1)]`` should be replaced with
.. code:: python
@ -5671,10 +5671,7 @@ dimension(s) :attr:`dim`.
while ``max(dim)``/``min(dim)`` propagates gradient only to a single
index in the source tensor.
If :attr:`keepdim is ``True``, the output tensors are of the same size
as :attr:`input` except in the dimension(s) :attr:`dim` where they are of size 1.
Otherwise, :attr:`dim`s are squeezed (see :func:`torch.squeeze`), resulting
in the output tensors having fewer dimension than :attr:`input`.
{keepdim_details}
Args:
{input}
@ -6132,10 +6129,7 @@ dimension(s) :attr:`dim`.
while ``max(dim)``/``min(dim)`` propagates gradient only to a single
index in the source tensor.
If :attr:`keepdim` is ``True``, the output tensors are of the same size as
:attr:`input` except in the dimension(s) :attr:`dim` where they are of size 1.
Otherwise, :attr:`dim`s are squeezed (see :func:`torch.squeeze`), resulting in
the output tensors having fewer dimensions than :attr:`input`.
{keepdim_details}
Args:
{input}
@ -6677,7 +6671,7 @@ add_docstr(torch.narrow,
narrow(input, dim, start, length) -> Tensor
Returns a new tensor that is a narrowed version of :attr:`input` tensor. The
dimension :attr:`dim` is input from :attr:`start` to :attr:`start + length`. The
dimension :attr:`dim` is input from :attr:`start` to ``start + length``. The
returned tensor and :attr:`input` tensor share the same underlying storage.
Args:
@ -6704,7 +6698,7 @@ nan_to_num(input, nan=0.0, posinf=None, neginf=None, *, out=None) -> Tensor
Replaces :literal:`NaN`, positive infinity, and negative infinity values in :attr:`input`
with the values specified by :attr:`nan`, :attr:`posinf`, and :attr:`neginf`, respectively.
By default, :literal:`NaN`s are replaced with zero, positive infinity is replaced with the
By default, :literal:`NaN`\ s are replaced with zero, positive infinity is replaced with the
greatest finite value representable by :attr:`input`'s dtype, and negative infinity
is replaced with the least finite value representable by :attr:`input`'s dtype.
@ -6837,7 +6831,7 @@ nonzero(input, *, out=None, as_tuple=False) -> LongTensor or tuple of LongTensor
When :attr:`input` is on CUDA, :func:`torch.nonzero() <torch.nonzero>` causes
host-device synchronization.
**When** :attr:`as_tuple` **is ``False`` (default)**:
**When** :attr:`as_tuple` **is** ``False`` **(default)**:
Returns a tensor containing the indices of all non-zero elements of
:attr:`input`. Each row in the result contains the indices of a non-zero
@ -6848,7 +6842,7 @@ If :attr:`input` has :math:`n` dimensions, then the resulting indices tensor
:attr:`out` is of size :math:`(z \times n)`, where :math:`z` is the total number of
non-zero elements in the :attr:`input` tensor.
**When** :attr:`as_tuple` **is ``True``**:
**When** :attr:`as_tuple` **is** ``True``:
Returns a tuple of 1-D tensors, one for each dimension in :attr:`input`,
each containing the indices (in that dimension) of all non-zero elements of
@ -10205,9 +10199,9 @@ The operation is defined as:
Arguments:
condition (BoolTensor): When True (nonzero), yield x, otherwise yield y
x (Tensor or Scalar): value (if :attr:x is a scalar) or values selected at indices
x (Tensor or Scalar): value (if :attr:`x` is a scalar) or values selected at indices
where :attr:`condition` is ``True``
y (Tensor or Scalar): value (if :attr:x is a scalar) or values selected at indices
y (Tensor or Scalar): value (if :attr:`y` is a scalar) or values selected at indices
where :attr:`condition` is ``False``
Returns:

@ -23,11 +23,11 @@ class Categorical(Distribution):
relative probability vectors.
.. note:: The `probs` argument must be non-negative, finite and have a non-zero sum,
and it will be normalized to sum to 1 along the last dimension. attr:`probs`
and it will be normalized to sum to 1 along the last dimension. :attr:`probs`
will return this normalized value.
The `logits` argument will be interpreted as unnormalized log probabilities
and can therefore be any real number. It will likewise be normalized so that
the resulting probabilities sum to 1 along the last dimension. attr:`logits`
the resulting probabilities sum to 1 along the last dimension. :attr:`logits`
will return this normalized value.
See also: :func:`torch.multinomial`

@ -16,11 +16,11 @@ class Multinomial(Distribution):
called (see example below)
.. note:: The `probs` argument must be non-negative, finite and have a non-zero sum,
and it will be normalized to sum to 1 along the last dimension. attr:`probs`
and it will be normalized to sum to 1 along the last dimension. :attr:`probs`
will return this normalized value.
The `logits` argument will be interpreted as unnormalized log probabilities
and can therefore be any real number. It will likewise be normalized so that
the resulting probabilities sum to 1 along the last dimension. attr:`logits`
the resulting probabilities sum to 1 along the last dimension. :attr:`logits`
will return this normalized value.
- :meth:`sample` requires a single shared `total_count` for all

@ -12,11 +12,11 @@ class OneHotCategorical(Distribution):
Samples are one-hot coded vectors of size ``probs.size(-1)``.
.. note:: The `probs` argument must be non-negative, finite and have a non-zero sum,
and it will be normalized to sum to 1 along the last dimension. attr:`probs`
and it will be normalized to sum to 1 along the last dimension. :attr:`probs`
will return this normalized value.
The `logits` argument will be interpreted as unnormalized log probabilities
and can therefore be any real number. It will likewise be normalized so that
the resulting probabilities sum to 1 along the last dimension. attr:`logits`
the resulting probabilities sum to 1 along the last dimension. :attr:`logits`
will return this normalized value.
See also: :func:`torch.distributions.Categorical` for specifications of

@ -214,7 +214,7 @@ def einsum(*args):
As of PyTorch 1.10 :func:`torch.einsum` also supports the sublist format (see examples below). In this format,
subscripts for each operand are specified by sublists, list of integers in the range [0, 52). These sublists
follow their operands, and an extra sublist can appear at the end of the input to specify the output's
subscripts., e.g.`torch.einsum(op1, sublist1, op2, sublist2, ..., [subslist_out])`. Python's `Ellipsis` object
subscripts., e.g. `torch.einsum(op1, sublist1, op2, sublist2, ..., [subslist_out])`. Python's `Ellipsis` object
may be provided in a sublist to enable broadcasting as described in the Equation section above.
Args:
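For illustration only (an aside, not part of this diff): the sublist format described above can be used like the following, assuming a PyTorch version (1.10+) where `torch.einsum` accepts interleaved operand/sublist arguments.

```python
import torch

# Sublist form equivalent to torch.einsum("ij,jk->ik", a, b), i.e. a matrix product.
a = torch.randn(3, 4)
b = torch.randn(4, 5)
out = torch.einsum(a, [0, 1], b, [1, 2], [0, 2])
print(out.shape)  # torch.Size([3, 5])
```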
@ -1286,7 +1286,7 @@ def norm(input, p="fro", dim=None, keepdim=False, out=None, dtype=None): # noqa
:attr:`dim` = ``None`` and :attr:`out` = ``None``.
dtype (:class:`torch.dtype`, optional): the desired data type of
returned tensor. If specified, the input tensor is casted to
:attr:'dtype' while performing the operation. Default: None.
:attr:`dtype` while performing the operation. Default: None.
.. note::
Even though ``p='fro'`` supports any number of dimensions, the true

@ -82,7 +82,8 @@ def annotate(the_type, the_value):
Though TorchScript can infer correct type for most Python expressions, there are some cases where
type inference can be wrong, including:
- Empty containers like `[]` and `{}`, which TorchScript assumes to be container of `Tensor`s
- Empty containers like `[]` and `{}`, which TorchScript assumes to be container of `Tensor`
- Optional types like `Optional[T]` but assigned a valid value of type `T`, TorchScript would assume
it is type `T` rather than `Optional[T]`

@ -27,10 +27,12 @@ def fork(func, *args, **kwargs):
Asynchronous execution will only occur when run in TorchScript. If run in pure python,
`fork` will not execute in parallel. `fork` will also not execute in parallel when invoked
while tracing, however the `fork` and `wait` calls will be captured in the exported IR Graph.
Warning:
.. warning::
`fork` tasks will execute non-deterministically. We recommend only spawning
parallel fork tasks for pure functions that do not modify their inputs,
module attributes, or global state.
Args:
func (callable or torch.nn.Module): A Python function or `torch.nn.Module`
that will be invoked. If executed in TorchScript, it will execute asynchronously,

@ -73,7 +73,8 @@ Attribute.__doc__ = """
Though TorchScript can infer correct type for most Python expressions, there are some cases where
type inference can be wrong, including:
- Empty containers like `[]` and `{}`, which TorchScript assumes to be container of `Tensor`s
- Empty containers like `[]` and `{}`, which TorchScript assumes to be container of `Tensor`
- Optional types like `Optional[T]` but assigned a valid value of type `T`, TorchScript would assume
it is type `T` rather than `Optional[T]`

@ -1212,7 +1212,7 @@ If :attr:`A` is complex valued, it computes the norm of :attr:`A`\ `.abs()`
Supports input of float, double, cfloat and cdouble dtypes.
This function does not necessarily treat multidimensonal attr:`A` as a batch of
This function does not necessarily treat multidimensonal :attr:`A` as a batch of
vectors, instead:
- If :attr:`dim`\ `= None`, :attr:`A` will be flattened before the norm is computed.
@ -1223,15 +1223,15 @@ This behavior is for consistency with :func:`torch.linalg.norm`.
:attr:`ord` defines the vector norm that is computed. The following norms are supported:
====================== ========================================================
====================== ===============================
:attr:`ord` vector norm
====================== ========================================================
====================== ===============================
`2` (default) `2`-norm (see below)
`inf` `max(abs(x))`
`-inf` `min(abs(x))`
`0` `sum(x != 0)`
other `int` or `float` `sum(abs(x)^{ord})^{(1 / ord)}`
====================== ========================================================
====================== ===============================
where `inf` refers to `float('inf')`, NumPy's `inf` object, or any equivalent object.
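As an aside (not part of this diff), the `ord` values tabulated above behave as follows, assuming a PyTorch build that ships `torch.linalg.vector_norm`:

```python
import torch

x = torch.tensor([3.0, -4.0, 0.0])
print(torch.linalg.vector_norm(x))                    # default 2-norm: 5.0
print(torch.linalg.vector_norm(x, ord=float("inf")))  # max(abs(x)): 4.0
print(torch.linalg.vector_norm(x, ord=0))             # sum(x != 0): 2.0
```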

@ -2556,7 +2556,7 @@ def poisson_nll_loss(
is set to ``False``, the losses are instead summed for each minibatch. Ignored
when reduce is ``False``. Default: ``True``
eps (float, optional): Small value to avoid evaluation of :math:`\log(0)` when
:attr:`log_input`=``False``. Default: 1e-8
:attr:`log_input`\ =\ ``False``. Default: 1e-8
reduce (bool, optional): Deprecated (see :attr:`reduction`). By default, the
losses are averaged or summed over observations for each minibatch depending
on :attr:`size_average`. When :attr:`reduce` is ``False``, returns a loss per

@ -763,7 +763,7 @@ class SyncBatchNorm(_BatchNorm):
:class:`torch.nn.SyncBatchNorm` layers.
Args:
module (nn.Module): module containing one or more attr:`BatchNorm*D` layers
module (nn.Module): module containing one or more :attr:`BatchNorm*D` layers
process_group (optional): process group to scope synchronization,
default is the whole world

@ -768,7 +768,7 @@ class Module:
.. function:: to(memory_format=torch.channels_last)
Its signature is similar to :meth:`torch.Tensor.to`, but only accepts
floating point or complex :attr:`dtype`s. In addition, this method will
floating point or complex :attr:`dtype`\ s. In addition, this method will
only cast the floating point or complex parameters and buffers to :attr:`dtype`
(if given). The integral parameters and buffers will be moved
:attr:`device`, if that is given, but with dtypes unchanged. When

@ -668,7 +668,7 @@ class RandomStructured(BasePruningMethod):
class LnStructured(BasePruningMethod):
r"""Prune entire (currently unpruned) channels in a tensor based on their
Ln-norm.
L\ ``n``-norm.
Args:
amount (int or float): quantity of channels to prune.
@ -695,7 +695,7 @@ class LnStructured(BasePruningMethod):
Starting from a base ``default_mask`` (which should be a mask of ones
if the tensor has not been pruned yet), generate a mask to apply on
top of the ``default_mask`` by zeroing out the channels along the
specified dim with the lowest Ln-norm.
specified dim with the lowest L\ ``n``-norm.
Args:
t (torch.Tensor): tensor representing the parameter to prune
@ -824,6 +824,7 @@ def identity(module, name):
parameter called ``name`` in ``module`` without actually pruning any
units. Modifies module in place (and also return the modified module)
by:
1) adding a named buffer called ``name+'_mask'`` corresponding to the
binary mask applied to the parameter ``name`` by the pruning method.
2) replacing the parameter ``name`` by its pruned version, while the
@ -855,8 +856,9 @@ def random_unstructured(module, name, amount):
by removing the specified ``amount`` of (currently unpruned) units
selected at random.
Modifies module in place (and also return the modified module) by:
1) adding a named buffer called ``name+'_mask'`` corresponding to the
binary mask applied to the parameter `name` by the pruning method.
binary mask applied to the parameter ``name`` by the pruning method.
2) replacing the parameter ``name`` by its pruned version, while the
original (unpruned) parameter is stored in a new parameter named
``name+'_orig'``.
@ -889,6 +891,7 @@ def l1_unstructured(module, name, amount, importance_scores=None):
lowest L1-norm.
Modifies module in place (and also return the modified module)
by:
1) adding a named buffer called ``name+'_mask'`` corresponding to the
binary mask applied to the parameter ``name`` by the pruning method.
2) replacing the parameter ``name`` by its pruned version, while the
@ -929,6 +932,7 @@ def random_structured(module, name, amount, dim):
along the specified ``dim`` selected at random.
Modifies module in place (and also return the modified module)
by:
1) adding a named buffer called ``name+'_mask'`` corresponding to the
binary mask applied to the parameter ``name`` by the pruning method.
2) replacing the parameter ``name`` by its pruned version, while the
@ -963,9 +967,10 @@ def random_structured(module, name, amount, dim):
def ln_structured(module, name, amount, n, dim, importance_scores=None):
r"""Prunes tensor corresponding to parameter called ``name`` in ``module``
by removing the specified ``amount`` of (currently unpruned) channels
along the specified ``dim`` with the lowest L``n``-norm.
along the specified ``dim`` with the lowest L\ ``n``-norm.
Modifies module in place (and also return the modified module)
by:
1) adding a named buffer called ``name+'_mask'`` corresponding to the
binary mask applied to the parameter ``name`` by the pruning method.
2) replacing the parameter ``name`` by its pruned version, while the
@ -1008,6 +1013,7 @@ def global_unstructured(parameters, pruning_method, importance_scores=None, **kw
Globally prunes tensors corresponding to all parameters in ``parameters``
by applying the specified ``pruning_method``.
Modifies modules in place by:
1) adding a named buffer called ``name+'_mask'`` corresponding to the
binary mask applied to the parameter ``name`` by the pruning method.
2) replacing the parameter ``name`` by its pruned version, while the
@ -1127,6 +1133,7 @@ def custom_from_mask(module, name, mask):
by applying the pre-computed mask in ``mask``.
Modifies module in place (and also return the modified module)
by:
1) adding a named buffer called ``name+'_mask'`` corresponding to the
binary mask applied to the parameter ``name`` by the pruning method.
2) replacing the parameter ``name`` by its pruned version, while the

@ -11,20 +11,20 @@ class _LearnableFakeQuantize(torch.quantization.FakeQuantizeBase):
In addition to the attributes in the original FakeQuantize module, the _LearnableFakeQuantize
module also includes the following attributes to support quantization parameter learning.
* :attr: `channel_len` defines the length of the channel when initializing scale and zero point
* :attr:`channel_len` defines the length of the channel when initializing scale and zero point
for the per channel case.
* :attr: `use_grad_scaling` defines the flag for whether the gradients for scale and zero point are
* :attr:`use_grad_scaling` defines the flag for whether the gradients for scale and zero point are
normalized by the constant, which is proportional to the square root of the number of
elements in the tensor. The related literature justifying the use of this particular constant
can be found here: https://openreview.net/pdf?id=rkgO66VKDS.
* :attr: `fake_quant_enabled` defines the flag for enabling fake quantization on the output.
* :attr:`fake_quant_enabled` defines the flag for enabling fake quantization on the output.
* :attr: `static_enabled` defines the flag for using observer's static estimation for
* :attr:`static_enabled` defines the flag for using observer's static estimation for
scale and zero point.
* attr: `learning_enabled` defines the flag for enabling backpropagation for scale and zero point.
* :attr:`learning_enabled` defines the flag for enabling backpropagation for scale and zero point.
"""
def __init__(self, observer, quant_min=0, quant_max=255, scale=1., zero_point=0., channel_len=-1,
use_grad_scaling=False, **observer_kwargs):