Revert "[Docs] Convert to markdown to fix 155032 (#155520)"

This reverts commit cd66ff8030.

Reverted https://github.com/pytorch/pytorch/pull/155520 on behalf of https://github.com/atalman due to breaks multiple test_quantization.py::TestQuantizationDocs::test_quantization_ ([comment](https://github.com/pytorch/pytorch/pull/155520#issuecomment-2981996091))
PyTorch MergeBot 2025-06-17 22:22:50 +00:00
parent 54998c2daa
commit fa4f07b5b8
5 changed files with 495 additions and 503 deletions


@ -1,4 +1,5 @@
# Quantization Accuracy Debugging
Quantization Accuracy Debugging
-------------------------------
This document provides high level strategies for improving quantization
accuracy. If a quantized model has error compared to the original model,
@ -10,9 +11,11 @@ we can categorize the error into:
portion of input data has large error
3. **implementation error** - quantized kernel is not matching reference implementation
## Data insensitive error
Data insensitive error
~~~~~~~~~~~~~~~~~~~~~~
### General tips
General tips
^^^^^^^^^^^^
1. For PTQ, ensure that the data you are calibrating with is representative
of your dataset. For example, for a classification problem a general
@ -38,7 +41,8 @@ we can categorize the error into:
4. If you are using PTQ, consider using QAT to recover some of the accuracy loss
from quantization.
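To make the QAT tip concrete, here is a minimal eager-mode QAT sketch; the toy `M` module, the `"x86"` backend string, and the elided fine-tuning loop are illustrative assumptions, not part of this change:
```python
import torch
import torch.nn as nn
from torch.ao.quantization import QuantStub, DeQuantStub, get_default_qat_qconfig, prepare_qat, convert

class M(nn.Module):  # hypothetical toy model
    def __init__(self):
        super().__init__()
        self.quant, self.dequant = QuantStub(), DeQuantStub()
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

model = M().train()
model.qconfig = get_default_qat_qconfig("x86")
prepare_qat(model, inplace=True)   # insert fake-quant and observer modules

# ... fine-tune for a few epochs with the usual training loop ...

quantized_model = convert(model.eval())  # swap in quantized modules
```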
### Int8 quantization tips
Int8 quantization tips
^^^^^^^^^^^^^^^^^^^^^^
1. If you are using per-tensor weight quantization, consider using per-channel
weight quantization.
@ -48,7 +52,8 @@ we can categorize the error into:
If this variation is high, the layer may be suitable for dynamic quantization
but not static quantization.
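As a quick way to see the effect of per-channel weight quantization, one can compare the round-trip error of both schemes with the `torch.quantize_per_*` functions; the toy weight below is an assumption:
```python
import torch

# Toy conv weight whose output channels have very different magnitudes.
weight = torch.randn(16, 3, 3, 3) * torch.logspace(-2, 0, 16).reshape(16, 1, 1, 1)

# Per-tensor symmetric int8: a single scale for the whole tensor.
scale = weight.abs().max() / 127
q_per_tensor = torch.quantize_per_tensor(weight, float(scale), 0, torch.qint8)

# Per-channel symmetric int8: one scale per output channel (axis 0).
scales = weight.abs().amax(dim=(1, 2, 3)) / 127
zero_points = torch.zeros(16, dtype=torch.int64)
q_per_channel = torch.quantize_per_channel(weight, scales, zero_points, 0, torch.qint8)

print("per-tensor  error:", (q_per_tensor.dequantize() - weight).abs().mean().item())
print("per-channel error:", (q_per_channel.dequantize() - weight).abs().mean().item())
```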
## Data sensitive error
Data sensitive error
~~~~~~~~~~~~~~~~~~~~
If you are using static quantization and a small portion of your input data is
resulting in high quantization error, you can try:
@ -60,7 +65,8 @@ resulting in high quantization error, you can try:
the observer settings to choose a better scale and zero_point.
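One hedged example of adjusting observer settings is to swap the default min/max activation observer for a histogram-based one, which is usually more robust to outliers; the stand-in model below is illustrative:
```python
import torch
import torch.nn as nn
from torch.ao.quantization import QConfig, HistogramObserver, default_per_channel_weight_observer

float_model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())  # stand-in for your model

# Let a histogram-based observer pick scale/zero_point for activations.
float_model.qconfig = QConfig(
    activation=HistogramObserver.with_args(reduce_range=True),
    weight=default_per_channel_weight_observer,
)
# ...then re-run prepare -> calibrate -> convert as usual.
```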
## Implementation error
Implementation error
~~~~~~~~~~~~~~~~~~~~
If you are using PyTorch quantization with your own backend
you may see differences between the reference implementation of an
@ -74,23 +80,19 @@ operation (such as ``dequant -> op_fp32 -> quant``) and the quantized implementa
2. the kernel on the target hardware has an accuracy issue. In this case, reach
out to the kernel developer.
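A rough template for checking a single op against the ``dequant -> op_fp32 -> quant`` reference, sketched here with quantized ReLU and arbitrary quantization parameters:
```python
import torch

x = torch.randn(4, 8)
scale, zero_point = 0.05, 64
qx = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8)

# Quantized kernel under test.
q_out = torch.relu(qx)

# Reference path: dequant -> fp32 op -> quant with the same qparams.
ref = torch.quantize_per_tensor(torch.relu(qx.dequantize()), scale, zero_point, torch.quint8)

max_diff = (q_out.int_repr().int() - ref.int_repr().int()).abs().max()
print("max integer difference vs reference:", max_diff.item())  # ideally 0 or 1
```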
## Numerical Debugging Tooling (prototype)
Numerical Debugging Tooling (prototype)
---------------------------------------
```{eval-rst}
.. toctree::
:hidden:
torch.ao.ns._numeric_suite
torch.ao.ns._numeric_suite_fx
```
```{warning}
Numerical debugging tooling is an early prototype and subject to change.
```
.. warning ::
    Numerical debugging tooling is an early prototype and subject to change.
```{eval-rst}
* :ref:`torch_ao_ns_numeric_suite`
Eager mode numeric suite
* :ref:`torch_ao_ns_numeric_suite_fx`
FX numeric suite
```


@ -1,4 +1,5 @@
# Quantization Backend Configuration
Quantization Backend Configuration
----------------------------------
FX Graph Mode Quantization allows the user to configure various
quantization behaviors of an op in order to match the expectation
@ -7,13 +8,13 @@ of their backend.
In the future, this document will contain a detailed spec of
these configurations.
## Default values for native configurations
Default values for native configurations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Below is the output of the configuration for quantization of ops
in x86 and qnnpack (PyTorch's default quantized backends).
Results:
```{eval-rst}
.. literalinclude:: scripts/quantization_backend_configs/default_backend_config.txt
```
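The same information can be inspected programmatically; a minimal sketch (the exact text shown in the docs is produced by a build script, and the `configs` property below is assumed from the current `BackendConfig` API):
```python
from torch.ao.quantization.backend_config import get_native_backend_config

# BackendConfig describing how ops are quantized on the native x86/qnnpack backends.
backend_config = get_native_backend_config()
print(len(backend_config.configs))  # number of op patterns with quantization support
```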


@ -1,16 +1,16 @@
# Quantization API Reference
Quantization API Reference
-------------------------------
## torch.ao.quantization
torch.ao.quantization
~~~~~~~~~~~~~~~~~~~~~
This module contains Eager mode quantization APIs.
```{eval-rst}
.. currentmodule:: torch.ao.quantization
```
### Top level APIs
Top level APIs
^^^^^^^^^^^^^^
```{eval-rst}
.. autosummary::
:toctree: generated
:nosignatures:
@ -22,11 +22,10 @@ This module contains Eager mode quantization APIs.
prepare
prepare_qat
convert
```
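For orientation, a minimal post-training static quantization flow with these top-level APIs might look as follows; the toy module, `"x86"` backend string, and random calibration data are placeholders:
```python
import torch
import torch.nn as nn
from torch.ao.quantization import QuantStub, DeQuantStub, get_default_qconfig, prepare, convert

class M(nn.Module):  # hypothetical float model
    def __init__(self):
        super().__init__()
        self.quant, self.dequant = QuantStub(), DeQuantStub()
        self.conv = nn.Conv2d(3, 8, 3)

    def forward(self, x):
        return self.dequant(self.conv(self.quant(x)))

model = M().eval()
model.qconfig = get_default_qconfig("x86")
prepare(model, inplace=True)               # insert observers

for _ in range(8):                         # calibrate on representative data
    model(torch.randn(1, 3, 32, 32))

convert(model, inplace=True)               # swap in quantized modules
print(model.conv)                          # now a quantized Conv2d
```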
### Preparing model for quantization
Preparing model for quantization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```{eval-rst}
.. autosummary::
:toctree: generated
:nosignatures:
@ -37,11 +36,10 @@ This module contains Eager mode quantization APIs.
DeQuantStub
QuantWrapper
add_quant_dequant
```
### Utility functions
Utility functions
^^^^^^^^^^^^^^^^^
```{eval-rst}
.. autosummary::
:toctree: generated
:nosignatures:
@ -50,17 +48,15 @@ This module contains Eager mode quantization APIs.
swap_module
propagate_qconfig_
default_eval_fn
```
## torch.ao.quantization.quantize_fx
torch.ao.quantization.quantize_fx
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This module contains FX graph mode quantization APIs (prototype).
```{eval-rst}
.. currentmodule:: torch.ao.quantization.quantize_fx
```
```{eval-rst}
.. autosummary::
:toctree: generated
:nosignatures:
@ -70,17 +66,14 @@ This module contains FX graph mode quantization APIs (prototype).
prepare_qat_fx
convert_fx
fuse_fx
```
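A short sketch of the FX graph mode flow with these APIs; the example model and the `"x86"` backend string are assumptions:
```python
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

float_model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).eval()
example_inputs = (torch.randn(1, 3, 32, 32),)

qconfig_mapping = get_default_qconfig_mapping("x86")
prepared = prepare_fx(float_model, qconfig_mapping, example_inputs)  # insert observers

prepared(*example_inputs)          # calibration pass(es)
quantized = convert_fx(prepared)   # lower to a quantized GraphModule
```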
## torch.ao.quantization.qconfig_mapping
torch.ao.quantization.qconfig_mapping
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This module contains QConfigMapping for configuring FX graph mode quantization.
```{eval-rst}
.. currentmodule:: torch.ao.quantization.qconfig_mapping
```
```{eval-rst}
.. autosummary::
:toctree: generated
:nosignatures:
@ -89,19 +82,16 @@ This module contains QConfigMapping for configuring FX graph mode quantization.
QConfigMapping
get_default_qconfig_mapping
get_default_qat_qconfig_mapping
```
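As a small illustration, a `QConfigMapping` can be refined per object type or per module name; the `"head.classifier"` name below is hypothetical:
```python
import torch
from torch.ao.quantization import QConfigMapping, get_default_qconfig, default_dynamic_qconfig

qconfig_mapping = (
    QConfigMapping()
    .set_global(get_default_qconfig("x86"))                   # default for everything
    .set_object_type(torch.nn.LSTM, default_dynamic_qconfig)  # dynamic quantization for LSTMs
    .set_module_name("head.classifier", None)                 # skip this submodule
)
```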
## torch.ao.quantization.backend_config
torch.ao.quantization.backend_config
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This module contains BackendConfig, a config object that defines how quantization is supported
in a backend. Currently only used by FX Graph Mode Quantization, but we may extend Eager Mode
Quantization to work with this as well.
```{eval-rst}
.. currentmodule:: torch.ao.quantization.backend_config
```
```{eval-rst}
.. autosummary::
:toctree: generated
:nosignatures:
@ -112,17 +102,15 @@ Quantization to work with this as well.
DTypeConfig
DTypeWithConstraints
ObservationType
```
## torch.ao.quantization.fx.custom_config
torch.ao.quantization.fx.custom_config
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This module contains a few CustomConfig classes that are used in both eager mode and FX graph mode quantization.
```{eval-rst}
.. currentmodule:: torch.ao.quantization.fx.custom_config
```
```{eval-rst}
.. currentmodule:: torch.ao.quantization.fx.custom_config
.. autosummary::
:toctree: generated
:nosignatures:
@ -132,62 +120,48 @@ This module contains a few CustomConfig classes that's used in both eager mode a
PrepareCustomConfig
ConvertCustomConfig
StandaloneModuleConfigEntry
```
## torch.ao.quantization.quantizer
torch.ao.quantization.quantizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```{eval-rst}
.. automodule:: torch.ao.quantization.quantizer
```
## torch.ao.quantization.pt2e (quantization in pytorch 2.0 export implementation)
torch.ao.quantization.pt2e (quantization in pytorch 2.0 export implementation)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```{eval-rst}
.. automodule:: torch.ao.quantization.pt2e
.. automodule:: torch.ao.quantization.pt2e.representation
```
## torch.ao.quantization.pt2e.export_utils
torch.ao.quantization.pt2e.export_utils
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```{eval-rst}
.. currentmodule:: torch.ao.quantization.pt2e.export_utils
```
```{eval-rst}
.. autosummary::
:toctree: generated
:nosignatures:
:template: classtemplate.rst
model_is_exported
```
```{eval-rst}
.. currentmodule:: torch.ao.quantization
```
## torch.ao.quantization.pt2e.lowering
torch.ao.quantization.pt2e.lowering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```{eval-rst}
.. currentmodule:: torch.ao.quantization.pt2e.lowering
```
```{eval-rst}
.. autosummary::
:toctree: generated
:nosignatures:
:template: classtemplate.rst
lower_pt2e_quantized_to_x86
```
```{eval-rst}
.. currentmodule:: torch.ao.quantization
```
## PT2 Export (pt2e) Numeric Debugger
```{eval-rst}
PT2 Export (pt2e) Numeric Debugger
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: generated
:nosignatures:
@ -199,17 +173,14 @@ This module contains a few CustomConfig classes that's used in both eager mode a
prepare_for_propagation_comparison
extract_results_from_loggers
compare_results
```
## torch (quantization related functions)
torch (quantization related functions)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This describes the quantization related functions of the `torch` namespace.
```{eval-rst}
.. currentmodule:: torch
```
```{eval-rst}
.. autosummary::
:toctree: generated
:nosignatures:
@ -218,18 +189,15 @@ This describes the quantization related functions of the `torch` namespace.
quantize_per_tensor
quantize_per_channel
dequantize
```
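A tiny round trip with these functions, using arbitrary quantization parameters:
```python
import torch

x = torch.tensor([-1.0, 0.0, 0.5, 2.0])

qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=10, dtype=torch.quint8)
print(qx)                    # quantized values with scale/zero_point attached
print(qx.int_repr())         # raw uint8 storage
print(torch.dequantize(qx))  # back to float, with quantization error
```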
## torch.Tensor (quantization related methods)
torch.Tensor (quantization related methods)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Quantized Tensors support a limited subset of data manipulation methods of the
regular full-precision tensor.
```{eval-rst}
.. currentmodule:: torch.Tensor
```
```{eval-rst}
.. autosummary::
:toctree: generated
:nosignatures:
@ -262,18 +230,16 @@ regular full-precision tensor.
resize_
sort
topk
```
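A few of these methods exercised on a quantized tensor (a sketch):
```python
import torch

qx = torch.quantize_per_tensor(torch.randn(3, 4), 0.05, 0, torch.qint8)

print(qx.is_quantized)              # True
print(qx.q_scale(), qx.q_zero_point())
print(qx.dequantize().dtype)        # torch.float32
print(qx.topk(2).values)            # topk works directly on the quantized tensor
```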
## torch.ao.quantization.observer
torch.ao.quantization.observer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This module contains observers which are used to collect statistics about
the values observed during calibration (PTQ) or training (QAT).
```{eval-rst}
.. currentmodule:: torch.ao.quantization.observer
```
```{eval-rst}
.. autosummary::
:toctree: generated
:nosignatures:
@ -310,18 +276,15 @@ the values observed during calibration (PTQ) or training (QAT).
TorchAODType
ZeroPointDomain
get_block_size
```
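Observers can also be exercised standalone, which helps show what calibration records; a sketch with `MinMaxObserver`:
```python
import torch
from torch.ao.quantization.observer import MinMaxObserver

obs = MinMaxObserver(dtype=torch.quint8, qscheme=torch.per_tensor_affine)

for _ in range(4):             # "calibration": just feed tensors through
    obs(torch.randn(8, 8))

scale, zero_point = obs.calculate_qparams()
print(obs.min_val, obs.max_val)  # observed range
print(scale, zero_point)         # derived quantization parameters
```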
## torch.ao.quantization.fake_quantize
torch.ao.quantization.fake_quantize
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This module implements modules which are used to perform fake quantization
during QAT.
```{eval-rst}
.. currentmodule:: torch.ao.quantization.fake_quantize
```
```{eval-rst}
.. autosummary::
:toctree: generated
:nosignatures:
@ -342,18 +305,15 @@ during QAT.
enable_fake_quant
disable_observer
enable_observer
```
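A sketch of fake quantization applied directly to a tensor; the observer choice and quantization range are illustrative:
```python
import torch
from torch.ao.quantization import (
    FakeQuantize, MovingAverageMinMaxObserver, disable_observer, enable_fake_quant,
)

fq = FakeQuantize(observer=MovingAverageMinMaxObserver,
                  quant_min=0, quant_max=255, dtype=torch.quint8)

x = torch.randn(4, 4)
y = fq(x)                   # still float, but snapped to the int8 grid
print((x - y).abs().max())  # simulated quantization error

# On a QAT-prepared model, these helpers flip all fake-quant/observer submodules:
# qat_model.apply(disable_observer)
# qat_model.apply(enable_fake_quant)
```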
## torch.ao.quantization.qconfig
torch.ao.quantization.qconfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This module defines `QConfig` objects which are used
to configure quantization settings for individual ops.
```{eval-rst}
.. currentmodule:: torch.ao.quantization.qconfig
```
```{eval-rst}
.. autosummary::
:toctree: generated
:nosignatures:
@ -372,23 +332,17 @@ to configure quantization settings for individual ops.
default_weight_only_qconfig
default_activation_only_qconfig
default_qat_qconfig_v2
```
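A `QConfig` simply pairs observer factories for activations and weights; a hand-built example with illustrative settings:
```python
import torch
from torch.ao.quantization import QConfig, MinMaxObserver, PerChannelMinMaxObserver

my_qconfig = QConfig(
    activation=MinMaxObserver.with_args(dtype=torch.quint8, qscheme=torch.per_tensor_affine),
    weight=PerChannelMinMaxObserver.with_args(dtype=torch.qint8, qscheme=torch.per_channel_symmetric),
)

# Attach to a module (eager mode) or use it inside a QConfigMapping (FX mode).
# float_model.qconfig = my_qconfig
print(my_qconfig)
```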
## torch.ao.nn.intrinsic
```{eval-rst}
torch.ao.nn.intrinsic
~~~~~~~~~~~~~~~~~~~~~
.. automodule:: torch.ao.nn.intrinsic
.. automodule:: torch.ao.nn.intrinsic.modules
```
This module implements the combined (fused) modules conv + relu which can
then be quantized.
```{eval-rst}
.. currentmodule:: torch.ao.nn.intrinsic
```
```{eval-rst}
.. autosummary::
:toctree: generated
:nosignatures:
@ -406,23 +360,18 @@ then be quantized.
ConvBnReLU3d
BNReLU2d
BNReLU3d
```
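These fused modules are normally produced by `fuse_modules` rather than constructed by hand; a brief sketch with a toy model:
```python
import torch
import torch.nn as nn
from torch.ao.quantization import fuse_modules

class M(nn.Module):  # hypothetical conv-bn-relu block
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)
        self.bn = nn.BatchNorm2d(8)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

m = M().eval()
fused = fuse_modules(m, [["conv", "bn", "relu"]])
print(type(fused.conv))  # typically a ConvReLU2d, with the BatchNorm folded in
```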
## torch.ao.nn.intrinsic.qat
```{eval-rst}
torch.ao.nn.intrinsic.qat
~~~~~~~~~~~~~~~~~~~~~~~~~
.. automodule:: torch.ao.nn.intrinsic.qat
.. automodule:: torch.ao.nn.intrinsic.qat.modules
```
This module implements the versions of those fused operations needed for
quantization aware training.
```{eval-rst}
.. currentmodule:: torch.ao.nn.intrinsic.qat
```
```{eval-rst}
.. autosummary::
:toctree: generated
:nosignatures:
@ -439,24 +388,19 @@ quantization aware training.
ConvReLU3d
update_bn_stats
freeze_bn_stats
```
## torch.ao.nn.intrinsic.quantized
```{eval-rst}
torch.ao.nn.intrinsic.quantized
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. automodule:: torch.ao.nn.intrinsic.quantized
.. automodule:: torch.ao.nn.intrinsic.quantized.modules
```
This module implements the quantized implementations of fused operations
like conv + relu. There are no BatchNorm variants, as BatchNorm is usually folded
into the convolution for inference.
```{eval-rst}
.. currentmodule:: torch.ao.nn.intrinsic.quantized
```
```{eval-rst}
.. autosummary::
:toctree: generated
:nosignatures:
@ -468,47 +412,35 @@ for inference.
ConvReLU2d
ConvReLU3d
LinearReLU
```
## torch.ao.nn.intrinsic.quantized.dynamic
```{eval-rst}
torch.ao.nn.intrinsic.quantized.dynamic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. automodule:: torch.ao.nn.intrinsic.quantized.dynamic
.. automodule:: torch.ao.nn.intrinsic.quantized.dynamic.modules
```
This module implements the quantized dynamic implementations of fused operations
like linear + relu.
```{eval-rst}
.. currentmodule:: torch.ao.nn.intrinsic.quantized.dynamic
```
```{eval-rst}
.. autosummary::
:toctree: generated
:nosignatures:
:template: classtemplate.rst
LinearReLU
```
## torch.ao.nn.qat
```{eval-rst}
torch.ao.nn.qat
~~~~~~~~~~~~~~~~~~~~~~
.. automodule:: torch.ao.nn.qat
.. automodule:: torch.ao.nn.qat.modules
```
This module implements versions of the key nn modules **Conv2d()** and
**Linear()** which run in FP32 but with rounding applied to simulate the
effect of INT8 quantization.
```{eval-rst}
.. currentmodule:: torch.ao.nn.qat
```
```{eval-rst}
.. autosummary::
:toctree: generated
:nosignatures:
@ -517,48 +449,36 @@ effect of INT8 quantization.
Conv2d
Conv3d
Linear
```
## torch.ao.nn.qat.dynamic
```{eval-rst}
torch.ao.nn.qat.dynamic
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. automodule:: torch.ao.nn.qat.dynamic
.. automodule:: torch.ao.nn.qat.dynamic.modules
```
This module implements versions of the key nn modules such as **Linear()**
which run in FP32 but with rounding applied to simulate the effect of INT8
quantization and will be dynamically quantized during inference.
```{eval-rst}
.. currentmodule:: torch.ao.nn.qat.dynamic
```
```{eval-rst}
.. autosummary::
:toctree: generated
:nosignatures:
:template: classtemplate.rst
Linear
```
## torch.ao.nn.quantized
```{eval-rst}
torch.ao.nn.quantized
~~~~~~~~~~~~~~~~~~~~~~
.. automodule:: torch.ao.nn.quantized
:noindex:
.. automodule:: torch.ao.nn.quantized.modules
```
This module implements the quantized versions of the nn layers such as
`~torch.nn.Conv2d` and `torch.nn.ReLU`.
```{eval-rst}
.. currentmodule:: torch.ao.nn.quantized
```
```{eval-rst}
.. autosummary::
:toctree: generated
:nosignatures:
@ -588,25 +508,17 @@ This module implements the quantized versions of the nn layers such as
InstanceNorm1d
InstanceNorm2d
InstanceNorm3d
```
## torch.ao.nn.quantized.functional
```{eval-rst}
torch.ao.nn.quantized.functional
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. automodule:: torch.ao.nn.quantized.functional
```
```{eval-rst}
This module implements the quantized versions of the functional layers such as
`~torch.nn.functional.conv2d` and `torch.nn.functional.relu`. Note:
:meth:`~torch.nn.functional.relu` supports quantized inputs.
```
:meth:`~torch.nn.functional.relu` supports quantized inputs.
```{eval-rst}
.. currentmodule:: torch.ao.nn.quantized.functional
```
```{eval-rst}
.. autosummary::
:toctree: generated
:nosignatures:
@ -634,19 +546,16 @@ This module implements the quantized versions of the functional layers such as
upsample
upsample_bilinear
upsample_nearest
```
## torch.ao.nn.quantizable
torch.ao.nn.quantizable
~~~~~~~~~~~~~~~~~~~~~~~
This module implements the quantizable versions of some of the nn layers.
These modules can be used in conjunction with the custom module mechanism,
by providing the ``custom_module_config`` argument to both prepare and convert.
```{eval-rst}
.. currentmodule:: torch.ao.nn.quantizable
```
```{eval-rst}
.. autosummary::
:toctree: generated
:nosignatures:
@ -654,24 +563,19 @@ by providing the ``custom_module_config`` argument to both prepare and convert.
LSTM
MultiheadAttention
```
## torch.ao.nn.quantized.dynamic
```{eval-rst}
torch.ao.nn.quantized.dynamic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. automodule:: torch.ao.nn.quantized.dynamic
.. automodule:: torch.ao.nn.quantized.dynamic.modules
```
Dynamically quantized {class}`~torch.nn.Linear`, {class}`~torch.nn.LSTM`,
{class}`~torch.nn.LSTMCell`, {class}`~torch.nn.GRUCell`, and
{class}`~torch.nn.RNNCell`.
Dynamically quantized :class:`~torch.nn.Linear`, :class:`~torch.nn.LSTM`,
:class:`~torch.nn.LSTMCell`, :class:`~torch.nn.GRUCell`, and
:class:`~torch.nn.RNNCell`.
```{eval-rst}
.. currentmodule:: torch.ao.nn.quantized.dynamic
```
```{eval-rst}
.. autosummary::
:toctree: generated
:nosignatures:
@ -683,9 +587,9 @@ Dynamically quantized {class}`~torch.nn.Linear`, {class}`~torch.nn.LSTM`,
RNNCell
LSTMCell
GRUCell
```
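These modules are usually created for you by the top-level dynamic quantization helper; a minimal sketch:
```python
import torch
import torch.nn as nn

float_model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))

# Weights are quantized ahead of time; activations are quantized on the fly.
dq_model = torch.ao.quantization.quantize_dynamic(
    float_model, {nn.Linear}, dtype=torch.qint8
)
print(dq_model[0])  # a dynamically quantized Linear module
```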
## Quantized dtypes and quantization schemes
Quantized dtypes and quantization schemes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Note that operator implementations currently only
support per channel quantization for weights of the **conv** and **linear**
@ -693,7 +597,6 @@ operators. Furthermore, the input data is
mapped linearly to the quantized data and vice versa
as follows:
```{eval-rst}
.. math::
\begin{aligned}
@ -702,15 +605,11 @@ as follows:
\text{Dequantization:}&\\
&x_\text{out} = (Q_\text{input}-z)*s
\end{aligned}
```
```{eval-rst}
where :math:`\text{clamp}(.)` is the same as :func:`~torch.clamp` while the
scale :math:`s` and zero point :math:`z` are then computed
as described in :class:`~torch.ao.quantization.observer.MinMaxObserver`, specifically:
```
```{eval-rst}
.. math::
\begin{aligned}
@ -726,7 +625,6 @@ as described in :class:`~torch.ao.quantization.observer.MinMaxObserver`, specifi
\left( Q_\text{max} - Q_\text{min} \right ) \\
&z = Q_\text{min} - \text{round}(x_\text{min} / s)
\end{aligned}
```
where :math:`[x_\text{min}, x_\text{max}]` denotes the range of the input data while
:math:`Q_\text{min}` and :math:`Q_\text{max}` are respectively the minimum and maximum values of the quantized dtype.
@ -737,7 +635,6 @@ the range of the input data or symmetric quantization is being used.
Additional data types and quantization schemes can be implemented through
the `custom operator mechanism <https://pytorch.org/tutorials/advanced/torch_script_custom_ops.html>`_.
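As a small numeric check of the affine formulas above, the following sketch computes the scale and zero point from an observed range for the asymmetric `quint8` case and verifies the round trip; the input values are arbitrary:
```python
import torch

x = torch.tensor([-1.5, -0.3, 0.0, 0.7, 2.1])
qmin, qmax = 0, 255                        # torch.quint8 range

# Include 0 in the observed range so that zero maps exactly to an integer.
x_min, x_max = min(x.min().item(), 0.0), max(x.max().item(), 0.0)
scale = (x_max - x_min) / (qmax - qmin)
zero_point = qmin - round(x_min / scale)

qx = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8)
print(scale, zero_point)
print(qx.int_repr())    # clamp(round(x / scale + zero_point), qmin, qmax)
print(qx.dequantize())  # (q - zero_point) * scale
```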
```{eval-rst}
* :attr:`torch.qscheme` — Type to describe the quantization scheme of a tensor.
Supported types:
@ -751,9 +648,8 @@ the `custom operator mechanism <https://pytorch.org/tutorials/advanced/torch_scr
* :attr:`torch.quint8` — 8-bit unsigned integer
* :attr:`torch.qint8` — 8-bit signed integer
* :attr:`torch.qint32` — 32-bit signed integer
```
```{eval-rst}
.. These modules are missing docs. Adding them here only for tracking
.. automodule:: torch.ao.nn.quantizable.modules
:noindex:
@ -782,4 +678,3 @@ the `custom operator mechanism <https://pytorch.org/tutorials/advanced/torch_scr
.. automodule:: torch.nn.quantized.dynamic.modules
.. automodule:: torch.quantization
.. automodule:: torch.nn.intrinsic.modules
```

File diff suppressed because it is too large


@ -1,10 +1,7 @@
# torch.random
torch.random
===================================
```{eval-rst}
.. currentmodule:: torch.random
```
```{eval-rst}
.. automodule:: torch.random
:members:
```