Commit Graph

1153 Commits

Scott Wolchok
e2b94923ba [PyTorch] Speed up decomposed quantize_per_channel (#133029)
Similar to D60871396 (#132828).

Differential Revision: [D60978385](https://our.internmc.facebook.com/intern/diff/D60978385/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133029
Approved by: https://github.com/cccclai
2024-08-08 23:48:34 +00:00
Scott Wolchok
eeb6ad0744 [quant] Speed up dequantize_per_channel (#132828)
Tensor-wise operations are much faster than looping over tensor elements. Rewrite the loop in dequantize_per_channel to use whole-Tensor operations accordingly.
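
For illustration, a minimal sketch of the idea (the helper below is hypothetical, not the actual kernel): broadcast the per-channel scales and zero points instead of looping over channels.

```python
import torch

# Hypothetical sketch: reshape per-channel qparams so they broadcast along
# `axis`, replacing a Python loop over channels with whole-Tensor math.
def dequantize_per_channel_vectorized(q, scales, zero_points, axis):
    shape = [1] * q.dim()
    shape[axis] = -1
    scales = scales.reshape(shape).to(torch.float32)
    zero_points = zero_points.reshape(shape).to(torch.float32)
    return (q.to(torch.float32) - zero_points) * scales
```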

Differential Revision: [D60871396](https://our.internmc.facebook.com/intern/diff/D60871396/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132828
Approved by: https://github.com/cccclai
2024-08-08 16:44:41 +00:00
Shangdi Yu
825002c9c6 [export][fx] More robust DCE pass (#132764)
Summary:
- Make the default DCE pass check the schema.
- Need to rebase onto https://github.com/pytorch/pytorch/pull/131651 after it's in Phabricator (for now the change is manually added).
- Mark Proxy dump as NotImplemented for a better error message.
- Remove Proxy from tensors when dumping models, as Proxy cannot be dumped.
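
As a sketch of the first bullet (assuming a plain FX graph; this is not the actual pass), a schema-aware DCE would skip ops whose schema marks them as mutating:

```python
from torch.fx import Graph, Node

def _is_impure(node: Node) -> bool:
    # treat a call as impure if its operator schema marks any argument mutable
    schema = getattr(node.target, "_schema", None)
    return schema is not None and schema.is_mutable

def dce_with_schema_check(graph: Graph) -> None:
    for node in reversed(list(graph.nodes)):
        if node.op == "call_function" and not node.users and not _is_impure(node):
            graph.erase_node(node)
```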

More details in https://docs.google.com/document/d/1G5vmTXjzxoyVGRI2kpA1gQukK_Glyg2NrE0Oh6Nlg9A/edit?usp=sharing.

Test Plan:
CI
```
- buck2 run 'fbcode//mode/dev-nosan'  fbcode//caffe2/test/quantization:test_quantization -- -r  qat_conv2d
- test_export.py
- buck2 run 'fbcode//mode/dev-nosan' fbcode//modai/test:test_modai -- -r test_qat_stinson_htp_export
- buck2 run 'fbcode//mode/dev-nosan' fbcode//vizard_projects/ml_depth/tests:test_model -- -r test_qat_model_et
- buck2 run 'fbcode//mode/dev-nosan'  fbcode//caffe2/test:fx -- -r dce
- buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=False,use_3d_input=False
- buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=True,use_3d_input=False
- buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r  test_fold_bn_erases_bn_node
```

Reviewed By: angelayi

Differential Revision: D60319175

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132764
Approved by: https://github.com/angelayi
2024-08-06 22:27:22 +00:00
Max Ren
81a5a7a30a [Quantizer] Fix getattr for quantizing constants (#132705)
Mobilebert quantization was failing because there were embedding constants that could not be accessed through getattr().

It seems we have to search the submodules for the embeddings, which we do here. This is just to help get around looking at unlifted attrs to check whether they are large scalars.
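
A hypothetical sketch of that fallback (the helper name is made up, not the actual patch):

```python
# Fall back to searching submodules when getattr() on the top-level module
# misses a constant such as an embedding weight.
def find_constant_attr(model, target):
    if hasattr(model, target):
        return getattr(model, target)
    for submodule in model.modules():
        if hasattr(submodule, target):
            return getattr(submodule, target)
    return None
```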

Differential Revision: [D60492338](https://our.internmc.facebook.com/intern/diff/D60492338/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132705
Approved by: https://github.com/jerryzh168
ghstack dependencies: #132704
2024-08-06 18:16:27 +00:00
Max Ren
1de4ebc85d [Quantizer] Fix Maxpool2d share q params (#132704)
There seems to be a bug in the code for sharing qparams for maxpool2d. This case occurs when output_node = maxpool_node; when this happens we overwrite the node's "quantization_annotation" metadata. This fix ensures that qparams are indeed shared across input and output.
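
A hypothetical sketch of the intended behavior (not the actual patch), using a shared qspec and merging into any existing annotation instead of overwriting it:

```python
from torch.ao.quantization.quantizer import (
    QuantizationAnnotation,
    SharedQuantizationSpec,
)

def annotate_maxpool2d_shared(maxpool_node, input_act):
    # share one qspec between the maxpool input and its output
    shared = SharedQuantizationSpec((input_act, maxpool_node))
    annotation = maxpool_node.meta.setdefault(
        "quantization_annotation", QuantizationAnnotation()
    )
    annotation.input_qspec_map[input_act] = shared
    annotation.output_qspec = shared
    annotation._annotated = True
```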

Differential Revision: [D60492341](https://our.internmc.facebook.com/intern/diff/D60492341/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132704
Approved by: https://github.com/jerryzh168
2024-08-06 18:13:16 +00:00
andrewor14
fc7849b93f [pt2e][quant] Ensure BN node is erased after convert (#131651)
Summary: Previously, when folding BN into conv, we rely on DCE
to clean up the unused BN node from the graph. This works if
the model is already in eval mode, but fails if the model is
still in train mode because DCE doesn't remove nodes with
potential side effects (in this case `_native_batch_norm_legit`).
This required users to move the model to eval mode before calling
convert in order to get a properly DCE'd graph.

To solve this, we manually erase the BN node after folding
instead of relying on DCE. This relaxes the ordering constraints
between `move_exported_model_to_eval` and `convert_pt2e`.
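
A minimal sketch of the manual erase, assuming the usual FX layout where one node carries the BN output (e.g. the getitem on the BN tuple); this is not the actual pass:

```python
def erase_folded_bn(graph, bn_output_node, conv_node):
    # after folding BN stats into the conv weights, the conv output is
    # equivalent, so redirect users and erase the BN node explicitly
    bn_output_node.replace_all_uses_with(conv_node)
    graph.erase_node(bn_output_node)
    graph.lint()
```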

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn1d.test_fold_bn_erases_bn_node
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn2d.test_fold_bn_erases_bn_node

Reviewers: jerryzh168, yushangdi

Subscribers: jerryzh168, yushangdi, supriyar

Differential Revision: [D60520149](https://our.internmc.facebook.com/intern/diff/D60520149)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131651
Approved by: https://github.com/yushangdi, https://github.com/leslie-fang-intel
2024-08-06 16:37:39 +00:00
Oguz Ulgen
72d2dba992 Add None return type to init (#132335)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132335
Approved by: https://github.com/albanD
2024-08-01 15:26:45 +00:00
Max Ren
d95756f6a5 [Quantizer][Add] Fix add annotation with constant (#132092)
Summary:
Occasionally we run into a partition that looks like this for Add:

```
SourcePartition(nodes=[_constant2, add_2], source=<built-in function add>, input_nodes=[x], output_nodes=[_constant2, add_2], params=[_constant2])
```

In this case we are adding a constant to an input, and reusing the constant later down the line. This causes our constant to be an output in our SourcePartition. The assumption then that:

```
        add_node = add_partition.output_nodes[0]
```
will not necessarily hold. As a result, we must check that the output node is indeed a call function and not a constant.
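
A minimal sketch of that guard (variable names follow the snippet above):

```python
# pick the call_function output instead of assuming output_nodes[0] is the add
add_node = next(
    n for n in add_partition.output_nodes if n.op == "call_function"
)
```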

Test Plan: buck test mode/dev-nosan //executorch/backends/xnnpack/test:test_xnnpack_ops -- test_qs8_add_constant

Differential Revision: D60413221

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132092
Approved by: https://github.com/jerryzh168
2024-08-01 09:57:43 +00:00
Joel Schlosser
e6cddc9271 Fix public API tests (#131386)
This PR fixes a bug in `test_correct_module_names` introduced in #130497. It also addresses post-fix test failures in:
* `torch/ao/quantization/__init__.py` - set the correct `__module__` for several public API helpers
* `torch/library.py` - add `register_vmap` to `__all__`
* `torch/nn/attention/flex_attention.py` - make `round_up_to_multiple` private by prepending an underscore
* `torch/storage.py` - introduce `__all__` to avoid `Self` being re-exported as a public API
* `torch/distributed/pipelining/schedules.py` - add `ZeroBubbleAlgorithm` to `__all__`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131386
Approved by: https://github.com/albanD
2024-07-30 18:42:54 +00:00
PyTorch MergeBot
e73a4cb21f Revert "[pt2e][quant] Ensure BN node is erased after convert (#131651)"
This reverts commit eba2ffd278.

Reverted https://github.com/pytorch/pytorch/pull/131651 on behalf of https://github.com/ZainRizvi due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/131651#issuecomment-2256407968))
2024-07-29 16:42:24 +00:00
PyTorch MergeBot
945bf78894 Revert "[BE] typing for decorators - fx/_compatibility (#131568)"
This reverts commit 193f62fde9.

Reverted https://github.com/pytorch/pytorch/pull/131568 on behalf of https://github.com/clee2000 due to same as https://github.com/pytorch/pytorch/pull/131572#issuecomment-2254328359 but I clicked the wrong link by accident.  This is where it actually starts ([comment](https://github.com/pytorch/pytorch/pull/131568#issuecomment-2254330781))
2024-07-28 03:43:39 +00:00
PyTorch MergeBot
a3ba405871 Revert "[BE] typing for decorators - library (#131570)"
This reverts commit 5731b486c8.

Reverted https://github.com/pytorch/pytorch/pull/131570 on behalf of https://github.com/clee2000 due to same as https://github.com/pytorch/pytorch/pull/131572#issuecomment-2254328359 but I clicked the wrong link by accident.  This is where it actually starts ([comment](https://github.com/pytorch/pytorch/pull/131568#issuecomment-2254330781))
2024-07-28 03:43:39 +00:00
PyTorch MergeBot
609447a626 Revert "[BE] typing for decorators - _jit_internal (#131573)"
This reverts commit f0f20f7e97.

Reverted https://github.com/pytorch/pytorch/pull/131573 on behalf of https://github.com/clee2000 due to breaking lint internally D60265575 ([comment](https://github.com/pytorch/pytorch/pull/131572#issuecomment-2254328359))
2024-07-28 03:29:32 +00:00
PyTorch MergeBot
b1d640a2b7 Revert "[BE] typing for decorators - ao/quantization/quantizer/xnnpack_quantizer_utils (#131577)"
This reverts commit 5ee6a6dacc.

Reverted https://github.com/pytorch/pytorch/pull/131577 on behalf of https://github.com/clee2000 due to breaking lint internally D60265575 ([comment](https://github.com/pytorch/pytorch/pull/131572#issuecomment-2254328359))
2024-07-28 03:29:32 +00:00
PyTorch MergeBot
8f5cf46405 Revert "Fix public API tests (#131386)"
This reverts commit 91fcfd8760.

Reverted https://github.com/pytorch/pytorch/pull/131386 on behalf of https://github.com/clee2000 due to reverting this to revert something else, only action you should need to do is to rebase and merge again, sorry for the churn ([comment](https://github.com/pytorch/pytorch/pull/131386#issuecomment-2254327487))
2024-07-28 03:23:04 +00:00
Joel Schlosser
91fcfd8760 Fix public API tests (#131386)
This PR fixes a bug in `test_correct_module_names` introduced in #130497. It also addresses post-fix test failures in:
* `torch/ao/quantization/__init__.py` - set the correct `__module__` for several public API helpers
* `torch/library.py` - add `register_vmap` to `__all__`
* `torch/nn/attention/flex_attention.py` - make `round_up_to_multiple` private by prepending an underscore
* `torch/storage.py` - introduce `__all__` to avoid `Self` being re-exported as a public API
* `torch/distributed/pipelining/schedules.py` - add `ZeroBubbleAlgorithm` to `__all__`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131386
Approved by: https://github.com/albanD
2024-07-26 23:38:43 +00:00
Sergii Dymchenko
5489ff8e94 Use Mermaid for the diagram in torch/ao/quantization/fx/README.md (#131412)
preview 3a0efcdfa3/torch/ao/quantization/fx/README.md
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131412
Approved by: https://github.com/jerryzh168
2024-07-26 22:01:21 +00:00
Mikayla Gawarecki
1dd10ac802 [BE] [Reland] Make nn.Module state_dict load_state_dict pre-hook and state_dict post-hook public (#131690)
Reland https://github.com/pytorch/pytorch/pull/126704

#### Fixes the issue with type of `nn.Module._state_dict_hooks` being changed in that PR which was problematic:
Instead of using `Tuple(Callable, bool)` to keep track of whether the private `_register_state_dict_hook` or the public `register_state_dict_post_hook` API was used to register the hook and toggle the behavior accordingly, I set an attribute on the Callable in the private API, which is never cleaned up.

If a callable previously registered using the private API is registered via the public API, a RuntimeError will be raised.
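
A hypothetical sketch of that tagging scheme (simplified; not the actual nn.Module code):

```python
def _register_state_dict_hook(self, hook):
    # private API: tag the callable so the public API can detect it later
    hook._from_public_api = False
    self._state_dict_hooks[len(self._state_dict_hooks)] = hook

def register_state_dict_post_hook(self, hook):
    if getattr(hook, "_from_public_api", None) is False:
        raise RuntimeError(
            "Hook was previously registered via _register_state_dict_hook; "
            "it cannot be registered again via the public API"
        )
    hook._from_public_api = True
    self._state_dict_hooks[len(self._state_dict_hooks)] = hook
```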

#### Copied from previous PR description
Fixes https://github.com/pytorch/pytorch/issues/75287 and https://github.com/pytorch/pytorch/issues/117437

- `nn.Module._register_state_dict_hook` --> add public `nn.Module.register_state_dict_post_hook`
   - Add a test as this API was previously untested
- `nn.Module._register_load_state_dict_pre_hook` --> add public `nn.Module.register_load_state_dict_pre_hook` (remove the `with_module` flag and default it to `True`)
    ~- For consistency with optimizer `load_state_dict_pre_hook` raised by @janeyx99, allow the pre-hook to return a new `state_dict`~
 - For the issue pointed out by https://github.com/pytorch/pytorch/issues/117437 regarding the `_register_state_dict_hook` semantic of returning a new state_dict only being respected for the root module for the private hook:
       - Document this for private `_register_state_dict_hook`
       - Remove this for the public `register_state_dict_post_hook`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131690
Approved by: https://github.com/albanD
2024-07-26 18:14:07 +00:00
andrewor14
eba2ffd278 [pt2e][quant] Ensure BN node is erased after convert (#131651)
Summary: Previously, when folding BN into conv, we rely on DCE
to clean up the unused BN node from the graph. This works if
the model is already in eval mode, but fails if the model is
still in train mode because DCE doesn't remove nodes with
potential side effects (in this case `_native_batch_norm_legit`).
This required users to move the model to eval mode before calling
convert in order to get a properly DCE'd graph.

To solve this, we manually erase the BN node after folding
instead of relying on DCE. This relaxes the ordering constraints
between `move_exported_model_to_eval` and `convert_pt2e`.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn1d.test_fold_bn_erases_bn_node
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn2d.test_fold_bn_erases_bn_node

Reviewers: jerryzh168, yushangdi

Subscribers: jerryzh168, yushangdi, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131651
Approved by: https://github.com/yushangdi
2024-07-26 15:30:45 +00:00
Aaron Orenstein
5ee6a6dacc [BE] typing for decorators - ao/quantization/quantizer/xnnpack_quantizer_utils (#131577)
See #131429
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131577
Approved by: https://github.com/oulgen, https://github.com/zou3519
ghstack dependencies: #131568, #131569, #131570, #131571, #131572, #131573, #131574, #131575, #131576
2024-07-25 22:24:19 +00:00
Aaron Orenstein
f0f20f7e97 [BE] typing for decorators - _jit_internal (#131573)
See #131429

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131573
Approved by: https://github.com/oulgen, https://github.com/zou3519
ghstack dependencies: #131568, #131569, #131570, #131571, #131572
2024-07-25 22:24:19 +00:00
Aaron Orenstein
5731b486c8 [BE] typing for decorators - library (#131570)
See #131429

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131570
Approved by: https://github.com/oulgen, https://github.com/zou3519
ghstack dependencies: #131568, #131569
2024-07-25 22:24:19 +00:00
Aaron Orenstein
193f62fde9 [BE] typing for decorators - fx/_compatibility (#131568)
See #131429

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131568
Approved by: https://github.com/justinchuby, https://github.com/oulgen, https://github.com/zou3519
2024-07-25 22:24:19 +00:00
Xuehai Pan
2ce734cee9 [BE] enable UFMT for torch/ao/quantization/ (#128863)
Part of #123062

- #123062

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128863
Approved by: https://github.com/ezyang
ghstack dependencies: #128861, #128862
2024-07-25 04:17:54 +00:00
Aaron Orenstein
5a0068cc69 [BE] mypy: disallow untyped decorators (#131428)
Untyped decorators strip the types from their decorated function so even if the underlying function is fully typed then callers to it don't get any benefit from type annotations.
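
For illustration only (not PyTorch code), a typed decorator built with ParamSpec preserves the wrapped signature for callers:

```python
from typing import Callable, ParamSpec, TypeVar

P = ParamSpec("P")
R = TypeVar("R")

def my_decorator(fn: Callable[P, R]) -> Callable[P, R]:
    # the wrapper keeps the original parameter and return types visible to mypy
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
        return fn(*args, **kwargs)
    return wrapper
```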

Step 1 - Enable the error and override in all the offending files.

#131429

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131428
Approved by: https://github.com/justinchuby, https://github.com/oulgen
2024-07-23 21:50:55 +00:00
kausik
4f60a2e39c Set correct output dtype for dequantize op during convert_pt2e in decomposed mode (#128953)
Earlier the signature of dequantize ops for decomposed quantized Tensor was changed for wider use-cases where the output dtype can be different from torch.float and needs to be passed during dequantization.
Please refer: https://github.com/pytorch/pytorch/pull/121450

However, setting the correct output dtype for dequantize ops was still missing in the convert_pt2e flow.

This change enables users to use the PT2E quantization flow with a non-torch.float unquantized dtype, such as torch.bfloat16.
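
A usage sketch, under the assumption that the keyword added in #121450 is named out_dtype (treat the exact signature as an assumption):

```python
import torch

q = torch.randint(-128, 127, (4,), dtype=torch.int8)
x = torch.ops.quantized_decomposed.dequantize_per_tensor(
    q, 0.1, 0, -128, 127, torch.int8, out_dtype=torch.bfloat16
)
assert x.dtype == torch.bfloat16
```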

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128953
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2024-07-19 04:58:02 +00:00
Jerry Zhang
793b17ebcb Add numeric_debugger top level APIs (#130643)
Summary:
Add three top level APIs for numeric debugger in pt2e flow that can log intermediate output in the model
and calculate summary for metric comparisons between nodes in two graphs

* `prepare_for_propagation_comparison`
* `extract_results_from_loggers`
* `compare_results`
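
A hypothetical usage sketch; the import path and exact signatures below are assumptions based on the API names listed above:

```python
# assumed import location for the three APIs
from torch.ao.quantization.pt2e._numeric_debugger import (
    prepare_for_propagation_comparison,
    extract_results_from_loggers,
    compare_results,
)

# ref_model, quantized_model, and example_inputs are placeholders
ref_logged = prepare_for_propagation_comparison(ref_model)
new_logged = prepare_for_propagation_comparison(quantized_model)
ref_logged(*example_inputs)
new_logged(*example_inputs)
summary = compare_results(
    extract_results_from_loggers(ref_logged),
    extract_results_from_loggers(new_logged),
)
```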

Test Plan:
python test/test_quantization.py -k test_prepare_for_propagation_comparison
python test/test_quantization.py -k test_extract_results_from_loggers

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130643
Approved by: https://github.com/dulinriley, https://github.com/tarun292
2024-07-18 20:54:18 +00:00
Jerry Zhang
b893aa71ca Rename generate_numeric_debug_handle to numeric_debugger (#130590)
Summary:
As titled.

Test Plan:
CI

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130590
Approved by: https://github.com/dulinriley, https://github.com/tarun292
2024-07-15 22:42:27 +00:00
Tijmen Blankevoort
bc18863713 Corner-case fix for upscale_histogram in the new HistogramObserver (#130316)
Summary: Small fix to the bucketize function that caused a run-time error in some corner cases.

Test Plan: Unit tests

Differential Revision: D59508432

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130316
Approved by: https://github.com/jerryzh168
2024-07-11 19:49:21 +00:00
Jerry Zhang
df9d1b44e7 Preserve _numeric_debug_handle through deepcopy and re-export (#129287)
Summary:
* Added support for preserving it during deepcopy; we need to remap the args since _numeric_debug_handle refers to the nodes in the graph.

TODO: need to fully support re-export, currently the metadata for output node is not preserved

Test Plan:
python test/test_quantization.py -k test_deepcopy_preserve_handle
python test/test_quantization.py -k test_copy_preserve_handle

all related tests:
python test/test_quantization.py -k TestGenerateNumericDebugHandle

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129287
Approved by: https://github.com/zhxchen17
2024-07-11 02:19:41 +00:00
Jerry Zhang
4c19623800 Change numeric_debug_handle to store per-node id (#129811)
Summary:
Previously we stored the edge id in numeric_debug_handle to support operator fusion and operator decomposition throughout the stack, but according to feedback from customers, people prefer the simpler per-node id; they are fine with not having the additional support for numerical debugging for inputs and are willing to hack around to achieve this.

This PR changes the structure of numeric_debug_handle to store unique_id for each node instead.

e.g.
graph:
```
node = op(input_node, weight_node)
```
Before:
```
node.meta[NUMERIC_DEBUG_HANDLE_KEY] = {input_node: id1, weight_node: id2, "output": id3}
```

After:
```
node.meta[NUMERIC_DEBUG_HANDLE_KEY] = id1
```

Test Plan:
python test/test_quantization.py -k TestGenerateNumericDebugHandle

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129811
Approved by: https://github.com/tarun292
2024-07-08 23:36:19 +00:00
Xia, Weiwen
36e2608783 [Quant][PT2E] enable qlinear post op fusion for dynamic quant & qat (#122667)
**Description**
Add fusion path for dynamic quant and for QAT.
The following patterns can be matched for static quant with QAT cases:
`qx -> qlinear -> add -> optional relu -> optional type convert -> optional quant`

The following patterns can be matched for dynamic quant cases:
`qx -> qlinear -> add -> optional relu`

**Test plan**
python test/inductor/test_mkldnn_pattern_matcher.py -k test_qlinear
python test/inductor/test_cpu_cpp_wrapper.py -k test_qlinear
python test/test_quantization.py -k test_linear_unary
python test/test_quantization.py -k test_linear_binary

Differential Revision: [D57655830](https://our.internmc.facebook.com/intern/diff/D57655830)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122667
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel
2024-07-08 20:04:39 +00:00
PyTorch MergeBot
784e3b4123 Revert "Change numeric_debug_handle to store per-node id (#129811)"
This reverts commit a9a744e442.

Reverted https://github.com/pytorch/pytorch/pull/129811 on behalf of https://github.com/kit1980 due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/129811#issuecomment-2211245852))
2024-07-05 18:14:02 +00:00
Jerry Zhang
a9a744e442 Change numeric_debug_handle to store per-node id (#129811)
Summary:
Previously we stored the edge id in numeric_debug_handle to support operator fusion and operator decomposition throughout the stack, but according to feedback from customers, people prefer the simpler per-node id; they are fine with not having the additional support for numerical debugging for inputs and are willing to hack around to achieve this.

This PR changes the structure of numeric_debug_handle to store unique_id for each node instead.

e.g.
graph:
```
node = op(input_node, weight_node)
```
Before:
```
node.meta[NUMERIC_DEBUG_HANDLE_KEY] = {input_node: id1, weight_node: id2, "output": id3}
```

After:
```
node.meta[NUMERIC_DEBUG_HANDLE_KEY] = id1
```

Test Plan:
python test/test_quantization.py -k TestGenerateNumericDebugHandle

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129811
Approved by: https://github.com/tarun292
2024-07-03 22:03:31 +00:00
Tijmen Blankevoort
e3b3431c42 Fix for HistogramObserver (#129387)
Summary:
There were two problems with the HistogramObserver:
1. It does not work when someone passes a batch_size 1, tensor_size 1 data-point.
2. The Histogram doesn't seem to actually update if the range of the new x falls within the old one

These issues were both fixed.

On top of this, I greatly simplified the logic for the histogram updating. Now, it doesn't do the downsampling anymore, which saves a ton of memory and code. The accuracy can still be controlled with the upsampling ratio. This ratio was also too high for the accuracy we generally need here, I reduced the default for this.
Also the code is cleaner now, much easier to follow what's happening.

test_histogram_observer_same_inputs was likely wrong - if I pass 0s and 1s to my HistogramObserver, I want them to actually count! The current test treats discarding and ignoring these values as acceptable.
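
A small repro sketch of the two cases above (assumed usage of the observer):

```python
import torch
from torch.ao.quantization.observer import HistogramObserver

obs = HistogramObserver()
obs(torch.tensor([0.5]))          # case 1: a single-element data point
obs(torch.tensor([0.0, 1.0]))
obs(torch.tensor([0.2, 0.8]))     # case 2: new range falls inside the old one
scale, zero_point = obs.calculate_qparams()
```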

Test Plan: You can run the included tests.

Differential Revision: D58931336

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129387
Approved by: https://github.com/jerryzh168
2024-07-02 15:41:44 +00:00
Piotr Kluska
9cf8e5dd32 chore(quantization): Enable PT2E symmetric dynamic quantization (#124615)
in `_find_choose_qparams_node` function compare
the current node if it is affine or symmetric
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124615
Approved by: https://github.com/kimishpatel, https://github.com/malfet
2024-06-26 16:14:58 +00:00
Tijmen Blankevoort
ddb95dbb0d Fixing equalize with three things and improving functionality (#124632)
Summary:
(1) Make code work when a first layer does not have a bias.
(2) Make it possible to provide both modules and module names as input
(3) Allow sequences of contiguous layers as input, that then get split into pairs
(4) fix documentation to be more clear on inputs to be provided

Test Plan:
Run this new version of the algorithm on a network and see if it throws errors.

There's also this notebook to run and test N5199827

If you tell me where I can find the tests for this code, I can add some simple unit tests as well.

Differential Revision: D55895862

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124632
Approved by: https://github.com/jerryzh168
2024-06-20 16:55:56 +00:00
yiliu30
4669c6d3ae [quant][pt2e][quantizer] Support set_module_name_qconfig in X86InductorQuantizer (#126044)
Summary:
Added `set_module_name_qconfig` support to allow users to set configurations based on module name in `X86InductorQuantizer`.

For example, only quantize the `sub`:

```python
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(5, 5)
        self.sub = Sub()

    def forward(self, x):
        x = self.linear(x)
        x = self.sub(x)
        return x

m = M().eval()
example_inputs = (torch.randn(3, 5),)
# Set config for a specific submodule.
quantizer = X86InductorQuantizer()
quantizer.set_module_name_qconfig("sub", xiq.get_default_x86_inductor_quantization_config())
```

- Added `set_module_name_qconfig` to allow users to set the configuration at the `module_name` level.
- Unified the annotation process to follow this order:  `module_name_qconfig`, `operator_type_qconfig`, and `global_config`.
- Added `config_checker` to validate all user configurations and prevent mixing of static/dynamic or QAT/non-QAT configs.
- Moved `_get_module_name_filter` from `xnnpack_quantizer.py` into `utils.py` as it is common to all quantizers.

Test Plan

```bash
python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_set_module_name
```

@Xia-Weiwen @leslie-fang-intel  @jgong5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126044
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/jerryzh168
2024-06-14 07:13:10 +00:00
Aaron Gokaslan
93a14aba6e [BE]: Update mypy to 1.10.0 (#127717)
Updates mypy to the latest and greatest.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127717
Approved by: https://github.com/ezyang
2024-06-13 15:57:13 +00:00
PyTorch MergeBot
1d233b8f50 Revert "Make nn.Module state_dict load_state_dict pre-hook and state_dict post hook public (#126704)"
This reverts commit c38b3381a1.

Reverted https://github.com/pytorch/pytorch/pull/126704 on behalf of https://github.com/clee2000 due to broke internal typecheck D58394110 (which probably means the code wouldn't work either but I guess it didn't run on the diff). Probably an easy fix? ([comment](https://github.com/pytorch/pytorch/pull/126704#issuecomment-2161299193))
2024-06-11 17:45:20 +00:00
Mikayla Gawarecki
c38b3381a1 Make nn.Module state_dict load_state_dict pre-hook and state_dict post hook public (#126704)
Fixes https://github.com/pytorch/pytorch/issues/75287 and https://github.com/pytorch/pytorch/issues/117437

- `nn.Module._register_state_dict_hook` --> add public `nn.Module.register_state_dict_post_hook`
   - Add a test as this API was previously untested
- `nn.Module._register_load_state_dict_pre_hook` --> add public `nn.Module.register_load_state_dict_pre_hook` (remove the `with_module` flag and default it to `True`)
    ~- For consistency with optimizer `load_state_dict_pre_hook` raised by @janeyx99, allow the pre-hook to return a new `state_dict`~
 - Document issue pointed out by https://github.com/pytorch/pytorch/issues/117437 regarding `_register_state_dict_hook` semantic of returning a new state_dict only being respected for the root for private hook
       - Remove this for the public `register_state_dict_post_hook`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126704
Approved by: https://github.com/albanD
ghstack dependencies: #126906
2024-06-10 21:50:17 +00:00
Aaron Orenstein
62bcdc0ac9 Flip default value for mypy disallow_untyped_defs [4/11] (#127841)
See #127836 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127841
Approved by: https://github.com/oulgen
2024-06-08 18:36:48 +00:00
Andrea Frittoli
04272a0e12 Add docstring for the torch.ao.quantization.utils.get_combined_dict function (#128127)
Fixes: #127906

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128127
Approved by: https://github.com/jerryzh168
2024-06-06 21:22:09 +00:00
rk7697
72e863df27 Update _learnable_fake_quantize.py (#127993)
Remove sentence "For literature references, please see the class _LearnableFakeQuantizePerTensorOp." and add "s" to "support"

(Possibly) Fixes #99107 (But not sure, sorry)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127993
Approved by: https://github.com/jerryzh168
2024-06-05 20:02:33 +00:00
PyTorch MergeBot
84776d7597 Revert "[BE]: Update mypy to 1.10.0 (#127717)"
This reverts commit 30213ab0a7.

Reverted https://github.com/pytorch/pytorch/pull/127717 on behalf of https://github.com/huydhn due to I am not sure why but the failures look legit and they are showing up in trunk 30213ab0a7 ([comment](https://github.com/pytorch/pytorch/pull/127717#issuecomment-2144183347))
2024-06-03 02:52:47 +00:00
Xuehai Pan
8b08b0f340 [BE] enable ruff rule Q from flake8-quotes (#127713)
Enable [ruff rule `Q`](https://docs.astral.sh/ruff/rules/#flake8-quotes-q) from flake8-quotes. Fixes:

- [avoidable-escaped-quote (Q003)](https://docs.astral.sh/ruff/rules/avoidable-escaped-quote/#avoidable-escaped-quote-q003)
- [unnecessary-escaped-quote (Q004)](https://docs.astral.sh/ruff/rules/unnecessary-escaped-quote/#unnecessary-escaped-quote-q004)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127713
Approved by: https://github.com/ezyang
2024-06-02 23:25:26 +00:00
Aaron Gokaslan
30213ab0a7 [BE]: Update mypy to 1.10.0 (#127717)
Updates mypy to the latest and greatest.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127717
Approved by: https://github.com/ezyang
2024-06-02 21:07:23 +00:00
Xuehai Pan
67ef2683d9 [BE] wrap deprecated function/class with typing_extensions.deprecated (#127689)
Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing.

Note that only warnings whose messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.
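
As a minimal sketch of the annotation style described above (old_api is a placeholder name):

```python
from typing_extensions import deprecated

@deprecated("`old_api` is deprecated, use `new_api` instead", category=FutureWarning)
def old_api() -> None:
    ...
```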

Resolves #126888

- #126888

This PR is split from PR #126898.

- #126898

------

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127689
Approved by: https://github.com/Skylion007
2024-06-02 12:30:43 +00:00
PyTorch MergeBot
033e733021 Revert "[BE] wrap deprecated function/class with typing_extensions.deprecated (#126898)"
This reverts commit 749a132fb0.

Reverted https://github.com/pytorch/pytorch/pull/126898 on behalf of https://github.com/fbgheith due to switching typing-extensions=4.3.0 to 4.9.0 causes internal failure ([comment](https://github.com/pytorch/pytorch/pull/126898#issuecomment-2142884456))
2024-05-31 19:47:24 +00:00
Kwanghoon An
b704c7cf0f Re trying Support min/max carry over for eager mode from_float method (#127576)
Summary:
Original commit changeset: 2605900516c8

Original Phabricator Diff: D57977896

Test Plan: Re-enabling due to prod failure

Reviewed By: jerryzh168

Differential Revision: D57978925

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127576
Approved by: https://github.com/jerryzh168
2024-05-31 19:08:07 +00:00
Chen Lai
7827afca14 Copy the constant folding pass to the pass under export/passes folder (#127456)
It's a generic pass and I'm trying to find a good place to host it. It's currently needed by the quantization flow. See context in D55930580; it's too much effort to land a fix in the inductor folder.

Differential Revision: [D57934182](https://our.internmc.facebook.com/intern/diff/D57934182/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127456
Approved by: https://github.com/angelayi
2024-05-30 18:04:08 +00:00
Kwanghoon An
24a4bfdcc2 [AdaRound] Make versatile for data / extra param for callback function (#126891)
Summary:
For the Speech sequential model, there could be a case where model(data) does not work correctly for the feed forward.

The Speech model uses a different type of Criterion (a.k.a. loss function) to feed data to individual components like the encoder, predictor, and joiner.

Hence we need an extra parameter to pass a feedforward wrapper.

Differential Revision: D57680391

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126891
Approved by: https://github.com/jerryzh168
2024-05-29 20:05:27 +00:00
Kwanghoon An
c404b2968c Support min/max carry over for eager mode from_float method (#127309)
Summary:
After QAT is completed, or when a pre-tuned weight observer is given via a tunable PTQ algorithm, the weight observer should not be overwritten again with a given weight; for static QAT this should never happen.

Dynamic QAT also does not require re-running the weight observer again by design.

This is a fix.

Test Plan: Signals

Differential Revision: D57747749

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127309
Approved by: https://github.com/jerryzh168
2024-05-29 19:33:26 +00:00
Xuehai Pan
749a132fb0 [BE] wrap deprecated function/class with typing_extensions.deprecated (#126898)
Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing.

Note that only warnings whose messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.

UPDATE: Use `FutureWarning` instead of `DeprecationWarning`.

Resolves #126888

- #126888

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126898
Approved by: https://github.com/albanD
2024-05-29 12:09:27 +00:00
Xuehai Pan
26f4f10ac8 [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
2024-05-27 14:49:57 +00:00
PyTorch MergeBot
55c0ab2887 Revert "[5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)"
This reverts commit 7763c83af6.

Reverted https://github.com/pytorch/pytorch/pull/127126 on behalf of https://github.com/XuehaiPan due to Broken CI ([comment](https://github.com/pytorch/pytorch/pull/127126#issuecomment-2133044286))
2024-05-27 09:22:08 +00:00
Xuehai Pan
7763c83af6 [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
ghstack dependencies: #127122, #127123, #127124, #127125
2024-05-27 04:22:18 +00:00
PyTorch MergeBot
980f5ac049 Revert "[Quant][PT2E] enable qlinear post op fusion for dynamic quant & qat (#122667)"
This reverts commit 3642e51ea5.

Reverted https://github.com/pytorch/pytorch/pull/122667 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/122667#issuecomment-2122642317))
2024-05-21 13:45:07 +00:00
Xia, Weiwen
3642e51ea5 [Quant][PT2E] enable qlinear post op fusion for dynamic quant & qat (#122667)
**Description**
Add fusion path for dynamic quant and for QAT.
The following patterns can be matched for static quant with QAT cases:
`qx -> qlinear -> add -> optional relu -> optional type convert -> optional quant`

The following patterns can be matched for dynamic quant cases:
`qx -> qlinear -> add -> optional relu`

**Test plan**
python test/inductor/test_mkldnn_pattern_matcher.py -k test_qlinear
python test/inductor/test_cpu_cpp_wrapper.py -k test_qlinear
python test/test_quantization.py -k test_linear_unary
python test/test_quantization.py -k test_linear_binary

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122667
Approved by: https://github.com/jgong5
2024-05-20 15:55:18 +00:00
Yihan He
d7de4c9d80 Fix issue of lowering nn.linear ops with kwargs (#126331)
Summary: Support kwarg bias for nn.linear quantization

Differential Revision: D57403190

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126331
Approved by: https://github.com/ZhengkaiZ, https://github.com/huydhn
2024-05-17 21:50:55 +00:00
Kwanghoon An
eb0b16db92 Initial implementation of AdaRound (#126153)
Summary:
This is an implementation of AdaRound from a paper https://arxiv.org/abs/2004.10568

This algorithm is going to be used by multiple people, hence we need to make it an official implementation.

Differential Revision: D57227565

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126153
Approved by: https://github.com/jerryzh168, https://github.com/huydhn
2024-05-17 19:44:50 +00:00
andrewor14
6931f781c2 [quant][pt2e] Allow multi users without output observers (#126487)
Summary: The PT2E quantization flow does not support unquantized
outputs yet. To work around this, users may wish to remove the
output observer from their graphs. However, this fails currently
in some cases because the `PortNodeMetaForQDQ` pass is too
restrictive, for example:

```
conv -> obs -------> output0
         \\-> add -> output1
```

Previously we expected conv to always have exactly 1 user,
which is the observer. When the observer is removed, however,
conv now has 2 users, and this fails the check.

```
conv -------> output0
  \\-> add -> output1
```

This commit relaxes the error into a warning to enable
this workaround.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_multi_users_without_output_observer

Reviewers: jerryzh168

Subscribers: jerryzh168, supriyar

Differential Revision: [D57472601](https://our.internmc.facebook.com/intern/diff/D57472601)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126487
Approved by: https://github.com/tarun292
2024-05-17 18:48:21 +00:00
PyTorch MergeBot
ae6fdfa539 Revert "Initial implementation of AdaRound (#126153)"
This reverts commit 175c18af81.

Reverted https://github.com/pytorch/pytorch/pull/126153 on behalf of https://github.com/huydhn due to Sorry for reverting your change but the lint failure is legit because there are more than one lint issues, torch/optim/asgd.py is just the last one ([comment](https://github.com/pytorch/pytorch/pull/126153#issuecomment-2113902522))
2024-05-16 02:34:49 +00:00
Kwanghoon An
175c18af81 Initial implementation of AdaRound (#126153)
Summary:
This is an implementation of AdaRound from a paper https://arxiv.org/abs/2004.10568

This algorithm is going to be used by multiple people, hence we need to make it an official implementation.

Differential Revision: D57227565

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126153
Approved by: https://github.com/jerryzh168
2024-05-16 02:09:18 +00:00
albanD
af9acc4168 Fix public binding to actually traverse modules (#126103)
The current call passes in `['/actual/path']` to os.walk which is a string pointing to no path and thus silently leads to an empty traversal.
There is an unused function just above that handles that, so I guess this is what was supposed to be called.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126103
Approved by: https://github.com/suo
2024-05-15 19:36:03 +00:00
andrewor14
3cba50e478 [quant] Make per_group and per_token quant match torch.fake_quantize (#125781)
Summary: Follow-up to https://github.com/pytorch/ao/pull/229.
This resolves the difference between `input.div(scales)` and
`input.mul(1.0 / scales)`, which results in small numerical
discrepancies on some inputs.
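
A small illustration of the discrepancy being removed (assumed repro, not the test itself):

```python
import torch

x = torch.randn(1024)
scales = torch.rand(1024) + 0.01
a = x.div(scales)
b = x.mul(1.0 / scales)
# the extra rounding step in 1.0 / scales makes b differ from a in the last
# bits on some inputs
print((a != b).any().item())
```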

Test Plan:
python test/test_quantization.py TestQuantizedTensor.test_decomposed_quantize_per_channel_group
python test/test_quantization.py TestQuantizedTensor.test_decomposed_quantize_per_token

Reviewers: jerryzh168

Subscribers: jerryzh168, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125781
Approved by: https://github.com/jerryzh168
2024-05-14 18:18:54 +00:00
Aaron Gokaslan
34910f87f0 [BE]: Update ruff to v0.4.4 (#125031)
Update ruff version to 0.4.2. This version mostly has bugfixes for the new parser and also updates the f-string rule to be able to apply more fixes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125031
Approved by: https://github.com/albanD, https://github.com/malfet
2024-05-12 20:02:37 +00:00
PyTorch MergeBot
7ffa5558ee Revert "[FX] Update type hints in torch.fx._compatibility.py (#125469)"
This reverts commit 235b4d6ec2.

Reverted https://github.com/pytorch/pytorch/pull/125469 on behalf of https://github.com/izaitsevfb due to breaks pyre in dependent projects (internal: see D56986361) ([comment](https://github.com/pytorch/pytorch/pull/125469#issuecomment-2096665396))
2024-05-06 18:36:43 +00:00
Aaron Gokaslan
1dd42e42c4 [BE]: Try TCH autofixes on torch/ (#125536)
Tries TCH autofixes and see what breaks

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125536
Approved by: https://github.com/ezyang
2024-05-05 23:13:59 +00:00
Xuehai Pan
235b4d6ec2 [FX] Update type hints in torch.fx._compatibility.py (#125469)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125469
Approved by: https://github.com/Skylion007
ghstack dependencies: #125468
2024-05-05 19:30:22 +00:00
andrewor14
8242fb62a7 [quant][pt2e] Fix conv-bn weight + bias per channel QAT (#125208)
Summary: This commit fixes the pattern matching for conv-bn
during QAT fusion where both weight and bias are quantized per
channel. Previously this failed because weights and biases used
the same example kwargs for their scales and zero points,
causing these qparams to be tied during pattern matching.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn1d.test_qat_conv_bn_per_channel_weight_bias
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn2d.test_qat_conv_bn_per_channel_weight_bias

Reviewers: jerryzh168, angelayi

Subscribers: jerryzh168, angelayi, supriyar

Differential Revision: [D56740694](https://our.internmc.facebook.com/intern/diff/D56740694)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125208
Approved by: https://github.com/angelayi
2024-04-30 18:12:25 +00:00
Xia, Weiwen
35b332882b [Quant][PT2E] Enable linear-binary(-unary) post-op recipe for X86Inductor quantizer (#122387)
As the title
**Test plan**
python test/test_quantization.py -k test_linear_binary

Differential Revision: [D56288440](https://our.internmc.facebook.com/intern/diff/D56288440)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122387
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5
ghstack dependencies: #123240
2024-04-27 02:40:57 +00:00
Aaron Orenstein
a8574a9719 Fix global flake8 issues (#124771)
Prior to this `lintrunner --all-files --take FLAKE8` failed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124771
Approved by: https://github.com/Skylion007
ghstack dependencies: #124428
2024-04-26 15:35:53 +00:00
Aaron Gokaslan
2f3b0befed [BE]: Apply ruff FURB 118. (#124743)
Replaces various lambdas with operator.itemgetter which is more efficient (as it's a builtin function). Particularly useful for when lambdas are used as 'key' functions.
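
For example:

```python
from operator import itemgetter

pairs = [("b", 2), ("a", 1)]
# instead of: sorted(pairs, key=lambda kv: kv[1])
by_value = sorted(pairs, key=itemgetter(1))
```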

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124743
Approved by: https://github.com/albanD, https://github.com/malfet
2024-04-26 14:34:52 +00:00
PyTorch MergeBot
1ac60484c1 Revert "Fix global flake8 issues (#124771)"
This reverts commit f01275934b.

Reverted https://github.com/pytorch/pytorch/pull/124771 on behalf of https://github.com/jeanschmidt due to Unfortunately, I needed to revert #123735 and this one depends on it. So please check if there are no merge conflicts or breakages and feel free to merge this PR again ([comment](https://github.com/pytorch/pytorch/pull/124428#issuecomment-2078699836))
2024-04-26 06:15:17 +00:00
Aaron Orenstein
f01275934b Fix global flake8 issues (#124771)
Prior to this `lintrunner --all-files --take FLAKE8` failed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124771
Approved by: https://github.com/Skylion007
ghstack dependencies: #124428
2024-04-25 14:25:00 +00:00
andrewor14
85b28ffc3a [quant][pt2e] Move batch norm op between eval/train for cuda (#123957)
Summary: Before in `move_exported_model_to_train/eval`, we only
switched the CPU versions of the batch norm op. This commit adds
support for the cuda versions of the op too. Note that this fix
is temporary; we won't have to differentiate between these two
cases once we have batch norm consolidation.

Test Plan:
python test/test_quantization.py -k test_move_exported_model_bn

Reviewers: jerryzh168

Subscribers: jerryzh168, leslie-fang-intel, supriyar

Differential Revision: [D56070054](https://our.internmc.facebook.com/intern/diff/D56070054)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123957
Approved by: https://github.com/jerryzh168
2024-04-24 22:01:50 +00:00
Shen Xu
8885638f95 [quant][pt2e] Propagate get_attr meta through known ops only (#124415)
Summary: Avoid situation where the graph traversal finds a matmul node with a `get_attr` as its `args[0]`, and incorrectly propagate the `get_attr`'s meta to everything downstream.

Test Plan: CI

Differential Revision: D56219120

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124415
Approved by: https://github.com/jerryzh168
2024-04-24 20:55:56 +00:00
PyTorch MergeBot
e739a2d59e Revert "[quant][pt2e] Move batch norm op between eval/train for cuda (#123957)"
This reverts commit 4efb28c900.

Reverted https://github.com/pytorch/pytorch/pull/123957 on behalf of https://github.com/jeanschmidt due to reverting to check if it will fix rocm jobs on main ([comment](https://github.com/pytorch/pytorch/pull/123957#issuecomment-2075158146))
2024-04-24 15:02:11 +00:00
andrewor14
4efb28c900 [quant][pt2e] Move batch norm op between eval/train for cuda (#123957)
Summary: Before in `move_exported_model_to_train/eval`, we only
switched the CPU versions of the batch norm op. This commit adds
support for the cuda versions of the op too. Note that this fix
is temporary; we won't have to differentiate between these two
cases once we have batch norm consolidation.

Test Plan:
python test/test_quantization.py -k test_move_exported_model_bn

Reviewers: jerryzh168

Subscribers: jerryzh168, leslie-fang-intel, supriyar

Differential Revision: [D56070054](https://our.internmc.facebook.com/intern/diff/D56070054)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123957
Approved by: https://github.com/jerryzh168
2024-04-24 01:02:59 +00:00
Pian Pawakapan
cf98cab1b6 [export] Forward fix XNNPackQuantizer.module_type_config to detect str nn_module_stack (#123662)
https://github.com/pytorch/pytorch/pull/123308 previously changed the nn_module_stack format (module type -> module str). This modifies XNNPackQuantizer's module_type_config to detect class module strs instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123662
Approved by: https://github.com/williamwen42
2024-04-23 15:21:37 +00:00
Amadeusz Skrzypczak
107f944f22 Support fp8 quantization (#123161)
This commit enables float8_e5m2 and float8_e4m3fn dtypes in fx quantization and PT2E.

Motivation for using fp8 quantization instead of int8:
- it works better to run inference with the same datatype the model was trained with,
- fp8 can handle outliers better, which is one of the problems in LLMs activations.

The numerical recipe we want to use it for is fp8 inference:
- bgemms/gemms running in float8_e4m3fn,
- Per-Tensor-Quantization/Scaling,
- amax observer for measurement with input_backoff and weight_backoff.
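
A minimal sketch of the dtypes involved (in the full recipe a scale from the amax observer would be applied before the cast; this only shows the cast itself):

```python
import torch

x = torch.randn(4, 4)
x_fp8 = x.to(torch.float8_e4m3fn)   # cast to an fp8 dtype
x_back = x_fp8.to(torch.float32)    # cast back for comparison
print((x - x_back).abs().max())
```
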
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123161
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2024-04-23 13:35:27 +00:00
leslie-fang-intel
dd440ac734 Add Matmul recipe into x86_inductor_quantizer (#122776)
**Summary**
Add `matmul` in the quantization recipes, noting that it's not a general recipe but tailored to meet accuracy criteria for specific models. `matmul` recipe is disabled by default.

**Test Plan**
```
python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_attention_block
```

Differential Revision: [D56288468](https://our.internmc.facebook.com/intern/diff/D56288468)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122776
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2024-04-23 00:25:41 +00:00
Aaron Gokaslan
c5fafe9f48 [BE]: TRY002 - Ban raising vanilla exceptions (#124570)
Adds a ruff lint rule to ban raising raw exceptions. Most of these should at the very least be runtime errors, value errors, type errors, or some other specific error type. There are hundreds of instances of these bad exception types already in the codebase, so I have noqa'd most of them. Hopefully this error code will get committers to rethink what exception type they should raise when they submit a PR.

I also encourage people to gradually go and fix all the existing noqas that have been added so they can be removed overtime and our exception typing can be improved.
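
For example:

```python
# instead of: raise Exception("negative size")
def make_buffer(size: int) -> bytearray:
    if size < 0:
        raise ValueError(f"expected a non-negative size, got {size}")
    return bytearray(size)
```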

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124570
Approved by: https://github.com/ezyang
2024-04-21 22:26:40 +00:00
Aaron Gokaslan
29cc293725 [BE]: FURB142 - Remove set mutations. Use set update (#124551)
Uses set mutation methods instead of manually reimplementing (update, set_difference etc).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124551
Approved by: https://github.com/ezyang
2024-04-21 14:12:33 +00:00
Aaron Gokaslan
5a1216bb2e [BE]: Update ruff to 0.4.1 (#124549)
Update ruff to 0.4.1. This version fixes a lot of false negatives/false positives, is 20-40% faster, and has various other bug fixes.

Below is a before and after table showing the execution time of ruff lint and ruff format in milliseconds courtesy of https://astral.sh/blog/ruff-v0.4.0

| Repository                                         | Linter (v0.3) | Linter (v0.4) | Formatter (v0.3) | Formatter (v0.4) |
|----------------------------------------------------|---------------|---------------|------------------|------------------|
| [pytorch/pytorch](https://github.com/pytorch/pytorch) | 328.7         | 251.8         | 351.1            | 274.9            |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124549
Approved by: https://github.com/ezyang
2024-04-21 14:06:23 +00:00
andrewor14
3eea300680 [quant] Do not decompose choose_qparams_per_token_asymmetric (#124178)
Summary: https://github.com/pytorch/pytorch/pull/123452 added
backward support to this op by turning it into
CompositeImplicitAutograd, which meant it gets decomposed during
export/compile. However, this is not desirable behavior for the
PTQ case when we try to lower the model. This commit enables
QAT without breaking PTQ by refactoring the impl into a separate
op that does have backward support.

Test Plan:
python test/test_quantization.py -k test_decomposed_choose_qparams_per_token_asymmetric_backward

Reviewers: jerryzh168, digantdesai, zou3519

Subscribers: jerryzh168, digantdesai, zou3519, supriyar

Differential Revision: [D56192116](https://our.internmc.facebook.com/intern/diff/D56192116)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124178
Approved by: https://github.com/digantdesai
2024-04-16 22:58:48 +00:00
Aaron Gokaslan
1d6c5972c1 [BE]: Optimize min/max/sum comprehensions C419 (#123960)
Automatic fixes that replace certain list comprehensions with generator ones where appropriate so that they are immediately consumed. This is preview functionality in ruff for rule C419 and it was automatically applied.
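
For example:

```python
# the list comprehension builds an intermediate list first:
total = sum([x * x for x in range(10)])
# the generator expression (what C419 rewrites to) is consumed immediately:
total = sum(x * x for x in range(10))
```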

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123960
Approved by: https://github.com/malfet
2024-04-12 23:54:15 +00:00
andrewor14
762e19606e [quant] Enable backward for choose_qparams_per_token_asymmetric (#123452)
Summary: When running the backward for this op, we get the error:
```
RuntimeError: derivative for aten::aminmax is not implemented
```
This commit replaces this call with separate amin and amax
calls instead, which do have implemented derivatives.
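
A minimal sketch of the workaround (assumed repro):

```python
import torch

x = torch.randn(8, requires_grad=True)
# aten::aminmax had no derivative (see the error above), so use separate
# reductions that do:
min_val, max_val = torch.amin(x), torch.amax(x)
(max_val - min_val).backward()  # backward now works
```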

Test Plan:
python test/test_quantization.py -k test_decomposed_choose_qparams_per_token_asymmetric_backward

Reviewers: jerryzh168, digantdesai

Subscribers: jerryzh168, digantdesai, supriyar

Differential Revision: [D55805170](https://our.internmc.facebook.com/intern/diff/D55805170)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123452
Approved by: https://github.com/digantdesai, https://github.com/jerryzh168, https://github.com/zou3519
2024-04-12 20:05:56 +00:00
andrewor14
5c0a380bdf [pt2e][qat] Support conv-transpose-bn[-relu] QAT fusion (#123652)
Summary: This commit adds support for QAT fusion for the
[conv-transpose-bn] and [conv-transpose-bn-relu] patterns.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn1d.test_qat_conv_transpose_bn
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn1d.test_qat_conv_transpose_bn_relu
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn2d.test_qat_conv_transpose_bn
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn2d.test_qat_conv_transpose_bn_relu

Reviewers: jerryzh168

Subscribers: jerryzh168, supriyar

Tasks: https://github.com/pytorch/pytorch/issues/122224

Differential Revision: [D55930704](https://our.internmc.facebook.com/intern/diff/D55930704)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123652
Approved by: https://github.com/jerryzh168
2024-04-12 17:16:02 +00:00
PyTorch MergeBot
f0eb162730 Revert "Switch quantized_decomposed over to new custom ops API (#123454)"
This reverts commit 638729c0cd.

Reverted https://github.com/pytorch/pytorch/pull/123454 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/123454#issuecomment-2051738976))
2024-04-12 13:14:59 +00:00
PyTorch MergeBot
5669334175 Revert "Add Matmul recipe into x86_inductor_quantizer (#122776)"
This reverts commit e8e9261b90.

Reverted https://github.com/pytorch/pytorch/pull/122776 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/122776#issuecomment-2051073373))
2024-04-12 06:29:27 +00:00
PyTorch MergeBot
fe092da874 Revert "[quant] Enable backward for choose_qparams_per_token_asymmetric (#123452)"
This reverts commit c83900887f.

Reverted https://github.com/pytorch/pytorch/pull/123452 on behalf of https://github.com/clee2000 due to broke test_quantization.py::TestQuantizedTensor::test_decomposed_choose_qparams_per_token_asymmetric_backward on multiple jobs c83900887f https://github.com/pytorch/pytorch/actions/runs/8648781225/job/23714753103, probably a landrace ([comment](https://github.com/pytorch/pytorch/pull/123452#issuecomment-2050056601))
2024-04-11 16:19:28 +00:00
andrewor14
c83900887f [quant] Enable backward for choose_qparams_per_token_asymmetric (#123452)
Summary: When running the backward for this op, we get the error:
```
RuntimeError: derivative for aten::aminmax is not implemented
```
This commit replaces this call with separate amin and amax
calls instead, which do have implemented derivatives.

Test Plan:
python test/test_quantization.py -k test_decomposed_choose_qparams_per_token_asymmetric_backward

Reviewers: jerryzh168, digantdesai

Subscribers: jerryzh168, digantdesai, supriyar

Differential Revision: [D55805170](https://our.internmc.facebook.com/intern/diff/D55805170)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123452
Approved by: https://github.com/digantdesai, https://github.com/jerryzh168
2024-04-11 14:51:42 +00:00
rzou
638729c0cd Switch quantized_decomposed over to new custom ops API (#123454)
We are taking API feedback. Changes:
- I removed some of the default values (they weren't being used).
- I was unable to convert the last op (which is essentially an
  autograd.Function registered as CompositeImplicitAutograd). That one
  is "incorrectly registered"; I punt fixing it to the future.
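
For illustration, a sketch of the new custom ops API using a made-up op (not the actual quantized_decomposed ops):

```python
import torch
from torch.library import custom_op

@custom_op("mylib::scale_by", mutates_args=())
def scale_by(x: torch.Tensor, factor: float) -> torch.Tensor:
    return x * factor

# fake/meta implementation used for tracing and compilation
@scale_by.register_fake
def _(x: torch.Tensor, factor: float) -> torch.Tensor:
    return torch.empty_like(x)
```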

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123454
Approved by: https://github.com/andrewor14
ghstack dependencies: #123453, #123578
2024-04-11 13:18:06 +00:00
leslie-fang-intel
e8e9261b90 Add Matmul recipe into x86_inductor_quantizer (#122776)
**Summary**
Add `matmul` in the quantization recipes, noting that it's not a general recipe but tailored to meet accuracy criteria for specific models. `matmul` recipe is disabled by default.

**Test Plan**
```
python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_attention_block
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122776
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #122775
2024-04-11 09:32:47 +00:00
leslie-fang-intel
8798f5bf0d Add Quantization recipe filter per operator type for x86_inductor_quantizer (#122775)
**Summary**
Default recipes are enabled in `X86InductorQuantizer`, and requests have come in to customize recipes on top of these defaults.

- Avoid annotation propagation and restrict annotation to `conv`/`linear` only.
- Add `matmul` to the quantization recipes, noting that it's not a general recipe but one tailored to meet accuracy criteria for specific models.

To meet these requests, this PR introduces the interfaces `set_function_type_qconfig` and `set_module_type_qconfig`:

- `set_function_type_qconfig` accepts a functional input such as `torch.nn.functional.linear` or `torch.matmul`; `set_module_type_qconfig` accepts an nn.Module type such as `torch.nn.Conv2d`.
- To disable the recipe for an operator, the user can exclude it by passing `None`: `quantizer.set_function_type_qconfig(op, None)`.
- To modify or extend the default recipe for an operator, the user can pass a custom config: `quantizer.set_function_type_qconfig(op, config)` (see the sketch below).
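
A hedged usage sketch of the new interfaces; the method names come from this PR, while the import path and default-config helper are assumed from the existing X86InductorQuantizer module layout:
```
import torch
from torch.ao.quantization.quantizer.x86_inductor_quantizer import (
    X86InductorQuantizer,
    get_default_x86_inductor_quantization_config,
)

quantizer = X86InductorQuantizer()
config = get_default_x86_inductor_quantization_config()
quantizer.set_function_type_qconfig(torch.matmul, config)              # opt in to the matmul recipe
quantizer.set_function_type_qconfig(torch.nn.functional.linear, None)  # disable the linear recipe
quantizer.set_module_type_qconfig(torch.nn.Conv2d, config)             # customize the conv recipe
```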

**Test Plan**
```
python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_filter_conv2d_recipe
python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_filter_linear_recipe
python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_filter_maxpool2d_recipe
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122775
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2024-04-11 09:30:31 +00:00
PyTorch MergeBot
8d9af8b91c Revert "[Quant][PT2E] Enable linear-binary(-unary) post-op recipe for X86Inductor quantizer (#122387)"
This reverts commit 82e0153487.

Reverted https://github.com/pytorch/pytorch/pull/122387 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/122387#issuecomment-2048294643))
2024-04-10 19:34:26 +00:00
Xia, Weiwen
82e0153487 [Quant][PT2E] Enable linear-binary(-unary) post-op recipe for X86Inductor quantizer (#122387)
As the title says: enable the linear-binary(-unary) post-op recipe for the X86Inductor quantizer.
**Test plan**
python test/test_quantization.py -k test_linear_binary

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122387
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5
2024-04-10 01:34:14 +00:00
RoboSchmied
af27bc443b fix typo in 4 files (#123529)
fix typo: `information` has no plural.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123529
Approved by: https://github.com/albanD
2024-04-09 23:37:35 +00:00
Zhicheng Yan
77643ed2eb [torch quantization]raise exception when OOM during combine histogram in observer (#123309)
Summary:
Even with the changes in D55347133, it is still possible to OOM in the histogram observer, because the size of the allocated tensor also depends on *downsample_rate*.

For example, I still see an OOM from the attempt to allocate a 10 GB+ histogram tensor in a multi-task model.

To handle the OOM issue more robustly, we wrap the allocation in a *try-except* block.
Empirically, we cap the size of a single histogram tensor at 1 GB.
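
A minimal sketch of the guard, assuming a hypothetical helper name and the 1 GB cap described above (the real observer code differs in detail):
```
import torch

MAX_HIST_BYTES = 1 << 30  # assumed 1 GiB cap, per the description above

def combine_histogram_safely(x: torch.Tensor, bins: int, downsample_rate: int) -> torch.Tensor:
    # Hypothetical helper: refuse the upsampled allocation, or recover from a
    # failed one, instead of letting the observer OOM.
    upsampled_bins = bins * downsample_rate
    if upsampled_bins * 4 > MAX_HIST_BYTES:  # float32 histogram entries
        return torch.histc(x.float(), bins=bins)
    try:
        return torch.histc(x.float(), bins=upsampled_bins)
    except RuntimeError:
        # allocation failed anyway; fall back to the coarse histogram
        return torch.histc(x.float(), bins=bins)
```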

Test Plan: Test the change for Multi-Task model (depth + segmentation)

Differential Revision: D55567292

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123309
Approved by: https://github.com/jerryzh168
2024-04-06 03:15:02 +00:00
andrewor14
fe29a8fbea [quant][be] Simplify fake_quant_per_channel (#123186)
Summary: We probably don't need
`torch._C._AutoDispatchBelowAutograd()`, which is to prevent
infinite recursion if the implementation calls itself. Let's
remove it and see if anything breaks. The other major change
is registering the op to the more general Autograd dispatch
key so it can be used on cuda as well.

Test Plan:
python test/inductor/test_cpu_repro.py -k test_decomposed_fake_quant_per_channel

Reviewers: zou3519, bdhirsh

Subscribers: zou3519, bdhirsh, jerryzh168, leslie-fang-intel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123186
Approved by: https://github.com/zou3519, https://github.com/leslie-fang-intel
2024-04-03 18:06:45 +00:00
Zhengxu Chen
dacc73669c [export] Make quantizer compatible with the standard nn_module_stack. (#122819)
Summary: When we migrate to torch.export, we won't put L['self'] as the prefix for the FQNs in nn_module_stack. This diff adds a branch to handle the new case (a sketch follows).
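
A minimal sketch of the branch, assuming the old-tracer FQNs look like `L['self'].sub.linear`; the helper name is hypothetical:
```
def normalize_nn_module_stack_fqn(fqn: str) -> str:
    # Old tracer: "L['self'].encoder.block0"; torch.export: "encoder.block0".
    prefix = "L['self']"
    if fqn.startswith(prefix):
        return fqn[len(prefix):].lstrip(".")
    return fqn
```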

Test Plan: buck test mode/opt caffe2/test/quantization:test_quantization -- -r set_module_name

Differential Revision: D55436617

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122819
Approved by: https://github.com/tugsbayasgalan
2024-03-28 19:36:46 +00:00
Zhicheng Yan
07f94df1a6 [torch quantization]fix HistogramObserver OOM when (self.max_val - self.min_val) is too small (#122659)
Differential Revision: D55347133

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122659
Approved by: https://github.com/jerryzh168
2024-03-28 17:41:21 +00:00
Jerry Zhang
5af839f86d [quant][pt2e] Enable observer sharing between different quantization specs (#122734)
Summary:

Right now we don't insert additional observers (i.e., we share observers) if qspec.dtype and qspec.is_dynamic match exactly.
Since the fixed qparams quantization spec and the derived quantization spec currently do not have an is_dynamic field, observer sharing does not happen between them and QuantizationSpec. In this PR we fix the issue by
adding is_dynamic to all quantization specs.
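
A minimal sketch of the sharing rule described above (field names per the quantization specs; the helper itself is hypothetical):
```
def can_share_observer(spec_a, spec_b) -> bool:
    # Observers are shared only when both dtype and is_dynamic match exactly,
    # which is why every spec now needs an is_dynamic field.
    return (spec_a.dtype, spec_a.is_dynamic) == (spec_b.dtype, spec_b.is_dynamic)
```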

Note: SharedQuantizationSpec should probably be its own type in the future
TODO later:
(1). move all these fields (dtype, is_dynamic, quant_min, quant_max etc.) to QuantizationSpecBase,
(2). make SharedQuantizationSpec a separate type
(3). add quant_min/quant_max in observer sharing checking in pt2e/prepare.py

Test Plan:
python test/test_quantization.py -k test_fixed_qparams_qspec_observer_dedup
Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D55396546](https://our.internmc.facebook.com/intern/diff/D55396546)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122734
Approved by: https://github.com/andrewor14
2024-03-27 16:45:19 +00:00
haozhe.zhu
e0329cba8a [Quant] [PT2] Add SiLU into X86InductorQuantizer Conv2d Unary Annotation (#122267)
**Summary**
Add `SiLU` into X86InductorQuantizer Conv2d Unary Annotation

**TestPlan**
```
python -m pytest test_x86inductor_quantizer.py -k test_conv2d_unary
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_unary
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122267
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5
ghstack dependencies: #122266
2024-03-26 08:03:42 +00:00
Guang Yang
c677221798 remove torchao dependency (#122524)
Test Plan:
CI

```
buck2 run mode/dev-nosan mode/inplace executorch/examples/models/llama2:export_llama -- -c ~/llama/ultra_new_checkpoint.pt -p ~/llama/params.json -kv -E 8,8 -d fp32 --pt2e_quantize "xnnpack_dynamic" -2
```

```
buck run //executorch/backends/xnnpack/test:test_xnnpack_ops -- executorch.backends.xnnpack.test.ops.linear.TestLinear.test_qd8_fp32_per_token_weight_per_channel_group_int4
```

Differential Revision: D55263008

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122524
Approved by: https://github.com/jerryzh168
2024-03-23 03:18:43 +00:00
PyTorch MergeBot
60bc29aa0b Revert "[Quant] [PT2] Add SiLU into X86InductorQuantizer Conv2d Unary Annotation (#122267)"
This reverts commit 2c6eeb26d3.

Reverted https://github.com/pytorch/pytorch/pull/122267 on behalf of https://github.com/jeanschmidt due to Not sure if this PR caused breakages in main rocm jobs, I'll remerge if reverting does not fix it ([comment](https://github.com/pytorch/pytorch/pull/122267#issuecomment-2015294491))
2024-03-22 15:04:30 +00:00
andrewor14
ea8e0c75c7 [quant][pt2] Fix create FQ with FixedQParamsQSpec (#122104)
Summary: Before, we just returned a _PartialWrapper object when
using FixedQParamsQuantizationSpec in QAT. This is wrong; we
should return a fake-quantize (FQ) object instead.

Differential Revision: [D55021106](https://our.internmc.facebook.com/intern/diff/D55021106)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122104
Approved by: https://github.com/jerryzh168
2024-03-22 14:23:05 +00:00
haozhe.zhu
2c6eeb26d3 [Quant] [PT2] Add SiLU into X86InductorQuantizer Conv2d Unary Annotation (#122267)
**Summary**
Add `SiLU` into X86InductorQuantizer Conv2d Unary Annotation

**TestPlan**
```
python -m pytest test_x86inductor_quantizer.py -k test_conv2d_unary
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_unary
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122267
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5
ghstack dependencies: #122266
2024-03-22 08:12:23 +00:00
Jerry Zhang
901ba2be86 [quant][pt2e] Add support for conv transpose + bn + {relu} weights fusion in PTQ (#122046)
Summary:

also added some utils in xnnpack_quantizer_utils.py
* annotate_conv_transpose_bn_relu and annotate_conv_transpose_bn -> this is for QAT
* annotate_conv_transpose_relu

conv_transpose + bn weight fusion is performed automatically and cannot be disabled currently;
we can add support for disabling this fusion later if needed

Test Plan:
python test/test_quantization.py -k test_conv_transpose_bn_fusion

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122046
Approved by: https://github.com/andrewor14
2024-03-19 21:00:57 +00:00
Tugsbayasgalan Manlaibaatar
53d2188df9 Update get_aten_graph_module (#121937)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121937
Approved by: https://github.com/andrewor14
2024-03-15 20:35:55 +00:00
Le-Zheng
25e00545bb [Quant][PT2E] Enable linear and linear-unary post-op gelu quant recipe for x86 inductor quantizer (#114853)
**Summary**
Add GELU to the linear-unary post-op quantization recipe for the x86 inductor quantizer.

**Test plan**
python -m pytest test/quantization/pt2e/test_x86inductor_quantizer.py -k test_linear_unary_gelu
python test/test_quantization.py -k test_linear_unary_with_quantizer_api
Co-authored-by: leslie-fang-intel <leslie.fang@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114853
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/jerryzh168
2024-03-14 01:46:35 +00:00
Manuel Candales
c53e3f57b5 allow fp16 in quant/dequant decompositions (#121738)
Test Plan:
```
buck2 run mode/dev-nosan mode/inplace executorch/examples/models/llama2:export_llama -- -c ~/llama/ultra_new_checkpoint.pt -p ~/llama/params.json -kv -E 8,8 -d fp16 --pt2e_quantize "xnnpack_dynamic" -2
```

Reviewed By: kirklandsign

Differential Revision: D54785950

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121738
Approved by: https://github.com/jerryzh168
2024-03-13 21:45:08 +00:00
Manuel Candales
6d8a7d6e58 [pytorch] optional zero points on dequantize per channel (#121724)
Summary:
X-link: https://github.com/pytorch/executorch/pull/2364

bypass-github-export-checks

Test Plan: sandcastle

Reviewed By: mikekgfb

Differential Revision: D54709217

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121724
Approved by: https://github.com/mikekgfb
2024-03-12 19:54:11 +00:00
kausik
edf22f3a48 Modify signature of dequantize ops for decomposed quantized Tensor (#119173) (#121450)
Summary:
X-link: https://github.com/pytorch/executorch/pull/2308

Note: The initial purpose of this PR is to draw suggestions and feedback regarding a better alternative, if any.

At present, the dequantize ops for the decomposed quantized Tensor representation, e.g. dequantize_per_tensor(), assume the output dtype is torch.float and hence do not take an output dtype in their argument list. However, this op signature becomes unusable when that assumption breaks: if the output dtype is different from torch.float, there is no way to specify it during dequantization.

This change generalizes the signature of dequantize ops like dequantize_per_tensor() for wider use cases where the output dtype can differ from torch.float and needs to be passed during dequantization. The proposal is to add an argument named 'output_dtype'. We would also welcome suggestions and feedback on any better alternative.
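
A minimal sketch of the generalized dequantize math with the proposed `output_dtype` argument; this is the reference formula only, not the actual op registration:
```
import torch

def dequantize_per_tensor_sketch(
    q: torch.Tensor, scale: float, zero_point: int,
    output_dtype: torch.dtype = torch.float,
) -> torch.Tensor:
    # Previously the result dtype was fixed to torch.float; output_dtype makes
    # e.g. fp16/bf16 outputs expressible.
    return ((q.to(torch.int32) - zero_point) * scale).to(output_dtype)
```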

cc jerryzh168 jianyuh raghuramank100 jamesr66a vkuzo jgong5 Xia-Weiwen leslie-fang-intel

Reviewed By: digantdesai

Differential Revision: D53590486

Pulled By: manuelcandales

Co-authored-by: kausik <kmaiti@habana.ai>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121450
Approved by: https://github.com/jerryzh168
2024-03-12 12:36:31 +00:00
Shen Xu
159f30331f [quant][pt2e] Call sub-quantizers' transform_for_annotation in ComposableQuantizer (#121548)
Test Plan:
```
buck run caffe2/test:quantization_pt2e
```

Differential Revision: D54454707

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121548
Approved by: https://github.com/jerryzh168
2024-03-12 02:59:12 +00:00
Jerry Zhang
a6a67da333 [quant] Add error check for input_edge annotation (#121536)
Summary:
Raise an error when an input edge in an annotation contains non-Node elements such as constant values.

Test Plan:
python test/test_quantization.py -k test_input_edge_sanity_check

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121536
Approved by: https://github.com/andrewor14
2024-03-09 06:13:04 +00:00
leslie-fang-intel
975d428425 [Quant] Add the operator of decomposed fake quant per channel (#121297)
**Summary**
Add the `quantized_decomposed.fake_quant_per_channel` operator and test its forward and backward by comparing against ATen.
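
A reference-math sketch of per-channel fake quant (quantize, clamp, dequantize), for orientation only; the actual op and its autograd registration live in the decomposed quantization library:
```
import torch

def fake_quant_per_channel_ref(x, scales, zero_points, axis, quant_min, quant_max):
    # Broadcast per-channel scale/zero_point along the chosen axis.
    shape = [1] * x.dim()
    shape[axis] = -1
    s = scales.reshape(shape)
    zp = zero_points.reshape(shape)
    q = torch.clamp(torch.round(x / s) + zp, quant_min, quant_max)
    return (q - zp) * s
```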

**Test Plan**
```
python -u -m pytest -s -v test_cpu_repro.py -k test_decomposed_fake_quant_per_channel
```

**Next Step**
Optimize the performance: the generated code for the forward and backward graphs is not vectorized.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121297
Approved by: https://github.com/jerryzh168, https://github.com/jgong5
2024-03-08 10:51:37 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
b474a523c6 Ban passing in free function into capture_pre_autograd_graph (#120817)
Summary: Today we don't allow free functions as the tracing callable in torch.export. As part of migrating capture_pre_autograd_graph usages to torch.export, we need to ban free functions in capture_pre_autograd_graph as well.
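
A minimal illustration of the restriction; the calls are commented out, and the import location of capture_pre_autograd_graph is assumed:
```
import torch

def free_fn(x):
    return x + 1

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

# from torch._export import capture_pre_autograd_graph  # assumed location
# capture_pre_autograd_graph(M(), (torch.randn(2),))      # OK: nn.Module
# capture_pre_autograd_graph(free_fn, (torch.randn(2),))  # now rejected
```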

Test Plan: CI

Differential Revision: D54319597

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120817
Approved by: https://github.com/zhxchen17, https://github.com/andrewor14
2024-03-01 19:38:58 +00:00
andrewor14
91190d8087 [quant][pt2e] Relax model_is_exported input (#120720)
Summary: This commit relaxes the `model_is_exported` API to
work for `torch.nn.Module`s in addition to just
`torch.fx.GraphModule`s, simplifying downstream uses.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_model_is_exported

Differential Revision: [D54263935](https://our.internmc.facebook.com/intern/diff/D54263935)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120720
Approved by: https://github.com/tugsbayasgalan
2024-02-28 18:32:03 +00:00
leslie-fang-intel
84de851539 [Inductor] Enable the decomposition of quant/dequant per channel (#119177)
**Summary**
Part 2 of fixing https://github.com/pytorch/pytorch/issues/119141, which needs vectorized code generation for per-channel quant and the int8 data type.
Enable decomposition of per-channel quant/dequant so the generated code can be vectorized.

**TestPlan**
```
python -u -m pytest -s -v test_cpu_repro.py -k test_per_channel_fake_quant_uint8
python -u -m pytest -s -v test_cpu_repro.py -k test_per_channel_fake_quant_int8
python -u -m pytest -s -v test_cpu_repro.py -k test_per_channel_fake_quant_uint8_bf16_input
python -u -m pytest -s -v test_cpu_repro.py -k test_per_channel_fake_quant_int8_bf16_input
```

Co-authored-by: Jiong Gong <jiong.gong@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119177
Approved by: https://github.com/peterbell10, https://github.com/jansel
2024-02-19 01:30:44 +00:00
andrewor14
6ea4480818 [quant][pt2e] Add model_is_exported util function (#119726)
Summary: This commit adds the `model_is_exported` util function
for users to be able to easily tell what APIs to call to move
their models between train and eval modes. This has the
additional advantage of hiding the implementation of how we
detect a model is exported, in case the metadata format changes
in the future.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_model_is_exported

Differential Revision: [D53812972](https://our.internmc.facebook.com/intern/diff/D53812972)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119726
Approved by: https://github.com/tugsbayasgalan, https://github.com/albanD
2024-02-16 19:29:36 +00:00
andrewor14
8ec8d78ef2 [quant][pt2e][be] Rename eval_utils -> export_utils (#119725)
It's not really eval_utils anymore, since we added some training-related
utils. Instead, it should hold util functions related to general export
use cases.

Differential Revision: [D53711494](https://our.internmc.facebook.com/intern/diff/D53711494)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119725
Approved by: https://github.com/tugsbayasgalan
2024-02-13 19:10:06 +00:00
andrewor14
830ed6d9b2 [quant][pt2] Fix _disallow_eval_train error message (#119694)
Fix the message to use the right function name.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119694
Approved by: https://github.com/tugsbayasgalan
2024-02-13 00:17:53 +00:00
Riley Dulin
44796682d0 [torch][ao] Fix module name filter for pytorch2 quantization for underscores (#119344)
Summary:
There was a bug in the module name filter for modules whose names already
contain an underscore: the underscore was replaced with dot notation.
The filter assumed underscores always mark a module separator,
but this isn't the case for module names that contain an underscore.
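
A minimal sketch of the failure mode (pure string manipulation, not the actual filter code):
```
attr_name = "encoder.layer_norm.weight".replace(".", "_")  # flattened name
# Naively mapping every underscore back to a dot mangles real underscores:
wrong_fqn = attr_name.replace("_", ".")
assert wrong_fqn == "encoder.layer.norm.weight"  # no longer matches "layer_norm"
```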

Test Plan:
Added a unit test. Before this change, that test failed (due to applying the wrong
qscheme). Now it passes.

Differential Revision: D53502771

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119344
Approved by: https://github.com/jerryzh168
2024-02-10 00:29:08 +00:00
Jerry Zhang
7082e24ce8 [quant][pt2e][bc-breaking] Set fold_quantize to True in convert_pt2e (#119425)
Summary: This is a follow up to https://github.com/pytorch/pytorch/pull/118605 to set `fold_quantize` flag to True in `convert_pt2e`

Test Plan: CI

Differential Revision: D53550237

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119425
Approved by: https://github.com/andrewor14
2024-02-09 18:13:43 +00:00
PyTorch MergeBot
81abc2b249 Revert "[quant][pt2e][bc-breaking] Remove fold_quantize flag (#118701)"
This reverts commit 482d952e88.

Reverted https://github.com/pytorch/pytorch/pull/118701 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/118701#issuecomment-1932866964))
2024-02-07 20:56:16 +00:00
Jerry Zhang
482d952e88 [quant][pt2e][bc-breaking] Remove fold_quantize flag (#118701)
Summary:
This is a follow up to https://github.com/pytorch/pytorch/pull/118605 to remove `fold_quantize` flag from
`convert_pt2e`

Test Plan: CI

Differential Revision: D53247301

BC Breaking Note:

The `fold_quantize` flag is now set to True by default in `convert_pt2e`, so we fold the quantize op into the weight by default and users will see a model size reduction after PT2E quantization.
2.2
```
folded_model = convert_pt2e(model, fold_quantize=True)

non_folded_model = convert_pt2e(model)
```

2.3
```
folded_model = convert_pt2e(model)

non_folded_model = convert_pt2e(model, fold_quantize=False)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118701
Approved by: https://github.com/andrewor14, https://github.com/leslie-fang-intel
2024-02-07 19:10:51 +00:00
andrewor14
6c1cca153e [quant][pt2e] Allow users to override train/eval behavior (#119091)
Summary: This commit adds a util for PT2E quantization users
to call `model.train()` and `model.eval()` without error.
Instead, these will automatically call the equivalent
`move_exported_model_to_train/eval` for the user, which only
switch behavior for special ops like dropout and batchnorm.
This enables users to onboard to the PT2E flow more easily.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_allow_exported_model_train_eval

Reviewers: jerryzh168, tugsbayasgalan, zhxchen17

Subscribers: jerryzh168, tugsbayasgalan, zhxchen17, supriyar

Differential Revision: [D53426636](https://our.internmc.facebook.com/intern/diff/D53426636)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119091
Approved by: https://github.com/jerryzh168, https://github.com/tugsbayasgalan, https://github.com/zhxchen17
2024-02-06 22:19:58 +00:00
andrewor14
70605d150b [quant][pt2] Add move_exported_model_to_train (#113492)
Summary: This is the equivalent API to `model.train()` for
exported models, analogous to `move_exported_model_to_eval`.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_move_exported_model_dropout
python test/test_quantization.py TestQuantizePT2E.test_move_exported_model_dropout_inplace
python test/test_quantization.py TestQuantizePT2E.test_move_exported_model_dropout_bn

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113492
Approved by: https://github.com/jerryzh168, https://github.com/tugsbayasgalan
2024-02-02 17:39:47 +00:00
Jiaxu Zhu
b97ab47619 [pytorch][ao] Update PerChannelMinMaxObserver default _load_from_state_dict (#118659)
Summary:
When `version` is missing in the metadata, use `min_val/max_val` as keys instead of `max_vals/min_vals`

## Reasons
1. It's been almost 2 years since change D30003700, which means most checkpoints now use the `min_val/max_val` keys

2. Most checkpoint dumps produced with `model.state_dict()` don't have version info, which leads to a spurious `missing keys` error when loading the state_dict
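
A minimal sketch of the key selection, assuming the standard `_load_from_state_dict(state_dict, prefix, local_metadata, ...)` hook and showing only the unversioned branch described above:
```
def minmax_keys(prefix: str, local_metadata: dict) -> tuple:
    if local_metadata.get("version") is None:
        # Unversioned checkpoints (plain model.state_dict() dumps) are assumed
        # to already use the renamed keys, avoiding spurious "missing keys".
        return prefix + "min_val", prefix + "max_val"
    # Versioned checkpoints go through the existing per-version logic (elided
    # here); very old versions used the plural keys below.
    return prefix + "min_vals", prefix + "max_vals"
```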

Test Plan: CI

Differential Revision: D53233012

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118659
Approved by: https://github.com/jerryzh168
2024-02-01 04:39:31 +00:00
suo
5586d7797e fix up batchnorm folding in pt2 quant (#118720)
Changes to how attributes are structured messed this pass up, fix it

Differential Revision: [D53253601](https://our.internmc.facebook.com/intern/diff/D53253601/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118720
Approved by: https://github.com/SherlockNoMad
2024-01-31 17:40:47 +00:00
Jerry Zhang
82a7460b67 [quant][bc-breaking] Turn on fold_quantize by default (#118605)
Summary:
Previously, by default we did not generate a quantized weight; that is, the quantized model kept an fp32 weight:
`fp32 weight -> q -> dq -> linear -> ...`

After this PR, we'll produce a graph with int8 weight by default after convert_pt2e:
`int8 weight -> dq -> linear -> ...`
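
A small numeric sketch of what folding changes (values are illustrative, not the actual convert pass):
```
import torch

w_fp32 = torch.randn(4, 4)
scale, zero_point = 0.02, 0

# Before: fp32 weight stays in the graph and is quantized at runtime (q -> dq).
w_q = torch.clamp(torch.round(w_fp32 / scale) + zero_point, -128, 127).to(torch.int8)

# After fold_quantize: the quantize is constant-folded at convert time, so the
# model stores the int8 weight and only a dequantize remains before linear.
w_deq = (w_q.to(torch.float32) - zero_point) * scale
```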

We'll remove the fold_quantize flag in the next PR

Test Plan: CI

Differential Revision: D51730862

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118605
Approved by: https://github.com/andrewor14
2024-01-30 21:42:29 +00:00
Catherine Lee
4f5785b6b3 Enable possibly-undefined error code (#118533)
Fixes https://github.com/pytorch/pytorch/issues/118129

Suppressions automatically added with

```
import re

with open("error_file.txt", "r") as f:
    errors = f.readlines()

error_lines = {}
for error in errors:
    match = re.match(r"(.*):(\d+):\d+: error:.*\[(.*)\]", error)
    if match:
        file_path, line_number, error_type = match.groups()
        if file_path not in error_lines:
            error_lines[file_path] = {}
        error_lines[file_path][int(line_number)] = error_type

for file_path, lines in error_lines.items():
    with open(file_path, "r") as f:
        code = f.readlines()
    for line_number, error_type in sorted(lines.items(), key=lambda x: x[0], reverse=True):
        code[line_number - 1] = code[line_number - 1].rstrip() + f"  # type: ignore[{error_type}]\n"
    with open(file_path, "w") as f:
        f.writelines(code)
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Co-authored-by: Catherine Lee <csl@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118533
Approved by: https://github.com/Skylion007, https://github.com/zou3519
2024-01-30 21:07:01 +00:00
PyTorch MergeBot
40ece2e579 Revert "Enable possibly-undefined error code (#118533)"
This reverts commit 4f13f69a45.

Reverted https://github.com/pytorch/pytorch/pull/118533 on behalf of https://github.com/clee2000 due to sorry i'm trying to figure out a codev merge conflict, if this works i'll be back to rebase and merge ([comment](https://github.com/pytorch/pytorch/pull/118533#issuecomment-1917695185))
2024-01-30 19:00:34 +00:00
Edward Z. Yang
4f13f69a45 Enable possibly-undefined error code (#118533)
Fixes https://github.com/pytorch/pytorch/issues/118129

Suppressions automatically added with

```
import re

with open("error_file.txt", "r") as f:
    errors = f.readlines()

error_lines = {}
for error in errors:
    match = re.match(r"(.*):(\d+):\d+: error:.*\[(.*)\]", error)
    if match:
        file_path, line_number, error_type = match.groups()
        if file_path not in error_lines:
            error_lines[file_path] = {}
        error_lines[file_path][int(line_number)] = error_type

for file_path, lines in error_lines.items():
    with open(file_path, "r") as f:
        code = f.readlines()
    for line_number, error_type in sorted(lines.items(), key=lambda x: x[0], reverse=True):
        code[line_number - 1] = code[line_number - 1].rstrip() + f"  # type: ignore[{error_type}]\n"
    with open(file_path, "w") as f:
        f.writelines(code)
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118533
Approved by: https://github.com/Skylion007, https://github.com/zou3519
2024-01-30 05:08:10 +00:00
Edward Z. Yang
46712b019d Enable local_partial_types (#118467)
When using dmypy, this setting is enabled and cannot be turned off. Force it for regular mypy too.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118467
Approved by: https://github.com/Skylion007
ghstack dependencies: #118414, #118418, #118432
2024-01-28 13:38:22 +00:00
Edward Z. Yang
9bce208dfb Replace follow_imports = silent with normal (#118414)
This is a lot of files changed! Don't panic! Here's how it works:

* Previously, we set `follow_imports = silent` for our mypy.ini configuration. Per https://mypy.readthedocs.io/en/stable/running_mypy.html#follow-imports, what this does is whenever we have an import to a module which is not listed as a file to be typechecked in mypy, we typecheck it as normal but suppress all errors that occurred in that file.
* When mypy is run inside lintrunner, the list of files is precisely the files covered by the glob in lintrunner.toml, but with files in excludes excluded.
* The top-level directive `# mypy: ignore-errors` instructs mypy to typecheck the file as normal, but ignore all errors.
* Therefore, it should be equivalent to set `follow_imports = normal`, if we put `# mypy: ignore-errors` on all files that were previously excluded from the file list.
* Having done this, we can remove the exclude list from .lintrunner.toml, since excluding a file from typechecking is baked into the files themselves.
* torch/_dynamo and torch/_inductor were previously in the exclude list, because they were covered by MYPYINDUCTOR. It is not OK to mark these as `# mypy: ignore-errors` as this will impede typechecking on the alternate configuration. So they are temporarily being checked twice, but I am suppressing the errors in these files as the configurations are not quite the same. I plan to unify the configurations so this is only a temporary state.
* There were some straggler type errors after these changes somehow, so I fixed them as needed. There weren't that many.

In the future, to start type checking a file, just remove the ignore-errors directive from the top of the file.

The codemod was done with this script authored by GPT-4:

```
import glob

exclude_patterns = [
    ...
]

for pattern in exclude_patterns:
    for filepath in glob.glob(pattern, recursive=True):
        if filepath.endswith('.py'):
            with open(filepath, 'r+') as f:
                content = f.read()
                f.seek(0, 0)
                f.write('# mypy: ignore-errors\n\n' + content)
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118414
Approved by: https://github.com/thiagocrepaldi, https://github.com/albanD
2024-01-27 02:44:11 +00:00
le-zheng
94f0472579 [Quant] [PT2] Add Hardswish into X86InductorQuantizer Conv2d Unary Annotation (#117488)
**Summary**
Add `hardswish`  into X86InductorQuantizer Conv2d Unary Annotation

**TestPlan**
```
python -m pytest test_x86inductor_quantizer.py -k test_conv2d_unary
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_unary
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117488
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5
ghstack dependencies: #117487
2024-01-20 01:37:33 +00:00
Jerry Zhang
8f1bc876b2 [quant] Support custom qmin/qmax for activation and weight for xnnpack quantizer (#117305)
Summary:
As titled; this allows us to experiment with 4-bit quant in XNNPACK (see the sketch below).
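
A hedged sketch of the kind of spec this enables, a 4-bit value range stored in int8; the field names come from QuantizationSpec, while the exact quantizer-level knobs are in the PR itself:
```
import torch
from torch.ao.quantization.observer import PerChannelMinMaxObserver
from torch.ao.quantization.quantizer import QuantizationSpec

int4_range_weight_spec = QuantizationSpec(
    dtype=torch.int8,   # stored as int8
    quant_min=-8,       # but restricted to a 4-bit value range
    quant_max=7,
    qscheme=torch.per_channel_symmetric,
    ch_axis=0,
    is_dynamic=False,
    observer_or_fake_quant_ctr=PerChannelMinMaxObserver,
)
```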

Test Plan:
python test/test_quantization.py -k test_dynamic_linear_int4_weight

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117305
Approved by: https://github.com/digantdesai
2024-01-17 03:22:49 +00:00
Xia, Weiwen
94db6578cc [Quant] Add dynamic quantization config for x86 inductor backend (#115337)
**Description**
Add dynamic quantization config for x86 inductor backend.
To support the QKV structure in self-attention, we removed an assertion in the port-metadata pass that required a single dequantize node after a quantize node.

**Test plan**
```
python test/test_quantization.py -k TestQuantizePT2EX86Inductor.test_dynamic_quant_linear
python test/test_quantization.py -k TestQuantizePT2EX86Inductor.test_qat_dynamic_quant_linear
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115337
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2024-01-10 11:33:37 +00:00
Max Ren
d2033a0639 [quant][pt2e][xnnpack_quantizer] add support for linear_relu (#117052)
Add support for linear_relu annotation in XNNPACKQuantizer; this allows the input to linear and the output of relu to share the same quantization parameters (see the sketch below).
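
A hedged usage sketch; the quantizer import path and config helper are assumed from the existing XNNPACKQuantizer module layout, and prepare_pt2e is only referenced in a comment:
```
import torch
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

class LinearReLU(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 16)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        return self.relu(self.linear(x))

quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config())
# prepare_pt2e(exported_model, quantizer) would now annotate linear + relu as a
# single pattern, so the linear input and relu output share quantization params.
```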

Differential Revision: [D52574086](https://our.internmc.facebook.com/intern/diff/D52574086/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117052
Approved by: https://github.com/jerryzh168, https://github.com/digantdesai
2024-01-09 23:19:52 +00:00
Zhengxu Chen
5ac57a06eb [export] Refactor ExportPassBase. (#116778)
Summary:
X-link: https://github.com/pytorch/executorch/pull/1532

As titled. This diff decouples the pass base library from torch.export and EXIR, so that different layers can evolve in their own fashion and we have more headroom to divide and conquer in the future.

Test Plan: CI

Reviewed By: angelayi

Differential Revision: D52514517

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116778
Approved by: https://github.com/angelayi
2024-01-04 21:32:14 +00:00
Aaron Gokaslan
3fe437b24b [BE]: Update flake8 to v6.1.0 and fix lints (#116591)
Updates flake8 to v6.1.0 and fixes a few lints using sed and some ruff tooling.
- Replace `assert(0)` with `raise AssertionError()`
- Remove extraneous parenthesis i.e.
  - `assert(a == b)` -> `assert a == b`
  - `if(x > y or y < z):`->`if x > y or y < z:`
  - And `return('...')` -> `return '...'`

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116591
Approved by: https://github.com/albanD, https://github.com/malfet
2024-01-03 06:04:44 +00:00
Jerry Zhang
41f265b06a [quant][pt2e] Preserve numeric_debug_handle in quantization flows (#116477)
Summary:
We introduced `node.meta["numeric_debug_handle"]` in https://github.com/pytorch/pytorch/pull/114315 to
indicate the numeric debug handle for values in the graph. In this PR we support preserving this field
through prepare and convert so that it can be used for numerical debugging.

Next: we also want to preserve these in deepcopy of GraphModule as well

Test Plan:
python test/test_quantization.py -k test_quantize_pt2e_preserve_handle

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116477
Approved by: https://github.com/tugsbayasgalan
2024-01-03 03:39:00 +00:00
Aaron Gokaslan
bd10fea79a [BE]: Enable F821 and fix bugs (#116579)
Fixes #112371

I tried to fix as many of the bugs as I could; for a few I could not figure out the proper fix, so I left them with noqas.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116579
Approved by: https://github.com/ezyang
2024-01-01 08:40:46 +00:00
Jerry Zhang
8173d98c57 [quant][be] Skip conv-bn folding when there are no batchnorm ops (#116440)
Summary:
`_fold_conv_bn_qat` currently takes a long time, so skip it when it's not necessary.
We can have follow-up fixes to actually reduce the number of patterns or cache the patterns if possible.

Test Plan:
uncomment the print in `test_speed`, run

python test/test_quantization.py -k test_speed

and make sure the convert time is low, e.g. 0.1s instead of 8-9 seconds

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116440
Approved by: https://github.com/andrewor14
2023-12-28 23:33:21 +00:00
Aaron Gokaslan
bbe3261dd3 [BE]: Use iterable.chain.from_iterable where possible (#116376)
This is more readable and more efficient when dealing with lots of sequences to chain together.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116376
Approved by: https://github.com/albanD
2023-12-27 19:20:07 +00:00
Xuehai Pan
199e07f108 [pytree][BE] update treespec num_children access (#116370)
Change `len(treespec.children_specs)` -> `treespec.num_children`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116370
Approved by: https://github.com/Skylion007
2023-12-24 20:54:32 +00:00