Summary:
Previously we only checked dtype and is_dynamic to decide whether two quantization specs are equivalent.
This may not work in some cases, e.g. when people use a different qscheme or quant_min/quant_max.
This PR adds checks for the other fields as well.
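A minimal sketch of the difference between the old and new equivalence checks. `SpecSketch`, its field list, and both helper functions are hypothetical stand-ins for illustration, not the actual quantization spec class or the code in this PR:

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class SpecSketch:
    """Hypothetical stand-in for a quantization spec (not the real class)."""
    dtype: str
    is_dynamic: bool = False
    quant_min: Optional[int] = None
    quant_max: Optional[int] = None
    qscheme: Optional[str] = None


def specs_equivalent_old(a: SpecSketch, b: SpecSketch) -> bool:
    # Old behaviour as described above: only dtype and is_dynamic are compared.
    return a.dtype == b.dtype and a.is_dynamic == b.is_dynamic


def specs_equivalent_new(a: SpecSketch, b: SpecSketch) -> bool:
    # Compare every declared field, so differences in qscheme or
    # quant_min/quant_max are no longer missed.
    return all(getattr(a, f.name) == getattr(b, f.name) for f in fields(a))


a = SpecSketch("int8", quant_min=-128, quant_max=127, qscheme="per_tensor_affine")
b = SpecSketch("int8", quant_min=-127, quant_max=127, qscheme="per_tensor_symmetric")
print(specs_equivalent_old(a, b))  # True: the mismatch goes unnoticed
print(specs_equivalent_new(a, b))  # False
```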
Test Plan:
regression tests
Differential Revision: [D62530974](https://our.internmc.facebook.com/intern/diff/D62530974)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135736
Approved by: https://github.com/sxu
Summary:
Make quantization tests compatible with the new training IR.
With the new batch norm node `torch.ops.aten.batch_norm.default`, we don't need an additional getitem node after the bn node, so tests need to be fixed to not check for the getitem node.
We added a capture_pre_autograd_graph_using_training_ir() function, which returns True when we are using the training ir, and False otherwise. This way, the code supports both training ir and the old ir.
For now, we are just rolling out the training ir for fbcode internal tests.
Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_preserve_source_fn_stack
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_update_shared_qspec
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_conv2d
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_relu_fusion
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_fusion
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_fusion_literal_args
```
Reviewed By: andrewor14, tugsbayasgalan
Differential Revision: D61292102
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134259
Approved by: https://github.com/tugsbayasgalan
Summary:
In the new training ir, we produce `torch.ops.aten.batch_norm.default` instead of `torch.ops.aten._native_batch_norm_legit.default` or `torch.ops.aten._native_batch_norm_legit_no_training.default`.
So we need to change the pattern match to accommodate the new op.
- Add `torch.ops.aten.batch_norm.default` to pattern matcher list so it's identified as a batch norm node
- `torch.ops.aten.batch_norm.default` doesn't have a getitem user anymore, so when removing the bn node, we need to do `bn_node.replace_all_uses_with(conv_node)` instead of `getitem_node.replace_all_uses_with(conv_node)`
The behavior of capture_pre_autograd_graph is consistent for each run.
If the run is a fbcode test, then capture_pre_autograd_graph uses training IR. This means both _get_aten_graph_module_for_pattern and replace_pattern_with_filters see the same training IR.
If the run is not a fbcode test, then both would see the old IR.
Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_conv2d_binary2
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_conv2d_unary
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_linear_unary
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_dynamic_quant_linear
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_dynamic_quant_linear
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_flatten_recipe
```
Reviewed By: andrewor14, tugsbayasgalan
Differential Revision: D61291077
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134157
Approved by: https://github.com/tugsbayasgalan
Part of #134054.
This corresponds to the pytorch mypy changes from D61493706. Updating takes so long and touches so many files that it's impossible to land as a whole without conflicting with some other intermediate change.
So this lands the 'type: ignore' comments for pytorch in advance of them actually being needed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134202
Approved by: https://github.com/Skylion007
Summary:
- exir.capture + to_edge is deprecated. We need to use export + to_edge instead.
- Fix the quantization pass to be compatible with the new export IR. In the quantization pass, some nodes might have side effects, so they have no users but are still not removed by the DCE pass. We need to account for this.
- export_rle_model now works with the default `capture_pre_autograd_graph`; it should also work with the new training IR.
Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/export:export_rle_model -- -r export_rle_model
```
Differential Revision: D61485834
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133937
Approved by: https://github.com/tugsbayasgalan
Summary:
Fix quantization pass to be compatible with the new export IR.
Some nodes might have side-effects, so they don't have users, but still are not removed by the DCE pass.
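A toy illustration of why a pass cannot assume that every node surviving DCE has a user. `ToyNode` and `toy_dce` are hypothetical and only mimic the behaviour described above; they are not the torch.fx implementation:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyNode:
    name: str
    inputs: List["ToyNode"] = field(default_factory=list)
    has_side_effect: bool = False
    is_output: bool = False


def toy_dce(nodes: List[ToyNode]) -> List[ToyNode]:
    """Keep a node if it is an output, has side effects, or feeds a kept node."""
    kept, live = [], set()
    for node in reversed(nodes):  # nodes are assumed to be in topological order
        if node.is_output or node.has_side_effect or node.name in live:
            kept.append(node)
            live.update(inp.name for inp in node.inputs)
    return list(reversed(kept))


x = ToyNode("x")
conv = ToyNode("conv", [x])
bn = ToyNode("bn", [conv], has_side_effect=True)  # no users, but side-effecting
unused = ToyNode("unused", [x])                   # no users, no side effects
out = ToyNode("out", [conv], is_output=True)

remaining = toy_dce([x, conv, bn, unused, out])
names = [n.name for n in remaining]
used = {inp.name for n in remaining for inp in n.inputs}
userless = [n.name for n in remaining if n.name not in used and not n.is_output]
print(names)     # ['x', 'conv', 'bn', 'out']
print(userless)  # ['bn']
```

The side-effecting `bn` node survives with no users, so later code has to handle that case explicitly.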
Test Plan:
CI
buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/export:export_rle_model -- -r export_rle_model
Differential Revision: D61223356
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133587
Approved by: https://github.com/tugsbayasgalan
Summary:
The Quantizer subclass can return a new model from `transform_for_annotation`,
and this is common if it uses any ExportPass subclass which does not mutate in-place.
Use the returned model instead of assuming it's the same.
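A minimal sketch of the pattern, assuming a quantizer whose transform returns a new object. `ToyModel`, `NonMutatingQuantizer`, and `prepare_sketch` are hypothetical names for illustration; only the method name `transform_for_annotation` comes from the description above:

```python
class ToyModel:
    def __init__(self, tags=()):
        self.tags = tuple(tags)


class NonMutatingQuantizer:
    """Hypothetical quantizer whose transform does not mutate in place."""

    def transform_for_annotation(self, model):
        # Returns a brand-new model; the input is left untouched.
        return ToyModel(model.tags + ("transformed",))

    def annotate(self, model):
        model.tags = model.tags + ("annotated",)
        return model


def prepare_sketch(model, quantizer):
    # Rebind to the returned model rather than assuming in-place mutation.
    model = quantizer.transform_for_annotation(model)
    quantizer.annotate(model)
    return model


original = ToyModel()
prepared = prepare_sketch(original, NonMutatingQuantizer())
print(prepared.tags)  # ('transformed', 'annotated')
print(original.tags)  # ()
```

If the return value were discarded, annotation would run on `original` and the transform would be silently lost.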
Differential Revision: D60869676
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132893
Approved by: https://github.com/jerryzh168
Summary:
- make default DCE pass check schema,
- need to rebase onto https://github.com/pytorch/pytorch/pull/131651 after it's in phabricator (for now the change is manually added).
- mark Proxy dump as NotImplemented for better error msg
- Remove Proxy from tensors when dumping models, as Proxy cannot be dumped.
More details in https://docs.google.com/document/d/1G5vmTXjzxoyVGRI2kpA1gQukK_Glyg2NrE0Oh6Nlg9A/edit?usp=sharing.
Test Plan:
CI
```
- buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r qat_conv2d
- test_export.py
- buck2 run 'fbcode//mode/dev-nosan' fbcode//modai/test:test_modai -- -r test_qat_stinson_htp_export
- buck2 run 'fbcode//mode/dev-nosan' fbcode//vizard_projects/ml_depth/tests:test_model -- -r test_qat_model_et
- buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:fx -- -r dce
- buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=False,use_3d_input=False
- buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=True,use_3d_input=False
- buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_fold_bn_erases_bn_node
```
Reviewed By: angelayi
Differential Revision: D60319175
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132764
Approved by: https://github.com/angelayi
Summary: Previously, when folding BN into conv, we rely on DCE
to clean up the unused BN node from the graph. This works if
the model is already in eval mode, but fails if the model is
still in train mode because DCE doesn't remove nodes with
potential side effects (in this case `_native_batch_norm_legit`).
This required users to move the model to eval mode before calling
convert in order to get a properly DCE'd graph.
To solve this, we manually erase the BN node after folding
instead of relying on DCE. This relaxes the ordering constraints
between `move_exported_model_to_eval` and `convert_pt2e`.
Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn1d.test_fold_bn_erases_bn_node
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn2d.test_fold_bn_erases_bn_node
Reviewers: jerryzh168, yushangdi
Subscribers: jerryzh168, yushangdi, supriyar
Differential Revision: [D60520149](https://our.internmc.facebook.com/intern/diff/D60520149)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131651
Approved by: https://github.com/yushangdi, https://github.com/leslie-fang-intel
Summary:
Occasionally we run into a partition that looks like this for Add:
```
SourcePartition(nodes=[_constant2, add_2], source=<built-in function add>, input_nodes=[x], output_nodes=[_constant2, add_2], params=[_constant2])
```
In this case we are adding a constant to an input and reusing the constant later down the line. This causes the constant to be an output in our SourcePartition. The assumption that:
```
add_node = add_partition.output_nodes[0]
```
will not necessarily hold. As a result, we must check that the output node is indeed a call_function node and not a constant.
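A sketch of the check, using a hypothetical `ToyNode` in place of a real graph node. It assumes the constant shows up with an op kind other than `"call_function"` (here `"get_attr"`); the helper name is illustrative only:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyNode:
    name: str
    op: str  # e.g. "call_function" or "get_attr"


def find_call_function_output(output_nodes: List[ToyNode]) -> Optional[ToyNode]:
    # Do not assume output_nodes[0] is the add node; pick the first
    # output that is actually a call_function.
    for node in output_nodes:
        if node.op == "call_function":
            return node
    return None


constant = ToyNode("_constant2", "get_attr")
add = ToyNode("add_2", "call_function")
found = find_call_function_output([constant, add])
print(found.name)  # add_2
```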
Test Plan: buck test mode/dev-nosan //executorch/backends/xnnpack/test:test_xnnpack_ops -- test_qs8_add_constant
Differential Revision: D60413221
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132092
Approved by: https://github.com/jerryzh168
This PR fixes a bug in `test_correct_module_names` introduced in #130497. It also addresses post-fix test failures in:
* `torch/ao/quantization/__init__.py` - set the correct `__module__` for several public API helpers
* `torch/library.py` - add `register_vmap` to `__all__`
* `torch/nn/attention/flex_attention.py` - make `round_up_to_multiple` private by prepending an underscore
* `torch/storage.py` - introduce `__all__` to avoid `Self` being re-exported as a public API
* `torch/distributed/pipelining/schedules.py` - add `ZeroBubbleAlgorithm` to `__all__`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131386
Approved by: https://github.com/albanD
Reland https://github.com/pytorch/pytorch/pull/126704
#### Fixes the issue with type of `nn.Module._state_dict_hooks` being changed in that PR which was problematic:
Instead of using `Tuple(Callable, bool)` to keep track of whether the private `_register_state_dict_hook` or the public `register_state_dict_post_hook` API was used to register the hook and toggle the behavior accordingly, I set an attribute on the Callable in the private API, which is never cleaned up.
If a callable previously registered using the private API is registered via the public API, a RuntimeError will be raised.
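A simplified sketch of the marker-attribute approach. `ToyModule`, the method names, and the attribute name `_from_private_api` are hypothetical and not the actual `nn.Module` implementation. It assumes hooks are plain functions, since attributes cannot be set on bound methods:

```python
_PRIVATE_MARK = "_from_private_api"  # hypothetical attribute name


class ToyModule:
    def __init__(self):
        self._hooks = []

    def _register_hook(self, hook):
        # Private path: tag the callable itself instead of storing (hook, flag).
        setattr(hook, _PRIVATE_MARK, True)
        self._hooks.append(hook)

    def register_post_hook(self, hook):
        # Public path: reject callables already tagged by the private API.
        if getattr(hook, _PRIVATE_MARK, False):
            raise RuntimeError("hook was already registered via the private API")
        self._hooks.append(hook)

    def state_dict(self):
        state = {"weight": 1}
        for hook in self._hooks:
            result = hook(self, state)
            # Only private hooks may replace the state dict by returning one.
            if getattr(hook, _PRIVATE_MARK, False) and result is not None:
                state = result
        return state


def private_hook(module, state):
    return {**state, "extra": 2}


def public_hook(module, state):
    state["seen"] = True       # in-place edits are kept
    return {"ignored": True}   # return value is ignored


m = ToyModule()
m._register_hook(private_hook)
m.register_post_hook(public_hook)
result = m.state_dict()
print(result)  # {'weight': 1, 'extra': 2, 'seen': True}
```

Because the marker lives on the callable and is never removed, passing `private_hook` to `register_post_hook` afterwards raises `RuntimeError` in this sketch.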
#### Copied from previous PR description
Fixes https://github.com/pytorch/pytorch/issues/75287 and https://github.com/pytorch/pytorch/issues/117437
- `nn.Module._register_state_dict_hook` --> add public `nn.Module.register_state_dict_post_hook`
- Add a test as this API was previously untested
- `nn.Module._register_load_state_dict_pre_hook` --> add public `nn.Module.register_load_state_dict_pre_hook` (remove the `with_module` flag, default it to `True`)
~- For consistency with optimizer `load_state_dict_pre_hook` raised by @janeyx99, allow the pre-hook to return a new `state_dict`~
- For the issue raised in https://github.com/pytorch/pytorch/issues/117437 regarding the `_register_state_dict_hook` semantic of returning a new state_dict only being respected for the root for the private hook:
- Document this for private `_register_state_dict_hook`
- Remove this for the public `register_state_dict_post_hook`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131690
Approved by: https://github.com/albanD
Summary: Previously, when folding BN into conv, we rely on DCE
to clean up the unused BN node from the graph. This works if
the model is already in eval mode, but fails if the model is
still in train mode because DCE doesn't remove nodes with
potential side effects (in this case `_native_batch_norm_legit`).
This required users to move the model to eval mode before calling
convert in order to get a properly DCE'd graph.
To solve this, we manually erase the BN node after folding
instead of relying on DCE. This relaxes the ordering constraints
between `move_exported_model_to_eval` and `convert_pt2e`.
Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn1d.test_fold_bn_erases_bn_node
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn2d.test_fold_bn_erases_bn_node
Reviewers: jerryzh168, yushangdi
Subscribers: jerryzh168, yushangdi, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131651
Approved by: https://github.com/yushangdi
Earlier, the signature of the dequantize ops for decomposed quantized Tensors was changed to cover wider use cases where the output dtype can differ from torch.float and needs to be passed during dequantization.
See https://github.com/pytorch/pytorch/pull/121450.
However, setting the correct output dtype for the dequantize ops was still missing in the convert_pt2e flow.
This change enables users to use the PT2E quantization flow with a non-torch.float unquantized dtype, such as torch.bfloat16.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128953
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
Summary:
Add three top-level APIs for the numeric debugger in the pt2e flow that can log intermediate outputs in the model
and calculate summaries for metric comparisons between nodes in two graphs:
* `prepare_for_propagation_comparison`
* `extract_results_from_loggers`
* `compare_results`
Test Plan:
python test/test_quantization.py -k test_prepare_for_propagation_comparison
python test/test_quantization.py -k test_extract_results_from_loggers
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130643
Approved by: https://github.com/dulinriley, https://github.com/tarun292
Summary:
* Added support for preserving `_numeric_debug_handle` during deepcopy; the args need to be remapped since `_numeric_debug_handle` refers
to the nodes in the graph
TODO: need to fully support re-export, currently the metadata for output node is not preserved
Test Plan:
python test/test_quantization.py -k test_deepcopy_preserve_handle
python test/test_quantization.py -k test_copy_preserve_handle
all related tests:
python test/test_quantization.py -k TestGenerateNumericDebugHandle
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129287
Approved by: https://github.com/zhxchen17
Summary:
Previously we stored an edge id in numeric_debug_handle to support operator fusion and operator decomposition throughout the stack,
but according to feedback from customers, people prefer the simpler per-node id. They are fine with not having the additional
support for numerical debugging of inputs and are willing to work around this themselves.
This PR changes the structure of numeric_debug_handle to store a unique_id for each node instead.
e.g.
graph:
```
node = op(input_node, weight_node)
```
Before:
```
node.meta[NUMERIC_DEBUG_HANDLE_KEY] = {input_node: id1, weight_node: id2, "output": id3}
```
After:
```
node.meta[NUMERIC_DEBUG_HANDLE_KEY] = id1
```
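A minimal sketch of assigning one id per node. `ToyNode`, `generate_handles_sketch`, and the key string are hypothetical; the real pass operates on graph nodes and may skip or number nodes differently:

```python
from dataclasses import dataclass, field
from typing import Dict, List

NUMERIC_DEBUG_HANDLE_KEY = "numeric_debug_handle"  # key string is an assumption


@dataclass
class ToyNode:
    name: str
    meta: Dict[str, int] = field(default_factory=dict)


def generate_handles_sketch(nodes: List[ToyNode]) -> None:
    # Start after the largest existing id so reruns never reuse a handle.
    existing = [n.meta[NUMERIC_DEBUG_HANDLE_KEY]
                for n in nodes if NUMERIC_DEBUG_HANDLE_KEY in n.meta]
    next_id = max(existing, default=-1) + 1
    for node in nodes:
        if NUMERIC_DEBUG_HANDLE_KEY not in node.meta:
            node.meta[NUMERIC_DEBUG_HANDLE_KEY] = next_id  # one int per node
            next_id += 1


nodes = [ToyNode("input_node"), ToyNode("weight_node"), ToyNode("node")]
generate_handles_sketch(nodes)
handles = [n.meta[NUMERIC_DEBUG_HANDLE_KEY] for n in nodes]
print(handles)  # [0, 1, 2]
```

Running the function a second time leaves existing handles unchanged, so ids stay stable across repeated calls in this sketch.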
Test Plan:
python test/test_quantization.py -k TestGenerateNumericDebugHandle
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129811
Approved by: https://github.com/tarun292