Summary:
- Make the default DCE pass check the schema.
- Need to rebase onto https://github.com/pytorch/pytorch/pull/131651 after it's in Phabricator (for now the change is manually added).
- Mark Proxy dump as NotImplemented for a better error message.
- Remove Proxy from tensors when dumping models, as Proxy cannot be dumped.
More details in https://docs.google.com/document/d/1G5vmTXjzxoyVGRI2kpA1gQukK_Glyg2NrE0Oh6Nlg9A/edit?usp=sharing.
Test Plan:
CI
```
- buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r qat_conv2d
- test_export.py
- buck2 run 'fbcode//mode/dev-nosan' fbcode//modai/test:test_modai -- -r test_qat_stinson_htp_export
- buck2 run 'fbcode//mode/dev-nosan' fbcode//vizard_projects/ml_depth/tests:test_model -- -r test_qat_model_et
- buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:fx -- -r dce
- buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=False,use_3d_input=False
- buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=True,use_3d_input=False
- buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_fold_bn_erases_bn_node
```
Reviewed By: angelayi
Differential Revision: D60319175
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132764
Approved by: https://github.com/angelayi
Summary: Previously, when folding BN into conv, we relied on DCE
to clean up the unused BN node from the graph. This works if
the model is already in eval mode, but fails if the model is
still in train mode because DCE doesn't remove nodes with
potential side effects (in this case `_native_batch_norm_legit`).
This required users to move the model to eval mode before calling
convert in order to get a properly DCE'd graph.
To solve this, we manually erase the BN node after folding
instead of relying on DCE. This relaxes the ordering constraints
between `move_exported_model_to_eval` and `convert_pt2e`.
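For illustration, a minimal torch.fx sketch of the approach, assuming the folded BN op's tuple result is consumed through getitem nodes; the function and argument names are illustrative, not the actual pass:
```python
import torch.fx as fx

# Minimal sketch: reroute the getitem users of the folded BN node to the conv
# output, then erase the now-dead nodes that DCE would otherwise keep because
# the BN op is treated as potentially side-effecting.
def erase_folded_bn(gm: fx.GraphModule, bn_node: fx.Node, conv_node: fx.Node) -> None:
    for getitem_node in list(bn_node.users):
        getitem_node.replace_all_uses_with(conv_node)
        gm.graph.erase_node(getitem_node)
    gm.graph.erase_node(bn_node)
    gm.recompile()
```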
Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn1d.test_fold_bn_erases_bn_node
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn2d.test_fold_bn_erases_bn_node
Reviewers: jerryzh168, yushangdi
Subscribers: jerryzh168, yushangdi, supriyar
Differential Revision: [D60520149](https://our.internmc.facebook.com/intern/diff/D60520149)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131651
Approved by: https://github.com/yushangdi, https://github.com/leslie-fang-intel
Summary:
Occasionally we run into a partition that looks like this for Add:
```
SourcePartition(nodes=[_constant2, add_2], source=<built-in function add>, input_nodes=[x], output_nodes=[_constant2, add_2], params=[_constant2])
```
In this case we are adding a constant to an input and reusing the constant later down the line, which causes the constant to be an output of our SourcePartition. The assumption that:
```
add_node = add_partition.output_nodes[0]
```
will not necessarily hold. As a result, we must check that the output node is indeed a call function and not a constant.
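A hedged sketch of such a check (the helper name is illustrative):
```python
from torch.fx import Node
from torch.fx.passes.utils.source_matcher_utils import SourcePartition

# Pick the partition output that is an actual call_function node rather than a
# constant output; asserts there is exactly one such output (sketch only).
def get_add_output_node(add_partition: SourcePartition) -> Node:
    call_fn_outputs = [n for n in add_partition.output_nodes if n.op == "call_function"]
    assert len(call_fn_outputs) == 1, "expected exactly one call_function output"
    return call_fn_outputs[0]
```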
Test Plan: buck test mode/dev-nosan //executorch/backends/xnnpack/test:test_xnnpack_ops -- test_qs8_add_constant
Differential Revision: D60413221
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132092
Approved by: https://github.com/jerryzh168
This PR fixes a bug in `test_correct_module_names` introduced in #130497. It also addresses post-fix test failures in:
* `torch/ao/quantization/__init__.py` - set the correct `__module__` for several public API helpers
* `torch/library.py` - add `register_vmap` to `__all__`
* `torch/nn/attention/flex_attention.py` - make `round_up_to_multiple` private by prepending an underscore
* `torch/storage.py` - introduce `__all__` to avoid `Self` being re-exported as a public API
* `torch/distributed/pipelining/schedules.py` - add `ZeroBubbleAlgorithm` to `__all__`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131386
Approved by: https://github.com/albanD
Reland https://github.com/pytorch/pytorch/pull/126704
#### Fixes the issue where the type of `nn.Module._state_dict_hooks` was changed in that PR, which was problematic:
Instead of using `Tuple(Callable, bool)` to keep track of whether the private `_register_state_dict_hook` or the public `register_state_dict_post_hook` API was used to register the hook and toggling the behavior accordingly, I set an attribute on the Callable in the private API; the attribute is never cleaned up.
If a callable previously registered using the private API is registered via the public API, a RuntimeError will be raised.
#### Copied from previous PR description
Fixes https://github.com/pytorch/pytorch/issues/75287 and https://github.com/pytorch/pytorch/issues/117437
- `nn.Module._register_state_dict_hook` --> add public `nn.Module.register_state_dict_post_hook`
- Add a test as this API was previously untested
- `nn.Module._register_load_state_dict_pre_hook` --> add public `nn.Module.register_load_state_dict_pre_hook` (remove the `with_module` flag, defaulting it to `True`)
~- For consistency with optimizer `load_state_dict_pre_hook` raised by @janeyx99, allow the pre-hook to return a new `state_dict`~
- For the issue pointed out by https://github.com/pytorch/pytorch/issues/117437 regarding the `_register_state_dict_hook` semantics of returning a new state_dict only being respected for the root module for the private hook:
- Document this for private `_register_state_dict_hook`
- Remove this for the public `register_state_dict_post_hook`
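A hedged usage sketch of the two public APIs above; the exact hook signatures are assumptions based on the private variants (the module is passed first, followed by the usual state_dict arguments):
```python
import torch.nn as nn

def post_hook(module, state_dict, prefix, local_metadata):
    # Mutate state_dict in place; unlike the private hook at the root, the public
    # post-hook's return value is not used to replace the state_dict.
    state_dict.pop(prefix + "some_scratch_buffer", None)  # hypothetical key

def pre_hook(module, state_dict, prefix, *args):
    # Runs before the module loads its portion of the state_dict.
    print(f"loading into {prefix or '<root>'}: {len(state_dict)} entries")

m = nn.Linear(2, 2)
m.register_state_dict_post_hook(post_hook)
m.register_load_state_dict_pre_hook(pre_hook)
m.load_state_dict(m.state_dict())
```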
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131690
Approved by: https://github.com/albanD
Summary: Previously, when folding BN into conv, we relied on DCE
to clean up the unused BN node from the graph. This works if
the model is already in eval mode, but fails if the model is
still in train mode because DCE doesn't remove nodes with
potential side effects (in this case `_native_batch_norm_legit`).
This required users to move the model to eval mode before calling
convert in order to get a properly DCE'd graph.
To solve this, we manually erase the BN node after folding
instead of relying on DCE. This relaxes the ordering constraints
between `move_exported_model_to_eval` and `convert_pt2e`.
Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn1d.test_fold_bn_erases_bn_node
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn2d.test_fold_bn_erases_bn_node
Reviewers: jerryzh168, yushangdi
Subscribers: jerryzh168, yushangdi, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131651
Approved by: https://github.com/yushangdi
Earlier, the signature of the dequantize ops for the decomposed quantized Tensor representation was changed for wider use cases where the output dtype can be different from torch.float and needs to be passed during dequantization.
Please refer: https://github.com/pytorch/pytorch/pull/121450
However, setting the correct output dtype for the dequantize ops was still missing in the convert_pt2e flow.
This change enables users to use the PT2E quantization flow with a non-torch.float unquantized dtype, such as torch.bfloat16.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128953
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
Summary:
Add three top-level APIs for the numeric debugger in the pt2e flow, which can log intermediate outputs in the model and calculate summary metrics for comparisons between nodes in two graphs (a usage sketch follows the list):
* `prepare_for_propagation_comparison`
* `extract_results_from_loggers`
* `compare_results`
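A rough usage sketch of how the three APIs compose; the import location and exact signatures are assumptions for illustration only:
```python
# Assumed import location; adjust to wherever the APIs are exported.
from torch.ao.quantization.pt2e._numeric_debugger import (
    prepare_for_propagation_comparison,
    extract_results_from_loggers,
    compare_results,
)

def debug_numerics(float_model, quant_model, example_inputs):
    # Insert loggers that record intermediate outputs keyed by debug handle.
    ref = prepare_for_propagation_comparison(float_model)
    actual = prepare_for_propagation_comparison(quant_model)
    ref(*example_inputs)
    actual(*example_inputs)
    # Collect logged values and summarize per-node comparison metrics.
    ref_results = extract_results_from_loggers(ref)
    actual_results = extract_results_from_loggers(actual)
    return compare_results(ref_results, actual_results)
```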
Test Plan:
python test/test_quantization.py -k test_prepare_for_propagation_comparison
python test/test_quantization.py -k test_extract_results_from_loggers
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130643
Approved by: https://github.com/dulinriley, https://github.com/tarun292
Summary:
* Added support for preserving `_numeric_debug_handle` during deepcopy; we need to remap the args since `_numeric_debug_handle` refers to the nodes in the graph.
TODO: fully support re-export; currently the metadata for the output node is not preserved.
Test Plan:
python test/test_quantization.py -k test_deepcopy_preserve_handle
python test/test_quantization.py -k test_copy_preserve_handle
all related tests:
python test/test_quantization.py -k TestGenerateNumericDebugHandle
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129287
Approved by: https://github.com/zhxchen17
Summary:
Previously we stored an edge id in numeric_debug_handle to support operator fusion and operator decomposition throughout the stack.
However, according to customer feedback, people prefer the simpler per-node id; they are fine with not having the additional support for numerical debugging of inputs and are willing to hack around to achieve it.
This PR changes the structure of numeric_debug_handle to store a unique_id for each node instead.
e.g.
graph:
```
node = op(input_node, weight_node)
```
Before:
```
node.meta[NUMERIC_DEBUG_HANDLE_KEY] = {input_node: id1, weight_node: id2, "output": id3}
```
After:
```
node.meta[NUMERIC_DEBUG_HANDLE_KEY] = id1
```
Test Plan:
python test/test_quantization.py -k TestGenerateNumericDebugHandle
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129811
Approved by: https://github.com/tarun292
Summary:
There were two problems with the HistogramObserver:
1. It does not work when someone passes a batch_size-1, tensor_size-1 data point.
2. The histogram doesn't actually update if the range of the new input falls within the old one.
These issues were both fixed.
On top of this, I greatly simplified the logic for the histogram updating. It no longer does the downsampling, which saves a ton of memory and code. The accuracy can still be controlled with the upsampling ratio; this ratio was also higher than the accuracy we generally need here, so I reduced its default.
The code is also cleaner now and much easier to follow.
`test_histogram_observer_same_inputs` was likely wrong: if I pass 0s and 1s to my HistogramObserver, I want them to actually count! The current test assumes it's fine to discard and ignore these values.
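A tiny repro-style sketch of the two cases above (values are arbitrary):
```python
import torch
from torch.ao.quantization.observer import HistogramObserver

obs = HistogramObserver(bins=256)
obs(torch.tensor([[0.5]]))        # (1) batch_size 1, tensor_size 1 data point
obs(torch.tensor([0.0, 1.0]))     # establish an initial range
obs(torch.tensor([0.25, 0.75]))   # (2) new range inside the old one must still update the histogram
scale, zero_point = obs.calculate_qparams()
print(scale, zero_point)
```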
Test Plan: You can run the included tests.
Differential Revision: D58931336
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129387
Approved by: https://github.com/jerryzh168
Summary:
(1) Make the code work when the first layer does not have a bias.
(2) Make it possible to provide both modules and module names as input.
(3) Allow sequences of contiguous layers as input, which then get split into pairs.
(4) Fix the documentation to be clearer about the inputs to be provided.
Test Plan:
Run this new version of the algorithm on a network and see if it throws errors.
There's also this notebook to run and test: N5199827
If you tell me where I can find the tests for this code, I can add some simple unit tests as well.
Differential Revision: D55895862
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124632
Approved by: https://github.com/jerryzh168
Summary:
Added `set_module_name_qconfig` support to allow users to set configurations based on module name in `X86InductorQuantizer`.
For example, only quantize the `sub`:
```python
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(5, 5)
        self.sub = Sub()

    def forward(self, x):
        x = self.linear(x)
        x = self.sub(x)
        return x

m = M().eval()
example_inputs = (torch.randn(3, 5),)
# Set config for a specific submodule.
quantizer = X86InductorQuantizer()
quantizer.set_module_name_qconfig("sub", xiq.get_default_x86_inductor_quantization_config())
```
- Added `set_module_name_qconfig` to allow users to set the configuration at the `module_name` level.
- Unified the annotation process to follow this order: `module_name_qconfig`, `operator_type_qconfig`, and `global_config`.
- Added `config_checker` to validate all user configurations and prevent mixing of static/dynamic or QAT/non-QAT configs.
- Moved `_get_module_name_filter` from `xnnpack_quantizer.py` into `utils.py` as it is common to all quantizers.
Test Plan
```bash
python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_set_module_name
```
@Xia-Weiwen @leslie-fang-intel @jgong5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126044
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/jerryzh168
Fixes https://github.com/pytorch/pytorch/issues/75287 and https://github.com/pytorch/pytorch/issues/117437
- `nn.Module._register_state_dict_hook` --> add public `nn.Module.register_state_dict_post_hook`
- Add a test as this API was previously untested
- `nn.Module._register_load_state_dict_pre_hook` --> add public `nn.Module.register_load_state_dict_pre_hook` (remove the `with_module` flag, defaulting it to `True`)
~- For consistency with optimizer `load_state_dict_pre_hook` raised by @janeyx99, allow the pre-hook to return a new `state_dict`~
- Document the issue pointed out by https://github.com/pytorch/pytorch/issues/117437 regarding the `_register_state_dict_hook` semantics of returning a new state_dict only being respected for the root module for the private hook
- Remove this for the public `register_state_dict_post_hook`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126704
Approved by: https://github.com/albanD
ghstack dependencies: #126906
Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing.
Note that only warnings that their messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.
Resolves #126888
- #126888
This PR is split from PR #126898.
- #126898
------
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127689
Approved by: https://github.com/Skylion007
Summary:
For the Speech sequential model, there could be a case where model(data) does not work correctly for the feed-forward pass:
the Speech model uses a different type of Criterion (a.k.a. loss function) to feed data to individual components like the encoder, predictor, and joiner.
Hence we need an extra parameter to pass a feed-forward wrapper.
Differential Revision: D57680391
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126891
Approved by: https://github.com/jerryzh168
Summary:
After QAT is completed, or when a pre-tuned weight observer is provided via a tunable PTQ algorithm, the weight observer should not be overwritten again with a given weight; for static QAT this should never happen.
Dynamic QAT also does not require re-running the weight observer, by design.
This is a fix for that.
Test Plan: Signals
Differential Revision: D57747749
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127309
Approved by: https://github.com/jerryzh168
Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing.
Note that only warnings that their messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.
UPDATE: Use `FutureWarning` instead of `DeprecationWarning`.
Resolves #126888
- #126888
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126898
Approved by: https://github.com/albanD
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Except for `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
ghstack dependencies: #127122, #127123, #127124, #127125
Summary: The PT2E quantization flow does not support unquantized
outputs yet. To work around this, users may wish to remove the
output observer from their graphs. However, this fails currently
in some cases because the `PortNodeMetaForQDQ` pass is too
restrictive, for example:
```
conv -> obs -------> output0
\\-> add -> output1
```
Previously we expected conv to always have exactly 1 user,
which is the observer. When the observer is removed, however,
conv now has 2 users, and this fails the check.
```
conv -------> output0
\\-> add -> output1
```
This commit relaxes the error into a warning to enable
this workaround.
Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_multi_users_without_output_observer
Reviewers: jerryzh168
Subscribers: jerryzh168, supriyar
Differential Revision: [D57472601](https://our.internmc.facebook.com/intern/diff/D57472601)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126487
Approved by: https://github.com/tarun292
The current call passes the string `['/actual/path']` to os.walk; since that string is not a real path, it silently leads to an empty traversal.
There is an unused function just above that handles that, so I guess this is what was supposed to be called.
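A quick illustration of the failure mode (paths are placeholders):
```python
import os

# os.walk on the *string* "['/actual/path']" (the repr of a list, not a real path)
# raises nothing and simply yields no entries, hence the silent empty traversal.
print(list(os.walk("['/actual/path']")))  # []
```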
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126103
Approved by: https://github.com/suo
Summary: Follow-up to https://github.com/pytorch/ao/pull/229.
This resolves the difference between `input.div(scales)` and
`input.mul(1.0 / scales)`, which results in small numerical
discrepancies on some inputs.
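A quick illustration of the discrepancy, assuming nothing about the quantization ops themselves; division by the scale and multiplication by its reciprocal round differently:
```python
import torch

x = torch.randn(1024, dtype=torch.float32)
scales = torch.full((1024,), 3.0, dtype=torch.float32)
a = x.div(scales)
b = x.mul(1.0 / scales)
# For some inputs the two differ in the last bits, which is the discrepancy
# the per-channel-group and per-token quantize ops were hitting.
print((a - b).abs().max())
```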
Test Plan:
python test/test_quantization.py TestQuantizedTensor.test_decomposed_quantize_per_channel_group
python test/test_quantization.py TestQuantizedTensor.test_decomposed_quantize_per_token
Reviewers: jerryzh168
Subscribers: jerryzh168, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125781
Approved by: https://github.com/jerryzh168
Summary: This commit fixes the pattern matching for conv-bn
during QAT fusion where both weight and bias are quantized per
channel. Previously this failed because weights and biases used
the same example kwargs for their scales and zero points,
causing these qparams to be tied during pattern matching.
Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn1d.test_qat_conv_bn_per_channel_weight_bias
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn2d.test_qat_conv_bn_per_channel_weight_bias
Reviewers: jerryzh168, angelayi
Subscribers: jerryzh168, angelayi, supriyar
Differential Revision: [D56740694](https://our.internmc.facebook.com/intern/diff/D56740694)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125208
Approved by: https://github.com/angelayi
Summary: Before in `move_exported_model_to_train/eval`, we only
switched the CPU versions of the batch norm op. This commit adds
support for the cuda versions of the op too. Note that this fix
is temporary; we won't have to differentiate between these two
cases once we have batch norm consolidation.
Test Plan:
python test/test_quantization.py -k test_move_exported_model_bn
Reviewers: jerryzh168
Subscribers: jerryzh168, leslie-fang-intel, supriyar
Differential Revision: [D56070054](https://our.internmc.facebook.com/intern/diff/D56070054)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123957
Approved by: https://github.com/jerryzh168
Summary: Avoid the situation where the graph traversal finds a matmul node with a `get_attr` as its `args[0]` and incorrectly propagates the `get_attr`'s meta to everything downstream.
Test Plan: CI
Differential Revision: D56219120
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124415
Approved by: https://github.com/jerryzh168
This commit enables float8_e5m2 and float8_e4m3fn dtypes in fx quantization and PT2E.
Motivation for using fp8 quantization instead of int8:
- it works better to run inference with the same datatype the model was trained with,
- fp8 can handle outliers better, which is one of the problems in LLMs activations.
The numerical recipe we want to use it for is fp8 inference:
- bgemms/gemms running in float8_e4m3fn,
- Per-Tensor-Quantization/Scaling,
- amax observer for measurement with input_backoff and weight_backoff.
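A conceptual sketch of the per-tensor fp8 round trip described above; the actual observer/qconfig wiring in the PT2E flow is not shown, and the backoff factors are omitted:
```python
import torch

x = torch.randn(64, 64)
fp8_max = torch.finfo(torch.float8_e4m3fn).max   # 448 for float8_e4m3fn
scale = x.abs().amax() / fp8_max                 # amax-based per-tensor scale
x_fp8 = (x / scale).to(torch.float8_e4m3fn)      # quantize
x_deq = x_fp8.to(torch.float32) * scale          # dequantize
print((x - x_deq).abs().max())                   # quantization error
```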
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123161
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
Adds a ruff lint rule to ban raising raw exceptions. Most of these should at the very least be runtime errors, value errors, type errors, or some other more specific error. There are hundreds of instances of these bad exception types already in the codebase, so I have noqa'd most of them. Hopefully this error code will get committers to rethink what exception type they should raise when they submit a PR.
I also encourage people to gradually go and fix all the existing noqas that have been added so they can be removed overtime and our exception typing can be improved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124570
Approved by: https://github.com/ezyang
Update ruff to 0.4.1.
This version fixes a lot false negatives/false positives, is 20-40% faster, and has various other bug fixes.
Below is a before and after table showing the execution time of ruff lint and ruff format in milliseconds courtesy of https://astral.sh/blog/ruff-v0.4.0
| Repository | Linter (v0.3) | Linter (v0.4) | Formatter (v0.3) | Formatter (v0.4) |
|----------------------------------------------------|---------------|---------------|------------------|------------------|
| [pytorch/pytorch](https://github.com/pytorch/pytorch) | 328.7 | 251.8 | 351.1 | 274.9 |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124549
Approved by: https://github.com/ezyang
Summary: https://github.com/pytorch/pytorch/pull/123452 added
backward support to this op by turning it into
CompositeImplicitAutograd, which meant it gets decomposed during
export/compile. However, this is not desirable behavior for the
PTQ case when we try to lower the model. This commit enables
QAT without breaking PTQ by refactoring the impl into a separate
op that does have backward support.
Test Plan:
python test/test_quantization.py -k test_decomposed_choose_qparams_per_token_asymmetric_backward
Reviewers: jerryzh168, digantdesai, zou3519
Subscribers: jerryzh168, digantdesai, zou3519, supriyar
Differential Revision: [D56192116](https://our.internmc.facebook.com/intern/diff/D56192116)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124178
Approved by: https://github.com/digantdesai
Automatic fixes that replaces certain list comprehensions with generator ones where appropriate so that they are immediately consumed. This is preview functionality in ruff for rule C419 and it was automatically applied.
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123960
Approved by: https://github.com/malfet
Summary: When running the backward for this op, we get the error:
```
RuntimeError: derivative for aten::aminmax is not implemented
```
This commit replaces this call with separate amin and amax
calls instead, which do have implemented derivatives.
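A minimal illustration of the replacement (sketch only):
```python
import torch

x = torch.randn(8, requires_grad=True)
# min_val, max_val = torch.aminmax(x)  # backward raised "derivative ... is not implemented"
min_val, max_val = x.amin(), x.amax()  # separate calls with implemented derivatives
(max_val - min_val).backward()
print(x.grad)
```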
Test Plan:
python test/test_quantization.py -k test_decomposed_choose_qparams_per_token_asymmetric_backward
Reviewers: jerryzh168, digantdesai
Subscribers: jerryzh168, digantdesai, supriyar
Differential Revision: [D55805170](https://our.internmc.facebook.com/intern/diff/D55805170)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123452
Approved by: https://github.com/digantdesai, https://github.com/jerryzh168
We are taking API feedback. Changes:
- I removed some of the default values (they weren't being used).
- I was unable to convert the last op (which is essentially an
autograd.Function registered as CompositeImplicitAutograd). That one
is "incorrectly registered"; I punt fixing it to the future.
Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123454
Approved by: https://github.com/andrewor14
ghstack dependencies: #123453, #123578
**Summary**
Add `matmul` in the quantization recipes, noting that it's not a general recipe but tailored to meet accuracy criteria for specific models. `matmul` recipe is disabled by default.
**Test Plan**
```
python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_attention_block
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122776
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #122775
**Summary**
Default recipes are enabled in `X86InductorQuantizer`, and requests have come in to customize recipes based on these defaults.
- Avoid annotation propagation and restrict annotation only to annotate `conv`/`linear`.
- Add `matmul` in the quantization recipes, noting that it's not a general recipe but tailored to meet accuracy criteria for specific models.
To meet these requests, this PR introduces the interfaces `set_function_type_qconfig` and `set_module_type_qconfig` (a usage sketch follows this list):
- `set_function_type_qconfig` accepts functional inputs such as `torch.nn.functional.linear` or `torch.matmul`; `set_module_type_qconfig` accepts nn.Module inputs such as `torch.nn.Conv2d`.
- To disable the recipe for an operator, users can simply exclude it from the list of operations via `quantizer.set_function_type_qconfig(op, None)`.
- To modify or extend the default recipe for an operator, users can customize it via `quantizer.set_function_type_qconfig(op, config)`.
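The usage sketch referenced above; the two method names and `xiq.get_default_x86_inductor_quantization_config()` come from this description, while everything else is an assumption for illustration:
```python
import torch
import torch.ao.quantization.quantizer.x86_inductor_quantizer as xiq
from torch.ao.quantization.quantizer.x86_inductor_quantizer import X86InductorQuantizer

quantizer = X86InductorQuantizer()
quantizer.set_global(xiq.get_default_x86_inductor_quantization_config())
# Disable the default recipe for a functional op:
quantizer.set_function_type_qconfig(torch.nn.functional.linear, None)
# Customize/extend the recipe for a module type:
quantizer.set_module_type_qconfig(
    torch.nn.Conv2d, xiq.get_default_x86_inductor_quantization_config()
)
```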
**Test Plan**
```
python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_filter_conv2d_recipe
python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_filter_linear_recipe
python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_filter_maxpool2d_recipe
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122775
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
Summary:
Even with the changes in D55347133, it is still possible to OOM in the histogram observer, because the size of the allocated tensor also depends on the *downsample_rate*.
For example, I still see OOMs due to attempts to allocate a 10GB+ histogram tensor in a multi-task model.
To better fix the OOM issue, we use a *try-catch* clause to avoid OOM.
Empirically, we cap the size of a single histogram tensor at 1 GB.
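A hedged sketch of the guard; the 1 GB cap and the structure are illustrative, not the observer's exact code:
```python
import torch

MAX_HISTOGRAM_BYTES = 1 << 30  # ~1 GB cap on a single histogram tensor

def try_update_histogram(x, bins, min_val, max_val):
    if bins * 4 > MAX_HISTOGRAM_BYTES:  # 4 bytes per float32 bin
        return None                      # would be too large; skip this update
    try:
        return torch.histc(x, bins=bins, min=min_val, max=max_val)
    except RuntimeError:
        return None                      # allocation failed (OOM); skip instead of crashing
```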
Test Plan: Test the change for Multi-Task model (depth + segmentation)
Differential Revision: D55567292
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123309
Approved by: https://github.com/jerryzh168
Summary: We probably don't need
`torch._C._AutoDispatchBelowAutograd()`, which is to prevent
infinite recursion if the implementation calls itself. Let's
remove it and see if anything breaks. The other major change
is registering the op to the more general Autograd dispatch
key so it can be used on cuda as well.
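A hedged sketch of registering an op under the general `Autograd` dispatch key via `torch.library`; the namespace, op name, and implementation are hypothetical:
```python
import torch
from torch.library import Library, impl

lib = Library("my_sketch", "DEF")  # hypothetical namespace
lib.define("fake_quant(Tensor x, float scale) -> Tensor")

@impl(lib, "fake_quant", "Autograd")  # one registration serves CPU and CUDA autograd
def fake_quant_autograd(x, scale):
    # Straight-through-style sketch: the inner ops are differentiable, so no
    # explicit backward or AutoDispatchBelowAutograd guard is needed here.
    return x + (torch.round(x / scale) * scale - x).detach()

x = torch.randn(4, requires_grad=True)
y = torch.ops.my_sketch.fake_quant(x, 0.5)
y.sum().backward()
```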
Test Plan:
python test/inductor/test_cpu_repro.py -k test_decomposed_fake_quant_per_channel
Reviewers: zou3519, bdhirsh
Subscribers: zou3519, bdhirsh, jerryzh168, leslie-fang-intel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123186
Approved by: https://github.com/zou3519, https://github.com/leslie-fang-intel
Summary: When we migrate to torch.export, we won't put L['self'] as the prefix for all the fqn in nn_module_stack. This diff adds the branch to handle the new case.
Test Plan: buck test mode/opt caffe2/test/quantization:test_quantization -- -r set_module_name
Differential Revision: D55436617
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122819
Approved by: https://github.com/tugsbayasgalan
Summary:
Right now we don't insert additional observers (i.e. we share observers) if qspec.dtype and qspec.is_dynamic match exactly.
Since the fixed qparams quantization spec and the derived quantization spec do not currently have an is_dynamic field, observer sharing does not happen between them and QuantizationSpec. In this PR we fix the issue by adding is_dynamic to all quantization specs.
Note: SharedQuantizationSpec should probably be its own type in the future
TODO later:
(1). move all these fields (dtype, is_dynamic, quant_min, quant_max etc.) to QuantizationSpecBase,
(2). make SharedQuantizationSpec a separate type
(3). add quant_min/quant_max in observer sharing checking in pt2e/prepare.py
Test Plan:
python test/test_quantization.py -k test_fixed_qparams_qspec_observer_dedup
Differential Revision: [D55396546](https://our.internmc.facebook.com/intern/diff/D55396546)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122734
Approved by: https://github.com/andrewor14
Summary:
also added some utils in xnnpack_quantizer_utils.py:
* annotate_conv_transpose_bn_relu and annotate_conv_transpose_bn -> these are for QAT
* annotate_conv_transpose_relu
conv_transpose + bn weight fusion is performed automatically and cannot currently be disabled;
we can add support for disabling this fusion later if needed.
Test Plan:
python test/test_quantization.py -k test_conv_transpose_bn_fusion
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122046
Approved by: https://github.com/andrewor14
Summary:
X-link: https://github.com/pytorch/executorch/pull/2308
Note: The initial purpose of this PR is to draw suggestions and feedback regarding a better alternative, if any.
At present, the dequantize ops for the decomposed quantized Tensor representation, e.g. dequantize_per_tensor(), assume the output dtype is torch.float and hence do not have the output dtype in their operator argument list. However, this op signature becomes unusable when the assumption breaks, because if the output dtype is different from torch.float, there is no way to specify it during dequantization.
This change is aimed at generalizing the signature of dequantize ops like dequantize_per_tensor() for wider use cases where the output dtype can be different from torch.float and needs to be passed during dequantization. The proposal is to use an additional argument named 'output_dtype' to solve the problem. However, we would also like suggestions and feedback regarding any better alternative that could be used instead.
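For illustration only, a Python-level sketch of the generalized signature proposed here; the argument name 'output_dtype' follows the proposal text, and the math is the usual affine dequantization, not the actual operator implementation:
```python
import torch

def dequantize_per_tensor_sketch(x, scale, zero_point, quant_min, quant_max, dtype,
                                 output_dtype=torch.float):
    # Compute in fp32, then cast to the requested unquantized dtype instead of
    # hard-coding torch.float as the return type.
    res = (x.to(torch.float32) - zero_point) * scale
    return res.to(output_dtype)

q = torch.randint(-128, 127, (4,), dtype=torch.int8)
print(dequantize_per_tensor_sketch(q, 0.1, 0, -128, 127, torch.int8,
                                   output_dtype=torch.bfloat16).dtype)  # torch.bfloat16
```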
cc jerryzh168 jianyuh raghuramank100 jamesr66a vkuzo jgong5 Xia-Weiwen leslie-fang-intel
Reviewed By: digantdesai
Differential Revision: D53590486
Pulled By: manuelcandales
Co-authored-by: kausik <kmaiti@habana.ai>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121450
Approved by: https://github.com/jerryzh168
**Summary**
Add the `quantized_decomposed.fake_quant_per_channel` operator and test the forward and backward of this op by comparing against ATen.
**Test Plan**
```
python -u -m pytest -s -v test_cpu_repro.py -k test_decomposed_fake_quant_per_channel
```
**Next Step**
Optimize the performance: the generated code for the forward and backward graphs does not vectorize.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121297
Approved by: https://github.com/jerryzh168, https://github.com/jgong5
Summary: Today we don't allow free functions as the tracing callable in torch.export. As part of migrating capture_preautograd_graph usages to torch.export, we need to ban free functions in capture_preautograd_graph as well.
Test Plan: CI
Differential Revision: D54319597
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120817
Approved by: https://github.com/zhxchen17, https://github.com/andrewor14
Summary: This commit adds the `model_is_exported` util function
for users to be able to easily tell what APIs to call to move
their models between train and eval modes. This has the
additional advantage of hiding the implementation of how we
detect a model is exported, in case the metadata format changes
in the future.
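A hedged usage sketch; the import path is an assumption, and torch.export is used here only to produce an exported module:
```python
import torch
from torch.ao.quantization.pt2e.export_utils import model_is_exported  # assumed path

m = torch.nn.Linear(4, 4)
ep = torch.export.export(m, (torch.randn(2, 4),))
print(model_is_exported(ep.module()))  # expected True for an exported GraphModule
print(model_is_exported(m))            # expected False for a regular eager module
```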
Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_model_is_exported
Differential Revision: [D53812972](https://our.internmc.facebook.com/intern/diff/D53812972)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119726
Approved by: https://github.com/tugsbayasgalan, https://github.com/albanD
Summary:
There was a bug in the module name filter for modules that already had an underscore in their name, because the underscore was replaced with "dot" notation.
It was assumed that underscores always meant a module separator, but this isn't the case for modules whose names contain an underscore.
Test Plan:
Added a unit test. Before this change, that test failed (due to applying the wrong
qscheme). Now it passes.
Differential Revision: D53502771
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119344
Approved by: https://github.com/jerryzh168
Summary:
This is a follow-up to https://github.com/pytorch/pytorch/pull/118605 to remove the `fold_quantize` flag from `convert_pt2e`.
Test Plan: CI
Differential Revision: D53247301
BC-Breaking Note:
The `fold_quantize` flag of `convert_pt2e` is now set to True by default, so we'll fold the quantize op into the weight by default; users will see a model size reduction by default after pt2e quantization.
2.2
```
folded_model = convert_pt2e(model, fold_quantize=True)
non_folded_model = convert_pt2e(model)
```
2.3
```
folded_model = convert_pt2e(model)
non_folded_model = convert_pt2e(model, fold_quantize=False)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118701
Approved by: https://github.com/andrewor14, https://github.com/leslie-fang-intel
Summary: This commit adds a util for PT2E quantization users
to call `model.train()` and `model.eval()` without error.
Instead, these will automatically call the equivalent
`move_exported_model_to_train/eval` for the user, which only
switch behavior for special ops like dropout and batchnorm.
This enables users to onboard to the PT2E flow more easily.
Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_allow_exported_model_train_eval
Reviewers: jerryzh168, tugsbayasgalan, zhxchen17
Subscribers: jerryzh168, tugsbayasgalan, zhxchen17, supriyar
Differential Revision: [D53426636](https://our.internmc.facebook.com/intern/diff/D53426636)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119091
Approved by: https://github.com/jerryzh168, https://github.com/tugsbayasgalan, https://github.com/zhxchen17
Summary: This is the equivalent API to `model.train()` for
exported models, analogous to `move_exported_model_to_eval`.
Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_move_exported_model_dropout
python test/test_quantization.py TestQuantizePT2E.test_move_exported_model_dropout_inplace
python test/test_quantization.py TestQuantizePT2E.test_move_exported_model_dropout_bn
Reviewers: jerryzh168, kimishpatel
Subscribers: jerryzh168, kimishpatel, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113492
Approved by: https://github.com/jerryzh168, https://github.com/tugsbayasgalan
Summary:
When `version` is missing in the metadata, use `min_val/max_val` as keys instead of `max_vals/min_vals`
## Reasons
1. It's been almost 2 years since this change (D30003700), which means most checkpoints are now using the `max_val/min_val` keys.
2. Most checkpoint dumps using `model.state_dict()` don't have version info, which leads to a spurious `missing keys` error when loading the state_dict.
Test Plan: CI
Differential Revision: D53233012
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118659
Approved by: https://github.com/jerryzh168
Summary:
Previously, by default we did not generate quantized weights; that is, we had fp32 weights and
`fp32 weight -> q -> dq -> linear -> ...` in the quantized model.
After this PR, we'll produce a graph with int8 weight by default after convert_pt2e:
`int8 weight -> dq -> linear -> ...`
We'll remove the fold_quantize flag in the next PR
Test Plan: CI
Differential Revision: D51730862
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118605
Approved by: https://github.com/andrewor14
Fixes https://github.com/pytorch/pytorch/issues/118129
Suppressions automatically added with
```
import re

with open("error_file.txt", "r") as f:
    errors = f.readlines()

error_lines = {}
for error in errors:
    match = re.match(r"(.*):(\d+):\d+: error:.*\[(.*)\]", error)
    if match:
        file_path, line_number, error_type = match.groups()
        if file_path not in error_lines:
            error_lines[file_path] = {}
        error_lines[file_path][int(line_number)] = error_type

for file_path, lines in error_lines.items():
    with open(file_path, "r") as f:
        code = f.readlines()
    for line_number, error_type in sorted(lines.items(), key=lambda x: x[0], reverse=True):
        code[line_number - 1] = code[line_number - 1].rstrip() + f" # type: ignore[{error_type}]\n"
    with open(file_path, "w") as f:
        f.writelines(code)
```
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Co-authored-by: Catherine Lee <csl@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118533
Approved by: https://github.com/Skylion007, https://github.com/zou3519
This is a lot of files changed! Don't panic! Here's how it works:
* Previously, we set `follow_imports = silent` for our mypy.ini configuration. Per https://mypy.readthedocs.io/en/stable/running_mypy.html#follow-imports, what this does is whenever we have an import to a module which is not listed as a file to be typechecked in mypy, we typecheck it as normal but suppress all errors that occurred in that file.
* When mypy is run inside lintrunner, the list of files is precisely the files covered by the glob in lintrunner.toml, but with files in excludes excluded.
* The top-level directive `# mypy: ignore-errors` instructs mypy to typecheck the file as normal, but ignore all errors.
* Therefore, it should be equivalent to set `follow_imports = normal`, if we put `# mypy: ignore-errors` on all files that were previously excluded from the file list.
* Having done this, we can remove the exclude list from .lintrunner.toml, since excluding a file from typechecking is baked into the files themselves.
* torch/_dynamo and torch/_inductor were previously in the exclude list, because they were covered by MYPYINDUCTOR. It is not OK to mark these as `# mypy: ignore-errors` as this will impede typechecking on the alternate configuration. So they are temporarily being checked twice, but I am suppressing the errors in these files as the configurations are not quite the same. I plan to unify the configurations so this is only a temporary state.
* There were some straggler type errors after these changes somehow, so I fixed them as needed. There weren't that many.
In the future, to start type checking a file, just remove the ignore-errors directive from the top of the file.
The codemod was done with this script authored by GPT-4:
```
import glob

exclude_patterns = [
    ...
]

for pattern in exclude_patterns:
    for filepath in glob.glob(pattern, recursive=True):
        if filepath.endswith('.py'):
            with open(filepath, 'r+') as f:
                content = f.read()
                f.seek(0, 0)
                f.write('# mypy: ignore-errors\n\n' + content)
```
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118414
Approved by: https://github.com/thiagocrepaldi, https://github.com/albanD
**Description**
Add a dynamic quantization config for the x86 inductor backend.
To support the QKV structure in self-attention, we removed an assertion in the port-metadata pass that requires a single dequantize node after a quantize node.
**Test plan**
```
python test/test_quantization.py -k TestQuantizePT2EX86Inductor.test_dynamic_quant_linear
python test/test_quantization.py -k TestQuantizePT2EX86Inductor.test_qat_dynamic_quant_linear
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115337
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
Updates flake8 to v6.1.0 and fixes a few lints using sed and some ruff tooling.
- Replace `assert(0)` with `raise AssertionError()`
- Remove extraneous parentheses, i.e.
- `assert(a == b)` -> `assert a == b`
- `if(x > y or y < z):`->`if x > y or y < z:`
- And `return('...')` -> `return '...'`
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116591
Approved by: https://github.com/albanD, https://github.com/malfet
Summary:
We introduced `node.meta["numeric_debug_handle"]` in https://github.com/pytorch/pytorch/pull/114315 to
indicate the numeric debug handle for values in the graph. In this PR we support preserving this field
in prepare and convert so that we can use it for numerical debugging.
Next: we also want to preserve it in deepcopy of GraphModule as well.
Test Plan:
python test/test_quantization.py -k test_quantize_pt2e_preserve_handle
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116477
Approved by: https://github.com/tugsbayasgalan
Summary:
`_fold_conv_bn_qat` currently takes a long time, so we skip it when it's not necessary;
we can have follow-up fixes to actually reduce the patterns or cache them if possible.
Test Plan:
uncomment the print in `test_speed`, run
python test/test_quantization.py -k test_speed
and make sure the convert time is low, e.g. 0.1s instead of 8-9 seconds
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116440
Approved by: https://github.com/andrewor14