Summary:
During convert step observers are first replaced by Q-DQ pair. In some
scenarios like following output DQ has a fan out.
---> OP2 -> Q -> DQ
/
OP -> Q -> DQ -
\
---> OP3 -> Q -> DQ
If either op OP2 or OP3 are configured to be quantized, then the input
is expected to quantized. In this case quantized equivalent of some
pattern, that quantizer asked to be quantized, should look like:
[DQ -> {pattern} -> Q]. However, in scenario like above where DQ node
is shared between multiple "quantized" patterns, boundary of "quantized"
pattern is not clear because DQ now belongs to multiple quantized
patterns.
This poses challenge for:
- Porting metadata: which "quantized" partition this DQ node belongs
- Quantized representation, equivalently, needs to identify
self-contained quantized pattern that is replaced by its equivalent pattern
that captures compute in the quantized precision.
Test Plan:
test_duplicate_dq_pass
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: [D48663147](https://our.internmc.facebook.com/intern/diff/D48663147)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107900
Approved by: https://github.com/jerryzh168, https://github.com/andrewor14, https://github.com/leslie-fang-intel
ghstack dependencies: #107105, #107106, #107899
Summary:
In prepararation for metadata porting diff, it is required that weight
quant annotation happens via edge quantization, i.e. input_qspec_map.
Reason: Metadata is ported via associating DQ node's metadata with its
consumer while associating Q node's metadata with its producer.
Furthermore, such porting must be qualified via user intent to see if
the consumder of DQ, or producer of Q, actually specified intent of
quantization
By making quantization annotation on linear node's weight via
input_qspec_map, we can enable associating DQ of [weight -> Q -> DQ],
with the linear module.
Test Plan:
CI
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: [D48488414](https://our.internmc.facebook.com/intern/diff/D48488414)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107105
Approved by: https://github.com/jerryzh168
Summary: D41985889 removed the cast to int for the inputs to torch.histc below, allowing the inputs to still be tensors. These tensors still have require_grad_ set to True, causing issues with the call to torch.histc.
Test Plan: buck2 test 'fbcode//mode/opt' fbcode//dper3/dper3/modules/low_level_modules/tests:stat_collector_test -- --exact 'dper3/dper3/modules/low_level_modules/tests:stat_collector_test - test_scripted_module (dper3.dper3.modules.low_level_modules.tests.stat_collector_test.StatCollectorTest_1)'
Differential Revision: D48800879
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108232
Approved by: https://github.com/jerryzh168
Summary:
Previously we run propagate_annotation by default in quantization flow to propagate annotations for ops like reshape, view etc.
Not all quantizers would need this so we moved this to xnnpack_quantizer_utils for now.
Next Step:
* make propagate_annotation function configurable with a custom list of ops
* remove unneeded ops in `_is_share_obs_or_fq_op`
Test Plan:
python test/test_quantization.py TestQuantizePT2E
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: [D48856985](https://our.internmc.facebook.com/intern/diff/D48856985)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108320
Approved by: https://github.com/kimishpatel
Summary: This commit adds a public facing
`torch.ao.quantization.move_model_to_eval` util function
for QAT users. Instead of calling model.eval() on an exported
model (which doesn't work, see
https://github.com/pytorch/pytorch/issues/103681), the user
would call this new util function instead. This ensures special
ops such as dropout and batchnorm (not supported yet) will have
the right behavior when the graph is later used for inference.
Note: Support for an equivalent `move_model_to_train` will be
added in the future. This is difficult to do for dropout
currently because the eval pattern of dropout is simply a clone
op, which we cannot just match and replace with a dropout op.
Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_move_model_to_eval
Reviewers: jerryzh168, kimishpatel
Subscribers: jerryzh168, kimishpatel, supriyar
Differential Revision: [D48814735](https://our.internmc.facebook.com/intern/diff/D48814735)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108184
Approved by: https://github.com/jerryzh168
**Summary**
Add linear and linear-unary post-op quantization recipe to x86 inductor quantizer. For PT2E with Inductor. With this, the quantization path will add `quant-dequant` pattern for linear and linear-unary post op.
**Test plan**
python test/test_quantization.py -k test_linear_with_quantizer_api
python test/test_quantization.py -k test_linear_unary_with_quantizer_api
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106781
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #105818
**Summary**
The latest check-in a0cfaf0688 for the conv-bn folding assumes the graph is captured by the new graph capture API `torch._export.capture_pre_autograd_graph`. Since we still need to use the original graph capture API `torch._dynamo_export` in 2.1 release. So, this check-in made negative impact to workloads' performance heavily. Made this PR to fix this issue by trying to make the conv-bn folding function workable with both new and original graph capture API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107951
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #106836, #106838, #106958
Summary: This fixes the no bias case for conv annotations.
Previously this would result in an index out of bounds, since
the new aten.conv2d op may not have the bias arg (unlike the
old aten.convolution op). This was not caught because of a lack
of test cases, which are added in this commit.
Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_qat_conv_no_bias
python test/test_quantization.py TestQuantizePT2E.test_qat_conv_bn_relu_fusion_no_conv_bias
Reviewers: jerryzh168, kimishpatel
Subscribers: jerryzh168, kimishpatel
Differential Revision: [D48696874](https://our.internmc.facebook.com/intern/diff/D48696874)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107971
Approved by: https://github.com/jerryzh168
When exporting dropout with cpu tensor, we get following graph module
```
class GraphModule(torch.nn.Module):
def forward(self, arg0_1: f32[512, 10]):
empty_memory_format: f32[512, 10] = torch.ops.aten.empty.memory_format([512, 10], dtype = torch.float32, layout = torch.strided, device = device(type='cpu'), pin_memory = False, memory_format = torch.contiguous_format)
bernoulli_p: f32[512, 10] = torch.ops.aten.bernoulli.p(empty_memory_format, 0.9); empty_memory_format = None
div_scalar: f32[512, 10] = torch.ops.aten.div.Scalar(bernoulli_p, 0.9); bernoulli_p = None
mul_tensor: f32[512, 10] = torch.ops.aten.mul.Tensor(arg0_1, div_scalar); arg0_1 = div_scalar = None
return (mul_tensor,)
```
In addition, if we export with eval() mode, we will have an empty graph.
However, when exporting with cuda tensor, we got
```
class GraphModule(torch.nn.Module):
def forward(self, arg0_1: f32[512, 10]):
native_dropout_default = torch.ops.aten.native_dropout.default(arg0_1, 0.1, True); arg0_1 = None
getitem: f32[512, 10] = native_dropout_default[0]; native_dropout_default = None
return (getitem,)
```
and exporting under eval() mode will still have a dropout node in graph.
This PR make exporting with CPU tensor also produce aten.native_dropout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106274
Approved by: https://github.com/ezyang
This updates ruff to 0.285 which is faster, better, and have fixes a bunch of false negatives with regards to fstrings.
I also enabled RUF017 which looks for accidental quadratic list summation. Luckily, seems like there are no instances of it in our codebase, so enabling it so that it stays like that. :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
This updates ruff to 0.285 which is faster, better, and have fixes a bunch of false negatives with regards to fstrings.
I also enabled RUF017 which looks for accidental quadratic list summation. Luckily, seems like there are no instances of it in our codebase, so enabling it so that it stays like that. :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
Summary:
Currently in quantizer/quantize_pt2e we import things from specific quantizers (XNNPACKQuantizer, QuantizationConfig) etc.
this PR removes them so it's clearer that they are not part of the core quantization code base
This PR also removed get_supported_operators from main Quantizer since we haven't seen a clear need for this API
Test Plan:
CIs
Imported from OSS
Differential Revision: D48340367
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107259
Approved by: https://github.com/kimishpatel
Summary:
Previously if we have:
```
conv1 -> cat
conv2 /
```
and configure output of conv1/conv2 to be int8 quantized, and cat also int8 quantized and with shared inputs,
it will not produce expected results (input of cat will not be shared)
The problem is that there is some missing checks when inserting observers for input for cat
This PR fixes the problem.
Fixes: https://github.com/pytorch/pytorch/issues/106760
Test Plan:
python tes/test_quantization.py TestQuantzePT2E.test_shared_qspec
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106922
Approved by: https://github.com/kimishpatel
Summary: Internal model and Resnet uses "re-export" flow now. Also did some refactoring to make the code little cleaner
Some changes for OSS:
1. Correctly use the "cached" fake tensors so that static symbols are still resolved to static
2. Change logic in PassBase to allocate static shapes for parameters
3. Add "is_torch_exported" tag to every node to make it survive during various graph transformations.
4. Added experimental wrapper API for quantization team to get pre_dispatch=True graph. Note that it doesn't actually do that right now. But we plan to switch soon.
Test Plan: CI
Differential Revision: D47890878
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106676
Approved by: https://github.com/jerryzh168
Summary:
This is to allow sharing these annotate functions by other quantizers so that writing a new quantizer is easier
note that these annotation functions will be maintained by XNNPACKQuantizer developers instead of AO team
Test Plan:
python test/test_quantization.py TestQuantizePT2E
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106642
Approved by: https://github.com/andrewor14
Summary:
As title.
There's a corner case where both cpu and gpu are avaiable, although the model is moved to cpu, the newly created PTQ weight observer is still on gpu. Therefore, during the convert, this line will fail https://fburl.com/4rhipfvb
Test Plan: CI
Differential Revision: D48141494
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106755
Approved by: https://github.com/jerryzh168
Summary:
Added support to allow users to set configurations based on module type in XNNPACKQuantizer, can also serve as an example
for implementing new quantizers
Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_xnnpack_quantizer_set_module_type
Summary:
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106094
Approved by: https://github.com/andrewor14
ghstack dependencies: #106087
Summary:
Added support to allow users to set configurations based on module name in XNNPACKQuantizer, can also serve as an example
for implementing new quantizers
Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_xnnpack_quantizer_set_module_name
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106087
Approved by: https://github.com/andrewor14
Summary: moving quantizer to torch.ao.quantization to make it a public api, since pt2e is a folder for implementations
Test Plan:
CIs
sanity check: "buck test //executorch/backends/xnnpack/test:test_xnnpack_quantized_models -- test_resnet18"
Differential Revision: D47727838
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105885
Approved by: https://github.com/andrewor14