Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65033
1. Move the files:
```
hg mv caffe2/torch/quantization/fx caffe2/torch/ao/quantization/fx
hg mv caffe2/torch/quantization/quantize_fx.py caffe2/torch/ao/quantization/quantize_fx.py
```
2. Create new files
```
touch caffe2/torch/quantization/quantize_fx.py
touch caffe2/torch/quantization/fx/__init__.py
```
3. Import things in the new files
4. Add tests to test/quantization/ao_migration/test_quantization_fx.py
This is needed because we have some fx imports in quantize_fx and fx/*.py.
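The re-export pattern in step 3 can be sketched as follows. This is a simulation using in-memory modules, not the actual PyTorch source; the module names are placeholders. The idea is that the file left behind at the old location simply re-imports from the new location, so both import paths keep working:

```python
import sys
import types

# Stand-in for the file at its NEW location (after the hg mv).
sys.modules["ao"] = types.ModuleType("ao")
new_mod = types.ModuleType("ao.quantize_fx")
new_mod.prepare_fx = lambda model: f"prepared {model}"
sys.modules["ao.quantize_fx"] = new_mod

# Stand-in for the stub file re-created at the OLD location: its entire
# body is a re-import from the new location.
old_mod = types.ModuleType("quantize_fx")
exec("from ao.quantize_fx import prepare_fx", old_mod.__dict__)
sys.modules["quantize_fx"] = old_mod

# Both import paths now resolve to the very same function object.
import quantize_fx
assert quantize_fx.prepare_fx is new_mod.prepare_fx
```

The ao_migration tests in step 4 essentially verify this property for every public name in the migrated files.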
Test Plan: buck test mode/dev //caffe2/test:quantization
Reviewed By: vkuzo, z-a-f
Differential Revision: D30949749
fbshipit-source-id: 9e5d4d039c8a0a0820bc9040e224f0d2c26886d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65109
Previously for sub we set the dtype with the qconfig, since sub is matched with a QuantizeHandler.
However, this is incorrect: the dtype for sub is decided by whether its output is quantized or not,
so we added a check of is_output_quantized when deciding the dtype for the output of sub.
Later: is_output_quantized now depends on is_reference, which is pretty confusing and may cause problems down the road; we should remove this dependency in the future.
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_sub_scalar
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D30977826
fbshipit-source-id: 551fd63bd61b43b3c3415944ff73174e3a21cc8a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64445
The AO team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates `quantize.py` from `torch.quantization` to `torch.ao.quantization`.
At this point both locations will be supported. Eventually torch.quantization will be deprecated.
Test Plan: `buck test mode/dev //caffe2/test:quantization`
Reviewed By: HDCharles
Differential Revision: D30734870
fbshipit-source-id: dc204f3cc46bff2cc81c95159eab9d333b43bb4b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64288
This is just to set up the file structure and unblock experimentation.
The format of backend_config_dict will change in the future.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: zou3519
Differential Revision: D30699457
fbshipit-source-id: 28211a4def05d34757850c045a36e311f54760fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64135
We want to start aligning the api with the design in https://github.com/pytorch/pytorch/wiki/Extending-PyTorch-Quantization-to-Custom-Backends
We plan to gradually move things from `prepare_custom_config_dict` and `convert_custom_config_dict`
to `backend_config_dict` and allow custom backend developers to define their own way of quantizing operators.
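As a rough illustration of the direction (the key names below are hypothetical, not the real schema — the format was still expected to change at this point), a backend_config_dict might describe which graph patterns a backend can quantize and with which dtypes:

```python
import operator

# Hypothetical sketch only: "quant_patterns", "pattern", and "dtype_configs"
# are illustrative key names, not the finalized backend_config_dict format.
backend_config_dict = {
    "quant_patterns": [
        {
            "pattern": operator.add,  # op/pattern the backend can quantize
            "dtype_configs": [
                {"input_dtype": "quint8", "output_dtype": "quint8"},
            ],
        },
    ],
}

assert backend_config_dict["quant_patterns"][0]["pattern"] is operator.add
```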
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: zou3519
Differential Revision: D30699456
fbshipit-source-id: e3c068da8d3da2270f57719f7159cc71cafa8598
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64086
AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates the `quantize.py` from torch.quantization to `torch.ao.quantization`.
At this point both locations will be supported. Eventually the torch.quantization will be deprecated.
Test Plan: `buck test mode/opt //caffe2/test:quantization`
Reviewed By: jerryzh168, raghuramank100
Differential Revision: D30055886
fbshipit-source-id: 8ef7470f9fa640c0042bef5bb843e7a05ecd0b9f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62698
We also removed the special handling in match_utils for binary ops.
Test Plan:
python test/test_quantize.py TestQuantizeFx
python test/test_quantize.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D30093781
fbshipit-source-id: 58cc972de8211a80dd4d111e25dc4ad36057933f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63513
Updates these names per Jerry's nits in the previous PR.
Test Plan: Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D30406710
fbshipit-source-id: a9f1577a2b8c4a93f5005e0f6278b7d7348d8b66
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63384
In some cases, changes to the qconfig on a module would cause
fusions to fail. This bugfix solves that problem by adding a
qconfig_function_comparison that compares the functions within the
qconfig rather than the modules the qconfigs are on. The comparison
looks at the partial objects within QConfig.activation/weight.p and
compares args, keywords, and func. This has to be done manually
because partial does not implement __eq__, so == falls back to identity (is).
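The core issue can be reproduced with plain functools. This is a minimal sketch: make_observer and partials_equal are illustrative stand-ins, not the helper added in the PR.

```python
import functools

def make_observer(dtype):
    # stand-in for a real observer factory
    return dtype

a = functools.partial(make_observer, dtype="quint8")
b = functools.partial(make_observer, dtype="quint8")

# partial does not implement __eq__, so == falls back to identity and two
# logically identical constructors compare unequal:
assert a != b

# Comparing the pieces manually, as the new helper does, gives the right answer:
def partials_equal(p, q):
    return p.func == q.func and p.args == q.args and p.keywords == q.keywords

assert partials_equal(a, b)
```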
Test Plan:
python test/test_quantization.py
TestFuseFx.test_problematic_fuse_example
Imported from OSS
Reviewed By: supriyar, ejguan
Differential Revision: D30386264
fbshipit-source-id: 51e358c021c39d6f48dc12ad2a82b2838677b9de
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63376
Previously, when a tuple was an argument to a quantizable op it would be transformed into a list by mistake;
this PR fixes that.
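The bug class can be sketched with a toy recursive arg-mapper (map_arg_buggy and map_arg_fixed are illustrative names, not the actual FX helpers): rebuilding every sequence as a list silently turns tuples into lists, while rebuilding with the container's original type preserves them.

```python
# Buggy version: always rebuilds sequences as lists.
def map_arg_buggy(a, fn):
    if isinstance(a, (list, tuple)):
        return [map_arg_buggy(x, fn) for x in a]
    return fn(a)

# Fixed version: rebuilds the container with its original type.
def map_arg_fixed(a, fn):
    if isinstance(a, (list, tuple)):
        return type(a)(map_arg_fixed(x, fn) for x in a)
    return fn(a)

args = (1, [2, 3])
assert map_arg_buggy(args, lambda x: x) == [1, [2, 3]]  # tuple lost
assert map_arg_fixed(args, lambda x: x) == (1, [2, 3])  # tuple preserved
```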
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_preserve_tuple
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D30357642
fbshipit-source-id: 82d10805d9c00c003cc99983dca68b6455ff7b2e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63343
The previous implementation had a bug: we were modifying an ordered dict while iterating over it.
This fixes it by creating a copy before modifying.
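The bug class is easy to reproduce with a plain dict (the keys below are illustrative, not the actual qconfig_dict contents):

```python
# Mutating a dict while iterating over it raises at the next iteration step.
qconfig_dict = {"Linear": "qconfig_a", "ReLU": "qconfig_b"}
try:
    for key in qconfig_dict:
        qconfig_dict[key + "_swapped"] = qconfig_dict[key]
except RuntimeError as e:
    err = str(e)  # "dictionary changed size during iteration"

# The fix: iterate over a snapshot of the keys, then mutate the original safely.
fixed = {"Linear": "qconfig_a", "ReLU": "qconfig_b"}
for key in list(fixed):
    fixed[key + "_swapped"] = fixed[key]

assert "Linear_swapped" in fixed and "ReLU_swapped" in fixed
```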
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qconfig_qat_module_type
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D30346116
fbshipit-source-id: 0e33dad1163e8bff3fd363bfd04de8f7114d7a3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62366
In the case of models with branches, we are unable to equalize the branching part in the graph.
For example, given this graph:
```
        conv2
       /     \
x -> conv1 -> add
```
After prepare, we will ignore the branched layers (conv1 and conv2) and will not insert the equalization observers. A warning message will also be printed with the layers that are unable to be equalized.
```
                        conv2 -> out_quant_obs2
                       /                       \
x -> input_quant_obs -> conv1 -> out_quant_obs1 -> add
```
Test Plan:
`python test/test_quantization.py TestEqualizeFx.test_input_weight_equalization_prepare`
Imported from OSS
Reviewed By: malfet, supriyar
Differential Revision: D29982585
fbshipit-source-id: 706297e7f1861975998dfa83e7ca59af09d80618
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61859
BC-breaking note:
Previously we did not add an observer/fake_quant for the output of add/mul in the tensor-scalar case.
In this PR we add an observer/fake_quant instance (the same one as the input's) to correctly model
the behavior of the quantized add_scalar and mul_scalar ops, since quantized add/mul scalar assumes the
output quantized tensor has the same quantization parameters as the input.
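A toy model of affine quantization (plain Python, not the real quantized kernels) shows why the output can share the input's quantization parameters: adding a constant only shifts values, it does not rescale them.

```python
# Toy affine quantization: q = round(x / scale) + zero_point
def quantize(x, scale, zero_point):
    return round(x / scale) + zero_point

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

scale, zero_point = 0.5, 10
x = 3.0
qx = quantize(x, scale, zero_point)

# Add the scalar 2.0 in the quantized domain, reusing the SAME scale and
# zero_point for the output -- the result still dequantizes exactly.
qy = qx + round(2.0 / scale)
assert dequantize(qy, scale, zero_point) == x + 2.0
```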
Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_add
python test/test_quantization.py TestQuantizeFxOps.test_mul
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D29770859
fbshipit-source-id: f43fcbfecd04c392467770b22c481bbbdaf43c25
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61687
Previously we did not insert observer/fake_quant for the output of copy nodes (e.g. maxpool).
But to produce reference patterns we need to insert observer/fake_quant for the output and later convert that to a quantize node.
Model:
```
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.maxpool2d = torch.nn.MaxPool2d(kernel_size=3)

    def forward(self, x):
        x = self.maxpool2d(x)
        return x
```
Result of prepare, before:
```
def forward(self, x):
    x_activation_post_process_0 = self.x_activation_post_process_0(x); x = None
    maxpool2d = self.maxpool2d(x_activation_post_process_0); x_activation_post_process_0 = None
    return maxpool2d
```
After:
```
def forward(self, x):
    x_activation_post_process_0 = self.x_activation_post_process_0(x); x = None
    maxpool2d = self.maxpool2d(x_activation_post_process_0); x_activation_post_process_0 = None
    maxpool2d_activation_post_process_0 = self.maxpool2d_activation_post_process_0(maxpool2d); maxpool2d = None
    return maxpool2d_activation_post_process_0
```
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D29715566
fbshipit-source-id: 817df9b2933a35cad5331d8d8ce1c5ba0752e9df
Summary:
This PR enables GPU-only quantization, best used with is_reference since
there are not many GPU kernels for ops as of now.
This PR mainly changes how qconfigs and their observer constructors operate once they
are on a module's qconfig. The function add_module_to_qconfig_obs_ctr takes the observer constructors on the original
qconfig and configures them so that, when invoked, the created observer will
be on whatever device the module occupies. (Once observers are created,
module.to(device) is already set up so that it moves any observers.) To do this,
a new method and a few small changes were added to the _PartialWrapper class that
our observers already use to create constructors (without changing the
existing functionality). These changes work in
concert with changes to the prepare flow such that when the qconfigs are
propagated to the modules (in quantize.py and qconfig_utils.py) they are configured using add_module_to_qconfig_obs_ctr.
Ideally this would work on other models, but the is_reference support for
a lot of modules isn't there yet; those tests should be added in a
future PR.
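The idea behind add_module_to_qconfig_obs_ctr can be sketched with plain functools (Observer and with_device below are hypothetical stand-ins, not the real classes): wrap the observer constructor stored on a qconfig so that, when invoked during prepare, the observer it creates lands on the owning module's device.

```python
import functools

class Observer:
    """Stand-in for a real observer; real ones take a device at construction."""
    def __init__(self, device="cpu"):
        self.device = device

def with_device(obs_ctr, module_device):
    # Return a new constructor with the module's device baked in, leaving
    # the original constructor untouched.
    return functools.partial(obs_ctr, device=module_device)

gpu_ctr = with_device(Observer, "cuda:0")
assert gpu_ctr().device == "cuda:0"  # created directly on the module's device
assert Observer().device == "cpu"    # original constructor is unchanged
```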
Test Plan:
python test/test_quantization.py TestQuantizeFxModels.test_static_gpu_convert_basic
python test/test_quantization.py TestQuantizeFxModels.test_switch_device_prepare_convert
python test/test_quantization.py TestQuantizeFxModels.test_prepare_serialize_switch_device_convert
python test/test_quantization.py TestQuantizeFx.test_qconfig_precedence
Reviewed By: vkuzo
Differential Revision: D29684114
fbshipit-source-id: 19fefb8e1998eaf212723e836276ccf39467f2e7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61132
For large models, the insert_observers_for_model function was taking a long time, especially when not all of the nodes are being quantized.
For example, for a model with 21000 nodes of which only ~50 are being quantized, the breakdown of prepare_fx vs convert_fx was:
prepare_fx: 979 seconds
convert_fx: 9 seconds
The main reason was that we were doing some unnecessary computation for all nodes in this function; this PR just moves that computation to where it is actually used.
After this PR:
prepare_fx: 26 seconds
convert_fx: 9 seconds
Test Plan:
Existing tests
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D29522303
fbshipit-source-id: 7ce12582a859d02ff763abebf4a592d28e0764ca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60779
When we do fusion, we replace certain modules (such as Linear + ReLU) with fused versions (such as LinearReLU) by calling `_fuse_fx` in prepare_fx. However when we try to look up using the fused module type in qconfig_dict, we cannot find a match anymore since the qconfig dict contains the original module types. An example is here [N882873](https://fburl.com/anp/azenjx3v).
So we will now update the qconfig_dict to include the fused modules mapping to the qconfigs used for the modules that make up the fused modules. If the modules are not mapped to the same qconfig, then we will raise an error.
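A minimal sketch of the idea (the function name and dict layout are hypothetical, and plain strings stand in for real qconfig objects, which are not directly comparable): propagate the qconfig from the constituent module types to the fused type, raising if they disagree.

```python
def update_qconfig_for_fusion(qconfig_dict, fused_type, part_types):
    """Map the fused module type to the qconfig shared by its parts."""
    qconfigs = [qconfig_dict["object_type"].get(t) for t in part_types]
    if len(set(qconfigs)) != 1:
        raise ValueError(
            f"modules fused into {fused_type} have different qconfigs"
        )
    qconfig_dict["object_type"][fused_type] = qconfigs[0]

qd = {"object_type": {"Linear": "qconfig_a", "ReLU": "qconfig_a"}}
update_qconfig_for_fusion(qd, "LinearReLU", ["Linear", "ReLU"])
assert qd["object_type"]["LinearReLU"] == "qconfig_a"
```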
Test Plan:
`python test/test_quantization.py TestFuseFx.test_qconfig_fused_module`
Imported from OSS
Reviewed By: supriyar
Differential Revision: D29406941
fbshipit-source-id: 74b5db89f4998aeb02b2bf7c37bf97326580c654
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60555
When we do QAT, we swap the FP32 modules with their QAT module counterparts by calling `qat_swap_modules` in prepare.
However, when we try to look up the swapped module type in qconfig_dict, we cannot find a match anymore since the qconfig dict contains the original
module type.
In this PR we update the qconfig_dict to include the modules swapped for QAT.
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qconfig_qat_module_type
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D29337036
fbshipit-source-id: 60212eec3ee252a2445c1b58874cb36048c9f7dd
Summary:
During development it is common practice to put `type: ignore` comments on lines that are correct, but that `mypy` doesn't recognize. This often stems from the fact that the `mypy` version in use wasn't able to handle the pattern.
With every new release `mypy` gets better at handling complex code. In addition to fixing all the previously accepted but now failing patterns, we should also revisit all `type: ignore` comments to see if they are still needed. Fortunately, we don't need to do it manually: by adding `warn_unused_ignores = True` to the configuration, `mypy` will error out whenever it encounters a `type: ignore` that is no longer needed.
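Concretely, the option goes in the project's mypy configuration (a minimal excerpt; the config file name varies by project, e.g. `mypy.ini` or `setup.cfg`):

```ini
[mypy]
warn_unused_ignores = True
```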
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60006
Reviewed By: jbschlosser, malfet
Differential Revision: D29133237
Pulled By: albanD
fbshipit-source-id: 41e82edc5cd5affa7ccedad044b59b94dad4425a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59799
This is a redo of #58574; it was easier to create a new PR than to fix rebase
conflicts, as there have been a large number of refactors to the
underlying code.
Removes some code which was incorrectly added by #57519 but never
actually used for anything.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D29031955
fbshipit-source-id: f407d181070cb283382965952821e3647c705544