Commit Graph

57 Commits

Author SHA1 Message Date
Edward Z. Yang
3bf922a6ce Apply UFMT to low traffic torch modules (#106249)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106249
Approved by: https://github.com/Skylion007
2023-07-29 23:37:30 +00:00
andrewor14
13fcc412be [Quant][fx][bc-breaking] Remove unused functions in fx/utils.py (#90025)
Summary and BC-breaking notes: This commit removes the following
unused functions from both the `torch.quantization` and the
`torch.ao.quantization` namespaces:

```
graph_pretty_str
get_per_tensor_qparams
quantize_node
get_qconv_op
create_qparam_nodes
node_return_type_is_int
is_get_tensor_info_node
```

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestAOMigrationQuantizationFx

Reviewers: jerryzh168, vkuzo

Subscribers: jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90025
Approved by: https://github.com/HDCharles
2022-12-07 01:31:28 +00:00
Charles David Hernandez
c1d070d0f0 [ao] Fixing obs insertion through dtype propagation (#73274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73274

As noticed in https://discuss.pytorch.org/t/calibration-of-model-in-post-training-static-quantization-using-fx-api/143661/6
and related to https://github.com/pytorch/pytorch/issues/72698, when using fx quantization, if an op like view was used in a
model and its index parameters were passed in as variables rather than
hard-coded values, fx would mistakenly insert observers for them, leading to an
error when the observer tried to do tensor-only operations on a
non-tensor. To fix this, an API was added to specify non-tensor
arguments for various ops to enable better dtype propagation.
NON_TENSOR_ARG_DICT is a nested dict whose outer key is a named tuple
containing matching parameters for ops with non-tensor args. The
inner dict's keys are dtypes and its values are lists of the arg indices that
take those dtypes. Alternatively, instead of a list, the inner dict
value can be a function that takes the node as an argument and
returns the list of arg indices.

Theoretically this API can support arbitrary functions, but the current
implementation is limited to simpler functions, given that the particular
issue this fixes seems to be rare.
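
The nested layout described above can be sketched in plain Python. This is only an illustration under stated assumptions: the `NodeInfo` field names, the two example entries, the helper `non_tensor_arg_indices`, and the use of a dict to model a node are all hypothetical and are not taken from the PR.

```python
from collections import namedtuple

# Hypothetical key type: the field names "op" and "target" are assumptions.
NodeInfo = namedtuple("NodeInfo", ["op", "target"])

# Outer key: which op to match. Inner key: dtype of the non-tensor args.
# Inner value: a list of arg indices, or a function from node to such a list.
NON_TENSOR_ARG_DICT = {
    NodeInfo("call_method", "transpose"): {int: [1, 2]},
    NodeInfo("call_method", "view"): {
        # Every argument after the tensor itself is treated as an integer size.
        int: lambda node: list(range(1, len(node["args"]))),
    },
}


def non_tensor_arg_indices(node, dtype):
    """Return the arg indices of `node` that hold non-tensor values of `dtype`."""
    per_dtype = NON_TENSOR_ARG_DICT.get(NodeInfo(node["op"], node["target"]), {})
    indices = per_dtype.get(dtype, [])
    # The inner value is either a list or a callable that computes the list.
    return indices(node) if callable(indices) else list(indices)


# A node is modelled as a plain dict here; real FX nodes are objects.
view_node = {"op": "call_method", "target": "view", "args": ("x", "dim0", "dim1")}
print(non_tensor_arg_indices(view_node, int))
```

A prepare pass could consult such a lookup before inserting an observer and skip any argument whose index is returned.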

Note: although torch.unsqueeze and torch.transpose are listed in
quantization_patterns.py, those ops appear to be untraceable by fx. I've
included tests for their cases, but fixing this issue is beyond the scope
of this PR.

Test Plan:
python test/test_quantization.py test_non_reference_size
...
python test/test_quantization.py test_non_reference_<op>

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D34410122

fbshipit-source-id: fc09949ca8a2d6473876a4b6c214eb91e9a9dae2
(cherry picked from commit 3a1375d677b7c98d62b1f5c839645698c39b32b9)
2022-03-16 01:41:17 +00:00
Vasiliy Kuznetsov
b999f87503 fx quant: move _parent_name to common utils (#69720)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69720

This function is also useful for DBR quant, moving it from FX utils
to common utils.

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D33003473

Pulled By: vkuzo

fbshipit-source-id: 20360682c69d614a645c14fc29d3ee023d6b2623
2021-12-17 05:59:46 -08:00
Jerry Zhang
508845f2b5 [quant] AO migration of the torch/quantization/quantize_fx.py and torch/quantization/fx/* (#65033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65033

1. Move the file:
```
hg mv caffe2/torch/quantization/fx caffe2/torch/ao/quantization/fx
hg mv caffe2/torch/quantization/quantize_fx.py caffe2/torch/ao/quantization/quantize_fx.py
```
2. Create new files
```
touch caffe2/torch/quantization/quantize_fx.py
touch caffe2/torch/quantization/fx/__init__.py
```
3. import things in the new files
4. add tests to test/quantization/ao_migration/test_quantization_fx.py
this is because we have some fx import in quantize_fx and fx/*.py
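
The move-then-re-export pattern in the steps above can be sketched without touching any real package. The module names `demo_ao_quantize_fx` and `demo_quantize_fx` and the function `demo_prepare` are made up for this illustration; the real files and imports are the ones listed in the commit.

```python
import sys
import types

# New location: owns the real implementation (module names here are made up).
new_mod = types.ModuleType("demo_ao_quantize_fx")


def demo_prepare(model):
    return ("prepared", model)


new_mod.demo_prepare = demo_prepare
sys.modules[new_mod.__name__] = new_mod

# Old location: a thin file that only re-exports from the new one,
# the equivalent of `from new_location import demo_prepare`.
old_mod = types.ModuleType("demo_quantize_fx")
old_mod.demo_prepare = new_mod.demo_prepare
sys.modules[old_mod.__name__] = old_mod

# Both import paths now resolve, and they share one function object.
import demo_ao_quantize_fx
import demo_quantize_fx

same = demo_quantize_fx.demo_prepare is demo_ao_quantize_fx.demo_prepare
print(same)
```

Because the old path holds the same objects rather than copies, existing callsites keep working while they are migrated one at a time.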

Test Plan: buck test mode/dev //caffe2/test:quantization

Reviewed By: vkuzo, z-a-f

Differential Revision: D30949749

fbshipit-source-id: 9e5d4d039c8a0a0820bc9040e224f0d2c26886d3
2021-09-22 09:29:15 -07:00
Zafar Takhirov
9cc44aad21 [quant] AO migration of the quantize.py (resubmission) (#64445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64445

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates the quantize.py from torch.quantization to torch.ao.quantization.
At this point both locations will be supported. Eventually the torch.quantization will be deprecated.

Test Plan: `buck test mode/dev //caffe2/test:quantization`

Reviewed By: HDCharles

Differential Revision: D30734870

fbshipit-source-id: dc204f3cc46bff2cc81c95159eab9d333b43bb4b
2021-09-08 04:58:47 -07:00
Zafar Takhirov
046ed57a4d Revert D30055886: [quant] AO migration of the quantize.py
Test Plan: revert-hammer

Differential Revision:
D30055886 (44e3ed88c9)

Original commit changeset: 8ef7470f9fa6

fbshipit-source-id: c5bd3ead43a2d44b9e56872ec5bd7a195bdac725
2021-09-02 16:59:59 -07:00
Zafar Takhirov
44e3ed88c9 [quant] AO migration of the quantize.py (#64086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64086

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.

This migrates the `quantize.py` from torch.quantization to `torch.ao.quantization`.

At this point both locations will be supported. Eventually the torch.quantization will be deprecated.

Test Plan: `buck test mode/opt //caffe2/test:quantization`

Reviewed By: jerryzh168, raghuramank100

Differential Revision: D30055886

fbshipit-source-id: 8ef7470f9fa640c0042bef5bb843e7a05ecd0b9f
2021-08-29 20:30:01 -07:00
Charles David Hernandez
32d0c3e8ee Support for reference convert_fx working on gpu
Summary:
This PR enables GPU-only quantization, best used with is_reference since
there are not many GPU kernels for ops as of now.

This PR mainly changes how qconfigs and their obs constructors operate once they
are on a module's qconfig. The function add_module_to_qconfig_obs_ctr takes the obs constructors on the original
qconfig and configures them so that, when invoked, the created obs will
be on whatever device the module occupies. (Once observers are created,
module.to(device) is already set up so that it moves any observers.) To do this,
a new method and a few small changes were added to the _PartialWrapper class that
our observers already use to create constructors (without changing the
existing functionality). These changes work in
concert with changes to the prepare flow such that when the qconfigs are
propagated to the modules (in quantize.py and qconfig_utils.py) they are configured using add_module_to_qconfig_obs_ctr.

Ideally this would work on other models, but the is_reference support for
a lot of modules isn't there yet; those tests should be added in a
future PR.
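
One way to picture the constructor change is a wrapper that defers some keyword arguments until the observer is actually created. The sketch below is a simplified stand-in, not the real `_PartialWrapper`: the class `PartialWrapperSketch`, its method `with_callable_args`, the fake observer and module classes, and the string device labels are all assumptions made for illustration.

```python
from functools import partial


class PartialWrapperSketch:
    """Simplified stand-in for an observer-constructor wrapper."""

    def __init__(self, ctr):
        self.ctr = ctr
        self.callable_args = {}

    def with_callable_args(self, **kwargs):
        # Return a copy whose extra kwargs are computed at call time.
        result = PartialWrapperSketch(self.ctr)
        result.callable_args = {**self.callable_args, **kwargs}
        return result

    def __call__(self, *args, **kwargs):
        # Evaluate deferred arguments only now, when the observer is built.
        for name, fn in self.callable_args.items():
            kwargs.setdefault(name, fn())
        return self.ctr(*args, **kwargs)


class FakeObserver:
    def __init__(self, device="cpu"):
        self.device = device


class FakeModule:
    def __init__(self, device):
        self.device = device


def add_module_to_obs_ctr(obs_ctr, module):
    # Bind the constructor to the module's device as it is at creation time.
    return obs_ctr.with_callable_args(device=lambda: module.device)


module = FakeModule("cuda:0")
obs_ctr = add_module_to_obs_ctr(PartialWrapperSketch(partial(FakeObserver)), module)
print(obs_ctr().device)
```

Deferring the lookup means an observer created after the module has been moved still lands on the module's current device.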

Test Plan:
python test/test_quantization.py TestQuantizeFxModels.test_static_gpu_convert_basic

python test/test_quantization.py TestQuantizeFxModels.test_switch_device_prepare_convert

python test/test_quantization.py TestQuantizeFxModels.test_prepare_serialize_switch_device_convert

python test/test_quantization.py TestQuantizeFx.test_qconfig_precedence

Reviewed By: vkuzo

Differential Revision: D29684114

fbshipit-source-id: 19fefb8e1998eaf212723e836276ccf39467f2e7
2021-07-23 10:30:38 -07:00
Angela Yi
1a0195db49 [quant] Input-Weight Equalization - support for LinearReLU layers (#60653)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60653

Special casing was needed to get the weight attribute in the linear layers of fused LinearReLU layers.

Initial Model: `x -> linear1 -> relu`

After fusion: `x -> linearRelu`

After prepare: `x -> input_quant_obs -> input_eq_obs1 -> linearRelu -> output_quant_obs1`

After equalization functions: `x -> mul -> input_quant_obs (scaled) -> linearRelu -> output_quant_obs`

After convert: `x -> mul -> quantize_per_tensor -> quantized::linearRelu -> dequantize`

More step-throughs here: https://fb.quip.com/A9J3AsBxkykR

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Original model:
```
LinearReluModel(
  (fc): Linear(in_features=5, out_features=5, bias=True)
  (relu): ReLU()
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```

Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %fc_input_scale_0 : [#users=1] = get_attr[target=fc_input_scale_0]
    %fc_input_zero_point_0 : [#users=1] = get_attr[target=fc_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %fc_input_scale_0, %fc_input_zero_point_0, torch.quint8), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%fc,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29406999

fbshipit-source-id: add38e8e7fb84a241c3b10bfb8451b50103effd4
2021-06-30 14:22:06 -07:00
angelayi
e13a9587b4 Revert "Revert D29135358: [quant] Input-Weight Equalization - convert modifications" (#60646)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60646

This reverts commit e60f9cfc58.

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D29361191

Pulled By: angelayi

fbshipit-source-id: 275d8691d8e47da4ab80bb21b51d77ec25a0f714
2021-06-25 15:37:05 -07:00
Rong Rong (AI Infra)
e60f9cfc58 Revert D29135358: [quant] Input-Weight Equalization - convert modifications
Test Plan: revert-hammer

Differential Revision:
D29135358 (3de79b7757)

Original commit changeset: 2d0005672904

fbshipit-source-id: cac30c1202ebbce4f22e50ed920340c7b4c6849f
2021-06-23 11:23:24 -07:00
Angela Yi
3de79b7757 [quant] Input-Weight Equalization - convert modifications (#59963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59963

When converting, before quantizing the nodes, we call
`update_obs_for_equalization()` and `convert_eq_obs()`.

`update_obs_for_equalization`:
1. For each InputEqualizationObserver, we find the corresponding
WeightEqualizationObserver.
2. For nn.Linear layers, we will create an instance of the
WeightEqualizationObserver, run forward on the observer with the given
weights.
3. Calculate the equalization scale between the
InputEqualizationObserver and WeightEqualizationObserver.

`convert_eq_obs`:
For every InputEqualizationObserver, we will do the following:
1. Create a node (ex. `x0_activation_post_process_scale`) containing the
equalization scale constant.
2. Create another node containing a `mul` operator multiplying the
equalization scale and the input.
3. Remove the current InputEqualizationObserver node, and replace it
with the `mul` node.

For every WeightEqualizationObserver, we will do the following:
1. Get the next equalization scale (we may need this for equalizing
connected linear layers).
2. Scale the weights by multiplying them by the reciprocal of the
current equalization scale and by the next equalization scale.

Currently, this supports models with `nn.Linear` layers, but does not
support connecting linear layers.
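
The reason the `mul` node and the weight scaling cancel out can be checked with a small pure-Python sketch. It assumes a single linear layer with no bias and no next equalization scale, and the scale values are arbitrary; how the real scale is computed from the observers is not shown here.

```python
def matvec(weight, x):
    """Multiply a row-major weight matrix by an input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def equalize(weight, x, scale):
    """Scale the input by `scale` and each weight column by its reciprocal."""
    x_scaled = [xi * s for xi, s in zip(x, scale)]
    weight_scaled = [[w / s for w, s in zip(row, scale)] for row in weight]
    return weight_scaled, x_scaled


weight = [[1.0, 2.0], [3.0, 4.0]]
x = [0.5, -1.0]
scale = [2.0, 0.5]  # arbitrary per-input-channel scale, chosen for the example

weight_scaled, x_scaled = equalize(weight, x, scale)
original = matvec(weight, x)
equalized = matvec(weight_scaled, x_scaled)
print(original, equalized)
```

The layer output is unchanged, while the ranges of the scaled input and the scaled weights differ from the originals, which is the part that matters for quantization.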

Test Plan:
`python test/test_quantization.py
TestEqualizeFx.test_input_weight_equalization_convert`

Original Model:
```
.LinearModule(
  (linear): Linear(in_features=2, out_features=2, bias=True)
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%x_activation_post_process_0,), kwargs = {})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```

Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%mul,), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%x_activation_post_process_0,), kwargs = {})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %linear_input_scale_0 : [#users=1] = get_attr[target=linear_input_scale_0]
    %linear_input_zero_point_0 : [#users=1] = get_attr[target=linear_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear_input_scale_0, %linear_input_zero_point_0, torch.quint8), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29135358

fbshipit-source-id: 2d00056729041318463de61841483490b6bfeee5
2021-06-22 20:43:30 -07:00
Supriya Rao
864d129bae [quant][fx] Remove extra q-dq for weight bias in normalization ops (#59882)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59882

Currently for normalization ops, the weight and bias arguments are treated as activation inputs which require observers.
This results in adding extra quant-dequant ops for the weight and bias inputs.

This PR adds support to skip observing weight/bias inputs of norm operators, thus removing the redundant q-dq ops.

Quantized graph with F.layer_norm
Before this PR
```
def forward(self, x):
    _input_scale_0 = self._input_scale_0
    _input_zero_point_0 = self._input_zero_point_0
    quantize_per_tensor = torch.quantize_per_tensor(x, _input_scale_0, _input_zero_point_0, torch.quint8);  x = _input_scale_0 = _input_zero_point_0 = None
    scale = self.scale
    _input_scale_1 = self._input_scale_1
    _input_zero_point_1 = self._input_zero_point_1
    quantize_per_tensor_1 = torch.quantize_per_tensor(scale, _input_scale_1, _input_zero_point_1, torch.quint8);  scale = _input_scale_1 = _input_zero_point_1 = None
    bias = self.bias
    _input_scale_2 = self._input_scale_2
    _input_zero_point_2 = self._input_zero_point_2
    quantize_per_tensor_2 = torch.quantize_per_tensor(bias, _input_scale_2, _input_zero_point_2, torch.quint8);  bias = _input_scale_2 = _input_zero_point_2 = None
    _scale_0 = self._scale_0
    _zero_point_0 = self._zero_point_0
    dequantize = quantize_per_tensor_1.dequantize();  quantize_per_tensor_1 = None
    dequantize_1 = quantize_per_tensor_2.dequantize();  quantize_per_tensor_2 = None
    layer_norm = torch.ops.quantized.layer_norm(quantize_per_tensor, [2, 5, 5], weight = dequantize, bias = dequantize_1, eps = 1e-05, output_scale = _scale_0, output_zero_point = _zero_point_0);  quantize_per_tensor = dequantize = dequantize_1 = _scale_0 = _zero_point_0 = None
    dequantize_2 = layer_norm.dequantize();  layer_norm = None
    return dequantize_2
```
After
```
def forward(self, x):
    _input_scale_0 = self._input_scale_0
    _input_zero_point_0 = self._input_zero_point_0
    quantize_per_tensor = torch.quantize_per_tensor(x, _input_scale_0, _input_zero_point_0, torch.quint8);  x = _input_scale_0 = _input_zero_point_0 = None
    scale = self.scale
    bias = self.bias
    _scale_0 = self._scale_0
    _zero_point_0 = self._zero_point_0
    layer_norm = torch.ops.quantized.layer_norm(quantize_per_tensor, [2, 5, 5], weight = scale, bias = bias, eps = 1e-05, output_scale = _scale_0, output_zero_point = _zero_point_0);  quantize_per_tensor = scale = bias = _scale_0 = _zero_point_0 = None
    dequantize = layer_norm.dequantize();  layer_norm = None
    return dequantize
```

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_norm_weight_bias

Imported from OSS

Reviewed By: HDCharles, ailzhang

Differential Revision: D29068203

fbshipit-source-id: 24b5c38bbea5fd355d34522bfa654c9db18607da
2021-06-11 16:22:36 -07:00
Jerry Zhang
18642e664a [quant][graphmode][fx][refactor] Split quantize.py to prepare.py and convert.py (#59353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59353

Next: remove Quantizer class

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D28856277

fbshipit-source-id: 25f5502be387dbe9706780f667501b46b82789a5
2021-06-02 23:52:39 -07:00
Jerry Zhang
06af7618e7 [quant][graphmode][fx][refactor] Remove Quantizer class from convert (QuantizeHandler) (#59040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59040

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724870

fbshipit-source-id: c0f748711b825cd46bdfcc05c054c77a41e8207a
2021-06-01 22:00:49 -07:00
Jerry Zhang
50e6ee3ca2 [quant][graphmode][fx][refactor] Remove Quantizer class from quantize_node (#59039)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59039

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724874

fbshipit-source-id: bd984716b2da1d6879c3e92fa827574783a41567
2021-06-01 21:40:08 -07:00
Jerry Zhang
83892c1861 [quant][graphmode][fx][refactor] Remove node_name_to_scope from Quantizer (#59032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59032

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724868

fbshipit-source-id: 6df639f20076b480812b6dcf0fc7d2c87ca29d8b
2021-06-01 13:26:09 -07:00
Jerry Zhang
3826f7e8e0 [quant][graphmode][fx][refactor] Remove quantized_graph from Quantizer (#59031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59031

Trying to remove Quantizer class and split prepare and convert code

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724871

fbshipit-source-id: dad0332ba271c4cfb6ec1e8f2036443149b5bea4
2021-06-01 13:01:54 -07:00
Jerry Zhang
1b4586ee20 [quant][gx][graphmode][refactor] Remove modules from Quantizer (#59030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59030

Trying to remove Quantizer class and split prepare and convert code

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724875

fbshipit-source-id: d6610c1d5eb7755331252be9e348a230abf4175c
2021-06-01 12:42:28 -07:00
Jerry Zhang
10fc42eacc [quant][graphmode][fx] Merge quant_env and env (#59028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59028

Previously we had an env and a quant_env in convert, which was a bit confusing.
In this PR we merge them into a single Dict[str, Tuple[Node, torch.dtype]].

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724863

fbshipit-source-id: 722a682c70d300a6ccd2b988786a1ac2d45e880e
2021-06-01 09:21:38 -07:00
Vasiliy Kuznetsov
821a97595b fx quant: improve performance of all_node_args_have_no_tensors (#58461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58461

Improves the logic which calculates whether a node has any tensors
in its arguments by terminating the recursion early when possible.

In a future PR, we should probably ditch this entire approach and switch to
using dtype propagation.
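
The early-termination idea can be sketched with a toy node type. Everything here is an assumption for illustration: the `Node` dataclass, the rule that placeholders are tensors, the `shape` special case, and the `visited` list used to make the early exit observable. It is not the actual function from the PR.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass
class Node:
    op: str
    target: str = ""
    args: Tuple[Any, ...] = ()


visited = []  # records which nodes were inspected, to show the early exit


def all_args_have_no_tensors_sketch(node: Any) -> bool:
    if not isinstance(node, Node):
        return True  # Python literals such as ints carry no tensors
    visited.append(node.target)
    if node.op == "placeholder":
        return False  # assumed here to be a tensor input
    if node.op == "call_function" and node.target == "getattr" and node.args[1] == "shape":
        return True  # x.shape is not a tensor
    # all() stops at the first argument that is found to contain a tensor.
    return all(all_args_have_no_tensors_sketch(a) for a in node.args)


x = Node("placeholder", "x")
y = Node("placeholder", "y")
add = Node("call_function", "add", (x, y))
print(all_args_have_no_tensors_sketch(add), visited)
```

In this run `y` is never visited, because the answer is already known after looking at `x`.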

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D28499455

fbshipit-source-id: bedd844022b90e1fcb7d7a3cb4cc65440dc9cc59
2021-05-18 07:19:59 -07:00
Jerry Zhang
945c93b8bd [quant][graphmode][fx] Skip observing boolean Tensors (#57375)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57375

Skip observing the input for masked_fill. Currently we don't have a way to
query the type of a Proxy in GraphModule. Once we have the functionality to annotate types,
we'll need to annotate a Proxy as a boolean Tensor to remove this hack.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_boolean_tensor

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28126003

fbshipit-source-id: 2989766370a607579b3ea07ca36cdc2ce35893cc
2021-05-03 11:20:33 -07:00
Jerry Zhang
ecacb8c78b [quant][graphmode][fx] Fix getitem for unmatched nodes (#57173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57173

If getitem is followed by an unmatched node, we'll remove the observer after it.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_getitem

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28068805

fbshipit-source-id: e79f8ec3e8fd61d348b8a7069ab0bb434d737c30
2021-04-29 10:16:44 -07:00
Sam Estep
75024e228c Add lint for unqualified type: ignore (#56290)
Summary:
The other half of https://github.com/pytorch/pytorch/issues/56272.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56290

Test Plan:
CI should pass on the tip of this PR, and we know that the lint works because the following CI runs (before this PR was finished) failed:

- https://github.com/pytorch/pytorch/runs/2384511062
- https://github.com/pytorch/pytorch/actions/runs/765036024

Reviewed By: seemethere

Differential Revision: D27867219

Pulled By: samestep

fbshipit-source-id: e648f07b6822867e70833e23ddafe7fb7eaca235
2021-04-21 08:07:23 -07:00
Jerry Zhang
bbd2b1bd3c [quant][graphmode][fx] Add shape to nontensor op list (#55529)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55529

x.shape outputs a non-Tensor. Add this to the all_node_args_have_no_tensors function
to avoid inserting an observer for the getattr "shape" node.

Test Plan: Imported from OSS

Reviewed By: wat3rBro

Differential Revision: D27628145

fbshipit-source-id: 4729294ab80c0a1e72440396d31e7e82257b1092
2021-04-08 23:27:05 -07:00
Jerry Zhang
4d449f915f [quant][graphmode][fx] Separate handling Copy operator to a helper function (#54644) (#55429)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55429

Previously we special-cased the copy operator in the normal insert-observer code. This PR splits the
special-case logic into a separate function to keep the rest of the code clean.

Test Plan:
Imported from OSS

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D27609972

fbshipit-source-id: 378f6aa70f18c0b477b62b6efe236648748aae7e
2021-04-08 22:12:24 -07:00
Bradley Davis
8eaa4a97b7 Back out "[quant][graphmode][fx] Separate handling Copy operator to a helper function" (#55388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55388

Temporarily revert D27314678 (c57541ce06); it appears to cause a perf regression that makes quantization of some models take too long to complete tests.

Reviewed By: houseroad

Differential Revision: D27583809

fbshipit-source-id: e9c088ccbfd3bfb3a1d4c7eafee3eca29ee7717b
2021-04-06 14:20:36 -07:00
Jerry Zhang
c57541ce06 [quant][graphmode][fx] Separate handling Copy operator to a helper function (#54644)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54644

Previously we special-cased the copy operator in the normal insert-observer code. This PR splits the
special-case logic into a separate function to keep the rest of the code clean.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D27314678

fbshipit-source-id: d36870ceb3717bc01eaeaa6f3f1532ad562cbaf1
2021-03-31 17:50:32 -07:00
Jerry Zhang
55544cb13a [quant][graphmode][fx] Add support for one value being quantized with different qconfigs (#53586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53586

Previously one value could only be quantized to one dtype. This PR adds support for quantizing one value
in the fx graph with multiple dtypes, e.g. first quantize to int8 and then float16.

Follow-up PRs may clean up the hacks and refactor the code.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_multiple_qconfigs_single_value

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D26912676

fbshipit-source-id: ae3653fd67f05870a3a9e808f491871826c555d5
2021-03-31 17:48:50 -07:00
Supriya Rao
a7dc0ab845 [quant][fx][pyper] Get first linear use of quantize_per_tensor for FQN (#54859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54859

This applies to the case where a call_function linear op is one of the users of the quantize op.
To map the qparams of quantize_per_tensor to the qparams of the linear operator
that consumes it, we need to use the FQN of the module containing the linear op for the qparams of quantize_per_tensor.

Test Plan:
python test/test_quantization.py test_qparams_fqn

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D27390505

fbshipit-source-id: a47af0e5ac016f2b2df74fbdf45afe99dc04be46
2021-03-30 08:38:51 -07:00
Supriya Rao
a0a7a2d648 [quant][fx] store dtype, axis as literals in the graph (#54624)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54624

previously we were creating setattr nodes for dtype and axis.
The FX convention is that primitive types are embedded as literals in args/kwargs.

With this change we won't see getattr nodes in the graph anymore for dtype/axis

Test Plan:
python test/test_quantization.py TestQuantizeFx

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D27306898

fbshipit-source-id: a7c91c7cb21ee96015c7f8830b38d943ada65358
2021-03-28 21:59:49 -07:00
Vasiliy Kuznetsov
93d5807c1e [not for land yet]fix using size of quant layer in torch._assert (#53187)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53187

Before this diff, if we had code like

```
x = any_quant_layer(...)
x_size0 = x.size(0)
torch._assert(x_size0 == 1)
```

The convert code would try to insert a dequantize after `x_size0`,
because it was a descendant of a quantized node and it was needed
for a non-quantized operation.  Since the actual type of the `size`
function output is an integer, this does not make sense.

For now, this is fixed as a one-off to unblock a customer.  In the
future, we may need to think more deeply about all the functions which
can return non-quantized types from quantized tensors and make sure
they are all covered.

Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_assert_on_size_after_quant_layer
```

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D26780690

fbshipit-source-id: 44cc25c9179d460efb3f110d40b73d854d676af5
2021-03-12 07:43:48 -08:00
Vasiliy Kuznetsov
ccab6680d5 [not for land yet] hacky fix for x.ndim followed by sub (#53120)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53120

Currently there is a pattern which is not handled correctly by
FX graph mode quantization:

```
def forward(self, x):
    ndim = x.ndim
    # or add, mul, div, etc
    x = torch.sub(x, ndim)
    return x
```

The reason this does not work is as follows:
1. x.ndim becomes a getattr node
2. the real world type of x.ndim is an integer, but this is not known from the graph (yet)
3. binary ops such as `torch.sub` require quantization of inputs
4. the framework inserts an observer to observe the output of `ndim`
5. the observer fails because `ndim` is not a Tensor

For now, we hack in a band-aid to unblock some teams; none of this is meant to
land. We will have to think of a better fix which is landable (TBD).
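
Step 5 of the list above can be reproduced with toy classes. `TensorSketch` and `MinMaxObserverSketch` are hypothetical stand-ins written for this illustration; a real observer performs more work, but it fails for the same underlying reason when handed a Python int.

```python
class TensorSketch:
    """Toy tensor: only supports the reductions the toy observer needs."""

    def __init__(self, values):
        self.values = list(values)

    def min(self):
        return min(self.values)

    def max(self):
        return max(self.values)


class MinMaxObserverSketch:
    """Toy observer that records the range of whatever flows through it."""

    def __init__(self):
        self.min_val = None
        self.max_val = None

    def __call__(self, x):
        # Tensor-only operations: a plain int has no .min() or .max().
        self.min_val = x.min()
        self.max_val = x.max()
        return x


obs = MinMaxObserverSketch()
obs(TensorSketch([1.0, -2.0, 0.5]))  # fine: the input is tensor-like

ndim = 3  # what x.ndim actually evaluates to at runtime
try:
    obs(ndim)
    failed_on_int = False
except AttributeError:
    failed_on_int = True
print(obs.min_val, obs.max_val, failed_on_int)
```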

Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_getattr_with_nontensor_result
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D26756180

fbshipit-source-id: c0e498766b22c23df74fbb5aaeaa237c4c944263
2021-03-12 07:42:12 -08:00
Jerry Zhang
d9fa957ecc [quant][graphmode][fix] Handle the case when observed node has no users (#53210)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53210

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D26791724

fbshipit-source-id: b2a226a22d6aba86dd01cacbb56577048a289b3e
2021-03-10 15:08:48 -08:00
Jerry Zhang
177694681e [quant][graphmode][fx] Add reference option support for linear_dynamic_fp16 (#52534)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52534

Currently linear_dynamic_fp16 has a signature that's tied to fbgemm/qnnpack.
We'll need to produce a pattern equivalent to linear_dynamic_fp16 to support extensions
to other backends.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_linear_dynamic_fp16

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D26557726

fbshipit-source-id: 270c9f781f73c79416a092b7831294cabca84b0c
2021-02-26 21:12:22 -08:00
Jerry Zhang
7097c0d4f3 [quant][graphmode][fx] Add support for functional conv1d and conv3d (#51155) (#51254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51254

This PR adds support for quantizing functional conv1d, conv3d, conv1d_relu and conv3d_relu.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_functional_conv

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D26116172

fbshipit-source-id: 56e7d799c11963fe59ee3a1b6eb23f52007b91dc
2021-01-28 14:32:32 -08:00
Supriya Rao
288b94a8ee [quant][fx] Make scale, zero_point buffers in the model, use FQN (for quantize_per_tensor ops) (#51171)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51171

Following up on the previous PR, this PR registers scale and zero_point for quantize_per_tensor
as buffers in the module.
Currently the dtype is still stored as an attr (not registered as a buffer) since only tensor types can be registered.

Test Plan:
python test/test_quantization.py test_qparams_buffers

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D26092964

fbshipit-source-id: a54d914db7863402f2b5a3ba2c8ce8b27c18b47b
2021-01-28 08:35:46 -08:00
Supriya Rao
4c3f59b70e [quant][fx] Make scale, zero_point buffers in the model and use FQN (for quantized ops) (#51166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51166

Currently scale and zero_point values are stored as constant values in the graph.
This prevents these values from being updated in the graph and also prevents saving
them to state_dict.

After this PR we store scale/zero_point values for quantized ops as buffers in the root module
and create get_attr nodes for them in the graph.

We also use the FQN of the module where the quantized ops are present to name these attributes so
that they can be uniquely identified and mapped to quantized ops.
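
The idea can be sketched as follows. This is not the code from this PR: the helper `qparam_attr_name` and the naming scheme it produces are hypothetical, and only `register_buffer` and `state_dict` are real PyTorch APIs.

```python
import torch


def qparam_attr_name(module_fqn: str, kind: str, index: int) -> str:
    """Hypothetical naming scheme: derive a unique attribute name from the FQN."""
    # Buffer names may not contain ".", so the dots in the FQN are replaced.
    return f"{module_fqn.replace('.', '_')}_{kind}_{index}"


root = torch.nn.Module()
scale_name = qparam_attr_name("features.conv", "scale", 0)
zp_name = qparam_attr_name("features.conv", "zero_point", 0)

# Registering the values as buffers (instead of baking constants into the
# graph) makes them updatable and visible in state_dict.
root.register_buffer(scale_name, torch.tensor(0.05))
root.register_buffer(zp_name, torch.tensor(3))

state = root.state_dict()
print(sorted(state.keys()))
```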

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qparams_buffers

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D26092965

fbshipit-source-id: b549b2d3dccb45c5d38415ce95a09c26f5bd590b
2021-01-28 08:35:42 -08:00
Mike Ruberry
f7e90cf311 Revert D26089965: [quant][graphmode][fx] Add support for functional conv1d and conv3d
Test Plan: revert-hammer

Differential Revision:
D26089965 (dd1a97b3ae)

Original commit changeset: 4aea507d05b7

fbshipit-source-id: f54184cafb9dd07858683489d8bd147474e7e4b3
2021-01-27 13:27:10 -08:00
Jerry Zhang
dd1a97b3ae [quant][graphmode][fx] Add support for functional conv1d and conv3d (#51155)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51155

This PR adds support for quantizing functional conv1d, conv3d, conv1d_relu and conv3d_relu.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_functional_conv

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D26089965

fbshipit-source-id: 4aea507d05b744807e993f6d3711ab308fb7591b
2021-01-27 12:00:35 -08:00
Jerry Zhang
f6f0fde841 [reland][quant][graphmode][fx] Standalone module support {input/output}_quantized_idxs (#49754) (#50058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50058

This PR adds support for {input/output}_quantized_idxs for standalone modules.

If input_quantized_idxs = [] and output_quantized_idxs = [], the standalone module expects float
input and produces float output; it quantizes the input and dequantizes the output internally.

If input_quantized_idxs = [0] and output_quantized_idxs = [0], the standalone module expects quantized
input and produces quantized output; the input is quantized in the parent module and the output is dequantized
in the parent module as well. This is similar to current quantized modules like nn.quantized.Conv2d.

For more details, please see the test case.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_standalone_module

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D25768910

fbshipit-source-id: 96c21a3456cf192c8f1400afa4e86273ee69197b
2021-01-05 20:27:46 -08:00
Vasiliy Kuznetsov
d033e185ed fx quant: move more functions to utils (#48908)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48908

No logic change, improving readability

Test Plan:
CI

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25363080

fbshipit-source-id: 1d73a875bd7abf671b544ebc835432fea5306dc3
2020-12-08 15:37:04 -08:00
Vasiliy Kuznetsov
6b80b664bb quant: enable mypy on torch/quantization/fx (#48331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48331

Enables mypy to not ignore type errors in FX quantization files.  Fixes the easy
typing errors inline, and comments out the harder errors to be fixed at a later time.
After this PR, mypy runs without errors on `torch/quantization`.

Test Plan:
```
> mypy torch/quantization/
Success: no issues found in 25 source files
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25133348

fbshipit-source-id: 0568ef9405b292b80b3857eae300450108843e80
2020-11-21 15:29:27 -08:00
Jerry Zhang
ed57f804fa [quant][refactor] Move some util functions from torch/quantization/fx/utils.py to torch/quantization/utils.py (#48107)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48107

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D25026495

fbshipit-source-id: 3634b6b95a18670232600874b1e593180ea9f44c
2020-11-18 22:32:19 -08:00
Jerry Zhang
5883e0b0e0 [quant][fix][ez] Fix quant_type classification for fp16, fp16 (#48073)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48073

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D25011799

fbshipit-source-id: a12f645d6be1c607898633225b02617283d37df1
2020-11-18 20:07:54 -08:00
Jerry Zhang
be2e3dd2a1 [quant][graphmode][fx][fix] Linear work with float_qparam_dynamic_qconfig (#47068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47068

Filter the dtype config before performing the quantization in linear

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D24627907

fbshipit-source-id: 162fa47b3fcf6648049f8bc0438e41ee97ac19e9
2020-11-02 16:28:33 -08:00
Jerry Zhang
998b9b9e68 [quant][graphmode][fx] custom_module support static/dynamic/weight_only quant (#46786)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46786

Previously we only supported static quant; this PR adds support for other types of quantization.

Note that QAT is actually orthogonal to these quant types; this refers to the convert step, where we
convert the observed module to a quantized module.

For QAT, the user provides a CustomModule -> FakeQuantizedCustomModule mapping in prepare_custom_config_dict
and a FakeQuantizedCustomModule -> static/dynamic/weight_only quantized CustomModule mapping in convert_custom_config_dict.

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D24514701

fbshipit-source-id: 2918be422dd76093d67a6df560aaaf949b7f338c
2020-10-27 21:41:33 -07:00
Jerry Zhang
4f685ecc25 [reland][quant][graphmode][fx] Merge all quantization mode (#45292) (#45672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45672

This PR merges all quantization modes and exposes only the following top-level functions:
```
prepare_fx
prepare_qat_fx
convert_fx
```

Test Plan:
Imported from OSS

Reviewed By: z-a-f

Differential Revision: D24053439

fbshipit-source-id: 03d545e26a36bc22a73349061b751eeb35171e64
2020-10-01 15:47:11 -07:00
Mike Ruberry
c36b354072 Revert D23913105: [quant][graphmode][fx] Merge all quantization mode
Test Plan: revert-hammer

Differential Revision:
D23913105 (ffcb0989e7)

Original commit changeset: 4e335286d6de

fbshipit-source-id: 5765b4e8ec917423f1745f73a9f3f235fc53423d
2020-10-01 03:12:42 -07:00