Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61942
This PR changes is_reference=True for conv to produce a pattern consisting of dequant - float conv - quant instead of a reference conv module. This is useful for future transformations to custom backends, and it also helps simplify the implementation of
convert in the future.
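A minimal usage sketch of the resulting flow (assuming the FX graph mode quantization APIs of this era, `prepare_fx`/`convert_fx` with the `is_reference` flag; the model and shapes are hypothetical):
```
import torch
import torch.nn as nn
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, 3)

    def forward(self, x):
        return self.conv(x)

m = M().eval()
prepared = prepare_fx(m, {"": get_default_qconfig("fbgemm")})
prepared(torch.randn(1, 3, 8, 8))  # calibration
# with is_reference=True the converted graph keeps a float conv wrapped in
# dequantize -> conv -> quantize_per_tensor instead of a reference conv module
reference = convert_fx(prepared, is_reference=True)
print(reference.graph)
```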
Test Plan:
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D29810656
fbshipit-source-id: 549237a62bfda4341a2a7474c124f5e33350e267
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62277
This PR changes is_reference=True for linear to produce a pattern consisting of dequant - float linear - quant instead of a reference linear module. This is useful for future transformations to custom backends, and it also helps simplify the implementation of
convert in the future.
Test Plan:
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: ejguan
Differential Revision: D29941079
fbshipit-source-id: 84bdfc0bb872c34fc345875e545c8b323e77c41e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61892
This PR changes is_reference=True for linear to produce a pattern consisting of dequant - float linear - quant instead of a reference linear module. This is useful for future transformations to custom backends, and it also helps simplify the implementation of
convert in the future.
Test Plan:
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D29810657
fbshipit-source-id: 949615bbc017bc454d81c8a6b2bdec53badaab19
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61859
BC-breaking note:
Previously we did not add an observer/fake_quant for the output of add/mul in the tensor - scalar case.
In this PR we add an observer/fake_quant instance (the same instance as the input's) to correctly model
the behavior of the quantized add_scalar and mul_scalar ops (since quantized add/mul scalar assumes the
output quantized tensor has the same quantization parameters as the input).
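For illustration, a hypothetical module hitting the tensor - scalar case:
```
import torch
import torch.nn as nn

class AddScalar(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, 3)

    def forward(self, x):
        x = self.conv(x)
        # tensor - scalar add: the output now gets an observer/fake_quant that is
        # the same instance as the input's, matching the assumption made by the
        # quantized add_scalar/mul_scalar ops
        return x + 1.0
```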
Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_add
python test/test_quantization.py TestQuantizeFxOps.test_mul
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D29770859
fbshipit-source-id: f43fcbfecd04c392467770b22c481bbbdaf43c25
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61687
Previously we did not insert an observer/fake_quant for the output of copy nodes (e.g. maxpool).
But to produce reference patterns we need to insert an observer/fake_quant for the output and later convert it to a quantize
node.
Model:
```
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.maxpool2d = torch.nn.MaxPool2d(kernel_size=3)

    def forward(self, x):
        x = self.maxpool2d(x)
        return x
```
Result of prepare:
Before:
def forward(self, x):
    x_activation_post_process_0 = self.x_activation_post_process_0(x); x = None
    maxpool2d = self.maxpool2d(x_activation_post_process_0); x_activation_post_process_0 = None
    return maxpool2d
After:
def forward(self, x):
    x_activation_post_process_0 = self.x_activation_post_process_0(x); x = None
    maxpool2d = self.maxpool2d(x_activation_post_process_0); x_activation_post_process_0 = None
    maxpool2d_activation_post_process_0 = self.maxpool2d_activation_post_process_0(maxpool2d); maxpool2d = None
    return maxpool2d_activation_post_process_0
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D29715566
fbshipit-source-id: 817df9b2933a35cad5331d8d8ce1c5ba0752e9df
Summary:
This PR enables GPU-only quantization, best used with is_reference since
there are not many GPU kernels for ops as of now.
This PR mainly changes how qconfigs and their observer constructors operate once they
are attached to a module's qconfig. The function add_module_to_qconfig_obs_ctr takes the observer constructors on the original
qconfig and configures them so that, when invoked, the created observers will
be on whatever device the module occupies. (Once observers are created,
module.to(device) is already set up so that it moves any observers.) To do this,
a new method and a few small changes were added to the _PartialWrapper class that
our observers already use to create constructors (without changing the
existing functionality). These changes work in
concert with changes to the prepare flow such that when the qconfigs are
propagated to the modules (in quantize.py and qconfig_utils.py) they are configured using add_module_to_qconfig_obs_ctr.
Ideally this would work on other models too, but the is_reference support for
a lot of modules isn't there yet; those tests should be added in a
future PR.
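A rough sketch of the intended flow (the model, qconfig choice and shapes are hypothetical; requires a CUDA device):
```
import torch
import torch.nn as nn
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx

model = nn.Sequential(nn.Conv2d(3, 3, 3)).cuda().eval()
# during prepare, add_module_to_qconfig_obs_ctr wraps the observer constructors
# so that observers are created on the same device as the module (here CUDA)
prepared = prepare_fx(model, {"": get_default_qconfig("fbgemm")})
prepared(torch.randn(1, 3, 8, 8, device="cuda"))  # calibration on GPU
quantized = convert_fx(prepared, is_reference=True)
```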
Test Plan:
python test/test_quantization.py TestQuantizeFxModels.test_static_gpu_convert_basic
python test/test_quantization.py TestQuantizeFxModels.test_switch_device_prepare_convert
python test/test_quantization.py TestQuantizeFxModels.test_prepare_serialize_switch_device_convert
python test/test_quantization.py TestQuantizeFx.test_qconfig_precedence
Reviewed By: vkuzo
Differential Revision: D29684114
fbshipit-source-id: 19fefb8e1998eaf212723e836276ccf39467f2e7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62041
Before this PR, weights of conv and linear modules were extracted
as lists, in order to match the signature of LSTM weights.
After this PR, weight extraction preserves the type of the weights,
so extracted weights of conv and linear have a different type
from LSTM weights. The comparison util functions are updated to
handle the LSTM weight type of `List[tensor]`.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D29853626
fbshipit-source-id: 93da5b9b0b174679c61528d02b6b902cb064444e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62038
Updates the logic to extract weights from nodes to use a
direct mapping from type to weight extraction function.
This is needed for a future PR which will allow users to
specify custom weight extraction functions for user defined
types.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D29853627
fbshipit-source-id: 3ef90ef4bd7b28f6316c0af215a2bd3ff8a2aeca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61377
Both the quantization tracer and the NS tracer record
`_node_name_to_scope`, which contains the mapping from
node name to FQN.
This PR adds the FQN information to the NS results, so that it is
more convenient for users to attribute a NS result to the corresponding
module in their model.
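Illustrative result shape (following the result format shown in other entries in this log; the exact `fqn` key name is an assumption):
```
// before: entries identified only by node name
{'linear1': {'node_output':
  {'model_a': [{'ref_node_name': 'linear1', ...}]}}}
// after: each entry also carries the module FQN, e.g. 'block1.linear1'
{'linear1': {'node_output':
  {'model_a': [{'ref_node_name': 'linear1', 'fqn': 'block1.linear1', ...}]}}}
```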
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_extract_weights_fqn
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_match_activations_fqn
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_shadow_activations_fqn
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D29600349
fbshipit-source-id: df489e03daff97dd380f59c83ffdc2b0012a0a53
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61360
By default, NS graph matching matches from the end of the graph
to the start. This PR reverses the returned results so that
the outputs of the NS APIs are in the order of execution, making
it easier to analyze.
Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher.test_results_order
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D29600348
fbshipit-source-id: c9fa4a3748db27c1788eebf803f35221e6fc8701
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61359
Makes the error messages emitted when graph matching fails easier to read
for users.
Test Plan:
```
// inspect the exceptions in the following two tests and verify
// that they are easier to read than before
python test/test_quantization.py TestFXGraphMatcher.test_matching_failure_node_count
python test/test_quantization.py TestFXGraphMatcher.test_matching_failure_node_type
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D29600353
fbshipit-source-id: ec6640fe6cab7b62a697e4ee385be182f2918fd4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61323
Before this PR, all observers and fake quants were silently removed
when adding loggers with NS. This is problematic for QAT models because
we need the fake quants to run in order to properly capture intermediate
outputs.
This PR fixes the issue by preserving the observers throughout
the passes which add loggers. In detail:
* for each quantization module or fusion, add additional patterns with that fusion and an observer/fake_quant at the end
* remove the places in the logger model creation code which removed observers
* add unit testing that QAT numerics do not change after adding loggers
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_loggers_preserve_qat_numerics
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_shadow_loggers_preserve_qat_numerics
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D29600351
fbshipit-source-id: 5f25118b79eb47860c49bca882de6a8eae7a4456
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61129
Adds support for comparing an fp32 model (without quantization) to an
fp32 model prepared with quantization. The main missing feature was
handling conv-bn fusion, since this fusion for PTQ happens outside
of quantization patterns.
Adds testing for this case for comparing weights and comparing
activations.
Adds a TODO for also handling this for shadow activations; we need to
first stop removing observers in graph passes before we can add
this support, which will be in a future PR.
Test Plan:
```
python test/test_quantization.py TestFXGraphMatcherModels.test_mobilenet_v2
python test/test_quantization.py TestFXGraphMatcherModels.test_mobilenet_v2_qat
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_compare_activations_conv
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D29520009
fbshipit-source-id: f63484a998f1424bd9cacf5d823b82b2edfea1ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61285
Modifies observers to support conv layers and tests to make sure that the observers are returning the expected values for conv inputs.
Test Plan:
`python test/test_quantization.py TestEqualizeFx.test_input_weight_eq_observer`
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D29557041
fbshipit-source-id: 5e43329f189ba352eb8b991f38bf37752eebb6e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61317
Add an overload to fake_quantize_per_tensor that accepts scale/zero_point as tensor inputs; a minimal usage sketch follows the list below. The reasons to do this are:
* required for fused observer + fake_quant operator on GPU where the scale/zero_point will be calculated by the observer on device. Passing tensor inputs enables us to directly access the scale/zero-point value in the cuda kernel to avoid extra copies/malloc
* enables us to pass in float as scale dtype and int32 as zero_point dtype (which is consistent with what the quantize call actually uses) https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/affine_quantizer_base.cpp#L52-L53
* overload consistent with `quantize_per_tensor.tensor_qparams`
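A minimal usage sketch of the tensor-qparams overload (values are illustrative; the quant_min/quant_max shown are assumptions for an 8-bit example):
```
import torch

x = torch.randn(4, 4)
scale = torch.tensor(0.1)                        # float tensor scale
zero_point = torch.tensor(0, dtype=torch.int32)  # int32 tensor zero_point
# the tensor overload lets a fused observer + fake_quant CUDA kernel read the
# qparams directly on device, avoiding extra copies/mallocs
y = torch.fake_quantize_per_tensor_affine(x, scale, zero_point, 0, 255)
```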
ghstack-source-id: 133370216
Test Plan:
buck test mode/dev-nosan caffe2/test/:quantization -- test_backward_per_tensor_cachemask
buck test mode/dev-nosan caffe2/test/:quantization -- test_forward_per_tensor_cachemask
Reviewed By: raghuramank100
Differential Revision: D29552727
fbshipit-source-id: cbb9af40fc575ad27a29c646b760d5ee52cc923d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61132
For large models, the insert_observers_for_model function was taking a long time, especially in the case where not all of the nodes are being quantized.
For example, for a model with 21000 nodes of which only ~50 are being quantized, the breakdown of prepare_fx vs convert_fx was:
prepare_fx 979 seconds
convert_fx 9 seconds
The main reason was that we were doing some unnecessary computation for all nodes in this function; this PR just moves that work to where it is actually used.
After this PR:
prepare_fx 26 seconds
convert_fx 9 seconds
Test Plan:
Existing tests
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D29522303
fbshipit-source-id: 7ce12582a859d02ff763abebf4a592d28e0764ca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60883
As per this [comment](https://github.com/pytorch/pytorch/pull/59964#discussion_r659064270), I created a `reset_min_max_vals()` function inside the observers which will be called during input-weight equalization. This is so that we will not expose the implementation of the observers in the equalization code.
Test Plan:
`python test/test_quantization.py TestEqualizeFx`
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D29491848
fbshipit-source-id: 00e91959ceb3b4f3688175a1a7ba11823e929b2f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60419
Adds support for NS for FX shadowed activations pass to handle int8
modules shadowing fp32 modules. The difficulty here is that in order
to insert the dtype cast, we need the qparams of the input.
For the current PR, we only handle the easy cases where the previous
node is either a `quantize_per_tensor` or an OSS quantized module.
A future PR can handle more complicated cases such as various functions.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_int8_shadows_fp32_simple
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D29280050
fbshipit-source-id: 465257c9f82a34fa91b48ae8887355c68e00edc6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60779
When we do fusion, we replace certain modules (such as Linear + ReLU) with fused versions (such as LinearReLU) by calling `_fuse_fx` in prepare_fx. However when we try to look up using the fused module type in qconfig_dict, we cannot find a match anymore since the qconfig dict contains the original module types. An example is here [N882873](https://fburl.com/anp/azenjx3v).
So we will now update the qconfig_dict to include the fused modules mapping to the qconfigs used for the modules that make up the fused modules. If the modules are not mapped to the same qconfig, then we will raise an error.
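For example, a qconfig_dict along these lines would satisfy the requirement (illustrative sketch; the point is that the modules making up the fused module map to the same qconfig):
```
import torch.nn as nn
from torch.quantization import get_default_qconfig

qconfig = get_default_qconfig("fbgemm")
qconfig_dict = {
    "object_type": [
        # Linear and ReLU share one qconfig, so the LinearReLU entry generated
        # after fusion can inherit it; differing qconfigs would raise an error
        (nn.Linear, qconfig),
        (nn.ReLU, qconfig),
    ]
}
```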
Test Plan:
`python test/test_quantization.py TestFuseFx.test_qconfig_fused_module`
Imported from OSS
Reviewed By: supriyar
Differential Revision: D29406941
fbshipit-source-id: 74b5db89f4998aeb02b2bf7c37bf97326580c654
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60559
Adds `resnet18` to the integration test, and fixes the error so that
creating the shadow model works.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_resnet18
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D29336236
fbshipit-source-id: 9425aa096162d80ef3a7c98144b2301cfbccc1ea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60305
Adjusts the NS for FX weight and activation extraction APIs
to require a model name, and rekeys the results of these APIs
to use the node names of the specified model as layer keys.
For example, before
```
// API call
results = ns.extract_logger_info(
model_a, model_b, ns.OutputLogger)
// results
{'base_op_1_0': {'node_output':
{'model_a': [{'ref_node_name': 'linear1', ...}]}}}
```
and after
```
// API call
results = ns.extract_logger_info(
model_a, model_b, ns.OutputLogger, 'model_b_name')
// results
// note: instead of `base_op_1_0`, the layer is named `linear1`
{'linear1': {'node_output':
{'model_a': [{'ref_node_name': 'linear1', ...}]}}}
```
Note: we cannot use these names while collecting data because
node names are not guaranteed to be consistent across graphs.
This is why we only rekey as the very last step.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_layer_names
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D29243045
fbshipit-source-id: d39ecdfdd18b07291e3ecefed2ede287b100b7d0
Summary:
There is a very common error when writing docs: One forgets to write a matching `` ` ``, and something like ``:attr:`x`` is rendered in the docs. This PR fixes most (all?) of these errors (and a few others).
I found these running ``grep -r ">[^#<][^<]*\`"`` on the `docs/build/html/generated` folder. The regex finds an HTML tag that does not start with `#` (as python comments in example code may contain backticks) and that contains a backtick in the rendered HTML.
This regex has not given any false positive in the current codebase, so I am inclined to suggest that we should add this check to the CI. Would this be possible / reasonable / easy to do malfet ?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60474
Reviewed By: mrshenli
Differential Revision: D29309633
Pulled By: albanD
fbshipit-source-id: 9621e0e9f87590cea060dd084fa367442b6bd046
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60555
When we do QAT, we swap the FP32 modules with their quantized counterparts by calling `qat_swap_modules` in prepare.
However, when we try to look up using the swapped module type in qconfig_dict, we cannot find a match anymore since the qconfig dict contains the original
module type.
In this PR we update the qconfig_dict to include the modules swapped for QAT.
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qconfig_qat_module_type
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D29337036
fbshipit-source-id: 60212eec3ee252a2445c1b58874cb36048c9f7dd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59963
When converting, before quantizing the nodes, we call
`update_obs_for_equalization()` and `convert_eq_obs()`.
`update_obs_for_equalization`:
1. For each InputEqualizationObserver, we find the corresponding
WeightEqualizationObserver.
2. For nn.Linear layers, we will create an instance of the
WeightEqualizationObserver and run forward on the observer with the given
weights.
3. Calculate the equalization scale between the
InputEqualizationObserver and WeightEqualizationObserver.
`convert_eq_obs`:
For every InputEqualizationObserver, we will do the following:
1. Create a node (ex. `x0_activation_post_process_scale`) containing the
equalization scale constant.
2. Create another node containing a `mul` operator multiplying the
equalization scale and the input.
3. Remove the current InputEqualizationObserver node, and replace it
with the `mul` node.
For every WeightEqualizationObserver, we will do the following:
1. Get the next equalization scale (we may need this for equalizing
connected linear layers).
2. Scale the weights by multiplying them with the reciprocal of the
current equalization scale and the next equalization scale.
Currently, this supports models with `nn.Linear` layers, but does not
support connecting linear layers.
Test Plan:
`python test/test_quantization.py
TestEqualizeFx.test_input_weight_equalization_convert`
Original Model:
```
LinearModule(
  (linear): Linear(in_features=2, out_features=2, bias=True)
)
```
Graph after `prepare_fx`:
```
graph():
%x : [#users=1] = placeholder[target=x]
%x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {})
%x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {})
%linear : [#users=1] = call_module[target=linear](args = (%x_activation_post_process_0,), kwargs = {})
%linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
return linear_activation_post_process_0
```
Graph after equalization functions:
```
graph():
%x : [#users=1] = placeholder[target=x]
%x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
%mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
%x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%mul,), kwargs = {})
%linear : [#users=1] = call_module[target=linear](args = (%x_activation_post_process_0,), kwargs = {})
%linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
return linear_activation_post_process_0
```
Graph after `convert_fx`:
```
graph():
%x : [#users=1] = placeholder[target=x]
%x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
%mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
%linear_input_scale_0 : [#users=1] = get_attr[target=linear_input_scale_0]
%linear_input_zero_point_0 : [#users=1] = get_attr[target=linear_input_zero_point_0]
%quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear_input_scale_0, %linear_input_zero_point_0, torch.quint8), kwargs = {})
%linear : [#users=1] = call_module[target=linear](args = (%quantize_per_tensor,), kwargs = {})
%dequantize : [#users=1] = call_method[target=dequantize](args = (%linear,), kwargs = {})
return dequantize
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D29135358
fbshipit-source-id: 2d00056729041318463de61841483490b6bfeee5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60386
During QAT we sometimes encounter errors with scripted models:
`RuntimeError: cannot resize variables that require grad`
For per-tensor cases we don't need to resize some buffers, so this PR removes the extra resize ops where applicable.
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D29271905
fbshipit-source-id: 01a484a9559a3a4180490f9476d0cd3044ba0d1b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60054
Previously env in convert was Dict[str, Tuple[Node, torch.dtype]]; that is, at a given time each node can only have one dtype.
This causes a problem for the following case:
```
# original model
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, 1)

    def forward(self, x):
        x = self.conv(x)
        x1 = x.expand_as(x)
        x2 = torch.add(x, x1)
        return x2

# observed graph after prepare
def forward(self, x):
    x = self.activation_post_process_0(x)
    x = self.conv(x)
    x = self.activation_post_process_1(x)
    x1 = x.expand_as(x)
    x1 = self.activation_post_process_2(x1)
    x2 = torch.add(x, x1)
    x2 = self.activation_post_process_3(x2)
    return x2

# quantized graph after convert
def forward(self, x):
    x = torch.quantize_per_tensor(x, ...)
    x = self.conv(x)  # quantized conv
    x = torch.dequantize(x)
    x1 = x.expand_as(x)
    x1 = torch.quantize_per_tensor(x1, ...)
    # Error: x is dequantized
    x2 = torch.ops.quantized.add(x, x1)
    return x2
```
Currently we have an env that is a map from node name in the observed graph to the Node in the quantized graph. The problem here is that following a quantized conv we have two consumers: expand_as expects a float input, while quantized add expects a quantized input. In the quantized graph, expand_as should ideally consume the dequantized output, and quantized add should consume the quantized output:
quantized_conv - dequantize - expand_as
              \ ------------- quantized_add
But currently in env, each node needs to either be quantized or not quantized. Therefore we will need to change env to include the dtype as well:
env: Dict[str, Dict[dtype, Node]], e.g. {'x': {torch.float: dequantized_node, torch.quint8: quantized_node}}
And when we load from the env, we will also need to provide the dtype of the Node that we want to load. We can have a separate pass to figure out this information for each node.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D29149408
fbshipit-source-id: c9e4b7d65444ab6a6f573929bae1db5037629892
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60103
Adds internal logging for NS for FX API usage.
Test Plan: CI
Reviewed By: jerryzh168
Differential Revision: D29166710
fbshipit-source-id: 2a1bf2f6038b0c6c5945b57b2db2de25c585a04a
Summary:
During development it is common practice to put `type: ignore` comments on lines that are correct, but `mypy` doesn't recognize this. This often stems from the fact that the `mypy` version used wasn't able to handle the pattern in question.
With every new release `mypy` gets better at handling complex code. In addition to fixing all the previously accepted but now failing patterns, we should also revisit all `type: ignore` comments to see if they are still needed or not. Fortunately, we don't need to do it manually: by adding `warn_unused_ignores = True` to the configuration, `mypy` will error out in case it encounters a `type: ignore` that is no longer needed.
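For reference, the relevant setting as it would appear in an ini-style mypy configuration:
```
[mypy]
warn_unused_ignores = True
```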
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60006
Reviewed By: jbschlosser, malfet
Differential Revision: D29133237
Pulled By: albanD
fbshipit-source-id: 41e82edc5cd5affa7ccedad044b59b94dad4425a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59953
The following modifications were made to the equalization
observers due to design changes:
- [InputEqualizationObserver] Replaced `calculate_qparams()` with
`calculate_scaled_minmax()` since we will need to return the scaled
min/max values to update the following input quantization observer
- [WeightEqualizationObserver] We no longer need a row observer since
this will be taken care of by the following weight quantization observer
- [WeightEqualizationObserver] Following the previous comment, we no
longer need to calculate the scaled qparam values. Instead, we will use
the equalization scale to later scale the weights and the qparams will
be taken care of by the weight quantization observer.
Test Plan:
`python test/test_quantization.py
TestEqualizeFx.test_input_weight_eq_observer`
Imported from OSS
Reviewed By: supriyar
Differential Revision: D29135332
fbshipit-source-id: be7e468273c8b62fc183b1e1ec50f6bd6d8cf831
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59739
Created an EqualizationQConfig specifically for equalization.
This inherits from QConfig and is used to distinguish between inserting
an input observer and an output observer. Since the output observer
field is included in the EqualizationQConfig, we no longer need an
output observer field in the _InputEqualizationObserver.
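A minimal sketch of the shape of such a config (field names and observer choices here are assumptions, not the exact implementation):
```
from collections import namedtuple

import torch
from torch.quantization.observer import MinMaxObserver

# pairs an input equalization observer constructor with a weight equalization
# observer constructor, mirroring how QConfig pairs activation and weight
EqualizationQConfig = namedtuple("EqualizationQConfig", ["input_activation", "weight"])

# hypothetical usage with stand-in observer classes
example_eq_qconfig = EqualizationQConfig(
    input_activation=MinMaxObserver.with_args(dtype=torch.quint8),
    weight=MinMaxObserver.with_args(dtype=torch.qint8),
)
```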
Test Plan:
compiles
Imported from OSS
Reviewed By: ezyang
Differential Revision: D29135298
fbshipit-source-id: 3dde9c029c291467ff0a0845f0fc9c44573fc6f6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59605
Enables targeting of individual function invocations by execution order.
For example, given a module such as
```
class M1(torch.nn.Module):
    def forward(self, x):
        x = torch.add(x, x)
        x = torch.add(x, x)
        return x

class M2(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.m1 = M1()

    def forward(self, x):
        x = self.m1(x)
        return x
```
We can now target the first add of `m1` with
```
qconfig_dict = {
    "module_name_function_order": ("m1", torch.add, 0, custom_qconfig),
}
```
Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_qconfig_module_name_function_order
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D28951077
fbshipit-source-id: 311d423724a31193d4fa4bbf3a712b46464b5a29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59799
This is a redo of #58574, easier to create a new PR than to fix rebase
conflicts, as there have been a large number of refactors to the
underlying code.
Removes some code which was incorrectly added by #57519 but never
actually used for anything.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D29031955
fbshipit-source-id: f407d181070cb283382965952821e3647c705544
Summary: Implemented two observers (InputEqualObserver and WeightEqualObserver) which will be inserted into the graph during prepare_fx().
Test Plan: python test/test_quantization.py TestEqualizeFx
Reviewed By: supriyar
Differential Revision: D28836954
fbshipit-source-id: 25517dc82ae67698ed8b2dc334e3323286976104
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59042
To remove Quantizer class and split prepare and convert functions to different files
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28724867
fbshipit-source-id: 9f87d51020caa20d5408cb2820947e23d92d5fc3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59041
Static quantization support for custom modules was removed in a previous refactor
(https://github.com/pytorch/pytorch/pull/57519) since it was not covered by the test case.
This PR re-enables the test case and fixes the support.
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28724866
fbshipit-source-id: 1974675b88b56a2173daf86965d6f3fb7ebd783b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59040
To remove Quantizer class and split prepare and convert functions to different files
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28724870
fbshipit-source-id: c0f748711b825cd46bdfcc05c054c77a41e8207a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59039
To remove Quantizer class and split prepare and convert functions to different files
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28724874
fbshipit-source-id: bd984716b2da1d6879c3e92fa827574783a41567
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59038
To remove Quantizer class and split prepare and convert functions to different files
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28724869
fbshipit-source-id: e8501c9720b5ddb654e78bc8fa08de0466c1d52b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59037
To remove Quantizer class and split prepare and convert functions to different files
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28724865
fbshipit-source-id: 6c6824d0af7dd47d4c111d6a08e373bc65f33e08
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59036
To remove Quantizer class and split prepare and convert functions to different files
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28724862
fbshipit-source-id: 5900420127fcc14846bc34c9ac29ff7e6a703f1e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59035
To remove Quantizer class and split prepare and convert functions to different files
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28724872
fbshipit-source-id: d32752c635917c9820e5e7cc414ba9d48a258a19
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59034
To remove Quantizer class and split prepare and convert functions to different files
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28724873
fbshipit-source-id: 870e0822843ad1d035f41eaa015bdde9ccf6ec23
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59033
To remove Quantizer class and split prepare and convert functions to different files
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28724861
fbshipit-source-id: 97b38e851b6bf581510a24636b1d8d6f1d977f5a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59032
To remove Quantizer class and split prepare and convert functions to different files
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28724868
fbshipit-source-id: 6df639f20076b480812b6dcf0fc7d2c87ca29d8b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59028
Previously we had an env and a quant_env in convert, which was a bit confusing;
in this PR we merge them into a single Dict[str, Tuple[Node, torch.dtype]].
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28724863
fbshipit-source-id: 722a682c70d300a6ccd2b988786a1ac2d45e880e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57068
When training with histogram observer on, we got this runtime error:
```
torch/quantization/observer.py", line 942, in forward
self.bins)
self.histogram.resize_(combined_histogram.shape)
~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
self.histogram.copy_(combined_histogram)
self.min_val.resize_(combined_min.shape)
RuntimeError: cannot resize variables that require grad
```
Since this is the histogram observer, which is only used to collect histogram information, it should not need gradients. So we turn off grad before resizing, using the `detach_()` method.
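The idea as a standalone sketch (tensor names are illustrative):
```
import torch

hist = torch.zeros(10, requires_grad=True)
combined_histogram = torch.rand(16)
# detach_() turns off grad tracking in place, so the subsequent resize_ no
# longer raises "cannot resize variables that require grad"
hist.detach_().resize_(combined_histogram.shape)
hist.copy_(combined_histogram)
```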
Test Plan:
- arc lint
- Train with histogram observer turned on, training finished successfully
f264139727
Reviewed By: supriyar
Differential Revision: D27147212
fbshipit-source-id: abed5b9c4570ffc6bb60e58e64791cfce66856cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57067
auto format the code
Test Plan: lint
Reviewed By: jerryzh168
Differential Revision: D27147213
fbshipit-source-id: 008871d276c8891b2411549e17617e5c27d16ee3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58792
Enables support for fused modules like ConvReLU or LinearReLU in eager mode cross-layer equalization.
Test Plan:
`python test/test_quantization.py TestEqualizeEager`
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28647242
fbshipit-source-id: 286e057ce70aa7de45d575afd6c13e55120ff18a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58566
Validates the keys of the qconfig_dict, prepare_custom_config_dict, convert_custom_config_dict, and
fuse_custom_config_dict. If the user passes in an invalid key or makes a typo, we will throw an error and let the user know which keys are supported.
Test Plan:
python test/test_quantization.py
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28540923
fbshipit-source-id: 5958c32017b7d16abd219aefc8e92c42543897c2
Summary:
Enables quantization on XPU devices. Keeps the model as-is if the model is on an XPU device.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54857
Reviewed By: ailzhang
Differential Revision: D28501381
Pulled By: jerryzh168
fbshipit-source-id: 6d3e9b04075393248b30776c69881f957a1a837c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58453
Moves the class method generate_qconfig_map to qconfig_utils; more PRs will follow
to move functions out of Quantizer and eventually remove the Quantizer object.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28497965
fbshipit-source-id: 3c78cfe676965d20a8834a859ffed4d8e9ecade4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58445
Previously the output of a statically quantized fp16 operator was not quantized in QuantizeHandler, which is not consistent with
the behavior of static int8 operators. It also does not work well with reference functions. This PR
changes the fp16 static QuantizeHandler to quantize (i.e. call to(torch.float16)) in the QuantizeHandler, which also
makes future support for reference functions easier.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28495830
fbshipit-source-id: 2140eab8ab2dd08f6570d9e305485e3029e1f47d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58461
Improves the logic which calculates whether a node has any tensors
in its arguments by terminating the recursion early when possible.
In a future PR, we should probably ditch this entire approach and switch to
using dtype propagation.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28499455
fbshipit-source-id: bedd844022b90e1fcb7d7a3cb4cc65440dc9cc59
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58416
https://github.com/pytorch/pytorch/pull/57519 had a regression not
caught by CI: it added an assertion which failed on various model
output types.
This PR removes the assertion and adds the logic to observe graph
outputs in a way that supports arbitrary output formats.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_output_lists_and_dicts
```
Imported from OSS
Reviewed By: z-a-f
Differential Revision: D28479946
fbshipit-source-id: bcce301f98a057b134c0cd34ab0ca96ba457863f
Summary:
tl;dr: rewrites the FX graph mode quantization observer insertion to be easier to understand and extend.
The key conceptual difference from before is:
* before: for each node, observers are always inserted to the output of the current node, even if they are needed for the next node. This is hard to reason about.
* after: for each node, observers are inserted to the inputs (if needed, as calculated by the dtype of the argument and dtype of current node) and to the output (if needed for the type of pattern and qconfig). There is no knowledge of future nodes needed to insert observers for the current node.
This allows us to significantly simplify various things:
* all new observers needed for a node are inserted together. This makes it easier to understand and debug things. We add an invariant that node X will never change any observers inserted by any preceding or subsequent node, so to debug an issue the user can just understand what is happening for node X, without having to understand what happens before or after it.
* all the state tracking of activation_post_process_map and activation_post_process_indices are removed, instead observers are looked up by graph traversals
* since there is no longer a need for overlapping graph passes which mutate each other's intermediate state, it is easier to understand what the rules are for inserting observers, and to create new rules in the future.
Test Plan:
```
# all OSS tests pass
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Differential Revision: D28241864
Reviewed By: jerryzh168
Pulled By: vkuzo
fbshipit-source-id: 950d58972d26362808564cc0a2dfb30413a3734d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57210
Removes the manually specified string name for sets of
related ops, and replaces it with an automatically generated
index. The manual name was arbitrary and ok for an MVP, but
is not safe for wide usage.
Also, adds APIs for users to add custom functions to the
relatedness map by either pairing it to a known function
or creating a new relatedness set.
Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28077977
fbshipit-source-id: e64a1ad6cd063014d74cdad189b0a612b1143435
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57186
Before this PR, we matched any pair of nodes with equal or related
types.
This PR changes the behavior to only match nodes whose type is in
the allowlist (the relatedness mappings). This will prevent matching
user defined modules, unless users add them to the mappings.
This is motivated by a couple of things:
1. if user defined types are matched, it can break scriptability of the
model with loggers attached. This happens whenever the user module
has a return type of anything other than a Tensor or a tuple of
Tensors.
2. we tried the past behavior on a couple of models, and it hasn't been
useful.
Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher
python test/test_quantization.py TestFXGraphMatcherModels
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28077981
fbshipit-source-id: 0a698e52b807cda47e6923310448a985b26eb362
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57184
Add remaining types to the relationship mapping to have full coverage
of ops quantization knows about, except binary ops and RNNs.
Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher.test_op_relationship_mapping
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28077979
fbshipit-source-id: 0f6070c8a995032978702d088803f89ff25f2a7f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57171
No logic change, just moving the mapping to a file where
the other mappings are.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28077978
fbshipit-source-id: 4049d6a498156a5dffe3a03d2f4abc79da7bf907
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57470
Removes the earlier hack of matching patterns originally matched
to BinaryOpQuantizeHandler to switch to CopyHandler. After this PR,
each pattern can only be matched to one type of QuantizeHandler or
to nothing.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28152909
fbshipit-source-id: afc285e770bd7eb0518c90e3ee4874c421e78bbc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57375
Skip observing the input for masked_fill. Currently we don't have a way to
query the type of a Proxy in GraphModule; once we have the functionality to annotate types,
we'll need to annotate the Proxy as a boolean Tensor to remove this hack.
Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_boolean_tensor
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28126003
fbshipit-source-id: 2989766370a607579b3ea07ca36cdc2ce35893cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57402
This is a cleanup, the value is not used by anything. It was
probably left behind after previous refactors.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28133622
fbshipit-source-id: 44a3f955d4af8d6dd15b4fb3038188568e4ee549
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57399
There were a couple of functions which took `quants` as arguments
without using them, probably left over from past refactors.
Cleaning this up improves code readability.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28132413
fbshipit-source-id: 636b146c0b5ef0caea9c4b539e245de245d48c49
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57393
Moves the logic that determines whether the output is quantized
based on the inputs to live
on the qhandler object. This allows us to remove
FixedQParamsOpQuantizeHandler from quantize.py, further reducing
the coupling between handler objects and the quantization pass.
Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Reviewed By: astaff
Differential Revision: D28132414
fbshipit-source-id: 5c28524b47c00f618d3a38657376abae9e6ffe7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57388
It's a bit confusing to have this be a decorator. It's simpler to
just expose it as a function on qhandler.
Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28129411
fbshipit-source-id: f7316f285e8546c67e8d8cf753462b2c2abb2636
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57377
Moves the logic which determines
1. whether a pattern instance's output should be observed
2. whether a pattern instance's output should be marked as observed based on its inputs
3. whether to override the activation specified in the qconfig
from `quantize.py` to `quantization_patterns.py`. This makes
the code easier to read and reduces the coupling between `Quantizer`
and `QuantizeHandler` instances.
Note: there are some further cleanups which would be good after this one
- leaving those for future PRs.
Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28126896
fbshipit-source-id: 94c80a9c7307452783348d65b402acc84983e3f6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57368
1. renames functions which only sometimes insert observers to start with `maybe_`,
to clarify the difference from functions which always insert observers
2. saves a level of indent in `maybe_insert_observer_for_output_of_the_node`
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28126897
fbshipit-source-id: 4cbc184dbf5e85954314cfbbcdd1551474175bf0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57367
This code is never hit (see insert_observer_for_output_of_the_node
which gates it out), so changing to an assert in order to
have `insert_observer` actually always insert an observer.
This helps code readability.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28126898
fbshipit-source-id: 411bc37769a6eacbebc463ed6c84cac85871bd5e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56927
Adds the connection of `torch.add` to `toq.add_relu` and of `torch.mul`
to `toq.mul_relu`.
Test Plan:
CI
Imported from OSS
Reviewed By: supriyar
Differential Revision: D28003475
fbshipit-source-id: a12871feacf84c5afb0e1cc47e708e285695ffeb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57173
If getitem is followed by an unmatched node, we'll remove the observer after it.
Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_getitem
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28068805
fbshipit-source-id: e79f8ec3e8fd61d348b8a7069ab0bb434d737c30
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57028
Adds a test case for wrapped sigmoid, and fixes the following issues
to make it pass in NS:
* allows comparing between x.sigmoid() and torch.sigmoid(x), if they are related
* allows dtype cast from FP32_OR_INT8 to FP32, via dequantize (this will be improved later)
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_defined_function
```
Reviewed By: jerryzh168
Differential Revision: D28030089
Pulled By: vkuzo
fbshipit-source-id: b237353e2d564a4476f409df461746a259015a4b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57027
Fixes a bug to allow shadowing of linear and conv functionals.
The bug is to only detach tensors, not all objects.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_int8_shadows_int8_fun
```
Reviewed By: jerryzh168
Differential Revision: D28030090
Pulled By: vkuzo
fbshipit-source-id: 0a38c4b232e007d7822eee818b0af99d98335d22
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57026
Adds a config option to skip matching classes by class type
and functions by function type.
This is useful when users make custom modules which return
types other than tensors. With the current implementation of
Logger, these are not scriptable.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_module_scriptable
```
Reviewed By: jerryzh168
Differential Revision: D28030093
Pulled By: vkuzo
fbshipit-source-id: 71dc54dd935d2071c4b017260ea2a1e5c2298bfe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57025
Adds the ability to log unshadowed inputs of binary ops such as `add`
and `mul`, when indices 0, 1, or 0 and 1 are tensors.
Note: making shadowing support this is saved for a future PR.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_add_mul_inputs_activations
```
Reviewed By: jerryzh168
Differential Revision: D28030098
Pulled By: vkuzo
fbshipit-source-id: fd46760faac153975cd7688e70c44991ec1d5dff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57024
Enables shadow copies of fp16 emulation patterns where weights
are cast to fp16 before being passed to linear. This previously
did not work because copying of `call_method` nodes was not implemented.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_linear_fp16_vs_linear_fp16_shadow_activations
```
Reviewed By: jerryzh168
Differential Revision: D28030096
Pulled By: vkuzo
fbshipit-source-id: 13a39ea6c106180df6d750246672286b58b4d04c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57022
Allows usage of user functions in NS shadow APIs. We expose the
i/o mapping to the user APIs, and thread them throughout the code.
Note: the format of the mapping is currently not the best. Saving
improving that for a future PR.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_defined_function
```
Reviewed By: jerryzh168
Differential Revision: D28030095
Pulled By: vkuzo
fbshipit-source-id: 2863312362223ad276437e2aeeec4a3f71b691c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57021
To support shadows of custom functions, we need to allow user to
specify I/O type of the custom functions.
This PR is a cleanup in preparation for making the above happen.
We make the I/O dtype mappings be generated by a function instead
of a global variable. In the next PR, we will add a hook so user
can modify these mappings.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```
Reviewed By: jerryzh168
Differential Revision: D28030094
Pulled By: vkuzo
fbshipit-source-id: 3cbb617f034ef385c2875c4ec7fed13ca30bfc57
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56762
Adds a test case for wrapped sigmoid, and fixes the following issues
to make it pass in NS:
* allows comparing between x.sigmoid() and torch.sigmoid(x), if they are related
* allows dtype cast from FP32_OR_INT8 to FP32, via dequantize (this will be improved later)
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_defined_function
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27960766
fbshipit-source-id: 02935d2f400aa0b8f3d51bbf664a6c8ca89aa811
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56742
Fixes a bug to allow shadowing of linear and conv functionals.
The bug is to only detach tensors, not all objects.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_int8_shadows_int8_fun
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27960767
fbshipit-source-id: abc911ca4b9edafd1effb9dada7731981538c2df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56493
Adds a config option to skip matching classes by class type
and functions by function type.
This is useful when users make custom modules which return
types other than tensors. With the current implementation of
Logger, these are not scriptable.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_module_scriptable
```
needs more testing before land
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27886107
fbshipit-source-id: ec92c4f7ab7141021bc022f07b3b558b42bbb986
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56408
Adds the ability to log unshadowed inputs of binary ops such as `add`
and `mul`, when indices 0, 1, or 0 and 1 are tensors.
Note: making shadowing support this is saved for a future PR.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_add_mul_inputs_activations
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27864296
fbshipit-source-id: 3cbeb728297aa192d1ea17e815299709fd9db056
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56384
Enables shadow copies of fp16 emulation patterns where weights
are cast to fp16 before being passed to linear. This previously
did not work because copying of `call_method` nodes was not implemented.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_linear_fp16_vs_linear_fp16_shadow_activations
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27857735
fbshipit-source-id: 7c1a067f035acf7322175f8535876d0ead88a86a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56301
Allows usage of user functions in NS shadow APIs. We expose the
i/o mapping to the user APIs, and thread them throughout the code.
Note: the format of the mapping is currently not the best. Saving
improving that for a future PR.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_defined_function
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27833189
fbshipit-source-id: dac418e294d1c9b204efbf4071d5cc12a9e784c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56296
To support shadows of custom functions, we need to allow user to
specify I/O type of the custom functions.
This PR is a cleanup in preparation for making the above happen.
We make the I/O dtype mappings be generated by a function instead
of a global variable. In the next PR, we will add a hook so user
can modify these mappings.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27831996
fbshipit-source-id: 782f5e77de0eef3899b9b7def0fdabd8dcafef12
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56292
Adds hooks for specifying user defined functions to NS weight and
unshadowed activation APIs.
Adding it to shadowed activation APIs will be a bit more work, upcoming
in a separate PR.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_defined_function
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27830409
fbshipit-source-id: 6bbddc3062c0b3e412a3147244795319c0785a92
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56283
Exposes the `base_name_to_sets_of_related_ops` variable
to the graph matching API, so that users can add relationships
for custom functions. This is needed to enable full support of
external functions for custom backends.
The next PR will extend this to the NS APIs.
Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher.test_user_defined_function
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27830410
fbshipit-source-id: 8688cf697d388c52e3d18f108765edfca3c3d3aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56550
Add support for preserving a list of attributes on observed/quantized GraphModule
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_deepcopy_preserve_attributes
Imported from OSS
Reviewed By: vkuzo, kazhang
Differential Revision: D27899317
fbshipit-source-id: ebf21334715e5ab764aaa27eed534cc0cdf9f2b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54924
Previously we produced torch.ops.quantized.cat, which takes the inputs, dequantizes them,
and requantizes them with new qparams. This PR changes that to produce torch.cat directly; torch.cat
will assume all inputs share the same qparams, and it will produce a quantized Tensor with
the same qparams as all inputs (because a previous PR makes sure all inputs and the output of cat share
the same observer/fakequant instance).
Using torch.cat is expected to be more efficient since it does not introduce extra quant/dequant.
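Illustration of the resulting behavior (scale/zero_point values are illustrative):
```
import torch

scale, zero_point = 0.1, 0
a = torch.quantize_per_tensor(torch.randn(2, 3), scale, zero_point, torch.quint8)
b = torch.quantize_per_tensor(torch.randn(2, 3), scale, zero_point, torch.quint8)
# torch.cat assumes the quantized inputs share qparams and returns a quantized
# tensor with those same qparams, without an extra dequant/requant round trip
out = torch.cat([a, b], dim=0)
print(out.q_scale(), out.q_zero_point())
```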
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_cat
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D27416528
fbshipit-source-id: 896c280abec2903c29d597c655729666583ff0dd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56004
Added reference pattern support for GELU, softmax and bmm for int dtypes. For GELU and softmax, this consisted of adding reference patterns to the default node handler for int dtypes. Note that the GELU and softmax patterns are not registered since they do not have a proper quantized kernel, which means they would either add unnecessary dequant and quant ops to the network, or they would simply error. This can be circumvented with custom qconfig usage as in test_gelu_reference.
bmm was added within binary ops along with some significant changes to how that code is structured. Theoretically the reference pattern used for bmm could be applied to other dtypes. This was not enabled because of issues relating to Line 1323 in quantize.py. In essence, the prepare step does not know whether an op will use a reference pattern or not, so for ops that are supported with one dtype in reference mode and one dtype normally, this has the potential to cause issues. This is difficult to get around without the is_reference flag being available in the prepare step or the discussed changes around separating
Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_gelu_reference
python test/test_quantization.py TestQuantizeFxOps.test_gelu_normal
python test/test_quantization.py TestQuantizeFxOps.test_softmax_reference
python test/test_quantization.py TestQuantizeFxOps.test_softmax_normal
python test/test_quantization.py TestQuantizeFxOps.test_silu_reference
python test/test_quantization.py TestQuantizeFxOps.test_bmm_int_reference
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFuseFx
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxModels
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D27818340
fbshipit-source-id: de65be0797035463cd2d1b0e4677d1a87f69143c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56391
Previously we only supported keeping the output quantized for tensor outputs; this PR adds support
for list and dict (values) outputs as well.
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D27860327
fbshipit-source-id: e770160ced47a7173abff5505ec620bd2b1a0b01
Summary:
As this diff shows, currently there are a couple hundred instances of raw `noqa` in the codebase, which just ignore all errors on a given line. That isn't great, so this PR changes all existing instances of that antipattern to qualify the `noqa` with respect to a specific error code, and adds a lint to prevent more of this from happening in the future.
Interestingly, some of the examples the `noqa` lint catches are genuine attempts to qualify the `noqa` with a specific error code, such as these two:
```
test/jit/test_misc.py:27: print(f"{hello + ' ' + test}, I'm a {test}") # noqa E999
test/jit/test_misc.py:28: print(f"format blank") # noqa F541
```
However, those are still wrong because they are [missing a colon](https://flake8.pycqa.org/en/3.9.1/user/violations.html#in-line-ignoring-errors), which actually causes the error code to be completely ignored:
- If you change them to anything else, the warnings will still be suppressed.
- If you add the necessary colons then it is revealed that `E261` was also being suppressed, unintentionally:
```
test/jit/test_misc.py:27:57: E261 at least two spaces before inline comment
test/jit/test_misc.py:28:35: E261 at least two spaces before inline comment
```
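For reference, a correctly qualified suppression needs the colon (and at least two spaces before the inline comment), e.g.:
```
import os, sys  # noqa: E401
```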
I did try using [flake8-noqa](https://pypi.org/project/flake8-noqa/) instead of a custom `git grep` lint, but it didn't seem to work. This PR is definitely missing some of the functionality that flake8-noqa is supposed to provide, though, so if someone can figure out how to use it, we should do that instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56272
Test Plan:
CI should pass on the tip of this PR, and we know that the lint works because the following CI run (before this PR was finished) failed:
- https://github.com/pytorch/pytorch/runs/2365189927
Reviewed By: janeyx99
Differential Revision: D27830127
Pulled By: samestep
fbshipit-source-id: d6dcf4f945ebd18cd76c46a07f3b408296864fcb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56294
When matching a pattern to `BinaryOpQuantizeHandler`, we need to make
sure we check for dtype support on the base node, instead of the current
node. This is important in cases such as `add-relu` and `mul-relu`,
when the current node is `relu`, but the base node is `add|mul`.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
```
There is no good test case to check this in the current logic. We created an
add-relu model manually and verified with pdb that the add node was
being used to match against dtypes.
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27831070
fbshipit-source-id: 3697f1328dff9fec3eb910bae49a73793ef36d63