Commit Graph

721 Commits

Vasiliy Kuznetsov
2a2bc1fc8a ns for fx: add fqn to results, when present (#61377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61377

Both the quantization tracer and the NS tracer record
`_node_name_to_scope`, which contains the mapping from
node name to FQN.

This PR adds the FQN information to the NS results, so that it is
more convenient for users to attribute an NS result to the corresponding
module in their model.
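
For illustration, a minimal sketch of consuming the new field, reusing the `extract_weights` call pattern shown later in this log (the `torch.quantization._numeric_suite_fx` import path and the exact result-dict layout are assumptions):

```
import copy
import torch
import torch.nn as nn
from torch.quantization._numeric_suite_fx import extract_weights  # path assumed
from torch.quantization.quantize_fx import prepare_fx, convert_fx

m = nn.Sequential(nn.Conv2d(1, 1, 1), nn.Conv2d(1, 1, 1)).eval()
mp = prepare_fx(m, {'': torch.quantization.default_qconfig})
mq = convert_fx(copy.deepcopy(mp))
results = extract_weights('fp32', mp, 'int8', mq)
for layer_name, layer_results in results.items():
    for entries in layer_results['weight'].values():
        for entry in entries:
            # 'fqn' is the field this PR adds, when present
            print(layer_name, entry.get('fqn'))
```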

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_extract_weights_fqn
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_match_activations_fqn
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_shadow_activations_fqn
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29600349

fbshipit-source-id: df489e03daff97dd380f59c83ffdc2b0012a0a53
2021-07-17 20:53:41 -07:00
Vasiliy Kuznetsov
7449f49a4c ns for fx: return results in execution order (#61360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61360

By default, NS graph matching matches from the end of the graph
to the start.  This PR reverses the returned results so that
the outputs of the NS APIs are in the order of execution, making
them easier to analyze.
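
A minimal sketch of the idea, assuming the matcher accumulates its matches into an `OrderedDict` keyed from last match to first (names here are illustrative, not the actual internals):

```
from collections import OrderedDict

def reorder_to_execution_order(matches):
    # Matching walks the graph from output to input, so reversing the
    # accumulated OrderedDict yields entries in execution order.
    return OrderedDict(reversed(list(matches.items())))
```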

Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher.test_results_order
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29600348

fbshipit-source-id: c9fa4a3748db27c1788eebf803f35221e6fc8701
2021-07-17 20:53:39 -07:00
Vasiliy Kuznetsov
2b2928c5ca ns for fx: improve error messages for graph matching (#61359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61359

Makes the error messages produced during graph matching easier
for users to read.

Test Plan:
```
// inspect the exceptions in the following two tests and verify
// that they are easier to read than before
python test/test_quantization.py TestFXGraphMatcher.test_matching_failure_node_count
python test/test_quantization.py TestFXGraphMatcher.test_matching_failure_node_type
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29600353

fbshipit-source-id: ec6640fe6cab7b62a697e4ee385be182f2918fd4
2021-07-17 20:53:38 -07:00
Vasiliy Kuznetsov
4acd14da02 ns for fx: preserve observers and fake_quants through passes (#61323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61323

Before this PR, all observers and fake quants were silently removed
when adding loggers with NS. This is problematic for QAT models because
we need the fake quants to run in order to properly capture intermediate
outputs.

This PR fixes the issue by preserving the observers throughout
the passes which add loggers.  In detail:
* for each quantization module or fusion, add additional patterns with that fusion and an observer/fake_quant at the end
* remove the places in the logger model creation code which removed observers
* add unit testing that QAT numerics do not change after adding loggers (a sketch of this property follows)
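
A sketch of the property these tests assert, written against a generic logger-insertion pass (the helper below is illustrative, not the actual test code):

```
import copy
import torch

def assert_numerics_preserved(model, add_loggers_fn, example_input):
    # With observers/fake_quants preserved, the QAT model must produce
    # identical outputs before and after the logger-insertion pass.
    before = copy.deepcopy(model)(example_input)
    after = add_loggers_fn(copy.deepcopy(model))(example_input)
    torch.testing.assert_allclose(before, after)
```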

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_loggers_preserve_qat_numerics
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_shadow_loggers_preserve_qat_numerics
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29600351

fbshipit-source-id: 5f25118b79eb47860c49bca882de6a8eae7a4456
2021-07-17 20:53:33 -07:00
Vasiliy Kuznetsov
a70505cdbd ns for fx: support comparing fp32 vs fp32_prepared, except shadowed (#61129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61129

Adds support for comparing an fp32 model (without quantization) to an
fp32 model prepared with quantization. The main missing feature was
handling conv-bn fusion, since this fusion for PTQ happens outside
of quantization patterns.

Adds testing for this case for comparing weights and comparing
activations.

Adds a TODO for also handling this for shadow activations; we need to
first stop removing observers in graph passes before we can add
this support, which will come in a future PR.
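
A sketch of the newly supported comparison, reusing the `extract_weights` call pattern from later in this log (model and qconfig are illustrative; the conv-bn pair exercises the new fusion handling; import path assumed):

```
import copy
import torch
import torch.nn as nn
from torch.quantization._numeric_suite_fx import extract_weights  # path assumed
from torch.quantization.quantize_fx import prepare_fx

m = nn.Sequential(nn.Conv2d(1, 1, 1), nn.BatchNorm2d(1)).eval()
# prepare_fx performs conv-bn fusion for PTQ, which the matcher now handles
mp = prepare_fx(copy.deepcopy(m), {'': torch.quantization.default_qconfig})
results = extract_weights('fp32', m, 'fp32_prepared', mp)
```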

Test Plan:
```
python test/test_quantization.py TestFXGraphMatcherModels.test_mobilenet_v2
python test/test_quantization.py TestFXGraphMatcherModels.test_mobilenet_v2_qat
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_compare_activations_conv
```

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D29520009

fbshipit-source-id: f63484a998f1424bd9cacf5d823b82b2edfea1ae
2021-07-17 20:52:23 -07:00
Angela Yi
0751a41ab1 [quant] Input-Weight Equalization - ConvReLU support (#61350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61350

Applies changes in convert to support ConvReLU2d layers.

Initial Model: `x -> conv1 -> relu`

After fusion: `x -> convRelu2d`

After prepare: `x -> input_quant_obs -> input_eq_obs1 -> convRelu2d -> output_quant_obs1`

After equalization functions: `x -> mul -> input_quant_obs (scaled) -> convRelu2d -> output_quant_obs`

After convert: `x -> mul -> quantize_per_tensor -> quantized::convRelu2d -> dequantize`

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Initial Model:
```
ConvReluModel(
  (fc): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1))
  (relu): ReLU()
)
```

After prepare:
```
GraphModule(
  (x_activation_post_process_0): MinMaxObserver(min_val=5.960464477539063e-08, max_val=0.9999999403953552)
  (x_activation_post_process_0_equalization_process_0): _InputEqualizationObserver(
    (input_obs): PerChannelMinMaxObserver(min_val=tensor([1.1921e-07, 3.3379e-06, 5.9605e-08]), max_val=tensor([1.0000, 1.0000, 1.0000]))
  )
  (fc): ConvReLU2d(
    (0): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1))
    (1): ReLU()
  )
  (fc_activation_post_process_0): MinMaxObserver(min_val=0.0, max_val=1.2341605424880981)
)

graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```

After equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```

After convert:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %fc_input_scale_0 : [#users=1] = get_attr[target=fc_input_scale_0]
    %fc_input_zero_point_0 : [#users=1] = get_attr[target=fc_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %fc_input_scale_0, %fc_input_zero_point_0, torch.quint8), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%fc,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29638275

fbshipit-source-id: 40d4666a4451e132612ea38fdfeaaec177a1defb
2021-07-13 14:00:40 -07:00
Angela Yi
b3e4dab45a [quant] Input-Weight Equalization - Conv convert support (#61287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61287

Modifies functions during convert() to support equalization. Note that this implementation does not yet work for connected F.conv2d layers.

Initial:
```
      w
      |
x -> conv -> y
```

After prepare:
```
                                         w
                                         |
                                  weight_quant_obs
                                         |
                                    weight_eq_obs
                                         |
x -> input_quant_obs -> input_eq_obs -> conv -> out_quant_obs -> y
```

After convert:
```
                scale, zero_point             w (scaled)
                       |                           |
x -> mul -> quantize_per_tensor (scaled) -> quantized::conv -> dequant -> y
      |
   eq_scale
```

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Initial model:
```
ConvModel(
  (conv): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1), bias=False)
)
```

After prepare:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %conv : [#users=1] = call_module[target=conv](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %conv_activation_post_process_0 : [#users=1] = call_module[target=conv_activation_post_process_0](args = (%conv,), kwargs = {})
    return conv_activation_post_process_0
```

After equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %conv : [#users=1] = call_module[target=conv](args = (%x_activation_post_process_0,), kwargs = {})
    %conv_activation_post_process_0 : [#users=1] = call_module[target=conv_activation_post_process_0](args = (%conv,), kwargs = {})
    return conv_activation_post_process_0
```

After convert:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %conv_input_scale_0 : [#users=1] = get_attr[target=conv_input_scale_0]
    %conv_input_zero_point_0 : [#users=1] = get_attr[target=conv_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %conv_input_scale_0, %conv_input_zero_point_0, torch.quint8), kwargs = {})
    %conv : [#users=1] = call_module[target=conv](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%conv,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29557055

fbshipit-source-id: dc9f44182e31fa362c43ad2dfe224e6f4e4a730e
2021-07-13 14:00:38 -07:00
Angela Yi
77d36b657a [quant] Input-Weight Equalization - Conv prepare support (#61286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61286

Modifies the prepare step to support conv layers during input-weight equalization and adds tests to make sure that the results are as expected.

Initial:
```
      w
      |
x -> conv -> y
```

After prepare:

```
                                         w
                                         |
                                  weight_quant_obs
                                         |
                                    weight_eq_obs
                                         |
x -> input_quant_obs -> input_eq_obs -> conv -> out_quant_obs -> y
```

Test Plan:
`python test/test_quantization.py TestEqualizeFx.test_input_weight_equalization_prepare`

Initial:
```
ConvModel(
  (conv): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1), bias=False)
)
```

After prepare:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %conv : [#users=1] = call_module[target=conv](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %conv_activation_post_process_0 : [#users=1] = call_module[target=conv_activation_post_process_0](args = (%conv,), kwargs = {})
    return conv_activation_post_process_0
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29557051

fbshipit-source-id: 25d1531645dfaf565f5c615e2ee850fcf96c7eb9
2021-07-13 14:00:36 -07:00
Angela Yi
ce9cedd119 [quant] Input-Weight Equalization - Conv observer support (#61285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61285

Modifies observers to support conv layers and adds tests to make sure that the observers return the expected values for conv inputs.

Test Plan:
`python test/test_quantization.py TestEqualizeFx.test_input_weight_eq_observer`

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29557041

fbshipit-source-id: 5e43329f189ba352eb8b991f38bf37752eebb6e6
2021-07-13 13:59:23 -07:00
Supriya Rao
7a15576a65 [quant] update FakeQuant modules to use tensor qparams (#61318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61318

Remove the `float()` and `int()` calls in the forward function so that we can directly use the tensor qparams in the fake_quantize operator.

Calling `float()`/`int()` internally calls `item()`, which can trigger a GPU -> CPU copy if the original tensors reside on GPU.
Local benchmark: P427668213
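
A simplified sketch of the change in the module's forward (the real `FakeQuantize.forward` has additional logic around observation and enablement):

```
# before: float()/int() call item(), which can force a GPU -> CPU sync
X = torch.fake_quantize_per_tensor_affine(
    X, float(self.scale), int(self.zero_point),
    self.quant_min, self.quant_max)

# after: pass the qparam tensors directly, using the tensor_qparams
# overload added in the commit below
X = torch.fake_quantize_per_tensor_affine(
    X, self.scale, self.zero_point,
    self.quant_min, self.quant_max)
```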

Before this change
```
                                               Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                     aten::_aminmax         2.57%       1.507ms         3.10%       1.819ms      36.371us       2.872ms         4.81%       2.872ms      57.446us            50
              aten::fake_quantize_per_tensor_affine         1.04%     610.915us         3.60%       2.114ms      42.276us     472.896us         0.79%       2.698ms      53.962us            50
    aten::fake_quantize_per_tensor_affine_cachemask         1.69%     993.626us         2.56%       1.503ms      30.058us       2.225ms         3.73%       2.225ms      44.504us            50
                                   aten::is_nonzero         3.85%       2.258ms        19.68%      11.540ms      46.161us       2.168ms         3.63%      11.084ms      44.336us           250
                                   aten::zeros_like         1.82%       1.064ms         6.65%       3.901ms      39.007us       1.531ms         2.57%       3.905ms      39.045us           100
                                           aten::eq        13.80%       8.093ms        25.90%      15.189ms      37.972us       9.580ms        16.05%      15.566ms      38.914us           400
                                         aten::item         5.67%       3.323ms        21.50%      12.607ms      36.019us       3.233ms         5.42%      12.167ms      34.762us           350
                                        aten::zeros         0.94%     549.208us         2.93%       1.717ms      34.343us     688.928us         1.15%       1.695ms      33.894us            50
                                           aten::le         2.52%       1.478ms         4.50%       2.641ms      26.411us       1.753ms         2.94%       2.845ms      28.448us           100
                                         aten::rsub         1.04%     608.715us         2.44%       1.433ms      28.667us     532.000us         0.89%       1.418ms      28.353us            50
                                          aten::max         1.54%     905.401us         4.62%       2.711ms      27.106us     847.488us         1.42%       2.697ms      26.969us           100
                                         aten::ones         0.92%     542.159us         2.16%       1.266ms      25.324us     661.856us         1.11%       1.301ms      26.017us            50
                                          aten::min         0.82%     479.167us         2.15%       1.258ms      25.160us     407.808us         0.68%       1.276ms      25.530us            50
                          aten::_local_scalar_dense        15.83%       9.284ms        15.83%       9.284ms      26.526us       8.934ms        14.97%       8.934ms      25.524us           350
                                        aten::clamp         2.35%       1.378ms         4.21%       2.467ms      24.669us       1.546ms         2.59%       2.461ms      24.612us           100
                                        aten::zero_         2.53%       1.482ms         5.65%       3.316ms      22.108us       1.326ms         2.22%       3.380ms      22.531us           150
                                      aten::maximum         3.08%       1.805ms         3.08%       1.805ms      18.052us       1.849ms         3.10%       1.849ms      18.494us           100
                                      aten::minimum         1.33%     778.854us         1.33%     778.854us      15.577us     868.672us         1.46%     868.672us      17.373us            50
                                        aten::round         1.36%     799.910us         1.36%     799.910us      15.998us     809.568us         1.36%     809.568us      16.191us            50
                                        aten::copy_         6.61%       3.878ms         6.61%       3.878ms      15.513us       4.036ms         6.76%       4.036ms      16.143us           250
                                          aten::div         2.53%       1.483ms         2.53%       1.483ms      14.833us       1.535ms         2.57%       1.535ms      15.353us           100
                                          aten::mul         2.44%       1.431ms         2.44%       1.431ms      14.314us       1.478ms         2.48%       1.478ms      14.782us           100
                                       aten::detach         1.46%     855.670us         2.41%       1.411ms      14.110us     832.448us         1.39%       1.395ms      13.949us           100
                                          aten::add         2.22%       1.301ms         2.22%       1.301ms      13.008us       1.383ms         2.32%       1.383ms      13.828us           100
                                        aten::fill_         4.18%       2.452ms         4.18%       2.452ms      12.262us       2.693ms         4.51%       2.693ms      13.463us           200
                                          aten::sub         5.06%       2.967ms         5.06%       2.967ms      14.837us       2.675ms         4.48%       2.675ms      13.374us           200
                                           aten::to         2.10%       1.230ms         3.65%       2.140ms      10.701us       1.310ms         2.20%       2.062ms      10.310us           200
                                       aten::select         1.28%     749.144us         1.49%     874.227us       8.742us     863.232us         1.45%     863.232us       8.632us           100
                                             detach         0.95%     555.326us         0.95%     555.326us       5.553us     562.496us         0.94%     562.496us       5.625us           100
                                   aten::as_strided         0.40%     232.289us         0.40%     232.289us       1.161us       0.000us         0.00%       0.000us       0.000us           200
                                        aten::empty         2.93%       1.720ms         2.93%       1.720ms       3.439us       0.000us         0.00%       0.000us       0.000us           500
                                      aten::resize_         1.04%     611.313us         1.04%     611.313us       2.038us       0.000us         0.00%       0.000us       0.000us           300
                                   aten::empty_like         0.75%     438.585us         1.77%       1.036ms       5.180us       0.000us         0.00%       0.000us       0.000us           200
                                aten::empty_strided         1.36%     799.442us         1.36%     799.442us       3.198us       0.000us         0.00%       0.000us       0.000us           250
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 58.645ms
Self CUDA time total: 59.674ms
```

After this change
```

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                  aten::fake_quantize_per_tensor_affine         0.98%     505.210us         4.38%       2.259ms      45.187us     419.424us         0.78%       3.218ms      64.367us            50
                                         aten::_aminmax         2.78%       1.434ms         3.42%       1.766ms      35.321us       2.825ms         5.27%       2.825ms      56.505us            50
aten::fake_quantize_per_tensor_affine_cachemask_tens...         2.38%       1.229ms         3.40%       1.754ms      35.083us       2.799ms         5.22%       2.799ms      55.979us            50
                                             aten::rsub         0.94%     485.040us         5.02%       2.590ms      51.793us     458.976us         0.86%       2.587ms      51.747us            50
                                       aten::is_nonzero         3.78%       1.952ms        23.64%      12.196ms      48.786us       2.055ms         3.83%      11.986ms      47.944us           250
                                             aten::item         6.92%       3.572ms        19.86%      10.244ms      40.977us       3.670ms         6.85%       9.931ms      39.724us           250
                                       aten::zeros_like         1.65%     848.874us         6.64%       3.426ms      34.260us       1.397ms         2.61%       3.572ms      35.717us           100
                                            aten::zeros         0.85%     436.691us         3.00%       1.549ms      30.984us     551.936us         1.03%       1.576ms      31.516us            50
                                               aten::eq        10.60%       5.467ms        20.26%      10.452ms      26.130us       7.018ms        13.09%      10.832ms      27.079us           400
                                               aten::le         2.58%       1.332ms         4.67%       2.407ms      24.074us       1.580ms         2.95%       2.614ms      26.144us           100
                              aten::_local_scalar_dense        12.93%       6.673ms        12.93%       6.673ms      26.691us       6.261ms        11.68%       6.261ms      25.046us           250
                                            aten::clamp         2.43%       1.253ms         4.37%       2.256ms      22.560us       1.431ms         2.67%       2.273ms      22.725us           100
                                             aten::ones         0.89%     460.133us         2.18%       1.123ms      22.467us     570.496us         1.06%       1.128ms      22.551us            50
                                              aten::min         0.74%     383.132us         2.06%       1.065ms      21.296us     377.536us         0.70%       1.091ms      21.824us            50
                                            aten::zero_         2.36%       1.219ms         5.87%       3.029ms      20.194us       1.261ms         2.35%       3.199ms      21.327us           150
                                              aten::max         1.51%     779.081us         4.06%       2.096ms      20.960us     791.680us         1.48%       2.130ms      21.295us           100
                                              aten::sub         7.97%       4.111ms         7.97%       4.111ms      20.556us       3.847ms         7.18%       3.847ms      19.234us           200
                                              aten::div         2.94%       1.516ms         2.94%       1.516ms      15.158us       1.580ms         2.95%       1.580ms      15.798us           100
                                            aten::round         1.45%     750.445us         1.45%     750.445us      15.009us     756.064us         1.41%     756.064us      15.121us            50
                                            aten::copy_         6.88%       3.548ms         6.88%       3.548ms      14.190us       3.701ms         6.90%       3.701ms      14.803us           250
                                          aten::minimum         1.32%     681.654us         1.32%     681.654us      13.633us     713.664us         1.33%     713.664us      14.273us            50
                                          aten::maximum         2.55%       1.317ms         2.55%       1.317ms      13.169us       1.338ms         2.50%       1.338ms      13.378us           100
                                              aten::mul         2.63%       1.358ms         2.63%       1.358ms      13.581us       1.328ms         2.48%       1.328ms      13.283us           100
                                           aten::detach         1.34%     688.820us         2.35%       1.211ms      12.110us     772.800us         1.44%       1.278ms      12.779us           100
                                            aten::fill_         4.53%       2.338ms         4.53%       2.338ms      11.692us       2.495ms         4.65%       2.495ms      12.473us           200
                                              aten::add         2.32%       1.197ms         2.32%       1.197ms      11.968us       1.240ms         2.31%       1.240ms      12.405us           100
                                               aten::to         2.07%       1.069ms         3.66%       1.889ms       9.443us       1.224ms         2.28%       1.975ms       9.874us           200
                                           aten::select         1.44%     743.042us         1.64%     848.207us       8.482us     641.600us         1.20%     641.600us       6.416us           100
                                                 detach         1.01%     522.155us         1.01%     522.155us       5.222us     505.088us         0.94%     505.088us       5.051us           100
                                       aten::as_strided         0.44%     227.884us         0.44%     227.884us       1.139us       0.000us         0.00%       0.000us       0.000us           200
                                            aten::empty         3.20%       1.652ms         3.20%       1.652ms       3.304us       0.000us         0.00%       0.000us       0.000us           500
                                          aten::resize_         1.25%     646.711us         1.25%     646.711us       2.156us       0.000us         0.00%       0.000us       0.000us           300
                                       aten::empty_like         0.79%     407.768us         2.07%       1.067ms       5.334us       0.000us         0.00%       0.000us       0.000us           200
                                    aten::empty_strided         1.52%     785.788us         1.52%     785.788us       3.143us       0.000us         0.00%       0.000us       0.000us           250
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 51.590ms
Self CUDA time total: 53.609ms
```
ghstack-source-id: 133370215

Test Plan: buck test mode/dev-nosan caffe2/test/:quantization

Reviewed By: raghuramank100

Differential Revision: D29566512

fbshipit-source-id: 1aefca51f99949da7334bcfe504848275c9f952c
2021-07-10 19:43:02 -07:00
Supriya Rao
99848c7269 [quant] Add tensor_qparam variant to fake_quantize_per_tensor (#61317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61317

Add an overload to fake_quantize_per_tensor that accepts scale/zero_point as tensor inputs. The reasons to do this are (a usage sketch follows the list):

* required for the fused observer + fake_quant operator on GPU, where the scale/zero_point will be calculated by the observer on device. Passing tensor inputs enables us to directly access the scale/zero-point values in the CUDA kernel, avoiding extra copies/mallocs
* enables us to pass in float as the scale dtype and int32 as the zero_point dtype (which is consistent with what the quantize call actually uses) https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/affine_quantizer_base.cpp#L52-L53
* overload consistent with `quantize_per_tensor.tensor_qparams`
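
A usage sketch of the new overload (values illustrative; scale is an fp32 tensor and zero_point an int32 tensor, per the bullets above):

```
import torch

x = torch.randn(4)
scale = torch.tensor(0.1)                        # float32 tensor qparam
zero_point = torch.tensor(0, dtype=torch.int32)  # int32 tensor qparam
y = torch.fake_quantize_per_tensor_affine(x, scale, zero_point, 0, 255)
```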
ghstack-source-id: 133370216

Test Plan:
buck test mode/dev-nosan caffe2/test/:quantization -- test_backward_per_tensor_cachemask
buck test mode/dev-nosan caffe2/test/:quantization -- test_forward_per_tensor_cachemask

Reviewed By: raghuramank100

Differential Revision: D29552727

fbshipit-source-id: cbb9af40fc575ad27a29c646b760d5ee52cc923d
2021-07-10 19:41:55 -07:00
Supriya Rao
a4d86e0d53 [quant][fx][perf] improve runtime of prepare step for large models (#61132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61132

For large models, the insert_observers_for_model function was taking a long time, especially when not all of the nodes are being quantized.

For example, for a model with 21000 nodes of which only ~50 are being quantized, the breakdown of prepare_fx vs convert_fx was:

prepare_fx 979 seconds
convert_fx 9 seconds

The main reason was that we were doing unnecessary computation for all nodes in this function; this PR moves that work to where it is actually used.

After this PR:
prepare_fx 26 seconds
convert_fx 9 seconds

Test Plan:
Existing tests

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D29522303

fbshipit-source-id: 7ce12582a859d02ff763abebf4a592d28e0764ca
2021-07-01 17:17:10 -07:00
Angela Yi
dabadd7e20 [quant] Added reset_min_max_vals() function to observers (#60883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60883

As per this [comment](https://github.com/pytorch/pytorch/pull/59964#discussion_r659064270), I created a `reset_min_max_vals()` function inside the observers, which will be called during input-weight equalization. This way the equalization code does not depend on the internal implementation of the observers.
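
A small usage sketch (observer choice illustrative):

```
import torch
from torch.quantization.observer import MinMaxObserver

obs = MinMaxObserver()
obs(torch.randn(4, 4))      # populates min_val / max_val
obs.reset_min_max_vals()    # clears them back to the uninitialized state
```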

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29491848

fbshipit-source-id: 00e91959ceb3b4f3688175a1a7ba11823e929b2f
2021-06-30 14:22:08 -07:00
Angela Yi
1a0195db49 [quant] Input-Weight Equalization - support for LinearReLU layers (#60653)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60653

Special casing was needed to get the weight attribute in the linear layers of fused LinearReLU layers.

Initial Model: `x -> linear1 -> relu`

After fusion: `x -> linearRelu`

After prepare: `x -> input_quant_obs -> input_eq_obs1 -> linearRelu -> output_quant_obs1`

After equalization functions: `x -> mul -> input_quant_obs (scaled) -> linearRelu -> output_quant_obs`

After convert: `x -> mul -> quantize_per_tensor -> quantized::linearRelu -> dequantize`

More step-throughs here: https://fb.quip.com/A9J3AsBxkykR

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Original model:
```
LinearReluModel(
  (fc): Linear(in_features=5, out_features=5, bias=True)
  (relu): ReLU()
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```

Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %fc_input_scale_0 : [#users=1] = get_attr[target=fc_input_scale_0]
    %fc_input_zero_point_0 : [#users=1] = get_attr[target=fc_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %fc_input_scale_0, %fc_input_zero_point_0, torch.quint8), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%fc,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29406999

fbshipit-source-id: add38e8e7fb84a241c3b10bfb8451b50103effd4
2021-06-30 14:22:06 -07:00
Vasiliy Kuznetsov
5576c7bdd1 ns for fx: initial support for int8 shadows fp32 (#60419)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60419

Adds support in the NS for FX shadowed-activations pass for int8
modules shadowing fp32 modules. The difficulty here is that in order
to insert the dtype cast, we need the qparams of the input.

For the current PR, we only handle the easy cases where the previous
node is either a `quantize_per_tensor` call or an OSS quantized module.
A future PR can handle more complicated cases such as various functions.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_int8_shadows_fp32_simple
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29280050

fbshipit-source-id: 465257c9f82a34fa91b48ae8887355c68e00edc6
2021-06-30 08:08:46 -07:00
Angela Yi
9b94aa5356 [quant][fx][fix] Fused modules with object_type in qconfig (#60779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60779

When we do fusion, we replace certain modules (such as Linear + ReLU) with fused versions (such as LinearReLU) by calling `_fuse_fx` in prepare_fx. However, when we try to look up the fused module type in qconfig_dict, we cannot find a match anymore, since the qconfig dict contains the original module types. An example is here [N882873](https://fburl.com/anp/azenjx3v).

So we now update the qconfig_dict to map each fused module to the qconfig used for the modules that make it up. If those modules are not mapped to the same qconfig, we raise an error.
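
A sketch of the situation in the standard qconfig_dict format (module types illustrative):

```
import torch
import torch.nn as nn
import torch.nn.intrinsic as nni

qconfig = torch.quantization.default_qconfig
qconfig_dict = {'object_type': [
    (nn.Linear, qconfig),
    (nn.ReLU, qconfig),
]}
# After fusion, lookups happen on the fused type, so the dict is extended
# with an entry like (nni.LinearReLU, qconfig) derived from the entries
# above; differing qconfigs for nn.Linear and nn.ReLU raise an error.
```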

Test Plan:
`python test/test_quantization.py TestFuseFx.test_qconfig_fused_module`

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29406941

fbshipit-source-id: 74b5db89f4998aeb02b2bf7c37bf97326580c654
2021-06-28 15:22:22 -07:00
Angela Yi
dfb9c0bae8 [quant] Input-Weight Equalization - support for connected F.linear layer (#60272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60272

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Original model:
```
FunctionalLinear2Module(
  (linear1): Linear()
  (linear2): Linear()
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_w_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_0_equalization_process_0](args = (%linear1_w_activation_post_process_0,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0_equalization_process_0, %linear1_w_activation_post_process_0_equalization_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    %linear_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0_equalization_process_0](args = (%linear_activation_post_process_0,), kwargs = {})
    %linear2_w : [#users=1] = get_attr[target=linear2.w]
    %linear2_w_activation_post_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_0](args = (%linear2_w,), kwargs = {})
    %linear2_w_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_0_equalization_process_0](args = (%linear2_w_activation_post_process_0,), kwargs = {})
    %linear2_b : [#users=1] = get_attr[target=linear2.b]
    %linear_1 : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%linear_activation_post_process_0_equalization_process_0, %linear2_w_activation_post_process_0_equalization_process_0), kwargs = {bias: %linear2_b})
    %linear_1_activation_post_process_0 : [#users=1] = call_module[target=linear_1_activation_post_process_0](args = (%linear_1,), kwargs = {})
    return linear_1_activation_post_process_0
```

Graph after equalization steps:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    %linear2_w : [#users=1] = get_attr[target=linear2.w]
    %linear2_w_activation_post_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_0](args = (%linear2_w,), kwargs = {})
    %linear2_b : [#users=1] = get_attr[target=linear2.b]
    %linear_1 : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%linear_activation_post_process_0, %linear2_w_activation_post_process_0), kwargs = {bias: %linear2_b})
    %linear_1_activation_post_process_0 : [#users=1] = call_module[target=linear_1_activation_post_process_0](args = (%linear_1,), kwargs = {})
    return linear_1_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %linear1_input_scale_0 : [#users=1] = get_attr[target=linear1_input_scale_0]
    %linear1_input_zero_point_0 : [#users=1] = get_attr[target=linear1_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear1_input_scale_0, %linear1_input_zero_point_0, torch.quint8), kwargs = {})
    %linear1_packed_weight_0 : [#users=1] = get_attr[target=linear1_packed_weight_0]
    %linear1_scale_0 : [#users=1] = get_attr[target=linear1_scale_0]
    %linear1_zero_point_0 : [#users=1] = get_attr[target=linear1_zero_point_0]
    %linear : [#users=1] = call_function[target=torch.ops.quantized.linear](args = (%quantize_per_tensor, %linear1_packed_weight_0, %linear1_scale_0, %linear1_zero_point_0), kwargs = {})
    %linear2_packed_weight_0 : [#users=1] = get_attr[target=linear2_packed_weight_0]
    %linear2_scale_0 : [#users=1] = get_attr[target=linear2_scale_0]
    %linear2_zero_point_0 : [#users=1] = get_attr[target=linear2_zero_point_0]
    %linear_1 : [#users=1] = call_function[target=torch.ops.quantized.linear](args = (%linear, %linear2_packed_weight_0, %linear2_scale_0, %linear2_zero_point_0), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear_1,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29267218

fbshipit-source-id: 6b97bed1a307f1d0b1f5efcbecf41f35418242f7
2021-06-28 10:44:27 -07:00
Angela Yi
ddf2ce03bb [quant] Input-Weight Equalization - support for connected linear layers (#60034)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60034

Added support for equalizing models with connected linear layers.
To account for connected linear layers, we additionally multiply the
previous weight values (row-wise) by the next equalization scale and
remove the input equalization observer between the two linear layers.
We also scale the bias by the next equalization scale (a sketch follows
the flow below). The math is shown here: https://fb.quip.com/fK8rA9aRM4ca.

Original Model: `x -> linear1 -> linear2`
After `prepare_fx`: `x -> InpEqObs -> InpQuantObs -> linear1 -> OutQuantObs -> InpEqObs -> linear2`
After equalization: `x -> mul -> InpQuantObs -> linear1 -> OutQuantObs -> linear2`
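
A sketch of the weight/bias scaling between two connected linear layers (function and names illustrative; `s_next` is the next layer's input equalization scale, one entry per output channel of the previous layer):

```
import torch

def fold_next_scale(weight, bias, s_next):
    # Multiply the previous layer's weight row-wise by the next
    # equalization scale, and scale the bias the same way, so the
    # input equalization observer between the layers can be removed.
    return weight * s_next.reshape(-1, 1), bias * s_next
```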

Test Plan:
`python test/test_quantization.py TestEqualizeFx.test_input_weight_equalization_convert`

Original Model:
```
Linear2Module(
  (linear1): Linear(in_features=2, out_features=2, bias=True)
  (linear2): Linear(in_features=2, out_features=2, bias=True)
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %linear1 : [#users=1] = call_module[target=linear1](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %linear1_activation_post_process_0 : [#users=1] = call_module[target=linear1_activation_post_process_0](args = (%linear1,), kwargs = {})
    %linear1_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear1_activation_post_process_0_equalization_process_0](args = (%linear1_activation_post_process_0,), kwargs = {})
    %linear2 : [#users=1] = call_module[target=linear2](args = (%linear1_activation_post_process_0_equalization_process_0,), kwargs = {})
    %linear2_activation_post_process_0 : [#users=1] = call_module[target=linear2_activation_post_process_0](args = (%linear2,), kwargs = {})
    return linear2_activation_post_process_0
```

Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0_equalization_process_0_scale : [#users=1] = get_attr[target=x_activation_post_process_0_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_activation_post_process_0_equalization_process_0_scale), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %linear1 : [#users=1] = call_module[target=linear1](args = (%x_activation_post_process_0,), kwargs = {})
    %linear1_activation_post_process_0 : [#users=1] = call_module[target=linear1_activation_post_process_0](args = (%linear1,), kwargs = {})
    %linear2 : [#users=1] = call_module[target=linear2](args = (%linear1_activation_post_process_0,), kwargs = {})
    %linear2_activation_post_process_0 : [#users=1] = call_module[target=linear2_activation_post_process_0](args = (%linear2,), kwargs = {})
    return linear2_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0_equalization_process_0_scale : [#users=1] = get_attr[target=x_activation_post_process_0_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_activation_post_process_0_equalization_process_0_scale), kwargs = {})
    %linear1_input_scale_0 : [#users=1] = get_attr[target=linear1_input_scale_0]
    %linear1_input_zero_point_0 : [#users=1] = get_attr[target=linear1_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear1_input_scale_0, %linear1_input_zero_point_0, torch.quint8), kwargs = {})
    %linear1 : [#users=1] = call_module[target=linear1](args = (%quantize_per_tensor,), kwargs = {})
    %linear2 : [#users=1] = call_module[target=linear2](args = (%linear1,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear2,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29204347

fbshipit-source-id: 6bb9e25e2468f50df523885ded2edc731f002ac1
2021-06-28 10:44:25 -07:00
Angela Yi
7917318917 [quant] Input-Weight Equalization - support for F.linear layers (#59964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59964

Input-Weight Equalization support for functional layers

Test Plan:
`python test/test_quantization.py TestEqualizeFx.test_input_weight_equalization_convert`

Original model:
```
FunctionalLinearModule(
  (linear1): Linear()
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_equalization_process_0 : [#users=1] = call_module[target=linear1_w_equalization_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_00](args = (%linear1_w_equalization_process_0,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```

Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%mul,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_equalization_process_0 : [#users=1] = call_module[target=linear1_w_equalization_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_00](args = (%linear1_w_equalization_process_0,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %linear1_input_scale_0 : [#users=1] = get_attr[target=linear1_input_scale_0]
    %linear1_input_zero_point_0 : [#users=1] = get_attr[target=linear1_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear1_input_scale_0, %linear1_input_zero_point_0, torch.quint8), kwargs = {})
    %linear1_packed_weight_0 : [#users=1] = get_attr[target=linear1_packed_weight_0]
    %linear1_scale_0 : [#users=1] = get_attr[target=linear1_scale_0]
    %linear1_zero_point_0 : [#users=1] = get_attr[target=linear1_zero_point_0]
    %linear : [#users=1] = call_function[target=torch.ops.quantized.linear](args = (%quantize_per_tensor, %linear1_packed_weight_0, %linear1_scale_0, %linear1_zero_point_0), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29135459

fbshipit-source-id: 1e69bfbb82a0c89538e55b64968effd0b11b2fde
2021-06-28 10:44:24 -07:00
angelayi
e13a9587b4 Revert "Revert D29135358: [quant] Input-Weight Equaliaztion - convert modifications" (#60646)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60646

This reverts commit e60f9cfc58.

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D29361191

Pulled By: angelayi

fbshipit-source-id: 275d8691d8e47da4ab80bb21b51d77ec25a0f714
2021-06-25 15:37:05 -07:00
Vasiliy Kuznetsov
7fc4e67771 ns for fx: fix shadow logger error for resnet18 (#60559)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60559

Adds `resnet18` to the integration test and fixes the error so that
creating the shadow model works.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_resnet18
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29336236

fbshipit-source-id: 9425aa096162d80ef3a7c98144b2301cfbccc1ea
2021-06-24 13:42:18 -07:00
Vasiliy Kuznetsov
4ddb2b43b7 ns for fx: expose function to add comparisons between logged values (#60311)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60311

Adds a user-facing utility function to the FX Numeric Suite Core APIs
for comparing the values extracted by the loggers to each other.
This is needed for any kind of analysis, so it is useful to
provide an example implementation.

Example:

```
# code
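# imports needed to run this example; module paths assumed for this
# era of the codebase
import copy
import torch
import torch.nn as nn
from torch.quantization._numeric_suite_fx import (
    extract_weights, extend_logger_results_with_comparison)
from torch.quantization.ns.utils import compute_sqnr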

m = nn.Sequential(nn.Conv2d(1, 1, 1), nn.Conv2d(1, 1, 1)).eval()
qconfig_dict = {'': torch.quantization.default_qconfig}
mp = torch.quantization.quantize_fx.prepare_fx(m, qconfig_dict)
mq = torch.quantization.quantize_fx.convert_fx(copy.deepcopy(mp))
results = extract_weights('fp32', mp, 'int8', mq)
extend_logger_results_with_comparison(
    results, 'fp32', 'int8', compute_sqnr, 'sqnr_int8_vs_fp32')

print(results)

# results

{
  '_1': {'weight': {
    'fp32': [
      {'type': 'weight', 'values': [tensor([[[[-0.3284]]]])], 'prev_node_name': '_1', 'prev_node_target_type': "<class 'torch.nn.modules.conv.Conv2d'>", 'ref_node_name': '_1', 'index_within_arg': 0, 'index_of_arg': 0}
    ],
    'int8': [
      {'type': 'weight', 'values': [tensor([[[[-0.3297]]]], size=(1, 1, 1, 1), dtype=torch.qint8,
       quantization_scheme=torch.per_tensor_affine, scale=0.002575645223259926,
       zero_point=0)], 'prev_node_name': '_1', 'prev_node_target_type': "<class 'torch.nn.quantized.modules.conv.Conv2d'>", 'ref_node_name': '_1', 'index_within_arg': 0, 'index_of_arg': 0, 'sqnr_int8_vs_fp32': [tensor(48.1308)]}
    ]
  }},
  '_0': {'weight': {
    'fp32': [{'type': 'weight', 'values': [tensor([[[[0.5205]]]])], 'prev_node_name': '_0', 'prev_node_target_type': "<class 'torch.nn.modules.conv.Conv2d'>", 'ref_node_name': '_0', 'index_within_arg': 0, 'index_of_arg': 0}],
    'int8': [{'type': 'weight', 'values': [tensor([[[[0.5184]]]], size=(1, 1, 1, 1), dtype=torch.qint8,
       quantization_scheme=torch.per_tensor_affine, scale=0.004082232713699341,
       zero_point=0)], 'prev_node_name': '_0', 'prev_node_target_type': "<class 'torch.nn.quantized.modules.conv.Conv2d'>", 'ref_node_name': '_0', 'index_within_arg': 0, 'index_of_arg': 0, 'sqnr_int8_vs_fp32': [tensor(48.1309)]}]
  }}
}

```

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_extend_logger_results_with_comparison
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29244715

fbshipit-source-id: a5547b449ea54e046c752119559be49bd738beea
2021-06-24 13:42:16 -07:00
Vasiliy Kuznetsov
31fe1c1323 ns for fx: rekey results by model node names (#60305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60305

Adjusts the NS for FX weight and activation extraction APIs
to require a model name, and rekeys the results of these APIs
to use the node names of the specified model as layer keys.

For example, before

```
# API call
results = ns.extract_logger_info(
  model_a, model_b, ns.OutputLogger)

# results
{'base_op_1_0': {'node_output':
  {'model_a': [{'ref_node_name': 'linear1', ...}]}}}
```

and after

```
# API call
results = ns.extract_logger_info(
  model_a, model_b, ns.OutputLogger, 'model_b_name')

# results
# note: instead of `base_op_1_0`, the layer is named `linear1`
{'linear1': {'node_output':
  {'model_a': [{'ref_node_name': 'linear1', ...}]}}}
```

Note: we cannot use these names while collecting data because
node names are not guaranteed to be consistent across graphs.
This is why we only rekey as the very last step.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_layer_names
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29243045

fbshipit-source-id: d39ecdfdd18b07291e3ecefed2ede287b100b7d0
2021-06-24 13:41:01 -07:00
lezcano
4e347f1242 [docs] Fix backticks in docs (#60474)
Summary:
There is a very common error when writing docs: one forgets to write a matching `` ` ``, and something like ``:attr:`x`` is rendered in the docs. This PR fixes most (all?) of these errors (and a few others).

I found these by running ``grep -r ">[^#<][^<]*\`"`` on the `docs/build/html/generated` folder. The regex finds an HTML tag that does not start with `#` (as python comments in example code may contain backticks) and that contains a backtick in the rendered HTML.

This regex has not given any false positives in the current codebase, so I am inclined to suggest that we add this check to the CI. Would this be possible / reasonable / easy to do, malfet?
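
A Python equivalent of that grep, as a sketch of the proposed CI check
(paths and pattern taken from the description above):

```
import pathlib
import re

# an HTML tag whose text does not start with '#' and contains a backtick,
# i.e. an unbalanced-backtick artifact leaking into the rendered docs
pattern = re.compile(r">[^#<][^<]*`")

for path in pathlib.Path("docs/build/html/generated").rglob("*.html"):
    if pattern.search(path.read_text(encoding="utf-8")):
        print(f"possible unbalanced backtick: {path}")
```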

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60474

Reviewed By: mrshenli

Differential Revision: D29309633

Pulled By: albanD

fbshipit-source-id: 9621e0e9f87590cea060dd084fa367442b6bd046
2021-06-24 06:27:41 -07:00
Supriya Rao
1120a1b92e [quant][fx][fix] QAT with object_type in qconfig (#60555)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60555

When we do QAT, we swap the FP32 modules with their quantized counterparts by calling `qat_swap_modules` in prepare.
However, when we then try to look up the swapped module type in qconfig_dict, we can no longer find a match, since the qconfig dict contains the original
module type.

In this PR we update the qconfig_dict to include the module types swapped for QAT (a sketch of the update follows).
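
A minimal sketch of the update (helper and mapping names hypothetical;
`object_type` entries are `(module type, qconfig)` pairs):

```
import torch.nn as nn
import torch.nn.qat as nnqat

# mapping from float module types to their QAT counterparts (subset)
FLOAT_TO_QAT = {nn.Linear: nnqat.Linear, nn.Conv2d: nnqat.Conv2d}

def add_qat_entries(qconfig_dict):
    # for every (float_type, qconfig) entry, add a matching entry for
    # the QAT-swapped type so lookups still succeed after the swap
    entries = qconfig_dict.get("object_type", [])
    extra = [(FLOAT_TO_QAT[t], qc) for t, qc in entries if t in FLOAT_TO_QAT]
    qconfig_dict["object_type"] = entries + extra
    return qconfig_dict
```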

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qconfig_qat_module_type

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29337036

fbshipit-source-id: 60212eec3ee252a2445c1b58874cb36048c9f7dd
2021-06-23 15:55:25 -07:00
Rong Rong (AI Infra)
e60f9cfc58 Revert D29135358: [quant] Input-Weight Equalization - convert modifications
Test Plan: revert-hammer

Differential Revision:
D29135358 (3de79b7757)

Original commit changeset: 2d0005672904

fbshipit-source-id: cac30c1202ebbce4f22e50ed920340c7b4c6849f
2021-06-23 11:23:24 -07:00
Angela Yi
3de79b7757 [quant] Input-Weight Equalization - convert modifications (#59963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59963

When converting, before quantizing the nodes, we call
`update_obs_for_equalization()` and `convert_eq_obs()`.

`update_obs_for_equalization`:
1. For each InputEqualizationObserver, we find the corresponding
WeightEqualizationObserver.
2. For nn.Linear layers, we create an instance of the
WeightEqualizationObserver and run it forward on the given weights.
3. Calculate the equalization scale between the
InputEqualizationObserver and WeightEqualizationObserver (see the
sketch after this list).
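
A minimal sketch of the scale computation in step 3, assuming
per-channel min/max tensors recorded by the two observers (the exact
formula in the implementation may differ):

```
import torch

def calculate_equalization_scale(x_min, x_max, w_min, w_max, eps=1e-9):
    # per-input-channel dynamic ranges of the activation and of the
    # weight columns consuming that channel
    x_range = (x_max - x_min).clamp(min=eps)
    w_range = (w_max - w_min).clamp(min=eps)
    # multiplying the input by s and the weight columns by 1/s balances
    # the two ranges when s = sqrt(w_range / x_range)
    return torch.sqrt(w_range / x_range)
```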

`convert_eq_obs`:
For every InputEqualizationObserver, we will do the following:
1. Create a node (ex. `x0_activation_post_process_scale`) containing the
equalization scale constant.
2. Create another node containing a `mul` operator multiplying the
equalization scale and the input.
3. Remove the current InputEqualizationObserver node, and replace it
with the `mul` node.

For every WeightEqualizationObserver, we will do the following:
1. Get the next equalization scale (we may need this for equalizing
connected linear layers).
2. Scale the weights by multiplying them by the reciprocal of the
current equalization scale and by the next equalization scale (a
sketch follows below).

Currently, this supports models with `nn.Linear` layers, but does not
yet support connected linear layers.
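
A minimal sketch of the weight scaling in step 2 of `convert_eq_obs`,
assuming a weight of shape `(out_features, in_features)`, the current
scale indexed by input channel and the next scale by output channel:

```
import torch

def scale_weight(weight, equalization_scale, next_equalization_scale=None):
    # divide each column by the current scale: the input was multiplied
    # by it, so this keeps the layer's output unchanged
    scaled = weight / equalization_scale.reshape(1, -1)
    # for connected linear layers, fold the next layer's input scaling
    # into this layer's output channels
    if next_equalization_scale is not None:
        scaled = scaled * next_equalization_scale.reshape(-1, 1)
    return scaled
```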

Test Plan:
`python test/test_quantization.py
TestEqualizeFx.test_input_weight_equalization_convert`

Original Model:
```
LinearModule(
  (linear): Linear(in_features=2, out_features=2, bias=True)
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%x_activation_post_process_0,), kwargs = {})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```

Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%mul,), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%x_activation_post_process_0,), kwargs = {})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %linear_input_scale_0 : [#users=1] = get_attr[target=linear_input_scale_0]
    %linear_input_zero_point_0 : [#users=1] = get_attr[target=linear_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear_input_scale_0, %linear_input_zero_point_0, torch.quint8), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29135358

fbshipit-source-id: 2d00056729041318463de61841483490b6bfeee5
2021-06-22 20:43:30 -07:00
Supriya Rao
4887c6e401 [quant] avoid resize calls in observer/fake_quant (#60386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60386

During QAT we sometimes encounter errors with scripted models:
`RuntimeError: cannot resize variables that require grad`

For per-tensor cases we don't need to resize some buffers, so this PR removes the extra resize ops where applicable (a sketch of the pattern follows).
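
A generic sketch of the pattern (not the exact patch): per-tensor
qparams always have a single element, so the buffers can be updated
with an in-place copy instead of a resize-then-copy.

```
import torch

class PerTensorQParams(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("scale", torch.tensor([1.0]))
        self.register_buffer("zero_point", torch.tensor([0]))

    def update(self, new_scale, new_zero_point):
        # copy_ keeps the buffers' fixed (1,) shape; resize_ would raise
        # "cannot resize variables that require grad" in scripted QAT models
        self.scale.copy_(new_scale)
        self.zero_point.copy_(new_zero_point)
```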

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29271905

fbshipit-source-id: 01a484a9559a3a4180490f9476d0cd3044ba0d1b
2021-06-22 17:41:43 -07:00
Jerry Zhang
4a3eea9a6a [quant][graphmode][fx] Produce reference linear module in convert (#60152)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60152

Test Plan:
python test/test_quantization.py TestQuantizeFx

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29188263

fbshipit-source-id: f7bbbef5d4d747eadf7a627a4e77a5ec9bb0bc94
2021-06-20 20:08:12 -07:00
Jerry Zhang
2293ab4e53 [quant][graphmode][fx] Refactor convert for linear to use get_static_module_mapping and get_dynamic_module_mapping (#60151)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60151

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29188264

fbshipit-source-id: d2b77ffcf4b7446fc6c43248e43218092d2a6aea
2021-06-20 19:41:16 -07:00
Jerry Zhang
47d727fe1b [quant][graphmode][fx] Produce conv reference static quant modules (#60138)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60138

Test Plan:
python test/test_quantization.py TestQuantizeFx

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29184791

fbshipit-source-id: 971a40012dbba0cf687c62a3a4af9358513c253b
2021-06-20 19:25:45 -07:00
Jerry Zhang
a029422cae [quant][graphmode][fx][refactor] Change the env map to add dtype as a key (#60054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60054

Previously, env in convert was `Dict[str, Tuple[Node, torch.dtype]]`; that is, at a given time each node could only have one dtype.
This causes a problem for the following case:
```
# original model
class M(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(1, 1, 1)

        def forward(self, x):
            x = self.conv(x)
            x1 = x.expand_as(x)
            x2 = torch.add(x, x1)
            return x2

# after prepare: observers inserted
def forward(self, x):
    x = self.activation_post_process_0(x)
    x = self.conv(x)
    x = self.activation_post_process_1(x)
    x1 = x.expand_as(x)
    x1 = self.activation_post_process_2(x1)
    x2 = torch.add(x, x1)
    x2 = self.activation_post_process_3(x2)
    return x2

# after convert: quantized graph, illustrating the problem
def forward(self, x):
    x = torch.quantize_per_tensor(x, ...)
    x = self.conv(x)  # quantized conv
    x = torch.dequantize(x)
    x1 = x.expand_as(x)
    x1 = torch.quantize_per_tensor(x1, ...)
    # Error: x is dequantized
    x2 = torch.ops.quantized.add(x, x1)
    return x2
```

Currently, env maps each node name of the observed graph to a Node in the quantized graph. The problem is that the quantized conv is followed by two operators: one expecting a float input (`expand_as`) and one expecting a quantized input (quantized add). In the quantized graph, `expand_as` should ideally consume the dequantized output, and the quantized add should consume the quantized output:

```
quantized_conv - dequantize - expand_as
  \ ------- quantized_add
```

But with the current env, each node must be either quantized or not quantized. Therefore we change env to include the dtype as well:

`env: Dict[str, Dict[torch.dtype, Node]]`, e.g. `{'x': {torch.float: dequantized_node, torch.quint8: quantized_node}}`

When we load from env, we also provide the dtype of the Node that we want to load. A separate pass can figure out this information for each node.
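
A minimal sketch of the dtype-aware environment (helper name hypothetical):

```
from typing import Dict

import torch
from torch.fx import Node

# each observed-graph node name maps to one quantized-graph Node per dtype
Env = Dict[str, Dict[torch.dtype, Node]]

def load_arg(env: Env, node_name: str, dtype: torch.dtype) -> Node:
    # expand_as requests torch.float (the dequantized copy), while
    # quantized add requests torch.quint8 (the quantized copy)
    return env[node_name][dtype]
```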

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29149408

fbshipit-source-id: c9e4b7d65444ab6a6f573929bae1db5037629892
2021-06-18 13:31:43 -07:00
Vasiliy Kuznetsov
5a45103139 ns for fx: add API usage logging (#60103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60103

Adds internal logging for NS for FX API usage.

Test Plan: CI

Reviewed By: jerryzh168

Differential Revision: D29166710

fbshipit-source-id: 2a1bf2f6038b0c6c5945b57b2db2de25c585a04a
2021-06-18 10:25:59 -07:00
Philip Meier
d5988c5eca remove unused type: ignore directives (#60006)
Summary:
During development it is common practice to put `type: ignore` comments on lines that are correct but that `mypy` fails to recognize as such. This often stems from the fact that the `mypy` version in use wasn't able to handle the pattern in question.

With every new release `mypy` gets better at handling complex code. In addition to fixing all the previously accepted but now failing patterns, we should also revisit all `type: ignore` comments to see if they are still needed. Fortunately, we don't need to do this manually: by adding `warn_unused_ignores = True` to the configuration, `mypy` will error out when it encounters a `type: ignore` that is no longer needed.
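
Concretely, the setting goes into the project's mypy configuration:

```
# mypy.ini (or the [mypy] section of setup.cfg)
[mypy]
warn_unused_ignores = True
```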

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60006

Reviewed By: jbschlosser, malfet

Differential Revision: D29133237

Pulled By: albanD

fbshipit-source-id: 41e82edc5cd5affa7ccedad044b59b94dad4425a
2021-06-18 07:23:31 -07:00
Angela Yi
c0b7c59e55 [quant] Equalization Observer modifications (#59953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59953

The following modifications were made to the equalization
observers due to design changes:
- [InputEqualizationObserver] Replaced `calculate_qparams()` with
`calculate_scaled_minmax()`, since we need to return the scaled
min/max values to update the following input quantization observer
(see the sketch after this list)
- [WeightEqualizationObserver] We no longer need a row observer since
this will be taken care of by the following weight quantization observer
- [WeightEqualizationObserver] Following the previous comment, we no
longer need to calculate the scaled qparam values. Instead, we will use
the equalization scale to later scale the weights and the qparams will
be taken care of by the weight quantization observer.
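
A minimal sketch of what `calculate_scaled_minmax()` returns, assuming
per-channel input min/max and an equalization scale (the exact
reduction in the implementation may differ):

```
import torch

def calculate_scaled_minmax(input_min, input_max, equalization_scale):
    # the input is multiplied by the equalization scale before reaching
    # the quantization observer, so that observer's range must cover
    # the scaled input; reduce to per-tensor min/max
    scaled_min = torch.min(input_min * equalization_scale)
    scaled_max = torch.max(input_max * equalization_scale)
    return scaled_min, scaled_max
```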

Test Plan:
`python test/test_quantization.py
TestEqualizeFx.test_input_weight_eq_observer`

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29135332

fbshipit-source-id: be7e468273c8b62fc183b1e1ec50f6bd6d8cf831
2021-06-16 22:32:30 -07:00
Angela Yi
45c31cabb5 [quant] Input Weight Equalization - prepare modifications (#59747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59747

Modifies prepare_fx for input-weight equalization. If the current
node is being equalized (i.e. there exists an EqualizationQConfig for it), then the
EqualizationObserver is inserted before its quantization observer.

For a singular linear layer, the general flow looks like:
Original graph: `x0 -> linear -> x1`, `w -> linear`
After prepare: `x0 -> InpEqObs -> MinMaxObs -> linear -> MinMaxObs -> x1`,
  `w -> WeightEqObs -> MinMaxObs -> linear`

For two connected linear layers, the general flow looks like:
Original graph: `x0 -> linear1 -> linear2 -> x1`,
  `w1 -> linear1`, `w2 -> linear2`
After prepare: `x0 -> InpEqObs -> MinMaxObs -> linear1 -> MinMaxObs -> InpEqObs -> linear2 -> MinMaxObs -> x1`
  `w1 -> WeightEqObs -> MinMaxObs -> linear1`, `w2 -> WeightEqObs -> MinMaxObs -> linear2`

Test Plan:
`python test/test_quantization.py
TestEqualizeFx.test_input_equalization_prepare`

Original model with one `nn.Linear` layer
```
LinearModule(
  (linear): Linear(in_features=1, out_features=1, bias=True)
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%x_activation_post_process_0,), kwargs = {})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```
--------------------------------------

Original model with two connected functional linear layers
```
FunctionalLinearModule(
  (linear1): Linear()
  (linear2): Linear()
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_equalization_process_0 : [#users=1] = call_module[target=linear1_w_equalization_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_00](args = (%linear1_w_equalization_process_0,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    %linear_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0_equalization_process_0](args = (%linear_activation_post_process_0,), kwargs = {})
    %linear2_w : [#users=1] = get_attr[target=linear2.w]
    %linear2_w_equalization_process_0 : [#users=1] = call_module[target=linear2_w_equalization_process_0](args = (%linear2_w,), kwargs = {})
    %linear2_w_activation_post_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_00](args = (%linear2_w_equalization_process_0,), kwargs = {})
    %linear2_b : [#users=1] = get_attr[target=linear2.b]
    %linear_1 : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%linear_activation_post_process_0_equalization_process_0, %linear2_w_activation_post_process_0), kwargs = {bias: %linear2_b})
    %linear_1_activation_post_process_0 : [#users=1] = call_module[target=linear_1_activation_post_process_0](args = (%linear_1,), kwargs = {})
    return linear_1_activation_post_process_0
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29135316

fbshipit-source-id: 91697e805ede254dbb2a42ee4c23eb1c1c64590e
2021-06-16 22:32:28 -07:00
Angela Yi
7ce74f3339 [quant] EqualizationQConfig to distinguish input/output activations (#59739)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59739

Created an EqualizationQConfig specifically for equalization.
It inherits from QConfig and is used to distinguish between inserting
an input observer and inserting an output observer. Since the output observer
field is included in the EqualizationQConfig, we no longer need an
output observer field in the _InputEqualizationObserver.
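
A minimal sketch of the config's shape, mirroring how QConfig holds
observer factories (field names assumed):

```
from collections import namedtuple

# a dedicated type lets prepare distinguish equalization entries from
# ordinary quantization entries in the qconfig dict; the fields hold
# observer factories (e.g. partials of the equalization observers)
EqualizationQConfig = namedtuple(
    "EqualizationQConfig", ["input_activation", "weight"]
)
```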

Test Plan:
compiles

Imported from OSS

Reviewed By: ezyang

Differential Revision: D29135298

fbshipit-source-id: 3dde9c029c291467ff0a0845f0fc9c44573fc6f6
2021-06-16 22:31:18 -07:00
Jerry Zhang
a344b09db2 [quant][fx][graphmode] Remove Quantizer class (#59606)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59606

Test Plan:
python test/test_quantization.py TestQuantizeFx

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28951432

fbshipit-source-id: 3301f7200a4c7166673c27f9ac7ff559f1e6935d
2021-06-15 21:54:57 -07:00
Supriya Rao
864d129bae [quant][fx] Remove extra q-dq for weight bias in normalization ops (#59882)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59882

Currently for normalization ops, the weight and bias arguments are treated as activation inputs which require observers.
This results in adding extra quant-dequant ops for the weight and bias inputs.

This PR adds support for skipping observation of the weight/bias inputs of norm operators, thus removing the redundant q-dq ops.

Quantized graph with F.layer_norm
Before this PR
```
def forward(self, x):
    _input_scale_0 = self._input_scale_0
    _input_zero_point_0 = self._input_zero_point_0
    quantize_per_tensor = torch.quantize_per_tensor(x, _input_scale_0, _input_zero_point_0, torch.quint8);  x = _input_scale_0 = _input_zero_point_0 = None
    scale = self.scale
    _input_scale_1 = self._input_scale_1
    _input_zero_point_1 = self._input_zero_point_1
    quantize_per_tensor_1 = torch.quantize_per_tensor(scale, _input_scale_1, _input_zero_point_1, torch.quint8);  scale = _input_scale_1 = _input_zero_point_1 = None
    bias = self.bias
    _input_scale_2 = self._input_scale_2
    _input_zero_point_2 = self._input_zero_point_2
    quantize_per_tensor_2 = torch.quantize_per_tensor(bias, _input_scale_2, _input_zero_point_2, torch.quint8);  bias = _input_scale_2 = _input_zero_point_2 = None
    _scale_0 = self._scale_0
    _zero_point_0 = self._zero_point_0
    dequantize = quantize_per_tensor_1.dequantize();  quantize_per_tensor_1 = None
    dequantize_1 = quantize_per_tensor_2.dequantize();  quantize_per_tensor_2 = None
    layer_norm = torch.ops.quantized.layer_norm(quantize_per_tensor, [2, 5, 5], weight = dequantize, bias = dequantize_1, eps = 1e-05, output_scale = _scale_0, output_zero_point = _zero_point_0);  quantize_per_tensor = dequantize = dequantize_1 = _scale_0 = _zero_point_0 = None
    dequantize_2 = layer_norm.dequantize();  layer_norm = None
    return dequantize_2
```
After
```
def forward(self, x):
    _input_scale_0 = self._input_scale_0
    _input_zero_point_0 = self._input_zero_point_0
    quantize_per_tensor = torch.quantize_per_tensor(x, _input_scale_0, _input_zero_point_0, torch.quint8);  x = _input_scale_0 = _input_zero_point_0 = None
    scale = self.scale
    bias = self.bias
    _scale_0 = self._scale_0
    _zero_point_0 = self._zero_point_0
    layer_norm = torch.ops.quantized.layer_norm(quantize_per_tensor, [2, 5, 5], weight = scale, bias = bias, eps = 1e-05, output_scale = _scale_0, output_zero_point = _zero_point_0);  quantize_per_tensor = scale = bias = _scale_0 = _zero_point_0 = None
    dequantize = layer_norm.dequantize();  layer_norm = None
    return dequantize
```

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_norm_weight_bias

Imported from OSS

Reviewed By: HDCharles, ailzhang

Differential Revision: D29068203

fbshipit-source-id: 24b5c38bbea5fd355d34522bfa654c9db18607da
2021-06-11 16:22:36 -07:00
Vasiliy Kuznetsov
d75e99b709 fx quant: enable qconfig_dict to target function invocations by order (#59605)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59605

Enables targeting of individual function invocations by execution order.
For example, given a module such as

```
class M1(torch.nn.Module):
  def forward(self, x):
    x = torch.add(x, x)
    x = torch.add(x, x)
    return x

class M2(torch.nn.Module):
  def __init__(self):
    super().__init__()
    self.m1 = M1()

  def forward(self, x):
    x = self.m1(x)
    return x
```

We can now target the first add of `m1` with

```
qconfig_dict = {
  "module_name_function_order": ("m1", torch.add, 0, custom_qconfig),
}
```

Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_qconfig_module_name_function_order
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D28951077

fbshipit-source-id: 311d423724a31193d4fa4bbf3a712b46464b5a29
2021-06-11 08:53:40 -07:00
Vasiliy Kuznetsov
0099c25b85 fx quant: remove some dead code in observer insertion (redo) (#59799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59799

This is a redo of #58574; it was easier to create a new PR than to fix rebase
conflicts, as there have been a large number of refactors to the
underlying code.

Removes some code which was incorrectly added by #57519 but never
actually used for anything.

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29031955

fbshipit-source-id: f407d181070cb283382965952821e3647c705544
2021-06-10 12:57:09 -07:00
Kevin Zheng (FRL)
61965abad7 Move _PartialWrapper to module scope (#59660)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59660

Context https://github.com/pytorch/pytorch/issues/57352

Test Plan: Pytorch CI tests

Reviewed By: vkuzo

Differential Revision: D28972991

fbshipit-source-id: efc9dd3e90e18e1cdf27d5ef0f168abd8169bc42
2021-06-09 11:55:04 -07:00
Jerry Zhang
7dac2987ce [quant][eager][fix] Fix a typo in convert function in eager mode quantization (#59571)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59571

Test Plan:
python test/test_quantization.py TestPostTrainingStatic.test_custom_module_class

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28938355

fbshipit-source-id: 566daeb07d616ae40e52754d3d4581f75f248f04
2021-06-08 10:24:22 -07:00
Angela Yi
cc03ea2c47 [quant] Implemented InputWeightObserver for Linear inputs
Summary: Implemented two observers (InputEqualObserver and WeightEqualObserver) which will be inserted into the graph during prepare_fx().

Test Plan: python test/test_quantization.py TestEqualizeFx

Reviewed By: supriyar

Differential Revision: D28836954

fbshipit-source-id: 25517dc82ae67698ed8b2dc334e3323286976104
2021-06-07 11:19:43 -07:00
KAI ZHAO
1aa14fcb14 Fix the "tensors to be on the same device" error in HistogramObserver (#59234)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59075

This PR fixes the "tensors to be on the same device" error in `HistogramObserver`.
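
The underlying pattern of the fix, as a generic sketch (not the exact
patch): allocate new tensors on the same device as the observer's
existing buffers rather than on the default device.

```
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
hist = torch.zeros(256, device=device)  # observer state on some device
# zeros_like inherits hist's device, so elementwise ops between the two
# tensors cannot raise the "tensors to be on the same device" error
new_counts = torch.zeros_like(hist)
combined = hist + new_counts
```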

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59234

Reviewed By: jbschlosser

Differential Revision: D28837572

Pulled By: vkuzo

fbshipit-source-id: ff7c3229ced7de2cdd8f76d526f0fd33ac643216
2021-06-03 13:30:56 -07:00
Jerry Zhang
18642e664a [quant][graphmode][fx][refactor] Split quantize.py to prepare.py and convert.py (#59353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59353

Next: remove Quantizer class

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D28856277

fbshipit-source-id: 25f5502be387dbe9706780f667501b46b82789a5
2021-06-02 23:52:39 -07:00
Jerry Zhang
87a25e09f4 [quant][graphmode][fx][refactor] Remove _convert from Quantizer class (#59042)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59042

To remove the Quantizer class and split the prepare and convert functions into different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724867

fbshipit-source-id: 9f87d51020caa20d5408cb2820947e23d92d5fc3
2021-06-02 08:50:56 -07:00
Jerry Zhang
3218d890dd [quant][graphmode][fx][fix] Fix support for custom module (#59041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59041

Static quantization support for custom modules was removed in a previous refactor
(https://github.com/pytorch/pytorch/pull/57519) since it was not covered by the test case.
This PR re-enables the test case and fixes the support.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724866

fbshipit-source-id: 1974675b88b56a2173daf86965d6f3fb7ebd783b
2021-06-01 22:31:15 -07:00
Jerry Zhang
06af7618e7 [quant][graphmode][fx][refactor] Remove Quantizer class from convert (QuantizeHandler) (#59040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59040

To remove the Quantizer class and split the prepare and convert functions into different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724870

fbshipit-source-id: c0f748711b825cd46bdfcc05c054c77a41e8207a
2021-06-01 22:00:49 -07:00
Jerry Zhang
50e6ee3ca2 [quant][graphmode][fx][refactor] Remove Quantizer class from quantize_node (#59039)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59039

To remove the Quantizer class and split the prepare and convert functions into different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724874

fbshipit-source-id: bd984716b2da1d6879c3e92fa827574783a41567
2021-06-01 21:40:08 -07:00