Commit Graph

836 Commits

Author SHA1 Message Date
Jerry Zhang
cdf85a82ed [quant][graphmode][fx] Add reference pattern support for BatchNorm (#62215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62215

including batchnorm2d, batchnorm3d, batchnormrelu2d and batchnormrelu3d

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29917524

fbshipit-source-id: 3a9520ff659cb21e6e2fe614973b3d08aa0af923
2021-07-28 10:14:16 -07:00
Jerry Zhang
f4baa83eae [bc-breaking] reference option for conv produce a pattern instead of reference conv module (#61942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61942

This PR changes is_reference=True for conv to produce a pattern consisting of dequant - float conv - quant instead of a reference conv module. This is useful for future transformations to custom backends, and it also helps simplify the implementation of
convert in the future.
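
A minimal sketch of the shape this pattern takes in the converted graph (names are illustrative, modeled on the maxpool example in a later commit below; this is not output from this PR):
```
def forward(self, x):
    dequantize = x.dequantize()
    conv = self.conv(dequantize)  # plain float nn.Conv2d, no reference conv module
    conv_output_scale_0 = self.conv_output_scale_0
    conv_output_zero_point_0 = self.conv_output_zero_point_0
    quantize_per_tensor = torch.quantize_per_tensor(conv, conv_output_scale_0, conv_output_zero_point_0, torch.quint8)
    return quantize_per_tensor
```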

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29810656

fbshipit-source-id: 549237a62bfda4341a2a7474c124f5e33350e267
2021-07-28 09:13:40 -07:00
Jerry Zhang
7507aeded5 [reland][bc-breaking] reference option for linear produce a pattern instead of reference linear module (#61892) (#62277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62277

This PR changes is_reference=True for linear to produce a pattern consisting of dequant - float linear - quant instead of a reference linear module. This is useful for future transformations to custom backends, and it also helps simplify the implementation of
convert in the future.
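
For context, a minimal sketch of the flow that exercises this option, using the FX quantization APIs as they existed at the time (the toy model and calibration input here are illustrative):
```
import torch
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU()).eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}
prepared = prepare_fx(model, qconfig_dict)
prepared(torch.randn(2, 8))                          # calibrate observers
reference = convert_fx(prepared, is_reference=True)  # linear becomes dequant - float linear - quant
print(reference.graph)
```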

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: ejguan

Differential Revision: D29941079

fbshipit-source-id: 84bdfc0bb872c34fc345875e545c8b323e77c41e
2021-07-27 15:46:44 -07:00
Erjia Guan
8cdf16d1de Revert D29810657: [bc-breaking] reference option for linear produce a pattern instead of reference linear module
Test Plan: revert-hammer

Differential Revision:
D29810657 (9df605133e)

Original commit changeset: 949615bbc017

fbshipit-source-id: 54597d1f9636b0f94ae01c66018ff2592e5c39fc
2021-07-27 10:10:13 -07:00
Jerry Zhang
9df605133e [bc-breaking] reference option for linear produce a pattern instead of reference linear module (#61892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61892

This PR changes is_reference=True for linear to produce a pattern consisting of dequant - float linear - quant instead of a reference linear module. This is useful for future transformations to custom backends, and it also helps simplify the implementation of
convert in the future.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29810657

fbshipit-source-id: 949615bbc017bc454d81c8a6b2bdec53badaab19
2021-07-27 09:49:20 -07:00
Erjia Guan
dc55d511d9 Forward fix mypy (#62263)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62263

Fixes current HUD Error: https://github.com/pytorch/pytorch/runs/3170342799

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29935265

Pulled By: ejguan

fbshipit-source-id: 6f247833d24bff7aea42f6287493a85d62d73b96
2021-07-27 07:52:31 -07:00
Jerry Zhang
2d7c1e3fa8 [bc-breaking] Produce quantization pattern for add_scalar and mul_scalar (#61859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61859

BC-breaking note:
Previously we did not add an observer/fake_quant for the output of add/mul tensor-scalar operations.
In this PR we add an observer/fake_quant instance (the same one as the input's) to correctly model
the behavior of the quantized add_scalar and mul_scalar ops (since quantized add/mul scalar assumes the
output quantized tensor has the same quantization parameters as the input).
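
As a concrete illustration (toy module, not from this PR), the affected ops are tensor-scalar add/mul such as:
```
import torch

class M(torch.nn.Module):
    def forward(self, x):
        x = x + 3    # quantized add_scalar: output reuses the input's qparams
        x = x * 0.5  # quantized mul_scalar: same constraint
        return x
```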

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_add
python test/test_quantization.py TestQuantizeFxOps.test_mul

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29770859

fbshipit-source-id: f43fcbfecd04c392467770b22c481bbbdaf43c25
2021-07-27 02:46:00 -07:00
Jerry Zhang
457a3fb6d1 [bc-breaking][quant][graphmode][fx] Produce dequant - fp_op - quant pattern for copy nodes (#61763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61763

This PR changes the is_reference=True option for convert_fx to produce a dequant - fp_op - quant
pattern for copy nodes like maxpool op.

Before the PR:
```
def forward(self, x):
    maxpool2d_input_scale_0 = self.maxpool2d_input_scale_0
    maxpool2d_input_zero_point_0 = self.maxpool2d_input_zero_point_0
    quantize_per_tensor = torch.quantize_per_tensor(x, maxpool2d_input_scale_0, maxpool2d_input_zero_point_0, torch.quint8);  x = maxpool2d_input_scale_0 = maxpool2d_input_zero_point_0 = None
    maxpool2d = self.maxpool2d(quantize_per_tensor);  quantize_per_tensor = None
    dequantize = maxpool2d.dequantize();  maxpool2d = None
    return dequantize
```

After (we expand the maxpool2d that works with quantized input into a "dequant - maxpool2d - quant" pattern):
```
def forward(self, x):
    maxpool2d_input_scale_0 = self.maxpool2d_input_scale_0
    maxpool2d_input_zero_point_0 = self.maxpool2d_input_zero_point_0
    quantize_per_tensor = torch.quantize_per_tensor(x, maxpool2d_input_scale_0, maxpool2d_input_zero_point_0, torch.quint8);  x = maxpool2d_input_scale_0 = maxpool2d_input_zero_point_0 = None
    dequantize = quantize_per_tensor.dequantize();  quantize_per_tensor = None
    maxpool2d = self.maxpool2d(dequantize);  dequantize = None
    maxpool2d_output_scale_0 = self.maxpool2d_output_scale_0
    maxpool2d_output_zero_point_0 = self.maxpool2d_output_zero_point_0
    quantize_per_tensor_1 = torch.quantize_per_tensor(maxpool2d, maxpool2d_output_scale_0, maxpool2d_output_zero_point_0, torch.quint8);  maxpool2d = maxpool2d_output_scale_0 = maxpool2d_output_zero_point_0 = None
    dequantize_1 = quantize_per_tensor_1.dequantize();  quantize_per_tensor_1 = None
    return dequantize_1
```

Note that the call to self.maxpool2d is expanded to:
```
    dequantize = quantize_per_tensor.dequantize();  quantize_per_tensor = None
    maxpool2d = self.maxpool2d(dequantize);  dequantize = None
    maxpool2d_output_scale_0 = self.maxpool2d_output_scale_0
    maxpool2d_output_zero_point_0 = self.maxpool2d_output_zero_point_0
    quantize_per_tensor_1 = torch.quantize_per_tensor(maxpool2d, maxpool2d_output_scale_0, maxpool2d_output_zero_point_0, torch.quint8);  maxpool2d = maxpool2d_output_scale_0 = maxpool2d_output_zero_point_0 = None
```

Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_copy_node_has_shared_actpp_instance
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29728900

fbshipit-source-id: cf2c7f1f6659e3ba97cbb7c6204dd13983da10bd
2021-07-25 19:49:13 -07:00
Jerry Zhang
cc263ef795 [bc-breaking][quant][graphmode][fx] Add observer/fake_quant for copy nodes (#61687)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61687

Previously we did not insert an observer/fake_quant for the output of copy nodes (e.g. maxpool).
But to produce reference patterns we need to insert an observer/fake_quant for the output and later convert that to a quantize node.

Model:
```
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.maxpool2d = torch.nn.MaxPool2d(kernel_size=3)

    def forward(self, x):
        x = self.maxpool2d(x)
        return x
```
result of prepare:

Before:
```
def forward(self, x):
    x_activation_post_process_0 = self.x_activation_post_process_0(x);  x = None
    maxpool2d = self.maxpool2d(x_activation_post_process_0);  x_activation_post_process_0 = None
    return maxpool2d
```

After:
```
def forward(self, x):
    x_activation_post_process_0 = self.x_activation_post_process_0(x);  x = None
    maxpool2d = self.maxpool2d(x_activation_post_process_0);  x_activation_post_process_0 = None
    maxpool2d_activation_post_process_0 = self.maxpool2d_activation_post_process_0(maxpool2d);  maxpool2d = None
    return maxpool2d_activation_post_process_0
```

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29715566

fbshipit-source-id: 817df9b2933a35cad5331d8d8ce1c5ba0752e9df
2021-07-23 21:29:37 -07:00
Charles David Hernandez
32d0c3e8ee Support for reference convert_fx working on gpu
Summary:
This PR enables GPU-only quantization, best used with is_reference since
there are not many GPU kernels for ops as of now.

This PR mainly changes how qconfigs and their observer constructors operate once they
are on a module's qconfig. The function add_module_to_qconfig_obs_ctr takes the observer constructors on the original
qconfig and configures them so that, when invoked, the created observers will
be on whatever device the module occupies. (Once observers are created,
module.to(device) is already set up so that it moves any observers.) To do this,
a new method and a few small changes were added to the _PartialWrapper class that
our observers already use to create constructors (without changing the
existing functionality). These changes work in
concert with changes to the prepare flow such that when the qconfigs are
propagated to the modules (in quantize.py and qconfig_utils.py) they are configured using add_module_to_qconfig_obs_ctr.

Ideally this would work on other models, but the is_reference support for
a lot of modules isn't there yet; those tests should be added in a
future PR.
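
A rough sketch of the flow this enables (the toy model is illustrative, and is_reference support for the modules involved is assumed):
```
import torch
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Sequential(torch.nn.Conv2d(3, 3, 1), torch.nn.ReLU()).to(device).eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}
prepared = prepare_fx(model, qconfig_dict)              # observers are created on the model's device
prepared(torch.randn(1, 3, 8, 8, device=device))        # calibrate without leaving the GPU
reference = convert_fx(prepared, is_reference=True)     # reference model, runnable on GPU
```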

Test Plan:
python test/test_quantization.py TestQuantizeFxModels.test_static_gpu_convert_basic

python test/test_quantization.py TestQuantizeFxModels.test_switch_device_prepare_convert

python test/test_quantization.py TestQuantizeFxModels.test_prepare_serialize_switch_device_convert

python test/test_quantization.py TestQuantizeFx.test_qconfig_precedence

Reviewed By: vkuzo

Differential Revision: D29684114

fbshipit-source-id: 19fefb8e1998eaf212723e836276ccf39467f2e7
2021-07-23 10:30:38 -07:00
Vasiliy Kuznetsov
04c95a0638 ns for fx: expose hook to define custom weight extraction functions (#62047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62047

Adds a hook for user to define a weight extraction function for a
custom type.

Example usage:
```
op_to_type_to_weight_extraction_fn = \
    get_op_to_type_to_weight_extraction_fn()
op_to_type_to_weight_extraction_fn['call_function'][_wrapped_linear] = \
    torch.quantization.ns.weight_utils.get_linear_fun_weight

results = extract_weights_impl(
    'a', m1, 'b', m2,
    op_to_type_to_weight_extraction_fn=op_to_type_to_weight_extraction_fn)
```

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_defined_function
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29853625

fbshipit-source-id: 183916ef54ba303bc818e0eba00b52e33c4633ad
2021-07-23 09:31:37 -07:00
Vasiliy Kuznetsov
07c6a12008 ns for fx: fix typing issue in weight extraction (#62041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62041

Before this PR, weights of conv and linear modules were extracted
as lists, in order to match the signature of LSTM weights.

After this PR, weight extraction preserves the type of the weights,
so extracted weights of conv and linear have a different type
from LSTM weights.  The comparison util functions are updated to
handle the LSTM weight type of `List[tensor]`.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29853626

fbshipit-source-id: 93da5b9b0b174679c61528d02b6b902cb064444e
2021-07-23 09:31:33 -07:00
Vasiliy Kuznetsov
eaba16d665 ns for fx: change weight extraction to direct mapping (#62038)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62038

Updates the logic to extract weights from nodes to use a
direct mapping from type to weight extraction function.

This is needed for a future PR which will allow users to
specify custom weight extraction functions for user defined
types.
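
Conceptually the mapping looks something like the following (the extraction functions here are simplified stand-ins, not the actual weight_utils implementations; the top-level name matches the one used in the follow-up PR above):
```
import torch

def get_conv_mod_weight(mod):
    return mod.weight.detach()

def get_lstm_mod_weights(mod):
    return [w.detach() for w in mod._flat_weights]

# node op -> type of the module/function -> weight extraction function
op_to_type_to_weight_extraction_fn = {
    'call_module': {
        torch.nn.Conv2d: get_conv_mod_weight,
        torch.nn.LSTM: get_lstm_mod_weights,
    },
}
```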

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29853627

fbshipit-source-id: 3ef90ef4bd7b28f6316c0af215a2bd3ff8a2aeca
2021-07-23 09:30:08 -07:00
Supriya Rao
b8386f5d72 [quant] Create FusedMovingAvgObsFakeQuantize for QAT (#61691)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61691

Creates a new module for QAT that performs a fused MovingAvgMinMaxObserver and FakeQuantize operation.
The module currently only supports per-tensor quantization (affine/symmetric). A follow-up PR will add support for per-channel.
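
A minimal usage sketch of the new module in isolation (the import path below is an assumption; defaults are per-tensor affine quint8):
```
import torch
from torch.quantization import FusedMovingAvgObsFakeQuantize  # import path is an assumption

fq = FusedMovingAvgObsFakeQuantize()  # per-tensor affine defaults
x = torch.randn(16, 32)
y = fq(x)  # one fused op: update moving-average min/max, then fake-quantize
```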

Results on running QAT with MobileNetV2 (Obs enabled/fake_quant enabled)
Original FQ module
PyTorchObserver {"type": "_", "metric": "qnnpack_fp_latency_ms", "unit": "ms", "value": "242.80261993408203"}
PyTorchObserver {"type": "_", "metric": "qnnpack_qat0_latency_ms", "unit": "ms", "value": "505.7964324951172"}
PyTorchObserver {"type": "_", "metric": "fbgemm_fp_latency_ms", "unit": "ms", "value": "235.80145835876465"}
PyTorchObserver {"type": "_", "metric": "fbgemm_qat0_latency_ms", "unit": "ms", "value": "543.8144207000732"}

Fused FakeQuant module (~50% improvement in latency)
PyTorchObserver {"type": "_", "metric": "qnnpack_fp_latency_ms", "unit": "ms", "value": "232.1624755859375"}
PyTorchObserver {"type": "_", "metric": "qnnpack_qat0_latency_ms", "unit": "ms", "value": "263.8866901397705"}
PyTorchObserver {"type": "_", "metric": "fbgemm_fp_latency_ms", "unit": "ms", "value": "236.9832992553711"}
PyTorchObserver {"type": "_", "metric": "fbgemm_qat0_latency_ms", "unit": "ms", "value": "292.1590805053711"}

Individual module benchmark result (>5x improvement in latency)
===> Baseline FakeQuantize module
```
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                               Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
              aten::fake_quantize_per_tensor_affine         0.77%       1.210ms         4.92%       7.730ms     154.596us     718.528us         0.45%       9.543ms     190.862us            50
    aten::fake_quantize_per_tensor_affine_cachemask         2.41%       3.792ms         4.15%       6.520ms     130.402us       8.825ms         5.58%       8.825ms     176.492us            50
                                     aten::_aminmax         3.25%       5.105ms         4.43%       6.955ms     139.102us       8.193ms         5.18%       8.193ms     163.868us            50
                                   aten::zeros_like         1.87%       2.939ms         6.95%      10.922ms     109.218us       5.992ms         3.79%      10.844ms     108.442us           100
                                        aten::zeros         0.97%       1.527ms         3.11%       4.885ms      97.702us       2.383ms         1.51%       4.800ms      96.010us            50
                                         aten::rsub         1.34%       2.106ms         2.94%       4.614ms      92.277us       2.063ms         1.30%       4.559ms      91.173us            50
                                        aten::clamp         2.79%       4.381ms         5.42%       8.519ms      85.190us       5.385ms         3.41%       8.438ms      84.381us           100
                                           aten::eq        11.70%      18.384ms        21.31%      33.479ms      83.280us      22.465ms        14.21%      33.310ms      82.861us           402
                                         aten::ones         1.05%       1.656ms         2.57%       4.038ms      80.751us       2.494ms         1.58%       3.951ms      79.028us            50
                                           aten::le         2.52%       3.955ms         4.84%       7.607ms      76.071us       4.998ms         3.16%       7.702ms      77.016us           100
                                          aten::min         0.69%       1.087ms         2.32%       3.641ms      72.827us       1.017ms         0.64%       3.603ms      72.055us            50
                                          aten::max         1.40%       2.195ms         4.62%       7.260ms      72.597us       2.008ms         1.27%       7.140ms      71.404us           100
                                   aten::is_nonzero         2.68%       4.207ms        11.35%      17.829ms      71.033us       4.062ms         2.57%      17.225ms      68.625us           251
                                       aten::detach         1.17%       1.831ms         3.65%       5.736ms      57.360us       1.680ms         1.06%       5.634ms      56.340us           100
                                          aten::mul         3.36%       5.278ms         3.36%       5.278ms      53.862us       5.215ms         3.30%       5.215ms      53.216us            98
                                          aten::div         3.42%       5.376ms         3.42%       5.376ms      53.759us       5.320ms         3.36%       5.320ms      53.196us           100
                                          aten::sub         6.79%      10.672ms         6.79%      10.672ms      53.901us      10.504ms         6.64%      10.504ms      53.050us           198
                                         aten::item         4.06%       6.380ms        12.02%      18.883ms      53.798us       6.127ms         3.87%      18.322ms      52.198us           351
                                          aten::add         3.28%       5.147ms         3.28%       5.147ms      52.518us       5.113ms         3.23%       5.113ms      52.171us            98
                                      aten::minimum         1.63%       2.555ms         1.63%       2.555ms      51.092us       2.585ms         1.64%       2.585ms      51.708us            50
                                      aten::maximum         3.22%       5.065ms         3.22%       5.065ms      50.646us       5.133ms         3.25%       5.133ms      51.329us           100
                                        aten::round         1.61%       2.529ms         1.61%       2.529ms      50.578us       2.528ms         1.60%       2.528ms      50.552us            50
                                        aten::zero_         1.99%       3.125ms         4.72%       7.422ms      49.481us       2.835ms         1.79%       7.269ms      48.462us           150
                                        aten::copy_         6.62%      10.394ms         6.62%      10.394ms      41.576us      10.252ms         6.48%      10.252ms      41.010us           250
                                             detach         2.49%       3.905ms         2.49%       3.905ms      39.049us       3.954ms         2.50%       3.954ms      39.539us           100
                                       aten::select         2.01%       3.154ms         2.47%       3.876ms      38.759us       3.866ms         2.44%       3.866ms      38.658us           100
                          aten::_local_scalar_dense         7.96%      12.503ms         7.96%      12.503ms      35.621us      12.195ms         7.71%      12.195ms      34.743us           351
                                           aten::to         2.31%       3.625ms         4.16%       6.530ms      32.650us       4.320ms         2.73%       6.270ms      31.348us           200
                                        aten::fill_         3.70%       5.808ms         3.70%       5.808ms      29.039us       5.892ms         3.73%       5.892ms      29.459us           200
                                   aten::as_strided         0.79%       1.244ms         0.79%       1.244ms       6.221us       0.000us         0.00%       0.000us       0.000us           200
                                        aten::empty         3.55%       5.579ms         3.55%       5.579ms      11.137us       0.000us         0.00%       0.000us       0.000us           501
                                      aten::resize_         2.36%       3.712ms         2.36%       3.712ms      12.332us       0.000us         0.00%       0.000us       0.000us           301
                                   aten::empty_like         1.45%       2.284ms         3.68%       5.776ms      28.878us       0.000us         0.00%       0.000us       0.000us           200
                                aten::empty_strided         2.80%       4.398ms         2.80%       4.398ms      17.592us       0.000us         0.00%       0.000us       0.000us           250
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 157.108ms
Self CUDA time total: 158.122ms
```

===> FusedFakeQuant
```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                   fb::fused_fake_quant        23.42%       6.408ms       100.00%      27.361ms     547.215us       7.887ms        27.20%      28.996ms     579.925us            50
                  aten::fake_quantize_per_tensor_affine         4.25%       1.162ms        27.65%       7.565ms     151.298us     686.176us         2.37%      10.217ms     204.336us            50
aten::_fake_quantize_per_tensor_affine_cachemask_ten...        14.11%       3.860ms        23.40%       6.403ms     128.068us       9.531ms        32.87%       9.531ms     190.612us            50
                                         aten::_aminmax        20.57%       5.628ms        27.47%       7.515ms     150.305us       8.218ms        28.34%       8.218ms     164.367us            50
                                             aten::item         3.65%     999.522us        10.27%       2.810ms      56.202us     931.904us         3.21%       2.674ms      53.481us            50
                              aten::_local_scalar_dense         6.62%       1.811ms         6.62%       1.811ms      36.212us       1.742ms         6.01%       1.742ms      34.843us            50
                                            aten::empty        10.85%       2.969ms        10.85%       2.969ms      14.843us       0.000us         0.00%       0.000us       0.000us           200
                                       aten::as_strided         1.92%     524.365us         1.92%     524.365us       5.244us       0.000us         0.00%       0.000us       0.000us           100
                                       aten::empty_like         6.48%       1.774ms        14.62%       4.000ms      26.670us       0.000us         0.00%       0.000us       0.000us           150
                                    aten::empty_strided         8.14%       2.226ms         8.14%       2.226ms      14.842us       0.000us         0.00%       0.000us       0.000us           150
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 27.361ms
Self CUDA time total: 28.996ms
```

Test Plan:
python test/test_quantization.py TestFusedObsFakeQuantModule

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29706889

fbshipit-source-id: ae3f9fb1fc559920459bf6e8663e8299bf7d21e1
2021-07-21 10:13:04 -07:00
Vasiliy Kuznetsov
eefbff773b ns for fx: add utils for l2 error and cosine similarity (#61380)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61380

Adds convenience wrappers for l2 error and cosine similarity
to NS utils.
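
For reference, the two metrics reduce to something like the following (standalone re-implementations for intuition, not the exact NS util signatures):
```
import torch

def cosine_similarity(x, y):
    return torch.nn.functional.cosine_similarity(x.flatten(), y.flatten(), dim=0)

def normalized_l2_error(x, y):
    # ||x - y||_2 / ||x||_2, where x is the reference output
    return torch.norm(x - y) / torch.norm(x)
```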

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_extend_logger_results_with_comparison
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29600354

fbshipit-source-id: 670c44a44df7f345884cacf26ed3c885edbe9977
2021-07-17 20:53:43 -07:00
Vasiliy Kuznetsov
2a2bc1fc8a ns for fx: add fqn to results, when present (#61377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61377

Both the quantization tracer and the NS tracer record
`_node_name_to_scope`, which contains the mapping from
node name to FQN.

This PR adds the FQN information to the NS results, so that it is
more convenient for users to attribute a NS result to the corresponding
module in their model.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_extract_weights_fqn
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_match_activations_fqn
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_shadow_activations_fqn
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29600349

fbshipit-source-id: df489e03daff97dd380f59c83ffdc2b0012a0a53
2021-07-17 20:53:41 -07:00
Vasiliy Kuznetsov
7449f49a4c ns for fx: return results in execution order (#61360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61360

By default, NS graph matching matches from the end of the graph
to the start.  This PR reverses the returned results so that
the outputs of the NS APIs are in the order of execution, making
it easier to analyze.

Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher.test_results_order
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29600348

fbshipit-source-id: c9fa4a3748db27c1788eebf803f35221e6fc8701
2021-07-17 20:53:39 -07:00
Vasiliy Kuznetsov
2b2928c5ca ns for fx: improve error messages for graph matching (#61359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61359

Makes the error messages when graph matching easier to read
for users.

Test Plan:
```
// inspect the exceptions in the following two tests and verify
// that they are easier to read than before
python test/test_quantization.py TestFXGraphMatcher.test_matching_failure_node_count
python test/test_quantization.py TestFXGraphMatcher.test_matching_failure_node_type
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29600353

fbshipit-source-id: ec6640fe6cab7b62a697e4ee385be182f2918fd4
2021-07-17 20:53:38 -07:00
Vasiliy Kuznetsov
4acd14da02 ns for fx: preserve observers and fake_quants through passes (#61323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61323

Before this PR, all observers and fake quants were silently removed
when adding loggers with NS. This is problematic for QAT models because
we need the fake quants to run in order to properly capture intermediate
outputs.

This PR fixes the issue by preserving the observers throughout
the passes which add loggers.  In detail:
* for each quantization module or fusion, add additional patterns with that fusion and an observer/fake_quant at the end
* remove the places in the logger model creation code which removed observers
* add unit testing that QAT numerics do not change after adding loggers

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_loggers_preserve_qat_numerics
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_shadow_loggers_preserve_qat_numerics
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29600351

fbshipit-source-id: 5f25118b79eb47860c49bca882de6a8eae7a4456
2021-07-17 20:53:33 -07:00
Vasiliy Kuznetsov
a70505cdbd ns for fx: support comparing fp32 vs fp32_prepared, except shadowed (#61129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61129

Adds support for comparing an fp32 model (without quantization) to an
fp32 model prepared with quantization. The main missing feature was
handling conv-bn fusion, since this fusion for PTQ happens outside
of quantization patterns.

Adds testing for this case for comparing weights and comparing
activations.

Adds a TODO for also handling this for shadow activations; we need to
first stop removing observers in graph passes before we can add
this support, which will be in a future PR.

Test Plan:
```
python test/test_quantization.py TestFXGraphMatcherModels.test_mobilenet_v2
python test/test_quantization.py TestFXGraphMatcherModels.test_mobilenet_v2_qat
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_compare_activations_conv
```

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D29520009

fbshipit-source-id: f63484a998f1424bd9cacf5d823b82b2edfea1ae
2021-07-17 20:52:23 -07:00
Angela Yi
0751a41ab1 [quant] Input-Weight Equalization - ConvReLU support (#61350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61350

Applied changes in convert to allow for ConvReLU2d layers

Initial Model: `x -> conv1 -> relu`

After fusion: `x -> convRelu2d`

After prepare: `x -> input_quant_obs -> input_eq_obs1 -> convRelu2d -> output_quant_obs1`

After equalization functions: `x -> mul -> input_quant_obs (scaled) -> convRelu2d -> output_quant_obs`

After convert: `x -> mul -> quantize_per_tensor -> quantized::convRelu2d -> dequantize`

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Initial Model:
```
ConvReluModel(
  (fc): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1))
  (relu): ReLU()
)
```

After prepare:
```
GraphModule(
  (x_activation_post_process_0): MinMaxObserver(min_val=5.960464477539063e-08, max_val=0.9999999403953552)
  (x_activation_post_process_0_equalization_process_0): _InputEqualizationObserver(
    (input_obs): PerChannelMinMaxObserver(min_val=tensor([1.1921e-07, 3.3379e-06, 5.9605e-08]), max_val=tensor([1.0000, 1.0000, 1.0000]))
  )
  (fc): ConvReLU2d(
    (0): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1))
    (1): ReLU()
  )
  (fc_activation_post_process_0): MinMaxObserver(min_val=0.0, max_val=1.2341605424880981)
)

graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```

After equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```

After convert:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %fc_input_scale_0 : [#users=1] = get_attr[target=fc_input_scale_0]
    %fc_input_zero_point_0 : [#users=1] = get_attr[target=fc_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %fc_input_scale_0, %fc_input_zero_point_0, torch.quint8), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%fc,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29638275

fbshipit-source-id: 40d4666a4451e132612ea38fdfeaaec177a1defb
2021-07-13 14:00:40 -07:00
Angela Yi
b3e4dab45a [quant] Input-Weight Equalization - Conv convert support (#61287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61287

Modifications to functions during convert() to support equalization. Note that this implementation does not work for connected F.conv2d layers yet.

Initial:
```
      w
      |
x -> conv -> y
```

After prepare:
```
                                         w
                                         |
                                  weight_quant_obs
                                         |
                                    weight_eq_obs
                                         |
x -> input_quant_obs -> input_eq_obs -> conv -> out_quant_obs -> y
```

After convert:
```
                scale, zero_point             w (scaled)
                       |                           |
x -> mul -> quantize_per_tensor (scaled) -> quantized::conv -> dequant -> y
      |
   eq_scale
```

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Initial model:
```
ConvModel(
  (conv): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1), bias=False)
)
```

After prepare:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %conv : [#users=1] = call_module[target=conv](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %conv_activation_post_process_0 : [#users=1] = call_module[target=conv_activation_post_process_0](args = (%conv,), kwargs = {})
    return conv_activation_post_process_0
```

After equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %conv : [#users=1] = call_module[target=conv](args = (%x_activation_post_process_0,), kwargs = {})
    %conv_activation_post_process_0 : [#users=1] = call_module[target=conv_activation_post_process_0](args = (%conv,), kwargs = {})
    return conv_activation_post_process_0
```

After convert:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %conv_input_scale_0 : [#users=1] = get_attr[target=conv_input_scale_0]
    %conv_input_zero_point_0 : [#users=1] = get_attr[target=conv_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %conv_input_scale_0, %conv_input_zero_point_0, torch.quint8), kwargs = {})
    %conv : [#users=1] = call_module[target=conv](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%conv,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29557055

fbshipit-source-id: dc9f44182e31fa362c43ad2dfe224e6f4e4a730e
2021-07-13 14:00:38 -07:00
Angela Yi
77d36b657a [quant] Input-Weight Equalization - Conv prepare support (#61286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61286

Modifies the prepare step to support conv layers during input-weight equalization and adds tests to make sure that the results are as expected.

Initial:
```
      w
      |
x -> conv -> y
```

After prepare:

```
                                         w
                                         |
                                  weight_quant_obs
                                         |
                                    weight_eq_obs
                                         |
x -> input_quant_obs -> input_eq_obs -> conv -> out_quant_obs -> y
```

Test Plan:
`python test/test_quantization.py TestEqualizeFx.test_input_weight_equalization_prepare`

Initial:
```
ConvModel(
  (conv): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1), bias=False)
)
```

After prepare:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %conv : [#users=1] = call_module[target=conv](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %conv_activation_post_process_0 : [#users=1] = call_module[target=conv_activation_post_process_0](args = (%conv,), kwargs = {})
    return conv_activation_post_process_0
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29557051

fbshipit-source-id: 25d1531645dfaf565f5c615e2ee850fcf96c7eb9
2021-07-13 14:00:36 -07:00
Angela Yi
ce9cedd119 [quant] Input-Weight Equalization - Conv observer support (#61285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61285

Modifies observers to support conv layers and tests to make sure that the observers are returning the expected values for conv inputs.

Test Plan:
`python test/test_quantization.py TestEqualizeFx.test_input_weight_eq_observer`

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29557041

fbshipit-source-id: 5e43329f189ba352eb8b991f38bf37752eebb6e6
2021-07-13 13:59:23 -07:00
Supriya Rao
7a15576a65 [quant] update FakeQuant modules to use tensor qparams (#61318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61318

Remove the `float()` and `int()` calls in the forward function so that we can directly use the tensor qparams in the fake_quantize operator.

Calling `float()`/`int()` internally calls `item()`, which can trigger a GPU -> CPU copy if the original tensors reside on GPU.
Local benchmark P427668213

Before this change
```
                                               Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                     aten::_aminmax         2.57%       1.507ms         3.10%       1.819ms      36.371us       2.872ms         4.81%       2.872ms      57.446us            50
              aten::fake_quantize_per_tensor_affine         1.04%     610.915us         3.60%       2.114ms      42.276us     472.896us         0.79%       2.698ms      53.962us            50
    aten::fake_quantize_per_tensor_affine_cachemask         1.69%     993.626us         2.56%       1.503ms      30.058us       2.225ms         3.73%       2.225ms      44.504us            50
                                   aten::is_nonzero         3.85%       2.258ms        19.68%      11.540ms      46.161us       2.168ms         3.63%      11.084ms      44.336us           250
                                   aten::zeros_like         1.82%       1.064ms         6.65%       3.901ms      39.007us       1.531ms         2.57%       3.905ms      39.045us           100
                                           aten::eq        13.80%       8.093ms        25.90%      15.189ms      37.972us       9.580ms        16.05%      15.566ms      38.914us           400
                                         aten::item         5.67%       3.323ms        21.50%      12.607ms      36.019us       3.233ms         5.42%      12.167ms      34.762us           350
                                        aten::zeros         0.94%     549.208us         2.93%       1.717ms      34.343us     688.928us         1.15%       1.695ms      33.894us            50
                                           aten::le         2.52%       1.478ms         4.50%       2.641ms      26.411us       1.753ms         2.94%       2.845ms      28.448us           100
                                         aten::rsub         1.04%     608.715us         2.44%       1.433ms      28.667us     532.000us         0.89%       1.418ms      28.353us            50
                                          aten::max         1.54%     905.401us         4.62%       2.711ms      27.106us     847.488us         1.42%       2.697ms      26.969us           100
                                         aten::ones         0.92%     542.159us         2.16%       1.266ms      25.324us     661.856us         1.11%       1.301ms      26.017us            50
                                          aten::min         0.82%     479.167us         2.15%       1.258ms      25.160us     407.808us         0.68%       1.276ms      25.530us            50
                          aten::_local_scalar_dense        15.83%       9.284ms        15.83%       9.284ms      26.526us       8.934ms        14.97%       8.934ms      25.524us           350
                                        aten::clamp         2.35%       1.378ms         4.21%       2.467ms      24.669us       1.546ms         2.59%       2.461ms      24.612us           100
                                        aten::zero_         2.53%       1.482ms         5.65%       3.316ms      22.108us       1.326ms         2.22%       3.380ms      22.531us           150
                                      aten::maximum         3.08%       1.805ms         3.08%       1.805ms      18.052us       1.849ms         3.10%       1.849ms      18.494us           100
                                      aten::minimum         1.33%     778.854us         1.33%     778.854us      15.577us     868.672us         1.46%     868.672us      17.373us            50
                                        aten::round         1.36%     799.910us         1.36%     799.910us      15.998us     809.568us         1.36%     809.568us      16.191us            50
                                        aten::copy_         6.61%       3.878ms         6.61%       3.878ms      15.513us       4.036ms         6.76%       4.036ms      16.143us           250
                                          aten::div         2.53%       1.483ms         2.53%       1.483ms      14.833us       1.535ms         2.57%       1.535ms      15.353us           100
                                          aten::mul         2.44%       1.431ms         2.44%       1.431ms      14.314us       1.478ms         2.48%       1.478ms      14.782us           100
                                       aten::detach         1.46%     855.670us         2.41%       1.411ms      14.110us     832.448us         1.39%       1.395ms      13.949us           100
                                          aten::add         2.22%       1.301ms         2.22%       1.301ms      13.008us       1.383ms         2.32%       1.383ms      13.828us           100
                                        aten::fill_         4.18%       2.452ms         4.18%       2.452ms      12.262us       2.693ms         4.51%       2.693ms      13.463us           200
                                          aten::sub         5.06%       2.967ms         5.06%       2.967ms      14.837us       2.675ms         4.48%       2.675ms      13.374us           200
                                           aten::to         2.10%       1.230ms         3.65%       2.140ms      10.701us       1.310ms         2.20%       2.062ms      10.310us           200
                                       aten::select         1.28%     749.144us         1.49%     874.227us       8.742us     863.232us         1.45%     863.232us       8.632us           100
                                             detach         0.95%     555.326us         0.95%     555.326us       5.553us     562.496us         0.94%     562.496us       5.625us           100
                                   aten::as_strided         0.40%     232.289us         0.40%     232.289us       1.161us       0.000us         0.00%       0.000us       0.000us           200
                                        aten::empty         2.93%       1.720ms         2.93%       1.720ms       3.439us       0.000us         0.00%       0.000us       0.000us           500
                                      aten::resize_         1.04%     611.313us         1.04%     611.313us       2.038us       0.000us         0.00%       0.000us       0.000us           300
                                   aten::empty_like         0.75%     438.585us         1.77%       1.036ms       5.180us       0.000us         0.00%       0.000us       0.000us           200
                                aten::empty_strided         1.36%     799.442us         1.36%     799.442us       3.198us       0.000us         0.00%       0.000us       0.000us           250
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 58.645ms
Self CUDA time total: 59.674ms
```

After this change
```

test_fake_quant_profiler (scripts.supriyar.benchmark.module_bench.ProfilerBench) ... -------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                  aten::fake_quantize_per_tensor_affine         0.98%     505.210us         4.38%       2.259ms      45.187us     419.424us         0.78%       3.218ms      64.367us            50
                                         aten::_aminmax         2.78%       1.434ms         3.42%       1.766ms      35.321us       2.825ms         5.27%       2.825ms      56.505us            50
aten::fake_quantize_per_tensor_affine_cachemask_tens...         2.38%       1.229ms         3.40%       1.754ms      35.083us       2.799ms         5.22%       2.799ms      55.979us            50
                                             aten::rsub         0.94%     485.040us         5.02%       2.590ms      51.793us     458.976us         0.86%       2.587ms      51.747us            50
                                       aten::is_nonzero         3.78%       1.952ms        23.64%      12.196ms      48.786us       2.055ms         3.83%      11.986ms      47.944us           250
                                             aten::item         6.92%       3.572ms        19.86%      10.244ms      40.977us       3.670ms         6.85%       9.931ms      39.724us           250
                                       aten::zeros_like         1.65%     848.874us         6.64%       3.426ms      34.260us       1.397ms         2.61%       3.572ms      35.717us           100
                                            aten::zeros         0.85%     436.691us         3.00%       1.549ms      30.984us     551.936us         1.03%       1.576ms      31.516us            50
                                               aten::eq        10.60%       5.467ms        20.26%      10.452ms      26.130us       7.018ms        13.09%      10.832ms      27.079us           400
                                               aten::le         2.58%       1.332ms         4.67%       2.407ms      24.074us       1.580ms         2.95%       2.614ms      26.144us           100
                              aten::_local_scalar_dense        12.93%       6.673ms        12.93%       6.673ms      26.691us       6.261ms        11.68%       6.261ms      25.046us           250
                                            aten::clamp         2.43%       1.253ms         4.37%       2.256ms      22.560us       1.431ms         2.67%       2.273ms      22.725us           100
                                             aten::ones         0.89%     460.133us         2.18%       1.123ms      22.467us     570.496us         1.06%       1.128ms      22.551us            50
                                              aten::min         0.74%     383.132us         2.06%       1.065ms      21.296us     377.536us         0.70%       1.091ms      21.824us            50
                                            aten::zero_         2.36%       1.219ms         5.87%       3.029ms      20.194us       1.261ms         2.35%       3.199ms      21.327us           150
                                              aten::max         1.51%     779.081us         4.06%       2.096ms      20.960us     791.680us         1.48%       2.130ms      21.295us           100
                                              aten::sub         7.97%       4.111ms         7.97%       4.111ms      20.556us       3.847ms         7.18%       3.847ms      19.234us           200
                                              aten::div         2.94%       1.516ms         2.94%       1.516ms      15.158us       1.580ms         2.95%       1.580ms      15.798us           100
                                            aten::round         1.45%     750.445us         1.45%     750.445us      15.009us     756.064us         1.41%     756.064us      15.121us            50
                                            aten::copy_         6.88%       3.548ms         6.88%       3.548ms      14.190us       3.701ms         6.90%       3.701ms      14.803us           250
                                          aten::minimum         1.32%     681.654us         1.32%     681.654us      13.633us     713.664us         1.33%     713.664us      14.273us            50
                                          aten::maximum         2.55%       1.317ms         2.55%       1.317ms      13.169us       1.338ms         2.50%       1.338ms      13.378us           100
                                              aten::mul         2.63%       1.358ms         2.63%       1.358ms      13.581us       1.328ms         2.48%       1.328ms      13.283us           100
                                           aten::detach         1.34%     688.820us         2.35%       1.211ms      12.110us     772.800us         1.44%       1.278ms      12.779us           100
                                            aten::fill_         4.53%       2.338ms         4.53%       2.338ms      11.692us       2.495ms         4.65%       2.495ms      12.473us           200
                                              aten::add         2.32%       1.197ms         2.32%       1.197ms      11.968us       1.240ms         2.31%       1.240ms      12.405us           100
                                               aten::to         2.07%       1.069ms         3.66%       1.889ms       9.443us       1.224ms         2.28%       1.975ms       9.874us           200
                                           aten::select         1.44%     743.042us         1.64%     848.207us       8.482us     641.600us         1.20%     641.600us       6.416us           100
                                                 detach         1.01%     522.155us         1.01%     522.155us       5.222us     505.088us         0.94%     505.088us       5.051us           100
                                       aten::as_strided         0.44%     227.884us         0.44%     227.884us       1.139us       0.000us         0.00%       0.000us       0.000us           200
                                            aten::empty         3.20%       1.652ms         3.20%       1.652ms       3.304us       0.000us         0.00%       0.000us       0.000us           500
                                          aten::resize_         1.25%     646.711us         1.25%     646.711us       2.156us       0.000us         0.00%       0.000us       0.000us           300
                                       aten::empty_like         0.79%     407.768us         2.07%       1.067ms       5.334us       0.000us         0.00%       0.000us       0.000us           200
                                    aten::empty_strided         1.52%     785.788us         1.52%     785.788us       3.143us       0.000us         0.00%       0.000us       0.000us           250
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 51.590ms
Self CUDA time total: 53.609ms
ghstack-source-id: 133370215

Test Plan: buck test mode/dev-nosan caffe2/test/:quantization

Reviewed By: raghuramank100

Differential Revision: D29566512

fbshipit-source-id: 1aefca51f99949da7334bcfe504848275c9f952c
2021-07-10 19:43:02 -07:00
Supriya Rao
99848c7269 [quant] Add tensor_qparam variant to fake_quantize_per_tensor (#61317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61317

Add an overload to fake_quantize_per_tensor that accepts scale/zero_point as tensor inputs. The reasons to do this are:

* required for the fused observer + fake_quant operator on GPU, where the scale/zero_point will be calculated by the observer on device. Passing tensor inputs enables us to directly access the scale/zero-point values in the CUDA kernel and avoid extra copies/mallocs
* enables us to pass in float as the scale dtype and int32 as the zero_point dtype (which is consistent with what the quantize call actually uses) https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/affine_quantizer_base.cpp#L52-L53
* overload consistent with `quantize_per_tensor.tensor_qparams` (a minimal sketch follows below)
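
A minimal sketch of the new overload in use (quant_min/quant_max values are illustrative):
```
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(64, device=device)
scale = torch.tensor([0.05], device=device)                        # float scale stays on device
zero_point = torch.tensor([0], dtype=torch.int32, device=device)   # int32 zero_point stays on device

# tensor qparams: no .item()/host copy needed to read scale/zero_point
y = torch.fake_quantize_per_tensor_affine(x, scale, zero_point, 0, 255)
```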
ghstack-source-id: 133370216

Test Plan:
buck test mode/dev-nosan caffe2/test/:quantization -- test_backward_per_tensor_cachemask
buck test mode/dev-nosan caffe2/test/:quantization -- test_forward_per_tensor_cachemask

Reviewed By: raghuramank100

Differential Revision: D29552727

fbshipit-source-id: cbb9af40fc575ad27a29c646b760d5ee52cc923d
2021-07-10 19:41:55 -07:00
Supriya Rao
a4d86e0d53 [quant][fx][perf] improve runtime of prepare step for large models (#61132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61132

For large models, the insert_observers_for_model function was taking a long time, especially in the case where not all of the nodes are being quantized.

For example, for a model with 21000 nodes of which only ~50 are being quantized, the breakdown of prepare_fx vs convert_fx was:

prepare_fx 979 seconds
convert_fx 9 seconds

The main reason was that we were doing some unnecessary computation for all nodes in this function; this PR just moves that work to where it is actually used.

After this PR:
prepare_fx 26 seconds
convert_fx 9 seconds

Test Plan:
Existing tests

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D29522303

fbshipit-source-id: 7ce12582a859d02ff763abebf4a592d28e0764ca
2021-07-01 17:17:10 -07:00
Angela Yi
dabadd7e20 [quant] Added reset_min_max_vals() function to observers (#60883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60883

As per this [comment](https://github.com/pytorch/pytorch/pull/59964#discussion_r659064270), I created a `reset_min_max_vals()` function inside the observers which will be called during input-weight equalization. This is so that we will not expose the implementation of the observers in the equalization code.
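
A minimal sketch of the new hook (observer choice and data are illustrative):
```
import torch
from torch.quantization import MinMaxObserver

obs = MinMaxObserver()
obs(torch.randn(4, 4))        # accumulate min/max from calibration data
obs.reset_min_max_vals()      # clear the recorded stats
obs(torch.randn(4, 4) * 0.5)  # re-observe, e.g. after equalization scales the input
```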

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29491848

fbshipit-source-id: 00e91959ceb3b4f3688175a1a7ba11823e929b2f
2021-06-30 14:22:08 -07:00
Angela Yi
1a0195db49 [quant] Input-Weight Equalization - support for LinearReLU layers (#60653)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60653

Special casing was needed to get the weight attribute in the linear layers of fused LinearReLU layers.

Initial Model: `x -> linear1 -> relu`

After fusion: `x -> linearRelu`

After prepare: `x -> input_quant_obs -> input_eq_obs1 -> linearRelu -> output_quant_obs1`

After equalization functions: `x -> mul -> input_quant_obs (scaled) -> linearRelu -> output_quant_obs`

After convert: `x -> mul -> quantize_per_tensor -> quantized::linearRelu -> dequantize`

More step-throughs here: https://fb.quip.com/A9J3AsBxkykR
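
For reference, a minimal sketch of the special-cased weight lookup (assuming the fused module behaves like a Sequential, so the linear weight lives on the inner submodule rather than on the fused module itself):
```
import torch.nn as nn
import torch.nn.intrinsic as nni

fused = nni.LinearReLU(nn.Linear(5, 5), nn.ReLU())
weight = fused[0].weight   # the weight sits on the inner Linear
```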

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Original model:
```
LinearReluModel(
  (fc): Linear(in_features=5, out_features=5, bias=True)
  (relu): ReLU()
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```

Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %fc_input_scale_0 : [#users=1] = get_attr[target=fc_input_scale_0]
    %fc_input_zero_point_0 : [#users=1] = get_attr[target=fc_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %fc_input_scale_0, %fc_input_zero_point_0, torch.quint8), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%fc,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29406999

fbshipit-source-id: add38e8e7fb84a241c3b10bfb8451b50103effd4
2021-06-30 14:22:06 -07:00
Vasiliy Kuznetsov
5576c7bdd1 ns for fx: initial support for int8 shadows fp32 (#60419)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60419

Adds support for NS for FX shadowed activations pass to handle int8
modules shadowing fp32 modules. The difficulty here is that in order
to insert the dtype cast, we need the qparams of the input.

For the current PR, we only handle the easy cases where the previous
node is either a `quantize_per_tensor` or an OSS quantized module.
A future PR can handle more complicated cases such as various functions.
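
A hedged sketch of how the qparams for the dtype cast could be recovered for the two easy cases above (a hypothetical helper, not the actual NS implementation; `modules` is assumed to be the name-to-module dict of the graph module):
```
import torch
from torch.fx import Node

def get_prev_qparams(prev_node: Node, modules):
    # case 1: previous node is a quantize_per_tensor call
    if prev_node.op == "call_function" and prev_node.target is torch.quantize_per_tensor:
        _, scale, zero_point, dtype = prev_node.args
        return scale, zero_point, dtype
    # case 2: previous node is a quantized module, which exposes its qparams
    if prev_node.op == "call_module":
        mod = modules[prev_node.target]
        return mod.scale, mod.zero_point, torch.quint8
    raise NotImplementedError("only quantize_per_tensor / quantized modules are handled")
```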

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_int8_shadows_fp32_simple
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29280050

fbshipit-source-id: 465257c9f82a34fa91b48ae8887355c68e00edc6
2021-06-30 08:08:46 -07:00
Angela Yi
9b94aa5356 [quant][fx][fix] Fused modules with object_type in qconfig (#60779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60779

When we do fusion, we replace certain modules (such as Linear + ReLU) with fused versions (such as LinearReLU) by calling `_fuse_fx` in prepare_fx. However when we try to look up using the fused module type in qconfig_dict, we cannot find a match anymore since the qconfig dict contains the original module types. An example is here [N882873](https://fburl.com/anp/azenjx3v).

So we will now update the qconfig_dict to include a mapping from the fused module types to the qconfigs used for the modules that make them up. If those modules are not mapped to the same qconfig, we will raise an error.
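
A hedged sketch of that qconfig_dict update (hypothetical helper names; only the LinearReLU pair is shown):
```
import torch.nn as nn
import torch.nn.intrinsic as nni

FUSED_TO_PARTS = {nni.LinearReLU: (nn.Linear, nn.ReLU)}

def update_qconfig_for_fusion(object_type_qconfigs):
    # map each fused module type to the qconfig of its constituent modules
    for fused, parts in FUSED_TO_PARTS.items():
        qconfigs = [object_type_qconfigs[p] for p in parts if p in object_type_qconfigs]
        if not qconfigs:
            continue
        if any(q != qconfigs[0] for q in qconfigs):
            raise ValueError("fused modules must share the same qconfig")
        object_type_qconfigs[fused] = qconfigs[0]
```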

Test Plan:
`python test/test_quantization.py TestFuseFx.test_qconfig_fused_module`

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29406941

fbshipit-source-id: 74b5db89f4998aeb02b2bf7c37bf97326580c654
2021-06-28 15:22:22 -07:00
Angela Yi
dfb9c0bae8 [quant] Input-Weight Equalization - support for connected F.linear layer (#60272)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60272

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Original model:
```
FunctionalLinear2Module(
  (linear1): Linear()
  (linear2): Linear()
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_w_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_0_equalization_process_0](args = (%linear1_w_activation_post_process_0,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0_equalization_process_0, %linear1_w_activation_post_process_0_equalization_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    %linear_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0_equalization_process_0](args = (%linear_activation_post_process_0,), kwargs = {})
    %linear2_w : [#users=1] = get_attr[target=linear2.w]
    %linear2_w_activation_post_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_0](args = (%linear2_w,), kwargs = {})
    %linear2_w_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_0_equalization_process_0](args = (%linear2_w_activation_post_process_0,), kwargs = {})
    %linear2_b : [#users=1] = get_attr[target=linear2.b]
    %linear_1 : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%linear_activation_post_process_0_equalization_process_0, %linear2_w_activation_post_process_0_equalization_process_0), kwargs = {bias: %linear2_b})
    %linear_1_activation_post_process_0 : [#users=1] = call_module[target=linear_1_activation_post_process_0](args = (%linear_1,), kwargs = {})
    return linear_1_activation_post_process_0
```

Graph after equalization steps:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    %linear2_w : [#users=1] = get_attr[target=linear2.w]
    %linear2_w_activation_post_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_0](args = (%linear2_w,), kwargs = {})
    %linear2_b : [#users=1] = get_attr[target=linear2.b]
    %linear_1 : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%linear_activation_post_process_0, %linear2_w_activation_post_process_0), kwargs = {bias: %linear2_b})
    %linear_1_activation_post_process_0 : [#users=1] = call_module[target=linear_1_activation_post_process_0](args = (%linear_1,), kwargs = {})
    return linear_1_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %linear1_input_scale_0 : [#users=1] = get_attr[target=linear1_input_scale_0]
    %linear1_input_zero_point_0 : [#users=1] = get_attr[target=linear1_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear1_input_scale_0, %linear1_input_zero_point_0, torch.quint8), kwargs = {})
    %linear1_packed_weight_0 : [#users=1] = get_attr[target=linear1_packed_weight_0]
    %linear1_scale_0 : [#users=1] = get_attr[target=linear1_scale_0]
    %linear1_zero_point_0 : [#users=1] = get_attr[target=linear1_zero_point_0]
    %linear : [#users=1] = call_function[target=torch.ops.quantized.linear](args = (%quantize_per_tensor, %linear1_packed_weight_0, %linear1_scale_0, %linear1_zero_point_0), kwargs = {})
    %linear2_packed_weight_0 : [#users=1] = get_attr[target=linear2_packed_weight_0]
    %linear2_scale_0 : [#users=1] = get_attr[target=linear2_scale_0]
    %linear2_zero_point_0 : [#users=1] = get_attr[target=linear2_zero_point_0]
    %linear_1 : [#users=1] = call_function[target=torch.ops.quantized.linear](args = (%linear, %linear2_packed_weight_0, %linear2_scale_0, %linear2_zero_point_0), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear_1,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29267218

fbshipit-source-id: 6b97bed1a307f1d0b1f5efcbecf41f35418242f7
2021-06-28 10:44:27 -07:00
Angela Yi
ddf2ce03bb [quant] Input-Weight Equalization - support for connected linear layers (#60034)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60034

Added support for equalizing models with connected linear
layers. To account for connected linear layers, we will additionally
multiply the previous weight values (row-wise) by the next equalization
scale, and remove the input equalization observer between the two linear
layers. We also want to scale the bias by the next equalization scale.
The math is shown here: https://fb.quip.com/fK8rA9aRM4ca .
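
A hedged sketch of the rescaling step described above (illustrative only; `next_equalization_scale` is assumed to have one entry per output channel of the previous linear):
```
import torch

def rescale_previous_linear(weight, bias, next_equalization_scale):
    # multiply each output row of the previous linear's weight by the next scale
    weight = weight * next_equalization_scale.reshape(-1, 1)
    # scale the bias by the next equalization scale as well
    bias = bias * next_equalization_scale
    return weight, bias
```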

Original Model: `x -> linear1 -> linear2`
After `prepare_fx`: `x -> InpEqObs -> InpQuantObs -> linear1 ->
OutQuantObs -> InpEqObs -> linear2`
After equalization: `x -> mul -> InpQuantObs -> linear1 -> OutQuantObs
-> linear2`

Test Plan:
`python test/test_quantization.py
TestEqualizeFx.test_input_weight_equalization_convert`

Original Model:
```
Linear2Module(
  (linear1): Linear(in_features=2, out_features=2, bias=True)
  (linear2): Linear(in_features=2, out_features=2, bias=True)
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %linear1 : [#users=1] = call_module[target=linear1](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %linear1_activation_post_process_0 : [#users=1] = call_module[target=linear1_activation_post_process_0](args = (%linear1,), kwargs = {})
    %linear1_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear1_activation_post_process_0_equalization_process_0](args = (%linear1_activation_post_process_0,), kwargs = {})
    %linear2 : [#users=1] = call_module[target=linear2](args = (%linear1_activation_post_process_0_equalization_process_0,), kwargs = {})
    %linear2_activation_post_process_0 : [#users=1] = call_module[target=linear2_activation_post_process_0](args = (%linear2,), kwargs = {})
    return linear2_activation_post_process_0
```

Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0_equalization_process_0_scale : [#users=1] = get_attr[target=x_activation_post_process_0_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_activation_post_process_0_equalization_process_0_scale), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %linear1 : [#users=1] = call_module[target=linear1](args = (%x_activation_post_process_0,), kwargs = {})
    %linear1_activation_post_process_0 : [#users=1] = call_module[target=linear1_activation_post_process_0](args = (%linear1,), kwargs = {})
    %linear2 : [#users=1] = call_module[target=linear2](args = (%linear1_activation_post_process_0,), kwargs = {})
    %linear2_activation_post_process_0 : [#users=1] = call_module[target=linear2_activation_post_process_0](args = (%linear2,), kwargs = {})
    return linear2_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0_equalization_process_0_scale : [#users=1] = get_attr[target=x_activation_post_process_0_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_activation_post_process_0_equalization_process_0_scale), kwargs = {})
    %linear1_input_scale_0 : [#users=1] = get_attr[target=linear1_input_scale_0]
    %linear1_input_zero_point_0 : [#users=1] = get_attr[target=linear1_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear1_input_scale_0, %linear1_input_zero_point_0, torch.quint8), kwargs = {})
    %linear1 : [#users=1] = call_module[target=linear1](args = (%quantize_per_tensor,), kwargs = {})
    %linear2 : [#users=1] = call_module[target=linear2](args = (%linear1,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear2,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29204347

fbshipit-source-id: 6bb9e25e2468f50df523885ded2edc731f002ac1
2021-06-28 10:44:25 -07:00
Angela Yi
7917318917 [quant] Input-Weight Equalization - support for F.linear layers (#59964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59964

Input-Weight Equalization support for functional layers

Test Plan:
`python test/test_quantization.py
TestEqualizeFx.test_input_weight_equalization_convert`

Original model:
```
FunctionalLinearModule(
  (linear1): Linear()
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_equalization_process_0 : [#users=1] = call_module[target=linear1_w_equalization_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_00](args = (%linear1_w_equalization_process_0,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```

Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%mul,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_equalization_process_0 : [#users=1] = call_module[target=linear1_w_equalization_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_00](args = (%linear1_w_equalization_process_0,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %linear1_input_scale_0 : [#users=1] = get_attr[target=linear1_input_scale_0]
    %linear1_input_zero_point_0 : [#users=1] = get_attr[target=linear1_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear1_input_scale_0, %linear1_input_zero_point_0, torch.quint8), kwargs = {})
    %linear1_packed_weight_0 : [#users=1] = get_attr[target=linear1_packed_weight_0]
    %linear1_scale_0 : [#users=1] = get_attr[target=linear1_scale_0]
    %linear1_zero_point_0 : [#users=1] = get_attr[target=linear1_zero_point_0]
    %linear : [#users=1] = call_function[target=torch.ops.quantized.linear](args = (%quantize_per_tensor, %linear1_packed_weight_0, %linear1_scale_0, %linear1_zero_point_0), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29135459

fbshipit-source-id: 1e69bfbb82a0c89538e55b64968effd0b11b2fde
2021-06-28 10:44:24 -07:00
angelayi
e13a9587b4 Revert "Revert D29135358: [quant] Input-Weight Equalization - convert modifications" (#60646)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60646

This reverts commit e60f9cfc58.

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D29361191

Pulled By: angelayi

fbshipit-source-id: 275d8691d8e47da4ab80bb21b51d77ec25a0f714
2021-06-25 15:37:05 -07:00
Vasiliy Kuznetsov
7fc4e67771 ns for fx: fix shadow logger error for resnet18 (#60559)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60559

Adds `resnet18` to the integration test, and fixes the error so that
creating the shadow model works.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_resnet18
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29336236

fbshipit-source-id: 9425aa096162d80ef3a7c98144b2301cfbccc1ea
2021-06-24 13:42:18 -07:00
Vasiliy Kuznetsov
4ddb2b43b7 ns for fx: expose function to add comparisons between logged values (#60311)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60311

Adds a user-facing utility function to the FX Numeric Suite Core APIs
for comparing the values extracted by the loggers to each other.
This is needed for any kind of analysis, so it is useful to
provide an example implementation.

Example:

```
// code

m = nn.Sequential(nn.Conv2d(1, 1, 1), nn.Conv2d(1, 1, 1)).eval()
qconfig_dict = {'': torch.quantization.default_qconfig}
mp = torch.quantization.quantize_fx.prepare_fx(m, qconfig_dict)
mq = torch.quantization.quantize_fx.convert_fx(copy.deepcopy(mp))
results = extract_weights('fp32', mp, 'int8', mq)
extend_logger_results_with_comparison(
    results, 'fp32', 'int8', compute_sqnr, 'sqnr_int8_vs_fp32')

print(results)

// results

{
  '_1': {'weight': {
    'fp32': [
      {'type': 'weight', 'values': [tensor([[[[-0.3284]]]])], 'prev_node_name': '_1', 'prev_node_target_type': "<class 'torch.nn.modules.conv.Conv2d'>", 'ref_node_name': '_1', 'index_within_arg': 0, 'index_of_arg': 0}
    ],
    'int8': [
      {'type': 'weight', 'values': [tensor([[[[-0.3297]]]], size=(1, 1, 1, 1), dtype=torch.qint8,
       quantization_scheme=torch.per_tensor_affine, scale=0.002575645223259926,
       zero_point=0)], 'prev_node_name': '_1', 'prev_node_target_type': "<class 'torch.nn.quantized.modules.conv.Conv2d'>", 'ref_node_name': '_1', 'index_within_arg': 0, 'index_of_arg': 0, 'sqnr_int8_vs_fp32': [tensor(48.1308)]}
    ]
  }},
  '_0': {'weight': {
    'fp32': [{'type': 'weight', 'values': [tensor([[[[0.5205]]]])], 'prev_node_name': '_0', 'prev_node_target_type': "<class 'torch.nn.modules.conv.Conv2d'>", 'ref_node_name': '_0', 'index_within_arg': 0, 'index_of_arg': 0}],
    'int8': [{'type': 'weight', 'values': [tensor([[[[0.5184]]]], size=(1, 1, 1, 1), dtype=torch.qint8,
       quantization_scheme=torch.per_tensor_affine, scale=0.004082232713699341,
       zero_point=0)], 'prev_node_name': '_0', 'prev_node_target_type': "<class 'torch.nn.quantized.modules.conv.Conv2d'>", 'ref_node_name': '_0', 'index_within_arg': 0, 'index_of_arg': 0, 'sqnr_int8_vs_fp32': [tensor(48.1309)]}]
  }}
}

```

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_extend_logger_results_with_comparison
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29244715

fbshipit-source-id: a5547b449ea54e046c752119559be49bd738beea
2021-06-24 13:42:16 -07:00
Vasiliy Kuznetsov
31fe1c1323 ns for fx: rekey results by model node names (#60305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60305

Adjusts the NS for FX weight and activation extraction APIs
to require a model name, and rekeys the results of these APIs
to use the node names of the specified model as layer keys.

For example, before

```
// API call
results = ns.extract_logger_info(
  model_a, model_b, ns.OutputLogger)

// results
{'base_op_1_0': {'node_output':
  {'model_a': [{'ref_node_name': 'linear1', ...}]}}}
```

and after

```
// API call
results = ns.extract_logger_info(
  model_a, model_b, ns.OutputLogger, 'model_b_name')

// results
// note: instead of `base_op_1_0`, the layer is named `linear1`
{'linear1': {'node_output':
  {'model_a': [{'ref_node_name': 'linear1', ...}]}}}
```

Note: we cannot use these names while collecting data because
node names are not guaranteed to be consistent across graphs.
This is why we only rekey as the very last step.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_layer_names
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29243045

fbshipit-source-id: d39ecdfdd18b07291e3ecefed2ede287b100b7d0
2021-06-24 13:41:01 -07:00
lezcano
4e347f1242 [docs] Fix backticks in docs (#60474)
Summary:
There is a very common error when writing docs: One forgets to write a matching `` ` ``, and something like ``:attr:`x`` is rendered in the docs. This PR fixes most (all?) of these errors (and a few others).

I found these running ``grep -r ">[^#<][^<]*\`"`` on the `docs/build/html/generated` folder. The regex finds an HTML tag that does not start with `#` (as python comments in example code may contain backticks) and that contains a backtick in the rendered HTML.

This regex has not given any false positives in the current codebase, so I am inclined to suggest that we should add this check to the CI. Would this be possible / reasonable / easy to do, malfet?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60474

Reviewed By: mrshenli

Differential Revision: D29309633

Pulled By: albanD

fbshipit-source-id: 9621e0e9f87590cea060dd084fa367442b6bd046
2021-06-24 06:27:41 -07:00
Supriya Rao
1120a1b92e [quant][fx][fix] QAT with object_type in qconfig (#60555)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60555

When we do QAT, we swap the FP32 modules with the corresponding quantized module counterparts by calling `qat_swap_modules` in prepare.
However, when we try to look up using the swapped module type in qconfig_dict, we cannot find a match anymore since the qconfig dict contains the original
module type.

In this PR we update the qconfig_dict to include the modules swapped for QAT.
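
A hedged sketch of that update (hypothetical helper; only two swap pairs shown):
```
import torch.nn as nn
import torch.nn.qat as nnqat

QAT_SWAPS = {nn.Linear: nnqat.Linear, nn.Conv2d: nnqat.Conv2d}

def update_qconfig_for_qat(object_type_qconfigs):
    # the QAT module should pick up the qconfig of the float module it replaced
    for float_mod, qat_mod in QAT_SWAPS.items():
        if float_mod in object_type_qconfigs:
            object_type_qconfigs[qat_mod] = object_type_qconfigs[float_mod]
```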

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qconfig_qat_module_type

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29337036

fbshipit-source-id: 60212eec3ee252a2445c1b58874cb36048c9f7dd
2021-06-23 15:55:25 -07:00
Rong Rong (AI Infra)
e60f9cfc58 Revert D29135358: [quant] Input-Weight Equalization - convert modifications
Test Plan: revert-hammer

Differential Revision:
D29135358 (3de79b7757)

Original commit changeset: 2d0005672904

fbshipit-source-id: cac30c1202ebbce4f22e50ed920340c7b4c6849f
2021-06-23 11:23:24 -07:00
Angela Yi
3de79b7757 [quant] Input-Weight Equalization - convert modifications (#59963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59963

When converting, before quantizing the nodes, we call
`update_obs_for_equalization()` and `convert_eq_obs()`.

`update_obs_for_equalization`:
1. For each InputEqualizationObserver, we find the corresponding
WeightEqualizationObserver.
2. For nn.Linear layers, we will create an instance of the
WeightEqualizationObserver, run forward on the observer with the given
weights.
3. Calculate the equalization scale between the
InputEqualizationObserver and WeightEqualizationObserver.

`convert_eq_obs`:
For every InputEqualizationObserver, we will do the following:
1. Create a node (ex. `x0_activation_post_process_scale`) containing the
equalization scale constant.
2. Create another node containing a `mul` operator multiplying the
equalization scale and the input.
3. Remove the current InputEqualizationObserver node, and replace it
with the `mul` node.

For every WeightEqualizationObserver, we will do the following:
1. Get the next equalization scale (we may need this for equalizing
connected linear layers).
2. Scale the weights by multiplying it with the reciprocal of the
current equalization scale and the next equalization scale

Currently, this supports models with `nn.Linear` layers, but does not
support connected linear layers.
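
A hedged sketch of how the equalization scale could be computed from the two observers' per-column ranges (illustrative only, not the actual observer code):
```
import torch

def calculate_equalization_scale(input_min, input_max, weight_min, weight_max):
    input_range = input_max - input_min      # per-column range of the input
    weight_range = weight_max - weight_min   # per-column range of the weight
    # choose the scale so the equalized input and weight ranges match per column
    return torch.sqrt(weight_range / input_range)
```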

Test Plan:
`python test/test_quantization.py
TestEqualizeFx.test_input_weight_equalization_convert`

Original Model:
```
LinearModule(
  (linear): Linear(in_features=2, out_features=2, bias=True)
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%x_activation_post_process_0,), kwargs = {})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```

Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%mul,), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%x_activation_post_process_0,), kwargs = {})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %linear_input_scale_0 : [#users=1] = get_attr[target=linear_input_scale_0]
    %linear_input_zero_point_0 : [#users=1] = get_attr[target=linear_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear_input_scale_0, %linear_input_zero_point_0, torch.quint8), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29135358

fbshipit-source-id: 2d00056729041318463de61841483490b6bfeee5
2021-06-22 20:43:30 -07:00
Supriya Rao
4887c6e401 [quant] avoid resize calls in observer/fake_quant (#60386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60386

During QAT we sometimes encounter errors with scripted models:
`RuntimeError: cannot resize variables that require grad`

For per-tensor cases we don't need to resize some buffers, so this PR removes the extra resize ops where applicable.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29271905

fbshipit-source-id: 01a484a9559a3a4180490f9476d0cd3044ba0d1b
2021-06-22 17:41:43 -07:00
Jerry Zhang
4a3eea9a6a [quant][graphmode][fx] Produce reference linear module in convert (#60152)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60152

Test Plan:
python test/test_quantization.py TestQuantizeFx

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29188263

fbshipit-source-id: f7bbbef5d4d747eadf7a627a4e77a5ec9bb0bc94
2021-06-20 20:08:12 -07:00
Jerry Zhang
2293ab4e53 [quant][graphmode][fx] Refactor convert for linear to use get_static_module_mapping and get_dynamic_module_mapping (#60151)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60151

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29188264

fbshipit-source-id: d2b77ffcf4b7446fc6c43248e43218092d2a6aea
2021-06-20 19:41:16 -07:00
Jerry Zhang
47d727fe1b [quant][graphmode][fx] Produce conv reference static quant modules (#60138)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60138

Test Plan:
python test/test_quantization.py TestQuantizeFx

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29184791

fbshipit-source-id: 971a40012dbba0cf687c62a3a4af9358513c253b
2021-06-20 19:25:45 -07:00
Jerry Zhang
a029422cae [quant][graphmode][fx][refactor] Change the env map to add dtype as a key (#60054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60054

Before this PR, env in convert was `Dict[str, Tuple[Node, torch.dtype]]`; that is, at a given time each node could only have one dtype.
This causes a problem for the following case:
```
class M(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(1, 1, 1)

        def forward(self, x):
            x = self.conv(x)
            x1 = x.expand_as(x)
            x2 = torch.add(x, x1)
            return x2

def forward(self, x):
    x = self.activation_post_process_0(x)
    x = self.conv(x)
    x = self.activation_post_process_1(x)
    x1 = x.expand_as(x)
    x1 = self.activation_post_process_2(x1)
    x2 = torch.add(x, x1)
    x2 = self.activation_post_process_3(x2)
    return x2

def forward(self, x):
    x = torch.quantize_per_tensor(x, ...)
    x = self.conv(x). # quantized conv
    x = torch.dequantize(x)
    x1 = x.expand_as(x)
    x1 = torch.quantize_per_tensor(x1, ...)
    # Error: x is dequantized
    x2 = torch.ops.quantized.add(x, x1)
    return x2

```

Currently we have an env that is a map from the node name of the observed graph to the Node in the quantized graph. The problem is that following a quantized conv operator we have two operators: one expecting float input (expand_as) and the other expecting quantized input (quantized add). In the quantized graph, ideally, expand_as should consume the dequantized output and quantized add should consume the quantized output:

```
quantized_conv - dequantize - expand_as
  \ ------- quantized_add
```

But currently in env, each node needs to be either quantized or not quantized. Therefore we need to change env to include dtype as well:
`env: Dict[str, Dict[dtype, Node]]`, e.g. `{'x': {torch.float: dequantized_node, torch.quint8: quantized_node}}`.
When we load from the env, we also need to provide the dtype of the Node that we want to load. We can have a separate pass to figure out this information for each node.
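
A hedged sketch of the env lookup this enables (hypothetical helper names):
```
from typing import Dict
import torch
from torch.fx import Node

Env = Dict[str, Dict[torch.dtype, Node]]

def store_node(env: Env, name: str, dtype: torch.dtype, node: Node) -> None:
    env.setdefault(name, {})[dtype] = node

def load_node(env: Env, name: str, dtype: torch.dtype) -> Node:
    # fetch the version of `name` in the quantized graph that has the requested dtype
    return env[name][dtype]
```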

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29149408

fbshipit-source-id: c9e4b7d65444ab6a6f573929bae1db5037629892
2021-06-18 13:31:43 -07:00
Vasiliy Kuznetsov
5a45103139 ns for fx: add API usage logging (#60103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60103

Adds internal logging for NS for FX API usage.

Test Plan: CI

Reviewed By: jerryzh168

Differential Revision: D29166710

fbshipit-source-id: 2a1bf2f6038b0c6c5945b57b2db2de25c585a04a
2021-06-18 10:25:59 -07:00
Philip Meier
d5988c5eca remove unused type: ignore directives (#60006)
Summary:
During development it is common practice to put `type: ignore` comments on lines that are correct but that `mypy` fails to recognize as such. This often stems from the fact that the `mypy` version in use wasn't able to handle the pattern in question.

With every new release `mypy` gets better at handling complex code. In addition to fixing all the previously accepted but now failing patterns, we should also revisit all `type: ignore` comments to see whether they are still needed. Fortunately, we don't need to do this manually: by adding `warn_unused_ignores = True` to the configuration, `mypy` will error out whenever it encounters a `type: ignore` that is no longer needed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60006

Reviewed By: jbschlosser, malfet

Differential Revision: D29133237

Pulled By: albanD

fbshipit-source-id: 41e82edc5cd5affa7ccedad044b59b94dad4425a
2021-06-18 07:23:31 -07:00
Angela Yi
c0b7c59e55 [quant] Equalization Observer modifications (#59953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59953

The following modifications were made to the equalization
observers due to design changes:
- [InputEqualizationObserver] Replaced `calculate_qparams()` with
`calculate_scaled_minmax()` since we will need to return the scaled
min/max values to update the following input quantization observer
- [WeightEqualizationObserver] We no longer need a row observer since
this will be taken care of by the following weight quantization observer
- [WeightEqualizationObserver] Following the previous comment, we no
longer need to calculate the scaled qparam values. Instead, we will use
the equalization scale to later scale the weights and the qparams will
be taken care of by the weight quantization observer.

Test Plan:
`python test/test_quantization.py
TestEqualizeFx.test_input_weight_eq_observer`

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29135332

fbshipit-source-id: be7e468273c8b62fc183b1e1ec50f6bd6d8cf831
2021-06-16 22:32:30 -07:00
Angela Yi
45c31cabb5 [quant] Input Weight Equalization - prepare modifications (#59747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59747

Modifies prepare_fx for input-weight equalization. If a current
node is being equalized (there exists an EqualizationQConfig), then the
EqualizationObserver will be inserted before its quantization observer.

For a singular linear layer, the general flow looks like:
Original graph: `x0 -> linear -> x1`, `w -> linear`
After prepare: `x0 -> InpEqObs -> MinMaxObs -> linear1 -> MinMaxObs -> x1`
  `w -> WeightEqObs -> MinMaxObs -> linear1`

For two connected linear layers, the general flow looks like:
Original graph: `x0 -> linear1 -> linear2 -> x1`,
  `w1 -> linear1`, `w2 -> linear2`
After prepare: `x0 -> InpEqObs -> MinMaxObs -> linear1 -> MinMaxObs -> InpEqObs -> linear2 -> MinMaxObs -> x1`
  `w1 -> WeightEqObs -> MinMaxObs -> linear1`, `w2 -> WeightEqObs -> MinMaxObs -> linear2

Test Plan:
`python test/test_quantization.py
TestEqualizeFx.test_input_equalization_prepare`

Original model with one `nn.Linear` layer
```
LinearModule(
  (linear): Linear(in_features=1, out_features=1, bias=True)
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%x_activation_post_process_0,), kwargs = {})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```
--------------------------------------

Original model with two connected functional linear layers
```
FunctionalLinearModule(
  (linear1): Linear()
  (linear2): Linear()
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_equalization_process_0 : [#users=1] = call_module[target=linear1_w_equalization_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_00](args = (%linear1_w_equalization_process_0,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    %linear_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0_equalization_process_0](args = (%linear_activation_post_process_0,), kwargs = {})
    %linear2_w : [#users=1] = get_attr[target=linear2.w]
    %linear2_w_equalization_process_0 : [#users=1] = call_module[target=linear2_w_equalization_process_0](args = (%linear2_w,), kwargs = {})
    %linear2_w_activation_post_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_00](args = (%linear2_w_equalization_process_0,), kwargs = {})
    %linear2_b : [#users=1] = get_attr[target=linear2.b]
    %linear_1 : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%linear_activation_post_process_0_equalization_process_0, %linear2_w_activation_post_process_0), kwargs = {bias: %linear2_b})
    %linear_1_activation_post_process_0 : [#users=1] = call_module[target=linear_1_activation_post_process_0](args = (%linear_1,), kwargs = {})
    return linear_1_activation_post_process_0
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29135316

fbshipit-source-id: 91697e805ede254dbb2a42ee4c23eb1c1c64590e
2021-06-16 22:32:28 -07:00
Angela Yi
7ce74f3339 [quant] EqualizationQConfig to distinguish input/output activations (#59739)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59739

Created an EqualizationQConfig specifically for equalization.
This inherits from QConfig and is used to distinguish between inserting
an input observer and an output observer. Since the output observer
field is included in the EqualizationQConfig, we no longer need an
output observer field in the _InputEqualizationObserver.
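
A hedged sketch of the shape of such a config (field names and observers here are assumptions, not necessarily the actual ones):
```
import torch
from collections import namedtuple
from torch.quantization.observer import MinMaxObserver

EqualizationQConfig = namedtuple("EqualizationQConfig", ["input_activation", "weight"])

# stand-in observers; the real config would use the equalization observers
eq_qconfig = EqualizationQConfig(
    input_activation=MinMaxObserver.with_args(dtype=torch.quint8),
    weight=MinMaxObserver.with_args(dtype=torch.qint8),
)
```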

Test Plan:
compiles

Imported from OSS

Reviewed By: ezyang

Differential Revision: D29135298

fbshipit-source-id: 3dde9c029c291467ff0a0845f0fc9c44573fc6f6
2021-06-16 22:31:18 -07:00
Jerry Zhang
a344b09db2 [quant][fx][graphmode] Remove Quantizer class (#59606)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59606

Test Plan:
python test/test_quantization.py TestQuantizeFx

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28951432

fbshipit-source-id: 3301f7200a4c7166673c27f9ac7ff559f1e6935d
2021-06-15 21:54:57 -07:00
Supriya Rao
864d129bae [quant][fx] Remove extra q-dq for weight bias in normalization ops (#59882)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59882

Currently for normalization ops, the weight and bias arguments are treated as activation inputs which require observers.
This results in adding extra quant-dequant ops for the weight and bias inputs.

This PR adds support to skip observing the weight/bias inputs of norm operators, thus removing the redundant q-dq ops.

Quantized graph with F.layer_norm
Before this PR
```
def forward(self, x):
    _input_scale_0 = self._input_scale_0
    _input_zero_point_0 = self._input_zero_point_0
    quantize_per_tensor = torch.quantize_per_tensor(x, _input_scale_0, _input_zero_point_0, torch.quint8);  x = _input_scale_0 = _input_zero_point_0 = None
    scale = self.scale
    _input_scale_1 = self._input_scale_1
    _input_zero_point_1 = self._input_zero_point_1
    quantize_per_tensor_1 = torch.quantize_per_tensor(scale, _input_scale_1, _input_zero_point_1, torch.quint8);  scale = _input_scale_1 = _input_zero_point_1 = None
    bias = self.bias
    _input_scale_2 = self._input_scale_2
    _input_zero_point_2 = self._input_zero_point_2
    quantize_per_tensor_2 = torch.quantize_per_tensor(bias, _input_scale_2, _input_zero_point_2, torch.quint8);  bias = _input_scale_2 = _input_zero_point_2 = None
    _scale_0 = self._scale_0
    _zero_point_0 = self._zero_point_0
    dequantize = quantize_per_tensor_1.dequantize();  quantize_per_tensor_1 = None
    dequantize_1 = quantize_per_tensor_2.dequantize();  quantize_per_tensor_2 = None
    layer_norm = torch.ops.quantized.layer_norm(quantize_per_tensor, [2, 5, 5], weight = dequantize, bias = dequantize_1, eps = 1e-05, output_scale = _scale_0, output_zero_point = _zero_point_0);  quantize_per_tensor = dequantize = dequantize_1 = _scale_0 = _zero_point_0 = None
    dequantize_2 = layer_norm.dequantize();  layer_norm = None
    return dequantize_2
```
After
```
def forward(self, x):
    _input_scale_0 = self._input_scale_0
    _input_zero_point_0 = self._input_zero_point_0
    quantize_per_tensor = torch.quantize_per_tensor(x, _input_scale_0, _input_zero_point_0, torch.quint8);  x = _input_scale_0 = _input_zero_point_0 = None
    scale = self.scale
    bias = self.bias
    _scale_0 = self._scale_0
    _zero_point_0 = self._zero_point_0
    layer_norm = torch.ops.quantized.layer_norm(quantize_per_tensor, [2, 5, 5], weight = scale, bias = bias, eps = 1e-05, output_scale = _scale_0, output_zero_point = _zero_point_0);  quantize_per_tensor = scale = bias = _scale_0 = _zero_point_0 = None
    dequantize = layer_norm.dequantize();  layer_norm = None
    return dequantize
```

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_norm_weight_bias

Imported from OSS

Reviewed By: HDCharles, ailzhang

Differential Revision: D29068203

fbshipit-source-id: 24b5c38bbea5fd355d34522bfa654c9db18607da
2021-06-11 16:22:36 -07:00
Vasiliy Kuznetsov
d75e99b709 fx quant: enable qconfig_dict to target function invocations by order (#59605)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59605

Enables targeting of individual function invocations by execution order.
For example, given a module such as

```
class M1(torch.nn.Module):
  def forward(self, x):
    x = torch.add(x, x)
    x = torch.add(x, x)
    return x

class M2(torch.nn.Module):
  def __init__(self):
    super().__init__()
    self.m1 = M1()

  def forward(self, x):
    x = self.m1(x)
    return x
```

We can now target the first add of `m1` with

```
qconfig_dict = {
  "module_name_function_order": ("m1", torch.add, 0, custom_qconfig),
}
```

Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_qconfig_module_name_function_order
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D28951077

fbshipit-source-id: 311d423724a31193d4fa4bbf3a712b46464b5a29
2021-06-11 08:53:40 -07:00
Vasiliy Kuznetsov
0099c25b85 fx quant: remove some dead code in observer insertion (redo) (#59799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59799

This is a redo of #58574, easier to create a new PR than to fix rebase
conflicts, as there have been a large number of refactors to the
underlying code.

Removes some code which was incorrectly added by #57519 but never
actually used for anything.

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29031955

fbshipit-source-id: f407d181070cb283382965952821e3647c705544
2021-06-10 12:57:09 -07:00
Kevin Zheng (FRL)
61965abad7 Move _PartialWrapper to module scope (#59660)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59660

Context https://github.com/pytorch/pytorch/issues/57352

Test Plan: Pytorch CI tests

Reviewed By: vkuzo

Differential Revision: D28972991

fbshipit-source-id: efc9dd3e90e18e1cdf27d5ef0f168abd8169bc42
2021-06-09 11:55:04 -07:00
Jerry Zhang
7dac2987ce [quant][eager][fix] Fix a typo in convert function in eager mode quantization (#59571)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59571

Test Plan:
python test/test_quantization.py TestPostTrainingStatic.test_custom_module_class

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28938355

fbshipit-source-id: 566daeb07d616ae40e52754d3d4581f75f248f04
2021-06-08 10:24:22 -07:00
Angela Yi
cc03ea2c47 [quant] Implemented InputWeightObserver for Linear inputs
Summary: Implemented two observers (InputEqualObserver and WeightEqualObserver) which will be inserted into the graph during prepare_fx().

Test Plan: python test/test_quantization.py TestEqualizeFx

Reviewed By: supriyar

Differential Revision: D28836954

fbshipit-source-id: 25517dc82ae67698ed8b2dc334e3323286976104
2021-06-07 11:19:43 -07:00
KAI ZHAO
1aa14fcb14 Fix the "tensors to be on the same device" error in HistogramObserver (#59234)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59075

This PR fixes the "tensors to be on the same device" error in `HistogramObserver`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59234

Reviewed By: jbschlosser

Differential Revision: D28837572

Pulled By: vkuzo

fbshipit-source-id: ff7c3229ced7de2cdd8f76d526f0fd33ac643216
2021-06-03 13:30:56 -07:00
Jerry Zhang
18642e664a [quant][graphmode][fx][refactor] Split quantize.py to prepare.py and convert.py (#59353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59353

Next: remove Quantizer class

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D28856277

fbshipit-source-id: 25f5502be387dbe9706780f667501b46b82789a5
2021-06-02 23:52:39 -07:00
Jerry Zhang
87a25e09f4 [quant][graphmode][fx][refactor] Remove _convert from Quantizer class (#59042)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59042

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724867

fbshipit-source-id: 9f87d51020caa20d5408cb2820947e23d92d5fc3
2021-06-02 08:50:56 -07:00
Jerry Zhang
3218d890dd [quant][graphmode][fx][fix] Fix support for custom module (#59041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59041

Static quantization support for custom modules was removed in a previous refactor
(https://github.com/pytorch/pytorch/pull/57519) since it was not covered by the test case.
This PR re-enables the test case and fixes the support.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724866

fbshipit-source-id: 1974675b88b56a2173daf86965d6f3fb7ebd783b
2021-06-01 22:31:15 -07:00
Jerry Zhang
06af7618e7 [quant][graphmode][fx][refactor] Remove Quantizer class from convert (QuantizeHandler) (#59040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59040

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724870

fbshipit-source-id: c0f748711b825cd46bdfcc05c054c77a41e8207a
2021-06-01 22:00:49 -07:00
Jerry Zhang
50e6ee3ca2 [quant][graphmode][fx][refactor] Remove Quantizer class from quantize_node (#59039)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59039

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724874

fbshipit-source-id: bd984716b2da1d6879c3e92fa827574783a41567
2021-06-01 21:40:08 -07:00
Jerry Zhang
1d37f41567 [quant][graphmode][fx][refactor] Remove _prepare from Quantizer class (#59038)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59038

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724869

fbshipit-source-id: e8501c9720b5ddb654e78bc8fa08de0466c1d52b
2021-06-01 18:01:22 -07:00
Jerry Zhang
20348fb32e [quant][graphmode][fx][refactor] Remove find_matches from Quantizer class (#59037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59037

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724865

fbshipit-source-id: 6c6824d0af7dd47d4c111d6a08e373bc65f33e08
2021-06-01 16:07:07 -07:00
Jerry Zhang
7d64fc675b [quant][graphmode][fx][refactor] Remove fold_weights from Quantizer class (#59036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59036

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724862

fbshipit-source-id: 5900420127fcc14846bc34c9ac29ff7e6a703f1e
2021-06-01 15:52:57 -07:00
Jerry Zhang
cc4891804c [quant][graphmode][fx][refactor] Remove save_state and restore_state from Quantizer class (#59035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59035

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724872

fbshipit-source-id: d32752c635917c9820e5e7cc414ba9d48a258a19
2021-06-01 15:38:36 -07:00
Jerry Zhang
3d521e8b40 [quant][graphmode][fx][refactor] Remove prepare_custom_config from Quantizer class (#59034)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59034

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724873

fbshipit-source-id: 870e0822843ad1d035f41eaa015bdde9ccf6ec23
2021-06-01 14:52:22 -07:00
Jerry Zhang
e4b2684331 [quant][graphmode][fx][refactor] Remove patterns from Quantizer class (#59033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59033

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724861

fbshipit-source-id: 97b38e851b6bf581510a24636b1d8d6f1d977f5a
2021-06-01 13:44:08 -07:00
Jerry Zhang
83892c1861 [quant][graphmode][fx][refactor] Remove node_name_to_scope from Quantizer (#59032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59032

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724868

fbshipit-source-id: 6df639f20076b480812b6dcf0fc7d2c87ca29d8b
2021-06-01 13:26:09 -07:00
Jerry Zhang
3826f7e8e0 [quant][graphmode][fx][refactor] Remove quantized_graph from Quantizer (#59031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59031

Trying to remove Quantizer class and split prepare and convert code

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724871

fbshipit-source-id: dad0332ba271c4cfb6ec1e8f2036443149b5bea4
2021-06-01 13:01:54 -07:00
Jerry Zhang
1b4586ee20 [quant][gx][graphmode][refactor] Remove modules from Quantizer (#59030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59030

Trying to remove Quantizer class and split prepare and convert code

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724875

fbshipit-source-id: d6610c1d5eb7755331252be9e348a230abf4175c
2021-06-01 12:42:28 -07:00
Jerry Zhang
7523728368 [quant][graphmode][fx] Factor out run_weight_observer (#59029)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59029

Trying to remove Quantizer class and split prepare and convert code

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724864

fbshipit-source-id: 67ac5e7eb351970fdf46532c3c2ac6ac831bc697
2021-06-01 10:01:42 -07:00
Jerry Zhang
10fc42eacc [quant][graphmode][fx] Merge quant_env and env (#59028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59028

Previously we had both an env and a quant_env in convert, which was a bit confusing.
In this PR we merge them into a single Dict[str, Tuple[Node, torch.dtype]].
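
Below is a minimal, illustrative sketch of what the merged environment looks like; the real convert pass derives the dtype from observers and qconfigs, and the module and variable names here are only placeholders.
```
from typing import Dict, Tuple
import torch
import torch.fx

class M(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x)

traced = torch.fx.symbolic_trace(M())

# one map instead of env + quant_env: node name -> (converted node, dtype)
env: Dict[str, Tuple[torch.fx.Node, torch.dtype]] = {}
for node in traced.graph.nodes:
    # in the real convert pass the dtype comes from observers / qconfig;
    # torch.float is just a placeholder here
    env[node.name] = (node, torch.float)
```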

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724863

fbshipit-source-id: 722a682c70d300a6ccd2b988786a1ac2d45e880e
2021-06-01 09:21:38 -07:00
Jing Shan
a1806134a7 [QAT] Fix the runtime error: cannot resize variables that require grad (#57068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57068

When training with histogram observer on, we got this runtime error:
```
torch/quantization/observer.py", line 942, in forward
                    self.bins)

            self.histogram.resize_(combined_histogram.shape)
            ~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
            self.histogram.copy_(combined_histogram)
            self.min_val.resize_(combined_min.shape)
RuntimeError: cannot resize variables that require grad
```

Since the histogram observer is only used to collect histogram information, it should not need gradients, so we turn off grad before resizing by using the `detach_()` method.
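
A minimal standalone repro of the failure and of the fix described above (this is not the actual observer code, just an illustration of the `detach_()` + `resize_()` interaction):
```
import torch

histogram = torch.zeros(10, requires_grad=True)
try:
    histogram.resize_(20)
except RuntimeError as e:
    print(e)  # cannot resize variables that require grad

histogram.detach_()    # the buffer only collects statistics, no grad needed
histogram.resize_(20)  # now succeeds
```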

Test Plan:
- arc lint
- Train with histogram observer turned on, training finished successfully

f264139727

Reviewed By: supriyar

Differential Revision: D27147212

fbshipit-source-id: abed5b9c4570ffc6bb60e58e64791cfce66856cd
2021-05-27 09:12:06 -07:00
Jing Shan
25ac647f64 [QAT] Auto-format torch/quantization/observer.py (#57067)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57067

auto format the code

Test Plan: lint

Reviewed By: jerryzh168

Differential Revision: D27147213

fbshipit-source-id: 008871d276c8891b2411549e17617e5c27d16ee3
2021-05-27 09:10:34 -07:00
Basil Hosmer
58d1b3639b fix nn.MHA scriptability (#58727)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58727

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28593830

Pulled By: bhosmer

fbshipit-source-id: 37dee9efededaea9985a2bf040df1ba4b46f6580
2021-05-26 15:29:49 -07:00
Adnios
09a8f22bf9 Add mish activation function (#58648)
Summary:
See issue: https://github.com/pytorch/pytorch/issues/58375

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58648

Reviewed By: gchanan

Differential Revision: D28625390

Pulled By: jbschlosser

fbshipit-source-id: 23ea2eb7d5b3dc89c6809ff6581b90ee742149f4
2021-05-25 10:36:21 -07:00
KAI ZHAO
de845020a0 fix docstring for fusing functions (#58638)
Summary:
This PR fixes docstrings of fusing functions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58638

Reviewed By: H-Huang

Differential Revision: D28584501

Pulled By: jerryzh168

fbshipit-source-id: 77a53a709d968df8ba8f5b613ad7cf225ba2826a
2021-05-24 18:27:22 -07:00
Angela Yi
a5250425e0 [quant] Eager mode equalization support for ConvReLU and LinearReLU (#58792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58792

Enables support for fused modules such as ConvReLU and LinearReLU in eager mode cross-layer equalization.

Test Plan:
`python test/test_quantization.py TestEqualizeEager`

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D28647242

fbshipit-source-id: 286e057ce70aa7de45d575afd6c13e55120ff18a
2021-05-24 17:25:13 -07:00
Jerry Zhang
f29e75c4dc [reland][quant][fx][graphmode][refactor] Remove qconfig_map from Quantizer (#58455) (#58756)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58756

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Imported from OSS

Reviewed By: supriyar

Differential Revision: D28607564

fbshipit-source-id: 979cf165941bb3a9044d03077a170b5ea64dc36a
2021-05-24 14:57:45 -07:00
Angela Yi
e574c2c025 [quant][fx] Validate qconfig_dict keys (#58566)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58566

Validates the keys of the qconfig_dict, prepare_custom_config_dict, convert_custom_config_dict, and
fuse_custom_config_dict. If the user passes in an invalid key or makes a typo, we will throw an error and let the user know which keys are supported.
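
A rough sketch of the validation idea; the allowed key set below is illustrative rather than the exact set checked in this PR:
```
ALLOWED_QCONFIG_DICT_KEYS = {"", "object_type", "module_name", "module_name_regex"}

def check_is_valid_qconfig_dict(qconfig_dict):
    for k in qconfig_dict:
        if k not in ALLOWED_QCONFIG_DICT_KEYS:
            raise ValueError(
                f"Expected qconfig_dict to have keys in {ALLOWED_QCONFIG_DICT_KEYS}, "
                f"but found '{k}'"
            )

try:
    check_is_valid_qconfig_dict({"module_nam": None})  # typo in key
except ValueError as e:
    print(e)
```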

Test Plan:
Imported from OSS

python test/test_quantization.py

Reviewed By: jerryzh168

Differential Revision: D28540923

fbshipit-source-id: 5958c32017b7d16abd219aefc8e92c42543897c2
2021-05-21 15:20:05 -07:00
Horace He
21a9334034 Revert D28497967: [quant][fx][graphmode][refactor] Remove qconfig_map from Quantizer
Test Plan: revert-hammer

Differential Revision:
D28497967 (1cf8f7a439)

Original commit changeset: 421ce3d86fad

fbshipit-source-id: b1b290be47d847ab0e0128e3ae89f528578550ee
2021-05-20 20:56:12 -07:00
Jerry Zhang
1cf8f7a439 [quant][fx][graphmode][refactor] Remove qconfig_map from Quantizer (#58455)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58455

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28497967

fbshipit-source-id: 421ce3d86fadd3d92f4120b850b0167270509189
2021-05-20 20:34:47 -07:00
Jerry Zhang
b6dcdeacc9 [quant][graphmode][fx] Move qat_swap_modules outside of Quantizer (#58454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58454

Trying to remove Quantizer in the end

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28497966

fbshipit-source-id: 800f8e4afd99918d7330345f8ae7bcf018a5bde7
2021-05-20 17:27:49 -07:00
johnlu
618be18a41 Enable the quantization on XPU devices (#54857)
Summary:
Enable quantization on XPU devices. Keep the model as-is if it is already on an XPU device.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54857

Reviewed By: ailzhang

Differential Revision: D28501381

Pulled By: jerryzh168

fbshipit-source-id: 6d3e9b04075393248b30776c69881f957a1a837c
2021-05-20 17:02:13 -07:00
Jerry Zhang
f879e70fc1 [quant][fx][graphmode][refactor] Factor out generate_qconfig_map to qconfig_utils.py (#58453)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58453

Moves the class method generate_qconfig_map to qconfig_utils. More PRs will follow to move
functions out of Quantizer and eventually remove the Quantizer object.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28497965

fbshipit-source-id: 3c78cfe676965d20a8834a859ffed4d8e9ecade4
2021-05-20 16:26:24 -07:00
Jerry Zhang
4668d09ca6 [quant][graphmode][fx] Quantize the output of statically quantized fp16 op in QuantizeHandler (#58445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58445

Previously the output of a statically quantized fp16 operator was not quantized in QuantizeHandler, which is inconsistent with
the behavior of static int8 operators. It also does not work well with reference functions. This PR
changes the fp16 static QuantizeHandler to quantize the output (by calling to(torch.float16)) in the QuantizeHandler, which also
makes future support for reference functions easier.
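
A hedged sketch of the idea: analogous to how int8 handlers quantize their output with quantize_per_tensor, the fp16 static handler now "quantizes" its output by casting to float16 (the function name below is invented):
```
import torch
import torch.nn.functional as F

def fp16_static_linear(x, w):
    # float compute, then "quantize" the output by casting to fp16,
    # mirroring what int8 handlers do with quantize_per_tensor
    y = F.linear(x, w)
    return y.to(torch.float16)

out = fp16_static_linear(torch.randn(2, 3), torch.randn(4, 3))
print(out.dtype)  # torch.float16
```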

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28495830

fbshipit-source-id: 2140eab8ab2dd08f6570d9e305485e3029e1f47d
2021-05-20 16:03:42 -07:00
Emily Shen
07da584dbd Fix KeyError returned by _maybe_get_last_node_only_observer (#58443)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58443

Test Plan: arc lint

Reviewed By: vkuzo

Differential Revision: D28494119

fbshipit-source-id: 05abf4e12051afc237096812fb0ee08a8b9447f9
2021-05-18 12:41:19 -07:00
Vasiliy Kuznetsov
821a97595b fx quant: improve performance of all_node_args_have_no_tensors (#58461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58461

Improves the logic which calculates whether a node has any tensors
in its arguments by terminating the recursion early when possible.

In a future PR, we should probably ditch this entire approach and switch to
using dtype propagation.
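
Roughly, the change amounts to returning as soon as a tensor argument is found instead of always walking the whole argument tree; a simplified sketch (not the real helper):
```
def args_have_no_tensors(args, produces_tensor):
    # produces_tensor(a) decides whether a single argument carries a tensor
    for a in args:
        if isinstance(a, (list, tuple)):
            if not args_have_no_tensors(a, produces_tensor):
                return False  # found a tensor, stop recursing early
        elif produces_tensor(a):
            return False      # found a tensor, stop recursing early
    return True
```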

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D28499455

fbshipit-source-id: bedd844022b90e1fcb7d7a3cb4cc65440dc9cc59
2021-05-18 07:19:59 -07:00
James Reed
7b73fdf597 [FX] Fix retracing wrapped functions (#58061)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58061

Test Plan: Imported from OSS

Reviewed By: yuhc

Differential Revision: D28358801

Pulled By: jamesr66a

fbshipit-source-id: c7c9a8a80e5bfe1eb1f6d2cf858ac7e57153a860
2021-05-17 19:50:16 -07:00
Vasiliy Kuznetsov
c156a4ffaa fx quant: fix crash on output dicts and lists (#58416)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58416

https://github.com/pytorch/pytorch/pull/57519 had a regression not
caught by CI: it added an assertion which failed on various model
output types.

This PR removes the assertion and adds the logic to observe graph
outputs in a way that supports arbitrary output formats.
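
Conceptually, the output observation now recurses through whatever structure the model returns instead of asserting on a single Tensor; a minimal sketch with made-up names:
```
def map_over_output(maybe_observe, output):
    if isinstance(output, (list, tuple)):
        return type(output)(map_over_output(maybe_observe, o) for o in output)
    if isinstance(output, dict):
        return {k: map_over_output(maybe_observe, v) for k, v in output.items()}
    return maybe_observe(output)  # a single node / Tensor
```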

Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_output_lists_and_dicts
```

Imported from OSS

Reviewed By: z-a-f

Differential Revision: D28479946

fbshipit-source-id: bcce301f98a057b134c0cd34ab0ca96ba457863f
2021-05-17 15:02:09 -07:00
Vasiliy Kuznetsov
4f50fdc2a3 fx quant: refactor observer insertion
Summary:
tl;dr; rewrites the FX graph mode quantization observer insertion to be easier to understand and extend.
The key conceptual difference from before is:
* before: for each node, observers are always inserted to the output of the current node, even if they are needed for the next node. This is hard to reason about.
* after: for each node, observers are inserted to the inputs (if needed, as calculated by the dtype of the argument and dtype of current node) and to the output (if needed for the type of pattern and qconfig).  There is no knowledge of future nodes needed to insert observers for the current node.

This allows us to significantly simplify various things:
* all new observers needed for a node are inserted together.  This makes it easier to understand and debug things.  We add an invariant that node X will never change any observers inserted by any preceding or subsequent node, so to debug an issue the user can just understand what is happening for node X, without having to understand what happens before or after it.
* all the state tracking of activation_post_process_map and activation_post_process_indices are removed, instead observers are looked up by graph traversals
* since there is no longer a need for overlapping graph passes which mutate each other's intermediate state, it is easier to understand what the rules are for inserting observers, and to create new rules in the future (a rough sketch of the per-node flow follows below).
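
A very rough sketch of the per-node flow (the real code lives in torch/quantization/fx; the names, signatures and data shapes below are invented for illustration only):
```
def insert_observers(nodes, input_needs_obs, output_needs_obs):
    result = []
    for node in nodes:
        # 1. observers for this node's inputs, decided only from local dtype info
        args = [("obs", a) if input_needs_obs(a, node) else a for a in node["args"]]
        result.append({"name": node["name"], "args": args})
        # 2. observer for this node's output, decided by the pattern and qconfig
        if output_needs_obs(node):
            result.append({"name": node["name"] + "_obs", "args": [node["name"]]})
    return result
```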

Test Plan:
```
# all OSS tests pass
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Differential Revision: D28241864

Reviewed By: jerryzh168

Pulled By: vkuzo

fbshipit-source-id: 950d58972d26362808564cc0a2dfb30413a3734d
2021-05-15 09:51:33 -07:00
Vasiliy Kuznetsov
49adac65c4 ns for fx: clean up manual string names of related ops (#57210)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57210

Removes the manually specified string name for sets of
related ops, and replaces it with an automatically generated
index. The manual name was arbitrary and ok for an MVP, but
is not safe for wide usage.

Also, adds APIs for users to add custom functions to the
relatedness map by either pairing it to a known function
or creating a new relatedness set.

Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D28077977

fbshipit-source-id: e64a1ad6cd063014d74cdad189b0a612b1143435
2021-05-05 06:30:32 -07:00
Vasiliy Kuznetsov
76f29d53bf ns for fx: change matching to only match known types (#57186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57186

Before this PR, we matched any pair of nodes with equal or related
types.

This PR changes the behavior to only match nodes whose type is in
the allowlist (the relatedness mappings). This will prevent matching
user defined modules, unless users add them to the mappings.

This is motivated by a couple of things:
1. if user defined types are matched, it can break scriptability of the
   model with loggers attached. This happens whenever the user module
   has a return type of anything other than a Tensor or a tuple of
   Tensors.
2. we tried the past behavior on a couple of models, and it hasn't been
   useful.

Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher
python test/test_quantization.py TestFXGraphMatcherModels
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D28077981

fbshipit-source-id: 0a698e52b807cda47e6923310448a985b26eb362
2021-05-05 06:30:30 -07:00
Vasiliy Kuznetsov
44bb15cfd3 ns for fx: add more type to relationship mapping (#57184)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57184

Add remaining types to the relationship mapping to have full coverage
of ops quantization knows about, except binary ops and RNNs.

Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher.test_op_relationship_mapping
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D28077979

fbshipit-source-id: 0f6070c8a995032978702d088803f89ff25f2a7f
2021-05-05 06:30:29 -07:00
Vasiliy Kuznetsov
a9dc9535f6 ns for fx: move relatedness mapping to mappings file (#57171)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57171

No logic change, just moving the mapping to a file where
the other mappings are.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D28077978

fbshipit-source-id: 4049d6a498156a5dffe3a03d2f4abc79da7bf907
2021-05-05 06:29:11 -07:00
Vasiliy Kuznetsov
7c3a30fd79 fx quant: remove matching hack for binary qhandler (#57470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57470

Removes the earlier hack of matching patterns originally matched
to BinaryOpQuantizeHandler to switch to CopyHandler. After this PR,
each pattern can only be matched to one type of QuantizeHandler or
to nothing.

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D28152909

fbshipit-source-id: afc285e770bd7eb0518c90e3ee4874c421e78bbc
2021-05-04 16:38:56 -07:00
Jerry Zhang
945c93b8bd [quant][graphmode][fx] Skip observering boolean Tensors (#57375)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57375

Skip observing the input for masked_fill. Currently we don't have a way to
query the type of a Proxy in GraphModule; once we have the functionality to annotate types,
we'll need to annotate the Proxy as a boolean Tensor to remove this hack.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_boolean_tensor

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28126003

fbshipit-source-id: 2989766370a607579b3ea07ca36cdc2ce35893cc
2021-05-03 11:20:33 -07:00
Vasiliy Kuznetsov
da51fd31a5 fx quant: remove find_quants from convert (#57402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57402

This is a cleanup, the value is not used by anything. It was
probably left behind after previous refactors.

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D28133622

fbshipit-source-id: 44a3f955d4af8d6dd15b4fb3038188568e4ee549
2021-05-02 20:13:13 -07:00
Vasiliy Kuznetsov
d6563bc153 fx quant: remove unnecessary quants arguments (#57399)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57399

There were a couple of functions which took `quants` as arguments
without using them, probably left over from past refactors.
Cleaning this up to improve code readability.

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D28132413

fbshipit-source-id: 636b146c0b5ef0caea9c4b539e245de245d48c49
2021-05-02 20:13:12 -07:00
Vasiliy Kuznetsov
643f41be61 fx quant: remove FixedQParamsOpQuantizeHandler from quantize.py (#57393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57393

Moves the information on whether the output should be marked as quantized based
on the inputs to live on the qhandler object.  This allows us to remove
FixedQParamsOpQuantizeHandler from quantize.py, further reducing
the coupling between handler objects and the quantization pass.

Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: astaff

Differential Revision: D28132414

fbshipit-source-id: 5c28524b47c00f618d3a38657376abae9e6ffe7c
2021-05-02 20:13:10 -07:00
Vasiliy Kuznetsov
2bd158386a fx quant: move input_output_observed to qhandler (#57388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57388

It's a bit confusing to have this be a decorator. It's simpler to
just expose it as a function on qhandler.

Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D28129411

fbshipit-source-id: f7316f285e8546c67e8d8cf753462b2c2abb2636
2021-05-02 20:13:08 -07:00
Vasiliy Kuznetsov
1b20eeb138 fx quant: move output obs logic to QuantizeHandler (#57377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57377

Moves the logic which determines
1. whether a pattern instance's output should be observed
2. whether a pattern instance's output should be marked as observed based on its inputs
3. whether to override the activation specified in the qconfig

from `quantize.py` to `quantization_patterns.py`.  This makes
the code easier to read and reduces the coupling between `Quantizer`
and `QuantizeHandler` instances.

Note: there are some further cleanups which would be good after this one
- leaving those for future PRs.

Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D28126896

fbshipit-source-id: 94c80a9c7307452783348d65b402acc84983e3f6
2021-05-02 20:13:07 -07:00
Vasiliy Kuznetsov
fe23881e76 fx quant: readability improvements on observer functions (#57368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57368

1. renames functions which only sometimes insert observers to start with `maybe_`,
to clarify the difference from functions which always insert observers
2. saves a level of indent in `maybe_insert_observer_for_output_of_the_node`

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D28126897

fbshipit-source-id: 4cbc184dbf5e85954314cfbbcdd1551474175bf0
2021-05-02 20:13:05 -07:00
Vasiliy Kuznetsov
db6cd42434 fx quant: clean up nit in insert_observer (#57367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57367

This code is never hit (see insert_observer_for_output_of_the_node
which gates it out), so changing to an assert in order to
have `insert_observer` actually always insert an observer.
This helps code readability.

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D28126898

fbshipit-source-id: 411bc37769a6eacbebc463ed6c84cac85871bd5e
2021-05-02 20:12:10 -07:00
Vasiliy Kuznetsov
bb640efa40 ns for fx: add missing add_relu and mul_relu patterns (#56927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56927

Adds the connection of `torch.add` to `toq.add_relu` and of `torch.mul`
to `toq.mul_relu`.

Test Plan:
CI

Imported from OSS

Reviewed By: supriyar

Differential Revision: D28003475

fbshipit-source-id: a12871feacf84c5afb0e1cc47e708e285695ffeb
2021-05-02 08:34:49 -07:00
Jerry Zhang
ecacb8c78b [quant][graphmode][fx] Fix getitem for unmatched nodes (#57173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57173

If getitem is followed by an unmatched node, we'll remove the observer after it.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_getitem

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28068805

fbshipit-source-id: e79f8ec3e8fd61d348b8a7069ab0bb434d737c30
2021-04-29 10:16:44 -07:00
Vasiliy Kuznetsov
9fe2673d1c ns for fx: additional bugfix for user defined functions (#57028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57028

Adds a test case for wrapped sigmoid, and fixes the following issues
to make it pass in NS:
* allows comparing between x.sigmoid() and torch.sigmoid(x), if they are related
* allows dtype cast from FP32_OR_INT8 to FP32, via dequantize (this will be improved later)

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_defined_function
```

Reviewed By: jerryzh168

Differential Revision: D28030089

Pulled By: vkuzo

fbshipit-source-id: b237353e2d564a4476f409df461746a259015a4b
2021-04-27 16:29:03 -07:00
Vasiliy Kuznetsov
da2cef6a40 ns for fx: allow comparing int8 to int8 for functionals (#57027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57027

Fixes a bug to allow shadowing of linear and conv functionals.
The fix is to only detach tensors, not all objects.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_int8_shadows_int8_fun
```

Reviewed By: jerryzh168

Differential Revision: D28030090

Pulled By: vkuzo

fbshipit-source-id: 0a38c4b232e007d7822eee818b0af99d98335d22
2021-04-27 16:29:01 -07:00
Vasiliy Kuznetsov
a359cfac22 ns for fx: add option to skip matching classes and functions (#57026)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57026

Adds a config option to skip matching classes by class type
and functions by function type.

This is useful when users make custom modules which return
types other than tensors. With the current implementation of
Logger, these are not scriptable.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_module_scriptable
```

Reviewed By: jerryzh168

Differential Revision: D28030093

Pulled By: vkuzo

fbshipit-source-id: 71dc54dd935d2071c4b017260ea2a1e5c2298bfe
2021-04-27 16:29:00 -07:00
Vasiliy Kuznetsov
e8a5490c0a ns for fx: support binary ops when adding unshadowed loggers for inputs (#57025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57025

Adds the ability to log unshadowed inputs of binary ops such as `add`
and `mul`, when indices 0, 1, or 0 and 1 are tensors.

Note: making shadowing support this is saved for a future PR.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_add_mul_inputs_activations
```

Reviewed By: jerryzh168

Differential Revision: D28030098

Pulled By: vkuzo

fbshipit-source-id: fd46760faac153975cd7688e70c44991ec1d5dff
2021-04-27 16:28:58 -07:00
Vasiliy Kuznetsov
ddedeab66d ns for fx: bug fix for shadowing fp16 emulation patterns (#57024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57024

Enables shadow copies of fp16 emulation patterns where weights
are cast to fp16 before being passed to linear.  This previously
did not work because copying of `call_method` nodes was not implemented.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_linear_fp16_vs_linear_fp16_shadow_activations
```

Reviewed By: jerryzh168

Differential Revision: D28030096

Pulled By: vkuzo

fbshipit-source-id: 13a39ea6c106180df6d750246672286b58b4d04c
2021-04-27 16:28:56 -07:00
Vasiliy Kuznetsov
2acc19eca1 ns for fx: add fp16 function shadowing (#57023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57023

Adds functionality for shadowing user functions with fp16 I/O dtype.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_defined_function
```

Reviewed By: jerryzh168

Differential Revision: D28030092

Pulled By: vkuzo

fbshipit-source-id: 642792398a76bd62593fa439ab14901e9dbdf4f8
2021-04-27 16:28:54 -07:00
Vasiliy Kuznetsov
782a0a1469 ns for fx: allow user functions in shadowing (#57022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57022

Allows usage of user functions in NS shadow APIs. We expose the
i/o mapping to the user APIs, and thread them throughout the code.

Note: the format of the mapping is currently not the best.  Improving
it is saved for a future PR.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_defined_function
```

Reviewed By: jerryzh168

Differential Revision: D28030095

Pulled By: vkuzo

fbshipit-source-id: 2863312362223ad276437e2aeeec4a3f71b691c7
2021-04-27 16:28:53 -07:00
Vasiliy Kuznetsov
c4bec76bec ns for fx: move node I/O dtype mapping to be local instead of global (#57021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57021

To support shadows of custom functions, we need to allow user to
specify I/O type of the custom functions.

This PR is a cleanup in preparation for making the above happen.
We make the I/O dtype mappings be generated by a function instead
of a global variable. In the next PR, we will add a hook so user
can modify these mappings.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```

Reviewed By: jerryzh168

Differential Revision: D28030094

Pulled By: vkuzo

fbshipit-source-id: 3cbb617f034ef385c2875c4ec7fed13ca30bfc57
2021-04-27 16:27:40 -07:00
Mike Ruberry
1145e2c6e2 Revert D27831996: ns for fx: move node I/O dtype mapping to be local instead of global
Test Plan: revert-hammer

Differential Revision:
D27831996 (93de80203d)

Original commit changeset: 782f5e77de0e

fbshipit-source-id: 6637ef4e8ba76fc4f2b3836ad1ed8d37ce040576
2021-04-27 01:01:08 -07:00
Mike Ruberry
45e96b5410 Revert D27833189: ns for fx: allow user functions in shadowing
Test Plan: revert-hammer

Differential Revision:
D27833189 (1917350977)

Original commit changeset: dac418e294d1

fbshipit-source-id: c6f58dac1a35806ea7d1dfb993d67e698196dee1
2021-04-27 01:01:06 -07:00
Mike Ruberry
982c72ac33 Revert D27836064: ns for fx: add fp16 function shadowing
Test Plan: revert-hammer

Differential Revision:
D27836064 (96a9eafcfb)

Original commit changeset: 37a434a04e2b

fbshipit-source-id: e85088f5e301e14a0fc9ac1f7241c2baaf0a957e
2021-04-27 01:01:04 -07:00
Mike Ruberry
90d554bd86 Revert D27857735: ns for fx: bug fix for shadowing fp16 emulation patterns
Test Plan: revert-hammer

Differential Revision:
D27857735 (f35540be38)

Original commit changeset: 7c1a067f035a

fbshipit-source-id: 6816223975b2e7b1f395e8894d17e3358fdb50ed
2021-04-27 01:01:02 -07:00
Mike Ruberry
abb8b6c1c1 Revert D27864296: ns for fx: support binary ops when adding unshadowed loggers for inputs
Test Plan: revert-hammer

Differential Revision:
D27864296 (c004346c88)

Original commit changeset: 3cbeb728297a

fbshipit-source-id: bc87cb707b14a0965452e9a1aa0d4e37ffbe5bf1
2021-04-27 01:01:01 -07:00
Mike Ruberry
cc8c5c1447 Revert D27886107: ns for fx: add option to skip matching classes and functions
Test Plan: revert-hammer

Differential Revision:
D27886107 (92c7aec5f5)

Original commit changeset: ec92c4f7ab71

fbshipit-source-id: 87d3b91c3d601f1706b61a2b2ce287a7b44f3d81
2021-04-27 01:00:59 -07:00
Mike Ruberry
5dc7a6b050 Revert D27960767: ns for fx: allow comparing int8 to int8 for functionals
Test Plan: revert-hammer

Differential Revision:
D27960767 (502c58ad84)

Original commit changeset: abc911ca4b9e

fbshipit-source-id: 9bb1aa9d0e764bfd2dd6745af897d958c054ef3a
2021-04-27 01:00:57 -07:00
Mike Ruberry
5db03b4109 Revert D27960766: ns for fx: additional bugfix for user defined functions
Test Plan: revert-hammer

Differential Revision:
D27960766 (9bd14da6e4)

Original commit changeset: 02935d2f400a

fbshipit-source-id: e7026c8637a591b6ffef288da8ef6306cdb9eb95
2021-04-27 00:59:57 -07:00
Vasiliy Kuznetsov
9bd14da6e4 ns for fx: additional bugfix for user defined functions (#56762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56762

Adds a test case for wrapped sigmoid, and fixes the following issues
to make it pass in NS:
* allows comparing between x.sigmoid() and torch.sigmoid(x), if they are related
* allows dtype cast from FP32_OR_INT8 to FP32, via dequantize (this will be improved later)

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_defined_function
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D27960766

fbshipit-source-id: 02935d2f400aa0b8f3d51bbf664a6c8ca89aa811
2021-04-26 17:03:32 -07:00
Vasiliy Kuznetsov
502c58ad84 ns for fx: allow comparing int8 to int8 for functionals (#56742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56742

Fixes a bug to allow shadowing of linear and conv functionals.
The fix is to only detach tensors, not all objects.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_int8_shadows_int8_fun
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D27960767

fbshipit-source-id: abc911ca4b9edafd1effb9dada7731981538c2df
2021-04-26 17:03:30 -07:00
Vasiliy Kuznetsov
92c7aec5f5 ns for fx: add option to skip matching classes and functions (#56493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56493

Adds a config option to skip matching classes by class type
and functions by function type.

This is useful when users make custom modules which return
types other than tensors. With the current implementation of
Logger, these are not scriptable.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_module_scriptable
```

needs more testing before land

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D27886107

fbshipit-source-id: ec92c4f7ab7141021bc022f07b3b558b42bbb986
2021-04-26 17:03:28 -07:00
Vasiliy Kuznetsov
c004346c88 ns for fx: support binary ops when adding unshadowed loggers for inputs (#56408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56408

Adds the ability to log unshadowed inputs of binary ops such as `add`
and `mul`, when indices 0, 1, or 0 and 1 are tensors.

Note: making shadowing support this is saved for a future PR.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_add_mul_inputs_activations
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D27864296

fbshipit-source-id: 3cbeb728297aa192d1ea17e815299709fd9db056
2021-04-26 17:03:26 -07:00
Vasiliy Kuznetsov
f35540be38 ns for fx: bug fix for shadowing fp16 emulation patterns (#56384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56384

Enables shadow copies of fp16 emulation patterns where weights
are cast to fp16 before being passed to linear.  This previously
did not work because copying of `call_method` nodes was not implemented.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_linear_fp16_vs_linear_fp16_shadow_activations
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D27857735

fbshipit-source-id: 7c1a067f035acf7322175f8535876d0ead88a86a
2021-04-26 17:03:25 -07:00
Vasiliy Kuznetsov
96a9eafcfb ns for fx: add fp16 function shadowing (#56311)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56311

Adds functionality for shadowing user functions with fp16 I/O dtype.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_defined_function
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D27836064

fbshipit-source-id: 37a434a04e2bd2593a892209bbae59f0f1f34319
2021-04-26 17:03:23 -07:00
Vasiliy Kuznetsov
1917350977 ns for fx: allow user functions in shadowing (#56301)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56301

Allows usage of user functions in NS shadow APIs. We expose the
i/o mapping to the user APIs, and thread them throughout the code.

Note: the format of the mapping is currently not the best.  Improving
it is saved for a future PR.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_defined_function
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D27833189

fbshipit-source-id: dac418e294d1c9b204efbf4071d5cc12a9e784c0
2021-04-26 17:03:21 -07:00
Vasiliy Kuznetsov
93de80203d ns for fx: move node I/O dtype mapping to be local instead of global (#56296)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56296

To support shadows of custom functions, we need to allow user to
specify I/O type of the custom functions.

This PR is a cleanup in preparation for making the above happen.
We make the I/O dtype mappings be generated by a function instead
of a global variable. In the next PR, we will add a hook so user
can modify these mappings.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D27831996

fbshipit-source-id: 782f5e77de0eef3899b9b7def0fdabd8dcafef12
2021-04-26 17:03:19 -07:00
Vasiliy Kuznetsov
8dbf6ae8fa ns for fx: handling for user functions in weight and unshadowed act APIs (#56292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56292

Adds hooks for specifying user defined functions to NS weight and
unshadowed activation APIs.

Adding it to shadowed activation APIs will be a bit more work, upcoming
in a separate PR.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_defined_function
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D27830409

fbshipit-source-id: 6bbddc3062c0b3e412a3147244795319c0785a92
2021-04-26 17:03:18 -07:00
Vasiliy Kuznetsov
d405d41a7c ns for fx: enable user defined functions for graph matching (#56283)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56283

Exposes the `base_name_to_sets_of_related_ops` variable
to the graph matching API, so that users can add relationships
for custom functions. This is needed to enable full support of
external functions for custom backends.

The next PR will extend this to the NS APIs.

Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher.test_user_defined_function
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D27830410

fbshipit-source-id: 8688cf697d388c52e3d18f108765edfca3c3d3aa
2021-04-26 17:02:11 -07:00
Joel Schlosser
febff45900 Support factory kwargs in torch.nn modules (#54508)
Summary:
Continuation of https://github.com/pytorch/pytorch/pull/53144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54508

Reviewed By: albanD

Differential Revision: D27939544

Pulled By: jbschlosser

fbshipit-source-id: 4bf517e5f74f093e27ca38a85e732da65e44d805
2021-04-22 16:16:53 -07:00
Jerry Zhang
1719cb82f3 [quant][graphmode][fx] Support preserving attributes in deepcopy of observed/quantized graphmodule (#56550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56550

Add support for preserving a list of attributes on observed/quantized GraphModule

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_deepcopy_preserve_attributes

Imported from OSS

Reviewed By: vkuzo, kazhang

Differential Revision: D27899317

fbshipit-source-id: ebf21334715e5ab764aaa27eed534cc0cdf9f2b5
2021-04-22 15:02:44 -07:00
Joel Schlosser
12b2bc94d7 Revert D27909732: [pytorch][PR] Support factory kwargs in torch.nn modules
Test Plan: revert-hammer

Differential Revision:
D27909732 (5a09def9b0)

Original commit changeset: d8684b2403ab

fbshipit-source-id: d00d69fae4fa4ed58d9e97e70b27a06a0dcb39e4
2021-04-21 13:44:03 -07:00
Joel Schlosser
5a09def9b0 Support factory kwargs in torch.nn modules (#54508)
Summary:
Continuation of https://github.com/pytorch/pytorch/pull/53144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54508

Reviewed By: malfet

Differential Revision: D27909732

Pulled By: jbschlosser

fbshipit-source-id: d8684b2403ab7eb336371d118799146a2520bd76
2021-04-21 13:20:11 -07:00
Jerry Zhang
096089abcb [quant][graphmode][fx] Produce torch.cat instead of torch.ops.quantized.cat (#54924)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54924

Previously we were producing torch.ops.quantized.cat, which takes the inputs, dequantizes them
and requantizes them with new qparams. This PR changes that to produce torch.cat directly; torch.cat
assumes all inputs share the same qparams and produces a quantized Tensor with
the same qparams as the inputs (a previous PR makes sure all inputs and the output of cat share
the same observer/fakequant instance).

Using torch.cat is expected to be more efficient since it does not introduce extra quant/dequant.
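
A small hedged example of the resulting behavior: quantized tensors that share qparams can be concatenated directly with torch.cat, with no extra quant/dequant step:
```
import torch

scale, zp = 0.1, 0
a = torch.quantize_per_tensor(torch.randn(2, 3), scale, zp, torch.quint8)
b = torch.quantize_per_tensor(torch.randn(2, 3), scale, zp, torch.quint8)

out = torch.cat([a, b], dim=0)            # no extra dequant/requant needed
print(out.q_scale(), out.q_zero_point())  # same qparams as the inputs
```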

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_cat

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D27416528

fbshipit-source-id: 896c280abec2903c29d597c655729666583ff0dd
2021-04-21 10:58:09 -07:00
Sam Estep
75024e228c Add lint for unqualified type: ignore (#56290)
Summary:
The other half of https://github.com/pytorch/pytorch/issues/56272.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56290

Test Plan:
CI should pass on the tip of this PR, and we know that the lint works because the following CI runs (before this PR was finished) failed:

- https://github.com/pytorch/pytorch/runs/2384511062
- https://github.com/pytorch/pytorch/actions/runs/765036024

Reviewed By: seemethere

Differential Revision: D27867219

Pulled By: samestep

fbshipit-source-id: e648f07b6822867e70833e23ddafe7fb7eaca235
2021-04-21 08:07:23 -07:00
Charles David Hernandez
6e1fc5cef8 [quant] added dq->op->q quantization patterns for GELU and softmax ops (#56004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56004

Added reference pattern support for GELU, softmax and bmm for int dtypes. For GELU and softmax, this consisted of adding reference patterns to the default node handler for int dtypes. Note that the GELU and softmax patterns are not registered since they do not have a proper quantized kernel, which means they would either add unnecessary dequant and quant ops to the network, or they would simply error. This can be circumvented with custom qconfig usage as in test_gelu_reference.

bmm was added within binary ops along with some significant changes to how that code is structured. Theoretically the reference pattern used for bmm could be applied to other dtypes. This was not enabled because of issues relating to Line 1323 in quantize.py. In essence, the prepare step does not know whether an op will use a reference pattern or not, so for ops that are supported with one dtype in reference and one dtype normally, this has the potential to cause issues. This is difficult to get around with the is_reference flag being available in the prepare step or discussed changes around separating

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_gelu_reference
python test/test_quantization.py TestQuantizeFxOps.test_gelu_normal
python test/test_quantization.py TestQuantizeFxOps.test_softmax_reference
python test/test_quantization.py TestQuantizeFxOps.test_softmax_normal
python test/test_quantization.py TestQuantizeFxOps.test_silu_reference
python test/test_quantization.py TestQuantizeFxOps.test_bmm_int_reference
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFuseFx
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxModels

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D27818340

fbshipit-source-id: de65be0797035463cd2d1b0e4677d1a87f69143c
2021-04-20 13:26:15 -07:00
Jerry Zhang
94406f77f6 [quant][graphmode][fx] Add support for keeping output quantized for list and dict (#56391)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56391

Previously we only supported keeping the output quantized for Tensor outputs; this PR adds support
for list and dict (values) as well.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D27860327

fbshipit-source-id: e770160ced47a7173abff5505ec620bd2b1a0b01
2021-04-19 21:37:11 -07:00
Natalia Gimelshein
92d24e3060 Revert D27855386: [pytorch][PR] Support factory kwargs in torch.nn modules
Test Plan: revert-hammer

Differential Revision:
D27855386 (40483acc51)

Original commit changeset: dabd505d2a04

fbshipit-source-id: f5bf3120d87861b30a8e1bf11977ad7d27cd8500
2021-04-19 20:07:20 -07:00
Sam Estep
e3900d2ba5 Add lint for unqualified noqa (#56272)
Summary:
As this diff shows, currently there are a couple hundred instances of raw `noqa` in the codebase, which just ignore all errors on a given line. That isn't great, so this PR changes all existing instances of that antipattern to qualify the `noqa` with respect to a specific error code, and adds a lint to prevent more of this from happening in the future.

Interestingly, some of the examples the `noqa` lint catches are genuine attempts to qualify the `noqa` with a specific error code, such as these two:
```
test/jit/test_misc.py:27:            print(f"{hello + ' ' + test}, I'm a {test}") # noqa E999
test/jit/test_misc.py:28:            print(f"format blank") # noqa F541
```
However, those are still wrong because they are [missing a colon](https://flake8.pycqa.org/en/3.9.1/user/violations.html#in-line-ignoring-errors), which actually causes the error code to be completely ignored:

- If you change them to anything else, the warnings will still be suppressed.
- If you add the necessary colons then it is revealed that `E261` was also being suppressed, unintentionally:
  ```
  test/jit/test_misc.py:27:57: E261 at least two spaces before inline comment
  test/jit/test_misc.py:28:35: E261 at least two spaces before inline comment
  ```

I did try using [flake8-noqa](https://pypi.org/project/flake8-noqa/) instead of a custom `git grep` lint, but it didn't seem to work. This PR is definitely missing some of the functionality that flake8-noqa is supposed to provide, though, so if someone can figure out how to use it, we should do that instead.
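
For reference, here is what the qualified form looks like; note the colon before the error code:
```
# wrong: the error code is ignored because the colon is missing (blanket suppression)
import os  # noqa F401

# right: only F401 (unused import) is suppressed on this line
import sys  # noqa: F401
```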

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56272

Test Plan:
CI should pass on the tip of this PR, and we know that the lint works because the following CI run (before this PR was finished) failed:

- https://github.com/pytorch/pytorch/runs/2365189927

Reviewed By: janeyx99

Differential Revision: D27830127

Pulled By: samestep

fbshipit-source-id: d6dcf4f945ebd18cd76c46a07f3b408296864fcb
2021-04-19 13:16:18 -07:00
Joel Schlosser
40483acc51 Support factory kwargs in torch.nn modules (#54508)
Summary:
Continuation of https://github.com/pytorch/pytorch/pull/53144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54508

Reviewed By: bdhirsh

Differential Revision: D27855386

Pulled By: jbschlosser

fbshipit-source-id: dabd505d2a04208e74b158570fb2859c736eea2c
2021-04-19 12:24:58 -07:00
Sam Estep
d05e7c163f Revert D27600457: [pytorch][PR] Support factory kwargs in torch.nn modules
Test Plan: revert-hammer

Differential Revision:
D27600457 (1077f87269)

Original commit changeset: b58bfee61c39

fbshipit-source-id: 19d5bfc5133a3880383731d0332503ca1f3bce0c
2021-04-19 07:47:24 -07:00
Joel Schlosser
1077f87269 Support factory kwargs in torch.nn modules (#54508)
Summary:
Continuation of https://github.com/pytorch/pytorch/pull/53144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54508

Reviewed By: mrshenli

Differential Revision: D27600457

Pulled By: jbschlosser

fbshipit-source-id: b58bfee61c3917524b4622f63ef216c27a588eb1
2021-04-19 06:58:40 -07:00
Vasiliy Kuznetsov
48e675ac75 fx quant: fix subtle bug in BinaryOpQuantizeHanlder logic in matching (#56294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56294

When matching a pattern to `BinaryOpQuantizeHandler`, we need to make
sure we check for dtype support on the base node, instead of the current
node.  This is important in cases such as `add-relu` and `mul-relu`,
when the current node is `relu`, but the base node is `add|mul`.

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
```

There is no good test case to check this in current logic.  Created an
add-relu model manually, and verified with pdb that the add node was
being used to match against dtypes.

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D27831070

fbshipit-source-id: 3697f1328dff9fec3eb910bae49a73793ef36d63
2021-04-16 18:19:22 -07:00