Mirror of https://github.com/zebrajr/pytorch.git, synced 2025-12-07 12:21:27 +01:00. Latest commit: 2a2bc1fc8a. 721 commits.

Commit history (SHA, message, details):

2a2bc1fc8a
ns for fx: add fqn to results, when present (#61377)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61377 Both the quantization tracer and the NS tracer record `_node_name_to_scope`, which contains the mapping from node name to FQN. This PR adds the FQN information to the NS results, so that it is more convenient for users to attribute a NS result to the corresponding module in their model. Test Plan: ``` python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_extract_weights_fqn python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_match_activations_fqn python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_shadow_activations_fqn ``` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D29600349 fbshipit-source-id: df489e03daff97dd380f59c83ffdc2b0012a0a53 |
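For illustration, a minimal sketch of reading the new FQN field from the weight-extraction results, modeled on the `extract_weights` example further down this log. The import path (assumed here to be `torch.quantization._numeric_suite_fx`, later moved under `torch.ao.ns`) and the exact result key (`'fqn'`) are assumptions based on this commit's description; newer releases also require an `example_inputs` argument to `prepare_fx`.
```
import copy
import torch
import torch.nn as nn
from torch.quantization._numeric_suite_fx import extract_weights  # assumed path
from torch.quantization.quantize_fx import prepare_fx, convert_fx

m = nn.Sequential(nn.Conv2d(1, 1, 1), nn.Conv2d(1, 1, 1)).eval()
qconfig_dict = {'': torch.quantization.default_qconfig}
mp = prepare_fx(m, qconfig_dict)
mp(torch.randn(1, 1, 4, 4))  # calibrate
mq = convert_fx(copy.deepcopy(mp))

results = extract_weights('fp32', mp, 'int8', mq)
for layer_name, layer_results in results.items():
    for entry in layer_results['weight']['fp32']:
        # 'fqn' is the field added by this commit (key name assumed); it maps
        # the result back to the fully qualified module name in the model.
        print(layer_name, entry.get('fqn'))
```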
7449f49a4c
ns for fx: return results in execution order (#61360)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61360 By default, NS graph matching matches from the end of the graph to the start. This PR reverses the returned results so that the outputs of the NS APIs are in the order of execution, making it easier to analyze. Test Plan: ``` python test/test_quantization.py TestFXGraphMatcher.test_results_order ``` Imported from OSS Reviewed By: hx89 Differential Revision: D29600348 fbshipit-source-id: c9fa4a3748db27c1788eebf803f35221e6fc8701 |
2b2928c5ca
ns for fx: improve error messages for graph matching (#61359)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61359 Makes the error messages when graph matching easier to read for users. Test Plan: ``` // inspect the exceptions in the following two tests and verify // that they are easier to read than before python test/test_quantization.py TestFXGraphMatcher.test_matching_failure_node_count python test/test_quantization.py TestFXGraphMatcher.test_matching_failure_node_type ``` Imported from OSS Reviewed By: hx89 Differential Revision: D29600353 fbshipit-source-id: ec6640fe6cab7b62a697e4ee385be182f2918fd4 |
4acd14da02
ns for fx: preserve observers and fake_quants through passes (#61323)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61323 Before this PR, all observers and fake quants were silently removed when adding loggers with NS. This is problematic for QAT models because we need the fake quants to run in order to properly capture intermediate outputs. This PR fixes the issue by preserving the observers throughout the passes which add loggers. In detail: * for each quantization module or fusion, add additional patterns with that fusion and an observer/fake_quant at the end * remove the places in the logger model creation code which removed observers * add unit testing that QAT numerics do not change after adding loggers Test Plan: ``` python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_loggers_preserve_qat_numerics python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_shadow_loggers_preserve_qat_numerics ``` Imported from OSS Reviewed By: hx89 Differential Revision: D29600351 fbshipit-source-id: 5f25118b79eb47860c49bca882de6a8eae7a4456 |
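As a rough sketch of what the new tests check (the `add_loggers`/`OutputLogger` import location is an assumption about this era's NS for FX module, and newer releases require `example_inputs` in the prepare call), one can verify that adding loggers to a QAT-prepared model no longer changes its numerics:
```
import copy
import torch
import torch.nn as nn
from torch.quantization import get_default_qat_qconfig
from torch.quantization.quantize_fx import prepare_qat_fx
from torch.quantization._numeric_suite_fx import add_loggers, OutputLogger  # assumed path

m = nn.Sequential(nn.Conv2d(1, 1, 1), nn.BatchNorm2d(1)).train()
mp = prepare_qat_fx(m, {'': get_default_qat_qconfig('fbgemm')})

x = torch.randn(2, 1, 4, 4)
ref = copy.deepcopy(mp)(x)

# With observers/fake_quants preserved, the logged model should reproduce
# the plain QAT-prepared model's outputs.
mp_a, _mp_b = add_loggers('a', copy.deepcopy(mp), 'b', copy.deepcopy(mp), OutputLogger)
print(torch.allclose(ref, mp_a(x)))  # expected: True
```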
a70505cdbd
ns for fx: support comparing fp32 vs fp32_prepared, except shadowed (#61129)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61129 Adds support the comparing fp32 model (without quantization) to a fp32 model prepared with quantization. The main missing feature was handling conv-bn fusion, since this fusion for PTQ happens outside of quantization patterns. Adds testing for this case for comparing weights and comparing activations Adds a TODO for also handling this for shadow activations, we need to first stop removing observers in graph passes before we can add this support, will be in a future PR. Test Plan: ``` python test/test_quantization.py TestFXGraphMatcherModels.test_mobilenet_v2 python test/test_quantization.py TestFXGraphMatcherModels.test_mobilenet_v2_qat python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_compare_activations_conv ``` Imported from OSS Reviewed By: raghuramank100 Differential Revision: D29520009 fbshipit-source-id: f63484a998f1424bd9cacf5d823b82b2edfea1ae |
0751a41ab1
[quant] Input-Weight Equalization - ConvReLU support (#61350)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61350 Applied changes in convert to allow for ConvReLU2d layers Initial Model: `x -> conv1 -> relu` After fusion: `x -> convRelu2d` After prepare: `x -> input_quant_obs -> input_eq_obs1 -> convRelu2d -> output_quant_obs1` After equalization functions: `x -> mul -> input_quant_obs (scaled) -> convRelu2d -> output_quant_obs` After convert: `x -> mul -> quantize_per_tensor -> quantized::convRelu2d -> dequantize` Test Plan: `python test/test_quantization.py TestEqualizeFx` Initial Model: ``` ConvReluModel( (fc): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1)) (relu): ReLU() ) ``` After prepare: ``` GraphModule( (x_activation_post_process_0): MinMaxObserver(min_val=5.960464477539063e-08, max_val=0.9999999403953552) (x_activation_post_process_0_equalization_process_0): _InputEqualizationObserver( (input_obs): PerChannelMinMaxObserver(min_val=tensor([1.1921e-07, 3.3379e-06, 5.9605e-08]), max_val=tensor([1.0000, 1.0000, 1.0000])) ) (fc): ConvReLU2d( (0): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1)) (1): ReLU() ) (fc_activation_post_process_0): MinMaxObserver(min_val=0.0, max_val=1.2341605424880981) ) graph(): %x : [#users=1] = placeholder[target=x] %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {}) %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {}) %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {}) %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {}) return fc_activation_post_process_0 ``` After equalization functions: ``` graph(): %x : [#users=1] = placeholder[target=x] %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0] %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {}) %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {}) %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0,), kwargs = {}) %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {}) return fc_activation_post_process_0 ``` After convert: ``` graph(): %x : [#users=1] = placeholder[target=x] %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0] %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {}) %fc_input_scale_0 : [#users=1] = get_attr[target=fc_input_scale_0] %fc_input_zero_point_0 : [#users=1] = get_attr[target=fc_input_zero_point_0] %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %fc_input_scale_0, %fc_input_zero_point_0, torch.quint8), kwargs = {}) %fc : [#users=1] = call_module[target=fc](args = (%quantize_per_tensor,), kwargs = {}) %dequantize : [#users=1] = call_method[target=dequantize](args = (%fc,), kwargs = {}) return dequantize ``` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D29638275 fbshipit-source-id: 40d4666a4451e132612ea38fdfeaaec177a1defb |
b3e4dab45a
[quant] Input-Weight Equalization - Conv convert support (#61287)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61287 Modifications to functions during convert() to support equalization. Note that this implementation does not work for connected F.conv2d layers yet. Initial: ``` w | x -> conv -> y ``` After prepare: ``` w | weight_quant_obs | weight_eq_obs | x -> input_quant_obs -> input_eq_obs -> conv -> out_quant_obs -> y ``` After convert: ``` scale, zero_point w (scaled) | | x -> mul -> quantize_per_tensor (scaled) -> quantized::conv -> dequant -> y | eq_scale ``` Test Plan: `python test/test_quantization.py TestEqualizeFx` Initial model: ``` ConvModel( (conv): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1), bias=False) ) ``` After prepare: ``` graph(): %x : [#users=1] = placeholder[target=x] %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {}) %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {}) %conv : [#users=1] = call_module[target=conv](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {}) %conv_activation_post_process_0 : [#users=1] = call_module[target=conv_activation_post_process_0](args = (%conv,), kwargs = {}) return conv_activation_post_process_0 ``` After equalization functions: ``` graph(): %x : [#users=1] = placeholder[target=x] %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0] %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {}) %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {}) %conv : [#users=1] = call_module[target=conv](args = (%x_activation_post_process_0,), kwargs = {}) %conv_activation_post_process_0 : [#users=1] = call_module[target=conv_activation_post_process_0](args = (%conv,), kwargs = {}) return conv_activation_post_process_0 ``` After convert: ``` graph(): %x : [#users=1] = placeholder[target=x] %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0] %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {}) %conv_input_scale_0 : [#users=1] = get_attr[target=conv_input_scale_0] %conv_input_zero_point_0 : [#users=1] = get_attr[target=conv_input_zero_point_0] %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %conv_input_scale_0, %conv_input_zero_point_0, torch.quint8), kwargs = {}) %conv : [#users=1] = call_module[target=conv](args = (%quantize_per_tensor,), kwargs = {}) %dequantize : [#users=1] = call_method[target=dequantize](args = (%conv,), kwargs = {}) return dequantize ``` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D29557055 fbshipit-source-id: dc9f44182e31fa362c43ad2dfe224e6f4e4a730e |
77d36b657a
[quant] Input-Weight Equalization - Conv prepare support (#61286)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61286 Modifies the prepare step to support conv layers during input-weight equalization and adds tests to make sure that the results are as expected. Initial: ``` w | x -> conv -> y ``` After prepare: ``` w | weight_quant_obs | weight_eq_obs | x -> input_quant_obs -> input_eq_obs -> conv -> out_quant_obs -> y ``` Test Plan: `python test/test_quantization.py TestEqualizeFx.test_input_weight_equalization_prepare` Initial: ``` ConvModel( (conv): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1), bias=False) ) ``` After prepare: ``` graph(): %x : [#users=1] = placeholder[target=x] %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {}) %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {}) %conv : [#users=1] = call_module[target=conv](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {}) %conv_activation_post_process_0 : [#users=1] = call_module[target=conv_activation_post_process_0](args = (%conv,), kwargs = {}) return conv_activation_post_process_0 ``` Imported from OSS Reviewed By: supriyar Differential Revision: D29557051 fbshipit-source-id: 25d1531645dfaf565f5c615e2ee850fcf96c7eb9 |
ce9cedd119
[quant] Input-Weight Equalization - Conv observer support (#61285)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61285 Modifies observers to support conv layers and tests to make sure that the observers are returning the expected values for conv inputs. Test Plan: `python test/test_quantization.py TestEqualizeFx.test_input_weight_eq_observer` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D29557041 fbshipit-source-id: 5e43329f189ba352eb8b991f38bf37752eebb6e6 |
7a15576a65
[quant] update FakeQuant modules to use tensor qparams (#61318)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61318 Remove the `float()` and `int()` calls in the forward function so that we can directly use the tensor qparams in the fake_quantize operator. Calling `float()/int()` internally calls `item()` which can trigger a gpu-> cpu copy if the original tensors reside on GPU. Local benchmark P427668213 Before this change ``` Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls --------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ aten::_aminmax 2.57% 1.507ms 3.10% 1.819ms 36.371us 2.872ms 4.81% 2.872ms 57.446us 50 aten::fake_quantize_per_tensor_affine 1.04% 610.915us 3.60% 2.114ms 42.276us 472.896us 0.79% 2.698ms 53.962us 50 aten::fake_quantize_per_tensor_affine_cachemask 1.69% 993.626us 2.56% 1.503ms 30.058us 2.225ms 3.73% 2.225ms 44.504us 50 aten::is_nonzero 3.85% 2.258ms 19.68% 11.540ms 46.161us 2.168ms 3.63% 11.084ms 44.336us 250 aten::zeros_like 1.82% 1.064ms 6.65% 3.901ms 39.007us 1.531ms 2.57% 3.905ms 39.045us 100 aten::eq 13.80% 8.093ms 25.90% 15.189ms 37.972us 9.580ms 16.05% 15.566ms 38.914us 400 aten::item 5.67% 3.323ms 21.50% 12.607ms 36.019us 3.233ms 5.42% 12.167ms 34.762us 350 aten::zeros 0.94% 549.208us 2.93% 1.717ms 34.343us 688.928us 1.15% 1.695ms 33.894us 50 aten::le 2.52% 1.478ms 4.50% 2.641ms 26.411us 1.753ms 2.94% 2.845ms 28.448us 100 aten::rsub 1.04% 608.715us 2.44% 1.433ms 28.667us 532.000us 0.89% 1.418ms 28.353us 50 aten::max 1.54% 905.401us 4.62% 2.711ms 27.106us 847.488us 1.42% 2.697ms 26.969us 100 aten::ones 0.92% 542.159us 2.16% 1.266ms 25.324us 661.856us 1.11% 1.301ms 26.017us 50 aten::min 0.82% 479.167us 2.15% 1.258ms 25.160us 407.808us 0.68% 1.276ms 25.530us 50 aten::_local_scalar_dense 15.83% 9.284ms 15.83% 9.284ms 26.526us 8.934ms 14.97% 8.934ms 25.524us 350 aten::clamp 2.35% 1.378ms 4.21% 2.467ms 24.669us 1.546ms 2.59% 2.461ms 24.612us 100 aten::zero_ 2.53% 1.482ms 5.65% 3.316ms 22.108us 1.326ms 2.22% 3.380ms 22.531us 150 aten::maximum 3.08% 1.805ms 3.08% 1.805ms 18.052us 1.849ms 3.10% 1.849ms 18.494us 100 aten::minimum 1.33% 778.854us 1.33% 778.854us 15.577us 868.672us 1.46% 868.672us 17.373us 50 aten::round 1.36% 799.910us 1.36% 799.910us 15.998us 809.568us 1.36% 809.568us 16.191us 50 aten::copy_ 6.61% 3.878ms 6.61% 3.878ms 15.513us 4.036ms 6.76% 4.036ms 16.143us 250 aten::div 2.53% 1.483ms 2.53% 1.483ms 14.833us 1.535ms 2.57% 1.535ms 15.353us 100 aten::mul 2.44% 1.431ms 2.44% 1.431ms 14.314us 1.478ms 2.48% 1.478ms 14.782us 100 aten::detach 1.46% 855.670us 2.41% 1.411ms 14.110us 832.448us 1.39% 1.395ms 13.949us 100 aten::add 2.22% 1.301ms 2.22% 1.301ms 13.008us 1.383ms 2.32% 1.383ms 13.828us 100 aten::fill_ 4.18% 2.452ms 4.18% 2.452ms 12.262us 2.693ms 4.51% 2.693ms 13.463us 200 aten::sub 5.06% 2.967ms 5.06% 2.967ms 14.837us 2.675ms 4.48% 2.675ms 13.374us 200 aten::to 2.10% 1.230ms 3.65% 2.140ms 10.701us 1.310ms 2.20% 2.062ms 10.310us 200 aten::select 1.28% 749.144us 1.49% 874.227us 8.742us 863.232us 1.45% 863.232us 8.632us 100 detach 0.95% 555.326us 0.95% 555.326us 5.553us 562.496us 0.94% 562.496us 5.625us 100 aten::as_strided 0.40% 232.289us 0.40% 232.289us 1.161us 0.000us 0.00% 0.000us 0.000us 200 aten::empty 2.93% 1.720ms 2.93% 1.720ms 3.439us 0.000us 0.00% 0.000us 0.000us 500 aten::resize_ 1.04% 611.313us 1.04% 611.313us 2.038us 0.000us 0.00% 0.000us 0.000us 300 aten::empty_like 0.75% 438.585us 
1.77% 1.036ms 5.180us 0.000us 0.00% 0.000us 0.000us 200 aten::empty_strided 1.36% 799.442us 1.36% 799.442us 3.198us 0.000us 0.00% 0.000us 0.000us 250 --------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 58.645ms Self CUDA time total: 59.674ms ``` After this change ``` test_fake_quant_profiler (scripts.supriyar.benchmark.module_bench.ProfilerBench) ... ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ aten::fake_quantize_per_tensor_affine 0.98% 505.210us 4.38% 2.259ms 45.187us 419.424us 0.78% 3.218ms 64.367us 50 aten::_aminmax 2.78% 1.434ms 3.42% 1.766ms 35.321us 2.825ms 5.27% 2.825ms 56.505us 50 aten::fake_quantize_per_tensor_affine_cachemask_tens... 2.38% 1.229ms 3.40% 1.754ms 35.083us 2.799ms 5.22% 2.799ms 55.979us 50 aten::rsub 0.94% 485.040us 5.02% 2.590ms 51.793us 458.976us 0.86% 2.587ms 51.747us 50 aten::is_nonzero 3.78% 1.952ms 23.64% 12.196ms 48.786us 2.055ms 3.83% 11.986ms 47.944us 250 aten::item 6.92% 3.572ms 19.86% 10.244ms 40.977us 3.670ms 6.85% 9.931ms 39.724us 250 aten::zeros_like 1.65% 848.874us 6.64% 3.426ms 34.260us 1.397ms 2.61% 3.572ms 35.717us 100 aten::zeros 0.85% 436.691us 3.00% 1.549ms 30.984us 551.936us 1.03% 1.576ms 31.516us 50 aten::eq 10.60% 5.467ms 20.26% 10.452ms 26.130us 7.018ms 13.09% 10.832ms 27.079us 400 aten::le 2.58% 1.332ms 4.67% 2.407ms 24.074us 1.580ms 2.95% 2.614ms 26.144us 100 aten::_local_scalar_dense 12.93% 6.673ms 12.93% 6.673ms 26.691us 6.261ms 11.68% 6.261ms 25.046us 250 aten::clamp 2.43% 1.253ms 4.37% 2.256ms 22.560us 1.431ms 2.67% 2.273ms 22.725us 100 aten::ones 0.89% 460.133us 2.18% 1.123ms 22.467us 570.496us 1.06% 1.128ms 22.551us 50 aten::min 0.74% 383.132us 2.06% 1.065ms 21.296us 377.536us 0.70% 1.091ms 21.824us 50 aten::zero_ 2.36% 1.219ms 5.87% 3.029ms 20.194us 1.261ms 2.35% 3.199ms 21.327us 150 aten::max 1.51% 779.081us 4.06% 2.096ms 20.960us 791.680us 1.48% 2.130ms 21.295us 100 aten::sub 7.97% 4.111ms 7.97% 4.111ms 20.556us 3.847ms 7.18% 3.847ms 19.234us 200 aten::div 2.94% 1.516ms 2.94% 1.516ms 15.158us 1.580ms 2.95% 1.580ms 15.798us 100 aten::round 1.45% 750.445us 1.45% 750.445us 15.009us 756.064us 1.41% 756.064us 15.121us 50 aten::copy_ 6.88% 3.548ms 6.88% 3.548ms 14.190us 3.701ms 6.90% 3.701ms 14.803us 250 aten::minimum 1.32% 681.654us 1.32% 681.654us 13.633us 713.664us 1.33% 713.664us 14.273us 50 aten::maximum 2.55% 1.317ms 2.55% 1.317ms 13.169us 1.338ms 2.50% 1.338ms 13.378us 100 aten::mul 2.63% 1.358ms 2.63% 1.358ms 13.581us 1.328ms 2.48% 1.328ms 13.283us 100 aten::detach 1.34% 688.820us 2.35% 1.211ms 12.110us 772.800us 1.44% 1.278ms 12.779us 100 aten::fill_ 4.53% 2.338ms 4.53% 2.338ms 11.692us 2.495ms 4.65% 2.495ms 12.473us 200 aten::add 2.32% 1.197ms 2.32% 1.197ms 11.968us 1.240ms 2.31% 1.240ms 12.405us 100 aten::to 2.07% 1.069ms 3.66% 1.889ms 9.443us 1.224ms 2.28% 1.975ms 9.874us 200 aten::select 1.44% 743.042us 1.64% 848.207us 8.482us 641.600us 1.20% 641.600us 6.416us 100 detach 1.01% 522.155us 1.01% 522.155us 5.222us 505.088us 0.94% 505.088us 
5.051us 100 aten::as_strided 0.44% 227.884us 0.44% 227.884us 1.139us 0.000us 0.00% 0.000us 0.000us 200 aten::empty 3.20% 1.652ms 3.20% 1.652ms 3.304us 0.000us 0.00% 0.000us 0.000us 500 aten::resize_ 1.25% 646.711us 1.25% 646.711us 2.156us 0.000us 0.00% 0.000us 0.000us 300 aten::empty_like 0.79% 407.768us 2.07% 1.067ms 5.334us 0.000us 0.00% 0.000us 0.000us 200 aten::empty_strided 1.52% 785.788us 1.52% 785.788us 3.143us 0.000us 0.00% 0.000us 0.000us 250 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 51.590ms Self CUDA time total: 53.609ms ghstack-source-id: 133370215 Test Plan: buck test mode/dev-nosan caffe2/test/:quantization Reviewed By: raghuramank100 Differential Revision: D29566512 fbshipit-source-id: 1aefca51f99949da7334bcfe504848275c9f952c |
99848c7269
[quant] Add tensor_qparam variant to fake_quantize_per_tensor (#61317)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61317 Add an overload to fake_quantize_per_tensor that accepts scale/zero_point as input. The reasons to do this are * required for fused observer + fake_quant operator on GPU where the scale/zero_point will be calculated by the observer on device. Passing tensor inputs enables us to directly access the scale/zero-point value in the cuda kernel to avoid extra copies/malloc * enables us to pass in float as scale dtype and int32 as zero_point dtype (which is consistent with what the quantize call actually uses) https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/affine_quantizer_base.cpp#L52-L53 * overload consistent with `quantizer_per_tensor.tensor_qparams` ghstack-source-id: 133370216 Test Plan: buck test mode/dev-nosan caffe2/test/:quantization -- test_backward_per_tensor_cachemask buck test mode/dev-nosan caffe2/test/:quantization -- test_forward_per_tensor_cachemask Reviewed By: raghuramank100 Differential Revision: D29552727 fbshipit-source-id: cbb9af40fc575ad27a29c646b760d5ee52cc923d |
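A minimal sketch of the new overload as described above: the scale is passed as a float tensor and the zero_point as an int32 tensor, so they can stay on the same device as the input instead of being read back as Python scalars.
```
import torch

x = torch.randn(4, 8)
scale = torch.tensor(0.05)                       # float tensor qparam
zero_point = torch.tensor(0, dtype=torch.int32)  # int32 tensor qparam

# Tensor qparams are consumed directly by the kernel, avoiding the .item()
# device sync that the scalar overload triggers when tensors live on GPU.
y = torch.fake_quantize_per_tensor_affine(x, scale, zero_point, 0, 255)
print(y.min(), y.max())
```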
a4d86e0d53
[quant][fx][perf] improve runtime of prepare step for large models (#61132)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61132 For large models, the insert_observers_for_model function was taking a long time, especially for the case where not all the nodes are being quantized For example for a model with 21000 nodes of which only ~50 are being quantized the breakdown of prepare_fx vs convert fx was prepare_fx 979 seconds convert_fx 9 seconds The main reason was because we were doing some unnecessary computation for all nodes in this function, this PR just moves them to where they are actually used After this PR prepare_fx 26 seconds convert_fx 9 seconds Test Plan: Existing tests Imported from OSS Reviewed By: raghuramank100 Differential Revision: D29522303 fbshipit-source-id: 7ce12582a859d02ff763abebf4a592d28e0764ca |
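A simple way to reproduce this kind of measurement on your own model (placeholder model and qconfig; newer releases also require an `example_inputs` argument to `prepare_fx`):
```
import time
import torch
import torch.nn as nn
from torch.quantization.quantize_fx import prepare_fx, convert_fx

model = nn.Sequential(*[nn.Linear(16, 16) for _ in range(50)]).eval()  # stand-in for a large model
qconfig_dict = {'': torch.quantization.default_qconfig}

t0 = time.perf_counter()
prepared = prepare_fx(model, qconfig_dict)
prepare_s = time.perf_counter() - t0

prepared(torch.randn(1, 16))  # one calibration pass

t0 = time.perf_counter()
quantized = convert_fx(prepared)
convert_s = time.perf_counter() - t0
print(f"prepare_fx: {prepare_s:.2f}s, convert_fx: {convert_s:.2f}s")
```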
dabadd7e20
[quant] Added reset_min_max_vals() function to observers (#60883)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60883 As per this [comment](https://github.com/pytorch/pytorch/pull/59964#discussion_r659064270), I created a `reset_min_max_vals()` function inside the observers which will be called during input-weight equalization. This is so that we will not expose the implementation of the observers in the equalization code. Test Plan: `python test/test_quantization.py TestEqualizeFx` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D29491848 fbshipit-source-id: 00e91959ceb3b4f3688175a1a7ba11823e929b2f |
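A minimal example of the new hook (the same idea applies to the other min/max observers):
```
import torch
from torch.quantization.observer import MinMaxObserver

obs = MinMaxObserver()
obs(torch.randn(16))             # record min/max from some data
print(obs.min_val, obs.max_val)

obs.reset_min_max_vals()         # the reset hook used by input-weight equalization
print(obs.min_val, obs.max_val)  # back to the initial (inf / -inf) state
```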
1a0195db49
[quant] Input-Weight Equalization - support for LinearReLU layers (#60653)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60653 Special casing was needed to get the weight attribute in the linear layers of fused LinearReLU layers. Initial Model: `x -> linear1 -> relu` After fusion: `x -> linearRelu` After prepare: `x -> input_quant_obs -> input_eq_obs1 -> linearRelu -> output_quant_obs1` After equalization functions: `x -> mul -> input_quant_obs (scaled) -> linearRelu -> output_quant_obs` After convert: `x -> mul -> quantize_per_tensor -> quantized::linearRelu -> dequantize` More step-throughs here: https://fb.quip.com/A9J3AsBxkykR Test Plan: `python test/test_quantization.py TestEqualizeFx` Original model: ``` LinearReluModel( (fc): Linear(in_features=5, out_features=5, bias=True) (relu): ReLU() ) ``` Graph after `prepare_fx`: ``` graph(): %x : [#users=1] = placeholder[target=x] %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {}) %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {}) %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {}) %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {}) return fc_activation_post_process_0 ``` Graph after equalization functions: ``` graph(): %x : [#users=1] = placeholder[target=x] %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0] %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {}) %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {}) %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0,), kwargs = {}) %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {}) return fc_activation_post_process_0 ``` Graph after `convert_fx`: ``` graph(): %x : [#users=1] = placeholder[target=x] %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0] %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {}) %fc_input_scale_0 : [#users=1] = get_attr[target=fc_input_scale_0] %fc_input_zero_point_0 : [#users=1] = get_attr[target=fc_input_zero_point_0] %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %fc_input_scale_0, %fc_input_zero_point_0, torch.quint8), kwargs = {}) %fc : [#users=1] = call_module[target=fc](args = (%quantize_per_tensor,), kwargs = {}) %dequantize : [#users=1] = call_method[target=dequantize](args = (%fc,), kwargs = {}) return dequantize ``` Imported from OSS Reviewed By: supriyar Differential Revision: D29406999 fbshipit-source-id: add38e8e7fb84a241c3b10bfb8451b50103effd4 |
5576c7bdd1
ns for fx: initial support for int8 shadows fp32 (#60419)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60419 Adds support for NS for FX shadowed activations pass to handle int8 modules shadowing fp32 modules. The difficulty here is that in order to insert the dtype cast, we need the qparams of the input. For the current PR, we only handle the easy cases where the previous node is either a `quantize_per_tensor` or an OSS quantized module. A future PR can handle more complicated cases such as various functions. Test Plan: ``` python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_int8_shadows_fp32_simple ``` Imported from OSS Reviewed By: hx89 Differential Revision: D29280050 fbshipit-source-id: 465257c9f82a34fa91b48ae8887355c68e00edc6 |
9b94aa5356
[quant][fx][fix] Fused modules with object_type in qconfig (#60779)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60779 When we do fusion, we replace certain modules (such as Linear + ReLU) with fused versions (such as LinearReLU) by calling `_fuse_fx` in prepare_fx. However when we try to look up using the fused module type in qconfig_dict, we cannot find a match anymore since the qconfig dict contains the original module types. An example is here [N882873](https://fburl.com/anp/azenjx3v). So we will now update the qconfig_dict to include the fused modules mapping to the qconfigs used for the modules that make up the fused modules. If the modules are not mapped to the same qconfig, then we will raise an error. Test Plan: `python test/test_quantization.py TestFuseFx.test_qconfig_fused_module` Imported from OSS Reviewed By: supriyar Differential Revision: D29406941 fbshipit-source-id: 74b5db89f4998aeb02b2bf7c37bf97326580c654 |
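For example, a qconfig_dict keyed on the original (unfused) module types, written as a sketch of the case this commit fixes (newer releases use `QConfigMapping` and an `example_inputs` argument instead):
```
import torch
import torch.nn as nn
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.linear(x))

qconfig = get_default_qconfig("fbgemm")
# The qconfig is keyed on the original module types; after this fix, the fused
# LinearReLU produced by _fuse_fx still picks up this qconfig.
qconfig_dict = {"object_type": [(nn.Linear, qconfig), (nn.ReLU, qconfig)]}
prepared = prepare_fx(M().eval(), qconfig_dict)
```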
dfb9c0bae8
[quant] Input-Weight Equalization - support for connected F.linear layer (#60272)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60272 Test Plan: `python test/test_quantization.py TestEqualizeFx` Original model: ``` FunctionalLinear2Module( (linear1): Linear() (linear2): Linear() ) ``` Graph after `prepare_fx`: ``` graph(): %x : [#users=1] = placeholder[target=x] %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {}) %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {}) %linear1_w : [#users=1] = get_attr[target=linear1.w] %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_0](args = (%linear1_w,), kwargs = {}) %linear1_w_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_0_equalization_process_0](args = (%linear1_w_activation_post_process_0,), kwargs = {}) %linear1_b : [#users=1] = get_attr[target=linear1.b] %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0_equalization_process_0, %linear1_w_activation_post_process_0_equalization_process_0), kwargs = {bias: %linear1_b}) %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {}) %linear_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0_equalization_process_0](args = (%linear_activation_post_process_0,), kwargs = {}) %linear2_w : [#users=1] = get_attr[target=linear2.w] %linear2_w_activation_post_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_0](args = (%linear2_w,), kwargs = {}) %linear2_w_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_0_equalization_process_0](args = (%linear2_w_activation_post_process_0,), kwargs = {}) %linear2_b : [#users=1] = get_attr[target=linear2.b] %linear_1 : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%linear_activation_post_process_0_equalization_process_0, %linear2_w_activation_post_process_0_equalization_process_0), kwargs = {bias: %linear2_b}) %linear_1_activation_post_process_0 : [#users=1] = call_module[target=linear_1_activation_post_process_0](args = (%linear_1,), kwargs = {}) return linear_1_activation_post_process_0 ``` Graph after equalization steps: ``` graph(): %x : [#users=1] = placeholder[target=x] %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0] %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {}) %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {}) %linear1_w : [#users=1] = get_attr[target=linear1.w] %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_0](args = (%linear1_w,), kwargs = {}) %linear1_b : [#users=1] = get_attr[target=linear1.b] %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b}) %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {}) %linear2_w : [#users=1] = get_attr[target=linear2.w] 
%linear2_w_activation_post_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_0](args = (%linear2_w,), kwargs = {}) %linear2_b : [#users=1] = get_attr[target=linear2.b] %linear_1 : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%linear_activation_post_process_0, %linear2_w_activation_post_process_0), kwargs = {bias: %linear2_b}) %linear_1_activation_post_process_0 : [#users=1] = call_module[target=linear_1_activation_post_process_0](args = (%linear_1,), kwargs = {}) return linear_1_activation_post_process_0 ``` Graph after `convert_fx`: ``` graph(): %x : [#users=1] = placeholder[target=x] %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0] %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {}) %linear1_input_scale_0 : [#users=1] = get_attr[target=linear1_input_scale_0] %linear1_input_zero_point_0 : [#users=1] = get_attr[target=linear1_input_zero_point_0] %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear1_input_scale_0, %linear1_input_zero_point_0, torch.quint8), kwargs = {}) %linear1_packed_weight_0 : [#users=1] = get_attr[target=linear1_packed_weight_0] %linear1_scale_0 : [#users=1] = get_attr[target=linear1_scale_0] %linear1_zero_point_0 : [#users=1] = get_attr[target=linear1_zero_point_0] %linear : [#users=1] = call_function[target=torch.ops.quantized.linear](args = (%quantize_per_tensor, %linear1_packed_weight_0, %linear1_scale_0, %linear1_zero_point_0), kwargs = {}) %linear2_packed_weight_0 : [#users=1] = get_attr[target=linear2_packed_weight_0] %linear2_scale_0 : [#users=1] = get_attr[target=linear2_scale_0] %linear2_zero_point_0 : [#users=1] = get_attr[target=linear2_zero_point_0] %linear_1 : [#users=1] = call_function[target=torch.ops.quantized.linear](args = (%linear, %linear2_packed_weight_0, %linear2_scale_0, %linear2_zero_point_0), kwargs = {}) %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear_1,), kwargs = {}) return dequantize ``` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D29267218 fbshipit-source-id: 6b97bed1a307f1d0b1f5efcbecf41f35418242f7 |
ddf2ce03bb
[quant] Input-Weight Equalization - support for connected linear layers (#60034)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60034 Added support for equalizing models with connected linear layers. To account for connected linear layers, we will additionally multiply the previous weight values (row-wise) by the next equalization scale, and remove the input equalization observer between the two linear layers. We also want to scale the bias by the next equalization scale. The math is shown here: https://fb.quip.com/fK8rA9aRM4ca . Original Model: `x -> linear1 -> linear2` After `prepare_fx`: `x -> InpEqObs -> InpQuantObs -> linear1 -> OutQuantObs -> InpEqObs -> linear2` After equalization: `x -> mul -> InpQuantObs -> linear1 -> OutQuantObs -> linear2` Test Plan: `python test/test_quantization.py TestEqualizeFx.test_input_weight_equalization_convert` Original Model: ``` Linear2Module( (linear1): Linear(in_features=2, out_features=2, bias=True) (linear2): Linear(in_features=2, out_features=2, bias=True) ) ``` Graph after `prepare_fx`: ``` graph(): %x : [#users=1] = placeholder[target=x] %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {}) %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {}) %linear1 : [#users=1] = call_module[target=linear1](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {}) %linear1_activation_post_process_0 : [#users=1] = call_module[target=linear1_activation_post_process_0](args = (%linear1,), kwargs = {}) %linear1_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear1_activation_post_process_0_equalization_process_0](args = (%linear1_activation_post_process_0,), kwargs = {}) %linear2 : [#users=1] = call_module[target=linear2](args = (%linear1_activation_post_process_0_equalization_process_0,), kwargs = {}) %linear2_activation_post_process_0 : [#users=1] = call_module[target=linear2_activation_post_process_0](args = (%linear2,), kwargs = {}) return linear2_activation_post_process_0 ``` Graph after equaliation functions: ``` graph(): %x : [#users=1] = placeholder[target=x] %x_activation_post_process_0_equalization_process_0_scale : [#users=1] = get_attr[target=x_activation_post_process_0_equalization_process_0_scale] %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_activation_post_process_0_equalization_process_0_scale), kwargs = {}) %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {}) %linear1 : [#users=1] = call_module[target=linear1](args = (%x_activation_post_process_0,), kwargs = {}) %linear1_activation_post_process_0 : [#users=1] = call_module[target=linear1_activation_post_process_0](args = (%linear1,), kwargs = {}) %linear2 : [#users=1] = call_module[target=linear2](args = (%linear1_activation_post_process_0,), kwargs = {}) %linear2_activation_post_process_0 : [#users=1] = call_module[target=linear2_activation_post_process_0](args = (%linear2,), kwargs = {}) return linear2_activation_post_process_0 ``` Graph after `convert_fx`: ``` graph(): %x : [#users=1] = placeholder[target=x] %x_activation_post_process_0_equalization_process_0_scale : [#users=1] = get_attr[target=x_activation_post_process_0_equalization_process_0_scale] %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_activation_post_process_0_equalization_process_0_scale), kwargs = {}) 
%linear1_input_scale_0 : [#users=1] = get_attr[target=linear1_input_scale_0] %linear1_input_zero_point_0 : [#users=1] = get_attr[target=linear1_input_zero_point_0] %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear1_input_scale_0, %linear1_input_zero_point_0, torch.quint8), kwargs = {}) %linear1 : [#users=1] = call_module[target=linear1](args = (%quantize_per_tensor,), kwargs = {}) %linear2 : [#users=1] = call_module[target=linear2](args = (%linear1,), kwargs = {}) %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear2,), kwargs = {}) return dequantize ``` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D29204347 fbshipit-source-id: 6bb9e25e2468f50df523885ded2edc731f002ac1 |
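The scaling rule described above can be checked numerically with a short sketch (plain tensors, not the library code): the current scale divides the weight column-wise, the next layer's equalization scale is folded row-wise into the previous weight and bias, and the end-to-end fp32 output is unchanged.
```
import torch

torch.manual_seed(0)
x = torch.randn(3, 4)
W1, b1 = torch.randn(5, 4), torch.randn(5)
W2, b2 = torch.randn(2, 5), torch.randn(2)

ref = torch.nn.functional.linear(torch.nn.functional.linear(x, W1, b1), W2, b2)

# s1 equalizes x against W1 (one scale per input feature of linear1);
# s2 equalizes linear1's output against W2 (one scale per input feature of linear2).
s1 = torch.rand(4) + 0.5
s2 = torch.rand(5) + 0.5

x_eq = x * s1                        # the inserted `mul` node
W1_eq = (W1 / s1) * s2.unsqueeze(1)  # divide columns by s1, scale rows by the next scale s2
b1_eq = b1 * s2                      # the bias picks up the next scale as well
W2_eq = W2 / s2                      # divide the next weight's columns by s2

out = torch.nn.functional.linear(torch.nn.functional.linear(x_eq, W1_eq, b1_eq), W2_eq, b2)
print(torch.allclose(ref, out, atol=1e-5))  # True: equalization preserves the fp32 output
```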
7917318917
[quant] Input-Weight Equalization - support for F.linear layers (#59964)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59964 Input-Weight Equalization support for functional layers Test Plan: `python test/test_quantization.py TestEqualizeFx.test_input_weight_equalization_convert` Original model: ``` FunctionalLinearModule( (linear1): Linear() ) ``` Graph after `prepare_fx`: ``` graph(): %x : [#users=1] = placeholder[target=x] %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {}) graph(): %x : [#users=1] = placeholder[target=x] %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {}) %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {}) %linear1_w : [#users=1] = get_attr[target=linear1.w] %linear1_w_equalization_process_0 : [#users=1] = call_module[target=linear1_w_equalization_process_0](args = (%linear1_w,), kwargs = {}) %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_00](args = (%linear1_w_equalization_process_0,), kwargs = {}) %linear1_b : [#users=1] = get_attr[target=linear1.b] %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b}) %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {}) return linear_activation_post_process_0 ``` Graph after equalization functions: ``` graph(): %x : [#users=1] = placeholder[target=x] %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale] %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {}) %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%mul,), kwargs = {}) %linear1_w : [#users=1] = get_attr[target=linear1.w] %linear1_w_equalization_process_0 : [#users=1] = call_module[target=linear1_w_equalization_process_0](args = (%linear1_w,), kwargs = {}) %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_00](args = (%linear1_w_equalization_process_0,), kwargs = {}) %linear1_b : [#users=1] = get_attr[target=linear1.b] %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b}) %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {}) return linear_activation_post_process_0 ``` Graph after `convert_fx`: ``` graph(): %x : [#users=1] = placeholder[target=x] %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale] %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {}) %linear1_input_scale_0 : [#users=1] = get_attr[target=linear1_input_scale_0] %linear1_input_zero_point_0 : [#users=1] = get_attr[target=linear1_input_zero_point_0] %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear1_input_scale_0, %linear1_input_zero_point_0, torch.quint8), kwargs = {}) %linear1_packed_weight_0 : [#users=1] = get_attr[target=linear1_packed_weight_0] %linear1_scale_0 : [#users=1] = get_attr[target=linear1_scale_0] %linear1_zero_point_0 : 
[#users=1] = get_attr[target=linear1_zero_point_0] %linear : [#users=1] = call_function[target=torch.ops.quantized.linear](args = (%quantize_per_tensor, %linear1_packed_weight_0, %linear1_scale_0, %linear1_zero_point_0), kwargs = {}) %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear,), kwargs = {}) return dequantize ``` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D29135459 fbshipit-source-id: 1e69bfbb82a0c89538e55b64968effd0b11b2fde |
e13a9587b4
Revert "Revert D29135358: [quant] Input-Weight Equaliaztion - convert modifications" (#60646)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60646
This reverts commit
7fc4e67771
ns for fx: fix shadow logger error for resnet18 (#60559)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60559 Adds `resnet18` to integration test, and fixes the error to make creating the shadow model work. Test Plan: ``` python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_resnet18 ``` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D29336236 fbshipit-source-id: 9425aa096162d80ef3a7c98144b2301cfbccc1ea |
4ddb2b43b7
ns for fx: expose function to add comparisons between logged values (#60311)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60311 Adds a user facing utility function to FX Numeric Suite Core APIs for comparing the values extracted by the loggers to each other. This is needed for any kind of analysis, so would be great to provide an example implementation. Example: ``` // code m = nn.Sequential(nn.Conv2d(1, 1, 1), nn.Conv2d(1, 1, 1)).eval() qconfig_dict = {'': torch.quantization.default_qconfig} mp = torch.quantization.quantize_fx.prepare_fx(m, qconfig_dict) mq = torch.quantization.quantize_fx.convert_fx(copy.deepcopy(mp)) results = extract_weights('fp32', mp, 'int8', mq) extend_logger_results_with_comparison( results, 'fp32', 'int8', compute_sqnr, 'sqnr_int8_vs_fp32') print(results) // results { '_1': {'weight': { 'fp32': [ {'type': 'weight', 'values': [tensor([[[[-0.3284]]]])], 'prev_node_name': '_1', 'prev_node_target_type': "<class 'torch.nn.modules.conv.Conv2d'>", 'ref_node_name': '_1', 'index_within_arg': 0, 'index_of_arg': 0} ], 'int8': [ {'type': 'weight', 'values': [tensor([[[[-0.3297]]]], size=(1, 1, 1, 1), dtype=torch.qint8, quantization_scheme=torch.per_tensor_affine, scale=0.002575645223259926, zero_point=0)], 'prev_node_name': '_1', 'prev_node_target_type': "<class 'torch.nn.quantized.modules.conv.Conv2d'>", 'ref_node_name': '_1', 'index_within_arg': 0, 'index_of_arg': 0, 'sqnr_int8_vs_fp32': [tensor(48.1308)]} ] }}, '_0': {'weight': { 'fp32': [{'type': 'weight', 'values': [tensor([[[[0.5205]]]])], 'prev_node_name': '_0', 'prev_node_target_type': "<class 'torch.nn.modules.conv.Conv2d'>", 'ref_node_name': '_0', 'index_within_arg': 0, 'index_of_arg': 0}], 'int8': [{'type': 'weight', 'values': [tensor([[[[0.5184]]]], size=(1, 1, 1, 1), dtype=torch.qint8, quantization_scheme=torch.per_tensor_affine, scale=0.004082232713699341, zero_point=0)], 'prev_node_name': '_0', 'prev_node_target_type': "<class 'torch.nn.quantized.modules.conv.Conv2d'>", 'ref_node_name': '_0', 'index_within_arg': 0, 'index_of_arg': 0, 'sqnr_int8_vs_fp32': [tensor(48.1309)]}] }} } ``` Test Plan: ``` python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_extend_logger_results_with_comparison ``` Imported from OSS Reviewed By: hx89 Differential Revision: D29244715 fbshipit-source-id: a5547b449ea54e046c752119559be49bd738beea |
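The comparison function only needs to map a pair of tensors to a summary value; a minimal SQNR function in the spirit of the `compute_sqnr` used above (the library's own helper may differ in details) could look like this, and is then passed to `extend_logger_results_with_comparison` exactly as in the example:
```
import torch

def compute_sqnr(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Signal-to-quantization-noise ratio, in dB, of y relative to reference x.
    if x.is_quantized:
        x = x.dequantize()
    if y.is_quantized:
        y = y.dequantize()
    return 20 * torch.log10(torch.norm(x) / torch.norm(x - y))
```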
31fe1c1323
ns for fx: rekey results by model node names (#60305)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60305 Adjusts the NS for FX weight and activation extraction APIs to require a model name, and rekeys the results of these APIs to use the node names of the specified model as layer keys. For example, before ``` // API call results = ns.extract_logger_info( model_a, model_b, ns.OutputLogger) // results {'base_op_1_0': {'node_output': {'model_a': [{'ref_node_name': 'linear1', ...}]}}} ``` and after ``` // API call results = ns.extract_logger_info( model_a, model_b, ns.OutputLogger, 'model_b_name') // results // note: instead of `base_op_1_0`, the layer is named `linear1` {'linear1': {'node_output': {'model_a': [{'ref_node_name': 'linear1', ...}]}}} ``` Note: we cannot use these names while collecting data because node names are not guaranteed to be consistent across graphs. This is why we only rekey as the very last step. Test Plan: ``` python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_layer_names ``` Imported from OSS Reviewed By: hx89 Differential Revision: D29243045 fbshipit-source-id: d39ecdfdd18b07291e3ecefed2ede287b100b7d0 |
4e347f1242
[docs] Fix backticks in docs (#60474)
Summary: There is a very common error when writing docs: One forgets to write a matching `` ` ``, and something like ``:attr:`x`` is rendered in the docs. This PR fixes most (all?) of these errors (and a few others). I found these running ``grep -r ">[^#<][^<]*\`"`` on the `docs/build/html/generated` folder. The regex finds an HTML tag that does not start with `#` (as python comments in example code may contain backticks) and that contains a backtick in the rendered HTML. This regex has not given any false positive in the current codebase, so I am inclined to suggest that we should add this check to the CI. Would this be possible / reasonable / easy to do malfet ? Pull Request resolved: https://github.com/pytorch/pytorch/pull/60474 Reviewed By: mrshenli Differential Revision: D29309633 Pulled By: albanD fbshipit-source-id: 9621e0e9f87590cea060dd084fa367442b6bd046 |
1120a1b92e
[quant][fx][fix] QAT with object_type in qconfig (#60555)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60555 When we do QAT, we swap the FP32 modules with the corresponding quantized modules counterpart by calling `qat_swap_modules` in prepare. However when we try to look up using the swapped module type in qconfig_dict, we cannot find a match anymore since the qconfig dict contains the original module type. In this PR we update the qconfig_dict to include the modules swapped for QATT Test Plan: python test/test_quantization.py TestQuantizeFx.test_qconfig_qat_module_type Imported from OSS Reviewed By: jerryzh168 Differential Revision: D29337036 fbshipit-source-id: 60212eec3ee252a2445c1b58874cb36048c9f7dd |
e60f9cfc58
Revert D29135358: [quant] Input-Weight Equaliaztion - convert modifications
Test Plan: revert-hammer
Differential Revision:
D29135358 (
3de79b7757
[quant] Input-Weight Equaliaztion - convert modifications (#59963)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59963 When converting, before quantizing the nodes, we call `update_obs_for_equalization()` and `convert_eq_obs()`. `update_obs_for_equalization`: 1. For each InputEqualizationObserver, we find the corresponding WeightEqualizationObserver. 2. For nn.Linear layers, we will create an instance of the WeightEqualizationObserver, run forward on the observer with the given weights. 3. Calculate the equalization scale between the InputEqualizationObserver and WeightEqualizationObserver. `convert_eq_obs`: For every InputEqualizationObserver, we will do the following: 1. Create a node (ex. `x0_activation_post_process_scale`) containing the equalization scale constant. 2. Create another node containing a `mul` operator multiplying the equalization scale and the input. 3. Remove the current InputEqualizationObserver node, and replace it with the `mul` node. For every WeightEqualizationObserver, we will do the following: 1. Get the next equalization scale (we may need this for equalizing connected linear layers). 2. Scale the weights by multiplying it with the reciprocal of the current equalization scale and the next equalization scale Currently, this supports models with `nn.Linear` layers, but does not support connecting linear layers. Test Plan: `python test/test_quantization.py TestEqualizeFx.test_input_weight_equalization_convert` Original Model: ``` .LinearModule( (linear): Linear(in_features=2, out_features=2, bias=True) ) ``` Graph after `prepare_fx`: ``` graph(): %x : [#users=1] = placeholder[target=x] %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {}) %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {}) %linear : [#users=1] = call_module[target=linear](args = (%x_activation_post_process_0,), kwargs = {}) %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {}) return linear_activation_post_process_0 ``` Graph after equalization functions: ``` graph(): %x : [#users=1] = placeholder[target=x] %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale] %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {}) %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%mul,), kwargs = {}) %linear : [#users=1] = call_module[target=linear](args = (%x_activation_post_process_0,), kwargs = {}) %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {}) return linear_activation_post_process_0 ``` Graph after `convert_fx`: ``` graph(): %x : [#users=1] = placeholder[target=x] %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale] %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {}) %linear_input_scale_0 : [#users=1] = get_attr[target=linear_input_scale_0] %linear_input_zero_point_0 : [#users=1] = get_attr[target=linear_input_zero_point_0] %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear_input_scale_0, %linear_input_zero_point_0, torch.quint8), kwargs = {}) %linear : [#users=1] = call_module[target=linear](args = (%quantize_per_tensor,), kwargs = {}) 
%dequantize : [#users=1] = call_method[target=dequantize](args = (%linear,), kwargs = {}) return dequantize ``` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D29135358 fbshipit-source-id: 2d00056729041318463de61841483490b6bfeee5 |
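Putting the pieces together, a heavily hedged end-to-end sketch: the keyword argument `equalization_qconfig_dict`, the `_equalize` import path, and `default_equalization_qconfig` are assumptions about this era's private API and have changed in later releases.
```
import torch
import torch.nn as nn
from torch.quantization import default_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx
# Assumed location of the equalization qconfig; this module is private.
from torch.quantization.fx._equalize import default_equalization_qconfig

model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4)).eval()
qconfig_dict = {'': default_qconfig}
equalization_qconfig_dict = {'': default_equalization_qconfig}

prepared = prepare_fx(model, qconfig_dict,
                      equalization_qconfig_dict=equalization_qconfig_dict)
prepared(torch.randn(8, 4))        # calibration populates both observer kinds
quantized = convert_fx(prepared)   # runs update_obs_for_equalization / convert_eq_obs
```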
4887c6e401
[quant] avoid resize calls in observer/fake_quant (#60386)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60386 During QAT we sometimes encounter errors with scripted models `RuntimeError: cannot resize variables that require grad` For per-tensor cases we don't need to resize some buffers so this PR removes the extra resize ops where applicable Test Plan: Imported from OSS Reviewed By: vkuzo Differential Revision: D29271905 fbshipit-source-id: 01a484a9559a3a4180490f9476d0cd3044ba0d1b |
4a3eea9a6a
[quant][graphmode][fx] Produce reference linear module in convert (#60152)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60152 Test Plan: python test/test_quantization.py TestQuantizeFx Imported from OSS Reviewed By: vkuzo Differential Revision: D29188263 fbshipit-source-id: f7bbbef5d4d747eadf7a627a4e77a5ec9bb0bc94 |
2293ab4e53
[quant][graphmode][fx] Refactor convert for linear to use get_static_module_mapping and get_dynamic_module_mapping (#60151)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60151 Test Plan: ``` python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps ``` Imported from OSS Reviewed By: vkuzo Differential Revision: D29188264 fbshipit-source-id: d2b77ffcf4b7446fc6c43248e43218092d2a6aea |
47d727fe1b
[quant][graphmode][fx] Produce conv reference static quant modules (#60138)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60138 Test Plan: python test/test_quantization.py TestQuantizeFx Imported from OSS Reviewed By: vkuzo Differential Revision: D29184791 fbshipit-source-id: 971a40012dbba0cf687c62a3a4af9358513c253b |
a029422cae
[quant][graphmode][fx][refactor] Change the env map to add dtype as a key (#60054)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60054 Previously env in convert is Dict[str, Tuple[Node, torch.dtype]], that is, at a given time each node can only have one dtype, this causes a problem for the following case: ``` class M(torch.nn.Module): def __init__(self): super().__init__() self.conv = nn.Conv2d(1, 1, 1) def forward(self, x): x = self.conv(x) x1 = x.expand_as(x) x2 = torch.add(x, x1) return x2 def forward(self, x): x = self.activation_post_process_0(x) x = self.conv(x) x = self.activation_post_process_1(x) x1 = x.expand_as(x) x1 = self.activation_post_process_2(x1) x2 = torch.add(x, x1) x2 = self.activation_post_process_3(x2) return x2 def forward(self, x): x = torch.quantize_per_tensor(x, ...) x = self.conv(x). # quantized conv x = torch.dequantize(x) x1 = x.expand_as(x) x1 = torch.quantize_per_tensor(x1, ...) # Error: x is dequantized x2 = torch.ops.quantized.add(x, x1) return x2 Currently we have a env that is a map from node name of the observed graph to the Node in the quantized graph, here the problem is that following a quantized operator conv, we have two operators, one is expecting float input (expand_as), the other is expecting quantized input (quantized add), and in the quantized graph, ideally, expand_as should consume the dequantized output, and quantized add should consume the quantized output: quantized_conv - dequantize - expand_as \ ------- quantized_add But currently in env, each node needs to either be quantized or not quantized. Therefore we will need to change env to include dtype as well: env: Dict[str, Dict[dtype, Node]], e.g. {‘x’: {torch.float: dequantized_node, torch.quint8: quantized_node}} And when we load from the env, we will need to provide the dtype of the Node that we want to load as well. We can have a separate pass to figure out this information for each node. ``` Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Imported from OSS Reviewed By: vkuzo Differential Revision: D29149408 fbshipit-source-id: c9e4b7d65444ab6a6f573929bae1db5037629892 |
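As an illustration of the new data-structure shape only (not the actual convert code), with placeholder strings standing in for graph nodes:
```
import torch
from typing import Any, Dict

# Each observed-graph node name maps to {dtype: node-in-quantized-graph}, so a
# float consumer (expand_as) and a quint8 consumer (quantized::add) can each
# load the version of `x` they need.
env: Dict[str, Dict[torch.dtype, Any]] = {
    "x": {torch.float: "dequantize_node", torch.quint8: "quantized_conv_node"},
}

def load_arg(name: str, dtype: torch.dtype) -> Any:
    # Hypothetical lookup helper: callers now state which dtype they want.
    return env[name][dtype]

print(load_arg("x", torch.float))   # -> dequantize_node
print(load_arg("x", torch.quint8))  # -> quantized_conv_node
```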
||
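As a reading aid, here is a small sketch of the data structure change described above (names are illustrative, not the exact internals of the convert pass): env becomes a two-level dict keyed by node name and then dtype, and loading an argument now has to state which dtype it wants.

```python
from typing import Dict

import torch
from torch.fx import Node

# before: Dict[node_name, Tuple[Node, dtype]] -- one node, one dtype
# after:  Dict[node_name, Dict[dtype, Node]]  -- one node per needed dtype
Env = Dict[str, Dict[torch.dtype, Node]]

def load_arg_sketch(env: Env, name: str, dtype: torch.dtype) -> Node:
    # expand_as would ask for the torch.float (dequantized) node, while
    # quantized::add would ask for the torch.quint8 (quantized) node
    return env[name][dtype]
```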
|
|
5a45103139 |
ns for fx: add API usage logging (#60103)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60103 Adds internal logging for NS for FX API usage. Test Plan: CI Reviewed By: jerryzh168 Differential Revision: D29166710 fbshipit-source-id: 2a1bf2f6038b0c6c5945b57b2db2de25c585a04a |
||
|
|
d5988c5eca |
remove unused type: ignore directives (#60006)
Summary: During development it is common practice to put `type: ignore` comments on lines that are correct but that `mypy` fails to recognize as such. This often stems from the fact that the `mypy` version in use wasn't able to handle the pattern in question. With every new release `mypy` gets better at handling complex code. In addition to fixing all the previously accepted but now failing patterns, we should also revisit all `type: ignore` comments to see if they are still needed. Fortunately, we don't need to do this manually: by adding `warn_unused_ignores = True` to the configuration, `mypy` will error out whenever it encounters a `type: ignore` that is no longer needed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/60006 Reviewed By: jbschlosser, malfet Differential Revision: D29133237 Pulled By: albanD fbshipit-source-id: 41e82edc5cd5affa7ccedad044b59b94dad4425a |
||
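A small hypothetical example of the kind of comment this setting flags: the ignore below may have been required by an older `mypy`, but once the checker understands the narrowing, `warn_unused_ignores` turns the stale comment into an error instead of letting it rot.

```python
from typing import Optional

def bump(x: Optional[int]) -> int:
    if x is None:
        return 0
    # modern mypy narrows x to int here, so with warn_unused_ignores = True
    # it reports an unused "type: ignore" comment on this line
    return x + 1  # type: ignore[operator]
```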
|
|
c0b7c59e55 |
[quant] Equalization Observer modifications (#59953)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59953 The following modifications were made to the equalization observers due to design changes: - [InputEqualizationObserver] Replaced `calculate_qparams()` with `calculate_scaled_minmax()` since we will need to return the scaled min/max values to update the following input quantization observer - [WeightEqualizationObserver] We no longer need a row observer since this will be taken care of by the following weight quantization observer - [WeightEqualizationObserver] Following the previous comment, we no longer need to calculate the scaled qparam values. Instead, we will use the equalization scale to later scale the weights and the qparams will be taken care of by the weight quantization observer. Test Plan: `python test/test_quantization.py TestEqualizeFx.test_input_weight_eq_observer` Imported from OSS Reviewed By: supriyar Differential Revision: D29135332 fbshipit-source-id: be7e468273c8b62fc183b1e1ec50f6bd6d8cf831 |
||
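To make the first bullet concrete, a rough sketch of what "scaled min/max" could look like (illustrative math only, not the exact upstream formula): the input equalization observer returns min/max values scaled by the equalization scale, and the input quantization observer that follows turns those into qparams.

```python
import torch

def calculate_scaled_minmax_sketch(min_inputs: torch.Tensor,
                                   max_inputs: torch.Tensor,
                                   equalization_scale: torch.Tensor):
    # scale the observed per-channel input ranges by the equalization scale,
    # then reduce to a single per-tensor min/max for the next quantization
    # observer to consume
    min_scaled = torch.min(min_inputs * equalization_scale)
    max_scaled = torch.max(max_inputs * equalization_scale)
    return min_scaled, max_scaled
```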
|
|
45c31cabb5 |
[quant] Input Weight Equalization - prepare modifications (#59747)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59747 Modifies prepare_fx for input-weight equalization. If a current node is being equalized (there exists a EqualizationQConfig), then the EqualizationObserver will be inserted before its quantization observer. For a singular linear layer, the general flow looks like: Original graph: `x0 -> linear -> x1`, `w -> linear` After prepare: `x0 -> InpEqObs -> MinMaxObs -> linear1 -> MinMaxObs -> x1` `w -> WeightEqObs -> MinMaxObs -> linear1` For two connected linear layers, the general flow looks like: Original graph: `x0 -> linear1 -> linear2 -> x1`, `w1 -> linear1`, `w2 -> linear2` After prepare: `x0 -> InpEqObs -> MinMaxObs -> linear1 -> MinMaxObs -> InpEqObs -> linear2 -> MinMaxObs -> x1` `w1 -> WeightEqObs -> MinMaxObs -> linear1`, `w2 -> WeightEqObs -> MinMaxObs -> linear2 Test Plan: `python test/test_quantization.py TestEqualizeFx.test_input_equalization_prepare` Original model with one `nn.Linear` layer ``` LinearModule( (linear): Linear(in_features=1, out_features=1, bias=True) ) ``` Graph after `prepare_fx`: ``` graph(): %x : [#users=1] = placeholder[target=x] %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {}) %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {}) %linear : [#users=1] = call_module[target=linear](args = (%x_activation_post_process_0,), kwargs = {}) %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {}) return linear_activation_post_process_0 ``` -------------------------------------- Original model with two connected functional linear layers ``` FunctionalLinearModule( (linear1): Linear() (linear2): Linear() ) ``` Graph after `prepare_fx`: ``` graph(): %x : [#users=1] = placeholder[target=x] %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {}) %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {}) %linear1_w : [#users=1] = get_attr[target=linear1.w] %linear1_w_equalization_process_0 : [#users=1] = call_module[target=linear1_w_equalization_process_0](args = (%linear1_w,), kwargs = {}) %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_00](args = (%linear1_w_equalization_process_0,), kwargs = {}) %linear1_b : [#users=1] = get_attr[target=linear1.b] %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b}) %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {}) %linear_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0_equalization_process_0](args = (%linear_activation_post_process_0,), kwargs = {}) %linear2_w : [#users=1] = get_attr[target=linear2.w] %linear2_w_equalization_process_0 : [#users=1] = call_module[target=linear2_w_equalization_process_0](args = (%linear2_w,), kwargs = {}) %linear2_w_activation_post_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_00](args = (%linear2_w_equalization_process_0,), kwargs = {}) %linear2_b : [#users=1] = get_attr[target=linear2.b] %linear_1 : 
[#users=1] = call_function[target=torch.nn.functional.linear](args = (%linear_activation_post_process_0_equalization_process_0, %linear2_w_activation_post_process_0), kwargs = {bias: %linear2_b}) %linear_1_activation_post_process_0 : [#users=1] = call_module[target=linear_1_activation_post_process_0](args = (%linear_1,), kwargs = {}) return linear_1_activation_post_process_0 ``` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D29135316 fbshipit-source-id: 91697e805ede254dbb2a42ee4c23eb1c1c64590e |
||
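For reference, a sketch of the module shape implied by the second graph dump above (the definitions below are illustrative; the actual test model lives in the quantization test suite). Because each `Linear()` submodule holds raw `w`/`b` tensors and calls `F.linear`, FX tracing produces `get_attr` nodes for `linear1.w`/`linear1.b`, which is where the weight equalization and quantization observers get inserted.

```python
import torch
import torch.nn.functional as F

class Linear(torch.nn.Module):
    # functional-linear wrapper: the default FX tracer traces through it,
    # exposing w and b as get_attr nodes in the graph
    def __init__(self, in_features=4, out_features=4):
        super().__init__()
        self.w = torch.nn.Parameter(torch.randn(out_features, in_features))
        self.b = torch.nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        return F.linear(x, self.w, bias=self.b)

class FunctionalLinearModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = Linear()
        self.linear2 = Linear()

    def forward(self, x):
        return self.linear2(self.linear1(x))
```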
|
|
7ce74f3339 |
[quant] EqualizationQConfig to distinguish input/output activations (#59739)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59739 Created an EqualizationQConfig specifically for equalization. This inherits from QConfig and is used to distinguish between inserting an input observer and inserting an output observer. Since the output observer field is included in the EqualizationQConfig, we no longer need an output observer field in the _InputEqualizationObserver. Test Plan: compiles Imported from OSS Reviewed By: ezyang Differential Revision: D29135298 fbshipit-source-id: 3dde9c029c291467ff0a0845f0fc9c44573fc6f6 |
||
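A rough sketch of the shape of such a config (field names and observers below are illustrative, not the exact upstream definition): like QConfig it bundles observer factories, but it separates the input-activation equalization observer from the weight equalization observer so that prepare knows which one goes in front of the activation and which in front of the weight.

```python
from collections import namedtuple

# illustrative: two observer factories, one for the input activation and
# one for the weight, mirroring QConfig's (activation, weight) pair
EqualizationQConfigSketch = namedtuple(
    "EqualizationQConfigSketch", ["input_activation", "weight"]
)

# usage sketch, assuming observer classes exposing a .with_args factory:
# eq_qconfig = EqualizationQConfigSketch(
#     input_activation=_InputEqualizationObserver.with_args(dtype=torch.quint8),
#     weight=_WeightEqualizationObserver.with_args(dtype=torch.qint8),
# )
```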
|
|
a344b09db2 |
[quant][fx][graphmode] Remove Quantizer class (#59606)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59606 Test Plan: python test/test_quantization.py TestQuantizeFx Imported from OSS Reviewed By: vkuzo Differential Revision: D28951432 fbshipit-source-id: 3301f7200a4c7166673c27f9ac7ff559f1e6935d |
||
|
|
864d129bae |
[quant][fx] Remove extra q-dq for weight bias in normalization ops (#59882)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59882 Currently for normalization ops, the weight and bias arguments are treated as activation inputs which require observers. This results in adding extra quant-dequant ops for the weight and bias inputs. This PR adds support to skip observing weight/bias inputs of norm operators, thus removing the redundant q-dq ops. Quantized graph with F.layer_norm before this PR: ``` def forward(self, x): _input_scale_0 = self._input_scale_0 _input_zero_point_0 = self._input_zero_point_0 quantize_per_tensor = torch.quantize_per_tensor(x, _input_scale_0, _input_zero_point_0, torch.quint8); x = _input_scale_0 = _input_zero_point_0 = None scale = self.scale _input_scale_1 = self._input_scale_1 _input_zero_point_1 = self._input_zero_point_1 quantize_per_tensor_1 = torch.quantize_per_tensor(scale, _input_scale_1, _input_zero_point_1, torch.quint8); scale = _input_scale_1 = _input_zero_point_1 = None bias = self.bias _input_scale_2 = self._input_scale_2 _input_zero_point_2 = self._input_zero_point_2 quantize_per_tensor_2 = torch.quantize_per_tensor(bias, _input_scale_2, _input_zero_point_2, torch.quint8); bias = _input_scale_2 = _input_zero_point_2 = None _scale_0 = self._scale_0 _zero_point_0 = self._zero_point_0 dequantize = quantize_per_tensor_1.dequantize(); quantize_per_tensor_1 = None dequantize_1 = quantize_per_tensor_2.dequantize(); quantize_per_tensor_2 = None layer_norm = torch.ops.quantized.layer_norm(quantize_per_tensor, [2, 5, 5], weight = dequantize, bias = dequantize_1, eps = 1e-05, output_scale = _scale_0, output_zero_point = _zero_point_0); quantize_per_tensor = dequantize = dequantize_1 = _scale_0 = _zero_point_0 = None dequantize_2 = layer_norm.dequantize(); layer_norm = None return dequantize_2 ``` After: ``` def forward(self, x): _input_scale_0 = self._input_scale_0 _input_zero_point_0 = self._input_zero_point_0 quantize_per_tensor = torch.quantize_per_tensor(x, _input_scale_0, _input_zero_point_0, torch.quint8); x = _input_scale_0 = _input_zero_point_0 = None scale = self.scale bias = self.bias _scale_0 = self._scale_0 _zero_point_0 = self._zero_point_0 layer_norm = torch.ops.quantized.layer_norm(quantize_per_tensor, [2, 5, 5], weight = scale, bias = bias, eps = 1e-05, output_scale = _scale_0, output_zero_point = _zero_point_0); quantize_per_tensor = scale = bias = _scale_0 = _zero_point_0 = None dequantize = layer_norm.dequantize(); layer_norm = None return dequantize ``` Test Plan: python test/test_quantization.py TestQuantizeFxOps.test_norm_weight_bias Imported from OSS Reviewed By: HDCharles, ailzhang Differential Revision: D29068203 fbshipit-source-id: 24b5c38bbea5fd355d34522bfa654c9db18607da |
||
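A sketch of the kind of float module the graphs above come from (shapes taken from the dump; the module definition itself is illustrative): `scale` and `bias` are plain float parameters passed to `F.layer_norm`, so observing them only produces the redundant quant-dequant pairs removed by this PR.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerNormModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(2, 5, 5))
        self.bias = nn.Parameter(torch.zeros(2, 5, 5))

    def forward(self, x):
        # weight/bias here are not activations; after this PR they are fed
        # to quantized::layer_norm as-is instead of being quant-dequanted
        return F.layer_norm(x, [2, 5, 5], weight=self.scale,
                            bias=self.bias, eps=1e-05)
```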
|
|
d75e99b709 |
fx quant: enable qconfig_dict to target function invocations by order (#59605)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59605 Enables targeting of individual function invocations by execution order. For example, given a module such as ``` class M1(torch.nn.Module): def forward(self, x): x = torch.add(x, x) x = torch.add(x, x) return x class M2(torch.nn.Module): def __init__(self): super().__init__() self.m1 = M1() def forward(self, x): x = self.m1(x) return x ``` We can now target the first add of `m1` with ``` qconfig_dict = { "module_name_function_order": ("m1", torch.add, 0, custom_qconfig), } ``` Test Plan: ``` python test/test_quantization.py TestQuantizeFx.test_qconfig_module_name_function_order ``` Imported from OSS Reviewed By: hx89 Differential Revision: D28951077 fbshipit-source-id: 311d423724a31193d4fa4bbf3a712b46464b5a29 |
||
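A usage sketch under the qconfig_dict format shown in the summary (the exact key and value format may differ across releases; `custom_qconfig` stands in for whatever QConfig the user wants at that call site):

```python
import torch
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import convert_fx, prepare_fx

custom_qconfig = get_default_qconfig("fbgemm")
qconfig_dict = {
    # only the first torch.add invocation inside submodule "m1" is targeted
    "module_name_function_order": ("m1", torch.add, 0, custom_qconfig),
}

m = M2().eval()  # M2 as defined in the summary above
prepared = prepare_fx(m, qconfig_dict)
quantized = convert_fx(prepared)
```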
|
|
0099c25b85 |
fx quant: remove some dead code in observer insertion (redo) (#59799)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59799 This is a redo of #58574; it was easier to create a new PR than to fix the rebase conflicts, as there have been a large number of refactors to the underlying code. It removes some code which was incorrectly added by #57519 but never actually used for anything. Test Plan: ``` python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps ``` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D29031955 fbshipit-source-id: f407d181070cb283382965952821e3647c705544 |
||
|
|
61965abad7 |
Move _PartialWrapper to module scope (#59660)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59660 Context https://github.com/pytorch/pytorch/issues/57352 Test Plan: Pytorch CI tests Reviewed By: vkuzo Differential Revision: D28972991 fbshipit-source-id: efc9dd3e90e18e1cdf27d5ef0f168abd8169bc42 |
||
|
|
7dac2987ce |
[quant][eager][fix] Fix a typo in convert function in eager mode quantization (#59571)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59571 Test Plan: python test/test_quantization.py TestPostTrainingStatic.test_custom_module_class Imported from OSS Reviewed By: vkuzo Differential Revision: D28938355 fbshipit-source-id: 566daeb07d616ae40e52754d3d4581f75f248f04 |
||
|
|
cc03ea2c47 |
[quant] Implemented InputWeightObserver for Linear inputs
Summary: Implemented two observers (InputEqualObserver and WeightEqualObserver) which will be inserted into the graph during prepare_fx(). Test Plan: python test/test_quantization.py TestEqualizeFx Reviewed By: supriyar Differential Revision: D28836954 fbshipit-source-id: 25517dc82ae67698ed8b2dc334e3323286976104 |
||
|
|
1aa14fcb14 |
Fix the "tensors to be on the same device" error in HistogramObserver (#59234)
Summary: Fixes https://github.com/pytorch/pytorch/issues/59075 This PR fixes the "tensors to be on the same device" error in `HistogramObserver`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/59234 Reviewed By: jbschlosser Differential Revision: D28837572 Pulled By: vkuzo fbshipit-source-id: ff7c3229ced7de2cdd8f76d526f0fd33ac643216 |
||
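A minimal reproduction sketch for this class of bug (requires a CUDA device; illustrative, the exact failing call path is in the linked issue): before the fix, tensors created internally by the observer could land on the CPU while the observed activation lived on the GPU, triggering the "same device" error.

```python
import torch
from torch.quantization import HistogramObserver

obs = HistogramObserver().cuda()
x = torch.randn(32, device="cuda")
obs(x)  # pre-fix this could raise "Expected all tensors to be on the same device"
print(obs.calculate_qparams())
```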
|
|
18642e664a |
[quant][graphmode][fx][refactor] Split quantize.py to prepare.py and convert.py (#59353)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59353 Next: remove Quantizer class Test Plan: Imported from OSS Reviewed By: raghuramank100 Differential Revision: D28856277 fbshipit-source-id: 25f5502be387dbe9706780f667501b46b82789a5 |
||
|
|
87a25e09f4 |
[quant][graphmode][fx][refactor] Remove _convert from Quantizer class (#59042)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59042 To remove the Quantizer class and split the prepare and convert functions into different files. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Imported from OSS Reviewed By: vkuzo Differential Revision: D28724867 fbshipit-source-id: 9f87d51020caa20d5408cb2820947e23d92d5fc3 |
||
|
|
3218d890dd |
[quant][graphmode][fx][fix] Fix support for custom module (#59041)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59041 Static quantization support for custom modules was removed in a previous refactor (https://github.com/pytorch/pytorch/pull/57519) since it was not covered by the test case. This PR re-enables the test case and fixes the support. Test Plan: Imported from OSS Reviewed By: vkuzo Differential Revision: D28724866 fbshipit-source-id: 1974675b88b56a2173daf86965d6f3fb7ebd783b |
||
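For orientation, a sketch of how statically quantized custom modules are typically wired up in FX graph mode (the config keys and classmethod names below are assumptions about the workflow of this era; treat them as illustrative and check the docs for the exact spelling): the user maps a float custom module class to an observed class for prepare, and the observed class to a quantized class for convert.

```python
import torch
import torch.nn as nn

class CustomModule(nn.Module):            # float version
    def forward(self, x):
        return x + 1

class ObservedCustomModule(nn.Module):    # swapped in during prepare
    @classmethod
    def from_float(cls, float_module):
        return cls()
    def forward(self, x):
        return x + 1

class QuantizedCustomModule(nn.Module):   # swapped in during convert
    @classmethod
    def from_observed(cls, observed_module):
        return cls()
    def forward(self, x):
        return x + 1

# assumed structure of the custom-config dicts (illustrative):
prepare_custom_config_dict = {
    "float_to_observed_custom_module_class": {
        "static": {CustomModule: ObservedCustomModule},
    },
}
convert_custom_config_dict = {
    "observed_to_quantized_custom_module_class": {
        "static": {ObservedCustomModule: QuantizedCustomModule},
    },
}
```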
|
|
06af7618e7 |
[quant][graphmode][fx][refactor] Remove Quantizer class from convert (QuantizeHandler) (#59040)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59040 To remove the Quantizer class and split the prepare and convert functions into different files. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Imported from OSS Reviewed By: vkuzo Differential Revision: D28724870 fbshipit-source-id: c0f748711b825cd46bdfcc05c054c77a41e8207a |
||
|
|
50e6ee3ca2 |
[quant][graphmode][fx][refactor] Remove Quantizer class from quantize_node (#59039)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59039 To remove the Quantizer class and split the prepare and convert functions into different files. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Imported from OSS Reviewed By: vkuzo Differential Revision: D28724874 fbshipit-source-id: bd984716b2da1d6879c3e92fa827574783a41567 |