Summary: Add a unit test for the Dynamic Quantized Linear operator (`torch.fbgemm_linear_quantize_weight`, `torch.fbgemm_pack_quantized_matrix`, and `torch.fbgemm_linear_int8_weight`) in `test_quantized.py`.
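For reference, a minimal sketch (not the test code itself) of the fp32-in/fp32-out math that dynamic quantized linear approximates: the weight is quantized to int8 ahead of time, the activation is quantized dynamically from its observed range at run time, and the integer accumulation is dequantized back to fp32. It deliberately uses only basic tensor ops rather than the fbgemm ops named above, whose exact signatures are not shown here.
```
import torch

def dynamic_quantized_linear_ref(x, w, b):
    # Reference-only sketch: emulate dynamic quantized linear with plain tensor ops.
    # 1) Quantize the weight to int8 ahead of time (per-tensor affine).
    w_min, w_max = w.min(), w.max()
    w_scale = (w_max - w_min).clamp(min=1e-8) / 255.0
    w_zp = (-128 - w_min / w_scale).round()
    w_q = (w / w_scale + w_zp).round().clamp(-128, 127)
    # 2) Quantize the activation dynamically from its observed range (uint8).
    x_min, x_max = x.min(), x.max()
    x_scale = (x_max - x_min).clamp(min=1e-8) / 255.0
    x_zp = (-x_min / x_scale).round()
    x_q = (x / x_scale + x_zp).round().clamp(0, 255)
    # 3) Integer matmul (emulated in fp32 here), then dequantize and add fp32 bias.
    acc = (x_q - x_zp) @ (w_q - w_zp).t()
    return acc * (x_scale * w_scale) + b

x, w, b = torch.randn(4, 16), torch.randn(8, 16), torch.randn(8)
err = (dynamic_quantized_linear_ref(x, w, b) - torch.nn.functional.linear(x, w, b)).abs().max()
print(err)  # small quantization error vs. the fp32 linear
```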
Test Plan:
buck test mode/dev caffe2/test:quantized -- 'test_qlinear_legacy \(test_quantized\.TestDynamicQuantizedLinear\)' --print-passing-details
[jianyuhuang@devvm29567.prn1.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantized -- 'test_dynamic_qlinear \(test_quantized\.TestQuantizedLinear\)' --print-passing-details
Parsing buck files: finished in 1.8 sec
Building: finished in 3.4 sec (100%) 6772/6772 jobs, 2 updated
Total time: 5.2 sec
Trace available for this run at /tmp/testpilot.20190714-220130.2698168.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision 4f180136f799ab45ec2bf5d7644cb14955d4dd7a fbpkg 6c6253f255644ca3b8ce1bc5955b0f25 at Mon Jul 8 14:13:38 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/651/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900044862617
✓ caffe2/test:quantized - test_dynamic_qlinear (test_quantized.TestQuantizedLinear) 0.023 1/1
(passed)
Test output:
> test_dynamic_qlinear (test_quantized.TestQuantizedLinear) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 0.024s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900044862617
Summary (total time 9.03s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
Differential Revision: D16404027
fbshipit-source-id: 4c85dd255637fd8b1eb4830e0464f48c22706f41
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29529
Pull Request resolved: https://github.com/pytorch/glow/pull/3771
We would like to replace `conv_prepack` with `conv2d_prepack` and `conv_unpack` with `conv2d_unpack`.
This makes the naming consistent between 2D and 3D conv:
```
torch.ops.quantized.conv2d_prepack
torch.ops.quantized.conv2d_unpack
torch.ops.quantized.conv2d
torch.ops.quantized.conv3d_prepack
torch.ops.quantized.conv3d_unpack
torch.ops.quantized.conv3d
```
For better engineering, we should do this rename sooner rather than later, before more users depend on the quantized conv2d ops.
The replacement bash commands are as follows:
```
find ./ -type f -exec sed -i -e 's/quantized::conv_prepack/quantized::conv2d_prepack/g' {} \;
find ./ -type f -exec sed -i -e 's/quantized::conv_unpack/quantized::conv2d_unpack/g' {} \;
find ./ -type f -exec sed -i -e 's/torch.ops.quantized.conv_prepack/torch.ops.quantized.conv2d_prepack/g' {} \;
find ./ -type f -exec sed -i -e 's/torch.ops.quantized.conv_unpack/torch.ops.quantized.conv2d_unpack/g' {} \;
```
ghstack-source-id: 93661879
Test Plan: CI
Reviewed By: jackm321
Differential Revision: D18421079
fbshipit-source-id: 17ae8b1ee79223bd2c5d4bbccd57af6580c4ab12
Summary:
This test is reported to be flaky due to deadline expiration. This PR flags it as a no_deadline test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29502
Differential Revision: D18416632
Pulled By: lly-zero-one
fbshipit-source-id: 27cd7b28139f3f16ee0cf5802a0709385719d487
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29328
Tests are flaky as seen in issue #29326.
Disable until we fix the kernels.
Test Plan:
python test/test_quantized.py TestQNNPackOps
Imported from OSS
Differential Revision: D18358200
fbshipit-source-id: 58f1981799fe8253234fcc7b0540e1c0b6babc15
Summary:
This is actually a bug in both the test and the average pool implementation.
In the test, we used the quantized values as the float input and failed to pad the input with the zero_point.
In the op implementation, the size used for averaging is incorrect in the padded case when count_include_pad is true.
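A small sketch of the corrected reference computation, assuming the post-rename `torch.quantize_per_tensor` API and `torch.nn.quantized.functional.avg_pool2d`: the fp32 reference pools the dequantized input (so implicit zero padding corresponds to the zero_point in the quantized domain), and count_include_pad decides whether those padded positions enter the divisor.
```
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
qx = torch.quantize_per_tensor(x, scale=0.05, zero_point=64, dtype=torch.quint8)

# Correct fp32 reference: pool the *dequantized* values, then re-quantize.
# Pooling the raw integer values would treat zero-padded positions as integer 0,
# i.e. as a real value of -zero_point * scale, instead of 0.0.
ref_fp32 = F.avg_pool2d(qx.dequantize(), kernel_size=3, padding=1,
                        count_include_pad=True)
ref = torch.quantize_per_tensor(ref_fp32, 0.05, 64, torch.quint8)

out = torch.nn.quantized.functional.avg_pool2d(qx, kernel_size=3, padding=1,
                                               count_include_pad=True)
print((out.dequantize() - ref.dequantize()).abs().max())
```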
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28260
Differential Revision: D18039960
Pulled By: lly-zero-one
fbshipit-source-id: 7b5d34498b60f5d574a276a22798c9f576944734
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27675
This leverages QNNPACK global average pooling to perform torch.mean on input feature maps
Currently it only supports taking the mean along the HxW plane of an NCHW tensor.
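An illustrative sketch of the case this covers (assuming quantized `torch.mean` with `dim` is exposed, as this change intends): the mean over the HxW plane of an NCHW quantized tensor, which is what QNNPACK's global average pooling computes.
```
import torch

x = torch.randn(1, 8, 7, 7)   # NCHW feature map
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=128, dtype=torch.quint8)

# Mean along the HxW plane only -- the case handled by the QNNPACK
# global average pooling path.
qy = torch.mean(qx, dim=[2, 3])
print(qy.shape)                                     # torch.Size([1, 8])
print((qy.dequantize() - x.mean(dim=[2, 3])).abs().max())
```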
Test Plan:
python test/test_quantized.py TestQuantizedOps.test_mean
Imported from OSS
Differential Revision: D17989336
fbshipit-source-id: 8d4cbcbed5f146290b1580d26e5b45359d293761
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28246
Updated the reference fp32 implementation to use the dequantized input tensor to correctly take padded values into account
Test Plan:
python test/test_quantized.py TestQNNPackOps.test_avg_pool2d
Imported from OSS
Differential Revision: D17989334
fbshipit-source-id: 848ce78713280f529f71ff48e930db8de18abc62
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27631
Add support to perform avg_pool2d on mobile. Tested using the existing avg_pool2d Python tests.
Uses the QNNPACK backend, which currently only supports 4-dimensional inputs.
Test Plan:
python test/test_quantized.py TestQNNPackOps.test_avg_pool2d
Imported from OSS
Differential Revision: D17973792
fbshipit-source-id: 95ffffb2da656ed911a618b9cb68d6b728c16c74
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27616
Fix a problem in the reference implementation of equal.
Test Plan:
python test/test_quantized.py
Imported from OSS
Differential Revision: D17837055
fbshipit-source-id: 1e4bc32f4334c0352468a61fa4316a1c0ff76485
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26992
Run the same test for FBGEMM and QNNPACK backends.
Checks that QNNPACK or FBGEMM is supported before running the test (using supported_qengines).
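A sketch of the check-and-switch pattern via the public backends API (the test suite itself goes through a supported_qengines helper, which is assumed here to wrap the same information):
```
import torch

print(torch.backends.quantized.supported_engines)   # e.g. ['none', 'fbgemm', 'qnnpack']

for engine in torch.backends.quantized.supported_engines:
    if engine == 'none':
        continue
    torch.backends.quantized.engine = engine
    # ... run the same quantized Linear / Conv test body under this engine ...
```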
Test Plan:
python test/test_quantized.py TestQuantizedLinear
python test/test_quantized.py TestQuantizedConv
python test/test_quantized_models.py
python test/test_quantized_nn_mods.py
Imported from OSS
Differential Revision: D17689171
fbshipit-source-id: e11c0a5e41f5f4e6836a614a5b61e4db3c5e384b
Summary:
The QuantizedAVx2 path does not support the int32 type, so we switch to using the at::quantize_vec function instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26854
Differential Revision: D17609872
Pulled By: llyfacebook
fbshipit-source-id: b4a77d93ce0ebfef696506b5cdbe3e91fe44bb36
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26676
Just makes it more user-friendly by allowing any floating point or integer values to be passed as scales or zero_points for per-channel quantization. This matches the behavior of the per-tensor quantizer, where those arguments are scalars (not tensors) and automatic casting is therefore applied.
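A short sketch of the intended flexibility, using the post-rename function names and illustrative values; scales and zero_points can be ordinary float / int tensors and are cast as needed, mirroring the scalar scale / zero_point of per-tensor quantization.
```
import torch

x = torch.randn(4, 3)

# Per-tensor: scale / zero_point are plain Python scalars, cast automatically.
q1 = torch.quantize_per_tensor(x, scale=0.1, zero_point=2, dtype=torch.quint8)

# Per-channel: after this change the argument tensors may hold ordinary
# float32 / int32 values rather than exactly double / int64.
scales = torch.tensor([0.1, 0.2, 0.3], dtype=torch.float32)
zero_points = torch.tensor([0, 1, 2], dtype=torch.int32)
q2 = torch.quantize_per_channel(x, scales, zero_points, axis=1, dtype=torch.quint8)
print(q2.q_per_channel_scales(), q2.q_per_channel_zero_points())
```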
Test Plan: Imported from OSS
Differential Revision: D17537051
Pulled By: dzhulgakov
fbshipit-source-id: e955ccdb5b4691828a559dc8f1ed7de54b6d12c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26675
Based on an offline poll, we're very unlikely to have multi-axis quantized tensors in the foreseeable future. Let's simplify the API and just return an int instead of a list. This also matches the singular `axis` name.
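A minimal sketch of the resulting accessor behavior (post-rename API names assumed):
```
import torch

qx = torch.quantize_per_channel(
    torch.randn(2, 3),
    scales=torch.tensor([0.1, 0.2, 0.3]),
    zero_points=torch.tensor([0, 0, 0]),
    axis=1, dtype=torch.quint8)

# After this change the accessor returns a plain int rather than a one-element list.
axis = qx.q_per_channel_axis()
print(axis, type(axis))   # 1 <class 'int'>
```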
Test Plan: Imported from OSS
Differential Revision: D17537052
Pulled By: dzhulgakov
fbshipit-source-id: 676abc3b251d288468aaed467b5e5ca4063b98b0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26524
This creates an NHWC specialization for `quantized::cat` that kicks in when all inputs are `NHWC`. This ensures the correct layout is propagated downstream and also provides an optimized implementation specifically for this data layout.
Benchmark script based on SqueezeNet shapes:
```
import torch, time

torch.manual_seed(0)

# NHWC
sizes = [
    (1, 54, 54, 64),
    (1, 54, 54, 128),
    (1, 26, 26, 128),
    (1, 26, 26, 256),
    (1, 12, 12, 256)
]

for size in sizes:
    x = torch.rand(*size)
    y = torch.rand(*size)
    qX = torch.quantize_linear(x, 0.01, 3, torch.qint8).permute([0, 3, 1, 2])
    qY = torch.quantize_linear(y, 0.01, 3, torch.qint8).permute([0, 3, 1, 2])

    ref = torch.cat([qX.dequantize(), qY.dequantize()], dim=1)

    NITER = 1000
    s = time.time()
    for i in range(NITER):
        out = torch.ops.quantized.cat([qX, qY], dim=1, scale=0.01, zero_point=3)
    time_per_iter = (time.time() - s) / NITER
    print('time per iter ms', time_per_iter * 1000)
    print('gb/s', (qX.numel() + qY.numel() + out.numel()) * qX.element_size() / time_per_iter / 1e9)

    torch.testing.assert_allclose(out.dequantize(), ref)
```
Before this change
```
time per iter ms 0.6898486614227295
gb/s 1.0821156026605054
time per iter ms 1.5480577945709229
gb/s 0.9644291093239284
time per iter ms 0.3180875778198242
gb/s 1.0881028500775023
time per iter ms 0.6702737808227539
gb/s 1.032748139350315
time per iter ms 0.13010454177856445
gb/s 1.1333655073392244
```
After this change
```
time per iter ms 0.11604785919189453
gb/s 6.432656364350577
time per iter ms 0.15956878662109375
gb/s 9.356416324360508
time per iter ms 0.040181636810302734
gb/s 8.613685939027139
time per iter ms 0.06564664840698242
gb/s 10.544696748392909
time per iter ms 0.018549680709838867
gb/s 7.949247337814738
```
Test Plan: Imported from OSS
Differential Revision: D17503593
Pulled By: jamesr66a
fbshipit-source-id: ec5d57ad8fbcb3fd9379e8bd370abd29d386f953
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26586
Use the backend engine flag to call QNNPACK for quantized ops.
Test Plan: python test/test_quantized.py TestQNNPACKOps
Differential Revision: D17515129
Pulled By: supriyar
fbshipit-source-id: 951e90205aa19581ea006a91d9514fc7a94409ef
Summary:
In this PR, we fix the Windows build issue from D17437015.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26580
Differential Revision: D17517341
Pulled By: llyfacebook
fbshipit-source-id: db726596aa8f7c992c5a7ddc2781dc3aa0312284
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26599
These fail due to tolerance in equality comparison. Disable them for now.
ghstack-source-id: 90553855
Test Plan: unit tests
Differential Revision: D17517085
fbshipit-source-id: a4d9278e356318719ccd84047404915a97944f52
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26575
To stay consistent with `quantize_per_tensor`, we also rename `quantize_linear_per_channel` to `quantize_per_channel`.
Test Plan:
ci
Imported from OSS
Differential Revision: D17517360
fbshipit-source-id: 3af7d8f0fbe99148b79fcb1ad2fe811f776590cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26574
Since we also have `quantized::linear`, the name `quantize_linear` sounds confusing, so we rename it to `quantize_per_tensor` before the branch cut.
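For illustration, the rename in terms of the user-facing call (old name shown only as a comment):
```
import torch

x = torch.randn(2, 2)
# Before: qx = torch.quantize_linear(x, 0.1, 0, torch.quint8)
# After:
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
print(qx.q_scale(), qx.q_zero_point())
```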
Test Plan:
ci
Imported from OSS
Differential Revision: D17514876
fbshipit-source-id: 01d9005e6ec8cb9950b9d8bba122109c389641d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26335
Use the backend engine flag to call QNNPACK for quantized ops.
Test Plan:
python test/test_quantized.py TestQNNPACKOps
Imported from OSS
Differential Revision: D17504331
fbshipit-source-id: 35cb2189067ac5cc6a7307179ef0335d1cec7b8f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26307
Add support for FP32 bias. Re-quantize the bias at run time based on the input scale.
If the value of the input scale in the packed struct changes, we re-quantize the bias with the updated input scale.
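The bias requantization described above, sketched in plain Python (per-tensor weight scale assumed): the int32 bias consumed by the kernel is the fp32 bias divided by input_scale * weight_scale, so it has to be recomputed whenever the observed input scale changes.
```
import torch

def requantize_bias(bias_fp32, input_scale, weight_scale):
    # int32 bias used by the quantized kernel; recomputed whenever the
    # observed input scale changes.
    bias_scale = input_scale * weight_scale
    return torch.round(bias_fp32 / bias_scale).to(torch.int32), bias_scale

bias = torch.randn(8)
bias_q, bias_scale = requantize_bias(bias, input_scale=0.02, weight_scale=0.005)
print((bias_q.float() * bias_scale - bias).abs().max())   # small rounding error
```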
Test Plan: python test/test_quantized.py TestQNNPackOps
Differential Revision: D17504253
Pulled By: supriyar
fbshipit-source-id: 49fe36a0bee91aaeb085db28eec4ded8c684dcf4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26242
According to https://github.com/pytorch/pytorch/issues/19092 we always keep the NCHW order and handle layout inside the kernels. This PR fixes it for the activations of qconv by using the MemoryLayout mechanism: activations stay logically NCHW but are strided as NHWC.
Note that this version is more aggressive than the eventual MemoryLayout mechanism: the QConv output is always NHWC regardless of the input striding. I think that is OK since we don't have NCHW quantized kernels anyway, so the very first conv would magically switch the order, but I'm open to suggestions. Btw, it doesn't change behavior: the same happens today in master because of the explicit permute() call.
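A tiny sketch of what "logically NCHW but strided as NHWC" means at the tensor level (illustration only, not the kernel code); it mirrors the explicit permute() the op used to do:
```
import torch

x = torch.randn(1, 3, 4, 4)                              # logical NCHW
x_nhwc = x.permute(0, 2, 3, 1).contiguous().permute(0, 3, 1, 2)

print(x_nhwc.shape)        # torch.Size([1, 3, 4, 4]) -- still indexed as NCHW
print(x_nhwc.stride())     # (48, 1, 12, 3) -- NHWC (channels-last) in memory
print(torch.equal(x, x_nhwc))   # True: same values, different memory layout
```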
Test Plan: Imported from OSS
Differential Revision: D17443218
Pulled By: dzhulgakov
fbshipit-source-id: cfd136ae0465acd8d8c26ffad87385dac9c88726
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26241
According to https://github.com/pytorch/pytorch/issues/19092 we always keep NCHW order and do handling inside the kernels. This PR fixes it for weights of the qconv by using MemoryLayout mechanism.
Test Plan: Imported from OSS
Differential Revision: D17443219
Pulled By: dzhulgakov
fbshipit-source-id: ce0eb92034a9977b3303dafab8b0414575171062
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26211
Currently QNNPACK does not have an unpack function like FBGEMM does.
In order to be able to script quantized models for mobile, we need to save unpacked weights.
This change stores the original weight and bias in the opaque struct and simply returns them when unpack is called.
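A sketch of the prepack / unpack round trip this makes possible (assuming the `quantized::linear_prepack` / `quantized::linear_unpack` ops and a QNNPACK-enabled build); with QNNPACK the unpack call simply hands back the stored originals.
```
import torch

torch.backends.quantized.engine = 'qnnpack'   # assumes QNNPACK is built in

w = torch.quantize_per_tensor(torch.randn(8, 16), scale=0.05, zero_point=0,
                              dtype=torch.qint8)
b = torch.randn(8)

packed = torch.ops.quantized.linear_prepack(w, b)
w_unpacked, b_unpacked = torch.ops.quantized.linear_unpack(packed)

# With this change QNNPACK returns the stored original weight and bias.
print(torch.equal(w.int_repr(), w_unpacked.int_repr()), torch.equal(b, b_unpacked))
```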
Test Plan:
python test/test_quantized.py TestQNNPackOps.test_qconv_unpack
python test/test_quantized.py TestQNNPackOps.test_qlinear_unpack
Imported from OSS
Differential Revision: D17464430
fbshipit-source-id: 83ad5a2556dcf13245a1047feef6cfb489c9ef69
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26152
This change adds support for calling QNNPACK through the refactored API for Conv2d operators.
Test Plan:
python test/test_quantized.py TestQNNPackOps.test_qconv_qnnpack
Imported from OSS
Differential Revision: D17459892
fbshipit-source-id: d20b3e8b81dd403541cb2b9164731448ca229695
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26135
This change adds support for calling QNNPACK through the refactored API for Linear (fully connected) operators.
It also includes certain CMake changes to enable building and using pytorch_qnnpack inside ATen.
I have disabled USE_QNNPACK in CMakeLists.txt. Enabling it results in picking up kernels from third_party/QNNPACK at runtime since the function names are the same.
Test Plan:
python test/test_quantized.py TestQNNPackOps.test_qlinear_qnnpack
Imported from OSS
Differential Revision: D17434885
fbshipit-source-id: 084698026938f4529f61d12e86dfe82534ec73dd