Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27616
Fix a problem in the reference implementation of `equal`.
Test Plan:
python test/test_quantized.py
Imported from OSS
Differential Revision: D17837055
fbshipit-source-id: 1e4bc32f4334c0352468a61fa4316a1c0ff76485
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26992
Run the same test for FBGEMM and QNNPACK backends.
Checks that QNNPACK or FBGEMM is supported before running the test (using supported_qengines).
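For reference, a minimal sketch of the gating pattern, using the public torch.backends.quantized.supported_engines attribute rather than the test helper named above:
```
import torch

# Run the backend-specific test body only when the engine is available in
# this build; otherwise skip it. The actual tests use the supported_qengines
# helper; this sketch uses the public supported_engines attribute instead.
for engine in ('fbgemm', 'qnnpack'):
    if engine not in torch.backends.quantized.supported_engines:
        print(engine, 'not supported in this build, skipping')
        continue
    torch.backends.quantized.engine = engine
    # ... run the TestQuantizedLinear / TestQuantizedConv body here ...
```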
Test Plan:
python test/test_quantized.py TestQuantizedLinear
python test/test_quantized.py TestQuantizedConv
python test/test_quantized_models.py
python test/test_quantized_nn_mods.py
Imported from OSS
Differential Revision: D17689171
fbshipit-source-id: e11c0a5e41f5f4e6836a614a5b61e4db3c5e384b
Summary:
The QuantizeAvx2 routine does not support the int32 type, so we switch to the at::quantize_vec function instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26854
Differential Revision: D17609872
Pulled By: llyfacebook
fbshipit-source-id: b4a77d93ce0ebfef696506b5cdbe3e91fe44bb36
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26676
Just makes it more user-friendly to be able to pass any floating-point or integer values as scales or zero_points for per-channel quantization. It matches the behavior of the per-tensor quantizer, where those arguments are scalars (not tensors) and automatic casting is therefore applied.
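A minimal sketch of the resulting behavior, assuming the `quantize_per_channel` name introduced later in this stack; the concrete values are only illustrative:
```
import torch

# scales/zero_points built from plain Python floats and ints are accepted
# and cast as needed, mirroring the scalar arguments of per-tensor quantization.
x = torch.randn(2, 3)
scales = torch.tensor([0.1, 0.2])        # float values in any float dtype
zero_points = torch.tensor([0, 1])       # int values in any integer dtype
qx = torch.quantize_per_channel(x, scales, zero_points, axis=0, dtype=torch.quint8)
print(qx.q_per_channel_scales(), qx.q_per_channel_zero_points())
```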
Test Plan: Imported from OSS
Differential Revision: D17537051
Pulled By: dzhulgakov
fbshipit-source-id: e955ccdb5b4691828a559dc8f1ed7de54b6d12c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26675
Based on an offline poll, we're very unlikely to have multi-axis quantized tensors in the foreseeable future. Let's simplify the API and return an int instead of a list. It also matches the singular `axis` name.
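A minimal sketch of the simplified accessor (again assuming the `quantize_per_channel` name from this stack):
```
import torch

# q_per_channel_axis() returns a single int (the quantization axis)
# instead of a list of axes.
x = torch.randn(2, 3)
qx = torch.quantize_per_channel(x, torch.tensor([0.1, 0.2]), torch.tensor([0, 0]),
                                axis=0, dtype=torch.quint8)
assert qx.q_per_channel_axis() == 0
```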
Test Plan: Imported from OSS
Differential Revision: D17537052
Pulled By: dzhulgakov
fbshipit-source-id: 676abc3b251d288468aaed467b5e5ca4063b98b0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26524
This creates an NHWC specialization for `quantized::cat` that kicks in when all inputs are `NHWC`. This ensures the correct layout is propagated downstream and provides an optimized implementation specifically for this data layout.
Benchmark script based on Squeezenet shapes:
```
import torch, time
torch.manual_seed(0)
# NHWC
sizes = [
(1, 54, 54, 64),
(1, 54, 54, 128),
(1, 26, 26, 128),
(1, 26, 26, 256),
(1, 12, 12, 256)
]
for size in sizes:
    x = torch.rand(*size)
    y = torch.rand(*size)
    qX = torch.quantize_linear(x, 0.01, 3, torch.qint8).permute([0, 3, 1, 2])
    qY = torch.quantize_linear(y, 0.01, 3, torch.qint8).permute([0, 3, 1, 2])
    ref = torch.cat([qX.dequantize(), qY.dequantize()], dim=1)
    NITER = 1000
    s = time.time()
    for i in range(NITER):
        out = torch.ops.quantized.cat([qX, qY], dim=1, scale=0.01, zero_point=3)
    time_per_iter = (time.time() - s) / NITER
    print('time per iter ms', time_per_iter * 1000)
    print('gb/s', (qX.numel() + qY.numel() + out.numel()) * qX.element_size() / time_per_iter / 1e9)
    torch.testing.assert_allclose(out.dequantize(), ref)
```
Before this change
```
time per iter ms 0.6898486614227295
gb/s 1.0821156026605054
time per iter ms 1.5480577945709229
gb/s 0.9644291093239284
time per iter ms 0.3180875778198242
gb/s 1.0881028500775023
time per iter ms 0.6702737808227539
gb/s 1.032748139350315
time per iter ms 0.13010454177856445
gb/s 1.1333655073392244
```
After this change
```
time per iter ms 0.11604785919189453
gb/s 6.432656364350577
time per iter ms 0.15956878662109375
gb/s 9.356416324360508
time per iter ms 0.040181636810302734
gb/s 8.613685939027139
time per iter ms 0.06564664840698242
gb/s 10.544696748392909
time per iter ms 0.018549680709838867
gb/s 7.949247337814738
```
Test Plan: Imported from OSS
Differential Revision: D17503593
Pulled By: jamesr66a
fbshipit-source-id: ec5d57ad8fbcb3fd9379e8bd370abd29d386f953
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26586
Use the backend engine flag to call QNNPACK for quantized ops.
Test Plan: python test/test_quantized.py TestQNNPACKOps
Differential Revision: D17515129
Pulled By: supriyar
fbshipit-source-id: 951e90205aa19581ea006a91d9514fc7a94409ef
Summary:
In this PR, we try to fix the Windows build issue of D17437015.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26580
Differential Revision: D17517341
Pulled By: llyfacebook
fbshipit-source-id: db726596aa8f7c992c5a7ddc2781dc3aa0312284
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26599
These tests fail due to the tolerance in the equality comparison. Disable them for now.
ghstack-source-id: 90553855
Test Plan: unit tests
Differential Revision: D17517085
fbshipit-source-id: a4d9278e356318719ccd84047404915a97944f52
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26575
To keep consistent with `quantize_per_tensor`, we also rename `quantize_linear_per_channel` to `quantize_per_channel`.
Test Plan:
ci
Imported from OSS
Differential Revision: D17517360
fbshipit-source-id: 3af7d8f0fbe99148b79fcb1ad2fe811f776590cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26574
Since we also have `quantized::linear`, the name `quantize_linear` is confusing, so we rename it to `quantize_per_tensor` before the branch cut.
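A minimal usage sketch of the renamed op:
```
import torch

# What used to be torch.quantize_linear is now torch.quantize_per_tensor,
# taking a scalar scale and zero_point.
x = torch.randn(2, 3)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=3, dtype=torch.quint8)
print(qx.q_scale(), qx.q_zero_point())
```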
Test Plan:
ci
Imported from OSS
Differential Revision: D17514876
fbshipit-source-id: 01d9005e6ec8cb9950b9d8bba122109c389641d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26335
Use the backend engine flag to call QNNPACK for quantized ops.
Test Plan:
python test/test_quantized.py TestQNNPACKOps
Imported from OSS
Differential Revision: D17504331
fbshipit-source-id: 35cb2189067ac5cc6a7307179ef0335d1cec7b8f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26307
Add support for FP32 bias. Re-quantize the bias at run time based on the input scale.
If the value of the input scale stored in the packed struct changes, we requantize the bias with the updated input scale.
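A minimal numeric sketch of the requantization step described above; the helper and its names are illustrative, not the actual fields of the packed struct:
```
import torch

# The FP32 bias is requantized to int32 with scale = input_scale * weight_scale,
# and this is redone whenever the input scale changes.
def requantize_bias(bias_fp32, input_scale, weight_scale):
    bias_scale = input_scale * weight_scale
    return torch.round(bias_fp32 / bias_scale).to(torch.int32)

bias = torch.randn(8)
q_bias = requantize_bias(bias, input_scale=0.02, weight_scale=0.5)
```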
Test Plan: python test/test_quantized.py TestQNNPackOps
Differential Revision: D17504253
Pulled By: supriyar
fbshipit-source-id: 49fe36a0bee91aaeb085db28eec4ded8c684dcf4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26242
According to https://github.com/pytorch/pytorch/issues/19092 we always keep the NCHW order and do the handling inside the kernels. This PR fixes it for the activations of qconv by using the MemoryLayout mechanism: activations stay logically NCHW but are strided as NHWC.
Note that this version is more aggressive than the eventual MemoryLayout mechanism: QConv's output is always NHWC regardless of the input striding. I think that's ok, since we don't have NCHW quantized kernels anyway, so the very first conv would magically switch the order, but I'm open to suggestions. By the way, it doesn't change behavior; the same happens today in master because of the explicit permute() call.
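A small sketch of what "logically NCHW but strided as NHWC" means (this predates a dedicated channels-last memory format, so it is expressed with permute on a physically NHWC buffer):
```
import torch

# The tensor keeps NCHW sizes, while its strides show that channels are the
# fastest-moving dimension in memory.
nhwc = torch.randn(1, 54, 54, 64)   # physical NHWC buffer
x = nhwc.permute(0, 3, 1, 2)        # logical NCHW view
print(x.shape)                      # torch.Size([1, 64, 54, 54])
print(x.stride())                   # (186624, 1, 3456, 64) -> channel stride is 1
```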
Test Plan: Imported from OSS
Differential Revision: D17443218
Pulled By: dzhulgakov
fbshipit-source-id: cfd136ae0465acd8d8c26ffad87385dac9c88726
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26241
According to https://github.com/pytorch/pytorch/issues/19092 we always keep the NCHW order and do the handling inside the kernels. This PR fixes it for the weights of qconv by using the MemoryLayout mechanism.
Test Plan: Imported from OSS
Differential Revision: D17443219
Pulled By: dzhulgakov
fbshipit-source-id: ce0eb92034a9977b3303dafab8b0414575171062
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26211
Currently QNNPACK does not have an unpack function like FBGEMM does.
In order to be able to script quantized models for mobile, we need to save unpacked weights.
This change stores the original weights and bias in the opaque struct and simply returns them when unpack is called.
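A minimal round-trip sketch of that behavior, assuming the generic quantized::linear_prepack / quantized::linear_unpack op names used elsewhere in this stack:
```
import torch

# unpack simply returns the weight (and bias) that were stashed at prepack time.
w = torch.randn(4, 8)
qw = torch.quantize_per_tensor(w, scale=0.1, zero_point=0, dtype=torch.qint8)
bias = torch.randn(4)

packed = torch.ops.quantized.linear_prepack(qw, bias)
w_unpacked, bias_unpacked = torch.ops.quantized.linear_unpack(packed)
# w_unpacked matches qw; bias_unpacked matches bias
```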
Test Plan:
python test/test_quantized.py TestQNNPackOps.test_qconv_unpack
python test/test_quantized.py TestQNNPackOps.test_qlinear_unpack
Imported from OSS
Differential Revision: D17464430
fbshipit-source-id: 83ad5a2556dcf13245a1047feef6cfb489c9ef69
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26152
This change adds support for calling QNNPACK through the refactored API for Conv2d operators.
Test Plan:
python test/test_quantized.py TestQNNPackOps.test_qconv_qnnpack
Imported from OSS
Differential Revision: D17459892
fbshipit-source-id: d20b3e8b81dd403541cb2b9164731448ca229695
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26135
This change adds support for calling QNNPACK through the refactored API for Linear (fully connected) operators.
It also includes some CMake changes to enable building and using pytorch_qnnpack inside ATen.
I have disabled USE_QNNPACK in CMakeLists.txt; enabling it results in picking up kernels from third_party/QNNPACK at runtime since the function names are the same.
Test Plan:
python test/test_quantized.py TestQNNPackOps.test_qlinear_qnnpack
Imported from OSS
Differential Revision: D17434885
fbshipit-source-id: 084698026938f4529f61d12e86dfe82534ec73dd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25680
Add a runtime flag to choose between FBGEMM and QNNPACK when compiled with both.
The flag can be set by using torch.backends.quantized.engine = torch.fbgemm/torch.qnnpack or ctx::setPreferredQuantizedEngine(at::QEngine)
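A minimal usage sketch (shown with the string spelling of the engine names, which the attribute also accepts):
```
import torch

# Pick the quantized backend at runtime when the build supports both.
if 'qnnpack' in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = 'qnnpack'
else:
    torch.backends.quantized.engine = 'fbgemm'
print(torch.backends.quantized.engine)
```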
ghstack-source-id: 89935643
Test Plan: Verified torch.backends.quantized.engine works
Differential Revision: D17198233
fbshipit-source-id: e5449d06f4136385e0e6d18bd4237f8654a61672
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25731
I didn't notice this before, but the QuantizeAvx2 routine was requantizing only a single vector of 8 floats into 1/4 of a 256-bit int8 register. This switches it to a specialization, borrowed from C2, that goes from 4 float vectors into a whole int8 vector.
Test Plan: Imported from OSS
Differential Revision: D17214413
Pulled By: jamesr66a
fbshipit-source-id: 1d6fc556e43739e9a4b0dba5df2332beb1b3795b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25428
Added bias as an optional parameter to the quantized_linear_prepack function.
The bias is quantized at runtime using the input scale and the weight scale.
ghstack-source-id: 89601399
Test Plan: python test/run_test.py --exclude nn --verbose --bring-to-front quantization quantized quantized_tensor quantized_nn_mods quantizer
Differential Revision: D17121304
fbshipit-source-id: 8adb0e55e4aed0a5430aaa2c8639c8ad1639c85a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25678
In an effort to unify FBGEMM and QNNPACK at the dispatcher level, we need a generic name for the quantized backend ops.
Currently FBGEMM is guarded by the USE_FBGEMM macro and QNNPACK uses USE_QNNPACK.
ghstack-source-id: 89518961
Test Plan: buck test caffe2/test:quantized
Differential Revision: D17194364
fbshipit-source-id: 5960aedff6b8cb89eb3872c39b74caf54c0fbf20
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25598
att
Test Plan:
CI
Imported from OSS
Differential Revision: D17192467
fbshipit-source-id: 9ee93b02cc293bb71ed114534d92eedda3ddee88
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25338
In an effort to unify FBGEMM and QNNPACK at the dispatcher level, we need a generic name for the quantized backend ops.
Currently FBGEMM is guarded by the USE_FBGEMM macro and QNNPACK uses USE_QNNPACK.
TBD: use a compile-time macro or a runtime flag to switch between FBGEMM and QNNPACK.
ghstack-source-id: 89454244
Test Plan: buck test caffe2/test:quantized
Differential Revision: D17097735
fbshipit-source-id: 447112a7a421387724d3e29b8fd8412dfb1c373a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25545
This re-uses the infrastructure from ATen/native/cpu, which compiles kernels multiple times for different instruction sets and dispatches dynamically based on the CPU's capability flags at runtime. This ensures we use the optimal quantized kernel for the given machine.
Test Plan: Imported from OSS
Differential Revision: D17166369
Pulled By: jamesr66a
fbshipit-source-id: 8c8393f99365e1408819bbaf254c1b5734a34b70
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25276
We add per-channel quantization support for the quantized linear operator, based on the recently added per-channel quantization APIs in https://github.com/pytorch/pytorch/pull/24935 and https://github.com/pytorch/pytorch/pull/24934.
ghstack-source-id: 89267515
Test Plan:
buck test mode/dev caffe2/test:quantized -- 'test_qlinear_unpack \(test_quantized\.TestQuantizedLinear\)' --print-passing-details
```
[jianyuhuang@devvm6560.prn2.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantized -- 'test_qlinear_unpack \(test_quantized\.TestQuantizedLinear\)' --print-passing-details
Action graph will be rebuilt because files have been added or removed.
Parsing buck files: finished in 1.3 sec
Building: finished in 5.7 sec (100%) 8114/8114 jobs, 0 updated
Total time: 7.0 sec
Trace available for this run at /tmp/testpilot.20190827-141824.842847.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision c4cde854bae419be71282b0f92bf2d57a9203003 fbpkg f45bf410f1694a6882727cf03961702b at Mon Aug 26 22:10:29 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/686/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/5629499540372523
✓ caffe2/test:quantized - test_qlinear_unpack (test_quantized.TestQuantizedLinear) 0.996 1/1 (passed)
Test output:
> test_qlinear_unpack (test_quantized.TestQuantizedLinear) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 0.997s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/5629499540372523
Summary (total time 5.05s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
```
buck test mode/dev caffe2/test:quantized -- 'test_qlinear \(test_quantized\.TestQuantizedLinear\)' --print-passing-details
```
[jianyuhuang@devvm6560.prn2.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantized -- 'test_qlinear \(test_quantized\.TestQuantizedLinear\)' --print-passing-details
Action graph will be rebuilt because files have been added or removed.
Parsing buck files: finished in 0.9 sec
Building: finished in 6.4 sec (100%) 8114/8114 jobs, 2 updated
Total time: 7.3 sec
Trace available for this run at /tmp/testpilot.20190827-141631.836596.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision c4cde854bae419be71282b0f92bf2d57a9203003 fbpkg f45bf410f1694a6882727cf03961702b at Mon Aug 26 22:10:29 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/686/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900049005601
✓ caffe2/test:quantized - test_qlinear (test_quantized.TestQuantizedLinear) 2.893 1/1 (passed)
Test output:
> test_qlinear (test_quantized.TestQuantizedLinear) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 2.893s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900049005601
Summary (total time 6.78s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
```
buck test mode/dev caffe2/test:quantized -- 'test_qlinear \(test_quantized\.TestDynamicQuantizedLinear\)' --print-passing-details
```
[jianyuhuang@devvm6560.prn2.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantized -- 'test_qlinear \(test_quantized\.TestDynamicQuantizedLinear\)' --print-passing-details
Action graph will be rebuilt because files have been added or removed.
Parsing buck files: finished in 1.7 sec
Building: finished in 4.9 sec (100%) 8118/8118 jobs, 2 updated
Total time: 6.6 sec
Trace available for this run at /tmp/testpilot.20190829-153630.613647.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision f39465ac7f6b26840c8cbd0ae5e367fb8a60ec24 fbpkg cf4e6efcd2fa4642b6f8c26a9bd98d67 at Tue Aug 27 21:58:47 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/687/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/4222124657066806
✓ caffe2/test:quantized - test_qlinear (test_quantized.TestDynamicQuantizedLinear) 3.377 1/1 (passed)
Test output:
> test_qlinear (test_quantized.TestDynamicQuantizedLinear) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 3.378s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/4222124657066806
Summary (total time 8.18s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
```
Differential Revision: D17057818
fbshipit-source-id: 9ad8b9120fd0d9933ca81c132da61b53e2c91b9e