Commit Graph

145 Commits

Author SHA1 Message Date
Brian Wignall
e7fe64f6a6 Fix typos (#30606)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606

Differential Revision: D18763028

Pulled By: mrshenli

fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c
2019-12-02 20:17:42 -08:00
Jianyu Huang
38ca3552d9 Unit Test for the Legacy Dynamic Quantized Linear operator (#23139)
Summary: Add a unit test for the Dynamic Quantized Linear operator (```torch.fbgemm_linear_quantize_weight```, ```torch.fbgemm_pack_quantized_matrix```, and ```torch.fbgemm_linear_int8_weight```) in ```test_quantized.py```.

Test Plan:
buck test mode/dev caffe2/test:quantized -- 'test_qlinear_legacy \(test_quantized\.TestDynamicQuantizedLinear\)'  --print-passing-details

  [jianyuhuang@devvm29567.prn1.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantized -- 'test_dynamic_qlinear \(test_quantized\.TestQuantizedLinear\)'  --print-passing-details
  Parsing buck files: finished in 1.8 sec
  Building: finished in 3.4 sec (100%) 6772/6772 jobs, 2 updated
    Total time: 5.2 sec
  Trace available for this run at /tmp/testpilot.20190714-220130.2698168.log
  TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
  Testpilot build revision 4f180136f799ab45ec2bf5d7644cb14955d4dd7a fbpkg
  6c6253f255644ca3b8ce1bc5955b0f25 at Mon Jul  8 14:13:38 2019 by twsvcscm from /
   usr/local/fbprojects/packages/testinfra.testpilot/651/t.par
  Discovering tests
  Running 1 tests
  Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900044862617
        ✓ caffe2/test:quantized - test_dynamic_qlinear (test_quantized.TestQuantizedLinear) 0.023 1/1
  (passed)
  Test output:
  > test_dynamic_qlinear (test_quantized.TestQuantizedLinear) ... ok
  >
  > ----------------------------------------------------------------------
  > Ran 1 test in 0.024s
  >
  > OK
  Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900044862617
  Summary (total time 9.03s):
    PASS: 1
    FAIL: 0
    SKIP: 0
    FATAL: 0
    TIMEOUT: 0
    OMIT: 0

Differential Revision: D16404027

fbshipit-source-id: 4c85dd255637fd8b1eb4830e0464f48c22706f41
2019-11-20 20:59:35 -08:00
Jianyu Huang
bbff06ee96 Convert conv_prepack to conv2d_prepack and conv_unpack to conv2d_unpack (#29529)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29529

Pull Request resolved: https://github.com/pytorch/glow/pull/3771

We would like to replace `conv_prepack` with `conv2d_prepack` and  `conv_unpack` with `conv2d_unpack`.

This makes the naming consistent between 2D and 3D conv:
```
torch.ops.quantized.conv2d_prepack
torch.ops.quantized.conv2d_unpack
torch.ops.quantized.conv2d
torch.ops.quantized.conv3d_prepack
torch.ops.quantized.conv3d_unpack
torch.ops.quantized.conv3d
```

For better engineering, we should make this change sooner rather than later, before more users depend on the quantized conv2d ops.

The replacement bash commands are as follows:
```
find ./ -type f -exec sed -i -e 's/quantized::conv_prepack/quantized::conv2d_prepack/g' {} \;
find ./ -type f -exec sed -i -e 's/quantized::conv_unpack/quantized::conv2d_unpack/g' {} \;
find ./ -type f -exec sed -i -e 's/torch.ops.quantized.conv_prepack/torch.ops.quantized.conv2d_prepack/g' {} \;
find ./ -type f -exec sed -i -e 's/torch.ops.quantized.conv_unpack/torch.ops.quantized.conv2d_unpack/g' {} \;
```
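For reference, the same sweep can also be expressed with the Python standard library; this is a hypothetical equivalent of the sed commands above, not the command that was actually run:
```
import pathlib
import re

# Hypothetical stdlib equivalent of the sed sweep above.
RENAMES = [
    (re.compile(r'quantized::conv_prepack'), 'quantized::conv2d_prepack'),
    (re.compile(r'quantized::conv_unpack'), 'quantized::conv2d_unpack'),
    (re.compile(r'torch\.ops\.quantized\.conv_prepack'), 'torch.ops.quantized.conv2d_prepack'),
    (re.compile(r'torch\.ops\.quantized\.conv_unpack'), 'torch.ops.quantized.conv2d_unpack'),
]

for path in pathlib.Path('.').rglob('*'):
    if not path.is_file():
        continue
    try:
        text = path.read_text()
    except (UnicodeDecodeError, PermissionError):
        continue
    new_text = text
    for pattern, replacement in RENAMES:
        new_text = pattern.sub(replacement, new_text)
    if new_text != text:
        path.write_text(new_text)
```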
ghstack-source-id: 93661879

Test Plan: CI

Reviewed By: jackm321

Differential Revision: D18421079

fbshipit-source-id: 17ae8b1ee79223bd2c5d4bbccd57af6580c4ab12
2019-11-11 21:54:10 -08:00
Zafar Takhirov
aa658a2a68 Adding inplace quantized relu6
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29245

Test Plan: Imported from OSS

Differential Revision: D18334541

Pulled By: z-a-f

fbshipit-source-id: 25b12cc88ee81434d96cf5c44c008c6f85da0673
2019-11-09 14:53:42 -08:00
Lingyi Liu
f5074ccafe set the no_deadline for the adaptive_avg_pool_nhwc test (#29502)
Summary:
This test is reported to be flaky due to deadline expiration. This PR flags it as a no_deadline test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29502
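For context, a minimal sketch of what the deadline opt-out looks like with hypothesis (the class and test body here are stand-ins, not the PR's diff):
```
import unittest
from hypothesis import given, settings, strategies as st

class TestQuantized(unittest.TestCase):  # stand-in for the real test class
    @settings(deadline=None)  # disable the per-example time limit for this test
    @given(side=st.integers(min_value=1, max_value=8))
    def test_adaptive_avg_pool_nhwc(self, side):
        # placeholder body; the real test exercises adaptive_avg_pool in NHWC
        self.assertGreaterEqual(side, 1)

if __name__ == "__main__":
    unittest.main()
```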

Differential Revision: D18416632

Pulled By: lly-zero-one

fbshipit-source-id: 27cd7b28139f3f16ee0cf5802a0709385719d487
2019-11-09 09:30:46 -08:00
Supriya Rao
4515edfe15 Disable QNNPACK tests on MacOS (#29328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29328

Tests are flaky as seen in issue #29326.
Disable until we fix the kernels.
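A minimal sketch of the platform-conditional skip (the class name and reason string are stand-ins, not the PR's diff):
```
import sys
import unittest

@unittest.skipIf(sys.platform == "darwin",
                 "QNNPACK kernels are flaky on macOS, see issue #29326")
class TestQNNPackOps(unittest.TestCase):  # stand-in for the real test class
    def test_placeholder(self):
        self.assertTrue(True)
```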

Test Plan:
python test/test_quantized.py TestQNNPackOps

Imported from OSS

Differential Revision: D18358200

fbshipit-source-id: 58f1981799fe8253234fcc7b0540e1c0b6babc15
2019-11-06 21:30:11 -08:00
Supriya Rao
6ea4219d20 Temporarily disable qnnpack tests on MACOS (#29176)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29176

Captured in issue #27326

Test Plan:
python test/test_quantized.py test_qconv

Imported from OSS

Differential Revision: D18336184

fbshipit-source-id: 7394b04215b6c8b7bc0508f1648f23022bd031cb
2019-11-05 18:52:45 -08:00
Zafar Takhirov
7ea83120df Fixing the shape calculation for pool tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28853

Test Plan: Imported from OSS

Differential Revision: D18212290

Pulled By: z-a-f

fbshipit-source-id: 44a41f3192c8b168a8a0fb68eb33b68400917c7a
2019-11-01 12:29:27 -07:00
Xiaomeng Yang
f6692146e7 Add Conv3dInt8 (#28768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28768

Add Conv3dInt8

Test Plan: buck test mode/dev-nosan caffe2/test:quantized -- "Conv"

Reviewed By: jianyuh

Differential Revision: D18023661

fbshipit-source-id: 8fc7a4350baf29271dfd6fa3c1c4b10e60e2fdbf
2019-10-28 23:28:11 -07:00
Xiaomeng Yang
d5afd97569 Refactor qconv_prepack and qconv_unpack to support conv3d (#28481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28481

Refactor qconv_prepack and qconv_unpack to support conv3d

Test Plan: buck test mode/dev-nosan caffe2/test:quantized -- "Conv"

Reviewed By: dskhudia

Differential Revision: D18023651

fbshipit-source-id: 8cbc9fe68f93bc4b247a4f41423c6d8c30a5ef90
2019-10-27 14:43:16 -07:00
Lingyi Liu
4d9c017dee Fix the padding issue of quantized average pool operator (#28260)
Summary:
This is actually a bug in both the test and the average pool implementation.
In the test, we used the quantized values as the float input and failed to pad them with the zero_point.
In the op implementation, the size used for averaging is incorrect in the padded case when count_include_pad is true.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28260
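A rough illustration of the zero_point point above (not the PR's code): in the quantized domain, padded cells must carry the zero_point, which is exactly the value that dequantizes to 0.0.
```
import torch
import torch.nn.functional as F

x = torch.rand(1, 1, 4, 4)
scale, zero_point = 0.5, 3
qx = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8)

# Padding the integer representation with zero_point is equivalent to
# padding the dequantized tensor with 0.0, which is what
# count_include_pad=True averaging expects.
padded_int = F.pad(qx.int_repr().float(), (1, 1, 1, 1), value=zero_point)
padded_ref = F.pad(qx.dequantize(), (1, 1, 1, 1), value=0.0)
assert torch.allclose((padded_int - zero_point) * scale, padded_ref)
```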

Differential Revision: D18039960

Pulled By: lly-zero-one

fbshipit-source-id: 7b5d34498b60f5d574a276a22798c9f576944734
2019-10-21 11:06:31 -07:00
Supriya Rao
15be189f0d Add quantized torch mean implementation (#27675)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27675

This leverages QNNPACK global average pooling to perform torch.mean on input feature maps.
Currently it only supports the mean along the HxW plane of an NCHW tensor.
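A hedged usage sketch (assumptions: a QNNPACK-enabled build, and that the HxW reduction is requested as dim=(2, 3) on a quantized NCHW tensor):
```
import torch

torch.backends.quantized.engine = "qnnpack"  # route the mean to QNNPACK

x = torch.rand(1, 8, 7, 7)                   # NCHW feature map
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)

q_mean = torch.mean(qx, dim=(2, 3))          # mean over the HxW plane
ref = torch.mean(qx.dequantize(), dim=(2, 3))
print(q_mean.dequantize(), ref)
```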

Test Plan:
python test/test_quantized.py TestQuantizedOps.test_mean

Imported from OSS

Differential Revision: D17989336

fbshipit-source-id: 8d4cbcbed5f146290b1580d26e5b45359d293761
2019-10-19 19:20:59 -07:00
Supriya Rao
3629974c1e Fix quantized avg_pool2d test to support non-zero padding (#28246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28246

Updated the reference fp32 implementation to use the dequantized input tensor to correctly take padded values into account

Test Plan:
python test/test_quantized.py TestQNNPackOps.test_avg_pool2d

Imported from OSS

Differential Revision: D17989334

fbshipit-source-id: 848ce78713280f529f71ff48e930db8de18abc62
2019-10-18 09:14:54 -07:00
Supriya Rao
de0f9567a3 Add quantized avg_pool2d for pytorch mobile (#27631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27631

Add support for performing avg_pool2d on mobile; tested using the existing avg_pool2d Python tests.
Uses the QNNPACK backend, which currently only supports 4-dimensional inputs.
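An illustrative call mirroring the existing Python tests (assumes a QNNPACK-enabled build; note the 4-dimensional NCHW input):
```
import torch

torch.backends.quantized.engine = "qnnpack"

x = torch.rand(1, 16, 8, 8)                       # 4-dim input, as required
qx = torch.quantize_per_tensor(x, 0.5, 1, torch.quint8)

q_out = torch.nn.quantized.functional.avg_pool2d(qx, kernel_size=3, stride=None, padding=0)
ref = torch.nn.functional.avg_pool2d(qx.dequantize(), kernel_size=3, stride=None, padding=0)
print(q_out.dequantize().shape, ref.shape)
```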

Test Plan:
python test/test_quantized.py TestQNNPackOps.test_avg_pool2d

Imported from OSS

Differential Revision: D17973792

fbshipit-source-id: 95ffffb2da656ed911a618b9cb68d6b728c16c74
2019-10-16 22:02:23 -07:00
Jerry Zhang
9084fcba46 test_equal in test_quantized.py (#27616)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27616

Fix a problem in the reference implementation of equal.

Test Plan:
python test/test_quantized.py

Imported from OSS

Differential Revision: D17837055

fbshipit-source-id: 1e4bc32f4334c0352468a61fa4316a1c0ff76485
2019-10-09 14:13:56 -07:00
Zafar Takhirov
eb5040c205 Suppressing hypothesis health check for qnnpack_add
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27193

Test Plan: Imported from OSS

Differential Revision: D17704958

Pulled By: zafartahirov

fbshipit-source-id: d8ab58b724cce2f5130b10ead0f10f5f32e26cfb
2019-10-02 11:39:12 -07:00
Supriya Rao
b805b5dab8 Unify quantized conv and linear tests (#26992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26992

Run the same tests for the FBGEMM and QNNPACK backends.
Checks that QNNPACK or FBGEMM is supported before running (using supported_qengines).
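A sketch of the gating pattern (the class is a stand-in; torch.backends.quantized.supported_qengines is assumed to hold engine names as strings):
```
import unittest
import torch

def qengine_is_supported(engine):
    return engine in torch.backends.quantized.supported_qengines

class TestQuantizedLinear(unittest.TestCase):  # stand-in for the real class
    @unittest.skipUnless(qengine_is_supported('qnnpack'), "QNNPACK not built")
    def test_qlinear_qnnpack(self):
        torch.backends.quantized.engine = 'qnnpack'
        # ... the shared linear test body would run here under QNNPACK ...
        self.assertEqual(torch.backends.quantized.engine, 'qnnpack')
```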

Test Plan:
python test/test_quantized.py TestQuantizedLinear
python test/test_quantized.py TestQuantizedConv
python test/test_quantized_models.py
python test/test_quantized_nn_mods.py

Imported from OSS

Differential Revision: D17689171

fbshipit-source-id: e11c0a5e41f5f4e6836a614a5b61e4db3c5e384b
2019-10-01 14:07:16 -07:00
Supriya Rao
250f482aa5 Support qadd_relu on pytorch mobile (#26982)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26982

Fused add+relu support
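A hedged usage sketch of the fused op (the quantized::add_relu signature with two quantized tensors plus an output scale and zero_point is an assumption, as is the QNNPACK-enabled build):
```
import torch

torch.backends.quantized.engine = "qnnpack"

a = torch.quantize_per_tensor(torch.randn(2, 8), 0.1, 0, torch.quint8)
b = torch.quantize_per_tensor(torch.randn(2, 8), 0.1, 0, torch.quint8)

out = torch.ops.quantized.add_relu(a, b, 0.1, 0)   # add and relu in one kernel
ref = torch.relu(a.dequantize() + b.dequantize())
print(out.dequantize(), ref)
```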

Test Plan:
python test/test_quantized.py TestQNNPackOps.test_qnnpack_add

Also,
Add torch.backends.quantized.engine = "qnnpack"
Ran
python test/test_quantized.py TestQuantizedOps.test_qadd_relu_different_qparams
python test/test_quantized.py TestQuantizedOps.test_qadd_relu_same_qparams

Imported from OSS

Differential Revision: D17635063

fbshipit-source-id: dd1cdf07f66c4cd657c1907f1b650e50d3d4725f
2019-09-27 16:13:42 -07:00
James Reed
b518ff3cb8 Re-write of tensor-scalar mul
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26937

Test Plan: Imported from OSS

Differential Revision: D17618028

Pulled By: jamesr66a

fbshipit-source-id: 90ef461972e826327a19467ad4cefdeb35e13adc
2019-09-27 16:09:27 -07:00
Dmytro Dzhulgakov
764bf826e3 Remove fbgemm_is_cpu_supported in favor of torch.backends.quantized.supported_qengines (#26840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26840

Cleaning up the top-level namespace. Also makes cosmetic changes to torch.backends.quantized.
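The migration in sketch form (the removed helper was torch.fbgemm_is_cpu_supported; the replacement is a membership check on supported_qengines):
```
import torch

# Before (removed): torch.fbgemm_is_cpu_supported()
# After:
fbgemm_available = 'fbgemm' in torch.backends.quantized.supported_qengines
print('FBGEMM available:', fbgemm_available)
```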

Test Plan: Imported from OSS

Differential Revision: D17604403

Pulled By: dzhulgakov

fbshipit-source-id: c55af277ea7319d962a82a6120f65ccd47a60abc
2019-09-27 13:45:15 -07:00
Lingyi Liu
428204dfa4 Fix the QuantizedAVX2 build issue (#26854)
Summary:
QuantizedAVX2 does not support the int32 type. We switch to using the at::quantize_vec function instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26854

Differential Revision: D17609872

Pulled By: llyfacebook

fbshipit-source-id: b4a77d93ce0ebfef696506b5cdbe3e91fe44bb36
2019-09-27 10:20:26 -07:00
James Reed
b1a09dbec7 Support ceil_mode in quantized maxpool
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26916

Test Plan: Imported from OSS

Differential Revision: D17609625

Pulled By: jamesr66a

fbshipit-source-id: a9e1878e7946ee71b6888a91f0dcb2e889939376
2019-09-26 16:48:09 -07:00
James Reed
20ebd13f0a Re-write of tensor-scalar quantized add
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26766

Test Plan: Imported from OSS

Differential Revision: D17587105

Pulled By: jamesr66a

fbshipit-source-id: 4da6ea98a4c5cc36fd191d9845c1ef409efce464
2019-09-25 20:19:28 -07:00
Lingyi Liu
03007b3dda Quantized Interpolate Kernel(upsample_bilinear2d) (#26631)
Summary:
We implement the quantized upsample_bilinear2d case for the interpolate kernel in this PR.

For NHWC performance improvement, the benchmark script:

```
import torch, time

for dtype in [torch.qint8, torch.quint8, torch.qint32]:
    print('****', str(dtype), '*****')
    x = torch.rand(1, 56, 56, 256)

    q_x = torch.quantize_per_tensor(x, 0.5, 1, dtype)
    q_x = q_x.permute([0, 3, 1, 2])

    x = x.permute([0, 3, 1, 2])

    NITER = 100

    s = time.time()
    for i in range(NITER):
        float_out = torch.nn.functional.interpolate(x, size=5, scale_factor=None, mode="bilinear", align_corners=True)
    time_per_iter_float = (time.time() - s) / NITER

    s = time.time()
    for i in range(NITER):
        quant_out = torch.nn.quantized.functional.interpolate(q_x, size=5, scale_factor=None, mode="bilinear", align_corners=True)
    time_per_iter_quant = (time.time() - s) / NITER

    ref_quantized = torch.quantize_per_tensor(float_out, 0.5, 1, dtype)
    #  torch.testing.assert_allclose(ref_quantized.dequantize(), quant_out.dequantize())

    print('time/iter ms (float)', 'time/iter ms (quant)', 'quant/float', sep='\t')
    print(time_per_iter_float * 1000, time_per_iter_quant * 1000, time_per_iter_quant / time_per_iter_float, sep='\t')

    bytes_float = (x.numel() + float_out.numel()) * x.element_size()
    bytes_quant = (q_x.numel() + quant_out.numel()) * q_x.element_size()

    float_bw_gbps = bytes_float / time_per_iter_float / 1e9
    quant_bw_gbps = bytes_quant / time_per_iter_quant / 1e9

    print('GB/s float', 'GB/s quant', sep='\t')
    print(float_bw_gbps, quant_bw_gbps, sep='\t')
```

===========without nhwc handling===========
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
1.999044418334961       2.5860953330993652      1.2936657681940702
GB/s float      GB/s quant
1.6192056416115257      0.3129103516188541
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.02730655670166        2.6061582565307617      1.2855274639721328
GB/s float      GB/s quant
1.596632728927902       0.3105014816242217
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.0180463790893555      2.4047350883483887      1.1916153728010588
GB/s float      GB/s quant
1.603959172365819       1.3460376636426636

===========with nhwc handling===========

**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.0913314819335938      0.09696483612060547     0.04636512047863123
GB/s float      GB/s quant
1.5477527249803915      8.345458337015
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.1065664291381836      0.09959936141967773     0.04728042754408879
GB/s float      GB/s quant
1.5365591871338384      8.124710725706763
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.044203281402588       0.6003522872924805      0.29368521846837126
GB/s float      GB/s quant
1.5834354779917448      5.391607675216635
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26631

Differential Revision: D17521498

Pulled By: llyfacebook

fbshipit-source-id: 385ae0f77777cd8bee385cafb80e492127b7d103
2019-09-25 13:43:43 -07:00
James Reed
cf272d43ab Trivial quantized torch.mean implementation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26253

Test Plan: Imported from OSS

Differential Revision: D17529994

Pulled By: jamesr66a

fbshipit-source-id: e3aff71da35b05ed61710cdb88d72b51c944168b
2019-09-24 10:18:15 -07:00
Dmytro Dzhulgakov
ade60f8a8d Allow per-channel QTensor accept any floating type for scales (#26676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26676

Just makes it more user-friendly to be able to pass any floating-point or integer values as scales or zero_points for per-channel quantization. It matches the behavior of the per-tensor quantizer, where those arguments are scalars (not tensors) and thus automatic casting is applied.
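An illustrative call (a sketch; the point is that float32 scales and int32 zero_points are now accepted where wider dtypes were previously required):
```
import torch

x = torch.rand(3, 4)
scales = torch.tensor([0.1, 0.2, 0.3], dtype=torch.float32)   # float32 accepted
zero_points = torch.tensor([0, 1, 2], dtype=torch.int32)      # int32 accepted
qx = torch.quantize_per_channel(x, scales, zero_points, axis=0, dtype=torch.quint8)
print(qx.q_per_channel_scales(), qx.q_per_channel_zero_points())
```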

Test Plan: Imported from OSS

Differential Revision: D17537051

Pulled By: dzhulgakov

fbshipit-source-id: e955ccdb5b4691828a559dc8f1ed7de54b6d12c4
2019-09-23 22:29:05 -07:00
Dmytro Dzhulgakov
b93823cb65 Per-channel quantized tensor to have only a single axis (#26675)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26675

Based on an offline poll, we're very unlikely to have multi-axis quantized tensors in the foreseeable future. Let's simplify the API and just return an int instead of a list. This also matches the singular `axis` name.

Test Plan: Imported from OSS

Differential Revision: D17537052

Pulled By: dzhulgakov

fbshipit-source-id: 676abc3b251d288468aaed467b5e5ca4063b98b0
2019-09-23 22:29:01 -07:00
Lingyi Liu
ba8002ec13 Quantized Interpolate Kernel(upsample_nearest2d) (#26617)
Summary:
In this PR, we implement support for quantized interpolate in the upsample_nearest2d case.

```
import torch, time

for dtype in [torch.qint8, torch.quint8, torch.qint32]:
    print('****', str(dtype), '*****')
    x = torch.rand(1, 56, 56, 256)

    q_x = torch.quantize_per_tensor(x, 0.5, 1, dtype)
    q_x = q_x.permute([0, 3, 1, 2])

    x = x.permute([0, 3, 1, 2])

    NITER = 100

    s = time.time()
    for i in range(NITER):
        # float_out = torch.nn.functional.avg_pool2d(x, kernel_size=5, stride=None, padding=0)
        # float_out = torch.nn.functional.adaptive_avg_pool2d(x, output_size=5)
        float_out = torch.nn.functional.interpolate(x, size=5, scale_factor=None, mode="nearest", align_corners=None)
    time_per_iter_float = (time.time() - s) / NITER

    s = time.time()
    for i in range(NITER):
        # quant_out = torch.nn.quantized.functional.avg_pool2d(q_x, kernel_size=5, stride=None, padding=0)
        # quant_out = torch.nn.quantized.functional.adaptive_avg_pool2d(q_x, output_size=5)
        quant_out = torch.nn.quantized.functional.interpolate(q_x, size=5, scale_factor=None, mode="nearest", align_corners=None)
    time_per_iter_quant = (time.time() - s) / NITER

    ref_quantized = torch.quantize_per_tensor(float_out, 0.5, 1, dtype)
    #  torch.testing.assert_allclose(ref_quantized.dequantize(), quant_out.dequantize())

    print('time/iter ms (float)', 'time/iter ms (quant)', 'quant/float', sep='\t')
    print(time_per_iter_float * 1000, time_per_iter_quant * 1000, time_per_iter_quant / time_per_iter_float, sep='\t')

    bytes_float = (x.numel() + float_out.numel()) * x.element_size()
    bytes_quant = (q_x.numel() + quant_out.numel()) * q_x.element_size()

    float_bw_gbps = bytes_float / time_per_iter_float / 1e9
    quant_bw_gbps = bytes_quant / time_per_iter_quant / 1e9

    print('GB/s float', 'GB/s quant', sep='\t')
    print(float_bw_gbps, quant_bw_gbps, sep='\t')
```

=========without special handling of NHWC layout=============
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.08712100982666        2.1624231338500977      1.0360794240817361
GB/s float      GB/s quant
1.5508750976872339      0.37421723220248165
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.056601047515869       2.184889316558838       1.0623787823107091
GB/s float      GB/s quant
1.573890086222483       0.3703693335250963
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.0152783393859863      2.067704200744629       1.0260142037623525
GB/s float      GB/s quant
1.6061622539873104      1.5654386148823074

=========with special handling of NHWC layout=============
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.044649124145508       0.009250640869140625    0.004524317038018256
GB/s float      GB/s quant
1.5830902044636819      87.47675014597938
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.049403190612793       0.009107589721679688    0.004444020465761265
GB/s float      GB/s quant
1.579417859221808       88.8507305147644
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.0601415634155273      0.01062631607055664     0.0051580513976618066
GB/s float      GB/s quant
1.5711852318699757      304.6082930818039
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26617

Differential Revision: D17519146

Pulled By: llyfacebook

fbshipit-source-id: 126876e550ef7009fd75f5ccc033599f1f37456d
2019-09-23 20:32:19 -07:00
James Reed
cb9fd0ce58 quantized torch.topk (#26486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26486

This PR adds a quantized version of torch.topk, supporting all the same options

Benchmark script
```
import torch
import time

for dtype in [torch.qint8, torch.quint8, torch.qint32]:
    X = torch.rand(6, 5, 1024)
    qX = torch.quantize_linear(X, 0.01, 0, dtype)
    X = qX.dequantize()

    NITER = 10000

    s = time.time()
    for i in range(NITER):
        float_out = torch.topk(X, 50)
    float_time_per_iter = (time.time() - s) / NITER

    s = time.time()
    for i in range(NITER):
        quant_out = torch.topk(qX, 50)
    quant_time_per_iter = (time.time() - s) / NITER

    print(dtype)
    print('float ms', 'quant ms', 'float gB/s', 'quant gB/s', sep='\t')
    nbytes_float = (X.numel() + float_out[0].numel()) * X.element_size()
    nbytes_quant = (qX.numel() + quant_out[0].numel()) * qX.element_size()
    print(float_time_per_iter * 1000,
          quant_time_per_iter * 1000,
          nbytes_float / float_time_per_iter / 1e9,
          nbytes_quant / quant_time_per_iter / 1e9, sep='\t')
```

Results

```
torch.qint8
float ms	quant ms	float gB/s	quant gB/s
0.3706729888916016	0.3370296716690064	0.34769191136743244	0.09559989136992947
torch.quint8
float ms	quant ms	float gB/s	quant gB/s
0.38260042667388916	0.34079675674438475	0.3368527346412275	0.09454315325003715
torch.qint32
float ms	quant ms	float gB/s	quant gB/s
0.38033516407012935	0.3364055633544922	0.3388590174539739	0.38310900305828427

```

Test Plan: Imported from OSS

Differential Revision: D17529988

Pulled By: jamesr66a

fbshipit-source-id: b5edfe90c592b6c84459d1c0c77e4c18f5b04417
2019-09-23 16:47:47 -07:00
Jianyu Huang
cbdbdd3c8c Fix the flaky test_qlinear test caused by hypothesis deadline (#26663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26663

As Title says.

Example error:
https://circleci.com/gh/pytorch/pytorch/2894108?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link%2Fconsole

```
Sep 23 19:08:00 Unreliable test timings! On an initial run, this test took 453.13ms, which exceeded the deadline of 200.00ms, but on a subsequent run it took 23.01 ms, which did not. If you expect this sort of variability in your test timings, consider turning deadlines off for this test by setting deadline=None.
```
ghstack-source-id: 90613535

Test Plan: CI

Differential Revision: D17534476

fbshipit-source-id: d3ab91c8b290a0433eab4af3fc73ecbf728ec5bf
2019-09-23 14:19:39 -07:00
James Reed
c0aa6a01ce NHWC specialization for quantized::cat (#26524)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26524

This creates an NHWC specialization for `quantized::cat` that kicks in when all inputs are `NHWC`. This ensures the correct layout is propagated downstream and provides an optimized implementation specifically for this data layout.

Benchmark script based on Squeezenet shapes:
```
import torch, time

torch.manual_seed(0)

# NHWC
sizes = [
    (1, 54, 54, 64),
    (1, 54, 54, 128),
    (1, 26, 26, 128),
    (1, 26, 26, 256),
    (1, 12, 12, 256)
]

for size in sizes:
    x = torch.rand(*size)
    y = torch.rand(*size)
    qX = torch.quantize_linear(x, 0.01, 3, torch.qint8).permute([0, 3, 1, 2])
    qY = torch.quantize_linear(y, 0.01, 3, torch.qint8).permute([0, 3, 1, 2])

    ref = torch.cat([qX.dequantize(), qY.dequantize()], dim=1)

    NITER = 1000
    s = time.time()
    for i in range(NITER):
        out = torch.ops.quantized.cat([qX, qY], dim=1, scale=0.01, zero_point=3)
    time_per_iter = (time.time() - s) / NITER

    print('time per iter ms', time_per_iter * 1000)
    print('gb/s', (qX.numel() + qY.numel() + out.numel()) * qX.element_size() / time_per_iter / 1e9)

    torch.testing.assert_allclose(out.dequantize(), ref)
```

Before this change

```
time per iter ms 0.6898486614227295
gb/s 1.0821156026605054
time per iter ms 1.5480577945709229
gb/s 0.9644291093239284
time per iter ms 0.3180875778198242
gb/s 1.0881028500775023
time per iter ms 0.6702737808227539
gb/s 1.032748139350315
time per iter ms 0.13010454177856445
gb/s 1.1333655073392244
```
After this change
```
time per iter ms 0.11604785919189453
gb/s 6.432656364350577
time per iter ms 0.15956878662109375
gb/s 9.356416324360508
time per iter ms 0.040181636810302734
gb/s 8.613685939027139
time per iter ms 0.06564664840698242
gb/s 10.544696748392909
time per iter ms 0.018549680709838867
gb/s 7.949247337814738
```

Test Plan: Imported from OSS

Differential Revision: D17503593

Pulled By: jamesr66a

fbshipit-source-id: ec5d57ad8fbcb3fd9379e8bd370abd29d386f953
2019-09-23 13:19:29 -07:00
Supriya Rao
99226cd51e Unify Quantization APIs for add, pool and relu (#26586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26586

Use the backend engine flag to call QNNPACK for quantized ops.
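A sketch of the engine switch this unification is built on (the save/restore pattern is just a conventional way to scope the flag):
```
import torch

prev = torch.backends.quantized.engine
if 'qnnpack' in torch.backends.quantized.supported_qengines:
    torch.backends.quantized.engine = 'qnnpack'   # quantized ops now use QNNPACK
    qx = torch.quantize_per_tensor(torch.randn(2, 8), 0.1, 0, torch.quint8)
    print(torch.relu(qx).dequantize())            # a representative quantized op
torch.backends.quantized.engine = prev            # restore the previous backend
```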

Test Plan: python test/test_quantized.py TestQNNPACKOps

Differential Revision: D17515129

Pulled By: supriyar

fbshipit-source-id: 951e90205aa19581ea006a91d9514fc7a94409ef
2019-09-21 13:41:16 -07:00
Lingyi Liu
eca01eb0a6 quantized average_pool2d and adaptive_avg_pool2d implementation(Revert d17437015) (#26580)
Summary:
In this PR, we tried to fix the windows build issue of  d17437015.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26580

Differential Revision: D17517341

Pulled By: llyfacebook

fbshipit-source-id: db726596aa8f7c992c5a7ddc2781dc3aa0312284
2019-09-21 11:10:26 -07:00
Sebastian Messmer
fcfca9ad62 Skip some fragile tests (#26599)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26599

These fail due to tolerance in equality comparison. Disable them for now.
ghstack-source-id: 90553855

Test Plan: unit tests

Differential Revision: D17517085

fbshipit-source-id: a4d9278e356318719ccd84047404915a97944f52
2019-09-21 11:06:42 -07:00
Jerry Zhang
2e82ee0335 quantize_linear_per_channel -> quantize_per_channel (#26575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26575

To be consistent with `quantize_per_tensor`, we also rename `quantize_linear_per_channel` to `quantize_per_channel`.

Test Plan:
ci

Imported from OSS

Differential Revision: D17517360

fbshipit-source-id: 3af7d8f0fbe99148b79fcb1ad2fe811f776590cd
2019-09-21 11:02:17 -07:00
Jerry Zhang
254122dd4e quantize_linear -> quantize_per_tensor (#26574)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26574

Since we also have `quantized::linear`, `quantize_linear` sounds confusing, so we plan to rename it before the branch cut.
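The rename in user-facing terms (a before/after sketch):
```
import torch

x = torch.rand(2, 3)

# Before: qx = torch.quantize_linear(x, 0.1, 0, torch.quint8)
# After:
qx = torch.quantize_per_tensor(x, 0.1, 0, torch.quint8)
print(qx.dequantize())
```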

Test Plan:
ci

Imported from OSS

Differential Revision: D17514876

fbshipit-source-id: 01d9005e6ec8cb9950b9d8bba122109c389641d3
2019-09-20 21:58:48 -07:00
Supriya Rao
516cf051ee Revert D17504331: Unify Quantization APIs for add, pool and relu
Test Plan: revert-hammer

Differential Revision:
D17504331

Original commit changeset: 35cb2189067a

fbshipit-source-id: d433288f1dbb430d647c6694b3e3ad4276787c3b
2019-09-20 17:13:01 -07:00
Lingyi Liu
f0b7132b87 Revert D17437015: [pytorch][PR] Add the quantized average_pool2d support and adaptive_avg_pool2d support
Test Plan: revert-hammer

Differential Revision:
D17437015

Original commit changeset: 496aed1e4171

fbshipit-source-id: 53e22a85e06bd9d7827579b124b7f136230b6c1d
2019-09-20 15:01:49 -07:00
Supriya Rao
f337459619 Unify Quantization APIs for add, pool and relu (#26335)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26335

Use the backend engine flag to call QNNPACK for quantized ops.

Test Plan:
python test/test_quantized.py TestQNNPACKOps

Imported from OSS

Differential Revision: D17504331

fbshipit-source-id: 35cb2189067ac5cc6a7307179ef0335d1cec7b8f
2019-09-20 14:58:35 -07:00
Lingyi Liu
6411b92d6e Add the quantized average_pool2d support and adaptive_avg_pool2d support (#25899)
Summary:
Copied from PR https://github.com/pytorch/pytorch/issues/25676

===============For avg_pool2d==============

```
import torch, time

for dtype in [torch.qint8, torch.quint8, torch.qint32]:
    print('****', str(dtype), '*****')
    x = torch.rand(1, 56, 56, 256)

    q_x = torch.quantize_linear(x, 0.5, 1, dtype)
    q_x = q_x.permute([0, 3, 1, 2])

    x = x.permute([0, 3, 1, 2])

    NITER = 100

    s = time.time()
    for i in range(NITER):
        float_out = torch.nn.functional.avg_pool2d(x, kernel_size=3, stride=None, padding=0)
    time_per_iter_float = (time.time() - s) / NITER

    s = time.time()
    for i in range(NITER):
        quant_out = torch.nn.quantized.functional.avg_pool2d(q_x, kernel_size=3, stride=None, padding=0)
    time_per_iter_quant = (time.time() - s) / NITER

    ref_quantized = torch.quantize_linear(float_out, 0.5, 1, dtype)
    torch.testing.assert_allclose(ref_quantized.dequantize(), quant_out.dequantize())

    print('time/iter ms (float)', 'time/iter ms (quant)', 'quant/float', sep='\t')
    print(time_per_iter_float * 1000, time_per_iter_quant * 1000, time_per_iter_quant / time_per_iter_float, sep='\t')

    bytes_float = (x.numel() + float_out.numel()) * x.element_size()
    bytes_quant = (q_x.numel() + quant_out.numel()) * q_x.element_size()

    float_bw_gbps = bytes_float / time_per_iter_float / 1e9
    quant_bw_gbps = bytes_quant / time_per_iter_quant / 1e9

    print('GB/s float', 'GB/s quant', sep='\t')
    print(float_bw_gbps, quant_bw_gbps, sep='\t')
```

Before the vectorization:
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.67439603805542        7.126874923706055       2.6648539791017924
GB/s float      GB/s quant
1.2470733401269298      0.11699265230915809
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.587001323699951       7.011299133300781       2.7102031487456535
GB/s float      GB/s quant
1.2892022781148076      0.11892118481150399
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.6659250259399414      7.03080415725708        2.637285028215745
GB/s float      GB/s quant
1.2510359321992184      0.4743650833393638

After the vectorization
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.6113319396972656      0.5631613731384277      0.2156605847679846
GB/s float      GB/s quant
1.2771903676047593      1.48055608884072
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.5221967697143555      0.5518221855163574      0.21878633425529784
GB/s float      GB/s quant
1.322326647963202       1.5109794819499591
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.5173258781433105      4.0132904052734375      1.5942673295177407
GB/s float      GB/s quant
1.324885279636461       0.8310308159154421

===============For adaptive_avg_pool2d==============
```
import torch, time

for dtype in [torch.qint8, torch.quint8, torch.qint32]:
    print('****', str(dtype), '*****')
    x = torch.rand(1, 56, 56, 256)

    q_x = torch.quantize_linear(x, 0.5, 1, dtype)
    q_x = q_x.permute([0, 3, 1, 2])

    x = x.permute([0, 3, 1, 2])

    NITER = 100

    s = time.time()
    for i in range(NITER):
        float_out = torch.nn.functional.adaptive_avg_pool2d(x, output_size=5)
    time_per_iter_float = (time.time() - s) / NITER

    s = time.time()
    for i in range(NITER):
        quant_out = torch.nn.quantized.functional.adaptive_avg_pool2d(q_x, output_size=5)
    time_per_iter_quant = (time.time() - s) / NITER

    ref_quantized = torch.quantize_linear(float_out, 0.5, 1, dtype)
    torch.testing.assert_allclose(ref_quantized.dequantize(), quant_out.dequantize())

    print('time/iter ms (float)', 'time/iter ms (quant)', 'quant/float', sep='\t')
    print(time_per_iter_float * 1000, time_per_iter_quant * 1000, time_per_iter_quant / time_per_iter_float, sep='\t')

    bytes_float = (x.numel() + float_out.numel()) * x.element_size()
    bytes_quant = (q_x.numel() + quant_out.numel()) * q_x.element_size()

    float_bw_gbps = bytes_float / time_per_iter_float / 1e9
    quant_bw_gbps = bytes_quant / time_per_iter_quant / 1e9

    print('GB/s float', 'GB/s quant', sep='\t')
    print(float_bw_gbps, quant_bw_gbps, sep='\t')
```

Before the vectorization:
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.286238670349121       4.600362777709961       2.0121970804594342
GB/s float      GB/s quant
1.4158031888707898      0.17590264922602994
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.2867274284362793      4.474163055419922       1.9565790831832832
GB/s float      GB/s quant
1.4155005794518536      0.180864217503144
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.3176145553588867      4.264359474182129       1.8399778618588218
GB/s float      GB/s quant
1.3966360335956578      0.7590504551966285

After the vectorization:
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.3224568367004395      0.23195743560791016     0.09987588657942796
GB/s float      GB/s quant
1.3937240722194333      3.4886400510473843
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.255082130432129       0.2124309539794922      0.09420098324258604
GB/s float      GB/s quant
1.435364129899667       3.8093130254365883
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.266514301300049       1.6029787063598633      0.7072440290539581
GB/s float      GB/s quant
1.4281242338260862      2.0192807222938463
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25899

Differential Revision: D17437015

Pulled By: llyfacebook

fbshipit-source-id: 496aed1e41711048d0853254d6819d3fb141a0c0
2019-09-20 14:20:16 -07:00
Supriya Rao
8c4b7a1b4b Changes to support int8 weight and fp32 bias in QNNPACK (#26307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26307

Add support for FP32 bias. Re-quantize the bias at run time based on the input scale.
If the input scale stored in the packed struct changes, we requantize the bias with the updated input scale.
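Sketched numerically (not the PR's C++ code; the formula bias_q = round(bias_fp32 / (input_scale * weight_scale)) is the standard dynamic-bias scheme assumed here):
```
import torch

def requantize_bias(bias_fp32, input_scale, weight_scale):
    # fp32 bias is quantized to int32 with the product of the input and
    # weight scales; redo this whenever the input scale changes.
    bias_scale = input_scale * weight_scale
    return torch.round(bias_fp32 / bias_scale).to(torch.int32)

bias = torch.tensor([0.25, -0.5])
print(requantize_bias(bias, input_scale=0.02, weight_scale=0.1))
print(requantize_bias(bias, input_scale=0.05, weight_scale=0.1))
```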

Test Plan: python test/test_quantized.py TestQNNPackOps

Differential Revision: D17504253

Pulled By: supriyar

fbshipit-source-id: 49fe36a0bee91aaeb085db28eec4ded8c684dcf4
2019-09-20 13:17:56 -07:00
Dmytro Dzhulgakov
af64789cfa Fold activation permutation inside quantized conv operator (#26242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26242

According to https://github.com/pytorch/pytorch/issues/19092, we always keep the NCHW order and do the handling inside the kernels. This PR fixes it for the activations of qconv by using the MemoryLayout mechanism - activations stay logically NCHW but are strided as NHWC.

Note that this version is more aggressive than the eventual MemoryLayout mechanism - the QConv output is always NHWC regardless of the input striding. I think that is OK since we don't have NCHW quantized kernels anyway - so the very first conv would effectively switch the order, but I'm open to suggestions. By the way, it doesn't change behavior - the same happens in master today because of the explicit permute() call.
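The "logically NCHW, strided as NHWC" idea, in the permute form the benchmark scripts elsewhere in this log use (a sketch; torch.quantize_per_tensor is the current name of the quantize call):
```
import torch

x = torch.rand(1, 56, 56, 256)                      # physically NHWC in memory
qx = torch.quantize_per_tensor(x, 0.5, 1, torch.quint8)
qx = qx.permute([0, 3, 1, 2])                       # logical NCHW view, NHWC strides

print(qx.shape)    # torch.Size([1, 256, 56, 56]) - NCHW as operators see it
print(qx.stride()) # the channel dimension has stride 1 - NHWC in memory
```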

Test Plan: Imported from OSS

Differential Revision: D17443218

Pulled By: dzhulgakov

fbshipit-source-id: cfd136ae0465acd8d8c26ffad87385dac9c88726
2019-09-19 13:39:26 -07:00
Dmytro Dzhulgakov
d5daac7223 Fold weight permutation inside quantized conv operator (#26241)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26241

According to https://github.com/pytorch/pytorch/issues/19092, we always keep the NCHW order and do the handling inside the kernels. This PR fixes it for the weights of qconv by using the MemoryLayout mechanism.

Test Plan: Imported from OSS

Differential Revision: D17443219

Pulled By: dzhulgakov

fbshipit-source-id: ce0eb92034a9977b3303dafab8b0414575171062
2019-09-19 13:39:22 -07:00
Supriya Rao
d46b982db3 Add support to call unpack for pytorch mobile quantized FC and Conv (#26211)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26211

Currently QNNPACK does not have an unpack function like FBGEMM does.
In order to be able to script quantized models for mobile, we need to save unpacked weights.

This change stores the original weights and bias in the opaque struct and simply returns them when unpack is called.
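A hedged round-trip sketch of that behavior (the quantized::linear_prepack / linear_unpack signatures and the QNNPACK-enabled build are assumptions):
```
import torch

torch.backends.quantized.engine = "qnnpack"

w = torch.quantize_per_tensor(torch.randn(8, 4), 0.1, 0, torch.qint8)
b = torch.randn(8)

packed = torch.ops.quantized.linear_prepack(w, b)
w_out, b_out = torch.ops.quantized.linear_unpack(packed)

# With this change the mobile backend simply hands back the stored tensors.
print(torch.equal(w_out.int_repr(), w.int_repr()), torch.equal(b_out, b))
```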

Test Plan:
python test/test_quantized.py TestQNNPackOps.test_qconv_unpack
python test/test_quantized.py TestQNNPackOps.test_qlinear_unpack

Imported from OSS

Differential Revision: D17464430

fbshipit-source-id: 83ad5a2556dcf13245a1047feef6cfb489c9ef69
2019-09-18 23:05:18 -07:00
Supriya Rao
b23be95558 Adding quantized::conv2d function for pytorch mobile in c10 (#26152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26152

This change adds support for calling QNNPACK via the refactored API for Conv2d operators.

Test Plan:
python test/test_quantized.py TestQNNPackOps.test_qconv_qnnpack

Imported from OSS

Differential Revision: D17459892

fbshipit-source-id: d20b3e8b81dd403541cb2b9164731448ca229695
2019-09-18 16:48:42 -07:00
Supriya Rao
52d999e173 Disable QNNPACK tests if pytorch is not built with it. (#26427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26427

Use the new macro USE_PYTORCH_QNNPACK to enable testing with qnnpack

Test Plan:
test caffe2/test:quantized -- TestQNNPackOps
Summary (total time 4.96s):
  PASS: 0
  FAIL: 0
  SKIP: 4
    caffe2/test:quantized - test_qlinear_qnnpack (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_add (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_relu (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_maxpool2d (test_quantized.TestQNNPackOps)
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0

Reviewed By: ljk53

Differential Revision: D17459791

fbshipit-source-id: 3798fc270d22123b8807c9c63f12b9940981b115
2019-09-18 14:51:29 -07:00
Supriya Rao
bb1efb3bee Adding quantized::linear function for pytorch mobile in c10 (#26135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26135

This change adds support for calling QNNPACK via the refactored API for Linear (fully connected) operators.
It also includes some CMake changes to enable building and using pytorch_qnnpack inside ATen.
I have disabled USE_QNNPACK in CMakeLists.txt; enabling it results in picking up kernels from third_party/QNNPACK at runtime since the function names are the same.

Test Plan:
python test/test_quantized.py TestQNNPackOps.test_qlinear_qnnpack

Imported from OSS

Differential Revision: D17434885

fbshipit-source-id: 084698026938f4529f61d12e86dfe82534ec73dd
2019-09-17 16:16:39 -07:00
Daya Khudia
0ad8c679ae Enable support for dilated convolutions (#26205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26205

Enabling quantized dilated convolutions.

test:quantized

```
Summary (total time 14.01s):
  PASS: 43
  FAIL: 0
  SKIP: 5
    caffe2/test:quantized - test_qnnpack_add (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_maxpool2d (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_linear (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_relu (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_compare_tensor_scalar (test_quantized.TestComparatorOps)
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```
ghstack-source-id: 90244587

Test Plan: buck test mode/dev caffe2/test:quantized

Differential Revision: D17375370

fbshipit-source-id: cff0ba9a77cabac3ad164b2e133bfa466865afd4
2019-09-17 10:55:23 -07:00
Daya Khudia
2b52c1d982 Dynamic quantization for bias. (#26057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26057

Bias is now unquantized (i.e., floating-point) for qconv and qlinear. It is dynamically quantized by FBGEMM.

TODO: Add some performance numbers.

Tests:

test:quantization
```
Summary (total time 8.41s):
  PASS: 24
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
More details at https://our.intern.facebook.com/intern/buck/build/74d5f6f7-55c9-4350-a618-2013042fffd8

  OMIT: 0
```

test:quantized
```
Summary (total time 13.21s):
  PASS: 43
  FAIL: 0
  SKIP: 5
    caffe2/test:quantized - test_qnnpack_maxpool2d (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_compare_tensor_scalar (test_quantized.TestComparatorOps)
    caffe2/test:quantized - test_qnnpack_linear (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_relu (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_add (test_quantized.TestQNNPackOps)
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```
ghstack-source-id: 90166254

Test Plan:
buck test mode/dev caffe2/test:quantization

buck test mode/dev caffe2/test:quantized

Differential Revision: D17328028

fbshipit-source-id: d4a163d730d0f4a03e8e0faf7420710cf36eec09
2019-09-16 14:43:06 -07:00
Sebastian Messmer
6df70db807 Disable broken unit tests (#26301)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26301

-
ghstack-source-id: 90176419

Test Plan: waitforsandcastle

Differential Revision: D17400971

fbshipit-source-id: b6f9cb27fe955b0200d62591300c70ba79a90e5f
2019-09-16 12:12:39 -07:00