Commit Graph

157 Commits

Author SHA1 Message Date
Lingyi Liu
ecb05f12c3 Support broadcast for quantized mul kernel (#30442)
Summary:
Since the tensor iterator supports broadcasting, we can simply remove the assertion on input shapes.
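For illustration, a minimal sketch of the broadcasting behavior this enables (shapes and qparams are assumptions, not taken from the PR's tests):
```
import torch

a = torch.rand(2, 3, 4)
b = torch.rand(1, 3, 1)  # broadcasts against `a`
qa = torch.quantize_per_tensor(a, scale=0.05, zero_point=0, dtype=torch.quint8)
qb = torch.quantize_per_tensor(b, scale=0.05, zero_point=0, dtype=torch.quint8)
# quantized mul takes an explicit output scale/zero_point
out = torch.ops.quantized.mul(qa, qb, 0.05, 0)
assert out.shape == (2, 3, 4)
```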
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30442

Differential Revision: D19976562

Pulled By: lly-zero-one

fbshipit-source-id: 91b27fc8b2570f29d110c6df26eacdd16f587b9f
2020-02-19 16:52:31 -08:00
Supriya Rao
d0435604a5 [quant] Add a quantized batch_norm operator (#33080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33080

Quantized batch norm for cases where batch norm cannot be fused with conv.
AVX2 implementation is from Caffe2.
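A reference sketch of what the fused op computes (an assumption about semantics for illustration, not the AVX2 kernel): dequantize, apply inference-mode batch norm, requantize.
```
import torch
import torch.nn.functional as F

qx = torch.quantize_per_tensor(torch.rand(1, 4, 8, 8), 0.1, 0, torch.quint8)
weight, bias = torch.ones(4), torch.zeros(4)
mean, var = torch.zeros(4), torch.ones(4)
# float batch norm on the dequantized input, then requantize
ref = F.batch_norm(qx.dequantize(), mean, var, weight, bias, training=False)
q_ref = torch.quantize_per_tensor(ref, 0.1, 0, torch.quint8)
```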

Test Plan:
python test/test_quantized.py TestQuantizedOps.test_batch_norm

Imported from OSS

Differential Revision: D19861927

fbshipit-source-id: bd8cd101fc063cb6358132ab7c651a160999293c
2020-02-13 12:15:38 -08:00
Zafar Takhirov
a23009f98f Quantized leaky relu
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33004

Test Plan: Imported from OSS

Differential Revision: D19740193

Pulled By: z-a-f

fbshipit-source-id: 32542d5465db44190366a2f8b737305a03b5fa76
2020-02-11 17:56:02 -08:00
Zafar Takhirov
fbe121e395 Quantized sigmoid function
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31851

Test Plan: Imported from OSS

Differential Revision: D19280716

Pulled By: z-a-f

fbshipit-source-id: f47d37e32a675756fcaca293e2c14f90c43891de
2020-01-31 14:40:21 -08:00
Supriya Rao
169541871a Add operator support for dynamic quant on mobile (#32479)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32479

Run dynamic quantization on mobile (similar to FBGEMM). Currently only implemented for the linear operator.
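A hedged usage sketch (model and shapes are illustrative assumptions): select the QNNPACK engine, then dynamically quantize the linear layers.
```
import torch
import torch.nn as nn

torch.backends.quantized.engine = 'qnnpack'  # assumes QNNPACK is built in
model = nn.Sequential(nn.Linear(64, 32))
# dynamically quantize only the Linear modules
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
out = qmodel(torch.rand(1, 64))
```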

Test Plan:
python test/test_quantized.py TestDynamicQuantizedLinear.test_qlinear

Imported from OSS

Differential Revision: D19542980

fbshipit-source-id: c9f6e5e8ded4d62ae0f2ed99e478c8307dde22ed
2020-01-24 17:51:54 -08:00
Pritam Damania
f050b16dd9 Move pytorch distributed tests to separate folder for contbuild. (#30445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445

Create distributed and rpc directories under caffe2/test for better management of unit tests.

Differential Revision: D18702786

fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606
2020-01-22 21:16:59 -08:00
Zafar Takhirov
6abfa9ad8a Quantized H Tangent function (#31031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31031

This activation will be needed for the LSTM implementation.
Also includes the QNNPACK implementation.
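For illustration, a minimal sketch (qparams are assumptions) of tanh applied directly to a quantized tensor:
```
import torch

qx = torch.quantize_per_tensor(torch.randn(4), scale=0.1, zero_point=128,
                               dtype=torch.quint8)
qy = torch.tanh(qx)  # output qparams are picked by the kernel to cover [-1, 1]
print(qy.dequantize())
```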

Test Plan: Imported from OSS

Differential Revision: D19334280

Pulled By: z-a-f

fbshipit-source-id: ae14399765a47afdf9b1e072d3967c24ff473e8d
2020-01-09 16:16:17 -08:00
Iurii Zdebskyi
5d5f156558 Revert D18903453: Quantized H Tangent function
Test Plan: revert-hammer

Differential Revision:
D18903453

Original commit changeset: 0050b1cebb1d

fbshipit-source-id: 205978f71d5688d4068861f7cf2dff40fbb311c6
2020-01-09 07:30:49 -08:00
Zafar Takhirov
620060cb0c Quantized H Tangent function (#31031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31031

This activation will be needed for the LSTM implementation.
Also includes the QNNPACK implementation.

Test Plan: Imported from OSS

Differential Revision: D18903453

Pulled By: z-a-f

fbshipit-source-id: 0050b1cebb1ddb179b7ecbcb114fe70705070f67
2020-01-08 12:59:39 -08:00
Jerry Zhang
40e720282c Using _floats_wrapper in per_channel_tensor generation (#31780)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31780

We need to specify the width to ensure the generated float is representable by `float32`.
Fixes: https://github.com/pytorch/pytorch/issues/31774
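The fix in a nutshell, sketched with hypothesis (bounds are illustrative assumptions): without `width=32`, hypothesis may draw 64-bit floats that are not exactly representable as `float32`.
```
from hypothesis import strategies as st

# draw floats that round-trip exactly through float32
floats32 = st.floats(min_value=-1e6, max_value=1e6, width=32)
```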

Test Plan:
ci

Imported from OSS

Differential Revision: D19275165

fbshipit-source-id: 50560b4208c562b6bcd2abccadd234f29fbb4b0a
2020-01-03 13:40:08 -08:00
Daya Khudia
a2463cbc38 Adding quantized clamp kernel (#30541)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30541

ghstack-source-id: 95450749

Adding quantized clamp kernel
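A hedged usage sketch (values are assumptions) of clamp on a quantized tensor:
```
import torch

qx = torch.quantize_per_tensor(torch.randn(8), 0.05, 128, torch.quint8)
qy = torch.clamp(qx, min=-0.2, max=0.2)  # clamp in dequantized units
```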

Test Plan:
Added test.

buck test mode/dev //caffe2/test:quantized -- 'test_qclamp \(test_quantized\.TestQuantizedOps\)' --print-passing-details

Differential Revision: D18739628

fbshipit-source-id: 38a029ab96c5b0689bb15c67dc4f274883e74975
2019-12-12 15:54:40 -08:00
James Reed
4fd20c0816 Kill hypothesis deadline testing (#30890)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30890

We've received way too many complaints about this functionality making tests flaky, and it's not providing value to us anyway. Let's cut the shit and kill deadline testing
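For context, this is what the per-test escape hatch looks like in hypothesis (a generic illustration, not code from this PR):
```
from hypothesis import given, settings, strategies as st

@settings(deadline=None)  # disable the per-example time budget
@given(st.integers())
def test_no_deadline(x):
    assert x == x
```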

Test Plan: Imported from OSS

Differential Revision: D18857597

Pulled By: jamesr66a

fbshipit-source-id: 67e3412795ef2fb7b7ee896169651084e434d2f6
2019-12-06 13:36:14 -08:00
Brian Wignall
e7fe64f6a6 Fix typos (#30606)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606

Differential Revision: D18763028

Pulled By: mrshenli

fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c
2019-12-02 20:17:42 -08:00
Jianyu Huang
38ca3552d9 Unit Test for the Legacy Dynamic Quantized Linear operator (#23139)
Summary: Add a unit test for the Dynamic Quantized Linear operator (`torch.fbgemm_linear_quantize_weight`, `torch.fbgemm_pack_quantized_matrix`, and `torch.fbgemm_linear_int8_weight`) in `test_quantized.py`.

Test Plan:
buck test mode/dev caffe2/test:quantized -- 'test_qlinear_legacy \(test_quantized\.TestDynamicQuantizedLinear\)'  --print-passing-details

  [jianyuhuang@devvm29567.prn1.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantized -- 'test_dynamic_qlinear \(test_quantized\.TestQuantizedLinear\)'  --print-passing-details
  Parsing buck files: finished in 1.8 sec
  Building: finished in 3.4 sec (100%) 6772/6772 jobs, 2 updated
    Total time: 5.2 sec
  Trace available for this run at /tmp/testpilot.20190714-220130.2698168.log
  TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
  Testpilot build revision 4f180136f799ab45ec2bf5d7644cb14955d4dd7a fbpkg 6c6253f255644ca3b8ce1bc5955b0f25 at Mon Jul  8 14:13:38 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/651/t.par
  Discovering tests
  Running 1 tests
  Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900044862617
        ✓ caffe2/test:quantized - test_dynamic_qlinear (test_quantized.TestQuantizedLinear) 0.023 1/1
  (passed)
  Test output:
  > test_dynamic_qlinear (test_quantized.TestQuantizedLinear) ... ok
  >
  > ----------------------------------------------------------------------
  > Ran 1 test in 0.024s
  >
  > OK
  Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900044862617
  Summary (total time 9.03s):
    PASS: 1
    FAIL: 0
    SKIP: 0
    FATAL: 0
    TIMEOUT: 0
    OMIT: 0

Differential Revision: D16404027

fbshipit-source-id: 4c85dd255637fd8b1eb4830e0464f48c22706f41
2019-11-20 20:59:35 -08:00
Jianyu Huang
bbff06ee96 Convert conv_prepack to conv2d_prepack and conv_unpack to conv2d_unpack (#29529)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29529

Pull Request resolved: https://github.com/pytorch/glow/pull/3771

We would like to replace `conv_prepack` with `conv2d_prepack` and  `conv_unpack` with `conv2d_unpack`.

This makes the naming consistent between 2D and 3D conv:
```
torch.ops.quantized.conv2d_prepack
torch.ops.quantized.conv2d_unpack
torch.ops.quantized.conv2d
torch.ops.quantized.conv3d_prepack
torch.ops.quantized.conv3d_unpack
torch.ops.quantized.conv3d
```

For better engineering, we should do this sooner rather than later, before the quantized conv2d ops gain more users.

The replacement bash command is as the follows:
```
find ./ -type f -exec sed -i -e 's/quantized::conv_prepack/quantized::conv2d_prepack/g' {} \;
find ./ -type f -exec sed -i -e 's/quantized::conv_unpack/quantized::conv2d_unpack/g' {} \;
find ./ -type f -exec sed -i -e 's/torch.ops.quantized.conv_prepack/torch.ops.quantized.conv2d_prepack/g' {} \;
find ./ -type f -exec sed -i -e 's/torch.ops.quantized.conv_unpack/torch.ops.quantized.conv2d_unpack/g' {} \;
```
ghstack-source-id: 93661879

Test Plan: CI

Reviewed By: jackm321

Differential Revision: D18421079

fbshipit-source-id: 17ae8b1ee79223bd2c5d4bbccd57af6580c4ab12
2019-11-11 21:54:10 -08:00
Zafar Takhirov
aa658a2a68 Adding inplace quantized relu6
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29245

Test Plan: Imported from OSS

Differential Revision: D18334541

Pulled By: z-a-f

fbshipit-source-id: 25b12cc88ee81434d96cf5c44c008c6f85da0673
2019-11-09 14:53:42 -08:00
Lingyi Liu
f5074ccafe set the no_deadline for the adaptive_avg_pool_nhwc test (#29502)
Summary:
This test is reported to be flaky because it exceeds the time deadline. This PR flags it as a no_deadline test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29502

Differential Revision: D18416632

Pulled By: lly-zero-one

fbshipit-source-id: 27cd7b28139f3f16ee0cf5802a0709385719d487
2019-11-09 09:30:46 -08:00
Supriya Rao
4515edfe15 Disable QNNPACK tests on MacOS (#29328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29328

Tests are flaky as seen in issue #29326.
Disable until we fix the kernels.

Test Plan:
python test/test_quantized.py TestQNNPackOps

Imported from OSS

Differential Revision: D18358200

fbshipit-source-id: 58f1981799fe8253234fcc7b0540e1c0b6babc15
2019-11-06 21:30:11 -08:00
Supriya Rao
6ea4219d20 Temporarily disable qnnpack tests on MACOS (#29176)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29176

Captured in issue  #27326

Test Plan:
python test/test_quantized.py test_qconv

Imported from OSS

Differential Revision: D18336184

fbshipit-source-id: 7394b04215b6c8b7bc0508f1648f23022bd031cb
2019-11-05 18:52:45 -08:00
Zafar Takhirov
7ea83120df Fixing the shape calculation for pool tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28853

Test Plan: Imported from OSS

Differential Revision: D18212290

Pulled By: z-a-f

fbshipit-source-id: 44a41f3192c8b168a8a0fb68eb33b68400917c7a
2019-11-01 12:29:27 -07:00
Xiaomeng Yang
f6692146e7 Add Conv3dInt8 (#28768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28768

Add Conv3dInt8

Test Plan: buck test mode/dev-nosan caffe2/test:quantized -- "Conv"

Reviewed By: jianyuh

Differential Revision: D18023661

fbshipit-source-id: 8fc7a4350baf29271dfd6fa3c1c4b10e60e2fdbf
2019-10-28 23:28:11 -07:00
Xiaomeng Yang
d5afd97569 Refactor qconv_prepack and qconv_unpack to support conv3d (#28481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28481

Refactor qconv_prepack and qconv_unpack to support conv3d

Test Plan: buck test mode/dev-nosan caffe2/test:quantized -- "Conv"

Reviewed By: dskhudia

Differential Revision: D18023651

fbshipit-source-id: 8cbc9fe68f93bc4b247a4f41423c6d8c30a5ef90
2019-10-27 14:43:16 -07:00
Lingyi Liu
4d9c017dee Fix the padding issue of quantized average pool operator (#28260)
Summary:
This fixes a bug in both the test and the average pool implementation.
In the test, we used quantized values as the float input and failed to pad with the zero_point.
In the op implementation, the averaging window size is incorrect in the padded case when count_include_pad is true.
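An illustration of the count_include_pad semantics being fixed, on float tensors (shapes are assumptions): with count_include_pad=True the divisor includes padded positions, so a correct quantized reference must pad with the zero_point rather than raw zeros.
```
import torch
import torch.nn.functional as F

x = torch.ones(1, 1, 4, 4)
a = F.avg_pool2d(x, kernel_size=3, padding=1, count_include_pad=True)
b = F.avg_pool2d(x, kernel_size=3, padding=1, count_include_pad=False)
print(a[0, 0, 0, 0].item(), b[0, 0, 0, 0].item())  # corner window: 4/9 vs 4/4
```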
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28260

Differential Revision: D18039960

Pulled By: lly-zero-one

fbshipit-source-id: 7b5d34498b60f5d574a276a22798c9f576944734
2019-10-21 11:06:31 -07:00
Supriya Rao
15be189f0d Add quantized torch mean implementation (#27675)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27675

This leverages QNNPACK global average pooling to perform torch.mean on input feature maps.
Currently it can only support mean along the HxW plane of an NCHW tensor.
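A hedged usage sketch of the supported case (QNNPACK availability and the qparams are assumptions):
```
import torch

torch.backends.quantized.engine = 'qnnpack'  # assumes QNNPACK is built in
qx = torch.quantize_per_tensor(torch.rand(1, 8, 7, 7), 0.1, 0, torch.quint8)
qy = torch.mean(qx, dim=[2, 3])  # mean over the HxW plane of NCHW
```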

Test Plan:
python test/test_quantized.py TestQuantizedOps.test_mean

Imported from OSS

Differential Revision: D17989336

fbshipit-source-id: 8d4cbcbed5f146290b1580d26e5b45359d293761
2019-10-19 19:20:59 -07:00
Supriya Rao
3629974c1e Fix quantized avg_pool2d test to support non-zero padding (#28246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28246

Updated the reference fp32 implementation to use the dequantized input tensor so that padded values are correctly taken into account.
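A sketch of the corrected reference pattern described above (shapes and qparams are assumptions): run the fp32 op on the dequantized input so padding uses real float semantics, then requantize for comparison.
```
import torch
import torch.nn.functional as F

qx = torch.quantize_per_tensor(torch.rand(1, 2, 8, 8), 0.1, 0, torch.quint8)
ref = F.avg_pool2d(qx.dequantize(), kernel_size=3, padding=1)
q_ref = torch.quantize_per_tensor(ref, 0.1, 0, torch.quint8)
```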

Test Plan:
python test/test_quantized.py TestQNNPackOps.test_avg_pool2d

Imported from OSS

Differential Revision: D17989334

fbshipit-source-id: 848ce78713280f529f71ff48e930db8de18abc62
2019-10-18 09:14:54 -07:00
Supriya Rao
de0f9567a3 Add quantized avg_pool2d for pytorch mobile (#27631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27631

Add support to perform avg_pool2d on mobile; tested using the existing avg_pool2d Python tests.
Uses the QNNPACK backend, which currently only supports 4-dimensional inputs.

Test Plan:
python test/test_quantized.py TestQNNPackOps.test_avg_pool2d

Imported from OSS

Differential Revision: D17973792

fbshipit-source-id: 95ffffb2da656ed911a618b9cb68d6b728c16c74
2019-10-16 22:02:23 -07:00
Jerry Zhang
9084fcba46 test_equal in test_quantized.py (#27616)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27616

Fix a problem in the reference implementation of equal.
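A sketch of what a reference `equal` for (per-tensor) quantized tensors has to check (an assumption about the intended semantics, not the patch itself):
```
import torch

def ref_equal(qa, qb):
    # quantization parameters must match, not just the dequantized values
    if qa.dtype != qb.dtype:
        return False
    if qa.q_scale() != qb.q_scale() or qa.q_zero_point() != qb.q_zero_point():
        return False
    return torch.equal(qa.int_repr(), qb.int_repr())
```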

Test Plan:
python test/test_quantized.py

Imported from OSS

Differential Revision: D17837055

fbshipit-source-id: 1e4bc32f4334c0352468a61fa4316a1c0ff76485
2019-10-09 14:13:56 -07:00
Zafar Takhirov
eb5040c205 Suppressing hypothesis health check for qnnpack_add
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27193

Test Plan: Imported from OSS

Differential Revision: D17704958

Pulled By: zafartahirov

fbshipit-source-id: d8ab58b724cce2f5130b10ead0f10f5f32e26cfb
2019-10-02 11:39:12 -07:00
Supriya Rao
b805b5dab8 Unify quantized conv and linear tests (#26992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26992

Run the same test for FBGEMM and QNNPACK backends.
Checks that QNNPACK or FBGEMM are supported before running it (using supported_qengines)

Test Plan:
python test/test_quantized.py TestQuantizedLinear
    python test/test_quantized.py TestQuantizedConv
    python test/test_quantized_models.py
    python test/test_quantized_nn_mods.py

Imported from OSS

Differential Revision: D17689171

fbshipit-source-id: e11c0a5e41f5f4e6836a614a5b61e4db3c5e384b
2019-10-01 14:07:16 -07:00
Supriya Rao
250f482aa5 Support qadd_relu on pytorch mobile (#26982)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26982

Fused add+relu support
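A hedged usage sketch of the fused op (output qparams are assumptions):
```
import torch

qa = torch.quantize_per_tensor(torch.randn(4), 0.1, 128, torch.quint8)
qb = torch.quantize_per_tensor(torch.randn(4), 0.1, 128, torch.quint8)
out = torch.ops.quantized.add_relu(qa, qb, 0.1, 128)  # add then relu, fused
```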

Test Plan:
python test/test_quantized.py TestQNNPackOps.test_qnnpack_add

Also, after setting torch.backends.quantized.engine = "qnnpack", ran:
python test/test_quantized.py TestQuantizedOps.test_qadd_relu_different_qparams
python test/test_quantized.py TestQuantizedOps.test_qadd_relu_same_qparams

Imported from OSS

Differential Revision: D17635063

fbshipit-source-id: dd1cdf07f66c4cd657c1907f1b650e50d3d4725f
2019-09-27 16:13:42 -07:00
James Reed
b518ff3cb8 Re-write of tensor-scalar mul
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26937

Test Plan: Imported from OSS

Differential Revision: D17618028

Pulled By: jamesr66a

fbshipit-source-id: 90ef461972e826327a19467ad4cefdeb35e13adc
2019-09-27 16:09:27 -07:00
Dmytro Dzhulgakov
764bf826e3 Remove fbgemm_is_cpu_supported in favor of torch.backends.quantized.supported_qengines (#26840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26840

Cleaning up the top-level namespace. Also cosmetic changes to torch.backends.quantized.
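The replacement pattern, sketched:
```
import torch

# before: torch.fbgemm_is_cpu_supported()
if 'fbgemm' in torch.backends.quantized.supported_qengines:
    torch.backends.quantized.engine = 'fbgemm'
```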

Test Plan: Imported from OSS

Differential Revision: D17604403

Pulled By: dzhulgakov

fbshipit-source-id: c55af277ea7319d962a82a6120f65ccd47a60abc
2019-09-27 13:45:15 -07:00
Lingyi Liu
428204dfa4 Fix the QuantizedAVX2 build issue (#26854)
Summary:
The QuantizedAVX2 path does not support the int32 type, so we switch to using the at::quantize_vec function instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26854

Differential Revision: D17609872

Pulled By: llyfacebook

fbshipit-source-id: b4a77d93ce0ebfef696506b5cdbe3e91fe44bb36
2019-09-27 10:20:26 -07:00
James Reed
b1a09dbec7 Support ceil_mode in quantized maxpool
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26916

Test Plan: Imported from OSS

Differential Revision: D17609625

Pulled By: jamesr66a

fbshipit-source-id: a9e1878e7946ee71b6888a91f0dcb2e889939376
2019-09-26 16:48:09 -07:00
James Reed
20ebd13f0a Re-write of tensor-scalar quantized add
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26766

Test Plan: Imported from OSS

Differential Revision: D17587105

Pulled By: jamesr66a

fbshipit-source-id: 4da6ea98a4c5cc36fd191d9845c1ef409efce464
2019-09-25 20:19:28 -07:00
Lingyi Liu
03007b3dda Quantized Interpolate Kernel(upsample_bilinear2d) (#26631)
Summary:
In this PR, we implement the quantized upsample_bilinear2d case for the interpolate kernel.

For NHWC performance improvement, benchmarked with the script below:
```
import torch, time

for dtype in [torch.qint8, torch.quint8, torch.qint32]:
    print('****', str(dtype), '*****')
    x = torch.rand(1, 56, 56, 256)

    q_x = torch.quantize_per_tensor(x, 0.5, 1, dtype)
    q_x = q_x.permute([0, 3, 1, 2])

    x = x.permute([0, 3, 1, 2])

    NITER = 100

    s = time.time()
    for i in range(NITER):
        float_out = torch.nn.functional.interpolate(x, size=5, scale_factor=None, mode="bilinear", align_corners=True)
    time_per_iter_float = (time.time() - s) / NITER

    s = time.time()
    for i in range(NITER):
        quant_out = torch.nn.quantized.functional.interpolate(q_x, size=5, scale_factor=None, mode="bilinear", align_corners=True)
    time_per_iter_quant = (time.time() - s) / NITER

    ref_quantized = torch.quantize_per_tensor(float_out, 0.5, 1, dtype)
    #  torch.testing.assert_allclose(ref_quantized.dequantize(), quant_out.dequantize())

    print('time/iter ms (float)', 'time/iter ms (quant)', 'quant/float', sep='\t')
    print(time_per_iter_float * 1000, time_per_iter_quant * 1000, time_per_iter_quant / time_per_iter_float, sep='\t')

    bytes_float = (x.numel() + float_out.numel()) * x.element_size()
    bytes_quant = (q_x.numel() + quant_out.numel()) * q_x.element_size()

    float_bw_gbps = bytes_float / time_per_iter_float / 1e9
    quant_bw_gbps = bytes_quant / time_per_iter_quant / 1e9

    print('GB/s float', 'GB/s quant', sep='\t')
    print(float_bw_gbps, quant_bw_gbps, sep='\t')
```

===========without nhwc handling===========
```
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
1.999044418334961       2.5860953330993652      1.2936657681940702
GB/s float      GB/s quant
1.6192056416115257      0.3129103516188541
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.02730655670166        2.6061582565307617      1.2855274639721328
GB/s float      GB/s quant
1.596632728927902       0.3105014816242217
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.0180463790893555      2.4047350883483887      1.1916153728010588
GB/s float      GB/s quant
1.603959172365819       1.3460376636426636
```

===========with nhwc handling===========
```
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.0913314819335938      0.09696483612060547     0.04636512047863123
GB/s float      GB/s quant
1.5477527249803915      8.345458337015
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.1065664291381836      0.09959936141967773     0.04728042754408879
GB/s float      GB/s quant
1.5365591871338384      8.124710725706763
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.044203281402588       0.6003522872924805      0.29368521846837126
GB/s float      GB/s quant
1.5834354779917448      5.391607675216635
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26631

Differential Revision: D17521498

Pulled By: llyfacebook

fbshipit-source-id: 385ae0f77777cd8bee385cafb80e492127b7d103
2019-09-25 13:43:43 -07:00
James Reed
cf272d43ab Trivial quantized torch.mean implementation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26253

Test Plan: Imported from OSS

Differential Revision: D17529994

Pulled By: jamesr66a

fbshipit-source-id: e3aff71da35b05ed61710cdb88d72b51c944168b
2019-09-24 10:18:15 -07:00
Dmytro Dzhulgakov
ade60f8a8d Allow per-channel QTensor accept any floating type for scales (#26676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26676

Just makes it more user-friendly to be able to pass any floating-point or integer values for scales or zero_points in per-channel quantization. It matches the behavior of the per-tensor quantizer, where those arguments are scalars (not tensors) and thus automatic casting is applied.
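A hedged sketch of the relaxed API (values are assumptions): scales may be passed in any floating dtype and are cast internally.
```
import torch

x = torch.rand(3, 4)
scales = torch.tensor([0.1, 0.2, 0.3], dtype=torch.float64)
zero_points = torch.zeros(3, dtype=torch.int64)
qx = torch.quantize_per_channel(x, scales, zero_points, axis=0,
                                dtype=torch.quint8)
```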

Test Plan: Imported from OSS

Differential Revision: D17537051

Pulled By: dzhulgakov

fbshipit-source-id: e955ccdb5b4691828a559dc8f1ed7de54b6d12c4
2019-09-23 22:29:05 -07:00
Dmytro Dzhulgakov
b93823cb65 Per-channel quantized tensor to have only a single axis (#26675)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26675

Based on an offline poll, we're very unlikely to have multi-axis quantized tensors in the foreseeable future. Let's simplify the API and just return an int instead of a list. It also matches the singular `axis` name.
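After this change, the accessor returns a plain int (a small sketch; values are assumptions):
```
import torch

qx = torch.quantize_per_channel(torch.rand(3, 4),
                                torch.tensor([0.1, 0.1, 0.1]),
                                torch.zeros(3, dtype=torch.int64),
                                0, torch.quint8)
assert isinstance(qx.q_per_channel_axis(), int)
```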

Test Plan: Imported from OSS

Differential Revision: D17537052

Pulled By: dzhulgakov

fbshipit-source-id: 676abc3b251d288468aaed467b5e5ca4063b98b0
2019-09-23 22:29:01 -07:00
Lingyi Liu
ba8002ec13 Quantized Interpolate Kernel(upsample_nearest2d) (#26617)
Summary:
In this PR, we implement support for the quantized interpolate upsample_nearest2d case.

Benchmark script:
```
import torch, time

for dtype in [torch.qint8, torch.quint8, torch.qint32]:
    print('****', str(dtype), '*****')
    x = torch.rand(1, 56, 56, 256)

    q_x = torch.quantize_per_tensor(x, 0.5, 1, dtype)
    q_x = q_x.permute([0, 3, 1, 2])

    x = x.permute([0, 3, 1, 2])

    NITER = 100

    s = time.time()
    for i in range(NITER):
        # float_out = torch.nn.functional.avg_pool2d(x, kernel_size=5, stride=None, padding=0)
        # float_out = torch.nn.functional.adaptive_avg_pool2d(x, output_size=5)
        float_out = torch.nn.functional.interpolate(x, size=5, scale_factor=None, mode="nearest", align_corners=None)
    time_per_iter_float = (time.time() - s) / NITER

    s = time.time()
    for i in range(NITER):
        # quant_out = torch.nn.quantized.functional.avg_pool2d(q_x, kernel_size=5, stride=None, padding=0)
        # quant_out = torch.nn.quantized.functional.adaptive_avg_pool2d(q_x, output_size=5)
        quant_out = torch.nn.quantized.functional.interpolate(q_x, size=5, scale_factor=None, mode="nearest", align_corners=None)
    time_per_iter_quant = (time.time() - s) / NITER

    ref_quantized = torch.quantize_per_tensor(float_out, 0.5, 1, dtype)
    #  torch.testing.assert_allclose(ref_quantized.dequantize(), quant_out.dequantize())

    print('time/iter ms (float)', 'time/iter ms (quant)', 'quant/float', sep='\t')
    print(time_per_iter_float * 1000, time_per_iter_quant * 1000, time_per_iter_quant / time_per_iter_float, sep='\t')

    bytes_float = (x.numel() + float_out.numel()) * x.element_size()
    bytes_quant = (q_x.numel() + quant_out.numel()) * q_x.element_size()

    float_bw_gbps = bytes_float / time_per_iter_float / 1e9
    quant_bw_gbps = bytes_quant / time_per_iter_quant / 1e9

    print('GB/s float', 'GB/s quant', sep='\t')
    print(float_bw_gbps, quant_bw_gbps, sep='\t')
```

=========without special handling of NHWC layout=============
```
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.08712100982666        2.1624231338500977      1.0360794240817361
GB/s float      GB/s quant
1.5508750976872339      0.37421723220248165
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.056601047515869       2.184889316558838       1.0623787823107091
GB/s float      GB/s quant
1.573890086222483       0.3703693335250963
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.0152783393859863      2.067704200744629       1.0260142037623525
GB/s float      GB/s quant
1.6061622539873104      1.5654386148823074
```

=========with special handling of NHWC layout=============
```
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.044649124145508       0.009250640869140625    0.004524317038018256
GB/s float      GB/s quant
1.5830902044636819      87.47675014597938
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.049403190612793       0.009107589721679688    0.004444020465761265
GB/s float      GB/s quant
1.579417859221808       88.8507305147644
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.0601415634155273      0.01062631607055664     0.0051580513976618066
GB/s float      GB/s quant
1.5711852318699757      304.6082930818039
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26617

Differential Revision: D17519146

Pulled By: llyfacebook

fbshipit-source-id: 126876e550ef7009fd75f5ccc033599f1f37456d
2019-09-23 20:32:19 -07:00
James Reed
cb9fd0ce58 quantized torch.topk (#26486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26486

This PR adds a quantized version of torch.topk, supporting all the same options

Benchmark script
```
import torch
import time

for dtype in [torch.qint8, torch.quint8, torch.qint32]:
    X = torch.rand(6, 5, 1024)
    qX = torch.quantize_linear(X, 0.01, 0, dtype)
    X = qX.dequantize()

    NITER = 10000

    s = time.time()
    for i in range(NITER):
        float_out = torch.topk(X, 50)
    float_time_per_iter = (time.time() - s) / NITER

    s = time.time()
    for i in range(NITER):
        quant_out = torch.topk(qX, 50)
    quant_time_per_iter = (time.time() - s) / NITER

    print(dtype)
    print('float ms', 'quant ms', 'float gB/s', 'quant gB/s', sep='\t')
    nbytes_float = (X.numel() + float_out[0].numel()) * X.element_size()
    nbytes_quant = (qX.numel() + quant_out[0].numel()) * qX.element_size()
    print(float_time_per_iter * 1000,
          quant_time_per_iter * 1000,
          nbytes_float / float_time_per_iter / 1e9,
          nbytes_quant / quant_time_per_iter / 1e9, sep='\t')
```

Results

```
torch.qint8
float ms	quant ms	float gB/s	quant gB/s
0.3706729888916016	0.3370296716690064	0.34769191136743244	0.09559989136992947
torch.quint8
float ms	quant ms	float gB/s	quant gB/s
0.38260042667388916	0.34079675674438475	0.3368527346412275	0.09454315325003715
torch.qint32
float ms	quant ms	float gB/s	quant gB/s
0.38033516407012935	0.3364055633544922	0.3388590174539739	0.38310900305828427

```

Test Plan: Imported from OSS

Differential Revision: D17529988

Pulled By: jamesr66a

fbshipit-source-id: b5edfe90c592b6c84459d1c0c77e4c18f5b04417
2019-09-23 16:47:47 -07:00
Jianyu Huang
cbdbdd3c8c Fix the flaky test_qlinear test caused by hypothesis deadline (#26663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26663

As Title says.

Example error:
https://circleci.com/gh/pytorch/pytorch/2894108?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link%2Fconsole

```
Sep 23 19:08:00 Unreliable test timings! On an initial run, this test took 453.13ms, which exceeded the deadline of 200.00ms, but on a subsequent run it took 23.01 ms, which did not. If you expect this sort of variability in your test timings, consider turning deadlines off for this test by setting deadline=None.
```
ghstack-source-id: 90613535

Test Plan: CI

Differential Revision: D17534476

fbshipit-source-id: d3ab91c8b290a0433eab4af3fc73ecbf728ec5bf
2019-09-23 14:19:39 -07:00
James Reed
c0aa6a01ce NHWC specialization for quantized::cat (#26524)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26524

This creates an NHWC specialization for `quantized::cat` that kicks in when all inputs are `NHWC`. This ensures the correct layout is propagated downstream and provides an implementation optimized specifically for this data layout.

Benchmark script based on Squeezenet shapes:
```
import torch, time

torch.manual_seed(0)

# NHWC
sizes = [
    (1, 54, 54, 64),
    (1, 54, 54, 128),
    (1, 26, 26, 128),
    (1, 26, 26, 256),
    (1, 12, 12, 256)
]

for size in sizes:
    x = torch.rand(*size)
    y = torch.rand(*size)
    qX = torch.quantize_linear(x, 0.01, 3, torch.qint8).permute([0, 3, 1, 2])
    qY = torch.quantize_linear(y, 0.01, 3, torch.qint8).permute([0, 3, 1, 2])

    ref = torch.cat([qX.dequantize(), qY.dequantize()], dim=1)

    NITER = 1000
    s = time.time()
    for i in range(NITER):
        out = torch.ops.quantized.cat([qX, qY], dim=1, scale=0.01, zero_point=3)
    time_per_iter = (time.time() - s) / NITER

    print('time per iter ms', time_per_iter * 1000)
    print('gb/s', (qX.numel() + qY.numel() + out.numel()) * qX.element_size() / time_per_iter / 1e9)

    torch.testing.assert_allclose(out.dequantize(), ref)
```

Before this change

```
time per iter ms 0.6898486614227295
gb/s 1.0821156026605054
time per iter ms 1.5480577945709229
gb/s 0.9644291093239284
time per iter ms 0.3180875778198242
gb/s 1.0881028500775023
time per iter ms 0.6702737808227539
gb/s 1.032748139350315
time per iter ms 0.13010454177856445
gb/s 1.1333655073392244
```
After this change
```
time per iter ms 0.11604785919189453
gb/s 6.432656364350577
time per iter ms 0.15956878662109375
gb/s 9.356416324360508
time per iter ms 0.040181636810302734
gb/s 8.613685939027139
time per iter ms 0.06564664840698242
gb/s 10.544696748392909
time per iter ms 0.018549680709838867
gb/s 7.949247337814738
```

Test Plan: Imported from OSS

Differential Revision: D17503593

Pulled By: jamesr66a

fbshipit-source-id: ec5d57ad8fbcb3fd9379e8bd370abd29d386f953
2019-09-23 13:19:29 -07:00
Supriya Rao
99226cd51e Unify Quantization APIs for add, pool and relu (#26586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26586

Use the backend engine flag to call QNNPACK for quantized ops.
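The dispatch pattern in a nutshell (a sketch; 'qnnpack' availability is an assumption): select the backend once, then call the same quantized ops.
```
import torch

torch.backends.quantized.engine = 'qnnpack'
qx = torch.quantize_per_tensor(torch.rand(1, 2, 4, 4), 0.1, 0, torch.quint8)
qy = torch.nn.quantized.functional.max_pool2d(qx, kernel_size=2)
```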

Test Plan: python test/test_quantized.py TestQNNPACKOps

Differential Revision: D17515129

Pulled By: supriyar

fbshipit-source-id: 951e90205aa19581ea006a91d9514fc7a94409ef
2019-09-21 13:41:16 -07:00
Lingyi Liu
eca01eb0a6 quantized average_pool2d and adaptive_avg_pool2d implementation(Revert d17437015) (#26580)
Summary:
This PR fixes the Windows build issue from D17437015.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26580

Differential Revision: D17517341

Pulled By: llyfacebook

fbshipit-source-id: db726596aa8f7c992c5a7ddc2781dc3aa0312284
2019-09-21 11:10:26 -07:00
Sebastian Messmer
fcfca9ad62 Skip some fragile tests (#26599)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26599

These tests fail because of tight tolerances in equality comparisons. Disable them for now.
ghstack-source-id: 90553855

Test Plan: unit tests

Differential Revision: D17517085

fbshipit-source-id: a4d9278e356318719ccd84047404915a97944f52
2019-09-21 11:06:42 -07:00
Jerry Zhang
2e82ee0335 quantize_linear_per_channel -> quantize_per_channel (#26575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26575

To keep consistent with `quantize_per_tensor` we also
rename `quantize_linear_per_channel` to `quantize_per_channel`

Test Plan:
ci

Imported from OSS

Differential Revision: D17517360

fbshipit-source-id: 3af7d8f0fbe99148b79fcb1ad2fe811f776590cd
2019-09-21 11:02:17 -07:00
Jerry Zhang
254122dd4e quantize_linear -> quantize_per_tensor (#26574)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26574

Since we also have `quantized::linear`, the name `quantize_linear` sounds confusing, so we plan to rename it before the branch cut.
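The rename at a glance (a sketch; values are assumptions):
```
import torch

x = torch.rand(4)
# before: torch.quantize_linear(x, 0.1, 0, torch.quint8)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
```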

Test Plan:
ci

Imported from OSS

Differential Revision: D17514876

fbshipit-source-id: 01d9005e6ec8cb9950b9d8bba122109c389641d3
2019-09-20 21:58:48 -07:00
Supriya Rao
516cf051ee Revert D17504331: Unify Quantization APIs for add, pool and relu
Test Plan: revert-hammer

Differential Revision:
D17504331

Original commit changeset: 35cb2189067a

fbshipit-source-id: d433288f1dbb430d647c6694b3e3ad4276787c3b
2019-09-20 17:13:01 -07:00
Lingyi Liu
f0b7132b87 Revert D17437015: [pytorch][PR] Add the quantized average_pool2d support and adaptive_avg_pool2d support
Test Plan: revert-hammer

Differential Revision:
D17437015

Original commit changeset: 496aed1e4171

fbshipit-source-id: 53e22a85e06bd9d7827579b124b7f136230b6c1d
2019-09-20 15:01:49 -07:00