pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00


fused_rowwise_nbit_conversion_bench.py	[caffe2] optimize 2/4-bit row-wise quantization (#387)	2020-06-19 21:28:31 -07:00
sparse_lengths_sum_nbit_benchmark.py	[caffe2] open source 2/4-bit SLS operators (#34903)	2020-03-17 22:55:10 -07:00

Jongsoo Park 7a837019a4 [caffe2] optimize 2/4-bit row-wise quantization (#387)

Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/387

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39985

AVX2-optimized 2/4-bit row-wise quantization/dequantization in perfkernels.
This diff slightly changes the numerics of quantization by multiplying by the inverse of the scale instead of dividing by the scale.
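
The numerics change can be illustrated with a minimal pure-Python sketch. It is not the AVX2 perfkernels code. It assumes min/max-based row-wise quantization with `scale = (max - min) / (2**bit_rate - 1)` and `bias = min`, and it skips the bit packing and reduced-precision storage of scale and bias that the fused format would involve. The names `quantize_row` and `dequantize_row` are hypothetical.

```python
import random


def quantize_row(row, bit_rate, use_inverse_scale):
    """Quantize one row of floats to integers in [0, 2**bit_rate - 1].

    Assumed scheme (not taken from the commit): bias is the row minimum and
    scale spreads the row's range evenly over the available integer levels.
    """
    max_q = (1 << bit_rate) - 1
    bias = min(row)
    span = max(row) - bias
    # Avoid division by zero for constant rows.
    scale = span / max_q if span > 0 else 1.0
    inv_scale = 1.0 / scale

    quantized = []
    for x in row:
        shifted = x - bias
        # The change described above: multiply by 1/scale instead of
        # dividing by scale. The two can round differently in rare cases.
        value = shifted * inv_scale if use_inverse_scale else shifted / scale
        q = int(round(value))  # Python rounds halves to the nearest even integer
        quantized.append(min(max(q, 0), max_q))
    return quantized, scale, bias


def dequantize_row(quantized, scale, bias):
    """Map quantized integers back to approximate float values."""
    return [q * scale + bias for q in quantized]


random.seed(0)
row = [random.uniform(-1.0, 1.0) for _ in range(64)]
q_div, scale, bias = quantize_row(row, 4, use_inverse_scale=False)
q_mul, _, _ = quantize_row(row, 4, use_inverse_scale=True)

# Count elements where the two variants land on different integer levels.
mismatches = sum(a != b for a, b in zip(q_div, q_mul))
print("elements that differ:", mismatches)
```

Under these assumptions the two variants can disagree only when a value falls almost exactly halfway between two quantization levels, and then by at most one level, which is consistent with the change being described as slight.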

Test Plan:
On my devserver:

for i in 2 4 8; do echo $i; buck run mode/opt :fused_rowwise_nbit_conversion_bench -- --bit-rate=$i; done

Before this diff
2-bit
        3.35394 ms.        100%. FloatToFused2BitRowwiseQuantized
4-bit
        3.60351 ms.        100%. FloatToFused4BitRowwiseQuantized
8-bit
       0.434467 ms.        100%. FloatToFused8BitRowwiseQuantized

After this diff

2-bit
       0.606386 ms.        100%. FloatToFused2BitRowwiseQuantized
4-bit
       0.446683 ms.        100%. FloatToFused4BitRowwiseQuantized
8-bit
         0.4349 ms.        100%. FloatToFused8BitRowwiseQuantized
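
For a quick read of the numbers above, the following sketch computes the before/after ratio for each bit rate. The timings are copied from the benchmark output. The variable names are illustrative only.

```python
# Times in milliseconds, copied from the benchmark output above.
before_ms = {2: 3.35394, 4: 3.60351, 8: 0.434467}
after_ms = {2: 0.606386, 4: 0.446683, 8: 0.4349}

# Speedup = time before the diff / time after the diff.
speedups = {bits: before_ms[bits] / after_ms[bits] for bits in before_ms}

for bits, ratio in sorted(speedups.items()):
    print(f"{bits}-bit: {ratio:.2f}x")
# 2-bit: 5.53x
# 4-bit: 8.07x
# 8-bit: 1.00x
```

The 2-bit and 4-bit paths are roughly 5.5x and 8.1x faster in this run, while the 8-bit path is essentially unchanged.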

Reviewed By: choudharydhruv, jianyuh

Differential Revision: D22033195

fbshipit-source-id: d3a219e47b8345268d90a160c9314ed0d5b71467

2020-06-19 21:28:31 -07:00
