pytorch/torch/quantization
Supriya Rao 434af5d94a [quant] Speed up per-channel min-max observer (#34118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34118

Previously, calc_per_channel_qparams used Python for loops and primitives, which called `item()` many times and caused slowdowns during training.
These changes use torch primitives on the tensor, speeding up the operation by over 60x.
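As an illustration of the approach (a minimal sketch only; the function name and channel-axis handling are assumptions, not the exact observer.py code), per-channel min/max can be computed with one tensor reduction per call instead of per-element Python work:

    import torch

    def per_channel_min_max(x, ch_axis=0):
        # Move the channel axis to the front and flatten the remaining dims,
        # so each row holds every value belonging to one channel.
        x = x.transpose(ch_axis, 0)
        x = x.reshape(x.size(0), -1)
        # One reduction kernel per tensor instead of one `.item()` call per element.
        min_vals = torch.min(x, dim=1)[0]
        max_vals = torch.max(x, dim=1)[0]
        return min_vals, max_vals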

Perf results on MobileNetV2 during training, measured with the autograd profiler (a sketch of the profiling setup follows the table):

    Model                        Self CPU time total    CUDA time total
    FP32 forward call            47.222ms               124.001ms
    FakeQuant model (before)     19.107s                27.177s
    FakeQuant model (after)      404.667ms              446.344ms

Test Plan:
python test/test_quantization.py

Imported from OSS

Differential Revision: D20287841

fbshipit-source-id: 6b706b8206e0d0da3c3c217b014e8da5b71b870d
2020-03-05 18:29:41 -08:00
__init__.py Ignore F401 in all __init__.py without putting noqa (#25823) 2019-10-23 15:28:13 -07:00
_quantize_script.py [quant][graphmode][refactor] Better API for fold_convbn (#32380) 2020-01-24 15:46:47 -08:00
default_mappings.py [quant] Add Quantized BatchNorm2d module (#33109) 2020-02-13 12:15:43 -08:00
fake_quantize.py Per channel quantization performance improvement (#33772) 2020-02-26 10:19:25 -08:00
fuse_modules.py Enable inplace relu fusion for training (#33105) 2020-02-14 12:15:58 -08:00
observer.py [quant] Speed up per-channel min-max observer (#34118) 2020-03-05 18:29:41 -08:00
qconfig.py Updates to quantization documentation (#30288) 2019-11-23 09:29:30 -08:00
quantize.py [quantization] FP16 dynamic quantized Linear 2020-01-27 15:45:32 -08:00
stubs.py Factored out the default mappings 2019-10-03 11:52:21 -07:00