Commit Graph

5 Commits

Author SHA1 Message Date
Ivan Zaitsev
38e73b30b7 bring quantized_backward.cpp in sync with intern (#101990)
The version of [D45965552](https://www.internalfb.com/diff/D45965552) exported as #101739 was not the latest. This PR brings GitHub in sync with the internal version.

For Meta employees, see:
[D46056765](https://www.internalfb.com/diff/D46056765)
[D45965552](https://www.internalfb.com/diff/D45965552)

@diff-train-skip-merge
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101990
Approved by: https://github.com/kit1980
2023-05-22 19:53:42 +00:00
Kwanghoon An
13640bf925 disabling gradient quantization in 8bw (#101739)
Summary:
Quantizing the *gradient* is not viable for a complex ASR model.

Gradient in INT8: f438266519
Gradient in FP32: f438109197

The two WERs clearly show the limitation of quantizing the gradient.

For now, we are fine with simply enabling quantized backpropagation while computing the gradient in FP32. This already saves memory thanks to the smaller model size.
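
A minimal Python sketch of the approach (illustrative only, not the actual C++ change in quantized_backward.cpp; the class name and quantization parameters are made-up placeholders): the forward pass uses the quantized weight, while the backward pass computes gradients from the saved FP32 weight.

```
import torch

class QuantizedForwardFP32Backward(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight_fp32):
        # Simulate the INT8 weight used by the quantized kernel
        # (scale/zero_point are placeholders, not calibrated values).
        w_q = torch.quantize_per_tensor(weight_fp32, scale=0.05, zero_point=0,
                                        dtype=torch.qint8)
        ctx.save_for_backward(x, weight_fp32)   # keep the FP32 weight for backward
        return x @ w_q.dequantize().t()         # forward uses the quantized weight

    @staticmethod
    def backward(ctx, grad_out):
        x, weight_fp32 = ctx.saved_tensors
        grad_x = grad_out @ weight_fp32         # gradient computed in FP32
        grad_w = grad_out.t() @ x
        return grad_x, grad_w

x = torch.randn(4, 8, requires_grad=True)
w = torch.randn(16, 8, requires_grad=True)
QuantizedForwardFP32Backward.apply(x, w).sum().backward()
```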

Test Plan: Signals

Differential Revision: D45965552

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101739
Approved by: https://github.com/izaitsevfb
2023-05-20 18:39:12 +00:00
Kwanghoon An
13f169c9da Per Channel in back-propagation function (#97475)
Summary:
Supporting Per Channel quantization in the gradient computation function.

One workaround added here: the current QNNPACK is not designed to process a [transposed weight](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1283737025829921/), so we simply replace Per Channel with Per Tensor quantization when computing the gradient (some slower learning or WER degradation might be expected; we don't know, nothing is guaranteed).
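
A standalone sketch of the Per Channel -> Per Tensor fallback described above (shapes, scales, and variable names are illustrative, and this is plain PyTorch rather than the QNNPACK path this PR touches):

```
import torch

out_ch, in_ch = 16, 8
w_fp32 = torch.randn(out_ch, in_ch)

# Per Channel quantization of the weight (one scale per output channel).
scales = w_fp32.abs().amax(dim=1) / 127.0
zero_points = torch.zeros(out_ch, dtype=torch.int64)
w_per_channel = torch.quantize_per_channel(w_fp32, scales, zero_points,
                                           axis=0, dtype=torch.qint8)

# Workaround: fall back to a Per Tensor representation for the gradient.
w_dq = w_per_channel.dequantize()
w_per_tensor = torch.quantize_per_tensor(w_dq, scale=w_dq.abs().max().item() / 127.0,
                                         zero_point=0, dtype=torch.qint8)

grad_out = torch.randn(4, out_ch)
grad_input = grad_out @ w_per_tensor.dequantize()   # gradient w.r.t. the input
```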

Test Plan:
You can create your own synthetic model (an FP32 layer followed by an INT8 layer with Per Channel quantization) and check that the loss decreases.

Differential Revision: D43898794

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97475
Approved by: https://github.com/weiwangmeta
2023-04-03 20:34:44 +00:00
Kwanghoon An
3f4090652c Passing LinearPackedParamsBase capsule as saved_data to the backward stage (#96269)
Summary:
The initial implementation unpacked the original weight in the custom forward function, which keeps a second copy of the weight tensor alive and roughly doubles its memory footprint.

Hence we unpack the weight in the backward function instead: the capsule object is stored in saved_data and unpacked in backward.

Details:
https://github.com/pytorch/pytorch/pull/94432#discussion_r1126669178
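
A rough Python analogue of the pattern (the actual change is on the C++ side, where the capsule goes into the autograd context's saved_data; the class name and output scale/zero_point below are placeholders, and gradient support for quantized inputs is exactly what this stack adds, so treat this as a sketch):

```
import torch

class PackedQuantizedLinear(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x_q, packed):
        # Store only the packed-params object; unpacking here would keep both
        # the packed and the unpacked copies of the weight alive (~2x memory).
        ctx.packed = packed                      # Python analogue of saved_data
        return torch.ops.quantized.linear(x_q, packed, 0.1, 128)

    @staticmethod
    def backward(ctx, grad_out):
        # Unpack the weight only when the gradient is actually computed.
        w_q, bias = torch.ops.quantized.linear_unpack(ctx.packed)
        grad_x = grad_out @ w_q.dequantize()
        return grad_x, None
```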

Test Plan: buck2 run //scripts/kwanghoon/pytorch:torch_playground - [D43809980](https://www.internalfb.com/diff/D43809980)
You can plug and play with the above script.

Differential Revision: D43895790

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96269
Approved by: https://github.com/kimishpatel
2023-03-16 17:37:05 +00:00
Kwanghoon An
a1d7014c0f Hooking backward for QNNPACK (#94432)
Summary: Enabling quantized gradient.

Test Plan:
Algorithmic correctness - Dequantized matmul vs QNNPACK matmul for gradient - P616202766

```
dequantized matmul : [1.5463, -0.2917, -2.1735, 0.5689, -1.0795]
QNNPACK matmul : tensor([[ 1.5463, -0.2917, -2.1735,  0.5689, -1.0795]])
```
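
A hedged recipe for reproducing that kind of comparison (scales, zero points, and shapes are placeholders; it assumes a build where a quantized engine such as QNNPACK or FBGEMM is available):

```
import torch

# Pick whichever quantized engine this build supports (QNNPACK on mobile/ARM).
engines = torch.backends.quantized.supported_engines
torch.backends.quantized.engine = "qnnpack" if "qnnpack" in engines else "fbgemm"

x = torch.randn(1, 5)
w = torch.randn(5, 5)
x_q = torch.quantize_per_tensor(x, scale=0.05, zero_point=128, dtype=torch.quint8)
w_q = torch.quantize_per_tensor(w, scale=0.05, zero_point=0, dtype=torch.qint8)

packed = torch.ops.quantized.linear_prepack(w_q, None)
y_quantized = torch.ops.quantized.linear(x_q, packed, 0.1, 128)

# Reference: dequantize both operands and run the matmul in FP32.
y_reference = x_q.dequantize() @ w_q.dequantize().t()

print("dequantized matmul :", y_reference)
print("quantized linear   :", y_quantized.dequantize())
```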

Differential Revision: D42593235

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94432
Approved by: https://github.com/malfet, https://github.com/kimishpatel
2023-03-08 10:21:32 +00:00