Commit Graph

5 Commits

Author SHA1 Message Date
Ivan Zaitsev
38e73b30b7 bring quantized_backward.cpp in sync with intern (#101990)
The version of [D45965552](https://www.internalfb.com/diff/D45965552) exported as #101739 was not the latest. This PR brings GitHub in sync with the internal version.

For Meta employees, see:
[D46056765](https://www.internalfb.com/diff/D46056765)
[D45965552](https://www.internalfb.com/diff/D45965552)

@diff-train-skip-merge
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101990
Approved by: https://github.com/kit1980
2023-05-22 19:53:42 +00:00
Kwanghoon An
13640bf925 disabling gradient quantization in 8bw (#101739)
Summary:
Quantizing the *gradient* is not viable for a complex ASR model.

Gradient in INT8: f438266519
Gradient in FP32: f438109197

The two WERs clearly show the limitation of quantizing the gradient.

For now, we are fine with simply enabling quantized backpropagation while computing the gradient in FP32. This already saves memory thanks to the smaller model size.
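
A minimal Python sketch of the approach (illustrative only, not the actual C++ change in quantized_backward.cpp; the class name and quantization parameters are made-up placeholders): the forward pass uses the quantized weight, while the backward pass computes gradients from the saved FP32 weight.

```
import torch

class QuantizedForwardFP32Backward(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight_fp32):
        # Simulate the INT8 weight used by the quantized kernel
        # (scale/zero_point are placeholders, not calibrated values).
        w_q = torch.quantize_per_tensor(weight_fp32, scale=0.05, zero_point=0,
                                        dtype=torch.qint8)
        ctx.save_for_backward(x, weight_fp32)   # keep the FP32 weight for backward
        return x @ w_q.dequantize().t()         # forward uses the quantized weight

    @staticmethod
    def backward(ctx, grad_out):
        x, weight_fp32 = ctx.saved_tensors
        grad_x = grad_out @ weight_fp32         # gradient computed in FP32
        grad_w = grad_out.t() @ x
        return grad_x, grad_w

x = torch.randn(4, 8, requires_grad=True)
w = torch.randn(16, 8, requires_grad=True)
QuantizedForwardFP32Backward.apply(x, w).sum().backward()
```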

Test Plan: Signals

Differential Revision: D45965552

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101739
Approved by: https://github.com/izaitsevfb
2023-05-20 18:39:12 +00:00
Kwanghoon An
13f169c9da Per Channel in back-propagation function (#97475)
Summary:
Supporting Per Channel quantization in the gradient computation function.

One workaround added here: the current QNNPACK is not designed to process a [transposed weight](https://fb.workplace.com/groups/pytorch.edge.users/permalink/1283737025829921/), so we simply replace Per Channel with Per Tensor quantization when computing the gradient (some slower learning or WER degradation might be expected; we don't know, nothing is guaranteed).
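
A standalone sketch of the Per Channel -> Per Tensor fallback described above (shapes, scales, and variable names are illustrative, and this is plain PyTorch rather than the QNNPACK path this PR touches):

```
import torch

out_ch, in_ch = 16, 8
w_fp32 = torch.randn(out_ch, in_ch)

# Per Channel quantization of the weight (one scale per output channel).
scales = w_fp32.abs().amax(dim=1) / 127.0
zero_points = torch.zeros(out_ch, dtype=torch.int64)
w_per_channel = torch.quantize_per_channel(w_fp32, scales, zero_points,
                                           axis=0, dtype=torch.qint8)

# Workaround: fall back to a Per Tensor representation for the gradient.
w_dq = w_per_channel.dequantize()
w_per_tensor = torch.quantize_per_tensor(w_dq, scale=w_dq.abs().max().item() / 127.0,
                                         zero_point=0, dtype=torch.qint8)

grad_out = torch.randn(4, out_ch)
grad_input = grad_out @ w_per_tensor.dequantize()   # gradient w.r.t. the input
```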

Test Plan:
You can create your own synthetic model (an FP32 layer followed by an INT8 layer with Per Channel quantization) and check that the loss decreases.

Differential Revision: D43898794

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97475
Approved by: https://github.com/weiwangmeta
2023-04-03 20:34:44 +00:00
Kwanghoon An
3f4090652c Passing LinearPackedParamsBase capsule as saved_data to the backward stage (#96269)
Summary:
The initial implementation unpacked the original weight in the custom forward function, which keeps a second copy of the weight tensor alive and roughly doubles its memory footprint.

Hence we unpack the weight in the backward function instead: the capsule object is stored in saved_data and unpacked in backward.

Details:
https://github.com/pytorch/pytorch/pull/94432#discussion_r1126669178
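
A rough Python analogue of the pattern (the actual change is on the C++ side, where the capsule goes into the autograd context's saved_data; the class name and output scale/zero_point below are placeholders, and gradient support for quantized inputs is exactly what this stack adds, so treat this as a sketch):

```
import torch

class PackedQuantizedLinear(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x_q, packed):
        # Store only the packed-params object; unpacking here would keep both
        # the packed and the unpacked copies of the weight alive (~2x memory).
        ctx.packed = packed                      # Python analogue of saved_data
        return torch.ops.quantized.linear(x_q, packed, 0.1, 128)

    @staticmethod
    def backward(ctx, grad_out):
        # Unpack the weight only when the gradient is actually computed.
        w_q, bias = torch.ops.quantized.linear_unpack(ctx.packed)
        grad_x = grad_out @ w_q.dequantize()
        return grad_x, None
```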

Test Plan: buck2 run //scripts/kwanghoon/pytorch:torch_playground - [D43809980](https://www.internalfb.com/diff/D43809980)
You can plug and play with the above script.

Differential Revision: D43895790

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96269
Approved by: https://github.com/kimishpatel
2023-03-16 17:37:05 +00:00
Kwanghoon An
a1d7014c0f Hooking backward for QNNPACK (#94432)
Summary: Enabling quantized gradient.

Test Plan:
Algorithmic correctness - Dequantized matmul vs QNNPACK matmul for gradient - P616202766

```
dequantized matmul : [1.5463, -0.2917, -2.1735, 0.5689, -1.0795]
QNNPACK matmul : tensor([[ 1.5463, -0.2917, -2.1735,  0.5689, -1.0795]])
```
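
A hedged recipe for reproducing that kind of comparison (scales, zero points, and shapes are placeholders; it assumes a build where a quantized engine such as QNNPACK or FBGEMM is available):

```
import torch

# Pick whichever quantized engine this build supports (QNNPACK on mobile/ARM).
engines = torch.backends.quantized.supported_engines
torch.backends.quantized.engine = "qnnpack" if "qnnpack" in engines else "fbgemm"

x = torch.randn(1, 5)
w = torch.randn(5, 5)
x_q = torch.quantize_per_tensor(x, scale=0.05, zero_point=128, dtype=torch.quint8)
w_q = torch.quantize_per_tensor(w, scale=0.05, zero_point=0, dtype=torch.qint8)

packed = torch.ops.quantized.linear_prepack(w_q, None)
y_quantized = torch.ops.quantized.linear(x_q, packed, 0.1, 128)

# Reference: dequantize both operands and run the matmul in FP32.
y_reference = x_q.dequantize() @ w_q.dequantize().t()

print("dequantized matmul :", y_reference)
print("quantized linear   :", y_quantized.dequantize())
```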

Differential Revision: D42593235

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94432
Approved by: https://github.com/malfet, https://github.com/kimishpatel
2023-03-08 10:21:32 +00:00