pytorch/torchgen
Nikhil Gupta 94737e8a2a [ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124)
Description:
1. Quantize Linear Layer Weights to 4 Bits:
Quantize the weights of the Linear layer to 4 bits, using symmetric quantization.
Pack two 4-bit weights into one uint8 container.
Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32.
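
As a rough illustration of step 1, the sketch below quantizes one weight row group-wise with a symmetric scheme and packs two 4-bit values per byte. It is written in plain Python so it runs without PyTorch. The helper names, the [-8, 7] range with a +8 offset, and the low-nibble-first order are assumptions made for this example; the layout actually produced by the packing op may differ.

```python
def quantize_symmetric_4bit(row, group_size):
    """Quantize one weight row group-wise to signed 4-bit ints in [-8, 7].

    Hypothetical helper: returns the quantized values and one scale per group.
    """
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    q_values, scales = [], []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Symmetric scheme: 0.0 maps to 0, so only a scale is stored.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            q_values.append(max(-8, min(7, round(w / scale))))
    return q_values, scales


def pack_int4_pairs(q_values):
    """Pack pairs of signed 4-bit values into bytes (assumed: low nibble first)."""
    if len(q_values) % 2 != 0:
        raise ValueError("need an even number of values")
    packed = bytearray()
    for lo, hi in zip(q_values[0::2], q_values[1::2]):
        # Shift [-8, 7] to [0, 15] so each value fits an unsigned nibble.
        packed.append(((hi + 8) << 4) | (lo + 8))
    return bytes(packed)


def unpack_int4_pairs(packed):
    """Inverse of pack_int4_pairs, used here only to check the round trip."""
    out = []
    for byte in packed:
        out.append((byte & 0x0F) - 8)
        out.append((byte >> 4) - 8)
    return out


# One row of 32 weights, quantized as a single group of size 32.
row = [0.1 * (i - 16) for i in range(32)]
q_values, scales = quantize_symmetric_4bit(row, group_size=32)
packed = pack_int4_pairs(q_values)
```

With a group size of 32, the 32 weights yield one scale and 16 packed bytes, which is where the 2x storage saving over one int8 per weight comes from.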

2. Prepare Quantized Weights, Scales, and Optional Bias:
After quantizing, obtain the quantized_weights, scales, and groupsize.
If the original Linear layer has a bias, prepare it as well.

3. Pack the Weights Efficiently:
Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias.
```python
packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features)
```
Inputs:
The quantized weight, scales_and_zeros, the optional bias, groupsize, and in_features and out_features (the same as the Linear layer’s corresponding parameters).

4. Perform Dynamic Quantized Matrix Multiplication:
Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights.
```python
output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights, groupsize, in_features, out_features)
```
Inputs:
The input tensor, packed_weights, groupsize, in_features, and out_features.
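
To sanity-check results, it can help to compare against a slow reference that dequantizes the weights group-wise and performs an ordinary floating-point matmul. The sketch below is such a reference in plain Python, not the kernel itself: the function names and the nested-list layout are hypothetical, it applies the bias at matmul time purely for illustration (in the flow above the bias is handed to the packing op), and it ignores any run-time quantization of the activations.

```python
def dequantize_row(q_row, row_scales, group_size):
    """Map signed 4-bit values back to floats using one scale per group."""
    return [q * row_scales[i // group_size] for i, q in enumerate(q_row)]


def reference_linear(inputs, q_weight, scales, group_size, bias=None):
    """Float reference for a linear layer with group-wise quantized weights.

    inputs:   [batch][in_features] floats
    q_weight: [out_features][in_features] ints in [-8, 7]
    scales:   [out_features][in_features // group_size] floats
    bias:     optional [out_features] floats
    """
    outputs = []
    for x in inputs:
        out_row = []
        for o, (q_row, row_scales) in enumerate(zip(q_weight, scales)):
            w = dequantize_row(q_row, row_scales, group_size)
            acc = sum(xi * wi for xi, wi in zip(x, w))
            if bias is not None:
                acc += bias[o]
            out_row.append(acc)
        outputs.append(out_row)
    return outputs


# Tiny example: in_features = 32, out_features = 2, one group per row.
inputs = [[1.0] * 32]
q_weight = [[1] * 32, [-2] * 32]
scales = [[0.5], [0.25]]
result = reference_linear(inputs, q_weight, scales, group_size=32)
result_with_bias = reference_linear(
    inputs, q_weight, scales, group_size=32, bias=[1.0, 2.0]
)
```

Outputs from the quantized op are expected to match such a reference only approximately, so a comparison should use a tolerance rather than exact equality.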

API Usage: https://github.com/pytorch/pytorch/issues/143289

Model Perf:
7B Transformer model:
Prefill: 340 t/s
Decode: 40 t/s
2B Transformer model:
Prefill: 747 t/s
Decode: 80 t/s

Tests:
python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight
Ran 1 test in 0.016s

OK

python test/test_linalg.py -k test__dyn_quant_matmul_4bit
Ran 8 tests in 0.077s

OK

python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit
Ran 8 tests in 11.454s

Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134124
Approved by: https://github.com/digantdesai, https://github.com/malfet
2024-12-20 19:32:03 +00:00
..
_autoheuristic Fix unused Python variables outside torch/ and test/ (#136359) 2024-12-11 17:10:23 +00:00
aoti [ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124) 2024-12-20 19:32:03 +00:00
api [TorchGen] remove remove_non_owning_ref_types from valuetype_type (#142449) 2024-12-12 00:15:44 +00:00
decompositions [BE][Easy] eliminate relative import in torchgen (#128872) 2024-06-21 14:11:46 +00:00
dest [1/N] Apply py39 ruff fixes (#138578) 2024-12-02 21:46:18 +00:00
executorch [TorchGen] remove remove_non_owning_ref_types from valuetype_type (#142449) 2024-12-12 00:15:44 +00:00
fuse [BE] update type annotations for basic utilities in torch/__init__.py (#129001) 2024-06-24 18:04:38 +00:00
operator_versions Fix unused Python variables outside torch/ and test/ (#136359) 2024-12-11 17:10:23 +00:00
selective_build [BE][Easy] enable postponed annotations in torchgen (#129376) 2024-06-29 09:23:39 +00:00
shape_functions [BE][Easy] enable postponed annotations in torchgen (#129376) 2024-06-29 09:23:39 +00:00
static_runtime [1/N] Apply py39 ruff fixes (#138578) 2024-12-02 21:46:18 +00:00
__init__.py
BUCK.oss
BUILD.bazel
build.bzl update rules_python and let bazel install its own pip dependencies (#101405) 2023-05-23 06:20:33 +00:00
code_template.py [1/N] Apply py39 ruff fixes (#138578) 2024-12-02 21:46:18 +00:00
context.py [2/N] Apply py39 ruff fixes (#141938) 2024-12-05 06:26:06 +00:00
gen_aoti_c_shim.py Fix unused Python variables outside torch/ and test/ (#136359) 2024-12-11 17:10:23 +00:00
gen_backend_stubs.py [1/N] Apply py39 ruff fixes (#138578) 2024-12-02 21:46:18 +00:00
gen_executorch.py [1/N] Apply py39 ruff fixes (#138578) 2024-12-02 21:46:18 +00:00
gen_functionalization_type.py Fix unused Python variables outside torch/ and test/ (#136359) 2024-12-11 17:10:23 +00:00
gen_lazy_tensor.py [1/N] Apply py39 ruff fixes (#138578) 2024-12-02 21:46:18 +00:00
gen_schema_utils.py [2/N] Apply py39 ruff fixes (#141938) 2024-12-05 06:26:06 +00:00
gen_vmap_plumbing.py [1/N] Apply py39 ruff fixes (#138578) 2024-12-02 21:46:18 +00:00
gen.py [TorchGen] Remove cpp_type_registration_declarations (#142452) 2024-12-11 19:01:36 +00:00
local.py [1/N] Apply py39 ruff fixes (#138578) 2024-12-02 21:46:18 +00:00
model.py Remove ConstQuantizerPtr in torchgen (#142375) 2024-12-10 02:37:01 +00:00
native_function_generation.py [aotd] capture rrelu_with_noise noise mutation in compile (#141867) 2024-12-04 12:18:58 +00:00
utils.py [1/N] Apply py39 ruff fixes (#138578) 2024-12-02 21:46:18 +00:00
yaml_utils.py [Reland] Update mypy to 1.4.1 (#105227) 2023-07-15 20:30:20 +00:00