pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Nikhil Gupta 94737e8a2a [ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124)

Description:

1. Quantize Linear Layer Weights to 4-bits: Quantize the weights of the Linear layer to 4 bits, using symmetric quantization. Pack two 4-bit weights into one uint8 container. Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32.

2. Prepare Quantized Weights, Scales, and Optional Bias: After quantizing, obtain the quantized_weights, scales, and groupsize. If the original Linear layer has a bias, prepare it as well.

3. Pack the Weights Efficiently: Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias.

```python
packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features)
```

Input parameters should include: in_features and out_features (the same as the Linear layer's corresponding parameters).

4. Perform Dynamic Quantized Matrix Multiplication: Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights.

```python
output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights, groupsize, in_features, out_features)
```

Inputs required include: the input tensor, packed_weights, groupsize, and the in_features and out_features.

API Usage: https://github.com/pytorch/pytorch/issues/143289

Model Perf:
7B Transformer model: Prefill: 340 t/s, Decode: 40 t/s
2B Transformer model: Prefill: 747 t/s, Decode: 80 t/s

Tests:
python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight
Ran 1 test in 0.016s
OK
python test/test_linalg.py -k test__dyn_quant_matmul_4bit
Ran 8 tests in 0.077s
OK
python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit
Ran 8 tests in 11.454s

Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452
Fixes #ISSUE_NUMBER
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134124
Approved by: https://github.com/digantdesai, https://github.com/malfet

2024-12-20 19:32:03 +00:00
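Steps 1 and 2 of the commit description (symmetric group-wise 4-bit quantization, then packing two 4-bit values into one uint8) can be illustrated with a minimal pure-Python sketch. This is not the code from the pull request: the function names, the scale rule `max_abs / 7`, the offset-by-8 storage, and the low-nibble-first byte layout are all assumptions made for illustration. The actual layout produced by `torch.ops.aten._dyn_quant_pack_4bit_weight` may differ.

```python
def quantize_group_symmetric_4bit(weights, group_size=32):
    """Symmetrically quantize a flat list of floats to 4-bit codes, per group.

    Returns (codes, scales): codes are ints in [-8, 7]; there is one float
    scale per group. The scale rule max_abs / 7 is an illustrative assumption.
    """
    if group_size % 32 != 0:
        raise ValueError("group_size must be a multiple of 32")
    if len(weights) % group_size != 0:
        raise ValueError("len(weights) must be a multiple of group_size")
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Symmetric scheme: no zero point, only a scale per group.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            codes.append(max(-8, min(7, round(w / scale))))
    return codes, scales


def pack_int4_pairs(codes):
    """Pack pairs of signed 4-bit codes into bytes.

    Assumed layout: each code is shifted to [0, 15]; the first code of a
    pair goes in the low nibble, the second in the high nibble.
    """
    if len(codes) % 2 != 0:
        raise ValueError("need an even number of codes")
    packed = bytearray()
    for low, high in zip(codes[0::2], codes[1::2]):
        packed.append(((high + 8) << 4) | (low + 8))
    return bytes(packed)


def unpack_int4_pairs(packed):
    """Invert pack_int4_pairs, returning signed codes in [-8, 7]."""
    codes = []
    for byte in packed:
        codes.append((byte & 0x0F) - 8)
        codes.append((byte >> 4) - 8)
    return codes


def dequantize(codes, scales, group_size=32):
    """Map codes back to floats using the scale of the group they belong to."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


# Hypothetical example data: one row of 64 weights, two groups of 32.
weights = [(i - 16) / 16.0 for i in range(64)]
codes, scales = quantize_group_symmetric_4bit(weights, group_size=32)
packed = pack_int4_pairs(codes)
restored = dequantize(unpack_int4_pairs(packed), scales, group_size=32)
```

Packing halves the storage (64 codes fit in 32 bytes), and because no value is clipped under this scale rule, each restored weight differs from the original by at most half of its group's scale.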
..
_autoheuristic	Fix unused Python variables outside torch/ and test/ (#136359)	2024-12-11 17:10:23 +00:00
aoti	[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124)	2024-12-20 19:32:03 +00:00
api	[TorchGen] remove remove_non_owning_ref_types from valuetype_type (#142449)	2024-12-12 00:15:44 +00:00
decompositions	[BE][Easy] eliminate relative import in `torchgen` (#128872)	2024-06-21 14:11:46 +00:00
dest	[1/N] Apply py39 ruff fixes (#138578)	2024-12-02 21:46:18 +00:00
executorch	[TorchGen] remove remove_non_owning_ref_types from valuetype_type (#142449)	2024-12-12 00:15:44 +00:00
fuse	[BE] update type annotations for basic utilities in `torch/__init__.py` (#129001)	2024-06-24 18:04:38 +00:00
operator_versions	Fix unused Python variables outside torch/ and test/ (#136359)	2024-12-11 17:10:23 +00:00
selective_build	[BE][Easy] enable postponed annotations in `torchgen` (#129376)	2024-06-29 09:23:39 +00:00
shape_functions	[BE][Easy] enable postponed annotations in `torchgen` (#129376)	2024-06-29 09:23:39 +00:00
static_runtime	[1/N] Apply py39 ruff fixes (#138578)	2024-12-02 21:46:18 +00:00
__init__.py
BUCK.oss
BUILD.bazel
build.bzl	update rules_python and let bazel install its own pip dependencies (#101405)	2023-05-23 06:20:33 +00:00
code_template.py	[1/N] Apply py39 ruff fixes (#138578)	2024-12-02 21:46:18 +00:00
context.py	[2/N] Apply py39 ruff fixes (#141938)	2024-12-05 06:26:06 +00:00
gen_aoti_c_shim.py	Fix unused Python variables outside torch/ and test/ (#136359)	2024-12-11 17:10:23 +00:00
gen_backend_stubs.py	[1/N] Apply py39 ruff fixes (#138578)	2024-12-02 21:46:18 +00:00
gen_executorch.py	[1/N] Apply py39 ruff fixes (#138578)	2024-12-02 21:46:18 +00:00
gen_functionalization_type.py	Fix unused Python variables outside torch/ and test/ (#136359)	2024-12-11 17:10:23 +00:00
gen_lazy_tensor.py	[1/N] Apply py39 ruff fixes (#138578)	2024-12-02 21:46:18 +00:00
gen_schema_utils.py	[2/N] Apply py39 ruff fixes (#141938)	2024-12-05 06:26:06 +00:00
gen_vmap_plumbing.py	[1/N] Apply py39 ruff fixes (#138578)	2024-12-02 21:46:18 +00:00
gen.py	[TorchGen] Remove cpp_type_registration_declarations (#142452)	2024-12-11 19:01:36 +00:00
local.py	[1/N] Apply py39 ruff fixes (#138578)	2024-12-02 21:46:18 +00:00
model.py	Remove ConstQuantizerPtr in torchgen (#142375)	2024-12-10 02:37:01 +00:00
native_function_generation.py	[aotd] capture rrelu_with_noise noise mutation in compile (#141867)	2024-12-04 12:18:58 +00:00
utils.py	[1/N] Apply py39 ruff fixes (#138578)	2024-12-02 21:46:18 +00:00
yaml_utils.py	[Reland] Update mypy to 1.4.1 (#105227)	2023-07-15 20:30:20 +00:00