pytorch/test/quantization/core
Weiwen Xia 54217e695d [Quant] Add fast path of qmean/qstd for quantized CPU (reopen #70172) (#80579)
> Note: This is a reopen of https://github.com/pytorch/pytorch/pull/70172 which was merged then reverted.

Add a fast path for qmean and qstd on the quantized CPU backend, taken when the reduction is over the innermost dimensions. The fast path supports inputs in contiguous memory format.
For example:
```python
import torch

X = torch.randn((2, 3, 4, 5), dtype=torch.float)
# Illustrative quantization parameters; any valid scale/zero_point/dtype work
scale, zero_point, torch_type = 0.05, 128, torch.quint8
qX = torch.quantize_per_tensor(X, scale, zero_point, torch_type)

# dim can be: -1, (-1, -2), (-1, -2, -3), (-1, -2, -3, -4), 3, (3, 2), (3, 2, 1), (3, 2, 1, 0) or None
dim = -1
qY = torch.mean(qX, dim)  # or: qY = torch.std(qX, dim)
```
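
As a sanity check, the quantized result should track the fp32 reduction of the dequantized input to within roughly one quantization step. A minimal sketch, assuming the illustrative parameters from the example above (not part of the PR's test suite):

```python
import torch

X = torch.randn(2, 3, 4, 5)
scale, zero_point = 0.05, 128          # illustrative parameters, as above
qX = torch.quantize_per_tensor(X, scale, zero_point, torch.quint8)

qY = torch.mean(qX, dim=(-1, -2))      # reduces over the innermost two dims

# Compare against reducing the dequantized input in fp32; agreement within
# roughly one quantization step (atol=scale) is expected.
ref = qX.dequantize().mean(dim=(-1, -2))
assert torch.allclose(qY.dequantize(), ref, atol=scale)
```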

**Performance test results**
Test Env:
- Intel® Xeon® CLX-8260
- 1 instance, 4 cores
- Using Jemalloc

Test method:
Create 4d contiguous tensors as inputs, set `dim` to the innermost two dimensions `(-1, -2)`, then run the following variants (a timing sketch follows the list):
- Quantize inputs and use the fast path
- Quantize inputs and use the reference path
- Use fp32 kernel (no quantization)
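
The exact benchmark script is not included in the PR; the sketch below, using `torch.utils.benchmark`, covers the fast-path and fp32 variants with an illustrative shape, quantization parameters, and thread count. Timing the reference path in isolation would need either a non-contiguous input or a build without this change.

```python
import torch
from torch.utils import benchmark

torch.set_num_threads(4)                 # mirror the 4-core test setup

shape = (1, 32, 56, 56)                  # illustrative 4d shape
dim = (-1, -2)                           # innermost two dimensions
X = torch.randn(shape)
qX = torch.quantize_per_tensor(X, 0.05, 128, torch.quint8)

timers = {
    "quantized mean (fast path)": benchmark.Timer(
        stmt="torch.mean(qX, dim)",
        globals={"torch": torch, "qX": qX, "dim": dim},
    ),
    "fp32 mean (no quantization)": benchmark.Timer(
        stmt="torch.mean(X, dim)",
        globals={"torch": torch, "X": X, "dim": dim},
    ),
}
for name, timer in timers.items():
    print(name, timer.blocked_autorange())
```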

Mean: exec time (us) vs. shape
![image](https://user-images.githubusercontent.com/12522207/148152617-604f2841-cfcd-495c-ae88-c27d9165b46a.png)

Std: exec time (us) vs. shape
![image](https://user-images.githubusercontent.com/12522207/148152632-3a8dceb1-0057-42c9-af65-1e26d697ff0c.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80579
Approved by: https://github.com/malfet
2022-07-05 16:00:04 +00:00
experimental [quant] Implement APoT fake quantization (#79845) 2022-06-28 18:15:26 +00:00
__init__.py
test_docs.py [ao][docs] tests for quantization docs (#79923) 2022-06-23 20:50:31 +00:00
test_quantized_functional.py [quantized] Add bilinear quantized grid_sample (#66879) 2021-11-01 14:44:26 -07:00
test_quantized_module.py Revert "Add prelu op and module for quantized CPU backend (#73491)" 2022-06-30 12:54:39 +00:00
test_quantized_op.py [Quant] Add fast path of qmean/qstd for quantized CPU (reopen #70172) (#80579) 2022-07-05 16:00:04 +00:00
test_quantized_tensor.py [quant][core][improvement][feature] Enabled support for quantized fill of nhwc tensors 2022-06-15 22:44:50 +00:00
test_utils.py [reland][quant] Add utility function get_fqn_to_example_inputs 2022-05-25 23:31:51 +00:00
test_workflow_module.py [quant][better-engineering][bc-breaking] Removed quant_min/quant_max from fake_quant modules 2022-05-11 14:23:05 +00:00
test_workflow_ops.py [quant] Skip some broken tests due to hypothesis 2022-05-25 21:46:11 +00:00