Summary: This diff adds a device-side API which converts a model to its quantized equivalent. The input model must have been prepared AOT (ahead of time) for quantization. The API works by:
- running the reset-observers method,
- running the observe method,
- running the quantize method,
- and replacing the original method, e.g. forward, with its quantized equivalent.

Test Plan: test/quantization/jit/test_ondevice_quantization.py

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D38889818](https://our.internmc.facebook.com/intern/diff/D38889818)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83807
Approved by: https://github.com/iseeyuan
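A minimal sketch of that sequence in C++, assuming the AOT preparation step generated companion methods named reset_observers_<method>, observe_<method>, and quantize_<method> on the mobile::Module. The generated method names, the argument-free calls, and the helper function itself are illustrative assumptions, not code taken from this change.

#include <string>
#include <vector>

#include <torch/csrc/jit/mobile/module.h>

// Illustrative only: walks the steps from the summary above on an
// already AOT-prepared mobile::Module.
void quantize_method_sketch(
    torch::jit::mobile::Module& m,
    const std::string& method_name) {
  std::vector<c10::IValue> no_args;

  // 1. Clear any statistics the observers accumulated in earlier runs.
  m.get_method("reset_observers_" + method_name)(no_args);

  // 2. Run the observe method so observers record statistics.
  m.get_method("observe_" + method_name)(no_args);

  // 3. Run the quantize method, which computes qparams and quantizes weights.
  m.get_method("quantize_" + method_name)(no_args);

  // 4. At this point the quantized variant (e.g. "quantized_" + method_name)
  //    would be wired in to replace the original <method_name>.
}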
39 lines
1.3 KiB
C++
#pragma once

#include <c10/macros/Export.h>

#include <string>

namespace torch {
namespace jit {
namespace mobile {
class Module;

namespace quantization {
/*
 * Device-side PTQ API.
 *
 * Once the model has been prepared for quantization on the server side, it is
 * sent to the device. On the device the model is further trained. At the end
 * of training, before the model is readied for inference, we need to quantize
 * the model.
 *
 * Usage of this API is as follows:
 *   PTQQuanizationHelper ptq_helper;
 *   ptq_helper.quantize_dynamic(m, "forward");
 *
 * Args:
 *   m: Captured by reference, an instance of mobile::Module. This module will
 *      be mutated in place to replace its <method_name> method with the
 *      quantized equivalent.
 *   method_name: Name of the method to be quantized. AOT preparation for
 *      quantization must also have been done for this method.
 *
 * Returns:
 *   The in-place mutated `m`, whose size should be smaller due to weight
 *   quantization and whose <method_name> method should use quantized ops.
 */
class TORCH_API PTQQuanizationHelper {
 public:
  PTQQuanizationHelper() = default;
  void quantize_dynamic(
      torch::jit::mobile::Module& m,
      const std::string& method_name);
};
} // namespace quantization
} // namespace mobile
} // namespace jit
} // namespace torch
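For context, a hypothetical end-to-end caller (not part of this header): it assumes a lite-interpreter model file, here called prepared_model.ptl, that was prepared ahead of time for on-device dynamic quantization, plus an illustrative input shape.

#include <vector>

#include <ATen/ATen.h>
#include <torch/csrc/jit/mobile/import.h>
#include <torch/csrc/jit/mobile/module.h>
#include <torch/csrc/jit/mobile/quantization.h>

int main() {
  // Load the AOT-prepared model with the lite interpreter loader.
  // "prepared_model.ptl" is a placeholder path.
  torch::jit::mobile::Module m =
      torch::jit::_load_for_mobile("prepared_model.ptl");

  // Quantize the "forward" method in place. Afterwards the module's weights
  // should be smaller and forward should dispatch to quantized ops.
  torch::jit::mobile::quantization::PTQQuanizationHelper ptq_helper;
  ptq_helper.quantize_dynamic(m, "forward");

  // Run inference with the now-quantized method (input shape is illustrative).
  std::vector<c10::IValue> inputs{at::ones({1, 4})};
  c10::IValue out = m.forward(inputs);
  (void)out;
  return 0;
}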