Summary: The biggest issue was that the constructors for the fake_quantize
classes use custom partials that live in the observer module, so the
module for these needed to be set correctly in the constructor class
method.
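For illustration only, here is a minimal sketch of the kind of fix described above, using hypothetical names (this is not the actual PyTorch code): when a constructor class method builds its return value from a partial that lives in the observer module, the partial's `__module__` is pointed back at the class's own module so public-binding checks attribute it correctly.
```python
from functools import partial

class FakeQuantizeLike:
    """Hypothetical stand-in for a fake_quantize class (illustrative only)."""

    def __init__(self, observer=None, **observer_kwargs):
        self.observer = observer
        self.observer_kwargs = observer_kwargs

    @classmethod
    def with_args(cls, **kwargs):
        # A partial created here would otherwise report the module it was
        # defined in (e.g. the observer module); set __module__ so that
        # public-binding checks attribute the constructor correctly.
        constructor = partial(cls, **kwargs)
        constructor.__module__ = cls.__module__
        return constructor
```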
Test Plan: python test/test_public_bindings.py
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86022
Approved by: https://github.com/jerryzh168
The scheduler updates the levels of sparsity based on https://arxiv.org/abs/1710.01878.
## Implementation
The update rule is defined as:
$$
\begin{aligned}
s_t &= s_f + (s_i - s_f)\left( 1 - \frac{t - t_0}{n\Delta t} \right)^3 \\
\mbox{for} ~ t &\in \left\{ t_0, t_0+\Delta t, \dots, t_0 + n\Delta t \right\} \end{aligned}
$$
There is one minor difference from the original paper: the `initially_zero` argument controls the level of sparsity before step $t_0$. If `False`, the sparsity level before $t_0$ is $s_i$; otherwise it is 0.
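For reference, a minimal sketch of this update rule in Python (illustrative only, not the scheduler's actual code):
```python
def cubic_sparsity_level(t, s_i, s_f, t_0, delta_t, n, initially_zero=False):
    """Sparsity level s_t at step t for the cubic schedule above (illustrative)."""
    if t < t_0:
        # Before the schedule starts: fully dense or already at s_i.
        return 0.0 if initially_zero else s_i
    if t > t_0 + n * delta_t:
        # After the last update step the level stays at the final sparsity.
        return s_f
    return s_f + (s_i - s_f) * (1 - (t - t_0) / (n * delta_t)) ** 3
```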
## Tests
```
python test/test_ao_sparsity.py -- TestCubicScheduler
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85232
Approved by: https://github.com/junesg, https://github.com/jerryzh168
**Summary:** This commit enforces the following constraints on the
QNNPACK BackendConfig:
- `quant_min_lower_bound` = -127 for qint8 weight
- `quant_max_upper_bound` = 127 for qint8 weight
- `scale_min_lower_bound` = 2 ** -12 for qint8 activations and weight
These constraints will enable users to use this BackendConfig with
faster XNNPACK quantized ops. They are also consistent with the
existing settings in `default_symmetric_qnnpack_qconfig` and its
per_channel and QAT variants. For more detail on why these exact
values were chosen, please see the description of
https://github.com/pytorch/pytorch/pull/74396.
Note that there are currently no restrictions on the qscheme in
DTypeConfig. This should be added in the future to further enforce
the restriction that the weights must be quantized with either
per_tensor_symmetric or per_channel_symmetric.
Existing default QConfigs such as `get_default_qconfig("qnnpack")`
and `get_default_qat_qconfig("qnnpack")` will continue to be
supported, but only for the existing dtypes, e.g. quint8 activations
for weighted ops like linear and conv. In the future, we should
revisit whether to enable XNNPACK ops using these QConfigs as well.
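As a rough usage sketch, pairing the constrained BackendConfig with the matching symmetric QConfig might look like the following (a minimal sketch assuming these getters are available in this build, not a prescribed recipe):
```python
import torch
from torch.ao.quantization import QConfigMapping
from torch.ao.quantization.qconfig import default_symmetric_qnnpack_qconfig
from torch.ao.quantization.backend_config import get_qnnpack_backend_config
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

model = torch.nn.Sequential(torch.nn.Linear(16, 16)).eval()
example_inputs = (torch.randn(1, 16),)

qconfig_mapping = QConfigMapping().set_global(default_symmetric_qnnpack_qconfig)
backend_config = get_qnnpack_backend_config()

prepared = prepare_fx(model, qconfig_mapping, example_inputs,
                      backend_config=backend_config)
prepared(*example_inputs)  # calibrate
quantized = convert_fx(prepared, backend_config=backend_config)
```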
**Test Plan:**
python test/test_quantization.py TestQuantizeFx.test_qnnpack_backend_config
**Reviewers:** jerryzh168, vkuzo
**Subscribers:** jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85863
Approved by: https://github.com/jerryzh168
Summary: `per_channel_weight_observer_range_neg_127_to_127` now correctly uses `PerChannelMinMaxObserver` instead of `MinMaxObserver`
Test Plan:
Adds a new test, `quantization.core.test_top_level_apis`, to instantiate and run `forward()` on all default observers
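For illustration, a rough sketch of the kind of check the new test performs; the list of defaults below is illustrative, not exhaustive:
```python
import torch
from torch.ao.quantization import observer

default_observers = [
    observer.default_observer,
    observer.default_per_channel_weight_observer,
    observer.per_channel_weight_observer_range_neg_127_to_127,
]
for factory in default_observers:
    obs = factory()           # each default is a factory wrapping (observer class, args)
    obs(torch.randn(4, 4))    # run forward so the observer records statistics
```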
Differential Revision: D39916482
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85883
Approved by: https://github.com/salilsdesai
**Summary:** This commit adds the following constraints to
BackendConfig:
quant_min_lower_bound
quant_max_upper_bound
scale_min_lower_bound
scale_max_upper_bound
This is motivated by QNNPACK constraints on qint8 weight
values and the min scale value. Actually enforcing these
constraints in the QNNPACK BackendConfig will follow in a
future commit.
Today, users can also specify the above constraints through
QConfigs, and these settings may not necessarily match the
ones specified in the BackendConfig. In this case, we will
handle the discrepancy as follows:
(1) Require QConfig quant ranges to fall within the backend's
(2) Require QConfig min scale value (eps) >= backend's
(3) Require QConfig to specify quant range if the backend
specified one
(4) Require QConfig to specify min scale value (eps) if the
backend specified one
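For example, a QConfig whose settings stay within such backend constraints might look like the following hedged sketch (the bounds are just the QNNPACK-motivated values mentioned above):
```python
import torch
from torch.ao.quantization import QConfig
from torch.ao.quantization.observer import (
    MovingAverageMinMaxObserver,
    MovingAveragePerChannelMinMaxObserver,
)

# Activation: quant range [0, 127] and eps >= 2 ** -12 for quint8
activation = MovingAverageMinMaxObserver.with_args(
    dtype=torch.quint8, quant_min=0, quant_max=127, eps=2 ** -12
)
# Weight: symmetric qint8 with quant range [-127, 127] and eps >= 2 ** -12
weight = MovingAveragePerChannelMinMaxObserver.with_args(
    dtype=torch.qint8,
    qscheme=torch.per_channel_symmetric,
    quant_min=-127,
    quant_max=127,
    eps=2 ** -12,
)
qconfig = QConfig(activation=activation, weight=weight)
```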
Public API changes:
* Previous API, still supported after this commit:
```
dtype_config = DTypeConfig(
input_dtype=torch.quint8,
output_dtype=torch.quint8,
weight_dtype=torch.qint8,
bias_dtype=torch.float,
)
```
* New API:
```
dtype_config = DTypeConfig(
input_dtype=DTypeWithConstraints(
dtype=torch.quint8,
quant_min_lower_bound=0,
quant_max_upper_bound=127,
scale_min_lower_bound=2 ** -12,
),
output_dtype=DTypeWithConstraints(
dtype=torch.quint8,
quant_min_lower_bound=0,
quant_max_upper_bound=127,
scale_min_lower_bound=2 ** -12,
),
weight_dtype=DTypeWithConstraints(
dtype=torch.qint8,
quant_min_lower_bound=-128,
quant_max_upper_bound=127,
scale_min_lower_bound=2 ** -12,
),
bias_dtype=torch.float,
)
```
* Additionally, the following `DTypeConfig` attributes
have new types with helper getters:
```
# These have type DTypeWithConstraints
dtype_config.input_dtype
dtype_config.output_dtype
dtype_config.weight_dtype
# These return Optional[torch.dtype]
dtype_config.get_input_dtype()
dtype_config.get_output_dtype()
dtype_config.get_weight_dtype()
```
Note that scale_max is currently not used because there is
no existing mechanism to enforce this on the observer. In the
future, we can validate this as well if there is a use case.
**Test Plan:**
python test/test_quantization.py TestBackendConfig.test_dtype_with_constraints
python test/test_quantization.py TestQuantizeFx.test_backend_config_scale_min
python test/test_quantization.py TestQuantizeFx.test_backend_config_quantization_range
**Reviewers:** jerryzh168, vkuzo
**Subscribers:** jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85200
Approved by: https://github.com/jerryzh168
Summary: This commit adds the initial BackendConfig for backends
PyTorch lowers to through the Executorch stack. This initial
version is only intended to cover the following set of ops:
quantized::linear_dynamic,
quantized::add,
quantized::batch_norm2d,
quantized::conv2d.new,
quantized::linear,
quantized::conv2d_relu.new,
aten::relu_,
aten::_adaptive_avg_pool2d,
aten::_reshape_alias_copy,
aten::squeeze.dim,
aten::permute
For now, the `BackendPatternConfig` for each of these ops is
the same as the ones for the corresponding ops in the FBGEMM
`BackendConfig`, though this may change in the future.
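As a rough sketch of how the new config might be inspected (the getter name below is an assumption about how it is exposed):
```python
from torch.ao.quantization.backend_config import get_executorch_backend_config

backend_config = get_executorch_backend_config()
# List which patterns the config currently covers, e.g. to check lowering coverage
for pattern_config in backend_config.configs:
    print(pattern_config.pattern)
```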
Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85527
Approved by: https://github.com/jerryzh168
**Summary:** This commit enables the custom module LSTM path for
FX graph mode static quantization. This has the same flow as eager
mode, which was already previously supported:
```
torch.nn.LSTM
| (prepare_fx)
v
torch.ao.nn.quantizable.LSTM
| (convert_fx)
v
torch.ao.nn.quantized.LSTM
```
The main reason why custom module LSTM is not supported in FX
graph mode quantization today is because its inputs and outputs
are nested tuples, and existing constructs such as observers,
"quantize" nodes, and "dequantize" nodes do not understand how
to handle complex structures.
Note that the approach taken in this commit is only intended to
be a short-term solution highly tailored to the input and output
formats of custom module LSTM. In the future, for the longer-term
solution, we should design a more general QConfig that allows users
to specify complex input and output formats, and enable FX graph
mode quantization to understand arbitrary nested structures and
automatically infer how to transform the graph accordingly.
**Context:**
Today, in FX graph mode static quantization, custom modules are
assumed to have quantized inputs and quantized outputs, with the
exact dtypes derived from the associated QConfig (default quint8).
Since custom modules are currently not handled through the reference
model flow, their observer replacement logic is a little different
from that of normal operators:
```
# (1) Original model
input -> custom_module -> output
# (2) Observed model (after prepare)
input -> obs0 -> custom_module -> obs1 -> output
# (3) Quantized model (after convert)
input -> quant -> quantized_custom_module -> dequant -> output
```
In the last step, input observers are replaced with "quantize"
and output observers are replaced with "dequantize", in contrast
to other non-custom-module patterns where observers are replaced
with "quantize-dequantize" pairs instead. Note that, conceptually,
the output observer `obs1` is really just a DeQuantStub, since no
observation is actually needed.
**Custom module LSTM:**
The reason why custom module LSTM cannot be handled in the same
way is because, unlike other custom modules, its inputs and outputs
are nested tuples instead of single tensors. This is how the existing
custom module code would try to handle LSTMs:
```
# (1) Original model
# input format: (input, (hidden0, hidden1))
# output format: (output, (hidden0, hidden1))
input -> lstm -> output
hidden0 -/ \-> hidden0
hidden1 -/ \-> hidden1
# (2) Observed model (after prepare)
input -> obs0 -> lstm -> obs1 # fails
hidden0 -/ # missing observer
hidden1 -/ # missing observer
```
However, this fails today because 1) we assume there is only one input
to the custom module, and so we never end up quantizing `hidden0` and
`hidden1`, and 2) the output observer `obs1` is fed a tuple, which it
does not understand how to handle.
**Short-term fix:**
This commit addresses the above by specifically handling the input
and output structures used by custom module LSTM. For the inputs,
we manually insert observers for `hidden0` and `hidden1` to ensure
all input tensors are quantized.
For the outputs, we split the tuple into its internal nodes, attach
a DeQuantStub to each node, and recombine these DeQuantStubs
according to the original structure. Finally, we must also reroute
consumers of the original LSTM tuple (and its internal nodes, e.g.
`lstm[0]`) to these DeQuantStubs:
```
# (1) Original model
input -> lstm -> output -> linear0
hidden0 -/ \-> hidden0 -> linear1
hidden1 -/ \-> hidden1 -> linear2
# (2) Observed model (after prepare)
input -> obs0 -> lstm -> output -> dqstub -> linear0 -> obs3
hidden0 -> obs1 -/ \-> hidden0 -> dqstub -> linear1 -> obs4
hidden1 -> obs2 -/ \-> hidden1 -> dqstub -> linear2 -> obs5
# (3) Reference model (after convert)
input -> quant -> qlstm -> output -> dequant -> linear0 -> quant -> dequant
hidden0 -> quant -/ \-> hidden0 -> dequant -> linear1 -> quant -> dequant
hidden1 -> quant -/ \-> hidden1 -> dequant -> linear2 -> quant -> dequant
# (4) Quantized model (after lowering)
input -> quant -> qlstm -> output -> quantized_linear0 -> dequant
hidden0 -> quant -/ \-> hidden0 -> quantized_linear1 -> dequant
hidden1 -> quant -/ \-> hidden1 -> quantized_linear2 -> dequant
```
Note that we choose to insert DeQuantStubs here instead of observers
because these will ultimately be replaced by "dequantize" nodes. This
matches the general custom module behavior, where output observers
are replaced only with "dequantize" nodes (as opposed to the normal
"quantize-dequantize" pair), since custom module outputs are assumed
to already be quantized. Using DeQuantStubs instead of observers also
simplifies the "dequantize" insertion logic. In the future, we should use
DeQuantStubs in place of output observers for custom modules in general.
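As a rough usage sketch, the flow above might be driven as follows; the class names come from the diagram earlier in this summary, while the exact custom-config calls are an assumption rather than the test's actual code:
```python
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.fx.custom_config import ConvertCustomConfig, PrepareCustomConfig
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = torch.nn.LSTM(8, 8)

    def forward(self, x, hidden):
        return self.lstm(x, hidden)

model = M().eval()
example_inputs = (torch.randn(5, 2, 8), (torch.randn(1, 2, 8), torch.randn(1, 2, 8)))

prepare_config = PrepareCustomConfig().set_float_to_observed_mapping(
    torch.nn.LSTM, torch.ao.nn.quantizable.LSTM)
convert_config = ConvertCustomConfig().set_observed_to_quantized_mapping(
    torch.ao.nn.quantizable.LSTM, torch.ao.nn.quantized.LSTM)

prepared = prepare_fx(model, get_default_qconfig_mapping(), example_inputs,
                      prepare_custom_config=prepare_config)
prepared(*example_inputs)  # calibrate
quantized = convert_fx(prepared, convert_custom_config=convert_config)
```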
**Test plan:**
python test/test_quantization.py TestQuantizeFx.test_static_lstm
python test/test_quantization.py TestQuantizeFx.test_static_lstm_consume_tuple
**Reviewers:** jerryzh168, vkuzo
**Subscribers:** jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85068
Approved by: https://github.com/jerryzh168
The `WeightNormSparsifier` currently only supports the L2-norm. This change allows users to specify the function that is applied to compute the norm. In addition, the L1-norm is added as an `.abs`-based function.
## Implementation details
- The functions referred to as "norms" are not strictly norms. For example, the L2-norm of `x` is computed as `F.avg_pool(x * x, ...)`. Similarly, the L1-norm of `x` is computed as `F.avg_pool(x.abs(), ...)`.
- When passing callable functions for the norm, the above assumption must hold: `F.avg_pool(norm_fn(x), ...)` will be applied.
## Example:
```python
>>> # L3-norm
>>> l3 = lambda T: T * T * T
>>> sparsifier = WeightNormSparsifier(norm=l3)
>>>
>>> # L0-norm
>>> l0 = lambda T: torch.logical_or(torch.zeros(T.shape), T != 0).to(T.dtype)
>>> sparsifier = WeightNormSparsifier(norm=l0)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85236
Approved by: https://github.com/jcaip
Summary:
This is a developer-oriented design doc/README for FX Graph Mode Quantization. The goal is for new developers of
FX Graph Mode Quantization to get familiar with its high-level algorithm and ramp up quickly
Test Plan:
no test needed
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85070
Approved by: https://github.com/vkuzo
Summary:
Before this PR, the `dtype` attribute of observers was not clearly
defined. It originally meant `interface_dtype` in the eager mode
workflow, which is how the codebase before this PR was using it.
In the new reference model spec, `dtype` attribute of an observer
represents the `dtype` value which needs to be passed into a `quantize`
function in the reference model spec. This PR aligns the codebase
to this definition of dtype. In detail:
1. change util functions to interpret `dtype` using the reference model definition
2. change `prepare` to interpret `dtype` using the reference model definition
3. change observers for dynamic quantization to interpret `dtype` using the reference
model definition.
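A small hedged illustration of what the reference model definition means in practice (not code from this PR): the observer's `dtype` is the dtype handed to the quantize call.
```python
import torch

obs = torch.ao.quantization.MinMaxObserver(dtype=torch.quint8)
x = torch.randn(4, 4)
obs(x)  # collect min/max statistics
scale, zero_point = obs.calculate_qparams()

# Under the reference model definition, obs.dtype is what gets passed to quantize:
xq = torch.quantize_per_tensor(x, float(scale), int(zero_point), obs.dtype)
xdq = xq.dequantize()
```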
A future PR (left out of this one to keep LOC small) will deprecate the
`compute_dtype` field and instead expose `is_dynamic` on observers.
"
Test plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Differential Revision: [D39675209](https://our.internmc.facebook.com/intern/diff/D39675209)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85345
Approved by: https://github.com/z-a-f, https://github.com/jerryzh168
Summary:
- `remove_quant_dequant_pairs` removes ops when a `quant` is followed by a `dequant`
- It looks like the quantized implementation of `layer_norm` only supports float weights, so the default qconfig was updated to avoid quantizing the weight param.
- Fixes the broken test `test_norm_weight_bias`. This was the only test that broke, because the default qconfig dict we pass in quantizes the weight. I just pulled the native qconfig object and converted it to a dict.
- Adds qconfig and backend config support for layernorm
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
```
Reviewers:
Subscribers:
Tasks: Fixes https://github.com/pytorch/pytorch/issues/83110
Tags: quant, fx
Differential Revision: [D39395141](https://our.internmc.facebook.com/intern/diff/D39395141)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84203
Approved by: https://github.com/jerryzh168
Summary:
Adds `extra_repr` to `HistogramObserver`. This is useful when debugging
PTQ models because it allows one to quickly check whether a `HistogramObserver`
has received data or not.
Test plan:
```
>>> import torch
>>> obs = torch.ao.quantization.HistogramObserver()
>>> obs(torch.randn(1, 3, 224, 224))
...
>>> print(obs)
// before - hard to tell if observer has seen data
HistogramObserver()
// after
HistogramObserver(min_val=-4.778339862823486, max_val=4.311892986297607)
>>>
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84760
Approved by: https://github.com/andrewor14
Summary:
Some more clarifications for the arguments, including linking to object docs (QConfigMapping, BackendConfig) and adding types
in the doc
Test Plan:
```
cd docs
make html
```
and
visual inspection for the generated docs
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84587
Approved by: https://github.com/vkuzo
Some of the subpackages were not included in `torch.nn.quantized`.
That would cause some specific cases to fail.
For example, `from torch.nn.quantized import dynamic` would work,
but `import torch; torch.nn.quantized.dynamic` would fail.
Fixes #ISSUE_NUMBER
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84141
Approved by: https://github.com/andrewor14
Summary:
- Finishes the second part of https://github.com/pytorch/pytorch/pull/83263
- Removes WEIGHT_INDEX_DICT and BIAS_INDEX_DICT from utils.py
- Moves two functions, `node_arg_is_weight` and `node_arg_is_bias`, from prepare.py into utils.py;
  convert.py and _equalize.py now use `node_arg_is_weight` instead of the dictionaries
- Adds quantization support for `F.groupnorm`
- Adds missing BackendPatternConfigs for layernorm, instancenorm, and groupnorm
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Reviewers:
Subscribers:
Tasks:
Tags:
ghstack-source-id: 2b157e0dc4f1553be1f4813b4693db952e6fc558
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83848
Fixes #83093
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83853
Approved by: https://github.com/jerryzh168, https://github.com/andrewor14
Summary:
After inserting quant dequant nodes in the graph, we need
1. Insert packed param creation and quantized op
2. Create packed_params attribute in the top module. For this we need the
graph to be inlined, except for calculate_qparams method calls. But those
can be inlined too, so perhaps we need to make sure no other call methods
exist.
3. Insert SetAttr for the packed param
4. Insert GetAttr for the packed param
5. Use GetAttr output for quantized op where applicable, e.g.
linear_dynamic
The above is added to the quantize_<method-name> method created in the previous
step. Once the above steps are done, the method is cloned into
quantized_<method-name>.
Modify quantize_<method-name>:
1. Remove all outputs from the method.
2. Run dce
3. Remove all inputs from the method except self.
Modify quantized_<method-name>:
1. Remove all packed_param setAttr nodes.
2. Run dce.
This should result in removal of all nodes that generate packed param.
Test Plan: To be written
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: [D38771416](https://our.internmc.facebook.com/intern/diff/D38771416)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83571
Approved by: https://github.com/jerryzh168
Summary:
This diff adds a way to:
- clone previously observed method
- Add calls to observer's calculate_qparams methods
- Extract the scale and zero point
- Use them to insert quant dequant nodes
Now, for the forward method, we have:
- observe_forward
- quantize_forward
observe_forward is used post training to observe statistics. In the
case of dynamic PTQ this requires just running that method once to
update weight observer statistics.
The quantize_forward method will use the observer statistics to calculate
quantization parameters and apply them to the quant/dequant ops.
Subsequent diffs will replace dequant + op with their quantized op
counterparts and replace quantize ops with the relevant packed params class
where possible.
Test Plan:
To be written
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: [D38771419](https://our.internmc.facebook.com/intern/diff/D38771419)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83570
Approved by: https://github.com/jerryzh168
Summary:
To support on-device quantization, this diff introduces observer
insertion. Specifically, observers are inserted by adding a new method with
the prefix observ_.
The intent is that, post training, this method will be run to record
statistics.
Test Plan:
test_ondevice_quantization.py
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: [D38771417](https://our.internmc.facebook.com/intern/diff/D38771417)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83568
Approved by: https://github.com/jerryzh168
Context: In order to avoid the cluttering of the `torch.nn` namespace
the quantized modules namespace is moved to `torch.ao.nn`.
The list of the `nn.quantized` files that are being migrated:
- [X] `torch.nn.quantized` → `torch.ao.nn.quantized`
- [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional`
- [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules`
- [X] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic`
- [X] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference`
- [X] `torch.nn.quantizable` → `torch.ao.nn.quantizable`
- [X] [Current PR] `torch.nn.qat` → `torch.ao.nn.qat`
- [X] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules`
- [X] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic`
- [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic`
- [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules`
- [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat`
- [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized`
- [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules`
- [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic`
Majority of the files are just moved to the new location.
However, specific files need to be double checked:
- None
Differential Revision: [D36861197](https://our.internmc.facebook.com/intern/diff/D36861197/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36861197/)!
Differential Revision: [D36861197](https://our.internmc.facebook.com/intern/diff/D36861197)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78716
Approved by: https://github.com/jerryzh168
Context: In order to avoid the cluttering of the `torch.nn` namespace
the quantized modules namespace is moved to `torch.ao.nn`.
The list of the `nn.quantized` files that are being migrated:
- [X] `torch.nn.quantized` → `torch.ao.nn.quantized`
- [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional`
- [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules`
- [X] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic`
- [X] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference`
- [X] [Current PR] `torch.nn.quantizable` → `torch.ao.nn.quantizable`
- [ ] `torch.nn.qat` → `torch.ao.nn.qat`
- [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules`
- [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic`
- [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic`
- [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules`
- [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat`
- [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized`
- [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules`
- [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic`
Majority of the files are just moved to the new location.
However, specific files need to be double checked:
- `torch/ao/nn/__init__.py` → Changing the imports to lazy.
Differential Revision: [D36861090](https://our.internmc.facebook.com/intern/diff/D36861090/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36861090/)!
Differential Revision: [D36861090](https://our.internmc.facebook.com/intern/diff/D36861090)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78717
Approved by: https://github.com/jerryzh168
Context: In order to avoid the cluttering of the `torch.nn` namespace
the quantized modules namespace is moved to `torch.ao.nn`.
The list of the `nn.quantized` files that are being migrated:
- [ ] `torch.nn.quantized` → `torch.ao.nn.quantized`
- [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional`
- [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules`
- [X] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic`
- [X] [Current PR] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference`
- [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable`
- [ ] `torch.nn.qat` → `torch.ao.nn.qat`
- [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules`
- [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic`
- [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic`
- [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules`
- [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat`
- [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized`
- [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules`
- [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic`
Majority of the files are just moved to the new location.
However, specific files need to be double checked:
- None
Differential Revision: [D36860927](https://our.internmc.facebook.com/intern/diff/D36860927/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36860927/)!
Differential Revision: [D36860927](https://our.internmc.facebook.com/intern/diff/D36860927)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78715
Approved by: https://github.com/jerryzh168
Context: In order to avoid the cluttering of the `torch.nn` namespace
the quantized modules namespace is moved to `torch.ao.nn`.
The list of the `nn.quantized` files that are being migrated:
- [ ] `torch.nn.quantized` → `torch.ao.nn.quantized`
- [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional`
- [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules`
- [X] [Current PR] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic`
- [ ] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference`
- [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable`
- [ ] `torch.nn.qat` → `torch.ao.nn.qat`
- [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules`
- [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic`
- [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic`
- [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules`
- [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat`
- [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized`
- [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules`
- [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic`
Majority of the files are just moved to the new location.
However, specific files need to be double checked:
- [Documentation](docs/source/quantization-support.rst) @vkuzo
- [Public API test list](test/allowlist_for_publicAPI.json) @peterbell10
- [BC test](test/quantization/bc/test_backward_compatibility.py) @vkuzo
- [IR emitter](torch/csrc/jit/frontend/ir_emitter.cpp) @jamesr66a
- [JIT serialization](torch/csrc/jit/serialization/import_source.cpp) @IvanKobzarev @jamesr66a
Differential Revision: [D36860660](https://our.internmc.facebook.com/intern/diff/D36860660/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36860660/)!
Differential Revision: [D36860660](https://our.internmc.facebook.com/intern/diff/D36860660)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78714
Approved by: https://github.com/jerryzh168
Motivation: each quantization observer only supports a limited set of qschemes, and we need to do this check at initialization rather than at run time. For example, for a MinMaxObserver with its qscheme set to **torch.per_channel_affine**, there is currently a runtime error when running the calibration step:
```
AttributeError: 'MinMaxObserver' object has no attribute 'ch_axis'
```
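A hedged sketch of the intended behavior after this change: the unsupported qscheme should now be rejected when the observer is constructed instead of failing later during calibration (the exact exception type is an assumption here):
```python
import torch

try:
    obs = torch.ao.quantization.MinMaxObserver(qscheme=torch.per_channel_affine)
except Exception as e:  # the exact exception type is an assumption
    print(f"Rejected at construction: {type(e).__name__}: {e}")
```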
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80126
Approved by: https://github.com/jerryzh168
Fix use-dict-literal pylint suggestions by changing `dict()` to `{}`. This PR should do the change for every Python file except test/jit/test_list_dict.py, where I think the intent is to test the constructor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83718
Approved by: https://github.com/albanD
Summary:
att, probably missed the op during migration to the reference flow
Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_qmatmul
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83885
Approved by: https://github.com/andrewor14
Summary: Previously we used a single BackendConfig
(get_native_backend_config) for both the FBGEMM and QNNPACK
backends. However, these two backends have subtle differences
in terms of their requirements that cannot be satisfied using
a single BackendConfig. Therefore, this commit is the first step
towards decoupling the two backends. The real change in
functionality will come in a future commit after DTypeConfig
supports quant_min/quant_max and scale_min/scale_max. Existing
uses of `get_native_backend_config` should not be affected.
Public facing changes:
```
from torch.ao.quantization.backend_config import (
get_fbgemm_backend_config,
get_qnnpack_backend_config,
)
fbgemm_backend_config = get_fbgemm_backend_config()
qnnpack_backend_config = get_qnnpack_backend_config()
```
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Reviewers: jerryzh168
Subscribers: jerryzh168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83566
Approved by: https://github.com/jerryzh168
Context: In order to avoid the cluttering of the `torch.nn` namespace
the quantized modules namespace is moved to `torch.ao.nn`.
The list of the `nn.quantized` files that are being migrated:
- [X] `torch.nn.quantized` → `torch.ao.nn.quantized`
- [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional`
- [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules`
- [X] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic`
- [X] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference`
- [X] `torch.nn.quantizable` → `torch.ao.nn.quantizable`
- [X] [Current PR] `torch.nn.qat` → `torch.ao.nn.qat`
- [X] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules`
- [X] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic`
- [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic`
- [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules`
- [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat`
- [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized`
- [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules`
- [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic`
Majority of the files are just moved to the new location.
However, specific files need to be double checked:
- None
Differential Revision: [D36861197](https://our.internmc.facebook.com/intern/diff/D36861197/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36861197/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78716
Approved by: https://github.com/jerryzh168
Context: In order to avoid the cluttering of the `torch.nn` namespace
the quantized modules namespace is moved to `torch.ao.nn`.
The list of the `nn.quantized` files that are being migrated:
- [X] `torch.nn.quantized` → `torch.ao.nn.quantized`
- [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional`
- [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules`
- [X] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic`
- [X] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference`
- [X] [Current PR] `torch.nn.quantizable` → `torch.ao.nn.quantizable`
- [ ] `torch.nn.qat` → `torch.ao.nn.qat`
- [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules`
- [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic`
- [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic`
- [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules`
- [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat`
- [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized`
- [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules`
- [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic`
Majority of the files are just moved to the new location.
However, specific files need to be double checked:
- None
Differential Revision: [D36861090](https://our.internmc.facebook.com/intern/diff/D36861090/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36861090/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78717
Approved by: https://github.com/jerryzh168
Context: In order to avoid the cluttering of the `torch.nn` namespace
the quantized modules namespace is moved to `torch.ao.nn`.
The list of the `nn.quantized` files that are being migrated:
- [ ] `torch.nn.quantized` → `torch.ao.nn.quantized`
- [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional`
- [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules`
- [X] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic`
- [X] [Current PR] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference`
- [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable`
- [ ] `torch.nn.qat` → `torch.ao.nn.qat`
- [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules`
- [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic`
- [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic`
- [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules`
- [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat`
- [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized`
- [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules`
- [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic`
Majority of the files are just moved to the new location.
However, specific files need to be double checked:
- None
Differential Revision: [D36860927](https://our.internmc.facebook.com/intern/diff/D36860927/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36860927/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78715
Approved by: https://github.com/jerryzh168
Context: In order to avoid the cluttering of the `torch.nn` namespace
the quantized modules namespace is moved to `torch.ao.nn`.
The list of the `nn.quantized` files that are being migrated:
- [ ] `torch.nn.quantized` → `torch.ao.nn.quantized`
- [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional`
- [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules`
- [X] [Current PR] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic`
- [ ] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference`
- [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable`
- [ ] `torch.nn.qat` → `torch.ao.nn.qat`
- [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules`
- [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic`
- [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic`
- [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules`
- [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat`
- [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized`
- [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules`
- [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic`
Majority of the files are just moved to the new location.
However, specific files need to be double checked:
- [Documentation](docs/source/quantization-support.rst) @vkuzo
- [Public API test list](test/allowlist_for_publicAPI.json) @peterbell10
- [BC test](test/quantization/bc/test_backward_compatibility.py) @vkuzo
- [IR emitter](torch/csrc/jit/frontend/ir_emitter.cpp) @jamesr66a
- [JIT serialization](torch/csrc/jit/serialization/import_source.cpp) @IvanKobzarev @jamesr66a
Differential Revision: [D36860660](https://our.internmc.facebook.com/intern/diff/D36860660/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36860660/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78714
Approved by: https://github.com/jerryzh168
Summary:
att; it seems more appropriate to name it qconfig_mapping_utils. We probably also want to move
the functions in torch/ao/quantization/qconfig_mapping_utils.py to torch/ao/quantization/fx/qconfig_mapping_utils.py as well.
Test Plan:
python test/test_quantization.py TestQuantizeFx
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83369
Approved by: https://github.com/andrewor14
Summary: This adds the capability to generate a QConfigMapping based on
the suggestions of the ModelReport API for the user to use. The only
dependency of this feature is that the calibration is run before the
generation of the QConfigMapping and there is no dependency on the
report generation other than that the observers cannot be removed before
this is called. This maps module fqns to EqualizationQConfigs instead of regular
QConfigs.
Example Usage (after calibration):
```
quantization_mapping = mod_report.generate_qconfig_mapping()
equalization_mapping = mod_report.generate_equalization_mapping()
prepared_model = quantize_fx.prepare_fx(model, quantization_mapping, example_input, _equalization_config=equalization_mapping)
quantized_model = quantize_fx.convert_fx(prepared_model)
```
This was tested by ensuring that the suggestions generated in the QConfigMapping are:
1. Correct according to the set backend and data passed through
2. Able to be prepared and converted as a proper config (is a valid config)
The test for this is a part of the TestFxModelReportClass test suite.
Test Plan: python test/test_quantization.py TestFxModelReportClass.test_equalization_mapping_generation
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83698
Approved by: https://github.com/jerryzh168
Summary:
Now we have a separate file to define BackendConfig related classes, we can move ObservationType to that file as well
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83368
Approved by: https://github.com/andrewor14
Summary: This adds the capability to generate a QConfigMapping based on
the suggestions of the ModelReport API for the user to use. The only
dependency of this feature is that the calibration is run before the
generation of the QConfigMapping and there is no dependency on the
report generation other than that the observers cannot be removed before
this is called.
Example Usage (after calibration):
```
mapping = mod_report.generate_qconfig_mapping()
prepared_model = quantize_fx.prepare_fx(model, mapping, example_input)
quantized_model = quantize_fx.convert_fx(prepared_model)
```
This was tested by ensuring that the suggestions generated in the
QConfigMapping are:
1. Correct according to the set backend and data passed through
2. Able to be prepared and converted as a proper config (is a valid
config)
The test for this is a part of the TestFxModelReportClass test suite.
Test Plan: python test/test_quantization.py TestFxModelReportClass.test_qconfig_mapping_generation
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83688
Approved by: https://github.com/jerryzh168
Context: In order to avoid the cluttering of the `torch.nn` namespace
the quantized modules namespace is moved to `torch.ao.nn`.
The list of the `nn.quantized` files that are being migrated:
- [ ] `torch.nn.quantized` → `torch.ao.nn.quantized`
- [X] [Current PR] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional`
- [ ] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules`
- [ ] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic`
- [ ] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference`
- [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable`
- [ ] `torch.nn.qat` → `torch.ao.nn.qat`
- [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules`
- [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic`
- [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic`
- [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules`
- [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat`
- [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized`
- [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules`
- [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic`
Majority of the files are just moved to the new location.
However, specific files need to be double checked:
- [Documentation](docs/source/quantization-support.rst) @vkuzo
- [Public API test list](test/allowlist_for_publicAPI.json) @peterbell10
Differential Revision: [D36792967](https://our.internmc.facebook.com/intern/diff/D36792967/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36792967/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78712
Approved by: https://github.com/jerryzh168
Summary:
CommonQuantizeHandler was added previously to make some of the refactoring toward the reference quantized model flow easier. Now that we have
fully migrated to the reference quantized model flow, it is no longer needed, so we can remove it.
Also updated some comments.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83360
Approved by: https://github.com/andrewor14
Summary:
This change adds input_type_to_index mappings to the backend patterns for `nn.functional.linear`, `nn.functional.conv1d`, `nn.functional.conv2d`, and `nn.functional.conv3d`.
This lets us remove `WEIGHT_INDEX_DICT` and `BIAS_INDEX_DICT` from `prepare.py`.
Instead, we pass around `backend_config` and check whether an arg is a weight/bias against that config, as sketched below.
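A hedged sketch of what such a mapping might look like on a backend pattern; the private setter name is an assumption about this version's API, and the indices mirror the positions in `F.linear(input, weight, bias)`:
```python
import torch
from torch.ao.quantization.backend_config import BackendPatternConfig

# weight is arg index 1 and bias is arg index 2 for F.linear(input, weight, bias)
linear_config = BackendPatternConfig(torch.nn.functional.linear) \
    ._set_input_type_to_index({"weight": 1, "bias": 2})
```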
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Reviewers:
@andrewor14
Subscribers:
Tasks:
Tags: quant, fx
Differential Revision: [D38705516](https://our.internmc.facebook.com/intern/diff/D38705516)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83263
Approved by: https://github.com/andrewor14
This is a new version of #15648 based on the latest master branch.
Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR.
In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I marked as skip are causing segfaults. Running xdoctest results in 293 failed, 201 passed tests. The next commits will be to disable those tests. (unfortunately I don't have a tool that will insert the `#xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.)
Fixes https://github.com/pytorch/pytorch/issues/71105
@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797
Approved by: https://github.com/ezyang
Summary: This creates the framework in the ModelReport API for the
generation of QConfigs by the ModelReport instance based on suggestions.
This functionality will eventually be added into the report generation
or be something that complements it, but for now it will be an
independent call for API stability and to be able to better modularize
the features as it stabilizes.
This also adds the framework for the relevant test function and a note
in the README on what future changes are planned for this new method in
the ModelReport API.
Test Plan: python test/test_quantization.py TestFxModelReportClass.test_qconfig_generation
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83091
Approved by: https://github.com/HDCharles
Summary: I added to the README some additional tasks to further improve
the ModelReport API. These are tasks that I will try to
complete in the next few weeks but also can help to provide future
direction later.
Test Plan: No code added
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83088
Approved by: https://github.com/andrewor14
Implementation of `post_training_sparse_quantize` that takes in a model
and applies sparsification and quantization to only `embeddings` & `embeddingbags`.
The quantization step can happen before or after sparsification depending on the `sparsify_first` argument.
Test Plan:
```python test/test_ao_sparsity.py TestQuantizationUtils```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82759
Approved by: https://github.com/z-a-f
Summary: Before, the line plot for the ModelReportVisualizer used to
plot a different line for each channel. However, for models that have a
lot of channels, this can get really hard to read and parse and doesn't
provide much valuable information.
Now, we just have a single value per module that is the average of the
500 channels.
We also considered plotting 3 lines (a min line, a max line, and an
average line) but the issue was that large outliers could result in one
of the lines completely messing up the scale and the other two not being
visible. As a result, it made sense to do an average and let the user
use the report data to generate the other two if they wished to do so.
This was tested visually in a ipynb notebook
Test Plan: Tested visually in a ipynb notebook
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82918
Approved by: https://github.com/jerryzh168
Summary: There was an issue with per-channel visualizations in the
ModelReportVisualizer: in specific scenarios in which there were
only per-channel features for a module, it would fail to
get the channel-by-channel info.
After digging through the code, the core reason was a for loop that was
enumerating on the `tensor_table` (tensor level info) even in the
scenario in which we only had per-channel info.
This was fixed, and tested in a Bento to ensure expected functionality.
Test Plan: Tested visually
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82917
Approved by: https://github.com/jerryzh168
Summary: This adds information on how the ModelReportVisualizer
integrates into the ModelReport API into the README file for the
ModelReport folder. It updates the high level usage flow, includes
information on the API, some of the important public methods and what
they do, as well as updates to the folder structure to include the new
`model_report_visualizer.py` file as well as updating the tests section
to highlight that there are high level tests for the
ModelReportVisualizer as well.
There really aren't any direct tests for this since it's just updates to
a README, but the tests for the ModelReportVisualizer are relevant and
were run to make sure table generation was still properly occurring.
Test Plan: python test/test_quantization.py TestFxModelReportVisualizer
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82796
Approved by: https://github.com/jerryzh168
Summary: After working on a tutorial and spending more time
experimenting with the input-weight equalization recommendation feature,
I realized that requiring half of the channels to benefit from
input-weight equalization was too high a threshold, and that it should be a bit more lenient.
Based on the example I played around with in an internal tutorial, I
found that somewhere in the 0.3 - 0.4 threshold made more sense. In the
future, more in-depth testing and experimenting with more models may
help further fine-tune this fraction of channels that would benefit.
Test Plan: python test/test_quantization.py TestFxDetectInputWeightEqualization
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82795
Approved by: https://github.com/jerryzh168
Summary: This fixes a punctuation issue with the Dynamic Static Detector
that was missing a period when suggesting to use a dynamic quantize per
tensor layer.
Quick grammar fix and no other changes to code.
Test Plan: python test/test_quantization.py TestFxModelReportDetectDynamicStatic
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82794
Approved by: https://github.com/jerryzh168
### Summary
Refactors quantization levels visualization function to include alpha qparam in parameters of `float_to_apot` function call (due to `float_to_apot` function update). Also adds additional detail to the documentation for `quant_levels_visualization`.
### Test Plan
Print visualization by calling `quant_levels_visualization` function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82790
Approved by: https://github.com/jerryzh168
Summary:
Add a global custom module support list so that users can specify the modules they want the equalization process to support.
To use this list, import it from the _equalize.py file and append modules to it.
Unittest passed to check global support list:
https://pxl.cl/28RKG
Test Plan: buck1 test mode/dev //on_device_ai/odai/tests/transforms:test_transforms -- --exact 'on_device_ai/odai/tests/transforms:test_transforms - test_custom_support_list (on_device_ai.odai.tests.transforms.test_input_weight_for_turing.TestInputWeight)'
Reviewed By: jerryzh168
Differential Revision: D38264244
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82606
Approved by: https://github.com/HDCharles
### Summary
This PR modifies the bitshift implementation of matrix multiplication for LinearAPoT in `bitshift_mul` to support all input values of k. It also fixes the row/col dimension assignment for the `mat_mul` method.
### Test Plan
Run unit tests with: `python test/quantization/core/experimental/test_linear.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82409
Approved by: https://github.com/dzdang
Summary: The primary issue was that fusion and matching had to be
updated to handle parametrized modules
Test Plan: python test/test_ao_sparsity.py TestFxComposability
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81993
Approved by: https://github.com/jerryzh168
The readme file contains an overview of the base data scheduler.
It consists of code snippets and instructions on how to create your own custom
data scheduler and how to use it while training a model.
Test Plan: None
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82131
Approved by: https://github.com/z-a-f
Summary: sparse_prepare automatically composes with quantized prepare
even in cases with fusion. However, the convert step needed to be updated to handle parametrized
modules.
Test Plan: python test/test_ao_sparsity.py TestFxComposability
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81992
Approved by: https://github.com/jerryzh168
The README contains introduction and details on the activation sparsifier. It also contains
code snippets and examples on using the activation sparsifier.
Test Plan: None
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81814
Approved by: https://github.com/z-a-f
The readme file contains an overview of the base data sparsifier and its implementation details.
It also consists of code snippets and instructions on how to create your own custom
data sparsifier.
Test Plan: None
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82130
Approved by: https://github.com/z-a-f
Bug: The config and mask were being recreated while replacing data on the data sparsifier.
Fix: Introduced an argument `reuse_mask` which, when set to `True`, reuses the old mask. If a new config is not
specified, the data sparsifier by default uses the old config with the new data.
Also, added unit tests to check this bug.
Test Plan:
```python test/test_ao_sparsity.py TestBaseDataSparsifier```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82129
Approved by: https://github.com/z-a-f
The stored mask is dumped as `torch.sparse_coo` while serializing. While restoring the state,
the mask is converted to a dense tensor again.
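The round trip is essentially the standard sparse-COO conversion, sketched below for illustration (not the serializer's actual code):
```python
import torch

mask = (torch.randn(4, 4) > 0).float()
packed = mask.to_sparse()      # dumped as a torch.sparse_coo tensor while serializing
restored = packed.to_dense()   # converted back to a dense tensor when restoring state
assert torch.equal(mask, restored)
```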
Test Plan:
```python test/test_ao_sparsity.py TestActivationSparsifier```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82181
Approved by: https://github.com/z-a-f
The stored mask is dumped as `torch.sparse_coo` while serializing. While restoring the state,
the mask is converted to a dense tensor again.
Test Plan:
```python test/test_ao_sparsity.py TestBaseDataSparsifier```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82180
Approved by: https://github.com/z-a-f
### Summary
Modify APoT dequantize method to correctly add dequantized values to result numpy array and retain original tensor dimensions
### Test Plan
Run unit tests with: `python test/quantization/core/experimental/test_quantizer.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82126
Approved by: https://github.com/HDCharles
Summary: This adds the capability to visualize the histogram in the
ModelReportVisualizer. You can visualize the histogram of a single
feature for a single layer (for example, if you want to see the
distribution of some data across all channels), or for some feature
across multiple layers of a similar kind. All channel data is merged
together to plot one large distribution. The user gets to decide the
number of bins the histogram has and it will create those many equally
spaced bins.
Expected Usage
```
mod_rep_visualizer.generate_histogram_visualization(<feature_name>,<module_name>)
```
You can also filter the modules so that only modules with a certain
substring will have their features represented in the plot.
> **This is intended to be used in a `.ipynb` style notebook**
The tests for this were just visual inspection for two reasons:
1.) This method does not return anything, it just generates the
visualization plot
2.) All the data to create the plot visualization is gotten from
`generate_filtered_tables` which is already tested, so testing all that
for this again would be redundant.
Example Image outputs are pasted below in the PR thread.
Test Plan: Visual Test
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81975
Approved by: https://github.com/jerryzh168
Summary: This adds the capability to visualize the line plot in the
ModelReportVisualizer. You can visualize line plots of a single feature,
and this feature can either be a per-tensor or per-channel feature. If
the feature is per tensor, then the idx of the module is plotted on the
x axis and the values of the feature on the y axis. If the feature is per
channel, then one (the first) idx of the module will be the value on
the x axis and the corresponding feature value on the y axis, and there
will be a separate line for each channel, and a legend denoting which
line belongs to which channel.
Expected Usage
```
mod_rep_visualizer.generate_plot_visualization(<feature_name>)
```
You can also filter the modules so that only modules with a certain
substring will have their features represented in the plot.
> **This is intended to be used in a `.ipynb` style notebook**
The tests for this were just visual inspection for two reasons:
1.) This method does not return anything, it just generates the
visualization plot
2.) All the data to create the plot visualization is gotten from
`generate_filtered_tables` which is already tested, so testing all that
for this again would be redundant.
Example Image outputs are pasted below in the PR thread.
Test Plan: Visual Test
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81974
Approved by: https://github.com/jerryzh168
Summary: This adds the capability to visualize the table of information
in the ModelReportVisualizer. This allows the user to filter based on
module name pattern match or feature name pattern match and the
implemented method `generate_table_visualization` prints out the table
in a string format that is easy to parse.
Expected Usage
```
mod_rep_visualizer.generate_table_visualization()
```
Can also pass in optional filters as well if needed.
The tests for this were just visual inspection for two reasons:
1.) This method does not return anything, it just generates the
visualization
2.) All the data to create the table visualization is gotten from
`generate_filtered_tables` which is already tested, so testing all that
for this again would be redundant.
Example Printed Output
```
Tensor Level Information
idx layer_fqn input_activation_global_max input_activation_global_min input_weight_channel_axis input_weight_threshold outlier_detection_channel_axis outlier_detection_ratio_threshold outlier_detection_reference_percentile weight_global_max weight_global_min
----- ------------- ----------------------------- ----------------------------- --------------------------- ------------------------ -------------------------------- ----------------------------------- ---------------------------------------- ------------------- -------------------
1 block1.linear 1.9543 -1.33414 1 0.5 1 3.5 0.95 0.380521 -0.568476
2 block2.linear 1.81486 0 1 0.5 1 3.5 0.95 0.521438 -0.0256195
Channel Level Information
idx layer_fqn channel constant_batch_counts input_activation_per_channel_max input_activation_per_channel_min input_weight_channel_comparison_metrics input_weight_equalization_recommended outlier_detection_batches_used outlier_detection_is_sufficient_batches outlier_detection_percentile_ratios outliers_detected weight_per_channel_max weight_per_channel_min
----- ------------- --------- ----------------------- ---------------------------------- ---------------------------------- ----------------------------------------- --------------------------------------- -------------------------------- ----------------------------------------- ------------------------------------- ------------------- ------------------------ ------------------------
1 block1.linear 0 0 1.9543 -1.33414 0.956912 True 1 True 1.77489 False 0.300502 -0.568476
2 block1.linear 1 0 1.14313 -0.756184 1.04378 True 1 True 2.07887 False 0.336131 -0.261025
3 block1.linear 2 0 0.653274 -0.937748 1.10837 True 1 True 1.00712 False 0.380521 -0.183536
4 block2.linear 0 0 1.81486 0 0.542731 True 1 True 1.78714 False 0.13552 -0.0256195
5 block2.linear 1 0 1.72578 0 0.505475 True 1 True 1.40475 False 0.485536 0.352621
6 block2.linear 2 0 1.7284 0 0.909304 True 1 True 1.40392 False 0.521438 0.0906605
```
Test Plan: Visual Test
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81973
Approved by: https://github.com/jerryzh168
Summary: This adds the ability to generate and display the collected
statistics in a table format for the ModelReportVisualizer. The output
of this is a dictionary containing two keys, mapping to a tensor stats
table and channel stats table respectively.
The two ways you can filter are by module_fqn, which only includes modules
whose name contains the `module_fqn_filter` substring, or by feature, which only includes
features that contain the `feature_filter` substring.
Expected Use:
```
table_dict = mod_rep_visualizer.generate_filtered_tables()
tensor_table = table_dict[ModelReportVisualizer.TABLE_TENSOR_KEY]
channel_table = table_dict[ModelReportVisualizer.TABLE_CHANNEL_KEY]
```
Headers for the Tensor level info:
```
idx layer_fqn feature_1 feature_2 feature_3 .... feature_n
---- --------- --------- --------- --------- ---------
```
Headers for the channel level info:
```
idx layer_fqn channel feature_1 feature_2 feature_3 .... feature_n
---- --------- ------- --------- --------- --------- ---------
```
The reason we split this up into two tables is that with a design
where everything is in one table, it is ambiguous and easy to mix up
whether a tensor level stat is actually a tensor level stat or a
per channel stat, since we would have a row for each channel.
Also changed some of the framework to abstract out the finding of the
tables to the actual visualization to make the API much easier for the
user to digest and parse.
Test Plan: python test/test_quantization.py TestFxModelReportVisualizer.test_generate_table
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81673
Approved by: https://github.com/jerryzh168
The README contains the results of the benchmarking exercise and areas of future work.
It also contains instructions to run the benchmarking scripts to reproduce the results,
as well as other information such as requirements and machine configuration.
Test Plan: None
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81781
Approved by: https://github.com/z-a-f
The objective is to check if introducing torch sparse coo in the sparse dlrm model improves the inference time
over different sparsity levels.
The ```evaluate_forward_time.py``` script makes use of the ```sparse_model_metadata.csv``` file dumped by
```evaluate_disk_savings.py```. It records the forward time of the sparse dlrm model with and without
sparse coo tensors and dumps the results into a csv file, ```dlrm_forward_time_info.csv```.
**Results**: The dlrm model with sparse coo tensor is slower (roughly 2x).
After running `evaluate_disk_savings.py`, run: `python evaluate_forward_time.py --raw_data_file=<path_to_raw_data_txt_file> --processed_data_file=<path_to_kaggleAdDisplayChallenge_processed.npz> --sparse_model_metadata=<path_to_sparse_model_metadata_csv>`
Dependencies: DLRM Repository (https://github.com/facebookresearch/dlrm)
Test Plan: None
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81780
Approved by: https://github.com/z-a-f
The objective is to perform evaluation of the model quality after sparsifying the embeddings of the dlrm model.
The ```evaluate_model_metrics.py``` script makes use of the ```sparse_model_metadata.csv``` file dumped by
```evaluate_disk_savings.py```. Model metrics such as accuracy, AUC, and F1 are calculated on the test dataset
for various sparsity levels, block shapes and norms available on the metadata csv file.
**Results**: The model accuracy decreases slowly with sparsity levels. Even at 90% sparsity levels, the model accuracy decreases only by 2%.
After running `evaluate_disk_savings.py`, run: `python evaluate_model_metrics.py --raw_data_file=<path_to_raw_data_txt_file> --processed_data_file=<path_to_kaggleAdDisplayChallenge_processed.npz> --sparse_model_metadata=<path_to_sparse_model_metadata_csv>`
Dependencies: DLRM Repository (https://github.com/facebookresearch/dlrm)
Test Plan: None
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81779
Approved by: https://github.com/z-a-f
The objective is to sparsify the embeddings of the dlrm model and observe the disk savings.
The model is sparsified and dumped to disk and then zipped.
The embeddings are pruned to different sparsity levels (0.0 - 1.0), for multiple block shapes ((1,1) and (1,4))
and optimization functions (L1, L2).
A user trying to reproduce the results is required to clone the dlrm repository and copy the files to the dlrm directory.
Then train the dlrm model as per the instructions on the GitHub page and run this script.
**Results**: Introducing sparsity in the embeddings reduces file size after compression. The compressed model size goes
down from 1.9 GB to 150 MB after 100% sparsity.
Dependencies: DLRM Repository (https://github.com/facebookresearch/dlrm)
After Setup, Run: `python evaluate_disk_savings.py --model_path=<path_to_model_checkpoint> --sparsified_model_dump_path=<path_to_dump_sparsified_models>`
Test Plan: None
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81778
Approved by: https://github.com/z-a-f
Summary: Following https://github.com/pytorch/pytorch/pull/78452
and https://github.com/pytorch/pytorch/pull/79066, this commit
is part 1 of the broader effort to replace `backend_config_dict`
with a python config object, a more formal and robust API that
leads to better user experience. Note that there is no change in
behavior in this commit by itself. A future commit (part 2) will
replace all existing usages of `backend_config_dict` with the
`BackendConfig` object added in this commit.
Test Plan:
python test/test_quantization.py TestBackendConfig
Reviewers: jerryzh168
Subscribers: jerryzh168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81469
Approved by: https://github.com/jerryzh168
Implemented dumping and loading of state_dicts along with `__getstate__` and `__setstate__` functions.
The hook and layer are removed from the data_groups dictionary before serializing.
In the future, functions may have to be treated differently before serializing; currently, they are
treated the same as other types while serializing.
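A minimal sketch of that serialization approach, assuming a toy container (the attribute and key names are illustrative, not the sparsifier's actual fields):
```
import copy

class ActivationSparsifierSketch:
    """Toy container mirroring the idea above, not the real class."""

    def __init__(self):
        # name -> {'layer': nn.Module, 'hook': handle, 'mask': Tensor, ...}
        self.data_groups = {}
        self.defaults = {}

    def __getstate__(self):
        # Drop the hook and layer entries before serializing; they are
        # re-registered separately after loading.
        groups = {
            name: {k: v for k, v in cfg.items() if k not in ('layer', 'hook')}
            for name, cfg in self.data_groups.items()
        }
        return {'defaults': self.defaults, 'data_groups': copy.deepcopy(groups)}

    def __setstate__(self, state):
        self.__dict__.update(state)
```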
Test Plan:
```python test/test_ao_sparsity.py TestActivationSparsifier```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80890
Approved by: https://github.com/z-a-f
Unregisters the aggregate hook that was applied earlier and registers sparsification hooks.
The sparsification hook applies the mask to the activations before they are fed into the
attached layer.
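As a hedged illustration of how such a pre-forward hook could behave (a toy standalone example, not the sparsifier's actual hook):
```
import torch
import torch.nn as nn

layer = nn.Linear(16, 4)
mask = (torch.rand(16) > 0.5).float()        # mask computed earlier for this layer

def sparsification_hook(module, inputs):
    # Apply the mask to the activations before they reach the attached layer.
    return (inputs[0] * mask,) + tuple(inputs[1:])

handle = layer.register_forward_pre_hook(sparsification_hook)
out = layer(torch.randn(2, 16))              # the input is masked before the linear op
handle.remove()
```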
Test Plan:
```python test/test_ao_sparsity.py TestActivationSparsifier```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80889
Approved by: https://github.com/z-a-f
The step() function internally calls the update_mask() function for each layer.
The update_mask() function applies reduce_fn and mask_fn to compute the sparsification mask.
Note:
the reduce_fn and mask_fn are called for each feature/dim over the data.
Test Plan:
```python test/test_ao_sparsity.py TestActivationSparsifier```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80888
Approved by: https://github.com/z-a-f
The register_layer() attaches a pre-forward hook to the layer to aggregate
activations over time. The mask shape is also inferred here.
The get_mask() function returns the computed mask associated with the attached layer.
The mask is
- a torch tensor if features for that layer is None,
- a list of torch tensors, one for each feature, otherwise.
Test Plan:
```python test/test_ao_sparsity.py TestActivationSparsifier```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80887
Approved by: https://github.com/z-a-f
The Activation sparsifier class aims to sparsify/prune activations in a neural
network. The idea is to attach the sparsifier to a layer (or layers) and it
zeroes out the activations based on the mask_fn (or sparsification function)
input by the user.
The mask_fn is applied once all the inputs are aggregated and reduced i.e.
mask = mask_fn(reduce_fn(aggregate_fn(activations)))
Note::
The sparsification mask is computed on the input **before it goes through the attached layer**.
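For concreteness, a minimal sketch of that aggregate -> reduce -> mask composition with toy functions (the specific functions and the 0.1 threshold are illustrative, not defaults of the class):
```
import torch

def aggregate_fn(old, new):
    return old + new                         # accumulate activations over batches

def reduce_fn(agg):
    return agg.mean(dim=0)                   # reduce over the aggregated batch dim

def mask_fn(reduced, threshold=0.1):
    return (reduced.abs() > threshold).float()   # zero out small activations

batches = [torch.randn(8, 16) for _ in range(4)]
agg = batches[0]
for b in batches[1:]:
    agg = aggregate_fn(agg, b)

# mask = mask_fn(reduce_fn(aggregate_fn(activations)))
mask = mask_fn(reduce_fn(agg))
sparsified_input = batches[-1] * mask        # applied before the attached layer
```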
Test Plan:
```python test/test_ao_sparsity.py TestActivationSparsifier```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80886
Approved by: https://github.com/HDCharles
Summary: This updates the DynamicStatic Detector to also provide insight
into whether Conv layers should use dynamic or static quantization.
Before, this was not included because dynamic quantization is currently
not supported for Conv layers. This adds a check for Conv layers, and
if dynamic quantization is recommended, the detector also gives a disclaimer
that it is not currently supported but will be in the future.
Test Plan: python test/test_quantization.py TestFxModelReportDetectDynamicStatic
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81972
Approved by: https://github.com/jerryzh168
Summary: The current implementation of the InputWeightEqualization
detector broke when it was tested on MobileNetV2, and the reason for
this is that it wasn't able to properly handle groups in Conv layers,
and there also had to be some minor reshaping of the weights to handle
this as well.
In addition, the output was correspondingly tuned so that instead of
giving one output for each channel of each layer, it gives a single
suggestion per module, lets the user know how many of the channels
could benefit from input-weight equalization, and suggests it if more
than half of them do.
There was also the realization that the test class didn't do a good job
of testing different dimensions for the batch vs. height vs. width, so
this was updated to be more comprehensive as well.
Test Plan: python test/test_quantization TestFxDetectInputWeightEqualization
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81971
Approved by: https://github.com/jerryzh168
This callback aims to sparsify the model inside the lightning module during training.
**Note that the model is copied and then sparsified, so the existing model is not modified**
The sparsified model can be used for comparison and can be accessed using
<callback_obj>.sparsified
Test Plan:
```python torch/ao/sparsity/_experimental/data_sparsifier/lightning/tests/test_callbacks.py TestTrainingAwareCallback```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80371
Approved by: https://github.com/z-a-f
Lightning callback that enables post-training sparsity.
This callback aims to sparsify the model inside the lightning module after training.
**Note that the model is copied and then sparsified, so the existing model is not modified**
The sparsified model can be used for comparison and can be accessed using <callback_obj>.sparsified
Test Plan:
```python torch/ao/sparsity/_experimental/data_sparsifier/lightning/tests/test_callbacks.py TestPostTrainingCallback```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80370
Approved by: https://github.com/z-a-f
Add prelu op and module for quantized CPU backend.
The PR includes:
- Quantized version of prelu op
- Native prelu kernel for quantized CPU
- Prelu modules in `nn` and `nn.quantized`
- FX support for prelu
- Unit tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73491
Approved by: https://github.com/jerryzh168
Summary: Added the functionality to be able to get the feature names and
module_fqns from the ModelReportVisualizer class. The purpose of this
addition is so that users can see the exact set of module_fqns or
feature names that they can filter based on, and use this information to
perform their filtering.
Test Plan: python test/test_quantization.py
TestFxModelReportVisualizer.test_get_modules_and_features
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81647
Approved by: https://github.com/andrewor14
Summary: We created a ModelReportVisualizer class, and the primary
way it is envisioned that it is accessed is:
```
model_report_visualizer = model_reporter.generate_visualizer()
```
This method only works after reports have been generated; it takes in
the generated reports and reformats them to be ordered by module, into
the format required by the ModelReportVisualizer. It then generates
the visualizer instance and returns that to the user.
Test Plan: python test/test_quantization.py TestFxModelReportClass.test_generate_visualizer
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81589
Approved by: https://github.com/andrewor14
Summary: This introduces the skeleton for the ModelReportVisualizer
class. This class helps visualize the information generated by the
ModelReport class `generate_report()` output. This class aims to provide
visualizations in a table, plot (line graph) and histogram view.
This also introduces an empty test class for testing visualizations. As
implementations for this class land, tests will also be
appropriately added.
This includes the high level descriptions for each of the methods as
well. Expected use cases will be added to the class description in a
future commit as that gets finalized.
Test Plan: python test/test_quantization.py TestFxModelReportVisualizer
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81523
Approved by: https://github.com/andrewor14
Summary: Currently, the ModelReport API only takes in detectors at the
beginning and for each of its methods, you have to pass in the model
each time, which doesn't really make sense because:
1. you will always want to be working on the same model
2. passing in a different model could break things, so it is more
fault-tolerant to keep the model internally and make calls on it
Therefore, the model will now be passed in at initialization, and the stored
reference will be used for the rest of the operations.
All the ModelReport tests have been adjusted to account for this, and
this change must pass all the tests to ensure a successful API
transition.
If you wish to see how the updated API looks, the Expected Usage in the
ModelReport class description has been updated to reflect the changes.
The README has also been updated with these changes as well.
Test Plan: python test/test_quantization.py TestFxModelReportClass
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81588
Approved by: https://github.com/jerryzh168
Summary: Currently, all the detectors have pretty accurate naming
schemes that give an idea of what they do. However, since now there are
more and more detectors being developed, there is a need to make sure
that the naming scheme for detector keys is consistent.
This updates the keys of the returned dictionaries to better
highlight if something is an activation stat or weight stat, etc.
Test Plan:
python test/test_quantization.py TestFxModelReportDetector
python test/test_quantization.py TestFxModelReportObserver
python test/test_quantization.py TestFxModelReportDetectDynamicStatic
python test/test_quantization.py TestFxModelReportClass
python test/test_quantization.py TestFxDetectInputWeightEqualization
python test/test_quantization.py TestFxDetectOutliers
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81587
Approved by: https://github.com/jerryzh168
Summary: Currently the InputWeightEqualizationDetector has a
multi-layered output.
Example
```
{'block1.linear': {'channel_axis_selected': 1,
'channel_comparison_metrics': tensor([0.8736, 0.6594, 0.2916], grad_fn=<DivBackward0>),
'input_range_info': {'global_max': tensor(9.),
'global_min': tensor(-10.),
'per_channel_max': tensor([9., 9., 9.]),
'per_channel_min': tensor([-10., -10., -10.])},
'input_weight_equalization_recommended': [True,
False,
False],
'threshold': 0.8,
'weight_range_info': {'global_max': tensor(0.5618, grad_fn=<UnbindBackward0>),
'global_min': tensor(-0.2211, grad_fn=<UnbindBackward0>),
'per_channel_max': tensor([0.3764, 0.5618, 0.2894], grad_fn=<NotImplemented>),
'per_channel_min': tensor([-0.2211, 0.2213, 0.2228], grad_fn=<NotImplemented>)}},
}
```
With all the levels, it can be hard to parse the information for
anything, especially the planned visualization feature where the data
has to be reorganized. Therefore, to make it standardized across all
detectors, all outputs will be limited to one level.
The new format is:
```
{'block1.linear': { 'channel_axis_selected': 1,
'channel_comparison_metrics': tensor([0.5705, 0.9457, 0.8891], grad_fn=<DivBackward0>),
'activation_global_max': tensor(9.),
'activation_global_min': tensor(-10.),
'activation_per_channel_max': tensor([9., 9., 9.]),
'activation_per_channel_min': tensor([-10., -10., -10.]),
'input_weight_equalization_recommended': [False, True, True],
'threshold': 0.8,
'weight_global_max': tensor(0.4258, grad_fn=<UnbindBackward0>),
'weight_global_min': tensor(-0.4958, grad_fn=<UnbindBackward0>),
'weight_per_channel_max': tensor([0.1482, 0.3285, 0.4258], grad_fn=<NotImplemented>),
'weight_per_channel_min': tensor([-0.1517, -0.4958, -0.3027], grad_fn=<NotImplemented>)},
}
```
The README will also be updated to reflect this change.
Test Plan: python test/test_quantization.py TestFxDetectInputWeightEqualization
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81586
Approved by: https://github.com/jerryzh168
Summary: Currently, the PerChannelDetector has a multi-layered output.
Example:
```
{'backend': 'qnnpack',
'per_channel_status': {'block1.linear': {'per_channel_supported': True,
'per_channel_used': False},
'block2.linear': {'per_channel_supported': True,
'per_channel_used': False}}}
```
The issue with this is that for future features such as
visualizations, where we need to iterate through this dictionary, the
variable number of nesting levels can make it hard to parse.
This changes the output format of the PerChannelDetector to have a
standard format.
Ex.)
```
{'block1.linear': {'backend': 'qnnpack',
'per_channel_supported': True,
'per_channel_used': False},
'block2.linear': {'backend': 'qnnpack',
'per_channel_supported': True,
'per_channel_used': False}}
```
Test Plan: python test/test_quantization.py TestFxModelReportDetector
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81585
Approved by: https://github.com/HDCharles
Summary: Two lines were accidentally added after a return statement
in the OutlierDetector observer insertion code that were not caught by the linter,
the tests, or review. They were harmless and likely the result of an odd merge
issue. This removes those two lines.
Test Plan: python test/test_quantization.py TestFxDetectOutliers
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81499
Approved by: https://github.com/kit1980
Summary: This adds a README for the ModelReport functionality that
contains an overview of the class, what it does,
and how it works, an example of usage, information on how to implement a
new detector (since this is how core functionality is added), folder
structure information, and finally information on tests and where they
are located.
The ModelReport class is still in development and will, in the future,
get additional features such as visualizations, and the README will be
updated with this information as it is added.
Test Plan: Just a new README, no code is added; the README will be reviewed
for accuracy, ease of use, and readability.
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81369
Approved by: https://github.com/jerryzh168
Summary: Before, the
determine_observer_insert_points() function for all detectors used
hard-coded strings as the keys for the dictionary that is returned
to the ModelReport instance, and those same hard-coded keys were
used to actually extract information from it. Since all detectors used
the same string keys, these were made default variables at the top
of the detector.py file, and all detectors now use those. The same
constants are imported and used in the ModelReport file as well. This way,
there is less of a chance of an error caused by incorrectly typed
strings.
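A toy sketch of the shared-constant pattern (the constant names below are made up for illustration and are not the actual variables defined in detector.py):
```
# Shared default keys, defined once at module level.
DETECTOR_TARGET_INFO_KEY = "target_info"
DETECTOR_OBS_TO_INSERT_KEY = "observer_to_insert"

def determine_observer_insert_points(module_fqn):
    # Each detector returns a dict keyed by the shared constants ...
    return {module_fqn: {DETECTOR_TARGET_INFO_KEY: module_fqn,
                         DETECTOR_OBS_TO_INSERT_KEY: None}}

# ... and the ModelReport code reads it back with the same constants,
# so a typo in a string can no longer silently break the lookup.
info = determine_observer_insert_points("block1.linear")["block1.linear"]
observer = info[DETECTOR_OBS_TO_INSERT_KEY]
```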
The test plan primarily tests the ModelReport class because this uses
the same new vars as well for the strings and is the primary one calling
each of the detector instances' determine_observer_insert_points()
Test Plan: python test/test_quantization.py TestFxModelReportClass
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81382
Approved by: https://github.com/jerryzh168
Summary: Before, all the function calls for the ModelReport object were
dependent on the Fx Graph Mode workflow. However, in reality, this was
not true, and the only requirement was for the model to
be a traceable GraphModule. This also helped keep the ModelReport class
as detached from the Fx Workflow as possible so that it can be used as a
more all purpose tool in the future.
This updated all the references to make sure that it wasn't specifically
referencing that a Fx Graph Mode workflow is needed, and is instead more
general since all we really need is a traceable model.
Test Plan: python test/test_quantization.py TestFxModelReportClass
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81252
Approved by: https://github.com/jerryzh168
Summary: This adds a example usage description to the ModelReport class
so that people can see how it can be used right in the class
documentation without having to consult external sources. The example
usage depicts how it can be used using the QuantizationTracer, which was
a decision taken to illustrate how there is no strict requirement on
using this tool with only Fx Graph Mode workflow.
Test Plan: python test/test_quantization.py TestFxModelReportClass
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81251
Approved by: https://github.com/jerryzh168
Summary: A huge part of the work for the Outlier detector was figuring
out what a good nth percentile to compare against the 100th percentile
was, while also figuring out what a good comparison ratio would be. This
commit adds a link to a Colab notebook to the documentation of the function so
that people can see the calculations used to determine those
values and realize that they are not just chosen at random.
At a high level, this Colab notebook contains work that includes:
- Figuring out whether to use interpolation or lower as the rule for
finding quantile between two indices
- Figuring out what a good value for reference_percentile is
- Figuring out what a good value for ratio_threshold is
Test Plan: python test/test_quantization.py TestFxDetectOutliers
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81250
Approved by: https://github.com/jerryzh168
Summary: The current Outlier detector does a good job of finding whether
data distributions passing through layers have outliers. However,
suppose we have a completely constant channel. The outlier detector
would not detect it as an outlier, but that is still something we want
to highlight because a constant channel usually is a result of a bad
configuration or something really wrong with the data.
To address this there are two additions to the outlier detector that
this commit makes:
- The first is to add whether there are any constant batches at all and
let the user know in the text report
- The second is to let the user know the number of total constant
batches found for each channel, so they can figure out if there are any
unnecessary channels present.
The existing outlier detector tests were modified to do a quick check
for this feature.
Test Plan: python test/test_quantization.py TestFxDetectOutliers
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81249
Approved by: https://github.com/andrewor14
Summary: The outlier detector has a feature where it is able to notify
the user if fewer than the whole set of batches that passed through were used
in the outlier calculation, which mainly happens as a result of 0-errors.
This changes the code so that instead of comparing against a fixed value like
30 as before, we now let the user pass in an optional fractional
value, and if the ratio of the batches used falls below that value, the
detector alerts the user.
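A hedged sketch of such a check (the function and argument names are illustrative):
```
def batches_used_sufficient(batches_used: int, total_batches: int,
                            fraction_batches_used_threshold: float = 0.75) -> bool:
    # Alert the user when fewer than the requested fraction of batches
    # contributed to the outlier calculation (e.g. due to 0-errors).
    return (batches_used / total_batches) >= fraction_batches_used_threshold
```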
Test Plan: python test/test_quantization.py TestFxDetectOutliers
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81174
Approved by: https://github.com/andrewor14
Summary: This adds the implementation for the report generation for the
Outlier Detector class. This includes both the generation of a
dictionary containing each module that had an observer attached and any
relevant stats collected by the observer that can help shed light on
outlier-relevant data or computed metrics. It also includes a string
denoting specific modules that had outliers and gives a bit of insight
into which channels they are contained in.
This contains both the implementation for the report generation for the
outlier detector as well as a test class to test the report generation
functionality.
Test Plan: python test/test_quantization.py TestFxDetectOutliers
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80937
Approved by: https://github.com/andrewor14
The previous implementation was using loops to compute the sparsity within a block in a mask, as well as across the mask blocks. This implements the vectorized version.
## Vectorization:
A high level overview of the vectorization procedure falls into a two step process:
### Tensor-level masking
A tensor-level masking is a mask generation routine that has a granularity of `sparse_block_shape`. That means that only patches of that shape can be considered sparse/dense. To vectorize:
1. Reshape the data such that one of the dimensions represents the patches of sparse_block_shape.
2. Create a mask of the same shape as the reshaped data
3. Find the smallest `k` elements in the data, given the dimension of the sparse "patches". `k` represents a derived parameter specifying the sparsity level.
4. Apply the 0/1 values to the patches in the mask
5. Reshape the mask back to the original dimensions
Note: because the shape of the mask might not be a multiple of the sparse_block_shape, we nudge the shape of the mask and truncate it afterwards. A minimal sketch of this routine is shown below.
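A minimal standalone sketch of this tensor-level routine, assuming a 2D weight tensor (the helper name and defaults are illustrative, not the sparsifier's actual implementation):
```
import torch
import torch.nn.functional as F

def tensor_level_mask(data, sparsity_level, sparse_block_shape=(1, 4)):
    bh, bw = sparse_block_shape
    h, w = data.shape
    # Nudge the shape up to a multiple of sparse_block_shape; truncated at the end.
    ph, pw = (-h) % bh, (-w) % bw
    padded = F.pad(data.abs(), (0, pw, 0, ph))
    hp, wp = h + ph, w + pw
    # Steps 1/2: reshape so one dimension enumerates the (bh, bw) patches.
    patches = padded.reshape(hp // bh, bh, wp // bw, bw).permute(0, 2, 1, 3)
    scores = patches.reshape(hp // bh, wp // bw, -1).sum(-1)
    # Step 3: k is the derived parameter -- how many patches become sparse.
    k = int(round(sparsity_level * scores.numel()))
    mask = torch.ones_like(scores)
    if k > 0:
        _, idx = torch.topk(scores.flatten(), k, largest=False)
        mask.view(-1)[idx] = 0
    # Steps 4/5: broadcast the 0/1 patch decisions back and truncate the padding.
    full = mask[:, :, None, None].expand(-1, -1, bh, bw)
    full = full.permute(0, 2, 1, 3).reshape(hp, wp)
    return full[:h, :w]

weight = torch.randn(5, 10)
mask = tensor_level_mask(weight, sparsity_level=0.5, sparse_block_shape=(1, 4))
```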
### Block-level masking
A block-level masking is a mask generation routine that concerns itself only with sparsity within a patch of shape `sparse_block_shape`. This is useful when block sparsity allows partial block sparsification.
To vectorize:
Overall the block-level masking follows the same routine as the tensor-level algorithm described above. One distinction is that when reshaping the data/mask tensors we aim to create a dimension that captures the internals of each patch. For example, if a `sparse_block_shape` is `(2, 2)`, we want to reshape the data/mask into `(2, 2, -1)`. That allows us to sort the internal elements on the last axis and zero out the ones that obey the sparse logic.
Differential Revision: [D37352494](https://our.internmc.facebook.com/intern/diff/D37352494/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D37352494/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80059
Approved by: https://github.com/jerryzh168
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/78117
Fixes: https://github.com/pytorch/pytorch/issues/73463
This PR adds a normalization pass that normalizes all the args to keyword args in positional order, and fixes the lowering code that previously
used only node.args so that it uses both args and kwargs instead.
Also tried to add a test for F.conv2d, but since conv2d matches multiple schemas we are doing an extra schema match, and because we are using symbolic values
in `transform`, we don't have a schema match, so F.conv2d still fails with runtime errors. We can resolve this issue later when there is a need.
Another thing I'm considering is to do the normalization with real inputs instead of symbolic inputs and not rely on operator_schemas (which is based on torchscript),
but rely on inspect.signature instead. I tried this briefly but didn't get too far; it looks like we cannot get the python signature for `torch._C._nn.linear`. It might be possible to fix as well, but that will need follow-up discussions.
The goal for this PR is just to introduce normalization in our codebase so that we can adapt some downstream code to it, and also fix the F.linear issue.
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_normalize_args
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: [D37163228](https://our.internmc.facebook.com/intern/diff/D37163228)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79095
Approved by: https://github.com/andrewor14
Summary: Previously, we automatically moved the model to CPU in
torch.ao.quantization.fx.convert to work around the issue where
certain functions called by convert expect CPU arguments. This
commit pushes this responsibility to the caller since it is the
user's decision of which device to use.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
BC-breaking Notes:
Before:
```
model = resnet18(...)
model = prepare_fx(model, qconfig_mapping, example_inputs)
... # calibrate
model = convert_fx(model)
```
After:
```
model = resnet18(...)
model.cpu()
model = prepare_fx(model, qconfig_mapping, example_inputs)
... # calibrate
model = convert_fx(model)
```
Reviewers: jerryzh168
Subscribers: jerryzh168
Differential Revision: [D37528830](https://our.internmc.facebook.com/intern/diff/D37528830)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80555
Approved by: https://github.com/jerryzh168
Summary: This adds the implementation for observer insertion point
selection for the OutlierDetector. For this detector, the insertion
points are to insert a ModelReportObserver before any leaf level module
to study the distribution of data that passes into the module to detect
outliers.
This commit contains the implementation of the observer insertion as
well as the relevant test case. Some code from the
InputWeightEqualization was abstracted and made more modular so the same
helper function could be used for multiple outlier class tests.
As a part of the work for this, there was testing done to determine what
a good default ratio threshold and reference percentile would be, and
the work to determine this (based on a normal distribution) was then
analyzed to find good parameters.
We still want to keep thresholds and reference percentile as something
the user can input because these were based on a normal distribution,
and they can definitely vary depending on the type of data a user has.
Test Plan: python test/test_quantization.py TestFxDetectOutliers
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80880
Approved by: https://github.com/andrewor14
Issue:
Previously, the L1/L2 norm data sparsifier did not support
1D tensors or parameters.
Fix:
If the tensor is 1D, then unsqueeze it to make it look 2D and
perform the rest as usual. Also, added some 1D tensors to the
unit test to cover this issue.
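A hedged sketch of the 1D handling (the helper is illustrative; the real change lives inside the sparsifier):
```
import torch

def as_2d(data: torch.Tensor) -> torch.Tensor:
    # 1D tensors are unsqueezed so the existing 2D masking path can be reused.
    return data.unsqueeze(0) if data.dim() == 1 else data

emb = torch.randn(10)          # 1D tensor / parameter
print(as_2d(emb).shape)        # torch.Size([1, 10]) -- now "looks 2D"
```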
Test Plan:
```python test/test_ao_sparsity.py TestNormDataSparsifiers```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80465
Approved by: https://github.com/z-a-f
Issue:
Previously, the data was not "attached" to the data sparsifier, meaning
the data sparsifier created a copy of the actual data inside its container. So,
when the data was modified outside of the sparsifier, the changes were not reflected
in the sparsifier.
Fix:
Use register_buffer() instead of nn.Parameter(..) to store the data inside the container.
Also, added a unit test referencing this issue.
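A hedged sketch of the register_buffer() approach (the container class and buffer name are illustrative):
```
import torch
import torch.nn as nn

class Container(nn.Module):
    def __init__(self, data: torch.Tensor):
        super().__init__()
        # register_buffer stores the tensor itself (no copy, not a Parameter),
        # so the sparsifier sees external in-place modifications.
        self.register_buffer("emb", data)

weights = torch.randn(4, 8)
container = Container(weights)
weights.add_(1.0)                              # modify the data outside the container
assert torch.equal(container.emb, weights)     # the change is reflected inside
```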
Test Plan:
```python test/test_ao_sparsity.py TestBaseDataSparsifier```
```python test/test_ao_sparsity.py TestNormDataSparsifiers```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80394
Approved by: https://github.com/z-a-f
Summary:
Some of the util functions in FX graph mode quantization throw warnings
such as:
```
/Users/vasiliy/pytorch/torch/ao/quantization/fx/utils.py:410: UserWarning: To copy construct from
a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().
requires_grad_(True), rather than torch.tensor(sourceTensor).
```
This PR fixes the warnings by moving the code to the recommended syntax if the
value is a tensor.
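A hedged sketch of the recommended pattern (not the exact utils.py code):
```
import torch

def to_float_tensor(value):
    # When the value is already a tensor, follow the syntax recommended by the
    # warning instead of re-wrapping it with torch.tensor(...).
    if isinstance(value, torch.Tensor):
        return value.clone().detach().to(torch.float)
    return torch.tensor(value, dtype=torch.float)

scale = to_float_tensor(torch.tensor([0.5]))   # no UserWarning emitted
zero_point = to_float_tensor(0)                # non-tensor path unchanged
```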
Test plan:
```
python test/test_quantization.py -k test_conv_linear_reference
// warning appeared before this PR and disappeared after this PR
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80883
Approved by: https://github.com/jerryzh168
### Summary:
This PR moves the clamping functionality from `quantize` to `float_to_apot` util function to align with the uniform quantize workflow in the codebase.
### Test Plan:
Run unit tests with:
python pytorch/test/quantization/core/experimental/test_quantizer.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80885
Approved by: https://github.com/dzdang
Summary: This adds the class framework for the ModelReport
OutlierDetector. This detector will be in charge of looking at
activation data and figuring out whether there are significant outliers
present in them. It will average this data across batches to make a
recommendation / warning if significant outliers are found.
This commit contains just the class framework and a base test class.
Implementations will follow in following commits.
Test Plan: python test/test_quantization.py TestFxDetectOutliers
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80743
Approved by: https://github.com/HDCharles
Summary:
Currently we expect the users to provide custom modules for LSTM and MHA. However, as we almost always ask the users to use those modules in the custom context, it is better to make this behavior the default. In this case we try to align with the base quantization API: if the user specifies a custom_config_dict, then that is used; if the value is left as None, then the default is used. If a user would like to both use the default and modify it, they have to do so manually, but the default is accessible via get_default_custom_config_dict.
Additionally, the NS which uses prepare to insert custom observers for
its purposes had to be slightly modified to pass in an empty
custom_config_dict in order to avoid modifying the custom modules.
Due to weird CI issues with the previous PR,
the previous discussion can be found here: https://github.com/pytorch/pytorch/pull/71192
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79960
Approved by: https://github.com/z-a-f
Summary: This commit adds qconfigs with special observers for fixed
qparams ops in get_default_qconfig_mapping and
get_default_qat_qconfig_mapping. For correctness, we also require
users to use these special observers if we detect these fixed
qparams ops in prepare.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo
Differential Revision: [D37396379](https://our.internmc.facebook.com/intern/diff/D37396379)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80184
Approved by: https://github.com/jerryzh168
Summary: This PR removes the is_reference flag from the existing
convert_fx API and replaces it with a new convert_to_reference
function. This separates (1) converting the prepared model to a
reference model from (2) lowering the reference model to a quantized
model, enabling users to call their custom lowering function for
custom backends. For the native fbgemm backend, for example, the
following are equivalent:
```
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx
prepared = prepare_fx(model, ...)
quantized = convert_fx(prepared, ...)
```
```
from torch.ao.quantization.fx import lower_to_fbgemm
from torch.ao.quantization.quantize_fx import (
prepare_fx,
convert_to_reference
)
prepared = prepare_fx(model, ...)
reference = convert_to_reference(prepared, ...)
quantized = lower_to_fbgemm(reference, ...)
```
Note that currently `lower_to_fbgemm` takes in two other arguments
that are difficult for users to provide. A future commit will remove
these arguments to make the helper function more user friendly.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo
Differential Revision: [D37359946](https://our.internmc.facebook.com/intern/diff/D37359946)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80091
Approved by: https://github.com/jerryzh168
### Summary:
This PR implements APoT fake quantization for the purpose of quantization-aware training. It implements the `calculate_qparams` and `forward` methods to be used in fake quantization.
### Test Plan:
Run unit tests with: `python pytorch/test/quantization/core/experimental/test_fake_quantize.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79845
Approved by: https://github.com/dzdang