pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
siahuat0727	3c63e76b03	[PT2E Quantization] Fix RecursionError when running prepare_pt2e on a graph with a concat of the same node (#141651) Fixes #129038 Related PR #129567 Here is the new PR against main, thanks! @jerryzh168 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141651 Approved by: https://github.com/jerryzh168	2024-11-29 09:19:22 +00:00
Xia, Weiwen	9827d677b4	[Quant][PT2E][X86] annotate and convert for linear_dynamic_fp16 (#141480) Annotate the linear node for `linear_dynamic_fp16` with `X86InductorQuantizer`. After `convert_pt2e`, the pattern will be `x -> linear`, with the weight path `linear <- to_fp32 <- to_fp16 <- w`. Test plan ``` pytest test/quantization/pt2e/test_x86inductor_quantizer.py -k test_linear_dynamic_fp16 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/141480 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2024-11-29 07:48:39 +00:00
yintong-lu	1ef1b3b391	Add missing data types to torch export serialization (#138561) Related to #131654. Added the missing FP8 data types to torch export serialization. Added test cases for the FP8 data types. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138561 Approved by: https://github.com/jerryzh168, https://github.com/jgong5	2024-11-28 08:35:03 +00:00
cyy	5ca75ac1df	Enable UBSAN tests (#141672) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/141672 Approved by: https://github.com/ezyang	2024-11-28 01:55:15 +00:00
Shangdi Yu	02990fe36b	Populate nn.module.stack in _fuse_conv_bn_qat (#141400) Summary: Populate nn.module.stack in _fuse_conv_bn_qat for replacement nodes that correspond to a `get_attr` node in the original graph. In the new training IR, `get_attr` nodes no longer have `nn_module_stack` in node meta (because the get_attr nodes are de-duplicated, so one get_attr node can potentially have users in different module stacks). We populate it by checking whether the "conv_input" or "conv_weight" replacement node has nn_module_stack. If not, we copy it from the conv node. Test Plan: CI ``` buck run fbcode//caffe2/test:quantization_pt2e -- -r test_preserve_nn_module_stack ``` Differential Revision: D66393517 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141400 Approved by: https://github.com/angelayi	2024-11-25 23:41:28 +00:00
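The fallback described in this commit (keep a replacement node's own `nn_module_stack` if it has one, otherwise copy the conv node's) can be sketched in plain Python. `FakeNode` and `populate_nn_module_stack` are hypothetical stand-ins, not the real `torch.fx.Node` or the code in `_fuse_conv_bn_qat`, and the stack contents are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakeNode:
    # Hypothetical stand-in for torch.fx.Node: only a name and a meta dict are modeled.
    name: str
    meta: Dict[str, Any] = field(default_factory=dict)


def populate_nn_module_stack(replacement_nodes: List[FakeNode], conv_node: FakeNode) -> None:
    """Give replacement nodes that lack nn_module_stack a copy of the conv node's stack."""
    fallback = conv_node.meta.get("nn_module_stack")
    if fallback is None:
        return  # nothing to copy from
    for node in replacement_nodes:
        if "nn_module_stack" not in node.meta:
            # Shallow-copy the dict so the nodes don't share one mutable object.
            node.meta["nn_module_stack"] = dict(fallback)


# Illustrative stack contents (key -> (path, module type)); not taken from a real graph.
conv = FakeNode("conv", {"nn_module_stack": {"L__self___conv": ("conv", "torch.nn.Conv2d")}})
conv_input = FakeNode("conv_input", {"nn_module_stack": {"L__self__": ("", "Model")}})
conv_weight = FakeNode("conv_weight")  # e.g. a de-duplicated get_attr with no stack
populate_nn_module_stack([conv_input, conv_weight], conv)
```

Copying only when the key is missing means a replacement node that already carries its own stack is left untouched.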
PyTorch MergeBot	cf1d95a965	Revert "Add option to split Linear gates for Quantizable LSTM into separate ops (#140868)" This reverts commit `3fcf66f61f`. Reverted https://github.com/pytorch/pytorch/pull/140868 on behalf of https://github.com/huydhn due to Sorry for reverting your change but I think lint is failing on this in trunk ([comment](https://github.com/pytorch/pytorch/pull/140868#issuecomment-2494076202))	2024-11-22 15:54:05 +00:00
Johnson Wong	3fcf66f61f	Add option to split Linear gates for Quantizable LSTM into separate ops (#140868) Summary: For LSTM, the input and hidden state are projected with Linear layers to construct the 4 gates. This is typically performed together as a single Linear (for each state) with output channel count `4 * hidden_dim` for efficiency. https://www.internalfb.com/code/fbsource/[ebef7c4238aa55948b2b444044f2c8ed2040de55]/fbcode/caffe2/torch/ao/nn/quantizable/modules/rnn.py?lines=52-58 The output is then ultimately split into 4: https://www.internalfb.com/code/fbsource/[ebef7c4238aa55948b2b444044f2c8ed2040de55]/fbcode/caffe2/torch/ao/nn/quantizable/modules/rnn.py?lines=83-87 For on-device latency (and possibly memory) considerations, we want to avoid constructing the intermediate `gates` tensor (which can be relatively large) by splitting `igates` and `hgates` first (as 4x `Linear(hidden_dim, hidden_dim)` each), applying add separately, then proceeding as usual. This functionality can be enabled by specifying `split_gates=True` (the default, False, keeps the original behavior) at any entry point (directly with `torch.ao.nn.quantizable.LSTM` or via `_get_lstm_with_individually_observed_parts`). Test Plan: piggyback on the existing test to check for correct swap handling, numerics, and jit.script during prepare/convert ``` buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_custom_module_lstm (caffe2.test.quantization.core.test_quantized_op.TestQuantizedOps)' ``` https://www.internalfb.com/intern/testinfra/testrun/11540474102848372 This test is now quite long-running (more than double the original runtime). Reviewed By: Ninja91 Differential Revision: D65283170 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140868 Approved by: https://github.com/jerryzh168	2024-11-22 04:10:26 +00:00
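The commit's claim that splitting the fused gate projection changes only the layout, not the result, can be checked with a small pure-Python sketch. It does not use `torch.ao.nn.quantizable.LSTM`. It assumes gate `k` corresponds to row block `k` of each `4 * hidden` weight matrix and omits biases.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    # y = W @ x; biases are omitted to keep the sketch short.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def fused_gates(w_i: Matrix, w_h: Matrix, x: Vector, h: Vector, hidden: int) -> List[Vector]:
    # Original layout: one Linear per state with 4 * hidden outputs,
    # added into a single `gates` vector, which is then split into 4 chunks.
    gates = [a + b for a, b in zip(matvec(w_i, x), matvec(w_h, h))]
    return [gates[k * hidden:(k + 1) * hidden] for k in range(4)]


def split_gates(w_i: Matrix, w_h: Matrix, x: Vector, h: Vector, hidden: int) -> List[Vector]:
    # Split layout: 4 smaller Linears per state (one row block each),
    # added gate by gate, so the 4 * hidden intermediate is never materialized.
    out = []
    for k in range(4):
        rows = slice(k * hidden, (k + 1) * hidden)
        igate = matvec(w_i[rows], x)
        hgate = matvec(w_h[rows], h)
        out.append([a + b for a, b in zip(igate, hgate)])
    return out


hidden, input_size = 2, 3
w_i = [[float(r + c) for c in range(input_size)] for r in range(4 * hidden)]
w_h = [[float(r - c) for c in range(hidden)] for r in range(4 * hidden)]
x, h = [1.0, 2.0, 3.0], [0.5, -1.0]
print(fused_gates(w_i, w_h, x, h, hidden) == split_gates(w_i, w_h, x, h, hidden))  # prints True
```

Both versions compute the same dot products. The split version never holds the full `4 * hidden` sum at once, which is the memory and latency point the commit makes.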
Songhao Jia	84d86e3767	[numeric_debugger] guard the input generate_numeric_debug_handle as GraphModule type (#140742) Summary: Support the ExportedProgram type in generate_numeric_debug_handle, to better meet the requirement Test Plan: ci Differential Revision: D65920529 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140742 Approved by: https://github.com/tarun292, https://github.com/jerryzh168	2024-11-20 03:40:04 +00:00
Shen Xu	efe8482c0d	Add prepare_obs_or_fq_callback to quantizer (#140863) Test Plan: CI. Differential Revision: D65982003 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140863 Approved by: https://github.com/jerryzh168	2024-11-19 01:13:38 +00:00
Xia, Weiwen	62eea62493	[Quant][Onednn] add linear_dynamic_fp16 ops (#140376) About this PR: This PR adds the following ops for `linear_dynamic_fp16` in the onednn namespace. These ops are intended for PT2E quantization eager mode. - `onednn::linear_prepack_fp16`: packs an fp32 weight into an fp16 MkldnnCPU tensor. - `onednn::linear_dynamic_fp16`: takes an fp32 CPU tensor and an fp16 MkldnnCPU tensor and computes linear in fp32. - `onednn::linear_relu_dynamic_fp16`: similar to the former, but applies ReLU to the output. Test plan: `python test/test_quantization.py -k test_linear_dynamic_fp16_onednn` Implementation: These ops call the oneDNN library under the hood. It's worth noting that oneDNN does not support f32 * f16 -> f32 computation, so we have to convert the fp16 weight to fp32 before computation. The weight also remains in plain format after packing. Correctness and performance: Correctness is covered by UT. Performance of the new ops may be better than the FBGEMM implementation when the weight shape is small but worse when it is large, because weight dtype conversion and computation are not fused. For example, I ran benchmarks on an Intel(R) Xeon(R) Platinum 8490H machine with different cores and shapes. When using 1 core per instance, the new implementation is generally faster for weight shape < 1024 * 1024. When using more cores, the threshold increases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140376 Approved by: https://github.com/jerryzh168, https://github.com/jgong5	2024-11-14 05:19:18 +00:00
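The numerics the commit describes are: store the weight as fp16, upcast it to fp32 at compute time, then run an ordinary linear with an optional ReLU. A pure-Python sketch of that flow follows. It uses `struct`'s half (`'e'`) and single (`'f'`) formats to round values. It does not call the `onednn::` ops, and the function names here are hypothetical.

```python
import struct
from typing import List


def to_fp16(x: float) -> float:
    # Round to the nearest IEEE 754 half-precision value via struct's 'e' format.
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    # Round to IEEE 754 single precision; every fp16 value is exactly representable here.
    return struct.unpack("<f", struct.pack("<f", x))[0]


def prepack_fp16(weight: List[List[float]]) -> List[List[float]]:
    # "Packing": store the fp32 weight as fp16 values, still in a plain row-major layout.
    return [[to_fp16(w) for w in row] for row in weight]


def linear_dynamic_fp16(x: List[float], packed: List[List[float]], relu: bool = False) -> List[float]:
    # Upcast the fp16 weight to fp32 before the matmul (no mixed f32 * f16 kernel).
    # Accumulation uses Python floats (fp64) here for simplicity.
    out = []
    for row in packed:
        acc = sum(to_fp32(xi) * to_fp32(wi) for xi, wi in zip(x, row))
        out.append(max(acc, 0.0) if relu else acc)
    return out


packed = prepack_fp16([[1.0, -2.0], [0.1, 3.0]])
print(linear_dynamic_fp16([1.0, 1.0], packed))
print(linear_dynamic_fp16([1.0, 1.0], packed, relu=True))
```

The only precision loss comes from the fp16 rounding of the weight. For example, `0.1` becomes roughly `0.09998`, while `1.0` and `-2.0` survive exactly.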
zeshengzong	cb71bcc542	Replace clone.detach with detach.clone (#140264) Fixes #64532 As stated in the issue, replace `clone.detach` with `detach.clone` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140264 Approved by: https://github.com/soulitzer	2024-11-13 07:01:02 +00:00
Songhao Jia	59ec011855	[numerical debugger] bumped up the starting handler id (#139666) Differential Revision: D65445250 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139666 Approved by: https://github.com/tarun292, https://github.com/dulinriley	2024-11-07 01:00:43 +00:00
Jerry Zhang	938803df94	Add bfloat16 support for per tensor/channel cpu/cuda fake quantize ops (#139306) Summary: Fixes https://fb.workplace.com/groups/2240361332735959/permalink/8190736677698365 Test Plan: buck2 test 'fbcode//mode/dev' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_forward_per_channel_cachemask_cpu (caffe2.test.quantization.core.test_workflow_ops.TestFakeQuantizeOps)' buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_forward_per_tensor_cachemask_cpu (caffe2.test.quantization.core.test_workflow_ops.TestFakeQuantizeOps)' buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_forward_per_channel_cachemask_cuda (caffe2.test.quantization.core.test_workflow_ops.TestFakeQuantizeOps)' buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_forward_per_channel_cachemask_cpu (caffe2.test.quantization.core.test_workflow_ops.TestFakeQuantizeOps)' Differential Revision: D65221710 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139306 Approved by: https://github.com/navsud	2024-10-31 20:41:15 +00:00
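A fake-quantize op performs the same quantize-then-dequantize arithmetic whether its input is fp32 or bfloat16. This commit extends the supported input dtype. The minimal pure-Python sketch below shows that arithmetic for the per-tensor and per-channel (axis 0) cases. It models neither bfloat16 storage nor the cachemask outputs, and the function names are illustrative.

```python
from typing import List


def fake_quantize_per_tensor(x: List[float], scale: float, zero_point: int,
                             quant_min: int, quant_max: int) -> List[float]:
    # Quantize (round, shift, clamp) and immediately dequantize; output stays floating point.
    # Python's round() is round-half-to-even.
    out = []
    for v in x:
        q = min(max(round(v / scale) + zero_point, quant_min), quant_max)
        out.append((q - zero_point) * scale)
    return out


def fake_quantize_per_channel(x: List[List[float]], scales: List[float], zero_points: List[int],
                              quant_min: int, quant_max: int) -> List[List[float]]:
    # Channel axis 0: row i uses scales[i] and zero_points[i].
    return [fake_quantize_per_tensor(row, s, zp, quant_min, quant_max)
            for row, s, zp in zip(x, scales, zero_points)]


print(fake_quantize_per_tensor([0.0, 0.26, 1.0, 10.0], 0.5, 0, -4, 3))
print(fake_quantize_per_channel([[1.0, -1.0], [1.0, -1.0]], [0.5, 1.0], [0, 1], 0, 3))
```

In the per-tensor call, `10.0` is clamped to `quant_max * scale = 1.5`. In the per-channel call, each row saturates differently because each row has its own `scale` and `zero_point`.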
Wu, Chunyuan	d7411c0cc1	[AOTI] add C shim for QConvPointWise (#138540) This PR adds a C shim for `QConvPointWisePT2E` and `QConvPointWiseBinaryPT2E`, similar to https://github.com/pytorch/pytorch/pull/138439. Besides that, we aligned the implementation of `qconv_pointwise` with `qlinear_pointwise` in the following aspects: 1. The parameter orders of `qconv_pointwise` and `qlinear_pointwise` were quite different, so we aligned the schema of `qconv_pointwise` to use a parameter order similar to `qlinear_pointwise`, making the two more consistent. 2. We always convert `x_scale` and `x_zero_point` to Tensors, just like in the lowering of `qlinear_pointwise`. This avoids the need to create two separate C APIs (one for `double x_scale` and `int64_t x_zero_point`, and another for the `Tensor` versions). Instead, we only need one API for `Tensor`-based `x_scale` and `x_zero_point`. If we later add dynamic quantization for qconv (which will use `Tensor` for `x_scale` and `x_zero_point`), we can reuse the code from this PR without changing the C shim layer API. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138540 Approved by: https://github.com/jgong5, https://github.com/desertfire ghstack dependencies: #138691, #138806	2024-10-31 02:03:01 +00:00
Xia, Weiwen	edcab61f93	Skip test for PT2E quantized ops in fbcode (#138792) Skip those tests as they are failing in fbcode. Submit this PR per request from @jerryzh168 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138792 Approved by: https://github.com/jerryzh168	2024-10-30 02:37:38 +00:00
amathewc	69d401d010	Update test_quantize_pt2e.py with HPU support (#137863) MOTIVATION We recently integrated support for Intel Gaudi devices (identified as 'hpu') into the common_device_type framework via the pull request at https://github.com/pytorch/pytorch/pull/126970. This integration allows tests to be automatically instantiated for Gaudi devices upon loading the relevant library. Building on this development, the current pull request extends the utility of these hooks by adapting selected CUDA tests to operate on Gaudi devices. Additionally, we have confirmed that these modifications do not interfere with the existing tests on CUDA devices. CHANGES - Add support for HPU devices within test_move_exported_model_bn using the TEST_HPU flag - Use instantiate_device_type_tests with targeted attributes to generate device-specific test instances. - Apply the skipIfHPU decorator to bypass tests that are not yet compatible with HPU devices. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137863 Approved by: https://github.com/jerryzh168	2024-10-29 13:01:03 +00:00
Aaron Gokaslan	5d074746e9	[BE]: Add better optional typing (#138426) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/138426 Approved by: https://github.com/XuehaiPan, https://github.com/malfet	2024-10-27 14:19:00 +00:00
Xu Han	043864afdf	enable test_x86inductor_quantizer.py UTs on Windows. (#138937) These UTs started failing months ago, but as the main branch moved forward, some PRs fixed them. Let's turn them on. Local test passed (screenshot: https://github.com/user-attachments/assets/a2ec160c-cdf1-404d-bc24-2f60faa8d791). Pull Request resolved: https://github.com/pytorch/pytorch/pull/138937 Approved by: https://github.com/jansel	2024-10-26 12:48:51 +00:00
Jerry Zhang	6d8c9be54b	[reland] Add int1 to int7 dtypes (#137928) Summary: Similar to https://github.com/pytorch/pytorch/pull/117208, we want to add int1 to int7 for edge use cases for weight quantization Test Plan: python test/test_quantization.py -k test_uint4_int4_dtype Differential Revision: [D64344944](https://our.internmc.facebook.com/intern/diff/D64344944) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137928 Approved by: https://github.com/malfet	2024-10-18 02:02:08 +00:00
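For weight quantization, the useful property of each new sub-byte signed dtype is its value range, which a quantizer would use as `quant_min`/`quant_max`. The sketch below derives those ranges from the standard two's-complement formula and applies symmetric quantization. It does not use the PyTorch dtypes themselves, and the helper names are hypothetical.

```python
from typing import Dict, Tuple


def signed_range(bits: int) -> Tuple[int, int]:
    # Two's-complement range of an n-bit signed integer: [-2^(n-1), 2^(n-1) - 1].
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


# quant_min/quant_max a quantizer could use for each sub-byte signed dtype.
INT_RANGES: Dict[str, Tuple[int, int]] = {f"int{b}": signed_range(b) for b in range(1, 8)}


def quantize_symmetric(value: float, scale: float, bits: int) -> int:
    # Symmetric quantization (zero_point = 0), clamped to the intN range.
    lo, hi = signed_range(bits)
    return min(max(round(value / scale), lo), hi)


for name, (lo, hi) in INT_RANGES.items():
    print(name, lo, hi)
```

Note the degenerate end of the family. `int1` can only hold `-1` and `0`, and `int7` spans `-64..63`.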
Jerry Zhang	ad134fe038	Skip doc test internally (#137813) Summary: there are some path issues when we run the doc tests internally https://www.internalfb.com/intern/test/281475143872621 Test Plan: sandcastle Reviewed By: drisspg, msaroufim Differential Revision: D64255824 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137813 Approved by: https://github.com/HDCharles	2024-10-14 21:29:15 +00:00
Shangdi Yu	c0a930b104	Change to export_for_training in quantize_pt2e tests (#137233) Summary: as titled; also change it in the `prepare_pt2e()` docstring Test Plan: ``` buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:quantization_pt2e_qat buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization ``` Differential Revision: D63345059 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137233 Approved by: https://github.com/tugsbayasgalan	2024-10-04 18:33:02 +00:00
Shangdi Yu	4096ed7dc2	Migrate to training ir in quantization_pt2e_qat unittests (#137232) Summary: Change capture_pre_autograd_graph to export_for_training in unit tests. Test Plan: ``` buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:quantization_pt2e_qat ``` Reviewed By: tugsbayasgalan Differential Revision: D63336660 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137232 Approved by: https://github.com/angelayi	2024-10-03 22:57:04 +00:00
Shangdi Yu	c83178d894	Change to export_for_training in XNNPACK tests (#137238) Summary: as titled Test Plan: CI Differential Revision: D63344674 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137238 Approved by: https://github.com/tugsbayasgalan	2024-10-03 21:28:05 +00:00
Shangdi Yu	a3f3773477	Make PT2E work with both IRs simultaneously (#135769) Summary: as titled Test Plan: ``` buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:quantization_pt2e_qat ``` Differential Revision: D62449830 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135769 Approved by: https://github.com/angelayi	2024-10-02 21:05:22 +00:00
PyTorch MergeBot	2ef1454189	Revert "Add int1 to int7 dtypes (#136301)" This reverts commit `bfa16a161d`. Reverted https://github.com/pytorch/pytorch/pull/136301 on behalf of https://github.com/PaliC due to causing internal failures ([comment](https://github.com/pytorch/pytorch/pull/136301#issuecomment-2384119600))	2024-09-30 20:50:49 +00:00
Nitin Jain	8df97d78c2	[QAT] Make Fused modules torchscriptable (#136285) Summary: Same as title. Inspired by: https://pytorch.org/tutorials/recipes/script_optimized.html#fix-common-errors-when-using-the-script-method Test Plan: CI Differential Revision: D62980019 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136285 Approved by: https://github.com/jerryzh168	2024-09-28 03:46:19 +00:00
Jerry Zhang	bfa16a161d	Add int1 to int7 dtypes (#136301) Summary: Similar to https://github.com/pytorch/pytorch/pull/117208, we want to add int1 to int7 for edge use cases for weight quantization (https://www.internalfb.com/diff/D62464487) Test Plan: python test/test_quantization.py -k test_uint4_int4_dtype Pull Request resolved: https://github.com/pytorch/pytorch/pull/136301 Approved by: https://github.com/ezyang	2024-09-28 02:08:33 +00:00
Oguz Ulgen	a28b40fa74	Improve is_fbcode functionality (#136871) Summary: Previously, is_fbcode just checked whether the checkout was git or not. This is extremely error-prone. Let's make it foolproof. Test Plan: unit tests Reviewed By: masnesral Differential Revision: D63545169 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136871 Approved by: https://github.com/masnesral	2024-09-27 21:19:01 +00:00
blzheng	797c7e2802	[Quant][PT2E]change flatten recipe for X86InductorQuantizer (#136298) This PR modifies the flatten recipe: if none of the users of the flatten node are quantizable ops, int8 flatten will be disabled to avoid unnecessary dtype conversions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/136298 Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5	2024-09-24 04:30:12 +00:00
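The flatten rule in this commit is a simple predicate over the node's users. The sketch below expresses it over a toy graph. `Node` and the `QUANTIZABLE_OPS` set are hypothetical placeholders, not the op list `X86InductorQuantizer` actually uses.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical set of ops that the quantizer would annotate as int8.
QUANTIZABLE_OPS = {"conv2d", "linear", "matmul"}


@dataclass
class Node:
    # Minimal graph node: an op name plus the nodes that consume its output.
    op: str
    users: List["Node"] = field(default_factory=list)


def should_quantize_flatten(flatten: Node) -> bool:
    # Keep int8 flatten only if at least one consumer is itself quantizable;
    # otherwise int8 flatten would only add quantize/dequantize conversions.
    return any(user.op in QUANTIZABLE_OPS for user in flatten.users)


print(should_quantize_flatten(Node("flatten", [Node("linear")])))
print(should_quantize_flatten(Node("flatten", [Node("softmax"), Node("add")])))
```

A flatten with no users at all also falls back to fp32, because `any()` over an empty list is `False`.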
Riley Dulin	3be150653c	[torch][ao] Add customizable loss function to NodeAccuracySummary (#136282) Summary: Add a customizable loss function callback to NodeAccuracySummary to allow users to pass in their own loss function. Also, fix some type errors and propagate better exception messages when unexpected tensor comparisons occur. Finally, enhance the robustness of `generate_numeric_debug_handle` in the case where it is called multiple times on the same model, by avoiding reuse of the same IDs. Test Plan: Added a test for this case in `test_numeric_debugger`. Reviewed By: jerryzh168 Differential Revision: D62898297 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136282 Approved by: https://github.com/jerryzh168	2024-09-24 03:28:12 +00:00
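One way to make handle generation safe to call repeatedly is to keep existing handles and start new ones above the current maximum. The sketch below models node metadata as plain dicts. The `"custom"`/`"numeric_debug_handle"` key names and the starting value of 1 are assumptions for illustration, not a description of the real `generate_numeric_debug_handle`.

```python
from typing import Any, Dict, List

# Hypothetical key names, modeled on the node.meta["custom"] layout mentioned in this log.
CUSTOM_KEY = "custom"
HANDLE_KEY = "numeric_debug_handle"


def assign_debug_handles(node_metas: List[Dict[str, Any]]) -> None:
    """Give every node without a handle a fresh integer ID, never reusing an existing one."""
    existing = [m.get(CUSTOM_KEY, {}).get(HANDLE_KEY) for m in node_metas]
    next_id = max((h for h in existing if h is not None), default=0) + 1
    for meta in node_metas:
        custom = meta.setdefault(CUSTOM_KEY, {})
        if HANDLE_KEY not in custom:
            custom[HANDLE_KEY] = next_id
            next_id += 1


metas: List[Dict[str, Any]] = [{}, {}]
assign_debug_handles(metas)   # first call: handles 1 and 2
metas.append({})              # e.g. a node added by a later transformation
assign_debug_handles(metas)   # second call: old handles kept, new node gets 3
print([m[CUSTOM_KEY][HANDLE_KEY] for m in metas])
```

Because already-tagged nodes are skipped and numbering continues from the maximum, a second call cannot hand out an ID that is already in use.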
PyTorch MergeBot	df1eef9779	Revert "[torch][ao] Add customizable loss function to NodeAccuracySummary (#136282)" This reverts commit `f3c54ccf8f`. Reverted https://github.com/pytorch/pytorch/pull/136282 on behalf of https://github.com/huydhn due to This breaks OSS, let's revert it and land the revert internally then ([comment](https://github.com/pytorch/pytorch/pull/136282#issuecomment-2364219252))	2024-09-20 17:49:06 +00:00
Riley Dulin	f3c54ccf8f	[torch][ao] Add customizable loss function to NodeAccuracySummary (#136282) Summary: Add a customizable loss function callback to NodeAccuracySummary to allow users to pass in their own loss function. Also, fix some type errors and propagate better exception messages when unexpected tensor comparisons occur. Finally, enhance the robustness of `generate_numeric_debug_handle` in the case where it is called multiple times on the same model, by avoiding reuse of the same IDs. Test Plan: Added a test for this case in `test_numeric_debugger`. Reviewed By: jerryzh168 Differential Revision: D62898297 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136282 Approved by: https://github.com/jerryzh168	2024-09-20 07:34:52 +00:00
Jerry Zhang	f2b0fc89f2	Add uint16 support for observer (#136238) Summary: as titled Test Plan: python test/test_quantization.py -k TestObserver Differential Revision: [D62909821](https://our.internmc.facebook.com/intern/diff/D62909821) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136238 Approved by: https://github.com/tarun292	2024-09-18 23:52:18 +00:00
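For a uint16 target, a min/max observer maps the observed float range onto `[0, 65535]`. The simplified sketch below shows the common affine (asymmetric) recipe. It is not the observer code from this commit. PyTorch's observers also cover symmetric and per-channel modes, which are omitted here, and the `eps` default (the float32 machine epsilon) is an assumption.

```python
from typing import Tuple

UINT16_MIN, UINT16_MAX = 0, 2**16 - 1  # 0 .. 65535


def calculate_qparams(min_val: float, max_val: float,
                      quant_min: int = UINT16_MIN, quant_max: int = UINT16_MAX,
                      eps: float = 1.1920928955078125e-07) -> Tuple[float, int]:
    # Affine min/max qparams. The observed range is widened to include 0.0
    # so that zero is exactly representable after quantization.
    min_neg = min(min_val, 0.0)
    max_pos = max(max_val, 0.0)
    scale = max((max_pos - min_neg) / (quant_max - quant_min), eps)
    zero_point = quant_min - round(min_neg / scale)
    zero_point = min(max(zero_point, quant_min), quant_max)
    return scale, zero_point


print(calculate_qparams(0.0, 65535.0))
print(calculate_qparams(-2.0, 6.0))
```

With 65,536 levels instead of int8's 256, the same observed range gets a scale roughly 257 times finer.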
Michael Lazos	5c5c33ac32	[Dynamo] Trace torch function modes entered outside of torch.compile (#133137) This PR adds initial tracing for torch function modes. Details: In essence, this adds tracing into the torch function of modes entered outside of the torch.compile call. This does not yet support tracing enter/exit of a torch function mode or tracing set_default_device properly using the new mode infra (this will be a very good stress test for modes). I am adding more PRs to this stack to support these. The overall plan is to support tracing enter/exit and handling graph breaks like we do for other torch.* context managers. Previously landed: https://github.com/pytorch/pytorch/pull/133135 https://github.com/pytorch/pytorch/pull/133136 https://github.com/pytorch/pytorch/pull/133134 https://github.com/pytorch/pytorch/pull/133133 https://github.com/pytorch/pytorch/pull/133132 https://github.com/pytorch/pytorch/pull/133131 https://github.com/pytorch/pytorch/pull/133729 https://github.com/pytorch/pytorch/pull/133130 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133137 Approved by: https://github.com/jansel, https://github.com/zou3519 ghstack dependencies: #134732	2024-09-14 18:52:22 +00:00
PyTorch MergeBot	8c8a3086a7	Revert "[Dynamo] Trace torch function modes entered outside of torch.compile (#133137)" This reverts commit `4528777e03`. Reverted https://github.com/pytorch/pytorch/pull/133137 on behalf of https://github.com/mlazos due to broke python test/quantization/pt2e/test_numeric_debugger.py TestNumericDebugger.test_re_export_preserve_handle modified yesterday ([comment](https://github.com/pytorch/pytorch/pull/134732#issuecomment-2350937008))	2024-09-14 10:02:55 +00:00
Michael Lazos	4528777e03	[Dynamo] Trace torch function modes entered outside of torch.compile (#133137) This PR adds initial tracing for torch function modes. Details: In essence, this adds tracing into the torch function of modes entered outside of the torch.compile call. This does not yet support tracing enter/exit of a torch function mode or tracing set_default_device properly using the new mode infra (this will be a very good stress test for modes). I am adding more PRs to this stack to support these. The overall plan is to support tracing enter/exit and handling graph breaks like we do for other torch.* context managers. Previously landed: https://github.com/pytorch/pytorch/pull/133135 https://github.com/pytorch/pytorch/pull/133136 https://github.com/pytorch/pytorch/pull/133134 https://github.com/pytorch/pytorch/pull/133133 https://github.com/pytorch/pytorch/pull/133132 https://github.com/pytorch/pytorch/pull/133131 https://github.com/pytorch/pytorch/pull/133729 https://github.com/pytorch/pytorch/pull/133130 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133137 Approved by: https://github.com/jansel, https://github.com/zou3519 ghstack dependencies: #134732	2024-09-14 02:40:43 +00:00
Jerry Zhang	b8eef500a6	Fix attr check for quantization spec (#135736) Summary: Previously we only checked dtype and is_dynamic to decide whether two quantization specs are equivalent; this may not work in some cases, e.g. when people use a different qscheme or quant_min/quant_max. This PR adds checks for the other fields as well. Test Plan: regression tests Differential Revision: [D62530974](https://our.internmc.facebook.com/intern/diff/D62530974) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135736 Approved by: https://github.com/sxu	2024-09-13 23:01:22 +00:00
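The failure mode this commit fixes is easy to reproduce with a toy spec type. Two specs can agree on `dtype` and `is_dynamic` but differ in range or scheme. `QSpec` below is a hypothetical simplified stand-in with illustrative field names, not PyTorch's `QuantizationSpec`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QSpec:
    # Hypothetical, simplified stand-in for a quantization spec; field names are illustrative.
    dtype: str
    quant_min: Optional[int] = None
    quant_max: Optional[int] = None
    qscheme: Optional[str] = None
    ch_axis: Optional[int] = None
    is_dynamic: bool = False


def equivalent_old(a: QSpec, b: QSpec) -> bool:
    # Behavior described in the commit: only dtype and is_dynamic are compared.
    return a.dtype == b.dtype and a.is_dynamic == b.is_dynamic


def equivalent_new(a: QSpec, b: QSpec) -> bool:
    # Compare every field that affects the quantized representation.
    def key(s: QSpec):
        return (s.dtype, s.quant_min, s.quant_max, s.qscheme, s.ch_axis, s.is_dynamic)
    return key(a) == key(b)


affine = QSpec("int8", -128, 127, "per_tensor_affine")
symmetric = QSpec("int8", -127, 127, "per_tensor_symmetric")
print(equivalent_old(affine, symmetric), equivalent_new(affine, symmetric))
```

The old check treats `affine` and `symmetric` as interchangeable, even though they would produce different scales and zero points.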
Yiming Zhou	4312794b92	[reland][export] fix re-export custom metadata (#135720) Summary: Fixes https://github.com/pytorch/pytorch/issues/134778 The previous D62304294 broke some executorch tests and has already been reverted. In this diff, `_collect_param_buffer_metadata()` is modified so that when a `call_function` node is encountered and its input nodes include `get_attr`, we skip the fields that have been collected previously and only collect the rest of the fields. This prevents overwriting. Test Plan: ``` buck2 test 'fbcode//mode/dev-nosan' fbcode//executorch/backends/xnnpack/test:test_xnnpack_ops buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_re_export_preserve_handle buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_run_decompositions_preserve_handle ``` Differential Revision: D62514208 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135720 Approved by: https://github.com/zhxchen17, https://github.com/jerryzh168	2024-09-13 20:15:15 +00:00
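The "collect only the fields not already collected" rule amounts to a merge where the first writer wins. The sketch below shows that rule on plain dicts. `collect_missing_fields` is a hypothetical helper, not the body of `_collect_param_buffer_metadata()`.

```python
from typing import Any, Dict


def collect_missing_fields(collected: Dict[str, Any], new_meta: Dict[str, Any]) -> Dict[str, Any]:
    """Merge new_meta into collected, keeping any field that was already collected."""
    for key, value in new_meta.items():
        if key not in collected:
            collected[key] = value  # only fill gaps; never overwrite
    return collected


collected = {"custom": {"numeric_debug_handle": 1}}
collect_missing_fields(collected, {"custom": {"numeric_debug_handle": 2}, "val": "fake_tensor"})
print(collected)
```

The earlier `"custom"` entry survives the second pass, and only the new `"val"` field is added. That is the overwrite the commit says it prevents.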
PyTorch MergeBot	eb7dd91dd1	Revert "[Dynamo] Trace torch function modes entered outside of torch.compile (#133137)" This reverts commit `fafdd588f2`. Reverted https://github.com/pytorch/pytorch/pull/133137 on behalf of https://github.com/albanD due to Broke tests on main ([comment](https://github.com/pytorch/pytorch/pull/134732#issuecomment-2348886378))	2024-09-13 12:52:58 +00:00
Michael Lazos	fafdd588f2	[Dynamo] Trace torch function modes entered outside of torch.compile (#133137) This PR adds initial tracing for torch function modes. Details: In essence, this adds tracing into the torch function of modes entered outside of the torch.compile call. This does not yet support tracing enter/exit of a torch function mode or tracing set_default_device properly using the new mode infra (this will be a very good stress test for modes). I am adding more PRs to this stack to support these. The overall plan is to support tracing enter/exit and handling graph breaks like we do for other torch.* context managers. Previously landed: https://github.com/pytorch/pytorch/pull/133135 https://github.com/pytorch/pytorch/pull/133136 https://github.com/pytorch/pytorch/pull/133134 https://github.com/pytorch/pytorch/pull/133133 https://github.com/pytorch/pytorch/pull/133132 https://github.com/pytorch/pytorch/pull/133131 https://github.com/pytorch/pytorch/pull/133729 https://github.com/pytorch/pytorch/pull/133130 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133137 Approved by: https://github.com/jansel, https://github.com/zou3519 ghstack dependencies: #134732	2024-09-13 08:41:00 +00:00
Shangdi Yu	1a74952925	"Remove BLOCK_LIST" (#135729) Summary: Skip test_prepare_qat_conv_bn_fusion_getitem_placeholder when we use training IR, since it only covers the bn-getitem pattern, which doesn't exist in training IR. Remove BLOCK_LIST since it's empty. Now all internal unittests will use training IR. Test Plan: ``` buck2 run 'fbcode//mode/dev-nosan' caffe2/test/quantization:test_quantization -- -r test_prepare_qat_conv_bn_fusion_getitem_placeholder buck2 run 'fbcode//mode/dev-nosan' caffe2/test:quantization_pt2e_qat -- -r test_prepare_qat_conv_bn_fusion_getitem_placeholder ``` Differential Revision: D62387987 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135729 Approved by: https://github.com/tugsbayasgalan	2024-09-12 01:22:06 +00:00
Shangdi Yu	ad75b09d89	Replace capture_pre_autograd_graph with export_for_training in torch tests (#135623) Summary: as titled Test Plan: ``` buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:test_export -- -r test_conv_dynamic buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:fx -- -r matcher buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r x86 ``` CI Differential Revision: D62448302 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135623 Approved by: https://github.com/tugsbayasgalan	2024-09-11 19:23:08 +00:00
PyTorch MergeBot	183c32fd3b	Revert "[Dynamo] Trace torch function modes entered outside of torch.compile (#133137)" This reverts commit `0d15122092`. Reverted https://github.com/pytorch/pytorch/pull/133137 on behalf of https://github.com/clee2000 due to something in this stack broke functorch/test_control_flow.py::TestControlFlow::test_scan_simple_graph [GH job link](https://github.com/pytorch/pytorch/actions/runs/10804912306/job/29980571390) (HUD commit link: `444b52ff40`), newly added test yesterday ([comment](https://github.com/pytorch/pytorch/pull/133137#issuecomment-2344054339))	2024-09-11 15:57:00 +00:00
Yiming Zhou	4ae6d7c18f	Back out "[pytorch][PR] [export] fix re-export custom metadata" (#135634) Summary: Broke some tests. Revert this diff Test Plan: CI Differential Revision: D62474337 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135634 Approved by: https://github.com/tugsbayasgalan	2024-09-11 06:16:26 +00:00
Michael Lazos	0d15122092	[Dynamo] Trace torch function modes entered outside of torch.compile (#133137) This PR adds initial tracing for torch function modes. Details: In essence, this adds tracing into the torch function of modes entered outside of the torch.compile call. This does not yet support tracing enter/exit of a torch function mode or tracing set_default_device properly using the new mode infra (this will be a very good stress test for modes). I am adding more PRs to this stack to support these. The overall plan is to support tracing enter/exit and handling graph breaks like we do for other torch.* context managers. Previously landed: https://github.com/pytorch/pytorch/pull/133135 https://github.com/pytorch/pytorch/pull/133136 https://github.com/pytorch/pytorch/pull/133134 https://github.com/pytorch/pytorch/pull/133133 https://github.com/pytorch/pytorch/pull/133132 https://github.com/pytorch/pytorch/pull/133131 https://github.com/pytorch/pytorch/pull/133729 https://github.com/pytorch/pytorch/pull/133130 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133137 Approved by: https://github.com/jansel, https://github.com/zou3519 ghstack dependencies: #134732	2024-09-11 04:18:22 +00:00
Yiming Zhou	66c45f3ed9	[export] fix re-export custom metadata (#135282) Fixes #134778 When a model is exported and debug handles are added to the "custom" field of non-placeholder and non-output nodes in the graph, re-exporting it will change the metadata of placeholder nodes (the "custom" field will be added or copied to these nodes, depending on whether `ExportedProgram` or `ExportedProgram.module()` is passed to `generate_numeric_debug_handle()`). This occurs because when we re-export the model, `placeholder` nodes are unlifted to `get_attr` nodes. These nodes remain as `get_attr` after being exported to `gm_torch_level`. Their metadata is modified [here](https://github.com/pytorch/pytorch/blob/main/torch/export/_trace.py#L1347) based on `params_buffers_to_node_meta`, which is collected [here](https://github.com/pytorch/pytorch/blob/main/torch/export/_trace.py#L1312). Pull Request resolved: https://github.com/pytorch/pytorch/pull/135282 Approved by: https://github.com/jerryzh168, https://github.com/zhxchen17, https://github.com/tugsbayasgalan	2024-09-10 20:15:02 +00:00
Huamin Li	fd494dd426	Change wrapped_linear_prepack and wrapped_quantized_linear_prepacked to private by adding _ as prefix (#135401) Summary: In https://github.com/pytorch/pytorch/pull/134232, we added two new ops wrapped_linear_prepack and wrapped_quantized_linear_prepacked. From the review comments and offline discussion, we are changing them to private by adding `_` as prefix Differential Revision: D62325142 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135401 Approved by: https://github.com/houseroad	2024-09-08 04:16:24 +00:00
Yiming Zhou	c92227c41a	[quant][pt2e] fix placeholder typo and related quantization tests (#135379) A previous typo in "placeholder" and the related quantization tests are fixed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135379 Approved by: https://github.com/jerryzh168	2024-09-07 02:31:43 +00:00
Shangdi Yu	b1a934741e	Change test_constant_prop_preserve_metadata (#135268) Summary: In new export_for_training, "stack_trace" does not exist in node meta anymore. Test Plan: ``` buck run fbcode//mode/dev-nosan fbcode//caffe2/test:quantization_pt2e -- -r test_constant_prop_preserve_metadata ``` Reviewed By: angelayi Differential Revision: D62219974 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135268 Approved by: https://github.com/angelayi	2024-09-07 00:02:35 +00:00
Shangdi Yu	590a3e9f8a	[export][training ir migration] quantized_decomposed.quantize_per_tensor decomposition (#134525) Summary: In the graph of the TestXNNPACKQuantizer.test_dynamic_linear_with_con test, some quantized_decomposed.quantize_per_tensor.default ops become quantized_decomposed.dequantize_per_tensor.tensor ops when using the new training IR. This is because we lift params/buffers before calling make_fx. Previously, in the graph passed to make_fx, `graph.L__self___linear1.weight` was a tensor; now, in training IR, `graph.L__self___linear1.weight` is a FakeTensor. This causes the node overload to be different. Test Plan: ``` buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_dynamic_linear_with_conv ``` Differential Revision: D61364547 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134525 Approved by: https://github.com/tugsbayasgalan, https://github.com/jerryzh168	2024-09-06 07:06:06 +00:00
Shangdi Yu	bc5ecf83d7	[training ir migration] Fix quantization tests (#135184 ) Summary: Fixed some quantization tests for new training ir: Fix batch norm node pattern matcher. In training ir, we have `aten.batch_norm` node instead of `aten._native_batch_norm_legit` and `aten._native_batch_norm_legit_no_training`. Test Plan: ``` buck run fbcode//mode/dev-nosan fbcode//caffe2/test:quantization_pt2e ``` Reviewed By: tugsbayasgalan Differential Revision: D62209819 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135184 Approved by: https://github.com/tugsbayasgalan	2024-09-05 21:19:28 +00:00
Tarun Karuturi	0043dcd79e	Switch torch pt2e xnnpack tests to use export_for_training (#134788 ) Migrate all the callsites inside the pt2e XNNPACK tests to use export_for_training. Differential Revision: D61994553 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134788 Approved by: https://github.com/mergennachin	2024-09-05 16:11:18 +00:00
Jerry Zhang	3ef4c27ab3	Update pt2e numeric debugger to use node.meta["custom"] field (#134040 ) Summary: With https://github.com/pytorch/pytorch/pull/131912 we now have a "custom" field in node.meta that can be preserved in * copy/deepcopy * run_decompositions() * serialization * re-exporting So we refactored numeric debugger to use this. Test Plan: python test/test_quantization.py TestNumericDebugger Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/134040 Approved by: https://github.com/tarun292	2024-08-27 19:51:03 +00:00
Huamin Li	311af3b988	Add new ops wrapped_linear_prepack and wrapped_quantized_linear_prepacked (#134232 ) Summary: This diff adds two new operators torch.ops._quantized.wrapped_linear_prepack and torch.ops._quantized.wrapped_quantized_linear_prepacked. It is a decomposition of the op torch.ops._quantized.wrapped_quantized_linear added in the previous diff. We decompose it this way because the packed weight can be computed ahead of time, so we don't need to repack it in every forward in AOTI. Reviewed By: jerryzh168 Differential Revision: D61395887 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134232 Approved by: https://github.com/houseroad	2024-08-23 04:54:26 +00:00
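A minimal pure-Python sketch of why the split helps: the expensive packing step runs once, when the layer is built, and every later forward reuses the packed result. `PrepackedLinear` and its `_prepack` stand-in are hypothetical; they do not mirror the real `_quantized` kernels or their packed-weight format.

```python
class PrepackedLinear:
    """Hypothetical layer: pack the weight once, reuse it on every forward."""

    def __init__(self, weight_rows, bias):
        self.pack_calls = 0  # counts how often the (stand-in) packing runs
        self.packed = self._prepack(weight_rows, bias)  # done once, ahead of time

    def _prepack(self, weight_rows, bias):
        # Stand-in for a real packing routine (e.g. a layout transform);
        # here it just freezes the values into tuples.
        self.pack_calls += 1
        return tuple(tuple(row) for row in weight_rows), tuple(bias)

    def forward(self, x):
        rows, bias = self.packed  # no repacking on the hot path
        return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(rows, bias)]


layer = PrepackedLinear([[1.0, 2.0], [3.0, 4.0]], [0.5, -1.0])
print(layer.forward([1.0, 1.0]), layer.forward([1.0, 1.0]), layer.pack_calls)
```

The design choice being illustrated is simply moving work out of `forward`; the cost of packing is paid once no matter how many forwards follow.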
Shangdi Yu	b0cf287b46	[export][training ir migration] Fix getitem not exist (#134259 ) Summary: Make quantization tests compatible with the new training IR. With the new batch norm node `torch.ops.aten.batch_norm.default`, we don't need an additional getitem node after the bn node, so tests need to be fixed to not check for the getitem node. We added a capture_pre_autograd_graph_using_training_ir() function, which returns True when we are using the training ir, and False otherwise. This way, the code supports both training ir and the old ir. For now, we are just rolling out the training ir for fbcode internal tests. Test Plan: ``` buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_preserve_source_fn_stack buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_update_shared_qspec buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_conv2d buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_relu_fusion buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_fusion buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_fusion_literal_args ``` Reviewed By: andrewor14, tugsbayasgalan Differential Revision: D61292102 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134259 Approved by: https://github.com/tugsbayasgalan	2024-08-22 22:00:14 +00:00
Huamin Li	3d8db41337	Add new op wrapped_quantized_linear (#134024 ) Summary: This diff adds a new operator wrapped_quantized_linear (torch.ops._quantized.wrapped_quantized_linear) that takes the following input arguments: input (in fp32), input_scale, input_zero_point, weight (in fp32), weight_scale, weight_zero_point, bias (in fp32), output_scale, output_zero_point, and out_channel. It does the following: 1. Use quantize_per_tensor(input, input_scale, input_zero_point) to quantize the input tensor to int8 2. Use quantized::linear_prepack(weight, weight_scale, weight_zero_point, bias) to pack the weight and bias 3. Use quantized::linear to perform int8 quantized linear 4. dequantize. This new op is essentially a wrapper of multiple ops. We do this because torch.export cannot handle models that use the old quantize APIs. Reviewed By: jerryzh168 Differential Revision: D61377266 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134024 Approved by: https://github.com/houseroad	2024-08-21 09:26:58 +00:00
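The four steps above can be sketched in pure Python with per-tensor affine int8 quantization, assuming `q = clamp(round(x / scale) + zero_point, -128, 127)` and `x ≈ (q - zero_point) * scale`. This only illustrates the arithmetic under those assumptions: the function names are hypothetical, there is no real weight packing, `out_channel` is omitted, and Python's `round` (round half to even) stands in for whatever rounding the actual kernel uses.

```python
def quantize_per_tensor(values, scale, zero_point, qmin=-128, qmax=127):
    # Affine int8 quantization with saturation to [qmin, qmax].
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize_per_tensor(qvalues, scale, zero_point):
    return [(q - zero_point) * scale for q in qvalues]


def wrapped_quantized_linear_sketch(x, x_scale, x_zp, weight, w_scale, w_zp,
                                    bias, out_scale, out_zp):
    """Hypothetical pure-Python analogue of quantize -> int8 linear -> dequantize."""
    qx = quantize_per_tensor(x, x_scale, x_zp)                      # step 1
    qw = [quantize_per_tensor(row, w_scale, w_zp) for row in weight]  # step 2 (no packing)
    out = []
    for qrow, b in zip(qw, bias):                                    # step 3
        # Integer accumulation, then rescale by x_scale * w_scale and add fp32 bias.
        acc = sum((a - x_zp) * (w - w_zp) for a, w in zip(qx, qrow))
        out.append(acc * x_scale * w_scale + b)
    # Step 4: requantize to the output qparams, then dequantize back to float.
    q_out = quantize_per_tensor(out, out_scale, out_zp)
    return dequantize_per_tensor(q_out, out_scale, out_zp)


print(wrapped_quantized_linear_sketch([1.0, 2.0], 0.1, 0, [[0.5, -0.5]], 0.05, 0, [0.0], 0.1, 0))
```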
Charles David Hernandez	99e789b52b	[Fix 1/n] GPU Test skips - fbcode/ caffe2/test/quantization (#133158 ) Summary: This diff aims to fix the GPU Test skips in the quantization tests under the `caffe2/test/quantization` directory. The changes made in the `TARGETS` files include adding the `should_use_remote_gpu` flag to enable remote GPU testing. This should help to resolve the skipped tests and improve the overall test coverage. [This diff] Fixed skip count: 4 [Running total] Fixed skip count: 4 Note: Creating separate diffs for each test-group. Test Plan: 281475054644766: buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_compare_per_channel_device_numerics (caffe2.test.quantization.core.test_quantized_tensor.TestQuantizedTensor)' https://www.internalfb.com/intern/testinfra/testrun/5629499773981783 281475054644780: buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_compare_per_tensor_device_numerics (caffe2.test.quantization.core.test_quantized_tensor.TestQuantizedTensor)' https://www.internalfb.com/intern/testinfra/testrun/11540474087422107 281475054644853: buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_quant_pin_memory (caffe2.test.quantization.core.test_quantized_tensor.TestQuantizedTensor)' https://www.internalfb.com/intern/testinfra/testrun/11540474087422477 844425008078016: buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_cuda_quantization_does_not_pin_memory (caffe2.test.quantization.core.test_quantized_tensor.TestQuantizedTensor)' https://www.internalfb.com/intern/testinfra/testrun/1407375259845199 Differential Revision: D60055277 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133158 Approved by: https://github.com/jovianjaison	2024-08-16 22:00:57 +00:00
Mikayla Gawarecki	d9576c9440	Fix failures when default is flipped for weights_only (#127627 ) Tests on the XLA shard are not fixed yet, but there is an issue tracking them: https://github.com/pytorch/xla/issues/7799 Pull Request resolved: https://github.com/pytorch/pytorch/pull/127627 Approved by: https://github.com/albanD ghstack dependencies: #132349	2024-08-16 00:22:43 +00:00
Riley Dulin	d61815cb7d	[torch][ao] Use returned model from Quantizer.transform_for_annotation in prepare_pt2e (#132893 ) Summary: The Quantizer subclass can return a new model from `transform_for_annotation`, and this is common if it uses any ExportPass subclass which does not mutate in-place. Use the returned model instead of assuming it's the same. Differential Revision: D60869676 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132893 Approved by: https://github.com/jerryzh168	2024-08-12 17:23:19 +00:00
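A small sketch of the bug class fixed here, using plain dicts as stand-in "models" and two hypothetical quantizers: one mutates in place, the other returns a fresh object the way a non-mutating pass would. Only rebinding `model` to the return value works for both; none of these names are PyTorch APIs.

```python
class InPlaceQuantizer:
    """Hypothetical quantizer whose transform mutates the model it is given."""

    def transform_for_annotation(self, model):
        model["transformed_by"] = "in_place"
        return model


class CopyingQuantizer:
    """Hypothetical quantizer whose transform returns a new model (no mutation)."""

    def transform_for_annotation(self, model):
        new_model = dict(model)
        new_model["transformed_by"] = "copy"
        return new_model


def prepare_sketch(model, quantizer):
    # Rebind to the returned object instead of assuming `model` was mutated.
    model = quantizer.transform_for_annotation(model)
    return model


original = {}
print(prepare_sketch(original, CopyingQuantizer()), original)
```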
Shangdi Yu	3c5b246d3c	[export] Remove Proxy from exported programs and modules (#132956 ) Summary: Remove Proxy from exported programs and modules because they cannot be deepcopied or pickled. Test Plan: CI ``` buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r qat_conv2d buck2 run 'fbcode//mode/dev-nosan' fbcode//modai/test:test_modai -- -r test_qat_stinson_htp_export buck2 run 'fbcode//mode/dev-nosan' fbcode//vizard_projects/ml_depth/tests:test_model -- -r test_qat_model_et buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=False,use_3d_input=False buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=True,use_3d_input=False buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_fold_bn_erases_bn_node ``` Differential Revision: D60940832 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132956 Approved by: https://github.com/angelayi	2024-08-09 00:00:20 +00:00
Shangdi Yu	825002c9c6	[export][fx] More robust DCE pass (#132764 ) Summary: - make default DCE pass check schema, - need to rebase onto https://github.com/pytorch/pytorch/pull/131651 after it's in phabricator (for now the change is manually added). - mark Proxy dump as NotImplemented for better error msg - Remove Proxy from tensors when dumping models, as Proxy cannot be dumped. More details in https://docs.google.com/document/d/1G5vmTXjzxoyVGRI2kpA1gQukK_Glyg2NrE0Oh6Nlg9A/edit?usp=sharing. Test Plan: CI ``` - buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r qat_conv2d - test_export.py - buck2 run 'fbcode//mode/dev-nosan' fbcode//modai/test:test_modai -- -r test_qat_stinson_htp_export - buck2 run 'fbcode//mode/dev-nosan' fbcode//vizard_projects/ml_depth/tests:test_model -- -r test_qat_model_et - buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:fx -- -r dce - buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=False,use_3d_input=False - buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=True,use_3d_input=False - buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_fold_bn_erases_bn_node ``` Reviewed By: angelayi Differential Revision: D60319175 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132764 Approved by: https://github.com/angelayi	2024-08-06 22:27:22 +00:00
andrewor14	fc7849b93f	[pt2e][quant] Ensure BN node is erased after convert (#131651 ) Summary: Previously, when folding BN into conv, we relied on DCE to clean up the unused BN node from the graph. This works if the model is already in eval mode, but fails if the model is still in train mode because DCE doesn't remove nodes with potential side effects (in this case `_native_batch_norm_legit`). This required users to move the model to eval mode before calling convert in order to get a properly DCE'd graph. To solve this, we manually erase the BN node after folding instead of relying on DCE. This relaxes the ordering constraints between `move_exported_model_to_eval` and `convert_pt2e`. Test Plan: python test/test_quantization.py TestQuantizePT2EQAT_ConvBn1d.test_fold_bn_erases_bn_node python test/test_quantization.py TestQuantizePT2EQAT_ConvBn2d.test_fold_bn_erases_bn_node Reviewers: jerryzh168, yushangdi Subscribers: jerryzh168, yushangdi, supriyar Differential Revision: [D60520149](https://our.internmc.facebook.com/intern/diff/D60520149) Pull Request resolved: https://github.com/pytorch/pytorch/pull/131651 Approved by: https://github.com/yushangdi, https://github.com/leslie-fang-intel	2024-08-06 16:37:39 +00:00
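For context on why the BN node is dead after folding: the standard eval-mode fold rescales each output channel of the conv by `gamma / sqrt(running_var + eps)` and absorbs the shift into the bias, so the BN op no longer contributes anything. A pure-Python sketch of that arithmetic, with kernels flattened to lists; the function name is hypothetical and this is not the pt2e implementation.

```python
import math


def fold_bn_into_conv(weight, bias, gamma, beta, running_mean, running_var, eps=1e-5):
    """Fold per-output-channel BN statistics into conv weight and bias.

    weight: one flattened kernel (list of floats) per output channel.
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, m, v in zip(weight, bias, gamma, beta, running_mean, running_var):
        scale = g / math.sqrt(v + eps)
        folded_w.append([w * scale for w in w_c])  # w' = w * gamma / sqrt(var + eps)
        folded_b.append((b_c - m) * scale + bt)    # b' = (b - mean) * scale + beta
    return folded_w, folded_b


w, b = fold_bn_into_conv([[1.0, 2.0]], [0.5], [3.0], [1.0], [0.5], [4.0], eps=0.0)
print(w, b)
```

Once the folded weight and bias are in place, `conv(x)` with them equals `bn(conv(x))` with the original parameters, which is why erasing the BN node explicitly is safe regardless of train/eval mode.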
Oguz Ulgen	221350e3a4	Add None return type to init -- tests (#132352 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132352 Approved by: https://github.com/ezyang ghstack dependencies: #132335, #132351	2024-08-01 15:44:51 +00:00
Xuehai Pan	548c460bf1	[BE][Easy][7/19] enforce style for empty lines in import segments in `test/[a-c]/` and `test/[q-z]/` (#129758 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129758 Approved by: https://github.com/ezyang	2024-07-31 10:54:03 +00:00
ekamiti	9e473fd868	Make adding Buffers more like adding Parameters (#125971 ) Make creating a buffer object work the same way as creating a parameter. This is done by introducing a new Buffer class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same as the register_buffer method has not been changed. The persistent parameter in the Buffer type is to indicate whether a buffer object should be persistent or not. Other non-test changes have to do with getting the new Buffer type recognized by inductor and dynamo. Remaining changes are test changes to make sure that the Buffer type can be used as a drop in replacement for register_buffer as it just leads to register_buffer being called. The addition of this new functionality still allows for normal tensors to be used as buffers so these changes are intended to be backwards compatible. Fixes #35735 Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/125971 Approved by: https://github.com/albanD, https://github.com/anijain2305, https://github.com/mlazos	2024-07-31 10:32:40 +00:00
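The type-disambiguation idea can be mimicked without PyTorch: assigning a value wrapped in a marker class to a module attribute is routed through `register_buffer`, and the marker carries the `persistent` flag. `Buffer` and `ToyModule` here are hypothetical toy classes, not PyTorch's implementation; they only show the dispatch described above.

```python
class Buffer:
    """Toy marker type: assigning one to a module attribute registers a buffer."""

    def __init__(self, data, persistent=True):
        self.data = data
        self.persistent = persistent


class ToyModule:
    def __init__(self):
        object.__setattr__(self, "_buffers", {})
        object.__setattr__(self, "_non_persistent", set())

    def register_buffer(self, name, data, persistent=True):
        self._buffers[name] = data
        if not persistent:
            self._non_persistent.add(name)

    def __setattr__(self, name, value):
        if isinstance(value, Buffer):
            # Same code path as an explicit register_buffer call.
            self.register_buffer(name, value.data, persistent=value.persistent)
        else:
            object.__setattr__(self, name, value)

    def __getattr__(self, name):
        # Only called when normal lookup fails; fall back to registered buffers.
        buffers = self.__dict__.get("_buffers", {})
        if name in buffers:
            return buffers[name]
        raise AttributeError(name)

    def state_dict(self):
        # Non-persistent buffers are excluded, mirroring the `persistent` flag.
        return {k: v for k, v in self._buffers.items() if k not in self._non_persistent}


m = ToyModule()
m.running = Buffer([0.0, 0.0])
m.scratch = Buffer([1.0], persistent=False)
print(m.running, m.state_dict())
```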
PyTorch MergeBot	e73a4cb21f	Revert "[pt2e][quant] Ensure BN node is erased after convert (#131651 )" This reverts commit `eba2ffd278`. Reverted https://github.com/pytorch/pytorch/pull/131651 on behalf of https://github.com/ZainRizvi due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/131651#issuecomment-2256407968))	2024-07-29 16:42:24 +00:00
andrewor14	eba2ffd278	[pt2e][quant] Ensure BN node is erased after convert (#131651 ) Summary: Previously, when folding BN into conv, we relied on DCE to clean up the unused BN node from the graph. This works if the model is already in eval mode, but fails if the model is still in train mode because DCE doesn't remove nodes with potential side effects (in this case `_native_batch_norm_legit`). This required users to move the model to eval mode before calling convert in order to get a properly DCE'd graph. To solve this, we manually erase the BN node after folding instead of relying on DCE. This relaxes the ordering constraints between `move_exported_model_to_eval` and `convert_pt2e`. Test Plan: python test/test_quantization.py TestQuantizePT2EQAT_ConvBn1d.test_fold_bn_erases_bn_node python test/test_quantization.py TestQuantizePT2EQAT_ConvBn2d.test_fold_bn_erases_bn_node Reviewers: jerryzh168, yushangdi Subscribers: jerryzh168, yushangdi, supriyar Pull Request resolved: https://github.com/pytorch/pytorch/pull/131651 Approved by: https://github.com/yushangdi	2024-07-26 15:30:45 +00:00
Yidi Wu	2c1851f04e	[export] fix output node's meta (#131706 ) Summary: This pr fixes all the places in strict export stack where the output node's meta is not preserved correctly. However, we're getting a new error for the test we intend to fix: `buck2 run caffe2/test/quantization:test_quantization -- -r "test_re_export_preserve_handle"`: The `get_attr` nodes have wrong metadata. I guess more things need to be fixed to get it working, but that is beyond the scope of this PR. Test Plan: buck2 run caffe2/test/quantization:test_quantization -- -r "test_re_export_preserve_handle" Differential Revision: D60198221 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131706 Approved by: https://github.com/yushangdi	2024-07-25 18:44:21 +00:00
Xu Han	72d17d95d7	[inductor] Enable dynamo for Windows. RC1 (#131286 ) Changes: 1. Enable Windows in `check_if_inductor_supported`. 2. Disable Windows in `AotCodeCompiler`. 3. Force Windows inductor to `c++20` to support `std::enable_if_t`. 4. Temporarily disable the `test_x86inductor_quantizer` UT on `Windows`; some issues still need to be fixed: https://github.com/pytorch/pytorch/pull/131308 . Based on this PR, I have successfully run the first model, `resnet18`, on Windows inductor. TODO: 1. Upgrade pytorch Windows build to `c++20`. 2. Fix and re-enable `test_x86inductor_quantizer` UT on `Windows`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131286 Approved by: https://github.com/jgong5, https://github.com/jansel	2024-07-24 15:26:55 +00:00
Shangdi Yu	29e2e2afb6	Revert D59561509: Multisect successfully blamed "D59561509: [FX][export] DCE pass, check schema for node impurity (#130395 )" for one test failure (#131341 ) Summary: This diff reverts D59561509 D59561509: [FX][export] DCE pass, check schema for node impurity (#130395) by yushangdi causes the following test failure: Tests affected: - [cogwheel:cogwheel_mtia_cmf_m5_shrunk_test#test_flow_with_verification](https://www.internalfb.com/intern/test/844425041436985/) Here's the Multisect link: https://www.internalfb.com/multisect/6533402 Here are the tasks that are relevant to this breakage: T191383430: 10+ tests unhealthy for ads_mtia_inference The backout may land if someone accepts it. If this diff has been generated in error, you can Commandeer and Abandon it. Test Plan: NA Differential Revision: D60029318 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131341 Approved by: https://github.com/angelayi	2024-07-23 05:23:47 +00:00
kausik	4f60a2e39c	Set correct output dtype for dequantize op during convert_pt2e in decomposed mode (#128953 ) Earlier the signature of dequantize ops for decomposed quantized Tensor was changed for wider use-cases where the output dtype can be different from torch.float and needs to be passed during dequantization. Please refer to: https://github.com/pytorch/pytorch/pull/121450 However, setting the correct output dtype for dequantize ops was still missing in the convert_pt2e flow. This change enables users to use the PT2E quantization flow with a non-torch.float unquantized dtype, such as torch.bfloat16. Pull Request resolved: https://github.com/pytorch/pytorch/pull/128953 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2024-07-19 04:58:02 +00:00
Jerry Zhang	793b17ebcb	Add numeric_debugger top level APIs (#130643 ) Summary: Add three top level APIs for numeric debugger in pt2e flow that can log intermediate output in the model and calculate summary for metric comparisons between nodes in two graphs * `prepare_for_propagation_comparison` * `extract_results_from_loggers` * `compare_results` Test Plan: python test/test_quantization.py -k test_prepare_for_propagation_comparison python test/test_quantization.py -k test_extract_results_from_loggers Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/130643 Approved by: https://github.com/dulinriley, https://github.com/tarun292	2024-07-18 20:54:18 +00:00
Shangdi Yu	27ded03545	[FX][export] DCE pass, check schema for node impurity (#130395 ) Change the default DCE pass to check node schema for impure nodes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130395 Approved by: https://github.com/angelayi, https://github.com/jgong5	2024-07-18 16:31:40 +00:00
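A toy dead-code-elimination pass showing the idea: walk the graph backwards, drop nodes with no users, but always keep outputs and nodes flagged as impure (here a boolean that stands in for "the op's schema says it mutates or has side effects"). This is a hypothetical sketch, not torch.fx's `Graph.eliminate_dead_code`, although that method similarly consults `Node.is_impure()`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    inputs: list = field(default_factory=list)
    is_impure: bool = False   # stand-in for a schema-based side-effect check
    is_output: bool = False


def dead_code_elimination(nodes):
    """Return names of surviving nodes; `nodes` must be in topological order."""
    users = {n.name: 0 for n in nodes}
    for n in nodes:
        for inp in n.inputs:
            users[inp] += 1
    kept = []
    for n in reversed(nodes):
        if users[n.name] == 0 and not n.is_output and not n.is_impure:
            for inp in n.inputs:
                users[inp] -= 1  # removing n frees its inputs, enabling cascades
            continue
        kept.append(n)
    return [n.name for n in reversed(kept)]


graph = [
    Node("x", is_impure=True),                  # placeholder-like node, always kept
    Node("unused_add", ["x"]),                  # no users, pure -> removed
    Node("copy_", ["x"], is_impure=True),       # no users, but mutating -> kept
    Node("relu", ["x"]),
    Node("output", ["relu"], is_output=True),
]
print(dead_code_elimination(graph))
```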
PyTorch MergeBot	433ef4e444	Revert "[FX][export] DCE pass, check schema for node impurity (#130395 )" This reverts commit `e22b0acc76`. Reverted https://github.com/pytorch/pytorch/pull/130395 on behalf of https://github.com/yushangdi due to breaking tests, need to rebase and fix ([comment](https://github.com/pytorch/pytorch/pull/130395#issuecomment-2235192986))	2024-07-18 02:46:03 +00:00
Shangdi Yu	e22b0acc76	[FX][export] DCE pass, check schema for node impurity (#130395 ) Change the default DCE pass to check node schema for impure nodes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130395 Approved by: https://github.com/angelayi, https://github.com/jgong5	2024-07-18 00:55:20 +00:00
Jerry Zhang	b893aa71ca	Rename generate_numeric_debug_handle to numeric_debugger (#130590 ) Summary: att Test Plan: CI Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/130590 Approved by: https://github.com/dulinriley, https://github.com/tarun292	2024-07-15 22:42:27 +00:00
Jerry Zhang	df9d1b44e7	Preserve _numeric_debug_handle through deepcopy and re-export (#129287 ) Summary: * Added support for preserving it during deepcopy, need to remap the args since _numeric_debug_handle refers to the nodes in the graph TODO: need to fully support re-export, currently the metadata for output node is not preserved Test Plan: python test/test_quantization.py -k test_deepcopy_preserve_handle python test/test_quantization.py -k test_copy_preserve_handle all related tests: python test/test_quantization.py -k TestGenerateNumericDebugHandle Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/129287 Approved by: https://github.com/zhxchen17	2024-07-11 02:19:41 +00:00
Jerry Zhang	4c19623800	Change numeric_debug_handle to store per-node id (#129811 ) Summary: Previously we stored edge ids in numeric_debug_handle to support operator fusion and operator decomposition throughout the stack, but according to feedback from customers, people prefer the simpler per-node id, and they are fine with not having the additional support for numerical debugging for inputs and are willing to hack around to achieve this. This PR changes the structure of numeric_debug_handle to store a unique_id for each node instead. e.g. graph: ``` node = op(input_node, weight_node) ``` Before: ``` node.meta[NUMERIC_DEBUG_HANDLE_KEY] = {input_node: id1, weight_node: id2, "output": id3} ``` After: ``` node.meta[NUMERIC_DEBUG_HANDLE_KEY] = id1 ``` Test Plan: python test/test_quantization.py -k TestGenerateNumericDebugHandle Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/129811 Approved by: https://github.com/tarun292	2024-07-08 23:36:19 +00:00
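A toy version of the "After" layout: every node gets a single integer in its `meta` dict instead of a per-edge mapping. `ToyNode`, the helper name, and the key string are hypothetical (the real `NUMERIC_DEBUG_HANDLE_KEY` value is not given here); the sketch also assumes already-assigned ids should be kept, with new ids starting above the largest existing one so they stay unique.

```python
NUMERIC_DEBUG_HANDLE_KEY = "numeric_debug_handle"  # hypothetical key value for this sketch


class ToyNode:
    def __init__(self, name):
        self.name = name
        self.meta = {}


def assign_per_node_debug_handles(nodes):
    """Give each node one unique integer id, keeping any ids already present."""
    existing = [n.meta[NUMERIC_DEBUG_HANDLE_KEY] for n in nodes if NUMERIC_DEBUG_HANDLE_KEY in n.meta]
    next_id = max(existing, default=-1) + 1
    for node in nodes:
        if NUMERIC_DEBUG_HANDLE_KEY not in node.meta:
            node.meta[NUMERIC_DEBUG_HANDLE_KEY] = next_id
            next_id += 1
    return nodes


graph = [ToyNode("input_node"), ToyNode("weight_node"), ToyNode("node")]
for n in assign_per_node_debug_handles(graph):
    print(n.name, n.meta[NUMERIC_DEBUG_HANDLE_KEY])
```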
Xia, Weiwen	36e2608783	[Quant][PT2E] enable qlinear post op fusion for dynamic quant & qat (#122667 ) Description Add fusion path for dynamic quant and for QAT. The following patterns can be matched for static quant with QAT cases: `qx -> qlinear -> add -> optional relu -> optional type convert -> optional quant` The following patterns can be matched for dynamic quant cases: `qx -> qlinear -> add -> optional relu` Test plan python test/inductor/test_mkldnn_pattern_matcher.py -k test_qlinear python test/inductor/test_cpu_cpp_wrapper.py -k test_qlinear python test/test_quantization.py -k test_linear_unary python test/test_quantization.py -k test_linear_binary Differential Revision: [D57655830](https://our.internmc.facebook.com/intern/diff/D57655830) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122667 Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel	2024-07-08 20:04:39 +00:00
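The fusion patterns listed above are linear chains with optional tails. A toy matcher over a list of op names makes the "optional" steps concrete; the op-name strings and pattern tables are hypothetical labels, not the Inductor pattern-matcher API, and greedy matching is only adequate here because every step has a distinct name.

```python
# Each pattern step is (op_name, is_optional); names are illustrative only.
STATIC_QAT_PATTERN = [("qlinear", False), ("add", False), ("relu", True),
                      ("to_dtype", True), ("quantize", True)]
DYNAMIC_PATTERN = [("qlinear", False), ("add", False), ("relu", True)]


def match_chain(ops, pattern):
    """Return how many leading ops the pattern consumes, or None if it fails."""
    i = 0
    for op, optional in pattern:
        if i < len(ops) and ops[i] == op:
            i += 1
        elif not optional:
            return None  # a required step is missing
    return i


print(match_chain(["qlinear", "add", "relu", "to_dtype", "quantize"], STATIC_QAT_PATTERN))
print(match_chain(["qlinear", "add", "quantize"], STATIC_QAT_PATTERN))
print(match_chain(["qlinear", "relu"], DYNAMIC_PATTERN))
```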
PyTorch MergeBot	784e3b4123	Revert "Change numeric_debug_handle to store per-node id (#129811 )" This reverts commit `a9a744e442`. Reverted https://github.com/pytorch/pytorch/pull/129811 on behalf of https://github.com/kit1980 due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/129811#issuecomment-2211245852))	2024-07-05 18:14:02 +00:00
Jerry Zhang	a9a744e442	Change numeric_debug_handle to store per-node id (#129811 ) Summary: Previously we stored edge ids in numeric_debug_handle to support operator fusion and operator decomposition throughout the stack, but according to feedback from customers, people prefer the simpler per-node id, and they are fine with not having the additional support for numerical debugging for inputs and are willing to hack around to achieve this. This PR changes the structure of numeric_debug_handle to store a unique_id for each node instead. e.g. graph: ``` node = op(input_node, weight_node) ``` Before: ``` node.meta[NUMERIC_DEBUG_HANDLE_KEY] = {input_node: id1, weight_node: id2, "output": id3} ``` After: ``` node.meta[NUMERIC_DEBUG_HANDLE_KEY] = id1 ``` Test Plan: python test/test_quantization.py -k TestGenerateNumericDebugHandle Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/129811 Approved by: https://github.com/tarun292	2024-07-03 22:03:31 +00:00
Tijmen Blankevoort	e3b3431c42	Fix for HistogramObserver (#129387 ) Summary: There were two problems with the HistogramObserver: 1. It does not work when someone passes a batch_size 1, tensor_size 1 data-point. 2. The Histogram doesn't seem to actually update if the range of the new x falls within the old one These issues were both fixed. On top of this, I greatly simplified the logic for the histogram updating. Now, it doesn't do the downsampling anymore, which saves a ton of memory and code. The accuracy can still be controlled with the upsampling ratio. This ratio was also too high for the accuracy we generally need here, I reduced the default for this. Also the code is cleaner now, much easier to follow what's happening. test_histogram_observer_same_inputs was likely wrong - If I pass 0s and 1s to my histogramobserver, I want them to actually count! The current test now thinks it's good to discard and ignore these values. Test Plan: You can run the included tests. Differential Revision: D58931336 Pull Request resolved: https://github.com/pytorch/pytorch/pull/129387 Approved by: https://github.com/jerryzh168	2024-07-02 15:41:44 +00:00
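A toy observer illustrating the two fixed behaviors and the upsample-only re-binning described above: a single-value first batch (degenerate range) is handled, values that fall inside the current range are always counted, and when the range grows each old bin is spread over `upsample_rate` sub-bins whose shares move into the new bins containing their centers. `ToyHistogramObserver` is hypothetical and much simpler than the real `HistogramObserver`.

```python
class ToyHistogramObserver:
    """Toy min/max histogram; a sketch, not torch.ao's HistogramObserver."""

    def __init__(self, bins=4, upsample_rate=8):
        self.bins = bins
        self.upsample_rate = upsample_rate  # sub-bins per old bin when re-binning
        self.min_val = None
        self.max_val = None
        self.histogram = [0.0] * bins

    def _bin_index(self, value, lo, hi):
        # A degenerate range (lo == hi, e.g. one observed value) maps to bin 0.
        if hi == lo:
            return 0
        idx = int((value - lo) / (hi - lo) * self.bins)
        return min(max(idx, 0), self.bins - 1)

    def _rebin(self, new_min, new_max):
        old_width = (self.max_val - self.min_val) / self.bins
        new_hist = [0.0] * self.bins
        for b, count in enumerate(self.histogram):
            # Spread each old bin uniformly over sub-bins; move each share into
            # the new bin that contains the sub-bin's center.
            for s in range(self.upsample_rate):
                center = self.min_val + (b + (s + 0.5) / self.upsample_rate) * old_width
                new_hist[self._bin_index(center, new_min, new_max)] += count / self.upsample_rate
        self.histogram = new_hist
        self.min_val, self.max_val = new_min, new_max

    def observe(self, values):
        lo, hi = min(values), max(values)
        if self.min_val is None:
            self.min_val, self.max_val = lo, hi
        elif lo < self.min_val or hi > self.max_val:
            self._rebin(min(lo, self.min_val), max(hi, self.max_val))
        # Values inside the current range are always counted, grown range or not.
        for v in values:
            self.histogram[self._bin_index(v, self.min_val, self.max_val)] += 1
        return self


obs = ToyHistogramObserver(bins=4)
obs.observe([0.5]).observe([0.0, 1.0]).observe([0.2, 0.3])
print(obs.min_val, obs.max_val, obs.histogram)
```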
Aaron Gokaslan	6c2a8b6b38	[Ez][BE]: Enable new stable ruff rules (#129825 ) Applies a bunch of new ruff lint rules that are now stable. Some of these improve efficiency or readability. Since I already did passes on the codebase for these when they were in preview, there should be relatively few changes to the codebase. This is just more for future hardening of it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129825 Approved by: https://github.com/XuehaiPan, https://github.com/jansel, https://github.com/malfet	2024-07-02 14:47:10 +00:00
leslie-fang-intel	86e2d16ba0	[Inductor][Quant] Change the schema of QLinear Binary (#129049 ) Summary We change the schema of QLinear Binary, so it will be easier to enable the corresponding gemm template. - The extra input of the binary post-op is a tensor that needs to be an input node of autotuning, so we need to move it in front of `output_scale`, which is a scalar. - We also move it in front of `bias`, since `bias` is an optional tensor for this fusion, but `other` is required for linear binary fusion. Test Plan ``` python -u -m pytest -s -v test/quantization/core/test_quantized_op.py -k qlinear python -u -m pytest -s -v test/inductor/test_mkldnn_pattern_matcher.py -k qlinear ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129049 Approved by: https://github.com/jgong5, https://github.com/jansel ghstack dependencies: #128825, #129048	2024-07-02 12:36:38 +00:00
PyTorch MergeBot	3d96217891	Revert "[BE][Easy] use `pathlib.Path` instead of `dirname` / `".."` / `pardir` (#129374 )" This reverts commit `9e1f3ecaa7`. Reverted https://github.com/pytorch/pytorch/pull/129374 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is still failing with the same error ([comment](https://github.com/pytorch/pytorch/pull/129374#issuecomment-2197801405))	2024-06-29 00:47:15 +00:00
Xuehai Pan	9e1f3ecaa7	[BE][Easy] use `pathlib.Path` instead of `dirname` / `".."` / `pardir` (#129374 ) Changes by apply order: 1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`. 2. Replace nested `os.path.dirname(os.path.dirname(...))` call with `str(Path(...).parent.parent)`. 3. Reorder `.absolute()` ~/ `.resolve()`~ and `.parent`: always resolve the path first. `.parent{...}.absolute()` -> `.absolute().parent{...}` 4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.) `.parent.parent.parent.parent` -> `.parents[3]` 5. ~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~ ~`.parents[3]` -> `.parents[4 - 1]`~ 6. ~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374 Approved by: https://github.com/justinchuby, https://github.com/malfet	2024-06-28 00:35:15 +00:00
PyTorch MergeBot	895316119d	Revert "[BE][Easy] use `pathlib.Path` instead of `dirname` / `".."` / `pardir` (#129374 )" This reverts commit `0314c4c101`. Reverted https://github.com/pytorch/pytorch/pull/129374 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it causes lots of internal build failures where they fail to find hipify module ([comment](https://github.com/pytorch/pytorch/pull/129374#issuecomment-2192437052))	2024-06-26 19:03:57 +00:00
Xuehai Pan	0314c4c101	[BE][Easy] use `pathlib.Path` instead of `dirname` / `".."` / `pardir` (#129374 ) Changes by apply order: 1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`. 2. Replace nested `os.path.dirname(os.path.dirname(...))` call with `str(Path(...).parent.parent)`. 3. Reorder `.absolute()` ~/ `.resolve()`~ and `.parent`: always resolve the path first. `.parent{...}.absolute()` -> `.absolute().parent{...}` 4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.) `.parent.parent.parent.parent` -> `.parents[3]` 5. ~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~ ~`.parents[3]` -> `.parents[4 - 1]`~ 6. ~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374 Approved by: https://github.com/justinchuby, https://github.com/malfet	2024-06-25 08:28:38 +00:00
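Steps 2 and 4 of the refactor can be checked with the standard library alone. `PurePosixPath` and `posixpath` are used so the result does not depend on the host OS; the example path is made up.

```python
import posixpath
from pathlib import PurePosixPath

path_str = "/repo/test/quantization/pt2e/test_file.py"
p = PurePosixPath(path_str)

# Step 2: nested os.path.dirname calls vs. str(Path(...).parent.parent)
nested_dirname = posixpath.dirname(posixpath.dirname(path_str))
via_parent = str(p.parent.parent)

# Step 4: four chained .parent calls vs. a single .parents[3] lookup (N = 4)
chained = p.parent.parent.parent.parent
indexed = p.parents[3]

print(nested_dirname, via_parent)  # /repo/test/quantization /repo/test/quantization
print(chained, indexed)            # /repo /repo
```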
cyy	cb5e9183c6	[Caffe2] [2/N] Remove Caffe2 from tests (#128911 ) Follows #128675 Pull Request resolved: https://github.com/pytorch/pytorch/pull/128911 Approved by: https://github.com/titaiwangms, https://github.com/r-barnes	2024-06-19 00:05:50 +00:00
cyy	163847b1bb	[1/N] [Caffe2] Remove caffe2_aten_fallback code (#128675 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/128675 Approved by: https://github.com/r-barnes	2024-06-17 21:25:59 +00:00
vasiliy	2d01f87737	Enable torch.empty for float8 dtypes + deterministic mode + cpu (#128744 ) Summary: Enables creating empty float8 tensors for: * cuda when `torch.use_deterministic_algorithms` is set to True * cpu for all settings of `torch.use_deterministic_algorithms` Context for NaN values of float8_e4m3fn and float8_e5m2: https://arxiv.org/pdf/2209.05433, Section 3, Table 1 Context for NaN values of float8_e4m3fnuz and float8_e5m2fnuz: https://arxiv.org/pdf/2206.02915, Section 3.2, "instead of reserving one exponent field to represent Inf and NaN, we reserve only a single codeword (corresponding to negative zero)" Test Plan: ``` python test/test_quantization.py -k test_empty ``` Reviewers: Subscribers: Tasks: Tags: Fixes https://github.com/pytorch/pytorch/issues/128733 Pull Request resolved: https://github.com/pytorch/pytorch/pull/128744 Approved by: https://github.com/malfet, https://github.com/drisspg	2024-06-15 02:05:30 +00:00
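The float8_e4m3fn layout (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, NaN only at S.1111.111) can be checked with a small pure-Python decoder. The function name is hypothetical and it does not call into PyTorch; it only shows why e4m3fn has exactly two NaN bit patterns, whereas the fnuz variants instead reserve the single negative-zero code (0x80) for NaN.

```python
import math


def decode_float8_e4m3fn(byte):
    """Decode one float8_e4m3fn bit pattern (sign:1, exponent:4, mantissa:3, bias 7)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0xF
    mantissa = byte & 0x7
    if exponent == 0xF and mantissa == 0x7:
        return math.nan  # the only NaN encoding in the "fn" variant; no infinities
    if exponent == 0:
        return sign * (mantissa / 8) * 2.0 ** (1 - 7)  # subnormal
    return sign * (1 + mantissa / 8) * 2.0 ** (exponent - 7)


print(decode_float8_e4m3fn(0x38), decode_float8_e4m3fn(0x7E), decode_float8_e4m3fn(0x7F))
# 1.0 448.0 nan
```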
yiliu30	4669c6d3ae	[quant][pt2e][quantizer] Support `set_module_name_qconfig` in X86InductorQuantizer (#126044 ) Summary: Added `set_module_name_qconfig` support to allow users to set configurations based on module name in `X86InductorQuantizer`. For example, only quantize the `sub`: ```python class M(torch.nn.Module): def __init__(self): super().__init__() self.linear = torch.nn.Linear(5, 5) self.sub = Sub() def forward(self, x): x = self.linear(x) x = self.sub(x) return x m = M().eval() example_inputs = (torch.randn(3, 5),) # Set config for a specific submodule. quantizer = X86InductorQuantizer() quantizer.set_module_name_qconfig("sub", xiq.get_default_x86_inductor_quantization_config()) ``` - Added `set_module_name_qconfig` to allow users to set the configuration at the `module_name` level. - Unified the annotation process to follow this order: `module_name_qconfig`, `operator_type_qconfig`, and `global_config`. - Added `config_checker` to validate all user configurations and prevent mixing of static/dynamic or QAT/non-QAT configs. - Moved `_get_module_name_filter` from `xnnpack_quantizer.py` into `utils.py` as it is common to all quantizers. Test Plan ```bash python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_set_module_name ``` @Xia-Weiwen @leslie-fang-intel @jgong5 Pull Request resolved: https://github.com/pytorch/pytorch/pull/126044 Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/jerryzh168	2024-06-14 07:13:10 +00:00
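The annotation order above (module name, then operator type, then global) amounts to a simple precedence lookup. A pure-Python sketch with a hypothetical `resolve_qconfig` helper; treating a module name as also covering its submodules (`"sub"` matches `"sub.linear"`) and letting the longest matching name win are assumptions of this sketch, not statements about the quantizer's code.

```python
def resolve_qconfig(module_path, op_type, module_name_qconfig, operator_type_qconfig, global_config):
    """Hypothetical lookup: module name > operator type > global config."""
    # Assumption: a name covers itself and its submodules; the longest match wins.
    best_name = None
    for name in module_name_qconfig:
        if module_path == name or module_path.startswith(name + "."):
            if best_name is None or len(name) > len(best_name):
                best_name = name
    if best_name is not None:
        return module_name_qconfig[best_name]
    if op_type in operator_type_qconfig:
        return operator_type_qconfig[op_type]
    return global_config


# Mirrors the commit's example: only `sub` is configured, global config is None.
print(resolve_qconfig("sub.linear", "linear", {"sub": "sub_cfg"}, {}, None))  # sub_cfg
print(resolve_qconfig("linear", "linear", {"sub": "sub_cfg"}, {}, None))      # None
```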
PyTorch MergeBot	3ddec713b8	Revert "[cuDNN][Quantization] Don't print when plan finalization fails in cuDNN quantization backend (#128177 )" This reverts commit `cac7a22b92`. Reverted https://github.com/pytorch/pytorch/pull/128177 on behalf of https://github.com/clee2000 due to broke test/test_quantization.py::TestQuantizedLinear::test_qlinear_cudnn on sm86 tests `cac7a22b92` https://github.com/pytorch/pytorch/actions/runs/9470648757/job/26100448913. Probably a landrace, test ran on the PR and succeed ([comment](https://github.com/pytorch/pytorch/pull/128177#issuecomment-2161977110))	2024-06-12 02:20:15 +00:00
Eddie Yan	cac7a22b92	[cuDNN][Quantization] Don't print when plan finalization fails in cuDNN quantization backend (#128177 ) Similar in spirit to #125790, hopefully addresses failures seen for cuDNN 9.1 upgrade: #https://github.com/pytorch/pytorch/pull/128166 CC @nWEIdia @atalman Pull Request resolved: https://github.com/pytorch/pytorch/pull/128177 Approved by: https://github.com/nWEIdia, https://github.com/Skylion007	2024-06-11 18:09:25 +00:00
Eddie Yan	54fe2d0e89	[cuDNN][quantization] skip qlinear test in cuDNN v9.1.0 (#128166 ) #120006 unskipped this test only 3 days ago, so we don't consider it a blocker for cuDNN v9 for now. CC @atalman Pull Request resolved: https://github.com/pytorch/pytorch/pull/128166 Approved by: https://github.com/atalman, https://github.com/nWEIdia	2024-06-06 21:43:29 +00:00
eqy	ac568fc007	[CUDNN] Remove defunct cuDNN V8 API build flag (#120006 ) The flag basically does nothing following #95722 Let's see if the quantization tests break CC @malfet @atalmanagement Pull Request resolved: https://github.com/pytorch/pytorch/pull/120006 Approved by: https://github.com/malfet	2024-06-03 22:42:05 +00:00
Kwanghoon An	24a4bfdcc2	[AdaRound] Make versatile for data / extra param for callback function (#126891 ) Summary: For speech sequential models, `model(data)` may not work correctly for the feed-forward pass: speech models use a different type of criterion (a.k.a. loss function) to feed data to individual components such as the encoder, predictor, and joiner. Hence we need an extra parameter to pass a feed-forward wrapper. Differential Revision: D57680391 Pull Request resolved: https://github.com/pytorch/pytorch/pull/126891 Approved by: https://github.com/jerryzh168	2024-05-29 20:05:27 +00:00
Kwanghoon An	c404b2968c	Support min/max carry over for eager mode from_float method (#127309 ) Summary: After QAT has completed, or when a pre-tuned weight observer is given via a tunable PTQ algorithm, `from_float` should not overwrite the observed min/max by re-running the observer on the given weight; for static QAT it never should. By design, dynamic QAT does not require re-running the weight observer either. This is a fix. Test Plan: Signals Differential Revision: D57747749 Pull Request resolved: https://github.com/pytorch/pytorch/pull/127309 Approved by: https://github.com/jerryzh168	2024-05-29 19:33:26 +00:00
PyTorch MergeBot	980f5ac049	Revert "[Quant][PT2E] enable qlinear post op fusion for dynamic quant & qat (#122667 )" This reverts commit `3642e51ea5`. Reverted https://github.com/pytorch/pytorch/pull/122667 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/122667#issuecomment-2122642317))	2024-05-21 13:45:07 +00:00
Xia, Weiwen	3642e51ea5	[Quant][PT2E] enable qlinear post op fusion for dynamic quant & qat (#122667 ) Description Add fusion path for dynamic quant and for QAT. The following patterns can be matched for static quant with QAT cases: `qx -> qlinear -> add -> optional relu -> optional type convert -> optional quant` The following patterns can be matched for dynamic quant cases: `qx -> qlinear -> add -> optional relu` Test plan python test/inductor/test_mkldnn_pattern_matcher.py -k test_qlinear python test/inductor/test_cpu_cpp_wrapper.py -k test_qlinear python test/test_quantization.py -k test_linear_unary python test/test_quantization.py -k test_linear_binary Pull Request resolved: https://github.com/pytorch/pytorch/pull/122667 Approved by: https://github.com/jgong5	2024-05-20 15:55:18 +00:00
Kwanghoon An	eb0b16db92	Initial implementation of AdaRound (#126153 ) Summary: This is an implementation of AdaRound from the paper https://arxiv.org/abs/2004.10568 . This algorithm is going to be used by multiple people, so we need to make it an official implementation. Differential Revision: D57227565 Pull Request resolved: https://github.com/pytorch/pytorch/pull/126153 Approved by: https://github.com/jerryzh168, https://github.com/huydhn	2024-05-17 19:44:50 +00:00
andrewor14	6931f781c2	[quant][pt2e] Allow multi users without output observers (#126487 ) Summary: The PT2E quantization flow does not support unquantized outputs yet. To work around this, users may wish to remove the output observer from their graphs. However, this fails currently in some cases because the `PortNodeMetaForQDQ` pass is too restrictive, for example: ``` conv -> obs -------> output0 \\-> add -> output1 ``` Previously we expected conv to always have exactly 1 user, which is the observer. When the observer is removed, however, conv now has 2 users, and this fails the check. ``` conv -------> output0 \\-> add -> output1 ``` This commit relaxes the error into a warning to enable this workaround. Test Plan: python test/test_quantization.py TestQuantizePT2E.test_multi_users_without_output_observer Reviewers: jerryzh168 Subscribers: jerryzh168, supriyar Differential Revision: [D57472601](https://our.internmc.facebook.com/intern/diff/D57472601) Pull Request resolved: https://github.com/pytorch/pytorch/pull/126487 Approved by: https://github.com/tarun292	2024-05-17 18:48:21 +00:00
PyTorch MergeBot	ae6fdfa539	Revert "Initial implementation of AdaRound (#126153 )" This reverts commit `175c18af81`. Reverted https://github.com/pytorch/pytorch/pull/126153 on behalf of https://github.com/huydhn due to Sorry for reverting your change but the lint failure is legit because there are more than one lint issues, torch/optim/asgd.py is just the last one ([comment](https://github.com/pytorch/pytorch/pull/126153#issuecomment-2113902522))	2024-05-16 02:34:49 +00:00
Kwanghoon An	175c18af81	Initial implementation of AdaRound (#126153 ) Summary: This is an implementation of AdaRound from the paper https://arxiv.org/abs/2004.10568 . This algorithm is going to be used by multiple people, so we need to make it an official implementation. Differential Revision: D57227565 Pull Request resolved: https://github.com/pytorch/pytorch/pull/126153 Approved by: https://github.com/jerryzh168	2024-05-16 02:09:18 +00:00
andrewor14	3cba50e478	[quant] Make per_group and per_token quant match torch.fake_quantize (#125781 ) Summary: Follow-up to https://github.com/pytorch/ao/pull/229. This resolves the difference between `input.div(scales)` and `input.mul(1.0 / scales)`, which results in small numerical discrepancies on some inputs. Test Plan: python test/test_quantization.py TestQuantizedTensor.test_decomposed_quantize_per_channel_group python test/test_quantization.py TestQuantizedTensor.test_decomposed_quantize_per_token Reviewers: jerryzh168 Subscribers: jerryzh168, supriyar Pull Request resolved: https://github.com/pytorch/pytorch/pull/125781 Approved by: https://github.com/jerryzh168	2024-05-14 18:18:54 +00:00
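The commit above contrasts two formulations that can disagree by one unit in the last place. The sketch below uses Python floats, which are IEEE 754 doubles. These PyTorch ops typically run in float32, so the specific input is only illustrative. `quantize_div` and `quantize_mul_inv` are hypothetical helpers. Beyond matching `torch.fake_quantize`, the commit does not say which form the decomposed ops ended up using.

```python
def quantize_div(x, scale, zero_point, qmin=-128, qmax=127):
    # divide by the scale, then round half to even (Python's round)
    return max(qmin, min(qmax, round(x / scale) + zero_point))


def quantize_mul_inv(x, scale, zero_point, qmin=-128, qmax=127):
    # multiply by a precomputed reciprocal instead of dividing
    inv_scale = 1.0 / scale
    return max(qmin, min(qmax, round(x * inv_scale) + zero_point))


x, scale = 49.0, 49.0
print(x / scale)          # 1.0
print(x * (1.0 / scale))  # 0.9999999999999999
print(quantize_div(x, scale, 0), quantize_mul_inv(x, scale, 0))  # 1 1
```

In this case both helpers still return 1, because the one-ulp gap does not cross a rounding boundary. The integer results differ only when `x / scale` lands close enough to a half-integer for that gap to flip the rounding.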
Aaron Gokaslan	34910f87f0	[BE]: Update ruff to v0.4.4 (#125031 ) Update ruff version to 0.4.2. This version mostly has bugfixes for the new parser and also updates the f-string rule to be able to apply more fixes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125031 Approved by: https://github.com/albanD, https://github.com/malfet	2024-05-12 20:02:37 +00:00
leslie-fang-intel	d83ab88f81	[Inductor] [Quant] Enable lowering of quant per tensor and refactor quant pattern (#124041 ) Summary Per the discussion in https://github.com/pytorch/pytorch/pull/123444, the `decomposed quant/dequant` patterns changed after https://github.com/pytorch/pytorch/pull/123445, we can move the optimization of `decomposed quant/dequant` from inductor decomposition into lowering phase to avoid the changes. In this way, we can: - Avoid the pattern matcher failure introduced in https://github.com/pytorch/pytorch/pull/123445 - Make the quantization pattern clearer in the pattern matcher phase, since the `quant/dequant` nodes have not been decomposed. Changes in this PR - Move optimization of `decomposed quant/dequant` from inductor decomposition into lowering phase. - Corresponding changes in the quantization pattern matcher to ensure no bc-breaking. TestPlan ``` python -u -m pytest -s -v test/inductor/test_mkldnn_pattern_matcher.py -k test_q ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/124041 Approved by: https://github.com/peterbell10, https://github.com/jgong5	2024-05-09 08:40:44 +00:00
PyTorch MergeBot	ea3f625e32	Revert "[Inductor] [Quant] Enable lowering of quant per tensor and refactor quant pattern (#124041 )" This reverts commit `33e6791645`. Reverted https://github.com/pytorch/pytorch/pull/124041 on behalf of https://github.com/huydhn due to Sorry for reverting your change but I think there is a land race with the change `33e6791645` ([comment](https://github.com/pytorch/pytorch/pull/124041#issuecomment-2101766558))	2024-05-09 01:34:19 +00:00
leslie-fang-intel	33e6791645	[Inductor] [Quant] Enable lowering of quant per tensor and refactor quant pattern (#124041 ) Summary Per the discussion in https://github.com/pytorch/pytorch/pull/123444, the `decomposed quant/dequant` patterns changed after https://github.com/pytorch/pytorch/pull/123445, we can move the optimization of `decomposed quant/dequant` from inductor decomposition into lowering phase to avoid the changes. In this way, we can: - Avoid the pattern matcher failure introduced in https://github.com/pytorch/pytorch/pull/123445 - Make the quantization pattern clearer in the pattern matcher phase, since the `quant/dequant` nodes have not been decomposed. Changes in this PR - Move optimization of `decomposed quant/dequant` from inductor decomposition into lowering phase. - Corresponding changes in the quantization pattern matcher to ensure no bc-breaking. TestPlan ``` python -u -m pytest -s -v test/inductor/test_mkldnn_pattern_matcher.py -k test_q ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/124041 Approved by: https://github.com/peterbell10, https://github.com/jgong5	2024-05-09 00:54:22 +00:00
PyTorch MergeBot	1b396d69cb	Revert "[CUDNN] Remove defunct cuDNN V8 API build flag (#120006 )" This reverts commit `ee4cafa098`. Reverted https://github.com/pytorch/pytorch/pull/120006 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing ROCm jobs in trunk `ee4cafa098` ([comment](https://github.com/pytorch/pytorch/pull/120006#issuecomment-2098849813))	2024-05-07 16:28:04 +00:00
eqy	ee4cafa098	[CUDNN] Remove defunct cuDNN V8 API build flag (#120006 ) The flag basically does nothing following #95722 Let's see if the quantization tests break CC @malfet @atalmanagement Pull Request resolved: https://github.com/pytorch/pytorch/pull/120006 Approved by: https://github.com/malfet	2024-05-06 23:13:58 +00:00
andrewor14	8242fb62a7	[quant][pt2e] Fix conv-bn weight + bias per channel QAT (#125208 ) Summary: This commit fixes the pattern matching for conv-bn during QAT fusion where both weight and bias are quantized per channel. Previously this failed because weights and biases used the same example kwargs for their scales and zero points, causing these qparams to be tied during pattern matching. Test Plan: python test/test_quantization.py TestQuantizePT2EQAT_ConvBn1d.test_qat_conv_bn_per_channel_weight_bias python test/test_quantization.py TestQuantizePT2EQAT_ConvBn2d.test_qat_conv_bn_per_channel_weight_bias Reviewers: jerryzh168, angelayi Subscribers: jerryzh168, angelayi, supriyar Differential Revision: [D56740694](https://our.internmc.facebook.com/intern/diff/D56740694) Pull Request resolved: https://github.com/pytorch/pytorch/pull/125208 Approved by: https://github.com/angelayi	2024-04-30 18:12:25 +00:00
Xia, Weiwen	35b332882b	[Quant][PT2E] Enable linear-binary(-unary) post-op recipe for X86Inductor quantizer (#122387 ) As the title Test plan python test/test_quantization.py -k test_linear_binary Differential Revision: [D56288440](https://our.internmc.facebook.com/intern/diff/D56288440) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122387 Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5 ghstack dependencies: #123240	2024-04-27 02:40:57 +00:00
andrewor14	85b28ffc3a	[quant][pt2e] Move batch norm op between eval/train for cuda (#123957 ) Summary: Before in `move_exported_model_to_train/eval`, we only switched the CPU versions of the batch norm op. This commit adds support for the cuda versions of the op too. Note that this fix is temporary; we won't have to differentiate between these two cases once we have batch norm consolidation. Test Plan: python test/test_quantization.py -k test_move_exported_model_bn Reviewers: jerryzh168 Subscribers: jerryzh168, leslie-fang-intel, supriyar Differential Revision: [D56070054](https://our.internmc.facebook.com/intern/diff/D56070054) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123957 Approved by: https://github.com/jerryzh168	2024-04-24 22:01:50 +00:00
Shen Xu	8885638f95	[quant][pt2e] Propagate get_attr meta through known ops only (#124415 ) Summary: Avoid situation where the graph traversal finds a matmul node with a `get_attr` as its `args[0]`, and incorrectly propagate the `get_attr`'s meta to everything downstream. Test Plan: CI Differential Revision: D56219120 Pull Request resolved: https://github.com/pytorch/pytorch/pull/124415 Approved by: https://github.com/jerryzh168	2024-04-24 20:55:56 +00:00
PyTorch MergeBot	e739a2d59e	Revert "[quant][pt2e] Move batch norm op between eval/train for cuda (#123957 )" This reverts commit `4efb28c900`. Reverted https://github.com/pytorch/pytorch/pull/123957 on behalf of https://github.com/jeanschmidt due to reverting to check if it will fix rocm jobs on main ([comment](https://github.com/pytorch/pytorch/pull/123957#issuecomment-2075158146))	2024-04-24 15:02:11 +00:00
andrewor14	4efb28c900	[quant][pt2e] Move batch norm op between eval/train for cuda (#123957 ) Summary: Before in `move_exported_model_to_train/eval`, we only switched the CPU versions of the batch norm op. This commit adds support for the cuda versions of the op too. Note that this fix is temporary; we won't have to differentiate between these two cases once we have batch norm consolidation. Test Plan: python test/test_quantization.py -k test_move_exported_model_bn Reviewers: jerryzh168 Subscribers: jerryzh168, leslie-fang-intel, supriyar Differential Revision: [D56070054](https://our.internmc.facebook.com/intern/diff/D56070054) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123957 Approved by: https://github.com/jerryzh168	2024-04-24 01:02:59 +00:00
Amadeusz Skrzypczak	107f944f22	Support fp8 quantization (#123161 ) This commit enables float8_e5m2 and float8_e4m3fn dtypes in fx quantization and PT2E. Motivation for using fp8 quantization instead of int8: - it works better to run inference with the same datatype the model was trained with, - fp8 can handle outliers better, which is one of the problems in LLMs activations. The numerical recipe we want to use it for is fp8 inference: - bgemms/gemms running in float8_e4m3fn, - Per-Tensor-Quantization/Scaling, - amax observer for measurement with input_backoff and weight_backoff. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123161 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2024-04-23 13:35:27 +00:00
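The recipe in the commit above uses per-tensor scaling with an amax observer, roughly as sketched below. `per_tensor_scale` and `scale_and_clamp` are hypothetical helpers that operate on Python lists, and 448.0 is the largest finite float8_e4m3fn value. The commit mentions `input_backoff` and `weight_backoff` factors but does not define them, so they are left out. The final rounding onto the float8 grid is also omitted.

```python
E4M3FN_MAX = 448.0  # largest finite float8_e4m3fn value


def per_tensor_scale(values, fp8_max=E4M3FN_MAX):
    # amax observer: one scale for the whole tensor, chosen so amax maps to fp8_max
    amax = max(abs(v) for v in values)
    return amax / fp8_max if amax > 0 else 1.0


def scale_and_clamp(values, scale, fp8_max=E4M3FN_MAX):
    # values that would then be cast to float8; the cast itself is not modeled
    return [max(-fp8_max, min(fp8_max, v / scale)) for v in values]


x = [-896.0, 10.0, 3.5]
s = per_tensor_scale(x)
print(s, scale_and_clamp(x, s))  # 2.0 [-448.0, 5.0, 1.75]
```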
Mikayla Gawarecki	c82fcb7b30	Add testing and fix `weights_only` load for quantized types and nn.Parameters with python attrs (#124330 ) Adds the following to allowed globals for the `weights_only` unpickler - [x] `torch._utils._rebuild_qtensor` and qtensor related types - [x] `torch._utils._rebuild_parameter_with_state` (used deserializing a parameter that has user-defined attributes like `Param.foo`) The remaining rebuild functions that have not been allowlisted are - [x] `torch._utils._rebuild_wrapper_subclass` (allowlisted in above PR) - [ ] `torch._utils._rebuild_device_tensor_from_numpy` - [ ] `torch._utils._rebuild_xla_tensor` (legacy) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124330 Approved by: https://github.com/albanD	2024-04-23 04:13:26 +00:00
leslie-fang-intel	dd440ac734	Add Matmul recipe into x86_inductor_quantizer (#122776 ) Summary Add `matmul` in the quantization recipes, noting that it's not a general recipe but tailored to meet accuracy criteria for specific models. `matmul` recipe is disabled by default. Test Plan ``` python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_attention_block ``` Differential Revision: [D56288468](https://our.internmc.facebook.com/intern/diff/D56288468) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122776 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2024-04-23 00:25:41 +00:00
Aaron Gokaslan	5a1216bb2e	[BE]: Update ruff to 0.4.1 (#124549 ) Update ruff to 0.4.1 . This version fixes a lot false negatives/false positives, is 20-40% faster, and has various other bug fixes. Below is a before and after table showing the execution time of ruff lint and ruff format in milliseconds courtesy of https://astral.sh/blog/ruff-v0.4.0 \| Repository \| Linter (v0.3) \| Linter (v0.4) \| Formatter (v0.3) \| Formatter (v0.4) \| \|----------------------------------------------------\|---------------\|---------------\|------------------\|------------------\| \| [pytorch/pytorch](https://github.com/pytorch/pytorch) \| 328.7 \| 251.8 \| 351.1 \| 274.9 \| Pull Request resolved: https://github.com/pytorch/pytorch/pull/124549 Approved by: https://github.com/ezyang	2024-04-21 14:06:23 +00:00
andrewor14	3eea300680	[quant] Do not decompose choose_qparams_per_token_asymmetric (#124178 ) Summary: https://github.com/pytorch/pytorch/pull/123452 added backward support to this op by turning it into CompositeImplicitAutograd, which meant it gets decomposed during export/compile. However, this is not desirable behavior for the PTQ case when we try to lower the model. This commit enables QAT without breaking PTQ by refactoring the impl into a separate op that does have backward support. Test Plan: python test/test_quantization.py -k test_decomposed_choose_qparams_per_token_asymmetric_backward Reviewers: jerryzh168, digantdesai, zou3519 Subscribers: jerryzh168, digantdesai, zou3519, supriyar Differential Revision: [D56192116](https://our.internmc.facebook.com/intern/diff/D56192116) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124178 Approved by: https://github.com/digantdesai	2024-04-16 22:58:48 +00:00
WeiChunyu-star	635c238bad	Enable UFMT on all of test/quantization/jit &pt2e (#124010 ) Partially addresses #123062 Ran lintrunner on: - test/quantization/jit - test/quantization/pt2e Detail: ``` $ lintrunner -a --take UFMT --all-files ok No lint issues. Successfully applied all patches. ``` cc, please @ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/124010 Approved by: https://github.com/ezyang	2024-04-14 06:07:23 +00:00
WeiChunyu-star	6ac8fe46dd	Enable UFMT on all of test/quantization/ao_migration &bc (#123994 ) Partially addresses #123062 Ran lintrunner on: - test/quantization/ao_migration - test/quantization/bc Detail: ``` $ lintrunner -a --take UFMT --all-files ok No lint issues. Successfully applied all patches. ``` @ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/123994 Approved by: https://github.com/ezyang	2024-04-13 06:36:10 +00:00
Aaron Gokaslan	1d6c5972c1	[BE]: Optimize min/max/sum comprehensions C419 (#123960 ) Automatic fixes that replaces certain list comprehensions with generator ones where appropriate so that they are immediately consumed. This is preview functionality in ruff for rule C419 and it was automatically applied. Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/123960 Approved by: https://github.com/malfet	2024-04-12 23:54:15 +00:00
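The automatic C419 fix described above has the following shape: a list comprehension passed straight into a reduction becomes a generator expression. For these calls the results are identical. The change only avoids materializing the temporary list.

```python
values = [3, 1, 4, 1, 5]

# Before: a temporary list is built only to be consumed once by the reduction.
total_before = sum([v * v for v in values])
largest_odd_before = max([v for v in values if v % 2 == 1])

# After the C419 fix: a generator expression feeds the reduction directly.
total_after = sum(v * v for v in values)
largest_odd_after = max(v for v in values if v % 2 == 1)

print(total_after, largest_odd_after)  # 52 5
```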
andrewor14	762e19606e	[quant] Enable backward for choose_qparams_per_token_asymmetric (#123452 ) Summary: When running the backward for this op, we get the error: ``` RuntimeError: derivative for aten::aminmax is not implemented ``` This commit replaces this call with separate amin and amax calls instead, which do have implemented derivatives. Test Plan: python test/test_quantization.py -k test_decomposed_choose_qparams_per_token_asymmetric_backward Reviewers: jerryzh168, digantdesai Subscribers: jerryzh168, digantdesai, supriyar Differential Revision: [D55805170](https://our.internmc.facebook.com/intern/diff/D55805170) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123452 Approved by: https://github.com/digantdesai, https://github.com/jerryzh168, https://github.com/zou3519	2024-04-12 20:05:56 +00:00
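For context on the amin/amax change above, here is a sketch of asymmetric per-token (per-row) qparams computed with separate min and max reductions. It uses the common textbook formulas: the range is widened to include zero, and `zero_point = qmin - round(min / scale)`. PyTorch's actual op differs in details such as an epsilon clamp on the scale. The all-zero-row guard below is a hypothetical simplification, and the autograd aspect is not modeled.

```python
def choose_qparams_per_token_asymmetric(rows, qmin=-128, qmax=127):
    """Return (scales, zero_points), one pair per row ("token")."""
    scales, zero_points = [], []
    for row in rows:
        # separate min and max reductions, the analogue of amin/amax
        lo = min(min(row), 0.0)  # widen the range so 0.0 is exactly representable
        hi = max(max(row), 0.0)
        scale = (hi - lo) / (qmax - qmin)
        if scale == 0.0:
            scale = 1.0  # hypothetical guard for all-zero rows
        zero_point = qmin - round(lo / scale)
        scales.append(scale)
        zero_points.append(max(qmin, min(qmax, zero_point)))
    return scales, zero_points


scales, zps = choose_qparams_per_token_asymmetric([[-1.0, 0.0, 3.0], [0.0, 0.0]])
print(zps)  # [-64, -128]
```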
andrewor14	5c0a380bdf	[pt2e][qat] Support conv-transpose-bn[-relu] QAT fusion (#123652 ) Summary: This commit adds support for QAT fusion for the [conv-transpose-bn] and [conv-transpose-bn-relu] patterns. Test Plan: python test/test_quantization.py TestQuantizePT2EQAT_ConvBn1d.test_qat_conv_transpose_bn python test/test_quantization.py TestQuantizePT2EQAT_ConvBn1d.test_qat_conv_transpose_bn_relu python test/test_quantization.py TestQuantizePT2EQAT_ConvBn2d.test_qat_conv_transpose_bn python test/test_quantization.py TestQuantizePT2EQAT_ConvBn2d.test_qat_conv_transpose_bn_relu Reviewers: jerryzh168 Subscribers: jerryzh168, supriyar Tasks: https://github.com/pytorch/pytorch/issues/122224 Differential Revision: [D55930704](https://our.internmc.facebook.com/intern/diff/D55930704) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123652 Approved by: https://github.com/jerryzh168	2024-04-12 17:16:02 +00:00
PyTorch MergeBot	5669334175	Revert "Add Matmul recipe into x86_inductor_quantizer (#122776 )" This reverts commit `e8e9261b90`. Reverted https://github.com/pytorch/pytorch/pull/122776 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/122776#issuecomment-2051073373))	2024-04-12 06:29:27 +00:00
PyTorch MergeBot	fe092da874	Revert "[quant] Enable backward for choose_qparams_per_token_asymmetric (#123452 )" This reverts commit `c83900887f`. Reverted https://github.com/pytorch/pytorch/pull/123452 on behalf of https://github.com/clee2000 due to broke test_quantization.py::TestQuantizedTensor::test_decomposed_choose_qparams_per_token_asymmetric_backward on multiple jobs `c83900887f` https://github.com/pytorch/pytorch/actions/runs/8648781225/job/23714753103, probably a landrace ([comment](https://github.com/pytorch/pytorch/pull/123452#issuecomment-2050056601))	2024-04-11 16:19:28 +00:00
andrewor14	c83900887f	[quant] Enable backward for choose_qparams_per_token_asymmetric (#123452 ) Summary: When running the backward for this op, we get the error: ``` RuntimeError: derivative for aten::aminmax is not implemented ``` This commit replaces this call with separate amin and amax calls instead, which do have implemented derivatives. Test Plan: python test/test_quantization.py -k test_decomposed_choose_qparams_per_token_asymmetric_backward Reviewers: jerryzh168, digantdesai Subscribers: jerryzh168, digantdesai, supriyar Differential Revision: [D55805170](https://our.internmc.facebook.com/intern/diff/D55805170) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123452 Approved by: https://github.com/digantdesai, https://github.com/jerryzh168	2024-04-11 14:51:42 +00:00
leslie-fang-intel	e8e9261b90	Add Matmul recipe into x86_inductor_quantizer (#122776 ) Summary Add `matmul` in the quantization recipes, noting that it's not a general recipe but tailored to meet accuracy criteria for specific models. `matmul` recipe is disabled by default. Test Plan ``` python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_attention_block ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/122776 Approved by: https://github.com/jgong5, https://github.com/jerryzh168 ghstack dependencies: #122775	2024-04-11 09:32:47 +00:00
leslie-fang-intel	8798f5bf0d	Add Quantization recipe filter per operator type for x86_inductor_quantizer (#122775 ) Summary Default recipes are enabled in `X86InductorQuantizer` and request comes to customize recipes based on these defaults. - Avoid annotation propagation and restrict annotation only to annotate `conv`/`linear`. - Add `matmul` in the quantization recipes, noting that it's not a general recipe but tailored to meet accuracy criteria for specific models. To meet these requests, we made changes in this PR by introducing interface as `set_function_type_qconfig` and `set_module_type_qconfig` - `set_function_type_qconfig` accepts functional input as `torch.nn.functional.linear` or `torch.matmul`; `set_module_type_qconfig` accepts nn.Module input as `torch.nn.Conv2d`. - To disable the recipe for this operator, user can simply exclude it from the list of operations as `quantizer.set_function_type_qconfig(op, None)`. - To modify or extend the recipe for this operator with default recipe, user can customize as `quantizer.set_function_type_qconfig(op, config)`. Test Plan ``` python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_filter_conv2d_recipe python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_filter_linear_recipe python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_filter_maxpool2d_recipe ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/122775 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2024-04-11 09:30:31 +00:00
PyTorch MergeBot	8d9af8b91c	Revert "[Quant][PT2E] Enable linear-binary(-unary) post-op recipe for X86Inductor quantizer (#122387 )" This reverts commit `82e0153487`. Reverted https://github.com/pytorch/pytorch/pull/122387 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/122387#issuecomment-2048294643))	2024-04-10 19:34:26 +00:00
Xia, Weiwen	82e0153487	[Quant][PT2E] Enable linear-binary(-unary) post-op recipe for X86Inductor quantizer (#122387 ) As the title Test plan python test/test_quantization.py -k test_linear_binary Pull Request resolved: https://github.com/pytorch/pytorch/pull/122387 Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5	2024-04-10 01:34:14 +00:00
Xia, Weiwen	d86cb9c747	[Quant][Inductor] Add qlinear_pointwise.binary op for X86Inductor backend (#123144 ) Note: This is a reopen of https://github.com/pytorch/pytorch/pull/122288, which was merged by `ghstack land` to its base (not main) by mistake. Description Add qlinear_binary op for X86Inductor backend of quantization PT2E. It only supports `add` and `add_relu` now. It will use post op sum if the extra input has the same dtype as output. Otherwise, it uses binary add. ``` +-------------------+--------------+---------------+ \| Extra input dtype \| Output dtype \| Post op \| +-------------------+--------------+---------------+ \| Fp32/bf16 \| fp32/bf16 \| sum or add* \| +-------------------+--------------+---------------+ \| Fp32/bf16 \| int8 \| add \| +-------------------+--------------+---------------+ \| int8 \| fp32/bf16 \| not supported \| +-------------------+--------------+---------------+ \| int8 \| int8 \| sum \| +-------------------+--------------+---------------+ Use sum if extra input and output have the same dtype; otherwise use add. ``` Test plan* python test_quantization.py -k test_qlinear_add_pt2e python test_quantization.py -k test_qlinear_add_relu_pt2e Pull Request resolved: https://github.com/pytorch/pytorch/pull/123144 Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5, https://github.com/jerryzh168	2024-04-09 04:56:37 +00:00
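The dtype rule in the table above reduces to a small decision function. `select_post_op` is a hypothetical helper that takes dtypes as strings. It mirrors only the table, not the kernel's real dispatch code.

```python
FLOAT_DTYPES = {"fp32", "bf16"}


def select_post_op(extra_input_dtype: str, output_dtype: str) -> str:
    """Pick the binary post op for qlinear + add, following the table above."""
    if extra_input_dtype == "int8" and output_dtype in FLOAT_DTYPES:
        raise NotImplementedError("int8 extra input with fp32/bf16 output")
    # same dtype for extra input and output -> fuse as post op "sum"
    if extra_input_dtype == output_dtype:
        return "sum"
    return "add"


for extra, out in [("fp32", "fp32"), ("bf16", "fp32"), ("fp32", "int8"), ("int8", "int8")]:
    print(extra, out, select_post_op(extra, out))
```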
Zhicheng Yan	77643ed2eb	[torch quantization] raise exception when OOM during combine histogram in observer (#123309 ) Summary: Even with the changes in D55347133, the histogram observer can still run out of memory, because the size of the allocated tensor also depends on downsample_rate. For example, I still see OOM from an attempt to allocate a 10GB+ histogram tensor in a multi-task model. To handle OOM better, we use a try-catch clause. Empirically, we set the maximum size of a single histogram tensor to 1 GB. Test Plan: Test the change for Multi-Task model (depth + segmentation) Differential Revision: D55567292 Pull Request resolved: https://github.com/pytorch/pytorch/pull/123309 Approved by: https://github.com/jerryzh168	2024-04-06 03:15:02 +00:00
William Wen	cbde0f048b	[dynamo, 3.12] enable tests disabled due to missing dynamo 3.12 support (#123300 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123300 Approved by: https://github.com/jansel, https://github.com/malfet, https://github.com/zou3519	2024-04-05 20:13:17 +00:00
Zhengxu Chen	b4c810491e	[export] Temporarily block mutating ops in quant tests. (#122863 ) Summary: After we migrate to torch.export, we won't see ops like add_ and mul_ due to functionalization. We are rolling out pre dispatch export, so for now we just skip those mutating ops in tests. Test Plan: buck run mode/opt caffe2/test/quantization:test_quantization Reviewed By: tugsbayasgalan Differential Revision: D55442019 Pull Request resolved: https://github.com/pytorch/pytorch/pull/122863 Approved by: https://github.com/clee2000	2024-04-01 16:41:13 +00:00
Xia, Weiwen	2cd3ef4777	Check scale dtype for fake_quantize_per_channel_affine_cachemask (#120987 ) Fixes #120903 Scale for fake quant is assumed FP32 but not checked. If scales of double dtype are passed in, an internal error is raised: `TORCH_INTERNAL_ASSERT(!needs_dynamic_casting<func_t>::check(iter));` in aten/src/ATen/native/cpu/Loops.h This PR adds a check of scale dtype. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120987 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2024-03-30 07:32:32 +00:00
Mu-Chu Lee	966ae943df	Add wrapper for fbgemm quantization operations (#122763 ) Summary: We add wrappers for fbgemm's packing so we can pass it through PT2 to lowering phase of AOTInductor. Test Plan: Included in commit. test_quantized_ops::test_wrapped_fbgemm_linear_fp16 Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D55433204](https://our.internmc.facebook.com/intern/diff/D55433204) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122763 Approved by: https://github.com/jerryzh168 ghstack dependencies: #122762	2024-03-28 18:41:18 +00:00
Mu-Chu Lee	a3b30851c5	Add quantized.linear_unpacked_dynamic_fp16 (#122762 ) Summary: We add a new op quantized.linear_unpacked_dynamic_fp16, which is essentially linear_dynamic_fp16 with different (unpacked) weight/bias format. This op does packing on the fly for each call with standard at::Tensor weight & bias. Test Plan: Included in commit. test_quantized_op::test_unpacked_qlinear_dynamic_fp16 Differential Revision: [D55433203](https://our.internmc.facebook.com/intern/diff/D55433203) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122762 Approved by: https://github.com/jerryzh168	2024-03-28 18:02:27 +00:00
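As a reference point for what a dynamic fp16 linear computes, the weight can be modeled as rounded to float16 and then used in full-precision arithmetic. This is a hedged sketch, not the fbgemm kernel. The `struct` module's `"e"` format provides IEEE binary16 rounding. Keeping the bias unrounded is an assumption, and `linear_dynamic_fp16_reference` is a hypothetical name.

```python
import struct


def to_fp16_and_back(value: float) -> float:
    # struct's "e" format is IEEE 754 binary16 (round to nearest even)
    return struct.unpack("<e", struct.pack("<e", value))[0]


def linear_dynamic_fp16_reference(x, weight, bias):
    """y = W16 @ x + b, with W rounded to fp16; weight is out_features x in_features."""
    w16 = [[to_fp16_and_back(w) for w in row] for row in weight]
    # assumption: the bias is not rounded to fp16
    return [sum(xi * wi for xi, wi in zip(x, row)) + b for row, b in zip(w16, bias)]


print(to_fp16_and_back(0.1))                                   # 0.0999755859375
print(linear_dynamic_fp16_reference([10.0], [[0.1]], [0.0]))  # [0.999755859375]
```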
Jerry Zhang	5af839f86d	[quant][pt2e] Enable observer sharing between different quantization specs (#122734 ) Summary: Right now we don't insert additional observers (i.e., we share observers) if qspec.dtype and qspec.is_dynamic match exactly. Since fixed qparams quantization spec and derived quantization spec do not currently have an is_dynamic field, observer sharing does not happen between them and quantization spec. In this PR we fixed the issue by adding is_dynamic to all quantization specs. Note: SharedQuantizationSpec should probably be its own type in the future. TODO later: (1). move all these fields (dtype, is_dynamic, quant_min, quant_max etc.) to QuantizationSpecBase, (2). make SharedQuantizationSpec a separate type (3). add quant_min/quant_max in observer sharing checking in pt2e/prepare.py Test Plan: python test/test_quantization.py -k test_fixed_qparams_qspec_observer_dedup Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D55396546](https://our.internmc.facebook.com/intern/diff/D55396546) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122734 Approved by: https://github.com/andrewor14	2024-03-27 16:45:19 +00:00
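Here is a sketch of the sharing rule as the commit above describes it: observers are shared only when `dtype` and `is_dynamic` match exactly. `SpecSketch`, `LegacyFixedSpec` and `can_share_observer` are hypothetical. Treating a missing `is_dynamic` field as `None` is an assumption, used here to show why a spec without that field never matched. `quant_min` and `quant_max` are left out of the comparison because the commit lists them as a TODO.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpecSketch:
    # hypothetical stand-in for a spec that carries is_dynamic (post-fix)
    dtype: str
    is_dynamic: bool = False
    quant_min: Optional[int] = None
    quant_max: Optional[int] = None


@dataclass(frozen=True)
class LegacyFixedSpec:
    # hypothetical stand-in for a fixed-qparams spec before the fix: no is_dynamic
    dtype: str


def can_share_observer(a, b) -> bool:
    # a missing is_dynamic attribute reads as None, so it never equals False
    return (a.dtype == b.dtype
            and getattr(a, "is_dynamic", None) == getattr(b, "is_dynamic", None))


print(can_share_observer(SpecSketch("uint8"), LegacyFixedSpec("uint8")))  # False
```

Two `SpecSketch` instances with different `quant_min`/`quant_max` still share an observer under this rule. That is the gap TODO item (3) points at.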
haozhe.zhu	e0329cba8a	[Quant] [PT2] Add SiLU into X86InductorQuantizer Conv2d Unary Annotation (#122267 ) Summary: Add `SiLU` into X86InductorQuantizer Conv2d Unary Annotation. Test Plan: ``` python -m pytest test_x86inductor_quantizer.py -k test_conv2d_unary python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_unary ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/122267 Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5 ghstack dependencies: #122266	2024-03-26 08:03:42 +00:00
PyTorch MergeBot	60bc29aa0b	Revert "[Quant] [PT2] Add SiLU into X86InductorQuantizer Conv2d Unary Annotation (#122267 )" This reverts commit `2c6eeb26d3`. Reverted https://github.com/pytorch/pytorch/pull/122267 on behalf of https://github.com/jeanschmidt due to Not sure if this PR caused breakages in main rocm jobs, I'll remerge if reverting does not fix it ([comment](https://github.com/pytorch/pytorch/pull/122267#issuecomment-2015294491))	2024-03-22 15:04:30 +00:00
andrewor14	ea8e0c75c7	[quant][pt2] Fix create FQ with FixedQParamsQSpec (#122104 ) Summary: Previously we returned a _PartialWrapper object when using FixedQParamsQuantizationSpec in QAT. This is wrong; we should return an FQ (FakeQuantize) object instead. Differential Revision: [D55021106](https://our.internmc.facebook.com/intern/diff/D55021106) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122104 Approved by: https://github.com/jerryzh168	2024-03-22 14:23:05 +00:00
haozhe.zhu	2c6eeb26d3	[Quant] [PT2] Add SiLU into X86InductorQuantizer Conv2d Unary Annotation (#122267 ) Summary: Add `SiLU` into X86InductorQuantizer Conv2d Unary Annotation. Test Plan: ``` python -m pytest test_x86inductor_quantizer.py -k test_conv2d_unary python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_unary ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/122267 Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5 ghstack dependencies: #122266	2024-03-22 08:12:23 +00:00
haozhe.zhu	a337ee0a3a	[Quant] Enable QConv2d with silu post op (#122266 ) Summary: Enable the QConv2d implementation with the `silu` post op. Test Plan: ``` python -m pytest test_quantized_op.py -k test_qconv2d_silu_pt2e ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/122266 Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5	2024-03-22 07:58:45 +00:00
Jerry Zhang	901ba2be86	[quant][pt2e] Add support for conv transpose + bn + {relu} weights fusion in PTQ (#122046 ) Summary: Also added some utils in xnnpack_quantizer_utils.py: * annotate_conv_tranpsose_bn_relu and annotate_conv_transpose_bn -> these are for QAT * annotate_conv_transpose_relu. conv_transpose + bn weights fusion is performed automatically and currently cannot be disabled; we can add support for disabling this fusion later if needed. Test Plan: python test/test_quantization.py -k test_conv_transpose_bn_fusion Pull Request resolved: https://github.com/pytorch/pytorch/pull/122046 Approved by: https://github.com/andrewor14	2024-03-19 21:00:57 +00:00
Le-Zheng	25e00545bb	[Quant][PT2E] Enable linear and linear-unary post-op gelu quant recipe for x86 inductor quantizer (#114853 ) Summary: Add GELU to the linear-unary post-op quantization recipe of the x86 inductor quantizer. Test Plan: python -m pytest test/quantization/pt2e/test_x86inductor_quantizer.py -k test_linear_unary_gelu python test/test_quantization.py -k test_linear_unary_with_quantizer_api Co-authored-by: leslie-fang-intel <leslie.fang@intel.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/114853 Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/jerryzh168	2024-03-14 01:46:35 +00:00
Shen Xu	159f30331f	[quant][pt2e] Call sub-quantizers' transform_for_annotation in ComposableQuantizer (#121548 ) Test Plan: ``` buck run caffe2/test:quantization_pt2e ``` Differential Revision: D54454707 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121548 Approved by: https://github.com/jerryzh168	2024-03-12 02:59:12 +00:00


1886 Commits