Commit Graph

1819 Commits

Author SHA1 Message Date
Huamin Li
68bbe20db7 Add test coverage (#149182)
Summary: Follow up from D71160718

Differential Revision: D71177037

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149182
Approved by: https://github.com/houseroad
2025-03-14 09:38:29 +00:00
cyy
50eb4f3990 Enable UBSAN test (#147511)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147511
Approved by: https://github.com/colesbury
2025-03-07 00:35:32 +00:00
vasiliy
382fbcc1e4 add the torch.float8_e8m0fnu dtype to PyTorch (#147466)
Summary:

Continuing the work from https://github.com/pytorch/pytorch/pull/146427

Adds the `torch.float8_e8m0fnu` dtype to PyTorch, as detailed in
https://github.com/pytorch/pytorch/issues/146414 . Please see the issue for a detailed definition of the format.  Example of basic functionality:

```python
import torch

# round trip
x0 = torch.randn(4, 4, dtype=torch.float32)
x1 = x0.to(torch.float8_e8m0fnu)  # RNE rounding
x2 = x1.to(torch.float32)  # 2 ** exponent

# creation with empty
x0 = torch.empty(4, 4, dtype=torch.float8_e8m0fnu)

# printing
print(x0)
```

Done in this PR:
* numerical correctness
* op coverage (except for `torch._scaled_mm`): create tensor, cast to/from float32
* printing a tensor works

For future PRs:
* performance optimizations for casting
* torch._scaled_mm
* PT2
* various cleanups (detailed in comments with issue numbers)

Test Plan:

```
pytest test/quantization/core/experimental/test_float8.py -s
```

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147466
Approved by: https://github.com/drisspg
2025-02-20 13:55:42 +00:00
Chen Lai
bd370c138a fix pt2e block wise quantization unit test (#147406)
Differential Revision: D69806596

https://github.com/pytorch/pytorch/pull/146946 breaks the unit test, because the quant nodes are folded by default now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147406
Approved by: https://github.com/andrewor14, https://github.com/jerryzh168
2025-02-19 02:40:27 +00:00
Chen Lai
2d3db4509a fix pt2e block wise quantization test (#147035)
Differential Revision: D69559217

https://github.com/pytorch/pytorch/pull/145941 breaks the unit test added for prepare pt2e + block wise quantization. Fixing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147035
Approved by: https://github.com/andrewor14
2025-02-13 19:44:56 +00:00
Aaron Orenstein
99dbc5b0e2 PEP585 update - test (#145176)
See #145101 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145176
Approved by: https://github.com/bobrenjc93
2025-01-22 04:48:28 +00:00
PyTorch MergeBot
f522502b97 Revert "patch for block-wise quantization + pt2e (#144492)"
This reverts commit 1d43b81508.

Reverted https://github.com/pytorch/pytorch/pull/144492 on behalf of https://github.com/albanD due to Broke a few things in trunk ([comment](https://github.com/pytorch/pytorch/pull/144492#issuecomment-2598485291))
2025-01-17 14:27:53 +00:00
Chen Lai
1d43b81508 patch for block-wise quantization + pt2e (#144492)
Summary: As title, needed for enable qcom block-wise quantization kernel

Test Plan: local test

Differential Revision: D67985303

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144492
Approved by: https://github.com/angelayi, https://github.com/billmguo
2025-01-17 04:10:49 +00:00
PyTorch MergeBot
aa4a1ff027 Revert "Prevent _legacy_load with weights_only=True (#144914)"
This reverts commit 7c3aa1da1c.

Reverted https://github.com/pytorch/pytorch/pull/144914 on behalf of https://github.com/izaitsevfb due to breaking inductor on trunk ([comment](https://github.com/pytorch/pytorch/pull/144914#issuecomment-2596922781))
2025-01-16 21:29:50 +00:00
Mikayla Gawarecki
7c3aa1da1c Prevent _legacy_load with weights_only=True (#144914)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144914
Approved by: https://github.com/malfet, https://github.com/albanD
2025-01-16 19:33:46 +00:00
Annop Wongwathanarat
7265dc0622 Enable s8s8s8 for qlinear with mkl-dnn (#139887)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139887
Approved by: https://github.com/huydhn
2025-01-15 23:20:10 +00:00
PyTorch MergeBot
443de667b1 Revert "Enable s8s8s8 for qlinear with mkl-dnn (#139887)"
This reverts commit dc8692b0eb.

Reverted https://github.com/pytorch/pytorch/pull/139887 on behalf of https://github.com/ZainRizvi due to Sorry but this appears to have broken trunk. See here for more details: [GH job link](https://github.com/pytorch/pytorch/actions/runs/12788709683/job/35651699934) [HUD commit link](dc8692b0eb) ([comment](https://github.com/pytorch/pytorch/pull/139887#issuecomment-2593597977))
2025-01-15 17:58:33 +00:00
Annop Wongwathanarat
dc8692b0eb Enable s8s8s8 for qlinear with mkl-dnn (#139887)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139887
Approved by: https://github.com/leslie-fang-intel, https://github.com/jerryzh168, https://github.com/ng-05, https://github.com/digantdesai
2025-01-15 12:51:21 +00:00
PyTorch MergeBot
99f2491af9 Revert "Use absolute path path.resolve() -> path.absolute() (#129409)"
This reverts commit 45411d1fc9.

Reverted https://github.com/pytorch/pytorch/pull/129409 on behalf of https://github.com/jeanschmidt due to Breaking internal CI, @albanD please help get this PR merged ([comment](https://github.com/pytorch/pytorch/pull/129409#issuecomment-2571316444))
2025-01-04 14:17:20 +00:00
Xuehai Pan
45411d1fc9 Use absolute path path.resolve() -> path.absolute() (#129409)
Changes:

1. Always explicit `.absolute()`: `Path(__file__)` -> `Path(__file__).absolute()`
2. Replace `path.resolve()` with `path.absolute()` if the code is resolving the PyTorch repo root directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129409
Approved by: https://github.com/albanD
2025-01-03 20:03:40 +00:00
Xuehai Pan
b6bdb67f82 [BE][Easy] use pathlib.Path instead of dirname / ".." / pardir (#129374)
Changes by apply order:

1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`.
2. Replace nested `os.path.dirname(os.path.dirname(...))` call with `str(Path(...).parent.parent)`.
3. Reorder `.absolute()` ~/ `.resolve()`~ and `.parent`: always resolve the path first.

    `.parent{...}.absolute()` -> `.absolute().parent{...}`

4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.)

    `.parent.parent.parent.parent` -> `.parents[3]`

5. ~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~

    ~`.parents[3]` -> `.parents[4 - 1]`~

6. ~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374
Approved by: https://github.com/justinchuby, https://github.com/malfet
2024-12-29 17:23:13 +00:00
Jerry Zhang
ad78edee8e Add support for list, tuple and dict in numeric debugger (#143882)
Summary:
Previously numeric debugger only supports torch.Tensor, this PR adds support for list, tuple and dict as well

Test Plan:
python test/test_quantization.py -k test_extract_results_from_loggers_list_output

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D67660049](https://our.internmc.facebook.com/intern/diff/D67660049)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143882
Approved by: https://github.com/dulinriley
2024-12-28 02:10:31 +00:00
Yanan Cao (PyTorch)
ba5cacbc17 [Codemod][AddExplicitStrictExportArg] caffe2/test (#143688)
Reviewed By: avikchaudhuri

Differential Revision: D67530154

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143688
Approved by: https://github.com/tugsbayasgalan
2024-12-27 07:58:44 +00:00
Huamin Li
43853691bc [Quantization] add an option keep_original_weights in _lower_to_native_backend (#141049)
Differential Revision: D66153809

This diff adds an option to keep_original_weights so we can track back the original weight and bias after performing prepare_fx and convert_fx

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141049
Approved by: https://github.com/jerryzh168
2024-12-27 04:02:07 +00:00
PyTorch MergeBot
475656fd9c Revert "[BE][Easy] use pathlib.Path instead of dirname / ".." / pardir (#129374)"
This reverts commit 2293fe1024.

Reverted https://github.com/pytorch/pytorch/pull/129374 on behalf of https://github.com/malfet due to failing internal ROCM builds with error: ModuleNotFoundError: No module named hipify ([comment](https://github.com/pytorch/pytorch/pull/129374#issuecomment-2562973920))
2024-12-26 17:32:23 +00:00
PyTorch MergeBot
cc4e70b7c3 Revert "Use absolute path path.resolve() -> path.absolute() (#129409)"
This reverts commit 135c7db99d.

Reverted https://github.com/pytorch/pytorch/pull/129409 on behalf of https://github.com/malfet due to need to revert to as dependency of https://github.com/pytorch/pytorch/pull/129374 ([comment](https://github.com/pytorch/pytorch/pull/129409#issuecomment-2562969825))
2024-12-26 17:26:06 +00:00
Xuehai Pan
135c7db99d Use absolute path path.resolve() -> path.absolute() (#129409)
Changes:

1. Always explicit `.absolute()`: `Path(__file__)` -> `Path(__file__).absolute()`
2. Replace `path.resolve()` with `path.absolute()` if the code is resolving the PyTorch repo root directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129409
Approved by: https://github.com/albanD
2024-12-24 08:33:08 +00:00
Jerry Zhang
ace645a017 Add support for prototype affine quantization in pt2e flow (#141421)
Summary:
duplicated affine quantization functionality including
observer (https://github.com/pytorch/ao/blob/main/torchao/quantization/observer.py)
and some quant_primitive ops (7c3c51fd0d/torchao/quantization/quant_primitives.py (L26-L30))
to allow for per group quantization min max observer in pt2e flow

Next: We can follow up to add moving average min max observer

Test Plan:
python test/test_quantization.py -k test_channel_group_quantization

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141421
Approved by: https://github.com/cccclai
2024-12-24 04:22:18 +00:00
Xuehai Pan
2293fe1024 [BE][Easy] use pathlib.Path instead of dirname / ".." / pardir (#129374)
Changes by apply order:

1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`.
2. Replace nested `os.path.dirname(os.path.dirname(...))` call with `str(Path(...).parent.parent)`.
3. Reorder `.absolute()` ~/ `.resolve()`~ and `.parent`: always resolve the path first.

    `.parent{...}.absolute()` -> `.absolute().parent{...}`

4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.)

    `.parent.parent.parent.parent` -> `.parents[3]`

5. ~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~

    ~`.parents[3]` -> `.parents[4 - 1]`~

6. ~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374
Approved by: https://github.com/justinchuby, https://github.com/malfet
2024-12-21 22:08:01 +00:00
Tom Ritchford
d8c8ba2440 Fix unused Python variables in test/[e-z]* (#136964)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136964
Approved by: https://github.com/justinchuby, https://github.com/albanD
2024-12-18 23:02:30 +00:00
Shangdi Yu
bb574abe73 [BC-Breaking]Remove capture_pre_autograd_graph references in quantization (#139505)
Summary:
As title

This is a BC-breaking change because graph produced by "capture_pre_autograd_graph" cannot be input to quantization anymore. But this is ok, since this API is deprecated for a while and is going to be deleted. We have removed all call sites of it.

We remove the deprecated API references in code, docs, and tests.

We also removed two tests that specific to capture_pre_autograd_graph API.

Test Plan: CI

Differential Revision: D65351887

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139505
Approved by: https://github.com/tugsbayasgalan, https://github.com/andrewor14, https://github.com/jerryzh168
2024-12-13 22:26:22 +00:00
gasoonjia
91261107e0 debug handler maintain through decomposition (#141612)
Add checks in the ao numberic debugger to guard the debug handle consistency between aten op decomposition

Differential Revision: [D66517480](https://our.internmc.facebook.com/intern/diff/D66517480/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141612
Approved by: https://github.com/jerryzh168
2024-12-12 12:26:45 +00:00
gasoonjia
ff059587c6 support condition branch in ao debug handler (#141516)
This diff introduced the supportive of condition statement into ao debug handler generation.

Most of code borrowed from ExecuTorch to avoid circle dependency issue.

Differential Revision: [D66270691](https://our.internmc.facebook.com/intern/diff/D66270691/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141516
Approved by: https://github.com/jerryzh168
2024-12-10 14:05:12 +00:00
Shangdi Yu
bcddae14ec Enhance "from_node" node meta to track source recursively (#142066)
Summary:
Change the "from_node" node meta format to be able to track the provenance of nodes recursively.

The new "from_node" format is a a list node NodeSource:

```
class NodeSource:
	self.node_name: str
	self.target: str
	self.graph_id: int
	self.pass_name: str
	self.action: str
	self.from_node: List[NoedSource]
```

This is in preparation for the inductor provenance tracking. For background, the inductor provenance tracking doc: https://docs.google.com/document/d/1dGh9myqNhywmbfP0Quzx_f04bghDFlj8cawj8MopiO8/edit?fbclid=IwZXh0bgNhZW0CMTEAAR0jUQ0Tf4ROLDED8Y_eIzrU0KVZVdRmyIQLp-avt-kGRPI_VgYVNyjH_q0_aem_HCQ_pxHDiwOkO9mQyWB2-g&tab=t.0 (internal only),

Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:test_export -- -r test_unflatten_multiple_graphs_state
buck run mode/dev-nosan caffe2/test:fx -- -r node_source
```

Differential Revision: D66737916

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142066
Approved by: https://github.com/avikchaudhuri
2024-12-09 23:39:15 +00:00
andrewor14
dd5df002b9 [pt2e][quant] Make move_exported_model_to_train/eval idempotent (#142239)
Summary: Before we would recompile the model unnecessarily even
if the model is already in the desired mode. For training
frameworks that assume `model.train()` is idempotent and calls
this before every single training step, this led to a bunch of
tiny graphs and poor performance. This commit makes these calls
no-ops if we're already in the target train/eval mode.

Test Plan:
python test/test_quantization -k TestQuantizePT2E.test_allow_exported_model_train_eval_idempotent
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142239
Approved by: https://github.com/jerryzh168
2024-12-09 16:50:20 +00:00
William Wen
2be8d16247 [ci, 3.13] disable some quantization tests affected by numpy 2.1 overflow error (#141621)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141621
Approved by: https://github.com/jerryzh168, https://github.com/atalman
ghstack dependencies: #141409, #142003, #141572, #141577, #141605
2024-12-05 00:24:29 +00:00
Johnson Wong
f86a1753d1
Add option to split Linear gates for Quantizable LSTM into separate ops (#141366)
Add option to split Linear gates for Quantizable LSTM into separate ops (#141366)

Summary:

Reattempt to land D65283170, adding pyre-fixmes / mypy ignores following D52890934

For LSTM, the input and hidden state are projected with Linear layers to construct the 4 gates. This is typically performed together as a single Linear (for each state) with output channel count `4 * hidden_dim` for efficiency.
https://www.internalfb.com/code/fbsource/[ebef7c4238aa55948b2b444044f2c8ed2040de55]/fbcode/caffe2/torch/ao/nn/quantizable/modules/rnn.py?lines=52-58
The output is then ultimately split into 4:
https://www.internalfb.com/code/fbsource/[ebef7c4238aa55948b2b444044f2c8ed2040de55]/fbcode/caffe2/torch/ao/nn/quantizable/modules/rnn.py?lines=83-87

For on-device latency (and possibly memory) considerations, we want to avoid constructing the intermediate `gates` tensor (which can be relatively large), by splitting `igates` and `hgates` first (as 4x `Linear(hidden_dim, hidden_dim)` each), applying add separately, then proceeding as usual.

This functionality can be enabled by specifying `split_gates=True` (default False is original behavior) at any entry point (directly with `torch.ao.nn.quantizable.LSTM`  or via `_get_lstm_with_individually_observed_parts`).

Test Plan:
piggy back on existing test to check for correct swap handling, numerics, and jit.script during prepare/convert
```
buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_custom_module_lstm (caffe2.test.quantization.core.test_quantized_op.TestQuantizedOps)'
```
https://www.internalfb.com/intern/testinfra/testrun/4503599884152725

This test is quite long running now (more than double original).

---

shorter test to confirm original `LSTMCell` passes
```
buck2 test 'fbcode//mode/opt' fbcode//caffe2/test:quantization_fx -- --exact 'caffe2/test:quantization_fx - test_static_lstm_with_custom_fixed_qparams (quantization.fx.test_quantize_fx.TestQuantizeFx)'
```
https://www.internalfb.com/intern/testinfra/testrun/11258999127933996

Reviewed By: Ninja91

Differential Revision: D66380336
2024-12-03 17:21:44 -05:00
PyTorch MergeBot
09ce760fef Revert "Add missing data types at torch export serialization (#138561)"
This reverts commit 1ef1b3b391.

Reverted https://github.com/pytorch/pytorch/pull/138561 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/138561#issuecomment-2513343401))
2024-12-03 01:32:50 +00:00
siahuat0727
3c63e76b03 [PT2E Quantization] Fix RecursionError when prepare_pt2e graph with concat of the same node (#141651)
Fixes #129038

Related PR #129567

Here is the new PR against main, thanks! @jerryzh168

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141651
Approved by: https://github.com/jerryzh168
2024-11-29 09:19:22 +00:00
Xia, Weiwen
9827d677b4 [Quant][PT2E][X86] annotate and convert for linear_dynamic_fp16 (#141480)
Annotate linear node for `linear_dynamic_fp16` with `X86InductorQuantizer`
After `convert_pt2e`, the pattern will be
```
  x
  |
linear <- to_fp32 <- to_fp16 <- w
```

**Test plan**
```
pytest test/quantization/pt2e/test_x86inductor_quantizer.py -k test_linear_dynamic_fp16
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141480
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2024-11-29 07:48:39 +00:00
yintong-lu
1ef1b3b391 Add missing data types at torch export serialization (#138561)
Related to #131654

Added missing FP8 data types at torch export serialization.
Added test cases of FP8 data types.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138561
Approved by: https://github.com/jerryzh168, https://github.com/jgong5
2024-11-28 08:35:03 +00:00
cyy
5ca75ac1df Enable UBSAN tests (#141672)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141672
Approved by: https://github.com/ezyang
2024-11-28 01:55:15 +00:00
Shangdi Yu
02990fe36b Populate nn.module.stack in _fuse_conv_bn_qat (#141400)
Summary:
Populate nn.module.stack in _fuse_conv_bn_qat for replacement nodes that correspond to a `get_attr` node in the original graph.

In new training ir , `get_attr` nodes don't have `nn_module_stack` in node meta anymore (because the get_attr nodes are de-duplicated, so one get_attr node can potential have users in different module stacks).

We populate it by checking if "conv_input" or "conv_weight" replacement node has nn_module_stack. If not, we copy it from the conv node.

Test Plan:
CI

```
buck run fbcode//caffe2/test:quantization_pt2e -- -r test_preserve_nn_module_stack
```

Differential Revision: D66393517

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141400
Approved by: https://github.com/angelayi
2024-11-25 23:41:28 +00:00
PyTorch MergeBot
cf1d95a965 Revert "Add option to split Linear gates for Quantizable LSTM into separate ops (#140868)"
This reverts commit 3fcf66f61f.

Reverted https://github.com/pytorch/pytorch/pull/140868 on behalf of https://github.com/huydhn due to Sorry for reverting your change but I think lint is failing on this in trunk ([comment](https://github.com/pytorch/pytorch/pull/140868#issuecomment-2494076202))
2024-11-22 15:54:05 +00:00
Johnson Wong
3fcf66f61f Add option to split Linear gates for Quantizable LSTM into separate ops (#140868)
Summary:
For LSTM, the input and hidden state are projected with Linear layers to construct the 4 gates. This is typically performed together as a single Linear (for each state) with output channel count `4 * hidden_dim` for efficiency.
https://www.internalfb.com/code/fbsource/[ebef7c4238aa55948b2b444044f2c8ed2040de55]/fbcode/caffe2/torch/ao/nn/quantizable/modules/rnn.py?lines=52-58
The output is then ultimately split into 4:
https://www.internalfb.com/code/fbsource/[ebef7c4238aa55948b2b444044f2c8ed2040de55]/fbcode/caffe2/torch/ao/nn/quantizable/modules/rnn.py?lines=83-87

For on-device latency (and possibly memory) considerations, we want to avoid constructing the intermediate `gates` tensor (which can be relatively large), by splitting `igates` and `hgates` first (as 4x `Linear(hidden_dim, hidden_dim)` each), applying add separately, then proceeding as usual.

This functionality can be enabled by specifying `split_gates=True` (default False is original behavior) at any entry point (directly with `torch.ao.nn.quantizable.LSTM`  or via `_get_lstm_with_individually_observed_parts`).

Test Plan:
piggy back on existing test to check for correct swap handling, numerics, and jit.script during prepare/convert
```
buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_custom_module_lstm (caffe2.test.quantization.core.test_quantized_op.TestQuantizedOps)'
```
https://www.internalfb.com/intern/testinfra/testrun/11540474102848372

This test is quite long running now (more than double original).

Reviewed By: Ninja91

Differential Revision: D65283170

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140868
Approved by: https://github.com/jerryzh168
2024-11-22 04:10:26 +00:00
Songhao Jia
84d86e3767 [numeric_debugger] guard the input generate_numeric_debug_handle as GraphModule type (#140742)
Summary: Support ExportProgram type in generate_numeric_debug_handle, to better meet the requirement

Test Plan: ci

Differential Revision: D65920529

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140742
Approved by: https://github.com/tarun292, https://github.com/jerryzh168
2024-11-20 03:40:04 +00:00
Shen Xu
efe8482c0d Add prepare_obs_or_fq_callback to quantizer (#140863)
Test Plan: CI.

Differential Revision: D65982003

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140863
Approved by: https://github.com/jerryzh168
2024-11-19 01:13:38 +00:00
Xia, Weiwen
62eea62493 [Quant][Onednn] add linear_dynamic_fp16 ops (#140376)
**About this PR**
This PR adds the following ops for `linear_dynamic_fp16` in onednn namespace. These ops are intended for PT2E quantization eager mode.
- `onednn::linear_prepack_fp16`: packs fp32 weight to an fp16 MkldnnCPU tensor.
- `onednn::linear_dynamic_fp16`: takes an fp32 CPU tensor and an fp16 MkldnnCPU tensor and compute linear in fp32
- `onednn::linear_relu_dynamic_fp16`: similar as the former and apply relu on output.

**Test plan**
`python test/test_quantization.py -k test_linear_dynamic_fp16_onednn`

**Implementation**
These ops call oneDNN lib under the hood. It's worth noting that oneDNN does not support f32 * f16 -> f32 computation, so we have to convert fp16 weight to fp32 before computation. And weight is still in plain format after packing.

**Correctness and performance**
Correctness is guaranteed by UT.
Performance of the new ops may be better than the FBGEMM implementation when weight shape is small but worse when weight shape is large. It's because weight dtype conversion and computation are not fused.
For example, I ran benchmarks on an Intel(R) Xeon(R) Platinum 8490H machine with different cores and shapes. When using 1 core per instance, the new implementation generally is faster for weight shape < 1024 * 1024. When using more cores, the threshold will increase.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140376
Approved by: https://github.com/jerryzh168, https://github.com/jgong5
2024-11-14 05:19:18 +00:00
zeshengzong
cb71bcc542 Replace clone.detach with detach.clone (#140264)
Fixes #64532

As state in issue, replace `clone.detach` by `detach.clone`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140264
Approved by: https://github.com/soulitzer
2024-11-13 07:01:02 +00:00
Songhao Jia
59ec011855 [numerical debugger] bumped up the starting handler id (#139666)
Differential Revision: D65445250

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139666
Approved by: https://github.com/tarun292, https://github.com/dulinriley
2024-11-07 01:00:43 +00:00
Jerry Zhang
938803df94 Add bfloat16 support for per tensor/channel cpu/cuda fake quantize ops (#139306)
Summary: Fixes https://fb.workplace.com/groups/2240361332735959/permalink/8190736677698365

Test Plan:
buck2 test 'fbcode//mode/dev' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_forward_per_channel_cachemask_cpu (caffe2.test.quantization.core.test_workflow_ops.TestFakeQuantizeOps)'

buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_forward_per_tensor_cachemask_cpu (caffe2.test.quantization.core.test_workflow_ops.TestFakeQuantizeOps)'

buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_forward_per_channel_cachemask_cuda (caffe2.test.quantization.core.test_workflow_ops.TestFakeQuantizeOps)'

buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_forward_per_channel_cachemask_cpu (caffe2.test.quantization.core.test_workflow_ops.TestFakeQuantizeOps)'

Differential Revision: D65221710

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139306
Approved by: https://github.com/navsud
2024-10-31 20:41:15 +00:00
Wu, Chunyuan
d7411c0cc1 [AOTI] add C shim for QConvPointWise (#138540)
This PR adds C shim for `QConvPointWisePT2E` and `QConvPointWiseBinaryPT2E` similar to https://github.com/pytorch/pytorch/pull/138439. Besides that, we aligned the implementation of `qconv_pointwise` with `qlinear_pointwise` in the following aspects:
1. The parameter order of `qconv_pointwise` and `qlinear_pointwise` are quite different, we aligned the schema of `qconv_pointwise` to have similar parameter order as `qlinear_pointwise` to make it more consistent.
2. We always converted `x_scale` and `x_zero_point` to Tensors, just like in the lowering of `qlinear_pointwise`. This avoids the need to create two separate C APIs (one for `double x_scale` and `int64_t x_zero_point`, and another for `Tensor` versions). Instead, we only need one API for `Tensor`-based `x_scale` and `x_zero_point`. If we later add dynamic quantization for qconv (which will use `Tensor` for `x_scale` and `x_zero_point`), we can reuse the code from this PR and don't need to change the C shim layer API.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138540
Approved by: https://github.com/jgong5, https://github.com/desertfire
ghstack dependencies: #138691, #138806
2024-10-31 02:03:01 +00:00
Xia, Weiwen
edcab61f93 Skip test for PT2E quantized ops in fbcode (#138792)
Skip those tests as they are failing in fbcode.
Submit this PR per request from @jerryzh168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138792
Approved by: https://github.com/jerryzh168
2024-10-30 02:37:38 +00:00
amathewc
69d401d010 Update test_quantize_pt2e.py with HPU support (#137863)
**MOTIVATION**

We recently integrated support for Intel Gaudi devices (identified as 'hpu') into the common_device_type framework via the pull request at https://github.com/pytorch/pytorch/pull/126970. This integration allows tests to be automatically instantiated for Gaudi devices upon loading the relevant library. Building on this development, the current pull request extends the utility of these hooks by adapting selected CUDA tests to operate on Gaudi devices. Additionally, we have confirmed that these modifications do not interfere with the existing tests on CUDA devices.

**CHANGES**
- Add support for HPU devices within the test_move_exported_model_bn using TEST_HPU flag
- Use instantiate_device_type_tests with targeted attributes to generate device-specific test instances.
- Apply skipIfHPU decorator to bypass tests that are not yet compatible with HPU devices.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137863
Approved by: https://github.com/jerryzh168
2024-10-29 13:01:03 +00:00
Aaron Gokaslan
5d074746e9 [BE]: Add better optional typing (#138426)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138426
Approved by: https://github.com/XuehaiPan, https://github.com/malfet
2024-10-27 14:19:00 +00:00