pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
bobrenjc93	a55977f763	Migrate from Tuple -> tuple in torch/ao (#144265) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144265 Approved by: https://github.com/aorenste	2025-01-10 00:12:06 +00:00
Johnson Wong	f86a1753d1	Add option to split Linear gates for Quantizable LSTM into separate ops (#141366) Summary: Reattempt to land D65283170, adding pyre-fixmes / mypy ignores following D52890934 For LSTM, the input and hidden state are projected with Linear layers to construct the 4 gates. This is typically performed together as a single Linear (for each state) with output channel count `4 * hidden_dim` for efficiency. https://www.internalfb.com/code/fbsource/[ebef7c4238aa55948b2b444044f2c8ed2040de55]/fbcode/caffe2/torch/ao/nn/quantizable/modules/rnn.py?lines=52-58 The output is then ultimately split into 4: https://www.internalfb.com/code/fbsource/[ebef7c4238aa55948b2b444044f2c8ed2040de55]/fbcode/caffe2/torch/ao/nn/quantizable/modules/rnn.py?lines=83-87 For on-device latency (and possibly memory) considerations, we want to avoid constructing the intermediate `gates` tensor (which can be relatively large), by splitting `igates` and `hgates` first (as 4x `Linear(hidden_dim, hidden_dim)` each), applying add separately, then proceeding as usual. This functionality can be enabled by specifying `split_gates=True` (default False is original behavior) at any entry point (directly with `torch.ao.nn.quantizable.LSTM` or via `_get_lstm_with_individually_observed_parts`). Test Plan: piggy back on existing test to check for correct swap handling, numerics, and jit.script during prepare/convert ``` buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_custom_module_lstm (caffe2.test.quantization.core.test_quantized_op.TestQuantizedOps)' ``` https://www.internalfb.com/intern/testinfra/testrun/4503599884152725 This test is quite long running now (more than double original). --- shorter test to confirm original `LSTMCell` passes ``` buck2 test 'fbcode//mode/opt' fbcode//caffe2/test:quantization_fx -- --exact 'caffe2/test:quantization_fx - test_static_lstm_with_custom_fixed_qparams (quantization.fx.test_quantize_fx.TestQuantizeFx)' ``` https://www.internalfb.com/intern/testinfra/testrun/11258999127933996 Reviewed By: Ninja91 Differential Revision: D66380336	2024-12-03 17:21:44 -05:00
PyTorch MergeBot	cf1d95a965	Revert "Add option to split Linear gates for Quantizable LSTM into separate ops (#140868)" This reverts commit `3fcf66f61f`. Reverted https://github.com/pytorch/pytorch/pull/140868 on behalf of https://github.com/huydhn due to Sorry for reverting your change but I think lint is failing on this in trunk ([comment](https://github.com/pytorch/pytorch/pull/140868#issuecomment-2494076202))	2024-11-22 15:54:05 +00:00
Johnson Wong	3fcf66f61f	Add option to split Linear gates for Quantizable LSTM into separate ops (#140868) Summary: For LSTM, the input and hidden state are projected with Linear layers to construct the 4 gates. This is typically performed together as a single Linear (for each state) with output channel count `4 * hidden_dim` for efficiency. https://www.internalfb.com/code/fbsource/[ebef7c4238aa55948b2b444044f2c8ed2040de55]/fbcode/caffe2/torch/ao/nn/quantizable/modules/rnn.py?lines=52-58 The output is then ultimately split into 4: https://www.internalfb.com/code/fbsource/[ebef7c4238aa55948b2b444044f2c8ed2040de55]/fbcode/caffe2/torch/ao/nn/quantizable/modules/rnn.py?lines=83-87 For on-device latency (and possibly memory) considerations, we want to avoid constructing the intermediate `gates` tensor (which can be relatively large), by splitting `igates` and `hgates` first (as 4x `Linear(hidden_dim, hidden_dim)` each), applying add separately, then proceeding as usual. This functionality can be enabled by specifying `split_gates=True` (default False is original behavior) at any entry point (directly with `torch.ao.nn.quantizable.LSTM` or via `_get_lstm_with_individually_observed_parts`). Test Plan: piggy back on existing test to check for correct swap handling, numerics, and jit.script during prepare/convert ``` buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_custom_module_lstm (caffe2.test.quantization.core.test_quantized_op.TestQuantizedOps)' ``` https://www.internalfb.com/intern/testinfra/testrun/11540474102848372 This test is quite long running now (more than double original). Reviewed By: Ninja91 Differential Revision: D65283170 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140868 Approved by: https://github.com/jerryzh168	2024-11-22 04:10:26 +00:00
Xuehai Pan	2ce734cee9	[BE] enable UFMT for `torch/ao/quantization/` (#128863) Part of #123062 Pull Request resolved: https://github.com/pytorch/pytorch/pull/128863 Approved by: https://github.com/ezyang ghstack dependencies: #128861, #128862	2024-07-25 04:17:54 +00:00
Aaron Gokaslan	8219bf051b	[BE]: Apply RUF015 to torch folder (#113025) Removes unnecessary allocations of iterators. There is a small chance this may have side effects as the entire iterator is no longer consumed, but this is a way more efficient method for retrieving the first element. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113025 Approved by: https://github.com/ezyang, https://github.com/malfet	2023-11-07 00:48:15 +00:00
Nitin Jain	556bb691fd	[AO]Fix observed LSTM layer setup individually observed LSTM (#101299) Summary: We have found that `_get_lstm_with_individually_observed_parts()` is missing a setup step which sets up the LSTM layer state, initializing the weights and biases of this layer. This diff fixes the observed numerical discrepancy seen by the CTRL team when using the above API. Test Plan: N3358643 Differential Revision: D45821681 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101299 Approved by: https://github.com/andrewor14	2023-05-18 19:15:01 +00:00
andrewor14	faa4cb29b2	[Quant][fx] Create new FX-based LSTM reference module (#96343) Summary: The previous LSTM reference module implementation did not handle dtypes other than quint8 correctly. This is because the internal LSTM custom module quantization used eager mode, which did not insert the q-dq ops properly. E.g., we want the following reference quantized model: ``` [dq -> linear1_fp32 -> q_to_qint32] -> dq -> q_to_quint8 -> [dq -> linear2_fp32 -> q_to_quint8] -> dq -> ... ``` This requires two sets of `q - dq` pairs between two adjacent ops that have different dtypes (linear1 and linear2). However, these `q - dq` pairs were not inserted in the old flow, because eager mode required users to insert Quant/DeQuantStubs manually. This commit changes the internal LSTM custom module quantization to use FX graph mode quantization, which automatically inserts the `q - dq` ops that convert the dtypes between adjacent ops correctly. However, using FX graph mode quantization here comes with its own set of challenges that required some hacks to get the end-to-end flow to work. These hacks are detailed in the comments in the util functions. Test Plan: python test/test_quantization.py TestQuantizeFx.test_static_lstm_with_custom_fixed_qparams This commit also updates the corresponding test to verify the dtypes as well as the qparams in the reference quantized graph. This test case should serve as an example for users to set up their own LSTM reference module flows. Reviewers: vkuzo, supriyar, jcaip Subscribers: vkuzo, supriyar, jcaip Pull Request resolved: https://github.com/pytorch/pytorch/pull/96343 Approved by: https://github.com/vkuzo	2023-03-09 23:23:48 +00:00
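
The `q - dq` chain described in faa4cb29b2 can be illustrated with plain affine quantization arithmetic. This is a minimal sketch in pure Python, not the FX graph mode implementation: the helper names (`quantize`, `dequantize`, `requantize`), the scales, and the zero points are all assumptions chosen for illustration, and only the integer ranges of quint8 and qint32 are standard.

```python
# Affine quantization: q = clamp(round(x / scale) + zero_point, qmin, qmax)
# All qparams below are hypothetical; they are not taken from the commit.

QUINT8_RANGE = (0, 255)
QINT32_RANGE = (-2**31, 2**31 - 1)


def quantize(x, scale, zero_point, qrange):
    """Map a float to an integer in qrange (round-half-to-even, then clamp)."""
    qmin, qmax = qrange
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map a quantized integer back to a float."""
    return (q - zero_point) * scale


def requantize(q, src, dst):
    """Convert between dtypes via a dq -> q pair, as in `dq -> q_to_quint8`.

    src is (scale, zero_point); dst is (scale, zero_point, qrange).
    """
    x = dequantize(q, src[0], src[1])
    return quantize(x, dst[0], dst[1], dst[2])


# Output of linear1 is held as qint32; linear2 expects quint8 input.
qint32_params = (2**-12, 0)
quint8_params = (2**-6, 128, QUINT8_RANGE)

linear1_out_fp32 = 0.73
q32 = quantize(linear1_out_fp32, qint32_params[0], qint32_params[1], QINT32_RANGE)
q8 = requantize(q32, qint32_params, quint8_params)
linear2_in_fp32 = dequantize(q8, quint8_params[0], quint8_params[1])
```

The coarser quint8 scale is where precision is lost: the value survives the qint32 step almost unchanged but is rounded to the nearest multiple of the quint8 scale before it reaches the second op.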

8 Commits
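
The gate splitting described in 3fcf66f61f and f86a1753d1 can be sketched without PyTorch. The following is an illustrative pure-Python model, not the code in `torch/ao/nn/quantizable/modules/rnn.py`: the helper names (`linear`, `chunk4`, `fused_gates`, `split_gates`) and the tiny dimensions are assumptions. It shows that slicing the fused `4 * hidden_dim` weight into four per-gate projections and adding per gate yields the same four gate vectors, without ever building the full `gates` intermediate.

```python
import random


def linear(x, weight, bias):
    """Compute W x + b for one input vector; weight is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def chunk4(values):
    """Split a list (of rows or scalars) into 4 equal consecutive parts."""
    n = len(values) // 4
    return [values[i * n:(i + 1) * n] for i in range(4)]


def fused_gates(x, h, w_ih, b_ih, w_hh, b_hh):
    """Original path: one projection per state, add, then split into 4."""
    igates = linear(x, w_ih, b_ih)                    # length 4 * hidden_dim
    hgates = linear(h, w_hh, b_hh)                    # length 4 * hidden_dim
    gates = [a + b for a, b in zip(igates, hgates)]   # large intermediate
    return chunk4(gates)


def split_gates(x, h, w_ih, b_ih, w_hh, b_hh):
    """Split path: 4 smaller projections per state, added gate by gate."""
    per_gate = zip(chunk4(w_ih), chunk4(b_ih), chunk4(w_hh), chunk4(b_hh))
    out = []
    for wi, bi, wh, bh in per_gate:
        ig = linear(x, wi, bi)                        # length hidden_dim
        hg = linear(h, wh, bh)                        # length hidden_dim
        out.append([a + b for a, b in zip(ig, hg)])
    return out


# Hypothetical sizes and random parameters, for illustration only.
random.seed(0)
input_dim, hidden_dim = 2, 3
w_ih = [[random.uniform(-1, 1) for _ in range(input_dim)] for _ in range(4 * hidden_dim)]
w_hh = [[random.uniform(-1, 1) for _ in range(hidden_dim)] for _ in range(4 * hidden_dim)]
b_ih = [random.uniform(-1, 1) for _ in range(4 * hidden_dim)]
b_hh = [random.uniform(-1, 1) for _ in range(4 * hidden_dim)]
x = [random.uniform(-1, 1) for _ in range(input_dim)]
h = [random.uniform(-1, 1) for _ in range(hidden_dim)]

fused = fused_gates(x, h, w_ih, b_ih, w_hh, b_hh)
split = split_gates(x, h, w_ih, b_ih, w_hh, b_hh)
```

In this floating-point model the two paths agree exactly because each output element is computed by the same operations in the same order. In a quantized module each split projection and add would carry its own observer and qparams, so the numerics there need not match the fused path bit for bit.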