Summary:
Raise and assert used to have a hard-coded error message "Exception". User provided error message was ignored. This PR adds support to represent user's error message in TorchScript.
This breaks backward compatibility because now we actually need to script the user's error message, which can potentially contain unscriptable expressions. Such programs can break when scripting, but saved models can still continue to work.
Increased an op count in test_mobile_optimizer.py because now we need aten::format to form the actual exception message.
This is built upon an WIP PR: https://github.com/pytorch/pytorch/pull/34112 by driazati
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41907
Reviewed By: ngimel
Differential Revision: D22778301
Pulled By: gmagogsfm
fbshipit-source-id: 2b94f0db4ae9fe70c4cd03f4048e519ea96323ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33504
Fix resolution fo functions that are bound onto torch in torch/functional.py. This does not fix compilation of all of those functions, those will be done in follow ups. Does torch.stft as a start.
Fixes#21478
Test Plan: Imported from OSS
Differential Revision: D20014591
Pulled By: eellison
fbshipit-source-id: bb362f1b5479adbb890e72a54111ef716679d127
Summary:
This should be covered under recursive script now
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32235
Pulled By: driazati
Differential Revision: D19414889
fbshipit-source-id: 85f8132401dbe44c9dbaef7c0350110f90eb9843
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28766
Add the warning message to explicitly ask the users to upgrade the deprecated `torch.jit.quantized` API to the new `torch.quantization.quantize_dynamic` API.
ghstack-source-id: 92711620
Test Plan: CI
Differential Revision: D18164903
fbshipit-source-id: e6aff2527f335c2d9f362e6856ce8597edb52aaa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27399
This was devised in a time when we didn't have module attributes. They
are essentially just tensor lists, so represent them that way. This has
the additional benefit of making the RNN forward pass faster because we
effectively cache the flattened weights.
The only complication part is that someone may come along and do:
```
my_rnn_mod.w_ih_l0 = torch.nn.Parameter(...)
```
This means we need to override setattr to keep the flattened weights
cache up to date.
Test Plan: Imported from OSS
Differential Revision: D17785658
Pulled By: suo
fbshipit-source-id: 7789cd1d0d4922bfd5eba1716976442fbf150766
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26311
We are currently unable to deploy models due to D16955662 changing function signature of ```quantized_lstm(``` but the function call here (https://fburl.com/diffusion/e4wrmx83) not passing the newly added ```use_dynamic``` param.
Here is the details of the error: P111215482
```
E0916 12:36:16.423516 1149877 ExceptionTracer.cpp:214] exception stack complete
terminate called after throwing an instance of 'torch::jit::script::ErrorReport'
what():
Arguments for call are not valid.
The following operator variants are available:
aten::quantized_lstm(Tensor input, Tensor[] hx, Tensor[] params, bool has_biases, int num_layers, float dropout, bool train, bool bidirectional, bool batch_first, *, int? dtype=None) -> (Tensor, Tensor, Tensor):
Keyword argument use_dynamic unknown.
```
This diff fixes that.
Test Plan:
Running quantization tests after.
```buck test mode/dev caffe2/test:jit -- 'test_quantization_modules \(test_jit\.TestScript\)'```
https://our.intern.facebook.com/intern/testinfra/testrun/5910974518872494
Also, currently building a package (language_technology.translation.jedi.scripts:35c3643) and testing this (f138747078).
f138771702
Reviewed By: jhcross
Differential Revision: D17404451
fbshipit-source-id: 390d2ce1ecbdd63a07a8f16c80e4c3ac25ab0a99
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24046
`nnq.Linear` was a confusing mess of buffers/attributes and Tensor/not tensor members. This PR reworks it to consistently have only Python attributes, with the conversions handled explicitly by state_dict or __{get,set}state__ methods (added in PRs further up the stack
Test Plan: Imported from OSS
Reviewed By: driazati
Differential Revision: D16728345
Pulled By: jamesr66a
fbshipit-source-id: 47468b776b428fca2409bb55c8b161afb68a3379
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22956
As Title says: remove the extra function arguments for better engineering.
Differential Revision: D16297724
fbshipit-source-id: a31be17708d13508c4ce9a3ce7eb5238e8d17984
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23577
This diff is fixing a model size issue introduced in #23291. After that PR, the model size after in8 quantization is the same as that of the original unquantized model. The reason is that we save original weight for int8 quantization even when that's not needed anymore. This diff fixes that by only saving original weight for fp16 quantization path.
Reviewed By: llyfacebook
Differential Revision: D16557619
fbshipit-source-id: f924ae8d155a0d525b86a7440b3c7147d5bead0a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23347
This diff replaces uint8 with int8 to match with the underlying kernel implementation. When we do int8 quantization, we are computing with uint8 (input activation) * int8 (weight) -> uint8 (output activation). The weight is quantized into int8.
Reviewed By: jianyuh
Differential Revision: D16469435
fbshipit-source-id: a697655b0e97833fc601e5980970aec4dba53c39
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23291
This diff implements LSTM with FP16 weights based on FBGEMM.
At a high level, here are the steps:
1. Quantize and pack weight in every layer of LSTM
2. Pass weights from step 1 to the ATen `quantized_lstm` function which does matrix multiplication with FP16 weight. The following code shows the dtype of each variable used in MM:
Y = X * W + B
(fp32, fp32, fp16, fp32)
Reviewed By: jianyuh
Differential Revision: D16389595
fbshipit-source-id: c26ae4e153c667a941f4af64e9d07fc251403cee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22323
This diff adds an interface to use quantized Linear op in JIT.
Reviewed By: jamesr66a
Differential Revision: D16040724
fbshipit-source-id: 90e90aff9973c96ea076ed6a21ae02c349ee2bcf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20787
Set requires_grad=False for bias: this will block the jit tracing.
The as_type fix: The input tensor shape and output tensor shape will be different, which will trigger the assertion failure at https://fburl.com/0m8xy7tc.
Reviewed By: jamesr66a
Differential Revision: D15445092
fbshipit-source-id: 22da41a56ecb9ac092585d0cc1ff0658fb9d631b
Summary:
Previously we only had a Python wrapper for `torch.quantized_lstm_cell`. We had the op `torch.quantized_lstm`, but it didn't have a wrapper. This makes the wrapper
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20249
Reviewed By: driazati
Differential Revision: D15250023
Pulled By: jamesr66a
fbshipit-source-id: f05ad784d903e0ef3a62633c8bf80bad79de48ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**
This was requested by someone at Facebook; this lint is turned
on for Facebook by default. "Sure, why not."
I had to noqa a number of imports in __init__. Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it. Left for future work.
Be careful! flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments. flake8-3 will
report an import unused; flake8-2 will not. For now, I just
noqa'd all these sites.
All the changes were done by hand.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14687478
fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
Summary:
Use flake8 installed with mypy checks so that our linter matches fbcode. Mypy type errors also provide valuable signal
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17721
Differential Revision: D14357778
Pulled By: eellison
fbshipit-source-id: d8c9ea3fe3b5f550c3b70fe259e0eabf95e4c92d
Summary:
Remove calls to torch.jit._unwrap_optional that are no longer needed.
The remaining instances would require control flow logic for exceptions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16245
Differential Revision: D13804292
Pulled By: eellison
fbshipit-source-id: 08c5cbe4b956519be2333de5cf4e202488aff626
Summary:
Similarly to https://github.com/pytorch/pytorch/pull/13777, we apply post-processing quantization to RNN cell modules (`RNNCell`, `LSTMCell`, and `GRUCell`).
A further follow-up PR will involve quantizing the full `RNN`, `GRU`, and `LSTM` modules. This depends on those modules being scriptable as part of the standard library scripting effort, though. Note that infrastructure in this pr such as `gather_quantized_params` is currently unused but should be used in the future when we can port over the full RNN modules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15469
Differential Revision: D13545802
Pulled By: jamesr66a
fbshipit-source-id: ad3b694517842893ea619438e9f5e88fd7b96510
Summary:
This PR implements infrastructure for post-processing a model to apply int8 quantization to its `nn.Linear` modules. Highlights of the implementation:
1) Inputs and outputs are `float` (quantized and packed internally), but the weight is quantized and packed ahead of time for efficiency. This implementation performs well in small-batch size GEMM calls. It should not be considered a general-purpose quantized GEMM kernel.
2) Weight packing is dependent on machine architecture (e.g. vector register width), so it is done just-in-time. Concretely, it is done on model load for the weights and it is done during operator execution for the input value.
3) Biases are unquantized
4) We fail loudly if we are attempting to run this on a machine that does not support FBGEMM. This is because we do not want a model's numerics to differ based on which machine it is run on. A model containing these FBGEMM ops *must* be run with FBGEMM
The API can be seen in the added test case. Highlights are:
1) `torch.jit.quantized.quantize_linear_modules` walks the module hierarchy of the passed-in Module and replaces all `nn.Linear` modules with a new `QuantizedLinear` module, which encapsulates the behavior described above.
2) `_pack()` and `_unpack()` script methods are present on `QuantizedLinear` modules. These methods should be called before serialization and after deserialization, respectively. This ensures that the weight matrix is properly packed for the running machine's architecture. Note that in the long term, we would like to move toward a more Pickle-style serialization technique, rather than having these explicit methods that mutate member values. This is blocked on being able to assign attributes in a ScriptMethod, among other things.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13777
Differential Revision: D13383276
Pulled By: jamesr66a
fbshipit-source-id: 00f29c9f34544add2b90107e3cf55a287802c344