Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41934
The model exported from the online training workflow with int8 quantization contains FCs with 4 inputs; the extra input is the quant_param blob. This diff adjusts the bound_shape_inferencer and the int8 op schema to get shape info for the quant_param input.
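For illustration, a minimal sketch of what such a four-input Int8FC OperatorDef might look like, built with the generated caffe2 protobuf API; the blob names are hypothetical and only the 4-input layout matters:
```
#include "caffe2/proto/caffe2_pb.h"

// Sketch only: an Int8FC with the extra quant_param input described above.
// Blob names are placeholders.
caffe2::OperatorDef MakeInt8FCWithQuantParam() {
  caffe2::OperatorDef op;
  op.set_type("Int8FC");
  op.add_input("X");            // activations
  op.add_input("W");            // weights
  op.add_input("b");            // bias
  op.add_input("quant_param");  // Int8QuantParamsBlob with scale/zero point
  op.add_output("Y");
  return op;
}
```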
Test Plan:
```
buck test caffe2/caffe2/opt:bound_shape_inference_test
```
Reviewed By: yinghai
Differential Revision: D22683554
fbshipit-source-id: 684d1433212a528120aba1c37d27e26b6a31b403
Summary:
This directory is opted-in to clang-format but is not format-clean. This blocks continuous formatting from being enabled on fbcode, and causes hassle for other codemods that leave inconsistent formatting. This diff runs clang-format, which is widely used and considered safe.
If you are unhappy with the formatting of a particular block, please *accept this diff* and then in a stacked commit undo the change and wrap that code in `// clang-format off` and `// clang-format on`, or `/* clang-format off */` and `/* clang-format on */`.
drop-conflicts
Test Plan: sandcastleit
Reviewed By: jerryzh168
Differential Revision: D22311706
fbshipit-source-id: 1ca59a82e96156a4a5dfad70ba3e64d44c5e762a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40494
Resubmitting the diff because D22124313 (1ec4337b7d) was reverted due to CI test failures.
Added int8_gen_quant_params.cc to CMakeLists.txt to fix the CI failures.
Test Plan: buck test caffe2/caffe2/quantization/server:
Reviewed By: hx89
Differential Revision: D22204244
fbshipit-source-id: a2c8b668f199cc5b0c5894086f554f7c459b1ad7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40390
Change the Int8FC/Int8Quantize op interface to use Int8QuantParamsBlob as the format of the qparam input blob when needed.
Test Plan:
```
buck test caffe2/caffe2/quantization/server:
```
Reviewed By: hx89
Differential Revision: D22124313
fbshipit-source-id: 6b5c1974c0fc5928f72773495f0da8d0eb9b98c9
Summary: Extend the int8 FC op to take the scale and zero point from an input to support int8 PTQ productization of online training models.
Test Plan: buck test caffe2/caffe2/quantization/server:fully_connected_dnnlowp_op_test
Reviewed By: csummersea
Differential Revision: D21944884
fbshipit-source-id: 2094827da903f3993afe4f8cf6e70286b195321d
Summary: Extend the int8 quantize op to take the scale and zero point from an input to support int8 PTQ productization of online training models.
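For reference, a minimal sketch of affine uint8 quantization with a caller-supplied scale and zero point, which is what taking these values from an input enables; this is not the actual DNNLOWP kernel:
```
#include <algorithm>
#include <cmath>
#include <cstdint>

// Sketch only: quantize a float with an externally supplied scale/zero point
// instead of computing them from the data.
inline uint8_t QuantizeUint8(float x, float scale, int32_t zero_point) {
  const int32_t q =
      zero_point + static_cast<int32_t>(std::nearbyint(x / scale));
  return static_cast<uint8_t>(std::min(255, std::max(0, q)));
}
```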
Test Plan: buck test caffe2/caffe2/quantization/server:quantize_dnnlowp_op_test
Reviewed By: csummersea
Differential Revision: D21939660
fbshipit-source-id: 7ce2fbf9cd8a990c270f2187a49b1578ce76bc37
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32104
Fixes these warnings:
```
xplat\caffe2\caffe2Windows#header-mode-symlink-tree-only,headers\caffe2\operators\quantized\int8_conv_op.h(96,17): warning: use 'template' keyword to treat 'data' as a dependent template name
W.t.data<uint8_t>(),
^
template
xplat\caffe2\caffe2Windows#header-mode-symlink-tree-only,headers\caffe2\operators\quantized\int8_conv_op.h(97,17): warning: use 'template' keyword to treat 'data' as a dependent template name
B.t.data<int32_t>(),
^
template
```
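For context, a minimal sketch of the fix for this class of warning: a member template accessed through a type that depends on a template parameter needs the `template` keyword. The types here are stand-ins, not the actual Caffe2 tensors:
```
#include <cstdint>

// Sketch only: 'data' is a member template of a type that depends on TensorT,
// so the call needs the 'template' keyword to parse cleanly under clang-cl.
template <typename TensorT>
const uint8_t* GetData(const TensorT& W) {
  // return W.t.data<uint8_t>();        // emits the warning above
  return W.t.template data<uint8_t>();  // dependent name made explicit
}
```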
Test Plan: Tested locally with clang-cl and CI for other toolchains
Reviewed By: boguscoder
Differential Revision: D19353563
fbshipit-source-id: c28afb8c1ad72fd77ef82556ba89fcf09100d1f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30915
Since we now have C++14, we don't need these c10::guts helpers anymore
ghstack-source-id: 95777609
Test Plan: waitforsandcastle
Differential Revision: D18869639
fbshipit-source-id: 97716f932297c64c6e814410ac47b444c33d4e2e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30498
Updated Int8SliceOp to accept dim, start, and end indices, similar to PyTorch.
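For illustration, a rough sketch of an Int8Slice OperatorDef using dim/start/end arguments; the argument names follow the wording above and should be treated as assumptions:
```
#include "caffe2/proto/caffe2_pb.h"

// Sketch only: argument names follow the summary wording and are assumptions.
caffe2::OperatorDef MakeInt8Slice() {
  caffe2::OperatorDef op;
  op.set_type("Int8Slice");
  op.add_input("X_q");
  op.add_output("Y_q");
  auto* dim = op.add_arg();
  dim->set_name("dim");
  dim->set_i(1);
  auto* start = op.add_arg();
  start->set_name("start");
  start->set_i(0);
  auto* end = op.add_arg();
  end->set_name("end");
  end->set_i(4);
  return op;
}
```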
Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps.test_slice
Imported from OSS
Differential Revision: D18740519
fbshipit-source-id: 2313f37a4936edb150ce04911b241e591e191801
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30202
The PyTorch Upsample operator takes output_size as an argument.
For quantized tensor inputs we cannot get the input_size to calculate the width and height scale factors.
Instead, we pass output_size directly to Caffe2 to calculate the scale factors.
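For illustration, a minimal sketch of the scale computation once output_size is known on the Caffe2 side; the function name is made up:
```
#include <cstdint>

// Sketch only: with output_size available, each scale is just output/input.
inline float ResizeScale(int64_t output_dim, int64_t input_dim) {
  return static_cast<float>(output_dim) / static_cast<float>(input_dim);
}
```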
Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps.test_upsample
Imported from OSS
Differential Revision: D18631478
fbshipit-source-id: 38a39129bc863f4ecf2293acc068e40ab7edc825
Summary: It's failing in the FB internal build because we don't enable that op.
Test Plan: buck test //xplat/caffe2:caffe2_testAndroid
Reviewed By: supriyar
Differential Revision: D17139694
fbshipit-source-id: 8091b71ff826466f3e2e1b4d6f87b9b50d1def20
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16382
Adds an Int8TransposeOp that inherits from TransposeOp, along with a small refactoring of the regular TransposeOp to move the main logic into a TransposeImpl function.
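For illustration, a simplified stand-in for the pattern described above (not the actual Caffe2 classes): the shared logic lives in TransposeImpl on the base op, and the int8 variant reuses it on uint8 data:
```
#include <cstdint>
#include <vector>

// Sketch only: a 2-D stand-in for the real N-D transpose.
class TransposeOpSketch {
 protected:
  template <typename T>
  static void TransposeImpl(const std::vector<T>& in, int rows, int cols,
                            std::vector<T>& out) {
    out.resize(in.size());
    for (int r = 0; r < rows; ++r) {
      for (int c = 0; c < cols; ++c) {
        out[c * rows + r] = in[r * cols + c];
      }
    }
  }
};

class Int8TransposeOpSketch : public TransposeOpSketch {
 public:
  static void Run(const std::vector<uint8_t>& in, int rows, int cols,
                  std::vector<uint8_t>& out) {
    TransposeImpl(in, rows, cols, out);  // reuse the base implementation
  }
};
```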
Test Plan: int8_test.cc
Reviewed By: supriyar
Differential Revision: D13822715
fbshipit-source-id: a4d61bdf8e4e1d3f2e30b86d325810ed44c21635
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23566
Currently, if we use dynamic quantization we don't have access to the internally quantized inputs and output for debugging.
To make debugging easier, this diff adds a debug feature that exposes the quantized X, W, and Y when debug outputs are attached to the operator and the caffe2_dnnlowp_force_slow_path flag is set.
The quantized inputs and output are exposed as extra outputs.
An example Int8FC op with debug outputs appended looks like:
```
op {
  input: "X"
  input: "W"
  input: "b"
  output: "Y"
  output: "X_q"
  output: "W_q"
  output: "Y_q"
  name: ""
  type: "Int8FC"
  arg {
    name: "axis"
    i: 1
  }
  ...
}
```
Next, we need to expose the quantization parameters.
Reviewed By: jspark1105
Differential Revision: D16566753
fbshipit-source-id: acd855a172ee7993ddba8808f2af81b628ff9c02
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17088
clangr codemod.
Also manually moved the constructor of a class from the .cpp file to the .h file.
Reviewed By: ezyang
Differential Revision: D14078531
fbshipit-source-id: 2adb4ac0ce523742da6cce3bc3b6c177b816c299
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17272
After Windows-specific fixes were applied, a new file was left out of CMakeLists.
Reviewed By: orionr
Differential Revision: D14140419
fbshipit-source-id: 6a6c652048ed196ec20241bc2a1d08cbe2a4e155
Summary:
Hi,
caffe2/operators/quantized/int8_given_tensor_fill_op.cc expects the value array to be named "values", but the operator schema describes "value" (no "s"). I guess it is a little typo, but it made me lose a bit of time before understanding why I had this error when passing "value" instead of "values":
```
[F int8_given_tensor_fill_op.h:95] Check failed: output->t.numel() == values_.numel() output size: 3 given size: 0
Aborted (core dumped)
```
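For anyone hitting the same thing, a hedged sketch of an Int8GivenTensorFill OperatorDef using the argument name the operator actually checks ("values", with the trailing "s"); the shape, scale, and zero point values here are made up:
```
#include <string>

#include "caffe2/proto/caffe2_pb.h"

// Sketch only: three uint8 elements packed into the "values" bytes argument.
caffe2::OperatorDef MakeInt8Fill() {
  caffe2::OperatorDef op;
  op.set_type("Int8GivenTensorFill");
  op.add_output("Y_q");
  auto* values = op.add_arg();
  values->set_name("values");  // note the trailing "s"
  values->set_s(std::string("\x01\x02\x03", 3));
  auto* shape = op.add_arg();
  shape->set_name("shape");
  shape->add_ints(3);
  auto* scale = op.add_arg();
  scale->set_name("Y_scale");
  scale->set_f(0.1f);
  auto* zero_point = op.add_arg();
  zero_point->set_name("Y_zero_point");
  zero_point->set_i(0);
  return op;
}
```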
Thanks,
Eyyüb Sari
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16204
Differential Revision: D14020476
Pulled By: ezyang
fbshipit-source-id: a8a46bfc44ec125e7925ce4b7c79fdf99c890a50
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16765
Code changes required to build Caffe2 for Windows with the toolchain used by FB.
Reviewed By: orionr
Differential Revision: D13953258
fbshipit-source-id: 651823ec9d81ac70e32d4cce5bc2472434104733
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16273
Previously we had SetOutputSize, which accepted a partially initialized output Tensor and set it to the correct size; this diff changes it to GetOutputSize, which returns the correct size instead.
e.g.
```
auto* Y = Output(0);
ConvPoolOp<Context>::SetOutputSize(X, Y, channels);
...
Y->mutable_data<T>...
```
-->
```
auto sizes = ConvPoolOp<Context>::GetOutputSize(X, channels);
auto* Y = Output(0, sizes, at::dtype<T>());
```
Reviewed By: dzhulgakov
Differential Revision: D13736281
fbshipit-source-id: 64abce3dbaed0b375098463333dfd0ea5a3b1945
Summary:
If we use clang with SSE4 support, we get a function redefinition error between [1] and [2]. This patch tries to add some checks to fix this problem.
I just turned on USE_NATIVE_ARCH with clang, and then I hit the redefinition error.
[1]
caffe2/operators/quantized/int8_simd.h
[2]
third_party/gemmlowp/gemmlowp/fixedpoint/fixedpoint_sse.h
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13859
Differential Revision: D13095694
Pulled By: ezyang
fbshipit-source-id: c65166e4d5a04bb54e2b82c52740af00116ccb0d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15912
Codemod generated with clangr shard mode, 25 files per diff.
To eliminate partially initialized Tensors, we split the initialization of local Tensor variables into two steps: first declare an uninitialized Tensor, and then call `ReinitializeTensor` to initialize it.
motivation: https://github.com/pytorch/pytorch/pull/12407
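For reference, a minimal sketch of the two-step pattern this codemod applies; the class and member names are illustrative:
```
#include "caffe2/core/tensor.h"

namespace caffe2 {

// Sketch only: the member is declared without a device/type and is sized and
// allocated lazily via ReinitializeTensor inside the run method.
class SomeOpSketch {
 public:
  void RunOnDevice(int64_t N) {
    ReinitializeTensor(&buffer_, {N}, at::dtype<float>().device(CPU));
  }

 private:
  Tensor buffer_;  // declared uninitialized; no partially initialized Tensor
};

} // namespace caffe2
```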
Reviewed By: dzhulgakov
Differential Revision: D13586734
fbshipit-source-id: 8485d2c51225343961351c7a2e8f95055534f9a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15967
Codemod generated with clangr shard mode, 25 files per diff.
To eliminate partially initialized Tensors, we split the initialization of local Tensor variables into two steps: first declare an uninitialized Tensor, and then call `ReinitializeTensor` to initialize it.
motivation: https://github.com/pytorch/pytorch/pull/12407
Reviewed By: smessmer
Differential Revision: D13586735
fbshipit-source-id: eae2d79e1107a2e813ce3809e690af4706aaa9ca
Summary:
50x-100x speedup compared to the current version.
Also fixes a bug in the current version when the batch size exceeds 1 (the current version processes only the first image in that case).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14883
Differential Revision: D13390655
Pulled By: Maratyszcza
fbshipit-source-id: 1b33a97bf2d0866d38faa2b42e64fd2859017898
Summary:
2.2-2.9x better performance on ARM when compiled with gcc (the same poor performance when compiled with Clang).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14783
Differential Revision: D13332680
Pulled By: Maratyszcza
fbshipit-source-id: 4c1138500c6b3026335e9bfe5f6be43b1ae2cefb
Summary:
- Improved single-threaded performance due to optimized low-level micro-kernels
- Improved parallelization (previously parallelized only across images in a batch and across pixels; now within channels as well)
- Slightly different results due to a different implementation of fixed-point arithmetic (no accuracy loss expected)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14089
Differential Revision: D13110135
Pulled By: Maratyszcza
fbshipit-source-id: 1f149394af5c16940f79a3fd36e183bba1be2497
Summary:
- NEON2SSE is a header that implements NEON intrinsics on top of SSE intrinsics
- The upstream repo provides the NEON_2_SSE.h header, but internally it was imported as neon2sse.h
- This patch fixes incompatibilities between the internal and upstream versions
Reviewed By: hlu1
Differential Revision: D13096755
fbshipit-source-id: 65e1df9a2a5e74bd52c9aee9be27469ba938cd8c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13660
Any change to a server-side quantized operator was triggering ios-sanity-check, with more than 5 hours of testing time. I suspect this was because the operator code was synced with the xplat directory. This diff moves the server-side quantized operators to caffe2/caffe2/quantization/server to avoid this issue.
Reviewed By: hx89
Differential Revision: D12955420
fbshipit-source-id: b6c824b9de5e2a696f8c748e1b2c77d81d46746b
Summary:
C2GEMMContext is a remnant of the days when Int8 ops used gemmlowp.
It is no longer needed: the formerly gemmlowp-based ops now use QNNPACK with the pthreadpool interface, and the other ops (Int8Add, Int8ChannelShuffle) use the Caffe2 thread pool interface directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13443
Differential Revision: D12887773
Pulled By: Maratyszcza
fbshipit-source-id: bd2732e2c187b399c8a82efebdd244457720256b