Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23566
Currently, if we use dynamic quantization, we don't have access to the internally quantized inputs and output for debugging.
To make debugging easier, this diff adds a debug feature that exposes the quantized X, W, and Y when debug outputs are attached to the operator and the caffe2_dnnlowp_force_slow_path flag is set.
The quantized inputs and output are exposed as extra outputs.
An example Int8FC op with debug outputs appended looks like:
```
op {
input: "X"
input: "W"
input: "b"
output: "Y"
output: "X_q"
output: "W_q"
output: "Y_q"
name: ""
type: "Int8FC"
arg {
name: "axis"
i: 1
}
...
}
```
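A minimal sketch of inspecting the extra outputs from Python (assumes a net rewritten as above; the blob names and the GlobalInit flag syntax are illustrative, not the exact debugging workflow):
```
from caffe2.python import workspace

# Assumption: the debug outputs have been appended to the Int8FC op as shown above,
# and the slow path is forced so the quantized intermediates are materialized.
workspace.GlobalInit(["caffe2", "--caffe2_dnnlowp_force_slow_path=true"])
workspace.RunNet(predict_net)  # predict_net: the Int8 net with debug outputs attached

x_q = workspace.FetchBlob("X_q")  # internally quantized input
w_q = workspace.FetchBlob("W_q")  # internally quantized weight
y_q = workspace.FetchBlob("Y_q")  # quantized output before dequantization
```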
Next, we need to expose the quantization parameters.
Reviewed By: jspark1105
Differential Revision: D16566753
fbshipit-source-id: acd855a172ee7993ddba8808f2af81b628ff9c02
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22143
Like the Conv DNNLOWP operator, allow FC to run the slow path to debug numerical issues caused by Intel's int8 instruction that horizontally adds two int8 multiplication results in 16 bits.
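As a minimal numeric sketch of the issue (plain numpy arithmetic, not the actual instruction): the vpmaddubsw-style path multiplies unsigned 8-bit activations by signed 8-bit weights and sums adjacent pairs with saturation into 16 bits, so the fast path can lose precision relative to 32-bit accumulation.
```
import numpy as np

# Two adjacent uint8 * int8 products that the 16-bit path sums with saturation.
a = np.array([255, 255], dtype=np.int32)  # uint8 activations
w = np.array([127, 127], dtype=np.int32)  # int8 weights
exact = int((a * w).sum())                # 64770 with 32-bit accumulation
saturated = min(exact, 2**15 - 1)         # 32767 after int16 saturation
print(exact, saturated)
```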
Reviewed By: hx89
Differential Revision: D15966885
fbshipit-source-id: c6726376a3e39d341fd8aeb0e54e0450d2af8920
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22015
The previous fusion logic only worked for operators that appear back-to-back in the linear order of the protobuf file.
This diff generalizes it to work for any predecessor-successor pair of operators in the graph, as long as there is no "interfering" use/def of the related blobs.
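A minimal sketch (hypothetical helper, not the actual implementation) of the "no interfering use/def" condition:
```
def can_fuse(net_proto, producer_idx, consumer_idx, blob):
    # Ops strictly between the producer and the consumer may neither read nor
    # write the blob that would be fused away.
    for op in net_proto.op[producer_idx + 1 : consumer_idx]:
        if blob in op.input or blob in op.output:
            return False  # interfering use/def found
    return True
```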
Reviewed By: csummersea
Differential Revision: D15916709
fbshipit-source-id: 82fe4911a8250845a8bea3427d1b77ce2442c495
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21606
StoreMatrixInMatrixMarketFormat could only dump quantized tensors, but sometimes we want to dump float tensors.
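For reference, the same Matrix Market text format can be produced for a dense float matrix with scipy (a sketch for inspecting such dumps, not the C++ code path):
```
import numpy as np
from scipy.io import mmwrite

W = np.random.randn(4, 3).astype(np.float32)
mmwrite("W_float.mtx", W)  # dense ("array") float matrix in Matrix Market format
```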
Reviewed By: csummersea
Differential Revision: D15741611
fbshipit-source-id: 95b03c2fdf1bd8407f7d925171d9dc9f25677464
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21393
Result of splitting the base diff: we moved a header from src/* to include/fbgemm/*.
Reviewed By: jianyuh
Differential Revision: D15635188
fbshipit-source-id: ad7d0ddba964ff1cb8b2e33f5f98e457a4d2eac9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20390
duc0 Ngo implemented observing floating point exceptions, but there were a couple of places where we had "benign" floating point exceptions leading to false positives. This diff eliminates one source of such false positives, namely using _mm256_cvtph_ps and _mm256_cvtps_ph on a partially uninitialized array in the remainder loop.
Reviewed By: hx89
Differential Revision: D15307358
fbshipit-source-id: 38f57dfdd90c70bc693292d2f9c33c7ba558e2c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19681
For the accelerator, we need to lower just the quantized weight data without layout transformation. This diff provides that option.
Reviewed By: jerryzh168, zrphercule
Differential Revision: D15066568
fbshipit-source-id: 133d749e087c2ad4a899bee5e96f597f70b2443c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19118
A bug introduced by D14700576 and reported by Yufei (fixed by D14778810 and D14785256) was not detected by our unit tests.
This diff improves the unit tests to catch such errors (with this diff and without D14778810, we can reproduce the bug Yufei reported).
This improvement also revealed a bug that affects accuracy when we pre-pack weight and bias together and the pre-packed weight/bias are used by multiple nets: we were modifying the pre-packed bias in place, even though it was supposed to be constant.
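A toy illustration (made-up numbers) of why in-place modification of a shared pre-packed bias breaks the second net that uses it:
```
import numpy as np

bias = np.array([10.0, 20.0])                 # shared pre-packed bias (should be constant)
col_offset_correction = np.array([1.0, 2.0])

bias += col_offset_correction  # net 1 folds the correction into the bias in place
bias += col_offset_correction  # net 2 re-uses the same blob and folds it again
print(bias)  # [12. 24.] instead of the expected [11. 22.]
```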
Reviewed By: csummersea
Differential Revision: D14806077
fbshipit-source-id: aa9049c74b6ea98d21fbd097de306447a662a46d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18902
The fix in D14778810 had an issue: when we fall back to acc32 because the outlier density is too high, W_quantized_ has already been modified. In this diff we first just count the number of outliers (without modifying W_quantized_), and only when the density is low enough and no fallback is needed do we modify W_quantized_ and construct an outlier matrix.
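A minimal numpy sketch (hypothetical threshold and density names) of the two-pass approach: count outliers first, and only split the weight into dense plus sparse outlier parts when no acc32 fallback is needed:
```
import numpy as np

def split_outliers(W_quantized, outlier_threshold=64, fallback_density=0.05):
    is_outlier = np.abs(W_quantized) > outlier_threshold
    if is_outlier.mean() > fallback_density:
        return W_quantized, None                      # fall back to acc32; W untouched
    W_dense = np.where(is_outlier, 0, W_quantized)    # outliers zeroed in the dense part
    W_outlier = np.where(is_outlier, W_quantized, 0)  # kept separately as a sparse matrix
    return W_dense, W_outlier
```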
Reviewed By: jspark1105
Differential Revision: D14785256
fbshipit-source-id: 03933110a4ca7409686a06b18a9bb921f8657950
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19004
Handle the exceptional case when the data has min 3.40282e+38 and max -3.40282e+38.
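3.40282e+38 is FLT_MAX, so this is what a min/max reduction looks like when it never saw any data. A sketch of one reasonable handling (an assumption, not the exact code):
```
import numpy as np

mini = np.finfo(np.float32).max    # +3.40282e+38: min never updated
maxi = -np.finfo(np.float32).max   # -3.40282e+38: max never updated
if mini > maxi:                    # nothing observed; use an empty range around zero
    mini, maxi = 0.0, 0.0
```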
Reviewed By: jspark1105
Differential Revision: D14822193
fbshipit-source-id: b9771d1584fdf8317f5b8c7f5806be5d27314386
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18974
When the weight is prepacked and it doesn't contain a prepacked weight for acc32, we shouldn't fall back to acc32.
Reviewed By: bddppq
Differential Revision: D14814067
fbshipit-source-id: aec917322de695e283f0aca1e930c5603d196404
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18881
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18878
When the weight is prepacked and it doesn't contain a prepacked weight for acc32, we shouldn't fall back to acc32.
TODO: add unit tests with better coverage
Reviewed By: feiyu1990
Differential Revision: D14778810
fbshipit-source-id: d49a8c4b7c815ab29b77feb53ee730ad63780488
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17026
D14013931 was for FC. This diff makes similar optimizations for Conv.
A subtle difference is that in FC, once we fold col_offset into the bias during the pre-processing step, we can treat everything as if A_zero_offset == 0 (symmetric quantization of A).
In Conv, we can't do this because padding still needs to use the original A_zero_offset.
From the requantization point of view, once col_offset is folded into the bias, we can act as if we're doing symmetric A quantization.
But for steps involving padding, like im2col, im2col fused with packing, and direct conv for depth-wise/group convolution, we still need to pass the original A_zero_offset.
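A toy numpy example (made-up values, not the fbgemm code) of why the FC-side folding works: the activation zero point only multiplies per-column sums of the weight, which can be pre-added to the bias.
```
import numpy as np

K = 4
A = np.random.randint(0, 256, (2, K)).astype(np.int32)     # quantized activations
B = np.random.randint(-128, 128, (K, 3)).astype(np.int32)  # quantized weights
a_z, b_z = 3, 2                                            # zero points
bias = np.array([100, 200, 300], dtype=np.int32)

exact = (A - a_z) @ (B - b_z) + bias

# Fold -a_z * col_sum(B - b_z) into the bias once, at pre-packing time; the hot loop
# then no longer needs a_z, as if A were symmetrically quantized.
bias_folded = bias - a_z * (B - b_z).sum(axis=0)
as_if_symmetric = A @ (B - b_z) + bias_folded

assert np.array_equal(exact, as_if_symmetric)
```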
Reviewed By: jianyuh
Differential Revision: D14020276
fbshipit-source-id: c29caefd1127bbc6aff0e9d535939bb0c1ecb66c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18672
On Skylake, when n < 128 or k < 128, acc16 is slower than acc32.
Reviewed By: jianyuh
Differential Revision: D14700576
fbshipit-source-id: 80ca9f1af4626637eed9c5ca49f95ae744811189
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18239
When min is inf or NaN, we get UBSAN errors.
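A sketch (an assumption about the shape of the fix, not the exact code) of sanitizing the range before it reaches the quantization-parameter computation:
```
import math

def sanitize_range(mini, maxi):
    # inf or NaN endpoints would otherwise flow into scale/zero-point arithmetic
    # and trigger undefined-behavior reports.
    if not (math.isfinite(mini) and math.isfinite(maxi)):
        return 0.0, 0.0
    return mini, maxi
```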
Reviewed By: csummersea
Differential Revision: D14537668
fbshipit-source-id: e70ffb5ecd2b10793356070c69fdabf8f25b203e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16373
Motivation: https://github.com/pytorch/pytorch/pull/12407
This is a manual diff.
Most of the fixes are of the form:
```
auto* Y = Output(0);
Y->Resize(dims);
Y->raw_mutable_data(dtype);
```
-->
```
auto* Y = Output(0, dims, at::dtype(dtype));
```
But there might be other cases.
Reviewed By: dzhulgakov
Differential Revision: D13725460
fbshipit-source-id: 649a4b0e42f62cda1a60171dd9fa3e440dc9dca1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18246
Simplifies the histogram collection and quantization process.
Before this diff, histogram collection looked something like this:
```
from caffe2.quantization.server import dnnlowp_pybind11
...
dnnlowp_pybind11.ObserveHistogramOfOutput(hist_file)
for ...
    workspace.RunNet(predict_net)
dnnlowp_pybind11.ClearNetObservers()  # This triggers the Stop function in the observer to dump out the histogram file, but it can have the unintended consequence of also clearing all the other useful observers we attached
```
After this diff, we can do:
```
workspace.CreateNet(predict_net)  # Note: we need to create the net so there is a net to attach the observer to
histogram_observer = dnnlowp_pybind11.AddHistogramObserver(predict_net, hist_file)
for ...
    workspace.RunNet(predict_net)
predict_net.RemoveObserver(histogram_observer)
```
Before this diff, choosing quantization parameters for weights looked something like this:
```
dnnlowp_pybind11.ObserveHistogramOfOutput(weight_hist_file)
workspace.RunNetOnce(init_net)
dnnlowp_pybind11.ClearNetObservers() # Has same issue as the histogram collection example above
dnnlowp_pybind11.RegisterQuantizationParamsWithHistogram(
    weight_hist_file, is_weight=True, qparams_output_file_name=qparams_file
)
workspace.CreateNet(init_net, overwrite=True)
dnnlowp_pybind11.ClearNetObservers()
logger.info("Loading quantization params from {}".format(qparams_file))
blobs_to_qparams = {}
with open(qparams_file) as f:
    lines = f.readlines()
    for line in lines:
        op_id, op_type, output_id, tensor_name, mini, maxi, scale, zero_point, precision = (
            line.split()
        )
        op_id = int(op_id)
        output_id = int(output_id)
        op = net.Proto().op[op_id]
        if op_type != op.type or op.output[output_id] != tensor_name:
            print(
                "Corrupt qparams file {} {} {} {} {}".format(
                    qparams_file, op_type, op.type, op.output[output_id], tensor_name
                )
            )
        blobs_to_qparams[tensor_name] = QuantizationParam(float(scale), int(zero_point))
```
After this diff, this can be simplified to:
```
blobs_to_qparams = {}
for op in init_net.Proto().op:
    for output in op.output:
        scale, zero_point = dnnlowp_pybind11.ChooseQuantizationParams(output)
        blobs_to_qparams[output] = QuantizationParam(scale, zero_point)
```
Reviewed By: dskhudia
Differential Revision: D14544694
fbshipit-source-id: 4fd06cd63256201e2e9d15c39f503138d1be53c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18240
For the rare case when dst_bin_width == 0, we should just put all numbers into an arbitrary bin.
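A minimal sketch (hypothetical names) of the degenerate remapping case: when the destination range collapses to a point, avoid the division by zero and send everything to one bin:
```
def remap_bin(src_value, dst_min, dst_bin_width, num_dst_bins):
    if dst_bin_width == 0:
        return 0  # put all numbers into one arbitrary bin
    idx = int((src_value - dst_min) / dst_bin_width)
    return min(max(idx, 0), num_dst_bins - 1)
```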
Reviewed By: csummersea
Differential Revision: D14544685
fbshipit-source-id: 02d04ff8bd1555d6cf7e7eeb1196a4ab3325a9e5
Summary:
Our AVX2 routines use functions such as _mm256_extract_epi64 that do not exist on 32-bit systems even when they have AVX2. This disables AVX2 when _mm256_extract_epi64 does not exist.
This fixes the "local" part of #17901 (except disabling FBGEMM), but sleef also needs to be updated and NNPACK needs to be fixed; see the bug report for further discussion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17915
Differential Revision: D14437338
Pulled By: soumith
fbshipit-source-id: d4ef7e0801b5d1222a855a38ec207dd88b4680da
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17939
Instead of just asserting min <= 0 and max >= 0, we adjust the histogram to include 0 in the range.
We need to include 0 in the range during norm error minimization to correctly represent our quantization method, which includes 0.
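A one-line sketch of the adjustment (an assumption about its exact form): extend the observed range so it always contains 0 before the histogram is used for norm error minimization:
```
def extend_range_to_include_zero(mini, maxi):
    return min(mini, 0.0), max(maxi, 0.0)
```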
Reviewed By: csummersea
Differential Revision: D14428732
fbshipit-source-id: 6669a9d2c7d409ec3b31aee0afe48071986b9b71
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17764
Original commit changeset: f1923fdca4a1
Reverting the int8 ops fixed the original runtime regression.
We'll ignore the memory regression since it is flaky; see D14228484.
Reviewed By: dzhulgakov
Differential Revision: D13885233
fbshipit-source-id: ccbe4b94acb44b7b4cb3ae4d73e3f6091e1e1195
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17456
Uses an instruction sequence similar to the one in fbgemm/src/QuantUtilAvx2.cc.
Adds elementwise_sum_benchmark.
Reviewed By: protonu
Differential Revision: D14205695
fbshipit-source-id: 84939c9d3551f123deec3baf7086c8d31fbc873e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17105
To make FC with rowwise quantization faster, reduce code duplication, and make code consistent with Convolution
Reviewed By: csummersea
Differential Revision: D14080461
fbshipit-source-id: 2b0e67b86e7e3029c90751a8824bf80ae1223680
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17145
The prepacked weight contains both the weight and the bias, so the bias should be obtained from input index 1, not from index 2.
Reviewed By: jianyuh
Differential Revision: D14097281
fbshipit-source-id: b8b836b85a7b240e2fd1734377c46d9bf2ce3390