Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19118
A bug introduced by D14700576 and reported by Yufei (fixed by D14778810 and D14785256) was not detected by our unit tests.
This diff improves unit tests to catch such errors (with this diff and without D14778810, we can reproduce the bug Yufei reported).
This improvement also revealed a bug that affects accuracy when we pre-pack weight and bias together and the pre-packed weight/bias are used by multiple nets: we were modifying the pre-packed bias in place, even though it is supposed to be constant.
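As a rough illustration (not the actual DNNLOWP code), the failure mode and the fix look like this: if offset folding mutates the shared pre-packed bias in place, the next net reusing the blob folds an already-folded bias again, so the shared blob has to stay constant and the folding has to go into a per-op copy. All names below are illustrative.
```
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical shared pre-packed blob; in Caffe2 this lives in the workspace and
// is reused by every net that references the same pre-packed weight.
struct PrepackedWeight {
  std::vector<std::int32_t> bias;  // expected to stay constant after packing
};

// BUGGY pattern: folding offsets into the shared bias in place. The first net
// that runs mutates the blob, so a second net reusing it folds offsets twice.
void FoldIntoBiasInPlace(PrepackedWeight& packed,
                         const std::vector<std::int32_t>& col_offsets,
                         std::int32_t a_zero_point) {
  for (std::size_t j = 0; j < packed.bias.size(); ++j) {
    packed.bias[j] -= a_zero_point * col_offsets[j];
  }
}

// Fixed pattern: treat the shared blob as constant and fold into a local copy.
std::vector<std::int32_t> FoldIntoBiasCopy(const PrepackedWeight& packed,
                                           const std::vector<std::int32_t>& col_offsets,
                                           std::int32_t a_zero_point) {
  std::vector<std::int32_t> bias = packed.bias;
  for (std::size_t j = 0; j < bias.size(); ++j) {
    bias[j] -= a_zero_point * col_offsets[j];
  }
  return bias;
}
```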
Reviewed By: csummersea
Differential Revision: D14806077
fbshipit-source-id: aa9049c74b6ea98d21fbd097de306447a662a46d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18902
The fix in D14778810 had an issue: when we fall back to acc32 because the outlier density is too high, W_quantized_ has already been modified. In this diff we first just count the number of outliers (without modifying W_quantized_), and only when the density is low enough and no fallback is needed do we modify W_quantized_ and construct an outlier matrix.
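A rough sketch of the two-pass approach, with made-up names and threshold handling: the first pass only counts outliers, and the pass that actually modifies W_quantized_ runs only once we know no acc32 fallback is needed.
```
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <utility>
#include <vector>

// First pass: only count values whose magnitude reaches the outlier threshold,
// leaving the quantized weights untouched. If the resulting density is too
// high, the caller falls back to acc32 with W_quantized_ still intact.
int CountOutliers(const std::vector<std::int8_t>& w_quantized, int threshold) {
  int count = 0;
  for (std::int8_t w : w_quantized) {
    if (std::abs(static_cast<int>(w)) >= threshold) {
      ++count;
    }
  }
  return count;
}

// Second pass, run only when we know we stay on acc16: zero out the outliers
// in W_quantized_ and return them as a sparse (index, value) list from which
// the outlier matrix is built. Names and threshold handling are illustrative.
std::vector<std::pair<std::size_t, std::int8_t>> ExtractOutliers(
    std::vector<std::int8_t>& w_quantized, int threshold) {
  std::vector<std::pair<std::size_t, std::int8_t>> outliers;
  for (std::size_t i = 0; i < w_quantized.size(); ++i) {
    if (std::abs(static_cast<int>(w_quantized[i])) >= threshold) {
      outliers.emplace_back(i, w_quantized[i]);
      w_quantized[i] = 0;
    }
  }
  return outliers;
}
```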
Reviewed By: jspark1105
Differential Revision: D14785256
fbshipit-source-id: 03933110a4ca7409686a06b18a9bb921f8657950
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19004
Handle the exceptional case when the data has min 3.40282e+38 and max -3.40282e+38 (i.e., min is still FLT_MAX and max is still -FLT_MAX because no data was observed).
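A hedged sketch of the kind of guard involved (illustrative names, not the actual code):
```
#include <cfloat>

// If min/max still hold their initialization values (min == FLT_MAX,
// max == -FLT_MAX), no data was observed; fall back to a trivial [0, 0] range
// instead of producing invalid quantization parameters.
void SanitizeRange(float& min, float& max) {
  if (min > max) {  // covers min == FLT_MAX && max == -FLT_MAX
    min = 0.0f;
    max = 0.0f;
  }
}
```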
Reviewed By: jspark1105
Differential Revision: D14822193
fbshipit-source-id: b9771d1584fdf8317f5b8c7f5806be5d27314386
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18974
When the weight is prepacked and it doesn't contain a prepacked weight for acc32, we shouldn't fall back to acc32.
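A tiny sketch of the guard described above, with made-up names:
```
// Fall back from acc16 to acc32 only if the weight is not pre-packed (so we
// can pack an acc32 matrix on the fly) or the pre-packed blob also carries an
// acc32 packed weight.
bool CanFallBackToAcc32(bool weight_is_prepacked, bool prepacked_has_acc32) {
  return !weight_is_prepacked || prepacked_has_acc32;
}
```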
Reviewed By: bddppq
Differential Revision: D14814067
fbshipit-source-id: aec917322de695e283f0aca1e930c5603d196404
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18881
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18878
When the weight is prepacked and it doesn't contain a prepacked weight for acc32, we shouldn't fall back to acc32.
TODO: add unit tests with better coverage
Reviewed By: feiyu1990
Differential Revision: D14778810
fbshipit-source-id: d49a8c4b7c815ab29b77feb53ee730ad63780488
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17026
D14013931 was for FC. This diff applies similar optimizations to Conv.
A subtle difference is that in FC, once we fold col_offset into bias during the pre-processing step, we can treat everything as if A_zero_offset == 0 (symmetric quantization of A).
In Conv, we can't do this because padding still needs to use the original A_zero_offset.
From the requantization point of view, once col_offset is folded into bias, we can treat it as if we're doing symmetric A quantization.
But for steps involving padding, like im2col, im2col fused with packing, and direct conv for depth-wise/group convolution, we still need to pass the original A_zero_offset.
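For reference, a sketch (with illustrative names, not the actual fbgemm/DNNLOWP code) of the two points above: the A_zero_point-dependent term of the requantization formula can be folded into the bias ahead of time, while padding still has to be filled with the real A_zero_point.
```
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// With col_offset_B[j] = sum_k B_q[k][j] - K * B_zero_point, the int32 result of
// the quantized matmul expands to
//   C[i][j] = sum_k A_q[i][k] * B_q[k][j]
//             - B_zero_point * row_sum_A[i]
//             - A_zero_point * col_offset_B[j].
// The last term depends only on the weights, so it can be subtracted from the
// bias once during pre-processing; requantization can then proceed as if
// A_zero_point == 0.
std::vector<std::int32_t> FoldColOffsetsIntoBias(
    const std::vector<std::int32_t>& bias,
    const std::vector<std::int32_t>& col_offset_B,
    std::int32_t A_zero_point) {
  std::vector<std::int32_t> folded = bias;
  for (std::size_t j = 0; j < folded.size(); ++j) {
    folded[j] -= A_zero_point * col_offset_B[j];
  }
  return folded;
}

// Padding entries created by im2col (or fused packing / direct conv) must still
// be filled with the original A_zero_point so that they dequantize to 0;
// filling them with 0 would be wrong once the rest of the pipeline pretends
// A_zero_point == 0.
void FillPaddingRegion(std::vector<std::uint8_t>& padding_region,
                       std::int32_t A_zero_point) {
  std::fill(padding_region.begin(), padding_region.end(),
            static_cast<std::uint8_t>(A_zero_point));
}
```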
Reviewed By: jianyuh
Differential Revision: D14020276
fbshipit-source-id: c29caefd1127bbc6aff0e9d535939bb0c1ecb66c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18672
On Skylake, acc16 is slower when n < 128 or k < 128.
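A trivial sketch of the shape heuristic, using the thresholds from this summary; the function name is made up:
```
// On Skylake, 16-bit accumulation does not pay off for small shapes; below
// these thresholds we stay on (or fall back to) 32-bit accumulation.
bool ShouldUseAcc16OnSkylake(int n, int k) {
  constexpr int kMinDim = 128;
  return n >= kMinDim && k >= kMinDim;
}
```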
Reviewed By: jianyuh
Differential Revision: D14700576
fbshipit-source-id: 80ca9f1af4626637eed9c5ca49f95ae744811189
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18239
When min is inf or nan, we get UBSAN errors
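A minimal sketch of the kind of guard this implies (illustrative, not the actual code): reject non-finite min/max before they reach the bin-index arithmetic that UBSAN flags.
```
#include <cmath>

// inf or NaN endpoints lead to undefined behavior (e.g. converting a non-finite
// float to an integer bin index), which is what UBSAN flags; reject them before
// computing quantization parameters.
bool HasValidRange(float min, float max) {
  return std::isfinite(min) && std::isfinite(max) && min <= max;
}
```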
Reviewed By: csummersea
Differential Revision: D14537668
fbshipit-source-id: e70ffb5ecd2b10793356070c69fdabf8f25b203e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16373
motivation: https://github.com/pytorch/pytorch/pull/12407
This is a manual diff.
Most of the fixes should be:
```
auto* Y = Output(0);
Y->Resize(dims);
Y->raw_mutable_data(dtype);
```
-->
```
auto* Y = Output(0, dims, at::dtype(dtype));
```
But there might be other cases.
Reviewed By: dzhulgakov
Differential Revision: D13725460
fbshipit-source-id: 649a4b0e42f62cda1a60171dd9fa3e440dc9dca1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18246
Simplifies histogram collection and quantization process.
Histogram collection before this diff was something like this
```
from caffe2.quantization.server import dnnlowp_pybind11
...
dnnlowp_pybind11.ObserveHistogramOfOutput(hist_file)
for ...:
    workspace.RunNet(predict_net)
dnnlowp_pybind11.ClearNetObservers()  # This is to trigger the Stop function in the observer so it dumps out the histogram file, but it can have the unintended consequence of also clearing all the other useful observers we attached
```
After this diff we can
```
workspace.CreateNet(predict_net)  # Note we need to create the net to have a net to attach the observer to
histogram_observer = dnnlowp_pybind11.AddHistogramObserver(predict_net, hist_file)
for ...:
    workspace.RunNet(predict_net)
predict_net.RemoveObserver(histogram_observer)
```
Choosing quantization parameters of weights before this diff was something like this
```
dnnlowp_pybind11.ObserveHistogramOfOutput(weight_hist_file)
workspace.RunNetOnce(init_net)
dnnlowp_pybind11.ClearNetObservers() # Has same issue as the histogram collection example above
dnnlowp_pybind11.RegisterQuantizationParamsWithHistogram(
    weight_hist_file, is_weight=True, qparams_output_file_name=qparams_file
)
workspace.CreateNet(init_net, overwrite=True)
dnnlowp_pybind11.ClearNetObservers()
logger.info("Loading quantization params from {}".format(qparams_file))
blobs_to_qparams = {}
with open(qparams_file) as f:
    lines = f.readlines()
for line in lines:
    op_id, op_type, output_id, tensor_name, mini, maxi, scale, zero_point, precision = (
        line.split()
    )
    op_id = int(op_id)
    output_id = int(output_id)
    op = net.Proto().op[op_id]
    if op_type != op.type or op.output[output_id] != tensor_name:
        print(
            "Corrupt qparams file {} {} {} {} {}".format(
                qparams_file, op_type, op.type, op.output[output_id], tensor_name
            )
        )
    blobs_to_qparams[tensor_name] = QuantizationParam(float(scale), int(zero_point))
```
After this diff this can be simplified to
```
blobs_to_qparams = {}
for op in init_net.Proto().op:
    for output in op.output:
        scale, zero_point = dnnlowp_pybind11.ChooseQuantizationParams(output)
        blobs_to_qparams[output] = QuantizationParam(scale, zero_point)
```
Reviewed By: dskhudia
Differential Revision: D14544694
fbshipit-source-id: 4fd06cd63256201e2e9d15c39f503138d1be53c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18240
For the rare case when dst_bin_width == 0, we should just put all numbers into a single arbitrary bin.
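A hedged sketch of the special case, with illustrative names: when dst_bin_width is 0, every value goes into the same (arbitrary) bin instead of dividing by zero.
```
#include <cmath>

// Map a value to a destination bin; when dst_bin_width == 0 every value
// collapses into the same (arbitrary) bin, so avoid the division entirely.
int DstBin(float value, float dst_min, float dst_bin_width, int num_bins) {
  if (dst_bin_width == 0.f) {
    return 0;
  }
  int bin = static_cast<int>(std::floor((value - dst_min) / dst_bin_width));
  if (bin < 0) bin = 0;
  if (bin > num_bins - 1) bin = num_bins - 1;
  return bin;
}
```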
Reviewed By: csummersea
Differential Revision: D14544685
fbshipit-source-id: 02d04ff8bd1555d6cf7e7eeb1196a4ab3325a9e5
Summary:
Our AVX2 routines use functions such as _mm256_extract_epi64
that do not exist on 32 bit systems even when they have AVX2.
This disables AVX2 when _mm256_extract_epi64 does not exist.
This fixes the "local" part of #17901 (except disabling FBGEMM),
but sleef also needs to be updated and NNPACK needs to be fixed;
see the bug report for further discussion.
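A hedged illustration of the idea; the real change is in the build configuration, and the macro name below is made up.
```
// _mm256_extract_epi64 only exists on 64-bit x86 targets, so AVX2 paths that
// use it must be gated on the target being x86-64, not just on AVX2 support.
#if defined(__AVX2__) && (defined(__x86_64__) || defined(_M_X64))
#define CAN_USE_AVX2_EPI64 1
#else
#define CAN_USE_AVX2_EPI64 0
#endif
```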
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17915
Differential Revision: D14437338
Pulled By: soumith
fbshipit-source-id: d4ef7e0801b5d1222a855a38ec207dd88b4680da
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17939
Instead of just asserting min <= 0 and max >= 0, we adjust the histogram to include 0 in the range.
We need to include 0 in the range during norm error minimization to correctly represent our quantization method that includes 0.
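A minimal sketch of the range-side adjustment, assuming the range is represented by its min/max endpoints; the actual diff also has to remap the histogram bins, which is omitted here.
```
#include <algorithm>

// Our quantization always makes 0 exactly representable, so the range used for
// norm-error minimization must contain 0; extend it instead of asserting.
void ExtendRangeToIncludeZero(float& min, float& max) {
  min = std::min(min, 0.0f);
  max = std::max(max, 0.0f);
}
```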
Reviewed By: csummersea
Differential Revision: D14428732
fbshipit-source-id: 6669a9d2c7d409ec3b31aee0afe48071986b9b71
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17764
Original commit changeset: f1923fdca4a1
Reverting the int8 ops fixed the original runtime regression.
We'll ignore the memory regression since it is flaky; see D14228484.
Reviewed By: dzhulgakov
Differential Revision: D13885233
fbshipit-source-id: ccbe4b94acb44b7b4cb3ae4d73e3f6091e1e1195
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17456
Using an instruction sequence similar to the function in fbgemm/src/QuantUtilAvx2.cc.
Added elementwise_sum_benchmark.
Reviewed By: protonu
Differential Revision: D14205695
fbshipit-source-id: 84939c9d3551f123deec3baf7086c8d31fbc873e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17105
To make FC with rowwise quantization faster, reduce code duplication, and make code consistent with Convolution
Reviewed By: csummersea
Differential Revision: D14080461
fbshipit-source-id: 2b0e67b86e7e3029c90751a8824bf80ae1223680
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17145
The prepacked weight contains both weight and bias, so the bias should be obtained from input index 1, not from index 2.
Reviewed By: jianyuh
Differential Revision: D14097281
fbshipit-source-id: b8b836b85a7b240e2fd1734377c46d9bf2ce3390
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16942
We can fold col offsets into bias if the zero point of the activation is constant.
fbgemm still needs to provide an option to pass col offsets in case the zero point of the activation keeps changing (e.g., dynamic quantization).
A trick to optimize the static quantization case is setting the A zero point to 0 after folding it into the bias.
This diff also optimizes the case when weights use symmetric quantization: when the B zero point is 0, we use PackAMatrix instead of PackAWithRowOffset.
TODO:
Ideally, PackAWithRowOffset should perform as fast as PackAMatrix when B_zero_point is 0 to make client code simpler
Same in PackAWithIm2Col and depth-wise convolution (group convolution is already doing this)
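A small sketch of the packing choice described above; PackAMatrix and PackAWithRowOffset are the fbgemm routines mentioned, but the selection helper itself is illustrative.
```
#include <cstdint>

// Which packing routine to use for the activation matrix A. The routine names
// mirror the fbgemm classes mentioned above; the helper itself is illustrative.
enum class APackingKind { PackAMatrix, PackAWithRowOffset };

APackingKind ChooseAPacking(std::int32_t B_zero_point) {
  // With symmetric weight quantization (B_zero_point == 0) the row offsets of A
  // drop out of the requantization formula, so the cheaper PackAMatrix suffices.
  return B_zero_point == 0 ? APackingKind::PackAMatrix
                           : APackingKind::PackAWithRowOffset;
}
```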
Reviewed By: csummersea
Differential Revision: D14013931
fbshipit-source-id: e4d313343e2a16a451eb910beed30e35de02a40c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16855
Save the histogram of each net to a separate file
Reviewed By: jspark1105
Differential Revision: D13991610
fbshipit-source-id: a5be4e37a5e63567dcd7fdf99f451ee31bb350a5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16820
Sometimes parsing the histogram was not working correctly due to changes in D13633256.
We need to call istringstream::clear() after str().
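For context, a minimal standalone illustration of the std::istringstream pitfall (not the DNNLowP parsing code itself): once the stream hits EOF, str() alone does not make it usable again; clear() must reset the state flags.
```
#include <cassert>
#include <sstream>
#include <string>

int main() {
  std::istringstream iss("1 2 3");
  int x;
  while (iss >> x) {
  }  // the stream is now in an eof/fail state

  iss.str("4 5 6");  // new buffer, but the eof/fail bits are still set
  iss.clear();       // without this, the next extraction silently fails
  assert((iss >> x) && x == 4);
  return 0;
}
```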
Reviewed By: csummersea
Differential Revision: D13977509
fbshipit-source-id: ce3e8cb390641d8f0b5c9a7d6d6daadffeddbe11
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16273
Previously we had SetOutputSize, which accepted a partially initialized output tensor and set it to the correct size;
this diff changes it to GetOutputSize, which returns the correct size instead.
e.g.
```
auto* Y = Output(0);
ConvPoolOp<Context>::SetOutputSize(X, Y, channels);
...
Y->mutable_data<T>...
```
-->
```
auto sizes = ConvPoolOp<Context>::GetOutputSize(X, channels);
auto* Y = Output(0, sizes, at::dtype<T>());
```
Reviewed By: dzhulgakov
Differential Revision: D13736281
fbshipit-source-id: 64abce3dbaed0b375098463333dfd0ea5a3b1945
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16175
Separate Moments from math and optimize it
i-am-not-moving-c2-to-c10
Reviewed By: houseroad
Differential Revision: D13742472
fbshipit-source-id: 90757d908d38c98ca69818855aaf68315e525992
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16174
Our service creates a new caffe2 workspace for the same underlying network on multiple threads concurrently at service startup time (later these workspaces are being reused for sequential requests), resulting in concurrent quantization via FullyConnectedDNNLowPOp calling GetOrCreateFbgemmPackBMatrix(). The lazily performed quantizations during the first inference in each workspace are all funnelled through GetOrCreateFbgemmPackBMatrix()'s cache_mutex, which means quantization is serialized, so at service startup time only a single CPU core is being used for around a minute until the serial quantization is done.
A better solution would be to avoid quantizing the same weight matrix of the operator copies in different net copies to begin with, but this is the simpler solution for our current problem.
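A small sketch of the contention pattern being described (illustrative names, not the actual Caffe2 API): the expensive pack happens while a single cache-wide mutex is held, so concurrent first inferences serialize on it.
```
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Shape of the problem described above (illustrative, not the Caffe2 API):
// the expensive quantize-and-pack happens while a single cache-wide mutex is
// held, so the first inference in every workspace serializes on this lock.
struct PackedMatrix { /* packed weight data */ };

std::mutex cache_mutex;
std::map<std::string, std::shared_ptr<PackedMatrix>> cache;

std::shared_ptr<PackedMatrix> GetOrCreatePacked(const std::string& key) {
  std::lock_guard<std::mutex> guard(cache_mutex);  // held for the whole pack
  auto& entry = cache[key];
  if (!entry) {
    entry = std::make_shared<PackedMatrix>();  // stands in for the expensive pack
  }
  return entry;
}
```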
Reviewed By: jspark1105
Differential Revision: D13708785
fbshipit-source-id: 537519896b3b939c552d67f400bafc8a69ce11eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16020
Needs to go over more iterations. For conv, I think we need a high-level interface that abstracts out low-level details of which code path will be taken (acc16, outlier-aware, depth-wise, group conv, ...); otherwise the client code will be complex, as can be seen from the DNNLOWP Conv ops. This will also help us make the interface more stable.
Reviewed By: dskhudia, jianyuh
Differential Revision: D13588996
fbshipit-source-id: 9afce9e441bcaf20437fcc2874fb9d4165a46bcb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15996
As reported in issue #15911, gcc 4.9 was getting an internal compiler error due to a complex use of lambda functions in conv_dnnlowp_op.cc and conv_acc16_op.cc. This diff simplifies them.
Reviewed By: viswanathgs
Differential Revision: D13648264
fbshipit-source-id: 1551ae8a0a7653749185dca51ccceb2471b96b82