Commit Graph

51 Commits

Yinghai Lu
63dbef3038 Better msg (#43848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43848

Missing space in logging.

Test Plan: build

Reviewed By: hl475

Differential Revision: D23416698

fbshipit-source-id: bf7c494f33836601f5f380c03a0910f419c2e62b
2020-08-31 10:36:59 -07:00
Hector Yuen
c8e789e06e add fake fp16 fusions to net transforms (#42927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42927

added fp16 fusion to net transforms
refactored the transforms, as well as glow_transform, out of opt/custom so that the OSS builds pass

Test Plan: added net runner tests for this

Reviewed By: yinghai

Differential Revision: D23080881

fbshipit-source-id: ee6451811fedfd07c6560c178229854bca29301f
2020-08-14 13:30:27 -07:00
Hector Yuen
18ca999e1a integrate int8 swish with net transformer
Summary:
add a fuse path for deq->swish->quant
update swish fake op interface to take arguments accordingly
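
A minimal sketch of the fuse path over caffe2 protobufs (the fused op name is an assumption; the real pass lives in the C++ net transforms):

```
from caffe2.proto import caffe2_pb2

def fuse_deq_swish_quant(net):
    """Return a copy of `net` with each Int8Dequantize -> Swish ->
    Int8Quantize chain replaced by a single fake int8 swish op."""
    out = caffe2_pb2.NetDef()
    out.CopyFrom(net)
    del out.op[:]
    ops, i = net.op, 0
    while i < len(ops):
        if (i + 2 < len(ops)
                and ops[i].type == "Int8Dequantize"
                and ops[i + 1].type == "Swish"
                and ops[i + 2].type == "Int8Quantize"
                and ops[i + 1].input[0] == ops[i].output[0]
                and ops[i + 2].input[0] == ops[i + 1].output[0]):
            fused = out.op.add()
            fused.type = "SwishFakeInt8NNPI"    # hypothetical op name
            fused.input.extend(ops[i].input)    # quantized input blob
            fused.output.extend(ops[i + 2].output)
            fused.arg.extend(ops[i + 2].arg)    # output scale/zero point
            i += 3
        else:
            out.op.add().CopyFrom(ops[i])
            i += 1
    return out
```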

Test Plan:
net_runner passes
unit tests need to be updated

Reviewed By: venkatacrc

Differential Revision: D22962064

fbshipit-source-id: cef79768db3c8af926fca58193d459d671321f80
2020-08-07 23:01:06 -07:00
Stephen Chen
2971bc23a6 Handle fused scale and bias in fake fp16 layernorm
Summary: Allow passing scale and bias to fake fp16 layernorm.
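
As a reference for what the fused op computes, a rough numpy sketch (the real fake fp16 op also rounds intermediates to half precision; this only rounds the output):

```
import numpy as np

def layernorm_fused_scale_bias(x, scale, bias, eps=1e-5):
    """Normalize over the last axis, then apply the elementwise scale
    and bias inside the same op (matching glow's fused layernorm)
    instead of as separate Mul/Add ops after it."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    y = (x - mean) / np.sqrt(var + eps)
    # Fake fp16: emulate the precision of the lowered op by rounding
    # the result to half precision.
    return (y * scale + bias).astype(np.float16).astype(np.float32)
```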

Test Plan: net_runner. Now matches glow's fused layernorm.

Reviewed By: hyuen

Differential Revision: D22952646

fbshipit-source-id: cf9ad055b14f9d0167016a18a6b6e26449cb4de8
2020-08-07 10:48:33 -07:00
Yinghai Lu
5c5d7a9dca Freeze dynamic (re)quantization ops into standard ones (#42591)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42591

We don't support lowering 2-input Int8Quantize or 4-input Int8FC. Just do a conversion to absorb the quantization params into the op itself.
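
A minimal sketch of the freeze, assuming the dynamic qparams are readable from the workspace (the fetch helper and blob layout are assumptions; Y_scale/Y_zero_point are the standard int8 args):

```
def freeze_dynamic_quantization(net, workspace_fetch):
    """Turn 2-input Int8Quantize ops into standard 1-input ones by
    baking the runtime quantization params into Y_scale / Y_zero_point
    args. workspace_fetch(blob_name) -> (scale, zero_point) is assumed."""
    for op in net.op:
        if op.type == "Int8Quantize" and len(op.input) == 2:
            scale, zero_point = workspace_fetch(op.input[1])
            del op.input[1]
            s = op.arg.add()
            s.name, s.f = "Y_scale", float(scale)
            z = op.arg.add()
            z.name, z.i = "Y_zero_point", int(zero_point)
    return net
```

The 4-input Int8FC case would be handled analogously, with the extra qparam inputs absorbed the same way.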

Test Plan:
```
buck test caffe2/caffe2/quantization/server:quantize_dnnlowp_op_test
```

Reviewed By: benjibc

Differential Revision: D22942673

fbshipit-source-id: a392ba2afdfa39c05c5adcb6c4dc5f814c95e449
2020-08-05 11:53:09 -07:00
Ying Zhang
b2ef7fa359 Add a flag to enforce fp32 to fp16 conversion for all inputs of the onnxifi net. (#39931)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39931

ATT.

Reviewed By: yinghai, ChunliF

Differential Revision: D21993492

fbshipit-source-id: ff386e6e9b95a783906fc1ae6a62462e6559a20b
2020-07-28 16:48:43 -07:00
Yinghai Lu
eb3bf96f95 During inbatch broadcast, move Tile op after Fused8BitRowwiseQuantizedToFloat if applicable (#41464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41464

If the input is int8 rowwise quantized, we currently cannot lower it to Glow, and previously we hit an error when running with in-batch broadcast. The main issue is that the Tile op doesn't support the uint8_t type, which is very easily added here. However, that alone leaves the non-ideal situation where Tile -> Fused8BitRowwiseQuantizedToFloat stays on the host side, which probably hurts the memory bandwidth a lot. Even if we later add Fused8BitRowwiseQuantizedToFloat support to Glow, it's still not ideal, because we would be doing redundant compute on identical columns. So the solution here is to swap the order of Tile and Fused8BitRowwiseQuantizedToFloat to make it Fused8BitRowwiseQuantizedToFloat -> Tile. This immediately resolves the error we saw. For the short term, we can still run Tile on card, and for the longer term, things run faster on card.

The optimization is a heuristic: if the net doesn't contain such a pattern, in-batch broadcast works as before.
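
A minimal sketch of this rewrite over caffe2 protobufs (the intermediate blob name is hypothetical):

```
from caffe2.proto import caffe2_pb2

def move_tile_after_dequant(net):
    """Return a copy of `net` where each Tile ->
    Fused8BitRowwiseQuantizedToFloat pair becomes
    Fused8BitRowwiseQuantizedToFloat -> Tile, so dequantization runs
    once on the un-tiled rows and Tile operates on plain floats."""
    out = caffe2_pb2.NetDef()
    out.CopyFrom(net)
    for i in range(len(out.op) - 1):
        tile, deq = out.op[i], out.op[i + 1]
        if not (tile.type == "Tile"
                and deq.type == "Fused8BitRowwiseQuantizedToFloat"
                and deq.input[0] == tile.output[0]):
            continue
        x = tile.input[0]     # original quantized input
        y = deq.output[0]     # float output consumed downstream
        mid = x + "_deq"      # hypothetical intermediate blob name
        deq.input[0], deq.output[0] = x, mid
        tile.input[0], tile.output[0] = mid, y
        # Swap the two slots so the net stays topologically ordered.
        a = caffe2_pb2.OperatorDef(); a.CopyFrom(deq)
        b = caffe2_pb2.OperatorDef(); b.CopyFrom(tile)
        out.op[i].CopyFrom(a)
        out.op[i + 1].CopyFrom(b)
    return out
```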

(Note: this ignores all push blocking failures!)

Test Plan:
```
buck test caffe2/caffe2/opt/custom:in_batch_broadcast_test
```

Reviewed By: benjibc

Differential Revision: D22544162

fbshipit-source-id: b6dd36a5925a9c8103b80f034e7730a7a085a6ff
2020-07-16 21:25:18 -07:00
Hector Yuen
d601325de4 update operators in the mapping to fp16 emulation
Summary: add logit and swish to this list
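
The transform itself is a straight op-type substitution; a sketch with the two new rows (the right-hand names for Logit and Swish are assumptions; SpatialBN's appears elsewhere in this log):

```
# op type -> fake fp16 emulation op type (other entries elided)
FAKE_FP16_OP_MAP = {
    "Logit": "LogitFakeFp16NNPI",   # assumed emulation-op name
    "Swish": "SwishFakeFp16NNPI",   # assumed emulation-op name
    "SpatialBN": "SpatialBNFakeLoweredFp16NNPI",
}

def map_ops_to_fp16_emulation(net, blacklist=frozenset()):
    """Swap each mapped op type for its NNPI fp16-emulation twin."""
    for op in net.op:
        if op.type in FAKE_FP16_OP_MAP and op.type not in blacklist:
            op.type = FAKE_FP16_OP_MAP[op.type]
    return net
```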

Test Plan: f203925461

Reviewed By: amylittleyang

Differential Revision: D22506814

fbshipit-source-id: b449e4ea16354cb76915adb01cf317cffb494733
2020-07-13 14:08:24 -07:00
Hector Yuen
6d70d1574f rename the LayerNorm operator and add it to the replacement map (#40318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40318

rename the layernorm fakefp16 op to follow the right naming convention
add it to the map of replacement ops

this can be done even if the operator is not complete because we are blacklisting anyway

Test Plan: net_runner; inspected the log to confirm that the replacement happened

Reviewed By: venkatacrc

Differential Revision: D22145900

fbshipit-source-id: f19794ec05234b877f7697ed8b05dd8f46606c47
2020-06-19 16:49:22 -07:00
Yinghai Lu
3ea15af630 [Onnxifi] Allow adding timeout for OnnxifiOp run (#40081)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40081

Adding the functionality to set a timeout on OnnxifiOp runs. In the case of a hanging backend, it can error out quickly.
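
The real change threads the timeout through the C++ OnnxifiOp run; as an idea-only Python sketch:

```
import concurrent.futures

def run_with_timeout(run_backend, timeout_ms):
    """Bound a backend call so a hung card errors out quickly instead
    of blocking the inference thread forever."""
    ex = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        fut = ex.submit(run_backend)
        return fut.result(timeout=timeout_ms / 1000.0)
    except concurrent.futures.TimeoutError:
        raise RuntimeError(
            "OnnxifiOp run did not finish within %d ms" % timeout_ms)
    finally:
        ex.shutdown(wait=False)  # don't join the (possibly hung) worker
```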

Test Plan:
```
 buck test glow/fb/test:test_onnxifinnpi -- test_timeout
```

Reviewed By: jackm321

Differential Revision: D22064533

fbshipit-source-id: 25487287c10ab217eb95692f09d48e13e19436ab
2020-06-17 16:21:25 -07:00
Yinghai Lu
00505adbad Add net_pos to Tiles added during in-batch broadcast (#40078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40078

ATT. It's good to have net_pos on all the ops so that we can distinguish each op in the minimizer in net_runner.

Test Plan: unittest

Reviewed By: ipiszy, ChunliF

Differential Revision: D22062748

fbshipit-source-id: 5266abdb6dde63055fdffdba6e8d65bd0f221d7b
2020-06-16 21:51:18 -07:00
Amy Yang
88c5fd94e7 [nnpi eval] enable int8 eval with emulation Int8FC (#39112)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39112

Allow int8 packed weights in int8 model to deserialize to original format. Set default deserialization behavior in eval workflows to original format.

Test Plan: Tested with workflow: f192797187

Reviewed By: yinghai

Differential Revision: D21737940

fbshipit-source-id: 7afaf307b16cb4e85e61f019356f83fdab772c57
2020-05-29 11:59:12 -07:00
Chunli Fu
898d062bfd [disagg_acc] In batch broadcast (#38700)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38700

Reviewed By: yinghai

Differential Revision: D21634147

fbshipit-source-id: 7bd1912654e2433cfb580b5f7a9fb86570a55cab
2020-05-27 15:21:37 -07:00
Yinghai Lu
8338426ed8 Fix infinite loop bug in minimizer (#38507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38507

With `--merge_fp32_inputs_into_fp16` we added some ops to the net without net_pos, which makes the cardinality of the blacklist positions smaller than the number of ops in the net. Previously, the updateInternalState() function of the minimizer would just enter an infinite loop. This diff fixes it by changing the loop condition.
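
An illustrative sketch of the corrected iteration (names are made up; the real fix is in the minimizer's updateInternalState()):

```
def count_blacklisted(net, blacklist_pos):
    """Walk every op exactly once; ops without a net_pos simply don't
    match, instead of stalling a loop that waits to consume the whole
    blacklist."""
    def net_pos(op):
        for a in op.arg:
            if a.name == "net_pos":
                return a.i
        return None  # e.g. ops added by --merge_fp32_inputs_into_fp16

    hits = 0
    for op in net.op:  # bounded by the op count, so it terminates
        if net_pos(op) in blacklist_pos:  # even when the blacklist is
            hits += 1                     # smaller than the net
    return hits
```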

Reviewed By: tracelogfb

Differential Revision: D21578777

fbshipit-source-id: 0d5373fa0a417ded1c80a2dc03248c07b1e0a320
2020-05-18 11:44:05 -07:00
Ansha Yu
25413635d0 [c2][opt] nomnigraph transform for ClipRangesGatherSigridHashV2 fusion (#38004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38004

Un-backout of D21353550, originally D21262085. No changes here, fix in D21445881.

Fuse ClipRanges + GatherRanges + SigridHash -> ClipRangesGatherSigridHashV2

dpa_product_ctr model's dper2 to dper3 migration is blocked by 3.6% higher prospector cpu usage. The root cause is traced down to the sigrid transforms, where ClipRanges, GatherRanges, and SigridHash are called separately instead of fused, as they are in dper2.
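
A minimal sketch of the fused-op construction, assuming the three-op chain has already been matched (the input wiring is an assumption):

```
from caffe2.proto import caffe2_pb2

def make_fused_op(clip, gather, sigrid_hash):
    """Build ClipRangesGatherSigridHashV2 from a matched ClipRanges ->
    GatherRanges -> SigridHash chain, carrying both ops' arguments."""
    fused = caffe2_pb2.OperatorDef()
    fused.type = "ClipRangesGatherSigridHashV2"
    # Assumed wiring: data from GatherRanges, ranges from ClipRanges.
    fused.input.extend([gather.input[0], clip.input[0]])
    fused.output.extend(sigrid_hash.output)
    fused.arg.extend(clip.arg)         # e.g. max_length
    fused.arg.extend(sigrid_hash.arg)  # salt / maxValue / hashIntoInt32
    return fused
```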

Further context:
https://fb.quip.com/GijaAZtX5mav
https://fb.quip.com/pIDdAjJP2uiG

Test Plan:
Local benchmarking with small model 181513584_0
(Dper3 full model is 178772812, dper2 refresh is 178770392)

Transform turned on: P129799373
Iters per second: 609.291

Transform turned off: P129799397
Iters per second: 519.088

We also want to confirm this performance on the full model in canary and in qrt.

`buck build mode/opt-clang mode/no-gpu caffe2/caffe2/fb/predictor:ptvsc2_predictor_bench`

`MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --pred_net=/data/users/ansha/tmp/dpa/small_pred_net.pb --c2_model=/data/users/ansha/tmp/dpa/181513584_0.predictor --c2_inputs=/data/users/ansha/tmp/dpa/c2_inputs_small.pb --iters=3000 --warmup_iters=100 --num_threads=32 --c2_apply_nomnigraph_passes=1 --caffe2_predictor_enable_preproc_fusion=1`

Run dbgo build to check that all transforms happen.

Check that ClipRangesGatherSigridHash is used: https://fburl.com/scuba/caffe2_operator_stats_canary/e6qfdsat

Canaries:
https://our.intern.facebook.com/intern/ads/canary/426498918895712377/
https://our.intern.facebook.com/intern/ads/canary/426498905389730718/
https://our.intern.facebook.com/intern/ads/canary/426498901795492517/

Dbgo canaries:
https://our.intern.facebook.com/intern/ads/canary/426498888067456166/
https://our.intern.facebook.com/intern/ads/canary/426498879652089095/
https://our.intern.facebook.com/intern/ads/canary/426498873491575187/
https://our.intern.facebook.com/intern/ads/canary/426498860171351505/

Reviewed By: houseroad

Differential Revision: D21445887

fbshipit-source-id: a3c15ee30465de693f434b6ee041025c276581ac
2020-05-07 20:00:35 -07:00
Ansha Yu
b410d03e6e Back out "[c2][opt] nomnigraph transform for ClipRangesGatherSigridHash fusion" (#37675)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37675

Original commit changeset: 2c2481e3d497

(Note: this ignores all push blocking failures!)

Test Plan: Back out D21262085 due to ASAN crash P130123493

Differential Revision: D21353550

fbshipit-source-id: c43c8764322f7e58aca0c1360b1d03966b1d9798
2020-05-01 12:49:17 -07:00
Ansha Yu
b97341e3dd [c2][opt] nomnigraph transform for ClipRangesGatherSigridHash fusion (#37535)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37535

Fuse ClipRanges + GatherRanges + SigridHash -> ClipRangesGatherSigridHash

dpa_product_ctr model's dper2 to dper3 migration is blocked by 3.6% higher prospector cpu usage. The root cause is traced down to the sigrid transforms, where ClipRanges, GatherRanges, and SigridHash are called separately instead of fused, as they are in dper2.

Further context:
https://fb.quip.com/GijaAZtX5mav
https://fb.quip.com/pIDdAjJP2uiG

Test Plan:
Local benchmarking with small model 181513584_0
(Dper3 full model is 178772812, dper2 refresh is 178770392)

Transform turned on: P129799373
Iters per second: 609.291

Transform turned off: P129799397
Iters per second: 519.088

We also want to confirm this performance on the full model in canary and in qrt.

`buck build mode/opt-clang mode/no-gpu caffe2/caffe2/fb/predictor:ptvsc2_predictor_bench`

`MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --pred_net=/data/users/ansha/tmp/dpa/small_pred_net.pb --c2_model=/data/users/ansha/tmp/dpa/181513584_0.predictor --c2_inputs=/data/users/ansha/tmp/dpa/c2_inputs_small.pb --iters=3000 --warmup_iters=100 --num_threads=32 --c2_apply_nomnigraph_passes=1 --caffe2_predictor_enable_preproc_fusion=1`

Prospector canary:
https://our.intern.facebook.com/intern/ads/canary/426280288521552095/
Check that ClipRangesGatherSigridHash is used: https://fburl.com/scuba/caffe2_operator_stats_canary/e6qfdsat

Reviewed By: yinghai

Differential Revision: D21262085

fbshipit-source-id: 2c2481e3d4977abb8abe6e9ef0c9999382320ab2
2020-04-30 11:03:47 -07:00
Yinghai Lu
dd98abb453 Enable splitSparseLengthsSumSparse in onnxifi (#35555)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35555

Att. So that we can lower the SparseLengthsSum* part of SparseLengthsSum*Sparse. We update the tying policy between Gather and SparseLengthsWeightedSum* so that we don't bother lowering a lone Gather into the backend, which is inefficient to execute on card and creates bubbles between otherwise contiguous lowered graphs.
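
A hedged sketch of the split (the input order and intermediate blob name are assumptions; weighted variants would carry an extra WEIGHTS input, elided here):

```
from caffe2.proto import caffe2_pb2

def split_sls_sparse(op):
    """Split a SparseLengthsSum*Sparse op into a host-side Gather that
    decompresses the indices plus the lowerable dense SLS part."""
    assert op.type.endswith("Sparse")
    data, indices, lengths, mapping = op.input  # assumed input order
    remapped = indices + "_remapped"            # hypothetical blob name

    gather = caffe2_pb2.OperatorDef()
    gather.type = "Gather"
    gather.input.extend([mapping, indices])     # remap compressed indices
    gather.output.append(remapped)

    sls = caffe2_pb2.OperatorDef()
    sls.type = op.type[:-len("Sparse")]         # e.g. SparseLengthsSum
    sls.input.extend([data, remapped, lengths])
    sls.output.extend(op.output)
    return [gather, sls]
```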

Test Plan:
```
buck test glow/fb/test:test_onnxifinnpi
```

Reviewed By: ipiszy

Differential Revision: D20688525

fbshipit-source-id: cb8e38239057ff13a8d385ed09d0d019421de78b
2020-03-30 13:34:59 -07:00
Benny Chen
dbd2b8bb41 [SigridHashOp] Fix converter (#34836)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34836

Once the SigridHashOp argument is supplied, I realized the shape inference is still wrong because the argument is not carried into the debug_ssa. Thanks to yinghai for pointing out that I hadn't fixed the converter; fixing it in this diff.

Test Plan:
Run the binary, and checked the exported op

  op {
    input: "sequential_250/parallel/normalization/dper_feature_normalization/sparse_features_processor/sparse_feature_transform/gather_ranges_GSF_IDLIST_COOCCUR_APP_ID_NEKO_ORGANIC_1D_7D_INSTALL_V1/gathered_values_0"
    output: "sequential_250/parallel/normalization/dper_feature_normalization/sparse_features_processor/sparse_feature_transform/sequential_1/hash_feature_ids/SigridHash:0_0"
    type: "SigridHash"
    arg {
      name: "salt"
      i: 0
    }
    arg {
      name: "maxValue"
      i: 100000
    }
    arg {
      name: "hashIntoInt32"
      i: 1
    }
    arg {
      name: "net_pos"
      i: 3
    }
  }

It now has hashIntoInt32.

Reviewed By: yinghai

Differential Revision: D20457057

fbshipit-source-id: 023ade5e66df82037a8f2da3174383dda8aff230
2020-03-29 13:06:05 -07:00
Chunli Fu
6b1ffcbf59 [model loading] Skip ssaRewrite for predict_net if it has been ssaRewritten (#35428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35428

ATT

Reviewed By: yinghai

Differential Revision: D20655131

fbshipit-source-id: 4089b3527fc7b83ba793f8d292c7189a0fa68361
2020-03-26 16:48:15 -07:00
Hao Lu
4bd5d1b3be [TVM] Use caffe2_predictor_model_shape_hints to pass shape_hints to TVM (#35091)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35091

Test Plan:
AI/AF canary to make sure it does not affect production:

https://our.intern.facebook.com/intern/ads/canary/425387509869003921/
https://our.intern.facebook.com/intern/ads/canary/425387881631488449/

Glow:

```
buck test glow:
```

Reviewed By: yinghai

Differential Revision: D20552830

fbshipit-source-id: bdf65fb0ba945963a7c9621cc3f7ea5ebaecb907
2020-03-20 20:06:17 -07:00
Yinghai Lu
6000dca5df [nomnigraph] Copy device option when customizing the op conversion (#34976)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34976

Previously, we were dropping the original device option info when overriding the operator conversion function.

Test Plan:
```
buck test caffe2/caffe2/opt:converter_nomigraph_test
```

Reviewed By: ipiszy

Differential Revision: D20507277

fbshipit-source-id: 66b5eab07d18651eff27dab2a809cd04872ac224
2020-03-19 22:48:28 -07:00
Yinghai Lu
1af6002321 Initial implementation of NNPI Int8FC op
Test Plan:
```
 buck test mode/no-gpu glow/fb/test/numerics:test_fc_nnpi_int8nnpi -- --print-passing-detail
```

Reviewed By: hyuen

Differential Revision: D20450490

fbshipit-source-id: c4811cdc994548b6e319d57115434dfc199e07c2
2020-03-16 10:46:17 -07:00
Amy Yang
7c20578794 NNPI op mapping correct SpatialBN NNPI op name (#34176)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34176

Wrong operator name for the NNPI SpatialBN

Test Plan: flow canary

Reviewed By: hyuen

Differential Revision: D20237933

fbshipit-source-id: dfde658dcbf2482320e36d549f7d83c27df264a0
2020-03-03 17:57:28 -08:00
Hector Yuen
49586a2a7e fix sph batchnorm to use sph fma
Summary: make use of springhill's fma on SpatialBatchnorm
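
For reference, a numpy sketch of the per-channel folding that reduces SpatialBN inference to a single fused multiply-add (the final line is what Springhill's FMA unit executes):

```
import numpy as np

def spatialbn_inference(x, gamma, beta, mean, var, eps=1e-5):
    """Fold the BN statistics into one multiplier and one addend per
    channel, so the op reduces to scale * x + bias."""
    scale = gamma / np.sqrt(var + eps)   # per-channel multiplier
    bias = beta - mean * scale           # per-channel addend
    # x is NCHW; broadcast the per-channel scale/bias over N, H, W.
    return x * scale[None, :, None, None] + bias[None, :, None, None]
```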

Test Plan:
re-enabled the unit test, ran it a couple of times
pending: net runner

Reviewed By: amylittleyang

Differential Revision: D20227767

fbshipit-source-id: 7c601f185940249c0a32bdf95d74a20552cd2625
2020-03-03 12:53:08 -08:00
Amy Yang
0759191f12 blacklist spatialBN until bitwise matching (#34092)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34092

Disable op in transform map until we get bitwise matching to ice-ref

Test Plan: CI

Reviewed By: hyuen

Differential Revision: D20177936

fbshipit-source-id: e316384184cb264852e63e5edce721a8614742d1
2020-03-02 17:55:00 -08:00
Hector Yuen
56d9906083 update mapping of fake operators (#33946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33946

update mapping of fake operators to model nnpi
update SpatialBN to non-lowered

Test Plan:
compilation

https://github.com/pytorch/pytorch/pull/33946

Reviewed By: amylittleyang

Differential Revision: D20156136

fbshipit-source-id: e6ed87c3c5eba692a49376f0d9dae37ae185f185
2020-02-28 14:01:02 -08:00
Hector Yuen
a80d0330e4 add int4 fake fp16 mappings
Summary: update this mapping with the int4 SLS ops so we can run net_runner

Test Plan: testing with net_runner

Reviewed By: jfix71

Differential Revision: D19879826

fbshipit-source-id: eac84b10e2365c21cb8a7cfbf3123e26a9945deb
2020-02-13 15:37:23 -08:00
Yinghai Lu
b4b1b100bd Add a loop test for onnxified net (#32935)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32935

Mock away the content of the onnxified net with some low-cost ops so that we can still mimic the input/output transfer while doing minimal work on the card.

Test Plan:
```
buck run glow/fb/test:sparsenn_test -- --gtest_filter='SparseNNTest.vanillaC2' --onnxifi_debug_mode --onnxifi_loop_test_mode --nocaffe2_predictor_use_memonger
```

Differential Revision: D19631971

fbshipit-source-id: f970c55ccb410702f479255eeb750e01e3f8c2ae
2020-02-03 18:35:41 -08:00
Hector Yuen
4baadd54d7 add SpatialBN lowered fake fp16
Summary:
SpatialBNFakeLoweredFp16NNPI

this is the fake operator for SpatialBN that gets lowered into add/mul/div, etc.

Test Plan: test_spatialbn

Reviewed By: tracelogfb, amylittleyang

Differential Revision: D19658680

fbshipit-source-id: 2abddbcd9a2023ac75c494f20eaac2051b7139dc
2020-02-03 15:03:34 -08:00
Yinghai Lu
94ddc2c462 Resubmit more code fakefp16 mapping unification (#32798)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32798

ATT

Test Plan: unittests

Reviewed By: amylittleyang

Differential Revision: D19632251

fbshipit-source-id: 670004050d67415bb24392f3520afa32b64ce740
2020-01-30 12:48:48 -08:00
Edward Yang
c47c78d0bf Revert D19597036: More code fakefp16 mapping unification
Test Plan: revert-hammer

Differential Revision: D19597036

Original commit changeset: deed61945884

fbshipit-source-id: c057e57810a99464aefb00b645613ecd6a7c5533
2020-01-29 13:32:42 -08:00
Yinghai Lu
642c9ef922 More code fakefp16 mapping unification
Summary: ATT

Reviewed By: amylittleyang

Differential Revision: D19597036

fbshipit-source-id: deed61945884fb4b01d058f3c72c75f5a937a41c
2020-01-29 11:01:24 -08:00
Yinghai Lu
02f055ffd9 Add mapping for FbFCPacked in fakefp16 transform
Summary: ATT. Since the infra is there.

Test Plan: run it

Reviewed By: amylittleyang

Differential Revision: D19605250

fbshipit-source-id: c68be4d7963afa4fa5f8f60c90f1913605eae516
2020-01-28 17:00:24 -08:00
Yinghai Lu
ffdcbadeaa Minor refactoring to improve code reuse (#32675)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32675

It's good to have one location to do the mapping.

Test Plan: Everything still runs.

Reviewed By: amylittleyang

Differential Revision: D19590354

fbshipit-source-id: d8c0d14e4bdf27da3e13bd4d161cd135d6e3822b
2020-01-28 13:31:48 -08:00
Brian Wignall
f326045b37 Fix typos, via a Levenshtein-type corrector (#31523)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking.

Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523

Differential Revision: D19216749

Pulled By: mrshenli

fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea
2020-01-17 16:03:19 -08:00
Yinghai Lu
d2fdf140af Combine all the user inputs together and convert them to fp16 (#31898)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31898

Att

Reviewed By: tracelogfb

Differential Revision: D19291357

fbshipit-source-id: 747ed5234ca042ceeaff2d094701ead7597ac3ee
2020-01-08 14:36:42 -08:00
Chunli Fu
bb7befb12c Support loading by blob in predictor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30805

Reviewed By: ipiszy

Differential Revision: D18827383

fbshipit-source-id: b97f958768618ca29a02b057667a9b4ee313ad3c
2019-12-10 10:34:14 -08:00
Chunli Fu
42324cb6e8 Change interface from map of TensorShape to shapeInfoMap (#30802)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30802

Change shape_hints from map<string, TensorShape> to ShapeInfoMap to capture dimType info from the model file.

Reviewed By: ipiszy

Differential Revision: D18821486

fbshipit-source-id: c5d9ed72e158d3698aba38900aeda00f776745b4
2019-12-10 00:35:11 -08:00
Hector Yuen
ee20e66c48 replace the SLSRQ for their right emulations in the replayer test (#30367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30367

use the SLS emulations that match the hardware

Test Plan: replayer test

Differential Revision: D18667605

fbshipit-source-id: 89aee630184737b86ecfb09717437e5c7473e42c
2019-11-23 00:06:03 -08:00
Benny Chen
496f740824 Connect with clip range gather operator (#28866)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28866

While working on the fix for int32 instead of int64, we also need to take care of ClipRangesGatherSigridHash, since this is the operator that actually gets used during inference.

Test Plan: Added unittest to cover for the new case

Reviewed By: ipiszy

Differential Revision: D17147237

fbshipit-source-id: 2b562b72a6ae8f7282e54d822467b8204fb1055e
2019-10-29 23:32:08 -07:00
Ying Zhang
e8c23c9f85 Add various flags for fakefp16 conversion
Summary: ATT

Test Plan: manually tested

Reviewed By: hyuen

Differential Revision: D17849416

fbshipit-source-id: 85ae8fb9c31a0f0139a3c61d5a164b342851d847
2019-10-11 18:06:18 -07:00
Ying Zhang
024a422f41 Add fakefp16 transformation.
Summary: ATT.

Reviewed By: hyuen

Differential Revision: D17559866

fbshipit-source-id: 58e3de97d00f20a9b5556e35504c520926d43cbd
2019-09-27 16:46:03 -07:00
Summer Deng
d95763b4dc Enable loading int8 prepacked models in PredictorContainer
Summary: To test the int8 ads models on CPU and accelerators with the ads replayer, we need to load the PREPACKING_INIT_NET_TYPE in the int8 model to initialize the int8 w_packed blobs.

Test Plan:
Ads replayer test.

P74811059

Reviewed By: zrphercule

Differential Revision: D16518888

fbshipit-source-id: cee212710ad37d9e491c970b25b2fe484373e5e4
2019-09-06 02:53:52 -07:00
Yinghai Lu
4edf77b6c0 Fuse individual operators into GatherFuse8BitRowwiseQuantFloatMulLengthElim (#25519)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25519

Fuse Gather-Fused8BitRowwiseQuantizedToFloat-Mul-LengthsSum opportunistically.

Test Plan:
```
buck test caffe2/caffe2/opt/custom:concat_elim_test
```

Reviewed By: dreamingleo

Differential Revision: D17125045

fbshipit-source-id: 8ee50410eb13a82e1e5c8180f392fce2fe9cd728
2019-09-03 19:08:49 -07:00
Stephen Chen
c5e1e5c300 Put ParseBlackListOps() into caffe2::glow namespace (#24384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24384

So that we can use them in other functions.

Reviewed By: yinghai

Differential Revision: D16824289

fbshipit-source-id: 3cb33cfa9a5c479a63db6438aef518209bdfb1f4
2019-08-15 10:53:10 -07:00
Stephen Chen
b53916a373 C2/glow: assign net_pos to a net before applying onnxifi_blacklist_ops (#24262)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24262

Previously, for the onnxifi_blacklist_ops option, we figured out the net_pos based on the order of ops in the net. But this logic is wrong if the net already has net_pos assigned, and we may end up blacklisting unintended ops. Fix this issue by always assigning net_pos before computing any blacklist.
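
A minimal sketch of the fix over caffe2 protobufs:

```
def assign_net_pos(net):
    """Give every op a net_pos argument (keeping any already assigned)
    before --onnxifi_blacklist_ops is translated into positions, so the
    blacklist hits the ops it names rather than positional guesses."""
    taken = {a.i for op in net.op for a in op.arg if a.name == "net_pos"}
    next_pos = max(taken, default=-1) + 1
    for op in net.op:
        if not any(a.name == "net_pos" for a in op.arg):
            arg = op.arg.add()
            arg.name = "net_pos"
            arg.i = next_pos
            next_pos += 1
    return net
```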

Reviewed By: yinghai

Differential Revision: D16789166

fbshipit-source-id: 2d08a7737d417822f2209adb4dcb24dbb258ff90
2019-08-14 10:39:15 -07:00
Lucian Grijincu
a936a90391 caffe2/caffe2/fb/operators/cc_amrc: drop SIMD OpenMP vectorization
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23235

Reviewed By: ajtulloch

Differential Revision: D16384612

Pulled By: luciang

fbshipit-source-id: a4c8257c6d3e151ba99167a152ad824b0dde7671
2019-07-23 17:25:00 -07:00
Alexander Sidorov
a6ccd62a81 BlackBoxPredictor OSS part 5: glow transforms
Summary:
Overall context: open-source BlackBoxPredictor as the entry
point for inference in Caffe2 (a thread-safe abstraction for Caffe2
inference). This should be used in ThroughputBenchmark for the purpose
of framework comparison.
This specific diff:
There should be no harm in moving the transformation code to
OSS. On the advantages side, we will be able to compare the production
Caffe2 setup with PyTorch in the fairest way via
ThroughputBenchmark. This approach avoids any complicated
transformation registries. Building those properly would be a significant
engineering effort as well as a production risk. In the past we had SEVs
related to transforms being turned off due to various refactors. Given
that we don't plan any other significant investments in
transformation logic beyond the existing ones (like TVM and Glow), and
those also relate to open-source technologies, I came to the
conclusion of moving the whole thing to OSS.

Reviewed By: bertmaher

Differential Revision: D16367134

fbshipit-source-id: fc6bacc1be3ff6336beb57cdad58168d3a2b8c28
2019-07-23 16:39:23 -07:00
Alexander Sidorov
2becbd3faa BlackBoxPredictor OSS part 4: Open-source other transforms (#23099)
Summary:
Overall context: open-source BlackBoxPredictor as the entry
point for inference in Caffe2 (a thread-safe abstraction for Caffe2
inference). This should be used in ThroughputBenchmark for the purpose
of framework comparison.
This specific diff:
There should be no harm in moving the transformation code to
OSS. On the advantages side, we will be able to compare the production
Caffe2 setup with PyTorch in the fairest way via
ThroughputBenchmark. This approach avoids any complicated
transformation registries. Building those properly would be a significant
engineering effort as well as a production risk. In the past we had SEVs
related to transforms being turned off due to various refactors. Given
that we don't plan any other significant investments in
transformation logic beyond the existing ones (like TVM and Glow), and
those also relate to open-source technologies, I came to the
conclusion of moving the whole thing to OSS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23099

Test Plan:
```
salex@devvm4218:caffe2 { (fcdaf96|HISTEDIT)}$ submit_canary --q tw_adindexer_canary_on_canary_tier && submit_canary --q tw_adfinder_canary_on_canary_tier && submit_canary prospector_replay_canary
/proc/self/fd/4/urllib3/connectionpool.py:851: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
Patch Phabricator Link: differential/diff/86851419/
Submit job request to the thrift service
https://our.intern.facebook.com/intern/ads/canary/419717789681292057
DONE
Everpaste link: https://our.intern.facebook.com/intern/everpaste/?color=0&handle=GBYe_ANnNNBnbWsDAAAAAABJPvJBbjEQAAAz
/proc/self/fd/4/urllib3/connectionpool.py:851: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
Patch Phabricator Link: differential/diff/86851536/
Submit job request to the thrift service
https://our.intern.facebook.com/intern/ads/canary/419717806884923980
DONE
Everpaste link: https://our.intern.facebook.com/intern/everpaste/?color=0&handle=GArl_QPncP7tc30IAAAAAACfza93bjEQAAAz
/proc/self/fd/4/urllib3/connectionpool.py:851: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
Patch Phabricator Link: differential/diff/86851661/
Submit job request to the thrift service
https://our.intern.facebook.com/intern/ads/canary/419717823090263325
DONE
Everpaste link: https://our.intern.facebook.com/intern/everpaste/?color=0&handle=GNcyAwRrfFd0MIUIAAAAAABLOINibjEQAAAz
```

Differential Revision: D16288332

Pulled By: salexspb

fbshipit-source-id: 95899dede6b11a2ae14703b9aaea8e1a677f0aaa
2019-07-22 13:53:43 -07:00