Commit Graph

54 Commits

Author SHA1 Message Date
Andy Wei
19943aafe9 [caffe2] Speed up remote net loading
Summary:
Training recovery takes over 3 hours for DI models. See T88118480 for more details.

One possible cause of the slowness is the linear search in ApplicationSpecificInfo. To improve that, we cache the app info in a dict so that lookups are much faster.
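
A rough sketch of the idea, not the code from this diff: `AppInfoEntry`, `find_linear`, and `build_cache` are hypothetical names, and the app info is assumed here to be a flat list of key/value entries.

```python
from typing import Dict, List, NamedTuple, Optional


class AppInfoEntry(NamedTuple):
    # Hypothetical stand-in for one key/value entry in the app info list.
    key: str
    value: bytes


def find_linear(entries: List[AppInfoEntry], key: str) -> Optional[bytes]:
    # Assumed old approach: scan the whole list on every lookup, O(n) each.
    for entry in entries:
        if entry.key == key:
            return entry.value
    return None


def build_cache(entries: List[AppInfoEntry]) -> Dict[str, bytes]:
    # Assumed new approach: one pass to build a dict, then O(1) lookups.
    # setdefault keeps the first occurrence, matching the linear scan.
    cache: Dict[str, bytes] = {}
    for entry in entries:
        cache.setdefault(entry.key, entry.value)
    return cache


entries = [AppInfoEntry(f"remote_net_{i}", str(i).encode()) for i in range(1000)]
cache = build_cache(entries)
```

With N entries and N lookups, the linear scan does on the order of N^2 comparisons, while the dict needs one pass to build and roughly constant time per lookup afterwards.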

Test Plan:
Unit test
  buck test caffe2/caffe2/fb/predictor:predictor_py_dist_utils_test
```
Building: finished in 6.2 sec (100%) 11023/11023 jobs, 2 updated
  Total time: 6.6 sec
More details at https://www.internalfb.com/intern/buck/build/95555464-b15f-44f2-a781-a712126aeaa1
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 3f4e4913-5802-4437-81bf-1e0a08c067da
Trace available for this run at /tmp/tpx-20210420-101444.394595/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/5348024608951863
    ✓ ListingSuccess: caffe2/caffe2/fb/predictor:predictor_py_dist_utils_test - main (8.412)
    ✓ Pass: caffe2/caffe2/fb/predictor:predictor_py_dist_utils_test - test_empty_remote_net_in_app_into (caffe2.caffe2.fb.predictor.predictor_py_dist_utils_test.TestPredictorDistUtils) (7.844)
    ✓ Pass: caffe2/caffe2/fb/predictor:predictor_py_dist_utils_test - test_distributed_context_in_app_info (caffe2.caffe2.fb.predictor.predictor_py_dist_utils_test.TestPredictorDistUtils) (8.014)
    ✓ Pass: caffe2/caffe2/fb/predictor:predictor_py_dist_utils_test - test_remote_net_in_app_info (caffe2.caffe2.fb.predictor.predictor_py_dist_utils_test.TestPredictorDistUtils) (8.027)
Summary
  Pass: 3
  ListingSuccess: 1
If you need help debugging your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/5348024608951863
```

Performance Test:
N557020 is the old way, which takes about 30~60 secs for every 1000 remote nets
N556897 is the new way, which takes 0.12 secs for every 1000 remote nets

N557020 output:
~~~
I0420 112047.755 <ipython-input-2-515f8ba1b5f6>:48] Start retrieving remote nets ...
I0420 112050.036 <ipython-input-2-515f8ba1b5f6>:27] Get 1000 remote nets
I0420 112052.750 <ipython-input-2-515f8ba1b5f6>:27] Get 2000 remote nets
I0420 112055.907 <ipython-input-2-515f8ba1b5f6>:27] Get 3000 remote nets
I0420 112059.542 <ipython-input-2-515f8ba1b5f6>:27] Get 4000 remote nets
I0420 112103.628 <ipython-input-2-515f8ba1b5f6>:27] Get 5000 remote nets
I0420 112108.309 <ipython-input-2-515f8ba1b5f6>:27] Get 6000 remote nets
I0420 112113.883 <ipython-input-2-515f8ba1b5f6>:27] Get 7000 remote nets
I0420 112119.564 <ipython-input-2-515f8ba1b5f6>:27] Get 8000 remote nets
I0420 112125.629 <ipython-input-2-515f8ba1b5f6>:27] Get 9000 remote nets
I0420 112132.057 <ipython-input-2-515f8ba1b5f6>:27] Get 10000 remote nets
I0420 112138.979 <ipython-input-2-515f8ba1b5f6>:27] Get 11000 remote nets
I0420 112146.198 <ipython-input-2-515f8ba1b5f6>:27] Get 12000 remote nets
I0420 112154.381 <ipython-input-2-515f8ba1b5f6>:27] Get 13000 remote nets
I0420 112202.881 <ipython-input-2-515f8ba1b5f6>:27] Get 14000 remote nets
I0420 112211.595 <ipython-input-2-515f8ba1b5f6>:27] Get 15000 remote nets
I0420 112221.341 <ipython-input-2-515f8ba1b5f6>:27] Get 16000 remote nets
I0420 112231.300 <ipython-input-2-515f8ba1b5f6>:27] Get 17000 remote nets
I0420 112242.615 <ipython-input-2-515f8ba1b5f6>:27] Get 18000 remote nets
I0420 112253.730 <ipython-input-2-515f8ba1b5f6>:27] Get 19000 remote nets
I0420 112305.044 <ipython-input-2-515f8ba1b5f6>:27] Get 20000 remote nets
I0420 112316.378 <ipython-input-2-515f8ba1b5f6>:27] Get 21000 remote nets
I0420 112328.176 <ipython-input-2-515f8ba1b5f6>:27] Get 22000 remote nets
I0420 112341.466 <ipython-input-2-515f8ba1b5f6>:27] Get 23000 remote nets
I0420 112355.653 <ipython-input-2-515f8ba1b5f6>:27] Get 24000 remote nets
I0420 112409.014 <ipython-input-2-515f8ba1b5f6>:27] Get 25000 remote nets
I0420 112422.924 <ipython-input-2-515f8ba1b5f6>:27] Get 26000 remote nets
I0420 112437.026 <ipython-input-2-515f8ba1b5f6>:27] Get 27000 remote nets
I0420 112451.413 <ipython-input-2-515f8ba1b5f6>:27] Get 28000 remote nets
I0420 112506.773 <ipython-input-2-515f8ba1b5f6>:27] Get 29000 remote nets
I0420 112522.614 <ipython-input-2-515f8ba1b5f6>:27] Get 30000 remote nets
I0420 112538.564 <ipython-input-2-515f8ba1b5f6>:27] Get 31000 remote nets
I0420 112555.075 <ipython-input-2-515f8ba1b5f6>:27] Get 32000 remote nets
I0420 112612.159 <ipython-input-2-515f8ba1b5f6>:27] Get 33000 remote nets
I0420 112629.656 <ipython-input-2-515f8ba1b5f6>:27] Get 34000 remote nets
I0420 112647.850 <ipython-input-2-515f8ba1b5f6>:27] Get 35000 remote nets
I0420 112705.807 <ipython-input-2-515f8ba1b5f6>:27] Get 36000 remote nets
I0420 112724.495 <ipython-input-2-515f8ba1b5f6>:27] Get 37000 remote nets
I0420 112744.072 <ipython-input-2-515f8ba1b5f6>:27] Get 38000 remote nets
I0420 112804.266 <ipython-input-2-515f8ba1b5f6>:27] Get 39000 remote nets
I0420 112824.954 <ipython-input-2-515f8ba1b5f6>:27] Get 40000 remote nets
I0420 112845.934 <ipython-input-2-515f8ba1b5f6>:27] Get 41000 remote nets
I0420 112908.721 <ipython-input-2-515f8ba1b5f6>:27] Get 42000 remote nets
I0420 112930.573 <ipython-input-2-515f8ba1b5f6>:27] Get 43000 remote nets
I0420 112952.775 <ipython-input-2-515f8ba1b5f6>:27] Get 44000 remote nets
I0420 113015.969 <ipython-input-2-515f8ba1b5f6>:27] Get 45000 remote nets
I0420 113041.214 <ipython-input-2-515f8ba1b5f6>:27] Get 46000 remote nets
I0420 113104.702 <ipython-input-2-515f8ba1b5f6>:27] Get 47000 remote nets
I0420 113128.730 <ipython-input-2-515f8ba1b5f6>:27] Get 48000 remote nets
I0420 113153.378 <ipython-input-2-515f8ba1b5f6>:27] Get 49000 remote nets
I0420 113218.021 <ipython-input-2-515f8ba1b5f6>:27] Get 50000 remote nets
I0420 113243.351 <ipython-input-2-515f8ba1b5f6>:27] Get 51000 remote nets
I0420 113309.279 <ipython-input-2-515f8ba1b5f6>:27] Get 52000 remote nets
I0420 113335.202 <ipython-input-2-515f8ba1b5f6>:27] Get 53000 remote nets
I0420 113402.367 <ipython-input-2-515f8ba1b5f6>:27] Get 54000 remote nets
I0420 113430.947 <ipython-input-2-515f8ba1b5f6>:27] Get 55000 remote nets
I0420 113458.127 <ipython-input-2-515f8ba1b5f6>:27] Get 56000 remote nets
I0420 113526.365 <ipython-input-2-515f8ba1b5f6>:27] Get 57000 remote nets
I0420 113554.709 <ipython-input-2-515f8ba1b5f6>:27] Get 58000 remote nets
I0420 113623.601 <ipython-input-2-515f8ba1b5f6>:27] Get 59000 remote nets
I0420 113653.264 <ipython-input-2-515f8ba1b5f6>:27] Get 60000 remote nets
I0420 113724.726 <ipython-input-2-515f8ba1b5f6>:27] Get 61000 remote nets
I0420 113755.080 <ipython-input-2-515f8ba1b5f6>:27] Get 62000 remote nets
I0420 113827.936 <ipython-input-2-515f8ba1b5f6>:27] Get 63000 remote nets
I0420 113859.362 <ipython-input-2-515f8ba1b5f6>:27] Get 64000 remote nets
I0420 113931.138 <ipython-input-2-515f8ba1b5f6>:27] Get 65000 remote nets
I0420 114003.229 <ipython-input-2-515f8ba1b5f6>:27] Get 66000 remote nets
I0420 114038.085 <ipython-input-2-515f8ba1b5f6>:27] Get 67000 remote nets
I0420 114111.300 <ipython-input-2-515f8ba1b5f6>:27] Get 68000 remote nets
I0420 114145.383 <ipython-input-2-515f8ba1b5f6>:27] Get 69000 remote nets
I0420 114219.571 <ipython-input-2-515f8ba1b5f6>:27] Get 70000 remote nets
I0420 114254.233 <ipython-input-2-515f8ba1b5f6>:27] Get 71000 remote nets
I0420 114329.326 <ipython-input-2-515f8ba1b5f6>:27] Get 72000 remote nets
I0420 114405.087 <ipython-input-2-515f8ba1b5f6>:27] Get 73000 remote nets
I0420 114440.979 <ipython-input-2-515f8ba1b5f6>:27] Get 74000 remote nets
I0420 114518.520 <ipython-input-2-515f8ba1b5f6>:27] Get 75000 remote nets
I0420 114556.013 <ipython-input-2-515f8ba1b5f6>:27] Get 76000 remote nets
I0420 114633.434 <ipython-input-2-515f8ba1b5f6>:27] Get 77000 remote nets
I0420 114711.834 <ipython-input-2-515f8ba1b5f6>:27] Get 78000 remote nets
I0420 114750.741 <ipython-input-2-515f8ba1b5f6>:27] Get 79000 remote nets
I0420 114829.749 <ipython-input-2-515f8ba1b5f6>:27] Get 80000 remote nets
I0420 114909.038 <ipython-input-2-515f8ba1b5f6>:27] Get 81000 remote nets
I0420 114948.711 <ipython-input-2-515f8ba1b5f6>:27] Get 82000 remote nets
I0420 115028.869 <ipython-input-2-515f8ba1b5f6>:27] Get 83000 remote nets
I0420 115109.094 <ipython-input-2-515f8ba1b5f6>:27] Get 84000 remote nets
I0420 115150.249 <ipython-input-2-515f8ba1b5f6>:27] Get 85000 remote nets
I0420 115231.601 <ipython-input-2-515f8ba1b5f6>:27] Get 86000 remote nets
I0420 115313.772 <ipython-input-2-515f8ba1b5f6>:27] Get 87000 remote nets
I0420 115356.035 <ipython-input-2-515f8ba1b5f6>:27] Get 88000 remote nets
I0420 115438.846 <ipython-input-2-515f8ba1b5f6>:27] Get 89000 remote nets
I0420 115522.213 <ipython-input-2-515f8ba1b5f6>:27] Get 90000 remote nets
I0420 115607.908 <ipython-input-2-515f8ba1b5f6>:27] Get 91000 remote nets
I0420 115652.009 <ipython-input-2-515f8ba1b5f6>:27] Get 92000 remote nets
I0420 115736.510 <ipython-input-2-515f8ba1b5f6>:27] Get 93000 remote nets
I0420 115822.303 <ipython-input-2-515f8ba1b5f6>:27] Get 94000 remote nets
I0420 115908.392 <ipython-input-2-515f8ba1b5f6>:27] Get 95000 remote nets
I0420 115954.912 <ipython-input-2-515f8ba1b5f6>:27] Get 96000 remote nets
I0420 120042.219 <ipython-input-2-515f8ba1b5f6>:27] Get 97000 remote nets
I0420 120129.969 <ipython-input-2-515f8ba1b5f6>:27] Get 98000 remote nets
I0420 120218.765 <ipython-input-2-515f8ba1b5f6>:27] Get 99000 remote nets
I0420 120306.883 <ipython-input-2-515f8ba1b5f6>:27] Get 100000 remote nets
I0420 120355.543 <ipython-input-2-515f8ba1b5f6>:27] Get 101000 remote nets
I0420 120444.976 <ipython-input-2-515f8ba1b5f6>:27] Get 102000 remote nets
I0420 120533.482 <ipython-input-2-515f8ba1b5f6>:27] Get 103000 remote nets
I0420 120622.351 <ipython-input-2-515f8ba1b5f6>:27] Get 104000 remote nets
I0420 120712.467 <ipython-input-2-515f8ba1b5f6>:27] Get 105000 remote nets
I0420 120802.660 <ipython-input-2-515f8ba1b5f6>:27] Get 106000 remote nets
I0420 120854.634 <ipython-input-2-515f8ba1b5f6>:27] Get 107000 remote nets
I0420 120945.786 <ipython-input-2-515f8ba1b5f6>:27] Get 108000 remote nets
~~~

N556897 output:
~~~
I0420 111502.516 <ipython-input-7-52640a51556f>:60] Start retrieving remote nets ...
I0420 111504.709 <ipython-input-7-52640a51556f>:40] Get 1000 remote nets
I0420 111504.825 <ipython-input-7-52640a51556f>:40] Get 2000 remote nets
I0420 111504.941 <ipython-input-7-52640a51556f>:40] Get 3000 remote nets
I0420 111505.056 <ipython-input-7-52640a51556f>:40] Get 4000 remote nets
I0420 111505.174 <ipython-input-7-52640a51556f>:40] Get 5000 remote nets
I0420 111505.286 <ipython-input-7-52640a51556f>:40] Get 6000 remote nets
I0420 111505.405 <ipython-input-7-52640a51556f>:40] Get 7000 remote nets
I0420 111505.522 <ipython-input-7-52640a51556f>:40] Get 8000 remote nets
I0420 111505.639 <ipython-input-7-52640a51556f>:40] Get 9000 remote nets
I0420 111505.756 <ipython-input-7-52640a51556f>:40] Get 10000 remote nets
I0420 111505.873 <ipython-input-7-52640a51556f>:40] Get 11000 remote nets
I0420 111505.990 <ipython-input-7-52640a51556f>:40] Get 12000 remote nets
I0420 111506.106 <ipython-input-7-52640a51556f>:40] Get 13000 remote nets
I0420 111506.223 <ipython-input-7-52640a51556f>:40] Get 14000 remote nets
I0420 111506.343 <ipython-input-7-52640a51556f>:40] Get 15000 remote nets
I0420 111506.457 <ipython-input-7-52640a51556f>:40] Get 16000 remote nets
I0420 111506.585 <ipython-input-7-52640a51556f>:40] Get 17000 remote nets
I0420 111508.930 <ipython-input-7-52640a51556f>:40] Get 18000 remote nets
I0420 111509.045 <ipython-input-7-52640a51556f>:40] Get 19000 remote nets
I0420 111509.154 <ipython-input-7-52640a51556f>:40] Get 20000 remote nets
I0420 111509.266 <ipython-input-7-52640a51556f>:40] Get 21000 remote nets
I0420 111509.382 <ipython-input-7-52640a51556f>:40] Get 22000 remote nets
I0420 111509.497 <ipython-input-7-52640a51556f>:40] Get 23000 remote nets
I0420 111509.614 <ipython-input-7-52640a51556f>:40] Get 24000 remote nets
I0420 111509.736 <ipython-input-7-52640a51556f>:40] Get 25000 remote nets
I0420 111509.854 <ipython-input-7-52640a51556f>:40] Get 26000 remote nets
I0420 111509.972 <ipython-input-7-52640a51556f>:40] Get 27000 remote nets
I0420 111510.090 <ipython-input-7-52640a51556f>:40] Get 28000 remote nets
I0420 111510.210 <ipython-input-7-52640a51556f>:40] Get 29000 remote nets
I0420 111510.329 <ipython-input-7-52640a51556f>:40] Get 30000 remote nets
I0420 111510.448 <ipython-input-7-52640a51556f>:40] Get 31000 remote nets
I0420 111510.572 <ipython-input-7-52640a51556f>:40] Get 32000 remote nets
I0420 111510.689 <ipython-input-7-52640a51556f>:40] Get 33000 remote nets
I0420 111510.821 <ipython-input-7-52640a51556f>:40] Get 34000 remote nets
I0420 111510.989 <ipython-input-7-52640a51556f>:40] Get 35000 remote nets
I0420 111511.110 <ipython-input-7-52640a51556f>:40] Get 36000 remote nets
I0420 111511.236 <ipython-input-7-52640a51556f>:40] Get 37000 remote nets
I0420 111511.357 <ipython-input-7-52640a51556f>:40] Get 38000 remote nets
I0420 111511.482 <ipython-input-7-52640a51556f>:40] Get 39000 remote nets
I0420 111511.607 <ipython-input-7-52640a51556f>:40] Get 40000 remote nets
I0420 111511.729 <ipython-input-7-52640a51556f>:40] Get 41000 remote nets
I0420 111511.855 <ipython-input-7-52640a51556f>:40] Get 42000 remote nets
I0420 111511.988 <ipython-input-7-52640a51556f>:40] Get 43000 remote nets
I0420 111512.112 <ipython-input-7-52640a51556f>:40] Get 44000 remote nets
I0420 111512.232 <ipython-input-7-52640a51556f>:40] Get 45000 remote nets
I0420 111512.353 <ipython-input-7-52640a51556f>:40] Get 46000 remote nets
I0420 111512.477 <ipython-input-7-52640a51556f>:40] Get 47000 remote nets
I0420 111512.597 <ipython-input-7-52640a51556f>:40] Get 48000 remote nets
I0420 111512.723 <ipython-input-7-52640a51556f>:40] Get 49000 remote nets
I0420 111512.839 <ipython-input-7-52640a51556f>:40] Get 50000 remote nets
I0420 111512.969 <ipython-input-7-52640a51556f>:40] Get 51000 remote nets
I0420 111513.085 <ipython-input-7-52640a51556f>:40] Get 52000 remote nets
I0420 111513.205 <ipython-input-7-52640a51556f>:40] Get 53000 remote nets
I0420 111513.322 <ipython-input-7-52640a51556f>:40] Get 54000 remote nets
I0420 111513.441 <ipython-input-7-52640a51556f>:40] Get 55000 remote nets
I0420 111513.559 <ipython-input-7-52640a51556f>:40] Get 56000 remote nets
I0420 111513.678 <ipython-input-7-52640a51556f>:40] Get 57000 remote nets
I0420 111513.796 <ipython-input-7-52640a51556f>:40] Get 58000 remote nets
I0420 111513.918 <ipython-input-7-52640a51556f>:40] Get 59000 remote nets
I0420 111514.038 <ipython-input-7-52640a51556f>:40] Get 60000 remote nets
I0420 111514.158 <ipython-input-7-52640a51556f>:40] Get 61000 remote nets
I0420 111514.273 <ipython-input-7-52640a51556f>:40] Get 62000 remote nets
I0420 111514.391 <ipython-input-7-52640a51556f>:40] Get 63000 remote nets
I0420 111514.512 <ipython-input-7-52640a51556f>:40] Get 64000 remote nets
I0420 111514.638 <ipython-input-7-52640a51556f>:40] Get 65000 remote nets
I0420 111514.759 <ipython-input-7-52640a51556f>:40] Get 66000 remote nets
I0420 111514.874 <ipython-input-7-52640a51556f>:40] Get 67000 remote nets
I0420 111515.000 <ipython-input-7-52640a51556f>:40] Get 68000 remote nets
I0420 111515.117 <ipython-input-7-52640a51556f>:40] Get 69000 remote nets
I0420 111515.235 <ipython-input-7-52640a51556f>:40] Get 70000 remote nets
I0420 111515.358 <ipython-input-7-52640a51556f>:40] Get 71000 remote nets
I0420 111515.481 <ipython-input-7-52640a51556f>:40] Get 72000 remote nets
I0420 111515.604 <ipython-input-7-52640a51556f>:40] Get 73000 remote nets
I0420 111515.725 <ipython-input-7-52640a51556f>:40] Get 74000 remote nets
I0420 111515.848 <ipython-input-7-52640a51556f>:40] Get 75000 remote nets
I0420 111515.979 <ipython-input-7-52640a51556f>:40] Get 76000 remote nets
I0420 111516.102 <ipython-input-7-52640a51556f>:40] Get 77000 remote nets
I0420 111516.226 <ipython-input-7-52640a51556f>:40] Get 78000 remote nets
I0420 111516.344 <ipython-input-7-52640a51556f>:40] Get 79000 remote nets
I0420 111516.472 <ipython-input-7-52640a51556f>:40] Get 80000 remote nets
I0420 111516.603 <ipython-input-7-52640a51556f>:40] Get 81000 remote nets
I0420 111516.751 <ipython-input-7-52640a51556f>:40] Get 82000 remote nets
I0420 111516.883 <ipython-input-7-52640a51556f>:40] Get 83000 remote nets
I0420 111517.025 <ipython-input-7-52640a51556f>:40] Get 84000 remote nets
I0420 111517.160 <ipython-input-7-52640a51556f>:40] Get 85000 remote nets
I0420 111517.290 <ipython-input-7-52640a51556f>:40] Get 86000 remote nets
I0420 111517.415 <ipython-input-7-52640a51556f>:40] Get 87000 remote nets
I0420 111517.541 <ipython-input-7-52640a51556f>:40] Get 88000 remote nets
I0420 111517.665 <ipython-input-7-52640a51556f>:40] Get 89000 remote nets
I0420 111517.790 <ipython-input-7-52640a51556f>:40] Get 90000 remote nets
I0420 111517.918 <ipython-input-7-52640a51556f>:40] Get 91000 remote nets
I0420 111518.044 <ipython-input-7-52640a51556f>:40] Get 92000 remote nets
I0420 111518.171 <ipython-input-7-52640a51556f>:40] Get 93000 remote nets
I0420 111518.292 <ipython-input-7-52640a51556f>:40] Get 94000 remote nets
I0420 111518.429 <ipython-input-7-52640a51556f>:40] Get 95000 remote nets
I0420 111520.024 <ipython-input-7-52640a51556f>:40] Get 96000 remote nets
I0420 111520.148 <ipython-input-7-52640a51556f>:40] Get 97000 remote nets
I0420 111520.271 <ipython-input-7-52640a51556f>:40] Get 98000 remote nets
I0420 111520.396 <ipython-input-7-52640a51556f>:40] Get 99000 remote nets
I0420 111520.522 <ipython-input-7-52640a51556f>:40] Get 100000 remote nets
I0420 111520.646 <ipython-input-7-52640a51556f>:40] Get 101000 remote nets
I0420 111520.770 <ipython-input-7-52640a51556f>:40] Get 102000 remote nets
I0420 111520.899 <ipython-input-7-52640a51556f>:40] Get 103000 remote nets
I0420 111521.023 <ipython-input-7-52640a51556f>:40] Get 104000 remote nets
I0420 111521.149 <ipython-input-7-52640a51556f>:40] Get 105000 remote nets
I0420 111521.274 <ipython-input-7-52640a51556f>:40] Get 106000 remote nets
I0420 111521.399 <ipython-input-7-52640a51556f>:40] Get 107000 remote nets
I0420 111521.526 <ipython-input-7-52640a51556f>:40] Get 108000 remote nets
I0420 111521.651 <ipython-input-7-52640a51556f>:40] Get 109000 remote nets
I0420 111521.778 <ipython-input-7-52640a51556f>:40] Get 110000 remote nets
I0420 111521.900 <ipython-input-7-52640a51556f>:40] Get 111000 remote nets
I0420 111522.055 <ipython-input-7-52640a51556f>:40] Get 112000 remote nets
I0420 111522.173 <ipython-input-7-52640a51556f>:40] Get 113000 remote nets
I0420 111522.297 <ipython-input-7-52640a51556f>:40] Get 114000 remote nets
I0420 111522.421 <ipython-input-7-52640a51556f>:40] Get 115000 remote nets
I0420 111522.545 <ipython-input-7-52640a51556f>:40] Get 116000 remote nets
I0420 111522.671 <ipython-input-7-52640a51556f>:40] Get 117000 remote nets
I0420 111522.795 <ipython-input-7-52640a51556f>:40] Get 118000 remote nets
I0420 111522.919 <ipython-input-7-52640a51556f>:40] Get 119000 remote nets
I0420 111523.048 <ipython-input-7-52640a51556f>:40] Get 120000 remote nets
I0420 111523.171 <ipython-input-7-52640a51556f>:40] Get 121000 remote nets
I0420 111523.298 <ipython-input-7-52640a51556f>:40] Get 122000 remote nets
I0420 111523.420 <ipython-input-7-52640a51556f>:40] Get 123000 remote nets
I0420 111523.544 <ipython-input-7-52640a51556f>:40] Get 124000 remote nets
I0420 111523.669 <ipython-input-7-52640a51556f>:40] Get 125000 remote nets
I0420 111523.794 <ipython-input-7-52640a51556f>:40] Get 126000 remote nets
I0420 111523.920 <ipython-input-7-52640a51556f>:40] Get 127000 remote nets
I0420 111524.041 <ipython-input-7-52640a51556f>:40] Get 128000 remote nets
I0420 111524.173 <ipython-input-7-52640a51556f>:40] Get 129000 remote nets
I0420 111524.293 <ipython-input-7-52640a51556f>:40] Get 130000 remote nets
I0420 111524.417 <ipython-input-7-52640a51556f>:40] Get 131000 remote nets
I0420 111524.542 <ipython-input-7-52640a51556f>:40] Get 132000 remote nets
I0420 111524.665 <ipython-input-7-52640a51556f>:40] Get 133000 remote nets
I0420 111524.790 <ipython-input-7-52640a51556f>:40] Get 134000 remote nets
I0420 111524.913 <ipython-input-7-52640a51556f>:40] Get 135000 remote nets
I0420 111525.038 <ipython-input-7-52640a51556f>:40] Get 136000 remote nets
I0420 111525.166 <ipython-input-7-52640a51556f>:40] Get 137000 remote nets
I0420 111525.289 <ipython-input-7-52640a51556f>:40] Get 138000 remote nets
I0420 111525.414 <ipython-input-7-52640a51556f>:40] Get 139000 remote nets
I0420 111525.536 <ipython-input-7-52640a51556f>:40] Get 140000 remote nets
I0420 111525.659 <ipython-input-7-52640a51556f>:40] Get 141000 remote nets
I0420 111525.782 <ipython-input-7-52640a51556f>:40] Get 142000 remote nets
I0420 111525.907 <ipython-input-7-52640a51556f>:40] Get 143000 remote nets
I0420 111526.035 <ipython-input-7-52640a51556f>:40] Get 144000 remote nets
I0420 111526.157 <ipython-input-7-52640a51556f>:40] Get 145000 remote nets
I0420 111526.287 <ipython-input-7-52640a51556f>:40] Get 146000 remote nets
I0420 111526.409 <ipython-input-7-52640a51556f>:40] Get 147000 remote nets
I0420 111526.533 <ipython-input-7-52640a51556f>:40] Get 148000 remote nets
I0420 111526.658 <ipython-input-7-52640a51556f>:40] Get 149000 remote nets
I0420 111526.781 <ipython-input-7-52640a51556f>:40] Get 150000 remote nets
I0420 111526.908 <ipython-input-7-52640a51556f>:40] Get 151000 remote nets
I0420 111527.033 <ipython-input-7-52640a51556f>:40] Get 152000 remote nets
I0420 111527.158 <ipython-input-7-52640a51556f>:40] Get 153000 remote nets
I0420 111527.289 <ipython-input-7-52640a51556f>:40] Get 154000 remote nets
I0420 111527.413 <ipython-input-7-52640a51556f>:40] Get 155000 remote nets
I0420 111527.544 <ipython-input-7-52640a51556f>:40] Get 156000 remote nets
I0420 111527.665 <ipython-input-7-52640a51556f>:40] Get 157000 remote nets
I0420 111527.790 <ipython-input-7-52640a51556f>:40] Get 158000 remote nets
I0420 111527.917 <ipython-input-7-52640a51556f>:40] Get 159000 remote nets
I0420 111528.046 <ipython-input-7-52640a51556f>:40] Get 160000 remote nets
I0420 111528.175 <ipython-input-7-52640a51556f>:40] Get 161000 remote nets
I0420 111528.297 <ipython-input-7-52640a51556f>:40] Get 162000 remote nets
I0420 111528.422 <ipython-input-7-52640a51556f>:40] Get 163000 remote nets
I0420 111528.548 <ipython-input-7-52640a51556f>:40] Get 164000 remote nets
I0420 111528.672 <ipython-input-7-52640a51556f>:40] Get 165000 remote nets
I0420 111528.796 <ipython-input-7-52640a51556f>:40] Get 166000 remote nets
I0420 111528.920 <ipython-input-7-52640a51556f>:40] Get 167000 remote nets
I0420 111529.045 <ipython-input-7-52640a51556f>:40] Get 168000 remote nets
I0420 111529.172 <ipython-input-7-52640a51556f>:40] Get 169000 remote nets
I0420 111529.300 <ipython-input-7-52640a51556f>:40] Get 170000 remote nets
I0420 111529.426 <ipython-input-7-52640a51556f>:40] Get 171000 remote nets
I0420 111529.547 <ipython-input-7-52640a51556f>:40] Get 172000 remote nets
I0420 111529.683 <ipython-input-7-52640a51556f>:40] Get 173000 remote nets
I0420 111529.800 <ipython-input-7-52640a51556f>:40] Get 174000 remote nets
I0420 111529.923 <ipython-input-7-52640a51556f>:40] Get 175000 remote nets
I0420 111530.080 <ipython-input-7-52640a51556f>:40] Get 176000 remote nets
I0420 111530.205 <ipython-input-7-52640a51556f>:40] Get 177000 remote nets
I0420 111530.331 <ipython-input-7-52640a51556f>:40] Get 178000 remote nets
I0420 111530.453 <ipython-input-7-52640a51556f>:40] Get 179000 remote nets
I0420 111530.577 <ipython-input-7-52640a51556f>:40] Get 180000 remote nets
I0420 111530.705 <ipython-input-7-52640a51556f>:40] Get 181000 remote nets
I0420 111530.829 <ipython-input-7-52640a51556f>:40] Get 182000 remote nets
I0420 111530.955 <ipython-input-7-52640a51556f>:40] Get 183000 remote nets
I0420 111531.082 <ipython-input-7-52640a51556f>:40] Get 184000 remote nets
I0420 111531.210 <ipython-input-7-52640a51556f>:40] Get 185000 remote nets
I0420 111531.338 <ipython-input-7-52640a51556f>:40] Get 186000 remote nets
I0420 111531.461 <ipython-input-7-52640a51556f>:40] Get 187000 remote nets
I0420 111531.588 <ipython-input-7-52640a51556f>:40] Get 188000 remote nets
I0420 111531.708 <ipython-input-7-52640a51556f>:40] Get 189000 remote nets
I0420 111531.845 <ipython-input-7-52640a51556f>:40] Get 190000 remote nets
I0420 111531.968 <ipython-input-7-52640a51556f>:40] Get 191000 remote nets
I0420 111532.096 <ipython-input-7-52640a51556f>:40] Get 192000 remote nets
I0420 111534.047 <ipython-input-7-52640a51556f>:40] Get 193000 remote nets
I0420 111534.172 <ipython-input-7-52640a51556f>:40] Get 194000 remote nets
I0420 111534.297 <ipython-input-7-52640a51556f>:40] Get 195000 remote nets
I0420 111534.420 <ipython-input-7-52640a51556f>:40] Get 196000 remote nets
I0420 111534.543 <ipython-input-7-52640a51556f>:40] Get 197000 remote nets
I0420 111534.671 <ipython-input-7-52640a51556f>:40] Get 198000 remote nets
I0420 111534.794 <ipython-input-7-52640a51556f>:40] Get 199000 remote nets
I0420 111534.920 <ipython-input-7-52640a51556f>:40] Get 200000 remote nets
I0420 111535.044 <ipython-input-7-52640a51556f>:40] Get 201000 remote nets
I0420 111535.167 <ipython-input-7-52640a51556f>:40] Get 202000 remote nets
I0420 111535.291 <ipython-input-7-52640a51556f>:40] Get 203000 remote nets
I0420 111537.169 <ipython-input-7-52640a51556f>:64] Finish retrieving remote nets. Starting processing ...
I0420 111537.201 <ipython-input-7-52640a51556f>:77] Finished processing remote nets
~~~

Reviewed By: heslami

Differential Revision: D27886217

fbshipit-source-id: cdc398d04bf963d4f495adc0a91c8ceb54466e58
2021-04-20 22:32:40 -07:00
Adam Simpkins
81b9aa743b [pytorch] Update caffe2/python to eliminate Pyre errors (#52083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52083

This makes minor fixes in `caffe2/python` to address all errors currently
reported by Pyre.

I updated the code to fix errors when doing so looked simple and safe,
and added `pyre-fixme` comments in other places.
ghstack-source-id: 121109695

Test Plan: Confirmed that Pyre no longer reports errors under `caffe2/python`

Differential Revision: D26272279

fbshipit-source-id: b1eb19d323b613f23280ce9c71e800e874ca1162
2021-02-11 11:04:59 -08:00
Bugra Akyildiz
27c7158166 Remove __future__ imports for legacy Python2 supports (#45033)
Summary:
The `2to3` tool has a `future` fixer that can be targeted specifically to remove these imports. The `caffe2` directory has the most redundant imports:

```
2to3 -f future -w caffe2
```
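
To illustrate the effect of that fixer, here is a simplified sketch. It is not how `2to3` works internally (that tool rewrites a parse tree); the regex and the `strip_future_imports` helper are assumptions for illustration and only handle single-line imports.

```python
import re

# Matches a whole-line `from __future__ import ...` statement (single-line form only).
_FUTURE_IMPORT = re.compile(r"^from __future__ import .*\n?", re.MULTILINE)


def strip_future_imports(source: str) -> str:
    # Drop every matching line, including its trailing newline.
    return _FUTURE_IMPORT.sub("", source)


before = (
    "from __future__ import absolute_import\n"
    "from __future__ import division\n"
    "import os\n"
)
after = strip_future_imports(before)
```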

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033

Reviewed By: seemethere

Differential Revision: D23808648

Pulled By: bugra

fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
2020-09-23 17:57:02 -07:00
Chunli Fu
3699274ce2 [DPER3] AOT integration
Summary: Integrate aot flow with model exporter.

Test Plan:
buck test dper3/dper3_backend/delivery/tests:dper3_model_export_test

For the replayer test, see D23407733

Reviewed By: ipiszy

Differential Revision: D23313689

fbshipit-source-id: 39ae8d578ed28ddd6510db959b65974a5ff62888
2020-09-04 18:37:22 -07:00
Chunli Fu
d70b263e3a [DPER3] Separate user embeddings and ad embeddings in blob reorder
Summary:
Separate user embeddings and ad embeddings in blobsOrder. New order:
1. meta_net_def
2. preload_blobs
3. user_embeddings (embeddings in remote request only net)
4. ad_embeddings (embeddings in remote other net)

Add a field requestOnlyEmbeddings in meta_net_def to record user_embeddings.

This is for flash verification.
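
The ordering described above could be sketched as follows. This is a minimal illustration, not the dper3 implementation: `reorder_blobs` and its parameters are hypothetical, and placing any remaining blobs last is an assumption the summary does not state.

```python
from typing import Iterable, List, Set


def reorder_blobs(
    blobs: Iterable[str],
    preload_blobs: Set[str],
    user_embeddings: Set[str],
    ad_embeddings: Set[str],
    meta_net_def: str = "meta_net_def",
) -> List[str]:
    # Rank each blob by the group it belongs to; lower ranks come first.
    def rank(name: str) -> int:
        if name == meta_net_def:
            return 0
        if name in preload_blobs:
            return 1
        if name in user_embeddings:
            return 2
        if name in ad_embeddings:
            return 3
        return 4  # assumption: anything else goes last

    # sorted() is stable, so blobs keep their relative order within a group.
    return sorted(blobs, key=rank)


order = reorder_blobs(
    ["ad_emb", "other", "user_emb", "pre", "meta_net_def"],
    preload_blobs={"pre"},
    user_embeddings={"user_emb"},
    ad_embeddings={"ad_emb"},
)
```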

Test Plan:
buck test dper3/dper3_backend/delivery/tests:blob_reorder_test

Run a flow with canary package f211282476
Check the net: n326826, request_only_embeddings are recorded as expected

Reviewed By: ipiszy

Differential Revision: D23008305

fbshipit-source-id: 9360ba3d078f205832821005e8f151b8314f0cf2
2020-08-22 23:40:04 -07:00
Stanislau Hlebik
b774ce54f8 remediation of S205607
fbshipit-source-id: 798decc90db4f13770e97cdce3c0df7d5421b2a3
2020-07-17 17:19:47 -07:00
Stanislau Hlebik
8fdea489af remediation of S205607
fbshipit-source-id: 5113fe0c527595e4227ff827253b7414abbdf7ac
2020-07-17 17:17:03 -07:00
Chunli Fu
834569232b [online trainer] Add blob reorder (#39534)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39534

Reviewed By: boryiingsu

Differential Revision: D21871352

fbshipit-source-id: 00cce83b7351fdafd36d4db57c99fb8a58e8a260
2020-06-05 17:33:08 -07:00
Chunli Fu
b3fccda4a9 [DPER3][Shape inference] Update Shape Information in dper3 backend (#34475)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34475

Differential Revision: D20332799

fbshipit-source-id: 16aa7399eb48ce4d1d0f8431941ae1252322c382
2020-03-19 13:49:34 -07:00
Bangsheng Tang
8f854fb9e2 [1/n][multi-tower] add partition info in predictor construction (#34175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34175

Incorporates the PartitionInfo added in D20015493.

Test Plan: unit tests

Reviewed By: yinghai

Differential Revision: D20133759

fbshipit-source-id: 130db2d80bca3c05a7ec91292159f857046718e0
2020-03-13 09:23:39 -07:00
Chunli Fu
fe9b4e3cba [DPER3] Blob Reorder (#33579)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33579

Differential Revision: D20008865

fbshipit-source-id: f35aded311d9d1d7d438d828ccabd2bab5575e5c
2020-03-12 12:28:12 -07:00
Lei Zhang
b45069b59f fix fc fp16 quantization (#29469)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29469

The original approach saved both fp16 and fp32 blobs for all models, which increased the file size and memory use.

This diff saves only the 'used' blobs into the predictor file.
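
A sketch of what keeping only the 'used' blobs might look like. How the diff decides that a blob is used is not stated here, so this assumes a blob counts as used when some operator lists it as an input; `select_used_blobs` and the blob names are hypothetical.

```python
from typing import Dict, Iterable, List


def select_used_blobs(
    blobs: Dict[str, bytes], op_inputs: Iterable[List[str]]
) -> Dict[str, bytes]:
    # Collect every blob name that some operator reads.
    used = {name for inputs in op_inputs for name in inputs}
    # Keep only the referenced blobs; unreferenced copies are dropped.
    return {name: data for name, data in blobs.items() if name in used}


blobs = {"fc_w_fp16": b"\x00\x01", "fc_w": b"\x00\x01\x02\x03", "fc_b": b"\x04"}
# Assumed example: the quantized net reads only the fp16 weight and the bias.
op_inputs = [["data", "fc_w_fp16", "fc_b"]]
saved = select_used_blobs(blobs, op_inputs)
```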

Test Plan:
fc clone workflow :
f149878151

ctr mbl feed test with fc fp16 quantization:
f149996395

No fp32 in local file
{F221750392}

QRT after the fix:
https://fburl.com/qrt/cp8r8263

Reviewed By: wx1988

Differential Revision: D18382503

fbshipit-source-id: 231c41668f25b1d35ca8d4358ce9b12ba60a4f91
2019-11-18 11:26:49 -08:00
Lu Fang
dfa6fca1c6 Supporting Manifold DB in Predictor Exporter (#22334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22334

Improve the function signatures of save_to_db and load_from_db in predictor_exporter.

Reviewed By: akyrola

Differential Revision: D16047208

fbshipit-source-id: a4e947f86e00ef3b3dd32c57efe58f76a38fcec7
2019-07-01 16:17:02 -07:00
Weiyi Zheng
f3cf6ed789 add fbgemm fp16 (fbfcpacked) support, add global_init_net in predictor_export_meta (#18257)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18257

Support adding ops in global_init_net, because pred_init_net is per thread and is not sufficient.

Reviewed By: jspark1105

Differential Revision: D14552695

fbshipit-source-id: 53dd44c84ad019019ab9f35fc04d076b7f941ddc
2019-03-22 00:19:59 -07:00
Lu Fang
e12091d0a3 Revert D14114134: [asr] add fbgemm fp16 (fbfcpacked) support, add global_init_net in predictor_export_meta
Differential Revision:
D14114134

Original commit changeset: 112bb2ceb9d3

fbshipit-source-id: 763262c1b78eed88a653caad5adc27d97feb43aa
2019-03-20 16:32:53 -07:00
Weiyi Zheng
1b71f6d4eb add fbgemm fp16 (fbfcpacked) support, add global_init_net in predictor_export_meta (#17905)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17905

Support adding ops in global_init_net, because pred_init_net is per thread and is not sufficient.

Reviewed By: jspark1105

Differential Revision: D14114134

fbshipit-source-id: 112bb2ceb9d3d5e663dd430585567f4eaa2db35f
2019-03-20 13:52:10 -07:00
Shane Li
620ff25bdb Enhance cpu support on gloo based multi-nodes mode. (#11330)
Summary:
1. Add some gloo communication operators to the related fallback list;
2. Work around compile errors when using a fallback operator whose CPU operator inherits directly from 'OperatorBase', such as PrefetchOperator;
3. Add new CPU context support for some Python module files and the resnet50 training example file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11330

Reviewed By: yinghai

Differential Revision: D13624519

Pulled By: wesolwsk

fbshipit-source-id: ce39d57ddb8cd7786db2e873bfe954069d972f4f
2019-01-15 11:47:10 -08:00
Parth Raichura
3808e9fad3 Caffe2: Fix for creating entries of external_input in predict_net (#12979)
Summary:
Currently, after export, the predict_net proto contains two entries of
  external_input for the input data. external_input is extended twice: once
  separately with the input blob, and once with all the external_input entries
  from the proto, which already include the input blob.
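
The duplication described above can be illustrated with a minimal Python sketch. This is not the exporter's actual code: `extend_unique` and the blob names are hypothetical, chosen only to show how extending a list twice repeats an entry and how an order-preserving check avoids it.

```python
def extend_unique(target, names):
    """Append each name to target unless it is already present (order kept)."""
    seen = set(target)
    for name in names:
        if name not in seen:
            target.append(name)
            seen.add(name)
    return target


# Hypothetical blob names, for illustration only.
input_blobs = ["data"]
proto_external_input = ["data", "conv1_w", "conv1_b"]

# Naive approach: extending twice repeats "data".
naive = []
naive.extend(input_blobs)
naive.extend(proto_external_input)

# Deduplicating approach: "data" appears once.
fixed = extend_unique([], input_blobs)
extend_unique(fixed, proto_external_input)
```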

Signed-off-by: Parth Raichura <parth.raichura@softnautics.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12979

Differential Revision: D12916349

Pulled By: soumith

fbshipit-source-id: 4d4a1c68c0936f8de3f4e380aea1393fe193cd2d
2018-11-15 22:33:50 -08:00
Junjie Bai
f54ab540af Rename cuda_gpu_id to device_id in DeviceOption (#12456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12456

codemod with 'Yes to all'
codemod -d . --extensions h,cc,cpp,cu,py,proto,pbtxt,pb.txt,config cuda_gpu_id device_id

Overload TextFormat::ParseFromString to do string replace when parsing from protobuf format
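
A rough Python sketch of the string-replace idea, assuming the goal is to keep old text-format protos parseable after the rename. The helper name `upgrade_device_option_text` and the sample proto text are hypothetical; the actual change is in the C++ TextFormat overload, which is not shown here.

```python
import re

# Match the old field name only as a whole identifier.
_OLD_FIELD = re.compile(r"\bcuda_gpu_id\b")


def upgrade_device_option_text(pbtxt):
    """Rewrite the legacy field name so newer parsers accept old text protos."""
    return _OLD_FIELD.sub("device_id", pbtxt)


legacy = "device_option { device_type: 1 cuda_gpu_id: 3 }"
upgraded = upgrade_device_option_text(legacy)
```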

Reviewed By: Yangqing

Differential Revision: D10240535

fbshipit-source-id: 5e6992bec961214be8dbe26f16f5794154a22b25
2018-10-09 15:54:04 -07:00
Junjie Bai
ff608a9ff3 Back out "Revert D10123245: Back out "codemod cuda_gpu_id to device_id"" (#12232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12232

Original commit changeset: fca91fea58b7

This adds proper modifications to the DeviceType <->DeviceOption conversion code added in D10033396

Reviewed By: jerryzh168

Differential Revision: D10132473

fbshipit-source-id: 801ef777e2950982cb47b48051b1471a0a91e64b
2018-10-01 21:54:52 -07:00
Rick Ratmansky
3010dc4208 Revert D10123245: Back out "codemod cuda_gpu_id to device_id"
Differential Revision:
D10123245

Original commit changeset: d83da8e00a12

fbshipit-source-id: fca91fea58b7df208edc2e218a1d514f9821ec7b
2018-10-01 12:22:36 -07:00
Yang Liu
7d7d336c45 Back out "codemod cuda_gpu_id to device_id"
Summary:
Original commit changeset: f5614a5d2607

D9986213 is causing a [huge performance difference](https://our.intern.facebook.com/intern/ads/analyze_canary/412951953278781781/) for Multifeed Aggregator and has been blocking the aggregator push since last Friday night: https://fburl.com/feedtools/b6izvwjz
We need to land this revert ASAP to unblock aggregator push.

Reviewed By: orionr

Differential Revision: D10123245

fbshipit-source-id: d83da8e00a1250f5d09811a0a587c127e377aab2
2018-10-01 11:31:14 -07:00
Junjie Bai
3eb5940cf5 codemod cuda_gpu_id to device_id (#12022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12022

codemod -d . --extensions h,cc,cpp,cu,py,proto,pbtxt,pb.txt,config cuda_gpu_id device_id

codemod with 'Yes to all'

Reviewed By: orionr

Differential Revision: D9986213

fbshipit-source-id: f5614a5d26078817aee8caf79a494abfd6a95ff1
2018-09-27 20:24:53 -07:00
Eli Amesefe
c5b1aa93ee Export uint8 tensors as byte string in mobile_exporter and add GivenTensorByteStringToUInt8FillOp (#10385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10385

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10354

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10316

Because Protobuf encodes uint8_t tensors using a less space-efficient varint uint32_t encoding, we are adding a new operator that reads a byte string back into a uint8_t tensor.
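
To make the size difference concrete, here is a small sketch that counts payload bytes only. It ignores field tags and length prefixes and is not the operator's code; `varint_size` is a hypothetical helper that follows the base-128 varint rule of 7 payload bits per byte.

```python
def varint_size(value):
    """Number of bytes a non-negative integer occupies as a base-128 varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


pixels = bytes([0, 127, 128, 200, 255])

# One varint per element: values >= 128 need two bytes each.
as_varints = sum(varint_size(v) for v in pixels)

# A byte string stores exactly one byte per element.
as_byte_string = len(pixels)
```

For uint8 data with many values of 128 or above, the varint form approaches twice the size of the raw byte string.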

Reviewed By: harouwu

Differential Revision: D9004839

fbshipit-source-id: dfd27085c813fdeff13fee15eef4a2e7fef72845
2018-08-15 14:26:50 -07:00
Pushkar Tripathi
1f6888b70a Allow mobile exporter to export string arrays (#10017)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10017

Allow mobile exporter to export string arrays

Reviewed By: pjh5

Differential Revision: D9061213

fbshipit-source-id: b6c5257eb2f0f964dba255b97dc5d32af8ce15a7
2018-08-01 16:09:58 -07:00
Orion Reblitz-Richardson
edb88b5f3a Update from Facebook (#8887)
* add opencl + fpga context

adds an opencl context inside caffe2/fb which can be used for fpga access

* [Caffe2] Force tensor inference checks to be triggered during testing

We've started to rely on TensorInference functions more for different analyses. This diff ensures that the TensorInference function's result matches what is expected from the definition of the operator.

* Enable building //caffe2:torch with @mode/opt

In @mode/opt, python runs out of a PAR, which breaks a lot of
assumptions in the code about where templates/ folders live relative
to __file__. Rather than introduce hacks with parutil, I simply turn
template_path into a parameter for all the relevant functions and
thread it through from the top level.

* [Caffe2] Fix cost models for DotProduct and Div.  Update Tensor Inference for dot product

As title.  DotProduct states that output is a 1-D tensor (https://caffe2.ai/docs/operators-catalogue.html#dotproduct) though code suggests it is either 0- or 1-D depending on inputs.  TensorInference defined to support implementation.

* [SG-MoE] Add an option to make the experts NOT as components

* [nomnigraph] Rename and fixup convertToNeuralNetOperator API

This will make things a bit cleaner

* no longer symlink THNN.h and THCUNN.h

* forced decoder network (onnx export)

Closes https://github.com/pytorch/translate/pull/95

Add networks in ensemble_export.py to create a forced decoding network from PyTorch NMT checkpoints. This network takes an arbitrary numberized (source, target) pair and returns the model score for the translation, including penalties.

Vocabulary reduction networks are also supported, but note that target indices which are not in the possible_translation_tokens generated for the source input will be trea

* Revert schema change to fix production models

Revert schema change to fix production models

* MockLogDeviceReader - rebase on FIX

# Goal

1), Build a make_mock_log_device_reader using make_mock_reader

2), Replace the real log_device_reader here: https://fburl.com/raihwf1p

# Log by D8151734

Real log_device_reader:
```
I0529 20:29:05.373108 954994 tensor.h:839] Tensor print_net/log of type std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >. Dims: (): read_net/ParseOpenTrainingRow:0
I0529 20:29:05.373244 954994 tensor.h:839] Tensor read_net/ParseOpenTrainin

* [C2/D2][1/n]: Nonnegative-Constrained Optimization -- log barrier

implement log barrier as a regularization method

* Add teacher weight screening.

Add teacher weight screening according to teacher labels. If the teacher label is zero, we do not use the distill loss in the objective function.

* Add NormalizerContext

See task for more detail. This implementation is a copy of what exists for RegularizerContext except for how the parameters are defined in the model_definition thrift file.

I'll try an alternative implementation which overrides the default arguments of functions instead like for argscopes in tensorflow.

https://github.com/pytorch/pytorch/compare/master...MaximeBoucher:update-from-facebook-0939578c068c?expand=1

* Adding cosine similarity option in dot processor

Add pairwise cosine similarity option in dot product.
Add an option to concatenate dot product and cosine similarity.
Add test cases.

* [nomnigraph][redo] Concat elim for sparseNN

Same as D7962948, which was reverted because Operator Schema was not
defined

* [pytorch] Revert pytorch/pytorch#7918 'Release GIL when copying to shared memory', breaks ASAN

Revert this pytorch diff that breaks ASAN when running Filament in dev mode; in opt mode it gives "bad file descriptor" errors. Looks like a race when copying tensors to shared memory in multiple mp.Queue's (which spawn separate threads).

https://github.com/pytorch/pytorch/pull/7918/files

* [nomnigraph][mobile] Enable nomnigraph by default, use -Oz on nomnigraph related code to reduce code size

enables nomnigraph and reduces codesize

* [Warmup] Allow both offline incremental training and online training

Change the plan name on the saving side and the reading side to support both training types.

This diff depends on D8128530 and D8168651.

* Revert D7802642: [Warmup] Allow both offline incremental training and online training

This reverts commit afc213cf9b36cecf75333a788391c4d09f4afccc

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Add legacy grad logic to fix div op on old graphs.

Add legacy grad logic to fix div op on old graphs.

* Correctly propagate operator failures

Propagate errors from operators that throw exceptions and return false

* Revert D8374829: [caffe2][nomnigraph][redo] Concat elim for sparseNN

This reverts commit 6dda028c463e54bb5c32188bbbe9202107e188a5

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [Caffe2] Added extra_info to core.DeviceOption(), enforced extra_info to be inherited in scope.DeviceScope

extra_info is a newly defined field in the DeviceOption proto. This diff adds extra_info to core.DeviceOption(). In scope.DeviceScope(), it also enforces that the new scope inherits extra_info from the old scope.

* [opt] hgdirsync wasn't enabled, merge diverged code

Here's the damage, P59732616 basically xplat was left behind but had
the change from assert to CAFFE_ENFORCE

* OMP parallelism over RoIs for RoIAlign op

Simpler to parallelize over RoIs. Shouldn't affect other uses as it relies on
the number of OMP threads set during startup.

PR: https://github.com/pytorch/pytorch/pull/8562

* Use int64_t for shape in FillOps

to avoid overflow of int32

* Implement Rotated RoIAlign op

Based on Rotated RPNs as explained in https://arxiv.org/abs/1703.01086.
The idea is simple - orientation/angle is added as an RPN
anchor parameter and then the angle is further regressed similar to bbox
coords. There are some additional changes related to NMS and IoU, but besides
that it's a direct extension to Faster-RCNN. Further details in https://fb.quip.com/sZHlA1iMfWPZ.

RoIs are represented in [center_x, center_y, width, height, angle] format.
`angle` repre

* Rotated RoIAlign op CUDA forward implementation

CUDA forward impl for D8415490

* RoIAlignRotated op CUDA backward pass implementation

TSIA

* All remaining fixes to eliminate process_github.sh

Most of this diff has already been reviewed separately, except for the parts relating to _thnn/utils.py and _utils._internal.py

remove skipIf(True, 'Fbcode') line from process_github.sh

replace sed of cpp file with #ifdef to control cudnnDestroy use

undo sync-time deletion of .gitattributes, remove process_github.sh

switch to using _utils._internal rather than try-import-except

This diff also fixes the open-source bug where rebuilds have

* Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"

Original commit changeset: 7707d2efe60e The original diff was backed out because the online trainer package was backed out. This code only works with the new online trainer package.

* [easy] improve error log in adagrad op

as title

* re-allow use of thnn_h_path

This fixes cffi usage in OSS

* [4/4] [tum] parallelizing layerNorm for GPU full sync

as title

* add compile=False to pytorch tests, remove hack with pyc

* Add shape and type inference for RowWiseArgMax operator

See title

* Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"

This reverts commit 78167eeef0af16b60f72c82f9dcdda9b41b4dcbd

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [fix-flaky-test] mock_hive_reader_test flaky, because GlobalCounter collects local counts at intervals

# Problem

`MockHiveReader` uses `GlobalCounter` to limit `max_examples`.

GlobalCounter on the server node collects local counts from worker nodes every 1 sec.

This 1 sec delay makes it impossible to limit exactly to `max_examples`; the count will definitely exceed `max_examples`.

# Plan

Given,
```
Expected num_examples = max_examples + num_examples/sec (Read Speed) x 1 sec (GlobalCounter Sync Int

* [Caffe2] Fix FCGradient cost inference.  Prevent overflow in cost inference

FCGradient missed a factor 2 in the `num_outputs == 3` case.  Overflow was occurring with flop calculation for FC.  Changed types to `uint64_t` to prevent future problems.

* Fix binary ops with empty inputs

Fix binary ops with empty inputs

* Support the filling of input blob with provided data

as title for Biz Integrity case

* Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training""

Original commit changeset: 30c55dd38816 Original diff is reverted due to introducing bad integration test. Fixed the integration test.

* [c2][easy] improve pack ops error loggings

as desc.

* Add ShapeTypeInference for LpNorm operator

As desc

* Shard test_nn to reduce runtime for each test target

Closes https://github.com/pytorch/pytorch/pull/8793

The current test_nn would time out and be disabled in GreenWarden, and we need to have an option to split it up in order to pass the stress test. Right now GreenWarden roughly allows running 100 test cases in test_nn before timing out, and here we have an option to divide test_nn into 30 shards (with ~40 tests in each shard) to allow for some test suite growth in the future.
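
A minimal sketch of one way to split a test list into shards, using round-robin assignment over a sorted list. `select_shard` and the test names are hypothetical; this is not necessarily how test_nn implements its sharding option.

```python
def select_shard(test_names, shard_index, num_shards):
    """Return the tests assigned to one shard using round-robin assignment."""
    if not 0 <= shard_index < num_shards:
        raise ValueError("shard_index must be in [0, num_shards)")
    ordered = sorted(test_names)  # stable order so every runner agrees
    return [name for i, name in enumerate(ordered)
            if i % num_shards == shard_index]


# Hypothetical test names, for illustration only.
all_tests = ["test_%03d" % i for i in range(100)]
shards = [select_shard(all_tests, s, 30) for s in range(30)]
```

Every test lands in exactly one shard, and shard sizes differ by at most one.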

* Change default caffe2_streams_per_gpu to 1

* Remove IN_SANDCASTLE from common.py and test_nn.py

We prefer to disable the failing tests through Sandcastle UI instead.

* Add a new class for an updated prof_dag.proto

This diff contains:
- An updated prof_dag.proto that contains blob profiles.
- A class to deserialize this information (serialization is in a follow up diff)
- Update to separate profiling information from NeuralNet (and use it as part of the class above).
- Unit tests

* Lambdarank for SparseNN

This diff adds a lambda_rank_layer for SparseNN.
Changes include:
1) Adds support for multi sessions in c2 op
2) Adds support for two different loss functions in c2 op
3) Unit tests for op

* Revert D8586950: Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training""

This reverts commit 012220ed63eccc35659a57b31d16a3625da6317b

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [easy] A few fixups to multithread predictor benchmark

(1) support perf on T6 server
(2) remove dead code

* fix a bug about the map size

as title

* Fix reduce sum on in-place case.

Fix reduce sum on in-place case.

* [Warmup] Reland reverted diff Allow both offline incremental training and online training

Closes https://github.com/pytorch/pytorch/pull/8827

fix net transform integration test. Allow offline and online trainer to coexist D7802642.

* Add StoreHandlerNotAvailableException

Add an exception for a store that is not available or has been
deleted.

* Use exception handling for fault tolerance, missing KV store

Remove status blobs to communication ops so that exceptions propagate on
failure.

* [C2/D2][2/n]: Nonnegative-Constrained Optimization -- bounded grad proj

for simple bounded constrained optimization, incl non-negative box constraints.

* [GanH]: Adaptive Weighting with More Estimations

With the implemented positivity optimization, we now learn adaptive weights with different
parameterizations.

This improves parameter estimation and training stability.

* Revert some changes for landing

* Remove AutoNoGIL in StorageSharing

* Temporarily disable net_tests

* Revert "[Caffe2] Force tensor inference checks to be triggered during testing"

This reverts commit 67ef05c22b2f71b4a489695384932f968384a2a4.

* Revert "Fix reduce sum on in-place case."

This reverts commit 6cb8a8e1b3db7b6d20941b0053e3f3836068eb64.

* Revert "Revert "Fix reduce sum on in-place case.""

This reverts commit 130a257c0893dc09f4bd6e6a45d112261807fd2c.
2018-06-26 14:55:48 -07:00
sf-wind
5b86c3af4a Update from facebook (#8384)
* [fix] fixup the bias multiplier data access issue

Hotfix for failures in conv_transpose

* [D2][Easy]: lint regularizer

lint with black

* [GanH]: Split mu in adaptive weight for diagnose

* [Dper] Add the ability to split FC weights into multiple smaller ones

* fix SumReduceLikeOp for empty blob

as desc.

* add ctc_greedy_decoder for caffe2

ctc_greedy_decoder same as tf's

* Update event callback handling

Allow multiple callbacks per event
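
As a toy illustration of the idea (not the Caffe2 event implementation), an event can hold a list of callbacks rather than a single slot. The `Event` class and its method names below are hypothetical.

```python
class Event:
    """Toy event that keeps a list of callbacks instead of a single slot."""

    def __init__(self):
        self._callbacks = []

    def add_callback(self, fn):
        self._callbacks.append(fn)

    def finish(self):
        # Invoke callbacks in registration order.
        for fn in self._callbacks:
            fn()


calls = []
event = Event()
event.add_callback(lambda: calls.append("first"))
event.add_callback(lambda: calls.append("second"))
event.finish()
```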

* Add WeightedSum layer

The motivation is to do weighted sum in HoNet/crossnet, in the next diff, I'll replace model.Add with model.WeightedSum in
honet: https://fburl.com/f4rmolg2
crossnet: https://fburl.com/v7awn8se, https://fburl.com/63filbnm

* Replicate DAG's behavior

Some callers expect RunAsync to block, replicate that behavior in case of
explicit 'dag' net type

* [dper] layernorm layer

as title

* Override dag, async_dag, async_polling

Overriding dag, async_dag and async_polling with async_scheduling

* Name the thread pools

Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead.
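
The same idea in Python, shown only as an analogy to the C++ change: the standard library's `ThreadPoolExecutor` accepts a `thread_name_prefix`, so worker threads get an explicit name. The prefix `c2_worker` is a made-up example value.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# "c2_worker" is a hypothetical prefix chosen for this example.
with ThreadPoolExecutor(max_workers=2, thread_name_prefix="c2_worker") as pool:
    # Ask a worker thread to report its own name.
    worker_name = pool.submit(lambda: threading.current_thread().name).result()
```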

* [Caffe2] FillerOp should support int64_t dimensions

Change argument type to int64_t for shape argument of FillerOp (used in ConstantFill, XavierFill, etc)
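
A short sketch of why a 32-bit shape argument is limiting. The helper names and the example sizes are assumptions for illustration; Python integers are unbounded, so the check against the int32 range is explicit.

```python
INT32_MAX = 2**31 - 1


def num_elements(shape):
    """Product of the dimensions, computed with Python's unbounded integers."""
    total = 1
    for dim in shape:
        total *= dim
    return total


def fits_in_int32(value):
    """True if value is representable as a signed 32-bit integer."""
    return -2**31 <= value <= INT32_MAX


big_dim = 3_000_000_000                # a single dimension above int32 max
count = num_elements([65536, 65536])   # 2**32 elements in total
```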

* Remove caffe2/caffe2/contrib/torch/

It's not used anywhere and depends on old lua torch that conflicts with Aten. Given PT1 it's not relevant any more (though it was nice and clever code!)

#accept2ship

* Fix linearWarmup multiplier check

The multiplier needs to be non-negative, not strictly positive.

* Revert D3314316

This is after 2 years and we do not seem to have a use case for this one, so
for the sake of clean API design we should potentially remove this. This would
allow us to potentially pass in arguments to optionally construct an object,
although it is indeed a little bit unclear how we can reuse existing objects if
constructor arguments are passed in. In any case, we may want to remove this
dangling feature.

* Speedup generate proposals by partial_sort.

Speedup generate proposals by partial_sort.

FACEBOOK:
- Saw speed improvement for training with this op.
- Yanghan benchmarked the op on a small dataset and saw a consistent 100% improvement in speed (6ms -> 3ms) at 420 input resolution. See next diff for details.
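
For readers unfamiliar with partial sorting, here is a Python analogy. The change itself is in C++; `heapq.nlargest` stands in for a partial sort, and the scores are made-up values. The point is that selecting the top k does not require ordering all n elements.

```python
import heapq


def top_k_full_sort(scores, k):
    """Sort everything, then keep the first k (O(n log n))."""
    return sorted(scores, reverse=True)[:k]


def top_k_partial(scores, k):
    """Keep only the k largest using a bounded heap (roughly O(n log k))."""
    return heapq.nlargest(k, scores)


scores = [0.12, 0.98, 0.45, 0.77, 0.03, 0.61]
best = top_k_partial(scores, 3)
```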

* More parallel processing friendly for CPP version of GenerateProposals.

More parallel processing friendly for CPP version of GenerateProposals.

* [DT] [43/n] Lift stop conditions inside reader code back to flow control

1. Split multi_reader function into local_reader and remote_reader
2. Lifted stop conditions inside Limiter back to flow control
3. Split epoch flow building logic into 3 cases:
  - single machine (1 reader, 1 trainer on trainer0 node, no PS)
  - (1 reader + 1 trainer) on trainer0 node, has PS
  - multiple readers, readers do not share nodes with trainers, might have PS or not

* Resolve conflicts for torch/_thnn/utils.py

* [Caffe2] Handle image decoding errors

Image decoding errors can make the whole training fail. This diff is to handle them
1. Catch imdecode exceptions and check whether the decoded image has zero columns or rows. Either case is counted as a decoding error.
2. Replace the image with an empty one in case of error.
3. Count the number of errors and throw a runtime exception if the error rate reaches the given limit.

The empty image data is kept. It might introduce noise in the training data.
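
The three steps above can be sketched in Python as follows. This is an illustration under assumptions, not the operator's code: `decode_batch`, `toy_decode`, and the way the error rate is measured (errors divided by images seen so far) are all hypothetical.

```python
def decode_batch(raw_images, decode, max_error_rate):
    """Decode images, substituting an empty image when decoding fails.

    `decode` is a caller-supplied function returning a list of pixel rows.
    Raises RuntimeError once the running error rate reaches max_error_rate.
    """
    decoded, errors = [], 0
    for seen, raw in enumerate(raw_images, start=1):
        try:
            image = decode(raw)
            # Zero rows or zero columns also counts as a decoding error.
            if not image or not image[0]:
                raise ValueError("empty image")
        except Exception:
            errors += 1
            image = []  # the empty placeholder is kept in the output
            if errors / seen >= max_error_rate:
                raise RuntimeError("too many image decoding errors")
        decoded.append(image)
    return decoded, errors


def toy_decode(raw):
    # Hypothetical decoder: b"bad" simulates a corrupt file.
    if raw == b"bad":
        raise ValueError("corrupt")
    return [[0, 0], [0, 0]]


images, error_count = decode_batch([b"ok", b"ok", b"bad", b"ok"], toy_decode, 0.5)
```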

* Update MKL exporter to IDEEP ops

TSIA

* [Caffe2] GlobalInit is thread safe, fixing the comment

With the mutex and lock, GlobalInit is thread safe.
Update the comments.

* Back out "Add support for generating ATen files during fbcode build"

Original commit changeset: 28970ddba353

@override-unit-failures
(Note: this ignores all push blocking failures!)

* [DT]: fix predictor save

similar to D6610058, here we add the fix for distributed online training

* Remove net_singlethread_async_gpu.cc

Closes https://github.com/caffe2/caffe2/pull/2528

This removes net_singlethread_async_gpu.cc as part of our effort to clean
CUDAContext and the net executors.

* Inline DFS task execution

Add a DFS inline task execution mode in executor

* Add c10 folder to fbcode

This adds the c10 folder and its test cases to fbcode. Build flags are mostly taken from aten.

* add dependencies for online trainer

Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators

Relevant post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/

* Resolve conflicts for tools/jit/gen_jit_dispatch.py

* [Fix] sparse regularization in distributed training

* Support advanced pooling options in sum processor

* support advanced pooling options in sum processor
* remove redundant code
* support attention in sum processor

* Improve shard logging in net tracing code

Make it handle arbitrary shard ids instead of just one digit ids.

* [Caffe2] Call GlobalInit in predictor only in mobile

FACEBOOK:
Calling GlobalInit long after the program starts may not be safe. There are issues if the following happens:

User does not call GlobalInit and initFacebook after program starts
User sets a flag manually: https://fburl.com/mcsumw7d
User calls OSS predictor.
OSS predictor calls GlobalInit
GlobalInit calls initFacebook
initFacebook resets all flags: https://fburl.com/tolszha1
Thus, the flags the user set manually are overwritten

This would happen anytime GlobalInit is called long after the program starts.
I suppose the intention of the user in this case is not to call GlobalInit throughout the program,
but use Caffe2 regardless (is that desired?)
But adding GlobalInit in the OSS predictor would automatically call GlobalInit when using Caffe2.

This issue doesn't exist in mobile, since initFacebook is not called on mobile.

For now, guard the GlobalInit in predictor for mobile only.
May want to ensure the GlobalInit is always called at the start of the program. @[3501714:kutta] has seen weird issues when not calling GlobalInit at the start of the program on server side. He has made some progress on this.

* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py

* Add empty fix for SumLikeReduceOp

Add empty fix for SumLikeReduceOp

* Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN

This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Remove Declarations.yaml

* Include common.h

* Change std::stoi to caffe2::stoi

* Add thread_name.cc to the CMake file

* No need to subtract 1. Fix test segfaults

* Fix NetTest, ObserverTest

Fix tests

(cherry picked from commit 3767e66c3f365596cba3d46d3e7322c933a0ab41)

* CTCGreedyDecoderOp only has CPU implementation, test should only run on CPU

* Add a variable to avoid conversion resizing issue

* Remove the code per soumith's comments

* Remove the code per soumith's comments

* Remove blank lines in the end of file

* Resolve conflicts for torch/_thnn/utils.py

* Update MKL exporter to IDEEP ops

TSIA

* Back out "Add support for generating ATen files during fbcode build"

Original commit changeset: 28970ddba353

@override-unit-failures
(Note: this ignores all push blocking failures!)

* add dependencies for online trainer

Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators

Relevant post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/

* Resolve conflicts for tools/jit/gen_jit_dispatch.py

* Support advanced pooling options in sum processor

* support advanced pooling options in sum processor
* remove redundant code
* support attention in sum processor

* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py

* Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN

This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Remove Declarations.yaml

* Include common.h

* Change std::stoi to caffe2::stoi

* [caffe2] upgrade IDEEP and hotfix for conv op accuracy issue (#8364)

* [IDEEP] Upgrade IDEEP version

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* [IDEEP] Fix accuracy issue in conv op

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix build error due to lack of src in CMakeLists

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Remove the code per soumith's comments

* [ONNX] Add an ATen fallback pathway for ONNX export (#8273)

* ATen fallback for ONNX export

* Move to enum

* Fix model test

* Add comment

* Address comments

BC interface

* Remove imaginary file (#8415)

* [Caffe2] Enable AMD/MIOPEN ops for Caffe2  (#8306)

* Add hip support for caffe2 core

* Add MIOPEN header/wrapper to caffe2 core

* Add HIP device into caffe2 PB

* top level makefile change for rocm/hip

* makefile scaffolding for AMD/RocM/HIP

* Makefile scaffolding for AMD/RocM/HIP; add makefile/utility for HIP files

* caffe2 PB update for AMD/ROCM HIP device

* Add AMD/RocM/Thrust dependency

* HIP threadpool update

* Fix makefile macro

* makefile fix: duplicate test/binary name

* makefile clean-up

* makefile clean-up

* add HIP operator registry

* add utilities for hip device

* Add USE_HIP to config summary

* makefile fix for BUILD_TEST

* merge latest

* Fix indentation

* code clean-up

* Guard builds without HIP and use the same cmake script as PyTorch to find HIP

* Setup rocm environment variables in build.sh (ideally should be done in the docker images)

* setup locale

* set HIP_PLATFORM

* Revert "set HIP_PLATFORM"

This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.

* continue the build script environment variables mess

* HCC_AMDGPU_TARGET

* Cleanup the mess, has been fixed in the latest docker images

* Assign protobuf field hip_gpu_id a new field number for backward compatibility

* change name to avoid conflict

* Fix duplicated thread pool flag

* Refactor cmake files to not add hip includes and libs globally

* Fix the wrong usage of environment variables detection in cmake

* Add MIOPEN CNN operators

* Revert "Add MIOPEN CNN operators"

This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.

* Add MIOPEN pooling operator

* Add MIOPEN activation operator

* Add MIOPEN softmax operator

* Add MIOPEN spatial batch norm operator

* Add MIOPEN local response normalization operator

* Add MIOPEN conv operator

* Clean-up LRN ops

* enable fp16 in MIOPEN pool ops

* Enable fp16 for MIOPEN relu op

* Enable fp16 for MIOPEN spatial batch norm op

* code clean-up

* revert float16 support

* Create Caffe2 python binding for AMD/ROCM/HIP

* Add op fallback for HIP operator

* add hip src/test files in cmake

* exclude hip src/test files

* fix python binding for hip backend

* fix MIOPEN pooling op workspace

* hack to compile miopen operators

* fix include path for MIOPEN ops

* Fix include path

* Add HIP math utilities

* Fix path for HIP math utils

* cmake fix

* Cmake fix / hipcc for hip files

* suppress hipcc warning

* cmake fix / replace USE_HIP with USE_ROCM

* revert LoadHIP.cmake change

* fix include for thrust/cub-hip

* include path fix for conversion.h

* Updated with latest upstream changes

* clang format fixes

* Context_hip updates

* Fixed typo in rocblas handle get function

* Updated hipified math utils

* Updated math hip test util

* Updated context hip test

* Updated common_hip

* Updated net async dag for HIP

* Added MIOPEN in operator hip test

* fix

* C2 dependencies clean-up

* fix include path for building custom protobuf

* Decouple miopen pool op and conv_pool_op base

* cmake refactor

* fix operator_hip_test

* move all hip/miopen ops files into caffe2/operators/hip

* sanitize cmake

* permission issue

* remove extra parenthesis

* remove artifact from resolving merge conflict

* cont. sanitize cmake files

* fix syntax error

* sanitize conversion.h

* .

* Revert "."

This reverts commit 56020cb0e996a31ae27bf1f8f491955ed0b121b9.

* clang-format

* Enable some reduce operators' ONNX backend tests (#8418)

* fix old comment to point to the right file (#8416)

* Stop pinning nccl version. (#8421)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Expose logsumexp docs and mark log_sum_exp in distributions for internal use (#8428)

* Enable some of the ONNX backend test on broadcasting (#8423)

* Enable some of the ONNX backend test on broadcasting

* enable gemm broadcast

* Expose proto utils and ONNX (#8073)

* Expose proto utils and ONNX from PyTorch libcaffe2.so

* Try to use protobuf from _C.so

* Fix ONNX proto header include

* Adjust order of imports for ONNX until nanopb goes away

* Set and use ONNX_NAMESPACE for PyTorch builds

* Show protobuf summary for all builds

* Add ONNX_NAMESPACE for cpp_build

* Statically link libprotobuf.a into libtorch.so

* Set ONNX_NAMESPACE on Windows build

* Move core/dispatch up as well

* Add /MD flag for Windows build of _C

* Potential Windows fix for ONNX and protobuf

* Add direct linkage from _C to ONNX on Windows

* Only include protobuf wrapper for PyTorch

* Pass extra_compile_args to _nvrtc ext build

* Remove installation of .a files

* Rebase creates some weird situations, revert them manually

* Remove more weird changes due to rebase

* Need to add thread_name.cc after merge
2018-06-13 13:10:45 -07:00
Orion Reblitz-Richardson
1d5780d42c Remove Apache headers from source.
* LICENSE file contains details, so removing from individual source files.
2018-03-27 13:10:18 -07:00
Paul Jesse Hellemn
74f0b270ea Fixing conda (#2123)
* Fixing conda

* Adding hypothesis and onnx to conda builds

* Updates but still not working

* Adding required changes to conda_full

* Updates

* Moving to more general build_anaconda script

* Adding check for gcc version

* Adding general ways to add/remove packages from meta.yaml?

* Changes for specific packages to build on gcc 5.4

* Fix with glog spec

* Requiring >numpy 1.12 for python 3 to satisfy opencv dependency

* Adding pydot to required testing packages

* Adding script to read conda versions for gcc ABI

* Trying to fix segfault by installing in env instead

* conda activate -> source activate

* Trying adding back leveldb

* Setting locale for ONNX + conda-search changed its format

* read_conda_versions handles libprotobuf

* Conda script updates

* Adding a protobuf-working test

* Removing changes to proto defs b/c they will require internal changes in a separate diff
2018-03-14 12:24:37 -07:00
Alexander Sidorov
60aa8c793d Update caffe2 from facebook (#2178)
* [C2] Don't crash kernel in case of invalid shapes for ConcatOp

Enforce correctness of the shapes for input tensors so we won't access invalid index.

* [Caffe2] Add analytical performance counters to Dynolog

Initial diff for counting analytical flops and memory writes for C2 operators.

* BBoxTransform op: Handle RoIs from multiple images per batch

BBoxTransform op used during typical Faster-RCNN inference operates only on
RoIs from a single image (no batching). Adding support to handle that with an
optional output blob containing the batch splits (i.e., the number of RoIs
belonging to each item in the batch). The code is perfectly backward compatible
and shouldn't break any existing models.

* [mkl] Make MKL-DNN cooperate with memongered nets

C2's MKL-DNN implementation caches input dims and reuses intermediate and
output buffers across net runs, which prevents memonger from being used. This
may not always be useful since input dims may vary widely in many cases and
we'll end up reallocating anyway. Added an option to force reallocation when
memonger is used.

* [oncall] fix batch gather ops for empty input

still need to bisect for the breaking change, but this shall fix the case for empty input.

the error logging is like: https://interncache-ftw.fbcdn.net/t49.3276-7/23938497_293562711176943_6500112636590424064_n.txt?_nc_log=1

@[557759185:raychen] can you help to subscribe oncall from ads side. this may affect the Sigrid online trainer.

* optimize BatchOneHotOp

We want to iterate in row-major as opposed to column-major for better
locality.

* Supported exporting model with int blobs.

Supported exporting model with int blobs. Needed by condensenet.

* BoxWithNMSLimit op: Handle boxes from multiple images per batch

Similar to D7135360. Added support for multiple images per batch in the op.
Takes an optional additional input "batch_splits" as output by BBoxTransform
op, and returns new batch_splits after applying NMS and filtering. Otherwise,
backward compatibility is maintained.
2018-03-07 16:41:22 -08:00
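The ConcatOp item above enforces that input shapes are consistent before any index is accessed. The following is a minimal Python sketch of that kind of check, not the operator's actual C++ code; the helper name `concat_output_shape` and its error messages are hypothetical.

```python
def concat_output_shape(shapes, axis):
    """Validate input shapes for a concat along `axis` and return the output shape.

    Hypothetical helper for illustration only: every input must have the same
    rank, and all dimensions except `axis` must match.
    """
    if not shapes:
        raise ValueError("concat needs at least one input")
    first = list(shapes[0])
    if not 0 <= axis < len(first):
        raise ValueError("axis {} out of range for rank {}".format(axis, len(first)))
    total = 0
    for shape in shapes:
        if len(shape) != len(first):
            raise ValueError("rank mismatch: {} vs {}".format(shape, tuple(first)))
        for dim, (got, expected) in enumerate(zip(shape, first)):
            if dim != axis and got != expected:
                raise ValueError("dimension {} mismatch: {} vs {}".format(dim, got, expected))
        total += shape[axis]
    out = list(first)
    out[axis] = total
    return tuple(out)
```

Rejecting mismatched shapes up front turns what would be an out-of-bounds access into an ordinary error the caller can handle.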
Andrey Malevich
16cd3f4a9e Don't allow to export models where parameters are inputs/outputs
Summary:
Without this check, it is too easy to export a model that overrides its params
in the predictor.

Reviewed By: rayleichen

Differential Revision: D6984506

fbshipit-source-id: 9bbf375758686c6ad12ad071723f255363e98ae6
2018-02-14 23:54:42 -08:00
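The rule in the commit above (parameters must not also be inputs or outputs) can be sketched as a set-intersection check. This is an illustrative sketch only; the function name `check_params_not_io` and the blob names are hypothetical and are not taken from the exporter's code.

```python
def check_params_not_io(parameters, inputs, outputs):
    """Raise ValueError if any parameter blob is also listed as an input or output.

    Hypothetical helper: all three arguments are iterables of blob names.
    """
    params = set(parameters)
    clashes = sorted((params & set(inputs)) | (params & set(outputs)))
    if clashes:
        raise ValueError(
            "parameters must not be inputs/outputs: {}".format(", ".join(clashes))
        )


# A consistent export: parameters, inputs and outputs are disjoint.
check_params_not_io(["fc_w", "fc_b"], ["data"], ["prediction"])
```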
Junjie Bai
b11ba65204 Experimental support for setup.py develop mode install
Summary:
`python setup.py develop` / `pip install -e .`
Closes https://github.com/caffe2/caffe2/pull/1926

Reviewed By: orionr

Differential Revision: D6951780

Pulled By: bddppq

fbshipit-source-id: 01249cbca90ec5326ea4107d4e500ae95a9dbd7b
2018-02-12 23:36:18 -08:00
Jesse Hellemn
1c005602fc Adding model_id argument to nets in predictor_container when modelInfo exists
Summary: Copying model_id from metaNetDef_->modelInfo in PredictorContainer for dper models. Since these model_ids are strings of the form <model_id>_<snapshot_id>, they are now stored as strings in net_observer

Reviewed By: salexspb

Differential Revision: D6752448

fbshipit-source-id: 93c91950b44c012e57240aaf909bc961449cfd7c
2018-02-12 10:38:58 -08:00
Jesse Hellemn
52600f8607 Record workflow run id for inference.
Reviewed By: salexspb

Differential Revision: D6094757

fbshipit-source-id: d8761749e8eb080f50fb08a37431e8a987d0a2db
2017-12-18 15:33:19 -08:00
Davin Wang
f2be3a4e5e Allow specifying device to prepare_prediction_net()
Summary:
This is a supplement to commit ce8267d425444f60ae650389fb41838847a44a5e. It allows specifying a device to prepare_prediction_net() so the prediction extractor can work with GPU.
Closes https://github.com/caffe2/caffe2/pull/1035

Differential Revision: D6467420

Pulled By: salexspb

fbshipit-source-id: b5b9a1536fb516e90b5e4b615403086943cfbe93
2017-12-03 10:32:08 -08:00
Andrew Tulloch
7244d27220 Add an EmptyDeviceScope (i.e. allow setting CurrentDeviceScope() to None)
Summary:
See comments for where this can be useful (disabling the
OperatorDef::DeviceOption(...) so we can control the scope at the
NetDef::DeviceOption(...) level).

Reviewed By: viswanathgs

Differential Revision: D6103412

fbshipit-source-id: 75a9be54275760132f6d1e71acbe9190e7099289
2017-11-02 11:25:48 -07:00
Junjie Bai
43b303bfc0 Expose Predictor::run_map to Python
Reviewed By: jerryzh168

Differential Revision: D6087316

fbshipit-source-id: d90e20429645391f17f0c56c8a8a60685097f801
2017-10-18 19:32:56 -07:00
Yangqing Jia
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
James Reed
9aed89ac88 Allow specification of num_workers in PredictorExportMeta and enable for NMT beam search model
Summary:
The predictor export functions allowed a way to specify a net type, but no way to specify num_workers for when you use net type 'dag'. This adds that option to the PredictorExportMeta named tuple and populates the field in the exported protobuf. Also added parameters to callsites in NMT ensemble model class and model repackager to populate net_type and num_workers.

Using DAGNet for our base predictor net (not recurrent stepnets) speeds up our inference by 1.15x, since we can now run encoder forward and backward RecurrentNet's for each model in the ensemble in parallel.

Reviewed By: salexspb

Differential Revision: D5792203

fbshipit-source-id: cb9a8237a0cbe1a09645d4de051dfbb23f06dcfa
2017-09-07 22:48:45 -07:00
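The commit above adds an optional num_workers field to a named tuple and copies it into the exported net only when set. A minimal sketch of that pattern follows; `ExportMetaSketch` and `populated_net_options` are hypothetical names, and the field list is a reduced stand-in rather than the real PredictorExportMeta definition.

```python
import collections

_ExportMetaBase = collections.namedtuple(
    "ExportMetaSketch",
    ["predict_net", "parameters", "inputs", "outputs", "net_type", "num_workers"],
)


class ExportMetaSketch(_ExportMetaBase):
    """Named tuple with optional net_type and num_workers (both default to None)."""

    def __new__(cls, predict_net, parameters, inputs, outputs,
                net_type=None, num_workers=None):
        return super().__new__(
            cls, predict_net, parameters, inputs, outputs, net_type, num_workers
        )


def populated_net_options(meta):
    """Return only the net options the caller actually set."""
    options = {}
    if meta.net_type is not None:
        options["type"] = meta.net_type
    if meta.num_workers is not None:
        options["num_workers"] = meta.num_workers
    return options


meta = ExportMetaSketch("predict_net", ["w"], ["x"], ["y"],
                        net_type="dag", num_workers=4)
print(populated_net_options(meta))
```

Defaulting the new fields to None keeps existing call sites working, which is the usual reason to extend a named tuple this way.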
Priya Goyal
ca3f2f9e6a Small fix to exporter to accept net/NetDef both
Reviewed By: bwasti

Differential Revision: D5753261

fbshipit-source-id: 55b9252606023648ee3b2acdcbbe89bcc8b54748
2017-09-01 13:32:12 -07:00
Thomas Dudziak
5355634dac Dict fixes/improvements and unittest targets for Python 3 in caffe2 core
Summary: As title

Reviewed By: salexspb

Differential Revision: D5316104

fbshipit-source-id: aee43819d817842e5ce6ba3d045a55b1a2491c30
2017-06-29 17:05:41 -07:00
Aapo Kyrola
46a95cf420 Allow specifying device to load_from_db()
Summary: A quite common problem is that it is hard to load blobs with pe.load_from_db to a specific device. One must set the device options of the returned init_net and predict_init_net, which is quite magical. So I made load_from_db() able to set these device options automatically, based on device scope or device_option parameter. Added a unit test.

Reviewed By: asaadaldien

Differential Revision: D5249202

fbshipit-source-id: 7b9d91476cb8d1b0ec0d9772e50b9148b8b184fa
2017-06-14 14:32:24 -07:00
Thomas Dudziak
b877d4b5f8 Misc fixes for Python 3
Summary: As title

Differential Revision: D5216942

fbshipit-source-id: def5563f1b259efefab3a829d8a78d8d3297ffc7
2017-06-13 12:18:43 -07:00
Fedor Borisyuk
686470a6b8 Feature importance in dper 2.0: build network representation
Summary: Changes to enable feature importance.

Reviewed By: kennyhorror

Differential Revision: D5075252

fbshipit-source-id: e5d46e129bcd5cbef77932c63b5a288dd57775d1
2017-06-05 18:03:34 -07:00
Andrey Malevich
aa59b217a9 Relax requirement on the outputs of the predictor.
Summary: This requirement looks a bit too restrictive. Let's remove it.

Reviewed By: volkhin

Differential Revision: D5150968

fbshipit-source-id: 9e38574edc6542c5ce3c7f25a01afe8f5ff9b507
2017-05-30 17:23:18 -07:00
Thomas Dudziak
47e921ba49 Remove map() and filter() in favor of comprehensions
Summary: These return lazy iterators in Python 3, so many of the usages currently present in Caffe2 would not do anything. This diff simply removes (almost) all usages of these two in Caffe2 and sub projects in favor of comprehensions, which are also easier to read/understand

Reviewed By: akyrola

Differential Revision: D5142049

fbshipit-source-id: e800631d2df7d0823fed698cae46c486038007dc
2017-05-30 15:32:58 -07:00
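The Python 3 behavior behind the commit above can be shown with a short, self-contained sketch. The function `record` and the sample data are illustrative and are not taken from the Caffe2 sources.

```python
seen = []


def record(x):
    """Side-effecting function: remembers every value it is called with."""
    seen.append(x)
    return x * 2


# In Python 3, map() returns a lazy iterator: record() is not called here.
lazy = map(record, [1, 2, 3])
calls_before_consumption = len(seen)

# A list comprehension runs immediately, so the side effects happen now.
doubled = [record(x) for x in [1, 2, 3]]

# Equivalent of filter(lambda x: x % 4 == 0, doubled), evaluated eagerly.
multiples_of_four = [x for x in doubled if x % 4 == 0]
```

Because `lazy` is never consumed, `record` runs only for the comprehension; code that called `map()` purely for its side effects silently stops working under Python 3.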
Andrey Malevich
6c12df3003 Fix export of SparseToDense layer.
Summary:
If two SparseToDense layers densify the same IdList feature, we might export
an invalid input for the prediction in the input specs. This diff changes the
behavior to use an Alias to a new blob instead of passing things directly.

Reviewed By: dzhulgakov

Differential Revision: D5093754

fbshipit-source-id: ef4fa4ac3722331d6e72716bd0c6363b3a629cf7
2017-05-25 21:46:28 -07:00
Viswanath Sivakumar
152d439400 Allow specifying net type in predictor_exporter
Summary:
predictor_exporter copies the original predict_net's op, external_input and
external_output fields, but ignores the type field. This is reasonable as the
train net would generally have 'dag' type and copying that for inference may
not be applicable. It's good to have a way to specify the net type nevertheless
to run DAGNet for inference. This diff adds a field in predictor_exporter to do
that.

Reviewed By: akyrola

Differential Revision: D5122354

fbshipit-source-id: 0e3cc417128db903c71515135c9e3b87620ae21e
2017-05-24 11:46:27 -07:00
Bram Wasti
c55be38e63 Added mobile exporter
Summary: Basically takes in a live net and creates an init_net and predict_net which can be written to file and run in Predictor

Reviewed By: salexspb

Differential Revision: D4989425

fbshipit-source-id: 8052065da9ed763d48bd9e1e19f7697ef60a2829
2017-05-24 11:36:44 -07:00
Aapo Kyrola
6384bae29b call save_to_db in CPUContext + fix a typo in data_parallel_model.
Summary:
If Predictor Exporter save_to_db is called in CUDAContext, a failure occurs since the following FeedBlob() tries to store a string (meta data), but for CUDA blobs we assume they are tensors.
  + fix a typo in data_parallel_model that I bumped into.

Reviewed By: asaadaldien

Differential Revision: D5099837

fbshipit-source-id: 69d01b35a9a1816bf083f13d8a6ce88e1f5aecb7
2017-05-19 18:25:00 -07:00