Commit Graph

1944 Commits

Author SHA1 Message Date
Di Yu
f3d72b2101 Modify barrier net to allow better control over its initialization and execution in DPM (#9665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9665

In data_parallel_model, we isolate the synchronizing barrier init net into its own net, separate from the param_init_net, so that we have finer-grained control over the barrier net.

Reviewed By: andrewwdye

Differential Revision: D8375389

fbshipit-source-id: ce0c8c1c8e4bd82b7078a1b07abaced3f149d578
2018-07-22 00:23:47 -07:00
Xiaomeng Yang
a01d6f01b5 Update channel_shuffle_op and transpose 2d to speed up ShuffleNet (#9525)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9525

Update channel_shuffle_op and transpose 2d to speed up ShuffleNet

Reviewed By: houseroad

Differential Revision: D8889361

fbshipit-source-id: 60196e819b6842becc53b4859b62d4419a0e2c6e
2018-07-21 12:54:33 -07:00
Yinghai Lu
45e5c17ecf ONNXIFI transform (#9569)
Summary:
Cut off the runnable subgraph and offload it to the ONNXIFI backend
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9569

Reviewed By: Maratyszcza

Differential Revision: D8930408

Pulled By: yinghai

fbshipit-source-id: 2b494f7f8dc10c00e58cf0fed5c4a9434be6155b
2018-07-20 15:09:59 -07:00
Kittipat Virochsiri
01581037dc Add workspace.RunPlanInBackground (#9637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9637

Adding a method to run a plan in the background. The intended use is to run BlueWhale's data reading & preprocessing net in the background while the GPU is training.

Reviewed By: MisterTea

Differential Revision: D8906439

fbshipit-source-id: b1c73ca7327e2d87a8f873924e05ab3d161a3f1e
2018-07-20 14:56:12 -07:00
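For readers unfamiliar with Caffe2 plans, here is a minimal usage sketch of the pattern this commit enables. The nets, the plan, and the assumption that the new method accepts a `core.Plan` and returns immediately are illustrative, not taken from this PR.

```
# Hypothetical sketch: run a small data-reading plan in the background while a
# training net runs in the foreground. The exact signature and return value of
# RunPlanInBackground are assumptions here.
from caffe2.python import core, workspace

reader_net = core.Net("reader_net")
reader_net.ConstantFill([], ["batch"], shape=[16, 8], value=1.0)  # stand-in for data reading

train_net = core.Net("train_net")
train_net.ConstantFill([], ["loss"], shape=[1], value=0.0)        # stand-in for training work

plan = core.Plan("background_reader")
plan.AddStep(core.execution_step("read_step", reader_net, num_iter=100))

workspace.RunPlanInBackground(plan)   # assumed to return immediately
workspace.RunNetOnce(train_net)       # foreground work proceeds concurrently
```
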
Kittipat Virochsiri
8a0fe0a588 set_input_record() should always add external input (#9636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9636

Make sure that the blobs are registered to the net

Reviewed By: pjh5

Differential Revision: D8924883

fbshipit-source-id: f09422a2d4d5ba8bf6cfbfd00172097b5ab1fcd6
2018-07-20 11:55:37 -07:00
Zhaoheng Ni
a3a6ab60cd Fix the error in UnpackSegmentsOp when calculating the gradient with "max_length" argument (#9598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9598

The "max_length" should be passed to UnPackSegmentsOp if "max_length" is given when calling PackSegmentsOp.

Reviewed By: jerryzh168

Differential Revision: D8919799

fbshipit-source-id: 8c97aa717b69177b8a5d5d56892817d488853840
2018-07-20 11:09:34 -07:00
Junjie Bai
f521823b7b Do not always set broadcast argument when exporting new onnx add and sub to caffe2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9597

Reviewed By: colesbury

Differential Revision: D8920575

Pulled By: bddppq

fbshipit-source-id: 97423e1bf6a20559d466d2ac56c9e74e10bfc129
2018-07-19 14:10:05 -07:00
Zhishuai Zhang
6557856671 Fix l2 normalization when handling zero vector (#9594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9594

When the input vector is a zero vector, the previous GPU code produced NaN in the backward pass. We fix this.

Reviewed By: pjh5

Differential Revision: D8849732

fbshipit-source-id: 87b1fb1ee05dfdb0d43bcbe67e36f15896fe1706
2018-07-19 14:10:03 -07:00
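As an aside, a tiny numpy illustration (not the op's CUDA kernel) of why a zero vector is problematic and the usual epsilon guard that avoids it:

```
import numpy as np

def l2_normalize(x, eps=1e-12):
    norm = np.sqrt((x * x).sum())
    return x / max(norm, eps)      # guard keeps a zero vector at zero instead of 0/0

x = np.zeros(4, dtype=np.float32)
print(x / np.sqrt((x * x).sum()))  # naive version: [nan nan nan nan]
print(l2_normalize(x))             # guarded version: [0. 0. 0. 0.]
```
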
Peter Yeh
54db14e390 HIP Operators Generator--> HipOpG (#9322)
Summary:
The goal of this PR is to add infrastructure to convert (hipify) CUDA ops into [HIP](https://github.com/ROCm-Developer-Tools/HIP) ops at **compile** time.

Note that HIP ops, which are portable C++ code, can run on both AMD and NVIDIA platforms.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9322

Differential Revision: D8884707

Pulled By: bddppq

fbshipit-source-id: dabc6319546002c308c10528238e6684f7aef0f8
2018-07-19 00:26:06 -07:00
Xiaomeng Yang
ca3b36aa6a Add implementation for batch_moments_op (#9510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9510

Add implementation for batch_moments_op

Reviewed By: houseroad

Differential Revision: D8587654

fbshipit-source-id: d20f52cc8e900716c1057e68c147258dfda5245b
2018-07-18 11:59:54 -07:00
Keren Zhou
8c741b7c4f Add transformation from caffe2::resizeop to onnx::upsample
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9511

Reviewed By: hlu1

Differential Revision: D8876692

fbshipit-source-id: 9ba346e225cfbc686d370134fe41a28333b933cc
2018-07-18 11:59:52 -07:00
Artem Volkhin
b6b6e1b39f Fix core.Plan.create_from_proto (#9438)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9438

The current implementation of create_from_proto doesn't work as expected: it
duplicates networks and execution steps by copying the original PlanDef first and
then adding each step one by one.

Reviewed By: pjh5

Differential Revision: D8850316

fbshipit-source-id: 9b02836d6e6ee1c91cfdd3b4c4804f14137dc22b
2018-07-18 10:55:55 -07:00
Huamin Li
13e0c9295d Add Support for count_include_pad in AveragePool in Caffe2 ONNX Backend (#9458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9458

The goal is to support count_include_pad in the Caffe2 ONNX backend. This commit contains the first step: support for 4-D tensor cases.
AveragePool with count_include_pad can be expressed as PadImage + AveragePool.

Reviewed By: houseroad

Differential Revision: D8852180

fbshipit-source-id: 4db00e9771be7a000a2d92850dfd066d9c9c38bf
2018-07-17 17:41:52 -07:00
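For illustration, a numpy sketch of the identity this relies on (not the operator code; kernel, stride, and pad values are arbitrary): average pooling an NCHW tensor with count_include_pad equals zero-padding the input first and then average pooling with no padding.

```
import numpy as np

def avg_pool2d(x, k, stride, pad, count_include_pad):
    n, c, h, w = x.shape
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode="constant")
    oh = (h + 2 * pad - k) // stride + 1
    ow = (w + 2 * pad - k) // stride + 1
    out = np.zeros((n, c, oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            win = xp[:, :, i*stride:i*stride+k, j*stride:j*stride+k]
            if count_include_pad:
                out[:, :, i, j] = win.mean(axis=(2, 3))
            else:
                # count only the positions that fall inside the original image
                h0, w0 = i*stride - pad, j*stride - pad
                cnt = (min(h0+k, h) - max(h0, 0)) * (min(w0+k, w) - max(w0, 0))
                out[:, :, i, j] = win.sum(axis=(2, 3)) / cnt
    return out

x = np.random.rand(1, 1, 4, 4).astype(np.float32)
a = avg_pool2d(x, k=3, stride=1, pad=1, count_include_pad=True)
padded = np.pad(x, ((0, 0), (0, 0), (1, 1), (1, 1)), mode="constant")
b = avg_pool2d(padded, k=3, stride=1, pad=0, count_include_pad=False)
assert np.allclose(a, b)   # PadImage + AveragePool reproduces count_include_pad
```
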
Lin Li
0fe980c748 Memory usage measurement -- Caffe2 (#9017)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9017

Closes https://github.com/pytorch/pytorch/pull/9017

Added "get_blob_size_bytes" to "pybind_state.cc" in Caffe2 to expose the size of a blob in bytes.

Reviewed By: kuttas

Differential Revision: D8685696

fbshipit-source-id: 9a9d38f207c8c59ef534217181e8ce1514617628
2018-07-17 16:40:23 -07:00
Junjie Bai
30f849cdc5 Correct model name in caffe2 onnx backend tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9485

Reviewed By: houseroad

Differential Revision: D8873733

Pulled By: bddppq

fbshipit-source-id: 3a3cc351834cbbedce360760504ea16f5fa0ea06
2018-07-17 11:41:01 -07:00
Gu, Jinghui
e8b8c3895e Enable Conv fusion optimizations in optimizeForIdeep (#9255)
Summary:
Enable fusion for IDEEP in optimizeForIdeep,
including Conv+ReLU, Conv+Sum, Conv+Sum+ReLU, and Conv+BN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9255

Reviewed By: bddppq

Differential Revision: D8809030

Pulled By: yinghai

fbshipit-source-id: af30bad3b96cb965bd26a4dfa810370faec4bb88
2018-07-16 21:28:50 -07:00
Viswanath Sivakumar
9235ff53f1 Clip horizontal bounding boxes during rotated detection for backward compatibility (#9403)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9403

In the BBoxTransform and GenerateProposals ops, clip_boxes makes sure the bbox fits
within the image. For rotated boxes, this doesn't always make sense, as there
could be multiple ways to clip a rotated box within an image boundary.
Moreover, clipping to a horizontal box means we potentially leave out pixels of
interest. Therefore, we clip only boxes with angle almost equal to 0 (with a
specified `angle_thresh` tolerance).

Reviewed By: pjh5

Differential Revision: D8828588

fbshipit-source-id: 39c1eafdb5d39d383780faa0a47e76149145e50c
2018-07-16 20:24:49 -07:00
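An illustrative numpy sketch of the rule described above, assuming boxes are given as [x1, y1, x2, y2, angle_deg] rows (the actual op uses its own box representation and thresholds):

```
import numpy as np

def clip_near_horizontal_boxes(boxes, img_h, img_w, angle_thresh=1.0):
    # boxes: (N, 5) rows assumed to be [x1, y1, x2, y2, angle_deg]
    out = boxes.copy()
    keep = np.abs(out[:, 4]) <= angle_thresh   # only near-horizontal boxes get clipped
    out[keep, 0] = np.clip(out[keep, 0], 0, img_w - 1)
    out[keep, 1] = np.clip(out[keep, 1], 0, img_h - 1)
    out[keep, 2] = np.clip(out[keep, 2], 0, img_w - 1)
    out[keep, 3] = np.clip(out[keep, 3], 0, img_h - 1)
    return out

boxes = np.array([[-5.0, 10.0, 120.0, 90.0,  0.0],    # clipped to the image
                  [-5.0, 10.0, 120.0, 90.0, 45.0]])   # rotated box left untouched
print(clip_near_horizontal_boxes(boxes, img_h=100, img_w=100))
```
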
Yinghai Lu
45140368c3 Update onnx-tensorrt module to the latest (#9469)
Summary:
Update onnx-tensorrt to follow up on recent changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9469

Reviewed By: Maratyszcza

Differential Revision: D8866704

Pulled By: yinghai

fbshipit-source-id: 3b96ec2fa28470f0d4b5a7c62ab332eeba4bdb12
2018-07-16 17:10:16 -07:00
Mark Richardson
88146484b4 Add support for .norm() pytorch onnx export and ReduceL1/ReduceL2 caffe2 operators (#9299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9299

ONNX has ReduceL1 and ReduceL2 operators that facilitate this, so we allow PyTorch to export them and allow Caffe2 to run them.

I have only implemented this on CPU so far.

Reviewed By: pjh5

Differential Revision: D8757381

fbshipit-source-id: 68afc9e2f90042a70929b73ace05a499b5c670c7
2018-07-14 10:54:13 -07:00
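A hedged sketch of the user-facing flow this enables; the file name, shapes, and the commented-out backend calls are illustrative:

```
import torch
import torch.nn as nn

class NormModel(nn.Module):
    def forward(self, x):
        return x.norm(p=2, dim=1)   # expected to export as onnx::ReduceL2 (p=1 -> ReduceL1)

model = NormModel()
dummy = torch.randn(4, 8)
torch.onnx.export(model, dummy, "norm_model.onnx")

# The exported graph could then be run with the Caffe2 ONNX backend (CPU only, per this PR):
#   import onnx
#   from caffe2.python.onnx import backend
#   rep = backend.prepare(onnx.load("norm_model.onnx"))
#   print(rep.run(dummy.numpy()))
```
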
Zhaoheng Ni
5ac8a80f8b Add BatchBucketizeOp in caffe2 (#9385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9385

The operator transforms dense features into sparse features by bucketizing. Only the features in the indices tensor are transformed and output.

Reviewed By: bddppq

Differential Revision: D8820351

fbshipit-source-id: a66cae546b870c6b2982ac20641f198334f2e853
2018-07-13 20:39:30 -07:00
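An illustrative numpy sketch of the behavior described above; the column indices and bucket boundaries are made up, and this is not the operator's code:

```
import numpy as np

dense = np.array([[0.1, 3.5, 7.0],
                  [2.0, 0.2, 4.4]], dtype=np.float32)
indices = [1, 2]                        # only these columns are bucketized
boundaries = {1: [1.0, 3.0], 2: [5.0]}  # per-feature bucket boundaries

sparse = np.stack(
    [np.digitize(dense[:, i], boundaries[i]) for i in indices], axis=1)
print(sparse)   # bucket ids for columns 1 and 2: [[2 1] [0 0]]
```
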
Jian Zhang
099a6d5e08 Implementation of Wngrad optimizer caffe2 python wrapper and unit test on least square regression (#9001)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9001

Closes https://github.com/pytorch/pytorch/pull/9001

We added a Caffe2 Python wrapper and a unit test for the Wngrad C++ operator.

Reviewed By: chocjy

Differential Revision: D8655724

fbshipit-source-id: fb259afd6fd50231691bd75c52852b20a1e1aec8
2018-07-13 18:54:52 -07:00
Jian Zhang
9e2f2cab94 Implementation and operator test for Wngrad optimizer (#8999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8999

Closes https://github.com/pytorch/pytorch/pull/8999

Implemented the Wngrad optimizer operator for the dense case (the base case as well as the case with additional outputs for the effective learning rate and update value) and the sparse case.

Reviewed By: pjh5

Differential Revision: D8627933

fbshipit-source-id: a63cde46c04bcc6b428ab5f77a4b3b2beb66c046
2018-07-13 18:11:41 -07:00
Xiaomeng Yang
bb9ff58c6d Add cudnn activation ops (#9379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9379

Add cudnn activation ops

Reviewed By: houseroad

Differential Revision: D8818013

fbshipit-source-id: d3881c634a46578b9331da07f9fdf7e1f31d7e8a
2018-07-12 23:18:56 -07:00
103yiran
117a5c3cc0 fix the annotation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9380

Differential Revision: D8821294

Pulled By: zou3519

fbshipit-source-id: b375cd0de9042bcaef1d22de104966fb704bd43e
2018-07-12 18:53:59 -07:00
Chenguang Xi
feaee21968 Plotting embeddings norm being slow in distributed training. (#9325)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9325

As title. Fix by calculating the norm on the same device.

Reviewed By: chocjy

Differential Revision: D8668136

fbshipit-source-id: 6671a1858da4b0a6f766f067b7fa648a072cd219
2018-07-12 11:51:23 -07:00
Akshay Chalana
e30ff68410 Add Hardtanh Export (#8804)
Summary:
Added Hardtanh CPU/GPU implementations and backend tests to Caffe2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8804

Reviewed By: bddppq

Differential Revision: D8813987

Pulled By: houseroad

fbshipit-source-id: 2480296eab3373425b9e1734a10c009b4f5d3e26
2018-07-11 18:09:51 -07:00
Lu Fang
1a8e826ed4 Skip the count_include_pad in average pool for now (#9365)
Summary:
Will create a bootcamp task.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9365

Reviewed By: bddppq

Differential Revision: D8813889

Pulled By: houseroad

fbshipit-source-id: bce1eaafd0efb3c27c0f71fcc40a8313e2b1c7b8
2018-07-11 18:09:50 -07:00
Yan Shang
8253947256 Make error message more informative (#9352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9352

I am debugging a failed workflow f61490672 and found the original error message to be uninformative.

Differential Revision: D8808181

fbshipit-source-id: 3f524ca092881186a492c5c0456124ce31d54751
2018-07-11 15:09:46 -07:00
Xiaomeng Yang
cbcf45274b Move tanh function to math (#9328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9328

Move tanh function to math

Reviewed By: houseroad

Differential Revision: D8794745

fbshipit-source-id: ea525dedde6f53592b06c2caffd6426688dea5fc
2018-07-11 13:59:50 -07:00
Yinghai Lu
80380f637c Fix to make ONNXIFI flow work (#9340)
Summary:
A small step to make the Relu test work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9340

Reviewed By: bddppq

Differential Revision: D8807018

Pulled By: yinghai

fbshipit-source-id: 429f3185e12afb12aaecfea8dd9595fdf838d356
2018-07-11 13:09:41 -07:00
Viswanath Sivakumar
c2dd90c40e Add angle normalization for rotated boxes (#9056)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9056

Closes https://github.com/pytorch/pytorch/pull/9056

Updates bbox_transform for rotated boxes with angle info to normalize the
predicted angle to be within the [angle_bound_lo, angle_bound_hi] range.

Reviewed By: pjh5

Differential Revision: D8706240

fbshipit-source-id: f3ee834cf362736136e285f0f8f0c063af94a879
2018-07-11 11:25:54 -07:00
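A small illustrative sketch of the normalization described above, assuming the bound width equals the rotation period (e.g. 180 degrees for [-90, 90)); this is not the op's code:

```
def normalize_angle(angle, angle_bound_lo=-90.0, angle_bound_hi=90.0):
    # shift by whole periods until the angle lies in [angle_bound_lo, angle_bound_hi)
    period = angle_bound_hi - angle_bound_lo
    while angle < angle_bound_lo:
        angle += period
    while angle >= angle_bound_hi:
        angle -= period
    return angle

print(normalize_angle(270.0))   # -> -90.0
print(normalize_angle(-100.0))  # -> 80.0
```
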
JerryShih
8da936ab52 Fix the build break for python3.7 PyUnicode_AsUTF8AndSize() prototype changing (#9259)
Summary:
https://docs.python.org/3.7/c-api/unicode.html#c.PyUnicode_AsUTF8AndSize
The return type changes from "char*" to "const char*".
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9259

Reviewed By: orionr

Differential Revision: D8776219

Pulled By: pjh5

fbshipit-source-id: e5eadf71264002ba57cfb68dd39686a7ec074092
2018-07-11 10:39:43 -07:00
Viswanath Sivakumar
748a90d05b BBoxTransform op: Add support for rotated boxes (#8952)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8952

Closes https://github.com/pytorch/pytorch/pull/8952

Based on RRPN paper: https://arxiv.org/abs/1703.01086

Reviewed By: pjh5

Differential Revision: D8598547

fbshipit-source-id: 3699379df9bf45ed5bdd395175a0e26a77e079f7
2018-07-11 10:25:34 -07:00
Lu Fang
04a7fc1dc4 Add Upsample support in C2 onnx backend for opset 1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9327

Reviewed By: ailzhang

Differential Revision: D8798462

Pulled By: houseroad

fbshipit-source-id: d7d1127a853de6a7bb8fdef146f283487e1e5569
2018-07-10 22:43:25 -07:00
Huamin Li
fb9f9c9ba2 Implement Sinh and Cosh (#9213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9213

Closes https://github.com/pytorch/pytorch/pull/9213

Added hyperbolic trig functions Sinh and Cosh

Reviewed By: BIT-silence

Differential Revision: D8752566

fbshipit-source-id: 5a58336a5153ec804404b9ac7b10b5662ede3cb7
2018-07-10 18:55:31 -07:00
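A quick sanity-check sketch one could run against the new operators; blob names and tolerances are arbitrary:

```
import numpy as np
from caffe2.python import core, workspace

x = np.random.randn(3, 4).astype(np.float32)
workspace.FeedBlob("X", x)
workspace.RunOperatorOnce(core.CreateOperator("Sinh", ["X"], ["Y_sinh"]))
workspace.RunOperatorOnce(core.CreateOperator("Cosh", ["X"], ["Y_cosh"]))
assert np.allclose(workspace.FetchBlob("Y_sinh"), np.sinh(x), atol=1e-5)
assert np.allclose(workspace.FetchBlob("Y_cosh"), np.cosh(x), atol=1e-5)
```
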
Lu Fang
e06abab264 Fix Upsample ONNX Symbolic (#9288)
Summary:
Adjust to the changes in ATen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9288

Reviewed By: ailzhang

Differential Revision: D8779078

Pulled By: houseroad

fbshipit-source-id: 7f387eeb35ae1f5a1494afc6287853a87a6173b4
2018-07-09 23:25:26 -07:00
Lu Fang
181d2a5e60 Add support of is_compatible for old version of onnx (#9284)
Summary:
Fix the problem when Caffe2 works with an old version of ONNX
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9284

Reviewed By: yinghai

Differential Revision: D8773894

Pulled By: houseroad

fbshipit-source-id: 99b5a962099f854edc85a2ea815cb88c82a6e175
2018-07-09 21:09:14 -07:00
Yinghai Lu
7ace3a99ec Fix TensorRT tests (#9285)
Summary:
ONNX-TensorRT is still using an old opset (<7). Patch it for now.

A future fix would be to expose versioning in the ONNX exporter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9285

Reviewed By: houseroad

Differential Revision: D8775268

Pulled By: yinghai

fbshipit-source-id: c272073f80cce35ebd971e44ec9472e3c8fd4b9e
2018-07-09 20:40:19 -07:00
Yinghai Lu
cb98c5020a Normalize IDEEP spatial bn op test (#9276)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9276

Use `checkDevice` instead of rolling our own.

Reviewed By: orionr

Differential Revision: D8769401

fbshipit-source-id: bd47ec2b2501552c2da1cee2eb9ad96a215602b4
2018-07-09 11:55:41 -07:00
Orion Reblitz-Richardson
936f47f271 Make roi_align_rotated_op_test not rely on 1.12.0 numpy.rot90 (#9267)
Summary:
Breaking this out of https://github.com/pytorch/pytorch/pull/8338

Use a local version of `np.rot90` with an `axes` argument, since we don't have NumPy 1.12.0 in all of the test environments. Caffe2 conda2-ubuntu16.04, for example, fails. Generally, it seems better to not require a NumPy bump just for this test.

cc mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9267

Reviewed By: mingzhe09088

Differential Revision: D8767819

Pulled By: orionr

fbshipit-source-id: c51a6295d58366eba06e4e55e3f1ffaa8af96975
2018-07-09 11:55:39 -07:00
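A minimal local fallback with NumPy ≥ 1.12 `rot90(axes=...)` semantics, written only with slicing so it also runs on older NumPy; this is an assumed re-implementation, not necessarily the exact helper added in the PR:

```
import numpy as np

def _flip(m, axis):
    # slice-based flip so we don't rely on np.flip (also new in NumPy 1.12)
    sl = [slice(None)] * m.ndim
    sl[axis] = slice(None, None, -1)
    return m[tuple(sl)]

def rot90(m, k=1, axes=(0, 1)):
    """Rotate m by 90*k degrees in the plane given by axes (NumPy 1.12 semantics)."""
    m = np.asanyarray(m)
    k %= 4
    if k == 0:
        return m[:]
    if k == 2:
        return _flip(_flip(m, axes[0]), axes[1])
    perm = list(range(m.ndim))
    perm[axes[0]], perm[axes[1]] = perm[axes[1]], perm[axes[0]]
    if k == 1:
        return np.transpose(_flip(m, axes[1]), perm)
    return _flip(np.transpose(m, perm), axes[1])   # k == 3

x = np.arange(2 * 3 * 3).reshape(2, 3, 3)
# sanity check (the comparison itself needs NumPy >= 1.12 for np.rot90's axes argument)
assert np.array_equal(rot90(x, 1, axes=(1, 2)), np.rot90(x, 1, axes=(1, 2)))
```
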
Zhaoheng Ni
f87499a8f3 Modify the original PackSegments operator by adding "max_length" argument (#9048)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9048

The max_length argument fixes the shape of the output to N * max_length * D, where N is the batch_size and D is the feature_dim.

Reviewed By: bddppq

Differential Revision: D8702782

fbshipit-source-id: e30555608fee1c4a61cc95922f4a71c7f54903af
2018-07-06 14:33:59 -07:00
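A numpy sketch of the resulting shape contract (illustrative only, not the op's code): a flat (sum(lengths), D) input is packed into (N, max_length, D), zero-padded and truncated as needed.

```
import numpy as np

lengths = np.array([2, 3], dtype=np.int32)
data = np.arange(5 * 4, dtype=np.float32).reshape(5, 4)   # (sum(lengths), D)
max_length = 4

N, D = len(lengths), data.shape[1]
packed = np.zeros((N, max_length, D), dtype=data.dtype)
offset = 0
for i, l in enumerate(lengths):
    n = min(l, max_length)          # truncate segments longer than max_length
    packed[i, :n] = data[offset:offset + n]
    offset += l
print(packed.shape)   # (2, 4, 4) regardless of the actual segment lengths
```
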
Xiuyan Ni
4e5369349f Add FTRL Optimizer with Group Lasso regularizer (#9074)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9074

Implement an optimizer based on the FTRL optimizer which supports the Group
Lasso regularizer.

The relevant paper list for this optimizer:
1. About the FTRL Optimizer: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41159.pdf,
2. About the group lasso regularizer solver: http://www.cse.cuhk.edu.hk/~king/PUB/ICML2010-Yang-473.pdf

Differential Revision: D8623146

fbshipit-source-id: 40e08aa6319d1ad7aa95e8716e3de83b9cfb8452
2018-07-06 13:41:00 -07:00
Shaoliang Nie
da39c24971 Add GroupL1Norm regularizer (#9115)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9115

As desc

Reviewed By: hlu1

Differential Revision: D8718011

fbshipit-source-id: c9d750662064dd6e6362b6b13d9d0175e93e60e4
2018-07-06 13:26:09 -07:00
Xiaomeng Yang
21c420c32c Remove unused RowwiseArgMaxOp (#9119)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9119

Remove unused RowwiseArgMaxOp

Reviewed By: houseroad

Differential Revision: D8719826

fbshipit-source-id: 57d78c8b93bc94a4634d806c7c2041f8c18678a5
2018-07-05 15:25:28 -07:00
Yan Zhu
8364470e5c fix empty batch for softmax (#9075)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9075

as title

Reviewed By: QueryConnectionException

Differential Revision: D8710616

fbshipit-source-id: ca505e1a733cc24db9e2ab83a5395c64fa8360c4
2018-07-01 16:40:14 -07:00
Xiaomeng Yang
03e7953a98 Use FixedDivisor in Reduce and Broadcast CUDA kernels (#9072)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9072

Use FixedDivisor in Reduce and Broadcast CUDA kernels

Reviewed By: houseroad

Differential Revision: D8710243

fbshipit-source-id: 6f1da12234898594a1be8c979d942aa515832aeb
2018-07-01 00:25:34 -07:00
Yan Zhu
b07ea04e23 empty batch for spatialBN (#8933)
Summary:
Closes https://github.com/pytorch/pytorch/pull/8933

The spatialBN implementation cannot deal with an empty batch; this diff enables the zero-batch setting:

during training, when batch_size = 0:
in forward, the output's saved_mean and saved_var are zeros.
in backward, the gradients for SCALE_GRAD and BIAS_GRAD are zeros.

Reviewed By: pjh5

Differential Revision: D8644699

fbshipit-source-id: 599ea687329d68699c987e05f56f409f4e729d1c
2018-06-29 18:40:41 -07:00
Lu Fang
863754c722 Update the ONNX op coverage in C2
Summary: Closes https://github.com/pytorch/pytorch/pull/9051

Reviewed By: pjh5

Differential Revision: D8704583

Pulled By: houseroad

fbshipit-source-id: 186e8b62378ab4f7cdef5fa77dc08c6b9ddc9cc0
2018-06-29 17:25:19 -07:00
Lu Fang
b75490414c Bump up the C2 onnx frontend opset to 8 (#9006)
Summary:
ONNX master has now been bumped up to opset 8.
Closes https://github.com/pytorch/pytorch/pull/9006

Reviewed By: yinghai

Differential Revision: D8685417

Pulled By: houseroad

fbshipit-source-id: f0c0a3682417b8803a856e232c2740cf3e68e554
2018-06-29 11:56:11 -07:00
Xiaomeng Yang
838fdd6f99 Add Cube and Cbrt Ops (#8991)
Summary:
Closes https://github.com/pytorch/pytorch/pull/8991

Add Cube and Cbrt Ops

Reviewed By: houseroad

Differential Revision: D8678848

fbshipit-source-id: 051dd475e45ad9f1d11a8b32ae3acd1f7459b930
2018-06-28 14:55:30 -07:00
Xiaomeng Yang
93cc7d1923 Add in_place test for binary ops
Summary: Closes https://github.com/pytorch/pytorch/pull/8973

Reviewed By: houseroad

Differential Revision: D8674216

Pulled By: BIT-silence

fbshipit-source-id: bde1ff7b47dbc8a48d1ff72b345c767af698a09b
2018-06-28 11:45:35 -07:00
Lu Fang
63233f98ad Bump up opset version to 7 in Caffe2 ONNX exporter (#8854)
Summary:
Will bump up to opset 8 in another PR to match the current opset version.

Already tested by generating the models in the current model zoo.
Closes https://github.com/pytorch/pytorch/pull/8854

Reviewed By: ezyang

Differential Revision: D8666437

Pulled By: houseroad

fbshipit-source-id: feffdf704dd3136aa59c0f1ff1830c14d1bd20aa
2018-06-28 07:39:02 -07:00
Yinghai Lu
346de2535d Workaround lack of 0-dim support in ideep (#8959)
Summary:
Closes https://github.com/pytorch/pytorch/pull/8959

MKL-DNN doesn't have support for 0-dim tensors. As a workaround, we produce a CPUTensor instead of an Ideep tensor in the fallback ops. And for those tensors, we don't need the Ideep copy op anymore.

Reviewed By: viswanathgs

Differential Revision: D8665168

fbshipit-source-id: 59678de2c5aed8c691ab5caaadede6d6c000dd7b
2018-06-27 20:24:28 -07:00
Duc Ngo
f52c2ca1c6 net_async tracing use enable_profile arg from NetDef (#8927)
Summary:
Closes https://github.com/pytorch/pytorch/pull/8927

Closes https://github.com/pytorch/pytorch/pull/8855

- Add parameter `enable_tracing` to the Arg field of NetDef. `net_async_tracing` will only enable the Tracer for Net instances that have this field set (unless the command line argument also includes the net name).
- Append a unique id to the json profiling result file because there could be multiple instances of the same net running.
- Dump the json profiling file regularly instead of just when the Tracer object is destroyed.

Reviewed By: ilia-cher

Differential Revision: D8372378

fbshipit-source-id: 8adc9d59f48b67456beed2e3a88235c298fdfd01
2018-06-27 16:24:57 -07:00
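A hedged sketch of opting a net into tracing via the new NetDef argument; the argument name comes from this diff, while attaching it with `utils.MakeArgument` is an assumption about the usual way args are set on a net proto:

```
from caffe2.python import core, utils

net = core.Net("traced_net")
net.ConstantFill([], ["x"], shape=[1], value=0.0)   # stand-in op
net.Proto().type = "async_scheduling"
# Assumed wiring: mark this net for tracing via the enable_tracing NetDef arg.
net.Proto().arg.extend([utils.MakeArgument("enable_tracing", 1)])
```
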
Mingzhe Li
c4744cfafa bilinear upsample operator on CPU
Summary: Add support for bilinear upsample operator on CPU.

Reviewed By: BIT-silence

Differential Revision: D7853215

fbshipit-source-id: 9043c95f9eb4e1f6df324e8f7a4e8fdb0c758f66
2018-06-27 10:12:06 -07:00
Orion Reblitz-Richardson
9ec0a2aef4 fbshipit-source-id: ba600fcd2b5cefc7621357bdeb05e24cea02e5af 2018-06-27 04:50:56 -07:00
Orion Reblitz-Richardson
edb88b5f3a
Update from Facebook (#8887)
* add opencl + fpga context

adds an opencl context inside caffe2/fb which can be used for fpga access

* [Caffe2] Force tensor inference checks to be triggered during testing

We've started to rely on TensorInference functions more for different analysis.  This diff ensures that the TensorInference function's result matches what is expected from the definition of the operator.

* Enable building //caffe2:torch with @mode/opt

In @mode/opt, python runs out of a PAR, which breaks a lot of
assumptions in the code about where templates/ folders live relative
to __file__. Rather than introduce hacks with parutil, I simply turn
template_path into a parameter for all the relevant functions and
thread it through from the top level.

* [Caffe2] Fix cost models for DotProduct and Div.  Update Tensor Inference for dot product

As title. DotProduct states that the output is a 1-D tensor (https://caffe2.ai/docs/operators-catalogue.html#dotproduct) though the code suggests it is either 0- or 1-D depending on the inputs. TensorInference is defined to support the implementation.

* [SG-MoE] Add an option to make the experts NOT as components

* [nomnigraph] Rename and fixup convertToNeuralNetOperator API

This will make things a bit cleaner

* no longer symlink THNN.h and THCUNN.h

* forced decoder network (onnx export)

Closes https://github.com/pytorch/translate/pull/95

Add networks in ensemble_export.py to create a forced decoding network from PyTorch NMT checkpoints. This network takes an arbitrary numberized (source, target) pair and returns the model score for the translation, including penalties.

Vocabulary reduction networks are also supported, but note that target indices which are not in the possible_translation_tokens generated for the source input will be trea

* Revert schema change to fix production models

Revert schema change to fix production models

* MockLogDeviceReader - rebase on FIX

# Goal

1), Build a make_mock_log_device_reader using make_mock_reader

2), Replace the real log_device_reader here: https://fburl.com/raihwf1p

# Log by D8151734

Real log_device_reader:
```
I0529 20:29:05.373108 954994 tensor.h:839] Tensor print_net/log of type std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >. Dims: (): read_net/ParseOpenTrainingRow:0
I0529 20:29:05.373244 954994 tensor.h:839] Tensor read_net/ParseOpenTrainin

* [C2/D2][1/n]: Nonnegative-Constrained Optimization -- log barrier

implement log barrier as a regularization method

* Add teacher weight screening.

Add teacher weight screening according to teacher labels. If the teacher label is zero, we do not use the distill loss in the objective function.

* Add NormalizerContext

See task for more detail. This implementation is a copy of what exists for RegularizerContext except for how the parameters are defined in the model_definition thrift file.

I'll try an alternative implementation which overrides the default arguments of functions instead like for argscopes in tensorflow.

https://github.com/pytorch/pytorch/compare/master...MaximeBoucher:update-from-facebook-0939578c068c?expand=1

* Adding cosine similarity option in dot processor

Add pairwise cosine similarity option in dot product.
Add an option to concatenate dot product and cosine similarity.
Add test cases.

* [nomnigraph][redo] Concat elim for sparseNN

Same as D7962948, which was reverted because Operator Schema was not
defined

* [pytorch] Revert pytorch/pytorch#7918 'Release GIL when copying to shared memory', breaks ASAN

Revert this pytorch diff that breaks ASAN when running Filament in dev mode; in opt mode it gives "bad file descriptor" errors. Looks like a race when copying tensors to shared memory in multiple mp.Queue's (which spawn separate threads).

https://github.com/pytorch/pytorch/pull/7918/files

* [nomnigraph][mobile] Enable nomnigraph by default, use -Oz on nomnigraph related code to reduce code size

enables nomnigraph and reduces codesize

* [Warmup] Allow both offline incremental training and online training

Change the plan name on the saving side and reading side to support both training types

This diff depends on D8128530 and D8168651.

* Revert D7802642: [Warmup] Allow both offline incremental training and online training

This reverts commit afc213cf9b36cecf75333a788391c4d09f4afccc

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Add legacy grad logic to fix div op on old graphs.

Add legacy grad logic to fix div op on old graphs.

* Correctly propagate operator failures

Propagate errors from operators that throw exceptions and return false

* Revert D8374829: [caffe2][nomnigraph][redo] Concat elim for sparseNN

This reverts commit 6dda028c463e54bb5c32188bbbe9202107e188a5

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [Caffe2] Added extra_info to core.DeviceOption(), enforced extra_info to be inherited in scope.DeviceScope

extra_info is a newly defined field in the DeviceOption proto. This diff added extra_info to core.DeviceOption(), and in scope.DeviceScope() it enforces that the new scope inherits the extra_info from the old scope.

* [opt] hgdirsync wasn't enabled, merge diverged code

Here's the damage (P59732616): basically xplat was left behind but had
the change from assert to CAFFE_ENFORCE.

* OMP parallelism over RoIs for RoIAlign op

Simpler to parallelize over RoIs. Shouldn't affect other uses as it relies on
the number of OMP threads set during startup.

PR: https://github.com/pytorch/pytorch/pull/8562

* Use int64_t for shape in FillOps

to avoid overflow of int32

* Implement Rotated RoIAlign op

Based on Rotated RPNs as explained in https://arxiv.org/abs/1703.01086.
The idea is simple - orientation/angle is added as an RPN
anchor parameter and then the angle is further regressed similar to bbox
coords. There are some additional changes related to NMS and IoU, but besides
that it's a direct extension to Faster-RCNN. Further details in https://fb.quip.com/sZHlA1iMfWPZ.

RoIs are represented in [center_x, center_y, width, height, angle] format.
`angle` repre

* Rotated RoIAlign op CUDA forward implementation

CUDA forward impl for D8415490

* RoIAlignRotated op CUDA backward pass implementation

TSIA

* All remaining fixes to eliminate process_github.sh

Most of this diff has already been reviewed separately, except for the parts relating to _thnn/utils.py and _utils._internal.py

remove skipIf(True, 'Fbcode') line from process_github.sh

replace sed of cpp file with #ifdef to control cudnnDestroy use

undo sync-time deletion of .gitattributes, remove process_github.sh

switch to using _utils._internal rather than try-import-except

This diff also fixes the open-source bug where rebuilds have

* Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"

Original commit changeset: 7707d2efe60e. The original diff is backed out because the online trainer package is backed out. This code would only work with the new online trainer package.

* [easy] improve error log in adagrad op

as title

* re-allow use of thnn_h_path

This fixes cffi usage in OSS

* [4/4] [tum] parallelizing layerNorm for GPU full sync

as title

* add compile=False to pytorch tests, remove hack with pyc

* Add shape and type inference for RowWiseArgMax operator

See title

* Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"

This reverts commit 78167eeef0af16b60f72c82f9dcdda9b41b4dcbd

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [fix-flaky-test] mock_hive_reader_test flaky, because GlobalCounter collects local counts intervally

# Problem

`MockHiveReader` uses `GlobalCounter` to limit `max_examples`.

GlobalCounter on the server node collects local counts from worker nodes every 1 sec.

This 1 sec delay makes it impossible to limit exactly to `max_examples`; it will definitely exceed `max_examples`.

# Plan

Given,
```
Expected num_examples = max_examples + num_examples/sec (Read Speed) x 1 sec (GlobalCounter Sync Int

* [Caffe2] Fix FCGradient cost inference.  Prevent overflow in cost inference

FCGradient missed a factor of 2 in the `num_outputs == 3` case. Overflow was occurring in the FLOP calculation for FC. Changed types to `uint64_t` to prevent future problems.

* Fix binary ops with empty inputs

Fix binary ops with empty inputs

* Support the filling of input blob with provided data

as title for Biz Integrity case

* Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training""

Original commit changeset: 30c55dd38816 Original diff is reverted due to introducing bad integration test. Fixed the integration test.

* [c2][easy] improve pack ops error loggings

as desc.

* Add ShapeTypeInference for LpNorm operator

As desc

* Shard test_nn to reduce runtime for each test target

Closes https://github.com/pytorch/pytorch/pull/8793

The current test_nn would time out and be disabled in GreenWarden, and we need to have an option to split it up in order to pass the stress test. Right now GreenWarden roughly allows running 100 test cases in test_nn before timing out, and here we have an option to divide test_nn into 30 shards (with ~40 tests in each shard) to allow for some test suite growth in the future.

* Change default caffe2_streams_per_gpu to 1

* Remove IN_SANDCASTLE from common.py and test_nn.py

We prefer to disable the failing tests through Sandcastle UI instead.

* Add a new class for an updated prof_dag.proto

This diff contains:
- An updated prof_dag.proto that contains blob profiles.
- A class to deserialize this information (serialization is in a follow up diff)
- Update to separate profiling information from NeuralNet (and use it as part of the class above).
- Unit tests

* Lambdarank for SparseNN

This diff adds a lambda_rank_layer for SparseNN.
Changes include:
1) Adds support for multiple sessions in the C2 op
2) Adds support for two different loss functions in the C2 op
3) Unit tests for the op

* Revert D8586950: Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training""

This reverts commit 012220ed63eccc35659a57b31d16a3625da6317b

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [easy] A few fixups to multithread predictor benchmark

(1) support perf on T6 server
(2) remove dead code

* fix a bug about the map size

as title

* Fix reduce sum on in-place case.

Fix reduce sum on in-place case.

* [Warmup] Reland reverted diff Allow both offline incremental training and online training

Closes https://github.com/pytorch/pytorch/pull/8827

fix net transform integration test. Allow offline and online trainer to coexist D7802642.

* Add StoreHandlerNotAvailableException

Add an exception for a store that is not available or has been
deleted.

* Use exception handling for fault tolerance, missing KV store

Remove status blobs from communication ops so that exceptions propagate on
failure.

* [C2/D2][2/n]: Nonnegative-Constrained Optimization -- bounded grad proj

for simple bounded constrained optimization, incl non-negative box constraints.

* [GanH]: Adaptive Weighting with More Estimations

With the implemented positivity optimization, we now learn adaptive weights with different
parameterizations.

This improves parameter estimation and training stability.

* Revert some changes for landing

* Remove AutoNoGIL in StorageSharing

* Temporarily disable net_tests

* Revert "[Caffe2] Force tensor inference checks to be triggered during testing"

This reverts commit 67ef05c22b2f71b4a489695384932f968384a2a4.

* Revert "Fix reduce sum on in-place case."

This reverts commit 6cb8a8e1b3db7b6d20941b0053e3f3836068eb64.

* Revert "Revert "Fix reduce sum on in-place case.""

This reverts commit 130a257c0893dc09f4bd6e6a45d112261807fd2c.
2018-06-26 14:55:48 -07:00
Xiaomeng Yang
288d37998a
[Caffe2] Fix gradient_check on in-place ops (#8828)
* Fix gradient_check on in-place ops

* Fix hsm_test

* Fix SplitByLengthOp test

* Fix input_device_options for gradient_checker

* Fix hypothesis_test_util.py
2018-06-25 15:25:56 -07:00
Lu Fang
9c426797a8 Expose is_compatible function (#8783) 2018-06-21 23:37:54 -07:00
Hexus (Shihao Xu)
bd95f8f948 Resolve name conflict of ContextManager (#7244)
* Resolve conflicting name, ContextManager

Concept name `Context Manager` is taken by Python. See https://docs.python.org/3.6/reference/datamodel.html#with-statement-context-managers

It says,
A context manager is an object that defines the runtime context to be established when executing a with statement. The context manager handles the entry into, and the exit from, the desired runtime context for the execution of the block of code.

The `ContextManager` here is more like a registry. 
And there is a C++ registry in caffe2 codebase `caffe2/caffe2/core/registry.h`.
There is also a Caffe2DBRegistry, declared by calling `CAFFE_DECLARE_REGISTRY(Caffe2DBRegistry, DB, const string&, Mode);` in `caffe2/caffe2/core/db.h`.

I think we can follow the concept name `Registry`, calling it `ContextRegistry`.

* Make Classes and Functions internal to this module start with "_"

Make Classes and Functions internal to this module start with "_"

* Update context.py

* Update context.py
2018-06-22 00:41:51 -04:00
Jinghui
0e0031e204 Fix build error in pybind_state_ideep (#8684)
Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
2018-06-20 08:29:48 -07:00
kittipatv
32bc28dd18 caffe2 export (#8642) 2018-06-19 00:50:33 -07:00
zrphercule
c44c95fd0b New operator 'expand' (#8263)
* operator 'expand'

* updated operator with a simple testcase

* Revert "updated operator with a simple testcase"

This reverts commit 1ce9f8ac567b525677254b0dce5735d7fea133d7.

* updated operator with a simple testcase

* expand operator with a passed testcase

* typo

* GPU full support added

* GPU support testing...

* GPU full supported

* formatted

* nits repaired

* gpu parameters fixed

* Expander removed

* nits fixed, document added

* formatted

* new testcases added & nits repaired
2018-06-18 16:33:47 -07:00
bddppq
a8bf30d7a5
caffe2 hip python binding (#8491)
* caffe2 hip python binding

* Change back onnx submodule
2018-06-14 19:56:56 -07:00
Sebastian Meßmer
384936f73e
TypeId improvements (#8350)
* Improve TypeId:
- move it to c10 namespace to allow for easy extraction from caffe2 into c10 (i.e. reusability from aten)
- Use unordered_map/unordered_set instead of map/set for performance
- Make TypeId a type safe class (i.e. no implicit casts from/to int)
- Make TypeId constexpr
- Some readability improvements (e.g. using instead of typedef)
- Don't explicitly implement TypeMeta copy assignment and construction - let the compiler do that for us.
- Add TypeMeta move constructor
- Make TypeMeta members noexcept
- Implement TypeMeta::operator== and operator!= as free functions instead of in-class

* CR comments

* fix

* fix windows

* Rename back to CaffeTypeId

* Remove c10::TypeId/TypeMeta

* remove C10_KNOWN_TYPE

* code review
2018-06-14 09:16:26 -07:00
sf-wind
5b86c3af4a
Update from facebook (#8384)
* [fix] fixup the bias multiplier data access issue

Hotfix for failures in conv_transpose

* [D2][Easy]: lint regularizer

lint with black

* [GanH]: Split mu in adaptive weight for diagnose

* [Dper] Add the ability to split FC weights into multiple smaller ones

* fix SumReduceLikeOp for empty blob

as desc.

* add ctc_greedy_decoder for caffe2

ctc_greedy_decoder same as tf's

* Update event callback handling

Allow multiple callbacks per event

* Add WeightedSum layer

The motivation is to do weighted sum in HoNet/crossnet, in the next diff, I'll replace model.Add with model.WeightedSum in
honet: https://fburl.com/f4rmolg2
crossnet: https://fburl.com/v7awn8se, https://fburl.com/63filbnm

* Replicate DAG's behavior

Some callers expect RunAsync to block, replicate that behavior in case of
explicit 'dag' net type

* [dper] layernorm layer

as title

* Override dag, async_dag, async_polling

Overriding dag, async_dag and async_polling with async_scheduling

* Name the thread pools

Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead.

* [Caffe2] FillerOp should support int64_t dimensions

Change argument type to int64_t for shape argument of FillerOp (used in ConstantFill, XavierFill, etc)

* Remove caffe2/caffe2/contrib/torch/

It's not used anywhere and depends on old lua torch that conflicts with Aten. Given PT1 it's not relevant any more (though it was nice and clever code!)

#accept2ship

* Fix linearWarmup multiplier check

The multiplier needs to be non-negative, not strictly positive.

* Revert D3314316

This is after 2 years and we do not seem to have a use case for this one, so
for the sake of clean API design we should potentially remove this. This would
allow us to potentially pass in arguments to optionally construct an object,
although it is indeed a little bit unclear how we can reuse existing objects if
constructor arguments are passed in. In any case, we may want to remove this
dangling feature.

* Speedup generate proposals by partial_sort.

Speedup generate proposals by partial_sort.

FACEBOOK:
- Saw speed improvement for training with this op.
- Yanghan benchmarked the op on a small dataset and see consistent 100% improvement on speed (6ms -> 3ms) on 420 input resolution. See next diff for details.

* More parallel processing friendly for CPP version of GenerateProposals.

More parallel processing friendly for CPP version of GenerateProposals.

* [DT] [43/n] Lift stop conditions inside reader code back to flow control

1. Split multi_reader function into local_reader and remote_reader
2. Lifted stop conditions inside Limiter back to flow control
3. Split epoch flow building logic into 3 cases:
  - single machine (1 reader, 1 trainer on trainer0 node, no PS)
  - (1 reader + 1 trainer) on trainer0 node, has PS
  - multiple readers, readers do not share nodes with trainers, might have PS or not

* Resolve conflicts for torch/_thnn/utils.py

* [Caffe2] Handle image decoding errors

Image decoding errors can make the whole training fail. This diff handles them:
1. Catch imdecode exceptions and check whether the decoded image has zero columns or rows; these are counted as decoding errors.
2. Replace the image with an empty one in case of error.
3. Count the number of errors and throw a runtime exception if the rate reaches a given threshold.

The empty image data is kept. It might introduce noise in the training data.

* Update MKL exporter to IDEEP ops

TSIA

* [Caffe2] GlobalInit is thread safe, fixing the comment

With the mutex and lock, GlobalInit is thread safe.
Update the comments.

* Back out "Add support for generating ATen files during fbcode build"

Original commit changeset: 28970ddba353

@override-unit-failures
(Note: this ignores all push blocking failures!)

* [DT]: fix predictor save

similar to D6610058, here we add the fix for distributed online training

* Remove net_singlethread_async_gpu.cc

Closes https://github.com/caffe2/caffe2/pull/2528

This removes net_singlethread_async_gpu.cc as part of our effort to clean
CUDAContext and the net executors.

* Inline DFS task execution

Add a DFS inline task execution mode in executor

* Add c10 folder to fbcode

This adds the c10 folder and its test cases to fbcode. Build flags are mostly taken from aten.

* add dependencies for online trainer

Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators

Relevent post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/

* Resolve conflicts for tools/jit/gen_jit_dispatch.py

* [Fix] sparse regularization in distributed training

* Support advanced pooling options in sum processor

* support advanced pooling options in sum processor
* remove redundant code
* support attention in sum processor

* Improve shard logging in net tracing code

Make it handle arbitrary shard ids instead of just one digit ids.

* [Caffe2] Call GlobalInit in predictor only in mobile

FACEBOOK:
Calling GlobalInit long after the program starts may not be safe. There are issues if the following happens:

1. The user does not call GlobalInit and initFacebook after the program starts.
2. The user sets a flag manually: https://fburl.com/mcsumw7d
3. The user calls the OSS predictor.
4. The OSS predictor calls GlobalInit.
5. GlobalInit calls initFacebook.
6. initFacebook resets all flags: https://fburl.com/tolszha1
Thus, the flags the user set manually are overwritten.

This would happen anytime GlobalInit is called long after the program starts.
I suppose the intention of the user in this case is not to call GlobalInit throughout the program,
but use Caffe2 regardless (is that desired?)
But adding GlobalInit in the OSS predictor would automatically call GlobalInit when using Caffe2.

This issue doesn't exist in mobile, since initFacebook is not called on mobile.

For now, guard the GlobalInit in predictor for mobile only.
May want to ensure the GlobalInit is always called at the start of the program. @[3501714:kutta] has seen weird issues when not calling GlobalInit at the start of the program on server side. He has made some progress on this.

* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py

* Add empty fix for SumLikeReduceOp

Add empty fix for SumLikeReduceOp

* Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN

This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Remove Declarations.yaml

* Include common.h

* Change std::stoi to caffe2::stoi

* Add thread_name.cc to the CMake file

* No need to subtract 1. Fix test segfaults

* Fix NetTest, ObserverTest

Fix tests

(cherry picked from commit 3767e66c3f365596cba3d46d3e7322c933a0ab41)

* CTCGreedyDecoderOp only has CPU implementation, test should only run on CPU

* Add a variable to avoid conversion resizing issue

* Remove the code per soumith's comments

* Remove the code per soumith's comments

* Remove blank lines in the end of file

* [caffe2] upgrade IDEEP and hotfix for conv op accuracy issue (#8364)

* [IDEEP] Upgrade IDEEP version

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* [IDEEP] Fix accuracy issue in conv op

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix build error due to lack of src in CMakeLists

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Remove the code per soumith's comments

* [ONNX] Add an ATen fallback pathway for ONNX export (#8273)

* ATen fallback for ONNX export

* Move to enum

* Fix model test

* Add comment

* Address comments

BC interface

* Remove imaginary file (#8415)

* [Caffe2] Enable AMD/MIOPEN ops for Caffe2  (#8306)

* Add hip support for caffe2 core

* Add MIOPEN header/wrapper to caffe2 core

* Add HIP device into caffe2 PB

* top level makefile change for rocm/hip

* makefile scaffolding for AMD/RocM/HIP

* Makefile scaffolding for AMD/RocM/HIP; add makefile/utility for HIP files

* caffe2 PB update for AMD/ROCM HIP device

* Add AMD/RocM/Thrust dependency

* HIP threadpool update

* Fix makefile macro

* makefile fix: duplicate test/binary name

* makefile clean-up

* makefile clean-up

* add HIP operator registry

* add utilities for hip device

* Add USE_HIP to config summary

* makefile fix for BUILD_TEST

* merge latest

* Fix indentation

* code clean-up

* Guard builds without HIP and use the same cmake script as PyTorch to find HIP

* Setup rocm environment variables in build.sh (ideally should be done in the docker images)

* setup locale

* set HIP_PLATFORM

* Revert "set HIP_PLATFORM"

This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.

* continue the build script environment variables mess

* HCC_AMDGPU_TARGET

* Cleanup the mess, has been fixed in the latest docker images

* Assign protobuf field hip_gpu_id a new field number for backward compatibility

* change name to avoid conflict

* Fix duplicated thread pool flag

* Refactor cmake files to not add hip includes and libs globally

* Fix the wrong usage of environment variables detection in cmake

* Add MIOPEN CNN operators

* Revert "Add MIOPEN CNN operators"

This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.

* Add MIOPEN pooling operator

* Add MIOPEN activation operator

* Add MIOPEN softmax operator

* Add MIOPEN spatial batch norm operator

* Add MIOPEN loacl response normalization operator

* Add MIOPEN conv operator

* Clean-up LRN ops

* enable fp16 in MIOPEN pool ops

* Enable fp16 for MIOPEN relu op

* Enable fp16 for MIOPEN spatial batch norm op

* code clean-up

* revert float16 support

* Create Caffe2 python binding for AMD/ROCM/HIP

* Add op fallback for HIP operator

* add hip src/test files in cmake

* exclude hip src/test files

* fix python binding for hip backend

* fix MIOPEN pooling op workspace

* hack to compile miopen operators

* fix include path for MIOPEN ops

* Fix include path

* Add HIP math utilities

* Fix path for HIP math utils

* cmake fix

* Cmake fix / hipcc for hip files

* suppress hipcc warning

* cmake fix / replace USE_HIP with USE_ROCM

* revert LoadHIP.cmake change

* fix include for thrust/cub-hip

* include path fix for conversion.h

* Updated with latest upstream changes

* clang format fixes

* Context_hip updates

* Fixed typo in rocblas handle get function

* Updated hipified math utils

* Updated math hip test util

* Updated context hip test

* Updated common_hip

* Updated net async dag for HIP

* Added MIOPEN in operator hip test

* fix

* C2 dependencies clean-up

* fix include path for building custom protobuf

* Decouple miopen pool op and conv_pool_op base

* cmake refactor

* fix operator_hip_test

* move all hip/miopen ops files into caffe2/operators/hip

* sanitize cmake

* permission issue

* remove extra parentheses

* remove artifact from resolving merge conflict

* cont. sanitize cmake files

* fix syntax error

* sanitize conversion.h

* .

* Revert "."

This reverts commit 56020cb0e996a31ae27bf1f8f491955ed0b121b9.

* clang-format

* Enable some reduce operators' ONNX backend tests (#8418)

* fix old comment to point to the right file (#8416)

* Stop pinning nccl version. (#8421)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Expose logsumexp docs and mark log_sum_exp in distributions for internal use (#8428)

* Enable some of the ONNX backend test on broadcasting (#8423)

* Enable some of the ONNX backend test on broadcasting

* enable gemm broadcast

* Expose proto utils and ONNX (#8073)

* Expose proto utils and ONNX from PyTorch libcaffe2.so

* Try to use protobuf from _C.so

* Fix ONNX proto header include

* Adjust order of imports for ONNX until nanopb goes away

* Set and use ONNX_NAMESPACE for PyTorch builds

* Show protobuf summary for all builds

* Add ONNX_NAMESPACE for cpp_build

* Statically link libprotobuf.a into libtorch.so

* Set ONNX_NAMESPACE on Windows build

* Move core/dispatch up as well

* Add /MD flag for Windows build of _C

* Potential Windows fix for ONNX and protobuf

* Add direct linkage from _C to ONNX on Windows

* Only include protobuf wrapper for PyTorch

* Pass extra_compile_args to _nvrtc ext build

* Remove installation of .a files

* Rebase creates some weird situations, revert them manually

* Remove more weird changes due to rebase

* Need to add thread_name.cc after merge
2018-06-13 13:10:45 -07:00
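
On the ATen fallback export path added in #8273 above, a minimal usage sketch; this assumes torch.onnx exposes the OperatorExportTypes enum and accepts an operator_export_type argument, and the toy model and output file name are placeholders:

```
import torch
import torch.nn as nn

# Toy model; any op in it that lacks a native ONNX symbolic would be
# emitted as an ATen fallback node instead of failing the export.
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU())
dummy_input = torch.randn(1, 4)

torch.onnx.export(
    model, dummy_input, "model.onnx",
    operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK,
)
```
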
Lu Fang
7543d0f794 Enable some of the ONNX backend test on broadcasting (#8423)
* Enable some of the ONNX backend test on broadcasting

* enable gemm broadcast
2018-06-13 10:15:56 -07:00
Lu Fang
a42c12bb11
Enable some reduce operators' ONNX backend tests (#8418) 2018-06-13 21:32:50 +08:00
Peter Yeh
c37e5b7137 [Caffe2] Enable AMD/MIOPEN ops for Caffe2 (#8306)
* Add hip support for caffe2 core

* Add MIOPEN header/wrapper to caffe2 core

* Add HIP device into caffe2 PB

* top level makefile change for rocm/hip

* makefile scaffolding for AMD/RocM/HIP

* Makefile scaffolding for AMD/RocM/HIP; add makefile/utility for HIP files

* caffe2 PB update for AMD/ROCM HIP device

* Add AMD/RocM/Thrust dependency

* HIP threadpool update

* Fix makefile macro

* makefile fix: duplicate test/binary name

* makefile clean-up

* makefile clean-up

* add HIP operator registry

* add utilities for hip device

* Add USE_HIP to config summary

* makefile fix for BUILD_TEST

* merge latest

* Fix indentation

* code clean-up

* Guard builds without HIP and use the same cmake script as PyTorch to find HIP

* Setup rocm environment variables in build.sh (ideally should be done in the docker images)

* setup locale

* set HIP_PLATFORM

* Revert "set HIP_PLATFORM"

This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.

* continue the build script environment variables mess

* HCC_AMDGPU_TARGET

* Cleanup the mess, has been fixed in the latest docker images

* Assign protobuf field hip_gpu_id a new field number for backward compatibility

* change name to avoid conflict

* Fix duplicated thread pool flag

* Refactor cmake files to not add hip includes and libs globally

* Fix the wrong usage of environment variables detection in cmake

* Add MIOPEN CNN operators

* Revert "Add MIOPEN CNN operators"

This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.

* Add MIOPEN pooling operator

* Add MIOPEN activation operator

* Add MIOPEN softmax operator

* Add MIOPEN spatial batch norm operator

* Add MIOPEN local response normalization operator

* Add MIOPEN conv operator

* Clean-up LRN ops

* enable fp16 in MIOPEN pool ops

* Enable fp16 for MIOPEN relu op

* Enable fp16 for MIOPEN spatial batch norm op

* code clean-up

* revert float16 support

* Create Caffe2 python binding for AMD/ROCM/HIP

* Add op fallback for HIP operator

* add hip src/test files in cmake

* exclude hip src/test files

* fix python binding for hip backend

* fix MIOPEN pooling op workspace

* hack to compile miopen operators

* fix include path for MIOPEN ops

* Fix include path

* Add HIP math utilities

* Fix path for HIP math utils

* cmake fix

* Cmake fix / hipcc for hip files

* suppress hipcc warning

* cmake fix / replace USE_HIP with USE_ROCM

* revert LoadHIP.cmake change

* fix include for thrust/cub-hip

* include path fix for conversion.h

* Updated with latest upstream changes

* clang format fixes

* Context_hip updates

* Fixed typo in rocblas handle get function

* Updated hipified math utils

* Updated math hip test util

* Updated context hip test

* Updated common_hip

* Updated net async dag for HIP

* Added MIOPEN in operator hip test

* fix

* C2 dependencies clean-up

* fix include path for building custom protobuf

* Decouple miopen pool op and conv_pool_op base

* cmake refactor

* fix operator_hip_test

* move all hip/miopen ops files into caffe2/operators/hip

* sanitize cmake

* permission issue

* remove extra parentheses

* remove artifact from resolving merge conflict

* cont. sanitize cmake files

* fix syntax error

* sanitize conversion.h

* .

* Revert "."

This reverts commit 56020cb0e996a31ae27bf1f8f491955ed0b121b9.

* clang-format
2018-06-13 04:00:39 -07:00
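
A rough sketch of how the HIP backend added above is used from the Caffe2 Python bindings; it assumes the protobuf exposes a HIP device type (and the hip_gpu_id field mentioned in the commit) and that Relu either has a HIP kernel or hits the HIP operator fallback path:

```
import numpy as np
from caffe2.proto import caffe2_pb2
from caffe2.python import core, workspace

# Ask Caffe2 to place the blob and operator on the first HIP (AMD GPU) device.
device_option = caffe2_pb2.DeviceOption()
device_option.device_type = caffe2_pb2.HIP  # assumed enum value added by this PR
device_option.hip_gpu_id = 0

workspace.FeedBlob("X", np.random.rand(2, 3).astype(np.float32), device_option)
op = core.CreateOperator("Relu", ["X"], ["Y"], device_option=device_option)
workspace.RunOperatorOnce(op)
print(workspace.FetchBlob("Y"))
```
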
Xiaomeng Yang
44973a06ba
Add affine_channel_op (#8356)
Add affine_channel_op
2018-06-11 20:51:11 -07:00
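
A small usage sketch for the new AffineChannel operator; the NCHW layout and the X/scale/bias input order are assumptions here:

```
import numpy as np
from caffe2.python import core, workspace

X = np.random.rand(1, 3, 4, 4).astype(np.float32)    # NCHW
scale = np.array([1.0, 2.0, 0.5], dtype=np.float32)  # one value per channel
bias = np.array([0.0, -1.0, 3.0], dtype=np.float32)

for name, value in [("X", X), ("scale", scale), ("bias", bias)]:
    workspace.FeedBlob(name, value)

# Y[n, c, h, w] = X[n, c, h, w] * scale[c] + bias[c]
workspace.RunOperatorOnce(
    core.CreateOperator("AffineChannel", ["X", "scale", "bias"], ["Y"]))
Y = workspace.FetchBlob("Y")
assert np.allclose(Y, X * scale[None, :, None, None] + bias[None, :, None, None])
```
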
bddppq
3521cd54af
Fix dividing by zero segfault in Reshape (#8302)
when inferring a dimension of a new shape that contains a zero-size dimension
2018-06-09 09:48:22 -07:00
Yinghai Lu
2ed03898cd
Add depthwise convolution test for IDEEP (#8301) 2018-06-09 08:44:13 -07:00
Viswanath Sivakumar
d301d9df7a [ideep] Fuse Conv-Relu after IDEEP graph rewrite, skip group conv (#8233)
IDEEP supports fusion for non-group conv
2018-06-08 10:29:15 -07:00
Viswanath Sivakumar
832c88a766 [ideep] Add IDEEP Squeeze op (#8227)
Similar to MKLSqueezeOp at caffe2/mkl/operators/squeeze_op.cc
2018-06-06 21:58:51 -07:00
Viswanath Sivakumar
4df86b6547 Update MKL exporter to IDEEP ops (#8228)
IDEEP exporter support
2018-06-06 21:43:43 -07:00
sunnieshang
b2dac08049 Fix a corner case for ReShapeOp (#8178)
In my use case, the backward pass needs to reshape a [0]-shaped tensor
into a [0, 0]-shaped tensor. The original implementation would cause an
out-of-index issue. This diff fixes the problem.
2018-06-05 19:06:10 -07:00
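
To make the corner case concrete, a minimal sketch of the post-fix behavior described above (the blob names are placeholders; Caffe2's Reshape also emits the old shape as a second output):

```
import numpy as np
from caffe2.python import core, workspace

# An empty tensor of shape [0], as can appear in a backward pass.
workspace.FeedBlob("dY", np.zeros((0,), dtype=np.float32))

# Reshape it to [0, 0]; before this fix the op could index out of bounds here.
op = core.CreateOperator("Reshape", ["dY"], ["dX", "old_shape"], shape=[0, 0])
workspace.RunOperatorOnce(op)

print(workspace.FetchBlob("dX").shape)  # expected per the fix: (0, 0)
```
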
Xiao Yang
ffde23d45e use the correct datatype format (#8144) 2018-06-05 22:01:59 -04:00
Xiaomeng Yang
9243b64bff
[Caffe2] Update elementwise ops to support numpy style broadcast (#8070)
* Update elementwise ops to support numpy style broadcast

Update elementwise ops to support numpy style broadcast

* Fix sqrt_op

* Fix compare ops

* Fix gradient test

* Fix optimizer legacy broadcast

* Fix legacy broadcast for elementwise ops

* Skip flaky test

* Fix eigen simple binary op

* Fix attention test

* Fix rnn test

* Fix LSTM test

* Fix tan grad

* Fix schema check
2018-06-05 15:49:16 -07:00
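
A small sketch of the numpy-style broadcast semantics referred to above, assuming Add no longer needs the legacy broadcast/axis arguments for this pattern:

```
import numpy as np
from caffe2.python import core, workspace

A = np.random.rand(2, 3, 4).astype(np.float32)
B = np.random.rand(3, 1).astype(np.float32)  # broadcasts against A's trailing dims

workspace.FeedBlob("A", A)
workspace.FeedBlob("B", B)
workspace.RunOperatorOnce(core.CreateOperator("Add", ["A", "B"], ["C"]))

assert np.allclose(workspace.FetchBlob("C"), A + B)  # same result as numpy broadcasting
```
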
Yinghai Lu
c446269568
cpu/ideep context converter (#8139) 2018-06-04 21:28:59 -07:00
bddppq
ec4a0f332e
Add back lrn test (#8134)
* Revert "Skip OnnxBackendNodeModelTest::test_lrn_default_cuda that causes segfault (#8127)"

This reverts commit 410191c417.

* Fix mismatched default values
2018-06-04 15:06:40 -07:00
bddppq
410191c417
Skip OnnxBackendNodeModelTest::test_lrn_default_cuda that causes segfault (#8127) 2018-06-04 12:34:15 -07:00
daquexian
df28f5d06e [Caffe2] Support non peer access in muji and fix bug when reduced_affix is empty (#6896)
* [Caffe2] Support non peer access in muji

* [Caffe2] Add test for 4 gpus and 2 groups

* [Caffe2] Add comments

* Fix bug when reduced_affix is empty

* Fix typo and add comments about cpu and amd gpu
2018-06-05 03:14:43 +08:00
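
For context, a rough usage sketch of muji's Allreduce helper; the Allreduce(net, blobs, reduced_affix, gpu_indices) signature and the pre-placed per-GPU blobs are assumptions here. With this change, the same call is also expected to work when the GPUs cannot peer-access each other:

```
from caffe2.python import core, muji

net = core.Net("allreduce_example")
# Gradients that live on four different GPUs (their placement is assumed to
# have been set up elsewhere, e.g. by data_parallel_model).
grads = ["gpu_0/grad", "gpu_1/grad", "gpu_2/grad", "gpu_3/grad"]

# Sum the four blobs across devices; with or without peer access.
reduced = muji.Allreduce(net, grads, reduced_affix="_reduced",
                         gpu_indices=[0, 1, 2, 3])
```
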
bddppq
01f5ee77e3 Skip ConvTranspose ONNX backend tests (#8074) 2018-06-02 09:52:18 -07:00
Varun Jain
68948306bc Support to run ONNX Upsample operator (mode=nearest) in Caffe2 (#8037)
* Added support to run ONNX Upsample operator (mode=nearest) in Caffe2

* adding error checks to upsample

* adding error checks to upsample

* adding error checks to upsample

* changing to np.isclose

* Revert onnx submodule update

* still fixing
2018-06-02 08:45:44 -07:00
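
A minimal sketch of the kind of node this enables, assuming the attribute-based ONNX Upsample of that time (mode and scales attributes) and the Caffe2 ONNX backend's run_node helper:

```
import numpy as np
import onnx
import caffe2.python.onnx.backend as backend

# Nearest-neighbour upsampling by 2x in H and W.
node = onnx.helper.make_node(
    "Upsample",
    inputs=["X"],
    outputs=["Y"],
    mode="nearest",
    scales=[1.0, 1.0, 2.0, 2.0],
)

X = np.random.rand(1, 1, 2, 2).astype(np.float32)
Y = backend.run_node(node, [X])[0]
print(Y.shape)  # expected: (1, 1, 4, 4)
```
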
Bram Wasti
82b981e4db Update from facebook 1ee4edd286a3 (#8040)
* Adding instance weight to batch distill loss

as title

* add bfloat 16-31

added bfloat 16-31 and their respective unit tests

* [CUDA9] Upgrade - fbcode

CUDA9 upgrade diff D5654023 has been out for a while thanks to Pieter. But as time goes on it's becoming quite hard to rebase, because of the symlinks and auto-generated build/config files in tp2. Break D5654023 into two diffs, one touching tp2 config files, and another one touching the fbcode TARGETS file (adding the nvcc flag). These two should be a bit easier to rebase (for detailed procedure see "Test Plan").

This diff can only be committed if:
1. CUDA 9 rpm is rolled out fleet-wide (TBD)
2. NVidia driver 390.40 is rolled out fleet-wide (done)
3. Upgrade CUDA 9.1, cudnn 7.1, nccl 2.1 (done)
4. Make sure all dependents are built (done)
5. Test all C2 operators, PyTorch (see test plan)

* Share intermediate int32 buffer across Conv ops

Adding a known type

* [C2 fix] infer function for ensure_cpu_output_op

this is adding the missing device function for ensure_cpu_output_op

* [int8] Add blob serializer/deserializer for Int8TensorCPU

To export to logfiledb

* [nomnigraph] Add try catch block to optimization passes in predictor

This will catch failures that happen in the optimization pass.

* Caffe2: avoid static initialization order fiasco for CAFFE_ENFORCE

CAFFE_ENFORCE uses a stack trace fetcher, which is currently a
global static variable. If CAFFE_ENFORCE is used at static initialization
time, this is a SIOF. Recently CAFFE_ENFORCE was added into init
function registration, so we started to see this.

A Meyers singleton provides safety here. If the stack trace
fetcher has not been registered yet, it just uses a dummy one.

* NUMA support in SparseNN CPU benchmark

Adding support for NUMA in SparseNN CPU benchmark

* [mobile-roofline] Add logging needed for roofline model

This should be all that's needed

* Let the operators use the same input if the operators are not chained

Otherwise, we would have to change the input data dims.

* fix null-pointer-use UBSAN errors in reshape_op.h

* revert previous fix on input blob name

as title

* Adding flag to let MineHardNegative automatically extract single value from dict

Model exporter requires the output of the model to be a struct. This makes it convenient to use those models directly in MineHardNegative by allowing automatic extraction of the single element of the dict, which is a common use case.

* Reverting change that broke internal tests back to OSS compatible state
2018-06-01 17:41:09 -04:00
bddppq
d8e28cfec2 Enable ONNX backend Mean tests (#7985) 2018-05-31 21:03:12 +08:00
James Reed
5419c6ecb7 Add unsafe flag to skip checking in prepare (#7832)
* Add unsafe flag to skip checking in prepare

* pop
2018-05-30 11:48:01 -07:00
Sebastian Meßmer
b3e87b1066 Fix fbcode compatibility (#7939) 2018-05-30 13:35:46 -04:00
Sebastian Meßmer
49f8581745
Update from facebook (#7855)
* [mpscnn] MPSCNNChannelShuffle

att

* [Easy] Adding tags as an argument to the functional layer

Without it "tags" would be added as an argument to the operator.

The change here is based on the assumption that there is no operator that takes "tags" as an argument.

* Fix locally_connected_op schema check.

Fix locally_connected_op schema check.

* [C2] Add TypeAndShape inference for few more operators

As desc

* [c2] Shape inference should support 0 as dimension

Tensors can have 0 in their dimension.

* Make MockHiveReader loop over and support max_examples

Replace DatasetReader with RandomDatasetReader.

So that Mock Hive Reader can simulate a large data input using a small sample file as source.

* Utility function to wipe cache between benchmark runs

Caffe2 benchmark does not wipe out cache between runs, and this potentially creates an unrealistically optimistic picture of performance. This diff adds utility function to wipe out the cache.

* Allow caffe2 GlobalInit to be invoked multiple times

Allow caffe2 GlobalInit to be invoked multiple times. Will re-parse gflags and update logging levels on successive invocations, but will not re-run init functions or perform other one-time initialization. (A usage sketch follows this commit message.)

* Add Caffe2 GlobalInitIsCalledGuard to base net and operator classes

Warn if caffe2's GlobalInit function has not been invoked before creating an operator or net object. This is based on discussion here: https://fb.quip.com/kqGIAbmK7vNG

* Rethrow current exception on failure

Rethrow current exception instead of copy constructing a new one on op failure.

* Make `clone()` return subclass of List/Struct

`clone()` is not working correctly when we subclass those classes

* Wipe the cache before the net run

the util function is copied from D7409424
will rebase once D7409424 is landed.

* [Caffe2] [Mobile] Support utils/cast.h::GetCastDataType with LITE_PROTO builds

* Correct includes

async_polling include -> async_base include

* Prepare execution flags for executor migration

Making async_scheduling aware of underlying net type to prepare for executor
migration

* Add operator level observers into async executor

Adding operator level observers into RunAsync operators' calls

* Cleanup TEST_Benchmark

Remove duplicate code and provide default implementation in NetBase

* [C2] Fix type and shape inference for binary comparison ops

As desc.

* Add GlobalInit to predictor to ensure initialization is always done before prediction

FACEBOOK:

Redo D7651453 the correct way.

Now use a static variable for the arguments passed to GLog

* Remove spammy log message

This method is currently used in various places inside Caffe itself.

* Disable events for operators inside a chain

We don't need to use events in operators within a chain because the chain is
always scheduled on a single stream, keeping only first and last event for
scheduling purposes

* Ensure correct finish run order

In rare cases we might call finishRun and trigger net's destruction while
another worker is still holding shared_ptr to a thread pool, that can cause
thread pool destruction from within a worker thread in case no other nets are
using the pool. This diff fixes the order of calling finishRun and also changes
pool() to return raw pointer to keep pool's ownership within the net

* Reduce unnecessary polling

Make sure we don't waste CPU by polling operators that we can set an efficient
callbacks on

* Squash commit of syncing 9506eeb from github to fbcode

Patch xplat buck fix

add virtual destructor to OptimizationPass

add virtual destructor to OptimizationPass

build fixes for sync

build fixes for sync

* Fix net tracing

Fix net tracing from async_scheduling

* Fix logging
2018-05-29 11:38:02 -07:00
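
For the GlobalInit bullet above that points here, a minimal sketch of repeated initialization, assuming the standard workspace.GlobalInit entry point; the flag values are placeholders:

```
from caffe2.python import workspace

# First call: parses the flags and runs the one-time init functions.
workspace.GlobalInit(["caffe2", "--caffe2_log_level=0"])

# Later call: with this change it only re-parses the flags (e.g. to change
# the log level) and does not re-run the init functions.
workspace.GlobalInit(["caffe2", "--caffe2_log_level=2"])
```
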
Orion Reblitz-Richardson
74246c9ba4
Potential fix for RNN test on MKL (#7862) 2018-05-25 16:16:46 -07:00
anderspapitto
d5c466e5ce
RNN export: add transpose to match onnx spec (#7825)
Didn't quite get it right the first time.

fixes https://github.com/pytorch/pytorch/issues/7817
2018-05-25 12:56:57 -07:00
bddppq
93b7b5dddd
Fix trigonometric_op_test failures when running in python3.6 (#7831) 2018-05-24 19:09:35 -07:00
anderspapitto
2271e7d7ab
onnx->caffe2 output: better handling of init/pred splitting (#7820) 2018-05-24 14:49:14 -07:00
Lu Fang
f9633b9542 [Caffe2] Skip some tests to unbreak CI (#7804)
* Skip some tests to unbreak CI

* Pass the opset_version to run_node

* Remove the stale check_graph call, caffe2_net_to_onnx_model will invoke check_model
2018-05-24 00:12:00 -07:00
Lu Fang
1289fc870d
Disable onnx backend node tests with broadcasting (#7730) 2018-05-24 09:15:16 +08:00
bddppq
5316cad5c2
[Easy] Remove unused code (#7782) 2018-05-22 22:32:47 -07:00
bddppq
f94ae3ba1d
Update from facebook (#7696)
* Fix handling of empty batches in SumReduceDimsOp

As titled

* Deferrable async_scheduling finishRun fix

Proper order of finishing run operations in deferrable_async_scheduling net

* Simplify exception handling in async_scheduling

Simplify exception handling, no need to busy wait, thread that processes the
last task can finish the run

* [C2]worker_coordinator_memorize_worker_ids

As titled. This is related to T28689868, where the number of blobs we want to create is equal to the number of worker ids

* Add unit test for nets with no type set

* Ignore total length argument in symbolic_pad_packed_sequence

1. There was a mistake in the code: total_length was added to the wrong symbolic function (pack_padded_sequence) instead of (pad_packed_sequence).
2. No need to throw an exception if total_length is given, since it is only used to enable data_parallel training on multiple GPUs and has nothing to do with ONNX export, so just ignore it. https://fburl.com/tk4gciqp

* Add support for MKLDNN to async_scheduling

Just add MKLDNN as a possible CPU option to async_scheduling's pool function

* [AuFL][ensemble] support branch output for prediction

This diff supports using predictions from different branches and thus enables model ensembling (not fully independent).

* Fix a bug in add_loss in layer_model_helper

As titled.

* Support lradaption for adam

1. lr adaption operator
2. apply to dense adam

* Perf tweaks for async_scheduling

Restore single pool option + remove unnecessary (no-ops) calls

* add quantization to SparseSimdAdagradOp

add a bunch of quantization signatures to SparseSimdAdagradOp, implementations to come next

* [sr] [codemod] Change all SR callsites to use new API

@allow-large-files

This diff refactors all callsites of SR to use the slightly changed API introduced in the diff below. Really what this means is that you need to include the correct header. Also if you were using `ClientFactory::newFactory` you need to not prefix it with `ClientFactory::`.

```
cd ~/fbsource/fbcode
find ./ -type f -exec sed -i -e 's:#include "servicerouter/client/cpp2/ClientFactory.h":#include "servicerouter/client/cpp2/ServiceRouter.h":' -e 's:#include <servicerouter/client/cpp2/ClientFactory.h>:#include <servicerouter/client/cpp2/ServiceRouter.h>:' -e 's/ClientFactory::newFactory(/newFactory(/g' {} \;
```

Also manually fixed spots that couldn't be done automatically (or broke because they depended on transitive includes).

* Back out "Fix handling of empty batches in SumReduceDimsOp"

Original commit changeset: 282da1730cc2 This commit is blocking the
Github->fbcode sync, which really needs to get merged ASAP. D7881937 which this
diff depends on will be reverted in the sync D7990948 which causes this to
break. The sync diff cannot be patched with this reversion because it must be
landed against base revision 5c8c099 , and D7881937 must not be included in the
sync diff because it is breaking GPU tests that are not available in sandcastle
: https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-cuda8.0-cudnn6-ubuntu16.04-test/3638/console
for one example.

* Add the flow to support operator benchmark

1) generate model with the operator 2) upload to everstore 3) generate model spec into json file 4) start running the benchmark

* [tum][gpu] Connect DPM trainer with flow and unit tests

This diff:
- Fix some small bugs for Yiming's recent changes to parallelizer, so it suits real use cases.
- Add correct tags to the TUM code, so we can do data parallel transform
- pass extra info when instantiation.
- add unit test for using DPM in TUM model

After this diff, we can do simple box, multi-gpu fully-sync trainer for TUM in Fblearner workflow, but may still need to do speed benchmarking.

* w/o normalized lradaption for adam dense only

The previous lr adaption includes a normalization step when performing the dot product operation. This is not exactly the same as what is proposed in the paper. I added normalization as an option. Without it, the operator does exactly what the paper proposed. With the option, we add the normalization step.

* [fb] Use SharedPromise in DeferrableAsyncSchedulingNet

This code is to simplify DeferrableAsyncSchedulingNet by removing condition
variable + small fixes

* [tum] implement cuda sparseLengthsMean and LengthsMean

as title

* Adding an optional parameter to allow use of protobufs in InferShapesAndTypes function.

Adding an optional parameter to allow use of protobufs in InferShapesAndTypes function.

* Move feature_to_index to FeatureSpec.feature_to_index

move feature_to_index to FeatureSpec.feature_to_index to avoid override other fields

* [Caffe2] Rename bytes_moved to bytes_written

Just a rename in preparation for supporting bytes_read.

* [c2] fix ReduceFrontSumOp for empty case by setting 0

otherwise, it may use the results from last iteration when it's empty batch.

* [Caffe2] [Int8] Improve Intel CPU performance

* [Easy] Improve PrependDim op logging

as titled

* DBFileReader expand db_path using os.path.expanduser(..)

Since there are many possible use cases where `DBFileReader` reads from a user home path, like `~/local/sample.db`, this saves callers the trouble of calling `os.path.expanduser(db_path)` themselves. (A usage sketch follows this commit message.)

* [Caffe2] Add bytes_read to cost structure

We're adding analytical read bytes to cost functions.  This extends the structure accordingly for all CostInference defined operators.
Additionally, some small bug fixes were performed:
1) Cost functions now extract type information of operands instead of assuming float

* Fix sleef on aarch64 for hhvm

@bypass-lint

Rename flag

* Remove duplicated part in caffe2/ideep/operators/conv_op.cc

should be sync error

* Rename test helper function test_adagrad_sparse_helper to adagrad_sparse_test_helper to avoid confusing pytest
2018-05-19 23:10:48 -07:00
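
For the DBFileReader bullet above that points here, a rough usage sketch; the db_type value and the existence of the sample file are assumptions:

```
from caffe2.python.db_file_reader import DBFileReader

# The home-relative path no longer needs a manual os.path.expanduser() call;
# the reader is expected to expand it internally after this change.
reader = DBFileReader(db_path="~/local/sample.db", db_type="minidb")
```
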
Paul Jesse Hellemn
48bf733480 Changes from D7881937 and D7963936 plus an edit (#7605)
* Changes from D7881937 and D7963936 plus an edit

* D8038158

* Another change from cxj
2018-05-18 20:59:16 -07:00
bddppq
bc4feab3e3
Fix flaky atomic iter test (#7649) 2018-05-17 21:17:29 -07:00
James Sun
b4d5e67e5f Add asin, acos, tan, atan operators (#7600) 2018-05-16 18:09:26 -07:00
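
A quick sketch exercising the new trigonometric operators, assuming they follow the usual single-input, single-output elementwise pattern:

```
import numpy as np
from caffe2.python import core, workspace

X = np.random.uniform(-0.9, 0.9, size=(4,)).astype(np.float32)
workspace.FeedBlob("X", X)

# Compare each new operator against its numpy reference.
for op_name, ref in [("Asin", np.arcsin), ("Acos", np.arccos),
                     ("Tan", np.tan), ("Atan", np.arctan)]:
    workspace.RunOperatorOnce(core.CreateOperator(op_name, ["X"], ["Y"]))
    assert np.allclose(workspace.FetchBlob("Y"), ref(X), atol=1e-5)
```
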