Commit Graph

978 Commits

Author SHA1 Message Date
Yanghan Wang
a285cbcccf support different class modes for bbox in box_with_nms_limit_op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19820

Reviewed By: newstzpz

Differential Revision: D15112955

fbshipit-source-id: a757622a32cff7159c39735607103138dbbafc24
2019-04-30 16:02:44 -07:00
Chandler Zuo
472be69a73 Avoid Output Uninitialized Blobs in Load with load_all=1 (#19133)
Summary:
When output blob names are specified while load_all=1, output blob names are ignored. However, this behavior is not documented. In this diff, we just disallow users to provide blob names when load_all=1.

See discussion at https://fb.workplace.com/groups/1405155842844877/permalink/2714909788536136/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19133

Reviewed By: dzhulgakov

Differential Revision: D14883698

Pulled By: chandlerzuo

fbshipit-source-id: 6e4171e36c4ccc4f857e79da98b858a06b7d8ad6
2019-04-27 10:45:44 -07:00
Xiaomeng Yang
2ce39de3fc Add elementwise_affine for layer_norm_op (#19713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19713

Add elementwise_affine for layer_norm_op

Reviewed By: houseroad

Differential Revision: D15075454

fbshipit-source-id: e8a7d3da1c81e49fa55323f5e74a68bc4ef8d83f
2019-04-26 17:20:01 -07:00
Xiaomeng Yang
f5fe7aa0b2 Fix relu bug for empty tensor (#19451)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19451

Fix relu bug for empty tensor

Reviewed By: xianjiec

Differential Revision: D15009811

fbshipit-source-id: b75e567c3bec08d7d12b950d8f1380c50c138704
2019-04-19 15:21:07 -07:00
Yinghai Lu
f1f31b634d Eliminate AdjustBatch ops (#19083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19083

As we have discussed, there are too many of AdjustBatch ops and they incur reallocation overhead and affects the performance. We will eliminate these ops by
- inling the input adjust batch op into Glow
- inling the output adjust batch op into OnnxifiOp and do that only conditionally.

This is the C2 part of the change and requires change from Glow side to work e2e.

Reviewed By: rdzhabarov

Differential Revision: D14860582

fbshipit-source-id: ac2588b894bac25735babb62b1924acc559face6
2019-04-17 10:00:25 -07:00
Huamin Li
c480798a1c use C10_REGISTER for GELU op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19090

Reviewed By: BIT-silence

Differential Revision: D14864737

fbshipit-source-id: 8debd53171f7068726f0ab777a13ca46becbfbdf
2019-04-12 11:41:04 -07:00
Xiaomeng Yang
fd40c0eba0 Add gelu op (#18992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18992

Add gelu op

Reviewed By: houseroad

Differential Revision: D14814811

fbshipit-source-id: 00f126b8b83763c57ebbf28fbd2de5a8fab6d491
2019-04-08 21:58:29 -07:00
Lu Fang
443a58e03d Export C10 operator in PyTorch Model (#18210)
Summary:
Almost there, feel free to review.

these c10 operators are exported to _caffe2 domain.

TODO:

- [x] let the onnx checker pass
- [x] test tensor list as argument
- [x] test caffe2 backend and converter
- [x] check the c10 schema can be exported to onnx
- [x] refactor the test case to share some code
- [x] fix the problem in ONNX_ATEN_FALLBACK
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18210

Reviewed By: zrphercule

Differential Revision: D14600916

Pulled By: houseroad

fbshipit-source-id: 2592a75f21098fb6ceb38c5d00ee40e9e01cd144
2019-04-08 16:06:00 -07:00
Xiaomeng Yang
b145dcca04 Add support for group ConvTranspose (#18794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18794

Add support for group ConvTranspose

Reviewed By: houseroad

Differential Revision: D14741327

fbshipit-source-id: 5d947ca044bf8495dd7f8f56122441ebbcc6c7e4
2019-04-04 11:52:06 -07:00
Duc Ngo
16f07d7dac caffe2 - set up correct inheritance structure for remaining operator test classes (#18622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18622

Set up correct inheritance structure for remaining operator test classes

Reviewed By: ezyang

Differential Revision: D14685941

fbshipit-source-id: a6b1b3be325935b7fec7515be13a4994b3016bf0
2019-04-01 15:53:22 -07:00
Yanghan Wang
f4e35d30ed register BoxWithNMSLimit with C10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17956

Reviewed By: houseroad

Differential Revision: D14417300

fbshipit-source-id: eb5e2ba84513b3b7bfa509dc442424b13fe9148f
2019-03-29 13:41:40 -07:00
Ahmed Aly
9eb0f435d9 Inference LSTM integration test (#18559)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18559

Adding integration test for inference LSTM

Reviewed By: houseroad

Differential Revision: D14656698

fbshipit-source-id: 80fb2a72be30fcb695f4471b72bf9d6e3965bf81
2019-03-28 11:31:06 -07:00
Duc Ngo
6a1a019c0a caffe2 - support flaky operator tests for caffe2 build (#18155)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18155

- Make a python decorator caffe2_flaky for caffe2 operator unit tests.
- The environment variable CAFFE2_RUN_FLAKY_TESTS are now used to mark flaky test mode

During test run,
- If flaky tests mode are on, only flaky tests are run
- If flaky tests mode are off, only non-flaky tests are run

Mark ctc_beam_search_decoder_op_test as flaky

Reviewed By: ezyang, salexspb

Differential Revision: D14468816

fbshipit-source-id: dceb4a48daeb5437ad9cc714bef3343e9761f3a4
2019-03-25 16:58:34 -07:00
Gerard Goossen
46990c20fa Verify def before infer fensor (#18129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18129

A lot of tensor interference function assume the operator passes the schema.
So call Verity to make sure this is actually the case.

Created diff before to add checking in Concat (https://github.com/pytorch/pytorch/pull/17110), but I encountered lot more places where this is assumed (for example ElementwiseOpShapeInference)

Reviewed By: mdschatz

Differential Revision: D14503933

fbshipit-source-id: cf0097b8c3e4beb1cded6b61e092a6adee4b8fcb
2019-03-22 06:36:25 -07:00
Jongsoo Park
c7448aa13c remove unused parameters in optimizer tests (#18084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18084

data_strategy parameter was not used in some of unit tests for optimizers

Reviewed By: hyuen

Differential Revision: D14487830

fbshipit-source-id: d757cd06aa2965f4c0570a4a18ba090b98820ef4
2019-03-15 18:06:15 -07:00
Sebastian Messmer
7a3488e0fc Expose c10 cuda ops to caffe2 (#18036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18036

- Add macros to export c10 cuda operators to caffe2 frontend
- Instead of having a separate caffe2 registry for the c10 operator wrappers, use the existing caffe2 registries

Reviewed By: ezyang

Differential Revision: D14467495

fbshipit-source-id: 7715ed2e38d2bbe16f1446ae82c17193a3fabcb9
2019-03-15 16:58:12 -07:00
Yanghan Wang
53fb9a462a register RoIAlign with C10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17889

Reviewed By: smessmer

Differential Revision: D14411630

fbshipit-source-id: c3b7941d725ae2c78e8d79f52a7983db92b75807
2019-03-14 11:55:29 -07:00
Jongsoo Park
8bd9465b79 make momentum non negative in adagrad test (#18009)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18009

momentum should be initialized with non-negative values

Reviewed By: hyuen

Differential Revision: D14450841

fbshipit-source-id: 5bbbd11645db9e6f2dc42b26a00ff3caf378c59f
2019-03-14 03:15:07 -07:00
Xiaomeng Yang
54b33503ec Optimize channel_stats_op (#16243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16243

Optimize channel_stats_op and add NHWC impl

Reviewed By: takatosp1

Differential Revision: D13775515

fbshipit-source-id: decb889e646f5316d4afefdf9f9b6bc6343613cd
2019-03-12 12:08:00 -07:00
youkaichao
b87abdfc12 typo fix
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17653

Differential Revision: D14302003

Pulled By: ezyang

fbshipit-source-id: 8ad90985a392b07127c7e315d4e74ce77962b573
2019-03-06 11:36:44 -08:00
Deepali Chourasia
e3516d0a95 omit group conv NHWC test for GPU (#17715)
Summary:
Observed the test `TestGroupConvolution.test_group_convolution` to fail with the following error:

```
Falsifying example: test_group_convolution(self=<caffe2.python.operator_test.group_conv_test.TestGroupConvolution testMethod=test_group_convolution>, stride=3, pad=0, kernel=5, size=8, group=4, input_channels_per_group=7, output_channels_per_group=8, batch_size=2, order='NHWC', engine='', use_bias=False, gc=, dc=[, device_type: 1])

You can reproduce this example by temporarily adding reproduce_failure('3.59.1', b'AAAA') as a decorator on your test case
```
This example generated by hypothesis has `group=2, order='NHWC' and dc=[, device_type: 1])`.
I think this example should be skipped.

I have mimicked the change corresponding to [PR#13554](https://github.com/pytorch/pytorch/pull/13554) to skip this example.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17715

Differential Revision: D14346642

Pulled By: ezyang

fbshipit-source-id: b1f1fef09f625fdb43d31c7213854e61a96381ba
2019-03-06 11:32:35 -08:00
Sebastian Messmer
910519e45b Expose cuda kernel for caffe2::GenerateProposals
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17066

Reviewed By: ezyang, wat3rBro

Differential Revision: D14071130

fbshipit-source-id: 6fe26503f6069c36ec31d6c09b549b932d5db242
2019-03-04 14:59:08 -08:00
rohithkrn
8c72217817 Enable boolean_mask, adadelta, adagrad fp16 on ROCm (#17235)
Summary:
-  Fix bugs, indentation for adadelta and adagrad tests to enable fp16
- Enable boolean_mask fp16  on ROCm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17235

Differential Revision: D14240828

Pulled By: bddppq

fbshipit-source-id: ab6e8f38aa7afb83b4b879f2f4cf2277c643198f
2019-02-27 10:07:36 -08:00
Peizhao Zhang
54e4c4d7de Removed obsolete argument correct_transform_coords in bbox_transform op. (#16723)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16723

Removed obsolete argument correct_transform_coords in bbox_transform op.
* It was only for backward compatibility. We should not have models using it now.

Differential Revision: D13937430

fbshipit-source-id: 504bb066137ce408c12dc9dcc2e0a513bad9b7ee
2019-02-20 13:22:33 -08:00
Sebastian Messmer
9696fee635 Register CUDA kernels for caffe2 operators (#16691)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16691

Previous diffs already introduced a macro that registers caffe2 CPU kernels with c10.
This now also registers the CUDA kernels with it.

Reviewed By: bwasti

Differential Revision: D13901619

fbshipit-source-id: c15e5b7081ff10e5219af460779b88d6e091a6a6
2019-02-12 17:24:01 -08:00
Sebastian Messmer
920c684367 Expose GenerateProposals to PyTorch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16880

Reviewed By: bwasti

Differential Revision: D13998092

fbshipit-source-id: 23ab886ba137377312557fa718f262f4c8149cc7
2019-02-11 14:15:47 -08:00
Sebastian Messmer
0c02d317ea Expose BBoxTransform to pytorch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16879

Reviewed By: bwasti

Differential Revision: D13998093

fbshipit-source-id: ddfe4bff83e9a1a4cedf1e520e6d2977b21cb3af
2019-02-11 14:15:45 -08:00
peter.yeh@amd.com
c65b03b9f8 Enable arg_ops_test/unique_ops_test on AMD/rocm (#16853)
Summary:
Verified both tests are passing on rocm 2.1 env.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16853

Differential Revision: D13996279

Pulled By: bddppq

fbshipit-source-id: c0df610d7d9ca8d80ed2d1339cdadef59105a71c
2019-02-07 16:51:15 -08:00
Sebastian Messmer
64339dbd51 Fix and re-enable test case (#16643)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16643

The test was disabled in D13908117 because it conflicted with another diff that was about to land.
Now fixed the merge conflict and re-landing it.

Reviewed By: ezyang

Differential Revision: D13911775

fbshipit-source-id: b790f1c3a3f207916eea41ac93bc104d011f629b
2019-02-07 13:58:16 -08:00
Sebastian Messmer
6750e1e3e9 C10_REGISTER_CAFFE2_OPERATOR: Macro for registering c2 kernels (#16548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16548

With this macro, a caffe2 operator can now directly be registered with c10.
No need to write custom wrapper kernels anymore.

Differential Revision: D13877076

fbshipit-source-id: e56846238c5bb4b1989b79855fd44d5ecf089c9c
2019-02-07 13:58:14 -08:00
rohithkrn
aa88c2c0b6 Unify gpu_support variable in python tests (#16748)
Summary:
Assign `has_gpu_support = has_cuda_support or has_hip_support` and make according changes in python tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16748

Differential Revision: D13983132

Pulled By: bddppq

fbshipit-source-id: ca496fd8c6ae3549b736bebd3ace7fa20a6dad7f
2019-02-07 00:29:51 -08:00
Yinghai Lu
e5e0bf4152 Add AdjustBatch Op (#16676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16676

This op is used for changing batch size (first dimension) of the tensor.

Reviewed By: bertmaher, ipiszy

Differential Revision: D13929200

fbshipit-source-id: 4f2c3faec072d468be8301bf00c80d33adb3b5b3
2019-02-06 19:15:41 -08:00
Jongsoo Park
929cd23da1 no EIGEN engine for DeformConv (#16785)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16785

There's no EIGEN engine implemented for DeformConv but unit test was checking it.

Reviewed By: BIT-silence

Differential Revision: D13967306

fbshipit-source-id: e29c19f59f5700fc0501c59f45d60443b87ffedc
2019-02-06 11:59:31 -08:00
Jongsoo Park
8d4b2db529 format deform_conv_test.py (#16786)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16786

Format to prepare D13967306

Reviewed By: BIT-silence

Differential Revision: D13967317

fbshipit-source-id: 2de895f8474b04c55ba067fbf788c553dc010c60
2019-02-06 11:59:29 -08:00
Edward Yang
a3f600e394 Revert D13854304: [redo][c10] LayerNorm Registration Example
Differential Revision:
D13854304

Original commit changeset: ec463ce22721

fbshipit-source-id: 4262b9a2ef486e1c7c0283ea021331ac97cc5f56
2019-02-06 08:26:23 -08:00
Edward Yang
fc0e88dd77 Revert D13855525: [c10] Expose RoIAlign to torch
Differential Revision:
D13855525

Original commit changeset: cfee7bb1544d

fbshipit-source-id: 0b4124b78c4082b52e592a1275069c879a9aed39
2019-02-06 08:26:22 -08:00
Edward Yang
33a6a7a3ea Revert D13856086: [c10] Expose GenerateProposals to torch
Differential Revision:
D13856086

Original commit changeset: a4873646a71a

fbshipit-source-id: 79b634426404236ddbc407d3796a350ad3dae5ca
2019-02-06 08:26:20 -08:00
Edward Yang
018485130f Revert D13864292: [c10] Expose BBoxTransform to pytorch
Differential Revision:
D13864292

Original commit changeset: 1f57664e7834

fbshipit-source-id: 37663b7e8213185ecaa5c219076fc7de64704549
2019-02-06 08:26:18 -08:00
Edward Yang
c0a7bf94ed Revert D13865221: [c10] Expose BoxWithNMSLimit
Differential Revision:
D13865221

Original commit changeset: 8a3f1d420183

fbshipit-source-id: 0057be9619b660dcad8c01bae67b54400127577e
2019-02-06 08:26:17 -08:00
Edward Yang
cda43336d4 Revert D13866214: [c10] Expose HeatmapMaxKeypoints to torch
Differential Revision:
D13866214

Original commit changeset: 2ca79037fc07

fbshipit-source-id: d2c653f4f32cf0ea76875888f3523c0dc7db9960
2019-02-06 08:26:16 -08:00
Bram Wasti
a9713d07b0 Expose HeatmapMaxKeypoints to torch (#16528)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16528

..

Reviewed By: smessmer

Differential Revision: D13866214

fbshipit-source-id: 2ca79037fc070bade5542345af5ce09f88beda44
2019-02-05 12:56:58 -08:00
Bram Wasti
3df7b321cc Expose BoxWithNMSLimit (#16529)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16529

..

Reviewed By: smessmer

Differential Revision: D13865221

fbshipit-source-id: 8a3f1d420183ed5ae51b3c9e4eb6e033078c7ae4
2019-02-05 12:56:56 -08:00
Bram Wasti
add39b85cc Expose BBoxTransform to pytorch (#16530)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16530

..

Reviewed By: smessmer

Differential Revision: D13864292

fbshipit-source-id: 1f57664e78347e72c0087aa3d825a6a9517c1945
2019-02-05 12:56:54 -08:00
Bram Wasti
f33a2b960e Expose GenerateProposals to torch (#16477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16477

expose generateproposals to torch

Reviewed By: smessmer

Differential Revision: D13856086

fbshipit-source-id: a4873646a71a6b6c01740d21729e827f4b36588f
2019-02-05 12:56:52 -08:00
Bram Wasti
f5d4636021 Expose RoIAlign to torch (#16476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16476

enable calling roialign (caffe2) from torch frontend

Reviewed By: smessmer

Differential Revision: D13855525

fbshipit-source-id: cfee7bb1544dc58df4231604ba01d61ca905ae3f
2019-02-05 12:56:50 -08:00
Bram Wasti
240240bb10 LayerNorm Registration Example (#16478)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16478

This diff includes an example registration of a caffe2 op in torch.  A previous attempt ran into a static initialization order bug.

Reviewed By: smessmer

Differential Revision: D13854304

fbshipit-source-id: ec463ce2272126d08a5163d1599361ee5b718bbc
2019-02-05 12:56:48 -08:00
Sebastian Messmer
f36f3cce9a Simplify layer_norm_op_test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16570

Reviewed By: ezyang

Differential Revision: D13883913

fbshipit-source-id: 7437d3cbc00c0de92bb01562c620cb658aa9f0d3
2019-02-01 21:34:18 -08:00
Xiaomeng Yang
4ae9ab24b6 Update conv_base to support empty batch (#16603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16603

Update conv_base to support empty batch

Reviewed By: houseroad

Differential Revision: D13894111

fbshipit-source-id: fc4370ff16ba6046f374e77bd845d28e6af05ea3
2019-01-31 23:46:18 -08:00
Dmytro Dzhulgakov
51752e09c6 Disable layernorm_c10 test for now (#16630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16630

two PRs landed concurrently - enforcing tensor constraints and refactoring c10. Since it's not a prod code - disable test and I'll let Sebastian to fix it properly.

Reviewed By: ezyang

Differential Revision: D13908117

fbshipit-source-id: 381c5626078b794afa1fc7a95cb1ea529650424c
2019-01-31 15:47:13 -08:00
Sebastian Messmer
c43917b0a3 Add a test case calling caffe2 layer_norm from caffe2 executor but through the c10 dispatcher
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16283

Reviewed By: ezyang

Differential Revision: D13792591

fbshipit-source-id: 9c190649e38e8706549102b2e136ceaf508eb37f
2019-01-30 13:16:47 -08:00
Sebastian Messmer
80f4374dde Handle stack correctly (#16246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16246

The op schema says it returns multiple values, so let's actually return multiple values instead of one tuple.
For some reason, this did work when called from python (probably some auto-unpacking),
but once called from JIT, it segfaulted. This diff fixes that.

Reviewed By: dzhulgakov

Differential Revision: D13780147

fbshipit-source-id: fe94f82f4c53b7454f77c4484fca4ac9dc444475
2019-01-28 11:46:03 -08:00
Juan Miguel Pino
41e9b092a9 Revert D13821061: [redo][c10] layernorm example
Differential Revision:
D13821061

Original commit changeset: 82f0dade0145

fbshipit-source-id: e5b0b1bab0c9e731ae04add35e9a6c91656dd178
2019-01-25 22:52:04 -08:00
Bram Wasti
27a1ba3ef2 layernorm example (#16374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16374

this fixes the original attempt in OSS (adds to CMake and python build files)

Reviewed By: smessmer

Differential Revision: D13821061

fbshipit-source-id: 82f0dade0145fd04bdf8e3cb3954b5790e918162
2019-01-25 16:52:33 -08:00
Bram Wasti
958f846fb3 Back out "[c10] layernorm example"
Summary: Original commit changeset: 87240ca7f48d

Reviewed By: bddppq

Differential Revision: D13816657

fbshipit-source-id: bafcf0779d811c7e4a134cfb323a89352fa8c180
2019-01-25 10:22:30 -08:00
Bram Wasti
265ed8ff45 layernorm example (#16350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16350

Example usage of the new caffe2 integration

Reviewed By: smessmer

Differential Revision: D13408546

fbshipit-source-id: 87240ca7f48d653a70241d243aa0eb25efa67611
2019-01-24 22:28:22 -08:00
Jongsoo Park
6700eff03e disable testing group conv with EIGEN engine (#16335)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16335

group conv is not implemented with EIGEN engine so this diff disables related tests

Reviewed By: jamesr66a

Differential Revision: D13807204

fbshipit-source-id: 41f6de43da40882f57e64474520e185733caefb7
2019-01-24 16:39:20 -08:00
Jongsoo Park
f0dd85d141 reduce parameter space of test_1x1_conv to avoid timeout (#16223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16223

As title says

Reviewed By: jamesr66a

Differential Revision: D13758202

fbshipit-source-id: 3cdffb80a5dad53b29e65e8eb0ae128edba70dbb
2019-01-24 14:17:11 -08:00
bddppq
1a09a2a27f Export PyTorch erf to ONNX Erf and add Caffe2 Erf operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16106

Differential Revision: D13709490

Pulled By: bddppq

fbshipit-source-id: 1b5b32261f06543371f7bd7ac9b11957a5eb4ad0
2019-01-17 09:18:08 -08:00
Xiaomeng Yang
7536887cb7 Add count_include_pad for avg_pool on CuDNN (#16100)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16100

Add count_include_pad for avg_pool on CuDNN

Reviewed By: houseroad

Differential Revision: D13707959

fbshipit-source-id: 261f5d116066fef75cf9a5787dfbc5d12b5b9f9b
2019-01-17 02:10:12 -08:00
Xiaomeng Yang
7a5f782c2e Fix max_pool_grad test (#16088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16088

Fix max_pool_grad test

Reviewed By: houseroad

Differential Revision: D13700917

fbshipit-source-id: f4f942ee920bcd943c38a8f8a6aafd1d13c4515f
2019-01-16 15:32:27 -08:00
Xiaomeng Yang
13f38ab79d Add count_include_pad to average_pool_gradient_op (#15997)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15997

Add count_include_pad to average_pool_gradient_op

Reviewed By: houseroad

Differential Revision: D13648339

fbshipit-source-id: 205cb2acb32dc24a85256b628298b1a11f0ffa2c
2019-01-15 16:56:40 -08:00
Sebastian Messmer
57b5e7572b Test cases for calling caffe2 LayerNorm from PyTorch and JIT
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15895

Reviewed By: dzhulgakov

Differential Revision: D13615336

fbshipit-source-id: de28fef8ce025d6d37a4c80c029ec97b7195cfd9
2019-01-15 12:03:57 -08:00
Sebastian Messmer
4ed9de8680 Remove code duplication (#15880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15880

The layer_norm reference was implemented twice. Removing one of them.

Reviewed By: dzhulgakov

Differential Revision: D13611232

fbshipit-source-id: cee96c78d3255c3a4e34300693bf9260cf096615
2019-01-14 17:59:37 -08:00
Jesse Hellemn
8964a2e6e6 Split Caffe2 CI into cmake-only and python builds (#15917)
Summary:
bypass-lint

- Change all Caffe2 builds to use setup.py instead of cmake
- Add a -cmake- Caffe2 build configuration that uses cmake and only builds cpp
- Move skipIfCI logic from onnx test scripts to the rest of CI logic
- Removal of old PYTHONPATH/LD_LIBRARY_PATH/etc. env management
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15917

Reviewed By: orionr

Differential Revision: D13637583

Pulled By: pjh5

fbshipit-source-id: c5c5639db0251ba12b6e4b51b2ac3b26a8953153
2019-01-14 15:20:44 -08:00
Sergei Nikolaev
a282378baf Caffe 2: Reshape Op upgrade (#15380)
Summary:
This is follow up on #13945 where we had to turn off some TRT tests because some ops were not ready to accept ONNX opset 9+ models. This PR fixes Reshape.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15380

Differential Revision: D13649825

Pulled By: houseroad

fbshipit-source-id: b72e62803de5b63cc001c3fe4b3bf64dfa996e94
2019-01-13 22:49:40 -08:00
Andre Georg Holzner
961f829067 deduplicated code in elementwise_op_broadcast_test.py (#15865)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15865

factored out code used in tests for operators Add, Mul and Sub
into two new methods: a first one to generate the test vectors, a second
one to run the actual tests given a caffe2 and python operator.

Reviewed By: houseroad

Differential Revision: D13526955

fbshipit-source-id: 8970ba5a1305ca19a54a14b51816d4a19f19d678
2019-01-09 03:07:22 -08:00
David Carrillo Cisneros
2b22612289 Add NHWC support to Resize Operator (#15553)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15553

Add unit test and implementation of NHWC layout for Resize operator.

Also, add pragma parallel loop to old NCHWC layout.

Reviewed By: jspark1105

Differential Revision: D13540762

fbshipit-source-id: eebf252bf0d1efdff180a171d804181045f100a5
2019-01-08 16:44:17 -08:00
Jongsoo Park
1159302ab1 bug fix in 3d group conv (#15625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15625

3D group conv (both NCHW and NHWC layout) was not correct.
Added group=2 in test_1d_convolution and test_3d_convolution in conv_test

Reviewed By: protonu

Differential Revision: D13562099

fbshipit-source-id: 586e8a7574a2764f2a3b559db6c2415b3ab90453
2019-01-03 09:46:49 -08:00
Jongsoo Park
bee6c6761e format conv_test.py to prepare D13562099 (#15632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15632

Just formatting and a few lints.

Reviewed By: yinghai

Differential Revision: D13562403

fbshipit-source-id: c56f8ee61f68cdaccc0828a764ff729454f68259
2019-01-02 11:34:30 -08:00
Jongsoo Park
d53012b4fe add NCHW2NHWC and NHWC2NCHW in utils.py (#15588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15588

Use NHWC2NCHW or NCHW2NHWC functions which is easier to understand compared to code using transpose and generalizable to non-2D convolutions.

Reviewed By: csummersea

Differential Revision: D13557674

fbshipit-source-id: c4fdb8850503ea58f6b17b188513ae2b29691ec0
2018-12-28 17:34:50 -08:00
Jongsoo Park
4e4ef0cffb add rowwise adagrad lp test (#15082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15082

We didn't have unit test for low-precision rowwise adagrad

Reviewed By: chocjy

Differential Revision: D13300732

fbshipit-source-id: 46e7bdfc82c5a6855eeb6f653c0a96b0b3a20546
2018-12-22 10:25:39 -08:00
Jongsoo Park
e012b183dd handle empty inputs to SparseLengthsMean correctly (#15389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15389

SparseLengthsMean was generating uninitialized data for empty inputs (lengths == 0). We should return zeros.
The unit tests were also not covering this special case which is fixed by this diff.

Reviewed By: salexspb

Differential Revision: D13515970

fbshipit-source-id: 3c35265638f64f13f0262cee930c94f8628005da
2018-12-21 22:20:14 -08:00
Jongsoo Park
f52f68bcf9 format specialized_segment_ops_test.py to prepare D13515970 (#15408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15408

Applied formatting to specialized_segment_ops_test.py to prepare D13515970

Reviewed By: salexspb

Differential Revision: D13520300

fbshipit-source-id: c3250b6abe8087c607f65ae60d1da61bd46c342b
2018-12-20 23:44:47 -08:00
Edward Yang
26b04523b1 Record Caffe2's current stream ID in c10_cuda. (#15174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15174

Previously, Caffe2 maintained a separate per-thread per-device
current logical CUDA stream ID.  In this PR, we switch Caffe2 over
to using c10::Stream to manage the current stream, and also
manage the allocation of cudaStream_t objects.

This results in a slight behavior change: previously, Caffe2
would have been willing to allocate an arbitrary number of
CUDA streams, depending on how high the logical stream IDs
went.  The c10::Stream pool has a fixed number of streams, once
you exceed it, it wraps around.

Reviewed By: dzhulgakov

Differential Revision: D13451550

fbshipit-source-id: da6cf33ee026932a2d873835f6e090f7b8a7d8dc
2018-12-20 21:54:05 -08:00
Bill Li
3681bf7cff add dense vector to id_list operator (#15090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15090

as title
step 2 of the linked task

Reviewed By: ellie-wen

Differential Revision: D13425977

fbshipit-source-id: f3538ed68f42470ba39c5b779af764d4a5591a9d
2018-12-18 16:27:38 -08:00
rohithkrn
763b9954f3 FP16MomentumSGDUpdate Op fix and enable for ROCm (#15150)
Summary:
1. Fix a bug in FP16MomentumSGDUpdate operator
2. Enable operator for ROCm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15150

Differential Revision: D13473145

Pulled By: bddppq

fbshipit-source-id: 4c5c5f30cb9bba658e3639dbe193fa08a304d306
2018-12-14 16:33:45 -08:00
Xianjie Chen
fabd23cb2d support casting to string (#15110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15110

support casting to string on CPU

Reviewed By: intermilan

Differential Revision: D13429381

fbshipit-source-id: b737a1ba1237b10f692d5c42b42a544b94ba9fd1
2018-12-12 21:33:58 -08:00
Jongsoo Park
cff509e2b1 share code between adagrad and rowwise adagrad tests (#14692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14692

Remove some code duplication

Reviewed By: chocjy

Differential Revision: D13296731

fbshipit-source-id: 5924e037ca64fc4b89234be922bc5ca47fb8bd32
2018-12-10 22:10:39 -08:00
bddppq
45dfc6764e Enable more caffe2 fp16 rocm tests (#15040)
Summary:
cc rohithkrn petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15040

Reviewed By: houseroad

Differential Revision: D13413068

Pulled By: bddppq

fbshipit-source-id: b2967f16f8da0b9e80083138fb8632c14e9e9b63
2018-12-10 21:30:21 -08:00
rohithkrn
7e2b074219 Integrate rocBLAS fp16 api into Caffe2 (#14882)
Summary:
This PR integrates rocBLAS half and mixed precision APIs in to Caffe2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14882

Differential Revision: D13407840

Pulled By: bddppq

fbshipit-source-id: 75cb0d74da066776fa66575f1d255e879d36121e
2018-12-10 17:54:06 -08:00
rohithkrn
11a9248d01 Enable fp16 for MIOPEN operators in Caffe2 (#14905)
Summary:
This PR enables fp16 MIOPEN operators in Caffe2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14905

Differential Revision: D13383439

Pulled By: bddppq

fbshipit-source-id: 840afa8d08bef2952ca0039dee2423f1542bb330
2018-12-07 17:26:44 -08:00
lcskrishna
12addc64a6 Fixed MIOpen RNN Segfault issue and enabled RNN test (#14810)
Summary:
This pull request contains changes for:
1. Added MIOpen RNN API miopenGetRNNLayerBiasSize and miopenGetRNNLayerParamSize.
2. Fixed usage of API miopenGetRNNLayerParam.
3. Modifying the RNN test to run using MIOpen engine.

Differential Revision: D13355699

Pulled By: bddppq

fbshipit-source-id: 6f750657f8049c5446eca893880b397804120b69
2018-12-05 23:54:31 -08:00
Huan Gui
ba287eebca Fix clip gradient with empty input (#14709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14709

As titled

Reviewed By: Wakeupbuddy

Differential Revision: D13305554

fbshipit-source-id: 380062d4b0e4f9dc0207a27766cac7b8d05384d5
2018-12-05 22:53:25 -08:00
Michael Antonov
773f4d8081 Implements Gather operator for arbitrary axis, sharing the code with BatchGather. (#13756)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13756

This implements general Gather operator for arbitrary axis, sharing the code with BatchGather.
 - CPU gather & batch gather logic is now shared through caffe2::gather_helper, for any axis.
 - Shared CUDA kernel moved to gather_op.cuh, for any axis.
 - Gradients of axis > 0 delegate to BatchGatherGradientOp which now has axis argument.
 - BatchGatherOp doc strings updated to have correct rank (q + (r -1)) and output.
 - Added tests for axis == 2.

GatherOp supports index wrapping for axis == 0 by default, which was earlier for ONNX.
This diff also extends it to work in Cuda kernel. Added "wrap_indices" argument which specifies
wheather this wrapping should be done; set it to true if you'd like wrapping for any axis.

TBD: Update gradients to support negative indices (separate diff).
TBD: Once we have operator versioning, we'd like to update GatherOp to NOT support axis 0 wrapping
by default, but rather do it only if wrap_indices is set.

Reviewed By: dzhulgakov

Differential Revision: D12983815

fbshipit-source-id: 8add9d67b47fe8c5ba7a335f581ca0530b205cd7
2018-12-04 11:54:28 -08:00
Yan Zhu
aeb38cfcea cuda implementation for PackSegment to support presence mask (#14635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14635

as title

Reviewed By: enosair

Differential Revision: D13254097

fbshipit-source-id: b9f40109e2889907c925f9a4df9da14f67f45f38
2018-11-30 16:54:10 -08:00
rohithkrn
0d663cec30 Unify cuda and hip device types in Caffe2 python front end (#14221)
Summary:
Goal of this PR is to unify cuda and hip device types in caffe2 python front end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14221

Differential Revision: D13148564

Pulled By: bddppq

fbshipit-source-id: ef9bd2c7d238200165f217097ac5727e686d887b
2018-11-29 14:00:16 -08:00
Ashish
f4e502a8c5 Added MIOpen conv transpose op (#13938)
Summary:
This pull request contains changes for:
1. Removing ConvTranspose related changes from caffe2/operators/hip/conv_op_miopen.cc
2. Adding the file caffe2/operators/hip/conv_transpose_op_miopen.cc
3. Modifying the tests to run convTranspose op using MIOpen engine

Differential Revision: D13055099

Pulled By: bddppq

fbshipit-source-id: ca284f8f9a073005b22013c375cc958257815865
2018-11-13 21:01:52 -08:00
Shuting Wang
23e19ebfa7 add non expotential emphasis loss to Lambdarank
Summary: Currently Lambdarank applies exponential emphasis on relevance, i.e., g=2^rel when calculating dcg, this diff adds options that supports g=rel in the loss function.

Reviewed By: itomatik

Differential Revision: D9891514

fbshipit-source-id: 64730d467a665670edd37e6dc1c077987991d1a8
2018-11-13 14:54:04 -08:00
Yan Shang
c85463fc74 Allow Gather to handle empty data (#13781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13781

allow Gather Op to handle empty data.

Reviewed By: intermilan

Differential Revision: D13001267

fbshipit-source-id: 633c8471b637c56be8f6574f9bf9430785073977
2018-11-10 10:00:47 -08:00
Ansha Yu
e3e6ca1102 operator serialized test coverage summary document (#13703)
Summary:
Add a markdown document summarizing the coverage of serialized operator tests. This currently only takes into account what has been covered by the tests with respect to the entire registry of c2 operators.

Next, we will break down the coverage by which operators have unit tests associated with them, which have hypothesis tests, and which have tests more specifically calling assertReferenceChecks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13703

Reviewed By: dzhulgakov

Differential Revision: D12970810

Pulled By: ajyu

fbshipit-source-id: 4f0cd057b1cf734371333e24d26cbab630a170e1
2018-11-09 15:04:08 -08:00
François Garillot
edd2e38023 Clean up a couple of items in the C2 test scaffolding (WIP) (#7847)
Summary:
- Py3 compatibility
- utility functions refactoring
Pull Request resolved: https://github.com/pytorch/pytorch/pull/7847

Reviewed By: pietern

Differential Revision: D9355096

Pulled By: huitseeker

fbshipit-source-id: 8e78faa937488c5299714f78075d7cadb1b2490c
2018-11-07 09:16:13 -08:00
Pradeep Dorairaj
76c1b5cd79 Fix overflow error in stats_put_ops
Summary:
I was hitting this error:

caffe2/caffe2/operators/stats_put_ops.h:66:25: runtime error: 9.22337e+18 is outside the range of representable values of type 'long'

So, the assignment from int64_t to float loses some precision and because of that we overflow.

Reproduced this issue with this diff D12945013

Reviewed By: mlappelbaum, jdshi-fb

Differential Revision: D12927086

fbshipit-source-id: 7eae7fe25ab49d5ac15279335bd5b1fa89d6e683
2018-11-06 15:41:51 -08:00
Junjie Bai
95ca66763d Add math functions overloaded over different numeric types for cuda and hip (#13602)
Summary:
petrex ashishfarmer rohithkrn iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13602

Reviewed By: dzhulgakov

Differential Revision: D12935797

Pulled By: bddppq

fbshipit-source-id: a49ec66fb60bfd947c63dd2133d431884df62235
2018-11-06 01:40:31 -08:00
Jongsoo Park
54e8623d26 3D Conv in NHWC layout (#12733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12733

Conv in NHWC layout only works for 2D images. This has been a pain point when implementing quantized 3D convolution because we need NHWC layout for best performance (note that NHWC layout in general gives better performance in CPU not just for quantized operators). For example, our quantized ops have a functionality to measure quantized error operator by operator but this needs running a shadow fp32 operator, but this is not easy when there's no 3D conv in NHWC layout is available (currently we're doing layout conversion on the fly for the shadow fp32 operator which is error prone). Some of Caffe2 frameworks like brew generates error when we try to create a 3D conv op in NHWC layout. This was also a blocker for using aibench because aibench is using brew.

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D10333829

fbshipit-source-id: 2d203ee1db833cd3f9d39353219e3894b46c4389
2018-11-04 21:50:09 -08:00
Jongsoo Park
8be0efaa8c omit group conv NHWC test for HIP (#13554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13554

D10233252 broke ROCM test.
We don't have group conv in NHWC for hip yet and this diff omits related tests.

Reviewed By: hyuen

Differential Revision: D12917880

fbshipit-source-id: 9baf36a8cb061ee8cf393b2c438a2d1460ce5cd8
2018-11-03 21:18:23 -07:00
Jongsoo Park
2bc6a7a260 enable group conv test in NHWC layout in CPU (#12428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12428

Group conv in NHWC layout was enabled in CPU after D7547497.
In D7547497, unit test of group conv in NHWC layout in CPU was enabled in group_conv_test.py but not in conv_test.py . This diff also enables it in conv_test.py .

Reviewed By: BIT-silence

Differential Revision: D10233252

fbshipit-source-id: aeeaf3eedc60e1cf6321b5a1dbe6a561e3aacbde
2018-11-03 11:58:51 -07:00
Junjie Bai
da029ca042 Skip Conv1D tests for MIOPEN (#13512)
Summary:
miopen currently only supports 2d

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13512

Differential Revision: D12903307

Pulled By: bddppq

fbshipit-source-id: a8b0f0580a1859f1e0c1518907406abf013c4c8c
2018-11-02 11:38:26 -07:00
Sergei Nikolaev
61a2d47ec6 Special handling for 1D covolutional kernels in cuDNN flavor of conv_op. (#12902)
Summary:
Essentially makes cuDNN to think of those kernels like of Nx1 ones.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12902

Reviewed By: BIT-silence

Differential Revision: D10852862

Pulled By: soumith

fbshipit-source-id: 7416cf6d131177340d21cbf1d42c1daa6c7cad8c
2018-11-02 07:08:23 -07:00
Will Feng
4c06f1f2bb CircleCI: enable all flaky tests (#13356)
Summary:
A few Caffe2 tests are currently disabled in `py2-gcc4.8-ubuntu14.04` test job because they are known to be flaky. https://github.com/pytorch/pytorch/pull/13055 likely had fixed the flakiness, and this PR tests it.

Fixes https://github.com/pytorch/pytorch/issues/12395.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13356

Differential Revision: D12858206

Pulled By: yf225

fbshipit-source-id: 491c9c4a5c48ac1b791fdc9d78acf66091e80457
2018-10-31 09:34:49 -07:00
Michael Antonov
f58e4fbc45 Remove redundant array-gen loop in gather_ops_test.py (#13338)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13338

Remove unnecessary [r for r in []] statements.

Reviewed By: ezyang

Differential Revision: D12848907

fbshipit-source-id: 256551b286ac6801585acf9bb0b2644ef0b7ed58
2018-10-30 16:20:22 -07:00
Dong Shi
3a81984bde Make Stat put ops accept empty tensors safely (#13178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13178

Add default value option to stats put ops

Reviewed By: mlappelbaum

Differential Revision: D10858564

fbshipit-source-id: cc9b3e621abf3fc21821b73f354bebdcd35e477e
2018-10-30 13:28:58 -07:00
Ilia Cherniavskii
1032cf9fe4 Support for zero-length sequences in RNN executor (#13244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13244

Adding support for zero-length sequences into RNN executor

Reviewed By: dzhulgakov

Differential Revision: D10848803

fbshipit-source-id: f2994ee28c09fb30146243bb300ae7205024dd17
2018-10-29 10:32:42 -07:00
Tristan Rice
ab40eff5dd caffe2: UpsampleBilinear CUDA implementation (#12843)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12843

This adds a cuda implementation for the UpsampleBilinearOp and UpsampleBilinearGradientOp.

The CUDA code is based off of the corresponding ResizeNearest operators but with bilinear interpolation logic taken from the CPU implementation.

Reviewed By: houseroad

Differential Revision: D10453776

fbshipit-source-id: b29ac330b72465974ddb27c0587bca590773fdec
2018-10-25 11:10:04 -07:00
Junjie Bai
ccfaf46431 Make CUDNN an alias of MIOPEN for HIP ops (#12278)
Summary:
This is mostly for reusing all the cudnn test cases in our python operator_tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12278

Differential Revision: D10842592

Pulled By: bddppq

fbshipit-source-id: 4b3ed91fca64ff02060837b3270393bc2f9a9898
2018-10-24 17:07:31 -07:00
Edward Yang
df47bbe9c1 Fix test_glu_old HealthCheck with smarter generation strategy. (#12975)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12975

Differential Revision: D10513493

Pulled By: ezyang

fbshipit-source-id: ac183aeb4ae7f0a5f91f1a369b595ae92c3e844d
2018-10-24 13:45:19 -07:00
Yangqing Jia
ff508c91a1 Remove numba dependency
Summary:
TSIA - we want to deprecate numba in fbcode when moving to new compiler tiers.

Converted the old test to a non-numba regular python op test.

Reviewed By: xw285cornell

Differential Revision: D10519910

fbshipit-source-id: 0e9188a6d0fc159100f0db704b106fbfde3c5833
2018-10-23 17:03:47 -07:00
Tristan Rice
6190408e24 caffe2: UpsampleBilinear support for scales (#12736)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12736

This updates UpsampleBilinearOp and UpsampleBilinearGradientOp to support scales to bring it inline with ResizeNearestOp https://github.com/pytorch/pytorch/pull/12720.

Reviewed By: houseroad

Differential Revision: D10416228

fbshipit-source-id: f339b7e06979c9c566afb4cee64a2d939b352957
2018-10-19 08:55:55 -07:00
Dmytro Dzhulgakov
92890d4314 Delete ExtendTensor operator
Summary: Added 2 years ago in D3665603, never used, kill it.

Reviewed By: ezyang

Differential Revision: D10421336

fbshipit-source-id: 1b027a9ef2b71d0dd2c572cd4338bc8e046320d8
2018-10-18 15:18:40 -07:00
Lu Fang
f1e7d384b6 Support scales as inputs in ResizeNearest (#12720)
Summary:
To address https://github.com/onnx/onnx/pull/1467
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12720

Reviewed By: BIT-silence

Differential Revision: D10414813

Pulled By: houseroad

fbshipit-source-id: 8831381b0115c363065c8d23bd1a95b4d641b857
2018-10-17 23:08:53 -07:00
Hector Yuen
17ab3bd502 implement rowwise quantization for fp16 (#12382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12382

implement fp16-> (uint8 + scale and bias in fp32)

this is similar to fp32 rowwise quantization

we could have done scale and bias in fp16 but not too motivated since we are not saving much and those datatypes have to be converted to fp32 to process since x86 doesn't support half float operations anyways

Reviewed By: csummersea

Differential Revision: D10220463

fbshipit-source-id: 6c382026de881f03798c2e5fc43abfc80f84ea1f
2018-10-12 13:57:55 -07:00
Dong Shi
da3dd9af12 No Op Optimizer (#12390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12390

Introduce a no op optimizer for when we don't want updates to happen, but don't want to affect downstream processes.

Reviewed By: mlappelbaum

Differential Revision: D10209812

fbshipit-source-id: 2af4ebc0fb42e78ea851c3a9f4860f3d224037b6
2018-10-10 18:09:46 -07:00
Junjie Bai
f54ab540af Rename cuda_gpu_id to device_id in DeviceOption (#12456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12456

codemod with 'Yes to all'
codemod -d . --extensions h,cc,cpp,cu,py,proto,pbtxt,pb.txt,config cuda_gpu_id device_id

Overload TextFormat::ParseFromString to do string replace when parsing from protobuf format

Reviewed By: Yangqing

Differential Revision: D10240535

fbshipit-source-id: 5e6992bec961214be8dbe26f16f5794154a22b25
2018-10-09 15:54:04 -07:00
Will Feng
cdead5ace1 Enable CircleCI for Linux jobs (#12389)
Summary:
Changes in this PR:
1. Intermediate Docker image is shared from build stage to test stage through ECR, in order to fix the Caffe2 flaky CUDA tests.
2. There are ~7 Caffe2 operator tests that are only flaky in `caffe2_py2_gcc4_8_ubuntu14_04_test` on CPU. Disabling those tests on that config only, which is okay to do because we are still running those tests in other test jobs.

After this PR is merged, CircleCI will be running on master automatically, and will be running on PRs if the author rebased their PR onto the newest master (which we will ask all the authors to do when we switch off Jenkins for Linux).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12389

Differential Revision: D10224267

Pulled By: yf225

fbshipit-source-id: dd1a90a425c3d13b870d3d328cb301eee2e6e2cd
2018-10-08 17:09:37 -07:00
Dong Shi
5a0d2c7138 Add clamping functionality to stats_put_ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12391

Reviewed By: mlappelbaum

Differential Revision: D10220000

fbshipit-source-id: 10fdbc8ebab931a5be31df964b5de5728048205d
2018-10-08 16:53:26 -07:00
Junjie Bai
ff608a9ff3 Back out "Revert D10123245: Back out "codemod cuda_gpu_id to device_id"" (#12232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12232

Original commit changeset: fca91fea58b7

This adds proper modifications to the DeviceType <->DeviceOption conversion code added in D10033396

Reviewed By: jerryzh168

Differential Revision: D10132473

fbshipit-source-id: 801ef777e2950982cb47b48051b1471a0a91e64b
2018-10-01 21:54:52 -07:00
Junjie Bai
26df16eb21 Clear previous device option when keep_device is set in load op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12240

Reviewed By: jerryzh168

Differential Revision: D10133933

fbshipit-source-id: 05935bd527177f936c1d08626888d43dedbf5ce4
2018-10-01 17:20:26 -07:00
Rick Ratmansky
3010dc4208 Revert D10123245: Back out "codemod cuda_gpu_id to device_id"
Differential Revision:
D10123245

Original commit changeset: d83da8e00a12

fbshipit-source-id: fca91fea58b7df208edc2e218a1d514f9821ec7b
2018-10-01 12:22:36 -07:00
Yang Liu
7d7d336c45 Back out "codemod cuda_gpu_id to device_id"
Summary:
Original commit changeset: f5614a5d2607

D9986213 is causing Multifeed Aggregator a [huge performance different](https://our.intern.facebook.com/intern/ads/analyze_canary/412951953278781781/) and is blocking aggregator push since last Friday night: https://fburl.com/feedtools/b6izvwjz
We need to land this revert ASAP to unblock aggregator push.

Reviewed By: orionr

Differential Revision: D10123245

fbshipit-source-id: d83da8e00a1250f5d09811a0a587c127e377aab2
2018-10-01 11:31:14 -07:00
Satish Nadathur
04c0971679 Special case BatchGather and BatchGatherGradient for block_size=1. (#11349)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11349

Special case BatchGather and BatchGatherGradient for block_size=1. This makes BatchGather 3-4X faster and BatchGatherGradient 10X for this case.

Reviewed By: jspark1105, ilia-cher

Differential Revision: D7218043

fbshipit-source-id: ea12042239a8adc92b9efcbd0b66e354fb43f4c7
2018-09-27 21:11:38 -07:00
Junjie Bai
3eb5940cf5 codemod cuda_gpu_id to device_id (#12022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12022

codemod -d . --extensions h,cc,cpp,cu,py,proto,pbtxt,pb.txt,config cuda_gpu_id device_id

codemod with 'Yes to all'

Reviewed By: orionr

Differential Revision: D9986213

fbshipit-source-id: f5614a5d26078817aee8caf79a494abfd6a95ff1
2018-09-27 20:24:53 -07:00
Dong Shi
d9c27f4d8d T33898723: Simple put operators for caffe2 stats (#12057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12057

Add simple put operators for various types of stats

Reviewed By: mlappelbaum

Differential Revision: D9925268

fbshipit-source-id: cec02b0027d2d0ef3d35741be4b02c429d492810
2018-09-26 12:39:37 -07:00
Will Feng
7122f8b3bb Disable more flaky tests on CircleCI (#11399)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11362.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11399

Differential Revision: D9736673

Pulled By: yf225

fbshipit-source-id: cad8c0e86a70a01b047e648975ca5b9926e4acb3
2018-09-25 10:25:30 -07:00
Ansha Yu
3b1a5a1b8a Refactor tests part 2 (#11811)
Summary:
Followup to the [first refactor](https://github.com/pytorch/pytorch/pull/11350). Increase coverage of tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11811

Reviewed By: houseroad

Differential Revision: D9923074

Pulled By: ajyu

fbshipit-source-id: 0f899bb9e9a75bf7ed939e06cc9b028daa7f6bd9
2018-09-19 10:09:28 -07:00
Ansha Yu
98aebed88e Refactor tests part 1 (#11350)
Summary:
Followup to [the serialized test framework](https://github.com/pytorch/pytorch/pull/10594)

Round 1 for refactoring tests, starting alphabetically. I added some functionality, so I wanted to send out some of these initial changes sooner.

I'm skipping all tests that don't explicitly call assertReferenceChecks. Some tests directly call np.allclose, and others are simply TestCase (rather than HypothesisTestCase).

1. Start alphabetically producing serialized outputs for test functions, annotating those we want to include with `serialized_test_util.given`. So far I've only added one test per operator, but this already does seem to add quite a few tests.
2. Add functionality to allow us to generate outputs using pytest by adding pytest argument options. This allows us to skip adding a `__main__` function to quite a few tests.
3. Catch any exceptions generating the gradient operator and skip serializing/reading it, since certain operators don't have gradients.
4. Add functionality to better handle jagged array inputs, which numpy doesn't handle very well. We simply explicitly do the conversion to dtype=object.
5. Make only one file per test function, rather than 4, to reduce the number of files in the github repo.

I also noticed that there is some hypothesis handling that makes `serialized_test_util.given` not compatible with adding more hypothesis decorators on top. For example, there are tests that do
```
settings(...)
given(...)
def test_my_stuff(...)
```
But there is a hypothesis handler that explicitly checks that `given` is called below `settings`, so we cannot refactor this to `serialized_test_util.given`. I've just avoided decorating these kinds of tests for now, I hope that's alright.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11350

Reviewed By: houseroad

Differential Revision: D9693857

Pulled By: ajyu

fbshipit-source-id: a9b4279afbe51c90cf2025c5ac6b2db2111f4af7
2018-09-18 10:42:10 -07:00
Chenguang Xi
cdefc27795 Support lr adaption for SparseAdam and RowWiseSparseAdam (#11162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11162

as title, fix pr test failure

Reviewed By: chocjy

Differential Revision: D9619308

fbshipit-source-id: 0a2228841ed8fadb15f07e94d3575aa701b10146
2018-09-17 10:29:03 -07:00
Xiaomeng Yang
7f7cda99cd Optimize order_swich_ops on GPU (#11404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11404

Optimize order_swich_ops on GPU

Reviewed By: houseroad

Differential Revision: D9728642

fbshipit-source-id: 74ff62268856fb1613fa61eb214bed6ec6716632
2018-09-12 16:56:15 -07:00
Lukasz Wesolowski
4db21a1d8e Optimize LengthsTileOp on GPU to run a kernel instead of a sequence of memcopies (#11413)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11413

LengthsTileOp was implemented using a sequence of device memcopies initiated on the CPU. This was very slow. I changed it to use a kernel. TUM benchmark QPS improved from 13k QPS to 20k QPS as a result.

Reviewed By: manojkris, xianjiec

Differential Revision: D9724988

fbshipit-source-id: 2f98c697730982734d7c6a26d0b6967310d49900
2018-09-11 13:25:35 -07:00
Mingda Li
f2f43ad2da Add new LengthsSplit operator (#10974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10974

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10291

This new operator will do the following:

Given a LENGTHS vector and n_splits, output a "split" LENGTHS vector where:

1. Each length in input vector is split into n_splits values (thus output vector should have LENGTHS.size(0) * n_splits elements)
2. The new lengths in output should be evenly split, and if the length is not divisible by n_splits, then order new values in descending order. (e.g. n_splits = 3, length = 5 -> 2 2 1)
3. If n_splits > some element in the array, its split elements will contain 0s. (e.g. n_splits = 3, length = 2 - > 1 1 0)

Reviewed By: bddppq, chocjy

Differential Revision: D9013119

fbshipit-source-id: 82bf3371ec08c41fc3379177f0007afc142e0d84
2018-09-10 15:40:28 -07:00
Xiaomeng Yang
ec5404a449 Add cuda version of SpatialBNOp also optimize SpatialBN on CPU (#10888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10888

Add cuda version of SpatialBNOp also optimize SpatialBN on CPU

Reviewed By: houseroad

Differential Revision: D9512435

fbshipit-source-id: 6f828c88d56d30dc9a2f98a297a161c35cc511b1
2018-09-06 18:26:13 -07:00
Will Feng
c9e66351a7 Port all PyTorch and Caffe2 jobs to CircleCI (#11264)
Summary:
This PR adds all PyTorch and Caffe2 job configs to CircleCI.

Steps for the CircleCI mini-trial:
- [ ] Make sure this PR passes Jenkins CI and fbcode internal tests
- [x] Approve this PR
- [ ] Ask CircleCI to turn up the number of build machines
- [ ] Land this PR so that the new `.circleci/config.yml` will take effect

Several Caffe2 tests are flaky on CircleCI machines and hence skipped when running on CircleCI. A proper fix for them will be worked on after a successful mini-trial.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11264

Differential Revision: D9656793

Pulled By: yf225

fbshipit-source-id: 7832e90018f3dff7651489c04a179d6742168fe1
2018-09-05 16:28:11 -07:00
Xiaomeng Yang
b3d559cdd1 Optimize WeightedSumOp for two inputs (#11049)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11049

Optimize WeightedSumOp for two inputs

Reviewed By: houseroad

Differential Revision: D9566692

fbshipit-source-id: 9aab1f02251d386b6f7d0699ae11eeb2ea2b5b4f
2018-09-01 11:54:55 -07:00
Edward Yang
3073051a18 Revert D9554375: Support lr adaption for SparseAdam and RowWiseSparseAdam
Differential Revision:
D9554375

Original commit changeset: b88768f470ef

fbshipit-source-id: 2c103c616c8680684892c7d9085fd7bb8289d2f1
2018-08-31 07:54:31 -07:00
Chenguang Xi
0555768e0f Support lr adaption for SparseAdam and RowWiseSparseAdam (#10993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10993

as title

Reviewed By: chocjy

Differential Revision: D9554375

fbshipit-source-id: b88768f470ef7d023dd481c6a97b91594892f422
2018-08-31 00:55:39 -07:00
Ansha Yu
9fae8fcdff framework for committed serialized tests (#10594)
Summary:
Generate serialized test inputs/outputs/backward graphs of tests inside `caffe2/python/operator_test` that call assertSerializedOperatorCheck(). Tests should be decorated with serialized_test.collect_tests.given_and_seeded to run hypothesis tests that are actually random and a single fixed seeded hypothesis tests.

To use:
1. Refactor your test to be a SerializedTestCase
1a. Decorate it with given_and_seeded
1b. Call testWithArgs in main
2. Run your test with -g to generate the output. Check it in.
3. Subsequent runs of the test without generating the output will check against the checked in test case.

Details:
Run your test with `python caffe2/python/operator_test/[your_test].py -g`
Outputs are in `caffe2/python/serialized_test/data`. The operator tests outputs are in a further subdirectory `operator_test`, to allow for other tests in the future (model zoo tests?)

Currently, we've only refactored weighted_sum_test to use this, but in the next diff, we'll refactor as many as possible. The directory structure may also change as usually there are multiple tests in a single file, so we may create more structure to account for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10594

Reviewed By: ezyang

Differential Revision: D9370359

Pulled By: ajyu

fbshipit-source-id: 2ce77389cd8bcc0255d3bccd61569833e545ede8
2018-08-30 22:41:46 -07:00
Tommy Yu
89834dfe64 Add GPU version of HardSigmoid Op to Caffe2 (#10955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10955

Add GPU version of HardSigmoid Op to Caffe2. Updated test file to
include GPU tests.

Reviewed By: enosair

Differential Revision: D9499353

fbshipit-source-id: fcb51902063d0c3e4b10354533a8a42cf827c545
2018-08-29 14:55:29 -07:00
Tommy Yu
92ff070b83 Add CPU version of hard sigmoid operator to caffe2 (#10837)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10837

Add CPU version of hard sigmoid operator to caffe2. The definition of
this operator can be found here:
https://github.com/onnx/onnx/blob/master/docs/Operators.md#HardSigmoid.

Reviewed By: BIT-silence

Differential Revision: D9489536

fbshipit-source-id: 67b3171ed96d5ebcc8d500d93e7827a4a9705a81
2018-08-28 14:55:49 -07:00
Yanghan Wang
f64f6eed3a move HeatmapMaxKeypointOp unittest to oss
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10859

Reviewed By: newstzpz

Differential Revision: D9498312

fbshipit-source-id: 08b8a596f774c9102286019f286ca0b74d1f5304
2018-08-27 12:56:46 -07:00
Edward Yang
deda05e59f Revert D9395814: move HeatmapMaxKeypointOp unittest to oss
Differential Revision:
D9395814

Original commit changeset: 25073eb6b143

fbshipit-source-id: 56f2b7b57e3c6361e2d78e5ba7850ea3b89e98fb
2018-08-23 06:54:29 -07:00
Yanghan Wang
9a43fc5eaa move HeatmapMaxKeypointOp unittest to oss
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10674

Reviewed By: newstzpz

Differential Revision: D9395814

fbshipit-source-id: 25073eb6b143fc1e7cbf5f887545d2b7df15c9a9
2018-08-22 19:11:10 -07:00
Wei Wen
6c75fc0aa3 Intergrating stochastic quantization to easgd to reduce communication + supporting quantization on both sides (split from D8849770) (#10644)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10644

Depends on D8493264

Reviewed By: chocjy, boryiingsu

Differential Revision: D9347706

fbshipit-source-id: 6fdcc5b61098bf47ec9391b1f009b0e6a0615842
2018-08-22 17:10:03 -07:00
Huan Gui
7832e9d564 Add a bisect percentile operator (#10563)
Summary:
Add a bisect percentile operators with lower and upper bounds for interpolation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10563

Reviewed By: chocjy

Differential Revision: D7802182

Pulled By: olittle

fbshipit-source-id: 89ebfa8b3463adc2c89235fa3dfffa187a9d5417
2018-08-20 13:14:05 -07:00
Jongsoo Park
cc53807be5 group conv with NHWC layout (#10585)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10585

group conv with NHWC layout

Reviewed By: BIT-silence

Differential Revision: D7547497

fbshipit-source-id: da0ec5a4512c15a0a0d7b79e6ce00c1f8f77f661
2018-08-17 00:39:23 -07:00
Xiaomeng Yang
87cac4c2f1 Update Im2Col related to make preparation for group conv in NHWC order. (#10439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10439

Update Im2Col related to make preparation for group conv in NHWC order.

Reviewed By: houseroad

Differential Revision: D9285344

fbshipit-source-id: 1377b0243acb880d2ad9cf73084529a787dcb97d
2018-08-15 17:10:24 -07:00
Eli Amesefe
c5b1aa93ee Export uint8 tensors as byte string in mobile_exporter and add GivenTensorByteStringToUInt8FillOp (#10385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10385

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10354

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10316

Because Protobuf encodes uint8_t tensors using a less space efficient varint uin32_t encoding, we are adding a new operator that reads back a byte string into a uint8_t tensor.

Reviewed By: harouwu

Differential Revision: D9004839

fbshipit-source-id: dfd27085c813fdeff13fee15eef4a2e7fef72845
2018-08-15 14:26:50 -07:00
Jongsoo Park
d8ff7ad6f8 generalize order switch ops for 1-3d (#10395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10395

Order switch ops (NCHW2NHWC and NHWC2NCHW) were only supporting 2D images.
This diff generalizes them to 1D and 3D, and also add a unit test we didn't have.

Reviewed By: protonu

Differential Revision: D9261177

fbshipit-source-id: 56e7ec54c9a8fb71781ac1336f3f28cf024b4bda
2018-08-15 10:09:31 -07:00
Peizhao Zhang
ce8e8feceb Fixed a bug in box_with_nms_limit where it may produce more bounding boxes than specified. (#10390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10390

Fixed a bug in box_with_nms_limit where it may produce more bounding boxes than specified.
* The original code first finds the threshold for the boxes at the 'detectons_per_im' position, and filters out boxes lower than the threshold.
* In some cases that there are multiple boxes have the same threshold, the op will return more boxes than 'detectons_per_im'.

Reviewed By: wat3rBro

Differential Revision: D9252726

fbshipit-source-id: 63f40829bcd275cb181692bc7547c384cee01499
2018-08-14 23:54:23 -07:00
Peizhao Zhang
520f4f6cb9 Added some unit test for box_with_nms_limit_op. (#10389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10389

Added some unit test for box_with_nms_limit_op.

Reviewed By: wat3rBro

Differential Revision: D9237860

fbshipit-source-id: 2d65744bd387314071b68d2a0c934289fc64a731
2018-08-14 11:55:03 -07:00
Wei Wen
ffb59e5f20 adding stochastic quantization caffe2 operators (encoder and decoder in CPU are implemented. GPU mode is pending)
Summary:
This operator implements b (1/2/4/8) bit stochastic quantization of a floating
matrix in a row-wise fashion. 8/b floating values are concatenated to a byte
and returned in uint8 tensor. PR: https://github.com/pytorch/pytorch/pull/8629

Reviewed By: harouwu

Differential Revision: D8493264

fbshipit-source-id: 01f64066568a1e5a2b87c6d2134bd31cdf119c02
2018-08-13 16:39:23 -07:00
Jerry Zhang
656bb320b7 EnforceFinite test (#10143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10143

att

Reviewed By: xianjiec

Differential Revision: D9122444

fbshipit-source-id: 010abcc1eb64f084c00890e8de5f5d422b4b8d02
2018-08-03 10:31:29 -07:00
Junjie Bai
4778afb8bb In Expand support using -1 to indicate preserving original size (#10174)
Summary:
zrphercule

https://pytorch.org/docs/stable/tensors.html#torch.Tensor.expand
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10174

Differential Revision: D9136467

Pulled By: bddppq

fbshipit-source-id: 825c489899097acda8d43706964d78a104cdf583
2018-08-02 22:09:47 -07:00
Junjie Bai
dd527db711 Skip TestConvolution.test_convolution_sync on ROCM which caused random segfaults (#10179)
Summary:
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/4701/console

petrex ashishfarmer rohithkrn
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10179

Differential Revision: D9139657

Pulled By: bddppq

fbshipit-source-id: 9b1bb2ad185ed16fff696ce026a5ee5fcf9cbaee
2018-08-02 21:09:27 -07:00
Lin Li
4a2f3cc45f Improve lars operator by applying clipping (#9905)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9905

This diff improves lars operator in Caffe2 by applying clipping to the computed learning rate

Reviewed By: pjh5

Differential Revision: D9020606

fbshipit-source-id: b579f1d628113c09366feac9406002f1ef4bd54f
2018-08-02 11:54:28 -07:00
Anshul Jain (B*8)
56974a06b5 Revert D8909766: [caffe2] Simplify order switch operators
Differential Revision:
D8909766

Original commit changeset: 17a302d5bf4a

fbshipit-source-id: 56c75a8ce27873ed1d5f194b9d6bf0049d8f21ba
2018-07-28 18:40:13 -07:00
Igor Milyakov
607688e928 Adding reciprocal operator and a test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9908

Differential Revision: D9035809

Pulled By: virtan

fbshipit-source-id: bce1db46fd55faeeab18a3b266d25c8beeb08df7
2018-07-27 18:24:43 -07:00
Igor Milyakov
12a1af3731 Adding conv tests with explicit algo definition
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9798

Differential Revision: D9034663

Pulled By: virtan

fbshipit-source-id: d722f25f1dd00231ccc3ad5960bbbef63af02c2d
2018-07-27 17:39:17 -07:00
root
c3fe071483 Update hip files (#9826)
Summary:
The goal of this PR is to update the hip files to reflect relevant changes in cuda source files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9826

Differential Revision: D9032840

Pulled By: bddppq

fbshipit-source-id: 504e55c46308eebfee3c9a7beea1f294fe03470f
2018-07-27 16:54:39 -07:00
Norman Mu
a532c1a48c Fix default argument value for CTCGreedyDecoder op (#9747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9747

Currently the ctc_greedy_decoder op initializes the `merge_repeated` argument only if it has been provided by the user. Change to initialize in all cases.

Reviewed By: houseroad

Differential Revision: D8963635

fbshipit-source-id: 18955c7c26a77d9d7f5137e4dec085252ffabfeb
2018-07-27 16:33:07 -07:00
Jongsoo Park
e7ab093d93 Simplify order switch operators (#9581)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9581

Mostly to simplify code. Should also improve performance but order switch ops
don't take much time anyway.

Reviewed By: viswanathgs

Differential Revision: D8909766

fbshipit-source-id: 17a302d5bf4aba2755d88223fc01a41fd72c5919
2018-07-26 18:24:29 -07:00
Junjie Bai
bdbbcf068a Temporarily disable test_unique on rocm since it keeps running into segfault (#9872)
Summary:
petrex

https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3758/console
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3757/console
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3752/console
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9872

Reviewed By: ezyang

Differential Revision: D9013335

Pulled By: bddppq

fbshipit-source-id: 80490a0fd4a86aa9c8454378c0edddc57d135c4e
2018-07-26 08:34:00 -07:00
Junjie Bai
997f46d1e1 Disable "filter too much" health check for fc operator tests (#9865)
Summary:
makes the CI flaky
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9865

Differential Revision: D9011882

Pulled By: bddppq

fbshipit-source-id: 5124ab97d258eed7585734d64fb01e5df98abd0d
2018-07-25 21:41:14 -07:00
Siddharth Goyal
4b61760738 Add Adadelta optimizer to caffe2 (#9088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9088

Closes https://github.com/pytorch/pytorch/pull/9088

- Added CPU/GPU implementations of Adadelta and SparseAdadelta.
- Added corresponding Python unittests

Reviewed By: BIT-silence

Differential Revision: D8712169

fbshipit-source-id: 544e99e13b230a919672a7341b3715d64597c0be
2018-07-24 20:09:21 -07:00
Kittipat Virochsiri
2b134c72e6 Add interface to provide blob types to shape&type inference (#9643)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9643

Current map interface assumes float data type, which is not always correct.

Reviewed By: kennyhorror

Differential Revision: D8455784

fbshipit-source-id: b94a31267760f7f97c15aa4b03008affc347fd10
2018-07-24 11:58:05 -07:00
Junjie Bai
7af5883860 Eanble python tests on ROCM (#9616)
Summary:
petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9616

Differential Revision: D8960623

Pulled By: bddppq

fbshipit-source-id: bde93bda6230094e6bf4badd8ee79f0688ae1993
2018-07-24 11:37:58 -07:00
Xiaomeng Yang
5df3eae89e Add 1x1 specialization for conv with NCHW order (#9671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9671

Add 1x1 specialization for conv with NCHW order

Reviewed By: houseroad

Differential Revision: D8944686

fbshipit-source-id: 94bf44f69498b1934b7dfff4c0e989342c7bb61c
2018-07-23 18:54:58 -07:00
Norman Mu
ee2cc68259 Add ctc_beam_search_decoder op for caffe2 (#9622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9622

Implement a ctc_beam_sarch_decoder operator based on ctc_greedy_decoder.

Differential Revision: D8903100

fbshipit-source-id: 38973632cb437e5cfcb9ed3a48ed6b901c10efa3
2018-07-23 13:40:24 -07:00
Xiaomeng Yang
a01d6f01b5 Update channel_shuffle_op and transpose 2d to speed up ShuffleNet (#9525)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9525

Update channel_shuffle_op and transpose 2d to speed up ShuffleNet

Reviewed By: houseroad

Differential Revision: D8889361

fbshipit-source-id: 60196e819b6842becc53b4859b62d4419a0e2c6e
2018-07-21 12:54:33 -07:00
Zhaoheng Ni
a3a6ab60cd Fix the error in UnpackSegmentsOp when calculating the gradient with "max_length" argument (#9598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9598

The "max_length" should be passed to UnPackSegmentsOp if "max_length" is given when calling PackSegmentsOp.

Reviewed By: jerryzh168

Differential Revision: D8919799

fbshipit-source-id: 8c97aa717b69177b8a5d5d56892817d488853840
2018-07-20 11:09:34 -07:00
Zhishuai Zhang
6557856671 Fix l2 normalization when handling zero vector (#9594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9594

When the input vector is a zero vector, the previous GPU code will give Nan in backward. We fix this.

Reviewed By: pjh5

Differential Revision: D8849732

fbshipit-source-id: 87b1fb1ee05dfdb0d43bcbe67e36f15896fe1706
2018-07-19 14:10:03 -07:00
Xiaomeng Yang
ca3b36aa6a Add implementation for batch_moments_op (#9510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9510

Add implementation for batch_moments_op

Reviewed By: houseroad

Differential Revision: D8587654

fbshipit-source-id: d20f52cc8e900716c1057e68c147258dfda5245b
2018-07-18 11:59:54 -07:00
Viswanath Sivakumar
9235ff53f1 Clip horizontal bounding boxes during rotated detection for backward compatibility (#9403)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9403

In BBoxTransform and GenerateProposal ops, clip_boxes makes sure the bbox fits
within the images. For rotated boxes, this doesn't always make sense as there
could be multiple ways to clip a rotated box within an image boundary.
Moreover, clipping to a horizontal box means we leave out pixels of interest
potentially. Therefore, we clip only boxes with angle almost equal to 0 (with a
specified `angle_thresh` tolerance).

Reviewed By: pjh5

Differential Revision: D8828588

fbshipit-source-id: 39c1eafdb5d39d383780faa0a47e76149145e50c
2018-07-16 20:24:49 -07:00
Mark Richardson
88146484b4 Add support for .norm() pytorch onnx export and ReduceL1/ReduceL2 caffe2 operators (#9299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9299

Onnx has ReduceL1 and ReduceL2 operators that would facilitate this, so allow pytorch to export those and allow caffe2 to run them.

I only implemented this on CPU so far.

Reviewed By: pjh5

Differential Revision: D8757381

fbshipit-source-id: 68afc9e2f90042a70929b73ace05a499b5c670c7
2018-07-14 10:54:13 -07:00
Zhaoheng Ni
5ac8a80f8b Add BatchBucketizeOp in caffe2 (#9385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9385

The operator transform dense features to sparse features by bucketizing. Only the feature in indices tensor will be transformed and output.

Reviewed By: bddppq

Differential Revision: D8820351

fbshipit-source-id: a66cae546b870c6b2982ac20641f198334f2e853
2018-07-13 20:39:30 -07:00
Jian Zhang
9e2f2cab94 Implementation and operator test for Wngrad optimizer (#8999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8999

Closes https://github.com/pytorch/pytorch/pull/8999

Implemented the WRgrad optimizer operator for dense (base case as well as the case with additional output for effective learning rate and update value) and sparse case.

Reviewed By: pjh5

Differential Revision: D8627933

fbshipit-source-id: a63cde46c04bcc6b428ab5f77a4b3b2beb66c046
2018-07-13 18:11:41 -07:00
Xiaomeng Yang
bb9ff58c6d Add cudnn activation ops (#9379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9379

Add cudnn activation ops

Reviewed By: houseroad

Differential Revision: D8818013

fbshipit-source-id: d3881c634a46578b9331da07f9fdf7e1f31d7e8a
2018-07-12 23:18:56 -07:00
Akshay Chalana
e30ff68410 Add Hardtanh Export (#8804)
Summary:
Added hartanh CPU/GPU Implementations and backend tests to Caffe2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8804

Reviewed By: bddppq

Differential Revision: D8813987

Pulled By: houseroad

fbshipit-source-id: 2480296eab3373425b9e1734a10c009b4f5d3e26
2018-07-11 18:09:51 -07:00
Viswanath Sivakumar
c2dd90c40e Add angle normalization for rotated boxes (#9056)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9056

Closes https://github.com/pytorch/pytorch/pull/9056

Updates bbox_transform for rotated boxes with angle info to normalize the
predicted angle to be within [angle_bound_lo, angle_bound_hi] range.

Reviewed By: pjh5

Differential Revision: D8706240

fbshipit-source-id: f3ee834cf362736136e285f0f8f0c063af94a879
2018-07-11 11:25:54 -07:00
Viswanath Sivakumar
748a90d05b BBoxTransform op: Add support for rotated boxes (#8952)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8952

Closes https://github.com/pytorch/pytorch/pull/8952

Based on RRPN paper: https://arxiv.org/abs/1703.01086

Reviewed By: pjh5

Differential Revision: D8598547

fbshipit-source-id: 3699379df9bf45ed5bdd395175a0e26a77e079f7
2018-07-11 10:25:34 -07:00
Huamin Li
fb9f9c9ba2 Implement Sinh and Cosh (#9213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9213

Closes https://github.com/pytorch/pytorch/pull/9213

Added hyperbolic trig functions Sinh and Cosh

Reviewed By: BIT-silence

Differential Revision: D8752566

fbshipit-source-id: 5a58336a5153ec804404b9ac7b10b5662ede3cb7
2018-07-10 18:55:31 -07:00
Orion Reblitz-Richardson
936f47f271 Make roi_align_rotated_op_test not rely on 1.12.0 numpy.rot90 (#9267)
Summary:
Breaking this out of https://github.com/pytorch/pytorch/pull/8338

Use a local version of `np.rot90` with an `axes` argument, since we don't have NumPy 1.12.0 in all of the test environments. Caffe2 conda2-ubuntu16.04, for example, fails. Generally, it seems better to not require a NumPy bump just for this test.

cc mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9267

Reviewed By: mingzhe09088

Differential Revision: D8767819

Pulled By: orionr

fbshipit-source-id: c51a6295d58366eba06e4e55e3f1ffaa8af96975
2018-07-09 11:55:39 -07:00
Zhaoheng Ni
f87499a8f3 Modify the original PackSegments operator by adding "max_length" argument (#9048)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9048

max_length argument helps fix the shape of the output to be N * max_length * D, where N is the batch_size, D is the feature_dim.

Reviewed By: bddppq

Differential Revision: D8702782

fbshipit-source-id: e30555608fee1c4a61cc95922f4a71c7f54903af
2018-07-06 14:33:59 -07:00
Xiaomeng Yang
21c420c32c Remove unused RowwiseArgMaxOp (#9119)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9119

Remove unused RowwiseArgMaxOp

Reviewed By: houseroad

Differential Revision: D8719826

fbshipit-source-id: 57d78c8b93bc94a4634d806c7c2041f8c18678a5
2018-07-05 15:25:28 -07:00
Yan Zhu
8364470e5c fix expty batch for softmax (#9075)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9075

as title

Reviewed By: QueryConnectionException

Differential Revision: D8710616

fbshipit-source-id: ca505e1a733cc24db9e2ab83a5395c64fa8360c4
2018-07-01 16:40:14 -07:00
Xiaomeng Yang
03e7953a98 Use FixedDivisor in Reduce and Broadcast CUDA kernels (#9072)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9072

Use FixedDivisor in Reduce and Broadcast CUDA kernels

Reviewed By: houseroad

Differential Revision: D8710243

fbshipit-source-id: 6f1da12234898594a1be8c979d942aa515832aeb
2018-07-01 00:25:34 -07:00
Yan Zhu
b07ea04e23 empty batch for spatialBN (#8933)
Summary:
Closes https://github.com/pytorch/pytorch/pull/8933

spatialBN implementation cannot deal with empty batch, this diff tries to enable zero batch setting:

during training, when batch_size = 0:
in forward, output's saved_mean and saved_var are zeros.
in backward, the gradient for SCALE_GRAD and BIAS_GRAD are zeros.

Reviewed By: pjh5

Differential Revision: D8644699

fbshipit-source-id: 599ea687329d68699c987e05f56f409f4e729d1c
2018-06-29 18:40:41 -07:00
Xiaomeng Yang
838fdd6f99 Add Cube and Cbrt Ops (#8991)
Summary:
Closes https://github.com/pytorch/pytorch/pull/8991

Add Cube and Cbrt Ops

Reviewed By: houseroad

Differential Revision: D8678848

fbshipit-source-id: 051dd475e45ad9f1d11a8b32ae3acd1f7459b930
2018-06-28 14:55:30 -07:00
Xiaomeng Yang
93cc7d1923 Add in_place test for binary ops
Summary: Closes https://github.com/pytorch/pytorch/pull/8973

Reviewed By: houseroad

Differential Revision: D8674216

Pulled By: BIT-silence

fbshipit-source-id: bde1ff7b47dbc8a48d1ff72b345c767af698a09b
2018-06-28 11:45:35 -07:00
Mingzhe Li
c4744cfafa bilinear upsample operator on CPU
Summary: Add support for bilinear upsample operator on CPU.

Reviewed By: BIT-silence

Differential Revision: D7853215

fbshipit-source-id: 9043c95f9eb4e1f6df324e8f7a4e8fdb0c758f66
2018-06-27 10:12:06 -07:00
Orion Reblitz-Richardson
edb88b5f3a
Update from Facebook (#8887)
* add opencl + fpga context

adds an opencl context inside caffe2/fb which can be used for fpga access

* [Caffe2] Force tensor inference checks to be triggered during testing

We've started to rely on TensorInference functions more for different analysis.  This diff ensures that the TensorInference function's result matches what is expected from the definition of the operator.

* Enable building //caffe2:torch with @mode/opt

In @mode/opt, python runs out of a PAR, which breaks a lot of
assumptions in the code about where templates/ folders live relative
to __file__. Rather than introduce hacks with parutil, I simply turn
template_path into a parameter for all the relevant functions and
thread it through from the top level.

* [Caffe2] Fix cost models for DotProduct and Div.  Update Tensor Inference for dot product

As title.  DotProduct states that output is a 1-D tensor (https://caffe2.ai/docs/operators-catalogue.html#dotproduct) though code suggests it is either 0- or 1-D depending on inputs.  TensorInference defined to support implementation.

* [SG-MoE] Add an option to make the experts NOT as components

* [nomnigraph] Rename and fixup convertToNeuralNetOperator API

This will make things a bit cleaner

* no longer symlink THNN.h and THCUNN.h

* forced decoder network (onnx export)

Closes https://github.com/pytorch/translate/pull/95

Add networks in ensemble_export.py to create a forced decoding network from PyTorch NMT checkpoints. This network takes an arbitrary numberized (source, target) pair and returns the model score for the translation, including penalties.

Vocabulary reduction networks are also supported, but note that target indices which are not in the possible_translation_tokens generated for the source input will be trea

* Revert schema change to fix production models

Revert schema change to fix production models

* MockLogDeviceReader - rebase on FIX

# Goal

1), Build a make_mock_log_device_reader using make_mock_reader

2), Replace the real log_device_reader here: https://fburl.com/raihwf1p

# Log by D8151734

Real log_device_reader:
```
I0529 20:29:05.373108 954994 tensor.h:839] Tensor print_net/log of type std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >. Dims: (): read_net/ParseOpenTrainingRow:0
I0529 20:29:05.373244 954994 tensor.h:839] Tensor read_net/ParseOpenTrainin

* [C2/D2][1/n]: Nonnegative-Constrained Optimization -- log barrier

implement log barrier as a regularization method

* Add teacher weight screening.

Add teacher weight sceening according to teacher labels. If teacher label is zero, we do not use the distill loss in the objective function.

* Add NormalizerContext

See task for more detail. This implementation is a copy of what exists for RegularizerContext except for how the parameters are defined in the model_definition thrift file.

I'll try an alternative implementation which overrides the default arguments of functions instead like for argscopes in tensorflow.

https://github.com/pytorch/pytorch/compare/master...MaximeBoucher:update-from-facebook-0939578c068c?expand=1

* Adding cosine similarity option in dot processor

Add pairwise cosine similarity option in dot product.
Add an option to concate dot product and cosine similarity.
Add test cases.

* [nomnigraph][redo] Concat elim for sparseNN

Same as D7962948, which was reverted because Operator Schema was not
defined

* [pytorch] Revert pytorch/pytorch#7918 'Release GIL when copying to shared memory', breaks ASAN

Revert this pytorch diff that breaks ASAN when running Filament in dev mode; in opt mode it gives "bad file descriptor" errors. Looks like a race when copying tensors to shared memory in multiple mp.Queue's (which spawn separate threads).

https://github.com/pytorch/pytorch/pull/7918/files

* [nomnigraph][mobile] Enable nomnigraph by default, use -Oz on nomnigraph related code to reduce code size

enables nomnigraph and reduces codesize

* [Warmup] Allow both offline incremental training and online training

Change plan name on saving side and reading side to support both training type

This diff depends on D8128530 and D8168651.

* Revert D7802642: [Warmup] Allow both offline incremental training and online training

This reverts commit afc213cf9b36cecf75333a788391c4d09f4afccc

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Add legacy grad logic to fix div op on old graphs.

Add legacy grad logic to fix div op on old graphs.

* Correctly propagate operator failures

Propagate errors from operators that throw exceptions and return false

* Revert D8374829: [caffe2][nomnigraph][redo] Concat elim for sparseNN

This reverts commit 6dda028c463e54bb5c32188bbbe9202107e188a5

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [Caffe2] Added extra_info to core.DeviceOption(), enforced extra_info to be inherited in scope.DeviceScope

extra_info is a newly defined field in DeviceOption proto. This diff added extra_info to the core.DeviceOption().  And, In scope.DeviceScope(), this diff enforce the new scope to inherit the extra_info from old scope.

* [opt] hgdirsync wasn't enabled, merge diverged code

Here's the damage, P59732616 basically xplat was left behind but had
the change from assert to CAFFE_ENFORCE

* OMP parallelism over RoIs for RoIAlign op

Simpler to parallelize over RoIs. Shouldn't affect other uses as it relies on
the number of OMP threads set during startup.

PR: https://github.com/pytorch/pytorch/pull/8562

* Use int64_t for shape in FillOps

to avoid overflow of int32

* Implement Rotated RoIAlign op

Based on Rotated RPNs as explained in https://arxiv.org/abs/1703.01086.
The idea is simple - orientation/angle is added as an RPN
anchor parameter and then the angle is further regressed similar to bbox
coords. There are some additional changes related to NMS and IoU, but besides
that it's a direct extension to Faster-RCNN. Further details in https://fb.quip.com/sZHlA1iMfWPZ.

RoIs are represented in [center_x, center_y, width, height, angle] format.
`angle` repre

* Rotated RoIAlign op CUDA forward implementation

CUDA forward impl for D8415490

* RoIAlignRotated op CUDA backward pass implementation

TSIA

* All remaining fixes to eliminate process_github.sh

Most of this diff has already been reviewed separately, except for the parts relating to _thnn/utils.py and _utils._internal.py

remove skipIf(True, 'Fbcode') line from process_github.sh

replace sed of cpp file with #ifdef to control cudnnDestroy use

undo sync-time deletion of .gitattributes, remove process_github.sh

switch to using _utils._internal rather than try-import-except

This diff also fixes the open-source bug where rebuilds have

* Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"

Original commit changeset: 7707d2efe60e The original diff is backout becuase the online trainer package is backed out. This code would only work with new online trainer package

* [easy] improve error log in adagrad op

as title

* re-allow use of thnn_h_path

This fixes cffi usage in OSS

* [4/4] [tum] paralyzing layerNorm for GPU full sync

as title

* add compile=False to pytorch tests, remove hack with pyc

* Add shape and type inference for RowWiseArgMax operator

See title

* Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"

This reverts commit 78167eeef0af16b60f72c82f9dcdda9b41b4dcbd

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [fix-flaky-test] mock_hive_reader_test flaky, because GlobalCounter collects local counts intervally

# Problem

`MockHiveReader` uses `GlobalCounter` to limit `max_examples`.

GlobalCounter on server node collect local counts from worker nodes every 1 sec.

This 1 sec delay makes it impossible to limit exactly to the `max_examples`, it will definitely exceed `max_examples`.

# Plan

Given,
```
Expected num_examples = max_examples + num_examples/sec (Read Speed) x 1 sec (GlobalCounter Sync Int

* [Caffe2] Fix FCGradient cost inference.  Prevent overflow in cost inference

FCGradient missed a factor 2 in the `num_outputs == 3` case.  Overflow was occurring with flop calculation for FC.  Changed types to `uint64_t` to prevent future problems.

* Fix binary ops with empty inputs

Fix binary ops with empty inputs

* Support the filling of input blob with provided data

as title for Biz Integrity case

* Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training""

Original commit changeset: 30c55dd38816 Original diff is reverted due to introducing bad integration test. Fixed the integration test.

* [c2][easy] improve pack ops error loggings

as desc.

* Add ShapeTypeInference for LpNorm operator

As desc

* Shard test_nn to reduce runtime for each test target

Closes https://github.com/pytorch/pytorch/pull/8793

The current test_nn would time out and be disabled in GreenWarden, and we need to have an option to split it up in order to pass the stress test. Right now GreenWarden roughly allows running 100 test cases in test_nn before timing out, and here we have an option to divide test_nn into 30 shards (with ~40 tests in each shard) to allow for some test suite growth in the future.

* Change default caffe2_streams_per_gpu to 1

* Remove IN_SANDCASTLE from common.py and test_nn.py

We prefer to disable the failing tests through Sandcastle UI instead.

* Add a new class for an updated prof_dag.proto

This diff contains:
- An updated prof_dag.proto that contains blob profiles.
- A class to deserialize this information (serialization is in a follow up diff)
- Update to separate profiling information from NeuralNet (and use it as part of the class above).
- Unit tests

* Lambdarank for SparseNN

This diff adds a lambda_rank_layer for SparseNN.
 changes include
1) Adds support for multi sessions in c2 op
2) Adds support for two different loss functions in c2 op
3) Unit tests for op

* Revert D8586950: Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training""

This reverts commit 012220ed63eccc35659a57b31d16a3625da6317b

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [easy] A few fixups to multithread predictor benchmark

(1) support perf on T6 server
(2) remove dead code

* fix a bug about the map size

as title

* Fix reduce sum on in-place case.

Fix reduce sum on in-place case.

* [Warmup] Reland reverted diff Allow both offline incremental training and online training

Closes https://github.com/pytorch/pytorch/pull/8827

fix net transform integration test. Allow offline and online trainer to coexist D7802642.

* Add StoreHandlerNotAvailableException

Add an exception for a store that is not available or has been
deleted.

* Use exception handling for fault tolerance, missing KV store

Remove status blobs to communication ops so that exceptions propagate on
failure.

* [C2/D2][2/n]: Nonnegative-Constrained Optimization -- bounded grad proj

for simple bounded constrained optimization, incl non-negative box constraints.

* [GanH]: Adaptive Weighting with More Estimations

With implemented postivity optimization, we now learn adaptive weights with different
parameterizations.

This improves parameter estimation and training stability.

* Revert some changes for landing

* Remove AutoNoGIL in StorageSharing

* Temporarily disable net_tests

* Revert "[Caffe2] Force tensor inference checks to be triggered during testing"

This reverts commit 67ef05c22b2f71b4a489695384932f968384a2a4.

* Revert "Fix reduce sum on in-place case."

This reverts commit 6cb8a8e1b3db7b6d20941b0053e3f3836068eb64.

* Revert "Revert "Fix reduce sum on in-place case.""

This reverts commit 130a257c0893dc09f4bd6e6a45d112261807fd2c.
2018-06-26 14:55:48 -07:00
Xiaomeng Yang
288d37998a
[Caffe2] Fix gradient_check on in-place ops (#8828)
* Fix gradient_check on in-place ops

* Fix hsm_test

* Fix SplitByLengthOp test

* Fix input_device_options for gradient_checker

* Fix hypothesis_test_util.py
2018-06-25 15:25:56 -07:00
zrphercule
c44c95fd0b New operator 'expand' (#8263)
* operator 'expand'

* updated operator with a simple testcase

* Revert "updated operator with a simple testcase"

This reverts commit 1ce9f8ac567b525677254b0dce5735d7fea133d7.

* updated operator with a simple testcase

* expand operator with a passed testcase

* typo

* GPU full support added

* GPU support testing...

* GPU full supported

* formatted

* nits repaired

* gpu parameters fixed

* Expander removed

* nits fixed, document added

* formatted

* new testcases added & nits repaired
2018-06-18 16:33:47 -07:00
sf-wind
5b86c3af4a
Update from facebook (#8384)
* [fix] fixup the bias multiplier data access issue

Hotfix for failues in conv_transpose

* [D2][Easy]: lint regularizer

lint with black

* [GanH]: Split mu in adaptive weight for diagnose

* [Dper] Add the ability to split FC weights into multiple smaller ones

* fix SumReduceLikeOp for empty blob

as desc.

* add ctc_greedy_decoder for caffe2

ctc_greedy_decoder same as tf's

* Update event callback handling

Allow multiple callbacks per event

* Add WeightedSum layer

The motivation is to do weighted sum in HoNet/crossnet, in the next diff, I'll replace model.Add with model.WeightedSum in
honet: https://fburl.com/f4rmolg2
crossnet: https://fburl.com/v7awn8se, https://fburl.com/63filbnm

* Replicate DAG's behavior

Some callers expect RunAsync to block, replicate that behavior in case of
explicit 'dag' net type

* [dper] layernorm layer

as title

* Override dag, async_dag, async_polling

Overriding dag, async_dag and async_polling with async_scheduling

* Name the thread pools

Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead.

* [Caffe2] FilleOp should support int64_t dimensions

Change argument type to int64_t for shape argument of FillerOp (used in ConstantFill, XavierFill, etc)

* Remove caffe2/caffe2/contrib/torch/

It's not used anywhere and depends on old lua torch that conflicts with Aten. Given PT1 it's not relevant any more (though it was nice and clever code!)

#accept2ship

* Fix linearWarmup multiplier check

The multiplier needs to be non-negative, not strictly positive.

* Revert D3314316

This is after 2 years and we do not seem to have a use case for this one, so
for the sake of clean API design we should potentially remove this. This would
allow us to potentially pass in arguments to optionally construct an object,
although it is indeed a little bit unclear how we can reuse existing objects if
constructor arguments are passed in. In any case, we may want to remove this
dangling feature.

* Speedup generate proposals by partial_sort.

Speedup generate proposals by partial_sort.

FACEBOOK:
- Saw speed improvement for training with this op.
- Yanghan benchmarked the op on a small dataset and see consistent 100% improvement on speed (6ms -> 3ms) on 420 input resolution. See next diff for details.

* More parallel processing friendly for CPP version of GenerateProposals.

More parallel processing friendly for CPP version of GenerateProposals.

* [DT] [43/n] Lift stop conditions inside reader code back to flow control

1. Split multi_reader function into local_reader and remote_reader
2. Lifted stop conditions inside Limiter back to flow control
3. Split epoch flow building logic into 3 cases:
  - single machine (1 reader, 1 trainer on trainer0 node, no PS)
  - (1 reader + 1 trainer) on trainer0 node, has PS
  - multiple readers, readers do not share nodes with trainers, might have PS or not

* Resolve conflicts for torch/_thnn/utils.py

* [Caffe2] Handle image decoding errors

Image decoding errors can make the whole training fail. This diff is to handle them
1.Catch imdecode exceptions and check if decoded image has zero columns or rows. This is counted as decoding errors.
2.Replace the image with empty in case of error
3.Count the number of errors and throw runtime exception if the rate reaches given number

The empty image data is kept. It might introduce noise in the training data.

* Update MKL exporter to IDEEP ops

TSIA

* [Caffe2] GlobalInit is thread safe, fixing the comment

With the mutex and lock, GlobalInit is thread safe.
Update the comments.

* Back out "Add support for generating ATen files during fbcode build"

Original commit changeset: 28970ddba353

@override-unit-failures
(Note: this ignores all push blocking failures!)

* [DT]: fix predictor save

similar to D6610058, here we add the fix for distributed online training

* Remove net_singlethread_async_gpu.cc

Closes https://github.com/caffe2/caffe2/pull/2528

This removes net_singlethread_async_gpu.cc as part of our effort to clean
CUDAContext and the net executors.

* Inline DFS task execution

Add a DFS inline task execution mode in executor

* Add c10 folder to fbcode

This adds the c10 folder and its test cases to fbcode. Build flags are mostly taken from aten.

* add dependencies for online trainer

Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators

Relevent post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/

* Resolve conflicts for tools/jit/gen_jit_dispatch.py

* [Fix] sparse regularization in distributed training

* Support advanced pooling options in sum processor

* support advanced pooling options in sum processor
* remove redundant code
* support attention in sum processor

* Improve shard logging in net tracing code

Make it handle arbitrary shard ids instead of just one digit ids.

* [Caffe2] Call GlobalInit in predictor only in mobile

FACEBOOK:
Calling GlobalInit long after the program starts may not be safe. There are issues if the following happens:

User does not call GlobalInit and initFacebook after program starts
User sets a flag manually: https://fburl.com/mcsumw7d
User calls OSS predictor.
OSS predictor calls GlobalInit
GlobalInit calls initFacebook
initFacebook resets all flags: https://fburl.com/tolszha1
Thus, the user manually set flags are overwritten

This would happen anytime GlobalInit is called long after the program starts.
I suppose the intention of the user in this case is not to call GlobalInit throughout the program,
but use Caffe2 regardless (is that desired?)
But adding GlobalInit in the OSS predictor would automatically call GlobalInit when using Caffe2.

This issue doesn't exist in mobile, since initFacebook is not called on mobile.

For now, guard the GlobalInit in predictor for mobile only.
May want to ensure the GlobalInit is always called at the start of the program. @[3501714:kutta] has seen weird issues when not calling GlobalInit at the start of the program on server side. He has made some progress on this.

* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py

* Add empty fix for SumLikeReduceOp

Add empty fix for SumLikeReduceOp

* Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN

This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Remove Declarations.yaml

* Include common.h

* Change std::stoi to caffe2::stoi

* Add thread_name.cc to the CMake file

* No need to subtract 1. Fix test segfaults

* Fix NetTest, ObserverTest

Fix tests

(cherry picked from commit 3767e66c3f365596cba3d46d3e7322c933a0ab41)

* CTCGreedyDecoderOp only has CPU implementation, test should only run on CPU

* Add a variable to avoid conversion resizing issue

* [fix] fixup the bias multiplier data access issue

Hotfix for failues in conv_transpose

* [D2][Easy]: lint regularizer

lint with black

* [GanH]: Split mu in adaptive weight for diagnose

* [Dper] Add the ability to split FC weights into multiple smaller ones

* fix SumReduceLikeOp for empty blob

as desc.

* add ctc_greedy_decoder for caffe2

ctc_greedy_decoder same as tf's

* Update event callback handling

Allow multiple callbacks per event

* Add WeightedSum layer

The motivation is to do weighted sum in HoNet/crossnet, in the next diff, I'll replace model.Add with model.WeightedSum in
honet: https://fburl.com/f4rmolg2
crossnet: https://fburl.com/v7awn8se, https://fburl.com/63filbnm

* Replicate DAG's behavior

Some callers expect RunAsync to block, replicate that behavior in case of
explicit 'dag' net type

* [dper] layernorm layer

as title

* Override dag, async_dag, async_polling

Overriding dag, async_dag and async_polling with async_scheduling

* Name the thread pools

Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead.

* [Caffe2] FilleOp should support int64_t dimensions

Change argument type to int64_t for shape argument of FillerOp (used in ConstantFill, XavierFill, etc)

* Remove caffe2/caffe2/contrib/torch/

It's not used anywhere and depends on old lua torch that conflicts with Aten. Given PT1 it's not relevant any more (though it was nice and clever code!)

#accept2ship

* Fix linearWarmup multiplier check

The multiplier needs to be non-negative, not strictly positive.

* Revert D3314316

This is after 2 years and we do not seem to have a use case for this one, so
for the sake of clean API design we should potentially remove this. This would
allow us to potentially pass in arguments to optionally construct an object,
although it is indeed a little bit unclear how we can reuse existing objects if
constructor arguments are passed in. In any case, we may want to remove this
dangling feature.

* Speedup generate proposals by partial_sort.

Speedup generate proposals by partial_sort.

FACEBOOK:
- Saw speed improvement for training with this op.
- Yanghan benchmarked the op on a small dataset and see consistent 100% improvement on speed (6ms -> 3ms) on 420 input resolution. See next diff for details.

* More parallel processing friendly for CPP version of GenerateProposals.

More parallel processing friendly for CPP version of GenerateProposals.

* [DT] [43/n] Lift stop conditions inside reader code back to flow control

1. Split multi_reader function into local_reader and remote_reader
2. Lifted stop conditions inside Limiter back to flow control
3. Split epoch flow building logic into 3 cases:
  - single machine (1 reader, 1 trainer on trainer0 node, no PS)
  - (1 reader + 1 trainer) on trainer0 node, has PS
  - multiple readers, readers do not share nodes with trainers, might have PS or not

* Resolve conflicts for torch/_thnn/utils.py

* [Caffe2] Handle image decoding errors

Image decoding errors can make the whole training fail. This diff is to handle them
1.Catch imdecode exceptions and check if decoded image has zero columns or rows. This is counted as decoding errors.
2.Replace the image with empty in case of error
3.Count the number of errors and throw runtime exception if the rate reaches given number

The empty image data is kept. It might introduce noise in the training data.

* Update MKL exporter to IDEEP ops

TSIA

* [Caffe2] GlobalInit is thread safe, fixing the comment

With the mutex and lock, GlobalInit is thread safe.
Update the comments.

* Back out "Add support for generating ATen files during fbcode build"

Original commit changeset: 28970ddba353

@override-unit-failures
(Note: this ignores all push blocking failures!)

* [DT]: fix predictor save

similar to D6610058, here we add the fix for distributed online training

* Remove net_singlethread_async_gpu.cc

Closes https://github.com/caffe2/caffe2/pull/2528

This removes net_singlethread_async_gpu.cc as part of our effort to clean
CUDAContext and the net executors.

* Inline DFS task execution

Add a DFS inline task execution mode in executor

* Add c10 folder to fbcode

This adds the c10 folder and its test cases to fbcode. Build flags are mostly taken from aten.

* add dependencies for online trainer

Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators

Relevent post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/

* Resolve conflicts for tools/jit/gen_jit_dispatch.py

* [Fix] sparse regularization in distributed training

* Support advanced pooling options in sum processor

* support advanced pooling options in sum processor
* remove redundant code
* support attention in sum processor

* Improve shard logging in net tracing code

Make it handle arbitrary shard ids instead of just one digit ids.

* [Caffe2] Call GlobalInit in predictor only in mobile

FACEBOOK:
Calling GlobalInit long after the program starts may not be safe. There are issues if the following happens:

User does not call GlobalInit and initFacebook after program starts
User sets a flag manually: https://fburl.com/mcsumw7d
User calls OSS predictor.
OSS predictor calls GlobalInit
GlobalInit calls initFacebook
initFacebook resets all flags: https://fburl.com/tolszha1
Thus, the user manually set flags are overwritten

This would happen anytime GlobalInit is called long after the program starts.
I suppose the intention of the user in this case is not to call GlobalInit throughout the program,
but use Caffe2 regardless (is that desired?)
But adding GlobalInit in the OSS predictor would automatically call GlobalInit when using Caffe2.

This issue doesn't exist in mobile, since initFacebook is not called on mobile.

For now, guard the GlobalInit in predictor for mobile only.
May want to ensure the GlobalInit is always called at the start of the program. @[3501714:kutta] has seen weird issues when not calling GlobalInit at the start of the program on server side. He has made some progress on this.

* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py

* Add empty fix for SumLikeReduceOp

Add empty fix for SumLikeReduceOp

* Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN

This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Remove Declarations.yaml

* Include common.h

* Change std::stoi to caffe2::stoi

* Add thread_name.cc to the CMake file

* No need to subtract 1. Fix test segfaults

* Fix NetTest, ObserverTest

Fix tests

(cherry picked from commit 3767e66c3f365596cba3d46d3e7322c933a0ab41)

* CTCGreedyDecoderOp only has CPU implementation, test should only run on CPU

* Add a variable to avoid conversion resizing issue

* Remove the code per soumith's comments

* Remove the code per soumith's comments

* Remove blank lines in the end of file

* Resolve conflicts for torch/_thnn/utils.py

* Update MKL exporter to IDEEP ops

TSIA

* Back out "Add support for generating ATen files during fbcode build"

Original commit changeset: 28970ddba353

@override-unit-failures
(Note: this ignores all push blocking failures!)

* add dependencies for online trainer

Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators

Relevent post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/

* Resolve conflicts for tools/jit/gen_jit_dispatch.py

* Support advanced pooling options in sum processor

* support advanced pooling options in sum processor
* remove redundant code
* support attention in sum processor

* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py

* Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN

This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Remove Declarations.yaml

* Include common.h

* Change std::stoi to caffe2::stoi

* [caffe2] uprade IDEEP and hotfix for conv op accuracy issue (#8364)

* [IDEEP] Upgrade IDEEP version

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* [IDEEP] Fix accuracy issue in conv op

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix build error due to lack of src in CMakeLists

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Remove the code per soumith's comments

* [ONNX] Add an ATen fallback pathway for ONNX export (#8273)

* ATen fallback for ONNX export

* Move to enum

* Fix model test

* Add comment

* Address comments

BC interface

* Remove imaginary file (#8415)

* [Caffe2] Enable AMD/MIOPEN ops for Caffe2  (#8306)

* Add hip support for caffe2 core

* Add MIOPEN header/wrapper to caffe2 core

* Add HIP device into caffe2 PB

* top level makefile change for rocm/hip

* makefile scaffolding for AMD/RocM/HIP

* Makefile scafodding for AMD/RocM/HIP; add makefile/utility for HIP files

* caffe2 PB update for AMD/ROCM HIP device

* Add AMD/RocM/Thrust dependency

* HIP threadpool update

* Fix makefile macro

* makefile fix: duplicate test/binary name

* makefile clean-up

* makefile clean-up

* add HIP operator registry

* add utilities for hip device

* Add USE_HIP to config summary

* makefile fix for BUILD_TEST

* merge latest

* Fix indentation

* code clean-up

* Guard builds without HIP and use the same cmake script as PyTorch to find HIP

* Setup rocm environment variables in build.sh (ideally should be done in the docker images)

* setup locale

* set HIP_PLATFORM

* Revert "set HIP_PLATFORM"

This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.

* continue the build script environment variables mess

* HCC_AMDGPU_TARGET

* Cleanup the mess, has been fixed in the lastest docker images

* Assign protobuf field hip_gpu_id a new field number for backward compatibility

* change name to avoid conflict

* Fix duplicated thread pool flag

* Refactor cmake files to not add hip includes and libs globally

* Fix the wrong usage of environment variables detection in cmake

* Add MIOPEN CNN operators

* Revert "Add MIOPEN CNN operators"

This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.

* Add MIOPEN pooling operator

* Add MIOPEN activation operator

* Add MIOPEN softmax operator

* Add MIOPEN spatial batch norm operator

* Add MIOPEN loacl response normalization operator

* Add MIOPEN conv operator

* Clean-up LRN ops

* enable fp16 in MIOPEN pool ops

* Enable fp16 for MIOPEN relu op

* Enable fp16 for MIOPEN spatial batch norm op

* code clean-up

* revert float16 support

* Create Caffe2 python binding for AMD/ROCM/HIP

* Add op fallback for HIP operator

* add hip src/test files in cmake

* exclude hip src/test files

* fix python binding for hip backend

* fix MIOPEN pooling op workspace

* hack to compile miopen operators

* fix include path for MIOPEN ops

* Fix include path

* Add HIP math utilities

* Fix path for HIP math utils

* cmake fix

* Cmake fix / hipcc for hip files

* suppress hipcc warning

* cmake fix /replcae USE_HIP with USE_ROCM

* revert LoadHIP.cmake change

* fix include for thrust/cub-hip

* include path fix for conversion.h

* Updated with latest upstream changes

* clang format fixes

* Context_hip updates

* Fixed typo in rocblas handle get function

* Updated hipified math utils

* Updated math hip test util

* Updated context hip test

* Updated common_hip

* Updated net async dag for HIP

* Added MIOPEN in operator hip test

* fix

* C2 dependencies clean-up

* fix include path for building custom protobuf

* Decouple miopen pool op and conv_pool_op base

* cmake refactor

* fix operator_hip_test

* move all hip/miopen ops files into caffe2/operators/hip

* sanitize cmake

* permission issue

* remove extra parenthesis

* remove artifact from resolving merge conflict

* cont. sanitize cmake files

* fix syntax error

* sanitize conversion.h

* .

* Revert "."

This reverts commit 56020cb0e996a31ae27bf1f8f491955ed0b121b9.

* clang-format

* Enable some reduce operators' ONNX backend tests (#8418)

* fix old comment to point to the right file (#8416)

* Stop pinning nccl version. (#8421)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Expose logsumexp docs and mark log_sum_exp in distributions for internal use (#8428)

* Enable some of the ONNX backend test on broadcasting (#8423)

* Enable some of the ONNX backend test on broadcasting

* enable gemm broadcast

* Expose proto utils and ONNX (#8073)

* Expose proto utils and ONNX from PyTorch libcaffe2.so

* Try to use protobuf from _C.so

* Fix ONNX proto header include

* Adjust order of imports for ONNX until nanopb goes away

* Set and use ONNX_NAMESPACE for PyTorch builds

* Show protobuf summary for all builds

* Add ONNX_NAMESPACE for cpp_build

* Statically link libprotobuf.a into libtorch.so

* Set ONNX_NAMESPACE on Windows build

* Move core/dispatch up as well

* Add /MD flag for Windows build of _C

* Potential Windows fix for ONNX and protobuf

* Add direct linkage from _C to ONNX on Windows

* Only include protobuf wrapper for PyTorch

* Pass extra_compile_args to _nvrtc ext build

* Remove installation of .a files

* Rebase creates some weird situations, revert them manually

* Remove more weird changes due to rebase

* Need to add thread_name.cc after merge
2018-06-13 13:10:45 -07:00
Xiaomeng Yang
44973a06ba
Add affine_channel_op (#8356)
Add affine_channel_op
2018-06-11 20:51:11 -07:00
bddppq
3521cd54af
Fix dividing by zero segfault in Reshape (#8302)
when infer a dimension of zero size new shape
2018-06-09 09:48:22 -07:00
sunnieshang
b2dac08049 Fix a corner case for ReShapeOp (#8178)
In my use case, in the backward propogate pass, the reshape need to
change a [0] tensor into [0,0] shaped tensor. The original implementation would
cause out of index issue. This diff fix this problem.
2018-06-05 19:06:10 -07:00
Xiao Yang
ffde23d45e use the correct datatype format (#8144) 2018-06-05 22:01:59 -04:00
Xiaomeng Yang
9243b64bff
[Caffe2] Update elementwise ops to support numpy style boradcast (#8070)
* Update elementwise ops to support numpy style boradcast

Update elementwise ops to support numpy style boradcast

* Fix sqrt_op

* Fix compare ops

* Fix gradient test

* Fix optimizer legacy broadcast

* Fix legacy broadcast for elementwise ops

* Skip flaky test

* Fix eigen simple binary op

* Fix attention test

* Fix rnn test

* Fix LSTM test

* Fix tan grad

* Fix schema check
2018-06-05 15:49:16 -07:00
Bram Wasti
82b981e4db Update from facebook 1ee4edd286a3 (#8040)
* Adding instance weight to batch distill loss

as title

* add bfloat 16-31

added bfloat 16-31 and their respective unit tests

* [CUDA9] Upgrade - fbcode

CUDA9 upgrade diff D5654023 has been out for a while thanks to Pieter. But with time growing it's becoming quite hard to rebase, because of the symlinks and auto-generated build/config files in tp2. Break D5654023 into two diffs, one touching tp2 config files, and another one touching fbcode TARGETS file (adding nvcc flag). These two should be a bit easier to rebase (for detailed procedure see "Test Plan").

This diff can only be committed if:
1. CUDA 9 rpm is rolled out fleet-wide (TBD)
2. NVidia driver 390.40 is rolled out fleet-wide (done)
3. Upgrade CUDA 9.1, cudnn 7.1, nccl 2.1 (done)
4. Make sure all dependents are built (done)
5. Test all C2 operators, PyTorch (see test plan)

* Share intermediate int32 buffer across Conv ops

Adding a known type

* [C2 fix] infer function for ensure_cpu_output_op

this is adding the missing device funtion for ensure_cpu_output_op

* [int8] Add blob serializer/deserializer for Int8TensorCPU

To export to logfiledb

* [nomnigraph] Add try catch block to optimization passes in predictor

This will catch failures that happen in the optimization pass.

* Caffe2: avoid static initialization order fiasco for CAFFE_ENFORCE

CAFFE_ENFORCE uses strack trace fetcher. Which is currently a
global static variable. If at static initialization time CAFFE_ENFORCE
is used, this is a SIOF. Recently CAFFE_ENFORCE was added into init
functions registration, so we started to see this.

Meyers singleton is going to provide safety here. If stacktrace
fetcher was not registered yet, it will just use a dummy one.

* NUMA support in SparseNN CPU benchmark

Adding support for NUMA in SparseNN CPU benchmark

* [mobile-roofline] Add logging needed for roofline model

This should be all that's needed

* Let the operators using the same input if the operators are not chained

or else, we have to change the input data dims

* fix null-pointer-use UBSAN errors in in reshape_op.h

* revert previous fix on input blob name

as title

* Adding flag to let MineHardNegative automatically extract single value from dict

Model exporter requires the output of the model to be a struct. This makes it convenient to use those models directly in MineHardNegative by allow automatic extraction of the single element of dict, which is a common use case.

* Reverting change that broke internal tests back to OSS compatible state
2018-06-01 17:41:09 -04:00
Sebastian Meßmer
49f8581745
Update from facebook (#7855)
* [mpscnn] MPSCNNChannelShuffle

att

* [Easy] Adding tags as an argument to the functional layer

Without it "tags" would be added as an argument to the operator.

The change here is based on the assumption that there is no operator that takes "tags" as an argument.

* Fix locally_connected_op schema check.

Fix locally_connected_op schema check.

* [C2] Add TypeAndShape inference for few more operators

As desc

* [c2] Shape inference should support 0 as dimension

Tensors can have 0 in their dimension.

* Make MockHiveReader loop over and support max_examples

Replace DatasetReader with RandomDatasetReader.

So that Mock Hive Reader can simulate a large data input using a small sample file as source.

* Utility function to wipe cache between benchmark runs

Caffe2 benchmark does not wipe out cache between runs, and this potentially creates an unrealistically optimistic picture of performance. This diff adds utility function to wipe out the cache.

* Allow caffe2 GlobalInit to be invoked multiple times

Allow caffe2 GlobalInit to be invoked multiple times. Will re-parse gflags and update logging levels on successive invocations, but will not re-run init functions or perform other one-time initialization.

* Add Caffe2 GlobalInitIsCalledGuard to base net and operator classes

Warn if caffe2's GlobalInit function has not been invoked before creating an operator or net object. This is based on discussion here: https://fb.quip.com/kqGIAbmK7vNG

* Rethrow current exception on failure

Rethrow current exception instead of copy constructing a new one on op failure.

* Make `clone()` return subclass of List/Struct

`clone()` is not working correctly when we subclass those classes

* Wipe the cache before the net run

the util function is copied from D7409424
will rebase once D7409424 is landed.

* [Caffe2] [Mobile] Support utils/cast.h::GetCastDataType with LITE_PROTO builds

* Correct includes

async_polling include -> async_base include

* Prepare execution flags for executor migration

Making async_scheduling aware of underlying net type to prepare for executor
migration

* Add operator level observers into async executor

Adding operator level observers into RunAsync operators' calls

* Cleanup TEST_Benchmark

Remove duplicate code and provide default implementation in NetBase

* [C2] Fix type and shape inference for binary comparison ops

As desc.

* Add GlobalInit to predictor to ensure initialization is always done before prediction

FACEBOOK:

Redo D7651453 the correct way.

Now use a static variable for the arguments passed to GLog

* Remove spammy log message

This method is currently used in various places inside Caffe itself.

* Disable events for operators inside a chain

We don't need to use events in operators within a chain because the chain is
always scheduled on a single stream, keeping only first and last event for
scheduling purposes

* Ensure correct finish run order

In rare cases we might call finishRun and trigger net's destruction while
another worker is still holding shared_ptr to a thread pool, that can cause
thread pool destruction from within a worker thread in case no other nets are
using the pool. This diff fixes the order of calling finishRun and also changes
pool() to return raw pointer to keep pool's ownership within the net

* Reduce unnecessary polling

Make sure we don't waste CPU by polling operators that we can set an efficient
callbacks on

* Squash commit of syncing 9506eeb from github to fbcode

Patch xplat buck fix

add virtual destructor to OptimizationPass

add virtual destructor to OptimizationPass

build fixes for sync

build fixes for sync

* Fix net tracing

Fix net tracing from async_scheduling

* Fix logging
2018-05-29 11:38:02 -07:00
Orion Reblitz-Richardson
74246c9ba4
Potential fix for RNN test on MKL (#7862) 2018-05-25 16:16:46 -07:00
bddppq
93b7b5dddd
Fix trigonometric_op_test failures when running in python3.6 (#7831) 2018-05-24 19:09:35 -07:00
bddppq
f94ae3ba1d
Update from facebook (#7696)
* Fix handling of empty batches in SumReduceDimsOp

As titled

* Deferrable async_scheduling finishRun fix

Proper order of finishing run operations in deferrable_async_scheduling net

* Simplify exception handling in async_scheduling

Simplify exception handling, no need to busy wait, thread that processes the
last task can finish the run

* [C2]worker_coordinator_memorize_worker_ids

As titled. This is related to T28689868, where the number of blobs we want to create is equal to the number of worker ids

* Add unit test for nets with no type set

* Ignore total length argument in sympolic_pad_packed_sequence

1- There was a mistake in the code that total_length was added to the wrong symbolic function (pack_padded_sequence) instead of (pad_packed_sequence)
2- No need to throw an exception if total_length is given since it is only used to enable data_parallel training on multi-gpus and doesn't have anything to do with onnx export, so just ignore it. https://fburl.com/tk4gciqp

* Add support for MKLDNN to async_scheduling

Just add MKLDNN as a possible CPU option to async_scheduling's pool function

* [AuFL][ensemble] support branch output for prediction

This diff supports using predictions from different branches and thus enables model ensembling (not fully independent).

* Fix a bug in add_loss in layer_model_helper

As titled.

* Support lradaption for adam

1.lr adaption operator
2.apply to dense adam

* Perf tweaks for async_scheduling

Restore single pool option + remove unnecessary (no-ops) calls

* add quantization to SparseSimdAdagradOp

add a bunch of quantization signatures to SparseSimdAdagradOp, implementations to come next

* [sr] [codemod] Change all SR callsites to use new API

@allow-large-files

This diff refactors all callsites of SR to use the slightly changed API introduced in the diff below. Really what this means is that you need to include the correct header. Also if you were using `ClientFactory::newFactory` you need to not prefix it with `ClientFactory::`.

```
cd ~/fbsource/fbcode
find ./ -type f -exec sed -i -e 's:#include "servicerouter/client/cpp2/ClientFactory.h":#include "servicerouter/client/cpp2/ServiceRouter.h":' -e 's:#include <servicerouter/client/cpp2/ClientFactory.h>:#include <servicerouter/client/cpp2/ServiceRouter.h>:' -e 's/ClientFactory::newFactory(/newFactory(/g' {} \;
```

Also manually fixed spots that couldn't be done automatically (or broke because they depended on transitive includes).

* Back out "Fix handling of empty batches in SumReduceDimsOp"

Original commit changeset: 282da1730cc2 This commit is blocking the
Github->fbcode sync, which really needs to get merged ASAP. D7881937 which this
diff depends on will be reverted in the sync D7990948 which causes this to
break. The sync diff cannot be patched with this reversion because it must be
landed against base revision 5c8c099 , and D7881937 must not be included in the
sync diff because it is breaking GPU tests that are not available in sandcastle
: https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-cuda8.0-cudnn6-ubuntu16.04-test/3638/console
for one example.

* Add the flow to support operator benchmark

1) generate model with the operator 2) upload to everstore 3) generate model spec into json file 4) start running the benchmark

* [tum][gpu] Connect DPM trainer with flow and unit tests

This diff:
- Fix some small bugs for Yiming's recent changes to parallelizer, so it suits real use cases.
- Add correct tags to the TUM code, so we can do data parallel transform
- pass extra info when instantiation.
- add unit test for using DPM in TUM model

After this diff, we can do simple box, multi-gpu fully-sync trainer for TUM in Fblearner workflow, but may still need to do speed benchmarking.

* w/o normalized lradaption for adam dense only

The previous lr adaption includes a normalization step when performing the dot product operation. This is not exactly same as what is proposed in the paper. I add normalization as an option. Without it, the operator performs exactly what the paper proposed. With the option, we add the normalization step

* [fb] Use SharedPromise in DeferrableAsyncSchedulingNet

This code is to simplify DeferrableAsyncSchedulingNet by removing condition
variable + small fixes

* [tum] implement cuda sparseLengthsMean and LengthsMean

as title

* Adding an optional parameter to allow use of protobufs in InferShapesAndTypes function.

Adding an optional parameter to allow use of protobufs in InferShapesAndTypes function.

* Move feature_to_index to FeatureSpec.feature_to_index

move feature_to_index to FeatureSpec.feature_to_index to avoid override other fields

* [Caffe2] Rename bytes_moved to bytes_written

Just a rename in preparation for supporting bytes_read.

* [c2] fix ReduceFrontSumOp for empty case by setting 0

otherwise, it may use the results from last iteration when it's empty batch.

* [Caffe2] [Int8] Improve Intel CPU performance

* [Easy] Improve PrependDim op logging

as titled

* DBFileReader expand db_path using os.path.expanduser(..)

Since there are a lot of possible use cases of `DBFileReader` to read from user home path, like `~/local/sample.db`, I want to save people's trouble of calling `os.path.expanduser(db_path)` themselves.

* [Caffe2] Add bytes_read to cost structure

We're adding analytical read bytes to cost functions.  This extends the structure accordingly for all CostInference defined operators.
Additionally, some small bug fixes were performed:
1) Cost functions now extract type information of operands instead of assuming float

* Fix sleef on aarch64 for hhvm

@bypass-lint

Rename flag

* Remove duplicated part in caffe2/ideep/operators/conv_op.cc

should be sync error

* Rename test helper function test_adagrad_sparse_helper to adagrad_sparse_test_helper to avoid confusing pytest
2018-05-19 23:10:48 -07:00
Paul Jesse Hellemn
48bf733480 Changes from D7881937 and D7963936 plus an edit (#7605)
* Changes from D7881937 and D7963936 plus an edit

* D8038158

* Another change from cxj
2018-05-18 20:59:16 -07:00
James Sun
b4d5e67e5f Add asin, acos, tan, atan operators (#7600) 2018-05-16 18:09:26 -07:00
Xiaomeng Yang
921dece2d7
Update Im2ColNd functions (#7505)
Update Im2ColNd functions
2018-05-12 15:59:50 -07:00
Paul Jesse Hellemn
b875fb281c
Update from facebook (#7451)
* [bootcamp] Improve "Shape" operator to support axes specification

To improve .shape operator of Caffe2 to support x.shape(tensor, axes), which takes an optional int array "axes" as input. For example, x.shape(tensor, [1, 0]) will return the dimension for axis 1 and 0 following the specified order. For current version, "axes" input allows duplications and can have arbitrary length.

* Back out "Add barrier net that runs before training nets"

Original commit changeset: b373fdc9c30f. Need additional changes to some callers to support barrier failures.

* Change warning to verbose log to reduce log spam

The `LOG(WARNING)` was a bit spammy for regular use so lets just make it a `VLOG`.

* Extract the shared code from different caffe2_benchmark binaries

The OSS benchmark and Internal benchmark will share most functions in the benchmark.

* Support MFR in sequence training

As titled.

* Make knowledge distillation work with using logged prediction feature as teacher label.

1) Add loading raw dense feature as teacher label.
2) Optional calibration function for teacher label
3) Add teacher label into generic unit test
4) Deprecated TTSN workflow version using feature_options to config teacher label

* [C2/CUDA]: unjoined cross entropy sigmoid

as desc

* Add async_scheduling executor into deferrable_net_exec_test

Add async_scheduling into tests and fix some exception cases

* Fix Event disabled error

When disabling event in RNN ops make sure we don't call Finish on disabled
event from op's RunAsync

* cuda ensure cpu output op can handle both TensorCPU and TensorCUDA

as desc.

* [C2 Core] Infer input device option in C2 hypothesis_test checkers

Improve how we default input blob device options.
Previously it defaults as where op lives but it is not necessarily the case.

For example:
CopyCPUToGPU

* [C2 Op]SplitByLengthsOp CPU/GPU implementation

[C2 Op]SplitByLengthsOp CPU/GPU implementation

* fix undefined symbol error

not sure why we're getting undefined symbol even with link_whole = True
Need to figure out why but need this workaround for now

* Add tools in DAIPlayground platform to help debugging models

Add additional tools to allow Plauground override individual method defined in AnyExp.  This will allow user to create module that specificly change certain default method behavior.  An example included in this diff is deactivating test model and checkpointing.  When debugging any model problems, switching off components helps me quickly narrow down the location of the bug.  The technique is extensively used in task T27038712 (Steady memory increase in EDPM, eventually resulting in gloo/cuda.cu:34: out of memory)

* add shape and type inference for int8 conversion operator

* Fix flaky test for group_norm

Fix flaky test for group_norm

* Fix group_norm_op_test flaky

Fix group_norm_op_test flaky

* Implementation of composite learning rate policy

In many state-of-the-arts deep learning works, people use a simple trick to
schedule the learning rate: use a fixed learning rate until error plateaus
and then switch to a different fixed learning rate, and so on. In this diff,
we implemented a simple version of the composite learning rate. The user gives
a set of learning rates policies and corresponding iteration nums, and the
optimizer will change the learning rate policy based on the number of iterations so far.

For example, the user give two learning rate policies, one is FixedLearningRate
and PolyLearningRate, with an iteration number of 1k. Then the first 1k iteration,
we use FixedLearningRate. For the following iterations, we use PolyLearningRate.

* Split two use cases of CachedReader into two classes, DBFileReader and CachedReader

# Use Cases:

1). input: DB file -> output: DatasetReader.

Use DBFileReader.

2). input: Reader -> build cache DB file -> output: DatasetReader.

Use CachedReader.

# Changes to CachedReader:

1). Move db_path to the constructor.
Because in mock reader. cache will always be built ahead.

# Changes to tests:

1). Make a separate TestCase class for CachedReader and DBFileReader.

2). Make it possible to add more test functions by adding setUp, tearDown and _make_temp_path.

3). Make delete db_path more general. `db_path` could be a file for `log_file_db`, but could also be a directory for `leveldb`.

* Back out "On Mobile phones, call GlobalInit with no arguments in predictor in case we need to perform initialization"

Original commit changeset: 4489c6133f11

* Fix LARS bug

Fixed a bug in the LARS implementation which caused all subsequent blobs not using LARS to have the LARS learning rate multiplier applied to them.

* [tum] support sparse init & add uniformFill option

as title

* Propagate exception for async nets

Capture the exception when an exception is thrown in async nets and re-throw it after wait().  This allows exceptions to be propagated up to the caller.

This diff was a part of D7752068.  We split the diff so that C2 core files changes are in a separate diff.

* Automatic update of fbcode/onnx to 69894f207dfcd72d1e70497d387201cec327efbc

Previous import was 403ccfbd0161c38f0834413d790bad0874afbf9a

Included changes:
- **[69894f2](https://github.com/onnx/onnx/commit/69894f2)**: Use op schema.all tensor types in random like definitions (#865) <Scott McKay>
- **[b9d6b90](https://github.com/onnx/onnx/commit/b9d6b90)**: Clarify random like operators (#846) <Scott McKay>
- **[fc6b5fb](https://github.com/onnx/onnx/commit/fc6b5fb)**: Refactor shape inference implementation (#855) <anderspapitto>
- **[b7d8dc8](https://github.com/onnx/onnx/commit/b7d8dc8)**: fix cmake warning message (#863) <Eric S. Yu>
- **[f585c5d](https://github.com/onnx/onnx/commit/f585c5d)**: add pytorch-operator test for tile (#831) <Wenhao Hu>
- **[993fe70](https://github.com/onnx/onnx/commit/993fe70)**: add install step (#832) <Eric S. Yu>
- **[68bc26c](https://github.com/onnx/onnx/commit/68bc26c)**: add type inference for traditional ml ops except classifier ops. (#857) <Ke Zhang>
- **[9cc0cda](https://github.com/onnx/onnx/commit/9cc0cda)**: fix string representation of scalar types (#858) <G. Ramalingam>
- **[1078925](https://github.com/onnx/onnx/commit/1078925)**: fix y in pow test case to scalar (#852) <Wenhao Hu>
- **[c66fb6f](https://github.com/onnx/onnx/commit/c66fb6f)**: Add some math function shape inference (#845) <anderspapitto>
- **[ff667d1](https://github.com/onnx/onnx/commit/ff667d1)**: Refactor return type and docs for ONNXIFI_BACKEND_DIRECTX_ID (#853) <Marat Dukhan>
- **[11c6876](https://github.com/onnx/onnx/commit/11c6876)**: clear initializer names when clear initializer (#849) <Wenhao Hu>
- **[73c34ae](https://github.com/onnx/onnx/commit/73c34ae)**: Clarify FeatureVectorizer description. (#843) <Scott McKay>
- **[1befb9b](https://github.com/onnx/onnx/commit/1befb9b)**: Remove useless text in docs (#850) <Lu Fang>
- **[e84788f](https://github.com/onnx/onnx/commit/e84788f)**: Fix SELU attributes' default values (#839) <Lu Fang>
- **[ebac046](https://github.com/onnx/onnx/commit/ebac046)**: Add tile test case (#823) <Wenhao Hu>
- **[8b7a925](https://github.com/onnx/onnx/commit/8b7a925)**: a few more shape inference functions (#772) <anderspapitto>
- **[9718f42](https://github.com/onnx/onnx/commit/9718f42)**: Make the coefficient non optional for LinearClassifier (#836) <Jaliya Ekanayake>
- **[ef083d0](https://github.com/onnx/onnx/commit/ef083d0)**: Add save_tensor and load_tensor functions for Protos (#770) <Lu Fang>
- **[45ceb55](https://github.com/onnx/onnx/commit/45ceb55)**: Check if CMAKE_BUILD_TYPE set before project(). (#812) <Sergii Dymchenko>
- **[4b3d2b0](https://github.com/onnx/onnx/commit/4b3d2b0)**: [WIP] reenable shape inference tests (#834) <anderspapitto>
- **[22d17ee](https://github.com/onnx/onnx/commit/22d17ee)**: RNN tests: LSTM, GRU, SimpleRNN (#739) <Peyman Manikashani>
- **[de65b95](https://github.com/onnx/onnx/commit/de65b95)**: dimension denotation (#443) <Tian Jin>
- **[eccc76e](https://github.com/onnx/onnx/commit/eccc76e)**: fix field number issue in onnx operator proto and enable its build (#829) <Ke Zhang>
- **[d582beb](https://github.com/onnx/onnx/commit/d582beb)**: disable shape inference test to unbreak ci (#830) <Lu Fang>
- **[485b787](https://github.com/onnx/onnx/commit/485b787)**: function proto for composite op. (#802) <Ke Zhang>
- **[cd58928](https://github.com/onnx/onnx/commit/cd58928)**: specify defaults for attributes of Affine op (#820) <G. Ramalingam>
- **[7ee2cf9](https://github.com/onnx/onnx/commit/7ee2cf9)**: merge the dummy backend back into the main one (#743) <anderspapitto>
- **[1c03a5a](https://github.com/onnx/onnx/commit/1c03a5a)**: [Proposal] ONNX Interface for Framework Integration (previously ONNX Backend API) header and docs (#551) <Marat Dukhan>
- **[3769a98](https://github.com/onnx/onnx/commit/3769a98)**: Rename real model test case from VGG-16 to ZFNet (#821) <Lu Fang>

* [C2]ReluN Op

relu n op.

tf reference: https://www.tensorflow.org/api_docs/python/tf/nn/relu6

* Call destructor when assigning a blob value

* Add executor overrides

Add executor overrides flag to enable migration to async_scheduling executor

* Add barrier net that runs before training nets - attempt #2

Add a synchonize barrier net that is run before training nets.  With this net, shards that are faster will wait for other shards before start training.  This reduce chances of the faster shards timing out during GLOO AllReduce.
Removed explicit data_parallel_model.py.synchronize call in holmes workflow.

This change was landed previously but caused errors for some EDPM workflows - See https://fb.facebook.com/groups/1426530000692545/permalink/1906766366002237/ - because EDPM assumes any call to CreateOrCloneCommonWorld and Gloo ops are wrapped in exception handlers but in this case exception thrown in the barrier init net is not handled.

To address this issue, we add _CreateOrCloneCommonWorld to the param_init_net instead of a new barrier init net.  Since errors for param_init_net run is handled gracefully and re-rendezvous, it should fixes the problem.

* Handle empty nets in async_scheduling

Make sure we don't get stuck on empty nets

* use CUDA_ARCH for conditional compile

* [C2 fix] infer function for ensure_cpu_output_op

* Update group_norm test to reduce flaky test

* Fix lr_multiplier for GPU
2018-05-10 23:14:27 -07:00
Xiaomeng Yang
3ae92b3a8b
Fix lint errors (#7247) 2018-05-03 12:17:23 -07:00
Lu Fang
664fe34e0a
[Caffe2][fbcode=>GH sync] Update from facebook 4323b18ce13c (#7116)
* [fix] Re-enable events in RNN ops

We have earlier added event disabling in RNN ops as back then we didn't use
events, with current use cases this is no longer true
(https://fburl.com/8vd0lp8y)

* use ops with cude impl

* Revert D7729695: [caffe2][fix] Re-enable events in RNN ops

This reverts commit 4b215c7496fb724656ff4c776933a15bdbbcde5e

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [observer] Clean up observer_config.h

#accept2ship

* [1/n] Refactor dataio_test.py

Replace code duplication with a common function

* Add barrier net that runs before training nets

Add a synchonize barrier net that is run before training nets.  With this net, shards that are faster will wait for other shards before start training.  This reduce chances of the faster shards timing out during GLOO AllReduce.

Removed explicit data_parallel_model.py.synchronize call in holmes workflow.  Similar change in speech/asr_training workflow will come in another diff.

* Support the dnnlowp backend in caffe2_benchmark

This is for SHARE operator latency evaluation

* Migrate integral_image_op to main caffe2

migrate integral_image_op(GPU version) given by https://fburl.com/yvqezigi
to caffe2/caffe2/operators and implement its CPU version. Write up a test
using the hypothesis_test mechanism

* [pos_disc, fbcode] Implement unjoined lr loss

As explained in https://our.intern.facebook.com/intern/wiki/Model_Based_Calibration/, when the dataset is an joined data set, where labels might change later, we need to use unjoined logloss.

The implementation is almost the same as in Sigrid (https://fburl.com/1trngsls), where
    loss = y (log(p) - log(1-p)) + (1-y)(log(1-p)) = xy - (1-y)x - (1-y)log(1+exp(-x))

For x < 0, to ensure stability and avoid overflow, we reformulate the above exp as
    loss = xy - (1-y)x - (1-y)x + (1-y)log(1+exp(x)) = xy + (1-y)log(1+exp(x))

Then the final expression becomes
    loss = xy + (y - 1) x (x >= 0) - (1 - y) log(1 + exp(x - 2 x (x >= 0)))

where y is the true label, x is the dot product and p = logistic(x).

This kind of implementation is align with the current implementation of the original cross entropy in
https://phabricator.intern.facebook.com/diffusion/FBS/browse/master/fbcode/caffe2/caffe2/operators/cross_entropy_op.cc;0bae3b5d0f825897c5e0dd0ff10f489d7271bf25$7-13

* Keep the array to fix the conflict

* [C2] Compute Adagrad effective LR

The AdagradWithLR op outputs an extra blob which is contains the average effective learning rate across all weights in this blob.

* Open-source extractMetaNetDef & runGlobalInitialization, add new Predictor constructor from db file, and add run_map_outputs

1. Open-source extractMetaNetDef and runGlobalInitialization, for use in
2. new Predictor constructor from db file.
3. Add new run function that returns outputs as TensorMap

* Disable eigen cpu

Disable eigen cpu in transpose and reduce

* Introduce request_only/object_only property of ModelLayer

by default this is False

* A simple TC Caffe2 benchmark

We can run tunner, get MappingOptions and then use them to
compare against cuBLAS

currently broken due to LLVM issues. How to run:

hg checkout eec1ab31b59c03b8deded1c755a9abaf8c45be01
add D7401202
add D7434625
add D7506031
add D7540728

buck run @mode/dev-nosan tc/tc/benchmarks_python:caffe2_benchmark

* Move Caffe2 feature_maps_ops to open source

Need feature maps operators in open source project facebookresearch/BlueWhale

* Manually fix the conflicts in channel shuffle op

* Fix the inconsistency between different gh and fbcode

* Skip Adagrad GPU Test (Because some gpu implementation is missing)

* Fix another test to make sure it won't run on gpu when implementation is not available yet
2018-05-01 20:49:00 -07:00
Xiaomeng Yang
08a853b02c
Add rsqrt op in caffe2 (#7154) 2018-05-01 15:06:53 -07:00
Xiaomeng Yang
762eb3ddc8
[Caffe2] Add moments op in caffe2 (#7114)
* Add moments op in caffe2

* Use rsqrtf in float for group_norm

* Add docs for default behavior when axes is not provided.

* Update group_norm_op by using Eigen::sqrt on CPU
2018-05-01 12:19:08 -07:00
daquexian
f87462c65f [Caffe2] Fix the wrong argument name in collect_and_distribute_op (#7091)
* Fix the wrong argument name, FPN works!

* Fix collect_and_distribute test
2018-04-30 15:01:11 -07:00
Xiaomeng Yang
49f87320ba
[Caffe2] Add full impl of GroupNorm (#7058)
* Add full impl of GroupNorm

* Fix comments in math.h

* Remove unsed buffers

* Add #include <array> in gpu version

* Remove unused moments_buffer_

* Make inverse std to be a template.

* Add detailed comments
2018-04-29 11:26:40 -07:00
James Reed
20cd27da42
[caffe2][ONNX] Implement CPU NumpyTileOp and corresponding ONNX backend (#7053)
* Implement CPU NumpyTileOp

* Address comments
2018-04-27 19:58:15 -07:00
Marat Dukhan
24d05662ea
[caffe2] Open-source DEPTHWISE_3x3 engine (#6601)
DEPTHWISE_3x3 engine provides an optimized implementation of depthwise 3x3 convolution, e.g. for ShuffleNet, MobileNets
Implementations exist for CPU (generic), ARM CPU, and CUDA GPU.

Originally developed by @ajtulloch
2018-04-26 02:30:51 -04:00
James Reed
3c80a2b85c
[caffe2] Add flag to ONNXWhile to skip scoping (#6910)
* [caffe2] Fix logic error in tensor filling ops in C++ ONNX backend

* [caffe2] Add flag to ONNXWhile to skip scoping
2018-04-24 16:53:22 -07:00
Bram Wasti
aa56a1211d
Update from facebook (#6871)
* Track checkpoint performance in scuba

As title.

* [C2/CUDA]: fix cross entropy sigmoid with logits

when adding log_d_trick, I forgot to add it to the cuda impl; this diff fixes
it.

* Back out "[caffe2] Unregister MKL fallbacks for NCHW conversions"

Original commit changeset: 8918dd40205a
Will land after @jongsoo's diff https://phabricator.intern.facebook.com/D7596315 lands

* [Easy][C2] Don't add blob to external outputs from output_record if it's already external output

As desc.

* On Mobile phones, call GlobalInit with no arguments in predictor in case we need to perform initialization

FACEBOOK:

The QPL logger needs the initialization code. In the past, the initialization code is put in the pipeline calling Caffe2. However, those places become obsolete quickly, as the product teams change places to call Caffe2 from time to time. We also need to track which teams use Caffe2 so that we can put the initialization code there.

With this diff, the initialization code is put in the predictor constructor, only enabled for mobile phones. This way, we can always enable QPL logging.

Once we do this, we can check how many times Caffe2 inference is called in production, and which models are more popular in production. This way, we can prioritize our effort supporting those models.

Will clean up the old code calling the init in the product in a separate diff.

* add padding op for sparse length tensor

to pad length-based sparse tensor with padding_value

* Add conv_op with cudaconvnet engine

Add conv_op with cudaconvnet engine

* [numa] Fix simple NUMA copy benchmark

Move XavierFill into init_net and also compute BW

* call roundf (device function) instead of round (host function)

* [caffe2_benchmark][observer] Make caffe2_benchmark use its own observer

1. Add ClearGlobalNetObservers()
2. Make caffe2_benchmark use its own observer and observer_reporter

* [detectron] Use roundf instead of round in the detectron module ops

* allow K larger than number of elements in top k op

one use case is to use this op together with PackSegments for sparse tensors, where the number of elements in each slice is not statistically defined.

* add ChannelShuffle DNNLOWP op

* fixup math_cpu.cc break
2018-04-23 15:01:56 -07:00
Xiaomeng Yang
71c644b005
[caffe2] Add ReduceMinOp and ReduceMaxOp (#6744)
* Add gpu check for reduce_max

* Add ReduceMinOp and ReduceMaxOp

* Merge util functions in reduce_ops and math

* Expose math internal functions
2018-04-19 00:22:23 -07:00
Jongsoo Park
c40eefeef9 ChannelShuffle with NHWC layout (#6667)
* ChannelShuffle with NHWC layout

* ChannelShuffle with NHWC layout
2018-04-18 19:13:45 -07:00
Pooya Davoodi
969251962c [Caffe2] Enhance test for CollectAndDistributeOp (#6693)
* Caffe2: Enhance test for CollectAndDistributeOp

This also changes the operator and the test to use stable sort
otherwise the test will fail due to differences between the op
and the test when facing ROIs of the same score.

* Caffe2: Adjust comparator to make std::nth_element and std::sort stable

Revert the removal of std::nth_element and std::sort and adding of
std::stable_sort.
2018-04-18 13:19:05 -07:00
Orion Reblitz-Richardson
6223bfdb1d Update from Facebook (#6692)
* [GanH][Easy]: Add assertion to adaptive weighting layer

0 weight causes numeric instability and exploding ne

* [Easy] Add cast op before computing norm in diagnose options

As LpNorm only takes floats we add a manual casting here.

* Introduce a new caching device allocator

`cudaMalloc` and `cudaFree` calls are slow, and become slower the
more GPUs there are. Essentially, they grab a host-wide (not device-wide) lock
because GPU memory is transparently shared across all GPUs. Normally, this
isn't much of a concern since workloads allocate memory upfront, and reuse it
during later computation.

However, under some computation models (specifically, memory conserving
approaches like checkpoint-and-recompute, see
https://medium.com/@yaroslavvb/fitting-larger-networks-into-memory-583e3c758ff9)
this assumption is no longer true. In these situations, `cudaMalloc` and
`cudaFree` are common and frequent. Furthermore, in data parallel contexts,
these calls happen at nearly the same time from all GPUs worsening lock
contention.

A common solution to this problem is to add a custom allocator. In fact,
nVIDIA provides one out of the box: CUB, which Caffe2 already supports.
Unfortunately, the CUB allocator suffers from very high fragmentation. This is
primarily because it is a "buddy" allocator which neither splits nor merges
free cached blocks. Study
https://github.com/NVlabs/cub/blob/1.8.0/cub/util_allocator.cuh#L357 if you
want to convince yourself.

This diff adapts a caching allocator from the Torch codebase
https://github.com/torch/cutorch/blob/master/lib/THC/THCCachingAllocator.cpp
which does splitting and merging and ends up working really well, at least for
workloads like the checkpoint-and-recompute computation models noted above.

I simplified the implementation a little bit, made it a bit more C++-like. I
also removed a bunch of stream synchronization primitives for this diff. I
plan to add them back in subsequent diffs.

* Report reader progress in fblearner workflows

Integrate with fblearner progress reporting API and add support to report training progress from reader nodes.
If reader is constructed with batch limits, report based on finished batch vs total batch. The finished batch may be more than total batch because we evaludate if we should stop processing everytime we dequeue a split.
If no limit for the reader, report based on finished splits (Hive files) vs total splits. This is fairly accurate.

* [GanH][Diagnose]: fix plotting

1. ganh diagnose needs to set plot options
2. modifier's blob name is used for metric field can need to be fixed before
generating net

* Automatic update of fbcode/onnx to 985af3f5a0f7e7d29bc0ee6b13047e7ead9c90c8

* Make CompositeReader stops as soon as one reader finishes

Previously, CompositeReader calls all readers before stopping. It results in flaky test since the last batch may be read by different threads; resulting in dropped data.

* [dper] make sure loss is not nan

as desc.

* [rosetta2] [mobile-vision] Option to export NHWC order for RoIWarp/RoIAlign

Thanks for finding this @stzpz and @wangyanghan. Looks like NHWC is more
optimized. For OCR though it doesn't yet help since NHWC uses more mem b/w but
will soon become important.

* Intra-op parallel FC operator

Intra-op parallel FC operator

* [C2 Proto] extra info in device option

passing extra information in device option

design doc: https://fb.quip.com/yAiuAXkRXZGx

* Unregister MKL fallbacks for NCHW conversions

* Tracing for more executors

Modified Tracer to work with other executors and add more tracing

* Remove ShiftActivationDevices()

* Check for blob entry iff it is present

When processing the placeholders ops, ignore if the blob is not present in the blob_to_device.

* Internalize use of eigen tensor

Move use of eigen tensor out of the header file so we don't get template partial specialization errors when building other libraries.

* feature importance for transformed features.

* - Fix unused parameter warnings

The changes in this diff comments out unused parameters.
This will allow us to enable -Wunused-parameter as error.

#accept2ship

* add opencv dependencies to caffe2

The video input op requires additional opencv packages. This is to add them to
cmake so that it can build

* Add clip_by_value option in gradient clipping

Add clip_by_value option in gradient clipping

when the value is bigger than max or smaller than min, do the clip

* std::round compat
2018-04-17 23:36:40 -07:00
Xiaomeng Yang
4be34ca0f3 Add broadcast and reduce gradient (#6668)
Add broadcast and reduce gradient
2018-04-17 13:31:13 -07:00
Xiaomeng Yang
cd2112717c
[caffe2] Update math functions with params on host. (#6602)
* Update ReduceMean

Add reduce mean to math

Add reduce mean to math

* sync reduce_ops_test

* Update math_gpu.cu
2018-04-14 21:41:41 -07:00
Yinghai Lu
ef8f556212
[Caffe2] Changes done inside Facebook (#6378)
* fix unit test for sqrt op

From the error logging:

[idx, grad, grad_estimate] are:
[[ 146.            0.5           0.45776367]
 [ 147.            0.5           0.45776367]

The gradient == 0.5 is correct, which means the SqrtOp and its gradient is doing right job. (Because y = sqrt(x), loss = y^2/2 = x/2, and then d(loss)/dx = 1/2 = 0.5; )

The test failed because of numerical problem of grad_estimate (in unit test). It can be because the step_size is small, and float precision is not high (when there are multiple elements in the tensor, we do sum(y^2) to compute loss)

This diff
- increase the step size, and also move the test cases to be further away from 0 (where sqrt(x) is not well defined) to be safe :)
- also clean up, and merge the test case for inplace Vs. non-inplace

Tested with:

`CAFFE2_HYPOTHESIS_PROFILE=debug ai_bt caffe2/caffe2/python/operator_test:elementwise_ops_test -- "test_sqrt"`

* CompositeReader & CompositeReaderBuilder

A new type of reader gluing multiple readers together.

* Back out "Revert D7394363: [GanH]: Log D Trick for Cross Entropy with Sigmoid"

Original commit changeset: 9325a4356dbe

* [dai][WIP] convert params to int8 on ps before sending to trainer

Add float->uint8 conversion in addition to float->fp16 conversion in model_saver.

* [easy] improve unit test for sparse length sum ops

as desc.

#accept2ship

* Update GitHub upstream to 771fcb3455

* move sparse hash unique ops to OOS and add unit tests

- move the SparseHash version to OOS, since 'sparsehash' is already deps of caffe2 OOS: https://fburl.com/arssw4n1
- The 'SparseHash' engine is also being used in OOS, so the SparseHash version shall be in OOS to reduce confusion: https://fburl.com/o5ea7ah2

- fix the CUDA UniqueOp for the case when batch is empty.
- add unit test

* group_norm_op for caffe2

This is the cuda op for Group Normalization (GN): https://arxiv.org/abs/1803.08494

This code implements GN in one op that computes Y=gamma * (X-mu) / sigma + beta and also its gradients. It is expected to have minimal memory consumption (similar to the BN op), without creating new blobs if GN were implemented as several ops (e.g., reshape, norm_mean/std, affine_channel).

* Resubmit D7405233: disappeared in D7464958

OOS publish causes the op missing -- however, test was still there

* [c2] add sparse hash engine for cuda unique op

The SparseHash version of UniqueOp copy input tensor to CPU, and make use of sparse hash map to get unique output, and then copy back to GPU.

* [dper][gpu] enable unit testing gpu trainer for sparse nn

to debug the GPU trainer using mock data in unit test.

make it easier to develop GPU trainer for new models.

* Reuse Gloo context for Synchronize() calls

Previously we were creating (and leaking) the Gloo context on each call to Synchronize(). Now only run the common world op and create the barrier net once, then run the barrier net on each Synchronize() call. Since timeout is associated with the Gloo context, assert that the timeout is fixed instead of trying to handle the complexity of multiple timeouts (and associated contexts).

* [GanH/WGAN][1/n]: add FC param clipping

as titled

* [mobile] minimizing changes between caffe2_benchmark and speed_benchmark

* [GanH]: enable diagnose within model

avoid finding blob names but to directly enable inside the model

* Add `net_transformer_fun` option to DPM

This callback allows for various transformations to be made to the
model after gradient operators have been added. The immediate motivation for
this is to allow transformations such has "checkpoint-and-recompute" which
allow trading off memory for additional compute.

Adding several callbacks like this has made DPM's API less than ideal at this
stage. However, I could not find any reasonable alternative.

* [DT] [33/n] Compile flow task groups

task groups need to compiled in order to pickle the object in fblearner. However I also changed the Job's compile function as creating new object is not necessary.

* Initial commit for sparse_normalize vectorization and benchmark

* [GanH]: LB Calibration for JSD

as titled

* Tracing event in async executor

Adding event tracing through TRACE_EVENT macro in async executor

* [Resubmit] D7409751 Reseting book-keeping blobs when the reservoir is reset

D7409751 got lost in D7464958

* Visualizing realtime weights values

we want to visualize the weights values as optimizer is iterating. This diff supports to visual the weights at an assigned index.
Currently, we assume the blob to be 2 dimensional.

* [GanH][Easy]: Fix Homotopy Weighting

apparantely, there was a bug in homotopy weight (alpha, beta) update

* [c2] move sparse hash unique op out of oss

so that oss do not need to depend on google hash map.

* Get rid of std::round as it's not supported on Android

* Revert changes on setup.py

* Skip shaky test on Dataio

* fix
2018-04-10 21:11:43 -07:00
Paul Jesse Hellemn
771fcb3455 [caffe2] Fbcode to GitHub sync (#6208)
* [easy] allow empty tensor in cuda relu op

The diff has not enabled unit test of empty tensor, because MLKVersion of ReluOp need extra work to support

* Make blob norm plotting work with distributed trainer when the old framework is used
2018-04-02 16:35:27 -07:00
Orion Reblitz-Richardson
cbe92abd7c Disable failing test_lengths_max_gpu 2018-03-30 21:00:45 -07:00
Ellie Wen
3d27095eec [easy] fix comments
nit: fix comments
2018-03-30 21:00:44 -07:00
Andrey Malevich
b9d2ba1dbf Revert D7394363: [GanH]: Log D Trick for Cross Entropy with Sigmoid
This reverts commit d63266ccbc0c1390c58c2a71ae0b562fdec2fbc0

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
2018-03-30 21:00:44 -07:00
Ellie Wen
363a227d19 extend bucketize op to support duplicated boundries
upgrade bucketize op to support duplicated boundaries
2018-03-30 21:00:44 -07:00
Jason Gauci
551d5fbf9a CUDA version of LengthsMax operator
CUDA version of LengthsMax operator

@override-unit-failures
2018-03-30 21:00:44 -07:00
Xiaolong Wang
2b0e39f569 [GanH]: Log D Trick for Cross Entropy with Sigmoid
as titled
2018-03-30 21:00:44 -07:00
Lu Fang
344fa57680 Adjust the test since only the op only has CPU implementation 2018-03-27 18:10:39 -07:00
Jongsoo Park
3300e21d52 Add SparseLengthsPositionalWeightedSum operator that fuses SparseLengthsWeightedSum, LengthsRangeFill, and Gather
add SparseLengthsPositionalWeightedSum operator that fuses SparseLengthsWeightedSum, LengthsRangeFill, and Gather
2018-03-27 18:10:39 -07:00
Xianjie Chen
e6b04ba121 fix lengths sum cuda op for empty batch
the cuda does not allow launching empty kernel
2018-03-27 18:10:39 -07:00
Xianjie Chen
6ed9a0c3f2 fix cuda elementwise ops for empty batch
CUDA will fail to launch empty kernel
2018-03-27 18:10:39 -07:00
Roxie He
d2453afb1e Add SumElementsInt operator
Added a caffe2 math sum operator so that it takes integers (only int32)
Changed the SumFloatIter to SumGenericIter so that it takes >1 types.
Added a sumElementInt operator
2018-03-27 18:10:39 -07:00
Jiyan Yang
8fa38f8dce Add gradient clipping (#2452)
As titled.
2018-03-27 15:10:15 -07:00
Orion Reblitz-Richardson
1d5780d42c Remove Apache headers from source.
* LICENSE file contains details, so removing from individual source files.
2018-03-27 13:10:18 -07:00
Jason Gauci
f93e820e7d Revert "[C2][GPU]LengthsMax CUDA version (#2209)" (#2444)
This reverts commit 71acc269bb573c8c04343e6d534b2557a456b29a.
2018-03-27 01:15:52 -07:00
harouwu
6740126f5c [C2][GPU]LengthsMax CUDA version (#2209)
lengthsmax CUDA version.

will provide gradient later
2018-03-27 00:19:17 -07:00
Xiaomeng Yang
a73f9af5ab Add axis to top_k_op. (#2416)
* Revert update on top_k_op

* Add axis to top_k_op

Add axis to top_k_op
2018-03-23 20:43:43 -07:00
Xiaomeng Yang
3053618624 Add argmax and argmin ops (#2371)
* Revert update on top_k_op

* Add axis to top_k_op

* Remove do { ... } while (false)

* Revert top_k op to upstream

* Add argmin and argmax ops

Add argmin and argmax ops

* Revert top_k_test to upstream

* Add argmin and argmax ops

Add argmin and argmax ops
2018-03-22 00:52:11 -07:00
James Reed
48c70d2dbd Fix ReduceMean performance by specializing Eigen implementation for common shapes (#2355) 2018-03-21 21:48:54 -07:00
Orion Reblitz-Richardson
42d3bcc189 Only run WeightedMultiSample test on CPU and not GPU. 2018-03-20 13:34:22 -07:00
Yan Shang
69706b2ab4 Add C2 for weighted sampling
C2 operator, with input (1) index; (2) cdf; argument number_samples,
output number_samples samples from the index.
2018-03-20 13:34:22 -07:00
Lukasz Wesolowski
f7f48989ba GPU support for ChannelBackpropStatsOp
Step 2 of 3 in adding support for multidevice batch normalization on GPUs. Implements ChannelBackpropStatsOp. Similar to D6953411.
2018-03-20 13:34:22 -07:00
Chenguang Xi
3940e7f0a7 Support computing averaged norm in blob magnitdue visualization
1. support the LpNorm operator to calculate the average LpNorm by adding one more boolean argument, i.e., LpNorm(average = true) = LpNorm(x) / size of (x)

2. integrate the average option into visualization framework
2018-03-20 13:34:22 -07:00
Zhanibek Datbayev
7aeda25cfb Add type / shape inference for IndexHash op
just as title says
2018-03-20 13:34:22 -07:00
Edoardo Conti
6af3429f4f Add 2D Row-wise Arg Max Operator
Add operator to return row-wise arg max of 2D matrix.
2018-03-20 13:34:22 -07:00
Orion Reblitz-Richardson
00603b5e0a Add CollectAndDistributeFpnRpnProposalsOp for FPN support (#2254)
* Add CollectAndDistributeFpnRpnProposalsOp for FPN support

* Adds a C++ operator equivalent to the Python op in Detectron
* Once some additional GenerateProposalsOp changes are made this will
 let us support Detectron FPN models with straight Caffe2 C++ ops
* RetinaNet and segmentation models require additional work

* Remove some uses of conservativeResize

* Add notes about training and inputs/outputs to operator documentation
2018-03-19 14:04:43 -07:00
Mohammad Hossain
28eda01809 Reduce Sum and Reduce Mean (#2189)
* Reduce Sum and Reduce Mean

* Handle reductions with empty 'axes'

* Merge codebase and simplify tesnor reduction logic

* Restructure code and add comments.

* Fix parameter to scale

* Fix parameter to scale
2018-03-13 19:13:47 -07:00
Qinqing Zheng
edd138ba00 [C2] Support optional lengths input to ReduceFront/Back operators (#2250) 2018-03-13 13:20:26 -07:00