Summary:
When output blob names are specified while load_all=1, they are silently ignored. However, this behavior is not documented. In this diff, we simply disallow users from providing blob names when load_all=1.
See discussion at https://fb.workplace.com/groups/1405155842844877/permalink/2714909788536136/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19133
Reviewed By: dzhulgakov
Differential Revision: D14883698
Pulled By: chandlerzuo
fbshipit-source-id: 6e4171e36c4ccc4f857e79da98b858a06b7d8ad6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19083
As we have discussed, there are too many AdjustBatch ops; they incur reallocation overhead and affect performance. We will eliminate these ops by
- inlining the input adjust batch op into Glow
- inlining the output adjust batch op into OnnxifiOp, and doing that only conditionally.
This is the C2 part of the change and requires a change on the Glow side to work e2e.
Reviewed By: rdzhabarov
Differential Revision: D14860582
fbshipit-source-id: ac2588b894bac25735babb62b1924acc559face6
Summary:
Almost there, feel free to review.
These c10 operators are exported to the _caffe2 domain.
TODO:
- [x] let the onnx checker pass
- [x] test tensor list as argument
- [x] test caffe2 backend and converter
- [x] check the c10 schema can be exported to onnx
- [x] refactor the test case to share some code
- [x] fix the problem in ONNX_ATEN_FALLBACK
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18210
Reviewed By: zrphercule
Differential Revision: D14600916
Pulled By: houseroad
fbshipit-source-id: 2592a75f21098fb6ceb38c5d00ee40e9e01cd144
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18155
- Make a python decorator caffe2_flaky for caffe2 operator unit tests.
- The environment variable CAFFE2_RUN_FLAKY_TESTS is now used to mark flaky test mode
During a test run,
- If flaky test mode is on, only flaky tests are run
- If flaky test mode is off, only non-flaky tests are run
Mark ctc_beam_search_decoder_op_test as flaky
Reviewed By: ezyang, salexspb
Differential Revision: D14468816
fbshipit-source-id: dceb4a48daeb5437ad9cc714bef3343e9761f3a4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18129
A lot of tensor inference functions assume the operator passes the schema,
so call Verify to make sure this is actually the case.
I created a diff before to add this check to Concat (https://github.com/pytorch/pytorch/pull/17110), but encountered a lot more places where this is assumed (for example, ElementwiseOpShapeInference).
Reviewed By: mdschatz
Differential Revision: D14503933
fbshipit-source-id: cf0097b8c3e4beb1cded6b61e092a6adee4b8fcb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18084
The data_strategy parameter was not used in some of the unit tests for optimizers.
Reviewed By: hyuen
Differential Revision: D14487830
fbshipit-source-id: d757cd06aa2965f4c0570a4a18ba090b98820ef4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18036
- Add macros to export c10 cuda operators to caffe2 frontend
- Instead of having a separate caffe2 registry for the c10 operator wrappers, use the existing caffe2 registries
Reviewed By: ezyang
Differential Revision: D14467495
fbshipit-source-id: 7715ed2e38d2bbe16f1446ae82c17193a3fabcb9
Summary:
Observed the test `TestGroupConvolution.test_group_convolution` to fail with the following error:
```
Falsifying example: test_group_convolution(self=<caffe2.python.operator_test.group_conv_test.TestGroupConvolution testMethod=test_group_convolution>, stride=3, pad=0, kernel=5, size=8, group=4, input_channels_per_group=7, output_channels_per_group=8, batch_size=2, order='NHWC', engine='', use_bias=False, gc=, dc=[, device_type: 1])
You can reproduce this example by temporarily adding reproduce_failure('3.59.1', b'AAAA') as a decorator on your test case
```
This example generated by hypothesis has `group=2`, `order='NHWC'`, and `dc=[, device_type: 1]`.
I think this example should be skipped.
I have mimicked the change corresponding to [PR#13554](https://github.com/pytorch/pytorch/pull/13554) to skip this example.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17715
Differential Revision: D14346642
Pulled By: ezyang
fbshipit-source-id: b1f1fef09f625fdb43d31c7213854e61a96381ba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16723
Removed obsolete argument correct_transform_coords in bbox_transform op.
* It was only for backward compatibility. We should not have models using it now.
Differential Revision: D13937430
fbshipit-source-id: 504bb066137ce408c12dc9dcc2e0a513bad9b7ee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16691
Previous diffs already introduced a macro that registers caffe2 CPU kernels with c10.
This now also registers the CUDA kernels with it.
Reviewed By: bwasti
Differential Revision: D13901619
fbshipit-source-id: c15e5b7081ff10e5219af460779b88d6e091a6a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16643
The test was disabled in D13908117 because it conflicted with another diff that was about to land.
The merge conflict is now fixed, and we are re-landing it.
Reviewed By: ezyang
Differential Revision: D13911775
fbshipit-source-id: b790f1c3a3f207916eea41ac93bc104d011f629b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16548
With this macro, a caffe2 operator can now directly be registered with c10.
No need to write custom wrapper kernels anymore.
Differential Revision: D13877076
fbshipit-source-id: e56846238c5bb4b1989b79855fd44d5ecf089c9c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16676
This op is used for changing the batch size (the first dimension) of a tensor.
Reviewed By: bertmaher, ipiszy
Differential Revision: D13929200
fbshipit-source-id: 4f2c3faec072d468be8301bf00c80d33adb3b5b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16785
There's no EIGEN engine implemented for DeformConv, but the unit test was checking it.
Reviewed By: BIT-silence
Differential Revision: D13967306
fbshipit-source-id: e29c19f59f5700fc0501c59f45d60443b87ffedc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16478
This diff includes an example registration of a caffe2 op in torch. A previous attempt ran into a static initialization order bug.
Reviewed By: smessmer
Differential Revision: D13854304
fbshipit-source-id: ec463ce2272126d08a5163d1599361ee5b718bbc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16630
Two PRs landed concurrently: enforcing tensor constraints and refactoring c10. Since it's not prod code, disable the test and I'll let Sebastian fix it properly.
Reviewed By: ezyang
Differential Revision: D13908117
fbshipit-source-id: 381c5626078b794afa1fc7a95cb1ea529650424c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16246
The op schema says it returns multiple values, so let's actually return multiple values instead of one tuple.
For some reason, this did work when called from python (probably some auto-unpacking),
but once called from JIT, it segfaulted. This diff fixes that.
Reviewed By: dzhulgakov
Differential Revision: D13780147
fbshipit-source-id: fe94f82f4c53b7454f77c4484fca4ac9dc444475
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16374
This fixes the original attempt in OSS (adds to the CMake and python build files).
Reviewed By: smessmer
Differential Revision: D13821061
fbshipit-source-id: 82f0dade0145fd04bdf8e3cb3954b5790e918162
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16350
Example usage of the new caffe2 integration
Reviewed By: smessmer
Differential Revision: D13408546
fbshipit-source-id: 87240ca7f48d653a70241d243aa0eb25efa67611
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16335
Group conv is not implemented with the EIGEN engine, so this diff disables the related tests.
Reviewed By: jamesr66a
Differential Revision: D13807204
fbshipit-source-id: 41f6de43da40882f57e64474520e185733caefb7
Summary:
bypass-lint
- Change all Caffe2 builds to use setup.py instead of cmake
- Add a -cmake- Caffe2 build configuration that uses cmake and only builds cpp
- Move skipIfCI logic from onnx test scripts to the rest of CI logic
- Removal of old PYTHONPATH/LD_LIBRARY_PATH/etc. env management
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15917
Reviewed By: orionr
Differential Revision: D13637583
Pulled By: pjh5
fbshipit-source-id: c5c5639db0251ba12b6e4b51b2ac3b26a8953153
Summary:
This is a follow-up to #13945, where we had to turn off some TRT tests because some ops were not ready to accept ONNX opset 9+ models. This PR fixes Reshape.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15380
Differential Revision: D13649825
Pulled By: houseroad
fbshipit-source-id: b72e62803de5b63cc001c3fe4b3bf64dfa996e94
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15865
Factored out code used in tests for the Add, Mul, and Sub operators
into two new methods: one to generate the test vectors, and one
to run the actual tests given a caffe2 and python operator.
Reviewed By: houseroad
Differential Revision: D13526955
fbshipit-source-id: 8970ba5a1305ca19a54a14b51816d4a19f19d678
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15553
Add a unit test and an implementation of the NHWC layout for the Resize operator.
Also, add a parallel-loop pragma to the old NCHW layout.
Reviewed By: jspark1105
Differential Revision: D13540762
fbshipit-source-id: eebf252bf0d1efdff180a171d804181045f100a5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15625
3D group conv (both NCHW and NHWC layout) was not correct.
Added group=2 in test_1d_convolution and test_3d_convolution in conv_test
Reviewed By: protonu
Differential Revision: D13562099
fbshipit-source-id: 586e8a7574a2764f2a3b559db6c2415b3ab90453
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15632
Just formatting and a few lints.
Reviewed By: yinghai
Differential Revision: D13562403
fbshipit-source-id: c56f8ee61f68cdaccc0828a764ff729454f68259
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15588
Use the NHWC2NCHW and NCHW2NHWC functions, which are easier to understand than code using transpose and generalize to non-2D convolutions.
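For reference, a minimal NumPy sketch of what these helpers compute in the 2D case (the actual Caffe2 utilities also cover the non-2D generalization):
```python
import numpy as np

def NHWC2NCHW(x):
    # (N, H, W, C) -> (N, C, H, W)
    return np.transpose(x, (0, 3, 1, 2))

def NCHW2NHWC(x):
    # (N, C, H, W) -> (N, H, W, C)
    return np.transpose(x, (0, 2, 3, 1))
```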
Reviewed By: csummersea
Differential Revision: D13557674
fbshipit-source-id: c4fdb8850503ea58f6b17b188513ae2b29691ec0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15082
We didn't have a unit test for low-precision rowwise adagrad.
Reviewed By: chocjy
Differential Revision: D13300732
fbshipit-source-id: 46e7bdfc82c5a6855eeb6f653c0a96b0b3a20546
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15389
SparseLengthsMean was generating uninitialized data for empty inputs (lengths == 0). We should return zeros.
The unit tests were also not covering this special case which is fixed by this diff.
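As a rough reference for the intended semantics, a sketch (not the Caffe2 kernel) of the fixed behavior:
```python
import numpy as np

def sparse_lengths_mean(data, indices, lengths):
    # data: (K, D); indices select rows; lengths segments the indices.
    out, pos = [], 0
    for n in lengths:
        if n == 0:
            out.append(np.zeros(data.shape[1]))  # empty segment -> zeros, not garbage
        else:
            out.append(data[indices[pos:pos + n]].mean(axis=0))
        pos += n
    return np.stack(out)
```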
Reviewed By: salexspb
Differential Revision: D13515970
fbshipit-source-id: 3c35265638f64f13f0262cee930c94f8628005da
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15174
Previously, Caffe2 maintained a separate per-thread per-device
current logical CUDA stream ID. In this PR, we switch Caffe2 over
to using c10::Stream to manage the current stream, and also
manage the allocation of cudaStream_t objects.
This results in a slight behavior change: previously, Caffe2
would have been willing to allocate an arbitrary number of
CUDA streams, depending on how high the logical stream IDs
went. The c10::Stream pool has a fixed number of streams, once
you exceed it, it wraps around.
Reviewed By: dzhulgakov
Differential Revision: D13451550
fbshipit-source-id: da6cf33ee026932a2d873835f6e090f7b8a7d8dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15110
Support casting to string on CPU.
Reviewed By: intermilan
Differential Revision: D13429381
fbshipit-source-id: b737a1ba1237b10f692d5c42b42a544b94ba9fd1
Summary:
This pull request contains changes for:
1. Adding the MIOpen RNN APIs miopenGetRNNLayerBiasSize and miopenGetRNNLayerParamSize.
2. Fixing the usage of the API miopenGetRNNLayerParam.
3. Modifying the RNN test to run using the MIOpen engine.
Differential Revision: D13355699
Pulled By: bddppq
fbshipit-source-id: 6f750657f8049c5446eca893880b397804120b69
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13756
This implements general Gather operator for arbitrary axis, sharing the code with BatchGather.
- CPU gather & batch gather logic is now shared through caffe2::gather_helper, for any axis.
- Shared CUDA kernel moved to gather_op.cuh, for any axis.
- Gradients of axis > 0 delegate to BatchGatherGradientOp which now has axis argument.
- BatchGatherOp doc strings updated to have correct rank (q + (r -1)) and output.
- Added tests for axis == 2.
GatherOp supports index wrapping for axis == 0 by default, which was added earlier for ONNX.
This diff also extends it to work in the CUDA kernel. Added a "wrap_indices" argument which specifies
whether this wrapping should be done; set it to true if you'd like wrapping for any axis.
TBD: Update gradients to support negative indices (separate diff).
TBD: Once we have operator versioning, we'd like to update GatherOp to NOT support axis 0 wrapping
by default, but rather do it only if wrap_indices is set.
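In NumPy terms, the generalized gather described above behaves roughly like this sketch (the helper is illustrative; wrap_indices mirrors the new argument):
```python
import numpy as np

def gather(data, indices, axis=0, wrap_indices=False):
    idx = np.asarray(indices)
    if wrap_indices:
        idx = np.where(idx < 0, idx + data.shape[axis], idx)  # wrap negatives
    return np.take(data, idx, axis=axis)

x = np.arange(12).reshape(3, 4)
# output rank is q + (r - 1): indices rank q=2, data rank r=2 -> rank 3
print(gather(x, [[0, 2]], axis=1).shape)  # (3, 1, 2)
```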
Reviewed By: dzhulgakov
Differential Revision: D12983815
fbshipit-source-id: 8add9d67b47fe8c5ba7a335f581ca0530b205cd7
Summary:
The goal of this PR is to unify the cuda and hip device types in the caffe2 python front end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14221
Differential Revision: D13148564
Pulled By: bddppq
fbshipit-source-id: ef9bd2c7d238200165f217097ac5727e686d887b
Summary:
This pull request contains changes for:
1. Removing ConvTranspose related changes from caffe2/operators/hip/conv_op_miopen.cc
2. Adding the file caffe2/operators/hip/conv_transpose_op_miopen.cc
3. Modifying the tests to run convTranspose op using MIOpen engine
Differential Revision: D13055099
Pulled By: bddppq
fbshipit-source-id: ca284f8f9a073005b22013c375cc958257815865
Summary: Currently LambdaRank applies exponential emphasis on relevance, i.e., g = 2^rel when calculating DCG. This diff adds an option that supports g = rel in the loss function.
Reviewed By: itomatik
Differential Revision: D9891514
fbshipit-source-id: 64730d467a665670edd37e6dc1c077987991d1a8
Summary:
Add a markdown document summarizing the coverage of serialized operator tests. This currently only takes into account what has been covered by the tests with respect to the entire registry of c2 operators.
Next, we will break down the coverage by which operators have unit tests associated with them, which have hypothesis tests, and which have tests more specifically calling assertReferenceChecks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13703
Reviewed By: dzhulgakov
Differential Revision: D12970810
Pulled By: ajyu
fbshipit-source-id: 4f0cd057b1cf734371333e24d26cbab630a170e1
Summary:
I was hitting this error:
caffe2/caffe2/operators/stats_put_ops.h:66:25: runtime error: 9.22337e+18 is outside the range of representable values of type 'long'
So the assignment from int64_t to float loses some precision, and because of that we overflow.
Reproduced this issue with diff D12945013.
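A quick illustration of the precision loss (Python floats are IEEE doubles with a 53-bit mantissa):
```python
import numpy as np

i64_max = np.iinfo(np.int64).max  # 9223372036854775807 == 2**63 - 1
as_float = float(i64_max)         # rounds up to 9.223372036854776e+18 == 2**63
print(as_float == 2.0 ** 63)      # True: the rounded value is out of int64 range,
                                  # so casting it back overflows (the UBSAN error)
```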
Reviewed By: mlappelbaum, jdshi-fb
Differential Revision: D12927086
fbshipit-source-id: 7eae7fe25ab49d5ac15279335bd5b1fa89d6e683
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12733
Conv in NHWC layout only works for 2D images. This has been a pain point when implementing quantized 3D convolution, because we need NHWC layout for the best performance (note that NHWC layout in general gives better performance on CPU, not just for quantized operators). For example, our quantized ops can measure quantization error operator by operator, but this requires running a shadow fp32 operator, which is not easy when no 3D conv in NHWC layout is available (currently we do layout conversion on the fly for the shadow fp32 operator, which is error prone). Some Caffe2 frameworks like brew generate an error when we try to create a 3D conv op in NHWC layout. This was also a blocker for using aibench, because aibench uses brew.
i-am-not-moving-c2-to-c10
Reviewed By: houseroad
Differential Revision: D10333829
fbshipit-source-id: 2d203ee1db833cd3f9d39353219e3894b46c4389
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13554
D10233252 broke the ROCm test.
We don't have group conv in NHWC for hip yet, so this diff omits the related tests.
Reviewed By: hyuen
Differential Revision: D12917880
fbshipit-source-id: 9baf36a8cb061ee8cf393b2c438a2d1460ce5cd8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12428
Group conv in NHWC layout was enabled on CPU in D7547497.
There, the unit test for group conv in NHWC layout on CPU was enabled in group_conv_test.py but not in conv_test.py. This diff enables it in conv_test.py as well.
Reviewed By: BIT-silence
Differential Revision: D10233252
fbshipit-source-id: aeeaf3eedc60e1cf6321b5a1dbe6a561e3aacbde
Summary:
Essentially makes cuDNN treat those kernels as Nx1 ones.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12902
Reviewed By: BIT-silence
Differential Revision: D10852862
Pulled By: soumith
fbshipit-source-id: 7416cf6d131177340d21cbf1d42c1daa6c7cad8c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12843
This adds a cuda implementation for the UpsampleBilinearOp and UpsampleBilinearGradientOp.
The CUDA code is based on the corresponding ResizeNearest operators, but with bilinear interpolation logic taken from the CPU implementation.
Reviewed By: houseroad
Differential Revision: D10453776
fbshipit-source-id: b29ac330b72465974ddb27c0587bca590773fdec
Summary:
This is mostly for reusing all the cudnn test cases in our python operator_tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12278
Differential Revision: D10842592
Pulled By: bddppq
fbshipit-source-id: 4b3ed91fca64ff02060837b3270393bc2f9a9898
Summary:
TSIA - we want to deprecate numba in fbcode when moving to new compiler tiers.
Converted the old test to a non-numba regular python op test.
Reviewed By: xw285cornell
Differential Revision: D10519910
fbshipit-source-id: 0e9188a6d0fc159100f0db704b106fbfde3c5833
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12736
This updates UpsampleBilinearOp and UpsampleBilinearGradientOp to support scales to bring it inline with ResizeNearestOp https://github.com/pytorch/pytorch/pull/12720.
Reviewed By: houseroad
Differential Revision: D10416228
fbshipit-source-id: f339b7e06979c9c566afb4cee64a2d939b352957
Summary: Added 2 years ago in D3665603, never used, kill it.
Reviewed By: ezyang
Differential Revision: D10421336
fbshipit-source-id: 1b027a9ef2b71d0dd2c572cd4338bc8e046320d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12382
Implement fp16 -> (uint8 + scale and bias in fp32).
This is similar to fp32 rowwise quantization.
We could have done scale and bias in fp16, but we're not too motivated since we are not saving much, and those datatypes have to be converted to fp32 for processing anyway since x86 doesn't support half-float operations.
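A rough NumPy sketch of the scheme (uint8 payload plus fp32 scale/bias per row; illustrative, not the Caffe2 kernel):
```python
import numpy as np

def rowwise_quantize(x_fp16):
    x = x_fp16.astype(np.float32)  # x86 lacks native half-float arithmetic
    bias = x.min(axis=1, keepdims=True)
    scale = (x.max(axis=1, keepdims=True) - bias) / 255.0
    scale[scale == 0.0] = 1.0      # guard rows where all values are equal
    q = np.round((x - bias) / scale).astype(np.uint8)
    return q, scale.squeeze(1), bias.squeeze(1)
```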
Reviewed By: csummersea
Differential Revision: D10220463
fbshipit-source-id: 6c382026de881f03798c2e5fc43abfc80f84ea1f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12390
Introduce a no-op optimizer for when we don't want updates to happen, but don't want to affect downstream processes.
Reviewed By: mlappelbaum
Differential Revision: D10209812
fbshipit-source-id: 2af4ebc0fb42e78ea851c3a9f4860f3d224037b6
Summary:
Changes in this PR:
1. Intermediate Docker image is shared from build stage to test stage through ECR, in order to fix the Caffe2 flaky CUDA tests.
2. There are ~7 Caffe2 operator tests that are only flaky in `caffe2_py2_gcc4_8_ubuntu14_04_test` on CPU. Disabling those tests on that config only, which is okay to do because we are still running those tests in other test jobs.
After this PR is merged, CircleCI will be running on master automatically, and will be running on PRs if the author rebased their PR onto the newest master (which we will ask all the authors to do when we switch off Jenkins for Linux).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12389
Differential Revision: D10224267
Pulled By: yf225
fbshipit-source-id: dd1a90a425c3d13b870d3d328cb301eee2e6e2cd
Summary:
Original commit changeset: f5614a5d2607
D9986213 is causing a [huge performance difference](https://our.intern.facebook.com/intern/ads/analyze_canary/412951953278781781/) in Multifeed Aggregator and has been blocking the aggregator push since last Friday night: https://fburl.com/feedtools/b6izvwjz
We need to land this revert ASAP to unblock aggregator push.
Reviewed By: orionr
Differential Revision: D10123245
fbshipit-source-id: d83da8e00a1250f5d09811a0a587c127e377aab2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11349
Special-case BatchGather and BatchGatherGradient for block_size=1. This makes BatchGather 3-4X faster and BatchGatherGradient 10X faster for this case.
Reviewed By: jspark1105, ilia-cher
Differential Revision: D7218043
fbshipit-source-id: ea12042239a8adc92b9efcbd0b66e354fb43f4c7
Summary:
Followup to [the serialized test framework](https://github.com/pytorch/pytorch/pull/10594)
Round 1 for refactoring tests, starting alphabetically. I added some functionality, so I wanted to send out some of these initial changes sooner.
I'm skipping all tests that don't explicitly call assertReferenceChecks. Some tests directly call np.allclose, and others are simply TestCase (rather than HypothesisTestCase).
1. Start alphabetically producing serialized outputs for test functions, annotating those we want to include with `serialized_test_util.given`. So far I've only added one test per operator, but this already does seem to add quite a few tests.
2. Add functionality to allow us to generate outputs using pytest by adding pytest argument options. This allows us to skip adding a `__main__` function to quite a few tests.
3. Catch any exceptions generating the gradient operator and skip serializing/reading it, since certain operators don't have gradients.
4. Add functionality to better handle jagged array inputs, which numpy doesn't handle very well. We simply do the conversion to dtype=object explicitly (see the sketch after this list).
5. Make only one file per test function, rather than 4, to reduce the number of files in the github repo.
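A minimal example of the dtype=object conversion mentioned in item 4:
```python
import numpy as np

jagged = [np.array([1.0, 2.0]), np.array([3.0])]  # rows of different lengths
arr = np.empty(len(jagged), dtype=object)         # build the object array explicitly,
arr[:] = jagged                                   # sidestepping numpy's ragged-array guessing
```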
I also noticed that there is some hypothesis handling that makes `serialized_test_util.given` not compatible with adding more hypothesis decorators on top. For example, there are tests that do
```
settings(...)
given(...)
def test_my_stuff(...)
```
But there is a hypothesis handler that explicitly checks that `given` is called below `settings`, so we cannot refactor this to `serialized_test_util.given`. I've just avoided decorating these kinds of tests for now; I hope that's alright.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11350
Reviewed By: houseroad
Differential Revision: D9693857
Pulled By: ajyu
fbshipit-source-id: a9b4279afbe51c90cf2025c5ac6b2db2111f4af7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11413
LengthsTileOp was implemented using a sequence of device memcopies initiated on the CPU. This was very slow. I changed it to use a kernel. TUM benchmark QPS improved from 13k QPS to 20k QPS as a result.
Reviewed By: manojkris, xianjiec
Differential Revision: D9724988
fbshipit-source-id: 2f98c697730982734d7c6a26d0b6967310d49900
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10974
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10291
This new operator will do the following (a reference sketch follows the list):
Given a LENGTHS vector and n_splits, output a "split" LENGTHS vector where:
1. Each length in input vector is split into n_splits values (thus output vector should have LENGTHS.size(0) * n_splits elements)
2. The new lengths in output should be evenly split, and if the length is not divisible by n_splits, then order new values in descending order. (e.g. n_splits = 3, length = 5 -> 2 2 1)
3. If n_splits > some element in the array, its split elements will contain 0s. (e.g. n_splits = 3, length = 2 -> 1 1 0)
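A minimal reference sketch of the splitting rule (a hypothetical helper, not the actual op):
```python
def split_lengths(lengths, n_splits):
    out = []
    for length in lengths:
        base, rem = divmod(length, n_splits)
        # the first `rem` pieces get one extra element, giving descending order
        out.extend([base + 1] * rem + [base] * (n_splits - rem))
    return out

assert split_lengths([5], 3) == [2, 2, 1]
assert split_lengths([2], 3) == [1, 1, 0]
```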
Reviewed By: bddppq, chocjy
Differential Revision: D9013119
fbshipit-source-id: 82bf3371ec08c41fc3379177f0007afc142e0d84
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10888
Add a CUDA version of SpatialBNOp and also optimize SpatialBN on CPU.
Reviewed By: houseroad
Differential Revision: D9512435
fbshipit-source-id: 6f828c88d56d30dc9a2f98a297a161c35cc511b1
Summary:
This PR adds all PyTorch and Caffe2 job configs to CircleCI.
Steps for the CircleCI mini-trial:
- [ ] Make sure this PR passes Jenkins CI and fbcode internal tests
- [x] Approve this PR
- [ ] Ask CircleCI to turn up the number of build machines
- [ ] Land this PR so that the new `.circleci/config.yml` will take effect
Several Caffe2 tests are flaky on CircleCI machines and hence skipped when running on CircleCI. A proper fix for them will be worked on after a successful mini-trial.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11264
Differential Revision: D9656793
Pulled By: yf225
fbshipit-source-id: 7832e90018f3dff7651489c04a179d6742168fe1
Summary:
Generate serialized test inputs/outputs/backward graphs for tests inside `caffe2/python/operator_test` that call assertSerializedOperatorCheck(). Tests should be decorated with serialized_test.collect_tests.given_and_seeded to run both hypothesis tests that are actually random and a single fixed-seed hypothesis test.
To use:
1. Refactor your test to be a SerializedTestCase
1a. Decorate it with given_and_seeded
1b. Call testWithArgs in main
2. Run your test with -g to generate the output. Check it in.
3. Subsequent runs of the test without generating the output will check against the checked in test case.
Details:
Run your test with `python caffe2/python/operator_test/[your_test].py -g`
Outputs are in `caffe2/python/serialized_test/data`. The operator tests outputs are in a further subdirectory `operator_test`, to allow for other tests in the future (model zoo tests?)
Currently, we've only refactored weighted_sum_test to use this, but in the next diff, we'll refactor as many as possible. The directory structure may also change as usually there are multiple tests in a single file, so we may create more structure to account for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10594
Reviewed By: ezyang
Differential Revision: D9370359
Pulled By: ajyu
fbshipit-source-id: 2ce77389cd8bcc0255d3bccd61569833e545ede8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10955
Add a GPU version of the HardSigmoid op to Caffe2. Updated the test file to
include GPU tests.
Reviewed By: enosair
Differential Revision: D9499353
fbshipit-source-id: fcb51902063d0c3e4b10354533a8a42cf827c545
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10439
Update Im2Col and related code in preparation for group conv in NHWC order.
Reviewed By: houseroad
Differential Revision: D9285344
fbshipit-source-id: 1377b0243acb880d2ad9cf73084529a787dcb97d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10395
Order switch ops (NCHW2NHWC and NHWC2NCHW) only supported 2D images.
This diff generalizes them to 1D and 3D, and also adds a unit test we didn't have.
Reviewed By: protonu
Differential Revision: D9261177
fbshipit-source-id: 56e7ec54c9a8fb71781ac1336f3f28cf024b4bda
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10390
Fixed a bug in box_with_nms_limit where it may produce more bounding boxes than specified.
* The original code first finds the score threshold from the box at the 'detections_per_im' position, and filters out boxes scoring below the threshold.
* In cases where multiple boxes have the same score as the threshold, the op will return more boxes than 'detections_per_im'.
Reviewed By: wat3rBro
Differential Revision: D9252726
fbshipit-source-id: 63f40829bcd275cb181692bc7547c384cee01499
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10389
Added some unit tests for box_with_nms_limit_op.
Reviewed By: wat3rBro
Differential Revision: D9237860
fbshipit-source-id: 2d65744bd387314071b68d2a0c934289fc64a731
Summary:
This operator implements b-bit (b = 1/2/4/8) stochastic quantization of a floating-point
matrix in a row-wise fashion. 8/b floating values are concatenated into a byte
and returned in a uint8 tensor. PR: https://github.com/pytorch/pytorch/pull/8629
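A toy sketch of the per-row stochastic rounding step (the 8/b-per-byte packing is omitted; illustrative only):
```python
import numpy as np

def stochastic_quantize_row(row, b):
    levels = 2 ** b - 1
    lo, hi = float(row.min()), float(row.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    x = (row - lo) / scale                 # map into [0, levels]
    floor = np.floor(x)
    # round up with probability equal to the fractional part (unbiased)
    q = floor + (np.random.rand(*x.shape) < (x - floor))
    return q.astype(np.uint8), lo, scale
```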
Reviewed By: harouwu
Differential Revision: D8493264
fbshipit-source-id: 01f64066568a1e5a2b87c6d2134bd31cdf119c02
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9905
This diff improves the LARS operator in Caffe2 by applying clipping to the computed learning rate.
Reviewed By: pjh5
Differential Revision: D9020606
fbshipit-source-id: b579f1d628113c09366feac9406002f1ef4bd54f
Summary:
The goal of this PR is to update the hip files to reflect relevant changes in cuda source files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9826
Differential Revision: D9032840
Pulled By: bddppq
fbshipit-source-id: 504e55c46308eebfee3c9a7beea1f294fe03470f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9747
Currently the ctc_greedy_decoder op initializes the `merge_repeated` argument only if it has been provided by the user. Change it to be initialized in all cases.
Reviewed By: houseroad
Differential Revision: D8963635
fbshipit-source-id: 18955c7c26a77d9d7f5137e4dec085252ffabfeb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9581
Mostly to simplify code. Should also improve performance but order switch ops
don't take much time anyway.
Reviewed By: viswanathgs
Differential Revision: D8909766
fbshipit-source-id: 17a302d5bf4aba2755d88223fc01a41fd72c5919
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9643
The current map interface assumes a float data type, which is not always correct.
Reviewed By: kennyhorror
Differential Revision: D8455784
fbshipit-source-id: b94a31267760f7f97c15aa4b03008affc347fd10
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9598
The "max_length" should be passed to UnPackSegmentsOp if "max_length" is given when calling PackSegmentsOp.
Reviewed By: jerryzh168
Differential Revision: D8919799
fbshipit-source-id: 8c97aa717b69177b8a5d5d56892817d488853840
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9594
When the input vector is a zero vector, the previous GPU code gives NaN in the backward pass. We fix this.
Reviewed By: pjh5
Differential Revision: D8849732
fbshipit-source-id: 87b1fb1ee05dfdb0d43bcbe67e36f15896fe1706
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9403
In the BBoxTransform and GenerateProposals ops, clip_boxes makes sure the bbox fits
within the image. For rotated boxes, this doesn't always make sense, as there
could be multiple ways to clip a rotated box within an image boundary.
Moreover, clipping to a horizontal box means we potentially leave out pixels of
interest. Therefore, we clip only boxes with angle almost equal to 0 (with a
specified `angle_thresh` tolerance).
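A sketch of the conditional clipping rule (a hypothetical helper, not the C2 op):
```python
def clip_rotated_boxes(boxes, img_h, img_w, angle_thresh=1.0):
    # boxes: list of [cx, cy, w, h, angle]; clip only near-horizontal boxes
    for b in boxes:
        if abs(b[4]) <= angle_thresh:
            x1, y1 = b[0] - b[2] / 2, b[1] - b[3] / 2
            x2, y2 = b[0] + b[2] / 2, b[1] + b[3] / 2
            x1, x2 = max(x1, 0.0), min(x2, img_w - 1.0)
            y1, y2 = max(y1, 0.0), min(y2, img_h - 1.0)
            b[0], b[1] = (x1 + x2) / 2, (y1 + y2) / 2
            b[2], b[3] = x2 - x1, y2 - y1
    return boxes
```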
Reviewed By: pjh5
Differential Revision: D8828588
fbshipit-source-id: 39c1eafdb5d39d383780faa0a47e76149145e50c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9299
Onnx has ReduceL1 and ReduceL2 operators that would facilitate this, so allow pytorch to export those and allow caffe2 to run them.
I only implemented this on CPU so far.
Reviewed By: pjh5
Differential Revision: D8757381
fbshipit-source-id: 68afc9e2f90042a70929b73ace05a499b5c670c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9385
The operator transforms dense features to sparse features by bucketizing. Only the features in the indices tensor will be transformed and output.
Reviewed By: bddppq
Differential Revision: D8820351
fbshipit-source-id: a66cae546b870c6b2982ac20641f198334f2e853
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8999
Closes https://github.com/pytorch/pytorch/pull/8999
Implemented the WRgrad optimizer operator for the dense case (the base case as well as the case with additional outputs for the effective learning rate and update value) and the sparse case.
Reviewed By: pjh5
Differential Revision: D8627933
fbshipit-source-id: a63cde46c04bcc6b428ab5f77a4b3b2beb66c046
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9056
Closes https://github.com/pytorch/pytorch/pull/9056
Updates bbox_transform for rotated boxes with angle info to normalize the
predicted angle to be within the [angle_bound_lo, angle_bound_hi] range.
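A minimal sketch of the normalization, assuming the bound width is a multiple of the 180-degree period of a rotated box:
```python
def normalize_angle(angle, lo=-90.0, hi=90.0, period=180.0):
    while angle < lo:
        angle += period
    while angle >= hi:
        angle -= period
    return angle

assert normalize_angle(270.0) == -90.0
```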
Reviewed By: pjh5
Differential Revision: D8706240
fbshipit-source-id: f3ee834cf362736136e285f0f8f0c063af94a879
Summary:
Breaking this out of https://github.com/pytorch/pytorch/pull/8338
Use a local version of `np.rot90` with an `axes` argument, since we don't have NumPy 1.12.0 in all of the test environments. Caffe2 conda2-ubuntu16.04, for example, fails. Generally, it seems better to not require a NumPy bump just for this test.
cc mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9267
Reviewed By: mingzhe09088
Differential Revision: D8767819
Pulled By: orionr
fbshipit-source-id: c51a6295d58366eba06e4e55e3f1ffaa8af96975
Summary:
Closes https://github.com/pytorch/pytorch/pull/9048
The max_length argument helps fix the shape of the output to N * max_length * D, where N is the batch size and D is the feature dim.
Reviewed By: bddppq
Differential Revision: D8702782
fbshipit-source-id: e30555608fee1c4a61cc95922f4a71c7f54903af
Summary:
Closes https://github.com/pytorch/pytorch/pull/9072
Use FixedDivisor in Reduce and Broadcast CUDA kernels
Reviewed By: houseroad
Differential Revision: D8710243
fbshipit-source-id: 6f1da12234898594a1be8c979d942aa515832aeb
Summary:
Closes https://github.com/pytorch/pytorch/pull/8933
The spatialBN implementation cannot deal with an empty batch; this diff enables the zero-batch setting:
during training, when batch_size = 0:
in forward, the output's saved_mean and saved_var are zeros.
in backward, the gradients for SCALE_GRAD and BIAS_GRAD are zeros.
Reviewed By: pjh5
Differential Revision: D8644699
fbshipit-source-id: 599ea687329d68699c987e05f56f409f4e729d1c
* add opencl + fpga context
adds an opencl context inside caffe2/fb which can be used for fpga access
* [Caffe2] Force tensor inference checks to be triggered during testing
We've started to rely on TensorInference functions more for different analysis. This diff ensures that the TensorInference function's result matches what is expected from the definition of the operator.
* Enable building //caffe2:torch with @mode/opt
In @mode/opt, python runs out of a PAR, which breaks a lot of
assumptions in the code about where templates/ folders live relative
to __file__. Rather than introduce hacks with parutil, I simply turn
template_path into a parameter for all the relevant functions and
thread it through from the top level.
* [Caffe2] Fix cost models for DotProduct and Div. Update Tensor Inference for dot product
As title. DotProduct states that output is a 1-D tensor (https://caffe2.ai/docs/operators-catalogue.html#dotproduct) though code suggests it is either 0- or 1-D depending on inputs. TensorInference defined to support implementation.
* [SG-MoE] Add an option to make the experts NOT as components
* [nomnigraph] Rename and fixup convertToNeuralNetOperator API
This will make things a bit cleaner
* no longer symlink THNN.h and THCUNN.h
* forced decoder network (onnx export)
Closes https://github.com/pytorch/translate/pull/95
Add networks in ensemble_export.py to create a forced decoding network from PyTorch NMT checkpoints. This network takes an arbitrary numberized (source, target) pair and returns the model score for the translation, including penalties.
Vocabulary reduction networks are also supported, but note that target indices which are not in the possible_translation_tokens generated for the source input will be trea
* Revert schema change to fix production models
Revert schema change to fix production models
* MockLogDeviceReader - rebase on FIX
# Goal
1) Build a make_mock_log_device_reader using make_mock_reader
2) Replace the real log_device_reader here: https://fburl.com/raihwf1p
# Log by D8151734
Real log_device_reader:
```
I0529 20:29:05.373108 954994 tensor.h:839] Tensor print_net/log of type std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >. Dims: (): read_net/ParseOpenTrainingRow:0
I0529 20:29:05.373244 954994 tensor.h:839] Tensor read_net/ParseOpenTrainin
* [C2/D2][1/n]: Nonnegative-Constrained Optimization -- log barrier
implement log barrier as a regularization method
* Add teacher weight screening.
Add teacher weight screening according to teacher labels. If the teacher label is zero, we do not use the distill loss in the objective function.
* Add NormalizerContext
See task for more detail. This implementation is a copy of what exists for RegularizerContext except for how the parameters are defined in the model_definition thrift file.
I'll try an alternative implementation which overrides the default arguments of functions instead like for argscopes in tensorflow.
https://github.com/pytorch/pytorch/compare/master...MaximeBoucher:update-from-facebook-0939578c068c?expand=1
* Adding cosine similarity option in dot processor
Add pairwise cosine similarity option in dot product.
Add an option to concate dot product and cosine similarity.
Add test cases.
* [nomnigraph][redo] Concat elim for sparseNN
Same as D7962948, which was reverted because Operator Schema was not
defined
* [pytorch] Revert pytorch/pytorch#7918 'Release GIL when copying to shared memory', breaks ASAN
Revert this pytorch diff that breaks ASAN when running Filament in dev mode; in opt mode it gives "bad file descriptor" errors. Looks like a race when copying tensors to shared memory in multiple mp.Queue's (which spawn separate threads).
https://github.com/pytorch/pytorch/pull/7918/files
* [nomnigraph][mobile] Enable nomnigraph by default, use -Oz on nomnigraph related code to reduce code size
enables nomnigraph and reduces codesize
* [Warmup] Allow both offline incremental training and online training
Change the plan name on the saving side and the reading side to support both training types
This diff depends on D8128530 and D8168651.
* Revert D7802642: [Warmup] Allow both offline incremental training and online training
This reverts commit afc213cf9b36cecf75333a788391c4d09f4afccc
@bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
* Add legacy grad logic to fix div op on old graphs.
Add legacy grad logic to fix div op on old graphs.
* Correctly propagate operator failures
Propagate errors from operators that throw exceptions and return false
* Revert D8374829: [caffe2][nomnigraph][redo] Concat elim for sparseNN
This reverts commit 6dda028c463e54bb5c32188bbbe9202107e188a5
@bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
* [Caffe2] Added extra_info to core.DeviceOption(), enforced extra_info to be inherited in scope.DeviceScope
extra_info is a newly defined field in DeviceOption proto. This diff added extra_info to the core.DeviceOption(). And, In scope.DeviceScope(), this diff enforce the new scope to inherit the extra_info from old scope.
* [opt] hgdirsync wasn't enabled, merge diverged code
Here's the damage (P59732616): basically xplat was left behind but had
the change from assert to CAFFE_ENFORCE
* OMP parallelism over RoIs for RoIAlign op
Simpler to parallelize over RoIs. Shouldn't affect other uses as it relies on
the number of OMP threads set during startup.
PR: https://github.com/pytorch/pytorch/pull/8562
* Use int64_t for shape in FillOps
to avoid overflow of int32
* Implement Rotated RoIAlign op
Based on Rotated RPNs as explained in https://arxiv.org/abs/1703.01086.
The idea is simple - orientation/angle is added as an RPN
anchor parameter and then the angle is further regressed similar to bbox
coords. There are some additional changes related to NMS and IoU, but besides
that it's a direct extension to Faster-RCNN. Further details in https://fb.quip.com/sZHlA1iMfWPZ.
RoIs are represented in [center_x, center_y, width, height, angle] format.
`angle` repre
* Rotated RoIAlign op CUDA forward implementation
CUDA forward impl for D8415490
* RoIAlignRotated op CUDA backward pass implementation
TSIA
* All remaining fixes to eliminate process_github.sh
Most of this diff has already been reviewed separately, except for the parts relating to _thnn/utils.py and _utils._internal.py
remove skipIf(True, 'Fbcode') line from process_github.sh
replace sed of cpp file with #ifdef to control cudnnDestroy use
undo sync-time deletion of .gitattributes, remove process_github.sh
switch to using _utils._internal rather than try-import-except
This diff also fixes the open-source bug where rebuilds have
* Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"
Original commit changeset: 7707d2efe60e. The original diff was backed out because the online trainer package was backed out. This code only works with the new online trainer package.
* [easy] improve error log in adagrad op
as title
* re-allow use of thnn_h_path
This fixes cffi usage in OSS
* [4/4] [tum] parallelizing layerNorm for GPU full sync
as title
* add compile=False to pytorch tests, remove hack with pyc
* Add shape and type inference for RowWiseArgMax operator
See title
* Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"
This reverts commit 78167eeef0af16b60f72c82f9dcdda9b41b4dcbd
@bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
* [fix-flaky-test] mock_hive_reader_test is flaky, because GlobalCounter collects local counts at intervals
# Problem
`MockHiveReader` uses `GlobalCounter` to limit `max_examples`.
GlobalCounter on the server node collects local counts from worker nodes every 1 sec.
This 1 sec delay makes it impossible to limit exactly to `max_examples`; it will definitely exceed `max_examples`.
# Plan
Given,
```
Expected num_examples = max_examples + num_examples/sec (Read Speed) x 1 sec (GlobalCounter Sync Int
* [Caffe2] Fix FCGradient cost inference. Prevent overflow in cost inference
FCGradient missed a factor 2 in the `num_outputs == 3` case. Overflow was occurring with flop calculation for FC. Changed types to `uint64_t` to prevent future problems.
* Fix binary ops with empty inputs
Fix binary ops with empty inputs
* Support the filling of input blob with provided data
as title for Biz Integrity case
* Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training""
Original commit changeset: 30c55dd38816 Original diff is reverted due to introducing bad integration test. Fixed the integration test.
* [c2][easy] improve pack ops error loggings
as desc.
* Add ShapeTypeInference for LpNorm operator
As desc
* Shard test_nn to reduce runtime for each test target
Closes https://github.com/pytorch/pytorch/pull/8793
The current test_nn would time out and be disabled in GreenWarden, and we need to have an option to split it up in order to pass the stress test. Right now GreenWarden roughly allows running 100 test cases in test_nn before timing out, and here we have an option to divide test_nn into 30 shards (with ~40 tests in each shard) to allow for some test suite growth in the future.
* Change default caffe2_streams_per_gpu to 1
* Remove IN_SANDCASTLE from common.py and test_nn.py
We prefer to disable the failing tests through Sandcastle UI instead.
* Add a new class for an updated prof_dag.proto
This diff contains:
- An updated prof_dag.proto that contains blob profiles.
- A class to deserialize this information (serialization is in a follow up diff)
- Update to separate profiling information from NeuralNet (and use it as part of the class above).
- Unit tests
* Lambdarank for SparseNN
This diff adds a lambda_rank_layer for SparseNN.
Changes include:
1) Adds support for multiple sessions in the c2 op
2) Adds support for two different loss functions in the c2 op
3) Adds unit tests for the op
* Revert D8586950: Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training""
This reverts commit 012220ed63eccc35659a57b31d16a3625da6317b
@bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
* [easy] A few fixups to multithread predictor benchmark
(1) support perf on T6 server
(2) remove dead code
* fix a bug about the map size
as title
* Fix reduce sum on in-place case.
Fix reduce sum on in-place case.
* [Warmup] Reland reverted diff Allow both offline incremental training and online training
Closes https://github.com/pytorch/pytorch/pull/8827
fix net transform integration test. Allow offline and online trainer to coexist D7802642.
* Add StoreHandlerNotAvailableException
Add an exception for a store that is not available or has been
deleted.
* Use exception handling for fault tolerance, missing KV store
Remove status blobs to communication ops so that exceptions propagate on
failure.
* [C2/D2][2/n]: Nonnegative-Constrained Optimization -- bounded grad proj
for simple bounded constrained optimization, incl non-negative box constraints.
* [GanH]: Adaptive Weighting with More Estimations
With the implemented positivity optimization, we now learn adaptive weights with different
parameterizations.
This improves parameter estimation and training stability.
* Revert some changes for landing
* Remove AutoNoGIL in StorageSharing
* Temporarily disable net_tests
* Revert "[Caffe2] Force tensor inference checks to be triggered during testing"
This reverts commit 67ef05c22b2f71b4a489695384932f968384a2a4.
* Revert "Fix reduce sum on in-place case."
This reverts commit 6cb8a8e1b3db7b6d20941b0053e3f3836068eb64.
* Revert "Revert "Fix reduce sum on in-place case.""
This reverts commit 130a257c0893dc09f4bd6e6a45d112261807fd2c.
* [fix] fixup the bias multiplier data access issue
Hotfix for failures in conv_transpose
* [D2][Easy]: lint regularizer
lint with black
* [GanH]: Split mu in adaptive weight for diagnose
* [Dper] Add the ability to split FC weights into multiple smaller ones
* fix SumReduceLikeOp for empty blob
as desc.
* add ctc_greedy_decoder for caffe2
ctc_greedy_decoder same as tf's
* Update event callback handling
Allow multiple callbacks per event
* Add WeightedSum layer
The motivation is to do weighted sum in HoNet/crossnet. In the next diff, I'll replace model.Add with model.WeightedSum in
honet: https://fburl.com/f4rmolg2
crossnet: https://fburl.com/v7awn8se, https://fburl.com/63filbnm
* Replicate DAG's behavior
Some callers expect RunAsync to block, replicate that behavior in case of
explicit 'dag' net type
* [dper] layernorm layer
as title
* Override dag, async_dag, async_polling
Overriding dag, async_dag and async_polling with async_scheduling
* Name the thread pools
Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead.
* [Caffe2] FillerOp should support int64_t dimensions
Change the argument type to int64_t for the shape argument of FillerOp (used in ConstantFill, XavierFill, etc.)
* Remove caffe2/caffe2/contrib/torch/
It's not used anywhere and depends on old lua torch that conflicts with Aten. Given PT1 it's not relevant any more (though it was nice and clever code!)
#accept2ship
* Fix linearWarmup multiplier check
The multiplier needs to be non-negative, not strictly positive.
* Revert D3314316
This is after 2 years and we do not seem to have a use case for this one, so
for the sake of clean API design we should potentially remove this. This would
allow us to potentially pass in arguments to optionally construct an object,
although it is indeed a little bit unclear how we can reuse existing objects if
constructor arguments are passed in. In any case, we may want to remove this
dangling feature.
* Speedup generate proposals by partial_sort.
Speedup generate proposals by partial_sort.
FACEBOOK:
- Saw speed improvement for training with this op.
- Yanghan benchmarked the op on a small dataset and saw a consistent 100% improvement in speed (6ms -> 3ms) at 420 input resolution. See the next diff for details.
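The idea, in NumPy terms: std::partial_sort orders only the top k, and argpartition does the analogous selection step (illustrative sketch):
```python
import numpy as np

def topk_indices(scores, k):
    k = min(k, scores.size)
    idx = np.argpartition(-scores, k - 1)[:k]  # O(n) selection of the top k
    return idx[np.argsort(-scores[idx])]       # full sort only on the k survivors
```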
* More parallel processing friendly for CPP version of GenerateProposals.
More parallel processing friendly for CPP version of GenerateProposals.
* [DT] [43/n] Lift stop conditions inside reader code back to flow control
1. Split multi_reader function into local_reader and remote_reader
2. Lifted stop conditions inside Limiter back to flow control
3. Split epoch flow building logic into 3 cases:
- single machine (1 reader, 1 trainer on trainer0 node, no PS)
- (1 reader + 1 trainer) on trainer0 node, has PS
- multiple readers, readers do not share nodes with trainers, might have PS or not
* Resolve conflicts for torch/_thnn/utils.py
* [Caffe2] Handle image decoding errors
Image decoding errors can make the whole training fail. This diff is to handle them
1.Catch imdecode exceptions and check if decoded image has zero columns or rows. This is counted as decoding errors.
2.Replace the image with empty in case of error
3.Count the number of errors and throw runtime exception if the rate reaches given number
The empty image data is kept. It might introduce noise in the training data.
* Update MKL exporter to IDEEP ops
TSIA
* [Caffe2] GlobalInit is thread safe, fixing the comment
With the mutex and lock, GlobalInit is thread safe.
Update the comments.
* Back out "Add support for generating ATen files during fbcode build"
Original commit changeset: 28970ddba353
@override-unit-failures
(Note: this ignores all push blocking failures!)
* [DT]: fix predictor save
similar to D6610058, here we add the fix for distributed online training
* Remove net_singlethread_async_gpu.cc
Closes https://github.com/caffe2/caffe2/pull/2528
This removes net_singlethread_async_gpu.cc as part of our effort to clean
CUDAContext and the net executors.
* Inline DFS task execution
Add a DFS inline task execution mode in executor
* Add c10 folder to fbcode
This adds the c10 folder and its test cases to fbcode. Build flags are mostly taken from aten.
* add dependencies for online trainer
Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators
Relevant post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/
* Resolve conflicts for tools/jit/gen_jit_dispatch.py
* [Fix] sparse regularization in distributed training
* Support advanced pooling options in sum processor
* support advanced pooling options in sum processor
* remove redundant code
* support attention in sum processor
* Improve shard logging in net tracing code
Make it handle arbitrary shard ids instead of just one-digit ids.
* [Caffe2] Call GlobalInit in predictor only in mobile
FACEBOOK:
Calling GlobalInit long after the program starts may not be safe. There are issues if the following happens:
User does not call GlobalInit and initFacebook after program starts
User sets a flag manually: https://fburl.com/mcsumw7d
User calls OSS predictor.
OSS predictor calls GlobalInit
GlobalInit calls initFacebook
initFacebook resets all flags: https://fburl.com/tolszha1
Thus, the user manually set flags are overwritten
This would happen anytime GlobalInit is called long after the program starts.
I suppose the intention of the user in this case is not to call GlobalInit throughout the program,
but use Caffe2 regardless (is that desired?)
But adding GlobalInit in the OSS predictor would automatically call GlobalInit when using Caffe2.
This issue doesn't exist in mobile, since initFacebook is not called on mobile.
For now, guard the GlobalInit in predictor for mobile only.
May want to ensure the GlobalInit is always called at the start of the program. @[3501714:kutta] has seen weird issues when not calling GlobalInit at the start of the program on server side. He has made some progress on this.
* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py
* Add empty fix for SumLikeReduceOp
Add empty fix for SumLikeReduceOp
* Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN
This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9
@bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
* Remove Declarations.yaml
* Include common.h
* Change std::stoi to caffe2::stoi
* Add thread_name.cc to the CMake file
* No need to subtract 1. Fix test segfaults
* Fix NetTest, ObserverTest
Fix tests
(cherry picked from commit 3767e66c3f365596cba3d46d3e7322c933a0ab41)
* CTCGreedyDecoderOp only has CPU implementation, test should only run on CPU
* Add a variable to avoid conversion resizing issue
* [fix] fixup the bias multiplier data access issue
Hotfix for failues in conv_transpose
* [D2][Easy]: lint regularizer
lint with black
* [GanH]: Split mu in adaptive weight for diagnose
* [Dper] Add the ability to split FC weights into multiple smaller ones
* fix SumReduceLikeOp for empty blob
as desc.
* add ctc_greedy_decoder for caffe2
ctc_greedy_decoder same as tf's
* Update event callback handling
Allow multiple callbacks per event
* Add WeightedSum layer
The motivation is to do weighted sum in HoNet/crossnet, in the next diff, I'll replace model.Add with model.WeightedSum in
honet: https://fburl.com/f4rmolg2
crossnet: https://fburl.com/v7awn8se, https://fburl.com/63filbnm
* Replicate DAG's behavior
Some callers expect RunAsync to block, replicate that behavior in case of
explicit 'dag' net type
* [dper] layernorm layer
as title
* Override dag, async_dag, async_polling
Overriding dag, async_dag and async_polling with async_scheduling
* Name the thread pools
Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead.
* [Caffe2] FilleOp should support int64_t dimensions
Change argument type to int64_t for shape argument of FillerOp (used in ConstantFill, XavierFill, etc)
* Remove caffe2/caffe2/contrib/torch/
It's not used anywhere and depends on old lua torch that conflicts with Aten. Given PT1 it's not relevant any more (though it was nice and clever code!)
#accept2ship
* Fix linearWarmup multiplier check
The multiplier needs to be non-negative, not strictly positive.
* Revert D3314316
This is after 2 years and we do not seem to have a use case for this one, so
for the sake of clean API design we should potentially remove this. This would
allow us to potentially pass in arguments to optionally construct an object,
although it is indeed a little bit unclear how we can reuse existing objects if
constructor arguments are passed in. In any case, we may want to remove this
dangling feature.
* Speedup generate proposals by partial_sort.
Speedup generate proposals by partial_sort.
FACEBOOK:
- Saw speed improvement for training with this op.
- Yanghan benchmarked the op on a small dataset and see consistent 100% improvement on speed (6ms -> 3ms) on 420 input resolution. See next diff for details.
* More parallel processing friendly for CPP version of GenerateProposals.
More parallel processing friendly for CPP version of GenerateProposals.
* [DT] [43/n] Lift stop conditions inside reader code back to flow control
1. Split multi_reader function into local_reader and remote_reader
2. Lifted stop conditions inside Limiter back to flow control
3. Split epoch flow building logic into 3 cases:
- single machine (1 reader, 1 trainer on trainer0 node, no PS)
- (1 reader + 1 trainer) on trainer0 node, has PS
- multiple readers, readers do not share nodes with trainers, might have PS or not
* Resolve conflicts for torch/_thnn/utils.py
* [Caffe2] Handle image decoding errors
Image decoding errors can make the whole training fail. This diff handles them:
1. Catch imdecode exceptions and check whether the decoded image has zero rows or columns; these count as decoding errors.
2. Replace the image with an empty one in case of error.
3. Count the number of errors and throw a runtime exception if the rate reaches a given threshold.
The empty image data is kept. It might introduce noise into the training data.
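A hedged Python sketch of this policy (OpenCV-based; the helper name and threshold are illustrative, not the C2 implementation):
```
import cv2
import numpy as np

def decode_images(blobs, max_error_rate=0.01):
    # Bad decodes become empty images; fail only if the error rate is high.
    errors, images = 0, []
    for blob in blobs:
        try:
            img = cv2.imdecode(np.frombuffer(blob, np.uint8), cv2.IMREAD_COLOR)
        except cv2.error:
            img = None
        if img is None or img.shape[0] == 0 or img.shape[1] == 0:
            errors += 1
            img = np.zeros((1, 1, 3), np.uint8)  # replace with empty image
        images.append(img)
    if errors / max(len(blobs), 1) >= max_error_rate:
        raise RuntimeError("image decoding error rate too high")
    return images
```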
* Update MKL exporter to IDEEP ops
TSIA
* [Caffe2] GlobalInit is thread safe, fixing the comment
With the mutex and lock, GlobalInit is thread safe.
Update the comments.
* Back out "Add support for generating ATen files during fbcode build"
Original commit changeset: 28970ddba353
@override-unit-failures
(Note: this ignores all push blocking failures!)
* [DT]: fix predictor save
similar to D6610058, here we add the fix for distributed online training
* Remove net_singlethread_async_gpu.cc
Closes https://github.com/caffe2/caffe2/pull/2528
This removes net_singlethread_async_gpu.cc as part of our effort to clean
CUDAContext and the net executors.
* Inline DFS task execution
Add a DFS inline task execution mode in executor
* Add c10 folder to fbcode
This adds the c10 folder and its test cases to fbcode. Build flags are mostly taken from aten.
* add dependencies for online trainer
Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators
Relevant post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/
* Resolve conflicts for tools/jit/gen_jit_dispatch.py
* [Fix] sparse regularization in distributed training
* Support advanced pooling options in sum processor
* support advanced pooling options in sum processor
* remove redundant code
* support attention in sum processor
* Improve shard logging in net tracing code
Make it handle arbitrary shard ids instead of just single-digit ids.
* [Caffe2] Call GlobalInit in predictor only in mobile
FACEBOOK:
Calling GlobalInit long after the program starts may not be safe. There are issues if the following happens:
User does not call GlobalInit and initFacebook after program starts
User sets a flag manually: https://fburl.com/mcsumw7d
User calls OSS predictor.
OSS predictor calls GlobalInit
GlobalInit calls initFacebook
initFacebook resets all flags: https://fburl.com/tolszha1
Thus, the user manually set flags are overwritten
This would happen anytime GlobalInit is called long after the program starts.
I suppose the intention of the user in this case is not to call GlobalInit throughout the program,
but use Caffe2 regardless (is that desired?)
But adding GlobalInit in the OSS predictor would automatically call GlobalInit when using Caffe2.
This issue doesn't exist in mobile, since initFacebook is not called on mobile.
For now, guard the GlobalInit in predictor for mobile only.
May want to ensure the GlobalInit is always called at the start of the program. @[3501714:kutta] has seen weird issues when not calling GlobalInit at the start of the program on server side. He has made some progress on this.
* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py
* Add empty fix for SumLikeReduceOp
Add empty fix for SumLikeReduceOp
* Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN
This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9
@bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
* Remove Declarations.yaml
* Include common.h
* Change std::stoi to caffe2::stoi
* Add thread_name.cc to the CMake file
* No need to subtract 1. Fix test segfaults
* Remove the code per soumith's comments
* Remove blank lines in the end of file
* [caffe2] upgrade IDEEP and hotfix for conv op accuracy issue (#8364)
* [IDEEP] Upgrade IDEEP version
Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
* [IDEEP] Fix accuracy issue in conv op
Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
* Fix build error due to lack of src in CMakeLists
Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
* Remove the code per soumith's comments
* [ONNX] Add an ATen fallback pathway for ONNX export (#8273)
* ATen fallback for ONNX export
* Move to enum
* Fix model test
* Add comment
* Address comments
BC interface
* Remove imaginary file (#8415)
* [Caffe2] Enable AMD/MIOPEN ops for Caffe2 (#8306)
* Add hip support for caffe2 core
* Add MIOPEN header/wrapper to caffe2 core
* Add HIP device into caffe2 PB
* top level makefile change for rocm/hip
* makefile scaffolding for AMD/RocM/HIP
* Makefile scafodding for AMD/RocM/HIP; add makefile/utility for HIP files
* caffe2 PB update for AMD/ROCM HIP device
* Add AMD/RocM/Thrust dependency
* HIP threadpool update
* Fix makefile macro
* makefile fix: duplicate test/binary name
* makefile clean-up
* makefile clean-up
* add HIP operator registry
* add utilities for hip device
* Add USE_HIP to config summary
* makefile fix for BUILD_TEST
* merge latest
* Fix indentation
* code clean-up
* Guard builds without HIP and use the same cmake script as PyTorch to find HIP
* Setup rocm environment variables in build.sh (ideally should be done in the docker images)
* setup locale
* set HIP_PLATFORM
* Revert "set HIP_PLATFORM"
This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.
* continue the build script environment variables mess
* HCC_AMDGPU_TARGET
* Cleanup the mess, has been fixed in the lastest docker images
* Assign protobuf field hip_gpu_id a new field number for backward compatibility
* change name to avoid conflict
* Fix duplicated thread pool flag
* Refactor cmake files to not add hip includes and libs globally
* Fix the wrong usage of environment variables detection in cmake
* Add MIOPEN CNN operators
* Revert "Add MIOPEN CNN operators"
This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.
* Add MIOPEN pooling operator
* Add MIOPEN activation operator
* Add MIOPEN softmax operator
* Add MIOPEN spatial batch norm operator
* Add MIOPEN loacl response normalization operator
* Add MIOPEN conv operator
* Clean-up LRN ops
* enable fp16 in MIOPEN pool ops
* Enable fp16 for MIOPEN relu op
* Enable fp16 for MIOPEN spatial batch norm op
* code clean-up
* revert float16 support
* Create Caffe2 python binding for AMD/ROCM/HIP
* Add op fallback for HIP operator
* add hip src/test files in cmake
* exclude hip src/test files
* fix python binding for hip backend
* fix MIOPEN pooling op workspace
* hack to compile miopen operators
* fix include path for MIOPEN ops
* Fix include path
* Add HIP math utilities
* Fix path for HIP math utils
* cmake fix
* Cmake fix / hipcc for hip files
* suppress hipcc warning
* cmake fix /replcae USE_HIP with USE_ROCM
* revert LoadHIP.cmake change
* fix include for thrust/cub-hip
* include path fix for conversion.h
* Updated with latest upstream changes
* clang format fixes
* Context_hip updates
* Fixed typo in rocblas handle get function
* Updated hipified math utils
* Updated math hip test util
* Updated context hip test
* Updated common_hip
* Updated net async dag for HIP
* Added MIOPEN in operator hip test
* fix
* C2 dependencies clean-up
* fix include path for building custom protobuf
* Decouple miopen pool op and conv_pool_op base
* cmake refactor
* fix operator_hip_test
* move all hip/miopen ops files into caffe2/operators/hip
* sanitize cmake
* permission issue
* remove extra parenthesis
* remove artifact from resolving merge conflict
* cont. sanitize cmake files
* fix syntax error
* sanitize conversion.h
* .
* Revert "."
This reverts commit 56020cb0e996a31ae27bf1f8f491955ed0b121b9.
* clang-format
* Enable some reduce operators' ONNX backend tests (#8418)
* fix old comment to point to the right file (#8416)
* Stop pinning nccl version. (#8421)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Expose logsumexp docs and mark log_sum_exp in distributions for internal use (#8428)
* Enable some of the ONNX backend test on broadcasting (#8423)
* Enable some of the ONNX backend test on broadcasting
* enable gemm broadcast
* Expose proto utils and ONNX (#8073)
* Expose proto utils and ONNX from PyTorch libcaffe2.so
* Try to use protobuf from _C.so
* Fix ONNX proto header include
* Adjust order of imports for ONNX until nanopb goes away
* Set and use ONNX_NAMESPACE for PyTorch builds
* Show protobuf summary for all builds
* Add ONNX_NAMESPACE for cpp_build
* Statically link libprotobuf.a into libtorch.so
* Set ONNX_NAMESPACE on Windows build
* Move core/dispatch up as well
* Add /MD flag for Windows build of _C
* Potential Windows fix for ONNX and protobuf
* Add direct linkage from _C to ONNX on Windows
* Only include protobuf wrapper for PyTorch
* Pass extra_compile_args to _nvrtc ext build
* Remove installation of .a files
* Rebase creates some weird situations, revert them manually
* Remove more weird changes due to rebase
* Need to add thread_name.cc after merge
In my use case, in the backward propagation pass, the reshape needs to
change a [0]-shaped tensor into a [0,0]-shaped tensor. The original implementation would
cause an out-of-index issue. This diff fixes the problem.
* Update elementwise ops to support numpy style broadcast
Update elementwise ops to support numpy style broadcast
* Fix sqrt_op
* Fix compare ops
* Fix gradient test
* Fix optimizer legacy broadcast
* Fix legacy broadcast for elementwise ops
* Skip flaky test
* Fix eigen simple binary op
* Fix attention test
* Fix rnn test
* Fix LSTM test
* Fix tan grad
* Fix schema check
* Adding instance weight to batch distill loss
as title
* add bfloat 16-31
added bfloat 16-31 and their respective unit tests
* [CUDA9] Upgrade - fbcode
CUDA9 upgrade diff D5654023 has been out for a while thanks to Pieter. But with time growing it's becoming quite hard to rebase, because of the symlinks and auto-generated build/config files in tp2. Break D5654023 into two diffs, one touching tp2 config files, and another one touching fbcode TARGETS file (adding nvcc flag). These two should be a bit easier to rebase (for detailed procedure see "Test Plan").
This diff can only be committed if:
1. CUDA 9 rpm is rolled out fleet-wide (TBD)
2. NVidia driver 390.40 is rolled out fleet-wide (done)
3. Upgrade CUDA 9.1, cudnn 7.1, nccl 2.1 (done)
4. Make sure all dependents are built (done)
5. Test all C2 operators, PyTorch (see test plan)
* Share intermediate int32 buffer across Conv ops
Adding a known type
* [C2 fix] infer function for ensure_cpu_output_op
this adds the missing device function for ensure_cpu_output_op
* [int8] Add blob serializer/deserializer for Int8TensorCPU
To export to logfiledb
* [nomnigraph] Add try catch block to optimization passes in predictor
This will catch failures that happen in the optimization pass.
* Caffe2: avoid static initialization order fiasco for CAFFE_ENFORCE
CAFFE_ENFORCE uses a stack trace fetcher, which is currently a
global static variable. If CAFFE_ENFORCE is used at static initialization
time, this is a SIOF. Recently CAFFE_ENFORCE was added into init
function registration, so we started to see this.
A Meyers singleton is going to provide safety here. If the stack trace
fetcher has not been registered yet, it will just use a dummy one.
* NUMA support in SparseNN CPU benchmark
Adding support for NUMA in SparseNN CPU benchmark
* [mobile-roofline] Add logging needed for roofline model
This should be all that's needed
* Let the operators use the same input if the operators are not chained
or else, we have to change the input data dims
* fix null-pointer-use UBSAN errors in in reshape_op.h
* revert previous fix on input blob name
as title
* Adding flag to let MineHardNegative automatically extract single value from dict
Model exporter requires the output of the model to be a struct. This makes it convenient to use those models directly in MineHardNegative by allowing automatic extraction of the single element of the dict, which is a common use case.
* Reverting change that broke internal tests back to OSS compatible state
* [mpscnn] MPSCNNChannelShuffle
att
* [Easy] Adding tags as an argument to the functional layer
Without it "tags" would be added as an argument to the operator.
The change here is based on the assumption that there is no operator that takes "tags" as an argument.
* Fix locally_connected_op schema check.
Fix locally_connected_op schema check.
* [C2] Add TypeAndShape inference for few more operators
As desc
* [c2] Shape inference should support 0 as dimension
Tensors can have 0 in their dimension.
* Make MockHiveReader loop over and support max_examples
Replace DatasetReader with RandomDatasetReader.
So that Mock Hive Reader can simulate a large data input using a small sample file as source.
* Utility function to wipe cache between benchmark runs
Caffe2 benchmark does not wipe out cache between runs, and this potentially creates an unrealistically optimistic picture of performance. This diff adds utility function to wipe out the cache.
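A minimal sketch of such a utility, assuming the simple strategy of streaming over a buffer larger than the last-level cache (buffer size and helper name are assumptions):
```
import numpy as np

_WIPE_BUFFER = np.ones(64 * 1024 * 1024 // 8, dtype=np.float64)  # ~64 MB

def wipe_cache():
    # Touch a buffer larger than the LLC so previously cached benchmark
    # data is evicted between runs; the sum forces a pass over the memory.
    return float(_WIPE_BUFFER.sum())
```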
* Allow caffe2 GlobalInit to be invoked multiple times
Allow caffe2 GlobalInit to be invoked multiple times. Will re-parse gflags and update logging levels on successive invocations, but will not re-run init functions or perform other one-time initialization.
* Add Caffe2 GlobalInitIsCalledGuard to base net and operator classes
Warn if caffe2's GlobalInit function has not been invoked before creating an operator or net object. This is based on discussion here: https://fb.quip.com/kqGIAbmK7vNG
* Rethrow current exception on failure
Rethrow current exception instead of copy constructing a new one on op failure.
* Make `clone()` return subclass of List/Struct
`clone()` is not working correctly when we subclass those classes
* Wipe the cache before the net run
the util function is copied from D7409424
will rebase once D7409424 is landed.
* [Caffe2] [Mobile] Support utils/cast.h::GetCastDataType with LITE_PROTO builds
* Correct includes
async_polling include -> async_base include
* Prepare execution flags for executor migration
Making async_scheduling aware of underlying net type to prepare for executor
migration
* Add operator level observers into async executor
Adding operator level observers into RunAsync operators' calls
* Cleanup TEST_Benchmark
Remove duplicate code and provide default implementation in NetBase
* [C2] Fix type and shape inference for binary comparison ops
As desc.
* Add GlobalInit to predictor to ensure initialization is always done before prediction
FACEBOOK:
Redo D7651453 the correct way.
Now use a static variable for the arguments passed to GLog
* Remove spammy log message
This method is currently used in various places inside Caffe itself.
* Disable events for operators inside a chain
We don't need to use events in operators within a chain because the chain is
always scheduled on a single stream, keeping only first and last event for
scheduling purposes
* Ensure correct finish run order
In rare cases we might call finishRun and trigger net's destruction while
another worker is still holding shared_ptr to a thread pool, that can cause
thread pool destruction from within a worker thread in case no other nets are
using the pool. This diff fixes the order of calling finishRun and also changes
pool() to return raw pointer to keep pool's ownership within the net
* Reduce unnecessary polling
Make sure we don't waste CPU by polling operators that we can set an efficient
callbacks on
* Squash commit of syncing 9506eeb from github to fbcode
Patch xplat buck fix
add virtual destructor to OptimizationPass
build fixes for sync
* Fix net tracing
Fix net tracing from async_scheduling
* Fix logging
* Fix handling of empty batches in SumReduceDimsOp
As titled
* Deferrable async_scheduling finishRun fix
Proper order of finishing run operations in deferrable_async_scheduling net
* Simplify exception handling in async_scheduling
Simplify exception handling, no need to busy wait, thread that processes the
last task can finish the run
* [C2]worker_coordinator_memorize_worker_ids
As titled. This is related to T28689868, where the number of blobs we want to create is equal to the number of worker ids
* Add unit test for nets with no type set
* Ignore total length argument in symbolic_pad_packed_sequence
1. There was a mistake in the code: total_length was added to the wrong symbolic function (pack_padded_sequence) instead of (pad_packed_sequence).
2. No need to throw an exception if total_length is given, since it is only used to enable data_parallel training on multi-gpus and doesn't have anything to do with onnx export, so just ignore it. https://fburl.com/tk4gciqp
* Add support for MKLDNN to async_scheduling
Just add MKLDNN as a possible CPU option to async_scheduling's pool function
* [AuFL][ensemble] support branch output for prediction
This diff supports using predictions from different branches and thus enables model ensembling (not fully independent).
* Fix a bug in add_loss in layer_model_helper
As titled.
* Support lradaption for adam
1. lr adaption operator
2. apply to dense adam
* Perf tweaks for async_scheduling
Restore single pool option + remove unnecessary (no-ops) calls
* add quantization to SparseSimdAdagradOp
add a bunch of quantization signatures to SparseSimdAdagradOp, implementations to come next
* [sr] [codemod] Change all SR callsites to use new API
@allow-large-files
This diff refactors all callsites of SR to use the slightly changed API introduced in the diff below. Really what this means is that you need to include the correct header. Also if you were using `ClientFactory::newFactory` you need to not prefix it with `ClientFactory::`.
```
cd ~/fbsource/fbcode
find ./ -type f -exec sed -i -e 's:#include "servicerouter/client/cpp2/ClientFactory.h":#include "servicerouter/client/cpp2/ServiceRouter.h":' -e 's:#include <servicerouter/client/cpp2/ClientFactory.h>:#include <servicerouter/client/cpp2/ServiceRouter.h>:' -e 's/ClientFactory::newFactory(/newFactory(/g' {} \;
```
Also manually fixed spots that couldn't be done automatically (or broke because they depended on transitive includes).
* Back out "Fix handling of empty batches in SumReduceDimsOp"
Original commit changeset: 282da1730cc2 This commit is blocking the
Github->fbcode sync, which really needs to get merged ASAP. D7881937 which this
diff depends on will be reverted in the sync D7990948 which causes this to
break. The sync diff cannot be patched with this reversion because it must be
landed against base revision 5c8c099 , and D7881937 must not be included in the
sync diff because it is breaking GPU tests that are not available in sandcastle
: https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-cuda8.0-cudnn6-ubuntu16.04-test/3638/console
for one example.
* Add the flow to support operator benchmark
1) generate model with the operator 2) upload to everstore 3) generate model spec into json file 4) start running the benchmark
* [tum][gpu] Connect DPM trainer with flow and unit tests
This diff:
- Fix some small bugs for Yiming's recent changes to parallelizer, so it suits real use cases.
- Add correct tags to the TUM code, so we can do data parallel transform
- pass extra info when instantiation.
- add unit test for using DPM in TUM model
After this diff, we can do simple box, multi-gpu fully-sync trainer for TUM in Fblearner workflow, but may still need to do speed benchmarking.
* w/o normalized lradaption for adam dense only
The previous lr adaption includes a normalization step when performing the dot product operation. This is not exactly the same as what is proposed in the paper. I add normalization as an option. Without it, the operator performs exactly what the paper proposed; with the option, we add the normalization step.
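A toy sketch of the two variants, under the assumption that the adaption moves lr along the dot product of the current gradient and the previous effective gradient; the exact formula lives in the lr-adaption operator, so names and signs here are illustrative only:
```
import numpy as np

def adapt_lr(lr, grad, effgrad, alpha=0.01, normalized=False):
    # dot(grad, effgrad) drives the lr update; the normalized variant
    # divides by the vector norms, the plain variant uses it as-is.
    dot = float(np.dot(grad, effgrad))
    if normalized:
        dot /= np.linalg.norm(grad) * np.linalg.norm(effgrad) + 1e-12
    return lr + alpha * dot
```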
* [fb] Use SharedPromise in DeferrableAsyncSchedulingNet
This code is to simplify DeferrableAsyncSchedulingNet by removing condition
variable + small fixes
* [tum] implement cuda sparseLengthsMean and LengthsMean
as title
* Adding an optional parameter to allow use of protobufs in InferShapesAndTypes function.
Adding an optional parameter to allow use of protobufs in InferShapesAndTypes function.
* Move feature_to_index to FeatureSpec.feature_to_index
move feature_to_index to FeatureSpec.feature_to_index to avoid override other fields
* [Caffe2] Rename bytes_moved to bytes_written
Just a rename in preparation for supporting bytes_read.
* [c2] fix ReduceFrontSumOp for empty case by setting 0
otherwise, it may use the results from the last iteration when the batch is empty.
* [Caffe2] [Int8] Improve Intel CPU performance
* [Easy] Improve PrependDim op logging
as titled
* DBFileReader expand db_path using os.path.expanduser(..)
Since there are a lot of possible use cases of `DBFileReader` reading from a user home path, like `~/local/sample.db`, I want to save people the trouble of calling `os.path.expanduser(db_path)` themselves.
* [Caffe2] Add bytes_read to cost structure
We're adding analytical read bytes to cost functions. This extends the structure accordingly for all CostInference defined operators.
Additionally, some small bug fixes were performed:
1) Cost functions now extract type information of operands instead of assuming float
* Fix sleef on aarch64 for hhvm
@bypass-lint
Rename flag
* Remove duplicated part in caffe2/ideep/operators/conv_op.cc
should be a sync error
* Rename test helper function test_adagrad_sparse_helper to adagrad_sparse_test_helper to avoid confusing pytest
* [bootcamp] Improve "Shape" operator to support axes specification
To improve the .shape operator of Caffe2 to support x.shape(tensor, axes), which takes an optional int array "axes" as input. For example, x.shape(tensor, [1, 0]) will return the dimensions for axes 1 and 0 in the specified order. In the current version, the "axes" input allows duplications and can have arbitrary length.
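A small numpy stand-in for the intended semantics (illustrative, not the operator implementation):
```
import numpy as np

def shape_with_axes(tensor, axes=None):
    # Return the full shape, or the dims of the requested axes in the
    # given order; duplicates and arbitrary length are allowed.
    dims = np.array(tensor.shape, dtype=np.int64)
    return dims if axes is None else dims[np.asarray(axes)]

x = np.zeros((2, 3, 4))
print(shape_with_axes(x, [1, 0]))     # [3 2]
print(shape_with_axes(x, [2, 2, 0]))  # [4 4 2]
```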
* Back out "Add barrier net that runs before training nets"
Original commit changeset: b373fdc9c30f. Need additional changes to some callers to support barrier failures.
* Change warning to verbose log to reduce log spam
The `LOG(WARNING)` was a bit spammy for regular use so lets just make it a `VLOG`.
* Extract the shared code from different caffe2_benchmark binaries
The OSS benchmark and Internal benchmark will share most functions in the benchmark.
* Support MFR in sequence training
As titled.
* Make knowledge distillation work using the logged prediction feature as the teacher label.
1) Add loading raw dense feature as teacher label.
2) Optional calibration function for teacher label
3) Add teacher label into generic unit test
4) Deprecated TTSN workflow version using feature_options to config teacher label
* [C2/CUDA]: unjoined cross entropy sigmoid
as desc
* Add async_scheduling executor into deferrable_net_exec_test
Add async_scheduling into tests and fix some exception cases
* Fix Event disabled error
When disabling event in RNN ops make sure we don't call Finish on disabled
event from op's RunAsync
* cuda ensure cpu output op can handle both TensorCPU and TensorCUDA
as desc.
* [C2 Core] Infer input device option in C2 hypothesis_test checkers
Improve how we default input blob device options.
Previously it defaults as where op lives but it is not necessarily the case.
For example:
CopyCPUToGPU
* [C2 Op]SplitByLengthsOp CPU/GPU implementation
[C2 Op]SplitByLengthsOp CPU/GPU implementation
* fix undefined symbol error
not sure why we're getting undefined symbol even with link_whole = True
Need to figure out why but need this workaround for now
* Add tools in DAIPlayground platform to help debugging models
Add additional tools to allow Playground to override individual methods defined in AnyExp. This allows users to create modules that specifically change certain default method behavior. An example included in this diff is deactivating the test model and checkpointing. When debugging model problems, switching off components helps quickly narrow down the location of the bug. The technique is extensively used in task T27038712 (Steady memory increase in EDPM, eventually resulting in gloo/cuda.cu:34: out of memory)
* add shape and type inference for int8 conversion operator
* Fix flaky test for group_norm
Fix flaky test for group_norm
* Fix group_norm_op_test flaky
Fix group_norm_op_test flaky
* Implementation of composite learning rate policy
In many state-of-the-arts deep learning works, people use a simple trick to
schedule the learning rate: use a fixed learning rate until error plateaus
and then switch to a different fixed learning rate, and so on. In this diff,
we implemented a simple version of the composite learning rate. The user gives
a set of learning rates policies and corresponding iteration nums, and the
optimizer will change the learning rate policy based on the number of iterations so far.
For example, the user gives two learning rate policies, say FixedLearningRate
and PolyLearningRate, with an iteration number of 1k. Then for the first 1k iterations,
we use FixedLearningRate, and for the following iterations we use PolyLearningRate.
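A minimal sketch of the dispatch logic (the policy functions and names are illustrative):
```
def composite_lr(policies, iteration):
    # policies: list of (policy_fn, num_iters); the last policy handles
    # all remaining iterations once earlier budgets are exhausted.
    start = 0
    for policy, num_iters in policies[:-1]:
        if iteration < start + num_iters:
            return policy(iteration - start)
        start += num_iters
    return policies[-1][0](iteration - start)

fixed = lambda it: 0.1
poly = lambda it, max_it=9000, power=1.0: 0.1 * (1.0 - it / max_it) ** power
print(composite_lr([(fixed, 1000), (poly, 9000)], 500))   # fixed policy
print(composite_lr([(fixed, 1000), (poly, 9000)], 5000))  # poly policy
```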
* Split two use cases of CachedReader into two classes, DBFileReader and CachedReader
# Use Cases:
1). input: DB file -> output: DatasetReader.
Use DBFileReader.
2). input: Reader -> build cache DB file -> output: DatasetReader.
Use CachedReader.
# Changes to CachedReader:
1). Move db_path to the constructor.
Because in the mock reader, the cache will always be built ahead of time.
# Changes to tests:
1). Make a separate TestCase class for CachedReader and DBFileReader.
2). Make it possible to add more test functions by adding setUp, tearDown and _make_temp_path.
3). Make deleting db_path more general. `db_path` could be a file for `log_file_db`, but could also be a directory for `leveldb`.
* Back out "On Mobile phones, call GlobalInit with no arguments in predictor in case we need to perform initialization"
Original commit changeset: 4489c6133f11
* Fix LARS bug
Fixed a bug in the LARS implementation which caused all subsequent blobs not using LARS to have the LARS learning rate multiplier applied to them.
* [tum] support sparse init & add uniformFill option
as title
* Propagate exception for async nets
Capture the exception when an exception is thrown in async nets and re-throw it after wait(). This allows exceptions to be propagated up to the caller.
This diff was a part of D7752068. We split the diff so that C2 core files changes are in a separate diff.
* Automatic update of fbcode/onnx to 69894f207dfcd72d1e70497d387201cec327efbc
Previous import was 403ccfbd0161c38f0834413d790bad0874afbf9a
Included changes:
- **[69894f2](https://github.com/onnx/onnx/commit/69894f2)**: Use op schema.all tensor types in random like definitions (#865) <Scott McKay>
- **[b9d6b90](https://github.com/onnx/onnx/commit/b9d6b90)**: Clarify random like operators (#846) <Scott McKay>
- **[fc6b5fb](https://github.com/onnx/onnx/commit/fc6b5fb)**: Refactor shape inference implementation (#855) <anderspapitto>
- **[b7d8dc8](https://github.com/onnx/onnx/commit/b7d8dc8)**: fix cmake warning message (#863) <Eric S. Yu>
- **[f585c5d](https://github.com/onnx/onnx/commit/f585c5d)**: add pytorch-operator test for tile (#831) <Wenhao Hu>
- **[993fe70](https://github.com/onnx/onnx/commit/993fe70)**: add install step (#832) <Eric S. Yu>
- **[68bc26c](https://github.com/onnx/onnx/commit/68bc26c)**: add type inference for traditional ml ops except classifier ops. (#857) <Ke Zhang>
- **[9cc0cda](https://github.com/onnx/onnx/commit/9cc0cda)**: fix string representation of scalar types (#858) <G. Ramalingam>
- **[1078925](https://github.com/onnx/onnx/commit/1078925)**: fix y in pow test case to scalar (#852) <Wenhao Hu>
- **[c66fb6f](https://github.com/onnx/onnx/commit/c66fb6f)**: Add some math function shape inference (#845) <anderspapitto>
- **[ff667d1](https://github.com/onnx/onnx/commit/ff667d1)**: Refactor return type and docs for ONNXIFI_BACKEND_DIRECTX_ID (#853) <Marat Dukhan>
- **[11c6876](https://github.com/onnx/onnx/commit/11c6876)**: clear initializer names when clear initializer (#849) <Wenhao Hu>
- **[73c34ae](https://github.com/onnx/onnx/commit/73c34ae)**: Clarify FeatureVectorizer description. (#843) <Scott McKay>
- **[1befb9b](https://github.com/onnx/onnx/commit/1befb9b)**: Remove useless text in docs (#850) <Lu Fang>
- **[e84788f](https://github.com/onnx/onnx/commit/e84788f)**: Fix SELU attributes' default values (#839) <Lu Fang>
- **[ebac046](https://github.com/onnx/onnx/commit/ebac046)**: Add tile test case (#823) <Wenhao Hu>
- **[8b7a925](https://github.com/onnx/onnx/commit/8b7a925)**: a few more shape inference functions (#772) <anderspapitto>
- **[9718f42](https://github.com/onnx/onnx/commit/9718f42)**: Make the coefficient non optional for LinearClassifier (#836) <Jaliya Ekanayake>
- **[ef083d0](https://github.com/onnx/onnx/commit/ef083d0)**: Add save_tensor and load_tensor functions for Protos (#770) <Lu Fang>
- **[45ceb55](https://github.com/onnx/onnx/commit/45ceb55)**: Check if CMAKE_BUILD_TYPE set before project(). (#812) <Sergii Dymchenko>
- **[4b3d2b0](https://github.com/onnx/onnx/commit/4b3d2b0)**: [WIP] reenable shape inference tests (#834) <anderspapitto>
- **[22d17ee](https://github.com/onnx/onnx/commit/22d17ee)**: RNN tests: LSTM, GRU, SimpleRNN (#739) <Peyman Manikashani>
- **[de65b95](https://github.com/onnx/onnx/commit/de65b95)**: dimension denotation (#443) <Tian Jin>
- **[eccc76e](https://github.com/onnx/onnx/commit/eccc76e)**: fix field number issue in onnx operator proto and enable its build (#829) <Ke Zhang>
- **[d582beb](https://github.com/onnx/onnx/commit/d582beb)**: disable shape inference test to unbreak ci (#830) <Lu Fang>
- **[485b787](https://github.com/onnx/onnx/commit/485b787)**: function proto for composite op. (#802) <Ke Zhang>
- **[cd58928](https://github.com/onnx/onnx/commit/cd58928)**: specify defaults for attributes of Affine op (#820) <G. Ramalingam>
- **[7ee2cf9](https://github.com/onnx/onnx/commit/7ee2cf9)**: merge the dummy backend back into the main one (#743) <anderspapitto>
- **[1c03a5a](https://github.com/onnx/onnx/commit/1c03a5a)**: [Proposal] ONNX Interface for Framework Integration (previously ONNX Backend API) header and docs (#551) <Marat Dukhan>
- **[3769a98](https://github.com/onnx/onnx/commit/3769a98)**: Rename real model test case from VGG-16 to ZFNet (#821) <Lu Fang>
* [C2]ReluN Op
relu n op.
tf reference: https://www.tensorflow.org/api_docs/python/tf/nn/relu6
* Call destructor when assigning a blob value
* Add executor overrides
Add executor overrides flag to enable migration to async_scheduling executor
* Add barrier net that runs before training nets - attempt #2
Add a synchronize barrier net that is run before training nets. With this net, shards that are faster will wait for other shards before starting training. This reduces the chances of the faster shards timing out during GLOO AllReduce.
Removed explicit data_parallel_model.py.synchronize call in holmes workflow.
This change was landed previously but caused errors for some EDPM workflows - See https://fb.facebook.com/groups/1426530000692545/permalink/1906766366002237/ - because EDPM assumes any call to CreateOrCloneCommonWorld and Gloo ops are wrapped in exception handlers but in this case exception thrown in the barrier init net is not handled.
To address this issue, we add _CreateOrCloneCommonWorld to the param_init_net instead of a new barrier init net. Since errors in the param_init_net run are handled gracefully with re-rendezvous, this should fix the problem.
* Handle empty nets in async_scheduling
Make sure we don't get stuck on empty nets
* use CUDA_ARCH for conditional compile
* [C2 fix] infer function for ensure_cpu_output_op
* Update group_norm test to reduce flaky test
* Fix lr_multiplier for GPU
* [fix] Re-enable events in RNN ops
We have earlier added event disabling in RNN ops as back then we didn't use
events, with current use cases this is no longer true
(https://fburl.com/8vd0lp8y)
* use ops with cuda impl
* Revert D7729695: [caffe2][fix] Re-enable events in RNN ops
This reverts commit 4b215c7496fb724656ff4c776933a15bdbbcde5e
@bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
* [observer] Clean up observer_config.h
#accept2ship
* [1/n] Refactor dataio_test.py
Replace code duplication with a common function
* Add barrier net that runs before training nets
Add a synchronize barrier net that is run before training nets. With this net, shards that are faster will wait for other shards before starting training. This reduces the chances of the faster shards timing out during GLOO AllReduce.
Removed the explicit data_parallel_model.py.synchronize call in the holmes workflow. A similar change in the speech/asr_training workflow will come in another diff.
* Support the dnnlowp backend in caffe2_benchmark
This is for SHARE operator latency evaluation
* Migrate integral_image_op to main caffe2
migrate integral_image_op(GPU version) given by https://fburl.com/yvqezigi
to caffe2/caffe2/operators and implement its CPU version. Write up a test
using the hypothesis_test mechanism
* [pos_disc, fbcode] Implement unjoined lr loss
As explained in https://our.intern.facebook.com/intern/wiki/Model_Based_Calibration/, when the dataset is a joined dataset, where labels might change later, we need to use unjoined logloss.
The implementation is almost the same as in Sigrid (https://fburl.com/1trngsls), where
loss = y (log(p) - log(1-p)) + (1-y) log(1-p) = xy - (1-y)x - (1-y)log(1+exp(-x))
For x < 0, to ensure stability and avoid overflow, we reformulate the above expression as
loss = xy - (1-y)x + (1-y)x - (1-y)log(1+exp(x)) = xy - (1-y)log(1+exp(x))
Then the final expression becomes
loss = xy + (y - 1) x (x >= 0) - (1 - y) log(1 + exp(x - 2 x (x >= 0)))
where y is the true label, x is the dot product and p = logistic(x).
This implementation aligns with the current implementation of the original cross entropy in
https://phabricator.intern.facebook.com/diffusion/FBS/browse/master/fbcode/caffe2/caffe2/operators/cross_entropy_op.cc;0bae3b5d0f825897c5e0dd0ff10f489d7271bf25$7-13
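A numpy sketch of the final stable expression above (illustrative only, not the operator code):
```
import numpy as np

def unjoined_logloss(x, y):
    # loss = xy + (y-1)*x*[x>=0] - (1-y)*log(1 + exp(x - 2x*[x>=0]))
    pos = (x >= 0).astype(x.dtype)
    return (x * y + (y - 1.0) * x * pos
            - (1.0 - y) * np.log1p(np.exp(x - 2.0 * x * pos)))
```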
* Keep the array to fix the conflict
* [C2] Compute Adagrad effective LR
The AdagradWithLR op outputs an extra blob which contains the average effective learning rate across all weights in this blob.
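A sketch of what the effective learning rate means here, assuming the standard Adagrad update; the extra output would then be the scalar mean (names are illustrative):
```
import numpy as np

def adagrad_with_lr(w, g, h, lr, epsilon=1e-5):
    # Accumulate squared gradients; the per-weight effective LR is the
    # base lr scaled down by the accumulated history.
    h = h + g * g
    effective_lr = lr / (np.sqrt(h) + epsilon)
    w = w + effective_lr * g
    return w, h, float(effective_lr.mean())  # mean effective LR output
```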
* Open-source extractMetaNetDef & runGlobalInitialization, add new Predictor constructor from db file, and add run_map_outputs
1. Open-source extractMetaNetDef and runGlobalInitialization, for use in
2. new Predictor constructor from db file.
3. Add new run function that returns outputs as TensorMap
* Disable eigen cpu
Disable eigen cpu in transpose and reduce
* Introduce request_only/object_only property of ModelLayer
by default this is False
* A simple TC Caffe2 benchmark
We can run the tuner, get MappingOptions and then use them to
compare against cuBLAS.
Currently broken due to LLVM issues. How to run:
hg checkout eec1ab31b59c03b8deded1c755a9abaf8c45be01
add D7401202
add D7434625
add D7506031
add D7540728
buck run @mode/dev-nosan tc/tc/benchmarks_python:caffe2_benchmark
* Move Caffe2 feature_maps_ops to open source
Need feature maps operators in open source project facebookresearch/BlueWhale
* Manually fix the conflicts in channel shuffle op
* Fix the inconsistency between different gh and fbcode
* Skip Adagrad GPU Test (Because some gpu implementation is missing)
* Fix another test to make sure it won't run on gpu when implementation is not available yet
* Add moments op in caffe2
* Use rsqrtf in float for group_norm
* Add docs for default behavior when axes is not provided.
* Update group_norm_op by using Eigen::sqrt on CPU
* Add full impl of GroupNorm
* Fix comments in math.h
* Remove unsed buffers
* Add #include <array> in gpu version
* Remove unused moments_buffer_
* Make inverse std to be a template.
* Add detailed comments
DEPTHWISE_3x3 engine provides an optimized implementation of depthwise 3x3 convolution, e.g. for ShuffleNet, MobileNets
Implementations exist for CPU (generic), ARM CPU, and CUDA GPU.
Originally developed by @ajtulloch
* Track checkpoint performance in scuba
As title.
* [C2/CUDA]: fix cross entropy sigmoid with logits
when adding log_d_trick, I forgot to add it to the cuda impl; this diff fixes
it.
* Back out "[caffe2] Unregister MKL fallbacks for NCHW conversions"
Original commit changeset: 8918dd40205a
Will land after @jongsoo's diff https://phabricator.intern.facebook.com/D7596315 lands
* [Easy][C2] Don't add blob to external outputs from output_record if it's already external output
As desc.
* On Mobile phones, call GlobalInit with no arguments in predictor in case we need to perform initialization
FACEBOOK:
The QPL logger needs the initialization code. In the past, the initialization code is put in the pipeline calling Caffe2. However, those places become obsolete quickly, as the product teams change places to call Caffe2 from time to time. We also need to track which teams use Caffe2 so that we can put the initialization code there.
With this diff, the initialization code is put in the predictor constructor, only enabled for mobile phones. This way, we can always enable QPL logging.
Once we do this, we can check how many times Caffe2 inference is called in production, and which models are more popular in production. This way, we can prioritize our effort supporting those models.
Will clean up the old code calling the init in the product in a separate diff.
* add padding op for sparse length tensor
to pad a length-based sparse tensor with padding_value
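A hedged sketch of the semantics, assuming AddPadding-style behavior (pad each length-delimited segment on both sides); the helper is illustrative:
```
import numpy as np

def pad_segments(values, lengths, padding_value=0.0, padding_width=1):
    # Pad every segment of a length-based sparse tensor with
    # `padding_width` copies of `padding_value` on each side.
    out, out_lengths, offset = [], [], 0
    for n in lengths:
        seg = list(values[offset:offset + n])
        pad = [padding_value] * padding_width
        out.extend(pad + seg + pad)
        out_lengths.append(n + 2 * padding_width)
        offset += n
    return np.asarray(out), np.asarray(out_lengths)
```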
* Add conv_op with cudaconvnet engine
Add conv_op with cudaconvnet engine
* [numa] Fix simple NUMA copy benchmark
Move XavierFill into init_net and also compute BW
* call roundf (device function) instead of round (host function)
* [caffe2_benchmark][observer] Make caffe2_benchmark use its own observer
1. Add ClearGlobalNetObservers()
2. Make caffe2_benchmark use its own observer and observer_reporter
* [detectron] Use roundf instead of round in the detectron module ops
* allow K larger than number of elements in top k op
one use case is to use this op together with PackSegments for sparse tensors, where the number of elements in each slice is not statically defined.
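A small numpy stand-in for the relaxed semantics:
```
import numpy as np

def topk_padded(x, k, pad_value=0.0):
    # When k exceeds the number of elements, return everything and pad
    # the remainder instead of failing.
    n = min(k, len(x))
    top = np.sort(x)[::-1][:n]  # n largest values, descending
    return np.concatenate([top, np.full(k - n, pad_value)])
```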
* add ChannelShuffle DNNLOWP op
* fixup math_cpu.cc break
* Caffe2: Enhance test for CollectAndDistributeOp
This also changes the operator and the test to use stable sort
otherwise the test will fail due to differences between the op
and the test when facing ROIs of the same score.
* Caffe2: Adjust comparator to make std::nth_element and std::sort stable
Revert the removal of std::nth_element and std::sort and adding of
std::stable_sort.
* [GanH][Easy]: Add assertion to adaptive weighting layer
A 0 weight causes numeric instability and exploding NE.
* [Easy] Add cast op before computing norm in diagnose options
As LpNorm only takes floats, we add a manual cast here.
* Introduce a new caching device allocator
`cudaMalloc` and `cudaFree` calls are slow, and become slower the
more GPUs there are. Essentially, they grab a host-wide (not device-wide) lock
because GPU memory is transparently shared across all GPUs. Normally, this
isn't much of a concern since workloads allocate memory upfront, and reuse it
during later computation.
However, under some computation models (specifically, memory conserving
approaches like checkpoint-and-recompute, see
https://medium.com/@yaroslavvb/fitting-larger-networks-into-memory-583e3c758ff9)
this assumption is no longer true. In these situations, `cudaMalloc` and
`cudaFree` are common and frequent. Furthermore, in data parallel contexts,
these calls happen at nearly the same time from all GPUs worsening lock
contention.
A common solution to this problem is to add a custom allocator. In fact,
NVIDIA provides one out of the box: CUB, which Caffe2 already supports.
Unfortunately, the CUB allocator suffers from very high fragmentation. This is
primarily because it is a "buddy" allocator which neither splits nor merges
free cached blocks. Study
https://github.com/NVlabs/cub/blob/1.8.0/cub/util_allocator.cuh#L357 if you
want to convince yourself.
This diff adapts a caching allocator from the Torch codebase
https://github.com/torch/cutorch/blob/master/lib/THC/THCCachingAllocator.cpp
which does splitting and merging and ends up working really well, at least for
workloads like the checkpoint-and-recompute computation models noted above.
I simplified the implementation a little bit, made it a bit more C++-like. I
also removed a bunch of stream synchronization primitives for this diff. I
plan to add them back in subsequent diffs.
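A toy sketch of the caching idea (frees go to a size-keyed cache, later allocations reuse them); the real allocator also splits and merges blocks and tracks streams, all omitted here:
```
from collections import defaultdict

class CachingAllocator:
    def __init__(self, backend_malloc, backend_free):
        self._cache = defaultdict(list)  # size -> cached blocks
        self._malloc = backend_malloc    # e.g. a cudaMalloc wrapper
        self._free = backend_free        # e.g. a cudaFree wrapper

    def allocate(self, size):
        if self._cache[size]:
            return self._cache[size].pop()  # fast path: reuse cached block
        return self._malloc(size)           # slow path: real device malloc

    def release(self, ptr, size):
        self._cache[size].append(ptr)       # cache instead of freeing

    def empty_cache(self):
        for ptrs in self._cache.values():
            for p in ptrs:
                self._free(p)
        self._cache.clear()
```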
* Report reader progress in fblearner workflows
Integrate with fblearner progress reporting API and add support to report training progress from reader nodes.
If the reader is constructed with batch limits, report based on finished batches vs total batches. The finished batch count may exceed the total because we evaluate whether we should stop every time we dequeue a split.
If no limit for the reader, report based on finished splits (Hive files) vs total splits. This is fairly accurate.
* [GanH][Diagnose]: fix plotting
1. ganh diagnose needs to set plot options
2. the modifier's blob name, which is used for the metric field, needs to be fixed before
generating the net
* Automatic update of fbcode/onnx to 985af3f5a0f7e7d29bc0ee6b13047e7ead9c90c8
* Make CompositeReader stops as soon as one reader finishes
Previously, CompositeReader called all readers before stopping. This resulted in flaky tests, since the last batch may be read by different threads, resulting in dropped data.
* [dper] make sure loss is not nan
as desc.
* [rosetta2] [mobile-vision] Option to export NHWC order for RoIWarp/RoIAlign
Thanks for finding this @stzpz and @wangyanghan. Looks like NHWC is more
optimized. For OCR it doesn't help yet, since NHWC uses more memory bandwidth, but
it will soon become important.
* Intra-op parallel FC operator
Intra-op parallel FC operator
* [C2 Proto] extra info in device option
passing extra information in device option
design doc: https://fb.quip.com/yAiuAXkRXZGx
* Unregister MKL fallbacks for NCHW conversions
* Tracing for more executors
Modified Tracer to work with other executors and add more tracing
* Remove ShiftActivationDevices()
* Check for blob entry iff it is present
When processing the placeholder ops, ignore the blob if it is not present in blob_to_device.
* Internalize use of eigen tensor
Move use of eigen tensor out of the header file so we don't get template partial specialization errors when building other libraries.
* feature importance for transformed features.
* - Fix unused parameter warnings
The changes in this diff comment out unused parameters.
This will allow us to enable -Wunused-parameter as error.
#accept2ship
* add opencv dependencies to caffe2
The video input op requires additional opencv packages. This is to add them to
cmake so that it can build
* Add clip_by_value option in gradient clipping
Add clip_by_value option in gradient clipping
when a value is bigger than max or smaller than min, clip it to the corresponding bound
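A one-line numpy stand-in for the semantics:
```
import numpy as np

def clip_gradient_by_value(grad, clip_min, clip_max):
    # Elementwise clamp: values above max become max, below min become min.
    return np.clip(grad, clip_min, clip_max)
```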
* std::round compat
* fix unit test for sqrt op
From the error logging:
[idx, grad, grad_estimate] are:
[[ 146. 0.5 0.45776367]
[ 147. 0.5 0.45776367]
The gradient == 0.5 is correct, which means the SqrtOp and its gradient are doing the right job. (Because y = sqrt(x), loss = y^2/2 = x/2, and then d(loss)/dx = 1/2 = 0.5.)
The test failed because of a numerical problem in grad_estimate (in the unit test). It can be because the step_size is small and float precision is not high (when there are multiple elements in the tensor, we do sum(y^2) to compute the loss).
This diff
- increases the step size, and also moves the test cases further away from 0 (where the gradient of sqrt(x) is not well defined) to be safe :)
- also cleans up and merges the test cases for inplace vs. non-inplace
Tested with:
`CAFFE2_HYPOTHESIS_PROFILE=debug ai_bt caffe2/caffe2/python/operator_test:elementwise_ops_test -- "test_sqrt"`
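For reference, a sketch of the central-difference gradient estimate involved, showing why a larger step size and test points away from 0 help (names illustrative):
```
import numpy as np

def numeric_grad(f, x, step=1e-2):
    # Central difference; larger steps reduce float round-off error at the
    # cost of truncation error.
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e.flat[i] = step
        g.flat[i] = (f(x + e) - f(x - e)) / (2.0 * step)
    return g

loss = lambda x: 0.5 * np.sum(np.sqrt(x) ** 2)   # = sum(x) / 2
x = np.array([1.5, 2.5, 3.5], dtype=np.float32)  # away from 0
print(numeric_grad(loss, x))                     # ~[0.5, 0.5, 0.5]
```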
* CompositeReader & CompositeReaderBuilder
A new type of reader gluing multiple readers together.
* Back out "Revert D7394363: [GanH]: Log D Trick for Cross Entropy with Sigmoid"
Original commit changeset: 9325a4356dbe
* [dai][WIP] convert params to int8 on ps before sending to trainer
Add float->uint8 conversion in addition to float->fp16 conversion in model_saver.
* [easy] improve unit test for sparse length sum ops
as desc.
#accept2ship
* Update GitHub upstream to 771fcb3455
* move sparse hash unique ops to OOS and add unit tests
- move the SparseHash version to OOS, since 'sparsehash' is already a dependency of caffe2 OOS: https://fburl.com/arssw4n1
- The 'SparseHash' engine is also being used in OOS, so the SparseHash version shall be in OOS to reduce confusion: https://fburl.com/o5ea7ah2
- fix the CUDA UniqueOp for the case when batch is empty.
- add unit test
* group_norm_op for caffe2
This is the cuda op for Group Normalization (GN): https://arxiv.org/abs/1803.08494
This code implements GN in one op that computes Y = gamma * (X - mu) / sigma + beta and also its gradients. It is expected to have minimal memory consumption (similar to the BN op), without creating the new blobs GN would need if it were implemented as several ops (e.g., reshape, norm mean/std, affine_channel).
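A numpy sketch of the forward formula Y = gamma * (X - mu) / sigma + beta with per-(sample, group) statistics; illustrative only, since the op fuses this with the gradient computation:
```
import numpy as np

def group_norm(x, gamma, beta, groups, eps=1e-5):
    # x is NCHW and c must be divisible by `groups`.
    n, c, h, w = x.shape
    xg = x.reshape(n, groups, (c // groups) * h * w)
    mu = xg.mean(axis=2, keepdims=True)
    sigma = np.sqrt(xg.var(axis=2, keepdims=True) + eps)
    y = ((xg - mu) / sigma).reshape(n, c, h, w)
    return y * gamma.reshape(1, c, 1, 1) + beta.reshape(1, c, 1, 1)
```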
* Resubmit D7405233: disappeared in D7464958
OOS publish causes the op missing -- however, test was still there
* [c2] add sparse hash engine for cuda unique op
The SparseHash version of UniqueOp copies the input tensor to CPU, makes use of a sparse hash map to get the unique output, and then copies back to GPU.
* [dper][gpu] enable unit testing gpu trainer for sparse nn
to debug the GPU trainer using mock data in unit test.
make it easier to develop GPU trainer for new models.
* Reuse Gloo context for Synchronize() calls
Previously we were creating (and leaking) the Gloo context on each call to Synchronize(). Now only run the common world op and create the barrier net once, then run the barrier net on each Synchronize() call. Since timeout is associated with the Gloo context, assert that the timeout is fixed instead of trying to handle the complexity of multiple timeouts (and associated contexts).
* [GanH/WGAN][1/n]: add FC param clipping
as titled
* [mobile] minimizing changes between caffe2_benchmark and speed_benchmark
* [GanH]: enable diagnose within model
avoid finding blob names but to directly enable inside the model
* Add `net_transformer_fun` option to DPM
This callback allows for various transformations to be made to the
model after gradient operators have been added. The immediate motivation for
this is to allow transformations such has "checkpoint-and-recompute" which
allow trading off memory for additional compute.
Adding several callbacks like this has made DPM's API less than ideal at this
stage. However, I could not find any reasonable alternative.
* [DT] [33/n] Compile flow task groups
task groups need to be compiled in order to pickle the object in fblearner. However, I also changed the Job's compile function, as creating a new object is not necessary.
* Initial commit for sparse_normalize vectorization and benchmark
* [GanH]: LB Calibration for JSD
as titled
* Tracing event in async executor
Adding event tracing through TRACE_EVENT macro in async executor
* [Resubmit] D7409751 Reseting book-keeping blobs when the reservoir is reset
D7409751 got lost in D7464958
* Visualizing realtime weights values
We want to visualize the weight values as the optimizer iterates. This diff supports visualizing the weights at an assigned index.
Currently, we assume the blob to be 2-dimensional.
* [GanH][Easy]: Fix Homotopy Weighting
apparently, there was a bug in the homotopy weight (alpha, beta) update
* [c2] move sparse hash unique op out of oss
so that oss do not need to depend on google hash map.
* Get rid of std::round as it's not supported on Android
* Revert changes on setup.py
* Skip flaky test on Dataio
* fix
* [easy] allow empty tensor in cuda relu op
The diff has not enabled the unit test for empty tensors, because the MKL version of ReluOp needs extra work to support them
* Make blob norm plotting work with distributed trainer when the old framework is used
This reverts commit d63266ccbc0c1390c58c2a71ae0b562fdec2fbc0
@bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
Added a caffe2 math sum operator so that it takes integers (only int32).
Changed SumFloatIter to SumGenericIter so that it takes more than one type.
Added a sumElementInt operator.
* Revert update on top_k_op
* Add axis to top_k_op
* Remove do { ... } while (false)
* Revert top_k op to upstream
* Add argmin and argmax ops
Add argmin and argmax ops
* Revert top_k_test to upstream
* Add argmin and argmax ops
Add argmin and argmax ops
1. Support the LpNorm operator to calculate the average LpNorm by adding one more boolean argument, i.e., LpNorm(average = true) = LpNorm(x) / size of (x)
2. Integrate the average option into the visualization framework
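A sketch of the `average` semantics; taking LpNorm as sum |x|^p is an assumption here, the point is only the division by the element count:
```
import numpy as np

def lp_norm(x, p=2, average=False):
    norm = np.sum(np.abs(x) ** p)  # LpNorm(x)
    return norm / x.size if average else norm
```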
* Add CollectAndDistributeFpnRpnProposalsOp for FPN support
* Adds a C++ operator equivalent to the Python op in Detectron
* Once some additional GenerateProposalsOp changes are made this will
let us support Detectron FPN models with straight Caffe2 C++ ops
* RetinaNet and segmentation models require additional work
* Remove some uses of conservativeResize
* Add notes about training and inputs/outputs to operator documentation
* Reduce Sum and Reduce Mean
* Handle reductions with empty 'axes'
* Merge codebase and simplify tensor reduction logic
* Restructure code and add comments.
* Fix parameter to scale