Commit Graph

636 Commits

Author SHA1 Message Date
Xiaomeng Yang
87cac4c2f1 Update Im2Col related to make preparation for group conv in NHWC order. (#10439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10439

Update Im2Col related to make preparation for group conv in NHWC order.

Reviewed By: houseroad

Differential Revision: D9285344

fbshipit-source-id: 1377b0243acb880d2ad9cf73084529a787dcb97d
2018-08-15 17:10:24 -07:00
Eli Amesefe
c5b1aa93ee Export uint8 tensors as byte string in mobile_exporter and add GivenTensorByteStringToUInt8FillOp (#10385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10385

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10354

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10316

Because Protobuf encodes uint8_t tensors using a less space-efficient varint uint32_t encoding, we are adding a new operator that reads a byte string back into a uint8_t tensor.
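
A rough way to see the size difference described above is to count payload bytes for both representations. The sketch below is only an illustration: it reimplements the base-128 varint size rule by hand, ignores field tags and length prefixes, and does not call Protobuf or any Caffe2 API. The helper names are hypothetical.

```python
def varint_size(value: int) -> int:
    """Bytes used by a base-128 varint for a non-negative integer."""
    size = 1
    while value >= 0x80:  # 7 payload bits per encoded byte
        value >>= 7
        size += 1
    return size


def size_as_varints(data: bytes) -> int:
    # Payload only: one varint per element, no tags or length prefix.
    return sum(varint_size(b) for b in data)


def size_as_byte_string(data: bytes) -> int:
    # A byte string stores each uint8 value in exactly one byte.
    return len(data)


pixels = bytes(range(256))  # every uint8 value exactly once
print(size_as_varints(pixels), size_as_byte_string(pixels))
```

Values of 128 and above need two varint bytes each, so uint8 data with many large values grows noticeably when stored as varints.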

Reviewed By: harouwu

Differential Revision: D9004839

fbshipit-source-id: dfd27085c813fdeff13fee15eef4a2e7fef72845
2018-08-15 14:26:50 -07:00
Jongsoo Park
d8ff7ad6f8 generalize order switch ops for 1-3d (#10395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10395

Order switch ops (NCHW2NHWC and NHWC2NCHW) previously supported only 2D images.
This diff generalizes them to 1D and 3D, and also adds a unit test that was previously missing.
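
The generalization can be pictured as an axis permutation that moves the channel axis while leaving the spatial axes in order. The following is a minimal sketch of that idea in plain Python, not the operator's implementation; the function names are hypothetical and `ndim` counts the batch and channel axes too (3, 4, 5 for 1D, 2D, 3D data).

```python
def nchw_to_nhwc_axes(ndim: int) -> list:
    """Axis order that moves the channel axis (1) to the end."""
    if ndim < 3:
        raise ValueError("need batch, channel and at least one spatial axis")
    return [0] + list(range(2, ndim)) + [1]


def nhwc_to_nchw_axes(ndim: int) -> list:
    """Axis order that moves the last (channel) axis back to position 1."""
    if ndim < 3:
        raise ValueError("need batch, channel and at least one spatial axis")
    return [0, ndim - 1] + list(range(1, ndim - 1))


def permute_shape(shape, axes):
    # Same convention as a transpose: output axis i takes input axis axes[i].
    return [shape[a] for a in axes]


for ndim in (3, 4, 5):
    print(ndim, nchw_to_nhwc_axes(ndim), nhwc_to_nchw_axes(ndim))
```

Applying one permutation and then the other returns the original shape, which is a convenient property to check in a unit test.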

Reviewed By: protonu

Differential Revision: D9261177

fbshipit-source-id: 56e7ec54c9a8fb71781ac1336f3f28cf024b4bda
2018-08-15 10:09:31 -07:00
Peizhao Zhang
ce8e8feceb Fixed a bug in box_with_nms_limit where it may produce more bounding boxes than specified. (#10390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10390

Fixed a bug in box_with_nms_limit where it may produce more bounding boxes than specified.
* The original code first finds the score threshold at the 'detections_per_im' position and filters out boxes that score below the threshold.
* When multiple boxes have the same score as the threshold, the op returns more boxes than 'detections_per_im'.
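
The tie problem described in the two points above can be reproduced with a few lines of plain Python. This is a sketch of the failure pattern only, under the assumption that boxes are reduced to their scores; it does not mirror the operator's actual code, and both function names are hypothetical.

```python
def keep_by_threshold(scores, limit):
    """Threshold-based filter: keeps every score >= the limit-th best score."""
    if len(scores) <= limit:
        return list(scores)
    threshold = sorted(scores, reverse=True)[limit - 1]
    return [s for s in scores if s >= threshold]


def keep_top_k(scores, limit):
    """Explicit top-k: never returns more than `limit` scores."""
    return sorted(scores, reverse=True)[:limit]


scores = [0.9, 0.8, 0.8, 0.8, 0.1]
print(len(keep_by_threshold(scores, 2)))  # ties at 0.8 all pass the filter
print(len(keep_top_k(scores, 2)))
```

With three boxes tied at the threshold score, the threshold filter returns four boxes for a limit of two, while the explicit top-k selection returns exactly two.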

Reviewed By: wat3rBro

Differential Revision: D9252726

fbshipit-source-id: 63f40829bcd275cb181692bc7547c384cee01499
2018-08-14 23:54:23 -07:00
Peizhao Zhang
520f4f6cb9 Added some unit test for box_with_nms_limit_op. (#10389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10389

Added some unit test for box_with_nms_limit_op.

Reviewed By: wat3rBro

Differential Revision: D9237860

fbshipit-source-id: 2d65744bd387314071b68d2a0c934289fc64a731
2018-08-14 11:55:03 -07:00
Wei Wen
ffb59e5f20 adding stochastic quantization caffe2 operators (encoder and decoder in CPU are implemented. GPU mode is pending)
Summary:
This operator implements b-bit (b = 1/2/4/8) stochastic quantization of a floating-point
matrix in a row-wise fashion. 8/b quantized values are packed into each byte
and returned in a uint8 tensor. PR: https://github.com/pytorch/pytorch/pull/8629
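
As an illustration of the idea, the sketch below quantizes one row with stochastic rounding and packs 8/b codes into each byte. Everything specific here is an assumption made for the example: the min/max scaling, the little-end-first bit layout, and the function names are hypothetical and may differ from what the operator actually does.

```python
import random


def stochastic_quantize_row(row, bits, rng):
    """Map each value to an integer code in [0, 2**bits - 1] (assumed scheme)."""
    levels = (1 << bits) - 1
    lo, hi = min(row), max(row)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = []
    for x in row:
        pos = (x - lo) / scale
        floor = int(pos)
        # Round up with probability equal to the fractional part.
        code = floor + (1 if rng.random() < pos - floor else 0)
        codes.append(min(code, levels))
    return codes, lo, scale


def pack_codes(codes, bits):
    """Pack 8 // bits codes per byte, first code in the lowest bits (assumed layout)."""
    per_byte = 8 // bits
    packed = bytearray()
    for start in range(0, len(codes), per_byte):
        byte = 0
        for offset, code in enumerate(codes[start:start + per_byte]):
            byte |= code << (offset * bits)
        packed.append(byte)
    return bytes(packed)


rng = random.Random(0)
codes, lo, scale = stochastic_quantize_row([0.0, 0.2, 0.7, 1.0], 2, rng)
print(codes, list(pack_codes(codes, 2)))
```

Stochastic rounding keeps the quantized value equal to the input in expectation, which is the usual motivation for preferring it over deterministic rounding.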

Reviewed By: harouwu

Differential Revision: D8493264

fbshipit-source-id: 01f64066568a1e5a2b87c6d2134bd31cdf119c02
2018-08-13 16:39:23 -07:00
Jerry Zhang
656bb320b7 EnforceFinite test (#10143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10143

att

Reviewed By: xianjiec

Differential Revision: D9122444

fbshipit-source-id: 010abcc1eb64f084c00890e8de5f5d422b4b8d02
2018-08-03 10:31:29 -07:00
Junjie Bai
4778afb8bb In Expand support using -1 to indicate preserving original size (#10174)
Summary:
zrphercule

https://pytorch.org/docs/stable/tensors.html#torch.Tensor.expand
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10174
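
The `-1` convention follows the linked `torch.Tensor.expand` documentation, where `-1` means "keep this dimension's size". A minimal sketch of how such a requested shape could be resolved is shown below; the helper name is hypothetical, and the sketch neither checks broadcast compatibility nor reflects the Caffe2 operator's code.

```python
def resolve_expand_shape(input_shape, requested_shape):
    """Replace each -1 in requested_shape with the matching input dimension.

    Shapes are aligned from the right, as in broadcasting.
    """
    if len(requested_shape) < len(input_shape):
        raise ValueError("requested shape has fewer dimensions than the input")
    offset = len(requested_shape) - len(input_shape)
    resolved = []
    for i, size in enumerate(requested_shape):
        if size == -1:
            if i < offset:
                # No existing dimension to preserve for a new leading axis.
                raise ValueError("-1 is not allowed for a new leading dimension")
            resolved.append(input_shape[i - offset])
        else:
            resolved.append(size)
    return resolved


print(resolve_expand_shape([3, 1], [-1, 4]))
print(resolve_expand_shape([3, 1], [2, -1, 4]))
```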

Differential Revision: D9136467

Pulled By: bddppq

fbshipit-source-id: 825c489899097acda8d43706964d78a104cdf583
2018-08-02 22:09:47 -07:00
Junjie Bai
dd527db711 Skip TestConvolution.test_convolution_sync on ROCM which caused random segfaults (#10179)
Summary:
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/4701/console

petrex ashishfarmer rohithkrn
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10179

Differential Revision: D9139657

Pulled By: bddppq

fbshipit-source-id: 9b1bb2ad185ed16fff696ce026a5ee5fcf9cbaee
2018-08-02 21:09:27 -07:00
Lin Li
4a2f3cc45f Improve lars operator by applying clipping (#9905)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9905

This diff improves lars operator in Caffe2 by applying clipping to the computed learning rate
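
To make the idea of clipping concrete, here is a small sketch that computes a layer-wise rate and clamps it to a range. The rate formula (trust coefficient times weight norm over gradient norm plus weight decay term), the bounds `lr_min` and `lr_max`, and the function names are assumptions chosen for illustration; the commit message does not spell out the operator's exact formula or argument names.

```python
import math


def l2_norm(values):
    return math.sqrt(sum(v * v for v in values))


def clipped_lars_rate(weights, grads, trust=0.001, weight_decay=0.0,
                      lr_min=0.0, lr_max=1.0):
    """Layer-wise rate clamped to [lr_min, lr_max] (assumed formulation)."""
    w_norm = l2_norm(weights)
    denom = l2_norm(grads) + weight_decay * w_norm
    # A zero denominator would make the raw rate unbounded; treat it as +inf
    # so that the clamp below maps it to lr_max.
    raw = trust * w_norm / denom if denom > 0.0 else float("inf")
    return min(max(raw, lr_min), lr_max)


print(clipped_lars_rate([3.0, 4.0], [0.003, 0.004], lr_max=0.5))
print(clipped_lars_rate([3.0, 4.0], [300.0, 400.0], lr_min=1e-3))
```

Without the clamp, a very small gradient norm produces an arbitrarily large rate, which is the kind of instability clipping is meant to prevent.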

Reviewed By: pjh5

Differential Revision: D9020606

fbshipit-source-id: b579f1d628113c09366feac9406002f1ef4bd54f
2018-08-02 11:54:28 -07:00
Anshul Jain (B*8)
56974a06b5 Revert D8909766: [caffe2] Simplify order switch operators
Differential Revision:
D8909766

Original commit changeset: 17a302d5bf4a

fbshipit-source-id: 56c75a8ce27873ed1d5f194b9d6bf0049d8f21ba
2018-07-28 18:40:13 -07:00
Igor Milyakov
607688e928 Adding reciprocal operator and a test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9908

Differential Revision: D9035809

Pulled By: virtan

fbshipit-source-id: bce1db46fd55faeeab18a3b266d25c8beeb08df7
2018-07-27 18:24:43 -07:00
Igor Milyakov
12a1af3731 Adding conv tests with explicit algo definition
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9798

Differential Revision: D9034663

Pulled By: virtan

fbshipit-source-id: d722f25f1dd00231ccc3ad5960bbbef63af02c2d
2018-07-27 17:39:17 -07:00
root
c3fe071483 Update hip files (#9826)
Summary:
The goal of this PR is to update the hip files to reflect relevant changes in cuda source files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9826

Differential Revision: D9032840

Pulled By: bddppq

fbshipit-source-id: 504e55c46308eebfee3c9a7beea1f294fe03470f
2018-07-27 16:54:39 -07:00
Norman Mu
a532c1a48c Fix default argument value for CTCGreedyDecoder op (#9747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9747

Currently the ctc_greedy_decoder op initializes the `merge_repeated` argument only if it has been provided by the user. Change to initialize in all cases.
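
The effect of `merge_repeated` is easiest to see on a short label sequence. The sketch below is a plain-Python illustration of greedy CTC decoding over per-frame best labels, not the operator's code; the blank id of 0 and the default value of `True` are assumptions made for the example.

```python
def ctc_greedy_decode(frame_labels, blank=0, merge_repeated=True):
    """Collapse per-frame best labels into an output sequence.

    Blanks are always dropped; consecutive duplicates are dropped only
    when merge_repeated is True.
    """
    decoded = []
    previous = None
    for label in frame_labels:
        is_repeat = merge_repeated and label == previous
        if label != blank and not is_repeat:
            decoded.append(label)
        previous = label
    return decoded


frames = [1, 1, 0, 1, 2, 2]
print(ctc_greedy_decode(frames, merge_repeated=True))
print(ctc_greedy_decode(frames, merge_repeated=False))
```

Because the two settings give different outputs for the same input, leaving the argument uninitialized when the user omits it makes the result unpredictable.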

Reviewed By: houseroad

Differential Revision: D8963635

fbshipit-source-id: 18955c7c26a77d9d7f5137e4dec085252ffabfeb
2018-07-27 16:33:07 -07:00
Jongsoo Park
e7ab093d93 Simplify order switch operators (#9581)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9581

Mostly to simplify code. Should also improve performance but order switch ops
don't take much time anyway.

Reviewed By: viswanathgs

Differential Revision: D8909766

fbshipit-source-id: 17a302d5bf4aba2755d88223fc01a41fd72c5919
2018-07-26 18:24:29 -07:00
Junjie Bai
bdbbcf068a Temporarily disable test_unique on rocm since it keeps running into segfault (#9872)
Summary:
petrex

https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3758/console
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3757/console
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3752/console
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9872

Reviewed By: ezyang

Differential Revision: D9013335

Pulled By: bddppq

fbshipit-source-id: 80490a0fd4a86aa9c8454378c0edddc57d135c4e
2018-07-26 08:34:00 -07:00
Junjie Bai
997f46d1e1 Disable "filter too much" health check for fc operator tests (#9865)
Summary:
makes the CI flaky
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9865

Differential Revision: D9011882

Pulled By: bddppq

fbshipit-source-id: 5124ab97d258eed7585734d64fb01e5df98abd0d
2018-07-25 21:41:14 -07:00
Siddharth Goyal
4b61760738 Add Adadelta optimizer to caffe2 (#9088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9088

Closes https://github.com/pytorch/pytorch/pull/9088

- Added CPU/GPU implementations of Adadelta and SparseAdadelta.
- Added corresponding Python unittests

Reviewed By: BIT-silence

Differential Revision: D8712169

fbshipit-source-id: 544e99e13b230a919672a7341b3715d64597c0be
2018-07-24 20:09:21 -07:00
Kittipat Virochsiri
2b134c72e6 Add interface to provide blob types to shape&type inference (#9643)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9643

Current map interface assumes float data type, which is not always correct.

Reviewed By: kennyhorror

Differential Revision: D8455784

fbshipit-source-id: b94a31267760f7f97c15aa4b03008affc347fd10
2018-07-24 11:58:05 -07:00
Junjie Bai
7af5883860 Enable python tests on ROCM (#9616)
Summary:
petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9616

Differential Revision: D8960623

Pulled By: bddppq

fbshipit-source-id: bde93bda6230094e6bf4badd8ee79f0688ae1993
2018-07-24 11:37:58 -07:00
Xiaomeng Yang
5df3eae89e Add 1x1 specialization for conv with NCHW order (#9671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9671

Add 1x1 specialization for conv with NCHW order

Reviewed By: houseroad

Differential Revision: D8944686

fbshipit-source-id: 94bf44f69498b1934b7dfff4c0e989342c7bb61c
2018-07-23 18:54:58 -07:00
Norman Mu
ee2cc68259 Add ctc_beam_search_decoder op for caffe2 (#9622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9622

Implement a ctc_beam_search_decoder operator based on ctc_greedy_decoder.

Differential Revision: D8903100

fbshipit-source-id: 38973632cb437e5cfcb9ed3a48ed6b901c10efa3
2018-07-23 13:40:24 -07:00
Xiaomeng Yang
a01d6f01b5 Update channel_shuffle_op and transpose 2d to speed up ShuffleNet (#9525)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9525

Update channel_shuffle_op and transpose 2d to speed up ShuffleNet

Reviewed By: houseroad

Differential Revision: D8889361

fbshipit-source-id: 60196e819b6842becc53b4859b62d4419a0e2c6e
2018-07-21 12:54:33 -07:00
Zhaoheng Ni
a3a6ab60cd Fix the error in UnpackSegmentsOp when calculating the gradient with "max_length" argument (#9598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9598

The "max_length" argument should be passed to UnpackSegmentsOp if "max_length" was given when calling PackSegmentsOp.

Reviewed By: jerryzh168

Differential Revision: D8919799

fbshipit-source-id: 8c97aa717b69177b8a5d5d56892817d488853840
2018-07-20 11:09:34 -07:00
Zhishuai Zhang
6557856671 Fix l2 normalization when handling zero vector (#9594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9594

When the input vector is a zero vector, the previous GPU code produces NaN in the backward pass. This diff fixes that.
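
The NaN comes from dividing by a zero norm. One common way to avoid it, shown below as a sketch, is to clamp the norm to a small epsilon in both the forward and backward computations. This is an assumed technique for illustration and not necessarily the fix applied in this commit; the function names and the `eps` value are hypothetical.

```python
import math


def l2_normalize(x, eps=1e-12):
    """Forward pass: x / max(||x||, eps)."""
    norm = max(math.sqrt(sum(v * v for v in x)), eps)
    return [v / norm for v in x]


def l2_normalize_grad(x, grad_out, eps=1e-12):
    """Backward pass: (g - y * (y . g)) / max(||x||, eps), with y the output."""
    norm = max(math.sqrt(sum(v * v for v in x)), eps)
    y = [v / norm for v in x]
    dot = sum(g * yi for g, yi in zip(grad_out, y))
    return [(g - yi * dot) / norm for g, yi in zip(grad_out, y)]


# A zero input no longer produces NaN in either direction.
print(l2_normalize([0.0, 0.0]))
print(all(math.isfinite(g) for g in l2_normalize_grad([0.0, 0.0], [1.0, 1.0])))
```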

Reviewed By: pjh5

Differential Revision: D8849732

fbshipit-source-id: 87b1fb1ee05dfdb0d43bcbe67e36f15896fe1706
2018-07-19 14:10:03 -07:00
Xiaomeng Yang
ca3b36aa6a Add implementation for batch_moments_op (#9510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9510

Add implementation for batch_moments_op

Reviewed By: houseroad

Differential Revision: D8587654

fbshipit-source-id: d20f52cc8e900716c1057e68c147258dfda5245b
2018-07-18 11:59:54 -07:00
Viswanath Sivakumar
9235ff53f1 Clip horizontal bounding boxes during rotated detection for backward compatibility (#9403)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9403

In BBoxTransform and GenerateProposal ops, clip_boxes makes sure the bbox fits
within the images. For rotated boxes, this doesn't always make sense as there
could be multiple ways to clip a rotated box within an image boundary.
Moreover, clipping to a horizontal box means we leave out pixels of interest
potentially. Therefore, we clip only boxes with angle almost equal to 0 (with a
specified `angle_thresh` tolerance).
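
The rule above can be sketched as follows: a rotated box is clipped only when its angle is within `angle_thresh` of zero, and is otherwise returned unchanged. The `[center_x, center_y, width, height, angle]` layout, the continuous image bounds `[0, im_width]` and `[0, im_height]`, and the function name are assumptions for illustration; the operator's pixel conventions may differ.

```python
def clip_box_if_nearly_horizontal(box, im_height, im_width, angle_thresh=1.0):
    """Clip (cx, cy, w, h, angle) to the image only if |angle| <= angle_thresh."""
    cx, cy, w, h, angle = box
    if abs(angle) > angle_thresh:
        return box  # clearly rotated: leave untouched
    # Convert to corners, clamp to the image, convert back to center format.
    x1 = min(max(cx - w / 2.0, 0.0), float(im_width))
    x2 = min(max(cx + w / 2.0, 0.0), float(im_width))
    y1 = min(max(cy - h / 2.0, 0.0), float(im_height))
    y2 = min(max(cy + h / 2.0, 0.0), float(im_height))
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0, x2 - x1, y2 - y1, angle)


print(clip_box_if_nearly_horizontal((10.0, 10.0, 30.0, 10.0, 0.5), 100, 100))
print(clip_box_if_nearly_horizontal((10.0, 10.0, 30.0, 10.0, 45.0), 100, 100))
```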

Reviewed By: pjh5

Differential Revision: D8828588

fbshipit-source-id: 39c1eafdb5d39d383780faa0a47e76149145e50c
2018-07-16 20:24:49 -07:00
Mark Richardson
88146484b4 Add support for .norm() pytorch onnx export and ReduceL1/ReduceL2 caffe2 operators (#9299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9299

Onnx has ReduceL1 and ReduceL2 operators that would facilitate this, so allow pytorch to export those and allow caffe2 to run them.

I only implemented this on CPU so far.

Reviewed By: pjh5

Differential Revision: D8757381

fbshipit-source-id: 68afc9e2f90042a70929b73ace05a499b5c670c7
2018-07-14 10:54:13 -07:00
Zhaoheng Ni
5ac8a80f8b Add BatchBucketizeOp in caffe2 (#9385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9385

The operator transforms dense features into sparse features by bucketizing. Only the features listed in the indices tensor are transformed and output.

Reviewed By: bddppq

Differential Revision: D8820351

fbshipit-source-id: a66cae546b870c6b2982ac20641f198334f2e853
2018-07-13 20:39:30 -07:00
Jian Zhang
9e2f2cab94 Implementation and operator test for Wngrad optimizer (#8999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8999

Closes https://github.com/pytorch/pytorch/pull/8999

Implemented the Wngrad optimizer operator for the dense case (the base case as well as the case with additional outputs for effective learning rate and update value) and the sparse case.

Reviewed By: pjh5

Differential Revision: D8627933

fbshipit-source-id: a63cde46c04bcc6b428ab5f77a4b3b2beb66c046
2018-07-13 18:11:41 -07:00
Xiaomeng Yang
bb9ff58c6d Add cudnn activation ops (#9379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9379

Add cudnn activation ops

Reviewed By: houseroad

Differential Revision: D8818013

fbshipit-source-id: d3881c634a46578b9331da07f9fdf7e1f31d7e8a
2018-07-12 23:18:56 -07:00
Akshay Chalana
e30ff68410 Add Hardtanh Export (#8804)
Summary:
Added Hardtanh CPU/GPU implementations and backend tests to Caffe2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8804

Reviewed By: bddppq

Differential Revision: D8813987

Pulled By: houseroad

fbshipit-source-id: 2480296eab3373425b9e1734a10c009b4f5d3e26
2018-07-11 18:09:51 -07:00
Viswanath Sivakumar
c2dd90c40e Add angle normalization for rotated boxes (#9056)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9056

Closes https://github.com/pytorch/pytorch/pull/9056

Updates bbox_transform for rotated boxes with angle info to normalize the
predicted angle to be within [angle_bound_lo, angle_bound_hi] range.
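
One simple way to bring an angle into a bounded range is to wrap it by the width of the range. The sketch below assumes the period equals `angle_bound_hi - angle_bound_lo` and that the result lies in the half-open range `[angle_bound_lo, angle_bound_hi)`; the default bounds and the function name are hypothetical, and the operator may treat the endpoints differently.

```python
def normalize_angle(angle, angle_bound_lo=-90.0, angle_bound_hi=90.0):
    """Wrap angle into [angle_bound_lo, angle_bound_hi) (assumed convention)."""
    period = angle_bound_hi - angle_bound_lo
    # Python's % returns a non-negative result for a positive period.
    return (angle - angle_bound_lo) % period + angle_bound_lo


for a in (100.0, -100.0, 45.0):
    print(a, normalize_angle(a))
```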

Reviewed By: pjh5

Differential Revision: D8706240

fbshipit-source-id: f3ee834cf362736136e285f0f8f0c063af94a879
2018-07-11 11:25:54 -07:00
Viswanath Sivakumar
748a90d05b BBoxTransform op: Add support for rotated boxes (#8952)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8952

Closes https://github.com/pytorch/pytorch/pull/8952

Based on RRPN paper: https://arxiv.org/abs/1703.01086

Reviewed By: pjh5

Differential Revision: D8598547

fbshipit-source-id: 3699379df9bf45ed5bdd395175a0e26a77e079f7
2018-07-11 10:25:34 -07:00
Huamin Li
fb9f9c9ba2 Implement Sinh and Cosh (#9213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9213

Closes https://github.com/pytorch/pytorch/pull/9213

Added hyperbolic trig functions Sinh and Cosh

Reviewed By: BIT-silence

Differential Revision: D8752566

fbshipit-source-id: 5a58336a5153ec804404b9ac7b10b5662ede3cb7
2018-07-10 18:55:31 -07:00
Orion Reblitz-Richardson
936f47f271 Make roi_align_rotated_op_test not rely on 1.12.0 numpy.rot90 (#9267)
Summary:
Breaking this out of https://github.com/pytorch/pytorch/pull/8338

Use a local version of `np.rot90` with an `axes` argument, since we don't have NumPy 1.12.0 in all of the test environments. Caffe2 conda2-ubuntu16.04, for example, fails. Generally, it seems better to not require a NumPy bump just for this test.

cc mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9267

Reviewed By: mingzhe09088

Differential Revision: D8767819

Pulled By: orionr

fbshipit-source-id: c51a6295d58366eba06e4e55e3f1ffaa8af96975
2018-07-09 11:55:39 -07:00
Zhaoheng Ni
f87499a8f3 Modify the original PackSegments operator by adding "max_length" argument (#9048)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9048

max_length argument helps fix the shape of the output to be N * max_length * D, where N is the batch_size, D is the feature_dim.
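
The shape effect of `max_length` can be illustrated with nested lists. This is a plain-Python sketch rather than the operator itself: the padding value, the choice to raise an error when a segment is longer than `max_length`, and the function name are assumptions made for the example.

```python
def pack_segments(lengths, data, max_length=None, pad_value=0.0):
    """Pack rows of `data` into N segments, each padded to a common length.

    With max_length set, the output is always N x max_length x D;
    otherwise the longest segment determines the middle dimension.
    """
    width = max_length if max_length is not None else max(lengths, default=0)
    feature_dim = len(data[0]) if data else 0
    packed, start = [], 0
    for length in lengths:
        if length > width:
            raise ValueError("segment longer than max_length")
        rows = [list(row) for row in data[start:start + length]]
        rows += [[pad_value] * feature_dim for _ in range(width - length)]
        packed.append(rows)
        start += length
    return packed


data = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
out = pack_segments([1, 2], data, max_length=3)
print(len(out), len(out[0]), len(out[0][0]))
```

A fixed middle dimension is useful when downstream operators expect the same output shape for every batch.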

Reviewed By: bddppq

Differential Revision: D8702782

fbshipit-source-id: e30555608fee1c4a61cc95922f4a71c7f54903af
2018-07-06 14:33:59 -07:00
Xiaomeng Yang
21c420c32c Remove unused RowwiseArgMaxOp (#9119)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9119

Remove unused RowwiseArgMaxOp

Reviewed By: houseroad

Differential Revision: D8719826

fbshipit-source-id: 57d78c8b93bc94a4634d806c7c2041f8c18678a5
2018-07-05 15:25:28 -07:00
Yan Zhu
8364470e5c fix empty batch for softmax (#9075)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9075

as title

Reviewed By: QueryConnectionException

Differential Revision: D8710616

fbshipit-source-id: ca505e1a733cc24db9e2ab83a5395c64fa8360c4
2018-07-01 16:40:14 -07:00
Xiaomeng Yang
03e7953a98 Use FixedDivisor in Reduce and Broadcast CUDA kernels (#9072)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9072

Use FixedDivisor in Reduce and Broadcast CUDA kernels

Reviewed By: houseroad

Differential Revision: D8710243

fbshipit-source-id: 6f1da12234898594a1be8c979d942aa515832aeb
2018-07-01 00:25:34 -07:00
Yan Zhu
b07ea04e23 empty batch for spatialBN (#8933)
Summary:
Closes https://github.com/pytorch/pytorch/pull/8933

The spatialBN implementation cannot handle an empty batch; this diff enables the zero-batch setting:

During training, when batch_size = 0:
in forward, the output's saved_mean and saved_var are zeros.
in backward, the gradients for SCALE_GRAD and BIAS_GRAD are zeros.

Reviewed By: pjh5

Differential Revision: D8644699

fbshipit-source-id: 599ea687329d68699c987e05f56f409f4e729d1c
2018-06-29 18:40:41 -07:00
Xiaomeng Yang
838fdd6f99 Add Cube and Cbrt Ops (#8991)
Summary:
Closes https://github.com/pytorch/pytorch/pull/8991

Add Cube and Cbrt Ops

Reviewed By: houseroad

Differential Revision: D8678848

fbshipit-source-id: 051dd475e45ad9f1d11a8b32ae3acd1f7459b930
2018-06-28 14:55:30 -07:00
Xiaomeng Yang
93cc7d1923 Add in_place test for binary ops
Summary: Closes https://github.com/pytorch/pytorch/pull/8973

Reviewed By: houseroad

Differential Revision: D8674216

Pulled By: BIT-silence

fbshipit-source-id: bde1ff7b47dbc8a48d1ff72b345c767af698a09b
2018-06-28 11:45:35 -07:00
Mingzhe Li
c4744cfafa bilinear upsample operator on CPU
Summary: Add support for bilinear upsample operator on CPU.

Reviewed By: BIT-silence

Differential Revision: D7853215

fbshipit-source-id: 9043c95f9eb4e1f6df324e8f7a4e8fdb0c758f66
2018-06-27 10:12:06 -07:00
Orion Reblitz-Richardson
edb88b5f3a
Update from Facebook (#8887)
* add opencl + fpga context

adds an opencl context inside caffe2/fb which can be used for fpga access

* [Caffe2] Force tensor inference checks to be triggered during testing

We've started to rely on TensorInference functions more for different analysis.  This diff ensures that the TensorInference function's result matches what is expected from the definition of the operator.

* Enable building //caffe2:torch with @mode/opt

In @mode/opt, python runs out of a PAR, which breaks a lot of
assumptions in the code about where templates/ folders live relative
to __file__. Rather than introduce hacks with parutil, I simply turn
template_path into a parameter for all the relevant functions and
thread it through from the top level.

* [Caffe2] Fix cost models for DotProduct and Div.  Update Tensor Inference for dot product

As title.  DotProduct states that output is a 1-D tensor (https://caffe2.ai/docs/operators-catalogue.html#dotproduct) though code suggests it is either 0- or 1-D depending on inputs.  TensorInference defined to support implementation.

* [SG-MoE] Add an option to make the experts NOT as components

* [nomnigraph] Rename and fixup convertToNeuralNetOperator API

This will make things a bit cleaner

* no longer symlink THNN.h and THCUNN.h

* forced decoder network (onnx export)

Closes https://github.com/pytorch/translate/pull/95

Add networks in ensemble_export.py to create a forced decoding network from PyTorch NMT checkpoints. This network takes an arbitrary numberized (source, target) pair and returns the model score for the translation, including penalties.

Vocabulary reduction networks are also supported, but note that target indices which are not in the possible_translation_tokens generated for the source input will be trea

* Revert schema change to fix production models

Revert schema change to fix production models

* MockLogDeviceReader - rebase on FIX

# Goal

1), Build a make_mock_log_device_reader using make_mock_reader

2), Replace the real log_device_reader here: https://fburl.com/raihwf1p

# Log by D8151734

Real log_device_reader:
```
I0529 20:29:05.373108 954994 tensor.h:839] Tensor print_net/log of type std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >. Dims: (): read_net/ParseOpenTrainingRow:0
I0529 20:29:05.373244 954994 tensor.h:839] Tensor read_net/ParseOpenTrainin

* [C2/D2][1/n]: Nonnegative-Constrained Optimization -- log barrier

implement log barrier as a regularization method

* Add teacher weight screening.

Add teacher weight screening according to teacher labels. If the teacher label is zero, we do not use the distill loss in the objective function.

* Add NormalizerContext

See task for more detail. This implementation is a copy of what exists for RegularizerContext except for how the parameters are defined in the model_definition thrift file.

I'll try an alternative implementation which overrides the default arguments of functions instead like for argscopes in tensorflow.

https://github.com/pytorch/pytorch/compare/master...MaximeBoucher:update-from-facebook-0939578c068c?expand=1

* Adding cosine similarity option in dot processor

Add pairwise cosine similarity option in dot product.
Add an option to concatenate dot product and cosine similarity.
Add test cases.

* [nomnigraph][redo] Concat elim for sparseNN

Same as D7962948, which was reverted because Operator Schema was not
defined

* [pytorch] Revert pytorch/pytorch#7918 'Release GIL when copying to shared memory', breaks ASAN

Revert this pytorch diff that breaks ASAN when running Filament in dev mode; in opt mode it gives "bad file descriptor" errors. Looks like a race when copying tensors to shared memory in multiple mp.Queue's (which spawn separate threads).

https://github.com/pytorch/pytorch/pull/7918/files

* [nomnigraph][mobile] Enable nomnigraph by default, use -Oz on nomnigraph related code to reduce code size

enables nomnigraph and reduces codesize

* [Warmup] Allow both offline incremental training and online training

Change the plan name on the saving side and the reading side to support both training types

This diff depends on D8128530 and D8168651.

* Revert D7802642: [Warmup] Allow both offline incremental training and online training

This reverts commit afc213cf9b36cecf75333a788391c4d09f4afccc

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Add legacy grad logic to fix div op on old graphs.

Add legacy grad logic to fix div op on old graphs.

* Correctly propagate operator failures

Propagate errors from operators that throw exceptions and return false

* Revert D8374829: [caffe2][nomnigraph][redo] Concat elim for sparseNN

This reverts commit 6dda028c463e54bb5c32188bbbe9202107e188a5

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [Caffe2] Added extra_info to core.DeviceOption(), enforced extra_info to be inherited in scope.DeviceScope

extra_info is a newly defined field in the DeviceOption proto. This diff adds extra_info to core.DeviceOption(). In scope.DeviceScope(), this diff also enforces that the new scope inherits the extra_info from the old scope.

* [opt] hgdirsync wasn't enabled, merge diverged code

Here's the damage, P59732616 basically xplat was left behind but had
the change from assert to CAFFE_ENFORCE

* OMP parallelism over RoIs for RoIAlign op

Simpler to parallelize over RoIs. Shouldn't affect other uses as it relies on
the number of OMP threads set during startup.

PR: https://github.com/pytorch/pytorch/pull/8562

* Use int64_t for shape in FillOps

to avoid overflow of int32

* Implement Rotated RoIAlign op

Based on Rotated RPNs as explained in https://arxiv.org/abs/1703.01086.
The idea is simple - orientation/angle is added as an RPN
anchor parameter and then the angle is further regressed similar to bbox
coords. There are some additional changes related to NMS and IoU, but besides
that it's a direct extension to Faster-RCNN. Further details in https://fb.quip.com/sZHlA1iMfWPZ.

RoIs are represented in [center_x, center_y, width, height, angle] format.
`angle` repre

* Rotated RoIAlign op CUDA forward implementation

CUDA forward impl for D8415490

* RoIAlignRotated op CUDA backward pass implementation

TSIA

* All remaining fixes to eliminate process_github.sh

Most of this diff has already been reviewed separately, except for the parts relating to _thnn/utils.py and _utils._internal.py

remove skipIf(True, 'Fbcode') line from process_github.sh

replace sed of cpp file with #ifdef to control cudnnDestroy use

undo sync-time deletion of .gitattributes, remove process_github.sh

switch to using _utils._internal rather than try-import-except

This diff also fixes the open-source bug where rebuilds have

* Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"

Original commit changeset: 7707d2efe60e The original diff was backed out because the online trainer package was backed out. This code only works with the new online trainer package

* [easy] improve error log in adagrad op

as title

* re-allow use of thnn_h_path

This fixes cffi usage in OSS

* [4/4] [tum] parallelizing layerNorm for GPU full sync

as title

* add compile=False to pytorch tests, remove hack with pyc

* Add shape and type inference for RowWiseArgMax operator

See title

* Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"

This reverts commit 78167eeef0af16b60f72c82f9dcdda9b41b4dcbd

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [fix-flaky-test] mock_hive_reader_test flaky, because GlobalCounter collects local counts intervally

# Problem

`MockHiveReader` uses `GlobalCounter` to limit `max_examples`.

GlobalCounter on server node collect local counts from worker nodes every 1 sec.

This 1 sec delay makes it impossible to limit exactly to the `max_examples`, it will definitely exceed `max_examples`.

# Plan

Given,
```
Expected num_examples = max_examples + num_examples/sec (Read Speed) x 1 sec (GlobalCounter Sync Int

* [Caffe2] Fix FCGradient cost inference.  Prevent overflow in cost inference

FCGradient missed a factor 2 in the `num_outputs == 3` case.  Overflow was occurring with flop calculation for FC.  Changed types to `uint64_t` to prevent future problems.

* Fix binary ops with empty inputs

Fix binary ops with empty inputs

* Support the filling of input blob with provided data

as title for Biz Integrity case

* Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training""

Original commit changeset: 30c55dd38816 Original diff is reverted due to introducing bad integration test. Fixed the integration test.

* [c2][easy] improve pack ops error loggings

as desc.

* Add ShapeTypeInference for LpNorm operator

As desc

* Shard test_nn to reduce runtime for each test target

Closes https://github.com/pytorch/pytorch/pull/8793

The current test_nn would time out and be disabled in GreenWarden, and we need to have an option to split it up in order to pass the stress test. Right now GreenWarden roughly allows running 100 test cases in test_nn before timing out, and here we have an option to divide test_nn into 30 shards (with ~40 tests in each shard) to allow for some test suite growth in the future.

* Change default caffe2_streams_per_gpu to 1

* Remove IN_SANDCASTLE from common.py and test_nn.py

We prefer to disable the failing tests through Sandcastle UI instead.

* Add a new class for an updated prof_dag.proto

This diff contains:
- An updated prof_dag.proto that contains blob profiles.
- A class to deserialize this information (serialization is in a follow up diff)
- Update to separate profiling information from NeuralNet (and use it as part of the class above).
- Unit tests

* Lambdarank for SparseNN

This diff adds a lambda_rank_layer for SparseNN.
Changes include:
1) Adds support for multi sessions in c2 op
2) Adds support for two different loss functions in c2 op
3) Unit tests for op

* Revert D8586950: Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training""

This reverts commit 012220ed63eccc35659a57b31d16a3625da6317b

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [easy] A few fixups to multithread predictor benchmark

(1) support perf on T6 server
(2) remove dead code

* fix a bug about the map size

as title

* Fix reduce sum on in-place case.

Fix reduce sum on in-place case.

* [Warmup] Reland reverted diff Allow both offline incremental training and online training

Closes https://github.com/pytorch/pytorch/pull/8827

fix net transform integration test. Allow offline and online trainer to coexist D7802642.

* Add StoreHandlerNotAvailableException

Add an exception for a store that is not available or has been
deleted.

* Use exception handling for fault tolerance, missing KV store

Remove status blobs to communication ops so that exceptions propagate on
failure.

* [C2/D2][2/n]: Nonnegative-Constrained Optimization -- bounded grad proj

for simple bounded constrained optimization, incl non-negative box constraints.

* [GanH]: Adaptive Weighting with More Estimations

With positivity optimization implemented, we now learn adaptive weights with different
parameterizations.

This improves parameter estimation and training stability.

* Revert some changes for landing

* Remove AutoNoGIL in StorageSharing

* Temporarily disable net_tests

* Revert "[Caffe2] Force tensor inference checks to be triggered during testing"

This reverts commit 67ef05c22b2f71b4a489695384932f968384a2a4.

* Revert "Fix reduce sum on in-place case."

This reverts commit 6cb8a8e1b3db7b6d20941b0053e3f3836068eb64.

* Revert "Revert "Fix reduce sum on in-place case.""

This reverts commit 130a257c0893dc09f4bd6e6a45d112261807fd2c.
2018-06-26 14:55:48 -07:00
Xiaomeng Yang
288d37998a
[Caffe2] Fix gradient_check on in-place ops (#8828)
* Fix gradient_check on in-place ops

* Fix hsm_test

* Fix SplitByLengthOp test

* Fix input_device_options for gradient_checker

* Fix hypothesis_test_util.py
2018-06-25 15:25:56 -07:00
zrphercule
c44c95fd0b New operator 'expand' (#8263)
* operator 'expand'

* updated operator with a simple testcase

* Revert "updated operator with a simple testcase"

This reverts commit 1ce9f8ac567b525677254b0dce5735d7fea133d7.

* updated operator with a simple testcase

* expand operator with a passed testcase

* typo

* GPU full support added

* GPU support testing...

* GPU full supported

* formatted

* nits repaired

* gpu parameters fixed

* Expander removed

* nits fixed, document added

* formatted

* new testcases added & nits repaired
2018-06-18 16:33:47 -07:00
sf-wind
5b86c3af4a
Update from facebook (#8384)
* [fix] fixup the bias multiplier data access issue

Hotfix for failures in conv_transpose

* [D2][Easy]: lint regularizer

lint with black

* [GanH]: Split mu in adaptive weight for diagnose

* [Dper] Add the ability to split FC weights into multiple smaller ones

* fix SumReduceLikeOp for empty blob

as desc.

* add ctc_greedy_decoder for caffe2

ctc_greedy_decoder same as tf's

* Update event callback handling

Allow multiple callbacks per event

* Add WeightedSum layer

The motivation is to do weighted sum in HoNet/crossnet, in the next diff, I'll replace model.Add with model.WeightedSum in
honet: https://fburl.com/f4rmolg2
crossnet: https://fburl.com/v7awn8se, https://fburl.com/63filbnm

* Replicate DAG's behavior

Some callers expect RunAsync to block, replicate that behavior in case of
explicit 'dag' net type

* [dper] layernorm layer

as title

* Override dag, async_dag, async_polling

Overriding dag, async_dag and async_polling with async_scheduling

* Name the thread pools

Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead.
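As an illustration only (this is not the Caffe2 C++ change), the same idea of giving pool threads an explicit name can be sketched in Python with the standard `ThreadPoolExecutor`. The prefix `caffe2_pool` and the helper `current_thread_name` are hypothetical names chosen for this example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def current_thread_name() -> str:
    # Report the name of whichever worker thread runs this task.
    return threading.current_thread().name


# "caffe2_pool" is a hypothetical prefix; workers are named <prefix>_<n>.
with ThreadPoolExecutor(max_workers=2, thread_name_prefix="caffe2_pool") as pool:
    names = [pool.submit(current_thread_name).result() for _ in range(4)]

print(sorted(set(names)))
```

With an explicit prefix, log lines and debugger thread lists identify the pool that owns a thread rather than whatever started it.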

* [Caffe2] FillerOp should support int64_t dimensions

Change argument type to int64_t for shape argument of FillerOp (used in ConstantFill, XavierFill, etc)

* Remove caffe2/caffe2/contrib/torch/

It's not used anywhere and depends on old Lua Torch, which conflicts with ATen. Given PT1 it's not relevant any more (though it was nice and clever code!)

#accept2ship

* Fix linearWarmup multiplier check

The multiplier needs to be non-negative, not strictly positive.

* Revert D3314316

This is after 2 years and we do not seem to have a use case for this one, so
for the sake of clean API design we should potentially remove this. This would
allow us to potentially pass in arguments to optionally construct an object,
although it is indeed a little bit unclear how we can reuse existing objects if
constructor arguments are passed in. In any case, we may want to remove this
dangling feature.

* Speedup generate proposals by partial_sort.

Speedup generate proposals by partial_sort.

FACEBOOK:
- Saw speed improvement for training with this op.
- Yanghan benchmarked the op on a small dataset and saw a consistent 100% improvement in speed (6ms -> 3ms) at 420 input resolution. See next diff for details.
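A minimal sketch of why a partial sort helps when only the top-scoring proposals are needed. Python has no `std::partial_sort`, so `heapq.nlargest` stands in for it here; the function name `top_k_indices` and the sample scores are assumptions for illustration, not code from the op.

```python
import heapq


def top_k_indices(scores, k):
    """Return the indices of the k highest scores, best first.

    Only k candidates are tracked at a time, so the whole list
    does not have to be fully sorted.
    """
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)


scores = [0.1, 0.9, 0.4, 0.7, 0.3]  # made-up proposal scores
print(top_k_indices(scores, 2))
```

A full sort costs O(n log n) regardless of how many proposals are kept, while selecting the top k costs roughly O(n log k), which matters when k is much smaller than n.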

* More parallel processing friendly for CPP version of GenerateProposals.

More parallel processing friendly for CPP version of GenerateProposals.

* [DT] [43/n] Lift stop conditions inside reader code back to flow control

1. Split multi_reader function into local_reader and remote_reader
2. Lifted stop conditions inside Limiter back to flow control
3. Split epoch flow building logic into 3 cases:
  - single machine (1 reader, 1 trainer on trainer0 node, no PS)
  - (1 reader + 1 trainer) on trainer0 node, has PS
  - multiple readers, readers do not share nodes with trainers, might have PS or not

* Resolve conflicts for torch/_thnn/utils.py

* [Caffe2] Handle image decoding errors

Image decoding errors can make the whole training fail. This diff handles them:
1. Catch imdecode exceptions and check whether the decoded image has zero columns or rows; both count as decoding errors.
2. Replace the image with an empty one in case of error.
3. Count the number of errors and throw a runtime exception if the rate reaches a given number.

The empty image data is kept. It might introduce noise in the training data.
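The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Caffe2 implementation: `DecodeErrorTracker`, `decode_or_empty`, and the `decode` callable are hypothetical, images are modelled as lists of rows, and the "given number" is assumed to be a fraction of all images seen so far.

```python
class DecodeErrorTracker:
    """Counts decode attempts and failures (hypothetical helper)."""

    def __init__(self, max_error_rate):
        self.max_error_rate = max_error_rate  # assumed to be a fraction in (0, 1]
        self.total = 0
        self.errors = 0


def decode_or_empty(raw, decode, tracker):
    """Decode one image; on failure return an empty image instead."""
    tracker.total += 1
    try:
        image = decode(raw)
        # A result with zero rows or zero columns counts as a decoding error.
        if len(image) == 0 or len(image[0]) == 0:
            raise ValueError("decoded image is empty")
    except Exception:
        tracker.errors += 1
        if tracker.errors / tracker.total >= tracker.max_error_rate:
            raise RuntimeError("too many image decoding errors")
        return []  # empty placeholder image is kept in the batch
    return image
```

Because the placeholder stays in the data, a caller that cares about the noise mentioned above would need to filter empty images itself.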

* Update MKL exporter to IDEEP ops

TSIA

* [Caffe2] GlobalInit is thread safe, fixing the comment

With the mutex and lock, GlobalInit is thread safe.
Update the comments.

* Back out "Add support for generating ATen files during fbcode build"

Original commit changeset: 28970ddba353

@override-unit-failures
(Note: this ignores all push blocking failures!)

* [DT]: fix predictor save

similar to D6610058, here we add the fix for distributed online training

* Remove net_singlethread_async_gpu.cc

Closes https://github.com/caffe2/caffe2/pull/2528

This removes net_singlethread_async_gpu.cc as part of our effort to clean
CUDAContext and the net executors.

* Inline DFS task execution

Add a DFS inline task execution mode in executor

* Add c10 folder to fbcode

This adds the c10 folder and its test cases to fbcode. Build flags are mostly taken from aten.

* add dependencies for online trainer

Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators

Relevant post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/

* Resolve conflicts for tools/jit/gen_jit_dispatch.py

* [Fix] sparse regularization in distributed training

* Support advanced pooling options in sum processor

* support advanced pooling options in sum processor
* remove redundant code
* support attention in sum processor

* Improve shard logging in net tracing code

Make it handle arbitrary shard ids instead of just one digit ids.

* [Caffe2] Call GlobalInit in predictor only in mobile

FACEBOOK:
Calling GlobalInit long after the program starts may not be safe. There are issues if the following happens:

User does not call GlobalInit and initFacebook after program starts
User sets a flag manually: https://fburl.com/mcsumw7d
User calls OSS predictor.
OSS predictor calls GlobalInit
GlobalInit calls initFacebook
initFacebook resets all flags: https://fburl.com/tolszha1
Thus, the flags the user set manually are overwritten

This would happen anytime GlobalInit is called long after the program starts.
I suppose the intention of the user in this case is not to call GlobalInit throughout the program,
but use Caffe2 regardless (is that desired?)
But adding GlobalInit in the OSS predictor would automatically call GlobalInit when using Caffe2.

This issue doesn't exist in mobile, since initFacebook is not called on mobile.

For now, guard the GlobalInit in predictor for mobile only.
May want to ensure the GlobalInit is always called at the start of the program. @[3501714:kutta] has seen weird issues when not calling GlobalInit at the start of the program on server side. He has made some progress on this.

* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py

* Add empty fix for SumLikeReduceOp

Add empty fix for SumLikeReduceOp

* Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN

This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Remove Declarations.yaml

* Include common.h

* Change std::stoi to caffe2::stoi

* Add thread_name.cc to the CMake file

* No need to subtract 1. Fix test segfaults

* Fix NetTest, ObserverTest

Fix tests

(cherry picked from commit 3767e66c3f365596cba3d46d3e7322c933a0ab41)

* CTCGreedyDecoderOp only has CPU implementation, test should only run on CPU

* Add a variable to avoid conversion resizing issue

* Remove the code per soumith's comments

* Remove the code per soumith's comments

* Remove blank lines in the end of file

* [caffe2] upgrade IDEEP and hotfix for conv op accuracy issue (#8364)

* [IDEEP] Upgrade IDEEP version

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* [IDEEP] Fix accuracy issue in conv op

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix build error due to lack of src in CMakeLists

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Remove the code per soumith's comments

* [ONNX] Add an ATen fallback pathway for ONNX export (#8273)

* ATen fallback for ONNX export

* Move to enum

* Fix model test

* Add comment

* Address comments

BC interface

* Remove imaginary file (#8415)

* [Caffe2] Enable AMD/MIOPEN ops for Caffe2  (#8306)

* Add hip support for caffe2 core

* Add MIOPEN header/wrapper to caffe2 core

* Add HIP device into caffe2 PB

* top level makefile change for rocm/hip

* makefile scaffolding for AMD/RocM/HIP

* Makefile scaffolding for AMD/RocM/HIP; add makefile/utility for HIP files

* caffe2 PB update for AMD/ROCM HIP device

* Add AMD/RocM/Thrust dependency

* HIP threadpool update

* Fix makefile macro

* makefile fix: duplicate test/binary name

* makefile clean-up

* makefile clean-up

* add HIP operator registry

* add utilities for hip device

* Add USE_HIP to config summary

* makefile fix for BUILD_TEST

* merge latest

* Fix indentation

* code clean-up

* Guard builds without HIP and use the same cmake script as PyTorch to find HIP

* Setup rocm environment variables in build.sh (ideally should be done in the docker images)

* setup locale

* set HIP_PLATFORM

* Revert "set HIP_PLATFORM"

This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.

* continue the build script environment variables mess

* HCC_AMDGPU_TARGET

* Cleanup the mess, has been fixed in the latest docker images

* Assign protobuf field hip_gpu_id a new field number for backward compatibility

* change name to avoid conflict

* Fix duplicated thread pool flag

* Refactor cmake files to not add hip includes and libs globally

* Fix the wrong usage of environment variables detection in cmake

* Add MIOPEN CNN operators

* Revert "Add MIOPEN CNN operators"

This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.

* Add MIOPEN pooling operator

* Add MIOPEN activation operator

* Add MIOPEN softmax operator

* Add MIOPEN spatial batch norm operator

* Add MIOPEN local response normalization operator

* Add MIOPEN conv operator

* Clean-up LRN ops

* enable fp16 in MIOPEN pool ops

* Enable fp16 for MIOPEN relu op

* Enable fp16 for MIOPEN spatial batch norm op

* code clean-up

* revert float16 support

* Create Caffe2 python binding for AMD/ROCM/HIP

* Add op fallback for HIP operator

* add hip src/test files in cmake

* exclude hip src/test files

* fix python binding for hip backend

* fix MIOPEN pooling op workspace

* hack to compile miopen operators

* fix include path for MIOPEN ops

* Fix include path

* Add HIP math utilities

* Fix path for HIP math utils

* cmake fix

* Cmake fix / hipcc for hip files

* suppress hipcc warning

* cmake fix / replace USE_HIP with USE_ROCM

* revert LoadHIP.cmake change

* fix include for thrust/cub-hip

* include path fix for conversion.h

* Updated with latest upstream changes

* clang format fixes

* Context_hip updates

* Fixed typo in rocblas handle get function

* Updated hipified math utils

* Updated math hip test util

* Updated context hip test

* Updated common_hip

* Updated net async dag for HIP

* Added MIOPEN in operator hip test

* fix

* C2 dependencies clean-up

* fix include path for building custom protobuf

* Decouple miopen pool op and conv_pool_op base

* cmake refactor

* fix operator_hip_test

* move all hip/miopen ops files into caffe2/operators/hip

* sanitize cmake

* permission issue

* remove extra parenthesis

* remove artifact from resolving merge conflict

* cont. sanitize cmake files

* fix syntax error

* sanitize conversion.h

* .

* Revert "."

This reverts commit 56020cb0e996a31ae27bf1f8f491955ed0b121b9.

* clang-format

* Enable some reduce operators' ONNX backend tests (#8418)

* fix old comment to point to the right file (#8416)

* Stop pinning nccl version. (#8421)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Expose logsumexp docs and mark log_sum_exp in distributions for internal use (#8428)

* Enable some of the ONNX backend test on broadcasting (#8423)

* Enable some of the ONNX backend test on broadcasting

* enable gemm broadcast

* Expose proto utils and ONNX (#8073)

* Expose proto utils and ONNX from PyTorch libcaffe2.so

* Try to use protobuf from _C.so

* Fix ONNX proto header include

* Adjust order of imports for ONNX until nanopb goes away

* Set and use ONNX_NAMESPACE for PyTorch builds

* Show protobuf summary for all builds

* Add ONNX_NAMESPACE for cpp_build

* Statically link libprotobuf.a into libtorch.so

* Set ONNX_NAMESPACE on Windows build

* Move core/dispatch up as well

* Add /MD flag for Windows build of _C

* Potential Windows fix for ONNX and protobuf

* Add direct linkage from _C to ONNX on Windows

* Only include protobuf wrapper for PyTorch

* Pass extra_compile_args to _nvrtc ext build

* Remove installation of .a files

* Rebase creates some weird situations, revert them manually

* Remove more weird changes due to rebase

* Need to add thread_name.cc after merge
2018-06-13 13:10:45 -07:00
Xiaomeng Yang
44973a06ba
Add affine_channel_op (#8356)
Add affine_channel_op
2018-06-11 20:51:11 -07:00