Commit Graph

33 Commits

Author SHA1 Message Date
Chonglin Sun
8a85d6bd34 support vectors with different dims for DotProductOp.
Summary:
Add two arguments to the DotProductOp operator: `force_same_dim` (1 if we want
DotProductOp to accept only two tensors with equal dimensions, 0 otherwise) and
`pad_value` (only used when force_same_dim = 0; the tensor with the smaller
size is padded to the same size as the other one).
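
A rough sketch of how the new arguments could be exercised from the Python API (the registered operator name "DotProduct", the blob names, and the feed/fetch calls are assumptions of this sketch, not part of the diff):

    import numpy as np
    from caffe2.python import core, workspace

    # Two vectors of different lengths; with force_same_dim=0 the shorter one
    # should be padded with pad_value before the dot product (per the summary above).
    workspace.FeedBlob("X", np.array([1.0, 2.0, 3.0], dtype=np.float32))
    workspace.FeedBlob("Y", np.array([4.0, 5.0], dtype=np.float32))

    op = core.CreateOperator(
        "DotProduct", ["X", "Y"], ["Z"],
        force_same_dim=0,  # accept tensors whose dimensions differ
        pad_value=0.0,     # value used to pad the smaller tensor
    )
    workspace.RunOperatorOnce(op)
    print(workspace.FetchBlob("Z"))  # expected 1*4 + 2*5 + 3*0 = 14, assuming zero padding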

Differential Revision: D4502619

fbshipit-source-id: 46f7da710c6f6365f76a7af6234c34c7f656be62
2017-02-23 11:09:07 -08:00
James Cross
63901e9aca allow recurrent network gradient op to receive gradient on any combination of network output blobs
Summary:
(Caffe2) Modified the RecurrentNetworkGradient operator so that training is possible with any of the output blob(s) receiving gradient during the backward pass. This is realized through a new argument for the RecurrentNetwork op, outputs_with_grads, which takes a list of the indices of the output blobs that will receive gradient. The default behavior (receiving gradient only from the first output blob) is unchanged.

A new unit test covers the case outputs_with_grads = [1, 2] using the Python LSTM wrapper.
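
A hedged fragment showing only how the new argument attaches to the op; the blob names are placeholders, and the many other required RecurrentNetwork arguments (step net, links, aliases) are deliberately omitted:

    from caffe2.python import core

    # Placeholder blob names; a real RecurrentNetwork op also needs its step-net,
    # link, and alias arguments, which this fragment leaves out.
    op = core.CreateOperator(
        "RecurrentNetwork",
        ["input_sequence", "seq_lengths", "hidden_init", "cell_init"],
        ["output_sequence", "hidden_last", "cell_last"],
        outputs_with_grads=[1, 2],  # only outputs 1 and 2 receive gradient in the backward pass
    )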

Reviewed By: urikz

Differential Revision: D4518516

fbshipit-source-id: 5c531582b20f3cf727d1aa91239b4d5a2b8a7c1f
2017-02-15 16:00:45 -08:00
Huazhong Ning
cb3c41b9a9 PiecewiseLinearTransformOp transform binary predictions specially
Summary:
The existing op transforms the input in a general way. It needs M transform mappings to transform an NxM input tensor.
But for binary predictions X (an Nx2 tensor), we know that X[:, 0] = 1 - X[:, 1].
So we just need one mapping for X[:, 1]; after it is transformed, we can compute X[:, 0].
This diff handles that case.
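
A NumPy sketch of the shortcut described above (the single mapping shown here is a made-up one-piece linear function, just for illustration):

    import numpy as np

    # Binary predictions: each row is [P(class 0), P(class 1)], columns summing to 1.
    X = np.array([[0.8, 0.2],
                  [0.3, 0.7]], dtype=np.float32)

    transform = lambda p: 0.5 * p + 0.1   # hypothetical single mapping for column 1

    y1 = transform(X[:, 1])               # transform P(class 1) with the one mapping
    y0 = 1.0 - y1                         # recover column 0 from X[:, 0] = 1 - X[:, 1]
    X_transformed = np.stack([y0, y1], axis=1)
    print(X_transformed)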

Differential Revision: D4550441

fbshipit-source-id: 42d8c6e88d830c97628ee930b543740a32acf904
2017-02-15 16:00:44 -08:00
Huazhong Ning
ed0024a82c SparseToDenseOp and GatherDense
Summary:
1. The existing Gather op outputs gradients in sparse format. We add GatherDense that does the same thing
   as Gather but outputs gradients in dense format. This relies on the SparseToDenseOp.
2. SparseToDenseOp converts a sparse representation (indices, values) into a dense format (missing values are
   filled with zeros); see the sketch below. There is an existing SparseToDenseMaskOp, but it is mainly for converting
   sparse features into dense format, and modifying it to achieve our purpose is too complicated and messy, so it is
   better to create a new one.
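
A NumPy sketch of the SparseToDense semantics described in point 2 (the output height would normally come from an argument or a reference blob; here it is hard-coded as an assumption):

    import numpy as np

    # Sparse representation: values[k] belongs at row indices[k] of the dense output.
    indices = np.array([0, 3, 1], dtype=np.int32)
    values = np.array([[1.0, 1.0],
                       [2.0, 2.0],
                       [3.0, 3.0]], dtype=np.float32)

    dense = np.zeros((5, values.shape[1]), dtype=np.float32)  # rows not indexed stay zero
    np.add.at(dense, indices, values)  # duplicate indices accumulate in this sketch
    print(dense)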

Reviewed By: dzhulgakov

Differential Revision: D4508879

fbshipit-source-id: f4a50efa1c08586d94040f93195661c41cd414da
2017-02-09 13:33:06 -08:00
Yangqing Jia
274ac2b590 Add cmake guard for python, build for tegra X1
Summary:
In short: cmake is lovely.
Closes https://github.com/caffe2/caffe2/pull/131

Differential Revision: D4517234

Pulled By: Yangqing

fbshipit-source-id: 1117878393f8fe7d6bebbc4a06a3c37b734f3222
2017-02-07 13:17:50 -08:00
Zhao Tan
a386fe8b6a LogOP implementation
Summary: Element-wise log operation for a Tensor
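
A minimal usage sketch, assuming the op is registered under the name "Log":

    import numpy as np
    from caffe2.python import core, workspace

    workspace.FeedBlob("X", np.array([1.0, np.e, 10.0], dtype=np.float32))
    op = core.CreateOperator("Log", ["X"], ["Y"])  # element-wise natural logarithm
    workspace.RunOperatorOnce(op)
    print(workspace.FetchBlob("Y"))  # roughly [0., 1., 2.3026]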

Reviewed By: dzhulgakov

Differential Revision: D4519090

fbshipit-source-id: 68b73efa0ef268426b5aece77c8137000a73d165
2017-02-06 20:19:19 -08:00
Alexander Sidorov
b7fa6b2a8b remove recurrent_inputs in favor of recurrent_input_ids
Summary:
I had forgotten to remove this one. The rest of the move to indexing
instead of string names is coming after D4446813 lands, since scratches
aren't inputs or outputs and thus can't be indexed.

Reviewed By: urikz

Differential Revision: D4465748

fbshipit-source-id: 2ccbedfb35541ef4a2231d1480eef59025bd5290
2017-01-31 13:14:33 -08:00
Yury Zemlyanskiy
debd256177 Fix for gradient propagation for initial recurrent state for RecurrentNetwork
Summary: It looks like we don't do a good job with initial recurrent input gradients yet. Here is a partial fix; the gradient check still doesn't pass, but the shape is correct now.

Reviewed By: salexspb

Differential Revision: D4475447

fbshipit-source-id: 280f1f59f19e487fd0dce0d440609c50ddce294a
2017-01-30 18:59:32 -08:00
Yury Zemlyanskiy
22e1bdd6d1 Use stack workspaces in RecurrentNetwork
Summary: This diff uses stack workspaces in RecurrentNetwork, which allows us to simplify the implementation and get rid of scratches.

Reviewed By: salexspb

Differential Revision: D4446813

fbshipit-source-id: 514eec7e4300bdf492a9cb192b40cf4f89acf656
2017-01-27 11:44:26 -08:00
Xianjie Chen
ddbf90afa3 improve dper dh
Summary:
It's broken because it relies on add_sparse_bias, and it's not easy to
add_sparse_bias after the switch to loader_param.

DPA would like to try it out :)

Differential Revision: D4447275

fbshipit-source-id: 631cb4995f35383070e44387dc86692ba64b91eb
2017-01-25 02:59:22 -08:00
Yury Zemlyanskiy
0e3146e1e8 Remove recurrent_sizes from RecurrentNetwork
Summary: Remove the usage of recurrent_sizes so that recurrent states' sizes can depend on the input (e.g., the attention matrix for the beam decoder). I removed recurrent_sizes from both the forward and backward steps.

Reviewed By: salexspb

Differential Revision: D4427688

fbshipit-source-id: 580420a294d309c86ec5cb4e677058623b7228e1
2017-01-24 23:14:25 -08:00
Alexander Sidorov
b1472a173a don't hardcode output order to work only for LSTM + don't pass blob names for parameters
Summary:
In this diff I stop passing parameters by name and also remove the hardcoded output ids that were there specifically to make LSTM work. It also allows us to avoid using recurrent_sizes in the backward pass (for the forward pass this is done in D4427688).

Using a similar technique it should be simple enough to eliminate blob-name passing entirely. Then we can fix scoping. These can be done in a follow-up diff.

Reviewed By: urikz

Differential Revision: D4444614

fbshipit-source-id: 3580a76365502b9f2f09e3d8b7e78084ca739f00
2017-01-24 16:29:23 -08:00
Chao Zhang
96fc095ccb Add piecewise linear transformation operator
Summary:
A new operator is added for model calibration. Given a piecewise linear function and raw predictions as input, it generates the mapped predictions as output.
Details can be found in the operator doc.
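
A NumPy sketch of what such a piecewise linear calibration does (the piece boundaries, slopes, and intercepts below are made-up numbers; the operator's actual argument names are in its doc, as noted above):

    import numpy as np

    bounds = np.array([0.0, 0.5, 1.0])    # two pieces: [0.0, 0.5) and [0.5, 1.0]
    slopes = np.array([0.4, 1.6])
    intercepts = np.array([0.0, -0.6])

    raw = np.array([0.1, 0.5, 0.9])
    piece = np.clip(np.searchsorted(bounds, raw, side="right") - 1, 0, len(slopes) - 1)
    calibrated = slopes[piece] * raw + intercepts[piece]
    print(calibrated)  # -> [0.04, 0.2, 0.84]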

Differential Revision: D4418640

fbshipit-source-id: f8ff3ea786b0fe233a4ddcb709e5dbf0861ca484
2017-01-23 17:44:26 -08:00
Andrew Tulloch
e23ddf06e9 UnsafeCoalesceOp for nn.Module.flattenParameters style coalescing
Summary:
This is a handy tool for amortizing expensive operations (e.g.
distributed communication, some heavier kernel launches, etc.) over a
lot of small blobs (e.g. all the biases in a network). We can just
coalesce these small blobs in-place into a single blob, keep acting on
them in operators as if they were non-coalesced (passing them as inputs
to operators, etc.), and then, for the heavier operators, just work on
the coalesced blob that contains each of these units.

I named it UnsafeCoalesce since it introduces blob aliasing, which
needs care for work like memory management, graph rewriting as in
memonger, etc.
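
A hedged sketch of using the op; the convention shown here (each input re-emitted in place plus a trailing coalesced blob) is my reading of the description above, so treat the exact schema as an assumption:

    import numpy as np
    from caffe2.python import core, workspace

    workspace.FeedBlob("bias1", np.zeros(4, dtype=np.float32))
    workspace.FeedBlob("bias2", np.ones(8, dtype=np.float32))

    op = core.CreateOperator(
        "UnsafeCoalesce",
        ["bias1", "bias2"],
        ["bias1", "bias2", "coalesced"],  # inputs become aliases into the coalesced storage
    )
    workspace.RunOperatorOnce(op)
    print(workspace.FetchBlob("coalesced").shape)  # one flat buffer holding both biases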

Reviewed By: Yangqing

Differential Revision: D3557149

fbshipit-source-id: 09cff4459b84270fe9e1da3b4a168fd66d01f795
2017-01-17 17:14:35 -08:00
Pooya Davoodi
92ebb58a06 Top-k accuracy operator on host
Summary:
Automatically copy from device -> host if necessary.

Thanks to pooyadavoodi for the host top-k code.
Closes https://github.com/caffe2/caffe2/pull/51
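
A small sketch assuming the standard Accuracy op with a top_k argument (the argument name and the int32 label convention are assumptions here):

    import numpy as np
    from caffe2.python import core, workspace

    workspace.FeedBlob("scores", np.random.rand(3, 4).astype(np.float32))  # 3 examples, 4 classes
    workspace.FeedBlob("labels", np.array([0, 2, 3], dtype=np.int32))

    op = core.CreateOperator("Accuracy", ["scores", "labels"], ["accuracy"], top_k=2)
    workspace.RunOperatorOnce(op)
    print(workspace.FetchBlob("accuracy"))  # fraction of examples whose label is in their top-2 scores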

Reviewed By: Yangqing

Differential Revision: D4348953

Pulled By: bwasti

fbshipit-source-id: be650855cdd6c2c7bed838155f30e9fa92759dfe
2017-01-10 18:44:30 -08:00
Yury Zemlyanskiy
c2d28fb874 RNNs API simplification
Summary:
This is a first step in improving our RNN story. It provides a wrapper around the current RecurrentNetworkOp implementation which infers most of the redundant parameters and makes the API much simpler.

Also, in order to support general step nets, I added an extra argument to the RecurrentNetworkOp.

Future work:

1. Inferring the sizes and types of the step net's outputs and internal blobs (scratches)
2. Avoid accessing blobs by name in the C++ part
3. Remove the requirement of 1:1 input/output correspondence in the step net
4. Make the Python API support networks with operators like Sum on the border of the cell net (currently there is an issue with such networks where gradient blobs on the boundary are not explicitly created).

Differential Revision: D4268503

fbshipit-source-id: f8a66491c2b55daa730caeed7e9f2b3921541b49
2016-12-21 09:29:43 -08:00
Simon Layton
84e7eff458 Waive some hypothesis tests on GPU
Summary:
operators don't exist on GPU
Closes https://github.com/caffe2/caffe2/pull/63

Reviewed By: Yangqing

Differential Revision: D4348968

Pulled By: bwasti

fbshipit-source-id: 1fb8693842d6827ffcf96de2a9a8ba2f9dff0293
2016-12-19 15:59:32 -08:00
Yangqing Jia
42bbdda8c4 MKLDevice and MKLOperator
Summary:
This adds Caffe2 support for MKL operators directly with MKLMemory. A
Relu layer is included to show how to use it.

Reviewed By: salexspb

Differential Revision: D4322144

fbshipit-source-id: 8b3392c4fd024ab1a7ba7135c349ebd3e1976799
2016-12-15 19:59:24 -08:00
Yangqing Jia
dc16bcfa27 Remove float64 test
Summary:
The float64 test breaks things on the CUDA side. I am deleting it for now; if
we add it back, let's make sure we run the test on a GPU machine first :)

Reviewed By: azzolini

Differential Revision: D4324427

fbshipit-source-id: 0246fe9dd28a286422ca94c90f5b0fc33a162e74
2016-12-15 12:01:30 -08:00
Maxime Boucher
4cd263db74 Last N window collector
Summary: Allows collecting samples over multiple batches. The method uses a circular array, so there is no guarantee about the order of the samples. The goal is to get a view of the data across multiple batches.
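
A plain-Python sketch of the circular-buffer behaviour described above (an illustration of the idea, not the operator's implementation):

    import numpy as np

    class LastNWindow:
        def __init__(self, n, width):
            self.buf = np.zeros((n, width), dtype=np.float32)
            self.seen = 0

        def collect(self, batch):
            for row in batch:
                self.buf[self.seen % len(self.buf)] = row  # overwrite the oldest slot
                self.seen += 1

    w = LastNWindow(n=4, width=2)
    w.collect(np.arange(0, 6, dtype=np.float32).reshape(3, 2))
    w.collect(np.arange(6, 12, dtype=np.float32).reshape(3, 2))
    print(w.buf)  # the last 4 rows seen, with no ordering guarantee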

Reviewed By: salexspb

Differential Revision: D4216181

fbshipit-source-id: bb9e1fa84ac7e04006dcddb53c9347a42ec83dc8
2016-12-15 12:01:30 -08:00
Xianjie Chen
0bc104a3d0 fix unit test
Summary: ...

Differential Revision: D4298663

fbshipit-source-id: 7831830a5201eb6603d846460c22b2f906e53858
2016-12-15 12:01:29 -08:00
Xianjie Chen
3c47d41f86 add unit test for row mul
Summary: so that we are more confident.

Differential Revision: D4290132

fbshipit-source-id: 44e4687d977ab90cc022a14131bbf701bdf131d4
2016-12-15 12:01:29 -08:00
Xianjie Chen
f41b2ca85c fix sliceop for empty batch
Summary: Used in the NNPreProc layers. The op fails online training when there is an empty batch.

Reviewed By: dzhulgakov

Differential Revision: D4235498

fbshipit-source-id: bde00a011831762e44a3f9bf2190d4b241a06ccc
2016-11-29 15:18:39 -08:00
Wenlin Chen
9fa26fcc32 position weighted embedding
Summary: Each sparse feature is an ID list, and usually the position of an ID in the list is meaningful: the earlier the ID appears in the list, the more important it is. In this diff, we multiply each embedding by a weight, where the weight corresponds to the position. With this change, the same ID appearing at different positions will have a different norm/length/importance after aggregation. The firstX transformation in sigrid is a special case of this model where the weights before n are 1 and 0 after n, where n is the argument of firstX.
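
A NumPy sketch of the idea (the table, weights, and ID list below are made-up illustrative values):

    import numpy as np

    embedding_table = np.random.rand(10, 3).astype(np.float32)            # 10 IDs, embedding dim 3
    position_weights = np.array([1.0, 0.6, 0.3, 0.1], dtype=np.float32)   # learned per-position weights

    id_list = [7, 2, 7]  # the same ID at positions 0 and 2 contributes with different weights
    vectors = embedding_table[id_list]                 # (3, 3)
    weights = position_weights[: len(id_list), None]   # (3, 1)
    pooled = (weights * vectors).sum(axis=0)           # position-weighted sum pooling
    print(pooled)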

Reviewed By: xianjiec

Differential Revision: D4181251

fbshipit-source-id: 2a6f8b7240af445b6bd2052fd24c2d99f39ee7ff
2016-11-29 15:18:35 -08:00
Yangqing Jia
589398950f fbsync at f5a877 2016-11-18 15:41:06 -08:00
Yangqing Jia
238ceab825 fbsync. TODO: check if build files need update. 2016-11-15 00:00:46 -08:00
Yangqing Jia
d1e9215184 fbsync 2016-10-07 13:08:53 -07:00
Yangqing Jia
b23e51d467 chunky sync 2016-09-06 15:55:19 -07:00
Yangqing Jia
05512d1e10 sync 2016-08-10 11:02:15 -07:00
Yangqing Jia
c15e45c9bb chunky sync again 2016-08-01 20:58:46 -07:00
Yangqing Jia
bcea409c82 sync 2016-07-28 15:06:43 -07:00
Yangqing Jia
6463eebc7b chunky sync - build scripts to be written 2016-07-21 10:16:42 -07:00
Yangqing Jia
559053d3a8 chunky sync 2016-05-13 14:43:48 -07:00