Commit Graph

41 Commits

Author SHA1 Message Date
Yiming Wu
b922b19bfd add weights bias to modelhelperbase
Summary: add weights and bias to modelhelperbase

Reviewed By: salexspb

Differential Revision: D4837125

fbshipit-source-id: 6a357c0e3d07d35aa6cdeb8ef803976646b9dbe6
2017-04-06 11:16:55 -07:00
Aapo Kyrola
c66c8f6e84 Add Softmax to cnn.py, cuDNN engine.
Summary: Softmax was not in the model helper, so I added it there so we can set the cuDNN engine, which is the preferred version.
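For reference, the math the helper delegates to (the cuDNN kernel in practice) is just the numerically stable softmax; a plain-Python sketch:

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
```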

Reviewed By: asaadaldien

Differential Revision: D4835624

fbshipit-source-id: 7f0c84b7a73653119901795782709a6a617345c5
2017-04-05 14:20:23 -07:00
Aapo Kyrola
e13e9c1302 cuDNN version of TransposeOp
Summary:
Uses the cudnnTransformTensor function. It works by shuffling the strides according to the transpose axes. Significant speedup over the current GPU version.
+ moves the transpose test under utility_ops, because hypothesis_test is too big
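The stride-shuffling idea can be sketched in plain Python (illustrative only; the actual op hands permuted strides to cudnnTransformTensor on device memory):

```python
from itertools import product

def strides_for(shape):
    # Row-major (C-order) strides, in elements.
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

def transpose(flat, shape, axes):
    # Permute shape and strides by `axes`, then gather elements by
    # walking the permuted index space; the data itself is untouched.
    old = strides_for(shape)
    new_shape = [shape[a] for a in axes]
    new_strides = [old[a] for a in axes]
    out = [
        flat[sum(i * s for i, s in zip(idx, new_strides))]
        for idx in product(*[range(d) for d in new_shape])
    ]
    return out, new_shape

# Transpose a 2x3 tensor stored as a flat row-major list.
t, tshape = transpose(list(range(6)), [2, 3], [1, 0])
```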

Reviewed By: jamesr66a

Differential Revision: D4810993

fbshipit-source-id: 82577c4ced1389e70bd5992820ae4d8297a3817f
2017-04-03 13:33:10 -07:00
Aaron Markham
58f7f2b441 doxygen python block added
Summary: Closes https://github.com/caffe2/caffe2/pull/226

Differential Revision: D4793550

Pulled By: JoelMarcey

fbshipit-source-id: cc33e58186304fa8dcac2ee9115dcc271d785b1e
2017-03-29 06:46:16 -07:00
Ahmed Taei
3b7cb50d1c Add ConvNd to model helper
Summary:
Add a ConvNd interface for N-d convolution and keep Conv for 2-d convolution.
I added _BaseConv to share code between ConvNd and Conv.
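A rough sketch of the sharing described above (the constructor signatures here are hypothetical; only the _BaseConv/Conv/ConvNd split comes from the diff):

```python
class _BaseConv:
    # Shared base: holds the kernel extents and common bookkeeping.
    def __init__(self, kernel):
        self.kernel = tuple(kernel)

    def ndim(self):
        return len(self.kernel)

class Conv(_BaseConv):
    # 2-d convolution keeps its old square-kernel interface.
    def __init__(self, kernel_size):
        super().__init__((kernel_size, kernel_size))

class ConvNd(_BaseConv):
    # N-d convolution takes one kernel extent per spatial dim.
    def __init__(self, kernels):
        super().__init__(kernels)
```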

Reviewed By: Yangqing

Differential Revision: D4660822

fbshipit-source-id: 8339421351ce9a36ce5a165f7fa455cfcc61733d
2017-03-22 15:47:48 -07:00
Alexander Sidorov
f97d7949d0 Remove legacy LSTM, cleanup tests
Summary: we don't use this one any more, except in a few tests

Reviewed By: urikz

Differential Revision: D4731401

fbshipit-source-id: c5c28b7594e3251f501fc28455dfc9bd2093a836
2017-03-17 16:33:53 -07:00
Alexander Sidorov
1fac027d0e Quantized Training API
Summary: These python helpers are going to provide sufficient bookkeeping when adding quantization for conv layers

Reviewed By: Yangqing

Differential Revision: D4671478

fbshipit-source-id: 292e2f633dd30969c0afbe7a8075b340ce9a6d12
2017-03-13 22:17:58 -07:00
Pooya Davoodi
d85ca8c6df Do not initialize BN params if init_params is false.
Summary:
If init_params is False, the parameters should not be initialized.
This is particularly important when testing a model that provides values for these BN parameters.
Closes https://github.com/caffe2/caffe2/pull/174
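The guard amounts to something like the following (hypothetical helper name, plain Python; the real code sits in the model helper's BN builder):

```python
def make_bn_params(channels, init_params):
    # If init_params is False (e.g. testing a model that already provides
    # trained BN values), skip initialization so loaded values survive.
    if not init_params:
        return None
    return {"scale": [1.0] * channels, "bias": [0.0] * channels}
```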

Differential Revision: D4621791

Pulled By: Yangqing

fbshipit-source-id: 518443925990a12c1d5729b0971ebe19ba5d8998
2017-02-27 20:19:03 -08:00
Aapo Kyrola
9eeeb8407f use CUDA version of AccuracyOp with top_k=1
Summary: D4348953 added support for accuracy with top_k>1, which is only supported on CPU, requiring data to be copied from GPU. But that diff did not take into account that we have a top_k=1 version of AccuracyOp for CUDA. This diff ensures we use the CUDA version for top_k=1.
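For reference, the top-k accuracy computation itself is simple; a plain-Python sketch (not the CUDA kernel):

```python
def accuracy_top_k(scores, labels, top_k=1):
    # Fraction of rows whose true label is among the top_k highest scores.
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += label in ranked[:top_k]
    return hits / len(labels)

scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
top1 = accuracy_top_k(scores, [2, 0], top_k=1)   # only row 1 hits
top2 = accuracy_top_k(scores, [2, 0], top_k=2)   # both rows hit
```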

Differential Revision: D4607767

fbshipit-source-id: 8becda23890343043eb79ad04e4c6196e9010f0c
2017-02-23 19:02:53 -08:00
Kittipat Virochsiri
ba7fad53b5 Support for sample softmax
Summary:
This diff adds the ability to train a multiclass classifier on a sampled subset of classes. This basically implements what is described in https://arxiv.org/abs/1412.2007, without the sampling-probability correction. Since this implements uniform sampling, the sampling probabilities cancel out in the softmax anyway.

The trick to make this work is to have 2 different nets for prediction and training, both sharing parameters. The model is built normally until the last layer. If sampling is needed, then we do the following:

The class sampling works as follows:
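A minimal sketch of the training-time loss under uniform sampling (hypothetical function, plain Python; the real diff builds separate prediction and training nets):

```python
import math
import random

def sampled_softmax_loss(logits, label, num_sampled, num_classes):
    # Evaluate softmax cross-entropy over the true class plus a uniformly
    # sampled set of negatives instead of all classes. With uniform
    # sampling, the probability-correction terms cancel in the softmax.
    negatives = [c for c in range(num_classes) if c != label]
    support = [label] + random.sample(negatives, num_sampled)
    m = max(logits[c] for c in support)
    log_z = math.log(sum(math.exp(logits[c] - m) for c in support))
    return -(logits[label] - m - log_z)

# Sampling every negative reduces to the full softmax loss.
loss = sampled_softmax_loss([1.0, 2.0, 3.0], label=2, num_sampled=2, num_classes=3)
```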

Reviewed By: xianjiec

Differential Revision: D4512859

fbshipit-source-id: ab537bcac81d5e5877a8795045e8682c8064da68
2017-02-17 09:31:54 -08:00
James Cross
b436788b16 LSTMUnit: pass through H values
Summary:
Pass through the h-value recurrent output unchanged at each LSTM step beyond the valid part of a sequence (computed based on seqLengths, allowing batching of sequences of different lengths). This enables using the final-step output of each sequence as the output when one vector is desired for the entire sequence. The gradient is also passed back unchanged.

Also made some cosmetic changes to recurrent_network_test.py (seq_lengths offset corrected, should be in [1, T] rather than [0, T-1]).
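The pass-through behaviour can be sketched with a toy cell (the real op applies the full LSTM math inside the valid region; the `+ x` update here is a stand-in):

```python
def step(h_prev, x, t, seq_lengths):
    # One recurrent step over a batch: beyond a sequence's valid length,
    # h passes through unchanged (and likewise for its gradient).
    return [
        h + xi if t < n else h      # toy "cell"; real LSTM math omitted
        for h, xi, n in zip(h_prev, x, seq_lengths)
    ]

seq_lengths = [3, 1]                # batch of two sequences
h = [0, 0]
for t in range(3):                  # T = 3 steps, all inputs are 1
    h = step(h, [1, 1], t, seq_lengths)
# h[1] still holds the last valid value, from step 0
```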

Reviewed By: urikz

Differential Revision: D4540307

fbshipit-source-id: 73a9f6326069d713dcb0cdc8d17869317c6dbe96
2017-02-16 15:31:38 -08:00
James Cross
63901e9aca allow recurrent network gradient op to receive gradient on any combination of network output blobs
Summary:
(Caffe2) Modified RecurrentNetworkGradient operator so that training is possible with any of the output blob(s) receiving gradient during the backward pass. This is realized through a new argument for the RecurrentNetwork op, outputs_with_grads, which takes a list of the indices of the output blobs which will receive gradient. The default case (only receiving gradient from the first output blob) remains the default.

New unit test covers the case where outputs_with_grads = [1, 2] using Python LSTM wrapper.

Reviewed By: urikz

Differential Revision: D4518516

fbshipit-source-id: 5c531582b20f3cf727d1aa91239b4d5a2b8a7c1f
2017-02-15 16:00:45 -08:00
David Truong
60be25f4cd Added shape inference to padding operator for tensors
Summary: Can now infer the shape of the tensor
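Shape inference for a padding op is simple arithmetic; a sketch with a hypothetical helper name:

```python
def pad_output_shape(shape, pads):
    # Each dim grows by its (pad_before, pad_after) amounts.
    return [d + before + after for d, (before, after) in zip(shape, pads)]

out_shape = pad_output_shape([4, 4], [(1, 1), (2, 0)])
```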

Differential Revision: D4529339

fbshipit-source-id: 33553611fd3ecd7fde4b7b432c7720255ddda8be
2017-02-13 11:04:13 -08:00
Alexander Sidorov
b7fa6b2a8b remove recurrent_inputs in a favor of recurrent_input_ids
Summary:
I had forgotten to remove this one. The rest of the switch from string names
to indexing is coming after D4446813 lands, as scratches
aren't inputs or outputs and thus can't be indexed.

Reviewed By: urikz

Differential Revision: D4465748

fbshipit-source-id: 2ccbedfb35541ef4a2231d1480eef59025bd5290
2017-01-31 13:14:33 -08:00
Yury Zemlyanskiy
22e1bdd6d1 Use stack workspaces in RecurrentNetwork
Summary: This diff uses stack workspaces in RecurrentNetwork, which simplifies the implementation and gets rid of scratches.

Reviewed By: salexspb

Differential Revision: D4446813

fbshipit-source-id: 514eec7e4300bdf492a9cb192b40cf4f89acf656
2017-01-27 11:44:26 -08:00
Yury Zemlyanskiy
0e3146e1e8 Remove recurrent_sizes from RecurrentNetwork
Summary: Remove usage of recurrent_sizes, so recurrent states' sizes can depend on input (in case of attention matrix for beam decoder). I removed recurrent_sizes from forward and backward steps.

Reviewed By: salexspb

Differential Revision: D4427688

fbshipit-source-id: 580420a294d309c86ec5cb4e677058623b7228e1
2017-01-24 23:14:25 -08:00
Alexander Sidorov
b1472a173a don't hardcode outputs order to work only for lstm + don't pass blob names for parameters
Summary:
In this diff I stop passing parameters by name and also remove the hardcoded output ids which were there specifically for LSTM to work. It also avoids using recurrent_sizes in the backward pass (for forward this is done in D4427688).

Using a similar technique it should be simple enough to eliminate blob-name passing entirely. Then we can fix scoping. These can be done in a follow-up diff.

Reviewed By: urikz

Differential Revision: D4444614

fbshipit-source-id: 3580a76365502b9f2f09e3d8b7e78084ca739f00
2017-01-24 16:29:23 -08:00
Kevin Matzen
6a7dd236fa instance norm
Summary: Added gradient and GPU implementation to caffe2 InstanceNorm op
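For reference, instance norm normalizes each sample by its own per-channel statistics, unlike batch norm's batch statistics; a 1-D sketch (not the GPU kernel, and the scale/bias terms are omitted):

```python
def instance_norm(values, eps=1e-5):
    # Normalize one sample's channel by its own mean and variance.
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [(v - mean) / (var + eps) ** 0.5 for v in values]

y = instance_norm([1.0, 3.0])
```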

Reviewed By: Yangqing

Differential Revision: D4304808

fbshipit-source-id: 6feecaed589ea9f825260a49b39b4260da6e5426
2017-01-20 12:29:28 -08:00
Pooya Davoodi
92ebb58a06 Top-k accuracy operator on host
Summary:
Automatically copy from device -> host if necessary.

Thanks to pooyadavoodi for the host top-k code.
Closes https://github.com/caffe2/caffe2/pull/51

Reviewed By: Yangqing

Differential Revision: D4348953

Pulled By: bwasti

fbshipit-source-id: be650855cdd6c2c7bed838155f30e9fa92759dfe
2017-01-10 18:44:30 -08:00
Simon Layton
7c3f1521a7 Gpu transform
Summary:
Adds a thread pool for image decode, and optional GPU-based data conversion, mean subtraction and std division
Closes https://github.com/caffe2/caffe2/pull/56
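The pipeline shape (per-image work farmed out to a thread pool, then mean subtraction and std division) can be sketched with stdlib tools; the mean/std values here are made up, and the real diff does the normalization on the GPU:

```python
from concurrent.futures import ThreadPoolExecutor

def transform(pixels, mean, std):
    # Data conversion: mean subtraction and std division.
    return [(p - mean) / std for p in pixels]

def decode_batch(raw_images, mean, std, workers=4):
    # Hand each image to the thread pool for decode/transform.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda img: transform(img, mean, std), raw_images))

batch = decode_batch([[10, 20], [30, 40]], mean=20.0, std=10.0)
```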

Reviewed By: Yangqing

Differential Revision: D4341326

Pulled By: bwasti

fbshipit-source-id: 6485616ea7d212c7701274a40fae912db30dff4a
2017-01-03 17:59:34 -08:00
Priya Goyal
3eb08feff5 Support no_bias in naive group conv implementation
Summary:
I was testing the perf difference between naive group conv and cudnn group conv. I am doing no_bias conv and added support for that in the naive implementation.
Although it's deprecated, I thought it would be nice to have working things in our code.

Differential Revision: D4363168

fbshipit-source-id: 29719013d79b449fd359884709c7a1195be51ae3
2016-12-22 14:14:26 -08:00
Aapo Kyrola
db5cc8f278 revert exhaustive_search setting to False
Summary: As per discussion in D4355529

Reviewed By: prigoyal

Differential Revision: D4362162

fbshipit-source-id: 795fcf1507235a7dc3c7a10b0453037936d057aa
2016-12-22 12:44:42 -08:00
Yangqing Jia
2c6a579859 Make all convolution operators allow optional bias term
Summary:
It used to be that only the cudnn engine supported it; now it should be
fully supported by any conv engine.

To ignore bias, simply use a convolution op that has two inputs instead of
3. The gradient operator will automatically figure out that it does not
compute the bias gradient.
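The gradient op's decision can be sketched as a function of the forward op's input count (names here are hypothetical):

```python
def conv_gradient_outputs(forward_inputs):
    # Two forward inputs (X, W) -> no bias gradient;
    # three inputs (X, W, b) -> bias gradient included.
    grads = ["dX", "dW"]
    if len(forward_inputs) == 3:
        grads.append("db")
    return grads
```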

Reviewed By: prigoyal

Differential Revision: D4354183

fbshipit-source-id: cf71b6289a254d15a6a663a85df63fbbaec3702b
2016-12-21 15:14:24 -08:00
Aapo Kyrola
5209a28c95 cuddn_exhaustive_search default True
Summary: As discussed, this improves performance a lot and is no longer a memory hog. Anyone can also turn it off.

Differential Revision: D4338798

fbshipit-source-id: bf0fdb594427ebe90e1e94b2effdc63196096b3f
2016-12-21 09:29:43 -08:00
Yury Zemlyanskiy
c2d28fb874 RNNs API simplification
Summary:
This is a first step in improving our RNN story. It provides a wrapper around the current RecurrentNetworkOp implementation which infers most of the redundant parameters and makes the API much simpler.

Also in order to support general step nets I added an extra argument to the RecurrentNetworkOp.

Future work:

1. Inferring step-net output and internal blob (scratch) sizes and types
2. Avoid accessing blobs by name in the C++ part
3. Remove the requirement for 1:1 correspondence between inputs/outputs in the step net
4. Make the python API support networks with operators like Sum on the border of the cell net (currently there is an issue with such networks where gradient blobs on the side are not explicitly created).

Differential Revision: D4268503

fbshipit-source-id: f8a66491c2b55daa730caeed7e9f2b3921541b49
2016-12-21 09:29:43 -08:00
Simon Layton
05233cd5b8 Make bias optional in cuDNN conv op
Summary:
Yangqing This seems to work for me, not sure if it's implemented in the right way for you to accept :)

Allows the user to specify "no_bias" as an option for convolution layers (only cuDNN at this point), so that the bias associated with that operator is not allocated or computed. This is useful in particular for conv + BatchNorm combinations (such as ResNets), where the bias term can be handled by both conv and BatchNorm, wasting memory and computation.
Closes https://github.com/caffe2/caffe2/pull/50

Reviewed By: Yangqing

Differential Revision: D4341288

Pulled By: bwasti

fbshipit-source-id: e6138d0024c83ed876dff2f83ffbebe7de502fd8
2016-12-19 14:59:49 -08:00
Yangqing Jia
1a00ffea2a Implement fix recommended by @slayton58
Summary: This addresses integer division errors.

Reviewed By: bwasti

Differential Revision: D4315555

fbshipit-source-id: 13ef9496409b3452bc5fb66ce787b11af1382132
2016-12-15 12:01:30 -08:00
Aapo Kyrola
eddf23ca0f Handle parameters that are computed but not optimized
Summary:
prigoyal sharply noticed a bug in the Resnet models: we have not been checkpointing, nor synchronizing between gpus, the moving average and variance computed by the SpatialBN ops. The first problem in particular is serious, since models starting from a checkpoint would have started from a null state for SpatialBN. Not synchronizing with the data parallel model is less tragic, since each GPU should see very similar data.

Thus I propose keeping track of "computed params", i.e. params that are computed from data but not optimized. I don't know if there are other examples, but SpatialBN's moving avg and var definitely are one.

- I modified the checkpointing for the xray model to store those blobs + also ensure the synchronization of those blobs
- I modified data parallel model to broadcast those params from gpu0. I first tried averaging, but hit some NCCL deadlocks ... :(
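The bookkeeping amounts to a second param list alongside the optimized one; a sketch with hypothetical blob names:

```python
class Model:
    def __init__(self):
        self.params = []            # updated by the optimizer
        self.computed_params = []   # computed from data, never optimized

    def checkpoint_blobs(self):
        # Checkpoints (and cross-GPU broadcast) must cover both lists.
        return self.params + self.computed_params

m = Model()
m.params += ["conv_w", "conv_b"]
m.computed_params += ["bn_running_mean", "bn_running_var"]
```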

Differential Revision: D4281265

fbshipit-source-id: 933311afeec4b7e9344a13cf2d38aa939c50ac31
2016-12-15 12:01:28 -08:00
Ou Jin
e8b7ec1393 disable local update for sparse features
Summary:
With a parameter server, sparse features are updated on the parameter server, so
local updates for sparse features are disabled. But that logic was removed in
D4144922. This diff adds the logic back in a slightly different way.

Previously, in trainer_example, I did this in a hacky way, just avoiding adding the
sparse weight to model.params. It would still generate a grad, but would not add
optimization operators. At the same time, it was always registered directly in
the sparse_mapping, so the parameter server was aware of this parameter.
But with the new change to ParameterInfo I can no longer do it that way,
because the param registry and params are bound together in ParameterInfo.

For dper, there is an option in the dper model helper to disable all of the sparse
parameter optimizers.

To combine these two, I directly changed ModelHelperBase in this
diff. It is not quite ideal; it would be better to do it in Layer. But to fix the old
one, this seems the more reasonable place to cover both cases.

With this diff, there is no spike anymore, so this is probably the root cause
of the convergence issue we saw in D4144922. It also explains why the
model could recover: adagrad decays the local learning rate, so
local updates cause less and less change.

Reviewed By: dzhulgakov

Differential Revision: D4229684

fbshipit-source-id: da1241d43d7c52cbf13560f9bb83e09897d8d56f
2016-11-29 15:18:38 -08:00
Yangqing Jia
238ceab825 fbsync. TODO: check if build files need update. 2016-11-15 00:00:46 -08:00
Simon Layton
8def54e82b Fix BN in test phase 2016-10-19 08:20:11 -04:00
Yangqing Jia
f019672e0b Merge branch 'master' into fbsync 2016-10-07 16:42:13 -07:00
Yangqing Jia
d1e9215184 fbsync 2016-10-07 13:08:53 -07:00
Simon Layton
00c493864e Fix BN for test phase 2016-10-07 12:11:36 -04:00
Yangqing Jia
0a09d09431 fbsync 2016-09-08 17:56:14 -07:00
Yangqing Jia
b23e51d467 chunky sync 2016-09-06 15:55:19 -07:00
Yangqing Jia
05512d1e10 sync 2016-08-10 11:02:15 -07:00
Yangqing Jia
6463eebc7b chunky sync - build scripts to be written 2016-07-21 10:16:42 -07:00
Yangqing Jia
559053d3a8 chunky sync 2016-05-13 14:43:48 -07:00
Yangqing Jia
cf7ca23fc1 make caffe2.python build 2016-03-08 16:48:19 -08:00
Yangqing Jia
9ae880bb6f move pycaffe2 to caffe2.python 2016-03-08 15:45:30 -08:00