Commit Graph

16 Commits

Author SHA1 Message Date
Simon Layton
05233cd5b8 Make bias optional in cuDNN conv op
Summary:
Yangqing This seems to work for me, not sure if it's implemented in the right way for you to accept :)

Allows user to specify "no_bias" as an option for convolution layers (only cuDNN at this point), so that the bias associated with that operator is not allocated or computed. This is useful in particular for conv + BatchNorm combinations (such as ResNets), as the bias term can be handled by both conv and Batch Norm, wasting memory and computation.
Closes https://github.com/caffe2/caffe2/pull/50

Reviewed By: Yangqing

Differential Revision: D4341288

Pulled By: bwasti

fbshipit-source-id: e6138d0024c83ed876dff2f83ffbebe7de502fd8
2016-12-19 14:59:49 -08:00
Yangqing Jia
1a00ffea2a Implement fix recommended by @slayton58
Summary: This addresses integer division errors.

Reviewed By: bwasti

Differential Revision: D4315555

fbshipit-source-id: 13ef9496409b3452bc5fb66ce787b11af1382132
2016-12-15 12:01:30 -08:00
Aapo Kyrola
eddf23ca0f Handle parameters that are computed but not optimized
Summary:
prigoyal sharply noticed a bug in the Resnet models: we have not been checkpointing, nor synchronizing between gpus, the moving average and variance computed by the SpatialBN ops.  Particularly the first problen is serious, since models starting from checkpoint would have started from a null-state for SpatialBN. Not synchronizing with the data parallel model is less tragic since each GPU should see very similar data.

Thus I propose keeping track of "computed params", i.e params that are computed from data but not optimized. I don't know if there are other examples, but SpatialBN's moving avg and var definitely are one.

- I modified the checkpointign for xray model to store those blobs + also ensure the synchronization of those blobs
- I modified data parallel model to broadcast those params from gpu0. I first tried averaging, but hit some NCCL deadlocks ... :(

Differential Revision: D4281265

fbshipit-source-id: 933311afeec4b7e9344a13cf2d38aa939c50ac31
2016-12-15 12:01:28 -08:00
Ou Jin
e8b7ec1393 disable local update for sparse features
Summary:
With parameter server, sparse features are updated on the parameter server.
Local update for sparse features are disabled. But that logic is removed in
D4144922. This diff is to add this logic back in a slightly different way.

Previously, in trainer_example, I did that in a hacky way just avoid adding
sparse weight to model.params. It will still generate grad, but will not add
optimization operators. At the same time, it is always registered directly in
the sparse_mapping, so the parameter server is aware of this parameter.
But with the new change for ParameterInfo. I can not do it in that way anymore.
Because the param registry and params are bind together in ParameterInfo.

For dper, there is a option in dper model helper to disable all of the sparse
parameter optimizer.

To combine these two together, I directly changed the ModelHelperBase in this
diff. It is not quite ideal. It is better to do it in Layer. But to fix the old
one, this seems to be more reasonable place to cover both cases.

With this diff, there is no spike anymore. So probably this is the root cause
for the convergence issue we have seen in D4144922. It explains that why the
model can recover, which is because adagrad decays local learning rate and
local updates cause less change.

Reviewed By: dzhulgakov

Differential Revision: D4229684

fbshipit-source-id: da1241d43d7c52cbf13560f9bb83e09897d8d56f
2016-11-29 15:18:38 -08:00
Yangqing Jia
238ceab825 fbsync. TODO: check if build files need update. 2016-11-15 00:00:46 -08:00
Simon Layton
8def54e82b Fix BN in test phase 2016-10-19 08:20:11 -04:00
Yangqing Jia
f019672e0b Merge branch 'master' into fbsync 2016-10-07 16:42:13 -07:00
Yangqing Jia
d1e9215184 fbsync 2016-10-07 13:08:53 -07:00
Simon Layton
00c493864e Fix BN for test phase 2016-10-07 12:11:36 -04:00
Yangqing Jia
0a09d09431 fbsync 2016-09-08 17:56:14 -07:00
Yangqing Jia
b23e51d467 chunky sync 2016-09-06 15:55:19 -07:00
Yangqing Jia
05512d1e10 sync 2016-08-10 11:02:15 -07:00
Yangqing Jia
6463eebc7b chunky sync - build scripts to be written 2016-07-21 10:16:42 -07:00
Yangqing Jia
559053d3a8 chunky sync 2016-05-13 14:43:48 -07:00
Yangqing Jia
cf7ca23fc1 make caffe2.python build 2016-03-08 16:48:19 -08:00
Yangqing Jia
9ae880bb6f move pycaffe2 to caffe2.python 2016-03-08 15:45:30 -08:00