Commit Graph

40 Commits

Author SHA1 Message Date
Andrey Malevich
a5fc70857c Support fetching of the parameters from the global namescope by ''
Summary:
This diff fixes fetching of parameters in the global namescope. An earlier
diff that switched to '' introduced this bug.

Reviewed By: dzhulgakov

Differential Revision: D5189667

fbshipit-source-id: 4818e99e2c2c90788e70e0b8b6204ec6f471d37d
2017-06-05 22:32:39 -07:00
James Cross
f1c971d04b add ExpandDims to _known_working_ops
Summary: ExpandDims is a trivial utility op which should not trigger a warning when used via ModelHelper.
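
A rough sketch of the call path this whitelisting affects (the model and blob names are illustrative):

  from caffe2.python import model_helper

  model = model_helper.ModelHelper(name="demo")
  # Routed through ModelHelper.__getattr__; ops listed in
  # _known_working_ops pass through without a warning.
  model.ExpandDims('data', 'data_expanded', dims=[0])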

Reviewed By: akyrola

Differential Revision: D5117985

fbshipit-source-id: 5589f46f58458f5019924b48602db088563f2fee
2017-06-05 15:49:21 -07:00
Aapo Kyrola
5e6bd4fbfc Return predict params from ExtractPredictorNet + test
Summary:
Make it easier for users by returning from ExtractPredictorNet the list of blobs that must be saved/exported to run a predictor net. Added a test for ExtractPredictorNet.

Codemod.
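
Under these assumptions, the new call shape is roughly as follows (train_model and the blob names are illustrative; see model_helper for the exact signature):

  from caffe2.python import model_helper

  # Sketch: the second return value lists the blobs (weights, biases,
  # ...) that must be saved/exported for the predictor net to run.
  predict_net, export_blobs = model_helper.ExtractPredictorNet(
      train_model.net.Proto(),   # train_model: assumed existing model
      input_blobs=['data'],
      output_blobs=['softmax'],
  )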

Reviewed By: asaadaldien

Differential Revision: D5176097

fbshipit-source-id: b1af42132459487b8d94fcdde0e4c514da608243
2017-06-05 15:34:37 -07:00
Andrey Malevich
77c1027abb Create ParameterSharing abstraction for Caffe2.
Summary:
This diff introduces abstractions for parameter sharing for all
parameters that are created through the new create_param syntax.

Possible use cases of this parameter sharing (a usage sketch follows the list):
1. Sharing params within the RNN interface.
2. Complicated models that might share some of their branches.
3. TODO (next diff): cross-model parameter sharing.
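
A minimal sketch of the intended usage, assuming the context-manager style and module paths that Caffe2's parameter-sharing code exposes (scope names and shapes are illustrative):

  from caffe2.python import core, model_helper
  from caffe2.python.modeling.initializers import Initializer
  from caffe2.python.modeling.parameter_sharing import ParameterSharing

  model = model_helper.ModelHelper(name="sharing_demo")
  # Map scope 'decoder' onto 'encoder': create_param calls made under
  # 'decoder' resolve to the blobs already created under 'encoder'.
  with ParameterSharing({'decoder': 'encoder'}):
      with core.NameScope('encoder'):
          w_enc = model.create_param('w', shape=[4, 4],
                                     initializer=Initializer("XavierFill"))
      with core.NameScope('decoder'):
          w_dec = model.create_param('w', shape=[4, 4],
                                     initializer=Initializer("XavierFill"))
  # w_dec now refers to 'encoder/w' instead of a fresh 'decoder/w' blob.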

Reviewed By: salexspb

Differential Revision: D5160935

fbshipit-source-id: c6d40a5ed7ead240cd7db0eb69de6dc5f505b05a
2017-06-05 11:49:54 -07:00
Andrey Malevich
a8fb85797c Refactoring of the parameters step 0. Add simple tags and unify interface for params and computed_params.
Summary:
This diff is the first step in the effort to refactor all parameters. As a first step, I'm merging the concepts of params and computed_params, which will
be based on tags instead (the first version still uses the old data structs to store all the BlobReferences).

Renaming computed_params to non-trainable/non-backprop params should be done in some other diff.
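
In sketch form, the unified, tag-based view looks roughly like this (module paths and tag names follow later Caffe2 code and are assumptions here):

  from caffe2.python import model_helper
  from caffe2.python.modeling.initializers import Initializer
  from caffe2.python.modeling.parameter_info import ParameterTags

  model = model_helper.ModelHelper(name="tags_demo")
  # A computed (non-optimized) param is now just a tagged param rather
  # than an entry in a separate computed_params list.
  bn_mean = model.create_param(
      'bn_mean', shape=[16],
      initializer=Initializer("ConstantFill", value=0.0),
      tags=ParameterTags.COMPUTED_PARAM)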

Reviewed By: salexspb

Differential Revision: D5171159

fbshipit-source-id: 68031ca779f053fb266a7c4a2e5b482a3bd9c832
2017-06-02 17:17:57 -07:00
Aapo Kyrola
ffbba0fae7 add model_helper Validate() + sprinkle it around
Summary:
A recent diff introduced a duplicate parameter to the model, which would hurt performance and also affect correctness (duplicate momentum updates, for example). We unfortunately had no checks for duplicate params outside of data_parallel_model, which fortunately brought this to our attention.

It is better to have a Validate() function in model_helper and call it before adding gradient ops and querying for parameters. Added it to brew_test calls as well.
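
The check itself amounts to something like the following sketch (illustrative, not the verbatim implementation):

  # Illustrative duplicate-param check:
  def Validate(model):
      params = list(map(str, model.params))
      dupes = set(p for p in params if params.count(p) > 1)
      assert not dupes, (
          "Duplicate params would receive gradient and momentum "
          "updates twice: %s" % dupes)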

Reviewed By: kennyhorror

Differential Revision: D5163458

fbshipit-source-id: 35692e8bfcc359d4e8bc73e6f2358659f6e45ceb
2017-06-01 02:36:47 -07:00
Aapo Kyrola
076376f4f6 Revert D5119830: [C2] Refactoring of the parameters step 0. Add simple tags and unify interface for params and computed_params
Summary: This reverts commit 2001090a37346eb12abbb234e13e727c288eb8a7

Differential Revision: D5119830

fbshipit-source-id: bf321868338f0db85dff3237af7eaf74212dbdf6
2017-06-01 00:02:21 -07:00
Andrey Malevich
ff61ed358e Refactoring of the parameters step 0. Add simple tags and unify interface for params and computed_params
Summary:
This diff is the first step in the effort to refactor all parameters. As a
first step, I'm merging the concepts of params and computed_params, which will
be based on tags instead (the first version still uses the old data structs to
store all the BlobReferences).

Renaming computed_params to non-trainable/non-backprop params should be done in
some other diff.

Reviewed By: salexspb

Differential Revision: D5119830

fbshipit-source-id: 2001090a37346eb12abbb234e13e727c288eb8a7
2017-05-31 22:36:36 -07:00
Aapo Kyrola
ccdf2d99e1 Add description to assert in model_helper
Summary: Add information about the offending param when the assertion fires.

Reviewed By: kennyhorror

Differential Revision: D5153625

fbshipit-source-id: 9f5a02bf64ccbdef9d93d346f79e589dfe3ec5be
2017-05-31 00:02:18 -07:00
Alexander Sidorov
016f72537a ModelHelper.create_param, Initializer abstraction and ParameterInfo for optimizers
Summary:
This is going to unblock Nvidia in their work on adding fp16
support to Caffe2. I discussed this with kennyhorror before to make
sure this fits into his work on parameter sharing.
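
A sketch of the resulting API (the Initializer module path follows later Caffe2 code; the fp16 angle is that the fill op can be swapped without touching call sites):

  from caffe2.python import model_helper
  from caffe2.python.modeling.initializers import Initializer

  model = model_helper.ModelHelper(name="fc_demo")
  # Initializer wraps the fill op; an fp16 variant can be substituted
  # here without changing the create_param call sites (assumed usage).
  w = model.create_param('fc_w', shape=[10, 4],
                         initializer=Initializer("XavierFill"))
  b = model.create_param('fc_b', shape=[10],
                         initializer=Initializer("ConstantFill", value=0.0))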

Reviewed By: kennyhorror

Differential Revision: D5127797

fbshipit-source-id: 4db155d320b1862570c23b77c4252bdacbf2296f
2017-05-25 22:03:15 -07:00
Andrey Malevich
64e04e78d2 Remove AddOperator from ModelHelper
Summary:
It looks like AddOperator was never really used (I searched across the whole
code-base). In addition, all model_helper functionality is getting replaced
with brew, so I'd prefer to remove this method to reduce the amount of code
touching model.params.

Reviewed By: rayleichen

Differential Revision: D5110425

fbshipit-source-id: f2a88e4c1ce5149d27e809e03da9a86c0867bc4d
2017-05-23 13:34:45 -07:00
Yiming Wu
3eeca5b5e0 arg scope in ModelHelper
Summary: based on our discussion, we want an arg_map in ModelHelper and an arg_scope for that model within brew. This diff implements that.
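
A sketch of the idea (the exact merge semantics are an assumption):

  from caffe2.python import brew, model_helper

  # Defaults stored in the model's arg_map are merged into each brew
  # helper call unless the call site overrides them.
  model = model_helper.ModelHelper(name="demo",
                                   arg_scope={'order': 'NCHW'})
  # Picks up order='NCHW' from the model's arg scope.
  conv1 = brew.conv(model, 'data', 'conv1', dim_in=3, dim_out=16, kernel=3)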

Reviewed By: salexspb

Differential Revision: D5042983

fbshipit-source-id: ddd2c7e9bca1be2f08a32f7252b44d3b60a57996
2017-05-12 11:18:59 -07:00
Eider Moore
0c6099ce25 Add __dir__ so autocomplete in iPython works.
Summary: It is good practice to provide __dir__ whenever __getattr__ is defined so that tooling will work intelligently. In particular, it is hard to explore the available methods in IPython without tab completion.
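
The pattern in general form (an illustration of the practice, not the Caffe2 code itself):

  class Wrapper(object):
      """Exposes entries of a dict as attributes."""
      def __init__(self, ops):
          self._ops = ops

      def __getattr__(self, name):
          try:
              return self._ops[name]
          except KeyError:
              raise AttributeError(name)

      def __dir__(self):
          # Merge the dynamic attributes into the default listing so
          # that IPython tab completion can discover them.
          return sorted(set(list(self.__dict__) +
                            dir(type(self)) + list(self._ops)))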

Reviewed By: dzhulgakov

Differential Revision: D5006545

fbshipit-source-id: 1a150d91d54637d80b292764513943ff70d971b4
2017-05-05 11:32:06 -07:00
Bor-Yiing Su
13bdd4ec05 Replaces the non-existing _param_init_net net by raising an exception.
Summary:
The _param_init_net does not exist; all other places reference
param_init_net instead. So far no one has encountered any problems
because all the passed params are BlobReferences. This diff makes
that assumption explicit.
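
Roughly, the change amounts to this kind of explicit check (an illustrative sketch; the function name is hypothetical):

  from caffe2.python import core

  def _normalize_param(param):
      # String params would have needed the nonexistent _param_init_net,
      # so reject them explicitly instead of failing later.
      if not isinstance(param, core.BlobReference):
          raise ValueError(
              "Param %s must be a BlobReference; plain strings are not "
              "supported." % str(param))
      return param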

Reviewed By: azzolini

Differential Revision: D4922930

fbshipit-source-id: e6dbd7a29ea640b7e62fcfec7ced3cc7d149f872
2017-04-26 10:35:45 -07:00
Yiming Wu
0bb558716a rename model_helpers to brew and lowercase all helper functions
Summary:
rename model_helpers to brew. This is a big diff now. I did these things:

1. replace model_helpers with brew:

  find . -type f -exec sed -i 's/model_helpers/brew/g' {} +

2. rename model_helpers.py and model_helpers_test.py
3. rename ModelHelpersTest to BrewTest
4. lowercase all the helper functions to distinguish them from single op
5. run my unittests
6. run converge tests

Reviewed By: salexspb

Differential Revision: D4930465

fbshipit-source-id: f420a1b03238df1cbe9f4426e0b9c43a12119661
2017-04-24 15:52:26 -07:00
Yiming Wu
bef6e45f8b rename ModelHelperBase
Summary:
rename ModelHelperBase to ModelHelper.

This is the result of running:

  find . -type f -exec sed -i 's/ModelHelperBase/ModelHelper/g' {} +

We had 19 results when running fbgs ModelHelperBase. There are 20 instances here because I added 1 test in model_helpers_test.py

Reviewed By: salexspb

Differential Revision: D4928337

fbshipit-source-id: bc4c12b60b90c167e717de50ea9fe17521e142e3
2017-04-24 15:52:26 -07:00
Aapo Kyrola
c7d284a03b ability to disable inputs for extract predictor net
Summary:
This is not a super-elegant solution, but a working one, to fix the Newsfeed team's problem of extracting a predictor net from a net that has a "side chain" they want to cut out of the middle.

This adds an argument to ExtractPredictorModel that allows one to define "disabled inputs". These are inputs that we want to switch off, so that all operators that depend on them are removed from the model.
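
A sketch of the new argument (the argument name comes from the summary; the rest of the call shape is assumed):

  from caffe2.python import model_helper

  # train_model: an existing training ModelHelper (assumed to exist).
  # Disabling 'label' removes every op that depends on it, cutting the
  # unwanted side chain out of the extracted predictor.
  predict_net = model_helper.ExtractPredictorNet(
      train_model.net.Proto(),
      input_blobs=['data'],
      output_blobs=['softmax'],
      disabled_inputs=['label'],
  )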

Differential Revision: D4839953

fbshipit-source-id: 5d16df6f0ec4aac6670e6917efc77abde5d75c95
2017-04-06 17:05:32 -07:00
Yiming Wu
b922b19bfd add weights bias to modelhelperbase
Summary: add weights and bias to modelhelperbase

Reviewed By: salexspb

Differential Revision: D4837125

fbshipit-source-id: 6a357c0e3d07d35aa6cdeb8ef803976646b9dbe6
2017-04-06 11:16:55 -07:00
Aapo Kyrola
d604961b26 check for ExtractPredictorNet for is_test arguments
Summary: To prevent others from making the same mistake I did, check that no op has an is_test=0 argument when ExtractPredictorNet is called.
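
The guard amounts to scanning op arguments, roughly like this (illustrative sketch over the NetDef protobuf):

  def _check_is_test(net_proto):
      for op in net_proto.op:
          for arg in op.arg:
              assert not (arg.name == 'is_test' and arg.i == 0), (
                  "Op %s has is_test=0; build the model with is_test=1 "
                  "before extracting a predictor net" % op.type)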

Reviewed By: viswanathgs

Differential Revision: D4796425

fbshipit-source-id: 38c14df6bcc767ec2e6a6e35ee79596a5dab531c
2017-03-29 12:48:54 -07:00
Aaron Markham
58f7f2b441 doxygen python block added
Summary: Closes https://github.com/caffe2/caffe2/pull/226

Differential Revision: D4793550

Pulled By: JoelMarcey

fbshipit-source-id: cc33e58186304fa8dcac2ee9115dcc271d785b1e
2017-03-29 06:46:16 -07:00
Jerry Pan
ee28b6ce22 Caffe2: instrument Everstore loader
Summary: Caffe2: instrument Everstore loader and log to Scuba

Differential Revision: D4669060

fbshipit-source-id: 603256e4ba62a32d9aeadc409f83ef9b1f6a7358
2017-03-27 10:02:11 -07:00
James Cross
79c3a3af54 add gpu support for caffe2-seq2seq
Summary: Adding synchronous optimization on GPUs to the translation training pipeline, via data_parallel_model.Parallelize_GPU, which needs to be updated so there is some way of performing sparse parameter updates (e.g., on embedding tables), whether on GPU or CPU.

Reviewed By: urikz

Differential Revision: D4631914

fbshipit-source-id: 9cdd655f7dbda3f9b2733d459228b3e097892441
2017-03-17 05:19:14 -07:00
Aapo Kyrola
91f468b15c fixes to make data parallel model work for RecurrentNet + test case
Summary:
First, this diff includes a full test of data-parallel LSTM, which confirms it works correctly. To make it work, some changes had to be made:
 - cell net/step net external inputs must be namespace scoped
 - prevent double-namescoping of cellnet inputs
 - make data parallel model understand recurrentnets so the device-mapping works

Reviewed By: salexspb

Differential Revision: D4708840

fbshipit-source-id: 4b0ddc43642d449076a2b6f67ad1c47f84138ff4
2017-03-14 15:48:07 -07:00
Aapo Kyrola
fc7939c25b add model_helper.ExtractPredictorNet()
Summary:
It has been a pain to save predictor-compatible models from Caffe2. This diff adds a function, ExtractPredictorNet, that takes a training model and outputs a predictor model by removing all operators that are not relevant for prediction, such as the backward pass and dequeue ops for input loading (in a predictor, the input data is an external input).

We can also consider including this directly in the predictor exporter for FB usage.
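
Basic usage looks roughly like this (blob names are illustrative; the return shape is assumed for this version):

  from caffe2.python import model_helper

  model = model_helper.ModelHelper(name="train")
  model.net.FC(['data', 'w', 'b'], 'pred')
  model.net.Softmax('pred', 'softmax')
  # Only ops on the data -> softmax path survive; backward-pass and
  # dequeue ops are stripped out.
  predict_net = model_helper.ExtractPredictorNet(
      model.net.Proto(), input_blobs=['data'], output_blobs=['softmax'])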

Reviewed By: rpenggithub

Differential Revision: D4693264

fbshipit-source-id: e81abbbec0bd4d717159cf36488d0baaf0130090
2017-03-13 16:32:04 -07:00
Zachary Mirman
1c92e85dae Added editDistance helper to caffe2 operators
Summary: Added editDistance helper to caffe2 operators

Differential Revision: D4622152

fbshipit-source-id: 4d6246b8226c1283d5883edfaa27e8f7748fdc4c
2017-02-28 13:31:56 -08:00
Min Li
182c168285 Add group collector limit and add option for enable sum loss
Summary: As the title says: add a limit on the number of examples for the group collector, and add an option for enabling sum loss in BatchLRLoss.

Reviewed By: xianjiec

Differential Revision: D4602311

fbshipit-source-id: 5b2a244f1f0e9f1ab0f4590e94828fd18d018d8d
2017-02-23 15:03:22 -08:00
Yury Zemlyanskiy
40534de705 Gradient for Copy operator
Summary:
One can find the reason why I need a gradient for CopyOp in this post: https://fb.facebook.com/groups/1405155842844877/permalink/1639683782725414/

The gradient for CopyOp is trivial when the device stays the same (CPU, or the same GPU), but gets a little harder when the copy was made across two different GPUs.
I introduce a new operator, CopyOnDeviceLike, which has an additional second input. The op copies the first input to the same device as the second one. The default implementation is exactly the same as CopyOp, but I specialize it for CUDAContext.

Please let me know if I'm doing anything wrong here! This is my first Caffe2 diff related to operator definitions.
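
The new op's calling convention, in sketch form (blob names illustrative):

  from caffe2.python import core

  net = core.Net("copy_demo")
  # Copy input 0 to the device on which input 1 lives; input 1's data is
  # not read, it only fixes the placement.
  x_on_y = net.CopyOnDeviceLike(['X', 'Y'], 'X_on_device_of_Y')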

Reviewed By: Yangqing

Differential Revision: D4557258

fbshipit-source-id: 9494be589cc1e5696bbbfe25b7622aaa4c9efe4a
2017-02-16 06:11:27 -08:00
Aapo Kyrola
bb928f3cc0 Latest fixes to Xray Flow workflows for Caffe2
Summary:
(Ignore the convolution-op-related changes; they will be patched separately later.)

This diff includes work from the last few weeks:
- some refactoring of the flow ops
- no_bias setting
- MAP computation (instead of accuracy) for OC
- adaptive learning rate for Xray concepts
- various small bug fixes

Reviewed By: viswanathgs

Differential Revision: D4329500

fbshipit-source-id: 000d4fd22ec408af5290480c788eb86546bff52e
2017-01-10 12:59:23 -08:00
Ou Jin
a4f3721e15 weightedsum on ps
Summary:
Rewrite of D3993337 based on the new stack.
Compared to the old one, we need more readers to achieve the same speed. But so far the speed is the same, and the new bottleneck is the write bandwidth of the trainer. Model quality is the same as the baseline.

Reviewed By: azzolini

Differential Revision: D4310803

fbshipit-source-id: 6d04ae8040c1ee7caa9aea5287f054e73fbe325a
2016-12-22 19:14:38 -08:00
Yury Zemlyanskiy
c2d28fb874 RNNs API simplification
Summary:
This is a first step in improving our RNN story. It provides a wrapper around the current RecurrentNetworkOp implementation which infers most of the redundant parameters and makes the API much simpler.

Also, in order to support general step nets, I added an extra argument to RecurrentNetworkOp.

Future work:

1. Inferring step net output and internal blobs (scratches) sizes and type
2. Avoid accessing blobs by names in c++ part
3. Remove requirement for inputs / output 1:1 correspondence in the step net
4. Make the Python API support networks with operators like Sum on the border of the cell net (currently there is an issue with such networks where gradient blobs which are on the side are not explicitly created).

Differential Revision: D4268503

fbshipit-source-id: f8a66491c2b55daa730caeed7e9f2b3921541b49
2016-12-21 09:29:43 -08:00
Yangqing Jia
4858a6bc6f snapshot -> checkpoint
Summary:
This renames the "Snapshot" op to "Checkpoint", as we discussed earlier.

The old Snapshot name is still available, but we should move to the new name and
eventually deprecate the old one.

The Python SnapshotManager should also be changed, cc azzolini

Reviewed By: dzhulgakov

Differential Revision: D4272021

fbshipit-source-id: 4b8e029354416530dfbf0d538bfc91a0f61e0296
2016-12-15 12:01:30 -08:00
Aapo Kyrola
6191de7ac9 gradients for CopyGPUToCPU and CopyCPUToGPU + unit test + schema
Summary: Added gradients for the Copy operators; they are simply the reverse operation. Also added a unit test to verify things actually work, and added the operator schema and registration to model_helper's known operators.

Differential Revision: D4306516

fbshipit-source-id: dd0633fa7f2ed01991990e56e63669794df037d9
2016-12-15 12:01:29 -08:00
Aapo Kyrola
eddf23ca0f Handle parameters that are computed but not optimized
Summary:
prigoyal sharply noticed a bug in the Resnet models: we have not been checkpointing, nor synchronizing between GPUs, the moving average and variance computed by the SpatialBN ops. The first problem in particular is serious, since models starting from a checkpoint would have started from a null state for SpatialBN. Not synchronizing within the data parallel model is less tragic, since each GPU should see very similar data.

Thus I propose keeping track of "computed params", i.e., params that are computed from data but not optimized. I don't know if there are other examples, but SpatialBN's moving avg and var definitely are one.

- I modified the checkpointing for the Xray model to store those blobs, and also ensured the synchronization of those blobs
- I modified data_parallel_model to broadcast those params from gpu 0. I first tried averaging, but hit some NCCL deadlocks ... :(
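
In sketch form (method names as on later ModelHelper versions; treat as assumptions):

  # model: an existing ModelHelper (assumed). A checkpoint should cover
  # both the optimized and the computed state.
  save_blobs = model.GetParams() + model.GetComputedParams()
  # data_parallel_model additionally broadcasts the computed params
  # (e.g. SpatialBN's moving mean/var) from gpu 0 to the other GPUs.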

Differential Revision: D4281265

fbshipit-source-id: 933311afeec4b7e9344a13cf2d38aa939c50ac31
2016-12-15 12:01:28 -08:00
Ou Jin
e8b7ec1393 disable local update for sparse features
Summary:
With a parameter server, sparse features are updated on the parameter server,
and local updates for sparse features are disabled. But that logic was removed
in D4144922. This diff adds it back in a slightly different way.

Previously, in trainer_example, I did this in a hacky way: just avoid adding
the sparse weight to model.params. It still generates the grad, but does not
add optimization operators. At the same time, the weight is always registered
directly in the sparse_mapping, so the parameter server is aware of this
parameter. But with the new change for ParameterInfo I cannot do it that way
anymore, because the param registry and params are bound together in
ParameterInfo.

For dper, there is an option in the dper model helper to disable all of the
sparse parameter optimizers.

To combine these two cases, I directly changed ModelHelperBase in this diff.
It is not quite ideal; it would be better to do it in Layer. But to fix the
old one, this seems to be the more reasonable place to cover both cases.

With this diff, there is no spike anymore, so this is probably the root cause
of the convergence issue we saw in D4144922. It also explains why the model
can recover: Adagrad decays the local learning rate, so local updates cause
less change.

Reviewed By: dzhulgakov

Differential Revision: D4229684

fbshipit-source-id: da1241d43d7c52cbf13560f9bb83e09897d8d56f
2016-11-29 15:18:38 -08:00
Huazhong Ning
6ebae91d24 multi-task learning: save model and evaluator
Summary:
This consists of a series of diffs implementing multi-task learning.
This diff:
1. saves the model;
2. supports MT learning in the evaluator;
3. adds a unittest.

model after merging (saved model): https://our.intern.facebook.com/intern/graphviz/?paste=56793140

Reviewed By: xianjiec

Differential Revision: D4123316

fbshipit-source-id: 225bf8616962ec08f4f1ef85729c1e94ba7c373a
2016-11-29 15:18:38 -08:00
Aapo Kyrola
b77aa551a4 add missed comma
Summary: D4205610 missed a comma, causing unnecessary logspill with the WeightedSum op.

Reviewed By: Yangqing

Differential Revision: D4222806

fbshipit-source-id: ff17c20eae7a7168475f39cc227d3e8ab347288f
2016-11-29 15:18:37 -08:00
Aaron Jaech
c41f0d27c4 adding more things to the list of known operators in model_helper
Summary: This is so they don't generate spurious warning messages in the logs.

Reviewed By: dzhulgakov

Differential Revision: D4205610

fbshipit-source-id: f764b51565430f4057898ab929372bc7943e0495
2016-11-29 15:18:35 -08:00
Yangqing Jia
589398950f fbsync at f5a877 2016-11-18 15:41:06 -08:00
Yangqing Jia
238ceab825 fbsync. TODO: check if build files need update. 2016-11-15 00:00:46 -08:00
Yangqing Jia
d1e9215184 fbsync 2016-10-07 13:08:53 -07:00