Commit Graph

776 Commits

Author SHA1 Message Date
Aapo Kyrola
27e01744b2 Probably fixed memonger
Summary:
This diff fixes various issues with memonger, and works at least with rbgirshick's failure case, Resnet-50, and a new, harder unit test. I will still create a proper resnet50 test.

1) Introduce the concept of "tokens". These are passed down the dependency chains, and a blob can be used for recycling only if it owns all the tokens currently in play. Tokens are added when branching, and redeemed once all inputs are satisfied. This is a bit hard to explain (see the sketch after this list).
2) There were various bugs due to bad code: the free_blobs data structure has a different type depending on whether we have blob sizes or not. I plan to rewrite this soon, but fixed the resulting bugs for now.
3) Added a harder unit test that failed before.
4) Added test for resnet50 + memonger
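
The token bookkeeping happens inside memonger itself; callers just hand it a finished training net. A minimal usage sketch, assuming the memonger.share_grad_blobs entry point with roughly this argument order (net, losses, gradient blobs, namescope); the exact signature is an assumption:

```
from caffe2.python import brew, core, memonger, model_helper

train_model = model_helper.ModelHelper(name="memonger_example")
fc1 = brew.fc(train_model, "data", "fc1", dim_in=8, dim_out=8)
pred = brew.fc(train_model, fc1, "pred", dim_in=8, dim_out=4)
softmax, loss = train_model.SoftmaxWithLoss([pred, "label"], ["softmax", "loss"])
train_model.AddGradientOperators([loss])

# Rewrites the gradient part of the net so activation/gradient blobs are
# recycled wherever the token analysis says it is safe.
optimized = memonger.share_grad_blobs(
    train_model.net,
    [loss],
    set(train_model.param_to_grad.values()),
    namescope="",
)
train_model.net = core.Net(optimized)
```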

Reviewed By: asaadaldien

Differential Revision: D5193393

fbshipit-source-id: bc2a714877aa1201c32a5ba8ade862865e455711
2017-06-08 09:19:24 -07:00
Aapo Kyrola
feba1eed00 resnet50: fetch right lr
Summary: I broke resnet50 when switching to use optimizer, which uses an LR per parameter. This only happens after each epoch, and I did not test patiently enough. As a stop-gap, while asaadaldien works on a better solution, just fetch the LR of the conv1_w param.

Reviewed By: asaadaldien

Differential Revision: D5207552

fbshipit-source-id: f3474cd5eb0e291a59880e2834375491883fddfc
2017-06-07 21:46:35 -07:00
Yiming Wu
4fefff0bbb Auto injecting device copy for single net and several nets
Summary:
This diff plans to attack the problem where we want to just annotate device options on operators and leave it to Caffe2 to inject the cross-device copy functions. This feature would be useful for mixed-device training and multi-device training with several nets, where previously we did the heavy lifting of adding copy functions ourselves.

Ideally, this feature will happen like this:

      # construct your nets first
      core.InjectDeviceCopyAmongNets([train_init, train_net, ...])

My ideas are written in comments. I will update them here as well later.
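
A slightly fuller sketch of the intended workflow, using the helper name from the snippet above (the final API name and return value are assumptions): annotate operators with device options and let Caffe2 inject the cross-device copies.

```
from caffe2.proto import caffe2_pb2
from caffe2.python import core

train_net = core.Net("mixed_device_net")
with core.DeviceScope(core.DeviceOption(caffe2_pb2.CPU)):
    data = train_net.ConstantFill([], "data", shape=[2, 4], value=0.0)
with core.DeviceScope(core.DeviceOption(caffe2_pb2.CUDA, 0)):
    out = train_net.Relu(data, "out")  # consumes a CPU blob from a CUDA op

# Expected to insert the needed CopyCPUToGPU / CopyGPUToCPU operators.
core.InjectDeviceCopyAmongNets([train_net])
```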

Reviewed By: dzhulgakov

Differential Revision: D5134103

fbshipit-source-id: 173f7da9d1773d1c50ccdc27f1b5cd3067b04af5
2017-06-07 20:03:18 -07:00
Peizhao Zhang
87a12dd355 Caught exception when fetching uninitialized blobs when collecting blob sizes in workspace.
Summary: Caught exceptions when fetching uninitialized blobs while collecting blob sizes in the workspace. Some output blobs (like the mask output of Dropout when is_test=1) may be nullptr, so FetchBlob will fail.
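
A small Python illustration of the failure mode described above (not the diff itself): fetching a blob that exists but was never written raises from FetchBlob, so code that sweeps the whole workspace for sizes has to tolerate it.

```
from caffe2.python import workspace

blob_sizes = {}
for name in workspace.Blobs():
    try:
        blob_sizes[name] = workspace.FetchBlob(name).nbytes
    except Exception:
        # Uninitialized or non-tensor blobs cannot be fetched; skip them.
        pass
```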

Differential Revision: D5198641

fbshipit-source-id: 45ee26c4cb1c25cc48904e9f7d7c007224c97418
2017-06-07 15:35:32 -07:00
Ran Xian
4316fb4876 Implement APMeter op
Summary: Implements an APMeter operator (APMeterOp) to calculate AP for multiclass classification given prediction scores and labels. The op takes a score tensor [nsamples x nclasses] and a label tensor [nsamples x nclasses], and outputs a float tensor of size nclasses holding the AP for each class.
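
A hedged sketch of driving the op from Python, assuming it is registered under the op type "APMeter" and that labels are 0/1 integers:

```
import numpy as np
from caffe2.python import core, workspace

scores = np.random.rand(8, 3).astype(np.float32)        # nsamples x nclasses
labels = (np.random.rand(8, 3) > 0.5).astype(np.int32)  # assumed int labels

workspace.FeedBlob("scores", scores)
workspace.FeedBlob("labels", labels)
workspace.RunOperatorOnce(
    core.CreateOperator("APMeter", ["scores", "labels"], ["ap"])
)
print(workspace.FetchBlob("ap"))  # float tensor of size nclasses
```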

Reviewed By: akyrola

Differential Revision: D5082565

fbshipit-source-id: ae7304bc8fc999c361245b9aec38eb9a5f5eef4b
2017-06-07 15:03:04 -07:00
Zhicheng Yan
ee3727db00 add_helper_function_ElementwiseLinear_op
Summary:
Add a helper function for the parametric op ElementwiseLinear.
The typical syntax is model.ElementwiseLinear(input, output, dimension).
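
A hedged sketch following the syntax quoted above; the argument meanings (input blob, output blob, inner dimension) and the helper creating its own scale/bias parameters are assumptions:

```
from caffe2.python import model_helper

model = model_helper.ModelHelper(name="ew_example")
# Expected to create the per-element scale and bias params itself and emit
# an ElementwiseLinear op computing roughly y = w * x + b for an input whose
# inner dimension is 16.
model.ElementwiseLinear("x", "y", 16)
```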

Reviewed By: harouwu, akyrola

Differential Revision: D5114152

fbshipit-source-id: 8e8c691f824f518ae510a72ab0c12de1b018f3b5
2017-06-07 13:49:48 -07:00
James Cross
98825d1323 guard against special case of in-place operation
Summary:
There is an edge case where internal gradient blobs of the backward step net should not be considered internally calculated if the only "internal" calculation is in-place.

In the case of the failing attention unit tests, the offending blob was attention_weighted_encoder_context_grad, which was incorrectly considered internal because it was the output (as well as input) of a Reshape on the step net's edge. The caveat here is that the results may be unpredictable if a non-pass-through in-place operation is applied to a blob within a step net that is both consumed internally and used as a recurrent state/output. (This is an extreme edge case, and difficult to explicitly enforce, but it's worth noting.)

Reviewed By: salexspb

Differential Revision: D5198328

fbshipit-source-id: 0cfa8f903fd767fc50e727f238ac3d8cdca03fe0
2017-06-07 12:33:31 -07:00
Thomas Dudziak
d524d5b481 Fixes zip/izip for Python 3
Summary: As title
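
A hedged illustration of the usual shape of this kind of fix (the diff itself is not shown here): on Python 2, itertools.izip is the lazy zip, while on Python 3 it is gone and the builtin zip is already lazy.

```
try:
    from itertools import izip as zip  # Python 2: use the lazy iterator
except ImportError:
    pass  # Python 3: the builtin zip is already lazy
```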

Reviewed By: salexspb

Differential Revision: D5154186

fbshipit-source-id: 2ef24557d82ae16d3bdfbc90a4cc96be8e2dc6c3
2017-06-07 00:04:26 -07:00
Thomas Dudziak
60c78d6160 Fixes range/xrange for Python 3
Summary: As title

Differential Revision: D5151894

fbshipit-source-id: 7badce5d3122e8f2526a7170fbdcf0d0b66e2638
2017-06-07 00:04:26 -07:00
Ahmed Taei
4c5d101caf Implement ColwiseMax and RowwiseMax reduction ops.
Differential Revision: D5192949

fbshipit-source-id: e7e877b4bea19dd1be94449d45d2733f4858b8e7
2017-06-06 21:17:29 -07:00
Aarti Basant
93ac6a9837 checkpointing for distributed hive reader.
Summary:
The goals of this diff are:
1) Enable checkpointing to honor batches_per_epoch
2) Resume hive_readers mid-split

Reviewed By: azzolini

Differential Revision: D5004212

fbshipit-source-id: 2ff5df30ba946eefadd109d80056cde67398a080
2017-06-06 14:20:06 -07:00
Wenyi Huang
7723129d14 Add gradient for topK op
Summary:
Input of topK op: X (dense)
Output of topK op: Value and Indices (sparse representation)
Value will have a gradient in some cases.

We backprop (copy) the gradient from the sparse side (dValue) to the dense side (dX).
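
A toy numpy sketch of the backward rule described above: the gradient of the dense X is zero everywhere except at the top-k positions, where it is copied from the gradient of Value.

```
import numpy as np

x = np.random.rand(4, 6).astype(np.float32)   # dense input X
k = 2
indices = np.argsort(-x, axis=1)[:, :k]       # top-k indices per row
d_value = np.ones((4, k), dtype=np.float32)   # incoming gradient of Value

d_x = np.zeros_like(x)                        # gradient of X
rows = np.arange(x.shape[0])[:, None]
d_x[rows, indices] = d_value                  # copy sparse grad into dense grad
```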

Differential Revision: D5133461

fbshipit-source-id: 7bad55b60e8a22dfe0e51357ce2099d7f752c133
2017-06-06 14:20:06 -07:00
Xiangyu Wang
c9c862fa8f 16117716 [Caffe2 OSS] make char-rnn exapmle use build_sgd
Summary: replace the hand-made SGD with build_sgd.
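
A hedged sketch of the replacement (layer sizes and hyperparameters are illustrative, not the ones in the example): build_sgd wires up the iteration counter, learning-rate schedule, and per-parameter update ops that were previously written by hand.

```
from caffe2.python import brew, model_helper, optimizer

model = model_helper.ModelHelper(name="char_rnn_sketch")
pred = brew.fc(model, "data", "pred", dim_in=16, dim_out=4)
softmax, loss = model.SoftmaxWithLoss([pred, "label"], ["softmax", "loss"])
model.AddGradientOperators([loss])

optimizer.build_sgd(model, base_learning_rate=0.1,
                    policy="step", stepsize=1, gamma=0.9999)
```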

Reviewed By: salexspb

Differential Revision: D5186331

fbshipit-source-id: 3c7b4b370e29a1344b95819766463bae3812c9a6
2017-06-06 13:54:59 -07:00
Dmytro Dzhulgakov
80fe2e5caf Fix from_column_list
Summary: Previous implementation relied on the order of fields for some reason.

Reviewed By: azzolini

Differential Revision: D5164478

fbshipit-source-id: 12717310860584e18ce4ca67d0bd5048354cdc0a
2017-06-06 01:17:02 -07:00
Yiming Wu
8cd208ad6f Infer input and output device from OperatorDef through OperatorSchema
Summary: Infer input and output devices from an OperatorDef through its OperatorSchema. This is inspired by shape inference. With this feature, we can easily analyze device information for all blobs in the net in a generic way. It is really helpful for automatic cross-device execution.

Reviewed By: akyrola, dzhulgakov

Differential Revision: D5161065

fbshipit-source-id: ee656123112171a4ca00f2fb3f6940f32ddf3135
2017-06-05 23:47:33 -07:00
Andrey Malevich
a5fc70857c Support fetching of the parameters from the global namescope by ''
Summary:
This diff fixes fetching of the parameters in the global namescope. An earlier
diff that switched to '' introduced this bug.

Reviewed By: dzhulgakov

Differential Revision: D5189667

fbshipit-source-id: 4818e99e2c2c90788e70e0b8b6204ec6f471d37d
2017-06-05 22:32:39 -07:00
Fedor Borisyuk
686470a6b8 Feature importance in dper 2.0: build network representation
Summary: Changes to enable feature importance.

Reviewed By: kennyhorror

Differential Revision: D5075252

fbshipit-source-id: e5d46e129bcd5cbef77932c63b5a288dd57775d1
2017-06-05 18:03:34 -07:00
Wael Abdelghani
ebecafbcca Support for position weighted in distributed PS
Summary: Title

Reviewed By: azzolini

Differential Revision: D5081871

fbshipit-source-id: 68a97c2112522fbcbcdfd9e0f717b8bce60fe028
2017-06-05 17:04:42 -07:00
Wael Abdelghani
5447f5c0d7 Move position weighted to separate layer
Reviewed By: kennyhorror

Differential Revision: D5063086

fbshipit-source-id: 212c08946728437bcc8b6049438ae82235137ec6
2017-06-05 15:49:22 -07:00
James Cross
f1c971d04b add ExpandDims to _known_working_ops
Summary: ExpandDims is a trivial utility op which should not be triggering a warning when used by ModelHelper.

Reviewed By: akyrola

Differential Revision: D5117985

fbshipit-source-id: 5589f46f58458f5019924b48602db088563f2fee
2017-06-05 15:49:21 -07:00
Aapo Kyrola
5e6bd4fbfc Return predict params from ExtractPredictorNet + test
Summary:
Make it easier for users by returning from ExtractPredictorNet the list of blobs that must be saved/exported to run a predictor net. Added a test for ExtractPredictorNet.

Codemod.
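
A hedged sketch of the new return value, assuming ExtractPredictorNet keeps its existing arguments and now also returns the blobs to export:

```
from caffe2.python import brew, model_helper

train_model = model_helper.ModelHelper(name="train")
brew.fc(train_model, "data", "fc1", dim_in=8, dim_out=4)
train_model.net.Softmax("fc1", "softmax")

predict_net, export_blobs = model_helper.ExtractPredictorNet(
    train_model.net.Proto(),
    input_blobs=["data"],
    output_blobs=["softmax"],
)
# export_blobs lists the params (e.g. fc1_w, fc1_b) to save for the predictor.
```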

Reviewed By: asaadaldien

Differential Revision: D5176097

fbshipit-source-id: b1af42132459487b8d94fcdde0e4c514da608243
2017-06-05 15:34:37 -07:00
Ross Girshick
8e99824ce7 Allow subsets of gradient outputs / inputs in Python ops
Summary:
I'm using Python ops in a project and need corresponding Python gradient ops. For my use case, only a subset of the forward op outputs have gradients and only a subset of forward op inputs have gradients. However the current implementation of `GetPythonGradient` forces all grad inputs and outputs to exist. This diff allows one to specify that only a subset of grad inputs / outputs are used when constructing the Python op.

I'm not sure if this is up to caffe2 standards, so please push back on style and content as needed.

Reviewed By: dzhulgakov

Differential Revision: D4897004

fbshipit-source-id: 96fffe8634c51a49b6bce7339a46c6235f7d4bbd
2017-06-05 12:52:01 -07:00
Yiming Wu
8871ef029b quick fix future issue with brew/core/schema/workspace/scope/utils.py
Summary:
Fixing the missing future package issue.

Recently we found that some of our users do not have the future module installed, so we might need a try/except wrapper around all `past` imports.
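
A hedged illustration of the try/except wrapper mentioned above (module names follow the usual python-future layout; the exact imports touched by this diff are assumptions):

```
try:
    from past.builtins import basestring
except ImportError:
    # python-future is not installed; fall back to the Python 3 builtin.
    basestring = str
```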

Reviewed By: Yangqing

Differential Revision: D5183547

fbshipit-source-id: 262fdf2940ee1be4454bf0b0abb9e6a0f1a0ee82
2017-06-05 12:01:48 -07:00
Andrey Malevich
77c1027abb Create ParameterSharing abstraction for Caffe2.
Summary:
This diff introduces abstractions for parameter sharing for all the
parameters that are created through the new create_param syntax (a usage sketch follows the list below).

Possible use cases of this parameter sharing:
1. Share params within RNN interface.
2. Some complicated models that might share some of the branches.
3. TODO (next diff): Cross-model parameter sharing.
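
A hedged usage sketch (the context-manager name and mapping format are assumptions about the new API): params created under one namescope resolve to the ones already created under another.

```
from caffe2.python import core, model_helper
from caffe2.python.modeling.initializers import Initializer
from caffe2.python.modeling.parameter_sharing import ParameterSharing

model = model_helper.ModelHelper(name="shared")
with ParameterSharing({"branch_b": "branch_a"}):
    with core.NameScope("branch_a"):
        w_a = model.create_param("w", shape=[4, 4],
                                 initializer=Initializer("XavierFill"))
    with core.NameScope("branch_b"):
        # Resolves to branch_a/w instead of creating a second blob.
        w_b = model.create_param("w", shape=[4, 4],
                                 initializer=Initializer("XavierFill"))
```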

Reviewed By: salexspb

Differential Revision: D5160935

fbshipit-source-id: c6d40a5ed7ead240cd7db0eb69de6dc5f505b05a
2017-06-05 11:49:54 -07:00
Andrey Malevich
e05173a476 Create ExternalInitializer to simplify logic around init_params = False
Summary:
This diff creates a new type of Initializer, ExternalInitializer. This
initializer is meant to be used in cases where the parameter blob is already
expected to exist in the workspace.
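
A hedged sketch (the import path is an assumption): the param is registered on the model, but no fill op is emitted into param_init_net because the blob is expected to already exist in the workspace, e.g. after loading a checkpoint.

```
from caffe2.python import model_helper
from caffe2.python.modeling.initializers import ExternalInitializer

model = model_helper.ModelHelper(name="finetune", init_params=False)
w = model.create_param("fc_w", shape=[4, 8], initializer=ExternalInitializer())
```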

Reviewed By: dzhulgakov

Differential Revision: D5171322

fbshipit-source-id: d27861f0f80afdea93c235d49f63da19adccc92c
2017-06-02 18:22:50 -07:00
Andrey Malevich
a8fb85797c Refactoring of the parameters step 0. Add simple tags and unify interface for params and computed_params.
Summary:
This diff is the first step in the effort to refactor all parameters. As a first step, I'm merging the concepts of params and computed_params, which is going
to be based on tags instead (in the first version it still uses the old data structs to store all the BlobReferences).

Renaming computed_params to non-trainable/non-backprop params should be done in some other diff.

Reviewed By: salexspb

Differential Revision: D5171159

fbshipit-source-id: 68031ca779f053fb266a7c4a2e5b482a3bd9c832
2017-06-02 17:17:57 -07:00
Tao Wu
3bd6195891 removed Sum from simple_operator_layers.py; passed unit tests
Summary: removed softmax, sigmoid, tanh, relu from simple_operator_layers.py; passed all unit tests

Reviewed By: kittipatv

Differential Revision: D5150271

fbshipit-source-id: abe611bf6c5de5caba189181e9e41d705d8c5c54
2017-06-02 15:03:16 -07:00
Aapo Kyrola
401908d570 add_weight_decay + restore weight decay to resnet50_trainer
Summary:
Add add_weight_decay to optimizer + test.

In D5142973 I accidentally removed weight decay from resnet50 trainer, so this restores it.
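
A hedged sketch of the intended usage (argument order assumed from the summary): weight decay is registered on the model's weights and then the optimizer is built as usual.

```
from caffe2.python import brew, model_helper, optimizer

model = model_helper.ModelHelper(name="wd_example")
brew.fc(model, "data", "pred", dim_in=16, dim_out=4)
model.AddGradientOperators(["pred"])

optimizer.add_weight_decay(model, 1e-4)            # applied to weights only
optimizer.build_sgd(model, base_learning_rate=0.1)
```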

Reviewed By: asaadaldien

Differential Revision: D5173594

fbshipit-source-id: c736d8955eddff151632ae6be11afde0883f7531
2017-06-02 14:16:56 -07:00
Aaron Markham
a2ba169354 fixed operators schema output to work from only this file for OSS
Summary: old diff had some changes to formatter.py and generator.py, but now everything is in github.py

Reviewed By: bwasti

Differential Revision: D5165061

fbshipit-source-id: 5fe5ff70ff2c5525c7aacf20854916c86d272749
2017-06-02 13:47:25 -07:00
James Cross
4bed0c6d41 Update RNN Seq2SeqModelCaffe2EnsembleDecoder to reflect training network structure
Summary: Use new blob as residual sum output, and add scoping to prevent any name conflicts.

Reviewed By: urikz

Differential Revision: D5167145

fbshipit-source-id: a01c87ed2278205e95e8395314b166afb1dca1b3
2017-06-01 23:32:35 -07:00
Pooya Davoodi
2c97c98ca7 Enable testing the GPU implementations of Adagrad and Adam
Summary:
Enable testing the GPU implementations of Adagrad and Adam, including the sparse versions.
Closes https://github.com/caffe2/caffe2/pull/607

Reviewed By: dzhulgakov

Differential Revision: D5121552

Pulled By: Yangqing

fbshipit-source-id: da6b7dde456237c94cf74d00860e7327b2267eab
2017-06-01 18:10:57 -07:00
Kun Han
fc4d118e6b Caffe2 MemNN Production Model Saving
Summary:
Split the Caffe2 memory-based model into two parts:
- Dimension reduction MLP
- DNN with concatenation of memory and obj feature

Currently only the simple mean is implemented.

Differential Revision: D4866825

fbshipit-source-id: d2f6813402513ec9af30dbe29a50593e2d3cdb3b
2017-06-01 14:31:53 -07:00
Ahmed Taei
299f293cb2 Add initializer classes to conv_nd.
Summary: Fix parameters passed to _ConvBase

Reviewed By: sunwael

Differential Revision: D5166836

fbshipit-source-id: 6c2a9fa73cf1199a5f861900554f3075a49104fc
2017-06-01 14:17:55 -07:00
Simon Layton
58874ad5bf Fp16 training initializers
Summary:
Re-open for re-importing :)
Closes https://github.com/caffe2/caffe2/pull/721

Differential Revision: D5164345

Pulled By: akyrola

fbshipit-source-id: e80b32556cd25610602df91a4225b93edc0ca40b
2017-06-01 08:34:46 -07:00
Aapo Kyrola
ffbba0fae7 add model_helper Validate() + sprinkle around
Summary:
A recent diff introduced a duplicate parameter to the model, which would hurt performance and also affect correctness (duplicate momentum updates, for example). We unfortunately had no checks for duplicate params outside of data_parallel_model, which fortunately brought this to our attention.

But it is better to have a Validate() function in model_helper, and call that before adding gradient ops and querying for parameters. Added to brew_test calls as well.
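
A hedged sketch of the intended call site (assuming Validate() raises or asserts when the same parameter was registered twice): call it after the model is built and before gradients are added.

```
from caffe2.python import brew, model_helper

model = model_helper.ModelHelper(name="validate_example")
brew.fc(model, "data", "fc1", dim_in=8, dim_out=4)

model.Validate()                     # fails if any param is duplicated
model.AddGradientOperators(["fc1"])
```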

Reviewed By: kennyhorror

Differential Revision: D5163458

fbshipit-source-id: 35692e8bfcc359d4e8bc73e6f2358659f6e45ceb
2017-06-01 02:36:47 -07:00
Aapo Kyrola
0f8c8f37a8 Revert D5159712: [caffe2][PR] Fp16 training initializers
Summary: This reverts commit 60a889494d2e2f4df1d720331e19f638c5eb95cc

Differential Revision: D5159712

fbshipit-source-id: 16040c911b260648857f656f92b165f92c2daae0
2017-06-01 00:17:14 -07:00
Aapo Kyrola
076376f4f6 Revert D5119830: [C2] Refactoring of the parameters step 0. Add simple tags and unify interface for params and computed_params
Summary: This reverts commit 2001090a37346eb12abbb234e13e727c288eb8a7

Differential Revision: D5119830

fbshipit-source-id: bf321868338f0db85dff3237af7eaf74212dbdf6
2017-06-01 00:02:21 -07:00
Andrey Malevich
ff61ed358e Refactoring of the parameters step 0. Add simple tags and unify interface for params and computed_params
Summary:
This diff is the first step in the effort to refactor all parameters. As a
first step, I'm merging the concepts of params and computed_params, which is going
to be based on tags instead (in the first version it still uses the old data
structs to store all the BlobReferences).

Renaming computed_params to non-trainable/non-backprop params should be done in
some other diff.

Reviewed By: salexspb

Differential Revision: D5119830

fbshipit-source-id: 2001090a37346eb12abbb234e13e727c288eb8a7
2017-05-31 22:36:36 -07:00
Luke Yeager
d8d1cd1064 Test smaller tensors in segment_ops_test
Summary:
It's causing problems inside docker containers:

`InvalidArgument: Insufficient bytes of entropy to draw requested array.  shape=(5, 9, 10, 5), dtype=float32.  Can you reduce the size or dimensions of the array?  What about using a smaller dtype? If slow test runs and minimisation are acceptable, you  could increase settings().buffer_size from 8192 to at least 18432000.`
Closes https://github.com/caffe2/caffe2/pull/707

Differential Revision: D5162621

Pulled By: Yangqing

fbshipit-source-id: 55544210961cbc80828dca2cbeba6a5ace8cf8d1
2017-05-31 20:17:31 -07:00
Luke Yeager
e2cf007dc8 Avoid numpy VisibleDeprecationWarning in test
Summary:
This warning becomes an error with https://github.com/numpy/numpy/pull/6271 (numpy `>=1.12.0`).

```
caffe2/python/operator_test/tile_op_test.py::TestTile::test_tilewinput
  /opt/caffe2/caffe2/python/operator_test/tile_op_test.py:100: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
    dims[axis] = tiles
  /usr/lib/python2.7/dist-packages/numpy/lib/shape_base.py:873: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
    return c.reshape(shape_out)
```
Closes https://github.com/caffe2/caffe2/pull/710
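
A hedged illustration of the general shape of such a fix (not the exact diff): reduce the ndim-1 arrays coming out of the test harness to plain scalars before using them as an index or a dimension.

```
import numpy as np

dims = np.array([2, 3, 4])
axis = np.array([1])     # ndim-1 arrays drawn by the test harness
tiles = np.array([5])

dims[axis.item()] = tiles.item()   # explicit scalars avoid the warning
```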

Differential Revision: D5160776

Pulled By: Yangqing

fbshipit-source-id: b264e0e389de5817a289db878c15e655f9fa2f09
2017-05-31 20:01:30 -07:00
Simon Layton
2bfacff426 Fp16 training initializers
Summary:
Adds support for generating and training fp16 models. Added an SGD optimizer for multi-precision trainers and a new callback to data_parallel_model in order to help multi-precision models keep their different copies of parameters in sync during training.
Closes https://github.com/caffe2/caffe2/pull/697

Differential Revision: D5159712

Pulled By: salexspb

fbshipit-source-id: 60a889494d2e2f4df1d720331e19f638c5eb95cc
2017-05-31 17:46:58 -07:00
Ahmed Taei
f0f4c2fc5d Increase the number of DAG execution worker threads.
Reviewed By: akyrola

Differential Revision: D5158414

fbshipit-source-id: add377aec5588076db881a2a3750101710f29732
2017-05-31 15:19:19 -07:00
Aapo Kyrola
73a8a49c7e synchronize re-rendezvousing on node changes + support num_shards=1 rendezvous
Summary:
Currently we can get into broken situations when some nodes detectChanges() faster than others, so only some of the nodes start the next iteration of training. This is an inconsistent state. To prevent this from happening, each node now sets a "re-rendezvous flag" that is allreduced after each iteration. Once all nodes agree, re-rendezvous is done.

Also noticed that num_shards=1 does not work because data_parallel_model assumed num_shards>1 when rendezvous is not None. Fixed that.

Reviewed By: andrewwdye

Differential Revision: D5156282

fbshipit-source-id: f2ccbd8ad13ed37f7813ff8ad1080d963d0d17e3
2017-05-31 15:19:13 -07:00
Ahmed Taei
f2d9d97008 Add an option to reset momentum-sgd params every time between successive block updates.
Reviewed By: akyrola

Differential Revision: D5149263

fbshipit-source-id: c0a3637a1b48f74ec55c9d13c8fab3456dab809c
2017-05-31 00:32:11 -07:00
Aapo Kyrola
ccdf2d99e1 Add description to assert in model_helper
Summary: Add information about the offending param when assertion fires.

Reviewed By: kennyhorror

Differential Revision: D5153625

fbshipit-source-id: 9f5a02bf64ccbdef9d93d346f79e589dfe3ec5be
2017-05-31 00:02:18 -07:00
Aapo Kyrola
ce7ce46ca1 fix secondary device check by gradient, if it is sparse
Summary: Fix an issue where, when a parameter is not created in param_init_net or net, we fall back to checking which device the op that outputs its gradient runs on. This did not work if the gradient was a GradientSlice.

Reviewed By: harouwu

Differential Revision: D5153102

fbshipit-source-id: 20eae660ea32e5a9ea484bf93c04c8f8c71a51ed
2017-05-30 20:47:17 -07:00
Aapo Kyrola
96d8ae2163 Make fills work with input_shape when run in CUDAContext
Summary: If ConstantFill (or another fill op) is used in a CUDAContext with input_as_shape, the code crashes: it expects the shape to be in the CUDAContext but accesses the array in host code. We could fix this by copying the values from the CUDA tensor, but it is probably best to enforce that the shape input is in a CPU context. This is what this diff does.
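
A hedged sketch of the enforced layout (requires a CUDA build to actually run): the fill op runs on the GPU while its shape input is fed on the CPU.

```
import numpy as np
from caffe2.proto import caffe2_pb2
from caffe2.python import core, workspace

workspace.FeedBlob("shape", np.array([4, 2], dtype=np.int64),
                   device_option=core.DeviceOption(caffe2_pb2.CPU))
workspace.RunOperatorOnce(core.CreateOperator(
    "ConstantFill", ["shape"], ["filled"],
    input_as_shape=1, value=1.0,
    device_option=core.DeviceOption(caffe2_pb2.CUDA, 0),
))
```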

Differential Revision: D5152766

fbshipit-source-id: 0629a189bd1d800c0b7c9dbc324b78d279efac0b
2017-05-30 20:47:16 -07:00
Alexander Sidorov
846240a340 Caffe2 gradient generator bug fix
Summary:
Bug repro is in a test. Generally speaking, accumulation was
not happening if len(ys) >= 2 (the list of blobs we compute gradients
from) and some blob in the net was both in the ys list and also received
a gradient propagated from another element of ys.
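
A hedged sketch of the repro shape described above (op choice is illustrative): "hidden" is itself in ys and also receives a gradient propagated back from "loss", so its gradient must be accumulated rather than overwritten.

```
from caffe2.python import core

net = core.Net("grad_accum_repro")
hidden = net.Relu("x", "hidden")
loss = net.AveragedLoss(hidden, "loss")
net.AddGradientOperators([loss, hidden])
```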

Reviewed By: akyrola

Differential Revision: D5121695

fbshipit-source-id: 282d88f2f4f6e27dadae311964f40246a2739130
2017-05-30 18:47:08 -07:00
Andrey Malevich
aa59b217a9 Relax requirement on the outputs of the predictor.
Summary: It looks like this is a bit too restrictive a requirement. Let's remove it.

Reviewed By: volkhin

Differential Revision: D5150968

fbshipit-source-id: 9e38574edc6542c5ce3c7f25a01afe8f5ff9b507
2017-05-30 17:23:18 -07:00
Simon Layton
1aa6300696 Option to use NCCL for broadcast
Summary:
Fixes some performance issues when `broadcast_computed_params=True` is passed to Parallelize_GPU. Enabled via the same `use_nccl` flag as AllReduce
Closes https://github.com/caffe2/caffe2/pull/630

Differential Revision: D5149828

Pulled By: akyrola

fbshipit-source-id: 12c9714c7fa078811f1cde61c8523dca8f7f968f
2017-05-30 16:46:38 -07:00