Summary:
This diff fixes various issues with memonger, and works at least with rbgirshick's failure case, ResNet-50, and a new, harder unit test. I will still create a proper resnet50 test.
1) Introduce the concept of "tokens". These are passed down the dependency chains, and a blob can be recycled only if it holds all the tokens currently in play. Tokens are added when branching and redeemed once all inputs are satisfied. The idea is easier to see in code; see the sketch after this list.
2) Fixed various bugs caused by sloppy code: the free_blobs data structure has a different type depending on whether blob sizes are available. I plan to rewrite this soon.
3) Added a harder unit test that failed before.
4) Added test for resnet50 + memonger
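To illustrate the token rule from item 1, here is a minimal sketch; the names and data structures are invented for illustration and are not the actual memonger code:
```
# Illustrative sketch of the token rule only, not the real memonger bookkeeping.
def can_recycle(candidate_tokens, live_tokens):
    """A freed blob may be reused only if it holds every token currently
    live on the dependency chain, i.e. every open branch that could still
    read it has already been accounted for."""
    return live_tokens.issubset(candidate_tokens)

# Two branches (tokens 1 and 2) are live: a blob that has only seen token 1
# cannot be recycled yet; one that has collected both can.
assert not can_recycle({1}, {1, 2})
assert can_recycle({1, 2}, {1, 2})
```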
Reviewed By: asaadaldien
Differential Revision: D5193393
fbshipit-source-id: bc2a714877aa1201c32a5ba8ade862865e455711
Summary: I broke resnet50 when switching to use optimizer, which uses an LR per parameter. This only happens after each epoch, and I did not test patiently enough. As a stop-gap, while asaadaldien works on a better solution, just fetch the LR of the conv1_w param.
Reviewed By: asaadaldien
Differential Revision: D5207552
fbshipit-source-id: f3474cd5eb0e291a59880e2834375491883fddfc
Summary:
This diff plans to attack the problem where we want to just annotate device options for operators and let Caffe2 inject the cross-device copy functions for us. This feature would be useful for mixed-device training and multi-device training with several nets, where previously we did the heavy lifting of adding copy functions ourselves.
Ideally, this feature will be used like this:
# construct your nets first
core.InjectDeviceCopyAmongNets([train_init, train_net, ...])
My ideas are written in comments. I will update them here as well later.
Reviewed By: dzhulgakov
Differential Revision: D5134103
fbshipit-source-id: 173f7da9d1773d1c50ccdc27f1b5cd3067b04af5
Summary: Catch exceptions when fetching uninitialized blobs while collecting blob sizes in the workspace. Some output blobs (like the mask output of DropOut when is_test=1) may be nullptr, and FetchBlob will fail on them.
Differential Revision: D5198641
fbshipit-source-id: 45ee26c4cb1c25cc48904e9f7d7c007224c97418
Summary: Implements an APMeter operator (APMeterOp) to calculate AP for multiclass classification given prediction scores and labels. The op takes a score tensor [nsamples x nclasses] and a label tensor [nsamples x nclasses], and outputs a float tensor of size nclasses with the AP for each class.
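A hedged usage sketch; the registered op name "APMeter" and the int32 label dtype are my assumptions, so check the op schema:
```
import numpy as np
from caffe2.python import core, workspace

# 8 samples, 4 classes
scores = np.random.rand(8, 4).astype(np.float32)
labels = (np.random.rand(8, 4) > 0.5).astype(np.int32)

workspace.FeedBlob("scores", scores)
workspace.FeedBlob("labels", labels)
workspace.RunOperatorOnce(
    core.CreateOperator("APMeter", ["scores", "labels"], ["ap"]))
ap_per_class = workspace.FetchBlob("ap")  # float tensor of size nclasses
```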
Reviewed By: akyrola
Differential Revision: D5082565
fbshipit-source-id: ae7304bc8fc999c361245b9aec38eb9a5f5eef4b
Summary:
Add a helper function for the parametric op ElementwiseLinear.
The typical syntax is model.ElementwiseLinear(input, output, dimension)
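A sketch based only on the syntax above; the positional arguments and the parameter shapes the helper creates are assumptions on my part:
```
from caffe2.python import model_helper

model = model_helper.ModelHelper(name="ew_linear_example")
# computes y = w * x + b elementwise along the given dimension;
# the helper is assumed to create w and b params of size `dimension`
model.ElementwiseLinear("x", "y", 16)
```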
Reviewed By: harouwu, akyrola
Differential Revision: D5114152
fbshipit-source-id: 8e8c691f824f518ae510a72ab0c12de1b018f3b5
Summary:
There is an edge case where internal gradient blobs of the backward step net should not be considered internally calculated if the only "internal" calculation is in-place.
In the case of the failing attention unit tests, the offending blob was attention_weighted_encoder_context_grad, which was incorrectly considered internal because it was the output (as well as the input) of a Reshape on the step net's edge. The caveat here is that the results may be unpredictable if a non-pass-through in-place operation is applied to a blob within a step net that is both consumed internally and is a recurrent state/output. (This is an extreme edge case, and difficult to enforce explicitly, but it is worth noting.)
Reviewed By: salexspb
Differential Revision: D5198328
fbshipit-source-id: 0cfa8f903fd767fc50e727f238ac3d8cdca03fe0
Summary:
The goal of this diff is:
1) Enable checkpointing to honor batches_per_epoch
2) Resume hive_readers mid-split
Reviewed By: azzolini
Differential Revision: D5004212
fbshipit-source-id: 2ff5df30ba946eefadd109d80056cde67398a080
Summary:
Input of the TopK op: X (dense).
Output of the TopK op: Value and Indices (sparse representation).
Value will have a gradient in some cases;
we backprop (copy) the gradient from the sparse d(Value) to the dense d(X).
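A minimal numpy sketch of that backward copy, for illustration only (not the actual kernel):
```
import numpy as np

def topk_value_grad_to_dense(d_value, indices, x_shape):
    # scatter the gradient of the top-k values back into a dense d(X)
    d_x = np.zeros(x_shape, dtype=d_value.dtype)
    np.put_along_axis(d_x, indices, d_value, axis=-1)
    return d_x

# X was 2 x 5 and the top-2 was taken along the last axis
d_value = np.array([[0.1, 0.2], [0.3, 0.4]], dtype=np.float32)
indices = np.array([[4, 0], [2, 1]])
print(topk_value_grad_to_dense(d_value, indices, (2, 5)))
```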
Differential Revision: D5133461
fbshipit-source-id: 7bad55b60e8a22dfe0e51357ce2099d7f752c133
Summary: Replace hand-made SGD with build_sgd.
Reviewed By: salexspb
Differential Revision: D5186331
fbshipit-source-id: 3c7b4b370e29a1344b95819766463bae3812c9a6
Summary: Previous implementation relied on the order of fields for some reason.
Reviewed By: azzolini
Differential Revision: D5164478
fbshipit-source-id: 12717310860584e18ce4ca67d0bd5048354cdc0a
Summary: Infer input and output devices from an OperatorDef through its OperatorSchema. This is inspired by shape inference. With this feature, we can easily analyze device information for all blobs in the net in a generic way. It is really helpful for automatic cross-device execution.
Reviewed By: akyrola, dzhulgakov
Differential Revision: D5161065
fbshipit-source-id: ee656123112171a4ca00f2fb3f6940f32ddf3135
Summary:
This diff fixes fetching of parameters in the global namescope. An earlier
diff that switched to '' introduced this bug.
Reviewed By: dzhulgakov
Differential Revision: D5189667
fbshipit-source-id: 4818e99e2c2c90788e70e0b8b6204ec6f471d37d
Summary: ExpandDims is a trivial utility op which should not be triggering a warning when used by ModelHelper.
Reviewed By: akyrola
Differential Revision: D5117985
fbshipit-source-id: 5589f46f58458f5019924b48602db088563f2fee
Summary:
Make it easier for users by having ExtractPredictorNet return the list of blobs that must be saved/exported to run a predictor net. Added a test for ExtractPredictorNet.
Codemod.
Reviewed By: asaadaldien
Differential Revision: D5176097
fbshipit-source-id: b1af42132459487b8d94fcdde0e4c514da608243
Summary:
I'm using Python ops in a project and need corresponding Python gradient ops. For my use case, only a subset of the forward op outputs have gradients and only a subset of forward op inputs have gradients. However, the current implementation of `GetPythonGradient` forces all grad inputs and outputs to exist. This diff allows one to specify that only a subset of grad inputs / outputs are used when constructing the Python op.
I'm not sure if this is up to caffe2 standards, so please push back on style and content as needed.
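A rough sketch of how this might look; the grad_output_indices / grad_input_indices keyword names and the exact Python-op calling convention are assumptions on my part:
```
from caffe2.python import core

def forward(inputs, outputs):
    outputs[0].feed(inputs[0].data * 2.0)                    # y = 2x, has a gradient
    outputs[1].feed((inputs[0].data > 0).astype('float32'))  # mask, no gradient

def backward(inputs, outputs):
    # the last input is d(y); the only grad produced is d(x)
    outputs[0].feed(inputs[-1].data * 2.0)

net = core.Net("python_grad_example")
net.Python(forward, backward,
           grad_output_indices=[0],  # assumed kwarg: only forward output 0 has a gradient
           grad_input_indices=[0]    # assumed kwarg: only forward input 0 receives a gradient
           )(["x"], ["y", "mask"])
```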
Reviewed By: dzhulgakov
Differential Revision: D4897004
fbshipit-source-id: 96fffe8634c51a49b6bce7339a46c6235f7d4bbd
Summary:
Fix the missing `future` package issue.
Recently we found that some of our users do not have the future module installed, so we need a try/except wrapper around all `past` imports.
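The kind of guard this implies, as a sketch:
```
try:
    from past.builtins import basestring
except ImportError:
    # `future`/`past` not installed; fall back to the Python 3 equivalent
    basestring = str
```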
Reviewed By: Yangqing
Differential Revision: D5183547
fbshipit-source-id: 262fdf2940ee1be4454bf0b0abb9e6a0f1a0ee82
Summary:
This diff introduces abstractions for parameter sharing for all the
parameters that are created through the new create_param syntax (a sketch follows the list below).
Possible use cases of this parameter sharing:
1. Share params within RNN interface.
2. Some complicated models that might share some of the branches.
3. TODO (next diff): Cross-model parameter sharing.
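A hedged sketch of what sharing through create_param might look like; the scope names and the exact semantics of the ParameterSharing mapping are my assumptions:
```
from caffe2.python import model_helper, scope
from caffe2.python.modeling.initializers import Initializer
from caffe2.python.modeling.parameter_sharing import ParameterSharing

model = model_helper.ModelHelper(name="sharing_example")
with ParameterSharing({'decoder': 'encoder'}):
    with scope.NameScope('encoder'):
        w_enc = model.create_param('w', shape=[16, 16],
                                   initializer=Initializer("XavierFill"))
    with scope.NameScope('decoder'):
        # assumed to resolve to the same underlying blob as encoder/w
        w_dec = model.create_param('w', shape=[16, 16],
                                   initializer=Initializer("XavierFill"))
```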
Reviewed By: salexspb
Differential Revision: D5160935
fbshipit-source-id: c6d40a5ed7ead240cd7db0eb69de6dc5f505b05a
Summary:
This diff creates a new type of Initializer: ExternalInitializer. This
initializer is meant for cases where the parameter blob is already
expected to exist in the workspace.
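A sketch of the intended use, assuming the blob was already placed in the workspace (e.g. loaded from a checkpoint); blob and model names are illustrative:
```
import numpy as np
from caffe2.python import model_helper, workspace
from caffe2.python.modeling.initializers import ExternalInitializer

# the parameter blob already exists in the workspace
workspace.FeedBlob("pretrained_w", np.random.rand(16, 16).astype(np.float32))

model = model_helper.ModelHelper(name="external_init_example")
w = model.create_param(param_name="pretrained_w", shape=[16, 16],
                       initializer=ExternalInitializer())
```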
Reviewed By: dzhulgakov
Differential Revision: D5171322
fbshipit-source-id: d27861f0f80afdea93c235d49f63da19adccc92c
Summary:
This diff is the first step in the effort to refactor all parameters. As a first step, I'm merging the concepts of params and computed_params, which will
be based on tags instead (in the first version it still uses the old data structs to store all the BlobReferences).
Renaming computed_params to non-trainable/non-backprop params should be done in some other diff.
Reviewed By: salexspb
Differential Revision: D5171159
fbshipit-source-id: 68031ca779f053fb266a7c4a2e5b482a3bd9c832
Summary:
Add add_weight_decay to optimizer + test.
In D5142973 I accidentally removed weight decay from resnet50 trainer, so this restores it.
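A sketch of how the restored call fits into an optimizer-based trainer; the model below is a stand-in, not the resnet50 trainer itself, and the call order is my assumption:
```
from caffe2.python import brew, model_helper, optimizer

model = model_helper.ModelHelper(name="wd_example")
pred = brew.fc(model, "data", "pred", dim_in=16, dim_out=1)
loss = model.net.AveragedLoss(pred, "loss")
model.AddGradientOperators([loss])

optimizer.add_weight_decay(model, 1e-4)             # the restored weight decay term
optimizer.build_sgd(model, base_learning_rate=0.1)  # instead of hand-written SGD ops
```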
Reviewed By: asaadaldien
Differential Revision: D5173594
fbshipit-source-id: c736d8955eddff151632ae6be11afde0883f7531
Summary: old diff had some changes to formatter.py and generator.py, but now everything is in github.py
Reviewed By: bwasti
Differential Revision: D5165061
fbshipit-source-id: 5fe5ff70ff2c5525c7aacf20854916c86d272749
Summary: Use new blob as residual sum output, and add scoping to prevent any name conflicts.
Reviewed By: urikz
Differential Revision: D5167145
fbshipit-source-id: a01c87ed2278205e95e8395314b166afb1dca1b3
Summary:
Split the Caffe2 memory-based model into two parts:
- Dimension reduction MLP
- DNN with concatenation of memory and obj feature
Currently only a simple mean is implemented.
Differential Revision: D4866825
fbshipit-source-id: d2f6813402513ec9af30dbe29a50593e2d3cdb3b
Summary:
A recent diff introduced a duplicate parameter to the model, which hurts performance and also affects correctness (duplicate momentum updates, for example). We unfortunately had no checks for duplicate params outside of data_parallel_model, which fortunately brought this to our attention.
It is better to have a Validate() function in model_helper and call it before adding gradient ops and querying for parameters. Added it to the brew_test calls as well.
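A sketch of the intended call site, assuming Validate() simply asserts that model.params contains no duplicates:
```
from caffe2.python import brew, model_helper

model = model_helper.ModelHelper(name="validate_example")
pred = brew.fc(model, "data", "pred", dim_in=16, dim_out=1)
loss = model.net.AveragedLoss(pred, "loss")

model.Validate()                    # fails loudly on duplicate params
model.AddGradientOperators([loss])  # only then add gradients / query params
```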
Reviewed By: kennyhorror
Differential Revision: D5163458
fbshipit-source-id: 35692e8bfcc359d4e8bc73e6f2358659f6e45ceb
Summary:
This diff is the first step in the effort to refactor all parameters. As a
first step, I'm merging the concepts of params and computed_params, which will
be based on tags instead (in the first version it still uses the old data
structs to store all the BlobReferences).
Renaming computed_params to non-trainable/non-backprop params should be done in
some other diff.
Reviewed By: salexspb
Differential Revision: D5119830
fbshipit-source-id: 2001090a37346eb12abbb234e13e727c288eb8a7
Summary:
It's causing problems inside docker containers:
`InvalidArgument: Insufficient bytes of entropy to draw requested array. shape=(5, 9, 10, 5), dtype=float32. Can you reduce the size or dimensions of the array? What about using a smaller dtype? If slow test runs and minimisation are acceptable, you could increase settings().buffer_size from 8192 to at least 18432000.`
Closes https://github.com/caffe2/caffe2/pull/707
Differential Revision: D5162621
Pulled By: Yangqing
fbshipit-source-id: 55544210961cbc80828dca2cbeba6a5ace8cf8d1
Summary:
This warning becomes an error with https://github.com/numpy/numpy/pull/6271 (numpy `>=1.12.0`).
```
caffe2/python/operator_test/tile_op_test.py::TestTile::test_tilewinput
/opt/caffe2/caffe2/python/operator_test/tile_op_test.py:100: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
dims[axis] = tiles
/usr/lib/python2.7/dist-packages/numpy/lib/shape_base.py:873: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
return c.reshape(shape_out)
```
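The generic remedy is to turn the 1-element array into a plain Python int before using it as an index or shape value; a sketch of that fix (not necessarily the exact change made in the test):
```
import numpy as np

x = np.arange(10)
tiles = np.array([3])  # a 1-element array drawn by the test framework

# using `tiles` directly as an index/shape triggers the deprecation warning;
# convert it to a Python int first
n = int(tiles[0])
y = np.tile(x, n)
```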
Closes https://github.com/caffe2/caffe2/pull/710
Differential Revision: D5160776
Pulled By: Yangqing
fbshipit-source-id: b264e0e389de5817a289db878c15e655f9fa2f09
Summary:
Adds support for generating and training pfp16 models. Added SGD optimizer for multi-precision trainers and a new callback to data_parallel_model in order to help multi-precision models keep their different copies of parameters in sync during training.
Closes https://github.com/caffe2/caffe2/pull/697
Differential Revision: D5159712
Pulled By: salexspb
fbshipit-source-id: 60a889494d2e2f4df1d720331e19f638c5eb95cc
Summary:
Currently we can get into broken situations when some nodes complete detectChanges() faster than others, so only some of the nodes start the next iteration of training. This is an inconsistent state. To prevent this, each node now sets a "re-rendezvous flag" that is allreduced after each iteration. Once all nodes agree, re-rendezvous is done.
Also noticed that min_shards=1 does not work because data_parallel_model assumed num_shards>1 when rendezvous is not None. Fixed that.
Reviewed By: andrewwdye
Differential Revision: D5156282
fbshipit-source-id: f2ccbd8ad13ed37f7813ff8ad1080d963d0d17e3
Summary: Add information about the offending param when assertion fires.
Reviewed By: kennyhorror
Differential Revision: D5153625
fbshipit-source-id: 9f5a02bf64ccbdef9d93d346f79e589dfe3ec5be
Summary: Fix an issue where, when the parameter is not created in param_init_net or net, we fall back to looking at which device the op that outputs the gradient runs on. This did not work if the gradient was a GradientSlice.
Reviewed By: harouwu
Differential Revision: D5153102
fbshipit-source-id: 20eae660ea32e5a9ea484bf93c04c8f8c71a51ed
Summary: If ConstantFill (or another fill op) is used in CUDAContext with input_as_shape, the code crashes: it expects the shape to be in CUDAContext but accesses the array in host code. We could fix this by copying the values from the CUDA tensor, but it is probably best to enforce that the shape input is in CPU context. That is what this diff does.
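A sketch of the usage pattern this enforces: keep the shape blob in CPU context even when the fill itself runs on the GPU. The ops used are standard Caffe2 fills, but the snippet is illustrative rather than taken from the diff:
```
from caffe2.proto import caffe2_pb2
from caffe2.python import core

net = core.Net("fill_example")
# the shape input stays on the CPU...
with core.DeviceScope(core.DeviceOption(caffe2_pb2.CPU)):
    shape = net.GivenTensorInt64Fill([], "shape", shape=[2], values=[4, 3])
# ...even though the fill itself runs on the GPU
with core.DeviceScope(core.DeviceOption(caffe2_pb2.CUDA, 0)):
    net.ConstantFill([shape], "filled", value=1.0, input_as_shape=1)
```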
Differential Revision: D5152766
fbshipit-source-id: 0629a189bd1d800c0b7c9dbc324b78d279efac0b
Summary:
Bug repro is in a test. Generally speaking, accumulation was
not happening if len(ys) >= 2 (the list of blobs we compute gradients
from) and some blob in the net was both in the ys list and also received
a gradient propagated from another element of ys.
Reviewed By: akyrola
Differential Revision: D5121695
fbshipit-source-id: 282d88f2f4f6e27dadae311964f40246a2739130
Summary: It looks like this is a bit too restrictive a requirement. Let's remove it.
Reviewed By: volkhin
Differential Revision: D5150968
fbshipit-source-id: 9e38574edc6542c5ce3c7f25a01afe8f5ff9b507
Summary:
Fixes some performance issues when `broadcast_computed_params=True` is passed to Parallelize_GPU. Enabled via the same `use_nccl` flag as AllReduce
Closes https://github.com/caffe2/caffe2/pull/630
Differential Revision: D5149828
Pulled By: akyrola
fbshipit-source-id: 12c9714c7fa078811f1cde61c8523dca8f7f968f