Summary: Hard-to-debug problems arise when a gradient creator fails because the forward op itself is incorrect. Check the op schema before calling the creator, and clarify the error messages.
Reviewed By: Yangqing
Differential Revision: D5256016
fbshipit-source-id: 78550f7e2ce5b88e26b69fdae4be0eece52edfea
Summary:
The current version of schema.py has a Metadata class with three fields, but its default is set to
four Nones. This changes the default to three Nones so that the number of default values matches the number
of actual fields.
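For illustration only, a minimal sketch of the pattern being fixed (the field names are a guess, not necessarily the real schema.py definition):
from collections import namedtuple

Metadata = namedtuple(
    'Metadata', ['categorical_limit', 'expected_value', 'feature_specs'])
# one default per field: three fields, three Nones
Metadata.__new__.__defaults__ = (None, None, None)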
Reviewed By: kennyhorror
Differential Revision: D5250463
fbshipit-source-id: 42e5650d270f5f63662614d8445b4819ed370dec
Summary: Also fixed a small bug in the ModelHelper constructor.
Reviewed By: harouwu
Differential Revision: D5246799
fbshipit-source-id: 3719ca078f0e2b5e463fc93da9c8215f5583bd9a
Summary:
We need to support RNNs explicitly in ExtractPredictorNet, because they store sub-nets as strings in special arguments. Once NetDef-typed arguments land, we can generalize this a bit.
Added a test under rnn_cell_test to test that extracting an LSTM predictor net works correctly and sets the device option properly for the step net ops.
Reviewed By: yqwangustc
Differential Revision: D5236334
fbshipit-source-id: cd653427f8c440a14d94195a532d18276f94749a
Summary: A quite common problem is that it is hard to load blobs with pe.load_from_db onto a specific device. One must set the device options of the returned init_net and predict_init_net, which is quite magical. So I made load_from_db() set these device options automatically, based on the device scope or a device_option parameter. Added a unit test.
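A minimal usage sketch (db/blob names are placeholders; the device_option keyword follows the description above and the exact signature may differ):
from caffe2.proto import caffe2_pb2
from caffe2.python import core
from caffe2.python.predictor import predictor_exporter as pe

gpu = core.DeviceOption(caffe2_pb2.CUDA, 0)
# either rely on the active device scope ...
with core.DeviceScope(gpu):
    meta_net_def = pe.load_from_db('model.minidb', 'minidb')
# ... or pass the device explicitly
meta_net_def = pe.load_from_db('model.minidb', 'minidb', device_option=gpu)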
Reviewed By: asaadaldien
Differential Revision: D5249202
fbshipit-source-id: 7b9d91476cb8d1b0ec0d9772e50b9148b8b184fa
Summary:
salexspb This fixes a major perf issue (40% boost on alexnet end-to-end perf) in the multi-precision SGD optimizer - it was causing repeated cudaMalloc / cudaFree calls during training iterations due to the changing size of the `grad` blob as it moved from fp16 <-> fp32.
Closes https://github.com/caffe2/caffe2/pull/797
Differential Revision: D5246978
Pulled By: salexspb
fbshipit-source-id: ec3d7ef18445e19eaf5aac908d0a7bcd5957eb60
Summary: This was only needed in order to initialize stateful PythonOps. Now PythonOp has support for initialization at Op creation time, so this is not used anymore.
Reviewed By: dzhulgakov
Differential Revision: D5242908
fbshipit-source-id: dbaa249466dd0f37f25d204d387b1f99c6dd4fed
Summary: This will show a Python Caffe2 user where a failed operator was created. The motivation for keeping this information out of the protobuf is to avoid making it too verbose and to keep the protobufs of a net readable after a simple print() call.
Reviewed By: jamesr66a
Differential Revision: D5226047
fbshipit-source-id: 7edfe850e05a2ec209577142aa3368664a57a108
Summary:
This allows constructing a PythonOp by passing a pickled "builder function call" as an argument to the op.
The builder function is called at PythonOp construction time and returns a function that will be called when the op is run.
This lets us drop the dependency on 'tokens', which did not work properly for protobufs that get distributed to other processes. Now the PythonOp definition is self-contained: as long as the build dependencies are right, shipping the protobuf is enough to execute the net remotely.
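A hedged sketch of the builder-function pattern (names and tensor access are illustrative, not the exact PythonOp API):
import pickle

def build_scaler(scale):
    # runs once, at PythonOp construction time
    def op_fn(inputs, outputs):
        # runs on every execution of the op
        outputs[0].reshape(inputs[0].shape)
        outputs[0].data[...] = inputs[0].data * scale
    return op_fn

# the pickled builder call is stored as an op argument,
# so the NetDef is self-contained
pickled_builder = pickle.dumps((build_scaler, (2.0,), {}))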
Reviewed By: dzhulgakov
Differential Revision: D5080833
fbshipit-source-id: a5deaca5d3143024cdb121519689224e9dbec5ce
Summary:
Truncate the id list using the max length computed in compute meta, so that it has a fixed length, which is useful for the position-weighted pooling method.
Reviewed By: sunwael
Differential Revision: D5233739
fbshipit-source-id: f73deec1bb50144ba14c4f8cfa545e1ced5071ce
Summary: People recently found that this test is too strict because of proto string matching. I changed it to compare fields instead, so the test will not complain even if the protobuf format changes in the future.
Reviewed By: dzhulgakov
Differential Revision: D5229855
fbshipit-source-id: 54efcd7a0f9e5dbba1ddeb480801abcb859e07bd
Summary: Added an operator that converts key/value blobs into a blob containing a map pointer; unit test passes.
Differential Revision: D5224449
fbshipit-source-id: 2f60754ed3ba6ed16039c09019117ae3c3646ab2
Summary:
Diff D5224410 initializes the should_stop_blob explicitly. With that, we
have one more blob when executing the job. Adjust the check accordingly.
Reviewed By: azzolini
Differential Revision: D5228398
fbshipit-source-id: 439b186c30b0b1d0e41e513babbcccd85e7a1b4a
Summary:
We waste extra memory by creating two autosplit gradient blobs and then accumulating them into the main one. Sometimes, when Sum / Sub ops are involved, we can avoid wasting that extra memory entirely.
Ideally we would not waste any memory and would make ops add to the same blob rather than calculating separate results and then merging them. But that would require a substantial change to the framework and rewriting a lot of operators.
Reviewed By: dzhulgakov
Differential Revision: D5157667
fbshipit-source-id: 8293824d6cdd971d8853ae90aee68e4a6d1e132b
Summary:
It's very useful for simple cases like benchmarking nets, where we want to encode the input/output record in the net and don't want to go through the hurdles of storing the input/output record in MetaNetDef.
For those cases I propose remapping the input/output record before saving to 'input_record/{field_name}'. Then we can recover the input/output record just from the names of the blobs.
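Conceptually, the remapping could look like the sketch below (a Copy-based illustration of the idea, not necessarily the actual mechanism):
def remap_input_record(net, record):
    # prefix every field blob so the record can be recovered
    # later from the blob names alone
    for name, blob in zip(record.field_names(), record.field_blobs()):
        net.Copy(blob, 'input_record/{}'.format(name))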
Differential Revision: D5170473
fbshipit-source-id: ac5daa60051605ed93022aec1377a49f08f15663
Summary: This diff fixes an issue with running the same reader in the same workspace multiple times. To achieve correct behavior of the execution step, we have to explicitly initialize the should_stop_blob to False.
Reviewed By: kennyhorror
Differential Revision: D5224410
fbshipit-source-id: 4ad2740e187b62b0a1f5612ea3eef223dcc8a799
Summary: Added an operator that converts key/value blobs into a blob containing a map pointer; unit test passes.
Differential Revision: D5166513
fbshipit-source-id: 748527c423a163fe55f914c08fff3adfc74a540c
Summary:
The SparseToDense layer essentially calls the SparseToDenseMask op.
This makes it impossible to call the functional layer with the true SparseToDense op.
This diff renames the layer.
Please let me know if I missed anything or if you have a better name suggestion.
Differential Revision: D5169353
fbshipit-source-id: 724d3c6dba81448a6db054f044176ffc7f708bdb
Summary:
Static RNN allows unrolling an RNN into the Caffe2 graph using all the existing cell abstractions. In this diff I introduce several new tests that already caught a few bugs in our RecurrentNetworkOp gradient accumulation logic by comparing it to an unrolled version.
Another use case is perf: potentially we can run an unrolled net faster because DAGNet will have access to the whole graph. The same applies to memonger, but that work is not part of this diff.
Reviewed By: akyrola
Differential Revision: D5200943
fbshipit-source-id: 20f16fc1b2ca500d06ccc60c4cec6e81839149dc
Summary:
In some cases you have an optimized network and a normal one, and you would like to make sure they produce the same results. If the math under the hood is the same, you can do this with very high precision compared to a traditional numerical gradient check. One application is RNNs: there we can unroll the RNN into a Caffe2 graph and make sure the result is the same as in the optimized version that uses RecurrentNetworkOp.
Another possible application is graph transformations. We can verify that the transformed nets produce the same gradients (cc akyrola on memonger, bwasti on other transformation ideas).
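A rough sketch of the kind of check this enables (net and blob names are placeholders):
import numpy as np
from caffe2.python import workspace

def assert_nets_match(net_a, net_b, blobs_to_check, atol=1e-6):
    workspace.RunNetOnce(net_a)
    results_a = {b: workspace.FetchBlob(b) for b in blobs_to_check}
    workspace.RunNetOnce(net_b)
    for b in blobs_to_check:
        # identical math allows a much tighter tolerance than a
        # numerical gradient check would
        np.testing.assert_allclose(results_a[b], workspace.FetchBlob(b), atol=atol)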
Reviewed By: bwasti
Differential Revision: D5200855
fbshipit-source-id: 0196af187f0c2feb33de4778ea08d0d288fe1017
Summary:
When building a multi-layer static RNN, the last timestep of the first layer (and of every layer except the last one) doesn't get a gradient for the cell state, since normally the user only consumes results from the last layer and the cell state doesn't flow upward either.
ZeroGradient provides a general solution for injecting zero-gradient blobs. It is in some ways similar to the StopGradient operator, which is also special-cased.
Reviewed By: bwasti
Differential Revision: D5198375
fbshipit-source-id: a21d0cfb3676a77fac72e5897a200d0bd25fc6de
Summary:
`brew_test.py` is just plain broken. `core_test.py` doesn't work with pytest. `apmeter_test.py` and `top_k_test.py` don't work for CUDA builds.
Closes https://github.com/caffe2/caffe2/pull/765
Differential Revision: D5211817
Pulled By: Yangqing
fbshipit-source-id: 78ec5af35a3fa870978e4c9590210ade9e3bc5ac
Summary:
Neither dependency is required by the core Python modules.
OpenCV, in particular, is a pain to install (no pip package). Conditionally skipping this test will make TravisCI integration easier.
Closes https://github.com/caffe2/caffe2/pull/739
Differential Revision: D5211799
Pulled By: Yangqing
fbshipit-source-id: c6bdc8a17977f64f34e968fd9ab8c65161d2624d
Summary:
This diff fixes various issues with memonger, and works at least with rbgirshick's failure case, Resnet-50, and a new, harder unit test. I will still create a proper resnet50 test.
1) Introduce the concept of "tokens". These are passed down the dependency chains, and a blob can be used for recycling only if it owns all the tokens currently in possession. Tokens are added when branching and redeemed after all inputs are satisfied. A bit hard to explain.
2) There were various bugs due to bad code: the free_blobs data structure has a different type depending on whether we have blob sizes or not. I plan to rewrite this soon, but there were some bugs.
3) Added a harder unit test that failed before.
4) Added test for resnet50 + memonger
Reviewed By: asaadaldien
Differential Revision: D5193393
fbshipit-source-id: bc2a714877aa1201c32a5ba8ade862865e455711
Summary: I broke resnet50 when switching to use the optimizer, which uses an LR per parameter. This only happens after each epoch, and I did not test patiently enough. As a stop-gap, while asaadaldien works on a better solution, just fetch the LR of the conv1_w param.
Reviewed By: asaadaldien
Differential Revision: D5207552
fbshipit-source-id: f3474cd5eb0e291a59880e2834375491883fddfc
Summary:
This diff attacks the problem where we want to just annotate the device option on operators and let Caffe2 inject cross-device copy functions for us. This feature is useful for mixed-device training and for multi-device training with several nets, where previously we did the heavy lifting of adding the copy functions ourselves.
Ideally, this feature will be used like this:
# construct your nets first
core.InjectDeviceCopyAmongNets([train_init, train_net, ...])
My ideas are written in the comments. I will update them here later as well.
Reviewed By: dzhulgakov
Differential Revision: D5134103
fbshipit-source-id: 173f7da9d1773d1c50ccdc27f1b5cd3067b04af5
Summary: Catch the exception when fetching uninitialized blobs while collecting blob sizes in the workspace. Some output blobs (like the mask output of Dropout when is_test=1) may be nullptr, and FetchBlob will fail.
Differential Revision: D5198641
fbshipit-source-id: 45ee26c4cb1c25cc48904e9f7d7c007224c97418
Summary: Implements an APMeter operator (APMeterOp) to calculate average precision (AP) for multiclass classification given prediction scores and labels. The op takes a score tensor [nsamples x nclasses] and a label tensor [nsamples x nclasses], and outputs a float tensor of size nclasses with the AP for each class.
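A hedged usage sketch (assumes the operator is registered as "APMeter" and that integer labels are acceptable; dtypes may differ in the real op):
import numpy as np
from caffe2.python import core, workspace

nsamples, nclasses = 8, 3
workspace.FeedBlob('scores', np.random.rand(nsamples, nclasses).astype(np.float32))
workspace.FeedBlob('labels', np.random.randint(0, 2, (nsamples, nclasses)).astype(np.int32))
workspace.RunOperatorOnce(core.CreateOperator('APMeter', ['scores', 'labels'], ['ap']))
print(workspace.FetchBlob('ap'))  # one AP value per class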
Reviewed By: akyrola
Differential Revision: D5082565
fbshipit-source-id: ae7304bc8fc999c361245b9aec38eb9a5f5eef4b
Summary:
Add a helper function for the parametric op ElementwiseLinear.
The typical syntax is model.ElementwiseLinear(input, output, dimension).
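A minimal sketch of how the helper might be used (blob names and the dimension are placeholders; the exact dispatch may go through brew):
from caffe2.python import model_helper

model = model_helper.ModelHelper(name='example')
# y[i, d] = w[d] * x[i, d] + b[d], with learnable w and b of size `dimension`
model.ElementwiseLinear('fc_out', 'scaled_out', 256)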
Reviewed By: harouwu, akyrola
Differential Revision: D5114152
fbshipit-source-id: 8e8c691f824f518ae510a72ab0c12de1b018f3b5
Summary:
There is an edge case where internal gradient blobs of the backward step net should not be considered internally calculated if the only "internal" calculation is in-place.
In the case of the failing attention unit tests, the offending blob was attention_weighted_encoder_context_grad, which was incorrectly considered internal because it was the output (as well as the input) of a Reshape on the step net's edge. The caveat here is that the results may be unpredictable if a non-pass-through in-place operation is applied to a blob within a step net that is both consumed internally and used as a recurrent state/output. (This is an extreme edge case and difficult to explicitly enforce, but it's worth noting.)
Reviewed By: salexspb
Differential Revision: D5198328
fbshipit-source-id: 0cfa8f903fd767fc50e727f238ac3d8cdca03fe0
Summary:
The goal of this diff is:
1) Enable checkpointing to honor batches_per_epoch
2) Resume hive_readers mid-split
Reviewed By: azzolini
Differential Revision: D5004212
fbshipit-source-id: 2ff5df30ba946eefadd109d80056cde67398a080
Summary:
Input of the TopK op: X (dense).
Output of the TopK op: Value and Indices (a sparse representation).
Value has a gradient in some cases, so we backprop (copy) the gradient from the sparse d Value to the dense d X.
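In numpy terms the backward pass is roughly the following scatter (a sketch of the idea, not the actual kernel):
import numpy as np

def topk_backward(d_value, indices, dense_shape):
    # route each gradient entry back to the position its value came from
    d_x = np.zeros(dense_shape, dtype=d_value.dtype)
    rows = np.arange(dense_shape[0])[:, None]
    d_x[rows, indices] = d_value
    return d_x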
Differential Revision: D5133461
fbshipit-source-id: 7bad55b60e8a22dfe0e51357ce2099d7f752c133
Summary: Replace the hand-made SGD with build_sgd.
Reviewed By: salexspb
Differential Revision: D5186331
fbshipit-source-id: 3c7b4b370e29a1344b95819766463bae3812c9a6
Summary: Previous implementation relied on the order of fields for some reason.
Reviewed By: azzolini
Differential Revision: D5164478
fbshipit-source-id: 12717310860584e18ce4ca67d0bd5048354cdc0a
Summary: Infer input and output devices from the OperatorDef through OperatorSchema. This is inspired by shape inference. With this feature, we can easily analyze device information for all blobs in the net in a generic way, which is really helpful for automatic cross-device execution.
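Conceptually this enables checks like the sketch below (the helper name core.InferOpBlobDevices is my assumption for the Python entry point and may differ):
from caffe2.proto import caffe2_pb2
from caffe2.python import core

op = core.CreateOperator(
    'CopyCPUToGPU', ['x_cpu'], ['x_gpu'],
    device_option=core.DeviceOption(caffe2_pb2.CUDA, 0))
# ask the schema where each input/output blob is expected to live
input_devs, output_devs = core.InferOpBlobDevices(op)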
Reviewed By: akyrola, dzhulgakov
Differential Revision: D5161065
fbshipit-source-id: ee656123112171a4ca00f2fb3f6940f32ddf3135