Summary:
A few issues:
1. Randomization hurts memoization.
2. Even if we make it non-random, we can get key collisions when loading it back.
3. RNNs use prototxt for the step net, and apparently it's not forward compatible the way normal protobuf is.
I am thinking of a better, less invasive solution now.
Reviewed By: jamesr66a
Differential Revision: D5272118
fbshipit-source-id: ab577fad04fbfc632e1fceffa923377a0d3da1be
Summary: This is going to show a Python Caffe2 user where a failed operator was created. The motivation for keeping this information out of the protobuf itself is to avoid making it too verbose, and to keep the ability to read a net's protobufs after a simple print() call.
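A rough sketch of the idea (hypothetical helper names, not the actual diff): record the Python stack at operator-creation time in a side table, so print(net.Proto()) stays clean but a failure can still point at the creation site:

    import traceback

    _op_creation_sites = {}  # kept outside the protobuf on purpose

    def record_creation_site(op_def):
        # Capture where in user code this operator was created.
        _op_creation_sites[id(op_def)] = "".join(traceback.format_stack()[:-1])

    def report_failure(op_def):
        print("Operator failed; it was created at:")
        print(_op_creation_sites.get(id(op_def), "<unknown>"))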
Reviewed By: jamesr66a
Differential Revision: D5226047
fbshipit-source-id: 7edfe850e05a2ec209577142aa3368664a57a108
Summary:
Fixing the missing future package issue.
Recently we found that some of our users do not have the future module installed, so we might need a try/except wrapper around all `past` imports, as sketched below.
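A minimal sketch of the guarded import pattern, assuming `past` (from the python-future package) may be missing on some installations:

    try:
        from past.builtins import basestring
    except ImportError:
        basestring = str  # fall back to plain Python 3 semantics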
Reviewed By: Yangqing
Differential Revision: D5183547
fbshipit-source-id: 262fdf2940ee1be4454bf0b0abb9e6a0f1a0ee82
Summary: This diff is one step towards enabling the Python 3 build by making the codebase more diligent in its handling of strings.
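As a rough illustration (hypothetical helpers, not necessarily the ones in this diff) of the kind of explicit str/bytes handling Python 3 requires:

    def to_bytes(s, encoding="utf-8"):
        # Python 3 separates text (str) from binary data (bytes);
        # convert explicitly instead of relying on Python 2's implicit mixing.
        return s if isinstance(s, bytes) else s.encode(encoding)

    def to_str(s, encoding="utf-8"):
        return s if isinstance(s, str) else s.decode(encoding)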
Reviewed By: salexspb
Differential Revision: D4893083
fbshipit-source-id: 28b8adf3280e8d1f0a7dc9b0fee5ad53f2fada57
Summary: Add a RandomFailureOp, and add handling of its status code to the elastic data parallel model.
Reviewed By: andrewwdye
Differential Revision: D5065936
fbshipit-source-id: 24224f9ea414ee535c9e90cc28add5189354b0ef
Summary:
This is from a discussion with dzhulgakov: as a step towards revisiting the
core.Net autonaming, we will first guard against accidental overwrites of
existing networks in the workspace.
ajtulloch: since we are doing Predictors on mobile, this should be safe, right?
azzolini: I assume this would be safe, but would love to get your approval.
akyrola: would this hurt xray?
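A minimal sketch of the guard (hypothetical helper, not the actual diff):

    from caffe2.python import workspace

    def create_net_checked(net):
        # Refuse to silently replace a net that already exists.
        if net.Proto().name in workspace.Nets():
            raise RuntimeError(
                "Net '%s' already exists in the workspace" % net.Proto().name)
        return workspace.CreateNet(net)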
Reviewed By: dzhulgakov
Differential Revision: D4897725
fbshipit-source-id: aa41271927ad6671f07a53b9505283623f8c49e5
Summary:
Add cuDNN v6 support, including testing support for dilated convolution.
Add a check to ensure that the version of cuDNN used to compile Caffe2 and the version used to run it are compatible.
Closes https://github.com/caffe2/caffe2/pull/85
Reviewed By: bwasti
Differential Revision: D4387690
Pulled By: Yangqing
fbshipit-source-id: 312960134398dd4afe6ee0c01cdc160046c904e8
Summary: Removed the Model API because no one {seems to,should} be using it.
Reviewed By: Yangqing
Differential Revision: D4575126
fbshipit-source-id: 174d39e9aa46750f1fae8295f7e1e5452559af33
Summary:
In the tutorial, I found that the call to Model() was incorrect. After changing it, it works.
Closes https://github.com/caffe2/caffe2/pull/148
Reviewed By: bwasti
Differential Revision: D4556894
Pulled By: Yangqing
fbshipit-source-id: 949a8d0496861f19869436908ffe1ef1a0f853b1
Summary:
Shape inference allows Caffe2 to compute shapes of blobs without running a model. Update InferShapesAndTypes() to accept an optional blob:dimensions map so that external input blobs do not need to be part of the workspace.
InferShapesAndTypes() in workspace.py conditionally calls the ...from_workspace or ...from_map bindings. Note that I favored a small amount of code duplication here for the sake of readability. InferShapesAndTypes() in operator.cc has been refactored into mirrored entry points that invoke a common helper.
Other minor changes to address linter warnings.
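A minimal usage sketch, assuming the Python entry point described above:

    from caffe2.python import core, workspace

    net = core.Net("example")
    net.Relu(["data"], ["relu_out"])

    # The optional blob:dimensions map lets shapes be inferred without
    # first feeding the external input "data" into the workspace.
    shapes, types = workspace.InferShapesAndTypes(
        [net], blob_dimensions={"data": [64, 3, 224, 224]})
    print(shapes["relu_out"])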
Reviewed By: dzhulgakov
Differential Revision: D4524873
fbshipit-source-id: 56f863b759c016d7f23523f06fda3aa5bba22357
Summary:
Running RunNet() in a Python loop can be a performance issue if the Python code is doing a lot of other processing, such as data input, because Python's Global Interpreter Lock (GIL) will prevent RunNet() from being called. This can easily be fixed by making RunNet() run multiple iterations inside C++ land, as sketched below. (Another way to accomplish the same thing is to use Caffe2's "execution plans", but that requires more setup.)
+ fixed timing reporting in my OC workflow
+ improved one error log in data_workers.py
Sorry for piggybacking those small changes, but landing diffs is currently slow...
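A minimal sketch of the pattern, assuming RunNet() takes an iteration count as described:

    from caffe2.python import workspace

    # Instead of paying GIL/dispatch overhead once per iteration:
    #   for _ in range(100):
    #       workspace.RunNet(net.Name())
    # run all iterations inside C++ with a single call:
    workspace.RunNet(net.Name(), 100)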
Reviewed By: rpenggithub
Differential Revision: D4523575
fbshipit-source-id: 039a647576efad5dd9afda74df478ac22b43c103
Summary:
This is a somewhat large diff, sorry about that. It includes basic shape and type inference functionality, based on YQ's Schema scaffolding. I added some helper functions to make it easier to write simple translations.
A bigger refactoring was needed for ConvPoolBase so that we could use the shape inference already there in the schema (illustrated below).
I annotated enough operators to be able to infer forward-pass shapes for a basic convnet, and added a test for that. I intend to bootcamp some annotations and annotate enough to handle ResNets fully. I need to think about gradients, and whether they could be annotated in an easier way.
Only shapes are exposed to Python for now; types will follow later. Also, the inference is not yet called anywhere except the unit test.
Also, I am not sure everything is in the best location in the code, but it shouldn't be hard to move stuff around.
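As a rough illustration (not the actual schema code), the per-dimension output-size rule that conv/pool shape inference applies is:

    def conv_out_size(in_size, kernel, stride, pad):
        # Standard convolution/pooling spatial output size.
        return (in_size + 2 * pad - kernel) // stride + 1

    # e.g. a 3x3 conv with stride 1 and pad 1 preserves spatial size:
    assert conv_out_size(224, kernel=3, stride=1, pad=1) == 224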
Reviewed By: dzhulgakov
Differential Revision: D4436818
fbshipit-source-id: eebee5937ccc9ac09c245465302388a1fae6933c
Summary: As part of a PR from GitHub, "logging.basicConfig()" was added to workspace.py, causing havoc with existing logger configurations. It should not be there. Thanks rbgirshick for reporting.
Reviewed By: kdub0
Differential Revision: D4346077
fbshipit-source-id: 084ddcbfe6354bdaf5c97a42086c0bd36ec4629c
Summary:
The exception in FeedBlob causes many tests to fail.
Instead of raising an exception, we log a warning message and move on.
Feeding a float64 blob should not cause any issue.
Closes https://github.com/caffe2/caffe2/pull/57
Reviewed By: bwasti
Differential Revision: D4343135
Pulled By: Yangqing
fbshipit-source-id: cd1144b94c9883fcbd8bdcd78f9f93a67debc0a6
Summary:
When refactoring the data parallel model, the division of the LR by the number of devices was dropped, and thus we ended up effectively multiplying gradients by the number of devices. Thus, we need to scale the LR by 1/num_gpus, as illustrated below.
Created a test to confirm that data_parallel_model produces exactly the same results on different numbers of GPUs, given the same total batch size.
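A toy numpy illustration of why the 1/num_gpus factor restores parity with single-device training (the names here are for illustration only):

    import numpy as np

    grads = np.random.randn(4, 10)  # per-example gradients, batch of 4
    lr = 0.1
    single_gpu_update = lr * grads.mean(axis=0)

    # Split the same batch over 2 devices; per-device means get *summed*,
    # which doubles the update unless the LR is scaled by 1/num_gpus.
    num_gpus = 2
    summed = sum(h.mean(axis=0) for h in np.split(grads, num_gpus))
    multi_gpu_update = (lr / num_gpus) * summed

    assert np.allclose(single_gpu_update, multi_gpu_update)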
Reviewed By: prigoyal
Differential Revision: D4248907
fbshipit-source-id: af21ede113e6ac25f12c556de298cb18974548be
Summary:
Previously DPER was quite broken: we couldn't change loaders on the fly because the serialized model had blob names hard-coded, e.g. "nn_loader/dense". In fact, the tests worked only by accident, as both the trainer and the evaluator used the same loader type.
This diff does the following (sketched below):
1) when writing out the model, remap input blobs to 'inputs/<field_name>'
2) when loading the eval model, remap them back to the current loader
This diff uses Net.input_schema() for convenience; in particular, the schema format is implicitly serialized in the input blob names. From our discussion with Andrey, this type of hardcoding is actually acceptable, since the schema of HiveReader on the Python side is inferred via the same string-parsing procedure.
It also modifies model saving a bit so that we don't pollute the global namespace with the shape_provider net.
Overall, the code in mlp.py is pretty terrible, but I'd leave refactoring to xianjiec as part of the Layers migration.
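A minimal sketch (hypothetical helper, not the actual DPER code) of the remapping idea: serialize with canonical 'inputs/<field_name>' names, then map them back to whatever the current loader produces:

    def remap_blobs(net_proto, mapping):
        # Rename blobs in a caffe2 NetDef according to old_name -> new_name.
        for op in net_proto.op:
            op.input[:] = [mapping.get(b, b) for b in op.input]
            op.output[:] = [mapping.get(b, b) for b in op.output]
        net_proto.external_input[:] = [
            mapping.get(b, b) for b in net_proto.external_input]
        return net_proto

    # On save:      {"nn_loader/dense": "inputs/dense"}
    # On eval load: {"inputs/dense": "hive_reader/dense"}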
Reviewed By: xianjiec
Differential Revision: D4218902
fbshipit-source-id: 6cd19f0343ec1be6ddaa3581512e61879957749e
Summary:
A recurring developer issue is passing numpy arrays to FeedBlob while forgetting that a Python float is actually a double, and CUDA ops in Caffe2 don't allow doubles.
Thus, I think we should reject incorrect types already at FeedBlob() when the device option is CUDA (see the sketch below).
Added a test.
Is this too strong?
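A small illustration of the pitfall and the workaround (the exact rejection mechanics are in the diff; this just shows the dtype trap):

    import numpy as np
    from caffe2.python import workspace

    data = np.array([[1.0, 2.0], [3.0, 4.0]])  # Python floats -> float64!
    # With a CUDA device option this would now be rejected; cast explicitly:
    workspace.FeedBlob("data", data.astype(np.float32))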
Reviewed By: ajtulloch
Differential Revision: D4208153
fbshipit-source-id: 364b057a2a37b5d4b95de4e59faebdab724bb0ed