Summary:
To be used with predictor "online": C++ version of memonger for simple nets. Very simple greedy algorithm. Works well at least on Resnet-50 inference graph: only 3 shared blobs are used.
Next I will integrate this with predictor and run canary (separate diff).
Reviewed By: asaadaldien
Differential Revision: D5375392
fbshipit-source-id: d36e419e39a32e568e105657c27fb00c85a2535d
Summary:
Last time I used uuid filled into OperatorDef. And operator_tracebacks was populated using traceback.extract_stack. There were several issues with this approach:
1. A random field in OperatorDef breaks workflows relying on memoization, i.e. when computation is skipped based on already computed result before.
2. Adding one more field revealed RNNs being non forward compatible wrt to new fields in there. prototxt format seems to not allow forward compatibility (thanks jamesr66a for the investigation!). For RNNs we need to swtich them to a more resilient approach. azzolini's proposed change to OperatorDef / NetDef would allow that by just nesting NetDef dirrectly inside OperatorDef without need for extra serialization.
3. traceback.extract_stack is very slow when executable is on a remote filesystem. It does one or more os.stat for each frame on the stack. For some cases it ended up being up to 15 extra minutes on model construction.
In this diff I use a different approach which should fix all those problems above.
1.2. are solved by not adding a new field at all. Instead I report operator idx wrt to a net it runs in. Thanks akyrola and dzhulgakov for the idea. Downside here is that operator list manipulation breaks the logic and separately created ops are not covered at all.
3. I solved this by operating on raw frames without using traceback and inspect modules which end up doing a lot of file system calls. See function extract_stacktace in core.py with additional comments.
Reviewed By: dzhulgakov
Differential Revision: D5286285
fbshipit-source-id: 626dd0f5f6b8b1d86bd6bf519078b122f43ddcaa
Summary:
Advantages of cloning the tasks/execution_steps at runtime:
- Less complexity on the python side: no need to clone nets and add prefixes to blob names
- Faster start-up: we had cases of complex plans that took up to 30min to be created.
- Better isolation: each task cloned at runtime has its own child workspace, preventing false sharing of blobs.
- Opens up possibility for dynamic scheduling: Number of threads per task can be increased on the fly, at runtime.
Reviewed By: dzhulgakov
Differential Revision: D5100730
fbshipit-source-id: 71b83193b135da4e6eaf2536d8fc266528e1fdcc
Summary:
a few issues:
1. Randomization hurts memoization
1. Even if we make it non random, then we can get key colisions when loading it back.
2. RNNs use prototxt for step net and apparently its not forward compatible like normal protobuf is
I am thinking of a better less invasive solution now.
Reviewed By: jamesr66a
Differential Revision: D5272118
fbshipit-source-id: ab577fad04fbfc632e1fceffa923377a0d3da1be
Summary: This is going to show a python Caffe2 user where a failed operator was created. Motivation for having this information not right in protobuf is to avoid having it too verboose and keep ability to read protobufs of a net after a simple print() call.
Reviewed By: jamesr66a
Differential Revision: D5226047
fbshipit-source-id: 7edfe850e05a2ec209577142aa3368664a57a108
Summary:
This allows to construct a python op by passing a pickled "builder function call" as an argument to the op.
The builder function is called at PythonOp construction time and returns a function that will be called when the op is run.
This way we allow to drop the dependency on 'tokens', which didn't work properly for protobufs that get distributed to other processes. Now, the PythonOp definition is self-contained: as long as the build dependencies are right, sharding the protobuf is enough to execute the net remotely.
Reviewed By: dzhulgakov
Differential Revision: D5080833
fbshipit-source-id: a5deaca5d3143024cdb121519689224e9dbec5ce
Summary: Infer input and output device from OperatorDef through OperatorSchema. This is inspired by shape inference. With this feature, we can easily analysis device information for all blobs in the net in a generic way. It is really helpful for auto cross device execution.
Reviewed By: akyrola, dzhulgakov
Differential Revision: D5161065
fbshipit-source-id: ee656123112171a4ca00f2fb3f6940f32ddf3135
Summary:
I'm using Python ops in a project and need corresponding Python gradient ops. For my use case, only a subset of the forward op outputs have gradients and only a subset of forward op inputs have gradients. However the current implementation of `GetPythonGradient` forces all grad inputs and outputs to exist. This diff allows one to specify that only a subset of grad inputs / outputs are used when constructing the Python op.
I'm not sure if this is up to caffe2 standards, so please push back on style and content as needed.
Reviewed By: dzhulgakov
Differential Revision: D4897004
fbshipit-source-id: 96fffe8634c51a49b6bce7339a46c6235f7d4bbd
Summary: This diff is one step towards enabling python 3 build by making it be more diligent in its handling of strings.
Reviewed By: salexspb
Differential Revision: D4893083
fbshipit-source-id: 28b8adf3280e8d1f0a7dc9b0fee5ad53f2fada57
Summary:
Makes benchmark a bit hacky, but it's a benchmark after all :)
Specifically ports functionality of proper BenchmarkNet run from the ads_benchmarks so that we can see training net perf.
Also adds --report_interval parameter to print stats more often when running in hogwild mode
kdub0 - hopefully if you have time you can integrate it properly with the Flow's workflow
harouwu -shouldn't conflict too much with your current diff
Reviewed By: rayleichen
Differential Revision: D5125183
fbshipit-source-id: 9c6f1663bc85e26d6609f0f2f23aa280731939db
Summary: Add a RandomFailureOp and handling to elastic data parallel model of the status code
Reviewed By: andrewwdye
Differential Revision: D5065936
fbshipit-source-id: 24224f9ea414ee535c9e90cc28add5189354b0ef
Summary: Relax requirement on token uniqueness since a few use cases broke after the uniqueness requirement was added in a previous diff.
Reviewed By: kittipatv
Differential Revision: D5034132
fbshipit-source-id: 327eb065923e6ea152a360324316f81b7fb9564b
Summary: For distributed jobs, we were relying on the order the PythonOps were registered, which was very fragile.
Reviewed By: dzhulgakov
Differential Revision: D5016847
fbshipit-source-id: f5601467c5b0569d5e8a0efdd76abad0d703c5f5
Summary:
Part of project to make all gradient accumulation business ops in RecurrentNetworkGradientOp, this makes the accumulateInputGradients ops.
Also added way to mark operators private so they don't appear in docs.
Reviewed By: salexspb
Differential Revision: D5006698
fbshipit-source-id: 226d7afb473290c8d0f936d2cc87640be3e06615
Summary:
This is from discussion with dzhulgakov : as a step towards revisiting the
core.Net autonaming, we will first guard against accidental overwrites of
existing networks in the workspace.
ajtulloch since we are doing Predictors in mobile, this should be safe right?
azzolini - I assume this would be safe, but would love to get your approval.
akyrola - would this hurt xray?
Reviewed By: dzhulgakov
Differential Revision: D4897725
fbshipit-source-id: aa41271927ad6671f07a53b9505283623f8c49e5
Summary: Add CAFFE_ENFORCE to make sure the protobuf parsing is successful.
Reviewed By: salexspb
Differential Revision: D4843662
fbshipit-source-id: 20cab7180e6b0e5afb5e29ff3333591659e41f7a
Summary: In case of distributed task, load_from_db() loads to wrong workspace (when used inside a Python op). Passing which workspace to use explicitly so that it loads to the one Python op is being run.
Reviewed By: kennyhorror
Differential Revision: D4653692
fbshipit-source-id: 94585c012b05ee38b9ce5e8ef0efdd50aa41dd2b
Summary:
Shape inference allows Caffe2 to compute shapes of blobs without running a model. Update InferShapesAndTypes() to accept an optional blob:dimensions map so that external input blobs do not need to be part of the workspace.
InferShapesAndTypes() in workspace.py conditionally calls the ...from_workspace or ...from_map bindings. Note I favored a small amount of code duplication here for the sake of readability. InferShapesAndTypes() in operator.cc has been refactored into mirrored entry points, invoking a common helper.
Other minor changes to address linter warnings.
Reviewed By: dzhulgakov
Differential Revision: D4524873
fbshipit-source-id: 56f863b759c016d7f23523f06fda3aa5bba22357
Summary:
Running RunNet() in python in a loop can be a performance issue if the python code is doing a lot of other processing, such as data input, because python's Global Interpreter lock (GIL) will prevent the RunNet() to be called. This can easily be fixed by making RunNet() run multiple iterations inside the C++ land. (Another way to accomplish the same thing is to use Caffe2's "execution plans", but that requires more setup).
+ fixed timing reporting in my OC workflow
+ improved one error log in data_workers.py
Sorry for piggypagging those small changes, but landing diffs currently is slow...
Reviewed By: rpenggithub
Differential Revision: D4523575
fbshipit-source-id: 039a647576efad5dd9afda74df478ac22b43c103
Summary:
This is a bit large diff, sorry about it. It includes basic shape and type inference functionality, based on YQ's Schema scaffolding. I added some helper functions to make it easier to write simple translations.
Bigger refactoring was needed for ConvPoolBase so that we could use the shape inference already there in the schema.
I annotated enough operators to be able to infer forward-pass of shapes for basic convnet, and added test for that. I intend to bootcamp some annotations and annotate enough to handle Resnets fully. Need to think about gradients, if they could be annotated in an easier way.
Only shapes are now exposed to Python, types will follow later. Also the inference is not called yet anywhere but unit test.
Also I am not sure if everything is in the best location in the code, but shouldn't be hard to move stuff around.
Reviewed By: dzhulgakov
Differential Revision: D4436818
fbshipit-source-id: eebee5937ccc9ac09c245465302388a1fae6933c
Summary:
Turns out that building on raspbian is easy as a cake for caffe2 - cmake is awesome.
Closes https://github.com/caffe2/caffe2/pull/112
Differential Revision: D4480985
Pulled By: Yangqing
fbshipit-source-id: 5dbe5e1e71d8680dea7a5ec8a9ce7fbe6aa5270a
Summary: Replace ParseFromString with ParseProtobufFromLargeString to get around the limitation of the 64MB limit.
Reviewed By: Yangqing
Differential Revision: D4466226
fbshipit-source-id: b68a6efc76955db294ddb0d23bbaf03b69e4952a
Summary: Makes it much nicer to spot errors, especially in iPython notebook.
Reviewed By: kennyhorror
Differential Revision: D4465726
fbshipit-source-id: c0adaf5168248a70987ff9d5dfce54a622ff2219
Summary:
DPER has very strange python ops that play with Workspace - they are somewhat similar to LoadOp/SaveOp, so I guess the semantics is fine.
Thus it makes sense to allow python operators to receive workspace pointer similarly to regular Operators.
I didn't figure out a better way to implement optional argument than just checking the number of args function receives on python side.
Reviewed By: ajtulloch
Differential Revision: D4242943
fbshipit-source-id: d97d4227815b741c8f884cfe254b06d2b56b5a41
Summary:
This is #2 of a series of changes. It did the following:
(1) a few refactor of the MKL memory interface
(2) an initial MKLContext to deal with MKL specific computations
(3) Provide MKLMemory access in Python with the blob feeder/fetcher registration.
Reviewed By: dzhulgakov
Differential Revision: D4210123
fbshipit-source-id: adea1f1ffbd0b9ffdd55092676468c16bec08992