Summary:
Adds the ability for a script function to call another and adds the extern function to register an external Caffe2 Net that can be called by the script.
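The binding itself isn't shown in this message; a hypothetical sketch of the script source this enables, where `preprocess` stands in for a Caffe2 Net registered via the new extern function (the script text is illustrative):

```python
# Hypothetical script source for the feature described above.
script = """
def double(x):
    y = x + x
    return y

def main(x):
    x = preprocess(x)   # call into the extern-registered Caffe2 Net
    return double(x)    # one script function calling another
"""
```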
Closes https://github.com/caffe2/caffe2/pull/1591
Reviewed By: jamesr66a
Differential Revision: D6515877
Pulled By: zdevito
fbshipit-source-id: b893d9e4bacd7389b550ac8a37ad7974b95de749
Summary: Experimental code that allows you to write C2 NetDefs directly using Python-like syntax. This includes the ability to write native control flow (if, while) and have it turned into IfOp and WhileOp.
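No sample is included in the message; a hypothetical snippet of the Python-like syntax it describes, with native control flow lowered to IfOp and WhileOp (the exact grammar is an assumption):

```python
# Hypothetical script source; 'while' becomes WhileOp, 'if' becomes IfOp.
script = """
def fib(n):
    a = 0
    b = 1
    while n > 0:
        t = a + b
        a = b
        b = t
        n = n - 1
    if n < 0:
        a = 0
    return a
"""
```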
Reviewed By: jamesr66a, dzhulgakov
Differential Revision: D6123298
fbshipit-source-id: 25fc078b5769be61ac7fb3aa9a7c95bd88dccc30
Summary: Today, when a PythonOp throws an exception, we log the error and fail the op. Later we assert that the op/net/plan succeeded and throw with a generic message, so the user must tail the logs to find the real error. Instead, align with the exception handling of other ops - throw directly. This includes the full context of the exception in the error message.
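A minimal sketch of the new behavior, assuming the standard caffe2.python API (the op body and names are illustrative):

```python
from caffe2.python import core, workspace

def buggy(inputs, outputs):
    # Any exception raised here now surfaces directly in the caller's error,
    # instead of only being logged and followed by a generic enforce failure.
    raise ValueError("input blob has the wrong shape")

net = core.Net("pythonop_errors")
net.Python(buggy)([], [])
try:
    workspace.RunNetOnce(net)
except Exception as e:
    print(e)  # the message now carries the original ValueError context
```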
Reviewed By: Yangqing, akyrola
Differential Revision: D6359684
fbshipit-source-id: 85133ba6562759607a3971449120647cbacce946
Summary: Pass the list of observers to rnnExecutor_ and attach them to operators
Reviewed By: akyrola
Differential Revision: D6279655
fbshipit-source-id: 086dde1bf6edbfb36082d6b4de33ec41f0bbefab
Summary:
Implementation of a polling async net executor; a usage sketch follows the notes below.
Notes:
- New net executor async_polling - schedules CPU and GPU ops asynchronously and uses a single polling thread
- Events: updates Caffe2 events to support async CPU events, adding new methods:
  - Query() - non-blocking check of an event's state: INITIALIZED -> RECORDED -> SUCCESS/FAILED
  - ErrorMessage() - when an operation runs asynchronously and fails, calling this on its event returns the error message
- Tasks: uses the existing DAGNet algorithm to compute CPU and GPU chains, with a separate task for each chain
- Polling: uses a single thread to query event states via the Event interface - for CPU tasks it atomically queries task state, for GPU tasks it uses cudaEventQuery
- Scheduling of CPU ops: uses global thread pools
- Scheduling of GPU ops: uses a GPU thread pool per GPU device
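A minimal usage sketch, assuming the executor is selected through the NetDef type field like other Caffe2 executors (the op choice is illustrative):

```python
from caffe2.python import core, workspace

net = core.Net("mixed_device_net")
net.ConstantFill([], ["x"], shape=[1], value=1.0)
# ... a real net would mix CPU and GPU ops here ...
net.Proto().type = "async_polling"  # select the polling async executor
workspace.CreateNet(net)
workspace.RunNet(net.Proto().name)
```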
Reviewed By: dzhulgakov
Differential Revision: D5985110
fbshipit-source-id: a9de7fcbb71d046a3aa1b573072b89a65dfeee8c
Summary: The observer framework can now be used in Python, plus a small writeup of how to use it. This is D6035393 with a fix for ct-scan.
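The writeup itself is not reproduced here; a minimal sketch, assuming the binding exposes AddObserver on core.Net and a built-in TimeObserver (the method and accessor names are assumptions):

```python
from caffe2.python import core, workspace

net = core.Net("observed")
net.ConstantFill([], ["x"], shape=[1], value=1.0)

ob = net.AddObserver("TimeObserver")  # assumed attach API
workspace.CreateNet(net)
workspace.RunNet(net.Proto().name)
print(ob.average_time())              # assumed TimeObserver accessor
```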
Reviewed By: salexspb
Differential Revision: D6066380
fbshipit-source-id: 896c4c580d4387240b81ac2dbbc43db51d4bfeb9
Summary: observer framework can now be used in python + a small writeup of how to use it
Reviewed By: sf-wind
Differential Revision: D6035393
fbshipit-source-id: 4563cf0203095fa979bb2160621cd16dd22ff830
Summary: observer framework can now be used in python + a small writeup of how to use it
Reviewed By: salexspb
Differential Revision: D5905002
fbshipit-source-id: e40ec24a55e08fb73beea9b4f3b68e71fc66ffb1
Summary:
Useful for figuring out which version people built with. We can just query the --caffe2_version gflag or read core.build_options from Python.
Also adds CMAKE_INSTALL_RPATH_USE_LINK_PATH - without it, the build failed on my Mac. How should it be tested?
Closes https://github.com/caffe2/caffe2/pull/1271
Reviewed By: bddppq
Differential Revision: D5940750
Pulled By: dzhulgakov
fbshipit-source-id: 45b4c94f67e79346a10a65b34f40fd258295dad1
Summary:
Also add the ability to mark an argument as required.
Added a string constant `OpSchema::Arg_IsTest` for `is_test` arg.
If users define the `is_test` argument with `ArgIsTest(...)`, it automatically becomes a required argument; meanwhile, users can still use `Arg("is_test", ...)` to define an optional `is_test` argument.
Reviewed By: akyrola
Differential Revision: D5812391
fbshipit-source-id: eaaba50d027813a8012389edc6c459de23c3c728
Summary:
Implemented ApplyTransformIfFaster: determines whether a transform makes the net faster, then returns whichever net is better.
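A sketch of the intended call, assuming it is exposed through caffe2.python.workspace next to ApplyTransform (the transform key is illustrative):

```python
from caffe2.python import core, workspace

net = core.Net("candidate")
# ... build the net ...
init_net = core.Net("init")

# Benchmarks the original vs. the transformed net and returns the faster one.
new_proto = workspace.ApplyTransformIfFaster(
    "ConvToNNPack", net.Proto(), init_net.Proto()
)
```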
Reviewed By: bwasti
Differential Revision: D5534535
fbshipit-source-id: 509943205b0c454bf30fb01343ac4e88d1441c39
Summary: This diff replaces the core of the DAG memonger algorithm, _compute_blob_recycling_for_dag, with a C++ implementation.
Reviewed By: akyrola
Differential Revision: D5544219
fbshipit-source-id: 9f868880c8d0eb997ad3dd39433f9d0b9216d303
Summary: Allow the use of apply_transform() in the python API
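A sketch, assuming the binding takes a registered transform key and a NetDef (the key is illustrative):

```python
from caffe2.python import core, workspace

net = core.Net("to_transform")
# ... build the net ...
# Returns a transformed copy of the NetDef; the original is left untouched.
transformed_proto = workspace.ApplyTransform("ConvToNNPack", net.Proto())
```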
Reviewed By: bwasti
Differential Revision: D5530483
fbshipit-source-id: 61a6d36fe125c89629fdeea040a717c453d84417
Summary:
Nothing gets changed - this just makes it easier to deal with build systems. Also, everything MKL-related now lives under mkl/.
Reviewed By: dzhulgakov
Differential Revision: D5505157
fbshipit-source-id: ddb2e6ac290a146a7cb495da23bb0e5b5594bd2a
Summary:
This is needed for us to do more fine-grained dispatch based on CPU arch, so I figured we should just add it. It can help Dima and Misha with optimization work, I think?
Reviewed By: dzhulgakov
Differential Revision: D5477444
fbshipit-source-id: 48aaf8bd799e9755493cd51c793ceec080a8846c
Summary:
This removes/comments out/silences one or more unused parameters in the files.
We are going to enable `-Wunused-parameter` in fbcode and this fixes a case that automated tooling can't handle.
This diff is automatically generated.
Reviewers are added heuristically.
Reviewed By: dzhulgakov
Differential Revision: D5437217
fbshipit-source-id: c2fc5ed30e7ee47b8c40248f89a9f4304ce7c098
Summary:
To be used with the predictor "online": a C++ version of memonger for simple nets, using a very simple greedy algorithm. It works well at least on the ResNet-50 inference graph: only 3 shared blobs are used.
Next I will integrate this with predictor and run canary (separate diff).
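A sketch of how this could be reached from Python, assuming the memonger module wraps the C++ implementation (the function name and blob names are assumptions):

```python
from caffe2.python import core, memonger

net = core.Net("predictor")
# ... inference ops ...
# Blobs that must never be recycled (e.g. the predictor's inputs and outputs).
static_blobs = ["data", "softmax"]
optimized_proto = memonger.optimize_inference_fast(net.Proto(), static_blobs)
```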
Reviewed By: asaadaldien
Differential Revision: D5375392
fbshipit-source-id: d36e419e39a32e568e105657c27fb00c85a2535d
Summary:
Last time, I used a uuid filled into OperatorDef, and operator_tracebacks was populated using traceback.extract_stack. There were several issues with this approach:
1. A random field in OperatorDef breaks workflows relying on memoization, i.e., when computation is skipped because the result was already computed.
2. Adding one more field revealed that RNNs are not forward compatible with respect to new fields: the prototxt format seems not to allow forward compatibility (thanks jamesr66a for the investigation!). For RNNs we need to switch to a more resilient approach; azzolini's proposed change to OperatorDef / NetDef would allow that by nesting NetDef directly inside OperatorDef, without the need for extra serialization.
3. traceback.extract_stack is very slow when the executable is on a remote filesystem: it does one or more os.stat calls for each frame on the stack. In some cases this added up to 15 extra minutes of model construction time.
In this diff I use a different approach, which should fix all of the problems above.
Issues 1 and 2 are solved by not adding a new field at all. Instead, I report the operator index relative to the net it runs in (thanks akyrola and dzhulgakov for the idea). The downside is that manipulating the operator list breaks the logic, and separately created ops are not covered at all.
Issue 3 is solved by operating on raw frames, without the traceback and inspect modules, which end up doing a lot of file system calls. See the function extract_stacktrace in core.py for additional comments.
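A sketch of the lookup after a failure, assuming the mapping lives in workspace.operator_tracebacks keyed by net name and operator index (names and entry layout are assumptions):

```python
from caffe2.python import workspace

net_name = "train"  # illustrative
op_idx = 0          # index of the failing operator within its net
# Assumed layout: stack frame entries recorded when the op was created.
for frame in workspace.operator_tracebacks[net_name][op_idx]:
    print(frame)
```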
Reviewed By: dzhulgakov
Differential Revision: D5286285
fbshipit-source-id: 626dd0f5f6b8b1d86bd6bf519078b122f43ddcaa
Summary:
Advantages of cloning the tasks/execution_steps at runtime:
- Less complexity on the python side: no need to clone nets and add prefixes to blob names
- Faster start-up: we had cases of complex plans that took up to 30min to be created.
- Better isolation: each task cloned at runtime has its own child workspace, preventing false sharing of blobs.
- Opens up possibility for dynamic scheduling: Number of threads per task can be increased on the fly, at runtime.
Reviewed By: dzhulgakov
Differential Revision: D5100730
fbshipit-source-id: 71b83193b135da4e6eaf2536d8fc266528e1fdcc
Summary:
a few issues:
1. Randomization hurts memoization.
2. Even if we make it non-random, we can get key collisions when loading it back.
3. RNNs use prototxt for the step net, and apparently it is not forward compatible the way normal protobuf is.
I am thinking of a better, less invasive solution now.
Reviewed By: jamesr66a
Differential Revision: D5272118
fbshipit-source-id: ab577fad04fbfc632e1fceffa923377a0d3da1be
Summary: This is going to show a Python Caffe2 user where a failed operator was created. The motivation for keeping this information out of the protobuf itself is to avoid making it too verbose and to preserve the ability to read a net's protobuf after a simple print() call.
Reviewed By: jamesr66a
Differential Revision: D5226047
fbshipit-source-id: 7edfe850e05a2ec209577142aa3368664a57a108
Summary:
This allows constructing a Python op by passing a pickled "builder function call" as an argument to the op.
The builder function is called at PythonOp construction time and returns a function that will be called when the op is run.
This lets us drop the dependency on 'tokens', which didn't work properly for protobufs that get distributed to other processes. Now the PythonOp definition is self-contained: as long as the build dependencies are right, sharding the protobuf is enough to execute the net remotely.
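A sketch, under the assumption that Net.Python accepts a (builder_fn, args, kwargs) tuple that gets pickled into the op definition (the tuple form and the tensor accessors are assumptions):

```python
from caffe2.python import core

def build_scaler(factor):
    # Runs at PythonOp construction time on whichever process executes the net.
    def run(inputs, outputs):
        outputs[0].feed(inputs[0].data * factor)  # assumed tensor accessors
    return run

net = core.Net("pickled_builder")
# The builder call is pickled into the op, so the protobuf is self-contained.
net.Python((build_scaler, (2.0,), {}))(["x"], ["y"])
```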
Reviewed By: dzhulgakov
Differential Revision: D5080833
fbshipit-source-id: a5deaca5d3143024cdb121519689224e9dbec5ce
Summary: Infer input and output devices from an OperatorDef through its OperatorSchema. This is inspired by shape inference. With this feature, we can easily analyze device information for all blobs in the net in a generic way, which is really helpful for automatic cross-device execution.
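A sketch from the Python side, assuming the feature is surfaced as core.InferOpBlobDevices by analogy with shape inference (the function name is an assumption):

```python
from caffe2.python import core

op = core.CreateOperator("CopyCPUToGPU", ["x_cpu"], ["x_gpu"])
input_devs, output_devs = core.InferOpBlobDevices(op)
# For CopyCPUToGPU, inputs should resolve to CPU and outputs
# to the op's GPU device.
```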
Reviewed By: akyrola, dzhulgakov
Differential Revision: D5161065
fbshipit-source-id: ee656123112171a4ca00f2fb3f6940f32ddf3135
Summary:
I'm using Python ops in a project and need corresponding Python gradient ops. For my use case, only a subset of the forward op outputs have gradients and only a subset of forward op inputs have gradients. However the current implementation of `GetPythonGradient` forces all grad inputs and outputs to exist. This diff allows one to specify that only a subset of grad inputs / outputs are used when constructing the Python op.
I'm not sure if this is up to caffe2 standards, so please push back on style and content as needed.
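A sketch of the new knobs, assuming they surface as grad_output_indices / grad_input_indices on Net.Python (the function bodies are illustrative):

```python
from caffe2.python import core

def forward(inputs, outputs):
    outputs[0].feed(inputs[0].data + inputs[1].data)
    outputs[1].feed(inputs[0].data * 2.0)

def backward(inputs, outputs):
    # Only the gradient of output 0 flows in; only input 0 gets a gradient.
    outputs[0].feed(inputs[-1].data)

net = core.Net("partial_grads")
net.Python(
    forward,
    backward,
    grad_output_indices=[0],  # only output 0 has a gradient
    grad_input_indices=[0],   # only input 0 receives a gradient
)(["x", "y"], ["sum", "scaled"])
```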
Reviewed By: dzhulgakov
Differential Revision: D4897004
fbshipit-source-id: 96fffe8634c51a49b6bce7339a46c6235f7d4bbd
Summary: This diff is one step towards enabling a Python 3 build by making the code more diligent in its handling of strings.
Reviewed By: salexspb
Differential Revision: D4893083
fbshipit-source-id: 28b8adf3280e8d1f0a7dc9b0fee5ad53f2fada57
Summary:
Makes the benchmark a bit hacky, but it's a benchmark after all :)
Specifically, ports the proper BenchmarkNet run functionality from ads_benchmarks so that we can see training-net perf.
Also adds a --report_interval parameter to print stats more often when running in hogwild mode.
kdub0 - hopefully, if you have time, you can integrate it properly with Flow's workflow.
harouwu - shouldn't conflict too much with your current diff.
Reviewed By: rayleichen
Differential Revision: D5125183
fbshipit-source-id: 9c6f1663bc85e26d6609f0f2f23aa280731939db
Summary: Add a RandomFailureOp, and add handling of its status code to the elastic data parallel model.
Reviewed By: andrewwdye
Differential Revision: D5065936
fbshipit-source-id: 24224f9ea414ee535c9e90cc28add5189354b0ef
Summary: Relax requirement on token uniqueness since a few use cases broke after the uniqueness requirement was added in a previous diff.
Reviewed By: kittipatv
Differential Revision: D5034132
fbshipit-source-id: 327eb065923e6ea152a360324316f81b7fb9564b
Summary: For distributed jobs, we were relying on the order in which the PythonOps were registered, which was very fragile.
Reviewed By: dzhulgakov
Differential Revision: D5016847
fbshipit-source-id: f5601467c5b0569d5e8a0efdd76abad0d703c5f5
Summary:
Part of a project to move all gradient-accumulation business into ops in RecurrentNetworkGradientOp; this adds the accumulateInputGradients ops.
Also added a way to mark operators private so they don't appear in the docs.
Reviewed By: salexspb
Differential Revision: D5006698
fbshipit-source-id: 226d7afb473290c8d0f936d2cc87640be3e06615
Summary:
This is from a discussion with dzhulgakov: as a step towards revisiting the
core.Net autonaming, we will first guard against accidental overwrites of
existing networks in the workspace.
ajtulloch - since we are doing Predictors on mobile, this should be safe, right?
azzolini - I assume this would be safe, but would love to get your approval.
akyrola - would this hurt xray?
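A sketch of the resulting behavior, assuming the guard lands in workspace.CreateNet with an explicit overwrite escape hatch:

```python
from caffe2.python import core, workspace

net = core.Net("prod_net")
workspace.CreateNet(net)
# Re-creating a net with the same name now fails instead of silently
# replacing it; opting back into replacement must be explicit:
workspace.CreateNet(net, overwrite=True)
```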
Reviewed By: dzhulgakov
Differential Revision: D4897725
fbshipit-source-id: aa41271927ad6671f07a53b9505283623f8c49e5
Summary: Add CAFFE_ENFORCE to make sure the protobuf parsing is successful.
Reviewed By: salexspb
Differential Revision: D4843662
fbshipit-source-id: 20cab7180e6b0e5afb5e29ff3333591659e41f7a
Summary: For distributed tasks, load_from_db() loads into the wrong workspace when used inside a Python op. Pass the workspace to use explicitly so that it loads into the one the Python op is being run in.
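Not shown in this message, but related: a sketch assuming the Python op is built with pass_workspace=True so its function receives the workspace it actually runs in (the op body is illustrative):

```python
from caffe2.python import core

def load_checkpoint(inputs, outputs, ws):
    # 'ws' is the workspace this op executes in, so load_from_db-style logic
    # can target it explicitly instead of the global default workspace.
    pass

net = core.Net("checkpoint_loader")
net.Python(load_checkpoint, pass_workspace=True)([], [])
```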
Reviewed By: kennyhorror
Differential Revision: D4653692
fbshipit-source-id: 94585c012b05ee38b9ce5e8ef0efdd50aa41dd2b