Summary: CopyGPUToGPU does not exist. Copy seems to do the trick. Didn't go into details of how copy works, not sure if it ends up triggering UVA.
Reviewed By: akyrola
Differential Revision: D5471014
fbshipit-source-id: d8bc1aed9b19070c92f3ffc76f5617bdd0054563
Summary: Quite common confusion is how to use StopGradient, and typical bug is to forget to specify input=output. This adds a sanity check to gradient builder that checks if some StopGradient outputs are orphaned.
Reviewed By: dzhulgakov
Differential Revision: D5458341
fbshipit-source-id: 056fef4f0ee53eb10e66e9be0ecb55b55f9cc3d7
Summary: added support of passing remap_funcs to clone_and_bind_net, so that it can pass it to clone method. Added other utils to ensure RecurrentNetwork operator is correctly cloned based on the remap_blob. The reason that RecurrentNetwork operator needs special treatment is that its arguments contain proto and blobs.
Reviewed By: kittipatv
Differential Revision: D5421532
fbshipit-source-id: 5de68365ce97df2de483f02ad260d78c8d35eead
Summary: Allows to override the input/output record as long as the field blobs are the same.
Reviewed By: yangyangyyy
Differential Revision: D5362132
fbshipit-source-id: 3ac2ac22802902b7eed5c226b00a7e1971ad264c
Summary:
It is quite common question when users get some variant of "blob has version 2 but gradient expects version 1" in their backward pass. The error message is completely unhelpful.
To remedy this, I added proper debug information which tells user how the version number of a blob was incremented over time. i.e which ops caused the version to go op. This should help
understand the issue.
Reviewed By: dzhulgakov
Differential Revision: D5358227
fbshipit-source-id: bc09d048ac33200c35d56460e44e86c2f2888f3f
Summary:
Last time I used uuid filled into OperatorDef. And operator_tracebacks was populated using traceback.extract_stack. There were several issues with this approach:
1. A random field in OperatorDef breaks workflows relying on memoization, i.e. when computation is skipped based on already computed result before.
2. Adding one more field revealed RNNs being non forward compatible wrt to new fields in there. prototxt format seems to not allow forward compatibility (thanks jamesr66a for the investigation!). For RNNs we need to swtich them to a more resilient approach. azzolini's proposed change to OperatorDef / NetDef would allow that by just nesting NetDef dirrectly inside OperatorDef without need for extra serialization.
3. traceback.extract_stack is very slow when executable is on a remote filesystem. It does one or more os.stat for each frame on the stack. For some cases it ended up being up to 15 extra minutes on model construction.
In this diff I use a different approach which should fix all those problems above.
1.2. are solved by not adding a new field at all. Instead I report operator idx wrt to a net it runs in. Thanks akyrola and dzhulgakov for the idea. Downside here is that operator list manipulation breaks the logic and separately created ops are not covered at all.
3. I solved this by operating on raw frames without using traceback and inspect modules which end up doing a lot of file system calls. See function extract_stacktace in core.py with additional comments.
Reviewed By: dzhulgakov
Differential Revision: D5286285
fbshipit-source-id: 626dd0f5f6b8b1d86bd6bf519078b122f43ddcaa
Summary:
Advantages of cloning the tasks/execution_steps at runtime:
- Less complexity on the python side: no need to clone nets and add prefixes to blob names
- Faster start-up: we had cases of complex plans that took up to 30min to be created.
- Better isolation: each task cloned at runtime has its own child workspace, preventing false sharing of blobs.
- Opens up possibility for dynamic scheduling: Number of threads per task can be increased on the fly, at runtime.
Reviewed By: dzhulgakov
Differential Revision: D5100730
fbshipit-source-id: 71b83193b135da4e6eaf2536d8fc266528e1fdcc
Summary:
a few issues:
1. Randomization hurts memoization
1. Even if we make it non random, then we can get key colisions when loading it back.
2. RNNs use prototxt for step net and apparently its not forward compatible like normal protobuf is
I am thinking of a better less invasive solution now.
Reviewed By: jamesr66a
Differential Revision: D5272118
fbshipit-source-id: ab577fad04fbfc632e1fceffa923377a0d3da1be
Summary: Hard-to-debug problems arise when a gradient creator fails when the forward op is incorrect itself. Add checking of the schema before callig the creator. Also clarify the error messages
Reviewed By: Yangqing
Differential Revision: D5256016
fbshipit-source-id: 78550f7e2ce5b88e26b69fdae4be0eece52edfea
Summary: This was only needed in order to initialize stateful PythonOps. Now PythonOp has support for initialization at Op creation time, so this is not used anymore.
Reviewed By: dzhulgakov
Differential Revision: D5242908
fbshipit-source-id: dbaa249466dd0f37f25d204d387b1f99c6dd4fed
Summary: This is going to show a python Caffe2 user where a failed operator was created. Motivation for having this information not right in protobuf is to avoid having it too verboose and keep ability to read protobufs of a net after a simple print() call.
Reviewed By: jamesr66a
Differential Revision: D5226047
fbshipit-source-id: 7edfe850e05a2ec209577142aa3368664a57a108
Summary:
This allows to construct a python op by passing a pickled "builder function call" as an argument to the op.
The builder function is called at PythonOp construction time and returns a function that will be called when the op is run.
This way we allow to drop the dependency on 'tokens', which didn't work properly for protobufs that get distributed to other processes. Now, the PythonOp definition is self-contained: as long as the build dependencies are right, sharding the protobuf is enough to execute the net remotely.
Reviewed By: dzhulgakov
Differential Revision: D5080833
fbshipit-source-id: a5deaca5d3143024cdb121519689224e9dbec5ce
Summary:
We waste extra memory by creating two autosplit gradient
blobs and then accumulating it into them main one. Sometimesk, when Sum
/ Sub ops are involved, we can avoid wasting extra memory at all.
Ideally we would not waste any memory and make ops add to the same
blob rather then calculating separate results and then mering
them. But it would require a substantial change to the frameworks and
rewriting a lot of operators.
Reviewed By: dzhulgakov
Differential Revision: D5157667
fbshipit-source-id: 8293824d6cdd971d8853ae90aee68e4a6d1e132b
Summary:
It's very useful for simple cases like benchmarking nets where we want to encode input/output record in the net and don't want to go through the hurdles of storing input/output record in MetaNetDef.
For those cases I propose remapping the input/output record before saving to 'input_record/{field_name}'. Then we can recover input/output record back just based on the names of the blobs.
Differential Revision: D5170473
fbshipit-source-id: ac5daa60051605ed93022aec1377a49f08f15663
Summary:
Static RNN allows to unroll an RNN into Caffe2 graph using all existing cell abstractions. In this diff I introduce several new tests that already caught a few bugs in our RecurrentNetworkOp gradient accumulation logic by comparing it to an unrolled version.
Another use case is perf - potentially we can run an unrolled net faster because DAGNet will have access to the whole graph. Same about memonger. But this work is not part of this diff
Reviewed By: akyrola
Differential Revision: D5200943
fbshipit-source-id: 20f16fc1b2ca500d06ccc60c4cec6e81839149dc
Summary:
when building a multi layer static RNN the last timestep of
the first layer (and other layers except the last one) doesn't get a
gradient for the cell state as normally user uses results only from
the last layer and cell state doesn't go up either.
ZeroGradient provides a general solution for injecting 0 gradient
blobs. It is in some way similar to StopGradient operator which is
also specialcased
Reviewed By: bwasti
Differential Revision: D5198375
fbshipit-source-id: a21d0cfb3676a77fac72e5897a200d0bd25fc6de
Summary:
This diff plan to attack the problem where we want to just annotate device option for operators and leave Caffe2 to help us inject cross device copy functions. This feature would be useful for mixed device training and multi device training with several nets, where previously we do the heavy lifting of adding copy functions ourselves.
Ideally, this feature will happen like this:
//construct your nets first
core.InjectDeviceCopyAmongNets([train_init, train_net, ...])
My ideas are written in comments. I will update them here as well later.
Reviewed By: dzhulgakov
Differential Revision: D5134103
fbshipit-source-id: 173f7da9d1773d1c50ccdc27f1b5cd3067b04af5
Summary: Infer input and output device from OperatorDef through OperatorSchema. This is inspired by shape inference. With this feature, we can easily analysis device information for all blobs in the net in a generic way. It is really helpful for auto cross device execution.
Reviewed By: akyrola, dzhulgakov
Differential Revision: D5161065
fbshipit-source-id: ee656123112171a4ca00f2fb3f6940f32ddf3135
Summary:
I'm using Python ops in a project and need corresponding Python gradient ops. For my use case, only a subset of the forward op outputs have gradients and only a subset of forward op inputs have gradients. However the current implementation of `GetPythonGradient` forces all grad inputs and outputs to exist. This diff allows one to specify that only a subset of grad inputs / outputs are used when constructing the Python op.
I'm not sure if this is up to caffe2 standards, so please push back on style and content as needed.
Reviewed By: dzhulgakov
Differential Revision: D4897004
fbshipit-source-id: 96fffe8634c51a49b6bce7339a46c6235f7d4bbd
Summary:
fixing missing future package issue.
Recently we found some of our users does not have future module support. So we might need a try/catch wrapper around all past import
Reviewed By: Yangqing
Differential Revision: D5183547
fbshipit-source-id: 262fdf2940ee1be4454bf0b0abb9e6a0f1a0ee82
Summary:
Bug repro is in a test. Generally speaking accumulation was
not happening if len(ys) >= 2 (list of blobs we compute gradients
from) and for some blob in the net it was both in ys list and also got
a gradient propagated from another element in ys.
Reviewed By: akyrola
Differential Revision: D5121695
fbshipit-source-id: 282d88f2f4f6e27dadae311964f40246a2739130
Summary: These return views in Python 3 which would not do anything in a lot of usages currently present in Caffe2. This diff simply removes (almost) all usages of these two in Caffe2 and sub projects in favor of comprehensions which are also easier to read/understand
Reviewed By: akyrola
Differential Revision: D5142049
fbshipit-source-id: e800631d2df7d0823fed698cae46c486038007dc
Summary:
hankun is using the optimizer, but having mixed set of of GPU and CPU operators. Currently this won't work with optimizer since it adds optimizers for all parameters in the current device scope. But we can actually infer the device that a param belongs to by looking at the device option in the param_init_net.
Added a test as well.
Reviewed By: salexspb
Differential Revision: D5133652
fbshipit-source-id: ad8689d75ac1f5c78981bae1b6978fe91e40ef0f
Summary: This diff is one step towards enabling python 3 build by making it be more diligent in its handling of strings.
Reviewed By: salexspb
Differential Revision: D4893083
fbshipit-source-id: 28b8adf3280e8d1f0a7dc9b0fee5ad53f2fada57
Summary: Relax requirement on token uniqueness since a few use cases broke after the uniqueness requirement was added in a previous diff.
Reviewed By: kittipatv
Differential Revision: D5034132
fbshipit-source-id: 327eb065923e6ea152a360324316f81b7fb9564b
Summary: For distributed jobs, we were relying on the order the PythonOps were registered, which was very fragile.
Reviewed By: dzhulgakov
Differential Revision: D5016847
fbshipit-source-id: f5601467c5b0569d5e8a0efdd76abad0d703c5f5
Summary: External inputs must be computed before updating the _ops_output structure, otherwise if the net to be appended outputs the external input, it is not added correctly
Differential Revision: D5013496
fbshipit-source-id: 6a83d0a6f1c63ef8ae7bec4d862c0ac2a690d47b
Summary: It is good practice to provide __dir__ whenever __getattr__ is defined so that tooling will work intelligently. In particular, it is hard to explore the available methods in iPython without tab completion.
Reviewed By: dzhulgakov
Differential Revision: D5006545
fbshipit-source-id: 1a150d91d54637d80b292764513943ff70d971b4
Summary:
Layer to allow model to follow different paths for each instantiation context and join later. Together with tagging system cleanup (this is a separate issue), this should reduce the need to write a layer to differentiate between context.
Re: tagging system clean up, we should make exclusion more explicit: EXCLUDE_FROM_<CONTEXT>. This would simplify instation code. TRAIN_ONLY should become a set of all EXCLUDE_FROM_*, except EXCLUDE_FROM_TRAIN.
Reviewed By: kennyhorror
Differential Revision: D4964949
fbshipit-source-id: ba6453b0deb92d1989404efb9d86e1ed25297202
Summary: I ran into this earlier and the debug messages were not helpful enuogh
Reviewed By: kennyhorror
Differential Revision: D4985754
fbshipit-source-id: b3d12b5e2cfa1b54fca9126768c84c902664ef28
Summary:
When appending net A to net B, an external input of net A should not be added as
an external input of net B if net B is outputting that blob.
Reviewed By: dzhulgakov
Differential Revision: D4975921
fbshipit-source-id: a5c0ada7b96d851e57d345244d322dd93c7be8e4
Summary:
This PR is based on commit "977c6b3" as this version allows MKL to use all the cores available.
All MKL related files are added here after incorporating review comments, major changes include
1. usage of Clang-format(Linter) with --style = Google
2. usage of macros for checking input and filter dimension in the mkl operators
3. merged Max and Average pooling functions
4. created a new folder for mkl related python scripts in Python folder and moved them there
5. there is no mkl_alexnet_test.py as that was redundant while convnet_benchmark.py does the same thing
Closes https://github.com/caffe2/caffe2/pull/270
Differential Revision: D4905219
Pulled By: Yangqing
fbshipit-source-id: e5f5b189714a835b93b9ebda24c52e09572dfca7
Summary:
This is needed to have a stateful PythonOp (such as the PyTorch in the following diff) where computing f will produce a state (not tensors) thats consumed by grad_f.
python_func_type is a type that constructed as python_func_type(f) and provides forward, backward methods (will be delegated to f, &f_grad). We are constructing this object in at Op registration time to have it as thread local.
Differential Revision: D4900963
fbshipit-source-id: 00a6a55fa372e2244048921914e22e710d11f7ce
Summary:
Fix issue that amyzhang encountered. She was using ConstantFill to create a blob of same size as an another blob. This caused the gradient op computation flow to interrupt through the ConstantFil since the gradient for the input blob was set to None (although it had another gradient already set). The correct solution is to avoid overwriting gradient assignments with None, if they already have a gradient. UNLESS that blob is output of the same op, as with StopGradient op. (Note that Amy's problem was fixed by using instead a fixed shape ConstantFill and Add with broadcast=1, which is better solution anyway).
Not sure if I explained this well, but see the new unit tests. Before this change, the testAddAndDynamicConstant failed but the testAddAndStaticConstant succeeded.
Reviewed By: dzhulgakov
Differential Revision: D4861176
fbshipit-source-id: 3b53621bfaba2e36786a5e4664145038995f6616
Summary:
Quite large diff to make cuDNN LSTM and our LSTM produce same results and provide python API for the cuDNN LSTM.
* Added operators RecurrentParamGet and RecurrentParamSet to access weights and biases for the different gates, input/recurrent.
* Removed RecurrentInit as not needed
* recurrent.cudnn_LSTM() returns a special net and mapping that can be used to retrieve the parameters from the LSTM
* recurrent.cudnn_LSTM() can be passed blobs that have the parameters for the individual gate weights and biases
* recurrnet.InitFromLSTMParams() can be used to initialize our own LSTM from CUDNN params. This way we can test if cuDNN and our own produce the same result.
recurrent_test.py tests for the equivalency
Reviewed By: salexspb
Differential Revision: D4654988
fbshipit-source-id: 6c1547d873cadcf33e03b0e0110248f0a7ab8cb0
Summary:
First, this diff includes a full test of data-parallel LSTM, which confirms it works correctly. To make it work, some changes had to be made:
- cell net/step net external inputs must be namespace scoped
- prevent double-namescoping of cellnet inputs
- make data parallel model understand recurrentnets so the device-mapping works
Reviewed By: salexspb
Differential Revision: D4708840
fbshipit-source-id: 4b0ddc43642d449076a2b6f67ad1c47f84138ff4
Summary: When cloning recurrent net op, we do a remapping of the lengths-blobs. But if they don't exists (like with CRF), we should not do that.
Differential Revision: D4702123
fbshipit-source-id: 37a22d11e709011b8b98b2cc3d9f08eb9fda06c4
Summary:
This diff is modifying the way we're specifying metrics from doing reporter, that should know all the blobs which is should access in advance, to reporter that is connected through schema.
This diff is also reporting any arbitrary number of learning curves to Flow and provides really flexible way to specify all the metrics we care about.
TODO: Modify model helper to allow providing intermediate results for reporting
TODO: Add evaluation net (instead of prediction net).
TODO: Move all other places in DPER 2.0 to use that abstractions instead.
TODO: Get rid of LogScoreEstimator in favor of metric that is going to be really suiting our needs.
Reviewed By: azzolini, dzhulgakov, kittipatv
Differential Revision: D4577548
fbshipit-source-id: 3515bd41e0f92263ff90ce2f7207abf65d01b1f7
Summary: so that the utils can be used by a wider range of audience.
Reviewed By: xianjiec
Differential Revision: D4637462
fbshipit-source-id: f0695f430902aef26360efa511069b3755eaf52a
Summary: fix a check if the net is net_dict
Reviewed By: kennyhorror
Differential Revision: D4647493
fbshipit-source-id: e0a62fc5847c99c85857c5635b4e39d59c66d5ce
Summary: Add SparseNN workflow for feed. I haven't fully thought about the change needed for ads, as I added a property called 'preproc_output_schema' for LayerModelHelper.
Reviewed By: xianjiec
Differential Revision: D4585796
fbshipit-source-id: 060d08f4beb928e7e7863f2e563f612c358951fb
Summary:
For code in layer model helper, layers. It's intentionally to not have NameScope by default.
This looks another place that may need default NameScope.
https://fburl.com/wdwtxp0m
Reviewed By: kennyhorror
Differential Revision: D4606971
fbshipit-source-id: b560bf59d3242e3f9443cd5aeda5c7e2e4e89079