Summary:
This is a fairly large diff, sorry about that. It includes basic shape and type inference functionality, based on YQ's Schema scaffolding. I added some helper functions to make it easier to write simple translations.
A bigger refactoring was needed for ConvPoolBase so that we could reuse the shape inference already present in the schema.
I annotated enough operators to be able to infer forward-pass shapes for a basic convnet, and added a test for that. I intend to bootcamp some annotations and annotate enough to handle ResNets fully. I still need to think about gradients, and whether they could be annotated in an easier way.
Only shapes are exposed to Python for now; types will follow later. Also, the inference is not yet called anywhere except the unit test.
I am also not sure everything is in the best location in the code, but it shouldn't be hard to move stuff around.
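For context, the per-spatial-dimension rule that conv/pool shape inference boils down to is the standard one; a minimal sketch in plain Python (illustrative only, not the actual schema code, and ignoring dilation and legacy padding):
```
def conv_output_dim(in_size, kernel, pad, stride):
    # Output size of a convolution/pooling along one spatial axis.
    return (in_size + 2 * pad - kernel) // stride + 1

# e.g. a 3x3 conv with pad 1, stride 1 preserves a 224x224 input:
assert conv_output_dim(224, kernel=3, pad=1, stride=1) == 224
# while stride 2 halves it:
assert conv_output_dim(224, kernel=3, pad=1, stride=2) == 112
```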
Reviewed By: dzhulgakov
Differential Revision: D4436818
fbshipit-source-id: eebee5937ccc9ac09c245465302388a1fae6933c
Summary:
This learns Shakespeare and then generates samples one character at a time. We want this to be an example of using our LSTM and RNNs in general.
It now takes 4 ms to run the training net with the current parameters (batch size = 1). I don't have data on how much each operator takes yet, but the overall Python loop doesn't seem to matter much: with 1000 fake iterations in run_net, each outer iteration took 4 s (1000 x 4 ms), as expected.
Future work:
* fixing convergence for batching
* profiling on operator level
* trying it out with GPUs
* benchmarking against existing char-rnn implementations
* stacking LSTMs (one LSTM is different from two; one needs to take care of scoping)
Reviewed By: urikz
Differential Revision: D4430612
fbshipit-source-id: b36644fed9844683f670717d57f8527c25ad285c
Summary: stop_if() was not being honored in ProcessingReader.
Reviewed By: dzhulgakov
Differential Revision: D4497784
fbshipit-source-id: 1c967c6252f832149800796e2c26aadf10b74850
Summary: This allows saving the previous value of the counter and sending it upstream without losing counts.
Reviewed By: kennyhorror
Differential Revision: D4497854
fbshipit-source-id: 28a7ad0ff1020bde26f78b1f59614b094d1e1881
Summary: The net was being added to the task body by mistake. This also adds local_init and local_exit functionality.
Reviewed By: dzhulgakov
Differential Revision: D4497794
fbshipit-source-id: 4d9dfb48a277ccfa204f1e74886abba5d44c61f8
Summary: For customers like Ads, Feeds, and MarketPlace, the training data is very large, and it is unnecessary and costly to go over all of it to compute meta information. In this diff, a numSample option is added to preCompute, so users control how many samples are used when computing meta information.
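As a hypothetical illustration of the idea (the names below are made up, not the real preCompute API), capping the scan at num_samples rows bounds the cost of the meta pass:
```
# Hypothetical sketch (illustrative names, not the real API): cap the scan
# at num_samples rows instead of reading the full dataset.
def compute_meta(rows, num_samples=None):
    meta = {"count": 0, "max_value": float("-inf")}
    for i, value in enumerate(rows):
        if num_samples is not None and i >= num_samples:
            break  # stop early; the stats are estimated from a sample
        meta["count"] += 1
        meta["max_value"] = max(meta["max_value"], value)
    return meta

print(compute_meta(range(10**6), num_samples=1000))  # scans only 1000 rows
```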
Differential Revision: D4492399
fbshipit-source-id: 7199381d226ee6300a959fc5e116d39984d199fc
Summary: The initial implementation wasn't working quite right (no const fill of an empty external input)
Reviewed By: viswanathgs
Differential Revision: D4490569
fbshipit-source-id: 1b2a4f612efb3b2685edfe6c683571dd9d01aa4f
Summary: Add an option to use a ResNet network instead of AlexNet. Modified the resnet.create_resnet50 function slightly to allow specifying different kernel/stride parameters so we can adapt ResNet to our image size.
Differential Revision: D4472535
fbshipit-source-id: ed06acf52f6425a1e04d047548eb3c70388d74aa
Summary:
I forgot to remove this one. The rest of the switch to indexing
instead of string names is coming after D4446813 lands, as scratches
aren't inputs or outputs and thus can't be indexed.
Reviewed By: urikz
Differential Revision: D4465748
fbshipit-source-id: 2ccbedfb35541ef4a2231d1480eef59025bd5290
Summary: TestWarden was failing on some inputs.
Reviewed By: Yangqing
Differential Revision: D4487293
fbshipit-source-id: 3da4b310a619c2b57f033b2dd7727f71403bfd68
Summary: It looks like we don't do a good job with initial recurrent input gradients yet. Here is a partial fix; the gradient check still doesn't pass, but the shape is correct now.
Reviewed By: salexspb
Differential Revision: D4475447
fbshipit-source-id: 280f1f59f19e487fd0dce0d440609c50ddce294a
Summary: See distributed.py for an example of usage.
Reviewed By: xianjiec
Differential Revision: D4467723
fbshipit-source-id: c74f71bebaa1751098379838d3da55945aac62bd
Summary:
Turns out that building caffe2 on Raspbian is a piece of cake - cmake is awesome.
Closes https://github.com/caffe2/caffe2/pull/112
Differential Revision: D4480985
Pulled By: Yangqing
fbshipit-source-id: 5dbe5e1e71d8680dea7a5ec8a9ce7fbe6aa5270a
Summary:
Xray is being converted to c2, and ROIPool (needed for detection models) is
missing in c2 trunk. Ported rbgirshick's implementation from experimental with a few
changes.
Also added translation code in caffe_translate.py.
Differential Revision: D4453331
fbshipit-source-id: 7a05a88edec1bd6e806e52dc1e6c55bc75c3149f
Summary: This diff uses stack workspaces in RecurrentNetwork, which simplifies the implementation and gets rid of scratches.
Reviewed By: salexspb
Differential Revision: D4446813
fbshipit-source-id: 514eec7e4300bdf492a9cb192b40cf4f89acf656
Summary:
Use multiple readers for model evaluation. Since it is built on the new framework, only NativeLoader is supported.
With 5 readers, the evaluation speed is 124k, versus 32k for a single evaluator. There is still room for improvement, since the evaluator machine is under-utilized.
(Hive is the bottleneck. Adding more loading threads improves the speed to 240k; more readers can improve it further.)
Reviewed By: azzolini
Differential Revision: D4469393
fbshipit-source-id: b55af5f798faca4c150b2c0663fe5db0f154cb70
Summary: Replace ParseFromString with ParseProtobufFromLargeString to get around the 64MB protobuf limit.
Reviewed By: Yangqing
Differential Revision: D4466226
fbshipit-source-id: b68a6efc76955db294ddb0d23bbaf03b69e4952a
Summary: Might be useful to have a command line version of this. Thoughts?
Reviewed By: Yangqing
Differential Revision: D4456221
fbshipit-source-id: 42dd464c5734c0cfbd4c2b1cb348aef9b269b4c2
Summary: Makes it much nicer to spot errors, especially in an IPython notebook.
Reviewed By: kennyhorror
Differential Revision: D4465726
fbshipit-source-id: c0adaf5168248a70987ff9d5dfce54a622ff2219
Summary:
We get flaky LSTM tests on the numerical gradient check. I
would like to improve the accuracy of the latter, but first I need an
example. After landing this, TestWarden will find a bad input for me.
Reviewed By: urikz
Differential Revision: D4467223
fbshipit-source-id: 68d4bf22af11190f39fa28332c6d99efbb192132
Summary:
- Writing a Caffe2 computation graph to json for visualization in Flow
- Example use in the Text models workflow: it replaces the existing draw function, which produces a PNG file
- Visualization: https://our.intern.facebook.com/intern/fblearner/c2graphvis/13215753/
- The visualization uses FBLearnerDAG. We plan to add many visualization-related features.
Reviewed By: Mortimerp9
Differential Revision: D4415299
fbshipit-source-id: 2d641d60177566ed2837fb3750394420690f28de
Summary: Fixes segfaults that occur in Eigen and im2col/sgemm backends.
Reviewed By: Yangqing
Differential Revision: D4451772
fbshipit-source-id: 3cf21e5afb2fe300db4228933a82063db5f7091f
Summary:
1. Use OpenCV for data augmentation, after benchmarking various image libraries in Python
2. Use CUDA no-bias conv
3. Use CUDA's fastest conv (exhaustive search)
4. data_parallel_model had a few changes; syncing them
5. Propagate errors in threads to make debugging easy
Reviewed By: rbgirshick
Differential Revision: D4341422
fbshipit-source-id: aa4471a2f49dd6d7ca13879999b3c7ceaf818c1e
Summary:
It's a similar trick to dyndeps. The idea is that it's better to just replicate global state to gang workers, as otherwise it causes a lot of confusion.
In particular, it's useful if one wants to enable detailed logging (--v).
For other operators the user still needs to call GlobalInit explicitly. We should consider doing it for all Flow operators, but I'll leave that for future consideration.
Reviewed By: kennyhorror
Differential Revision: D4460686
fbshipit-source-id: 5836737dd3195f9ad12589fd899a3ff63f173e05
Summary:
It's broken because it relies on add sparse bias, and it's not easy to add_sparse_bias after the switch to loader_param.
DPA would like to try it out :)
Differential Revision: D4447275
fbshipit-source-id: 631cb4995f35383070e44387dc86692ba64b91eb
Summary: Remove the usage of recurrent_sizes, so recurrent states' sizes can depend on the input (as in the case of the attention matrix for the beam decoder). I removed recurrent_sizes from both the forward and backward steps.
Reviewed By: salexspb
Differential Revision: D4427688
fbshipit-source-id: 580420a294d309c86ec5cb4e677058623b7228e1
Summary:
In this diff I stop passing parameters by name and also remove the hardcoded output ids that were there specifically to make LSTM work. This also avoids using recurrent_sizes in the backward pass (for forward this is done in D4427688).
Using a similar technique it should be simple enough to eliminate blob name passing entirely, and then we can fix scoping. Both can be done in a follow-up diff.
Reviewed By: urikz
Differential Revision: D4444614
fbshipit-source-id: 3580a76365502b9f2f09e3d8b7e78084ca739f00
Summary:
Let's have a test for this so we don't break existing use cases
while iterating on RecurrentOp's code.
Reviewed By: urikz
Differential Revision: D4456404
fbshipit-source-id: 79f2b88c1eed16106adf5b793b4c74441c7146c6
Summary:
A new operator is added for model calibration. Given a piecewise linear function and a raw prediction as input, it generates the mapping as output.
Details can be found in the operator doc.
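Roughly, the mapping such an operator computes can be sketched with numpy's interp (the knots below are made-up examples; the operator's actual input format is described in its doc):
```
import numpy as np

# Made-up example knots, not the operator's real schema.
bounds = np.array([0.0, 0.2, 0.5, 1.0])  # x-coordinates of the knots
values = np.array([0.0, 0.1, 0.6, 1.0])  # calibrated value at each knot

raw = np.array([0.05, 0.35, 0.9])
calibrated = np.interp(raw, bounds, values)
# Each raw prediction is mapped onto the linear segment it falls into;
# values outside [bounds[0], bounds[-1]] clamp to the endpoint values.
print(calibrated)  # [0.025 0.35  0.92 ]
```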
Differential Revision: D4418640
fbshipit-source-id: f8ff3ea786b0fe233a4ddcb709e5dbf0861ca484
Summary:
Relies on an NHWC implementation of group conv, which doesn't exist right now.
Closes https://github.com/caffe2/caffe2/pull/103
Differential Revision: D4451635
Pulled By: Yangqing
fbshipit-source-id: 31d99b37abf7563a26389f47affcc759ce6bc5e1
Summary:
Perf bug report: https://www.facebook.com/groups/1405155842844877/permalink/1617904561570003/
Diagnosis:
I've done some digging into this and here's what I've found:
(1) In this use case, the call is disallowed_op_ids = get_op_ids_in_path(ssa, blob_versions, [], inputs), where inputs = ['res4_22_sum'] is the last blob produced by the res4 stage of a ResNet101 model.
(2) get_op_ids_in_path has running time exponential in the number of blocks in the res4 stage of ResNet, based on empirical running times. Extrapolating, this call should complete in about 4.5 days on my devgpu.
(3) I haven't familiarized myself enough with the IR and SSA code in core.py to understand the algorithmic fix yet, but surely there's a more efficient algorithm to compute the same thing (see the sketch below).
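For illustration only, a sketch of the kind of linear-time traversal that avoids the blowup (this is not the actual core.py code, and the data-structure names are made up):
```
# Illustrative sketch: the naive recursion re-explores sub-DAGs shared
# between ResNet blocks, which is what blows up exponentially; a visited
# set makes each versioned blob get expanded exactly once.
def ops_reaching(versioned_blobs, producer_of):
    """Collect ids of all ops on some path producing the given blobs.

    producer_of maps (blob, version) -> (op_id, list of its versioned inputs).
    """
    visited, op_ids = set(), set()
    stack = list(versioned_blobs)
    while stack:
        vblob = stack.pop()
        if vblob in visited:
            continue  # memoization: skip already-expanded blobs
        visited.add(vblob)
        if vblob in producer_of:
            op_id, parents = producer_of[vblob]
            op_ids.add(op_id)
            stack.extend(parents)
    return op_ids
```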
Reviewed By: Yangqing
Differential Revision: D4446278
fbshipit-source-id: 8bd147f92d62b865dc355d5802a53e92d64b6e21
Summary:
Now it takes two lines to get a drop-in debugger: import it and
then decorate your function. Also got rid of the enable / disable logic, as
it doesn't seem useful.
We can also try to enable this by default for our tests when running
locally as a next step.
Reviewed By: bwasti
Differential Revision: D4444299
fbshipit-source-id: 6e2006945d8ad640685b1017ca1bd63054728908
Summary:
The DPer example has been creating multiple copies of the transform config in the net
definition up to this point, which meant I hit the ProtoBuf limit
(64MB) for certain Task requests (especially visible because of the
ValidationPipeline I was adding).
After this diff we store SigridTransforms in one instance per
machine for training (or one instance per reader).
The difference in plan sizes for a simple SparseNN model is ~30 MB (even though the second model has a validation plan as well).
TODO: Apply similar logic to NNPreProc as well (it's also pretty large).
Reviewed By: dzhulgakov
Differential Revision: D4441441
fbshipit-source-id: 4452dd86a4dc49b2c7f5b7642f443aed5720b047
Summary:
Spatial Softmax allows specifying locations that are not counted toward the loss. If none of the locations are counted, this resulted in NaNs and headaches. This diff fixes that by handling these cases explicitly.
Also adds an assertion on the label blob's dimension(0).
Created a new test as well.
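To illustrate the failure mode (a numpy sketch, not the operator's actual code): when the loss is averaged over counted locations, an all-ignored input turns into 0/0:
```
import numpy as np

# Sketch only: the loss averages over counted locations, so a zero count
# divides by zero and yields NaN unless handled explicitly.
def spatial_nll_loss(log_probs, labels, ignore_label=-1):
    # log_probs: (num_locations, num_classes); labels: (num_locations,)
    mask = labels != ignore_label
    count = mask.sum()
    if count == 0:
        return 0.0  # explicit handling instead of 0/0 = NaN
    picked = log_probs[mask, labels[mask]]
    return -picked.sum() / count
```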
Differential Revision: D4442939
fbshipit-source-id: 8641bfad2a994e517ca3eda39345380a6ca1ba50
Summary:
When testing the code, a couple of issues arose:
- we need a different name for the last layer than in the preprocessed model, otherwise a shape assertion is triggered
- preprocess_noaugmentation still needs to crop images larger than 227x227, otherwise things fail.
Reviewed By: viswanathgs
Differential Revision: D4442700
fbshipit-source-id: 05f54e7f17c266280f5ba5bb57af1721fe30df12
Summary:
This helps when developing scripts locally (outside of Flow). One doesn't have to rerun the script in order to catch an exception in the debugger or add a print statement. (Flow does this kind of thing automatically.)
Usage example:
```
if __name__ == '__main__':
    workspace.GlobalInit(['caffe2', '--caffe2_log_level=2'])
    from caffe2.python.utils import DebugMode
    DebugMode.enable()
    DebugMode.run(main)
```
Reviewed By: Yangqing
Differential Revision: D4424096
fbshipit-source-id: 73f418c80f581820e70139df7e166981e4d8c55f
Summary:
Some tweaks, hopefully getting us to 0.98 MAP:
- no cropping for the test dataset (as per patrick)
- spatialBN momentum 0.1 (default is 0.9)
Also added some additional logging, and reduced the frequency of test-net runs and logging.
Reviewed By: viswanathgs
Differential Revision: D4439790
fbshipit-source-id: 700705b811a5fc8c7139a265de96db646605ca5a
Summary:
In this diff:
[1] Change the output from generating all root-to-label paths to a TreeProto.
The TreeProto itself is required by inference, and we can use hsm_util to
recover the paths from it.
[2] Fix hsm_util index assignment.
Differential Revision: D4416731
fbshipit-source-id: 657d8b9b4df6fa30c9f92d391cf7e07b5c5db1f8
Summary: Change label indices to be in the range [0, num_classes).
Differential Revision: D4416685
fbshipit-source-id: b16ca8539fd538ad62bf1298dbad3f1553956241