199 Commits

Author SHA1 Message Date
Aapo Kyrola
dcefc74a0c Shape and Type Inference Part1
Summary:
This is a fairly large diff, sorry about that. It includes basic shape and type inference functionality, based on YQ's Schema scaffolding. I added some helper functions to make it easier to write simple translations.

A bigger refactoring was needed for ConvPoolBase so that we could use the shape inference already present in the schema.

I annotated enough operators to infer forward-pass shapes for a basic convnet, and added a test for that. I intend to bootcamp some annotations and annotate enough to handle Resnets fully. We still need to think about whether gradients could be annotated in an easier way.

Only shapes are exposed to Python for now; types will follow later. Also, the inference is not yet called anywhere except the unit test.

Also, I am not sure everything is in the best location in the code, but it shouldn't be hard to move things around.
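
As a rough illustration of forward-pass shape inference for a basic convnet, here is a minimal, framework-independent sketch in plain Python. The function names (`conv_pool_output_dim`, `infer_shapes`) and the dictionary-based op specs are hypothetical and are not the Caffe2 schema API; the sketch assumes NCHW layout and floor-mode output sizes.

```python
def conv_pool_output_dim(size, kernel, stride=1, pad=0):
    """Spatial output size of a conv or pooling window (floor mode)."""
    return (size + 2 * pad - kernel) // stride + 1


def infer_shapes(input_shape, ops):
    """Propagate a shape through a list of hypothetical op specs.

    Returns the list of shapes seen, starting with the input shape.
    """
    shape = tuple(input_shape)
    shapes = [shape]
    for op in ops:
        kind = op["type"]
        if kind in ("Conv", "MaxPool"):
            n, c, h, w = shape
            k, s, p = op["kernel"], op.get("stride", 1), op.get("pad", 0)
            # Only Conv changes the channel count; pooling keeps it.
            out_c = op["out_channels"] if kind == "Conv" else c
            shape = (n, out_c,
                     conv_pool_output_dim(h, k, s, p),
                     conv_pool_output_dim(w, k, s, p))
        elif kind == "Relu":
            pass  # elementwise: shape is unchanged
        elif kind == "FC":
            # Fully connected: keep the batch dimension, flatten the rest.
            shape = (shape[0], op["out_features"])
        else:
            raise ValueError("no shape rule for op type %r" % kind)
        shapes.append(shape)
    return shapes


net = [
    {"type": "Conv", "out_channels": 16, "kernel": 5, "pad": 2},
    {"type": "Relu"},
    {"type": "MaxPool", "kernel": 2, "stride": 2},
    {"type": "FC", "out_features": 10},
]
shapes = infer_shapes((1, 3, 32, 32), net)
```

Keeping one small rule per operator type is what makes it feasible to annotate operators incrementally, as the summary describes.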

Reviewed By: dzhulgakov

Differential Revision: D4436818

fbshipit-source-id: eebee5937ccc9ac09c245465302388a1fae6933c
2017-02-02 22:29:22 -08:00
Alexander Sidorov
2ce3cfefe1 Char-RNN Tutorial
Summary:
This learns Shakespeare and then generates samples one character at a time. We want this to be an example of using our LSTM and RNNs in general.

It now takes 4ms to run the training net with the current parameters (batch size = 1). I don't have data on how long each operator takes yet, but the overall Python loop doesn't seem to add much overhead: with 1000 fake iterations in run_net it took 4s for each iteration, as expected.

Future work:

* fixing convergence for batching
* profiling on operator level
* trying it out with GPUs
* benchmarking against existing char-rnn implementations
* stacking lstms (one lstm is different from two, one needs to take care of scoping)

Reviewed By: urikz

Differential Revision: D4430612

fbshipit-source-id: b36644fed9844683f670717d57f8527c25ad285c
2017-02-02 15:44:32 -08:00
Alisson Gusatti Azzolini
d7e85bf38e Fix ops.stop_if() from inside processors
Summary: stop_if() was not being honored in ProcessingReader.

Reviewed By: dzhulgakov

Differential Revision: D4497784

fbshipit-source-id: 1c967c6252f832149800796e2c26aadf10b74850
2017-02-02 15:14:27 -08:00
Alisson Gusatti Azzolini
000c53a7b1 AtomicCounter to return previous value on Reset.
Summary: This allows saving the previous value of the counter and sending it upstream without losing counts.
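
The idea can be sketched in plain Python as follows. This is an illustrative stand-in, not the Caffe2 operator: the class and method names are hypothetical, and a lock is used to make "read the old value and reset" a single atomic step so that no increments are lost in between.

```python
import threading


class AtomicCounter:
    """Hypothetical thread-safe counter whose reset returns the old value."""

    def __init__(self, initial=0):
        self._value = initial
        self._lock = threading.Lock()

    def increment(self, delta=1):
        with self._lock:
            self._value += delta

    def reset(self, value=0):
        """Set the counter to `value` and return what it held before."""
        with self._lock:
            previous = self._value
            self._value = value
            return previous


counter = AtomicCounter()
counter.increment()
counter.increment(2)
flushed = counter.reset()  # value to send upstream
```

If reading and resetting were two separate calls, an increment arriving between them would be dropped; returning the previous value from the reset avoids that gap.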

Reviewed By: kennyhorror

Differential Revision: D4497854

fbshipit-source-id: 28a7ad0ff1020bde26f78b1f59614b094d1e1881
2017-02-02 14:59:30 -08:00
Alisson Gusatti Azzolini
d93b9eeae2 Fix NetBuilder's task_init
Summary: The net was being added to the task body by mistake. Also, adds local_init and local_exit functionality.

Reviewed By: dzhulgakov

Differential Revision: D4497794

fbshipit-source-id: 4d9dfb48a277ccfa204f1e74886abba5d44c61f8
2017-02-02 14:59:30 -08:00
Zhao Tan
d8dff5853e Add numSample field for preComputing
Summary: For customers like Ads, Feeds, and MarketPlace, the training data is very large. Going over all the data to compute meta information is unnecessary and costly. This diff adds a numSample option to preCompute, so users can control how many samples are used when computing meta information.
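
A minimal sketch of the idea, assuming samples arrive as an iterable of feature dictionaries. The function name `precompute_meta`, the `num_sample` parameter, and the choice of min/max as the "meta information" are all hypothetical; the point is only that the scan stops after a bounded number of samples.

```python
from itertools import islice


def precompute_meta(samples, num_sample=None):
    """Compute per-feature (min, max) over at most `num_sample` samples.

    With num_sample=None the whole stream is consumed.
    """
    meta = {}
    count = 0
    for sample in islice(samples, num_sample):
        count += 1
        for name, value in sample.items():
            lo, hi = meta.get(name, (value, value))
            meta[name] = (min(lo, value), max(hi, value))
    return count, meta


stream = ({"x": i} for i in range(100))
used, meta = precompute_meta(stream, num_sample=10)
```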

Differential Revision: D4492399

fbshipit-source-id: 7199381d226ee6300a959fc5e116d39984d199fc
2017-02-02 13:59:30 -08:00
Bram Wasti
77fd7c2b6f Make translator work as command line tool
Summary: The initial implementation wasn't working quite right (no const fill of an empty external input)

Reviewed By: viswanathgs

Differential Revision: D4490569

fbshipit-source-id: 1b2a4f612efb3b2685edfe6c683571dd9d01aa4f
2017-02-01 13:14:26 -08:00
Sean Snyder
79c04d32dc add an option to use a resnet network instead of alexnet
Summary: add an option to use a resnet network instead of alexnet. Modified the resnet.create_resnet50 function slightly to allow specifying different kernel/stride parameters so we can adapt resnet to our image size.

Differential Revision: D4472535

fbshipit-source-id: ed06acf52f6425a1e04d047548eb3c70388d74aa
2017-01-31 16:59:30 -08:00
Alexander Sidorov
b7fa6b2a8b remove recurrent_inputs in favor of recurrent_input_ids
Summary:
I forgot to remove this one. The rest of the switch from string
names to indexing is coming after D4446813 lands, as scratches
aren't inputs or outputs and thus can't be indexed.

Reviewed By: urikz

Differential Revision: D4465748

fbshipit-source-id: 2ccbedfb35541ef4a2231d1480eef59025bd5290
2017-01-31 13:14:33 -08:00
Alexander Sidorov
d019ec793c improve flaky test
Summary: On some inputs TestWarden was failing

Reviewed By: Yangqing

Differential Revision: D4487293

fbshipit-source-id: 3da4b310a619c2b57f033b2dd7727f71403bfd68
2017-01-30 22:14:27 -08:00
Yury Zemlyanskiy
debd256177 Fix for gradient propagation for initial recurrent state for RecurrentNetwork
Summary: Looks like we don't do a good job with initial recurrent input gradients yet. Here is a partial fix, but the gradient check doesn't pass yet. The shape is correct now, though.

Reviewed By: salexspb

Differential Revision: D4475447

fbshipit-source-id: 280f1f59f19e487fd0dce0d440609c50ddce294a
2017-01-30 18:59:32 -08:00
Alisson Gusatti Azzolini
0700e05e68 Disallow duplicate field names in Struct
Summary: title.

Differential Revision: D4482958

fbshipit-source-id: a732f6b5d862b440a4856251ad68ecd98f60e8d1
2017-01-30 14:44:28 -08:00
Alisson Gusatti Azzolini
1d3834eeb2 Nodes to support resource requirements and outputs
Summary: See distributed.py for a usage example.

Reviewed By: xianjiec

Differential Revision: D4467723

fbshipit-source-id: c74f71bebaa1751098379838d3da55945aac62bd
2017-01-30 11:29:25 -08:00
Yangqing Jia
8553bd3f68 Ensure we are not using Eigen LGPL code, and build on raspbian.
Summary:
Turns out that building on raspbian is a piece of cake for caffe2 - cmake is awesome.
Closes https://github.com/caffe2/caffe2/pull/112

Differential Revision: D4480985

Pulled By: Yangqing

fbshipit-source-id: 5dbe5e1e71d8680dea7a5ec8a9ce7fbe6aa5270a
2017-01-30 09:44:27 -08:00
Alisson Gusatti Azzolini
14a5b35805 Snapshot -> Checkpoint
Summary: As per kennyhorror's request.

Reviewed By: kennyhorror

Differential Revision: D4473177

fbshipit-source-id: 6cab6ccf247b09aab8f6f056c807bd3ed27ee6a5
2017-01-27 22:29:32 -08:00
Andrey Malevich
86fb25cefa Rely on embedding size in split
Summary: As desc.

Differential Revision: D4471823

fbshipit-source-id: 2685c64c22556da1749b3e3e6b21a684a7231e7b
2017-01-27 19:44:31 -08:00
Viswanath Sivakumar
eba5299576 Port ROIPool to caffe2 trunk, add CPU implementation
Summary:
Xray is being converted to c2 and ROIPool (needed for detection models) is
missing in c2 trunk. Ported rbgirshick's implementation from experimental with a few
changes:

Also added code for translation in caffe_translate.py

Differential Revision: D4453331

fbshipit-source-id: 7a05a88edec1bd6e806e52dc1e6c55bc75c3149f
2017-01-27 12:59:20 -08:00
Yury Zemlyanskiy
22e1bdd6d1 Use stack workspaces in RecurrentNetwork
Summary: This diff uses stack workspaces in RecurrentNetwork, which simplifies the implementation and gets rid of scratches.

Reviewed By: salexspb

Differential Revision: D4446813

fbshipit-source-id: 514eec7e4300bdf492a9cb192b40cf4f89acf656
2017-01-27 11:44:26 -08:00
Ou Jin
ed04a20289 distributed reader for evaluation
Summary:
Use multiple readers for model evaluation. Since this is built on the new framework, only NativeLoader is supported.

With 5 readers, the evaluation speed is 124k. The speed for a single evaluator is 32k. There is still room for improvement since the evaluator machine is under-utilized.
(Hive is the bottleneck. Adding more loading threads helps to improve the speed to 240k. More readers can improve it further.)

Reviewed By: azzolini

Differential Revision: D4469393

fbshipit-source-id: b55af5f798faca4c150b2c0663fe5db0f154cb70
2017-01-27 10:44:24 -08:00
Vsevolod Oparin
319945df15 Test for FC operator + fix for docs
Summary: Test for FC operator + fix for docs

Differential Revision: D4473293

fbshipit-source-id: 6e6ebad007ee08b05184fda288ab74982c6b2219
2017-01-27 10:44:24 -08:00
Fei Sun
cc65cc64c8 Create function ParseProtobufFromLargeString to parse strings more than 64MB
Summary: Replace ParseFromString with ParseProtobufFromLargeString to get around the 64MB limit.

Reviewed By: Yangqing

Differential Revision: D4466226

fbshipit-source-id: b68a6efc76955db294ddb0d23bbaf03b69e4952a
2017-01-27 10:29:22 -08:00
Viswanath Sivakumar
ca1ff1ee9b Add Flatten layer, bugfix in InnerProduct
Summary: Uncovered these while converting xray detection model.

Differential Revision: D4461051

fbshipit-source-id: 1654c0d7ed101c8c211a93aed6bb542db1e20e0a
2017-01-26 21:44:35 -08:00
Bram Wasti
9dd1d9428e Made translator work as command line tool
Summary: Might be useful to have a command line version of this. Thoughts?

Reviewed By: Yangqing

Differential Revision: D4456221

fbshipit-source-id: 42dd464c5734c0cfbd4c2b1cb348aef9b269b4c2
2017-01-26 20:29:35 -08:00
Dmytro Dzhulgakov
864f561525 Make BlobDeserialization throw exceptions instead of returning bool
Summary: Makes it much nicer to spot errors, especially in iPython notebook.

Reviewed By: kennyhorror

Differential Revision: D4465726

fbshipit-source-id: c0adaf5168248a70987ff9d5dfce54a622ff2219
2017-01-26 09:44:19 -08:00
Alexander Sidorov
8bff8014b3 print out inputs in lstm test to catch when it is flaky
Summary:
We get flaky lstm tests on a numerical gradient check. I
would like to improve the accuracy of the latter, but first I need an
example. After landing this, TestWarden will find a bad input for me.

Reviewed By: urikz

Differential Revision: D4467223

fbshipit-source-id: 68d4bf22af11190f39fa28332c6d99efbb192132
2017-01-25 20:59:21 -08:00
Minsuk (Brian) Kahng
de8cd46416 Caffe2 graph to json for visualization in flow
Summary:
- Writing a Caffe2 computation graph to json for visualization in Flow
- Example use in the Text models workflow: it replaces the existing draw function which produces PNG file
- Visualization: https://our.intern.facebook.com/intern/fblearner/c2graphvis/13215753/
- The visualization uses FBLearnerDAG. Plan to add many visualization-related features.

Reviewed By: Mortimerp9

Differential Revision: D4415299

fbshipit-source-id: 2d641d60177566ed2837fb3750394420690f28de
2017-01-25 19:44:20 -08:00
Andrew Tulloch
0f870d4f40 Add error checking for too-small input in ConvPoolOpBase
Summary: Fixes segfaults that occur in Eigen and im2col/sgemm backends.

Reviewed By: Yangqing

Differential Revision: D4451772

fbshipit-source-id: 3cf21e5afb2fe300db4228933a82063db5f7091f
2017-01-25 17:44:22 -08:00
Viswanath Sivakumar
9775ffc6ae Fixes to topological sort, canonical blob naming, sharing final blob
Summary: Three small changes:

Reviewed By: ajtulloch

Differential Revision: D4437131

fbshipit-source-id: c849e36e1c4d1dce947076349df863fafe62c66d
2017-01-25 15:14:26 -08:00
Viswanath Sivakumar
a4ba0cceb2 Run memonger to optimize net if needed
Summary: This runs memory optimization on the net.

Differential Revision: D4433788

fbshipit-source-id: 80c3f0568795c2d7a5beb3cdb89a92af91162fef
2017-01-25 15:14:26 -08:00
Priya Goyal
40ce50e0bd Speed-up training, fast data-augmentation, sync data_parallel_model changes + other small fixes
Summary:
1. Use opencv for data augmentation after benchmarking various image libraries in python
2. Use cuda no bias conv
3. Use cuda fastest conv (exhaustive search)
4. data_parallel_model had a few changes. Syncing them
5. Propagate the errors in threads to make debugging easier

Reviewed By: rbgirshick

Differential Revision: D4341422

fbshipit-source-id: aa4471a2f49dd6d7ca13879999b3c7ceaf818c1e
2017-01-25 11:44:22 -08:00
Dmytro Dzhulgakov
aed53dd7cf Pass cmd flags of GlobalInit down to workers in Flow
Summary:
It's a similar trick to dyndeps. The idea is that global state is better simply replicated to gang workers, as it otherwise causes a lot of confusion.

In particular, it's useful if one wants to enable detailed logging (--v).

For other operators, the user still needs to call GlobalInit explicitly. We should consider doing it for all Flow operators, but I'll leave that for future consideration.

Reviewed By: kennyhorror

Differential Revision: D4460686

fbshipit-source-id: 5836737dd3195f9ad12589fd899a3ff63f173e05
2017-01-25 11:14:51 -08:00
Xianjie Chen
ddbf90afa3 improve dper dh
Summary:
It's broken because it relies on add sparse bias.
It's not easy to add_sparse_bias after the switch to loader_param.

DPA would like to try it out :)

Differential Revision: D4447275

fbshipit-source-id: 631cb4995f35383070e44387dc86692ba64b91eb
2017-01-25 02:59:22 -08:00
Yury Zemlyanskiy
0e3146e1e8 Remove recurrent_sizes from RecurrentNetwork
Summary: Remove usage of recurrent_sizes, so recurrent states' sizes can depend on input (in case of attention matrix for beam decoder). I removed recurrent_sizes from forward and backward steps.

Reviewed By: salexspb

Differential Revision: D4427688

fbshipit-source-id: 580420a294d309c86ec5cb4e677058623b7228e1
2017-01-24 23:14:25 -08:00
Vsevolod Oparin
5e5486491d Replace Gather + RowMul by SparseLengthsWeightedSum
Summary:
Improve performance by using the SparseLengthsWeightedSum operator. Results for my run:
Before:

  8.98474 RowMul
  6.89952 Gather
  0.80991 LengthsSum
  2.02056 SparseLengthsWeightedSum
  Total: 18.71

After:

  1.075 Gather
  6.54999 SparseLengthsWeightedSum
  Total: 7.62
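
To show why the fused operator can stand in for the separate steps, here is a small pure-Python sketch under one plausible reading of the change: a gather of rows, a per-row scaling, and a segment sum are collapsed into a single pass. Both functions are illustrative stand-ins written for this example (including their argument order), not the Caffe2 implementations.

```python
def gather_rowmul_lengths_sum(data, indices, weights, lengths):
    """Unfused reference: Gather, then RowMul, then LengthsSum."""
    gathered = [data[i] for i in indices]                  # Gather
    scaled = [[w * x for x in row]                         # RowMul
              for row, w in zip(gathered, weights)]
    out, pos = [], 0
    for n in lengths:                                      # LengthsSum
        segment = scaled[pos:pos + n]
        if segment:
            out.append([sum(col) for col in zip(*segment)])
        else:
            out.append([0.0] * len(data[0]))
        pos += n
    return out


def sparse_lengths_weighted_sum(data, weights, indices, lengths):
    """Fused version: one pass, no intermediate gathered/scaled copies."""
    dim = len(data[0])
    out, pos = [], 0
    for n in lengths:
        acc = [0.0] * dim
        for j in range(pos, pos + n):
            row, w = data[indices[j]], weights[j]
            for d in range(dim):
                acc[d] += w * row[d]
        out.append(acc)
        pos += n
    return out


data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
indices = [0, 2, 1]
weights = [0.5, 2.0, 1.0]
lengths = [2, 1]  # first output sums two rows, second sums one
fused = sparse_lengths_weighted_sum(data, weights, indices, lengths)
unfused = gather_rowmul_lengths_sum(data, indices, weights, lengths)
```

The fused form avoids materializing the gathered and scaled intermediates, which is consistent with the drop in total time reported above.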

Log of run: P56992396

With skip_backward. Command:

  CLASSPATH=/mnt/vol/gfsetlprocstore-oregon/users/cxj/hivereader-wrapper-1.0-SNAPSHOT-standalone.jar OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 MKL_DYNAMIC=FALSE ./buck-out/gen/caffe2/caffe2/fb/dper/tools/speed_benchmark.par -loader_param /mnt/vol/gfsfblearner-altoona/flow/data/2017-01-22/d832bb7b-5598-422e-9fee-b3299a9c8c1f -negDownsampleRate 0.1 -hidden 'unary(dot{"num_dense": 6, "pooling_method": "PositionWeighted"}(128, 64)128-128, 1)' -model_type mlp_sparse -warmup_runs 10 -main_runs 1000 -run_individual -skip_backward 2>&1 | tee /tmp/log.txt

Before: P56993234$7509
After: P56992503$7344

Command:

  ./fblearner/nn/ads/canary all

https://our.intern.facebook.com/intern/fblearner/details/13320564/?notif_channel=cli

Cloned "caffe2 ads sparse nn canary" run: https://our.intern.facebook.com/intern/fblearner/details/13322337/

Reviewed By: xianjiec

Differential Revision: D4451073

fbshipit-source-id: 0a4e9693d7b8b0372b2efefa61154e987a493210
2017-01-24 20:44:21 -08:00
Alexander Sidorov
b1472a173a don't hardcode outputs order to work only for lstm + don't pass blob names for parameters
Summary:
In this diff I stop passing parameters by name and also remove the hardcoded output ids, which were there specifically for LSTM to work. It also avoids using recurrent_sizes in the backward pass (for the forward pass this is done in D4427688).

Using a similar technique, it should be simple enough to eliminate blob name passing altogether. Then we can fix scoping. These can be done in a follow-up diff.

Reviewed By: urikz

Differential Revision: D4444614

fbshipit-source-id: 3580a76365502b9f2f09e3d8b7e78084ca739f00
2017-01-24 16:29:23 -08:00
Alexander Sidorov
f09da676d7 CNNModelHelper.LSTM test
Summary:
Let's have a test for this so we don't break existing use cases
while iterating over RecurrentOp's code.

Reviewed By: urikz

Differential Revision: D4456404

fbshipit-source-id: 79f2b88c1eed16106adf5b793b4c74441c7146c6
2017-01-24 15:59:24 -08:00
Chao Zhang
96fc095ccb Add piecewise linear transformation operator
Summary:
A new operator is added for model calibration. Given a piecewise linear function and a raw prediction as input, it generates the mapped value as output.
Details can be found in the operator doc.
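
A piecewise linear calibration of this kind can be sketched in plain Python. The parametrization below (a sorted `bounds` list plus one slope and one intercept per piece, with out-of-range inputs clamped to the nearest bound) is an assumption made for illustration; the operator doc is the authority on the actual arguments and edge-case behavior.

```python
from bisect import bisect_left


def piecewise_linear_transform(x, bounds, slopes, intercepts):
    """Map x through a piecewise linear function.

    Piece i covers (bounds[i], bounds[i + 1]] and computes
    slopes[i] * x + intercepts[i]. Inputs outside the bounds are
    clamped to the nearest bound (an assumption of this sketch).
    """
    if len(bounds) != len(slopes) + 1 or len(slopes) != len(intercepts):
        raise ValueError("need len(bounds) == len(slopes) + 1 == len(intercepts) + 1")
    if x <= bounds[0]:
        return slopes[0] * bounds[0] + intercepts[0]
    if x >= bounds[-1]:
        return slopes[-1] * bounds[-1] + intercepts[-1]
    # bisect_left returns the index of the first bound >= x,
    # so the piece containing x is one position earlier.
    i = bisect_left(bounds, x) - 1
    return slopes[i] * x + intercepts[i]


# Two pieces that meet at x = 1: y = x on (0, 1], y = 0.5 * x + 0.5 on (1, 3].
bounds = [0.0, 1.0, 3.0]
slopes = [1.0, 0.5]
intercepts = [0.0, 0.5]
calibrated = [piecewise_linear_transform(p, bounds, slopes, intercepts)
              for p in (-1.0, 0.5, 1.0, 2.0, 5.0)]
```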

Differential Revision: D4418640

fbshipit-source-id: f8ff3ea786b0fe233a4ddcb709e5dbf0861ca484
2017-01-23 17:44:26 -08:00
Bram Wasti
b5424c9646 Enable top-k accuracy option in caffe_translator
Summary: Caffe2 has a topk accuracy op now

Differential Revision: D4450387

fbshipit-source-id: 2d516cc44fb4e814ca901e73746b0364a0584217
2017-01-23 14:29:24 -08:00
Simon Layton
7acdece3b2 Comment out NHWC Alexnet test for now
Summary:
Relies on NHWC implementation of group conv which doesn't exist right
now
Closes https://github.com/caffe2/caffe2/pull/103

Differential Revision: D4451635

Pulled By: Yangqing

fbshipit-source-id: 31d99b37abf7563a26389f47affcc759ce6bc5e1
2017-01-23 13:59:29 -08:00
Yangqing Jia
e3ea3e8c12 MKL convolution operator
Summary: Closes https://github.com/caffe2/caffe2/pull/102

Differential Revision: D4448886

Pulled By: Yangqing

fbshipit-source-id: 914d11cd79107895a9755154df3526fcf71a31ea
2017-01-23 09:59:30 -08:00
Ross Girshick
e0c90de6e6 Speedup get_op_ids_in_path
Summary:
Perf bug report: https://www.facebook.com/groups/1405155842844877/permalink/1617904561570003/

Diagnosis:

I've done some digging into this and here's what I've found:
(1) In this use case, the call is disallowed_op_ids = get_op_ids_in_path(ssa, blob_versions, [], inputs) where inputs = ['res4_22_sum'] is the last blob produced by the res4 stage of a ResNet101 model.
(2) get_op_ids_in_path has exponential running time in the number of blocks in the res4 stage of ResNet. This is based on empirical running times. This call should complete in 4.5 days on my devgpu.
(3) I haven't familiarized myself enough with the IR and SSA code in core.py to understand the algorithmic fix yet, but surely there's a more efficient algorithm to compute the same thing.
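
One common way to avoid this kind of blow-up is to track which ops have already been expanded, so each op is visited at most once even when residual blocks create many distinct paths to it. The sketch below is not the fix that landed and does not use the real SSA structures from core.py: `ops_needed_for` and its `(input_blobs, output_blobs)` representation are hypothetical, and it assumes each blob is produced by exactly one op.

```python
def ops_needed_for(ops, targets, known=()):
    """Return sorted indices of the ops required to produce `targets`.

    `ops` is a list of (input_blobs, output_blobs) pairs in execution
    order. Blobs listed in `known` are treated as already available.
    """
    producer = {}
    for idx, (_, outputs) in enumerate(ops):
        for blob in outputs:
            producer[blob] = idx

    known = set(known)
    needed = set()
    stack = list(targets)
    while stack:
        blob = stack.pop()
        if blob in known or blob not in producer:
            continue  # given blob or external input: nothing to run
        idx = producer[blob]
        if idx in needed:
            continue  # already expanded; this check prevents re-walking
        needed.add(idx)
        stack.extend(ops[idx][0])
    return sorted(needed)


# 60 residual-style blocks: x_i -> a_i -> b_i, then x_{i+1} = sum(x_i, b_i).
# The number of distinct paths doubles per block, but there are only 180 ops.
ops = []
for i in range(60):
    ops.append((["x%d" % i], ["a%d" % i]))
    ops.append((["a%d" % i], ["b%d" % i]))
    ops.append((["x%d" % i, "b%d" % i], ["x%d" % (i + 1)]))
all_ops = ops_needed_for(ops, ["x60"])
```

With the visited check, the work is linear in the number of ops and edges; enumerating paths instead would touch on the order of 2^60 paths for this toy graph.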

Reviewed By: Yangqing

Differential Revision: D4446278

fbshipit-source-id: 8bd147f92d62b865dc355d5802a53e92d64b6e21
2017-01-23 09:44:26 -08:00
Alexander Sidorov
c4b640aeb2 @debug decorator to make it easier to use dropin debugger
Summary:
It now takes two lines to get a drop-in debugger: import it and
then decorate your function. Also got rid of the enable / disable logic, as
it doesn't seem useful.

We can also try to enable this by default for our tests when running
locally as a next step.
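
For readers unfamiliar with the pattern, a decorator like this can be sketched with the standard library alone. This is a generic illustration, not the Caffe2 helper: the name `debug` is reused only for readability, and the `post_mortem` keyword is a hypothetical hook added so the behavior can be exercised without opening an interactive session.

```python
import functools
import pdb
import sys


def debug(func=None, *, post_mortem=pdb.post_mortem):
    """On an uncaught exception, open a post-mortem debugger, then re-raise."""
    def decorate(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            try:
                return f(*args, **kwargs)
            except Exception:
                # Hand the traceback to the debugger hook before propagating.
                post_mortem(sys.exc_info()[2])
                raise
        return wrapper
    # Support both @debug and @debug(post_mortem=...).
    return decorate if func is None else decorate(func)


tracebacks = []


@debug(post_mortem=tracebacks.append)  # record instead of opening pdb
def failing_step():
    raise ValueError("bad input")


try:
    failing_step()
except ValueError:
    pass
```

In normal use the decorator would be applied bare (`@debug`), which drops into `pdb` at the frame where the exception was raised.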

Reviewed By: bwasti

Differential Revision: D4444299

fbshipit-source-id: 6e2006945d8ad640685b1017ca1bd63054728908
2017-01-23 09:44:26 -08:00
Andrey Malevich
ec51f887bf Create only one instance of SigridTransform in DPerExample.
Summary:
Until now, DPer example has been creating multiple copies of the transform
config in the net definition. As a result, I hit the ProtoBuf size limit
(64MB) for certain Task requests (especially visible because of the
ValidationPipeline that I was adding).

After this diff we're going to store SigridTransforms in one instance per
machine for training (or 1 instance per reading).

The difference in plan sizes for a simple SparseNN model is ~30 MB (even though the second model also includes a validation plan).

TODO: Do similar logic for NNPreProc as well (it's also pretty large).

Reviewed By: dzhulgakov

Differential Revision: D4441441

fbshipit-source-id: 4452dd86a4dc49b2c7f5b7642f443aed5720b047
2017-01-22 19:29:16 -08:00
Aapo Kyrola
06398e9bfb softmax-with-loss, handle gracefully cases when total weight is 0
Summary:
Spatial Softmax allows specifying locations that are not counted for the loss. If none of the locations are counted, this resulted in NaNs, and headache. This diff fixes that by explicitly handling these cases.

+ assertion for label blob dimension(0)

Created a new test as well.
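
The failure mode is a division by zero total weight when averaging the per-location losses. A minimal pure-Python sketch of guarding against it is shown below; the function is a simplified stand-in (a flat list of locations rather than a spatial tensor), and returning 0.0 in the all-zero case is an assumption of this sketch rather than a statement about what the operator does.

```python
import math


def weighted_softmax_loss(logits, labels, weights):
    """Weighted mean cross-entropy; returns 0.0 if the total weight is 0."""
    total_weight = sum(weights)
    if total_weight == 0:
        # Nothing is counted: avoid 0/0, which would produce NaN.
        return 0.0
    total = 0.0
    for row, label, w in zip(logits, labels, weights):
        if w == 0:
            continue  # this location is excluded from the loss
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += w * (log_z - row[label])
    return total / total_weight


counted = weighted_softmax_loss([[0.0, 0.0]], [0], [1.0])
ignored = weighted_softmax_loss([[0.0, 0.0]], [0], [0.0])
```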

Differential Revision: D4442939

fbshipit-source-id: 8641bfad2a994e517ca3eda39345380a6ca1ba50
2017-01-20 15:29:21 -08:00
Aapo Kyrola
e18643f90b More fixes
Summary:
When testing the code, a couple of issues arose:
 - we need a different name for the last layer than in the preprocessed model, otherwise a shape assertion is created
 - preprocess_noaugmentation still needs to do a crop for images larger than 227x227, otherwise things fail.

Reviewed By: viswanathgs

Differential Revision: D4442700

fbshipit-source-id: 05f54e7f17c266280f5ba5bb57af1721fe30df12
2017-01-20 13:44:24 -08:00
Kevin Matzen
6a7dd236fa instance norm
Summary: Added gradient and GPU implementation to caffe2 InstanceNorm op

Reviewed By: Yangqing

Differential Revision: D4304808

fbshipit-source-id: 6feecaed589ea9f825260a49b39b4260da6e5426
2017-01-20 12:29:28 -08:00
Alexander Sidorov
3f66f66da9 DebugMode helper for Caffe2
Summary:
It helps when developing scripts locally (when working outside of Flow). One doesn't have to rerun the script in order to catch an exception in the debugger or add a print statement. (Flow does this kind of thing automatically.)

Usage example:

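```
from caffe2.python import workspace  # needed for workspace.GlobalInit below

if __name__ == '__main__':
  workspace.GlobalInit(['caffe2', '--caffe2_log_level=2'])
  from caffe2.python.utils import DebugMode
  DebugMode.enable()
  DebugMode.run(main)  # main is the script's own entry-point function
```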

Reviewed By: Yangqing

Differential Revision: D4424096

fbshipit-source-id: 73f418c80f581820e70139df7e166981e4d8c55f
2017-01-20 09:29:31 -08:00
Aapo Kyrola
afe822ebd7 Small tweaks
Summary:
Some tweaks, hopefully getting us to 0.98 MAP
- no cropping for test dataset (as per patrick)
- spatialBN momentum 0.1 (default is 0.9)

Also added some additional logging and reduced frequency of running of test net and logging.

Reviewed By: viswanathgs

Differential Revision: D4439790

fbshipit-source-id: 700705b811a5fc8c7139a265de96db646605ca5a
2017-01-19 18:44:26 -08:00
Ahmed Taei
411059d649 Generate huffman tree
Summary:
In this diff:
[1] Change the output from generating all paths from root to labels to TreeProto.
TreeProto itself is required by inference and we can use hsm_util to get the
paths from TreeProto.

[2] Fix hsm_util index assignment.
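
The relationship between a tree and its root-to-label paths can be illustrated with a small standalone sketch. It builds a Huffman tree with `heapq` and then derives the paths from the tree, mirroring the idea that paths need not be stored separately. The nested-tuple node format and both function names are hypothetical and unrelated to TreeProto or hsm_util.

```python
import heapq
import itertools


def build_huffman_tree(label_counts):
    """Return the root of a Huffman tree as nested tuples.

    Leaves are ("leaf", label); internal nodes are ("node", left, right).
    """
    if not label_counts:
        raise ValueError("need at least one label")
    tiebreak = itertools.count()  # keeps heap entries comparable on ties
    heap = [(count, next(tiebreak), ("leaf", label))
            for label, count in sorted(label_counts.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees.
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (c1 + c2, next(tiebreak), ("node", left, right)))
    return heap[0][2]


def paths_from_tree(root):
    """Derive the root-to-leaf path (0 = left, 1 = right) for every label."""
    paths = {}
    stack = [(root, [])]
    while stack:
        node, path = stack.pop()
        if node[0] == "leaf":
            paths[node[1]] = path
        else:
            stack.append((node[1], path + [0]))
            stack.append((node[2], path + [1]))
    return paths


# Labels are 0-based, i.e. in the range [0, num_classes).
counts = {0: 5, 1: 1, 2: 1, 3: 2}
paths = paths_from_tree(build_huffman_tree(counts))
```

Frequent labels end up close to the root, so their paths are short; here label 0 gets the shortest path and labels 1 and 2 the longest.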

Differential Revision: D4416731

fbshipit-source-id: 657d8b9b4df6fa30c9f92d391cf7e07b5c5db1f8
2017-01-19 16:14:23 -08:00
Ahmed Taei
dd51336611 Fix label start index for HuffmanTreeHierarchyOp
Summary: Change label indices to be in the range [0, num_classes).

Differential Revision: D4416685

fbshipit-source-id: b16ca8539fd538ad62bf1298dbad3f1553956241
2017-01-19 15:14:53 -08:00