Commit Graph

715 Commits

Author SHA1 Message Date
Jiyan Yang
6aff754dbc Add batch normalization layer
Summary: As desc.

Reviewed By: xianjiec

Differential Revision: D5077230

fbshipit-source-id: f73cdedac6d9a3542f8ef829b54fb4c713dcafd0
2017-05-26 16:46:52 -07:00
Thomas Dudziak
ec19b4bd7b Import fixes for Python 3
Summary: As title

Differential Revision: D5135990

fbshipit-source-id: 88cb15bb2fb97dd21faf3ea5ddb8d4dbff7fad93
2017-05-26 16:31:50 -07:00
Thomas Dudziak
3ccbf23132 String-related fixes for Python 3
Summary: This diff is one step towards enabling the Python 3 build by making the code more diligent in its handling of strings.
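
As an illustration of the kind of change such a diff involves (the helper names below are hypothetical, not taken from the patch), bytes/str handling that works under both Python 2 and 3 typically funnels through small normalization helpers:
```python
# Hypothetical helpers sketching bytes/str normalization; not the actual
# contents of this diff.
def to_bytes(s):
    """Return s as bytes under both Python 2 and 3."""
    return s if isinstance(s, bytes) else s.encode('utf-8')

def to_str(s):
    """Return s decoded to text if it arrived as bytes."""
    return s.decode('utf-8') if isinstance(s, bytes) else s
```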

Reviewed By: salexspb

Differential Revision: D4893083

fbshipit-source-id: 28b8adf3280e8d1f0a7dc9b0fee5ad53f2fada57
2017-05-26 16:04:32 -07:00
Anmol Kalia
7f98dc28cb Refactored spatial softmax
Summary: Refactored SoftmaxWithLoss by removing the code for the spatial=1 mode and created a new op, SpatialSoftmaxWithLoss, that implements the spatial mode.
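
A minimal usage sketch of the new op, assuming the conventional softmax-with-loss inputs and outputs (the blob names and label dtype are assumptions, not taken from the diff):
```python
import numpy as np
from caffe2.python import core, workspace

# Per-pixel logits (N, C, H, W) and integer labels (N, H, W); outputs a
# softmax tensor and a scalar loss.
op = core.CreateOperator(
    "SpatialSoftmaxWithLoss",
    ["logits", "labels"],
    ["softmax", "avg_loss"],
)
workspace.FeedBlob("logits", np.random.randn(2, 3, 8, 8).astype(np.float32))
workspace.FeedBlob("labels", np.random.randint(0, 3, (2, 8, 8)).astype(np.int32))
workspace.RunOperatorOnce(op)
print(workspace.FetchBlob("avg_loss"))
```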

Reviewed By: viswanathgs

Differential Revision: D5104120

fbshipit-source-id: 8ab999e32c916b2a39a670a7b2a3365401535f24
2017-05-26 14:50:43 -07:00
Ahmed Taei
75a6f909c5 Add option to enable memonger for gradients and add param_names for save_model.
Reviewed By: akyrola

Differential Revision: D5131493

fbshipit-source-id: 7c159ccffa30eb064c157e559f1d8f0350f03ccb
2017-05-26 11:31:35 -07:00
Dmytro Dzhulgakov
35eaf444c0 Quickly hack sparsenn_benchmarks to also do BenchmarkNet
Summary:
Makes the benchmark a bit hacky, but it's a benchmark after all :)

Specifically, this ports the proper BenchmarkNet run from ads_benchmarks so that we can see training-net perf.

Also adds a --report_interval parameter to print stats more often when running in hogwild mode.

kdub0 - hopefully, if you have time, you can integrate it properly with Flow's workflow.

harouwu - this shouldn't conflict too much with your current diff.

Reviewed By: rayleichen

Differential Revision: D5125183

fbshipit-source-id: 9c6f1663bc85e26d6609f0f2f23aa280731939db
2017-05-26 10:48:45 -07:00
Aapo Kyrola
d60a2e3c58 UnsortedSegmentSum/Mean for CUDA
Summary:
To make the optimizer for sparse gradients work with CUDA, we need UnsortedSegmentSum and Mean implemented for CUDA. Unique was already implemented by harouwu.

Pretty straightforward implementations; they should be fast enough -- and I don't know a faster way anyway.

Added some tests as well.
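
For reference, a small sketch of what the op computes on the Python frontend (CPU shown; per this diff the same op now also runs under a CUDA device scope — exact semantics per the op docs):
```python
import numpy as np
from caffe2.python import core, workspace

data = np.array([[1., 2.], [3., 4.], [5., 6.]], dtype=np.float32)
segment_ids = np.array([0, 1, 0], dtype=np.int32)  # rows 0 and 2 share segment 0

workspace.FeedBlob("data", data)
workspace.FeedBlob("segment_ids", segment_ids)
workspace.RunOperatorOnce(
    core.CreateOperator("UnsortedSegmentSum", ["data", "segment_ids"], ["sums"]))
print(workspace.FetchBlob("sums"))  # [[6., 8.], [3., 4.]]
```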

Reviewed By: asaadaldien

Differential Revision: D5124548

fbshipit-source-id: 63ae72f45fc2f07470603f7b2de12f34635dbb3d
2017-05-26 09:33:49 -07:00
Luke Yeager
97159810c9 Restore compatibility with protobuf2
Summary:
Addresses an issue with 417f74509e.
```
>               operators.append(proto.op.pop())
E               AttributeError: 'RepeatedCompositeFieldContainer' object has no attribute 'pop'
```
/cc jhcross
Closes https://github.com/caffe2/caffe2/pull/658
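
A sketch of a protobuf2-compatible way to emulate pop() on a repeated field; this is one plausible shape of the fix, not necessarily the actual patch:
```python
# RepeatedCompositeFieldContainer has no .pop() under protobuf2, but
# indexing and deletion work in both protobuf versions.
from caffe2.proto import caffe2_pb2

def pop_last_op(net_proto):
    last_op = caffe2_pb2.OperatorDef()
    last_op.CopyFrom(net_proto.op[-1])  # copy before deleting from the field
    del net_proto.op[-1]
    return last_op
```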

Reviewed By: dzhulgakov

Differential Revision: D5130382

Pulled By: salexspb

fbshipit-source-id: 34e0c39aad5f339c1aaa1506af3e7495193565f4
2017-05-26 08:47:24 -07:00
Alexander Sidorov
016f72537a ModelHelper.create_param, Initializer abstraction and ParameterInfo for optimizers
Summary:
This is going to unblock Nvidia in their work on adding fp16
support to Caffe2. I discussed this with kennyhorror before to make
sure this fits into his work on parameter sharing.
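
A sketch of the API surface this introduces (the initializer arguments are assumptions based on the summary):
```python
# Assumed usage of ModelHelper.create_param with the Initializer abstraction.
from caffe2.python import model_helper
from caffe2.python.modeling.initializers import Initializer

model = model_helper.ModelHelper(name="example")
fc_w = model.create_param(
    param_name="fc_w",
    shape=[256, 128],
    initializer=Initializer("XavierFill"),  # fp16 variants can plug in here
)
```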

Reviewed By: kennyhorror

Differential Revision: D5127797

fbshipit-source-id: 4db155d320b1862570c23b77c4252bdacbf2296f
2017-05-25 22:03:15 -07:00
Andrey Malevich
6c12df3003 Fix export of SparseToDense layer.
Summary:
If there are two SparseToDense layers densifying the same IdList feature, we
might end up exporting an invalid input for prediction in the input specs.
This diff changes the behavior to use an Alias to a new blob instead of
passing things through directly.

Reviewed By: dzhulgakov

Differential Revision: D5093754

fbshipit-source-id: ef4fa4ac3722331d6e72716bd0c6363b3a629cf7
2017-05-25 21:46:28 -07:00
Jiyan Yang
9bf1f16255 Add bias to cosine distance for two tower models
Summary: Currently, using two tower models with cosine distance results in bad calibration. Adding a bias to the output of the cosine term solves the problem.

Reviewed By: xianjiec

Differential Revision: D5132606

fbshipit-source-id: eb4fa75acf908db89954eeee67627b4a00572f61
2017-05-25 19:50:20 -07:00
Zhicheng Yan
2002018603 memory_leak_data_worker
Summary: A memory leak happens when new BlobReferences are constantly added to the set _scratch_blobs.

Reviewed By: panshen1

Differential Revision: D5134945

fbshipit-source-id: 3ce4d482153bb89de065f20cd91411178085caad
2017-05-25 19:22:03 -07:00
Pieter Noordhuis
a9b5efe3c2 Expose max collective concurrency
Summary:
This was hardcoded at 4 before, but should be made
configurable. It can be kept low for big MLPs and set higher for convnets.

Reviewed By: akyrola

Differential Revision: D5126138

fbshipit-source-id: 713ee8bbeb243b7de1479808fd6398d397e0b49a
2017-05-25 13:32:40 -07:00
Mohamed Fawzy
e35a4fe5cc Implement SizeOp as requested in GitHub issue #583
Summary:
Implement SizeOp, which returns the number of elements in the input
tensor.

The output is a 1D tensor that contains the element count.
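
A usage sketch, assuming the op registers under the name "Size" (blob names are illustrative):
```python
import numpy as np
from caffe2.python import core, workspace

workspace.FeedBlob("X", np.zeros((2, 3, 4), dtype=np.float32))
workspace.RunOperatorOnce(core.CreateOperator("Size", ["X"], ["X_size"]))
print(workspace.FetchBlob("X_size"))  # element count of a 2x3x4 tensor: 24
```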

Reviewed By: akyrola

Differential Revision: D5101061

fbshipit-source-id: d1c56053b6f3b41c65ac574dd748482775d1ea0d
2017-05-25 11:07:35 -07:00
Artem Volkhin
55d293f730 remove non-existing blobs from output_schema in layer_model_instantiator
Summary: In some cases (for example, when the include_tags option is used) output_schema contains blobs that aren't produced by the generated net. In this case we want to filter them out of output_schema as well.

Differential Revision: D5120115

fbshipit-source-id: f98ea3f747589390b039d1e1987becec3980634c
2017-05-25 00:36:19 -07:00
Aapo Kyrola
da6b82b810 fix another bug related to in-place ops --> treat in-place ops like any other
Summary:
D5116828 changed how in-place ops were handled in memonger and fixed a crash in NeuralMT. However, it still produced an incorrect memongerization: an op with one in-place input/output but another non-in-place output would still be handled incorrectly, because that other output's branch would not be followed properly.

This is fixed by removing the whole in-place-op special handling. It is not actually needed anymore; it was a leftover from an older version of memonger that used a topological sort of the ops.
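
For context, an example of the op shape at issue (not from the diff): a Dropout op run in-place has one in-place input/output plus a second, non-in-place output whose consumers still need to be traversed:
```python
from caffe2.python import core

# "X" -> "X" is the in-place pair; "mask" is the extra non-in-place output.
op = core.CreateOperator("Dropout", ["X"], ["X", "mask"], ratio=0.5, is_test=0)
```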

Reviewed By: asaadaldien

Differential Revision: D5128142

fbshipit-source-id: b551b0faebdde410e6bd7516958c63cf610cc065
2017-05-24 23:32:03 -07:00
Deepak Gopinath
33c40e8a6e Handling shared indices in sparse gradient updates
Summary: When two or more blobs are gathered by the same indices blob in a data parallel model, we used to concatenate multiple times and re-write to the same indices blob. This sometimes leads to illegal memory access, because the gradient-slice indices blob becomes longer than its corresponding gradient-slice values blob. This diff adds a check to avoid this.

Reviewed By: akyrola

Differential Revision: D5116817

fbshipit-source-id: 1c086d092eb6d48926d600f9408f578f5ddc41c7
2017-05-24 22:47:00 -07:00
Aapo Kyrola
f2303ccb77 fix tileop test
Summary: The gradient test for the tile op was flaky because I had made the dimensions too large. This caused push-blocking errors. I also noticed my test_grad_tile was incorrect.

Reviewed By: asaadaldien

Differential Revision: D5126476

fbshipit-source-id: ae9ce5d9041648d7a4535fc88d4013e669bd6f02
2017-05-24 18:32:01 -07:00
Bram Ton
4da076d3e9 Fixed typo caffe_translator.py, fixes bug #397
Summary:
Fixed minor typo in python/caffe_translator.py. Fixes #397.
Closes https://github.com/caffe2/caffe2/pull/412

Differential Revision: D4950875

Pulled By: aaronmarkham

fbshipit-source-id: 07183c6d6e8e97451bb5ee5ff01a88553d6bdb82
2017-05-24 12:18:32 -07:00
Bor-Yiing Su
c79ce5c2ba Profiles pipe stages.
Summary: Adds timers to collect the average runtime for each pipe stage.

Reviewed By: azzolini

Differential Revision: D5083958

fbshipit-source-id: 42536bd70c80c2453d98d872286525388f6164c3
2017-05-24 12:02:03 -07:00
Viswanath Sivakumar
152d439400 Allow specifying net type in predictor_exporter
Summary:
predictor_exporter copies the original predict_net's op, external_input, and
external_output fields, but ignores the type field. This is reasonable: the
train net would generally have the 'dag' type, and copying that for inference
may not be applicable. Nevertheless, it's good to have a way to specify the
net type, e.g. to run a DAGNet for inference. This diff adds a field in
predictor_exporter to do that.
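
A sketch of what the export call could look like with such a knob (the parameter name net_type and the surrounding usage are assumptions, not confirmed by the diff):
```python
from caffe2.python import model_helper
from caffe2.python.predictor import predictor_exporter

model = model_helper.ModelHelper(name="infer")
model.net.Relu(["data"], ["relu_out"])

export_meta = predictor_exporter.PredictorExportMeta(
    predict_net=model.net.Proto(),
    parameters=[],
    inputs=["data"],
    outputs=["relu_out"],
    net_type="dag",  # assumption: the field this diff adds
)
```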

Reviewed By: akyrola

Differential Revision: D5122354

fbshipit-source-id: 0e3cc417128db903c71515135c9e3b87620ae21e
2017-05-24 11:46:27 -07:00
James Cross
03503140fd DropoutCell as wrapper for another RNNCell
Summary: Added a new RNNCell, DropoutCell, which wraps an existing RNNCell and applies dropout to its primary output (as defined by get_output_state_index()).
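
A construction sketch (the LSTMCell arguments follow the existing rnn_cell API; the DropoutCell signature is an assumption based on the summary):
```python
from caffe2.python import rnn_cell

lstm = rnn_cell.LSTMCell(
    input_size=128,
    hidden_size=256,
    forget_bias=0.0,
    memory_optimization=False,
    name="lstm",
)
# Wrap the cell; dropout is applied to the wrapped cell's primary output.
cell = rnn_cell.DropoutCell(lstm, dropout_ratio=0.5, name="lstm_dropout")
```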

Reviewed By: salexspb

Differential Revision: D5084871

fbshipit-source-id: 60474af84e5757a12e7fdc3814840dc9ba8e32a1
2017-05-24 11:36:45 -07:00
Bram Wasti
c55be38e63 Added mobile exporter
Summary: Basically takes in a live net and creates an init_net and a predict_net, which can be written to file and run in Predictor.
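
A usage sketch (module path and signature are assumptions based on the summary):
```python
from caffe2.python import model_helper, workspace
from caffe2.python.predictor import mobile_exporter

model = model_helper.ModelHelper(name="mobile")
# ... build the model and run model.param_init_net, then export:
init_net, predict_net = mobile_exporter.Export(
    workspace, model.net, model.params)
with open("init_net.pb", "wb") as f:
    f.write(init_net.SerializeToString())
with open("predict_net.pb", "wb") as f:
    f.write(predict_net.SerializeToString())
```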

Reviewed By: salexspb

Differential Revision: D4989425

fbshipit-source-id: 8052065da9ed763d48bd9e1e19f7697ef60a2829
2017-05-24 11:36:44 -07:00
James Cross
c39f6cf2d0 gradient accumulation fix
Summary: As noted by salexspb, MultiRNNCell had unreliable gradient computation. The problem was that the recurrent gradient and the gradient computed within the backward step net were not being accumulated during the backward pass, but rather written to the same blob, thus overwriting each other. This diff fixes that by artificially introducing an extra blob for the internal output and then accumulating it into the gradient coming from the recurrent connection.

Reviewed By: salexspb

Differential Revision: D5110059

fbshipit-source-id: 16add50989fe8866361bbc21afce5f214c5292fd
2017-05-24 10:33:32 -07:00
Aapo Kyrola
6c511f64cc fix handling of ops with in-place input/output
Summary: Memonger ignores ops with an in-place input and output, but did not work correctly if there were also non-in-place inputs, as with Mul. Simple fix to also look at in-placeness during the traversal.

Reviewed By: jhcross

Differential Revision: D5116828

fbshipit-source-id: 52817f1221597986cc09cc65d094417c1923d965
2017-05-23 18:23:33 -07:00
Andrey Malevich
64e04e78d2 Remove AddOperator from ModelHelper
Summary:
It looks like AddOperator was never really used (I searched across the whole
code-base). In addition, all model_helper functionality is being replaced
with Brew, so I'd prefer to remove this method to reduce the amount of code
touching model.params.

Reviewed By: rayleichen

Differential Revision: D5110425

fbshipit-source-id: f2a88e4c1ce5149d27e809e03da9a86c0867bc4d
2017-05-23 13:34:45 -07:00
Aapo Kyrola
2b11adb414 TileOp CUDA fix: number of threads must be hard coded
Summary:
I had "optimized" the number of threads / block, but cub::BlockReduce has a static template parameter for the number of threads, and this must match. Probably tests still passed because typically the initial numbers are zeros.

Also added a stronger test.

Thanks ves for the report.

Differential Revision: D5110901

fbshipit-source-id: c1169b1286e204c202b0727448ddb51b4965eacb
2017-05-23 09:32:19 -07:00
Aapo Kyrola
74e964ff0d make data_workers restartable
Summary: Add the ability to restart data workers' data input.

Reviewed By: andrewwdye

Differential Revision: D5108666

fbshipit-source-id: f7f71cd6d4d45d007067814a552fc93cbe3eca42
2017-05-23 01:18:44 -07:00
Simon Layton
193c9289f0 Fix LRN schema for cuDNN op
Summary:
Correct schema generation was previously broken, leading to invalid gradient op creation.

This was also exhibited in model_device_helper, where invalid schemas were being created on the CPU when kwargs['engine'] == 'CUDNN'.
Closes https://github.com/caffe2/caffe2/pull/617

Reviewed By: asaadaldien

Differential Revision: D5097062

Pulled By: akyrola

fbshipit-source-id: e22181f857deccb7b4395e87271e2cbf1226eb64
2017-05-22 08:33:34 -07:00
Alexander Sidorov
92610e78bb CuDNN comparison mode
Summary:
This allows producing nice comparisons against
CuDNN. Currently, on 1 layer, I see about a 28% slowdown on
average across the specified setups.

Reviewed By: akyrola

Differential Revision: D4986218

fbshipit-source-id: efb12081f13dbfb92428fd4a85f12fd566eb9522
2017-05-20 15:19:43 -07:00
Aapo Kyrola
a2c01e830b fix duplicate init blob issue + fix test
Summary:
Address KaimingHe's comments in D5093689 about the same blob being initialized twice, causing the internal consistency check to fail. I also noticed that my new test for test_checkpoint_params was completely botched due to an indentation issue (it did not actually execute any test), so this fixes that as well.
Modified the test to add a duplicate param initializer so that this bug is tested for.

Reviewed By: KaimingHe

Differential Revision: D5101304

fbshipit-source-id: 72f343035c1b4953e7bb9a1a1c171cf05d3ead26
2017-05-20 09:18:29 -07:00
Aapo Kyrola
aa603a9083 add test for input order
Summary: Based on jay-mahadeokar's code, add a test for input order consistency to data workers.

Reviewed By: jay-mahadeokar

Differential Revision: D5096887

fbshipit-source-id: efd226343f81e9a0157ec89d4588f1eee8a78549
2017-05-19 23:46:38 -07:00
Aapo Kyrola
6384bae29b call save_to_db in CPUContext + fix a typo in data_parallel_model.
Summary:
If Predictor Exporter's save_to_db is called in CUDAContext, a failure occurs: the following FeedBlob() tries to store a string (metadata), but for CUDA blobs we assume they are tensors.
Also fixes a typo in data_parallel_model that I bumped into.
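
A sketch of the pattern from the caller's side (the diff presumably pins the context inside the exporter; the export_meta construction is omitted here):
```python
from caffe2.proto import caffe2_pb2
from caffe2.python import core
from caffe2.python.predictor import predictor_exporter

# export_meta: a PredictorExportMeta built beforehand (construction omitted).
with core.DeviceScope(core.DeviceOption(caffe2_pb2.CPU)):
    predictor_exporter.save_to_db("minidb", "/tmp/model.minidb", export_meta)
```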

Reviewed By: asaadaldien

Differential Revision: D5099837

fbshipit-source-id: 69d01b35a9a1816bf083f13d8a6ce88e1f5aecb7
2017-05-19 18:25:00 -07:00
James Cross
83f6dceaa6 remove forget_bias as argument to AttentionCell constructor
Summary: The argument was unused.

Differential Revision: D5096088

fbshipit-source-id: fcda8a1d2b0d7c85182ab5bc002c86640b443f97
2017-05-19 16:53:40 -07:00
Ahmed Taei
09bbd0382c ConvNd cuDNN
Summary: Add ConvND cuDNN implementation.

Reviewed By: akyrola

Differential Revision: D4702205

fbshipit-source-id: 65275bcff3970b0d43ac5c168d38bcd075985979
2017-05-19 15:20:33 -07:00
Aapo Kyrola
0af0cba2b7 Refactor data_parallel_model initial sync and checkpointing
Summary:
Major improvements. Before, we only synced the "params" and "computed params" of the model after initialization and after loading a checkpoint. But actually we want to sync all blobs that are generated in the param_init_net. For example, the _momentum blobs were missed by the previous implementation and had to be manually included in checkpoint finalization.

I also added GetCheckpointParams() to data_parallel_model, since it is now fully general, and added a unit test.
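
A usage sketch of the new helper (the call site is an assumption; model here stands for a model previously set up with data_parallel_model.Parallelize_GPU):
```python
from caffe2.python import data_parallel_model

# Ask the library which blobs a checkpoint must cover -- including
# param_init_net outputs such as _momentum blobs.
checkpoint_blobs = data_parallel_model.GetCheckpointParams(model)
```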

Reviewed By: andrewwdye

Differential Revision: D5093689

fbshipit-source-id: 8154ded0c73cd6a0f54ee024dc5f2c6826ed7e42
2017-05-19 12:48:06 -07:00
Yiming Wu
0aeffa985e make sure mutex is on CPU too
Summary: A mutex is only supported on CPU, so we need to make sure the mutex and the following AtomicIter are both on CPU. This is critical for GPU SparseNN training.
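
A sketch of the pattern being pinned to CPU (blob names and placement are illustrative, not from the diff):
```python
from caffe2.proto import caffe2_pb2
from caffe2.python import core, model_helper

model = model_helper.ModelHelper(name="iter_example")
with core.DeviceScope(core.DeviceOption(caffe2_pb2.CPU)):
    mutex = model.param_init_net.CreateMutex([], ["iteration_mutex"])
    iteration = model.param_init_net.ConstantFill(
        [], ["iteration"], shape=[1], value=0, dtype=core.DataType.INT64)
    # AtomicIter must live on the same (CPU) device as its mutex.
    model.net.AtomicIter([mutex, iteration], [iteration])
```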

Differential Revision: D5093184

fbshipit-source-id: 021e6ba699a3208449fa4761cad6b0ec4544957e
2017-05-19 12:17:17 -07:00
Yiming Wu
65750349ba deprecate CNNModelHelper in python/operator_test dir
Summary:
deprecate CNNModelHelper in python/operator_test dir

BTW, I found that there are two mkl_speed_tests. I am confused...

Reviewed By: salexspb

Differential Revision: D5094122

fbshipit-source-id: f6526f4de334f2245eb4c1f204a8ec9f23750d78
2017-05-19 12:17:17 -07:00
Ahmed Taei
32bf7a2c2b Generalize PoolingOp(cuDNN) to compute 2D and 3D pooling.
Reviewed By: akyrola

Differential Revision: D5090689

fbshipit-source-id: f9f11e12adc0ee8db088f3397a8c33aa31eb5deb
2017-05-19 10:19:00 -07:00
Yiming Wu
1b7497807f cnnmodelhelper deprecate warning
Summary: We will start our API migration process. Before that, I want to make sure people don't add new CNNModelHelper instances to our open-source code, so I put a deprecation warning here in advance.

Reviewed By: salexspb

Differential Revision: D5093556

fbshipit-source-id: 74bf4a7782c2d882f72f202d48c72255d152b68a
2017-05-18 23:35:26 -07:00
Pooya Davoodi
307459eb62 Fix conv_test for CUDNN dilated convolution in NHWC
Summary:
CUDNN dilated convolution was added in v6. This version of CUDNN does not support NHWC for dilated convolution.

Fix conv_test.py so that it does not test CUDNN for dilated convolution in the NHWC format.
Closes https://github.com/caffe2/caffe2/pull/598
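
The shape of the guard this implies, as a hypothetical helper (the real patch likely expresses it through the test's parameter filtering instead):
```python
def allowed_engines(dilation, order):
    """Engines conv_test may exercise for a given configuration (sketch)."""
    if dilation > 1 and order == "NHWC":
        return ["", "EIGEN"]  # skip CUDNN: v6 lacks NHWC dilated convolution
    return ["", "CUDNN", "EIGEN"]
```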

Reviewed By: akyrola

Differential Revision: D5084835

Pulled By: asaadaldien

fbshipit-source-id: 3c0c5ed02c5d9232fca567e387ab6260d71e5aaf
2017-05-18 10:07:28 -07:00
James Reed
85f1d947dd Vectorize SigmoidOp on CPU
Summary: I noticed that Sigmoid was taking an inordinate amount of time in our NMT benchmark, so I looked at the implementation, and it didn't seem optimal. I replaced the implementation with an Eigen version so that when the Eigen update goes through, we will get proper AVX(2) vectorization.

Differential Revision: D5082464

fbshipit-source-id: aa951f7d730fc05198f7dd04076ec58d471b74c8
2017-05-17 20:33:36 -07:00
Ben Zhang
12edbcb154 Implemented L1Distance Operator for CUDA
Summary: Added L1Distance Operator for CUDA, as well as tests.

Reviewed By: bwasti

Differential Revision: D5071966

fbshipit-source-id: 4c3d862605e9123d955bf091efa67d0731bd816a
2017-05-17 17:32:53 -07:00
Pieter Noordhuis
bbd7aee9ab Revert D4952993: [Caffe2] fix mkl_sparse and migrate sparsity experiments
Summary: This reverts commit 86c03676ab4e47f04d2d0dd438a4a1c849bbbff0

Differential Revision: D4952993

fbshipit-source-id: 5c213c48ac44ce6aefccacc6d80534648d3c516a
2017-05-17 14:46:56 -07:00
James Cross
f27c9eea20 dropout for C2 multilayer
Summary:
Incorporate arbitrary dropout for encoder and decoder layers for Caffe2 NMT models using the current configuration. This involves separate output processing (_prepare_output() and _prepare_output_sequence()) for the final layer in a MultiRNNCell.

Switching to the newly introduced forward_only switch for RNN cells revealed an unrelated bug in our NetGradientChecker test, which urikz is investigating.

Reviewed By: salexspb

Differential Revision: D5031964

fbshipit-source-id: 19b49607d551aa3e2140041ef4e585f128c8f178
2017-05-17 11:32:47 -07:00
Aapo Kyrola
658c337f41 Error status for Gloo ops, and handling in elastic dpm
Summary: Add a RandomFailureOp, and add handling of the status code to the elastic data parallel model.

Reviewed By: andrewwdye

Differential Revision: D5065936

fbshipit-source-id: 24224f9ea414ee535c9e90cc28add5189354b0ef
2017-05-17 00:16:52 -07:00
Szymon Piechowicz
5ced84856a Caffe2: SparseToDenseMask: return key presence
Summary: Caffe2: SparseToDenseMask: return key presence

Reviewed By: matbd

Differential Revision: D5066863

fbshipit-source-id: 4f4dd141f6661829535cb77ff47cc0c230dce5d6
2017-05-16 20:22:03 -07:00
Yiming Wu
f359d70ae7 fix mkl_sparse and migrate sparsity experiments
Summary:
Migrate the experiments folder to the fb/sparse folder. Keep FunHashOp and SparseFunHashOp because they are now assumed to be default ops in depr. What I did:

  1. Migrated FunHashOp and SparseFunHashOp and their unit tests to core Caffe2; made sure the tests pass.
  2. Migrated the other ops in the experiments folder to the fb/sparse folder and wrote new TARGETS files for them; made sure the tests pass.
  3. Made sure all related tests pass.
  4. Fixed the MKL definition along the way; made sure FC_Sparse is not compiled when there is no MKL support.

Reviewed By: salexspb

Differential Revision: D4952993

fbshipit-source-id: 86c03676ab4e47f04d2d0dd438a4a1c849bbbff0
2017-05-16 18:33:51 -07:00
James Cross
37c06a3ba8 residual connections in multilayer C2 ('add' only)
Summary:
Residual connections for the multilayer RNN encoder/decoder of the Caffe2 NMT model. Only supporting 'add' connections (the standard approach, which ves's TF experiments concluded was at least as good as other approaches), and only implementing for residual_level >= 1 (which fits our use case).

It is the responsibility of the config to ensure dimension compatibility: each level at and beyond residual_level (in both the encoder and decoder) should have the same number of units, with the exception that a bidirectional initial encoder layer should have half the number of units of the succeeding layer if that next layer is a residual layer.
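
A minimal sketch of an 'add' residual connection at one level (the helper name is hypothetical; the real change lives inside the multilayer cell):
```python
def residual_add(model, layer_input, layer_output, out_name):
    # 'add' residual: requires matching dimensions between input and output.
    return model.net.Add([layer_input, layer_output], out_name)
```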

Differential Revision: D5023160

fbshipit-source-id: f38c1b140638fee78cf3ef7d6b4602dd462484ee
2017-05-16 17:04:58 -07:00
Yiming Wu
a28b01c155 rnn with brew
Summary:
Update rnn_cell.py and the char_rnn.py example with the new `brew` model.

- Deprecated CNNModelHelper
- Replaced all helper functions with brew helper functions
- Used the `model.net.<SingleOp>` format to create bare-bone operators for better clarity
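
The flavor of the migration, as a sketch (layer names and sizes are placeholders):
```python
from caffe2.python import brew, model_helper

model = model_helper.ModelHelper(name="char_rnn")
# Helper functions go through brew rather than CNNModelHelper methods...
fc1 = brew.fc(model, "input", "fc1", dim_in=128, dim_out=256)
# ...while bare-bone ops are created directly on the net for clarity.
relu1 = model.net.Relu(fc1, "relu1")
```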

Reviewed By: salexspb

Differential Revision: D5062963

fbshipit-source-id: 254f7b9059a29621027d2b09e932f3f81db2e0ce
2017-05-16 13:33:44 -07:00