Commit Graph

922 Commits

Author SHA1 Message Date
Bangsheng Tang
5f63f5697a IndexHash
Summary:
1. IndexHashOp
2. Helper class SparseFeatureHash
3. FeatureSpec changes to add desired_hash_size

Reviewed By: kennyhorror

Differential Revision: D5361370

fbshipit-source-id: bf02e3ca12b3654f1d291f77c8af9248b6c4ac55
2017-07-07 23:06:11 -07:00
Geet Sethi
86b6a6e2f8 Added PiecewiseLinearTransform CUDA Op
Summary: Added a CUDA implementation of the PiecewiseLinearTransformOp.

Differential Revision: D5378537

fbshipit-source-id: 38857f59f5cc52e16e1ecc97983a0b0b82a46c74
2017-07-07 15:20:00 -07:00
Clément Godard
cb7f17ab64 added gradients for ResizeNearest (CPU + CUDA) and ref
Summary:
1. Added the gradients of the operation for both CPU and CUDA kernels.
2. Unified variable names across all ops.
3. Added reference implementation in numpy.
4. The gradient check needs a larger stepsize to succeed, is that normal?

Reviewed By: akyrola

Differential Revision: D5313682

fbshipit-source-id: aceb92649e01c5caeba8774e678f9095502d396c
2017-07-07 14:19:42 -07:00
Ralph Mao
febae7b20b fix a bug in the report function of Data_Parallel
Summary: replace params with sp, otherwise it will report an empty list

Reviewed By: akyrola

Differential Revision: D5382716

fbshipit-source-id: 34d8e6ee00cbe1718702e3d1f23ea12f8d65063e
2017-07-07 13:03:46 -07:00
Jacqueline Xu
8cedf35d55 Adding Random Fourier Features to SparseNN Model and Flow
Summary:
- Integrated RFF into the preprocessing workflow for dense features
- Developed Flow interface to input RFF parameters
- Created unit test for using RFF with sparseNN

Reviewed By: chocjy

Differential Revision: D5367534

fbshipit-source-id: 07307259c501a614d9ee68a731f0cc8ecd17db68
2017-07-07 09:39:32 -07:00
Aapo Kyrola
ad62e82179 fast simple-net memonger for C++
Summary:
To be used with predictor "online": C++ version of memonger for simple nets. Very simple greedy algorithm. Works well at least on Resnet-50 inference graph: only 3 shared blobs are used.

Next I will integrate this with predictor and run canary (separate diff).

Reviewed By: asaadaldien

Differential Revision: D5375392

fbshipit-source-id: d36e419e39a32e568e105657c27fb00c85a2535d
2017-07-06 15:17:07 -07:00
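The "very simple greedy algorithm" mentioned in ad62e82179 is not spelled out in the commit message. The following is only a rough Python sketch of one greedy blob-sharing scheme for a simple net, not the C++ code from the diff. The function name `greedy_share`, the `(inputs, outputs)` op representation, and the `__shared_N` naming are all assumptions made for illustration.

```python
def greedy_share(ops, keep):
    """Map intermediate blobs onto a small pool of shared blobs.

    ops  : list of (inputs, outputs) tuples in execution order (hypothetical format)
    keep : set of blob names that must never be renamed (net inputs/outputs)
    """
    # Record the index of the last op that reads each blob.
    last_use = {}
    for i, (ins, _) in enumerate(ops):
        for blob in ins:
            last_use[blob] = i

    mapping, free, count = {}, [], 0
    for i, (ins, outs) in enumerate(ops):
        # Assign outputs first, so an op never writes into a blob it still reads.
        for blob in outs:
            if blob in keep or blob in mapping:
                continue
            if free:
                mapping[blob] = free.pop()
            else:
                mapping[blob] = '__shared_%d' % count
                count += 1
        # Only now release the inputs whose last reader is this op.
        released = set()
        for blob in ins:
            if blob in mapping and last_use[blob] == i and blob not in released:
                free.append(mapping[blob])
                released.add(blob)
    return mapping


ops = [(['data'], ['a']), (['a'], ['b']), (['b'], ['c']), (['c'], ['out'])]
mapping = greedy_share(ops, keep={'data', 'out'})
```

On this four-op chain the three intermediate blobs end up in two shared blobs, with `a` and `c` sharing one.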
Guillaume Dumont
e8689dda8f Python 3 compatible integer division
Summary:
As the title says.
Closes https://github.com/caffe2/caffe2/pull/879

Differential Revision: D5372787

Pulled By: akyrola

fbshipit-source-id: 0ff469c0d227f1b2252c1a0c4f6f8bebaac5580f
2017-07-06 11:47:12 -07:00
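For context on e8689dda8f: on Python 3 the `/` operator on two ints returns a float, while `//` floors on both Python 2 and 3. A minimal illustration follows; the function `split_batch` is a made-up example and does not come from the pull request.

```python
def split_batch(total, num_workers):
    # "//" is floor division on both Python 2 and Python 3;
    # "/" on two ints would return a float on Python 3.
    return total // num_workers


half = split_batch(7, 2)
rows = list(range(half))  # range() requires an int, so "/" would fail here on Python 3
```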
Andrew Dye
31f394f8b3 Add synchronization barrier API to data parallel model
Summary: Add synchronization barrier API with configurable timeout. Users can call Synchronize() to join variable-length execution before resuming multi-machine communication steps, e.g., resuming distributed training iterations after validation on a single machine.

Reviewed By: akyrola

Differential Revision: D5348387

fbshipit-source-id: 5826da10e6a60c50394c36c7cf47624f10191d11
2017-07-06 09:21:19 -07:00
Aapo Kyrola
21ba0ff560 small fix to when input blob is input to multiple ops
Summary: Memonger had a bug that made it crash when an input blob was an input to multiple ops. This fixes that and adds a test.

Reviewed By: asaadaldien

Differential Revision: D5374860

fbshipit-source-id: 1d5044001eacdbe6db43f69727da9297558f5c5c
2017-07-05 22:37:26 -07:00
Aapo Kyrola
2d133d4627 increase concurrency default
Summary: Huge improvement in my tests, and it does not really hurt either.

Reviewed By: wesolwsk

Differential Revision: D5374925

fbshipit-source-id: c96a4ed2ca653120a82233c0037cbfded8a2d2a1
2017-07-05 21:46:31 -07:00
Luke Yeager
be7725b0ba Tests: fix dpm test when only 1 GPU present
Summary:
b33894e95d removed this line:
```py
unittest.skipIf(workspace.NumCudaDevices() < 2, "Need at least 2 GPUs.")
```
but forgot to add it back later.
```
_________________________________ DataParallelModelTest.test_equiv __________________________________
...
            if p2p_access_pattern is not None and not p2p_access_pattern[
>               devices[0], peer
            ]:
E           IndexError: index 1 is out of bounds for axis 1 with size 1
...
WARNING:data_parallel_model:** Only 1 GPUs available, GPUs [0, 1] requested
```

/cc akyrola
Closes https://github.com/caffe2/caffe2/pull/888

Reviewed By: akyrola

Differential Revision: D5341310

Pulled By: harouwu

fbshipit-source-id: 8d7f06913c7b5a42009a4033dbb6a48a8e812822
2017-07-05 14:32:12 -07:00
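The guard restored by be7725b0ba can be illustrated with plain `unittest`, assuming it is applied as a decorator. In this sketch `num_gpus()` is a hypothetical stand-in for `workspace.NumCudaDevices()` and the test class is a toy, not the real `DataParallelModelTest`.

```python
import unittest


def num_gpus():
    """Hypothetical stand-in for workspace.NumCudaDevices()."""
    return 1


class ToyDataParallelTest(unittest.TestCase):
    # Skip rather than fail when fewer than two GPUs are present.
    @unittest.skipIf(num_gpus() < 2, "Need at least 2 GPUs.")
    def test_equiv(self):
        self.assertTrue(True)


result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(ToyDataParallelTest).run(result)
```

With a single device reported, the test is recorded as skipped instead of raising the `IndexError` shown in the log above.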
Yiming Wu
60e4607106 brew API in convnet benchmark
Summary: upgrade convnet_benchmarks to brew api

Reviewed By: salexspb

Differential Revision: D5341829

fbshipit-source-id: f34c6dd4aae5f0c8db51e7600eb1f0e1cdc72ea3
2017-07-05 10:34:48 -07:00
Jacqueline Xu
25bd5dda27 Implementing random fourier features layer
Summary:
- Created the random Fourier features layer
- Added a unit test to check that the random Fourier features layer is built correctly
- Inspired by the paper "Random Features for Large-Scale Kernel Machines" (https://people.eecs.berkeley.edu/~brecht/papers/07.rah.rec.nips.pdf)

Reviewed By: chocjy

Differential Revision: D5318105

fbshipit-source-id: c3885cb5ad1358853d4fc13c780fec3141609176
2017-07-04 23:48:42 -07:00
Jiyan Yang
00e5afea6a Adding dedup aggregator options to sgd optimizer
Summary: As desc.

Reviewed By: xianjiec

Differential Revision: D5324671

fbshipit-source-id: 27f3a58f618cd5ea11c2ea2e756df3f73635c2c8
2017-07-04 02:10:18 -07:00
Marat Dukhan
2ac9ff5c96 Cos, Sin, and Abs operators
Summary: add Cos, Sin, and Abs operators

Reviewed By: akyrola

Differential Revision: D5307632

fbshipit-source-id: 743c9d289e4d3fd439e4b5385841cdff87d9247a
2017-07-03 22:18:32 -07:00
Simon Layton
090506ac87 Add NCCLBroadcast to correct net
Summary:
Otherwise the op was always added to the main net instead of param_init_net,
even when the latter was desired (i.e. for the initial param sync).
Closes https://github.com/caffe2/caffe2/pull/894

Differential Revision: D5367451

Pulled By: akyrola

fbshipit-source-id: 3d82be6da687c736bd15f4852dbd272266eb4811
2017-07-03 16:54:44 -07:00
Dmytro Dzhulgakov
b6c1c0ac4e Fix communication_schema decoding
Summary: Allows overriding the input/output record as long as the field blobs are the same.

Reviewed By: yangyangyyy

Differential Revision: D5362132

fbshipit-source-id: 3ac2ac22802902b7eed5c226b00a7e1971ad264c
2017-07-02 13:04:20 -07:00
Dmytro Dzhulgakov
c0cebc3578 Added flags to lstm, convnet and sparse_nn_benchmarks to print out operators
Summary: pass flags directly to C2

Reviewed By: salexspb

Differential Revision: D5345869

fbshipit-source-id: 22b0e791526c7b0caf1e6a13dd29900df0db8fe8
2017-06-30 23:47:04 -07:00
Aapo Kyrola
ab0fe0a5f4 add debug information when there is blob version mismatch
Summary:
It is quite common for users to ask about some variant of "blob has version 2 but gradient expects version 1" in their backward pass. The error message is completely unhelpful.
To remedy this, I added proper debug information that tells the user how the version number of a blob was incremented over time, i.e. which ops caused the version to go up. This should help in understanding the issue.

Reviewed By: dzhulgakov

Differential Revision: D5358227

fbshipit-source-id: bc09d048ac33200c35d56460e44e86c2f2888f3f
2017-06-30 16:22:46 -07:00
Tao Wu
5aa147f273 added PackRNNSequence and UnpackRNNSequence operators
Summary: Added two operators that can be used to convert data to the RNN input format and back.

Reviewed By: kittipatv

Differential Revision: D5329886

fbshipit-source-id: 07eac29416427b08c49989d4eeed50a6f18493a1
2017-06-30 09:53:31 -07:00
Aapo Kyrola
8c74c36626 fix reducing device option
Summary: This was broken in a previous diff, fixing it to use model device type.

Reviewed By: asaadaldien

Differential Revision: D5356005

fbshipit-source-id: a4fcc932bae772076b57625a5fcc0d38eb702cc9
2017-06-30 09:19:57 -07:00
Thomas Dudziak
5355634dac Dict fixes/improvements and unittest targets for Python 3 in caffe2 core
Summary: As title

Reviewed By: salexspb

Differential Revision: D5316104

fbshipit-source-id: aee43819d817842e5ce6ba3d045a55b1a2491c30
2017-06-29 17:05:41 -07:00
Alexander Sidorov
a6dee1da32 Make args.fixed_shape in lstm_benchmark work in a library mode
Summary:
This works as a standalone Python script because args are
global. When used from Flow for monitoring purposes, it doesn't
work. This diff fixes it.

Reviewed By: zem7

Differential Revision: D5349996

fbshipit-source-id: f73842901d975b783e09e9db0565eb81880bbea1
2017-06-29 14:55:26 -07:00
Aapo Kyrola
dd6e170b8d fix LSTM benchmark reporting
Summary:
A couple of fixes for the broken reporting of lstm_benchmark:
- last_time must be recorded after warm up
- entry count was incorrectly removed

Reviewed By: salexspb

Differential Revision: D5349890

fbshipit-source-id: 5dd5bdf46594c520b61bc3b57b153f90a6a17903
2017-06-29 13:53:17 -07:00
Andrew Tulloch
6c67a753c7 Fix test_pair_wise_loss_predictions
Summary: Increase absolute error tolerance.

Reviewed By: tomdz

Differential Revision: D5349604

fbshipit-source-id: 8e04001b0b6a6e83083f341e265ab3c0d2b06918
2017-06-29 12:48:04 -07:00
Andrew Tulloch
912ee4e40a Fix test_sparse_to_dense precision failures
Summary: ..

Reviewed By: tomdz

Differential Revision: D5349561

fbshipit-source-id: 4c510905515eb03a64abc36f33d59a1d998c2ab1
2017-06-29 12:48:03 -07:00
Andrew Tulloch
83765906c6 Add min_satisfying_examples
Summary:
Eliminates failures caused by overloaded machines running
only a few examples before timing out.

Reviewed By: tomdz

Differential Revision: D5349555

fbshipit-source-id: 89d1db063f58c72656b37157225a586c9e3f24bc
2017-06-29 12:48:01 -07:00
Junjie Bai
86305ddd49 Deprecate CNNModelHelper in python/seq2seq/seq2seq_model_helper.py
Summary: Also added some simple tests for Seq2SeqModelHelper.

Reviewed By: jamesr66a

Differential Revision: D5291733

fbshipit-source-id: 15866dccb89acd82c08e0348f14834cd9c201422
2017-06-28 20:18:12 -07:00
Yiming Wu
fb4c0a664b brew API in lstm benchmark
Summary: I deprecated CNNModelHelper in LSTM benchmark

Reviewed By: salexspb

Differential Revision: D5342734

fbshipit-source-id: 81a552194bcb0cc3071604340fce6873230964f2
2017-06-28 20:18:12 -07:00
Ben Zhang
e128245e8c Move memonger graph equality into memonger
Summary: Let's try this again. Verify graphs every time memonger is run. Will definitely check for time though.

Reviewed By: akyrola

Differential Revision: D5308188

fbshipit-source-id: 512a76c759b670d31c49d1d492dd8ee1eaf3bafd
2017-06-28 17:36:40 -07:00
Luke Yeager
fe9b0bfd27 Fix some typos
Summary: Closes https://github.com/caffe2/caffe2/pull/882

Differential Revision: D5341277

Pulled By: harouwu

fbshipit-source-id: bb5595c65c05ca7ea1a1d060d61d14fbfe008241
2017-06-28 13:50:48 -07:00
Yongqiang Wang
ea659b8f2e broadcast to global parameters when using warmup
Reviewed By: asaadaldien, jay-mahadeokar

Differential Revision: D5340692

fbshipit-source-id: 80879847ff71c8d620de502ef95a9ffb4bdf595d
2017-06-28 13:35:27 -07:00
Ahmed Taei
fbe2526343 Allow concurrent execution of GLOO broadcast collectives in
Summary:
This adds a CollectivesConcurrencyControl class to manage creating the common context and cyclic controls used to execute GLOO collectives,
and refactors AllReduce and _AddDistributedParameterSync to use it.

Reviewed By: akyrola

Differential Revision: D5335795

fbshipit-source-id: 5084e0a65cdb989cd949be3868b77a680561022d
2017-06-28 12:49:12 -07:00
Brian Lan
e2bd3cfc8b Add __sub__ function for schema.Struct
Summary:
This is for the ease of removing the common fields of a struct from another.
For example,
  s1 = Struct(
      ('a', Scalar()),
      ('b', Scalar()),
  )
  s2 = Struct(('a', Scalar()))
  s1 - s2 == Struct(('b', Scalar()))

More examples are provided in the code comments.

Differential Revision: D5299277

fbshipit-source-id: 7008586ffdc8e24e1eccc8757da70330c4d90370
2017-06-28 11:24:01 -07:00
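The subtraction semantics described in e2bd3cfc8b can be mimicked with a toy class. This is only a sketch under assumptions: `ToyStruct` is hypothetical, stores field types as plain strings, ignores nested structs, and is not the `schema.Struct` implementation.

```python
from collections import OrderedDict


class ToyStruct(object):
    """Hypothetical stand-in for schema.Struct: ordered field name -> type."""

    def __init__(self, *fields):
        self.fields = OrderedDict(fields)

    def __sub__(self, other):
        # Keep the fields of self whose names do not appear in other.
        kept = [(name, typ) for name, typ in self.fields.items()
                if name not in other.fields]
        return ToyStruct(*kept)

    def __eq__(self, other):
        return (isinstance(other, ToyStruct) and
                list(self.fields.items()) == list(other.fields.items()))

    def __ne__(self, other):
        return not self == other


s1 = ToyStruct(('a', 'scalar'), ('b', 'scalar'))
s2 = ToyStruct(('a', 'scalar'))
diff = s1 - s2
```

Here `diff` holds only field `b`, matching the `s1 - s2` example in the summary.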
Jiyan Yang
8260002941 Partial eval layers
Summary:
In some cases we don't want to compute the full FC during eval.
These layers allow us to compute dot product between
X and W[idx,:] where idx is an input, e.g., label.

Reviewed By: kittipatv

Differential Revision: D5305364

fbshipit-source-id: 0b6a1b61cc8fcb26c8def8bcd037a4a35d223078
2017-06-28 00:36:40 -07:00
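The computation described in 8260002941, a dot product between each row of X and the single row W[idx,:] selected for it, can be sketched in plain Python. The function name `partial_dot` and the list-of-lists layout are assumptions for illustration; the actual layers operate on Caffe2 blobs.

```python
def partial_dot(X, W, idx):
    """For each example i, compute dot(X[i], W[idx[i]]) instead of the full X * W^T."""
    return [
        sum(x * w for x, w in zip(X[i], W[idx[i]]))
        for i in range(len(X))
    ]


X = [[1.0, 2.0], [3.0, 4.0]]              # two examples, two features each
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three output rows
idx = [2, 0]                              # e.g. the label of each example
scores = partial_dot(X, W, idx)
```

This yields one score per example rather than one per output row, which is the saving the summary refers to.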
Yiming Wu
a1fcbb8be1 offline_all_gpu_experiment
Summary:
Similar to sparse_nn all-GPU, this is our first step towards an offline full-GPU experiment.

**Compare Run**
cat(128, 32)512-512 :
GPU 21138598 https://fburl.com/jpeod1pi
CPU 21138787 https://fburl.com/vma7225l

Reviewed By: dzhulgakov

Differential Revision: D5308789

fbshipit-source-id: 413819bf9c5fff125d6967ed48faa5c7b3d6fa85
2017-06-27 23:09:54 -07:00
Yiming Wu
1fce3eac4e single trainer hybrid device
Summary:
First try of single trainer hybrid device training for sparsenn

Comparison results with CPU training:
https://our.intern.facebook.com/intern/fblearner/run/compare/?compare_to[0]=20016969&compare_to[1]=19660293&baseline_run=19660293&all_runs[0]=20016969&all_runs[1]=19660293

Reviewed By: dzhulgakov

Differential Revision: D5205723

fbshipit-source-id: 4a024324ac2efc3248dd470d4c533cf2ecec2e92
2017-06-27 22:06:30 -07:00
Henry Lu
9a14c013c3 Refactor data_parallel_model to take advantage of Gloo broadcast op in broadcasting across machines and GPUs in one operation
Summary: Combine _AddDistributedParameterSync() and _SyncParams() into a single function that broadcasts across distributed machines and all local GPUs simultaneously. This is similar to how calls to Allreduce have already been optimized using the functionality of Gloo. All the refactoring work is contained in data_parallel_model.py.

Reviewed By: akyrola, andrewwdye

Differential Revision: D5329277

fbshipit-source-id: 4407b88980cf396f2e0f994d796294fa79fd39ed
2017-06-27 19:35:24 -07:00
Luke Yeager
c3b4d277bf Tests: fix test_convolution_sync()
Summary:
This bug in the test was exposed by https://github.com/caffe2/caffe2/pull/861 (previously, the test was always using the cuDNN engine, regardless of the value of `engine`). This bug is now blocking https://github.com/caffe2/caffe2/pull/817.
```
____________________ TestConvolution.test_convolution_sync _____________________
...
            if use_cudnn and requested_engine != 'CUDNN':
                raise ValueError(
>                   'When use_cudnn=True, the only engine you can specify is '
E                   ValueError: When use_cudnn=True, the only engine you can specify is "CUDNN"
```
https://travis-ci.org/caffe2/caffe2/jobs/247605579
Closes https://github.com/caffe2/caffe2/pull/881

Differential Revision: D5332619

Pulled By: akyrola

fbshipit-source-id: 63737768a155359ddbbef1da424fcbb94f86bd4e
2017-06-27 18:07:04 -07:00
James Cross
08cfc72dee Increase threshold for test_unroll_attention
Summary: To 0.000001.

Reviewed By: salexspb

Differential Revision: D5323697

fbshipit-source-id: 5a06c8f5e719b5252e4229704205be37777a8bab
2017-06-27 17:17:32 -07:00
James Reed
07ba98b4b2 Allow specification of SliceOp dimensions via argument rather than via tensor
Summary: This should make it so we no longer have super hacky DAG chains just to generate vectors of indices that could be specified at model creation time

Reviewed By: akyrola

Differential Revision: D5316707

fbshipit-source-id: 97bb3868b69e0c5a7f465c95f2e16ae0485dcc56
2017-06-27 17:17:32 -07:00
Aapo Kyrola
4d16578284 fix + verification for inplace blobs
Summary:
Fixes a memonger bug where it could recycle a blob that was released by the same op being processed.
Added a verification step to ensure in-place assignments are not changed.

Reviewed By: asaadaldien

Differential Revision: D5331495

fbshipit-source-id: 20b08f6de5b973e8c9868aa048c142cac1eb6c58
2017-06-27 13:51:03 -07:00
Luke Yeager
dfd745a4d1 Conv frontend: checking engine and use_cudnn
Summary:
*Fixes https://github.com/caffe2/caffe2/issues/860*

Raise an exception when the user specifies conflicting values for `engine` and `use_cudnn` in the conv frontend.
Closes https://github.com/caffe2/caffe2/pull/861

Differential Revision: D5329587

Pulled By: akyrola

fbshipit-source-id: 0f1ced9a88c9c6c5a7cb30a070e5bf60129082f0
2017-06-27 09:47:48 -07:00
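The check introduced by dfd745a4d1 can be approximated as follows. This is a sketch only: `resolve_conv_engine` is a hypothetical helper, and it models just the conflict visible in the traceback quoted under c3b4d277bf, not every case the conv frontend may handle.

```python
def resolve_conv_engine(use_cudnn=False, engine=None):
    """Hypothetical helper: reject engine values that conflict with use_cudnn=True."""
    if use_cudnn and engine is not None and engine != 'CUDNN':
        raise ValueError(
            'When use_cudnn=True, the only engine you can specify is "CUDNN"')
    if use_cudnn:
        return 'CUDNN'
    return engine


chosen = resolve_conv_engine(use_cudnn=True)
```

Raising early makes a conflicting pair of arguments visible to the caller instead of silently preferring one of them.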
Simon Layton
d45f722e43 data_parallel_model: NCCLBroadcast root fix
Summary:
The root is the root _rank_ and not the root _device_. Thus we always
use root=0, regardless of the devices used.

https://github.com/NVIDIA/nccl/blob/v1.3.0-1/src/broadcast.cu#L75

/cc slayton58
Closes https://github.com/caffe2/caffe2/pull/872

Differential Revision: D5329564

Pulled By: akyrola

fbshipit-source-id: 5a34be30c1a0046a74f28437cb08333c1fb46098
2017-06-27 09:47:48 -07:00
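The rank-versus-device distinction in d45f722e43 can be shown with a toy broadcast. `toy_broadcast` and the dictionary of blobs are hypothetical and only mimic the indexing, not NCCL itself: the root is a position in the participant list, so it stays 0 even when the devices are, say, GPUs 2 and 3.

```python
def toy_broadcast(blobs, devices, root=0):
    """Copy the root participant's value to every device.

    blobs   : dict mapping device id -> value
    devices : list of participating device ids
    root    : rank of the source, i.e. an index into `devices`, not a device id
    """
    source = blobs[devices[root]]
    return {device: source for device in devices}


devices = [2, 3]
blobs = {2: 'params_from_gpu2', 3: 'stale'}
synced = toy_broadcast(blobs, devices, root=0)
```

Passing the device id of the first GPU (`root=2`) would index past the end of the two-element list, which is the kind of mix-up the fix avoids.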
Luke Yeager
ca2bf16009 Tests: handle missing python-lmdb gracefully
Summary:
Fix issue mentioned here: 875a9850c1 (commitcomment-22773221)

Unblocks https://github.com/caffe2/caffe2/pull/817

/cc tomdz
Closes https://github.com/caffe2/caffe2/pull/871

Differential Revision: D5329573

Pulled By: akyrola

fbshipit-source-id: 855294f76bce82dce6d4bd489244922799848076
2017-06-27 09:47:46 -07:00
Zhicheng Yan
c0445c4426 support_multi_label
Summary: Extend image_input_op to support multi-label binary label vector

Reviewed By: panshen1

Differential Revision: D5318119

fbshipit-source-id: da6757ed9a562f1ab58e3ae5642b7a70d6d499c1
2017-06-27 08:47:59 -07:00
James Reed
24e30534ea Implement SliceGradientOp for CPU
Summary: Implement slice gradient for CPU. Will soon port this over to GPU so NMT can use it

Reviewed By: akyrola

Differential Revision: D5309305

fbshipit-source-id: 8fb5f4e665f236ecce9227c5c0c302f5076b01ad
2017-06-26 21:18:05 -07:00
Andrew Tulloch
cb5af39c69 Vectorize CPU ClipOp implementation (and add test)
Summary: Noticed this wasn't vectorized, could be handy.

Reviewed By: kennyhorror

Differential Revision: D5308593

fbshipit-source-id: c2b35ece34831f0546f010a1ebe0b89f1a7d9446
2017-06-26 11:33:13 -07:00
Ben Zhang
4862c0f47f Memonger in O(blobs)
Summary:
Made them faster.

This should be equivalent to the algorithm akyrola suggested, just with a list (of parents) as an intermediate representation instead of a string.

Reviewed By: akyrola

Differential Revision: D5308133

fbshipit-source-id: c976a513d10e79c157ea803afb99b147e9ea3357
2017-06-26 11:04:13 -07:00
Aapo Kyrola
87275817a4 fix a rare race condition by initializing scratch blobs beforehand
Summary: The data workers test times out randomly (very seldom), and it looks like the reason is that we call FeedBlob in a thread (the enqueue thread); the first time it is called, it calls workspace.CreateBlob(), which is not thread safe. Fix this by initializing the scratch blobs explicitly.

Reviewed By: panshen1

Differential Revision: D5292426

fbshipit-source-id: d7dad68f3ccc636c60bd82b2527f00f20da298b5
2017-06-26 10:18:18 -07:00