Commit Graph

348 Commits

Author SHA1 Message Date
James Cross
53ccbd9a6e soft-coverage attention
Summary:
Implementation of a new variant of the attention module, which contains a recurrent decoder state with a vector for each source-side word whose values strictly increase, enabling it to model the degree to which each source word has already been translated.

The approach is a variant of the approaches described in https://arxiv.org/pdf/1601.04811.pdf. We simply include the sum of all previous attention weights for encoder words as a new recurrent state (coverage_t). A new linear transform on encoder_outputs is used to produce coverage_weights, which has the same dimensionality as encoder_outputs and implicitly models the fertility of source-side words (putting this extra informational strain on the encoder network).

Thus the encoder output, the decoder state, and the coverage weights have the same dimensionality for a given source word, and attention logits are calculated as v * tanh(coverage * coverage_weights + encoder_output + decoder_state).

Note: the entire coverage state for each translation instance is of shape (encoder_length, coverage_units), but the states for the RecurrentNetwork operator, used to train the decoder, must be flat in the data dimension. This state is therefore initialized with shape (encoder_length * coverage_units) [not shown in the open-source library] and reshaped appropriately within the apply_soft_coverage_attention() function.
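The logit formula above can be sketched in NumPy; the shapes and argument names below are assumptions for illustration, not the actual signature of apply_soft_coverage_attention():

```python
import numpy as np

def soft_coverage_attention_logits(encoder_outputs, decoder_state, coverage,
                                   coverage_weights, v):
    """Hypothetical sketch of the logit formula described above.

    encoder_outputs:  (encoder_length, d)  projected encoder states
    decoder_state:    (d,)                 current decoder hidden state
    coverage:         (encoder_length, d)  running sum of past attention weights
    coverage_weights: (encoder_length, d)  per-word fertility weights
    v:                (d,)                 scoring vector
    """
    # logits_i = v . tanh(coverage_i * coverage_weights_i + encoder_output_i + decoder_state)
    pre = np.tanh(coverage * coverage_weights + encoder_outputs + decoder_state)
    return pre @ v

rng = np.random.default_rng(0)
L, d = 5, 4
logits = soft_coverage_attention_logits(rng.normal(size=(L, d)),
                                        rng.normal(size=d),
                                        np.zeros((L, d)),  # coverage starts at zero
                                        rng.normal(size=(L, d)),
                                        rng.normal(size=d))
```

At the first decoding step the coverage state is all zeros, so the coverage term vanishes and the score reduces to plain additive attention.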

Differential Revision: D5593617

fbshipit-source-id: 7d0522b5eb0b26f22e8429e4461a459f2f16ed46
2017-08-31 21:21:54 -07:00
Jerry Zhang
debceaff02 Support new arguments in ConvTranspose
Summary: Adding support to use kernels, strides, pads etc. as arguments.

Reviewed By: houseroad

Differential Revision: D5710699

fbshipit-source-id: 8b63af4c4a76cd06b637a376aeb29a34c659be2e
2017-08-31 11:17:32 -07:00
Kittipat Virochsiri
4ec26d23a7 TensorInference function for LengthsSum and such
Summary: Adding missing tensor inference function

Reviewed By: kennyhorror

Differential Revision: D5735119

fbshipit-source-id: 1602b5aeec95f13a3c3c6d3e5417af2712a4dfbb
2017-08-31 09:32:48 -07:00
Misha Smelyanskiy
080fab8f6c Code generator for high-performance embedding look-up kernels, supporting
Summary:
Code generator for high-performance embedding look-up kernels, supporting
Sum, WeightedSum, and Mean reducers.
Achieves at least 1.5x speedup on float and over 2x speedup on float16, compared to the existing code.
These are results on Broadwell, using the sparse_lengths_sum_benchmark.par benchmark

Old
==============
[root@fblearner001.01.ftw1 /home/msmelyan]# numactl -m 0 -C 0 ./sparse_lengths_sum_benchmark.par  --iteration 10000
Preparing lookup table. 2017-08-08 00:10:23.101848
Preparation finished. 2017-08-08 00:10:27.955680
I0808 00:10:27.955732 30700 net.cc:177] Starting benchmark.
I0808 00:10:27.955759 30700 net.cc:178] Running warmup runs.
I0808 00:10:27.956367 30700 net.cc:188] Main runs.
I0808 00:10:31.839035 30700 net.cc:199] Main run finished. Milliseconds per iter: 0.388264. Iters per second: 2575.56
I0808 00:10:35.704169 30700 net.cc:233] Operator #0 (indices, Python) 0.0583264 ms/iter
I0808 00:10:35.704210 30700 net.cc:233] Operator #1 (Y, SparseLengthsSum) 0.327694 ms/iter
I0808 00:10:35.704213 30700 net.cc:237] Time per operator type:
I0808 00:10:35.704217 30700 net.cc:246]        0.327694 SparseLengthsSum
I0808 00:10:35.704221 30700 net.cc:246]       0.0583264 Python
[root@fblearner001.01.ftw1 /home/msmelyan]# numactl -m 0 -C 0 ./sparse_lengths_sum_benchmark.par  --iteration 10000 --dtype float16
Preparing lookup table. 2017-08-08 00:10:59.047159
Preparation finished. 2017-08-08 00:11:05.140565
I0808 00:11:05.140612 31725 net.cc:177] Starting benchmark.
I0808 00:11:05.140635 31725 net.cc:178] Running warmup runs.
I0808 00:11:05.141104 31725 net.cc:188] Main runs.
I0808 00:11:08.371510 31725 net.cc:199] Main run finished. Milliseconds per iter: 0.323039. Iters per second: 3095.6
I0808 00:11:11.671450 31725 net.cc:233] Operator #0 (indices, Python) 0.0609876 ms/iter
I0808 00:11:11.671489 31725 net.cc:233] Operator #1 (Y, SparseLengthsSum) 0.26856 ms/iter
I0808 00:11:11.671494 31725 net.cc:237] Time per operator type:
I0808 00:11:11.671497 31725 net.cc:246]         0.26856 SparseLengthsSum
I0808 00:11:11.671500 31725 net.cc:246]       0.0609876 Python

New (Misha's)
==============
[root@fblearner001.01.ftw1 /home/msmelyan]# numactl -m 0 -C 0 ./sparse_lengths_sum_benchmark.par  --iteration 10000
Preparing lookup table. 2017-08-07 23:44:55.897748
Preparation finished. 2017-08-07 23:45:00.708896
I0807 23:45:00.708945 4178361 net.cc:177] Starting benchmark.
I0807 23:45:00.708971 4178361 net.cc:178] Running warmup runs.
I0807 23:45:00.709444 4178361 net.cc:188] Main runs.
I0807 23:45:03.608551 4178361 net.cc:199] Main run finished. Milliseconds per iter: 0.289909. Iters per second: 3449.36
I0807 23:45:06.536182 4178361 net.cc:233] Operator #0 (indices, Python) 0.0572399 ms/iter
I0807 23:45:06.536224 4178361 net.cc:233] Operator #1 (Y, SparseLengthsSum) 0.23512 ms/iter
I0807 23:45:06.536228 4178361 net.cc:237] Time per operator type:
I0807 23:45:06.536232 4178361 net.cc:246]         0.23512 SparseLengthsSum
I0807 23:45:06.536236 4178361 net.cc:246]       0.0572399 Python
[root@fblearner001.01.ftw1 /home/msmelyan]# numactl -m 0 -C 0 ./sparse_lengths_sum_benchmark.par  --iteration 10000 --dtype float16
Preparing lookup table. 2017-08-07 23:45:17.191579
Preparation finished. 2017-08-07 23:45:23.173668
I0807 23:45:23.173715 4179316 net.cc:177] Starting benchmark.
I0807 23:45:23.173743 4179316 net.cc:178] Running warmup runs.
I0807 23:45:23.174090 4179316 net.cc:188] Main runs.
I0807 23:45:24.939749 4179316 net.cc:199] Main run finished. Milliseconds per iter: 0.176564. Iters per second: 5663.67
I0807 23:45:26.698885 4179316 net.cc:233] Operator #0 (indices, Python) 0.0557303 ms/iter
I0807 23:45:26.698923 4179316 net.cc:233] Operator #1 (Y, SparseLengthsSum) 0.119794 ms/iter
I0807 23:45:26.698927 4179316 net.cc:237] Time per operator type:
I0807 23:45:26.698931 4179316 net.cc:246]        0.119794 SparseLengthsSum
I0807 23:45:26.698935 4179316 net.cc:246]       0.0557303 Python
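For reference, the SparseLengthsSum op benchmarked above computes a segmented sum over gathered embedding rows. A minimal NumPy sketch of its semantics (not of the generated kernels):

```python
import numpy as np

def sparse_lengths_sum(data, indices, lengths):
    """Reference semantics of SparseLengthsSum: gather rows of `data` by
    `indices`, then sum them in consecutive groups of sizes `lengths`."""
    out = np.zeros((len(lengths),) + data.shape[1:], dtype=data.dtype)
    pos = 0
    for i, n in enumerate(lengths):
        out[i] = data[indices[pos:pos + n]].sum(axis=0)
        pos += n
    return out

data = np.arange(12, dtype=np.float32).reshape(4, 3)   # 4-row lookup table
y = sparse_lengths_sum(data,
                       indices=np.array([0, 2, 1]),
                       lengths=np.array([2, 1]))
# row 0 = data[0] + data[2], row 1 = data[1]
```

The WeightedSum and Mean reducers differ only in scaling each gathered row by a weight, or dividing each group's sum by its length.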

Reviewed By: salexspb

Differential Revision: D5582172

fbshipit-source-id: d71f5a55580b734a51b8f30852b75f379acfdaf2
2017-08-30 16:22:11 -07:00
Ahmed Taei
5315669bd8 Add ShapeInference for ConcatOp (Fixed)
Reviewed By: akyrola

Differential Revision: D5721442

fbshipit-source-id: 64ed35cb4c40f32a5cca29fe9cd04e18a340db4b
2017-08-29 12:18:03 -07:00
Aapo Kyrola
488abdcd6c slice op shape inference
Summary: As titled + test

Reviewed By: jamesr66a

Differential Revision: D5720637

fbshipit-source-id: eae76e587808139fcf06abc0f8345152979815ec
2017-08-29 11:05:24 -07:00
Aapo Kyrola
7c7603a60e fix FC shape inference
Summary: FC shape inference was broken for non-default axis. Add test.

Reviewed By: asaadaldien

Differential Revision: D5720146

fbshipit-source-id: f36f9cc8477dc61c3b07eeea8ea0702562045c88
2017-08-28 16:08:07 -07:00
Yangqing Jia
9f693b39aa Revert D5711951: [caffe2] Add shape inference for ConcatOp
Summary:
This reverts commit 9173ef0f18af25326ec18e66f6ce29eecfa5ceea

bypass-lint

Differential Revision: D5711951

fbshipit-source-id: 9bbb872eafcbd3c470b782a5ddb2a1c894888101
2017-08-25 23:37:38 -07:00
Ahmed Taei
da418f5744 Add shape inference for ConcatOp
Reviewed By: akyrola

Differential Revision: D5711951

fbshipit-source-id: 9173ef0f18af25326ec18e66f6ce29eecfa5ceea
2017-08-25 18:09:35 -07:00
Jerry Zhang
3c180ba317 Opensourcing channel shuffle
Summary: att

Reviewed By: Yangqing

Differential Revision: D5662540

fbshipit-source-id: 474d7d808841ff8f7ce97b55df836b9d2f4a7629
2017-08-25 16:46:31 -07:00
Alexander Sidorov
7eba614503 RNNCell: Initializers interface, simplify _LSTM helper
Summary:
_LSTM helper is a legacy piece we had before all the RNNCell awesomeness landed. Now we need to pull it apart and create separate building blocks that people can use for any RNNs.

Please note the changes to a test with double scoping. That should go away once we change the RNNCell scoping logic so that each cell adds its own name to the scope for all of its outputs (see another diff: D5613139)

Reviewed By: jhcross

Differential Revision: D5632276

fbshipit-source-id: 1cb568ab995c4c0b3dd1b4bad2d028e34bded9c1
2017-08-25 12:01:24 -07:00
Aapo Kyrola
82360d8cba shape inference for ReduceFront/Back/Sum/Mean, Gather and Dropout
Summary: These were missing and required for some seq2seq models. Unit tested. The previous implementation of ReduceBackMean shape inference was incorrect, so removed it.

Reviewed By: asaadaldien

Differential Revision: D5691262

fbshipit-source-id: 76f868b298440f988635966a410f0232301ca6c4
2017-08-25 11:31:17 -07:00
Alisson Gusatti Azzolini
5e0b28e7bd PrependDimOp
Summary:
Split the first dimension of a tensor into two, the first of which is fixed and given as an argument.
This is used to split a batch into smaller batches and distribute them across workers.
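The reshape described above can be sketched with NumPy; the function name and argument are assumptions for illustration:

```python
import numpy as np

def prepend_dim(tensor, dim_size):
    """Sketch of PrependDim semantics: split the first dimension into
    (dim_size, first_dim // dim_size), keeping the remaining dims intact."""
    first = tensor.shape[0]
    assert first % dim_size == 0, "first dimension must be divisible by dim_size"
    return tensor.reshape((dim_size, first // dim_size) + tensor.shape[1:])

batch = np.arange(24).reshape(6, 4)   # a batch of 6 rows
shards = prepend_dim(batch, 3)        # -> shape (3, 2, 4): 3 shards of 2 rows
```

Each leading slice is then a contiguous sub-batch that can be handed to a worker.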

Reviewed By: harouwu

Differential Revision: D5702175

fbshipit-source-id: 02bb93e49bf9db411b516e149c8e647301dd2ca5
2017-08-24 18:52:05 -07:00
Jiyan Yang
20c854d43c Make FC op work with empty batch in cuda
Reviewed By: xianjiec

Differential Revision: D5673458

fbshipit-source-id: d1c950c94173843670ae1fae0e15ff61ca7d6761
2017-08-24 18:52:04 -07:00
Jerry Zhang
7f4ceb83e3 Relax dimension constraints for weight matrix in FC
Summary: att

Reviewed By: Yangqing

Differential Revision: D5662265

fbshipit-source-id: 893ee2f92debab06117725beeca3199cba565f1e
2017-08-24 11:16:39 -07:00
Catherine Dong
1955d0797e Added fast path for CUDNN global max pooling
Summary:
This adds a fast path for global max pooling with NCHW. Compared to equivalent ReduceBackMean, this is about 3.5x faster.

Based on D5533059.

Reviewed By: akyrola

Differential Revision: D5681122

fbshipit-source-id: 7a4df934044c7dd01888f095f7dd46654aaf4eae
2017-08-23 16:33:06 -07:00
Alisson Gusatti Azzolini
930acc8e85 CUDA SparseLengthsWeightedSum
Summary: title.

Reviewed By: harouwu

Differential Revision: D5665776

fbshipit-source-id: a8ae1a71a9a21e68172662f38b5f799870b9dcd1
2017-08-22 15:42:02 -07:00
Junjie Bai
5748e7140f Strip Operator Schema in mobile build
Reviewed By: Yangqing

Differential Revision: D5677792

fbshipit-source-id: d29edb26a36b24a46821e13e2d77af0f21571fcd
2017-08-22 13:31:08 -07:00
Douglas Chen
440d979075 Optimizations for Caffe2 SinusoidPositionEncodingOp
Summary:
Optimizations for SinusoidPositionEncodingOp to make sinusoid position embeddings
more competitive against table-based embeddings.
- Removed most calls to std::pow
- Replaced division with multiplication with reciprocal
- Reused computation across examples within a batch

Current speedup with a batch size of 16, a sequence length of 128, and an embedding
size of 512 is about 270x (17k embeddings per second -> 4.7M embeddings per
second). The speedup is very dependent on the batch size; at a batch size of 4
it only reaches 1.7M embeddings per second.
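A minimal sketch of the optimizations listed above, assuming the usual Transformer-style sinusoid formula (the exact formula and argument names of the Caffe2 op are not shown here): the per-dimension scale is computed once rather than via a pow per element, the division becomes multiplication by a reciprocal, and the angle table is reused across all examples in a batch.

```python
import numpy as np

def sinusoid_position_encoding(seq_len, embed_dim, amplitude=1.0):
    # Compute 1 / 10000^(2i/d) once per dimension pair (no pow per element);
    # division is replaced by multiplication with the reciprocal.
    inv_freq = 1.0 / (10000.0 ** (np.arange(0, embed_dim, 2) / embed_dim))
    positions = np.arange(seq_len)[:, None]      # (seq_len, 1)
    angles = positions * inv_freq                # reused across the whole batch
    enc = np.zeros((seq_len, embed_dim))
    enc[:, 0::2] = amplitude * np.sin(angles)
    enc[:, 1::2] = amplitude * np.cos(angles)
    return enc

enc = sinusoid_position_encoding(128, 512)       # the benchmarked shape above
```

Since the encoding depends only on position and dimension, a batch of B sequences can share this single (seq_len, embed_dim) table.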

Profile: https://pxl.cl/8zf0
Annotated DoRunWithType: P57925031

Reviewed By: jamesr66a

Differential Revision: D5634766

fbshipit-source-id: 0f35bb176164ea547c91de242a0205c5d7adf7cf
2017-08-22 00:04:06 -07:00
Zhicheng Yan
0e20a7cb7d ImageInputOp_more_data_augmentation
Summary:
Add more data augmentation to ImageInputOp
1) Inception-style random sized cropping
2) color jittering
3) color lighting

Reviewed By: panshen1

Differential Revision: D5637726

fbshipit-source-id: 45d9cc69eec9f4d48c1607d80ccd89e325961b1a
2017-08-19 14:15:58 -07:00
Eider Moore
d6632a9a05 Adding a range operator similar to np.arange
Summary:
Adding a range operator in the spirit of np.arange. It is an important building block for a lot of manipulation functions.

This accepts parameters with the same meaning in the same order as python's range or np.arange (e.g. `(stop)`, `(start, stop)` or `(start, stop, step)`)
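The three accepted call forms mirror np.arange directly:

```python
import numpy as np

# The operator accepts the same call forms as np.arange:
a = np.arange(5)         # (stop)              -> [0, 1, 2, 3, 4]
b = np.arange(2, 5)      # (start, stop)       -> [2, 3, 4]
c = np.arange(1, 8, 3)   # (start, stop, step) -> [1, 4, 7]
```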

Differential Revision: D5616861

fbshipit-source-id: 02622b8bd85ebca125cc881c06fae5b54b7c602a
2017-08-18 14:45:56 -07:00
Philipp Keller
d617a77433 Add tests for ConcatOp and SplitOp
Summary: The new test ensures the 'add_axis' and 'split' arguments work as intended for tensors of various dimensions. Hypothesis checks various edge cases like zeroes in 'split_info' and 1D input with axis=0, add_axis=1.

Reviewed By: hoangmit

Differential Revision: D5645778

fbshipit-source-id: 061f9511a082da54e5c1bbe53a0e7096af4b8d1b
2017-08-18 14:02:42 -07:00
Chonglin Sun
5f612d9740 GPU version of BatchGatherOp
Summary: GPU version of BatchGatherOp.

Reviewed By: azzolini

Differential Revision: D5613593

fbshipit-source-id: 0e4a35b84db852ac2718868a02fa90e7c3d8f1f0
2017-08-17 18:31:10 -07:00
James Reed
f388135d3f Layer norm brew wrapper
Summary: Implement a brew wrapper for the LayerNorm op. This adds the scalar weight and bias terms to the op.

Reviewed By: jmp84

Differential Revision: D5595836

fbshipit-source-id: 467b2e1158b0c454a149d4b26c47719826e98752
2017-08-17 11:17:47 -07:00
James Reed
e45e621b0e Implement layer norm gradient GPU
Summary: Implement layer normalization from https://arxiv.org/pdf/1607.06450.pdf

Reviewed By: wickedfoo

Differential Revision: D5594445

fbshipit-source-id: 873643165c958fd5829fa7cf07d5d4b1b8b0ed59
2017-08-17 11:17:46 -07:00
James Reed
8e8e90f595 Implement layer normalization backward CPU
Summary: Implement layer normalization from https://arxiv.org/pdf/1607.06450.pdf

Reviewed By: jmp84

Differential Revision: D5578306

fbshipit-source-id: 94d262f0317b3ee1b504e0110ad5135afe8350ca
2017-08-17 11:17:46 -07:00
James Reed
e16c40eb4f Implement layer normalization op forward GPU
Summary: Implement layer normalization from https://arxiv.org/pdf/1607.06450.pdf

Reviewed By: wickedfoo

Differential Revision: D5552262

fbshipit-source-id: d0cddb0769623a1b3779e2114c19e6ebc57c0f0d
2017-08-17 11:17:45 -07:00
James Reed
474c043be5 Implement layer normalization op forward CPU
Summary: Implement layer normalization from https://arxiv.org/pdf/1607.06450.pdf
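For reference, layer normalization as described in the paper normalizes each row over its feature dimension and then applies scale/bias terms (the names gamma/beta below are assumptions, corresponding to the scalar weight and bias the brew wrapper above adds):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Sketch of layer normalization (arXiv:1607.06450): normalize each row
    over its feature dimension, then apply scale and bias."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.array([[1.0, 2.0, 3.0],
              [4.0, 6.0, 8.0]])
y = layer_norm(x, gamma=1.0, beta=0.0)   # each row now has mean ~0, unit variance
```

Unlike batch normalization, the statistics are computed per example, so the op behaves identically at training and inference time.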

Reviewed By: akyrola

Differential Revision: D5543381

fbshipit-source-id: 1102e568439af6a60aad3b87017d5a997fb7dc16
2017-08-17 11:17:44 -07:00
Aapo Kyrola
e89474c496 fix forward_only mode
Summary:
Forward-only mode had broken at some point. Two things: RNNCell did not pass the parameter to recurrent.py, and recurrent.py was broken with forward_only=True after the python3 codemod.

Added test to rnn_cell_test to actually check the forward only parameter is passed to prevent future breakage.

Reviewed By: jmp84

Differential Revision: D5639306

fbshipit-source-id: b1bbc39d59c3f3734b2f40a1c2f3740c733e0bd4
2017-08-17 10:19:04 -07:00
Jerry Zhang
a63e7314f3 Adding 1d-2d-3d Schemas for Conv and Pool
Summary: Add Conv and Pool operators with dimensions.

Reviewed By: bddppq

Differential Revision: D5588614

fbshipit-source-id: 2552c40dc3ca180a6ab51817d60f0b85b97885d5
2017-08-17 09:45:54 -07:00
Jerry Zhang
4ca5735753 Allow inplace for spatial_bn_op
Summary: att

Reviewed By: Yangqing

Differential Revision: D5644717

fbshipit-source-id: 1a020fe4ca7028056ce7bebddb7bfd1437998530
2017-08-17 09:18:55 -07:00
Badri Narayan Bhaskar
ae2aad9c0d Operator to Merge ID_LIST features
Summary:
As an alternative to sharing embeddings, we want to explore merging the ID_LISTs in the net.

This commit adds an operator to merge many ID_LIST features into a single one.

Differential Revision: D5481523

fbshipit-source-id: 446121122a32de5682d5d75a165370bc8d776d03
2017-08-17 01:16:00 -07:00
Jingfei Du
b3029df1d0 Added window mode for caffe2 sequence operator
Summary: This can be used for local attention to mask elements outside of a window

Reviewed By: jamesr66a

Differential Revision: D5643677

fbshipit-source-id: 92b33866258ccc7307d5bcf08234610aa3fb152d
2017-08-16 21:34:29 -07:00
Kevin Wilfong
1f47a80e88 Caffe2: diagonal fill op
Summary: Caffe2: diagonal fill op

Reviewed By: panshen1

Differential Revision: D4775640

fbshipit-source-id: bb388ffe223e6b153d4cde1fdad6f84a2bb65b0f
2017-08-16 13:05:11 -07:00
Aapo Kyrola
a53192e334 Revert D5001637: [Caffe2][RNN] Threaded dependency-aware RNNExecutor (frontier/diagonal execution).
Summary:
This reverts commit 3d0a71593d73a9ff22f4c1a5c9abf2a4a0c633c8

bypass-lint

Differential Revision: D5001637

fbshipit-source-id: 4d6250ae7e66ea0aa635a68d943d552e5db65b69
2017-08-16 03:21:49 -07:00
Aapo Kyrola
453c60ce28 Threaded dependency-aware RNNExecutor (frontier/diagonal execution).
Summary:
This diff adds dependency-aware concurrent/parallel execution of operators in stepnets. For CPU, we use multi-threaded execution. For CUDA, we use multiple streams and cuda events for parallelism and dependency tracking.

Much of the diff is about computing dependency graph, which was quite tricky because we need to also avoid write-races of multiple operators running in multiple timesteps in parallel. Also, recurrent blobs "change name" when passing over timestep ("_prev"), so that needs to be handled as well.

This diff also restores the link-ops that I unlanded earlier.

The performance gain of this diff is very good for CPU (same perf as with static_dag, even better on forward-only). On CUDA, the gains are modest, at least with the sizes i was testing with.

Reviewed By: salexspb

Differential Revision: D5001637

fbshipit-source-id: 3d0a71593d73a9ff22f4c1a5c9abf2a4a0c633c8
2017-08-15 23:55:15 -07:00
James Reed
a985355935 Gradient for SequenceMaskOp
Summary: Implement backward pass for a SequenceMaskOp to replace https://github.com/caffe2/caffe2/blob/master/caffe2/python/attention.py#L54-L72.

Reviewed By: akyrola

Differential Revision: D5618373

fbshipit-source-id: b831fa69f51d9468c858961f922564159e12b46f
2017-08-12 14:34:29 -07:00
James Reed
0a828768e9 Implement SequenceMaskOp forward pass
Summary:
Implement forward pass for a SequenceMaskOp to replace https://github.com/caffe2/caffe2/blob/master/caffe2/python/attention.py#L54-L72.

This implements two modes: a sequence-length based mode and a matrix triangle mode.
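The two modes can be sketched in NumPy (function and argument names here are assumptions, not the op's actual schema):

```python
import numpy as np

def sequence_mask(x, lengths=None, triangle=False, fill=-1e9):
    """Sketch of the two modes described above:
    - sequence-length mode: positions at or past each row's length are filled
    - triangle mode: the strict upper triangle is filled (causal masking)"""
    out = x.copy()
    if lengths is not None:
        cols = np.arange(x.shape[1])
        out[cols[None, :] >= np.asarray(lengths)[:, None]] = fill
    if triangle:
        out[np.triu_indices_from(out, k=1)] = fill
    return out

x = np.ones((3, 3))
masked = sequence_mask(x, lengths=[1, 2, 3])   # row i keeps its first length[i] entries
```

Filling with a large negative constant means a subsequent softmax assigns the masked positions near-zero weight.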

Reviewed By: akyrola

Differential Revision: D5615493

fbshipit-source-id: a2ce4a8e655d9b720049010a7856be052c5567eb
2017-08-12 14:34:28 -07:00
Jerry Pan
9372ff7a86 Caffe2: support Tensor in BlobsQueueDB
Summary: Caffe2: support Tensor in BlobsQueueDB

Reviewed By: kevinwilfong

Differential Revision: D5589616

fbshipit-source-id: 66aa6092b6403960c4858abd986771b58be94106
2017-08-11 11:21:14 -07:00
Alexander Sidorov
a7be496fe2 Revert D5589309: modify _LSTM into _RNN to adapt GRU
Summary:
This reverts commit f5af67dfe0842acd68223f6da3e96a81639e8049

bypass-lint

Differential Revision: D5589309

fbshipit-source-id: 79b0a3a9455829c3899472a1368ef36dc75f6e14
2017-08-10 16:42:41 -07:00
Christopher Hay
f2dfb40302 Added amplitude argument to SinusoidPositionEncodingOp
Summary: In order to control the absolute scale/magnitude of the output of this op, added a tuning parameter: amplitude

Reviewed By: jamesr66a

Differential Revision: D5596574

fbshipit-source-id: 3b7e316de55cce6fd686da70aa5658ec3e99b070
2017-08-10 15:27:17 -07:00
Kittipat Virochsiri
eb85258beb CreateMapOp
Summary: Add operator to create empty map

Reviewed By: xianjiec

Differential Revision: D5454652

fbshipit-source-id: ecad6cc58572b378962af08cf02063ef546ed58f
2017-08-09 13:32:19 -07:00
Tao Wu
7b86a34610 modify _LSTM into _RNN to adapt GRU
Summary: GRU differs from LSTM in that it has only hidden states and no cell states. Reusing the _LSTM code is therefore problematic: we need to delete the part that creates the cell state, and change many places that use a hard-coded 4 (hidden_all, hidden, cell_all, cell) into 2 (hidden_all, hidden). Otherwise GRU breaks during the backward pass, when the optimizer tries to apply gradients to each of the parameters: because the cell state is never used, there are no gradients for the corresponding parameters (i.e., cell_state_w, cell_state_b).

Differential Revision: D5589309

fbshipit-source-id: f5af67dfe0842acd68223f6da3e96a81639e8049
2017-08-09 13:24:45 -07:00
Andrei Chtcherbatchenko
a2204f0b1e Caffe2: Write CUDA version of OneHot operator
Summary: This diff implements CUDA version of OneHot operator.

Reviewed By: bddppq

Differential Revision: D5578543

fbshipit-source-id: 55b70e8ec6ee34b647b9140fecbba31b6968f403
2017-08-08 18:17:39 -07:00
Jianlong Zhong
152d2ae3a8 Implement CUDA version of GRU operator
Summary: Add CUDA version of GRU operator

Reviewed By: jamesr66a

Differential Revision: D5571043

fbshipit-source-id: 332aa64fc8a9116cc33382f2b2907080e58c13b3
2017-08-08 10:57:40 -07:00
Chonglin Sun
8ad382df3c implement LengthsTopK operator
Summary:
It was previously reverted because the gradient op lacked a schema. This adds the schema and resends the diff.

Differences between this diff and the previously reverted diff:
1. added a schema for the gradient operator
2. changed line 95 in kmax_pooling_op.h from CAFFE_ENFORCE to CAFFE_ENFORCE_GE

Reviewed By: xianjiec

Differential Revision: D5568867

fbshipit-source-id: 39813b389a5da803967a561249793afdfce00c58
2017-08-07 18:19:29 -07:00
Ahmed Taei
8af625ede2 Implement gradients for Col2Im and Im2Col operators
Reviewed By: jay-mahadeokar

Differential Revision: D5576385

fbshipit-source-id: a0ca4f704fd861f7cc67079041b1d0772fc66920
2017-08-07 15:51:30 -07:00
Ben Zhang
42fb87d0b1 L1Distance Row-wise, instead of cumulative
Summary:
The L1Distance operator used to return a single value denoting the L1 distance over the entire input, instead of a vector with one value per input row.

This fixes that.
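The row-wise behavior can be sketched as:

```python
import numpy as np

def l1_distance_rowwise(x, y):
    """Row-wise L1 distance: one value per input row,
    rather than a single scalar over the whole batch."""
    return np.abs(x - y).sum(axis=1)

x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[1.0, 0.0], [0.0, 4.0]])
d = l1_distance_rowwise(x, y)   # [2.0, 3.0]
```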

Reviewed By: Yangqing

Differential Revision: D5570385

fbshipit-source-id: fbab0e0c9262ccbdb3af27262b8baacdeb2d0fc9
2017-08-07 14:09:25 -07:00
Zhicheng Yan
e7192c3b91 image_input_op_dense_multi_label
Summary:
To train an image model, we can also use a label embedding vector as supervision, as opposed to using SoftmaxLoss/SigmoidCrossEntropyLoss.
In such cases, the label is a dense vector. This diff enables those use cases.

Reviewed By: panshen1

Differential Revision: D5556203

fbshipit-source-id: 52c61495e02fab457dc2d43e3345d7dbd5580ab7
2017-08-07 12:38:16 -07:00
Juan Miguel Pino
4d8a8c2e1e Implement dot attention
Summary:
Implement dot attention as described in https://arxiv.org/abs/1508.04025.
This saves the computation of weighted encoder outputs in `rnn_cell.py`.
When the encoder and decoder dimensions are different, we apply an FC, which corresponds to the general case below Figure 2 of the paper.
Refactored unit tests.
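A minimal NumPy sketch of dot attention; the projection W stands in for the FC mentioned above when dimensions differ (names and shapes are assumptions for illustration):

```python
import numpy as np

def dot_attention(decoder_state, encoder_outputs, W=None):
    """Sketch of dot attention (Luong et al., arXiv:1508.04025). When encoder
    and decoder dimensions differ, an assumed projection W yields the
    "general" scoring variant mentioned above."""
    h = decoder_state if W is None else W @ decoder_state
    scores = encoder_outputs @ h                   # (encoder_length,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax over source positions
    return weights @ encoder_outputs               # attention context vector

enc = np.eye(3)                                    # 3 encoder positions, dim 3
ctx = dot_attention(np.array([10.0, 0.0, 0.0]), enc)
```

Because the score is a plain dot product, no separate weighted projection of the encoder outputs needs to be precomputed, which is the saving noted in the summary.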

Reviewed By: jhcross

Differential Revision: D5486976

fbshipit-source-id: f9e9aea675b3b072fbe631bc004199b90a9d95cb
2017-08-06 11:50:16 -07:00