Summary:
Depending on what happened to be in uninitialized memory, only the first item
in the batch might actually have been used (when the rest of the memory was 0), or, if
the memory there held a big positive integer, the whole sequence was used. So whether the rest of the batch was used came down to luck :)
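A minimal sketch of the failure mode and the fix (hypothetical, not the code from this diff; a per-batch-item sequence-lengths array in NumPy stands in for the uninitialized blob):

    import numpy as np

    batch_size = 4

    # Buggy pattern: only slot 0 is written; the rest is whatever
    # happened to be left in memory.
    seq_lengths = np.empty(batch_size, dtype=np.int32)
    seq_lengths[0] = 25
    # seq_lengths[1:] may be 0 (those batch items are silently skipped)
    # or a huge positive integer (the whole sequence gets consumed).

    # Fix: explicitly fill the value for every item in the batch.
    seq_lengths = np.full(batch_size, 25, dtype=np.int32)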
Reviewed By: Yangqing
Differential Revision: D4599569
fbshipit-source-id: ae89cee796bbcbc232e4abcab71dee360b0d8bc6
Summary:
The input has to be arranged so that the j-th example of
batch i comes right before the j-th example of batch i+1 in the text.
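A minimal sketch of that arrangement (hypothetical helper, plain Python; `batch_size` and `seq_len` are assumed parameters): split the text into `batch_size` contiguous streams, then let batch i take the i-th chunk of every stream.

    def arrange_batches(text, batch_size, seq_len):
        # One contiguous stream of text per batch position j, so the j-th
        # example of batch i is immediately followed in the text by the
        # j-th example of batch i+1 (hidden state can be carried across
        # batches).
        stream_len = len(text) // batch_size
        streams = [text[j * stream_len:(j + 1) * stream_len]
                   for j in range(batch_size)]
        num_batches = stream_len // seq_len
        return [[s[i * seq_len:(i + 1) * seq_len] for s in streams]
                for i in range(num_batches)]

    # With batch_size=2, seq_len=3 and text "abcdefghijkl", the streams are
    # "abcdef" and "ghijkl"; batch 0 = ["abc", "ghi"], batch 1 = ["def", "jkl"],
    # so "abc" is followed by "def" in the original text.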
Reviewed By: urikz
Differential Revision: D4519553
fbshipit-source-id: 9dd80658e0c4d9ff0f97a7904cbb164f267fe39f
Summary: With a batch size of 32 and otherwise default parameters I get 70 iterations per second on GPU vs. 40 on CPU. Batching still doesn't produce a good loss; I am going to work on that in a separate diff.
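A minimal sketch of the general pattern for running a Caffe2 net on GPU (not the code from this diff; the toy FC net, the blob names, and the shapes are made up for illustration and stand in for the actual char-rnn training net):

    import numpy as np
    from caffe2.python import core, workspace
    from caffe2.proto import caffe2_pb2

    device_opt = core.DeviceOption(caffe2_pb2.CUDA, 0)  # GPU 0

    with core.DeviceScope(device_opt):
        net = core.Net("toy_gpu_net")
        net.FC(["x", "w", "b"], "y")  # stand-in for the LSTM training net

    # Feed the inputs onto the same device before running.
    workspace.FeedBlob("x", np.random.rand(32, 16).astype(np.float32),
                       device_option=device_opt)
    workspace.FeedBlob("w", np.random.rand(8, 16).astype(np.float32),
                       device_option=device_opt)
    workspace.FeedBlob("b", np.zeros(8, dtype=np.float32),
                       device_option=device_opt)

    workspace.CreateNet(net)
    workspace.RunNet(net.Proto().name)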
Reviewed By: urikz
Differential Revision: D4516566
fbshipit-source-id: d0611534747beb2cd935a8607a283369378e4a6c
Summary:
This learns Shakespeare and then generates samples one character at a time. We want this to be an example of using our LSTM, and RNNs in general.
Running the training net with the current parameters (batch size = 1) now takes 4 ms. I don't have per-operator timings yet, but the overall Python loop doesn't seem to add much overhead: with 1000 fake iterations in run_net, each outer iteration took 4 s, as expected (1000 iterations at 4 ms each).
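On the sampling side, generation runs one character at a time: predict a distribution over the next character, draw from it, and feed the drawn character back in. A minimal framework-agnostic sketch of that loop (`predict_probs` is a hypothetical stand-in for running the trained net):

    import numpy as np

    def sample_text(predict_probs, vocab, seed_char, length):
        # predict_probs(ch) -> probability distribution over `vocab` for
        # the character following `ch` (stand-in for running the net).
        out = [seed_char]
        for _ in range(length):
            probs = predict_probs(out[-1])
            next_idx = np.random.choice(len(vocab), p=probs)
            out.append(vocab[next_idx])
        return "".join(out)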
Future work:
* fixing convergence for batching
* profiling on operator level
* trying it out with GPUs
* benchmarking against existing char-rnn implementations
* stacking LSTMs (one LSTM is different from two; stacking requires taking care of scoping)
Reviewed By: urikz
Differential Revision: D4430612
fbshipit-source-id: b36644fed9844683f670717d57f8527c25ad285c