Summary: GRU is different than LSTM that it only has hidden states but no cell states. So in this case, reusing the code of _LSTM is problematic, as we need to delete the part of creating cell state, and change many other places that use hard-coded 4 (hidden_all, hidden, cell_all, cell) into 2 (hidden_all, hidden). Otherwise GRU will break during the backward pass, when the optimizer tries to apply gradient to each of the parameters, because cell state is never used, so it does not have gradients for the corresponding parameters (i.e., cell_state_w, cell_state_b).
Differential Revision: D5589309
fbshipit-source-id: f5af67dfe0842acd68223f6da3e96a81639e8049
Summary: Implemented python logic and tests to create an RNNCell for GRU. Uses the preexisting GRU Unit Op code.
Reviewed By: salexspb
Differential Revision: D5364893
fbshipit-source-id: 2451d7ec8c2eacb8d8c9b7c893bfd21b65fb9d18