Summary:
This PR adds all PyTorch and Caffe2 job configs to CircleCI.
Steps for the CircleCI mini-trial:
- [ ] Make sure this PR passes Jenkins CI and fbcode internal tests
- [x] Approve this PR
- [ ] Ask CircleCI to turn up the number of build machines
- [ ] Land this PR so that the new `.circleci/config.yml` will take effect
Several Caffe2 tests are flaky on CircleCI machines and hence skipped when running on CircleCI. A proper fix for them will be worked on after a successful mini-trial.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11264
Differential Revision: D9656793
Pulled By: yf225
fbshipit-source-id: 7832e90018f3dff7651489c04a179d6742168fe1
Summary:
There is a long-standing scoping problem that was introduced in the original Python wrappers early in H1. Each RNNCell implementation has to manually scope the outputs of each of its operators. If somebody forgets to do so, subtle bugs can appear, for example with layers.
The approach is the following: the user has to explicitly specify the current scope when using the apply_over_sequence function and similar functions if the function is going to be called several times (for example, when stacking layers). This way we use Caffe2's native scoping approach instead of inventing an extra API that people have to use (i.e. passing a scope name as an argument to the RNNCell constructor).
Closes https://github.com/caffe2/caffe2/pull/1681
Differential Revision: D6777536
Pulled By: salexspb
fbshipit-source-id: 73d860b8d4857589e04bdea5a6fcd3080d68427c
Summary: GRU differs from LSTM in that it has only hidden states and no cell states. Reusing the _LSTM code is therefore problematic: we need to delete the part that creates the cell state and change the many places that use a hard-coded 4 (hidden_all, hidden, cell_all, cell) to 2 (hidden_all, hidden). Otherwise GRU breaks during the backward pass when the optimizer tries to apply a gradient to each of the parameters: the cell state is never used, so the corresponding parameters (i.e., cell_state_w, cell_state_b) have no gradients.
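A minimal sketch of why the hard-coded 4 matters, assuming outputs are returned as a flat list with the per-layer states in the order named above. `STATES_PER_CELL` and `split_outputs_by_layer` are hypothetical names used for illustration, not functions from the codebase.

```python
# Per-layer state names, taken from the description above.
STATES_PER_CELL = {
    "lstm": ("hidden_all", "hidden", "cell_all", "cell"),
    "gru": ("hidden_all", "hidden"),
}


def split_outputs_by_layer(outputs, cell_type):
    """Group a flat list of output blobs into one dict per layer."""
    names = STATES_PER_CELL[cell_type]
    n = len(names)  # 4 for LSTM, 2 for GRU; derived, not hard-coded
    if len(outputs) % n != 0:
        raise ValueError(
            "expected a multiple of {} outputs, got {}".format(n, len(outputs))
        )
    return [
        dict(zip(names, outputs[i:i + n]))
        for i in range(0, len(outputs), n)
    ]


# Outputs of a hypothetical two-layer GRU.
gru_outputs = ["l0/hidden_all", "l0/hidden", "l1/hidden_all", "l1/hidden"]

layers = split_outputs_by_layer(gru_outputs, "gru")
# Treating the same list as LSTM outputs silently merges both layers into one.
misgrouped = split_outputs_by_layer(gru_outputs, "lstm")

print(len(layers), len(misgrouped))
```

With the LSTM stride, the second layer's hidden state ends up labeled as a cell state of the first layer, which is the kind of mismatch the change above avoids.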
Differential Revision: D5589309
fbshipit-source-id: f5af67dfe0842acd68223f6da3e96a81639e8049
Summary: Add CUDA version of GRU operator
Reviewed By: jamesr66a
Differential Revision: D5571043
fbshipit-source-id: 332aa64fc8a9116cc33382f2b2907080e58c13b3
Summary: Implemented Python logic and tests to create an RNNCell for GRU. Uses the pre-existing GRU Unit Op code.
Reviewed By: salexspb
Differential Revision: D5364893
fbshipit-source-id: 2451d7ec8c2eacb8d8c9b7c893bfd21b65fb9d18
Summary:
Just an implementation of the backward pass of the GRU Unit Op, not the full RNNCell.
Functions were created to mimic LSTM implementation as closely as possible.
Backward pass implementations are defined in GRU_unit_op.{h, cc}.
An assertGradientChecks call was added to gru_cell_test.py.
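What a gradient check verifies can be sketched with central finite differences in plain Python. This is not the assertGradientChecks implementation. The scalar update below, h = u * h_prev + (1 - u) * o, is one common GRU convention and is an assumption here; the convention in GRU_unit_op may differ. All function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_blend(h_prev, u_pre, o_pre):
    """Scalar GRU update from pre-activation update and output gates."""
    u = sigmoid(u_pre)
    o = math.tanh(o_pre)
    return u * h_prev + (1.0 - u) * o


def gru_blend_grad(h_prev, u_pre, o_pre):
    """Hand-derived partial derivatives of gru_blend w.r.t. each input."""
    u = sigmoid(u_pre)
    o = math.tanh(o_pre)
    d_h_prev = u
    d_u_pre = (h_prev - o) * u * (1.0 - u)      # sigmoid'(x) = u * (1 - u)
    d_o_pre = (1.0 - u) * (1.0 - o * o)         # tanh'(x) = 1 - tanh(x)^2
    return (d_h_prev, d_u_pre, d_o_pre)


def numeric_grad(f, args, eps=1e-6):
    """Central finite-difference estimate of each partial derivative."""
    grads = []
    for i in range(len(args)):
        hi = list(args)
        lo = list(args)
        hi[i] += eps
        lo[i] -= eps
        grads.append((f(*hi) - f(*lo)) / (2.0 * eps))
    return tuple(grads)


point = (0.3, -0.7, 1.1)
analytic = gru_blend_grad(*point)
numeric = numeric_grad(gru_blend, point)
max_error = max(abs(a - n) for a, n in zip(analytic, numeric))
print(max_error)
```

If the hand-written backward pass had a sign or indexing mistake, the analytic and numeric gradients would disagree by far more than the finite-difference error.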
Reviewed By: salexspb
Differential Revision: D5364856
fbshipit-source-id: 09cff4478091827763b40cc331e4e0abf0ec258f
Summary:
Just an implementation of the forward pass of the GRU Unit Op, not the full RNNCell.
Functions were created to mimic LSTM implementation as closely as possible.
The implementation is defined in GRU_unit_op.{h, cc}.
Tests are in gru_cell_test.py, which imports the sigmoid, tanh, and _prepare_rnn functions from rnn_cell_test_util.py.
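For orientation, a reference-style sketch of a GRU unit forward step in plain Python. It assumes the gate pre-activations arrive as one flat vector laid out as [reset | update | output], each of width D, and that the reset gate has already been applied upstream when the output gate pre-activation was computed. Both the layout and the function name `gru_unit` are assumptions for illustration; the actual op may differ, for example in sequence-length handling.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_unit(h_prev, gates):
    """One GRU unit step for a single example.

    h_prev: list of D floats, the previous hidden state.
    gates:  list of 3 * D floats, assumed layout [reset | update | output].
    """
    d = len(h_prev)
    if len(gates) != 3 * d:
        raise ValueError("gates must have length 3 * len(h_prev)")
    update_pre = gates[d:2 * d]
    output_pre = gates[2 * d:3 * d]
    hidden = []
    for i in range(d):
        u = sigmoid(update_pre[i])
        o = math.tanh(output_pre[i])
        # Blend the previous state with the candidate output.
        hidden.append(u * h_prev[i] + (1.0 - u) * o)
    return hidden


# With all-zero pre-activations the update gate is 0.5 and the candidate is 0,
# so each hidden value is halved.
halved = gru_unit([1.0, -2.0], [0.0] * 6)
print(halved)
```

A test can compare the operator's output against such a reference on random inputs, which is the role the sigmoid and tanh helpers play in the test utilities mentioned above.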
Reviewed By: jamesr66a
Differential Revision: D5363697
fbshipit-source-id: f9ba9fe0be01ffc868dd22027be8be4975b84998