Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18494
Today some C2 end-to-end tests read model data from external filesystems (for example, Gluster and AWS). This can make the tests flaky when those filesystems are not reachable during a test run.
In this diff, we add try/except logic around the places where we download models and open model files from external systems. If such an attempt fails, we catch the exception and let unittest skip the current test instead of failing it.
I also refactor the code a little by removing some duplicated logic for downloading and building the C2 model data. It had been duplicated in two classes and a few functions...
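A minimal sketch of the skip-on-failure pattern described above. The helper name `fetch_model_or_skip` and its `fetch` argument are made up for illustration and are not names from the Caffe2 test code; the exception types caught are also an assumption.

```python
import unittest


def fetch_model_or_skip(fetch):
    """Call fetch() and turn I/O failures into a skipped test.

    `fetch` is a hypothetical zero-argument callable that downloads or
    opens the model data.
    """
    try:
        return fetch()
    except (IOError, OSError) as exc:
        # unittest reports SkipTest as a skip rather than a failure.
        raise unittest.SkipTest("model data unavailable: {}".format(exc))


class ExampleEnd2EndTest(unittest.TestCase):
    def test_model(self):
        def unreachable_fetch():
            raise IOError("external filesystem not reachable")

        fetch_model_or_skip(unreachable_fetch)


# Run the example test in-process and collect the outcome.
result = unittest.TestResult()
ExampleEnd2EndTest("test_model").run(result)
```

With this shape, `result.skipped` holds the test and `result.failures` stays empty.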
Reviewed By: yinghai
Differential Revision: D14442241
fbshipit-source-id: da8bf56c8d096efa34ca2070de5cd10a18aad70c
Summary:
According to https://docs.python.org/3/tutorial/inputoutput.html, it is good practice to use the "with" keyword when dealing with file objects. Otherwise, you have to call f.close() yourself to close the file and immediately free up any system resources used by it. This PR therefore changes the file-opening calls to the "with open() as f" form.
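For reference, a small sketch of the "with open() as f" form. The function name `read_text` and the temporary file are only for illustration.

```python
import os
import tempfile


def read_text(path):
    # The file is closed when the block exits, even if read() raises.
    with open(path, "r") as f:
        return f.read()


# Round-trip through a temporary file to show the pattern end to end.
fd, tmp_path = tempfile.mkstemp()
os.close(fd)
with open(tmp_path, "w") as f:
    f.write("hello")
content = read_text(tmp_path)
file_closed = f.closed
os.remove(tmp_path)
```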
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18017
Differential Revision: D14475112
Pulled By: ezyang
fbshipit-source-id: d1c0821e39cb8a09f86d6d08b437b4a99746416c
Summary:
The goal of this PR is to unify the CUDA and HIP device types in the Caffe2 Python front end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14221
Differential Revision: D13148564
Pulled By: bddppq
fbshipit-source-id: ef9bd2c7d238200165f217097ac5727e686d887b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10035
This is an initial diff which refactors some of the components in the Seq2SeqModelCaffe2EnsembleDecoder class.
Reviewed By: jmp84
Differential Revision: D9026372
fbshipit-source-id: 449635208f24494209ae2fb78a19fca872970ea8
Summary:
There is a long-standing scoping problem that was introduced in the original Python wrappers early in H1. Each RNNCell implementation has to manually scope the outputs of each of its operators. If somebody forgets, this can cause weird bugs with layers, etc.
The approach is the following: the user has to explicitly specify the current scope when using the apply_over_sequence function and others if the function is going to be called several times (for example, when stacking layers). This way we use Caffe2's native scoping approach instead of inventing one extra API that people have to use (i.e. passing the scope name as an argument to the RNNCell constructor).
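The idea of caller-side scoping can be sketched with a toy stand-in in plain Python. Nothing below is Caffe2 code: `name_scope`, `scoped` and `toy_cell` are hypothetical names that only mimic how a cell can stay scope-agnostic while the caller picks the scope.

```python
from contextlib import contextmanager

_scope_stack = []  # toy stand-in for a framework-wide current name scope


@contextmanager
def name_scope(name):
    _scope_stack.append(name)
    try:
        yield
    finally:
        _scope_stack.pop()


def scoped(blob_name):
    """Prefix blob_name with the currently active scopes."""
    return "/".join(_scope_stack + [blob_name])


def toy_cell():
    # The cell never mentions a scope; it inherits whatever the caller set.
    return scoped("hidden_t")


# Stacking two layers: the caller chooses a distinct scope per call.
with name_scope("layer0"):
    first = toy_cell()
with name_scope("layer1"):
    second = toy_cell()
```

Because the scope is set around each call, the two layers get distinct blob names without the cell taking a scope argument.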
Closes https://github.com/caffe2/caffe2/pull/1681
Differential Revision: D6777536
Pulled By: salexspb
fbshipit-source-id: 73d860b8d4857589e04bdea5a6fcd3080d68427c
Summary:
* We now allow subdirectories as well as numbers in the name.
* Also fixed an error case.
Closes https://github.com/caffe2/caffe2/pull/1875
Reviewed By: pjh5
Differential Revision: D6894401
Pulled By: orionr
fbshipit-source-id: 6a9938bc7d2ba6b8f094ed7b8a02664120a10626
Summary:
In D5681122, the condition used when routing to global max pool and average pool is not correct.
See T24876217 for discussion.
Reviewed By: Yangqing
Differential Revision: D6665466
fbshipit-source-id: dcb5b4686249e6ee8e1e976ab66b003ef09b32fd
Summary: Still WIP, but works for the universal encoder. The other ones are currently broken.
Differential Revision: D6492786
fbshipit-source-id: 232e0058eb3a0c036de3adf0295db5efd624cca7
Summary: The word_rewards data type is mixed: ConstantFill creates the blob as long, but it is later filled with float32. This causes issues when running the net from the exported protobuf. This change makes the data type float32 for the lifetime of the blob.
Reviewed By: jhcross
Differential Revision: D6486723
fbshipit-source-id: c4ce5185a0a6d71b08b1819f2355e9354823b701
Summary: Current beam search generates successor states to EOS which are considered for inclusion in the beam even though they do not represent valid sequence prefixes. This diff introduces a penalty to ensure that such states are not included in the beam.
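A rough sketch of such a penalty on plain Python lists. The token id `EOS`, the penalty value and the function name are assumptions for illustration; the actual diff works on Caffe2 blobs and may compute the penalty differently.

```python
import heapq

EOS = 0  # hypothetical end-of-sequence token id
EOS_SUCCESSOR_PENALTY = -1e9  # assumed value, large enough to dominate any log-prob


def top_k_successors(beam, step_log_probs, k):
    """beam: list of (score, tokens); step_log_probs[i][tok] extends beam[i]."""
    candidates = []
    for i, (score, tokens) in enumerate(beam):
        # A hypothesis that already emitted EOS is finished, so anything
        # appended to it is not a valid sequence prefix.
        ended = bool(tokens) and tokens[-1] == EOS
        penalty = EOS_SUCCESSOR_PENALTY if ended else 0.0
        for tok, log_p in enumerate(step_log_probs[i]):
            candidates.append((score + log_p + penalty, tokens + [tok]))
    return heapq.nlargest(k, candidates, key=lambda c: c[0])


# The first hypothesis has the better score but already ended in EOS.
beam = [(-0.1, [5, EOS]), (-2.0, [5, 3])]
step_log_probs = [[-0.1, -0.2], [-0.5, -1.0]]
best = top_k_successors(beam, step_log_probs, k=2)
best_tokens = [tokens for _, tokens in best]
```

Without the penalty, both successors of the finished hypothesis would win on score alone.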
Reviewed By: xliilx
Differential Revision: D6325511
fbshipit-source-id: b17f10b0d00f3bc5fcc5a826a8a57a0f2cb360a6
Summary:
My commit bab5bc broke things with fp16 compute, as I had tested it only with the null input, which actually produced fp32 data (even though dtype was given as float16). Also, I had confused the concepts of "float16 compute" and fp16 data. Issue #1408.
This fixes those issues, tested with both Volta and M40 GPUs. Basically restored much of the previous code and fixed the null input to do FloatToHalf.
Reviewed By: pietern
Differential Revision: D6211849
fbshipit-source-id: 5b41cffdd605f61a438a4c34c56972ede9eee28e
Summary: This cleans up the _hack_get_slice_end() using the Conditional operator.
Reviewed By: jmp84
Differential Revision: D6177797
fbshipit-source-id: 5ce0b76b8472123415bba39488aa2c69aad96111
Summary:
RNN executor had a disadvantage compared to plain nets when running in forward-only mode: for plain nets, we only create two workspaces and two nets and alternate between them. With RNN executor, we had only four workspaces (4 > 2 because it was faster in some cases), but the nets (or rather the ops) were created for each of the timesteps. This has significant overhead. This diff changes this so that if the executor is in forward-only mode (i.e. has a limited parallelism setting), it will use the same operators as the (t - 4)'th net, excluding the ops that require the timestep blob. The latter exception is required because RNN executor needs a different timestep blob for each timestep: unlike when running nets in a loop, it cannot modify the value of the timestep blob.
Also removed redundancy in the dependency computation and added a debug flag to the executor that outputs the description of the rnn contents.
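The operator reuse described above can be modeled with a toy sketch. `ToyOp` and `build_timestep_ops` are hypothetical names, and the real executor is C++; this only shows the bookkeeping of sharing ops with timestep t - 4 while rebuilding the ops that read the timestep blob.

```python
NUM_SLOTS = 4  # matches the four workspaces mentioned above


class ToyOp(object):
    created = 0  # counts constructions, to make the saving visible

    def __init__(self, name, timestep=None):
        ToyOp.created += 1
        self.name = name
        self.timestep = timestep


def build_timestep_ops(op_specs, num_timesteps):
    """op_specs: list of (name, needs_timestep). Returns one op list per t."""
    per_timestep = []
    for t in range(num_timesteps):
        ops = []
        for idx, (name, needs_timestep) in enumerate(op_specs):
            if needs_timestep or t < NUM_SLOTS:
                # The first four timesteps build everything; later ones
                # rebuild only the ops that read the timestep blob.
                ops.append(ToyOp(name, t if needs_timestep else None))
            else:
                # Everything else is shared with the net from t - 4.
                ops.append(per_timestep[t - NUM_SLOTS][idx])
        per_timestep.append(ops)
    return per_timestep


nets = build_timestep_ops([("fc", False), ("timestep_reader", True)], 10)
total_created = ToyOp.created
```

For 10 timesteps and these two ops, naive creation would build 20 ops; the sketch builds 8 for the first four timesteps plus one timestep-dependent op for each of the remaining six.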
Reviewed By: salexspb
Differential Revision: D6155510
fbshipit-source-id: c47f727d2128649b081270d15020a08d41e5748d
Summary:
seq2seq/translate.py was running much slower on RNNExecutor. This was because RNNExecutor has significant init overhead (I have another diff to reduce it, but not eliminate it completely), and translate was calling the decoder with RunNetOnce -- thus always recreating the net and the ops. Changing this to RunNet() makes translate run faster than without the executor. RunNet takes the net name and uses the already created net, while RunNetOnce passes the whole protobuffer.
Noticed a similar bug in the seq2seq ensemble beam model, which also calls CreateNet() but uses RunNetOnce() instead of RunNet().
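To make the cost difference concrete, here is a toy model of the two call styles. `ToyWorkspace` is a hypothetical class written for this note, not the Caffe2 workspace API; only the create-once versus rebuild-every-call behavior mirrors the description above.

```python
class ToyWorkspace(object):
    def __init__(self):
        self.nets = {}
        self.instantiations = 0

    def _instantiate(self, net_def):
        self.instantiations += 1  # stands in for the expensive op creation
        return list(net_def["ops"])

    def create_net(self, net_def):
        self.nets[net_def["name"]] = self._instantiate(net_def)

    def run_net(self, name):
        # Looks up the already created net by name; nothing is rebuilt.
        return len(self.nets[name])

    def run_net_once(self, net_def):
        # Receives the whole definition and instantiates it on every call.
        return len(self._instantiate(net_def))


ws = ToyWorkspace()
decoder = {"name": "decoder", "ops": ["fc", "softmax"]}
ws.create_net(decoder)
for _ in range(3):
    ws.run_net("decoder")
reused = ws.instantiations
for _ in range(3):
    ws.run_net_once(decoder)
rebuilt = ws.instantiations - reused
```

In the toy, three name-based runs cost one instantiation in total, while three run-once calls cost three more.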
Reviewed By: jhcross
Differential Revision: D6156566
fbshipit-source-id: a933453e36a0d8fd163d0584186fda427a680687
Summary:
RNN executor has significant overhead from creating the timestep nets the first time, and this is especially bad with beam search, which is complex.
So disable RNN executor for now until the perf regression is fixed (I have a pending diff for it).
Reviewed By: salexspb
Differential Revision: D6138878
fbshipit-source-id: ce63ab9ce9cc1c0f67097aea1e370494ca98c680
Summary: Before this diff, RNNOp used TextFormat to represent steps. This diff changes RNNOp to prefer a NetDef argument instead. For backward compatibility it still supports TextFormat for existing models, though we can compile RNNs without TextFormat as well.
Reviewed By: salexspb
Differential Revision: D5949330
fbshipit-source-id: 9336a8f5ccf30ad8d8e3a7067b9437e1704b1c9f
Summary:
T22119644 showed that there is a potential illegal memory access in beam search with attention. Upon further inspection, we can see that there are multiple ops that write to the same old shape blob:
{"output0": "model0/attention_decoder/attention_weighted_encoder_context_reshaped", "output1": "state_old_shape_before_choosing_per_hypo", "input0": "model0/attention_decoder/attention_weighted_encoder_context" }},
{"output0": "model0/attention_decoder/hidden_t_external_reshaped", "output1": "state_old_shape_before_choosing_per_hypo", "input0": "model0/attention_decoder/hidden_t_external" }},
{"output0": "model0/decoder/layer0/cell_t_reshaped", "output1": "state_old_shape_before_choosing_per_hypo", "input0": "model0/decoder/layer0/cell_t" }},
This diff de-dupes these outputs.
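One way to picture the de-duplication, as a sketch only: give every op its own name for the shape output. The "<input>_<shared_name>" naming scheme and the function name are assumptions, not what the diff necessarily does, and the blob names are shortened for readability.

```python
def dedupe_output_names(ops, shared_name):
    """Return copies of ops in which each use of shared_name as an output
    gets its own name, derived from the op's input0."""
    result = []
    for op in ops:
        op = dict(op)  # copy, so the caller's dicts are left untouched
        for key in sorted(op):
            if key.startswith("output") and op[key] == shared_name:
                op[key] = "{}_{}".format(op["input0"], shared_name)
        result.append(op)
    return result


ops = [
    {"input0": "cell_t", "output0": "cell_t_reshaped", "output1": "old_shape"},
    {"input0": "hidden_t", "output0": "hidden_t_reshaped", "output1": "old_shape"},
]
fixed = dedupe_output_names(ops, "old_shape")
shape_outputs = [op["output1"] for op in fixed]
```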
Reviewed By: akyrola
Differential Revision: D5899103
fbshipit-source-id: 8b6f3f113e764dfeb9262f6c442e1124559cd2d8
Summary: Previously, the RecurrentNetwork op used for our beam search did not have any of the input blobs listed as data dependencies. This was fine when we were using SimpleNet, since the ops were run in the order in which we added them to the graph, and thus the RecurrentNetwork op was run after all the other ops. However, when switching to DAG, the ops that produce input data for the beam search were being run in parallel with the RecurrentNetwork beam search op, which caused non-deterministic failures depending on thread scheduling. This diff fixes that.
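A small sketch of why the missing dependencies matter under a DAG executor. The helper `ready_at_start` and the op tuples are hypothetical; it only computes which ops have no producer among the other ops and so could be started immediately.

```python
def ready_at_start(ops):
    """ops: list of (name, inputs, outputs). Returns the names of ops that
    read no blob produced by another op in the list."""
    produced = {}
    for name, _, outputs in ops:
        for blob in outputs:
            produced[blob] = name
    return [
        name
        for name, inputs, _ in ops
        if not any(blob in produced and produced[blob] != name for blob in inputs)
    ]


# Without declared inputs, the beam search op looks independent.
undeclared = [
    ("encoder", [], ["enc_out"]),
    ("beam_search", [], ["tokens"]),
]
# With its inputs listed, it has to wait for the encoder.
declared = [
    ("encoder", [], ["enc_out"]),
    ("beam_search", ["enc_out"], ["tokens"]),
]
ready_undeclared = ready_at_start(undeclared)
ready_declared = ready_at_start(declared)
```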
Reviewed By: jmp84, jhcross
Differential Revision: D5879622
fbshipit-source-id: b622de1f6a24b2636b191096db92990e0535890c
Summary: In some cases (e.g. CI), showing a progress bar messes up the log.
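A sketch of a progress bar that can be switched off, assuming a hypothetical `show_progress` flag; the actual option name and the rendering used by the downloader may differ.

```python
import io
import sys


def report_progress(done, total, show_progress=True, stream=None):
    """Draw a one-line progress bar, or do nothing when show_progress is False."""
    if not show_progress:
        return
    if stream is None:
        stream = sys.stdout
    width = 20
    filled = int(width * done / float(total))
    # "\r" redraws the same terminal line; in a captured log every redraw
    # shows up as extra output, which is why the bar is optional.
    bar = "#" * filled + "-" * (width - filled)
    stream.write("\r[{}] {}/{}".format(bar, done, total))
    stream.flush()


quiet = io.StringIO()
report_progress(5, 10, show_progress=False, stream=quiet)
loud = io.StringIO()
report_progress(5, 10, stream=loud)
```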
Reviewed By: jerryzh168
Differential Revision: D5850918
fbshipit-source-id: 2da9d020832264cef977391dc2fd8d1e2677d159
Summary: get and getheader are the same in Python 2
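As an illustration, reading a header through get() looks like the sketch below. The helper name `content_length` is made up, and `email.message.Message` is used as a stand-in for a response header object so the example runs without network access.

```python
from email.message import Message


def content_length(headers):
    """Read Content-Length through the dict-style get() accessor."""
    value = headers.get("Content-Length")
    return int(value) if value is not None else None


# Message is the base class of the Python 3 HTTP header object.
example_headers = Message()
example_headers["Content-Length"] = "1024"
size = content_length(example_headers)
missing = content_length(Message())
```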
Reviewed By: akyrola
Differential Revision: D5836486
fbshipit-source-id: 3bacfccc872c44741d7f26c68ba967093fce45c2
Summary: This will allow the same decoder to handle different go tokens.
Differential Revision: D5801811
fbshipit-source-id: ddd309963c97e32c728b15d2ccd4ba0c4ad5ebbe
Summary: RNN executor previously relied on getting the mapping from x to x_prev (and gradients) from recurrent.py, but we can just infer them from links. This makes all models compatible with RNN executor, given the enable_rnn_executor=1 argument.
Reviewed By: jamesr66a
Differential Revision: D5801436
fbshipit-source-id: 14d0e26dfbad6347f645d907da493187c98e9b17
Summary: Make the command-line arguments pertaining to model architecture the same in train.py and translate.py. Also use the s() scoping function for all intermediate blobs in attention.py (this is for compatibility with multi-headed attention).
Differential Revision: D5594312
fbshipit-source-id: cadf51d854b5a9174ec913f32c655be2abf111e5
Summary:
Model downloader was broken after the move on s3 to the vanity url, download.caffe2.ai. Using this as the url base hits a redirect, and will result in the script throwing a 403 error. Rather than upgrading to urllib2 or putting in a bunch of code to handle a redirect on urllib, we can just use the non-vanity base url.
Closes https://github.com/caffe2/caffe2/pull/1020
Reviewed By: Yangqing
Differential Revision: D5568686
Pulled By: aaronmarkham
fbshipit-source-id: d88a6b3e1b7955835fc03b036dc54dec48316e7f
Summary:
Fix multilayer inference in Caffe2 example seq2seq code. (Rely on LSTMWithAttentionDecoder.apply rather than fixed state indices to determine stepwise decoder output.)
Also assorted updates to bring code in line with changes elsewhere in the codebase, and added unit tests which ensure that training and inference networks generate the same loss, which should make these problems much easier to identify in future.
Reviewed By: jamesr66a
Differential Revision: D5579803
fbshipit-source-id: 6e0f27340d981990ab8d0da58e63793222e7be87
Summary:
Implement dot attention as described in https://arxiv.org/abs/1508.04025
This saves the computation of weighted encoder outputs in `rnn_cell.py`
When the encoder and decoder dimensions are different, we apply an FC, which corresponds to the general case below Figure 2.
Refactored unit tests.
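For readers unfamiliar with dot attention, a minimal sketch of one decoder step on plain Python lists follows. It assumes the encoder and decoder dimensions already match (so no FC is applied) and is not the Caffe2 implementation; the function name is illustrative.

```python
import math


def dot_attention(decoder_hidden, encoder_outputs):
    """Score each encoder output by its dot product with the decoder state,
    normalize the scores with softmax, and return (weights, context)."""
    scores = [
        sum(h * e for h, e in zip(decoder_hidden, enc))
        for enc in encoder_outputs
    ]
    # Softmax with the maximum subtracted for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    dim = len(decoder_hidden)
    context = [
        sum(w * enc[d] for w, enc in zip(weights, encoder_outputs))
        for d in range(dim)
    ]
    return weights, context


weights, context = dot_attention([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])
```

The scores come straight from the encoder outputs, so no separately weighted copy of them is needed for this step.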
Reviewed By: jhcross
Differential Revision: D5486976
fbshipit-source-id: f9e9aea675b3b072fbe631bc004199b90a9d95cb
Summary: Several refinements to seq2seq example code, including support for multilayer LSTM.
Reviewed By: jamesr66a
Differential Revision: D5460372
fbshipit-source-id: d2eabf6aa9a5b5df7bbc341fd99c4e7d8322e717
Summary:
To be used with predictor "online": C++ version of memonger for simple nets. Very simple greedy algorithm. Works well at least on Resnet-50 inference graph: only 3 shared blobs are used.
Next I will integrate this with predictor and run canary (separate diff).
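A greedy blob-sharing pass for a simple (sequential) net can be sketched as below. This is a Python illustration of the general idea, not the C++ algorithm from this diff; the function name, the `keep` parameter and the `__shared_N` naming are all assumptions.

```python
def greedy_share_blobs(ops, keep):
    """ops: list of (inputs, outputs) run in order; keep: blobs never renamed
    (external inputs and final outputs). Returns {original: shared_name}."""
    last_use = {}
    for i, (inputs, _) in enumerate(ops):
        for blob in inputs:
            last_use[blob] = i
    mapping, free, created = {}, [], 0
    for i, (inputs, outputs) in enumerate(ops):
        for blob in outputs:
            if blob in keep or blob in mapping:
                continue
            if free:
                # Reuse a shared blob whose previous owner is dead.
                mapping[blob] = free.pop()
            else:
                mapping[blob] = "__shared_{}".format(created)
                created += 1
        # Blobs read for the last time by this op can be recycled afterwards.
        for blob in inputs:
            shared = mapping.get(blob)
            if shared is not None and last_use[blob] == i and shared not in free:
                free.append(shared)
    return mapping


chain = [
    (["data"], ["a"]),
    (["a"], ["b"]),
    (["b"], ["c"]),
    (["c"], ["out"]),
]
mapping = greedy_share_blobs(chain, keep={"data", "out"})
num_shared = len(set(mapping.values()))
```

Outputs are assigned before the op's inputs are released, so an op never writes into a blob it is still reading.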
Reviewed By: asaadaldien
Differential Revision: D5375392
fbshipit-source-id: d36e419e39a32e568e105657c27fb00c85a2535d
Summary: Let's try this again. Verify graphs every time memonger is run. Will definitely check for time though.
Reviewed By: akyrola
Differential Revision: D5308188
fbshipit-source-id: 512a76c759b670d31c49d1d492dd8ee1eaf3bafd
Summary: Make verify_graph_equality get called by share_grad_blobs and optimize_inference_for_dag
Reviewed By: akyrola
Differential Revision: D5288993
fbshipit-source-id: b9f105ce00148b2673eed2dd390ab74f82f990ad