Commit Graph

42 Commits

Alexander Sidorov
df72826ead Static RNN
Summary:
Static RNN allows unrolling an RNN into a Caffe2 graph using all existing cell abstractions. In this diff I introduce several new tests that have already caught a few bugs in our RecurrentNetworkOp gradient accumulation logic, by comparing it against an unrolled version.

Another use case is performance: we can potentially run an unrolled net faster because DAGNet will have access to the whole graph. The same applies to memonger. That work, however, is not part of this diff.
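As a rough illustration (plain Python, not the Caffe2 implementation), unrolling stamps the same cell into the graph once per timestep with shared weights, so the unrolled net and the recurrent op should agree exactly; all names below are illustrative.
```
import numpy as np

def cell(x, h, w):                       # one shared RNN cell
    return np.tanh(w * x + h)

# Recurrent-op style: a single node loops over the timesteps internally.
def recurrent_op(xs, h0, w):
    h = h0
    for x in xs:
        h = cell(x, h, w)
    return h

# Unrolled ("static") style: one explicit copy of the cell per timestep,
# all sharing the same weight; the list of steps stands in for the graph.
def unrolled(xs, h0, w):
    steps = [(lambda h, x=x: cell(x, h, w)) for x in xs]
    h = h0
    for step in steps:
        h = step(h)
    return h

xs, h0, w = [0.1, 0.2, 0.3], 0.0, 0.5
assert np.isclose(recurrent_op(xs, h0, w), unrolled(xs, h0, w))
```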

Reviewed By: akyrola

Differential Revision: D5200943

fbshipit-source-id: 20f16fc1b2ca500d06ccc60c4cec6e81839149dc
2017-06-08 17:48:48 -07:00
James Cross
98825d1323 guard against special case of in-place operation
Summary:
There is an edge case where internal gradient blobs of the backward step net should not be considered internally calculated if the only "internal" calculation is in-place.

In the case of the failing attention unit tests, the offending blob was attention_weighted_encoder_context_grad, which was incorrectly considered internal because it was the output (as well as input) of a Reshape on the step net's edge. The caveat here is that the results may be unpredictable if a non-pass-through in-place operation is applied to a blob within a step net that is both consumed internally and used as a recurrent state/output. (This is an extreme edge case, and difficult to enforce explicitly, but it's worth noting.)

Reviewed By: salexspb

Differential Revision: D5198328

fbshipit-source-id: 0cfa8f903fd767fc50e727f238ac3d8cdca03fe0
2017-06-07 12:33:31 -07:00
Thomas Dudziak
47e921ba49 Remove map() and filter() in favor of comprehensions
Summary: These return lazy iterators in Python 3, which would not do anything in a lot of the usages currently present in Caffe2. This diff simply removes (almost) all usages of these two in Caffe2 and its subprojects in favor of comprehensions, which are also easier to read and understand.
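A minimal illustration of the Python 3 behavior referenced above; the blob names are made up.
```
# In Python 3, map() and filter() are lazy: nothing runs until the result is
# consumed, so a bare call used only for its side effects silently does nothing.
blobs = ["train_net", "test_net", ""]

map(print, blobs)                      # prints nothing: the iterator is never consumed
names = [b for b in blobs if b]        # comprehension: explicit and eagerly evaluated
print(names)                           # ['train_net', 'test_net']
```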

Reviewed By: akyrola

Differential Revision: D5142049

fbshipit-source-id: e800631d2df7d0823fed698cae46c486038007dc
2017-05-30 15:32:58 -07:00
Luke Yeager
97159810c9 Restore compatibility with protobuf2
Summary:
Addresses an issue with 417f74509e.
```
>               operators.append(proto.op.pop())
E               AttributeError: 'RepeatedCompositeFieldContainer' object has no attribute 'pop'
```
/cc jhcross
Closes https://github.com/caffe2/caffe2/pull/658
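A hedged sketch of the kind of protobuf2-compatible replacement for .pop(): repeated fields support indexing and del even where list.pop() is missing. A plain list stands in for proto.op here; the names are illustrative, not the actual patch.
```
# Stand-in for a protobuf repeated field that lacks .pop() in protobuf2.
proto_op = ["ConcatOp", "LSTMUnitOp", "SumOp"]
operators = []
while proto_op:
    operators.append(proto_op[-1])     # read the last element...
    del proto_op[-1]                   # ...then delete it, instead of .pop()
print(operators)                       # ['SumOp', 'LSTMUnitOp', 'ConcatOp']
```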

Reviewed By: dzhulgakov

Differential Revision: D5130382

Pulled By: salexspb

fbshipit-source-id: 34e0c39aad5f339c1aaa1506af3e7495193565f4
2017-05-26 08:47:24 -07:00
James Cross
c39f6cf2d0 gradient accumulation fix
Summary: As noted by salexspb, MultiRNNCell had unreliable gradient computation. The problem was that the recurrent gradient and the gradient computed within the backward step net were not being accumulated during the backward pass, but were instead written to the same blob, overwriting each other. This diff fixes that by artificially introducing an extra blob for the internal output and then accumulating it into the gradient coming from the recurrent connection.
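A NumPy sketch of the failure mode and the fix, with made-up values: two gradient sources target the same state, so they must be summed rather than written to one blob.
```
import numpy as np

g_recurrent = np.array([1.0, 2.0])   # gradient arriving via the recurrent link
g_internal  = np.array([0.5, 0.5])   # gradient computed inside the step net

# Buggy behaviour: both writers target the same blob, so the last write wins.
grad_blob = g_recurrent.copy()
grad_blob = g_internal.copy()        # silently overwrites the recurrent part

# Fixed behaviour: give the internal output its own blob and accumulate.
internal_grad_blob = g_internal.copy()
grad_blob = g_recurrent + internal_grad_blob   # [1.5, 2.5]
```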

Reviewed By: salexspb

Differential Revision: D5110059

fbshipit-source-id: 16add50989fe8866361bbc21afce5f214c5292fd
2017-05-24 10:33:32 -07:00
Aapo Kyrola
a61778a628 fix recompute_blobs_on_backward
Summary: My previous refactoring broke recompute_blobs_on_backward, which was being cleared.

Reviewed By: urikz

Differential Revision: D5013351

fbshipit-source-id: 5945778c0cff2ee2c7f5ad7b59b58f4305fa6a05
2017-05-05 14:01:34 -07:00
Aapo Kyrola
c86610b738 special executor class for RecurrentNetworks (just single threaded now)
Summary:
This is a preamble to the "diagonal executor". Instead of creating a Net for each timestep, we have a single executor for the RecurrentNetworkOp that manages ops per timestep.
This will be used if net_type='rnn', so one can still use the old way by choosing a net type of 'simple' or 'dag' (an effective kill switch if there are issues with this).

Did this only for the forward model. The gradient op will follow later; it is basically similar, just in reverse order.

Reviewed By: salexspb

Differential Revision: D4979933

fbshipit-source-id: bda77918ec518cb6b29d7021ee036d59eb2dd303
2017-05-01 19:06:25 -07:00
Alexander Sidorov
ad6204eb0b LSTM: support dropping hidden / cell states when sequence
Summary:
This is useful when the data consists of standalone sequences that are not connected to each other by any meaningful context.

Reviewed By: yqwangustc

Differential Revision: D4835164

fbshipit-source-id: f95626acc26acc3eba3bca7efb08ed1dbdb36c83
2017-04-27 11:47:29 -07:00
Alexander Sidorov
b905166362 RNN: fix bug for parameter gradient in a case when SumOp is
Summary:
The issue is that AliasOp doesn't work well with the swaps we do for
param.grad and param.accGrad. The tensors become the same if there is no
reallocation of the gradient tensor inside the backward cell net's
local workspace.

Bug explanation from akyrola:

```
gpu_0/decoder/decoder_hidden_encoder_outputs_sum_grad: tensor A

on each timestep back to 0, we Alias
gpu_0/decoder/weighted_encoder_outputs_grad,
so then also

gpu_0/decoder/weighted_encoder_outputs_grad: tensor A

Its acc is:
gpu_0/decoder/weighted_encoder_outputs_grad_acc: tensor B

Now after timesteps, we swap (line 626) with _acc to get

gpu_0/decoder/weighted_encoder_outputs_grad: tensor B

gpu_0/decoder/weighted_encoder_outputs_grad_acc: tensor A

OPTION A -- batch size is same as before or smaller:
Then on next iteration, we do again the Alias to
gpu_0/decoder/decoder_hidden_encoder_outputs_sum_grad, so now

gpu_0/decoder/weighted_encoder_outputs_grad: tensor A

and also

gpu_0/decoder/weighted_encoder_outputs_grad_acc: tensor A

swapping them does nothing and they are the same

OPTION B -- batch size increases
gpu_0/decoder/decoder_hidden_encoder_outputs_sum_grad is reallocated,
becomes tensor C

gpu_0/decoder/weighted_encoder_outputs_grad becomes tensor C with
Alias

gpu_0/decoder/weighted_encoder_outputs_grad_acc: is tensor A

```
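A tiny Python sketch of the degenerate case above (Option A), with illustrative names: once the gradient blob is re-aliased onto the same underlying tensor as its accumulator, swapping them is a no-op.
```
import numpy as np

storage = np.zeros(4)        # tensor A in the explanation above
grad = storage               # Alias: grad refers to tensor A
grad_acc = storage           # after the earlier swap, the acc is also tensor A

grad, grad_acc = grad_acc, grad   # the swap changes nothing
assert grad is grad_acc           # both names still point at the same tensor
```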

Reviewed By: urikz

Differential Revision: D4946730

Tags: rnn, caffe2

fbshipit-source-id: b52d63cb238b81d2ad40e05e70deb32a81336f47
2017-04-25 20:46:59 -07:00
Aapo Kyrola
9cb901caf0 Forward-only rnns
Summary:
Added a forward_only option to recurrent_net and the RNNCells. If this is set, the backward_step_net is not passed to the operator.
When backward_step_net is not available, the operator knows it is in forward-only mode and does not create workspaces for each step, but cycles through only one private workspace.

Note: we could avoid doing a lot of work in the recurrent.py:recurrent_network call when the backward step is not needed, but doing that nicely requires more refactoring than I wanted to do now. Thus, we still create the backward step nets etc., but just don't pass them to the op.

This can be used to create more efficient inference models. You can also sanitize existing inference nets and remove the backward_step_net argument to
get the benefits.
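A conceptual sketch in plain Python (not the Caffe2 executor) of why forward-only mode saves memory: without a backward pass, one private workspace can be recycled every timestep instead of keeping one workspace per step alive.
```
def run_forward_only(step_fn, inputs, state):
    ws = {}                        # single private workspace, recycled
    for x in inputs:
        ws.clear()
        state = step_fn(x, state, ws)
    return state

def run_with_backward(step_fn, inputs, state):
    workspaces = []                # one workspace per step, kept for backprop
    for x in inputs:
        ws = {}
        state = step_fn(x, state, ws)
        workspaces.append(ws)
    return state, workspaces

step = lambda x, h, ws: h + x      # stand-in for one timestep of a cell
print(run_forward_only(step, [1, 2, 3], 0))   # 6, using one recycled workspace
```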

Reviewed By: salexspb

Differential Revision: D4916482

fbshipit-source-id: c99b93c9cb897c32b0f449253f7f6d6a942618ad
2017-04-24 15:52:27 -07:00
Alexander Sidorov
bf20e4e9b0 Remove MiLSTM from recurrent.py left over after refactoring
Summary: It's not used.

Reviewed By: urikz

Differential Revision: D4936008

fbshipit-source-id: cc044bbdac0d17503ce9376b98e4bf79a4dc959c
2017-04-24 15:52:26 -07:00
Yury Zemlyanskiy
4bf559eddb RNNCell, LSTMCell, LSTMWithAttentionCell
Summary: This is a nice way to reuse RNN layers for training and for inference.

Reviewed By: salexspb

Differential Revision: D4825894

fbshipit-source-id: 779c69758cee8caca6f36bc507e3ea0566f7652a
2017-04-18 00:47:20 -07:00
James Reed
e8cc5563fe Add an optional forget bias argument to LSTMUnit
Summary: Add an option to bias the forget gate one way or the other by adding a float value to it before the sigmoid is applied.
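A NumPy sketch of what the new argument does, assuming the usual LSTM gate layout (an assumption here): a constant is added to the forget gate's pre-activation before the sigmoid, biasing it toward remembering (positive) or forgetting (negative).
```
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_unit(cell_prev, gates, forget_bias=0.0):
    i_pre, f_pre, o_pre, g_pre = np.split(gates, 4)
    i = sigmoid(i_pre)
    f = sigmoid(f_pre + forget_bias)   # the optional bias, added pre-sigmoid
    o = sigmoid(o_pre)
    g = np.tanh(g_pre)
    cell = f * cell_prev + i * g
    hidden = o * np.tanh(cell)
    return hidden, cell

h, c = lstm_unit(np.ones(2), np.zeros(8), forget_bias=1.0)
```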

Differential Revision: D4880712

fbshipit-source-id: 1306a97c29fb31630838b2f96597a46e952d940a
2017-04-13 21:49:17 -07:00
James Reed
bbcdc91135 Remove prof_dag from step net
Summary:
prof_dag in step net is not supported

(Note: this ignores all push blocking failures!)

Differential Revision: D4876551

fbshipit-source-id: 4003e60908e51ef052f8656bf527b326676c298c
2017-04-12 11:01:30 -07:00
Aapo Kyrola
1e5140aa76 option to recompute blobs backward pass with massive memory savings
Summary:
This diff adds an option to recurrent_net to define some cell blobs to be recomputed on the backward step, so they don't need to be stored in the step workspace. This is done by modifying the backward step to automatically include all operators that are needed to produce the output that is to be recomputed, and by storing those blobs in a shared workspace. To enable the shared workspace, I had to modify the stepworkspaces blob to also store a forward shared workspace. Making it a class field won't work since the lifecycle of the blob does not match the lifecycle of the operator.

For basic LSTM, the performance hit is quite modest (about 15% with one setting, but your mileage may vary). For attention models, I am sure this is beneficial, as computing the attention blobs is not expensive.

For basic LSTM, the memory saving is wonderful: each forward workspace only has 4 bytes (for timestep).

I also modified the neural_mt LSTM Cells, but there is no test available, so I am not 100% sure I did it correctly. Please have a look.

Added options to LSTM, MILSTM and LSTMAttention to enable memory mode.
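A hedged, plain-NumPy sketch of the recompute idea (not the Caffe2 mechanism): keep only the per-step recurrent states on the forward pass and recompute the cheap internal blob when its gradient is needed.
```
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(xs, w):
    hs = [0.0]                             # recurrent states: stored
    for x in xs:
        gate = sigmoid(w * x + hs[-1])     # internal blob: not stored
        hs.append(gate)
    return hs

def backward(xs, hs, w, grad_last):
    dw, dh = 0.0, grad_last
    for t in reversed(range(len(xs))):
        gate = sigmoid(w * xs[t] + hs[t])  # recomputed instead of cached
        dpre = dh * gate * (1.0 - gate)
        dw += dpre * xs[t]
        dh = dpre                          # gradient into the previous state
    return dw

xs, w = [0.5, -0.25, 1.0], 0.3
print(backward(xs, forward(xs, w), w, grad_last=1.0))
```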

Reviewed By: urikz

Differential Revision: D4853890

fbshipit-source-id: d8d0e0e75a5330d174fbfa39b96d8e4e8c446baa
2017-04-11 13:03:48 -07:00
James Reed
4dc1dbab05 MILSTM Cells with and without attention
Summary: Bug fix for MILSTM implementation: parameters were not trainable.

Reviewed By: urikz

Differential Revision: D4864581

fbshipit-source-id: a3fdb7a85c8d87c5117328ca8cae4fb6352728d0
2017-04-11 02:01:23 -07:00
Aapo Kyrola
b0adcf02f8 remove workspace sequence id
Summary: As stated in the title. This should save a lot of memory when using both train and test workflows.

Reviewed By: jhcross

Differential Revision: D4855436

fbshipit-source-id: 9eeca548eee118e07bd587c46f40e7beb138318e
2017-04-08 00:01:59 -07:00
Aapo Kyrola
8da2d75ec8 [Caffe2/Recurrent] recurrent.py API to cuDNN LSTM
Summary:
Quite a large diff to make cuDNN LSTM and our LSTM produce the same results, and to provide a Python API for the cuDNN LSTM.

* Added operators RecurrentParamGet and RecurrentParamSet to access weights and biases for the different gates, input/recurrent.
* Removed RecurrentInit as not needed
* recurrent.cudnn_LSTM() returns a special net and mapping that can be used to retrieve the parameters from the LSTM
* recurrent.cudnn_LSTM() can be passed blobs that have the parameters for the individual gate weights and biases
* recurrent.InitFromLSTMParams() can be used to initialize our own LSTM from cuDNN params. This way we can test whether cuDNN and our own implementation produce the same result.

recurrent_test.py tests for equivalence.

Reviewed By: salexspb

Differential Revision: D4654988

fbshipit-source-id: 6c1547d873cadcf33e03b0e0110248f0a7ab8cb0
2017-04-05 14:20:23 -07:00
Aapo Kyrola
e13e9c1302 cuDNN version of TransposeOp
Summary:
Uses the cudnnTransformTensor function. It works by shuffling the strides according to the transpose axes. Significant speedup over the current GPU version.
Also moves the transpose test under utility_ops, because hypothesis_test is too big.
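An illustration of the stride-shuffling idea using NumPy (the cuDNN code itself is not shown): a transpose is just a permutation of the strides over the same underlying buffer.
```
import numpy as np

x = np.arange(24).reshape(2, 3, 4)
axes = (2, 0, 1)
y = x.transpose(axes)

assert y.strides == tuple(x.strides[a] for a in axes)  # strides permuted
assert np.shares_memory(x, y)                          # same underlying data
```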

Reviewed By: jamesr66a

Differential Revision: D4810993

fbshipit-source-id: 82577c4ced1389e70bd5992820ae4d8297a3817f
2017-04-03 13:33:10 -07:00
Aaron Markham
58f7f2b441 doxygen python block added
Summary: Closes https://github.com/caffe2/caffe2/pull/226

Differential Revision: D4793550

Pulled By: JoelMarcey

fbshipit-source-id: cc33e58186304fa8dcac2ee9115dcc271d785b1e
2017-03-29 06:46:16 -07:00
Alexander Sidorov
242bff8480 RNN: avoid copy for gradients of inputs to the rnn cell and save more memory!
Summary:
This is pretty tricky to explain, but we can just use
backward_links. This way the whole cell uses a blob from the
states_grad tensor instead of having its own blob. This should also
save a bit of memory.

Differential Revision: D4770798

fbshipit-source-id: 673f85b2c2fdf42c47feeaa24d1e2bf086f012f9
2017-03-28 10:02:25 -07:00
Yury Zemlyanskiy
97a6400f03 Don't do copy for param_grad in backward_step_net
Summary: We accumulate the values of this blob (param_grad) into another special internal blob anyway.

Differential Revision: D4768643

fbshipit-source-id: a9d08b7eafd25f278a8db722f9cdb1d0064b852a
2017-03-24 02:22:33 -07:00
Deepak Gopinath
422c65ca35 Removing unnecessary Copy after fixing gradients for external parameters
Summary: Apart from copying gradient blobs for inputs with initial_cell_input, we needed to perform a similar operation for external parameters used by the step net.

Reviewed By: salexspb

Differential Revision: D4752259

fbshipit-source-id: 13ee48cf583ed86221a4cc1cc9f57f5c3a7d2450
2017-03-23 15:04:22 -07:00
Yury Zemlyanskiy
ea66516d5e Output attention weights from apply_xxx_attention methods
Summary: OSS diff. We need it later for beam decoding.

Differential Revision: D4747785

fbshipit-source-id: ce2d53ee2434216ace3c4ddbd40a9b68e9db7ec5
2017-03-21 19:01:58 -07:00
Alexander Sidorov
d7b2aebf2c Support for Sum in cell net as first operator
Summary: This didn't work for a reason specified in the comments. Also some cleanup in the unit tests; inference now uses a custom workspace to run the cell net.

Reviewed By: urikz

Differential Revision: D4742670

fbshipit-source-id: 04165c029fddec5ae31b20b207faf06d2fa20816
2017-03-21 18:32:18 -07:00
James Reed
33f41c06c0 Remove more instances of batch_size
Summary: D4734505 part 2. Remove more instances of the batch_size parameter

Reviewed By: urikz

Differential Revision: D4736906

fbshipit-source-id: fc9d374e9308017d61c427890364c5ab9cec2edf
2017-03-19 22:31:30 -07:00
Yury Zemlyanskiy
d1424c3265 Revert D4702086: Remove batch_size parameter from attention and LSTMWithAttention interfaces
Summary: This reverts commit c4c1d8425cd36c1e86695918eaba2667c27e9601

Differential Revision: D4702086

fbshipit-source-id: 4620610b182bb84b9297b5de32782761ae89d20b
2017-03-17 17:36:47 -07:00
James Reed
10d95bd0f0 Remove batch_size parameter from attention and LSTMWithAttention interfaces
Summary: Reshape based on tensor shapes in the graph rather than based on a passed-in batch_size parameter

Reviewed By: urikz

Differential Revision: D4702086

fbshipit-source-id: c4c1d8425cd36c1e86695918eaba2667c27e9601
2017-03-16 11:47:52 -07:00
Aapo Kyrola
26628d10ff Fix workspace clashes
Summary: For example, test and train nets could have shared workspaces, leading to a race condition. This adds an assertion and a running counter to the workspace-blob name.

Reviewed By: jhcross

Differential Revision: D4712152

fbshipit-source-id: 808d7069095bac24ebfe0c9d31ebd134f4cf0956
2017-03-14 23:33:28 -07:00
Aapo Kyrola
91f468b15c fixes to make data parallel model work for RecurrentNet + test case
Summary:
First, this diff includes a full test of data-parallel LSTM, which confirms it works correctly. To make it work, some changes had to be made:
 - cell net/step net external inputs must be namespace-scoped
 - prevent double-namescoping of cell net inputs
 - make the data parallel model understand recurrent nets so the device mapping works

Reviewed By: salexspb

Differential Revision: D4708840

fbshipit-source-id: 4b0ddc43642d449076a2b6f67ad1c47f84138ff4
2017-03-14 15:48:07 -07:00
Karthik Prasad
965a7daf9b Implement MILSTM in caffe2
Summary:
Created a new function with the specifics of the MI LSTM implementation in Caffe2.
See https://arxiv.org/pdf/1606.06630.pdf for details.
See D4478877 for the implementation of the same in TensorFlow.
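A brief NumPy sketch of the multiplicative-integration idea from the linked paper (names and shapes here are illustrative): the additive pre-activation Wx + Uh + b gains a gated second-order term.
```
import numpy as np

def additive_preactivation(Wx, Uh, b):
    return Wx + Uh + b                     # vanilla LSTM building block

def mi_preactivation(Wx, Uh, b, alpha, beta1, beta2):
    # MI block per arXiv:1606.06630: element-wise second-order term plus
    # gated first-order terms.
    return alpha * Wx * Uh + beta1 * Uh + beta2 * Wx + b

Wx, Uh = np.ones(3), np.full(3, 2.0)
print(mi_preactivation(Wx, Uh, b=0.0, alpha=1.0, beta1=1.0, beta2=1.0))  # [5. 5. 5.]
```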

Reviewed By: jhcross

Differential Revision: D4669882

fbshipit-source-id: 095bbcf187dbdac2cd79558ff0c8f9f67d8af639
2017-03-09 16:32:47 -08:00
James Reed
8de1db9eb6 Implement recurrent attention in C2
Summary: Super rough implementation of recurrent attention. Planning to factor out the common code between the two functions, as well as train and eval. I want to get this out and get eyes on it sooner rather than later.

Differential Revision: D4647837

fbshipit-source-id: 54bc4e8ed0df6f04c86c425926decbe89f73b068
2017-03-08 11:21:28 -08:00
Yury Zemlyanskiy
4a53ab3cb6 LSTMWithAttention implementation in Caffe2
Summary:
Implementation of ##LSTMWithAttention##

Still TBD:
1. There are problems with backpropagation, because the gradient is not implemented for ops with broadcasting
2. I need to make initial_recurrent_state be of shape [dim] rather than [1, batch_size, dim], so one doesn't need to provide batch_size to LSTMWithAttention

Differential Revision: D4298735

fbshipit-source-id: 8903fcff4d6a66647ee6d45a6ef28803fc3091e5
2017-02-23 04:08:34 -08:00
James Cross
b436788b16 LSTMUnit: pass through H values
Summary:
Pass through the h-value recurrent output unchanged at each LSTM step beyond the valid part of a sequence (computed based on seqLengths, allowing batching of sequences of different lengths). This enables using the final-step output of each sequence as the output when one vector is desired for the entire sequence. The gradient is also passed back unchanged.

Also made some cosmetic changes to recurrent_network_test.py (seq_lengths offset corrected; it should be in [1, T] rather than [0, T-1]).
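An illustrative NumPy sketch of the pass-through behavior (not the operator's code): past each sequence's valid length, the step emits the previous hidden state unchanged.
```
import numpy as np

def step(h_prev, h_new, t, seq_lengths):
    valid = (t < seq_lengths)[:, None]      # per-sequence mask from seqLengths
    return np.where(valid, h_new, h_prev)   # pass h through once the sequence ends

h_prev = np.zeros((3, 2))
h_new = np.ones((3, 2))
print(step(h_prev, h_new, t=1, seq_lengths=np.array([3, 1, 2])))
# the middle row keeps its previous value: that sequence ended after 1 step
```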

Reviewed By: urikz

Differential Revision: D4540307

fbshipit-source-id: 73a9f6326069d713dcb0cdc8d17869317c6dbe96
2017-02-16 15:31:38 -08:00
James Cross
63901e9aca allow recurrent network gradient op to receive gradient on any combination of network output blobs
Summary:
(Caffe2) Modified the RecurrentNetworkGradient operator so that training is possible with any combination of output blobs receiving gradient during the backward pass. This is realized through a new argument for the RecurrentNetwork op, outputs_with_grads, which takes a list of the indices of the output blobs that will receive gradient. The default behavior (receiving gradient only from the first output blob) remains unchanged.

A new unit test covers the case where outputs_with_grads = [1, 2], using the Python LSTM wrapper.
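A conceptual sketch with made-up values (not the operator's implementation): only the outputs listed in outputs_with_grads contribute upstream gradient, and their contributions are accumulated.
```
import numpy as np

def accumulated_grad(grads_per_output, outputs_with_grads):
    total = np.zeros_like(grads_per_output[0])
    for idx in outputs_with_grads:
        total += grads_per_output[idx]        # only the selected outputs count
    return total

grads = [np.array([1.0, 1.0]), np.array([0.5, 0.5]), np.array([2.0, 2.0])]
print(accumulated_grad(grads, outputs_with_grads=[1, 2]))   # [2.5 2.5]
```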

Reviewed By: urikz

Differential Revision: D4518516

fbshipit-source-id: 5c531582b20f3cf727d1aa91239b4d5a2b8a7c1f
2017-02-15 16:00:45 -08:00
James Cross
93795406c5 Adapt NLU proj code for Caffe2 RecurrentNetworkOp changes
Summary: Updates the function revise_recurrent_network_op(), which supports cloning recurrent networks by adding a blob-name prefix to string arguments to maintain correspondence. It previously relied on many hard-coded indices referring to the positions of arguments and inputs of RecurrentNetworkOp and its corresponding gradient operator, and therefore broke when the implementation changed. This fix should make it more general and robust.

Differential Revision: D4559768

fbshipit-source-id: fb85b0b1ffb1393dc84760d6ae5dc473e8b764b0
2017-02-15 16:00:44 -08:00
Alexander Sidorov
b7fa6b2a8b remove recurrent_inputs in a favor of recurrent_input_ids
Summary:
I had forgotten to remove this one. The rest of the switch to indexing
instead of string names is coming after D4446813 lands, as scratches
aren't inputs or outputs and thus can't be indexed.

Reviewed By: urikz

Differential Revision: D4465748

fbshipit-source-id: 2ccbedfb35541ef4a2231d1480eef59025bd5290
2017-01-31 13:14:33 -08:00
Alexander Sidorov
d019ec793c improve flaky test
Summary: TestWarden was failing on some inputs.

Reviewed By: Yangqing

Differential Revision: D4487293

fbshipit-source-id: 3da4b310a619c2b57f033b2dd7727f71403bfd68
2017-01-30 22:14:27 -08:00
Yury Zemlyanskiy
22e1bdd6d1 Use stack workspaces in RecurrentNetwork
Summary: This diff uses stack workspaces in RecurrentNetwork, which allows us to simplify the implementation and get rid of scratches.

Reviewed By: salexspb

Differential Revision: D4446813

fbshipit-source-id: 514eec7e4300bdf492a9cb192b40cf4f89acf656
2017-01-27 11:44:26 -08:00
Yury Zemlyanskiy
0e3146e1e8 Remove recurrent_sizes from RecurrentNetwork
Summary: Remove usage of recurrent_sizes, so that the sizes of recurrent states can depend on the input (in the case of the attention matrix for the beam decoder). I removed recurrent_sizes from both the forward and backward steps.

Reviewed By: salexspb

Differential Revision: D4427688

fbshipit-source-id: 580420a294d309c86ec5cb4e677058623b7228e1
2017-01-24 23:14:25 -08:00
Alexander Sidorov
b1472a173a don't hardcode outputs order to work only for lstm + don't pass blob names for parameters
Summary:
In this diff I stop passing parameters by name and also remove the hardcoded output ids that were there specifically to make LSTM work. It also allows avoiding the use of recurrent_sizes in the backward pass (for the forward pass this is done in D4427688).

Using a similar technique it should be simple enough to eliminate blob-name passing entirely. Then we can fix scoping. These can be done in a follow-up diff.

Reviewed By: urikz

Differential Revision: D4444614

fbshipit-source-id: 3580a76365502b9f2f09e3d8b7e78084ca739f00
2017-01-24 16:29:23 -08:00
Yury Zemlyanskiy
c2d28fb874 RNNs API simplification
Summary:
This is a first step in improving our RNN story. It provides a wrapper around the current RecurrentNetworkOp implementation, which infers most of the redundant parameters and makes the API much simpler.

Also, in order to support general step nets, I added an extra argument to the RecurrentNetworkOp.

Future work:

1. Inferring step net output and internal blob (scratch) sizes and types
2. Avoid accessing blobs by name in the C++ part
3. Remove the requirement for 1:1 input/output correspondence in the step net
4. Make the Python API support networks with operators like Sum on the border of the cell net (currently there is an issue with such networks where gradient blobs that are on the side are not explicitly created).

Differential Revision: D4268503

fbshipit-source-id: f8a66491c2b55daa730caeed7e9f2b3921541b49
2016-12-21 09:29:43 -08:00