There are small typos in:
- caffe2/python/recurrent.py
- test/distributed/test_c10d_nccl.py
- test/test_fx.py
- torch/csrc/jit/runtime/autodiff.cpp
- torchgen/gen.py
Fixes:
- Should read `propagation` rather than `propogation`.
- Should read `multiplied` rather than `multuplied`.
- Should read `eliminate` rather than `elminate`.
- Should read `dispatcher` rather than `disaptcher`.
Semi-automated pull request generated by
https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81435
Approved by: https://github.com/ngimel
Summary:
There is a tool called `2to3` which you can run with the `future` fixer specifically to remove these; the `caffe2` directory has the most redundant imports:
```2to3 -f future -w caffe2```
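For context, the `future` fixer simply deletes `from __future__ import ...` statements, which are no-ops on Python 3. A minimal illustration (the file contents are made up, not taken from the PR):
```python
# before.py (illustrative; not an actual file from the PR)
from __future__ import absolute_import, division, print_function, unicode_literals

import math

print(math.pi)

# Running `2to3 -f future -w before.py` rewrites the file in place, deleting
# only the `from __future__ import ...` line; everything else stays as-is.
```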
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033
Reviewed By: seemethere
Differential Revision: D23808648
Pulled By: bugra
fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
Summary: Remove scoping assertion because it is not useful and causes errors
Reviewed By: salexspb
Differential Revision: D6538219
fbshipit-source-id: e587e294d4beec1370e6895af9354f0818a4cdd8
Summary: Before this diff, RNNOp was using TextFormat for representing steps. This diff changes RNNOp to prefer a NetDef argument instead. To be backward compatible it still supports TextFormat for existing models, though we can compile RNNs without TextFormat as well.
Reviewed By: salexspb
Differential Revision: D5949330
fbshipit-source-id: 9336a8f5ccf30ad8d8e3a7067b9437e1704b1c9f
Summary: The RNN executor previously relied on getting the mapping from x to x_prev (and gradients) from recurrent.py, but we can just infer them from the links. This makes all models compatible with the RNN executor, given the enable_rnn_executor=1 argument.
Reviewed By: jamesr66a
Differential Revision: D5801436
fbshipit-source-id: 14d0e26dfbad6347f645d907da493187c98e9b17
Summary: As title. Made the configurations op-specific since many models run multiple RNNs.
Reviewed By: jamesr66a
Differential Revision: D5796208
fbshipit-source-id: 88173879dfff9f3f7bf583ccc4f4c6385cca5aca
Summary:
Special executor for RNNs which can exploit parallelism over timesteps. On CPU we use multi-threading, achieving roughly a 3x improvement on 4-layer LSTMs.
With CUDA, perf improvements are more modest, but the structure allows for further optimization. On CUDA, we use multiple streams and events when there is parallelism
over timesteps. In my experiments, using more than 2 streams did not help, though.
The flag --caffe2_rnn_executor can be used to switch the executor off.
Reviewed By: salexspb
Differential Revision: D5749304
fbshipit-source-id: d6f76b3e16598be5b4e8188aff031671ebafaa4c
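The executor itself is C++; purely as a conceptual illustration of "parallelism over timesteps", here is a small Python sketch (the op list, its dependency field, and the two-worker pool are all made up) of running ops from different timesteps concurrently once their dependencies have finished:
```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run_rnn_ops(ops, num_threads=2):
    # ops: list of (name, dependency_names, callable); an op is launched as
    # soon as all of its dependencies are done, so ops belonging to
    # different timesteps can overlap on the worker pool.
    done, in_flight = set(), {}
    pending = list(ops)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        while pending or in_flight:
            ready = [op for op in pending if set(op[1]) <= done]
            for op in ready:
                pending.remove(op)
                in_flight[pool.submit(op[2])] = op[0]
            finished, _ = wait(in_flight, return_when=FIRST_COMPLETED)
            for fut in finished:
                fut.result()                 # surface exceptions, if any
                done.add(in_flight.pop(fut))

# Usage: the t1 elementwise op depends on gemms from two timesteps,
# which the two workers (mirroring the "2 streams" observation) may overlap.
run_rnn_ops([
    ("t0/gemm", [], lambda: None),
    ("t1/gemm", [], lambda: None),
    ("t1/elementwise", ["t0/gemm", "t1/gemm"], lambda: None),
])
```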
Summary:
Forward-only mode had broken at some point. Two things: RNNCell did not pass the parameter to recurrent.py, and recurrent.py itself was broken when forward_only=True after the python3 codemod.
Added a test to rnn_cell_test that actually checks the forward_only parameter is passed, to prevent future breakage.
Reviewed By: jmp84
Differential Revision: D5639306
fbshipit-source-id: b1bbc39d59c3f3734b2f40a1c2f3740c733e0bd4
Summary:
This diff adds dependency-aware concurrent/parallel execution of operators in stepnets. For CPU, we use multi-threaded execution. For CUDA, we use multiple streams and cuda events for parallelism and dependency tracking.
Much of the diff is about computing the dependency graph, which was quite tricky because we also need to avoid write races between operators running in multiple timesteps in parallel. Also, recurrent blobs "change name" when passing over a timestep boundary ("_prev"), so that needs to be handled as well.
This diff also restores the link-ops that I unlanded earlier.
The performance gain of this diff is very good for CPU (same perf as with static_dag, even better on forward-only). On CUDA, the gains are modest, at least with the sizes I was testing with.
Reviewed By: salexspb
Differential Revision: D5001637
fbshipit-source-id: 3d0a71593d73a9ff22f4c1a5c9abf2a4a0c633c8
Summary:
Added operator RecurrentNetworkBlobFetcherOp that takes as input a scratch workspace name and prefix, and copies over all blobs in the scratch workspace into the global workspace. This essentially extracts all intermediate recurrent network computation for each timestep.
Added a wrapper in recurrent.py - retrieve_step_blobs(net, prefix='rnn') - which, when called after an RNN is run, will return a list of all blobs extracted from the net.
Reviewed By: akyrola
Differential Revision: D5421926
fbshipit-source-id: 0f35b466d77d3c719fb0e32de7dbcafc6c0d5225
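A hedged usage sketch of the wrapper named in the summary above (not taken from the diff; `pred_net` is a hypothetical, already-built Caffe2 net containing a RecurrentNetwork op, and the return value is assumed to be a list of blob names):
```python
from caffe2.python import recurrent, workspace

workspace.RunNetOnce(pred_net)                  # run the RNN once
step_blob_names = recurrent.retrieve_step_blobs(pred_net, prefix='rnn')
for name in step_blob_names:
    # Per the summary, each per-timestep blob has been copied into the
    # global workspace, so it can be fetched and inspected directly.
    print(name, workspace.FetchBlob(name).shape)
```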
Summary:
Static RNN allows unrolling an RNN into a Caffe2 graph using all the existing cell abstractions. In this diff I introduce several new tests that already caught a few bugs in our RecurrentNetworkOp gradient accumulation logic by comparing it to an unrolled version.
Another use case is perf: potentially we can run an unrolled net faster because DAGNet will have access to the whole graph. The same goes for memonger. But that work is not part of this diff.
Reviewed By: akyrola
Differential Revision: D5200943
fbshipit-source-id: 20f16fc1b2ca500d06ccc60c4cec6e81839149dc
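For readers unfamiliar with the term, "unrolling" just means emitting the cell ops once per timestep into an ordinary graph; a minimal numpy sketch with a toy tanh cell (shapes and cell are illustrative, not the real cell abstractions):
```python
import numpy as np

def unrolled_rnn(x, h0, W_ih, W_hh, b):
    # x: (T, batch, d_in). The loop body is emitted once per timestep, so the
    # "unrolled" graph contains T copies of the same ops sharing one weight set.
    h, outputs = h0, []
    for t in range(x.shape[0]):
        h = np.tanh(x[t] @ W_ih + h @ W_hh + b)
        outputs.append(h)
    return np.stack(outputs), h

T, batch, d_in, d_hid = 5, 2, 3, 4
rng = np.random.default_rng(0)
y, h_T = unrolled_rnn(rng.normal(size=(T, batch, d_in)),
                      np.zeros((batch, d_hid)),
                      rng.normal(size=(d_in, d_hid)),
                      rng.normal(size=(d_hid, d_hid)),
                      np.zeros(d_hid))
print(y.shape, h_T.shape)   # (5, 2, 4) (2, 4)
```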
Summary:
There is an edge case where internal gradient blobs of the backward step net should not be considered internally calculated if the only "internal" calculation is in-place.
In the case of the failing attention unit tests, the offending blob was attention_weighted_encoder_context_grad, which was incorrectly considered internal because it was the output (as well as input) of a Reshape on the step net's edge. The caveat here is that the results may be unpredictable if a non-pass-through in-place operation is applied to a blob within a step net which is also consumed both internally and is a recurrent state/output. (This is an extreme edge case, and difficult to explicitly enforce, but it's worth noting.)
Reviewed By: salexspb
Differential Revision: D5198328
fbshipit-source-id: 0cfa8f903fd767fc50e727f238ac3d8cdca03fe0
Summary: These return views in Python 3, which would not do anything in a lot of usages currently present in Caffe2. This diff simply removes (almost) all usages of these two in Caffe2 and sub-projects in favor of comprehensions, which are also easier to read/understand.
Reviewed By: akyrola
Differential Revision: D5142049
fbshipit-source-id: e800631d2df7d0823fed698cae46c486038007dc
Summary: As noted by salexspb, MultiRNNCell had unreliable gradient computation. The problem was that the recurrent gradient and the gradient computed within the backward step net were not being accumulated during the backward pass, but were instead written to the same blob, overwriting each other. This diff fixes that by artificially introducing an extra blob for the internal output, and then accumulating it into the gradient coming from the recurrent connection.
Reviewed By: salexspb
Differential Revision: D5110059
fbshipit-source-id: 16add50989fe8866361bbc21afce5f214c5292fd
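A toy numpy illustration of the bug class being fixed (blob names are made up): if the in-cell gradient and the gradient arriving over the recurrent connection target the same blob, one overwrites the other, whereas the fix stages the in-cell contribution in an extra blob and accumulates:
```python
import numpy as np

recurrent_grad = np.array([1.0, 2.0])   # gradient arriving from h_{t+1}
internal_grad = np.array([0.5, 0.5])    # gradient produced inside the step net

# Buggy behaviour: both writers target the same blob name, so the last
# write simply replaces the recurrent contribution.
blobs = {"h_grad": recurrent_grad.copy()}
blobs["h_grad"] = internal_grad

# Fixed behaviour: the internal output goes to an extra blob and is then
# accumulated into the gradient coming over the recurrent connection.
blobs["h_grad_internal"] = internal_grad
blobs["h_grad_fixed"] = recurrent_grad + blobs["h_grad_internal"]

print(blobs["h_grad"], blobs["h_grad_fixed"])   # [0.5 0.5] [1.5 2.5]
```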
Summary:
This is preamble for the "diagonal executor". Instead of creating a Net for each timestep, we have a single executor for the RecurrentNetworkOp that manages ops per timestep.
This will be used if net_type='rnn', so one can still use the old way by using a net type of 'simple' or 'dag' (so there is an effective kill-switch if there are issues with this).
Did this only for the forward model. The gradient op will follow later on, but it is basically similar, just in reverse order.
Reviewed By: salexspb
Differential Revision: D4979933
fbshipit-source-id: bda77918ec518cb6b29d7021ee036d59eb2dd303
Summary:
This is useful when data has standalone sequences which are
not connected to each other by any meaningful context
Reviewed By: yqwangustc
Differential Revision: D4835164
fbshipit-source-id: f95626acc26acc3eba3bca7efb08ed1dbdb36c83
Summary:
The issue is that AliasOp doesn't work well with the swaps that we do for
param.grad and param.accGrad. Tensors become the same if there is no
reallocation of the gradient tensor inside the backward cell net's
local workspace.
bug explanation from akyrola:
```
gpu_0/decoder/decoder_hidden_encoder_outputs_sum_grad: tensor A
on each timestep back to 0, we Alias
gpu_0/decoder/weighted_encoder_outputs_grad,
so then also
gpu_0/decoder/weighted_encoder_outputs_grad: tensor A
It's acc is:
gpu_0/decoder/weighted_encoder_outputs_grad_acc: tensor B
Now after timesteps, we swap (line 626) with _acc to get
gpu_0/decoder/weighted_encoder_outputs_grad: tensor B
gpu_0/decoder/weighted_encoder_outputs_grad_acc: tensor A
OPTION A -- batch size is same as before or smaller:
Then on next iteration, we do again the Alias to
gpu_0/decoder/decoder_hidden_encoder_outputs_sum_grad, so now
gpu_0/decoder/weighted_encoder_outputs_grad: tensor A
and also
gpu_0/decoder/weighted_encoder_outputs_grad_acc: tensor A
swapping them does nothing and they are the same
OPTION B -- batch size increases
gpu_0/decoder/decoder_hidden_encoder_outputs_sum_grad is reallocated,
becomes tensor C
gpu_0/decoder/weighted_encoder_outputs_grad becomes tensor C with
Alias
gpu_0/decoder/weighted_encoder_outputs_grad_acc: is tensor A
```
Reviewed By: urikz
Differential Revision: D4946730
Tags: rnn, caffe2
fbshipit-source-id: b52d63cb238b81d2ad40e05e70deb32a81336f47
Summary:
Added a forward_only option to recurrent_net and the RNNCells. If this is set, the backward_step_net is not passed to the operator.
When backward_step_net is not available, the operator knows it is in forward_only mode and does not create a workspace for each step, but instead cycles
through a single private workspace.
Note: we could avoid doing a lot of work in the recurrent.py:recurrent_network call when the backward step is not needed, but doing that nicely requires
more refactoring than I wanted to do now. Thus, we still create the backward step nets etc., we just don't pass them to the op.
This can be used to create more efficient inference models. You can also sanitize existing inference nets and remove the backward_step_net argument to
get the benefits.
Reviewed By: salexspb
Differential Revision: D4916482
fbshipit-source-id: c99b93c9cb897c32b0f449253f7f6d6a942618ad
Summary: This is the nice way to re-use RNN layers for training and for inference.
Reviewed By: salexspb
Differential Revision: D4825894
fbshipit-source-id: 779c69758cee8caca6f36bc507e3ea0566f7652a
Summary: Add an option to bias the forget gate one way or another by adding in some float value before the sigmoid is applied.
Differential Revision: D4880712
fbshipit-source-id: 1306a97c29fb31630838b2f96597a46e952d940a
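A minimal numpy sketch of the option, assuming a standard LSTM forget gate with the described float added before the sigmoid (weight names and shapes are illustrative):
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(x, h_prev, W_f, U_f, b_f, forget_bias=0.0):
    # Adding a positive constant before the sigmoid starts the gate near 1
    # ("remember by default"); a negative value biases it toward forgetting.
    return sigmoid(x @ W_f + h_prev @ U_f + b_f + forget_bias)

rng = np.random.default_rng(0)
x, h = rng.normal(size=(2, 3)), rng.normal(size=(2, 4))
W, U, b = rng.normal(size=(3, 4)), rng.normal(size=(4, 4)), np.zeros(4)
print(forget_gate(x, h, W, U, b, forget_bias=2.0).mean())  # pushed toward 1
```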
Summary:
prof_dag in step net is not supported
(Note: this ignores all push blocking failures!)
Differential Revision: D4876551
fbshipit-source-id: 4003e60908e51ef052f8656bf527b326676c298c
Summary:
This diff adds an option to recurrent_net to define some cell blobs to be recomputed on the backward step, so they don't need to be stored in the step workspace. This is done by modifying the backward step to automatically include all operators that are needed to produce the output that is to be recomputed, and by storing those blobs in a shared workspace. To enable the shared workspace, I had to modify the stepworkspaces blob to also store a forward shared workspace. Making it a class field won't work since the lifecycle of the blob does not match the lifecycle of the operator.
For basic LSTM, the performance hit is quite modest (about 15% with one setting, but your mileage may vary). For attention models, I am sure this is beneficial, as computing the attention blobs is not expensive.
For basic LSTM, the memory saving is wonderful: each forward workspace only has 4 bytes (for timestep).
I also modified the neural_mt LSTM Cells, but there is no test available, so I am not 100% sure I did it correctly. Please have a look.
Added options to LSTM, MILSTM and LSTMAttention to enable memory mode.
Reviewed By: urikz
Differential Revision: D4853890
fbshipit-source-id: d8d0e0e75a5330d174fbfa39b96d8e4e8c446baa
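To illustrate the recompute-on-backward idea with a toy tanh cell (numpy, not the real cell abstractions or workspace machinery): the forward pass keeps only inputs and hidden states, and the backward pass rebuilds the dropped pre-activations on the fly:
```python
import numpy as np

def forward_cheap(x, h0, W):
    # Store only what the backward pass cannot cheaply rebuild:
    # the inputs and the per-step hidden states.
    hs = [h0]
    for t in range(len(x)):
        hs.append(np.tanh(x[t] @ W + hs[-1]))
    return hs                      # no pre-activation blobs are kept

def backward_recompute(x, hs, W, dh_T):
    dh, dW = dh_T, np.zeros_like(W)
    for t in reversed(range(len(x))):
        pre = x[t] @ W + hs[t]     # recomputed, not loaded from a workspace
        dpre = dh * (1.0 - np.tanh(pre) ** 2)
        dW += x[t].T @ dpre
        dh = dpre                  # gradient w.r.t. hs[t]
    return dW

rng = np.random.default_rng(0)
x = [rng.normal(size=(2, 4)) for _ in range(3)]
W = rng.normal(size=(4, 4))
hs = forward_cheap(x, np.zeros((2, 4)), W)
print(backward_recompute(x, hs, W, np.ones((2, 4))).shape)  # (4, 4)
```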
Summary: As said in the title. This should save a lot of memory if using both train and test workflows.
Reviewed By: jhcross
Differential Revision: D4855436
fbshipit-source-id: 9eeca548eee118e07bd587c46f40e7beb138318e
Summary:
Quite a large diff to make cuDNN LSTM and our LSTM produce the same results and to provide a Python API for the cuDNN LSTM.
* Added operators RecurrentParamGet and RecurrentParamSet to access weights and biases for the different gates, input/recurrent.
* Removed RecurrentInit as not needed
* recurrent.cudnn_LSTM() returns a special net and mapping that can be used to retrieve the parameters from the LSTM
* recurrent.cudnn_LSTM() can be passed blobs that have the parameters for the individual gate weights and biases
* recurrent.InitFromLSTMParams() can be used to initialize our own LSTM from cuDNN params. This way we can test whether cuDNN and our own implementation produce the same result.
recurrent_test.py tests for equivalence.
Reviewed By: salexspb
Differential Revision: D4654988
fbshipit-source-id: 6c1547d873cadcf33e03b0e0110248f0a7ab8cb0
Summary:
Uses the cudnnTransformTensor function. It works by shuffling the strides according to the transpose axes. Significant speedup over the current GPU version.
+ moves the transpose test under utility_ops, because hypothesis_test is too big
Reviewed By: jamesr66a
Differential Revision: D4810993
fbshipit-source-id: 82577c4ced1389e70bd5992820ae4d8297a3817f
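The underlying observation, that a transpose is just a permutation of strides over the same buffer, is easy to see in numpy (illustrative only; the actual kernel here is cudnnTransformTensor):
```python
import numpy as np

x = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
axes = (2, 0, 1)

y = np.transpose(x, axes)                               # no data copied here
print(y.strides == tuple(x.strides[a] for a in axes))   # True: strides permuted
print(np.shares_memory(x, y))                           # True: same buffer

# A transform-tensor-style op then materializes the permuted layout in new memory.
y_contig = np.ascontiguousarray(y)
print(y_contig.strides)                                 # ordinary row-major strides
```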
Summary:
This is pretty tricky to explain, but we can just use
backward_links. This way the whole cell would use a blob from the
states_grad tensor instead of having its own blob. This should also
save a bit of memory.
Differential Revision: D4770798
fbshipit-source-id: 673f85b2c2fdf42c47feeaa24d1e2bf086f012f9
Summary: We accumulate the values of this blob (param_grad) in another special internal blob anyway.
Differential Revision: D4768643
fbshipit-source-id: a9d08b7eafd25f278a8db722f9cdb1d0064b852a
Summary: Apart from copying gradient blobs for inputs with initial_cell_input, we needed to perform a similar operation for external parameters used by the step net
Reviewed By: salexspb
Differential Revision: D4752259
fbshipit-source-id: 13ee48cf583ed86221a4cc1cc9f57f5c3a7d2450
Summary: This didn't work for a reason specified in the comments. Also some cleanup in the unit tests; inference now uses a custom workspace to run the cell net on.
Reviewed By: urikz
Differential Revision: D4742670
fbshipit-source-id: 04165c029fddec5ae31b20b207faf06d2fa20816
Summary: D4734505 part 2. Remove more instances of the batch_size parameter
Reviewed By: urikz
Differential Revision: D4736906
fbshipit-source-id: fc9d374e9308017d61c427890364c5ab9cec2edf
Summary: Reshape based on tensor shapes in the graph rather than based on a passed-in batch_size parameter
Reviewed By: urikz
Differential Revision: D4702086
fbshipit-source-id: c4c1d8425cd36c1e86695918eaba2667c27e9601
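In plain numpy terms, the change amounts to reading the batch dimension off the tensor itself instead of threading a batch_size argument through the model (shapes are illustrative):
```python
import numpy as np

def flatten_states(states):
    # states: (T, batch, dim) -- batch is read off the input, not passed in
    T, batch, dim = states.shape
    return states.reshape(T * batch, dim)

x = np.zeros((7, 3, 16))
print(flatten_states(x).shape)   # (21, 16); works for any batch size
```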
Summary: For example, test and train nets could have shared workspaces, leading to a race condition. This adds an assertion and a running counter to the workspace-blob name.
Reviewed By: jhcross
Differential Revision: D4712152
fbshipit-source-id: 808d7069095bac24ebfe0c9d31ebd134f4cf0956
Summary:
First, this diff includes a full test of data-parallel LSTM, which confirms it works correctly. To make it work, some changes had to be made:
- cell net/step net external inputs must be namespace scoped
- prevent double-namescoping of cellnet inputs
- make data parallel model understand recurrentnets so the device-mapping works
Reviewed By: salexspb
Differential Revision: D4708840
fbshipit-source-id: 4b0ddc43642d449076a2b6f67ad1c47f84138ff4
Summary:
Created a new function with the specifics of the MI LSTM implementation in Caffe2.
See https://arxiv.org/pdf/1606.06630.pdf for details.
See D4478877 for the implementation of the same in tensorflow
Reviewed By: jhcross
Differential Revision: D4669882
fbshipit-source-id: 095bbcf187dbdac2cd79558ff0c8f9f67d8af639
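For reference, my reading of the multiplicative-integration building block from the cited paper, as a minimal numpy sketch (names and shapes are illustrative, not the Caffe2 implementation):
```python
import numpy as np

def mi_preactivation(x, h, W, U, alpha, beta1, beta2, b):
    # Vanilla gate pre-activation:  x @ W + h @ U + b
    # Multiplicative integration (arXiv:1606.06630):
    #   alpha * (x @ W) * (h @ U) + beta1 * (h @ U) + beta2 * (x @ W) + b
    Wx, Uh = x @ W, h @ U
    return alpha * Wx * Uh + beta1 * Uh + beta2 * Wx + b

rng = np.random.default_rng(0)
x, h = rng.normal(size=(2, 3)), rng.normal(size=(2, 4))
W, U = rng.normal(size=(3, 4)), rng.normal(size=(4, 4))
ones, zeros = np.ones(4), np.zeros(4)
# With alpha=0 and beta1=beta2=1 this degenerates to the ordinary additive form.
print(np.allclose(mi_preactivation(x, h, W, U, zeros, ones, ones, zeros),
                  x @ W + h @ U))  # True
```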
Summary: Super rough implementation of recurrent attention. Planning to factor out the common code between the two functions as well as train and eval. I want to get this out and get eyes on it sooner rather than later
Differential Revision: D4647837
fbshipit-source-id: 54bc4e8ed0df6f04c86c425926decbe89f73b068
Summary:
Implementation of ##LSTMWithAttention##
Still TBD:
1. There are problems with backpropagation, because the gradient is not implemented for ops with broadcasting
2. I need to make initial_recurrent_state be of shape [dim] rather than [1, batch_size, dim], so one doesn't need to provide batch_size to LSTMWithAttention
Differential Revision: D4298735
fbshipit-source-id: 8903fcff4d6a66647ee6d45a6ef28803fc3091e5
Summary:
Pass through the h-value recurrent output unchanged at each LSTM step beyond the valid part of a sequence (computed based on seqLengths, allowing batching of sequences of different length). This enables using the final-step output of each sequence as the output when one vector is desired for the entire sequence. The gradient is also passed back unchanged.
Also made some cosmetic changes to recurrent_network_test.py (seq_lengths offset corrected, should be in [1, T] rather than [0, T-1]).
Reviewed By: urikz
Differential Revision: D4540307
fbshipit-source-id: 73a9f6326069d713dcb0cdc8d17869317c6dbe96
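A numpy sketch of the per-timestep rule described above (toy stand-in for the cell; seq_lengths holds each sequence's valid length in [1, T]):
```python
import numpy as np

def step_with_passthrough(t, h_prev, h_candidate, seq_lengths):
    # Within the valid part of a sequence take the freshly computed state;
    # past its end, copy the previous state through unchanged, so the last
    # timestep holds each sequence's final valid output.
    valid = (t < seq_lengths)[:, None]          # (batch, 1) boolean mask
    return np.where(valid, h_candidate, h_prev)

seq_lengths = np.array([3, 1])                  # batch of 2, max T = 3
h = np.zeros((2, 4))
for t in range(3):
    h_candidate = np.full((2, 4), float(t + 1)) # stand-in for the real cell output
    h = step_with_passthrough(t, h, h_candidate, seq_lengths)
print(h[:, 0])                                  # [3. 1.]: each sequence's final valid step
```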
Summary:
(Caffe2) Modified RecurrentNetworkGradient operator so that training is possible with any of the output blob(s) receiving gradient during the backward pass. This is realized through a new argument for the RecurrentNetwork op, outputs_with_grads, which takes a list of the indices of the output blobs which will receive gradient. The default case (only receiving gradient from the first output blob) remains the default.
New unit test covers the case where outputs_with_grads = [1, 2] using Python LSTM wrapper.
Reviewed By: urikz
Differential Revision: D4518516
fbshipit-source-id: 5c531582b20f3cf727d1aa91239b4d5a2b8a7c1f
Summary: Updates the function revise_recurrent_network_op(), which supports cloning recurrent networks by adding a blob-name prefix to string arguments to maintain correspondence. It previously relied on many hard-coded indices referring to the positions of arguments and inputs of RecurrentNetworkOp and its corresponding gradient operator, and therefore broke when the implementation changed. This fix should make it more general and robust.
Differential Revision: D4559768
fbshipit-source-id: fb85b0b1ffb1393dc84760d6ae5dc473e8b764b0