Commit Graph

35 Commits

Author SHA1 Message Date
Bugra Akyildiz
27c7158166 Remove __future__ imports for legacy Python2 support (#45033)
Summary:
The `2to3` tool has a `future` fixer that specifically removes these imports; the `caffe2` directory has the largest number of these redundant imports:

```2to3 -f future -w caffe2```
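
For illustration, a minimal (hypothetical) module before running the fixer; the `future` fixer simply deletes the `from __future__ import ...` statements, which are no-ops on Python 3:

```python
# before.py -- hypothetical example; these imports do nothing on Python 3
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

def halve(x):
    return x / 2  # true division on Python 3 with or without the import

# Running `2to3 -f future -w before.py` rewrites the file in place,
# leaving only the function definition behind.
```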

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033

Reviewed By: seemethere

Differential Revision: D23808648

Pulled By: bugra

fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
2020-09-23 17:57:02 -07:00
Xiaomeng Yang
9243b64bff
[Caffe2] Update elementwise ops to support numpy style broadcast (#8070)
* Update elementwise ops to support numpy style broadcast

Update elementwise ops to support numpy-style broadcast (a minimal sketch of the broadcasting semantics follows after the change list below).

* Fix sqrt_op

* Fix compare ops

* Fix gradient test

* Fix optimizer legacy broadcast

* Fix legacy broadcast for elementwise ops

* Skip flaky test

* Fix eigen simple binary op

* Fix attention test

* Fix rnn test

* Fix LSTM test

* Fix tan grad

* Fix schema check
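
For reference, a minimal sketch of the numpy-style broadcast semantics these elementwise ops adopt: shapes are aligned from the trailing dimensions, and size-1 (or missing) dimensions are stretched. This uses numpy purely for illustration, not the Caffe2 operators themselves:

```python
import numpy as np

a = np.ones((2, 3, 4))   # shape (2, 3, 4)
b = np.arange(4.0)       # shape (4,)    -> treated as (1, 1, 4)
c = np.ones((3, 1))      # shape (3, 1)  -> treated as (1, 3, 1)

# Trailing dimensions are matched; size-1 dimensions are stretched.
print((a + b).shape)     # (2, 3, 4)
print((a * c).shape)     # (2, 3, 4)
```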
2018-06-05 15:49:16 -07:00
Orion Reblitz-Richardson
1d5780d42c Remove Apache headers from source.
* LICENSE file contains details, so removing from individual source files.
2018-03-27 13:10:18 -07:00
Yangqing Jia
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
Aapo Kyrola
fb45383ed6 resubmission of PR1175: fp16 BatchMatMul
Summary: PR 1175 caused a build error because gemmBatched was only available under a specific #ifdef. It is now placed outside the #ifdef, and things work.

Reviewed By: asaadaldien

Differential Revision: D5834868

fbshipit-source-id: 072a64c8f4b259ff7504104121766115b46b8aa0
2017-09-14 21:46:05 -07:00
Yangqing Jia
f0d0361609 Revert D5794634: [caffe2][PR] fp16: BatchMatMul
Summary:
This reverts commit 911c462824edec3de529a5a4385a4c437e24bf59

bypass-lint

Differential Revision: D5794634

fbshipit-source-id: 1863b02282329cbee6b10e5870f03051b4bb6c58
2017-09-13 18:46:47 -07:00
Luke Yeager
3cfc6f26e7 fp16: BatchMatMul
Summary:
Was https://github.com/caffe2/caffe2/pull/1151
Closes https://github.com/caffe2/caffe2/pull/1175

Reviewed By: Yangqing

Differential Revision: D5794634

Pulled By: akyrola

fbshipit-source-id: 911c462824edec3de529a5a4385a4c437e24bf59
2017-09-13 14:35:25 -07:00
James Cross
53ccbd9a6e soft-coverage attention
Summary:
Implementation of a new variant of the attention module, which adds a recurrent decoder state with a vector for each source-side word whose values are strictly increasing, enabling it to model the degree to which each source word has already been translated.

The approach is a variant of the approaches described in https://arxiv.org/pdf/1601.04811.pdf. We simply include the sum of all previous attention weights for the encoder words as a new recurrent state (coverage_t). A new linear transform on encoder_outputs is used to produce coverage_weights, which has the same dimensionality as encoder_outputs and implicitly models the fertility of source-side words (placing this extra informational burden on the encoder network).

Thus the encoder output, the decoder state, and the coverage weights have the same dimensionality for a given source word, and attention logits are calculated as v * tanh(coverage * coverage_weights + encoder_output + decoder_state).

Note: the entire coverage state for each translation instance is of shape (encoder_length, coverage_units), but the states for the RecurrentNetwork operator, used to train the decoder, must be flat in the data dimension. This state is therefore initialized with shape (encoder_length * coverage_units) [not shown in the open-source library] and reshaped appropriately within the apply_soft_coverage_attention() function.
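
A minimal numpy sketch of the logit computation described above (names and shapes are illustrative, not the Caffe2 operator graph):

```python
import numpy as np

def soft_coverage_logits(encoder_outputs, coverage_weights, decoder_state, coverage, v):
    """encoder_outputs:  (src_len, dim)   projected encoder states
       coverage_weights: (src_len, dim)   linear transform of encoder_outputs
       decoder_state:    (dim,)           current decoder hidden state
       coverage:         (src_len,)       running sum of past attention weights
       v:                (dim,)           scoring vector"""
    hidden = np.tanh(coverage[:, None] * coverage_weights
                     + encoder_outputs
                     + decoder_state)   # (src_len, dim)
    return hidden @ v                   # (src_len,) attention logits

# After softmax-ing the logits into attention weights, the coverage state is
# updated recurrently: coverage += attention_weights.
```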

Differential Revision: D5593617

fbshipit-source-id: 7d0522b5eb0b26f22e8429e4461a459f2f16ed46
2017-08-31 21:21:54 -07:00
Juan Miguel Pino
434fa7f694 Reduce memory usage for dot attention
Summary: Title

Differential Revision: D5569996

fbshipit-source-id: c705fc7870ac3e71a071c3f808ac885a82334af2
2017-08-14 12:35:50 -07:00
James Reed
ffd9316b03 Use SequenceMask op in attention code for sequence masking
Summary: Use the new SequenceMask op to mask out invalid positions in the attention mechanism rather than using PackSegments and UnpackSegments. This should help us on several fronts, including eliding host<->device copies and using fewer intermediate blobs.

Differential Revision: D5619156

fbshipit-source-id: e59c644236cee02f853d8743f9a938fb10adc73b
2017-08-12 19:17:49 -07:00
James Cross
4758bd851b rectify args btw. train and translate
Summary: Make the command-line arguments pertaining to model architecture the same between train.py and translate.py. Also use the s() scoping function for all intermediate blobs in attention.py (this is for compatibility with multi-headed attention).

Differential Revision: D5594312

fbshipit-source-id: cadf51d854b5a9174ec913f32c655be2abf111e5
2017-08-10 15:27:18 -07:00
Juan Miguel Pino
4d8a8c2e1e Implement dot attention
Summary:
Implement dot attention as described in https://arxiv.org/abs/1508.04025
This saves the computation of weighted encoder outputs in `rnn_cell.py`.
When the encoder and decoder dimensions are different, we apply an FC layer, which corresponds to the 'general' case below Figure 2.
Refactored unit tests.
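
A minimal numpy sketch of dot attention as described in the paper ('general' form when a projection is needed); names are illustrative and this is not the rnn_cell.py code:

```python
import numpy as np

def dot_attention_logits(decoder_state, encoder_outputs, proj=None):
    """decoder_state: (d_dec,); encoder_outputs: (src_len, d_enc).
    The plain dot form needs d_dec == d_enc; the 'general' form inserts an
    FC projection (proj: (d_dec, d_enc)) when the dimensions differ."""
    if proj is not None:
        decoder_state = decoder_state @ proj      # project to encoder dim
    return encoder_outputs @ decoder_state        # (src_len,) logits

def attention_context(decoder_state, encoder_outputs, proj=None):
    logits = dot_attention_logits(decoder_state, encoder_outputs, proj)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                      # softmax over source positions
    return weights @ encoder_outputs              # weighted sum of encoder outputs
```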

Reviewed By: jhcross

Differential Revision: D5486976

fbshipit-source-id: f9e9aea675b3b072fbe631bc004199b90a9d95cb
2017-08-06 11:50:16 -07:00
James Cross
99e79a616b attention with encoder_lengths
Summary:
For RNN attention, we should not include the invalid parts of the encoder output (based on encoder_lengths) in the computation. This diff accomplishes that by forcing logits for those positions to be negative infinity.

Note that this step can be bypassed by passing encoder_lengths=None, which is what we do for beam search, thus incurring no extra overhead for inference.
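
A minimal numpy sketch of the masking described above (illustrative only): positions at or beyond each sequence's length get -inf logits, so softmax assigns them zero weight, and the whole step is skipped when encoder_lengths is None:

```python
import numpy as np

def mask_logits_by_length(logits, encoder_lengths):
    """logits: (batch, src_len) float; encoder_lengths: (batch,) ints or None."""
    if encoder_lengths is None:          # e.g. beam search: no masking overhead
        return logits
    positions = np.arange(logits.shape[1])                       # (src_len,)
    invalid = positions[None, :] >= encoder_lengths[:, None]     # (batch, src_len)
    return np.where(invalid, -np.inf, logits)

# exp(-inf) == 0, so padded positions receive exactly zero attention weight.
```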

Reviewed By: jamesr66a

Differential Revision: D5402547

fbshipit-source-id: 1863d6050b5129e4df829c6357f0aa9ded0715dc
2017-07-23 10:06:01 -07:00
James Cross
29887f556f Unrolled test for AttentionCell
Summary: Adding a test to check computational integrity of networks constructed with AttentionCell using UnrolledCell.

Reviewed By: salexspb

Differential Revision: D5306915

fbshipit-source-id: 02acfd1011f7d3ee5fac21cc2778c4a486190c43
2017-06-25 17:21:24 -07:00
Aapo Kyrola
2a9cb7d4a9 use brew for Transpose --> major perf regression fix
Summary: I noticed by accident that we were calling the non-CUDNN version of Transpose with attention, and it is super slow. This broke when rnn_cell was changed to use ModelHelper instead of CNNModelHelper in D5062963, but calls to transpose were not "brewed".

Reviewed By: jamesr66a

Differential Revision: D5264248

fbshipit-source-id: b61494ae210f34597245f1195d20547f5b5cd8b5
2017-06-16 11:02:48 -07:00
Thomas Dudziak
60c78d6160 Fixes range/xrange for Python 3
Summary: As title

Differential Revision: D5151894

fbshipit-source-id: 7badce5d3122e8f2526a7170fbdcf0d0b66e2638
2017-06-07 00:04:26 -07:00
Alexander Sidorov
016f72537a ModelHelper.create_param, Initializer abstraction and ParameterInfo for optimizers
Summary:
This is going to unblock Nvidia in their work on adding fp16
support to Caffe2. I discussed this with kennyhorror before to make
sure this fits into his work on parameter sharing.

Reviewed By: kennyhorror

Differential Revision: D5127797

fbshipit-source-id: 4db155d320b1862570c23b77c4252bdacbf2296f
2017-05-25 22:03:15 -07:00
Yiming Wu
a28b01c155 rnn with brew
Summary:
Update the rnn_cell.py and char_rnn.py examples to the new `brew` model helpers (a minimal brew sketch follows below).

- Deprecate CNNModelHelper
- Replace all helper functions with brew helper functions
- Use the `model.net.<SingleOp>` format to create bare-bones Operators for better clarity.
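
A minimal sketch of the brew style this diff moves to, assuming a working caffe2 Python install (blob names and dimensions are made up):

```python
from caffe2.python import brew, model_helper

model = model_helper.ModelHelper(name="rnn_example")

# brew helper: creates the FC op and registers its weight/bias parameters.
fc_out = brew.fc(model, "input", "fc_out", dim_in=64, dim_out=32)

# Bare-bones operator created directly on the net (no parameters added).
relu_out = model.net.Relu(fc_out, "relu_out")
```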

Reviewed By: salexspb

Differential Revision: D5062963

fbshipit-source-id: 254f7b9059a29621027d2b09e932f3f81db2e0ce
2017-05-16 13:33:44 -07:00
Yury Zemlyanskiy
48de1ea165 Drop extra Reshape in attention calculation
Summary: We can avoid this extra Reshape.

Reviewed By: jamesr66a

Differential Revision: D5032874

fbshipit-source-id: 92bd568bc6bec53d7f81a64cfa96d2c610823f8c
2017-05-09 17:16:36 -07:00
Yury Zemlyanskiy
b6a8dd1438 don't recompute small blob in attention
Summary: decoder_hidden_encoder_outputs_sum_tmp is tiny after D5010109, so there is no need to recompute it.

Reviewed By: akyrola

Differential Revision: D5014335

fbshipit-source-id: cc9e8f91372889d10bd99c79366018cb3943a435
2017-05-08 13:06:06 -07:00
Yury Zemlyanskiy
d7f20c94fd Optimize memory for RNN attention
Summary:
The fix should save us (source_len - 1) * target_len * batch_size * encoder_output_size * 4 bytes in the forward pass. Typically, these values are 100 * 100 * 128 * 512 * 4 ≈ 2.4 GB.
Not entirely sure about the backward pass.
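
As a quick check of the figure quoted above (pure arithmetic, using the typical sizes from the summary):

```python
source_len, target_len, batch_size, encoder_output_size = 100, 100, 128, 512
bytes_per_float = 4

saved = (source_len - 1) * target_len * batch_size * encoder_output_size * bytes_per_float
print(saved / 2**30)  # ~2.4 GiB; the summary rounds (source_len - 1) up to 100
```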

Reviewed By: akyrola

Differential Revision: D5010109

fbshipit-source-id: 2ed68f3ebfd3b8362916d24af991482f1686e064
2017-05-05 12:18:50 -07:00
James Cross
51033f19d7 unbreak test_seq2seq_caffe2_model_cnn_one_stack_encoder
Summary: Fixes unit test test_seq2seq_caffe2_model_cnn_one_stack_encoder, broken by D4905003. (Also some commas.)

Differential Revision: D4920699

fbshipit-source-id: 2fe501095e3e26a475d666afcae8e48c953f2eef
2017-04-20 10:06:25 -07:00
Aapo Kyrola
1e5140aa76 option to recompute blobs backward pass with massive memory savings
Summary:
This diff adds an option to recurrent_net to define some cell blobs to be recomputed on the backward step, so that they don't need to be stored in the step workspace. This is done by modifying the backward step to automatically include all operators that are needed to produce the output that is to be recomputed, and by storing those blobs in a shared workspace. To enable the shared workspace, I had to modify the stepworkspaces blob to also store a forward shared workspace. Making it a class field won't work since the lifecycle of the blob does not match the lifecycle of the operator.

For basic LSTM, the performance hit is quite modest (about 15% with one setting, but your mileage might vary). For attention models, I am sure this is beneficial, as computing the attention blobs is not expensive.

For basic LSTM, the memory saving is wonderful: each forward workspace only has 4 bytes (for timestep).

I also modified the neural_mt LSTM Cells, but there is no test available, so I am not 100% sure I did it correctly. Please have a look.

Added options to LSTM, MILSTM and LSTMAttention to enable memory mode.
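
A minimal numpy sketch of the recompute-on-backward idea (not the caffe2 RecurrentNetwork machinery): only the recurrent state is kept per step, and a cheap intermediate (here, the pre-activation) is recomputed during the backward pass instead of being stored:

```python
import numpy as np

def forward(x_seq, W):
    h = np.zeros(W.shape[0])
    saved_h = [h]
    for x in x_seq:
        h = np.tanh(W @ h + x)       # the pre-activation is NOT stored
        saved_h.append(h)
    return saved_h

def backward(x_seq, W, saved_h, grad_out):
    grad_W = np.zeros_like(W)
    dh = grad_out
    for t in reversed(range(len(x_seq))):
        h_prev = saved_h[t]
        pre = W @ h_prev + x_seq[t]            # recomputed, not loaded
        dpre = dh * (1.0 - np.tanh(pre) ** 2)  # derivative of tanh
        grad_W += np.outer(dpre, h_prev)
        dh = W.T @ dpre
    return grad_W

W = 0.1 * np.random.randn(8, 8)
xs = [np.random.randn(8) for _ in range(5)]
grad_W = backward(xs, W, forward(xs, W), grad_out=np.ones(8))
```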

Reviewed By: urikz

Differential Revision: D4853890

fbshipit-source-id: d8d0e0e75a5330d174fbfa39b96d8e4e8c446baa
2017-04-11 13:03:48 -07:00
James Reed
66d00b3a63 Use CUDNN softmax implementation
Summary: The caffe2 implementation of bare Softmax() has a race condition that wipes out the numerical stability trick. Use the CUDNN implementation instead.
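
For context, the "numerical stability trick" is subtracting the per-row max before exponentiation; a minimal numpy sketch (not the caffe2 or cuDNN kernel):

```python
import numpy as np

def stable_softmax(logits):
    # Subtracting the row max keeps exp() from overflowing for large logits.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

print(stable_softmax(np.array([[1000.0, 1001.0, 1002.0]])))  # no overflow
```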

Reviewed By: urikz

Differential Revision: D4831298

fbshipit-source-id: d11b1de700e3954629e7ed43225a2416c27b3252
2017-04-04 20:02:21 -07:00
Aapo Kyrola
e13e9c1302 cuDNN version of TransposeOp
Summary:
Uses the cudnnTransformTensor function. It works by shuffling the strides according to the transpose axes. Significant speedup over the current GPU version.
+ moves the transpose test under utility_ops, because hypothesis_test is too big
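
A minimal numpy sketch of the stride-shuffling idea (illustrative only; cudnnTransformTensor does the equivalent on the GPU by permuting the tensor descriptor's dimensions and strides, then copying densely):

```python
import numpy as np

def transpose_via_strides(x, axes):
    # Describe the transposed tensor by permuting dims and byte strides,
    # then materialize it with a single dense copy.
    dims = tuple(x.shape[a] for a in axes)
    strides = tuple(x.strides[a] for a in axes)
    view = np.lib.stride_tricks.as_strided(x, shape=dims, strides=strides)
    return np.ascontiguousarray(view)

x = np.arange(24).reshape(2, 3, 4)
assert np.array_equal(transpose_via_strides(x, (2, 0, 1)), x.transpose(2, 0, 1))
```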

Reviewed By: jamesr66a

Differential Revision: D4810993

fbshipit-source-id: 82577c4ced1389e70bd5992820ae4d8297a3817f
2017-04-03 13:33:10 -07:00
Aaron Markham
58f7f2b441 doxygen python block added
Summary: Closes https://github.com/caffe2/caffe2/pull/226

Differential Revision: D4793550

Pulled By: JoelMarcey

fbshipit-source-id: cc33e58186304fa8dcac2ee9115dcc271d785b1e
2017-03-29 06:46:16 -07:00
Deepak Gopinath
422c65ca35 Removing unnecessary Copy after fixing gradients for external parameters
Summary: Apart from copying gradient blobs for inputs with initial_cell_input, we needed to perform a similar operation for external parameters used by the step net

Reviewed By: salexspb

Differential Revision: D4752259

fbshipit-source-id: 13ee48cf583ed86221a4cc1cc9f57f5c3a7d2450
2017-03-23 15:04:22 -07:00
Yury Zemlyanskiy
ea66516d5e Output attention weights from apply_xxx_attention methods
Summary: OSS diff. We need it later for beam decoding.

Differential Revision: D4747785

fbshipit-source-id: ce2d53ee2434216ace3c4ddbd40a9b68e9db7ec5
2017-03-21 19:01:58 -07:00
Yury Zemlyanskiy
93ff338ca7 Beam decoder for NMT in Caffe2
Summary: yolo5

Differential Revision: D4685076

fbshipit-source-id: b5534e441bb453f90e5210294f2dfff6b5c3b5b1
2017-03-20 22:03:59 -07:00
James Reed
33f41c06c0 Remove more instances of batch_size
Summary: D4734505 part 2. Remove more instances of the batch_size parameter

Reviewed By: urikz

Differential Revision: D4736906

fbshipit-source-id: fc9d374e9308017d61c427890364c5ab9cec2edf
2017-03-19 22:31:30 -07:00
James Reed
17da5856ed Remove batch_size parameter from attention and LSTMWithAttention interfaces
Summary: Reshape based on tensor shapes in the graph rather than based on a passed-in batch_size parameter
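
A minimal numpy sketch of the idea (illustrative; the diff does the equivalent with shape information taken from the graph rather than a caller-supplied batch_size):

```python
import numpy as np

def flatten_time_and_batch(x):
    """x: (seq_len, batch, dim). Collapse the first two axes; the batch size is
    inferred from the data via -1 instead of being passed in as a parameter."""
    dim = x.shape[-1]
    return x.reshape(-1, dim)   # (seq_len * batch, dim)

print(flatten_time_and_batch(np.zeros((7, 5, 16))).shape)  # (35, 16)
```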

Reviewed By: urikz

Differential Revision: D4734505

fbshipit-source-id: d9c23d85be84f61124106e752ef2b4f6945e2a07
2017-03-19 18:16:28 -07:00
Yury Zemlyanskiy
d1424c3265 Revert D4702086: Remove batch_size parameter from attention and LSTMWithAttention interfaces
Summary: This reverts commit c4c1d8425cd36c1e86695918eaba2667c27e9601

Differential Revision: D4702086

fbshipit-source-id: 4620610b182bb84b9297b5de32782761ae89d20b
2017-03-17 17:36:47 -07:00
James Reed
10d95bd0f0 Remove batch_size parameter from attention and LSTMWithAttention interfaces
Summary: Reshape based on tensor shapes in the graph rather than based on a passed-in batch_size parameter

Reviewed By: urikz

Differential Revision: D4702086

fbshipit-source-id: c4c1d8425cd36c1e86695918eaba2667c27e9601
2017-03-16 11:47:52 -07:00
James Reed
8de1db9eb6 Implement recurrent attention in C2
Summary: Super rough implementation of recurrent attention. Planning to factor out the common code between the two functions as well as train and eval. I want to get this out and get eyes on it sooner rather than later

Differential Revision: D4647837

fbshipit-source-id: 54bc4e8ed0df6f04c86c425926decbe89f73b068
2017-03-08 11:21:28 -08:00
Yury Zemlyanskiy
4a53ab3cb6 LSTMWithAttention implementation in Caffe2
Summary:
Implementation of `LSTMWithAttention`

Still TBD:
1. There are problems with backpropagation, because the gradient is not implemented for ops with broadcasting
2. I need to make initial_recurrent_state be of shape [dim] rather than [1, batch_size, dim], so one doesn't need to provide batch_size to LSTMWithAttention

Differential Revision: D4298735

fbshipit-source-id: 8903fcff4d6a66647ee6d45a6ef28803fc3091e5
2017-02-23 04:08:34 -08:00