Summary: This diff uses stack workspaces in RecurrentNetwork, which simplifies the implementation and gets rid of scratch blobs.
Reviewed By: salexspb
Differential Revision: D4446813
fbshipit-source-id: 514eec7e4300bdf492a9cb192b40cf4f89acf656
Summary:
It's broken because it relies on add_sparse_bias.
It's not easy to do add_sparse_bias after the switch to loader_param.
DPA would like to try it out :)
Differential Revision: D4447275
fbshipit-source-id: 631cb4995f35383070e44387dc86692ba64b91eb
Summary: Remove usage of recurrent_sizes, so that recurrent states' sizes can depend on the input (e.g. the attention matrix for the beam decoder). I removed recurrent_sizes from the forward and backward steps.
Reviewed By: salexspb
Differential Revision: D4427688
fbshipit-source-id: 580420a294d309c86ec5cb4e677058623b7228e1
Summary:
In this diff I stop passing parameters by name and also remove the hardcoded output ids which were there specifically to make LSTM work. This also makes it possible to avoid using recurrent_sizes in the backward pass (for the forward pass this is done in D4427688).
Using a similar technique it should be simple enough to eliminate blob name passing entirely. Then we can fix scoping. These can be done in a follow-up diff.
Reviewed By: urikz
Differential Revision: D4444614
fbshipit-source-id: 3580a76365502b9f2f09e3d8b7e78084ca739f00
Summary:
A new operator is added for model calibration. Given a piecewise linear function and raw predictions as input, it generates the calibrated mapping as output.
Details can be found in the operator doc.
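As an illustration only (not the operator's actual implementation), here is a minimal numpy sketch of what a piecewise linear calibration mapping does; the knot values are made up:

    # Illustrative sketch: piecewise linear calibration of raw predictions.
    import numpy as np

    bounds = np.array([0.0, 0.1, 0.4, 1.0])  # raw-prediction knots (x-coordinates)
    values = np.array([0.0, 0.3, 0.8, 1.0])  # calibrated output at each knot

    raw_predictions = np.array([0.05, 0.2, 0.7, 0.95])

    # Between consecutive knots the mapping is linear; outside the knots the
    # output is clamped to the first/last value.
    calibrated = np.interp(raw_predictions, bounds, values)
    print(calibrated)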
Differential Revision: D4418640
fbshipit-source-id: f8ff3ea786b0fe233a4ddcb709e5dbf0861ca484
Summary:
This is a handy tool for amortizing expensive operators (e.g.
distributed communication, some heavier kernel launches, etc.) over a
lot of small blobs (e.g. all the biases in a network). We can
coalesce these small blobs in-place into a single blob, act on them in
operators as if they were not coalesced (passing them as inputs to
operators, etc.), and then, for the heavier operators, just work on
the coalesced blob that contains each of these units.
I named it UnsafeCoalesce since it introduces blob aliasing, which
requires care in work such as memory management and graph rewriting
(as in memonger).
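For intuition, a rough sketch of the coalescing idea in plain numpy (this is not the Caffe2 operator itself; blob names are made up):

    import numpy as np

    biases = {
        "fc1_b": np.zeros(16, dtype=np.float32),
        "fc2_b": np.zeros(32, dtype=np.float32),
        "fc3_b": np.zeros(8, dtype=np.float32),
    }

    # Pack all of the small blobs into one contiguous buffer.
    coalesced = np.concatenate([b.ravel() for b in biases.values()])

    # Rebuild each blob as a view (alias) into the coalesced storage.
    views, offset = {}, 0
    for name, b in biases.items():
        views[name] = coalesced[offset:offset + b.size].reshape(b.shape)
        offset += b.size

    # Cheap per-blob work keeps using the views...
    views["fc1_b"] += 1.0
    # ...while an expensive operation (e.g. one collective) touches the whole
    # buffer once.
    coalesced *= 0.5

    # Because the views alias the coalesced buffer, both updates are visible.
    print(views["fc1_b"][:3])  # -> [0.5 0.5 0.5]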
Reviewed By: Yangqing
Differential Revision: D3557149
fbshipit-source-id: 09cff4459b84270fe9e1da3b4a168fd66d01f795
Summary:
This is a first step in improving our RNN story. It provides a wrapper around the current RecurrentNetworkOp implementation which infers most of the redundant parameters and makes the API much simpler.
Also, in order to support general step nets, I added an extra argument to RecurrentNetworkOp.
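As a toy, self-contained sketch of the wrapping idea (nothing below is the real Caffe2 API; all names are hypothetical): a verbose low-level call takes many redundant arguments, and a thin wrapper infers them from the essentials so callers don't have to spell them out.

    def recurrent_op_verbose(step_inputs, recurrent_states, recurrent_sizes, links):
        # Stand-in for the low-level, parameter-heavy op; it just echoes its config.
        return {"step_inputs": step_inputs, "states": recurrent_states,
                "sizes": recurrent_sizes, "links": links}

    def simple_rnn(inputs, hidden_dim):
        # The wrapper derives the redundant parameters from `inputs` and
        # `hidden_dim` instead of making the caller provide each one.
        recurrent_states = ["hidden"]
        recurrent_sizes = [hidden_dim]
        links = [("hidden_prev", "hidden")]  # carry the state across timesteps
        return recurrent_op_verbose(inputs + ["hidden_prev"], recurrent_states,
                                    recurrent_sizes, links)

    print(simple_rnn(["x_t"], hidden_dim=128))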
Future work:
1. Inferring the sizes and types of step net outputs and internal blobs (scratches)
2. Avoid accessing blobs by name in the C++ part
3. Remove the requirement of a 1:1 correspondence between inputs and outputs in the step net
4. Make the Python API support networks with operators like Sum on the border of the Cell net (currently such networks have an issue where gradient blobs on the side are not explicitly created).
Differential Revision: D4268503
fbshipit-source-id: f8a66491c2b55daa730caeed7e9f2b3921541b49
Summary:
This adds Caffe2 support for MKL operators directly with MKLMemory. Included is a
Relu layer that shows how to use it.
Reviewed By: salexspb
Differential Revision: D4322144
fbshipit-source-id: 8b3392c4fd024ab1a7ba7135c349ebd3e1976799
Summary:
The float64 test breaks things on the CUDA side. I am deleting it for now; if
we add it back, let's make sure we run the test on a GPU machine first :)
Reviewed By: azzolini
Differential Revision: D4324427
fbshipit-source-id: 0246fe9dd28a286422ca94c90f5b0fc33a162e74
Summary: Allows collecting samples over multiple batches. The method uses a circular array, so there is no guarantee about the order of the samples. The goal is to get a view of the data across multiple batches.
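A sketch of the circular-buffer behaviour in plain numpy (illustrative only, not the actual implementation):

    import numpy as np

    capacity = 8
    buffer = np.zeros(capacity, dtype=np.float32)
    cursor, seen = 0, 0

    def collect(batch):
        # Overwrite the oldest entries once the buffer is full, so ordering
        # within the buffer is not preserved.
        global cursor, seen
        for x in batch:
            buffer[cursor] = x
            cursor = (cursor + 1) % capacity
            seen += 1

    collect(np.arange(5))      # first batch
    collect(np.arange(5, 11))  # second batch wraps around and overwrites
    print(buffer[:min(seen, capacity)])  # a view of recent data, order not guaranteed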
Reviewed By: salexspb
Differential Revision: D4216181
fbshipit-source-id: bb9e1fa84ac7e04006dcddb53c9347a42ec83dc8
Summary: Used in the NNPreProc layers. An empty batch causes online training to fail.
Reviewed By: dzhulgakov
Differential Revision: D4235498
fbshipit-source-id: bde00a011831762e44a3f9bf2190d4b241a06ccc
Summary: Each sparse feature is an ID list, and usually the position of an ID in the list is meaningful: the earlier the ID appears in the list, the more important it is. In this diff, we multiply each embedding by a weight, where the weight corresponds to the position. With this change, the same ID appearing at different positions will have a different norm/length/importance after aggregation. The firstX transformation in sigrid is a special case of this model where the weights before n are 1 and 0 after n, where n is the argument of firstX.
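An illustrative numpy sketch of position-weighted pooling (table sizes, IDs, and weights below are made up):

    import numpy as np

    vocab_size, emb_dim = 10, 4
    embedding_table = np.random.randn(vocab_size, emb_dim)

    id_list = [3, 7, 3]                           # ID 3 appears at positions 0 and 2
    position_weights = np.array([1.0, 0.6, 0.3])  # learned, one weight per position

    # Position-weighted sum pooling: the same ID contributes differently
    # depending on where it appears in the list.
    pooled = (position_weights[:, None] * embedding_table[id_list]).sum(axis=0)

    # firstX(n) is the special case where the first n weights are 1 and the rest 0.
    n = 2
    firstx_weights = (np.arange(len(id_list)) < n).astype(np.float32)
    firstx_pooled = (firstx_weights[:, None] * embedding_table[id_list]).sum(axis=0)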
Reviewed By: xianjiec
Differential Revision: D4181251
fbshipit-source-id: 2a6f8b7240af445b6bd2052fd24c2d99f39ee7ff