Summary:
This diff fixes various issues with memonger, and works at least with rbgirshick's failure case, ResNet-50, and a new, harder unit test. I will still create a proper resnet50 test.
1) Introduce the concept of "tokens". These are passed down the dependency chains, and a blob can be recycled only if it holds all the tokens currently in play. Tokens are added when branching and redeemed once all inputs are satisfied. The idea is easier to see in code; see the sketch after this list.
2) Fixed various bugs caused by sloppy code: the free_blobs data structure has a different type depending on whether blob sizes are available. I plan to rewrite this soon.
3) Added a harder unit test that failed before.
4) Added test for resnet50 + memonger
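To illustrate the token rule from item 1, here is a minimal sketch; the names and data structures are invented for illustration and are not the actual memonger code:
```
# Illustrative sketch of the token rule only, not the real memonger bookkeeping.
def can_recycle(candidate_tokens, live_tokens):
    """A freed blob may be reused only if it holds every token currently
    live on the dependency chain, i.e. every open branch that could still
    read it has already been accounted for."""
    return live_tokens.issubset(candidate_tokens)

# Two branches (tokens 1 and 2) are live: a blob that has only seen token 1
# cannot be recycled yet; one that has collected both can.
assert not can_recycle({1}, {1, 2})
assert can_recycle({1, 2}, {1, 2})
```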
Reviewed By: asaadaldien
Differential Revision: D5193393
fbshipit-source-id: bc2a714877aa1201c32a5ba8ade862865e455711
Summary: I broke resnet50 when switching to use optimizer, which uses an LR per parameter. This only happens after each epoch, and I did not test patiently enough. As a stop-gap, while asaadaldien works on a better solution, just fetch the LR of the conv1_w param.
Reviewed By: asaadaldien
Differential Revision: D5207552
fbshipit-source-id: f3474cd5eb0e291a59880e2834375491883fddfc
Summary:
This diff plans to attack the problem where we want to just annotate device options for operators and let Caffe2 inject the cross-device copy functions for us. This feature would be useful for mixed-device training and multi-device training with several nets, where previously we did the heavy lifting of adding copy functions ourselves.
Ideally, this feature will be used like this:
# construct your nets first
core.InjectDeviceCopyAmongNets([train_init, train_net, ...])
My ideas are written in comments. I will update them here as well later.
Reviewed By: dzhulgakov
Differential Revision: D5134103
fbshipit-source-id: 173f7da9d1773d1c50ccdc27f1b5cd3067b04af5
Summary: Catch exceptions when fetching uninitialized blobs while collecting blob sizes in the workspace. Some output blobs (like the mask output of DropOut when is_test=1) may be nullptr, and FetchBlob will fail on them.
Differential Revision: D5198641
fbshipit-source-id: 45ee26c4cb1c25cc48904e9f7d7c007224c97418
Summary: Implements an APMeter operator (APMeterOp) to calculate AP for multiclass classification given prediction scores and labels. The op takes a score tensor [nsamples x nclasses] and a label tensor [nsamples x nclasses], and outputs a float tensor of size nclasses with the AP for each class.
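A hedged usage sketch; the registered op name "APMeter" and the int32 label dtype are my assumptions, so check the op schema:
```
import numpy as np
from caffe2.python import core, workspace

# 8 samples, 4 classes
scores = np.random.rand(8, 4).astype(np.float32)
labels = (np.random.rand(8, 4) > 0.5).astype(np.int32)

workspace.FeedBlob("scores", scores)
workspace.FeedBlob("labels", labels)
workspace.RunOperatorOnce(
    core.CreateOperator("APMeter", ["scores", "labels"], ["ap"]))
ap_per_class = workspace.FetchBlob("ap")  # float tensor of size nclasses
```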
Reviewed By: akyrola
Differential Revision: D5082565
fbshipit-source-id: ae7304bc8fc999c361245b9aec38eb9a5f5eef4b
Summary:
Add a helper function for the parametric op ElementwiseLinear.
The typical syntax is model.ElementwiseLinear(input, output, dimension)
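A sketch based only on the syntax above; the positional arguments and the parameter shapes the helper creates are assumptions on my part:
```
from caffe2.python import model_helper

model = model_helper.ModelHelper(name="ew_linear_example")
# computes y = w * x + b elementwise along the given dimension;
# the helper is assumed to create w and b params of size `dimension`
model.ElementwiseLinear("x", "y", 16)
```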
Reviewed By: harouwu, akyrola
Differential Revision: D5114152
fbshipit-source-id: 8e8c691f824f518ae510a72ab0c12de1b018f3b5
Summary:
There is an edge case where internal gradient blobs of the backward step net should not be considered internally calculated if the only "internal" calculation is in-place.
In the case of the failing attention unit tests, the offending blob was attention_weighted_encoder_context_grad, which was incorrectly considered internal because it was the output (as well as the input) of a Reshape on the step net's edge. The caveat here is that the results may be unpredictable if a non-pass-through in-place operation is applied to a blob within a step net that is both consumed internally and is a recurrent state/output. (This is an extreme edge case, and difficult to enforce explicitly, but it is worth noting.)
Reviewed By: salexspb
Differential Revision: D5198328
fbshipit-source-id: 0cfa8f903fd767fc50e727f238ac3d8cdca03fe0
Summary:
The goal of this diff is:
1) Enable checkpointing to honor batches_per_epoch
2) Resume hive_readers mid-split
Reviewed By: azzolini
Differential Revision: D5004212
fbshipit-source-id: 2ff5df30ba946eefadd109d80056cde67398a080
Summary:
Input of the TopK op: X (dense).
Output of the TopK op: Value and Indices (sparse representation).
Value will have a gradient in some cases;
we backprop (copy) the gradient from the sparse d(Value) to the dense d(X).
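A minimal numpy sketch of that backward copy, for illustration only (not the actual kernel):
```
import numpy as np

def topk_value_grad_to_dense(d_value, indices, x_shape):
    # scatter the gradient of the top-k values back into a dense d(X)
    d_x = np.zeros(x_shape, dtype=d_value.dtype)
    np.put_along_axis(d_x, indices, d_value, axis=-1)
    return d_x

# X was 2 x 5 and the top-2 was taken along the last axis
d_value = np.array([[0.1, 0.2], [0.3, 0.4]], dtype=np.float32)
indices = np.array([[4, 0], [2, 1]])
print(topk_value_grad_to_dense(d_value, indices, (2, 5)))
```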
Differential Revision: D5133461
fbshipit-source-id: 7bad55b60e8a22dfe0e51357ce2099d7f752c133
Summary: Replace hand-made SGD with build_sgd.
Reviewed By: salexspb
Differential Revision: D5186331
fbshipit-source-id: 3c7b4b370e29a1344b95819766463bae3812c9a6
Summary: Previous implementation relied on the order of fields for some reason.
Reviewed By: azzolini
Differential Revision: D5164478
fbshipit-source-id: 12717310860584e18ce4ca67d0bd5048354cdc0a
Summary: Infer input and output devices from an OperatorDef through its OperatorSchema. This is inspired by shape inference. With this feature, we can easily analyze device information for all blobs in the net in a generic way. It is really helpful for automatic cross-device execution.
Reviewed By: akyrola, dzhulgakov
Differential Revision: D5161065
fbshipit-source-id: ee656123112171a4ca00f2fb3f6940f32ddf3135
Summary:
This diff fixes fetching of parameters in the global namescope. An earlier
diff that switched to '' introduced this bug.
Reviewed By: dzhulgakov
Differential Revision: D5189667
fbshipit-source-id: 4818e99e2c2c90788e70e0b8b6204ec6f471d37d
Summary: ExpandDims is a trivial utility op which should not be triggering a warning when used by ModelHelper.
Reviewed By: akyrola
Differential Revision: D5117985
fbshipit-source-id: 5589f46f58458f5019924b48602db088563f2fee
Summary:
Make it easier for users by having ExtractPredictorNet return the list of blobs that must be saved/exported to run a predictor net. Added a test for ExtractPredictorNet.
Codemod.
Reviewed By: asaadaldien
Differential Revision: D5176097
fbshipit-source-id: b1af42132459487b8d94fcdde0e4c514da608243
Summary:
I'm using Python ops in a project and need corresponding Python gradient ops. For my use case, only a subset of the forward op outputs have gradients and only a subset of forward op inputs have gradients. However, the current implementation of `GetPythonGradient` forces all grad inputs and outputs to exist. This diff allows one to specify that only a subset of grad inputs / outputs are used when constructing the Python op.
I'm not sure if this is up to caffe2 standards, so please push back on style and content as needed.
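A rough sketch of how this might look; the grad_output_indices / grad_input_indices keyword names and the exact Python-op calling convention are assumptions on my part:
```
from caffe2.python import core

def forward(inputs, outputs):
    outputs[0].feed(inputs[0].data * 2.0)                    # y = 2x, has a gradient
    outputs[1].feed((inputs[0].data > 0).astype('float32'))  # mask, no gradient

def backward(inputs, outputs):
    # the last input is d(y); the only grad produced is d(x)
    outputs[0].feed(inputs[-1].data * 2.0)

net = core.Net("python_grad_example")
net.Python(forward, backward,
           grad_output_indices=[0],  # assumed kwarg: only forward output 0 has a gradient
           grad_input_indices=[0]    # assumed kwarg: only forward input 0 receives a gradient
           )(["x"], ["y", "mask"])
```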
Reviewed By: dzhulgakov
Differential Revision: D4897004
fbshipit-source-id: 96fffe8634c51a49b6bce7339a46c6235f7d4bbd
Summary:
Fix the missing `future` package issue.
Recently we found that some of our users do not have the future module installed, so we need a try/except wrapper around all `past` imports.
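The kind of guard this implies, as a sketch:
```
try:
    from past.builtins import basestring
except ImportError:
    # `future`/`past` not installed; fall back to the Python 3 equivalent
    basestring = str
```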
Reviewed By: Yangqing
Differential Revision: D5183547
fbshipit-source-id: 262fdf2940ee1be4454bf0b0abb9e6a0f1a0ee82
Summary:
This diff introduces abstractions for parameter sharing for all the
parameters that are created through the new create_param syntax (a sketch follows the list below).
Possible use cases of this parameter sharing:
1. Share params within RNN interface.
2. Some complicated models that might share some of the branches.
3. TODO (next diff): Cross-model parameter sharing.
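A hedged sketch of what sharing through create_param might look like; the scope names and the exact semantics of the ParameterSharing mapping are my assumptions:
```
from caffe2.python import model_helper, scope
from caffe2.python.modeling.initializers import Initializer
from caffe2.python.modeling.parameter_sharing import ParameterSharing

model = model_helper.ModelHelper(name="sharing_example")
with ParameterSharing({'decoder': 'encoder'}):
    with scope.NameScope('encoder'):
        w_enc = model.create_param('w', shape=[16, 16],
                                   initializer=Initializer("XavierFill"))
    with scope.NameScope('decoder'):
        # assumed to resolve to the same underlying blob as encoder/w
        w_dec = model.create_param('w', shape=[16, 16],
                                   initializer=Initializer("XavierFill"))
```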
Reviewed By: salexspb
Differential Revision: D5160935
fbshipit-source-id: c6d40a5ed7ead240cd7db0eb69de6dc5f505b05a
Summary:
This diff creates a new type of Initializer: ExternalInitializer. This
initializer is meant for cases where the parameter blob is already
expected to exist in the workspace.
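A sketch of the intended use, assuming the blob was already placed in the workspace (e.g. loaded from a checkpoint); blob and model names are illustrative:
```
import numpy as np
from caffe2.python import model_helper, workspace
from caffe2.python.modeling.initializers import ExternalInitializer

# the parameter blob already exists in the workspace
workspace.FeedBlob("pretrained_w", np.random.rand(16, 16).astype(np.float32))

model = model_helper.ModelHelper(name="external_init_example")
w = model.create_param(param_name="pretrained_w", shape=[16, 16],
                       initializer=ExternalInitializer())
```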
Reviewed By: dzhulgakov
Differential Revision: D5171322
fbshipit-source-id: d27861f0f80afdea93c235d49f63da19adccc92c
Summary:
This diff is the first step in the effort to refactor all parameters. As a first step, I'm merging the concepts of params and computed_params, which will
be based on tags instead (in the first version it still uses the old data structs to store all the BlobReferences).
Renaming computed_params to non-trainable/non-backprop params should be done in some other diff.
Reviewed By: salexspb
Differential Revision: D5171159
fbshipit-source-id: 68031ca779f053fb266a7c4a2e5b482a3bd9c832
Summary:
Add add_weight_decay to optimizer + test.
In D5142973 I accidentally removed weight decay from resnet50 trainer, so this restores it.
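A sketch of how the restored call fits into an optimizer-based trainer; the model below is a stand-in, not the resnet50 trainer itself, and the call order is my assumption:
```
from caffe2.python import brew, model_helper, optimizer

model = model_helper.ModelHelper(name="wd_example")
pred = brew.fc(model, "data", "pred", dim_in=16, dim_out=1)
loss = model.net.AveragedLoss(pred, "loss")
model.AddGradientOperators([loss])

optimizer.add_weight_decay(model, 1e-4)             # the restored weight decay term
optimizer.build_sgd(model, base_learning_rate=0.1)  # instead of hand-written SGD ops
```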
Reviewed By: asaadaldien
Differential Revision: D5173594
fbshipit-source-id: c736d8955eddff151632ae6be11afde0883f7531
Summary: old diff had some changes to formatter.py and generator.py, but now everything is in github.py
Reviewed By: bwasti
Differential Revision: D5165061
fbshipit-source-id: 5fe5ff70ff2c5525c7aacf20854916c86d272749
Summary: Use new blob as residual sum output, and add scoping to prevent any name conflicts.
Reviewed By: urikz
Differential Revision: D5167145
fbshipit-source-id: a01c87ed2278205e95e8395314b166afb1dca1b3
Summary:
Split the Caffe2 memory-based model into two parts:
- Dimension reduction MLP
- DNN with concatenation of memory and obj feature
Currently only a simple mean is implemented.
Differential Revision: D4866825
fbshipit-source-id: d2f6813402513ec9af30dbe29a50593e2d3cdb3b
Summary:
A recent diff introduced a duplicate parameter to the model, which hurts performance and also affects correctness (duplicate momentum updates, for example). We unfortunately had no checks for duplicate params outside of data_parallel_model, which fortunately brought this to our attention.
It is better to have a Validate() function in model_helper and call it before adding gradient ops and querying for parameters. Added it to the brew_test calls as well.
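A sketch of the intended call site, assuming Validate() simply asserts that model.params contains no duplicates:
```
from caffe2.python import brew, model_helper

model = model_helper.ModelHelper(name="validate_example")
pred = brew.fc(model, "data", "pred", dim_in=16, dim_out=1)
loss = model.net.AveragedLoss(pred, "loss")

model.Validate()                    # fails loudly on duplicate params
model.AddGradientOperators([loss])  # only then add gradients / query params
```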
Reviewed By: kennyhorror
Differential Revision: D5163458
fbshipit-source-id: 35692e8bfcc359d4e8bc73e6f2358659f6e45ceb
Summary:
This diff is the first step in the effort to refactor all parameters. As a
first step, I'm merging the concepts of params and computed_params, which will
be based on tags instead (in the first version it still uses the old data
structs to store all the BlobReferences).
Renaming computed_params to non-trainable/non-backprop params should be done in
some other diff.
Reviewed By: salexspb
Differential Revision: D5119830
fbshipit-source-id: 2001090a37346eb12abbb234e13e727c288eb8a7
Summary:
It's causing problems inside docker containers:
`InvalidArgument: Insufficient bytes of entropy to draw requested array. shape=(5, 9, 10, 5), dtype=float32. Can you reduce the size or dimensions of the array? What about using a smaller dtype? If slow test runs and minimisation are acceptable, you could increase settings().buffer_size from 8192 to at least 18432000.`
Closes https://github.com/caffe2/caffe2/pull/707
Differential Revision: D5162621
Pulled By: Yangqing
fbshipit-source-id: 55544210961cbc80828dca2cbeba6a5ace8cf8d1
Summary:
This warning becomes an error with https://github.com/numpy/numpy/pull/6271 (numpy `>=1.12.0`).
```
caffe2/python/operator_test/tile_op_test.py::TestTile::test_tilewinput
/opt/caffe2/caffe2/python/operator_test/tile_op_test.py:100: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
dims[axis] = tiles
/usr/lib/python2.7/dist-packages/numpy/lib/shape_base.py:873: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
return c.reshape(shape_out)
```
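The generic remedy is to turn the 1-element array into a plain Python int before using it as an index or shape value; a sketch of that fix (not necessarily the exact change made in the test):
```
import numpy as np

x = np.arange(10)
tiles = np.array([3])  # a 1-element array drawn by the test framework

# using `tiles` directly as an index/shape triggers the deprecation warning;
# convert it to a Python int first
n = int(tiles[0])
y = np.tile(x, n)
```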
Closes https://github.com/caffe2/caffe2/pull/710
Differential Revision: D5160776
Pulled By: Yangqing
fbshipit-source-id: b264e0e389de5817a289db878c15e655f9fa2f09
Summary:
Adds support for generating and training pfp16 models. Added SGD optimizer for multi-precision trainers and a new callback to data_parallel_model in order to help multi-precision models keep their different copies of parameters in sync during training.
Closes https://github.com/caffe2/caffe2/pull/697
Differential Revision: D5159712
Pulled By: salexspb
fbshipit-source-id: 60a889494d2e2f4df1d720331e19f638c5eb95cc
Summary:
Currently we can get into broken situations when some nodes complete detectChanges() faster than others, so only some of the nodes start the next iteration of training. This is an inconsistent state. To prevent this, each node now sets a "re-rendezvous flag" that is allreduced after each iteration. Once all nodes agree, re-rendezvous is done.
Also noticed that min_shards=1 does not work because data_parallel_model assumed num_shards>1 when rendezvous is not None. Fixed that.
Reviewed By: andrewwdye
Differential Revision: D5156282
fbshipit-source-id: f2ccbd8ad13ed37f7813ff8ad1080d963d0d17e3
Summary: Add information about the offending param when assertion fires.
Reviewed By: kennyhorror
Differential Revision: D5153625
fbshipit-source-id: 9f5a02bf64ccbdef9d93d346f79e589dfe3ec5be
Summary: Fix an issue where, when the parameter is not created in param_init_net or net, we fall back to looking at which device the op that outputs the gradient runs on. This did not work if the gradient was a GradientSlice.
Reviewed By: harouwu
Differential Revision: D5153102
fbshipit-source-id: 20eae660ea32e5a9ea484bf93c04c8f8c71a51ed
Summary: If ConstantFill (or another fill op) is used in CUDAContext with input_as_shape, the code crashes: it expects the shape to be in CUDAContext but accesses the array in host code. We could fix this by copying the values from the CUDA tensor, but it is probably best to enforce that the shape input is in CPU context. That is what this diff does.
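A sketch of the usage pattern this enforces: keep the shape blob in CPU context even when the fill itself runs on the GPU. The ops used are standard Caffe2 fills, but the snippet is illustrative rather than taken from the diff:
```
from caffe2.proto import caffe2_pb2
from caffe2.python import core

net = core.Net("fill_example")
# the shape input stays on the CPU...
with core.DeviceScope(core.DeviceOption(caffe2_pb2.CPU)):
    shape = net.GivenTensorInt64Fill([], "shape", shape=[2], values=[4, 3])
# ...even though the fill itself runs on the GPU
with core.DeviceScope(core.DeviceOption(caffe2_pb2.CUDA, 0)):
    net.ConstantFill([shape], "filled", value=1.0, input_as_shape=1)
```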
Differential Revision: D5152766
fbshipit-source-id: 0629a189bd1d800c0b7c9dbc324b78d279efac0b
Summary:
Bug repro is in a test. Generally speaking, accumulation was
not happening if len(ys) >= 2 (the list of blobs we compute gradients
from) and some blob in the net was both in the ys list and also received
a gradient propagated from another element of ys.
Reviewed By: akyrola
Differential Revision: D5121695
fbshipit-source-id: 282d88f2f4f6e27dadae311964f40246a2739130
Summary: It looks like this is a bit too restrictive a requirement. Let's remove it.
Reviewed By: volkhin
Differential Revision: D5150968
fbshipit-source-id: 9e38574edc6542c5ce3c7f25a01afe8f5ff9b507
Summary:
Fixes some performance issues when `broadcast_computed_params=True` is passed to Parallelize_GPU. Enabled via the same `use_nccl` flag as AllReduce
Closes https://github.com/caffe2/caffe2/pull/630
Differential Revision: D5149828
Pulled By: akyrola
fbshipit-source-id: 12c9714c7fa078811f1cde61c8523dca8f7f968f