pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Aapo Kyrola	db5cc8f278	revert exhaustive_search setting to False Summary: As per discussion in D4355529 Reviewed By: prigoyal Differential Revision: D4362162 fbshipit-source-id: 795fcf1507235a7dc3c7a10b0453037936d057aa	2016-12-22 12:44:42 -08:00
Yangqing Jia	2c6a579859	Make all convolution operators allow optional bias term Summary: It used to be that only the cudnn engine supports it, and now it should be fully supported by any conv engine. To ignore bias, simply use a convolution op that has two inputs instead of 3. The gradient operator will automatically figure out that it does not compute the bias gradient. Reviewed By: prigoyal Differential Revision: D4354183 fbshipit-source-id: cf71b6289a254d15a6a663a85df63fbbaec3702b	2016-12-21 15:14:24 -08:00
Aapo Kyrola	5209a28c95	cudnn_exhaustive_search default True Summary: As discussed, this improves performance a lot and is not a memory hog anymore. Anyone can still turn it off. Differential Revision: D4338798 fbshipit-source-id: bf0fdb594427ebe90e1e94b2effdc63196096b3f	2016-12-21 09:29:43 -08:00
Yury Zemlyanskiy	c2d28fb874	RNNs API simplification Summary: This is a first step in improving our RNN story. It provides a wrapper around the current RecurrentNetworkOp implementation which infers most of the redundant parameters and makes the API much simpler. Also, in order to support general step nets, I added an extra argument to the RecurrentNetworkOp. Future work: 1. Inferring step net output and internal blobs (scratches) sizes and type 2. Avoid accessing blobs by names in c++ part 3. Remove requirement for inputs / output 1:1 correspondence in the step net 4. Make python API support networks with operators like Sum being on the border of the Cell net (currently there is an issue with such networks where gradient blobs which are on the side are not explicitly created). Differential Revision: D4268503 fbshipit-source-id: f8a66491c2b55daa730caeed7e9f2b3921541b49	2016-12-21 09:29:43 -08:00
Simon Layton	05233cd5b8	Make bias optional in cuDNN conv op Summary: Yangqing This seems to work for me, not sure if it's implemented in the right way for you to accept :) Allows user to specify "no_bias" as an option for convolution layers (only cuDNN at this point), so that the bias associated with that operator is not allocated or computed. This is useful in particular for conv + BatchNorm combinations (such as ResNets), as the bias term can be handled by both conv and Batch Norm, wasting memory and computation. Closes https://github.com/caffe2/caffe2/pull/50 Reviewed By: Yangqing Differential Revision: D4341288 Pulled By: bwasti fbshipit-source-id: e6138d0024c83ed876dff2f83ffbebe7de502fd8	2016-12-19 14:59:49 -08:00
Yangqing Jia	1a00ffea2a	Implement fix recommended by @slayton58 Summary: This addresses integer division errors. Reviewed By: bwasti Differential Revision: D4315555 fbshipit-source-id: 13ef9496409b3452bc5fb66ce787b11af1382132	2016-12-15 12:01:30 -08:00
Aapo Kyrola	eddf23ca0f	Handle parameters that are computed but not optimized Summary: prigoyal sharply noticed a bug in the Resnet models: we have not been checkpointing, nor synchronizing between gpus, the moving average and variance computed by the SpatialBN ops. The first problem is particularly serious, since models starting from a checkpoint would have started from a null-state for SpatialBN. Not synchronizing with the data parallel model is less tragic since each GPU should see very similar data. Thus I propose keeping track of "computed params", i.e. params that are computed from data but not optimized. I don't know if there are other examples, but SpatialBN's moving avg and var definitely are one. - I modified the checkpointing for xray model to store those blobs + also ensure the synchronization of those blobs - I modified data parallel model to broadcast those params from gpu0. I first tried averaging, but hit some NCCL deadlocks ... :( Differential Revision: D4281265 fbshipit-source-id: 933311afeec4b7e9344a13cf2d38aa939c50ac31	2016-12-15 12:01:28 -08:00
Ou Jin	e8b7ec1393	disable local update for sparse features Summary: With parameter server, sparse features are updated on the parameter server. Local update for sparse features is disabled. But that logic was removed in D4144922. This diff adds this logic back in a slightly different way. Previously, in trainer_example, I did that in a hacky way: just avoid adding the sparse weight to model.params. It would still generate grad, but would not add optimization operators. At the same time, it was always registered directly in the sparse_mapping, so the parameter server was aware of this parameter. But with the new change for ParameterInfo, I cannot do it that way anymore, because the param registry and params are bound together in ParameterInfo. For dper, there is an option in dper model helper to disable all of the sparse parameter optimizers. To combine these two together, I directly changed the ModelHelperBase in this diff. It is not quite ideal. It would be better to do it in Layer. But to fix the old one, this seems to be a more reasonable place to cover both cases. With this diff, there is no spike anymore. So this is probably the root cause of the convergence issue we have seen in D4144922. It explains why the model can recover: adagrad decays the local learning rate, so local updates cause less change. Reviewed By: dzhulgakov Differential Revision: D4229684 fbshipit-source-id: da1241d43d7c52cbf13560f9bb83e09897d8d56f	2016-11-29 15:18:38 -08:00
Yangqing Jia	238ceab825	fbsync. TODO: check if build files need update.	2016-11-15 00:00:46 -08:00
Simon Layton	8def54e82b	Fix BN in test phase	2016-10-19 08:20:11 -04:00
Yangqing Jia	f019672e0b	Merge branch 'master' into fbsync	2016-10-07 16:42:13 -07:00
Yangqing Jia	d1e9215184	fbsync	2016-10-07 13:08:53 -07:00
Simon Layton	00c493864e	Fix BN for test phase	2016-10-07 12:11:36 -04:00
Yangqing Jia	0a09d09431	fbsync	2016-09-08 17:56:14 -07:00
Yangqing Jia	b23e51d467	chunky sync	2016-09-06 15:55:19 -07:00
Yangqing Jia	05512d1e10	sync	2016-08-10 11:02:15 -07:00
Yangqing Jia	6463eebc7b	chunky sync - build scripts to be written	2016-07-21 10:16:42 -07:00
Yangqing Jia	559053d3a8	chunky sync	2016-05-13 14:43:48 -07:00
Yangqing Jia	cf7ca23fc1	make caffe2.python build	2016-03-08 16:48:19 -08:00
Yangqing Jia	9ae880bb6f	move pycaffe2 to caffe2.python	2016-03-08 15:45:30 -08:00

70 Commits