Summary:
rename model_helpers to brew. This is a big diff now. I did these things:
1. replace model_helpers with brew:
find . -type f -exec sed -i 's/model_helpers/brew/g' {} +
2. rename model_helpers.py and model_helpers_test.py
3. rename ModelHelpersTest to BrewTest
4. lowercase all the helper functions to distinguish them from single ops (see the sketch after this list)
5. run my unittests
6. run converge tests
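For context, a rough sketch of what the lowercased brew helper API looks like after the rename (blob names and dimensions below are made up for illustration):

  from caffe2.python import brew, model_helper

  # helpers are now lowercase functions on the brew module, which
  # distinguishes them from single ops like model.net.Relu(...)
  model = model_helper.ModelHelper(name="example")
  fc1 = brew.fc(model, "data", "fc1", dim_in=784, dim_out=256)
  pred = brew.relu(model, fc1, "pred")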
Reviewed By: salexspb
Differential Revision: D4930465
fbshipit-source-id: f420a1b03238df1cbe9f4426e0b9c43a12119661
Summary:
rename ModelHelperBase to ModelHelper.
This is the result of running:
find . -type f -exec sed -i 's/ModelHelperBase/ModelHelper/g' {} +
fbgs ModelHelperBase had 19 results; there are 20 instances here because I added 1 test in model_helpers_test.py.
Reviewed By: salexspb
Differential Revision: D4928337
fbshipit-source-id: bc4c12b60b90c167e717de50ea9fe17521e142e3
Summary:
This is not a super-elegant solution, but it is a working one for the Newsfeed team's problem of extracting a predictor net from a net that has a "side chain" they want to cut from the middle.
This adds an argument to ExtractPredictorNet that lets one define "disabled inputs": inputs we want to switch off, so that all operators that depend on them are removed from the model.
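The pruning idea, as a rough sketch (not the actual implementation): starting from the disabled inputs, drop every operator that transitively consumes them.

  def prune_disabled(ops, disabled_inputs):
      # ops: OperatorDef-like objects with .input/.output, assumed to be in
      # topological order; this is an illustrative sketch only.
      dead_blobs = set(disabled_inputs)
      kept = []
      for op in ops:
          if any(blob in dead_blobs for blob in op.input):
              # op consumes a disabled blob, so its outputs are dead too
              dead_blobs.update(op.output)
          else:
              kept.append(op)
      return kept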
Differential Revision: D4839953
fbshipit-source-id: 5d16df6f0ec4aac6670e6917efc77abde5d75c95
Summary: To prevent others from making the same mistake I did, check that no op has an is_test=0 argument when ExtractPredictorNet is called.
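The check is conceptually simple; a hedged sketch of it (the real code lives in ExtractPredictorNet and may differ):

  def assert_no_training_mode_ops(net_proto):
      # net_proto is a caffe2_pb2.NetDef; fail if any op still has is_test=0,
      # i.e. a training-mode op (e.g. Dropout, SpatialBN) leaked into the
      # extracted predictor net.
      for op in net_proto.op:
          for arg in op.arg:
              if arg.name == "is_test" and arg.i == 0:
                  raise AssertionError(
                      "Op {} has is_test=0; set is_test=1 before extracting "
                      "a predictor net".format(op.type))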
Reviewed By: viswanathgs
Differential Revision: D4796425
fbshipit-source-id: 38c14df6bcc767ec2e6a6e35ee79596a5dab531c
Summary: Adds synchronous optimization on GPUs to the translation training pipeline via data_parallel_model.Parallelize_GPU. Parallelize_GPU needs to be updated so there is some way of performing sparse parameter updates (e.g., on embedding tables), whether on GPU or CPU.
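For reference, the rough shape of a Parallelize_GPU call (builder-function names follow the data_parallel_model convention; exact keyword arguments may differ between Caffe2 versions, and the builder bodies are omitted here):

  from caffe2.python import data_parallel_model, model_helper

  model = model_helper.ModelHelper(name="translation_trainer")

  def input_fun(model):                # add per-GPU input/reader ops
      pass

  def forward_fun(model, loss_scale):  # build forward pass, return losses
      return []

  def param_update_fun(model):         # add (dense) parameter update ops
      pass

  data_parallel_model.Parallelize_GPU(
      model,
      input_builder_fun=input_fun,
      forward_pass_builder_fun=forward_fun,
      param_update_builder_fun=param_update_fun,
      devices=[0, 1, 2, 3],
  )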
Reviewed By: urikz
Differential Revision: D4631914
fbshipit-source-id: 9cdd655f7dbda3f9b2733d459228b3e097892441
Summary:
First, this diff includes a full test of data-parallel LSTM, which confirms it works correctly. To make it work, some changes had to be made:
- cell net / step net external inputs must be name-scoped (see the scoping sketch after this list)
- prevent double name-scoping of cell net inputs
- make data_parallel_model understand recurrent nets so the device mapping works
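For illustration, this is roughly what the name scoping means for blob names (a toy sketch, not the cell net code itself):

  from caffe2.python import core, scope

  # external inputs of the cell/step net must carry the parent scope,
  # otherwise the same unscoped name would collide across cells/GPUs
  with scope.NameScope("gpu_0/lstm"):
      print(core.ScopedName("hidden_t_prev"))  # -> "gpu_0/lstm/hidden_t_prev"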
Reviewed By: salexspb
Differential Revision: D4708840
fbshipit-source-id: 4b0ddc43642d449076a2b6f67ad1c47f84138ff4
Summary:
It has been a pain to save predictor-compatible models from Caffe2. This diff adds a function, ExtractPredictorNet, that takes a training model and outputs a predictor model by removing all operators that are not relevant for prediction, such as the backward pass and dequeue ops for input loading (in the predictor, input data is an external input).
We can also consider including this directly in the predictor exporter for FB usage.
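A hedged usage sketch (the module it lives in and the argument names are from memory and may not match the final API exactly):

  from caffe2.python import brew, model_helper

  # build a tiny "training" model purely for illustration
  train_model = model_helper.ModelHelper(name="train")
  fc = brew.fc(train_model, "data", "fc", dim_in=4, dim_out=2)
  train_model.net.Softmax(fc, "softmax")

  # keep only the ops needed to compute "softmax" from "data"; in a real
  # training net, backward-pass and dequeue ops would be stripped out here
  predictor = model_helper.ExtractPredictorNet(
      net_proto=train_model.net.Proto(),
      input_blobs=["data"],
      output_blobs=["softmax"],
  )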
Reviewed By: rpenggithub
Differential Revision: D4693264
fbshipit-source-id: e81abbbec0bd4d717159cf36488d0baaf0130090
Summary: As title: add a limit on the number of examples for group collect, and add an option to enable sum loss in BatchLRLoss.
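To make the sum-loss option concrete, a plain numpy illustration of the difference (this is just the math, not the BatchLRLoss layer API):

  import numpy as np

  # per-example logistic (cross-entropy) losses for a batch
  labels = np.array([1.0, 0.0, 1.0])
  probs = np.array([0.9, 0.2, 0.6])
  per_example = -(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

  averaged_loss = per_example.mean()  # default: invariant to batch size
  summed_loss = per_example.sum()     # new option: scales with batch size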
Reviewed By: xianjiec
Differential Revision: D4602311
fbshipit-source-id: 5b2a244f1f0e9f1ab0f4590e94828fd18d018d8d
Summary:
The reason I need a gradient for CopyOp can be found in this post: https://fb.facebook.com/groups/1405155842844877/permalink/1639683782725414/
The gradient for CopyOp is trivial when the device is the same (CPU, or the same GPU), but gets a little harder when the copy is made across two different GPUs.
I introduce a new operator, CopyOnDeviceLike, which has an additional second input: the op copies the first input to the same device as the second one. The default implementation is exactly the same as CopyOp, but I specialize it for CUDAContext.
Please let me know if I'm doing anything wrong here! This is my first caffe2 diff related to operator definitions.
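A hedged sketch of how the new op is meant to be used (blob names and the device setup are made up):

  from caffe2.python import core
  from caffe2.proto import caffe2_pb2

  # copy X to whatever device the blob `dst` lives on; the CUDAContext
  # specialization handles copies across two different GPUs
  op = core.CreateOperator(
      "CopyOnDeviceLike",
      ["X", "dst"],
      ["X_on_dst_device"],
      device_option=core.DeviceOption(caffe2_pb2.CUDA, 1),
  )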
Reviewed By: Yangqing
Differential Revision: D4557258
fbshipit-source-id: 9494be589cc1e5696bbbfe25b7622aaa4c9efe4a
Summary:
(Ignore the convolution-op related changes, they will be later patched separately)
This diff includes work from the last few weeks:
- some refactoring of the flow ops
- no_bias setting
- MAP computation (instead of accuracy) for OC
- adaptive learning rate for Xray concepts
- various small bug fixes
Reviewed By: viswanathgs
Differential Revision: D4329500
fbshipit-source-id: 000d4fd22ec408af5290480c788eb86546bff52e
Summary:
Rewrite of D3993337 based on the new stack.
Compared to the old one, we need more readers to achieve the same speed. But so far the speed is the same, and the new bottleneck is the write bandwidth of the trainer. Model quality is the same as the base.
Reviewed By: azzolini
Differential Revision: D4310803
fbshipit-source-id: 6d04ae8040c1ee7caa9aea5287f054e73fbe325a
Summary:
This is a first step in improving our RNN story. It provides a wrapper around the current RecurrentNetworkOp implementation that infers most of the redundant parameters and makes the API much simpler.
Also, in order to support general step nets, I added an extra argument to RecurrentNetworkOp.
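To give a sense of the intended API, roughly something like the following (the module/function name and argument list are assumptions based on the description above, not the exact interface):

  from caffe2.python import model_helper, recurrent

  model = model_helper.ModelHelper(name="rnn_example")

  # the wrapper infers most of the RecurrentNetworkOp plumbing
  # (step net, links, gradients) from a handful of arguments
  outputs = recurrent.LSTM(
      model,
      input_blob="input_sequence",            # shape (T, N, dim_in)
      seq_lengths="seq_lengths",              # shape (N,)
      initial_states=("hidden_init", "cell_init"),
      dim_in=64,
      dim_out=128,
      scope="lstm1",
  )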
Future work:
1. Infer the sizes and types of step net outputs and internal blobs (scratches)
2. Avoid accessing blobs by name in the C++ part
3. Remove the requirement of 1:1 input/output correspondence in the step net
4. Make the Python API support networks with operators like Sum on the border of the cell net (currently there is an issue with such networks where gradient blobs on the side are not explicitly created).
Differential Revision: D4268503
fbshipit-source-id: f8a66491c2b55daa730caeed7e9f2b3921541b49
Summary:
This renames the "Snapshot" op to "Checkpoint", as we discussed earlier.
The old Snapshot name is still available, but we should move to the new name and
eventually deprecate the old one.
The Python SnapshotManager should also be changed, cc azzolini
Reviewed By: dzhulgakov
Differential Revision: D4272021
fbshipit-source-id: 4b8e029354416530dfbf0d538bfc91a0f61e0296
Summary: Added gradients for the Copy operators; they are simply the reverse operation. Also added a unit test to check that things actually work, and added the operator schema and registration to model_helper's known operators.
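A hedged sketch of the kind of unit test used (via the Caffe2 gradient checker; exact helper signatures may differ):

  import numpy as np
  from caffe2.python import core, gradient_checker

  # the gradient of Copy is just a Copy in the reverse direction, so the
  # numerical and analytical gradients should agree
  op = core.CreateOperator("Copy", ["X"], ["Y"])
  X = np.random.rand(4, 5).astype(np.float32)

  checker = gradient_checker.GradientChecker(stepsize=0.05, threshold=0.005)
  res, grad, grad_estimate = checker.CheckSimple(op, [X], 0, [0])
  assert res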
Differential Revision: D4306516
fbshipit-source-id: dd0633fa7f2ed01991990e56e63669794df037d9
Summary:
prigoyal sharply noticed a bug in the Resnet models: we have not been checkpointing, nor synchronizing between GPUs, the moving average and variance computed by the SpatialBN ops. The first problem in particular is serious, since models starting from a checkpoint would have started from a null state for SpatialBN. Not synchronizing within the data parallel model is less tragic, since each GPU should see very similar data.
Thus I propose keeping track of "computed params", i.e. params that are computed from data but not optimized. I don't know if there are other examples, but SpatialBN's moving avg and var definitely are one.
- I modified the checkpointing for the Xray model to store those blobs and also ensure the synchronization of those blobs (a rough sketch follows after this list)
- I modified data parallel model to broadcast those params from gpu0. I first tried averaging, but hit some NCCL deadlocks ... :(
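A rough sketch of what this changes for checkpointing (the accessor names are assumed from the ModelHelper API and may differ in this diff):

  from caffe2.python import brew, core, model_helper

  model = model_helper.ModelHelper(name="bn_example")
  brew.spatial_bn(model, "data", "bn_out", dim_in=3, epsilon=1e-3, is_test=False)

  # checkpoint both optimized params and computed params (SpatialBN's
  # running mean/var); otherwise a restored model starts from a null state
  save_blobs = model.GetParams() + model.GetComputedParams()
  save_op = core.CreateOperator(
      "Save", save_blobs, [],
      db="/tmp/xray_checkpoint", db_type="lmdb", absolute_path=1,
  )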
Differential Revision: D4281265
fbshipit-source-id: 933311afeec4b7e9344a13cf2d38aa939c50ac31
Summary:
With parameter server, sparse features are updated on the parameter server.
Local updates for sparse features are disabled. But that logic was removed in
D4144922. This diff is to add this logic back in a slightly different way.
Previously, in trainer_example, I did this in a hacky way by simply not adding the
sparse weight to model.params. It still generates a grad, but does not add
optimization operators. At the same time, it is always registered directly in
the sparse_mapping, so the parameter server is aware of this parameter.
But with the new change for ParameterInfo, I cannot do it that way anymore,
because the param registry and params are bound together in ParameterInfo.
For dper, there is an option in the dper model helper to disable all of the sparse
parameter optimizers.
To combine these two, I directly changed ModelHelperBase in this
diff. It is not quite ideal; it would be better to do it in Layer. But to fix the old
one, this seems to be the more reasonable place to cover both cases.
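Roughly, the intended behavior looks like this (a hand-wavy sketch; the real change is inside ModelHelperBase and the names below are made up):

  # sparse params (e.g. embedding tables) are owned by the parameter server:
  # their gradients are still computed and shipped to the PS, but no local
  # optimizer ops (e.g. Adagrad) are added for them
  def add_local_optimizers(model, ps_sparse_params, build_update_ops):
      for param in model.params:
          if str(param) in ps_sparse_params:  # assumed set of PS-managed params
              continue                        # skip local update, keep the grad
          build_update_ops(model, param)      # hypothetical per-param builder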
With this diff, there is no spike anymore, so this is probably the root cause
of the convergence issue we saw in D4144922. It also explains why the
model can recover: Adagrad decays the local learning rate, so local updates
cause less change over time.
Reviewed By: dzhulgakov
Differential Revision: D4229684
fbshipit-source-id: da1241d43d7c52cbf13560f9bb83e09897d8d56f
Summary:
This consists of a series of diffs for implementing Multi-task learning.
This diff is to
1. save the model;
2. support MT learning in the evaluator;
3. add a unittest.
model after merging (saved model): https://our.intern.facebook.com/intern/graphviz/?paste=56793140
Reviewed By: xianjiec
Differential Revision: D4123316
fbshipit-source-id: 225bf8616962ec08f4f1ef85729c1e94ba7c373a
Summary: This is so they don't generate spurious warning messages in the logs
Reviewed By: dzhulgakov
Differential Revision: D4205610
fbshipit-source-id: f764b51565430f4057898ab929372bc7943e0495