Commit Graph

87 Commits

Author SHA1 Message Date
Alexander Sidorov
52befa4802 DataParallelModel: take param_init_net into account in _InferBlobDevice
Summary:
Here is my example:

For a static RNN, the timestep blob is created as part of param_init_net. Previously, DPM assumed it was a CUDA blob by default, so it participated in broadcasting and caused the Copy on line 798 to fail. No device mapping is correct for this blob.

Reviewed By: akyrola

Differential Revision: D5631716

fbshipit-source-id: 28c3eb17ecc3080c95c41d69a60bf7262d3907d4
2017-08-15 12:06:46 -07:00
Zhaoming Wu
399fc9fb09 Added Nesterov
Summary: Added Nesterov momentum as an option for BMUF and corresponding tests

Reviewed By: asaadaldien

Differential Revision: D5599888

fbshipit-source-id: 30819c9e689347c8b75daddc7444bea9f54193ae
2017-08-11 13:52:43 -07:00
Priya Goyal
5c77cc8182 Exposing num_workers as parameter and enable recycling activations
Summary: As promised, a separate diff for the dpm changes I made in experimental code.

Reviewed By: pietern

Differential Revision: D5551304

fbshipit-source-id: 9013aeab6c388b1c415ffb2e36fb8dd6b8cf90b0
2017-08-08 19:48:41 -07:00
Ahmed Taei
647f35e742 Fix SyncAllParamsDistributed for Python 3x
Summary:
In Python 3.x, dictionary values are not a list and cannot be concatenated to a list; this diff fixes that.
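A minimal standalone illustration of the incompatibility (blob names are made up, not from the diff):

```python
# Python 2: dict.values() returns a list, so `list + dict.values()` works.
# Python 3: dict.values() returns a view, and `list + dict_values` raises TypeError.
params_per_device = {0: ["gpu_0/fc_w"], 1: ["gpu_1/fc_w"]}

# broken = ["iteration_mutex"] + params_per_device.values()      # TypeError on Python 3
fixed = ["iteration_mutex"] + list(params_per_device.values())   # works on 2 and 3
print(fixed)
```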

Reviewed By: andrewwdye

Differential Revision: D5576724

fbshipit-source-id: c60441857ceceb9c4a71122d2db5e9abad6d3fc2
2017-08-07 14:23:32 -07:00
Aapo Kyrola
26645154bb warn about using test/val model with init_params=True + fixed some cases
Summary: It is a common mistake to create a test/validation model with init_params=True. When its param_init_net is run, it overwrites the training model's params, and with DPM those won't be synchronized to all GPUs. I don't want to make this an assertion yet, since it might break people's trainers (it is OK to have init_params=True if you never run the param_init_net...).
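A sketch of the recommended pattern (model names and arg_scope are illustrative): the training model owns parameter initialization, while the test/validation model is built with init_params=False so running its param_init_net cannot clobber the shared params.

```python
from caffe2.python import model_helper

# Training model: initializes and owns the parameters.
train_model = model_helper.ModelHelper(name="resnet_train", init_params=True)

# Test/validation model: reuses the training parameters. init_params=False keeps
# its param_init_net from overwriting them (and, under data_parallel_model, from
# leaving the per-GPU copies out of sync).
test_model = model_helper.ModelHelper(
    name="resnet_test",
    init_params=False,
    arg_scope={"order": "NCHW"},
)
```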

Reviewed By: asaadaldien

Differential Revision: D5509963

fbshipit-source-id: 63b1a16ec0af96e3790e226850f6e0e64689143f
2017-07-27 13:20:27 -07:00
Aapo Kyrola
af1e45c1e1 support appending net and converting them
Summary:
As per rushabhmshah99's request: he wants to append a pre-trained model (without training it) to the model.
So I added data_parallel_model.ConvertNetForDevice() to enable that. The unit test shows an example of how
to use this with AppendNet, and I also added a blurb to the function.
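A hypothetical sketch of the AppendNet flow; the function name comes from the summary, but the exact signature (a net plus a DeviceOption) is assumed here, not verified against the diff.

```python
from caffe2.python import core, data_parallel_model
from caffe2.proto import caffe2_pb2

# A tiny stand-in for a pre-trained net we want to append without training it.
pretrained_net = core.Net("pretrained")
pretrained_net.Relu("data", "data_relu")

# Assumed usage: ConvertNetForDevice returns a copy of the net rewritten for the
# given device, ready to be appended to one replica of a data-parallel model.
device_opt = core.DeviceOption(caffe2_pb2.CUDA, 0)
converted = data_parallel_model.ConvertNetForDevice(pretrained_net, device_opt)

# Inside a forward_pass_builder_fun one would then append it:
# model.net.AppendNet(converted)
```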

Differential Revision: D5503335

fbshipit-source-id: b2a5db5c1739dc97f46dd0d7606ed555d99255b8
2017-07-27 11:07:48 -07:00
Aapo Kyrola
3363681304 enable CreateCommonWorld to bootstrap from existing common world
Summary: Use romain-intel's ContextFactory to create common worlds from existing common worlds, thus bypassing the KV store completely. Changed data_parallel_model to automatically detect whether there is already a CW we can work with. CreateCommonWorldOp takes an optional second parameter, which is the existing CW.

Reviewed By: andrewwdye

Differential Revision: D5494956

fbshipit-source-id: 5f7a840bcd5fe4ea756fafeacc746bc2cf5078b0
2017-07-26 22:31:55 -07:00
Ahmed Taei
804ebf7c41 Populate learning rate blob name into data_parallel_model and fix resnet50_trainer example.
Reviewed By: akyrola

Differential Revision: D5463772

fbshipit-source-id: 10b8963af778503a3de6edbabb869747bd1e986d
2017-07-21 16:24:10 -07:00
Geet Sethi
11c4647447 Allow CPU device scope in data_parallel_model and data_parallel_rendevous device scope checks
Summary: Allowing CPU device scope instead of enforcing no device scope in data_parallel_model and data_parallel_rendevous.
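A small sketch of what the relaxed check permits: an explicit CPU device scope around the parallelization call, where previously only an empty scope was accepted.

```python
from caffe2.python import core
from caffe2.proto import caffe2_pb2

# An explicit CPU device scope now passes the device-scope checks in
# data_parallel_model / data_parallel_rendevous.
with core.DeviceScope(core.DeviceOption(caffe2_pb2.CPU)):
    # e.g. data_parallel_model.Parallelize_GPU(...) would be called here;
    # a CUDA device scope at this point would still be rejected.
    pass
```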

Reviewed By: akyrola

Differential Revision: D5440492

fbshipit-source-id: bcd4344d64c710ea50ec8a65e3e9d102e35c66ea
2017-07-18 15:47:41 -07:00
Geet Sethi
ab0d631d6d Adding AllCompare-like function to data_parallel_model
Summary: Added a function _RunComparison to data_parallel_model that checks whether all shards in a given rendezvous have the same value for a given blob_name.

Reviewed By: wesolwsk

Differential Revision: D5394164

fbshipit-source-id: c2b07d0f8d5846fa9887d53b0be091a8c057f106
2017-07-13 13:03:57 -07:00
Geet Sethi
a68bb5e3f9 Added device scope checks to data_parallel_model and data_parallel_rendevous
Summary:
Added device scope checks to data_parallel_model and data_parallel_rendevous

Added a test to data_parallel_model_test to verify that the checks work correctly.

Fixed device_scope error in test_synchronization_barrier

Reviewed By: akyrola

Differential Revision: D5403936

fbshipit-source-id: 849c1cd7452692efbc5ef74d2d60ede090c9c017
2017-07-12 10:47:28 -07:00
Ralph Mao
febae7b20b fix a bug in the report function of Data_Parallel
Summary: Replace params with sp; otherwise it will report an empty list.

Reviewed By: akyrola

Differential Revision: D5382716

fbshipit-source-id: 34d8e6ee00cbe1718702e3d1f23ea12f8d65063e
2017-07-07 13:03:46 -07:00
Andrew Dye
31f394f8b3 Add synchronization barrier API to data parallel model
Summary: Add synchronization barrier API with configurable timeout. Users can call Synchronize() to join variable length execution before resuming multi-machine communication steps, i.e., resuming distributed training iterations after validation on a single machine.
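A hedged sketch of the intended usage; Synchronize() is named in the summary, but the timeout parameter name and the surrounding training loop (train_model, shard_id, run_validation) are placeholders, not verified API.

```python
from caffe2.python import data_parallel_model, workspace

for epoch in range(num_epochs):
    for _ in range(iters_per_epoch):
        workspace.RunNet(train_model.net)          # distributed training iterations

    if shard_id == 0:
        run_validation()                           # variable-length single-machine work

    # Barrier with a configurable timeout: every shard blocks here until all
    # shards arrive, then multi-machine communication resumes.
    data_parallel_model.Synchronize(train_model, timeout_sec=120)
```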

Reviewed By: akyrola

Differential Revision: D5348387

fbshipit-source-id: 5826da10e6a60c50394c36c7cf47624f10191d11
2017-07-06 09:21:19 -07:00
Aapo Kyrola
2d133d4627 increase concurrency default
Summary: Huge improvement in my tests, and it does not really hurt either.

Reviewed By: wesolwsk

Differential Revision: D5374925

fbshipit-source-id: c96a4ed2ca653120a82233c0037cbfded8a2d2a1
2017-07-05 21:46:31 -07:00
Simon Layton
090506ac87 Add NCCLBroadcast to correct net
Summary:
Otherwise it was always added to the main net instead of param_init_net when
desired (i.e., initial param sync).
Closes https://github.com/caffe2/caffe2/pull/894

Differential Revision: D5367451

Pulled By: akyrola

fbshipit-source-id: 3d82be6da687c736bd15f4852dbd272266eb4811
2017-07-03 16:54:44 -07:00
Aapo Kyrola
8c74c36626 fix reducing device option
Summary: This was broken in a previous diff; fix it to use the model's device type.

Reviewed By: asaadaldien

Differential Revision: D5356005

fbshipit-source-id: a4fcc932bae772076b57625a5fcc0d38eb702cc9
2017-06-30 09:19:57 -07:00
Thomas Dudziak
5355634dac Dict fixes/improvements and unittest targets for Python 3 in caffe2 core
Summary: As title

Reviewed By: salexspb

Differential Revision: D5316104

fbshipit-source-id: aee43819d817842e5ce6ba3d045a55b1a2491c30
2017-06-29 17:05:41 -07:00
Yongqiang Wang
ea659b8f2e broadcast to global parameters when using warmup
Reviewed By: asaadaldien, jay-mahadeokar

Differential Revision: D5340692

fbshipit-source-id: 80879847ff71c8d620de502ef95a9ffb4bdf595d
2017-06-28 13:35:27 -07:00
Ahmed Taei
fbe2526343 Allow concurrent execution of GLOO broadcast collectives in
Summary:
This adds a CollectivesConcurrencyControl class to manage creating common contexts and cyclic controls to execute GLOO collectives,
and refactors AllReduce and _AddDistributedParameterSync to use it.

Reviewed By: akyrola

Differential Revision: D5335795

fbshipit-source-id: 5084e0a65cdb989cd949be3868b77a680561022d
2017-06-28 12:49:12 -07:00
Henry Lu
9a14c013c3 Refactor data_parallel_model to take advantage of Gloo broadcast op in broadcasting across machines and GPUs in one operation
Summary: Combine _AddDistributedParameterSync() and _SyncParams() into a single function to broadcast across distributed machines and all local GPUs simultaneously. This is similar to how calls to Allreduce have already been optimized using the functionality of Gloo. All the refactoring work is contained in data_parallel_model.py.

Reviewed By: akyrola, andrewwdye

Differential Revision: D5329277

fbshipit-source-id: 4407b88980cf396f2e0f994d796294fa79fd39ed
2017-06-27 19:35:24 -07:00
Simon Layton
d45f722e43 data_parallel_model: NCCLBroadcast root fix
Summary:
The root is the root _rank_ and not the root _device_. Thus we always
use root=0, regardless of the devices used.

https://github.com/NVIDIA/nccl/blob/v1.3.0-1/src/broadcast.cu#L75

/cc slayton58
Closes https://github.com/caffe2/caffe2/pull/872

Differential Revision: D5329564

Pulled By: akyrola

fbshipit-source-id: 5a34be30c1a0046a74f28437cb08333c1fb46098
2017-06-27 09:47:48 -07:00
Jay Mahadeokar
04c9c8c5c2 fix for loading model with bmuf
Summary: - One-line fix for loading a saved checkpoint when using Parallelize_GPU_BMUF

Reviewed By: asaadaldien

Differential Revision: D5315254

fbshipit-source-id: a20ba6438c8e6b2ef44b65270c1d3f9ab645ded0
2017-06-23 17:16:33 -07:00
Thomas Dudziak
342de07231 Core unit test fixes for Python 3
Summary: As title

Differential Revision: D5291327

fbshipit-source-id: 7dd9279c53ba55d3422c31973ffcec5705787fdf
2017-06-23 13:22:16 -07:00
Ahmed Taei
5ca263fb1c Add a warmup option for BMUF
Reviewed By: yqwangustc

Differential Revision: D5279655

fbshipit-source-id: 7c778a88909580bbe43d4bac4b7d73be0d0e3f27
2017-06-22 14:32:39 -07:00
Ahmed Taei
ffd32c8ab7 Add distributed BMUF implementation.
Summary:
Refactor data_parallel_model's all_reduce and broadcast methods to work on
a given parameter set (not only gradients) and reuse them for the distributed
BMUF implementation.
Add a distributed (multiprocessing) test to BMUF.

Reviewed By: akyrola

Differential Revision: D5267083

fbshipit-source-id: 8dcc7527d0a755b903d693d8071585f0b54d3403
2017-06-21 16:18:11 -07:00
Aapo Kyrola
34eaa19d27 CPU data parallel model
Summary:
CPU version of the data parallel model. The great thing is that now we can run data_parallel_model_test in Sandcastle (as it does not have GPUs).

Pretty simple change, really. I did not change all variable names with "gpu" in them, to reduce risk (and because I was being a bit lazy). Can improve later.

Reviewed By: wesolwsk

Differential Revision: D5277350

fbshipit-source-id: 682e0c5f9f4ce94a8f5bd089905b0f8268bd2210
2017-06-20 23:19:08 -07:00
Aapo Kyrola
96f19fefc0 add warning if data parallel model is created for GPUs that we don't have
Summary: Don't want to assert since it can be useful to sometimes create models that are not run (for example, unit tests).

Reviewed By: pietern

Differential Revision: D5258905

fbshipit-source-id: f1beee0605bfef235ed0f23f7e78259109720254
2017-06-16 07:02:37 -07:00
Thomas Dudziak
60c78d6160 Fixes range/xrange for Python 3
Summary: As title
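A minimal before/after of the class of fix (the helper below is illustrative, not code from the diff):

```python
# Python 2 code frequently used xrange, which is a NameError on Python 3:
#   blob_names = ["gpu_{}/data".format(i) for i in xrange(num_gpus)]
# Plain range works on both (lazy iterator on Python 3, list on Python 2):
def device_blob_names(num_gpus):
    return ["gpu_{}/data".format(i) for i in range(num_gpus)]

print(device_blob_names(4))  # ['gpu_0/data', 'gpu_1/data', 'gpu_2/data', 'gpu_3/data']
```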

Differential Revision: D5151894

fbshipit-source-id: 7badce5d3122e8f2526a7170fbdcf0d0b66e2638
2017-06-07 00:04:26 -07:00
Aapo Kyrola
5e6bd4fbfc Return predict params from ExtractPredictorNet + test
Summary:
Make it easier for users by returning from ExtractPredictorNet the list of blobs that must be saved/exported to run a predictor net. Added a test for ExtractPredictorNet

Codemod.
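A hedged usage sketch; it assumes ExtractPredictorNet lives in model_helper, takes a net proto plus input/output blob names, and (after this change) also returns the list of blobs to export. train_model stands in for a trained model.

```python
from caffe2.python import model_helper

# Assumed return convention: the extracted predictor net and the parameter blobs
# it needs, so exactly those blobs can be saved alongside it.
predict_net, export_blobs = model_helper.ExtractPredictorNet(
    train_model.net.Proto(),
    input_blobs=["data"],
    output_blobs=["softmax"],
)
# export_blobs is what must be serialized for the predictor to run.
```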

Reviewed By: asaadaldien

Differential Revision: D5176097

fbshipit-source-id: b1af42132459487b8d94fcdde0e4c514da608243
2017-06-05 15:34:37 -07:00
Andrey Malevich
a8fb85797c Refactoring of the parameters step 0. Add simple tags and unify interface for params and computed_params.
Summary:
This diff is the first step in the effort to refactor all parameters. As a first step, I'm merging the concepts of params and computed_params, which are going
to be based on tags instead (in the first version it still uses the old data structs to store all the BlobReferences).

Renaming computed_params to non-trainable/non-backprop params should be done in some other diff.

Reviewed By: salexspb

Differential Revision: D5171159

fbshipit-source-id: 68031ca779f053fb266a7c4a2e5b482a3bd9c832
2017-06-02 17:17:57 -07:00
Simon Layton
58874ad5bf Fp16 training initializers
Summary:
Re-open for re-importing :)
Closes https://github.com/caffe2/caffe2/pull/721

Differential Revision: D5164345

Pulled By: akyrola

fbshipit-source-id: e80b32556cd25610602df91a4225b93edc0ca40b
2017-06-01 08:34:46 -07:00
Aapo Kyrola
0f8c8f37a8 Revert D5159712: [caffe2][PR] Fp16 training initializers
Summary: This reverts commit 60a889494d2e2f4df1d720331e19f638c5eb95cc

Differential Revision: D5159712

fbshipit-source-id: 16040c911b260648857f656f92b165f92c2daae0
2017-06-01 00:17:14 -07:00
Aapo Kyrola
076376f4f6 Revert D5119830: [C2] Refactoring of the parameters step 0. Add simple tags and unify interface for params and computed_params
Summary: This reverts commit 2001090a37346eb12abbb234e13e727c288eb8a7

Differential Revision: D5119830

fbshipit-source-id: bf321868338f0db85dff3237af7eaf74212dbdf6
2017-06-01 00:02:21 -07:00
Andrey Malevich
ff61ed358e Refactoring of the parameters step 0. Add simple tags and unify interface for params and computed_params
Summary:
This diff is the first step in the effort to refactor all parameters. As a
first step, I'm merging the concepts of params and computed_params, which are going
to be based on tags instead (in the first version it still uses the old data
structs to store all the BlobReferences).

Renaming computed_params to non-trainable/non-backprop params should be done in
some other diff.

Reviewed By: salexspb

Differential Revision: D5119830

fbshipit-source-id: 2001090a37346eb12abbb234e13e727c288eb8a7
2017-05-31 22:36:36 -07:00
Simon Layton
2bfacff426 Fp16 training initializers
Summary:
Adds support for generating and training pfp16 models. Added SGD optimizer for multi-precision trainers and a new callback to data_parallel_model in order to help multi-precision models keep their different copies of parameters in sync during training.
Closes https://github.com/caffe2/caffe2/pull/697

Differential Revision: D5159712

Pulled By: salexspb

fbshipit-source-id: 60a889494d2e2f4df1d720331e19f638c5eb95cc
2017-05-31 17:46:58 -07:00
Ahmed Taei
f0f4c2fc5d Increase the number of DAG execution worker threads.
Reviewed By: akyrola

Differential Revision: D5158414

fbshipit-source-id: add377aec5588076db881a2a3750101710f29732
2017-05-31 15:19:19 -07:00
Aapo Kyrola
73a8a49c7e synchronize re-rendezvousing on node changes + support num_shards=1 rendezvous
Summary:
Currently we can get into broken situations when some nodes working on a computation detectChanges() faster than others, so only some of the nodes start the next iteration of training. This is an inconsistent state. To prevent this from happening, each node now sets a "re-rendezvous flag" that is allreduced after each iteration. Once all nodes agree, re-rendezvous is done.

Also noticed that num_shards=1 does not work because data parallel model assumed num_shards>1 when rendezvous is not None. Fixed that.

Reviewed By: andrewwdye

Differential Revision: D5156282

fbshipit-source-id: f2ccbd8ad13ed37f7813ff8ad1080d963d0d17e3
2017-05-31 15:19:13 -07:00
Ahmed Taei
f2d9d97008 Add an option to reset momentum-sgd params every time between successive block updates.
Reviewed By: akyrola

Differential Revision: D5149263

fbshipit-source-id: c0a3637a1b48f74ec55c9d13c8fab3456dab809c
2017-05-31 00:32:11 -07:00
Simon Layton
1aa6300696 Option to use NCCL for broadcast
Summary:
Fixes some performance issues when `broadcast_computed_params=True` is passed to Parallelize_GPU. Enabled via the same `use_nccl` flag as AllReduce
Closes https://github.com/caffe2/caffe2/pull/630

Differential Revision: D5149828

Pulled By: akyrola

fbshipit-source-id: 12c9714c7fa078811f1cde61c8523dca8f7f968f
2017-05-30 16:46:38 -07:00
Aapo Kyrola
cdb50fbf2b add optimizer support to data_parallel_model; Use MomentumSGDUpdate
Summary:
This diff does two things:
- add support for optimizers in data_parallel_model. The user can supply optimizer_builder_fun instead of param_update_builder_fun. The latter is called for each GPU separately with the proper namescope and devicescope, while the optimizer builder is called only once and adds optimizers to the whole model.

- use MomentumSGDUpdate instead of MomentumSGD + WeightedSum. This brings major perf benefits.

Changes resnet50 trainer to use optimizer.

This relies on D5133652
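A hedged sketch of the new calling convention: pass optimizer_builder_fun (called once for the whole model) instead of param_update_builder_fun (called once per GPU). The builder bodies and the build_sgd arguments are illustrative, not copied from the diff.

```python
from caffe2.python import brew, data_parallel_model, model_helper, optimizer

model = model_helper.ModelHelper(name="dpm_optimizer_example")

def add_input(model):
    # A real input builder would add DB readers / data ops here.
    pass

def add_model(model, loss_scale):
    pred = brew.fc(model, "data", "pred", dim_in=64, dim_out=10)
    softmax, loss = model.SoftmaxWithLoss([pred, "label"], ["softmax", "loss"])
    loss = model.Scale(loss, scale=loss_scale)
    return [loss]

def add_optimizer(model):
    # Called once; adds MomentumSGDUpdate-style updates for all parameters.
    optimizer.build_sgd(model, base_learning_rate=0.1, momentum=0.9, policy="fixed")

data_parallel_model.Parallelize_GPU(
    model,
    input_builder_fun=add_input,
    forward_pass_builder_fun=add_model,
    optimizer_builder_fun=add_optimizer,   # replaces param_update_builder_fun
    devices=[0, 1],
)
```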

Reviewed By: dzhulgakov

Differential Revision: D5142973

fbshipit-source-id: 98e1114f5fae6c657314b3296841ae2dad0dc0e2
2017-05-30 12:49:57 -07:00
Luke Yeager
6b1cf26380 Fix for dpm when GPUs don't have p2p access
Summary:
See discussion at https://github.com/caffe2/caffe2/pull/633#issuecomment-303536902

Tested with a TitanX (Pascal) and a TitanZ (Kepler) with this access pattern.
```
Checking GPU(s) for support of peer to peer memory access...
> Peer access from TITAN X (Pascal) (GPU0) -> GeForce GTX TITAN Z (GPU1) : No
> Peer access from TITAN X (Pascal) (GPU0) -> GeForce GTX TITAN Z (GPU2) : No
> Peer access from GeForce GTX TITAN Z (GPU1) -> TITAN X (Pascal) (GPU0) : No
> Peer access from GeForce GTX TITAN Z (GPU1) -> GeForce GTX TITAN Z (GPU2) : Yes
> Peer access from GeForce GTX TITAN Z (GPU2) -> TITAN X (Pascal) (GPU0) : No
> Peer access from GeForce GTX TITAN Z (GPU2) -> GeForce GTX TITAN Z (GPU1) : Yes
```
All combinations pass:
* `0,1`
* `0,2`
* `1,2`
* `0,1,2`
Closes https://github.com/caffe2/caffe2/pull/659

Differential Revision: D5148779

Pulled By: akyrola

fbshipit-source-id: 6263edfe8b36623983f1946b5c3f4a3fef415a45
2017-05-30 12:02:19 -07:00
Ahmed Taei
75a6f909c5 Add option to enable memonger for gradients and add param_names for save_model.
Reviewed By: akyrola

Differential Revision: D5131493

fbshipit-source-id: 7c159ccffa30eb064c157e559f1d8f0350f03ccb
2017-05-26 11:31:35 -07:00
Pieter Noordhuis
a9b5efe3c2 Expose max collective concurrency
Summary:
This was hardcoded at 4 before but should be made
configurable. Can be kept low for big MLPs and higher for convnets.

Reviewed By: akyrola

Differential Revision: D5126138

fbshipit-source-id: 713ee8bbeb243b7de1479808fd6398d397e0b49a
2017-05-25 13:32:40 -07:00
Deepak Gopinath
33c40e8a6e Handling shared indices in sparse gradient updates
Summary: When two or more blobs are gathered by the same indices blob in a data parallel model, we used to concatenate multiple times and re-write to the same indices blob. This leads to illegal memory access at times because the gradientslice indices blob is longer than its corresponding gradientslice values blob. This diff adds a check in order to avoid this.

Reviewed By: akyrola

Differential Revision: D5116817

fbshipit-source-id: 1c086d092eb6d48926d600f9408f578f5ddc41c7
2017-05-24 22:47:00 -07:00
Aapo Kyrola
a2c01e830b fix duplicate init blob issue + fix test
Summary:
Address KaimingHe's comments in D5093689 about the same blob being initialized twice, causing the internal consistency check to fail. I also noticed that my new test for test_checkpoint_params was completely botched due to an indentation issue (it did not actually execute any test), so this fixes that as well.
 Modified the test to add a duplicate param initializer, so that this bug is tested for.

Reviewed By: KaimingHe

Differential Revision: D5101304

fbshipit-source-id: 72f343035c1b4953e7bb9a1a1c171cf05d3ead26
2017-05-20 09:18:29 -07:00
Aapo Kyrola
6384bae29b call save_to_db in CPUContext + fix a typo in data_parallel_model.
Summary:
If the Predictor Exporter's save_to_db is called in CUDAContext, a failure occurs since the following FeedBlob() tries to store a string (metadata), but for CUDA blobs we assume they are tensors.
  + fix a typo in data_parallel_model that I bumped into.

Reviewed By: asaadaldien

Differential Revision: D5099837

fbshipit-source-id: 69d01b35a9a1816bf083f13d8a6ce88e1f5aecb7
2017-05-19 18:25:00 -07:00
Aapo Kyrola
0af0cba2b7 Refactor data_parallel_model initial sync and checkpointing
Summary:
Major improvements. Before, we only synced the model's "params" and "computed params" after initialization and after loading a checkpoint. But actually we want to sync all blobs that are generated in param_init_net. For example, the _momentum blobs were missed by the previous implementation and had to be manually included in checkpoint finalization.

I also added GetCheckpointParams() to data_parallel_model because it is now fully general. Also added a unit test.
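A hedged usage sketch of the new helper; the call follows the summary, while the way the returned blobs are consumed (fetching them for a checkpoint) is illustrative.

```python
from caffe2.python import data_parallel_model, workspace

# After Parallelize_GPU(...) has been applied to train_model:
checkpoint_blobs = data_parallel_model.GetCheckpointParams(train_model)

# Everything produced by param_init_net (params, _momentum blobs, ...) is
# included, so saving exactly these blobs is enough to resume training.
checkpoint = {str(blob): workspace.FetchBlob(blob) for blob in checkpoint_blobs}
```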

Reviewed By: andrewwdye

Differential Revision: D5093689

fbshipit-source-id: 8154ded0c73cd6a0f54ee024dc5f2c6826ed7e42
2017-05-19 12:48:06 -07:00
Aapo Kyrola
658c337f41 Error status for Gloo ops, and handling in elastic dpm
Summary: Add a RandomFailureOp, and add handling of the status code to the elastic data parallel model.

Reviewed By: andrewwdye

Differential Revision: D5065936

fbshipit-source-id: 24224f9ea414ee535c9e90cc28add5189354b0ef
2017-05-17 00:16:52 -07:00
Ahmed Taei
25fd005dd9 Initial implementation of Blockwise Model Update Filtering (BMUF)
Summary:
A single-machine, multi-GPU version of the BMUF algorithm. BMUF is a modification of
model averaging where the update to the global model is implemented as a filter:
param_t = param_(t-1) + delta_t
delta_t = \beta * delta_(t-1) + \alpha * (average(param_t) - param_(t-1))
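A minimal NumPy sketch of the block-update filter above (beta is the block momentum, alpha the block learning rate; names are illustrative):

```python
import numpy as np

def bmuf_block_update(global_param, delta, per_gpu_params, alpha=1.0, beta=0.875):
    """One BMUF block update: model averaging filtered by block momentum."""
    avg_param = np.mean(per_gpu_params, axis=0)                 # average(param_t)
    delta = beta * delta + alpha * (avg_param - global_param)   # delta_t
    return global_param + delta, delta                          # param_t, delta_t

# Toy example: four replicas that drifted slightly from the global parameters.
global_param, delta = np.zeros(3), np.zeros(3)
replicas = [global_param + 0.01 * np.random.randn(3) for _ in range(4)]
global_param, delta = bmuf_block_update(global_param, delta, replicas)
```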

Reviewed By: akyrola

Differential Revision: D4995057

fbshipit-source-id: 48176ba66d67eaf3fa4dee16d50d9589825ddba4
2017-05-15 18:18:15 -07:00
Aapo Kyrola
282298dd1c Data parallel model: Disable NCCL by default to hopefully reduce deadlocks
Summary: Make NCCL optional in data_parallel_model due to continuing reliability (deadlock) issues.

Reviewed By: pietern

Differential Revision: D4988950

fbshipit-source-id: 8a2192f01b5f3c0e847137cd37aefc69e553a56f
2017-05-02 16:09:17 -07:00