Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59696
Some files in c10d refer to dist autograd. However, on Windows, dist autograd isn't built, so we need to "mask out" those references on Windows. This was already partly done, but when moving c10d into libtorch some issues came up, possibly due to the different way linking happens there, so this change masks out the remaining references.
ghstack-source-id: 131169541
Test Plan: CI
Reviewed By: agolynski
Differential Revision: D28987579
fbshipit-source-id: c29c5330f8429d699554972d30f99a89b2e3971d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59523
Use snake case instead of camel case for consistency.
ghstack-source-id: 130759655
Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_grad_div_uneven_inputs
Reviewed By: cbalioglu
Differential Revision: D28922896
fbshipit-source-id: e04298284a78b2e71b562f790a878731962f873a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59574
Remove `work` attribute from Reducer class in favor of `future_work`.
Additionally, remove the `copy_grad_to_bucket` method since it is now a one-line implementation, and create a new C++ comm hook called `_AllReduceCommHookWithDivFactor` that replaces the plain allreduce and also supports handling uneven inputs.
1) Compared with the reverted https://github.com/pytorch/pytorch/pull/58937, update `_AllReduceCommHookWithDivFactor` in `default_comm_hooks.cpp` to apply the division first and hence avoid FP16 overflow.
2) Compared with the reverted https://github.com/pytorch/pytorch/pull/59520, disable `test_DistributedDataParallel_non_default_stream` on AMD, because applying the division first hurts gradient averaging accuracy on AMD.
See [07:48:26]:
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.2-py3.6-test1/1129/console
#Original PR Issue: https://github.com/pytorch/pytorch/issues/41266
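For illustration, a hedged Python sketch of the divide-before-allreduce behavior described in (1); the actual hook is the C++ `_AllReduceCommHookWithDivFactor`, the uneven-input adjustment of the divisor is omitted here, and the `GradBucket.buffer()` accessor name is assumed from the Python comm-hook API:
```python
import torch.distributed as dist

def allreduce_div_first_hook(process_group, bucket):
    # Divide the flattened bucket by the world size BEFORE the allreduce so that
    # FP16 gradients cannot overflow while being summed across ranks.
    group = process_group if process_group is not None else dist.group.WORLD
    world_size = dist.get_world_size(group)
    tensor = bucket.buffer()
    tensor.div_(world_size)
    fut = dist.all_reduce(tensor, group=group, async_op=True).get_future()
    return fut.then(lambda f: f.value()[0])

# Hypothetical usage:
#   ddp_model.register_comm_hook(state=None, hook=allreduce_div_first_hook)
```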
ghstack-source-id: 130752393
Test Plan:
buck test caffe2/test/distributed:distributed_gloo_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_grad_div_uneven_inputs
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_grad_is_view
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_non_default_stream
Reviewed By: rohan-varma
Differential Revision: D28940800
fbshipit-source-id: 1ba727ac951ebc1e7875dc1a1be8108a2c8d9462
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59520
Remove `work` attribute from Reducer class in favor of `future_work`.
Additionally, remove the `copy_grad_to_bucket` method since it is now a one-line implementation, and create a new C++ comm hook called `_AllReduceCommHookWithDivFactor` that replaces the plain allreduce and also supports handling uneven inputs.
Compared with the reverted https://github.com/pytorch/pytorch/pull/58937, update `_AllReduceCommHookWithDivFactor` in `default_comm_hooks.cpp` to apply the division first and hence avoid FP16 overflow.
#Original PR Issue: https://github.com/pytorch/pytorch/issues/41266
ghstack-source-id: 130685351
Test Plan:
buck test caffe2/test/distributed:distributed_gloo_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_grad_div_uneven_inputs
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_grad_is_view
Reviewed By: walterddr
Differential Revision: D28922305
fbshipit-source-id: 6388a96eda7a06f292873afed6d1362096c13e1c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58937
Remove `work` attribute from Reducer class in favor of `future_work`.
Additionally, remove the `copy_grad_to_bucket` method since it is now a one-line implementation, and create a new C++ comm hook called `_AllReduceCommHookWithDivFactor` that replaces the plain allreduce and also supports handling uneven inputs.
#Original PR Issue: https://github.com/pytorch/pytorch/issues/41266
ghstack-source-id: 130673249
Test Plan:
buck test caffe2/test/distributed:distributed_gloo_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_grad_div_uneven_inputs
Reviewed By: agolynski
Differential Revision: D28677383
fbshipit-source-id: 85e0620378b7e9d837e436e94b9d807631d7d752
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59281
Adds the ability to log when the reducer/DDP encounters an error. We add fields "has_error" and "error" to indicate that an error has occurred in this iteration, in which case the other fields (performance stats) are not guaranteed to be updated.
Errors encountered in Python-side DDP will be added in the next diff.
ghstack-source-id: 130412974
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28652717
fbshipit-source-id: 9772abc2647a92dac6a325da6976ef5eb877c589
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58593
Per title
ghstack-source-id: 129498230
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D28528465
fbshipit-source-id: 89e4bfcb4a0275dc17090a934d4c0a41a3c54046
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58592
Completely removes VariableIndex from the reducer code, as it is not needed: replica_index is always 0, so the code is simplified to use only the parameter index. Next, we should also remove all of the nested data structures that were needed when num_replicas > 1 was possible.
ghstack-source-id: 129498226
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D28528440
fbshipit-source-id: e0568399264ab4f86de3b7a379a4f0831f8f42e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57081
Changes in this diff:
1. Enable the passthrough autograd function when find_unused_parameters=True.
2. With the above, move prepare_for_backward, which does the unused-parameter checking logic, to the beginning of the backward pass, only when find_unused_parameters=True.
3. Enhance the unused-parameter checking to account for outputs not being used in the loss.
The way (3) is implemented is by triggering the autograd hook corresponding to parameters that did not participate in the loss computation. Since they did not participate, the autograd hook is triggered with a gradient of None, and the reducer handles this appropriately to ensure that the gradient is not touched.
Tested by ensuring that when a model output is not used in the loss, the corresponding grad is not modified. Also verified that the grads are the same in the local vs. DDP training case, and that gradients are not touched in this case, i.e. if a grad is originally None, it stays None (not zero) afterwards.
Note that in this diff we are not enabling the passthrough autograd function for the regular find_unused_parameters=False case, because that has a much bigger blast radius and needs additional careful analysis, especially with regard to performance.
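For illustration, a hedged sketch of the tested behavior (the module and shapes are made up and process-group setup is omitted): with find_unused_parameters=True and one output unused in the loss, the corresponding gradients stay untouched.
```python
import torch
import torch.nn as nn

class TwoHead(nn.Module):
    # Hypothetical two-headed model: only head `a` contributes to the loss below.
    def __init__(self):
        super().__init__()
        self.trunk = nn.Linear(8, 8)
        self.a = nn.Linear(8, 1)
        self.b = nn.Linear(8, 1)

    def forward(self, x):
        h = self.trunk(x)
        return self.a(h), self.b(h)

# Inside an initialized process group:
#   ddp = nn.parallel.DistributedDataParallel(TwoHead(), find_unused_parameters=True)
#   out_a, out_b = ddp(torch.randn(4, 8))
#   out_a.sum().backward()                    # out_b never reaches the loss
#   assert ddp.module.b.weight.grad is None   # stays None, not zeroed
```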
ghstack-source-id: 129425139
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D28048628
fbshipit-source-id: 71d7b6af8626804710017a4edd753787aa9bba61
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55248
This PR enables static-graph training when users call _set_static_graph(). This can help support more use cases in DDP without performance regression, and can potentially improve performance when there are unused parameters in the graph.
1. The first iteration records graph state, such as how many times a grad is calculated and whether the grad is used or not. The first iteration then queues a delay_all_reduce callback to allreduce the grads.
2. Since an autograd callback is associated with the current target graph task, the delay_all_reduce callback should be associated with the outermost backward graph task. A DDP sink layer is added in the DDP forward pass so that we can queue the delay_all_reduce callback in the sink layer.
3. After the first iteration, DDP uses the saved graph state to determine whether a grad is used or not and whether a grad is ready for communication.
4. Bucket rebuilding happens in the second iteration, after graph state is recorded in the first iteration.
5. If the graph state changes, DDP will throw an error. See the usage sketch below.
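A minimal usage sketch of the flow above, assuming the default process group can be initialized from environment variables (e.g. under torchrun) and that the set of used parameters is identical in every iteration:
```python
import torch
import torch.distributed as dist
import torch.nn as nn

dist.init_process_group("gloo")
ddp = nn.parallel.DistributedDataParallel(nn.Linear(8, 4))
ddp._set_static_graph()  # opt in to static-graph training

opt = torch.optim.SGD(ddp.parameters(), lr=0.1)
for step in range(3):
    ddp(torch.randn(2, 8)).sum().backward()  # iteration 1 records graph state and queues the
                                             # delayed allreduce; iteration 2 rebuilds buckets
    opt.step()
    opt.zero_grad()
```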
ghstack-source-id: 128599464
Test Plan: unit tests. adding more tests
Reviewed By: rohan-varma
Differential Revision: D27539964
fbshipit-source-id: 74de1ad2719465be67bab8688d6e293cd6e3a246
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54995
Provide a private DDP API to explicitly mark the training graph as static, and also set this flag in the logger.
ghstack-source-id: 127755713
Test Plan: unit tests
Reviewed By: rohan-varma
Differential Revision: D27444965
fbshipit-source-id: 06ef1c372296815944b2adb33fbdf4e1217c1359
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54981
Move part of the code in autograd_hook into functions, so that it can be reused for static-graph training later on.
ghstack-source-id: 127755405
Test Plan: unit tests
Reviewed By: SciPioneer
Differential Revision: D27439508
fbshipit-source-id: a02a4b029841f5e7f11cfc5496bb7972ef53d878
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55075
Constructs a mapping from parameter indices to parameter names and passes it to the Reducer, so that error messages about unused parameters / not all parameters getting gradients can name the offending parameters.
Use case:
1) The user runs DDP forward + backward, with some unused parameters that will result in a DDP error in the next iteration.
2) The next forward pass calls `Reducer::ensure_prior_reduction_finished()`, where we check that all params got a gradient in the previous backward pass; DDP throws here in this case.
3) The Reducer maintains the mapping, tracks used parameters, computes which parameters did not get a gradient, and logs this as part of the error.
Implementation details:
0) The following is only enabled for debug modes of INFO or DETAIL.
1) To save memory, we do not map param -> param name (which would require holding on to the tensor); instead we map param_index -> param_name and use the existing concept of variable_index in the Reducer to look up parameter names.
2) DDP constructs the param index -> param name mapping, where the name is the fully qualified name f"{module_name}:{param_name}", and passes it into the Reducer (see the sketch below).
3) The Reducer maintains a per-iteration std::set<int> of variable indices for which `mark_variable_ready` has been called.
4) When some params go unused, we take a set difference to detect the unused params.
5) Unit tests for the logged unused params, including for nested modules, are added.
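For illustration, a hedged sketch of the index-to-name mapping described in 2); the indexing order and the helper DDP actually uses are assumptions:
```python
import torch.nn as nn

def build_param_index_to_name(module: nn.Module) -> dict:
    # Map parameter index -> fully qualified "{module_name}:{param_name}" name.
    mapping = {}
    index = 0
    for fqn, param in module.named_parameters():
        if not param.requires_grad:
            continue
        module_name, _, param_name = fqn.rpartition(".")
        mapping[index] = f"{module_name}:{param_name}"
        index += 1
    return mapping

# Example: for nn.Sequential(nn.Linear(4, 4)) this yields {0: "0:weight", 1: "0:bias"}.
```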
ghstack-source-id: 126581051
Test Plan: CI, UT
Reviewed By: zhaojuanmao
Differential Revision: D27356394
fbshipit-source-id: 89f436af4e74145b0a8eda92b3c4e2af8e747332
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54977
Move part of the code in prepare_for_backward into functions, so that those functions can be used for static-graph training and delayed allreduce later on.
ghstack-source-id: 126366714
Test Plan: unit tests
Reviewed By: rohan-varma
Differential Revision: D27439195
fbshipit-source-id: 8899eda621260232d774cb145f9c6d683c47e188
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55353
Remove all the code branches that are only executed when more than one device is used per process (i.e., `device_ids` has more than one entry).
Some helper functions are also removed:
1. `_verify_replicas_within_process` and `verify_replicas_within_process`
2. `_replicate_modules_within_process`
3. `parallel_apply`
The next step is deprecating `_module_copies` field.
ghstack-source-id: 126201121
Test Plan: waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D27552201
fbshipit-source-id: 128d0216a202f5b1ba4279517d68c3badba92a6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54764
We mark a few vars as const in the Reducer, including replicas_ and process_group_, as they should not be changed by the Reducer during training. This can help catch issues at compile time and prevent developers from accidentally changing these variables.
ghstack-source-id: 125040110
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D27357132
fbshipit-source-id: 23a0edf754a8e4f9e6440e99860e5549724cb7ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54763
Replaces the deprecated torch::autograd::Variable with at::Tensor. torch::autograd::Variable is now defined to be the same as at::Tensor, so this should be a no-op, but it follows the convention of using Tensor instead of Variable.
ghstack-source-id: 125040109
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D27356450
fbshipit-source-id: 1a001358d7726a597141ec47803c8213db4814c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53145
Add a new API to allow users to set the sample rate for runtime stats, and add per-iteration latency breakdowns to the DDPLoggingData struct. For example, if users set the sample rate to 1, they can analyze per-iteration latency changes over time (not averaged); see the sketch below.
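A hedged usage sketch: the setter name below is an assumption about how the new API is surfaced on the DDP module, and `ddp_model` stands for an already constructed DistributedDataParallel instance.
```python
# Assumed method name; collect runtime stats on every iteration.
ddp_model._set_ddp_runtime_logging_sample_rate(1)

# With a sample rate of 1, the per-iteration latency breakdown (not averaged) can be
# inspected after each step via the DDP logging data.
```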
ghstack-source-id: 123443369
Test Plan: unit test
Reviewed By: SciPioneer
Differential Revision: D26763957
fbshipit-source-id: baff6a09c2a590e6eb91362ca6f47ae8fa6ddb0e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52887
This diff changes the way to do model consistency check (i.e. `_verify_replicas_across_processes`) in DDP.
There were a few things that could be improved with the way we verify model across processes in DDP initialization:
1. We should do this check before syncing module states in DDP init; otherwise, with the Gloo backend the sync itself will throw, whereas we would like to throw the error corresponding to different models on different ranks. To do this, we turn the methods into standalone C++ functions (not part of the reducer) and move the check to before synchronizing parameters.
2. Refactor DDP init in the following ways:
- Run the model consistency check before creating the reducer
- Add helper functions to build the params to pass into the reducer
- Add a helper function to call `_verify_model_across_ranks`
- Move `def parameters` into a helper function `_get_parameters` to be used more broadly within DDP
In follow-up changes we will add the ability to detect which rank had an inconsistent model (https://github.com/pytorch/pytorch/issues/52876 would be useful for determining which rank(s) had errors).
ghstack-source-id: 123171877
Test Plan:
CI/unittest
buck test mode/dev-nosan //caffe2/test/distributed:c10d
BACKEND="nccl" WORLD_SIZE="2" ~/fbcode/buck-out/dev/gen/caffe2/test/distributed/distributed_nccl_fork#binary.par -r test_ddp_model_diff_across_ranks
Reviewed By: zhaojuanmao
Differential Revision: D26565290
fbshipit-source-id: f0e1709585b53730e86915e768448f5b8817a608
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51386
Add stats such as rebuilt-bucket stats, unused-parameter stats, and performance stats to the DDP logging data.
1. GPU time stats are not collected for the single-process multiple-device case in this diff, as that requires events to be created and recorded on multiple devices.
2. Use the at::cuda event API for safer calls.
3. Events may not be created in the autograd hook if the hook is not triggered by the user's code, e.g., when running in no-sync mode for some iterations. So we check whether the events were created before synchronizing on them, and skip invalid results.
4. Users may not set the device upfront, so we explicitly set the proper device before creating events in our prepare_forward() and prepare_backward() calls; see the sketch below.
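For illustration, a Python analogue of the event-based timing pattern in points 2-4 (the Reducer itself does this in C++ via the at::cuda event API):
```python
import torch

torch.cuda.set_device(0)  # set the device explicitly before creating events
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()
# ... forward/backward work on the current stream ...
end.record()

torch.cuda.synchronize()               # make sure both events have completed
elapsed_ms = start.elapsed_time(end)   # skipped if the events were never recorded
```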
ghstack-source-id: 121933566
Test Plan: unit tests
Reviewed By: SciPioneer
Differential Revision: D26158645
fbshipit-source-id: ce5f15187802eba76accb980449be68902c10178
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52031
Closes https://github.com/pytorch/pytorch/issues/52020
Ensures that we can profile collectives in DDP by propagating the profiler ThreadLocalState appropriately. As described in the issue above, this previously didn't work because the profiler was only enabled on the main thread.
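For illustration, a hedged sketch of profiling a DDP iteration, where `ddp_model` and `inp` are assumed to be an existing DistributedDataParallel instance and a suitable input; with the profiler state propagated, collectives launched during backward now appear in the trace (exact event names depend on the backend):
```python
import torch

with torch.autograd.profiler.profile() as prof:
    ddp_model(inp).sum().backward()
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=20))
```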
ghstack-source-id: 121818080
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D26356192
fbshipit-source-id: 0158b5833a3f857a0b4b2943ae3037e9d998dfd1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50622
1. Define a DDPLoggingData struct that is the placeholder for all the DDP-related logging fields.
2. Put the DDPLoggingData struct in the c10 directory so that it can be easily included by c10 and torch files.
3. Expose a get_ddp_logging_data() method in Python so that users can retrieve the logging data and dump it in their applications (see the sketch below).
4. Unit tests verify that the logging data can be set and retrieved as expected.
5. Follow-ups will add more logging fields such as perf stats, internal states, and env variables.
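A minimal usage sketch of the method named in 3, where `ddp_model` is assumed to be an existing DistributedDataParallel instance and the exact fields on the returned DDPLoggingData depend on the version:
```python
logging_data = ddp_model.get_ddp_logging_data()
print(logging_data)  # dump the DDP logging fields, e.g. into application logs
```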
ghstack-source-id: 120275870
Test Plan: unit tests
Reviewed By: SciPioneer
Differential Revision: D25930527
fbshipit-source-id: 290c200161019c58e28eed9a5a2a7a8153113f99
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47203
1. Create a new field in BucketReplica to store sizes info for each variable.
2. Export the sizes list, along with lengths and offsets, to GradBucket.
These fields are needed for PowerSGD.
Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 115875194
Test Plan: Checked the field values from log.
Reviewed By: rohan-varma
Differential Revision: D24644137
fbshipit-source-id: bcec0daf0d02cbf25389bfd9be90df1e6fd8fc56
Summary:
* This is a pre-step to build c10d into libtorch
* Includes a minor cleanup in c10d/CMakeLists.txt
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47309
Reviewed By: wanchaol
Differential Revision: D24711768
Pulled By: gmagogsfm
fbshipit-source-id: 6f9e0a6a73c30f5ac7dafde9082efcc4b725dde1