Commit Graph

34 Commits

Luca Wehrstedt
a1780432fa Move c10d to libtorch(_cuda) (#59563)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59563

ghstack-source-id: 131331264

Test Plan: CI

Reviewed By: malfet

Differential Revision: D28932239

fbshipit-source-id: 5df6cdfa5253b15cbbc97039fe672d6d97321e34
2021-06-15 02:01:31 -07:00
Luca Wehrstedt
773b56e719 Fix Windows guards in c10d (#59696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59696

Some files in c10d refer to dist autograd. However, on Windows, dist autograd isn't built, so we need to "mask out" those references under Windows. This was already partly done, but when moving c10d to libtorch some issues came up, possibly due to the different way in which linking happens, so I masked out the remaining references.
ghstack-source-id: 131169541

Test Plan: CI

Reviewed By: agolynski

Differential Revision: D28987579

fbshipit-source-id: c29c5330f8429d699554972d30f99a89b2e3971d
2021-06-11 05:06:40 -07:00
Luca Wehrstedt
cbcae46fa5 Remove USE_CUDA from c10d reducer/logger (#59562)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59562

Needed to merge c10d into libtorch(_cuda).

ghstack-source-id: 131169542

Test Plan: CI

Reviewed By: agolynski

Differential Revision: D28931378

fbshipit-source-id: 71376b862ff6ef7dbfa7331ec8d269bd3fcc7e0d
2021-06-11 05:06:39 -07:00
Yi Wang
31d136c81f [DDP] Rename the member divFactor_ as div_factor for naming consistency in reducer (#59523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59523

Use snake case instead of camel case for consistency.
ghstack-source-id: 130759655

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_grad_div_uneven_inputs

Reviewed By: cbalioglu

Differential Revision: D28922896

fbshipit-source-id: e04298284a78b2e71b562f790a878731962f873a
2021-06-08 10:04:20 -07:00
Yi Wang
6575975da9 [Reland2][DDP] Merge work and future_work in reducer (#59574)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59574

Remove `work` attribute from Reducer class in favor of `future_work`.

Additionally, remove the `copy_grad_to_bucket` method, since it is now only a one-line implementation, and create a new C++ comm hook called `_AllReduceCommHookWithDivFactor` to replace allreduce and also support handling uneven input.

1) Compared with the reverted https://github.com/pytorch/pytorch/pull/58937, `_AllReduceCommHookWithDivFactor` in `default_comm_hooks.cpp` now applies the division first and hence avoids FP16 overflow (a hedged sketch of this idea follows below).

2) Compared with the reverted https://github.com/pytorch/pytorch/pull/59520, disabled `test_DistributedDataParallel_non_default_stream` on AMD, because now applying division first hurts the gradient averaging accuracy on AMD.
See [07:48:26]:
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.2-py3.6-test1/1129/console
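Note: the actual fix is the C++ hook `_AllReduceCommHookWithDivFactor`; the following is only a minimal Python-level sketch of the divide-before-allreduce idea, using the public DDP communication hook API. The `bucket.buffer()` accessor name and the registration line are assumptions, not code from this PR.

```python
import torch.distributed as dist

def allreduce_with_div_first(process_group, bucket):
    # Hedged sketch: scale the bucketed FP16 gradients down by world size
    # *before* the allreduce so the summed result is less likely to overflow.
    group = process_group if process_group is not None else dist.group.WORLD
    tensor = bucket.buffer()  # flattened gradient bucket (accessor name assumed)
    tensor.div_(group.size())
    fut = dist.all_reduce(tensor, group=group, async_op=True).get_future()
    return fut.then(lambda f: f.value()[0])

# Assumed usage: ddp_model.register_comm_hook(state=None, hook=allreduce_with_div_first)
```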

#Original PR Issue: https://github.com/pytorch/pytorch/issues/41266
ghstack-source-id: 130752393

Test Plan:
buck test caffe2/test/distributed:distributed_gloo_fork --  test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_grad_div_uneven_inputs
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_grad_is_view

buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork --  test_DistributedDataParallel_non_default_stream

Reviewed By: rohan-varma

Differential Revision: D28940800

fbshipit-source-id: 1ba727ac951ebc1e7875dc1a1be8108a2c8d9462
2021-06-07 16:52:20 -07:00
Mike Ruberry
94cc681fc2 Revert D28922305: [Reland][DDP] Merge work and future_work in reducer
Test Plan: revert-hammer

Differential Revision:
D28922305 (3137bbeb1a)

Original commit changeset: 6388a96eda7a

fbshipit-source-id: bc150672e857286eeb129ea683b1cfd2034f0564
2021-06-07 03:58:20 -07:00
Yi Wang
3137bbeb1a [Reland][DDP] Merge work and future_work in reducer (#59520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59520

Remove `work` attribute from Reducer class in favor of `future_work`.

Additionally, remove the `copy_grad_to_bucket` method, since it is now only a one-line implementation, and create a new C++ comm hook called `_AllReduceCommHookWithDivFactor` to replace allreduce and also support handling uneven input.

Compared with the reverted https://github.com/pytorch/pytorch/pull/58937, updated `_AllReduceCommHookWithDivFactor` in `default_comm_hooks.cpp` to apply division first and hence avoid FP16 overflow.

#Original PR Issue: https://github.com/pytorch/pytorch/issues/41266
ghstack-source-id: 130685351

Test Plan:
buck test caffe2/test/distributed:distributed_gloo_fork --  test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_grad_div_uneven_inputs
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_grad_is_view

Reviewed By: walterddr

Differential Revision: D28922305

fbshipit-source-id: 6388a96eda7a06f292873afed6d1362096c13e1c
2021-06-06 09:49:08 -07:00
Rong Rong (AI Infra)
c88a0b55b3 Revert D28677383: [DDP] Merge work and future_work in reducer
Test Plan: revert-hammer

Differential Revision:
D28677383 (f8bebade47)

Original commit changeset: 85e0620378b7

fbshipit-source-id: ef3c65b88c375aa9a6befe2ab004ec37ae7eb587
2021-06-05 07:25:44 -07:00
Yi Wang
f8bebade47 [DDP] Merge work and future_work in reducer (#58937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58937

Remove `work` attribute from Reducer class in favor of `future_work`.

Additionally, remove the `copy_grad_to_bucket` method, since it is now only a one-line implementation, and create a new C++ comm hook called `_AllReduceCommHookWithDivFactor` to replace allreduce and also support handling uneven input.

#Original PR Issue: https://github.com/pytorch/pytorch/issues/41266
ghstack-source-id: 130673249

Test Plan:
buck test caffe2/test/distributed:distributed_gloo_fork --  test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_grad_div_uneven_inputs

Reviewed By: agolynski

Differential Revision: D28677383

fbshipit-source-id: 85e0620378b7e9d837e436e94b9d807631d7d752
2021-06-05 01:18:30 -07:00
Rohan Varma
79aeca0b00 [DDP] Log when errors happen (#59281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59281

Adds the ability to log when the reducer/DDP encounters an error. We add the fields "has_error" and "error" to indicate that an error has occurred in this iteration and that the other fields (performance stats) are not guaranteed to be updated.

Errors encountered in python-side DDP will be added in the next diff.
ghstack-source-id: 130412974

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28652717

fbshipit-source-id: 9772abc2647a92dac6a325da6976ef5eb877c589
2021-06-02 19:48:26 -07:00
Rohan Varma
d83c5a5c7f Format reducer.cpp, hpp (#58593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58593

Per title
ghstack-source-id: 129498230

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D28528465

fbshipit-source-id: 89e4bfcb4a0275dc17090a934d4c0a41a3c54046
2021-05-20 22:32:30 -07:00
Rohan Varma
62adf9e1c9 [Reducer] Completely remove VariableIndex (#58592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58592

Completely removes VariableIndex from the reducer code, as it is not needed. replica_index is always 0, so the code is simplified to use only the parameter index. Next, we should also remove all of the nested data structures that were needed when num_replicas > 1 was possible.
ghstack-source-id: 129498226

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D28528440

fbshipit-source-id: e0568399264ab4f86de3b7a379a4f0831f8f42e9
2021-05-20 19:47:50 -07:00
Rohan Varma
faa7d3793d [DDP] Support not all outputs used in loss calculation (#57081)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57081

Changes in this diff:

1. Enable the passthrough autograd function when find_unused_parameters=True.
2. With the above, move prepare_for_backward, which does the unused parameter checking logic, to the beginning of the backward pass, only when find_unused_parameters=True.
3. Enhance the unused parameter checking to account for outputs not being used in the loss.

The way (3) is implemented is by triggering the autograd hook corresponding to parameters that did not participate in loss computation. Since they did not participate, the autograd hook is triggered with a gradient of None, and the reducer handles this appropriately to ensure that the gradient is not touched.

Tested by ensuring that when a model output is not used in the loss, the corresponding grad is not modified. Also verified that the grads are the same in the local vs. DDP training case, and that gradients are not touched in this case, i.e. if a grad is originally None, it stays None (not zero) afterwards.

Note that in this diff we are not enabling the passthrough autograd function for the regular find_unused_parameters=False case, because that has a much bigger blast radius and needs additional careful analysis, especially with regard to performance.
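A minimal single-process sketch of the behavior this diff tests (illustrative only, not the PR's test code; the gloo/world_size=1 setup and module names are assumptions):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

class TwoHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(8, 8)
        self.unused = nn.Linear(8, 8)

    def forward(self, x):
        return self.used(x), self.unused(x)

model = DDP(TwoHead(), find_unused_parameters=True)
out_used, out_unused = model(torch.randn(4, 8))
out_used.sum().backward()  # the second output never contributes to the loss

# Per the commit description, grads reachable only through the unused output
# should stay untouched (None), not be zeroed.
print(model.module.unused.weight.grad)  # expected: None

dist.destroy_process_group()
```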
ghstack-source-id: 129425139

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D28048628

fbshipit-source-id: 71d7b6af8626804710017a4edd753787aa9bba61
2021-05-20 08:34:33 -07:00
Yanli Zhao
ea421fb249 enable static graph training in DDP (#55248)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55248

This PR enables static graph training when users call _set_static_graph(). This can support more use cases in DDP without performance regression, and can potentially improve performance when there are unused parameters in the graph. (A hedged usage sketch follows the list below.)
1. The first iteration records graph state, such as how many times a grad is calculated and whether the grad is used or not. The first iteration then queues a delay_all_reduce callback to all-reduce the grads.
2. Since an autograd callback is associated with the current target graph task, the delay_all_reduce callback should be associated with the outermost backward graph task. A DDP sink layer is added in the DDP forward loop so that we can queue the delay_all_reduce callback in the sink layer.
3. After the first iteration, DDP uses the saved graph state to determine whether a grad is used and whether it is ready for communication.
4. Bucket rebuilding is called in the second iteration, after graph state has been recorded in the first iteration.
5. If the graph state changes, DDP throws an error.
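A hedged usage sketch of the static-graph flow described above (single-process gloo setup for illustration; `_set_static_graph()` is the private API named in this stack, everything else is an assumption):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(nn.Linear(16, 4))
model._set_static_graph()  # declare that the training graph will not change across iterations

opt = torch.optim.SGD(model.parameters(), lr=0.1)
for step in range(3):
    # Iteration 1 records graph state and queues the delayed all-reduce;
    # later iterations reuse the recorded state.
    opt.zero_grad()
    loss = model(torch.randn(8, 16)).sum()
    loss.backward()
    opt.step()

dist.destroy_process_group()
```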
ghstack-source-id: 128599464

Test Plan: unit tests. adding more tests

Reviewed By: rohan-varma

Differential Revision: D27539964

fbshipit-source-id: 74de1ad2719465be67bab8688d6e293cd6e3a246
2021-05-11 10:23:25 -07:00
Yanli Zhao
3f81912885 static graph api skeleton (#54995)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54995

Provide a private DDP API to explicitly mark the training graph as static, and also set this flag in the logger.
ghstack-source-id: 127755713

Test Plan: unit tests

Reviewed By: rohan-varma

Differential Revision: D27444965

fbshipit-source-id: 06ef1c372296815944b2adb33fbdf4e1217c1359
2021-04-30 11:07:26 -07:00
Yanli Zhao
5f2b9b1df9 refactor autograd_hook (#54981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54981

Move part of the code in autograd_hook into functions so that it can be reused for static graph training later on.
ghstack-source-id: 127755405

Test Plan: unit tests

Reviewed By: SciPioneer

Differential Revision: D27439508

fbshipit-source-id: a02a4b029841f5e7f11cfc5496bb7972ef53d878
2021-04-30 11:06:04 -07:00
Rohan Varma
51e7a371f5 [DDP] Param to name mapping in Reducer (#55075)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55075

Constructs a parameter-to-name mapping and passes it to the Reducer, so that error messages about unused parameters (or about not all parameters getting a gradient) can include the parameter names.

Use case:
1) The user runs a DDP forward + backward pass that has some unused parameters, which will result in a DDP error in the next iteration.
2) The next forward pass calls `Reducer::ensure_prior_reduction_finished()`, where we check that all params got a gradient from the previous backward pass. DDP would throw here in this case.
3) The Reducer maintains the mapping and tracks used parameters, computes which parameters did not get a gradient, and logs this as part of the error.

Implementation details:
0) The following is only enabled for debug modes INFO or DETAIL.
1) To save memory, we don't map param -> param name, so that we don't have to copy the entire tensor; instead we map param_index -> param_name and use the existing concept of variable_index in the Reducer to look up parameter names.
2) DDP constructs the param index -> param name mapping, where the name is the fully qualified name f"{module_name}:{param_name}", and passes it into the Reducer (an illustrative sketch follows this list).
3) The Reducer maintains a per-iteration std::set<int> of variable indices that have had `mark_variable_ready` called.
4) When some params go unused, we take a set difference to detect the unused params.
5) Unit tests for the logged unused params, including nested modules, are added.
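An illustrative Python sketch of the index-to-name mapping from point 2) and the set difference from point 4); the helper name and the exact index order are assumptions, not the reducer's actual implementation:

```python
import torch.nn as nn

def build_param_index_to_name(module):
    # Map a running parameter index to "module_name:param_name", mirroring the
    # fully qualified naming described above (index order is an assumption).
    mapping = {}
    index = 0
    for module_name, submodule in module.named_modules():
        for param_name, param in submodule.named_parameters(recurse=False):
            if param.requires_grad:
                mapping[index] = f"{module_name}:{param_name}" if module_name else param_name
                index += 1
    return mapping

model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
index_to_name = build_param_index_to_name(model)

# Pretend only indices 0 and 1 were marked ready this iteration; the set
# difference yields the parameters that never received a gradient.
ready = {0, 1}
unused = sorted(set(index_to_name) - ready)
print([index_to_name[i] for i in unused])  # e.g. ['2:weight', '2:bias']
```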
ghstack-source-id: 126581051

Test Plan: CI, UT

Reviewed By: zhaojuanmao

Differential Revision: D27356394

fbshipit-source-id: 89f436af4e74145b0a8eda92b3c4e2af8e747332
2021-04-15 09:19:50 -07:00
Yanli Zhao
5ffc4e3b0f refactor prepare_for_backward (#54977)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54977

Move part of the code in prepare_for_backward into functions, so that those functions can be reused for static graph training and delayed all-reduce later on.
ghstack-source-id: 126366714

Test Plan: unit tests

Reviewed By: rohan-varma

Differential Revision: D27439195

fbshipit-source-id: 8899eda621260232d774cb145f9c6d683c47e188
2021-04-13 14:25:29 -07:00
Yi Wang
3e9cbe5ef7 [SPMD] Remove the code branches only used in SPMD mode from distributed.py (#55353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55353

Remove all the code branches that will only be executed when `device_ids > 1`.

Some helper functions are also removed:
1.  `_verify_replicas_within_process` and `verify_replicas_within_process`
2. `_replicate_modules_within_process`
3. `parallel_apply`

The next step is deprecating `_module_copies` field.
ghstack-source-id: 126201121

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D27552201

fbshipit-source-id: 128d0216a202f5b1ba4279517d68c3badba92a6c
2021-04-09 17:27:56 -07:00
Rohan Varma
5c3d80d8fa [DDP] Mark a few variables as const in reducer (#54764)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54764

We mark a few vars as const in the Reducer, and also do this for replicas_ and process_group_, as they should not be changed by the Reducer during training. This helps catch issues at compile time and prevents the developer from accidentally changing these variables.
ghstack-source-id: 125040110

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D27357132

fbshipit-source-id: 23a0edf754a8e4f9e6440e99860e5549724cb7ad
2021-03-27 21:40:18 -07:00
Rohan Varma
671f80a313 [c10d] s/torch::autograd::variable/at::Tensor/g (#54763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54763

Replaces the deprecated torch::autograd::variable with at::Tensor. torch::autograd::variable is now defined as equal to at::Tensor, so this should be a no-op, but it follows the convention of using Tensor instead of Variable.
ghstack-source-id: 125040109

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D27356450

fbshipit-source-id: 1a001358d7726a597141ec47803c8213db4814c0
2021-03-27 21:38:51 -07:00
Rohan Varma
f52a3bd634 [DDP] remove dedupe check in reducer (#53919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53919

https://github.com/pytorch/pytorch/pull/53279/files has landed, deduplicating the shared params in Python before constructing the reducer. Because of this, we no longer need the changes in https://github.com/pytorch/pytorch/pull/46755/files.

This is already tested by `test_ddp_shared_grad_acc_unused_params` and
`test_ddp_weight_sharing`
ghstack-source-id: 123828299

Test Plan: ci

Reviewed By: SciPioneer

Differential Revision: D27015466

fbshipit-source-id: efb079540c1a0e18bb38e68479caeb50cf550304
2021-03-15 18:50:05 -07:00
Yanli Zhao
a08fc1a7fc allow users to set sample rate and add per iteration latency breakdowns (#53145)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53145

Add a new API to allow users to set the sample rate for runtime stats, and add per-iteration latency breakdowns to the DDPLoggingData struct. For example, if users set the sample rate to 1, they can analyze how per-iteration latency changes over time (not averaged).
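A hedged usage sketch; the method name `_set_ddp_runtime_logging_sample_rate` is an assumption based on later PyTorch releases rather than something spelled out in this commit message, and the gloo/world_size=1 setup is only for illustration:

```python
import os
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29502")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(nn.Linear(16, 4))
# Sample rate 1: collect runtime stats (including the per-iteration latency
# breakdowns added here) every iteration instead of every N-th iteration.
model._set_ddp_runtime_logging_sample_rate(1)

dist.destroy_process_group()
```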
ghstack-source-id: 123443369

Test Plan: unit test

Reviewed By: SciPioneer

Differential Revision: D26763957

fbshipit-source-id: baff6a09c2a590e6eb91362ca6f47ae8fa6ddb0e
2021-03-10 11:35:18 -08:00
Rohan Varma
68134374cb Refactor/fix DDP model check during init (#52887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52887

This diff changes how the model consistency check (i.e. `_verify_replicas_across_processes`) is done in DDP.

There were a few things that could be improved with the way we verify model across processes in DDP initialization:

1. We should do this check before syncing module states in DDP init; otherwise, with the Gloo backend this will throw, but we would like to throw the error corresponding to different models on different ranks. To do this, we move the methods to standalone C++ functions (not part of the reducer) and move this check to before synchronizing parameters.
2. Refactor DDP init in the following ways:
- run the model consistency check before creating the reducer
- add helper functions to build the params to pass into the reducer
- add a helper function to call `_verify_model_across_ranks`
- move `def parameters` to a helper function `_get_parameters` to be used more broadly within DDP

In follow-up changes we will add the ability to detect which rank had an inconsistent model (https://github.com/pytorch/pytorch/issues/52876 would be useful for determining which rank(s) had errors).
ghstack-source-id: 123171877

Test Plan:
CI/unittest
buck test mode/dev-nosan //caffe2/test/distributed:c10d
BACKEND="nccl" WORLD_SIZE="2" ~/fbcode/buck-out/dev/gen/caffe2/test/distributed/distributed_nccl_fork#binary.par -r test_ddp_model_diff_across_ranks

Reviewed By: zhaojuanmao

Differential Revision: D26565290

fbshipit-source-id: f0e1709585b53730e86915e768448f5b8817a608
2021-03-05 11:21:45 -08:00
Yanli Zhao
c75fa39b6c add stats that can only be collected at runtime (#51386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51386

Add stats such as rebuilt bucket stats, unused parameter stats, and performance stats to the DDP logging data.

1. GPU time stats are not collected for single-process multiple-device mode in this diff, as that requires events to be created and recorded on multiple devices.
2. Use the at::cuda event API for safer calls.
3. Events may not be created in the autograd hook if the hook is not triggered by the user's code, e.g. when the user runs in no-sync mode in some iterations. So we check whether events were created before synchronizing, and also skip invalid results.
4. Users may not set the device upfront, so we explicitly set the proper device before creating events in our prepare_forward() and prepare_backward() calls. (A Python-level timing analogue follows this list.)
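The reducer's timing lives in C++; as a Python-level analogue of the event-based measurement described in points 1-4 (illustrative only, not the reducer's code):

```python
import torch

# Record CUDA events around a piece of GPU work and read the elapsed time only
# after synchronizing, mirroring the "check before synchronizing" caveat above.
if torch.cuda.is_available():
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    x = torch.randn(1024, 1024, device="cuda")
    start.record()
    y = x @ x
    end.record()
    torch.cuda.synchronize()
    print(f"elapsed GPU time: {start.elapsed_time(end):.3f} ms")
```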

ghstack-source-id: 121933566

Test Plan: unit tests

Reviewed By: SciPioneer

Differential Revision: D26158645

fbshipit-source-id: ce5f15187802eba76accb980449be68902c10178
2021-02-19 00:13:11 -08:00
Rohan Varma
6dabe0b291 [Dist Profiling] Enable dist profiling for DDP (gloo only) (#52031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52031

Closes https://github.com/pytorch/pytorch/issues/52020
Ensures that we can profile collectives in DDP by propagating the profiler threadLocalState appropriately. As described in the above issue, this previously didn't work because the profiler would only be enabled on the main thread.
ghstack-source-id: 121818080

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D26356192

fbshipit-source-id: 0158b5833a3f857a0b4b2943ae3037e9d998dfd1
2021-02-17 12:21:37 -08:00
Yanli Zhao
18e0a61388 add more logging fields that can be set in construction time (#51260)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51260

Add more logging fields to DDPLoggingData, including param stats, bucket stats, environment variables, NCCL version, and data type.
ghstack-source-id: 121260224

Test Plan: unit tests

Reviewed By: rohan-varma

Differential Revision: D26118245

fbshipit-source-id: ba48b7a11340bda1f5f3b24c8603545d346361e9
2021-02-09 21:58:58 -08:00
Yanli Zhao
250c71121b Create a DDPLoggingData and expose it to python interface (#50622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50622

1. Define a DDPLoggingData struct as the placeholder for all DDP-related logging fields.
2. Put the DDPLoggingData struct in the C10 directory so that it can be easily imported by c10 and torch files.
3. Expose the get_ddp_logging_data() method in Python so that users can retrieve the logging data and dump it in their applications (a hedged usage sketch follows this list).
4. Unit tests verify that the logging data can be set and retrieved as expected.
5. Follow-ups will add more logging fields such as perf stats, internal states, and env variables.
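A hedged usage sketch of point 3; the commit names get_ddp_logging_data(), while later releases expose it as the underscore-prefixed `_get_ddp_logging_data()` used below, and the return type/field names vary across versions:

```python
import os
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29503")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(nn.Linear(16, 4))
# Retrieve the DDPLoggingData exposed to Python and dump it; the exact fields
# (world size, bucket sizes, perf stats, ...) depend on the release.
logging_data = model._get_ddp_logging_data()
print(logging_data)

dist.destroy_process_group()
```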
ghstack-source-id: 120275870

Test Plan: unit tests

Reviewed By: SciPioneer

Differential Revision: D25930527

fbshipit-source-id: 290c200161019c58e28eed9a5a2a7a8153113f99
2021-01-25 15:23:07 -08:00
Wanchao Liang
553ccccc54 [c10d] switch ProcessGroup to be managed by intrusive_ptr (#47343)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47343

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D24723418

Pulled By: wanchaol

fbshipit-source-id: 0463819b96c53b12bdbb3905431110d7b21beb77
2020-11-12 07:36:23 -08:00
Wanchao Liang
70ae5685f9 [reland][c10d] switch ProcessGroup::Work to be managed by intrusive_ptr (#47806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47806

reland https://github.com/pytorch/pytorch/pull/44046

Test Plan: wait for ci

Reviewed By: gmagogsfm

Differential Revision: D24905245

fbshipit-source-id: ad75ace5432fcfd22d513878f5a73c4bb017324e
2020-11-11 22:51:03 -08:00
Wanchao Liang
dac0192148 Revert D23632280: [c10d] switch ProcessGroup::Work to be managed by intrusive_ptr
Test Plan: revert-hammer

Differential Revision:
D23632280 (0650a6166f)

Original commit changeset: 0a4642a8ffab

fbshipit-source-id: 2aa8ddb874fab11f773f4c08d740afcd865482e9
2020-11-11 10:54:08 -08:00
Wanchao Liang
0650a6166f [c10d] switch ProcessGroup::Work to be managed by intrusive_ptr (#44046)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44046

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23632280

Pulled By: wanchaol

fbshipit-source-id: 0a4642a8ffabdd26c52c1baabfa30c0f446c3c85
2020-11-10 23:30:22 -08:00
Yi Wang
6b3802a711 [Gradient Compression] Export sizes, along with length and offset of each variable to GradBucket for PowerSGD (#47203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47203

1. Create a new field in BucketReplica to store the sizes info for each variable.
2. Export the sizes list, along with lengths and offsets, to GradBucket.

These fields are needed for PowerSGD.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 115875194

Test Plan: Checked the field values from log.

Reviewed By: rohan-varma

Differential Revision: D24644137

fbshipit-source-id: bcec0daf0d02cbf25389bfd9be90df1e6fd8fc56
2020-11-04 12:34:53 -08:00
Yanan Cao
5c4bd9a38f Move python-independent c10d implementations to torch/lib (#47309)
Summary:
* This is a pre-step to build c10d into libtorch
* Includes a minor cleanup in c10d/CMakeLists.txt

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47309

Reviewed By: wanchaol

Differential Revision: D24711768

Pulled By: gmagogsfm

fbshipit-source-id: 6f9e0a6a73c30f5ac7dafde9082efcc4b725dde1
2020-11-03 23:39:54 -08:00