pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 00:20:18 +01:00


Andrew Gu c30659ffcc [ZeRO] (Reland) Add ctor support for multiple param groups (#72932)	2022-02-22 16:29:55 +00:00

Summary: Reland of https://github.com/pytorch/pytorch/pull/72578.

Overview

Windows CI was failing due to the multi-rank single-GPU case (see [here](https://github.com/pytorch/pytorch/runs/5204906995?check_suite_focus=true)). To address this, I:
- added `common_distributed.skip_if_no_gpu` for `test_multiple_param_groups()` to ensure that each rank can safely call `to(self.device)`; this targets the expected SPSD use case, where each rank has its own GPU;
- moved `test_constructor()` back to `TestZeroRedundancyOptimizerSingleRank` to check that construction with multiple parameter groups works even on a single rank.

Test Plan

- I checked both tests on CPU, 1 GPU, 2 GPUs, 4 GPUs, and 8 GPUs.
- I added the `ciflow/win` label to run the failing Windows CI test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72932
Reviewed By: rohan-varma
Differential Revision: D34281482
Pulled By: awgu
fbshipit-source-id: c4fe604ddd9d2c123c3071249741e6b8a6454b6e
(cherry picked from commit `6bea9bcc63`)
_shard	[PT-D][Sharded Tensor] new init api for local tensor and sharding spec auto inference (#72733)	2022-02-16 17:42:39 +00:00
algorithms	[BE] move init_multigpu_helper to common_distributed (#67050)	2021-10-22 17:16:11 -07:00
bin	Add test owner to distributed files starting with test_ (#66797)	2021-10-19 10:55:20 -07:00
elastic	Revise the socket implementation of c10d (#68226)	2021-11-16 20:49:25 -08:00
fsdp	Revert D33919683: [FSDP] Implement local_state_dict and load_local_state_dict	2022-02-20 02:32:48 +00:00
launcher	[torchelastic][1/n] Fix `caffe2.test.distributed.launcher.api_test` flaky tests (#68624)	2021-11-19 15:23:30 -08:00
nn/jit	Have test classes extend from common_utils.TestCase, not unittest.TestCase (#66900)	2021-10-19 16:54:05 -07:00
optim	[ZeRO] (Reland) Add ctor support for multiple param groups (#72932)	2022-02-22 16:29:55 +00:00
pipeline/sync	[skip ci] set more tests with owners for distributed and elastic (#67583)	2021-11-01 12:26:03 -07:00
rpc	Add test owner to distributed files starting with test_ (#66797)	2021-10-19 10:55:20 -07:00
argparse_util_test.py	[skip ci] set more tests with owners for distributed and elastic (#67583)	2021-11-01 12:26:03 -07:00
test_c10d_common.py	[BE] rename some tests in test_c10d_common (#67828)	2021-11-18 17:14:58 -08:00
test_c10d_gloo.py	no longer coalesce sparse COO tensors before comparison (#69751)	2022-02-17 02:33:08 +00:00
test_c10d_nccl.py	Implement scatter primitive for ProcessGroupNCCL (#70029)	2022-01-27 19:37:55 +00:00
test_c10d_spawn_gloo.py	[PyTorch][Distributed] Enable Reduce Scatter and modify all_to_all for sharded linear with more test cases. (#68786)	2021-12-06 13:38:58 -08:00
test_c10d_spawn_nccl.py	[PyTorch][Distributed] Enable Reduce Scatter and modify all_to_all for sharded linear with more test cases. (#68786)	2021-12-06 13:38:58 -08:00
test_c10d_spawn.py	[PyTorch][Distributed] Enable Reduce Scatter and modify all_to_all for sharded linear with more test cases. (#68786)	2021-12-06 13:38:58 -08:00
test_data_parallel.py	no longer coalesce sparse COO tensors before comparison (#69751)	2022-02-17 02:33:08 +00:00
test_distributed_spawn.py	Add test owner to distributed files starting with test_ (#66797)	2021-10-19 10:55:20 -07:00
test_launcher.py	Add test owner to distributed files starting with test_ (#66797)	2021-10-19 10:55:20 -07:00
test_nccl.py	[NCCL] Patch bfloat16 support (#67843)	2021-11-09 13:46:13 -08:00
test_pg_wrapper.py	Add test owner to distributed files starting with test_ (#66797)	2021-10-19 10:55:20 -07:00
test_store.py	Add support for deleteKey for FileStore (#69953)	2022-01-07 06:20:59 -08:00