Commit Graph

390 Commits

pritam
37eb31599c [reland] Add sharding tests to multigpu-test.sh and fix custom operator decorator (#77987)
1. Enabled multigpu tests.
2. Fixed failing multigpu tests.
3. Fixed the custom operator decorator so that it takes first preference in operator dispatch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77987
Approved by: https://github.com/fduwjj, https://github.com/wanchaol, https://github.com/janeyx99
2022-05-21 22:33:58 +00:00
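For illustration only, a minimal sketch of what registering a custom operator for ShardedTensor might look like under the decorator-precedence behavior described above; the decorator name `custom_sharded_op_impl` and its signature are assumptions, not taken from this PR.

```python
# Hedged sketch: a user-registered op override for ShardedTensor. With the fix
# above, such a registration is meant to take precedence in operator dispatch.
# The decorator name and signature below are assumptions, not verified.
import torch
from torch.distributed._shard.sharded_tensor import custom_sharded_op_impl


@custom_sharded_op_impl(torch.nn.functional.linear)
def my_sharded_linear(types, args=(), kwargs=None, process_group=None):
    # Custom handling of F.linear when a ShardedTensor is among the inputs.
    raise NotImplementedError("custom sharded linear would go here")
```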
PyTorch MergeBot
0f74b44f1a Revert "Add sharding tests to multigpu-test.sh and fix custom operator decorator (#77825)"
This reverts commit 8d4c8df33a.

Reverted https://github.com/pytorch/pytorch/pull/77825 on behalf of https://github.com/janeyx99 because it will break multigpu test reporting
2022-05-20 17:59:03 +00:00
pritam
8d4c8df33a Add sharding tests to multigpu-test.sh and fix custom operator decorator (#77825)
1. Enabled multigpu tests.
2. Fixed failing multigpu tests.
3. Fixed the custom operator decorator so that it takes first preference in operator dispatch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77825
Approved by: https://github.com/wanchaol, https://github.com/fduwjj
2022-05-20 16:53:27 +00:00
PyTorch MergeBot
5e0f559d23 Revert "Add sharding tests to multigpu-test.sh (#77708)"
This reverts commit a7cf95a609.

Reverted https://github.com/pytorch/pytorch/pull/77708 on behalf of https://github.com/suo
2022-05-18 21:47:11 +00:00
pritam
a7cf95a609 Add sharding tests to multigpu-test.sh (#77708)
Summary: These tests were being skipped since they were not being run as part of
the multigpu jobs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77708
Approved by: https://github.com/wanchaol
2022-05-18 17:37:55 +00:00
Wanchao Liang
25fa964d96 [shard] add clone/detach and set requires_grad for ShardedTensor
This PR adds clone/detach and the ability to set requires_grad to ShardedTensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77367

Approved by: https://github.com/pritamdamania87
2022-05-16 21:42:27 +00:00
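For context, a hedged usage sketch of the semantics this adds; the construction below uses the `ChunkShardingSpec` / `sharded_tensor.rand` APIs of this period and assumes an initialized 4-rank process group, so treat it as illustrative rather than a verified snippet from the PR.

```python
# Illustrative sketch (assumes torch.distributed is initialized with 4 ranks).
import torch.distributed._shard.sharded_tensor as sharded_tensor
from torch.distributed._shard.sharding_spec import ChunkShardingSpec

spec = ChunkShardingSpec(
    dim=0,
    placements=["rank:0/cuda:0", "rank:1/cuda:1", "rank:2/cuda:2", "rank:3/cuda:3"],
)
st = sharded_tensor.rand(spec, 10, 20, requires_grad=True)

st_detached = st.detach()      # same shards, detached from the autograd graph
assert st_detached.requires_grad is False

st_clone = st.clone()          # deep-copies each local shard
st.requires_grad_(False)       # toggles requires_grad on the ShardedTensor / its shards
```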
pritam
9e52b50e34 Additional ops for ShardedTensor, ReplicatedTensor and PartialTensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76477

Adding the following ops:

1) softmax for ShardedTensor
2) getitem and unsqueeze for ReplicatedTensor
3) transpose and cat for PartialTensor

Differential Revision: [D35979510](https://our.internmc.facebook.com/intern/diff/D35979510/)

Approved by: https://github.com/fduwjj, https://github.com/wanchaol
2022-05-06 16:28:04 +00:00
Catherine Lee
56ea57de61 shard pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed 1->2

shard `pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed ...` from 1 shard to 2

Pros:
- It currently takes about 2.6 hours and is the 3rd-longest-running job on pull
- Theoretically minimal overhead

Cons:
- Requires changes to run_test.py, which might have correctness issues

Notes:
- Cannot shard further as one of the test files is responsible for about half of the total run time

spreadsheet regarding sharding: https://docs.google.com/spreadsheets/d/1BdtVsjRr0Is9LXMNilR02FEdPXNq7zEWl8AmR3ArsLQ/edit#gid=1153012347
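The sharding itself goes through run_test.py's existing `--shard` option (see the help output quoted later in this log); roughly, the two CI shards would each invoke something like the following (exact CI wiring is an assumption):

```bash
# Shard 1 of 2 and shard 2 of 2 of the distributed test selection (illustrative)
python test/run_test.py --distributed-tests --shard 1 2 --verbose
python test/run_test.py --distributed-tests --shard 2 2 --verbose
```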

Test Plan:
<details><summary>expand to see test plan (it's long)</summary>

tests from a commit ran on master (90 tests ran)
```
2022-05-03T12:45:34.7974184Z Selected tests:
2022-05-03T12:45:34.7974495Z  distributed/_shard/sharded_optim/test_sharded_optim
2022-05-03T12:45:34.7974839Z  distributed/_shard/sharded_tensor/ops/test_binary_cmp
2022-05-03T12:45:34.7975209Z  distributed/_shard/sharded_tensor/ops/test_elementwise_ops
2022-05-03T12:45:34.7975575Z  distributed/_shard/sharded_tensor/ops/test_embedding
2022-05-03T12:45:34.7976180Z  distributed/_shard/sharded_tensor/ops/test_embedding_bag
2022-05-03T12:45:34.7976802Z  distributed/_shard/sharded_tensor/ops/test_init
2022-05-03T12:45:34.7977361Z  distributed/_shard/sharded_tensor/ops/test_linear
2022-05-03T12:45:34.7978157Z  distributed/_shard/sharded_tensor/ops/test_math_ops
2022-05-03T12:45:34.7978879Z  distributed/_shard/sharded_tensor/test_megatron_prototype
2022-05-03T12:45:34.7979594Z  distributed/_shard/sharded_tensor/test_sharded_tensor
2022-05-03T12:45:34.7980366Z  distributed/_shard/sharded_tensor/test_sharded_tensor_reshard
2022-05-03T12:45:34.7981066Z  distributed/_shard/sharding_plan/test_sharding_plan
2022-05-03T12:45:34.7981877Z  distributed/_shard/sharding_spec/test_sharding_spec
2022-05-03T12:45:34.7982387Z  distributed/_shard/test_partial_tensor
2022-05-03T12:45:34.7982691Z  distributed/_shard/test_replicated_tensor
2022-05-03T12:45:34.7982994Z  distributed/_shard/test_sharder
2022-05-03T12:45:34.7983280Z  distributed/algorithms/test_join
2022-05-03T12:45:34.7983695Z  distributed/elastic/events/lib_test
2022-05-03T12:45:34.7983984Z  distributed/elastic/metrics/api_test
2022-05-03T12:45:34.7984308Z  distributed/elastic/multiprocessing/api_test
2022-05-03T12:45:34.7984624Z  distributed/elastic/timer/api_test
2022-05-03T12:45:34.7984924Z  distributed/elastic/timer/local_timer_example
2022-05-03T12:45:34.7985254Z  distributed/elastic/timer/local_timer_test
2022-05-03T12:45:34.7985575Z  distributed/elastic/utils/distributed_test
2022-05-03T12:45:34.7985889Z  distributed/elastic/utils/logging_test
2022-05-03T12:45:34.7986176Z  distributed/elastic/utils/util_test
2022-05-03T12:45:34.7986492Z  distributed/fsdp/test_flatten_params_wrapper
2022-05-03T12:45:34.7986799Z  distributed/fsdp/test_fsdp_apply
2022-05-03T12:45:34.7987078Z  distributed/fsdp/test_fsdp_checkpoint
2022-05-03T12:45:34.7987388Z  distributed/fsdp/test_fsdp_clip_grad_norm
2022-05-03T12:45:34.7987691Z  distributed/fsdp/test_fsdp_comm
2022-05-03T12:45:34.7987961Z  distributed/fsdp/test_fsdp_core
2022-05-03T12:45:34.7988251Z  distributed/fsdp/test_fsdp_exec_order
2022-05-03T12:45:34.7988570Z  distributed/fsdp/test_fsdp_freezing_weights
2022-05-03T12:45:34.7988865Z  distributed/fsdp/test_fsdp_grad_acc
2022-05-03T12:45:34.7989176Z  distributed/fsdp/test_fsdp_ignored_modules
2022-05-03T12:45:34.7989478Z  distributed/fsdp/test_fsdp_input
2022-05-03T12:45:34.7989950Z  distributed/fsdp/test_fsdp_memory
2022-05-03T12:45:34.7990241Z  distributed/fsdp/test_fsdp_meta
2022-05-03T12:45:34.7990640Z  distributed/fsdp/test_fsdp_mixed_precision
2022-05-03T12:45:34.7990964Z  distributed/fsdp/test_fsdp_multiple_forward
2022-05-03T12:45:34.7991293Z  distributed/fsdp/test_fsdp_multiple_wrapping
2022-05-03T12:45:34.7991610Z  distributed/fsdp/test_fsdp_optim_state
2022-05-03T12:45:34.7991895Z  distributed/fsdp/test_fsdp_overlap
2022-05-03T12:45:34.7992195Z  distributed/fsdp/test_fsdp_pure_fp16
2022-05-03T12:45:34.7992500Z  distributed/fsdp/test_fsdp_state_dict
2022-05-03T12:45:34.7992818Z  distributed/fsdp/test_fsdp_summon_full_params
2022-05-03T12:45:34.7993117Z  distributed/fsdp/test_fsdp_traversal
2022-05-03T12:45:34.7993861Z  distributed/fsdp/test_fsdp_uneven
2022-05-03T12:45:34.7994181Z  distributed/fsdp/test_shard_utils
2022-05-03T12:45:34.7994447Z  distributed/fsdp/test_utils
2022-05-03T12:45:34.7994721Z  distributed/fsdp/test_wrap
2022-05-03T12:45:34.7995015Z  distributed/nn/jit/test_instantiator
2022-05-03T12:45:34.7995328Z  distributed/optim/test_zero_redundancy_optimizer
2022-05-03T12:45:34.7995664Z  distributed/pipeline/sync/skip/test_api
2022-05-03T12:45:34.7995983Z  distributed/pipeline/sync/skip/test_gpipe
2022-05-03T12:45:34.7996315Z  distributed/pipeline/sync/skip/test_inspect_skip_layout
2022-05-03T12:45:34.7996652Z  distributed/pipeline/sync/skip/test_leak
2022-05-03T12:45:34.7996977Z  distributed/pipeline/sync/skip/test_portal
2022-05-03T12:45:34.7997292Z  distributed/pipeline/sync/skip/test_stash_pop
2022-05-03T12:45:34.7997623Z  distributed/pipeline/sync/skip/test_tracker
2022-05-03T12:45:34.7997968Z  distributed/pipeline/sync/skip/test_verify_skippables
2022-05-03T12:45:34.7998301Z  distributed/pipeline/sync/test_balance
2022-05-03T12:45:34.7998591Z  distributed/pipeline/sync/test_bugs
2022-05-03T12:45:34.7998927Z  distributed/pipeline/sync/test_checkpoint
2022-05-03T12:45:34.7999243Z  distributed/pipeline/sync/test_copy
2022-05-03T12:45:34.7999557Z  distributed/pipeline/sync/test_deferred_batch_norm
2022-05-03T12:45:34.7999896Z  distributed/pipeline/sync/test_dependency
2022-05-03T12:45:34.8000215Z  distributed/pipeline/sync/test_inplace
2022-05-03T12:45:34.8000516Z  distributed/pipeline/sync/test_microbatch
2022-05-03T12:45:34.8000826Z  distributed/pipeline/sync/test_phony
2022-05-03T12:45:34.8001130Z  distributed/pipeline/sync/test_pipe
2022-05-03T12:45:34.8001424Z  distributed/pipeline/sync/test_pipeline
2022-05-03T12:45:34.8001733Z  distributed/pipeline/sync/test_stream
2022-05-03T12:45:34.8002055Z  distributed/pipeline/sync/test_transparency
2022-05-03T12:45:34.8002353Z  distributed/pipeline/sync/test_worker
2022-05-03T12:45:34.8002672Z  distributed/rpc/cuda/test_tensorpipe_agent
2022-05-03T12:45:34.8002982Z  distributed/rpc/test_faulty_agent
2022-05-03T12:45:34.8003270Z  distributed/rpc/test_tensorpipe_agent
2022-05-03T12:45:34.8003568Z  distributed/test_c10d_common
2022-05-03T12:45:34.8003839Z  distributed/test_c10d_gloo
2022-05-03T12:45:34.8004088Z  distributed/test_c10d_nccl
2022-05-03T12:45:34.8004369Z  distributed/test_c10d_spawn_gloo
2022-05-03T12:45:34.8004656Z  distributed/test_c10d_spawn_nccl
2022-05-03T12:45:34.8004938Z  distributed/test_data_parallel
2022-05-03T12:45:34.8005212Z  distributed/test_distributed_spawn
2022-05-03T12:45:34.8005496Z  distributed/test_launcher
2022-05-03T12:45:34.8005767Z  distributed/test_nccl
2022-05-03T12:45:34.8006019Z  distributed/test_pg_wrapper
2022-05-03T12:45:34.8006285Z  distributed/test_store
```

tests ran on first shard for distributed on this PR (34 tests)
```
2022-05-02T21:26:00.1385256Z Selected tests:
2022-05-02T21:26:00.1385767Z  distributed/test_distributed_spawn
2022-05-02T21:26:00.1386403Z  distributed/elastic/multiprocessing/api_test
2022-05-02T21:26:00.1387051Z  distributed/fsdp/test_fsdp_memory
2022-05-02T21:26:00.1387607Z  distributed/fsdp/test_fsdp_ignored_modules
2022-05-02T21:26:00.1388179Z  distributed/fsdp/test_fsdp_apply
2022-05-02T21:26:00.1388600Z  distributed/_shard/sharded_tensor/ops/test_binary_cmp
2022-05-02T21:26:00.1389181Z  distributed/_shard/sharding_spec/test_sharding_spec
2022-05-02T21:26:00.1389545Z  distributed/_shard/sharded_tensor/ops/test_linear
2022-05-02T21:26:00.1389878Z  distributed/fsdp/test_fsdp_uneven
2022-05-02T21:26:00.1390186Z  distributed/fsdp/test_fsdp_multiple_wrapping
2022-05-02T21:26:00.1390526Z  distributed/fsdp/test_fsdp_multiple_forward
2022-05-02T21:26:00.1390877Z  distributed/_shard/sharded_tensor/ops/test_embedding
2022-05-02T21:26:00.1391219Z  distributed/_shard/test_partial_tensor
2022-05-02T21:26:00.1391542Z  distributed/_shard/sharded_optim/test_sharded_optim
2022-05-02T21:26:00.1391915Z  distributed/_shard/sharded_tensor/ops/test_elementwise_ops
2022-05-02T21:26:00.1392297Z  distributed/fsdp/test_flatten_params_wrapper
2022-05-02T21:26:00.1392585Z  distributed/fsdp/test_utils
2022-05-02T21:26:00.1392883Z  distributed/nn/jit/test_instantiator
2022-05-02T21:26:00.1393167Z  distributed/test_nccl
2022-05-02T21:26:00.1393466Z  distributed/_shard/sharding_plan/test_sharding_plan
2022-05-02T21:26:00.1393787Z  distributed/_shard/test_sharder
2022-05-02T21:26:00.1394085Z  distributed/elastic/timer/api_test
2022-05-02T21:26:00.1394383Z  distributed/pipeline/sync/skip/test_api
2022-05-02T21:26:00.1394738Z  distributed/pipeline/sync/skip/test_inspect_skip_layout
2022-05-02T21:26:00.1395090Z  distributed/pipeline/sync/skip/test_portal
2022-05-02T21:26:00.1395424Z  distributed/pipeline/sync/skip/test_tracker
2022-05-02T21:26:00.1395935Z  distributed/pipeline/sync/test_balance
2022-05-02T21:26:00.1396288Z  distributed/pipeline/sync/test_checkpoint
2022-05-02T21:26:00.1396635Z  distributed/pipeline/sync/test_deferred_batch_norm
2022-05-02T21:26:00.1396953Z  distributed/pipeline/sync/test_inplace
2022-05-02T21:26:00.1397269Z  distributed/pipeline/sync/test_phony
2022-05-02T21:26:00.1397587Z  distributed/pipeline/sync/test_pipeline
2022-05-02T21:26:00.1397903Z  distributed/pipeline/sync/test_transparency
2022-05-02T21:26:00.1398221Z  distributed/rpc/test_faulty_agent
```

tests ran on second shard for distributed on this PR (56 tests)
```
2022-05-02T21:26:55.1342892Z Selected tests:
2022-05-02T21:26:55.1343201Z  distributed/rpc/cuda/test_tensorpipe_agent
2022-05-02T21:26:55.1343526Z  distributed/fsdp/test_fsdp_core
2022-05-02T21:26:55.1343829Z  distributed/test_c10d_nccl
2022-05-02T21:26:55.1344089Z  distributed/test_c10d_gloo
2022-05-02T21:26:55.1344408Z  distributed/fsdp/test_fsdp_summon_full_params
2022-05-02T21:26:55.1344749Z  distributed/fsdp/test_fsdp_mixed_precision
2022-05-02T21:26:55.1345085Z  distributed/optim/test_zero_redundancy_optimizer
2022-05-02T21:26:55.1345423Z  distributed/fsdp/test_fsdp_optim_state
2022-05-02T21:26:55.1345773Z  distributed/_shard/sharded_tensor/test_sharded_tensor
2022-05-02T21:26:55.1346088Z  distributed/fsdp/test_fsdp_state_dict
2022-05-02T21:26:55.1346379Z  distributed/test_store
2022-05-02T21:26:55.1346661Z  distributed/test_c10d_spawn_gloo
2022-05-02T21:26:55.1346966Z  distributed/test_pg_wrapper
2022-05-02T21:26:55.1347252Z  distributed/test_c10d_spawn_nccl
2022-05-02T21:26:55.1347565Z  distributed/fsdp/test_fsdp_clip_grad_norm
2022-05-02T21:26:55.1347871Z  distributed/fsdp/test_wrap
2022-05-02T21:26:55.1348369Z  distributed/fsdp/test_fsdp_grad_acc
2022-05-02T21:26:55.1348679Z  distributed/algorithms/test_join
2022-05-02T21:26:55.1349004Z  distributed/fsdp/test_fsdp_freezing_weights
2022-05-02T21:26:55.1349305Z  distributed/fsdp/test_fsdp_comm
2022-05-02T21:26:55.1349593Z  distributed/test_c10d_common
2022-05-02T21:26:55.1349885Z  distributed/fsdp/test_fsdp_meta
2022-05-02T21:26:55.1350171Z  distributed/fsdp/test_fsdp_exec_order
2022-05-02T21:26:55.1350486Z  distributed/fsdp/test_fsdp_checkpoint
2022-05-02T21:26:55.1350798Z  distributed/fsdp/test_fsdp_overlap
2022-05-02T21:26:55.1351105Z  distributed/elastic/timer/local_timer_example
2022-05-02T21:26:55.1351423Z  distributed/fsdp/test_fsdp_input
2022-05-02T21:26:55.1351749Z  distributed/_shard/sharded_tensor/ops/test_init
2022-05-02T21:26:55.1352190Z  distributed/elastic/timer/local_timer_test
2022-05-02T21:26:55.1352520Z  distributed/elastic/utils/distributed_test
2022-05-02T21:26:55.1352841Z  distributed/fsdp/test_fsdp_pure_fp16
2022-05-02T21:26:55.1353150Z  distributed/test_data_parallel
2022-05-02T21:26:55.1353437Z  distributed/fsdp/test_fsdp_traversal
2022-05-02T21:26:55.1353792Z  distributed/_shard/sharded_tensor/test_sharded_tensor_reshard
2022-05-02T21:26:55.1354174Z  distributed/_shard/sharded_tensor/ops/test_embedding_bag
2022-05-02T21:26:55.1354534Z  distributed/_shard/sharded_tensor/test_megatron_prototype
2022-05-02T21:26:55.1354858Z  distributed/test_launcher
2022-05-02T21:26:55.1355149Z  distributed/elastic/utils/util_test
2022-05-02T21:26:55.1355441Z  distributed/elastic/utils/logging_test
2022-05-02T21:26:55.1355755Z  distributed/elastic/metrics/api_test
2022-05-02T21:26:55.1356095Z  distributed/_shard/sharded_tensor/ops/test_math_ops
2022-05-02T21:26:55.1356455Z  distributed/_shard/test_replicated_tensor
2022-05-02T21:26:55.1356754Z  distributed/elastic/events/lib_test
2022-05-02T21:26:55.1357065Z  distributed/fsdp/test_shard_utils
2022-05-02T21:26:55.1357387Z  distributed/pipeline/sync/skip/test_gpipe
2022-05-02T21:26:55.1357702Z  distributed/pipeline/sync/skip/test_leak
2022-05-02T21:26:55.1358040Z  distributed/pipeline/sync/skip/test_stash_pop
2022-05-02T21:26:55.1358396Z  distributed/pipeline/sync/skip/test_verify_skippables
2022-05-02T21:26:55.1358716Z  distributed/pipeline/sync/test_bugs
2022-05-02T21:26:55.1359027Z  distributed/pipeline/sync/test_copy
2022-05-02T21:26:55.1359350Z  distributed/pipeline/sync/test_dependency
2022-05-02T21:26:55.1359662Z  distributed/pipeline/sync/test_microbatch
2022-05-02T21:26:55.1359983Z  distributed/pipeline/sync/test_pipe
2022-05-02T21:26:55.1360299Z  distributed/pipeline/sync/test_stream
2022-05-02T21:26:55.1360593Z  distributed/pipeline/sync/test_worker
2022-05-02T21:26:55.1360912Z  distributed/rpc/test_tensorpipe_agent
```
</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76564
Approved by: https://github.com/jeffdaily, https://github.com/janeyx99
2022-05-03 23:01:42 +00:00
Junjie Wang (PyTorch)
7c44d560ba [PT-D][Sharding] Enable ops needed in the transformer model training (#75374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75374

From the FairSeq and MetaSeq codebases (which are essentially transformer models), we found that many ops are not supported by ShardedTensor. So we now implement a simple version of each so that we can at least run a transformer example:

Ops include: chunk, transpose, view, masked_fill, dropout, softmax and type_as.

We isolate the common logic of registering simple ops into a helper function; to register a new op in the future, we just need to implement at most three functions.
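A conceptual sketch of that registration pattern (all names are hypothetical, not the PR's actual helpers):

```python
# Conceptual sketch of the "at most three functions per new op" registration
# helper described above. Names and structure are hypothetical.
_SIMPLE_OP_TABLE = {}

def register_simple_sharded_op(op, validate_input, compute_on_local_shards, wrap_output):
    """Register `op` for ShardedTensor by composing three small per-op callbacks."""
    def sharded_impl(types, args=(), kwargs=None, process_group=None):
        kwargs = kwargs or {}
        validate_input(*args, **kwargs)                        # 1) sanity-check inputs
        local_out = compute_on_local_shards(*args, **kwargs)   # 2) run op on the local shards
        return wrap_output(local_out, *args, **kwargs)         # 3) rebuild the ShardedTensor
    _SIMPLE_OP_TABLE[op] = sharded_impl
    return sharded_impl
```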

ghstack-source-id: 155309147

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D35123021

fbshipit-source-id: 660e559fb8b4a910eb63e0586c63ab927873a2ce
(cherry picked from commit 83a87ebf627d863448dfe1019c7c5f7112cc14ab)
2022-05-03 17:20:28 +00:00
Junjie Wang (PyTorch)
c1037d0d4c [PT-D][Sharding] Move Partial Tensor to the _shard folder and add logic to remove padding (#76199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76199

Since PartialTensor is somewhat independent of ShardedTensor, we now move it to the _shard folder.

Also, we added logic to remove padding when the size is not divisible by the world size, and modified the unit test to reflect these changes.

Finally, we need to consider the placement order in the resharding spec for PartialTensor; the related logic is added in this change. Furthermore, for sharded linear, we need to order the placements by rank to get the expected local result.
ghstack-source-id: 154853290

Test Plan: CI

Reviewed By: pritamdamania87, wanchaol

Differential Revision: D35827894

fbshipit-source-id: 58dab77969b8b6557f42afa7e8f5a8a053dd5793
(cherry picked from commit abeb28f16582dcf707c2e165f39df6caf692384d)
2022-04-28 06:22:02 +00:00
Alban Desmaison
3d7abc0e55 Make -h work with run_test.py
As per title.

### When running `python run_test.py -h`
It used to show:
- The general unittest parser help that we print via a second thread 35545d85dc/torch/testing/_internal/common_utils.py (L467-L470)
- The common_utils's parser help

<details><summary>Full result</summary>
<p>

```bash
$ python run_test.py -h
usage: run_test.py [-h] [-v] [-q] [--locals] [-f] [-c] [-b] [-k TESTNAMEPATTERNS] [tests [tests ...]]

positional arguments:
  tests                a list of any number of test modules, classes and test methods.

optional arguments:
  -h, --help           show this help message and exit
  -v, --verbose        Verbose output
  -q, --quiet          Quiet output
  --locals             Show local variables in tracebacks
  -f, --failfast       Stop on first fail or error
  -c, --catch          Catch Ctrl-C and display results so far
  -b, --buffer         Buffer stdout and stderr during tests
  -k TESTNAMEPATTERNS  Only run tests which match the given substring

Examples:
  run_test.py                           - run default set of tests
  run_test.py MyTestSuite               - run suite 'MyTestSuite'
  run_test.py MyTestCase.testSomething  - run MyTestCase.testSomething
  run_test.py MyTestCase                - run all 'test*' test methods
                                       in MyTestCase

usage: run_test.py [-h] [--subprocess] [--seed SEED] [--accept] [--jit_executor JIT_EXECUTOR] [--repeat REPEAT] [--test_bailouts]
                   [--save-xml [SAVE_XML]] [--discover-tests] [--log-suffix LOG_SUFFIX] [--run-parallel RUN_PARALLEL]
                   [--import-slow-tests [IMPORT_SLOW_TESTS]] [--import-disabled-tests [IMPORT_DISABLED_TESTS]]

optional arguments:
  -h, --help            show this help message and exit
  --subprocess          whether to run each test in a subprocess
  --seed SEED
  --accept
  --jit_executor JIT_EXECUTOR
  --repeat REPEAT
  --test_bailouts
  --save-xml [SAVE_XML]
  --discover-tests
  --log-suffix LOG_SUFFIX
  --run-parallel RUN_PARALLEL
  --import-slow-tests [IMPORT_SLOW_TESTS]
  --import-disabled-tests [IMPORT_DISABLED_TESTS]
```

</p>
</details>

It now prints:
- The general unittest parser help, shown the same way. Should we remove this? We unfortunately can't merge them, as unittest does not accept a parent parser / does not expose its parser for us to take as a parent (a small argparse sketch follows the combined help output below).
- The combined common_utils + run_test parsers help

<details><summary>Full result</summary>
<p>

```bash
$ python run_test.py -h
usage: run_test.py [-h] [-v] [-q] [--locals] [-f] [-c] [-b] [-k TESTNAMEPATTERNS] [tests [tests ...]]

positional arguments:
  tests                a list of any number of test modules, classes and test methods.

optional arguments:
  -h, --help           show this help message and exit
  -v, --verbose        Verbose output
  -q, --quiet          Quiet output
  --locals             Show local variables in tracebacks
  -f, --failfast       Stop on first fail or error
  -c, --catch          Catch Ctrl-C and display results so far
  -b, --buffer         Buffer stdout and stderr during tests
  -k TESTNAMEPATTERNS  Only run tests which match the given substring

Examples:
  run_test.py                           - run default set of tests
  run_test.py MyTestSuite               - run suite 'MyTestSuite'
  run_test.py MyTestCase.testSomething  - run MyTestCase.testSomething
  run_test.py MyTestCase                - run all 'test*' test methods
                                       in MyTestCase

Ignoring disabled issues:  []
usage: run_test.py [-h] [--subprocess] [--seed SEED] [--accept] [--jit_executor JIT_EXECUTOR] [--repeat REPEAT] [--test_bailouts]
                   [--save-xml [SAVE_XML]] [--discover-tests] [--log-suffix LOG_SUFFIX] [--run-parallel RUN_PARALLEL]
                   [--import-slow-tests [IMPORT_SLOW_TESTS]] [--import-disabled-tests [IMPORT_DISABLED_TESTS]] [-v] [--jit]
                   [--distributed-tests] [-core] [-pt] [-c] [-i TESTS [TESTS ...]] [-x TESTS [TESTS ...]] [-f TESTS] [-l TESTS]
                   [--bring-to-front TESTS [TESTS ...]] [--ignore-win-blocklist] [--continue-through-error]
                   [--export-past-test-times [EXPORT_PAST_TEST_TIMES]] [--shard SHARD SHARD] [--exclude-jit-executor]
                   [--exclude-distributed-tests] [--run-specified-test-cases [RUN_SPECIFIED_TEST_CASES]]
                   [--use-specified-test-cases-by {include,bring-to-front}] [--dry-run]
                   [additional_unittest_args [additional_unittest_args ...]]

Run the PyTorch unit test suite

positional arguments:
  additional_unittest_args
                        additional arguments passed through to unittest, e.g., python run_test.py -i sparse -- TestSparse.test_factory_size_check

optional arguments:
  -h, --help            show this help message and exit
  --subprocess          whether to run each test in a subprocess
  --seed SEED
  --accept
  --jit_executor JIT_EXECUTOR
  --repeat REPEAT
  --test_bailouts
  --save-xml [SAVE_XML]
  --discover-tests
  --log-suffix LOG_SUFFIX
  --run-parallel RUN_PARALLEL
  --import-slow-tests [IMPORT_SLOW_TESTS]
  --import-disabled-tests [IMPORT_DISABLED_TESTS]
  -v, --verbose         print verbose information and test-by-test results
  --jit, --jit          run all jit tests
  --distributed-tests, --distributed-tests
                        run all distributed tests
  -core, --core         Only run core tests, or tests that validate PyTorch's ops, modules,and autograd. They are defined by CORE_TEST_LIST.
  -pt, --pytest         If true, use `pytest` to execute the tests. E.g., this runs TestTorch with pytest in verbose and coverage mode: python run_test.py -vci torch -pt
  -c, --coverage        enable coverage
  -i TESTS [TESTS ...], --include TESTS [TESTS ...]
                        select a set of tests to include (defaults to ALL tests). tests must be a part of the TESTS list defined in run_test.py
  -x TESTS [TESTS ...], --exclude TESTS [TESTS ...]
                        select a set of tests to exclude
  -f TESTS, --first TESTS
                        select the test to start from (excludes previous tests)
  -l TESTS, --last TESTS
                        select the last test to run (excludes following tests)
  --bring-to-front TESTS [TESTS ...]
                        select a set of tests to run first. This can be used in situations where you want to run all tests, but care more about some set, e.g. after making a change to a specific component
  --ignore-win-blocklist
                        always run blocklisted windows tests
  --continue-through-error
                        Runs the full test suite despite one of the tests failing
  --export-past-test-times [EXPORT_PAST_TEST_TIMES]
                        dumps test times from previous S3 stats into a file, format JSON
  --shard SHARD SHARD   runs a shard of the tests (taking into account other selections), e.g., --shard 2 3 will break up the selected tests into 3 shards and run the tests in the 2nd shard (the first number should not exceed the second)
  --exclude-jit-executor
                        exclude tests that are run for a specific jit config
  --exclude-distributed-tests
                        exclude distributed tests
  --run-specified-test-cases [RUN_SPECIFIED_TEST_CASES]
                        load specified test cases file dumped from previous OSS CI stats, format CSV.  If all test cases should run for a <test_module> please add a single row:
                         test_filename,test_case_name
                         ...
                         <test_module>,__all__
                         ...
                        how we use the stats will be based on option "--use-specified-test-cases-by".
  --use-specified-test-cases-by {include,bring-to-front}
                        used together with option "--run-specified-test-cases". When specified test case file is set, this option allows the user to control whether to only run the specified test modules or to simply bring the specified modules to front and also run the remaining modules. Note: regardless of this option, we will only run the specified test cases  within a specified test module. For unspecified test modules with the bring-to-front option, all test cases will be run, as one may expect.
  --dry-run             Only list the test that will run.

where TESTS is any of: benchmark_utils/test_benchmark_utils, distributed/_shard/sharded_optim/test_sharded_optim, distributed/_shard/sharded_tensor/ops/test_binary_cmp, distributed/_shard/sharded_tensor/ops/test_elementwise_ops, distributed/_shard/sharded_tensor/ops/test_embedding, distributed/_shard/sharded_tensor/ops/test_embedding_bag, distributed/_shard/sharded_tensor/ops/test_init, distributed/_shard/sharded_tensor/ops/test_linear, distributed/_shard/sharded_tensor/ops/test_math_ops, distributed/_shard/sharded_tensor/test_megatron_prototype, distributed/_shard/sharded_tensor/test_partial_tensor, distributed/_shard/sharded_tensor/test_sharded_tensor, distributed/_shard/sharded_tensor/test_sharded_tensor_reshard, distributed/_shard/sharding_spec/test_sharding_spec, distributed/_shard/test_replicated_tensor, distributed/algorithms/test_join, distributed/elastic/events/lib_test, distributed/elastic/metrics/api_test, distributed/elastic/multiprocessing/api_test, distributed/elastic/timer/api_test, distributed/elastic/timer/local_timer_example, distributed/elastic/timer/local_timer_test, distributed/elastic/utils/distributed_test, distributed/elastic/utils/logging_test, distributed/elastic/utils/util_test, distributed/fsdp/test_flatten_params_wrapper, distributed/fsdp/test_fsdp_apply, distributed/fsdp/test_fsdp_checkpoint, distributed/fsdp/test_fsdp_clip_grad_norm, distributed/fsdp/test_fsdp_comm, distributed/fsdp/test_fsdp_core, distributed/fsdp/test_fsdp_freezing_weights, distributed/fsdp/test_fsdp_grad_acc, distributed/fsdp/test_fsdp_ignored_modules, distributed/fsdp/test_fsdp_input, distributed/fsdp/test_fsdp_memory, distributed/fsdp/test_fsdp_mixed_precision, distributed/fsdp/test_fsdp_multiple_forward, distributed/fsdp/test_fsdp_multiple_wrapping, distributed/fsdp/test_fsdp_optim_state, distributed/fsdp/test_fsdp_overlap, distributed/fsdp/test_fsdp_pure_fp16, distributed/fsdp/test_fsdp_state_dict, distributed/fsdp/test_fsdp_summon_full_params, distributed/fsdp/test_fsdp_traversal, distributed/fsdp/test_fsdp_uneven, distributed/fsdp/test_shard_utils, distributed/fsdp/test_utils, distributed/fsdp/test_wrap, distributed/nn/jit/test_instantiator, distributed/optim/test_zero_redundancy_optimizer, distributed/pipeline/sync/skip/test_api, distributed/pipeline/sync/skip/test_gpipe, distributed/pipeline/sync/skip/test_inspect_skip_layout, distributed/pipeline/sync/skip/test_leak, distributed/pipeline/sync/skip/test_portal, distributed/pipeline/sync/skip/test_stash_pop, distributed/pipeline/sync/skip/test_tracker, distributed/pipeline/sync/skip/test_verify_skippables, distributed/pipeline/sync/test_balance, distributed/pipeline/sync/test_bugs, distributed/pipeline/sync/test_checkpoint, distributed/pipeline/sync/test_copy, distributed/pipeline/sync/test_deferred_batch_norm, distributed/pipeline/sync/test_dependency, distributed/pipeline/sync/test_inplace, distributed/pipeline/sync/test_microbatch, distributed/pipeline/sync/test_phony, distributed/pipeline/sync/test_pipe, distributed/pipeline/sync/test_pipeline, distributed/pipeline/sync/test_stream, distributed/pipeline/sync/test_transparency, distributed/pipeline/sync/test_worker, distributed/rpc/cuda/test_tensorpipe_agent, distributed/rpc/test_faulty_agent, distributed/rpc/test_tensorpipe_agent, distributed/test_c10d_common, distributed/test_c10d_gloo, distributed/test_c10d_nccl, distributed/test_c10d_spawn_gloo, distributed/test_c10d_spawn_nccl, distributed/test_data_parallel, distributed/test_distributed_spawn, distributed/test_launcher, 
distributed/test_nccl, distributed/test_pg_wrapper, distributed/test_store, distributions/test_constraints, distributions/test_distributions, lazy/test_bindings, lazy/test_extract_compiled_graph, lazy/test_ts_opinfo, test_ao_sparsity, test_autocast, test_autograd, test_binary_ufuncs, test_bundled_inputs, test_complex, test_cpp_api_parity, test_cpp_extensions_aot_ninja, test_cpp_extensions_aot_no_ninja, test_cpp_extensions_jit, test_cuda, test_cuda_primary_ctx, test_dataloader, test_datapipe, test_deploy, test_deploy, test_dispatch, test_expanded_weights, test_foreach, test_function_schema, test_functional_autograd_benchmark, test_functional_optim, test_functionalization, test_futures, test_fx, test_fx_experimental, test_hub, test_import_stats, test_indexing, test_jit, test_jit_autocast, test_jit_cuda_fuser, test_jit_disabled, test_jit_fuser_legacy, test_jit_fuser_te, test_jit_legacy, test_jit_profiling, test_license, test_linalg, test_logging, test_masked, test_mkldnn, test_mobile_optimizer, test_model_dump, test_module_init, test_modules, test_monitor, test_multiprocessing, test_multiprocessing_spawn, test_namedtensor, test_namedtuple_return_api, test_native_functions, test_nestedtensor, test_nn, test_numba_integration, test_numpy_interop, test_openmp, test_ops, test_ops_gradients, test_ops_jit, test_optim, test_overrides, test_package, test_per_overload_api, test_profiler, test_pruning_op, test_public_bindings, test_python_dispatch, test_pytree, test_quantization, test_reductions, test_scatter_gather_ops, test_serialization, test_set_default_mobile_cpu_allocator, test_shape_ops, test_show_pickle, test_sort_and_select, test_sparse, test_sparse_csr, test_spectral_ops, test_stateless, test_tensor_creation_ops, test_tensorboard, test_tensorexpr, test_tensorexpr_pybind, test_testing, test_torch, test_type_hints, test_type_info, test_type_promotion, test_unary_ufuncs, test_utils, test_view_ops, test_vmap, test_vulkan, test_xnnpack_integration
```

</p>
</details>
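For context, combining the common_utils parser with run_test's own parser (as the merged help above reflects) is the kind of thing argparse's `parents=` mechanism handles; unittest's parser cannot participate because it is not exposed. A minimal standalone sketch, with hypothetical option names:

```python
# Minimal argparse `parents=` sketch; option names are illustrative only.
import argparse

common_parser = argparse.ArgumentParser(add_help=False)  # a parent must not add -h itself
common_parser.add_argument("--save-xml", nargs="?")
common_parser.add_argument("--seed", type=int, default=1234)

run_test_parser = argparse.ArgumentParser(
    description="Run the PyTorch unit test suite",
    parents=[common_parser],                              # inherit the common options
)
run_test_parser.add_argument("-v", "--verbose", action="store_true")
run_test_parser.add_argument("--shard", nargs=2, type=int)

args = run_test_parser.parse_args(["--seed", "7", "--shard", "1", "2", "-v"])
print(args)  # Namespace(save_xml=None, seed=7, verbose=True, shard=[1, 2])
```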

### When running anything else (for example `python test_autograd.py -h`)
The behavior did not change; it still shows:
- The general unittest parser help that we print via a second thread
- The common_utils's parser help
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76152
Approved by: https://github.com/malfet, https://github.com/seemethere
2022-04-25 14:01:33 +00:00
Wanchao Liang
78ea86a445 [shard] Sharder and ShardingPlan prototype (#73873)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73873

Basic ShardingPlan interface and Sharder implementation:
1. We provide `ShardingPlan` to allow the user to specify all parameter sharding strategies for a given model, including `plan` for sharding the parameters, `output_plan` for tagging the output layout, and `return_local_tensor` for converting back to DDP.
2. Introduce the `shard_module` API, which takes an nn.Module and a ShardingPlan and shards the module according to the plan.
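A hedged usage sketch of (1) and (2); the exact constructor fields, parameter names, and toy module below are assumptions based on this description rather than code from the PR:

```python
# Hedged sketch of ShardingPlan + shard_module (assumes a 2-rank process group
# is initialized; field names and the toy module are illustrative assumptions).
import torch.nn as nn
from torch.distributed._shard import shard_module
from torch.distributed._shard.sharding_plan import ShardingPlan
from torch.distributed._shard.sharding_spec import ChunkShardingSpec

colwise = ChunkShardingSpec(dim=0, placements=["rank:0/cuda:0", "rank:1/cuda:1"])

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(16, 32)
        self.fc2 = nn.Linear(32, 4)

    def forward(self, x):
        return self.fc2(self.fc1(x))

plan = ShardingPlan(
    plan={"fc1.weight": colwise},    # how to shard each named parameter
    output_plan={"fc1": colwise},    # tag the layout of fc1's output
    return_local_tensor=["fc2"],     # convert fc2's output back to a plain Tensor
)

model = MLP()
shard_module(model, plan)            # shards the module in place according to the plan
```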

TODO:
in the next PR we will introduce an extensible Sharder and a ShardingPlanner.
ghstack-source-id: 154682421

Test Plan: test_sharding_plan.py

Reviewed By: pritamdamania87, fduwjj

Differential Revision: D34695159

fbshipit-source-id: 3d695803c4b7e9a7543177ade5b709b5f847baa9
(cherry picked from commit 670cd279b0e5304a9bf0ce6e6651a08273a77035)
2022-04-25 13:01:24 +00:00
Jeff Daily
44bbb247a6 [ROCm] enable fsdp tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75632
Approved by: https://github.com/kumpera, https://github.com/malfet
2022-04-22 19:50:36 +00:00
wanchaol
be354d8139 [shard] Add basic math ops to ShardedTensor and add ReplicatedTensor inter-op
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73703

This PR adds basic math ops (+, -, *, /) to ShardedTensor, and adds ReplicatedTensor inter-op with ShardedTensor for those math ops. This enables ShardedTensor (op) ReplicatedTensor to avoid communication in certain cases.

Differential Revision: [D34560867](https://our.internmc.facebook.com/intern/diff/D34560867/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D34560867/)!

Approved by: https://github.com/pritamdamania87
2022-04-12 04:25:10 +00:00
Andrey Talman
622cff3e95 Cuda 11.6 Disable failing tests (#75420)
Summary:
This mitigates a number of issues with the CUDA 11.6 update and updates the Linux driver.

New issues discovered
#[75391](https://github.com/pytorch/pytorch/issues/75391)
#[75375](https://github.com/pytorch/pytorch/issues/75375)

Old issue present since 11.3
#[57482](https://github.com/pytorch/pytorch/issues/57482)
#[70111](https://github.com/pytorch/pytorch/issues/70111)

These changes were already tested in WIP PR:
#[75337](https://github.com/pytorch/pytorch/pull/75337)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75420

Reviewed By: seemethere

Differential Revision: D35481973

Pulled By: atalman

fbshipit-source-id: 4db00c646e2df4f8650404763963c3b215110f1f
(cherry picked from commit 518e19dc361b43273f5bd6bdfff942614e8466f5)
2022-04-07 22:43:15 +00:00
Brian Hirsh
9429dbb434 make functionalization work better with subclasses
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73441

Approved by: https://github.com/ezyang, https://github.com/albanD
2022-04-04 15:33:27 +00:00
David Berard
27deefb5e1 [JIT] Enable NVFuser tests in OSS CI (#73322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73322

These tests have been disabled in OSS CI since #34785.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D34436844

Pulled By: davidberard98

fbshipit-source-id: c5b14b33e7f369a6fa1e9cfbcb484a30dffc659e
(cherry picked from commit b08f51587c0203c3e8b69f06ea613759e740aa4f)
2022-04-01 23:48:30 +00:00
Wanchao Liang
0524b2829a [shard] Add ReplicatedTensor (#73529)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73529

Add ReplicatedTensor. A ReplicatedTensor is a type of tensor that has the same value on all ranks across the world_size.

ReplicatedTensor is a :class:`~torch.Tensor` subclass, and it can be used together with ShardedTensor/Tensor to express different types of computation. The inter-op rules are defined as follows (using torch.add as an example op):
    ReplicatedTensor + ReplicatedTensor = ReplicatedTensor
    ReplicatedTensor + torch.Tensor = torch.Tensor
    ReplicatedTensor + ShardedTensor = ShardedTensor
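A dependency-free toy illustration of this rule table (this is only the promotion logic, not the library's dispatch implementation):

```python
# Toy sketch of the inter-op result-type rules quoted above; placeholder classes
# stand in for torch.Tensor, ReplicatedTensor and ShardedTensor.
class Tensor: ...            # stands in for torch.Tensor
class Replicated(Tensor): ...
class Sharded(Tensor): ...

def result_type(lhs, rhs):
    """Result type of a binary op such as torch.add, per the rules above."""
    if isinstance(lhs, Sharded) or isinstance(rhs, Sharded):
        return Sharded       # ReplicatedTensor + ShardedTensor = ShardedTensor
    if isinstance(lhs, Replicated) and isinstance(rhs, Replicated):
        return Replicated    # ReplicatedTensor + ReplicatedTensor = ReplicatedTensor
    return Tensor            # ReplicatedTensor + torch.Tensor = torch.Tensor

assert result_type(Replicated(), Replicated()) is Replicated
assert result_type(Replicated(), Tensor()) is Tensor
assert result_type(Replicated(), Sharded()) is Sharded
```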

We also added a `validate()` API to help the user validate whether a replicated tensor on a certain process_group is truly replicated or not.

TODO: the next PR will add ShardedTensor/PartialTensor logic to handle ReplicatedTensor.
ghstack-source-id: 152064781

Test Plan: test_replicated_tensor

Reviewed By: pritamdamania87, fduwjj

Differential Revision: D34529374

fbshipit-source-id: 16ccb300e9f9c47ac29a17eb6d46d029ab7d60b8
(cherry picked from commit 44f4e11e795a1bf330a8108bda256950ca769525)
2022-03-24 12:41:17 +00:00
Jeff Daily
956a028b55 [ROCm] enable HIP IPC
Enables code paths that use hipIpc* functions.  Also enables test_multiprocessing.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74383
Approved by: https://github.com/osalpekar
2022-03-21 19:32:01 +00:00
Sahan Paliskara
0bfa2f8255 Move torch::deploy tests to their own workflow job (#73676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73676

For some reason https://github.com/pytorch/pytorch/pull/72637 ended up getting messed up during rebasing, so please refer to that PR for review history.

This PR creates a new workflow called ` deploy-linux-xenial-cuda11.3-py3.7-gcc7` for torch::deploy tests.

For testing go to https://www.torch-ci.com/pytorch/pytorch/pull/73676 and check if a build and test job occur with ` deploy-linux-xenial-cuda11.3-py3.7-gcc7`

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D34586702

Pulled By: PaliC

fbshipit-source-id: 5627cf4ff411a4a04030f8b7726f84af979da213
(cherry picked from commit df6dddebb9fe078a6053a31033b5a40cc742fcf3)
2022-03-17 12:19:48 +00:00
atalman
ebca80ed08 Move test ops gradients and test ops jit to separate files
Fixes #72368

As per the referenced issue, test_ops as a single file takes around 3:30-4:00 hrs to execute on ASAN jobs:

Reference : pytorch_test_times.json

```
{
    "commit": "39535fec6c3ff5bf7c2d322d096c59571c3295ed",
    "JOB_BASE_NAME": "linux-xenial-py3.7-clang7-asan",
    "job_times": {
        "test_ops": 14928.355000000636, <- This test group is over 4hrs alone
```
----

Hence, we separate test_ops into the following parts:
1. TestGradients
2. TestJit
3.  TestCommon and TestMathBits
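With the split in place, each group can also be selected on its own via run_test.py's include flag (the module names test_ops, test_ops_gradients and test_ops_jit appear in the TESTS list shown earlier in this log); for example:

```bash
# Run the split-out groups as separate invocations (illustrative)
python test/run_test.py -i test_ops_gradients --verbose
python test/run_test.py -i test_ops_jit --verbose
python test/run_test.py -i test_ops --verbose   # remaining TestCommon / TestMathBits
```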

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74297
Approved by: https://github.com/malfet
2022-03-17 02:07:50 +00:00
PyTorch MergeBot
232faeacf8 Revert "Move test ops gradients and test ops jit to separate files"
This reverts commit 7cf9b942da.

Reverted https://github.com/pytorch/pytorch/pull/74297 on behalf of https://github.com/atalman
2022-03-16 20:08:23 +00:00
atalman
7cf9b942da Move test ops gradients and test ops jit to separate files
Fixes #72368

As per the referenced issue, test_ops as a single file takes around 3:30-4:00 hrs to execute on ASAN jobs:

Reference : pytorch_test_times.json

```
{
    "commit": "39535fec6c3ff5bf7c2d322d096c59571c3295ed",
    "JOB_BASE_NAME": "linux-xenial-py3.7-clang7-asan",
    "job_times": {
        "test_ops": 14928.355000000636, <- This test group is over 4hrs alone
```
----

Hence, we separate test_ops into the following parts:
1. TestGradients
2. TestJit
3.  TestCommon and TestMathBits

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74297
Approved by: https://github.com/malfet
2022-03-16 19:30:22 +00:00
Wanchao Liang
8b2ae86f02 [shard] disable rocm and windows for sharding_spec test (#74040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74040

fixes https://github.com/pytorch/pytorch/issues/73552
ghstack-source-id: 151046817

Test Plan: wait for ci

Reviewed By: rohan-varma

Differential Revision: D34792398

fbshipit-source-id: 84d08f01db8375817f48537505e7d988cb39d1f4
(cherry picked from commit 18b21ef0db91ddd22dc57a5b413e3e3ad594bb14)
2022-03-10 20:23:59 +00:00
Alban Desmaison
701fa16eed only run complex autograd tests once
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73210
2022-03-01 23:42:07 +00:00
Alban Desmaison
f275b3f9a1 simplify run_test for distributed tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73209
2022-03-01 23:37:37 +00:00
Alban Desmaison
7e919bd3c6 add dry run option and improve test list printing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73208
2022-02-22 20:45:41 +00:00
Ilya Persky
1b089292df Fix test failure when compiled without LAPACK support (#70671)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70670

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70671

Reviewed By: H-Huang

Differential Revision: D34242339

Pulled By: janeyx99

fbshipit-source-id: 8cd13c13588007c60e9c3f17dbf707dcfa2e0e04
(cherry picked from commit cf6dbe3e81)
2022-02-15 16:38:47 +00:00
wushirong
4d01789f69 Remove fx2trt from oss CI (#72595)
Summary:
Remove fx2trt test from oss CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72595

Test Plan: CI

Reviewed By: houseroad

Differential Revision: D34112595

Pulled By: wushirong

fbshipit-source-id: 02376ef0f25381eff31b72dcbf964c1966af9793
(cherry picked from commit e3d698a942)
2022-02-10 18:49:31 +00:00
Junjie Wang (PyTorch)
88547396eb [PT-D] Enable megatron-lm style MLP layers (Changes mainly on sharded linear op) (#69735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69735

We want to build a prototype of Megatron-LM so that we can apply PT-D ops to models like transformers and other Meta flagship models.

The basic idea of Megatron-LM is as following:
1. Col-wise sharding of linear weight. Perform the linear op for the first layer.
2. Perform a math op (optional), such as ReLU or GeLU. We use GeLU in our example unit test. The input is from step 1.
3. Row-wise sharding of linear weight. Perform the linear op for the second layer. The input is from step 2.

We then save the communication needed to concatenate the col-wise sharding results and to spread the input to different ranks for row-wise sharding.
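To make the communication saving concrete, here is a small single-process simulation (plain torch, no distributed setup) of the col-wise → GeLU → row-wise scheme: summing the per-rank partial outputs reproduces the unsharded two-layer MLP. This only illustrates the math, not the sharded_linear implementation.

```python
# Single-process illustration of the Megatron-LM style MLP sharding described
# above: column-shard W1, apply GeLU locally, row-shard W2, then sum the
# partial results (the aggregation a PartialTensor represents).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
world_size = 4
x = torch.randn(8, 16)       # input, replicated on every "rank"
w1 = torch.randn(64, 16)     # first linear: 16 -> 64
w2 = torch.randn(16, 64)     # second linear: 64 -> 16

# Reference: unsharded two-layer MLP.
ref = F.linear(F.gelu(F.linear(x, w1)), w2)

# Col-wise shard W1 (split output features) and row-wise shard W2 (split input features).
w1_shards = w1.chunk(world_size, dim=0)
w2_shards = w2.chunk(world_size, dim=1)

partials = []
for rank in range(world_size):
    h = F.gelu(F.linear(x, w1_shards[rank]))       # local col-wise result, no communication
    partials.append(F.linear(h, w2_shards[rank]))  # local row-wise result (a partial sum)

out = torch.stack(partials).sum(dim=0)             # the aggregation / all-reduce step
assert torch.allclose(out, ref, atol=1e-4)
```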

The changes are as follows:
1. Return a ShardedTensor for the col-wise sharding in the sharded_linear op.
2. Return a PartialTensor for the row-wise sharding in the sharded_linear op.
3. Leverage APIs already defined for `reshard` to merge/aggregate local results into a fully synced local result if needed.
4. Add helper function to create sharded tensor based on the local result.
5. Add a unit test to test the Megatron-LM idea mentioned above and compare with local ops, including the grad and optimizer so that we can ensure the correctness of the implementation.
6. Refactor the unit test of sharded linear to reflect the changes in the code.
ghstack-source-id: 148273049

Test Plan: Unit test + CI

Reviewed By: pritamdamania87

Differential Revision: D32978221

fbshipit-source-id: 565fc92e7807e19d53b0261f8ace3945bef69e3e
(cherry picked from commit 344abe7520)
2022-02-03 06:12:15 +00:00
Junjie Wang (PyTorch)
19d0de8a57 [PT-D][RFC] Resharding related API implement for ShardedTensor and Partial Tensor (#70079)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70079

We defined a new concept named `PartialTensor`, which is an abstraction to represent Tensors that need aggregation across multiple devices and multiple processes.

We also defined an API `reshard_output` to reshard a `PartialTensor` to a `Tensor`, or reshard a `ShardedTensor` to a `ShardedTensor`/`Tensor`. This is done via the class `ModuleResharder`, which acts as a wrapper of the original module plus a reshard in the final step.

The `reshard` logic is defined in each class (`ShardedTensor` and `PartialTensor`).
ghstack-source-id: 148273050

Test Plan: Unit test is in the next PR.

Reviewed By: pritamdamania87

Differential Revision: D33121037

fbshipit-source-id: 5f56617ea526b857c5b73df6e069697d428ec359
(cherry picked from commit 58b1457cbc)
2022-02-03 05:26:02 +00:00
Pritam Damania
64670e414e [reland] Create torch.distributed._shard package. (#72141)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72141

We have many sharding components currently:
torch.distributed._sharded_tensor, torch.distributed._sharding_spec,
torch.distributed._sharded_optimizer and more coming.

As a result, we are organizing all of this under the `torch.distributed._shard`
package. For BC reasons, I'm still keeping the old packages and having them just
reference the new package.
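A minimal sketch of that BC approach (old path re-exporting from the new package); the exact shim contents are an assumption, not the code from this PR:

```python
# torch/distributed/_sharded_tensor/__init__.py -- illustrative BC shim only.
# Keep the old import path importable by re-exporting the new package and
# nudging users toward the new location.
import warnings

from torch.distributed._shard.sharded_tensor import *  # noqa: F401,F403

warnings.warn(
    "torch.distributed._sharded_tensor has moved to "
    "torch.distributed._shard.sharded_tensor; please update your imports.",
    DeprecationWarning,
)
```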
ghstack-source-id: 148150861

Test Plan: waitforbuildbot

Reviewed By: fduwjj

Differential Revision: D33904585

fbshipit-source-id: 057e847eb7521b536a3ee4e0f94871aacc752062
(cherry picked from commit 29a70dd7af)
2022-02-02 06:58:20 +00:00
Nikita Shulga
34494e6252 Back out "Create torch.distributed.shard package." (#72062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72062

Original commit changeset: dc692b31e260

Original Phabricator Diff: D33755913 (87bbcf70f7)

Test Plan: CI

Reviewed By: pbelevich

Differential Revision: D33891115

fbshipit-source-id: 37286e03d743d8691319f07c95e9561d54f3d6d0
(cherry picked from commit 0c1b3fe008)
2022-01-31 18:29:27 +00:00
Pritam Damania
87bbcf70f7 Create torch.distributed.shard package. (#71742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71742

We have many sharding components currently:
torch.distributed._sharded_tensor, torch.distributed._sharding_spec,
torch.distributed._sharded_optimizer and more coming.

As a result, we are organizing all of this under the `torch.distributed.shard`
package. For BC reasons, I'm still keeping the old packages and having them just
reference the new package.
ghstack-source-id: 147899768

Test Plan: waitforbuildbot

Reviewed By: fduwjj, wanchaol

Differential Revision: D33755913

fbshipit-source-id: dc692b31e2607063d55dfcb3db33ec53961d5a5b
(cherry picked from commit 5b6885f358)
2022-01-29 00:48:06 +00:00
Shirong Wu
7a08030903 Fix fx2trt CI test trigger condition (#71014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71014

Replace test trigger with test_config matching.

Test Plan:
CI
https://github.com/pytorch/pytorch/runs/4746717568?check_suite_focus=true

Reviewed By: janeyx99

Differential Revision: D33480971

fbshipit-source-id: 9513e464753343a7ae47fcfaf48119f34bae94c5
2022-01-10 13:37:24 -08:00
Rodrigo Kumpera
2378421340 Implement torch.allclose for sharded tensor. (#70331)
Summary:
Implement torch.allclose op for sharded tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70331

Test Plan:
Automated test added.
pritamdamania87
Fixes https://github.com/pytorch/pytorch/issues/67112

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Reviewed By: pritamdamania87

Differential Revision: D33339137

Pulled By: kumpera

fbshipit-source-id: 4263e468eaa117317b190f69877bf3f8bbac5658
2022-01-07 08:37:04 -08:00
Ilya Persky
bc514cb425 Skip distributed tests if built with USE_DISTRIBUTED=0 (#70677)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70676

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70677

Reviewed By: albanD

Differential Revision: D33439808

Pulled By: janeyx99

fbshipit-source-id: 7f9971eb564dbbb6625fe5f78328c3abe3808719
2022-01-06 08:55:05 -08:00
Brian Hirsh
bb5b4cceb6 Revert "Revert D32498569: allow external backend codegen to toggle whether to generate out= and inplace kernels" (#69950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69950

This reverts commit f6cad53443.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33113545

Pulled By: bdhirsh

fbshipit-source-id: d6590294662588d36c09662dea65919ad4e1e288
2022-01-04 14:52:00 -08:00
wushirong
31c7e5d629 Install TensorRT lib on oss docker and enable fx2trt unit test (#70203)
Summary:
CI

Lib installed and unit test run on https://github.com/pytorch/pytorch/actions/runs/1604076060

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70203

Reviewed By: malfet

Differential Revision: D33264641

Pulled By: wushirong

fbshipit-source-id: ba30010bbd06e70d31415d8c52086d1779371bcf
2021-12-22 08:50:48 -08:00
Pritam Damania
0544f975e1 [reland] Support torch.equal for ShardedTensor. (#70145)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70145

Added support for torch.equal to ShardedTensor. This is really
helpful in terms of comparing two ShardedTensors.
ghstack-source-id: 146066939

Test Plan: waitforbuildbot

Reviewed By: wanchaol

Differential Revision: D33201714

fbshipit-source-id: 56adfc36e345d512c9901c56c07759bf658c745b
2021-12-21 13:22:52 -08:00
Michael Suo
19f898402d Revert D33241684: [pytorch][PR] Install TensorRT lib on oss docker and enable fx2trt unit test
Test Plan: revert-hammer

Differential Revision:
D33241684 (dab3d3132b)

Original commit changeset: cd498908b00f

Original Phabricator Diff: D33241684 (dab3d3132b)

fbshipit-source-id: d5b2e663b5b0c9e570bd799b9f6111cd2a0de4f7
2021-12-20 23:14:35 -08:00
wushirong
dab3d3132b Install TensorRT lib on oss docker and enable fx2trt unit test (#70203)
Summary:
CI

Lib installed and unit test run on https://github.com/pytorch/pytorch/actions/runs/1604076060

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70203

Reviewed By: janeyx99

Differential Revision: D33241684

Pulled By: wushirong

fbshipit-source-id: cd498908b00f3417bdeb5ede78f5576b3b71087c
2021-12-20 18:51:48 -08:00
Michael Suo
a406a427ae Revert D33004315: Support torch.equal for ShardedTensor.
Test Plan: revert-hammer

Differential Revision:
D33004315 (1c4c81622c)

Original commit changeset: 786fe26baf82

Original Phabricator Diff: D33004315 (1c4c81622c)

fbshipit-source-id: e1dda70fea656834fdf0f2a9f874415f7b460c6e
2021-12-15 14:14:06 -08:00
Pritam Damania
1c4c81622c Support torch.equal for ShardedTensor. (#69734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69734

Added support for `torch.equal` to ShardedTensor. This is really
helpful in terms of comparing two ShardedTensors.

Will implement `allclose` in a follow-up PR.
ghstack-source-id: 145301451

Test Plan: waitforbuildbot

Reviewed By: fduwjj, wanchaol

Differential Revision: D33004315

fbshipit-source-id: 786fe26baf82e1bb4fecfdbfc9ad4b64e704877f
2021-12-15 13:07:36 -08:00
Brian Hirsh
f6cad53443 Revert D32498569: allow external backend codegen to toggle whether to generate out= and inplace kernels
Test Plan: revert-hammer

Differential Revision:
D32498569 (aa0cf68c17)

Original commit changeset: ebd932d042b9

Original Phabricator Diff: D32498569 (aa0cf68c17)

fbshipit-source-id: 21a393fa339510d926512a7983d33ece327b743d
2021-12-14 15:27:24 -08:00
Nikita Shulga
24ee1d13f6 Another attempt to fix version comparison check (#69939)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69939

Reviewed By: atalman

Differential Revision: D33108135

Pulled By: malfet

fbshipit-source-id: cadadfe5b04c4378f149136f8e1f8e8d6266775c
2021-12-14 14:54:15 -08:00
Wanchao Liang
800a457b6f [shard] add ShardedOptimizer (#68607)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68607

This PR adds ShardedOptimizer and an API to get module parameters along with ShardedTensor params; it allows the user to use this optimizer wrapper to construct an optimizer that involves ShardedTensors.

state_dict support will come in a follow-up diff.
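A hedged usage sketch of the wrapper described here; the helper name `named_params_with_sharded_tensor`, the constructor shape, and the placeholder module/input are assumptions rather than code from the PR:

```python
# Hedged sketch: constructing an optimizer over a module whose parameters
# include ShardedTensors. Names and signatures below are assumptions.
import torch
from torch.distributed._shard.sharded_optim import (
    ShardedOptimizer,
    named_params_with_sharded_tensor,
)

def train_step(sharded_module: torch.nn.Module, inp: torch.Tensor) -> None:
    # Collect regular parameters *and* ShardedTensor parameters by name.
    named_params = dict(named_params_with_sharded_tensor(sharded_module))
    optim = ShardedOptimizer(named_params, torch.optim.SGD, lr=0.1)

    optim.zero_grad()
    sharded_module(inp).sum().backward()
    optim.step()
```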
ghstack-source-id: 145532834

Test Plan: python test_sharded_optim.py

Reviewed By: pritamdamania87

Differential Revision: D32539994

fbshipit-source-id: a3313c6870d1f1817fc3e08dc2fc27dc43bef743
2021-12-14 12:15:20 -08:00
Nikita Shulga
fef9981998 Update run_test.py (#69920)
Summary:
Do not compare LooseVersion against string

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69920

Reviewed By: atalman

Differential Revision: D33101166

Pulled By: malfet

fbshipit-source-id: a2df9e01d17663262718f11e580c8b009764f7b5
2021-12-14 11:26:56 -08:00
Brian Hirsh
aa0cf68c17 allow external backend codegen to toggle whether to generate out= and inplace kernels (#68530)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68530

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32498569

Pulled By: bdhirsh

fbshipit-source-id: ebd932d042b988e19c71aa04a21677db9bdc9f04
2021-12-14 10:25:02 -08:00
Nikita Shulga
07767569c9 Properly import LooseVersion (#69904)
Summary:
This fixes regression introduced by https://github.com/pytorch/pytorch/pull/57040

Somehow importing `distutils` from `setuptools` caused an import of
`distutils.version`, which is not a documented dependency and got
changed with the release of
[setuptools-59.6.0](https://github.com/pypa/setuptools/tree/v59.6.0).
We should not rely on that, as
`import distutils` never re-imports `distutils.version`, which one can
see by observing
https://github.com/python/cpython/blob/3.9/Lib/distutils/__init__.py
or by running:
```
% python3 -c "import distutils;print(distutils.__version__, dir(distutils))"
3.7.5 ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'sys']
% python3 -c "from setuptools import distutils;print(distutils.__version__, dir(distutils))"
3.7.5 ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'archive_util', 'ccompiler', 'cmd', 'config', 'core', 'debug', 'dep_util', 'dir_util', 'dist', 'errors', 'extension', 'fancy_getopt', 'file_util', 'filelist', 'log', 'spawn', 'sys', 'sysconfig', 'util', 'version']
```
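The takeaway from the output above: import `distutils.version` explicitly instead of relying on whatever setuptools happens to pull in, e.g.:

```python
# Import the submodule explicitly rather than via setuptools' side effects.
from distutils.version import LooseVersion

print(LooseVersion("1.12.0") > LooseVersion("1.11.1"))  # True
```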

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69904

Reviewed By: albanD, atalman, janeyx99

Differential Revision: D33094453

Pulled By: malfet

fbshipit-source-id: aaf1adb7c6f293c4e376ccff21c64cd6ba625e97
2021-12-14 09:28:19 -08:00