Commit Graph

551 Commits

Author SHA1 Message Date
Charlie Yan
ffae7308c9 Enable test: distributed/algorithms/quantization/test_quantization (#80097)
Fixes https://github.com/pytorch/pytorch/issues/69017
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80097
Approved by: https://github.com/wanchaol
2022-07-01 01:32:33 +00:00
Charlie Yan
14eadf937b Enable test: test/distributed/algorithms/ddp_comm_hooks/test_ddp_hooks.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80077

Approved by: https://github.com/wanchaol
2022-06-23 00:11:53 +00:00
Alex Hedges
cb2b7b1e57 Fix code that triggers BytesWarning (#79868)
Fixes #74812.

I have fixed the multiple instances in the repository that trigger
`BytesWarning`, and I have enabled the `-bb` option when tests are run
to prevent regressions.
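For context, a minimal illustration of the class of bug being fixed (a hypothetical snippet, not taken from the patch): under `python -bb`, implicit bytes-to-str conversions and bytes/str comparisons raise `BytesWarning` as errors instead of passing silently.

```python
# Runs silently under plain `python`, but fails under `python -bb`:
data = b"checkpoint"
message = "loaded %s" % data      # BytesWarning: str() called on a bytes instance
if data == "checkpoint":          # BytesWarning: comparison between bytes and str
    print(message)
```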
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79868
Approved by: https://github.com/janeyx99
2022-06-21 01:12:21 +00:00
PyTorch MergeBot
e10cbe3880 Revert "Fix BytesWarning in torch.load() (#74813)"
This reverts commit 6c2e8119dd.

Reverted https://github.com/pytorch/pytorch/pull/74813 on behalf of https://github.com/janeyx99 due to Broke slow tests in cuda 10.2 https://github.com/pytorch/pytorch/runs/6944238177?check_suite_focus=true
2022-06-18 03:53:54 +00:00
Alex Hedges
6c2e8119dd Fix BytesWarning in torch.load() (#74813)
Fixes #74812.

I have enabled the `-bb` option when tests are run to prevent regressions. I don't think it will make CI run more slowly, but I'm not entirely sure.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74813
Approved by: https://github.com/kit1980
2022-06-17 22:56:43 +00:00
Michael Suo
842da8a5de [ci] remove TD + test specification code from run_test.py
In the case of target determination, this is just removing comments that
refer to non-existent code.

In the case of the test specification code, this removes (what I believe
to be) an unused feature. If we're using this somehow, let me know and I
can revise the PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79372

Approved by: https://github.com/janeyx99
2022-06-13 16:09:53 +00:00
Michael Suo
943c09a53e [ci] clean up dead code related to PR test selection
This is never used and not tested, so removing it for clarity.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79363

Approved by: https://github.com/janeyx99
2022-06-13 16:09:51 +00:00
Michael Suo
c978b609f7 [ci] remove IN_CI env var
The conventional env var to set is CI. Both CircleCI and GitHub Actions set it, so
IN_CI is unnecessary.
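For reference, a minimal sketch of the conventional check (both services export `CI=true`):

```python
import os

# GitHub Actions and CircleCI both export CI=true, so a single check covers both.
IS_CI = os.environ.get("CI") == "true"
```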

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79229

Approved by: https://github.com/janeyx99
2022-06-11 17:16:30 +00:00
Jagadish Krishnamoorthy
2d354cdc2a [ROCm] Enable test_instantiator, test_type_hints (#78633)
Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78633
Approved by: https://github.com/malfet, https://github.com/pruthvistony
2022-06-06 06:09:34 +00:00
Xiao Wang
ef0332e36d Allow relocatable device code linking in pytorch CUDA extensions (#78225)
Close https://github.com/pytorch/pytorch/issues/57543

Doc: check `Relocatable device code linking:` in https://docs-preview.pytorch.org/78225/cpp_extension.html#torch.utils.cpp_extension.CUDAExtension
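A hedged sketch of what an extension's setup.py might look like with relocatable device code enabled; the `dlink`/`dlink_libraries` keywords follow the doc preview linked above and should be verified against the released `cpp_extension` documentation, and the source files are hypothetical.

```python
# Illustrative setup.py; assumes a.cu defines __device__ functions that b.cu calls,
# which requires relocatable device code (-rdc) plus a device-link step.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="rdc_example",
    ext_modules=[
        CUDAExtension(
            name="rdc_example",
            sources=["a.cu", "b.cu"],
            extra_compile_args={"cxx": [], "nvcc": ["-rdc=true"]},
            dlink=True,                       # assumed keyword from the doc preview
            dlink_libraries=["rdc_example"],  # assumed keyword from the doc preview
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```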
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78225
Approved by: https://github.com/ezyang, https://github.com/malfet
2022-06-02 21:35:56 +00:00
Kurt Mohler
1705be8ff7 Fix _free_weak_ref error (#78575)
Fixes #74016

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78575
Approved by: https://github.com/ezyang
2022-06-01 00:07:48 +00:00
pritam
37eb31599c [reland] Add sharding tests to multigpu-test.sh and fix custom operator decorator (#77987)
1. Enabled multigpu tests.
2. Fixed failing multigpu tests.
3. Fixed custom operator decorator to be first preference in operator dispatch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77987
Approved by: https://github.com/fduwjj, https://github.com/wanchaol, https://github.com/janeyx99
2022-05-21 22:33:58 +00:00
PyTorch MergeBot
0f74b44f1a Revert "Add sharding tests to multigpu-test.sh and fix custom operator decorator (#77825)"
This reverts commit 8d4c8df33a.

Reverted https://github.com/pytorch/pytorch/pull/77825 on behalf of https://github.com/janeyx99 due to as it will break multigpu test reporting
2022-05-20 17:59:03 +00:00
pritam
8d4c8df33a Add sharding tests to multigpu-test.sh and fix custom operator decorator (#77825)
1. Enabled multigpu tests.
2. Fixed failing multigpu tests.
3. Fixed custom operator decorator to be first preference in operator dispatch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77825
Approved by: https://github.com/wanchaol, https://github.com/fduwjj
2022-05-20 16:53:27 +00:00
PyTorch MergeBot
5e0f559d23 Revert "Add sharding tests to multigpu-test.sh (#77708)"
This reverts commit a7cf95a609.

Reverted https://github.com/pytorch/pytorch/pull/77708 on behalf of https://github.com/suo
2022-05-18 21:47:11 +00:00
pritam
a7cf95a609 Add sharding tests to multigpu-test.sh (#77708)
Summary: These tests were being skipped since they don't run on multigpu
jobs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77708
Approved by: https://github.com/wanchaol
2022-05-18 17:37:55 +00:00
Wanchao Liang
25fa964d96 [shard] add clone/detach and set requires_grad for ShardedTensor
This PR adds clone/detach and requires_grad support to ShardedTensor.
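A minimal usage sketch, assuming an initialized process group with two GPU ranks and the `torch.distributed._shard` factory functions; the placements are illustrative, and the method names are assumed to mirror the regular Tensor API.

```python
import torch
from torch.distributed._shard import sharded_tensor
from torch.distributed._shard.sharding_spec import ChunkShardingSpec

spec = ChunkShardingSpec(dim=0, placements=["rank:0/cuda:0", "rank:1/cuda:1"])
st = sharded_tensor.rand(spec, 8, 4)   # each rank holds a 4x4 local shard

st.requires_grad_(True)   # newly supported: enable autograd on the local shards
detached = st.detach()    # ShardedTensor detached from the autograd graph
copied = st.clone()       # deep copy of the metadata and every local shard
```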

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77367

Approved by: https://github.com/pritamdamania87
2022-05-16 21:42:27 +00:00
pritam
9e52b50e34 Additional ops for ShardedTensor, ReplicatedTensor and PartialTensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76477

Adding the following ops:

1) softmax for ShardedTensor
2) getitem and unsqueeze for ReplicatedTensor
3) transpose and cat for PartialTensor

Differential Revision: [D35979510](https://our.internmc.facebook.com/intern/diff/D35979510/)

Approved by: https://github.com/fduwjj, https://github.com/wanchaol
2022-05-06 16:28:04 +00:00
Catherine Lee
56ea57de61 shard pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed 1->2

shard `pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed ...` from 1 shard to 2

Pros:
- It currently takes about 2.6 hours and is the 3rd-longest-running job on pull
- Theoretically minimal overhead

Cons:
- Requires changes to run_test.py, which might have correctness issues

Notes:
- Cannot shard further as one of the test files is responsible for about half of the total run time

spreadsheet regarding sharding: https://docs.google.com/spreadsheets/d/1BdtVsjRr0Is9LXMNilR02FEdPXNq7zEWl8AmR3ArsLQ/edit#gid=1153012347
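For intuition, sharding by recorded test times can be done greedily; the sketch below is illustrative only and not necessarily the exact strategy run_test.py uses.

```python
def split_into_shards(test_times, num_shards):
    """Greedy split: longest test first, always into the currently-shortest shard."""
    shards = [{"time": 0.0, "tests": []} for _ in range(num_shards)]
    for name, t in sorted(test_times.items(), key=lambda kv: kv[1], reverse=True):
        target = min(shards, key=lambda s: s["time"])
        target["tests"].append(name)
        target["time"] += t
    return [s["tests"] for s in shards]

# e.g. a file taking ~half the total time pins one shard, matching the note above
split_into_shards({"test_distributed_spawn": 80, "test_c10d_nccl": 40, "test_store": 35}, 2)
```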

Test Plan:
<details><summary>expand to see test plan (its long)</summary>

tests from a commit ran on master (90 tests ran)
```
2022-05-03T12:45:34.7974184Z Selected tests:
2022-05-03T12:45:34.7974495Z  distributed/_shard/sharded_optim/test_sharded_optim
2022-05-03T12:45:34.7974839Z  distributed/_shard/sharded_tensor/ops/test_binary_cmp
2022-05-03T12:45:34.7975209Z  distributed/_shard/sharded_tensor/ops/test_elementwise_ops
2022-05-03T12:45:34.7975575Z  distributed/_shard/sharded_tensor/ops/test_embedding
2022-05-03T12:45:34.7976180Z  distributed/_shard/sharded_tensor/ops/test_embedding_bag
2022-05-03T12:45:34.7976802Z  distributed/_shard/sharded_tensor/ops/test_init
2022-05-03T12:45:34.7977361Z  distributed/_shard/sharded_tensor/ops/test_linear
2022-05-03T12:45:34.7978157Z  distributed/_shard/sharded_tensor/ops/test_math_ops
2022-05-03T12:45:34.7978879Z  distributed/_shard/sharded_tensor/test_megatron_prototype
2022-05-03T12:45:34.7979594Z  distributed/_shard/sharded_tensor/test_sharded_tensor
2022-05-03T12:45:34.7980366Z  distributed/_shard/sharded_tensor/test_sharded_tensor_reshard
2022-05-03T12:45:34.7981066Z  distributed/_shard/sharding_plan/test_sharding_plan
2022-05-03T12:45:34.7981877Z  distributed/_shard/sharding_spec/test_sharding_spec
2022-05-03T12:45:34.7982387Z  distributed/_shard/test_partial_tensor
2022-05-03T12:45:34.7982691Z  distributed/_shard/test_replicated_tensor
2022-05-03T12:45:34.7982994Z  distributed/_shard/test_sharder
2022-05-03T12:45:34.7983280Z  distributed/algorithms/test_join
2022-05-03T12:45:34.7983695Z  distributed/elastic/events/lib_test
2022-05-03T12:45:34.7983984Z  distributed/elastic/metrics/api_test
2022-05-03T12:45:34.7984308Z  distributed/elastic/multiprocessing/api_test
2022-05-03T12:45:34.7984624Z  distributed/elastic/timer/api_test
2022-05-03T12:45:34.7984924Z  distributed/elastic/timer/local_timer_example
2022-05-03T12:45:34.7985254Z  distributed/elastic/timer/local_timer_test
2022-05-03T12:45:34.7985575Z  distributed/elastic/utils/distributed_test
2022-05-03T12:45:34.7985889Z  distributed/elastic/utils/logging_test
2022-05-03T12:45:34.7986176Z  distributed/elastic/utils/util_test
2022-05-03T12:45:34.7986492Z  distributed/fsdp/test_flatten_params_wrapper
2022-05-03T12:45:34.7986799Z  distributed/fsdp/test_fsdp_apply
2022-05-03T12:45:34.7987078Z  distributed/fsdp/test_fsdp_checkpoint
2022-05-03T12:45:34.7987388Z  distributed/fsdp/test_fsdp_clip_grad_norm
2022-05-03T12:45:34.7987691Z  distributed/fsdp/test_fsdp_comm
2022-05-03T12:45:34.7987961Z  distributed/fsdp/test_fsdp_core
2022-05-03T12:45:34.7988251Z  distributed/fsdp/test_fsdp_exec_order
2022-05-03T12:45:34.7988570Z  distributed/fsdp/test_fsdp_freezing_weights
2022-05-03T12:45:34.7988865Z  distributed/fsdp/test_fsdp_grad_acc
2022-05-03T12:45:34.7989176Z  distributed/fsdp/test_fsdp_ignored_modules
2022-05-03T12:45:34.7989478Z  distributed/fsdp/test_fsdp_input
2022-05-03T12:45:34.7989950Z  distributed/fsdp/test_fsdp_memory
2022-05-03T12:45:34.7990241Z  distributed/fsdp/test_fsdp_meta
2022-05-03T12:45:34.7990640Z  distributed/fsdp/test_fsdp_mixed_precision
2022-05-03T12:45:34.7990964Z  distributed/fsdp/test_fsdp_multiple_forward
2022-05-03T12:45:34.7991293Z  distributed/fsdp/test_fsdp_multiple_wrapping
2022-05-03T12:45:34.7991610Z  distributed/fsdp/test_fsdp_optim_state
2022-05-03T12:45:34.7991895Z  distributed/fsdp/test_fsdp_overlap
2022-05-03T12:45:34.7992195Z  distributed/fsdp/test_fsdp_pure_fp16
2022-05-03T12:45:34.7992500Z  distributed/fsdp/test_fsdp_state_dict
2022-05-03T12:45:34.7992818Z  distributed/fsdp/test_fsdp_summon_full_params
2022-05-03T12:45:34.7993117Z  distributed/fsdp/test_fsdp_traversal
2022-05-03T12:45:34.7993861Z  distributed/fsdp/test_fsdp_uneven
2022-05-03T12:45:34.7994181Z  distributed/fsdp/test_shard_utils
2022-05-03T12:45:34.7994447Z  distributed/fsdp/test_utils
2022-05-03T12:45:34.7994721Z  distributed/fsdp/test_wrap
2022-05-03T12:45:34.7995015Z  distributed/nn/jit/test_instantiator
2022-05-03T12:45:34.7995328Z  distributed/optim/test_zero_redundancy_optimizer
2022-05-03T12:45:34.7995664Z  distributed/pipeline/sync/skip/test_api
2022-05-03T12:45:34.7995983Z  distributed/pipeline/sync/skip/test_gpipe
2022-05-03T12:45:34.7996315Z  distributed/pipeline/sync/skip/test_inspect_skip_layout
2022-05-03T12:45:34.7996652Z  distributed/pipeline/sync/skip/test_leak
2022-05-03T12:45:34.7996977Z  distributed/pipeline/sync/skip/test_portal
2022-05-03T12:45:34.7997292Z  distributed/pipeline/sync/skip/test_stash_pop
2022-05-03T12:45:34.7997623Z  distributed/pipeline/sync/skip/test_tracker
2022-05-03T12:45:34.7997968Z  distributed/pipeline/sync/skip/test_verify_skippables
2022-05-03T12:45:34.7998301Z  distributed/pipeline/sync/test_balance
2022-05-03T12:45:34.7998591Z  distributed/pipeline/sync/test_bugs
2022-05-03T12:45:34.7998927Z  distributed/pipeline/sync/test_checkpoint
2022-05-03T12:45:34.7999243Z  distributed/pipeline/sync/test_copy
2022-05-03T12:45:34.7999557Z  distributed/pipeline/sync/test_deferred_batch_norm
2022-05-03T12:45:34.7999896Z  distributed/pipeline/sync/test_dependency
2022-05-03T12:45:34.8000215Z  distributed/pipeline/sync/test_inplace
2022-05-03T12:45:34.8000516Z  distributed/pipeline/sync/test_microbatch
2022-05-03T12:45:34.8000826Z  distributed/pipeline/sync/test_phony
2022-05-03T12:45:34.8001130Z  distributed/pipeline/sync/test_pipe
2022-05-03T12:45:34.8001424Z  distributed/pipeline/sync/test_pipeline
2022-05-03T12:45:34.8001733Z  distributed/pipeline/sync/test_stream
2022-05-03T12:45:34.8002055Z  distributed/pipeline/sync/test_transparency
2022-05-03T12:45:34.8002353Z  distributed/pipeline/sync/test_worker
2022-05-03T12:45:34.8002672Z  distributed/rpc/cuda/test_tensorpipe_agent
2022-05-03T12:45:34.8002982Z  distributed/rpc/test_faulty_agent
2022-05-03T12:45:34.8003270Z  distributed/rpc/test_tensorpipe_agent
2022-05-03T12:45:34.8003568Z  distributed/test_c10d_common
2022-05-03T12:45:34.8003839Z  distributed/test_c10d_gloo
2022-05-03T12:45:34.8004088Z  distributed/test_c10d_nccl
2022-05-03T12:45:34.8004369Z  distributed/test_c10d_spawn_gloo
2022-05-03T12:45:34.8004656Z  distributed/test_c10d_spawn_nccl
2022-05-03T12:45:34.8004938Z  distributed/test_data_parallel
2022-05-03T12:45:34.8005212Z  distributed/test_distributed_spawn
2022-05-03T12:45:34.8005496Z  distributed/test_launcher
2022-05-03T12:45:34.8005767Z  distributed/test_nccl
2022-05-03T12:45:34.8006019Z  distributed/test_pg_wrapper
2022-05-03T12:45:34.8006285Z  distributed/test_store
```

tests ran on first shard for distributed on this PR (34 tests)
```
2022-05-02T21:26:00.1385256Z Selected tests:
2022-05-02T21:26:00.1385767Z  distributed/test_distributed_spawn
2022-05-02T21:26:00.1386403Z  distributed/elastic/multiprocessing/api_test
2022-05-02T21:26:00.1387051Z  distributed/fsdp/test_fsdp_memory
2022-05-02T21:26:00.1387607Z  distributed/fsdp/test_fsdp_ignored_modules
2022-05-02T21:26:00.1388179Z  distributed/fsdp/test_fsdp_apply
2022-05-02T21:26:00.1388600Z  distributed/_shard/sharded_tensor/ops/test_binary_cmp
2022-05-02T21:26:00.1389181Z  distributed/_shard/sharding_spec/test_sharding_spec
2022-05-02T21:26:00.1389545Z  distributed/_shard/sharded_tensor/ops/test_linear
2022-05-02T21:26:00.1389878Z  distributed/fsdp/test_fsdp_uneven
2022-05-02T21:26:00.1390186Z  distributed/fsdp/test_fsdp_multiple_wrapping
2022-05-02T21:26:00.1390526Z  distributed/fsdp/test_fsdp_multiple_forward
2022-05-02T21:26:00.1390877Z  distributed/_shard/sharded_tensor/ops/test_embedding
2022-05-02T21:26:00.1391219Z  distributed/_shard/test_partial_tensor
2022-05-02T21:26:00.1391542Z  distributed/_shard/sharded_optim/test_sharded_optim
2022-05-02T21:26:00.1391915Z  distributed/_shard/sharded_tensor/ops/test_elementwise_ops
2022-05-02T21:26:00.1392297Z  distributed/fsdp/test_flatten_params_wrapper
2022-05-02T21:26:00.1392585Z  distributed/fsdp/test_utils
2022-05-02T21:26:00.1392883Z  distributed/nn/jit/test_instantiator
2022-05-02T21:26:00.1393167Z  distributed/test_nccl
2022-05-02T21:26:00.1393466Z  distributed/_shard/sharding_plan/test_sharding_plan
2022-05-02T21:26:00.1393787Z  distributed/_shard/test_sharder
2022-05-02T21:26:00.1394085Z  distributed/elastic/timer/api_test
2022-05-02T21:26:00.1394383Z  distributed/pipeline/sync/skip/test_api
2022-05-02T21:26:00.1394738Z  distributed/pipeline/sync/skip/test_inspect_skip_layout
2022-05-02T21:26:00.1395090Z  distributed/pipeline/sync/skip/test_portal
2022-05-02T21:26:00.1395424Z  distributed/pipeline/sync/skip/test_tracker
2022-05-02T21:26:00.1395935Z  distributed/pipeline/sync/test_balance
2022-05-02T21:26:00.1396288Z  distributed/pipeline/sync/test_checkpoint
2022-05-02T21:26:00.1396635Z  distributed/pipeline/sync/test_deferred_batch_norm
2022-05-02T21:26:00.1396953Z  distributed/pipeline/sync/test_inplace
2022-05-02T21:26:00.1397269Z  distributed/pipeline/sync/test_phony
2022-05-02T21:26:00.1397587Z  distributed/pipeline/sync/test_pipeline
2022-05-02T21:26:00.1397903Z  distributed/pipeline/sync/test_transparency
2022-05-02T21:26:00.1398221Z  distributed/rpc/test_faulty_agent
```

tests ran on second shard for distributed on this PR (56 tests)
```
2022-05-02T21:26:55.1342892Z Selected tests:
2022-05-02T21:26:55.1343201Z  distributed/rpc/cuda/test_tensorpipe_agent
2022-05-02T21:26:55.1343526Z  distributed/fsdp/test_fsdp_core
2022-05-02T21:26:55.1343829Z  distributed/test_c10d_nccl
2022-05-02T21:26:55.1344089Z  distributed/test_c10d_gloo
2022-05-02T21:26:55.1344408Z  distributed/fsdp/test_fsdp_summon_full_params
2022-05-02T21:26:55.1344749Z  distributed/fsdp/test_fsdp_mixed_precision
2022-05-02T21:26:55.1345085Z  distributed/optim/test_zero_redundancy_optimizer
2022-05-02T21:26:55.1345423Z  distributed/fsdp/test_fsdp_optim_state
2022-05-02T21:26:55.1345773Z  distributed/_shard/sharded_tensor/test_sharded_tensor
2022-05-02T21:26:55.1346088Z  distributed/fsdp/test_fsdp_state_dict
2022-05-02T21:26:55.1346379Z  distributed/test_store
2022-05-02T21:26:55.1346661Z  distributed/test_c10d_spawn_gloo
2022-05-02T21:26:55.1346966Z  distributed/test_pg_wrapper
2022-05-02T21:26:55.1347252Z  distributed/test_c10d_spawn_nccl
2022-05-02T21:26:55.1347565Z  distributed/fsdp/test_fsdp_clip_grad_norm
2022-05-02T21:26:55.1347871Z  distributed/fsdp/test_wrap
2022-05-02T21:26:55.1348369Z  distributed/fsdp/test_fsdp_grad_acc
2022-05-02T21:26:55.1348679Z  distributed/algorithms/test_join
2022-05-02T21:26:55.1349004Z  distributed/fsdp/test_fsdp_freezing_weights
2022-05-02T21:26:55.1349305Z  distributed/fsdp/test_fsdp_comm
2022-05-02T21:26:55.1349593Z  distributed/test_c10d_common
2022-05-02T21:26:55.1349885Z  distributed/fsdp/test_fsdp_meta
2022-05-02T21:26:55.1350171Z  distributed/fsdp/test_fsdp_exec_order
2022-05-02T21:26:55.1350486Z  distributed/fsdp/test_fsdp_checkpoint
2022-05-02T21:26:55.1350798Z  distributed/fsdp/test_fsdp_overlap
2022-05-02T21:26:55.1351105Z  distributed/elastic/timer/local_timer_example
2022-05-02T21:26:55.1351423Z  distributed/fsdp/test_fsdp_input
2022-05-02T21:26:55.1351749Z  distributed/_shard/sharded_tensor/ops/test_init
2022-05-02T21:26:55.1352190Z  distributed/elastic/timer/local_timer_test
2022-05-02T21:26:55.1352520Z  distributed/elastic/utils/distributed_test
2022-05-02T21:26:55.1352841Z  distributed/fsdp/test_fsdp_pure_fp16
2022-05-02T21:26:55.1353150Z  distributed/test_data_parallel
2022-05-02T21:26:55.1353437Z  distributed/fsdp/test_fsdp_traversal
2022-05-02T21:26:55.1353792Z  distributed/_shard/sharded_tensor/test_sharded_tensor_reshard
2022-05-02T21:26:55.1354174Z  distributed/_shard/sharded_tensor/ops/test_embedding_bag
2022-05-02T21:26:55.1354534Z  distributed/_shard/sharded_tensor/test_megatron_prototype
2022-05-02T21:26:55.1354858Z  distributed/test_launcher
2022-05-02T21:26:55.1355149Z  distributed/elastic/utils/util_test
2022-05-02T21:26:55.1355441Z  distributed/elastic/utils/logging_test
2022-05-02T21:26:55.1355755Z  distributed/elastic/metrics/api_test
2022-05-02T21:26:55.1356095Z  distributed/_shard/sharded_tensor/ops/test_math_ops
2022-05-02T21:26:55.1356455Z  distributed/_shard/test_replicated_tensor
2022-05-02T21:26:55.1356754Z  distributed/elastic/events/lib_test
2022-05-02T21:26:55.1357065Z  distributed/fsdp/test_shard_utils
2022-05-02T21:26:55.1357387Z  distributed/pipeline/sync/skip/test_gpipe
2022-05-02T21:26:55.1357702Z  distributed/pipeline/sync/skip/test_leak
2022-05-02T21:26:55.1358040Z  distributed/pipeline/sync/skip/test_stash_pop
2022-05-02T21:26:55.1358396Z  distributed/pipeline/sync/skip/test_verify_skippables
2022-05-02T21:26:55.1358716Z  distributed/pipeline/sync/test_bugs
2022-05-02T21:26:55.1359027Z  distributed/pipeline/sync/test_copy
2022-05-02T21:26:55.1359350Z  distributed/pipeline/sync/test_dependency
2022-05-02T21:26:55.1359662Z  distributed/pipeline/sync/test_microbatch
2022-05-02T21:26:55.1359983Z  distributed/pipeline/sync/test_pipe
2022-05-02T21:26:55.1360299Z  distributed/pipeline/sync/test_stream
2022-05-02T21:26:55.1360593Z  distributed/pipeline/sync/test_worker
2022-05-02T21:26:55.1360912Z  distributed/rpc/test_tensorpipe_agent
```
</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76564
Approved by: https://github.com/jeffdaily, https://github.com/janeyx99
2022-05-03 23:01:42 +00:00
Junjie Wang (PyTorch)
7c44d560ba [PT-D][Sharding] Enable ops needed in the transformer model training (#75374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75374

From the FairSeq and MetaSeq codebases (which are essentially transformer models), we have found that many ops are not supported by ShardedTensor. So we now implement a simple version so that we can at least run a transformer example:

Ops include: chunk, transpose, view, masked_fill, dropout, softmax, and type_as.

We isolate the common logic of registering simple ops into a function, so for future registrations we just need to implement at most three functions for a new op.

ghstack-source-id: 155309147

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D35123021

fbshipit-source-id: 660e559fb8b4a910eb63e0586c63ab927873a2ce
(cherry picked from commit 83a87ebf627d863448dfe1019c7c5f7112cc14ab)
2022-05-03 17:20:28 +00:00
Junjie Wang (PyTorch)
c1037d0d4c [PT-D][Sharding] Move Partial Tensor to the _shard folder and add logic to remove padding (#76199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76199

Since PartialTensor is somewhat isolated from ShardedTensor, we now move it to the _shard folder.

Also, we added logic to remove padding when the size is not divisible by the world size, and modified the unit test to reflect these changes.

Finally, we need to consider the placement order for the resharding spec of PartialTensor; the related logic is added in this change. Furthermore, for sharded linear, we need to order the placements by rank to get the expected local result.
ghstack-source-id: 154853290

Test Plan: CI

Reviewed By: pritamdamania87, wanchaol

Differential Revision: D35827894

fbshipit-source-id: 58dab77969b8b6557f42afa7e8f5a8a053dd5793
(cherry picked from commit abeb28f16582dcf707c2e165f39df6caf692384d)
2022-04-28 06:22:02 +00:00
Alban Desmaison
3d7abc0e55 Make -h work with run_test.py
As per title.

### When running `python run_test.py -h`
It used to show:
- The general unittest parser help that we print via a second thread 35545d85dc/torch/testing/_internal/common_utils.py (L467-L470)
- The common_utils's parser help

<details><summary>Full result</summary>
<p>

```bash
$ python run_test.py -h
usage: run_test.py [-h] [-v] [-q] [--locals] [-f] [-c] [-b] [-k TESTNAMEPATTERNS] [tests [tests ...]]

positional arguments:
  tests                a list of any number of test modules, classes and test methods.

optional arguments:
  -h, --help           show this help message and exit
  -v, --verbose        Verbose output
  -q, --quiet          Quiet output
  --locals             Show local variables in tracebacks
  -f, --failfast       Stop on first fail or error
  -c, --catch          Catch Ctrl-C and display results so far
  -b, --buffer         Buffer stdout and stderr during tests
  -k TESTNAMEPATTERNS  Only run tests which match the given substring

Examples:
  run_test.py                           - run default set of tests
  run_test.py MyTestSuite               - run suite 'MyTestSuite'
  run_test.py MyTestCase.testSomething  - run MyTestCase.testSomething
  run_test.py MyTestCase                - run all 'test*' test methods
                                       in MyTestCase

usage: run_test.py [-h] [--subprocess] [--seed SEED] [--accept] [--jit_executor JIT_EXECUTOR] [--repeat REPEAT] [--test_bailouts]
                   [--save-xml [SAVE_XML]] [--discover-tests] [--log-suffix LOG_SUFFIX] [--run-parallel RUN_PARALLEL]
                   [--import-slow-tests [IMPORT_SLOW_TESTS]] [--import-disabled-tests [IMPORT_DISABLED_TESTS]]

optional arguments:
  -h, --help            show this help message and exit
  --subprocess          whether to run each test in a subprocess
  --seed SEED
  --accept
  --jit_executor JIT_EXECUTOR
  --repeat REPEAT
  --test_bailouts
  --save-xml [SAVE_XML]
  --discover-tests
  --log-suffix LOG_SUFFIX
  --run-parallel RUN_PARALLEL
  --import-slow-tests [IMPORT_SLOW_TESTS]
  --import-disabled-tests [IMPORT_DISABLED_TESTS]
```

</p>
</details>

It now prints:
- The general unittest parser help the same way. Should we remove this? We can't merge them, unfortunately, as unittest does not accept a parent parser / does not expose its parser for us to take as a parent.
- The combined common_utils + run_test parsers help

<details><summary>Full result</summary>
<p>

```bash
$ python run_test.py -h
usage: run_test.py [-h] [-v] [-q] [--locals] [-f] [-c] [-b] [-k TESTNAMEPATTERNS] [tests [tests ...]]

positional arguments:
  tests                a list of any number of test modules, classes and test methods.

optional arguments:
  -h, --help           show this help message and exit
  -v, --verbose        Verbose output
  -q, --quiet          Quiet output
  --locals             Show local variables in tracebacks
  -f, --failfast       Stop on first fail or error
  -c, --catch          Catch Ctrl-C and display results so far
  -b, --buffer         Buffer stdout and stderr during tests
  -k TESTNAMEPATTERNS  Only run tests which match the given substring

Examples:
  run_test.py                           - run default set of tests
  run_test.py MyTestSuite               - run suite 'MyTestSuite'
  run_test.py MyTestCase.testSomething  - run MyTestCase.testSomething
  run_test.py MyTestCase                - run all 'test*' test methods
                                       in MyTestCase

Ignoring disabled issues:  []
usage: run_test.py [-h] [--subprocess] [--seed SEED] [--accept] [--jit_executor JIT_EXECUTOR] [--repeat REPEAT] [--test_bailouts]
                   [--save-xml [SAVE_XML]] [--discover-tests] [--log-suffix LOG_SUFFIX] [--run-parallel RUN_PARALLEL]
                   [--import-slow-tests [IMPORT_SLOW_TESTS]] [--import-disabled-tests [IMPORT_DISABLED_TESTS]] [-v] [--jit]
                   [--distributed-tests] [-core] [-pt] [-c] [-i TESTS [TESTS ...]] [-x TESTS [TESTS ...]] [-f TESTS] [-l TESTS]
                   [--bring-to-front TESTS [TESTS ...]] [--ignore-win-blocklist] [--continue-through-error]
                   [--export-past-test-times [EXPORT_PAST_TEST_TIMES]] [--shard SHARD SHARD] [--exclude-jit-executor]
                   [--exclude-distributed-tests] [--run-specified-test-cases [RUN_SPECIFIED_TEST_CASES]]
                   [--use-specified-test-cases-by {include,bring-to-front}] [--dry-run]
                   [additional_unittest_args [additional_unittest_args ...]]

Run the PyTorch unit test suite

positional arguments:
  additional_unittest_args
                        additional arguments passed through to unittest, e.g., python run_test.py -i sparse -- TestSparse.test_factory_size_check

optional arguments:
  -h, --help            show this help message and exit
  --subprocess          whether to run each test in a subprocess
  --seed SEED
  --accept
  --jit_executor JIT_EXECUTOR
  --repeat REPEAT
  --test_bailouts
  --save-xml [SAVE_XML]
  --discover-tests
  --log-suffix LOG_SUFFIX
  --run-parallel RUN_PARALLEL
  --import-slow-tests [IMPORT_SLOW_TESTS]
  --import-disabled-tests [IMPORT_DISABLED_TESTS]
  -v, --verbose         print verbose information and test-by-test results
  --jit, --jit          run all jit tests
  --distributed-tests, --distributed-tests
                        run all distributed tests
  -core, --core         Only run core tests, or tests that validate PyTorch's ops, modules,and autograd. They are defined by CORE_TEST_LIST.
  -pt, --pytest         If true, use `pytest` to execute the tests. E.g., this runs TestTorch with pytest in verbose and coverage mode: python run_test.py -vci torch -pt
  -c, --coverage        enable coverage
  -i TESTS [TESTS ...], --include TESTS [TESTS ...]
                        select a set of tests to include (defaults to ALL tests). tests must be a part of the TESTS list defined in run_test.py
  -x TESTS [TESTS ...], --exclude TESTS [TESTS ...]
                        select a set of tests to exclude
  -f TESTS, --first TESTS
                        select the test to start from (excludes previous tests)
  -l TESTS, --last TESTS
                        select the last test to run (excludes following tests)
  --bring-to-front TESTS [TESTS ...]
                        select a set of tests to run first. This can be used in situations where you want to run all tests, but care more about some set, e.g. after making a change to a specific component
  --ignore-win-blocklist
                        always run blocklisted windows tests
  --continue-through-error
                        Runs the full test suite despite one of the tests failing
  --export-past-test-times [EXPORT_PAST_TEST_TIMES]
                        dumps test times from previous S3 stats into a file, format JSON
  --shard SHARD SHARD   runs a shard of the tests (taking into account other selections), e.g., --shard 2 3 will break up the selected tests into 3 shards and run the tests in the 2nd shard (the first number should not exceed the second)
  --exclude-jit-executor
                        exclude tests that are run for a specific jit config
  --exclude-distributed-tests
                        exclude distributed tests
  --run-specified-test-cases [RUN_SPECIFIED_TEST_CASES]
                        load specified test cases file dumped from previous OSS CI stats, format CSV.  If all test cases should run for a <test_module> please add a single row:
                         test_filename,test_case_name
                         ...
                         <test_module>,__all__
                         ...
                        how we use the stats will be based on option "--use-specified-test-cases-by".
  --use-specified-test-cases-by {include,bring-to-front}
                        used together with option "--run-specified-test-cases". When specified test case file is set, this option allows the user to control whether to only run the specified test modules or to simply bring the specified modules to front and also run the remaining modules. Note: regardless of this option, we will only run the specified test cases  within a specified test module. For unspecified test modules with the bring-to-front option, all test cases will be run, as one may expect.
  --dry-run             Only list the test that will run.

where TESTS is any of: benchmark_utils/test_benchmark_utils, distributed/_shard/sharded_optim/test_sharded_optim, distributed/_shard/sharded_tensor/ops/test_binary_cmp, distributed/_shard/sharded_tensor/ops/test_elementwise_ops, distributed/_shard/sharded_tensor/ops/test_embedding, distributed/_shard/sharded_tensor/ops/test_embedding_bag, distributed/_shard/sharded_tensor/ops/test_init, distributed/_shard/sharded_tensor/ops/test_linear, distributed/_shard/sharded_tensor/ops/test_math_ops, distributed/_shard/sharded_tensor/test_megatron_prototype, distributed/_shard/sharded_tensor/test_partial_tensor, distributed/_shard/sharded_tensor/test_sharded_tensor, distributed/_shard/sharded_tensor/test_sharded_tensor_reshard, distributed/_shard/sharding_spec/test_sharding_spec, distributed/_shard/test_replicated_tensor, distributed/algorithms/test_join, distributed/elastic/events/lib_test, distributed/elastic/metrics/api_test, distributed/elastic/multiprocessing/api_test, distributed/elastic/timer/api_test, distributed/elastic/timer/local_timer_example, distributed/elastic/timer/local_timer_test, distributed/elastic/utils/distributed_test, distributed/elastic/utils/logging_test, distributed/elastic/utils/util_test, distributed/fsdp/test_flatten_params_wrapper, distributed/fsdp/test_fsdp_apply, distributed/fsdp/test_fsdp_checkpoint, distributed/fsdp/test_fsdp_clip_grad_norm, distributed/fsdp/test_fsdp_comm, distributed/fsdp/test_fsdp_core, distributed/fsdp/test_fsdp_freezing_weights, distributed/fsdp/test_fsdp_grad_acc, distributed/fsdp/test_fsdp_ignored_modules, distributed/fsdp/test_fsdp_input, distributed/fsdp/test_fsdp_memory, distributed/fsdp/test_fsdp_mixed_precision, distributed/fsdp/test_fsdp_multiple_forward, distributed/fsdp/test_fsdp_multiple_wrapping, distributed/fsdp/test_fsdp_optim_state, distributed/fsdp/test_fsdp_overlap, distributed/fsdp/test_fsdp_pure_fp16, distributed/fsdp/test_fsdp_state_dict, distributed/fsdp/test_fsdp_summon_full_params, distributed/fsdp/test_fsdp_traversal, distributed/fsdp/test_fsdp_uneven, distributed/fsdp/test_shard_utils, distributed/fsdp/test_utils, distributed/fsdp/test_wrap, distributed/nn/jit/test_instantiator, distributed/optim/test_zero_redundancy_optimizer, distributed/pipeline/sync/skip/test_api, distributed/pipeline/sync/skip/test_gpipe, distributed/pipeline/sync/skip/test_inspect_skip_layout, distributed/pipeline/sync/skip/test_leak, distributed/pipeline/sync/skip/test_portal, distributed/pipeline/sync/skip/test_stash_pop, distributed/pipeline/sync/skip/test_tracker, distributed/pipeline/sync/skip/test_verify_skippables, distributed/pipeline/sync/test_balance, distributed/pipeline/sync/test_bugs, distributed/pipeline/sync/test_checkpoint, distributed/pipeline/sync/test_copy, distributed/pipeline/sync/test_deferred_batch_norm, distributed/pipeline/sync/test_dependency, distributed/pipeline/sync/test_inplace, distributed/pipeline/sync/test_microbatch, distributed/pipeline/sync/test_phony, distributed/pipeline/sync/test_pipe, distributed/pipeline/sync/test_pipeline, distributed/pipeline/sync/test_stream, distributed/pipeline/sync/test_transparency, distributed/pipeline/sync/test_worker, distributed/rpc/cuda/test_tensorpipe_agent, distributed/rpc/test_faulty_agent, distributed/rpc/test_tensorpipe_agent, distributed/test_c10d_common, distributed/test_c10d_gloo, distributed/test_c10d_nccl, distributed/test_c10d_spawn_gloo, distributed/test_c10d_spawn_nccl, distributed/test_data_parallel, distributed/test_distributed_spawn, distributed/test_launcher, 
distributed/test_nccl, distributed/test_pg_wrapper, distributed/test_store, distributions/test_constraints, distributions/test_distributions, lazy/test_bindings, lazy/test_extract_compiled_graph, lazy/test_ts_opinfo, test_ao_sparsity, test_autocast, test_autograd, test_binary_ufuncs, test_bundled_inputs, test_complex, test_cpp_api_parity, test_cpp_extensions_aot_ninja, test_cpp_extensions_aot_no_ninja, test_cpp_extensions_jit, test_cuda, test_cuda_primary_ctx, test_dataloader, test_datapipe, test_deploy, test_deploy, test_dispatch, test_expanded_weights, test_foreach, test_function_schema, test_functional_autograd_benchmark, test_functional_optim, test_functionalization, test_futures, test_fx, test_fx_experimental, test_hub, test_import_stats, test_indexing, test_jit, test_jit_autocast, test_jit_cuda_fuser, test_jit_disabled, test_jit_fuser_legacy, test_jit_fuser_te, test_jit_legacy, test_jit_profiling, test_license, test_linalg, test_logging, test_masked, test_mkldnn, test_mobile_optimizer, test_model_dump, test_module_init, test_modules, test_monitor, test_multiprocessing, test_multiprocessing_spawn, test_namedtensor, test_namedtuple_return_api, test_native_functions, test_nestedtensor, test_nn, test_numba_integration, test_numpy_interop, test_openmp, test_ops, test_ops_gradients, test_ops_jit, test_optim, test_overrides, test_package, test_per_overload_api, test_profiler, test_pruning_op, test_public_bindings, test_python_dispatch, test_pytree, test_quantization, test_reductions, test_scatter_gather_ops, test_serialization, test_set_default_mobile_cpu_allocator, test_shape_ops, test_show_pickle, test_sort_and_select, test_sparse, test_sparse_csr, test_spectral_ops, test_stateless, test_tensor_creation_ops, test_tensorboard, test_tensorexpr, test_tensorexpr_pybind, test_testing, test_torch, test_type_hints, test_type_info, test_type_promotion, test_unary_ufuncs, test_utils, test_view_ops, test_vmap, test_vulkan, test_xnnpack_integration
```

</p>
</details>

### When running anything else (for example  `python test_autograd.py -h`)
It did not change and still does:
- The general unittest parser help that we print via a second thread
- The common_utils's parser help
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76152
Approved by: https://github.com/malfet, https://github.com/seemethere
2022-04-25 14:01:33 +00:00
Wanchao Liang
78ea86a445 [shard] Sharder and ShardingPlan prototype (#73873)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73873

Basic ShardingPlan interface and Sharder implementation:
1. We provide `ShardingPlan` to allow users to specify all parameter sharding strategies for a given model; this includes `plan` for sharding the parameters, `output_plan` for tagging the output layout, and `return_local_tensor` for converting back to DDP.
2. Introduce the `shard_module` API, which takes an nn.Module and a ShardingPlan, then shards the module according to the plan (see the sketch below).

TODO:
In the next PR we will introduce an extensible Sharder and a ShardingPlanner.
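A hedged sketch of the intended usage, assuming an initialized process group with two GPU ranks; the module, parameter names, and placements are illustrative.

```python
import torch.nn as nn
from torch.distributed._shard import shard_module
from torch.distributed._shard.sharding_plan import ShardingPlan
from torch.distributed._shard.sharding_spec import ChunkShardingSpec

class TwoLinear(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(16, 16)
        self.fc2 = nn.Linear(16, 16)

    def forward(self, x):
        return self.fc2(self.fc1(x))

colwise = ChunkShardingSpec(dim=0, placements=["rank:0/cuda:0", "rank:1/cuda:1"])

plan = ShardingPlan(
    plan={"fc1.weight": colwise},   # shard fc1's weight across the two ranks
    return_local_tensor=["fc1"],    # convert fc1's output back to a local tensor
)

model = TwoLinear()
shard_module(model, plan)           # shards the named parameters in place
```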
ghstack-source-id: 154682421

Test Plan: test_sharding_plan.py

Reviewed By: pritamdamania87, fduwjj

Differential Revision: D34695159

fbshipit-source-id: 3d695803c4b7e9a7543177ade5b709b5f847baa9
(cherry picked from commit 670cd279b0e5304a9bf0ce6e6651a08273a77035)
2022-04-25 13:01:24 +00:00
Jeff Daily
44bbb247a6 [ROCm] enable fsdp tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75632
Approved by: https://github.com/kumpera, https://github.com/malfet
2022-04-22 19:50:36 +00:00
wanchaol
be354d8139 [shard] Add basic math ops to ShardedTensor and add ReplicatedTensor inter-op
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73703

This PR adds basic math ops (+, -, *, /) to ShardedTensor, and adds ReplicatedTensor inter-op with ShardedTensor for those math ops. This enables ShardedTensor (op) ReplicatedTensor to avoid communication in certain cases.

Differential Revision: [D34560867](https://our.internmc.facebook.com/intern/diff/D34560867/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D34560867/)!

Approved by: https://github.com/pritamdamania87
2022-04-12 04:25:10 +00:00
Andrey Talman
622cff3e95 Cuda 11.6 Disable failing tests (#75420)
Summary:
This mitigates a number of issues with the CUDA 11.6 update and updates the Linux driver.

New issues discovered
#[75391](https://github.com/pytorch/pytorch/issues/75391)
#[75375](https://github.com/pytorch/pytorch/issues/75375)

Old issue present since 11.3
#[57482](https://github.com/pytorch/pytorch/issues/57482)
#[70111](https://github.com/pytorch/pytorch/issues/70111)

These changes were already tested in a WIP PR:
#[75337](https://github.com/pytorch/pytorch/pull/75337)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75420

Reviewed By: seemethere

Differential Revision: D35481973

Pulled By: atalman

fbshipit-source-id: 4db00c646e2df4f8650404763963c3b215110f1f
(cherry picked from commit 518e19dc361b43273f5bd6bdfff942614e8466f5)
2022-04-07 22:43:15 +00:00
Brian Hirsh
9429dbb434 make functionalization work better with subclasses
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73441

Approved by: https://github.com/ezyang, https://github.com/albanD
2022-04-04 15:33:27 +00:00
David Berard
27deefb5e1 [JIT] Enable NVFuser tests in OSS CI (#73322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73322

These tests have been disabled in OSS CI since #34785.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D34436844

Pulled By: davidberard98

fbshipit-source-id: c5b14b33e7f369a6fa1e9cfbcb484a30dffc659e
(cherry picked from commit b08f51587c0203c3e8b69f06ea613759e740aa4f)
2022-04-01 23:48:30 +00:00
Wanchao Liang
0524b2829a [shard] Add ReplicatedTensor (#73529)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73529

Add ReplicatedTensor. A ReplicatedTensor is a type of tensor that has the same value on all ranks across the world size.

ReplicatedTensor is a :class:`~torch.Tensor` subclass, and it can be used together with ShardedTensor/Tensor to express different types of computation. The inter-op rules are defined as follows (using torch.add as an example op):
    ReplicatedTensor + ReplicatedTensor = ReplicatedTensor
    ReplicatedTensor + torch.Tensor = torch.Tensor
    ReplicatedTensor + ShardedTensor = ShardedTensor

We also added a `validate()` API to help users check whether a replicated tensor on a certain process_group is truly replicated.

TODO: the next PR will add ShardedTensor/PartialTensor logic to handle ReplicatedTensor.
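A minimal sketch of the intended usage, assuming an initialized process group; the constructor and module path are taken loosely from this stack and are illustrative.

```python
import torch
from torch.distributed._shard.replicated_tensor import ReplicatedTensor

# Every rank constructs the same value, so the tensor is replicated by definition.
rt = ReplicatedTensor(torch.ones(4, 4))

out = torch.add(rt, rt)                  # ReplicatedTensor + ReplicatedTensor -> ReplicatedTensor
mixed = torch.add(rt, torch.ones(4, 4))  # ReplicatedTensor + Tensor -> plain Tensor

rt.validate()   # checks the value really matches across the process group
```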
ghstack-source-id: 152064781

Test Plan: test_replicated_tensor

Reviewed By: pritamdamania87, fduwjj

Differential Revision: D34529374

fbshipit-source-id: 16ccb300e9f9c47ac29a17eb6d46d029ab7d60b8
(cherry picked from commit 44f4e11e795a1bf330a8108bda256950ca769525)
2022-03-24 12:41:17 +00:00
Jeff Daily
956a028b55 [ROCm] enable HIP IPC
Enables code paths that use hipIpc* functions.  Also enables test_multiprocessing.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74383
Approved by: https://github.com/osalpekar
2022-03-21 19:32:01 +00:00
Sahan Paliskara
0bfa2f8255 Move torch::deploy tests to their own workflow job (#73676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73676

For some reason https://github.com/pytorch/pytorch/pull/72637 ended up getting messed up during rebasing, so please refer to that PR for review history.

This PR creates a new workflow called ` deploy-linux-xenial-cuda11.3-py3.7-gcc7` for torch::deploy tests.

For testing go to https://www.torch-ci.com/pytorch/pytorch/pull/73676 and check if a build and test job occur with ` deploy-linux-xenial-cuda11.3-py3.7-gcc7`

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D34586702

Pulled By: PaliC

fbshipit-source-id: 5627cf4ff411a4a04030f8b7726f84af979da213
(cherry picked from commit df6dddebb9fe078a6053a31033b5a40cc742fcf3)
2022-03-17 12:19:48 +00:00
atalman
ebca80ed08 Move test ops gradients and test ops jit to separate files
Fixes #72368

As per the referenced issue, test_ops in a single file takes around 3.5-4 hours to execute on ASAN jobs:

Reference : pytorch_test_times.json

```
{
    "commit": "39535fec6c3ff5bf7c2d322d096c59571c3295ed",
    "JOB_BASE_NAME": "linux-xenial-py3.7-clang7-asan",
    "job_times": {
        "test_ops": 14928.355000000636, <- This test group is over 4hrs alone
```
----

Hence we separate test_ops into the following parts:
1. TestGradients
2. TestJit
3.  TestCommon and TestMathBits

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74297
Approved by: https://github.com/malfet
2022-03-17 02:07:50 +00:00
PyTorch MergeBot
232faeacf8 Revert "Move test ops gradients and test ops jit to separate files"
This reverts commit 7cf9b942da.

Reverted https://github.com/pytorch/pytorch/pull/74297 on behalf of https://github.com/atalman
2022-03-16 20:08:23 +00:00
atalman
7cf9b942da Move test ops gradients and test ops jit to separate files
Fixes #72368

As per the referenced issue, test_ops in a single file takes around 3.5-4 hours to execute on ASAN jobs:

Reference : pytorch_test_times.json

```
{
    "commit": "39535fec6c3ff5bf7c2d322d096c59571c3295ed",
    "JOB_BASE_NAME": "linux-xenial-py3.7-clang7-asan",
    "job_times": {
        "test_ops": 14928.355000000636, <- This test group is over 4hrs alone
```
----

Hence we separate test_ops into the following parts:
1. TestGradients
2. TestJit
3.  TestCommon and TestMathBits

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74297
Approved by: https://github.com/malfet
2022-03-16 19:30:22 +00:00
Wanchao Liang
8b2ae86f02 [shard] disable rocm and windows for sharding_spec test (#74040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74040

fixes https://github.com/pytorch/pytorch/issues/73552
ghstack-source-id: 151046817

Test Plan: wait for ci

Reviewed By: rohan-varma

Differential Revision: D34792398

fbshipit-source-id: 84d08f01db8375817f48537505e7d988cb39d1f4
(cherry picked from commit 18b21ef0db91ddd22dc57a5b413e3e3ad594bb14)
2022-03-10 20:23:59 +00:00
Alban Desmaison
701fa16eed only run complex autograd tests once
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73210
2022-03-01 23:42:07 +00:00
Alban Desmaison
f275b3f9a1 simplify run_test for distributed tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73209
2022-03-01 23:37:37 +00:00
Alban Desmaison
7e919bd3c6 add dry run option and improve test list printing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73208
2022-02-22 20:45:41 +00:00
Ilya Persky
1b089292df Fix test failure when compiled without LAPACK support (#70671)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70670

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70671

Reviewed By: H-Huang

Differential Revision: D34242339

Pulled By: janeyx99

fbshipit-source-id: 8cd13c13588007c60e9c3f17dbf707dcfa2e0e04
(cherry picked from commit cf6dbe3e81)
2022-02-15 16:38:47 +00:00
wushirong
4d01789f69 Remove fx2trt from oss CI (#72595)
Summary:
Remove fx2trt test from oss CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72595

Test Plan: CI

Reviewed By: houseroad

Differential Revision: D34112595

Pulled By: wushirong

fbshipit-source-id: 02376ef0f25381eff31b72dcbf964c1966af9793
(cherry picked from commit e3d698a942)
2022-02-10 18:49:31 +00:00
Junjie Wang (PyTorch)
88547396eb [PT-D] Enable megatron-lm style MLP layers (Changes mainly on sharded linear op) (#69735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69735

We want to build a prototype of Megatron-LM so that we can apply PT-D ops to models like transformers and other Meta flagship models.

The basic idea of Megatron-LM is as following:
1. Col-wise sharding of linear weight. Perform the linear op for the first layer.
2. Perform a math op (optional), such as ReLU or GeLU. We use GeLU in our example unit test. The input is from step 1.
3. Row-wise sharding of linear weight. Perform the linear op for the second layer. The input is from step 2.

We thereby save the communication needed to concatenate the col-wise sharding results and to spread the input to different ranks for row-wise sharding (a numerical sketch follows the change list below).

The changes are as follows:
1. Return a ShardedTensor for the col-wise sharding in the sharded_linear op.
2. Return a PartialTensor for the row-wise sharding in the sharded_linear op.
3. Leverage APIs already defined for `reshard` to merge/aggregate local results into a fully synced local result if needed.
4. Add helper function to create sharded tensor based on the local result.
5. Add a unit test to test the Megatron-LM idea mentioned above and compare with local ops, including the grad and optimizer so that we can ensure the correctness of the implementation.
6. Refactor the unit test of sharded linear to reflect the changes in the code.
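To make the math concrete, here is a single-process sketch of the col-wise/row-wise split described above, with the two "ranks" simulated by tensor slices (illustrative only; no distributed setup required).

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 8)         # replicated input
w1 = torch.randn(16, 8)       # first linear weight, split col-wise (output features)
w2 = torch.randn(8, 16)       # second linear weight, split row-wise (input features)

w1_shards = w1.chunk(2, dim=0)   # each "rank" holds half of layer 1's output features
w2_shards = w2.chunk(2, dim=1)   # and the matching half of layer 2's input features

partials = [
    F.gelu(x @ w1_r.t()) @ w2_r.t()   # local matmul -> GeLU -> local matmul, no comms
    for w1_r, w2_r in zip(w1_shards, w2_shards)
]
out = sum(partials)                   # the only cross-rank step: one final reduction

ref = F.gelu(x @ w1.t()) @ w2.t()     # unsharded reference computation
assert torch.allclose(out, ref, atol=1e-5)
```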
ghstack-source-id: 148273049

Test Plan: Unit test + CI

Reviewed By: pritamdamania87

Differential Revision: D32978221

fbshipit-source-id: 565fc92e7807e19d53b0261f8ace3945bef69e3e
(cherry picked from commit 344abe7520)
2022-02-03 06:12:15 +00:00
Junjie Wang (PyTorch)
19d0de8a57 [PT-D][RFC] Resharding related API implement for ShardedTensor and Partial Tensor (#70079)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70079

We defined a new concept named `PartialTensor`, which is an abstraction to represent Tensors that need aggregation across multiple devices and multiple processes.

We also defined an API, `reshard_output`, to reshard a `PartialTensor` to a `Tensor` or reshard a `ShardedTensor` to a `ShardedTensor`/`Tensor`. This is done via the class `ModuleResharder`, which acts as a wrapper around the original module plus a reshard in the final step.

The `reshard` logic is defined in each class (`ShardedTensor` and `PartialTensor`).
ghstack-source-id: 148273050

Test Plan: Unit test is in the next PR.

Reviewed By: pritamdamania87

Differential Revision: D33121037

fbshipit-source-id: 5f56617ea526b857c5b73df6e069697d428ec359
(cherry picked from commit 58b1457cbc)
2022-02-03 05:26:02 +00:00
Pritam Damania
64670e414e [reland] Create torch.distributed._shard package. (#72141)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72141

We have many sharding components currently:
torch.distributed._sharded_tensor, torch.distributed._sharding_spec,
torch.distributed._sharded_optimizer and more coming.

As a result, organizing all of this under the `torch.distributed._shard`
package. For BC reasons, I'm still keeping the old packages and have them just
reference the new package.
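In practice, the reorg means both spellings should keep importing (a small illustrative sketch):

```python
# New canonical location after this change:
from torch.distributed._shard import sharded_tensor, sharding_spec

# Old locations are kept for backward compatibility and forward to the new package:
import torch.distributed._sharded_tensor
import torch.distributed._sharding_spec
```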
ghstack-source-id: 148150861

Test Plan: waitforbuildbot

Reviewed By: fduwjj

Differential Revision: D33904585

fbshipit-source-id: 057e847eb7521b536a3ee4e0f94871aacc752062
(cherry picked from commit 29a70dd7af)
2022-02-02 06:58:20 +00:00
Nikita Shulga
34494e6252 Back out "Create torch.distributed.shard package." (#72062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72062

Original commit changeset: dc692b31e260

Original Phabricator Diff: D33755913 (87bbcf70f7)

Test Plan: CI

Reviewed By: pbelevich

Differential Revision: D33891115

fbshipit-source-id: 37286e03d743d8691319f07c95e9561d54f3d6d0
(cherry picked from commit 0c1b3fe008)
2022-01-31 18:29:27 +00:00
Pritam Damania
87bbcf70f7 Create torch.distributed.shard package. (#71742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71742

We have many sharding components currently:
torch.distributed._sharded_tensor, torch.distributed._sharding_spec,
torch.distributed._sharded_optimizer and more coming.

As a result, organizing all of this under the `torch.distributed.shard`
package. For BC reasons, I'm still keeping the old packages and have them just
reference the new package.
ghstack-source-id: 147899768

Test Plan: waitforbuildbot

Reviewed By: fduwjj, wanchaol

Differential Revision: D33755913

fbshipit-source-id: dc692b31e2607063d55dfcb3db33ec53961d5a5b
(cherry picked from commit 5b6885f358)
2022-01-29 00:48:06 +00:00
Shirong Wu
7a08030903 Fix fx2trt CI test trigger condition (#71014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71014

Replace test trigger with test_config matching.

Test Plan:
CI
https://github.com/pytorch/pytorch/runs/4746717568?check_suite_focus=true

Reviewed By: janeyx99

Differential Revision: D33480971

fbshipit-source-id: 9513e464753343a7ae47fcfaf48119f34bae94c5
2022-01-10 13:37:24 -08:00
Rodrigo Kumpera
2378421340 Implement torch.allclose for sharded tensor. (#70331)
Summary:
Implement torch.allclose op for sharded tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70331

Test Plan:
Automated test added.
pritamdamania87
Fixes https://github.com/pytorch/pytorch/issues/67112

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Reviewed By: pritamdamania87

Differential Revision: D33339137

Pulled By: kumpera

fbshipit-source-id: 4263e468eaa117317b190f69877bf3f8bbac5658
2022-01-07 08:37:04 -08:00
Ilya Persky
bc514cb425 Skip distributed tests if built with USE_DISTRIBUTED=0 (#70677)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70676

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70677

Reviewed By: albanD

Differential Revision: D33439808

Pulled By: janeyx99

fbshipit-source-id: 7f9971eb564dbbb6625fe5f78328c3abe3808719
2022-01-06 08:55:05 -08:00
Brian Hirsh
bb5b4cceb6 Revert "Revert D32498569: allow external backend codegen to toggle whether to generate out= and inplace kernels" (#69950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69950

This reverts commit f6cad53443.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33113545

Pulled By: bdhirsh

fbshipit-source-id: d6590294662588d36c09662dea65919ad4e1e288
2022-01-04 14:52:00 -08:00
wushirong
31c7e5d629 Install TensorRT lib on oss docker and enable fx2trt unit test (#70203)
Summary:
CI

Lib installed and unit test run on https://github.com/pytorch/pytorch/actions/runs/1604076060

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70203

Reviewed By: malfet

Differential Revision: D33264641

Pulled By: wushirong

fbshipit-source-id: ba30010bbd06e70d31415d8c52086d1779371bcf
2021-12-22 08:50:48 -08:00
Pritam Damania
0544f975e1 [reland] Support torch.equal for ShardedTensor. (#70145)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70145

Added support for torch.equal to ShardedTensor. This is really
helpful in terms of comparing two ShardedTensors.
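A minimal sketch of the intended comparison, assuming an initialized process group with two GPU ranks; the paths use the later `torch.distributed._shard` layout and the placements are illustrative.

```python
import torch
from torch.distributed._shard import sharded_tensor
from torch.distributed._shard.sharding_spec import ChunkShardingSpec

spec = ChunkShardingSpec(dim=0, placements=["rank:0/cuda:0", "rank:1/cuda:1"])
a = sharded_tensor.ones(spec, 8, 4)
b = sharded_tensor.ones(spec, 8, 4)

assert torch.equal(a, b)   # compares metadata and every local shard across ranks
```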
ghstack-source-id: 146066939

Test Plan: waitforbuildbot

Reviewed By: wanchaol

Differential Revision: D33201714

fbshipit-source-id: 56adfc36e345d512c9901c56c07759bf658c745b
2021-12-21 13:22:52 -08:00
Michael Suo
19f898402d Revert D33241684: [pytorch][PR] Install TensorRT lib on oss docker and enable fx2trt unit test
Test Plan: revert-hammer

Differential Revision:
D33241684 (dab3d3132b)

Original commit changeset: cd498908b00f

Original Phabricator Diff: D33241684 (dab3d3132b)

fbshipit-source-id: d5b2e663b5b0c9e570bd799b9f6111cd2a0de4f7
2021-12-20 23:14:35 -08:00
wushirong
dab3d3132b Install TensorRT lib on oss docker and enable fx2trt unit test (#70203)
Summary:
CI

Lib installed and unit test run on https://github.com/pytorch/pytorch/actions/runs/1604076060

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70203

Reviewed By: janeyx99

Differential Revision: D33241684

Pulled By: wushirong

fbshipit-source-id: cd498908b00f3417bdeb5ede78f5576b3b71087c
2021-12-20 18:51:48 -08:00
Michael Suo
a406a427ae Revert D33004315: Support torch.equal for ShardedTensor.
Test Plan: revert-hammer

Differential Revision:
D33004315 (1c4c81622c)

Original commit changeset: 786fe26baf82

Original Phabricator Diff: D33004315 (1c4c81622c)

fbshipit-source-id: e1dda70fea656834fdf0f2a9f874415f7b460c6e
2021-12-15 14:14:06 -08:00
Pritam Damania
1c4c81622c Support torch.equal for ShardedTensor. (#69734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69734

Added support for `torch.equal` to ShardedTensor. This is really
helpful in terms of comparing two ShardedTensors.

Will implement `allclose` in a follow-up PR.
ghstack-source-id: 145301451

Test Plan: waitforbuildbot

Reviewed By: fduwjj, wanchaol

Differential Revision: D33004315

fbshipit-source-id: 786fe26baf82e1bb4fecfdbfc9ad4b64e704877f
2021-12-15 13:07:36 -08:00
Brian Hirsh
f6cad53443 Revert D32498569: allow external backend codegen to toggle whether to generate out= and inplace kernels
Test Plan: revert-hammer

Differential Revision:
D32498569 (aa0cf68c17)

Original commit changeset: ebd932d042b9

Original Phabricator Diff: D32498569 (aa0cf68c17)

fbshipit-source-id: 21a393fa339510d926512a7983d33ece327b743d
2021-12-14 15:27:24 -08:00
Nikita Shulga
24ee1d13f6 Another attempt to fix version comparison check (#69939)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69939

Reviewed By: atalman

Differential Revision: D33108135

Pulled By: malfet

fbshipit-source-id: cadadfe5b04c4378f149136f8e1f8e8d6266775c
2021-12-14 14:54:15 -08:00
Wanchao Liang
800a457b6f [shard] add ShardedOptimizer (#68607)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68607

This PR added ShardedOptimizer and an API to get module parameters along with ShardedTensor params; it allows users to use this optimizer wrapper to construct an optimizer that involves ShardedTensors.

The state_dict support will come in a follow-up diff.
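A hedged sketch of the wrapper pattern, using the later `torch.distributed._shard` paths; the `named_params_with_sharded_tensor` helper and the constructor signature are taken loosely from this stack and may differ, and the unsharded `nn.Linear` stands in for a module whose parameters have been sharded.

```python
import torch
import torch.nn as nn
from torch.distributed._shard.sharded_optim import (
    ShardedOptimizer,
    named_params_with_sharded_tensor,
)

model = nn.Linear(16, 16)   # in practice, shard (some of) its parameters first

optim = ShardedOptimizer(
    dict(named_params_with_sharded_tensor(model)),  # plain params + ShardedTensor params
    torch.optim.SGD,                                # the local optimizer class to wrap
    lr=0.1,
)

optim.zero_grad()
model(torch.randn(2, 16)).sum().backward()
optim.step()
```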
ghstack-source-id: 145532834

Test Plan: python test_sharded_optim.py

Reviewed By: pritamdamania87

Differential Revision: D32539994

fbshipit-source-id: a3313c6870d1f1817fc3e08dc2fc27dc43bef743
2021-12-14 12:15:20 -08:00
Nikita Shulga
fef9981998 Update run_test.py (#69920)
Summary:
Do not compare LooseVersion against string

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69920

Reviewed By: atalman

Differential Revision: D33101166

Pulled By: malfet

fbshipit-source-id: a2df9e01d17663262718f11e580c8b009764f7b5
2021-12-14 11:26:56 -08:00
Brian Hirsh
aa0cf68c17 allow external backend codegen to toggle whether to generate out= and inplace kernels (#68530)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68530

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32498569

Pulled By: bdhirsh

fbshipit-source-id: ebd932d042b988e19c71aa04a21677db9bdc9f04
2021-12-14 10:25:02 -08:00
Nikita Shulga
07767569c9 Properly import LooseVersion (#69904)
Summary:
This fixes regression introduced by https://github.com/pytorch/pytorch/pull/57040

Somehow importing `distutils` from `setuptools` caused an import of
`distutils.version`, which is not a documented dependency and got
changed with the release of
[setuptools-59.6.0](https://github.com/pypa/setuptools/tree/v59.6.0)
We should not rely on that, as
`import distutils` never re-imports `distutils.version`, which one can
see by observing
https://github.com/python/cpython/blob/3.9/Lib/distutils/__init__.py
or by running:
```
% python3 -c "import distutils;print(distutils.__version__, dir(distutils))"
3.7.5 ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'sys']
% python3 -c "from setuptools import distutils;print(distutils.__version__, dir(distutils))"
3.7.5 ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'archive_util', 'ccompiler', 'cmd', 'config', 'core', 'debug', 'dep_util', 'dir_util', 'dist', 'errors', 'extension', 'fancy_getopt', 'file_util', 'filelist', 'log', 'spawn', 'sys', 'sysconfig', 'util', 'version']
```
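
The robust pattern presumably boils down to importing the submodule explicitly rather than relying on setuptools' side effects (a sketch, not the actual diff):

```
# Import distutils.version explicitly; relying on `from setuptools import distutils`
# to expose `distutils.version` is an undocumented side effect that broke with
# setuptools 59.6.0.
from distutils.version import LooseVersion

print(LooseVersion("1.10.0") < LooseVersion("1.11.0"))  # True
```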

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69904

Reviewed By: albanD, atalman, janeyx99

Differential Revision: D33094453

Pulled By: malfet

fbshipit-source-id: aaf1adb7c6f293c4e376ccff21c64cd6ba625e97
2021-12-14 09:28:19 -08:00
Andrey Talman
77a4b89411 Adding windows cuda 11.5 workflows (#69377)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/69081

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69377

Reviewed By: ngimel

Differential Revision: D33076022

Pulled By: atalman

fbshipit-source-id: aeb2791fc15d7b491976f57a74c1989c6ca61b81
2021-12-13 20:49:02 -08:00
Alban Desmaison
8b20dde932 add python dispatch test back to CI and fix typo in test (#69565)
Summary:
The error message was changed following a PR comment, and since the test doesn't run on CI, I forgot to update the test to catch the new error message.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69565

Reviewed By: mrshenli

Differential Revision: D32932982

Pulled By: albanD

fbshipit-source-id: a1da72b0ca735e72b481bc944039233094f1c422
2021-12-08 08:44:49 -08:00
Rohan Varma
3bd7dbf119 [Dist CI][BE] Remainder of c10d/store tests run in subprocess (#68822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68822

Per title, we switched over c10d_gloo and nccl and the results look good
so far, so switch the rest of them as well. After this, the only dist tests that
won't run in a subprocess are the pipe and fsdp tests, which historically haven't had
much flakiness.
ghstack-source-id: 144213522

Test Plan: CI

Reviewed By: H-Huang

Differential Revision: D32624330

fbshipit-source-id: 469f613e5b0e4529e6b23ef259d948837d4af26b
2021-11-29 10:59:39 -08:00
Rohan Varma
250d0bd20b [RPC][Dist CI][BE] RPC tests run in subprocess (#68821)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68821

Continuing the effort to move most distributed tests to run in a subprocess
for better reproducibility and reduced flakiness.
ghstack-source-id: 144213520

Test Plan: CI

Reviewed By: H-Huang

Differential Revision: D32624199

fbshipit-source-id: 04448636320554d7a3ab29ae92bc1ca9fbe37da2
2021-11-29 10:58:08 -08:00
Nikita Shulga
b5b62b3408 Cleanup old TD logic (#68842)
Summary:
Remove `--determine-from` option from run_test.py and remove all
references from corresponding test scripts

Followup after https://github.com/pytorch/pytorch/pull/64921

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68842

Reviewed By: seemethere, janeyx99

Differential Revision: D32631418

Pulled By: malfet

fbshipit-source-id: bdb5dd888c1d97dfaf95c1f299bf8073f3de9588
2021-11-23 18:45:42 -08:00
Rohan Varma
9554ebe44e [Dist CI][BE] c10d gloo tests run in subprocess (#68504)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68504

Per title
ghstack-source-id: 143928767

Test Plan: CI

Reviewed By: H-Huang

Differential Revision: D32485100

fbshipit-source-id: a55687aea4af69e3830aee6f0278550c72f142c2
2021-11-22 09:54:07 -08:00
Rohan Varma
ddc22ea3b2 [Dist CI][BE] test_c10d_nccl run in subprocess (#68503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68503

Per title
ghstack-source-id: 143928768

Test Plan: CI

Reviewed By: H-Huang

Differential Revision: D32484990

fbshipit-source-id: 6682f46256af0da5153e5087a91a7044156dd17f
2021-11-22 09:52:58 -08:00
Wanchao Liang
fb556c91ce [BE] delete frontend.cpp (#67400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67400

c10d/frontend.cpp was originally proposed to introduce a pure C++ API and use TorchBind to share the Python-level API with TorchScript. This is no longer needed, so delete it to reduce code redundancy.
ghstack-source-id: 143910066

Test Plan: wait for ci

Reviewed By: navahgar

Differential Revision: D31979270

fbshipit-source-id: 6ceb8b53d67ab8f9aef44b34da79346dfbb51225
2021-11-21 23:30:52 -08:00
Rohan Varma
f02efc749a [Dist CI][BE] Run each test in its own process for test_distributed_spawn (#67901)
Summary:
Context: https://github.com/pytorch/pytorch/issues/67061

Use `run_test.py`'s provided flag `"--subprocess"`, passed in like `extra_unittest_args=["--subprocess"]` when running test_distributed_spawn. This will ensure that each test is run separately in its own process. The goal is to more closely simulate how a developer would run a single test when reproducing a CI failure and make reproducibility easier in general.

Also, when a test fails, print out the exact command that was issued so the developer knows how to reproduce it.

For example, when a test fails, it will print out something like the following to the logs:

```
Test exited with non-zero exitcode 1. Command to reproduce: BACKEND=gloo WORLD_SIZE=3 /fsx/users/rvarm1/conda/envs/pytorch/bin/python distributed/test_distributed_spawn.py -v TestDistBackendWithSpawn.test_Backend_enum_class
```

Running test_distributed_spawn still uses the same command as before:

`
python test/run_test.py --verbose -i distributed/test_distributed_spawn
`

as seen in [distributed contributing](https://github.com/pytorch/pytorch/blob/master/torch/distributed/CONTRIBUTING.md) guide.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67901

Reviewed By: cbalioglu, mruberry

Differential Revision: D32225172

Pulled By: rohan-varma

fbshipit-source-id: 7e8d4c7a41858044bd2a4e0d1f0bf8f1ac671d67
2021-11-11 06:11:00 -08:00
Brian Hirsh
7c90bd77ec Test functionalization pass in python (#66101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66101

Updated description:

This PR tests the functionalization pass in python in two ways. For each of the test programs that I have in `test_functionalization.py`, it:
- runs the program with and without functionalization, and asserts the outputs and (potentially mutated) inputs are equal in both cases
- runs the program with `LoggingTensor`, and uses expecttests on the resulting graph. I manually confirm that the graphs look reasonable and only contain functional ops.

Mechanically, the changes include:
- factoring out `LoggingTensor` into a testing util so it can be re-used in multiple tests
- adding some private python api's in the `torch` namespace as hooks that I can use during testing

In the original version of this PR, I also added some fixes to the `_make_subclass()` function in python: allowing you to pass in strides and storage_offset. I kept them in mainly because the changes were already there.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D31942095

Pulled By: bdhirsh

fbshipit-source-id: 90ff4c88d461089704922e779571eee09c21d707
2021-11-09 14:34:05 -08:00
Junjie Wang
2766662ca9 [PyTorch][2/N] Basic implementation of ShardedEmbeddingBag using ShardedTensor. (#67188)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67188

This diff/PR is trying to implement the ShardedEmbeddingBag using the ShardedTensor.

We support both row-wise and column-wise sharding of the embedding bag. The detailed logic can be found in the comment.

Several caveats:
1. Only the sharding of one weight is supported now.
2. We support limited input params for the op; support for more params is on the way.
3. We only support chunk sharding for now.
4. We only support a single local shard per rank for now.

Some other changes include:
1. Refactor the ShardedEmbedding code so that the common logic can be reused.
2. Fix tiny typos and corner cases in the API `get_chunked_dim_size`, which would return -1 if we set dim_size = 5, split_size = 2, idx = 3. (This is a valid case because when chunks = 4 and dim_size = 5, the split_size is 2; see the sketch below.)
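
A worked sketch of that corner case; the clamped formula below is an assumption about the intended behavior, not the actual code:

```
def get_chunked_dim_size(dim_size, split_size, idx):
    # assumed fixed behavior: never return a negative shard size
    return max(min(split_size, dim_size - split_size * idx), 0)

# dim_size=5, split_size=2, chunks=4 -> shard sizes [2, 2, 1, 0];
# the old formula returned -1 for idx=3.
print([get_chunked_dim_size(5, 2, i) for i in range(4)])  # [2, 2, 1, 0]
```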
ghstack-source-id: 142325915

Test Plan: Unit test and CI

Reviewed By: pritamdamania87

Differential Revision: D31749458

fbshipit-source-id: ed77e05e4ec94ef1a01b1feda8bbf32dc5d5da1b
2021-11-03 17:39:18 -07:00
Bo Wang
b6df043f1f Add torch.nn.init.uniform_ operator to ShardedTensor. (#63997)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63997

Use `__torch_function__` to extend torch.nn.init.uniform_.
The init is done in SPMD fashion. Note that ideally we want to aggregate sharded tensors into a global tensor, init it, and reshard. It's fine to run it SPMD since uniform is i.i.d. (independent and identically distributed).
Also enable the unit test in test_linear.py for OSS testing.
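
A hedged SPMD-style sketch; the module path and the `empty` helper are assumptions based on the test paths above, and the placements are illustrative:

```
# Every rank runs the same code and initializes only its local shard.
import torch
import torch.distributed._sharded_tensor as sharded_tensor  # assumed path
from torch.distributed._sharding_spec import ChunkShardingSpec

spec = ChunkShardingSpec(dim=0, placements=["rank:0/cuda:0", "rank:1/cuda:1"])
st = sharded_tensor.empty(spec, 10, 10)

# __torch_function__ on ShardedTensor intercepts this call and applies
# uniform_ to each rank's local shard.
torch.nn.init.uniform_(st, a=0.0, b=1.0)
```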

Test Plan:
a) Unit Test
(pytorch) ... $ python test/distributed/_sharded_tensor/ops/test_init.py TestShardedTensorNNInit --v
(pytorch) ... $ python test/distributed/_sharded_tensor/ops/test_linear.py --v (before runs this command is no-op)

or b) Manual run: Instruction here: https://docs.google.com/document/d/1_m1Hdo5w51-hhPlZ_F8Y6PIWrN7UgJZqiSpARYvhsaE/edit#

Imported from OSS

Reviewed By: pritamdamania87, anjali411

Differential Revision: D30563017

fbshipit-source-id: d1859f7682235bcb44515efc69ca92bc5e34fce1
2021-10-21 00:17:13 -07:00
Junjie Wang
08cb31a03e [PyTorch][1/N] Basic implementation of ShardedEmbedding using ShardedTensor. (#66604)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66604

This diff/PR implements ShardedEmbedding using ShardedTensor.

Several caveats:
1. We support limited input params for the op; support for more params is on the way.
2. We only support chunk sharding for now.
3. We only support a single local shard per rank for now.

ghstack-source-id: 141056130

Test Plan: Unit test and CI

Reviewed By: pritamdamania87

Differential Revision: D31544556

fbshipit-source-id: cc867dcba8c11e6f4c7c3722488908f5108cc67f
2021-10-20 15:16:49 -07:00
Yanli Zhao
61fca037d6 [Part 1] upstreaming fairscale fsdp to PyTorch -- sharding, core data flow and hooks (#63881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63881
This PR includes the minimal set of features to make FSDP work, like sharding, core data flow, and hooks. More tests will be added in follow-up PRs. Tests are refactored to utilize common PyTorch utils. The code is also refactored a little bit. Alternative ways to replace ".data" usage in this PR are still being discussed offline.

Test Plan: unit tests

Reviewed By: mrshenli

Differential Revision: D30521673

fbshipit-source-id: 9a23390dd7c925749604c6860e08fbe39ddc5500
2021-10-07 09:06:44 -07:00
Pritam Damania
0dc98728bc Basic implementation of ShardedLinear using ShardedTensor. (#64128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64128

This PR implements a sharded nn.Linear layer using ShardedTensors with
the following limitations:

1) Works only for ChunkShardingSpec.
2) Implementation is only aimed to demonstrate functionality and is most likely
not performant at all.

The PR also introduces a `shard_parameter` API to easily shard parameters of
`nn.Modules`. This also has the following limitations:

1) Works only for ChunkShardingSpec.
2) Is not performant since it uses broadcast instead of scatter, because
ProcessGroupNCCL doesn't yet support scatter.

Overall user API for running a sharded linear would be something like this:

```
# SPMD programming paradigm running same code on all nodes.
fc = nn.Linear(10, 10)

# Setup sharding.
sharding_spec=ChunkShardingSpec(...)
shard_parameter(fc, 'weight', sharding_spec, src_rank=0)

# Run as a normal linear layer.
inp = torch.rand(10, 10)
output = fc(inp)
```
ghstack-source-id: 138500985

Test Plan:
1) unit tests.
2) waitforbuildbot

Reviewed By: wanchaol, bowangbj

Differential Revision: D30621215

fbshipit-source-id: 1aa7478568c18a4572f6c3462fdf24a4cbde01d6
2021-09-20 18:31:11 -07:00
Nikita Shulga
01cfea9485 Disable target determination for now (#64921)
Summary:
There were several reports of the target determinator incorrectly skipping
tests; the most recent one is https://github.com/pytorch/pytorch/issues/64902

Let's disable it until it can be further stabilized

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64921

Reviewed By: seemethere, janeyx99

Differential Revision: D30901186

Pulled By: malfet

fbshipit-source-id: 531afd2d390c6b51f727330d5dd1882d70b6fdde
2021-09-14 09:40:13 -07:00
Rohan Varma
d067f15622 [Dist CI] Move rest of distributed tests to their own CI job (#64253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64253

Follow up to D30496178 (f4aff3a346) to move the rest of distributed tests to their own jobs for Linux GHA.
ghstack-source-id: 137233785

Test Plan: CI

Reviewed By: walterddr

Differential Revision: D30662999

fbshipit-source-id: f7cfbc0d1223aca52120f17f9da987d70fda8de6
2021-09-01 21:43:41 -07:00
Nikita Shulga
c2da103fe6 Discover new tests in run_tests.py (#64246)
Summary:
Introduce a `discover_tests` function that globs for all Python files
starting with `test_` in the test folder, excluding subfolders which are
executed differently

Fixes https://github.com/pytorch/pytorch/issues/64178
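
A hedged sketch of the globbing idea (simplified; not the actual run_test.py implementation):

```
# Collect top-level test_*.py files under test/; subfolders with their own
# runners are deliberately not globbed here.
from pathlib import Path

def discover_tests(test_dir="test"):
    return sorted(p.stem for p in Path(test_dir).glob("test_*.py"))

print(discover_tests())  # e.g. ['test_autograd', 'test_nn', 'test_torch', ...]
```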

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64246

Reviewed By: walterddr, seemethere

Differential Revision: D30661652

Pulled By: malfet

fbshipit-source-id: a52e78ec717b6846add267579dd8d9ae75326bf9
2021-08-31 17:32:55 -07:00
Richard Zou
0457a85d45 Revert D30543236: Add python mode
Test Plan: revert-hammer

Differential Revision:
D30543236 (4bd03b0242)

Original commit changeset: ef5444d96a5a

fbshipit-source-id: b0042ac2c22765fa11d6d00bf751f6a4489eb6d8
2021-08-31 15:28:33 -07:00
Rohan Varma
1c2b5e59ae Remove ref to test_distributed_fork (#64197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64197

Removes this line as test is gone.
ghstack-source-id: 136986275

Test Plan: CI

Reviewed By: walterddr

Differential Revision: D30642929

fbshipit-source-id: a0c7dfdfb35a4a7f7ec1b881dbea53d85136012c
2021-08-31 13:31:27 -07:00
leslie-fang-intel
09dfaa0339 add operation list for AutocastCPU (#63534)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63534

In this PR:
* We have changed the default dtype of `AutocastCPU` from `float16` to `bfloat16`, as discussed in `https://github.com/pytorch/pytorch/pull/61002` (a short usage sketch follows below).
* We also update the list of operations that need casting to `lower_precision_fp` or `float32`.
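
A hedged sketch of what the new default means in practice, assuming the `torch.cpu.amp.autocast` frontend and that matmul is on the `lower_precision_fp` list:

```
import torch

a = torch.randn(8, 8)
b = torch.randn(8, 8)
with torch.cpu.amp.autocast():   # default dtype is bfloat16 after this change
    c = torch.mm(a, b)
print(c.dtype)                   # expected: torch.bfloat16
```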

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D30644914

Pulled By: ezyang

fbshipit-source-id: 8b93485ba452b3759611e3f0ac88e920fe495ac1
2021-08-30 19:30:33 -07:00
Richard Zou
4bd03b0242 Add python mode (#63496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63496

This PR adds a (private) enable_python_mode context manager.
(see torch/utils/_python_dispatch.py).
enable_python_mode accepts the type of a __torch_dispatch__ object
as its argument. Whenever an operator gets called inside of the
context manager, it dispatches to the __torch_dispatch__ of
the passed-in type.

Example usage:
```
with enable_python_mode(LoggingTensor):
    z = torch.empty([])
    assert isinstance(z, LoggingTensor)
```

There are quite a few changes that were made to support this.

First, we added TorchDispatchTypeObject, a C++ struct that represents the
type of a `__torch_dispatch__` object (e.g. LoggingTensor).
It holds both the PyObject* representing the class and a PyInterpreter*
so we know which Python interpreter it came from.

Next, we updated the concrete_dispatch_fn in python_variable.cpp to accept
a `const std::shared_ptr<TorchDispatchTypeObject>&` argument. When this
is null, dispatching happens as usual. When it is non-null, we prepend
the TorchDispatchTypeObject's PyObject* to the overloaded args list so that
it is considered first for dispatch.

To get that to work, we changed how `handle_torch_dispatch_no_python_arg_parser`
works. The "overloaded args list" previously only consisted of Tensor PyObjects,
but now it can have types in addition to Tensors!
- We renamed `append_overloaded_arg` to `append_overloaded_tensor`
- We added a new `append_overloaded_type` that appends a type to
overloaded_args
- We added special handling in `handle_torch_dispatch_no_python_arg_parser`
and `append_overloaded_arg` to handle types in addition to Tensors.

Then, there is PythonMode and PythonModeTLS.
- We reuse the DispatchKey::Python dispatch key as a mode key
- We use PythonMode::enter and PythonMode::exit to enable/disable
DispatchKey::Python and set the PythonModeTLS.
- PythonModeTLS stores a TorchDispatchTypeObject as metadata.
- PythonMode is in libtorch_python, and PythonModeTLS is in ATen.
This split is due to the libtorch_python library boundary (because we need
to save TLS in ATen/ThreadLocalState)
- We modify the PythonFallbackKernel to look up
the relevant TorchDispatchTypeObject (if Python Mode is active) and
dispatch using it.

There are two more miscellaneous changes:
- internal_new_from_data (torch/csrc/utils/tensor_new.cpp) gets an
exclude guard. enable_python_mode currently does not handle
torch.tensor and the exclude guard is to prevent a bug.

Future:
- This PR does not allow for the nesting of Python modes. In the future we
should be able to enable this with a more sane no_dispatch API and by changing
the TLS to a stack. For now I did not need this for CompositeImplicitAutograd testing.

Test Plan: - new tests

Reviewed By: malfet, albanD

Differential Revision: D30543236

Pulled By: zou3519

fbshipit-source-id: ef5444d96a5a957d1657b7e37dce80f9a497d452
2021-08-30 18:44:35 -07:00
Jane Xu
1354ee417a run_test.py: add option to run only core tests (#63976)
Summary:
This is in response to a feature request from some folks in the core team to have a local command that would only run relevant "core" tests. The idea is to have a local smoke test option for developers to run locally before making a PR in order to verify their changes did not break core functionality. These smoke tests are not targeted to be short but rather relevant.

This PR enables that by allowing developers to run `python test/run_test.py --core` or `python test/run_test.py -core` in order to run the CORE_TEST_LIST, which is currently test_nn.py, test_torch.py, and test_ops.py.

I am not the best person to judge what should be considered "core", so please comment which tests should be included and/or excluded from the CORE_TEST_LIST!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63976

Test Plan:
```
(pytorch) janeyx@janeyx-mbp test % python run_test.py --core -v
Selected tests: test_nn, test_ops, test_torch
Running test_nn ... [2021-08-25 14:48:28.865078]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', 'test_nn.py', '-v'] ... [2021-08-25 14:48:28.865123]
test_to (__main__.PackedSequenceTest) ... ok
test_to_memory_format (__main__.PackedSequenceTest) ... ok
```

Reviewed By: walterddr

Differential Revision: D30575560

Pulled By: janeyx99

fbshipit-source-id: 3f151982c1e315e50e60cb0d818adaea34556a04
2021-08-26 09:29:57 -07:00
driazati
ab5cf5a1eb Move existing target determinator to tools (#63809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63809

This moves out the modulefinder determinator to `tools/testing` since it is supposed to be CI-only. This also simplifies run_test.py a little bit.

Test Plan: Imported from OSS

Reviewed By: malfet, seemethere, janeyx99

Differential Revision: D30497438

Pulled By: driazati

fbshipit-source-id: 1d203037af5af6a20c1e7812da935e7cbb5cd82f
2021-08-25 13:03:53 -07:00
driazati
67d8e7b659 Reformat run_test.py (#63808)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63808

`black run_test.py`

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D30497437

Pulled By: driazati

fbshipit-source-id: 41b29b73f41fa4bb15fce5eaa69f8efe614e02f7
2021-08-25 11:27:18 -07:00
Rong Rong (AI Infra)
f4aff3a346 [BE] add distributed run_test options (#63147)
Summary:
Currently distributed tests are mixed within test_python.
We would like to split the distributed tests into their own batch, thus we need to split them out.

Adding an option to include/exclude distributed tests with CUSTOM_HANDLERS.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63147

Test Plan:
- locally run with the additional run_test.py options.
- CI

Dependency: found a bug in the mpiexec test and need https://github.com/pytorch/pytorch/issues/63580 to fix it first.

Reviewed By: bdhirsh

Differential Revision: D30496178

Pulled By: walterddr

fbshipit-source-id: 7903a57b619f2425028028f944211938823918a6
2021-08-24 08:03:01 -07:00
Pritam Damania
2d671ca41b [8/N] Remove c10d/ddp fork tests. (#63454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63454

Continuation of https://github.com/pytorch/pytorch/pull/63443, this
PR removes all fork tests from torch.distributed.
ghstack-source-id: 136285511

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D30387872

fbshipit-source-id: f6d6313db126ae7b95b86f78a1e0726887c5c513
2021-08-20 12:23:18 -07:00
Jeff Daily
be9be9bfdd add distributed/_sharded_tensor/test_sharded_tensor to ROCM_BLOCKLIST (#63508)
Summary:
Fixes current ROCm CI test2 brokenness until tensorpipe is fully supported by ROCm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63508

Reviewed By: ejguan

Differential Revision: D30406450

Pulled By: walterddr

fbshipit-source-id: c07509271d5d33901f3eaf7ffb916dc3626e1f9a
2021-08-19 07:50:55 -07:00
Eli Uriegas
4982fc4ecf test: Add ability to set CONTINUE_THROUGH_ERROR (#63357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63357

Adds the ability to set CONTINUE_THROUGH_ERROR as an environment
variable so that we can easily set it without having to add the flag
directly

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: astaff

Differential Revision: D30351108

Pulled By: seemethere

fbshipit-source-id: 767fa9bd24e1399f359eb24d16f6cc985a2d7173
2021-08-16 15:35:40 -07:00
Shen Li
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
Zsolt Dollenstein
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
Pritam Damania
91525d42d9 Fix sharded tensor tests. (#63054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63054

1) Ensure these tests are skipped in environments without any GPUs.
2) Add the test to run_test.py
ghstack-source-id: 135595698

Test Plan: waitforbuildbot

Reviewed By: wanchaol

Differential Revision: D30239159

fbshipit-source-id: 21b543ba72e8d10182bc77e7ae1fd34fd4096509
2021-08-11 21:46:45 -07:00
Rohan Varma
39ec1da935 [reland] Gate DistributedOptimizers on RPC availability (#62937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62937

Reland due to a Windows + CUDA failure; fix by running it on gloo on Windows even with CUDA.
ghstack-source-id: 135306176

Test Plan: ci

Reviewed By: mrshenli

Differential Revision: D30177734

fbshipit-source-id: 7625746984c8f858648c1b3632394b98bd4518d2
2021-08-09 14:41:06 -07:00
Natalia Gimelshein
b45cf9b81b Revert D30117838: [WIP] Gate DistributedOptimizers on RPC availability
Test Plan: revert-hammer

Differential Revision:
D30117838 (3f09485d7e)

Original commit changeset: e6365a910a3d

fbshipit-source-id: f276b2b2bdf5f7bd27df473fca0eebaee9f7aef2
2021-08-06 22:10:41 -07:00
Rohan Varma
3f09485d7e [WIP] Gate DistributedOptimizers on RPC availability (#62774)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62774

Gates DistributedOptimizer, which relies on RRef, based on whether RPC is available. This should enable ZeRO to work with Windows, as Windows should not try to import the DistributedOptimizer. If this works as expected we can enable the Windows tests for functional/local SGD optimizers as well.
ghstack-source-id: 135216642

Test Plan: CI

Reviewed By: pbelevich

Differential Revision: D30117838

fbshipit-source-id: e6365a910a3d1ca40d95fa6777a7019c561957db
2021-08-06 10:59:00 -07:00
Joel Schlosser
a0309f89f4 Initial ModuleInfo implementation (#61935)
Summary:
This PR contains the initial version of `ModuleInfo` for use in testing modules. The design philosophy taken here is to start small and simple and build out / refactor as needed when more test coverage or `ModuleInfo` entries are added. As such, it's not intended for general usage yet. The PR contains the following:

* (new file) `torch/testing/_internal/common_modules.py`
  * `ModuleInfo` definition - metadata for each module to use in testing
  * `module_db` - the actual `ModuleInfo` database; currently contains entries for two modules
  * `ModuleInput` - analogous to `SampleInput` from OpInfo; contains `FunctionInput`s for both constructor and forward pass inputs
      * Constructor and forward pass inputs are tied together within a `ModuleInput` because they are likely correlated
  * `FunctionInput` - just contains args and kwargs to pass to a function (is there a nicer way to do this?)
  * `modules` decorator - analogous to `ops`; specifies a set of modules to run a test over
  * Some constants used to keep track of all modules under torch.nn:
      * `MODULE_NAMESPACES` - list of all namespaces containing modules
      * `MODULE_CLASSES` - list of all module class objects
      * `MODULE_CLASS_NAMES` - dict from module class object to nice name (e.g. torch.nn.Linear -> "nn.Linear")
* (new file) `test/test_modules.py`
    * Uses the above to define tests over modules
    * Currently, there is one test for demonstration, `test_forward`, which instantiates a module, runs its forward pass, and compares it to a reference, if one is defined

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61935

Reviewed By: mruberry

Differential Revision: D29881832

Pulled By: jbschlosser

fbshipit-source-id: cc05c7d85f190a3aa42d55d4c8b01847d1efd57f
2021-07-27 07:42:07 -07:00
Rohan Varma
69adb21940 Parity tests for functional optimizer step_param (#61756)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61756

DDP will support running the optimizer as a communication hook with
optimizers that support a per-parameter/gradient step function `step_param`.
Add parity tests as we implement more optimizers that support step_param to
ensure parity with regular optimizers.
ghstack-source-id: 134330378

Test Plan: Ci

Reviewed By: SciPioneer

Differential Revision: D29727549

fbshipit-source-id: 18977c896f12b8e478298488b298fd107affcf5f
2021-07-26 19:03:22 -07:00
Yukio Siraichi
5224490ae9 Implement NumPy-like frombuffer tensor constructor. (#59077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59077

Fixes #58549

`frombuffer` constructs a tensor object from an already allocated buffer through
CPython's buffer protocol. Besides the standard `dtype`, `count`, and `offset` parameters,
this function also accepts:

- `device`: where the buffer lives
- `requires_grad`: should autograd record operations on the new tensor

A new test file _test_buffer_protocol.py_ was created. Currently, only CPU tests were
implemented. That's because neither PyTorch nor Numba implements CPython's buffer
protocol. Therefore, there's no way to create a CUDA buffer with the existing
dependencies (could use PyCUDA for that, though).

At the moment, if `device` differs from the device where the buffer actually lives, two things
may happen:

- `RuntimeError`, if `device='cuda'`
- Segmentation fault (not tested -- see above), if `device='cpu'`
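
A minimal CPU-side usage sketch; the parameter set follows this description (the final API may differ), and the `array` buffer is just an illustrative source:

```
# Build a tensor that shares memory with an object implementing the buffer protocol.
import array

import torch

buf = array.array('f', [1.0, 2.0, 3.0, 4.0])
t = torch.frombuffer(buf, dtype=torch.float32, count=3, offset=4)  # skip 1 float
print(t)        # tensor([2., 3., 4.])
t[0] = 42.0
print(buf[1])   # 42.0 -- the memory is shared, not copied
```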

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29870914

Pulled By: mruberry

fbshipit-source-id: 9fa8611aeffedfe39c9af74558178157a11326bb
2021-07-23 13:17:48 -07:00
Andrew Gu
c2cc6a9396 Add generic join unit tests (#61786)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61786

This adds unit tests for the generic join context manager.

```
gpurun python test/distributed/algorithms/test_join.py
```

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29746646

Pulled By: andwgu

fbshipit-source-id: 2933d85783c2225574c4b77bfb90064690c6e668
2021-07-20 12:13:05 -07:00
Rong Rong (AI Infra)
a5a10fe353 Move all downloading logic out of common_utils.py (#61479)
Summary:
and into tools/ folder

Currently run_test.py invokes tools/test_selections.py to:
1. download and analyze which test files to run
2. download and parse S3 stats and pass the info to local files
3. let common_utils.py use the downloaded S3 stats to determine which test cases to run

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61479

Reviewed By: janeyx99

Differential Revision: D29661986

Pulled By: walterddr

fbshipit-source-id: bebd8c474bcc2444e135bfd2fa4bdd1eefafe595
2021-07-12 11:23:22 -07:00
Rong Rong (AI Infra)
718db968b8 move CI related functions out of run_test.py (#61124)
Summary:
run_test.py currently does lots of downloading and test file/suite/case parsing. It doesn't work well outside of the CI environment.

Restructured run_test.py, created tools/test/test_selections.py, and moved all test selection logic (reordering, categorizing slow tests, creating shards) there.

Follow-up PRs should:
- refactor the file read/write logic entangled inside test_selections.py into the stats/ folder
- restructure and add network-independent test logic to test_test_selections.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61124

Test Plan:
- tools/test
- CI

Related PR:
This follows the refactoring example in: https://github.com/pytorch/pytorch/issues/60373

Reviewed By: malfet

Differential Revision: D29558981

Pulled By: walterddr

fbshipit-source-id: 7f0fd9b4720a918d82918766c002295e8df04169
2021-07-06 09:06:42 -07:00
Zafar
509b1ef9d5 [sparsity] Add sparsity tests to run_test.py (#60887)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60887

Test Plan:
```
./test/run_test.py -i test_ao_sparsity
```

Differential Revision:
D29465834

Reviewed By: mruberry

Pulled By: z-a-f

fbshipit-source-id: 144f940363a20dd65c2bbfe70924c266d8791dc7
2021-07-02 11:11:20 -07:00
Sam Estep
d5a44f9f12 Use expecttest from PyPI (#60658)
Summary:
This PR removes `torch/testing/_internal/expecttest.py` in favor of https://github.com/ezyang/expecttest. See also https://github.com/ezyang/ghstack/pull/71.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60658

Test Plan: CI.

Reviewed By: ezyang

Differential Revision: D29430763

Pulled By: samestep

fbshipit-source-id: b7cdc7ba37330176149fd465312118e2254ae92e
2021-06-28 15:43:34 -07:00
Rong Rong (AI Infra)
7e619b9588 First step to rearrange files in tools folder (#60473)
Summary:
Changes include:
- introduced `linter/`, `testing/`, `stats/` folders in `tools/`
- move appropriate scripts into these folders
- change grepped references in the pytorch/pytorch repo

Next step
- introduce `build/` folder for build scripts

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60473

Test Plan:
- CI (this is important b/c pytorch/test-infra also relies on some script references).
- tools/tests/

Reviewed By: albanD

Differential Revision: D29352716

Pulled By: walterddr

fbshipit-source-id: bad40b5ce130b35dfd9e59b8af34f9025f3285fd
2021-06-24 10:13:58 -07:00
Rong Rong (AI Infra)
40d2fe1053 correct filename issue for test_cpp_extensions_aot (#60604)
Summary:
Using file copy to make actual ninja vs. no_ninja suffixed Python test files.
This is to trick xmlrunner into reporting test cases in the correct folder.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60604

Test Plan:
- CI reports correctly into the corresponding folders
- If download the test statistics, calculate shards now doesn't need custom logic to handle `test_cpp_extensions_aot`

CI result shown it is working properly:
https://github.com/pytorch/pytorch/pull/60604/checks?check_run_id=2900038654 vs
https://github.com/pytorch/pytorch/pull/60604/checks?check_run_id=2900038673

Reviewed By: albanD

Differential Revision: D29349562

Pulled By: walterddr

fbshipit-source-id: e86e6bc0db288a2a57bea3c5f8edf03be1773944
2021-06-24 09:20:19 -07:00
Jane Xu
6385621003 Use JOB_BASE_NAME throughout code--consolidate CIRCLE_JOB (#60425)
Summary:
This PR is a first step in unifying our environment variables across CI (so that we don't have `CIRCLE_BLAH` in our GHA workflows, for example), though I'd like for this PR to be more for discussion about how best to consolidate these variables.

This small change only changes most CIRCLE_JOB references in our code to be JOB_BASE_NAME, as that seems the closest GHA (and ROCm) equivalent. Currently, JOB_BASE_NAME is defined as:
- in Circle: CIRCLE_JOB (name of the job, like `pytorch_linux_bionic_py3_8_gcc9_coverage_test1`)
- in GHA: the build_environment with a `-build` or `-test` tacked to the end , e.g., `pytorch-linux-xenial-cuda10.2-cudnn7-py3.6-gcc7-test`
- in ROCm: I don't actually know, but it's important for ROCm test sharding as shown in https://github.com/pytorch/pytorch/pull/60409

I am not sure if this is the intention for JOB_BASE_NAME so it is open to discussion what variable we should use if not JOB_BASE_NAME. I also don't know if it's worth the effort consolidating all these variables, so discussion is also highly encouraged there!

Next steps:
- Consolidate more CIRCLE_* references, maybe into CI_* equivalents?
- We use BUILD_ENVIRONMENT everywhere in Circle though the variable is inconsistent across binary vs CI jobs and across platforms. For example, for linux tests and builds, BUILD_ENVIRONMENT contains the `_test` and `_build` suffixes, but the windows jobs don't. In GHA, BUILD_ENVIRONMENT is similar to how it's defined in windows jobs on Circle. This inconsistency is confusing, and we can probably do something about it. I'm thinking of switching out BUILD_ENVIRONMENT for JOB_BASE_NAME in our test scripts where we actually mean JOB_BASE_NAME.
- We should probably document the meaning of the variables we consolidate somewhere, preferably in a README in some unified `ci/` folder. For example, it seems BUILD_ENVIRONMENT is supposed to capture the build environment, whereas JOB_BASE_NAME is supposed to capture the environment _and_ whether we're building or testing.

Notes:
- I did not replace CIRCLE_JOB references in third_party directories
- Previously, print_test_stats reported CIRCLE_JOB as only the build environment for GHA workflows, and I think tacking on the `build` or `test` will not harm anything, though I may be wrong.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60425

Reviewed By: seemethere, samestep

Differential Revision: D29333882

Pulled By: janeyx99

fbshipit-source-id: a82080e6205a03a1183035011ce59698eca06748
2021-06-23 11:11:21 -07:00
Howard Huang
ff3678eec2 Disable group group backend rpc tests from running on CI (#60407)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60407

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29278179

Pulled By: H-Huang

fbshipit-source-id: ee78085eeb04d81842c95236b8c3a33de7142a3a
2021-06-23 10:58:31 -07:00
Jane Xu
c63a0d0cfe Adding windows CUDA smoke tests on PRs (#59686)
Summary:
Adding windows CUDA smoke tests on PRs (master should run the full suite).

Next step:
- Automate data update so we get a new smoke test list without manual effort

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59686

Test Plan: https://github.com/pytorch/pytorch/actions/runs/958296267. The sharded smoke tests still take long because of dependency installation.

Reviewed By: walterddr

Differential Revision: D29243533

Pulled By: janeyx99

fbshipit-source-id: dde7ba127fa15c95bda0e833cc5311598fb85e2b
2021-06-23 10:13:50 -07:00
Jane Xu
462448f07a Enable GHA sharding on linux (#60124)
Summary:
This is a branch off of https://github.com/pytorch/pytorch/issues/59970 to only shard on Linux so far (we're running into issues with Windows gflags).

This would enable sharding of tests on a few Linux jobs on GHA, allowing tts to be essentially halved.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60124

Reviewed By: zou3519

Differential Revision: D29204211

Pulled By: janeyx99

fbshipit-source-id: 1cc31d1eccd564d96e2aef14c0acae96a3f0fcd0
2021-06-17 13:00:23 -07:00
Rong Rong (AI Infra)
b2fc6de2c4 support parsing of PR stats in run_test.py (#60026)
Summary:
Currently the S3 test stats don't support PR stats parsing.

Changes to s3_stats_parser:
1. They are uploaded to `test_times/{sha1}/{job}` and `pr_test_times/{pr}/{sha1}/{job}` separately, thus we need parsing logic for both.
2. We need to attach a timestamp for PR stats parsing for ordering, since PR commits can be force-pushed.

Changes to run_test.py
1. Reordering based on previous PR stats if available
2. Falling back to file change option if not enabled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60026

Test Plan:
- CI.
- local repro: plz run:
```
CIRCLE_JOB="pytorch_linux_bionic_py3_6_clang9_noarch_test" CIRCLE_PR_NUMBER=60057 IN_CI=1 ENABLE_PR_HISTORY_REORDERING=1 python test/run_test.py
```

Reviewed By: samestep

Differential Revision: D29164754

Pulled By: walterddr

fbshipit-source-id: 206688e0fb0b78d1c9042c07243da1fbf88a924b
2021-06-16 13:32:31 -07:00
Jane Xu
d88fbf0fbc fix minor typo in run_test.py (#60055)
Summary:
Fixes typo in run_test.py for option use_specified_test_cases_by

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60055

Reviewed By: walterddr

Differential Revision: D29150156

Pulled By: janeyx99

fbshipit-source-id: 375e594d09c83188bfa80762c8b833a0b7c5cca4
2021-06-16 09:30:45 -07:00
Rohan Varma
c2098487e8 [c10d] Move pg wrapper tests to their own file. (#59840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59840

moving these tests to their own standalone file. No meaningful code changes.
ghstack-source-id: 131359162

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D29012664

fbshipit-source-id: 348870016509a6ed7e69240fa82bccef4a12d674
2021-06-14 15:05:55 -07:00
Rong Rong
e41bc31eb2 make --run-specified-test-case use --include (#59704)
Summary:
Instead of having specific logic to handle run-specific-test-case, we provide the flag to override include or bring-to-front with the SPECIFIED_TEST_CASES_FILE.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59704

Reviewed By: janeyx99

Differential Revision: D29038425

Pulled By: walterddr

fbshipit-source-id: 803d3555813437c7f287a22f7704106b0c609919
2021-06-11 13:57:13 -07:00
Jane Xu
9bb5663979 Use commit stats from viable/strict instead of nightlies for sharding (#59727)
Summary:
Currently, not all of CI runs on nightlies, so it's better to use viable/strict.

For example, current 11.1 test jobs do not get to use automatic sharding because of the lack of stats: https://app.circleci.com/jobs/github/pytorch/pytorch/14010983?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59727

Reviewed By: heitorschueroff

Differential Revision: D29004910

Pulled By: janeyx99

fbshipit-source-id: eb0c54a7e7947decba8134a1d67e4b0434151a06
2021-06-09 13:52:15 -07:00
Jane Xu
97dfc7e300 [Reland] Adding run specified tests option to run_test.py (#59649)
Summary:
Reland of https://github.com/pytorch/pytorch/issues/59487

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59649

Reviewed By: samestep

Differential Revision: D28970751

Pulled By: janeyx99

fbshipit-source-id: 6e28d4dcfdab8a49da4b6a02c57516b08bacd7b5
2021-06-08 16:04:46 -07:00
Alban Desmaison
5d6a10a765 Revert D28913223: [pytorch][PR] Adding run-specified-test-cases option in run_test.py
Test Plan: revert-hammer

Differential Revision:
D28913223 (24432eaa29)

Original commit changeset: 0d1f99109734

fbshipit-source-id: 47c073720cff23a5d4cb64556381c46025e90937
2021-06-08 02:18:16 -07:00
Rong Rong (AI Infra)
57d8bccd00 only reorder tests based on git diff if IN_CI (#59565)
Summary:
Do not reorder tests unless IN_CI is set; the reordering makes local development test ordering nondeterministic. Most of us branch out from viable/strict, not the head of master.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59565

Reviewed By: ejguan

Differential Revision: D28943906

Pulled By: walterddr

fbshipit-source-id: e742e7ce4b3fc017d7563b01e93c4cd774d0a537
2021-06-07 17:54:19 -07:00
Jane Xu
24432eaa29 Adding run-specified-test-cases option in run_test.py (#59487)
Summary:
The run-specified-test-cases option would allow us to specify a list of test cases to run by having a CSV with minimally two columns: test_filename and test_case_name.

This PR also adds .json to some files we use for better clarity.

Usage:
`python test/run_test.py --run-specified-test-cases <csv_file>` where the csv file can look like:
```
test_filename,test_case_name,test_total_time,windows_only_failure_sha_count,total_sha_count,windows_failure_count,linux_failure_count,windows_total_count,linux_total_count
test_cuda,test_cudnn_multiple_threads_same_device,8068.8409659525,46,3768,53,0,2181,6750
test_utils,test_load_standalone,8308.8062920459,14,4630,65,0,2718,8729
test_ops,test_forward_mode_AD_acosh_cuda_complex128,91.652619369806,11,1971,26,1,1197,3825
test_ops,test_forward_mode_AD_acos_cuda_complex128,91.825633094915,11,1971,26,1,1197,3825
test_profiler,test_source,60.93786725749,9,4656,21,3,2742,8805
test_profiler,test_profiler_tracing,203.09352795241,9,4662,21,3,2737,8807
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59487

Test Plan:
Without specifying the option, everything should be as they were before.

Running `python test/run_test.py --run-specified-test-cases windows_smoke_tests.csv` resulted in this paste P420276949 (you can see internally). A snippet looks like:
```
(pytorch) janeyx@janeyx-mbp pytorch % python test/run_test.py --run-specified-test-cases windows_smoke_tests.csv
Loading specified test cases to run from windows_smoke_tests.csv.
Processed 28 test cases.
Running test_cpp_extensions_jit ... [2021-06-04 17:24:41.213644]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', 'test_cpp_extensions_jit.py', '-k', 'test_jit_cuda_archflags'] ... [2021-06-04 17:24:41.213781]
s
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK (skipped=1)
...
```
With pytest, an example executable would be:
`Running test_dataloader ... [2021-06-04 17:37:57.643039]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', '-m', 'pytest', 'test_dataloader.py', '-v', '-k', 'test_segfault or test_timeout'] ... [2021-06-04 17:37:57.643327]`

Reviewed By: samestep

Differential Revision: D28913223

Pulled By: janeyx99

fbshipit-source-id: 0d1f9910973426b8756815c697b483160517b127
2021-06-07 16:27:43 -07:00
Jane Xu
caf76c2445 Move sharding to after all tests have been excluded (#59583)
Summary:
It would be most accurate if sharding occurred after all other changes to selected_tests were complete.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59583

Reviewed By: ejguan

Differential Revision: D28944737

Pulled By: janeyx99

fbshipit-source-id: a851473948a5ec942ffeeedeefdc645536a3d9f7
2021-06-07 15:04:36 -07:00
Mike Ruberry
de40c8e495 Adds remaining OpInfos and removes redundant test generators (#55558)
Summary:
Per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55558

Reviewed By: ngimel

Differential Revision: D28922522

Pulled By: mruberry

fbshipit-source-id: 89cefd93788bc8aa0683f4583cf5caa81aa2dc93
2021-06-06 14:52:26 -07:00
Andrew Gu
2ad4b8e58c Extract c10d Store tests to dedicated test file (#59271)
Summary:
Partially addresses https://github.com/pytorch/pytorch/issues/55340

**Overview**
This factors out `FileStoreTest`, `HashStoreTest`, `PrefixFileStoreTest`, `TCPStoreTest`, `PrefixTCPStoreTest`, `PythonStoreTest`, `RendezvousTest`, `RendezvousEnvTest`, `RendezvousFileTest`, and `RendezvousTCPTest` from `test_c10d_common.py` to a new file `test_store.py`.

Additionally, unused import/initialization statements are removed from `test_c10d_common.py`, and the minimal set of import/initialization statements are used for `test_store.py`.

Also, this changes `.jenkins/pytorch/multigpu-test.sh`, `.jenkins/pytorch/win-test-helpers/test_distributed.bat`, and `test/run_test.py` to include the new `test_store.py`.

**Testing**
All commands shown are run on an AI AWS cluster.

I check the Store tests:
```
python test/distributed/test_store.py
```

I also check `test_c10d_common.py` since it is the source of the refactored code. In addition, I check `test_c10d_nccl.py` and `test_c10d_gloo.py` since they import from `test_c10d_common.py`; those two should be the only test files depending on `test_c10d_common.py`.
```
python test/distributed/test_c10d_common.py
python test/distributed/test_c10d_nccl.py
python test/distributed/test_c10d_gloo.py
```
`test_c10d_gloo.py` produces warnings about how using sparse tensors in TorchScript is experimental, but the warnings do not result from this PR's changes.

**Testing Issues** (To Be Revisited)
```
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py
```
Running the above command fails three tests (written as `[Test]`: `[Error]`):
- `ProcessGroupGlooWrapperTest.test_collective_hang`: `RuntimeError: [../third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [10.200.24.101]:15580`
- `CommTest.test_broadcast_coalesced_gloo_cuda`: `RuntimeError: cuda runtime error (3) : initialization error at ../aten/src/THC/THCGeneral.cpp:54`
- `CommTest.test_sequence_num_incremented_gloo_default`: `RuntimeError: cuda runtime error (3) : initialization error at ../aten/src/THC/THCGeneral.cpp:54`
However, running each of the following yields no errors:
```
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py -k test_collective_hang
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py -k test_broadcast_coalesced_gloo_cuda
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py -k test_sequence_num_incremented_gloo_default
```
This suggests the existence of some inadvertent state dependency between tests (e.g. improper cleanup). I have not explored this further yet. In particular, I do not have a solid understanding of the tests to be able to explain why using `pytest` and `gpurun` induces the failure (since notably, running the `.py` directly shows no issue).

Similarly, running the following yields 47 errors:
```
WORLD_SIZE=4 BACKEND=nccl gpurun pytest test/distributed/test_c10d_nccl.py
```
The errors seem to all be simply complaining about the usage of `fork()` instead of `spawn()` for CUDA multiprocessing. Though, most of the tests in `test_c10d_nccl.py` ask for at least 2 CUDA devices, so I think that the `gpurun` is warranted (assuming that the test file does not need to be run partially on different machines).

Both `test_c10d_common.py` and `test_store.py` work fine with `pytest`.

**Other Notes**
I noticed that `torch.distributed` is imported both as `dist` and as `c10d` and that `c10d` is used throughout the Store tests. I was curious if this is intentional (as opposed to using `dist` to refer to `torch.distributed`). Also, the original [issue](https://github.com/pytorch/pytorch/issues/55340) suggests that the Store tests do not use multiprocessing, but I saw that `torch.multiprocessing` is still used in `TCPStoreTest`.

The links for the Store files in the `CONTRIBUTING.md` [file](https://github.com/pytorch/pytorch/blob/master/torch/distributed/CONTRIBUTING.md) are broken. This can fixed in a separate PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59271

Reviewed By: jbschlosser, mrshenli

Differential Revision: D28856920

Pulled By: andwgu

fbshipit-source-id: 630950cba18d34e6b5de661f5a748f2cddc1b446
2021-06-03 10:53:33 -07:00
Pritam Damania
0d6fa1adc5 Introduce ChunkShardingSpec as a model sharding specification. (#55728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55728

Full design: https://github.com/pytorch/pytorch/issues/55207

This PR introduces ChunkShardingSpec (SingleShardingSpec in the design). Used
the name ChunkShardingSpec since it is very similar to `torch.chunk` in terms
of how a Tensor is split up, and it feels clearer compared to SingleShardingSpec.
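
A hedged construction sketch; the import path is an assumption for this point in the history, and the placement strings follow the ShardedLinear example elsewhere in this log:

```
from torch.distributed._sharding_spec import ChunkShardingSpec  # assumed path

spec = ChunkShardingSpec(
    dim=0,                      # the tensor dimension to shard
    placements=[
        "rank:0/cuda:0",
        "rank:1/cuda:1",
        "rank:2/cuda:2",
        "rank:3/cuda:3",
    ],
)
# A tensor with 5 rows sharded with this spec gets row chunks of sizes
# [2, 2, 1, 0] on ranks 0..3, analogous to how torch.chunk splits a dimension.
```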
ghstack-source-id: 129603318

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D27694108

fbshipit-source-id: c8764abe6a4d5fc56d023fda29b74b5af2a73b49
2021-05-23 16:04:57 -07:00
Rong Rong (AI Infra)
a70020465b adding test_sparse_csr to run_test (#58666)
Summary:
fixes https://github.com/pytorch/pytorch/issues/58632.

Added several skips that relate to test asserts and MKL. Will address them in a separate PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58666

Reviewed By: seemethere, janeyx99

Differential Revision: D28607966

Pulled By: walterddr

fbshipit-source-id: 066d4afce2672e4026334528233e69f68da04965
2021-05-22 13:17:46 -07:00
Sam Estep
2e26976ad3 Disallow versionless Python shebangs (#58275)
Summary:
Some machines don't have a versionless `python` on their PATH, which breaks these existing shebangs.

I'm assuming that all the existing versionless `python` shebangs are meant to be `python3` and not `python2`; please let me know if my assumption was incorrect for any of these.
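
A trivial sketch of the preferred form (not taken from the PR):

```
#!/usr/bin/env python3
# A versioned shebang resolves even on machines that expose only `python3`,
# not a bare `python`, on PATH.
print("running under python3")
```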

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58275

Test Plan: CI.

Reviewed By: zhouzhuojie

Differential Revision: D28428143

Pulled By: samestep

fbshipit-source-id: 6562be3d12924db72a92a0207b060ef740f61ebf
2021-05-14 08:26:02 -07:00
Nikita Shulga
b587354e4c Add Python-3.9 CI testing (#50992)
Summary:
Skip a number of tests and adjust typing handling.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50992

Reviewed By: walterddr

Differential Revision: D26170388

Pulled By: malfet

fbshipit-source-id: 47852512aa3d5c25faf6687bcd0b1cbb332b0b20
2021-05-10 10:51:39 -07:00
Aliaksandr Ivanou
7fe4c1d0e7 Torchelastic: add multiprocessing tests to ci/cd (#56842)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56842

Add elastic multiprocessing test to ci/cd

Test Plan: buck test mode/opt-tsan //caffe2/test/distributed/elastic/multiprocessing/... -- --run-disabled

Reviewed By: wilson100hong

Differential Revision: D27982226

fbshipit-source-id: 1b4e6f1a20867a6aa7ca409e280fdb04e8db198b
2021-05-02 14:03:47 -07:00
Aliaksandr Ivanou
5c8ceefe46 Pytorch add agent api tests (#56985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56985

Pytorch add agent api tests

Test Plan: ci/cd

Reviewed By: cbalioglu

Differential Revision: D28020485

fbshipit-source-id: e6acf095f26ce4b99cddfbf7641fb4fa885b0c86
2021-04-29 06:14:39 -07:00
Aliaksandr Ivanou
6ff0002b12 Pytorch: enable many torchelastic tests (#56970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56970

The diff enables metrics, events, utils, and timer tests on the CI/CD pipeline.

Test Plan: ci/cd

Reviewed By: cbalioglu

Differential Revision: D28015200

fbshipit-source-id: 6b419aaf9e62a10a747b6511bff90c82cfb7bcd6
2021-04-28 17:05:09 -07:00
David Reiss
89377e3e45 model_dump tool for model inspection (#56868)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56868

See __init__.py for a summary of the tool.
The following sections are present in this initial version
- Model Size.  Show the total model size, as well as a breakdown by
  stored files, compressed files, and zip overhead.  (I expect this
  breakdown to be a bit more useful once data.pkl is compressed.)
- Model Structure.  This is basically the output of
  `show_pickle(data.pkl)`, but as a hierarchical structure.
  Some structures cause this view to crash right now, but it can be
  improved incrementally.
- Zip Contents.  This is basically the output of `zipinfo -l`.
- Code.  This is the TorchScript code.  It's integrated with a blame
  window at the bottom, so you can click "Blame Code", then click a bit
  of code to see where it came from (based on the debug_pkl).  This
  currently doesn't render properly if debug_pkl is missing or
  incomplete.
- Extra files (JSON).  JSON dumps of each json file under /extra/, up to
  a size limit.
- Extra Pickles.  For each .pkl file in the model, we safely unpickle it
  with `show_pickle`, then render it with `pprint` and include it here
  if the size is not too large.  We aren't able to install the pprint
  hack that the show_pickle CLI uses, so we get one-line rendering for
  custom objects, which is not very useful.  Built-in types look fine,
  though.  In particular, bytecode.pkl seems to look fine (and we
  hard-code that file to ignore the size limit).

I'm checking in the JS dependencies to avoid a network dependency at
runtime.  They were retrieved from the following URLs, then passed
through a JS minifier:
  https://unpkg.com/htm@3.0.4/dist/htm.module.js?module
  https://unpkg.com/preact@10.5.13/dist/preact.module.js?module

Test Plan:
Manually ran on a few models I had lying around.
Mostly tested in Chrome, but I also poked around in Firefox.

Reviewed By: dhruvbird

Differential Revision: D28020849

Pulled By: dreiss

fbshipit-source-id: 421c30ed7ca55244e9fda1a03b8aab830466536d
2021-04-28 07:33:10 -07:00
Philip Meier
759cfb7495 add missing comma to run_test.py (#57010)
Summary:
Factored out from https://github.com/pytorch/pytorch/pull/57008#discussion_r621137121:

> Without this comma, the strings are concatenated to `test_binary_ufuncstest_numpy_interop`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57010

Reviewed By: malfet

Differential Revision: D28028061

Pulled By: walterddr

fbshipit-source-id: 97c64b79a6aaaf0242def03c8808c1a032537258
2021-04-27 08:00:13 -07:00
Joel Schlosser
febff45900 Support factory kwargs in torch.nn modules (#54508)
Summary:
Continuation of https://github.com/pytorch/pytorch/pull/53144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54508

Reviewed By: albanD

Differential Revision: D27939544

Pulled By: jbschlosser

fbshipit-source-id: 4bf517e5f74f093e27ca38a85e732da65e44d805
2021-04-22 16:16:53 -07:00
driazati
187a524249 Re-order tests based on changed files (#56666)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56666

Addresses some of #56557 by checking for changed files when running tests. This will help deliver signal faster when a failing test is run. It should always be safe to at least try to re-order the tests, so there's no option to turn it off, and any error ends up bailing out of the sorting process. Time saved will change between tests, with more improvement for things that are further down the static list here:

1e9c7ad4cb/test/run_test.py (L32)

The results vary from not much improvement ([before: 11m](https://app.circleci.com/pipelines/github/pytorch/pytorch/307580/workflows/6ab3def6-8d63-4f41-9b8d-9c2c50f6266b/jobs/12712819/steps), [after: 10m](https://app.circleci.com/pipelines/github/pytorch/pytorch/307578/workflows/157407b4-f850-431c-b641-d2ac97916a04/jobs/12712802/steps)) to a lot ([before: 75m](https://app.circleci.com/pipelines/github/pytorch/pytorch/307580/workflows/6ab3def6-8d63-4f41-9b8d-9c2c50f6266b/jobs/12712884/steps), [after: 8m](https://app.circleci.com/pipelines/github/pytorch/pytorch/307578/workflows/157407b4-f850-431c-b641-d2ac97916a04/jobs/12712865/steps)), but overall there shouldn't be any regression in test timing. These results are also probably a little confounded since the test sharding will be different after re-ordering.

As a follow up we can use the target determination logic to figure out which tests to bring to front based on the actual code instead of just edits to test files
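
A hedged sketch of the re-ordering idea (function and variable names are illustrative, not the actual run_test.py code):

```python
def reorder_tests(tests, changed_files):
    # Bring tests whose source files were touched by the PR to the front; any
    # surprise would simply fall back to the original order.
    changed = {
        f[len("test/"):-len(".py")]
        for f in changed_files
        if f.startswith("test/") and f.endswith(".py")
    }
    prioritized = [t for t in tests if t in changed]
    rest = [t for t in tests if t not in changed]
    return prioritized + rest

print(reorder_tests(["test_a", "test_b", "test_c"], ["test/test_c.py"]))
# ['test_c', 'test_a', 'test_b']
```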

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D27934076

Pulled By: driazati

fbshipit-source-id: 747d09ad732289d7693101803d46e9fa8e6d2f59
2021-04-22 10:27:07 -07:00
Pavel Belevich
426852b4f0 Split test_c10d_spawn.py to test_c10d_spawn_gloo.py,test_c10d_spawn_nccl.py (#56599)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56599

Test Plan: NA

Reviewed By: SciPioneer

Differential Revision: D27913955

fbshipit-source-id: 7206e589fb7d08c55d08a58a3d57dc3d210a795e
2021-04-21 22:11:49 -07:00
Pavel Belevich
5cc75e46fa Split test_c10d.py to test_c10d_common.py, test_c10d_gloo.py, test_c10d_nccl.py (#56598)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56598

Test Plan: NA

Reviewed By: SciPioneer

Differential Revision: D27913170

fbshipit-source-id: 3439d18141131b02d55f2ca399a4c795cba2b04b
2021-04-21 22:10:41 -07:00
Joel Schlosser
12b2bc94d7 Revert D27909732: [pytorch][PR] Support factory kwargs in torch.nn modules
Test Plan: revert-hammer

Differential Revision:
D27909732 (5a09def9b0)

Original commit changeset: d8684b2403ab

fbshipit-source-id: d00d69fae4fa4ed58d9e97e70b27a06a0dcb39e4
2021-04-21 13:44:03 -07:00
Joel Schlosser
5a09def9b0 Support factory kwargs in torch.nn modules (#54508)
Summary:
Continuation of https://github.com/pytorch/pytorch/pull/53144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54508

Reviewed By: malfet

Differential Revision: D27909732

Pulled By: jbschlosser

fbshipit-source-id: d8684b2403ab7eb336371d118799146a2520bd76
2021-04-21 13:20:11 -07:00
Aliaksandr Ivanou
c5c5230890 Pytorch resolve bug around incorrect rdzv handler resolution (#56386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56386

The diff resolves a bug around incorrect handler resolution:
_create_static_handler pointed towards etcd, and _create_etcd_handler pointed towards static.
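
A minimal sketch of the kind of mix-up being fixed (illustrative wiring only, not the torchelastic registry code):

```python
def _create_static_handler(params):
    return "static handler"

def _create_etcd_handler(params):
    return "etcd handler"

# Buggy wiring: each key points at the other backend's factory.
handlers_buggy = {"static": _create_etcd_handler, "etcd": _create_static_handler}

# Fixed wiring: keys and factories agree.
handlers_fixed = {"static": _create_static_handler, "etcd": _create_etcd_handler}

print(handlers_fixed["static"](None))  # 'static handler'
```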

Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed:test_launcher

Added test_launcher to the ci/cd tests

Reviewed By: cbalioglu

Differential Revision: D27858897

fbshipit-source-id: 440155789958c091ce5755e7c9524e4bb704203a
2021-04-19 23:50:28 -07:00
Natalia Gimelshein
92d24e3060 Revert D27855386: [pytorch][PR] Support factory kwargs in torch.nn modules
Test Plan: revert-hammer

Differential Revision:
D27855386 (40483acc51)

Original commit changeset: dabd505d2a04

fbshipit-source-id: f5bf3120d87861b30a8e1bf11977ad7d27cd8500
2021-04-19 20:07:20 -07:00
Joel Schlosser
40483acc51 Support factory kwargs in torch.nn modules (#54508)
Summary:
Continuation of https://github.com/pytorch/pytorch/pull/53144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54508

Reviewed By: bdhirsh

Differential Revision: D27855386

Pulled By: jbschlosser

fbshipit-source-id: dabd505d2a04208e74b158570fb2859c736eea2c
2021-04-19 12:24:58 -07:00
Sam Estep
d05e7c163f Revert D27600457: [pytorch][PR] Support factory kwargs in torch.nn modules
Test Plan: revert-hammer

Differential Revision:
D27600457 (1077f87269)

Original commit changeset: b58bfee61c39

fbshipit-source-id: 19d5bfc5133a3880383731d0332503ca1f3bce0c
2021-04-19 07:47:24 -07:00
Joel Schlosser
1077f87269 Support factory kwargs in torch.nn modules (#54508)
Summary:
Continuation of https://github.com/pytorch/pytorch/pull/53144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54508

Reviewed By: mrshenli

Differential Revision: D27600457

Pulled By: jbschlosser

fbshipit-source-id: b58bfee61c3917524b4622f63ef216c27a588eb1
2021-04-19 06:58:40 -07:00
Sam Estep
1e9c7ad4cb Add a test to measure import torch time (#56041)
Summary:
This PR adds a couple very simple tests which (as the code comment says) measure the time it takes to `import torch` and ask for the CUDA device count.
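
A hedged sketch of the measurement idea (not the test's exact code): time a fresh interpreter doing the import so earlier imports in the test process don't skew the result.

```python
import subprocess
import sys
import time

start = time.perf_counter()
subprocess.run([sys.executable, "-c", "import torch"], check=True)
print(f"import torch took {time.perf_counter() - start:.2f}s")
```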

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56041

Test Plan:
```
$ rm -r /tmp/reports ; python3 test/test_import_time.py --save-xml=/tmp/reports

Running tests...
----------------------------------------------------------------------
..
----------------------------------------------------------------------
Ran 2 tests in 1.855s

OK

Generating XML reports...
```
```
$ tools/print_test_stats.py /tmp/reports
No scribe access token provided, skip sending report!
class TestImportTime:
    tests: 2 failed: 0 skipped: 0 errored: 0
    run_time: 1.85 seconds
    avg_time: 0.93 seconds
    median_time: 0.93 seconds
    2 longest tests:
        test_time_cuda_device_count time: 1.10 seconds
        test_time_import_torch time: 0.75 seconds

Total runtime is 0:00:01
2 longest tests of entire run:
    TestImportTime.test_time_cuda_device_count  time: 1.10 seconds
    TestImportTime.test_time_import_torch  time: 0.75 seconds
```

Reviewed By: driazati

Differential Revision: D27770908

Pulled By: samestep

fbshipit-source-id: 01bbf5a339f41d3a1f493e6fa8c946ff7567daec
2021-04-15 00:53:30 -07:00
Edward Yang
bc86358cf5 Make run_test.py work even if s3_stat_parser fails to import (#56039)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56039

Python will try to eagerly resolve the name references even if
the import failed.  Quote them so that it doesn't.
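
A generic illustration of the failure mode and the fix (module and class names are hypothetical):

```python
# Unquoted annotations are evaluated when the function is defined, so a failed
# optional import leaves the name undefined and the module breaks on import.
try:
    from some_optional_module import Report  # may raise ImportError
except ImportError:
    pass

def summarize(report: "Report") -> None:  # quoted: resolved lazily, safe either way
    print(report)
```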

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D27770536

Pulled By: ezyang

fbshipit-source-id: b111739289498f9bab856fb9424f3080efee4ee0
2021-04-14 13:21:50 -07:00
Luca Wehrstedt
3f8d476857 Split out CUDA RPC tests (#55695)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55695

In order to be able to run CUDA tests on their own (e.g., to avoid running CPU tests on GPU machines).

Done by moving test methods to a separate class (and sometimes introducing a "common" base class for utils), and then providing new entry points inside a `cuda/` subdirectory.

Test Plan: Checked they are run on Sandcastle.

Reviewed By: mrshenli

Differential Revision: D27618198

fbshipit-source-id: 8f671657f79c8ae115748ab7752fe0066705893b
2021-04-12 07:48:08 -07:00
Rong Rong (AI Infra)
55db156229 remove test_jit_py3.py entirely (#55560)
Summary:
1. move module related stuff to test_module_container
2. created test_types for types and annotation
3. created test_misc for the rest

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55560

Reviewed By: VitalyFedyunin

Differential Revision: D27650911

Pulled By: walterddr

fbshipit-source-id: d895a7da9e9c3d25a662a37faf4daabc276b9c1a
2021-04-08 14:28:54 -07:00
Erjia Guan
f9a0bbbeb8 [DataPipe] Remove duplicate dataset (#54553)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54553

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D27279301

Pulled By: ejguan

fbshipit-source-id: 112a83e7061e3f35dc517eb623bd9ca93c2f034c
2021-04-07 10:11:22 -07:00
Jane Xu
bf37bf7da4 Make JSON files more human readable (#55335)
Summary:
Prettifies JSON files .pytorch-test-times and .pytorch-slow-tests so that not everything is on one single line.

This is of slightly more importance as the generated .pytorch-slow-tests ends up getting stored in our test-infra repo ([example](ad9cd87565)), and it is nice not to have that little red symbol at the end.
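
A minimal sketch of the change (the stats values are made up): pass an indent to json.dump so the file spans multiple lines.

```python
import json

stats = {"commit": "abc123", "job_times": {"test_nn": 2138.0, "test_torch": 536.0}}
with open(".pytorch-test-times", "w") as f:
    json.dump(stats, f, indent=2)  # pretty-printed instead of one long line
```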

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55335

Reviewed By: samestep

Differential Revision: D27576930

Pulled By: janeyx99

fbshipit-source-id: be58565b8c8593a9bfcfab383ee19facc79f0572
2021-04-05 17:23:36 -07:00
Jane Xu
717e70a824 (BE) Refactor get-test-times-from-S3 into s3_stat_parser (#54808)
Summary:
Moves more S3 parsing code to s3_stat_parser.py. This is another step toward properly modularizing the parsing code. I will also be using this exact function in future slowTest code.

Also replaces some Any's in the code with Report.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54808

Test Plan:
The .pytorch-test-times file generated before and after this change is the same.
CI should pass, specifically the test tools GHA.

Reviewed By: walterddr

Differential Revision: D27375783

Pulled By: janeyx99

fbshipit-source-id: bec28551668b2eb3fdd60d802200993e493eac83
2021-03-29 08:45:22 -07:00
Rong Rong (AI Infra)
d4045e9aa1 initial commit to refactor all s3 access codes to s3_stats_parser (#54681)
Summary:
First step to move all S3-related operations into the S3 parser utils.
In the end we provide APIs from s3_stats_parser:
1. downloading data as reports and uploading data as reports
2. filtering by job name

and handle all compression and formatting inside.

TODO
- [ ] Refactor out upload into s3_stats_parser
- [ ] Remove all S3/BOTO related checkers and try/catch blocks outside of s3_stats_parser

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54681

Test Plan:
1. Running tools/test/* covers the refactoring logic (test_test_history.py and test_stats.py as entrypoints, both using the 2 new APIs in s3_stats_parser after the refactoring).
2. print_test_stats.py's main argparse entrypoint is covered by CI step Report Test Result step.
3. run `python test/run_test.py --export-past-test-times` before and after this PR should result in the same file content in .pytorch-test-times

Reviewed By: ailzhang

Differential Revision: D27346742

Pulled By: walterddr

fbshipit-source-id: fb40162e631e007fed9d5821fe4f190bda2cb52e
2021-03-26 06:49:15 -07:00
Jane Xu
792f5ffb83 Also strip slow_test (#54528)
Summary:
Since `_test1`, `_test2`, `_build`, and `test` are all stripped, `slow_test` should be stripped as well. This way, the _slow_test stats will be considered part of all stats relating to a particular build job, though currently it doesn't do much because the jobs don't share a common stemmed name--the build has `_gcc7` while the slow_test CI job does not.

This makes me think...do we omit the `gcc7` intentionally? Are there other things I should strip, e.g., `multigpu_test`?

See:
ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_slow_test
ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test1
ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54528

Reviewed By: samestep

Differential Revision: D27270393

Pulled By: janeyx99

fbshipit-source-id: ffb7289cfe4dba52ded67f50a89f3e75e7bad68d
2021-03-23 14:44:21 -07:00
Jane Xu
635595f706 Change sharding in ci (#54228)
Summary:
Step three (landing this should fix https://github.com/pytorch/pytorch/issues/53882)!

Modifying CI to compute job times during build so that the exported job times can be used for sharding future test jobs.
The builds that are exempted from this:
- `bazel` (no python tests so no need)
- `libtorch` (no python stuff so no need)
- `onnx` (the test shards are not calculated the same way)
- `asan` (runs into an error I don't know how to debug/we can debug later: [logs](https://app.circleci.com/pipelines/github/pytorch/pytorch/288019/workflows/57f95f67-1a1b-44a0-9b02-9652b57f2a5f/jobs/11693962))

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54228

Test Plan: CI

Reviewed By: samestep

Differential Revision: D27192978

Pulled By: janeyx99

fbshipit-source-id: 3cb20d14f4989e61873043b81dfd6b0f82d17ccd
2021-03-22 08:40:34 -07:00
Jane Xu
0645e2b490 Use shard file if present, improve functions used for sharding (#54210)
Summary:
Step 2 to fixing https://github.com/pytorch/pytorch/issues/53882 :)

This changes TARGET_DET_LIST and sharding automation by checking if there's already cached data from the commit in `.pytorch-test-times`. If not, it pulls data from S3 and updates the file to have the stats. This way, S3 pulling does not need to happen more than once for the same commit.
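
A hedged sketch of the caching logic (names are illustrative; `fetch_from_s3` stands in for the real S3 query):

```python
import json
import os

def load_test_times(commit, fetch_from_s3, path=".pytorch-test-times"):
    # Reuse the cached file if it already holds stats for this commit.
    if os.path.exists(path):
        with open(path) as f:
            cached = json.load(f)
        if cached.get("commit") == commit:
            return cached["job_times"]
    # Otherwise pull from S3 once and cache the result for later runs.
    job_times = fetch_from_s3(commit)
    with open(path, "w") as f:
        json.dump({"commit": commit, "job_times": job_times}, f, indent=2)
    return job_times
```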

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54210

Test Plan:
the following methods should run the same set of tests.
First `export CIRCLE_JOB=pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2` or your favorite CIRCLE JOB.

1. Pull data first and use it:
Download the data from S3 and write it to the cache file with `python test/run_test.py --export-historic-test-times .pytorch-test-times`
Now run `python test/run_test.py --shard 1 10`

2. Make the sharding job pull data:
Delete the file you just created: `rm .pytorch-test-times`
Now run `python test/run_test.py --shard 1 10`

Reviewed By: walterddr

Differential Revision: D27136849

Pulled By: janeyx99

fbshipit-source-id: 51a42c4e2fa3f8cf15e682679dd3eb6130aad927
2021-03-18 13:25:51 -07:00
Jane Xu
2e7311ef25 First step to refactoring S3 reading logic (#53755)
Summary:
This is an initial attempt in refactoring and consolidating our S3 read logic for print_test_stats.py, test_history.py, and run_test.py. This way, boto3 and botocore do not need to be imported in various places throughout the code base, and duplicated logic (such as the many type definitions) can exist in one place: `tools/stat_utils/s3_stat_parser.py`. walterddr contributed to this PR by moving print_test_stats.py to the tools folder and the corresponding tests a subfolder within tools.

**NOTE: this removes those tests from CI, as the new `tools/test/test_stats.py` is not in the test/ directory like the other tests listed in TESTS in run_test.py.**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53755

Test Plan:
This refactoring change should not break anything, so running the files as before should work as they did previously.
To make sure that print_test_stats.py still functions: run `python tools/test/test_stats.py` and make sure all tests pass.
To make sure that test_history.py works, run the example commands from `tools/test_history.py --help` and check that their output matches that shown. Note that the script will continue printing for a while, so don't be alarmed.

Some next steps:
- Actually coming up with similarities among the three current use cases and further refactoring/consolidating of functions (e.g., combining simplify and get_cases)
- Moving more parsing logic to s3_stat_parser.py to have better abstraction between our files
- Adding tests for s3_stat_parser.py when there is more functionality in it

Reviewed By: agolynski, samestep

Differential Revision: D27030285

Pulled By: janeyx99

fbshipit-source-id: e664781324ef7c0c30943bfd7f17c895075ef7a7
2021-03-17 12:38:09 -07:00
Jane Xu
f30a7a2739 Add export-historic-test-times option to dump S3 test times into a JSON file (#54083)
Summary:
This will allow for future work to use the test times file (which will save computation time and also allow for more consistency). (Step one to fixing https://github.com/pytorch/pytorch/issues/53882)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54083

Test Plan:
export CIRCLE_JOB=your-favorite-circleci-job e.g., pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2
`python test/run_test.py --export-historic-test-times` OR
`python test/run_test.py --export-historic-test-times .your-favorite-file`

When opening either .pytorch-test-times or .your-favorite-file, you should see something like:
```
{"commit": "2d559a09392aabb84dfb4a498010b2f01d99818c", "job_times": {"distributed/test_distributed_spawn": 583.5889999999973, "distributed/test_data_parallel": 4.866999999999997, "test_binary_ufuncs": 171.1569999999998, "test_numpy_interop": 2.5649999999999995, "test_public_bindings": 0.011,...}}
```

Note that no tests will be run when this option is specified.

Reviewed By: walterddr

Differential Revision: D27091351

Pulled By: janeyx99

fbshipit-source-id: e191d739268d86de0a0ba0eea0006969859d1940
2021-03-17 12:22:00 -07:00
Jane Xu
ee35060888 Fix sharding algo + test it (#53942)
Summary:
This PR:
1. moves sharding algorithm from run_test.py to framework_utils.py (let me know if you have a better place for it)
2. adds tests for the algorithm in test_testing.py
3. fixes the algorithm so that it doesn't tack on the unknown jobs all to the shard with the minimum time, but instead distributes them around the shards (see the sketch after this list).
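
A hedged sketch of the fixed behavior (not the exact framework_utils code): tests with known times go to the currently lightest shard, while unknown-time tests are spread round-robin instead of all landing on one shard.

```python
def shard_tests(tests, times, num_shards):
    shards = [{"time": 0.0, "tests": []} for _ in range(num_shards)]
    known = sorted((t for t in tests if t in times), key=times.get, reverse=True)
    unknown = [t for t in tests if t not in times]
    for t in known:
        lightest = min(shards, key=lambda s: s["time"])
        lightest["tests"].append(t)
        lightest["time"] += times[t]
    for i, t in enumerate(unknown):
        shards[i % num_shards]["tests"].append(t)
    return shards
```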

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53942

Test Plan: python test/test_testing.py -k TestFrameworkUtils

Reviewed By: samestep

Differential Revision: D27047223

Pulled By: janeyx99

fbshipit-source-id: 824b20009c0bb707aa5361de445cdec795d5e3f1
2021-03-15 16:33:56 -07:00
Nikita Shulga
b00cdfe136 Fix run_test_module logic (#53884)
Summary:
The first argument is either a file name or a test module name, but the key to `CUSTOM_HANDLERS` is the test module name.
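
A minimal sketch of the normalization this implies (illustrative, not the exact fix):

```python
def resolve_test_module(arg):
    # Accept either "foo/test_bar.py" or "foo/test_bar" and always return the
    # module name used as the CUSTOM_HANDLERS key.
    return arg[: -len(".py")] if arg.endswith(".py") else arg

print(resolve_test_module("distributed/test_distributed_spawn.py"))
# distributed/test_distributed_spawn
```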

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53884

Test Plan: Run `python3 run_test.py -i distributed/test_distributed_spawn.py`

Reviewed By: janeyx99

Differential Revision: D27006164

Pulled By: malfet

fbshipit-source-id: f30b42856cd2754e5981c1c69618f84e392c986a
2021-03-12 09:53:58 -08:00
Aliaksandr Ivanou
ec484981c6 [3/n][torch/elastic][upstream] Move torchelastic/events to torch/distributed/events (#53760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53760

Pull Request resolved: https://github.com/pytorch/elastic/pull/143

The diff upstreams torchelastic/events to torch.

Test Plan:
buck test mode/dev-nosan //pytorch/elastic/torchelastic/agent/...
    buck test mode/dev-nosan //caffe2/test/distributed/elastic/events/fb/...

Reviewed By: kiukchung

Differential Revision: D26932830

fbshipit-source-id: 23fc10d2ead5af7f7ed510ae0d2581cc2421cf76
2021-03-11 11:25:24 -08:00
Guilherme Leobas
cb68039363 Port NumPy typing testing style to PyTorch (#52408)
Summary:
ref: https://github.com/pytorch/pytorch/issues/16574

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52408

Reviewed By: anjali411

Differential Revision: D26654687

Pulled By: malfet

fbshipit-source-id: 6feb603d8fb03c2ba2a01468bfde1a9901e889fd
2021-03-10 12:18:01 -08:00
Jane Xu
bcbe07200c Improve logic for S3 stats gathering. Uses automatic SLOW_TESTS. (#53549)
Summary:
This PR:
1. refactors the logic for S3 stats gathering.
2. Renames SLOW_TESTS to TARGET_DET_LIST to disambiguate and remove confusion with slowTest
3. detects slow tests (tests with time > 5min) to add to the TARGET_DET_LIST based on results in S3 from the previous nightly (see the sketch after this list).
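
A hedged sketch of the detection step described in item 3 (the stats dict is illustrative):

```python
THRESHOLD_SECONDS = 5 * 60  # "slow" means more than five minutes in the nightly stats

def detect_slow_tests(job_times):
    return sorted(t for t, seconds in job_times.items() if seconds > THRESHOLD_SECONDS)

print(detect_slow_tests({"test_nn": 2138.0, "test_public_bindings": 0.01}))
# ['test_nn']
```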

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53549

Test Plan:
Set CIRCLE_JOB to your favorite CI job (like `pytorch_linux_bionic_py3_8_gcc9_coverage_test1`).
Run `python test/run_test.py --determine-from=<your fave pytorch files>`
e.g., `python test/run_test.py --determine-from=test/run_test.py`

Reviewed By: mrshenli

Differential Revision: D26904478

Pulled By: janeyx99

fbshipit-source-id: 9576b34f4fee09291d60e36ff2631753a3925094
2021-03-10 09:37:06 -08:00
Sam Estep
8c798e0622 Forbid trailing whitespace (#53406)
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857

These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
  - `GLOSSARY.md`
  - `aten/src/ATen/core/op_registration/README.md`
  - `scripts/README.md`
  - `torch/csrc/jit/codegen/fuser/README.md`

The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```

I looked over the auto-generated changes and didn't see anything that looked problematic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406

Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377

This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348

Reviewed By: walterddr, seemethere

Differential Revision: D26856620

Pulled By: samestep

fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
2021-03-05 17:22:55 -08:00
Jane Xu
c0adabe172 automate sharding using S3 test time stats (#53269)
Summary:
Uses nightly commit stats to automatically shard tests based on execution time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53269

Test Plan:
set CIRCLE_JOB to an existing job, like `pytorch_linux_bionic_py3_6_clang9_test`
Then you can run something like: `python test/run_test.py --shard 1 10`

Reviewed By: malfet

Differential Revision: D26819440

Pulled By: janeyx99

fbshipit-source-id: 6bc73d6aa3d52d9850817536be15d7b54a72780e
2021-03-05 13:40:24 -08:00
Yi Zhang
fd582af06c enable coverage test for dataloader on Windows (#52550)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50661
For coverage, the class qualified name is `'SimpleCustomBatch': <class '__mp_main__.SimpleCustomBatch'>`.

For pytest, the class qualified name is `'SimpleCustomBatch': <class 'test_dataloader.SimpleCustomBatch'>`.

So the class was moved to a separate file.

![image](https://user-images.githubusercontent.com/16190118/108611869-d6b51f80-741d-11eb-908e-be7a64da916d.png)

Per malfet's suggestion, `__import__` is used instead to avoid adding a new file.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52550

Reviewed By: walterddr

Differential Revision: D26754023

Pulled By: malfet

fbshipit-source-id: 34b0fbe7336b9303cedc28ec6116ab752a2d3630
2021-03-02 18:40:47 -08:00
Meghan Lele
1d6bd15790 [JIT] Add torch._C._jit submodule (#52910)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52910

**Summary**
PR #52158 tried to move all JIT bindings from `torch._C` to a new
submodule `torch._C._jit`, but that...did not go well. This pull request
adds the new `torch._C._jit` submodule, but does not migrate the
existing bindings. Instead, it adds a unit test that fails if any new
bindings are added to `torch._C`. A comment in the test instructs
developers to add their new binding to the allowlist if it really should
be in `torch._C`, or to add it to the appropriate submodule (e.g.,
`torch._C._jit`). The idea is to prevent the issue
described in #51691 from getting *worse* if it cannot be fixed.
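
A hedged sketch of the guard (the real test keeps an explicit allowlist in the source; here it is seeded from the current bindings so the snippet runs cleanly):

```python
import torch

def find_unexpected_bindings(allowlist):
    current = {name for name in dir(torch._C) if not name.startswith("__")}
    return sorted(current - allowlist)

# Seed the allowlist with today's bindings; a newly added torch._C binding would
# then show up in the returned list and fail the real unit test.
allowlist = {name for name in dir(torch._C) if not name.startswith("__")}
assert find_unexpected_bindings(allowlist) == []
```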

**Test Plan**
Continuous integration.

**Fixes**
This commit fixes #51691.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D26698373

Pulled By: SplitInfinity

fbshipit-source-id: ec9f5426051227a513d4fd09512b624420e0100b
2021-02-26 16:05:05 -08:00
Kimish Patel
a6e94d274f [Pytorch] Add python binding to use mobile cpu allocator. (#52323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52323

Using the default CPU allocator for ops executed on the qnnpack backend will result in
ASAN failures with heap overflow, since qnnpack (and xnnpack) can access input
beyond its end/beginning.

Here we are enabling this feature specifically to enable the dynamic sparse linear op test
using the qnnpack engine. In the dynamic linear op, the fp32 bias is not packed and
hence can result in out-of-bounds access.

Test Plan: test_set_default_mobile_cpu_allocator.py

Reviewed By: z-a-f

Differential Revision: D26263481

fbshipit-source-id: a49227cac7e6781b0db4a156ca734d7671972d9f
2021-02-17 08:42:23 -08:00
Chester Liu
58eb23378f Clean up usage of torch._six partially (#49785)
Summary:
See https://github.com/pytorch/pytorch/issues/42919

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49785

Reviewed By: mruberry

Differential Revision: D25963833

Pulled By: bugra

fbshipit-source-id: 11c90d6b8d3f206c9d0a4d8621b773beb10c6ba2
2021-02-08 13:58:34 -08:00
mattip
9cbefad83f concatenate LICENSE files when building a wheel (#51634)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50695

I checked locally that the concatenated license file appears at `torch-<version>.dist-info/LICENSE` in the wheel.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51634

Reviewed By: zhangguanheng66

Differential Revision: D26225550

Pulled By: walterddr

fbshipit-source-id: 830c59fb7aea0eb50b99e295edddad9edab6ba3a
2021-02-08 08:28:46 -08:00
vfdev
b106250047 Introduced AliasInfo for OpInfo (#50368)
Summary:
Introduced AliasInfo for OpInfo.

Context: Split of https://github.com/pytorch/pytorch/issues/49158

cc mruberry, please let me know if you'd like to see more code here to cover

> [ ] fold test_op_aliases.py into OpInfo-based testing in test_ops.py

from https://github.com/pytorch/pytorch/issues/50006

and/or add `UnaryUfuncInfo('abs')` as discussed https://github.com/pytorch/pytorch/pull/49158/files#r548774221

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50368

Reviewed By: ngimel

Differential Revision: D26177261

Pulled By: mruberry

fbshipit-source-id: 2e3884a387e8d5365fe05945375f0a9d1b5f5d82
2021-02-02 00:10:09 -08:00
Radhakrishnan Venkataramani
3397919dcf Rowwise Prune op (Add the test to OSS run_test), Make the op private. (#46131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46131

Refer to the title.

Test Plan: `buck test caffe2/test:pruning`

Reviewed By: raghuramank100

Differential Revision: D24230472

fbshipit-source-id: 8f0a83446c23fdf30d0313b8c3f5ff1a463b50c7
2021-01-29 06:08:18 -08:00
lixinyu
5ed0ad4b6a DataPipe naming convention update (#51262)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51262

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26120628

Pulled By: glaringlee

fbshipit-source-id: 6855a0dd6d4a93ff93adce1039960ffd7057a827
2021-01-28 17:44:36 -08:00
Benjamin Lefaudeux
87fb3707d9 ZeroRedundancyOptimizer: an implementation of a standalone sharded optimizer wrapper (#46750)
Summary:
Implement the first stage of ZeRO, sharding of the optimizer state, as described in [this blog post](https://www.microsoft.com/en-us/research/blog/zero-2-deepspeed-shattering-barriers-of-deep-learning-speed-scale/) and [this paper](https://arxiv.org/abs/1910.02054). This implementation is completely independent from the [DeepSpeed](https://github.com/microsoft/DeepSpeed) framework, and aims at providing ZeRO-compliant building blocks within the PyTorch scheme of things.

This works by:
- acting as a wrapper to a PyTorch optimizer. ZeROptimizer does not optimize anything by itself; it only shards optimizers for distributed jobs
- each rank distributes parameters according to a given partitioning scheme (could be updated), and owns the update of a given shard only
- the .step() is called on each rank as expected, the fact that the optimizer actually works on a shard of the model is not visible from the outside
- when the update is completed, each rank broadcasts the updated model shard to all the other ranks

This can be used with DDP, although some communications are wasted in that case (gradients are all-reduced to all ranks). This implementation was initially developed in [Fairscale](https://github.com/facebookresearch/fairscale), and can also be used with an optimized DDP which only reduces to the relevant ranks. More context on ZeRO and PyTorch can be found in [this RFC](https://github.com/pytorch/pytorch/issues/42849)
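
A hedged usage sketch, assuming the wrapper that later shipped as `torch.distributed.optim.ZeroRedundancyOptimizer`; it must run inside an initialized process group (e.g. launched with torchrun, one process per device):

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.optim import ZeroRedundancyOptimizer

dist.init_process_group(backend="gloo")  # env:// rendezvous when launched via torchrun
model = nn.Linear(16, 4)
optimizer = ZeroRedundancyOptimizer(
    model.parameters(),
    optimizer_class=torch.optim.SGD,  # each rank keeps SGD state only for its own shard
    lr=0.01,
)
loss = model(torch.randn(8, 16)).sum()
loss.backward()
optimizer.step()  # after the local step, updated shards are broadcast to all ranks
```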

The API with respect to loading and saving the state is a known pain point and should probably be discussed and updated. Other possible follow-ups include integrating more closely with a [modularized DDP](https://github.com/pytorch/pytorch/issues/37002), [making the checkpoints partition-agnostic](https://github.com/facebookresearch/fairscale/issues/164), [exposing a gradient clipping option](https://github.com/facebookresearch/fairscale/issues/98), and making sure that mixed precision states are properly handled.

original authors include msbaines, min-xu-ai and myself

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46750

Reviewed By: mruberry

Differential Revision: D25958918

Pulled By: blefaudeux

fbshipit-source-id: 14280f2fd90cf251eee8ef9ac0f1fa6025ae9c50
2021-01-20 14:36:16 -08:00
peter
a1b1d0cdc0 Better split of the windows test jobs (#50660)
Summary:
See discussion in https://github.com/pytorch/pytorch/pull/50320#discussion_r554447365.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50660

Reviewed By: xuzhao9, samestep

Differential Revision: D25959021

Pulled By: seemethere

fbshipit-source-id: 7623bddc09e7d55208b8a1af4b5a23fba2cdeb14
2021-01-19 15:07:33 -08:00
Mikhail Zolotukhin
e9dc8fc162 [TensorExpr] Add python bindings. (#49698)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49698

Reincarnation of #47620 by jamesr66a.

It's just an initial bunch of things that we're exposing to python, more
is expected to come in future. Some things can probably be done better,
but I'm putting this out anyway, since some other people were interested
in using and/or developing this.

Differential Revision: D25668694

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: fb0fd1b31e851ef9ab724686b9ac2d172fa4905a
2021-01-14 21:02:47 -08:00
Nikita Shulga
22bd277891 Run test_type_hints first (#49748)
Summary:
Since it is sort of a linter check and fails frequently

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49748

Reviewed By: vkuzo

Differential Revision: D25682980

Pulled By: malfet

fbshipit-source-id: 7dba28242dced0277bad56dc887d3273c1e9e575
2021-01-04 09:33:13 -08:00
Samuel Marks
e6779d4357 [*.py] Rename "Arguments:" to "Args:" (#49736)
Summary:
I've written custom parsers and emitters for everything from docstrings to classes and functions. However, I recently came across an issue when I was parsing/generating from the TensorFlow codebase: inconsistent use of `Args:` and `Arguments:` in its docstrings.

```sh
(pytorch#c348fae)$ for name in 'Args:' 'Arguments:'; do
    printf '%-10s %04d\n' "$name" "$(rg -IFtpy --count-matches "$name" | paste -s -d+ -- | bc)"; done
Args:      1095
Arguments: 0336
```

It is easy enough to extend my parsers to support both variants, however it looks like `Arguments:` is wrong anyway, as per:

  - https://google.github.io/styleguide/pyguide.html#doc-function-args @ [`ddccc0f`](https://github.com/google/styleguide/blob/ddccc0f/pyguide.md)

  - https://chromium.googlesource.com/chromiumos/docs/+/master/styleguide/python.md#describing-arguments-in-docstrings @ [`9fc0fc0`](https://chromium.googlesource.com/chromiumos/docs/+/9fc0fc0/styleguide/python.md)

  - https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html @ [`c0ae8e3`](https://github.com/sphinx-contrib/napoleon/blob/c0ae8e3/docs/source/example_google.rst)

Therefore, only `Args:` is valid. This PR replaces them throughout the codebase.

PS: For related PRs, see tensorflow/tensorflow/pull/45420

PPS: The trackbacks automatically appearing below are sending the same changes to other repositories in the [PyTorch](https://github.com/pytorch) organisation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49736

Reviewed By: albanD

Differential Revision: D25710534

Pulled By: soumith

fbshipit-source-id: 61e8ff01abb433e9f78185c2d1d0cbd7c22c1619
2020-12-28 09:34:47 -08:00
Nikita Shulga
12942ea52b [BE] Introduce set_cwd context manager (#49657)
Summary:
Used to temporarily change the working directory, but restore it even if an exception is raised.
Use it in test_type_hints and during code coverage collection.
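
A minimal sketch of such a helper (the real one lives in the repo's test utilities; this version is illustrative):

```python
import os
from contextlib import contextmanager

@contextmanager
def set_cwd(path):
    old = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(old)  # restored even if the body raised

with set_cwd("/tmp"):
    print(os.getcwd())  # /tmp
print(os.getcwd())      # back to the original directory
```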

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49657

Reviewed By: walterddr

Differential Revision: D25660543

Pulled By: malfet

fbshipit-source-id: 77f08d57e4b60b95daa4068d0dacf7c25f978526
2020-12-21 12:08:48 -08:00
Erjia Guan
1b6fc1fd42 [WIP][DataLoader] CollateIterableDataset prototype (#48933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48933

Prototype for CollateIterableDataset.
Move `collate_batch_fn` to BatchIterableDataset

- CollateIterableDataset
  - [x] Prototype
  - [x] Tests
- BatchIterableDataset
  - [x] Prototype
  - [x] Tests
- SamplerIterableDataset
  - [x] Prototype
  - [x] Tests

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D25623635

Pulled By: ejguan

fbshipit-source-id: 99ba077619f672551ac15367baaba985db35a9c2
2020-12-21 07:04:25 -08:00
Nikita Shulga
6f381de006 Inline coverage report combining/reporting (#49615)
Summary:
Instead of calling the coverage frontend, import the coverage module and call combine() and html_report() directly.

Fixes https://github.com/pytorch/pytorch/issues/49596 by not using strict mode when combining those reports.
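
A hedged sketch of the inlined calls (assuming the standard coverage.py API):

```python
import coverage

cov = coverage.Coverage()
cov.combine(strict=False)  # non-strict: partial or missing data files don't abort
cov.html_report()          # same output as the `coverage html` frontend
```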

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49615

Reviewed By: seemethere

Differential Revision: D25645196

Pulled By: malfet

fbshipit-source-id: be55b5c23a3569a331cbdf3f86d8c89bc27d5fe1
2020-12-18 17:08:46 -08:00
Pritam Damania
9d91360b5d Cleanup APIs for pipeline parallelism. (#48630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48630

1) Make torch.distributed.pipeline package public.
2) Make several helper methods private.
ghstack-source-id: 118820803

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D25235688

fbshipit-source-id: c32833ebf090ddbd4eaf06fcb5e3f9d421623a60
2020-12-18 15:17:13 -08:00
Rong Rong (AI Infra)
df2337097d add files to SLOW_TESTS for target determinator (#49500)
Summary:
- test_torch was split into 6 in https://github.com/pytorch/pytorch/issues/47356.
- test_linalg also has 10 slowTest markings.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49500

Reviewed By: ezyang, malfet

Differential Revision: D25598085

Pulled By: walterddr

fbshipit-source-id: 74b0b433897721db86c00e236d1dd925d7a6d3d0
2020-12-16 19:10:56 -08:00
Brian Hirsh
9908b93dcf fix test_dispatch tests to error on duplicate def (#49254)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49254

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D25505170

Pulled By: bdhirsh

fbshipit-source-id: 6796f4ce022c3141934ee69c7caaa08e663adf39
2020-12-15 08:27:52 -08:00
Pritam Damania
df027bfd2c Modify Pipe to return an RRef. (#47829)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47829

As per proposal in https://github.com/pytorch/pytorch/issues/44827,
the API needs to return an RRef to support inter-host pipelining.

For now, we just return a local RRef and only support pipeline on a single
host. But having this change in the API upfront ensures we don't make any BC
breaking changes later.
ghstack-source-id: 118366784

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D24914022

fbshipit-source-id: e711e7d12efa45645f752f0e5e776a3d845f3ef5
2020-12-11 14:55:16 -08:00
Rong Rong
ef50c94e7c reenabling MPI test (#48725)
Summary:
fixes https://github.com/pytorch/pytorch/issues/47443.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48725

Reviewed By: mrshenli

Differential Revision: D25278758

Pulled By: walterddr

fbshipit-source-id: a02d0fef99a7941c8e98da16a45d840e12b8b0c3
2020-12-03 06:50:36 -08:00
neerajprad
5489a98cd3 Add support for CorrCholeskyTransform (#48041)
Summary:
This adds a transform to convert a real vector of dimension (D * (D-1))/2 into the Cholesky factor of a D x D correlation matrix. This follows the implementation in [NumPyro](https://github.com/pyro-ppl/numpyro/blob/master/numpyro/distributions/transforms.py) by fehiepsi. This is needed for the LKJDistribution which will be added in a subsequent PR.
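
A hedged example of the dimension bookkeeping, using the transform name this PR introduces (assuming it is exposed as `torch.distributions.transforms.CorrCholeskyTransform`):

```python
import torch
from torch.distributions.transforms import CorrCholeskyTransform

t = CorrCholeskyTransform()
x = torch.randn(3)            # length D*(D-1)/2 = 3, so D = 3
L = t(x)                      # lower-triangular Cholesky factor of a correlation matrix
print(L.shape)                # torch.Size([3, 3])
print((L @ L.T).diagonal())   # ones on the diagonal, as required for a correlation matrix
```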

Also in line with the ongoing effort to refactor distributions test, this moves the transforms test into its own file that uses pytest with parametrized fixtures.

For review:
 fehiepsi - could you help review the math?
 fritzo - do you have any suggestions for what to do about the event dimension (more details are in the comment below)?
 ezyang - could you review the changes in `run_test.py`? Instead of a separate `PYTEST_TESTS`, I have clubbed these tests in `USE_PYTEST_LIST` to avoid duplicate logic. The only difference is that we do not anymore check if pytest is not installed and exclude the tests in the list. I figured that if existing tests are already using pytest, this should not matter.

TODOs (probably not all can be satisfied at the same time):
 - [x] Use operations that are JIT friendly, i.e. the transform works with different sized input under JIT.
 - [x] Resolve test failures - currently `arange(scalar_tensor)` fails on certain backends but this is needed for JIT. Maybe we should only support same sized tensor under JIT?
 - [x] Add tests to check that the transform gives correct gradients and is in agreement with the `log_det_jacobian`.
 - [x] Add `input_event_dim` and `output_event_dim` to `CorrCholeskyTransform`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48041

Reviewed By: zhangguanheng66

Differential Revision: D25262505

Pulled By: neerajprad

fbshipit-source-id: 5a57e1c19d8230b53592437590b9169bdf2f71e9
2020-12-03 03:21:08 -08:00
Mike Ruberry
36c87f1243 Refactors test_torch.py to be fewer than 10k lines (#47356)
Summary:
Creates multiple new test suites to have fewer tests in test_torch.py, consistent with previous test suite creation like test_unary_ufuncs.py and test_linalg.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47356

Reviewed By: ngimel

Differential Revision: D25202268

Pulled By: mruberry

fbshipit-source-id: 75fde3ca76545d1b32b86d432a5cb7a5ba8f5bb6
2020-11-28 20:11:40 -08:00
Jithun Nair
f1c985695c Enabled gloo backend in test_distributed unit tests for ROCm (#40395)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40395

Reviewed By: ngimel

Differential Revision: D25181692

Pulled By: mrshenli

fbshipit-source-id: 29f478c974791efc0acea210c8c9e574944746a5
2020-11-25 19:51:40 -08:00
Sam Estep
c4a6df989c Pass any verbosity from test/run_test.py to pytest (#48204)
Summary:
Previously it was only possible to pass up to one [verbosity level](https://adamj.eu/tech/2019/10/03/my-most-used-pytest-commandline-flags/) to `pytest` when running a test via `test/run_test.py`. Presumably that behavior was never added because `unittest` [doesn't do anything extra](https://stackoverflow.com/a/1322648/5044950) when given more than one `--verbose` flag. This PR removes that limitation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48204

Test Plan:
Make a dummy `pytest`-style file `test/test_foo.py`:
```py
def test_bar():
    assert 'hello\n' * 10 == 'hello\n' * 20
```
Then add `'test_foo'` to both `TESTS` and `USE_PYTEST_LIST` in `test/run_test.py`, and run this command:
```sh
test/run_test.py -vvi test_foo
```

Reviewed By: walterddr

Differential Revision: D25069147

Pulled By: samestep

fbshipit-source-id: 2765ee78d18cc84ea0e262520838993f9e9ee04f
2020-11-19 08:06:26 -08:00
Wanchao Liang
bc484cfed1 [c10d][jit] initial torchbind bindings for ProcessGroupNCCL (#42944)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42944

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23228682

Pulled By: wanchaol

fbshipit-source-id: 30f4258ec2a90202264745511b897f4e1f5550f7
2020-11-17 21:01:55 -08:00
Xiang Gao
6e42b77be1 Add '--allow-run-as-root' to mpiexec to allow running distributed test inside a container (#43794)
Summary:
Inside a container, the user is often root. We should allow this use case so that people can easily run `run_test.py` inside a container

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43794

Reviewed By: ezyang

Differential Revision: D24904469

Pulled By: malfet

fbshipit-source-id: f96cb9dda3e7bd18b29801cde4c5b0616c750016
2020-11-13 15:31:06 -08:00
Jane Xu
579cfc6641 Moving test order to rebalance test1 and test2 times (#47290)
Summary:
The asan test time difference between shards is absurd right now, so this moves some heftier tests (test_nn and test_quantization) into shard 2.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47290

Reviewed By: malfet

Differential Revision: D24706877

Pulled By: janeyx99

fbshipit-source-id: 35069d1e425857f85775f9be76501d6a158e0376
2020-11-03 09:39:29 -08:00
Pritam Damania
78de12f588 Replace -f with -x for pytest tests. (#46967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46967

Tests under `tests/distributed/_pipeline/sync` use pytest, and
specifying the `-f` option for such tests as follows: `python test/run_test.py
-i distributed/_pipeline/sync/skip/test_api -- -f` doesn't work.

The equivalent option for pytest is `-x`. To resolve this issue, I've updated
`run_test.py` to replace `-f` with `-x` for pytest tests.

More details in https://github.com/pytorch/pytorch/issues/46782

#Closes: https://github.com/pytorch/pytorch/issues/46782
ghstack-source-id: 115440558

Test Plan:
1) waitforbuildbot
2) `python test/run_test.py -i distributed/_pipeline/sync/skip/test_api -- -f`

Reviewed By: malfet

Differential Revision: D24584556

fbshipit-source-id: bd87f5b4953504e5659fe72fc8615e126e5490ff
2020-10-29 15:28:06 -07:00
Jane Xu
85954164a4 fix minor bug, message variable does not exist (#46777)
Summary:
When run with `--continue-through-error`, the script ends with the following error:

```
Traceback (most recent call last):
  File "run_test.py", line 745, in <module>
    main()
  File "run_test.py", line 741, in main
    print_to_stderr(message)
NameError: name 'message' is not defined
make: *** [macos-compat] Error 1
```

This PR just changes `message` to `err`, which is the intended variable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46777

Reviewed By: seemethere

Differential Revision: D24510460

Pulled By: janeyx99

fbshipit-source-id: be1124b6fc72b178d62acc168d0cbc74962de52b
2020-10-23 14:20:23 -07:00
Pritam Damania
06d50b5eb0 Pull in fairscale.nn.Pipe into PyTorch. (#44090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44090

This is an initial commit pulling in the torchgpipe fork at
https://github.com/facebookresearch/fairscale.

The purpose of this commit is to just pull in the code and ensure all tests and
builds work fine. We will slowly modify this to match our intended API
mentioned in https://fb.quip.com/txurAV3zIFox#RPZACAfAKMq. Follow-up PRs would
address further changes needed on top of the initial commit.

We're pulling the code into the `torch.distributed._pipeline.sync` package. The
package is private on purpose since there is a lot of work (ex: docs, API
changes etc.) that needs to go in before we can actually officially support
this.
ghstack-source-id: 114864254

Test Plan:
1) waitforbuildbot
2) Ran all tests on my devgpu

Reviewed By: mrshenli

Differential Revision: D23493316

fbshipit-source-id: fe3c8b7dadeeb86abdc00e8a8652491b0b16743a
2020-10-22 10:59:02 -07:00
Richard Zou
0285618a11 Add utilities to support handling of nested python data structures (#46287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46287

This adds a lightweight `pytree` implementation that is similar to and
inspired by JAX pytrees, tensorflow.nest, deepmind/tree,
TorchBeast's TensorNest, etc.

A *pytree* is a nested Python data structure. It is a tree in the sense
that nodes are Python collections (e.g., list, tuple, dict) and the leaves
are Python values. Furthermore, a pytree should not contain reference
cycles.

This PR:
- adds support for flattening and unflattening nested Python list/dict/tuples

Context: nested Tensor inputs for vmap
--------------------------------------
Right now, vmap is restricted to taking in flat lists of tensors. This
is because vmap needs to be able to convert every tensor in the input
that is being vmapped over into a BatchedTensor.

With a pytree library, we can simply flatten the input data structure
(returning the leaves), map all of the Tensors in the flat input to
BatchedTensors, and unflatten the flat list of BatchedTensors into a new
input. Or equivalently, with a `tree_map` function, we can map a nested
python data structure containing Tensors into one containing
BatchedTensors.
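
A hedged sketch of the flatten/transform/unflatten round trip (module path as it appears in later releases, `torch.utils._pytree`):

```python
import torch
from torch.utils._pytree import tree_flatten, tree_unflatten

nested = {"a": torch.ones(2), "b": [torch.zeros(3), (torch.ones(1), 4)]}
leaves, spec = tree_flatten(nested)                 # leaves plus a structure spec
new_leaves = [x * 2 if isinstance(x, torch.Tensor) else x for x in leaves]
rebuilt = tree_unflatten(new_leaves, spec)          # same nesting, transformed leaves
print(type(rebuilt["b"][1]))                        # <class 'tuple'> -- structure preserved
```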

Future work
-----------
In some future PRs, we'll add nested input support for vmap. The
prerequisites for that are:
- a `broadcast_to(small, big)` that broadcasts `small` up to `big`.
  This is for handling the in_dims to vmap: the in_dims structure must
  be compatible with the structure of the inputs.

Test Plan
---------
- New tests in test/test_pytree.py

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D24392890

Pulled By: zou3519

fbshipit-source-id: 7daf7430c5a38354e7d203a72882bd7a9b24cfb1
2020-10-20 07:45:45 -07:00
jiej
ac146c4820 [nvFuser] Switching to CudaFusionGuard from BailOut for nvfuser - update 2 (#46452)
Summary:
1. Added CudaFusionGuard as the custom TypeCheck for nvfuser; enabled dynamic shape support with profiling executor;
2. dropped support for legacy fuser;
3. re-enabled nvfuser tests;
4. added registration for profiling record to allow profiling on user specified nodes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46452

Reviewed By: zou3519, anjali411

Differential Revision: D24364642

Pulled By: ngimel

fbshipit-source-id: daf53a9a6b6636e1ede420a3a6d0397d4a8b450b
2020-10-19 15:44:31 -07:00
Taylor Robie
dda95e6914 More Timer refinement (#46023)
Summary:
This PR just adds more polish to the benchmark utils:

1) `common.py`, `timer.py`, and `valgrind_wrapper/timer_interface.py` are now MyPy strict compliant. (except for three violations due to external deps.) Compare and Fuzzer will be covered in a future PR.
2) `CallgrindStats` now uses `TaskSpec` rather than accepting the individual fields which brings it closer to `Measurement`.
3) Some `__repr__` logic has been moved into `TaskSpec` (which `Measurement` and `CallgrindStats` use in their own `__repr__`s) for a more unified feel and less horrible f-string hacking, and the repr's have been given a cleanup pass.
4) `Tuple[FunctionCount, ...]` has been formalized as the `FunctionCounts` class, which has a much nicer `__repr__` than just the raw tuple, as well as some convenience methods (`__add__`, `__sub__`, `filter`, `transform`) for easier DIY stat exploration. (I find myself using the latter two a lot now.) My personal experience is that manipulating `FunctionCounts` is massively more pleasant than the raw tuples of `FunctionCount`. (Though it's still possible to get at the raw data if you want.)
5) Better support for multi-line `stmt` and `setup`.
6) Compare now also supports rowwise coloring, which is often the more natural layout for A/B testing.
7) Limited support for `globals` in `collect_callgrind`. This should make it easier to benchmark JIT models. (CC ZolotukhinM)
8) More unit tests, including extensive tests for the Callgrind stats manipulation APIs.
9) Mitigate issue with `MKL_THREADING_LAYER` when run in Jupyter. (https://github.com/pytorch/pytorch/issues/37377)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46023

Test Plan: changes should be covered by existing and new unit tests.

Reviewed By: navahgar, malfet

Differential Revision: D24313911

Pulled By: robieta

fbshipit-source-id: 835d4b5cde336fb7ff0adef3c0fd614d64df0f77
2020-10-15 16:32:53 -07:00
Wang Xu
62d37b9f26 add size_based_partition final (#46282)
Summary:
Reopen the PR: https://github.com/pytorch/pytorch/pull/45837
This PR adds a new feature to the Partitioner() class called size_based_partition. Given a list of devices with the same memory size, this function can distribute graph nodes across different devices. To implement this feature, several helper functions are created in Partitioner.py and GraphManipulation.py.
A unit test is also added in test/test_fx_experimental.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46282

Reviewed By: gcatron

Differential Revision: D24288470

Pulled By: scottxu0730

fbshipit-source-id: e81b1e0c56e34f61e497d868882126216eba7538
2020-10-14 03:44:05 -07:00
Neeraj Pradhan
faa9c22a51 Support pytest for distribution testing (#45648)
Summary:
In response to https://github.com/pytorch/pytorch/issues/11578. This is a test run to see if CI (and other internal systems) works fine with pytest style tests.
 - Creates a separate `distributions` directory within `test`.
 - For testing, this rewrites the `constraint` tests as parameterized tests in pytest. I don't plan to convert any other tests to pytest style, but only expose this option for adding new tests, if required.

If this is a success, we can move `EXAMPLES` in `test_distributions` into a separate file that can be imported by both pytest and unittest style tests. cc. fritzo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45648

Reviewed By: ezyang, colesbury

Differential Revision: D24080248

Pulled By: neerajprad

fbshipit-source-id: 1f2e7d169c3c291a3051d0cece17851560fe9ea9
2020-10-13 10:56:50 -07:00
Jane Xu
ba78eb80ff including tensorexpr tests in CI for all configs (#46188)
Summary:
Removed test_tensorexpr from the JIT-EXECUTOR exclude list.

CI will now run those tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46188

Reviewed By: glaringlee

Differential Revision: D24255433

Pulled By: janeyx99

fbshipit-source-id: f18e5b41d49b439407c1c24ef6190ef68bc809bf
2020-10-12 12:03:06 -07:00
Jane (Yuan) Xu
be137e45cd reorganizing tests so that test1 and test2 are balanced in timing (#45778)
Summary:
Used the --shard option to split up the Python tests run from `test/run_test.py` in the CI testing script.

Also revised the help message for --shard to be more accurate.

Test results:
BEFORE:
| EVENT | TIMING  |
|---|---|
| **TEST1** | |
| | |
| test_python_nn | 35m19s |
| test_cpp_extensions | 30s |
| **total** | **35m49s** |
| **TEST2** | |
| | |
| install_torchvision | 35s |
| test_python_all_except_nn_and_cpp_extensions | 255m37s |
| test_aten | SKIPPED |
| test_libtorch | 9m8s |
| test_custom_script_ops | SKIPPED |
| test_custom_backend | SKIPPED |
| test_torch_function_benchmark | 10s |
| **total** | **4hr24m** |

AFTER THIS SHARD:
| EVENT | TIMING  |
|---|---|
| **TEST1** | |
| | |
| test_autograd | 26m30s |
| test_foreach | 69m |
| test_nn | 35m38s |
| **total** | **3h1m** |
| **TEST2** | |
| | |
| test-quantization | 41m28s |
| test_spectral_ops | 17m37s |
| test_torch | 8m56s |
| test_jit_legacy | 16m21s |
| **total** | **2h18m** |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45778

Reviewed By: albanD

Differential Revision: D24137156

Pulled By: janeyx99

fbshipit-source-id: 5873fec47aedb9f699ebbda653a4d32a9950fc13
2020-10-06 07:57:08 -07:00