In the case of target determination, this is just removing comments that
refer to non-existent code.
In the case of the test specification code, this removes (what I believe
to be) an unused feature. If we're using this somewhere, let me know and I
can revise the PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79372
Approved by: https://github.com/janeyx99
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75374
From the FairSeq and MetaSeq codebases (which are essentially transformer models), we have found that many ops are not supported by sharded tensor. So we now implement a simple version so that we can at least run a transformer example.
Ops include: chunk, transpose, view, masked_fill, dropout, softmax and type_as.
We isolate the common logic of registering simple ops into a function, so for future registrations we just need to implement at most three functions for a new op.
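Illustratively, the registration pattern looks roughly like the sketch below; the registry and helper names here are made up for illustration, not the PR's actual API (the "at most three functions" are the torch op itself, an optional input check, and an optional customized handler):
```python
import torch

# Hedged sketch of registering a "simple" sharded op. For element-wise ops
# (dropout, softmax, type_as), the default handler applies the plain torch
# op to each local shard and leaves the sharding metadata unchanged.
_SIMPLE_SHARDED_OPS = {}  # illustrative registry

def register_simple_op(torch_op, extra_check=None, customized_func=None):
    def sharded_impl(sharded_tensor, *args, **kwargs):
        if extra_check is not None:
            extra_check(sharded_tensor, *args, **kwargs)  # validate inputs
        if customized_func is not None:
            return customized_func(sharded_tensor, *args, **kwargs)
        for shard in sharded_tensor.local_shards():
            # In-place, shard-by-shard application of the regular op.
            shard.tensor.copy_(torch_op(shard.tensor, *args, **kwargs))
        return sharded_tensor
    _SIMPLE_SHARDED_OPS[torch_op] = sharded_impl
    return sharded_impl
```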
ghstack-source-id: 155309147
Test Plan: CI
Reviewed By: pritamdamania87
Differential Revision: D35123021
fbshipit-source-id: 660e559fb8b4a910eb63e0586c63ab927873a2ce
(cherry picked from commit 83a87ebf627d863448dfe1019c7c5f7112cc14ab)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76199
Since PartialTensor is somewhat isolated from sharded tensor, we now move it to the _shard folder.
We also added logic to remove padding when the size is not divisible by the world size, and modified the unit test to reflect these changes.
Finally, we need to consider the placement order of the resharding spec for partial tensor; the related logic is added in this change. Furthermore, for sharded linear, we need to order the placements by rank to get the expected local result.
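For illustration, a minimal sketch of the padding removal (the helper name is hypothetical): each rank pads its shard up to ceil(dim_size / world_size) before the collective, so the aggregated result has to be narrowed back to the original size:
```python
import torch

def strip_padding(result: torch.Tensor, orig_dim_size: int, dim: int = 0) -> torch.Tensor:
    """Drop the trailing padding added when orig_dim_size % world_size != 0."""
    return result.narrow(dim, 0, orig_dim_size)
```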
ghstack-source-id: 154853290
Test Plan: CI
Reviewed By: pritamdamania87, wanchaol
Differential Revision: D35827894
fbshipit-source-id: 58dab77969b8b6557f42afa7e8f5a8a053dd5793
(cherry picked from commit abeb28f16582dcf707c2e165f39df6caf692384d)
As per title.
### When running `python run_test.py -h`
It used to show:
- The general unittest parser help that we print via a second thread 35545d85dc/torch/testing/_internal/common_utils.py (L467-L470)
- The common_utils's parser help
<details><summary>Full result</summary>
<p>
```bash
$ python run_test.py -h
usage: run_test.py [-h] [-v] [-q] [--locals] [-f] [-c] [-b] [-k TESTNAMEPATTERNS] [tests [tests ...]]
positional arguments:
tests a list of any number of test modules, classes and test methods.
optional arguments:
-h, --help show this help message and exit
-v, --verbose Verbose output
-q, --quiet Quiet output
--locals Show local variables in tracebacks
-f, --failfast Stop on first fail or error
-c, --catch Catch Ctrl-C and display results so far
-b, --buffer Buffer stdout and stderr during tests
-k TESTNAMEPATTERNS Only run tests which match the given substring
Examples:
run_test.py - run default set of tests
run_test.py MyTestSuite - run suite 'MyTestSuite'
run_test.py MyTestCase.testSomething - run MyTestCase.testSomething
run_test.py MyTestCase - run all 'test*' test methods
in MyTestCase
usage: run_test.py [-h] [--subprocess] [--seed SEED] [--accept] [--jit_executor JIT_EXECUTOR] [--repeat REPEAT] [--test_bailouts]
[--save-xml [SAVE_XML]] [--discover-tests] [--log-suffix LOG_SUFFIX] [--run-parallel RUN_PARALLEL]
[--import-slow-tests [IMPORT_SLOW_TESTS]] [--import-disabled-tests [IMPORT_DISABLED_TESTS]]
optional arguments:
-h, --help show this help message and exit
--subprocess whether to run each test in a subprocess
--seed SEED
--accept
--jit_executor JIT_EXECUTOR
--repeat REPEAT
--test_bailouts
--save-xml [SAVE_XML]
--discover-tests
--log-suffix LOG_SUFFIX
--run-parallel RUN_PARALLEL
--import-slow-tests [IMPORT_SLOW_TESTS]
--import-disabled-tests [IMPORT_DISABLED_TESTS]
```
</p>
</details>
It now prints:
- The general unittest parser help the same way. Should we remove this? We can't merge them, unfortunately, as unittest does not accept a parent / does not expose its parser for us to take as a parent.
- The combined common_utils + run_test parsers help
<details><summary>Full result</summary>
<p>
```bash
$ python run_test.py -h
usage: run_test.py [-h] [-v] [-q] [--locals] [-f] [-c] [-b] [-k TESTNAMEPATTERNS] [tests [tests ...]]
positional arguments:
tests a list of any number of test modules, classes and test methods.
optional arguments:
-h, --help show this help message and exit
-v, --verbose Verbose output
-q, --quiet Quiet output
--locals Show local variables in tracebacks
-f, --failfast Stop on first fail or error
-c, --catch Catch Ctrl-C and display results so far
-b, --buffer Buffer stdout and stderr during tests
-k TESTNAMEPATTERNS Only run tests which match the given substring
Examples:
run_test.py - run default set of tests
run_test.py MyTestSuite - run suite 'MyTestSuite'
run_test.py MyTestCase.testSomething - run MyTestCase.testSomething
run_test.py MyTestCase - run all 'test*' test methods
in MyTestCase
Ignoring disabled issues: []
usage: run_test.py [-h] [--subprocess] [--seed SEED] [--accept] [--jit_executor JIT_EXECUTOR] [--repeat REPEAT] [--test_bailouts]
[--save-xml [SAVE_XML]] [--discover-tests] [--log-suffix LOG_SUFFIX] [--run-parallel RUN_PARALLEL]
[--import-slow-tests [IMPORT_SLOW_TESTS]] [--import-disabled-tests [IMPORT_DISABLED_TESTS]] [-v] [--jit]
[--distributed-tests] [-core] [-pt] [-c] [-i TESTS [TESTS ...]] [-x TESTS [TESTS ...]] [-f TESTS] [-l TESTS]
[--bring-to-front TESTS [TESTS ...]] [--ignore-win-blocklist] [--continue-through-error]
[--export-past-test-times [EXPORT_PAST_TEST_TIMES]] [--shard SHARD SHARD] [--exclude-jit-executor]
[--exclude-distributed-tests] [--run-specified-test-cases [RUN_SPECIFIED_TEST_CASES]]
[--use-specified-test-cases-by {include,bring-to-front}] [--dry-run]
[additional_unittest_args [additional_unittest_args ...]]
Run the PyTorch unit test suite
positional arguments:
additional_unittest_args
additional arguments passed through to unittest, e.g., python run_test.py -i sparse -- TestSparse.test_factory_size_check
optional arguments:
-h, --help show this help message and exit
--subprocess whether to run each test in a subprocess
--seed SEED
--accept
--jit_executor JIT_EXECUTOR
--repeat REPEAT
--test_bailouts
--save-xml [SAVE_XML]
--discover-tests
--log-suffix LOG_SUFFIX
--run-parallel RUN_PARALLEL
--import-slow-tests [IMPORT_SLOW_TESTS]
--import-disabled-tests [IMPORT_DISABLED_TESTS]
-v, --verbose print verbose information and test-by-test results
--jit, --jit run all jit tests
--distributed-tests, --distributed-tests
run all distributed tests
-core, --core Only run core tests, or tests that validate PyTorch's ops, modules,and autograd. They are defined by CORE_TEST_LIST.
-pt, --pytest If true, use `pytest` to execute the tests. E.g., this runs TestTorch with pytest in verbose and coverage mode: python run_test.py -vci torch -pt
-c, --coverage enable coverage
-i TESTS [TESTS ...], --include TESTS [TESTS ...]
select a set of tests to include (defaults to ALL tests). tests must be a part of the TESTS list defined in run_test.py
-x TESTS [TESTS ...], --exclude TESTS [TESTS ...]
select a set of tests to exclude
-f TESTS, --first TESTS
select the test to start from (excludes previous tests)
-l TESTS, --last TESTS
select the last test to run (excludes following tests)
--bring-to-front TESTS [TESTS ...]
select a set of tests to run first. This can be used in situations where you want to run all tests, but care more about some set, e.g. after making a change to a specific component
--ignore-win-blocklist
always run blocklisted windows tests
--continue-through-error
Runs the full test suite despite one of the tests failing
--export-past-test-times [EXPORT_PAST_TEST_TIMES]
dumps test times from previous S3 stats into a file, format JSON
--shard SHARD SHARD runs a shard of the tests (taking into account other selections), e.g., --shard 2 3 will break up the selected tests into 3 shards and run the tests in the 2nd shard (the first number should not exceed the second)
--exclude-jit-executor
exclude tests that are run for a specific jit config
--exclude-distributed-tests
exclude distributed tests
--run-specified-test-cases [RUN_SPECIFIED_TEST_CASES]
load specified test cases file dumped from previous OSS CI stats, format CSV. If all test cases should run for a <test_module> please add a single row:
test_filename,test_case_name
...
<test_module>,__all__
...
how we use the stats will be based on option "--use-specified-test-cases-by".
--use-specified-test-cases-by {include,bring-to-front}
used together with option "--run-specified-test-cases". When specified test case file is set, this option allows the user to control whether to only run the specified test modules or to simply bring the specified modules to front and also run the remaining modules. Note: regardless of this option, we will only run the specified test cases within a specified test module. For unspecified test modules with the bring-to-front option, all test cases will be run, as one may expect.
--dry-run Only list the test that will run.
where TESTS is any of: benchmark_utils/test_benchmark_utils, distributed/_shard/sharded_optim/test_sharded_optim, distributed/_shard/sharded_tensor/ops/test_binary_cmp, distributed/_shard/sharded_tensor/ops/test_elementwise_ops, distributed/_shard/sharded_tensor/ops/test_embedding, distributed/_shard/sharded_tensor/ops/test_embedding_bag, distributed/_shard/sharded_tensor/ops/test_init, distributed/_shard/sharded_tensor/ops/test_linear, distributed/_shard/sharded_tensor/ops/test_math_ops, distributed/_shard/sharded_tensor/test_megatron_prototype, distributed/_shard/sharded_tensor/test_partial_tensor, distributed/_shard/sharded_tensor/test_sharded_tensor, distributed/_shard/sharded_tensor/test_sharded_tensor_reshard, distributed/_shard/sharding_spec/test_sharding_spec, distributed/_shard/test_replicated_tensor, distributed/algorithms/test_join, distributed/elastic/events/lib_test, distributed/elastic/metrics/api_test, distributed/elastic/multiprocessing/api_test, distributed/elastic/timer/api_test, distributed/elastic/timer/local_timer_example, distributed/elastic/timer/local_timer_test, distributed/elastic/utils/distributed_test, distributed/elastic/utils/logging_test, distributed/elastic/utils/util_test, distributed/fsdp/test_flatten_params_wrapper, distributed/fsdp/test_fsdp_apply, distributed/fsdp/test_fsdp_checkpoint, distributed/fsdp/test_fsdp_clip_grad_norm, distributed/fsdp/test_fsdp_comm, distributed/fsdp/test_fsdp_core, distributed/fsdp/test_fsdp_freezing_weights, distributed/fsdp/test_fsdp_grad_acc, distributed/fsdp/test_fsdp_ignored_modules, distributed/fsdp/test_fsdp_input, distributed/fsdp/test_fsdp_memory, distributed/fsdp/test_fsdp_mixed_precision, distributed/fsdp/test_fsdp_multiple_forward, distributed/fsdp/test_fsdp_multiple_wrapping, distributed/fsdp/test_fsdp_optim_state, distributed/fsdp/test_fsdp_overlap, distributed/fsdp/test_fsdp_pure_fp16, distributed/fsdp/test_fsdp_state_dict, distributed/fsdp/test_fsdp_summon_full_params, distributed/fsdp/test_fsdp_traversal, distributed/fsdp/test_fsdp_uneven, distributed/fsdp/test_shard_utils, distributed/fsdp/test_utils, distributed/fsdp/test_wrap, distributed/nn/jit/test_instantiator, distributed/optim/test_zero_redundancy_optimizer, distributed/pipeline/sync/skip/test_api, distributed/pipeline/sync/skip/test_gpipe, distributed/pipeline/sync/skip/test_inspect_skip_layout, distributed/pipeline/sync/skip/test_leak, distributed/pipeline/sync/skip/test_portal, distributed/pipeline/sync/skip/test_stash_pop, distributed/pipeline/sync/skip/test_tracker, distributed/pipeline/sync/skip/test_verify_skippables, distributed/pipeline/sync/test_balance, distributed/pipeline/sync/test_bugs, distributed/pipeline/sync/test_checkpoint, distributed/pipeline/sync/test_copy, distributed/pipeline/sync/test_deferred_batch_norm, distributed/pipeline/sync/test_dependency, distributed/pipeline/sync/test_inplace, distributed/pipeline/sync/test_microbatch, distributed/pipeline/sync/test_phony, distributed/pipeline/sync/test_pipe, distributed/pipeline/sync/test_pipeline, distributed/pipeline/sync/test_stream, distributed/pipeline/sync/test_transparency, distributed/pipeline/sync/test_worker, distributed/rpc/cuda/test_tensorpipe_agent, distributed/rpc/test_faulty_agent, distributed/rpc/test_tensorpipe_agent, distributed/test_c10d_common, distributed/test_c10d_gloo, distributed/test_c10d_nccl, distributed/test_c10d_spawn_gloo, distributed/test_c10d_spawn_nccl, distributed/test_data_parallel, distributed/test_distributed_spawn, distributed/test_launcher, 
distributed/test_nccl, distributed/test_pg_wrapper, distributed/test_store, distributions/test_constraints, distributions/test_distributions, lazy/test_bindings, lazy/test_extract_compiled_graph, lazy/test_ts_opinfo, test_ao_sparsity, test_autocast, test_autograd, test_binary_ufuncs, test_bundled_inputs, test_complex, test_cpp_api_parity, test_cpp_extensions_aot_ninja, test_cpp_extensions_aot_no_ninja, test_cpp_extensions_jit, test_cuda, test_cuda_primary_ctx, test_dataloader, test_datapipe, test_deploy, test_deploy, test_dispatch, test_expanded_weights, test_foreach, test_function_schema, test_functional_autograd_benchmark, test_functional_optim, test_functionalization, test_futures, test_fx, test_fx_experimental, test_hub, test_import_stats, test_indexing, test_jit, test_jit_autocast, test_jit_cuda_fuser, test_jit_disabled, test_jit_fuser_legacy, test_jit_fuser_te, test_jit_legacy, test_jit_profiling, test_license, test_linalg, test_logging, test_masked, test_mkldnn, test_mobile_optimizer, test_model_dump, test_module_init, test_modules, test_monitor, test_multiprocessing, test_multiprocessing_spawn, test_namedtensor, test_namedtuple_return_api, test_native_functions, test_nestedtensor, test_nn, test_numba_integration, test_numpy_interop, test_openmp, test_ops, test_ops_gradients, test_ops_jit, test_optim, test_overrides, test_package, test_per_overload_api, test_profiler, test_pruning_op, test_public_bindings, test_python_dispatch, test_pytree, test_quantization, test_reductions, test_scatter_gather_ops, test_serialization, test_set_default_mobile_cpu_allocator, test_shape_ops, test_show_pickle, test_sort_and_select, test_sparse, test_sparse_csr, test_spectral_ops, test_stateless, test_tensor_creation_ops, test_tensorboard, test_tensorexpr, test_tensorexpr_pybind, test_testing, test_torch, test_type_hints, test_type_info, test_type_promotion, test_unary_ufuncs, test_utils, test_view_ops, test_vmap, test_vulkan, test_xnnpack_integration
```
</p>
</details>
### When running anything else (for example `python test_autograd.py -h`)
It did not change and still does:
- The general unittest parser help that we print via a second thread
- The common_utils's parser help
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76152
Approved by: https://github.com/malfet, https://github.com/seemethere
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73873
Basic ShardingPlan interface and Sharder implementation:
1. We provide `ShardingPlan` to allow users to specify all parameter sharding strategies for a given model. This includes `plan` for sharding the parameters, `output_plan` for tagging the output layout, and `return_local_tensor` for converting back to local tensors (e.g., for DDP).
2. Introduce a `shard_module` API that takes an nn.Module and a ShardingPlan, then shards the module according to the plan.
TODO:
The next PR will introduce an extensible Sharder and a ShardingPlanner.
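A minimal usage sketch, assuming a process group is already initialized on every rank (import paths and exact field semantics are assumptions based on this description):
```python
import torch.nn as nn
from torch.distributed._shard import shard_module                     # path assumed
from torch.distributed._shard.sharding_plan import ShardingPlan       # path assumed
from torch.distributed._shard.sharding_spec import ChunkShardingSpec  # path assumed

colwise = ChunkShardingSpec(dim=0, placements=["rank:0/cuda:0", "rank:1/cuda:1"])
rowwise = ChunkShardingSpec(dim=1, placements=["rank:0/cuda:0", "rank:1/cuda:1"])

model = nn.Sequential(nn.Linear(16, 16), nn.GELU(), nn.Linear(16, 16))
plan = ShardingPlan(
    plan={"0.weight": colwise, "2.weight": rowwise},  # which params to shard, and how
    return_local_tensor=["2"],  # convert the last layer's output back to a local tensor
)
shard_module(model, plan)  # SPMD: every rank runs this on its replica of the model
```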
ghstack-source-id: 154682421
Test Plan: test_sharding_plann.py
Reviewed By: pritamdamania87, fduwjj
Differential Revision: D34695159
fbshipit-source-id: 3d695803c4b7e9a7543177ade5b709b5f847baa9
(cherry picked from commit 670cd279b0e5304a9bf0ce6e6651a08273a77035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73322
These tests have been disabled in OSS CI since #34785.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D34436844
Pulled By: davidberard98
fbshipit-source-id: c5b14b33e7f369a6fa1e9cfbcb484a30dffc659e
(cherry picked from commit b08f51587c0203c3e8b69f06ea613759e740aa4f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73529
Add ReplicatedTensor, a type of tensor that has the same value on all ranks across the world size.
ReplicatedTensor is a :class:`~torch.Tensor` subclass, and it can be used together with ShardedTensor/Tensor to express different types of computation. The inter-op rules are defined as (using torch.add as an example op):
ReplicatedTensor + ReplicatedTensor = ReplicatedTensor
ReplicatedTensor + torch.Tensor = torch.Tensor
ReplicatedTensor + ShardedTensor = ShardedTensor
We also added a `validate()` API to help users validate whether a replicated tensor on a certain process_group is truly replicated or not.
TODO: the next PR will add ShardedTensor/PartialTensor logic to handle ReplicatedTensor.
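A small sketch of the inter-op rules in action, assuming an initialized process group (the import path is an assumption):
```python
import torch
from torch.distributed._shard.replicated_tensor import ReplicatedTensor  # path assumed

rt = ReplicatedTensor(torch.ones(4))  # same value on every rank by construction
t = torch.rand(4)                     # plain tensor; may differ per rank

out1 = rt + rt  # ReplicatedTensor + ReplicatedTensor -> ReplicatedTensor
out2 = rt + t   # ReplicatedTensor + torch.Tensor     -> torch.Tensor
rt.validate()   # collectively checks the value truly matches across ranks
```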
ghstack-source-id: 152064781
Test Plan: test_replicated_tensor
Reviewed By: pritamdamania87, fduwjj
Differential Revision: D34529374
fbshipit-source-id: 16ccb300e9f9c47ac29a17eb6d46d029ab7d60b8
(cherry picked from commit 44f4e11e795a1bf330a8108bda256950ca769525)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73676
For some reason https://github.com/pytorch/pytorch/pull/72637 ended up getting messed up during rebasing, so please refer to that PR for review history.
This PR creates a new workflow called `deploy-linux-xenial-cuda11.3-py3.7-gcc7` for torch::deploy tests.
For testing, go to https://www.torch-ci.com/pytorch/pytorch/pull/73676 and check that a build and test job occur with `deploy-linux-xenial-cuda11.3-py3.7-gcc7`.
Test Plan: Imported from OSS
Reviewed By: soulitzer
Differential Revision: D34586702
Pulled By: PaliC
fbshipit-source-id: 5627cf4ff411a4a04030f8b7726f84af979da213
(cherry picked from commit df6dddebb9fe078a6053a31033b5a40cc742fcf3)
Fixes #72368
As per the referenced issue, running test_ops as a single file takes around 3:30-4:00 hours on ASAN jobs.
Reference: pytorch_test_times.json
```
{
"commit": "39535fec6c3ff5bf7c2d322d096c59571c3295ed",
"JOB_BASE_NAME": "linux-xenial-py3.7-clang7-asan",
"job_times": {
"test_ops": 14928.355000000636, <- This test group is over 4hrs alone
```
----
Hence, test_ops is separated into the following parts:
1. TestGradients
2. TestJit
3. TestCommon and TestMathBits
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74297
Approved by: https://github.com/malfet
Summary:
Remove fx2trt test from oss CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72595
Test Plan: CI
Reviewed By: houseroad
Differential Revision: D34112595
Pulled By: wushirong
fbshipit-source-id: 02376ef0f25381eff31b72dcbf964c1966af9793
(cherry picked from commit e3d698a942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69735
We want to build a prototype of Megatron-LM so that we can apply PT-D ops to models like transformers and other Meta flagship models.
The basic idea of Megatron-LM is as follows:
1. Col-wise sharding of linear weight. Perform the linear op for the first layer.
2. Perform a math op (optional), such as ReLU or GeLU. We use GeLU in our example unit test. The input is from step 1.
3. Row-wise sharding of linear weight. Perform the linear op for the second layer. The input is from step 2.
We thereby save the communications needed to concatenate the col-wise sharding results and to spread the input to different ranks for row-wise sharding.
The changes are as follows:
1. Return a ShardedTensor for the col-wise sharding in the sharded_linear op.
2. Return a PartialTensor for the row-wise sharding in the sharded_linear op.
3. Leverage APIs already defined for `reshard` to merge/aggregate local results to a fully sync local result if needed.
4. Add helper function to create sharded tensor based on the local result.
5. Add a unit test to test the Megatron-LM idea mentioned above and compare with local ops, including the grad and optimizer so that we can ensure the correctness of the implementation.
6. Refactor the unit test of sharded linear to reflect the changes in the code.
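A conceptual sketch of the three steps above, where `w1` is col-wise sharded and `w2` is row-wise sharded (the ShardedTensor/PartialTensor return types follow this description, not an exact API):
```python
import torch
import torch.nn.functional as F

def megatron_block(x, w1, b1, w2, b2):
    h = F.linear(x, w1, b1)    # step 1: yields a ShardedTensor; no gather yet
    h = F.gelu(h)              # step 2: elementwise op runs locally per shard
    out = F.linear(h, w2, b2)  # step 3: yields a PartialTensor needing a sum
    return out                 # aggregation/reshard happens only once, here
```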
ghstack-source-id: 148273049
Test Plan: Unit test + CI
Reviewed By: pritamdamania87
Differential Revision: D32978221
fbshipit-source-id: 565fc92e7807e19d53b0261f8ace3945bef69e3e
(cherry picked from commit 344abe7520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70079
We defined a new concept named `PartialTensor`, which is an abstraction to represent tensors that need aggregation across multiple devices and multiple processes.
We also defined an API `reshard_output` to reshard a `PartialTensor` to a `Tensor`, or to reshard a `ShardedTensor` to a `ShardedTensor/Tensor`. This is done via the class `ModuleResharder`, which acts as a wrapper of the original module plus a reshard in the final step.
The `reshard` logic is defined in each class (`ShardedTensor` and `PartialTensor`).
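Conceptually, a PartialTensor pairs a local partial result with the reduce op needed to aggregate it; a rough sketch (class and method names are illustrative, and a process group must be initialized):
```python
import torch
import torch.distributed as dist

class PartialTensorSketch:
    """Local partial result plus the reduce op needed to aggregate it."""
    def __init__(self, local_partial: torch.Tensor, reduce_op=dist.ReduceOp.SUM):
        self.local_partial = local_partial
        self.reduce_op = reduce_op

    def reshard(self) -> torch.Tensor:
        # Materialize the full value by aggregating partials across ranks.
        out = self.local_partial.clone()
        dist.all_reduce(out, op=self.reduce_op)
        return out
```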
ghstack-source-id: 148273050
Test Plan: Unit test is in the next PR.
Reviewed By: pritamdamania87
Differential Revision: D33121037
fbshipit-source-id: 5f56617ea526b857c5b73df6e069697d428ec359
(cherry picked from commit 58b1457cbc)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72141
We have many sharding components currently:
torch.distributed._sharded_tensor, torch.distributed._sharding_spec,
torch.distributed._sharded_optimizer and more coming.
As a result, we are organizing all of this under the `torch.distributed._shard`
package. For BC reasons, I'm still keeping the old packages and having them just
reference the new package.
ghstack-source-id: 148150861
Test Plan: waitforbuildbot
Reviewed By: fduwjj
Differential Revision: D33904585
fbshipit-source-id: 057e847eb7521b536a3ee4e0f94871aacc752062
(cherry picked from commit 29a70dd7af)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71742
We have many sharding components currently:
torch.distributed._sharded_tensor, torch.distributed._sharding_spec,
torch.distributed._sharded_optimizer and more coming.
As a result, we are organizing all of this under the `torch.distributed.shard`
package. For BC reasons, I'm still keeping the old packages and having them just
reference the new package.
ghstack-source-id: 147899768
Test Plan: waitforbuildbot
Reviewed By: fduwjj, wanchaol
Differential Revision: D33755913
fbshipit-source-id: dc692b31e2607063d55dfcb3db33ec53961d5a5b
(cherry picked from commit 5b6885f358)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70145
Added support for torch.equal to ShardedTensor. This is really
helpful in terms of comparing two ShardedTensors.
ghstack-source-id: 146066939
Test Plan: waitforbuildbot
Reviewed By: wanchaol
Differential Revision: D33201714
fbshipit-source-id: 56adfc36e345d512c9901c56c07759bf658c745b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69734
Added support for `torch.equal` to ShardedTensor. This is really
helpful in terms of comparing two ShardedTensors.
Will implement `allclose` in a follow-up PR.
ghstack-source-id: 145301451
Test Plan: waitforbuildbot
Reviewed By: fduwjj, wanchaol
Differential Revision: D33004315
fbshipit-source-id: 786fe26baf82e1bb4fecfdbfc9ad4b64e704877f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68607
This PR adds ShardedOptimizer and an API to get module parameters along with ShardedTensor params. It allows users to use this optimizer wrapper to construct an optimizer that involves ShardedTensor.
state_dict support will be a follow-up diff.
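A minimal usage sketch (the import path and helper name are assumptions inferred from the test file):
```python
import torch
import torch.nn as nn
from torch.distributed._shard.sharded_optim import (  # path assumed
    ShardedOptimizer,
    named_params_with_sharded_tensor,
)

def make_optimizer(sharded_module: nn.Module) -> ShardedOptimizer:
    # Gather regular parameters *and* ShardedTensor params into one mapping,
    # then wrap a standard optimizer class around them.
    params = dict(named_params_with_sharded_tensor(sharded_module))
    return ShardedOptimizer(params, torch.optim.SGD, lr=0.1)
```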
ghstack-source-id: 145532834
Test Plan: python test_sharded_optim.py
Reviewed By: pritamdamania87
Differential Revision: D32539994
fbshipit-source-id: a3313c6870d1f1817fc3e08dc2fc27dc43bef743
Summary:
The error message was changed following a PR comment. And since the test doesn't run on CI, I forgot to update the test to catch the new error message.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69565
Reviewed By: mrshenli
Differential Revision: D32932982
Pulled By: albanD
fbshipit-source-id: a1da72b0ca735e72b481bc944039233094f1c422
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68822
Per title, we switched over c10d_gloo and nccl and the results look good
so far, so switch over the rest of them as well. After this, the only dist tests that
won't run in a subprocess are the pipe and fsdp tests, which historically haven't had
much flakiness.
ghstack-source-id: 144213522
Test Plan: CI
Reviewed By: H-Huang
Differential Revision: D32624330
fbshipit-source-id: 469f613e5b0e4529e6b23ef259d948837d4af26b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68821
Continuing the effort to move most distributed tests to run in subprocesses
for better reproducibility and reduced flakiness.
ghstack-source-id: 144213520
Test Plan: CI
Reviewed By: H-Huang
Differential Revision: D32624199
fbshipit-source-id: 04448636320554d7a3ab29ae92bc1ca9fbe37da2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68504
Per title
ghstack-source-id: 143928767
Test Plan: CI
Reviewed By: H-Huang
Differential Revision: D32485100
fbshipit-source-id: a55687aea4af69e3830aee6f0278550c72f142c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68503
Per title
ghstack-source-id: 143928768
Test Plan: CI
Reviewed By: H-Huang
Differential Revision: D32484990
fbshipit-source-id: 6682f46256af0da5153e5087a91a7044156dd17f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67400
c10d/frontend.cpp was originally proposed to introduce a pure C++ API and use TorchBind to share python-level APIs with TorchScript. This is no longer needed, so delete it to reduce code redundancy.
ghstack-source-id: 143910066
Test Plan: wait for ci
Reviewed By: navahgar
Differential Revision: D31979270
fbshipit-source-id: 6ceb8b53d67ab8f9aef44b34da79346dfbb51225
Summary:
Context: https://github.com/pytorch/pytorch/issues/67061
Use `run_test.py`'s provided flag `--subprocess`, passed in like `extra_unittest_args=["--subprocess"]`, when running test_distributed_spawn. This ensures that each test is run separately in its own process. The goal is to more closely simulate how a developer would run a single test when reproducing a CI failure, and to make reproducibility easier in general.
Also, when a test fails, print out the exact command that was issued so the developer knows how to reproduce it.
For example, when a test fails, it will print out something like the following to the logs:
```
Test exited with non-zero exitcode 1. Command to reproduce: BACKEND=gloo WORLD_SIZE=3 /fsx/users/rvarm1/conda/envs/pytorch/bin/python distributed/test_distributed_spawn.py -v TestDistBackendWithSpawn.test_Backend_enum_class
```
Running test_distributed_spawn still uses the same command as before:
`
python test/run_test.py --verbose -i distributed/test_distributed_spawn
`
as seen in [distributed contributing](https://github.com/pytorch/pytorch/blob/master/torch/distributed/CONTRIBUTING.md) guide.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67901
Reviewed By: cbalioglu, mruberry
Differential Revision: D32225172
Pulled By: rohan-varma
fbshipit-source-id: 7e8d4c7a41858044bd2a4e0d1f0bf8f1ac671d67
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66101
Updated description:
This PR tests the functionalization pass in python in two ways. For each of the test programs that I have in `test_functionalization.py`, it:
- runs the program with and without functionalization, and asserts the outputs and (potentially mutated) inputs are equal in both cases
- runs the program with `LoggingTensor`, and uses expecttests on the resulting graph. I manually confirm that the graphs look reasonable and only contain functional ops.
Mechanically, the changes include:
- factoring out `LoggingTensor` into a testing util so it can be re-used in multiple tests
- adding some private python api's in the `torch` namespace as hooks that I can use during testing
In the original version of this PR, I also added some fixes to the `_make_subclass()` function in python: allowing you to pass in strides and storage_offset. I kept them in mainly because the changes were already there.
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D31942095
Pulled By: bdhirsh
fbshipit-source-id: 90ff4c88d461089704922e779571eee09c21d707
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67188
This diff/PR is trying to implement the ShardedEmbeddingBag using the ShardedTensor.
We support both row-wise and column-wise sharding of the embedding bag. The detailed logic can be found in the comment.
Several caveats:
1. Only the sharding of one weight is supported now.
2. We support limited input params for the op. Support for more params is on the way.
3. We only support chunk sharding for now.
4. We only support a single local shard per rank for now.
Some other changes include:
1. Refactor the ShardedEmbedding code so that the common logic can be reused.
2. Fix tiny typos and a corner case in the API `get_chunked_dim_size`, where it would return -1 if we set dim_size = 5, split_size = 2, idx = 3. (This is a valid case because when chunks = 4 and dim_size = 5, the split_size is 2.)
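A hedged reconstruction of the fixed helper, matching the numbers above:
```python
def get_chunked_dim_size(dim_size: int, split_size: int, idx: int) -> int:
    # Size of chunk `idx` when a dimension of size `dim_size` is split into
    # chunks of `split_size`. Clamping at 0 fixes the corner case: with
    # dim_size=5, split_size=2, idx=3 the naive `dim_size - split_size * idx`
    # returns -1, while the correct chunk size is 0.
    return max(min(dim_size, split_size * (idx + 1)) - split_size * idx, 0)
```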
ghstack-source-id: 142325915
Test Plan: Unit test and CI
Reviewed By: pritamdamania87
Differential Revision: D31749458
fbshipit-source-id: ed77e05e4ec94ef1a01b1feda8bbf32dc5d5da1b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63997
Use `__torch_function__` to extend torch.nn.init.uniform_.
The init is done in SPMD fashion. Note that ideally we want to aggregate sharded tensors into a global tensor, init it, and reshard. It's fine to run it SPMD since uniform is i.i.d. (independent and identically distributed).
Also enable the unit test in test_linear.py for OSS testing.
Test Plan:
a) Unit Test
(pytorch) ... $ python test/distributed/_sharded_tensor/ops/test_init.py TestShardedTensorNNInit --v
(pytorch) ... $ python test/distributed/_sharded_tensor/ops/test_linear.py --v (before this change, running this command was a no-op)
or b) Manual run: Instruction here: https://docs.google.com/document/d/1_m1Hdo5w51-hhPlZ_F8Y6PIWrN7UgJZqiSpARYvhsaE/edit#
Imported from OSS
Reviewed By: pritamdamania87, anjali411
Differential Revision: D30563017
fbshipit-source-id: d1859f7682235bcb44515efc69ca92bc5e34fce1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66604
This diff/PR is trying to implement the ShardedEmbedding using the ShardedTensor.
Several caveats:
1. We support limited input params for the op. Support for more params is on the way.
2. We only support chunk sharding for now.
3. We only support a single local shard per rank for now.
ghstack-source-id: 141056130
Test Plan: Unit test and CI
Reviewed By: pritamdamania87
Differential Revision: D31544556
fbshipit-source-id: cc867dcba8c11e6f4c7c3722488908f5108cc67f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63881
This PR includes the minimal set of features to make FSDP work, such as sharding, core data flow, and hooks. More tests will be added in follow-up PRs. Tests are refactored to utilize common PyTorch utils. The code is also refactored a bit. Alternative ways to replace ".data" usage in this PR are still being discussed offline.
Test Plan: unit tests
Reviewed By: mrshenli
Differential Revision: D30521673
fbshipit-source-id: 9a23390dd7c925749604c6860e08fbe39ddc5500
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64128
This PR implements a sharded nn.Linear layer using ShardedTensors with
the following limitations:
1) Works only for ChunkShardingSpec.
2) Implementation is only aimed to demonstrate functionality and is most likely
not performant at all.
The PR also introduces a `shard_parameter` API to easily shard parameters of
`nn.Modules`. This also has the following limitations:
1) Works only for ChunkShardingSpec.
2) Is not performant since it uses broadcast instead of scatter, as
ProcessGroupNCCL doesn't yet support scatter.
Overall user API for running a sharded linear would be something like this:
```python
# SPMD programming paradigm running the same code on all nodes.
fc = nn.Linear(10, 10)
# Set up sharding.
sharding_spec = ChunkShardingSpec(...)
shard_parameter(fc, 'weight', sharding_spec, src_rank=0)
# Run as a normal linear layer.
inp = torch.rand(10, 10)
output = fc(inp)
```
ghstack-source-id: 138500985
Test Plan:
1) unit tests.
2) waitforbuildbot
Reviewed By: wanchaol, bowangbj
Differential Revision: D30621215
fbshipit-source-id: 1aa7478568c18a4572f6c3462fdf24a4cbde01d6
Summary:
There were several reports of the target determinator incorrectly skipping
tests; the most recent one is https://github.com/pytorch/pytorch/issues/64902.
Let's disable it until it can be further stabilized.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64921
Reviewed By: seemethere, janeyx99
Differential Revision: D30901186
Pulled By: malfet
fbshipit-source-id: 531afd2d390c6b51f727330d5dd1882d70b6fdde
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64253
Follow-up to D30496178 (f4aff3a346) to move the rest of the distributed tests to their own jobs for Linux GHA.
ghstack-source-id: 137233785
Test Plan: CI
Reviewed By: walterddr
Differential Revision: D30662999
fbshipit-source-id: f7cfbc0d1223aca52120f17f9da987d70fda8de6
Summary:
Introduce a `discover_tests` function that globs for all Python files
starting with `test_` in the test folder, excluding subfolders that are
executed differently.
Fixes https://github.com/pytorch/pytorch/issues/64178
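A minimal sketch of such a globbing function (the blocklist argument is illustrative):
```python
from pathlib import Path
from typing import List

def discover_tests(test_dir: Path, blocked_subdirs: List[str]) -> List[str]:
    """Glob for test_*.py under test_dir, skipping subfolders run differently."""
    found = []
    for path in test_dir.rglob("test_*.py"):
        rel = path.relative_to(test_dir)
        if any(str(rel).startswith(blocked) for blocked in blocked_subdirs):
            continue
        found.append(str(rel.with_suffix("")))  # e.g. "distributed/test_store"
    return sorted(found)
```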
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64246
Reviewed By: walterddr, seemethere
Differential Revision: D30661652
Pulled By: malfet
fbshipit-source-id: a52e78ec717b6846add267579dd8d9ae75326bf9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64197
Removes this line as the test is gone.
ghstack-source-id: 136986275
Test Plan: CI
Reviewed By: walterddr
Differential Revision: D30642929
fbshipit-source-id: a0c7dfdfb35a4a7f7ec1b881dbea53d85136012c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63534
In this PR:
* We have changed the default dtype of `AutocastCPU` from `float16` to `bfloat16`, as discussed in https://github.com/pytorch/pytorch/pull/61002.
* We also update the list of operations that need casting to `lower_precision_fp` or `float32`.
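A small illustration of the effect, assuming the `torch.cpu.amp.autocast` context manager:
```python
import torch

a = torch.randn(8, 8)
b = torch.randn(8, 8)
# With bfloat16 as the new default, eligible ops (e.g., matmul) are cast
# down to bfloat16 on CPU without passing a dtype explicitly.
with torch.cpu.amp.autocast():
    c = a @ b
assert c.dtype == torch.bfloat16
```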
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D30644914
Pulled By: ezyang
fbshipit-source-id: 8b93485ba452b3759611e3f0ac88e920fe495ac1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63496
This PR adds a (private) enable_python_mode context manager.
(see torch/utils/_python_dispatch.py).
enable_python_mode accepts the type of a __torch_dispatch__ object
as its argument. Whenever an operator gets called inside of the
context manager, it dispatches to the __torch_dispatch__ of
the passed-in type.
Example usage:
```
with enable_python_mode(LoggingTensor):
    z = torch.empty([])
    assert isinstance(z, LoggingTensor)
```
There are quite a few changes that were made to support this.
First, we added TorchDispatchTypeObject, a C++ struct that represents the
type of a `__torch_dispatch__` object (e.g. LoggingTensor).
It holds both the PyObject* representing the class and a PyInterpreter*
so we know which Python interpreter it came from.
Next, we updated the concrete_dispatch_fn in python_variable.cpp to accept
a `const std::shared_ptr<TorchDispatchTypeObject>&` argument. When this
is null, dispatching happens as usual. When it is non-null, we prepend
the TorchDispatchTypeObject's PyObject* to the overloaded args list so that
it is considered first for dispatch.
To get that to work, we changed how `handle_torch_dispatch_no_python_arg_parser`
works. The "overloaded args list" previously only consisted of Tensor PyObjects,
but now it can have types in addition to Tensors!
- We renamed `append_overloaded_arg` to `append_overloaded_tensor`
- We added a new `append_overloaded_type` that appends a type to
overloaded_args
- We added special handling in `handle_torch_dispatch_no_python_arg_parser`
and `append_overloaded_arg` to handle types in addition to Tensors.
Then, there is PythonMode and PythonModeTLS.
- We reuse the DispatchKey::Python dispatch key as a mode key
- We use PythonMode::enter and PythonMode::exit to enable/disable
DispatchKey::Python and set the PythonModeTLS.
- PythonModeTLS stores a TorchDispatchTypeObject as metadata.
- PythonMode is in libtorch_python, and PythonModeTLS is in ATen.
This split is due to the libtorch_python library boundary (because we need
to save TLS in ATen/ThreadLocalState)
- We modify the PythonFallbackKernel to look up
the relevant TorchDispatchTypeObject (if Python Mode is active) and
dispatch using it.
There are two more miscellaneous changes:
- internal_new_from_data (torch/csrc/utils/tensor_new.cpp) gets an
exclude guard. enable_python_mode currently does not handle
torch.tensor and the exclude guard is to prevent a bug.
Future:
- This PR does not allow for the nesting of Python modes. In the future we
should be able to enable this with a more sane no_dispatch API and by changing
the TLS to a stack. For now I did not need this for CompositeImplicitAutograd testing.
Test Plan: - new tests
Reviewed By: malfet, albanD
Differential Revision: D30543236
Pulled By: zou3519
fbshipit-source-id: ef5444d96a5a957d1657b7e37dce80f9a497d452
Summary:
This is in response to a feature request from some folks on the core team to have a local command that would only run relevant "core" tests. The idea is to give developers a smoke test option to run locally before making a PR, in order to verify their changes did not break core functionality. These smoke tests are not targeted to be short, but rather relevant.
This PR enables that by allowing developers to run `python test/run_test.py --core` or `python test/run_test.py -core` in order to run the CORE_TEST_LIST, which is currently test_nn.py, test_torch.py, and test_ops.py.
I am not the best person to judge what should be considered "core", so please comment which tests should be included and/or excluded from the CORE_TEST_LIST!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63976
Test Plan:
```
(pytorch) janeyx@janeyx-mbp test % python run_test.py --core -v
Selected tests: test_nn, test_ops, test_torch
Running test_nn ... [2021-08-25 14:48:28.865078]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', 'test_nn.py', '-v'] ... [2021-08-25 14:48:28.865123]
test_to (__main__.PackedSequenceTest) ... ok
test_to_memory_format (__main__.PackedSequenceTest) ... ok
```
Reviewed By: walterddr
Differential Revision: D30575560
Pulled By: janeyx99
fbshipit-source-id: 3f151982c1e315e50e60cb0d818adaea34556a04
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63809
This moves out the modulefinder determinator to `tools/testing` since it is supposed to be CI-only. This also simplifies run_test.py a little bit.
Test Plan: Imported from OSS
Reviewed By: malfet, seemethere, janeyx99
Differential Revision: D30497438
Pulled By: driazati
fbshipit-source-id: 1d203037af5af6a20c1e7812da935e7cbb5cd82f
Summary:
Currently, distributed tests are mixed within test_python.
We would like to split the distributed tests into their own batch, thus we need to split them out.
This adds an option to include/exclude distributed tests with CUSTOM_HANDLERS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63147
Test Plan:
- locally run with the added run_test.py options.
- CI
Dependency: found a bug in the mpiexec test; we need https://github.com/pytorch/pytorch/issues/63580 to fix it first.
Reviewed By: bdhirsh
Differential Revision: D30496178
Pulled By: walterddr
fbshipit-source-id: 7903a57b619f2425028028f944211938823918a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63357
Adds the ability to set CONTINUE_THROUGH_ERROR as an environment
variable so that we can easily set it without having to add the flag
directly
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: astaff
Differential Revision: D30351108
Pulled By: seemethere
fbshipit-source-id: 767fa9bd24e1399f359eb24d16f6cc985a2d7173
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63054
1) Ensure these tests are skipped in environments without any GPUs.
2) Add the test to run_test.py
ghstack-source-id: 135595698
Test Plan: waitforbuildbot
Reviewed By: wanchaol
Differential Revision: D30239159
fbshipit-source-id: 21b543ba72e8d10182bc77e7ae1fd34fd4096509
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62937
Reland due to a Windows + CUDA failure; fixed by running it on Gloo on Windows even with CUDA.
ghstack-source-id: 135306176
Test Plan: ci
Reviewed By: mrshenli
Differential Revision: D30177734
fbshipit-source-id: 7625746984c8f858648c1b3632394b98bd4518d2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62774
Gates DistributedOptimizer, which relies on RRef, on whether RPC is available. This should enable ZeRO to work on Windows, as Windows will not try to import the DistributedOptimizer. If this works as expected, we can enable the Windows tests for the functional/local SGD optimizers as well.
ghstack-source-id: 135216642
Test Plan: CI
Reviewed By: pbelevich
Differential Revision: D30117838
fbshipit-source-id: e6365a910a3d1ca40d95fa6777a7019c561957db
Summary:
This PR contains the initial version of `ModuleInfo` for use in testing modules. The design philosophy taken here is to start small and simple and build out / refactor as needed when more test coverage or `ModuleInfo` entries are added. As such, it's not intended for general usage yet. The PR contains the following:
* (new file) `torch/testing/_internal/common_modules.py`
* `ModuleInfo` definition - metadata for each module to use in testing
* `module_db` - the actual `ModuleInfo` database; currently contains entries for two modules
* `ModuleInput` - analogous to `SampleInput` from OpInfo; contains `FunctionInput`s for both constructor and forward pass inputs
* Constructor and forward pass inputs are tied together within a `ModuleInput` because they are likely correlated
* `FunctionInput` - just contains args and kwargs to pass to a function (is there a nicer way to do this?)
* `modules` decorator - analogous to `ops`; specifies a set of modules to run a test over
* Some constants used to keep track of all modules under torch.nn:
* `MODULE_NAMESPACES` - list of all namespaces containing modules
* `MODULE_CLASSES` - list of all module class objects
* `MODULE_CLASS_NAMES` - dict from module class object to nice name (e.g. torch.nn.Linear -> "nn.Linear")
* (new file) `test/test_modules.py`
* Uses the above to define tests over modules
* Currently, there is one test for demonstration, `test_forward`, which instantiates a module, runs its forward pass, and compares it to a reference, if one is defined
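A hedged sketch of the shapes of these helpers (illustrative dataclasses, not the real definitions in common_modules.py):
```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple

@dataclass
class FunctionInput:
    args: Tuple[Any, ...] = ()                            # positional args for the call
    kwargs: Dict[str, Any] = field(default_factory=dict)  # keyword args for the call

@dataclass
class ModuleInput:
    constructor_input: FunctionInput    # inputs to Module.__init__
    forward_input: FunctionInput        # inputs to Module.forward
    reference_fn: Optional[Any] = None  # optional reference to compare against
```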
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61935
Reviewed By: mruberry
Differential Revision: D29881832
Pulled By: jbschlosser
fbshipit-source-id: cc05c7d85f190a3aa42d55d4c8b01847d1efd57f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61756
DDP will support running the optimizer as a communication hook with
optimizers that support a per-parameter/gradient step function `step_param`.
Add parity tests as we implement more optimizers that support step_param, to
ensure parity with regular optimizers.
ghstack-source-id: 134330378
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D29727549
fbshipit-source-id: 18977c896f12b8e478298488b298fd107affcf5f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59077
Fixes #58549
`from_buffer` constructs a tensor object from an already allocated buffer through
CPython's buffer protocol. Besides the standard `dtype`, `count`, and `offset` parameters,
this function also accepts:
- `device`: where the buffer lives
- `requires_grad`: should autograd record operations on the new tensor
A new test file _test_buffer_protocol.py_ was created. Currently, only CPU tests are
implemented. That's because neither PyTorch nor Numba implements CPython's buffer
protocol, so there's no way to create a CUDA buffer with the existing
dependencies (we could use PyCUDA for that, though).
At the moment, if `device` differs from the device the buffer actually lives on, two things
may happen:
- `RuntimeError`, if `device='cuda'`
- Segmentation fault (not tested -- see above), if `device='cpu'`
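A small CPU example of the behavior described, hedged on the final public name (`torch.frombuffer` in current PyTorch):
```python
import array
import torch

buf = array.array('f', [1.0, 2.0, 3.0, 4.0])
# offset is in bytes: skip the first float32, then view two elements.
t = torch.frombuffer(buf, dtype=torch.float32, count=2, offset=4)
print(t)       # tensor([2., 3.])
buf[1] = 42.0  # the tensor shares memory with the buffer
print(t[0])    # tensor(42.)
```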
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D29870914
Pulled By: mruberry
fbshipit-source-id: 9fa8611aeffedfe39c9af74558178157a11326bb
Summary:
and into the tools/ folder.
Currently run_test.py invokes tools/test_selections.py to:
1. download and analyze what test files to run
2. download and parse S3 stats and pass the info to local files
3. let common_utils.py use the downloaded S3 stats to determine what test cases to run
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61479
Reviewed By: janeyx99
Differential Revision: D29661986
Pulled By: walterddr
fbshipit-source-id: bebd8c474bcc2444e135bfd2fa4bdd1eefafe595
Summary:
run_test.py currently does lots of downloading and test file/suite/case parsing, and it doesn't work well outside of the CI environment.
This restructures run_test.py, creates tools/test/test_selections.py, and moves all test selection logic (reordering, categorizing slow tests, creating shards) there.
Follow-up PRs should:
- refactor the file read/write logic entangled inside test_selections.py into the stats/ folder
- restructure and add network-independent test logic to test_test_selections.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61124
Test Plan:
- tools/test
- CI
Related PR:
This follows the refactoring example in: https://github.com/pytorch/pytorch/issues/60373
Reviewed By: malfet
Differential Revision: D29558981
Pulled By: walterddr
fbshipit-source-id: 7f0fd9b4720a918d82918766c002295e8df04169
Summary:
Changes including:
- introduced `linter/`, `testing/`, `stats/` folders in `tools/`
- move appropriate scripts into these folders
- change grepped references in the pytorch/pytorch repo
Next step
- introduce `build/` folder for build scripts
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60473
Test Plan:
- CI (this is important b/c pytorch/test-infra also relies on some script references)
- tools/tests/
Reviewed By: albanD
Differential Revision: D29352716
Pulled By: walterddr
fbshipit-source-id: bad40b5ce130b35dfd9e59b8af34f9025f3285fd
Summary:
This PR is a first step in unifying our environment variables across CI (so that we don't have `CIRCLE_BLAH` in our GHA workflows, for example), though I'd like for this PR to be more for discussion about how best to consolidate these variables.
This small change only changes most CIRCLE_JOB references in our code to be JOB_BASE_NAME, as that seems the closest GHA (and ROCm) equivalent. Currently, JOB_BASE_NAME is defined as:
- in Circle: CIRCLE_JOB (name of the job, like `pytorch_linux_bionic_py3_8_gcc9_coverage_test1`)
- in GHA: the build_environment with a `-build` or `-test` tacked to the end , e.g., `pytorch-linux-xenial-cuda10.2-cudnn7-py3.6-gcc7-test`
- in ROCm: I don't actually know, but it's important for ROCm test sharding as shown in https://github.com/pytorch/pytorch/pull/60409
I am not sure if this is the intention for JOB_BASE_NAME so it is open to discussion what variable we should use if not JOB_BASE_NAME. I also don't know if it's worth the effort consolidating all these variables, so discussion is also highly encouraged there!
Next steps:
- Consolidate more CIRCLE_* references, maybe into CI_* equivalents?
- We use BUILD_ENVIRONMENT everywhere in Circle though the variable is inconsistent across binary vs CI jobs and across platforms. For example, for linux tests and builds, BUILD_ENVIRONMENT contains the `_test` and `_build` suffixes, but the windows jobs don't. In GHA, BUILD_ENVIRONMENT is similar to how it's defined in windows jobs on Circle. This inconsistency is confusing, and we can probably do something about it. I'm thinking of switching out BUILD_ENVIRONMENT for JOB_BASE_NAME in our test scripts where we actually mean JOB_BASE_NAME.
- We should probably document the meaning of the variables we consolidate somewhere, preferably in a README in some unified `ci/` folder. For example, it seems BUILD_ENVIRONMENT is supposed to capture the build environment, whereas JOB_BASE_NAME is supposed to capture the environment _and_ whether we're building or testing.
Notes:
- I did not replace CIRCLE_JOB references in third_party directories
- Previously, print_test_stats reported CIRCLE_JOB as only the build environment for GHA workflows, and I think tacking on the `build` or `test` will not harm anything, though I may be wrong.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60425
Reviewed By: seemethere, samestep
Differential Revision: D29333882
Pulled By: janeyx99
fbshipit-source-id: a82080e6205a03a1183035011ce59698eca06748
Summary:
Adding windows CUDA smoke tests on PRs (master should run the full suite).
Next step:
- Automate data update so we get a new smoke test list without manual effort
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59686
Test Plan: https://github.com/pytorch/pytorch/actions/runs/958296267 The sharded smoke tests take long still because of dependencies installation
Reviewed By: walterddr
Differential Revision: D29243533
Pulled By: janeyx99
fbshipit-source-id: dde7ba127fa15c95bda0e833cc5311598fb85e2b
Summary:
This is branched off of https://github.com/pytorch/pytorch/issues/59970 to only shard on Linux so far (we're running into issues with Windows gflags).
This would enable sharding of tests on a few Linux jobs on GHA, allowing tts to be essentially halved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60124
Reviewed By: zou3519
Differential Revision: D29204211
Pulled By: janeyx99
fbshipit-source-id: 1cc31d1eccd564d96e2aef14c0acae96a3f0fcd0
Summary:
Currently, S3 test stats don't support PR stats parsing.
Changes to the S3 stats parser:
1. Stats are uploaded to `test_times/{sha1}/{job}` and `pr_test_times/{pr}/{sha1}/{job}` separately, thus we need parsing logic for both.
2. We need to attach a timestamp for PR stats parsing for ordering, since PR commits can be force-pushed.
Changes to run_test.py:
1. Reorder based on previous PR stats if available.
2. Fall back to the file-change option if not enabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60026
Test Plan:
- CI.
- local repro: plz run:
```
CIRCLE_JOB="pytorch_linux_bionic_py3_6_clang9_noarch_test" CIRCLE_PR_NUMBER=60057 IN_CI=1 ENABLE_PR_HISTORY_REORDERING=1 python test/run_test.py
```
Reviewed By: samestep
Differential Revision: D29164754
Pulled By: walterddr
fbshipit-source-id: 206688e0fb0b78d1c9042c07243da1fbf88a924b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59840
moving these tests to their own standalone file. No meaningful code changes.
ghstack-source-id: 131359162
Test Plan: CI
Reviewed By: cbalioglu
Differential Revision: D29012664
fbshipit-source-id: 348870016509a6ed7e69240fa82bccef4a12d674
Summary:
Instead of having specific logic to handle run-specified-test-cases, we provide flags to override include or bring-to-front with the SPECIFIED_TEST_CASES_FILE.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59704
Reviewed By: janeyx99
Differential Revision: D29038425
Pulled By: walterddr
fbshipit-source-id: 803d3555813437c7f287a22f7704106b0c609919
Summary:
Do not reorder tests unless running in CI (IN_CI); reordering makes local development test ordering nondeterministic. Most of us branch out from viable/strict, not the head of master.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59565
Reviewed By: ejguan
Differential Revision: D28943906
Pulled By: walterddr
fbshipit-source-id: e742e7ce4b3fc017d7563b01e93c4cd774d0a537
Summary:
The run-specified-test-cases option allows us to specify a list of test cases to run via a CSV with minimally two columns: test_filename and test_case_name.
This PR also adds .json to some files we use, for better clarity.
Usage:
`python test/run_test.py --run-specified-test-cases <csv_file>` where the csv file can look like:
```
test_filename,test_case_name,test_total_time,windows_only_failure_sha_count,total_sha_count,windows_failure_count,linux_failure_count,windows_total_count,linux_total_count
test_cuda,test_cudnn_multiple_threads_same_device,8068.8409659525,46,3768,53,0,2181,6750
test_utils,test_load_standalone,8308.8062920459,14,4630,65,0,2718,8729
test_ops,test_forward_mode_AD_acosh_cuda_complex128,91.652619369806,11,1971,26,1,1197,3825
test_ops,test_forward_mode_AD_acos_cuda_complex128,91.825633094915,11,1971,26,1,1197,3825
test_profiler,test_source,60.93786725749,9,4656,21,3,2742,8805
test_profiler,test_profiler_tracing,203.09352795241,9,4662,21,3,2737,8807
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59487
Test Plan:
Without specifying the option, everything should be as they were before.
Running `python test/run_test.py --run-specified-test-cases windows_smoke_tests.csv` resulted in this paste P420276949 (you can see internally). A snippet looks like:
```
(pytorch) janeyx@janeyx-mbp pytorch % python test/run_test.py --run-specified-test-cases windows_smoke_tests.csv
Loading specified test cases to run from windows_smoke_tests.csv.
Processed 28 test cases.
Running test_cpp_extensions_jit ... [2021-06-04 17:24:41.213644]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', 'test_cpp_extensions_jit.py', '-k', 'test_jit_cuda_archflags'] ... [2021-06-04 17:24:41.213781]
s
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK (skipped=1)
...
```
With pytest, an example executable would be:
```
Running test_dataloader ... [2021-06-04 17:37:57.643039]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', '-m', 'pytest', 'test_dataloader.py', '-v', '-k', 'test_segfault or test_timeout'] ... [2021-06-04 17:37:57.643327]
```
Reviewed By: samestep
Differential Revision: D28913223
Pulled By: janeyx99
fbshipit-source-id: 0d1f9910973426b8756815c697b483160517b127
Summary:
It would be most accurate if sharding occurred after all other changes to selected_tests were complete.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59583
Reviewed By: ejguan
Differential Revision: D28944737
Pulled By: janeyx99
fbshipit-source-id: a851473948a5ec942ffeeedeefdc645536a3d9f7
Summary:
Partially addresses https://github.com/pytorch/pytorch/issues/55340
**Overview**
This factors out `FileStoreTest`, `HashStoreTest`, `PrefixFileStoreTest`, `TCPStoreTest`, `PrefixTCPStoreTest`, `PythonStoreTest`, `RendezvousTest`, `RendezvousEnvTest`, `RendezvousFileTest`, and `RendezvousTCPTest` from `test_c10d_common.py` to a new file `test_store.py`.
Additionally, unused import/initialization statements are removed from `test_c10d_common.py`, and the minimal set of import/initialization statements are used for `test_store.py`.
Also, this changes `.jenkins/pytorch/multigpu-test.sh`, `.jenkins/pytorch/win-test-helpers/test_distributed.bat`, and `test/run_test.py` to include the new `test_store.py`.
**Testing**
All commands shown are run on an AI AWS cluster.
I check the Store tests:
```
python test/distributed/test_store.py
```
I also check `test_c10d_common.py` since it is the source of the refactored code. In addition, I check `test_c10d_nccl.py` and `test_c10d_gloo.py` since they import from `test_c10d_common.py`; those two should be the only test files depending on `test_c10d_common.py`.
```
python test/distributed/test_c10d_common.py
python test/distributed/test_c10d_nccl.py
python test/distributed/test_c10d_gloo.py
```
`test_c10d_gloo.py` produces warnings about how using sparse tensors in TorchScript is experimental, but the warnings do not result from this PR's changes.
**Testing Issues** (To Be Revisited)
```
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py
```
Running the above command fails three tests (written as `[Test]`: `[Error]`):
- `ProcessGroupGlooWrapperTest.test_collective_hang`: `RuntimeError: [../third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [10.200.24.101]:15580`
- `CommTest.test_broadcast_coalesced_gloo_cuda`: `RuntimeError: cuda runtime error (3) : initialization error at ../aten/src/THC/THCGeneral.cpp:54`
- `CommTest.test_sequence_num_incremented_gloo_default`: `RuntimeError: cuda runtime error (3) : initialization error at ../aten/src/THC/THCGeneral.cpp:54`
However, running each of the following yields no errors:
```
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py -k test_collective_hang
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py -k test_broadcast_coalesced_gloo_cuda
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py -k test_sequence_num_incremented_gloo_default
```
This suggests the existence of some inadvertent state dependency between tests (e.g. improper cleanup). I have not explored this further yet. In particular, I do not have a solid understanding of the tests to be able to explain why using `pytest` and `gpurun` induces the failure (since notably, running the `.py` directly shows no issue).
Similarly, running the following yields 47 errors:
```
WORLD_SIZE=4 BACKEND=nccl gpurun pytest test/distributed/test_c10d_nccl.py
```
The errors seem to all be simply complaining about the usage of `fork()` instead of `spawn()` for CUDA multiprocessing. Though, most of the tests in `test_c10d_nccl.py` ask for at least 2 CUDA devices, so I think that the `gpurun` is warranted (assuming that the test file does not need to be run partially on different machines).
Both `test_c10d_common.py` and `test_store.py` work fine with `pytest`.
**Other Notes**
I noticed that `torch.distributed` is imported both as `dist` and as `c10d` and that `c10d` is used throughout the Store tests. I was curious if this is intentional (as opposed to using `dist` to refer to `torch.distributed`). Also, the original [issue](https://github.com/pytorch/pytorch/issues/55340) suggests that the Store tests do not use multiprocessing, but I saw that `torch.multiprocessing` is still used in `TCPStoreTest`.
The links for the Store files in the `CONTRIBUTING.md` [file](https://github.com/pytorch/pytorch/blob/master/torch/distributed/CONTRIBUTING.md) are broken. This can be fixed in a separate PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59271
Reviewed By: jbschlosser, mrshenli
Differential Revision: D28856920
Pulled By: andwgu
fbshipit-source-id: 630950cba18d34e6b5de661f5a748f2cddc1b446
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55728
Full design: https://github.com/pytorch/pytorch/issues/55207
This PR introduces ChunkShardingSpec (SingleShardingSpec in the design). We used
the name ChunkShardingSpec since it is very similar to `torch.chunk` in terms
of how a Tensor is split up, and it feels clearer than SingleShardingSpec.
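A minimal usage sketch, assuming the spec lives at `torch.distributed._shard.sharding_spec` (as it does in current torch) and that placements follow the `rank:N/device` string convention:
```python
from torch.distributed._shard.sharding_spec import ChunkShardingSpec

# Sketch assuming a 4-rank GPU setup; like torch.chunk, the tensor is
# split along `dim` into contiguous chunks, one per placement entry.
spec = ChunkShardingSpec(
    dim=0,
    placements=[
        "rank:0/cuda:0",
        "rank:1/cuda:1",
        "rank:2/cuda:2",
        "rank:3/cuda:3",
    ],
)
```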
ghstack-source-id: 129603318
Test Plan: waitforbuildbot
Reviewed By: SciPioneer
Differential Revision: D27694108
fbshipit-source-id: c8764abe6a4d5fc56d023fda29b74b5af2a73b49
Summary:
fixes https://github.com/pytorch/pytorch/issues/58632.
Added several skips that relate to test asserts and MKL. Will address them in a separate PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58666
Reviewed By: seemethere, janeyx99
Differential Revision: D28607966
Pulled By: walterddr
fbshipit-source-id: 066d4afce2672e4026334528233e69f68da04965
Summary:
Some machines don't have a versionless `python` on their PATH, which breaks these existing shebangs.
I'm assuming that all the existing versionless `python` shebangs are meant to be `python3` and not `python2`; please let me know if my assumption was incorrect for any of these.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58275
Test Plan: CI.
Reviewed By: zhouzhuojie
Differential Revision: D28428143
Pulled By: samestep
fbshipit-source-id: 6562be3d12924db72a92a0207b060ef740f61ebf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56868
See __init__.py for a summary of the tool.
The following sections are present in this initial version
- Model Size. Show the total model size, as well as a breakdown by
stored files, compressed files, and zip overhead. (I expect this
breakdown to be a bit more useful once data.pkl is compressed.)
- Model Structure. This is basically the output of
`show_pickle(data.pkl)`, but as a hierarchical structure.
Some structures cause this view to crash right now, but it can be
improved incrementally.
- Zip Contents. This is basically the output of `zipinfo -l`.
- Code. This is the TorchScript code. It's integrated with a blame
window at the bottom, so you can click "Blame Code", then click a bit
of code to see where it came from (based on the debug_pkl). This
currently doesn't render properly if debug_pkl is missing or
incomplete.
- Extra files (JSON). JSON dumps of each json file under /extra/, up to
a size limit.
- Extra Pickles. For each .pkl file in the model, we safely unpickle it
with `show_pickle`, then render it with `pprint` and include it here
if the size is not too large. We aren't able to install the pprint
hack that the show_pickle CLI uses, so we get one-line rendering for
custom objects, which is not very useful. Built-in types look fine,
though. In particular, bytecode.pkl seems to look fine (and we
hard-code that file to ignore the size limit).
I'm checking in the JS dependencies to avoid a network dependency at
runtime. They were retrieved from the following URLS, then passed
through a JS minifier:
https://unpkg.com/htm@3.0.4/dist/htm.module.js?module
https://unpkg.com/preact@10.5.13/dist/preact.module.js?module
Test Plan:
Manually ran on a few models I had lying around.
Mostly tested in Chrome, but I also poked around in Firefox.
Reviewed By: dhruvbird
Differential Revision: D28020849
Pulled By: dreiss
fbshipit-source-id: 421c30ed7ca55244e9fda1a03b8aab830466536d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56386
The diff resolves a bug around incorrect handler resolution:
`_create_static_handler` pointed towards etcd, and `_create_etcd_handler` pointed towards static.
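A minimal sketch of the bug class being fixed, with hypothetical stand-in factories (the real registration lives in the elastic rendezvous code):
```python
def _create_static_handler(params):
    return "static handler"  # stand-in for the real construction

def _create_etcd_handler(params):
    return "etcd handler"    # stand-in for the real construction

# Buggy wiring (before this diff): each scheme mapped to the other's factory.
#   registry = {"static": _create_etcd_handler, "etcd": _create_static_handler}
# Fixed wiring:
registry = {"static": _create_static_handler, "etcd": _create_etcd_handler}
assert registry["static"](None) == "static handler"
```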
Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed:test_launcher
Added test_launcher to the ci/cd tests
Reviewed By: cbalioglu
Differential Revision: D27858897
fbshipit-source-id: 440155789958c091ce5755e7c9524e4bb704203a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56039
Python will try to eagerly resolve the name references even if
the import failed. Quote them so that it doesn't.
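A minimal sketch of the quoting trick with a hypothetical optional dependency; function annotations (pre-PEP 563) are evaluated when the `def` statement runs:
```python
try:
    import boto3          # optional dependency; may be missing
    HAVE_BOTO3 = True
except ImportError:
    HAVE_BOTO3 = False    # note: the name `boto3` is left unbound here

# Unquoted, this annotation raises NameError at definition time whenever
# the import above failed:
#     def fetch(client: boto3.session.Session) -> None: ...
# Quoted, it is stored as a plain string and never eagerly resolved:
def fetch(client: "boto3.session.Session") -> None:
    ...
```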
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: janeyx99
Differential Revision: D27770536
Pulled By: ezyang
fbshipit-source-id: b111739289498f9bab856fb9424f3080efee4ee0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55695
In order to be able to run CUDA tests on their own (e.g., to avoid running CPU tests on GPU machines).
Done by moving test methods to a separate class (and sometimes introducing a "common" base class for utils), and then providing new entry points inside a `cuda/` subdirectory.
Test Plan: Checked they are run on Sandcastle.
Reviewed By: mrshenli
Differential Revision: D27618198
fbshipit-source-id: 8f671657f79c8ae115748ab7752fe0066705893b
Summary:
1. moved module-related stuff to test_module_container
2. created test_types for types and annotations
3. created test_misc for the rest
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55560
Reviewed By: VitalyFedyunin
Differential Revision: D27650911
Pulled By: walterddr
fbshipit-source-id: d895a7da9e9c3d25a662a37faf4daabc276b9c1a
Summary:
Prettifies JSON files .pytorch-test-times and .pytorch-slow-tests so that not everything is on one single line.
This is of slightly more importance as the generated .pytorch-slow-tests ends up getting stored in our test-infra repo ([example](ad9cd87565)), and it is nice not to have that little red no-newline marker at the end.
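A minimal sketch of the change, assuming `json.dump` is what writes these files:
```python
import json

# Hypothetical stats payload; indent=2 spreads the dump over multiple
# lines, and the explicit trailing newline avoids the red
# "no newline at end of file" marker in review UIs.
stats = {"commit": "abc123", "job_times": {"test_torch": 123.4}}
with open(".pytorch-test-times", "w") as f:
    json.dump(stats, f, indent=2)
    f.write("\n")
```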
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55335
Reviewed By: samestep
Differential Revision: D27576930
Pulled By: janeyx99
fbshipit-source-id: be58565b8c8593a9bfcfab383ee19facc79f0572
Summary:
Moves more s3 parsing code to s3_stat_parser.py. This is another step in modularizing the parsing code more correctly. I will also be using this exact function in future slowTest code.
Also replaces some `Any` annotations in the code with `Report`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54808
Test Plan:
.pytorch-test-times generated before the code and after this code is the same.
CI should pass, specifically the test tools GHA.
Reviewed By: walterddr
Differential Revision: D27375783
Pulled By: janeyx99
fbshipit-source-id: bec28551668b2eb3fdd60d802200993e493eac83
Summary:
First step to move all S3-related operations into the S3 parser utils.
In the end we provide APIs from s3_stats_parser:
1. downloading and uploading data as reports
2. filtering by job name
and handle all compression and formatting inside.
TODO
- [ ] Refactor out upload into s3_stats_parser
- [ ] Remove all S3/BOTO related checkers and try/catch blocks outside of s3_stats_parser
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54681
Test Plan:
1. Running tools/test/* covers the refactoring logic (test_test_history.py and test_stats.py are the entry points, and both use the 2 new APIs in s3_stats_parser after the refactoring).
2. print_test_stats.py's main argparse entrypoint is covered by CI step Report Test Result step.
3. run `python test/run_test.py --export-past-test-times` before and after this PR should result in the same file content in .pytorch-test-times
Reviewed By: ailzhang
Differential Revision: D27346742
Pulled By: walterddr
fbshipit-source-id: fb40162e631e007fed9d5821fe4f190bda2cb52e
Summary:
Since `_test1`, `_test2`, `_build`, and `test` are all stripped, `slow_test` should be stripped as well. This way, the _slow_test stats will be considered part of all stats relating to a particular build job, though currently it doesn't do much because the jobs don't share a common stemmed name--the build has `_gcc7` while the slow_test CI job does not.
This makes me think...do we omit the `gcc7` intentionally? Are there other things I should strip, e.g., `multigpu_test`?
See:
ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_slow_test
ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test1
ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54528
Reviewed By: samestep
Differential Revision: D27270393
Pulled By: janeyx99
fbshipit-source-id: ffb7289cfe4dba52ded67f50a89f3e75e7bad68d
Summary:
Step 2 to fixing https://github.com/pytorch/pytorch/issues/53882 :)
This changes TARGET_DET_LIST and sharding automation by checking if there's already cached data from the commit in `.pytorch-test-times`. If not, it pulls data from S3 and updates the file to have the stats. This way, S3 pulling does not need to happen more than once for the same commit.
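A minimal sketch of the cache-then-S3 flow, with a hypothetical `fetch_from_s3` callable standing in for the real download:
```python
import json
import os

CACHE_FILE = ".pytorch-test-times"

def get_job_times(commit, fetch_from_s3):
    # The cache layout mirrors the file shown later in this log:
    # {"commit": ..., "job_times": {...}}.
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE) as f:
            cached = json.load(f)
        if cached.get("commit") == commit:
            return cached["job_times"]  # cache hit: no S3 round trip
    job_times = fetch_from_s3(commit)
    with open(CACHE_FILE, "w") as f:
        json.dump({"commit": commit, "job_times": job_times}, f)
    return job_times
```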
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54210
Test Plan:
the following methods should run the same set of tests.
First `export CIRCLE_JOB=pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2` or your favorite CIRCLE JOB.
1. Pull data first and use it:
Download the data from S3 and write it to the cache file with `python test/run_test.py --export-historic-test-times .pytorch-test-times`
Now run `python test/run_test.py --shard 1 10`
2. Make the sharding job pull data:
Delete the file you just created: `rm .pytorch-test-times`
Now run `python test/run_test.py --shard 1 10`
Reviewed By: walterddr
Differential Revision: D27136849
Pulled By: janeyx99
fbshipit-source-id: 51a42c4e2fa3f8cf15e682679dd3eb6130aad927
Summary:
This is an initial attempt in refactoring and consolidating our S3 read logic for print_test_stats.py, test_history.py, and run_test.py. This way, boto3 and botocore do not need to be imported in various places throughout the code base, and duplicated logic (such as the many type definitions) can exist in one place: `tools/stat_utils/s3_stat_parser.py`. walterddr contributed to this PR by moving print_test_stats.py to the tools folder and the corresponding tests a subfolder within tools.
**NOTE: this removes those tests from CI as the new `tools/test/test_stats.py` is not in the test/ directory as the other tests in TESTS in run_test.py.**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53755
Test Plan:
This refactoring change should not break anything, so running the files as before should work as they did previously.
To make sure that print_test_stats.py still functions: run `python tools/test/test_stats.py` and make sure all tests pass.
To make sure that test_history.py works, run the example commands from `tools/test_history.py --help` and check that their output matches that shown. Note that the script will continue printing for a while, so don't be alarmed.
Some next steps:
- Actually coming up with similarities among the three current use cases and further refactoring/consolidating of functions (e.g., combining simplify and get_cases)
- Moving more parsing logic to s3_stat_parser.py to have better abstraction between our files
- Adding tests for s3_stat_parser.py when there is more functionality in it
Reviewed By: agolynski, samestep
Differential Revision: D27030285
Pulled By: janeyx99
fbshipit-source-id: e664781324ef7c0c30943bfd7f17c895075ef7a7
Summary:
This will allow for future work to use the test times file (which will save computation time and also allow for more consistency). (Step one to fixing https://github.com/pytorch/pytorch/issues/53882)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54083
Test Plan:
export CIRCLE_JOB=your-favorite-circleci-job e.g., pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2
`python test/run_test.py --export-historic-test-times` OR
`python test/run_test.py --export-historic-test-times .your-favorite-file`
When opening either .pytorch-test-times or .your-favorite-file, you should see something like:
```
{"commit": "2d559a09392aabb84dfb4a498010b2f01d99818c", "job_times": {"distributed/test_distributed_spawn": 583.5889999999973, "distributed/test_data_parallel": 4.866999999999997, "test_binary_ufuncs": 171.1569999999998, "test_numpy_interop": 2.5649999999999995, "test_public_bindings": 0.011,...}}
```
Note that no tests will be run when this option is specified.
Reviewed By: walterddr
Differential Revision: D27091351
Pulled By: janeyx99
fbshipit-source-id: e191d739268d86de0a0ba0eea0006969859d1940
Summary:
This PR:
1. moves sharding algorithm from run_test.py to framework_utils.py (let me know if you have a better place for it)
2. adds tests for the algorithm in test_testing.py
3. fixes the algorithm so that it doesn't tack on the unknown jobs all to the shard with the minimum time, but instead distributes them around the shards.
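A minimal sketch of the fixed algorithm in item 3, under the assumption that known tests are assigned greedily by descending time:
```python
import heapq

def calculate_shards(num_shards, tests, job_times):
    # Longest known tests go to the currently lightest shard; tests with
    # no timing data are spread round-robin instead of all landing on the
    # minimum-time shard.
    known = sorted((t for t in tests if t in job_times),
                   key=lambda t: job_times[t], reverse=True)
    unknown = [t for t in tests if t not in job_times]
    heap = [(0.0, i, []) for i in range(num_shards)]
    heapq.heapify(heap)
    for test in known:
        time, idx, bucket = heapq.heappop(heap)
        bucket.append(test)
        heapq.heappush(heap, (time + job_times[test], idx, bucket))
    shards = [bucket for _, _, bucket in sorted(heap, key=lambda s: s[1])]
    for i, test in enumerate(unknown):
        shards[i % num_shards].append(test)
    return shards
```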
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53942
Test Plan: python test/test_testing.py -k TestFrameworkUtils
Reviewed By: samestep
Differential Revision: D27047223
Pulled By: janeyx99
fbshipit-source-id: 824b20009c0bb707aa5361de445cdec795d5e3f1
Summary:
The first argument is either a file name or a test module name, but the key to `CUSTOM_HANDLERS` is the test module name.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53884
Test Plan: Run `python3 run_test.py -i distributed/test_distributed_spawn.py`
Reviewed By: janeyx99
Differential Revision: D27006164
Pulled By: malfet
fbshipit-source-id: f30b42856cd2754e5981c1c69618f84e392c986a
Summary:
This PR:
1. refactors the logic for S3 stats gathering.
2. Renames SLOW_TESTS to TARGET_DET_LIST to disambiguate and remove confusion with slowTest
3. detects slow tests (tests with time > 5min) to add to the TARGET_DET_LIST based on results in S3 from the previous nightly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53549
Test Plan:
Set CIRCLE_JOB to your favorite CI job (like `pytorch_linux_bionic_py3_8_gcc9_coverage_test1`).
Run `python test/run_test.py --determine-from=<your fave pytorch files>`
e.g., `python test/run_test.py --determine-from=test/run_test.py`
Reviewed By: mrshenli
Differential Revision: D26904478
Pulled By: janeyx99
fbshipit-source-id: 9576b34f4fee09291d60e36ff2631753a3925094
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857
These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
- `GLOSSARY.md`
- `aten/src/ATen/core/op_registration/README.md`
- `scripts/README.md`
- `torch/csrc/jit/codegen/fuser/README.md`
The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```
I looked over the auto-generated changes and didn't see anything that looked problematic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406
Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377
This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348
Reviewed By: walterddr, seemethere
Differential Revision: D26856620
Pulled By: samestep
fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
Summary:
Uses nightly commit stats to automatically shard tests based on execution time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53269
Test Plan:
set CIRCLE_JOB to an existing job, like `pytorch_linux_bionic_py3_6_clang9_test`
Then you can run something like: `python test/run_test.py --shard 1 10`
Reviewed By: malfet
Differential Revision: D26819440
Pulled By: janeyx99
fbshipit-source-id: 6bc73d6aa3d52d9850817536be15d7b54a72780e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52910
**Summary**
PR #52158 tried to move all JIT bindings from `torch._C` to a new
submodule `torch._C._jit`, but that...did not go well. This pull request
adds the new `torch._C._jit` submodule, but does not migrate the
existing bindings. Instead, it adds a unit test that fails if any new
bindings are added to `torch._C`. A comment in the test instructs
developers to add their new binding to the allowlist if it really should
be in `torch._C`, or to add it to the appropriate submodule (e.g.,
`torch._C._jit`). The idea is to prevent the issue
described in #51691 from getting *worse* if it cannot be fixed.
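A minimal sketch of such a guard test; the allowlist here is hypothetical (the real test hard-codes the names that existed when the check was added — snapshotting `dir()` at import time, as done here for brevity, would trivially pass):
```python
import unittest
import torch

class TestNoNewTorchCBindings(unittest.TestCase):
    # Hypothetical frozen allowlist of existing torch._C attributes.
    ALLOWED = frozenset(dir(torch._C))

    def test_no_new_bindings(self):
        extra = set(dir(torch._C)) - self.ALLOWED
        self.assertFalse(
            extra,
            "Add new bindings to a submodule such as torch._C._jit, or "
            f"extend the allowlist if they belong in torch._C: {sorted(extra)}",
        )
```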
**Test Plan**
Continuous integration.
**Fixes**
This commit fixes #51691.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D26698373
Pulled By: SplitInfinity
fbshipit-source-id: ec9f5426051227a513d4fd09512b624420e0100b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52323
Using the default CPU allocator for ops executed on the qnnpack backend will result in
ASAN failures with heap overflow, since qnnpack (and xnnpack) can access input
beyond its end and/or beginning.
Here we are enabling this feature specifically for the dynamic sparse linear op test
using the qnnpack engine. In the dynamic linear op, the fp32 bias is not packed and
hence can result in out-of-bound access.
Test Plan: test_set_default_mobile_cpu_allocator.py
Reviewed By: z-a-f
Differential Revision: D26263481
fbshipit-source-id: a49227cac7e6781b0db4a156ca734d7671972d9f
Summary:
Implement the first stage of ZeRO, sharding of the optimizer state, as described in [this blog post](https://www.microsoft.com/en-us/research/blog/zero-2-deepspeed-shattering-barriers-of-deep-learning-speed-scale/) and [this paper](https://arxiv.org/abs/1910.02054). This implementation is completely independent from the [DeepSpeed](https://github.com/microsoft/DeepSpeed) framework, and aims at providing ZeRO-compliant building blocks within the PyTorch scheme of things.
This works by:
- acting as a wrapper to a pytorch optimizer. ZeROptimizer does not optimize anything by itself, it only shards optimizers for distributed jobs
- each rank distributes parameters according to a given partitioning scheme (could be updated), and owns the update of a given shard only
- the .step() is called on each rank as expected, the fact that the optimizer actually works on a shard of the model is not visible from the outside
- when the update is completed, each rank broadcasts the updated model shard to all the other ranks
This can be used with DDP, although some communications are wasted in that case (gradients are all-reduced to all ranks). This implementation was initially developed in [Fairscale](https://github.com/facebookresearch/fairscale), and can also be used with an optimized DDP which only reduces to the relevant ranks. More context on ZeRO and PyTorch can be found in [this RFC](https://github.com/pytorch/pytorch/issues/42849)
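A minimal usage sketch, assuming an initialized process group, one GPU per rank, and the class as it landed in torch (`ZeroRedundancyOptimizer`; the commit text calls it ZeROptimizer):
```python
import torch
from torch.distributed.optim import ZeroRedundancyOptimizer
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes torch.distributed.init_process_group() has already run.
model = DDP(torch.nn.Linear(1024, 1024).cuda())
opt = ZeroRedundancyOptimizer(
    model.parameters(),
    optimizer_class=torch.optim.Adam,  # the wrapped optimizer being sharded
    lr=1e-3,
)
loss = model(torch.randn(8, 1024).cuda()).sum()
loss.backward()  # DDP still all-reduces gradients to every rank
opt.step()       # each rank updates only its own shard of the optimizer
                 # state, then broadcasts the updated parameters
```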
The API with respect to loading and saving the state is a known pain point and should probably be discussed an updated. Other possible follow ups include integrating more closely to a [modularized DDP](https://github.com/pytorch/pytorch/issues/37002), [making the checkpoints partition-agnostic](https://github.com/facebookresearch/fairscale/issues/164), [exposing a gradient clipping option](https://github.com/facebookresearch/fairscale/issues/98) and making sure that mixed precision states are properly handled.
original authors include msbaines, min-xu-ai and myself
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46750
Reviewed By: mruberry
Differential Revision: D25958918
Pulled By: blefaudeux
fbshipit-source-id: 14280f2fd90cf251eee8ef9ac0f1fa6025ae9c50
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49698
Reincarnation of #47620 by jamesr66a.
It's just an initial bunch of things that we're exposing to Python; more
is expected to come in the future. Some things can probably be done better,
but I'm putting this out anyway, since some other people were interested
in using and/or developing this.
Differential Revision: D25668694
Test Plan: Imported from OSS
Reviewed By: bertmaher
Pulled By: ZolotukhinM
fbshipit-source-id: fb0fd1b31e851ef9ab724686b9ac2d172fa4905a
Summary:
A helper used to temporarily change the working directory, restoring it even if an exception is raised.
It is used in test_type_hints and during code coverage collection.
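A minimal sketch of such a context manager; the real name and location in common_utils may differ:
```python
import contextlib
import os

@contextlib.contextmanager
def set_cwd(path):
    old_cwd = os.getcwd()
    try:
        os.chdir(path)
        yield
    finally:
        os.chdir(old_cwd)  # restored even if the body raised

# Usage:
with set_cwd("/tmp"):
    print(os.getcwd())
```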
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49657
Reviewed By: walterddr
Differential Revision: D25660543
Pulled By: malfet
fbshipit-source-id: 77f08d57e4b60b95daa4068d0dacf7c25f978526
Summary:
Instead of calling the coverage frontend, import the coverage module and call combine() and html_report() directly.
Fixes https://github.com/pytorch/pytorch/issues/49596 by not using strict mode when combining those reports.
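A minimal sketch of the API-driven flow; `Coverage.combine` accepts a `strict` flag:
```python
import coverage

# Drive coverage.py through its Python API rather than the `coverage` CLI.
# strict=False tolerates data files that fail to combine, which addresses
# the linked issue.
cov = coverage.Coverage()
cov.combine(strict=False)
cov.html_report()
```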
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49615
Reviewed By: seemethere
Differential Revision: D25645196
Pulled By: malfet
fbshipit-source-id: be55b5c23a3569a331cbdf3f86d8c89bc27d5fe1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47829
As per proposal in https://github.com/pytorch/pytorch/issues/44827,
the API needs to return an RRef to support inter-host pipelining.
For now, we just return a local RRef and only support pipeline on a single
host. But having this change in the API upfront ensures we don't make any BC
breaking changes later.
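A minimal usage sketch, assuming torch.distributed.rpc is initialized, two GPUs are available, and the package as it later stabilized (`torch.distributed.pipeline.sync`; this commit keeps it under the private `_pipeline`):
```python
import torch
import torch.nn as nn
from torch.distributed.pipeline.sync import Pipe

# Requires torch.distributed.rpc.init_rpc(...) to have been called.
model = nn.Sequential(nn.Linear(8, 8).to(0), nn.ReLU().to(1))
pipe = Pipe(model, chunks=2)
rref = pipe(torch.randn(4, 8).to(0))  # forward now returns an RRef
out = rref.local_value()              # single-host for now: resolve locally
```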
ghstack-source-id: 118366784
Test Plan: waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D24914022
fbshipit-source-id: e711e7d12efa45645f752f0e5e776a3d845f3ef5
Summary:
This adds a transform to convert a real vector of dimension D * (D - 1) / 2 into the Cholesky factor of a D x D correlation matrix. This follows the implementation in [NumPyro](https://github.com/pyro-ppl/numpyro/blob/master/numpyro/distributions/transforms.py) by fehiepsi. This is needed for the LKJDistribution, which will be added in a subsequent PR.
Also in line with the ongoing effort to refactor distributions tests, this moves the transforms test into its own file that uses pytest with parametrized fixtures.
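A minimal usage sketch of the transform this PR adds:
```python
import torch
from torch.distributions.transforms import CorrCholeskyTransform

# For D = 3, the unconstrained input has D * (D - 1) / 2 = 3 entries.
t = CorrCholeskyTransform()
x = torch.randn(3)
L = t(x)         # 3 x 3 lower-triangular Cholesky factor
corr = L @ L.T   # correlation matrix: unit diagonal, positive semi-definite
```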
For review:
fehiepsi - could you help review the math?
fritzo - do you have any suggestions for what to do about the event dimension (more details are in the comment below)?
ezyang - could you review the changes in `run_test.py`? Instead of a separate `PYTEST_TESTS`, I have clubbed these tests in `USE_PYTEST_LIST` to avoid duplicate logic. The only difference is that we do not anymore check if pytest is not installed and exclude the tests in the list. I figured that if existing tests are already using pytest, this should not matter.
TODOs (probably not all can be satisfied at the same time):
- [x] Use operations that are JIT friendly, i.e. the transform works with different sized input under JIT.
- [x] Resolve test failures - currently `arange(scalar_tensor)` fails on certain backends but this is needed for JIT. Maybe we should only support same sized tensor under JIT?
- [x] Add tests to check that the transform gives correct gradients and is in agreement with the `log_det_jacobian`.
- [x] Add `input_event_dim` and `output_event_dim` to `CorrCholeskyTransform`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48041
Reviewed By: zhangguanheng66
Differential Revision: D25262505
Pulled By: neerajprad
fbshipit-source-id: 5a57e1c19d8230b53592437590b9169bdf2f71e9
Summary:
Creates multiple new test suites to have fewer tests in test_torch.py, consistent with previous test suite creation like test_unary_ufuncs.py and test_linalg.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47356
Reviewed By: ngimel
Differential Revision: D25202268
Pulled By: mruberry
fbshipit-source-id: 75fde3ca76545d1b32b86d432a5cb7a5ba8f5bb6
Summary:
Previously it was only possible to pass up to one [verbosity level](https://adamj.eu/tech/2019/10/03/my-most-used-pytest-commandline-flags/) to `pytest` when running a test via `test/run_test.py`. Presumably that behavior was never added because `unittest` [doesn't do anything extra](https://stackoverflow.com/a/1322648/5044950) when given more than one `--verbose` flag. This PR removes that limitation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48204
Test Plan:
Make a dummy `pytest`-style file `test/test_foo.py`:
```py
def test_bar():
assert 'hello\n' * 10 == 'hello\n' * 20
```
Then add `'test_foo'` to both `TESTS` and `USE_PYTEST_LIST` in `test/run_test.py`, and run this command:
```sh
test/run_test.py -vvi test_foo
```
Reviewed By: walterddr
Differential Revision: D25069147
Pulled By: samestep
fbshipit-source-id: 2765ee78d18cc84ea0e262520838993f9e9ee04f
Summary:
Inside a container, the user is often root. We should allow this use case so that people can easily run `run_test.py` inside a container.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43794
Reviewed By: ezyang
Differential Revision: D24904469
Pulled By: malfet
fbshipit-source-id: f96cb9dda3e7bd18b29801cde4c5b0616c750016
Summary:
The ASAN test time difference between shards is absurd right now, so this moves some heftier tests (test_nn and test_quantization) into shard 2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47290
Reviewed By: malfet
Differential Revision: D24706877
Pulled By: janeyx99
fbshipit-source-id: 35069d1e425857f85775f9be76501d6a158e0376
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46967
Tests under `tests/distributed/_pipeline/sync` use pytest, and
specifying the `-f` option for such tests as follows: `python test/run_test.py
-i distributed/_pipeline/sync/skip/test_api -- -f` doesn't work.
The equivalent option for pytest is `-x`. To resolve this issue, I've updated
`run_test.py` to replace `-f` with `-x` for pytest tests.
More details in https://github.com/pytorch/pytorch/issues/46782
#Closes: https://github.com/pytorch/pytorch/issues/46782
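A minimal sketch of the flag translation described above; unittest's stop-on-first-failure flag is `-f`/`--failfast`, while pytest's equivalent is `-x`:
```python
def adjust_flags_for_pytest(extra_args):
    # Rewrite unittest-style flags into their pytest equivalents.
    return ["-x" if arg in ("-f", "--failfast") else arg for arg in extra_args]

assert adjust_flags_for_pytest(["-v", "-f"]) == ["-v", "-x"]
```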
ghstack-source-id: 115440558
Test Plan:
1) waitforbuildbot
2) `python test/run_test.py -i distributed/_pipeline/sync/skip/test_api -- -f`
Reviewed By: malfet
Differential Revision: D24584556
fbshipit-source-id: bd87f5b4953504e5659fe72fc8615e126e5490ff
Summary:
When run with `--continue-through-error`, the script ends with the following error:
```
Traceback (most recent call last):
File "run_test.py", line 745, in <module>
main()
File "run_test.py", line 741, in main
print_to_stderr(message)
NameError: name 'message' is not defined
make: *** [macos-compat] Error 1
```
This PR just changes `message` to `err`, which is the intended variable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46777
Reviewed By: seemethere
Differential Revision: D24510460
Pulled By: janeyx99
fbshipit-source-id: be1124b6fc72b178d62acc168d0cbc74962de52b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44090
This is an initial commit pulling in the torchgpipe fork at
https://github.com/facebookresearch/fairscale.
The purpose of this commit is to just pull in the code and ensure all tests and
builds work fine. We will slowly modify this to match our intended API
mentioned in https://fb.quip.com/txurAV3zIFox#RPZACAfAKMq. Follow-up PRs will
address further changes needed on top of the initial commit.
We're pulling the code into the `torch.distributed._pipeline.sync` package. The
package is private on purpose since there is a lot of work (ex: docs, API
changes etc.) that needs to go in before we can actually officially support
this.
ghstack-source-id: 114864254
Test Plan:
1) waitforbuildbot
2) Ran all tests on my devgpu
Reviewed By: mrshenli
Differential Revision: D23493316
fbshipit-source-id: fe3c8b7dadeeb86abdc00e8a8652491b0b16743a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46287
This adds a lightweight `pytree` implementation that is similar to and
inspired by JAX pytrees, tensorflow.nest, deepmind/tree,
TorchBeast's TensorNest, etc.
A *pytree* is a nested Python data structure. It is a tree in the sense
that nodes are Python collections (e.g., list, tuple, dict) and the leaves
are Python values. Furthermore, a pytree should not contain reference
cycles.
This PR:
- adds support for flattening and unflattening nested Python list/dict/tuples
Context: nested Tensor inputs for vmap
--------------------------------------
Right now, vmap is restricted to taking in flat lists of tensors. This
is because vmap needs to be able to convert every tensor in the input
that is being vmapped over into a BatchedTensor.
With a pytree library, we can simply flatten the input data structure
(returning the leaves), map all of the Tensors in the flat input to
BatchedTensors, and unflatten the flat list of BatchedTensors into a new
input. Or equivalently, with a `tree_map` function, we can map a nested
python data structure containing Tensors into one containing
BatchedTensors.
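A minimal round-trip sketch using the module this PR adds (`torch.utils._pytree`); the doubling stands in for BatchedTensor wrapping:
```python
import torch
from torch.utils._pytree import tree_flatten, tree_unflatten

inputs = {"x": torch.randn(3), "y": [torch.randn(2), torch.randn(2)]}
leaves, spec = tree_flatten(inputs)      # flat list of tensors + structure
mapped = [t * 2 for t in leaves]         # map every leaf tensor
rebuilt = tree_unflatten(mapped, spec)   # same nesting as `inputs`
```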
Future work
-----------
In some future PRs, we'll add nested input support for vmap. The
prerequisites for that are:
- a `broadcast_to(small, big)` that broadcasts `small` up to `big`.
This is for handling the in_dims to vmap: the in_dims structure must
be compatible with the structure of the inputs.
Test Plan
---------
- New tests in test/test_pytree.py
Test Plan: Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D24392890
Pulled By: zou3519
fbshipit-source-id: 7daf7430c5a38354e7d203a72882bd7a9b24cfb1
Summary:
1. Added CudaFusionGuard as the custom TypeCheck for nvfuser; enabled dynamic shape support with profiling executor;
2. dropped support for legacy fuser;
3. re-enabled nvfuser tests;
4. added registration for profiling record to allow profiling on user specified nodes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46452
Reviewed By: zou3519, anjali411
Differential Revision: D24364642
Pulled By: ngimel
fbshipit-source-id: daf53a9a6b6636e1ede420a3a6d0397d4a8b450b
Summary:
This PR just adds more polish to the benchmark utils:
1) `common.py`, `timer.py`, and `valgrind_wrapper/timer_interface.py` are now MyPy strict compliant. (except for three violations due to external deps.) Compare and Fuzzer will be covered in a future PR.
2) `CallgrindStats` now uses `TaskSpec` rather than accepting the individual fields which brings it closer to `Measurement`.
3) Some `__repr__` logic has been moved into `TaskSpec` (which `Measurement` and `CallgrindStats` use in their own `__repr__`s) for a more unified feel and less horrible f-string hacking, and the repr's have been given a cleanup pass.
4) `Tuple[FunctionCount, ...]` has been formalized as the `FunctionCounts` class, which has a much nicer `__repr__` than just the raw tuple, as well as some convenience methods (`__add__`, `__sub__`, `filter`, `transform`) for easier DIY stat exploration. (I find myself using the latter two a lot now.) My personal experience is that manipulating `FunctionCounts` is massively more pleasant than the raw tuples of `FunctionCount`. (Though it's still possible to get at the raw data if you want.)
5) Better support for multi-line `stmt` and `setup`.
6) Compare now also supports rowwise coloring, which is often the more natural layout for A/B testing.
7) Limited support for `globals` in `collect_callgrind`. This should make it easier to benchmark JIT models. (CC ZolotukhinM)
8) More unit tests, including extensive tests for the Callgrind stats manipulation APIs.
9) Mitigate issue with `MKL_THREADING_LAYER` when run in Jupyter. (https://github.com/pytorch/pytorch/issues/37377)
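A minimal sketch touching items 4 and 7 above; `collect_callgrind` requires valgrind to be installed:
```python
from torch.utils.benchmark import Timer

timer = Timer(
    stmt="y = x @ x",
    setup="import torch\nx = torch.randn(64, 64)",
)
stats = timer.collect_callgrind()
counts = stats.stats()                      # a FunctionCounts
aten_only = counts.filter(lambda fn: "aten" in fn)
print(aten_only)                            # nicer repr than a raw tuple
```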
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46023
Test Plan: changes should be covered by existing and new unit tests.
Reviewed By: navahgar, malfet
Differential Revision: D24313911
Pulled By: robieta
fbshipit-source-id: 835d4b5cde336fb7ff0adef3c0fd614d64df0f77
Summary:
Reopen the PR: https://github.com/pytorch/pytorch/pull/45837
This PR adds a new feature to the Partitioner() class called size_based_partition. Given a list of devices with the same memory size, this function can distribute graph nodes across different devices. To implement this feature, several helper functions are created in Partitioner.py and GraphManipulation.py.
A unit test is also added in test/test_fx_experimental.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46282
Reviewed By: gcatron
Differential Revision: D24288470
Pulled By: scottxu0730
fbshipit-source-id: e81b1e0c56e34f61e497d868882126216eba7538
Summary:
In response to https://github.com/pytorch/pytorch/issues/11578. This is a test run to see if CI (and other internal systems) works fine with pytest style tests.
- Creates a separate `distributions` directory within `test`.
- For testing, this rewrites the `constraint` tests as parameterized tests in pytest. I don't plan to convert any other tests to pytest style, but only expose this option for adding new tests, if required.
If this is a success, we can move `EXAMPLES` in `test_distributions` into a separate file that can be imported by both pytest and unittest style tests. cc. fritzo
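A minimal sketch of what a parametrized constraint test could look like; the cases here are hypothetical, and the real file exercises far more constraints and values:
```python
import pytest
import torch
from torch.distributions import constraints

@pytest.mark.parametrize(
    "constraint, value, expected",
    [
        (constraints.positive, torch.tensor(1.0), True),
        (constraints.positive, torch.tensor(-1.0), False),
        (constraints.unit_interval, torch.tensor(0.5), True),
    ],
)
def test_constraint_check(constraint, value, expected):
    assert bool(constraint.check(value)) == expected
```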
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45648
Reviewed By: ezyang, colesbury
Differential Revision: D24080248
Pulled By: neerajprad
fbshipit-source-id: 1f2e7d169c3c291a3051d0cece17851560fe9ea9
Summary:
Removed test_tensorexpr from the JIT-EXECUTOR exclude list.
CI will now run those tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46188
Reviewed By: glaringlee
Differential Revision: D24255433
Pulled By: janeyx99
fbshipit-source-id: f18e5b41d49b439407c1c24ef6190ef68bc809bf