Commit Graph

297 Commits

Author SHA1 Message Date
Rong Rong (AI Infra)
7e619b9588 First step to rearrange files in tools folder (#60473)
Summary:
Changes including:
- introduced `linter/`, `testing/`, `stats/` folders in `tools/`
- move appropriate scripts into these folders
- change grepped references in the pytorch/pytorch repo

Next step
- introduce `build/` folder for build scripts

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60473

Test Plan:
- CI (this is important because pytorch/test-infra also relies on some of these script references).
- tools/tests/

Reviewed By: albanD

Differential Revision: D29352716

Pulled By: walterddr

fbshipit-source-id: bad40b5ce130b35dfd9e59b8af34f9025f3285fd
2021-06-24 10:13:58 -07:00
Rong Rong (AI Infra)
40d2fe1053 correct filename issue for test_cpp_extensions_aot (#60604)
Summary:
Use a file copy to create the actual ninja vs. no_ninja suffixed Python test files.
This tricks xmlrunner into reporting test cases under the correct folder.
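The file-copy trick can be sketched roughly like this (paths and helper names are illustrative, not the PR's actual code):

```python
import shutil
import tempfile
from pathlib import Path

def make_suffixed_copies(test_file: Path, suffixes=("ninja", "no_ninja")):
    """Copy a test file to suffixed variants so each variant shows up as its
    own module in the XML reports that xmlrunner produces."""
    copies = []
    for suffix in suffixes:
        target = test_file.with_name(f"{test_file.stem}_{suffix}{test_file.suffix}")
        shutil.copyfile(test_file, target)
        copies.append(target)
    return copies

# Throwaway example:
tmp = Path(tempfile.mkdtemp())
src = tmp / "test_cpp_extensions_aot.py"
src.write_text("# placeholder test module\n")
copies = make_suffixed_copies(src)
print([c.name for c in copies])
# ['test_cpp_extensions_aot_ninja.py', 'test_cpp_extensions_aot_no_ninja.py']
```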

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60604

Test Plan:
- CI reports correctly into the corresponding folders
- When downloading the test statistics, shard calculation no longer needs custom logic to handle `test_cpp_extensions_aot`

CI results show it is working properly:
https://github.com/pytorch/pytorch/pull/60604/checks?check_run_id=2900038654 vs
https://github.com/pytorch/pytorch/pull/60604/checks?check_run_id=2900038673

Reviewed By: albanD

Differential Revision: D29349562

Pulled By: walterddr

fbshipit-source-id: e86e6bc0db288a2a57bea3c5f8edf03be1773944
2021-06-24 09:20:19 -07:00
Jane Xu
6385621003 Use JOB_BASE_NAME throughout code--consolidate CIRCLE_JOB (#60425)
Summary:
This PR is a first step toward unifying our environment variables across CI (so that we don't have `CIRCLE_BLAH` in our GHA workflows, for example), though I'd like this PR to serve mainly as a discussion of how best to consolidate these variables.

This small change updates most CIRCLE_JOB references in our code to JOB_BASE_NAME, as that seems the closest GHA (and ROCm) equivalent. Currently, JOB_BASE_NAME is defined as:
- in Circle: CIRCLE_JOB (name of the job, like `pytorch_linux_bionic_py3_8_gcc9_coverage_test1`)
- in GHA: the build_environment with a `-build` or `-test` suffix tacked onto the end, e.g., `pytorch-linux-xenial-cuda10.2-cudnn7-py3.6-gcc7-test`
- in ROCm: I don't actually know, but it's important for ROCm test sharding as shown in https://github.com/pytorch/pytorch/pull/60409

I am not sure if this is the intention for JOB_BASE_NAME so it is open to discussion what variable we should use if not JOB_BASE_NAME. I also don't know if it's worth the effort consolidating all these variables, so discussion is also highly encouraged there!

Next steps:
- Consolidate more CIRCLE_* references, maybe into CI_* equivalents?
- We use BUILD_ENVIRONMENT everywhere in Circle though the variable is inconsistent across binary vs CI jobs and across platforms. For example, for linux tests and builds, BUILD_ENVIRONMENT contains the `_test` and `_build` suffixes, but the windows jobs don't. In GHA, BUILD_ENVIRONMENT is similar to how it's defined in windows jobs on Circle. This inconsistency is confusing, and we can probably do something about it. I'm thinking of switching out BUILD_ENVIRONMENT for JOB_BASE_NAME in our test scripts where we actually mean JOB_BASE_NAME.
- We should probably document the meaning of the variables we consolidate somewhere, preferably in a README in some unified `ci/` folder. For example, it seems BUILD_ENVIRONMENT is supposed to capture the build environment, whereas JOB_BASE_NAME is supposed to capture the environment _and_ whether we're building or testing.

Notes:
- I did not replace CIRCLE_JOB references in third_party directories
- Previously, print_test_stats reported CIRCLE_JOB as only the build environment for GHA workflows, and I think tacking on the `build` or `test` will not harm anything, though I may be wrong.
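For illustration, a fallback lookup like the one scripts might use during this transition could look like the following (the precedence here is an assumption, not the PR's exact logic):

```python
import os

def get_job_name(env=None) -> str:
    """Prefer the consolidated JOB_BASE_NAME, falling back to the legacy
    CIRCLE_JOB, so a script works on both GHA and CircleCI."""
    env = os.environ if env is None else env
    return env.get("JOB_BASE_NAME") or env.get("CIRCLE_JOB") or "unknown"

# CircleCI-style environment:
print(get_job_name({"CIRCLE_JOB": "pytorch_linux_bionic_py3_8_gcc9_coverage_test1"}))
# GHA-style environment (JOB_BASE_NAME wins if both are set):
print(get_job_name({"JOB_BASE_NAME": "pytorch-linux-xenial-cuda10.2-cudnn7-py3.6-gcc7-test"}))
```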

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60425

Reviewed By: seemethere, samestep

Differential Revision: D29333882

Pulled By: janeyx99

fbshipit-source-id: a82080e6205a03a1183035011ce59698eca06748
2021-06-23 11:11:21 -07:00
Howard Huang
ff3678eec2 Disable group backend rpc tests from running on CI (#60407)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60407

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29278179

Pulled By: H-Huang

fbshipit-source-id: ee78085eeb04d81842c95236b8c3a33de7142a3a
2021-06-23 10:58:31 -07:00
Jane Xu
c63a0d0cfe Adding windows CUDA smoke tests on PRs (#59686)
Summary:
Adding windows CUDA smoke tests on PRs (master should run the full suite).

Next step:
- Automate data update so we get a new smoke test list without manual effort

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59686

Test Plan: https://github.com/pytorch/pytorch/actions/runs/958296267 The sharded smoke tests still take a while because of dependency installation

Reviewed By: walterddr

Differential Revision: D29243533

Pulled By: janeyx99

fbshipit-source-id: dde7ba127fa15c95bda0e833cc5311598fb85e2b
2021-06-23 10:13:50 -07:00
Jane Xu
462448f07a Enable GHA sharding on linux (#60124)
Summary:
This is a branch off of https://github.com/pytorch/pytorch/issues/59970 that only shards on Linux so far (we're running into issues with Windows gflags).

This enables sharding of tests on a few Linux jobs on GHA, allowing TTS (time to signal) to be essentially halved.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60124

Reviewed By: zou3519

Differential Revision: D29204211

Pulled By: janeyx99

fbshipit-source-id: 1cc31d1eccd564d96e2aef14c0acae96a3f0fcd0
2021-06-17 13:00:23 -07:00
Rong Rong (AI Infra)
b2fc6de2c4 support parsing of PR stats in run_test.py (#60026)
Summary:
Currently the S3 test stats pipeline doesn't support parsing PR stats.

Changes to s3_stats_parser:
1. Stats are uploaded to `test_times/{sha1}/{job}` and `pr_test_times/{pr}/{sha1}/{job}` separately, so we need parsing logic for both.
2. PR stats need a timestamp attached for ordering, since PR commits can be force-pushed.

Changes to run_test.py:
1. Reorder based on previous PR stats if available.
2. Fall back to the file-change option if this is not enabled.
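The reordering described above amounts to a stable bring-to-front pass; a minimal sketch (names are illustrative, not run_test.py's actual code):

```python
def reorder_tests(selected_tests, prioritized):
    """Bring tests seen in previous PR stats to the front, keeping the
    original relative order within each group (a stable partition)."""
    front = [t for t in selected_tests if t in prioritized]
    rest = [t for t in selected_tests if t not in prioritized]
    return front + rest

# Tests that showed up in the PR's previous stats run first:
print(reorder_tests(["test_a", "test_b", "test_c"], {"test_c"}))
# ['test_c', 'test_a', 'test_b']
```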

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60026

Test Plan:
- CI.
- local repro: run:
```
CIRCLE_JOB="pytorch_linux_bionic_py3_6_clang9_noarch_test" CIRCLE_PR_NUMBER=60057 IN_CI=1 ENABLE_PR_HISTORY_REORDERING=1 python test/run_test.py
```

Reviewed By: samestep

Differential Revision: D29164754

Pulled By: walterddr

fbshipit-source-id: 206688e0fb0b78d1c9042c07243da1fbf88a924b
2021-06-16 13:32:31 -07:00
Jane Xu
d88fbf0fbc fix minor typo in run_test.py (#60055)
Summary:
Fixes typo in run_test.py for option use_specified_test_cases_by

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60055

Reviewed By: walterddr

Differential Revision: D29150156

Pulled By: janeyx99

fbshipit-source-id: 375e594d09c83188bfa80762c8b833a0b7c5cca4
2021-06-16 09:30:45 -07:00
Rohan Varma
c2098487e8 [c10d] Move pg wrapper tests to their own file. (#59840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59840

moving these tests to their own standalone file. No meaningful code changes.
ghstack-source-id: 131359162

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D29012664

fbshipit-source-id: 348870016509a6ed7e69240fa82bccef4a12d674
2021-06-14 15:05:55 -07:00
Rong Rong
e41bc31eb2 make --run-specified-test-case use --include (#59704)
Summary:
Instead of having specific logic to handle run-specified-test-cases, we provide a flag to override `--include` or `--bring-to-front` with the SPECIFIED_TEST_CASES_FILE.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59704

Reviewed By: janeyx99

Differential Revision: D29038425

Pulled By: walterddr

fbshipit-source-id: 803d3555813437c7f287a22f7704106b0c609919
2021-06-11 13:57:13 -07:00
Jane Xu
9bb5663979 Use commit stats from viable/strict instead of nightlies for sharding (#59727)
Summary:
Currently, not all of CI runs on nightlies, so it's better to use viable/strict.

For example, current 11.1 test jobs do not get to use automatic sharding because of the lack of stats: https://app.circleci.com/jobs/github/pytorch/pytorch/14010983?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59727

Reviewed By: heitorschueroff

Differential Revision: D29004910

Pulled By: janeyx99

fbshipit-source-id: eb0c54a7e7947decba8134a1d67e4b0434151a06
2021-06-09 13:52:15 -07:00
Jane Xu
97dfc7e300 [Reland] Adding run specified tests option to run_test.py (#59649)
Summary:
Reland of https://github.com/pytorch/pytorch/issues/59487

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59649

Reviewed By: samestep

Differential Revision: D28970751

Pulled By: janeyx99

fbshipit-source-id: 6e28d4dcfdab8a49da4b6a02c57516b08bacd7b5
2021-06-08 16:04:46 -07:00
Alban Desmaison
5d6a10a765 Revert D28913223: [pytorch][PR] Adding run-specified-test-cases option in run_test.py
Test Plan: revert-hammer

Differential Revision:
D28913223 (24432eaa29)

Original commit changeset: 0d1f99109734

fbshipit-source-id: 47c073720cff23a5d4cb64556381c46025e90937
2021-06-08 02:18:16 -07:00
Rong Rong (AI Infra)
57d8bccd00 only reorder tests based on git diff if IN_CI (#59565)
Summary:
Do not reorder tests unless IN_CI is set; reordering makes local development test ordering nondeterministic, since most of us branch off viable/strict, not the head of master.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59565

Reviewed By: ejguan

Differential Revision: D28943906

Pulled By: walterddr

fbshipit-source-id: e742e7ce4b3fc017d7563b01e93c4cd774d0a537
2021-06-07 17:54:19 -07:00
Jane Xu
24432eaa29 Adding run-specified-test-cases option in run_test.py (#59487)
Summary:
The run-specified-test-cases option would allow us to specify a list of test cases to run by having a CSV with minimally two columns: test_filename and test_case_name.

This PR also adds .json to some files we use for better clarity.

Usage:
`python test/run_test.py --run-specified-test-cases <csv_file>` where the csv file can look like:
```
test_filename,test_case_name,test_total_time,windows_only_failure_sha_count,total_sha_count,windows_failure_count,linux_failure_count,windows_total_count,linux_total_count
test_cuda,test_cudnn_multiple_threads_same_device,8068.8409659525,46,3768,53,0,2181,6750
test_utils,test_load_standalone,8308.8062920459,14,4630,65,0,2718,8729
test_ops,test_forward_mode_AD_acosh_cuda_complex128,91.652619369806,11,1971,26,1,1197,3825
test_ops,test_forward_mode_AD_acos_cuda_complex128,91.825633094915,11,1971,26,1,1197,3825
test_profiler,test_source,60.93786725749,9,4656,21,3,2742,8805
test_profiler,test_profiler_tracing,203.09352795241,9,4662,21,3,2737,8807
```
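Parsing such a CSV into per-file test case lists can be sketched as follows (only the two minimally required columns are used; the function name is illustrative):

```python
import csv
import io

def parse_specified_test_cases(csv_text):
    """Group test case names by test file; extra columns in the CSV are
    simply ignored."""
    by_file = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_file.setdefault(row["test_filename"], []).append(row["test_case_name"])
    return by_file

sample = """test_filename,test_case_name
test_cuda,test_cudnn_multiple_threads_same_device
test_ops,test_forward_mode_AD_acosh_cuda_complex128
test_ops,test_forward_mode_AD_acos_cuda_complex128
"""
print(parse_specified_test_cases(sample))
# {'test_cuda': [...], 'test_ops': [..., ...]}
```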

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59487

Test Plan:
Without specifying the option, everything should behave as it did before.

Running `python test/run_test.py --run-specified-test-cases windows_smoke_tests.csv` resulted in this paste P420276949 (you can see internally). A snippet looks like:
```
(pytorch) janeyx@janeyx-mbp pytorch % python test/run_test.py --run-specified-test-cases windows_smoke_tests.csv
Loading specified test cases to run from windows_smoke_tests.csv.
Processed 28 test cases.
Running test_cpp_extensions_jit ... [2021-06-04 17:24:41.213644]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', 'test_cpp_extensions_jit.py', '-k', 'test_jit_cuda_archflags'] ... [2021-06-04 17:24:41.213781]
s
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK (skipped=1)
...
```
With pytest, an example execution would be:
```
Running test_dataloader ... [2021-06-04 17:37:57.643039]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', '-m', 'pytest', 'test_dataloader.py', '-v', '-k', 'test_segfault or test_timeout'] ... [2021-06-04 17:37:57.643327]
```

Reviewed By: samestep

Differential Revision: D28913223

Pulled By: janeyx99

fbshipit-source-id: 0d1f9910973426b8756815c697b483160517b127
2021-06-07 16:27:43 -07:00
Jane Xu
caf76c2445 Move sharding to after all tests have been excluded (#59583)
Summary:
It would be most accurate if sharding occurred after all other changes to selected_tests were complete.
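As a simplified illustration of why the ordering matters: sharding should partition only the final selected_tests list, after every exclusion has been applied. A round-robin sketch (run_test.py's actual sharding strategy may differ):

```python
def shard_tests(selected_tests, num_shards, shard_id):
    """Round-robin sharding over an already-filtered test list; sharding
    last ensures excluded tests never skew the shard boundaries."""
    return selected_tests[shard_id::num_shards]

selected = ["test_a", "test_b", "test_c", "test_d", "test_e"]
print(shard_tests(selected, 2, 0))  # ['test_a', 'test_c', 'test_e']
print(shard_tests(selected, 2, 1))  # ['test_b', 'test_d']
```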

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59583

Reviewed By: ejguan

Differential Revision: D28944737

Pulled By: janeyx99

fbshipit-source-id: a851473948a5ec942ffeeedeefdc645536a3d9f7
2021-06-07 15:04:36 -07:00
Mike Ruberry
de40c8e495 Adds remaining OpInfos and removes redundant test generators (#55558)
Summary:
Per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55558

Reviewed By: ngimel

Differential Revision: D28922522

Pulled By: mruberry

fbshipit-source-id: 89cefd93788bc8aa0683f4583cf5caa81aa2dc93
2021-06-06 14:52:26 -07:00
Andrew Gu
2ad4b8e58c Extract c10d Store tests to dedicated test file (#59271)
Summary:
Partially addresses https://github.com/pytorch/pytorch/issues/55340

**Overview**
This factors out `FileStoreTest`, `HashStoreTest`, `PrefixFileStoreTest`, `TCPStoreTest`, `PrefixTCPStoreTest`, `PythonStoreTest`, `RendezvousTest`, `RendezvousEnvTest`, `RendezvousFileTest`, and `RendezvousTCPTest` from `test_c10d_common.py` to a new file `test_store.py`.

Additionally, unused import/initialization statements are removed from `test_c10d_common.py`, and the minimal set of import/initialization statements are used for `test_store.py`.

Also, this changes `.jenkins/pytorch/multigpu-test.sh`, `.jenkins/pytorch/win-test-helpers/test_distributed.bat`, and `test/run_test.py` to include the new `test_store.py`.

**Testing**
All commands shown are run on an AI AWS cluster.

I check the Store tests:
```
python test/distributed/test_store.py
```

I also check `test_c10d_common.py` since it is the source of the refactored code. In addition, I check `test_c10d_nccl.py` and `test_c10d_gloo.py` since they import from `test_c10d_common.py`; those two should be the only test files depending on `test_c10d_common.py`.
```
python test/distributed/test_c10d_common.py
python test/distributed/test_c10d_nccl.py
python test/distributed/test_c10d_gloo.py
```
`test_c10d_gloo.py` produces warnings about how using sparse tensors in TorchScript is experimental, but the warnings do not result from this PR's changes.

**Testing Issues** (To Be Revisited)
```
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py
```
Running the above command fails three tests (written as `[Test]`: `[Error]`):
- `ProcessGroupGlooWrapperTest.test_collective_hang`: `RuntimeError: [../third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [10.200.24.101]:15580`
- `CommTest.test_broadcast_coalesced_gloo_cuda`: `RuntimeError: cuda runtime error (3) : initialization error at ../aten/src/THC/THCGeneral.cpp:54`
- `CommTest.test_sequence_num_incremented_gloo_default`: `RuntimeError: cuda runtime error (3) : initialization error at ../aten/src/THC/THCGeneral.cpp:54`
However, running each of the following yields no errors:
```
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py -k test_collective_hang
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py -k test_broadcast_coalesced_gloo_cuda
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py -k test_sequence_num_incremented_gloo_default
```
This suggests the existence of some inadvertent state dependency between tests (e.g. improper cleanup). I have not explored this further yet. In particular, I do not have a solid understanding of the tests to be able to explain why using `pytest` and `gpurun` induces the failure (since notably, running the `.py` directly shows no issue).

Similarly, running the following yields 47 errors:
```
WORLD_SIZE=4 BACKEND=nccl gpurun pytest test/distributed/test_c10d_nccl.py
```
The errors seem to all be simply complaining about the usage of `fork()` instead of `spawn()` for CUDA multiprocessing. Though, most of the tests in `test_c10d_nccl.py` ask for at least 2 CUDA devices, so I think that the `gpurun` is warranted (assuming that the test file does not need to be run partially on different machines).

Both `test_c10d_common.py` and `test_store.py` work fine with `pytest`.

**Other Notes**
I noticed that `torch.distributed` is imported both as `dist` and as `c10d` and that `c10d` is used throughout the Store tests. I was curious if this is intentional (as opposed to using `dist` to refer to `torch.distributed`). Also, the original [issue](https://github.com/pytorch/pytorch/issues/55340) suggests that the Store tests do not use multiprocessing, but I saw that `torch.multiprocessing` is still used in `TCPStoreTest`.

The links for the Store files in the `CONTRIBUTING.md` [file](https://github.com/pytorch/pytorch/blob/master/torch/distributed/CONTRIBUTING.md) are broken. This can be fixed in a separate PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59271

Reviewed By: jbschlosser, mrshenli

Differential Revision: D28856920

Pulled By: andwgu

fbshipit-source-id: 630950cba18d34e6b5de661f5a748f2cddc1b446
2021-06-03 10:53:33 -07:00
Pritam Damania
0d6fa1adc5 Introduce ChunkShardingSpec as a model sharding specification. (#55728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55728

Full design: https://github.com/pytorch/pytorch/issues/55207

This PR introduces ChunkShardingSpec (SingleShardingSpec in the design). Used
the name ChunkShardingSpec since it is very similar to `torch.chunk` in terms
of how a Tensor is split up and feels more clear compared to SingleShardingSpec.
ghstack-source-id: 129603318

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D27694108

fbshipit-source-id: c8764abe6a4d5fc56d023fda29b74b5af2a73b49
2021-05-23 16:04:57 -07:00
Rong Rong (AI Infra)
a70020465b adding test_sparse_csr to run_test (#58666)
Summary:
fixes https://github.com/pytorch/pytorch/issues/58632.

Added several skips that relates to test assert and MKL. Will address them in separate PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58666

Reviewed By: seemethere, janeyx99

Differential Revision: D28607966

Pulled By: walterddr

fbshipit-source-id: 066d4afce2672e4026334528233e69f68da04965
2021-05-22 13:17:46 -07:00
Sam Estep
2e26976ad3 Disallow versionless Python shebangs (#58275)
Summary:
Some machines don't have a versionless `python` on their PATH, which breaks these existing shebangs.

I'm assuming that all the existing versionless `python` shebangs are meant to be `python3` and not `python2`; please let me know if my assumption was incorrect for any of these.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58275

Test Plan: CI.

Reviewed By: zhouzhuojie

Differential Revision: D28428143

Pulled By: samestep

fbshipit-source-id: 6562be3d12924db72a92a0207b060ef740f61ebf
2021-05-14 08:26:02 -07:00
Nikita Shulga
b587354e4c Add Python-3.9 CI testing (#50992)
Summary:
Skips a number of tests and adjusts typing handling.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50992

Reviewed By: walterddr

Differential Revision: D26170388

Pulled By: malfet

fbshipit-source-id: 47852512aa3d5c25faf6687bcd0b1cbb332b0b20
2021-05-10 10:51:39 -07:00
Aliaksandr Ivanou
7fe4c1d0e7 Torchelastic: add multiprocessing tests to ci/cd (#56842)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56842

Add elastic multiprocessing test to ci/cd

Test Plan: buck test mode/opt-tsan //caffe2/test/distributed/elastic/multiprocessing/... -- --run-disabled

Reviewed By: wilson100hong

Differential Revision: D27982226

fbshipit-source-id: 1b4e6f1a20867a6aa7ca409e280fdb04e8db198b
2021-05-02 14:03:47 -07:00
Aliaksandr Ivanou
5c8ceefe46 Pytorch add agent api tests (#56985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56985

Pytorch add agent api tests

Test Plan: ci/cd

Reviewed By: cbalioglu

Differential Revision: D28020485

fbshipit-source-id: e6acf095f26ce4b99cddfbf7641fb4fa885b0c86
2021-04-29 06:14:39 -07:00
Aliaksandr Ivanou
6ff0002b12 Pytorch: enable many torchelastic tests (#56970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56970

The diff enables metrics, events, utils and timer tests on ci/cd pipeline

Test Plan: ci/cd

Reviewed By: cbalioglu

Differential Revision: D28015200

fbshipit-source-id: 6b419aaf9e62a10a747b6511bff90c82cfb7bcd6
2021-04-28 17:05:09 -07:00
David Reiss
89377e3e45 model_dump tool for model inspection (#56868)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56868

See __init__.py for a summary of the tool.
The following sections are present in this initial version
- Model Size.  Show the total model size, as well as a breakdown by
  stored files, compressed files, and zip overhead.  (I expect this
  breakdown to be a bit more useful once data.pkl is compressed.)
- Model Structure.  This is basically the output of
  `show_pickle(data.pkl)`, but as a hierarchical structure.
  Some structures cause this view to crash right now, but it can be
  improved incrementally.
- Zip Contents.  This is basically the output of `zipinfo -l`.
- Code.  This is the TorchScript code.  It's integrated with a blame
  window at the bottom, so you can click "Blame Code", then click a bit
  of code to see where it came from (based on the debug_pkl).  This
  currently doesn't render properly if debug_pkl is missing or
  incomplete.
- Extra files (JSON).  JSON dumps of each json file under /extra/, up to
  a size limit.
- Extra Pickles.  For each .pkl file in the model, we safely unpickle it
  with `show_pickle`, then render it with `pprint` and include it here
  if the size is not too large.  We aren't able to install the pprint
hack that the show_pickle CLI uses, so we get one-line rendering for
  custom objects, which is not very useful.  Built-in types look fine,
  though.  In particular, bytecode.pkl seems to look fine (and we
  hard-code that file to ignore the size limit).

I'm checking in the JS dependencies to avoid a network dependency at
runtime.  They were retrieved from the following URLS, then passed
through a JS minifier:
  https://unpkg.com/htm@3.0.4/dist/htm.module.js?module
  https://unpkg.com/preact@10.5.13/dist/preact.module.js?module

Test Plan:
Manually ran on a few models I had lying around.
Mostly tested in Chrome, but I also poked around in Firefox.

Reviewed By: dhruvbird

Differential Revision: D28020849

Pulled By: dreiss

fbshipit-source-id: 421c30ed7ca55244e9fda1a03b8aab830466536d
2021-04-28 07:33:10 -07:00
Philip Meier
759cfb7495 add missing comma to run_test.py (#57010)
Summary:
Factored out from https://github.com/pytorch/pytorch/pull/57008#discussion_r621137121:

> Without this comma, the strings are concatenated to `test_binary_ufuncstest_numpy_interop`
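The underlying pitfall is Python's implicit concatenation of adjacent string literals, which a small example makes concrete:

```python
# Adjacent string literals are implicitly concatenated by Python,
# so a missing comma silently merges two list entries into one.
broken = [
    "test_binary_ufuncs"
    "test_numpy_interop",
]
fixed = [
    "test_binary_ufuncs",
    "test_numpy_interop",
]
print(broken)  # ['test_binary_ufuncstest_numpy_interop']
print(fixed)   # ['test_binary_ufuncs', 'test_numpy_interop']
```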

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57010

Reviewed By: malfet

Differential Revision: D28028061

Pulled By: walterddr

fbshipit-source-id: 97c64b79a6aaaf0242def03c8808c1a032537258
2021-04-27 08:00:13 -07:00
Joel Schlosser
febff45900 Support factory kwargs in torch.nn modules (#54508)
Summary:
Continuation of https://github.com/pytorch/pytorch/pull/53144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54508

Reviewed By: albanD

Differential Revision: D27939544

Pulled By: jbschlosser

fbshipit-source-id: 4bf517e5f74f093e27ca38a85e732da65e44d805
2021-04-22 16:16:53 -07:00
driazati
187a524249 Re-order tests based on changed files (#56666)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56666

Addresses some of #56557 by checking for changed files when running tests. This will help deliver signal faster when a failing test is run. It should always be safe to at least try to re-order the tests, so there's no option to turn it off, and any error ends up bailing out of the sorting process. Time saved will change between tests, with more improvement for things that are further down the static list here:

1e9c7ad4cb/test/run_test.py (L32)

The results vary from not much improvement ([before: 11m](https://app.circleci.com/pipelines/github/pytorch/pytorch/307580/workflows/6ab3def6-8d63-4f41-9b8d-9c2c50f6266b/jobs/12712819/steps), [after: 10m](https://app.circleci.com/pipelines/github/pytorch/pytorch/307578/workflows/157407b4-f850-431c-b641-d2ac97916a04/jobs/12712802/steps)) to a lot ([before: 75m](https://app.circleci.com/pipelines/github/pytorch/pytorch/307580/workflows/6ab3def6-8d63-4f41-9b8d-9c2c50f6266b/jobs/12712884/steps), [after: 8m](https://app.circleci.com/pipelines/github/pytorch/pytorch/307578/workflows/157407b4-f850-431c-b641-d2ac97916a04/jobs/12712865/steps)), but overall there shouldn't be any regression in test timing. These results are also probably a little confounded since the test sharding will be different after re-ordering.

As a follow up we can use the target determination logic to figure out which tests to bring to front based on the actual code instead of just edits to test files
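The bring-to-front idea, including the bail-out-on-error behavior, can be sketched as (names are illustrative, not the PR's actual code):

```python
import os

def prioritize_changed_tests(tests, changed_files):
    """Move tests whose file was touched in the change to the front; on any
    surprise, fall back to the original order (reordering is best-effort)."""
    try:
        changed = {os.path.splitext(os.path.basename(f))[0] for f in changed_files}
        # False sorts before True, and sorted() is stable, so changed tests
        # move to the front while everything else keeps its relative order.
        return sorted(tests, key=lambda t: t not in changed)
    except Exception:
        return list(tests)

tests = ["test_nn", "test_ops", "test_autograd"]
print(prioritize_changed_tests(tests, ["test/test_autograd.py"]))
# ['test_autograd', 'test_nn', 'test_ops']
```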

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D27934076

Pulled By: driazati

fbshipit-source-id: 747d09ad732289d7693101803d46e9fa8e6d2f59
2021-04-22 10:27:07 -07:00
Pavel Belevich
426852b4f0 Split test_c10d_spawn.py to test_c10d_spawn_gloo.py,test_c10d_spawn_nccl.py (#56599)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56599

Test Plan: NA

Reviewed By: SciPioneer

Differential Revision: D27913955

fbshipit-source-id: 7206e589fb7d08c55d08a58a3d57dc3d210a795e
2021-04-21 22:11:49 -07:00
Pavel Belevich
5cc75e46fa Split test_c10d.py to test_c10d_common.py, test_c10d_gloo.py, test_c10d_nccl.py (#56598)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56598

Test Plan: NA

Reviewed By: SciPioneer

Differential Revision: D27913170

fbshipit-source-id: 3439d18141131b02d55f2ca399a4c795cba2b04b
2021-04-21 22:10:41 -07:00
Joel Schlosser
12b2bc94d7 Revert D27909732: [pytorch][PR] Support factory kwargs in torch.nn modules
Test Plan: revert-hammer

Differential Revision:
D27909732 (5a09def9b0)

Original commit changeset: d8684b2403ab

fbshipit-source-id: d00d69fae4fa4ed58d9e97e70b27a06a0dcb39e4
2021-04-21 13:44:03 -07:00
Joel Schlosser
5a09def9b0 Support factory kwargs in torch.nn modules (#54508)
Summary:
Continuation of https://github.com/pytorch/pytorch/pull/53144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54508

Reviewed By: malfet

Differential Revision: D27909732

Pulled By: jbschlosser

fbshipit-source-id: d8684b2403ab7eb336371d118799146a2520bd76
2021-04-21 13:20:11 -07:00
Aliaksandr Ivanou
c5c5230890 Pytorch resolve bug around incorrect rdzv handler resolution (#56386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56386

The diff resolves bug around incorrect handler resolution:
_create_static_handler pointed towards etcd, and _create_etcd_handler pointed towards static.

Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed:test_launcher

Added test_launcher to the ci/cd tests

Reviewed By: cbalioglu

Differential Revision: D27858897

fbshipit-source-id: 440155789958c091ce5755e7c9524e4bb704203a
2021-04-19 23:50:28 -07:00
Natalia Gimelshein
92d24e3060 Revert D27855386: [pytorch][PR] Support factory kwargs in torch.nn modules
Test Plan: revert-hammer

Differential Revision:
D27855386 (40483acc51)

Original commit changeset: dabd505d2a04

fbshipit-source-id: f5bf3120d87861b30a8e1bf11977ad7d27cd8500
2021-04-19 20:07:20 -07:00
Joel Schlosser
40483acc51 Support factory kwargs in torch.nn modules (#54508)
Summary:
Continuation of https://github.com/pytorch/pytorch/pull/53144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54508

Reviewed By: bdhirsh

Differential Revision: D27855386

Pulled By: jbschlosser

fbshipit-source-id: dabd505d2a04208e74b158570fb2859c736eea2c
2021-04-19 12:24:58 -07:00
Sam Estep
d05e7c163f Revert D27600457: [pytorch][PR] Support factory kwargs in torch.nn modules
Test Plan: revert-hammer

Differential Revision:
D27600457 (1077f87269)

Original commit changeset: b58bfee61c39

fbshipit-source-id: 19d5bfc5133a3880383731d0332503ca1f3bce0c
2021-04-19 07:47:24 -07:00
Joel Schlosser
1077f87269 Support factory kwargs in torch.nn modules (#54508)
Summary:
Continuation of https://github.com/pytorch/pytorch/pull/53144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54508

Reviewed By: mrshenli

Differential Revision: D27600457

Pulled By: jbschlosser

fbshipit-source-id: b58bfee61c3917524b4622f63ef216c27a588eb1
2021-04-19 06:58:40 -07:00
Sam Estep
1e9c7ad4cb Add a test to measure import torch time (#56041)
Summary:
This PR adds a couple very simple tests which (as the code comment says) measure the time it takes to `import torch` and ask for the CUDA device count.
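Measuring a cold import in a fresh interpreter, roughly in the spirit of the test above (sketched here with a stdlib module rather than torch):

```python
import subprocess
import sys
import timeit

def import_time_seconds(module: str) -> float:
    """Time a cold import by doing it in a fresh interpreter, so modules
    already cached in sys.modules don't hide the real cost."""
    return timeit.timeit(
        lambda: subprocess.check_call([sys.executable, "-c", f"import {module}"]),
        number=1,
    )

elapsed = import_time_seconds("json")
print(f"import json took {elapsed:.3f}s in a fresh interpreter")
```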

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56041

Test Plan:
```
$ rm -r /tmp/reports ; python3 test/test_import_time.py --save-xml=/tmp/reports

Running tests...
----------------------------------------------------------------------
..
----------------------------------------------------------------------
Ran 2 tests in 1.855s

OK

Generating XML reports...
```
```
$ tools/print_test_stats.py /tmp/reports
No scribe access token provided, skip sending report!
class TestImportTime:
    tests: 2 failed: 0 skipped: 0 errored: 0
    run_time: 1.85 seconds
    avg_time: 0.93 seconds
    median_time: 0.93 seconds
    2 longest tests:
        test_time_cuda_device_count time: 1.10 seconds
        test_time_import_torch time: 0.75 seconds

Total runtime is 0:00:01
2 longest tests of entire run:
    TestImportTime.test_time_cuda_device_count  time: 1.10 seconds
    TestImportTime.test_time_import_torch  time: 0.75 seconds
```

Reviewed By: driazati

Differential Revision: D27770908

Pulled By: samestep

fbshipit-source-id: 01bbf5a339f41d3a1f493e6fa8c946ff7567daec
2021-04-15 00:53:30 -07:00
Edward Yang
bc86358cf5 Make run_test.py work even if s3_stat_parser fails to import (#56039)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56039

Python will try to eagerly resolve the name references even if
the import failed.  Quote them so that it doesn't.
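A small example of the quoting fix (the module and class names are hypothetical stand-ins):

```python
# With an unquoted annotation, Python evaluates the name at function
# definition time and raises NameError if the import failed. Quoting the
# annotation turns it into a string that is only resolved on demand.
try:
    from s3_stat_parser import Report  # optional dependency; may fail
    HAVE_S3 = True
except ImportError:
    HAVE_S3 = False

def summarize(report: "Report") -> None:  # safe even when the import failed
    pass

print(summarize.__annotations__["report"])  # 'Report' (a plain string)
```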

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D27770536

Pulled By: ezyang

fbshipit-source-id: b111739289498f9bab856fb9424f3080efee4ee0
2021-04-14 13:21:50 -07:00
Luca Wehrstedt
3f8d476857 Split out CUDA RPC tests (#55695)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55695

In order to be able to run CUDA tests on their own (e.g., to avoid running CPU tests on GPU machines).

Done by moving test methods to a separate class (and sometimes introducing a "common" base class for utils), and then providing new entry points inside a `cuda/` subdirectory.

Test Plan: Checked they are run on Sandcastle.

Reviewed By: mrshenli

Differential Revision: D27618198

fbshipit-source-id: 8f671657f79c8ae115748ab7752fe0066705893b
2021-04-12 07:48:08 -07:00
Rong Rong (AI Infra)
55db156229 remove test_jit_py3.py entirely (#55560)
Summary:
1. move module related stuff to test_module_container
2. created test_types for types and annotation
3. created test_misc for the rest

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55560

Reviewed By: VitalyFedyunin

Differential Revision: D27650911

Pulled By: walterddr

fbshipit-source-id: d895a7da9e9c3d25a662a37faf4daabc276b9c1a
2021-04-08 14:28:54 -07:00
Erjia Guan
f9a0bbbeb8 [DataPipe] Remove duplicate dataset (#54553)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54553

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D27279301

Pulled By: ejguan

fbshipit-source-id: 112a83e7061e3f35dc517eb623bd9ca93c2f034c
2021-04-07 10:11:22 -07:00
Jane Xu
bf37bf7da4 Make JSON files more human readable (#55335)
Summary:
Prettifies the JSON files `.pytorch-test-times` and `.pytorch-slow-tests` so that not everything is on one single line.

This is of slightly more importance as the generated `.pytorch-slow-tests` ends up getting stored in our test-infra repo ([example](ad9cd87565)), and it is nice not to have that little red symbol at the end.
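
The change amounts to passing an indent when serializing (a sketch with made-up stats; only the file names come from this PR):

```python
import json

stats = {"test_nn": 103.2, "test_torch": 88.7}

# Before: everything lands on one line.
one_line = json.dumps(stats)
# After: an indent makes the file human-readable and diff-friendly.
pretty = json.dumps(stats, indent=2)

assert "\n" not in one_line
assert pretty.count("\n") >= len(stats)
```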

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55335

Reviewed By: samestep

Differential Revision: D27576930

Pulled By: janeyx99

fbshipit-source-id: be58565b8c8593a9bfcfab383ee19facc79f0572
2021-04-05 17:23:36 -07:00
Jane Xu
717e70a824 (BE) Refactor get-test-times-from-S3 into s3_stat_parser (#54808)
Summary:
Moves more S3 parsing code to s3_stat_parser.py. This is another step toward properly modularizing the parsing code. I will also use this exact function in future slowTest code.

Also replaces some `Any`s in the code with `Report`.
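
Narrowing `Any` to a concrete report type looks roughly like this (field names and the helper are hypothetical; the real `Report` lives in s3_stat_parser):

```python
from typing import Dict, List, NamedTuple

class Report(NamedTuple):
    build_job: str
    test_times: Dict[str, float]

# Before (roughly): def get_test_times(reports: Any) -> Any: ...
def get_test_times(reports: List[Report]) -> Dict[str, float]:
    """Merge per-job test times, keeping the latest value for each test."""
    merged: Dict[str, float] = {}
    for report in reports:
        merged.update(report.test_times)
    return merged
```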

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54808

Test Plan:
.pytorch-test-times generated before the code and after this code is the same.
CI should pass, specifically the test tools GHA.

Reviewed By: walterddr

Differential Revision: D27375783

Pulled By: janeyx99

fbshipit-source-id: bec28551668b2eb3fdd60d802200993e493eac83
2021-03-29 08:45:22 -07:00
Rong Rong (AI Infra)
d4045e9aa1 initial commit to refactor all s3 access codes to s3_stats_parser (#54681)
Summary:
First step to move all S3-related operations into the S3 parser utils. In the end, s3_stats_parser provides APIs for:
1. downloading data as reports and uploading data as reports
2. filtering by job name

with all compression and formatting handled internally.
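
The "compression and formatting handled inside" part can be sketched as (hypothetical helper, not the real s3_stats_parser API):

```python
import gzip
import json

def parse_report_blob(blob: bytes) -> dict:
    """Decompress a gzipped JSON report blob (as downloaded from S3) and parse it."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))

# Round-trip example with a fake report:
report = {"build_job": "pytorch_linux_test1", "total_seconds": 12.5}
blob = gzip.compress(json.dumps(report).encode("utf-8"))
assert parse_report_blob(blob) == report
```

Callers then work with parsed reports and never see the gzip layer.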

TODO
- [ ] Refactor out upload into s3_stats_parser
- [ ] Remove all S3/BOTO related checkers and try/catch blocks outside of s3_stats_parser

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54681

Test Plan:
1. Running tools/test/* covers the refactored logic (test_test_history.py and test_stats.py serve as entry points, and both use the two new s3_stats_parser APIs after the refactoring).
2. print_test_stats.py's main argparse entrypoint is covered by the Report Test Result CI step.
3. Running `python test/run_test.py --export-past-test-times` before and after this PR should result in the same file content in .pytorch-test-times.

Reviewed By: ailzhang

Differential Revision: D27346742

Pulled By: walterddr

fbshipit-source-id: fb40162e631e007fed9d5821fe4f190bda2cb52e
2021-03-26 06:49:15 -07:00
Jane Xu
792f5ffb83 Also strip slow_test (#54528)
Summary:
Since `_test1`, `_test2`, `_build`, and `test` are all stripped, `slow_test` should be stripped as well. This way, the _slow_test stats will be counted as part of the stats for a particular build job. Currently, though, it doesn't do much because the jobs don't share a common stemmed name: the build has `_gcc7` while the slow_test CI job does not.

This makes me think...do we omit the `gcc7` intentionally? Are there other things I should strip, e.g., `multigpu_test`?

See:
ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_slow_test
ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test1
ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2
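
The stripping can be sketched as (the suffix list and helper name are hypothetical):

```python
import re

# Strip known job-name suffixes down to a common stem.
SUFFIXES = re.compile(r"_(build|slow_test|multigpu_test|test\d*)$")

def stem(job_name: str) -> str:
    return SUFFIXES.sub("", job_name)

stem("pytorch_linux_xenial_cuda10_2_cudnn7_py3_slow_test")
# matches the stem of ..._gcc7_test1 only if "_gcc7" is stripped as well
```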

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54528

Reviewed By: samestep

Differential Revision: D27270393

Pulled By: janeyx99

fbshipit-source-id: ffb7289cfe4dba52ded67f50a89f3e75e7bad68d
2021-03-23 14:44:21 -07:00
Jane Xu
635595f706 Change sharding in ci (#54228)
Summary:
Step three (landing this should fix https://github.com/pytorch/pytorch/issues/53882)!

Modifying CI to compute job times during build so that the exported job times can be used for sharding future test jobs.
The builds that are exempted from this:
- `bazel` (no python tests so no need)
- `libtorch` (no python stuff so no need)
- `onnx` (the test shards are not calculated the same way)
- `asan` (runs into an error I don't know how to debug; we can debug it later: [logs](https://app.circleci.com/pipelines/github/pytorch/pytorch/288019/workflows/57f95f67-1a1b-44a0-9b02-9652b57f2a5f/jobs/11693962))
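
Sharding by job times typically means greedy longest-processing-time assignment: sort tests by duration, then repeatedly give the slowest remaining test to the lightest shard. A sketch with hypothetical names (not the actual calculate-shards code):

```python
import heapq

def calculate_shards(num_shards, job_times):
    """Greedily assign slowest tests first to the currently lightest shard."""
    shards = [(0.0, i, []) for i in range(num_shards)]  # (total_seconds, index, tests)
    heapq.heapify(shards)
    for test, seconds in sorted(job_times.items(), key=lambda kv: -kv[1]):
        total, i, tests = heapq.heappop(shards)
        tests.append(test)
        heapq.heappush(shards, (total + seconds, i, tests))
    return sorted(shards, key=lambda s: s[1])
```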

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54228

Test Plan: CI

Reviewed By: samestep

Differential Revision: D27192978

Pulled By: janeyx99

fbshipit-source-id: 3cb20d14f4989e61873043b81dfd6b0f82d17ccd
2021-03-22 08:40:34 -07:00
Jane Xu
0645e2b490 Use shard file if present, improve functions used for sharding (#54210)
Summary:
Step 2 to fixing https://github.com/pytorch/pytorch/issues/53882 :)

This changes TARGET_DET_LIST and sharding automation by checking if there's already cached data from the commit in `.pytorch-test-times`. If not, it pulls data from S3 and updates the file to have the stats. This way, S3 pulling does not need to happen more than once for the same commit.
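
The cache-then-fallback flow can be sketched as (hypothetical names; the real logic lives in run_test.py):

```python
import json
import os

CACHE_FILE = ".pytorch-test-times"

def load_test_times(commit: str, fetch_from_s3) -> dict:
    """Use the cached stats for `commit` if present; otherwise fetch and cache them."""
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE) as f:
            cached = json.load(f)
        if cached.get("commit") == commit:
            return cached  # cache hit: no S3 round-trip
    stats = {"commit": commit, "job_times": fetch_from_s3(commit)}
    with open(CACHE_FILE, "w") as f:
        json.dump(stats, f, indent=2)
    return stats
```

A second call for the same commit reads the file and never touches S3.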

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54210

Test Plan:
the following methods should run the same set of tests.
First `export CIRCLE_JOB=pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2` or your favorite CIRCLE JOB.

1. Pull data first and use it:
Download the data from S3 and write it to the cache file with `python test/run_test.py --export-historic-test-times .pytorch-test-times`
Now run `python test/run_test.py --shard 1 10`

2. Make the sharding job pull data:
Delete the file you just created: `rm .pytorch-test-times`
Now run `python test/run_test.py --shard 1 10`

Reviewed By: walterddr

Differential Revision: D27136849

Pulled By: janeyx99

fbshipit-source-id: 51a42c4e2fa3f8cf15e682679dd3eb6130aad927
2021-03-18 13:25:51 -07:00
Jane Xu
2e7311ef25 First step to refactoring S3 reading logic (#53755)
Summary:
This is an initial attempt at refactoring and consolidating our S3 read logic for print_test_stats.py, test_history.py, and run_test.py. This way, boto3 and botocore do not need to be imported in various places throughout the code base, and duplicated logic (such as the many type definitions) can live in one place: `tools/stat_utils/s3_stat_parser.py`. walterddr contributed to this PR by moving print_test_stats.py to the tools folder and the corresponding tests to a subfolder within tools.

**NOTE: this removes those tests from CI as the new `tools/test/test_stats.py` is not in the test/ directory as the other tests in TESTS in run_test.py.**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53755

Test Plan:
This refactoring change should not break anything, so running the files as before should work as they did previously.
To make sure that print_test_stats.py still functions: run `python tools/test/test_stats.py` and make sure all tests pass.
To make sure that test_history.py works, run the example commands from `tools/test_history.py --help` and check that their output matches that shown. Note that the script will continue printing for a while, so don't be alarmed.

Some next steps:
- Actually coming up with similarities among the three current use cases and further refactoring/consolidating of functions (e.g., combining simplify and get_cases)
- Moving more parsing logic to s3_stat_parser.py to have better abstraction between our files
- Adding tests for s3_stat_parser.py when there is more functionality in it

Reviewed By: agolynski, samestep

Differential Revision: D27030285

Pulled By: janeyx99

fbshipit-source-id: e664781324ef7c0c30943bfd7f17c895075ef7a7
2021-03-17 12:38:09 -07:00