Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63357
Adds the ability to set CONTINUE_THROUGH_ERROR as an environment
variable so that we can easily enable it without having to pass the flag
directly.
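A minimal sketch of the pattern, assuming the flag is consumed via argparse in test/run_test.py (details hypothetical):
```
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument(
    "--continue-through-error",
    action="store_true",
    # Fall back to the environment variable when the flag isn't passed.
    default=os.environ.get("CONTINUE_THROUGH_ERROR", "0") == "1",
)
options = parser.parse_args()
print(options.continue_through_error)
```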
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: astaff
Differential Revision: D30351108
Pulled By: seemethere
fbshipit-source-id: 767fa9bd24e1399f359eb24d16f6cc985a2d7173
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63054
1) Ensure these tests are skipped in environments without any GPUs.
2) Add the test to run_test.py
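A hedged sketch of the GPU-gating pattern in (1); the exact decorator used in these test files may differ:
```
import unittest
import torch

class MyDistributedTest(unittest.TestCase):
    @unittest.skipIf(not torch.cuda.is_available(), "requires at least one GPU")
    def test_cuda_op(self):
        t = torch.ones(2, device="cuda")
        self.assertEqual(t.sum().item(), 2.0)

if __name__ == "__main__":
    unittest.main()
```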
ghstack-source-id: 135595698
Test Plan: waitforbuildbot
Reviewed By: wanchaol
Differential Revision: D30239159
fbshipit-source-id: 21b543ba72e8d10182bc77e7ae1fd34fd4096509
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62937
Reland due to a Windows + CUDA failure; fixed by running it with the Gloo backend on Windows even when CUDA is available (see the sketch below).
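A sketch of the backend choice the fix implies (helper name hypothetical): NCCL has no Windows support, so Gloo is used there even when CUDA is present.
```
import sys
import torch

def pick_backend() -> str:
    # NCCL is unavailable on Windows, so fall back to Gloo there
    # even when CUDA devices are present.
    if torch.cuda.is_available() and sys.platform != "win32":
        return "nccl"
    return "gloo"
```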
ghstack-source-id: 135306176
Test Plan: ci
Reviewed By: mrshenli
Differential Revision: D30177734
fbshipit-source-id: 7625746984c8f858648c1b3632394b98bd4518d2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62774
Gates DistributedOptimizer, which relies on RRef, based on whether RPC is available. This should enable ZeRO to work on Windows, as Windows should not try to import the DistributedOptimizer. If this works as expected, we can enable the Windows tests for the functional/local SGD optimizers as well.
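A minimal sketch of the gate, assuming an availability check along the lines of torch.distributed.rpc.is_available() (the actual check in the PR may differ):
```
import torch.distributed as dist
import torch.distributed.rpc as rpc

# Only expose the RRef-based DistributedOptimizer where RPC is built in;
# on platforms without RPC support, skip the import entirely.
if dist.is_available() and rpc.is_available():
    from torch.distributed.optim import DistributedOptimizer
```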
ghstack-source-id: 135216642
Test Plan: CI
Reviewed By: pbelevich
Differential Revision: D30117838
fbshipit-source-id: e6365a910a3d1ca40d95fa6777a7019c561957db
Summary:
This PR contains the initial version of `ModuleInfo` for use in testing modules. The design philosophy taken here is to start small and simple and build out / refactor as needed when more test coverage or `ModuleInfo` entries are added. As such, it's not intended for general usage yet. The PR contains the following:
* (new file) `torch/testing/_internal/common_modules.py`
* `ModuleInfo` definition - metadata for each module to use in testing
* `module_db` - the actual `ModuleInfo` database; currently contains entries for two modules
* `ModuleInput` - analogous to `SampleInput` from OpInfo; contains `FunctionInput`s for both constructor and forward pass inputs
* Constructor and forward pass inputs are tied together within a `ModuleInput` because they are likely correlated
* `FunctionInput` - just contains args and kwargs to pass to a function (is there a nicer way to do this? see the sketch after this list)
* `modules` decorator - analogous to `ops`; specifies a set of modules to run a test over
* Some constants used to keep track of all modules under torch.nn:
* `MODULE_NAMESPACES` - list of all namespaces containing modules
* `MODULE_CLASSES` - list of all module class objects
* `MODULE_CLASS_NAMES` - dict from module class object to nice name (e.g. torch.nn.Linear -> "nn.Linear")
* (new file) `test/test_modules.py`
* Uses the above to define tests over modules
* Currently, there is one test for demonstration, `test_forward`, which instantiates a module, runs its forward pass, and compares it to a reference, if one is defined
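A hedged sketch (names taken from this description; the exact signatures in common_modules.py may differ) of how `FunctionInput` and `ModuleInput` tie constructor and forward inputs together:
```
import torch

class FunctionInput:
    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs

class ModuleInput:
    def __init__(self, constructor_input, forward_input):
        self.constructor_input = constructor_input  # inputs for ModuleCls(...)
        self.forward_input = forward_input          # inputs for module(...)

# Example: nn.Linear(10, 5) applied to a (3, 10) batch.
sample = ModuleInput(FunctionInput(10, 5), FunctionInput(torch.randn(3, 10)))
linear = torch.nn.Linear(*sample.constructor_input.args, **sample.constructor_input.kwargs)
out = linear(*sample.forward_input.args, **sample.forward_input.kwargs)
```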
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61935
Reviewed By: mruberry
Differential Revision: D29881832
Pulled By: jbschlosser
fbshipit-source-id: cc05c7d85f190a3aa42d55d4c8b01847d1efd57f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61756
DDP will support running the optimizer as a communication hook with
optimizers that support a per-parameter/gradient step function, `step_param` (see the sketch below).
Adds parity tests as we implement more optimizers that support step_param, to
ensure parity with regular optimizers.
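A toy sketch of the `step_param` idea (not the real optimizer class): an optimizer that can update a single parameter lets a DDP comm hook apply updates as each gradient bucket becomes ready.
```
import torch

class ToySGD:
    def __init__(self, lr=0.01):
        self.lr = lr

    def step_param(self, param):
        # Update just this one parameter, rather than all of them at once,
        # so a comm hook can call it per gradient bucket.
        if param.grad is not None:
            with torch.no_grad():
                param.add_(param.grad, alpha=-self.lr)
```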
ghstack-source-id: 134330378
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D29727549
fbshipit-source-id: 18977c896f12b8e478298488b298fd107affcf5f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59077
Fixes #58549
`from_buffer` constructs a tensor object from an already allocated buffer through
CPython's buffer protocol. Besides the standard `dtype`, `count`, and `offset` parameters,
this function also accepts:
- `device`: where the buffer lives
- `requires_grad`: should autograd record operations on the new tensor
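A hedged sketch of the call on CPU, assuming the function is exposed as `torch.frombuffer` (this summary spells it `from_buffer`; the exact public name may differ):
```
import array
import torch

# Any object implementing CPython's buffer protocol works, e.g. array.array.
buf = array.array("f", [1.0, 2.0, 3.0, 4.0])

# offset is in bytes: skip the first float32 element, then take 3 elements.
t = torch.frombuffer(buf, dtype=torch.float32, count=3, offset=4)
print(t)  # tensor([2., 3., 4.]) -- shares memory with buf
```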
A new test file _test_buffer_protocol.py_ was created. Currently, only CPU tests are
implemented. That's because neither PyTorch nor Numba implements CPython's buffer
protocol, so there's no way to create a CUDA buffer with the existing
dependencies (we could use PyCUDA for that, though).
At the moment, if `device` differs from the device the buffer actually lives on, two things
may happen:
- `RuntimeError`, if `device='cuda'`
- Segmentation fault (not tested -- see above), if `device='cpu'`
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D29870914
Pulled By: mruberry
fbshipit-source-id: 9fa8611aeffedfe39c9af74558178157a11326bb
Summary:
and into the tools/ folder.
Currently run_test.py invokes tools/test_selections.py, which:
1. downloads and analyzes which test files to run;
2. downloads and parses S3 stats and passes the info to local files;
3. common_utils.py then uses the downloaded S3 stats to determine which test cases to run.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61479
Reviewed By: janeyx99
Differential Revision: D29661986
Pulled By: walterddr
fbshipit-source-id: bebd8c474bcc2444e135bfd2fa4bdd1eefafe595
Summary:
run_test.py currently does lots of downloading and test file/suite/case parsing, and it doesn't work well outside of the CI environment.
Restructured run_test.py: created tools/test/test_selections.py and moved all test selection logic (reordering, categorizing slow tests, creating shards) there.
Follow up PRs should:
- refactor the file read/write logic entangled inside test_selections.py into the stats/ folder
- restructure and add network-independent test logic to test_test_selections.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61124
Test Plan:
- tools/test
- CI
Related PR:
This follows the refactoring example in: https://github.com/pytorch/pytorch/issues/60373
Reviewed By: malfet
Differential Revision: D29558981
Pulled By: walterddr
fbshipit-source-id: 7f0fd9b4720a918d82918766c002295e8df04169
Summary:
Changes including:
- introduced `linter/`, `testing/`, `stats/` folders in `tools/`
- moved the appropriate scripts into these folders
- changed grepped references in the pytorch/pytorch repo
Next step
- introduce `build/` folder for build scripts
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60473
Test Plan:
- CI (this is important because pytorch/test-infra also relies on some script references)
- tools/tests/
Reviewed By: albanD
Differential Revision: D29352716
Pulled By: walterddr
fbshipit-source-id: bad40b5ce130b35dfd9e59b8af34f9025f3285fd
Summary:
This PR is a first step in unifying our environment variables across CI (so that we don't have `CIRCLE_BLAH` in our GHA workflows, for example), though I'd like for this PR to be more for discussion about how best to consolidate these variables.
This small change only changes most CIRCLE_JOB references in our code to be JOB_BASE_NAME, as that seems the closest GHA (and ROCm) equivalent. Currently, JOB_BASE_NAME is defined as:
- in Circle: CIRCLE_JOB (name of the job, like `pytorch_linux_bionic_py3_8_gcc9_coverage_test1`)
- in GHA: the build_environment with a `-build` or `-test` tacked onto the end, e.g., `pytorch-linux-xenial-cuda10.2-cudnn7-py3.6-gcc7-test`
- in ROCm: I don't actually know, but it's important for ROCm test sharding as shown in https://github.com/pytorch/pytorch/pull/60409
I am not sure if this is the intention for JOB_BASE_NAME so it is open to discussion what variable we should use if not JOB_BASE_NAME. I also don't know if it's worth the effort consolidating all these variables, so discussion is also highly encouraged there!
Next steps:
- Consolidate more CIRCLE_* references, maybe into CI_* equivalents?
- We use BUILD_ENVIRONMENT everywhere in Circle, though the variable is inconsistent across binary vs. CI jobs and across platforms. For example, for Linux tests and builds, BUILD_ENVIRONMENT contains the `_test` and `_build` suffixes, but the Windows jobs don't include them. In GHA, BUILD_ENVIRONMENT is similar to how it's defined in Windows jobs on Circle. This inconsistency is confusing, and we can probably do something about it. I'm thinking of switching out BUILD_ENVIRONMENT for JOB_BASE_NAME in our test scripts where we actually mean JOB_BASE_NAME.
- We should probably document the meaning of the variables we consolidate somewhere, preferably in a README in some unified `ci/` folder. For example, it seems BUILD_ENVIRONMENT is supposed to capture the build environment, whereas JOB_BASE_NAME is supposed to capture the environment _and_ whether we're building or testing.
Notes:
- I did not replace CIRCLE_JOB references in third_party directories
- Previously, print_test_stats reported CIRCLE_JOB as only the build environment for GHA workflows, and I think tacking on the `build` or `test` will not harm anything, though I may be wrong.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60425
Reviewed By: seemethere, samestep
Differential Revision: D29333882
Pulled By: janeyx99
fbshipit-source-id: a82080e6205a03a1183035011ce59698eca06748
Summary:
Adding Windows CUDA smoke tests on PRs (master should run the full suite).
Next step:
- Automate data update so we get a new smoke test list without manual effort
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59686
Test Plan: https://github.com/pytorch/pytorch/actions/runs/958296267 The sharded smoke tests still take a long time because of dependency installation.
Reviewed By: walterddr
Differential Revision: D29243533
Pulled By: janeyx99
fbshipit-source-id: dde7ba127fa15c95bda0e833cc5311598fb85e2b
Summary:
This is branched off of https://github.com/pytorch/pytorch/issues/59970 to shard only on Linux so far (we're running into issues with Windows gflags).
This would enable sharding of tests on a few Linux jobs on GHA, allowing TTS (time to signal) to be essentially halved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60124
Reviewed By: zou3519
Differential Revision: D29204211
Pulled By: janeyx99
fbshipit-source-id: 1cc31d1eccd564d96e2aef14c0acae96a3f0fcd0
Summary:
Currently, S3 test stats don't support PR stats parsing.
Changes to s3_stats_parser:
1. Stats are uploaded to `test_times/{sha1}/{job}` and `pr_test_times/{pr}/{sha1}/{job}` separately, so we need parsing logic for both.
2. PR stats need a timestamp attached for ordering, since PR commits can be force-pushed.
Changes to run_test.py (see the sketch below):
1. Reorder based on previous PR stats if available.
2. Fall back to the file-change option if PR-history reordering is not enabled.
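A hedged sketch of that fallback chain (helper and parameter names hypothetical):
```
def reorder_tests(selected_tests, pr_stats=None, changed_files=None):
    # Prefer per-PR stats when available; otherwise prioritize tests whose
    # files changed; otherwise leave the order untouched.
    if pr_stats:
        prioritized = [t for t in pr_stats if t in selected_tests]
    elif changed_files:
        prioritized = [t for t in selected_tests if t in changed_files]
    else:
        return selected_tests
    rest = [t for t in selected_tests if t not in prioritized]
    return prioritized + rest
```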
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60026
Test Plan:
- CI.
- local repro, please run:
```
CIRCLE_JOB="pytorch_linux_bionic_py3_6_clang9_noarch_test" CIRCLE_PR_NUMBER=60057 IN_CI=1 ENABLE_PR_HISTORY_REORDERING=1 python test/run_test.py
```
Reviewed By: samestep
Differential Revision: D29164754
Pulled By: walterddr
fbshipit-source-id: 206688e0fb0b78d1c9042c07243da1fbf88a924b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59840
Moving these tests to their own standalone file. No meaningful code changes.
ghstack-source-id: 131359162
Test Plan: CI
Reviewed By: cbalioglu
Differential Revision: D29012664
fbshipit-source-id: 348870016509a6ed7e69240fa82bccef4a12d674
Summary:
Instead of having specific logic to handle run-specified-test-cases, we provide a flag to override the include or bring-to-front behavior with the SPECIFIED_TEST_CASES_FILE.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59704
Reviewed By: janeyx99
Differential Revision: D29038425
Pulled By: walterddr
fbshipit-source-id: 803d3555813437c7f287a22f7704106b0c609919
Summary:
Do not reorder tests unless running in CI (IN_CI); reordering makes local development test ordering nondeterministic. Most of us branch out from viable/strict, not the head of master.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59565
Reviewed By: ejguan
Differential Revision: D28943906
Pulled By: walterddr
fbshipit-source-id: e742e7ce4b3fc017d7563b01e93c4cd774d0a537
Summary:
The run-specified-test-cases option allows us to specify a list of test cases to run by providing a CSV with at minimum two columns: test_filename and test_case_name (a loading sketch follows the example below).
This PR also adds .json to some files we use for better clarity.
Usage:
`python test/run_test.py --run-specified-test-cases <csv_file>` where the csv file can look like:
```
test_filename,test_case_name,test_total_time,windows_only_failure_sha_count,total_sha_count,windows_failure_count,linux_failure_count,windows_total_count,linux_total_count
test_cuda,test_cudnn_multiple_threads_same_device,8068.8409659525,46,3768,53,0,2181,6750
test_utils,test_load_standalone,8308.8062920459,14,4630,65,0,2718,8729
test_ops,test_forward_mode_AD_acosh_cuda_complex128,91.652619369806,11,1971,26,1,1197,3825
test_ops,test_forward_mode_AD_acos_cuda_complex128,91.825633094915,11,1971,26,1,1197,3825
test_profiler,test_source,60.93786725749,9,4656,21,3,2742,8805
test_profiler,test_profiler_tracing,203.09352795241,9,4662,21,3,2737,8807
```
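A hedged sketch of how such a CSV might be consumed (helper name hypothetical): build a mapping from test file to the set of test cases to run.
```
import csv
from collections import defaultdict

def load_specified_test_cases(path):
    specified = defaultdict(set)
    with open(path) as f:
        for row in csv.DictReader(f):
            specified[row["test_filename"]].add(row["test_case_name"])
    return dict(specified)

# e.g. {"test_cuda": {"test_cudnn_multiple_threads_same_device"}, ...}
```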
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59487
Test Plan:
Without specifying the option, everything should be as it was before.
Running `python test/run_test.py --run-specified-test-cases windows_smoke_tests.csv` resulted in this paste P420276949 (viewable internally). A snippet looks like:
```
(pytorch) janeyx@janeyx-mbp pytorch % python test/run_test.py --run-specified-test-cases windows_smoke_tests.csv
Loading specified test cases to run from windows_smoke_tests.csv.
Processed 28 test cases.
Running test_cpp_extensions_jit ... [2021-06-04 17:24:41.213644]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', 'test_cpp_extensions_jit.py', '-k', 'test_jit_cuda_archflags'] ... [2021-06-04 17:24:41.213781]
s
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK (skipped=1)
...
```
With pytest, an example invocation would be:
```
Running test_dataloader ... [2021-06-04 17:37:57.643039]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', '-m', 'pytest', 'test_dataloader.py', '-v', '-k', 'test_segfault or test_timeout'] ... [2021-06-04 17:37:57.643327]
```
Reviewed By: samestep
Differential Revision: D28913223
Pulled By: janeyx99
fbshipit-source-id: 0d1f9910973426b8756815c697b483160517b127
Summary:
It would be most accurate if sharding occurred after all other changes to selected_tests were complete (see the sketch below).
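A hedged sketch under that assumption (the sharding scheme here is illustrative, not the actual implementation):
```
def get_shard(shard_id, num_shards, tests):
    # Round-robin assignment; shard_id is 1-based, matching "shard 1 of N".
    return [t for i, t in enumerate(tests) if i % num_shards == shard_id - 1]

selected_tests = ["test_a", "test_b", "test_c", "test_d"]
# ... apply all include/exclude/reorder logic first, then shard last ...
print(get_shard(1, 2, selected_tests))  # ['test_a', 'test_c']
```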
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59583
Reviewed By: ejguan
Differential Revision: D28944737
Pulled By: janeyx99
fbshipit-source-id: a851473948a5ec942ffeeedeefdc645536a3d9f7
Summary:
Partially addresses https://github.com/pytorch/pytorch/issues/55340
**Overview**
This factors out `FileStoreTest`, `HashStoreTest`, `PrefixFileStoreTest`, `TCPStoreTest`, `PrefixTCPStoreTest`, `PythonStoreTest`, `RendezvousTest`, `RendezvousEnvTest`, `RendezvousFileTest`, and `RendezvousTCPTest` from `test_c10d_common.py` to a new file `test_store.py`.
Additionally, unused import/initialization statements are removed from `test_c10d_common.py`, and the minimal set of import/initialization statements are used for `test_store.py`.
Also, this changes `.jenkins/pytorch/multigpu-test.sh`, `.jenkins/pytorch/win-test-helpers/test_distributed.bat`, and `test/run_test.py` to include the new `test_store.py`.
**Testing**
All commands shown are run on an AI AWS cluster.
I check the Store tests:
```
python test/distributed/test_store.py
```
I also check `test_c10d_common.py` since it is the source of the refactored code. In addition, I check `test_c10d_nccl.py` and `test_c10d_gloo.py` since they import from `test_c10d_common.py`; those two should be the only test files depending on `test_c10d_common.py`.
```
python test/distributed/test_c10d_common.py
python test/distributed/test_c10d_nccl.py
python test/distributed/test_c10d_gloo.py
```
`test_c10d_gloo.py` produces warnings about how using sparse tensors in TorchScript is experimental, but the warnings do not result from this PR's changes.
**Testing Issues** (To Be Revisited)
```
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py
```
Running the above command fails three tests (written as `[Test]`: `[Error]`):
- `ProcessGroupGlooWrapperTest.test_collective_hang`: `RuntimeError: [../third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [10.200.24.101]:15580`
- `CommTest.test_broadcast_coalesced_gloo_cuda`: `RuntimeError: cuda runtime error (3) : initialization error at ../aten/src/THC/THCGeneral.cpp:54`
- `CommTest.test_sequence_num_incremented_gloo_default`: `RuntimeError: cuda runtime error (3) : initialization error at ../aten/src/THC/THCGeneral.cpp:54`
However, running each of the following yields no errors:
```
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py -k test_collective_hang
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py -k test_broadcast_coalesced_gloo_cuda
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py -k test_sequence_num_incremented_gloo_default
```
This suggests the existence of some inadvertent state dependency between tests (e.g. improper cleanup). I have not explored this further yet. In particular, I do not have a solid understanding of the tests to be able to explain why using `pytest` and `gpurun` induces the failure (since notably, running the `.py` directly shows no issue).
Similarly, running the following yields 47 errors:
```
WORLD_SIZE=4 BACKEND=nccl gpurun pytest test/distributed/test_c10d_nccl.py
```
The errors seem to all be complaining about the usage of `fork()` instead of `spawn()` for CUDA multiprocessing. That said, most of the tests in `test_c10d_nccl.py` ask for at least 2 CUDA devices, so I think that the `gpurun` is warranted (assuming that the test file does not need to be run partially on different machines).
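A hedged sketch of the fork-vs-spawn issue those errors point to: CUDA cannot be re-initialized in a fork()ed subprocess, so spawn-based workers are required.
```
import torch
import torch.multiprocessing as mp

def worker(rank):
    torch.cuda.set_device(rank)
    print(rank, torch.ones(1, device="cuda"))

if __name__ == "__main__":
    # start_method defaults to "spawn" here; fork()ed children would fail
    # with a CUDA initialization error like the ones above.
    mp.spawn(worker, nprocs=2)
```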
Both `test_c10d_common.py` and `test_store.py` work fine with `pytest`.
**Other Notes**
I noticed that `torch.distributed` is imported both as `dist` and as `c10d` and that `c10d` is used throughout the Store tests. I was curious if this is intentional (as opposed to using `dist` to refer to `torch.distributed`). Also, the original [issue](https://github.com/pytorch/pytorch/issues/55340) suggests that the Store tests do not use multiprocessing, but I saw that `torch.multiprocessing` is still used in `TCPStoreTest`.
The links for the Store files in the `CONTRIBUTING.md` [file](https://github.com/pytorch/pytorch/blob/master/torch/distributed/CONTRIBUTING.md) are broken. This can be fixed in a separate PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59271
Reviewed By: jbschlosser, mrshenli
Differential Revision: D28856920
Pulled By: andwgu
fbshipit-source-id: 630950cba18d34e6b5de661f5a748f2cddc1b446
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55728
Full design: https://github.com/pytorch/pytorch/issues/55207
This PR introduces ChunkShardingSpec (SingleShardingSpec in the design). We used
the name ChunkShardingSpec since it is very similar to `torch.chunk` in terms
of how a Tensor is split up, and it feels clearer than SingleShardingSpec (see the sketch below).
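A hedged sketch of what constructing a spec might look like per the design doc (the import path and placement format have shifted across releases):
```
from torch.distributed._shard.sharding_spec import ChunkShardingSpec

# Shard along dim 0, analogous to torch.chunk(tensor, len(placements), dim=0).
spec = ChunkShardingSpec(
    dim=0,
    placements=["rank:0/cuda:0", "rank:1/cuda:1"],
)
```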
ghstack-source-id: 129603318
Test Plan: waitforbuildbot
Reviewed By: SciPioneer
Differential Revision: D27694108
fbshipit-source-id: c8764abe6a4d5fc56d023fda29b74b5af2a73b49
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58632.
Added several skips related to test asserts and MKL. Will address them in a separate PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58666
Reviewed By: seemethere, janeyx99
Differential Revision: D28607966
Pulled By: walterddr
fbshipit-source-id: 066d4afce2672e4026334528233e69f68da04965
Summary:
Some machines don't have a versionless `python` on their PATH, which breaks these existing shebangs.
I'm assuming that all the existing versionless `python` shebangs are meant to be `python3` and not `python2`; please let me know if my assumption was incorrect for any of these.
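The change amounts to shebang rewrites of this form:
```
-#!/usr/bin/env python
+#!/usr/bin/env python3
```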
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58275
Test Plan: CI.
Reviewed By: zhouzhuojie
Differential Revision: D28428143
Pulled By: samestep
fbshipit-source-id: 6562be3d12924db72a92a0207b060ef740f61ebf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56868
See __init__.py for a summary of the tool.
The following sections are present in this initial version
- Model Size. Show the total model size, as well as a breakdown by
stored files, compressed files, and zip overhead. (I expect this
breakdown to be a bit more useful once data.pkl is compressed.)
- Model Structure. This is basically the output of
`show_pickle(data.pkl)`, but as a hierarchical structure.
Some structures cause this view to crash right now, but it can be
improved incrementally.
- Zip Contents. This is basically the output of `zipinfo -l`.
- Code. This is the TorchScript code. It's integrated with a blame
window at the bottom, so you can click "Blame Code", then click a bit
of code to see where it came from (based on the debug_pkl). This
currently doesn't render properly if debug_pkl is missing or
incomplete.
- Extra files (JSON). JSON dumps of each json file under /extra/, up to
a size limit.
- Extra Pickles. For each .pkl file in the model, we safely unpickle it
with `show_pickle`, then render it with `pprint` and include it here
if the size is not too large. We aren't able to install the pprint
hack that the show_pickle CLI uses, so we get one-line rendering for
custom objects, which is not very useful. Built-in types look fine,
though. In particular, bytecode.pkl seems to look fine (and we
hard-code that file to ignore the size limit).
I'm checking in the JS dependencies to avoid a network dependency at
runtime. They were retrieved from the following URLs, then passed
through a JS minifier:
https://unpkg.com/htm@3.0.4/dist/htm.module.js?module
https://unpkg.com/preact@10.5.13/dist/preact.module.js?module
Test Plan:
Manually ran on a few models I had lying around.
Mostly tested in Chrome, but I also poked around in Firefox.
Reviewed By: dhruvbird
Differential Revision: D28020849
Pulled By: dreiss
fbshipit-source-id: 421c30ed7ca55244e9fda1a03b8aab830466536d