This PR turns translation validation on by default for tests and accuracy benchmark
runs. It also installs Z3 on CI.
The main changes are:
- Add `--no-translation-validation` as an option in _test/run_test.py_
- Set the `PYTORCH_TEST_WITH_TV` environment variable accordingly (see the sketch after this list)
- Add `TEST_WITH_TV` variable in _torch/testing/_internal/common_utils.py_
- Turn translation validation on for accuracy benchmarks in _benchmarks/dynamo/common.py_
- Add Z3 installation to the CI scripts
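A minimal sketch of how the new flag could map to the environment variable; the wiring below is assumed for illustration and is not copied from the actual run_test.py diff:
```python
# Sketch only: map --no-translation-validation to PYTORCH_TEST_WITH_TV.
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument(
    "--no-translation-validation",
    action="store_true",
    help="disable translation validation for this test run",
)
args, _ = parser.parse_known_args()

# Tests default to translation validation ON unless the flag is passed.
os.environ["PYTORCH_TEST_WITH_TV"] = "0" if args.no_translation_validation else "1"
```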
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103611
Approved by: https://github.com/ezyang
Prevents the following cryptic error if one attempts to use `run_test.py` on a system that also has torchaudio installed in dev mode (as `tools` from https://github.com/pytorch/audio might take precedence, which is not how the script should behave):
```
Unable to import test_selections from tools/testing. Running without test selection stats.... Reason: No module named 'tools.stats'
Traceback (most recent call last):
File "/Users/nshulga/git/pytorch/pytorch/test/run_test.py", line 1673, in <module>
main()
File "/Users/nshulga/git/pytorch/pytorch/test/run_test.py", line 1604, in main
selected_tests = get_selected_tests(options)
File "/Users/nshulga/git/pytorch/pytorch/test/run_test.py", line 1418, in get_selected_tests
path = os.path.join(str(REPO_ROOT), TEST_TIMES_FILE)
NameError: name 'TEST_TIMES_FILE' is not defined
```
But make sure to remove it at the end, otherwise it will not work if torch is installed from a wheel but tests are run from a clean repo checkout.
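A sketch of the idea, assuming the fix is about import precedence; the paths and imported helper below are illustrative:
```python
# Sketch: put the repo root first so `tools` resolves to pytorch/tools,
# then drop it again so a wheel-installed torch with a clean checkout still works.
import sys
from pathlib import Path

REPO_ROOT = Path(__file__).resolve().parent.parent  # hypothetical location of the repo root

sys.path.insert(0, str(REPO_ROOT))
try:
    from tools.stats import import_test_stats  # now resolved from pytorch/tools
except ImportError:
    import_test_stats = None  # run without test selection stats
finally:
    sys.path.remove(str(REPO_ROOT))
```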
<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at dd52521</samp>
> _Sing, O Muse, of the cunning code review_
> _That fixed the tests of the `tools` module_
> _By adding and removing the root path_
> _As a shepherd guides his flock to and fro._
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104214
Approved by: https://github.com/kit1980
There is a `HAVE_TEST_SELECTION_TOOLS` conditional, but it turns out it does not really work, so fix it by defining all the missing prototypes and making it work as a single-shard instance.
Add a lint rule to test that it would succeed for running only test_cuda with a released version of PyTorch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104111
Approved by: https://github.com/clee2000, https://github.com/ZainRizvi
Because we always run tests with pytest now.
Marking it as `bc-breaking` as there could technically be some scripts depending on it somewhere...
<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at 1760568</samp>
> _`pytest` option gone_
> _simpler test runner script_
> _autumn leaves fall fast_
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104125
Approved by: https://github.com/seemethere
Added a feature to upload test statistics to DynamoDB and Rockset using a new function `emit_metric` in `tools/stats/upload_stats_lib.py`.
Added metrics to measure test reordering effectiveness in `tools/testing/test_selections.py`.
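A hedged usage sketch of the new helper; the exact signature lives in `tools/stats/upload_stats_lib.py` and is assumed here to take a metric name plus a dict of values, and the metric below is purely illustrative:
```python
# Sketch only: the metric name and fields are made up for illustration.
from tools.stats.upload_stats_lib import emit_metric

emit_metric(
    "test_reordering_effectiveness",  # illustrative metric name
    {
        "prioritized_tests": 12,
        "first_failure_position": 3,
    },
)
```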
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102691
Approved by: https://github.com/malfet
Sharding on ROCm is broken. I can't replicate it on dummy PRs even though it seems to happen pretty often on main, so I'm adding this to increase my sample size. Hopefully this is enough print statements...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102713
Approved by: https://github.com/huydhn
Currently file-level reruns + stepcurrent are incompatible, and this is making PRs green when they are actually red, so turn off stepcurrent + file-level reruns when keep-going is used until I figure out a better way to do this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102569
Approved by: https://github.com/huydhn
The console log blows up too much when running in rerun disabled tests mode (x50) e132f09e88. Each log is around 1GB and the whole set of uncompressed logs is ~50GB. After compression, it is still around 1GB, which is too big. The increase comes mainly from the multiple SKIPPED messages for non-disabled tests, which is expected due to how SkipTest and pytest-flakyfinder currently work.
I updated `test/conftest.py` to completely ignore skipped tests when rerunning disabled tests, instead of collecting and then skipping each of them 50 times (a rough sketch of the idea follows the list below). The benefit of doing so is much greater than I originally expected:
* Rerun disabled tests jobs now finish in less than half an hour, as they should
* Fix the OOM runner crash caused by too many collected tests
* Fix the verbosity issue, as now only disabled tests are run x50. There are only a few hundred of them atm
* Fix the timeout issue when rerunning disabled distributed and ASAN tests; they are just too slow when run at x50
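A rough sketch of the conftest.py idea, assuming the mode is signaled through the RERUN_DISABLED_TESTS environment variable and that skip markers identify the tests to drop; this is illustrative, not the exact implementation:
```python
# conftest.py -- minimal sketch, not the actual test/conftest.py change
import os

def pytest_collection_modifyitems(config, items):
    if os.environ.get("RERUN_DISABLED_TESTS") != "1":
        return
    kept, dropped = [], []
    for item in items:
        # Drop tests that would only be collected to be skipped anyway, so the
        # x50 flake-finder multiplication applies to disabled tests only.
        if item.get_closest_marker("skip") is not None:
            dropped.append(item)
        else:
            kept.append(item)
    if dropped:
        config.hook.pytest_deselected(items=dropped)
        items[:] = kept
```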
### Testing
When rerunning disabled tests https://github.com/pytorch/pytorch/actions/runs/5084508614, only disabled tests on the platform are run, for example `test_ops_jit` on https://ossci-raw-job-status.s3.amazonaws.com/log/13770164954 only ran 100 tests (`test_variant_consistency_jit_linalg_lu_cuda_float32` + `test_variant_consistency_jit_linalg_lu_factor_cuda_complex64`) x50.
```
Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops_jit.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '--sc=test_ops_jit_1', '--flake-finder', '--flake-runs=50', '--import-slow-tests', '--import-disabled-tests', '--rerun-disabled-tests'] ... [2023-05-25 21:32:49.763856]
Expand the folded group to see the log file of test_ops_jit 2/2
##[group]PRINTING LOG FILE of test_ops_jit 2/2 (/var/lib/jenkins/workspace/test/test-reports/test_ops_jit_h2wr_t2c.log)
Test results will be stored in test-reports/python-pytest/test_ops_jit/test_ops_jit-51a83bd44549074e.xml
============================= test session starts ==============================
platform linux -- Python 3.10.11, pytest-7.3.1, pluggy-1.0.0 -- /opt/conda/envs/py_3.10/bin/python
cachedir: .pytest_cache
hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
rootdir: /var/lib/jenkins/workspace
configfile: pytest.ini
plugins: hypothesis-5.35.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-11.1.2, shard-0.1.2, xdist-3.3.0, xdoctest-1.1.0
collecting ... collected 1084 items
Running 100 items in this shard: test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_cuda_float32 (x50), test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_factor_cuda_complex64 (x50)
stepcurrent: Cannot find last run test, not skipping
test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_cuda_float32 PASSED [2.1876s] [ 1%]
test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_factor_cuda_complex64 PASSED [4.5615s] [ 2%]
```
* [pull](https://github.com/pytorch/pytorch/actions/runs/5093566864)
* [trunk](https://github.com/pytorch/pytorch/actions/runs/5095364311)
* [periodic](https://github.com/pytorch/pytorch/actions/runs/5095378850)
* [slow](https://github.com/pytorch/pytorch/actions/runs/5095390285)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102107
Approved by: https://github.com/clee2000, https://github.com/malfet
Per title. I extracted this part out of the draft PR that I'm working on (https://github.com/pytorch/pytorch/pull/102107) because
the remaining issues with rerun disabled tests (log size and unexpected runner failures) require some further investigation, while this one is clearly breaking in trunk atm.
Until we can support disabling C++ tests, there is no need to run them in rerun disabled tests mode.
### Testing
Coming from https://github.com/pytorch/pytorch/pull/102107, for example https://github.com/pytorch/pytorch/actions/runs/5062224659/jobs/9087747981
```
2023-05-23T22:46:50.1953318Z Running cpp/basic 1/1 ... [2023-05-23 22:46:50.195077]
2023-05-23T22:46:50.1953847Z Skipping C++ tests when running under RERUN_DISABLED_TESTS mode
2023-05-23T22:46:50.2066032Z Running cpp/atest 1/1 ... [2023-05-23 22:46:50.206348]
2023-05-23T22:46:50.2066435Z Skipping C++ tests when running under RERUN_DISABLED_TESTS mode
2023-05-23T22:46:52.2666743Z No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
2023-05-23T22:46:52.2691817Z Ignoring disabled issues: []
...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102132
Approved by: https://github.com/clee2000
Not very elegant.
Checked on a separate conda env that doesn't have the usual CI dependencies.
The two pytest extensions at fault are pytest-rerunfailures and pytest-shard; also included pytest-flakefinder just in case.
No idea if this is a good way to do this.
Could also check individually and add flags based on that, but was told that requiring all the CI dependencies to be downloaded was also OK.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100916
Approved by: https://github.com/huydhn
After an investigation, running C++ tests with https://github.com/pytest-dev/pytest-cpp is just slower than running them directly, plain and simple. I'm curious about the exact root cause, but that's a story for another day.
`time build/bin/test_lazy` takes half a minute to run 610 tests on `linux-bionic-cuda11.8-py3.10-gcc7 / test (default, 2, 5, linux.4xlarge.nvidia.gpu)` while `time pytest /var/lib/jenkins/workspace/build/bin/test_lazy -v` takes 20+ minutes on the same runner. This is a very costly price to pay.
The saving grace here is that https://github.com/pytest-dev/pytest-cpp supports pytest-xdist to run tests in parallel with `-n auto`, so `time pytest /var/lib/jenkins/workspace/build/bin/test_lazy -v -n auto` takes only 3 minutes. This is still not as fast as running C++ tests directly, but it's an order of magnitude faster than running them sequentially.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101440
Approved by: https://github.com/clee2000
After https://github.com/pytorch/pytorch/pull/99559, we can now run C++ tests with `run_test.py`. Although advanced features such as `--import-slow-tests` and `--import-disabled-tests` won't work for now, there will still be a gain in reliability and performance as C++ tests can now be retried and run in parallel.
This covers all C++ tests in the CI, including aten, libtorch, and Vulkan C++ tests, across all platforms: Linux, Windows, and macOS.
Notes:
* To support C++ test discovery, the env variable `CPP_TESTS_DIR` can be set to where the C++ test binaries are located
* Support the pytest `-k` argument via run_test, as this is used by pytest-cpp to replace `--gtest-filter`
* The XML output is in pytest format, but that's OK for now because we don't have slow-test or flaky-test support for C++ tests yet
* ~~I need to figure out why conftest.py doesn't work when I invoke pytest directly for C++ tests, so `--sc` is not available for C++ tests at the moment. Proper pytest plugins like stepwise work fine though. I'll investigate and fix it in a separate PR~~ Found the cause: `conftest.py` is per directory and needs to be in any arbitrary directory that holds C++ tests
* Two tests, `test_api` and `test_tensorexpr`, timed out on ASAN. I suspect that ASAN is now used on top of the Python executable, which is slower than running native C++ code. IMO, it's OK to run these tests as before on ASAN for now
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99956
Approved by: https://github.com/clee2000, https://github.com/ZainRizvi
Today, we prioritize running test files that were edited in the user's PR, with the idea being to run them before we run any other test.
Except, if the modified test is supposed to run serially, then we still end up running it after all the parallelized tests have finished running.
This PR fixes that to _always_ run the prioritized tests before the regular tests, regardless of whether the test is supposed to run serially or in parallel.
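A minimal sketch of the intended ordering; the function and test names are illustrative, not the actual run_test.py code:
```python
# Sketch: prioritized tests run ahead of the rest in both the serial and parallel pools.
def order_tests(serial_tests, parallel_tests, prioritized):
    def split(tests):
        first = [t for t in tests if t in prioritized]
        rest = [t for t in tests if t not in prioritized]
        return first, rest

    serial_first, serial_rest = split(serial_tests)
    parallel_first, parallel_rest = split(parallel_tests)
    # Everything prioritized (e.g. edited in the PR) runs up front,
    # then the remaining parallel and serial tests.
    return parallel_first + serial_first + parallel_rest + serial_rest

# Example: an edited serial test now lands before the unedited parallel ones.
print(order_tests(["test_nn"], ["test_ops", "test_torch"], {"test_nn"}))
```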
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100748
Approved by: https://github.com/huydhn
I think get_reordered_tests has been broken since the master -> main switch.
Add typing for some functions.
Checked for `prioritized` in the logs.
Limited testing because I only care about one very small part of the log that's near the beginning.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100752
Approved by: https://github.com/huydhn
* change the hook so that the test still gets saved in --sc when it fails during test setup (this caused an off-by-one error due to setup being called before the logreport hook)
* allow reruns for all tests now that --sc is used
* increase the number of reruns now that --sc is used
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100200
Approved by: https://github.com/huydhn
* add a stepcurrent flag (--sc), based off the stepwise flag, that saves the currently running test so that the test run can resume from the last successful test after a segfault; it takes a key argument so that different test runs don't overwrite each other (a rough sketch of the idea follows this list)
* send SIGINT to the process on timeout so that the XML report can still be generated
* add a currently unused stepcurrent-skip flag (--scs), based off the stepwise skip flag, that skips the failing test; was going to use it for the keep-going label but having trouble with CI
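A rough sketch of the stepcurrent idea referenced in the first bullet; the option handling and cache key are illustrative, not the actual --sc implementation:
```python
# Sketch of a stepcurrent-style pytest plugin (e.g. placed in a conftest.py).
import pytest

def pytest_addoption(parser):
    parser.addoption("--sc", action="store", default=None,
                     help="key under which the currently running test is cached")

def pytest_collection_modifyitems(config, items):
    key = config.getoption("--sc")
    if not key:
        return
    last = config.cache.get(f"stepcurrent/{key}", None)
    if last is None:
        return
    # Resume from the last recorded test after a crash/segfault.
    for index, item in enumerate(items):
        if item.nodeid == last:
            del items[:index]
            break

@pytest.hookimpl(tryfirst=True)
def pytest_runtest_protocol(item, nextitem):
    key = item.config.getoption("--sc")
    if key:
        # Record the test before it runs, so a hard crash still leaves a marker.
        item.config.cache.set(f"stepcurrent/{key}", item.nodeid)
    return None  # fall through to the default run protocol
```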
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98035
Approved by: https://github.com/huydhn
This depends on [pytest-cpp](https://github.com/pytest-dev/pytest-cpp) to discover and run C++ tests with pytest. C++ tests are built under the `${WORKSPACE}/build/bin` directory and copied to the test job under the same path.
* To expose them to `run_test`, I chose to use the mock path prefix `cpp`; for example, `build/bin/c10_Array_test` is named `cpp/c10_Array_test`, and `python test/run_test.py --cpp -i cpp/c10_Array_test` runs the test in the same way as other Python tests. I could copy them from `build/bin` to `test/cpp`, but they would be mixed with the source code and CMake files, so this looks easier (a rough sketch of the discovery idea follows this list)
* Some executables under `build/bin` are not C++ tests, and they are excluded, for example `build/bin/torch_shm_manager`
* C++ tests need to be run with pytest directly, as the python command doesn't understand them
* The change is gated by the new `--cpp` argument to `run_test.py`, for example `python test/run_test.py --cpp` will run all available C++ tests
* The tests can be run in parallel
* Failing tests can be retried with `--reruns=2` and `--sw`
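A rough sketch of the discovery idea referenced above; this is not the actual run_test.py code, and the filtering heuristic is illustrative:
```python
# Sketch: expose C++ test binaries under a "cpp/" prefix for test selection.
import os
from pathlib import Path

CPP_PREFIX = "cpp"  # assumed prefix, matching names like cpp/c10_Array_test

def discover_cpp_tests() -> list[str]:
    tests_dir = os.environ.get("CPP_TESTS_DIR", "build/bin")
    names = []
    for binary in sorted(Path(tests_dir).iterdir()):
        # Skip non-test helpers (e.g. torch_shm_manager) and non-executables.
        if binary.is_file() and os.access(binary, os.X_OK) and "test" in binary.name:
            names.append(f"{CPP_PREFIX}/{binary.name}")
    return names

print(discover_cpp_tests())
```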
```
============================= test session starts ==============================
platform darwin -- Python 3.9.15, pytest-7.2.0, pluggy-1.0.0 -- /Users/huydo/miniconda3/envs/py3.9/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/huydo/Storage/mine/pytorch/test/.hypothesis/examples')
rootdir: /Users/huydo/Storage/mine/pytorch, configfile: pytest.ini
plugins: xdoctest-1.1.0, cpp-2.3.0, rerunfailures-10.3, shard-0.1.2, flakefinder-1.1.0, hypothesis-6.56.4, xdist-3.0.2, repeat-0.9.1
collecting ... collected 3 items / 2 deselected / 1 selected
Running 1 items in this shard: build/bin/scalar_tensor_test::TestScalarTensor.TestScalarTensorMPS
stepwise: skipping 2 already passed items.
../build/bin/scalar_tensor_test::TestScalarTensor::TestScalarTensorMPS RERUN [100%]
../build/bin/scalar_tensor_test::TestScalarTensor::TestScalarTensorMPS RERUN [100%]
../build/bin/scalar_tensor_test::TestScalarTensor::TestScalarTensorMPS FAILED [100%]
```
* `--import-slow-tests` and `--import-disabled-tests` won't work for now; that's OK to leave as a future task.
I also add `pytest-cpp==2.3.0` to Linux Docker, MacOS, and Windows.
### Testing
Build PyTorch and run `python test/run_test.py --cpp` on my laptop. The CI change will come later in a separate PR. Also, running `python test/run_test.py --help` now shows all C++ tests discovered under `build/bin`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99559
Approved by: https://github.com/clee2000
Share code between the paths that handle test results in parallel vs. serial mode.
Note that the original version of this code had an inconsistency between the two versions where it would execute `print_to_stderr(err_message)` on every test that ran in parallel, but for serial tests it would only invoke `print_to_stderr(err_message)` if `continue_on_error` was also specified. By sharing code, this PR changes that behavior to be consistent between the two modes.
Also adding some comments.
<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at 029342c</samp>
> _Sing, O Muse, of the skillful coder who refined_
> _The PyTorch testing script, `run_test.py`, and shined_
> _A light on its obscure logic, with docstrings and comments_
> _And made it run more smoothly, with better error contents_
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99467
Approved by: https://github.com/huydhn, https://github.com/malfet
Common advice we give for handling memory fragmentation issues is to
allocate a big block upfront to reserve memory which will get split up later.
For programs with changing tensor sizes this can be especially helpful to
avoid OOMs that happen the first time we see a new largest input and would
otherwise have to allocate new segments.
However, the issue with allocating a block upfront is that it is nearly impossible
to correctly estimate the size of that block. If too small, space in the block
will run out and the allocator will allocate separate blocks anyway. Too large,
and other non-PyTorch libraries might stop working because they cannot allocate
any memory.
This patch provides the same benefits as using a pre-allocated block but
without having to choose its size upfront. Using the cuMemMap-style APIs,
it adds the ability to expand the last block in a segment when more memory is
needed.
Compared to universally using cudaMallocAsync to avoid fragmentation,
this patch can fix this common fragmentation issue while preserving most
of the existing allocator behavior. This behavior can be enabled and disabled dynamically.
This should allow users to, for instance, allocate long-lived parameters and state in individual buffers,
and put temporary state into the large expandable blocks, further reducing
fragmentation.
See inline comments for information about the implementation and its limitations.
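A usage sketch, under the assumption that the behavior is toggled through the allocator config string; the `expandable_segments:True` knob below is an assumption, so consult the inline comments for the authoritative switch:
```python
# Sketch: opt into expandable segments via the allocator config (assumed knob name).
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"  # set before CUDA init

import torch

if torch.cuda.is_available():
    # Tensors of growing size can now extend the last block of a segment
    # instead of forcing brand-new segment allocations.
    for n in (1024, 2048, 4096):
        x = torch.empty(n, n, device="cuda")
    print(torch.cuda.memory_summary())
```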
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96995
Approved by: https://github.com/eellison
In C++ we have TORCH_LIBRARY_FRAGMENT. This PR adds the same
functionality to the Python torch.library API.
The motivation for this is: for the simple custom op API, we don't want
users to need to deal with Library objects. One way to hide this from
users is to create library fragments.
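A hedged usage sketch of Python-side fragments mirroring C++'s TORCH_LIBRARY_FRAGMENT; the "FRAGMENT" kind string and the operators below are assumptions for illustration:
```python
# Sketch: two fragments contributing operators to the same namespace.
import torch
from torch.library import Library

frag_a = Library("mylib", "FRAGMENT")
frag_b = Library("mylib", "FRAGMENT")  # a second fragment for the same namespace

frag_a.define("add_one(Tensor x) -> Tensor")
frag_a.impl("add_one", lambda x: x + 1, "CPU")

frag_b.define("mul_two(Tensor x) -> Tensor")
frag_b.impl("mul_two", lambda x: x * 2, "CPU")

print(torch.ops.mylib.add_one(torch.zeros(3)))
print(torch.ops.mylib.mul_two(torch.ones(3)))
```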
Test Plan:
- tests that you can create multiple fragments and def+impl operators on each.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98439
Approved by: https://github.com/ezyang, https://github.com/bdhirsh
This has been bugging me for a while as I'm working on these Python scripts and they are not tracked by the ufmt linter, so I added these scripts to that linter.
```
[[linter]]
code = 'UFMT'
include_patterns = [
    '.github/**/*.py',
    'test/run_test.py',
```
This change should just work and not break anything, as the ufmt (black + usort) linter is very safe to use for standalone util scripts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97588
Approved by: https://github.com/kit1980
Fixes #96347
This PR:
- Makes the functorch tests run as a part of the "default" shards
- Deletes the functorch CI shard from all CI job configurations (if it exists)
- Increases the "default" shard count by 1 for each job, unless it was
previously set to 1, to accommodate the new functorch tests and not
regress time-to-signal.
- Adds a bunch of skips for ROCm and torchdynamo configurations. We can
investigate them later.
NB: I might go through some more iterations to figure out what other
skips need to be added, but this iteration of the PR seems to pass most of the
CI suite.
Test Plan:
- wait for CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96464
Approved by: https://github.com/huydhn
Enables the last few files under pytest.
xdist was causing problems with `test_source_multithreaded` in `profiler/test_profiler` due to it creating extra threads. Luckily we don't use it, so we can disable it with `-p no:xdist`, but this is incompatible with pytest-rerunfailures==10.2, so upgrade to 10.3. I'd update the Windows AMI, but I don't know how.
`dynamo/test_optimizers` and `dynamo/test_repros` both had tests that used skip_if_pytest. https://github.com/pytorch/pytorch/pull/93251/files suggests that it is due to pytest assertion rewriting, so I added `PYTEST_DONT_REWRITE` to their module docstrings to prevent pytest from rewriting assertions.
Disabling tests by issue in `dynamo/test_dynamic_shapes` seems sane.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96698
Approved by: https://github.com/huydhn, https://github.com/malfet
Enable pytest for a few unique files. pytest runs tests in a different order than unittest (but still a consistent ordering with respect to itself) and some tests change global state, causing other tests to fail.
`test_transpose_non_contiguous` in `test_torchinductor.py` gets impacted by some other test, but I'm not sure which one, so my solution is to reset the metrics before the rest of the test is run.
`test_register_patterns` in `test_quantize_fx.py` adds extra keys to global variables, so remove them when the test is done via unittest's `addCleanup`, which also works under pytest.
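A small illustration of that `addCleanup` pattern; the test body and global registry below are made up, not the actual test_register_patterns code:
```python
# Sketch: the cleanup runs under both unittest and pytest, restoring the global mapping.
import unittest

GLOBAL_PATTERNS = {}  # stand-in for the module-level registry being mutated

class TestRegisterPatterns(unittest.TestCase):
    def test_register_patterns(self):
        GLOBAL_PATTERNS["custom_key"] = object()
        # Remove the key when the test finishes, pass or fail.
        self.addCleanup(GLOBAL_PATTERNS.pop, "custom_key", None)
        self.assertIn("custom_key", GLOBAL_PATTERNS)

if __name__ == "__main__":
    unittest.main()
```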
pytest doesn't really have an equivalent for `load_tests` so change it to be like `test_jit` that imports all the classes. I also attempted to dynamically import them, but I failed.
`test_public_api_surface` in `test_fx.py` checks for a backwards compatibility classification. There is a different test in test_fx that results in `fuser_utils` being imported. pytest runs this test before `test_public_api_surface` while unittest runs it after, so pytest sees `fuser_utils` when crawling through the modules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96397
Approved by: https://github.com/huydhn
Set the environment variable
```
PYTORCH_TEST_DO_NOT_USE_PYTEST=1
```
to not use pytest in pytorch unit testing.
This change is related to some recent changes, e.g. #96210, #96016, #95844, #95659, that enabled the use of pytest in many test modules. Those test modules were passing before, but failed immediately once pytest was used. A sample stack trace:
```python
root@8e3168a83ee2:/opt/pytorch/pytorch# python test/run_test.py -v -i test_optim -- -v --save-xml
Ignoring disabled issues: []
/opt/pytorch/pytorch/test/run_test.py:1225: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if torch.version.cuda is not None and LooseVersion(torch.version.cuda) >= "11.6":
Selected tests:
test_optim
parallel (file granularity) tests:
test_optim
serial (file granularity) tests:
Ignoring disabled issues: []
Ignoring disabled issues: []
Running test_optim ... [2023-03-09 12:51:59.358110]
Executing ['/usr/local/bin/python', '-bb', 'test_optim.py', '-v', '--save-xml', '-v', '--use-pytest', '-vv', '-rfEX', '-x', '--reruns=2'] ... [2023-03-09 12:51:59.358810]
Test results will be stored in test-reports/python-pytest/test_optim/test_optim-5e41643c8bac8ace.xml
Traceback (most recent call last):
File "/opt/pytorch/pytorch/test/test_optim.py", line 4581, in <module>
run_tests()
File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 796, in run_tests
exit_code = pytest.main(args=pytest_args)
File "/usr/local/lib/python3.10/site-packages/_pytest/config/__init__.py", line 148, in main
config = _prepareconfig(args, plugins)
File "/usr/local/lib/python3.10/site-packages/_pytest/config/__init__.py", line 329, in _prepareconfig
config = pluginmanager.hook.pytest_cmdline_parse(
File "/usr/local/lib/python3.10/site-packages/pluggy/_hooks.py", line 265, in __call__
return self._hookexec(self.name, self.get_hookimpls(), kwargs, firstresult)
File "/usr/local/lib/python3.10/site-packages/pluggy/_manager.py", line 80, in _hookexec
return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
File "/usr/local/lib/python3.10/site-packages/pluggy/_callers.py", line 55, in _multicall
gen.send(outcome)
File "/usr/local/lib/python3.10/site-packages/_pytest/helpconfig.py", line 103, in pytest_cmdline_parse
config: Config = outcome.get_result()
File "/usr/local/lib/python3.10/site-packages/pluggy/_result.py", line 60, in get_result
raise ex[1].with_traceback(ex[2])
File "/usr/local/lib/python3.10/site-packages/pluggy/_callers.py", line 39, in _multicall
res = hook_impl.function(*args)
File "/usr/local/lib/python3.10/site-packages/_pytest/config/__init__.py", line 1060, in pytest_cmdline_parse
self.parse(args)
File "/usr/local/lib/python3.10/site-packages/_pytest/config/__init__.py", line 1348, in parse
self._preparse(args, addopts=addopts)
File "/usr/local/lib/python3.10/site-packages/_pytest/config/__init__.py", line 1231, in _preparse
self.pluginmanager.load_setuptools_entrypoints("pytest11")
File "/usr/local/lib/python3.10/site-packages/pluggy/_manager.py", line 287, in load_setuptools_entrypoints
plugin = ep.load()
File "/usr/local/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
module = import_module(match.group('module'))
File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "/usr/local/lib/python3.10/site-packages/_pytest/assertion/rewrite.py", line 168, in exec_module
exec(co, module.__dict__)
File "/usr/local/lib/python3.10/site-packages/xdist/looponfail.py", line 16, in <module>
import execnet
File "/usr/local/lib/python3.10/site-packages/execnet/__init__.py", line 14, in <module>
from .gateway_base import DataFormatError
File "/usr/local/lib/python3.10/site-packages/execnet/gateway_base.py", line 1138, in <module>
FLOAT_FORMAT_SIZE = struct.calcsize(FLOAT_FORMAT)
BytesWarning: Comparison between bytes and string
FINISHED PRINTING LOG FILE of test_optim (/opt/pytorch/pytorch/test/test-reports/test_optim_1pnlesrz.log)
test_optim failed!
Traceback (most recent call last):
File "/opt/pytorch/pytorch/test/run_test.py", line 1428, in <module>
main()
File "/opt/pytorch/pytorch/test/run_test.py", line 1386, in main
raise RuntimeError(
RuntimeError: test_optim failed!
Tip: You can keep running tests even on failure by passing --keep-going to run_test.py.
If running on CI, add the 'keep-going' label to your PR and rerun your jobs.
```
I'd like to propose this option, which allows users to use the good old Python unittest runner instead of pytest for their testing in CI.
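A minimal sketch of how such a gate could look inside a run_tests helper; the structure below is assumed, not the actual common_utils.py code:
```python
# Sketch: honor PYTORCH_TEST_DO_NOT_USE_PYTEST=1 by falling back to plain unittest.
import os
import sys
import unittest

def run_tests(argv=None):
    argv = argv if argv is not None else sys.argv
    if os.environ.get("PYTORCH_TEST_DO_NOT_USE_PYTEST") == "1":
        # Good old unittest path: no pytest plugins get imported at all.
        unittest.main(argv=argv)
    else:
        import pytest
        sys.exit(pytest.main(args=argv[1:] + ["-rfEX"]))
```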
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96444
Approved by: https://github.com/malfet
Run more tests through pytest.
Use a block list for tests that shouldn't run through pytest. As far as I can tell, the number of tests run, skipped, and xfailed for those not on the blocklist is the same.
Regarding the main module:
Usually when tests are run in CI, we call `python <test file>`, which causes the file to be imported under the module name `__main__`. However, pytest searches for the module to be imported under the file name, so the file gets re-imported. This can cause issues for tests that run module-level code and change global state, like test_nn, which modifies lists imported from another file, or tests in test/lazy, which initialize a backend that cannot coexist with a second copy of itself.
My workaround for this is to run tests from the `__main__` module. However, this results in pytest being unable to rewrite assertions (and possibly other things, but I don't know what else pytest does right now). A better solution might be to call `pytest <test file>` directly and move all the code in run_tests(argv) to module-level code or put it in a hook in conftest.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95844
Approved by: https://github.com/huydhn
Summary: Currently running PyTorch tests with dynamo and inductor is
controlled by environment variables, and CI sets them based on test
config name matching. Change them to use options of run_test.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94539
Approved by: https://github.com/huydhn
Part of my effort to move everything to pytest and decrease the number of test-runner frameworks in CI.
Gives XMLs, but they might look a bit weird b/c of module-level tests vs. tests in classes.
Doesn't give the skip/disable-test infra because that is tied to classes (for future ref, could either put tests in classes or move the check_if_enable stuff into a pytest hook).
Tested in CI and checked that the same number of tests are run.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95659
Approved by: https://github.com/huydhn
The time comparison between using MultiThreadedTestCase and MultiProcessTestCase on the op db tests is amazing!
Using MultiThreadedTestCase on an AWS dev node:
```
time pytest test/distributed/_tensor/test_dtensor_ops.py
============= 175 passed, 42 skipped, 397 xfailed in 80.30s (0:01:20) =======
real 1m22.330s
user 1m38.782s
sys 0m18.762s
```
MultiProcessTestCase takes from 40 minutes to more than 1 hour, even when using pytest parallel testing tools.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92198
Approved by: https://github.com/XilunWu
Fixes #90940. This PR revamps how tests are run in parallel, as well as device visibility both at the Docker container level and within the run_test.py test runner.
First, running multiple test modules concurrently on the same GPU was causing instability for ROCm runners manifesting as timeouts. ROCm runners have at least 1 GPU each, but often 2 or more. This PR allows NUM_PROCS to be set equal to the number of devices available, but also takes care to set HIP_VISIBLE_DEVICES to avoid oversubscribing any GPU.
Second, we had introduced env vars `-e ROCR_VISIBLE_DEVICES` (#91031) to prepare for two GHA runners per CI node, to split up the GPU visibility at the docker level between the two runners. This effort wasn't fully realized; to date, we haven't had more than one runner per CI host. We abandon this effort in favor of all GPUs being visible to a single runner and managing GPU resources as stated above.
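A sketch of the per-process GPU pinning idea, with assumed helper names; the actual run_test.py logic may differ:
```python
# Sketch: pin each test-runner process to its own GPU so concurrent files
# don't oversubscribe a device.
import os

def visible_devices_for(proc_index: int, num_gpus: int) -> str:
    # With NUM_PROCS == num_gpus, worker i gets GPU i; round-robin otherwise.
    return str(proc_index % num_gpus)

env = os.environ.copy()
env["HIP_VISIBLE_DEVICES"] = visible_devices_for(proc_index=1, num_gpus=2)
print(env["HIP_VISIBLE_DEVICES"])  # "1"
```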
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91137
Approved by: https://github.com/kit1980, https://github.com/huydhn, https://github.com/pruthvistony
Rerun all disabled tests to gather their latest results so that we can close disabled-test tickets automatically. When running under this mode (RERUN_DISABLED_TESTS=true), only disabled tests are run while the rest are skipped `<skipped message="Test is enabled but --rerun-disabled-tests verification mode is set, so only disabled tests are run" type="skip"/>`
The logic is roughly as follows; each test runs multiple times (n=50), and a sketch of the classification follows this list:
* If the disabled test passes, and it's flaky, do nothing because it's still flaky. In the test report, we'll see the test passes with the following skipped message:
```
<testcase classname="TestMultiprocessing" file="test_multiprocessing.py" line="357" name="test_fs" time="0.000" timestamp="0001-01-01T00:00:00">
<skipped message="{"flaky": True, "num_red": 4, "num_green": 0, "max_num_retries": 3, "rerun_disabled_test": true}" type="skip"/>
</testcase>
```
* If the disabled test passes every single time, and it is not flaky anymore, mark it so that it can be closed later. We will see the test runs and passes, i.e.
```
<testcase classname="TestCommonCUDA" name="test_out_warning_linalg_lu_factor_cuda" time="0.170" file="test_ops.py" />
```
* If the disabled test fails after all retries, this is also expected, so we only report it but don't fail the job (because we don't care about red signals here); we'll see the test is skipped (without the `flaky` field), i.e.
```
<testcase classname="TestMultiprocessing" file="test_multiprocessing.py" line="357" name="test_fs" time="0.000" timestamp="0001-01-01T00:00:00">
<skipped message="{"num_red": 4, "num_green": 0, "max_num_retries": 3, "rerun_disabled_test": true}" type="skip"/>
</testcase>
```
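A tiny sketch of the resulting signal classification; the field names follow the sample skip messages above, and the thresholds are illustrative:
```python
# Sketch: classify a disabled test from its green/red counts over the reruns.
def classify(num_green: int, num_red: int) -> str:
    if num_red == 0:
        return "no longer flaky"   # candidate for closing the disabled-test issue
    if num_green == 0:
        return "still failing"     # reported, but the job is not failed
    return "still flaky"           # keep the issue open

for green, red in [(50, 0), (0, 4), (46, 4)]:
    print(green, red, "->", classify(green, red))
```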
This runs on the same schedule as `mem_leak_check` (daily). The change to update test stats and (potentially) grouping on HUD will come in separate PRs.
### Testing
* pull https://github.com/pytorch/pytorch/actions/runs/3447434434
* trunk https://github.com/pytorch/pytorch/actions/runs/3447434928
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88646
Approved by: https://github.com/clee2000
Fixes: https://github.com/pytorch/pytorch/issues/88010
This PR does a couple things to stop slow gradcheck from timing out:
- Splits out test_ops_fwd_gradients from test_ops_gradients, and factors out TestFwdGradients and TestBwdGradients which both inherit from TestGradients, now situated in common_utils (maybe there is a better place?)
- Skips CompositeCompliance (and several other test files) for slow gradcheck CI since they do not use gradcheck
- because test times for test_ops_fwd_gradients and test_ops_gradients are either unknown or wrong, we hardcode them for now to prevent them from being put together. We can undo the hack after we see that actual test times are updated. ("def calculate_shards" divides tests with unknown test times in a round-robin fashion; see the sketch after this list.)
- Updates references to test_ops_gradients and TestGradients
- Test files that are skipped for slow gradcheck CI are now centrally located in run_test.py; this reduces how fine-grained we can be with the skips, so for some skips (one so far) we still use the old skipping mechanism, e.g. for test_mps
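A small sketch of the round-robin behavior mentioned in the note above; the function and inputs are illustrative, not the actual `calculate_shards`:
```python
# Sketch: tests with unknown times get spread round-robin across shards.
def round_robin_shards(tests, num_shards):
    shards = [[] for _ in range(num_shards)]
    for i, test in enumerate(tests):
        shards[i % num_shards].append(test)
    return shards

print(round_robin_shards(
    ["test_ops_gradients", "test_ops_fwd_gradients", "test_ops", "test_ops_jit"], 2))
```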
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88216
Approved by: https://github.com/albanD
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86699
This diff does the following:
1. **c10d_error_logger.py**: Add an API to create a logger with a specific logging handler based on the destination.
2. The API from above would get a logging handler based on the destination provided.
- **caffe2/torch/distributed/logging_handlers.py**: For OSS, we simply use a NullHandler() for now.
3. Add associated test files for 1 and 2.
Test Plan:
## Unit Test
```
buck test @//mode/dev-nosan //caffe2/test/distributed:test_c10d_error_logger -- --print-passing-details
```
```
File changed: fbcode//caffe2/test/distributed/test_c10d_error_logger.py
File changed: fbsource//xplat/caffe2/test/distributed/TARGETS
9 additional file changes
waiting for all tests to finish...
✓ Listing success: caffe2/test/distributed:test_c10d_error_logger (0.2s)
Found 1 tests
✓ Pass: caffe2/test/distributed:test_c10d_error_logger - test_get_or_create_logger (caffe2.test.distributed.test_c10d_error_logger.C10dErrorLoggerTest) (0.2s)
stdout:
stderr:
Buck UI: https://www.internalfb.com/buck2/b975f6b0-77e9-4287-8722-f95b48036181
Test Session: https://www.internalfb.com/intern/testinfra/testrun/1407375150206593
RE: reSessionID-4d7ab8ca-1051-48e9-a5a8-6edbe15d1fe4 Up: 124 B Down: 0 B
Jobs completed: 5. Time elapsed: 3.5s.
Tests finished: Pass 1. Fail 0. Fatal 0. Skip 0. 0 builds failed
```
Differential Revision: D39920391
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87123
Approved by: https://github.com/fduwjj, https://github.com/H-Huang
Fixes #83973 (This is a substitute PR for https://github.com/pytorch/pytorch/pull/85024)
First of all, thanks for your invaluable contributions to PyTorch everyone!
Given how extensively `torch.cuda.is_available` is used in the PyTorch ecosystem, IMHO it's worthwhile to provide downstream libraries/frameworks/users the ability to alter the default behavior of `torch.cuda.is_available` in the context of their PyTorch usage.
I'm confident there are many current and future such use cases which could benefit from leveraging a weakened, NVML-based `torch.cuda.is_available` assessment at a downstream framework's explicit direction (thanks @malfet 81da50a972 !). Though one could always patch out the `torch.cuda.is_available` function with another implementation in a downstream library, I think this environment-variable-based configuration option is more convenient, and the cost of including the option is quite low.
As discussed in https://github.com/pytorch/pytorch/pull/85024#issuecomment-1261542045, this PR gates the new non-default NVML-based CUDA behavior behind an environment variable (PYTORCH_NVML_BASED_CUDA_CHK) that allows a user/framework to invoke non-default, NVML-based `is_available()` assessments if desired.
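A usage sketch of the new environment variable; setting it before importing torch is assumed here to be the safe way to opt in:
```python
# Sketch: opt into the NVML-based availability check via the env var this PR introduces.
import os
os.environ["PYTORCH_NVML_BASED_CUDA_CHK"] = "1"

import torch

# With the flag set, is_available() can answer via the weaker NVML-based check
# (the usual motivation: not poisoning the process for a later fork).
print(torch.cuda.is_available())
```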
Thanks again for your work everyone!
@ngimel @malfet @awaelchli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85951
Approved by: https://github.com/ngimel
run tests in parallel at the test file granularity
Runs 3 files in parallel using a multiprocessing pool; output goes to a file, which is then printed when the test finishes. Some tests cannot be run in parallel (usually due to lacking memory), so we run those afterwards. Sharding is changed to attempt to mask large files with other large files / run them on the same shard.
test_ops* gets a custom handler to run it because it is simply too big (2 hrs on Windows) and linalg_cholesky fails (I would really like a solution to this if possible, but until then we use the custom handler).
Reduces CUDA test time by a lot and total Windows test time by ~1 hr.
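A rough sketch of the file-level parallel run described above; the pool size of 3 follows the description, while the file names and helper are illustrative, not the actual run_test.py code:
```python
# Sketch: run test files 3 at a time, capture each file's output, print on completion.
import subprocess
import sys
import tempfile
from multiprocessing import Pool

def run_test_file(test_file: str) -> tuple[str, int, str]:
    with tempfile.NamedTemporaryFile(mode="w+", suffix=".log", delete=False) as log:
        ret = subprocess.call([sys.executable, test_file], stdout=log, stderr=log)
        log.seek(0)
        return test_file, ret, log.read()

if __name__ == "__main__":
    parallel = ["test_a.py", "test_b.py", "test_c.py"]  # hypothetical test files
    with Pool(3) as pool:
        for name, ret, output in pool.imap(run_test_file, parallel):
            print(f"===== {name} (exit {ret}) =====")
            print(output)
```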
Ref. https://github.com/pytorch/pytorch/issues/82894
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84961
Approved by: https://github.com/huydhn
- [x] Direct dependency on UCX is completely removed, UCC active set API always enabled
- [x] Remove `TORCH_UCC_PROFILING_ENABLE`, always enable profiling
- [x] Fixes profiling of `recv` and `all_gather`
- [x] Use the NCCL TL of UCC on CUDA, as the UCP TL is not well supported on CUDA
Most tests are passing, but there are a few skipped tests:
- `scatter` and `gather` are not supported by the UCP TL of UCC on CPU tensors
- A few flaky tests in PyTorch's CI environment
- Profiler-related failures, some of them will be fixed by @Fuzzkatt in https://github.com/pytorch/pytorch/pull/84368
After this PR is merged, I will continue to work on these skipped failures.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83285
Approved by: https://github.com/vtlam, https://github.com/malfet, https://github.com/kwen2501
This is a new version of #15648 based on the latest master branch.
Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR.
In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I marked as skip are causing segfaults. Running xdoctest results in 293 failed, 201 passed tests. The next commits will be to disable those tests. (unfortunately I don't have a tool that will insert the `#xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.)
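For illustration, this is what the directive looks like inside a docstring; the function is made up, and only the `# xdoctest: +SKIP` line is the point:
```python
# Sketch: marking a failing doctest so xdoctest collects it but does not execute it.
def scale(x):
    """
    Multiply a tensor by two.

    Example:
        >>> # xdoctest: +SKIP
        >>> import torch
        >>> scale(torch.ones(2))
        tensor([2., 2.])
    """
    return x * 2
```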
Fixes https://github.com/pytorch/pytorch/issues/71105
@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797
Approved by: https://github.com/ezyang
After https://github.com/pytorch/pytorch/pull/81116, we started pulling test times straight from the source instead of first downloading them in the build job and then having the test job take the build job's version. This can cause an issue where different shards pull different versions of the file, leading to incorrect sharding (e.g. two shards accidentally running the same test file). This generally happens if the test jobs run while the test times file is being updated (unlikely, but not impossible) or if someone reruns a test job the next day.
In this PR, I return to the old method of downloading the test times file during the build job and having the test jobs pull from the build job's uploaded artifacts. If there is no test times file in the build job's artifacts, we fall back to the default sharding plan (a small sketch of this fallback follows the notes below).
Notes:
* script moved to a new file to avoid needing to import torch, which would require torch to be built, which can cause issues with asan
* I got errors with asan (`ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.`), so I put the script at the beginning of the build
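A small sketch of the fallback, with the test-times file name assumed for illustration:
```python
# Sketch: use the test-times file downloaded with the build artifacts if present,
# otherwise shard without timing data.
import json
import os

TEST_TIMES_FILE = ".pytorch-test-times.json"  # assumed name, for illustration only

def load_test_times() -> dict:
    if os.path.exists(TEST_TIMES_FILE):
        with open(TEST_TIMES_FILE) as f:
            return json.load(f)
    print("No test time stats from build artifacts, using default sharding plan")
    return {}

print(load_test_times())
```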
### Test Plan
Verified that the number of tests run in the pull and trunk workflows is similar to workflows run on master. Checked the logs to see whether artifacts were being used for sharding. Spot-checked a few test configs to confirm that their lists of selected tests didn't overlap.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81915
Approved by: https://github.com/huydhn
This PR:
- adds the ability to run functorch tests via run_test.py
- changes the functorch shards in PyTorch CI to invoke functorch tests
via run_test.py
The main motivation for this is so that functorch tests hook into the
standard PyTorch test infrastructure.
Questions for reviewers:
- the functorch tests are located outside of the pytorch/test folder
(they're in the pytorch/functorch/test folder). Is this OK? (run_test.py
works locally for me).
Test Plan:
- checked that `python run_test.py --functorch` ran functorch tests
locally
- Local mock test: added `{"test_compilation_for_dynamic_shape
(__main__.TestCompileCache)":
["https://github.com/pytorch/pytorch/issues/82016", ["linux"]]}` to .pytorch-disabled-tests.json, ran functorch tests, verified that the test was skipped.
- Wait for CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82012
Approved by: https://github.com/janeyx99
In the case of target determination, this is just removing comments that
refer to non-existent code.
In the case of the test specification code, this removes (what I believe
to be) an unused feature. If we're using this somehow, let me know and I
can revise the PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79372
Approved by: https://github.com/janeyx99
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75374
From the FairSeq and MetaSeq codebases (which are essentially transformer models), we have found that loads of ops are not supported by ShardedTensor. So we now implement a simple version so that we can at least run a transformer example.
Ops include: chunk, transpose, view, masked_fill, dropout, softmax, and type_as.
Isolate the common logic of registering simple ops into a function; for future registrations, we just need to implement at most three functions for a new op.
ghstack-source-id: 155309147
Test Plan: CI
Reviewed By: pritamdamania87
Differential Revision: D35123021
fbshipit-source-id: 660e559fb8b4a910eb63e0586c63ab927873a2ce
(cherry picked from commit 83a87ebf627d863448dfe1019c7c5f7112cc14ab)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76199
Since PartialTensor is somewhat isolated from ShardedTensor, we now move it to the _shard folder.
Also, we added the logic to remove padding when the size is not divisible by the world size, and modified the unit test to reflect this change.
Finally, we need to consider the placement order for the resharding spec for PartialTensor; the related logic is added in this change. Furthermore, for sharded linear, we need to order the placements by rank to get the expected local result.
ghstack-source-id: 154853290
Test Plan: CI
Reviewed By: pritamdamania87, wanchaol
Differential Revision: D35827894
fbshipit-source-id: 58dab77969b8b6557f42afa7e8f5a8a053dd5793
(cherry picked from commit abeb28f16582dcf707c2e165f39df6caf692384d)
As per title.
### When running `python run_test.py -h`
It used to show:
- The general unittest parser help that we print via a second thread 35545d85dc/torch/testing/_internal/common_utils.py (L467-L470)
- The common_utils's parser help
<details><summary>Full result</summary>
<p>
```bash
$ python run_test.py -h
usage: run_test.py [-h] [-v] [-q] [--locals] [-f] [-c] [-b] [-k TESTNAMEPATTERNS] [tests [tests ...]]
positional arguments:
tests a list of any number of test modules, classes and test methods.
optional arguments:
-h, --help show this help message and exit
-v, --verbose Verbose output
-q, --quiet Quiet output
--locals Show local variables in tracebacks
-f, --failfast Stop on first fail or error
-c, --catch Catch Ctrl-C and display results so far
-b, --buffer Buffer stdout and stderr during tests
-k TESTNAMEPATTERNS Only run tests which match the given substring
Examples:
run_test.py - run default set of tests
run_test.py MyTestSuite - run suite 'MyTestSuite'
run_test.py MyTestCase.testSomething - run MyTestCase.testSomething
run_test.py MyTestCase - run all 'test*' test methods
in MyTestCase
usage: run_test.py [-h] [--subprocess] [--seed SEED] [--accept] [--jit_executor JIT_EXECUTOR] [--repeat REPEAT] [--test_bailouts]
[--save-xml [SAVE_XML]] [--discover-tests] [--log-suffix LOG_SUFFIX] [--run-parallel RUN_PARALLEL]
[--import-slow-tests [IMPORT_SLOW_TESTS]] [--import-disabled-tests [IMPORT_DISABLED_TESTS]]
optional arguments:
-h, --help show this help message and exit
--subprocess whether to run each test in a subprocess
--seed SEED
--accept
--jit_executor JIT_EXECUTOR
--repeat REPEAT
--test_bailouts
--save-xml [SAVE_XML]
--discover-tests
--log-suffix LOG_SUFFIX
--run-parallel RUN_PARALLEL
--import-slow-tests [IMPORT_SLOW_TESTS]
--import-disabled-tests [IMPORT_DISABLED_TESTS]
```
</p>
</details>
It now prints:
- The general unittest parser help the same way. Should we remove this? We can't merge them, unfortunately, as unittest does not accept a parent parser / does not expose its parser for us to take as a parent.
- The combined common_utils + run_test parsers help
<details><summary>Full result</summary>
<p>
```bash
$ python run_test.py -h
usage: run_test.py [-h] [-v] [-q] [--locals] [-f] [-c] [-b] [-k TESTNAMEPATTERNS] [tests [tests ...]]
positional arguments:
tests a list of any number of test modules, classes and test methods.
optional arguments:
-h, --help show this help message and exit
-v, --verbose Verbose output
-q, --quiet Quiet output
--locals Show local variables in tracebacks
-f, --failfast Stop on first fail or error
-c, --catch Catch Ctrl-C and display results so far
-b, --buffer Buffer stdout and stderr during tests
-k TESTNAMEPATTERNS Only run tests which match the given substring
Examples:
run_test.py - run default set of tests
run_test.py MyTestSuite - run suite 'MyTestSuite'
run_test.py MyTestCase.testSomething - run MyTestCase.testSomething
run_test.py MyTestCase - run all 'test*' test methods
in MyTestCase
Ignoring disabled issues: []
usage: run_test.py [-h] [--subprocess] [--seed SEED] [--accept] [--jit_executor JIT_EXECUTOR] [--repeat REPEAT] [--test_bailouts]
[--save-xml [SAVE_XML]] [--discover-tests] [--log-suffix LOG_SUFFIX] [--run-parallel RUN_PARALLEL]
[--import-slow-tests [IMPORT_SLOW_TESTS]] [--import-disabled-tests [IMPORT_DISABLED_TESTS]] [-v] [--jit]
[--distributed-tests] [-core] [-pt] [-c] [-i TESTS [TESTS ...]] [-x TESTS [TESTS ...]] [-f TESTS] [-l TESTS]
[--bring-to-front TESTS [TESTS ...]] [--ignore-win-blocklist] [--continue-through-error]
[--export-past-test-times [EXPORT_PAST_TEST_TIMES]] [--shard SHARD SHARD] [--exclude-jit-executor]
[--exclude-distributed-tests] [--run-specified-test-cases [RUN_SPECIFIED_TEST_CASES]]
[--use-specified-test-cases-by {include,bring-to-front}] [--dry-run]
[additional_unittest_args [additional_unittest_args ...]]
Run the PyTorch unit test suite
positional arguments:
additional_unittest_args
additional arguments passed through to unittest, e.g., python run_test.py -i sparse -- TestSparse.test_factory_size_check
optional arguments:
-h, --help show this help message and exit
--subprocess whether to run each test in a subprocess
--seed SEED
--accept
--jit_executor JIT_EXECUTOR
--repeat REPEAT
--test_bailouts
--save-xml [SAVE_XML]
--discover-tests
--log-suffix LOG_SUFFIX
--run-parallel RUN_PARALLEL
--import-slow-tests [IMPORT_SLOW_TESTS]
--import-disabled-tests [IMPORT_DISABLED_TESTS]
-v, --verbose print verbose information and test-by-test results
--jit, --jit run all jit tests
--distributed-tests, --distributed-tests
run all distributed tests
-core, --core Only run core tests, or tests that validate PyTorch's ops, modules,and autograd. They are defined by CORE_TEST_LIST.
-pt, --pytest If true, use `pytest` to execute the tests. E.g., this runs TestTorch with pytest in verbose and coverage mode: python run_test.py -vci torch -pt
-c, --coverage enable coverage
-i TESTS [TESTS ...], --include TESTS [TESTS ...]
select a set of tests to include (defaults to ALL tests). tests must be a part of the TESTS list defined in run_test.py
-x TESTS [TESTS ...], --exclude TESTS [TESTS ...]
select a set of tests to exclude
-f TESTS, --first TESTS
select the test to start from (excludes previous tests)
-l TESTS, --last TESTS
select the last test to run (excludes following tests)
--bring-to-front TESTS [TESTS ...]
select a set of tests to run first. This can be used in situations where you want to run all tests, but care more about some set, e.g. after making a change to a specific component
--ignore-win-blocklist
always run blocklisted windows tests
--continue-through-error
Runs the full test suite despite one of the tests failing
--export-past-test-times [EXPORT_PAST_TEST_TIMES]
dumps test times from previous S3 stats into a file, format JSON
--shard SHARD SHARD runs a shard of the tests (taking into account other selections), e.g., --shard 2 3 will break up the selected tests into 3 shards and run the tests in the 2nd shard (the first number should not exceed the second)
--exclude-jit-executor
exclude tests that are run for a specific jit config
--exclude-distributed-tests
exclude distributed tests
--run-specified-test-cases [RUN_SPECIFIED_TEST_CASES]
load specified test cases file dumped from previous OSS CI stats, format CSV. If all test cases should run for a <test_module> please add a single row:
test_filename,test_case_name
...
<test_module>,__all__
...
how we use the stats will be based on option "--use-specified-test-cases-by".
--use-specified-test-cases-by {include,bring-to-front}
used together with option "--run-specified-test-cases". When specified test case file is set, this option allows the user to control whether to only run the specified test modules or to simply bring the specified modules to front and also run the remaining modules. Note: regardless of this option, we will only run the specified test cases within a specified test module. For unspecified test modules with the bring-to-front option, all test cases will be run, as one may expect.
--dry-run Only list the test that will run.
where TESTS is any of: benchmark_utils/test_benchmark_utils, distributed/_shard/sharded_optim/test_sharded_optim, distributed/_shard/sharded_tensor/ops/test_binary_cmp, distributed/_shard/sharded_tensor/ops/test_elementwise_ops, distributed/_shard/sharded_tensor/ops/test_embedding, distributed/_shard/sharded_tensor/ops/test_embedding_bag, distributed/_shard/sharded_tensor/ops/test_init, distributed/_shard/sharded_tensor/ops/test_linear, distributed/_shard/sharded_tensor/ops/test_math_ops, distributed/_shard/sharded_tensor/test_megatron_prototype, distributed/_shard/sharded_tensor/test_partial_tensor, distributed/_shard/sharded_tensor/test_sharded_tensor, distributed/_shard/sharded_tensor/test_sharded_tensor_reshard, distributed/_shard/sharding_spec/test_sharding_spec, distributed/_shard/test_replicated_tensor, distributed/algorithms/test_join, distributed/elastic/events/lib_test, distributed/elastic/metrics/api_test, distributed/elastic/multiprocessing/api_test, distributed/elastic/timer/api_test, distributed/elastic/timer/local_timer_example, distributed/elastic/timer/local_timer_test, distributed/elastic/utils/distributed_test, distributed/elastic/utils/logging_test, distributed/elastic/utils/util_test, distributed/fsdp/test_flatten_params_wrapper, distributed/fsdp/test_fsdp_apply, distributed/fsdp/test_fsdp_checkpoint, distributed/fsdp/test_fsdp_clip_grad_norm, distributed/fsdp/test_fsdp_comm, distributed/fsdp/test_fsdp_core, distributed/fsdp/test_fsdp_freezing_weights, distributed/fsdp/test_fsdp_grad_acc, distributed/fsdp/test_fsdp_ignored_modules, distributed/fsdp/test_fsdp_input, distributed/fsdp/test_fsdp_memory, distributed/fsdp/test_fsdp_mixed_precision, distributed/fsdp/test_fsdp_multiple_forward, distributed/fsdp/test_fsdp_multiple_wrapping, distributed/fsdp/test_fsdp_optim_state, distributed/fsdp/test_fsdp_overlap, distributed/fsdp/test_fsdp_pure_fp16, distributed/fsdp/test_fsdp_state_dict, distributed/fsdp/test_fsdp_summon_full_params, distributed/fsdp/test_fsdp_traversal, distributed/fsdp/test_fsdp_uneven, distributed/fsdp/test_shard_utils, distributed/fsdp/test_utils, distributed/fsdp/test_wrap, distributed/nn/jit/test_instantiator, distributed/optim/test_zero_redundancy_optimizer, distributed/pipeline/sync/skip/test_api, distributed/pipeline/sync/skip/test_gpipe, distributed/pipeline/sync/skip/test_inspect_skip_layout, distributed/pipeline/sync/skip/test_leak, distributed/pipeline/sync/skip/test_portal, distributed/pipeline/sync/skip/test_stash_pop, distributed/pipeline/sync/skip/test_tracker, distributed/pipeline/sync/skip/test_verify_skippables, distributed/pipeline/sync/test_balance, distributed/pipeline/sync/test_bugs, distributed/pipeline/sync/test_checkpoint, distributed/pipeline/sync/test_copy, distributed/pipeline/sync/test_deferred_batch_norm, distributed/pipeline/sync/test_dependency, distributed/pipeline/sync/test_inplace, distributed/pipeline/sync/test_microbatch, distributed/pipeline/sync/test_phony, distributed/pipeline/sync/test_pipe, distributed/pipeline/sync/test_pipeline, distributed/pipeline/sync/test_stream, distributed/pipeline/sync/test_transparency, distributed/pipeline/sync/test_worker, distributed/rpc/cuda/test_tensorpipe_agent, distributed/rpc/test_faulty_agent, distributed/rpc/test_tensorpipe_agent, distributed/test_c10d_common, distributed/test_c10d_gloo, distributed/test_c10d_nccl, distributed/test_c10d_spawn_gloo, distributed/test_c10d_spawn_nccl, distributed/test_data_parallel, distributed/test_distributed_spawn, distributed/test_launcher, 
distributed/test_nccl, distributed/test_pg_wrapper, distributed/test_store, distributions/test_constraints, distributions/test_distributions, lazy/test_bindings, lazy/test_extract_compiled_graph, lazy/test_ts_opinfo, test_ao_sparsity, test_autocast, test_autograd, test_binary_ufuncs, test_bundled_inputs, test_complex, test_cpp_api_parity, test_cpp_extensions_aot_ninja, test_cpp_extensions_aot_no_ninja, test_cpp_extensions_jit, test_cuda, test_cuda_primary_ctx, test_dataloader, test_datapipe, test_deploy, test_deploy, test_dispatch, test_expanded_weights, test_foreach, test_function_schema, test_functional_autograd_benchmark, test_functional_optim, test_functionalization, test_futures, test_fx, test_fx_experimental, test_hub, test_import_stats, test_indexing, test_jit, test_jit_autocast, test_jit_cuda_fuser, test_jit_disabled, test_jit_fuser_legacy, test_jit_fuser_te, test_jit_legacy, test_jit_profiling, test_license, test_linalg, test_logging, test_masked, test_mkldnn, test_mobile_optimizer, test_model_dump, test_module_init, test_modules, test_monitor, test_multiprocessing, test_multiprocessing_spawn, test_namedtensor, test_namedtuple_return_api, test_native_functions, test_nestedtensor, test_nn, test_numba_integration, test_numpy_interop, test_openmp, test_ops, test_ops_gradients, test_ops_jit, test_optim, test_overrides, test_package, test_per_overload_api, test_profiler, test_pruning_op, test_public_bindings, test_python_dispatch, test_pytree, test_quantization, test_reductions, test_scatter_gather_ops, test_serialization, test_set_default_mobile_cpu_allocator, test_shape_ops, test_show_pickle, test_sort_and_select, test_sparse, test_sparse_csr, test_spectral_ops, test_stateless, test_tensor_creation_ops, test_tensorboard, test_tensorexpr, test_tensorexpr_pybind, test_testing, test_torch, test_type_hints, test_type_info, test_type_promotion, test_unary_ufuncs, test_utils, test_view_ops, test_vmap, test_vulkan, test_xnnpack_integration
```
</p>
</details>
### When running anything else (for example `python test_autograd.py -h`)
It did not change and still prints:
- The general unittest parser help that we print via a second thread
- The common_utils's parser help
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76152
Approved by: https://github.com/malfet, https://github.com/seemethere
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73873
Basic ShardingPlan interface and Sharder implementation:
1. We provide `ShardingPlan` to allow the user to specify all parameter sharding strategies for a given model. This includes `plan` for sharding the parameters, `output_plan` for tagging the output layout, and `return_local_tensor` for converting back to DDP.
2. Introduce the `shard_module` API, which takes an nn.Module and a ShardingPlan and shards the module according to the plan.
TODO:
The next PR will introduce an extensible Sharder and a ShardingPlanner.
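A minimal usage sketch of the two pieces above; the import paths and `ShardingPlan` fields follow later public versions of this API and should be treated as assumptions here, and an initialized process group (e.g. launched via torchrun) is required:
```
# Sketch only: import paths/fields are assumptions, and a torch.distributed
# process group must already be initialized (e.g. launched via torchrun).
import torch.nn as nn
from torch.distributed._shard import shard_module
from torch.distributed._shard.sharding_plan import ShardingPlan
from torch.distributed._shard.sharding_spec import ChunkShardingSpec

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 8)
        self.fc2 = nn.Linear(8, 8)

    def forward(self, x):
        return self.fc2(self.fc1(x))

colwise = ChunkShardingSpec(dim=0, placements=["rank:0/cuda:0", "rank:1/cuda:1"])

# `plan` shards parameters, `output_plan` tags the output layout, and
# `return_local_tensor` converts the tagged output back to a local tensor (DDP-friendly).
plan = ShardingPlan(
    plan={"fc1.weight": colwise},
    output_plan={"": colwise},
    return_local_tensor=[""],
)

model = MyModel()
shard_module(model, plan)  # shards the module in place according to the plan
```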
ghstack-source-id: 154682421
Test Plan: test_sharding_plann.py
Reviewed By: pritamdamania87, fduwjj
Differential Revision: D34695159
fbshipit-source-id: 3d695803c4b7e9a7543177ade5b709b5f847baa9
(cherry picked from commit 670cd279b0e5304a9bf0ce6e6651a08273a77035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73322
These tests have been disabled in OSS CI since #34785.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D34436844
Pulled By: davidberard98
fbshipit-source-id: c5b14b33e7f369a6fa1e9cfbcb484a30dffc659e
(cherry picked from commit b08f51587c0203c3e8b69f06ea613759e740aa4f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73529
Add ReplicatedTensor. A ReplicatedTensor is a type of tensor that has the same value on all ranks across the world size.
ReplicatedTensor is a :class:`~torch.Tensor` subclass, and it can be used together with ShardedTensor/Tensor to express different types of computation. The inter-op rules are defined as follows (using torch.add as an example op):
ReplicatedTensor + ReplicatedTensor = ReplicatedTensor
ReplicatedTensor + torch.Tensor = torch.Tensor
ReplicatedTensor + ShardedTensor = ShardedTensor
We also added a `validate()` API to help users check whether a replicated tensor on a given process_group is truly replicated.
TODO: the next PR will add ShardedTensor/PartialTensor logic to handle ReplicatedTensor.
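A hypothetical illustration of the inter-op rules above; the import path and constructor are assumptions (this class lived in the experimental `_shard` namespace and later changed), and a process group must already be initialized on each rank:
```
# Hypothetical sketch -- import path/constructor are assumptions, not the
# confirmed API of this PR; requires an initialized process group on each rank.
import torch
from torch.distributed._shard.replicated_tensor import ReplicatedTensor

rt = ReplicatedTensor(torch.ones(4))      # same value on every rank
other = ReplicatedTensor(torch.ones(4))

torch.add(rt, other)                      # ReplicatedTensor + ReplicatedTensor -> ReplicatedTensor
torch.add(rt, torch.rand(4))              # ReplicatedTensor + Tensor -> plain torch.Tensor

rt.validate()                             # checks the value really is identical across ranks
```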
ghstack-source-id: 152064781
Test Plan: test_replicated_tensor
Reviewed By: pritamdamania87, fduwjj
Differential Revision: D34529374
fbshipit-source-id: 16ccb300e9f9c47ac29a17eb6d46d029ab7d60b8
(cherry picked from commit 44f4e11e795a1bf330a8108bda256950ca769525)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73676
For some reason https://github.com/pytorch/pytorch/pull/72637 ended up getting messed up during rebasing, so please refer to that PR for review history.
This PR creates a new workflow called `deploy-linux-xenial-cuda11.3-py3.7-gcc7` for torch::deploy tests.
For testing go to https://www.torch-ci.com/pytorch/pytorch/pull/73676 and check if a build and test job occur with `deploy-linux-xenial-cuda11.3-py3.7-gcc7`.
Test Plan: Imported from OSS
Reviewed By: soulitzer
Differential Revision: D34586702
Pulled By: PaliC
fbshipit-source-id: 5627cf4ff411a4a04030f8b7726f84af979da213
(cherry picked from commit df6dddebb9fe078a6053a31033b5a40cc742fcf3)
Fixes #72368
As per the referenced issue, test_ops as a single file takes around 3:30-4:00 hours to execute on ASAN jobs:
Reference: pytorch_test_times.json
```
{
"commit": "39535fec6c3ff5bf7c2d322d096c59571c3295ed",
"JOB_BASE_NAME": "linux-xenial-py3.7-clang7-asan",
"job_times": {
"test_ops": 14928.355000000636, <- This test group is over 4hrs alone
```
----
Hence, test_ops is separated into the following parts:
1. TestGradients
2. TestJit
3. TestCommon and TestMathBits
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74297
Approved by: https://github.com/malfet
Summary:
Remove fx2trt test from oss CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72595
Test Plan: CI
Reviewed By: houseroad
Differential Revision: D34112595
Pulled By: wushirong
fbshipit-source-id: 02376ef0f25381eff31b72dcbf964c1966af9793
(cherry picked from commit e3d698a942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69735
We want to build a prototype of Megatron-LM so that we can apply PT-D ops to models like transformers and other Meta flagship models.
The basic idea of Megatron-LM is as follows:
1. Col-wise sharding of linear weight. Perform the linear op for the first layer.
2. Perform a math op (optional), such as ReLU or GeLU. We use GeLU in our example unit test. The input is from step 1.
3. Row-wise sharding of linear weight. Perform the linear op for the second layer. The input is from step 2.
We then save the communication needed to concatenate the col-wise sharding results and to spread the input to different ranks for row-wise sharding.
The changes are as follows:
1. Return a ShardedTensor for the col-wise sharding in the sharded_linear op.
2. Return a PartialTensor for the row-wise sharding in the sharded_linear op.
3. Leverage APIs already defined for `reshard` to merge/aggregate local results to a fully sync local result if needed.
4. Add a helper function to create a sharded tensor based on the local result.
5. Add a unit test to test the Megatron-LM idea mentioned above and compare it with local ops, including the grad and optimizer, so that we can ensure the correctness of the implementation.
6. Refactor the unit test of sharded linear to reflect the changes in the code.
ghstack-source-id: 148273049
Test Plan: Unit test + CI
Reviewed By: pritamdamania87
Differential Revision: D32978221
fbshipit-source-id: 565fc92e7807e19d53b0261f8ace3945bef69e3e
(cherry picked from commit 344abe7520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70079
We defined a new concept named `PartialTensor`, which is an abstraction to represent Tensors that need aggregation across multiple devices and multiple processes.
We also defined an API `reshard_output` to reshard a `PartialTensor` to a `Tensor`, or reshard a `ShardedTensor` to a `ShardedTensor`/`Tensor`. This is done via the class `ModuleResharder`, which acts as a wrapper around the original module plus a reshard in the final step.
The `reshard` logic is defined in each class (`ShardedTensor` and `PartialTensor`).
ghstack-source-id: 148273050
Test Plan: Unit test is in the next PR.
Reviewed By: pritamdamania87
Differential Revision: D33121037
fbshipit-source-id: 5f56617ea526b857c5b73df6e069697d428ec359
(cherry picked from commit 58b1457cbc)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72141
We have many sharding components currently:
torch.distributed._sharded_tensor, torch.distributed._sharding_spec,
torch.distributed._sharded_optimizer and more coming.
As a result, we are organizing all of this under the `torch.distributed._shard`
package. For BC reasons, I'm still keeping the old packages and having them just
reference the new package.
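The BC part can be pictured as a thin re-export shim in each old package; a minimal sketch (module path and warning text are illustrative, not the actual code):
```
# torch/distributed/_sharded_tensor/__init__.py -- illustrative BC shim:
# the old package just re-exports the new location and warns on import.
import warnings

from torch.distributed._shard.sharded_tensor import *  # noqa: F401,F403

warnings.warn(
    "torch.distributed._sharded_tensor has moved to "
    "torch.distributed._shard.sharded_tensor; please update your imports.",
    DeprecationWarning,
)
```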
ghstack-source-id: 148150861
Test Plan: waitforbuildbot
Reviewed By: fduwjj
Differential Revision: D33904585
fbshipit-source-id: 057e847eb7521b536a3ee4e0f94871aacc752062
(cherry picked from commit 29a70dd7af)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71742
We have many sharding components currently:
torch.distributed._sharded_tensor, torch.distributed._sharding_spec,
torch.distributed._sharded_optimizer and more coming.
As a result, we are organizing all of this under the `torch.distributed.shard`
package. For BC reasons, I'm still keeping the old packages and having them just
reference the new package.
ghstack-source-id: 147899768
Test Plan: waitforbuildbot
Reviewed By: fduwjj, wanchaol
Differential Revision: D33755913
fbshipit-source-id: dc692b31e2607063d55dfcb3db33ec53961d5a5b
(cherry picked from commit 5b6885f358)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70145
Added support for torch.equal to ShardedTensor. This is really
helpful in terms of comparing two ShardedTensors.
ghstack-source-id: 146066939
Test Plan: waitforbuildbot
Reviewed By: wanchaol
Differential Revision: D33201714
fbshipit-source-id: 56adfc36e345d512c9901c56c07759bf658c745b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69734
Added support for `torch.equal` to ShardedTensor. This is really
helpful in terms of comparing two ShardedTensors.
Will implement `allclose` in a follow-up PR.
ghstack-source-id: 145301451
Test Plan: waitforbuildbot
Reviewed By: fduwjj, wanchaol
Differential Revision: D33004315
fbshipit-source-id: 786fe26baf82e1bb4fecfdbfc9ad4b64e704877f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68607
This PR added ShardedOptimizer and an API to get module parameters along with ShardedTensor params; it allows the user to use this optimizer wrapper to construct an optimizer that involves ShardedTensors.
state_dict support will come in a follow-up diff.
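A minimal sketch of the wrapper described above, using the helper/wrapper names this work later exposed (`named_params_with_sharded_tensor`, `ShardedOptimizer`); the names and exact signatures are assumptions here:
```
# Sketch only: names/signatures are assumptions based on later versions of this API.
import torch
from torch.distributed._shard.sharded_optim import (
    ShardedOptimizer,
    named_params_with_sharded_tensor,
)

model = torch.nn.Linear(4, 4)  # stands in for a module whose params may include ShardedTensors

# Collect regular parameters and ShardedTensor parameters together, then wrap
# any torch.optim optimizer class around them.
optim = ShardedOptimizer(
    dict(named_params_with_sharded_tensor(model)),
    torch.optim.SGD,
    lr=0.1,
)

optim.zero_grad()
model(torch.randn(2, 4)).sum().backward()
optim.step()
```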
ghstack-source-id: 145532834
Test Plan: python test_sharded_optim.py
Reviewed By: pritamdamania87
Differential Revision: D32539994
fbshipit-source-id: a3313c6870d1f1817fc3e08dc2fc27dc43bef743
Summary:
The error message was changed following a PR comment. And since the test doesn't run on CI, I forgot to update the test to catch the new error message.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69565
Reviewed By: mrshenli
Differential Revision: D32932982
Pulled By: albanD
fbshipit-source-id: a1da72b0ca735e72b481bc944039233094f1c422
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68822
Per title, we switched over c10d_gloo and nccl and the results look good
so far, so switch the rest of them as well. After this, the only dist tests that
won't run in a subprocess are the pipe and fsdp tests, which historically haven't had
much flakiness.
ghstack-source-id: 144213522
Test Plan: CI
Reviewed By: H-Huang
Differential Revision: D32624330
fbshipit-source-id: 469f613e5b0e4529e6b23ef259d948837d4af26b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68821
Continuing effort to move most distributed tests to run in subprocess
for better reproducibility + reduce flakiness.
ghstack-source-id: 144213520
Test Plan: CI
Reviewed By: H-Huang
Differential Revision: D32624199
fbshipit-source-id: 04448636320554d7a3ab29ae92bc1ca9fbe37da2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68504
Per title
ghstack-source-id: 143928767
Test Plan: CI
Reviewed By: H-Huang
Differential Revision: D32485100
fbshipit-source-id: a55687aea4af69e3830aee6f0278550c72f142c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68503
Per title
ghstack-source-id: 143928768
Test Plan: CI
Reviewed By: H-Huang
Differential Revision: D32484990
fbshipit-source-id: 6682f46256af0da5153e5087a91a7044156dd17f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67400
c10d/frontend.cpp was originally proposed to introduce a pure C++ API and use TorchBind to share the Python-level API with TorchScript. This is no longer needed, so delete it to reduce code redundancy.
ghstack-source-id: 143910066
Test Plan: wait for ci
Reviewed By: navahgar
Differential Revision: D31979270
fbshipit-source-id: 6ceb8b53d67ab8f9aef44b34da79346dfbb51225
Summary:
Context: https://github.com/pytorch/pytorch/issues/67061
Use `run_test.py`'s provided flag `"--subprocess"`, passed in like `extra_unittest_args=["--subprocess"]` when running test_distributed_spawn. This will ensure that each test is run separately in its own process. The goal is to more closely simulate how a developer would run a single test when reproducing a CI failure and make reproducibility easier in general.
Also, when a test fails, print out the exact command that was issued so the developer knows how to reproduce it.
For example, when a test fails, it will print out something like the following to the logs:
```
Test exited with non-zero exitcode 1. Command to reproduce: BACKEND=gloo WORLD_SIZE=3 /fsx/users/rvarm1/conda/envs/pytorch/bin/python distributed/test_distributed_spawn.py -v TestDistBackendWithSpawn.test_Backend_enum_class
```
Running test_distributed_spawn still uses the same command as before:
`
python test/run_test.py --verbose -i distributed/test_distributed_spawn
`
as seen in [distributed contributing](https://github.com/pytorch/pytorch/blob/master/torch/distributed/CONTRIBUTING.md) guide.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67901
Reviewed By: cbalioglu, mruberry
Differential Revision: D32225172
Pulled By: rohan-varma
fbshipit-source-id: 7e8d4c7a41858044bd2a4e0d1f0bf8f1ac671d67
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66101
Updated description:
This PR tests the functionalization pass in python in two ways. For each of the test programs that I have in `test_functionalization.py`, it:
- runs the program with and without functionalization, and asserts the outputs and (potentially mutated) inputs are equal in both cases
- runs the program with `LoggingTensor`, and uses expecttests on the resulting graph. I manually confirm that the graphs look reasonable and only contain functional ops.
Mechanically, the changes include:
- factoring out `LoggingTensor` into a testing util so it can be re-used in multiple tests
- adding some private python api's in the `torch` namespace as hooks that I can use during testing
In the original version of this PR, I also added some fixes to the `_make_subclass()` function in python: allowing you to pass in strides and storage_offset. I kept them in mainly because the changes were already there.
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D31942095
Pulled By: bdhirsh
fbshipit-source-id: 90ff4c88d461089704922e779571eee09c21d707
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67188
This diff/PR is trying to implement the ShardedEmbeddingBag using the ShardedTensor.
We support both row-wise and column-wise sharding of the embedding bag. The detailed logic can be found in the comment.
Several caveats:
1. Only the sharding of one weight is supported for now.
2. We support limited input params for the op. Support for more params is on the way.
3. We only support chunk sharding for now.
4. We only support a single local shard per rank for now.
Some other changes include:
1. Refactor the ShardedEmbedding code so that the common logic can be reused.
2. Fix tiny typos and a corner case in the API `get_chunked_dim_size`, where it would return -1 if we set dim_size = 5, split_size = 2, idx = 3. (This is a valid case because when chunks = 4 and dim_size = 5, then split_size = 2.)
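The corner case in point 2 can be made concrete with a small sketch; the clamped formula mirrors the described fix, and the helper here is a stand-alone illustration rather than the library code:
```
def get_chunked_dim_size(dim_size: int, split_size: int, idx: int) -> int:
    # Size of the idx-th chunk when a dimension of dim_size is split into
    # chunks of split_size; clamped at 0 so trailing empty chunks never go negative.
    return max(min(split_size, dim_size - split_size * idx), 0)

# dim_size=5, split_size=2 -> 4 chunks of sizes [2, 2, 1, 0];
# the unclamped formula returned -1 for idx=3.
assert [get_chunked_dim_size(5, 2, i) for i in range(4)] == [2, 2, 1, 0]
```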
ghstack-source-id: 142325915
Test Plan: Unit test and CI
Reviewed By: pritamdamania87
Differential Revision: D31749458
fbshipit-source-id: ed77e05e4ec94ef1a01b1feda8bbf32dc5d5da1b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63997
Use torch_function to extend torch.nn.init.uniform_
The init is done in SPMD fashion. Note that ideally we want to aggregate sharded tensors into a global tensor, init it, and reshard. It's fine to run it SPMD since uniform init is i.i.d. (independent and identically distributed).
Also enable the unit test in test_linear.py for the OSS test.
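A generic toy sketch of the `__torch_function__` mechanism used here; the class below is a stand-in, not the real ShardedTensor, and only shows how `torch.nn.init.uniform_` can be intercepted and applied to the local shard in SPMD fashion:
```
import torch

class ToyShardedTensor:
    """Toy stand-in that only holds a 'local shard'."""

    def __init__(self, local: torch.Tensor):
        self.local_shard = local

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = dict(kwargs or {})
        if func is torch.nn.init.uniform_:
            # The tensor may arrive positionally or as a keyword argument.
            target = args[0] if args else kwargs.pop("tensor")
            # Init only the local shard: since uniform_ is i.i.d., doing this
            # independently on every rank matches init-then-reshard.
            torch.nn.init.uniform_(target.local_shard, **kwargs)
            return target
        return NotImplemented

t = ToyShardedTensor(torch.empty(4))
torch.nn.init.uniform_(t, a=0.0, b=1.0)  # dispatched via __torch_function__
print(t.local_shard)
```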
Test Plan:
a) Unit Test
(pytorch) ... $ python test/distributed/_sharded_tensor/ops/test_init.py TestShardedTensorNNInit --v
(pytorch) ... $ python test/distributed/_sharded_tensor/ops/test_linear.py --v (before this change, running this command was a no-op)
or b) Manual run: Instruction here: https://docs.google.com/document/d/1_m1Hdo5w51-hhPlZ_F8Y6PIWrN7UgJZqiSpARYvhsaE/edit#
Imported from OSS
Reviewed By: pritamdamania87, anjali411
Differential Revision: D30563017
fbshipit-source-id: d1859f7682235bcb44515efc69ca92bc5e34fce1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66604
This diff/PR is trying to implement the ShardedEmbedding using the ShardedTensor.
Several caveats:
1. We support limited input params for the op. Support for more params is on the way.
2. We only support chunk sharding for now.
3. We only support a single local shard per rank for now.
ghstack-source-id: 141056130
Test Plan: Unit test and CI
Reviewed By: pritamdamania87
Differential Revision: D31544556
fbshipit-source-id: cc867dcba8c11e6f4c7c3722488908f5108cc67f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63881
This PR includes the minimal set of features to make FSDP work, like sharding, core data flow, and hooks. More tests will be added in follow-up PRs. Tests are refactored to utilize common PyTorch utils. The code is also refactored a little bit. Alternative ways to replace ".data" usage in this PR are still being discussed offline.
Test Plan: unit tests
Reviewed By: mrshenli
Differential Revision: D30521673
fbshipit-source-id: 9a23390dd7c925749604c6860e08fbe39ddc5500
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64128
This PR implements a sharded nn.Linear layer using ShardedTensors with
the following limitations:
1) Works only for ChunkShardingSpec.
2) Implementation is only aimed to demonstrate functionality and is most likely
not performant at all.
The PR also introduces a `shard_parameter` API to easily shard parameters of
`nn.Modules`. This also has the following limitations:
1) Works only for ChunkShardingSpec.
2) Is not performant since it uses broadcast instead of scatter, because
ProcessGroupNCCL doesn't yet support scatter.
Overall user API for running a sharded linear would be something like this:
```
# SPMD programming paradigm running same code on all nodes.
fc = nn.Linear(10, 10)
# Setup sharding.
sharding_spec=ChunkShardingSpec(...)
shard_parameter(fc, 'weight', sharding_spec, src_rank=0)
# Run as a normal linear layer.
inp = torch.rand(10, 10)
output = fc(inp)
```
ghstack-source-id: 138500985
Test Plan:
1) unit tests.
2) waitforbuildbot
Reviewed By: wanchaol, bowangbj
Differential Revision: D30621215
fbshipit-source-id: 1aa7478568c18a4572f6c3462fdf24a4cbde01d6
Summary:
There were several reports of target determinator incorrectly skipping
tests, most recent one is https://github.com/pytorch/pytorch/issues/64902
Let's disable it until it could be further stabilized
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64921
Reviewed By: seemethere, janeyx99
Differential Revision: D30901186
Pulled By: malfet
fbshipit-source-id: 531afd2d390c6b51f727330d5dd1882d70b6fdde
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64253
Follow up to D30496178 (f4aff3a346) to move the rest of distributed tests to their own jobs for Linux GHA.
ghstack-source-id: 137233785
Test Plan: CI
Reviewed By: walterddr
Differential Revision: D30662999
fbshipit-source-id: f7cfbc0d1223aca52120f17f9da987d70fda8de6
Summary:
Introduce a `discover_tests` function that globs for all Python files
starting with `test_` in the test folder, excluding subfolders that are
executed differently.
Fixes https://github.com/pytorch/pytorch/issues/64178
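A rough sketch of the idea (the exclusion list below is illustrative, not the actual blocklist):
```
from pathlib import Path
from typing import List

def discover_tests(test_dir: Path = Path("test"),
                   blocklisted_dirs=("cpp", "onnx", "package")) -> List[str]:
    """Glob for Python files starting with 'test_' under the test folder,
    skipping subfolders that are executed differently."""
    tests = []
    for path in test_dir.rglob("test_*.py"):
        rel = path.relative_to(test_dir)
        if rel.parts[0] in blocklisted_dirs:
            continue
        tests.append(str(rel.with_suffix("")))  # e.g. "distributed/test_store"
    return sorted(tests)

if __name__ == "__main__":
    print(discover_tests())
```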
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64246
Reviewed By: walterddr, seemethere
Differential Revision: D30661652
Pulled By: malfet
fbshipit-source-id: a52e78ec717b6846add267579dd8d9ae75326bf9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64197
Removes this line as test is gone.
ghstack-source-id: 136986275
Test Plan: CI
Reviewed By: walterddr
Differential Revision: D30642929
fbshipit-source-id: a0c7dfdfb35a4a7f7ec1b881dbea53d85136012c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63534
In this PR:
* We have changed the default dtype of `AutocastCPU` from `float16` to `bfloat16` as discussed here `https://github.com/pytorch/pytorch/pull/61002`
* We also update the list of operations that need casting to `lower_precision_fp` or `float32`.
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D30644914
Pulled By: ezyang
fbshipit-source-id: 8b93485ba452b3759611e3f0ac88e920fe495ac1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63496
This PR adds a (private) enable_python_mode context manager.
(see torch/utils/_python_dispatch.py).
enable_python_mode accepts the type of a __torch_dispatch__ object
as its argument. Whenever an operator gets called inside of the
context manager, it dispatches to the __torch_dispatch__ of
the passed-in type.
Example usage:
```
with enable_python_mode(LoggingTensor):
z = torch.empty([])
assert isinstance(z, LoggingTensor)
```
There are quite a few changes that were made to support this.
First, we added TorchDispatchTypeObject, a C++ struct that represents the
type of a `__torch_dispatch__` object (e.g. LoggingTensor).
It holds both the PyObject* representing the class and a PyInterpreter*
so we know which Python interpreter it came from.
Next, we updated the concrete_dispatch_fn in python_variable.cpp to accept
a `const std::shared_ptr<TorchDispatchTypeObject>&` argument. When this
is null, dispatching happens as usual. When it is non-null, we prepend
the TorchDispatchTypeObject's PyObject* to the overloaded args list so that
it is considered first for dispatch.
To get that to work, we changed how `handle_torch_dispatch_no_python_arg_parser`
works. The "overloaded args list" previously only consisted of Tensor PyObjects,
but now it can have types in addition to Tensors!
- We renamed `append_overloaded_arg` to `append_overloaded_arg`
- We added a new `append_overloaded_type` that appends a type to
overloaded_args
- We added special handling in `handle_torch_dispatch_no_python_arg_parser`
and `append_overloaded_arg` to handle types in addition to Tensors.
Then, there is PythonMode and PythonModeTLS.
- We reuse the DispatchKey::Python dispatch key as a mode key
- We use PythonMode::enter and PythonMode::exit to enable/disable
DispatchKey::Python and set the PythonModeTLS.
- PythonModeTLS stores a TorchDispatchTypeObject as metadata.
- PythonMode is in libtorch_python, and PythonModeTLS is in ATen.
This split is due to the libtorch_python library boundary (because we need
to save TLS in ATen/ThreadLocalState)
- We modify the PythonFallbackKernel to look up
the relevant TorchDispatchTypeObject (if Python Mode is active) and
dispatch using it.
There are two more miscellaneous changes:
- internal_new_from_data (torch/csrc/utils/tensor_new.cpp) gets an
exclude guard. enable_python_mode currently does not handle
torch.tensor and the exclude guard is to prevent a bug.
Future:
- This PR does not allow for the nesting of Python modes. In the future we
should be able to enable this with a more sane no_dispatch API and by changing
the TLS to a stack. For now I did not need this for CompositeImplicitAutograd testing.
Test Plan: - new tests
Reviewed By: malfet, albanD
Differential Revision: D30543236
Pulled By: zou3519
fbshipit-source-id: ef5444d96a5a957d1657b7e37dce80f9a497d452
Summary:
This is in response to a feature request from some folks in the core team to have a local command that would only run relevant "core" tests. The idea is to have a local smoke test option for developers to run locally before making a PR in order to verify their changes did not break core functionality. These smoke tests are not targeted to be short but rather relevant.
This PR enables that by allowing developers to run `python test/run_test.py --core` or `python test/run_test.py -core` in order to run the CORE_TEST_LIST, which is currently test_nn.py, test_torch.py, and test_ops.py.
I am not the best person to judge what should be considered "core", so please comment which tests should be included and/or excluded from the CORE_TEST_LIST!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63976
Test Plan:
```
(pytorch) janeyx@janeyx-mbp test % python run_test.py --core -v
Selected tests: test_nn, test_ops, test_torch
Running test_nn ... [2021-08-25 14:48:28.865078]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', 'test_nn.py', '-v'] ... [2021-08-25 14:48:28.865123]
test_to (__main__.PackedSequenceTest) ... ok
test_to_memory_format (__main__.PackedSequenceTest) ... ok
```
Reviewed By: walterddr
Differential Revision: D30575560
Pulled By: janeyx99
fbshipit-source-id: 3f151982c1e315e50e60cb0d818adaea34556a04
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63809
This moves out the modulefinder determinator to `tools/testing` since it is supposed to be CI-only. This also simplifies run_test.py a little bit.
Test Plan: Imported from OSS
Reviewed By: malfet, seemethere, janeyx99
Differential Revision: D30497438
Pulled By: driazati
fbshipit-source-id: 1d203037af5af6a20c1e7812da935e7cbb5cd82f
Summary:
Currently distributed tests are mixed within test_python.
We would like to split the distributed tests into their own batch, thus we need to split them out.
Adding an option to include/exclude distributed tests with CUSTOM_HANDLERS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63147
Test Plan:
- locally run with the additional run_test.py options.
- CI
Dependency: found a bug in mpiexec test and need https://github.com/pytorch/pytorch/issues/63580 to fix it first.
Reviewed By: bdhirsh
Differential Revision: D30496178
Pulled By: walterddr
fbshipit-source-id: 7903a57b619f2425028028f944211938823918a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63357
Adds the ability to set CONTINUE_THROUGH_ERROR as an environment
variable so that we can easily set it without having to add the flag
directly
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: astaff
Differential Revision: D30351108
Pulled By: seemethere
fbshipit-source-id: 767fa9bd24e1399f359eb24d16f6cc985a2d7173
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63054
1) Ensure these tests are skipped in environments without any GPUs.
2) Add the test to run_test.py
ghstack-source-id: 135595698
Test Plan: waitforbuildbot
Reviewed By: wanchaol
Differential Revision: D30239159
fbshipit-source-id: 21b543ba72e8d10182bc77e7ae1fd34fd4096509
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62937
Reland due to a Windows + CUDA failure; fix by running it on gloo on Windows even with CUDA.
ghstack-source-id: 135306176
Test Plan: ci
Reviewed By: mrshenli
Differential Revision: D30177734
fbshipit-source-id: 7625746984c8f858648c1b3632394b98bd4518d2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62774
Gates DistributedOptimizer, which relies on RRef, based on whether RPC is available. This should enable ZeRO to work on Windows, as Windows will not try to import the DistributedOptimizer. If this works as expected, we can enable the Windows tests for the functional/local SGD optimizers as well.
ghstack-source-id: 135216642
Test Plan: CI
Reviewed By: pbelevich
Differential Revision: D30117838
fbshipit-source-id: e6365a910a3d1ca40d95fa6777a7019c561957db
Summary:
This PR contains the initial version of `ModuleInfo` for use in testing modules. The design philosophy taken here is to start small and simple and build out / refactor as needed when more test coverage or `ModuleInfo` entries are added. As such, it's not intended for general usage yet. The PR contains the following:
* (new file) `torch/testing/_internal/common_modules.py`
* `ModuleInfo` definition - metadata for each module to use in testing
* `module_db` - the actual `ModuleInfo` database; currently contains entries for two modules
* `ModuleInput` - analogous to `SampleInput` from OpInfo; contains `FunctionInput`s for both constructor and forward pass inputs
* Constructor and forward pass inputs are tied together within a `ModuleInput` because they are likely correlated
* `FunctionInput` - just contains args and kwargs to pass to a function (is there a nicer way to do this?)
* `modules` decorator - analogous to `ops`; specifies a set of modules to run a test over
* Some constants used to keep track of all modules under torch.nn:
* `MODULE_NAMESPACES` - list of all namespaces containing modules
* `MODULE_CLASSES` - list of all module class objects
* `MODULE_CLASS_NAMES` - dict from module class object to nice name (e.g. torch.nn.Linear -> "nn.Linear")
* (new file) `test/test_modules.py`
* Uses the above to define tests over modules
* Currently, there is one test for demonstration, `test_forward`, which instantiates a module, runs its forward pass, and compares it to a reference, if one is defined
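A toy re-implementation sketching this data model (simplified stand-ins, not the actual `torch.testing._internal.common_modules` definitions):
```
from dataclasses import dataclass, field
from typing import List

import torch
import torch.nn as nn

@dataclass
class FunctionInput:
    args: tuple = ()
    kwargs: dict = field(default_factory=dict)

@dataclass
class ModuleInput:
    # Constructor and forward inputs are tied together because they correlate.
    constructor_input: FunctionInput
    forward_input: FunctionInput

@dataclass
class ModuleInfo:
    module_cls: type
    module_inputs: List[ModuleInput]

module_db = [
    ModuleInfo(nn.Linear,
               [ModuleInput(FunctionInput((4, 2)), FunctionInput((torch.randn(3, 4),)))]),
]

# What a test_forward-style check might do with the database:
for info in module_db:
    for mi in info.module_inputs:
        m = info.module_cls(*mi.constructor_input.args, **mi.constructor_input.kwargs)
        out = m(*mi.forward_input.args, **mi.forward_input.kwargs)
        assert out.shape == (3, 2)
```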
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61935
Reviewed By: mruberry
Differential Revision: D29881832
Pulled By: jbschlosser
fbshipit-source-id: cc05c7d85f190a3aa42d55d4c8b01847d1efd57f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61756
DDP will support running the optimizer as a communication hook with
optimizers that support a per-parameter/gradient step function `step_param`.
Add parity tests as we implement more optimizers that support `step_param` to
ensure parity with regular optimizers.
ghstack-source-id: 134330378
Test Plan: Ci
Reviewed By: SciPioneer
Differential Revision: D29727549
fbshipit-source-id: 18977c896f12b8e478298488b298fd107affcf5f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59077
Fixes #58549
`from_buffer` constructs a tensor object from an already allocated buffer through
CPython's buffer protocol. Besides the standard `dtype`, `count`, and `offset` parameters,
this function also accepts:
- `device`: where the buffer lives
- `requires_grad`: should autograd record operations on the new tensor
A new test file _test_buffer_protocol.py_ was created. Currently, only CPU tests were
implemented. That's because neither PyTorch nor Numba implements CPython's buffer
protocol. Therefore, there's no way to create a CUDA buffer with the existing
dependencies (could use PyCUDA for that, though).
At the moment, if `device` differs from the device the buffer actually lives, two things
may happen:
- `RuntimeError`, if `device='cuda'`
- Segmentation fault (not tested -- see above), if `device='cpu'`
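A small CPU usage sketch, assuming the constructor is exposed as `torch.frombuffer` (the name this functionality shipped under); `count` is in elements and `offset` is in bytes:
```
import array
import torch

buf = array.array("f", [1.0, 2.0, 3.0, 4.0])  # any object exposing the buffer protocol

# Read 2 float32 elements, skipping the first 4 bytes (one float element).
t = torch.frombuffer(buf, dtype=torch.float32, count=2, offset=4)
print(t)       # tensor([2., 3.])

buf[1] = 10.0  # the tensor shares memory with the buffer -- no copy is made
print(t[0])    # tensor(10.)
```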
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D29870914
Pulled By: mruberry
fbshipit-source-id: 9fa8611aeffedfe39c9af74558178157a11326bb
Summary:
and into tools/ folder
Currently run_test.py invokes tools/test_selections.py to:
1. download and analyze which test files to run
2. download and parse S3 stats and pass the info to local files.
3. common_utils.py uses the downloaded S3 stats to determine which test cases to run.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61479
Reviewed By: janeyx99
Differential Revision: D29661986
Pulled By: walterddr
fbshipit-source-id: bebd8c474bcc2444e135bfd2fa4bdd1eefafe595
Summary:
run_test.py currently does lots of downloading and test file/suite/case parsing. It doesn't work well outside of the CI environment.
Restructured run_test.py and created tools/test/test_selections.py, moving all test selection logic (reordering, categorizing slow tests, creating shards) there.
Follow-up PRs should:
- refactor the file read/write logic entangled inside test_selections.py into the stats/ folder
- restructure and add network-independent test logic to test_test_selections.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61124
Test Plan:
- tools/test
- CI
Related PR:
This follows the refactoring example in: https://github.com/pytorch/pytorch/issues/60373
Reviewed By: malfet
Differential Revision: D29558981
Pulled By: walterddr
fbshipit-source-id: 7f0fd9b4720a918d82918766c002295e8df04169
Summary:
Changes including:
- introduced `linter/`, `testing/`, `stats/` folders in `tools/`
- move appropriate scripts into these folders
- change grepped references in the pytorch/pytorch repo
Next step
- introduce `build/` folder for build scripts
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60473
Test Plan:
- CI (this is important because pytorch/test-infra also relies on some script references).
- tools/tests/
Reviewed By: albanD
Differential Revision: D29352716
Pulled By: walterddr
fbshipit-source-id: bad40b5ce130b35dfd9e59b8af34f9025f3285fd
Summary:
This PR is a first step in unifying our environment variables across CI (so that we don't have `CIRCLE_BLAH` in our GHA workflows, for example), though I'd like for this PR to be more for discussion about how best to consolidate these variables.
This small change only changes most CIRCLE_JOB references in our code to be JOB_BASE_NAME, as that seems the closest GHA (and ROCm) equivalent. Currently, JOB_BASE_NAME is defined as:
- in Circle: CIRCLE_JOB (name of the job, like `pytorch_linux_bionic_py3_8_gcc9_coverage_test1`)
- in GHA: the build_environment with a `-build` or `-test` tacked to the end, e.g., `pytorch-linux-xenial-cuda10.2-cudnn7-py3.6-gcc7-test`
- in ROCm: I don't actually know, but it's important for ROCm test sharding as shown in https://github.com/pytorch/pytorch/pull/60409
I am not sure if this is the intention for JOB_BASE_NAME so it is open to discussion what variable we should use if not JOB_BASE_NAME. I also don't know if it's worth the effort consolidating all these variables, so discussion is also highly encouraged there!
Next steps:
- Consolidate more CIRCLE_* references, maybe into CI_* equivalents?
- We use BUILD_ENVIRONMENT everywhere in Circle though the variable is inconsistent across binary vs CI jobs and across platforms. For example, for linux tests and builds, BUILD_ENVIRONMENT contains the `_test` and `_build` suffixes, but the windows jobs don't. In GHA, BUILD_ENVIRONMENT is similar to how it's defined in windows jobs on Circle. This inconsistency is confusing, and we can probably do something about it. I'm thinking of switching out BUILD_ENVIRONMENT for JOB_BASE_NAME in our test scripts where we actually mean JOB_BASE_NAME.
- We should probably document the meaning of the variables we consolidate somewhere, preferably in a README in some unified `ci/` folder. For example, it seems BUILD_ENVIRONMENT is supposed to capture the build environment, whereas JOB_BASE_NAME is supposed to capture the environment _and_ whether we're building or testing.
Notes:
- I did not replace CIRCLE_JOB references in third_party directories
- Previously, print_test_stats reported CIRCLE_JOB as only the build environment for GHA workflows, and I think tacking on the `build` or `test` will not harm anything, though I may be wrong.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60425
Reviewed By: seemethere, samestep
Differential Revision: D29333882
Pulled By: janeyx99
fbshipit-source-id: a82080e6205a03a1183035011ce59698eca06748
Summary:
Adding windows CUDA smoke tests on PRs (master should run the full suite).
Next step:
- Automate data update so we get a new smoke test list without manual effort
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59686
Test Plan: https://github.com/pytorch/pytorch/actions/runs/958296267 The sharded smoke tests still take a long time because of dependency installation.
Reviewed By: walterddr
Differential Revision: D29243533
Pulled By: janeyx99
fbshipit-source-id: dde7ba127fa15c95bda0e833cc5311598fb85e2b
Summary:
This is branched off of https://github.com/pytorch/pytorch/issues/59970 to only shard on Linux so far (we're running into issues with Windows gflags).
This would enable sharding of tests on a few Linux jobs on GHA, allowing tts to be essentially halved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60124
Reviewed By: zou3519
Differential Revision: D29204211
Pulled By: janeyx99
fbshipit-source-id: 1cc31d1eccd564d96e2aef14c0acae96a3f0fcd0
Summary:
Currently S3 test stats don't support PR stats parsing.
Changes to s3_stats_parser:
1. they are uploaded to `test_times/{sha1}/{job}` and `pr_test_times/{pr}/{sha1}/{job}` separately. Thus we need parsing logic for both
2. need to attach a time for PR stats parsing for ordering, since PR commits can be force-pushed
Changes to run_test.py
1. Reordering based on previous PR stats if available
2. Falling back to file change option if not enabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60026
Test Plan:
- CI.
- local repro: plz run:
```
CIRCLE_JOB="pytorch_linux_bionic_py3_6_clang9_noarch_test" CIRCLE_PR_NUMBER=60057 IN_CI=1 ENABLE_PR_HISTORY_REORDERING=1 python test/run_test.py
```
Reviewed By: samestep
Differential Revision: D29164754
Pulled By: walterddr
fbshipit-source-id: 206688e0fb0b78d1c9042c07243da1fbf88a924b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59840
moving these tests to their own standalone file. No meaningful code changes.
ghstack-source-id: 131359162
Test Plan: CI
Reviewed By: cbalioglu
Differential Revision: D29012664
fbshipit-source-id: 348870016509a6ed7e69240fa82bccef4a12d674
Summary:
Instead of having specific logic to handle run-specific-test-cases, we provide the flag to override include or bring-to-front with the SPECIFIED_TEST_CASES_FILE.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59704
Reviewed By: janeyx99
Differential Revision: D29038425
Pulled By: walterddr
fbshipit-source-id: 803d3555813437c7f287a22f7704106b0c609919
Summary:
Do not reorder tests unless running in CI (IN_CI); reordering makes local development test ordering nondeterministic. Most of us branch out from viable/strict, not the head of master.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59565
Reviewed By: ejguan
Differential Revision: D28943906
Pulled By: walterddr
fbshipit-source-id: e742e7ce4b3fc017d7563b01e93c4cd774d0a537
Summary:
The run-specified-test-cases option would allow us to specify a list of test cases to run by having a CSV with minimally two columns: test_filename and test_case_name.
This PR also adds .json to some files we use for better clarity.
Usage:
`python test/run_test.py --run-specified-test-cases <csv_file>` where the csv file can look like:
```
test_filename,test_case_name,test_total_time,windows_only_failure_sha_count,total_sha_count,windows_failure_count,linux_failure_count,windows_total_count,linux_total_count
test_cuda,test_cudnn_multiple_threads_same_device,8068.8409659525,46,3768,53,0,2181,6750
test_utils,test_load_standalone,8308.8062920459,14,4630,65,0,2718,8729
test_ops,test_forward_mode_AD_acosh_cuda_complex128,91.652619369806,11,1971,26,1,1197,3825
test_ops,test_forward_mode_AD_acos_cuda_complex128,91.825633094915,11,1971,26,1,1197,3825
test_profiler,test_source,60.93786725749,9,4656,21,3,2742,8805
test_profiler,test_profiler_tracing,203.09352795241,9,4662,21,3,2737,8807
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59487
Test Plan:
Without specifying the option, everything should be as they were before.
Running `python test/run_test.py --run-specified-test-cases windows_smoke_tests.csv` resulted in this paste P420276949 (you can see internally). A snippet looks like:
```
(pytorch) janeyx@janeyx-mbp pytorch % python test/run_test.py --run-specified-test-cases windows_smoke_tests.csv
Loading specified test cases to run from windows_smoke_tests.csv.
Processed 28 test cases.
Running test_cpp_extensions_jit ... [2021-06-04 17:24:41.213644]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', 'test_cpp_extensions_jit.py', '-k', 'test_jit_cuda_archflags'] ... [2021-06-04 17:24:41.213781]
s
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK (skipped=1)
...
```
With pytest, an example executable would be:
`Running test_dataloader ... [2021-06-04 17:37:57.643039]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', '-m', 'pytest', 'test_dataloader.py', '-v', '-k', 'test_segfault or test_timeout'] ... [2021-06-04 17:37:57.643327]`
Reviewed By: samestep
Differential Revision: D28913223
Pulled By: janeyx99
fbshipit-source-id: 0d1f9910973426b8756815c697b483160517b127
Summary:
It would be most accurate if sharding occurred after all other changes to selected_tests were complete.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59583
Reviewed By: ejguan
Differential Revision: D28944737
Pulled By: janeyx99
fbshipit-source-id: a851473948a5ec942ffeeedeefdc645536a3d9f7
Summary:
Partially addresses https://github.com/pytorch/pytorch/issues/55340
**Overview**
This factors out `FileStoreTest`, `HashStoreTest`, `PrefixFileStoreTest`, `TCPStoreTest`, `PrefixTCPStoreTest`, `PythonStoreTest`, `RendezvousTest`, `RendezvousEnvTest`, `RendezvousFileTest`, and `RendezvousTCPTest` from `test_c10d_common.py` to a new file `test_store.py`.
Additionally, unused import/initialization statements are removed from `test_c10d_common.py`, and the minimal set of import/initialization statements are used for `test_store.py`.
Also, this changes `.jenkins/pytorch/multigpu-test.sh`, `.jenkins/pytorch/win-test-helpers/test_distributed.bat`, and `test/run_test.py` to include the new `test_store.py`.
**Testing**
All commands shown are run on an AI AWS cluster.
I check the Store tests:
```
python test/distributed/test_store.py
```
I also check `test_c10d_common.py` since it is the source of the refactored code. In addition, I check `test_c10d_nccl.py` and `test_c10d_gloo.py` since they import from `test_c10d_common.py`; those two should be the only test files depending on `test_c10d_common.py`.
```
python test/distributed/test_c10d_common.py
python test/distributed/test_c10d_nccl.py
python test/distributed/test_c10d_gloo.py
```
`test_c10d_gloo.py` produces warnings about how using sparse tensors in TorchScript is experimental, but the warnings do not result from this PR's changes.
**Testing Issues** (To Be Revisited)
```
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py
```
Running the above command fails three tests (written as `[Test]`: `[Error]`):
- `ProcessGroupGlooWrapperTest.test_collective_hang`: `RuntimeError: [../third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [10.200.24.101]:15580`
- `CommTest.test_broadcast_coalesced_gloo_cuda`: `RuntimeError: cuda runtime error (3) : initialization error at ../aten/src/THC/THCGeneral.cpp:54`
- `CommTest.test_sequence_num_incremented_gloo_default`: `RuntimeError: cuda runtime error (3) : initialization error at ../aten/src/THC/THCGeneral.cpp:54`
However, running each of the following yields no errors:
```
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py -k test_collective_hang
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py -k test_broadcast_coalesced_gloo_cuda
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py -k test_sequence_num_incremented_gloo_default
```
This suggests the existence of some inadvertent state dependency between tests (e.g. improper cleanup). I have not explored this further yet. In particular, I do not have a solid understanding of the tests to be able to explain why using `pytest` and `gpurun` induces the failure (since notably, running the `.py` directly shows no issue).
Similarly, running the following yields 47 errors:
```
WORLD_SIZE=4 BACKEND=nccl gpurun pytest test/distributed/test_c10d_nccl.py
```
The errors seem to all be simply complaining about the usage of `fork()` instead of `spawn()` for CUDA multiprocessing. Though, most of the tests in `test_c10d_nccl.py` ask for at least 2 CUDA devices, so I think that the `gpurun` is warranted (assuming that the test file does not need to be run partially on different machines).
Both `test_c10d_common.py` and `test_store.py` work fine with `pytest`.
**Other Notes**
I noticed that `torch.distributed` is imported both as `dist` and as `c10d` and that `c10d` is used throughout the Store tests. I was curious if this is intentional (as opposed to using `dist` to refer to `torch.distributed`). Also, the original [issue](https://github.com/pytorch/pytorch/issues/55340) suggests that the Store tests do not use multiprocessing, but I saw that `torch.multiprocessing` is still used in `TCPStoreTest`.
The links for the Store files in the `CONTRIBUTING.md` [file](https://github.com/pytorch/pytorch/blob/master/torch/distributed/CONTRIBUTING.md) are broken. This can be fixed in a separate PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59271
Reviewed By: jbschlosser, mrshenli
Differential Revision: D28856920
Pulled By: andwgu
fbshipit-source-id: 630950cba18d34e6b5de661f5a748f2cddc1b446
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55728
Full design: https://github.com/pytorch/pytorch/issues/55207
This PR introduces ChunkShardingSpec (SingleShardingSpec in the design). We used
the name ChunkShardingSpec since it is very similar to `torch.chunk` in terms
of how a Tensor is split up, and it feels clearer than SingleShardingSpec.
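A minimal construction sketch; the import path and placement syntax follow later versions of this work and should be treated as assumptions:
```
from torch.distributed._shard.sharding_spec import ChunkShardingSpec

# Shard dim 0 into contiguous chunks, torch.chunk-style, with one placement
# (rank/device) per chunk.
spec = ChunkShardingSpec(
    dim=0,
    placements=[
        "rank:0/cuda:0",
        "rank:1/cuda:1",
    ],
)
```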
ghstack-source-id: 129603318
Test Plan: waitforbuildbot
Reviewed By: SciPioneer
Differential Revision: D27694108
fbshipit-source-id: c8764abe6a4d5fc56d023fda29b74b5af2a73b49
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58632.
Added several skips related to test asserts and MKL. Will address them in a separate PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58666
Reviewed By: seemethere, janeyx99
Differential Revision: D28607966
Pulled By: walterddr
fbshipit-source-id: 066d4afce2672e4026334528233e69f68da04965
Summary:
Some machines don't have a versionless `python` on their PATH, which breaks these existing shebangs.
I'm assuming that all the existing versionless `python` shebangs are meant to be `python3` and not `python2`; please let me know if my assumption was incorrect for any of these.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58275
Test Plan: CI.
Reviewed By: zhouzhuojie
Differential Revision: D28428143
Pulled By: samestep
fbshipit-source-id: 6562be3d12924db72a92a0207b060ef740f61ebf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56868
See __init__.py for a summary of the tool.
The following sections are present in this initial version
- Model Size. Show the total model size, as well as a breakdown by
stored files, compressed files, and zip overhead. (I expect this
breakdown to be a bit more useful once data.pkl is compressed.)
- Model Structure. This is basically the output of
`show_pickle(data.pkl)`, but as a hierarchical structure.
Some structures cause this view to crash right now, but it can be
improved incrementally.
- Zip Contents. This is basically the output of `zipinfo -l`.
- Code. This is the TorchScript code. It's integrated with a blame
window at the bottom, so you can click "Blame Code", then click a bit
of code to see where it came from (based on the debug_pkl). This
currently doesn't render properly if debug_pkl is missing or
incomplete.
- Extra files (JSON). JSON dumps of each json file under /extra/, up to
a size limit.
- Extra Pickles. For each .pkl file in the model, we safely unpickle it
with `show_pickle`, then render it with `pprint` and include it here
if the size is not too large. We aren't able to install the pprint
hack that thw show_pickle CLI uses, so we get one-line rendering for
custom objects, which is not very useful. Built-in types look fine,
though. In particular, bytecode.pkl seems to look fine (and we
hard-code that file to ignore the size limit).
I'm checking in the JS dependencies to avoid a network dependency at
runtime. They were retrieved from the following URLS, then passed
through a JS minifier:
https://unpkg.com/htm@3.0.4/dist/htm.module.js?module
https://unpkg.com/preact@10.5.13/dist/preact.module.js?module
Test Plan:
Manually ran on a few models I had lying around.
Mostly tested in Chrome, but I also poked around in Firefox.
Reviewed By: dhruvbird
Differential Revision: D28020849
Pulled By: dreiss
fbshipit-source-id: 421c30ed7ca55244e9fda1a03b8aab830466536d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56386
The diff resolves a bug around incorrect handler resolution:
_create_static_handler pointed towards etcd, and _create_etcd_handler pointed towards static.
Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed:test_launcher
Added test_launcher to the ci/cd tests
Reviewed By: cbalioglu
Differential Revision: D27858897
fbshipit-source-id: 440155789958c091ce5755e7c9524e4bb704203a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56039
Python will try to eagerly resolve the name references even if
the import failed. Quote them so that it doesn't.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: janeyx99
Differential Revision: D27770536
Pulled By: ezyang
fbshipit-source-id: b111739289498f9bab856fb9424f3080efee4ee0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55695
In order to be able to run CUDA tests on their own (e.g., to avoid running CPU tests on GPU machines).
Done by moving test methods to a separate class (and sometimes introducing a "common" base class for utils), and then providing new entry points inside a `cuda/` subdirectory.
Test Plan: Checked they are run on Sandcastle.
Reviewed By: mrshenli
Differential Revision: D27618198
fbshipit-source-id: 8f671657f79c8ae115748ab7752fe0066705893b
Summary:
1. move module related stuff to test_module_container
2. created test_types for types and annotation
3. created test_misc for the rest
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55560
Reviewed By: VitalyFedyunin
Differential Revision: D27650911
Pulled By: walterddr
fbshipit-source-id: d895a7da9e9c3d25a662a37faf4daabc276b9c1a
Summary:
Prettifies JSON files .pytorch-test-times and .pytorch-slow-tests so that not everything is on one single line.
This is of slightly more importance as generated .pytorch-slow-tests ends up getting stored in our test-infra repo ([example](ad9cd87565)), and it is nice to not have that lil red symbol at the end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55335
Reviewed By: samestep
Differential Revision: D27576930
Pulled By: janeyx99
fbshipit-source-id: be58565b8c8593a9bfcfab383ee19facc79f0572
Summary:
Moves more s3 parsing code to s3_stat_parser.py. This is another step in modularizing the parsing code more correctly. I will also be using this exact function in future slowTest code.
Also replaces some Any's in the code to be Report.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54808
Test Plan:
.pytorch-test-times generated before the code and after this code is the same.
CI should pass, specifically the test tools GHA.
Reviewed By: walterddr
Differential Revision: D27375783
Pulled By: janeyx99
fbshipit-source-id: bec28551668b2eb3fdd60d802200993e493eac83
Summary:
First step to move all S3-related operations into the S3 parser utils.
In the end we provide APIs from s3_stats_parser:
1. downloading data as reports and uploading data as reports
2. filtering by job name
and handle all compression and formatting inside.
TODO
- [ ] Refactor out upload into s3_stats_parser
- [ ] Remove all S3/BOTO related checkers and try/catch blocks outside of s3_stats_parser
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54681
Test Plan:
1. Running tools/test/* covers the refactoring logic (test_test_history.py and test_stats.py as entry points, both using the 2 new APIs in s3_stats_parser after the refactoring).
2. print_test_stats.py's main argparse entrypoint is covered by CI step Report Test Result step.
3. run `python test/run_test.py --export-past-test-times` before and after this PR should result in the same file content in .pytorch-test-times
Reviewed By: ailzhang
Differential Revision: D27346742
Pulled By: walterddr
fbshipit-source-id: fb40162e631e007fed9d5821fe4f190bda2cb52e
Summary:
Since `_test1`, `_test2`, `_build`, and `test` are all stripped, `slow_test` should be stripped as well. This way, the _slow_test stats will be considered as a part of all stats relating to a particular build job, though currently it doesn't do much because the jobs don't share a common stemmed name: the build has `_gcc7` while the slow_test CI job does not.
This makes me think...do we omit the `gcc7` intentionally? Are there other things I should strip, e.g., `multigpu_test`?
See:
ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_slow_test
ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test1
ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54528
Reviewed By: samestep
Differential Revision: D27270393
Pulled By: janeyx99
fbshipit-source-id: ffb7289cfe4dba52ded67f50a89f3e75e7bad68d
Summary:
Step 2 to fixing https://github.com/pytorch/pytorch/issues/53882 :)
This changes TARGET_DET_LIST and sharding automation by checking if there's already cached data from the commit in `.pytorch-test-times`. If not, it pulls data from S3 and updates the file to have the stats. This way, S3 pulling does not need to happen more than once for the same commit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54210
Test Plan:
the following methods should run the same set of tests.
First `export CIRCLE_JOB=pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2` or your favorite CIRCLE JOB.
1. Pull data first and use it:
Download the data from S3 and write it to the cache file with `python test/run_test.py --export-historic-test-times .pytorch-test-times`
Now run `python test/run_test.py --shard 1 10`
2. Make the sharding job pull data:
Delete the file you just created: `rm .pytorch-test-times`
Now run `python test/run_test.py --shard 1 10`
Reviewed By: walterddr
Differential Revision: D27136849
Pulled By: janeyx99
fbshipit-source-id: 51a42c4e2fa3f8cf15e682679dd3eb6130aad927
Summary:
This is an initial attempt at refactoring and consolidating our S3 read logic for print_test_stats.py, test_history.py, and run_test.py. This way, boto3 and botocore do not need to be imported in various places throughout the code base, and duplicated logic (such as the many type definitions) can exist in one place: `tools/stat_utils/s3_stat_parser.py`. walterddr contributed to this PR by moving print_test_stats.py to the tools folder and the corresponding tests to a subfolder within tools.
**NOTE: this removes those tests from CI as the new `tools/test/test_stats.py` is not in the test/ directory as the other tests in TESTS in run_test.py.**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53755
Test Plan:
This refactoring change should not break anything, so running the files as before should work as they did previously.
To make sure that print_test_stats.py still functions: run `python tools/test/test_stats.py` and make sure all tests pass.
To make sure that test_history.py works, run the example commands from `tools/test_history.py --help` and check that their output matches that shown. Note that the script will continue printing for a while, so don't be alarmed.
Some next steps:
- Actually coming up with similarities among the three current use cases and further refactoring/consolidating of functions (e.g., combining simplify and get_cases)
- Moving more parsing logic to s3_stat_parser.py to have better abstraction between our files
- Adding tests for s3_stat_parser.py when there is more functionality in it
Reviewed By: agolynski, samestep
Differential Revision: D27030285
Pulled By: janeyx99
fbshipit-source-id: e664781324ef7c0c30943bfd7f17c895075ef7a7
Summary:
This will allow for future work to use the test times file (which will save computation time and also allow for more consistency). (Step one to fixing https://github.com/pytorch/pytorch/issues/53882)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54083
Test Plan:
export CIRCLE_JOB=your-favorite-circleci-job e.g., pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2
`python test/run_test.py --export-historic-test-times` OR
`python test/run_test.py --export-historic-test-times .your-favorite-file`
When opening either .pytorch-test-times or .your-favorite-file, you should see something like:
```
{"commit": "2d559a09392aabb84dfb4a498010b2f01d99818c", "job_times": {"distributed/test_distributed_spawn": 583.5889999999973, "distributed/test_data_parallel": 4.866999999999997, "test_binary_ufuncs": 171.1569999999998, "test_numpy_interop": 2.5649999999999995, "test_public_bindings": 0.011,...}}
```
Note that no tests will be run when this option is specified.
Reviewed By: walterddr
Differential Revision: D27091351
Pulled By: janeyx99
fbshipit-source-id: e191d739268d86de0a0ba0eea0006969859d1940
Summary:
This PR:
1. moves the sharding algorithm from run_test.py to framework_utils.py (let me know if you have a better place for it)
2. adds tests for the algorithm in test_testing.py
3. fixes the algorithm so that it doesn't tack all the unknown jobs onto the shard with the minimum time, but instead distributes them across the shards (see the sketch below).
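As an illustration only (not the exact framework_utils.py code), the idea is that known-time tests go to the currently lightest shard while unknown-time tests are spread round-robin:
```
from typing import Dict, List, Tuple

def calculate_shards(
    num_shards: int,
    tests: List[str],
    job_times: Dict[str, float],
) -> List[Tuple[float, List[str]]]:
    # (accumulated time, assigned tests) per shard
    shards: List[Tuple[float, List[str]]] = [(0.0, []) for _ in range(num_shards)]

    known = sorted((t for t in tests if t in job_times),
                   key=lambda t: job_times[t], reverse=True)
    unknown = [t for t in tests if t not in job_times]

    # Greedy bin packing: longest known tests first, each onto the lightest shard.
    for test in known:
        idx = min(range(num_shards), key=lambda i: shards[i][0])
        time, names = shards[idx]
        shards[idx] = (time + job_times[test], names + [test])

    # Spread tests without timing data round-robin instead of piling them
    # all onto the shard that currently has the minimum time.
    for i, test in enumerate(unknown):
        time, names = shards[i % num_shards]
        shards[i % num_shards] = (time, names + [test])

    return shards
```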
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53942
Test Plan: python test/test_testing.py -k TestFrameworkUtils
Reviewed By: samestep
Differential Revision: D27047223
Pulled By: janeyx99
fbshipit-source-id: 824b20009c0bb707aa5361de445cdec795d5e3f1
Summary:
The first argument can be either a file name or a test module name, but the key into `CUSTOM_HANDLERS` is the test module name.
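A minimal sketch of the normalization this implies (the names here are illustrative, not the actual run_test.py code): strip a trailing `.py` so the `CUSTOM_HANDLERS` lookup sees the module name either way.
```
CUSTOM_HANDLERS = {
    "distributed/test_distributed_spawn": lambda *args: None,  # placeholder handler
}

def parse_test_module(test_arg):
    # Accept "distributed/test_distributed_spawn.py" as well as the bare module name.
    return test_arg[:-3] if test_arg.endswith(".py") else test_arg

handler = CUSTOM_HANDLERS.get(parse_test_module("distributed/test_distributed_spawn.py"))
assert handler is not None
```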
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53884
Test Plan: Run `python3 run_test.py -i distributed/test_distributed_spawn.py`
Reviewed By: janeyx99
Differential Revision: D27006164
Pulled By: malfet
fbshipit-source-id: f30b42856cd2754e5981c1c69618f84e392c986a
Summary:
This PR:
1. refactors the logic for S3 stats gathering.
2. renames SLOW_TESTS to TARGET_DET_LIST to disambiguate and remove confusion with slowTest
3. detects slow tests (tests that take more than 5 minutes) and adds them to TARGET_DET_LIST based on the previous nightly's results in S3 (see the sketch below).
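As a rough sketch of that detection step (only the 5-minute threshold comes from the description above; the helper and names are illustrative):
```
SLOW_TEST_THRESHOLD_SEC = 5 * 60  # tests slower than 5 minutes

def detect_slow_tests(job_times):
    """Return test modules whose nightly runtime exceeds the threshold."""
    return sorted(
        test for test, seconds in job_times.items()
        if seconds > SLOW_TEST_THRESHOLD_SEC
    )

# TARGET_DET_LIST would then be extended with the automatically detected tests.
```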
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53549
Test Plan:
Set CIRCLE_JOB to your favorite CI job (like `pytorch_linux_bionic_py3_8_gcc9_coverage_test1`).
Run `python test/run_test.py --determine-from=<your fave pytorch files>`
e.g., `python test/run_test.py --determine-from=test/run_test.py`
Reviewed By: mrshenli
Differential Revision: D26904478
Pulled By: janeyx99
fbshipit-source-id: 9576b34f4fee09291d60e36ff2631753a3925094
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857
These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
- `GLOSSARY.md`
- `aten/src/ATen/core/op_registration/README.md`
- `scripts/README.md`
- `torch/csrc/jit/codegen/fuser/README.md`
The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```
I looked over the auto-generated changes and didn't see anything that looked problematic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406
Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377
This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348
Reviewed By: walterddr, seemethere
Differential Revision: D26856620
Pulled By: samestep
fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
Summary:
Uses nightly commit stats to automatically shard tests based on execution time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53269
Test Plan:
set CIRCLE_JOB to an existing job, like `pytorch_linux_bionic_py3_6_clang9_test`
Then you can run something like: `python test/run_test.py --shard 1 10`
Reviewed By: malfet
Differential Revision: D26819440
Pulled By: janeyx99
fbshipit-source-id: 6bc73d6aa3d52d9850817536be15d7b54a72780e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52910
**Summary**
PR #52158 tried to move all JIT bindings from `torch._C` to a new
submodule `torch._C._jit`, but that...did not go well. This pull request
adds the new `torch._C._jit` submodule, but does not migrate the
existing bindings. Instead, it adds a unit test that fails if any new
bindings are added to `torch._C`. A comment in the test instructs
developers to add their new binding to the allowlist if it really should
be in `torch._C`, or to add it to the appropriate submodule (e.g.,
`torch._C._jit`). The idea is to prevent the issue
described in #51691 from getting *worse* if it cannot be fixed.
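A sketch of what such a guard test could look like (the allowlist handling here is a placeholder; the real test pins an explicit list of allowed names):
```
import unittest

import torch

# Placeholder: the real test keeps an explicit, checked-in allowlist of names.
ALLOWED_TORCH_C_BINDINGS = frozenset(dir(torch._C))

class TestNoNewTorchCBindings(unittest.TestCase):
    def test_no_new_bindings(self):
        new_bindings = set(dir(torch._C)) - ALLOWED_TORCH_C_BINDINGS
        self.assertFalse(
            new_bindings,
            f"New bindings added to torch._C: {sorted(new_bindings)}. "
            "Add them to a submodule such as torch._C._jit, or to the "
            "allowlist if they really belong in torch._C.",
        )

if __name__ == "__main__":
    unittest.main()
```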
**Test Plan**
Continuous integration.
**Fixes**
This commit fixes #51691.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D26698373
Pulled By: SplitInfinity
fbshipit-source-id: ec9f5426051227a513d4fd09512b624420e0100b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52323
Using the default CPU allocator for ops executed on the qnnpack backend will result in
ASAN failures with heap overflow, since qnnpack (and xnnpack) can read input
beyond its end and/or beginning.
Here we are enabling this feature specifically to enable the dynamic sparse linear op test
using the qnnpack engine. In the dynamic linear op, the fp32 bias is not packed and
hence can result in out-of-bounds access.
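Assuming the bindings are named `torch._C._set_default_mobile_cpu_allocator` and `torch._C._unset_default_mobile_cpu_allocator` (an assumption based on the test plan's file name), a test would wrap the op run roughly like this:
```
import torch

# Swap in the mobile CPU allocator, which pads allocations so that
# qnnpack/xnnpack reads past the nominal tensor bounds stay in bounds,
# then restore the default allocator afterwards.
torch._C._set_default_mobile_cpu_allocator()
try:
    x = torch.randn(4, 8)
    # ... run the dynamic sparse linear op under test here ...
finally:
    torch._C._unset_default_mobile_cpu_allocator()
```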
Test Plan: test_set_default_mobile_cpu_allocator.py
Reviewed By: z-a-f
Differential Revision: D26263481
fbshipit-source-id: a49227cac7e6781b0db4a156ca734d7671972d9f