This PR turns translation validation on by default for tests and accuracy benchmark
runs. It also installs Z3 on CI.
The main changes are:
- Add `--no-translation-validation` as an option in _test/run_test.py_
- Set the `PYTORCH_TEST_WITH_TV` environment variable accordingly (see the sketch after this list)
- Add `TEST_WITH_TV` variable in _torch/testing/_internal/common_utils.py_
- Turn translation validation on for accuracy benchmarks in _benchmarks/dynamo/common.py_
- Add Z3 installation to the CI scripts
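A minimal sketch of how the new flag could map to the environment variable; the wiring below is assumed for illustration and is not copied from the actual run_test.py diff:
```python
# Sketch only: map --no-translation-validation to PYTORCH_TEST_WITH_TV.
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument(
    "--no-translation-validation",
    action="store_true",
    help="disable translation validation for this test run",
)
args, _ = parser.parse_known_args()

# Tests default to translation validation ON unless the flag is passed.
os.environ["PYTORCH_TEST_WITH_TV"] = "0" if args.no_translation_validation else "1"
```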
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103611
Approved by: https://github.com/ezyang
Prevents the following cryptic error if one attempts to use `run_test.py` on a system that also has torchaudio installed in dev mode (as `tools` from https://github.com/pytorch/audio might take precedence, which is not how the script should behave):
```
Unable to import test_selections from tools/testing. Running without test selection stats.... Reason: No module named 'tools.stats'
Traceback (most recent call last):
File "/Users/nshulga/git/pytorch/pytorch/test/run_test.py", line 1673, in <module>
main()
File "/Users/nshulga/git/pytorch/pytorch/test/run_test.py", line 1604, in main
selected_tests = get_selected_tests(options)
File "/Users/nshulga/git/pytorch/pytorch/test/run_test.py", line 1418, in get_selected_tests
path = os.path.join(str(REPO_ROOT), TEST_TIMES_FILE)
NameError: name 'TEST_TIMES_FILE' is not defined
```
But make sure to remove it at the end, otherwise it will not work if torch is installed from a wheel but tests are run from a clean repo checkout.
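A sketch of the idea, assuming the fix is about import precedence; the paths and imported helper below are illustrative:
```python
# Sketch: put the repo root first so `tools` resolves to pytorch/tools,
# then drop it again so a wheel-installed torch with a clean checkout still works.
import sys
from pathlib import Path

REPO_ROOT = Path(__file__).resolve().parent.parent  # hypothetical location of the repo root

sys.path.insert(0, str(REPO_ROOT))
try:
    from tools.stats import import_test_stats  # now resolved from pytorch/tools
except ImportError:
    import_test_stats = None  # run without test selection stats
finally:
    sys.path.remove(str(REPO_ROOT))
```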
<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at dd52521</samp>
> _Sing, O Muse, of the cunning code review_
> _That fixed the tests of the `tools` module_
> _By adding and removing the root path_
> _As a shepherd guides his flock to and fro._
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104214
Approved by: https://github.com/kit1980
There is a `HAVE_TEST_SELECTION_TOOLS` conditional, but it turns out it does not really work, so fix it by defining all the missing prototypes and making it work as a single-shard instance.
Add a lint rule to test that it would succeed for running only test_cuda with a released version of PyTorch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104111
Approved by: https://github.com/clee2000, https://github.com/ZainRizvi
Because we always run tests with pytest now.
Marking it as `bc-breaking` as there could technically be some scripts depending on it somewhere...
<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at 1760568</samp>
> _`pytest` option gone_
> _simpler test runner script_
> _autumn leaves fall fast_
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104125
Approved by: https://github.com/seemethere
Added a feature to upload test statistics to DynamoDB and Rockset using a new function `emit_metric` in `tools/stats/upload_stats_lib.py`.
Added metrics to measure test reordering effectiveness in `tools/testing/test_selections.py`.
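A hedged usage sketch of the new helper; the exact signature lives in `tools/stats/upload_stats_lib.py` and is assumed here to take a metric name plus a dict of values, and the metric below is purely illustrative:
```python
# Sketch only: the metric name and fields are made up for illustration.
from tools.stats.upload_stats_lib import emit_metric

emit_metric(
    "test_reordering_effectiveness",  # illustrative metric name
    {
        "prioritized_tests": 12,
        "first_failure_position": 3,
    },
)
```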
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102691
Approved by: https://github.com/malfet
Sharding on ROCm is broken. I can't replicate it on dummy PRs even though it seems to happen pretty often on main, so I'm adding this to increase my sample size. Hopefully this is enough print statements...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102713
Approved by: https://github.com/huydhn
Currently file-level reruns + stepcurrent are incompatible, and this is making PRs green when they are actually red, so turn off stepcurrent + file-level reruns when keep-going is used until I figure out a better way to do this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102569
Approved by: https://github.com/huydhn
The console log blows up too much when running in rerun disabled tests mode (x50) e132f09e88. Each log is around 1GB and the whole set of uncompressed logs is ~50GB. After compression, it is still around 1GB, which is too big. The increase comes mainly from the multiple SKIPPED messages for non-disabled tests, which is expected due to how SkipTest and pytest-flakyfinder currently work.
I updated `test/conftest.py` to completely ignore skipped tests when rerunning disabled tests, instead of collecting and then skipping each of them 50 times (a rough sketch of the idea follows the list below). The benefit of doing so is much greater than I originally expected:
* Rerun disabled tests jobs now finish in less than half an hour, as they should
* Fix the OOM runner crash caused by too many collected tests
* Fix the verbosity issue, as now only disabled tests are run x50. There are only a few hundred of them atm
* Fix the timeout issue when rerunning disabled distributed and ASAN tests; they are just too slow when run at x50
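A rough sketch of the conftest.py idea, assuming the mode is signaled through the RERUN_DISABLED_TESTS environment variable and that skip markers identify the tests to drop; this is illustrative, not the exact implementation:
```python
# conftest.py -- minimal sketch, not the actual test/conftest.py change
import os

def pytest_collection_modifyitems(config, items):
    if os.environ.get("RERUN_DISABLED_TESTS") != "1":
        return
    kept, dropped = [], []
    for item in items:
        # Drop tests that would only be collected to be skipped anyway, so the
        # x50 flake-finder multiplication applies to disabled tests only.
        if item.get_closest_marker("skip") is not None:
            dropped.append(item)
        else:
            kept.append(item)
    if dropped:
        config.hook.pytest_deselected(items=dropped)
        items[:] = kept
```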
### Testing
When rerunning disabled tests https://github.com/pytorch/pytorch/actions/runs/5084508614, only disabled tests on the platform are run, for example `test_ops_jit` on https://ossci-raw-job-status.s3.amazonaws.com/log/13770164954 only ran 100 tests (`test_variant_consistency_jit_linalg_lu_cuda_float32` + `test_variant_consistency_jit_linalg_lu_factor_cuda_complex64`) x50.
```
Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops_jit.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '--sc=test_ops_jit_1', '--flake-finder', '--flake-runs=50', '--import-slow-tests', '--import-disabled-tests', '--rerun-disabled-tests'] ... [2023-05-25 21:32:49.763856]
Expand the folded group to see the log file of test_ops_jit 2/2
##[group]PRINTING LOG FILE of test_ops_jit 2/2 (/var/lib/jenkins/workspace/test/test-reports/test_ops_jit_h2wr_t2c.log)
Test results will be stored in test-reports/python-pytest/test_ops_jit/test_ops_jit-51a83bd44549074e.xml
============================= test session starts ==============================
platform linux -- Python 3.10.11, pytest-7.3.1, pluggy-1.0.0 -- /opt/conda/envs/py_3.10/bin/python
cachedir: .pytest_cache
hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
rootdir: /var/lib/jenkins/workspace
configfile: pytest.ini
plugins: hypothesis-5.35.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-11.1.2, shard-0.1.2, xdist-3.3.0, xdoctest-1.1.0
collecting ... collected 1084 items
Running 100 items in this shard: test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_cuda_float32 (x50), test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_factor_cuda_complex64 (x50)
stepcurrent: Cannot find last run test, not skipping
test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_cuda_float32 PASSED [2.1876s] [ 1%]
test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_factor_cuda_complex64 PASSED [4.5615s] [ 2%]
```
* [pull](https://github.com/pytorch/pytorch/actions/runs/5093566864)
* [trunk](https://github.com/pytorch/pytorch/actions/runs/5095364311)
* [periodic](https://github.com/pytorch/pytorch/actions/runs/5095378850)
* [slow](https://github.com/pytorch/pytorch/actions/runs/5095390285)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102107
Approved by: https://github.com/clee2000, https://github.com/malfet
Per title. I extracted this part out of the draft PR that I'm working on (https://github.com/pytorch/pytorch/pull/102107) because
the remaining issues with rerun disabled tests (log size and unexpected runner failures) require some further investigation, while this one is clearly breaking in trunk atm.
Until we can support disabling C++ tests, there is no need to run them in rerun disabled tests mode.
### Testing
Coming from https://github.com/pytorch/pytorch/pull/102107, for example https://github.com/pytorch/pytorch/actions/runs/5062224659/jobs/9087747981
```
2023-05-23T22:46:50.1953318Z Running cpp/basic 1/1 ... [2023-05-23 22:46:50.195077]
2023-05-23T22:46:50.1953847Z Skipping C++ tests when running under RERUN_DISABLED_TESTS mode
2023-05-23T22:46:50.2066032Z Running cpp/atest 1/1 ... [2023-05-23 22:46:50.206348]
2023-05-23T22:46:50.2066435Z Skipping C++ tests when running under RERUN_DISABLED_TESTS mode
2023-05-23T22:46:52.2666743Z No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
2023-05-23T22:46:52.2691817Z Ignoring disabled issues: []
...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102132
Approved by: https://github.com/clee2000
Not very elegant.
Checked on a separate conda env that doesn't have the usual CI dependencies.
The two pytest extensions at fault are pytest-rerunfailures and pytest-shard; also included pytest-flakefinder just in case.
No idea if this is a good way to do this.
Could also check individually and add flags based on that, but was told that requiring all the CI dependencies to be downloaded was also OK.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100916
Approved by: https://github.com/huydhn
After an investigation, running C++ tests with https://github.com/pytest-dev/pytest-cpp is just slower than running them directly, plain and simple. I'm curious about the exact root cause, but that's a story for another day.
`time build/bin/test_lazy` takes half a minute to run 610 tests on `linux-bionic-cuda11.8-py3.10-gcc7 / test (default, 2, 5, linux.4xlarge.nvidia.gpu)` while `time pytest /var/lib/jenkins/workspace/build/bin/test_lazy -v` takes 20+ minutes on the same runner. This is a very costly price to pay.
The saving grace here is that https://github.com/pytest-dev/pytest-cpp supports pytest-xdist to run tests in parallel with `-n auto`, so `time pytest /var/lib/jenkins/workspace/build/bin/test_lazy -v -n auto` takes only 3 minutes. This is still not as fast as running C++ tests directly, but it's an order of magnitude faster than running them sequentially.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101440
Approved by: https://github.com/clee2000
After https://github.com/pytorch/pytorch/pull/99559, we can now run C++ tests with `run_test.py`. Although advanced features such as `--import-slow-tests` and `--import-disabled-tests` won't work for now, there will still be a gain in reliability and performance as C++ tests can now be retried and run in parallel.
This covers all C++ tests in the CI, including aten, libtorch, and Vulkan C++ tests, across all platforms: Linux, Windows, and macOS.
Notes:
* To support C++ test discovery, the env variable `CPP_TESTS_DIR` can be set to where the C++ test binaries are located
* Support the pytest `-k` argument via run_test, as this is used by pytest-cpp to replace `--gtest-filter`
* The XML output is in pytest format, but that's OK for now because we don't have slow-test or flaky-test support for C++ tests yet
* ~~I need to figure out why conftest.py doesn't work when I invoke pytest directly for C++ tests, so `--sc` is not available for C++ tests at the moment. Proper pytest plugins like stepwise work fine though. I'll investigate and fix it in a separate PR~~ Found the cause: `conftest.py` is per directory and needs to be in any arbitrary directory that holds C++ tests
* Two tests, `test_api` and `test_tensorexpr`, timed out on ASAN. I suspect that ASAN is now used on top of the Python executable, which is slower than running native C++ code. IMO, it's OK to run these tests as before on ASAN for now
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99956
Approved by: https://github.com/clee2000, https://github.com/ZainRizvi
Today, we prioritize running test files that were edited in the user's PR, with the idea being to run them before we run any other test.
Except, if the modified test is supposed to run serially, then we still end up running it after all the parallelized tests have finished running.
This PR fixes that to _always_ run the prioritized tests before the regular tests, regardless of whether the test is supposed to run serially or in parallel.
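A minimal sketch of the intended ordering; the function and test names are illustrative, not the actual run_test.py code:
```python
# Sketch: prioritized tests run ahead of the rest in both the serial and parallel pools.
def order_tests(serial_tests, parallel_tests, prioritized):
    def split(tests):
        first = [t for t in tests if t in prioritized]
        rest = [t for t in tests if t not in prioritized]
        return first, rest

    serial_first, serial_rest = split(serial_tests)
    parallel_first, parallel_rest = split(parallel_tests)
    # Everything prioritized (e.g. edited in the PR) runs up front,
    # then the remaining parallel and serial tests.
    return parallel_first + serial_first + parallel_rest + serial_rest

# Example: an edited serial test now lands before the unedited parallel ones.
print(order_tests(["test_nn"], ["test_ops", "test_torch"], {"test_nn"}))
```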
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100748
Approved by: https://github.com/huydhn
I think get_reordered_tests has been broken since the master -> main switch.
Add typing for some functions.
Checked for `prioritized` in the logs.
Limited testing because I only care about one very small part of the log that's near the beginning.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100752
Approved by: https://github.com/huydhn
* change the hook so that the test still gets saved in --sc when it fails during test setup (this caused an off-by-one error due to setup being called before the logreport hook)
* allow reruns for all tests now that --sc is used
* increase the number of reruns now that --sc is used
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100200
Approved by: https://github.com/huydhn
* add a stepcurrent flag (--sc), based off the stepwise flag, that saves the currently running test so that the test run can resume from the last successful test after a segfault; it takes a key argument so that different test runs don't overwrite each other (a rough sketch of the idea follows this list)
* send SIGINT to the process on timeout so that the XML report can still be generated
* add a currently unused stepcurrent-skip flag (--scs), based off the stepwise skip flag, that skips the failing test; was going to use it for the keep-going label but having trouble with CI
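A rough sketch of the stepcurrent idea referenced in the first bullet; the option handling and cache key are illustrative, not the actual --sc implementation:
```python
# Sketch of a stepcurrent-style pytest plugin (e.g. placed in a conftest.py).
import pytest

def pytest_addoption(parser):
    parser.addoption("--sc", action="store", default=None,
                     help="key under which the currently running test is cached")

def pytest_collection_modifyitems(config, items):
    key = config.getoption("--sc")
    if not key:
        return
    last = config.cache.get(f"stepcurrent/{key}", None)
    if last is None:
        return
    # Resume from the last recorded test after a crash/segfault.
    for index, item in enumerate(items):
        if item.nodeid == last:
            del items[:index]
            break

@pytest.hookimpl(tryfirst=True)
def pytest_runtest_protocol(item, nextitem):
    key = item.config.getoption("--sc")
    if key:
        # Record the test before it runs, so a hard crash still leaves a marker.
        item.config.cache.set(f"stepcurrent/{key}", item.nodeid)
    return None  # fall through to the default run protocol
```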
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98035
Approved by: https://github.com/huydhn
This depends on [pytest-cpp](https://github.com/pytest-dev/pytest-cpp) to discover and run C++ tests with pytest. C++ tests are built under the `${WORKSPACE}/build/bin` directory and copied to the test job under the same path.
* To expose them to `run_test`, I chose to use the mock path prefix `cpp`; for example, `build/bin/c10_Array_test` is named `cpp/c10_Array_test`, and `python test/run_test.py --cpp -i cpp/c10_Array_test` runs the test in the same way as other Python tests. I could copy them from `build/bin` to `test/cpp`, but they would be mixed with the source code and CMake files, so this looks easier (a rough sketch of the discovery idea follows this list)
* Some executables under `build/bin` are not C++ tests, and they are excluded, for example `build/bin/torch_shm_manager`
* C++ tests need to be run with pytest directly, as the python command doesn't understand them
* The change is gated by the new `--cpp` argument to `run_test.py`, for example `python test/run_test.py --cpp` will run all available C++ tests
* The tests can be run in parallel
* Failing tests can be retried with `--reruns=2` and `--sw`
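A rough sketch of the discovery idea referenced above; this is not the actual run_test.py code, and the filtering heuristic is illustrative:
```python
# Sketch: expose C++ test binaries under a "cpp/" prefix for test selection.
import os
from pathlib import Path

CPP_PREFIX = "cpp"  # assumed prefix, matching names like cpp/c10_Array_test

def discover_cpp_tests() -> list[str]:
    tests_dir = os.environ.get("CPP_TESTS_DIR", "build/bin")
    names = []
    for binary in sorted(Path(tests_dir).iterdir()):
        # Skip non-test helpers (e.g. torch_shm_manager) and non-executables.
        if binary.is_file() and os.access(binary, os.X_OK) and "test" in binary.name:
            names.append(f"{CPP_PREFIX}/{binary.name}")
    return names

print(discover_cpp_tests())
```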
```
============================= test session starts ==============================
platform darwin -- Python 3.9.15, pytest-7.2.0, pluggy-1.0.0 -- /Users/huydo/miniconda3/envs/py3.9/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/huydo/Storage/mine/pytorch/test/.hypothesis/examples')
rootdir: /Users/huydo/Storage/mine/pytorch, configfile: pytest.ini
plugins: xdoctest-1.1.0, cpp-2.3.0, rerunfailures-10.3, shard-0.1.2, flakefinder-1.1.0, hypothesis-6.56.4, xdist-3.0.2, repeat-0.9.1
collecting ... collected 3 items / 2 deselected / 1 selected
Running 1 items in this shard: build/bin/scalar_tensor_test::TestScalarTensor.TestScalarTensorMPS
stepwise: skipping 2 already passed items.
../build/bin/scalar_tensor_test::TestScalarTensor::TestScalarTensorMPS RERUN [100%]
../build/bin/scalar_tensor_test::TestScalarTensor::TestScalarTensorMPS RERUN [100%]
../build/bin/scalar_tensor_test::TestScalarTensor::TestScalarTensorMPS FAILED [100%]
```
* `--import-slow-tests` and `--import-disabled-tests` won't work for now; that's OK to leave as a future task.
I also add `pytest-cpp==2.3.0` to Linux Docker, MacOS, and Windows.
### Testing
Build PyTorch and run `python test/run_test.py --cpp` on my laptop. The CI change will come later in a separate PR. Also, running `python test/run_test.py --help` now shows all C++ tests discovered under `build/bin`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99559
Approved by: https://github.com/clee2000
Share code between the paths that handle test results in parallel vs. serial mode.
Note that the original version of this code had an inconsistency between the two versions where it would execute `print_to_stderr(err_message)` on every test that ran in parallel, but for serial tests it would only invoke `print_to_stderr(err_message)` if `continue_on_error` was also specified. By sharing code, this PR changes that behavior to be consistent between the two modes.
Also adding some comments.
<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at 029342c</samp>
> _Sing, O Muse, of the skillful coder who refined_
> _The PyTorch testing script, `run_test.py`, and shined_
> _A light on its obscure logic, with docstrings and comments_
> _And made it run more smoothly, with better error contents_
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99467
Approved by: https://github.com/huydhn, https://github.com/malfet
Common advice we give for handling memory fragmentation issues is to
allocate a big block upfront to reserve memory which will get split up later.
For programs with changing tensor sizes this can be especially helpful to
avoid OOMs that happen the first time we see a new largest input and would
otherwise have to allocate new segments.
However, the issue with allocating a block upfront is that it is nearly impossible
to correctly estimate the size of that block. If too small, space in the block
will run out and the allocator will allocate separate blocks anyway. Too large,
and other non-PyTorch libraries might stop working because they cannot allocate
any memory.
This patch provides the same benefits as using a pre-allocated block but
without having to choose its size upfront. Using the cuMemMap-style APIs,
it adds the ability to expand the last block in a segment when more memory is
needed.
Compared to universally using cudaMallocAsync to avoid fragmentation,
this patch can fix this common fragmentation issue while preserving most
of the existing allocator behavior. This behavior can be enabled and disabled dynamically.
This should allow users to, for instance, allocate long-lived parameters and state in individual buffers,
and put temporary state into the large expandable blocks, further reducing
fragmentation.
See inline comments for information about the implementation and its limitations.
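A usage sketch, under the assumption that the behavior is toggled through the allocator config string; the `expandable_segments:True` knob below is an assumption, so consult the inline comments for the authoritative switch:
```python
# Sketch: opt into expandable segments via the allocator config (assumed knob name).
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"  # set before CUDA init

import torch

if torch.cuda.is_available():
    # Tensors of growing size can now extend the last block of a segment
    # instead of forcing brand-new segment allocations.
    for n in (1024, 2048, 4096):
        x = torch.empty(n, n, device="cuda")
    print(torch.cuda.memory_summary())
```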
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96995
Approved by: https://github.com/eellison
In C++ we have TORCH_LIBRARY_FRAGMENT. This PR adds the same
functionality to the Python torch.library API.
The motivation for this is: for the simple custom op API, we don't want
users to need to deal with Library objects. One way to hide this from
users is to create library fragments.
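A hedged usage sketch of Python-side fragments mirroring C++'s TORCH_LIBRARY_FRAGMENT; the "FRAGMENT" kind string and the operators below are assumptions for illustration:
```python
# Sketch: two fragments contributing operators to the same namespace.
import torch
from torch.library import Library

frag_a = Library("mylib", "FRAGMENT")
frag_b = Library("mylib", "FRAGMENT")  # a second fragment for the same namespace

frag_a.define("add_one(Tensor x) -> Tensor")
frag_a.impl("add_one", lambda x: x + 1, "CPU")

frag_b.define("mul_two(Tensor x) -> Tensor")
frag_b.impl("mul_two", lambda x: x * 2, "CPU")

print(torch.ops.mylib.add_one(torch.zeros(3)))
print(torch.ops.mylib.mul_two(torch.ones(3)))
```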
Test Plan:
- tests that you can create multiple fragments and def+impl operators on each.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98439
Approved by: https://github.com/ezyang, https://github.com/bdhirsh
This has been bugging me for a while as I'm working on these Python scripts and they are not tracked by the ufmt linter, so I added these scripts to that linter.
```
[[linter]]
code = 'UFMT'
include_patterns = [
    '.github/**/*.py',
    'test/run_test.py',
```
This change should just work and not break anything, as the ufmt (black + usort) linter is very safe to use for standalone util scripts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97588
Approved by: https://github.com/kit1980
Fixes #96347
This PR:
- Makes the functorch tests run as a part of the "default" shards
- Deletes the functorch CI shard from all CI job configurations (if it exists)
- Increases the "default" shard count by 1 for each job, unless it was
previously set to 1, to accommodate the new functorch tests and not
regress time-to-signal.
- Adds a bunch of skips for ROCm and torchdynamo configurations. We can
investigate them later.
NB: I might go through some more iterations to figure out what other
skips need to be added, but this iteration of the PR seems to pass most of the
CI suite.
Test Plan:
- wait for CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96464
Approved by: https://github.com/huydhn
Enables the last few files under pytest.
xdist was causing problems with `test_source_multithreaded` in `profiler/test_profiler` due to it creating extra threads. Luckily we don't use it, so we can disable it with `-p no:xdist`, but this is incompatible with pytest-rerunfailures==10.2, so upgrade to 10.3. I'd update the Windows AMI, but I don't know how.
`dynamo/test_optimizers` and `dynamo/test_repros` both had tests that used skip_if_pytest. https://github.com/pytorch/pytorch/pull/93251/files suggests that it is due to pytest assertion rewriting, so I added `PYTEST_DONT_REWRITE` to their module docstrings to prevent pytest from rewriting assertions.
Disabling tests by issue in `dynamo/test_dynamic_shapes` seems sane.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96698
Approved by: https://github.com/huydhn, https://github.com/malfet
Enable pytest for a few unique files. pytest runs tests in a different order than unittest (but still a consistent ordering with respect to itself) and some tests change global state, causing other tests to fail.
`test_transpose_non_contiguous` in `test_torchinductor.py` gets impacted by some other test, but I'm not sure which one, so my solution is to reset the metrics before the rest of the test is run.
`test_register_patterns` in `test_quantize_fx.py` adds extra keys to global variables, so remove them when the test is done via unittest's `addCleanup`, which also works under pytest.
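A small illustration of that `addCleanup` pattern; the test body and global registry below are made up, not the actual test_register_patterns code:
```python
# Sketch: the cleanup runs under both unittest and pytest, restoring the global mapping.
import unittest

GLOBAL_PATTERNS = {}  # stand-in for the module-level registry being mutated

class TestRegisterPatterns(unittest.TestCase):
    def test_register_patterns(self):
        GLOBAL_PATTERNS["custom_key"] = object()
        # Remove the key when the test finishes, pass or fail.
        self.addCleanup(GLOBAL_PATTERNS.pop, "custom_key", None)
        self.assertIn("custom_key", GLOBAL_PATTERNS)

if __name__ == "__main__":
    unittest.main()
```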
pytest doesn't really have an equivalent for `load_tests` so change it to be like `test_jit` that imports all the classes. I also attempted to dynamically import them, but I failed.
`test_public_api_surface` in `test_fx.py` checks for a backwards compatibility classification. There is a different test in test_fx that results in `fuser_utils` being imported. pytest runs this test before `test_public_api_surface` while unittest runs it after, so pytest sees `fuser_utils` when crawling through the modules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96397
Approved by: https://github.com/huydhn
Set the environment variable
```
PYTORCH_TEST_DO_NOT_USE_PYTEST=1
```
to not use pytest in pytorch unit testing.
This change is related to some recent changes, e.g. #96210, #96016, #95844, #95659, that enabled the use of pytest in many test modules. Those test modules were passing before, but failed immediately once pytest was used. A sample stack trace:
```python
root@8e3168a83ee2:/opt/pytorch/pytorch# python test/run_test.py -v -i test_optim -- -v --save-xml
Ignoring disabled issues: []
/opt/pytorch/pytorch/test/run_test.py:1225: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if torch.version.cuda is not None and LooseVersion(torch.version.cuda) >= "11.6":
Selected tests:
test_optim
parallel (file granularity) tests:
test_optim
serial (file granularity) tests:
Ignoring disabled issues: []
Ignoring disabled issues: []
Running test_optim ... [2023-03-09 12:51:59.358110]
Executing ['/usr/local/bin/python', '-bb', 'test_optim.py', '-v', '--save-xml', '-v', '--use-pytest', '-vv', '-rfEX', '-x', '--reruns=2'] ... [2023-03-09 12:51:59.358810]
Test results will be stored in test-reports/python-pytest/test_optim/test_optim-5e41643c8bac8ace.xml
Traceback (most recent call last):
File "/opt/pytorch/pytorch/test/test_optim.py", line 4581, in <module>
run_tests()
File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 796, in run_tests
exit_code = pytest.main(args=pytest_args)
File "/usr/local/lib/python3.10/site-packages/_pytest/config/__init__.py", line 148, in main
config = _prepareconfig(args, plugins)
File "/usr/local/lib/python3.10/site-packages/_pytest/config/__init__.py", line 329, in _prepareconfig
config = pluginmanager.hook.pytest_cmdline_parse(
File "/usr/local/lib/python3.10/site-packages/pluggy/_hooks.py", line 265, in __call__
return self._hookexec(self.name, self.get_hookimpls(), kwargs, firstresult)
File "/usr/local/lib/python3.10/site-packages/pluggy/_manager.py", line 80, in _hookexec
return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
File "/usr/local/lib/python3.10/site-packages/pluggy/_callers.py", line 55, in _multicall
gen.send(outcome)
File "/usr/local/lib/python3.10/site-packages/_pytest/helpconfig.py", line 103, in pytest_cmdline_parse
config: Config = outcome.get_result()
File "/usr/local/lib/python3.10/site-packages/pluggy/_result.py", line 60, in get_result
raise ex[1].with_traceback(ex[2])
File "/usr/local/lib/python3.10/site-packages/pluggy/_callers.py", line 39, in _multicall
res = hook_impl.function(*args)
File "/usr/local/lib/python3.10/site-packages/_pytest/config/__init__.py", line 1060, in pytest_cmdline_parse
self.parse(args)
File "/usr/local/lib/python3.10/site-packages/_pytest/config/__init__.py", line 1348, in parse
self._preparse(args, addopts=addopts)
File "/usr/local/lib/python3.10/site-packages/_pytest/config/__init__.py", line 1231, in _preparse
self.pluginmanager.load_setuptools_entrypoints("pytest11")
File "/usr/local/lib/python3.10/site-packages/pluggy/_manager.py", line 287, in load_setuptools_entrypoints
plugin = ep.load()
File "/usr/local/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
module = import_module(match.group('module'))
File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "/usr/local/lib/python3.10/site-packages/_pytest/assertion/rewrite.py", line 168, in exec_module
exec(co, module.__dict__)
File "/usr/local/lib/python3.10/site-packages/xdist/looponfail.py", line 16, in <module>
import execnet
File "/usr/local/lib/python3.10/site-packages/execnet/__init__.py", line 14, in <module>
from .gateway_base import DataFormatError
File "/usr/local/lib/python3.10/site-packages/execnet/gateway_base.py", line 1138, in <module>
FLOAT_FORMAT_SIZE = struct.calcsize(FLOAT_FORMAT)
BytesWarning: Comparison between bytes and string
FINISHED PRINTING LOG FILE of test_optim (/opt/pytorch/pytorch/test/test-reports/test_optim_1pnlesrz.log)
test_optim failed!
Traceback (most recent call last):
File "/opt/pytorch/pytorch/test/run_test.py", line 1428, in <module>
main()
File "/opt/pytorch/pytorch/test/run_test.py", line 1386, in main
raise RuntimeError(
RuntimeError: test_optim failed!
Tip: You can keep running tests even on failure by passing --keep-going to run_test.py.
If running on CI, add the 'keep-going' label to your PR and rerun your jobs.
```
I'd like to propose this option, which allows users to use the good old Python unittest runner instead of pytest for their testing in CI.
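A minimal sketch of how such a gate could look inside a run_tests helper; the structure below is assumed, not the actual common_utils.py code:
```python
# Sketch: honor PYTORCH_TEST_DO_NOT_USE_PYTEST=1 by falling back to plain unittest.
import os
import sys
import unittest

def run_tests(argv=None):
    argv = argv if argv is not None else sys.argv
    if os.environ.get("PYTORCH_TEST_DO_NOT_USE_PYTEST") == "1":
        # Good old unittest path: no pytest plugins get imported at all.
        unittest.main(argv=argv)
    else:
        import pytest
        sys.exit(pytest.main(args=argv[1:] + ["-rfEX"]))
```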
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96444
Approved by: https://github.com/malfet
Run more tests through pytest.
Use a block list for tests that shouldn't run through pytest. As far as I can tell, the number of tests run, skipped, and xfailed for those not on the blocklist is the same.
Regarding the main module:
Usually when tests are run in CI, we call `python <test file>`, which causes the file to be imported under the module name `__main__`. However, pytest searches for the module to be imported under the file name, so the file gets re-imported. This can cause issues for tests that run module-level code and change global state, like test_nn, which modifies lists imported from another file, or tests in test/lazy, which initialize a backend that cannot coexist with a second copy of itself.
My workaround for this is to run tests from the `__main__` module. However, this results in pytest being unable to rewrite assertions (and possibly other things, but I don't know what else pytest does right now). A better solution might be to call `pytest <test file>` directly and move all the code in run_tests(argv) to module-level code or put it in a hook in conftest.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95844
Approved by: https://github.com/huydhn
Summary: Currently running PyTorch tests with dynamo and inductor is
controlled by environment variables, and CI sets them based on test
config name matching. Change them to use options of run_test.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94539
Approved by: https://github.com/huydhn
Part of my effort to move everything to pytest and decrease the number of test-runner frameworks in CI.
Gives XMLs, but they might look a bit weird b/c of module-level tests vs. tests in classes.
Doesn't give the skip/disable-test infra because that is tied to classes (for future ref, could either put tests in classes or move the check_if_enable stuff into a pytest hook).
Tested in CI and checked that the same number of tests are run.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95659
Approved by: https://github.com/huydhn
The time comparison between using MultiThreadedTestCase and MultiProcessTestCase on the op db tests is amazing!
Using MultiThreadedTestCase on an AWS dev node:
```
time pytest test/distributed/_tensor/test_dtensor_ops.py
============= 175 passed, 42 skipped, 397 xfailed in 80.30s (0:01:20) =======
real 1m22.330s
user 1m38.782s
sys 0m18.762s
```
MultiProcessTestCase takes from 40 minutes to more than 1 hour, even when using pytest parallel testing tools.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92198
Approved by: https://github.com/XilunWu
Fixes #90940. This PR revamps how tests are run in parallel, as well as device visibility both at the Docker container level and within the run_test.py test runner.
First, running multiple test modules concurrently on the same GPU was causing instability for ROCm runners manifesting as timeouts. ROCm runners have at least 1 GPU each, but often 2 or more. This PR allows NUM_PROCS to be set equal to the number of devices available, but also takes care to set HIP_VISIBLE_DEVICES to avoid oversubscribing any GPU.
Second, we had introduced env vars `-e ROCR_VISIBLE_DEVICES` (#91031) to prepare for two GHA runners per CI node, to split up the GPU visibility at the docker level between the two runners. This effort wasn't fully realized; to date, we haven't had more than one runner per CI host. We abandon this effort in favor of all GPUs being visible to a single runner and managing GPU resources as stated above.
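A sketch of the per-process GPU pinning idea, with assumed helper names; the actual run_test.py logic may differ:
```python
# Sketch: pin each test-runner process to its own GPU so concurrent files
# don't oversubscribe a device.
import os

def visible_devices_for(proc_index: int, num_gpus: int) -> str:
    # With NUM_PROCS == num_gpus, worker i gets GPU i; round-robin otherwise.
    return str(proc_index % num_gpus)

env = os.environ.copy()
env["HIP_VISIBLE_DEVICES"] = visible_devices_for(proc_index=1, num_gpus=2)
print(env["HIP_VISIBLE_DEVICES"])  # "1"
```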
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91137
Approved by: https://github.com/kit1980, https://github.com/huydhn, https://github.com/pruthvistony
Rerun all disabled tests to gather their latest results so that we can close disabled-test tickets automatically. When running under this mode (RERUN_DISABLED_TESTS=true), only disabled tests are run while the rest are skipped `<skipped message="Test is enabled but --rerun-disabled-tests verification mode is set, so only disabled tests are run" type="skip"/>`
The logic is roughly as follows; each test runs multiple times (n=50), and a sketch of the classification follows this list:
* If the disabled test passes, and it's flaky, do nothing because it's still flaky. In the test report, we'll see the test passes with the following skipped message:
```
<testcase classname="TestMultiprocessing" file="test_multiprocessing.py" line="357" name="test_fs" time="0.000" timestamp="0001-01-01T00:00:00">
<skipped message="{"flaky": True, "num_red": 4, "num_green": 0, "max_num_retries": 3, "rerun_disabled_test": true}" type="skip"/>
</testcase>
```
* If the disabled test passes every single time, and it is not flaky anymore, mark it so that it can be closed later. We will see the test runs and passes, i.e.
```
<testcase classname="TestCommonCUDA" name="test_out_warning_linalg_lu_factor_cuda" time="0.170" file="test_ops.py" />
```
* If the disabled test fails after all retries, this is also expected, so we only report it but don't fail the job (because we don't care about red signals here); we'll see the test is skipped (without the `flaky` field), i.e.
```
<testcase classname="TestMultiprocessing" file="test_multiprocessing.py" line="357" name="test_fs" time="0.000" timestamp="0001-01-01T00:00:00">
<skipped message="{"num_red": 4, "num_green": 0, "max_num_retries": 3, "rerun_disabled_test": true}" type="skip"/>
</testcase>
```
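A tiny sketch of the resulting signal classification; the field names follow the sample skip messages above, and the thresholds are illustrative:
```python
# Sketch: classify a disabled test from its green/red counts over the reruns.
def classify(num_green: int, num_red: int) -> str:
    if num_red == 0:
        return "no longer flaky"   # candidate for closing the disabled-test issue
    if num_green == 0:
        return "still failing"     # reported, but the job is not failed
    return "still flaky"           # keep the issue open

for green, red in [(50, 0), (0, 4), (46, 4)]:
    print(green, red, "->", classify(green, red))
```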
This runs on the same schedule as `mem_leak_check` (daily). The change to update test stats and (potentially) grouping on HUD will come in separate PRs.
### Testing
* pull https://github.com/pytorch/pytorch/actions/runs/3447434434
* trunk https://github.com/pytorch/pytorch/actions/runs/3447434928
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88646
Approved by: https://github.com/clee2000
Fixes: https://github.com/pytorch/pytorch/issues/88010
This PR does a couple things to stop slow gradcheck from timing out:
- Splits out test_ops_fwd_gradients from test_ops_gradients, and factors out TestFwdGradients and TestBwdGradients which both inherit from TestGradients, now situated in common_utils (maybe there is a better place?)
- Skips CompositeCompliance (and several other test files) for slow gradcheck CI since they do not use gradcheck
- because test times for test_ops_fwd_gradients and test_ops_gradients are either unknown or wrong, we hardcode them for now to prevent them from being put together. We can undo the hack after we see that actual test times are updated. ("def calculate_shards" divides tests with unknown test times in a round-robin fashion; see the sketch after this list.)
- Updates references to test_ops_gradients and TestGradients
- Test files that are skipped for slow gradcheck CI are now centrally located in run_test.py; this reduces how fine-grained we can be with the skips, so for some skips (one so far) we still use the old skipping mechanism, e.g. for test_mps
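A small sketch of the round-robin behavior mentioned in the note above; the function and inputs are illustrative, not the actual `calculate_shards`:
```python
# Sketch: tests with unknown times get spread round-robin across shards.
def round_robin_shards(tests, num_shards):
    shards = [[] for _ in range(num_shards)]
    for i, test in enumerate(tests):
        shards[i % num_shards].append(test)
    return shards

print(round_robin_shards(
    ["test_ops_gradients", "test_ops_fwd_gradients", "test_ops", "test_ops_jit"], 2))
```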
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88216
Approved by: https://github.com/albanD
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86699
This diff does the following:
1. **c10d_error_logger.py**: Add an API to create a logger with a specific logging handler based on the destination.
2. The API from above would get a logging handler based on the destination provided.
- **caffe2/torch/distributed/logging_handlers.py**: For OSS, we simply use a NullHandler() for now.
3. Add associated test files for 1 and 2.
Test Plan:
## Unit Test
```
buck test @//mode/dev-nosan //caffe2/test/distributed:test_c10d_error_logger -- --print-passing-details
```
```
File changed: fbcode//caffe2/test/distributed/test_c10d_error_logger.py
File changed: fbsource//xplat/caffe2/test/distributed/TARGETS
9 additional file changes
waiting for all tests to finish...
✓ Listing success: caffe2/test/distributed:test_c10d_error_logger (0.2s)
Found 1 tests
✓ Pass: caffe2/test/distributed:test_c10d_error_logger - test_get_or_create_logger (caffe2.test.distributed.test_c10d_error_logger.C10dErrorLoggerTest) (0.2s)
stdout:
stderr:
Buck UI: https://www.internalfb.com/buck2/b975f6b0-77e9-4287-8722-f95b48036181
Test Session: https://www.internalfb.com/intern/testinfra/testrun/1407375150206593
RE: reSessionID-4d7ab8ca-1051-48e9-a5a8-6edbe15d1fe4 Up: 124 B Down: 0 B
Jobs completed: 5. Time elapsed: 3.5s.
Tests finished: Pass 1. Fail 0. Fatal 0. Skip 0. 0 builds failed
```
Differential Revision: D39920391
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87123
Approved by: https://github.com/fduwjj, https://github.com/H-Huang
Fixes #83973 (This is a substitute PR for https://github.com/pytorch/pytorch/pull/85024)
First of all, thanks for your invaluable contributions to PyTorch everyone!
Given how extensively `torch.cuda.is_available` is used in the PyTorch ecosystem, IMHO it's worthwhile to provide downstream libraries/frameworks/users the ability to alter the default behavior of `torch.cuda.is_available` in the context of their PyTorch usage.
I'm confident there are many current and future such use cases which could benefit from leveraging a weakened, NVML-based `torch.cuda.is_available` assessment at a downstream framework's explicit direction (thanks @malfet 81da50a972 !). Though one could always patch out the `torch.cuda.is_available` function with another implementation in a downstream library, I think this environment-variable-based configuration option is more convenient, and the cost of including the option is quite low.
As discussed in https://github.com/pytorch/pytorch/pull/85024#issuecomment-1261542045, this PR gates the new non-default NVML-based CUDA behavior behind an environment variable (PYTORCH_NVML_BASED_CUDA_CHK) that allows a user/framework to invoke non-default, NVML-based `is_available()` assessments if desired.
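A usage sketch of the new environment variable; setting it before importing torch is assumed here to be the safe way to opt in:
```python
# Sketch: opt into the NVML-based availability check via the env var this PR introduces.
import os
os.environ["PYTORCH_NVML_BASED_CUDA_CHK"] = "1"

import torch

# With the flag set, is_available() can answer via the weaker NVML-based check
# (the usual motivation: not poisoning the process for a later fork).
print(torch.cuda.is_available())
```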
Thanks again for your work everyone!
@ngimel @malfet @awaelchli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85951
Approved by: https://github.com/ngimel
run tests in parallel at the test file granularity
Runs 3 files in parallel using a multiprocessing pool; output goes to a file, which is then printed when the test finishes. Some tests cannot be run in parallel (usually due to lacking memory), so we run those afterwards. Sharding is changed to attempt to mask large files with other large files / run them on the same shard.
test_ops* gets a custom handler to run it because it is simply too big (2 hrs on Windows) and linalg_cholesky fails (I would really like a solution to this if possible, but until then we use the custom handler).
Reduces CUDA test time by a lot and total Windows test time by ~1 hr.
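A rough sketch of the file-level parallel run described above; the pool size of 3 follows the description, while the file names and helper are illustrative, not the actual run_test.py code:
```python
# Sketch: run test files 3 at a time, capture each file's output, print on completion.
import subprocess
import sys
import tempfile
from multiprocessing import Pool

def run_test_file(test_file: str) -> tuple[str, int, str]:
    with tempfile.NamedTemporaryFile(mode="w+", suffix=".log", delete=False) as log:
        ret = subprocess.call([sys.executable, test_file], stdout=log, stderr=log)
        log.seek(0)
        return test_file, ret, log.read()

if __name__ == "__main__":
    parallel = ["test_a.py", "test_b.py", "test_c.py"]  # hypothetical test files
    with Pool(3) as pool:
        for name, ret, output in pool.imap(run_test_file, parallel):
            print(f"===== {name} (exit {ret}) =====")
            print(output)
```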
Ref. https://github.com/pytorch/pytorch/issues/82894
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84961
Approved by: https://github.com/huydhn
- [x] Direct dependency on UCX is completely removed, UCC active set API always enabled
- [x] Remove `TORCH_UCC_PROFILING_ENABLE`, always enable profiling
- [x] Fixes profiling of `recv` and `all_gather`
- [x] Use the NCCL TL of UCC on CUDA, as the UCP TL is not well supported on CUDA
Most tests are passing, but there are a few skipped tests:
- `scatter` and `gather` are not supported by the UCP TL of UCC on CPU tensors
- A few flaky tests in PyTorch's CI environment
- Profiler-related failures, some of them will be fixed by @Fuzzkatt in https://github.com/pytorch/pytorch/pull/84368
After this PR is merged, I will continue to work on these skipped failures.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83285
Approved by: https://github.com/vtlam, https://github.com/malfet, https://github.com/kwen2501
This is a new version of #15648 based on the latest master branch.
Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR.
In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I marked as skip are causing segfaults. Running xdoctest results in 293 failed, 201 passed tests. The next commits will be to disable those tests. (unfortunately I don't have a tool that will insert the `#xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.)
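For illustration, this is what the directive looks like inside a docstring; the function is made up, and only the `# xdoctest: +SKIP` line is the point:
```python
# Sketch: marking a failing doctest so xdoctest collects it but does not execute it.
def scale(x):
    """
    Multiply a tensor by two.

    Example:
        >>> # xdoctest: +SKIP
        >>> import torch
        >>> scale(torch.ones(2))
        tensor([2., 2.])
    """
    return x * 2
```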
Fixes https://github.com/pytorch/pytorch/issues/71105
@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797
Approved by: https://github.com/ezyang
After https://github.com/pytorch/pytorch/pull/81116, we started pulling test times straight from the source instead of first downloading them in the build job and then having the test job take the build job's version. This can cause an issue where different shards pull different versions of the file, leading to incorrect sharding (e.g. two shards accidentally running the same test file). This generally happens if the test jobs run while the test times file is being updated (unlikely, but not impossible) or if someone reruns a test job the next day.
In this PR, I return to the old method of downloading the test times file during the build job and having the test jobs pull from the build job's uploaded artifacts. If there is no test times file in the build job's artifacts, we fall back to the default sharding plan (a small sketch of this fallback follows the notes below).
Notes:
* script moved to a new file to avoid needing to import torch, which would require torch to be built, which can cause issues with asan
* I got errors with asan (`ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.`), so I put the script at the beginning of the build
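A small sketch of the fallback, with the test-times file name assumed for illustration:
```python
# Sketch: use the test-times file downloaded with the build artifacts if present,
# otherwise shard without timing data.
import json
import os

TEST_TIMES_FILE = ".pytorch-test-times.json"  # assumed name, for illustration only

def load_test_times() -> dict:
    if os.path.exists(TEST_TIMES_FILE):
        with open(TEST_TIMES_FILE) as f:
            return json.load(f)
    print("No test time stats from build artifacts, using default sharding plan")
    return {}

print(load_test_times())
```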
### Test Plan
Verified that the number of tests run in the pull and trunk workflows is similar to workflows run on master. Checked the logs to see whether artifacts were being used for sharding. Spot-checked a few test configs to confirm that their lists of selected tests didn't overlap.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81915
Approved by: https://github.com/huydhn
This PR:
- adds the ability to run functorch tests via run_test.py
- changes the functorch shards in PyTorch CI to invoke functorch tests
via run_test.py
The main motivation for this is so that functorch tests hook into the
standard PyTorch test infrastructure.
Questions for reviewers:
- the functorch tests are located outside of the pytorch/test folder
(they're in the pytorch/functorch/test folder). Is this OK? (run_test.py
works locally for me).
Test Plan:
- checked that `python run_test.py --functorch` ran functorch tests
locally
- Local mock test: added `{"test_compilation_for_dynamic_shape
(__main__.TestCompileCache)":
["https://github.com/pytorch/pytorch/issues/82016", ["linux"]]}` to .pytorch-disabled-tests.json, ran functorch tests, verified that the test was skipped.
- Wait for CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82012
Approved by: https://github.com/janeyx99
In the case of target determination, this is just removing comments that
refer to non-existent code.
In the case of the test specification code, this removes (what I believe
to be) an unused feature. If we're using this somehow, let me know and I
can revise the PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79372
Approved by: https://github.com/janeyx99
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75374
From the FairSeq and MetaSeq codebases (which are essentially transformer models), we have found that loads of ops are not supported by ShardedTensor. So we now implement a simple version so that we can at least run a transformer example.
Ops include: chunk, transpose, view, masked_fill, dropout, softmax, and type_as.
Isolate the common logic of registering simple ops into a function; for future registrations, we just need to implement at most three functions for a new op.
ghstack-source-id: 155309147
Test Plan: CI
Reviewed By: pritamdamania87
Differential Revision: D35123021
fbshipit-source-id: 660e559fb8b4a910eb63e0586c63ab927873a2ce
(cherry picked from commit 83a87ebf627d863448dfe1019c7c5f7112cc14ab)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76199
Since PartialTensor is somewhat isolated from ShardedTensor, we now move it to the _shard folder.
Also, we added the logic to remove padding when the size is not divisible by the world size, and modified the unit test to reflect this change.
Finally, we need to consider the placement order for the resharding spec for PartialTensor; the related logic is added in this change. Furthermore, for sharded linear, we need to order the placements by rank to get the expected local result.
ghstack-source-id: 154853290
Test Plan: CI
Reviewed By: pritamdamania87, wanchaol
Differential Revision: D35827894
fbshipit-source-id: 58dab77969b8b6557f42afa7e8f5a8a053dd5793
(cherry picked from commit abeb28f16582dcf707c2e165f39df6caf692384d)
As per title.
### When running `python run_test.py -h`
It used to show:
- The general unittest parser help that we print via a second thread 35545d85dc/torch/testing/_internal/common_utils.py (L467-L470)
- The common_utils's parser help
<details><summary>Full result</summary>
<p>
```bash
$ python run_test.py -h
usage: run_test.py [-h] [-v] [-q] [--locals] [-f] [-c] [-b] [-k TESTNAMEPATTERNS] [tests [tests ...]]
positional arguments:
tests a list of any number of test modules, classes and test methods.
optional arguments:
-h, --help show this help message and exit
-v, --verbose Verbose output
-q, --quiet Quiet output
--locals Show local variables in tracebacks
-f, --failfast Stop on first fail or error
-c, --catch Catch Ctrl-C and display results so far
-b, --buffer Buffer stdout and stderr during tests
-k TESTNAMEPATTERNS Only run tests which match the given substring
Examples:
run_test.py - run default set of tests
run_test.py MyTestSuite - run suite 'MyTestSuite'
run_test.py MyTestCase.testSomething - run MyTestCase.testSomething
run_test.py MyTestCase - run all 'test*' test methods
in MyTestCase
usage: run_test.py [-h] [--subprocess] [--seed SEED] [--accept] [--jit_executor JIT_EXECUTOR] [--repeat REPEAT] [--test_bailouts]
[--save-xml [SAVE_XML]] [--discover-tests] [--log-suffix LOG_SUFFIX] [--run-parallel RUN_PARALLEL]
[--import-slow-tests [IMPORT_SLOW_TESTS]] [--import-disabled-tests [IMPORT_DISABLED_TESTS]]
optional arguments:
-h, --help show this help message and exit
--subprocess whether to run each test in a subprocess
--seed SEED
--accept
--jit_executor JIT_EXECUTOR
--repeat REPEAT
--test_bailouts
--save-xml [SAVE_XML]
--discover-tests
--log-suffix LOG_SUFFIX
--run-parallel RUN_PARALLEL
--import-slow-tests [IMPORT_SLOW_TESTS]
--import-disabled-tests [IMPORT_DISABLED_TESTS]
```
</p>
</details>
It now prints:
- The general unittest parser help the same way. Should we remove this? We can't merge them, unfortunately, as unittest does not accept a parent parser / does not expose its parser for us to take as a parent.
- The combined common_utils + run_test parsers help
<details><summary>Full result</summary>
<p>
```bash
$ python run_test.py -h
usage: run_test.py [-h] [-v] [-q] [--locals] [-f] [-c] [-b] [-k TESTNAMEPATTERNS] [tests [tests ...]]
positional arguments:
tests a list of any number of test modules, classes and test methods.
optional arguments:
-h, --help show this help message and exit
-v, --verbose Verbose output
-q, --quiet Quiet output
--locals Show local variables in tracebacks
-f, --failfast Stop on first fail or error
-c, --catch Catch Ctrl-C and display results so far
-b, --buffer Buffer stdout and stderr during tests
-k TESTNAMEPATTERNS Only run tests which match the given substring
Examples:
run_test.py - run default set of tests
run_test.py MyTestSuite - run suite 'MyTestSuite'
run_test.py MyTestCase.testSomething - run MyTestCase.testSomething
run_test.py MyTestCase - run all 'test*' test methods
in MyTestCase
Ignoring disabled issues: []
usage: run_test.py [-h] [--subprocess] [--seed SEED] [--accept] [--jit_executor JIT_EXECUTOR] [--repeat REPEAT] [--test_bailouts]
[--save-xml [SAVE_XML]] [--discover-tests] [--log-suffix LOG_SUFFIX] [--run-parallel RUN_PARALLEL]
[--import-slow-tests [IMPORT_SLOW_TESTS]] [--import-disabled-tests [IMPORT_DISABLED_TESTS]] [-v] [--jit]
[--distributed-tests] [-core] [-pt] [-c] [-i TESTS [TESTS ...]] [-x TESTS [TESTS ...]] [-f TESTS] [-l TESTS]
[--bring-to-front TESTS [TESTS ...]] [--ignore-win-blocklist] [--continue-through-error]
[--export-past-test-times [EXPORT_PAST_TEST_TIMES]] [--shard SHARD SHARD] [--exclude-jit-executor]
[--exclude-distributed-tests] [--run-specified-test-cases [RUN_SPECIFIED_TEST_CASES]]
[--use-specified-test-cases-by {include,bring-to-front}] [--dry-run]
[additional_unittest_args [additional_unittest_args ...]]
Run the PyTorch unit test suite
positional arguments:
additional_unittest_args
additional arguments passed through to unittest, e.g., python run_test.py -i sparse -- TestSparse.test_factory_size_check
optional arguments:
-h, --help show this help message and exit
--subprocess whether to run each test in a subprocess
--seed SEED
--accept
--jit_executor JIT_EXECUTOR
--repeat REPEAT
--test_bailouts
--save-xml [SAVE_XML]
--discover-tests
--log-suffix LOG_SUFFIX
--run-parallel RUN_PARALLEL
--import-slow-tests [IMPORT_SLOW_TESTS]
--import-disabled-tests [IMPORT_DISABLED_TESTS]
-v, --verbose print verbose information and test-by-test results
--jit, --jit run all jit tests
--distributed-tests, --distributed-tests
run all distributed tests
-core, --core Only run core tests, or tests that validate PyTorch's ops, modules,and autograd. They are defined by CORE_TEST_LIST.
-pt, --pytest If true, use `pytest` to execute the tests. E.g., this runs TestTorch with pytest in verbose and coverage mode: python run_test.py -vci torch -pt
-c, --coverage enable coverage
-i TESTS [TESTS ...], --include TESTS [TESTS ...]
select a set of tests to include (defaults to ALL tests). tests must be a part of the TESTS list defined in run_test.py
-x TESTS [TESTS ...], --exclude TESTS [TESTS ...]
select a set of tests to exclude
-f TESTS, --first TESTS
select the test to start from (excludes previous tests)
-l TESTS, --last TESTS
select the last test to run (excludes following tests)
--bring-to-front TESTS [TESTS ...]
select a set of tests to run first. This can be used in situations where you want to run all tests, but care more about some set, e.g. after making a change to a specific component
--ignore-win-blocklist
always run blocklisted windows tests
--continue-through-error
Runs the full test suite despite one of the tests failing
--export-past-test-times [EXPORT_PAST_TEST_TIMES]
dumps test times from previous S3 stats into a file, format JSON
--shard SHARD SHARD runs a shard of the tests (taking into account other selections), e.g., --shard 2 3 will break up the selected tests into 3 shards and run the tests in the 2nd shard (the first number should not exceed the second)
--exclude-jit-executor
exclude tests that are run for a specific jit config
--exclude-distributed-tests
exclude distributed tests
--run-specified-test-cases [RUN_SPECIFIED_TEST_CASES]
load specified test cases file dumped from previous OSS CI stats, format CSV. If all test cases should run for a <test_module> please add a single row:
test_filename,test_case_name
...
<test_module>,__all__
...
how we use the stats will be based on option "--use-specified-test-cases-by".
--use-specified-test-cases-by {include,bring-to-front}
used together with option "--run-specified-test-cases". When specified test case file is set, this option allows the user to control whether to only run the specified test modules or to simply bring the specified modules to front and also run the remaining modules. Note: regardless of this option, we will only run the specified test cases within a specified test module. For unspecified test modules with the bring-to-front option, all test cases will be run, as one may expect.
--dry-run Only list the test that will run.
where TESTS is any of: benchmark_utils/test_benchmark_utils, distributed/_shard/sharded_optim/test_sharded_optim, distributed/_shard/sharded_tensor/ops/test_binary_cmp, distributed/_shard/sharded_tensor/ops/test_elementwise_ops, distributed/_shard/sharded_tensor/ops/test_embedding, distributed/_shard/sharded_tensor/ops/test_embedding_bag, distributed/_shard/sharded_tensor/ops/test_init, distributed/_shard/sharded_tensor/ops/test_linear, distributed/_shard/sharded_tensor/ops/test_math_ops, distributed/_shard/sharded_tensor/test_megatron_prototype, distributed/_shard/sharded_tensor/test_partial_tensor, distributed/_shard/sharded_tensor/test_sharded_tensor, distributed/_shard/sharded_tensor/test_sharded_tensor_reshard, distributed/_shard/sharding_spec/test_sharding_spec, distributed/_shard/test_replicated_tensor, distributed/algorithms/test_join, distributed/elastic/events/lib_test, distributed/elastic/metrics/api_test, distributed/elastic/multiprocessing/api_test, distributed/elastic/timer/api_test, distributed/elastic/timer/local_timer_example, distributed/elastic/timer/local_timer_test, distributed/elastic/utils/distributed_test, distributed/elastic/utils/logging_test, distributed/elastic/utils/util_test, distributed/fsdp/test_flatten_params_wrapper, distributed/fsdp/test_fsdp_apply, distributed/fsdp/test_fsdp_checkpoint, distributed/fsdp/test_fsdp_clip_grad_norm, distributed/fsdp/test_fsdp_comm, distributed/fsdp/test_fsdp_core, distributed/fsdp/test_fsdp_freezing_weights, distributed/fsdp/test_fsdp_grad_acc, distributed/fsdp/test_fsdp_ignored_modules, distributed/fsdp/test_fsdp_input, distributed/fsdp/test_fsdp_memory, distributed/fsdp/test_fsdp_mixed_precision, distributed/fsdp/test_fsdp_multiple_forward, distributed/fsdp/test_fsdp_multiple_wrapping, distributed/fsdp/test_fsdp_optim_state, distributed/fsdp/test_fsdp_overlap, distributed/fsdp/test_fsdp_pure_fp16, distributed/fsdp/test_fsdp_state_dict, distributed/fsdp/test_fsdp_summon_full_params, distributed/fsdp/test_fsdp_traversal, distributed/fsdp/test_fsdp_uneven, distributed/fsdp/test_shard_utils, distributed/fsdp/test_utils, distributed/fsdp/test_wrap, distributed/nn/jit/test_instantiator, distributed/optim/test_zero_redundancy_optimizer, distributed/pipeline/sync/skip/test_api, distributed/pipeline/sync/skip/test_gpipe, distributed/pipeline/sync/skip/test_inspect_skip_layout, distributed/pipeline/sync/skip/test_leak, distributed/pipeline/sync/skip/test_portal, distributed/pipeline/sync/skip/test_stash_pop, distributed/pipeline/sync/skip/test_tracker, distributed/pipeline/sync/skip/test_verify_skippables, distributed/pipeline/sync/test_balance, distributed/pipeline/sync/test_bugs, distributed/pipeline/sync/test_checkpoint, distributed/pipeline/sync/test_copy, distributed/pipeline/sync/test_deferred_batch_norm, distributed/pipeline/sync/test_dependency, distributed/pipeline/sync/test_inplace, distributed/pipeline/sync/test_microbatch, distributed/pipeline/sync/test_phony, distributed/pipeline/sync/test_pipe, distributed/pipeline/sync/test_pipeline, distributed/pipeline/sync/test_stream, distributed/pipeline/sync/test_transparency, distributed/pipeline/sync/test_worker, distributed/rpc/cuda/test_tensorpipe_agent, distributed/rpc/test_faulty_agent, distributed/rpc/test_tensorpipe_agent, distributed/test_c10d_common, distributed/test_c10d_gloo, distributed/test_c10d_nccl, distributed/test_c10d_spawn_gloo, distributed/test_c10d_spawn_nccl, distributed/test_data_parallel, distributed/test_distributed_spawn, distributed/test_launcher, 
distributed/test_nccl, distributed/test_pg_wrapper, distributed/test_store, distributions/test_constraints, distributions/test_distributions, lazy/test_bindings, lazy/test_extract_compiled_graph, lazy/test_ts_opinfo, test_ao_sparsity, test_autocast, test_autograd, test_binary_ufuncs, test_bundled_inputs, test_complex, test_cpp_api_parity, test_cpp_extensions_aot_ninja, test_cpp_extensions_aot_no_ninja, test_cpp_extensions_jit, test_cuda, test_cuda_primary_ctx, test_dataloader, test_datapipe, test_deploy, test_deploy, test_dispatch, test_expanded_weights, test_foreach, test_function_schema, test_functional_autograd_benchmark, test_functional_optim, test_functionalization, test_futures, test_fx, test_fx_experimental, test_hub, test_import_stats, test_indexing, test_jit, test_jit_autocast, test_jit_cuda_fuser, test_jit_disabled, test_jit_fuser_legacy, test_jit_fuser_te, test_jit_legacy, test_jit_profiling, test_license, test_linalg, test_logging, test_masked, test_mkldnn, test_mobile_optimizer, test_model_dump, test_module_init, test_modules, test_monitor, test_multiprocessing, test_multiprocessing_spawn, test_namedtensor, test_namedtuple_return_api, test_native_functions, test_nestedtensor, test_nn, test_numba_integration, test_numpy_interop, test_openmp, test_ops, test_ops_gradients, test_ops_jit, test_optim, test_overrides, test_package, test_per_overload_api, test_profiler, test_pruning_op, test_public_bindings, test_python_dispatch, test_pytree, test_quantization, test_reductions, test_scatter_gather_ops, test_serialization, test_set_default_mobile_cpu_allocator, test_shape_ops, test_show_pickle, test_sort_and_select, test_sparse, test_sparse_csr, test_spectral_ops, test_stateless, test_tensor_creation_ops, test_tensorboard, test_tensorexpr, test_tensorexpr_pybind, test_testing, test_torch, test_type_hints, test_type_info, test_type_promotion, test_unary_ufuncs, test_utils, test_view_ops, test_vmap, test_vulkan, test_xnnpack_integration
```
</p>
</details>
### When running anything else (for example `python test_autograd.py -h`)
It did not change and still prints:
- The general unittest parser help that we print via a second thread
- The common_utils's parser help
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76152
Approved by: https://github.com/malfet, https://github.com/seemethere
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73873
Basic ShardingPlan interface and Sharder implementation:
1. We provide `ShardingPlan` to allow the user to specify all parameter sharding strategies for a given model. This includes `plan` for sharding the parameters, `output_plan` for tagging the output layout, and `return_local_tensor` for converting back to DDP.
2. Introduce the `shard_module` API, which takes an nn.Module and a ShardingPlan and shards the module according to the plan.
TODO:
The next PR will introduce an extensible Sharder and a ShardingPlanner.
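A minimal usage sketch of the two pieces above; the import paths and `ShardingPlan` fields follow later public versions of this API and should be treated as assumptions here, and an initialized process group (e.g. launched via torchrun) is required:
```
# Sketch only: import paths/fields are assumptions, and a torch.distributed
# process group must already be initialized (e.g. launched via torchrun).
import torch.nn as nn
from torch.distributed._shard import shard_module
from torch.distributed._shard.sharding_plan import ShardingPlan
from torch.distributed._shard.sharding_spec import ChunkShardingSpec

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 8)
        self.fc2 = nn.Linear(8, 8)

    def forward(self, x):
        return self.fc2(self.fc1(x))

colwise = ChunkShardingSpec(dim=0, placements=["rank:0/cuda:0", "rank:1/cuda:1"])

# `plan` shards parameters, `output_plan` tags the output layout, and
# `return_local_tensor` converts the tagged output back to a local tensor (DDP-friendly).
plan = ShardingPlan(
    plan={"fc1.weight": colwise},
    output_plan={"": colwise},
    return_local_tensor=[""],
)

model = MyModel()
shard_module(model, plan)  # shards the module in place according to the plan
```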
ghstack-source-id: 154682421
Test Plan: test_sharding_plann.py
Reviewed By: pritamdamania87, fduwjj
Differential Revision: D34695159
fbshipit-source-id: 3d695803c4b7e9a7543177ade5b709b5f847baa9
(cherry picked from commit 670cd279b0e5304a9bf0ce6e6651a08273a77035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73322
These tests have been disabled in OSS CI since #34785.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D34436844
Pulled By: davidberard98
fbshipit-source-id: c5b14b33e7f369a6fa1e9cfbcb484a30dffc659e
(cherry picked from commit b08f51587c0203c3e8b69f06ea613759e740aa4f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73529
Add ReplicatedTensor. A ReplicatedTensor is a type of tensor that has the same value on all ranks across the world size.
ReplicatedTensor is a :class:`~torch.Tensor` subclass, and it can be used together with ShardedTensor/Tensor to express different types of computation. The inter-op rules are defined as follows (using torch.add as an example op):
ReplicatedTensor + ReplicatedTensor = ReplicatedTensor
ReplicatedTensor + torch.Tensor = torch.Tensor
ReplicatedTensor + ShardedTensor = ShardedTensor
We also added a `validate()` API to help users check whether a replicated tensor on a given process_group is truly replicated.
TODO: the next PR will add ShardedTensor/PartialTensor logic to handle ReplicatedTensor.
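A hypothetical illustration of the inter-op rules above; the import path and constructor are assumptions (this class lived in the experimental `_shard` namespace and later changed), and a process group must already be initialized on each rank:
```
# Hypothetical sketch -- import path/constructor are assumptions, not the
# confirmed API of this PR; requires an initialized process group on each rank.
import torch
from torch.distributed._shard.replicated_tensor import ReplicatedTensor

rt = ReplicatedTensor(torch.ones(4))      # same value on every rank
other = ReplicatedTensor(torch.ones(4))

torch.add(rt, other)                      # ReplicatedTensor + ReplicatedTensor -> ReplicatedTensor
torch.add(rt, torch.rand(4))              # ReplicatedTensor + Tensor -> plain torch.Tensor

rt.validate()                             # checks the value really is identical across ranks
```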
ghstack-source-id: 152064781
Test Plan: test_replicated_tensor
Reviewed By: pritamdamania87, fduwjj
Differential Revision: D34529374
fbshipit-source-id: 16ccb300e9f9c47ac29a17eb6d46d029ab7d60b8
(cherry picked from commit 44f4e11e795a1bf330a8108bda256950ca769525)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73676
For some reason https://github.com/pytorch/pytorch/pull/72637 ended up getting messed up during rebasing, so please refer to that PR for review history.
This PR creates a new workflow called `deploy-linux-xenial-cuda11.3-py3.7-gcc7` for torch::deploy tests.
For testing go to https://www.torch-ci.com/pytorch/pytorch/pull/73676 and check if a build and test job occur with `deploy-linux-xenial-cuda11.3-py3.7-gcc7`.
Test Plan: Imported from OSS
Reviewed By: soulitzer
Differential Revision: D34586702
Pulled By: PaliC
fbshipit-source-id: 5627cf4ff411a4a04030f8b7726f84af979da213
(cherry picked from commit df6dddebb9fe078a6053a31033b5a40cc742fcf3)
Fixes #72368
As per the referenced issue, test_ops as a single file takes around 3:30-4:00 hours to execute on ASAN jobs:
Reference: pytorch_test_times.json
```
{
"commit": "39535fec6c3ff5bf7c2d322d096c59571c3295ed",
"JOB_BASE_NAME": "linux-xenial-py3.7-clang7-asan",
"job_times": {
"test_ops": 14928.355000000636, <- This test group is over 4hrs alone
```
----
Hence, test_ops is separated into the following parts:
1. TestGradients
2. TestJit
3. TestCommon and TestMathBits
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74297
Approved by: https://github.com/malfet
Summary:
Remove fx2trt test from oss CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72595
Test Plan: CI
Reviewed By: houseroad
Differential Revision: D34112595
Pulled By: wushirong
fbshipit-source-id: 02376ef0f25381eff31b72dcbf964c1966af9793
(cherry picked from commit e3d698a942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69735
We want to build a prototype of Megatron-LM so that we can apply PT-D ops to models like transformers and other Meta flagship models.
The basic idea of Megatron-LM is as follows:
1. Col-wise sharding of linear weight. Perform the linear op for the first layer.
2. Perform a math op (optional), such as ReLU or GeLU. We use GeLU in our example unit test. The input is from step 1.
3. Row-wise sharding of linear weight. Perform the linear op for the second layer. The input is from step 2.
We then save the communication needed to concatenate the col-wise sharding results and to spread the input to different ranks for row-wise sharding.
The changes are as follows:
1. Return a ShardedTensor for the col-wise sharding in the sharded_linear op.
2. Return a PartialTensor for the row-wise sharding in the sharded_linear op.
3. Leverage APIs already defined for `reshard` to merge/aggregate local results to a fully sync local result if needed.
4. Add a helper function to create a sharded tensor based on the local result.
5. Add a unit test to test the Megatron-LM idea mentioned above and compare it with local ops, including the grad and optimizer, so that we can ensure the correctness of the implementation.
6. Refactor the unit test of sharded linear to reflect the changes in the code.
ghstack-source-id: 148273049
Test Plan: Unit test + CI
Reviewed By: pritamdamania87
Differential Revision: D32978221
fbshipit-source-id: 565fc92e7807e19d53b0261f8ace3945bef69e3e
(cherry picked from commit 344abe7520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70079
We defined a new concept named `PartialTensor`, which is an abstraction to represent Tensors that need aggregation across multiple devices and multiple processes.
We also defined an API `reshard_output` to reshard a `PartialTensor` to a `Tensor`, or reshard a `ShardedTensor` to a `ShardedTensor`/`Tensor`. This is done via the class `ModuleResharder`, which acts as a wrapper around the original module plus a reshard in the final step.
The `reshard` logic is defined in each class (`ShardedTensor` and `PartialTensor`).
ghstack-source-id: 148273050
Test Plan: Unit test is in the next PR.
Reviewed By: pritamdamania87
Differential Revision: D33121037
fbshipit-source-id: 5f56617ea526b857c5b73df6e069697d428ec359
(cherry picked from commit 58b1457cbc)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72141
We have many sharding components currently:
torch.distributed._sharded_tensor, torch.distributed._sharding_spec,
torch.distributed._sharded_optimizer and more coming.
As a result, we are organizing all of this under the `torch.distributed._shard`
package. For BC reasons, I'm still keeping the old packages and having them just
reference the new package.
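The BC part can be pictured as a thin re-export shim in each old package; a minimal sketch (module path and warning text are illustrative, not the actual code):
```
# torch/distributed/_sharded_tensor/__init__.py -- illustrative BC shim:
# the old package just re-exports the new location and warns on import.
import warnings

from torch.distributed._shard.sharded_tensor import *  # noqa: F401,F403

warnings.warn(
    "torch.distributed._sharded_tensor has moved to "
    "torch.distributed._shard.sharded_tensor; please update your imports.",
    DeprecationWarning,
)
```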
ghstack-source-id: 148150861
Test Plan: waitforbuildbot
Reviewed By: fduwjj
Differential Revision: D33904585
fbshipit-source-id: 057e847eb7521b536a3ee4e0f94871aacc752062
(cherry picked from commit 29a70dd7af)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71742
We have many sharding components currently:
torch.distributed._sharded_tensor, torch.distributed._sharding_spec,
torch.distributed._sharded_optimizer and more coming.
As a result, we are organizing all of this under the `torch.distributed.shard`
package. For BC reasons, I'm still keeping the old packages and having them just
reference the new package.
ghstack-source-id: 147899768
Test Plan: waitforbuildbot
Reviewed By: fduwjj, wanchaol
Differential Revision: D33755913
fbshipit-source-id: dc692b31e2607063d55dfcb3db33ec53961d5a5b
(cherry picked from commit 5b6885f358)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70145
Added support for torch.equal to ShardedTensor. This is really
helpful in terms of comparing two ShardedTensors.
ghstack-source-id: 146066939
Test Plan: waitforbuildbot
Reviewed By: wanchaol
Differential Revision: D33201714
fbshipit-source-id: 56adfc36e345d512c9901c56c07759bf658c745b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69734
Added support for `torch.equal` to ShardedTensor. This is really
helpful in terms of comparing two ShardedTensors.
Will implement `allclose` in a follow-up PR.
ghstack-source-id: 145301451
Test Plan: waitforbuildbot
Reviewed By: fduwjj, wanchaol
Differential Revision: D33004315
fbshipit-source-id: 786fe26baf82e1bb4fecfdbfc9ad4b64e704877f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68607
This PR added ShardedOptimizer and an API to get module parameters along with ShardedTensor params; it allows the user to use this optimizer wrapper to construct an optimizer that involves ShardedTensors.
state_dict support will come in a follow-up diff.
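A minimal sketch of the wrapper described above, using the helper/wrapper names this work later exposed (`named_params_with_sharded_tensor`, `ShardedOptimizer`); the names and exact signatures are assumptions here:
```
# Sketch only: names/signatures are assumptions based on later versions of this API.
import torch
from torch.distributed._shard.sharded_optim import (
    ShardedOptimizer,
    named_params_with_sharded_tensor,
)

model = torch.nn.Linear(4, 4)  # stands in for a module whose params may include ShardedTensors

# Collect regular parameters and ShardedTensor parameters together, then wrap
# any torch.optim optimizer class around them.
optim = ShardedOptimizer(
    dict(named_params_with_sharded_tensor(model)),
    torch.optim.SGD,
    lr=0.1,
)

optim.zero_grad()
model(torch.randn(2, 4)).sum().backward()
optim.step()
```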
ghstack-source-id: 145532834
Test Plan: python test_sharded_optim.py
Reviewed By: pritamdamania87
Differential Revision: D32539994
fbshipit-source-id: a3313c6870d1f1817fc3e08dc2fc27dc43bef743
Summary:
The error message was changed following a PR comment. And since the test doesn't run on CI, I forgot to update the test to catch the new error message.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69565
Reviewed By: mrshenli
Differential Revision: D32932982
Pulled By: albanD
fbshipit-source-id: a1da72b0ca735e72b481bc944039233094f1c422
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68822
Per title, we switched over c10d_gloo and nccl and the results look good
so far, so switch the rest of them as well. After this, the only dist tests that
won't run in a subprocess are the pipe and fsdp tests, which historically haven't had
much flakiness.
ghstack-source-id: 144213522
Test Plan: CI
Reviewed By: H-Huang
Differential Revision: D32624330
fbshipit-source-id: 469f613e5b0e4529e6b23ef259d948837d4af26b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68821
Continuing effort to move most distributed tests to run in subprocess
for better reproducibility + reduce flakiness.
ghstack-source-id: 144213520
Test Plan: CI
Reviewed By: H-Huang
Differential Revision: D32624199
fbshipit-source-id: 04448636320554d7a3ab29ae92bc1ca9fbe37da2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68504
Per title
ghstack-source-id: 143928767
Test Plan: CI
Reviewed By: H-Huang
Differential Revision: D32485100
fbshipit-source-id: a55687aea4af69e3830aee6f0278550c72f142c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68503
Per title
ghstack-source-id: 143928768
Test Plan: CI
Reviewed By: H-Huang
Differential Revision: D32484990
fbshipit-source-id: 6682f46256af0da5153e5087a91a7044156dd17f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67400
c10d/frontend.cpp was originally proposed to introduce a pure C++ API and use TorchBind to share the Python-level API with TorchScript. This is no longer needed, so delete it to reduce code redundancy.
ghstack-source-id: 143910066
Test Plan: wait for ci
Reviewed By: navahgar
Differential Revision: D31979270
fbshipit-source-id: 6ceb8b53d67ab8f9aef44b34da79346dfbb51225
Summary:
Context: https://github.com/pytorch/pytorch/issues/67061
Use `run_test.py`'s provided flag `"--subprocess"`, passed in like `extra_unittest_args=["--subprocess"]` when running test_distributed_spawn. This will ensure that each test is run separately in its own process. The goal is to more closely simulate how a developer would run a single test when reproducing a CI failure and make reproducibility easier in general.
Also, when a test fails, print out the exact command that was issued so the developer knows how to reproduce it.
For example, when a test fails, it will print out something like the following to the logs:
```
Test exited with non-zero exitcode 1. Command to reproduce: BACKEND=gloo WORLD_SIZE=3 /fsx/users/rvarm1/conda/envs/pytorch/bin/python distributed/test_distributed_spawn.py -v TestDistBackendWithSpawn.test_Backend_enum_class
```
Running test_distributed_spawn still uses the same command as before:
`
python test/run_test.py --verbose -i distributed/test_distributed_spawn
`
as seen in [distributed contributing](https://github.com/pytorch/pytorch/blob/master/torch/distributed/CONTRIBUTING.md) guide.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67901
Reviewed By: cbalioglu, mruberry
Differential Revision: D32225172
Pulled By: rohan-varma
fbshipit-source-id: 7e8d4c7a41858044bd2a4e0d1f0bf8f1ac671d67
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66101
Updated description:
This PR tests the functionalization pass in python in two ways. For each of the test programs that I have in `test_functionalization.py`, it:
- runs the program with and without functionalization, and asserts the outputs and (potentially mutated) inputs are equal in both cases
- runs the program with `LoggingTensor`, and uses expecttests on the resulting graph. I manually confirm that the graphs look reasonable and only contain functional ops.
Mechanically, the changes include:
- factoring out `LoggingTensor` into a testing util so it can be re-used in multiple tests
- adding some private python api's in the `torch` namespace as hooks that I can use during testing
In the original version of this PR, I also added some fixes to the `_make_subclass()` function in python: allowing you to pass in strides and storage_offset. I kept them in mainly because the changes were already there.
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D31942095
Pulled By: bdhirsh
fbshipit-source-id: 90ff4c88d461089704922e779571eee09c21d707
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67188
This diff/PR is trying to implement the ShardedEmbeddingBag using the ShardedTensor.
We support both row-wise and column-wise sharding of the embedding bag. The detailed logic can be found in the comment.
Several caveats:
1. Only the sharding of one weight is supported for now.
2. We support limited input params for the op. Support for more params is on the way.
3. We only support chunk sharding for now.
4. We only support a single local shard per rank for now.
Some other changes include:
1. Refactor the ShardedEmbedding code so that the common logic can be reused.
2. Fix tiny typos and a corner case in the API `get_chunked_dim_size`, where it would return -1 if we set dim_size = 5, split_size = 2, idx = 3. (This is a valid case because when chunks = 4 and dim_size = 5, then split_size = 2.)
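The corner case in point 2 can be made concrete with a small sketch; the clamped formula mirrors the described fix, and the helper here is a stand-alone illustration rather than the library code:
```
def get_chunked_dim_size(dim_size: int, split_size: int, idx: int) -> int:
    # Size of the idx-th chunk when a dimension of dim_size is split into
    # chunks of split_size; clamped at 0 so trailing empty chunks never go negative.
    return max(min(split_size, dim_size - split_size * idx), 0)

# dim_size=5, split_size=2 -> 4 chunks of sizes [2, 2, 1, 0];
# the unclamped formula returned -1 for idx=3.
assert [get_chunked_dim_size(5, 2, i) for i in range(4)] == [2, 2, 1, 0]
```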
ghstack-source-id: 142325915
Test Plan: Unit test and CI
Reviewed By: pritamdamania87
Differential Revision: D31749458
fbshipit-source-id: ed77e05e4ec94ef1a01b1feda8bbf32dc5d5da1b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63997
Use torch_function to extend torch.nn.init.uniform_
The init is done in SPMD fashion. Note that ideally we want to aggregate sharded tensors into a global tensor, init it, and reshard. It's fine to run it SPMD since uniform init is i.i.d. (independent and identically distributed).
Also enable the unit test in test_linear.py for the OSS test.
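A generic toy sketch of the `__torch_function__` mechanism used here; the class below is a stand-in, not the real ShardedTensor, and only shows how `torch.nn.init.uniform_` can be intercepted and applied to the local shard in SPMD fashion:
```
import torch

class ToyShardedTensor:
    """Toy stand-in that only holds a 'local shard'."""

    def __init__(self, local: torch.Tensor):
        self.local_shard = local

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = dict(kwargs or {})
        if func is torch.nn.init.uniform_:
            # The tensor may arrive positionally or as a keyword argument.
            target = args[0] if args else kwargs.pop("tensor")
            # Init only the local shard: since uniform_ is i.i.d., doing this
            # independently on every rank matches init-then-reshard.
            torch.nn.init.uniform_(target.local_shard, **kwargs)
            return target
        return NotImplemented

t = ToyShardedTensor(torch.empty(4))
torch.nn.init.uniform_(t, a=0.0, b=1.0)  # dispatched via __torch_function__
print(t.local_shard)
```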
Test Plan:
a) Unit Test
(pytorch) ... $ python test/distributed/_sharded_tensor/ops/test_init.py TestShardedTensorNNInit --v
(pytorch) ... $ python test/distributed/_sharded_tensor/ops/test_linear.py --v (before this change, running this command was a no-op)
or b) Manual run: Instruction here: https://docs.google.com/document/d/1_m1Hdo5w51-hhPlZ_F8Y6PIWrN7UgJZqiSpARYvhsaE/edit#
Imported from OSS
Reviewed By: pritamdamania87, anjali411
Differential Revision: D30563017
fbshipit-source-id: d1859f7682235bcb44515efc69ca92bc5e34fce1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66604
This diff/PR is trying to implement the ShardedEmbedding using the ShardedTensor.
Several caveats:
1. We support limited input params for the op. Support for more params is on the way.
2. We only support chunk sharding for now.
3. We only support a single local shard per rank for now.
ghstack-source-id: 141056130
Test Plan: Unit test and CI
Reviewed By: pritamdamania87
Differential Revision: D31544556
fbshipit-source-id: cc867dcba8c11e6f4c7c3722488908f5108cc67f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63881
This PR includes the minimal set of features to make FSDP work, like sharding, core data flow, and hooks. More tests will be added in follow-up PRs. Tests are refactored to utilize common PyTorch utils. The code is also refactored a little bit. Alternative ways to replace ".data" usage in this PR are still being discussed offline.
Test Plan: unit tests
Reviewed By: mrshenli
Differential Revision: D30521673
fbshipit-source-id: 9a23390dd7c925749604c6860e08fbe39ddc5500
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64128
This PR implements a sharded nn.Linear layer using ShardedTensors with
the following limitations:
1) Works only for ChunkShardingSpec.
2) Implementation is only aimed to demonstrate functionality and is most likely
not performant at all.
The PR also introduces a `shard_parameter` API to easily shard parameters of
`nn.Modules`. This also has the following limitations:
1) Works only for ChunkShardingSpec.
2) Is not performant since it uses broadcast instead of scatter, because
ProcessGroupNCCL doesn't yet support scatter.
Overall user API for running a sharded linear would be something like this:
```
# SPMD programming paradigm running same code on all nodes.
fc = nn.Linear(10, 10)
# Setup sharding.
sharding_spec=ChunkShardingSpec(...)
shard_parameter(fc, 'weight', sharding_spec, src_rank=0)
# Run as a normal linear layer.
inp = torch.rand(10, 10)
output = fc(inp)
```
ghstack-source-id: 138500985
Test Plan:
1) unit tests.
2) waitforbuildbot
Reviewed By: wanchaol, bowangbj
Differential Revision: D30621215
fbshipit-source-id: 1aa7478568c18a4572f6c3462fdf24a4cbde01d6
Summary:
There were several reports of target determinator incorrectly skipping
tests, most recent one is https://github.com/pytorch/pytorch/issues/64902
Let's disable it until it could be further stabilized
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64921
Reviewed By: seemethere, janeyx99
Differential Revision: D30901186
Pulled By: malfet
fbshipit-source-id: 531afd2d390c6b51f727330d5dd1882d70b6fdde
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64253
Follow up to D30496178 (f4aff3a346) to move the rest of distributed tests to their own jobs for Linux GHA.
ghstack-source-id: 137233785
Test Plan: CI
Reviewed By: walterddr
Differential Revision: D30662999
fbshipit-source-id: f7cfbc0d1223aca52120f17f9da987d70fda8de6
Summary:
Introduce a `discover_tests` function that globs for all Python files
starting with `test_` in the test folder, excluding subfolders that are
executed differently.
Fixes https://github.com/pytorch/pytorch/issues/64178
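A rough sketch of the idea (the exclusion list below is illustrative, not the actual blocklist):
```
from pathlib import Path
from typing import List

def discover_tests(test_dir: Path = Path("test"),
                   blocklisted_dirs=("cpp", "onnx", "package")) -> List[str]:
    """Glob for Python files starting with 'test_' under the test folder,
    skipping subfolders that are executed differently."""
    tests = []
    for path in test_dir.rglob("test_*.py"):
        rel = path.relative_to(test_dir)
        if rel.parts[0] in blocklisted_dirs:
            continue
        tests.append(str(rel.with_suffix("")))  # e.g. "distributed/test_store"
    return sorted(tests)

if __name__ == "__main__":
    print(discover_tests())
```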
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64246
Reviewed By: walterddr, seemethere
Differential Revision: D30661652
Pulled By: malfet
fbshipit-source-id: a52e78ec717b6846add267579dd8d9ae75326bf9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64197
Removes this line as test is gone.
ghstack-source-id: 136986275
Test Plan: CI
Reviewed By: walterddr
Differential Revision: D30642929
fbshipit-source-id: a0c7dfdfb35a4a7f7ec1b881dbea53d85136012c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63534
In this PR:
* We have changed the default dtype of `AutocastCPU` from `float16` to `bfloat16` as discussed here `https://github.com/pytorch/pytorch/pull/61002`
* We also update the list of operations that need casting to `lower_precision_fp` or `float32`.
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D30644914
Pulled By: ezyang
fbshipit-source-id: 8b93485ba452b3759611e3f0ac88e920fe495ac1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63496
This PR adds a (private) enable_python_mode context manager.
(see torch/utils/_python_dispatch.py).
enable_python_mode accepts the type of a __torch_dispatch__ object
as its argument. Whenever an operator gets called inside of the
context manager, it dispatches to the __torch_dispatch__ of
the passed-in type.
Example usage:
```
with enable_python_mode(LoggingTensor):
z = torch.empty([])
assert isinstance(z, LoggingTensor)
```
There are quite a few changes that were made to support this.
First, we added TorchDispatchTypeObject, a C++ struct that represents the
type of a `__torch_dispatch__` object (e.g. LoggingTensor).
It holds both the PyObject* representing the class and a PyInterpreter*
so we know which Python interpreter it came from.
Next, we updated the concrete_dispatch_fn in python_variable.cpp to accept
a `const std::shared_ptr<TorchDispatchTypeObject>&` argument. When this
is null, dispatching happens as usual. When it is non-null, we prepend
the TorchDispatchTypeObject's PyObject* to the overloaded args list so that
it is considered first for dispatch.
To get that to work, we changed how `handle_torch_dispatch_no_python_arg_parser`
works. The "overloaded args list" previously only consisted of Tensor PyObjects,
but now it can have types in addition to Tensors!
- We renamed `append_overloaded_arg` to `append_overloaded_arg`
- We added a new `append_overloaded_type` that appends a type to
overloaded_args
- We added special handling in `handle_torch_dispatch_no_python_arg_parser`
and `append_overloaded_arg` to handle types in addition to Tensors.
Then, there is PythonMode and PythonModeTLS.
- We reuse the DispatchKey::Python dispatch key as a mode key
- We use PythonMode::enter and PythonMode::exit to enable/disable
DispatchKey::Python and set the PythonModeTLS.
- PythonModeTLS stores a TorchDispatchTypeObject as metadata.
- PythonMode is in libtorch_python, and PythonModeTLS is in ATen.
This split is due to the libtorch_python library boundary (because we need
to save TLS in ATen/ThreadLocalState)
- We modify the PythonFallbackKernel to look up
the relevant TorchDispatchTypeObject (if Python Mode is active) and
dispatch using it.
There are two more miscellaneous changes:
- internal_new_from_data (torch/csrc/utils/tensor_new.cpp) gets an
exclude guard. enable_python_mode currently does not handle
torch.tensor and the exclude guard is to prevent a bug.
Future:
- This PR does not allow for the nesting of Python modes. In the future we
should be able to enable this with a more sane no_dispatch API and by changing
the TLS to a stack. For now I did not need this for CompositeImplicitAutograd testing.
Test Plan: - new tests
Reviewed By: malfet, albanD
Differential Revision: D30543236
Pulled By: zou3519
fbshipit-source-id: ef5444d96a5a957d1657b7e37dce80f9a497d452
Summary:
This is in response to a feature request from some folks in the core team to have a local command that would only run relevant "core" tests. The idea is to have a local smoke test option for developers to run locally before making a PR in order to verify their changes did not break core functionality. These smoke tests are not targeted to be short but rather relevant.
This PR enables that by allowing developers to run `python test/run_test.py --core` or `python test/run_test.py -core` in order to run the CORE_TEST_LIST, which is currently test_nn.py, test_torch.py, and test_ops.py.
I am not the best person to judge what should be considered "core", so please comment which tests should be included and/or excluded from the CORE_TEST_LIST!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63976
Test Plan:
```
(pytorch) janeyx@janeyx-mbp test % python run_test.py --core -v
Selected tests: test_nn, test_ops, test_torch
Running test_nn ... [2021-08-25 14:48:28.865078]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', 'test_nn.py', '-v'] ... [2021-08-25 14:48:28.865123]
test_to (__main__.PackedSequenceTest) ... ok
test_to_memory_format (__main__.PackedSequenceTest) ... ok
```
Reviewed By: walterddr
Differential Revision: D30575560
Pulled By: janeyx99
fbshipit-source-id: 3f151982c1e315e50e60cb0d818adaea34556a04
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63809
This moves out the modulefinder determinator to `tools/testing` since it is supposed to be CI-only. This also simplifies run_test.py a little bit.
Test Plan: Imported from OSS
Reviewed By: malfet, seemethere, janeyx99
Differential Revision: D30497438
Pulled By: driazati
fbshipit-source-id: 1d203037af5af6a20c1e7812da935e7cbb5cd82f
Summary:
Currently distributed tests are mixed within test_python.
We would like to split the distributed tests into their own batch, thus we need to split them out.
Adding an option to include/exclude distributed tests with CUSTOM_HANDLERS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63147
Test Plan:
- locally run with the additional run_test.py options.
- CI
Dependency: found a bug in mpiexec test and need https://github.com/pytorch/pytorch/issues/63580 to fix it first.
Reviewed By: bdhirsh
Differential Revision: D30496178
Pulled By: walterddr
fbshipit-source-id: 7903a57b619f2425028028f944211938823918a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63357
Adds the ability to set CONTINUE_THROUGH_ERROR as an environment
variable so that we can easily set it without having to add the flag
directly
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: astaff
Differential Revision: D30351108
Pulled By: seemethere
fbshipit-source-id: 767fa9bd24e1399f359eb24d16f6cc985a2d7173
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63054
1) Ensure these tests are skipped in environments without any GPUs.
2) Add the test to run_test.py
ghstack-source-id: 135595698
Test Plan: waitforbuildbot
Reviewed By: wanchaol
Differential Revision: D30239159
fbshipit-source-id: 21b543ba72e8d10182bc77e7ae1fd34fd4096509
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62937
Reland due to a Windows + CUDA failure; fix by running it on gloo on Windows even with CUDA.
ghstack-source-id: 135306176
Test Plan: ci
Reviewed By: mrshenli
Differential Revision: D30177734
fbshipit-source-id: 7625746984c8f858648c1b3632394b98bd4518d2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62774
Gates DistributedOptimizer, which relies on RRef, based on whether RPC is available. This should enable ZeRO to work on Windows, as Windows will not try to import the DistributedOptimizer. If this works as expected, we can enable the Windows tests for the functional/local SGD optimizers as well.
ghstack-source-id: 135216642
Test Plan: CI
Reviewed By: pbelevich
Differential Revision: D30117838
fbshipit-source-id: e6365a910a3d1ca40d95fa6777a7019c561957db
Summary:
This PR contains the initial version of `ModuleInfo` for use in testing modules. The design philosophy taken here is to start small and simple and build out / refactor as needed when more test coverage or `ModuleInfo` entries are added. As such, it's not intended for general usage yet. The PR contains the following:
* (new file) `torch/testing/_internal/common_modules.py`
* `ModuleInfo` definition - metadata for each module to use in testing
* `module_db` - the actual `ModuleInfo` database; currently contains entries for two modules
* `ModuleInput` - analogous to `SampleInput` from OpInfo; contains `FunctionInput`s for both constructor and forward pass inputs
* Constructor and forward pass inputs are tied together within a `ModuleInput` because they are likely correlated
* `FunctionInput` - just contains args and kwargs to pass to a function (is there a nicer way to do this?)
* `modules` decorator - analogous to `ops`; specifies a set of modules to run a test over
* Some constants used to keep track of all modules under torch.nn:
* `MODULE_NAMESPACES` - list of all namespaces containing modules
* `MODULE_CLASSES` - list of all module class objects
* `MODULE_CLASS_NAMES` - dict from module class object to nice name (e.g. torch.nn.Linear -> "nn.Linear")
* (new file) `test/test_modules.py`
* Uses the above to define tests over modules
* Currently, there is one test for demonstration, `test_forward`, which instantiates a module, runs its forward pass, and compares it to a reference, if one is defined
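A toy re-implementation sketching this data model (simplified stand-ins, not the actual `torch.testing._internal.common_modules` definitions):
```
from dataclasses import dataclass, field
from typing import List

import torch
import torch.nn as nn

@dataclass
class FunctionInput:
    args: tuple = ()
    kwargs: dict = field(default_factory=dict)

@dataclass
class ModuleInput:
    # Constructor and forward inputs are tied together because they correlate.
    constructor_input: FunctionInput
    forward_input: FunctionInput

@dataclass
class ModuleInfo:
    module_cls: type
    module_inputs: List[ModuleInput]

module_db = [
    ModuleInfo(nn.Linear,
               [ModuleInput(FunctionInput((4, 2)), FunctionInput((torch.randn(3, 4),)))]),
]

# What a test_forward-style check might do with the database:
for info in module_db:
    for mi in info.module_inputs:
        m = info.module_cls(*mi.constructor_input.args, **mi.constructor_input.kwargs)
        out = m(*mi.forward_input.args, **mi.forward_input.kwargs)
        assert out.shape == (3, 2)
```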
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61935
Reviewed By: mruberry
Differential Revision: D29881832
Pulled By: jbschlosser
fbshipit-source-id: cc05c7d85f190a3aa42d55d4c8b01847d1efd57f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61756
DDP will support running the optimizer as a communication hook with
optimizers that support a per-parameter/gradient step function `step_param`.
Add parity tests as we implement more optimizers that support `step_param` to
ensure parity with regular optimizers.
ghstack-source-id: 134330378
Test Plan: Ci
Reviewed By: SciPioneer
Differential Revision: D29727549
fbshipit-source-id: 18977c896f12b8e478298488b298fd107affcf5f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59077
Fixes #58549
`from_buffer` constructs a tensor object from an already allocated buffer through
CPython's buffer protocol. Besides the standard `dtype`, `count`, and `offset` parameters,
this function also accepts:
- `device`: where the buffer lives
- `requires_grad`: should autograd record operations on the new tensor
A new test file _test_buffer_protocol.py_ was created. Currently, only CPU tests were
implemented. That's because neither PyTorch nor Numba implements CPython's buffer
protocol. Therefore, there's no way to create a CUDA buffer with the existing
dependencies (could use PyCUDA for that, though).
At the moment, if `device` differs from the device the buffer actually lives, two things
may happen:
- `RuntimeError`, if `device='cuda'`
- Segmentation fault (not tested -- see above), if `device='cpu'`
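A small CPU usage sketch, assuming the constructor is exposed as `torch.frombuffer` (the name this functionality shipped under); `count` is in elements and `offset` is in bytes:
```
import array
import torch

buf = array.array("f", [1.0, 2.0, 3.0, 4.0])  # any object exposing the buffer protocol

# Read 2 float32 elements, skipping the first 4 bytes (one float element).
t = torch.frombuffer(buf, dtype=torch.float32, count=2, offset=4)
print(t)       # tensor([2., 3.])

buf[1] = 10.0  # the tensor shares memory with the buffer -- no copy is made
print(t[0])    # tensor(10.)
```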
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D29870914
Pulled By: mruberry
fbshipit-source-id: 9fa8611aeffedfe39c9af74558178157a11326bb
Summary:
and into tools/ folder
Currently run_test.py invokes tools/test_selections.py to:
1. download and analyze which test files to run
2. download and parse S3 stats and pass the info to local files.
3. common_utils.py uses the downloaded S3 stats to determine which test cases to run.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61479
Reviewed By: janeyx99
Differential Revision: D29661986
Pulled By: walterddr
fbshipit-source-id: bebd8c474bcc2444e135bfd2fa4bdd1eefafe595
Summary:
run_test.py currently does lots of downloading and test file/suite/case parsing. It doesn't work well outside of the CI environment.
Restructured run_test.py and created tools/test/test_selections.py, moving all test selection logic (reordering, categorizing slow tests, creating shards) there.
Follow-up PRs should:
- refactor the file read/write logic entangled inside test_selections.py into the stats/ folder
- restructure and add network-independent test logic to test_test_selections.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61124
Test Plan:
- tools/test
- CI
Related PR:
This follows the refactoring example in: https://github.com/pytorch/pytorch/issues/60373
Reviewed By: malfet
Differential Revision: D29558981
Pulled By: walterddr
fbshipit-source-id: 7f0fd9b4720a918d82918766c002295e8df04169
Summary:
Changes including:
- introduced `linter/`, `testing/`, `stats/` folders in `tools/`
- move appropriate scripts into these folders
- change grepped references in the pytorch/pytorch repo
Next step
- introduce `build/` folder for build scripts
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60473
Test Plan:
- CI (this is important because pytorch/test-infra also relies on some script references).
- tools/tests/
Reviewed By: albanD
Differential Revision: D29352716
Pulled By: walterddr
fbshipit-source-id: bad40b5ce130b35dfd9e59b8af34f9025f3285fd
Summary:
This PR is a first step in unifying our environment variables across CI (so that we don't have `CIRCLE_BLAH` in our GHA workflows, for example), though I'd like for this PR to be more for discussion about how best to consolidate these variables.
This small change only changes most CIRCLE_JOB references in our code to be JOB_BASE_NAME, as that seems the closest GHA (and ROCm) equivalent. Currently, JOB_BASE_NAME is defined as:
- in Circle: CIRCLE_JOB (name of the job, like `pytorch_linux_bionic_py3_8_gcc9_coverage_test1`)
- in GHA: the build_environment with a `-build` or `-test` tacked to the end, e.g., `pytorch-linux-xenial-cuda10.2-cudnn7-py3.6-gcc7-test`
- in ROCm: I don't actually know, but it's important for ROCm test sharding as shown in https://github.com/pytorch/pytorch/pull/60409
I am not sure if this is the intention for JOB_BASE_NAME so it is open to discussion what variable we should use if not JOB_BASE_NAME. I also don't know if it's worth the effort consolidating all these variables, so discussion is also highly encouraged there!
Next steps:
- Consolidate more CIRCLE_* references, maybe into CI_* equivalents?
- We use BUILD_ENVIRONMENT everywhere in Circle though the variable is inconsistent across binary vs CI jobs and across platforms. For example, for linux tests and builds, BUILD_ENVIRONMENT contains the `_test` and `_build` suffixes, but the windows jobs don't. In GHA, BUILD_ENVIRONMENT is similar to how it's defined in windows jobs on Circle. This inconsistency is confusing, and we can probably do something about it. I'm thinking of switching out BUILD_ENVIRONMENT for JOB_BASE_NAME in our test scripts where we actually mean JOB_BASE_NAME.
- We should probably document the meaning of the variables we consolidate somewhere, preferably in a README in some unified `ci/` folder. For example, it seems BUILD_ENVIRONMENT is supposed to capture the build environment, whereas JOB_BASE_NAME is supposed to capture the environment _and_ whether we're building or testing.
Notes:
- I did not replace CIRCLE_JOB references in third_party directories
- Previously, print_test_stats reported CIRCLE_JOB as only the build environment for GHA workflows, and I think tacking on the `build` or `test` will not harm anything, though I may be wrong.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60425
Reviewed By: seemethere, samestep
Differential Revision: D29333882
Pulled By: janeyx99
fbshipit-source-id: a82080e6205a03a1183035011ce59698eca06748
Summary:
Adding windows CUDA smoke tests on PRs (master should run the full suite).
Next step:
- Automate data update so we get a new smoke test list without manual effort
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59686
Test Plan: https://github.com/pytorch/pytorch/actions/runs/958296267 The sharded smoke tests still take a long time because of dependency installation.
Reviewed By: walterddr
Differential Revision: D29243533
Pulled By: janeyx99
fbshipit-source-id: dde7ba127fa15c95bda0e833cc5311598fb85e2b
Summary:
This is branched off of https://github.com/pytorch/pytorch/issues/59970 to only shard on Linux so far (we're running into issues with Windows gflags).
This would enable sharding of tests on a few Linux jobs on GHA, allowing tts to be essentially halved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60124
Reviewed By: zou3519
Differential Revision: D29204211
Pulled By: janeyx99
fbshipit-source-id: 1cc31d1eccd564d96e2aef14c0acae96a3f0fcd0
Summary:
Currently S3 test stats don't support PR stats parsing.
Changes to s3_stats_parser:
1. they are uploaded to `test_times/{sha1}/{job}` and `pr_test_times/{pr}/{sha1}/{job}` separately. Thus we need parsing logic for both
2. need to attach a time for PR stats parsing for ordering, since PR commits can be force-pushed
Changes to run_test.py
1. Reordering based on previous PR stats if available
2. Falling back to file change option if not enabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60026
Test Plan:
- CI.
- local repro: plz run:
```
CIRCLE_JOB="pytorch_linux_bionic_py3_6_clang9_noarch_test" CIRCLE_PR_NUMBER=60057 IN_CI=1 ENABLE_PR_HISTORY_REORDERING=1 python test/run_test.py
```
Reviewed By: samestep
Differential Revision: D29164754
Pulled By: walterddr
fbshipit-source-id: 206688e0fb0b78d1c9042c07243da1fbf88a924b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59840
moving these tests to their own standalone file. No meaningful code changes.
ghstack-source-id: 131359162
Test Plan: CI
Reviewed By: cbalioglu
Differential Revision: D29012664
fbshipit-source-id: 348870016509a6ed7e69240fa82bccef4a12d674
Summary:
Instead of having specific logic to handle run-specific-test-cases, we provide the flag to override include or bring-to-front with the SPECIFIED_TEST_CASES_FILE.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59704
Reviewed By: janeyx99
Differential Revision: D29038425
Pulled By: walterddr
fbshipit-source-id: 803d3555813437c7f287a22f7704106b0c609919
Summary:
Do not reorder tests unless running in CI (IN_CI); reordering makes local development test ordering nondeterministic. Most of us branch out from viable/strict, not the head of master.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59565
Reviewed By: ejguan
Differential Revision: D28943906
Pulled By: walterddr
fbshipit-source-id: e742e7ce4b3fc017d7563b01e93c4cd774d0a537
Summary:
The run-specified-test-cases option would allow us to specify a list of test cases to run by having a CSV with minimally two columns: test_filename and test_case_name.
This PR also adds .json to some files we use for better clarity.
Usage:
`python test/run_test.py --run-specified-test-cases <csv_file>` where the csv file can look like:
```
test_filename,test_case_name,test_total_time,windows_only_failure_sha_count,total_sha_count,windows_failure_count,linux_failure_count,windows_total_count,linux_total_count
test_cuda,test_cudnn_multiple_threads_same_device,8068.8409659525,46,3768,53,0,2181,6750
test_utils,test_load_standalone,8308.8062920459,14,4630,65,0,2718,8729
test_ops,test_forward_mode_AD_acosh_cuda_complex128,91.652619369806,11,1971,26,1,1197,3825
test_ops,test_forward_mode_AD_acos_cuda_complex128,91.825633094915,11,1971,26,1,1197,3825
test_profiler,test_source,60.93786725749,9,4656,21,3,2742,8805
test_profiler,test_profiler_tracing,203.09352795241,9,4662,21,3,2737,8807
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59487
Test Plan:
Without specifying the option, everything should be as they were before.
Running `python test/run_test.py --run-specified-test-cases windows_smoke_tests.csv` resulted in this paste P420276949 (you can see internally). A snippet looks like:
```
(pytorch) janeyx@janeyx-mbp pytorch % python test/run_test.py --run-specified-test-cases windows_smoke_tests.csv
Loading specified test cases to run from windows_smoke_tests.csv.
Processed 28 test cases.
Running test_cpp_extensions_jit ... [2021-06-04 17:24:41.213644]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', 'test_cpp_extensions_jit.py', '-k', 'test_jit_cuda_archflags'] ... [2021-06-04 17:24:41.213781]
s
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK (skipped=1)
...
```
With pytest, an example executable would be:
`Running test_dataloader ... [2021-06-04 17:37:57.643039]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', '-m', 'pytest', 'test_dataloader.py', '-v', '-k', 'test_segfault or test_timeout'] ... [2021-06-04 17:37:57.643327]`
Reviewed By: samestep
Differential Revision: D28913223
Pulled By: janeyx99
fbshipit-source-id: 0d1f9910973426b8756815c697b483160517b127
Summary:
It would be most accurate if sharding occurred after all other changes to selected_tests were complete.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59583
Reviewed By: ejguan
Differential Revision: D28944737
Pulled By: janeyx99
fbshipit-source-id: a851473948a5ec942ffeeedeefdc645536a3d9f7
Summary:
Partially addresses https://github.com/pytorch/pytorch/issues/55340
**Overview**
This factors out `FileStoreTest`, `HashStoreTest`, `PrefixFileStoreTest`, `TCPStoreTest`, `PrefixTCPStoreTest`, `PythonStoreTest`, `RendezvousTest`, `RendezvousEnvTest`, `RendezvousFileTest`, and `RendezvousTCPTest` from `test_c10d_common.py` to a new file `test_store.py`.
Additionally, unused import/initialization statements are removed from `test_c10d_common.py`, and the minimal set of import/initialization statements are used for `test_store.py`.
Also, this changes `.jenkins/pytorch/multigpu-test.sh`, `.jenkins/pytorch/win-test-helpers/test_distributed.bat`, and `test/run_test.py` to include the new `test_store.py`.
**Testing**
All commands shown are run on an AI AWS cluster.
I check the Store tests:
```
python test/distributed/test_store.py
```
I also check `test_c10d_common.py` since it is the source of the refactored code. In addition, I check `test_c10d_nccl.py` and `test_c10d_gloo.py` since they import from `test_c10d_common.py`; those two should be the only test files depending on `test_c10d_common.py`.
```
python test/distributed/test_c10d_common.py
python test/distributed/test_c10d_nccl.py
python test/distributed/test_c10d_gloo.py
```
`test_c10d_gloo.py` produces warnings about how using sparse tensors in TorchScript is experimental, but the warnings do not result from this PR's changes.
**Testing Issues** (To Be Revisited)
```
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py
```
Running the above command fails three tests (written as `[Test]`: `[Error]`):
- `ProcessGroupGlooWrapperTest.test_collective_hang`: `RuntimeError: [../third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [10.200.24.101]:15580`
- `CommTest.test_broadcast_coalesced_gloo_cuda`: `RuntimeError: cuda runtime error (3) : initialization error at ../aten/src/THC/THCGeneral.cpp:54`
- `CommTest.test_sequence_num_incremented_gloo_default`: `RuntimeError: cuda runtime error (3) : initialization error at ../aten/src/THC/THCGeneral.cpp:54`
However, running each of the following yields no errors:
```
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py -k test_collective_hang
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py -k test_broadcast_coalesced_gloo_cuda
WORLD_SIZE=4 BACKEND=gloo gpurun pytest test/distributed/test_c10d_gloo.py -k test_sequence_num_incremented_gloo_default
```
This suggests the existence of some inadvertent state dependency between tests (e.g. improper cleanup). I have not explored this further yet. In particular, I do not have a solid understanding of the tests to be able to explain why using `pytest` and `gpurun` induces the failure (since notably, running the `.py` directly shows no issue).
Similarly, running the following yields 47 errors:
```
WORLD_SIZE=4 BACKEND=nccl gpurun pytest test/distributed/test_c10d_nccl.py
```
The errors seem to all be simply complaining about the usage of `fork()` instead of `spawn()` for CUDA multiprocessing. Though, most of the tests in `test_c10d_nccl.py` ask for at least 2 CUDA devices, so I think that the `gpurun` is warranted (assuming that the test file does not need to be run partially on different machines).
Both `test_c10d_common.py` and `test_store.py` work fine with `pytest`.
**Other Notes**
I noticed that `torch.distributed` is imported both as `dist` and as `c10d` and that `c10d` is used throughout the Store tests. I was curious if this is intentional (as opposed to using `dist` to refer to `torch.distributed`). Also, the original [issue](https://github.com/pytorch/pytorch/issues/55340) suggests that the Store tests do not use multiprocessing, but I saw that `torch.multiprocessing` is still used in `TCPStoreTest`.
The links for the Store files in the `CONTRIBUTING.md` [file](https://github.com/pytorch/pytorch/blob/master/torch/distributed/CONTRIBUTING.md) are broken. This can be fixed in a separate PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59271
Reviewed By: jbschlosser, mrshenli
Differential Revision: D28856920
Pulled By: andwgu
fbshipit-source-id: 630950cba18d34e6b5de661f5a748f2cddc1b446
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55728
Full design: https://github.com/pytorch/pytorch/issues/55207
This PR introduces ChunkShardingSpec (SingleShardingSpec in the design). We used
the name ChunkShardingSpec since it is very similar to `torch.chunk` in terms
of how a Tensor is split up, and it feels clearer than SingleShardingSpec.
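A minimal construction sketch; the import path and placement syntax follow later versions of this work and should be treated as assumptions:
```
from torch.distributed._shard.sharding_spec import ChunkShardingSpec

# Shard dim 0 into contiguous chunks, torch.chunk-style, with one placement
# (rank/device) per chunk.
spec = ChunkShardingSpec(
    dim=0,
    placements=[
        "rank:0/cuda:0",
        "rank:1/cuda:1",
    ],
)
```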
ghstack-source-id: 129603318
Test Plan: waitforbuildbot
Reviewed By: SciPioneer
Differential Revision: D27694108
fbshipit-source-id: c8764abe6a4d5fc56d023fda29b74b5af2a73b49
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58632.
Added several skips related to test asserts and MKL. Will address them in a separate PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58666
Reviewed By: seemethere, janeyx99
Differential Revision: D28607966
Pulled By: walterddr
fbshipit-source-id: 066d4afce2672e4026334528233e69f68da04965
Summary:
Some machines don't have a versionless `python` on their PATH, which breaks these existing shebangs.
I'm assuming that all the existing versionless `python` shebangs are meant to be `python3` and not `python2`; please let me know if my assumption was incorrect for any of these.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58275
Test Plan: CI.
Reviewed By: zhouzhuojie
Differential Revision: D28428143
Pulled By: samestep
fbshipit-source-id: 6562be3d12924db72a92a0207b060ef740f61ebf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56868
See __init__.py for a summary of the tool.
The following sections are present in this initial version
- Model Size. Show the total model size, as well as a breakdown by
stored files, compressed files, and zip overhead. (I expect this
breakdown to be a bit more useful once data.pkl is compressed.)
- Model Structure. This is basically the output of
`show_pickle(data.pkl)`, but as a hierarchical structure.
Some structures cause this view to crash right now, but it can be
improved incrementally.
- Zip Contents. This is basically the output of `zipinfo -l`.
- Code. This is the TorchScript code. It's integrated with a blame
window at the bottom, so you can click "Blame Code", then click a bit
of code to see where it came from (based on the debug_pkl). This
currently doesn't render properly if debug_pkl is missing or
incomplete.
- Extra files (JSON). JSON dumps of each json file under /extra/, up to
a size limit.
- Extra Pickles. For each .pkl file in the model, we safely unpickle it
with `show_pickle`, then render it with `pprint` and include it here
if the size is not too large. We aren't able to install the pprint
hack that thw show_pickle CLI uses, so we get one-line rendering for
custom objects, which is not very useful. Built-in types look fine,
though. In particular, bytecode.pkl seems to look fine (and we
hard-code that file to ignore the size limit).
I'm checking in the JS dependencies to avoid a network dependency at
runtime. They were retrieved from the following URLS, then passed
through a JS minifier:
https://unpkg.com/htm@3.0.4/dist/htm.module.js?module
https://unpkg.com/preact@10.5.13/dist/preact.module.js?module
Test Plan:
Manually ran on a few models I had lying around.
Mostly tested in Chrome, but I also poked around in Firefox.
Reviewed By: dhruvbird
Differential Revision: D28020849
Pulled By: dreiss
fbshipit-source-id: 421c30ed7ca55244e9fda1a03b8aab830466536d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56386
The diff resolves a bug around incorrect handler resolution:
_create_static_handler pointed towards etcd, and _create_etcd_handler pointed towards static.
Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed:test_launcher
Added test_launcher to the ci/cd tests
Reviewed By: cbalioglu
Differential Revision: D27858897
fbshipit-source-id: 440155789958c091ce5755e7c9524e4bb704203a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56039
Python will try to eagerly resolve the name references even if
the import failed. Quote them so that it doesn't.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: janeyx99
Differential Revision: D27770536
Pulled By: ezyang
fbshipit-source-id: b111739289498f9bab856fb9424f3080efee4ee0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55695
In order to be able to run CUDA tests on their own (e.g., to avoid running CPU tests on GPU machines).
Done by moving test methods to a separate class (and sometimes introducing a "common" base class for utils), and then providing new entry points inside a `cuda/` subdirectory.
Test Plan: Checked they are run on Sandcastle.
Reviewed By: mrshenli
Differential Revision: D27618198
fbshipit-source-id: 8f671657f79c8ae115748ab7752fe0066705893b
Summary:
1. move module related stuff to test_module_container
2. created test_types for types and annotation
3. created test_misc for the rest
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55560
Reviewed By: VitalyFedyunin
Differential Revision: D27650911
Pulled By: walterddr
fbshipit-source-id: d895a7da9e9c3d25a662a37faf4daabc276b9c1a
Summary:
Prettifies JSON files .pytorch-test-times and .pytorch-slow-tests so that not everything is on one single line.
This is of slightly more importance as generated .pytorch-slow-tests ends up getting stored in our test-infra repo ([example](ad9cd87565)), and it is nice to not have that lil red symbol at the end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55335
Reviewed By: samestep
Differential Revision: D27576930
Pulled By: janeyx99
fbshipit-source-id: be58565b8c8593a9bfcfab383ee19facc79f0572
Summary:
Moves more s3 parsing code to s3_stat_parser.py. This is another step in modularizing the parsing code more correctly. I will also be using this exact function in future slowTest code.
Also replaces some Any's in the code to be Report.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54808
Test Plan:
.pytorch-test-times generated before the code and after this code is the same.
CI should pass, specifically the test tools GHA.
Reviewed By: walterddr
Differential Revision: D27375783
Pulled By: janeyx99
fbshipit-source-id: bec28551668b2eb3fdd60d802200993e493eac83
Summary:
First step to move all S3-related operations into the S3 parser utils.
In the end we provide APIs from s3_stats_parser:
1. downloading data as reports and uploading data as reports
2. filtering by job name
and handle all compression and formatting inside.
TODO
- [ ] Refactor out upload into s3_stats_parser
- [ ] Remove all S3/BOTO related checkers and try/catch blocks outside of s3_stats_parser
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54681
Test Plan:
1. Running tools/test/* covers the refactoring logic (test_test_history.py and test_stats.py as entry points, both using the 2 new APIs in s3_stats_parser after the refactoring).
2. print_test_stats.py's main argparse entrypoint is covered by CI step Report Test Result step.
3. run `python test/run_test.py --export-past-test-times` before and after this PR should result in the same file content in .pytorch-test-times
Reviewed By: ailzhang
Differential Revision: D27346742
Pulled By: walterddr
fbshipit-source-id: fb40162e631e007fed9d5821fe4f190bda2cb52e
Summary:
Since `_test1`, `_test2`, `_build`, and `test` are all stripped, `slow_test` should be stripped as well. This way, the _slow_test stats will be considered as a part of all stats relating to a particular build job, though currently it doesn't do much because the jobs don't share a common stemmed name: the build has `_gcc7` while the slow_test CI job does not.
This makes me think...do we omit the `gcc7` intentionally? Are there other things I should strip, e.g., `multigpu_test`?
See:
ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_slow_test
ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test1
ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54528
Reviewed By: samestep
Differential Revision: D27270393
Pulled By: janeyx99
fbshipit-source-id: ffb7289cfe4dba52ded67f50a89f3e75e7bad68d
Summary:
Step 2 to fixing https://github.com/pytorch/pytorch/issues/53882 :)
This changes TARGET_DET_LIST and sharding automation by checking if there's already cached data from the commit in `.pytorch-test-times`. If not, it pulls data from S3 and updates the file to have the stats. This way, S3 pulling does not need to happen more than once for the same commit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54210
Test Plan:
the following methods should run the same set of tests.
First `export CIRCLE_JOB=pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2` or your favorite CIRCLE JOB.
1. Pull data first and use it:
Download the data from S3 and write it to the cache file with `python test/run_test.py --export-historic-test-times .pytorch-test-times`
Now run `python test/run_test.py --shard 1 10`
2. Make the sharding job pull data:
Delete the file you just created: `rm .pytorch-test-times`
Now run `python test/run_test.py --shard 1 10`
Reviewed By: walterddr
Differential Revision: D27136849
Pulled By: janeyx99
fbshipit-source-id: 51a42c4e2fa3f8cf15e682679dd3eb6130aad927
Summary:
This is an initial attempt at refactoring and consolidating our S3 read logic for print_test_stats.py, test_history.py, and run_test.py. This way, boto3 and botocore do not need to be imported in various places throughout the code base, and duplicated logic (such as the many type definitions) can exist in one place: `tools/stat_utils/s3_stat_parser.py`. walterddr contributed to this PR by moving print_test_stats.py to the tools folder and the corresponding tests to a subfolder within tools.
**NOTE: this removes those tests from CI as the new `tools/test/test_stats.py` is not in the test/ directory as the other tests in TESTS in run_test.py.**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53755
Test Plan:
This refactoring change should not break anything, so running the files as before should work as they did previously.
To make sure that print_test_stats.py still functions: run `python tools/test/test_stats.py` and make sure all tests pass.
To make sure that test_history.py works, run the example commands from `tools/test_history.py --help` and check that their output matches that shown. Note that the script will continue printing for a while, so don't be alarmed.
Some next steps:
- Actually coming up with similarities among the three current use cases and further refactoring/consolidating of functions (e.g., combining simplify and get_cases)
- Moving more parsing logic to s3_stat_parser.py to have better abstraction between our files
- Adding tests for s3_stat_parser.py when there is more functionality in it
Reviewed By: agolynski, samestep
Differential Revision: D27030285
Pulled By: janeyx99
fbshipit-source-id: e664781324ef7c0c30943bfd7f17c895075ef7a7
Summary:
This will allow for future work to use the test times file (which will save computation time and also allow for more consistency). (Step one to fixing https://github.com/pytorch/pytorch/issues/53882)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54083
Test Plan:
export CIRCLE_JOB=your-favorite-circleci-job e.g., pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2
`python test/run_test.py --export-historic-test-times` OR
`python test/run_test.py --export-historic-test-times .your-favorite-file`
When opening either .pytorch-test-times or .your-favorite-file, you should see something like:
```
{"commit": "2d559a09392aabb84dfb4a498010b2f01d99818c", "job_times": {"distributed/test_distributed_spawn": 583.5889999999973, "distributed/test_data_parallel": 4.866999999999997, "test_binary_ufuncs": 171.1569999999998, "test_numpy_interop": 2.5649999999999995, "test_public_bindings": 0.011,...}}
```
Note that no tests will be run when this option is specified.
Reviewed By: walterddr
Differential Revision: D27091351
Pulled By: janeyx99
fbshipit-source-id: e191d739268d86de0a0ba0eea0006969859d1940
Summary:
This PR:
1. moves the sharding algorithm from run_test.py to framework_utils.py (let me know if you have a better place for it)
2. adds tests for the algorithm in test_testing.py
3. fixes the algorithm so that it doesn't tack all the unknown jobs onto the shard with the minimum time, but instead distributes them across the shards (see the sketch below).
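As an illustration only (not the exact framework_utils.py code), the idea is that known-time tests go to the currently lightest shard while unknown-time tests are spread round-robin:
```
from typing import Dict, List, Tuple

def calculate_shards(
    num_shards: int,
    tests: List[str],
    job_times: Dict[str, float],
) -> List[Tuple[float, List[str]]]:
    # (accumulated time, assigned tests) per shard
    shards: List[Tuple[float, List[str]]] = [(0.0, []) for _ in range(num_shards)]

    known = sorted((t for t in tests if t in job_times),
                   key=lambda t: job_times[t], reverse=True)
    unknown = [t for t in tests if t not in job_times]

    # Greedy bin packing: longest known tests first, each onto the lightest shard.
    for test in known:
        idx = min(range(num_shards), key=lambda i: shards[i][0])
        time, names = shards[idx]
        shards[idx] = (time + job_times[test], names + [test])

    # Spread tests without timing data round-robin instead of piling them
    # all onto the shard that currently has the minimum time.
    for i, test in enumerate(unknown):
        time, names = shards[i % num_shards]
        shards[i % num_shards] = (time, names + [test])

    return shards
```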
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53942
Test Plan: python test/test_testing.py -k TestFrameworkUtils
Reviewed By: samestep
Differential Revision: D27047223
Pulled By: janeyx99
fbshipit-source-id: 824b20009c0bb707aa5361de445cdec795d5e3f1
Summary:
The first argument can be either a file name or a test module name, but the key into `CUSTOM_HANDLERS` is the test module name.
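A minimal sketch of the normalization this implies (the names here are illustrative, not the actual run_test.py code): strip a trailing `.py` so the `CUSTOM_HANDLERS` lookup sees the module name either way.
```
CUSTOM_HANDLERS = {
    "distributed/test_distributed_spawn": lambda *args: None,  # placeholder handler
}

def parse_test_module(test_arg):
    # Accept "distributed/test_distributed_spawn.py" as well as the bare module name.
    return test_arg[:-3] if test_arg.endswith(".py") else test_arg

handler = CUSTOM_HANDLERS.get(parse_test_module("distributed/test_distributed_spawn.py"))
assert handler is not None
```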
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53884
Test Plan: Run `python3 run_test.py -i distributed/test_distributed_spawn.py`
Reviewed By: janeyx99
Differential Revision: D27006164
Pulled By: malfet
fbshipit-source-id: f30b42856cd2754e5981c1c69618f84e392c986a
Summary:
This PR:
1. refactors the logic for S3 stats gathering.
2. renames SLOW_TESTS to TARGET_DET_LIST to disambiguate and remove confusion with slowTest
3. detects slow tests (tests that take more than 5 minutes) and adds them to TARGET_DET_LIST based on the previous nightly's results in S3 (see the sketch below).
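As a rough sketch of that detection step (only the 5-minute threshold comes from the description above; the helper and names are illustrative):
```
SLOW_TEST_THRESHOLD_SEC = 5 * 60  # tests slower than 5 minutes

def detect_slow_tests(job_times):
    """Return test modules whose nightly runtime exceeds the threshold."""
    return sorted(
        test for test, seconds in job_times.items()
        if seconds > SLOW_TEST_THRESHOLD_SEC
    )

# TARGET_DET_LIST would then be extended with the automatically detected tests.
```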
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53549
Test Plan:
Set CIRCLE_JOB to your favorite CI job (like `pytorch_linux_bionic_py3_8_gcc9_coverage_test1`).
Run `python test/run_test.py --determine-from=<your fave pytorch files>`
e.g., `python test/run_test.py --determine-from=test/run_test.py`
Reviewed By: mrshenli
Differential Revision: D26904478
Pulled By: janeyx99
fbshipit-source-id: 9576b34f4fee09291d60e36ff2631753a3925094
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857
These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
- `GLOSSARY.md`
- `aten/src/ATen/core/op_registration/README.md`
- `scripts/README.md`
- `torch/csrc/jit/codegen/fuser/README.md`
The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```
I looked over the auto-generated changes and didn't see anything that looked problematic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406
Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377
This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348
Reviewed By: walterddr, seemethere
Differential Revision: D26856620
Pulled By: samestep
fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
Summary:
Uses nightly commit stats to automatically shard tests based on execution time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53269
Test Plan:
set CIRCLE_JOB to an existing job, like `pytorch_linux_bionic_py3_6_clang9_test`
Then you can run something like: `python test/run_test.py --shard 1 10`
Reviewed By: malfet
Differential Revision: D26819440
Pulled By: janeyx99
fbshipit-source-id: 6bc73d6aa3d52d9850817536be15d7b54a72780e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52910
**Summary**
PR #52158 tried to move all JIT bindings from `torch._C` to a new
submodule `torch._C._jit`, but that...did not go well. This pull request
adds the new `torch._C._jit` submodule, but does not migrate the
existing bindings. Instead, it adds a unit test that fails if any new
bindings are added to `torch._C`. A comment in the test instructs
developers to add their new binding to the allowlist if it really should
be in `torch._C`, or to add it to the appropriate submodule (e.g.,
`torch._C._jit`). The idea is to prevent the issue
described in #51691 from getting *worse* if it cannot be fixed.
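A sketch of what such a guard test could look like (the allowlist handling here is a placeholder; the real test pins an explicit list of allowed names):
```
import unittest

import torch

# Placeholder: the real test keeps an explicit, checked-in allowlist of names.
ALLOWED_TORCH_C_BINDINGS = frozenset(dir(torch._C))

class TestNoNewTorchCBindings(unittest.TestCase):
    def test_no_new_bindings(self):
        new_bindings = set(dir(torch._C)) - ALLOWED_TORCH_C_BINDINGS
        self.assertFalse(
            new_bindings,
            f"New bindings added to torch._C: {sorted(new_bindings)}. "
            "Add them to a submodule such as torch._C._jit, or to the "
            "allowlist if they really belong in torch._C.",
        )

if __name__ == "__main__":
    unittest.main()
```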
**Test Plan**
Continuous integration.
**Fixes**
This commit fixes #51691.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D26698373
Pulled By: SplitInfinity
fbshipit-source-id: ec9f5426051227a513d4fd09512b624420e0100b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52323
Using the default CPU allocator for ops executed on the qnnpack backend will result in
ASAN failures with heap overflow, since qnnpack (and xnnpack) can read input
beyond its end and/or beginning.
Here we are enabling this feature specifically to enable the dynamic sparse linear op test
using the qnnpack engine. In the dynamic linear op, the fp32 bias is not packed and
hence can result in out-of-bounds access.
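Assuming the bindings are named `torch._C._set_default_mobile_cpu_allocator` and `torch._C._unset_default_mobile_cpu_allocator` (an assumption based on the test plan's file name), a test would wrap the op run roughly like this:
```
import torch

# Swap in the mobile CPU allocator, which pads allocations so that
# qnnpack/xnnpack reads past the nominal tensor bounds stay in bounds,
# then restore the default allocator afterwards.
torch._C._set_default_mobile_cpu_allocator()
try:
    x = torch.randn(4, 8)
    # ... run the dynamic sparse linear op under test here ...
finally:
    torch._C._unset_default_mobile_cpu_allocator()
```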
Test Plan: test_set_default_mobile_cpu_allocator.py
Reviewed By: z-a-f
Differential Revision: D26263481
fbshipit-source-id: a49227cac7e6781b0db4a156ca734d7671972d9f