Commit Graph

285 Commits

Author SHA1 Message Date
Richard Zou
645a9235f0 Add functorch shards for windows CI (#82161)
Test Plan:
- wait for CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82161
Approved by: https://github.com/kit1980
2022-07-26 14:21:13 +00:00
samdow
2ac24675cc get rid of push_torch_{dispatch, function}_mode (#78215)
Currently we have two ways of doing the same thing for torch dispatch and function modes:
`with push_torch_dispatch_mode(X)` or `with X.push(...)`
is now equivalent to
`with X()`

This removes the first API (which is older and private, so we don't need to go through a deprecation cycle).

There is some risk that this might land-race with a PR that uses the old API, but in general it seems like most callers use the `with X()` API or `enable_torch_dispatch_mode(X())`, which isn't being removed.

EDIT: kept the `with X.push(...)` API since there were ~3 land races with it over the past day or so, but made it emit a warning asking users to use the other API.
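
For illustration, a minimal sketch of the surviving context-manager style, assuming today's `TorchDispatchMode` in `torch.utils._python_dispatch` (`LoggingMode` is a made-up example, not part of this PR):

```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode

# LoggingMode is illustrative; the point is that a mode is now entered
# directly with `with X()` rather than via push_torch_dispatch_mode(X).
class LoggingMode(TorchDispatchMode):
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        print(f"dispatching {func}")
        return func(*args, **(kwargs or {}))

with LoggingMode():
    torch.ones(2) + 1
```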
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78215
Approved by: https://github.com/ezyang
2022-07-22 18:56:37 +00:00
soulitzer
f595467e5c Reenable slow gradcheck and make it pass (#80514)
Context: For a while, slow gradcheck CI was skipping nearly all tests, which hid the fact that it should have been failing and timing out (10+ hour runtime for TestGradients). The CI configuration has since been fixed to correct this, revealing the test failures. This PR re-enables slow gradcheck CI and makes it pass again.

This PR:
- makes slow and failing tests run in fast gradcheck mode only
- reduces the input size for unary/binary ufuncs in slow gradcheck only (alternatively, skips the test entirely)
- skips entire test files on the slow gradcheck runner if they don't use gradcheck (test_ops, test_meta, test_decomp, test_ops_jit)
- reduces the input size for some ops

Follow ups:
1. Investigate slow mode failures https://github.com/pytorch/pytorch/issues/80411
2. See if we can re-enable slow gradcheck tests for some of the slow tests by reducing the sizes of their inputs

The following tests fail in slow mode; they now run in fast mode only.
```
test_fn_fwgrad_bwgrad___rmod___cuda_float64
test_fn_fwgrad_bwgrad_linalg_householder_product_cuda_complex128
test_fn_fwgrad_bwgrad__masked_prod_cuda_complex128
test_fn_fwgrad_bwgrad__masked_prod_cuda_float64
test_fn_fwgrad_bwgrad_linalg_matrix_power_cuda_complex128
test_fn_fwgrad_bwgrad_cat_cuda_complex128
test_fn_fwgrad_bwgrad_linalg_lu_factor_ex_cuda_float64
test_fn_fwgrad_bwgrad_copysign_cuda_float64
test_fn_fwgrad_bwgrad_cholesky_inverse_cuda_complex128
test_fn_fwgrad_bwgrad_float_power_cuda_complex128
test_fn_fwgrad_bwgrad_fmod_cuda_float64
test_fn_fwgrad_bwgrad_float_power_cuda_float64
test_fn_fwgrad_bwgrad_linalg_lu_cuda_float64
test_fn_fwgrad_bwgrad_remainder_cuda_float64
test_fn_fwgrad_bwgrad_repeat_cuda_complex128
test_fn_fwgrad_bwgrad_prod_cuda_complex128
test_fn_fwgrad_bwgrad_slice_scatter_cuda_float64
test_fn_fwgrad_bwgrad_tile_cuda_complex128
test_fn_fwgrad_bwgrad_pow_cuda_float64
test_fn_fwgrad_bwgrad_pow_cuda_complex128
test_fn_fwgrad_bwgrad_fft_*
test_fn_fwgrad_bwgrad_zero__cuda_complex128
test_fn_gradgrad_linalg_lu_factor_cuda_float64
test_fn_grad_div_trunc_rounding_cuda_float64
test_fn_grad_div_floor_rounding_cuda_float64
```

Marks the OpInfos for the following ops, which run slowly in slow gradcheck, as `fast_gradcheck`-only (the first column is an index, the second is runtime in seconds; see the sketch after these listings):
```
0  918.722  test_fn_fwgrad_bwgrad_nn_functional_conv_transpose3d_cuda_float64
1  795.042  test_fn_fwgrad_bwgrad_nn_functional_unfold_cuda_complex128
2  583.63  test_fn_fwgrad_bwgrad_nn_functional_max_pool3d_cuda_float64
3  516.946  test_fn_fwgrad_bwgrad_svd_cuda_complex128
4  503.179  test_fn_fwgrad_bwgrad_linalg_svd_cuda_complex128
5  460.985  test_fn_fwgrad_bwgrad_linalg_lu_cuda_complex128
6  401.04  test_fn_fwgrad_bwgrad_linalg_lstsq_grad_oriented_cuda_complex128
7  353.671  test_fn_fwgrad_bwgrad_nn_functional_max_pool2d_cuda_float64
8  321.903  test_fn_fwgrad_bwgrad_nn_functional_gaussian_nll_loss_cuda_float64
9  307.951  test_fn_fwgrad_bwgrad_stft_cuda_complex128
10  266.104  test_fn_fwgrad_bwgrad_svd_lowrank_cuda_float64
11  221.032  test_fn_fwgrad_bwgrad_istft_cuda_complex128
12  183.741  test_fn_fwgrad_bwgrad_lu_unpack_cuda_complex128
13  132.019  test_fn_fwgrad_bwgrad_nn_functional_unfold_cuda_float64
14  125.343  test_fn_fwgrad_bwgrad_nn_functional_pad_constant_cuda_complex128
15  124.2  test_fn_fwgrad_bwgrad_kron_cuda_complex128
16  123.721  test_fn_fwgrad_bwgrad_pca_lowrank_cuda_float64
17  121.074  test_fn_fwgrad_bwgrad_nn_functional_max_unpool3d_cuda_float64
18  119.387  test_fn_fwgrad_bwgrad_rot90_cuda_complex128
19  112.889  test_fn_fwgrad_bwgrad__masked_normalize_cuda_complex128
20  107.541  test_fn_fwgrad_bwgrad_dist_cuda_complex128
21  106.727  test_fn_fwgrad_bwgrad_diff_cuda_complex128
22  104.588  test_fn_fwgrad_bwgrad__masked_cumprod_cuda_complex128
23  100.135  test_fn_fwgrad_bwgrad_nn_functional_feature_alpha_dropout_with_train_cuda_float64
24  88.359  test_fn_fwgrad_bwgrad_mH_cuda_complex128
25  86.214  test_fn_fwgrad_bwgrad_nn_functional_max_unpool2d_cuda_float64
26  83.037  test_fn_fwgrad_bwgrad_nn_functional_bilinear_cuda_float64
27  79.987  test_fn_fwgrad_bwgrad__masked_cumsum_cuda_complex128
28  77.822  test_fn_fwgrad_bwgrad_diag_embed_cuda_complex128
29  76.256  test_fn_fwgrad_bwgrad_mT_cuda_complex128
30  74.039  test_fn_fwgrad_bwgrad_linalg_lu_solve_cuda_complex128
```
```
0  334.142  test_fn_fwgrad_bwgrad_unfold_cuda_complex128
1  312.791  test_fn_fwgrad_bwgrad_linalg_lu_factor_cuda_complex128
2  121.963  test_fn_fwgrad_bwgrad_nn_functional_max_unpool3d_cuda_float64
3  108.085  test_fn_fwgrad_bwgrad_diff_cuda_complex128
4  89.418  test_fn_fwgrad_bwgrad_nn_functional_max_unpool2d_cuda_float64
5  72.231  test_fn_fwgrad_bwgrad___rdiv___cuda_complex128
6  69.433  test_fn_fwgrad_bwgrad___getitem___cuda_complex128
7  68.582  test_fn_fwgrad_bwgrad_ldexp_cuda_complex128
8  68.572  test_fn_fwgrad_bwgrad_linalg_pinv_cuda_complex128
9  67.585  test_fn_fwgrad_bwgrad_nn_functional_glu_cuda_float64
10  66.567  test_fn_fwgrad_bwgrad_lu_cuda_float64
```
```
0  630.13  test_fn_gradgrad_nn_functional_conv2d_cuda_complex128
1  81.086  test_fn_gradgrad_linalg_solve_triangular_cuda_complex128
2  71.332  test_fn_gradgrad_norm_cuda_complex128
3  64.308  test_fn_gradgrad__masked_std_cuda_complex128
4  59.519  test_fn_gradgrad_div_no_rounding_mode_cuda_complex128
5  58.836  test_fn_gradgrad_nn_functional_adaptive_avg_pool3
```
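
For illustration, a hedged sketch of the marking itself (the `sample_inputs_func` below is an empty stand-in, not the real one; actual entries live in common_methods_invocations.py):

```python
import torch
from torch.testing._internal.common_dtype import floating_types
from torch.testing._internal.common_methods_invocations import OpInfo

# gradcheck_fast_mode=True opts the op out of slow-mode gradcheck, so the
# gradcheck-based tests above run in fast mode only.
op = OpInfo(
    "nn.functional.conv_transpose3d",
    dtypes=floating_types(),
    gradcheck_fast_mode=True,
    sample_inputs_func=lambda op_info, device, dtype, requires_grad, **kw: [],
)
```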

Reduces the sizes of the inputs for:
- diff
- diag_embed

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80514
Approved by: https://github.com/albanD
2022-07-22 02:05:37 +00:00
vitrioil
d4043f0d95 Added generator check for parametrize and ops (#81263)
Fixes #81234

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81263
Approved by: https://github.com/jbschlosser
2022-07-21 19:32:54 +00:00
Catherine Lee
06a0cfc0ea pytest to run test_ops, test_ops_gradients, test_ops_jit in non linux cuda environments (#79898)
This PR uses pytest to run test_ops, test_ops_gradients, and test_ops_jit in parallel in non-Linux-CUDA environments to decrease TTS (time to signal). I am excluding Linux CUDA because running in parallel there results in out-of-memory errors.

Notes:
* update the hypothesis version for compatibility with pytest
* use pytest-rerunfailures to rerun failed tests (similar to flaky tests, although these test files generally don't have flaky tests); see the sketch after this list
  * reruns are denoted by a rerun tag in the XML. Failed reruns also have the failure tag; successes (meaning that the test is flaky) do not have the failure tag.
* see https://docs.google.com/spreadsheets/d/1aO0Rbg3y3ch7ghipt63PG2KNEUppl9a5b18Hmv2CZ4E/edit#gid=602543594 for info on the speedup (or slowdown, in the case of slow tests)
  * expecting Windows test times to decrease by 60 minutes total
* the slow-test infra is expected to stay the same - verified by running pytest and unittest on the same job and checking the number of skipped/run tests
* test reports to S3 changed - adds an entirely new table to keep track of invoking_file times
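
A hedged sketch of the invocation (the `-n` flag assumes pytest-xdist and `--reruns` assumes pytest-rerunfailures; the exact flags run_test.py passes may differ):

```python
import pytest

# Run one test file across worker processes, retrying failures so that
# flaky tests can be told apart from consistent failures.
ret = pytest.main(["test_ops.py", "-n", "auto", "--reruns", "2", "-v"])
```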
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79898
Approved by: https://github.com/malfet, https://github.com/janeyx99
2022-07-19 19:50:57 +00:00
Richard Zou
c9a0204ef4 Disable functorch modes in testing's freeze_rng_state(), part 2 (#81109)
I forgot to update one line in
https://github.com/pytorch/pytorch/pull/81006. torch.get_rng_state()
returns a Tensor that can also be affected by modes, so it also needs a
no_functorch() context manager.

Test Plan:
- tested with functorch tests on CUDA (that's how I discovered this
problem)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81109
Approved by: https://github.com/samdow
2022-07-08 20:18:56 +00:00
Animesh Jain
b26d664810 Switch on TorchDynamo for PyTorch tests (#81083)
As title. Based on https://github.com/pytorch/pytorch/pull/80106
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81083
Approved by: https://github.com/ezyang
2022-07-08 16:08:11 +00:00
Richard Zou
26582056fa Disable functorch modes in testing's freeze_rng_state() (#81006)
freeze_rng_state() is a helper we use to test random operations in
OpInfos: it ensures that every time the op is called, the RNG state is
the same.

Unfortunately this doesn't work with functorch, because:
- torch.cuda.set_rng_state() clones a Tensor and then grabs its data_ptr
- functorch's modes cause functorch wrappers to get emitted on the
.clone() call (even if the thing being cloned is a regular Tensor).

Tensor subclasses also had this problem. This PR applies the same
solution as torch_dispatch did before: we're just going to disable
functorch dispatch when setting the rng state.

In the long run, torch_dispatch should probably have an option to
interpose on torch.cuda.set_rng_state or generator.set_state... but I
didn't want to think very hard right now.
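
For illustration, a minimal sketch of the freeze_rng_state idea (the real helper in torch.testing._internal.common_utils additionally disables functorch dispatch around the RNG-state calls, which is what this PR fixes):

```python
from contextlib import contextmanager
import torch

# Save the RNG state on entry and restore it on exit, so every call made
# under the context sees an identical random stream.
@contextmanager
def freeze_rng_state():
    rng_state = torch.get_rng_state()
    cuda_state = torch.cuda.get_rng_state() if torch.cuda.is_available() else None
    try:
        yield
    finally:
        if cuda_state is not None:
            torch.cuda.set_rng_state(cuda_state)
        torch.set_rng_state(rng_state)
```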

Test Plan:
- tested with functorch tests (those tests were previously being
skipped, now I can unskip some of them).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81006
Approved by: https://github.com/samdow
2022-07-08 03:28:49 +00:00
Animesh Jain
1d90d6ee60 Setup for running PyTorch tests with TorchDynamo and skips for known failing tests (#80106)
@ezyang I am going to keep adding more skips in this PR for now. Once we have the CI running, I will replace them with the appropriate decorators.

cc @mlazos, we should add those tests in test_ops.py in this PR as well

cc @jansel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80106
Approved by: https://github.com/ezyang, https://github.com/jansel
2022-07-07 18:57:33 +00:00
Anish Mahishi
af9b8a691f Refactoring the AO experimental sparsity tests
The tests are passing, and this diff refactors them to conform to best practices.

Differential Revision: [D37316967](https://our.internmc.facebook.com/intern/diff/D37316967/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D37316967/)!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79972

Approved by: https://github.com/z-a-f
2022-06-22 16:52:32 +00:00
Eric Han
06274d7a48 Add test for torchscripting nn.TransformerEncoder, including fast path (#79796)
Summary:
Add a test just to check whether TransformerEncoder crashes when enumerating over the params [with_no_grad, use_torchscript, training].

The motivation was that the TransformerEncoder fast path (so with_no_grad=True) together with use_torchscript=True would crash with the issue that NestedTensor doesn't have a size. This happened because the TransformerEncoder fast path automatically generates a NestedTensor as a perf optimization, and TorchScript attempts to find intermediate tensor sizes while it optimizes; but NestedTensor has not implemented a size method, so things fail.

This test goes together with this fix https://github.com/pytorch/pytorch/pull/79480
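
For illustration, a hedged sketch of the kind of test described (dimensions and names are illustrative, not the exact test added to caffe2/test:transformers):

```python
import contextlib
import itertools
import torch
import torch.nn as nn

# Enumerate the three flags and just check that a forward pass does not crash.
layer = nn.TransformerEncoderLayer(d_model=8, nhead=2, batch_first=True)
model = nn.TransformerEncoder(layer, num_layers=2).eval()
src = torch.rand(2, 4, 8)

for with_no_grad, use_torchscript, training in itertools.product(
        [True, False], repeat=3):
    m = torch.jit.script(model) if use_torchscript else model
    m.train(training)
    ctx = torch.no_grad() if with_no_grad else contextlib.nullcontext()
    with ctx:
        m(src)
```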

Test Plan:
```
buck build --show-output mode/opt -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=a100 mode/inplace  //caffe2/test:transformers

./fbcode/buck-out/gen/caffe2/test/transformers#binary.par
```
Test runs and passes together with the changes from the PR above (I made another diff on top of this with those changes). Does not pass without the fix.

Reviewed By: mikekgfb

Differential Revision: D37222923

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79796
Approved by: https://github.com/zrphercule
2022-06-17 22:00:49 +00:00
Michael Suo
c978b609f7 [ci] remove IN_CI env var
The conventional env var to set is CI. Both CircleCI and GHA set it, so
IN_CI is unnecessary.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79229

Approved by: https://github.com/janeyx99
2022-06-11 17:16:30 +00:00
Michael Suo
f51d5233f2 [ci] fix GITHUB_ACTIONS env var checks
`GITHUB_ACTIONS` is set to `true`, but some of our code checks that it
is `1`. Make the checks more general.
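
A hedged sketch of the generalized check (the real helper in common_utils.py may be named differently):

```python
import os

# Accept "true" (what GitHub Actions actually sets) as well as "1".
IS_GHA = os.getenv("GITHUB_ACTIONS", "") in ("1", "true")
```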

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79290

Approved by: https://github.com/janeyx99
2022-06-11 17:16:30 +00:00
Peter Bell
c936396af2 Always convert truthy booleans to 1
Ref #54789

A `bool` has only two valid values, 1 or 0. Any in-memory value
outside of those leads to undefined behavior. So, instead of
`reinterpret_cast`-ing to `bool*` I introduce `c10::load<scalar_t>`
which will read as `unsigned char` and convert to a valid `bool`.

This gets >90% of operators working, but the remaining operators, for
which skips and xfails have been added, will require individual attention.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77122

Approved by: https://github.com/mruberry
2022-06-07 16:00:30 +00:00
Justin Chu
d0baf5792a [Testing] Turn instantiate_parametrized_tests into a decorator (#78228)
The change

1. Returns the input class from `instantiate_parametrized_tests` so that we can use it as a decorator.

    Instead of

    ```python
    instantiate_parametrized_tests(SomeTestClass)
    ```

    we can now do

    ```python
    @instantiate_parametrized_tests
    class SomeTestClass:
        ...
    ```
2. Strips spaces around parameters in `parametrize`.
    Both
    ```python
    @parametrize("x,y", [(1, 'foo'), (2, 'bar'), (3, 'baz')])
    def test_bar(self, x, y):
        ...
    ```
    and
    ```python
    @parametrize("x, y", [(1, 'foo'), (2, 'bar'), (3, 'baz')])
    def test_bar(self, x, y):
        ...
    ```

    (or

    ```python
    @parametrize("x,                   y", [(1, 'foo'), (2, 'bar'), (3, 'baz')])
    ```

, for that matter) are now parsed to the correct argument names.

3. Decorates test wrappers with `functools.wraps` to preserve the test method names (previously the names became "wrapper_xxx", which prevented the parametrized tests from getting the correct names).
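
A hedged end-to-end sketch combining the pieces above (class and test names are illustrative):

```python
from torch.testing._internal.common_utils import (
    TestCase, instantiate_parametrized_tests, parametrize, run_tests)

# The class decorator from (1) instantiates the parametrized cases, and the
# "x, y" spec from (2) parses correctly despite the space after the comma.
@instantiate_parametrized_tests
class TestFoo(TestCase):
    @parametrize("x, y", [(1, 'foo'), (2, 'bar'), (3, 'baz')])
    def test_bar(self, x, y):
        self.assertIsInstance(y, str)

if __name__ == "__main__":
    run_tests()
```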
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78228
Approved by: https://github.com/garymm, https://github.com/seemethere
2022-05-31 22:33:37 +00:00
Pearu Peterson
8c88a55d44 Fix sparse BSR tensor validation.
Also adds bits to support dense dimensions for Sparse Compressed tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78359

Approved by: https://github.com/cpuhrsch
2022-05-27 13:26:35 +00:00
Jane Xu
50930604cf Hackily use addSkip to track flakiness in common_utils.py (#78292)
### Problem:
The current way we detect flakiness is by aggregating results at the end of a job, which has worked so far but is inefficient and potentially inaccurate. We have also been dedicating a workflow step at the end of every job to doing this analysis.

### Solution:
This PR uses unittest.TestResult's addSkip method, adding a skipped test entry every time we detect that something is flaky. This way, we no longer need to aggregate anything, and we can simply scan the test reports and filter for skipped tests with flaky = True. Not only is this much faster to query, it also rids us of the janky aggregation logic. A sketch of the idea follows.
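
For illustration, a hedged sketch of the mechanism (`FlakyTrackingResult` is illustrative, not the actual common_utils.py code):

```python
import unittest

# When a test fails and then passes on rerun, record an extra skip whose
# reason carries the flaky metadata, so report scanners can filter on it.
class FlakyTrackingResult(unittest.TextTestResult):
    def record_flaky(self, test, num_failures):
        self.addSkip(test, f"flaky=True, num_failures={num_failures}")
```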

### Test plan:
I simulated a flaky test locally (test_async_python) and observed that:
With overriding signal ON (so flaky test = green):
- Successes are reported just as they normally are, with no skips. [override_signal_normal_success.txt](https://github.com/pytorch/pytorch/files/8774012/override_signal_normal_success.txt)
- Failures fail and are reported as they are with no skips. [override_signal_all_fails.txt](https://github.com/pytorch/pytorch/files/8774010/override_signal_all_fails.txt)
- Flaky tests have expected failures + a success + a skip denoting the correct information. [override_signal_1_1.txt](https://github.com/pytorch/pytorch/files/8774005/override_signal_1_1.txt)
 and [override_signal_2_1.txt](https://github.com/pytorch/pytorch/files/8774007/override_signal_2_1.txt)

With overriding signal OFF:
- Successes are reported just as they normally are, with no skips. [report_only_one_success.txt](https://github.com/pytorch/pytorch/files/8774019/report_only_one_success.txt)
- Failures fail and are reported as they are with no skips. [report_only_all_fails.txt](https://github.com/pytorch/pytorch/files/8774018/report_only_all_fails.txt)
- Flaky tests have failures + unexpected successes + a skip denoting the correct
 information. [report_only_3_1.txt](https://github.com/pytorch/pytorch/files/8774015/report_only_3_1.txt)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78292
Approved by: https://github.com/suo
2022-05-26 14:06:57 +00:00
Alban Desmaison
04ac80c73a Fix a few issues on assert/double error/legacy constructor (#77966)
Fixes https://github.com/pytorch/pytorch/issues/77960, https://github.com/pytorch/pytorch/issues/77957, https://github.com/pytorch/pytorch/issues/77781
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77966
Approved by: https://github.com/soulitzer, https://github.com/kulinseth
2022-05-20 20:25:12 +00:00
Philip Meier
dd313d7338 support TestCase.longMessage in TestCase.assertEqual
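A hedged sketch of what this enables (`LongMessageTest` is illustrative; the exact failure text may differ):

```python
import torch
from torch.testing._internal.common_utils import TestCase, run_tests

# With longMessage = True (the unittest default), a custom msg passed to
# assertEqual is appended to the generated mismatch message instead of
# replacing it.
class LongMessageTest(TestCase):
    longMessage = True

    def test_mismatch(self):
        # Fails; the report shows both the generated diff and "custom context".
        self.assertEqual(torch.zeros(2), torch.ones(2), msg="custom context")

if __name__ == "__main__":
    run_tests()
```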
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77602

Approved by: https://github.com/mruberry
2022-05-20 11:09:28 +00:00
Pearu Peterson
8b5f11c61e Support copy_ for Sparse Compressed tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77605

Approved by: https://github.com/cpuhrsch
2022-05-18 21:22:19 +00:00
Jane Xu
a325fa94b9 [flaky test reporting] print stack trace for flaky reruns (#77664)
Give context about failures ~~and include it in the test report~~! Unfortunately I cannot easily include it in the test report through addExpectedFailure :c, as tracebacks are not something I can instantiate (see the revert comment).

The current way doesn't print the stack traces, which has been fine because we don't hide the signal and it shows up at the end:
<img width="545" alt="image" src="https://user-images.githubusercontent.com/31798555/168878182-914edc39-369f-40bc-8b35-9d5cd47a6b1c.png">

However, when we want to hide the signal, we'd like to print the stack traces for each failed attempt, as it won't be shown at the end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77664
Approved by: https://github.com/suo
2022-05-18 14:17:16 +00:00
PyTorch MergeBot
b87e5f383b Revert "[flaky test reporting] print stack trace for flaky reruns (#77664)"
This reverts commit 646a20c8fe.

Reverted https://github.com/pytorch/pytorch/pull/77664 on behalf of https://github.com/janeyx99
2022-05-17 21:27:21 +00:00
Jane Xu
646a20c8fe [flaky test reporting] print stack trace for flaky reruns (#77664)
Give context about failures and include it in the test report!

The current way doesn't print the stack traces, which has been fine because we don't hide the signal and it shows up at the end:
<img width="545" alt="image" src="https://user-images.githubusercontent.com/31798555/168878182-914edc39-369f-40bc-8b35-9d5cd47a6b1c.png">

However, when we want to hide the signal, we'd like to print the stack traces for each failed attempt, as it won't be shown at the end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77664
Approved by: https://github.com/suo
2022-05-17 18:06:14 +00:00
Kulin Seth
f348b1b2b5 Add the Runtime components for MPS backend. (#76725)
The PR adds the runtime components and a few basic operations, like copy and as_strided, for the MPS backend.

The current list of identified TODOs:

-  https://github.com/pytorch/pytorch/issues/77176
- Unify the logic with CUDACachingAllocator and remove redundant code.
-  https://github.com/pytorch/pytorch/issues/77170
- Look into using C++ smart pointers where possible with ObjC code
- Use empty_strided_generic() to implement the `empty_strided_mps` code
- https://github.com/pytorch/pytorch/issues/77144
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76725
Approved by: https://github.com/albanD
2022-05-11 17:19:45 +00:00
Nikita Vedeneev
00a1fb64bb Faster index_select for sparse COO tensors on CPU. (#72710)
Fixes https://github.com/pytorch/pytorch/issues/72212.

This PR improves on the previous algorithm's complexity. It also exploits the structure of the problem and parallelizes computations when possible.

Benchmark results.

<details>

<summary>Testing script</summary>

```python
import torch
import math
from IPython import get_ipython
from itertools import product
import pickle
from torch.utils.benchmark import Timer, Compare

torch.manual_seed(13)
#torch.set_num_threads(1)
ipython = get_ipython()

index_sizes = (100, 1000, 10000)
# specifies (n, nnz)
problem_dims = (
    # n > nnz
    (10000, 100),
    (100000, 1000),
    (1000000, 10000),
    # n < nnz
    (10, 100),
    (10, 1000),
    (10, 10000),
    (100, 1000),
    (100, 10000),
    (1000, 10000),
    (1000, 100000),
    (1000, 1000000),
    #(1000000, 1000000000),
)

def f(t, d, index):
    # comparison-run helper; requires the external torch_sparse package,
    # which is not imported above
    s = torch_sparse.SparseTensor.from_torch_sparse_coo_tensor(t)
    ss = s.index_select(d, index)
    return ss.coo()

name = "PR"
results = []

for (n, nnz), m in product(problem_dims, index_sizes):
    for d in (0, 1):
        if nnz < n:
            shape = (n, n)
        else:
            shape = (n, nnz // n) if d == 0 else (nnz // n, n)
        nrows, ncols = shape
        rowidx = torch.randint(low=0, high=nrows, size=(nnz,))
        colidx = torch.randint(low=0, high=ncols, size=(nnz,))
        itemidx = torch.vstack((rowidx, colidx))
        xvalues = torch.randn(nnz)
        index = torch.randint(low=0, high=n, size=(m,))

        SparseX = torch.sparse_coo_tensor(itemidx, xvalues, size=shape).coalesce()
        smtp = "SparseX.index_select(d, index)"
        timer = Timer(smtp,
                      globals=globals(),
                      label="coo.index_select",
                      description=f"{name}: coo.index_select",
                      sub_label=f"n={n}, nnz={nnz}, index_len={m}, dim={d}",
                      num_threads=torch.get_num_threads())
        results.append(timer.blocked_autorange())

compare = Compare(results)
compare.trim_significant_figures()
compare.print()

with open(f"{name}_index_select.pickle", 'wb') as f:
    pickle.dump(results, f)

```

</details>

<details>

<summary>Gather results</summary>

```python
import pickle
from torch.utils.benchmark import Timer, Compare

files = [
        "PR",
        "torch_sparse",
        "master"
        ]

timers = []
for name in files:
    with open("{}_index_select.pickle".format(name), 'rb') as f:
        timers += pickle.load(f)

compare = Compare(timers)
compare.trim_significant_figures()
compare.print()

```

</details>

<details>

<summary>PR/torch_sparse/master runtime comparison</summary>

```
[----------------------------------- coo.index_select ----------------------------------]
                                                    |    PR   |  torch_sparse  |   master
32 threads: -----------------------------------------------------------------------------
      n=10000, nnz=100, index_len=100, dim=0        |     14  |        140     |       10
      n=10000, nnz=100, index_len=100, dim=1        |     14  |        200     |       10
      n=10000, nnz=100, index_len=1000, dim=0       |     30  |        180     |       38
      n=10000, nnz=100, index_len=1000, dim=1       |     34  |        240     |       38
      n=10000, nnz=100, index_len=10000, dim=0      |    278  |        460     |      330
      n=10000, nnz=100, index_len=10000, dim=1      |    275  |        516     |      330
      n=100000, nnz=1000, index_len=100, dim=0      |     16  |        290     |       31
      n=100000, nnz=1000, index_len=100, dim=1      |     26  |        390     |       31
      n=100000, nnz=1000, index_len=1000, dim=0     |     45  |        405     |      263
      n=100000, nnz=1000, index_len=1000, dim=1     |     73  |        500     |      261
      n=100000, nnz=1000, index_len=10000, dim=0    |    444  |        783     |     2570
      n=100000, nnz=1000, index_len=10000, dim=1    |    470  |        890     |     2590
      n=1000000, nnz=10000, index_len=100, dim=0    |     25  |       2400     |      270
      n=1000000, nnz=10000, index_len=100, dim=1    |    270  |       4000     |      269
      n=1000000, nnz=10000, index_len=1000, dim=0   |     74  |       2600     |     2620
      n=1000000, nnz=10000, index_len=1000, dim=1   |    464  |       3600     |     2640
      n=1000000, nnz=10000, index_len=10000, dim=0  |    635  |       3300     |    26400
      n=1000000, nnz=10000, index_len=10000, dim=1  |   1000  |       3960     |    26400
      n=10, nnz=100, index_len=100, dim=0           |     16  |        137     |       16
      n=10, nnz=100, index_len=100, dim=1           |     16  |        220     |       16
      n=10, nnz=100, index_len=1000, dim=0          |     63  |        238     |       81
      n=10, nnz=100, index_len=1000, dim=1          |     60  |        698     |       78
      n=10, nnz=100, index_len=10000, dim=0         |    480  |        940     |      862
      n=10, nnz=100, index_len=10000, dim=1         |    330  |       4930     |     1070
      n=10, nnz=1000, index_len=100, dim=0          |     60  |        200     |       73
      n=10, nnz=1000, index_len=100, dim=1          |     56  |        683     |       70
      n=10, nnz=1000, index_len=1000, dim=0         |    480  |        530     |     1050
      n=10, nnz=1000, index_len=1000, dim=1         |    330  |       4550     |     1368
      n=10, nnz=1000, index_len=10000, dim=0        |   3100  |       2900     |     9300
      n=10, nnz=1000, index_len=10000, dim=1        |   3400  |      46000     |     9100
      n=10, nnz=10000, index_len=100, dim=0         |    400  |        453     |      857
      n=10, nnz=10000, index_len=100, dim=1         |    400  |       4070     |     1730
      n=10, nnz=10000, index_len=1000, dim=0        |   2840  |       2600     |    13900
      n=10, nnz=10000, index_len=1000, dim=1        |   3700  |      40600     |    16000
      n=10, nnz=10000, index_len=10000, dim=0       |  83200  |      67400     |   160000
      n=10, nnz=10000, index_len=10000, dim=1       |  68000  |     528000     |   190000
      n=100, nnz=1000, index_len=100, dim=0         |     46  |        148     |       31
      n=100, nnz=1000, index_len=100, dim=1         |     45  |        242     |       37
      n=100, nnz=1000, index_len=1000, dim=0        |     68  |        248     |      240
      n=100, nnz=1000, index_len=1000, dim=1        |     66  |        755     |      290
      n=100, nnz=1000, index_len=10000, dim=0       |    370  |        802     |     2250
      n=100, nnz=1000, index_len=10000, dim=1       |    372  |       5430     |     2770
      n=100, nnz=10000, index_len=100, dim=0        |     82  |        210     |      224
      n=100, nnz=10000, index_len=100, dim=1        |     74  |        986     |      270
      n=100, nnz=10000, index_len=1000, dim=0       |    350  |        618     |     2600
      n=100, nnz=10000, index_len=1000, dim=1       |    370  |       4660     |     4560
      n=100, nnz=10000, index_len=10000, dim=0      |   3000  |       3400     |    41680
      n=100, nnz=10000, index_len=10000, dim=1      |   5000  |      47500     |    30400
      n=1000, nnz=10000, index_len=100, dim=0       |     71  |        160     |      185
      n=1000, nnz=10000, index_len=100, dim=1       |     64  |        516     |      190
      n=1000, nnz=10000, index_len=1000, dim=0      |    100  |        249     |     1740
      n=1000, nnz=10000, index_len=1000, dim=1      |     98  |       1030     |     1770
      n=1000, nnz=10000, index_len=10000, dim=0     |    600  |        808     |    18300
      n=1000, nnz=10000, index_len=10000, dim=1     |    663  |       5300     |    18500
      n=1000, nnz=100000, index_len=100, dim=0      |    160  |        258     |     1890
      n=1000, nnz=100000, index_len=100, dim=1      |    200  |       3620     |     2050
      n=1000, nnz=100000, index_len=1000, dim=0     |    500  |        580     |    18700
      n=1000, nnz=100000, index_len=1000, dim=1     |    640  |       7550     |    30000
      n=1000, nnz=100000, index_len=10000, dim=0    |   3400  |       3260     |   186000
      n=1000, nnz=100000, index_len=10000, dim=1    |   3600  |      49600     |   194000
      n=1000, nnz=1000000, index_len=100, dim=0     |    517  |        957     |    18700
      n=1000, nnz=1000000, index_len=100, dim=1     |    680  |      39600     |    37600
      n=1000, nnz=1000000, index_len=1000, dim=0    |   3600  |       4500     |   186000
      n=1000, nnz=1000000, index_len=1000, dim=1    |   5800  |      76400     |   190000
      n=1000, nnz=1000000, index_len=10000, dim=0   |  50000  |      67900     |  1800000
      n=1000, nnz=1000000, index_len=10000, dim=1   |  45000  |     570000     |  1900000

Times are in microseconds (us).

```

</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72710
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2022-05-10 16:33:13 +00:00
PyTorch MergeBot
8d67972b14 Revert "Faster index_select for sparse COO tensors on CPU. (#72710)"
This reverts commit ce3857e73c.

Reverted https://github.com/pytorch/pytorch/pull/72710 on behalf of https://github.com/malfet
2022-05-10 14:43:05 +00:00
Nikita Vedeneev
ce3857e73c Faster index_select for sparse COO tensors on CPU. (#72710)
Fixes https://github.com/pytorch/pytorch/issues/72212.

This PR improves on the previous algorithm's complexity. It also exploits the structure of the problem and parallelizes computations when possible.

Benchmark results.

<details>

<summary>Testing script</summary>

```python
import torch
import math
from IPython import get_ipython
from itertools import product
import pickle
from torch.utils.benchmark import Timer, Compare

torch.manual_seed(13)
#torch.set_num_threads(1)
ipython = get_ipython()

index_sizes = (100, 1000, 10000)
# specifies (n, nnz)
problem_dims = (
    # n > nnz
    (10000, 100),
    (100000, 1000),
    (1000000, 10000),
    # n < nnz
    (10, 100),
    (10, 1000),
    (10, 10000),
    (100, 1000),
    (100, 10000),
    (1000, 10000),
    (1000, 100000),
    (1000, 1000000),
    #(1000000, 1000000000),
)

def f(t, d, index):
    # comparison-run helper; requires the external torch_sparse package,
    # which is not imported above
    s = torch_sparse.SparseTensor.from_torch_sparse_coo_tensor(t)
    ss = s.index_select(d, index)
    return ss.coo()

name = "PR"
results = []

for (n, nnz), m in product(problem_dims, index_sizes):
    for d in (0, 1):
        if nnz < n:
            shape = (n, n)
        else:
            shape = (n, nnz // n) if d == 0 else (nnz // n, n)
        nrows, ncols = shape
        rowidx = torch.randint(low=0, high=nrows, size=(nnz,))
        colidx = torch.randint(low=0, high=ncols, size=(nnz,))
        itemidx = torch.vstack((rowidx, colidx))
        xvalues = torch.randn(nnz)
        index = torch.randint(low=0, high=n, size=(m,))

        SparseX = torch.sparse_coo_tensor(itemidx, xvalues, size=shape).coalesce()
        smtp = "SparseX.index_select(d, index)"
        timer = Timer(smtp,
                      globals=globals(),
                      label="coo.index_select",
                      description=f"{name}: coo.index_select",
                      sub_label=f"n={n}, nnz={nnz}, index_len={m}, dim={d}",
                      num_threads=torch.get_num_threads())
        results.append(timer.blocked_autorange())

compare = Compare(results)
compare.trim_significant_figures()
compare.print()

with open(f"{name}_index_select.pickle", 'wb') as f:
    pickle.dump(results, f)

```

</details>

<details>

<summary>Gather results</summary>

```python
import pickle
from torch.utils.benchmark import Timer, Compare

files = [
        "PR",
        "torch_sparse",
        "master"
        ]

timers = []
for name in files:
    with open("{}_index_select.pickle".format(name), 'rb') as f:
        timers += pickle.load(f)

compare = Compare(timers)
compare.trim_significant_figures()
compare.print()

```

</details>

<details>

<summary>PR/torch_sparse/master runtime comparison</summary>

```
[----------------------------------- coo.index_select ----------------------------------]
                                                    |    PR   |  torch_sparse  |   master
32 threads: -----------------------------------------------------------------------------
      n=10000, nnz=100, index_len=100, dim=0        |     14  |        140     |       10
      n=10000, nnz=100, index_len=100, dim=1        |     14  |        200     |       10
      n=10000, nnz=100, index_len=1000, dim=0       |     30  |        180     |       38
      n=10000, nnz=100, index_len=1000, dim=1       |     34  |        240     |       38
      n=10000, nnz=100, index_len=10000, dim=0      |    278  |        460     |      330
      n=10000, nnz=100, index_len=10000, dim=1      |    275  |        516     |      330
      n=100000, nnz=1000, index_len=100, dim=0      |     16  |        290     |       31
      n=100000, nnz=1000, index_len=100, dim=1      |     26  |        390     |       31
      n=100000, nnz=1000, index_len=1000, dim=0     |     45  |        405     |      263
      n=100000, nnz=1000, index_len=1000, dim=1     |     73  |        500     |      261
      n=100000, nnz=1000, index_len=10000, dim=0    |    444  |        783     |     2570
      n=100000, nnz=1000, index_len=10000, dim=1    |    470  |        890     |     2590
      n=1000000, nnz=10000, index_len=100, dim=0    |     25  |       2400     |      270
      n=1000000, nnz=10000, index_len=100, dim=1    |    270  |       4000     |      269
      n=1000000, nnz=10000, index_len=1000, dim=0   |     74  |       2600     |     2620
      n=1000000, nnz=10000, index_len=1000, dim=1   |    464  |       3600     |     2640
      n=1000000, nnz=10000, index_len=10000, dim=0  |    635  |       3300     |    26400
      n=1000000, nnz=10000, index_len=10000, dim=1  |   1000  |       3960     |    26400
      n=10, nnz=100, index_len=100, dim=0           |     16  |        137     |       16
      n=10, nnz=100, index_len=100, dim=1           |     16  |        220     |       16
      n=10, nnz=100, index_len=1000, dim=0          |     63  |        238     |       81
      n=10, nnz=100, index_len=1000, dim=1          |     60  |        698     |       78
      n=10, nnz=100, index_len=10000, dim=0         |    480  |        940     |      862
      n=10, nnz=100, index_len=10000, dim=1         |    330  |       4930     |     1070
      n=10, nnz=1000, index_len=100, dim=0          |     60  |        200     |       73
      n=10, nnz=1000, index_len=100, dim=1          |     56  |        683     |       70
      n=10, nnz=1000, index_len=1000, dim=0         |    480  |        530     |     1050
      n=10, nnz=1000, index_len=1000, dim=1         |    330  |       4550     |     1368
      n=10, nnz=1000, index_len=10000, dim=0        |   3100  |       2900     |     9300
      n=10, nnz=1000, index_len=10000, dim=1        |   3400  |      46000     |     9100
      n=10, nnz=10000, index_len=100, dim=0         |    400  |        453     |      857
      n=10, nnz=10000, index_len=100, dim=1         |    400  |       4070     |     1730
      n=10, nnz=10000, index_len=1000, dim=0        |   2840  |       2600     |    13900
      n=10, nnz=10000, index_len=1000, dim=1        |   3700  |      40600     |    16000
      n=10, nnz=10000, index_len=10000, dim=0       |  83200  |      67400     |   160000
      n=10, nnz=10000, index_len=10000, dim=1       |  68000  |     528000     |   190000
      n=100, nnz=1000, index_len=100, dim=0         |     46  |        148     |       31
      n=100, nnz=1000, index_len=100, dim=1         |     45  |        242     |       37
      n=100, nnz=1000, index_len=1000, dim=0        |     68  |        248     |      240
      n=100, nnz=1000, index_len=1000, dim=1        |     66  |        755     |      290
      n=100, nnz=1000, index_len=10000, dim=0       |    370  |        802     |     2250
      n=100, nnz=1000, index_len=10000, dim=1       |    372  |       5430     |     2770
      n=100, nnz=10000, index_len=100, dim=0        |     82  |        210     |      224
      n=100, nnz=10000, index_len=100, dim=1        |     74  |        986     |      270
      n=100, nnz=10000, index_len=1000, dim=0       |    350  |        618     |     2600
      n=100, nnz=10000, index_len=1000, dim=1       |    370  |       4660     |     4560
      n=100, nnz=10000, index_len=10000, dim=0      |   3000  |       3400     |    41680
      n=100, nnz=10000, index_len=10000, dim=1      |   5000  |      47500     |    30400
      n=1000, nnz=10000, index_len=100, dim=0       |     71  |        160     |      185
      n=1000, nnz=10000, index_len=100, dim=1       |     64  |        516     |      190
      n=1000, nnz=10000, index_len=1000, dim=0      |    100  |        249     |     1740
      n=1000, nnz=10000, index_len=1000, dim=1      |     98  |       1030     |     1770
      n=1000, nnz=10000, index_len=10000, dim=0     |    600  |        808     |    18300
      n=1000, nnz=10000, index_len=10000, dim=1     |    663  |       5300     |    18500
      n=1000, nnz=100000, index_len=100, dim=0      |    160  |        258     |     1890
      n=1000, nnz=100000, index_len=100, dim=1      |    200  |       3620     |     2050
      n=1000, nnz=100000, index_len=1000, dim=0     |    500  |        580     |    18700
      n=1000, nnz=100000, index_len=1000, dim=1     |    640  |       7550     |    30000
      n=1000, nnz=100000, index_len=10000, dim=0    |   3400  |       3260     |   186000
      n=1000, nnz=100000, index_len=10000, dim=1    |   3600  |      49600     |   194000
      n=1000, nnz=1000000, index_len=100, dim=0     |    517  |        957     |    18700
      n=1000, nnz=1000000, index_len=100, dim=1     |    680  |      39600     |    37600
      n=1000, nnz=1000000, index_len=1000, dim=0    |   3600  |       4500     |   186000
      n=1000, nnz=1000000, index_len=1000, dim=1    |   5800  |      76400     |   190000
      n=1000, nnz=1000000, index_len=10000, dim=0   |  50000  |      67900     |  1800000
      n=1000, nnz=1000000, index_len=10000, dim=1   |  45000  |     570000     |  1900000

Times are in microseconds (us).

```

</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72710
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2022-05-09 19:59:39 +00:00
Pearu Peterson
436a7be059 Factory functions for sparse CSC, BSR, and BSC tensors
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76634

Tests for Sparse Compressed factory functions
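
A hedged usage sketch of the kind of construction these factory functions enable (the values are illustrative, not from the PRs):

```python
import torch

# Build a 4x4 BSR tensor out of two 2x2 blocks, one per block-row.
crow_indices = torch.tensor([0, 1, 2])   # block-row offsets
col_indices = torch.tensor([0, 1])       # block-column of each block
values = torch.tensor([[[1., 2.], [3., 4.]],
                       [[5., 6.], [7., 8.]]])
bsr = torch.sparse_bsr_tensor(crow_indices, col_indices, values, size=(4, 4))
print(bsr.to_dense())
```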

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76746

Approved by: https://github.com/cpuhrsch
2022-05-04 03:30:41 +00:00
Nikita Shulga
8473173c36 Remove breakpad dependency
This functionality does not seem to be used,
and there are some requests to update the dependency.

Add `third_party` to torch_cpu include directories if compiling with
Caffe2 support, as `caffe2/quantization/server/conv_dnnlowp_op.cc` depends on `third_party/fbgemm/src/RefImplementations.h`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394
Approved by: https://github.com/janeyx99, https://github.com/seemethere
2022-05-03 20:21:55 +00:00
Mike Ruberry
f6bbecf8b5 Adds python ref consistency test, elementwise unary reference inputs, and formats test files
Per title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76626
Approved by: https://github.com/ngimel
2022-05-01 22:42:46 +00:00
Mike Ruberry
4048d4cdd2 [primTorch] Prototype tracer and elementwise unary reference opinfo class
Adds a prototype tracer with no caching support and the `ElementwiseUnaryPythonRefInfo` class. A reference for `floor` is added to test the latter, and the elementwise binary reference inputs are extended to also return noncontiguous inputs. The SampleInput transform operation has been updated to return an actual SampleInput instead of a tuple to facilitate uniform handling of (transformed) SampleInputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76388
Approved by: https://github.com/ngimel
2022-04-27 14:40:21 +00:00
Alban Desmaison
3d7abc0e55 Make -h work with run_test.py
As per title.

### When running `python run_test.py -h`
It used to show:
- The general unittest parser help that we print via a second thread (35545d85dc/torch/testing/_internal/common_utils.py, L467-L470)
- The common_utils parser help

<details><summary>Full result</summary>
<p>

```bash
$ python run_test.py -h
usage: run_test.py [-h] [-v] [-q] [--locals] [-f] [-c] [-b] [-k TESTNAMEPATTERNS] [tests [tests ...]]

positional arguments:
  tests                a list of any number of test modules, classes and test methods.

optional arguments:
  -h, --help           show this help message and exit
  -v, --verbose        Verbose output
  -q, --quiet          Quiet output
  --locals             Show local variables in tracebacks
  -f, --failfast       Stop on first fail or error
  -c, --catch          Catch Ctrl-C and display results so far
  -b, --buffer         Buffer stdout and stderr during tests
  -k TESTNAMEPATTERNS  Only run tests which match the given substring

Examples:
  run_test.py                           - run default set of tests
  run_test.py MyTestSuite               - run suite 'MyTestSuite'
  run_test.py MyTestCase.testSomething  - run MyTestCase.testSomething
  run_test.py MyTestCase                - run all 'test*' test methods
                                       in MyTestCase

usage: run_test.py [-h] [--subprocess] [--seed SEED] [--accept] [--jit_executor JIT_EXECUTOR] [--repeat REPEAT] [--test_bailouts]
                   [--save-xml [SAVE_XML]] [--discover-tests] [--log-suffix LOG_SUFFIX] [--run-parallel RUN_PARALLEL]
                   [--import-slow-tests [IMPORT_SLOW_TESTS]] [--import-disabled-tests [IMPORT_DISABLED_TESTS]]

optional arguments:
  -h, --help            show this help message and exit
  --subprocess          whether to run each test in a subprocess
  --seed SEED
  --accept
  --jit_executor JIT_EXECUTOR
  --repeat REPEAT
  --test_bailouts
  --save-xml [SAVE_XML]
  --discover-tests
  --log-suffix LOG_SUFFIX
  --run-parallel RUN_PARALLEL
  --import-slow-tests [IMPORT_SLOW_TESTS]
  --import-disabled-tests [IMPORT_DISABLED_TESTS]
```

</p>
</details>

It now prints:
- The general unittest parser help, printed the same way. Should we remove this? We can't merge them, unfortunately, as unittest does not accept a parent parser / does not expose its parser for us to take as a parent. (A sketch of the parent-parser mechanism follows the output below.)
- The combined common_utils + run_test parsers help

<details><summary>Full result</summary>
<p>

```bash
$ python run_test.py -h
usage: run_test.py [-h] [-v] [-q] [--locals] [-f] [-c] [-b] [-k TESTNAMEPATTERNS] [tests [tests ...]]

positional arguments:
  tests                a list of any number of test modules, classes and test methods.

optional arguments:
  -h, --help           show this help message and exit
  -v, --verbose        Verbose output
  -q, --quiet          Quiet output
  --locals             Show local variables in tracebacks
  -f, --failfast       Stop on first fail or error
  -c, --catch          Catch Ctrl-C and display results so far
  -b, --buffer         Buffer stdout and stderr during tests
  -k TESTNAMEPATTERNS  Only run tests which match the given substring

Examples:
  run_test.py                           - run default set of tests
  run_test.py MyTestSuite               - run suite 'MyTestSuite'
  run_test.py MyTestCase.testSomething  - run MyTestCase.testSomething
  run_test.py MyTestCase                - run all 'test*' test methods
                                       in MyTestCase

Ignoring disabled issues:  []
usage: run_test.py [-h] [--subprocess] [--seed SEED] [--accept] [--jit_executor JIT_EXECUTOR] [--repeat REPEAT] [--test_bailouts]
                   [--save-xml [SAVE_XML]] [--discover-tests] [--log-suffix LOG_SUFFIX] [--run-parallel RUN_PARALLEL]
                   [--import-slow-tests [IMPORT_SLOW_TESTS]] [--import-disabled-tests [IMPORT_DISABLED_TESTS]] [-v] [--jit]
                   [--distributed-tests] [-core] [-pt] [-c] [-i TESTS [TESTS ...]] [-x TESTS [TESTS ...]] [-f TESTS] [-l TESTS]
                   [--bring-to-front TESTS [TESTS ...]] [--ignore-win-blocklist] [--continue-through-error]
                   [--export-past-test-times [EXPORT_PAST_TEST_TIMES]] [--shard SHARD SHARD] [--exclude-jit-executor]
                   [--exclude-distributed-tests] [--run-specified-test-cases [RUN_SPECIFIED_TEST_CASES]]
                   [--use-specified-test-cases-by {include,bring-to-front}] [--dry-run]
                   [additional_unittest_args [additional_unittest_args ...]]

Run the PyTorch unit test suite

positional arguments:
  additional_unittest_args
                        additional arguments passed through to unittest, e.g., python run_test.py -i sparse -- TestSparse.test_factory_size_check

optional arguments:
  -h, --help            show this help message and exit
  --subprocess          whether to run each test in a subprocess
  --seed SEED
  --accept
  --jit_executor JIT_EXECUTOR
  --repeat REPEAT
  --test_bailouts
  --save-xml [SAVE_XML]
  --discover-tests
  --log-suffix LOG_SUFFIX
  --run-parallel RUN_PARALLEL
  --import-slow-tests [IMPORT_SLOW_TESTS]
  --import-disabled-tests [IMPORT_DISABLED_TESTS]
  -v, --verbose         print verbose information and test-by-test results
  --jit, --jit          run all jit tests
  --distributed-tests, --distributed-tests
                        run all distributed tests
  -core, --core         Only run core tests, or tests that validate PyTorch's ops, modules,and autograd. They are defined by CORE_TEST_LIST.
  -pt, --pytest         If true, use `pytest` to execute the tests. E.g., this runs TestTorch with pytest in verbose and coverage mode: python run_test.py -vci torch -pt
  -c, --coverage        enable coverage
  -i TESTS [TESTS ...], --include TESTS [TESTS ...]
                        select a set of tests to include (defaults to ALL tests). tests must be a part of the TESTS list defined in run_test.py
  -x TESTS [TESTS ...], --exclude TESTS [TESTS ...]
                        select a set of tests to exclude
  -f TESTS, --first TESTS
                        select the test to start from (excludes previous tests)
  -l TESTS, --last TESTS
                        select the last test to run (excludes following tests)
  --bring-to-front TESTS [TESTS ...]
                        select a set of tests to run first. This can be used in situations where you want to run all tests, but care more about some set, e.g. after making a change to a specific component
  --ignore-win-blocklist
                        always run blocklisted windows tests
  --continue-through-error
                        Runs the full test suite despite one of the tests failing
  --export-past-test-times [EXPORT_PAST_TEST_TIMES]
                        dumps test times from previous S3 stats into a file, format JSON
  --shard SHARD SHARD   runs a shard of the tests (taking into account other selections), e.g., --shard 2 3 will break up the selected tests into 3 shards and run the tests in the 2nd shard (the first number should not exceed the second)
  --exclude-jit-executor
                        exclude tests that are run for a specific jit config
  --exclude-distributed-tests
                        exclude distributed tests
  --run-specified-test-cases [RUN_SPECIFIED_TEST_CASES]
                        load specified test cases file dumped from previous OSS CI stats, format CSV.  If all test cases should run for a <test_module> please add a single row:
                         test_filename,test_case_name
                         ...
                         <test_module>,__all__
                         ...
                        how we use the stats will be based on option "--use-specified-test-cases-by".
  --use-specified-test-cases-by {include,bring-to-front}
                        used together with option "--run-specified-test-cases". When specified test case file is set, this option allows the user to control whether to only run the specified test modules or to simply bring the specified modules to front and also run the remaining modules. Note: regardless of this option, we will only run the specified test cases  within a specified test module. For unspecified test modules with the bring-to-front option, all test cases will be run, as one may expect.
  --dry-run             Only list the test that will run.

where TESTS is any of: benchmark_utils/test_benchmark_utils, distributed/_shard/sharded_optim/test_sharded_optim, distributed/_shard/sharded_tensor/ops/test_binary_cmp, distributed/_shard/sharded_tensor/ops/test_elementwise_ops, distributed/_shard/sharded_tensor/ops/test_embedding, distributed/_shard/sharded_tensor/ops/test_embedding_bag, distributed/_shard/sharded_tensor/ops/test_init, distributed/_shard/sharded_tensor/ops/test_linear, distributed/_shard/sharded_tensor/ops/test_math_ops, distributed/_shard/sharded_tensor/test_megatron_prototype, distributed/_shard/sharded_tensor/test_partial_tensor, distributed/_shard/sharded_tensor/test_sharded_tensor, distributed/_shard/sharded_tensor/test_sharded_tensor_reshard, distributed/_shard/sharding_spec/test_sharding_spec, distributed/_shard/test_replicated_tensor, distributed/algorithms/test_join, distributed/elastic/events/lib_test, distributed/elastic/metrics/api_test, distributed/elastic/multiprocessing/api_test, distributed/elastic/timer/api_test, distributed/elastic/timer/local_timer_example, distributed/elastic/timer/local_timer_test, distributed/elastic/utils/distributed_test, distributed/elastic/utils/logging_test, distributed/elastic/utils/util_test, distributed/fsdp/test_flatten_params_wrapper, distributed/fsdp/test_fsdp_apply, distributed/fsdp/test_fsdp_checkpoint, distributed/fsdp/test_fsdp_clip_grad_norm, distributed/fsdp/test_fsdp_comm, distributed/fsdp/test_fsdp_core, distributed/fsdp/test_fsdp_freezing_weights, distributed/fsdp/test_fsdp_grad_acc, distributed/fsdp/test_fsdp_ignored_modules, distributed/fsdp/test_fsdp_input, distributed/fsdp/test_fsdp_memory, distributed/fsdp/test_fsdp_mixed_precision, distributed/fsdp/test_fsdp_multiple_forward, distributed/fsdp/test_fsdp_multiple_wrapping, distributed/fsdp/test_fsdp_optim_state, distributed/fsdp/test_fsdp_overlap, distributed/fsdp/test_fsdp_pure_fp16, distributed/fsdp/test_fsdp_state_dict, distributed/fsdp/test_fsdp_summon_full_params, distributed/fsdp/test_fsdp_traversal, distributed/fsdp/test_fsdp_uneven, distributed/fsdp/test_shard_utils, distributed/fsdp/test_utils, distributed/fsdp/test_wrap, distributed/nn/jit/test_instantiator, distributed/optim/test_zero_redundancy_optimizer, distributed/pipeline/sync/skip/test_api, distributed/pipeline/sync/skip/test_gpipe, distributed/pipeline/sync/skip/test_inspect_skip_layout, distributed/pipeline/sync/skip/test_leak, distributed/pipeline/sync/skip/test_portal, distributed/pipeline/sync/skip/test_stash_pop, distributed/pipeline/sync/skip/test_tracker, distributed/pipeline/sync/skip/test_verify_skippables, distributed/pipeline/sync/test_balance, distributed/pipeline/sync/test_bugs, distributed/pipeline/sync/test_checkpoint, distributed/pipeline/sync/test_copy, distributed/pipeline/sync/test_deferred_batch_norm, distributed/pipeline/sync/test_dependency, distributed/pipeline/sync/test_inplace, distributed/pipeline/sync/test_microbatch, distributed/pipeline/sync/test_phony, distributed/pipeline/sync/test_pipe, distributed/pipeline/sync/test_pipeline, distributed/pipeline/sync/test_stream, distributed/pipeline/sync/test_transparency, distributed/pipeline/sync/test_worker, distributed/rpc/cuda/test_tensorpipe_agent, distributed/rpc/test_faulty_agent, distributed/rpc/test_tensorpipe_agent, distributed/test_c10d_common, distributed/test_c10d_gloo, distributed/test_c10d_nccl, distributed/test_c10d_spawn_gloo, distributed/test_c10d_spawn_nccl, distributed/test_data_parallel, distributed/test_distributed_spawn, distributed/test_launcher, 
distributed/test_nccl, distributed/test_pg_wrapper, distributed/test_store, distributions/test_constraints, distributions/test_distributions, lazy/test_bindings, lazy/test_extract_compiled_graph, lazy/test_ts_opinfo, test_ao_sparsity, test_autocast, test_autograd, test_binary_ufuncs, test_bundled_inputs, test_complex, test_cpp_api_parity, test_cpp_extensions_aot_ninja, test_cpp_extensions_aot_no_ninja, test_cpp_extensions_jit, test_cuda, test_cuda_primary_ctx, test_dataloader, test_datapipe, test_deploy, test_deploy, test_dispatch, test_expanded_weights, test_foreach, test_function_schema, test_functional_autograd_benchmark, test_functional_optim, test_functionalization, test_futures, test_fx, test_fx_experimental, test_hub, test_import_stats, test_indexing, test_jit, test_jit_autocast, test_jit_cuda_fuser, test_jit_disabled, test_jit_fuser_legacy, test_jit_fuser_te, test_jit_legacy, test_jit_profiling, test_license, test_linalg, test_logging, test_masked, test_mkldnn, test_mobile_optimizer, test_model_dump, test_module_init, test_modules, test_monitor, test_multiprocessing, test_multiprocessing_spawn, test_namedtensor, test_namedtuple_return_api, test_native_functions, test_nestedtensor, test_nn, test_numba_integration, test_numpy_interop, test_openmp, test_ops, test_ops_gradients, test_ops_jit, test_optim, test_overrides, test_package, test_per_overload_api, test_profiler, test_pruning_op, test_public_bindings, test_python_dispatch, test_pytree, test_quantization, test_reductions, test_scatter_gather_ops, test_serialization, test_set_default_mobile_cpu_allocator, test_shape_ops, test_show_pickle, test_sort_and_select, test_sparse, test_sparse_csr, test_spectral_ops, test_stateless, test_tensor_creation_ops, test_tensorboard, test_tensorexpr, test_tensorexpr_pybind, test_testing, test_torch, test_type_hints, test_type_info, test_type_promotion, test_unary_ufuncs, test_utils, test_view_ops, test_vmap, test_vulkan, test_xnnpack_integration
```

</p>
</details>
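
For illustration, a hedged sketch of the parent-parser mechanism that produces the combined help above (the argument names are illustrative):

```python
import argparse

# The common_utils parser is built with add_help=False and then listed as a
# parent of run_test.py's parser, so both sets of options print together.
common = argparse.ArgumentParser(add_help=False)
common.add_argument("--seed", type=int, default=1234)

runner = argparse.ArgumentParser(
    description="Run the PyTorch unit test suite", parents=[common])
runner.add_argument("--dry-run", action="store_true",
                    help="Only list the test that will run.")

print(runner.parse_args(["--seed", "7", "--dry-run"]))
```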

### When running anything else (for example  `python test_autograd.py -h`)
It did not change and still shows:
- The general unittest parser help that we print via a second thread
- The common_utils's parser help
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76152
Approved by: https://github.com/malfet, https://github.com/seemethere
2022-04-25 14:01:33 +00:00
Thiago Crepaldi
90d31cb311 Emit ATen ops when symbolics raise + minor fixes
Currently `torch.onnx.export(..., operator_export_type=OperatorExportTypes.ONNX_ATEN_FALLBACK)` only issues ATen ops through explicit requests (e.g., `g.at()` calls) inside each op's symbolic function. This is done based on specific conditions, such as `operator_export_type == OperatorExportTypes.ONNX_ATEN_FALLBACK` or `is_caffe2_aten_fallback()`.

This PR extends the ATen fallback mechanism to scenarios where the symbolic function raises `RuntimeError` during export. The idea is that a partial implementation of an existing ONNX op can fall back to ATen as a last resort. That is valuable because each operator can have many input combinations, and not all of them are always implemented.

A minor fix was made to ensure the `overload_name` attribute is added to explicit ATen-op fallback requests when a symbolic is not registered for a particular op.

ps: The behavior for builds with BUILD_CAFFE2=1 is not changed to ensure BC.
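
A hedged sketch of an export that uses the fallback (the model and output path are illustrative):

```python
import torch

# With ONNX_ATEN_FALLBACK, an op whose symbolic raises RuntimeError during
# export is emitted as an ATen op in the graph instead of failing the export.
model = torch.nn.Linear(4, 2)
torch.onnx.export(
    model,
    torch.randn(1, 4),
    "model.onnx",
    operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK,
)
```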
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74759
Approved by: https://github.com/garymm, https://github.com/msaroufim
2022-04-23 21:24:25 +00:00
Thiago Crepaldi
e07134092f Add warning when importing caffe2 on build without BUILD_CAFFE2=1
Confusing backtraces are shown to users when they run Caffe2 scripts (or tests) on PyTorch builds without Caffe2 enabled via `BUILD_CAFFE2=1`.

This PR adds warnings (in more than one place) that give the user a friendly message, helping them overcome the problem by themselves.
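
A minimal sketch of the kind of guard this adds (the exact wording and locations are the PR's; this snippet is illustrative only):

```
import warnings

try:
    from caffe2.python import workspace  # fails without BUILD_CAFFE2=1
except ImportError:
    # A friendly warning beats a confusing backtrace on Caffe2-less builds.
    warnings.warn(
        "This PyTorch build was compiled without Caffe2 (BUILD_CAFFE2=1), "
        "so Caffe2 scripts and tests will not work."
    )
```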

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73770
Approved by: https://github.com/BowenBao, https://github.com/malfet, https://github.com/garymm
2022-04-21 12:28:10 +00:00
Edward Z. Yang
ee955b8bb9 Cannibalize noarch CI job into crossref CI job
crossref is a new strategy for tests where you want to run a normal PyTorch API call, separately run some variation of that call (e.g., the same call but with all arguments as meta tensors), and then cross-reference the results to check that they are consistent. Any logic you add to CrossRefMode gets run on *every* PyTorch API call made in the course of PyTorch's test suite. This can be a good choice for correctness testing if OpInfo testing is not exhaustive enough.

For now, the crossref test doesn't do anything except verify that
we can validly push a mode onto the torch function mode stack for all
functions.
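
A hedged sketch of the idea using a `TorchFunctionMode` subclass (`PassthroughCrossRef` is illustrative and not the PR's actual mode class):

```
import torch
from torch.overrides import TorchFunctionMode

class PassthroughCrossRef(TorchFunctionMode):
    def __torch_function__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # A real crossref test would also run a variant of `func`
        # (e.g., on meta tensors) and compare the two results here.
        return func(*args, **kwargs)

with PassthroughCrossRef():
    torch.add(torch.ones(2), torch.ones(2))
```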

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75988

Approved by: https://github.com/seemethere
2022-04-20 11:56:25 +00:00
Edward Z. Yang
28a3654668 Make PYTORCH_TEST_WITH_SLOW_GRADCHECK consistent with other test envvars
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75987

Approved by: https://github.com/soulitzer
2022-04-19 01:13:08 +00:00
Edward Z. Yang
30943d1610 Remove noarchTest decorator
These tests are cheap, so it doesn't matter if we run them on all
configs. This is in preparation for removing the noarch build
configuration entirely.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75985

Approved by: https://github.com/seemethere, https://github.com/cbalioglu
2022-04-19 00:48:49 +00:00
PyTorch MergeBot
d79d9fa283 Revert "Remove breakpad dependency"
This reverts commit 9aa3c7fd83.

Reverted https://github.com/pytorch/pytorch/pull/75394 on behalf of https://github.com/malfet
2022-04-17 17:58:51 +00:00
Nikita Shulga
9aa3c7fd83 Remove breakpad dependency
This functionality does not seem to be used,
and there are some requests to update the dependency.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394
Approved by: https://github.com/janeyx99, https://github.com/seemethere
2022-04-17 17:43:45 +00:00
Thiago Crepaldi
9bbe1d632e Fix ONNX ATen fallback for non-caffe2 engines
This PR introduces 3 BC changes:

First, this PR propagates the `BUILD_CAFFE2` flag to `libtorch` and `libtorch_python`, which is necessary for non-Caffe2 ONNX runtimes when using the `ONNX_ATEN_FALLBACK` operator export type.

Second, as a complement to https://github.com/pytorch/pytorch/pull/68490, this PR refactors Caffe2's ATen op symbolics to consider not only the `operator_export_type` (aka `ONNX_ATEN_FALLBACK`) when emitting Caffe2 ATen ops, but also whether `BUILD_CAFFE2` (exposed in the Python binding as `torch.onnx._CAFFE2_ATEN_FALLBACK`) is set.

Lastly, it renames `onnx::ATen` to `aten::ATen` for ONNX spec consistency, in a BC fashion.
ONNX doesn't have an `ATen` op in its spec, but the PyTorch ONNX converter emits them. Non-Caffe2 backend engines would be misled by such an operator's name/domain. A non-ideal workaround would be to handle ATen ops based on their name and ignore the (non-compliant) domain. Moreover, users could incorrectly file bugs against either ONNX or ONNX Runtime when they inspect the model and notice the presence of an unspecified ONNX operator.
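
For illustration, one way to spot the fallback nodes in an exported model (assumes the `onnx` package and an exported `model.onnx`; the exact domain string may differ by version):

```
import onnx

model = onnx.load("model.onnx")
for node in model.graph.node:
    if node.op_type == "ATen":
        # After this PR the op carries an aten domain rather than sitting
        # in the default ONNX domain, so non-Caffe2 backends aren't misled.
        print(node.domain, node.op_type)
```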
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73954
Approved by: https://github.com/BowenBao, https://github.com/malfet, https://github.com/garymm, https://github.com/jiafatom
2022-04-14 23:18:45 +00:00
Ivan Yashchuk
c7ae23b50e Extend CSR constructor to support batched indices and values
This is the first portion of the changes required to enable the batched CSR format described in https://github.com/pytorch/pytorch/issues/60854#batched-CSR-computation.

Currently, only the same batch shape for indices and values is allowed. In the future, we could enable "broadcasting" of indices and batched values, as done in xFormers (dd96b8d8be/xformers/components/attention/_sputnik_sparse.py (L441)).

This PR adds the possibility to construct a batched CSR matrix with `torch.sparse_csr_tensor`; this batched CSR can be converted to a dense tensor with a `.to_dense()` call.
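
A small sketch of the construction this enables (shapes chosen for the example: a batch of two 2x2 CSR matrices sharing a sparsity pattern):

```
import torch

crow_indices = torch.tensor([[0, 1, 2], [0, 1, 2]])
col_indices = torch.tensor([[0, 1], [0, 1]])
values = torch.tensor([[1., 2.], [3., 4.]])

# Batched CSR: a leading batch dimension on indices and values.
batched = torch.sparse_csr_tensor(crow_indices, col_indices, values,
                                  size=(2, 2, 2))
dense = batched.to_dense()  # shape (2, 2, 2)
```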
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74542
Approved by: https://github.com/cpuhrsch
2022-04-07 17:10:52 +00:00
PyTorch MergeBot
6d832a7a20 Revert "Extend CSR constructor to support batched indices and values"
This reverts commit eead599039.

Reverted https://github.com/pytorch/pytorch/pull/74542 on behalf of https://github.com/b0noI
2022-04-05 21:39:34 +00:00
Ivan Yashchuk
eead599039 Extend CSR constructor to support batched indices and values
This is the first portion of the changes required to enable the batched CSR format described in https://github.com/pytorch/pytorch/issues/60854#batched-CSR-computation.

Currently, only the same batch shape for indices and values is allowed. In the future, we could enable "broadcasting" of indices and batched values, as done in xFormers (dd96b8d8be/xformers/components/attention/_sputnik_sparse.py (L441)).

This PR adds the possibility to construct a batched CSR matrix with `torch.sparse_csr_tensor`; this batched CSR can be converted to a dense tensor with a `.to_dense()` call.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74542
Approved by: https://github.com/cpuhrsch
2022-04-04 22:09:44 +00:00
PyTorch MergeBot
cc23725e89 Revert "Extend CSR constructor to support batched indices and values"
This reverts commit c074a53002.

Reverted https://github.com/pytorch/pytorch/pull/74542 on behalf of https://github.com/malfet
2022-03-30 19:54:26 +00:00
Ivan Yashchuk
c074a53002 Extend CSR constructor to support batched indices and values
This is the first portion of the changes required to enable the batched CSR format described in https://github.com/pytorch/pytorch/issues/60854#batched-CSR-computation.

Currently, only the same batch shape for indices and values is allowed. In the future, we could enable "broadcasting" of indices and batched values, as done in xFormers (dd96b8d8be/xformers/components/attention/_sputnik_sparse.py (L441)).

This PR adds the possibility to construct a batched CSR matrix with `torch.sparse_csr_tensor`; this batched CSR can be converted to a dense tensor with a `.to_dense()` call.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74542
Approved by: https://github.com/cpuhrsch
2022-03-29 21:20:25 +00:00
Elias Ellison
6694fdaccd Clean up profiling mode and profiling executor strategy (#73875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73875

Previously we had a few settings:
- getExecutor - which toggled between Profiling Executor and Legacy
- getGraphOptimize - if true, overrides PE/Legacy to run with simple executor (no optimizations)
and then...
- getProfilingMode - which would set PE to 0 specializations.

The last mode is redundant with getGraphOptimize, so we should just remove it and use getGraphOptimize in these cases. Keeping it could lead to invalid combinations of logic - what does it mean if getProfilingMode is true but getExecutor is set to false? That combination would hit a bug in specialize_autograd_zero; see: https://github.com/pytorch/pytorch/blob/master/torch%2Fcsrc%2Fjit%2Fpasses%2Fspecialize_autogradzero.cpp#L93.
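
From the Python side, the surviving knobs look roughly like this (private bindings; names as of this era and subject to change):

```
import torch

# Choose between the profiling executor (PE) and the legacy executor.
torch._C._jit_set_profiling_executor(True)

# The getGraphOptimize knob: when optimization is off, execution falls
# back to the simple executor (no optimizations), overriding PE/legacy.
torch._C._set_graph_executor_optimize(False)
```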

The tests here are failing but get fixed with the PR above this one, so I'll squash for landing.

Test Plan: Imported from OSS

Reviewed By: cpuhrsch

Differential Revision: D34938130

Pulled By: eellison

fbshipit-source-id: 1a9c0ae7f6d1cfddc2ed3499a5af611053ae5e1b
(cherry picked from commit cf69ce3d155ba7d334022c42fb2cee54bb088c23)
2022-03-29 18:38:51 +00:00
Christian Puhrsch
e55b73d65a Add strided layout support for to_dense
Fixes #59958

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74486
Approved by: https://github.com/pearu, https://github.com/suo
2022-03-29 00:12:48 +00:00
Tristan Rice
5b915e844c c10d: retry dns lookup failures (#74641)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74641

This makes DNS hostname lookup failures retryable, since in some environments, such as Kubernetes, hostnames are not guaranteed to be resolvable until the job starts. Retrying eliminates the race condition.
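
A self-contained sketch of the retry idea (the real retry lives in c10d's C++ store code; `resolve_with_retry` and its parameters are made up for illustration):

```
import socket
import time

def resolve_with_retry(host, attempts=5, delay=1.0):
    """Retry DNS resolution until the hostname becomes resolvable."""
    for i in range(attempts):
        try:
            return socket.getaddrinfo(host, None)
        except socket.gaierror:
            if i == attempts - 1:
                raise
            time.sleep(delay)
```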

This also fixes `sandcastle_skip_if` when used on a class instead of a method. Previously such classes wouldn't inherit from TestCase, so they just wouldn't run under buck at all.
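
The class-decoration pitfall, sketched with a hypothetical decorator (unittest's `__unittest_skip__` attributes are the standard skip mechanism):

```
import unittest

def skip_if_sketch(condition, reason):
    def decorator(obj):
        if condition and isinstance(obj, type):
            # Mark the class skipped but keep it a TestCase subclass so
            # the runner still collects it (the bug was losing that).
            obj.__unittest_skip__ = True
            obj.__unittest_skip_why__ = reason
            return obj
        return unittest.skipIf(condition, reason)(obj)
    return decorator
```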

Fixes https://github.com/pytorch/pytorch/issues/73682

Test Plan:
Added a unit test

```
buck test //caffe2/test/distributed:test_store
```

Reviewed By: aivanou

Differential Revision: D35092284

fbshipit-source-id: d40bf187e52c41f551e4fe41c536b2b0015588ee
(cherry picked from commit f8908309d8ee64c25ee466a6b4922f34f2b7618e)
2022-03-24 19:51:09 +00:00
Maksim Dmitriyevich Podkorytov
6a664481d5 Print reason for test skipped in CI (#74451)
Summary:
Issue: https://github.com/pytorch/pytorch/issues/69014 (the skip reason is not printed for tests running in CI)

Cause: locally, tests are run with the test runner built into unittest. When the IN_CI env variable is true, tests are run with the XML test runner from unittest-xml-reporting. The two runners print their summaries differently.

Solution: patch the printing logic in unittest-xml-reporting when a test is skipped
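
For context, a minimal reproduction of the information at stake (plain unittest; the CI path swaps in the xmlrunner-based runner):

```
import unittest

class Example(unittest.TestCase):
    @unittest.skip("requires CUDA")
    def test_gpu_only(self):
        pass

if __name__ == "__main__":
    # The built-in runner prints: skipped 'requires CUDA'; the XML runner
    # used when IN_CI is set did not, until this patch.
    unittest.main()
```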

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74451

Test Plan: examine CI run

Reviewed By: janeyx99

Differential Revision: D35050675

Pulled By: tenpercent

fbshipit-source-id: e3421687d68932079ba156153abeefee368b2fb3
(cherry picked from commit e2356d8ecc618dad9c7c556d40b47c1a6c53f68a)
2022-03-23 00:02:26 +00:00
Mike Ruberry
0aa3c39e5f Extends OpInfo architecture with reference inputs, adds them for elementwise binary operators
This PR extends our OpInfo test architecture with "reference inputs," an optional expansion of typical sample inputs that allows for more thorough testing. Currently only the elementwise binary operations implement an extended set of reference inputs. This PR also cleans up some smaller OpInfo-related issues, including several bugs, and it identified https://github.com/pytorch/pytorch/issues/74279.

A reference inputs function can be specified for an OpInfo by filling in its "reference_inputs_func" metadata. If this is done, it's recommended that the reference inputs function first call the sample inputs function and then produce additional sample inputs. See `reference_inputs_elementwise_binary` for an example of this pattern.
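
A hedged sketch of that pattern (the op and helper names are illustrative, not actual OpInfo entries):

```
import torch
from torch.testing import make_tensor

def sample_inputs_foo(device, dtype, requires_grad):
    yield make_tensor((3, 3), device=device, dtype=dtype,
                      requires_grad=requires_grad)

def reference_inputs_foo(device, dtype, requires_grad):
    # First reuse the ordinary sample inputs...
    yield from sample_inputs_foo(device, dtype, requires_grad)
    # ...then extend them with more thorough cases, e.g. empty and 0-d tensors.
    yield make_tensor((0,), device=device, dtype=dtype,
                      requires_grad=requires_grad)
    yield make_tensor((), device=device, dtype=dtype,
                      requires_grad=requires_grad)
```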

In addition to implementing reference inputs for the elementwise binary operations, this PR improves their consistency and simplifies how their metadata is represented. The great majority now use a generic sample input function, and those that need extensions start by calling the generic sample input function and then add additional samples. This removes many older sample input functions. The BinaryUfuncInfo subclass also now allows specifying scalar support more precisely, and reference inputs and error inputs are generated from this metadata to ensure it's correct.

cc @kshitij12345 @pmeier @zou3519 @Chillee

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74280
Approved by: https://github.com/ngimel
2022-03-21 03:24:16 +00:00