Commit Graph

285 Commits

Author SHA1 Message Date
Richard Zou
645a9235f0 Add functorch shards for windows CI (#82161)
Test Plan:
- wait for CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82161
Approved by: https://github.com/kit1980
2022-07-26 14:21:13 +00:00
samdow
2ac24675cc get rid of push_torch_{dispatch, function}_mode (#78215)
Currently we have two ways of doing the same thing for torch dispatch and function modes:
`with push_torch_dispatch_mode(X)` or `with X.push(...)`
is now equivalent to
`with X()`

This removes the first API (which is older and private, so we don't need to go through a deprecation cycle).

There is some risk that this might land-race with a PR that uses the old API, but in general it seems like most callers use the `with X()` API or `enable_torch_dispatch_mode(X())`, which isn't being removed.

EDIT: kept the `with X.push(...)` API since there were ~3 land races with it over the past day or so, but made it emit a warning asking users to use the other API.
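
For illustration, a minimal sketch of the surviving context-manager style, assuming today's `TorchDispatchMode` in `torch.utils._python_dispatch` (`LoggingMode` is a made-up example, not part of this PR):

```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode

# LoggingMode is illustrative; the point is that a mode is now entered
# directly with `with X()` rather than via push_torch_dispatch_mode(X).
class LoggingMode(TorchDispatchMode):
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        print(f"dispatching {func}")
        return func(*args, **(kwargs or {}))

with LoggingMode():
    torch.ones(2) + 1
```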
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78215
Approved by: https://github.com/ezyang
2022-07-22 18:56:37 +00:00
soulitzer
f595467e5c Reenable slow gradcheck and make it pass (#80514)
Context: For a while, slow gradcheck CI was skipping nearly all tests, which hid the fact that it should have been failing and timing out (10+ hour runtime for TestGradients). The CI configuration has since been fixed to correct this, revealing the test failures. This PR re-enables slow gradcheck CI and makes it pass again.

This PR:
- makes slow and failing tests run in fast gradcheck mode only
- reduces the input size for unary/binary ufuncs in slow gradcheck only (alternatively, skips the test entirely)
- skips entire test files on the slow gradcheck runner if they don't use gradcheck (test_ops, test_meta, test_decomp, test_ops_jit)
- reduces the input size for some ops

Follow ups:
1. Investigate slow mode failures https://github.com/pytorch/pytorch/issues/80411
2. See if we can re-enable slow gradcheck tests for some of the slow tests by reducing the sizes of their inputs

The following tests fail in slow mode; they now run in fast mode only.
```
test_fn_fwgrad_bwgrad___rmod___cuda_float64
test_fn_fwgrad_bwgrad_linalg_householder_product_cuda_complex128
test_fn_fwgrad_bwgrad__masked_prod_cuda_complex128
test_fn_fwgrad_bwgrad__masked_prod_cuda_float64
test_fn_fwgrad_bwgrad_linalg_matrix_power_cuda_complex128
test_fn_fwgrad_bwgrad_cat_cuda_complex128
test_fn_fwgrad_bwgrad_linalg_lu_factor_ex_cuda_float64
test_fn_fwgrad_bwgrad_copysign_cuda_float64
test_fn_fwgrad_bwgrad_cholesky_inverse_cuda_complex128
test_fn_fwgrad_bwgrad_float_power_cuda_complex128
test_fn_fwgrad_bwgrad_fmod_cuda_float64
test_fn_fwgrad_bwgrad_float_power_cuda_float64
test_fn_fwgrad_bwgrad_linalg_lu_cuda_float64
test_fn_fwgrad_bwgrad_remainder_cuda_float64
test_fn_fwgrad_bwgrad_repeat_cuda_complex128
test_fn_fwgrad_bwgrad_prod_cuda_complex128
test_fn_fwgrad_bwgrad_slice_scatter_cuda_float64
test_fn_fwgrad_bwgrad_tile_cuda_complex128
test_fn_fwgrad_bwgrad_pow_cuda_float64
test_fn_fwgrad_bwgrad_pow_cuda_complex128
test_fn_fwgrad_bwgrad_fft_*
test_fn_fwgrad_bwgrad_zero__cuda_complex128
test_fn_gradgrad_linalg_lu_factor_cuda_float64
test_fn_grad_div_trunc_rounding_cuda_float64
test_fn_grad_div_floor_rounding_cuda_float64
```

Marks the OpInfos for the following ops, which run slowly in slow gradcheck, as `fast_gradcheck`-only (the first column is an index, the second is runtime in seconds; see the sketch after these listings):
```
0  918.722  test_fn_fwgrad_bwgrad_nn_functional_conv_transpose3d_cuda_float64
1  795.042  test_fn_fwgrad_bwgrad_nn_functional_unfold_cuda_complex128
2  583.63  test_fn_fwgrad_bwgrad_nn_functional_max_pool3d_cuda_float64
3  516.946  test_fn_fwgrad_bwgrad_svd_cuda_complex128
4  503.179  test_fn_fwgrad_bwgrad_linalg_svd_cuda_complex128
5  460.985  test_fn_fwgrad_bwgrad_linalg_lu_cuda_complex128
6  401.04  test_fn_fwgrad_bwgrad_linalg_lstsq_grad_oriented_cuda_complex128
7  353.671  test_fn_fwgrad_bwgrad_nn_functional_max_pool2d_cuda_float64
8  321.903  test_fn_fwgrad_bwgrad_nn_functional_gaussian_nll_loss_cuda_float64
9  307.951  test_fn_fwgrad_bwgrad_stft_cuda_complex128
10  266.104  test_fn_fwgrad_bwgrad_svd_lowrank_cuda_float64
11  221.032  test_fn_fwgrad_bwgrad_istft_cuda_complex128
12  183.741  test_fn_fwgrad_bwgrad_lu_unpack_cuda_complex128
13  132.019  test_fn_fwgrad_bwgrad_nn_functional_unfold_cuda_float64
14  125.343  test_fn_fwgrad_bwgrad_nn_functional_pad_constant_cuda_complex128
15  124.2  test_fn_fwgrad_bwgrad_kron_cuda_complex128
16  123.721  test_fn_fwgrad_bwgrad_pca_lowrank_cuda_float64
17  121.074  test_fn_fwgrad_bwgrad_nn_functional_max_unpool3d_cuda_float64
18  119.387  test_fn_fwgrad_bwgrad_rot90_cuda_complex128
19  112.889  test_fn_fwgrad_bwgrad__masked_normalize_cuda_complex128
20  107.541  test_fn_fwgrad_bwgrad_dist_cuda_complex128
21  106.727  test_fn_fwgrad_bwgrad_diff_cuda_complex128
22  104.588  test_fn_fwgrad_bwgrad__masked_cumprod_cuda_complex128
23  100.135  test_fn_fwgrad_bwgrad_nn_functional_feature_alpha_dropout_with_train_cuda_float64
24  88.359  test_fn_fwgrad_bwgrad_mH_cuda_complex128
25  86.214  test_fn_fwgrad_bwgrad_nn_functional_max_unpool2d_cuda_float64
26  83.037  test_fn_fwgrad_bwgrad_nn_functional_bilinear_cuda_float64
27  79.987  test_fn_fwgrad_bwgrad__masked_cumsum_cuda_complex128
28  77.822  test_fn_fwgrad_bwgrad_diag_embed_cuda_complex128
29  76.256  test_fn_fwgrad_bwgrad_mT_cuda_complex128
30  74.039  test_fn_fwgrad_bwgrad_linalg_lu_solve_cuda_complex128
```
```
0  334.142  test_fn_fwgrad_bwgrad_unfold_cuda_complex128
1  312.791  test_fn_fwgrad_bwgrad_linalg_lu_factor_cuda_complex128
2  121.963  test_fn_fwgrad_bwgrad_nn_functional_max_unpool3d_cuda_float64
3  108.085  test_fn_fwgrad_bwgrad_diff_cuda_complex128
4  89.418  test_fn_fwgrad_bwgrad_nn_functional_max_unpool2d_cuda_float64
5  72.231  test_fn_fwgrad_bwgrad___rdiv___cuda_complex128
6  69.433  test_fn_fwgrad_bwgrad___getitem___cuda_complex128
7  68.582  test_fn_fwgrad_bwgrad_ldexp_cuda_complex128
8  68.572  test_fn_fwgrad_bwgrad_linalg_pinv_cuda_complex128
9  67.585  test_fn_fwgrad_bwgrad_nn_functional_glu_cuda_float64
10  66.567  test_fn_fwgrad_bwgrad_lu_cuda_float64
```
```
0  630.13  test_fn_gradgrad_nn_functional_conv2d_cuda_complex128
1  81.086  test_fn_gradgrad_linalg_solve_triangular_cuda_complex128
2  71.332  test_fn_gradgrad_norm_cuda_complex128
3  64.308  test_fn_gradgrad__masked_std_cuda_complex128
4  59.519  test_fn_gradgrad_div_no_rounding_mode_cuda_complex128
5  58.836  test_fn_gradgrad_nn_functional_adaptive_avg_pool3
```
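
For illustration, a hedged sketch of the marking itself (the `sample_inputs_func` below is an empty stand-in, not the real one; actual entries live in common_methods_invocations.py):

```python
import torch
from torch.testing._internal.common_dtype import floating_types
from torch.testing._internal.common_methods_invocations import OpInfo

# gradcheck_fast_mode=True opts the op out of slow-mode gradcheck, so the
# gradcheck-based tests above run in fast mode only.
op = OpInfo(
    "nn.functional.conv_transpose3d",
    dtypes=floating_types(),
    gradcheck_fast_mode=True,
    sample_inputs_func=lambda op_info, device, dtype, requires_grad, **kw: [],
)
```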

Reduces the sizes of the inputs for:
- diff
- diag_embed

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80514
Approved by: https://github.com/albanD
2022-07-22 02:05:37 +00:00
vitrioil
d4043f0d95 Added generator check for parametrize and ops (#81263)
Fixes #81234

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81263
Approved by: https://github.com/jbschlosser
2022-07-21 19:32:54 +00:00
Catherine Lee
06a0cfc0ea pytest to run test_ops, test_ops_gradients, test_ops_jit in non linux cuda environments (#79898)
This PR uses pytest to run test_ops, test_ops_gradients, and test_ops_jit in parallel in non-Linux-CUDA environments to decrease TTS (time to signal). I am excluding Linux CUDA because running in parallel there results in out-of-memory errors.

Notes:
* update the hypothesis version for compatibility with pytest
* use pytest-rerunfailures to rerun failed tests (similar to flaky tests, although these test files generally don't have flaky tests); see the sketch after this list
  * reruns are denoted by a rerun tag in the XML. Failed reruns also have the failure tag; successes (meaning that the test is flaky) do not have the failure tag.
* see https://docs.google.com/spreadsheets/d/1aO0Rbg3y3ch7ghipt63PG2KNEUppl9a5b18Hmv2CZ4E/edit#gid=602543594 for info on the speedup (or slowdown, in the case of slow tests)
  * expecting Windows test times to decrease by 60 minutes total
* the slow-test infra is expected to stay the same - verified by running pytest and unittest on the same job and checking the number of skipped/run tests
* test reports to S3 changed - adds an entirely new table to keep track of invoking_file times
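
A hedged sketch of the invocation (the `-n` flag assumes pytest-xdist and `--reruns` assumes pytest-rerunfailures; the exact flags run_test.py passes may differ):

```python
import pytest

# Run one test file across worker processes, retrying failures so that
# flaky tests can be told apart from consistent failures.
ret = pytest.main(["test_ops.py", "-n", "auto", "--reruns", "2", "-v"])
```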
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79898
Approved by: https://github.com/malfet, https://github.com/janeyx99
2022-07-19 19:50:57 +00:00
Richard Zou
c9a0204ef4 Disable functorch modes in testing's freeze_rng_state(), part 2 (#81109)
I forgot to update one line in
https://github.com/pytorch/pytorch/pull/81006. torch.get_rng_state()
returns a Tensor that can also be affected by modes, so it also needs a
no_functorch() context manager.

Test Plan:
- tested with functorch tests on CUDA (that's how I discovered this
problem)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81109
Approved by: https://github.com/samdow
2022-07-08 20:18:56 +00:00
Animesh Jain
b26d664810 Switch on TorchDynamo for PyTorch tests (#81083)
As title. Based on https://github.com/pytorch/pytorch/pull/80106
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81083
Approved by: https://github.com/ezyang
2022-07-08 16:08:11 +00:00
Richard Zou
26582056fa Disable functorch modes in testing's freeze_rng_state() (#81006)
freeze_rng_state() is a helper we use to test random operations in
OpInfos: it ensures that every time the op is called, the RNG state is
the same.

Unfortunately this doesn't work with functorch, because:
- torch.cuda.set_rng_state() clones a Tensor and then grabs its data_ptr
- functorch's modes cause functorch wrappers to get emitted on the
.clone() call (even if the thing being cloned is a regular Tensor).

Tensor subclasses also had this problem. This PR applies the same
solution as torch_dispatch did before: we're just going to disable
functorch dispatch when setting the rng state.

In the long run, torch_dispatch should probably have an option to
interpose on torch.cuda.set_rng_state or generator.set_state... but I
didn't want to think very hard right now.
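
For illustration, a minimal sketch of the freeze_rng_state idea (the real helper in torch.testing._internal.common_utils additionally disables functorch dispatch around the RNG-state calls, which is what this PR fixes):

```python
from contextlib import contextmanager
import torch

# Save the RNG state on entry and restore it on exit, so every call made
# under the context sees an identical random stream.
@contextmanager
def freeze_rng_state():
    rng_state = torch.get_rng_state()
    cuda_state = torch.cuda.get_rng_state() if torch.cuda.is_available() else None
    try:
        yield
    finally:
        if cuda_state is not None:
            torch.cuda.set_rng_state(cuda_state)
        torch.set_rng_state(rng_state)
```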

Test Plan:
- tested with functorch tests (those tests were previously being
skipped, now I can unskip some of them).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81006
Approved by: https://github.com/samdow
2022-07-08 03:28:49 +00:00
Animesh Jain
1d90d6ee60 Setup for running PyTorch tests with TorchDynamo and skips for known failing tests (#80106)
@ezyang I am going to keep adding more skips in this PR for now. Once we have the CI running, I will replace them with the appropriate decorators.

cc @mlazos, we should add those tests in test_ops.py in this PR as well

cc @jansel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80106
Approved by: https://github.com/ezyang, https://github.com/jansel
2022-07-07 18:57:33 +00:00
Anish Mahishi
af9b8a691f Refactoring the AO experimental sparsity tests
The tests are passing, and this diff refactors them to conform to best practices.

Differential Revision: [D37316967](https://our.internmc.facebook.com/intern/diff/D37316967/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D37316967/)!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79972

Approved by: https://github.com/z-a-f
2022-06-22 16:52:32 +00:00
Eric Han
06274d7a48 Add test for torchscripting nn.TransformerEncoder, including fast path (#79796)
Summary:
Add a test just to check whether TransformerEncoder crashes when enumerating over the params [with_no_grad, use_torchscript, training].

The motivation was that the TransformerEncoder fast path (so with_no_grad=True) together with use_torchscript=True would crash with the issue that NestedTensor doesn't have a size. This happened because the TransformerEncoder fast path automatically generates a NestedTensor as a perf optimization, and TorchScript attempts to find intermediate tensor sizes while it optimizes; but NestedTensor has not implemented a size method, so things fail.

This test goes together with this fix https://github.com/pytorch/pytorch/pull/79480
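
For illustration, a hedged sketch of the kind of test described (dimensions and names are illustrative, not the exact test added to caffe2/test:transformers):

```python
import contextlib
import itertools
import torch
import torch.nn as nn

# Enumerate the three flags and just check that a forward pass does not crash.
layer = nn.TransformerEncoderLayer(d_model=8, nhead=2, batch_first=True)
model = nn.TransformerEncoder(layer, num_layers=2).eval()
src = torch.rand(2, 4, 8)

for with_no_grad, use_torchscript, training in itertools.product(
        [True, False], repeat=3):
    m = torch.jit.script(model) if use_torchscript else model
    m.train(training)
    ctx = torch.no_grad() if with_no_grad else contextlib.nullcontext()
    with ctx:
        m(src)
```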

Test Plan:
```
buck build --show-output mode/opt -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=a100 mode/inplace  //caffe2/test:transformers

./fbcode/buck-out/gen/caffe2/test/transformers#binary.par
```
Test runs and passes together with the changes from the PR above (I made another diff on top of this with those changes). Does not pass without the fix.

Reviewed By: mikekgfb

Differential Revision: D37222923

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79796
Approved by: https://github.com/zrphercule
2022-06-17 22:00:49 +00:00
Michael Suo
c978b609f7 [ci] remove IN_CI env var
The conventional env var to set is CI. Both CircleCI and GHA set it, so
IN_CI is unnecessary.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79229

Approved by: https://github.com/janeyx99
2022-06-11 17:16:30 +00:00
Michael Suo
f51d5233f2 [ci] fix GITHUB_ACTIONS env var checks
`GITHUB_ACTIONS` is set to `true`, but some of our code checks that it
is `1`. Make the checks more general.
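
A hedged sketch of the generalized check (the real helper in common_utils.py may be named differently):

```python
import os

# Accept "true" (what GitHub Actions actually sets) as well as "1".
IS_GHA = os.getenv("GITHUB_ACTIONS", "") in ("1", "true")
```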

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79290

Approved by: https://github.com/janeyx99
2022-06-11 17:16:30 +00:00
Peter Bell
c936396af2 Always convert truthy booleans to 1
Ref #54789

A `bool` has only two valid values, 1 or 0. Any in-memory value
outside of those leads to undefined behavior. So, instead of
`reinterpret_cast`-ing to `bool*` I introduce `c10::load<scalar_t>`
which will read as `unsigned char` and convert to a valid `bool`.

This gets >90% of operators working, but the remaining operators, for
which skips and xfails have been added, will require individual attention.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77122

Approved by: https://github.com/mruberry
2022-06-07 16:00:30 +00:00
Justin Chu
d0baf5792a [Testing] Turn instantiate_parametrized_tests into a decorator (#78228)
The change

1. Returns the input class from `instantiate_parametrized_tests` so that we can use it as a decorator.

    Instead of

    ```python
    instantiate_parametrized_tests(SomeTestClass)
    ```

    we can now do

    ```python
    @instantiate_parametrized_tests
    class SomeTestClass:
        ...
    ```
2. Strips spaces around parameters in `parametrize`.
    Both
    ```python
    @parametrize("x,y", [(1, 'foo'), (2, 'bar'), (3, 'baz')])
    def test_bar(self, x, y):
        ...
    ```
    and
    ```python
    @parametrize("x, y", [(1, 'foo'), (2, 'bar'), (3, 'baz')])
    def test_bar(self, x, y):
        ...
    ```

    (or

    ```python
    @parametrize("x,                   y", [(1, 'foo'), (2, 'bar'), (3, 'baz')])
    ```

, for that matter) are now parsed to the correct argument names.

3. Decorates test wrappers with `functools.wraps` to preserve the test method names (previously the names became "wrapper_xxx", which prevented the parametrized tests from getting the correct names).
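
A hedged end-to-end sketch combining the pieces above (class and test names are illustrative):

```python
from torch.testing._internal.common_utils import (
    TestCase, instantiate_parametrized_tests, parametrize, run_tests)

# The class decorator from (1) instantiates the parametrized cases, and the
# "x, y" spec from (2) parses correctly despite the space after the comma.
@instantiate_parametrized_tests
class TestFoo(TestCase):
    @parametrize("x, y", [(1, 'foo'), (2, 'bar'), (3, 'baz')])
    def test_bar(self, x, y):
        self.assertIsInstance(y, str)

if __name__ == "__main__":
    run_tests()
```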
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78228
Approved by: https://github.com/garymm, https://github.com/seemethere
2022-05-31 22:33:37 +00:00
Pearu Peterson
8c88a55d44 Fix sparse BSR tensor validation.
Also adds bits to support dense dimensions for Sparse Compressed tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78359

Approved by: https://github.com/cpuhrsch
2022-05-27 13:26:35 +00:00
Jane Xu
50930604cf Hackily use addSkip to track flakiness in common_utils.py (#78292)
### Problem:
The current way we detect flakiness is by aggregating results at the end of a job, which has worked so far but is inefficient and potentially inaccurate. We have also been dedicating a workflow step at the end of every job to doing this analysis.

### Solution:
This PR uses unittest.TestResult's addSkip method, adding a skipped test entry every time we detect that something is flaky. This way, we no longer need to aggregate anything, and we can simply scan the test reports and filter for skipped tests with flaky = True. Not only is this much faster to query, it also rids us of the janky aggregation logic. A sketch of the idea follows.
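
For illustration, a hedged sketch of the mechanism (`FlakyTrackingResult` is illustrative, not the actual common_utils.py code):

```python
import unittest

# When a test fails and then passes on rerun, record an extra skip whose
# reason carries the flaky metadata, so report scanners can filter on it.
class FlakyTrackingResult(unittest.TextTestResult):
    def record_flaky(self, test, num_failures):
        self.addSkip(test, f"flaky=True, num_failures={num_failures}")
```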

### Test plan:
I simulated a flaky test locally (test_async_python) and observed that:
With overriding signal ON (so flaky test = green):
- Successes are reported just as they normally are, with no skips. [override_signal_normal_success.txt](https://github.com/pytorch/pytorch/files/8774012/override_signal_normal_success.txt)
- Failures fail and are reported as they are with no skips. [override_signal_all_fails.txt](https://github.com/pytorch/pytorch/files/8774010/override_signal_all_fails.txt)
- Flaky tests have expected failures + a success + a skip denoting the correct information. [override_signal_1_1.txt](https://github.com/pytorch/pytorch/files/8774005/override_signal_1_1.txt)
 and [override_signal_2_1.txt](https://github.com/pytorch/pytorch/files/8774007/override_signal_2_1.txt)

With overriding signal OFF:
- Successes are reported just as they normally are, with no skips. [report_only_one_success.txt](https://github.com/pytorch/pytorch/files/8774019/report_only_one_success.txt)
- Failures fail and are reported as they are with no skips. [report_only_all_fails.txt](https://github.com/pytorch/pytorch/files/8774018/report_only_all_fails.txt)
- Flaky tests have failures + unexpected successes + a skip denoting the correct
 information. [report_only_3_1.txt](https://github.com/pytorch/pytorch/files/8774015/report_only_3_1.txt)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78292
Approved by: https://github.com/suo
2022-05-26 14:06:57 +00:00
Alban Desmaison
04ac80c73a Fix a few issues on assert/double error/legacy constructor (#77966)
Fixes https://github.com/pytorch/pytorch/issues/77960, https://github.com/pytorch/pytorch/issues/77957, https://github.com/pytorch/pytorch/issues/77781
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77966
Approved by: https://github.com/soulitzer, https://github.com/kulinseth
2022-05-20 20:25:12 +00:00
Philip Meier
dd313d7338 support TestCase.longMessage in TestCase.assertEqual
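A hedged sketch of what this enables (`LongMessageTest` is illustrative; the exact failure text may differ):

```python
import torch
from torch.testing._internal.common_utils import TestCase, run_tests

# With longMessage = True (the unittest default), a custom msg passed to
# assertEqual is appended to the generated mismatch message instead of
# replacing it.
class LongMessageTest(TestCase):
    longMessage = True

    def test_mismatch(self):
        # Fails; the report shows both the generated diff and "custom context".
        self.assertEqual(torch.zeros(2), torch.ones(2), msg="custom context")

if __name__ == "__main__":
    run_tests()
```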
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77602

Approved by: https://github.com/mruberry
2022-05-20 11:09:28 +00:00
Pearu Peterson
8b5f11c61e Support copy_ for Sparse Compressed tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77605

Approved by: https://github.com/cpuhrsch
2022-05-18 21:22:19 +00:00
Jane Xu
a325fa94b9 [flaky test reporting] print stack trace for flaky reruns (#77664)
Give context about failures ~~and include it in the test report~~! Unfortunately I cannot easily include it in the test report through addExpectedFailure :c, as tracebacks are not something I can instantiate (see the revert comment).

The current way doesn't print the stack traces, which has been fine because we don't hide the signal and it shows up at the end:
<img width="545" alt="image" src="https://user-images.githubusercontent.com/31798555/168878182-914edc39-369f-40bc-8b35-9d5cd47a6b1c.png">

However, when we want to hide the signal, we'd like to print the stack traces for each failed attempt, as it won't be shown at the end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77664
Approved by: https://github.com/suo
2022-05-18 14:17:16 +00:00
PyTorch MergeBot
b87e5f383b Revert "[flaky test reporting] print stack trace for flaky reruns (#77664)"
This reverts commit 646a20c8fe.

Reverted https://github.com/pytorch/pytorch/pull/77664 on behalf of https://github.com/janeyx99
2022-05-17 21:27:21 +00:00
Jane Xu
646a20c8fe [flaky test reporting] print stack trace for flaky reruns (#77664)
Give context about failures and include it in the test report!

The current way doesn't print the stack traces, which has been fine because we don't hide the signal and it shows up at the end:
<img width="545" alt="image" src="https://user-images.githubusercontent.com/31798555/168878182-914edc39-369f-40bc-8b35-9d5cd47a6b1c.png">

However, when we want to hide the signal, we'd like to print the stack traces for each failed attempt, as it won't be shown at the end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77664
Approved by: https://github.com/suo
2022-05-17 18:06:14 +00:00
Kulin Seth
f348b1b2b5 Add the Runtime components for MPS backend. (#76725)
The PR adds the runtime components and a few basic operations, like copy and as_strided, for the MPS backend.

The current list of identified TODOs:

-  https://github.com/pytorch/pytorch/issues/77176
- Unify the logic with CUDACachingAllocator and remove redundant code.
-  https://github.com/pytorch/pytorch/issues/77170
- Look into using C++ smart pointers where possible with ObjC code
- Use empty_strided_generic() to implement the `empty_strided_mps` code
- https://github.com/pytorch/pytorch/issues/77144
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76725
Approved by: https://github.com/albanD
2022-05-11 17:19:45 +00:00
Nikita Vedeneev
00a1fb64bb Faster index_select for sparse COO tensors on CPU. (#72710)
Fixes https://github.com/pytorch/pytorch/issues/72212.

This PR improves on the previous algorithm's complexity. It also exploits the structure of the problem and parallelizes computations when possible.

Benchmark results.

<details>

<summary>Testing script</summary>

```python
import torch
import math
from IPython import get_ipython
from itertools import product
import pickle
from torch.utils.benchmark import Timer, Compare

torch.manual_seed(13)
#torch.set_num_threads(1)
ipython = get_ipython()

index_sizes = (100, 1000, 10000)
# specifies (n, nnz)
problem_dims = (
    # n > nnz
    (10000, 100),
    (100000, 1000),
    (1000000, 10000),
    # n < nnz
    (10, 100),
    (10, 1000),
    (10, 10000),
    (100, 1000),
    (100, 10000),
    (1000, 10000),
    (1000, 100000),
    (1000, 1000000),
    #(1000000, 1000000000),
)

def f(t, d, index):
    # comparison-run helper; requires the external torch_sparse package,
    # which is not imported above
    s = torch_sparse.SparseTensor.from_torch_sparse_coo_tensor(t)
    ss = s.index_select(d, index)
    return ss.coo()

name = "PR"
results = []

for (n, nnz), m in product(problem_dims, index_sizes):
    for d in (0, 1):
        if nnz < n:
            shape = (n, n)
        else:
            shape = (n, nnz // n) if d == 0 else (nnz // n, n)
        nrows, ncols = shape
        rowidx = torch.randint(low=0, high=nrows, size=(nnz,))
        colidx = torch.randint(low=0, high=ncols, size=(nnz,))
        itemidx = torch.vstack((rowidx, colidx))
        xvalues = torch.randn(nnz)
        index = torch.randint(low=0, high=n, size=(m,))

        SparseX = torch.sparse_coo_tensor(itemidx, xvalues, size=shape).coalesce()
        smtp = "SparseX.index_select(d, index)"
        timer = Timer(smtp,
                      globals=globals(),
                      label="coo.index_select",
                      description=f"{name}: coo.index_select",
                      sub_label=f"n={n}, nnz={nnz}, index_len={m}, dim={d}",
                      num_threads=torch.get_num_threads())
        results.append(timer.blocked_autorange())

compare = Compare(results)
compare.trim_significant_figures()
compare.print()

with open(f"{name}_index_select.pickle", 'wb') as f:
    pickle.dump(results, f)

```

</details>

<details>

<summary>Gather results</summary>

```python
import pickle
from torch.utils.benchmark import Timer, Compare

files = [
        "PR",
        "torch_sparse",
        "master"
        ]

timers = []
for name in files:
    with open("{}_index_select.pickle".format(name), 'rb') as f:
        timers += pickle.load(f)

compare = Compare(timers)
compare.trim_significant_figures()
compare.print()

```

</details>

<details>

<summary>PR/torch_sparse/master runtime comparison</summary>

```
[----------------------------------- coo.index_select ----------------------------------]
                                                    |    PR   |  torch_sparse  |   master
32 threads: -----------------------------------------------------------------------------
      n=10000, nnz=100, index_len=100, dim=0        |     14  |        140     |       10
      n=10000, nnz=100, index_len=100, dim=1        |     14  |        200     |       10
      n=10000, nnz=100, index_len=1000, dim=0       |     30  |        180     |       38
      n=10000, nnz=100, index_len=1000, dim=1       |     34  |        240     |       38
      n=10000, nnz=100, index_len=10000, dim=0      |    278  |        460     |      330
      n=10000, nnz=100, index_len=10000, dim=1      |    275  |        516     |      330
      n=100000, nnz=1000, index_len=100, dim=0      |     16  |        290     |       31
      n=100000, nnz=1000, index_len=100, dim=1      |     26  |        390     |       31
      n=100000, nnz=1000, index_len=1000, dim=0     |     45  |        405     |      263
      n=100000, nnz=1000, index_len=1000, dim=1     |     73  |        500     |      261
      n=100000, nnz=1000, index_len=10000, dim=0    |    444  |        783     |     2570
      n=100000, nnz=1000, index_len=10000, dim=1    |    470  |        890     |     2590
      n=1000000, nnz=10000, index_len=100, dim=0    |     25  |       2400     |      270
      n=1000000, nnz=10000, index_len=100, dim=1    |    270  |       4000     |      269
      n=1000000, nnz=10000, index_len=1000, dim=0   |     74  |       2600     |     2620
      n=1000000, nnz=10000, index_len=1000, dim=1   |    464  |       3600     |     2640
      n=1000000, nnz=10000, index_len=10000, dim=0  |    635  |       3300     |    26400
      n=1000000, nnz=10000, index_len=10000, dim=1  |   1000  |       3960     |    26400
      n=10, nnz=100, index_len=100, dim=0           |     16  |        137     |       16
      n=10, nnz=100, index_len=100, dim=1           |     16  |        220     |       16
      n=10, nnz=100, index_len=1000, dim=0          |     63  |        238     |       81
      n=10, nnz=100, index_len=1000, dim=1          |     60  |        698     |       78
      n=10, nnz=100, index_len=10000, dim=0         |    480  |        940     |      862
      n=10, nnz=100, index_len=10000, dim=1         |    330  |       4930     |     1070
      n=10, nnz=1000, index_len=100, dim=0          |     60  |        200     |       73
      n=10, nnz=1000, index_len=100, dim=1          |     56  |        683     |       70
      n=10, nnz=1000, index_len=1000, dim=0         |    480  |        530     |     1050
      n=10, nnz=1000, index_len=1000, dim=1         |    330  |       4550     |     1368
      n=10, nnz=1000, index_len=10000, dim=0        |   3100  |       2900     |     9300
      n=10, nnz=1000, index_len=10000, dim=1        |   3400  |      46000     |     9100
      n=10, nnz=10000, index_len=100, dim=0         |    400  |        453     |      857
      n=10, nnz=10000, index_len=100, dim=1         |    400  |       4070     |     1730
      n=10, nnz=10000, index_len=1000, dim=0        |   2840  |       2600     |    13900
      n=10, nnz=10000, index_len=1000, dim=1        |   3700  |      40600     |    16000
      n=10, nnz=10000, index_len=10000, dim=0       |  83200  |      67400     |   160000
      n=10, nnz=10000, index_len=10000, dim=1       |  68000  |     528000     |   190000
      n=100, nnz=1000, index_len=100, dim=0         |     46  |        148     |       31
      n=100, nnz=1000, index_len=100, dim=1         |     45  |        242     |       37
      n=100, nnz=1000, index_len=1000, dim=0        |     68  |        248     |      240
      n=100, nnz=1000, index_len=1000, dim=1        |     66  |        755     |      290
      n=100, nnz=1000, index_len=10000, dim=0       |    370  |        802     |     2250
      n=100, nnz=1000, index_len=10000, dim=1       |    372  |       5430     |     2770
      n=100, nnz=10000, index_len=100, dim=0        |     82  |        210     |      224
      n=100, nnz=10000, index_len=100, dim=1        |     74  |        986     |      270
      n=100, nnz=10000, index_len=1000, dim=0       |    350  |        618     |     2600
      n=100, nnz=10000, index_len=1000, dim=1       |    370  |       4660     |     4560
      n=100, nnz=10000, index_len=10000, dim=0      |   3000  |       3400     |    41680
      n=100, nnz=10000, index_len=10000, dim=1      |   5000  |      47500     |    30400
      n=1000, nnz=10000, index_len=100, dim=0       |     71  |        160     |      185
      n=1000, nnz=10000, index_len=100, dim=1       |     64  |        516     |      190
      n=1000, nnz=10000, index_len=1000, dim=0      |    100  |        249     |     1740
      n=1000, nnz=10000, index_len=1000, dim=1      |     98  |       1030     |     1770
      n=1000, nnz=10000, index_len=10000, dim=0     |    600  |        808     |    18300
      n=1000, nnz=10000, index_len=10000, dim=1     |    663  |       5300     |    18500
      n=1000, nnz=100000, index_len=100, dim=0      |    160  |        258     |     1890
      n=1000, nnz=100000, index_len=100, dim=1      |    200  |       3620     |     2050
      n=1000, nnz=100000, index_len=1000, dim=0     |    500  |        580     |    18700
      n=1000, nnz=100000, index_len=1000, dim=1     |    640  |       7550     |    30000
      n=1000, nnz=100000, index_len=10000, dim=0    |   3400  |       3260     |   186000
      n=1000, nnz=100000, index_len=10000, dim=1    |   3600  |      49600     |   194000
      n=1000, nnz=1000000, index_len=100, dim=0     |    517  |        957     |    18700
      n=1000, nnz=1000000, index_len=100, dim=1     |    680  |      39600     |    37600
      n=1000, nnz=1000000, index_len=1000, dim=0    |   3600  |       4500     |   186000
      n=1000, nnz=1000000, index_len=1000, dim=1    |   5800  |      76400     |   190000
      n=1000, nnz=1000000, index_len=10000, dim=0   |  50000  |      67900     |  1800000
      n=1000, nnz=1000000, index_len=10000, dim=1   |  45000  |     570000     |  1900000

Times are in microseconds (us).

```

</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72710
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2022-05-10 16:33:13 +00:00
PyTorch MergeBot
8d67972b14 Revert "Faster index_select for sparse COO tensors on CPU. (#72710)"
This reverts commit ce3857e73c.

Reverted https://github.com/pytorch/pytorch/pull/72710 on behalf of https://github.com/malfet
2022-05-10 14:43:05 +00:00
Nikita Vedeneev
ce3857e73c Faster index_select for sparse COO tensors on CPU. (#72710)
Fixes https://github.com/pytorch/pytorch/issues/72212.

This PR improves on the previous algorithm's complexity. It also exploits the structure of the problem and parallelizes computations when possible.

Benchmark results.

<details>

<summary>Testing script</summary>

```python
import torch
import math
from IPython import get_ipython
from itertools import product
import pickle
from torch.utils.benchmark import Timer, Compare

torch.manual_seed(13)
#torch.set_num_threads(1)
ipython = get_ipython()

index_sizes = (100, 1000, 10000)
# specifies (n, nnz)
problem_dims = (
    # n > nnz
    (10000, 100),
    (100000, 1000),
    (1000000, 10000),
    # n < nnz
    (10, 100),
    (10, 1000),
    (10, 10000),
    (100, 1000),
    (100, 10000),
    (1000, 10000),
    (1000, 100000),
    (1000, 1000000),
    #(1000000, 1000000000),
)

def f(t, d, index):
    # comparison-run helper; requires the external torch_sparse package,
    # which is not imported above
    s = torch_sparse.SparseTensor.from_torch_sparse_coo_tensor(t)
    ss = s.index_select(d, index)
    return ss.coo()

name = "PR"
results = []

for (n, nnz), m in product(problem_dims, index_sizes):
    for d in (0, 1):
        if nnz < n:
            shape = (n, n)
        else:
            shape = (n, nnz // n) if d == 0 else (nnz // n, n)
        nrows, ncols = shape
        rowidx = torch.randint(low=0, high=nrows, size=(nnz,))
        colidx = torch.randint(low=0, high=ncols, size=(nnz,))
        itemidx = torch.vstack((rowidx, colidx))
        xvalues = torch.randn(nnz)
        index = torch.randint(low=0, high=n, size=(m,))

        SparseX = torch.sparse_coo_tensor(itemidx, xvalues, size=shape).coalesce()
        smtp = "SparseX.index_select(d, index)"
        timer = Timer(smtp,
                      globals=globals(),
                      label="coo.index_select",
                      description=f"{name}: coo.index_select",
                      sub_label=f"n={n}, nnz={nnz}, index_len={m}, dim={d}",
                      num_threads=torch.get_num_threads())
        results.append(timer.blocked_autorange())

compare = Compare(results)
compare.trim_significant_figures()
compare.print()

with open(f"{name}_index_select.pickle", 'wb') as f:
    pickle.dump(results, f)

```

</details>

<details>

<summary>Gather results</summary>

```python
import pickle
from torch.utils.benchmark import Timer, Compare

files = [
        "PR",
        "torch_sparse",
        "master"
        ]

timers = []
for name in files:
    with open("{}_index_select.pickle".format(name), 'rb') as f:
        timers += pickle.load(f)

compare = Compare(timers)
compare.trim_significant_figures()
compare.print()

```

</details>

<details>

<summary>PR/torch_sparse/master runtime comparison</summary>

```
[----------------------------------- coo.index_select ----------------------------------]
                                                    |    PR   |  torch_sparse  |   master
32 threads: -----------------------------------------------------------------------------
      n=10000, nnz=100, index_len=100, dim=0        |     14  |        140     |       10
      n=10000, nnz=100, index_len=100, dim=1        |     14  |        200     |       10
      n=10000, nnz=100, index_len=1000, dim=0       |     30  |        180     |       38
      n=10000, nnz=100, index_len=1000, dim=1       |     34  |        240     |       38
      n=10000, nnz=100, index_len=10000, dim=0      |    278  |        460     |      330
      n=10000, nnz=100, index_len=10000, dim=1      |    275  |        516     |      330
      n=100000, nnz=1000, index_len=100, dim=0      |     16  |        290     |       31
      n=100000, nnz=1000, index_len=100, dim=1      |     26  |        390     |       31
      n=100000, nnz=1000, index_len=1000, dim=0     |     45  |        405     |      263
      n=100000, nnz=1000, index_len=1000, dim=1     |     73  |        500     |      261
      n=100000, nnz=1000, index_len=10000, dim=0    |    444  |        783     |     2570
      n=100000, nnz=1000, index_len=10000, dim=1    |    470  |        890     |     2590
      n=1000000, nnz=10000, index_len=100, dim=0    |     25  |       2400     |      270
      n=1000000, nnz=10000, index_len=100, dim=1    |    270  |       4000     |      269
      n=1000000, nnz=10000, index_len=1000, dim=0   |     74  |       2600     |     2620
      n=1000000, nnz=10000, index_len=1000, dim=1   |    464  |       3600     |     2640
      n=1000000, nnz=10000, index_len=10000, dim=0  |    635  |       3300     |    26400
      n=1000000, nnz=10000, index_len=10000, dim=1  |   1000  |       3960     |    26400
      n=10, nnz=100, index_len=100, dim=0           |     16  |        137     |       16
      n=10, nnz=100, index_len=100, dim=1           |     16  |        220     |       16
      n=10, nnz=100, index_len=1000, dim=0          |     63  |        238     |       81
      n=10, nnz=100, index_len=1000, dim=1          |     60  |        698     |       78
      n=10, nnz=100, index_len=10000, dim=0         |    480  |        940     |      862
      n=10, nnz=100, index_len=10000, dim=1         |    330  |       4930     |     1070
      n=10, nnz=1000, index_len=100, dim=0          |     60  |        200     |       73
      n=10, nnz=1000, index_len=100, dim=1          |     56  |        683     |       70
      n=10, nnz=1000, index_len=1000, dim=0         |    480  |        530     |     1050
      n=10, nnz=1000, index_len=1000, dim=1         |    330  |       4550     |     1368
      n=10, nnz=1000, index_len=10000, dim=0        |   3100  |       2900     |     9300
      n=10, nnz=1000, index_len=10000, dim=1        |   3400  |      46000     |     9100
      n=10, nnz=10000, index_len=100, dim=0         |    400  |        453     |      857
      n=10, nnz=10000, index_len=100, dim=1         |    400  |       4070     |     1730
      n=10, nnz=10000, index_len=1000, dim=0        |   2840  |       2600     |    13900
      n=10, nnz=10000, index_len=1000, dim=1        |   3700  |      40600     |    16000
      n=10, nnz=10000, index_len=10000, dim=0       |  83200  |      67400     |   160000
      n=10, nnz=10000, index_len=10000, dim=1       |  68000  |     528000     |   190000
      n=100, nnz=1000, index_len=100, dim=0         |     46  |        148     |       31
      n=100, nnz=1000, index_len=100, dim=1         |     45  |        242     |       37
      n=100, nnz=1000, index_len=1000, dim=0        |     68  |        248     |      240
      n=100, nnz=1000, index_len=1000, dim=1        |     66  |        755     |      290
      n=100, nnz=1000, index_len=10000, dim=0       |    370  |        802     |     2250
      n=100, nnz=1000, index_len=10000, dim=1       |    372  |       5430     |     2770
      n=100, nnz=10000, index_len=100, dim=0        |     82  |        210     |      224
      n=100, nnz=10000, index_len=100, dim=1        |     74  |        986     |      270
      n=100, nnz=10000, index_len=1000, dim=0       |    350  |        618     |     2600
      n=100, nnz=10000, index_len=1000, dim=1       |    370  |       4660     |     4560
      n=100, nnz=10000, index_len=10000, dim=0      |   3000  |       3400     |    41680
      n=100, nnz=10000, index_len=10000, dim=1      |   5000  |      47500     |    30400
      n=1000, nnz=10000, index_len=100, dim=0       |     71  |        160     |      185
      n=1000, nnz=10000, index_len=100, dim=1       |     64  |        516     |      190
      n=1000, nnz=10000, index_len=1000, dim=0      |    100  |        249     |     1740
      n=1000, nnz=10000, index_len=1000, dim=1      |     98  |       1030     |     1770
      n=1000, nnz=10000, index_len=10000, dim=0     |    600  |        808     |    18300
      n=1000, nnz=10000, index_len=10000, dim=1     |    663  |       5300     |    18500
      n=1000, nnz=100000, index_len=100, dim=0      |    160  |        258     |     1890
      n=1000, nnz=100000, index_len=100, dim=1      |    200  |       3620     |     2050
      n=1000, nnz=100000, index_len=1000, dim=0     |    500  |        580     |    18700
      n=1000, nnz=100000, index_len=1000, dim=1     |    640  |       7550     |    30000
      n=1000, nnz=100000, index_len=10000, dim=0    |   3400  |       3260     |   186000
      n=1000, nnz=100000, index_len=10000, dim=1    |   3600  |      49600     |   194000
      n=1000, nnz=1000000, index_len=100, dim=0     |    517  |        957     |    18700
      n=1000, nnz=1000000, index_len=100, dim=1     |    680  |      39600     |    37600
      n=1000, nnz=1000000, index_len=1000, dim=0    |   3600  |       4500     |   186000
      n=1000, nnz=1000000, index_len=1000, dim=1    |   5800  |      76400     |   190000
      n=1000, nnz=1000000, index_len=10000, dim=0   |  50000  |      67900     |  1800000
      n=1000, nnz=1000000, index_len=10000, dim=1   |  45000  |     570000     |  1900000

Times are in microseconds (us).

```

</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72710
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2022-05-09 19:59:39 +00:00
Pearu Peterson
436a7be059 Factory functions for sparse CSC, BSR, and BSC tensors
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76634

Tests for Sparse Compressed factory functions
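
A hedged usage sketch of the kind of construction these factory functions enable (the values are illustrative, not from the PRs):

```python
import torch

# Build a 4x4 BSR tensor out of two 2x2 blocks, one per block-row.
crow_indices = torch.tensor([0, 1, 2])   # block-row offsets
col_indices = torch.tensor([0, 1])       # block-column of each block
values = torch.tensor([[[1., 2.], [3., 4.]],
                       [[5., 6.], [7., 8.]]])
bsr = torch.sparse_bsr_tensor(crow_indices, col_indices, values, size=(4, 4))
print(bsr.to_dense())
```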

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76746

Approved by: https://github.com/cpuhrsch
2022-05-04 03:30:41 +00:00
Nikita Shulga
8473173c36 Remove breakpad dependency
This functionality does not seem to be used,
and there are some requests to update the dependency.

Add `third_party` to torch_cpu include directories if compiling with
Caffe2 support, as `caffe2/quantization/server/conv_dnnlowp_op.cc` depends on `third_party/fbgemm/src/RefImplementations.h`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394
Approved by: https://github.com/janeyx99, https://github.com/seemethere
2022-05-03 20:21:55 +00:00
Mike Ruberry
f6bbecf8b5 Adds python ref consistency test, elementwise unary reference inputs, and formats test files
Per title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76626
Approved by: https://github.com/ngimel
2022-05-01 22:42:46 +00:00
Mike Ruberry
4048d4cdd2 [primTorch] Prototype tracer and elementwise unary reference opinfo class
Adds a prototype tracer with no caching support and the `ElementwiseUnaryPythonRefInfo` class. A reference for `floor` is added to test the latter, and the elementwise binary reference inputs are extended to also return noncontiguous inputs. The SampleInput transform operation has been updated to return an actual SampleInput instead of a tuple to facilitate uniform handling of (transformed) SampleInputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76388
Approved by: https://github.com/ngimel
2022-04-27 14:40:21 +00:00
Alban Desmaison
3d7abc0e55 Make -h work with run_test.py
As per title.

### When running `python run_test.py -h`
It used to show:
- The general unittest parser help that we print via a second thread (35545d85dc/torch/testing/_internal/common_utils.py, L467-L470)
- The common_utils parser help

<details><summary>Full result</summary>
<p>

```bash
$ python run_test.py -h
usage: run_test.py [-h] [-v] [-q] [--locals] [-f] [-c] [-b] [-k TESTNAMEPATTERNS] [tests [tests ...]]

positional arguments:
  tests                a list of any number of test modules, classes and test methods.

optional arguments:
  -h, --help           show this help message and exit
  -v, --verbose        Verbose output
  -q, --quiet          Quiet output
  --locals             Show local variables in tracebacks
  -f, --failfast       Stop on first fail or error
  -c, --catch          Catch Ctrl-C and display results so far
  -b, --buffer         Buffer stdout and stderr during tests
  -k TESTNAMEPATTERNS  Only run tests which match the given substring

Examples:
  run_test.py                           - run default set of tests
  run_test.py MyTestSuite               - run suite 'MyTestSuite'
  run_test.py MyTestCase.testSomething  - run MyTestCase.testSomething
  run_test.py MyTestCase                - run all 'test*' test methods
                                       in MyTestCase

usage: run_test.py [-h] [--subprocess] [--seed SEED] [--accept] [--jit_executor JIT_EXECUTOR] [--repeat REPEAT] [--test_bailouts]
                   [--save-xml [SAVE_XML]] [--discover-tests] [--log-suffix LOG_SUFFIX] [--run-parallel RUN_PARALLEL]
                   [--import-slow-tests [IMPORT_SLOW_TESTS]] [--import-disabled-tests [IMPORT_DISABLED_TESTS]]

optional arguments:
  -h, --help            show this help message and exit
  --subprocess          whether to run each test in a subprocess
  --seed SEED
  --accept
  --jit_executor JIT_EXECUTOR
  --repeat REPEAT
  --test_bailouts
  --save-xml [SAVE_XML]
  --discover-tests
  --log-suffix LOG_SUFFIX
  --run-parallel RUN_PARALLEL
  --import-slow-tests [IMPORT_SLOW_TESTS]
  --import-disabled-tests [IMPORT_DISABLED_TESTS]
```

</p>
</details>

It now prints:
- The general unittest parser help, printed the same way. Should we remove this? We can't merge them, unfortunately, as unittest does not accept a parent parser / does not expose its parser for us to take as a parent. (A sketch of the parent-parser mechanism follows the output below.)
- The combined common_utils + run_test parsers help

<details><summary>Full result</summary>
<p>

```bash
$ python run_test.py -h
usage: run_test.py [-h] [-v] [-q] [--locals] [-f] [-c] [-b] [-k TESTNAMEPATTERNS] [tests [tests ...]]

positional arguments:
  tests                a list of any number of test modules, classes and test methods.

optional arguments:
  -h, --help           show this help message and exit
  -v, --verbose        Verbose output
  -q, --quiet          Quiet output
  --locals             Show local variables in tracebacks
  -f, --failfast       Stop on first fail or error
  -c, --catch          Catch Ctrl-C and display results so far
  -b, --buffer         Buffer stdout and stderr during tests
  -k TESTNAMEPATTERNS  Only run tests which match the given substring

Examples:
  run_test.py                           - run default set of tests
  run_test.py MyTestSuite               - run suite 'MyTestSuite'
  run_test.py MyTestCase.testSomething  - run MyTestCase.testSomething
  run_test.py MyTestCase                - run all 'test*' test methods
                                       in MyTestCase

Ignoring disabled issues:  []
usage: run_test.py [-h] [--subprocess] [--seed SEED] [--accept] [--jit_executor JIT_EXECUTOR] [--repeat REPEAT] [--test_bailouts]
                   [--save-xml [SAVE_XML]] [--discover-tests] [--log-suffix LOG_SUFFIX] [--run-parallel RUN_PARALLEL]
                   [--import-slow-tests [IMPORT_SLOW_TESTS]] [--import-disabled-tests [IMPORT_DISABLED_TESTS]] [-v] [--jit]
                   [--distributed-tests] [-core] [-pt] [-c] [-i TESTS [TESTS ...]] [-x TESTS [TESTS ...]] [-f TESTS] [-l TESTS]
                   [--bring-to-front TESTS [TESTS ...]] [--ignore-win-blocklist] [--continue-through-error]
                   [--export-past-test-times [EXPORT_PAST_TEST_TIMES]] [--shard SHARD SHARD] [--exclude-jit-executor]
                   [--exclude-distributed-tests] [--run-specified-test-cases [RUN_SPECIFIED_TEST_CASES]]
                   [--use-specified-test-cases-by {include,bring-to-front}] [--dry-run]
                   [additional_unittest_args [additional_unittest_args ...]]

Run the PyTorch unit test suite

positional arguments:
  additional_unittest_args
                        additional arguments passed through to unittest, e.g., python run_test.py -i sparse -- TestSparse.test_factory_size_check

optional arguments:
  -h, --help            show this help message and exit
  --subprocess          whether to run each test in a subprocess
  --seed SEED
  --accept
  --jit_executor JIT_EXECUTOR
  --repeat REPEAT
  --test_bailouts
  --save-xml [SAVE_XML]
  --discover-tests
  --log-suffix LOG_SUFFIX
  --run-parallel RUN_PARALLEL
  --import-slow-tests [IMPORT_SLOW_TESTS]
  --import-disabled-tests [IMPORT_DISABLED_TESTS]
  -v, --verbose         print verbose information and test-by-test results
  --jit, --jit          run all jit tests
  --distributed-tests, --distributed-tests
                        run all distributed tests
  -core, --core         Only run core tests, or tests that validate PyTorch's ops, modules,and autograd. They are defined by CORE_TEST_LIST.
  -pt, --pytest         If true, use `pytest` to execute the tests. E.g., this runs TestTorch with pytest in verbose and coverage mode: python run_test.py -vci torch -pt
  -c, --coverage        enable coverage
  -i TESTS [TESTS ...], --include TESTS [TESTS ...]
                        select a set of tests to include (defaults to ALL tests). tests must be a part of the TESTS list defined in run_test.py
  -x TESTS [TESTS ...], --exclude TESTS [TESTS ...]
                        select a set of tests to exclude
  -f TESTS, --first TESTS
                        select the test to start from (excludes previous tests)
  -l TESTS, --last TESTS
                        select the last test to run (excludes following tests)
  --bring-to-front TESTS [TESTS ...]
                        select a set of tests to run first. This can be used in situations where you want to run all tests, but care more about some set, e.g. after making a change to a specific component
  --ignore-win-blocklist
                        always run blocklisted windows tests
  --continue-through-error
                        Runs the full test suite despite one of the tests failing
  --export-past-test-times [EXPORT_PAST_TEST_TIMES]
                        dumps test times from previous S3 stats into a file, format JSON
  --shard SHARD SHARD   runs a shard of the tests (taking into account other selections), e.g., --shard 2 3 will break up the selected tests into 3 shards and run the tests in the 2nd shard (the first number should not exceed the second)
  --exclude-jit-executor
                        exclude tests that are run for a specific jit config
  --exclude-distributed-tests
                        exclude distributed tests
  --run-specified-test-cases [RUN_SPECIFIED_TEST_CASES]
                        load specified test cases file dumped from previous OSS CI stats, format CSV.  If all test cases should run for a <test_module> please add a single row:
                         test_filename,test_case_name
                         ...
                         <test_module>,__all__
                         ...
                        how we use the stats will be based on option "--use-specified-test-cases-by".
  --use-specified-test-cases-by {include,bring-to-front}
                        used together with option "--run-specified-test-cases". When specified test case file is set, this option allows the user to control whether to only run the specified test modules or to simply bring the specified modules to front and also run the remaining modules. Note: regardless of this option, we will only run the specified test cases  within a specified test module. For unspecified test modules with the bring-to-front option, all test cases will be run, as one may expect.
  --dry-run             Only list the test that will run.

where TESTS is any of: benchmark_utils/test_benchmark_utils, distributed/_shard/sharded_optim/test_sharded_optim, distributed/_shard/sharded_tensor/ops/test_binary_cmp, distributed/_shard/sharded_tensor/ops/test_elementwise_ops, distributed/_shard/sharded_tensor/ops/test_embedding, distributed/_shard/sharded_tensor/ops/test_embedding_bag, distributed/_shard/sharded_tensor/ops/test_init, distributed/_shard/sharded_tensor/ops/test_linear, distributed/_shard/sharded_tensor/ops/test_math_ops, distributed/_shard/sharded_tensor/test_megatron_prototype, distributed/_shard/sharded_tensor/test_partial_tensor, distributed/_shard/sharded_tensor/test_sharded_tensor, distributed/_shard/sharded_tensor/test_sharded_tensor_reshard, distributed/_shard/sharding_spec/test_sharding_spec, distributed/_shard/test_replicated_tensor, distributed/algorithms/test_join, distributed/elastic/events/lib_test, distributed/elastic/metrics/api_test, distributed/elastic/multiprocessing/api_test, distributed/elastic/timer/api_test, distributed/elastic/timer/local_timer_example, distributed/elastic/timer/local_timer_test, distributed/elastic/utils/distributed_test, distributed/elastic/utils/logging_test, distributed/elastic/utils/util_test, distributed/fsdp/test_flatten_params_wrapper, distributed/fsdp/test_fsdp_apply, distributed/fsdp/test_fsdp_checkpoint, distributed/fsdp/test_fsdp_clip_grad_norm, distributed/fsdp/test_fsdp_comm, distributed/fsdp/test_fsdp_core, distributed/fsdp/test_fsdp_freezing_weights, distributed/fsdp/test_fsdp_grad_acc, distributed/fsdp/test_fsdp_ignored_modules, distributed/fsdp/test_fsdp_input, distributed/fsdp/test_fsdp_memory, distributed/fsdp/test_fsdp_mixed_precision, distributed/fsdp/test_fsdp_multiple_forward, distributed/fsdp/test_fsdp_multiple_wrapping, distributed/fsdp/test_fsdp_optim_state, distributed/fsdp/test_fsdp_overlap, distributed/fsdp/test_fsdp_pure_fp16, distributed/fsdp/test_fsdp_state_dict, distributed/fsdp/test_fsdp_summon_full_params, distributed/fsdp/test_fsdp_traversal, distributed/fsdp/test_fsdp_uneven, distributed/fsdp/test_shard_utils, distributed/fsdp/test_utils, distributed/fsdp/test_wrap, distributed/nn/jit/test_instantiator, distributed/optim/test_zero_redundancy_optimizer, distributed/pipeline/sync/skip/test_api, distributed/pipeline/sync/skip/test_gpipe, distributed/pipeline/sync/skip/test_inspect_skip_layout, distributed/pipeline/sync/skip/test_leak, distributed/pipeline/sync/skip/test_portal, distributed/pipeline/sync/skip/test_stash_pop, distributed/pipeline/sync/skip/test_tracker, distributed/pipeline/sync/skip/test_verify_skippables, distributed/pipeline/sync/test_balance, distributed/pipeline/sync/test_bugs, distributed/pipeline/sync/test_checkpoint, distributed/pipeline/sync/test_copy, distributed/pipeline/sync/test_deferred_batch_norm, distributed/pipeline/sync/test_dependency, distributed/pipeline/sync/test_inplace, distributed/pipeline/sync/test_microbatch, distributed/pipeline/sync/test_phony, distributed/pipeline/sync/test_pipe, distributed/pipeline/sync/test_pipeline, distributed/pipeline/sync/test_stream, distributed/pipeline/sync/test_transparency, distributed/pipeline/sync/test_worker, distributed/rpc/cuda/test_tensorpipe_agent, distributed/rpc/test_faulty_agent, distributed/rpc/test_tensorpipe_agent, distributed/test_c10d_common, distributed/test_c10d_gloo, distributed/test_c10d_nccl, distributed/test_c10d_spawn_gloo, distributed/test_c10d_spawn_nccl, distributed/test_data_parallel, distributed/test_distributed_spawn, distributed/test_launcher, 
distributed/test_nccl, distributed/test_pg_wrapper, distributed/test_store, distributions/test_constraints, distributions/test_distributions, lazy/test_bindings, lazy/test_extract_compiled_graph, lazy/test_ts_opinfo, test_ao_sparsity, test_autocast, test_autograd, test_binary_ufuncs, test_bundled_inputs, test_complex, test_cpp_api_parity, test_cpp_extensions_aot_ninja, test_cpp_extensions_aot_no_ninja, test_cpp_extensions_jit, test_cuda, test_cuda_primary_ctx, test_dataloader, test_datapipe, test_deploy, test_deploy, test_dispatch, test_expanded_weights, test_foreach, test_function_schema, test_functional_autograd_benchmark, test_functional_optim, test_functionalization, test_futures, test_fx, test_fx_experimental, test_hub, test_import_stats, test_indexing, test_jit, test_jit_autocast, test_jit_cuda_fuser, test_jit_disabled, test_jit_fuser_legacy, test_jit_fuser_te, test_jit_legacy, test_jit_profiling, test_license, test_linalg, test_logging, test_masked, test_mkldnn, test_mobile_optimizer, test_model_dump, test_module_init, test_modules, test_monitor, test_multiprocessing, test_multiprocessing_spawn, test_namedtensor, test_namedtuple_return_api, test_native_functions, test_nestedtensor, test_nn, test_numba_integration, test_numpy_interop, test_openmp, test_ops, test_ops_gradients, test_ops_jit, test_optim, test_overrides, test_package, test_per_overload_api, test_profiler, test_pruning_op, test_public_bindings, test_python_dispatch, test_pytree, test_quantization, test_reductions, test_scatter_gather_ops, test_serialization, test_set_default_mobile_cpu_allocator, test_shape_ops, test_show_pickle, test_sort_and_select, test_sparse, test_sparse_csr, test_spectral_ops, test_stateless, test_tensor_creation_ops, test_tensorboard, test_tensorexpr, test_tensorexpr_pybind, test_testing, test_torch, test_type_hints, test_type_info, test_type_promotion, test_unary_ufuncs, test_utils, test_view_ops, test_vmap, test_vulkan, test_xnnpack_integration
```

</p>
</details>
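
For illustration, a hedged sketch of the parent-parser mechanism that produces the combined help above (the argument names are illustrative):

```python
import argparse

# The common_utils parser is built with add_help=False and then listed as a
# parent of run_test.py's parser, so both sets of options print together.
common = argparse.ArgumentParser(add_help=False)
common.add_argument("--seed", type=int, default=1234)

runner = argparse.ArgumentParser(
    description="Run the PyTorch unit test suite", parents=[common])
runner.add_argument("--dry-run", action="store_true",
                    help="Only list the test that will run.")

print(runner.parse_args(["--seed", "7", "--dry-run"]))
```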

### When running anything else (for example  `python test_autograd.py -h`)
It did not change and still shows:
- The general unittest parser help that we print via a second thread
- The common_utils's parser help
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76152
Approved by: https://github.com/malfet, https://github.com/seemethere
2022-04-25 14:01:33 +00:00
Thiago Crepaldi
90d31cb311 Emit ATen ops when symbolics raise + minor fixes
Currently `torch.onnx.export(..., operator_export_type=OperatorExportTypes.ONNX_ATEN_FALLBACK)` only issues ATen ops through explicit requests (e.g., `g.at()` calls) inside each op's symbolic function. This is done based on specific conditions, such as `operator_export_type == OperatorExportTypes.ONNX_ATEN_FALLBACK` or `is_caffe2_aten_fallback()`.

This PR extends the ATen fallback mechanism to scenarios where the symbolic function raises `RuntimeError` during export. The idea is that a partial implementation of an existing ONNX op can fall back to ATen as a last resort. That is valuable because each operator can have many input combinations, and not all of them are always implemented.

A minor fix was made to ensure the `overload_name` attribute is added to explicit ATen-op fallback requests when a symbolic is not registered for a particular op.

ps: The behavior for builds with BUILD_CAFFE2=1 is not changed to ensure BC.
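
A hedged sketch of an export that uses the fallback (the model and output path are illustrative):

```python
import torch

# With ONNX_ATEN_FALLBACK, an op whose symbolic raises RuntimeError during
# export is emitted as an ATen op in the graph instead of failing the export.
model = torch.nn.Linear(4, 2)
torch.onnx.export(
    model,
    torch.randn(1, 4),
    "model.onnx",
    operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK,
)
```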
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74759
Approved by: https://github.com/garymm, https://github.com/msaroufim
2022-04-23 21:24:25 +00:00
Thiago Crepaldi
e07134092f Add warning when importing caffe2 on build without BUILD_CAFFE2=1
Confusing backtraces are shown to users when they run Caffe2 scripts (or tests) on PyTorch builds without Caffe2 enabled via `BUILD_CAFFE2=1`.

This PR adds warnings (in more than one place) that give the user a friendly message, helping them overcome the problem by themselves.
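
A minimal sketch of the kind of guard this adds (the exact wording and locations are the PR's; this snippet is illustrative only):

```
import warnings

try:
    from caffe2.python import workspace  # fails without BUILD_CAFFE2=1
except ImportError:
    # A friendly warning beats a confusing backtrace on Caffe2-less builds.
    warnings.warn(
        "This PyTorch build was compiled without Caffe2 (BUILD_CAFFE2=1), "
        "so Caffe2 scripts and tests will not work."
    )
```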

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73770
Approved by: https://github.com/BowenBao, https://github.com/malfet, https://github.com/garymm
2022-04-21 12:28:10 +00:00
Edward Z. Yang
ee955b8bb9 Cannibalize noarch CI job into crossref CI job
crossref is a new strategy for tests where you want to run a normal PyTorch API call, separately run some variation of that call (e.g., the same call but with all arguments as meta tensors), and then cross-reference the results to check that they are consistent. Any logic you add to CrossRefMode gets run on *every* PyTorch API call made in the course of PyTorch's test suite. This can be a good choice for correctness testing if OpInfo testing is not exhaustive enough.

For now, the crossref test doesn't do anything except verify that
we can validly push a mode onto the torch function mode stack for all
functions.
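
A hedged sketch of the idea using a `TorchFunctionMode` subclass (`PassthroughCrossRef` is illustrative and not the PR's actual mode class):

```
import torch
from torch.overrides import TorchFunctionMode

class PassthroughCrossRef(TorchFunctionMode):
    def __torch_function__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # A real crossref test would also run a variant of `func`
        # (e.g., on meta tensors) and compare the two results here.
        return func(*args, **kwargs)

with PassthroughCrossRef():
    torch.add(torch.ones(2), torch.ones(2))
```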

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75988

Approved by: https://github.com/seemethere
2022-04-20 11:56:25 +00:00
Edward Z. Yang
28a3654668 Make PYTORCH_TEST_WITH_SLOW_GRADCHECK consistent with other test envvars
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75987

Approved by: https://github.com/soulitzer
2022-04-19 01:13:08 +00:00
Edward Z. Yang
30943d1610 Remove noarchTest decorator
These tests are cheap, so it doesn't matter if we run them on all
configs. This is in preparation for removing the noarch build
configuration entirely.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75985

Approved by: https://github.com/seemethere, https://github.com/cbalioglu
2022-04-19 00:48:49 +00:00
PyTorch MergeBot
d79d9fa283 Revert "Remove breakpad dependency"
This reverts commit 9aa3c7fd83.

Reverted https://github.com/pytorch/pytorch/pull/75394 on behalf of https://github.com/malfet
2022-04-17 17:58:51 +00:00
Nikita Shulga
9aa3c7fd83 Remove breakpad dependency
This functionality does not seem to be used,
and there are some requests to update the dependency.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394
Approved by: https://github.com/janeyx99, https://github.com/seemethere
2022-04-17 17:43:45 +00:00
Thiago Crepaldi
9bbe1d632e Fix ONNX ATen fallback for non-caffe2 engines
This PR introduces 3 BC changes:

First, this PR propagates the `BUILD_CAFFE2` flag to `libtorch` and `libtorch_python`, which is necessary for non-Caffe2 ONNX runtimes when using the `ONNX_ATEN_FALLBACK` operator export type.

Second, as a complement to https://github.com/pytorch/pytorch/pull/68490, this PR refactors Caffe2's ATen op symbolics to consider not only the `operator_export_type` (aka `ONNX_ATEN_FALLBACK`) when emitting Caffe2 ATen ops, but also whether `BUILD_CAFFE2` (exposed in the Python binding as `torch.onnx._CAFFE2_ATEN_FALLBACK`) is set.

Lastly, it renames `onnx::ATen` to `aten::ATen` for ONNX spec consistency, in a BC fashion.
ONNX doesn't have an `ATen` op in its spec, but the PyTorch ONNX converter emits them. Non-Caffe2 backend engines would be misled by such an operator's name/domain. A non-ideal workaround would be to handle ATen ops based on their name and ignore the (non-compliant) domain. Moreover, users could incorrectly file bugs against either ONNX or ONNX Runtime when they inspect the model and notice the presence of an unspecified ONNX operator.
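
For illustration, one way to spot the fallback nodes in an exported model (assumes the `onnx` package and an exported `model.onnx`; the exact domain string may differ by version):

```
import onnx

model = onnx.load("model.onnx")
for node in model.graph.node:
    if node.op_type == "ATen":
        # After this PR the op carries an aten domain rather than sitting
        # in the default ONNX domain, so non-Caffe2 backends aren't misled.
        print(node.domain, node.op_type)
```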
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73954
Approved by: https://github.com/BowenBao, https://github.com/malfet, https://github.com/garymm, https://github.com/jiafatom
2022-04-14 23:18:45 +00:00
Ivan Yashchuk
c7ae23b50e Extend CSR constructor to support batched indices and values
This is the first portion of the changes required to enable the batched CSR format described in https://github.com/pytorch/pytorch/issues/60854#batched-CSR-computation.

Currently, only the same batch shape for indices and values is allowed. In the future, we could enable "broadcasting" of indices and batched values, as done in xFormers (dd96b8d8be/xformers/components/attention/_sputnik_sparse.py (L441)).

This PR adds the possibility to construct a batched CSR matrix with `torch.sparse_csr_tensor`; this batched CSR can be converted to a dense tensor with a `.to_dense()` call.
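
A small sketch of the construction this enables (shapes chosen for the example: a batch of two 2x2 CSR matrices sharing a sparsity pattern):

```
import torch

crow_indices = torch.tensor([[0, 1, 2], [0, 1, 2]])
col_indices = torch.tensor([[0, 1], [0, 1]])
values = torch.tensor([[1., 2.], [3., 4.]])

# Batched CSR: a leading batch dimension on indices and values.
batched = torch.sparse_csr_tensor(crow_indices, col_indices, values,
                                  size=(2, 2, 2))
dense = batched.to_dense()  # shape (2, 2, 2)
```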
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74542
Approved by: https://github.com/cpuhrsch
2022-04-07 17:10:52 +00:00
PyTorch MergeBot
6d832a7a20 Revert "Extend CSR constructor to support batched indices and values"
This reverts commit eead599039.

Reverted https://github.com/pytorch/pytorch/pull/74542 on behalf of https://github.com/b0noI
2022-04-05 21:39:34 +00:00
Ivan Yashchuk
eead599039 Extend CSR constructor to support batched indices and values
This is the first portion of the changes required to enable the batched CSR format described in https://github.com/pytorch/pytorch/issues/60854#batched-CSR-computation.

Currently, only the same batch shape for indices and values is allowed. In the future, we could enable "broadcasting" of indices and batched values, as done in xFormers (dd96b8d8be/xformers/components/attention/_sputnik_sparse.py (L441)).

This PR adds the possibility to construct a batched CSR matrix with `torch.sparse_csr_tensor`; this batched CSR can be converted to a dense tensor with a `.to_dense()` call.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74542
Approved by: https://github.com/cpuhrsch
2022-04-04 22:09:44 +00:00
PyTorch MergeBot
cc23725e89 Revert "Extend CSR constructor to support batched indices and values"
This reverts commit c074a53002.

Reverted https://github.com/pytorch/pytorch/pull/74542 on behalf of https://github.com/malfet
2022-03-30 19:54:26 +00:00
Ivan Yashchuk
c074a53002 Extend CSR constructor to support batched indices and values
This is the first portion of the changes required to enable the batched CSR format described in https://github.com/pytorch/pytorch/issues/60854#batched-CSR-computation.

Currently, only the same batch shape for indices and values is allowed. In the future, we could enable "broadcasting" of indices and batched values, as done in xFormers (dd96b8d8be/xformers/components/attention/_sputnik_sparse.py (L441)).

This PR adds the possibility to construct a batched CSR matrix with `torch.sparse_csr_tensor`; this batched CSR can be converted to a dense tensor with a `.to_dense()` call.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74542
Approved by: https://github.com/cpuhrsch
2022-03-29 21:20:25 +00:00
Elias Ellison
6694fdaccd Clean up profiling mode and profiling executor strategy (#73875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73875

Previously we had a few settings:
- getExecutor - which toggled between Profiling Executor and Legacy
- getGraphOptimize - if true, overrides PE/Legacy to run with simple executor (no optimizations)
and then...
- getProfilingMode - which would set PE to 0 specializations.

The last mode is redundant with getGraphOptimize, so we should just remove it and use getGraphOptimize in these cases. Keeping it could lead to invalid combinations of logic - what does it mean if getProfilingMode is true but getExecutor is set to false? That combination would hit a bug in specialize_autograd_zero; see: https://github.com/pytorch/pytorch/blob/master/torch%2Fcsrc%2Fjit%2Fpasses%2Fspecialize_autogradzero.cpp#L93.
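
From the Python side, the surviving knobs look roughly like this (private bindings; names as of this era and subject to change):

```
import torch

# Choose between the profiling executor (PE) and the legacy executor.
torch._C._jit_set_profiling_executor(True)

# The getGraphOptimize knob: when optimization is off, execution falls
# back to the simple executor (no optimizations), overriding PE/legacy.
torch._C._set_graph_executor_optimize(False)
```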

The tests here are failing but get fixed with the PR above this one, so I'll squash for landing.

Test Plan: Imported from OSS

Reviewed By: cpuhrsch

Differential Revision: D34938130

Pulled By: eellison

fbshipit-source-id: 1a9c0ae7f6d1cfddc2ed3499a5af611053ae5e1b
(cherry picked from commit cf69ce3d155ba7d334022c42fb2cee54bb088c23)
2022-03-29 18:38:51 +00:00
Christian Puhrsch
e55b73d65a Add strided layout support for to_dense
Fixes #59958

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74486
Approved by: https://github.com/pearu, https://github.com/suo
2022-03-29 00:12:48 +00:00
Tristan Rice
5b915e844c c10d: retry dns lookup failures (#74641)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74641

This makes DNS hostname lookup failures retryable, since in some environments, such as Kubernetes, hostnames are not guaranteed to be resolvable until the job starts. Retrying eliminates the race condition.
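
A self-contained sketch of the retry idea (the real retry lives in c10d's C++ store code; `resolve_with_retry` and its parameters are made up for illustration):

```
import socket
import time

def resolve_with_retry(host, attempts=5, delay=1.0):
    """Retry DNS resolution until the hostname becomes resolvable."""
    for i in range(attempts):
        try:
            return socket.getaddrinfo(host, None)
        except socket.gaierror:
            if i == attempts - 1:
                raise
            time.sleep(delay)
```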

This also fixes `sandcastle_skip_if` when used on a class instead of a method. Previously such classes wouldn't inherit from TestCase, so they just wouldn't run under buck at all.
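
The class-decoration pitfall, sketched with a hypothetical decorator (unittest's `__unittest_skip__` attributes are the standard skip mechanism):

```
import unittest

def skip_if_sketch(condition, reason):
    def decorator(obj):
        if condition and isinstance(obj, type):
            # Mark the class skipped but keep it a TestCase subclass so
            # the runner still collects it (the bug was losing that).
            obj.__unittest_skip__ = True
            obj.__unittest_skip_why__ = reason
            return obj
        return unittest.skipIf(condition, reason)(obj)
    return decorator
```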

Fixes https://github.com/pytorch/pytorch/issues/73682

Test Plan:
Added a unit test

```
buck test //caffe2/test/distributed:test_store
```

Reviewed By: aivanou

Differential Revision: D35092284

fbshipit-source-id: d40bf187e52c41f551e4fe41c536b2b0015588ee
(cherry picked from commit f8908309d8ee64c25ee466a6b4922f34f2b7618e)
2022-03-24 19:51:09 +00:00
Maksim Dmitriyevich Podkorytov
6a664481d5 Print reason for test skipped in CI (#74451)
Summary:
Issue: https://github.com/pytorch/pytorch/issues/69014 (the skip reason is not printed for tests running in CI)

Cause: locally, tests are run with the test runner built into unittest. When the IN_CI env variable is true, tests are run with the XML test runner from unittest-xml-reporting. The two runners print their summaries differently.

Solution: patch the printing logic in unittest-xml-reporting when a test is skipped
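
For context, a minimal reproduction of the information at stake (plain unittest; the CI path swaps in the xmlrunner-based runner):

```
import unittest

class Example(unittest.TestCase):
    @unittest.skip("requires CUDA")
    def test_gpu_only(self):
        pass

if __name__ == "__main__":
    # The built-in runner prints: skipped 'requires CUDA'; the XML runner
    # used when IN_CI is set did not, until this patch.
    unittest.main()
```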

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74451

Test Plan: examine CI run

Reviewed By: janeyx99

Differential Revision: D35050675

Pulled By: tenpercent

fbshipit-source-id: e3421687d68932079ba156153abeefee368b2fb3
(cherry picked from commit e2356d8ecc618dad9c7c556d40b47c1a6c53f68a)
2022-03-23 00:02:26 +00:00
Mike Ruberry
0aa3c39e5f Extends OpInfo architecture with reference inputs, adds them for elementwise binary operators
This PR extends our OpInfo test architecture with "reference inputs," an optional expansion of typical sample inputs that allows for more thorough testing. Currently only the elementwise binary operations implement an extended set of reference inputs. This PR also cleans up some smaller OpInfo-related issues, including several bugs, and it identified https://github.com/pytorch/pytorch/issues/74279.

A reference inputs function can be specified for an OpInfo by filling in its "reference_inputs_func" metadata. If this is done, it's recommended that the reference inputs function first call the sample inputs function and then produce additional sample inputs. See `reference_inputs_elementwise_binary` for an example of this pattern.
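
A hedged sketch of that pattern (the op and helper names are illustrative, not actual OpInfo entries):

```
import torch
from torch.testing import make_tensor

def sample_inputs_foo(device, dtype, requires_grad):
    yield make_tensor((3, 3), device=device, dtype=dtype,
                      requires_grad=requires_grad)

def reference_inputs_foo(device, dtype, requires_grad):
    # First reuse the ordinary sample inputs...
    yield from sample_inputs_foo(device, dtype, requires_grad)
    # ...then extend them with more thorough cases, e.g. empty and 0-d tensors.
    yield make_tensor((0,), device=device, dtype=dtype,
                      requires_grad=requires_grad)
    yield make_tensor((), device=device, dtype=dtype,
                      requires_grad=requires_grad)
```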

In addition to implementing reference inputs for the elementwise binary operations, this PR improves their consistency and simplifies how their metadata is represented. The great majority now use a generic sample input function, and those that need extensions start by calling the generic sample input function and then add additional samples. This removes many older sample input functions. The BinaryUfuncInfo subclass also now allows specifying scalar support more precisely, and reference inputs and error inputs are generated from this metadata to ensure it's correct.

cc @kshitij12345 @pmeier @zou3519 @Chillee

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74280
Approved by: https://github.com/ngimel
2022-03-21 03:24:16 +00:00