Commit Graph

682 Commits

Author SHA1 Message Date
Catherine Lee
cc93c1e5e4 Upload artifacts during test run (#125799)
Zip and upload artifacts while run_test is running
Upgrade boto3 because I get errors about not having `botocore.vendored.six.move` if I don't
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125799
Approved by: https://github.com/huydhn
2024-10-22 16:48:57 +00:00
Will Feng
e4ad02892f Upgrade distributed test to g4dn instances (T4 GPUs) (#137161)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137161
Approved by: https://github.com/seemethere, https://github.com/eqy, https://github.com/yf225

Co-authored-by: Will Feng <yf225@cornell.edu>
2024-10-20 23:48:54 +00:00
PyTorch MergeBot
24ee4af86b Revert "Upgrade distributed test to g4dn instances (T4 GPUs) (#137161)"
This reverts commit 2b7c7a20b9.

Reverted https://github.com/pytorch/pytorch/pull/137161 on behalf of https://github.com/kwen2501 due to breaking trunk ([comment](https://github.com/pytorch/pytorch/pull/137161#issuecomment-2417833666))
2024-10-16 20:05:38 +00:00
Catherine Lee
f173623bb2 [td] try catch exception, do not run td if not results (#138087)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138087
Approved by: https://github.com/wdvr
2024-10-16 18:04:25 +00:00
Ke Wen
2b7c7a20b9 Upgrade distributed test to g4dn instances (T4 GPUs) (#137161)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137161
Approved by: https://github.com/seemethere, https://github.com/eqy
2024-10-16 16:42:57 +00:00
PyTorch MergeBot
78632b97b1 Revert "Upgrade distributed test to g4dn instances (T4 GPUs) (#137161)"
This reverts commit f43c4d28b8.

Reverted https://github.com/pytorch/pytorch/pull/137161 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it seems another failure showing up after the upgrade ([comment](https://github.com/pytorch/pytorch/pull/137161#issuecomment-2415941159))
2024-10-16 07:26:34 +00:00
Ke Wen
f43c4d28b8 Upgrade distributed test to g4dn instances (T4 GPUs) (#137161)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137161
Approved by: https://github.com/seemethere, https://github.com/eqy
2024-10-16 05:03:08 +00:00
Ke Wen
56cc22eb01 [CI][Distributed] Not to test distributed_test.py with UCC (#137932)
Some UCC tests became unstable recently, with or without the M60 to T4 upgrade.
See for example: #137855 (without upgrade), #137161 (with upgrade).
So I am extracting the disablement from #137161 here.

Failure signature:
```
RuntimeError: [/var/lib/jenkins/workspace/torch/csrc/distributed/c10d/ProcessGroupUCC.cpp:496] [Rank 0][ProcessGroupUCC-0][READY]failed to post triggered collective, error code -6: Unhandled error, system error code 0
```

Earlier discussed here:
https://github.com/pytorch/pytorch/pull/137161/files#r1797353294

Cc: @Aidyn-A @eqy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137932
Approved by: https://github.com/fduwjj, https://github.com/malfet, https://github.com/eqy
2024-10-15 07:22:57 +00:00
Jagadish Krishnamoorthy
674d59359d [ROCm] Enable dist sharded_tensor test suites (#137724)
Following test suites are enabled on ROCm
test_sharded_tensor
test_sharded_tensor_reshard
test_sharding_plan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137724
Approved by: https://github.com/jithunnair-amd, https://github.com/pruthvistony, https://github.com/malfet
2024-10-14 20:20:57 +00:00
eellison
47af7cc962 Add compiler bisector (#131936)
This is a utility to aid the torch.compile debugging. You provide a function that returns True on success, False on failure, or do something out of process and run bisect_helper `good | bad`.

The bisector will first go through backends - `eager`, `aot_eager`, `aot_eager_decomp_partition`, `inductor` to find the first failing backend. Then, it will go through subsystems within the backend - currently limited but could be expanded - and try to find the first subsystem for which disabling fixes the problem. Once it has found the failing subsystem, it will find the number of times the subsystem is applied, and then bisect through it.

An example usage of how to hook it up for aot_eager_decomp_partition and decomposition subsystem is :

```
    from torch._inductor.bisect_helper import BisectionManager
    if op in CURRENT_DECOMPOSITION_TABLE:
        if BisectionManager.disable_subsystem("aot_eager_decomp_partition", "decomposition", lambda: repr(op)):
            return NotImplemented
```

Once it has discovered the problematic change, it will print out the associated debug info, and you can set the same limits with `TORCH_BISECT_BACKEND` `TORCH_BISECT_SUBSYSTEM` and `TORCH_BISECT_MAX`.

We could add further options as an automated way of going through a check list for checking divergence - e.g., the mode to emulate amp casts.

Fix for https://github.com/pytorch/pytorch/issues/126546

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131936
Approved by: https://github.com/ezyang
2024-10-09 20:34:11 +00:00
Siddharth Kotapati
e27c0048db Enable additional tests for MPS CI runs (#134356)
As part of the follow up for https://github.com/pytorch/pytorch/issues/133520, adapting existing unused tests for use in MPS CI runs. Focusing on nhwc & other memory formatting tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134356
Approved by: https://github.com/malfet, https://github.com/eqy, https://github.com/huydhn
2024-10-04 21:52:38 +00:00
Sergii Dymchenko
a619ced5ed Revert "Update run_test.py"
This reverts commit 193073b491.
2024-09-26 17:34:52 -07:00
Sergii Dymchenko
193073b491
Update run_test.py 2024-09-26 16:56:29 -07:00
Xinya Zhang
74fd1bf965 [ROCm] Update to AOTriton 0.7b (#134498)
Notable changes:
1. Enable CudaGraph related tests
2. Fix UT problems
3. EXPERIMENTAL Navi31 support. User should enable Navi31 support with Env Var `TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1`

Know Problem:
1. `test/test_transformers.py` will massive failures and/or NaN outputs with `--use-pytest`
    + Update: Confirmed skip `class TestSDPAPrivateUse1Only` can fix the problem with `--use-pytest`

Note:
AOTriton 0.7b adds support to nestedtenosrs+SDPA but need more work (and consequently a separate PR) to enable it.

Fixes #133540

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134498
Approved by: https://github.com/pruthvistony, https://github.com/jeffdaily, https://github.com/malfet
2024-09-11 20:34:01 +00:00
Bo Li
16b8146c9e Exclude test_transformers and unit tests which require recent GPU arch (#132895)
This PR is to exclude test_transformers on ROCm temporarily and skip some unit tests which require recent GPU arch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132895
Approved by: https://github.com/jithunnair-amd, https://github.com/pruthvistony, https://github.com/malfet
2024-08-27 20:40:53 +00:00
Roy Hvaara
1565940114 [MPS] Add test/test_nn.py to test suite (#134184)
This PR increases test coverage by including the tests in `test/test_nn.py` in the test suite of MPS.

Some of the tests are decorated with `@expectedFailureMPS` for various reasons. Either that the op is not implemented, or that the outputs do not align. Those tests that contain differing results should be investigated further to rule out any live bugs.

```bash
$ python test/run_test.py --mps --verbose -k TestNN
Running test batch 'tests to run' cost 84.76 seconds
```

Ref #133520

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134184
Approved by: https://github.com/albanD, https://github.com/malfet
2024-08-26 23:48:23 +00:00
Aidyn-A
28a4db84f2 [ARM] Fix infinite recursion in unwind (#134387)
Fixes #119905

The `TORCH_SHOW_CPP_STACKTRACES=1` setting on ARM causes infinite recursive unwind because on failure a `StackTraceFetcher` attempts to unwind the <ins>failed instruction</ins>: 5ad759ca33/torch/csrc/profiler/combined_traceback.cpp (L25)
then the unwind itself fails:
5ad759ca33/torch/csrc/profiler/unwind/unwind.cpp (L10-L12)
and it causes another attempt to unwind the failure in `unwind()`...

In summary, the executed instruction is equivalent to:
```C++
std::vector<void*> unwind() {
  // some instructions ...
  return unwind();
}
```
This PR replaces `TORCH_CHECK` by `TORCH_WARN_ONCE` as it will not cause an uncontrolled recursion. The only side effect would be an empty back-trace.

Huge thanks to @nWEIdia who found the root cause!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134387
Approved by: https://github.com/eqy, https://github.com/nWEIdia, https://github.com/malfet
2024-08-26 21:02:31 +00:00
Edward Z. Yang
99cf567714 Make SCRIBE_GRAPHQL_ACCESS_TOKEN available to test jobs running on main (#133536)
It is possible to write to Meta's internal in-memory database Scuba via the Scribe Graph API: https://www.internalfb.com/intern/wiki/Scribe/users/Knowledge_Base/Interacting_with_Scribe_categories/Graph_API/ This is currently being used by pytorch/benchmark repo to upload torchbench performance results.

I want to make this API generally available to all jobs running on CI in a semi-trusted context. To talk to Scribe, you need a secret access token. I have initially configured an environment prod-branch-main which contains `SCRIBE_GRAPHQL_ACCESS_TOKEN`, and switched a single class of jobs (linux-test) to use this environment when they are running on the main branch. Because we require approvals for running CI on untrusted contributions, we could potentially allow all jobs to run in this environment, including jobs on PRs, but I don't need this for my use case (per-PR benchmark result reporting, and miscellaneous statistics on main.)

If this works, I'll push out this environment to the rest of our test jobs.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133536
Approved by: https://github.com/xuzhao9, https://github.com/malfet, https://github.com/albanD
2024-08-15 19:53:17 +00:00
hippocookie
a6ad834fa8 Fix counting execution time in run_test.py (#133199)
Counting `elapsed_time` immediately after `start_time`, not reflect real execution time of `test_batch`.

Move `elapsed_time` and print method after `run_tests` method call to fix it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133199
Approved by: https://github.com/clee2000
2024-08-15 15:29:44 +00:00
chuanqiw
72f2b29bb0 [CI] disable xpu kineto build (#133069)
Due to the xpu kineto support PR https://github.com/pytorch/pytorch/pull/130811 landed, but the xpu ci infra not ready for now. Disable kineto build as a temp WA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133069
Approved by: https://github.com/seemethere
2024-08-09 23:58:50 +00:00
Xuehai Pan
4226ed1585 [BE] Format uncategorized Python files with ruff format (#132576)
Remove patterns `**`, `test/**`, and `torch/**` in `tools/linter/adapters/pyfmt_linter.py` and run `lintrunner`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132576
Approved by: https://github.com/ezyang, https://github.com/Skylion007
ghstack dependencies: #132574
2024-08-04 17:13:31 +00:00
Xuehai Pan
5cc34f61d1 [CI] add new test config label ci-test-showlocals to control test log verbosity (#131981)
Add a new label `ci-test-showlocals` and add it to test config filter.
If the PR is labeled with `ci-test-showlocals` or "ci-test-showlocals"
present in the PR comment, the test config filter will set a environment
variable `TEST_SHOWLOCALS`. Then `pytest` will show local variables on
failures for better debugging.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131981
Approved by: https://github.com/malfet
ghstack dependencies: #131151
2024-07-29 18:53:14 +00:00
Xuehai Pan
4694ee1ad2 [BE][tests] show local variables on failure in tests (#131151)
------

As per the title, add argument `--locals` for `unittest` and `--showlocals --tb=long` for `pytest` in CI.

Some failures cannot be reproduced on the local machine but exist on cloud CI. This change allows us to investigate the test failure more easily.

Example output: https://github.com/pytorch/pytorch/actions/runs/9961546996/job/27523888353?pr=130710#step:20:3361

```text
/opt/conda/envs/py_3.8/lib/python3.8/site-packages/sympy/core/function.py:307:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

cls = FloorDiv, base = -1.00000000000000, divisor = -1.00000000000000

    @classmethod
    def eval(cls, base, divisor):
        # python test/test_dynamic_shapes.py -k TestDimConstraints.test_dim_constraints_solve_full
        # Assert triggered by inequality solver
        # assert base.is_integer, base
        # assert divisor.is_integer, divisor

        # We don't provide the same error message as in Python because SymPy
        # makes it difficult to check the types.
        if divisor.is_zero:
            raise ZeroDivisionError("division by zero")
        if base in (int_oo, -int_oo, sympy.oo, -sympy.oo) and divisor in (
            int_oo,
            -int_oo,
            sympy.oo,
            -sympy.oo,
        ):
            return sympy.nan
        if base is sympy.nan or divisor is sympy.nan:
            return sympy.nan

        if base.is_zero:
            return sympy.S.Zero
        if base.is_integer and divisor == 1:
            return base
        if base.is_integer and divisor == -1:
            return sympy.Mul(base, -1)
        if (
            isinstance(base, sympy.Number)
            and isinstance(divisor, sympy.Number)
            and (
                base in (int_oo, -int_oo, sympy.oo, -sympy.oo)
                or divisor in (int_oo, -int_oo, sympy.oo, -sympy.oo)
            )
        ):
            r = float(base) / float(divisor)
            if r == math.inf:
                return int_oo
            elif r == -math.inf:
                return -int_oo
            elif math.isnan(r):
                return sympy.nan
            else:
                return sympy.Integer(math.floor(r))
        if isinstance(base, sympy.Integer) and isinstance(divisor, sympy.Integer):
            return sympy.Integer(int(base) // int(divisor))
        if isinstance(base, FloorDiv):
            return FloorDiv(base.args[0], base.args[1] * divisor)

        # Expands (x + y) // b into x // b + y // b.
        # This only works if floor is an identity, i.e. x / b is an integer.
        for term in sympy.Add.make_args(base):
            quotient = term / divisor
            if quotient.is_integer and isinstance(divisor, sympy.Integer):
                # NB: this is correct even if the divisor is not an integer, but it
                # creates rational expressions that cause problems with dynamic
                # shapes.
                return FloorDiv(base - term, divisor) + quotient

        try:
            gcd = sympy.gcd(base, divisor)
            if gcd != 1:
>               return FloorDiv(
                    sympy.simplify(base / gcd), sympy.simplify(divisor / gcd)
                )

base       = -1.00000000000000
cls        = FloorDiv
divisor    = -1.00000000000000
gcd        = 1.00000000000000
quotient   = 1.00000000000000
term       = -1.00000000000000

/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/utils/_sympy/functions.py:159:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

args = (FloorDiv, -1.00000000000000, -1.00000000000000), kwargs = {}

    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
>           retval = cfunc(*args, **kwargs)
E           RecursionError: maximum recursion depth exceeded in comparison
E
E           To execute this test, run the following from the base repo dir:
E               python test/test_sympy_utils.py -k TestValueRanges.test_binary_ref_fn_floordiv_dtype_float
E
E           This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

args       = (FloorDiv, -1.00000000000000, -1.00000000000000)
cfunc      = <functools._lru_cache_wrapper object at 0x7fc5303173a0>
func       = <function Function.__new__ at 0x7fc530317280>
kwargs     = {}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131151
Approved by: https://github.com/ezyang
2024-07-29 18:53:14 +00:00
PyTorch MergeBot
c35f21e5fc Revert "[BE][tests] show local variables on failure in tests (#131151)"
This reverts commit 14158d892a.

Reverted https://github.com/pytorch/pytorch/pull/131151 on behalf of https://github.com/atalman due to Broke CI: test_testing.py::TestTestingCUDA::test_cuda_assert_should_stop_common_device_type_test_suite_cuda [GH job link](https://github.com/pytorch/pytorch/actions/runs/10131415299/job/28014665693) [HUD commit link](14158d892a) ([comment](https://github.com/pytorch/pytorch/pull/131151#issuecomment-2255921015))
2024-07-29 13:19:38 +00:00
PyTorch MergeBot
06fe99a097 Revert "[CI] add new test config label ci-test-showlocals to control test log verbosity (#131981)"
This reverts commit dfa18bf3f3.

Reverted https://github.com/pytorch/pytorch/pull/131981 on behalf of https://github.com/atalman due to Sorry, need to revert bottom PR, which broke CI: https://github.com/pytorch/pytorch/pull/131151 ([comment](https://github.com/pytorch/pytorch/pull/131981#issuecomment-2255892628))
2024-07-29 13:09:41 +00:00
Xuehai Pan
dfa18bf3f3 [CI] add new test config label ci-test-showlocals to control test log verbosity (#131981)
Add a new label `ci-test-showlocals` and add it to test config filter.
If the PR is labeled with `ci-test-showlocals` or "ci-test-showlocals"
present in the PR comment, the test config filter will set a environment
variable `TEST_SHOWLOCALS`. Then `pytest` will show local variables on
failures for better debugging.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131981
Approved by: https://github.com/malfet
2024-07-29 07:40:42 +00:00
Xuehai Pan
14158d892a [BE][tests] show local variables on failure in tests (#131151)
------

As per the title, add argument `--locals` for `unittest` and `--showlocals --tb=long` for `pytest` in CI.

Some failures cannot be reproduced on the local machine but exist on cloud CI. This change allows us to investigate the test failure more easily.

Example output: https://github.com/pytorch/pytorch/actions/runs/9961546996/job/27523888353?pr=130710#step:20:3361

```text
/opt/conda/envs/py_3.8/lib/python3.8/site-packages/sympy/core/function.py:307:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

cls = FloorDiv, base = -1.00000000000000, divisor = -1.00000000000000

    @classmethod
    def eval(cls, base, divisor):
        # python test/test_dynamic_shapes.py -k TestDimConstraints.test_dim_constraints_solve_full
        # Assert triggered by inequality solver
        # assert base.is_integer, base
        # assert divisor.is_integer, divisor

        # We don't provide the same error message as in Python because SymPy
        # makes it difficult to check the types.
        if divisor.is_zero:
            raise ZeroDivisionError("division by zero")
        if base in (int_oo, -int_oo, sympy.oo, -sympy.oo) and divisor in (
            int_oo,
            -int_oo,
            sympy.oo,
            -sympy.oo,
        ):
            return sympy.nan
        if base is sympy.nan or divisor is sympy.nan:
            return sympy.nan

        if base.is_zero:
            return sympy.S.Zero
        if base.is_integer and divisor == 1:
            return base
        if base.is_integer and divisor == -1:
            return sympy.Mul(base, -1)
        if (
            isinstance(base, sympy.Number)
            and isinstance(divisor, sympy.Number)
            and (
                base in (int_oo, -int_oo, sympy.oo, -sympy.oo)
                or divisor in (int_oo, -int_oo, sympy.oo, -sympy.oo)
            )
        ):
            r = float(base) / float(divisor)
            if r == math.inf:
                return int_oo
            elif r == -math.inf:
                return -int_oo
            elif math.isnan(r):
                return sympy.nan
            else:
                return sympy.Integer(math.floor(r))
        if isinstance(base, sympy.Integer) and isinstance(divisor, sympy.Integer):
            return sympy.Integer(int(base) // int(divisor))
        if isinstance(base, FloorDiv):
            return FloorDiv(base.args[0], base.args[1] * divisor)

        # Expands (x + y) // b into x // b + y // b.
        # This only works if floor is an identity, i.e. x / b is an integer.
        for term in sympy.Add.make_args(base):
            quotient = term / divisor
            if quotient.is_integer and isinstance(divisor, sympy.Integer):
                # NB: this is correct even if the divisor is not an integer, but it
                # creates rational expressions that cause problems with dynamic
                # shapes.
                return FloorDiv(base - term, divisor) + quotient

        try:
            gcd = sympy.gcd(base, divisor)
            if gcd != 1:
>               return FloorDiv(
                    sympy.simplify(base / gcd), sympy.simplify(divisor / gcd)
                )

base       = -1.00000000000000
cls        = FloorDiv
divisor    = -1.00000000000000
gcd        = 1.00000000000000
quotient   = 1.00000000000000
term       = -1.00000000000000

/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/utils/_sympy/functions.py:159:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

args = (FloorDiv, -1.00000000000000, -1.00000000000000), kwargs = {}

    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
>           retval = cfunc(*args, **kwargs)
E           RecursionError: maximum recursion depth exceeded in comparison
E
E           To execute this test, run the following from the base repo dir:
E               python test/test_sympy_utils.py -k TestValueRanges.test_binary_ref_fn_floordiv_dtype_float
E
E           This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

args       = (FloorDiv, -1.00000000000000, -1.00000000000000)
cfunc      = <functools._lru_cache_wrapper object at 0x7fc5303173a0>
func       = <function Function.__new__ at 0x7fc530317280>
kwargs     = {}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131151
Approved by: https://github.com/ezyang
2024-07-27 19:39:40 +00:00
PyTorch MergeBot
0f9bf208ec Revert "[BE][tests] show local variables on failure in tests (#131151)"
This reverts commit 054d214c50.

Reverted https://github.com/pytorch/pytorch/pull/131151 on behalf of https://github.com/jbschlosser due to pollutes test failure output for OpInfo tests ([comment](https://github.com/pytorch/pytorch/pull/131151#issuecomment-2253310448))
2024-07-26 19:03:10 +00:00
Xuehai Pan
054d214c50 [BE][tests] show local variables on failure in tests (#131151)
------

As per the title, add argument `--locals` for `unittest` and `--showlocals --tb=long` for `pytest` in CI.

Some failures cannot be reproduced on the local machine but exist on cloud CI. This change allows us to investigate the test failure more easily.

Example output: https://github.com/pytorch/pytorch/actions/runs/9961546996/job/27523888353?pr=130710#step:20:3361

```text
/opt/conda/envs/py_3.8/lib/python3.8/site-packages/sympy/core/function.py:307:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

cls = FloorDiv, base = -1.00000000000000, divisor = -1.00000000000000

    @classmethod
    def eval(cls, base, divisor):
        # python test/test_dynamic_shapes.py -k TestDimConstraints.test_dim_constraints_solve_full
        # Assert triggered by inequality solver
        # assert base.is_integer, base
        # assert divisor.is_integer, divisor

        # We don't provide the same error message as in Python because SymPy
        # makes it difficult to check the types.
        if divisor.is_zero:
            raise ZeroDivisionError("division by zero")
        if base in (int_oo, -int_oo, sympy.oo, -sympy.oo) and divisor in (
            int_oo,
            -int_oo,
            sympy.oo,
            -sympy.oo,
        ):
            return sympy.nan
        if base is sympy.nan or divisor is sympy.nan:
            return sympy.nan

        if base.is_zero:
            return sympy.S.Zero
        if base.is_integer and divisor == 1:
            return base
        if base.is_integer and divisor == -1:
            return sympy.Mul(base, -1)
        if (
            isinstance(base, sympy.Number)
            and isinstance(divisor, sympy.Number)
            and (
                base in (int_oo, -int_oo, sympy.oo, -sympy.oo)
                or divisor in (int_oo, -int_oo, sympy.oo, -sympy.oo)
            )
        ):
            r = float(base) / float(divisor)
            if r == math.inf:
                return int_oo
            elif r == -math.inf:
                return -int_oo
            elif math.isnan(r):
                return sympy.nan
            else:
                return sympy.Integer(math.floor(r))
        if isinstance(base, sympy.Integer) and isinstance(divisor, sympy.Integer):
            return sympy.Integer(int(base) // int(divisor))
        if isinstance(base, FloorDiv):
            return FloorDiv(base.args[0], base.args[1] * divisor)

        # Expands (x + y) // b into x // b + y // b.
        # This only works if floor is an identity, i.e. x / b is an integer.
        for term in sympy.Add.make_args(base):
            quotient = term / divisor
            if quotient.is_integer and isinstance(divisor, sympy.Integer):
                # NB: this is correct even if the divisor is not an integer, but it
                # creates rational expressions that cause problems with dynamic
                # shapes.
                return FloorDiv(base - term, divisor) + quotient

        try:
            gcd = sympy.gcd(base, divisor)
            if gcd != 1:
>               return FloorDiv(
                    sympy.simplify(base / gcd), sympy.simplify(divisor / gcd)
                )

base       = -1.00000000000000
cls        = FloorDiv
divisor    = -1.00000000000000
gcd        = 1.00000000000000
quotient   = 1.00000000000000
term       = -1.00000000000000

/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/utils/_sympy/functions.py:159:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

args = (FloorDiv, -1.00000000000000, -1.00000000000000), kwargs = {}

    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
>           retval = cfunc(*args, **kwargs)
E           RecursionError: maximum recursion depth exceeded in comparison
E
E           To execute this test, run the following from the base repo dir:
E               python test/test_sympy_utils.py -k TestValueRanges.test_binary_ref_fn_floordiv_dtype_float
E
E           This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

args       = (FloorDiv, -1.00000000000000, -1.00000000000000)
cfunc      = <functools._lru_cache_wrapper object at 0x7fc5303173a0>
func       = <function Function.__new__ at 0x7fc530317280>
kwargs     = {}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131151
Approved by: https://github.com/ezyang
2024-07-25 10:10:58 +00:00
Xuehai Pan
ba48cf6535 [BE][Easy][6/19] enforce style for empty lines in import segments in test/ (#129757)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129757
Approved by: https://github.com/ezyang
2024-07-17 06:42:37 +00:00
Xuehai Pan
4d7bf72d93 [BE][Easy] fix ruff rule needless-bool (SIM103) (#130206)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130206
Approved by: https://github.com/malfet
2024-07-14 08:17:52 +00:00
Yuanhao Ji
312652c325 [RFC] Add support for device extension autoloading (#127074)
Fixes #122468

- Load device extensions at the end of `torch/__init__.py`
- Enabled by default, or you can disable it with `TORCH_DEVICE_BACKEND_AUTOLOAD=0`

run test:

```python
python test/run_test.py -i test_autoload_enable
python test/run_test.py -i test_autoload_disable
```

doc:

https://docs-preview.pytorch.org/pytorch/pytorch/127074/miscellaneous_environment_variables.html

co-author:  @jgong5 @bsochack @bkowalskiINTEL @jczaja @FFFrog @hipudding

Co-authored-by: albanD <desmaison.alban@gmail.com>
Co-authored-by: Jiong Gong <jiong.gong@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127074
Approved by: https://github.com/albanD, https://github.com/jgong5
2024-07-09 06:14:13 +00:00
Catherine Lee
91a8376d47 run_test: Unset cpp stacktraces after reruns (#129004)
Rerun the failing test singly with the env var set.  If it succeeds, start a new process without the cpp stack traces env var

We don't want to waste time generating these if we don't have to

They can also show up in assertion errors, which may cause unexpected failures if a test wants to check these

Adds new --rs (run single) to be used the same way --scs and --sc are.  It will only run the single test in the step current file

https://hud.pytorch.org/pytorch/pytorch/pull/129004?sha=2c349d3557d399020bf1f6a8b7045e2e4957ba46 has some examples of logs

In the above:
* test_checkpoint_valid failed, then passed in another subprocess.  The testing continued in a different new subprocess from the test right after it (test_checkpointing_without_reentrant_early_free)
* test_format_traceback_short failed consistently, but it continued to run because keep-going was set

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129004
Approved by: https://github.com/PaliC
2024-07-03 01:50:15 +00:00
Xuehai Pan
4ee1cb9b95 [BE][Easy] replace import pathlib with from pathlib import Path (#129426)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129426
Approved by: https://github.com/malfet
2024-06-30 01:36:07 +00:00
PyTorch MergeBot
2effbcfcd8 Revert "[BE][Easy] replace import pathlib with from pathlib import Path (#129426)"
This reverts commit 6d75604ef1.

Reverted https://github.com/pytorch/pytorch/pull/129426 on behalf of https://github.com/XuehaiPan due to recognize `Path` as new exported API ([comment](https://github.com/pytorch/pytorch/pull/129426#issuecomment-2198371625))
2024-06-29 23:24:06 +00:00
Xuehai Pan
6d75604ef1 [BE][Easy] replace import pathlib with from pathlib import Path (#129426)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129426
Approved by: https://github.com/malfet
2024-06-29 15:42:09 +00:00
Catherine Lee
8892ddaacc [TD] Test removal on sm86 (#127131)
Yolo

I'm excited to break CI :')
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127131
Approved by: https://github.com/huydhn, https://github.com/ZainRizvi
2024-06-07 20:19:18 +00:00
Howard Huang
baaa914bf7 [small] test clean up (#128079)
remove unnecessary line: https://github.com/pytorch/pytorch/issues/123733
add main so test can be run `python ...`: https://github.com/pytorch/pytorch/issues/124906

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128079
Approved by: https://github.com/awgu
2024-06-06 21:21:40 +00:00
chuanqiw
627d2cd87d [CI] disable td for xpu ci test by default (#127611)
Due to the xpu ci test has been enabled td by default, a lot of test cases (75%) have been skipped in CI tests. It caused some ci failures escaped from the ci tests, for example issue #127539. This PR depends on PR #127595 landed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127611
Approved by: https://github.com/etaf, https://github.com/atalman
2024-06-04 17:15:10 +00:00
Catherine Lee
a31a60d85b Change run_test.py arg parsing to handle additional args better (#126709)
Do not inherit parser from common_utils
* I don't think we use any variables in run_test that depend on those, and I think all tests except doctests run in a subprocess so they will parse the args in common_utils and set the variables.  I don't think doctests wants any of those variables?

Parse known args, add the extra args as extra, pass the extra ones along to the subprocess
Removes the first instance of `--`

I think I will miss run_test telling me if an arg is valid or not

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126709
Approved by: https://github.com/ZainRizvi, https://github.com/huydhn, https://github.com/Flamefire
2024-05-23 21:08:12 +00:00
Catherine Lee
ac2c547838 [TD] Upload names of failures to s3 for pytest cache (#126315)
Some tests don't get run through pytest and pytest crashes when a test segfaults, so in both caess, the pytest cache won't have an entry (similar to https://github.com/pytorch/test-infra/pull/5205).

Instead, manually upload/download an extra file that lists the failing test files

Technically this would be more general than the pytest cache
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126315
Approved by: https://github.com/ZainRizvi
2024-05-21 16:29:31 +00:00
PyTorch MergeBot
8bca0847c2 Revert "[TD] Upload names of failures to s3 for pytest cache (#126315)"
This reverts commit 655038687a.

Reverted https://github.com/pytorch/pytorch/pull/126315 on behalf of https://github.com/clee2000 due to broke inductor ([comment](https://github.com/pytorch/pytorch/pull/126315#issuecomment-2121133045))
2024-05-20 20:15:08 +00:00
Catherine Lee
655038687a [TD] Upload names of failures to s3 for pytest cache (#126315)
Some tests don't get run through pytest and pytest crashes when a test segfaults, so in both caess, the pytest cache won't have an entry (similar to https://github.com/pytorch/test-infra/pull/5205).

Instead, manually upload/download an extra file that lists the failing test files

Technically this would be more general than the pytest cache
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126315
Approved by: https://github.com/ZainRizvi
2024-05-20 17:36:30 +00:00
drisspg
762ce6f062 Add Lowering for FlexAttention Backwards (#125515)
# Summary
#### What does this PR do?
It enables Inductor to actually generate the fused flex attention kernel for the backwards

I did some other things along the way:
- Abstract out the 'build_subgraph_buffer' subroutine and make it reusable between flex attention and flex_attention backwards. In total we need too build 3 subgraphs for fwd + bwd. 1 for the fwd graph and then 2 in the bwd. The FAv2 algorithm recomputes the parts of the forward (more efficiently since we already have the row_max via logsumexp), therefore we need to inline both the fwd graph and the joint graph in the bwds kernel.
- The version of the backwards kernel is from a somewhat older version of the triton tutorial implementation. I think that we should update in a follow up to a newer version. Notably the blocks need to be square for this to work as currently implemented. I am sure there are many opportunities for optimization.
- I didnt correctly register the decomp table + IndexMode when I landed: https://github.com/pytorch/pytorch/pull/123902, this remedies that.
- The rel_bias helper func was reversed in terms of causality. I updated and then add a test specific for "future causal" attention.
- This PRs but the main point that I think still needs to be worked out is the store_output call. I have it hacked up to be 'fake' but I dont think we want to land that and likely want to just have a mutated 'dq' and a stored_output 'dk'
- I also needed to update the `TritonTemplateKernel` to actually accept multiple subgraphs (modifications)
- I updated the benchmark to also profile bwds performance

### Benchmark Numbers:
_The current implementation is not parallelizing over ctx length in the bwd_
FWD Speedups

| Type    |   Speedup | shape              | score_mod   | dtype          |
|---------|-----------|--------------------|-------------|----------------|
| Average |     0.991 |                    |             |                |
| Max     |     1.182 | (16, 16, 4096, 64) | noop        | torch.bfloat16 |
| Min     |     0.796 | (2, 16, 512, 256)  | head_bias   | torch.bfloat16 |

BWD Speedups

| Type    |   Speedup | shape              | score_mod   | dtype          |
|---------|-----------|--------------------|-------------|----------------|
| Average |     0.291 |                    |             |                |
| Max     |     0.652 | (8, 16, 512, 64)   | head_bias   | torch.bfloat16 |
| Min     |     0.073 | (2, 16, 4096, 128) | head_bias   | torch.bfloat16 |

<details>

<summary>Full Data</summary>

| shape               | score_mod     | dtype          |   fwd_eager_time |   fwd_compiled_time |   bwd_eager_time |   bwd_compiled_time |   fwd_speedup |   bwd_speedup |
|---------------------|---------------|----------------|------------------|---------------------|------------------|---------------------|---------------|---------------|
| (2, 16, 512, 64)    | noop          | torch.bfloat16 |           19.936 |              19.092 |           57.851 |             193.564 |         1.044 |         0.299 |
| (2, 16, 512, 64)    | causal_mask   | torch.bfloat16 |           19.955 |              19.497 |           57.662 |             206.278 |         1.024 |         0.280 |
| (2, 16, 512, 64)    | relative_bias | torch.bfloat16 |           19.455 |              21.297 |           57.674 |             195.219 |         0.913 |         0.295 |
| (2, 16, 512, 64)    | head_bias     | torch.bfloat16 |           19.958 |              21.289 |           57.674 |             193.859 |         0.938 |         0.298 |
| (2, 16, 512, 128)   | noop          | torch.bfloat16 |           28.157 |              28.615 |           82.831 |             454.211 |         0.984 |         0.182 |
| (2, 16, 512, 128)   | causal_mask   | torch.bfloat16 |           28.154 |              28.444 |           83.091 |             432.083 |         0.990 |         0.192 |
| (2, 16, 512, 128)   | relative_bias | torch.bfloat16 |           28.722 |              27.897 |           83.175 |             446.789 |         1.030 |         0.186 |
| (2, 16, 512, 128)   | head_bias     | torch.bfloat16 |           28.299 |              27.673 |           83.052 |             459.179 |         1.023 |         0.181 |
| (2, 16, 512, 256)   | noop          | torch.bfloat16 |           41.167 |              50.504 |          175.019 |            1083.545 |         0.815 |         0.162 |
| (2, 16, 512, 256)   | causal_mask   | torch.bfloat16 |           41.656 |              51.933 |          175.078 |            1171.176 |         0.802 |         0.149 |
| (2, 16, 512, 256)   | relative_bias | torch.bfloat16 |           41.697 |              50.722 |          175.159 |            1097.312 |         0.822 |         0.160 |
| (2, 16, 512, 256)   | head_bias     | torch.bfloat16 |           41.690 |              52.387 |          175.184 |            1097.336 |         0.796 |         0.160 |
| (2, 16, 1024, 64)   | noop          | torch.bfloat16 |           39.232 |              37.454 |          127.847 |             612.430 |         1.047 |         0.209 |
| (2, 16, 1024, 64)   | causal_mask   | torch.bfloat16 |           39.930 |              39.599 |          127.755 |             665.359 |         1.008 |         0.192 |
| (2, 16, 1024, 64)   | relative_bias | torch.bfloat16 |           39.417 |              41.304 |          127.902 |             614.990 |         0.954 |         0.208 |
| (2, 16, 1024, 64)   | head_bias     | torch.bfloat16 |           39.965 |              42.034 |          127.953 |             613.273 |         0.951 |         0.209 |
| (2, 16, 1024, 128)  | noop          | torch.bfloat16 |           63.964 |              71.024 |          226.510 |            1637.669 |         0.901 |         0.138 |
| (2, 16, 1024, 128)  | causal_mask   | torch.bfloat16 |           63.843 |              72.451 |          226.750 |            1558.949 |         0.881 |         0.145 |
| (2, 16, 1024, 128)  | relative_bias | torch.bfloat16 |           64.301 |              70.487 |          226.651 |            1610.063 |         0.912 |         0.141 |
| (2, 16, 1024, 128)  | head_bias     | torch.bfloat16 |           64.033 |              71.394 |          226.676 |            1668.511 |         0.897 |         0.136 |
| (2, 16, 1024, 256)  | noop          | torch.bfloat16 |          129.348 |             141.390 |          507.337 |            4405.175 |         0.915 |         0.115 |
| (2, 16, 1024, 256)  | causal_mask   | torch.bfloat16 |          129.538 |             145.680 |          507.178 |            4768.874 |         0.889 |         0.106 |
| (2, 16, 1024, 256)  | relative_bias | torch.bfloat16 |          129.438 |             142.782 |          507.004 |            4401.002 |         0.907 |         0.115 |
| (2, 16, 1024, 256)  | head_bias     | torch.bfloat16 |          129.058 |             146.242 |          507.547 |            4434.251 |         0.883 |         0.114 |
| (2, 16, 4096, 64)   | noop          | torch.bfloat16 |          481.606 |             409.120 |         1440.890 |           14147.269 |         1.177 |         0.102 |
| (2, 16, 4096, 64)   | causal_mask   | torch.bfloat16 |          480.227 |             438.847 |         1434.419 |           14973.386 |         1.094 |         0.096 |
| (2, 16, 4096, 64)   | relative_bias | torch.bfloat16 |          480.831 |             458.104 |         1432.935 |           14193.253 |         1.050 |         0.101 |
| (2, 16, 4096, 64)   | head_bias     | torch.bfloat16 |          480.749 |             452.497 |         1437.040 |           14084.869 |         1.062 |         0.102 |
| (2, 16, 4096, 128)  | noop          | torch.bfloat16 |          872.534 |             848.275 |         2600.895 |           35156.849 |         1.029 |         0.074 |
| (2, 16, 4096, 128)  | causal_mask   | torch.bfloat16 |          872.647 |             868.279 |         2587.581 |           31919.531 |         1.005 |         0.081 |
| (2, 16, 4096, 128)  | relative_bias | torch.bfloat16 |          871.484 |             827.644 |         2593.989 |           34805.634 |         1.053 |         0.075 |
| (2, 16, 4096, 128)  | head_bias     | torch.bfloat16 |          871.422 |             856.437 |         2602.482 |           35708.591 |         1.017 |         0.073 |
| (2, 16, 4096, 256)  | noop          | torch.bfloat16 |         1904.497 |            1758.183 |         6122.416 |           66754.593 |         1.083 |         0.092 |
| (2, 16, 4096, 256)  | causal_mask   | torch.bfloat16 |         1911.174 |            1762.821 |         6113.207 |           72759.392 |         1.084 |         0.084 |
| (2, 16, 4096, 256)  | relative_bias | torch.bfloat16 |         1911.254 |            1727.108 |         6123.530 |           66577.988 |         1.107 |         0.092 |
| (2, 16, 4096, 256)  | head_bias     | torch.bfloat16 |         1916.977 |            1801.804 |         6118.158 |           67359.680 |         1.064 |         0.091 |
| (8, 16, 512, 64)    | noop          | torch.bfloat16 |           44.984 |              43.974 |          170.276 |             262.259 |         1.023 |         0.649 |
| (8, 16, 512, 64)    | causal_mask   | torch.bfloat16 |           45.001 |              46.265 |          170.509 |             274.893 |         0.973 |         0.620 |
| (8, 16, 512, 64)    | relative_bias | torch.bfloat16 |           45.466 |              48.211 |          170.606 |             262.759 |         0.943 |         0.649 |
| (8, 16, 512, 64)    | head_bias     | torch.bfloat16 |           45.481 |              48.435 |          170.267 |             261.265 |         0.939 |         0.652 |
| (8, 16, 512, 128)   | noop          | torch.bfloat16 |           72.565 |              74.736 |          313.220 |             773.126 |         0.971 |         0.405 |
| (8, 16, 512, 128)   | causal_mask   | torch.bfloat16 |           72.015 |              75.755 |          313.311 |             775.513 |         0.951 |         0.404 |
| (8, 16, 512, 128)   | relative_bias | torch.bfloat16 |           72.105 |              74.189 |          313.806 |             769.238 |         0.972 |         0.408 |
| (8, 16, 512, 128)   | head_bias     | torch.bfloat16 |           72.005 |              74.364 |          313.509 |             775.237 |         0.968 |         0.404 |
| (8, 16, 512, 256)   | noop          | torch.bfloat16 |          138.656 |             165.453 |          663.707 |            2672.067 |         0.838 |         0.248 |
| (8, 16, 512, 256)   | causal_mask   | torch.bfloat16 |          139.096 |             172.613 |          663.593 |            2926.538 |         0.806 |         0.227 |
| (8, 16, 512, 256)   | relative_bias | torch.bfloat16 |          139.500 |             168.417 |          663.938 |            2658.629 |         0.828 |         0.250 |
| (8, 16, 512, 256)   | head_bias     | torch.bfloat16 |          139.776 |             173.549 |          662.920 |            2667.266 |         0.805 |         0.249 |
| (8, 16, 1024, 64)   | noop          | torch.bfloat16 |          134.883 |             125.004 |          484.706 |            1195.254 |         1.079 |         0.406 |
| (8, 16, 1024, 64)   | causal_mask   | torch.bfloat16 |          134.297 |             132.875 |          485.420 |            1234.953 |         1.011 |         0.393 |
| (8, 16, 1024, 64)   | relative_bias | torch.bfloat16 |          134.839 |             139.231 |          485.470 |            1198.556 |         0.968 |         0.405 |
| (8, 16, 1024, 64)   | head_bias     | torch.bfloat16 |          133.822 |             136.449 |          485.608 |            1189.198 |         0.981 |         0.408 |
| (8, 16, 1024, 128)  | noop          | torch.bfloat16 |          235.470 |             234.765 |          886.094 |            2662.944 |         1.003 |         0.333 |
| (8, 16, 1024, 128)  | causal_mask   | torch.bfloat16 |          236.305 |             241.382 |          886.293 |            2646.984 |         0.979 |         0.335 |
| (8, 16, 1024, 128)  | relative_bias | torch.bfloat16 |          236.414 |             233.980 |          885.250 |            2642.178 |         1.010 |         0.335 |
| (8, 16, 1024, 128)  | head_bias     | torch.bfloat16 |          237.176 |             239.040 |          885.754 |            2665.242 |         0.992 |         0.332 |
| (8, 16, 1024, 256)  | noop          | torch.bfloat16 |          504.445 |             517.855 |         1978.956 |            9592.906 |         0.974 |         0.206 |
| (8, 16, 1024, 256)  | causal_mask   | torch.bfloat16 |          502.428 |             536.002 |         1978.611 |           10607.342 |         0.937 |         0.187 |
| (8, 16, 1024, 256)  | relative_bias | torch.bfloat16 |          503.396 |             523.960 |         1977.993 |            9539.284 |         0.961 |         0.207 |
| (8, 16, 1024, 256)  | head_bias     | torch.bfloat16 |          503.818 |             536.014 |         1980.131 |            9576.262 |         0.940 |         0.207 |
| (8, 16, 4096, 64)   | noop          | torch.bfloat16 |         1970.139 |            1674.930 |         5750.940 |           16724.134 |         1.176 |         0.344 |
| (8, 16, 4096, 64)   | causal_mask   | torch.bfloat16 |         1959.036 |            1775.056 |         5780.512 |           17390.350 |         1.104 |         0.332 |
| (8, 16, 4096, 64)   | relative_bias | torch.bfloat16 |         1947.198 |            1773.869 |         5780.643 |           16779.699 |         1.098 |         0.345 |
| (8, 16, 4096, 64)   | head_bias     | torch.bfloat16 |         1963.935 |            1829.502 |         5780.018 |           16703.259 |         1.073 |         0.346 |
| (8, 16, 4096, 128)  | noop          | torch.bfloat16 |         3582.711 |            3362.623 |        10436.069 |           36415.565 |         1.065 |         0.287 |
| (8, 16, 4096, 128)  | causal_mask   | torch.bfloat16 |         3581.504 |            3499.472 |        10346.869 |           36164.959 |         1.023 |         0.286 |
| (8, 16, 4096, 128)  | relative_bias | torch.bfloat16 |         3589.779 |            3337.849 |        10529.621 |           36261.696 |         1.075 |         0.290 |
| (8, 16, 4096, 128)  | head_bias     | torch.bfloat16 |         3602.265 |            3436.444 |        10458.660 |           36507.790 |         1.048 |         0.286 |
| (8, 16, 4096, 256)  | noop          | torch.bfloat16 |         7695.923 |            7126.275 |        24643.009 |          140949.081 |         1.080 |         0.175 |
| (8, 16, 4096, 256)  | causal_mask   | torch.bfloat16 |         7679.939 |            7186.252 |        24538.105 |          157156.067 |         1.069 |         0.156 |
| (8, 16, 4096, 256)  | relative_bias | torch.bfloat16 |         7681.374 |            6994.832 |        24549.713 |          140077.179 |         1.098 |         0.175 |
| (8, 16, 4096, 256)  | head_bias     | torch.bfloat16 |         7679.822 |            7212.278 |        24627.823 |          140675.003 |         1.065 |         0.175 |
| (16, 16, 512, 64)   | noop          | torch.bfloat16 |           80.126 |              78.291 |          333.719 |             541.165 |         1.023 |         0.617 |
| (16, 16, 512, 64)   | causal_mask   | torch.bfloat16 |           80.065 |              81.696 |          333.779 |             551.113 |         0.980 |         0.606 |
| (16, 16, 512, 64)   | relative_bias | torch.bfloat16 |           80.138 |              86.715 |          333.364 |             542.118 |         0.924 |         0.615 |
| (16, 16, 512, 64)   | head_bias     | torch.bfloat16 |           80.415 |              85.204 |          333.294 |             536.840 |         0.944 |         0.621 |
| (16, 16, 512, 128)  | noop          | torch.bfloat16 |          134.964 |             138.025 |          607.093 |            1333.102 |         0.978 |         0.455 |
| (16, 16, 512, 128)  | causal_mask   | torch.bfloat16 |          134.192 |             141.523 |          606.269 |            1424.318 |         0.948 |         0.426 |
| (16, 16, 512, 128)  | relative_bias | torch.bfloat16 |          135.711 |             138.639 |          606.283 |            1327.974 |         0.979 |         0.457 |
| (16, 16, 512, 128)  | head_bias     | torch.bfloat16 |          135.552 |             140.555 |          607.107 |            1347.370 |         0.964 |         0.451 |
| (16, 16, 512, 256)  | noop          | torch.bfloat16 |          275.113 |             315.144 |         1301.583 |            5268.153 |         0.873 |         0.247 |
| (16, 16, 512, 256)  | causal_mask   | torch.bfloat16 |          274.867 |             328.106 |         1302.513 |            5770.594 |         0.838 |         0.226 |
| (16, 16, 512, 256)  | relative_bias | torch.bfloat16 |          276.052 |             321.770 |         1302.904 |            5241.920 |         0.858 |         0.249 |
| (16, 16, 512, 256)  | head_bias     | torch.bfloat16 |          271.409 |             328.839 |         1302.142 |            5266.037 |         0.825 |         0.247 |
| (16, 16, 1024, 64)  | noop          | torch.bfloat16 |          260.489 |             237.463 |          955.884 |            1817.558 |         1.097 |         0.526 |
| (16, 16, 1024, 64)  | causal_mask   | torch.bfloat16 |          262.378 |             254.350 |          955.280 |            1843.807 |         1.032 |         0.518 |
| (16, 16, 1024, 64)  | relative_bias | torch.bfloat16 |          261.338 |             268.253 |          956.038 |            1820.036 |         0.974 |         0.525 |
| (16, 16, 1024, 64)  | head_bias     | torch.bfloat16 |          262.153 |             264.156 |          956.023 |            1810.076 |         0.992 |         0.528 |
| (16, 16, 1024, 128) | noop          | torch.bfloat16 |          476.475 |             461.413 |         1760.578 |            4306.521 |         1.033 |         0.409 |
| (16, 16, 1024, 128) | causal_mask   | torch.bfloat16 |          473.794 |             479.178 |         1761.277 |            4619.439 |         0.989 |         0.381 |
| (16, 16, 1024, 128) | relative_bias | torch.bfloat16 |          473.839 |             463.282 |         1758.692 |            4290.562 |         1.023 |         0.410 |
| (16, 16, 1024, 128) | head_bias     | torch.bfloat16 |          472.979 |             472.896 |         1763.086 |            4367.931 |         1.000 |         0.404 |
| (16, 16, 1024, 256) | noop          | torch.bfloat16 |         1014.184 |            1026.764 |         3922.997 |           19104.147 |         0.988 |         0.205 |
| (16, 16, 1024, 256) | causal_mask   | torch.bfloat16 |         1013.217 |            1039.046 |         3928.382 |           21086.281 |         0.975 |         0.186 |
| (16, 16, 1024, 256) | relative_bias | torch.bfloat16 |         1008.519 |            1015.278 |         3922.133 |           18980.652 |         0.993 |         0.207 |
| (16, 16, 1024, 256) | head_bias     | torch.bfloat16 |         1011.360 |            1047.542 |         3931.245 |           19069.172 |         0.965 |         0.206 |
| (16, 16, 4096, 64)  | noop          | torch.bfloat16 |         3929.850 |            3325.667 |        11411.704 |           23344.280 |         1.182 |         0.489 |
| (16, 16, 4096, 64)  | causal_mask   | torch.bfloat16 |         3885.262 |            3581.544 |        11390.515 |           23725.639 |         1.085 |         0.480 |
| (16, 16, 4096, 64)  | relative_bias | torch.bfloat16 |         3865.737 |            3537.308 |        11489.901 |           23406.330 |         1.093 |         0.491 |
| (16, 16, 4096, 64)  | head_bias     | torch.bfloat16 |         3880.530 |            3665.249 |        11484.411 |           23299.496 |         1.059 |         0.493 |
| (16, 16, 4096, 128) | noop          | torch.bfloat16 |         7030.306 |            6745.715 |        20621.264 |           57464.096 |         1.042 |         0.359 |
| (16, 16, 4096, 128) | causal_mask   | torch.bfloat16 |         7095.414 |            7034.385 |        20410.656 |           61660.511 |         1.009 |         0.331 |
| (16, 16, 4096, 128) | relative_bias | torch.bfloat16 |         7084.779 |            6686.497 |        20315.161 |           57243.969 |         1.060 |         0.355 |
| (16, 16, 4096, 128) | head_bias     | torch.bfloat16 |         7075.367 |            6863.305 |        20494.385 |           58481.953 |         1.031 |         0.350 |
| (16, 16, 4096, 256) | noop          | torch.bfloat16 |        15612.741 |           14297.482 |        55306.847 |          281161.865 |         1.092 |         0.197 |
| (16, 16, 4096, 256) | causal_mask   | torch.bfloat16 |        15326.592 |           14263.878 |        55227.806 |          313063.232 |         1.075 |         0.176 |
| (16, 16, 4096, 256) | relative_bias | torch.bfloat16 |        15297.963 |           14007.379 |        54558.029 |          279529.175 |         1.092 |         0.195 |
| (16, 16, 4096, 256) | head_bias     | torch.bfloat16 |        15216.160 |           14276.027 |        55081.581 |          280996.826 |         1.066 |         0.196 |

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125515
Approved by: https://github.com/Chillee
2024-05-17 00:41:55 +00:00
Jithun Nair
14d8e3aec0 Add distributed/_tensor/test_attention to ROCM_BLOCKLIST (#126336)
Fixes #125504
Fixes #126252
Fixes #126296
Fixes #126330

This PR doesn't really fix the RingAttentionTest tests for ROCm, but explicitly adds the whole test file to ROCM_BLOCKLIST to get a clean signal on ROCm distributed CI. We will enable these tests in a follow-up PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126336
Approved by: https://github.com/huydhn, https://github.com/pruthvistony
2024-05-16 16:38:09 +00:00
Catherine Lee
48f98bcdfc [TD] Enable test removal on most default configs + distributed CUDA for everyone (#125931)
yolo

Add the longest jobs in pull:
* default cpu configs
* non sm86 cuda
* distributed cuda for everyone

Still excluding
* slow, inductor, rocm, onnx, mac, dynamo
* distributed cpu
* windows cuda
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125931
Approved by: https://github.com/huydhn, https://github.com/ZainRizvi
2024-05-14 17:35:12 +00:00
Catherine Lee
6f619cc727 [ez] functorch/test_vmap and test_dataloader to run in parallel (#125597)
Also mark test_svd serial in linalg to see if it helps with the flakiness
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125597
Approved by: https://github.com/huydhn, https://github.com/ZainRizvi
2024-05-08 15:37:29 +00:00
Huy Do
0e57bbb6d7 Set timeout for C++ tests (#125517)
Looking at the unrelated Windows timeout failure on https://github.com/pytorch/pytorch/pull/125199, it looks like we don't have a timeout value set for C++ tests atm.  In this case, a C++ test on Windows timed out after 2+ hours.

```
2024-05-02T23:35:34.0639067Z Running cpp/c10_TypeList_test 1/1 ... [2024-05-02 23:35:34.059021]
2024-05-02T23:35:34.0641108Z Executing ['pytest', 'C:\\actions-runner\\_work\\pytorch\\pytorch\\build\\win_tmp\\build\\torch\\test\\c10_TypeList_test.exe', '-m', 'not serial', '-v', '-vv', '-rfEX', '-n', '2', '--junit-xml-reruns', 'test-reports\\python-pytest\\test\\run_test\\test\\run_test-c898ddeff8f33cbf.xml', '-x', '--reruns=2'] ... [2024-05-02 23:35:34.062137]
2024-05-03T02:45:33.7862004Z Process SpawnPoolWorker-2:
2024-05-03T02:45:33.7927201Z Traceback (most recent call last):
2024-05-03T02:45:33.7928032Z   File "C:\Jenkins\Miniconda3\lib\multiprocessing\process.py", line 315, in _bootstrap
2024-05-03T02:45:33.7928722Z     self.run()
2024-05-03T02:45:33.7929722Z   File "C:\Jenkins\Miniconda3\lib\multiprocessing\process.py", line 108, in run
2024-05-03T02:45:33.7931639Z     self._target(*self._args, **self._kwargs)
2024-05-03T02:45:33.7932435Z   File "C:\Jenkins\Miniconda3\lib\multiprocessing\pool.py", line 114, in worker
2024-05-03T02:45:33.7933338Z     task = get()
2024-05-03T02:45:33.7933946Z   File "C:\Jenkins\Miniconda3\lib\multiprocessing\queues.py", line 365, in get
2024-05-03T02:45:33.7935219Z     res = self._reader.recv_bytes()
2024-05-03T02:45:33.7935897Z   File "C:\Jenkins\Miniconda3\lib\multiprocessing\connection.py", line 221, in recv_bytes
2024-05-03T02:45:33.7936609Z     buf = self._recv_bytes(maxlength)
2024-05-03T02:45:33.7937302Z   File "C:\Jenkins\Miniconda3\lib\multiprocessing\connection.py", line 310, in _recv_bytes
2024-05-03T02:45:33.7938316Z     waitres = _winapi.WaitForMultipleObjects(
2024-05-03T02:45:33.7938766Z KeyboardInterrupt
```

Retrying was working, but it was already too late to finish the job.  I'm setting the same default `THRESHOLD * 3` timeout value here for C++ tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125517
Approved by: https://github.com/clee2000
2024-05-07 16:41:38 +00:00
Catherine Lee
848fce35b5 [CI][ez] Don't retry when it says don't retry (#125643)
default arg for retry_shell is retries=1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125643
Approved by: https://github.com/huydhn
2024-05-07 16:20:00 +00:00
Catherine Lee
1b3fd83ab2 [TD] Enable TD on AVX related configs (#125482)
On test configs `nogpu_AVX512` and `nogpu_NO_AVX2`, which are the next longest jobs on trunk after windows
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125482
Approved by: https://github.com/huydhn
2024-05-06 22:02:16 +00:00