Commit Graph

5620 Commits

Author SHA1 Message Date
Chen Lai
9371491529 [Reland][pytorch] Patch the _is_conv_node function (#154473)
Summary: Add the conv padding ops in pytorch, the corresponding pr in torch ao is https://github.com/pytorch/ao/pull/2257

Test Plan:
```
buck test 'fbcode//mode/opt' fbcode//caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_conv_padding_bn_relu (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'
```

Differential Revision: D75494468

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154473
Approved by: https://github.com/Skylion007
2025-05-30 00:41:03 +00:00
Nikita Shulga
d6cb0fe576 [MPS] Extend index_copy support to complex dtypes (#154671)
Should have noticed it during the review
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154671
Approved by: https://github.com/dcci
ghstack dependencies: #154670
2025-05-30 00:28:13 +00:00
Isalia20
41092cb86c [MPS] index copy impl (#154326)
Second most requested op according to #154052

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154326
Approved by: https://github.com/malfet
2025-05-29 16:57:43 +00:00
Xuehai Pan
7ae204c3b6 [BE][CI][Easy] Run lintrunner on generated .pyi stub files (#150732)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150732
Approved by: https://github.com/malfet, https://github.com/cyyever, https://github.com/aorenste
2025-05-27 14:58:02 +00:00
Yuanhao Ji
0a7eef140b Add torch.Tensor._make_wrapper_subclass to torch/_C/__init__.pyi (#154022)
Fixes #153790

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154022
Approved by: https://github.com/Skylion007
2025-05-27 14:10:00 +00:00
Pat Vignola
70d12ccc3f [Torch] Fix error message formatting in fp8 comparison logic (#153647)
Summary: Using `\` includes all the tabs from the next line in the error message.

Test Plan: Nothing, simply error message fixing

Reviewed By: exclamaforte

Differential Revision: D74539234

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153647
Approved by: https://github.com/exclamaforte
2025-05-27 02:51:05 +00:00
Ke Wen
8c16d0e404 [c10d] Add support for testing SIGABRT return (#153167)
`SIGABRT` is a common return by *negative* distributed tests, which checks for effectiveness of NaN assert, watchdog throw, etc.

These errors are not detectable by traditional statements like `with self.assertRaises(RuntimeError)`.

Instead, we'd need to check for the process's return code, e.g. `SIGABRT(6)` would have a return code of -6.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153167
Approved by: https://github.com/fduwjj
2025-05-26 00:56:05 +00:00
PyTorch MergeBot
54932d865e Revert "[c10d] Add support for testing SIGABRT return (#153167)"
This reverts commit 03e102dbe8.

Reverted https://github.com/pytorch/pytorch/pull/153167 on behalf of https://github.com/malfet due to It broke lint ([comment](https://github.com/pytorch/pytorch/pull/153167#issuecomment-2907820789))
2025-05-25 13:17:27 +00:00
Ke Wen
9d922b55ef [Distributed][CI] Rework continuous TestCase (#153653)
1. Reworked `MultiProcContinousTest` to spawn processes during `setUpClass` instead of `main` (so that we can support multiple TestClass'es in one file).

2. The child processes are now an infinite loop, monitoring test IDs passed from main process via a task queue. Reciprocally, the child processes inform the main process completion of a test via a completion queue.

3. Added a test template.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153653
Approved by: https://github.com/d4l3k, https://github.com/fegin, https://github.com/fduwjj
2025-05-25 03:49:29 +00:00
Ke Wen
03e102dbe8 [c10d] Add support for testing SIGABRT return (#153167)
`SIGABRT` is a common return by *negative* distributed tests, which checks for effectiveness of NaN assert, watchdog throw, etc.

These errors are not detectable by traditional statements like `with self.assertRaises(RuntimeError)`.

Instead, we'd need to check for the process's return code, e.g. `SIGABRT(6)` would have a return code of -6.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153167
Approved by: https://github.com/fduwjj
2025-05-25 03:48:34 +00:00
Aaron Orenstein
6503b4a96e Update to using mypy 1.15 (#154054)
The BC break isn't real - mypy decided to start complaining about the way we were typing that function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154054
Approved by: https://github.com/Skylion007
2025-05-24 04:30:57 +00:00
Nikita Shulga
975bbc63db [MPS][BE] Move fmod/remainder to Metal ops (#154280)
This accomplishes following:
 - Fixes correctness problem with large integer types (though probably makes it slower, but this could not be avoided if one wants to compute accurate answer)
 - Makes op faster for floating point types (as Metal kernel invocation is faster than creating MPSGraph)
 - Eliminates need for several correctness workarounds

Fixes https://github.com/pytorch/pytorch/issues/154171
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154280
Approved by: https://github.com/dcci
ghstack dependencies: #154275, #154290
2025-05-24 01:45:33 +00:00
PyTorch MergeBot
28af44285b Revert "[c10d] Add support for testing SIGABRT return (#153167)"
This reverts commit 499a76b844.

Reverted https://github.com/pytorch/pytorch/pull/153167 on behalf of https://github.com/malfet due to Broke lint, see fe784c5a2c/1 ([comment](https://github.com/pytorch/pytorch/pull/153167#issuecomment-2905623868))
2025-05-23 19:44:08 +00:00
Ke Wen
499a76b844 [c10d] Add support for testing SIGABRT return (#153167)
`SIGABRT` is a common return by *negative* distributed tests, which checks for effectiveness of NaN assert, watchdog throw, etc.

These errors are not detectable by traditional statements like `with self.assertRaises(RuntimeError)`.

Instead, we'd need to check for the process's return code, e.g. `SIGABRT(6)` would have a return code of -6.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153167
Approved by: https://github.com/fduwjj
2025-05-23 19:04:28 +00:00
PyTorch MergeBot
561a11aa68 Revert "Patch the _is_conv_node function (#153749)"
This reverts commit c985cec5b2.

Reverted https://github.com/pytorch/pytorch/pull/153749 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/153749#issuecomment-2905504697))
2025-05-23 19:04:20 +00:00
Tsung-Hsien Lee
cae25ef4e5 [c10d] Enhance Error Logging in new_subgroups() for Non-Divisible World Sizes (#154124)
Summary: The error caused by the world size not being divisible by `group_size` is a common issue encountered by end-users when utilizing applications built on top of `new_subgroups()`. However, these applications may employ different variable names, such as `num_trainers_per_group`, which can make the current error messages less effective despite being correct. To address this, we have improved the error messages to display the actual numbers involved, thereby enhancing their clarity and usefulness.

Test Plan: contbuild & OSS CI

Differential Revision: D75226925

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154124
Approved by: https://github.com/wz337
2025-05-23 17:12:43 +00:00
PyTorch MergeBot
59c5fff2aa Revert "[DDP] rebuilt bucket order when find_unused_parameters=true (#153404)"
This reverts commit a79e621c1c.

Reverted https://github.com/pytorch/pytorch/pull/153404 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/153404#issuecomment-2902741300))
2025-05-22 22:26:59 +00:00
Chen Lai
c985cec5b2 Patch the _is_conv_node function (#153749)
Summary: torch.ops.aten.conv2d.padding is also conv2d node

Differential Revision: D74898941

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153749
Approved by: https://github.com/andrewor14, https://github.com/Skylion007
2025-05-22 22:17:02 +00:00
Michael Lazos
423fc671e9 [Cutlass] Support float8_e4m3fn GEMM (#153890)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153890
Approved by: https://github.com/drisspg, https://github.com/eellison
2025-05-22 08:37:33 +00:00
soulitzer
f2af30fee5 Add a HOP to bypass tracing of a wrapper function while tracing the wrapped function (#153487)
Usage:
```python
from torch._higher_order_ops.wrap import dynamo_bypassing_wrapper

# Your ordinary function wrapper
def my_hop_fn_impl(fn, *args, k=1, **kwargs):
    def wrapper(*args, **kwargs):
        out = fn(*args, **kwargs)
        if isinstance(out, tuple):
            return (out[0] + k,)
        return out + k

    return wrapper

# Calling `my_hop_fn` instead of the impl directly captures a HOP into the dynamo graph
def my_hop_fn(fn, *args, k=1, **kwargs):
    return dynamo_bypassing_wrapper(
        functools.partial(my_hop_fn_impl, k=k), fn, *args, **kwargs
    )
```

Notes:
- The dynamo captured graph now stashes arbitrary callable objects (the wrapper_fn) - this is equivalent to what SAC does today with policy_fn.
- The `wrapper_fn` passed to `dynamo_bypassing_wrapper ` should have signature `Callable -> Callable`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153487
Approved by: https://github.com/ydwu4
2025-05-22 04:24:38 +00:00
Slawomir Siwek
3742b7fb3a Treat dim=[] same as dim=None (#153570)
Fixes https://github.com/pytorch/pytorch/issues/153568

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153570
Approved by: https://github.com/ngimel
2025-05-20 22:44:29 +00:00
Joel Schlosser
3ecd444004 Support independent builds for cpp extension tests + apply to libtorch_agnostic tests (#153264)
Related: #148920

This PR:
* Provides a helper `install_cpp_extension(extension_root)` for building C++ extensions. This is intended to be used in `TestMyCppExtension.setUpClass()`
    * Updates libtorch_agnostic tests to use this
* Deletes preexisting libtorch_agnostic tests from `test/test_cpp_extensions_aot.py`
    * Fixes `run_test.py` to actually run tests in `test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py` to avoid losing coverage. This wasn't being run due to logic excluding tests that start with "cpp"; this is fixed now

After this PR, it is now possible to run:
```
python test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py
```

and the test will build the `libtorch_agnostic` extension before running the tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153264
Approved by: https://github.com/janeyx99
2025-05-20 19:18:09 +00:00
Yanli Zhao
a79e621c1c [DDP] rebuilt bucket order when find_unused_parameters=true (#153404)
Differential Revision: D72437251

Enable to rebuild bucket order when find_unused_parameters=true.

It should be always better than not rebuilding bucket order when find_unused_parameters=True:

1. for cases where bucket order in the first iteration is the same as the parameter order, rebuilding bucket order will not change anything

2. for cases where bucket order in the first iteration is not the same as the parameter order, there could be two cases:
    a. bucket order will not change after 1st iteration even the graph is dynamic and there is unused parameter, in this case, rebuilding bucket order will have performance gain
    b. bucket order change after 1st iteration due to dynamic graph, in this case, both parameter order and bucket order in 1st iteration are not ideal, so rebuilding bucket order or not does not matter

it can help case 2.a if enabling to rebuild bucket order when find_unused_parameters=true. meanwhile it will not hurt other cases in 1 and 2.b.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153404
Approved by: https://github.com/rohan-varma, https://github.com/fegin
2025-05-20 02:45:01 +00:00
PyTorch MergeBot
674a85cf26 Revert "[Distributed][CI] Rework continuous TestCase (#153653)"
This reverts commit 0d5c628a6e.

Reverted https://github.com/pytorch/pytorch/pull/153653 on behalf of https://github.com/kwen2501 due to More fixes needed ([comment](https://github.com/pytorch/pytorch/pull/153653#issuecomment-2891931028))
2025-05-19 18:29:27 +00:00
Ke Wen
0d5c628a6e [Distributed][CI] Rework continuous TestCase (#153653)
1. Reworked `MultiProcContinousTest` to spawn processes during `setUpClass` instead of `main` (so that we can support multiple TestClass'es in one file).

2. The child processes are now an infinite loop, monitoring test IDs passed from main process via a task queue. Reciprocally, the child processes inform the main process completion of a test via a completion queue.

3. Added a test template.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153653
Approved by: https://github.com/d4l3k, https://github.com/fegin, https://github.com/fduwjj
2025-05-19 18:20:42 +00:00
Tsung-Hsien Lee
6487ea30b3 [c10d] Fix new_subgroups(group=) bug (#153798)
Summary: The bug, introduced in https://github.com/pytorch/pytorch/pull/152765, was caused by passing the `group` parameter to the `get_rank()` function, which caused the function to return the rank of the entire group instead of the rank of the current process. The fix involves removing the `group` parameter from the `get_rank()` function call.

Test Plan: contbuild & OSS CI

Differential Revision: D74964213

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153798
Approved by: https://github.com/Skylion007
2025-05-19 17:01:10 +00:00
Nikita Shulga
6fe5d9215f [EZ][MPS] Enable rsub op (#153786)
Nothing really to enable, just add it to native functions, TensorIterator abstraction takes care of the rest
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153786
Approved by: https://github.com/cyyever, https://github.com/Skylion007, https://github.com/dcci
2025-05-19 02:01:48 +00:00
Nikita Shulga
62d8e3cb40 [BE][MPS] Cleanup log ops migration (#153727)
Introduced by https://github.com/pytorch/pytorch/pull/153398

Workaround internal compiler error on MacOS-13 by providing boolean specialization

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153727
Approved by: https://github.com/Skylion007
2025-05-16 19:32:17 +00:00
PyTorch MergeBot
3443627e07 Revert "[BE]: Enable RUFF TRY400 rule - log.exception (#153473)"
This reverts commit 4f4ecc583e.

Reverted https://github.com/pytorch/pytorch/pull/153473 on behalf of https://github.com/jeanschmidt due to seems to have broken internal signals, @albanD may I count on you to help the author merge his PR? D74837988 ([comment](https://github.com/pytorch/pytorch/pull/153473#issuecomment-2886017075))
2025-05-16 08:29:26 +00:00
Aaron Gokaslan
4f4ecc583e [BE]: Enable RUFF TRY400 rule - log.exception (#153473)
Change logging.error to logging.exception to log additional information when relevant.  A few places have slipped in logging.errors in try except since I last did a clean up here and the rule is stabilized so I am enabling it codebase wide. I have NOQA'd much of our custom exception stack trace handling for RPC calls and distributed and tried to a fix a few errors based on whether we immediately reraised it or if we didn't print any exception handling where it could be useful.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153473
Approved by: https://github.com/albanD, https://github.com/cyyever
2025-05-15 13:36:59 +00:00
Pat Vignola
6e107899da [Torch] Fix crash when comparing fp8 tensors that have more than 1 dimension (#153508)
Summary: `torch.nonzero` returns as many items as the number of dimensions, so we shouldn't expect a single element for the indices.

Test Plan: CI

Differential Revision: D74539233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153508
Approved by: https://github.com/exclamaforte
2025-05-15 08:41:46 +00:00
Aby Mathew C
6a28cc826f Add TEST_HPU flag to set device type (#153461)
MOTIVATION
This PR includes a minor change to check for TEST_HPU flag as well before falling back to CPU. Without this flag, some tests were falling back to CPU causing them to fail.
Please refer to this RFC as well: https://github.com/pytorch/rfcs/pull/66

CHANGES
add TEST_HPU flag to some of the conditions checking the environment
use DEVICE_COUNT variable instead of torch.accelerator.device_count() API since the later is not supported on out-of-tree devices like Intel Gaudi.
@ankurneog , @EikanWang , @cyyever , @guangyey

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153461
Approved by: https://github.com/EikanWang, https://github.com/cyyever, https://github.com/albanD
2025-05-14 19:31:40 +00:00
clr
534b66fe30 torch.compile: Remove reference to the unused dynamo_config.dynamic_shapes from (#153297)
tests

This config option is not set anywhere, and does nothing, so this should cause
no changes to tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153297
Approved by: https://github.com/Skylion007
2025-05-14 19:02:51 +00:00
nikitaved
edc2d539d1 torch.tensordot: performance improvements when contracting to a scalar. (#145936)
As per title.
Fixes https://github.com/pytorch/pytorch/issues/145731

Touches only compute. The CPU overhead can potentially be further reduced.

Before:
```python
In [3]: n = 512

In [4]: A = torch.rand(n, n)

In [5]: B = torch.rand(n, n)

In [6]: %timeit torch.tensordot(A, B, [[0, 1], [0, 1]])
2.04 ms ± 70 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [7]: %timeit torch.tensordot(A, B, [[0, 1], [1, 0]])
2.85 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [8]: %timeit torch.tensordot(A, B, [[1, 0], [0, 1]])
2.9 ms ± 133 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [9]: %timeit torch.tensordot(A, B, [[1, 0], [1, 0]])
4.07 ms ± 262 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

After
```python
In [2]: n = 512

In [3]: A = torch.rand(n, n)

In [4]: B = torch.rand(n, n)

In [5]: %timeit torch.tensordot(A, B, [[0, 1], [0, 1]])
30.7 µs ± 2.51 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [6]: %timeit torch.tensordot(A, B, [[0, 1], [1, 0]])
141 µs ± 6.52 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [7]: %timeit torch.tensordot(A, B, [[1, 0], [0, 1]])
142 µs ± 4.03 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [8]: %timeit torch.tensordot(A, B, [[1, 0], [1, 0]])
62.8 µs ± 4.31 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145936
Approved by: https://github.com/albanD, https://github.com/ngimel
2025-05-13 10:57:30 +00:00
PyTorch MergeBot
fdc387ec7c Revert "refine fp32 precision api (#125888)"
This reverts commit 4c11b26158.

Reverted https://github.com/pytorch/pytorch/pull/125888 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to cause some failures on ROCm ([comment](https://github.com/pytorch/pytorch/pull/125888#issuecomment-2869274791))
2025-05-11 00:35:46 +00:00
haozhe.zhu
4c11b26158 refine fp32 precision api (#125888)
Based on the [conversation](https://github.com/pytorch/pytorch/issues/121791), we plan to drop the "highest, high, medium" to represent fp32  internal computation data types . Instead, we will directly use the algorithm to represent it.

### Design Choice: Directly use algorithms name like "TF32", "BF16".
#### Pros
 - The names are more informative. 'tf32' is more informative than a simple "high".
 - Easier to extend new algorithm like `tf32x3`
#### Cons
 - "HIGHEST, HIGH, MEDIUM" indicated the relative precision between different algorithms. However, we can have more documents to discuss them.

### We provide a layered structure for backends/operators.
('f32' is short for 'fp32_precision')
![image](https://github.com/user-attachments/assets/f89143e5-d6a1-4865-9351-9a50439f5067)

### We provide 3 fp32 compute precision can be set:
 - **"ieee"**: Not allowed to use any other internal computation data types .
 - **"tf32"**: Allowed to use tf32 as internal computation data types.
 - **"bf16"**: Allowed to use bf16 as internal computation data types.
 - **"none"**:  Precision's are not set. Can be override by its father node.

### Overriding Precision Settings
Child node can be override by its father node if it is set to default.
For current default settings:
```
backend = generic, op = all, precision setting = none
    backend = cuda, op = all, precision setting = none
        backend = cuda, op = conv, precision setting = tf32
        backend = cuda, op = rnn, precision setting = tf32
        backend = cuda, op = matmul, precision setting = none
    backend = matmul, op = all, precision setting = none
        backend = matmul, op = conv, precision setting = none
        backend = matmul, op = rnn, precision setting = none
        backend = matmul, op = matmul, precision setting = none
```
 - If the user set `torch.backends.mkldnn.fp32_precision="bf16"`, his child nodes `torch.backends.mkldnn.matmul.fp32_precision` / `torch.backends.mkldnn.conv.fp32_precision` / `torch.backends.mkldnn.rnn.fp32_precision` will also be override to "bf16".
 - If the user set `torch.backends.fp32_precision="bf16"`,  `torch.backends.mkldnn.fp32_precision` and his child nodes will also we override to "bf16".

### Backward Compatible
Since new API allow user to have more fine-grained control. There will be some conflict. For example, previous `torch.backends.cudnn.allow_tf32` are not enough to represent the status for `torch.backends.cudnn.rnn.fp32_precision="ieee"` and `torch.backends.cudnn.conv.fp32_precision="tf32"`. Therefore, our goal for backward compatible is
 - If the user only uses previous APIs, it will work as previous expectations.
 - If the user use **new** API to change the status to an **un-representable** status for old API, and try to access the status by **old** API. We will raise Runtime Error and point the document for user.

### Test Plan
```
python test/test_cuda.py -k test_fp32_precision_with_tf32
python test/test_cuda.py -k test_fp32_precision_with_float32_matmul_precision
python test/test_cuda.py -k test_invalid_status_for_legacy_api
python test/test_mkldnn.py -k test_mlkdnn_get_set
python test/test_mkldnn.py -k test_generic_precision
python test/test_mkldnn.py -k test_invalid
python test/test_mkldnn.py -k test_default_use_parent
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125888
Approved by: https://github.com/jgong5, https://github.com/albanD

Co-authored-by: Jiang, Yanbing <yanbing.jiang@intel.com>
2025-05-10 11:13:04 +00:00
Xilun Wu
cbb03e6971 [BE][DTensor] move torch.distributed._tensor import to torch.distributed.tensor in test files (#153225)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153225
Approved by: https://github.com/kwen2501, https://github.com/fegin
2025-05-09 20:40:54 +00:00
Ke Wen
5dd746b4b5 [c10d] Reduce test verbosity (#153116)
Has been seeing a lot of `Starting event listener thread for rank` recently in test print-out. Moving them to `logger.debug`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153116
Approved by: https://github.com/fduwjj
2025-05-08 22:22:22 +00:00
Aidyn-A
086e2c2399 [TEST][ATen][CUDA] Skip row-wise scaled matrix mmultiplication tests on sm_120+ (#152814)
The float8 row-wise scaled matmuls are not supported on Blackwell yet. This PR adds skips to those tests to decrease the noise on `sm_120+` machines.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152814
Approved by: https://github.com/eqy, https://github.com/Skylion007
2025-05-08 19:34:20 +00:00
Jithun Nair
fe8ebacee4 [ROCm] Upgrade ROCm CI to ROCm6.4 (#151368)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151368
Approved by: https://github.com/jeffdaily, https://github.com/malfet

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-05-08 16:12:16 +00:00
Xilun Wu
0f9821d0e3 [BE][lint] fix PYFMT for PT-D code under torch.testing._internal, add them to the lint list (#153114)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153114
Approved by: https://github.com/cyyever, https://github.com/fegin, https://github.com/H-Huang, https://github.com/Skylion007
2025-05-08 14:01:49 +00:00
Guilherme Leobas
ae1e51b6ad Add infra to run CPython tests under Dynamo (#150787)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150787
Approved by: https://github.com/zou3519
2025-05-07 04:03:14 +00:00
Nikita Shulga
9919d6b872 [Testing] Add copysign from scalar regression test (#152997)
But instead of adding it just for MPS backend, add it to OpInfo

Fixes https://github.com/pytorch/pytorch/issues/152582
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152997
Approved by: https://github.com/wdvr
2025-05-07 00:19:42 +00:00
Siddharth Kotapati
327d1b6ef0 Move additional MPS Unary ops to Iterator (#152876)
Noticed some of these ops were contributing to a big chunk of the runtime for OpenLLama as well as a few other benchmarks

At the op level, moving to a TensorIterator-based Metal kernel gives a 20x speedup. Will migrate the inverse trigonometric functions & log ops in a follow-up PR, as this one is already a bit large
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152876
Approved by: https://github.com/malfet
2025-05-07 00:06:54 +00:00
drisspg
14f8066910 Ensure mxfp8 scaled_mm works w/ max-autotune (#152744)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152744
Approved by: https://github.com/Skylion007
2025-05-06 01:16:57 +00:00
PyTorch MergeBot
103fe856e1 Revert "Add infra to run CPython tests under Dynamo (#150787)"
This reverts commit 7c96dd8f0c.

Reverted https://github.com/pytorch/pytorch/pull/150787 on behalf of https://github.com/huydhn due to Sorry for reverting your change but a failed test is showing up in trunk ([comment](https://github.com/pytorch/pytorch/pull/150787#issuecomment-2852818113))
2025-05-06 00:20:02 +00:00
Alexander Grund
99287b170b Generate test reports for pytest when option is given (#152170)
The argument needs to be appended when test reports should be generated. IS_CI is not necessarily set, so rather check TEST_SAVE_XML instead as in other places where test reports are conditionally enabled.

See also https://github.com/pytorch/pytorch/issues/126523
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152170
Approved by: https://github.com/Skylion007
2025-05-05 17:46:40 +00:00
Guilherme Leobas
7c96dd8f0c Add infra to run CPython tests under Dynamo (#150787)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150787
Approved by: https://github.com/zou3519
2025-05-05 17:20:14 +00:00
Nikita Shulga
0ffd31dc8a [MPS] Migrate div roudning modes (#152758)
By implementing `div_floor` and `div_trunc` . Do not mark `div_trunc` as OPMATH, to align following output with CPU(if division is performed in fp32, than result will be truncated to 25
```
import torch
print(torch.tensor([[-7.4688, -3.1289]], dtype=torch.float16,device="cpu").div(torch.tensor([-0.2988, -0.8789], dtype=torch.bfloat16,device="cpu"), rounding_mode="trunc"))
tensor([[24.,  3.]])
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152758
Approved by: https://github.com/dcci
ghstack dependencies: #152663, #152515, #152737, #152743
2025-05-05 03:02:29 +00:00
Nikita Shulga
e889937850 [MPS] Migrate div to Metal (#152743)
TODOs:
 - Verify accuracy of  `metal::dot` vs `x.x*x.x + y.y*y.y`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152743
Approved by: https://github.com/dcci, https://github.com/Skylion007
ghstack dependencies: #152663, #152515, #152737
2025-05-04 00:56:19 +00:00