Commit Graph

736 Commits

Author SHA1 Message Date
Alessandro Sangiorgi
f57754e815 [Inductor] Record Triton’s Base32 Cache Key in .best_config for Debugging (#154618)
This is a follow-up PR of the reverted one https://github.com/pytorch/pytorch/pull/148981 re-opening for visibility :

Modified TorchInductor’s autotuning flow so that each best_config JSON file also includes the Triton “base32” (or base64) cache key.

Motivation

Debugging & Analysis: With this change, we can quickly identify which compiled binary and IRs belongs to a given best config.
The impact is minimal since it is only an extra field in .best_config. It can help advanced performance tuning or kernel-level debugging.

Also, since Triton already stores cubin/hsaco in its cache, developers/researchers can avoid to set store_cubin = True since they can get the cubin/hsaco in the Triton cache and with the code provided in this PR, they can easily match the best_config with the right Triton cache directory for the "best" kernel.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154618
Approved by: https://github.com/jansel
2025-05-30 19:30:25 +00:00
soulitzer
733e684b11 Skip test file that doesn't run gradcheck for slow gradcheck (#154509)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154509
Approved by: https://github.com/malfet
2025-05-29 16:32:26 +00:00
Joel Schlosser
3ecd444004 Support independent builds for cpp extension tests + apply to libtorch_agnostic tests (#153264)
Related: #148920

This PR:
* Provides a helper `install_cpp_extension(extension_root)` for building C++ extensions. This is intended to be used in `TestMyCppExtension.setUpClass()`
    * Updates libtorch_agnostic tests to use this
* Deletes preexisting libtorch_agnostic tests from `test/test_cpp_extensions_aot.py`
    * Fixes `run_test.py` to actually run tests in `test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py` to avoid losing coverage. This wasn't being run due to logic excluding tests that start with "cpp"; this is fixed now

After this PR, it is now possible to run:
```
python test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py
```

and the test will build the `libtorch_agnostic` extension before running the tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153264
Approved by: https://github.com/janeyx99
2025-05-20 19:18:09 +00:00
Shangdi Yu
b3dea0c0dd Change aoti cpp tests to run serially within file (#152960)
Fixes #152674
https://github.com/pytorch/pytorch/issues/152889
https://github.com/pytorch/pytorch/issues/152888
https://github.com/pytorch/pytorch/issues/152891

`--dist=loadfile` ensures all tests in the same source file run in the same worker.

Tests like `FreeInactiveConstantBufferRuntimeConstantFoldingCuda` expect exclusive access to memory during test time to compute diffs (e.g., initMemory - updateMemory2 == DATASIZE).

With `-n 3`, tests run in separate processes, but CUDA device memory is shared — and cudaMemGetInfo() reads device-wide global state.

```
 python test/run_test.py --cpp --verbose -i cpp/test_aoti_inference -dist=loadfile
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152960
Approved by: https://github.com/desertfire, https://github.com/cyyever
2025-05-14 17:02:39 +00:00
Guilherme Leobas
ae1e51b6ad Add infra to run CPython tests under Dynamo (#150787)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150787
Approved by: https://github.com/zou3519
2025-05-07 04:03:14 +00:00
PyTorch MergeBot
103fe856e1 Revert "Add infra to run CPython tests under Dynamo (#150787)"
This reverts commit 7c96dd8f0c.

Reverted https://github.com/pytorch/pytorch/pull/150787 on behalf of https://github.com/huydhn due to Sorry for reverting your change but a failed test is showing up in trunk ([comment](https://github.com/pytorch/pytorch/pull/150787#issuecomment-2852818113))
2025-05-06 00:20:02 +00:00
Alexander Grund
99287b170b Generate test reports for pytest when option is given (#152170)
The argument needs to be appended when test reports should be generated. IS_CI is not necessarily set, so rather check TEST_SAVE_XML instead as in other places where test reports are conditionally enabled.

See also https://github.com/pytorch/pytorch/issues/126523
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152170
Approved by: https://github.com/Skylion007
2025-05-05 17:46:40 +00:00
Guilherme Leobas
7c96dd8f0c Add infra to run CPython tests under Dynamo (#150787)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150787
Approved by: https://github.com/zou3519
2025-05-05 17:20:14 +00:00
Alexander Grund
ad11d6378c Don't run NCCL/gloo distributed test without GPUs (#150764)
If there aren't any GPUs the WORLD_SIZE would be zero which does not work.
So skip those backends completely in that case.

Fix after https://github.com/pytorch/pytorch/pull/137161

It might make sense to still run the (CPU-) part of the tests by using something like `world_size = max(3, gpu_count)` or `num_gpus if num_gpus else 3` instead of skipping them all

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150764
Approved by: https://github.com/kwen2501
2025-04-29 05:27:23 +00:00
PyTorch MergeBot
c03359de2d Revert "[Inductor] Record Triton’s Base32 Cache Key in .best_config for Debugging (#148981)"
This reverts commit fc6e37ceb2.

Reverted https://github.com/pytorch/pytorch/pull/148981 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally. @davidberard98 can you please help get these changes validated? Details in D73628297. To validate the fixes internally, you can follow the instructions here: https://fburl.com/fixing-ghfirst-reverts ([comment](https://github.com/pytorch/pytorch/pull/148981#issuecomment-2831044810))
2025-04-25 17:45:13 +00:00
fulvius31
fc6e37ceb2 [Inductor] Record Triton’s Base32 Cache Key in .best_config for Debugging (#148981)
This is a follow-up PR of the reverted one https://github.com/pytorch/pytorch/pull/147019 :

Modified TorchInductor’s autotuning flow so that each best_config JSON file also includes the Triton “base32” (or base64) cache key.

Motivation

Debugging & Analysis: With this change, we can quickly identify which compiled binary and IRs belongs to a given best config.
The impact is minimal since it is only an extra field in .best_config. It can help advanced performance tuning or kernel-level debugging.

Also, since Triton already stores cubin/hsaco in its cache, developers/researchers can avoid to set store_cubin = True since they can get the cubin/hsaco in the Triton cache and with the code provided in this PR, they can easily match the best_config with the right Triton cache directory for the "best" kernel.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148981
Approved by: https://github.com/davidberard98
2025-04-24 21:28:53 +00:00
FFFrog
3528488061 [Openreg][PrivateUse1] Enable CI for openreg (#151007)
Changes:
- move test_openreg.py from test/cpp_extensions/open_registration_extension/ to test/
- update README.md for openreg
- enable CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151007
Approved by: https://github.com/albanD
2025-04-18 02:40:07 +00:00
PyTorch MergeBot
f252f9df5e Revert "[Openreg][PrivateUse1] Enable CI for openreg (#151007)"
This reverts commit abbca37fe8.

Reverted https://github.com/pytorch/pytorch/pull/151007 on behalf of https://github.com/clee2000 due to At least test_record_event needs to also be skipped on dynamo too, its failing and then somehow causing a hang? https://github.com/pytorch/pytorch/actions/runs/14487625709/job/40637535027#step:25:73 ([comment](https://github.com/pytorch/pytorch/pull/151007#issuecomment-2810789483))
2025-04-16 21:05:17 +00:00
FFFrog
abbca37fe8 [Openreg][PrivateUse1] Enable CI for openreg (#151007)
Changes:
- move test_openreg.py from test/cpp_extensions/open_registration_extension/ to test/
- update README.md for openreg
- enable CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151007
Approved by: https://github.com/albanD
ghstack dependencies: #151005
2025-04-16 07:55:51 +00:00
Nikita Shulga
d7050ef48b [CI] Run test_torchinductor for MPS device (#150821)
There are only 118 failures atm, mark them all with xfail to avoid new regressions

Add `xfail_if_mps_unimplemented` decorator to distinguish between tests that call unimplemented eager op vs ones that fail for some other reason.

Added `aten._scaled_dot_product_attention_math_for_mps` fallback to make test behavior consistent between MacOS-15 (where falback is in place) and MacOS-14

Weird MacOS-14 specific skips:
- test_torchinductor.py::GPUTests::test_cat_extern_kernel_mps
- test_torchinductor.py::GPUTests::test_sort_transpose_mps (likely an eager bug)
- test_torchinductor.py::GPUTests::test_unaligned_input_mps

Numerous MacOS-13 skips, including few eager hard crashes, for example running `test_torchinductor.py::GPUTests::test_scatter5_mps` causes
```
/AppleInternal/Library/BuildRoots/c651a45f-806e-11ed-a221-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSNDArray/Kernels/MPSNDArrayScatter.mm:309: failed assertion `Rank of destination array (1) must be greater than or equal to inner-most dimension of indices array (3)'
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150821
Approved by: https://github.com/ZainRizvi, https://github.com/dcci
ghstack dependencies: #151224, #151246, #151272, #151282, #151288
2025-04-15 18:42:39 +00:00
Prachi Gupta
47cdad2995 [ROCm] Enable several fsdp related UTs (#149369)
Enabling 26 UTs for ROCm in the following files:

-  distributed._shard.sharded_optim.test_sharded_optim - 2 UTs
-  distributed._shard.sharded_tensor.ops.test_binary_cmp - 4 UTs
-  distributed._shard.sharded_tensor.ops.test_init - 3 UTs
-  distributed._shard.sharded_tensor.ops.test_embedding - 2 UTs
-  distributed._shard.sharded_tensor.ops.test_embedding_bag - 2 UTs
-  distributed._composable.test_replicate_with_compiler - 4 UTs
-  distributed._composable.fsdp.test_fully_shard_grad_scaler - 1 UTs
-  distributed.tensor.test_attention - 4 UTs
-  distributed.tensor.test_matrix_ops - 1 UTs
-  distributed.tensor.test_tensor_ops - 1 UTs
-  distributed.fsdp.test_fsdp_grad_acc - 2 UTs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149369
Approved by: https://github.com/jeffdaily
2025-03-31 16:15:57 +00:00
Catherine Lee
85079e4380 [TD] Enable TD on distributed cpu (#150028)
Enable TD on distributed cpu, I think the only reason it's not is because I forgot to enable it

Get rid of some of the statements that are no ops:
* asan uses default shard
* nogpu got moved to periodic
* no windows cuda testing anymore

Only thing on pull and trunk that doesn't use TD is dynamo_wrapped but I think it's fast enough to be ok for now, we can take another look after this
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150028
Approved by: https://github.com/ZainRizvi
2025-03-28 17:19:11 +00:00
Aleksei Nikiforov
0c139fa58e Switch s390x tests to blocklist (#149507)
Switch s390x tests to blocklist
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149507
Approved by: https://github.com/seemethere
2025-03-26 12:11:41 +00:00
Aleksei Nikiforov
d5b1d99f78 Enable more nightly tests on s390x (#148452)
Also enable some tests which probably were accidentally disabled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148452
Approved by: https://github.com/seemethere, https://github.com/malfet
2025-03-18 16:09:39 +00:00
soulitzer
916e8979d3 Skip some tests not using gradcheck on slowgradcheck (#149220)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149220
Approved by: https://github.com/seemethere
2025-03-17 00:34:52 +00:00
Jane Xu
971606befa Add a stable TORCH_LIBRARY to C shim (#148124)
This PR adds two main parts:
- shim.h stable C APIs into torch::Library APIs
- a higher level API in torch/csrc/stable/library.h that calls into this shim.h + otherwise is self contained

Goal: custom kernel writers should be able to call the apis in the directories above in order to register their library in a way that allows their custom extension to run with a different libtorch version than it was built with.

Subplots resolved:

- Do we want a whole separate StableLibrary or do we want to freeze torch::Library and add `m.stable_impl(cstring, void (*fn)(void **, int64_t, int64_t)` into it
    - Yes, we want a separate StableLibrary. We cannot freeze Library and it is NOT header only.
- Should I use unint64_t as the common denominator instead of void* to support 32bit architectures better?
    -  Yes, and done
- Should I add a stable `def` and `fragment` when those can be done in python?
    - I think we do want these --- and now they're done
- Where should library_stable_impl.cpp live? -- no longer relevant
- I need some solid test cases to make sure everything's going ok. I've intentionally thrown in a bunch of random dtypes into the signature, but I still haven't tested returning multiple things, returning nothing, complex dtypes, etc.
    - Have since tested all the torch library endpoints. the others can be tested in a followup to separate components that need to be in shim.h vs can be added later

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148124
Approved by: https://github.com/albanD, https://github.com/zou3519, https://github.com/atalman
2025-03-11 19:12:46 +00:00
PyTorch MergeBot
275a7c5dbb Revert "Add a stable TORCH_LIBRARY to C shim (#148124)"
This reverts commit 327e07ac1d.

Reverted https://github.com/pytorch/pytorch/pull/148124 on behalf of https://github.com/malfet due to Sorry for reverting your PR, but somehow it caused test failures in newly introduced tests, see https://hud.pytorch.org/hud/pytorch/pytorch/main/1?per_page=50&name_filter=pull%20%2F%20linux-focal-cuda12.6-py3.10-gcc11-sm89%20%2F%20test%20(default%2C%201&mergeLF=true ([comment](https://github.com/pytorch/pytorch/pull/148124#issuecomment-2709057833))
2025-03-09 20:44:56 +00:00
Jane Xu
327e07ac1d Add a stable TORCH_LIBRARY to C shim (#148124)
This PR adds two main parts:
- shim.h stable C APIs into torch::Library APIs
- a higher level API in torch/csrc/stable/library.h that calls into this shim.h + otherwise is self contained

Goal: custom kernel writers should be able to call the apis in the directories above in order to register their library in a way that allows their custom extension to run with a different libtorch version than it was built with.

Subplots resolved:

- Do we want a whole separate StableLibrary or do we want to freeze torch::Library and add `m.stable_impl(cstring, void (*fn)(void **, int64_t, int64_t)` into it
    - Yes, we want a separate StableLibrary. We cannot freeze Library and it is NOT header only.
- Should I use unint64_t as the common denominator instead of void* to support 32bit architectures better?
    -  Yes, and done
- Should I add a stable `def` and `fragment` when those can be done in python?
    - I think we do want these --- and now they're done
- Where should library_stable_impl.cpp live? -- no longer relevant
- I need some solid test cases to make sure everything's going ok. I've intentionally thrown in a bunch of random dtypes into the signature, but I still haven't tested returning multiple things, returning nothing, complex dtypes, etc.
    - Have since tested all the torch library endpoints. the others can be tested in a followup to separate components that need to be in shim.h vs can be added later

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148124
Approved by: https://github.com/albanD, https://github.com/zou3519
2025-03-09 10:07:25 +00:00
PyTorch MergeBot
63778cb8a0 Revert "[Inductor] Record Triton’s Base32 Cache Key in .best_config for Debugging (#147019)"
This reverts commit e3e45d90d8.

Reverted https://github.com/pytorch/pytorch/pull/147019 on behalf of https://github.com/clee2000 due to broke inductor test inductor/test_max_autotune.py::TestMaxAutotune::test_cat_max_autotune_extern [GH job link](https://github.com/pytorch/pytorch/actions/runs/13653495421/job/38171259603) [HUD commit link](e3e45d90d8) on inductor workflow and rocm workflow ([comment](https://github.com/pytorch/pytorch/pull/147019#issuecomment-2698677222))
2025-03-04 19:20:15 +00:00
fulvius31
e3e45d90d8 [Inductor] Record Triton’s Base32 Cache Key in .best_config for Debugging (#147019)
Modified  TorchInductor’s autotuning flow so that each `best_config` JSON file also includes the Triton “base32” (or base64) cache key.

**Motivation**

Debugging & Analysis: With this change, we can quickly identify which compiled binary and IRs belongs to a given best config.
The impact is minimal since it is only an extra field in .best_config. It can help advanced performance tuning or kernel-level debugging.

Also, since Triton already stores cubin/hsaco in its cache, developers/researchers can avoid to set `store_cubin = True` since they can get the cubin/hsaco in the Triton cache and with the code provided in this PR, they can easily match the best_config with the right Triton cache directory for the "best" kernel.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147019
Approved by: https://github.com/davidberard98
2025-03-04 12:16:38 +00:00
Alexander Grund
f1cce0951b Create unique test report files for distributed tests (#148325)
The distributed tests are executed once for each backend and for each init method.
`$TEST_REPORT_SOURCE_OVERRIDE` is used such that test results from different backends are stored in different files.
The same needs to be done for the init method.

Move the setting of the variable into `test_distributed` and incorporate the init method into the name.

Useful for e.g. https://github.com/pytorch/pytorch/issues/126523

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148325
Approved by: https://github.com/clee2000
2025-03-04 10:45:33 +00:00
Xuehai Pan
c73a92fbf5 [BE][CI] bump ruff to 0.9.2: multiline assert statements (#144546)
Reference: https://docs.astral.sh/ruff/formatter/black/#assert-statements

> Unlike Black, Ruff prefers breaking the message over breaking the assertion, similar to how both Ruff and Black prefer breaking the assignment value over breaking the assignment target:
>
> ```python
> # Input
> assert (
>     len(policy_types) >= priority + num_duplicates
> ), f"This tests needs at least {priority+num_duplicates} many types."
>
>
> # Black
> assert (
>     len(policy_types) >= priority + num_duplicates
> ), f"This tests needs at least {priority+num_duplicates} many types."
>
> # Ruff
> assert len(policy_types) >= priority + num_duplicates, (
>     f"This tests needs at least {priority + num_duplicates} many types."
> )
> ```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144546
Approved by: https://github.com/malfet
2025-02-27 20:46:16 +00:00
Zhenbin Lin
7ffae2c028 Split test_transformers.py (#147441)
Split test_transformers.py into test_transformers.py and test_transformers_privateuser1.py. Currently the privateuse1 test cases in test_transformers.py are skipped since they conflict with cuda test cases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147441
Approved by: https://github.com/drisspg
2025-02-26 11:54:24 +00:00
Xuehai Pan
754fb834db [BE][CI] bump ruff to 0.9.0: string quote styles (#144569)
Reference: https://docs.astral.sh/ruff/formatter/#f-string-formatting

- Change the outer quotes to double quotes for nested f-strings

```diff
- f'{", ".join(args)}'
+ f"{', '.join(args)}"
```

- Change the inner quotes to double quotes for triple f-strings

```diff
  string = """
-     {', '.join(args)}
+     {", ".join(args)}
  """
```

- Join implicitly concatenated strings

```diff
- string = "short string " "short string " f"{var}"
+ string = f"short string short string {var}"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144569
Approved by: https://github.com/Skylion007
ghstack dependencies: #146509
2025-02-24 19:56:09 +00:00
Catherine Lee
0d16188c06 [CI] Use job name to index into test times json (#147154)
When the test times are generated, it doesn't know what the build environment is because it's an environment variable.  But when we index into the test times, we (previously) didn't know what the job name is.  These are usually the same but sometimes they're different and when they're different it ends up using default, which can have unbalanced sharding

I think job name was added at some point to most of the CI environments but I didn't realize, so we can now update this code to use the job name instead so the generation and the indexing match

also upload stats workflow for mps

Checked that inductor_amx doesn't use default

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147154
Approved by: https://github.com/huydhn
2025-02-14 17:06:56 +00:00
PyTorch MergeBot
9a883007a2 Revert "Implement cuda graphs implementation of torch.cond and torch.while_loop (#140979)"
This reverts commit c7515da7b0.

Reverted https://github.com/pytorch/pytorch/pull/140979 on behalf of https://github.com/huydhn due to This change has been reported to break internal code ([comment](https://github.com/pytorch/pytorch/pull/140979#issuecomment-2657361940))
2025-02-13 18:04:26 +00:00
Daniel Galvez
c7515da7b0 Implement cuda graphs implementation of torch.cond and torch.while_loop (#140979)
This is a new PR for #130386 , which got stale and was closed. Since I force-pushed to that branch in order to rebase it on top of main, the PR can no longer be reopened, according to https://github.com/isaacs/github/issues/361

I fixed the possibly-not-warmed-up problem described here: https://github.com/pytorch/pytorch/pull/130386/files#r1690856534

Since starting this, torch.cond and torch.while_loop now apparently have support for backward passes. I will look into what it might take to support that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140979
Approved by: https://github.com/eqy, https://github.com/eellison
2025-02-11 18:16:15 +00:00
Aleksei Nikiforov
44ecbcbd5a s390x: disable test_model_exports_to_core_aten.py test (#145835)
It often gets killed by OOM.
Disable it while investigating.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145835
Approved by: https://github.com/huydhn
2025-01-31 17:45:10 +00:00
Burak Turk
01a4d86b31 add pt2 callbacks for backward pass and prevent duplicate callbacks (#145732)
Summary: This change adds callbacks for lazy backwards compilation while preventing duplicate callbacks to be fired.

Differential Revision: D68577593

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145732
Approved by: https://github.com/mlazos
2025-01-28 03:50:02 +00:00
albanD
0d28188cc8 Move privateuse1 test out of test_utils and make them serial (#145380)
Fixes https://github.com/pytorch/pytorch/issues/132720

The reason is that changing the privateuse1 module is global and so can race when other tests happen to check if it is enabled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145380
Approved by: https://github.com/Skylion007, https://github.com/janeyx99
2025-01-23 00:31:39 +00:00
Aaron Orenstein
99dbc5b0e2 PEP585 update - test (#145176)
See #145101 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145176
Approved by: https://github.com/bobrenjc93
2025-01-22 04:48:28 +00:00
Zhenbin Lin
cbb1ed2966 [1/N] OpenReg: Replace open_registration_extension.cpp with openreg (#141815)
As described in OpenReg [next-steps](https://github.com/pytorch/pytorch/blob/main/test/cpp_extensions/open_registration_extension/README.md#next-steps), here we replace the current `open_registration_extension.cpp` test in PyTorch CI with openreg.

The current `open_registration_extension.cpp` contains two parts:
1. Implentations to support `PrivateUse1` backend.
2. Helper functions used for UTs in `test_cpp_extensions_open_device_registration.py` and `test_transformers.py`.

For the first part, we'll replace it with openreg. For the second part, we'll migrate them to ut files step by step.

@albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141815
Approved by: https://github.com/albanD
2025-01-14 15:59:00 +00:00
Aleksei Nikiforov
4143312e67 S390x ci periodic tests (#125401)
Periodically run testsuite for s390x

**Dependencies update**
Package z3-solver is updated from version 4.12.2.0 to version 4.12.6.0. This is a minor version update, so no functional change is expected.
The reason for update is build on s390x. pypi doesn't provide binary build for z3-solver for versions 4.12.2.0 or 4.12.6.0 for s390x. Unfortunately, version 4.12.2.0 fails to build with newer gcc used on s390x builders, but those errors are fixed in version 4.12.6.0. Due to this minor version bump fixes build on s390x.

```
# pip3 install z3-solver==4.12.2.0
...
      In file included from /tmp/pip-install-756iytc6/z3-solver_ce6f750b780b4146a9a7c01e52672071/core/src/util/region.cpp:53:
      /tmp/pip-install-756iytc6/z3-solver_ce6f750b780b4146a9a7c01e52672071/core/src/util/region.cpp: In member function ‘void* region::allocate(size_t)’:
      /tmp/pip-install-756iytc6/z3-solver_ce6f750b780b4146a9a7c01e52672071/core/src/util/tptr.h:29:62: error: ‘uintptr_t’ does not name a type
         29 | #define ALIGN(T, PTR) reinterpret_cast<T>(((reinterpret_cast<uintptr_t>(PTR) >> PTR_ALIGNMENT) + \
            |                                                              ^~~~~~~~~
      /tmp/pip-install-756iytc6/z3-solver_ce6f750b780b4146a9a7c01e52672071/core/src/util/region.cpp:82:22: note: in expansion of macro ‘ALIGN’
         82 |         m_curr_ptr = ALIGN(char *, new_curr_ptr);
            |                      ^~~~~
      /tmp/pip-install-756iytc6/z3-solver_ce6f750b780b4146a9a7c01e52672071/core/src/util/region.cpp:57:1: note: ‘uintptr_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
         56 | #include "util/page.h"
        +++ |+#include <cstdint>
         57 |
```

**Python paths update**
On AlmaLinux 8 s390x, old paths:
```
python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())'
/usr/lib/python3.12/site-packages
```

Total result is `/usr/lib/python3.12/site-packages/torch;/usr/lib/python3.12/site-packages`

New paths:
```
python -c 'import site; print(";".join([x for x in site.getsitepackages()] + [x + "/torch" for x in site.getsitepackages()]))'
/usr/local/lib64/python3.12/site-packages;/usr/local/lib/python3.12/site-packages;/usr/lib64/python3.12/site-packages;/usr/lib/python3.12/site-packages;/usr/local/lib64/python3.12/site-packages/torch;/usr/local/lib/python3.12/site-packages/torch;/usr/lib64/python3.12/site-packages/torch;/usr/lib/python3.12/site-packages/torch
```

```
# python -c 'import torch ; print(torch)'
<module 'torch' from '/usr/local/lib64/python3.12/site-packages/torch/__init__.py'>
```

`pip3 install dist/*.whl` installs torch into `/usr/local/lib64/python3.12/site-packages`, and later it's not found by cmake with old paths:

```
CMake Error at CMakeLists.txt:9 (find_package):
  By not providing "FindTorch.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "Torch", but
  CMake did not find one.
```

https://github.com/pytorch/pytorch/actions/runs/10994060107/job/30521868178?pr=125401

**Builders availability**
Build took 60 minutes
Tests took: 150, 110, 65, 55, 115, 85, 50, 70, 105, 110 minutes (split into 10 shards)

60 + 150 + 110 + 65 + 55 + 115 + 85 + 50 + 70 + 105 + 110 = 975 minutes used. Let's double it. It would be 1950 minutes.

We have 20 machines * 24 hours = 20 * 24 * 60 = 20 * 1440 = 28800 minutes

We currently run 5 nightly binaries builds, each on average 90 minutes build, 15 minutes test, 5 minutes upload, 110 minutes total for each, 550 minutes total. Doubling would be 1100 minutes.

That leaves 28800 - 1100 = 27700 minutes total. Periodic tests would use will leave 25750 minutes.

Nightly binaries build + nightly tests = 3050 minutes.

25750 / 3050 = 8.44. So we could do both 8 more times for additional CI runs for any reason. And that is with pretty good safety margin.

**Skip test_tensorexpr**
On s390x, pytorch is built without llvm.
Even if it would be built with llvm, llvm currently doesn't support used features on s390x and test fails with errors like:
```
JIT session error: Unsupported target machine architecture in ELF object pytorch-jitted-objectbuffer
unknown file: Failure
C++ exception with description "valOrErr INTERNAL ASSERT FAILED at "/var/lib/jenkins/workspace/torch/csrc/jit/tensorexpr/llvm_jit.h":34, please report a bug to PyTorch. Unexpected failure in LLVM JIT: Failed to materialize symbols: { (main, { func }) }
```
**Disable cpp/static_runtime_test on s390x**

Quantization is not fully supported on s390x in pytorch yet.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125401
Approved by: https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-01-10 18:21:07 +00:00
Wanchao Liang
96f4abba17 [dtensor] move all tests to distribute/tensor folder (#144166)
as titled, mainly moving files

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144166
Approved by: https://github.com/Skylion007
2025-01-08 00:32:33 +00:00
PyTorch MergeBot
6c54963f75 Revert "[dtensor] move all tests to distribute/tensor folder (#144166)"
This reverts commit 2e1ea8598f.

Reverted https://github.com/pytorch/pytorch/pull/144166 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but inductor/test_compiled_autograd needs to be updated ([comment](https://github.com/pytorch/pytorch/pull/144166#issuecomment-2575969871))
2025-01-07 18:31:36 +00:00
Wanchao Liang
2e1ea8598f [dtensor] move all tests to distribute/tensor folder (#144166)
as titled, mainly moving files

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144166
Approved by: https://github.com/Skylion007
2025-01-07 06:45:14 +00:00
PyTorch MergeBot
99f2491af9 Revert "Use absolute path path.resolve() -> path.absolute() (#129409)"
This reverts commit 45411d1fc9.

Reverted https://github.com/pytorch/pytorch/pull/129409 on behalf of https://github.com/jeanschmidt due to Breaking internal CI, @albanD please help get this PR merged ([comment](https://github.com/pytorch/pytorch/pull/129409#issuecomment-2571316444))
2025-01-04 14:17:20 +00:00
Xuehai Pan
45411d1fc9 Use absolute path path.resolve() -> path.absolute() (#129409)
Changes:

1. Always explicit `.absolute()`: `Path(__file__)` -> `Path(__file__).absolute()`
2. Replace `path.resolve()` with `path.absolute()` if the code is resolving the PyTorch repo root directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129409
Approved by: https://github.com/albanD
2025-01-03 20:03:40 +00:00
Nikita Shulga
d8c3900d80 [Inductor] Implement primitive Metal compiler (#143893)
Still work in progress, only works for element wise operations. Current implementation could be used to turn something like
```python
def f(x):
  return x[:,::2].sin() + x[:, 1::2].cos()
```
into the following shader
```python
# Topologically Sorted Source Nodes: [sin, cos, add], Original ATen: [aten.sin, aten.cos, aten.add]
# Source node to ATen node mapping:
#   add => add
#   cos => cos
#   sin => sin
# Graph fragment:
#   %sin : [num_users=1] = call_function[target=torch.ops.aten.sin.default](args = (%slice_2,), kwargs = {})
#   %cos : [num_users=1] = call_function[target=torch.ops.aten.cos.default](args = (%slice_4,), kwargs = {})
#   %add : [num_users=1] = call_function[target=torch.ops.aten.add.Tensor](args = (%sin, %cos), kwargs = {})
mps_lib = torch.mps._compile_shader("""
    kernel void kernel_0(
        device float* out_ptr0,
        constant float* in_ptr0,
        uint xindex [[thread_position_in_grid]]
    ) {
        int x0 = xindex;
        auto tmp0 = in_ptr0[2*x0];
        auto tmp1 = metal::precise::sin(tmp0);
        auto tmp2 = in_ptr0[2*x0 + 1];
        auto tmp3 = metal::precise::cos(tmp2);
        auto tmp4 = tmp1 + tmp3;
        out_ptr0[x0] = static_cast<float>(tmp4);
    }
""")
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143893
Approved by: https://github.com/jansel
ghstack dependencies: #143891, #143892
2024-12-28 06:58:32 +00:00
PyTorch MergeBot
cc4e70b7c3 Revert "Use absolute path path.resolve() -> path.absolute() (#129409)"
This reverts commit 135c7db99d.

Reverted https://github.com/pytorch/pytorch/pull/129409 on behalf of https://github.com/malfet due to need to revert to as dependency of https://github.com/pytorch/pytorch/pull/129374 ([comment](https://github.com/pytorch/pytorch/pull/129409#issuecomment-2562969825))
2024-12-26 17:26:06 +00:00
Xuehai Pan
135c7db99d Use absolute path path.resolve() -> path.absolute() (#129409)
Changes:

1. Always explicit `.absolute()`: `Path(__file__)` -> `Path(__file__).absolute()`
2. Replace `path.resolve()` with `path.absolute()` if the code is resolving the PyTorch repo root directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129409
Approved by: https://github.com/albanD
2024-12-24 08:33:08 +00:00
Yidi Wu
a8fa98ccef skip test dynamo for aot_dispatch tests on ci (#142185)
A lot of tests in test_aotdispatch.py is not meaningful (from user's perspective) when we run with dynamo. So we skip them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142185
Approved by: https://github.com/zou3519
2024-12-11 18:46:58 +00:00
Jane Xu
be27dbf2b8 Enable CPP/CUDAExtension with py_limited_api for python agnosticism (#138088)
Getting tested with ao, but now there is a real test i added.

## What does this PR do?

We want to allow custom PyTorch extensions to be able to build one wheel for multiple Python versions, in other words, achieve python agnosticism. It turns out that there is such a way that setuptools/Python provides already! Namely, if the user promises to use only the Python limited API in their extension, they can pass in `py_limited_api` to their Extension class and to the bdist_wheel command (with a min python version) in order to build 1 wheel that will suffice across multiple Python versions.

Sounds lovely! Why don't people do that already with PyTorch? Well 2 things. This workflow is hardly documented (even searching for python agnostic specifically does not reveal many answers) so I'd expect that people simply don't know about it. But even if they did, _PyTorch_ custom Extensions would still not work because we always link torch_python, which does not abide by py_limited_api rules.

So this is where this PR comes in! We respect when the user specifies py_limited_api and skip linking torch_python under that condition, allowing users to enroll in the provided functionality I just described.

## How do I know this PR works?

I manually tested my silly little ultra_norm locally (with `import python_agnostic`) and wrote a test case for the extension showing that
- torch_python doesn't show up in the ldd tree
- no Py- symbols show up
It may be a little confusing that our test case is actually python-free (more clean than python-agnostic) but it is sufficient (and not necessary) towards showing that this change works.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138088
Approved by: https://github.com/ezyang, https://github.com/albanD
2024-12-11 18:22:55 +00:00
Nikita Shulga
95b17f6346 [MPS] Add CompileShader method (#141478)
This allows one to do something like that
```python
import torch
x = torch.ones(10, device="mps")
m = torch.mps._compile_shader("""
   kernel void foo(device float* x, uint idx [[thread_position_in_grid]]) {
     x[idx] += idx;
   }
")
m.foo(x)
```

And in general enables writing custom operators using Metal shaders purely in Python
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141478
Approved by: https://github.com/manuelcandales
2024-12-11 02:00:51 +00:00
PyTorch MergeBot
3e28da1e06 Revert "skip test dynamo for aot_dispatch tests on ci (#142185)"
This reverts commit 7eda06b366.

Reverted https://github.com/pytorch/pytorch/pull/142185 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but I think it has a landrace in trunk ([comment](https://github.com/pytorch/pytorch/pull/142185#issuecomment-2532605728))
2024-12-10 18:50:17 +00:00