pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Joel Schlosser	3ecd444004	Support independent builds for cpp extension tests + apply to libtorch_agnostic tests (#153264 ) Related: #148920 This PR: * Provides a helper `install_cpp_extension(extension_root)` for building C++ extensions. This is intended to be used in `TestMyCppExtension.setUpClass()` * Updates libtorch_agnostic tests to use this * Deletes preexisting libtorch_agnostic tests from `test/test_cpp_extensions_aot.py` * Fixes `run_test.py` to actually run tests in `test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py` to avoid losing coverage. This wasn't being run due to logic excluding tests that start with "cpp"; this is fixed now After this PR, it is now possible to run: ``` python test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py ``` and the test will build the `libtorch_agnostic` extension before running the tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153264 Approved by: https://github.com/janeyx99	2025-05-20 19:18:09 +00:00
Shangdi Yu	b3dea0c0dd	Change aoti cpp tests to run serially within file (#152960 ) Fixes #152674 https://github.com/pytorch/pytorch/issues/152889 https://github.com/pytorch/pytorch/issues/152888 https://github.com/pytorch/pytorch/issues/152891 `--dist=loadfile` ensures all tests in the same source file run in the same worker. Tests like `FreeInactiveConstantBufferRuntimeConstantFoldingCuda` expect exclusive access to memory during test time to compute diffs (e.g., initMemory - updateMemory2 == DATASIZE). With `-n 3`, tests run in separate processes, but CUDA device memory is shared — and cudaMemGetInfo() reads device-wide global state. ``` python test/run_test.py --cpp --verbose -i cpp/test_aoti_inference -dist=loadfile ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/152960 Approved by: https://github.com/desertfire, https://github.com/cyyever	2025-05-14 17:02:39 +00:00
Guilherme Leobas	ae1e51b6ad	Add infra to run CPython tests under Dynamo (#150787 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150787 Approved by: https://github.com/zou3519	2025-05-07 04:03:14 +00:00
PyTorch MergeBot	103fe856e1	Revert "Add infra to run CPython tests under Dynamo (#150787 )" This reverts commit `7c96dd8f0c`. Reverted https://github.com/pytorch/pytorch/pull/150787 on behalf of https://github.com/huydhn due to Sorry for reverting your change but a failed test is showing up in trunk ([comment](https://github.com/pytorch/pytorch/pull/150787#issuecomment-2852818113))	2025-05-06 00:20:02 +00:00
Alexander Grund	99287b170b	Generate test reports for pytest when option is given (#152170 ) The argument needs to be appended when test reports should be generated. IS_CI is not necessarily set, so rather check TEST_SAVE_XML instead as in other places where test reports are conditionally enabled. See also https://github.com/pytorch/pytorch/issues/126523 Pull Request resolved: https://github.com/pytorch/pytorch/pull/152170 Approved by: https://github.com/Skylion007	2025-05-05 17:46:40 +00:00
Guilherme Leobas	7c96dd8f0c	Add infra to run CPython tests under Dynamo (#150787 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150787 Approved by: https://github.com/zou3519	2025-05-05 17:20:14 +00:00
Alexander Grund	ad11d6378c	Don't run NCCL/gloo distributed test without GPUs (#150764 ) If there aren't any GPUs the WORLD_SIZE would be zero which does not work. So skip those backends completely in that case. Fix after https://github.com/pytorch/pytorch/pull/137161 It might make sense to still run the (CPU-) part of the tests by using something like `world_size = max(3, gpu_count)` or `num_gpus if num_gpus else 3` instead of skipping them all Pull Request resolved: https://github.com/pytorch/pytorch/pull/150764 Approved by: https://github.com/kwen2501	2025-04-29 05:27:23 +00:00
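The two alternatives floated in the commit message above (`max(3, gpu_count)` versus `num_gpus if num_gpus else 3`) behave differently on hosts with one or two GPUs. A minimal sketch of the difference follows; the function names are hypothetical and are not taken from `run_test.py`.

```python
def world_size_max(gpu_count: int, minimum: int = 3) -> int:
    # Variant 1: never go below `minimum`, even when 1 or 2 GPUs exist.
    return max(minimum, gpu_count)


def world_size_fallback(num_gpus: int, default: int = 3) -> int:
    # Variant 2: use the GPU count as-is, falling back only when it is zero.
    return num_gpus if num_gpus else default


if __name__ == "__main__":
    for n in (0, 1, 2, 4):
        print(n, world_size_max(n), world_size_fallback(n))
```

Both variants avoid a zero world size on CPU-only hosts; they differ only for one or two GPUs, where the first variant would request more ranks than there are devices.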
PyTorch MergeBot	c03359de2d	Revert "[Inductor] Record Triton’s Base32 Cache Key in .best_config for Debugging (#148981 )" This reverts commit `fc6e37ceb2`. Reverted https://github.com/pytorch/pytorch/pull/148981 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally. @davidberard98 can you please help get these changes validated? Details in D73628297. To validate the fixes internally, you can follow the instructions here: https://fburl.com/fixing-ghfirst-reverts ([comment](https://github.com/pytorch/pytorch/pull/148981#issuecomment-2831044810))	2025-04-25 17:45:13 +00:00
fulvius31	fc6e37ceb2	[Inductor] Record Triton’s Base32 Cache Key in .best_config for Debugging (#148981 ) This is a follow-up PR of the reverted one https://github.com/pytorch/pytorch/pull/147019 : Modified TorchInductor’s autotuning flow so that each best_config JSON file also includes the Triton “base32” (or base64) cache key. Motivation Debugging & Analysis: With this change, we can quickly identify which compiled binary and IRs belong to a given best config. The impact is minimal since it is only an extra field in .best_config. It can help advanced performance tuning or kernel-level debugging. Also, since Triton already stores cubin/hsaco in its cache, developers/researchers can avoid setting store_cubin = True since they can get the cubin/hsaco from the Triton cache, and with the code provided in this PR they can easily match the best_config with the right Triton cache directory for the "best" kernel. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148981 Approved by: https://github.com/davidberard98	2025-04-24 21:28:53 +00:00
FFFrog	3528488061	[Openreg][PrivateUse1] Enable CI for openreg (#151007 ) Changes: - move test_openreg.py from test/cpp_extensions/open_registration_extension/ to test/ - update README.md for openreg - enable CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/151007 Approved by: https://github.com/albanD	2025-04-18 02:40:07 +00:00
PyTorch MergeBot	f252f9df5e	Revert "[Openreg][PrivateUse1] Enable CI for openreg (#151007 )" This reverts commit `abbca37fe8`. Reverted https://github.com/pytorch/pytorch/pull/151007 on behalf of https://github.com/clee2000 due to At least test_record_event needs to also be skipped on dynamo too, its failing and then somehow causing a hang? https://github.com/pytorch/pytorch/actions/runs/14487625709/job/40637535027#step:25:73 ([comment](https://github.com/pytorch/pytorch/pull/151007#issuecomment-2810789483))	2025-04-16 21:05:17 +00:00
FFFrog	abbca37fe8	[Openreg][PrivateUse1] Enable CI for openreg (#151007 ) Changes: - move test_openreg.py from test/cpp_extensions/open_registration_extension/ to test/ - update README.md for openreg - enable CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/151007 Approved by: https://github.com/albanD ghstack dependencies: #151005	2025-04-16 07:55:51 +00:00
Nikita Shulga	d7050ef48b	[CI] Run test_torchinductor for MPS device (#150821 ) There are only 118 failures atm, mark them all with xfail to avoid new regressions. Add `xfail_if_mps_unimplemented` decorator to distinguish between tests that call an unimplemented eager op vs ones that fail for some other reason. Added `aten._scaled_dot_product_attention_math_for_mps` fallback to make test behavior consistent between MacOS-15 (where the fallback is in place) and MacOS-14. Weird MacOS-14 specific skips: - test_torchinductor.py::GPUTests::test_cat_extern_kernel_mps - test_torchinductor.py::GPUTests::test_sort_transpose_mps (likely an eager bug) - test_torchinductor.py::GPUTests::test_unaligned_input_mps Numerous MacOS-13 skips, including a few eager hard crashes, for example running `test_torchinductor.py::GPUTests::test_scatter5_mps` causes ``` /AppleInternal/Library/BuildRoots/c651a45f-806e-11ed-a221-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSNDArray/Kernels/MPSNDArrayScatter.mm:309: failed assertion `Rank of destination array (1) must be greater than or equal to inner-most dimension of indices array (3)' ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/150821 Approved by: https://github.com/ZainRizvi, https://github.com/dcci ghstack dependencies: #151224, #151246, #151272, #151282, #151288	2025-04-15 18:42:39 +00:00
Prachi Gupta	47cdad2995	[ROCm] Enable several fsdp related UTs (#149369 ) Enabling 26 UTs for ROCm in the following files: - distributed._shard.sharded_optim.test_sharded_optim - 2 UTs - distributed._shard.sharded_tensor.ops.test_binary_cmp - 4 UTs - distributed._shard.sharded_tensor.ops.test_init - 3 UTs - distributed._shard.sharded_tensor.ops.test_embedding - 2 UTs - distributed._shard.sharded_tensor.ops.test_embedding_bag - 2 UTs - distributed._composable.test_replicate_with_compiler - 4 UTs - distributed._composable.fsdp.test_fully_shard_grad_scaler - 1 UTs - distributed.tensor.test_attention - 4 UTs - distributed.tensor.test_matrix_ops - 1 UTs - distributed.tensor.test_tensor_ops - 1 UTs - distributed.fsdp.test_fsdp_grad_acc - 2 UTs Pull Request resolved: https://github.com/pytorch/pytorch/pull/149369 Approved by: https://github.com/jeffdaily	2025-03-31 16:15:57 +00:00
Catherine Lee	85079e4380	[TD] Enable TD on distributed cpu (#150028 ) Enable TD on distributed cpu, I think the only reason it's not is because I forgot to enable it Get rid of some of the statements that are no ops: * asan uses default shard * nogpu got moved to periodic * no windows cuda testing anymore Only thing on pull and trunk that doesn't use TD is dynamo_wrapped but I think it's fast enough to be ok for now, we can take another look after this Pull Request resolved: https://github.com/pytorch/pytorch/pull/150028 Approved by: https://github.com/ZainRizvi	2025-03-28 17:19:11 +00:00
Aleksei Nikiforov	0c139fa58e	Switch s390x tests to blocklist (#149507 ) Switch s390x tests to blocklist Pull Request resolved: https://github.com/pytorch/pytorch/pull/149507 Approved by: https://github.com/seemethere	2025-03-26 12:11:41 +00:00
Aleksei Nikiforov	d5b1d99f78	Enable more nightly tests on s390x (#148452 ) Also enable some tests which probably were accidentally disabled. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148452 Approved by: https://github.com/seemethere, https://github.com/malfet	2025-03-18 16:09:39 +00:00
soulitzer	916e8979d3	Skip some tests not using gradcheck on slowgradcheck (#149220 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/149220 Approved by: https://github.com/seemethere	2025-03-17 00:34:52 +00:00
Jane Xu	971606befa	Add a stable TORCH_LIBRARY to C shim (#148124 ) This PR adds two main parts: - shim.h stable C APIs into torch::Library APIs - a higher level API in torch/csrc/stable/library.h that calls into this shim.h + otherwise is self contained Goal: custom kernel writers should be able to call the apis in the directories above in order to register their library in a way that allows their custom extension to run with a different libtorch version than it was built with. Subplots resolved: - Do we want a whole separate StableLibrary or do we want to freeze torch::Library and add `m.stable_impl(cstring, void (*fn)(void **, int64_t, int64_t)` into it - Yes, we want a separate StableLibrary. We cannot freeze Library and it is NOT header only. - Should I use uint64_t as the common denominator instead of void* to support 32bit architectures better? - Yes, and done - Should I add a stable `def` and `fragment` when those can be done in python? - I think we do want these --- and now they're done - Where should library_stable_impl.cpp live? -- no longer relevant - I need some solid test cases to make sure everything's going ok. I've intentionally thrown in a bunch of random dtypes into the signature, but I still haven't tested returning multiple things, returning nothing, complex dtypes, etc. - Have since tested all the torch library endpoints. The others can be tested in a followup to separate components that need to be in shim.h vs can be added later Pull Request resolved: https://github.com/pytorch/pytorch/pull/148124 Approved by: https://github.com/albanD, https://github.com/zou3519, https://github.com/atalman	2025-03-11 19:12:46 +00:00
PyTorch MergeBot	275a7c5dbb	Revert "Add a stable TORCH_LIBRARY to C shim (#148124 )" This reverts commit `327e07ac1d`. Reverted https://github.com/pytorch/pytorch/pull/148124 on behalf of https://github.com/malfet due to Sorry for reverting your PR, but somehow it caused test failures in newly introduced tests, see https://hud.pytorch.org/hud/pytorch/pytorch/main/1?per_page=50&name_filter=pull%20%2F%20linux-focal-cuda12.6-py3.10-gcc11-sm89%20%2F%20test%20(default%2C%201&mergeLF=true ([comment](https://github.com/pytorch/pytorch/pull/148124#issuecomment-2709057833))	2025-03-09 20:44:56 +00:00
Jane Xu	327e07ac1d	Add a stable TORCH_LIBRARY to C shim (#148124 ) This PR adds two main parts: - shim.h stable C APIs into torch::Library APIs - a higher level API in torch/csrc/stable/library.h that calls into this shim.h + otherwise is self contained Goal: custom kernel writers should be able to call the apis in the directories above in order to register their library in a way that allows their custom extension to run with a different libtorch version than it was built with. Subplots resolved: - Do we want a whole separate StableLibrary or do we want to freeze torch::Library and add `m.stable_impl(cstring, void (*fn)(void **, int64_t, int64_t)` into it - Yes, we want a separate StableLibrary. We cannot freeze Library and it is NOT header only. - Should I use uint64_t as the common denominator instead of void* to support 32bit architectures better? - Yes, and done - Should I add a stable `def` and `fragment` when those can be done in python? - I think we do want these --- and now they're done - Where should library_stable_impl.cpp live? -- no longer relevant - I need some solid test cases to make sure everything's going ok. I've intentionally thrown in a bunch of random dtypes into the signature, but I still haven't tested returning multiple things, returning nothing, complex dtypes, etc. - Have since tested all the torch library endpoints. The others can be tested in a followup to separate components that need to be in shim.h vs can be added later Pull Request resolved: https://github.com/pytorch/pytorch/pull/148124 Approved by: https://github.com/albanD, https://github.com/zou3519	2025-03-09 10:07:25 +00:00
PyTorch MergeBot	63778cb8a0	Revert "[Inductor] Record Triton’s Base32 Cache Key in `.best_config` for Debugging (#147019 )" This reverts commit `e3e45d90d8`. Reverted https://github.com/pytorch/pytorch/pull/147019 on behalf of https://github.com/clee2000 due to broke inductor test inductor/test_max_autotune.py::TestMaxAutotune::test_cat_max_autotune_extern [GH job link](https://github.com/pytorch/pytorch/actions/runs/13653495421/job/38171259603) [HUD commit link](`e3e45d90d8`) on inductor workflow and rocm workflow ([comment](https://github.com/pytorch/pytorch/pull/147019#issuecomment-2698677222))	2025-03-04 19:20:15 +00:00
fulvius31	e3e45d90d8	[Inductor] Record Triton’s Base32 Cache Key in `.best_config` for Debugging (#147019 ) Modified TorchInductor’s autotuning flow so that each `best_config` JSON file also includes the Triton “base32” (or base64) cache key. Motivation Debugging & Analysis: With this change, we can quickly identify which compiled binary and IRs belong to a given best config. The impact is minimal since it is only an extra field in .best_config. It can help advanced performance tuning or kernel-level debugging. Also, since Triton already stores cubin/hsaco in its cache, developers/researchers can avoid setting `store_cubin = True` since they can get the cubin/hsaco from the Triton cache, and with the code provided in this PR they can easily match the best_config with the right Triton cache directory for the "best" kernel. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147019 Approved by: https://github.com/davidberard98	2025-03-04 12:16:38 +00:00
Alexander Grund	f1cce0951b	Create unique test report files for distributed tests (#148325 ) The distributed tests are executed once for each backend and for each init method. `$TEST_REPORT_SOURCE_OVERRIDE` is used such that test results from different backends are stored in different files. The same needs to be done for the init method. Move the setting of the variable into `test_distributed` and incorporate the init method into the name. Useful for e.g. https://github.com/pytorch/pytorch/issues/126523 Pull Request resolved: https://github.com/pytorch/pytorch/pull/148325 Approved by: https://github.com/clee2000	2025-03-04 10:45:33 +00:00
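The report-naming idea in the commit above (one report source per backend and per init method) can be sketched as follows. Only the `TEST_REPORT_SOURCE_OVERRIDE` variable name comes from the commit message; the `dist-<backend>-init-<method>` naming scheme and both helper functions are assumptions made for illustration, not the actual `run_test.py` code.

```python
import re


def report_source(backend: str, init_method: str) -> str:
    # Hypothetical naming scheme: keep only filename-safe characters.
    safe = [re.sub(r"[^A-Za-z0-9]+", "_", part).strip("_")
            for part in (backend, init_method)]
    return f"dist-{safe[0]}-init-{safe[1]}"


def env_for_run(backend: str, init_method: str, base_env: dict) -> dict:
    # Copy the environment so each run gets its own override value.
    env = dict(base_env)
    env["TEST_REPORT_SOURCE_OVERRIDE"] = report_source(backend, init_method)
    return env


if __name__ == "__main__":
    for backend in ("gloo", "nccl"):
        for init_method in ("env", "file"):
            print(env_for_run(backend, init_method, {})["TEST_REPORT_SOURCE_OVERRIDE"])
```

Because the init method is part of the name, two runs of the same backend no longer write to the same report file.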
Xuehai Pan	c73a92fbf5	[BE][CI] bump `ruff` to 0.9.2: multiline `assert` statements (#144546 ) Reference: https://docs.astral.sh/ruff/formatter/black/#assert-statements > Unlike Black, Ruff prefers breaking the message over breaking the assertion, similar to how both Ruff and Black prefer breaking the assignment value over breaking the assignment target: > > ```python > # Input > assert ( > len(policy_types) >= priority + num_duplicates > ), f"This tests needs at least {priority+num_duplicates} many types." > > > # Black > assert ( > len(policy_types) >= priority + num_duplicates > ), f"This tests needs at least {priority+num_duplicates} many types." > > # Ruff > assert len(policy_types) >= priority + num_duplicates, ( > f"This tests needs at least {priority + num_duplicates} many types." > ) > ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/144546 Approved by: https://github.com/malfet	2025-02-27 20:46:16 +00:00
Zhenbin Lin	7ffae2c028	Split test_transformers.py (#147441 ) Split test_transformers.py into test_transformers.py and test_transformers_privateuser1.py. Currently the privateuse1 test cases in test_transformers.py are skipped since they conflict with cuda test cases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147441 Approved by: https://github.com/drisspg	2025-02-26 11:54:24 +00:00
Xuehai Pan	754fb834db	[BE][CI] bump `ruff` to 0.9.0: string quote styles (#144569 ) Reference: https://docs.astral.sh/ruff/formatter/#f-string-formatting - Change the outer quotes to double quotes for nested f-strings ```diff - f'{", ".join(args)}' + f"{', '.join(args)}" ``` - Change the inner quotes to double quotes for triple f-strings ```diff string = """ - {', '.join(args)} + {", ".join(args)} """ ``` - Join implicitly concatenated strings ```diff - string = "short string " "short string " f"{var}" + string = f"short string short string {var}" ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/144569 Approved by: https://github.com/Skylion007 ghstack dependencies: #146509	2025-02-24 19:56:09 +00:00
Catherine Lee	0d16188c06	[CI] Use job name to index into test times json (#147154 ) When the test times are generated, it doesn't know what the build environment is because it's an environment variable. But when we index into the test times, we (previously) didn't know what the job name is. These are usually the same but sometimes they're different and when they're different it ends up using default, which can have unbalanced sharding I think job name was added at some point to most of the CI environments but I didn't realize, so we can now update this code to use the job name instead so the generation and the indexing match also upload stats workflow for mps Checked that inductor_amx doesn't use default Pull Request resolved: https://github.com/pytorch/pytorch/pull/147154 Approved by: https://github.com/huydhn	2025-02-14 17:06:56 +00:00
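The lookup problem described in the commit above (a key mismatch silently falling back to `default` times) can be illustrated with a small sketch. The nested-dictionary layout and the function name are assumptions for illustration; the real test-times JSON may be organised differently.

```python
def select_test_times(test_times: dict, job_name: str, test_config: str) -> dict:
    # Look up per-file durations by job name, then by test config.
    # A missing key falls back to "default", which is what can cause
    # unbalanced sharding when the lookup key does not match the key
    # used at generation time.
    by_job = test_times.get(job_name, test_times.get("default", {}))
    return by_job.get(test_config, by_job.get("default", {}))


if __name__ == "__main__":
    # Hypothetical data: durations in seconds per test file.
    times = {
        "linux-jammy-py3.9-gcc11": {"default": {"test_ops": 1200.0}},
        "default": {"default": {"test_ops": 60.0}},
    }
    print(select_test_times(times, "linux-jammy-py3.9-gcc11", "default"))
    print(select_test_times(times, "some-unknown-job", "default"))
```

Generating and indexing with the same key (the job name) keeps the first branch in use and avoids the fallback.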
PyTorch MergeBot	9a883007a2	Revert "Implement cuda graphs implementation of torch.cond and torch.while_loop (#140979 )" This reverts commit `c7515da7b0`. Reverted https://github.com/pytorch/pytorch/pull/140979 on behalf of https://github.com/huydhn due to This change has been reported to break internal code ([comment](https://github.com/pytorch/pytorch/pull/140979#issuecomment-2657361940))	2025-02-13 18:04:26 +00:00
Daniel Galvez	c7515da7b0	Implement cuda graphs implementation of torch.cond and torch.while_loop (#140979 ) This is a new PR for #130386 , which got stale and was closed. Since I force-pushed to that branch in order to rebase it on top of main, the PR can no longer be reopened, according to https://github.com/isaacs/github/issues/361 I fixed the possibly-not-warmed-up problem described here: https://github.com/pytorch/pytorch/pull/130386/files#r1690856534 Since starting this, torch.cond and torch.while_loop now apparently have support for backward passes. I will look into what it might take to support that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140979 Approved by: https://github.com/eqy, https://github.com/eellison	2025-02-11 18:16:15 +00:00
Aleksei Nikiforov	44ecbcbd5a	s390x: disable test_model_exports_to_core_aten.py test (#145835 ) It often gets killed by OOM. Disable it while investigating. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145835 Approved by: https://github.com/huydhn	2025-01-31 17:45:10 +00:00
Burak Turk	01a4d86b31	add pt2 callbacks for backward pass and prevent duplicate callbacks (#145732 ) Summary: This change adds callbacks for lazy backwards compilation while preventing duplicate callbacks to be fired. Differential Revision: D68577593 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145732 Approved by: https://github.com/mlazos	2025-01-28 03:50:02 +00:00
albanD	0d28188cc8	Move privateuse1 test out of test_utils and make them serial (#145380 ) Fixes https://github.com/pytorch/pytorch/issues/132720 The reason is that changing the privateuse1 module is global and so can race when other tests happen to check if it is enabled. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145380 Approved by: https://github.com/Skylion007, https://github.com/janeyx99	2025-01-23 00:31:39 +00:00
Aaron Orenstein	99dbc5b0e2	PEP585 update - test (#145176 ) See #145101 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145176 Approved by: https://github.com/bobrenjc93	2025-01-22 04:48:28 +00:00
Zhenbin Lin	cbb1ed2966	[1/N] OpenReg: Replace `open_registration_extension.cpp` with openreg (#141815 ) As described in OpenReg [next-steps](https://github.com/pytorch/pytorch/blob/main/test/cpp_extensions/open_registration_extension/README.md#next-steps), here we replace the current `open_registration_extension.cpp` test in PyTorch CI with openreg. The current `open_registration_extension.cpp` contains two parts: 1. Implementations to support the `PrivateUse1` backend. 2. Helper functions used for UTs in `test_cpp_extensions_open_device_registration.py` and `test_transformers.py`. For the first part, we'll replace it with openreg. For the second part, we'll migrate them to UT files step by step. @albanD Pull Request resolved: https://github.com/pytorch/pytorch/pull/141815 Approved by: https://github.com/albanD	2025-01-14 15:59:00 +00:00
Aleksei Nikiforov	4143312e67	S390x ci periodic tests (#125401 ) Periodically run testsuite for s390x Dependencies update Package z3-solver is updated from version 4.12.2.0 to version 4.12.6.0. This is a minor version update, so no functional change is expected. The reason for update is build on s390x. pypi doesn't provide binary build for z3-solver for versions 4.12.2.0 or 4.12.6.0 for s390x. Unfortunately, version 4.12.2.0 fails to build with newer gcc used on s390x builders, but those errors are fixed in version 4.12.6.0. Due to this minor version bump fixes build on s390x. ``` # pip3 install z3-solver==4.12.2.0 ... In file included from /tmp/pip-install-756iytc6/z3-solver_ce6f750b780b4146a9a7c01e52672071/core/src/util/region.cpp:53: /tmp/pip-install-756iytc6/z3-solver_ce6f750b780b4146a9a7c01e52672071/core/src/util/region.cpp: In member function ‘void* region::allocate(size_t)’: /tmp/pip-install-756iytc6/z3-solver_ce6f750b780b4146a9a7c01e52672071/core/src/util/tptr.h:29:62: error: ‘uintptr_t’ does not name a type 29 \| #define ALIGN(T, PTR) reinterpret_cast<T>(((reinterpret_cast<uintptr_t>(PTR) >> PTR_ALIGNMENT) + \ \| ^~~~~~~~~ /tmp/pip-install-756iytc6/z3-solver_ce6f750b780b4146a9a7c01e52672071/core/src/util/region.cpp:82:22: note: in expansion of macro ‘ALIGN’ 82 \| m_curr_ptr = ALIGN(char , new_curr_ptr); \| ^~~~~ /tmp/pip-install-756iytc6/z3-solver_ce6f750b780b4146a9a7c01e52672071/core/src/util/region.cpp:57:1: note: ‘uintptr_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’? 
56 \| #include "util/page.h" +++ \|+#include <cstdint> 57 \| ``` Python paths update On AlmaLinux 8 s390x, old paths: ``` python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())' /usr/lib/python3.12/site-packages ``` Total result is `/usr/lib/python3.12/site-packages/torch;/usr/lib/python3.12/site-packages` New paths: ``` python -c 'import site; print(";".join([x for x in site.getsitepackages()] + [x + "/torch" for x in site.getsitepackages()]))' /usr/local/lib64/python3.12/site-packages;/usr/local/lib/python3.12/site-packages;/usr/lib64/python3.12/site-packages;/usr/lib/python3.12/site-packages;/usr/local/lib64/python3.12/site-packages/torch;/usr/local/lib/python3.12/site-packages/torch;/usr/lib64/python3.12/site-packages/torch;/usr/lib/python3.12/site-packages/torch ``` ``` # python -c 'import torch ; print(torch)' <module 'torch' from '/usr/local/lib64/python3.12/site-packages/torch/__init__.py'> ``` `pip3 install dist/*.whl` installs torch into `/usr/local/lib64/python3.12/site-packages`, and later it's not found by cmake with old paths: ``` CMake Error at CMakeLists.txt:9 (find_package): By not providing "FindTorch.cmake" in CMAKE_MODULE_PATH this project has asked CMake to find a package configuration file provided by "Torch", but CMake did not find one. ``` https://github.com/pytorch/pytorch/actions/runs/10994060107/job/30521868178?pr=125401 Builders availability Build took 60 minutes Tests took: 150, 110, 65, 55, 115, 85, 50, 70, 105, 110 minutes (split into 10 shards) 60 + 150 + 110 + 65 + 55 + 115 + 85 + 50 + 70 + 105 + 110 = 975 minutes used. Let's double it. It would be 1950 minutes. We have 20 machines * 24 hours = 20 * 24 * 60 = 20 * 1440 = 28800 minutes. We currently run 5 nightly binaries builds, each on average 90 minutes build, 15 minutes test, 5 minutes upload, 110 minutes total for each, 550 minutes total. Doubling would be 1100 minutes. That leaves 28800 - 1100 = 27700 minutes total. 
Periodic tests would use 1950 minutes, which leaves 27700 - 1950 = 25750 minutes. Nightly binaries build + nightly tests = 3050 minutes. 25750 / 3050 = 8.44. So we could do both 8 more times for additional CI runs for any reason. And that is with pretty good safety margin. Skip test_tensorexpr On s390x, pytorch is built without llvm. Even if it would be built with llvm, llvm currently doesn't support used features on s390x and test fails with errors like: ``` JIT session error: Unsupported target machine architecture in ELF object pytorch-jitted-objectbuffer unknown file: Failure C++ exception with description "valOrErr INTERNAL ASSERT FAILED at "/var/lib/jenkins/workspace/torch/csrc/jit/tensorexpr/llvm_jit.h":34, please report a bug to PyTorch. Unexpected failure in LLVM JIT: Failed to materialize symbols: { (main, { func }) } ``` Disable cpp/static_runtime_test on s390x Quantization is not fully supported on s390x in pytorch yet. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125401 Approved by: https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2025-01-10 18:21:07 +00:00
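The builder-capacity estimate in the s390x commit above can be re-derived step by step. The sketch below uses only the figures quoted in that commit message; the variable names are illustrative.

```python
# All durations are in minutes, taken from the commit message.
build = 60
shards = [150, 110, 65, 55, 115, 85, 50, 70, 105, 110]

periodic_run = build + sum(shards)        # one periodic build plus 10 test shards
periodic_budget = 2 * periodic_run        # doubled as a safety margin

capacity = 20 * 24 * 60                   # 20 machines for 24 hours

nightly_total = 5 * (90 + 15 + 5)         # 5 nightly binaries: build + test + upload
nightly_budget = 2 * nightly_total        # doubled as a safety margin

remaining = capacity - nightly_budget - periodic_budget
per_cycle = nightly_budget + periodic_budget
extra_cycles = remaining / per_cycle

if __name__ == "__main__":
    print(periodic_run, periodic_budget, capacity)
    print(nightly_total, nightly_budget, remaining)
    print(per_cycle, round(extra_cycles, 2))
```

The numbers reproduce the figures in the message: 975 and 1950 minutes for periodic tests, 25750 minutes left over, and roughly 8.44 additional full cycles.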
Wanchao Liang	96f4abba17	[dtensor] move all tests to distribute/tensor folder (#144166 ) as titled, mainly moving files Pull Request resolved: https://github.com/pytorch/pytorch/pull/144166 Approved by: https://github.com/Skylion007	2025-01-08 00:32:33 +00:00
PyTorch MergeBot	6c54963f75	Revert "[dtensor] move all tests to distribute/tensor folder (#144166 )" This reverts commit `2e1ea8598f`. Reverted https://github.com/pytorch/pytorch/pull/144166 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but inductor/test_compiled_autograd needs to be updated ([comment](https://github.com/pytorch/pytorch/pull/144166#issuecomment-2575969871))	2025-01-07 18:31:36 +00:00
Wanchao Liang	2e1ea8598f	[dtensor] move all tests to distribute/tensor folder (#144166 ) as titled, mainly moving files Pull Request resolved: https://github.com/pytorch/pytorch/pull/144166 Approved by: https://github.com/Skylion007	2025-01-07 06:45:14 +00:00
PyTorch MergeBot	99f2491af9	Revert "Use absolute path `path.resolve()` -> `path.absolute()` (#129409 )" This reverts commit `45411d1fc9`. Reverted https://github.com/pytorch/pytorch/pull/129409 on behalf of https://github.com/jeanschmidt due to Breaking internal CI, @albanD please help get this PR merged ([comment](https://github.com/pytorch/pytorch/pull/129409#issuecomment-2571316444))	2025-01-04 14:17:20 +00:00
Xuehai Pan	45411d1fc9	Use absolute path `path.resolve()` -> `path.absolute()` (#129409 ) Changes: 1. Always explicit `.absolute()`: `Path(__file__)` -> `Path(__file__).absolute()` 2. Replace `path.resolve()` with `path.absolute()` if the code is resolving the PyTorch repo root directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129409 Approved by: https://github.com/albanD	2025-01-03 20:03:40 +00:00
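The behavioural difference behind the `path.resolve()` -> `path.absolute()` change above is that `resolve()` follows symlinks while `absolute()` does not. A minimal sketch, assuming a POSIX-like system where creating symlinks is permitted; the directory names are made up for the example.

```python
import tempfile
from pathlib import Path


def absolute_vs_resolve() -> tuple[str, str]:
    """Return the parent directory names seen via absolute() and resolve()."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp).resolve()
        real = root / "real_repo"
        real.mkdir()
        link = root / "linked_repo"
        link.symlink_to(real, target_is_directory=True)

        via_link = link / "setup.py"
        via_link.touch()  # the file is physically created inside real_repo

        # absolute() keeps the path as written; resolve() follows the symlink.
        return via_link.absolute().parent.name, via_link.resolve().parent.name


if __name__ == "__main__":
    print(absolute_vs_resolve())
```

When a checkout is reached through a symlink, `absolute()` keeps the path the user actually used, which is the behaviour the commit opts into for locating the repo root.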
Nikita Shulga	d8c3900d80	[Inductor] Implement primitive Metal compiler (#143893 ) Still work in progress, only works for element wise operations. Current implementation could be used to turn something like ```python def f(x): return x[:,::2].sin() + x[:, 1::2].cos() ``` into the following shader ```python # Topologically Sorted Source Nodes: [sin, cos, add], Original ATen: [aten.sin, aten.cos, aten.add] # Source node to ATen node mapping: # add => add # cos => cos # sin => sin # Graph fragment: # %sin : [num_users=1] = call_function[target=torch.ops.aten.sin.default](args = (%slice_2,), kwargs = {}) # %cos : [num_users=1] = call_function[target=torch.ops.aten.cos.default](args = (%slice_4,), kwargs = {}) # %add : [num_users=1] = call_function[target=torch.ops.aten.add.Tensor](args = (%sin, %cos), kwargs = {}) mps_lib = torch.mps._compile_shader(""" kernel void kernel_0( device float* out_ptr0, constant float* in_ptr0, uint xindex [[thread_position_in_grid]] ) { int x0 = xindex; auto tmp0 = in_ptr0[2*x0]; auto tmp1 = metal::precise::sin(tmp0); auto tmp2 = in_ptr0[2*x0 + 1]; auto tmp3 = metal::precise::cos(tmp2); auto tmp4 = tmp1 + tmp3; out_ptr0[x0] = static_cast<float>(tmp4); } """) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/143893 Approved by: https://github.com/jansel ghstack dependencies: #143891, #143892	2024-12-28 06:58:32 +00:00
PyTorch MergeBot	cc4e70b7c3	Revert "Use absolute path `path.resolve()` -> `path.absolute()` (#129409 )" This reverts commit `135c7db99d`. Reverted https://github.com/pytorch/pytorch/pull/129409 on behalf of https://github.com/malfet due to need to revert to as dependency of https://github.com/pytorch/pytorch/pull/129374 ([comment](https://github.com/pytorch/pytorch/pull/129409#issuecomment-2562969825))	2024-12-26 17:26:06 +00:00
Xuehai Pan	135c7db99d	Use absolute path `path.resolve()` -> `path.absolute()` (#129409 ) Changes: 1. Always explicit `.absolute()`: `Path(__file__)` -> `Path(__file__).absolute()` 2. Replace `path.resolve()` with `path.absolute()` if the code is resolving the PyTorch repo root directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129409 Approved by: https://github.com/albanD	2024-12-24 08:33:08 +00:00
Yidi Wu	a8fa98ccef	skip test dynamo for aot_dispatch tests on ci (#142185 ) A lot of tests in test_aotdispatch.py is not meaningful (from user's perspective) when we run with dynamo. So we skip them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142185 Approved by: https://github.com/zou3519	2024-12-11 18:46:58 +00:00
Jane Xu	be27dbf2b8	Enable CPP/CUDAExtension with py_limited_api for python agnosticism (#138088 ) Getting tested with ao, but now there is a real test i added. ## What does this PR do? We want to allow custom PyTorch extensions to be able to build one wheel for multiple Python versions, in other words, achieve python agnosticism. It turns out that there is such a way that setuptools/Python provides already! Namely, if the user promises to use only the Python limited API in their extension, they can pass in `py_limited_api` to their Extension class and to the bdist_wheel command (with a min python version) in order to build 1 wheel that will suffice across multiple Python versions. Sounds lovely! Why don't people do that already with PyTorch? Well 2 things. This workflow is hardly documented (even searching for python agnostic specifically does not reveal many answers) so I'd expect that people simply don't know about it. But even if they did, _PyTorch_ custom Extensions would still not work because we always link torch_python, which does not abide by py_limited_api rules. So this is where this PR comes in! We respect when the user specifies py_limited_api and skip linking torch_python under that condition, allowing users to enroll in the provided functionality I just described. ## How do I know this PR works? I manually tested my silly little ultra_norm locally (with `import python_agnostic`) and wrote a test case for the extension showing that - torch_python doesn't show up in the ldd tree - no Py- symbols show up It may be a little confusing that our test case is actually python-free (more clean than python-agnostic) but it is sufficient (and not necessary) towards showing that this change works. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138088 Approved by: https://github.com/ezyang, https://github.com/albanD	2024-12-11 18:22:55 +00:00
Nikita Shulga	95b17f6346	[MPS] Add CompileShader method (#141478 ) This allows one to do something like this ```python import torch x = torch.ones(10, device="mps") m = torch.mps._compile_shader(""" kernel void foo(device float* x, uint idx [[thread_position_in_grid]]) { x[idx] += idx; } """) m.foo(x) ``` And in general enables writing custom operators using Metal shaders purely in Python Pull Request resolved: https://github.com/pytorch/pytorch/pull/141478 Approved by: https://github.com/manuelcandales	2024-12-11 02:00:51 +00:00
PyTorch MergeBot	3e28da1e06	Revert "skip test dynamo for aot_dispatch tests on ci (#142185 )" This reverts commit `7eda06b366`. Reverted https://github.com/pytorch/pytorch/pull/142185 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but I think it has a landrace in trunk ([comment](https://github.com/pytorch/pytorch/pull/142185#issuecomment-2532605728))	2024-12-10 18:50:17 +00:00
Yidi Wu	7eda06b366	skip test dynamo for aot_dispatch tests on ci (#142185 ) A lot of tests in test_aotdispatch.py are not meaningful (from the user's perspective) when we run with dynamo. So we skip them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142185 Approved by: https://github.com/zou3519 ghstack dependencies: #141610	2024-12-10 17:33:57 +00:00
Mark Saroufim	e24190709f	[BE] Remove Model Dump utility (#141540 ) I found this utility by accident while trying to find how many html files we have in the repo so I could convert them to markdown. It turns out we package some html and js files in pytorch to visualize torchscript models. This seems kinda strange and probably shouldn't be in core, so I removed the tests I could find. Maybe some internal tests will break, but considering torchscript is being superseded it might make sense to do this. The last meaningful update to the test for this file was about 2 years ago by @digantdesai; since then it has been a bunch of routine upgrades. It seems like this package is unused: https://github.com/search?type=code&auto_enroll=true&q=torch.utils.model_dump&p=1 I skimmed through 5 pages of these, and the only time this shows up in code search is when someone is either cloning pytorch or checking their venv into github. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141540 Approved by: https://github.com/malfet	2024-11-27 22:52:55 +00:00
Aleksei Nikiforov	a82bab6419	Run only listed tests on s390x (#140265 ) Skip tests that are failing This was previously part of https://github.com/pytorch/pytorch/pull/125401 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140265 Approved by: https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2024-11-20 22:53:09 +00:00
Catherine Lee	0db21a6b23	Remove most rockset references (#139922 ) Remove most references to rockset: * replace comments and docs with a generic "backend database" * Delete `upload_to_rockset`, so we no longer need to install the package. * Do not upload perf stats to rockset as well (we should be completely on DynamoDB now right @huydhn?) According to VSCode, it went from 41 -> 7 instances of "rockset" in the repo Pull Request resolved: https://github.com/pytorch/pytorch/pull/139922 Approved by: https://github.com/huydhn, https://github.com/ZainRizvi	2024-11-12 21:17:43 +00:00
Catherine Lee	cc93c1e5e4	Upload artifacts during test run (#125799 ) Zip and upload artifacts while run_test is running Upgrade boto3 because I get errors about not having `botocore.vendored.six.move` if I don't Pull Request resolved: https://github.com/pytorch/pytorch/pull/125799 Approved by: https://github.com/huydhn	2024-10-22 16:48:57 +00:00
Will Feng	e4ad02892f	Upgrade distributed test to g4dn instances (T4 GPUs) (#137161 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137161 Approved by: https://github.com/seemethere, https://github.com/eqy, https://github.com/yf225 Co-authored-by: Will Feng <yf225@cornell.edu>	2024-10-20 23:48:54 +00:00
PyTorch MergeBot	24ee4af86b	Revert "Upgrade distributed test to g4dn instances (T4 GPUs) (#137161 )" This reverts commit `2b7c7a20b9`. Reverted https://github.com/pytorch/pytorch/pull/137161 on behalf of https://github.com/kwen2501 due to breaking trunk ([comment](https://github.com/pytorch/pytorch/pull/137161#issuecomment-2417833666))	2024-10-16 20:05:38 +00:00
Catherine Lee	f173623bb2	[td] try catch exception, do not run td if not results (#138087 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138087 Approved by: https://github.com/wdvr	2024-10-16 18:04:25 +00:00
Ke Wen	2b7c7a20b9	Upgrade distributed test to g4dn instances (T4 GPUs) (#137161 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137161 Approved by: https://github.com/seemethere, https://github.com/eqy	2024-10-16 16:42:57 +00:00
PyTorch MergeBot	78632b97b1	Revert "Upgrade distributed test to g4dn instances (T4 GPUs) (#137161 )" This reverts commit `f43c4d28b8`. Reverted https://github.com/pytorch/pytorch/pull/137161 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it seems another failure showing up after the upgrade ([comment](https://github.com/pytorch/pytorch/pull/137161#issuecomment-2415941159))	2024-10-16 07:26:34 +00:00
Ke Wen	f43c4d28b8	Upgrade distributed test to g4dn instances (T4 GPUs) (#137161 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137161 Approved by: https://github.com/seemethere, https://github.com/eqy	2024-10-16 05:03:08 +00:00
Ke Wen	56cc22eb01	[CI][Distributed] Not to test distributed_test.py with UCC (#137932 ) Some UCC tests became unstable recently, with or without the M60 to T4 upgrade. See for example: #137855 (without upgrade), #137161 (with upgrade). So I am extracting the disablement from #137161 here. Failure signature: ``` RuntimeError: [/var/lib/jenkins/workspace/torch/csrc/distributed/c10d/ProcessGroupUCC.cpp:496] [Rank 0][ProcessGroupUCC-0][READY]failed to post triggered collective, error code -6: Unhandled error, system error code 0 ``` Earlier discussed here: https://github.com/pytorch/pytorch/pull/137161/files#r1797353294 Cc: @Aidyn-A @eqy Pull Request resolved: https://github.com/pytorch/pytorch/pull/137932 Approved by: https://github.com/fduwjj, https://github.com/malfet, https://github.com/eqy	2024-10-15 07:22:57 +00:00
Jagadish Krishnamoorthy	674d59359d	[ROCm] Enable dist sharded_tensor test suites (#137724 ) Following test suites are enabled on ROCm test_sharded_tensor test_sharded_tensor_reshard test_sharding_plan Pull Request resolved: https://github.com/pytorch/pytorch/pull/137724 Approved by: https://github.com/jithunnair-amd, https://github.com/pruthvistony, https://github.com/malfet	2024-10-14 20:20:57 +00:00
eellison	47af7cc962	Add compiler bisector (#131936 ) This is a utility to aid torch.compile debugging. You provide a function that returns True on success and False on failure, or do something out of process and run bisect_helper `good | bad`. The bisector will first go through backends - `eager`, `aot_eager`, `aot_eager_decomp_partition`, `inductor` - to find the first failing backend. Then, it will go through subsystems within the backend - currently limited but could be expanded - and try to find the first subsystem for which disabling fixes the problem. Once it has found the failing subsystem, it will find the number of times the subsystem is applied, and then bisect through it. An example of how to hook it up for the aot_eager_decomp_partition backend and the decomposition subsystem is: ``` from torch._inductor.bisect_helper import BisectionManager if op in CURRENT_DECOMPOSITION_TABLE: if BisectionManager.disable_subsystem("aot_eager_decomp_partition", "decomposition", lambda: repr(op)): return NotImplemented ``` Once it has discovered the problematic change, it will print out the associated debug info, and you can set the same limits with `TORCH_BISECT_BACKEND`, `TORCH_BISECT_SUBSYSTEM` and `TORCH_BISECT_MAX`. We could add further options as an automated way of going through a check list for checking divergence - e.g., the mode to emulate amp casts. Fix for https://github.com/pytorch/pytorch/issues/126546 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131936 Approved by: https://github.com/ezyang	2024-10-09 20:34:11 +00:00
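The last step described above, bisecting over the number of times a subsystem is applied, can be sketched as an ordinary binary search. This is an illustration only, not the `BisectionManager` implementation; `first_bad_count` and `passes_with_limit` are hypothetical names, and the sketch assumes the failure is monotonic (once an application breaks the result, allowing more applications keeps it broken).

```python
from typing import Callable


def first_bad_count(max_count: int, passes_with_limit: Callable[[int], bool]) -> int:
    """Return the smallest n such that allowing the first n applications fails.

    Assumes passes_with_limit(0) is True and passes_with_limit(max_count) is False.
    """
    low, high = 0, max_count  # invariant: limit `low` passes, limit `high` fails
    while high - low > 1:
        mid = (low + high) // 2
        if passes_with_limit(mid):
            low = mid
        else:
            high = mid
    return high


# Hypothetical scenario: the 7th application out of 20 introduces the bug.
culprit = first_bad_count(20, lambda limit: limit < 7)
```

The search needs only about `log2(max_count)` runs of the user-provided check, which matters when each run is a full compile-and-compare.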
Siddharth Kotapati	e27c0048db	Enable additional tests for MPS CI runs (#134356 ) As part of the follow up for https://github.com/pytorch/pytorch/issues/133520, adapting existing unused tests for use in MPS CI runs. Focusing on nhwc & other memory formatting tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/134356 Approved by: https://github.com/malfet, https://github.com/eqy, https://github.com/huydhn	2024-10-04 21:52:38 +00:00
Sergii Dymchenko	a619ced5ed	Revert "Update run_test.py" This reverts commit `193073b491`.	2024-09-26 17:34:52 -07:00
Sergii Dymchenko	193073b491	Update run_test.py	2024-09-26 16:56:29 -07:00
Xinya Zhang	74fd1bf965	[ROCm] Update to AOTriton 0.7b (#134498 ) Notable changes: 1. Enable CudaGraph related tests 2. Fix UT problems 3. EXPERIMENTAL Navi31 support. Users should enable Navi31 support with the environment variable `TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1` Known problem: 1. `test/test_transformers.py` will show massive failures and/or NaN outputs with `--use-pytest` + Update: Confirmed that skipping `class TestSDPAPrivateUse1Only` fixes the problem with `--use-pytest` Note: AOTriton 0.7b adds support for nested tensors + SDPA but needs more work (and consequently a separate PR) to enable it. Fixes #133540 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134498 Approved by: https://github.com/pruthvistony, https://github.com/jeffdaily, https://github.com/malfet	2024-09-11 20:34:01 +00:00
Bo Li	16b8146c9e	Exclude test_transformers and unit tests which require recent GPU arch (#132895 ) This PR is to exclude test_transformers on ROCm temporarily and skip some unit tests which require recent GPU arch. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132895 Approved by: https://github.com/jithunnair-amd, https://github.com/pruthvistony, https://github.com/malfet	2024-08-27 20:40:53 +00:00
Roy Hvaara	1565940114	[MPS] Add `test/test_nn.py` to test suite (#134184 ) This PR increases test coverage by including the tests in `test/test_nn.py` in the test suite of MPS. Some of the tests are decorated with `@expectedFailureMPS` for various reasons. Either that the op is not implemented, or that the outputs do not align. Those tests that contain differing results should be investigated further to rule out any live bugs. ```bash $ python test/run_test.py --mps --verbose -k TestNN Running test batch 'tests to run' cost 84.76 seconds ``` Ref #133520 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134184 Approved by: https://github.com/albanD, https://github.com/malfet	2024-08-26 23:48:23 +00:00
Aidyn-A	28a4db84f2	[ARM] Fix infinite recursion in unwind (#134387 ) Fixes #119905 The `TORCH_SHOW_CPP_STACKTRACES=1` setting on ARM causes infinite recursive unwind because on failure a `StackTraceFetcher` attempts to unwind the failed instruction: `5ad759ca33/torch/csrc/profiler/combined_traceback.cpp (L25)` then the unwind itself fails: `5ad759ca33/torch/csrc/profiler/unwind/unwind.cpp (L10-L12)` and it causes another attempt to unwind the failure in `unwind()`... In summary, the executed instruction is equivalent to: ```C++ std::vector<void*> unwind() { // some instructions ... return unwind(); } ``` This PR replaces `TORCH_CHECK` by `TORCH_WARN_ONCE` as it will not cause an uncontrolled recursion. The only side effect would be an empty back-trace. Huge thanks to @nWEIdia who found the root cause! Pull Request resolved: https://github.com/pytorch/pytorch/pull/134387 Approved by: https://github.com/eqy, https://github.com/nWEIdia, https://github.com/malfet	2024-08-26 21:02:31 +00:00
Edward Z. Yang	99cf567714	Make SCRIBE_GRAPHQL_ACCESS_TOKEN available to test jobs running on main (#133536 ) It is possible to write to Meta's internal in-memory database Scuba via the Scribe Graph API: https://www.internalfb.com/intern/wiki/Scribe/users/Knowledge_Base/Interacting_with_Scribe_categories/Graph_API/ This is currently being used by pytorch/benchmark repo to upload torchbench performance results. I want to make this API generally available to all jobs running on CI in a semi-trusted context. To talk to Scribe, you need a secret access token. I have initially configured an environment prod-branch-main which contains `SCRIBE_GRAPHQL_ACCESS_TOKEN`, and switched a single class of jobs (linux-test) to use this environment when they are running on the main branch. Because we require approvals for running CI on untrusted contributions, we could potentially allow all jobs to run in this environment, including jobs on PRs, but I don't need this for my use case (per-PR benchmark result reporting, and miscellaneous statistics on main.) If this works, I'll push out this environment to the rest of our test jobs. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/133536 Approved by: https://github.com/xuzhao9, https://github.com/malfet, https://github.com/albanD	2024-08-15 19:53:17 +00:00
hippocookie	a6ad834fa8	Fix counting execution time in run_test.py (#133199 ) `elapsed_time` was computed immediately after `start_time`, so it did not reflect the real execution time of `test_batch`. Move the `elapsed_time` computation and the print call after the `run_tests` call to fix it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133199 Approved by: https://github.com/clee2000	2024-08-15 15:29:44 +00:00
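The timing bug fixed here is a common one: the elapsed time has to be read after the work finishes, not right after the start timestamp. A minimal sketch of the corrected ordering, where `run_batch_timed` is a hypothetical helper and not the code in `run_test.py`:

```python
import time


def run_batch_timed(run_tests):
    """Run a batch and return (result, seconds spent inside run_tests)."""
    start_time = time.monotonic()
    result = run_tests()
    # Read the clock only after the batch has finished.
    elapsed_time = time.monotonic() - start_time
    return result, elapsed_time


# time.sleep returns None, so `or "done"` yields the string result.
result, elapsed = run_batch_timed(lambda: time.sleep(0.05) or "done")
```

Using `time.monotonic()` instead of wall-clock time is a choice made for this sketch; it keeps the measurement unaffected by system clock adjustments.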
chuanqiw	72f2b29bb0	[CI] disable xpu kineto build (#133069 ) The xpu kineto support PR https://github.com/pytorch/pytorch/pull/130811 has landed, but the xpu CI infra is not ready yet. Disable the kineto build as a temporary workaround. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133069 Approved by: https://github.com/seemethere	2024-08-09 23:58:50 +00:00
Xuehai Pan	4226ed1585	[BE] Format uncategorized Python files with `ruff format` (#132576 ) Remove patterns `**`, `test/**`, and `torch/**` in `tools/linter/adapters/pyfmt_linter.py` and run `lintrunner`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132576 Approved by: https://github.com/ezyang, https://github.com/Skylion007 ghstack dependencies: #132574	2024-08-04 17:13:31 +00:00
Xuehai Pan	5cc34f61d1	[CI] add new test config label `ci-test-showlocals` to control test log verbosity (#131981 ) Add a new label `ci-test-showlocals` and add it to test config filter. If the PR is labeled with `ci-test-showlocals` or "ci-test-showlocals" present in the PR comment, the test config filter will set a environment variable `TEST_SHOWLOCALS`. Then `pytest` will show local variables on failures for better debugging. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131981 Approved by: https://github.com/malfet ghstack dependencies: #131151	2024-07-29 18:53:14 +00:00
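The flag selection described in this entry and in #131151 can be sketched as a small helper. The flag names `--locals` (unittest) and `--showlocals --tb=long` (pytest) come from the commit messages; the helper name `showlocals_flags` and the way the `TEST_SHOWLOCALS` value is interpreted are assumptions made for illustration.

```python
import os


def showlocals_flags(use_pytest, environ=None):
    """Return extra runner flags that print local variables on failure."""
    environ = os.environ if environ is None else environ
    # Assumption: unset, empty, or "0" means the feature is off.
    if environ.get("TEST_SHOWLOCALS", "0") in ("", "0"):
        return []
    return ["--showlocals", "--tb=long"] if use_pytest else ["--locals"]


pytest_flags = showlocals_flags(True, {"TEST_SHOWLOCALS": "1"})
unittest_flags = showlocals_flags(False, {"TEST_SHOWLOCALS": "1"})
disabled_flags = showlocals_flags(True, {})
```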
Xuehai Pan	4694ee1ad2	[BE][tests] show local variables on failure in tests (#131151 ) ------ As per the title, add argument `--locals` for `unittest` and `--showlocals --tb=long` for `pytest` in CI. Some failures cannot be reproduced on the local machine but exist on cloud CI. This change allows us to investigate the test failure more easily. Example output: https://github.com/pytorch/pytorch/actions/runs/9961546996/job/27523888353?pr=130710#step:20:3361 ```text /opt/conda/envs/py_3.8/lib/python3.8/site-packages/sympy/core/function.py:307: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ cls = FloorDiv, base = -1.00000000000000, divisor = -1.00000000000000 @classmethod def eval(cls, base, divisor): # python test/test_dynamic_shapes.py -k TestDimConstraints.test_dim_constraints_solve_full # Assert triggered by inequality solver # assert base.is_integer, base # assert divisor.is_integer, divisor # We don't provide the same error message as in Python because SymPy # makes it difficult to check the types. 
if divisor.is_zero: raise ZeroDivisionError("division by zero") if base in (int_oo, -int_oo, sympy.oo, -sympy.oo) and divisor in ( int_oo, -int_oo, sympy.oo, -sympy.oo, ): return sympy.nan if base is sympy.nan or divisor is sympy.nan: return sympy.nan if base.is_zero: return sympy.S.Zero if base.is_integer and divisor == 1: return base if base.is_integer and divisor == -1: return sympy.Mul(base, -1) if ( isinstance(base, sympy.Number) and isinstance(divisor, sympy.Number) and ( base in (int_oo, -int_oo, sympy.oo, -sympy.oo) or divisor in (int_oo, -int_oo, sympy.oo, -sympy.oo) ) ): r = float(base) / float(divisor) if r == math.inf: return int_oo elif r == -math.inf: return -int_oo elif math.isnan(r): return sympy.nan else: return sympy.Integer(math.floor(r)) if isinstance(base, sympy.Integer) and isinstance(divisor, sympy.Integer): return sympy.Integer(int(base) // int(divisor)) if isinstance(base, FloorDiv): return FloorDiv(base.args[0], base.args[1] * divisor) # Expands (x + y) // b into x // b + y // b. # This only works if floor is an identity, i.e. x / b is an integer. for term in sympy.Add.make_args(base): quotient = term / divisor if quotient.is_integer and isinstance(divisor, sympy.Integer): # NB: this is correct even if the divisor is not an integer, but it # creates rational expressions that cause problems with dynamic # shapes. 
return FloorDiv(base - term, divisor) + quotient try: gcd = sympy.gcd(base, divisor) if gcd != 1: > return FloorDiv( sympy.simplify(base / gcd), sympy.simplify(divisor / gcd) ) base = -1.00000000000000 cls = FloorDiv divisor = -1.00000000000000 gcd = 1.00000000000000 quotient = 1.00000000000000 term = -1.00000000000000 /opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/utils/_sympy/functions.py:159: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ args = (FloorDiv, -1.00000000000000, -1.00000000000000), kwargs = {} @wraps(func) def wrapper(*args, **kwargs): try: > retval = cfunc(*args, **kwargs) E RecursionError: maximum recursion depth exceeded in comparison E E To execute this test, run the following from the base repo dir: E python test/test_sympy_utils.py -k TestValueRanges.test_binary_ref_fn_floordiv_dtype_float E E This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 args = (FloorDiv, -1.00000000000000, -1.00000000000000) cfunc = <functools._lru_cache_wrapper object at 0x7fc5303173a0> func = <function Function.__new__ at 0x7fc530317280> kwargs = {} ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/131151 Approved by: https://github.com/ezyang	2024-07-29 18:53:14 +00:00
PyTorch MergeBot	c35f21e5fc	Revert "[BE][tests] show local variables on failure in tests (#131151 )" This reverts commit `14158d892a`. Reverted https://github.com/pytorch/pytorch/pull/131151 on behalf of https://github.com/atalman due to Broke CI: test_testing.py::TestTestingCUDA::test_cuda_assert_should_stop_common_device_type_test_suite_cuda [GH job link](https://github.com/pytorch/pytorch/actions/runs/10131415299/job/28014665693) [HUD commit link](`14158d892a`) ([comment](https://github.com/pytorch/pytorch/pull/131151#issuecomment-2255921015))	2024-07-29 13:19:38 +00:00
PyTorch MergeBot	06fe99a097	Revert "[CI] add new test config label `ci-test-showlocals` to control test log verbosity (#131981 )" This reverts commit `dfa18bf3f3`. Reverted https://github.com/pytorch/pytorch/pull/131981 on behalf of https://github.com/atalman due to Sorry, need to revert bottom PR, which broke CI: https://github.com/pytorch/pytorch/pull/131151 ([comment](https://github.com/pytorch/pytorch/pull/131981#issuecomment-2255892628))	2024-07-29 13:09:41 +00:00
Xuehai Pan	dfa18bf3f3	[CI] add new test config label `ci-test-showlocals` to control test log verbosity (#131981 ) Add a new label `ci-test-showlocals` and add it to test config filter. If the PR is labeled with `ci-test-showlocals` or "ci-test-showlocals" present in the PR comment, the test config filter will set a environment variable `TEST_SHOWLOCALS`. Then `pytest` will show local variables on failures for better debugging. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131981 Approved by: https://github.com/malfet	2024-07-29 07:40:42 +00:00
Xuehai Pan	14158d892a	[BE][tests] show local variables on failure in tests (#131151 ) ------ As per the title, add argument `--locals` for `unittest` and `--showlocals --tb=long` for `pytest` in CI. Some failures cannot be reproduced on the local machine but exist on cloud CI. This change allows us to investigate the test failure more easily. Example output: https://github.com/pytorch/pytorch/actions/runs/9961546996/job/27523888353?pr=130710#step:20:3361 ```text /opt/conda/envs/py_3.8/lib/python3.8/site-packages/sympy/core/function.py:307: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ cls = FloorDiv, base = -1.00000000000000, divisor = -1.00000000000000 @classmethod def eval(cls, base, divisor): # python test/test_dynamic_shapes.py -k TestDimConstraints.test_dim_constraints_solve_full # Assert triggered by inequality solver # assert base.is_integer, base # assert divisor.is_integer, divisor # We don't provide the same error message as in Python because SymPy # makes it difficult to check the types. 
if divisor.is_zero: raise ZeroDivisionError("division by zero") if base in (int_oo, -int_oo, sympy.oo, -sympy.oo) and divisor in ( int_oo, -int_oo, sympy.oo, -sympy.oo, ): return sympy.nan if base is sympy.nan or divisor is sympy.nan: return sympy.nan if base.is_zero: return sympy.S.Zero if base.is_integer and divisor == 1: return base if base.is_integer and divisor == -1: return sympy.Mul(base, -1) if ( isinstance(base, sympy.Number) and isinstance(divisor, sympy.Number) and ( base in (int_oo, -int_oo, sympy.oo, -sympy.oo) or divisor in (int_oo, -int_oo, sympy.oo, -sympy.oo) ) ): r = float(base) / float(divisor) if r == math.inf: return int_oo elif r == -math.inf: return -int_oo elif math.isnan(r): return sympy.nan else: return sympy.Integer(math.floor(r)) if isinstance(base, sympy.Integer) and isinstance(divisor, sympy.Integer): return sympy.Integer(int(base) // int(divisor)) if isinstance(base, FloorDiv): return FloorDiv(base.args[0], base.args[1] * divisor) # Expands (x + y) // b into x // b + y // b. # This only works if floor is an identity, i.e. x / b is an integer. for term in sympy.Add.make_args(base): quotient = term / divisor if quotient.is_integer and isinstance(divisor, sympy.Integer): # NB: this is correct even if the divisor is not an integer, but it # creates rational expressions that cause problems with dynamic # shapes. 
return FloorDiv(base - term, divisor) + quotient try: gcd = sympy.gcd(base, divisor) if gcd != 1: > return FloorDiv( sympy.simplify(base / gcd), sympy.simplify(divisor / gcd) ) base = -1.00000000000000 cls = FloorDiv divisor = -1.00000000000000 gcd = 1.00000000000000 quotient = 1.00000000000000 term = -1.00000000000000 /opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/utils/_sympy/functions.py:159: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ args = (FloorDiv, -1.00000000000000, -1.00000000000000), kwargs = {} @wraps(func) def wrapper(*args, **kwargs): try: > retval = cfunc(*args, **kwargs) E RecursionError: maximum recursion depth exceeded in comparison E E To execute this test, run the following from the base repo dir: E python test/test_sympy_utils.py -k TestValueRanges.test_binary_ref_fn_floordiv_dtype_float E E This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 args = (FloorDiv, -1.00000000000000, -1.00000000000000) cfunc = <functools._lru_cache_wrapper object at 0x7fc5303173a0> func = <function Function.__new__ at 0x7fc530317280> kwargs = {} ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/131151 Approved by: https://github.com/ezyang	2024-07-27 19:39:40 +00:00
PyTorch MergeBot	0f9bf208ec	Revert "[BE][tests] show local variables on failure in tests (#131151 )" This reverts commit `054d214c50`. Reverted https://github.com/pytorch/pytorch/pull/131151 on behalf of https://github.com/jbschlosser due to pollutes test failure output for OpInfo tests ([comment](https://github.com/pytorch/pytorch/pull/131151#issuecomment-2253310448))	2024-07-26 19:03:10 +00:00
Xuehai Pan	054d214c50	[BE][tests] show local variables on failure in tests (#131151 ) ------ As per the title, add argument `--locals` for `unittest` and `--showlocals --tb=long` for `pytest` in CI. Some failures cannot be reproduced on the local machine but exist on cloud CI. This change allows us to investigate the test failure more easily. Example output: https://github.com/pytorch/pytorch/actions/runs/9961546996/job/27523888353?pr=130710#step:20:3361 ```text /opt/conda/envs/py_3.8/lib/python3.8/site-packages/sympy/core/function.py:307: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ cls = FloorDiv, base = -1.00000000000000, divisor = -1.00000000000000 @classmethod def eval(cls, base, divisor): # python test/test_dynamic_shapes.py -k TestDimConstraints.test_dim_constraints_solve_full # Assert triggered by inequality solver # assert base.is_integer, base # assert divisor.is_integer, divisor # We don't provide the same error message as in Python because SymPy # makes it difficult to check the types. 
if divisor.is_zero: raise ZeroDivisionError("division by zero") if base in (int_oo, -int_oo, sympy.oo, -sympy.oo) and divisor in ( int_oo, -int_oo, sympy.oo, -sympy.oo, ): return sympy.nan if base is sympy.nan or divisor is sympy.nan: return sympy.nan if base.is_zero: return sympy.S.Zero if base.is_integer and divisor == 1: return base if base.is_integer and divisor == -1: return sympy.Mul(base, -1) if ( isinstance(base, sympy.Number) and isinstance(divisor, sympy.Number) and ( base in (int_oo, -int_oo, sympy.oo, -sympy.oo) or divisor in (int_oo, -int_oo, sympy.oo, -sympy.oo) ) ): r = float(base) / float(divisor) if r == math.inf: return int_oo elif r == -math.inf: return -int_oo elif math.isnan(r): return sympy.nan else: return sympy.Integer(math.floor(r)) if isinstance(base, sympy.Integer) and isinstance(divisor, sympy.Integer): return sympy.Integer(int(base) // int(divisor)) if isinstance(base, FloorDiv): return FloorDiv(base.args[0], base.args[1] * divisor) # Expands (x + y) // b into x // b + y // b. # This only works if floor is an identity, i.e. x / b is an integer. for term in sympy.Add.make_args(base): quotient = term / divisor if quotient.is_integer and isinstance(divisor, sympy.Integer): # NB: this is correct even if the divisor is not an integer, but it # creates rational expressions that cause problems with dynamic # shapes. 
return FloorDiv(base - term, divisor) + quotient try: gcd = sympy.gcd(base, divisor) if gcd != 1: > return FloorDiv( sympy.simplify(base / gcd), sympy.simplify(divisor / gcd) ) base = -1.00000000000000 cls = FloorDiv divisor = -1.00000000000000 gcd = 1.00000000000000 quotient = 1.00000000000000 term = -1.00000000000000 /opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/utils/_sympy/functions.py:159: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ args = (FloorDiv, -1.00000000000000, -1.00000000000000), kwargs = {} @wraps(func) def wrapper(*args, **kwargs): try: > retval = cfunc(*args, **kwargs) E RecursionError: maximum recursion depth exceeded in comparison E E To execute this test, run the following from the base repo dir: E python test/test_sympy_utils.py -k TestValueRanges.test_binary_ref_fn_floordiv_dtype_float E E This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 args = (FloorDiv, -1.00000000000000, -1.00000000000000) cfunc = <functools._lru_cache_wrapper object at 0x7fc5303173a0> func = <function Function.__new__ at 0x7fc530317280> kwargs = {} ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/131151 Approved by: https://github.com/ezyang	2024-07-25 10:10:58 +00:00
Xuehai Pan	ba48cf6535	[BE][Easy][6/19] enforce style for empty lines in import segments in `test/` (#129757 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129757 Approved by: https://github.com/ezyang	2024-07-17 06:42:37 +00:00
Xuehai Pan	4d7bf72d93	[BE][Easy] fix ruff rule needless-bool (SIM103) (#130206 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/130206 Approved by: https://github.com/malfet	2024-07-14 08:17:52 +00:00
Yuanhao Ji	312652c325	[RFC] Add support for device extension autoloading (#127074 ) Fixes #122468 - Load device extensions at the end of `torch/__init__.py` - Enabled by default, or you can disable it with `TORCH_DEVICE_BACKEND_AUTOLOAD=0` run test: ```python python test/run_test.py -i test_autoload_enable python test/run_test.py -i test_autoload_disable ``` doc: https://docs-preview.pytorch.org/pytorch/pytorch/127074/miscellaneous_environment_variables.html co-author: @jgong5 @bsochack @bkowalskiINTEL @jczaja @FFFrog @hipudding Co-authored-by: albanD <desmaison.alban@gmail.com> Co-authored-by: Jiong Gong <jiong.gong@intel.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/127074 Approved by: https://github.com/albanD, https://github.com/jgong5	2024-07-09 06:14:13 +00:00
Catherine Lee	91a8376d47	run_test: Unset cpp stacktraces after reruns (#129004 ) Rerun the failing test singly with the env var set. If it succeeds, start a new process without the cpp stack traces env var. We don't want to waste time generating these if we don't have to. They can also show up in assertion errors, which may cause unexpected failures if a test wants to check these. Adds a new --rs (run single) flag to be used the same way --scs and --sc are. It will only run the single test in the step current file. https://hud.pytorch.org/pytorch/pytorch/pull/129004?sha=2c349d3557d399020bf1f6a8b7045e2e4957ba46 has some examples of logs. In the above: * test_checkpoint_valid failed, then passed in another subprocess. The testing continued in a different new subprocess from the test right after it (test_checkpointing_without_reentrant_early_free) * test_format_traceback_short failed consistently, but it continued to run because keep-going was set Pull Request resolved: https://github.com/pytorch/pytorch/pull/129004 Approved by: https://github.com/PaliC	2024-07-03 01:50:15 +00:00
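The environment handling described above can be sketched as follows. This is a hedged illustration, not the `run_test.py` code: `env_for_run` is a hypothetical helper, and the variable name `TORCH_SHOW_CPP_STACKTRACES` is assumed to be the "cpp stack traces env var" the message refers to.

```python
def env_for_run(base_env, rerunning_failed_test):
    """Build the subprocess environment for one test invocation."""
    env = dict(base_env)  # copy so the caller's environment is untouched
    if rerunning_failed_test:
        # Pay the cost of C++ stack traces only for the single failing test.
        env["TORCH_SHOW_CPP_STACKTRACES"] = "1"
    else:
        # Later tests run in a fresh process without the variable.
        env.pop("TORCH_SHOW_CPP_STACKTRACES", None)
    return env


rerun_env = env_for_run({"PATH": "/usr/bin"}, rerunning_failed_test=True)
normal_env = env_for_run(rerun_env, rerunning_failed_test=False)
```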
Xuehai Pan	4ee1cb9b95	[BE][Easy] replace `import pathlib` with `from pathlib import Path` (#129426 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129426 Approved by: https://github.com/malfet	2024-06-30 01:36:07 +00:00
PyTorch MergeBot	2effbcfcd8	Revert "[BE][Easy] replace `import pathlib` with `from pathlib import Path` (#129426 )" This reverts commit `6d75604ef1`. Reverted https://github.com/pytorch/pytorch/pull/129426 on behalf of https://github.com/XuehaiPan due to recognize `Path` as new exported API ([comment](https://github.com/pytorch/pytorch/pull/129426#issuecomment-2198371625))	2024-06-29 23:24:06 +00:00
Xuehai Pan	6d75604ef1	[BE][Easy] replace `import pathlib` with `from pathlib import Path` (#129426 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129426 Approved by: https://github.com/malfet	2024-06-29 15:42:09 +00:00
Catherine Lee	8892ddaacc	[TD] Test removal on sm86 (#127131 ) Yolo I'm excited to break CI :') Pull Request resolved: https://github.com/pytorch/pytorch/pull/127131 Approved by: https://github.com/huydhn, https://github.com/ZainRizvi	2024-06-07 20:19:18 +00:00
Howard Huang	baaa914bf7	[small] test clean up (#128079 ) remove unnecessary line: https://github.com/pytorch/pytorch/issues/123733 add main so test can be run `python ...`: https://github.com/pytorch/pytorch/issues/124906 Pull Request resolved: https://github.com/pytorch/pytorch/pull/128079 Approved by: https://github.com/awgu	2024-06-06 21:21:40 +00:00
chuanqiw	627d2cd87d	[CI] disable td for xpu ci test by default (#127611 ) Because TD has been enabled by default for the xpu CI tests, a large share of test cases (75%) have been skipped in CI. This allowed some CI failures to escape the CI tests, for example issue #127539. This PR depends on PR #127595 landing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127611 Approved by: https://github.com/etaf, https://github.com/atalman	2024-06-04 17:15:10 +00:00
Catherine Lee	a31a60d85b	Change run_test.py arg parsing to handle additional args better (#126709 ) Do not inherit parser from common_utils * I don't think we use any variables in run_test that depend on those, and I think all tests except doctests run in a subprocess so they will parse the args in common_utils and set the variables. I don't think doctests wants any of those variables? Parse known args, add the extra args as extra, pass the extra ones along to the subprocess. Removes the first instance of `--`. I think I will miss run_test telling me if an arg is valid or not. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126709 Approved by: https://github.com/ZainRizvi, https://github.com/huydhn, https://github.com/Flamefire	2024-05-23 21:08:12 +00:00
Catherine Lee	ac2c547838	[TD] Upload names of failures to s3 for pytest cache (#126315 ) Some tests don't get run through pytest and pytest crashes when a test segfaults, so in both cases, the pytest cache won't have an entry (similar to https://github.com/pytorch/test-infra/pull/5205). Instead, manually upload/download an extra file that lists the failing test files. Technically this would be more general than the pytest cache. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126315 Approved by: https://github.com/ZainRizvi	2024-05-21 16:29:31 +00:00
PyTorch MergeBot	8bca0847c2	Revert "[TD] Upload names of failures to s3 for pytest cache (#126315 )" This reverts commit `655038687a`. Reverted https://github.com/pytorch/pytorch/pull/126315 on behalf of https://github.com/clee2000 due to broke inductor ([comment](https://github.com/pytorch/pytorch/pull/126315#issuecomment-2121133045))	2024-05-20 20:15:08 +00:00
Catherine Lee	655038687a	[TD] Upload names of failures to s3 for pytest cache (#126315 ) Some tests don't get run through pytest and pytest crashes when a test segfaults, so in both cases, the pytest cache won't have an entry (similar to https://github.com/pytorch/test-infra/pull/5205). Instead, manually upload/download an extra file that lists the failing test files. Technically this would be more general than the pytest cache. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126315 Approved by: https://github.com/ZainRizvi	2024-05-20 17:36:30 +00:00
drisspg	762ce6f062	Add Lowering for FlexAttention Backwards (#125515 ) # Summary #### What does this PR do? It enables Inductor to actually generate the fused flex attention kernel for the backwards. I did some other things along the way: - Abstract out the 'build_subgraph_buffer' subroutine and make it reusable between flex attention and flex_attention backwards. In total we need to build 3 subgraphs for fwd + bwd: 1 for the fwd graph and then 2 in the bwd. The FAv2 algorithm recomputes parts of the forward (more efficiently since we already have the row_max via logsumexp), therefore we need to inline both the fwd graph and the joint graph in the bwds kernel. - The version of the backwards kernel is from a somewhat older version of the triton tutorial implementation. I think that we should update in a follow up to a newer version. Notably the blocks need to be square for this to work as currently implemented. I am sure there are many opportunities for optimization. - I didn't correctly register the decomp table + IndexMode when I landed: https://github.com/pytorch/pytorch/pull/123902, this remedies that. - The rel_bias helper func was reversed in terms of causality. I updated it and then added a test specific for "future causal" attention. - The main point in this PR that I think still needs to be worked out is the store_output call. 
I have it hacked up to be 'fake' but I dont think we want to land that and likely want to just have a mutated 'dq' and a stored_output 'dk' - I also needed to update the `TritonTemplateKernel` to actually accept multiple subgraphs (modifications) - I updated the benchmark to also profile bwds performance ### Benchmark Numbers: _The current implementation is not parallelizing over ctx length in the bwd_ FWD Speedups \| Type \| Speedup \| shape \| score_mod \| dtype \| \|---------\|-----------\|--------------------\|-------------\|----------------\| \| Average \| 0.991 \| \| \| \| \| Max \| 1.182 \| (16, 16, 4096, 64) \| noop \| torch.bfloat16 \| \| Min \| 0.796 \| (2, 16, 512, 256) \| head_bias \| torch.bfloat16 \| BWD Speedups \| Type \| Speedup \| shape \| score_mod \| dtype \| \|---------\|-----------\|--------------------\|-------------\|----------------\| \| Average \| 0.291 \| \| \| \| \| Max \| 0.652 \| (8, 16, 512, 64) \| head_bias \| torch.bfloat16 \| \| Min \| 0.073 \| (2, 16, 4096, 128) \| head_bias \| torch.bfloat16 \| <details> <summary>Full Data</summary> \| shape \| score_mod \| dtype \| fwd_eager_time \| fwd_compiled_time \| bwd_eager_time \| bwd_compiled_time \| fwd_speedup \| bwd_speedup \| \|---------------------\|---------------\|----------------\|------------------\|---------------------\|------------------\|---------------------\|---------------\|---------------\| \| (2, 16, 512, 64) \| noop \| torch.bfloat16 \| 19.936 \| 19.092 \| 57.851 \| 193.564 \| 1.044 \| 0.299 \| \| (2, 16, 512, 64) \| causal_mask \| torch.bfloat16 \| 19.955 \| 19.497 \| 57.662 \| 206.278 \| 1.024 \| 0.280 \| \| (2, 16, 512, 64) \| relative_bias \| torch.bfloat16 \| 19.455 \| 21.297 \| 57.674 \| 195.219 \| 0.913 \| 0.295 \| \| (2, 16, 512, 64) \| head_bias \| torch.bfloat16 \| 19.958 \| 21.289 \| 57.674 \| 193.859 \| 0.938 \| 0.298 \| \| (2, 16, 512, 128) \| noop \| torch.bfloat16 \| 28.157 \| 28.615 \| 82.831 \| 454.211 \| 0.984 \| 0.182 \| \| (2, 16, 512, 128) \| 
causal_mask \| torch.bfloat16 \| 28.154 \| 28.444 \| 83.091 \| 432.083 \| 0.990 \| 0.192 \| \| (2, 16, 512, 128) \| relative_bias \| torch.bfloat16 \| 28.722 \| 27.897 \| 83.175 \| 446.789 \| 1.030 \| 0.186 \| \| (2, 16, 512, 128) \| head_bias \| torch.bfloat16 \| 28.299 \| 27.673 \| 83.052 \| 459.179 \| 1.023 \| 0.181 \| \| (2, 16, 512, 256) \| noop \| torch.bfloat16 \| 41.167 \| 50.504 \| 175.019 \| 1083.545 \| 0.815 \| 0.162 \| \| (2, 16, 512, 256) \| causal_mask \| torch.bfloat16 \| 41.656 \| 51.933 \| 175.078 \| 1171.176 \| 0.802 \| 0.149 \| \| (2, 16, 512, 256) \| relative_bias \| torch.bfloat16 \| 41.697 \| 50.722 \| 175.159 \| 1097.312 \| 0.822 \| 0.160 \| \| (2, 16, 512, 256) \| head_bias \| torch.bfloat16 \| 41.690 \| 52.387 \| 175.184 \| 1097.336 \| 0.796 \| 0.160 \| \| (2, 16, 1024, 64) \| noop \| torch.bfloat16 \| 39.232 \| 37.454 \| 127.847 \| 612.430 \| 1.047 \| 0.209 \| \| (2, 16, 1024, 64) \| causal_mask \| torch.bfloat16 \| 39.930 \| 39.599 \| 127.755 \| 665.359 \| 1.008 \| 0.192 \| \| (2, 16, 1024, 64) \| relative_bias \| torch.bfloat16 \| 39.417 \| 41.304 \| 127.902 \| 614.990 \| 0.954 \| 0.208 \| \| (2, 16, 1024, 64) \| head_bias \| torch.bfloat16 \| 39.965 \| 42.034 \| 127.953 \| 613.273 \| 0.951 \| 0.209 \| \| (2, 16, 1024, 128) \| noop \| torch.bfloat16 \| 63.964 \| 71.024 \| 226.510 \| 1637.669 \| 0.901 \| 0.138 \| \| (2, 16, 1024, 128) \| causal_mask \| torch.bfloat16 \| 63.843 \| 72.451 \| 226.750 \| 1558.949 \| 0.881 \| 0.145 \| \| (2, 16, 1024, 128) \| relative_bias \| torch.bfloat16 \| 64.301 \| 70.487 \| 226.651 \| 1610.063 \| 0.912 \| 0.141 \| \| (2, 16, 1024, 128) \| head_bias \| torch.bfloat16 \| 64.033 \| 71.394 \| 226.676 \| 1668.511 \| 0.897 \| 0.136 \| \| (2, 16, 1024, 256) \| noop \| torch.bfloat16 \| 129.348 \| 141.390 \| 507.337 \| 4405.175 \| 0.915 \| 0.115 \| \| (2, 16, 1024, 256) \| causal_mask \| torch.bfloat16 \| 129.538 \| 145.680 \| 507.178 \| 4768.874 \| 0.889 \| 0.106 \| \| (2, 16, 1024, 256) \| relative_bias \| 
torch.bfloat16 \| 129.438 \| 142.782 \| 507.004 \| 4401.002 \| 0.907 \| 0.115 \| \| (2, 16, 1024, 256) \| head_bias \| torch.bfloat16 \| 129.058 \| 146.242 \| 507.547 \| 4434.251 \| 0.883 \| 0.114 \| \| (2, 16, 4096, 64) \| noop \| torch.bfloat16 \| 481.606 \| 409.120 \| 1440.890 \| 14147.269 \| 1.177 \| 0.102 \| \| (2, 16, 4096, 64) \| causal_mask \| torch.bfloat16 \| 480.227 \| 438.847 \| 1434.419 \| 14973.386 \| 1.094 \| 0.096 \| \| (2, 16, 4096, 64) \| relative_bias \| torch.bfloat16 \| 480.831 \| 458.104 \| 1432.935 \| 14193.253 \| 1.050 \| 0.101 \| \| (2, 16, 4096, 64) \| head_bias \| torch.bfloat16 \| 480.749 \| 452.497 \| 1437.040 \| 14084.869 \| 1.062 \| 0.102 \| \| (2, 16, 4096, 128) \| noop \| torch.bfloat16 \| 872.534 \| 848.275 \| 2600.895 \| 35156.849 \| 1.029 \| 0.074 \| \| (2, 16, 4096, 128) \| causal_mask \| torch.bfloat16 \| 872.647 \| 868.279 \| 2587.581 \| 31919.531 \| 1.005 \| 0.081 \| \| (2, 16, 4096, 128) \| relative_bias \| torch.bfloat16 \| 871.484 \| 827.644 \| 2593.989 \| 34805.634 \| 1.053 \| 0.075 \| \| (2, 16, 4096, 128) \| head_bias \| torch.bfloat16 \| 871.422 \| 856.437 \| 2602.482 \| 35708.591 \| 1.017 \| 0.073 \| \| (2, 16, 4096, 256) \| noop \| torch.bfloat16 \| 1904.497 \| 1758.183 \| 6122.416 \| 66754.593 \| 1.083 \| 0.092 \| \| (2, 16, 4096, 256) \| causal_mask \| torch.bfloat16 \| 1911.174 \| 1762.821 \| 6113.207 \| 72759.392 \| 1.084 \| 0.084 \| \| (2, 16, 4096, 256) \| relative_bias \| torch.bfloat16 \| 1911.254 \| 1727.108 \| 6123.530 \| 66577.988 \| 1.107 \| 0.092 \| \| (2, 16, 4096, 256) \| head_bias \| torch.bfloat16 \| 1916.977 \| 1801.804 \| 6118.158 \| 67359.680 \| 1.064 \| 0.091 \| \| (8, 16, 512, 64) \| noop \| torch.bfloat16 \| 44.984 \| 43.974 \| 170.276 \| 262.259 \| 1.023 \| 0.649 \| \| (8, 16, 512, 64) \| causal_mask \| torch.bfloat16 \| 45.001 \| 46.265 \| 170.509 \| 274.893 \| 0.973 \| 0.620 \| \| (8, 16, 512, 64) \| relative_bias \| torch.bfloat16 \| 45.466 \| 48.211 \| 170.606 \| 262.759 \| 0.943 \| 0.649 
\| \| (8, 16, 512, 64) \| head_bias \| torch.bfloat16 \| 45.481 \| 48.435 \| 170.267 \| 261.265 \| 0.939 \| 0.652 \| \| (8, 16, 512, 128) \| noop \| torch.bfloat16 \| 72.565 \| 74.736 \| 313.220 \| 773.126 \| 0.971 \| 0.405 \| \| (8, 16, 512, 128) \| causal_mask \| torch.bfloat16 \| 72.015 \| 75.755 \| 313.311 \| 775.513 \| 0.951 \| 0.404 \| \| (8, 16, 512, 128) \| relative_bias \| torch.bfloat16 \| 72.105 \| 74.189 \| 313.806 \| 769.238 \| 0.972 \| 0.408 \| \| (8, 16, 512, 128) \| head_bias \| torch.bfloat16 \| 72.005 \| 74.364 \| 313.509 \| 775.237 \| 0.968 \| 0.404 \| \| (8, 16, 512, 256) \| noop \| torch.bfloat16 \| 138.656 \| 165.453 \| 663.707 \| 2672.067 \| 0.838 \| 0.248 \| \| (8, 16, 512, 256) \| causal_mask \| torch.bfloat16 \| 139.096 \| 172.613 \| 663.593 \| 2926.538 \| 0.806 \| 0.227 \| \| (8, 16, 512, 256) \| relative_bias \| torch.bfloat16 \| 139.500 \| 168.417 \| 663.938 \| 2658.629 \| 0.828 \| 0.250 \| \| (8, 16, 512, 256) \| head_bias \| torch.bfloat16 \| 139.776 \| 173.549 \| 662.920 \| 2667.266 \| 0.805 \| 0.249 \| \| (8, 16, 1024, 64) \| noop \| torch.bfloat16 \| 134.883 \| 125.004 \| 484.706 \| 1195.254 \| 1.079 \| 0.406 \| \| (8, 16, 1024, 64) \| causal_mask \| torch.bfloat16 \| 134.297 \| 132.875 \| 485.420 \| 1234.953 \| 1.011 \| 0.393 \| \| (8, 16, 1024, 64) \| relative_bias \| torch.bfloat16 \| 134.839 \| 139.231 \| 485.470 \| 1198.556 \| 0.968 \| 0.405 \| \| (8, 16, 1024, 64) \| head_bias \| torch.bfloat16 \| 133.822 \| 136.449 \| 485.608 \| 1189.198 \| 0.981 \| 0.408 \| \| (8, 16, 1024, 128) \| noop \| torch.bfloat16 \| 235.470 \| 234.765 \| 886.094 \| 2662.944 \| 1.003 \| 0.333 \| \| (8, 16, 1024, 128) \| causal_mask \| torch.bfloat16 \| 236.305 \| 241.382 \| 886.293 \| 2646.984 \| 0.979 \| 0.335 \| \| (8, 16, 1024, 128) \| relative_bias \| torch.bfloat16 \| 236.414 \| 233.980 \| 885.250 \| 2642.178 \| 1.010 \| 0.335 \| \| (8, 16, 1024, 128) \| head_bias \| torch.bfloat16 \| 237.176 \| 239.040 \| 885.754 \| 2665.242 \| 0.992 \| 0.332 
\| \| (8, 16, 1024, 256) \| noop \| torch.bfloat16 \| 504.445 \| 517.855 \| 1978.956 \| 9592.906 \| 0.974 \| 0.206 \| \| (8, 16, 1024, 256) \| causal_mask \| torch.bfloat16 \| 502.428 \| 536.002 \| 1978.611 \| 10607.342 \| 0.937 \| 0.187 \| \| (8, 16, 1024, 256) \| relative_bias \| torch.bfloat16 \| 503.396 \| 523.960 \| 1977.993 \| 9539.284 \| 0.961 \| 0.207 \| \| (8, 16, 1024, 256) \| head_bias \| torch.bfloat16 \| 503.818 \| 536.014 \| 1980.131 \| 9576.262 \| 0.940 \| 0.207 \| \| (8, 16, 4096, 64) \| noop \| torch.bfloat16 \| 1970.139 \| 1674.930 \| 5750.940 \| 16724.134 \| 1.176 \| 0.344 \| \| (8, 16, 4096, 64) \| causal_mask \| torch.bfloat16 \| 1959.036 \| 1775.056 \| 5780.512 \| 17390.350 \| 1.104 \| 0.332 \| \| (8, 16, 4096, 64) \| relative_bias \| torch.bfloat16 \| 1947.198 \| 1773.869 \| 5780.643 \| 16779.699 \| 1.098 \| 0.345 \| \| (8, 16, 4096, 64) \| head_bias \| torch.bfloat16 \| 1963.935 \| 1829.502 \| 5780.018 \| 16703.259 \| 1.073 \| 0.346 \| \| (8, 16, 4096, 128) \| noop \| torch.bfloat16 \| 3582.711 \| 3362.623 \| 10436.069 \| 36415.565 \| 1.065 \| 0.287 \| \| (8, 16, 4096, 128) \| causal_mask \| torch.bfloat16 \| 3581.504 \| 3499.472 \| 10346.869 \| 36164.959 \| 1.023 \| 0.286 \| \| (8, 16, 4096, 128) \| relative_bias \| torch.bfloat16 \| 3589.779 \| 3337.849 \| 10529.621 \| 36261.696 \| 1.075 \| 0.290 \| \| (8, 16, 4096, 128) \| head_bias \| torch.bfloat16 \| 3602.265 \| 3436.444 \| 10458.660 \| 36507.790 \| 1.048 \| 0.286 \| \| (8, 16, 4096, 256) \| noop \| torch.bfloat16 \| 7695.923 \| 7126.275 \| 24643.009 \| 140949.081 \| 1.080 \| 0.175 \| \| (8, 16, 4096, 256) \| causal_mask \| torch.bfloat16 \| 7679.939 \| 7186.252 \| 24538.105 \| 157156.067 \| 1.069 \| 0.156 \| \| (8, 16, 4096, 256) \| relative_bias \| torch.bfloat16 \| 7681.374 \| 6994.832 \| 24549.713 \| 140077.179 \| 1.098 \| 0.175 \| \| (8, 16, 4096, 256) \| head_bias \| torch.bfloat16 \| 7679.822 \| 7212.278 \| 24627.823 \| 140675.003 \| 1.065 \| 0.175 \| \| (16, 16, 512, 64) \| 
noop \| torch.bfloat16 \| 80.126 \| 78.291 \| 333.719 \| 541.165 \| 1.023 \| 0.617 \| \| (16, 16, 512, 64) \| causal_mask \| torch.bfloat16 \| 80.065 \| 81.696 \| 333.779 \| 551.113 \| 0.980 \| 0.606 \| \| (16, 16, 512, 64) \| relative_bias \| torch.bfloat16 \| 80.138 \| 86.715 \| 333.364 \| 542.118 \| 0.924 \| 0.615 \| \| (16, 16, 512, 64) \| head_bias \| torch.bfloat16 \| 80.415 \| 85.204 \| 333.294 \| 536.840 \| 0.944 \| 0.621 \| \| (16, 16, 512, 128) \| noop \| torch.bfloat16 \| 134.964 \| 138.025 \| 607.093 \| 1333.102 \| 0.978 \| 0.455 \| \| (16, 16, 512, 128) \| causal_mask \| torch.bfloat16 \| 134.192 \| 141.523 \| 606.269 \| 1424.318 \| 0.948 \| 0.426 \| \| (16, 16, 512, 128) \| relative_bias \| torch.bfloat16 \| 135.711 \| 138.639 \| 606.283 \| 1327.974 \| 0.979 \| 0.457 \| \| (16, 16, 512, 128) \| head_bias \| torch.bfloat16 \| 135.552 \| 140.555 \| 607.107 \| 1347.370 \| 0.964 \| 0.451 \| \| (16, 16, 512, 256) \| noop \| torch.bfloat16 \| 275.113 \| 315.144 \| 1301.583 \| 5268.153 \| 0.873 \| 0.247 \| \| (16, 16, 512, 256) \| causal_mask \| torch.bfloat16 \| 274.867 \| 328.106 \| 1302.513 \| 5770.594 \| 0.838 \| 0.226 \| \| (16, 16, 512, 256) \| relative_bias \| torch.bfloat16 \| 276.052 \| 321.770 \| 1302.904 \| 5241.920 \| 0.858 \| 0.249 \| \| (16, 16, 512, 256) \| head_bias \| torch.bfloat16 \| 271.409 \| 328.839 \| 1302.142 \| 5266.037 \| 0.825 \| 0.247 \| \| (16, 16, 1024, 64) \| noop \| torch.bfloat16 \| 260.489 \| 237.463 \| 955.884 \| 1817.558 \| 1.097 \| 0.526 \| \| (16, 16, 1024, 64) \| causal_mask \| torch.bfloat16 \| 262.378 \| 254.350 \| 955.280 \| 1843.807 \| 1.032 \| 0.518 \| \| (16, 16, 1024, 64) \| relative_bias \| torch.bfloat16 \| 261.338 \| 268.253 \| 956.038 \| 1820.036 \| 0.974 \| 0.525 \| \| (16, 16, 1024, 64) \| head_bias \| torch.bfloat16 \| 262.153 \| 264.156 \| 956.023 \| 1810.076 \| 0.992 \| 0.528 \| \| (16, 16, 1024, 128) \| noop \| torch.bfloat16 \| 476.475 \| 461.413 \| 1760.578 \| 4306.521 \| 1.033 \| 0.409 \| \| (16, 16, 
1024, 128) \| causal_mask \| torch.bfloat16 \| 473.794 \| 479.178 \| 1761.277 \| 4619.439 \| 0.989 \| 0.381 \| \| (16, 16, 1024, 128) \| relative_bias \| torch.bfloat16 \| 473.839 \| 463.282 \| 1758.692 \| 4290.562 \| 1.023 \| 0.410 \| \| (16, 16, 1024, 128) \| head_bias \| torch.bfloat16 \| 472.979 \| 472.896 \| 1763.086 \| 4367.931 \| 1.000 \| 0.404 \| \| (16, 16, 1024, 256) \| noop \| torch.bfloat16 \| 1014.184 \| 1026.764 \| 3922.997 \| 19104.147 \| 0.988 \| 0.205 \| \| (16, 16, 1024, 256) \| causal_mask \| torch.bfloat16 \| 1013.217 \| 1039.046 \| 3928.382 \| 21086.281 \| 0.975 \| 0.186 \| \| (16, 16, 1024, 256) \| relative_bias \| torch.bfloat16 \| 1008.519 \| 1015.278 \| 3922.133 \| 18980.652 \| 0.993 \| 0.207 \| \| (16, 16, 1024, 256) \| head_bias \| torch.bfloat16 \| 1011.360 \| 1047.542 \| 3931.245 \| 19069.172 \| 0.965 \| 0.206 \| \| (16, 16, 4096, 64) \| noop \| torch.bfloat16 \| 3929.850 \| 3325.667 \| 11411.704 \| 23344.280 \| 1.182 \| 0.489 \| \| (16, 16, 4096, 64) \| causal_mask \| torch.bfloat16 \| 3885.262 \| 3581.544 \| 11390.515 \| 23725.639 \| 1.085 \| 0.480 \| \| (16, 16, 4096, 64) \| relative_bias \| torch.bfloat16 \| 3865.737 \| 3537.308 \| 11489.901 \| 23406.330 \| 1.093 \| 0.491 \| \| (16, 16, 4096, 64) \| head_bias \| torch.bfloat16 \| 3880.530 \| 3665.249 \| 11484.411 \| 23299.496 \| 1.059 \| 0.493 \| \| (16, 16, 4096, 128) \| noop \| torch.bfloat16 \| 7030.306 \| 6745.715 \| 20621.264 \| 57464.096 \| 1.042 \| 0.359 \| \| (16, 16, 4096, 128) \| causal_mask \| torch.bfloat16 \| 7095.414 \| 7034.385 \| 20410.656 \| 61660.511 \| 1.009 \| 0.331 \| \| (16, 16, 4096, 128) \| relative_bias \| torch.bfloat16 \| 7084.779 \| 6686.497 \| 20315.161 \| 57243.969 \| 1.060 \| 0.355 \| \| (16, 16, 4096, 128) \| head_bias \| torch.bfloat16 \| 7075.367 \| 6863.305 \| 20494.385 \| 58481.953 \| 1.031 \| 0.350 \| \| (16, 16, 4096, 256) \| noop \| torch.bfloat16 \| 15612.741 \| 14297.482 \| 55306.847 \| 281161.865 \| 1.092 \| 0.197 \| \| (16, 16, 4096, 256) 
\| causal_mask \| torch.bfloat16 \| 15326.592 \| 14263.878 \| 55227.806 \| 313063.232 \| 1.075 \| 0.176 \| \| (16, 16, 4096, 256) \| relative_bias \| torch.bfloat16 \| 15297.963 \| 14007.379 \| 54558.029 \| 279529.175 \| 1.092 \| 0.195 \| \| (16, 16, 4096, 256) \| head_bias \| torch.bfloat16 \| 15216.160 \| 14276.027 \| 55081.581 \| 280996.826 \| 1.066 \| 0.196 \| </details> Pull Request resolved: https://github.com/pytorch/pytorch/pull/125515 Approved by: https://github.com/Chillee	2024-05-17 00:41:55 +00:00
Jithun Nair	14d8e3aec0	Add distributed/_tensor/test_attention to ROCM_BLOCKLIST (#126336 ) Fixes #125504 Fixes #126252 Fixes #126296 Fixes #126330 This PR doesn't really fix the RingAttentionTest tests for ROCm, but explicitly adds the whole test file to ROCM_BLOCKLIST to get a clean signal on ROCm distributed CI. We will enable these tests in a follow-up PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126336 Approved by: https://github.com/huydhn, https://github.com/pruthvistony	2024-05-16 16:38:09 +00:00
Catherine Lee	48f98bcdfc	[TD] Enable test removal on most default configs + distributed CUDA for everyone (#125931 ) yolo Add the longest jobs in pull: * default cpu configs * non sm86 cuda * distributed cuda for everyone Still excluding * slow, inductor, rocm, onnx, mac, dynamo * distributed cpu * windows cuda Pull Request resolved: https://github.com/pytorch/pytorch/pull/125931 Approved by: https://github.com/huydhn, https://github.com/ZainRizvi	2024-05-14 17:35:12 +00:00
Catherine Lee	6f619cc727	[ez] functorch/test_vmap and test_dataloader to run in parallel (#125597 ) Also mark test_svd serial in linalg to see if it helps with the flakiness Pull Request resolved: https://github.com/pytorch/pytorch/pull/125597 Approved by: https://github.com/huydhn, https://github.com/ZainRizvi	2024-05-08 15:37:29 +00:00
Huy Do	0e57bbb6d7	Set timeout for C++ tests (#125517 ) Looking at the unrelated Windows timeout failure on https://github.com/pytorch/pytorch/pull/125199, it looks like we don't have a timeout value set for C++ tests atm. In this case, a C++ test on Windows timed out after 2+ hours. ``` 2024-05-02T23:35:34.0639067Z Running cpp/c10_TypeList_test 1/1 ... [2024-05-02 23:35:34.059021] 2024-05-02T23:35:34.0641108Z Executing ['pytest', 'C:\\actions-runner\\_work\\pytorch\\pytorch\\build\\win_tmp\\build\\torch\\test\\c10_TypeList_test.exe', '-m', 'not serial', '-v', '-vv', '-rfEX', '-n', '2', '--junit-xml-reruns', 'test-reports\\python-pytest\\test\\run_test\\test\\run_test-c898ddeff8f33cbf.xml', '-x', '--reruns=2'] ... [2024-05-02 23:35:34.062137] 2024-05-03T02:45:33.7862004Z Process SpawnPoolWorker-2: 2024-05-03T02:45:33.7927201Z Traceback (most recent call last): 2024-05-03T02:45:33.7928032Z File "C:\Jenkins\Miniconda3\lib\multiprocessing\process.py", line 315, in _bootstrap 2024-05-03T02:45:33.7928722Z self.run() 2024-05-03T02:45:33.7929722Z File "C:\Jenkins\Miniconda3\lib\multiprocessing\process.py", line 108, in run 2024-05-03T02:45:33.7931639Z self._target(self._args, self._kwargs) 2024-05-03T02:45:33.7932435Z File "C:\Jenkins\Miniconda3\lib\multiprocessing\pool.py", line 114, in worker 2024-05-03T02:45:33.7933338Z task = get() 2024-05-03T02:45:33.7933946Z File "C:\Jenkins\Miniconda3\lib\multiprocessing\queues.py", line 365, in get 2024-05-03T02:45:33.7935219Z res = self._reader.recv_bytes() 2024-05-03T02:45:33.7935897Z File "C:\Jenkins\Miniconda3\lib\multiprocessing\connection.py", line 221, in recv_bytes 2024-05-03T02:45:33.7936609Z buf = self._recv_bytes(maxlength) 2024-05-03T02:45:33.7937302Z File "C:\Jenkins\Miniconda3\lib\multiprocessing\connection.py", line 310, in _recv_bytes 2024-05-03T02:45:33.7938316Z waitres = _winapi.WaitForMultipleObjects( 2024-05-03T02:45:33.7938766Z KeyboardInterrupt ``` Retrying was working, but it was already too late to finish 
the job. I'm setting the same default `THRESHOLD * 3` timeout value here for C++ tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125517 Approved by: https://github.com/clee2000	2024-05-07 16:41:38 +00:00
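
The timeout change in #125517 above can be illustrated with a minimal sketch. This is not the code from `run_test.py`: the helper `run_with_timeout`, the value of `THRESHOLD`, and the factor of 3 used in the demo are assumptions chosen for illustration. It only shows the general idea of bounding a test subprocess so that a hung binary fails quickly instead of consuming the whole job.

```python
import subprocess
import sys

# Hypothetical base value for illustration; not the constant from run_test.py.
THRESHOLD = 1.0  # seconds


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (returncode, timed_out)."""
    try:
        proc = subprocess.run(cmd, timeout=timeout_s, capture_output=True)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising,
        # so no hung process is left behind.
        return None, True
    return proc.returncode, False


if __name__ == "__main__":
    fast = [sys.executable, "-c", "print('ok')"]
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    # The multiplier mirrors the idea of a timeout derived from a base threshold.
    print(run_with_timeout(fast, THRESHOLD * 3))
    print(run_with_timeout(slow, THRESHOLD * 3))
```

With a bound like this, a retry can start while there is still time left in the job, rather than after a multi-hour hang.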

784 Commits