pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Shunting Zhang	4cc64d6234	[inductor] pre grad graph bisecting (#166344 ) A few things to note: 1. Customers like vllm use a custom backend (e.g. VllmBackend), split the graph, and call standalone_compile for each split. If we let the bisector override the backend, we won't bisect thru the custom backend. `test_configs.bisect_keep_custom_backend_for_inductor` is used to keep the custom backend if we are bisecting for inductor. 2. pre_grad_graph bisecting and lowering bisecting so far does not compose well with each other since an issue may be just captured by the first one we try. `test_configs.bisect_pre_grad_graph` is used to enable the 'pre_grad_graph' bisecting. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166344 Approved by: https://github.com/eellison	2025-11-01 09:22:21 +00:00
Maggie Moss	31e42eb732	Fix pyrefly ignore syntax (#166438 ) Reformats pyrefly ignore suppressions so they only ignore one error code. pyrefly check lintrunner Pull Request resolved: https://github.com/pytorch/pytorch/pull/166438 Approved by: https://github.com/Skylion007	2025-10-29 00:02:21 +00:00
Yuanyuan Chen	a60d9e1f6d	Fix flake8 B028 warnings (#166224 ) This PR fixes flake8 B028 warning by specifying stacklevel=2 in `warnings.warn`. The advantage is that users can know more contextual information about PyTorch warnings. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166224 Approved by: https://github.com/ezyang	2025-10-26 06:18:55 +00:00
Maggie Moss	c7eee49525	Fix pyrefly ignores 1/n (#166239 ) First diff adjusting the syntax for pyrefly: ignore suppressions so they only hide one class of type error. Test: lintrunner pyrefly check Pull Request resolved: https://github.com/pytorch/pytorch/pull/166239 Approved by: https://github.com/oulgen	2025-10-26 00:44:10 +00:00
nick-kuhn	79a4a9c02e	Fix race condition and make CUDA kthvalue deterministic (#165762 ) The gatherKthValue kernel had a race condition where multiple threads could write to the same output location without synchronization when duplicate k-th values exist, resulting in non-deterministic output. Changes: - aten/src/ATen/native/cuda/Sorting.cu: Use atomicMin with shared memory to deterministically find minimum index. Add early termination and remove redundant inRange checks. (We have to cast the index to `int32_t`, but this is already assumed to fit earlier in the kernel.) - aten/src/ATen/native/cuda/Sorting.cpp: Remove non-deterministic alert since kthvalue is now deterministic on CUDA. - torch/__init__.py: Remove kthvalue from non-deterministic operations list and remove kthvalue example from use_deterministic_algorithms() docstring. - test/test_torch.py: Remove test_nondeterministic_alert_kthvalue since kthvalue no longer raises alerts on CUDA. Benefits: - Deterministic: always returns minimum index when duplicates exist - Potential performance improvement on large arrays with repetitions Test Results: - All existing PyTorch tests pass (test_kthvalue) - Custom determinism tests confirm consistent results - Custom CUDA vs CPU correctness validated across 50+ scenarios - Custom performance benchmarks show improvements with no visible regressions Addresses #165227 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165762 Approved by: https://github.com/ngimel, https://github.com/eqy	2025-10-25 00:45:57 +00:00
Shunting Zhang	673060beae	[inductor] turn Inductor deterministic mode on with torch.use_deterministic_algorithms (#165950 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/165950 Approved by: https://github.com/v0i0, https://github.com/eellison	2025-10-23 02:48:42 +00:00
Aaron Gokaslan	57ba575242	[BE][Ez]: Update torch.is_tensor documentation (#165841 ) TypeIs propogates the isinstance check with the typing system. They are now equivalent. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165841 Approved by: https://github.com/albanD	2025-10-19 09:24:11 +00:00
andreh7	f18041cca8	Fix missing closing quote in __init__.py documentation (#165827 ) Title says it all. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165827 Approved by: https://github.com/Skylion007	2025-10-18 22:09:18 +00:00
Yuanyuan Chen	fb64da0791	[2/N] Use "is" in python type comparison (#165142 ) This is follow-up of #165037. It generally recommended to use `is/is not` to compare types. Therefore this series of changes apply this suggestion in the code base, and it aims to finally enabling related linter checks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165142 Approved by: https://github.com/albanD	2025-10-10 15:36:44 +00:00
atalman	483f4e0db9	CUDA 13.0 builds fix on Amazon Linux 2023 (#164870 ) During 2.9 rc testing I am seeing an issue on Amazon Linux 2023 with CUDA 13.0 builds This is related to: https://github.com/pytorch/pytorch/issues/152756 Workflow: https://github.com/pytorch/test-infra/actions/runs/18324074610/job/52184079262 Error: ``` WARNING: There was an error checking the latest version of pip. + python3.11 .ci/pytorch/smoke_test/smoke_test.py --package torchonly Traceback (most recent call last): File "/usr/local/lib64/python3.11/site-packages/torch/__init__.py", line 333, in _load_global_deps ctypes.CDLL(global_deps_lib_path, mode=ctypes.RTLD_GLOBAL) File "/usr/lib64/python3.11/ctypes/__init__.py", line 376, in __init__ self._handle = _dlopen(self._name, mode) ^^^^^^^^^^^^^^^^^^^^^^^^^ OSError: libcudart.so.13: cannot open shared object file: No such file or directory During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/pytorch/pytorch/.ci/pytorch/smoke_test/smoke_test.py", line 12, in <module> import torch File "/usr/local/lib64/python3.11/site-packages/torch/__init__.py", line 425, in <module> _load_global_deps() File "/usr/local/lib64/python3.11/site-packages/torch/__init__.py", line 383, in _load_global_deps _preload_cuda_deps(lib_folder, lib_name) File "/usr/local/lib64/python3.11/site-packages/torch/__init__.py", line 317, in _preload_cuda_deps raise ValueError(f"{lib_name} not found in the system path {sys.path}") Traceback (most recent call last): ValueError: libnvToolsExt.so.*[0-9] not found in the system path ['/pytorch/pytorch/.ci/pytorch/smoke_test', '/usr/lib64/python311.zip', '/usr/lib64/python3.11', '/usr/lib64/python3.11/lib-dynload', '/usr/local/lib64/python3.11/site-packages', '/usr/local/lib/python3.11/site-packages', '/usr/lib64/python3.11/site-packages', '/usr/lib/python3.11/site-packages'] File "/home/ec2-user/actions-runner/_work/test-infra/test-infra/test-infra/.github/scripts/run_with_env_secrets.py", line 102, in <module> main() File "/home/ec2-user/actions-runner/_work/test-infra/test-infra/test-infra/.github/scripts/run_with_env_secrets.py", line 98, in main run_cmd_or_die(f"docker exec -t {container_name} /exec") File "/home/ec2-user/actions-runner/_work/test-infra/test-infra/test-infra/.github/scripts/run_with_env_secrets.py", line 39, in run_cmd_or_die raise RuntimeError(f"Command {cmd} failed with exit code {exit_code}") RuntimeError: Command docker exec -t 7d9c5bd403cac9a9ee824d63a1d6f6057ecce89a7daa94a81617dbf8eff0ff2e /exec failed with exit code 1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/164870 Approved by: https://github.com/Camyll Co-authored-by: Eli Uriegas <1700823+seemethere@users.noreply.github.com>	2025-10-07 22:52:53 +00:00
Laith Sakka	ef7e2ca77e	remove check_is_size from test_misc.py (#164667 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164667 Approved by: https://github.com/angelayi ghstack dependencies: #164664, #164665	2025-10-07 07:33:50 +00:00
Maggie Moss	f414aa8e0d	Add pyrefly suppressions (3/n) (#164588 ) Adds suppressions to pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283 Test plan: dmypy restart && python3 scripts/lintrunner.py -a pyrefly check step 1: uncomment lines in the pyrefly.toml file step 2: run pyrefly check step 3: add suppressions, clean up unused suppressions before: https://gist.github.com/maggiemoss/bb31574ac8a59893c9cf52189e67bb2d after: 0 errors (1,970 ignored) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164588 Approved by: https://github.com/oulgen	2025-10-03 22:03:03 +00:00
Tugsbayasgalan Manlaibaatar	2a11ce2c78	Support calling torch.compile inside non-strict export (#164171 ) So this fixes at least two issues: 1) When we are invoking inductor backend, we apply pre-grad passes which try to find correct fake mode to use. In the nested case, we will run into clash when there is closure variable in the inductor region because non-strict would have fakified this variable before hand and inner torch.compile would have created a new fresh fake mode. This is not a problem in regular torch.compile because inner torch.compile gets ignored. I don't know if we are supposed to inherit fake mode from parent context in this case. But we can avoid this problem if we just default to eager backend which is fine in this case because the point of export is to capture aten operators. Going to inductor would mean we will lose inner torch.compile ops. 2) There is custom torch function modes in export that track number of torch fns executed and inner compile itself doesn't work because of guard failure as this mode state gets changed. I noticed torch.cond fixes this problem by carefully stashing the torch function mode and defer it in the backend. So the correct thing to do here is just re-use torch.cond implementation unconditionally. So the things i did for fixing above were: 1) Always default to eager backend when compile is invoked inside export. I needed to make how torch.cond sets up the fresh tracing env into an util that can be shared. 2) The previous eager backend for torch.cond was wrong because the context managers didn't actually persist until the backend is invoked. 3) torch.cond used only disable TorchFunctionMetadata tf mode and stash it for later, but in fact, we should do both TorchFunctionMetadata and PreDispatchTorchFunctionMode. With above fixes, we are able to export flex attention in export. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164171 Approved by: https://github.com/ydwu4	2025-10-03 16:31:07 +00:00
Eddie Yan	f7082e92b3	[cuBLAS] update cuBLAS determinism docs, remove workspace requirement checks (#161749 ) Since CUDA 11.x (need to update the docs for this, current PR is saying 12.2 which is incorrect) we've been allocating cuBLAS workspaces explicitly per handle/stream combination https://github.com/pytorch/pytorch/pull/85447 According to the cuBLAS documentation, this appears to be sufficient for determinism without any explicit workspace requirements to e.g., `:4096:8` or `:16:8` as was previously expressed in PyTorch docs https://docs.nvidia.com/cuda/cublas/#results-reproducibility Planning to add an explicit determinism test as well... Pull Request resolved: https://github.com/pytorch/pytorch/pull/161749 Approved by: https://github.com/ngimel	2025-10-03 00:09:47 +00:00
Maggie Moss	5f18f240de	Add initial suppressions for pyrefly (#164177 ) Adds suppressions to pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283 Test plan: `python3 scripts/lintrunner.py` `pyrefly check` --- Pyrefly check before: https://gist.github.com/maggiemoss/3a0aa0b6cdda0e449cd5743d5fce2c60 After: ``` INFO Checking project configured at `/Users/maggiemoss/python_projects/pytorch/pyrefly.toml` INFO 0 errors (1,063 ignored) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/164177 Approved by: https://github.com/Lucaskabela	2025-10-02 20:57:41 +00:00
Yuanyuan Chen	315ffdc1e4	[4/N] Apply ruff UP035 rule to python code (#164206 ) Follows #164104 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164206 Approved by: https://github.com/albanD	2025-10-01 19:05:53 +00:00
atalman	141fc7276e	[CD] CUDA 13.0 fix preload logic to include nvidia/cu13/lib/ (#163661 ) Preload logic no longer works with CUDA 13.0 See the installation path: ``` ls /home/ubuntu/.venv/lib/python3.10/site-packages/nvidia/cu13/lib/ libcheckpoint.so libcudadevrt.a libcufft.so.12 libcufile_rdma.so.1 libcusolver.so.12 libnvJitLink.so.13 libnvperf_target.so libnvrtc.alt.so.13 libpcsamplingutil.so libcublas.so.13 libcudart.so.13 libcufftw.so.12 libcupti.so.13 libcusolverMg.so.12 libnvblas.so.13 libnvrtc-builtins.alt.so.13.0 libnvrtc.so.13 libcublasLt.so.13 libcudart_static.a libcufile.so.0 libcurand.so.10 libcusparse.so.12 libnvperf_host.so libnvrtc-builtins.so.13.0 libnvtx3interop.so.1 ls /home/ubuntu/.venv/lib/python3.10/site-packages/nvidia/ cu13 cudnn cusparselt nccl nvshmem ``` Test using script from : https://github.com/pytorch/pytorch/issues/162367 ``` Kernel test passed! ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/163661 Approved by: https://github.com/nWEIdia, https://github.com/tinglvv, https://github.com/Camyll	2025-09-24 11:27:05 +00:00
Nikita Shulga	f9fa138a39	[BE] Delete all pre py-3.10 checks (#163653 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/163653 Approved by: https://github.com/jansel ghstack dependencies: #163648, #163649	2025-09-23 23:22:53 +00:00
bobrenjc93	ed3438ff13	Turn on capture_dynamic_output_shape_ops when fullgraph=True (#163123 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/163123 Approved by: https://github.com/laithsakka ghstack dependencies: #163121	2025-09-18 21:24:15 +00:00
Saman Khatir	1330c638be	Update Microsoft C++ Redistributable to the latest version (#161430 ) Update Microsoft C++ Redistributable link to the latest version as one of the libraries used by AMD currently has a dependency on that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/161430 Approved by: https://github.com/malfet	2025-09-18 14:22:03 +00:00
Rohit Manav	3ad3bfe11d	added example for torch.is_storage (#162614 ) Fixes #162613 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162614 Approved by: https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2025-09-11 20:25:26 +00:00
Xuehai Pan	13d66e2a66	[BE][Easy] restore #157584 after #158288 (#158541 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/158541 Approved by: https://github.com/ezyang	2025-09-02 02:06:50 +00:00
RajeshvShiyal	f795e92802	space added between type and checking for typechecking (#161352 ) space added between type and checking for "typechecking" Fixes #161282 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161352 Approved by: https://github.com/malfet	2025-08-26 02:07:33 +00:00
PaliC	6162e650b0	[BE] remove torch deploy - conditionals (#158288 ) This PR is part of the work to deprecate torch::deploy in OSS. Effectively it does 3 things to get started. 1. Remove test_deploy_interaction as we no longer need to worry about this 2. Remove all torch._running_with_deploy checks and use the False path always (surfaced 1) 3. Remove `USE_DEPLOY` and switch to the default path always Note: MyPy does fail on a bunch of things here as a bunch of older files are touched. It may be better to fix these things on a separate PR Pull Request resolved: https://github.com/pytorch/pytorch/pull/158288 Approved by: https://github.com/albanD	2025-07-29 17:40:49 +00:00
PyTorch MergeBot	f8fafdc7a6	Revert "[BE] remove torch deploy - conditionals (#158288 )" This reverts commit `ab26d4fbeb`. Reverted https://github.com/pytorch/pytorch/pull/158288 on behalf of https://github.com/ZainRizvi due to Reverting as per offline discussion to fix internal breaks. @PaliC will reland this as a codev diff. Instructions here: https://fburl.com/fixing-ghfirst-reverts ([comment](https://github.com/pytorch/pytorch/pull/158288#issuecomment-3119037960))	2025-07-25 16:09:39 +00:00
PaliC	ab26d4fbeb	[BE] remove torch deploy - conditionals (#158288 ) This PR is part of the work to deprecate torch::deploy in OSS. Effectively it does 3 things to get started. 1. Remove test_deploy_interaction as we no longer need to worry about this 2. Remove all torch._running_with_deploy checks and use the False path always (surfaced 1) 3. Remove `USE_DEPLOY` and switch to the default path always Note: MyPy does fail on a bunch of things here as a bunch of older files are touched. It may be better to fix these things on a separate PR Pull Request resolved: https://github.com/pytorch/pytorch/pull/158288 Approved by: https://github.com/albanD	2025-07-23 20:27:28 +00:00
PyTorch MergeBot	ee5a434f8c	Revert "[BE] remove torch deploy - conditionals (#158288 )" This reverts commit `1a4268b811`. Reverted https://github.com/pytorch/pytorch/pull/158288 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally, see D78496147 for details. To validate your fixes internally, you can follow the instructions here: https://fburl.com/fixing-ghfirst-reverts ([comment](https://github.com/pytorch/pytorch/pull/158288#issuecomment-3099826158))	2025-07-21 23:17:39 +00:00
PaliC	1a4268b811	[BE] remove torch deploy - conditionals (#158288 ) This PR is part of the work to deprecate torch::deploy in OSS. Effectively it does 3 things to get started. 1. Remove test_deploy_interaction as we no longer need to worry about this 2. Remove all torch._running_with_deploy checks and use the False path always (surfaced 1) 3. Remove `USE_DEPLOY` and switch to the default path always Note: MyPy does fail on a bunch of things here as a bunch of older files are touched. It may be better to fix these things on a separate PR Pull Request resolved: https://github.com/pytorch/pytorch/pull/158288 Approved by: https://github.com/albanD	2025-07-17 05:56:07 +00:00
Andrey Talman	4e13eca713	[BE] Remove CUDA 11.8 artifacts (#158303 ) We are including cufile by default in all CUDA 12+ builds. Since CUDA 11.8 is removed we can safely remove this code Pull Request resolved: https://github.com/pytorch/pytorch/pull/158303 Approved by: https://github.com/Camyll, https://github.com/cyyever	2025-07-15 11:52:08 +00:00
Xuehai Pan	4dce5b71a0	[build] modernize build-frontend: `python setup.py develop/install` -> `[uv ]pip install --no-build-isolation [-e ].` (#156027 ) Modernize the development installation: ```bash # python setup.py develop python -m pip install --no-build-isolation -e . # python setup.py install python -m pip install --no-build-isolation . ``` Now, the `python setup.py develop` is a wrapper around `python -m pip install -e .` since `setuptools>=80.0`: - pypa/setuptools#4955 `python setup.py install` is deprecated and will emit a warning during run. The warning will become an error on October 31, 2025. - `9c4d383631/setuptools/command/install.py (L58-L67)` > ```python > SetuptoolsDeprecationWarning.emit( > "setup.py install is deprecated.", > """ > Please avoid running ``setup.py`` directly. > Instead, use pypa/build, pypa/installer or other > standards-based tools. > """, > see_url="https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html", > due_date=(2025, 10, 31), > ) > ``` - pypa/setuptools#3849 Additional Resource: - [Why you shouldn't invoke setup.py directly](https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156027 Approved by: https://github.com/ezyang	2025-07-09 11:24:27 +00:00
Xuehai Pan	4cc8b60d1b	[BE][1/16] fix typos in torch/ (#156311 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156311 Approved by: https://github.com/albanD	2025-07-09 11:02:22 +00:00
Zeina Migeed	ed6df0e324	correctly import torch.version (#157584 ) The structure is ``` torch/ __init__.py version.py ``` When we import torch, only `torch/__init__.py` is executed by default. The submodules like `version.py` are not automatically imported or attached to the torch module. So without anything in `__init__.py`, `torch.version` may not be found. So in this PR, we make the import explicit. Pull Request resolved: https://github.com/pytorch/pytorch/pull/157584 Approved by: https://github.com/ezyang	2025-07-07 21:43:35 +00:00
Tom Ritchford	9968edd002	Fix #153942 (#153943 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/153943 Approved by: https://github.com/malfet	2025-07-04 18:25:18 +00:00
PyTorch MergeBot	19f851ce10	Revert "Simplify nvtx3 CMake handling, always use nvtx3 (#153784 )" This reverts commit `099d0d6121`. Reverted https://github.com/pytorch/pytorch/pull/153784 on behalf of https://github.com/Camyll due to breaking internal tests and cuda 12.4 builds still used in CI ([comment](https://github.com/pytorch/pytorch/pull/153784#issuecomment-3001702310))	2025-06-24 20:02:07 +00:00
cyy	099d0d6121	Simplify nvtx3 CMake handling, always use nvtx3 (#153784 ) Fall back to third-party NVTX3 if system NVTX3 doesn't exist. We also reuse the `CUDA::nvtx3` target for better interoperability. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153784 Approved by: https://github.com/ezyang	2025-06-23 06:12:46 +00:00
Zhengxu Chen	7f0cddfb55	[dynamo] Add documentation for guard_filter_fn (#156114 ) Summary: Adding a section of doc for guard_filter_fn. Test Plan: CI Rollback Plan: Differential Revision: D76756743 Pull Request resolved: https://github.com/pytorch/pytorch/pull/156114 Approved by: https://github.com/jansel	2025-06-19 16:13:12 +00:00
Joel Schlosser	c4b93e6579	Replace frame_traced_fn hook with get_traced_code() util (#155249 ) #153622 introduced a hook for getting the relevant code objects after frame tracing. The idea is to have vLLM use this instead of monkey-patching `inline_call_()` to determine the source code files to hash. Unfortunately, the hook runs too late; the vLLM backend needs access to the set of source code filenames while it's running. This PR replaces the newly-added hook with a utility function that a backend can call to get this information. I've made the change in vLLM and can verify that this allows the information to be queried at the right time. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155249 Approved by: https://github.com/zou3519	2025-06-10 22:40:58 +00:00
Kshiteej K	694028f502	update get_default_device to also respect torch.device ctx manager (#148621 ) Fixes https://github.com/pytorch/pytorch/issues/131328 Pull Request resolved: https://github.com/pytorch/pytorch/pull/148621 Approved by: https://github.com/ezyang	2025-06-07 14:26:17 +00:00
Stella Laurenzo	30387ab2e4	[ROCm] Adds initialization support for PyTorch when built from ROCm wheels. (#155285 ) AMD is beginning to roll out ROCm distribution via Python wheels. This patch adds the `__init__.py` hook that is necessary to bootstrap ROCm correctly on Linux and Windows when built from these wheels. See draft, developer documentation describing the mechanism here: https://github.com/ROCm/TheRock/blob/main/docs/packaging/python_packaging.md This operates to similar effect as how Torch can depend on CUDA wheels, with some differences: * ROCm libraries and checks are delegated to helpers in the `rocm_sdk` module, which knows how to find and configure access to the installed libraries. This limits the amount of plumbing and path machinations that must match up between the framework and ROCm. * When building torch against ROCm, no ROCm system install is needed: instead the proper SDK development wheel is installed and the `CMAKE_PREFIX_PATH` is obtained via `rocm-sdk path --cmake`. * It is expected that whoever produces such a build will also place a generated `_rocm_init.py` in the `torch` module with initialization logic to preload libraries, check versions, verify GPU compatibility, etc. * See [build_prod_wheels.py](https://github.com/ROCm/TheRock/blob/main/external-builds/pytorch/build_prod_wheels.py) for an example build script that is being used to generate nightlies in this configuration. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155285 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-06-07 02:59:03 +00:00
Joel Schlosser	9db7bcb3fe	[Dynamo] Introduce hook receiving list of traced code objects (#153622 ) This PR: * Expands `Hooks` with a new, optional `frame_traced_fn` field. It should be a callable receiving the list of traced code objects * Maintains a list of `traced_code` objects in the `TracingContext` of an `OutputGraph` * Whenever an `inline_call()` is encountered, the corresponding code object is added to this set * `OutputGraph`'s associated `f_code` is added to the list just before the hook is called I believe use of this hook should enable the source code hashing that vLLM does in a better way than monkey-patching `inline_call()`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153622 Approved by: https://github.com/jansel	2025-05-28 15:40:09 +00:00
Yuki Kobayashi	f55f2f42a7	Add missing docstring for `sym_ite` (#154201 ) `sym_ite` is listed in [the reference page](https://docs.pytorch.org/docs/stable/torch.html) and has no document. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154201 Approved by: https://github.com/Skylion007	2025-05-26 15:59:21 +00:00
Tsung-Hsien Lee	756fd80734	[BE] Improve the typing related to `model` input argument of `torch.compile()` (#153559 ) Summary: Match the `overload` typing with the original typing in function definition and adjust the corresponding comments. Test Plan: contbuild & OSS CI Differential Revision: D74746243 Pull Request resolved: https://github.com/pytorch/pytorch/pull/153559 Approved by: https://github.com/Skylion007	2025-05-15 04:49:26 +00:00
Ke Wen	5bf0c3518c	Detect NVSHMEM location (#153010 ) ### Changes - Detect NVSHMEM install location via `sysconfig.get_path("purelib")`, which typically resolves to `<conda_env>/lib/python/site-packages`, and NVSHMEM include and lib live under `nvidia/nvshmem` - Added link dir via `target_link_directories` - Removed direct dependency on mlx5 - Added preload rule (following other other NVIDIA libs) ### Plan of Record 1. End user experience: link against NVSHMEM dynamically (NVSHMEM lib size is 100M, similar to NCCL, thus we'd like users to `pip install nvshmem` than torch carrying the bits) 2. Developer experience: at compile time, prefers wheel dependency than using Git submodule General rule: submodule for small lib that torch can statically link with If user pip install a lib, our CI build process should do the same, rather than building from Git submodule (just for its header, for example) 3. Keep `USE_NVSHMEM` to gate non-Linux platforms, like Windows, Mac 4. At configuration time, we should be able to detect whether nvshmem is available, if not, we don't build `NVSHMEMSymmetricMemory` at all. For now, we have symbol dependency on two particular libs from NVSHMEM: - libnvshmem_host.so: contains host side APIs; - libnvshmem_device.a: contains device-side global variables AND device function impls. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153010 Approved by: https://github.com/ngimel, https://github.com/fduwjj, https://github.com/Skylion007	2025-05-07 23:35:04 +00:00
Yuanhao Ji	f5f8f637a5	[Typing] Improve device typing for `torch.set_default_device()` (#153028 ) Part of: #152952 Here is the definition of `torch.types.Device`: `ab997d9ff5/torch/types.py (L74)` So `_Optional[_Union["torch.device", str, builtins.int]]` is equivalent to it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153028 Approved by: https://github.com/Skylion007	2025-05-07 19:31:43 +00:00
David Berard	7d205b22b5	[profiler][retry] don't disable CUPTI_LAZY_REINIT for cuda >= 12.6 (#151124 ) Retry of https://github.com/pytorch/pytorch/pull/150957, which was reverted due to internal meta failures Credit to @mgmtea who wrote the initial version of this PR: https://github.com/pytorch/pytorch/pull/146604 Context: CUPTI is the NVIDIA library that Kineto uses for collecting GPU-side info during profiling. The intended usage is to register a callback while you want profiling to occur, and then unregister the callback when you want profiling to stop. But a bug would cause crashes if CUPTI callbacks were de-registered when used with cudagraphs. The workaround was to disable "CUPTI_LAZY_REINIT" and "CUPTI_TEARDOWN" in Kineto - which prevents crashes, but can result in slower execution after profiling has occurred and completed. This bug is believed to be fixed in CUDA >= 12.6, so this PR qualifies that DISABLE_CUPTI_LAZY_REINIT=1 and CUPTI_TEARDOWN=0 should only be applied if CUDA >= 12.6. Additionally, `profiler_allow_cudagraph_cupti_lazy_reinit_cuda12()` is added as an escape hatch so that we can add a killswitch in case we see more crashes related to this. Differential Revision: [D72842114](https://our.internmc.facebook.com/intern/diff/D72842114/) NOTE FOR REVIEWERS: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D72842114/)! Differential Revision: [D72842114](https://our.internmc.facebook.com/intern/diff/D72842114) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151124 Approved by: https://github.com/sraikund16	2025-04-15 16:11:49 +00:00
Yuki Kobayashi	101c4f482a	Docs: Fix typos in the Symbolic Numbers docstrings (#151181 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151181 Approved by: https://github.com/soulitzer	2025-04-14 01:46:02 +00:00
PyTorch MergeBot	44ed0c9fbb	Revert "[profiler] don't disable CUPTI_LAZY_REINIT for cuda >= 12.6 (#150957 )" This reverts commit `37812009fd`. Reverted https://github.com/pytorch/pytorch/pull/150957 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/150957#issuecomment-2795878848))	2025-04-11 05:38:58 +00:00
Zhengxu Chen	86370fd658	[dynamo] Allow guards to be dropped with custom filter functions. (#150936 ) Summary: A follow up of https://github.com/pytorch/pytorch/pull/150689. Test Plan: test_dynamo -k test_guard_filter_fn Differential Revision: D72722322 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150936 Approved by: https://github.com/jansel	2025-04-11 03:06:34 +00:00
David Berard	37812009fd	[profiler] don't disable CUPTI_LAZY_REINIT for cuda >= 12.6 (#150957 ) Credit to @mgmtea who wrote the initial version of this PR: https://github.com/pytorch/pytorch/pull/146604 Context: CUPTI is the NVIDIA library that Kineto uses for collecting GPU-side info during profiling. The intended usage is to register a callback while you want profiling to occur, and then unregister the callback when you want profiling to stop. But a bug would cause crashes if CUPTI callbacks were de-registered when used with cudagraphs. The workaround was to disable "CUPTI_LAZY_REINIT" and "CUPTI_TEARDOWN" in Kineto - which prevents crashes, but can result in slower execution after profiling has occurred and completed. This bug is believed to be fixed in CUDA >= 12.6, so this PR qualifies that DISABLE_CUPTI_LAZY_REINIT=1 and CUPTI_TEARDOWN=0 should only be applied if CUDA >= 12.6. Additionally, `profiler_allow_cudagraph_cupti_lazy_reinit_cuda12()` is added as an escape hatch so that we can add a killswitch in case we see more crashes related to this. Differential Revision: [D72745929](https://our.internmc.facebook.com/intern/diff/D72745929) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150957 Approved by: https://github.com/aaronenyeshi, https://github.com/Skylion007	2025-04-10 17:45:01 +00:00
Laith Sakka	5471e80fb4	Remove guard_size_oblivious from vector_norm decomposition. (#148809 ) This PR remove the usage of guard_size_oblivious in vector_norm by inlining it in the runtime check, this prevent any data dependent error from ever appearing here at the locations where guard_size_oblivious used to exist. Before this PR it used to break potentially. This is NOT BC breaking or changing of semantics from eager. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148809 Approved by: https://github.com/bobrenjc93	2025-04-10 16:19:00 +00:00

1 2 3 4 5 ...

681 Commits