pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Edward Yang	09cb34c1dc	[RELAND] Always build USE_DISTRIBUTED (#160449 ) and Make distributed modules importable even when backend not built (#159889 ) (#162594 ) Summary: Original: D81957844 and D81957923 Also, https://github.com/pytorch/pytorch/pull/162142 is patched in as well #buildall Test Plan: sandcastle and oss ci Rollback Plan: Reviewed By: H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/162594 Approved by: https://github.com/H-Huang, https://github.com/dcci	2025-09-22 21:12:18 +00:00
PyTorch MergeBot	edafc902d7	Revert "[BE] Make PyObjectSlot use a global PyInterpreter (#162659 )" This reverts commit `d1993c27ae`. Reverted https://github.com/pytorch/pytorch/pull/162659 on behalf of https://github.com/wdvr due to reverted internally, please see D82771705 @PaliC ([comment](https://github.com/pytorch/pytorch/pull/162659#issuecomment-3317110247))	2025-09-22 06:22:37 +00:00
PyTorch MergeBot	f0078941cf	Revert "[RELAND] Always build USE_DISTRIBUTED (#160449 ) and Make distributed modules importable even when backend not built (#159889 ) (#162594 )" This reverts commit `6c334885d4`. Reverted https://github.com/pytorch/pytorch/pull/162594 on behalf of https://github.com/wdvr due to reverted internally - @ezyang see D82281294 ([comment](https://github.com/pytorch/pytorch/pull/162594#issuecomment-3317017530))	2025-09-22 05:39:07 +00:00
Sherlock Huang	033b7d1e1a	[Reland] Return NoOpDeviceGuardImpl in replace of CudaDeviceGuard when device is not available (#163187 ) Reland of #160532 Summary: To support exporting a cuda model on a CPU-only machine under fake tensor mode. User commonly need to move sample inputs to the cuda device with .to("cuda:0") or .to("cuda") call. This diff supports this. I expect the following pattern to work ``` with FakeTensorMode(allow_non_fake_inputs=True): cuda_module = module.to("cuda:0") cuda_sample_inputs = tuple([x.to("cuda:0") for x in sample_inputs]) with torch.no_grad(): ep = torch.export.export(cuda_module, cuda_sample_inputs) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/163016 Approved by: https://github.com/huydhn Pull Request resolved: https://github.com/pytorch/pytorch/pull/163187 Approved by: https://github.com/angelayi	2025-09-18 04:46:26 +00:00
Sahan Paliskara	d1993c27ae	[BE] Make PyObjectSlot use a global PyInterpreter (#162659 ) This pr gets rid of the pyobj_interpreter_ variable from PyObjectSlot and saves a word in the process Gonna ask for review from @huydhn as there are some changes to CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162659 Approved by: https://github.com/albanD, https://github.com/huydhn	2025-09-17 16:40:55 +00:00
PyTorch MergeBot	79fd497423	Revert "[Reland] Return NoOpDeviceGuardImpl in replace of CudaDeviceGuard when device is not available, or cpu-only build (#163016 )" This reverts commit `f1eb99e2e4`. Reverted https://github.com/pytorch/pytorch/pull/163016 on behalf of https://github.com/jeffdaily due to broke rocm CI, see export/test_export_opinfo.py::TestExportOnFakeCudaCUDA::test_fake_export_nonzero_cuda_float32 [GH job link](https://github.com/pytorch/pytorch/actions/runs/17787208381/job/50564369696) [HUD commit link](`f1eb99e2e4`) ([comment](https://github.com/pytorch/pytorch/pull/163016#issuecomment-3303707552))	2025-09-17 16:17:53 +00:00
Sherlock Huang	f1eb99e2e4	[Reland] Return NoOpDeviceGuardImpl in replace of CudaDeviceGuard when device is not available, or cpu-only build (#163016 ) Reland of #160532 Summary: To support exporting a cuda model on a CPU-only machine under fake tensor mode. User commonly need to move sample inputs to the cuda device with .to("cuda:0") or .to("cuda") call. This diff supports this. I expect the following pattern to work ``` with FakeTensorMode(allow_non_fake_inputs=True): cuda_module = module.to("cuda:0") cuda_sample_inputs = tuple([x.to("cuda:0") for x in sample_inputs]) with torch.no_grad(): ep = torch.export.export(cuda_module, cuda_sample_inputs) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/163016 Approved by: https://github.com/huydhn	2025-09-17 05:01:33 +00:00
PyTorch MergeBot	4db203f875	Revert "[BE] Make PyObjectSlot use a global PyInterpreter (#162659 )" This reverts commit `05ee8114f8`. Reverted https://github.com/pytorch/pytorch/pull/162659 on behalf of https://github.com/jeanschmidt due to seems to have introduced errors in linting see https://github.com/pytorch/pytorch/actions/runs/17750689989/job/50444910643 ([comment](https://github.com/pytorch/pytorch/pull/162659#issuecomment-3298626136))	2025-09-16 12:52:57 +00:00
PaliC	05ee8114f8	[BE] Make PyObjectSlot use a global PyInterpreter (#162659 ) This pr gets rid of the pyobj_interpreter_ variable from PyObjectSlot and saves a word in the process Gonna ask for review from @huydhn as there are some changes to CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162659 Approved by: https://github.com/albanD, https://github.com/huydhn	2025-09-16 00:37:09 +00:00
PyTorch MergeBot	9c93dc8123	Revert "Return NoOpDeviceGuardImpl in replace of CudaDeviceGuard when device is not available, or cpu-only build (#160532 )" This reverts commit `a956c4ab1c`. Reverted https://github.com/pytorch/pytorch/pull/160532 on behalf of https://github.com/huydhn due to Reverted internally ([comment](https://github.com/pytorch/pytorch/pull/160532#issuecomment-3287745165))	2025-09-13 07:42:12 +00:00
Sherlock Huang	a956c4ab1c	Return NoOpDeviceGuardImpl in replace of CudaDeviceGuard when device is not available, or cpu-only build (#160532 ) Summary: To support exporting a cuda model on a CPU-only machine under fake tensor mode. User commonly need to move sample inputs to the cuda device with .to("cuda:0") or .to("cuda") call. This diff supports this. I expect the following pattern to work ``` with FakeTensorMode(allow_non_fake_inputs=True): cuda_module = module.to("cuda:0") cuda_sample_inputs = tuple([x.to("cuda:0") for x in sample_inputs]) with torch.no_grad(): ep = torch.export.export(cuda_module, cuda_sample_inputs) ``` Test Plan: CI Rollback Plan: Differential Revision: D80181887 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160532 Approved by: https://github.com/henryoier, https://github.com/ezyang	2025-09-13 01:50:51 +00:00
Edward Yang	6c334885d4	[RELAND] Always build USE_DISTRIBUTED (#160449 ) and Make distributed modules importable even when backend not built (#159889 ) (#162594 ) Summary: Original: D81957844 and D81957923 Also, https://github.com/pytorch/pytorch/pull/162142 is patched in as well #buildall Test Plan: sandcastle and oss ci Rollback Plan: Reviewed By: H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/162594 Approved by: https://github.com/H-Huang, https://github.com/dcci	2025-09-12 10:54:42 +00:00
PyTorch MergeBot	6b59a19242	Revert "[RELAND] Always build USE_DISTRIBUTED (#160449 ) and Make distributed modules importable even when backend not built (#159889 ) (#162594 )" This reverts commit `6e8f17c580`. Reverted https://github.com/pytorch/pytorch/pull/162594 on behalf of https://github.com/huydhn due to Reverted internally ([comment](https://github.com/pytorch/pytorch/pull/162594#issuecomment-3283985880))	2025-09-12 06:52:03 +00:00
Edward Yang	6e8f17c580	[RELAND] Always build USE_DISTRIBUTED (#160449 ) and Make distributed modules importable even when backend not built (#159889 ) (#162594 ) Summary: Original: D81957844 and D81957923 Also, https://github.com/pytorch/pytorch/pull/162142 is patched in as well #buildall Test Plan: sandcastle and oss ci Rollback Plan: Reviewed By: H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/162594 Approved by: https://github.com/H-Huang, https://github.com/dcci	2025-09-12 03:56:18 +00:00
Edward Yang	dda071587f	Revert "Make distributed modules importable even when backend not built (#159889 )" (#162568 ) This reverts commit `a0d026688c`. Revert "Always build USE_DISTRIBUTED. (#160449)" This reverts commit `d80297a684`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162568 Approved by: https://github.com/huydhn	2025-09-10 04:29:42 +00:00
Edward Yang	d80297a684	Always build USE_DISTRIBUTED. (#160449 ) Signed-off-by: Edward Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449 Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci	2025-09-08 19:10:36 +00:00
PyTorch MergeBot	1e0656f063	Revert "Always build USE_DISTRIBUTED. (#160449 )" This reverts commit `de893e96c7`. Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to internal changes breaks import checks, see [D81845053](https://www.internalfb.com/diff/D81845053) ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3264887002))	2025-09-08 07:04:36 +00:00
Edward Yang	de893e96c7	Always build USE_DISTRIBUTED. (#160449 ) Signed-off-by: Edward Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449 Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci	2025-09-05 20:15:11 +00:00
PyTorch MergeBot	adae7f66aa	Revert "Always build USE_DISTRIBUTED. (#160449 )" This reverts commit `c37103234a`. Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to Breaking internal build rules, see D81756619 ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3259430011))	2025-09-05 18:58:47 +00:00
Edward Yang	c37103234a	Always build USE_DISTRIBUTED. (#160449 ) Signed-off-by: Edward Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449 Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci	2025-09-04 19:43:17 +00:00
PyTorch MergeBot	b7dad7dd49	Revert "Always build USE_DISTRIBUTED. (#160449 )" This reverts commit `90b08643c3`. Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to Already discussed with @ezyang about the internal quirks and errors ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3254219358))	2025-09-04 15:25:07 +00:00
Edward Yang	90b08643c3	Always build USE_DISTRIBUTED. (#160449 ) Signed-off-by: Edward Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449 Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci	2025-09-03 07:33:55 +00:00
PyTorch MergeBot	4e42aa8ffc	Revert "Always build USE_DISTRIBUTED. (#160449 )" This reverts commit `b7034e9c92`. Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, can't be landed with forward fix due to internal tooling problems ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3246689684))	2025-09-02 20:28:42 +00:00
Edward Yang	b7034e9c92	Always build USE_DISTRIBUTED. (#160449 ) Signed-off-by: Edward Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449 Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci	2025-09-01 23:00:21 +00:00
Benjamin Glass	cbc53b7696	Update pybind11 submodule to 3.0.1 (#160754 ) Upgrade to PyBind11 v3. This allows us to strip out our own (possibly broken?) handling of the C++ ABI when building extensions, in favor of the more-complete PyBind11 internal handling. Fixes a few test failures due to https://github.com/pybind/pybind11/issues/5774, which effectively makes the `__qualname__` attribute of functions platform-dependent. Test plan: CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/160754 Approved by: https://github.com/Skylion007	2025-08-27 21:15:01 +00:00
PyTorch MergeBot	1b34e04485	Revert "Update pybind11 submodule to 3.0.1 (#160754 )" This reverts commit `660b0b8128`. Reverted https://github.com/pytorch/pytorch/pull/160754 on behalf of https://github.com/atalman due to please see https://github.com/pytorch/pytorch/pull/160754#issuecomment-3226051449 ([comment](https://github.com/pytorch/pytorch/pull/160754#issuecomment-3226078102))	2025-08-26 23:35:22 +00:00
PyTorch MergeBot	9f6e1b8730	Revert "[ROCm] SDPA fix mem fault when dropout is enabled (#154864 )" This reverts commit `3caddd4daa`. Reverted https://github.com/pytorch/pytorch/pull/154864 on behalf of https://github.com/atalman due to reverted internally ([comment](https://github.com/pytorch/pytorch/pull/154864#issuecomment-3225554119))	2025-08-26 20:03:59 +00:00
Benjamin Glass	660b0b8128	Update pybind11 submodule to 3.0.1 (#160754 ) Upgrade to PyBind11 v3. This allows us to strip out our own (possibly broken?) handling of the C++ ABI when building extensions, in favor of the more-complete PyBind11 internal handling. Fixes a few test failures due to https://github.com/pybind/pybind11/issues/5774, which effectively makes the `__qualname__` attribute of functions platform-dependent. Test plan: CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/160754 Approved by: https://github.com/Skylion007	2025-08-26 01:21:18 +00:00
Aidyn-A	3e5b021f21	[ATen][CPU][Sparse] Use Third-Party Eigen for sparse add and addmm (#155357 ) This pull request adds the following ops for sparse matrices using Eigen library: ```python add(a_csr, b_csr) add(a_csc, b_csc) addmm(c_csr, a_csr, b_csr) addmm(c_csr, a_csr, b_csc) addmm(c_csr, a_csc, b_csc) addmm(c_csr, a_csc, b_csr) addmm(c_csc, a_csr, b_csr) addmm(c_csc, a_csr, b_csc) addmm(c_csc, a_csc, b_csc) addmm(c_csc, a_csc, b_csr) ``` Currently, the operations for sparse matrices on CPU are available through MKL only. The non-existence of MKL on `aarch64` causes the unavailability of these ops on any machines with ARM based CPUs, including Apple Silicon, AWS Graviton and NVIDIA Grace. This PR addresses this issue by using Eigen as a backend for the above ops. This is a re-factored version of my previous PR #101814. The main difference with the old one, this does not enable Eigen by default. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155357 Approved by: https://github.com/pearu, https://github.com/eqy Co-authored-by: Eli Uriegas <eliuriegas@meta.com>	2025-08-23 19:03:55 +00:00
PyTorch MergeBot	fc0683b1e7	Revert "[ATen][CPU][Sparse] Use Third-Party Eigen for sparse add and addmm (#155357 )" This reverts commit `ce048de608`. Reverted https://github.com/pytorch/pytorch/pull/155357 on behalf of https://github.com/seemethere due to This is causing buck builds to fail since we didn't add the definition of AT_USE_EIGEN_SPARSE in the buckbuild.bzl file, will follow-up and re-land this. ([comment](https://github.com/pytorch/pytorch/pull/155357#issuecomment-3212270510))	2025-08-21 22:38:40 +00:00
Andy Lugo	3caddd4daa	[ROCm] SDPA fix mem fault when dropout is enabled (#154864 ) Fixes issue that exhibited a device side memory access fault due to incorrect tensor life management Pull Request resolved: https://github.com/pytorch/pytorch/pull/154864 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-08-21 14:23:13 +00:00
Aidyn-A	ce048de608	[ATen][CPU][Sparse] Use Third-Party Eigen for sparse add and addmm (#155357 ) This pull request adds the following ops for sparse matrices using Eigen library: ```python add(a_csr, b_csr) add(a_csc, b_csc) addmm(c_csr, a_csr, b_csr) addmm(c_csr, a_csr, b_csc) addmm(c_csr, a_csc, b_csc) addmm(c_csr, a_csc, b_csr) addmm(c_csc, a_csr, b_csr) addmm(c_csc, a_csr, b_csc) addmm(c_csc, a_csc, b_csc) addmm(c_csc, a_csc, b_csr) ``` Currently, the operations for sparse matrices on CPU are available through MKL only. The non-existence of MKL on `aarch64` causes the unavailability of these ops on any machines with ARM based CPUs, including Apple Silicon, AWS Graviton and NVIDIA Grace. This PR addresses this issue by using Eigen as a backend for the above ops. This is a re-factored version of my previous PR #101814. The main difference with the old one, this does not enable Eigen by default. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155357 Approved by: https://github.com/pearu, https://github.com/eqy	2025-08-20 15:44:54 +00:00
Nikita Shulga	a06ec54d40	[MPS] Add API to query GPU core count (#160414 ) Using good old IOKit to get `gpu-core-count` property from device implementing `AGXAccelerator` service Expose this one as `torch.backend.mps.get_core_count()` and make it accessible via `MpsInterface` to the inductor Test Plan: Run `python3 -c "import torch;print(torch.backends.mps.get_name(), torch.backends.mps.get_core_count())"` and compare it to `system_profiler SPDisplaysDataType\|head -n10` ``` % python3 -c "import torch;print(torch.backends.mps.get_name(), torch.backends.mps.get_core_count())" Apple M1 Pro 16 % system_profiler SPDisplaysDataType\|head -n10 Graphics/Displays: Apple M1 Pro: Chipset Model: Apple M1 Pro Type: GPU Bus: Built-In Total Number of Cores: 16 Vendor: Apple (0x106b) Metal Support: Metal 3 ``` This would significantly improve occupancy for torch.compile generated kernels Pull Request resolved: https://github.com/pytorch/pytorch/pull/160414 Approved by: https://github.com/dcci	2025-08-14 00:05:17 +00:00
PaliC	1b99c1859c	[BE] Make PyObjectSlot use a global PyInterpreter and remove (#158427 ) This PR is a bit more involved but effectively works to drastically simplify PyObjectSlot and PyInterpreter. 1) For PyObjectSlot we now use a global pyinterpreter since there only is one. From here we change all of the call sites to rely on this assumption. 2) We also remove the "tags" of the PyInterpreter by deprecating `PyInterpreterStatus`. For the reviewer, sadly it seems like `functorch/csrc/dim/dim.cpp` needed to get linted, so there is an unreadable amount of changes there. Fortunately, the only actual change in the file is as follows which just removes `getPyInterpreter()` from the `check_pyobj` call. ``` mpy::handle handle_from_tensor(Arena& A, TensorRef t) { - // fast case: tensor is live in python - std::optional<PyObject> mb_obj = - t->unsafeGetTensorImpl()->pyobj_slot()->check_pyobj(getPyInterpreter(), /ignore_hermetic_tls=/false); - if (mb_obj.has_value() && !t->unsafeGetTensorImpl()->pyobj_slot()->owns_pyobj()) { - return mb_obj; - } - return A.autorelease(mpy::object::checked_steal(THPVariable_Wrap(t))); -} -} + // fast case: tensor is live in python + std::optional<PyObject> mb_obj = + t->unsafeGetTensorImpl()->pyobj_slot()->check_pyobj( + /ignore_hermetic_tls=/false); + if (mb_obj.has_value() && + !t->unsafeGetTensorImpl()->pyobj_slot()->owns_pyobj()) { + return mb_obj; + } + return A.autorelease(mpy::object::checked_steal(THPVariable_Wrap(t))); +} ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/158427 Approved by: https://github.com/albanD	2025-07-30 17:29:43 +00:00
Zhengxu Chen	8460131087	[nativert] Add OSS version of ModelRunner (#159268 ) Summary: Implement a ModelRunner from scratch with the minimum features for OSS only Test Plan: test_export -r NativeRT Rollback Plan: Differential Revision: D78979812 Pull Request resolved: https://github.com/pytorch/pytorch/pull/159268 Approved by: https://github.com/dolpm	2025-07-29 21:08:14 +00:00
Jeff Daily	9b29166f57	[ROCm] add flag torch.backends.miopen.immediate (#158951 ) The MIOpen integration has changed over the years. In the past, the MIOpen default for benchmark was True and if it were set to False it would use MIOpen Immediate Mode. But with #145294 the MIOpen benchmark default changed to False and to activate immediate mode you would set the deterministic flag to True. This has proved too restrictive because benchmark and deterministic flags are independent from immediate mode. Thus, immediate mode needs its own flag. Though MIOpen still masquerades behind torch.backends.cudnn and its flags, it seemed inappropriate to add an miopen-exclusive flag to the set of cudnn flags. This PR adds the first miopen-only flag to control its immediate mode. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158951 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-07-25 04:01:51 +00:00
PyTorch MergeBot	15a50dcf1c	Revert "[BE] Make PyObjectSlot use a global PyInterpreter and remove (#158427 )" This reverts commit `eb73650723`. Reverted https://github.com/pytorch/pytorch/pull/158427 on behalf of https://github.com/ZainRizvi due to Reverting this as part of reverting the stack for https://github.com/pytorch/pytorch/pull/158288 ([comment](https://github.com/pytorch/pytorch/pull/158427#issuecomment-3099815367))	2025-07-21 23:14:57 +00:00
Yukio Siraichi	a10f15718d	[DLPack] Add support for missing keyword-arguments. (#150218 ) This PR introduces the rest of the keyword-arguments added in DLPack version 2023.12: `dl_device` and `copy`. In summary, we handle these arguments in the C++ implementation of `to_dlpack(...)` at _torch/csrc/Module.cpp_, by calling the `maybeCopyTensor` function at _aten/src/ATen/DLConvertor.cpp_. It also introduces the following changes: - Add a new Python API `torchDeviceToDLDevice()`, which is simply a refactoring of the `getDLDevice()` function at _aten/src/ATen/DLConvertor.cpp_. - Add both keyword-arguments to the `from_dlpack()` function at _torch/utils/dlpack.py_ and to the `Tensor.__dlpack__()` dunder method. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150218 Approved by: https://github.com/albanD ghstack dependencies: #150216, #150217	2025-07-20 00:46:20 +00:00
PaliC	eb73650723	[BE] Make PyObjectSlot use a global PyInterpreter and remove (#158427 ) This PR is a bit more involved but effectively works to drastically simplify PyObjectSlot and PyInterpreter. 1) For PyObjectSlot we now use a global pyinterpreter since there only is one. From here we change all of the call sites to rely on this assumption. 2) We also remove the "tags" of the PyInterpreter by deprecating `PyInterpreterStatus`. For the reviewer, sadly it seems like `functorch/csrc/dim/dim.cpp` needed to get linted, so there is an unreadable amount of changes there. Fortunately, the only actual change in the file is as follows which just removes `getPyInterpreter()` from the `check_pyobj` call. ``` mpy::handle handle_from_tensor(Arena& A, TensorRef t) { - // fast case: tensor is live in python - std::optional<PyObject> mb_obj = - t->unsafeGetTensorImpl()->pyobj_slot()->check_pyobj(getPyInterpreter(), /ignore_hermetic_tls=/false); - if (mb_obj.has_value() && !t->unsafeGetTensorImpl()->pyobj_slot()->owns_pyobj()) { - return mb_obj; - } - return A.autorelease(mpy::object::checked_steal(THPVariable_Wrap(t))); -} -} + // fast case: tensor is live in python + std::optional<PyObject> mb_obj = + t->unsafeGetTensorImpl()->pyobj_slot()->check_pyobj( + /ignore_hermetic_tls=/false); + if (mb_obj.has_value() && + !t->unsafeGetTensorImpl()->pyobj_slot()->owns_pyobj()) { + return mb_obj; + } + return A.autorelease(mpy::object::checked_steal(THPVariable_Wrap(t))); +} ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/158427 Approved by: https://github.com/albanD	2025-07-18 05:23:00 +00:00
Yu, Guangye	1b58e7adab	fix storage use_count (#157694 ) # Motivation https://github.com/pytorch/pytorch/pull/155451 decoupled `torch._C._storage_Use_Count` from CUDA and introduced a corresponding unit test: `815545f2dd/test/test_torch.py (L257-L262)` However, this test fails when PyTorch is built with debug assertions enabled. @clee2000 disabled this UT in https://github.com/pytorch/pytorch/pull/156731. The root cause is that `_cdata` is obtained from an `intrusive_ptr`, not a `weak_intrusive_ptr`. As a result, calling `c10::weak_intrusive_ptr::use_count` on it triggers the internal assertion: `815545f2dd/c10/util/intrusive_ptr.h (L912-L917)` For example: ```python a = torch.randn(10, device=device) # refcount=1, weakcount=1 prev_cf = torch._C._storage_Use_Count(a.untyped_storage()._cdata) # violate the assertation ``` This violates the expected invariant inside `weak_intrusive_ptr::use_count`, which assumes the pointer was originally constructed from a valid `weak_intrusive_ptr`. Actually, `storage_impl` is obtained from an `intrusive_ptr`. `815545f2dd/torch/csrc/Module.cpp (L2105-L2109)` # Solution Use `c10::intrusive_ptr::use_count` instead. Pull Request resolved: https://github.com/pytorch/pytorch/pull/157694 Approved by: https://github.com/albanD	2025-07-08 05:53:12 +00:00
Yukio Siraichi	b54eac2a5e	Upgrade to DLPack 1.0. (#145000 ) This PR makes the necessary changes in order to upgrade PyTorch DLPack support to version 1.0. In summary, we add support for the following: - Support both `DLManagedTensor` and `DLManagedTensorVersioned` when producing and consuming DLPack capsules - New parameter for `__dlpack__` method: `max_version` - Version checks: - Fallback to old implementation if no `max_version` or if version lower than 1.0 - Check that the to-be-consumed capsule is of version up to 1.X In order to accommodate these new specifications, this PR adds the following main changes: - `torch._C._to_dlpack_versioned` Python API (Module.cpp): new Python API for creating a versioned DLPack capsule (called by `__dlpack__` method) - `DLPackTraits<T>` class (DLConvertor.h): select the correct traits (e.g. capsule name, conversion functions) depending on which DLPack tensor class is being used - `toDLPackImpl<T>` function (DLConvertor.cpp): populates the common fields of both classes - `fromDLPackImpl<T>` function (DLConvertor.cpp): constructs a tensor from a DLPAck capsule - `fillVersion<T>` function (DLConvertor.cpp): populates the version field for `DLManagedTensorVersioned` (no-op for `DLManagedTensor`) - `tensor_fromDLPackImpl<T>` function (tensor_new.cpp): outer function for constructing a tensor out of a DLPack capsule that also marks the capsule as used Pull Request resolved: https://github.com/pytorch/pytorch/pull/145000 Approved by: https://github.com/albanD	2025-06-30 16:58:06 +00:00
haozhe.zhu	53e0b9c393	refine fp32 precision api (#125888 ) Based on the [conversation](https://github.com/pytorch/pytorch/issues/121791), we plan to drop the "highest, high, medium" to represent fp32 internal computation data types . Instead, we will directly use the algorithm to represent it. ### Design Choice: Directly use algorithms name like "TF32", "BF16". #### Pros - The names are more informative. 'tf32' is more informative than a simple "high". - Easier to extend new algorithm like `tf32x3` #### Cons - "HIGHEST, HIGH, MEDIUM" indicated the relative precision between different algorithms. However, we can have more documents to discuss them. ### We provide a layered structure for backends/operators. ('f32' is short for 'fp32_precision') ![image](https://github.com/user-attachments/assets/f89143e5-d6a1-4865-9351-9a50439f5067) ### We provide 3 fp32 compute precision can be set: - "ieee": Not allowed to use any other internal computation data types . - "tf32": Allowed to use tf32 as internal computation data types. - "bf16": Allowed to use bf16 as internal computation data types. - "none": Precision's are not set. Can be override by its father node. ### Overriding Precision Settings Child node can be override by its father node if it is set to default. For current default settings: ``` backend = generic, op = all, precision setting = none backend = cuda, op = all, precision setting = none backend = cuda, op = conv, precision setting = tf32 backend = cuda, op = rnn, precision setting = tf32 backend = cuda, op = matmul, precision setting = none backend = matmul, op = all, precision setting = none backend = matmul, op = conv, precision setting = none backend = matmul, op = rnn, precision setting = none backend = matmul, op = matmul, precision setting = none ``` - If the user set `torch.backends.mkldnn.fp32_precision="bf16"`, his child nodes `torch.backends.mkldnn.matmul.fp32_precision` / `torch.backends.mkldnn.conv.fp32_precision` / `torch.backends.mkldnn.rnn.fp32_precision` will also be override to "bf16". - If the user set `torch.backends.fp32_precision="bf16"`, `torch.backends.mkldnn.fp32_precision` and his child nodes will also we override to "bf16". ### Backward Compatible Since new API allow user to have more fine-grained control. There will be some conflict. For example, previous `torch.backends.cudnn.allow_tf32` are not enough to represent the status for `torch.backends.cudnn.rnn.fp32_precision="ieee"` and `torch.backends.cudnn.conv.fp32_precision="tf32"`. Therefore, our goal for backward compatible is - If the user only uses previous APIs, it will work as previous expectations. - If the user use new API to change the status to an un-representable status for old API, and try to access the status by old API. We will raise Runtime Error and point the document for user. ### Test Plan ``` python test/test_cuda.py -k test_fp32_precision_with_tf32 python test/test_cuda.py -k test_fp32_precision_with_float32_matmul_precision python test/test_cuda.py -k test_invalid_status_for_legacy_api python test/test_mkldnn.py -k test_mlkdnn_get_set python test/test_mkldnn.py -k test_generic_precision python test/test_mkldnn.py -k test_invalid python test/test_mkldnn.py -k test_default_use_parent ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/125888 Approved by: https://github.com/jgong5, https://github.com/albanD Co-authored-by: Jiang, Yanbing <yanbing.jiang@intel.com>	2025-06-26 10:32:20 +00:00
Aaron Orenstein	568ca89bac	Add a crash handler to async compile subprocesses (#155068 ) When the async compile subprocesses crash in C++ they tend to just silently die instead of leaving any kind of trace. This installs a crash handler so that if they SEGV, ILL, or ABRT they'll attempt to output a backtrace instead. While in there I also cleaned up the CLANGTIDY warnings coming from Module.cpp. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155068 Approved by: https://github.com/masnesral	2025-06-25 03:27:28 +00:00
Yuanyuan Chen	07bb097698	Fix clang-tidy bugprone* warnings (#148529 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/148529 Approved by: https://github.com/ezyang	2025-06-23 23:09:56 +00:00
PyTorch MergeBot	f5e1b24945	Revert "Enable Leak Sanitizer (#154584 )" This reverts commit `c79c7bbe61`. Reverted https://github.com/pytorch/pytorch/pull/154584 on behalf of https://github.com/cyyever due to Need to suppress more output ([comment](https://github.com/pytorch/pytorch/pull/154584#issuecomment-2995792265))	2025-06-23 10:08:40 +00:00
cyy	c79c7bbe61	Enable Leak Sanitizer (#154584 ) It enables Leak Sanitizer and also provides a suppression file. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154584 Approved by: https://github.com/ezyang	2025-06-23 05:20:27 +00:00
PyTorch MergeBot	b4442f42a9	Revert "Upgrade to DLPack 1.0. (#145000 )" This reverts commit `6e185c5312`. Reverted https://github.com/pytorch/pytorch/pull/145000 on behalf of https://github.com/atalman due to failing internal tests ([comment](https://github.com/pytorch/pytorch/pull/145000#issuecomment-2992055400))	2025-06-20 15:32:47 +00:00
Yukio Siraichi	6e185c5312	Upgrade to DLPack 1.0. (#145000 ) This PR makes the necessary changes in order to upgrade PyTorch DLPack support to version 1.0. In summary, we add support for the following: - Support both `DLManagedTensor` and `DLManagedTensorVersioned` when producing and consuming DLPack capsules - New parameter for `__dlpack__` method: `max_version` - Version checks: - Fallback to old implementation if no `max_version` or if version lower than 1.0 - Check that the to-be-consumed capsule is of version up to 1.X In order to accommodate these new specifications, this PR adds the following main changes: - `torch._C._to_dlpack_versioned` Python API (Module.cpp): new Python API for creating a versioned DLPack capsule (called by `__dlpack__` method) - `DLPackTraits<T>` class (DLConvertor.h): select the correct traits (e.g. capsule name, conversion functions) depending on which DLPack tensor class is being used - `toDLPackImpl<T>` function (DLConvertor.cpp): populates the common fields of both classes - `fromDLPackImpl<T>` function (DLConvertor.cpp): constructs a tensor from a DLPAck capsule - `fillVersion<T>` function (DLConvertor.cpp): populates the version field for `DLManagedTensorVersioned` (no-op for `DLManagedTensor`) - `tensor_fromDLPackImpl<T>` function (tensor_new.cpp): outer function for constructing a tensor out of a DLPack capsule that also marks the capsule as used Pull Request resolved: https://github.com/pytorch/pytorch/pull/145000 Approved by: https://github.com/albanD	2025-06-19 16:27:42 +00:00
Yu, Guangye	d84efde3f0	Move _storage_Use_Count to be gerneric (#155451 ) # Motivation `torch._C._storage_Use_Count` should be a generic API that is not aware of device type. It is also used in `337cd7c53d/torchtune/training/_activation_offloading.py (L323)` to do some memory optimization. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155451 Approved by: https://github.com/albanD	2025-06-12 01:39:04 +00:00
PyTorch MergeBot	fdc387ec7c	Revert "refine fp32 precision api (#125888 )" This reverts commit `4c11b26158`. Reverted https://github.com/pytorch/pytorch/pull/125888 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to cause some failures on ROCm ([comment](https://github.com/pytorch/pytorch/pull/125888#issuecomment-2869274791))	2025-05-11 00:35:46 +00:00

1 2 3 4 5 ...

639 Commits