pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Natalia Gimelshein	0443398f5b	Implement deterministic scan (#140887 ) Fixes #89492 Uses block-wise cub primitives On large inputs, this implementation is approximately 25% slower than device cub implementation, so it's turned on only in cases where cub would have been (floating point inputs, cumsum that is effectively 1d) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140887 Approved by: https://github.com/ezyang, https://github.com/kurtamohler	2024-11-19 23:43:26 +00:00
PyTorch MergeBot	7f10351ba0	Revert "Implement deterministic scan (#140887 )" This reverts commit `4eed438a42`. Reverted https://github.com/pytorch/pytorch/pull/140887 on behalf of https://github.com/ngimel due to breaks with 11.4 ([comment](https://github.com/pytorch/pytorch/pull/140887#issuecomment-2486409438))	2024-11-19 18:08:48 +00:00
Prachi Gupta	7156d0824d	[ROCm] Fix largeIndexBlockSize (#139087 ) On ROCm, hipification converts std::min to ::min, but ::min is not returning the right result. This impacts index_add_ operation on a large tensor, we end up picking the large values instead of max supported block size (128). This leads to GPU accessing memory out of bounds. While we wait for ::min to be fixed, we can use < operator to compare instead of relying on ::min. Example Code w/ failure: ``` D=6144 hidden_states = torch.zeros([16384, 6144], device="cuda:0", dtype=torch.bfloat16) index = torch.randint(0, 16384, (1, 32, 16384), device="cuda:0", dtype=torch.int64) output = torch.empty([1, 32, 16384, 6144], device="cuda:0", dtype=torch.bfloat16) hidden_states.index_add_(0, index.view(-1), output.view(-1, D)) ``` ``` Traceback (most recent call last): RuntimeError: HIP error: invalid configuration argument ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/139087 Approved by: https://github.com/jeffdaily, https://github.com/pruthvistony	2024-11-19 06:29:48 +00:00
Natalia Gimelshein	4eed438a42	Implement deterministic scan (#140887 ) Fixes #89492 Uses block-wise cub primitives On large inputs, this implementation is approximately 25% slower than device cub implementation, so it's turned on only in cases where cub would have been (floating point inputs, cumsum that is effectively 1d) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140887 Approved by: https://github.com/ezyang, https://github.com/kurtamohler	2024-11-18 20:56:14 +00:00
Nikita Shulga	0f739b8f66	[Codemod] `skipIfMps`->`skipIfMPS` (#140562 ) As `MPS` is an acronym that stands for Metal Performance Shaders Also to closer align with `skipCUDAIf` not `skipCudaIf` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140562 Approved by: https://github.com/ZainRizvi, https://github.com/r-barnes	2024-11-13 19:45:08 +00:00
zeshengzong	cb71bcc542	Replace clone.detach with detach.clone (#140264 ) Fixes #64532 As state in issue, replace `clone.detach` by `detach.clone` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140264 Approved by: https://github.com/soulitzer	2024-11-13 07:01:02 +00:00
zeshengzong	5ef33e40b3	Add size param check of unfold (#139965 ) Fixes #76617 Changes: - Add check of input `size` value, give user friendly hint message - fix `FIXME: move to shape ops test suite` in test file Before ```python import torch x = torch.arange(1., 8) x.unfold(0, -1, 1) Traceback (most recent call last): File "/home/zong/code/unfold.py", line 12, in <module> x.unfold(0, -1, 1) RuntimeError: Storage size calculation overflowed with sizes=[9, -1] and strides=[1, 1] ``` After ```python import torch x = torch.arange(1., 8) x.unfold(0, -1, 1) Traceback (most recent call last): File "/home/zong/code/pytorch/../unfold.py", line 12, in <module> x.unfold(0, -1, 1) RuntimeError: size is -1 but must be >= 0 ``` Test Result: ```bash pytest test/test_shape_ops.py ``` ![image](https://github.com/user-attachments/assets/d7bcef62-04e6-4187-9c8f-bc5220ff6c33) ```bash $ lintrunner ``` ![image](https://github.com/user-attachments/assets/6b48d095-5c8a-4e75-9957-dc22d39a73bb) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139965 Approved by: https://github.com/ezyang	2024-11-09 17:12:53 +00:00
PyTorch MergeBot	067d2a089d	Revert "Expose Storage _use_count API in Python (#139426 )" This reverts commit `e31136d07b`. Reverted https://github.com/pytorch/pytorch/pull/139426 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing some inductor job in trunk ([comment](https://github.com/pytorch/pytorch/pull/139426#issuecomment-2453269063))	2024-11-03 02:40:45 +00:00
Jane Xu	e31136d07b	Expose Storage _use_count API in Python (#139426 ) Would be nice to replace the torch._C._storage_Use_Count call in https://github.com/pytorch/torchtune/pull/1936, at least without needing to know about _cdata in OSS code. Initially keeping it private as Tensor._use_count is also private. In favor over https://github.com/pytorch/pytorch/pull/139109 in solving the same problem, as exposing an existing API is better than adding a new one (and this enables a more robust fix) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139426 Approved by: https://github.com/soulitzer	2024-11-02 23:36:31 +00:00
PyTorch MergeBot	49bfbed2eb	Revert "Add deterministic path for CUDA `cumsum` (#136224 )" This reverts commit `383eba5229`. Reverted https://github.com/pytorch/pytorch/pull/136224 on behalf of https://github.com/ezyang due to larger memory usage apparently not acceptable ([comment](https://github.com/pytorch/pytorch/pull/136224#issuecomment-2447382819))	2024-10-30 14:43:15 +00:00
Mikayla Gawarecki	37149d032c	Fix .to(cpu) for Storage (#138011 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138011 Approved by: https://github.com/albanD	2024-10-23 01:31:48 +00:00
William Wen	92fdea8a39	remove skips due to https://github.com/pytorch/torchdynamo/issues/1991 (#138133 ) Closes https://github.com/pytorch/pytorch/issues/93479. A bunch of other dynamo-wrapped tests also exhibit "torch.* returned non-Tensor output unimplemented" making the issue seem less relevant to me. Some tests are marked as xfail as they fail for other reasons. If these tests are indeed important, we should create a new issue to track them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138133 Approved by: https://github.com/ezyang	2024-10-17 17:42:46 +00:00
Haifeng Jin	2db3f85894	Fixes NumPy 2 test failures in test_torch.py (#137740 ) Related to #107302 The breakages are caused by backward incompatibility between NumPy 1 and NumPy 2. This PR fixes all the corresponding test failures in `test_torch.py`. 1. The dtype of the return value `np.percentile` when passed a `torch.float32` tensor. NumPy 1: Return value of `np.float64`. NumPy 2: Return value of `np.float32`. Solution: Enforce it with `.astype(np.float64)`. 2. The type of `np.gradient()` when returning multiple arrays. NumPy1: A list of arrays. NumPy2: A tuple of arrays. Solution: Cast the tuple to a list. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137740 Approved by: https://github.com/ezyang	2024-10-12 02:40:17 +00:00
Kurt Mohler	383eba5229	Add deterministic path for CUDA `cumsum` (#136224 ) Change `cumsum` to call its decomposition when `use_deterministic_algorithms(True)` and input is CUDA. Fixes #89492 Fixes #75240 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136224 Approved by: https://github.com/ezyang, https://github.com/justinchuby, https://github.com/eqy	2024-10-10 06:59:08 +00:00
Miles	f301f6544b	fix bug for fill_empty_deterministic_ not support complex half (#137488 ) Fixes #133157 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137488 Approved by: https://github.com/ezyang	2024-10-10 01:21:32 +00:00
vasiliy	a063a82c8b	[redo] Fp8 support for item() with cuda, index_select, and fill_ cpu (#137341 ) Summary: Redo of https://github.com/pytorch/pytorch/pull/128780, easier to copy-paste. Test Plan: CI Reviewers: Subscribers: Tasks: Tags: Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/137341 Approved by: https://github.com/eqy	2024-10-07 00:58:51 +00:00
albanD	c0deec120f	Fix resurrection logic to trigger early enough (#137267 ) Fixes https://github.com/pytorch/pytorch/issues/136358 The bug here is that the Tensor object is actually 2 classes: `Tensor` from `_tensor.py` and `TensorBase` from c++. Before this PR, they have the following gc methods: Tensor: - tp_clear subtype_clear - tp_traverse THPVariable_subclass_traverse - tp_dealloc THPVariable_subclass_dealloc TensorBase: - tp_clear THPVariable_clear - tp_traverse THPFunction_traverse (fake function that is just an error) - tp_dealloc object_dealloc The problem is that when clear is called on the Tensor, subtype_clear is going to clear the things owned by the "Tensor" type, in particular, its `__dict__` attribute, before delegating to the TensorBase clear where we detect that resurrection needs to happen and skip it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137267 Approved by: https://github.com/ezyang, https://github.com/kshitij12345	2024-10-04 21:13:54 +00:00
PyTorch MergeBot	e9d2765ec8	Revert "Add deterministic path for CUDA `cumsum` (#136224 )" This reverts commit `d1bb8e828f`. Reverted https://github.com/pytorch/pytorch/pull/136224 on behalf of https://github.com/atalman due to Break internal CI ([comment](https://github.com/pytorch/pytorch/pull/136224#issuecomment-2379214226))	2024-09-27 12:54:47 +00:00
Kurt Mohler	d1bb8e828f	Add deterministic path for CUDA `cumsum` (#136224 ) Change `cumsum` to call its decomposition when `use_deterministic_algorithms(True)` and input is CUDA. Fixes #89492 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136224 Approved by: https://github.com/ezyang, https://github.com/justinchuby	2024-09-26 04:52:05 +00:00
PyTorch MergeBot	e3b89ca124	Revert "Add deterministic path for CUDA `cumsum` (#136224 )" This reverts commit `b1a02bf708`. Reverted https://github.com/pytorch/pytorch/pull/136224 on behalf of https://github.com/ezyang due to Failing internall CI ([comment](https://github.com/pytorch/pytorch/pull/136224#issuecomment-2374201626))	2024-09-25 14:11:01 +00:00
Kurt Mohler	b1a02bf708	Add deterministic path for CUDA `cumsum` (#136224 ) Change `cumsum` to call its decomposition when `use_deterministic_algorithms(True)` and input is CUDA. Fixes #89492 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136224 Approved by: https://github.com/ezyang, https://github.com/justinchuby	2024-09-24 21:34:43 +00:00
Robert Hardwick	eac04fe72a	Increase bf32 tolerances for some cdist tests in test_torch (#136315 ) - Set the new tolerances ~= N * eps(bfloat16) which should be a comfortable upper bound for tolerances. Where N is the inner dimension of the matmal. Logic behind choice of tolerance: The maximum error of the summation of a series of N numbers in bfloat16 should be `N * epsilon(bfloat16)` , I confirmed by sampling different random seeds that the maximum observed error doesn't exceed this value and is usually much less. Fixes test failures on Arm® Neoverse™ V1 ( not raised as an issue as this hardware type is not currently covered by linux-aarch64 workflow ) ``` Traceback (most recent call last): File "/var/lib/jenkins/workspace/test/test_torch.py", line 2478, in test_cdist_large self.assertEqual(expected, actual) File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3885, in assertEqual raise error_metas.pop()[0].to_error( AssertionError: Tensor-likes are not close! Mismatched elements: 134118 / 1000000 (13.4%) Greatest absolute difference: 0.03829193115234375 at index (291, 726) (up to 0.005 allowed) Greatest relative difference: 0.03519868478178978 at index (291, 726) (up to 1.3e-06 allowed) ``` @malfet @jondea Pull Request resolved: https://github.com/pytorch/pytorch/pull/136315 Approved by: https://github.com/albanD	2024-09-24 16:10:11 +00:00
PyTorch MergeBot	fd182b90a7	Revert "Add deterministic path for CUDA `cumsum` (#136224 )" This reverts commit `d45b0151e5`. Reverted https://github.com/pytorch/pytorch/pull/136224 on behalf of https://github.com/atalman due to Failing internall CI ([comment](https://github.com/pytorch/pytorch/pull/136224#issuecomment-2369244135))	2024-09-23 19:57:13 +00:00
Kurt Mohler	d45b0151e5	Add deterministic path for CUDA `cumsum` (#136224 ) Change `cumsum` to call its decomposition when `use_deterministic_algorithms(True)` and input is CUDA. Fixes #89492 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136224 Approved by: https://github.com/ezyang, https://github.com/justinchuby	2024-09-20 02:41:56 +00:00
Nikita Shulga	cd5452aace	[CUDA] `is_bf16_supported()` should not crash if there are no GPUs (#132313 ) `False` is the good answer on a system that does not have any CUDA GPUs. - Added regression test to TestTorch. Fixes https://github.com/pytorch/pytorch/issues/132303 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132313 Approved by: https://github.com/eqy, https://github.com/syed-ahmed	2024-08-02 02:50:43 +00:00
Songhao Jia	a141334c88	migitate wrong tensor.dim_order() (#131366 ) Summary: there're some issues for dim order creation. T194410923 has detail illustration. One of the reason is sometimes `is_contiguous` function may generate ambiguous memory format result (some tensors might be both channels_last and contiguous at the same time), and dim order generation rely on memory format result underneath for shortcut. To mitigate the issue, we make dim order utilizing the short cut if and only if the tensor is only belongs to single memory format. Otherwise, we will still recalculate it. Test Plan: CI Differential Revision: D60056793 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131366 Approved by: https://github.com/ezyang	2024-07-30 21:58:15 +00:00
Aaron Orenstein	5a0068cc69	[BE] mypy: disallow untyped decorators (#131428 ) Untyped decorators strip the types from their decorated function so even if the underlying function is fully typed then callers to it don't get any benefit from type annotations. Step 1 - Enable the error and override in all the offending files. #131429 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131428 Approved by: https://github.com/justinchuby, https://github.com/oulgen	2024-07-23 21:50:55 +00:00
wizzniu	8963623494	Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept (#126376 ) This PR re-implements pin memory aiming to get rid of the optional `device` argument and makes all related APIs to be device-agnostic. We add two new abstract APIs in [AcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/detail/AcceleratorHooksInterface.h#L12) and redefine pin memory as: "Pin memory is always pinned for the current accelerator device". In detail, it uses [getAcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/Context.h#L61) in pin_memory/is_pinned to get an appropriate device and invoke the corresponding overridden interfaces, instead of using BackendSelect and then dispatching to CUDA or other specific backends' implement methods. Note: For new backends who want to implement and use pin memory, just inherit AcceleratorHooksInterface and overwrite the `isPinnedPtr` and `getPinnedMemoryAllocator` methods. Additional context: To avoid BC-breaking, this PR just preserves the `device` arg of related APIs and would throw a deprecation warning if `device` arg is passed. Another PR will be submitted to update all PT callers (`Tensor.is_pinned()`, `Tensor.pin_memory()`...) not to pass this arg based on this PR. In future, `device` arg will be actually removed. Relates #124908 Relates #14560 Pull Request resolved: https://github.com/pytorch/pytorch/pull/126376 Approved by: https://github.com/albanD	2024-07-23 01:44:15 +00:00
PyTorch MergeBot	726b9268d2	Revert "Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept (#126376 )" This reverts commit `c986aeea2d`. Reverted https://github.com/pytorch/pytorch/pull/126376 on behalf of https://github.com/atalman due to Failing internal builds ([comment](https://github.com/pytorch/pytorch/pull/126376#issuecomment-2237496633))	2024-07-18 20:25:20 +00:00
wizzniu	c986aeea2d	Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept (#126376 ) This PR re-implements pin memory aiming to get rid of the optional `device` argument and makes all related APIs to be device-agnostic. We add two new abstract APIs in [AcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/detail/AcceleratorHooksInterface.h#L12) and redefine pin memory as: "Pin memory is always pinned for the current accelerator device". In detail, it uses [getAcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/Context.h#L61) in pin_memory/is_pinned to get an appropriate device and invoke the corresponding overridden interfaces, instead of using BackendSelect and then dispatching to CUDA or other specific backends' implement methods. Note: For new backends who want to implement and use pin memory, just inherit AcceleratorHooksInterface and overwrite the `isPinnedPtr` and `getPinnedMemoryAllocator` methods. Additional context: To avoid BC-breaking, this PR just preserves the `device` arg of related APIs and would throw a deprecation warning if `device` arg is passed. Another PR will be submitted to update all PT callers (`Tensor.is_pinned()`, `Tensor.pin_memory()`...) not to pass this arg based on this PR. In future, `device` arg will be actually removed. Relates #124908 Relates #14560 Pull Request resolved: https://github.com/pytorch/pytorch/pull/126376 Approved by: https://github.com/albanD	2024-07-18 11:54:14 +00:00
eellison	9ab8d47f9d	Constant folding for dynamic shape node (#129686 ) Extend constant folding for dynamic shape node, only support pointwise op and some restricted ops We support dynamic shapes by limiting constant folding of ops that are guaranteed to have uniform values (full, pointwise ops, and views) and running these operators with tensors of shape 1. This also eliminates the possibility of memory overhead of constant folding. Taken over from https://github.com/pytorch/pytorch/pull/128937 joint work with @imzhuhl Pull Request resolved: https://github.com/pytorch/pytorch/pull/129686 Approved by: https://github.com/Chillee ghstack dependencies: #130367	2024-07-16 00:17:11 +00:00
PyTorch MergeBot	9df4bc6a0d	Revert "Constant folding for dynamic shape node (#129686 )" This reverts commit `b7d287fbec`. Reverted https://github.com/pytorch/pytorch/pull/129686 on behalf of https://github.com/atalman due to Failing internally. Test: https://github.com/pytorch/ao/blob/main/test/prototype/mx_formats/test_mx_linear.py ([comment](https://github.com/pytorch/pytorch/pull/129686#issuecomment-2228755295))	2024-07-15 15:19:24 +00:00
eellison	b7d287fbec	Constant folding for dynamic shape node (#129686 ) Extend constant folding for dynamic shape node, only support pointwise op and some restricted ops We support dynamic shapes by limiting constant folding of ops that are guaranteed to have uniform values (full, pointwise ops, and views) and running these operators with tensors of shape 1. This also eliminates the possibility of memory overhead of constant folding. Taken over from https://github.com/pytorch/pytorch/pull/128937 joint work with @imzhuhl Pull Request resolved: https://github.com/pytorch/pytorch/pull/129686 Approved by: https://github.com/Chillee ghstack dependencies: #130367	2024-07-12 03:44:29 +00:00
cyy	cb5e9183c6	[Caffe2] [2/N] Remove Caffe2 from tests (#128911 ) Follows #128675 Pull Request resolved: https://github.com/pytorch/pytorch/pull/128911 Approved by: https://github.com/titaiwangms, https://github.com/r-barnes	2024-06-19 00:05:50 +00:00
Aaron Orenstein	dcfa7702c3	Flip default value for mypy disallow_untyped_defs [1/11] (#127838 ) See #127836 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127838 Approved by: https://github.com/oulgen	2024-06-08 18:16:33 +00:00
Aaron Gokaslan	12c4a2c297	[BE]: Apply PLR1736 fixes (unnecessary index lookup) (#127716 ) Applies the PLR1736 preview rule with some more autofixes to cut down on unnecessary accesses. Added a noqa since that test actually testing the dunder method. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127716 Approved by: https://github.com/ezyang	2024-06-03 17:22:13 +00:00
Xuehai Pan	67ef2683d9	[BE] wrap deprecated function/class with `typing_extensions.deprecated` (#127689 ) Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing. Note that only warnings that their messages contain `[Dd]eprecat(ed\|ion)` are updated in this PR. Resolves #126888 - #126888 This PR is split from PR #126898. - #126898 ------ Pull Request resolved: https://github.com/pytorch/pytorch/pull/127689 Approved by: https://github.com/Skylion007	2024-06-02 12:30:43 +00:00
PyTorch MergeBot	033e733021	Revert "[BE] wrap deprecated function/class with `typing_extensions.deprecated` (#126898 )" This reverts commit `749a132fb0`. Reverted https://github.com/pytorch/pytorch/pull/126898 on behalf of https://github.com/fbgheith due to switching typing-extensions=4.3.0 to 4.9.0 causes internal failure ([comment](https://github.com/pytorch/pytorch/pull/126898#issuecomment-2142884456))	2024-05-31 19:47:24 +00:00
Mikayla Gawarecki	cd06ae0cb8	Relax use_count constraints for swap_tensors when AccumulateGrad holds a reference (#127313 ) ### Before this PR: `torch.utils.swap_tensors(a, b)` required the `use_count` of `a` and `b` to be 1 ```python a = torch.randn(2, 3, requires_grad=True) b = torch.randn(2, 4) out = a * 2 out.sum().backward() # Calling swap_tensors here would fail due to the reference held by AccumulateGrad node, which is not cleaned up after backward # torch.utils.swap_tensors(a, b) del out # Calling swap_tensors here would pass torch.utils.swap_tensors(a, b) ``` ### After this PR: `torch.utils.swap_tensors(a, b)` requires the `use_count` of `a` and `b` to be 1 or 2 IF the second reference is held by `AccumulateGrad` A pre-hook will be registered on the `AccumulateGrad` node so that it will fail if it is called (i.e. if user attempts to backward through the graph). ```python a = torch.randn(2, 3, requires_grad=True) b = torch.randn(2, 4) out = a * 2 out.sum().backward() # Calling swap_tensors here is ok torch.utils.swap_tensors(a, b) # If we ever backward to the AccumulateGrad node it will error that it was poisoned by swap_tensors ``` ### Application to `nn.Module` This issue is especially pertinent in context of `nn.Module` where parameters will have `AccumulateGrad` nodes initialized after forward. Specifically, this is intended to address https://github.com/pytorch/pytorch/pull/126814#issuecomment-2127777866. Previously, this would fail at the `m.cpu()` but we want users to be able to do something like the following, and instead raise an error if the user ever attempts to backward through the poisoned `AccumulateGrad` node ```python import torch import torch.nn as nn m = nn.Linear(3, 5) inp = torch.randn(2, 3) out = m(inp) out.sum().backward() m.cpu() ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/127313 Approved by: https://github.com/soulitzer	2024-05-30 07:06:55 +00:00
Xuehai Pan	749a132fb0	[BE] wrap deprecated function/class with `typing_extensions.deprecated` (#126898 ) Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing. Note that only warnings that their messages contain `[Dd]eprecat(ed\|ion)` are updated in this PR. UPDATE: Use `FutureWarning` instead of `DeprecationWarning`. Resolves #126888 - #126888 Pull Request resolved: https://github.com/pytorch/pytorch/pull/126898 Approved by: https://github.com/albanD	2024-05-29 12:09:27 +00:00
William Wen	5359af0c7e	[dynamo] wrap GraphModule exceptions in dynamo-wrapped tests (#126341 ) Better approach to https://github.com/pytorch/pytorch/pull/126197 to catch issues like https://github.com/pytorch/pytorch/issues/125568. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126341 Approved by: https://github.com/anijain2305, https://github.com/jansel	2024-05-29 05:18:04 +00:00
Yu, Guangye	c09205a057	Deprecate device-specific GradScaler autocast API (#126527 ) # Motivation ## for `torch.amp.GradScaler`, - `torch.cpu.amp.GradScaler(args...)` is completely equivalent to `torch. amp.GradScaler("cpu", args...)`. - `torch.cuda.amp.GradScaler(args...)` is completely equivalent to `torch.amp.GradScaler("cuda", args...)`. So, we intend to depreate them and strongly recommend developer to use `torch.amp.GradScaler`. ## for `custom_fwd` and `custom_bwd`, this is a good solution to make the custom function run with or without effect even in an autocast-enabled region and can be shared by other backends, like CPU and XPU. So we generalize it to be device-agnostic and put them int `torch/amp/autocast_mode.py` and re-expose to `torch.amp.custom_fwd` and `torch.amp.custom_bwd`. Meanwhile, we deprecate `torch.cuda.amp.custom_fwd` and `torch.cuda.amp.custom_bwd`. # Additional Context Add UT to cover the deprecated warning. No need for more UTs to cover the functionality of `torch.amp.custom_f/bwd`, the existing UTs that previously covered the functionality of `torch.cuda.amp.custom_f/bwd` can cover them. To facilitate the review, we separate these code changes to two PRs. The first PR cover `torch.amp.GradScaler`. The follow-up covers `custom_fwd` and `custom_bwd`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126527 Approved by: https://github.com/jgong5, https://github.com/gujinghui, https://github.com/janeyx99, https://github.com/EikanWang	2024-05-25 06:41:34 +00:00
Matthew Hoffman	86ad101370	Enable pickling `torch._C.Generator` (#126271 ) Fixes #71398 Add `__reduce__` and `__setstate__` methods for `torch._C.Generator`. `__reduce__` returns a tuple of 3 values: 1. `torch.Generator` itself. 2. A one-element tuple containing the `torch.device` to create the `Generator` with, since this cannot be changed after the object is created. 3. The state, a three-element tuple: the initial seed, the offset (or `None` if a CPU `Generator`), and the RNG state tensor. `__setstate__` calls `manual_seed`, `set_offset` (if not `None`), and `set_state` on each respective element of the state. Added test demonstrating successful reserialization with cpu and cuda `Generator`s. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126271 Approved by: https://github.com/ezyang	2024-05-22 14:38:47 +00:00
Nikita Shulga	a379ed6e98	Fix SobolEngine default dtype handling (#126781 ) - Change default dtype argument to `None` and fetch it value via `torch.get_default_dtype()` call if not defined - Fix bug in first draw handling logic, that would ignore dtype in favor of default one due to type promotion - Add regression tests Fixes https://github.com/pytorch/pytorch/issues/126478 Pull Request resolved: https://github.com/pytorch/pytorch/pull/126781 Approved by: https://github.com/Skylion007, https://github.com/albanD	2024-05-22 01:55:48 +00:00
Tarunbir Gambhir	ad67553c5c	Updated test_torch.py to use new OptimizerInfo infrastructure (#125538 ) Fixes #123451 (only addresses test_torch.py cases) This PR solves the specific task to update `test_grad_scaling_autocast` and `test_params_invalidated_with_grads_invalidated_between_unscale_and_step` in `test/test_torch.py` to use the new OptimizerInfo infrastructure. I have combined tests that call `_grad_scaling_autocast_test` into one called `test_grad_scaling_autocast` and used `_get_optim_inputs_including_global_cliquey_kwargs` to avoid hard-coded configurations. ``` $ lintrunner test/test_cuda.py ok No lint issues. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/125538 Approved by: https://github.com/janeyx99	2024-05-18 15:42:45 +00:00
albanD	19a9de114a	Forbid subclassing _TensorBase directly (#125558 ) As per title. This ensures that all the places where we assume the method defined in _tensor.py do exist. BC-Breaking: This is bc-breaking as the user cannot subclass this private class anymore. You should replace any use of _TensorBase to Tensor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125558 Approved by: https://github.com/ezyang	2024-05-08 20:29:29 +00:00
PyTorch MergeBot	a0e2f62edd	Revert "Include support for the scatter gather cuda kernels to allow for comp… (#124809 )" This reverts commit `9e24c263f9`. Reverted https://github.com/pytorch/pytorch/pull/124809 on behalf of https://github.com/kit1980 due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/124809#issuecomment-2091751002))	2024-05-02 21:36:18 +00:00
Danial Javady	9e24c263f9	Include support for the scatter gather cuda kernels to allow for comp… (#124809 ) Fixes #121965 This PR hopes to add support complex numbers in the scatter/gather related kernels. For brevity, I will only include `complex<float>` for now as `complex<double>`, for example, will be more complicated. C++ unit tests are currently passing alongside tests in `test_scatter_gather_ops.py`. Python test suites also seem to be passing. Please keep the following in mind: 1) I think this is my first time using Pytorch. 2) This is my first contribution to Pytorch. Environment: 3080 & WSL 2. `nvcc` is at 12.4. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124809 Approved by: https://github.com/mikaylagawarecki	2024-05-01 23:58:35 +00:00
PyTorch MergeBot	4d410155b2	Revert "Include support for the scatter gather cuda kernels to allow for comp… (#124809 )" This reverts commit `e09f98c705`. Reverted https://github.com/pytorch/pytorch/pull/124809 on behalf of https://github.com/clee2000 due to windows build failure is real, https://github.com/pytorch/pytorch/actions/runs/8910674030/job/24470387612#step:11:11236 is the correct failure line, ignore the statement saying build passed, batch is errorcodes arent propagating again ([comment](https://github.com/pytorch/pytorch/pull/124809#issuecomment-2088680371))	2024-05-01 16:02:02 +00:00
Danial Javady	e09f98c705	Include support for the scatter gather cuda kernels to allow for comp… (#124809 ) Fixes #121965 This PR hopes to add support complex numbers in the scatter/gather related kernels. For brevity, I will only include `complex<float>` for now as `complex<double>`, for example, will be more complicated. C++ unit tests are currently passing alongside tests in `test_scatter_gather_ops.py`. Python test suites also seem to be passing. Please keep the following in mind: 1) I think this is my first time using Pytorch. 2) This is my first contribution to Pytorch. Environment: 3080 & WSL 2. `nvcc` is at 12.4. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124809 Approved by: https://github.com/eqy, https://github.com/mikaylagawarecki	2024-05-01 14:31:31 +00:00

1 2 3 4 5 ...

2119 Commits