Commit Graph

2119 Commits

Author SHA1 Message Date
Natalia Gimelshein
0443398f5b Implement deterministic scan (#140887)
Fixes #89492
Uses block-wise cub primitives.
On large inputs this implementation is approximately 25% slower than the device-wide cub implementation, so it is enabled only in the cases where cub would have been used (floating-point inputs, and cumsum calls that are effectively 1d).
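
As a hedged illustration (not part of the PR itself), the deterministic path is reached through the ordinary deterministic-algorithms switch; something like the following should exercise it on a CUDA machine:

```python
import torch

# Sketch, assuming a CUDA device: with deterministic algorithms enabled, a
# floating-point, effectively-1d cumsum takes the deterministic block-wise
# scan path instead of the faster, nondeterministic device-wide cub path.
torch.use_deterministic_algorithms(True)
x = torch.randn(1_000_000, device="cuda")
out = torch.cumsum(x, dim=0)  # bitwise-identical across runs
```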

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140887
Approved by: https://github.com/ezyang, https://github.com/kurtamohler
2024-11-19 23:43:26 +00:00
PyTorch MergeBot
7f10351ba0 Revert "Implement deterministic scan (#140887)"
This reverts commit 4eed438a42.

Reverted https://github.com/pytorch/pytorch/pull/140887 on behalf of https://github.com/ngimel due to breaks with 11.4 ([comment](https://github.com/pytorch/pytorch/pull/140887#issuecomment-2486409438))
2024-11-19 18:08:48 +00:00
Prachi Gupta
7156d0824d [ROCm] Fix largeIndexBlockSize (#139087)
On ROCm, hipification converts std::min to ::min, but ::min does not return the right result. This impacts the index_add_ operation on large tensors: we end up picking the large value instead of the max supported block size (128), which leads to the GPU accessing memory out of bounds.

While we wait for ::min to be fixed, we can compare with the < operator instead of relying on ::min.

Example Code w/ failure:
```
D = 6144
hidden_states = torch.zeros([16384, D], device="cuda:0", dtype=torch.bfloat16)
index = torch.randint(0, 16384, (1, 32, 16384), device="cuda:0", dtype=torch.int64)
output = torch.empty([1, 32, 16384, D], device="cuda:0", dtype=torch.bfloat16)
hidden_states.index_add_(0, index.view(-1), output.view(-1, D))
```

```
Traceback (most recent call last):
RuntimeError: HIP error: invalid configuration argument
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139087
Approved by: https://github.com/jeffdaily, https://github.com/pruthvistony
2024-11-19 06:29:48 +00:00
Natalia Gimelshein
4eed438a42 Implement deterministic scan (#140887)
Fixes #89492
Uses block-wise cub primitives.
On large inputs this implementation is approximately 25% slower than the device-wide cub implementation, so it is enabled only in the cases where cub would have been used (floating-point inputs, and cumsum calls that are effectively 1d).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140887
Approved by: https://github.com/ezyang, https://github.com/kurtamohler
2024-11-18 20:56:14 +00:00
Nikita Shulga
0f739b8f66 [Codemod] skipIfMps->skipIfMPS (#140562)
`MPS` is an acronym that stands for Metal Performance Shaders.
This also aligns more closely with `skipCUDAIf` rather than `skipCudaIf`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140562
Approved by: https://github.com/ZainRizvi, https://github.com/r-barnes
2024-11-13 19:45:08 +00:00
zeshengzong
cb71bcc542 Replace clone.detach with detach.clone (#140264)
Fixes #64532

As stated in the issue, replace `clone.detach` with `detach.clone`.
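
For context, a minimal sketch of the preferred ordering; detaching first avoids recording an extra `clone` node on the autograd graph:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x.clone().detach()  # old pattern: the clone is first recorded on the autograd graph
z = x.detach().clone()  # preferred: detach first, then clone outside the graph
assert not y.requires_grad and not z.requires_grad
```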

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140264
Approved by: https://github.com/soulitzer
2024-11-13 07:01:02 +00:00
zeshengzong
5ef33e40b3 Add size param check of unfold (#139965)
Fixes #76617

Changes:

- Add a check of the input `size` value and give a user-friendly error message
- Fix the `FIXME: move to shape ops test suite` note in the test file

Before
```python
import torch
x = torch.arange(1., 8)
x.unfold(0, -1, 1)

Traceback (most recent call last):
  File "/home/zong/code/unfold.py", line 12, in <module>
    x.unfold(0, -1, 1)
RuntimeError: Storage size calculation overflowed with sizes=[9, -1] and strides=[1, 1]

```

After
```python
import torch
x = torch.arange(1., 8)
x.unfold(0, -1, 1)

Traceback (most recent call last):
  File "/home/zong/code/pytorch/../unfold.py", line 12, in <module>
    x.unfold(0, -1, 1)
RuntimeError: size is -1 but must be >= 0
```

Test Result:
```bash
pytest test/test_shape_ops.py
```

![image](https://github.com/user-attachments/assets/d7bcef62-04e6-4187-9c8f-bc5220ff6c33)

```bash
$ lintrunner
```

![image](https://github.com/user-attachments/assets/6b48d095-5c8a-4e75-9957-dc22d39a73bb)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139965
Approved by: https://github.com/ezyang
2024-11-09 17:12:53 +00:00
PyTorch MergeBot
067d2a089d Revert "Expose Storage _use_count API in Python (#139426)"
This reverts commit e31136d07b.

Reverted https://github.com/pytorch/pytorch/pull/139426 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing some inductor job in trunk ([comment](https://github.com/pytorch/pytorch/pull/139426#issuecomment-2453269063))
2024-11-03 02:40:45 +00:00
Jane Xu
e31136d07b Expose Storage _use_count API in Python (#139426)
It would be nice to replace the torch._C._storage_Use_Count call in https://github.com/pytorch/torchtune/pull/1936, at least without needing to know about _cdata in OSS code.

Initially keeping it private, as Tensor._use_count is also private.

Chosen in favor of https://github.com/pytorch/pytorch/pull/139109 for solving the same problem, as exposing an existing API is better than adding a new one (and this enables a more robust fix).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139426
Approved by: https://github.com/soulitzer
2024-11-02 23:36:31 +00:00
PyTorch MergeBot
49bfbed2eb Revert "Add deterministic path for CUDA cumsum (#136224)"
This reverts commit 383eba5229.

Reverted https://github.com/pytorch/pytorch/pull/136224 on behalf of https://github.com/ezyang due to larger memory usage apparently not acceptable ([comment](https://github.com/pytorch/pytorch/pull/136224#issuecomment-2447382819))
2024-10-30 14:43:15 +00:00
Mikayla Gawarecki
37149d032c Fix .to(cpu) for Storage (#138011)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138011
Approved by: https://github.com/albanD
2024-10-23 01:31:48 +00:00
William Wen
92fdea8a39 remove skips due to https://github.com/pytorch/torchdynamo/issues/1991 (#138133)
Closes https://github.com/pytorch/pytorch/issues/93479. A number of other dynamo-wrapped tests also exhibit "torch.* returned non-Tensor output unimplemented", making the issue seem less relevant. Some tests are marked as xfail because they fail for other reasons.

If these tests are indeed important, we should create a new issue to track them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138133
Approved by: https://github.com/ezyang
2024-10-17 17:42:46 +00:00
Haifeng Jin
2db3f85894 Fixes NumPy 2 test failures in test_torch.py (#137740)
Related to #107302

The breakages are caused by backward incompatibility between NumPy 1 and NumPy 2.
This PR fixes all the corresponding test failures in `test_torch.py`.

1. The dtype of the return value of `np.percentile` when passed a `torch.float32` tensor.
NumPy 1: returns `np.float64`.
NumPy 2: returns `np.float32`.
Solution: enforce `np.float64` with `.astype(np.float64)`.

2. The type of the return value of `np.gradient()` when it returns multiple arrays.
NumPy 1: a list of arrays.
NumPy 2: a tuple of arrays.
Solution: cast the tuple to a list. (See the sketch below.)
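
A minimal sketch of the two workarounds (hypothetical inputs, assuming NumPy 2 behavior):

```python
import numpy as np
import torch

t = torch.rand(10, dtype=torch.float32)

# 1. Under NumPy 2, np.percentile returns float32 for float32 input; pin it to float64.
p = np.percentile(t.numpy(), 50).astype(np.float64)

# 2. Under NumPy 2, np.gradient returns a tuple when it yields multiple arrays; cast to list.
grads = list(np.gradient(np.ones((4, 4))))
```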
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137740
Approved by: https://github.com/ezyang
2024-10-12 02:40:17 +00:00
Kurt Mohler
383eba5229 Add deterministic path for CUDA cumsum (#136224)
Change `cumsum` to call its decomposition when `use_deterministic_algorithms(True)` and input is CUDA.

Fixes #89492
Fixes #75240

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136224
Approved by: https://github.com/ezyang, https://github.com/justinchuby, https://github.com/eqy
2024-10-10 06:59:08 +00:00
Miles
f301f6544b Fix bug where fill_empty_deterministic_ did not support complex half (#137488)
Fixes #133157
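
A hedged sketch of the fixed behavior (assuming a CUDA build; `fill_empty_deterministic_` itself is internal, so this goes through the public deterministic-mode switch):

```python
import torch

# Sketch: with deterministic algorithms on (and the default
# torch.utils.deterministic.fill_uninitialized_memory == True), newly
# allocated tensors are filled with a known value; per the issue above,
# this previously failed for complex half (torch.chalf).
torch.use_deterministic_algorithms(True)
x = torch.empty(4, dtype=torch.chalf, device="cuda")
```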

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137488
Approved by: https://github.com/ezyang
2024-10-10 01:21:32 +00:00
vasiliy
a063a82c8b [redo] Fp8 support for item() with cuda, index_select, and fill_ cpu (#137341)
Summary:

Redo of https://github.com/pytorch/pytorch/pull/128780, easier to copy-paste.

Test Plan: CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137341
Approved by: https://github.com/eqy
2024-10-07 00:58:51 +00:00
albanD
c0deec120f Fix resurrection logic to trigger early enough (#137267)
Fixes https://github.com/pytorch/pytorch/issues/136358

The bug here is that the Tensor object is actually 2 classes: `Tensor` from `_tensor.py` and `TensorBase` from c++.

Before this PR, they have the following GC methods:
Tensor:
- tp_clear: subtype_clear
- tp_traverse: THPVariable_subclass_traverse
- tp_dealloc: THPVariable_subclass_dealloc

TensorBase:
- tp_clear: THPVariable_clear
- tp_traverse: THPFunction_traverse (a stub that only raises an error)
- tp_dealloc: object_dealloc

The problem is that when clear is called on the Tensor, subtype_clear clears the things owned by the "Tensor" type, in particular its `__dict__` attribute, before delegating to the TensorBase clear, where we detect that resurrection needs to happen and skip it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137267
Approved by: https://github.com/ezyang, https://github.com/kshitij12345
2024-10-04 21:13:54 +00:00
PyTorch MergeBot
e9d2765ec8 Revert "Add deterministic path for CUDA cumsum (#136224)"
This reverts commit d1bb8e828f.

Reverted https://github.com/pytorch/pytorch/pull/136224 on behalf of https://github.com/atalman due to Break internal CI ([comment](https://github.com/pytorch/pytorch/pull/136224#issuecomment-2379214226))
2024-09-27 12:54:47 +00:00
Kurt Mohler
d1bb8e828f Add deterministic path for CUDA cumsum (#136224)
Change `cumsum` to call its decomposition when `use_deterministic_algorithms(True)` and input is CUDA.

Fixes #89492

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136224
Approved by: https://github.com/ezyang, https://github.com/justinchuby
2024-09-26 04:52:05 +00:00
PyTorch MergeBot
e3b89ca124 Revert "Add deterministic path for CUDA cumsum (#136224)"
This reverts commit b1a02bf708.

Reverted https://github.com/pytorch/pytorch/pull/136224 on behalf of https://github.com/ezyang due to Failing internal CI ([comment](https://github.com/pytorch/pytorch/pull/136224#issuecomment-2374201626))
2024-09-25 14:11:01 +00:00
Kurt Mohler
b1a02bf708 Add deterministic path for CUDA cumsum (#136224)
Change `cumsum` to call its decomposition when `use_deterministic_algorithms(True)` and input is CUDA.

Fixes #89492

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136224
Approved by: https://github.com/ezyang, https://github.com/justinchuby
2024-09-24 21:34:43 +00:00
Robert Hardwick
eac04fe72a Increase bf32 tolerances for some cdist tests in test_torch (#136315)
- Set the new tolerances to ~N * eps(bfloat16), which should be a comfortable upper bound, where N is the inner dimension of the matmul.

Logic behind choice of tolerance:

The maximum error of summing a series of N numbers in bfloat16 should be `N * epsilon(bfloat16)`; I confirmed by sampling different random seeds that the maximum observed error doesn't exceed this value and is usually much less.
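
A quick sketch of that bound (N is a hypothetical inner dimension here):

```python
import torch

# Worst-case error of summing N bfloat16 values ~= N * eps(bfloat16).
N = 1000
tol = N * torch.finfo(torch.bfloat16).eps  # eps(bfloat16) == 2**-7 == 0.0078125
```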

Fixes test failures on Arm® Neoverse™ V1 (not raised as an issue, since this hardware type is not currently covered by the linux-aarch64 workflow).

```
Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/test/test_torch.py", line 2478, in test_cdist_large
    self.assertEqual(expected, actual)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3885, in assertEqual
    raise error_metas.pop()[0].to_error(
AssertionError: Tensor-likes are not close!

Mismatched elements: 134118 / 1000000 (13.4%)
Greatest absolute difference: 0.03829193115234375 at index (291, 726) (up to 0.005 allowed)
Greatest relative difference: 0.03519868478178978 at index (291, 726) (up to 1.3e-06 allowed)
```

@malfet @jondea

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136315
Approved by: https://github.com/albanD
2024-09-24 16:10:11 +00:00
PyTorch MergeBot
fd182b90a7 Revert "Add deterministic path for CUDA cumsum (#136224)"
This reverts commit d45b0151e5.

Reverted https://github.com/pytorch/pytorch/pull/136224 on behalf of https://github.com/atalman due to Failing internal CI ([comment](https://github.com/pytorch/pytorch/pull/136224#issuecomment-2369244135))
2024-09-23 19:57:13 +00:00
Kurt Mohler
d45b0151e5 Add deterministic path for CUDA cumsum (#136224)
Change `cumsum` to call its decomposition when `use_deterministic_algorithms(True)` and input is CUDA.

Fixes #89492

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136224
Approved by: https://github.com/ezyang, https://github.com/justinchuby
2024-09-20 02:41:56 +00:00
Nikita Shulga
cd5452aace [CUDA] is_bf16_supported() should not crash if there are no GPUs (#132313)
`False` is the correct answer on a system that does not have any CUDA GPUs.
- Added regression test to TestTorch.
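
A minimal sketch of the fixed behavior:

```python
import torch

# On a machine with no CUDA GPUs this now returns False instead of crashing.
print(torch.cuda.is_bf16_supported())
```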

Fixes https://github.com/pytorch/pytorch/issues/132303

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132313
Approved by: https://github.com/eqy, https://github.com/syed-ahmed
2024-08-02 02:50:43 +00:00
Songhao Jia
a141334c88 mitigate wrong tensor.dim_order() (#131366)
Summary:
There are some issues with dim order creation. T194410923 has a detailed illustration.

One of the reasons is that the `is_contiguous` function may sometimes produce an ambiguous memory-format result (some tensors can be both channels_last and contiguous at the same time), and dim order generation relies on the underlying memory-format result as a shortcut.

To mitigate the issue, we make dim order use the shortcut if and only if the tensor belongs to a single memory format; otherwise, we still recalculate it.
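
A hedged sketch of the ambiguity: a tensor with a size-1 channels dimension satisfies both contiguity checks at once, so the memory-format shortcut alone cannot decide:

```python
import torch

# Size-1 dims are skipped by the stride checks, so this tensor is reported
# as both contiguous and channels_last-contiguous.
x = torch.randn(2, 1, 3, 4)
assert x.is_contiguous()
assert x.is_contiguous(memory_format=torch.channels_last)
print(x.dim_order())  # after the fix, recomputed rather than taken from the shortcut
```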

Test Plan: CI

Differential Revision: D60056793

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131366
Approved by: https://github.com/ezyang
2024-07-30 21:58:15 +00:00
Aaron Orenstein
5a0068cc69 [BE] mypy: disallow untyped decorators (#131428)
Untyped decorators strip the types from the functions they decorate, so even if the underlying function is fully typed, its callers get no benefit from the type annotations.
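
For context, a minimal sketch of the difference, using `ParamSpec`-style typing:

```python
from typing import Callable, ParamSpec, TypeVar

P = ParamSpec("P")
R = TypeVar("R")

# A typed decorator preserves the wrapped function's signature for callers.
# An untyped `def deco(fn):` would erase it, so callers lose all checking.
def typed_deco(fn: Callable[P, R]) -> Callable[P, R]:
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
        return fn(*args, **kwargs)
    return wrapper
```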

Step 1 - Enable the error and override in all the offending files.

#131429

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131428
Approved by: https://github.com/justinchuby, https://github.com/oulgen
2024-07-23 21:50:55 +00:00
wizzniu
8963623494 Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept (#126376)
This PR re-implements pin memory, aiming to get rid of the optional `device` argument and make all related APIs device-agnostic. We add two new abstract APIs in [AcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/detail/AcceleratorHooksInterface.h#L12) and redefine pin memory as: "Pin memory is always pinned for the current accelerator device". In detail, pin_memory/is_pinned use [getAcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/Context.h#L61) to get an appropriate device and invoke the corresponding overridden interfaces, instead of using BackendSelect and then dispatching to CUDA or other specific backends' implementations.

Note: new backends that want to implement and use pin memory just need to inherit from AcceleratorHooksInterface and override the `isPinnedPtr` and `getPinnedMemoryAllocator` methods.

Additional context: to avoid BC-breaking changes, this PR preserves the `device` arg of the related APIs and throws a deprecation warning if the `device` arg is passed. Another PR will be submitted on top of this one to update all PT callers (`Tensor.is_pinned()`, `Tensor.pin_memory()`, ...) not to pass this arg. In the future, the `device` arg will actually be removed.
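
A minimal sketch of the resulting user-facing behavior (assuming a machine with an accelerator present):

```python
import torch

t = torch.randn(1024)
p = t.pin_memory()  # pins for the current accelerator; no device arg needed
assert p.is_pinned()
# t.pin_memory(device="cuda")  # still accepted, but now emits a deprecation warning
```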

Relates #124908
Relates #14560
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126376
Approved by: https://github.com/albanD
2024-07-23 01:44:15 +00:00
PyTorch MergeBot
726b9268d2 Revert "Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept (#126376)"
This reverts commit c986aeea2d.

Reverted https://github.com/pytorch/pytorch/pull/126376 on behalf of https://github.com/atalman due to Failing internal builds ([comment](https://github.com/pytorch/pytorch/pull/126376#issuecomment-2237496633))
2024-07-18 20:25:20 +00:00
wizzniu
c986aeea2d Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept (#126376)
This PR re-implements pin memory, aiming to get rid of the optional `device` argument and make all related APIs device-agnostic. We add two new abstract APIs in [AcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/detail/AcceleratorHooksInterface.h#L12) and redefine pin memory as: "Pin memory is always pinned for the current accelerator device". In detail, pin_memory/is_pinned use [getAcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/Context.h#L61) to get an appropriate device and invoke the corresponding overridden interfaces, instead of using BackendSelect and then dispatching to CUDA or other specific backends' implementations.

Note: new backends that want to implement and use pin memory just need to inherit from AcceleratorHooksInterface and override the `isPinnedPtr` and `getPinnedMemoryAllocator` methods.

Additional context: to avoid BC-breaking changes, this PR preserves the `device` arg of the related APIs and throws a deprecation warning if the `device` arg is passed. Another PR will be submitted on top of this one to update all PT callers (`Tensor.is_pinned()`, `Tensor.pin_memory()`, ...) not to pass this arg. In the future, the `device` arg will actually be removed.

Relates #124908
Relates #14560
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126376
Approved by: https://github.com/albanD
2024-07-18 11:54:14 +00:00
eellison
9ab8d47f9d Constant folding for dynamic shape node (#129686)
Extend constant folding to dynamic-shape nodes, supporting only pointwise ops and some other restricted ops.

We support dynamic shapes by limiting constant folding to ops that are guaranteed to produce uniform values (full, pointwise ops, and views) and running these operators on tensors of shape 1. This also eliminates any potential memory overhead from constant folding.
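
A hedged sketch of the trick (`fold_uniform` is a hypothetical helper, not inductor's actual API):

```python
import torch

# A chain of pointwise ops applied to a uniform tensor stays uniform, so its
# folded value can be computed once on a shape-[1] probe instead of
# materializing a tensor of the (dynamic) full shape.
def fold_uniform(pointwise_chain, fill_value, dtype=torch.float32):
    probe = torch.full((1,), fill_value, dtype=dtype)
    return pointwise_chain(probe).item()

val = fold_uniform(lambda t: (t * 2 + 1).relu(), -3.0)  # -> 0.0
# At runtime the folded node is replaced by torch.full(dynamic_shape, val).
```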

Taken over from https://github.com/pytorch/pytorch/pull/128937

joint work with @imzhuhl

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129686
Approved by: https://github.com/Chillee
ghstack dependencies: #130367
2024-07-16 00:17:11 +00:00
PyTorch MergeBot
9df4bc6a0d Revert "Constant folding for dynamic shape node (#129686)"
This reverts commit b7d287fbec.

Reverted https://github.com/pytorch/pytorch/pull/129686 on behalf of https://github.com/atalman due to Failing internally.  Test: https://github.com/pytorch/ao/blob/main/test/prototype/mx_formats/test_mx_linear.py ([comment](https://github.com/pytorch/pytorch/pull/129686#issuecomment-2228755295))
2024-07-15 15:19:24 +00:00
eellison
b7d287fbec Constant folding for dynamic shape node (#129686)
Extend constant folding to dynamic-shape nodes, supporting only pointwise ops and some other restricted ops.

We support dynamic shapes by limiting constant folding to ops that are guaranteed to produce uniform values (full, pointwise ops, and views) and running these operators on tensors of shape 1. This also eliminates any potential memory overhead from constant folding.

Taken over from https://github.com/pytorch/pytorch/pull/128937

joint work with @imzhuhl

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129686
Approved by: https://github.com/Chillee
ghstack dependencies: #130367
2024-07-12 03:44:29 +00:00
cyy
cb5e9183c6 [Caffe2] [2/N] Remove Caffe2 from tests (#128911)
Follows #128675

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128911
Approved by: https://github.com/titaiwangms, https://github.com/r-barnes
2024-06-19 00:05:50 +00:00
Aaron Orenstein
dcfa7702c3 Flip default value for mypy disallow_untyped_defs [1/11] (#127838)
See #127836 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127838
Approved by: https://github.com/oulgen
2024-06-08 18:16:33 +00:00
Aaron Gokaslan
12c4a2c297 [BE]: Apply PLR1736 fixes (unnecessary index lookup) (#127716)
Applies the PLR1736 preview rule with some more autofixes to cut down on unnecessary accesses. Added a noqa, since that test is actually testing the dunder method.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127716
Approved by: https://github.com/ezyang
2024-06-03 17:22:13 +00:00
Xuehai Pan
67ef2683d9 [BE] wrap deprecated function/class with typing_extensions.deprecated (#127689)
Use `typing_extensions.deprecated` for the deprecation annotation where possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` calls where the category is missing.

Note that only warnings whose messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.
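
A minimal sketch of the annotation in question:

```python
from typing_extensions import deprecated

@deprecated("old_fn is deprecated, use new_fn instead", category=FutureWarning)
def old_fn() -> None:
    ...

old_fn()  # emits FutureWarning: old_fn is deprecated, use new_fn instead
```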

Resolves #126888

- #126888

This PR is split from PR #126898.

- #126898

------

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127689
Approved by: https://github.com/Skylion007
2024-06-02 12:30:43 +00:00
PyTorch MergeBot
033e733021 Revert "[BE] wrap deprecated function/class with typing_extensions.deprecated (#126898)"
This reverts commit 749a132fb0.

Reverted https://github.com/pytorch/pytorch/pull/126898 on behalf of https://github.com/fbgheith due to switching typing-extensions=4.3.0 to 4.9.0 causes internal failure ([comment](https://github.com/pytorch/pytorch/pull/126898#issuecomment-2142884456))
2024-05-31 19:47:24 +00:00
Mikayla Gawarecki
cd06ae0cb8 Relax use_count constraints for swap_tensors when AccumulateGrad holds a reference (#127313)
### Before this PR:
`torch.utils.swap_tensors(a, b)` required the `use_count` of `a` and `b` to be 1

```python
a = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, 4)
out = a * 2
out.sum().backward()
# Calling swap_tensors here would fail due to the reference held by AccumulateGrad node, which is not cleaned up after backward
# torch.utils.swap_tensors(a, b)
del out
# Calling swap_tensors here would pass
torch.utils.swap_tensors(a, b)
```
### After this PR:
`torch.utils.swap_tensors(a, b)` requires the `use_count` of `a` and `b` to be 1 or 2 IF the second reference is held by `AccumulateGrad`

A pre-hook will be registered on the `AccumulateGrad` node so that it will fail if it is called (i.e. if user attempts to backward through the graph).

```python
a = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, 4)
out = a * 2
out.sum().backward()
# Calling swap_tensors here is ok
torch.utils.swap_tensors(a, b)
# If we ever backward to the AccumulateGrad node it will error that it was poisoned by swap_tensors
```

### Application to `nn.Module`

This issue is especially pertinent in the context of `nn.Module`, where parameters have `AccumulateGrad` nodes initialized after forward. Specifically, this is intended to address https://github.com/pytorch/pytorch/pull/126814#issuecomment-2127777866. Previously, this would fail at the `m.cpu()` call; we want users to be able to do something like the following, and instead raise an error only if they ever attempt to backward through the poisoned `AccumulateGrad` node

```python
import torch
import torch.nn as nn
m = nn.Linear(3, 5)
inp = torch.randn(2, 3)
out = m(inp)
out.sum().backward()
m.cpu()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127313
Approved by: https://github.com/soulitzer
2024-05-30 07:06:55 +00:00
Xuehai Pan
749a132fb0 [BE] wrap deprecated function/class with typing_extensions.deprecated (#126898)
Use `typing_extensions.deprecated` for the deprecation annotation where possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` calls where the category is missing.

Note that only warnings whose messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.

UPDATE: Use `FutureWarning` instead of `DeprecationWarning`.

Resolves #126888

- #126888

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126898
Approved by: https://github.com/albanD
2024-05-29 12:09:27 +00:00
William Wen
5359af0c7e [dynamo] wrap GraphModule exceptions in dynamo-wrapped tests (#126341)
Better approach to https://github.com/pytorch/pytorch/pull/126197 to catch issues like https://github.com/pytorch/pytorch/issues/125568.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126341
Approved by: https://github.com/anijain2305, https://github.com/jansel
2024-05-29 05:18:04 +00:00
Yu, Guangye
c09205a057 Deprecate device-specific GradScaler autocast API (#126527)
# Motivation

## for `torch.amp.GradScaler`,
- `torch.cpu.amp.GradScaler(args...)` is completely equivalent to `torch.amp.GradScaler("cpu", args...)`.
- `torch.cuda.amp.GradScaler(args...)` is completely equivalent to `torch.amp.GradScaler("cuda", args...)`.

So, we intend to deprecate them and **strongly recommend** that developers use `torch.amp.GradScaler`.

## for `custom_fwd` and `custom_bwd`,
this is a good solution that lets a custom function run, with or without autocast effects, even in an autocast-enabled region, and it can be shared by other backends, like CPU and XPU.
So we generalize them to be device-agnostic, put them in `torch/amp/autocast_mode.py`, and re-expose them as `torch.amp.custom_fwd` and `torch.amp.custom_bwd`. Meanwhile, we deprecate `torch.cuda.amp.custom_fwd` and `torch.cuda.amp.custom_bwd`.
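
A hedged sketch of the device-agnostic replacements (assuming a CUDA device; names per the description above):

```python
import torch

scaler = torch.amp.GradScaler("cuda")  # replaces torch.cuda.amp.GradScaler()

class Double(torch.autograd.Function):
    @staticmethod
    @torch.amp.custom_fwd(device_type="cuda", cast_inputs=torch.float32)
    def forward(ctx, x):
        return x * 2

    @staticmethod
    @torch.amp.custom_bwd(device_type="cuda")
    def backward(ctx, grad):
        return grad * 2
```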

# Additional Context
Add a UT to cover the deprecation warning.
No need for more UTs to cover the functionality of `torch.amp.custom_f/bwd`; the existing UTs that previously covered `torch.cuda.amp.custom_f/bwd` cover them.
To facilitate review, we separate these code changes into two PRs: the first covers `torch.amp.GradScaler`, and the follow-up covers `custom_fwd` and `custom_bwd`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126527
Approved by: https://github.com/jgong5, https://github.com/gujinghui, https://github.com/janeyx99, https://github.com/EikanWang
2024-05-25 06:41:34 +00:00
Matthew Hoffman
86ad101370 Enable pickling torch._C.Generator (#126271)
Fixes #71398

Add `__reduce__` and `__setstate__` methods for `torch._C.Generator`.

`__reduce__` returns a tuple of 3 values:

1. `torch.Generator` itself.
2. A one-element tuple containing the `torch.device` to create the `Generator` with, since this cannot be changed after the object is created.
3. The state, a three-element tuple: the initial seed, the offset (or `None` if a CPU `Generator`), and the RNG state tensor.

`__setstate__` calls `manual_seed`, `set_offset` (if not `None`), and `set_state` on each respective element of the state.
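
A minimal round-trip sketch of the new behavior:

```python
import pickle
import torch

g = torch.Generator()  # CPU generator, so the offset element of the state is None
g.manual_seed(42)
g2 = pickle.loads(pickle.dumps(g))
assert g2.initial_seed() == g.initial_seed()
assert torch.equal(g2.get_state(), g.get_state())
```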

Added a test demonstrating successful round-trip serialization with CPU and CUDA `Generator`s.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126271
Approved by: https://github.com/ezyang
2024-05-22 14:38:47 +00:00
Nikita Shulga
a379ed6e98 Fix SobolEngine default dtype handling (#126781)
- Change the default dtype argument to `None` and fetch its value via a `torch.get_default_dtype()` call if not defined
- Fix a bug in the first-draw handling logic that would ignore dtype in favor of the default one due to type promotion (see the sketch below)
- Add regression tests
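
A minimal sketch of the fixed behavior:

```python
import torch

torch.set_default_dtype(torch.float64)
engine = torch.quasirandom.SobolEngine(dimension=2)
sample = engine.draw(3)  # the first draw previously ignored the default dtype
assert sample.dtype == torch.float64
```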

Fixes https://github.com/pytorch/pytorch/issues/126478
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126781
Approved by: https://github.com/Skylion007, https://github.com/albanD
2024-05-22 01:55:48 +00:00
Tarunbir Gambhir
ad67553c5c Updated test_torch.py to use new OptimizerInfo infrastructure (#125538)
Fixes #123451 (only addresses test_torch.py cases)

This PR addresses the specific task of updating `test_grad_scaling_autocast` and `test_params_invalidated_with_grads_invalidated_between_unscale_and_step` in `test/test_torch.py` to use the new OptimizerInfo infrastructure.

I have combined tests that call `_grad_scaling_autocast_test` into one called `test_grad_scaling_autocast` and used `_get_optim_inputs_including_global_cliquey_kwargs` to avoid hard-coded configurations.

```
$ lintrunner test/test_cuda.py
ok No lint issues.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125538
Approved by: https://github.com/janeyx99
2024-05-18 15:42:45 +00:00
albanD
19a9de114a Forbid subclassing _TensorBase directly (#125558)
As per title.
This ensures that the methods defined in _tensor.py actually exist in all the places where we assume they do.

BC-breaking: this change is BC-breaking, as users can no longer subclass this private class.
You should replace any use of _TensorBase with Tensor.
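
A minimal sketch of the migration (the private base name is taken from the title above):

```python
import torch

class MyTensor(torch.Tensor):  # supported: subclass the public Tensor class
    pass

# class Bad(torch._C._TensorBase):  # now raises: subclassing the private base is forbidden
#     pass
```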
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125558
Approved by: https://github.com/ezyang
2024-05-08 20:29:29 +00:00
PyTorch MergeBot
a0e2f62edd Revert "Include support for the scatter gather cuda kernels to allow for comp… (#124809)"
This reverts commit 9e24c263f9.

Reverted https://github.com/pytorch/pytorch/pull/124809 on behalf of https://github.com/kit1980 due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/124809#issuecomment-2091751002))
2024-05-02 21:36:18 +00:00
Danial Javady
9e24c263f9 Include support for the scatter gather cuda kernels to allow for comp… (#124809)
Fixes #121965

This PR adds support for complex numbers in the scatter/gather-related kernels. For brevity, I will only include `complex<float>` for now, as `complex<double>`, for example, will be more complicated.

C++ unit tests are currently passing, alongside the tests in `test_scatter_gather_ops.py`. Python test suites also seem to be passing.
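
A hedged sketch of what the change enables (assuming a CUDA device):

```python
import torch

src = torch.randn(3, dtype=torch.complex64, device="cuda")
idx = torch.tensor([0, 2, 4], device="cuda")
out = torch.zeros(5, dtype=torch.complex64, device="cuda")
out.scatter_(0, idx, src)  # previously unsupported for complex dtypes on CUDA
gathered = torch.gather(out, 0, idx)
```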

Please keep the following in mind:
1) I think this is my first time using PyTorch.
2) This is my first contribution to PyTorch.

Environment:
3080 & WSL 2. `nvcc` is at 12.4.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124809
Approved by: https://github.com/mikaylagawarecki
2024-05-01 23:58:35 +00:00
PyTorch MergeBot
4d410155b2 Revert "Include support for the scatter gather cuda kernels to allow for comp… (#124809)"
This reverts commit e09f98c705.

Reverted https://github.com/pytorch/pytorch/pull/124809 on behalf of https://github.com/clee2000 due to windows build failure is real, https://github.com/pytorch/pytorch/actions/runs/8910674030/job/24470387612#step:11:11236 is the correct failure line, ignore the statement saying build passed, batch is errorcodes arent propagating again ([comment](https://github.com/pytorch/pytorch/pull/124809#issuecomment-2088680371))
2024-05-01 16:02:02 +00:00
Danial Javady
e09f98c705 Include support for the scatter gather cuda kernels to allow for comp… (#124809)
Fixes #121965

This PR adds support for complex numbers in the scatter/gather-related kernels. For brevity, I will only include `complex<float>` for now, as `complex<double>`, for example, will be more complicated.

C++ unit tests are currently passing, alongside the tests in `test_scatter_gather_ops.py`. Python test suites also seem to be passing.

Please keep the following in mind:
1) I think this is my first time using PyTorch.
2) This is my first contribution to PyTorch.

Environment:
3080 & WSL 2. `nvcc` is at 12.4.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124809
Approved by: https://github.com/eqy, https://github.com/mikaylagawarecki
2024-05-01 14:31:31 +00:00