Commit Graph

2185 Commits

Jeff Daily
c3b71d5499 [ROCm][CI] remove relaxed tolerance for tf32 tests (#166478)
Instead of relaxing tolerances for certain unit tests that exercise TF32 on MI300, skip the tests until hipblaslt accuracy is improved.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166478
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: Jagadish Krishnamoorthy <jagadish.krishnamoorthy@amd.com>
2025-10-31 16:15:42 +00:00
Yuanyuan Chen
fc8ac1216c [4/N] Remove unused loop variables in tests (#166690)
This PR removes unused loop variables in tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166690
Approved by: https://github.com/justinchuby, https://github.com/mlazos
2025-10-31 10:20:48 +00:00
Yuanyuan Chen
a3fe1825aa Fix incomplete torch.cdist tests (#166507)
The tests were incomplete because the `p` value was not actually used.
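
For context, a minimal sketch (mine, not from the PR) of what actually exercising `p` looks like:
```python
import torch

x = torch.randn(4, 3)
y = torch.randn(5, 3)
# Varying p covers more than just the default p=2 path.
for p in (0.0, 1.0, 2.0, float("inf")):
    torch.cdist(x, y, p=p)
```
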
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166507
Approved by: https://github.com/Skylion007
2025-10-29 17:11:07 +00:00
nick-kuhn
79a4a9c02e Fix race condition and make CUDA kthvalue deterministic (#165762)
The gatherKthValue kernel had a race condition where multiple threads could write to the same output location without synchronization when duplicate k-th values exist, resulting in non-deterministic output.

Changes:
- aten/src/ATen/native/cuda/Sorting.cu: Use atomicMin with shared memory to deterministically find minimum index. Add early termination and remove redundant inRange checks. (We have to cast the index to `int32_t`, but this is already assumed to fit earlier in the kernel.)
- aten/src/ATen/native/cuda/Sorting.cpp: Remove non-deterministic alert since kthvalue is now deterministic on CUDA.
- torch/__init__.py: Remove kthvalue from non-deterministic operations list and remove kthvalue example from use_deterministic_algorithms() docstring.
- test/test_torch.py: Remove test_nondeterministic_alert_kthvalue since kthvalue no longer raises alerts on CUDA.

Benefits:
- Deterministic: always returns minimum index when duplicates exist
- Potential performance improvement on large arrays with repetitions

Test Results:
- All existing PyTorch tests pass (test_kthvalue)
- Custom determinism tests confirm consistent results
- Custom CUDA vs CPU correctness validated across 50+ scenarios
- Custom performance benchmarks show improvements with no visible regressions

Addresses #165227
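
As an illustration (my own sketch, not from the PR), determinism here means ties always resolve to the smallest index:
```python
import torch

x = torch.tensor([3.0, 1.0, 1.0, 2.0], device="cuda")
# Indices 1 and 2 tie for the 1st smallest value.
values, indices = torch.kthvalue(x, k=1)
# With the deterministic kernel, indices is 1 (the minimum index) on
# every run, rather than arbitrarily 1 or 2.
```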

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165762
Approved by: https://github.com/ngimel, https://github.com/eqy
2025-10-25 00:45:57 +00:00
Yuanyuan Chen
0e083942cc Enable PLW0127 in ruff (#165851)
This PR enables `PLW0127` in ruff, which flags self-assignments of the form `var = var`.
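
For reference, a minimal example of the pattern this rule flags:
```python
x = 1
x = x  # PLW0127: self-assignment, a no-op that ruff now flags
```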

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165851
Approved by: https://github.com/Lucaskabela
2025-10-21 03:30:57 +00:00
Yuanyuan Chen
fdab48a7c1 Enable all PIE rules on ruff (#165814)
This PR enables all PIE rules in ruff. Some rules from this family were already enabled; the newly added rules are:
```
PIE796  Enum contains duplicate value: {value}
PIE808  Unnecessary start argument in range
```
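
Minimal examples (mine, not from the PR) of code the new rules flag:
```python
import enum

class Color(enum.Enum):
    RED = 1
    CRIMSON = 1  # PIE796: enum contains duplicate value

for i in range(0, 10):  # PIE808: unnecessary start argument in range
    pass
```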

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814
Approved by: https://github.com/ezyang
2025-10-18 07:36:18 +00:00
PyTorch MergeBot
24520b8386 Revert "Enable all PIE rules on ruff (#165814)"
This reverts commit c79dfdc655.

Reverted https://github.com/pytorch/pytorch/pull/165814 on behalf of https://github.com/cyyever due to Need to cover more files ([comment](https://github.com/pytorch/pytorch/pull/165814#issuecomment-3417931863))
2025-10-18 07:21:08 +00:00
Yuanyuan Chen
c79dfdc655 Enable all PIE rules on ruff (#165814)
This PR enables all PIE rules in ruff. Some rules from this family were already enabled; the newly added rules are:
```
PIE796  Enum contains duplicate value: {value}
PIE808  Unnecessary start argument in range
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814
Approved by: https://github.com/ezyang
2025-10-18 06:40:12 +00:00
Raman Kumar
df26c51478 error message for instantiating CUDA Stream if CUDA not available (#159868)
Fixes #159744
Summary:
```
import torch

# Generate input data
input_tensor = torch.randn(3, 3)
stream = torch.cuda.Stream()

# Call the API
input_tensor.record_stream(stream)
```

⚠️ This will now show an error message:
`torch.cuda.Stream requires CUDA support`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159868
Approved by: https://github.com/malfet, https://github.com/isuruf
2025-10-11 23:21:35 +00:00
Eddie Yan
f7082e92b3 [cuBLAS] update cuBLAS determinism docs, remove workspace requirement checks (#161749)
Since CUDA 11.x (the docs need updating for this; the current PR text says 12.2, which is incorrect) we've been allocating cuBLAS workspaces explicitly per handle/stream combination: https://github.com/pytorch/pytorch/pull/85447

According to the cuBLAS documentation, this appears to be sufficient for determinism, without any explicit workspace-size requirement such as `:4096:8` or `:16:8` as was previously stated in the PyTorch docs: https://docs.nvidia.com/cuda/cublas/#results-reproducibility

Planning to add an explicit determinism test as well...
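
For context, a sketch of the workspace configuration the docs previously required (whether it is still needed is exactly what this PR revisits):
```python
import os
import torch

# Previously documented requirement for deterministic cuBLAS:
# set the workspace config before any cuBLAS handle is created.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # or ":16:8"
torch.use_deterministic_algorithms(True)
```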

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161749
Approved by: https://github.com/ngimel
2025-10-03 00:09:47 +00:00
Nikita Shulga
3e3e83418d [BE] Move indexing tests to test_indexing (#160994)
Which enables them on MPS device
- xfail all `test_index_reduce` on MPS, as op is not implemented
- xfail all `test_index_copy` on MPS due to the silent correctness problems, see https://github.com/pytorch/pytorch/issues/160993
- Fixed a hard crash in `index_fill` and replaced `skipIfMPS` with `expectedFailureMPS`
- Created issue for the lack of deterministic algorithms for MPS backend
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160994
Approved by: https://github.com/manuelcandales
ghstack dependencies: #160850, #160889, #160926
2025-08-21 00:42:55 +00:00
Xu Han
d6786741a7 [inductor] slow test some Windows UTs. (#160267)
After Windows inductor UTs were enabled in https://github.com/pytorch/pytorch/pull/160161, CI on the main branch started hitting timeouts. Let's move some UTs to the slow-test set.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160267
Approved by: https://github.com/ezyang
2025-08-10 18:35:42 +00:00
Raphael Reme
ee2649219c Fix max_width computation in _tensor_str._Formatter (#126859)
The previous version of `torch._tensor_str._Formatter` did not use `PRINT_OPTS.sci_mode` for the `max_width` computation but did use it when formatting values, leading to a weird discrepancy.

Now, the code first checks whether it should be in sci_mode, then computes `max_width`.

Here is an example to test the behavior:
```python
A = torch.tensor([10, 1e-1, 1e-2])
B = torch.tensor([10, 1e-1, 1e-1])

print("================= Default =================")
print(A, f"Formatter max_width: {torch._tensor_str._Formatter(A).max_width}")
print(B, f"Formatter max_width: {torch._tensor_str._Formatter(B).max_width}")

print("================= sci_mode=False =================")
with torch._tensor_str.printoptions(sci_mode=False):
    print(A, f"Formatter max_width: {torch._tensor_str._Formatter(A).max_width}")
    print(B, f"Formatter max_width: {torch._tensor_str._Formatter(B).max_width}")

print("================= sci_mode=True =================")
with torch._tensor_str.printoptions(sci_mode=True):
    print(A, f"Formatter max_width: {torch._tensor_str._Formatter(A).max_width}")
    print(B, f"Formatter max_width: {torch._tensor_str._Formatter(B).max_width}")
```

In the current version this prints:
```
================= Default =================
tensor([1.0000e+01, 1.0000e-01, 1.0000e-02]) Formatter max_width: 10
tensor([10.0000,  0.1000,  0.1000]) Formatter max_width: 7
================= sci_mode=False =================
tensor([   10.0000,     0.1000,     0.0100]) Formatter max_width: 10
tensor([10.0000,  0.1000,  0.1000]) Formatter max_width: 7
================= sci_mode=True =================
tensor([1.0000e+01, 1.0000e-01, 1.0000e-02]) Formatter max_width: 10
tensor([1.0000e+01, 1.0000e-01, 1.0000e-01]) Formatter max_width: 7
```

One can see that with `sci_mode=False`, the values of A are padded with unneeded zeros and do not have the same `max_width` as B (A keeps the `max_width` computed for `sci_mode=None`).

Also, with `sci_mode=True`, B's `max_width` is 7 but each value takes 10 chars. (This happens to be harmless, as the code that uses `max_width` does not rely much on it, but it is still misleading.)

After this commit, this will print
```
================= Default =================
tensor([1.0000e+01, 1.0000e-01, 1.0000e-02]) Formatter max_width: 10
tensor([10.0000,  0.1000,  0.1000]) Formatter max_width: 7
================= sci_mode=False =================
tensor([10.0000,  0.1000,  0.0100]) Formatter max_width: 7
tensor([10.0000,  0.1000,  0.1000]) Formatter max_width: 7
================= sci_mode=True =================
tensor([1.0000e+01, 1.0000e-01, 1.0000e-02]) Formatter max_width: 10
tensor([1.0000e+01, 1.0000e-01, 1.0000e-01]) Formatter max_width: 10
```

This also aligns A with B when `sci_mode=False`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126859
Approved by: https://github.com/malfet
2025-08-01 15:05:41 +00:00
PyTorch MergeBot
6c6e11c206 Revert "Fix max_width computation in _tensor_str._Formatter (#126859)"
This reverts commit 1465757959.

Reverted https://github.com/pytorch/pytorch/pull/126859 on behalf of https://github.com/yangw-dev due to broke trunk with test  distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_reduce_single - RuntimeError: Expected to find buf7 = empty but did not find it ([comment](https://github.com/pytorch/pytorch/pull/126859#issuecomment-3137137030))
2025-07-30 16:56:32 +00:00
Raphael Reme
1465757959 Fix max_width computation in _tensor_str._Formatter (#126859)
The previous version of `torch._tensor_str._Formatter` did not use `PRINT_OPTS.sci_mode` for the `max_width` computation but did use it when formatting values, leading to a weird discrepancy.

Now, the code first checks whether it should be in sci_mode, then computes `max_width`.

Here is an example to test the behavior:
```python
A = torch.tensor([10, 1e-1, 1e-2])
B = torch.tensor([10, 1e-1, 1e-1])

print("================= Default =================")
print(A, f"Formatter max_width: {torch._tensor_str._Formatter(A).max_width}")
print(B, f"Formatter max_width: {torch._tensor_str._Formatter(B).max_width}")

print("================= sci_mode=False =================")
with torch._tensor_str.printoptions(sci_mode=False):
    print(A, f"Formatter max_width: {torch._tensor_str._Formatter(A).max_width}")
    print(B, f"Formatter max_width: {torch._tensor_str._Formatter(B).max_width}")

print("================= sci_mode=True =================")
with torch._tensor_str.printoptions(sci_mode=True):
    print(A, f"Formatter max_width: {torch._tensor_str._Formatter(A).max_width}")
    print(B, f"Formatter max_width: {torch._tensor_str._Formatter(B).max_width}")
```

In the current version this prints:
```
================= Default =================
tensor([1.0000e+01, 1.0000e-01, 1.0000e-02]) Formatter max_width: 10
tensor([10.0000,  0.1000,  0.1000]) Formatter max_width: 7
================= sci_mode=False =================
tensor([   10.0000,     0.1000,     0.0100]) Formatter max_width: 10
tensor([10.0000,  0.1000,  0.1000]) Formatter max_width: 7
================= sci_mode=True =================
tensor([1.0000e+01, 1.0000e-01, 1.0000e-02]) Formatter max_width: 10
tensor([1.0000e+01, 1.0000e-01, 1.0000e-01]) Formatter max_width: 7
```

One can see that with `sci_mode=False`, the values of A are padded with unneeded zeros and do not have the same `max_width` as B (A keeps the `max_width` computed for `sci_mode=None`).

Also, with `sci_mode=True`, B's `max_width` is 7 but each value takes 10 chars. (This happens to be harmless, as the code that uses `max_width` does not rely much on it, but it is still misleading.)

After this commit, this will print
```
================= Default =================
tensor([1.0000e+01, 1.0000e-01, 1.0000e-02]) Formatter max_width: 10
tensor([10.0000,  0.1000,  0.1000]) Formatter max_width: 7
================= sci_mode=False =================
tensor([10.0000,  0.1000,  0.0100]) Formatter max_width: 7
tensor([10.0000,  0.1000,  0.1000]) Formatter max_width: 7
================= sci_mode=True =================
tensor([1.0000e+01, 1.0000e-01, 1.0000e-02]) Formatter max_width: 10
tensor([1.0000e+01, 1.0000e-01, 1.0000e-01]) Formatter max_width: 10
```

This also aligns A with B when `sci_mode=False`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126859
Approved by: https://github.com/malfet
2025-07-30 14:01:00 +00:00
gaoyufeng
f5cf05c983 Throw invalid_argument instead of RuntimeError when parameters exceed… (#158267)
Throw invalid_argument instead of RuntimeError when parameters exceed limits (for torch.int32 dtype)

Fixes #157707

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158267
Approved by: https://github.com/albanD
2025-07-25 23:49:46 +00:00
Jiang, Yanbing
f4d8bc46c7 Enable TF32 as fp32 internal precision for matmul/linear/conv (#157520)
### Description

This PR is to enable TF32 as fp32 internal precision for matmul/linear/conv in `mkldnn backend`. Since we have refined fp32 precision API in https://github.com/pytorch/pytorch/pull/125888, we can easily extend the API to support TF32 for `mkldnn backend`.

```
torch.backends.mkldnn.matmul.fp32_precision = 'tf32'
torch.backends.mkldnn.conv.fp32_precision = "tf32"
```

The related kernel and UT updates are done. The wrapper `bf32_on_and_off` is renamed to `reduced_f32_on_and_off`; it can run tests 3 times: once with reduced_f32 OFF and twice with reduced_f32 ON (`bf32` ON and `tf32` ON).
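
A usage sketch (mine) of the new setting on CPU; whether TF32 is actually used depends on hardware support in oneDNN:
```python
import torch

torch.backends.mkldnn.matmul.fp32_precision = "tf32"
a = torch.randn(256, 256)  # fp32 CPU tensors
b = torch.randn(256, 256)
c = a @ b  # the mkldnn backend may now use TF32 internally
```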

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157520
Approved by: https://github.com/mingfeima, https://github.com/jansel
2025-07-17 08:57:34 +00:00
FFFrog
565fd07909 [Easy] Make the error message shown by THPUtils_unpackLong to be clearer (#157886)
As the title states.

Previously, the error message of `THPUtils_unpackLong` was identical to that of `THPUtils_unpackInt`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157886
Approved by: https://github.com/Skylion007
2025-07-10 06:26:13 +00:00
Xuehai Pan
fc0376e8b1 [BE][2/6] fix typos in test/ (test/test_*.py) (#157636)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157636
Approved by: https://github.com/yewentao256, https://github.com/mlazos
ghstack dependencies: #156311, #156609
2025-07-09 11:02:23 +00:00
Aleksei Nikiforov
41e8b826d0 S390x update test marks (#157541)
Update s390x test marks

test_logs_out from test/dynamo/test_logging.py is updated
and no longer fails on s390x.

test_qengine from test/test_torch.py doesn't work on s390x:
no QEngine is available.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157541
Approved by: https://github.com/huydhn
2025-07-08 09:08:33 +00:00
Yu, Guangye
1b58e7adab fix storage use_count (#157694)
# Motivation
https://github.com/pytorch/pytorch/pull/155451 decoupled `torch._C._storage_Use_Count` from CUDA and introduced a corresponding unit test:
815545f2dd/test/test_torch.py (L257-L262)
However, this test fails when PyTorch is built with debug assertions enabled. @clee2000 disabled this UT in https://github.com/pytorch/pytorch/pull/156731. The root cause is that `_cdata` is obtained from an `intrusive_ptr`, not a `weak_intrusive_ptr`. As a result, calling `c10::weak_intrusive_ptr::use_count` on it triggers the internal assertion:
815545f2dd/c10/util/intrusive_ptr.h (L912-L917)
For example:
```python
a = torch.randn(10, device=device) # refcount=1, weakcount=1
prev_cf = torch._C._storage_Use_Count(a.untyped_storage()._cdata) # violates the assertion
```
This violates the expected invariant inside `weak_intrusive_ptr::use_count`, which assumes the pointer was originally constructed from a valid `weak_intrusive_ptr`. In fact, `storage_impl` is obtained from an `intrusive_ptr`:
815545f2dd/torch/csrc/Module.cpp (L2105-L2109)

# Solution
Use `c10::intrusive_ptr::use_count` instead.
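
With that change, the snippet above is expected to work (a sketch):
```python
import torch

a = torch.randn(10)  # refcount=1, weakcount=1
# Backed by intrusive_ptr::use_count, so no debug assertion fires:
assert torch._C._storage_Use_Count(a.untyped_storage()._cdata) == 1
```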

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157694
Approved by: https://github.com/albanD
2025-07-08 05:53:12 +00:00
Jason Ansel
b147b6c0e3 Increase tolerance for test_corrcoef_cuda_int32 (#157206)
Fixes #156988
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157206
Approved by: https://github.com/Skylion007
2025-06-29 16:30:54 +00:00
Lukas Geiger
a92b24cd83 Prevent cudaStreamSync when indexing GPU tensors with boolean CPU mask (#156384)
`index_put` with a boolean mask (`target[mask] = src`) causes a `cudaStreamSynchronize`. When both `mask` and `target` tensors are on GPU this is expected.

However, the sync can be prevented if the `mask` is a CPU tensor.
Internally a new index tensor is created with `mask.nonzero()` so we can use a non-blocking copy to transfer it to the GPU since it cannot be accidentally mutated by the user between its creation and the device copy. @ngimel Let me know if I'm missing something.

I think this is useful since users can't prevent the sync simply by making sure all tensors are on the same device, as they can with other ops. Instead, one would need to do something like this, which is much less readable:
```python
indices = mask.nonzero().squeeze(1).to("cuda", non_blocking=True)
target[indices] = src
```
Fixes #12461
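
With this change, the straightforward form is the sync-free one (a sketch):
```python
import torch

target = torch.zeros(1000, device="cuda")
mask = torch.rand(1000) > 0.5  # boolean mask kept on CPU
target[mask] = 1.0  # no longer forces a cudaStreamSynchronize
```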

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156384
Approved by: https://github.com/ngimel
2025-06-28 05:41:16 +00:00
Isuru Fernando
c808af514d Support deterministic upsample trilinear backward (#154239)
Fixes https://github.com/pytorch/pytorch/issues/154183
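
A sketch (mine) of what this enables, assuming the deterministic path covers the CUDA backward:
```python
import torch
import torch.nn.functional as F

torch.use_deterministic_algorithms(True)
x = torch.randn(1, 1, 4, 4, 4, device="cuda", requires_grad=True)
y = F.interpolate(x, scale_factor=2, mode="trilinear", align_corners=False)
y.sum().backward()  # previously raised a nondeterministic-operation error
```
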
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154239
Approved by: https://github.com/eellison, https://github.com/albanD
2025-06-26 15:02:27 +00:00
Catherine Lee
2ff3280c77 [ez] Disable some failing periodic tests (#156731)
test_torch.py::TestTorchDeviceTypeCUDA::test_storage_use_count_cuda:
Added in https://github.com/pytorch/pytorch/pull/150059
Fails in debug mode [GH job link](https://github.com/pytorch/pytorch/actions/runs/15856606665/job/44706020831) [HUD commit link](4491326fb0)

inductor/test_inductor_freezing.py::FreezingGpuTests::test_cpp_wrapper_cuda:
[GH job link](https://github.com/pytorch/pytorch/actions/runs/15856606665/job/44707119967) [HUD commit link](4491326fb0)
Started failing after moving to a new CUDA version: https://github.com/pytorch/pytorch/pull/155234

I'll ping people if this gets merged

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156731
Approved by: https://github.com/huydhn
2025-06-24 23:02:21 +00:00
Randolf Scholz
2e9bd03f60 Implemented Size.__radd__ (#152554)
Fixes #144334
Builds on top of #146834 by @khushi-411

The needed trick was to add `PyNumberMethods`, because the Number Protocol appears to be responsible for `__radd__` (see https://stackoverflow.com/q/18794169).
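
Roughly what this enables (a hedged sketch; exact return-type semantics per the PR):
```python
import torch

s = torch.Size([2, 3])
out = (1,) + s  # dispatches to Size.__radd__ rather than tuple.__add__,
                # so the result can stay a torch.Size([1, 2, 3])
print(type(out))
```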

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152554
Approved by: https://github.com/albanD

Co-authored-by: Khushi Agrawal <khushiagrawal411@gmail.com>
Co-authored-by: albanD <desmaison.alban@gmail.com>
2025-06-23 15:38:37 +00:00
Yu, Guangye
d84efde3f0 Move _storage_Use_Count to be generic (#155451)
# Motivation
`torch._C._storage_Use_Count` should be a generic API that is not aware of device type. It is also used in 337cd7c53d/torchtune/training/_activation_offloading.py (L323) to do some memory optimization.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155451
Approved by: https://github.com/albanD
2025-06-12 01:39:04 +00:00
Laith Sakka
8b507a9809 convert guard_size_oblivious to runtime check in infer_size_impl (#148872)
It is OK to check the requirement `numel == newsize` at runtime in the unbacked case, instead of assuming at compile time that it is true.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148872
Approved by: https://github.com/bobrenjc93
2025-05-13 00:32:28 +00:00
Boyuan Feng
ffda46e3be [Graph Partition] remove weak dep from partition_input_names (#152863)
Graph partition analyzes read_writes to get partition input names. However, a weak dep is a fake dependency that is not actually read or written, so we should not include weak deps in graph partition input names.

The following test failure is fixed by removing weak dependency from partition_input_names:
`PYTORCH_TEST_WITH_INDUCTOR=1 python test/test_torch.py TestTorchDeviceTypeCUDA.test_params_invalidated_with_grads_invalidated_between_unscale_and_step_Adam_cuda_float32`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152863
Approved by: https://github.com/eellison
2025-05-09 17:20:04 +00:00
albanD
22d1359bc6 Move warning from item to specific number conversions (#152709)
Follow-up to https://github.com/pytorch/pytorch/pull/143261: do not warn when a plain `.item()` is called.
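
A sketch of the intended behavior split (my reading of the change):
```python
import math
import torch

x = torch.tensor(2.0, requires_grad=True)
x.item()        # plain .item(): no warning after this change
math.pow(x, 3)  # implicit number conversion: still warns
```
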
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152709
Approved by: https://github.com/malfet, https://github.com/ngimel
2025-05-05 20:46:05 +00:00
Jagadish Krishnamoorthy
0d99b4e9e2 ROCm: Enable tf32 testing on test_nn (#148945)
Add tf32 support for ROCm tests.
test command: python test/test_nn.py -v

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148945
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-04-28 23:01:04 +00:00
Andrew M. James
0413358a77 Non-deterministic alert in histc_cuda for floating types only (#151701)
The note about atomic add only applies to floating-point types. The implementation is deterministic for integer data types.

fixes: #151610
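
A sketch of the distinction, assuming integer inputs are accepted by `histc` on CUDA (per the commit, the integer path is deterministic):
```python
import torch

torch.use_deterministic_algorithms(True)
ints = torch.randint(0, 10, (100,), device="cuda")
ints.histc(bins=10, min=0, max=10)  # integer dtype: no alert after this change

floats = torch.rand(100, device="cuda")
# floats.histc(bins=10)  # floating dtype: would still raise the alert
```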

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151701
Approved by: https://github.com/ngimel, https://github.com/Skylion007
2025-04-24 21:16:46 +00:00
eqy
d78d2af4e3 [CUDA][TF32] Account for TF32 in test_corrcoef (#151830)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151830
Approved by: https://github.com/Skylion007
2025-04-24 21:06:07 +00:00
PyTorch MergeBot
ed0d2ebaa0 Revert "Non-deterministic alert in histc_cuda for floating types only (#151701)"
This reverts commit b7a7741411.

Reverted https://github.com/pytorch/pytorch/pull/151701 on behalf of https://github.com/ZainRizvi due to Sorry but this is causing inductor tests to fail. See here for more info: test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_histc_cuda_float32 [GH job link](https://github.com/pytorch/pytorch/actions/runs/14586002763/job/40913547718) [HUD commit link](b7a7741411) ([comment](https://github.com/pytorch/pytorch/pull/151701#issuecomment-2821800837))
2025-04-22 16:07:25 +00:00
Andrew M. James
b7a7741411 Non-deterministic alert in histc_cuda for floating types only (#151701)
The note about atomic add only applies to floating-point types. The implementation is deterministic for integer data types.

fixes: #151610

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151701
Approved by: https://github.com/ngimel, https://github.com/Skylion007
2025-04-22 03:24:36 +00:00
Joshua Hamilton
4ce0b959ff Add a warning when a tensor with requires_grad=True is converted to a scalar (#143261)
Fixes #143071

Operations performed on tensors with `requires_grad=True` such as
```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3
```
and
```python
x = torch.tensor(2.0, requires_grad=True)
y = torch.pow(x,3)
```
are valid operations.

While an operation using `numpy` like
```python
import numpy as np

x = torch.tensor(2.0, requires_grad=True)
y = np.pow(x,3)
# > RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
```
leads to an error.

However, an operation that uses `math` like
```python
import math

x = torch.tensor(2.0, requires_grad=True)
y = math.pow(x,3)
```
does not cause an error, and `y` is no longer a tensor with a gradient!

This represents a [footgun](https://en.wiktionary.org/wiki/footgun#Noun) for some users, like myself when training small, custom, non-neural network models.

To prevent future undesired behavior, I added a warning when converting tensors with `requires_grad=True` to scalars. Now, when using `math.pow` on a `tensor`, we get a single warning with:
```python
x = torch.tensor(2.0, requires_grad=True)
y = math.pow(x,3)
# > UserWarning: Converting a tensor with requires_grad=True to a scalar may lead to unexpected behavior.
# Consider using tensor.detach() first.
```

Please let me know if you have any questions 👍
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143261
Approved by: https://github.com/malfet

Co-authored-by: albanD <desmaison.alban@gmail.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-04-01 00:42:46 +00:00
PyTorch MergeBot
1526ff955e Revert "Add a warning when a tensor with requires_grad=True is converted to a scalar (#143261)"
This reverts commit 515b45e569.

Reverted https://github.com/pytorch/pytorch/pull/143261 on behalf of https://github.com/clee2000 due to failing internal tests D72135661 ([comment](https://github.com/pytorch/pytorch/pull/143261#issuecomment-2767531682))
2025-03-31 22:19:08 +00:00
Joshua Hamilton
515b45e569 Add a warning when a tensor with requires_grad=True is converted to a scalar (#143261)
Fixes #143071

Operations performed on tensors with `requires_grad=True` such as
```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3
```
and
```python
x = torch.tensor(2.0, requires_grad=True)
y = torch.pow(x,3)
```
are valid operations.

While an operation using `numpy` like
```python
import numpy as np

x = torch.tensor(2.0, requires_grad=True)
y = np.pow(x,3)
# > RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
```
leads to an error.

However, an operation that uses `math` like
```python
import math

x = torch.tensor(2.0, requires_grad=True)
y = math.pow(x,3)
```
does not cause an error, and `y` is no longer a tensor with a gradient!

This represents a [footgun](https://en.wiktionary.org/wiki/footgun#Noun) for some users, like myself when training small, custom, non-neural network models.

To prevent future undesired behavior, I added a warning when converting tensors with `requires_grad=True` to scalars. Now, when using `math.pow` on a `tensor`, we get a single warning with:
```python
x = torch.tensor(2.0, requires_grad=True)
y = math.pow(x,3)
# > UserWarning: Converting a tensor with requires_grad=True to a scalar may lead to unexpected behavior.
# Consider using tensor.detach() first.
```

Please let me know if you have any questions 👍
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143261
Approved by: https://github.com/albanD

Co-authored-by: albanD <desmaison.alban@gmail.com>
2025-03-30 11:19:07 +00:00
PyTorch MergeBot
16560d4e8f Revert "Refactor test/test_torch.py by moving testcase to test_indexing.py (#148875)"
This reverts commit 0fa0a74095.

Reverted https://github.com/pytorch/pytorch/pull/148875 on behalf of https://github.com/ZainRizvi due to That torch.version failure you got in CI was a legitimate failure and is now breaking trunk. [GH job link](https://github.com/pytorch/pytorch/actions/runs/13778023702/job/38534207536) [HUD commit link](0fa0a74095) ([comment](https://github.com/pytorch/pytorch/pull/148875#issuecomment-2714757288))
2025-03-11 15:27:25 +00:00
zeshengzong
0fa0a74095 Refactor test/test_torch.py by moving testcase to test_indexing.py (#148875)
Fix `FIXME` in `test_torch.py` by moving test-cases to `test_indexing.py`

```python
# FIXME: move to test indexing
# FIXME: move to indexing test suite
```

- Move tests in `test/test_torch.py` to `test_indexing.py`
- Remove `FIXME` comments

## TestResult

```bash
pytest test/test_torch.py -k TestTorchDeviceType -vv
pytest test/test_indexing.py  -k TestIndexing -vv
```

(Screenshots of the passing `test_torch.py` and `test_indexing.py` runs are attached to the PR.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148875
Approved by: https://github.com/soulitzer
2025-03-11 01:01:59 +00:00
Nikita Shulga
3cde4c3069 [BE] Remove onlyCPU decorator from test_local_scalar_dense (#148559)
Follow-up to https://github.com/pytorch/pytorch/pull/145717; not sure why the author thought those tests should be limited to one architecture.
Also fixed similar crashes on CUDA and MPS.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148559
Approved by: https://github.com/ZainRizvi, https://github.com/atalman, https://github.com/seemethere
2025-03-06 17:43:02 +00:00
Sun, Jiayi
19a6cf35f6 add input shape check for _local_scalar_dense (#145717)
Fix https://github.com/pytorch/pytorch/issues/145066.
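
A hedged guess at the kind of input this guards against (hypothetical repro; the real one is in the linked issue):
```python
import torch

# _local_scalar_dense backs Tensor.item() and expects a one-element tensor;
# the new shape check rejects other inputs up front instead of crashing.
t = torch.empty(0)
# torch.ops.aten._local_scalar_dense(t)  # assumed to now raise a clear error
```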

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145717
Approved by: https://github.com/malfet
2025-03-05 15:24:08 +00:00
cyy
ec2805ada8 Remove outdated CUDA version check (#148142)
Since Torch requires CUDA>=11, some checks can be removed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148142
Approved by: https://github.com/janeyx99, https://github.com/eqy
2025-03-04 03:33:44 +00:00
cyy
b0dfd242fa Remove NO_MULTIPROCESSING_SPAWN checks (#146705)
py 3.9 has spawn.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146705
Approved by: https://github.com/colesbury
2025-02-28 05:53:19 +00:00
PyTorch MergeBot
926b7b5027 Revert "Remove NO_MULTIPROCESSING_SPAWN checks (#146705)"
This reverts commit 40ad5e01df.

Reverted https://github.com/pytorch/pytorch/pull/146705 on behalf of https://github.com/cyyever due to Broke lint? I guess a land race with the ruff update ([comment](https://github.com/pytorch/pytorch/pull/146705#issuecomment-2689603077))
2025-02-28 03:04:38 +00:00
cyyever
40ad5e01df Remove NO_MULTIPROCESSING_SPAWN checks (#146705)
py 3.9 has spawn.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146705
Approved by: https://github.com/colesbury
2025-02-28 00:15:32 +00:00
Mikayla Gawarecki
536bce5a04 Make Tensor.set_ validate storage_offset when sizes/strides are unchanged (#147354)
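A sketch of the kind of call this now validates (assumed repro shape, not from the PR):
```python
import torch

t = torch.randn(8)
# Re-binding the same storage with an out-of-bounds storage_offset while
# keeping sizes/strides unchanged should now be rejected rather than
# silently producing an out-of-bounds view.
t.set_(t.untyped_storage(), 100, t.size(), t.stride())  # expected to raise
```
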
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147354
Approved by: https://github.com/albanD
ghstack dependencies: #147352
2025-02-27 15:48:58 +00:00
Benjamin Glass
f98cd84b04 cpp_wrapper: use largeTensorTest for test memory checks (#146991)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146991
Approved by: https://github.com/desertfire
2025-02-27 00:30:21 +00:00
cyy
c6ea4425e5 Enable some tests on Windows (#146243)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146243
Approved by: https://github.com/albanD
2025-02-05 03:54:28 +00:00
Danial Javady
bb4964013f Add deterministic kernel for reflection2d (#136241)
Adds feature for #98925

Tests pass for both the existing reflectionpad2d kernel and the new one I added.

**Summary of the work:**

A simple conditional check for deterministic mode dispatches to a different kernel. This kernel does not use any atomic operations and leads to deterministic results: instead of going from output to input (a 1:1 relationship), I do the opposite and go from each input to all of its outputs, which is 1-to-many. These operations are done in the same order on every execution, as I simply traverse the data set with a grid-stride loop and use simple linearized indexing into the input tensor.

Each thread computes the 4 conditionals, which are then used to see whether the input element has an output in any of the 8 regions: top left, top, top right, left, right, bottom left, bottom, and bottom right.

I did not focus on performance for this PR, as that would expand the scope heavily. If there are any performance questions, though, I can answer them.
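
A sketch of the call this affects (assuming, as with other pad ops, the nondeterminism was in the CUDA backward):
```python
import torch
import torch.nn.functional as F

torch.use_deterministic_algorithms(True)
x = torch.randn(1, 1, 4, 4, device="cuda", requires_grad=True)
y = F.pad(x, (1, 1, 1, 1), mode="reflect")
y.sum().backward()  # accumulation order is now fixed, so grads repeat exactly
```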

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136241
Approved by: https://github.com/eqy, https://github.com/albanD
2025-01-29 20:34:03 +00:00