PyTorch MergeBot
15fd1ea118
Revert "[Reland] Update mypy to 1.4.1 ( #105227 )"
...
This reverts commit c9c4f8efc3 .
Reverted https://github.com/pytorch/pytorch/pull/105227 on behalf of https://github.com/atalman due to trying to mitigate ci sev #105248 ([comment](https://github.com/pytorch/pytorch/pull/105227#issuecomment-1636510935 ))
2023-07-14 22:28:35 +00:00
Nikita Karetnikov
0c89596e4f
[OpInfo] add reference and error inputs for multi_margin_loss ( #104850 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104850
Approved by: https://github.com/ezyang
2023-07-14 21:16:09 +00:00
Nikita Shulga
c9c4f8efc3
[Reland] Update mypy to 1.4.1 ( #105227 )
...
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022 )
- Update mypy to 1.4.1 (#91983 )
Both were reverted due to a conflict with the internal source repo.
Mostly fixes for PEP 484 violations (i.e. when a default arg is set to None but the type is not annotated as Optional)
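For context, a minimal sketch (not from the PR) of the PEP 484 pattern being fixed; the function names are made up:
```python
from typing import Optional

# Violates PEP 484: the default is None but the annotation is not Optional
def load(path: str, device: str = None) -> None:
    ...

# The form newer mypy expects
def load_fixed(path: str, device: Optional[str] = None) -> None:
    ...
```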
Plus a few real fixes:
- Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
- Add missing return statement to `torch._export.deserialize_graph`
- Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
- Add an assert in `torch/optim/optimizer.py` that the Optional list is not None
TODO (in followup PR):
- Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman , https://github.com/albanD , https://github.com/Skylion007
2023-07-14 20:45:12 +00:00
PyTorch MergeBot
3c5a494d7a
Revert "Update mypy to 1.4.1 ( #91983 )"
...
This reverts commit 634659e262 .
Reverted https://github.com/pytorch/pytorch/pull/91983 on behalf of https://github.com/malfet due to It's dependent change was reverted, so reverting this one as well, to keep CI clean ([comment](https://github.com/pytorch/pytorch/pull/91983#issuecomment-1636059709 ))
2023-07-14 15:59:16 +00:00
Kurt Mohler
f987d11fa7
Reland: Make torch.empty* deterministic by filling with NaN or max int ( #104995 )
...
Relands #101849 after #104302 reverted it.
The torchrec PR https://github.com/pytorch/torchrec/pull/1269 fixes the torchrec failure that caused #101849 to be reverted.
Part of #82004
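A minimal sketch of the behavior this reland enables, assuming deterministic mode is turned on (the fill values are those described by the PR title):
```python
import torch

torch.use_deterministic_algorithms(True)
f = torch.empty(3)                       # floating-point: expected to be filled with NaN
i = torch.empty(3, dtype=torch.int64)    # integer: expected to be filled with the max int value
print(f, i)
```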
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104995
Approved by: https://github.com/albanD
2023-07-13 22:18:03 +00:00
Nikita Shulga
634659e262
Update mypy to 1.4.1 ( #91983 )
...
Mostly fixes for PEP 484 violations (i.e. when a default arg is set to None but the type is not annotated as Optional)
Plus a few real fixes:
- Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
- Add missing return statement to `torch._export.deserialize_graph`
- Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
TODO (in followup PR):
- Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91983
Approved by: https://github.com/kit1980 , https://github.com/ZainRizvi , https://github.com/huydhn , https://github.com/thiagocrepaldi , https://github.com/aaronenyeshi
2023-07-13 16:30:36 +00:00
yanbing-j
053654b9cf
Optimize scatter_add/scatter_reduce in BFloat16/Half data type in CPU backend ( #103427 )
...
### Description
This PR optimizes scatter_add/scatter_reduce for the BFloat16/Half data types in the CPU backend, which is one task in https://github.com/pyg-team/pytorch_geometric/issues/7057 . The main point is creating a per-thread buffer to accumulate intermediate data in fp32.
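A small usage sketch of the CPU path being optimized; shapes and dtypes are illustrative:
```python
import torch

src = torch.randn(1024, 64, dtype=torch.bfloat16)
index = torch.randint(0, 64, (1024, 64))
out = torch.zeros(1024, 64, dtype=torch.bfloat16)
# BF16/Half scatter_add on CPU now accumulates through per-thread fp32 buffers
out.scatter_add_(1, index, src)
```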
Next step:
- [x] Add benchmarks
- [x] Extend to Half
- [x] Simplify code
### Performance test (Updated)
Tested BFloat16 on an Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz, with jemalloc and iomp.
Single socket (40C): (benchmark chart omitted)
Single core: (benchmark chart omitted)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103427
Approved by: https://github.com/mingfeima , https://github.com/albanD
2023-07-13 09:34:29 +00:00
Aaron Gokaslan
2f95a3d0fc
[BE]: Apply ruff PERF fixes to torch ( #104917 )
...
Applies automated ruff fixes in the PERF modules and enables all the automatic ones. I also updated ruff, which applied some additional fixes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104917
Approved by: https://github.com/ezyang , https://github.com/albanD
2023-07-11 20:45:21 +00:00
Kurt Mohler
0ccdbbe233
Add deterministic path for Tensor.resize_ ( #104300 )
...
New elements added to a tensor by `torch.Tensor.resize_` are set to NaN/MAX_INT when deterministic mode is turned on.
When `torch.Tensor.resize_` is called on a quantized tensor and deterministic mode is turned on, a nondeterministic error is raised.
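A minimal sketch of the new `resize_` behavior, assuming deterministic mode is enabled:
```python
import torch

torch.use_deterministic_algorithms(True)
t = torch.zeros(2)
t.resize_(5)    # the three newly added float elements are expected to be NaN
print(t)
```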
Part of #82004
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104300
Approved by: https://github.com/albanD
2023-07-07 00:22:13 +00:00
Nikita Shulga
ddd7da7546
Enable more tests ( #104437 )
...
Remove `test_segment_reductions` from the list of blocklisted tests.
Remove the `@onlyCPU` qualifier from test_segment_reductions as it has CUDA-specific parts.
Fixes https://github.com/pytorch/pytorch/issues/104410
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104437
Approved by: https://github.com/atalman , https://github.com/huydhn
2023-06-30 16:26:11 +00:00
Amr Elshennawy
a78bddac01
Revert D46920584: Multisect successfully blamed D46920584 for test or build failures ( #104269 ) ( #104302 )
...
Summary:
This diff is reverting D46920584
D46920584: Make `torch.empty*` deterministic by filling with NaN or max int value (#101849 ) by generatedunixname499836121 has been identified to be causing the following test or build failures:
Tests affected:
- [torchrec/distributed/composable/tests:test_fsdp - torchrec.distributed.composable.tests.test_fsdp.FullyShardTest: test_composable_checkpoint](https://www.internalfb.com/intern/test/281475062923125/ )
Here's the Multisect link:
https://www.internalfb.com/multisect/2341386
Here are the tasks that are relevant to this breakage:
We're generating a revert to back out the changes in this diff, please note the backout may land if someone accepts it.
If you believe this diff has been generated in error you may Commandeer and Abandon it.
Test Plan: NA
Reviewed By: huydhn, osalpekar
Differential Revision: D46997394
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104302
Approved by: https://github.com/osalpekar
2023-06-29 20:20:58 +00:00
Richard Barnes
8cad411d3d
Fix UntypedStorage pin error ( #104355 )
...
Summary:
Fixes:
```
TypeError: cannot pin 'torch.storage.UntypedStorage' only CPU memory can be pinned
```
Test Plan: Sandcastle
Differential Revision: D47093797
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104355
Approved by: https://github.com/malfet
2023-06-29 16:06:52 +00:00
Kurt Mohler
2642f31e4c
Make torch.empty* deterministic by filling with NaN or max int value ( #101849 )
...
Part of #82004
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101849
Approved by: https://github.com/lezcano , https://github.com/albanD , https://github.com/kulinseth
2023-06-21 02:53:22 +00:00
Elias Ellison
40d70ba7ed
Remove a number of fixed skips ( #103162 )
...
Also adds `PYTORCH_TEST_WITH_AOT_EAGER` to distinguish errors coming from aot_autograd and not inductor (not tested in ci, but useful for local debugging)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103162
Approved by: https://github.com/desertfire
2023-06-08 17:37:59 +00:00
ts
d2d03f0f44
Make index_add_ error if input source shape is wrong ( #100321 )
...
Fixes #92576 , checking the following as described in the documentation:
"source.shape[dim] == len(index) and source.shape[i] == self.shape[i] for i != dim"
Would be happy to iterate on this if there are any issues, and would be happy to implement the checking for the CUDA and MPS implementations of index_add_.
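A small sketch of the shapes the new check enforces (values are illustrative):
```python
import torch

x = torch.zeros(5, 3)
index = torch.tensor([0, 4])
src = torch.ones(2, 3)            # src.shape[0] == len(index); other dims match x
x.index_add_(0, index, src)       # OK

bad_src = torch.ones(3, 3)        # wrong size along dim 0
# x.index_add_(0, index, bad_src) # expected to raise a shape error after this change
```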
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100321
Approved by: https://github.com/lezcano
2023-06-08 06:51:10 +00:00
Lu Fang
1237502213
Introduce fast path for cuda_equal ( #102714 )
...
We introduce the same fast-path trick for cuda_equal, assuming the flags are already handled correctly in cuda_equal.
Added tests for the CUDA part.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102714
Approved by: https://github.com/ezyang
2023-06-03 05:49:49 +00:00
Shiyan Deng
685505353a
Back out "Add PyObject preservation for UntypedStorage ( #97470 )" ( #102553 )
...
Summary:
Original commit changeset: c24708d18ccb
Original Phabricator Diff: D46159983
Test Plan: SL tests and CI
Differential Revision: D46284986
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102553
Approved by: https://github.com/DanilBaibak
2023-06-01 17:23:43 +00:00
Edward Z. Yang
818d92f58c
Support resize on meta storage ( #101988 )
...
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101988
Approved by: https://github.com/albanD , https://github.com/bdhirsh
2023-05-25 04:41:45 +00:00
PyTorch MergeBot
210fc28d5e
Revert "Support resize on meta storage ( #101988 )"
...
This reverts commit 7d1ba0a92a .
Reverted https://github.com/pytorch/pytorch/pull/101988 on behalf of https://github.com/osalpekar due to Need to revert and rebase this in order to unblock train import ([comment](https://github.com/pytorch/pytorch/pull/101988#issuecomment-1561970230 ))
2023-05-24 21:51:33 +00:00
Wang, Eikan
2e18dd2bdc
Improve bf16 neg by bypassing the conversion between BF16 and FP32 ( #99711 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99711
Approved by: https://github.com/mingfeima , https://github.com/jgong5 , https://github.com/desertfire
2023-05-24 03:25:23 +00:00
Kazuaki Ishizaki
be5e77ca4c
Make _StorageBase.byteswap faster ( > 10000x) ( #101925 )
...
This PR addresses #101690 . It implements a faster data-element swap in `_StorageBase` using C++ rather than Python.
This helps the case where a large model saved on a little-endian machine is loaded on a big-endian machine.
TODO:
- [x] Add test cases
- [x] Add performance comparison before and after the PR
- [ ] (Optional) Investigate further opportunities for performance improvements by [SIMDization](https://dev.to/wunk/fast-array-reversal-with-simd-j3p )
Fixes #101690
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101925
Approved by: https://github.com/mikaylagawarecki
2023-05-24 00:13:41 +00:00
Edward Z. Yang
7d1ba0a92a
Support resize on meta storage ( #101988 )
...
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101988
Approved by: https://github.com/albanD , https://github.com/bdhirsh
2023-05-23 16:49:17 +00:00
Kurt Mohler
5fe629e314
Add PyObject preservation for UntypedStorage ( #97470 )
...
Part of #91395
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97470
Approved by: https://github.com/ezyang
2023-05-23 01:27:30 +00:00
drisspg
6f13d6892a
Add meta support for multinomial ( #101324 )
...
# Summary
Found this when trying to compile the text gen loop of nanogpt here: b33289942b/torchbenchmark/models/nanogpt_generate/model.py (L322)
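A minimal sketch of what meta support enables here, namely shape and device propagation without real data (shapes are illustrative):
```python
import torch

probs = torch.empty(4, 10, device="meta")
idx = torch.multinomial(probs, num_samples=2)   # works on meta tensors after this change
print(idx.shape, idx.device)                    # torch.Size([4, 2]) meta
```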
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101324
Approved by: https://github.com/ngimel
2023-05-19 00:04:26 +00:00
Edward Z. Yang
c567748e16
Make interpolate_bilinear deterministic using decomposition ( #101115 )
...
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101115
Approved by: https://github.com/ngimel
2023-05-11 22:48:01 +00:00
Yu, Guangye
14964b3aa5
Add is_xpu to torch type ( #101072 )
...
# Motivate
Without this PR:
```python
>>>import torch
>>>torch.IntTensor.is_cuda
False
>>>torch.IntTensor.is_xpu
<attribute 'is_xpu' of 'torch._C._TensorBase' objects>
```
With this PR:
```python
>>>import torch
>>>torch.IntTensor.is_xpu
False
```
To align with CUDA: some customer code uses is_xpu to check the backend. Without this PR, the check is always true, which results in unexpected behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101072
Approved by: https://github.com/mikaylagawarecki
2023-05-11 17:50:59 +00:00
vfdev-5
622e582a2b
Register get_cpu_capability for jit ( #100723 )
...
Description:
Context: In torchvision we ensure that functional ops are torchscriptable. The recently exposed `torch.backends.cpu.get_cpu_capability()` (https://github.com/pytorch/pytorch/pull/100164 ) is failing in torchvision CI:
```
RuntimeError:
Python builtin <built-in function _get_cpu_capability> is currently not supported in Torchscript:
File "/usr/local/lib/python3.10/dist-packages/torch/backends/cpu/__init__.py", line 17
- "AVX512"
"""
return torch._C._get_cpu_capability()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
```
Ref: https://github.com/pytorch/vision/pull/7557
In this PR, `torch._C._get_cpu_capability()` is explicitly registered for JIT and tested.
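A minimal sketch of the torchvision-style usage this registration is meant to enable; the comparison value is illustrative:
```python
import torch

@torch.jit.script
def prefers_avx2() -> bool:
    # After this change, the builtin behind get_cpu_capability() is scriptable
    return torch.backends.cpu.get_cpu_capability() == "AVX2"

print(prefers_avx2())
```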
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100723
Approved by: https://github.com/albanD
2023-05-09 09:52:29 +00:00
Valentin Andrei
9bc68fcd25
[pytorch] Accelerate indexing_backward_kernel with duplicates ( #99441 attempt 2) ( #100505 )
...
By knowing the stride value ahead of time, we can simplify the kernel code as follows:
If `stride == 1` we can use the whole warp to reduce the gradients.
If `stride < warp_size` we don't need the internal `while (start_feature < stride)` loop, as `blockDim.x` is always 32.
These changes improve the performance of the kernel when duplicates are present and do not affect the performance when few duplicates are present. The implementation is deterministic.
The proposed implementation uses `opmath_t` to accumulate the gradient values in registers, so when using FP16/BF16 it may overflow if the number of elements is large. This differs from the initial implementation, which accumulates in `scalar_t` and does not overflow. In addition, when the stride is 1, we use warp shuffles to sum the gradient, so the order of the additions is slightly different from a reference implementation, which causes some minor numerical differences when compared to a reference.
TEST CODE:
```
from time import time

import torch

# The first element is the number of iterations.
# The second represents the number of unique elements. If
# set to 0, the number of unique elements is equal to the
# number of elements.
# The remaining elements are the tensor dimensions.
basic_indexing_tests = [
    [10, 0, 12345],
    [10, 4, 12345],
    [10, 16, 512, 512, 32],
    [10, 0, 4, 4],
    [10, 0, 32, 32],
    [10, 8, 32, 32],
    [10, 8, 64, 32, 16],
    [10, 0, 64, 32, 16],
    [10, 16, 512, 512, 32],
    [10, 0, 675, 999, 13],
    [10, 0, 123, 456, 31],
    [10, 0, 512, 512, 32],
    [10, 4, 512, 512, 32],
    [10, 2, 512, 512, 32],
    [10, 0, 128, 128, 16, 16],
    [10, 8, 128, 126, 16, 16],
    [10, 4, 128, 126, 16, 16],
    [10, 0, 64, 64, 16, 16, 16],
    [10, 8, 64, 64, 16, 16, 16],
    [10, 2, 64, 64, 16, 16, 16],
    [10, 1, 64, 64, 16, 16, 16],
]

def run_basic_indexing_on_device(x, index, expected, device_string, iters):
    x_dev = x.to(device_string)
    x_dev = x_dev.detach().requires_grad_()
    index_dev = index.to(device_string)
    # Run backward pass; keep gradients and measure time
    torch.cuda.synchronize()
    t_bw_s = time()
    for _ in range(iters):
        y = x_dev[index_dev]
        z = y.sum()
        z.backward()
    torch.cuda.synchronize()
    t_bw_s = (time() - t_bw_s) / iters
    return (x_dev.grad, t_bw_s)

def run_basic_indexing_test(test_input):
    niters = test_input[0]
    num_unique = test_input[1]
    tensor_size = tuple(test_input[2:])
    numel = 1
    for dim in tensor_size:
        numel *= dim
    if num_unique == 0:
        num_unique = numel
    index = torch.randint(0, num_unique, tensor_size, dtype=torch.long, device="cpu")
    x = torch.randn((numel,), dtype=torch.float32, device="cuda")
    index = index.detach()
    x = x.detach().requires_grad_()
    (cpu_grad, t_bw_cpu) = run_basic_indexing_on_device(x, index, numel / 2, "cpu", 1)
    (gpu_grad, t_bw_gpu) = run_basic_indexing_on_device(x, index, numel / 2, "cuda", 1)
    max_delta = torch.max(torch.abs(cpu_grad - gpu_grad.to("cpu")))
    missmatches = torch.nonzero(torch.abs(cpu_grad - gpu_grad.to("cpu")))
    (gpu_grad_perf, t_gpu) = run_basic_indexing_on_device(
        x, index, numel / 2, "cuda", niters
    )
    print(
        "test = {}, delta = {:.5f}, missmatches = {} duration_ms = {:.3f}".format(
            tuple(test_input), max_delta, missmatches, t_gpu * 1000.0
        )
    )
    if torch.numel(missmatches) > 0:
        print("cpu grad = {}", cpu_grad[missmatches])
        print("gpu grad = {}", gpu_grad[missmatches])
```
RESULTS:
```
Default Implementation
test = (1, 0, 12345), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.726
test = (1, 4, 12345), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.867
test = (1, 16, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 80.514
test = (1, 0, 4, 4), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.689
test = (1, 0, 32, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.547
test = (1, 8, 32, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.537
test = (1, 8, 64, 32, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1.199
test = (1, 0, 64, 32, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.584
test = (1, 16, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 80.055
test = (1, 0, 675, 999, 13), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 8.411
test = (1, 0, 123, 456, 31), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 2.419
test = (1, 0, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 8.048
test = (1, 4, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 307.633
test = (1, 2, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 606.403
test = (1, 0, 128, 128, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 4.099
test = (1, 8, 128, 126, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 76.813
test = (1, 4, 128, 126, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 148.760
test = (1, 0, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 16.547
test = (1, 8, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 317.583
test = (1, 2, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1204.800
test = (1, 1, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 2412.133
Small Stride Kernel Version
test = (1, 0, 12345), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.904
test = (1, 4, 12345), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 2.156
test = (1, 16, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 308.878
test = (1, 0, 4, 4), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.566
test = (1, 0, 32, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.540
test = (1, 8, 32, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.550
test = (1, 8, 64, 32, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 2.868
test = (1, 0, 64, 32, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.656
test = (1, 16, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 307.856
test = (1, 0, 675, 999, 13), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 6.624
test = (1, 0, 123, 456, 31), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1.837
test = (1, 0, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 6.274
test = (1, 4, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1127.040
test = (1, 2, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 2123.942
test = (1, 0, 128, 128, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 3.282
test = (1, 8, 128, 126, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 288.997
test = (1, 4, 128, 126, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 547.267
test = (1, 0, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 12.844
test = (1, 8, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1178.934
test = (1, 2, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 4262.042
test = (1, 1, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 8172.318
Stride 1 Kernel Version
test = (1, 0, 12345), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.692
test = (1, 4, 12345), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.834
test = (1, 16, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 81.023
test = (1, 0, 4, 4), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.631
test = (100, 0, 32, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.491
test = (100, 8, 32, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.477
test = (50, 8, 64, 32, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.561
test = (50, 0, 64, 32, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.516
test = (16, 10, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 126.455
test = (10, 0, 675, 999, 13), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 8.238
test = (10, 0, 123, 456, 31), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1.520
test = (10, 0, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 7.854
test = (10, 4, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 306.327
test = (10, 2, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 610.498
test = (5, 0, 128, 128, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 3.684
test = (5, 8, 128, 126, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 75.604
test = (5, 4, 128, 126, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 148.679
test = (1, 0, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 16.525
test = (1, 8, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 315.095
test = (1, 2, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1214.715
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100505
Approved by: https://github.com/ngimel
2023-05-03 23:52:58 +00:00
vfdev-5
6a12f10b08
Publicly exposing torch.backends.cpu.get_cpu_capability() ( #100164 )
...
Description:
- As suggested by Nikita, created `torch.backends.cpu` submodule and exposed `get_cpu_capability`.
- In torchvision's Resize method we want to know the current CPU capability in order to pick the appropriate codepath depending on CPU capabilities.
The newly coded vectorized resize of uint8 images on AVX2-supported CPUs is now faster than the older way (uint8->float->resize->uint8). However, on non-AVX hardware (e.g. Mac M1) certain configs are slower using native uint8.
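For illustration, a short sketch of the intended use; the capability strings shown are examples and may vary by build:
```python
import torch

cap = torch.backends.cpu.get_cpu_capability()   # e.g. "DEFAULT", "AVX2", "AVX512"
if cap in ("AVX2", "AVX512"):
    print("use the vectorized uint8 resize path")
else:
    print("fall back to the uint8 -> float -> resize -> uint8 path")
```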
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100164
Approved by: https://github.com/albanD , https://github.com/malfet
2023-05-03 19:02:07 +00:00
PyTorch MergeBot
1114673c90
Revert "[pytorch] Accelerate indexing_backward_kernel with duplicates ( #99441 )"
...
This reverts commit 97afbcbc80 .
Reverted https://github.com/pytorch/pytorch/pull/99441 on behalf of https://github.com/ngimel due to breaks ROCM ([comment](https://github.com/pytorch/pytorch/pull/99441#issuecomment-1531804487 ))
2023-05-02 16:46:04 +00:00
Lu Fang
090ec55f8d
Only skip in torch inductor test
...
Differential Revision: D45464303
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100435
2023-05-01 22:21:37 -07:00
Lu Fang
429155b3c8
Disable some check to get the test pass
...
Differential Revision: D45437730
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100364
2023-05-01 16:28:12 -07:00
valentinandrei
97afbcbc80
[pytorch] Accelerate indexing_backward_kernel with duplicates ( #99441 )
...
By knowing the stride value ahead of time, we can simplify the kernel code as follows:
If `stride == 1` we can use the whole warp to reduce the gradients.
If `stride < warp_size` we don't need the internal `while (start_feature < stride)` loop, as `blockDim.x` is always 32.
These changes improve the performance of the kernel when duplicates are present and do not affect the performance when few duplicates are present. The implementation is deterministic.
The proposed implementation uses `opmath_t` to accumulate the gradient values in registers, so when using FP16/BF16 it may overflow if the number of elements is large. This differs from the initial implementation, which accumulates in `scalar_t` and does not overflow. In addition, when the stride is 1, we use warp shuffles to sum the gradient, so the order of the additions is slightly different from a reference implementation, which causes some minor numerical differences when compared to a reference.
TEST CODE:
```
from time import time

import torch

# The first element is the number of iterations.
# The second represents the number of unique elements. If
# set to 0, the number of unique elements is equal to the
# number of elements.
# The remaining elements are the tensor dimensions.
basic_indexing_tests = [
    [10, 0, 12345],
    [10, 4, 12345],
    [10, 16, 512, 512, 32],
    [10, 0, 4, 4],
    [10, 0, 32, 32],
    [10, 8, 32, 32],
    [10, 8, 64, 32, 16],
    [10, 0, 64, 32, 16],
    [10, 16, 512, 512, 32],
    [10, 0, 675, 999, 13],
    [10, 0, 123, 456, 31],
    [10, 0, 512, 512, 32],
    [10, 4, 512, 512, 32],
    [10, 2, 512, 512, 32],
    [10, 0, 128, 128, 16, 16],
    [10, 8, 128, 126, 16, 16],
    [10, 4, 128, 126, 16, 16],
    [10, 0, 64, 64, 16, 16, 16],
    [10, 8, 64, 64, 16, 16, 16],
    [10, 2, 64, 64, 16, 16, 16],
    [10, 1, 64, 64, 16, 16, 16],
]

def run_basic_indexing_on_device(x, index, expected, device_string, iters):
    x_dev = x.to(device_string)
    x_dev = x_dev.detach().requires_grad_()
    index_dev = index.to(device_string)
    # Run backward pass; keep gradients and measure time
    torch.cuda.synchronize()
    t_bw_s = time()
    for _ in range(iters):
        y = x_dev[index_dev]
        z = y.sum()
        z.backward()
    torch.cuda.synchronize()
    t_bw_s = (time() - t_bw_s) / iters
    return (x_dev.grad, t_bw_s)

def run_basic_indexing_test(test_input):
    niters = test_input[0]
    num_unique = test_input[1]
    tensor_size = tuple(test_input[2:])
    numel = 1
    for dim in tensor_size:
        numel *= dim
    if num_unique == 0:
        num_unique = numel
    index = torch.randint(0, num_unique, tensor_size, dtype=torch.long, device="cpu")
    x = torch.randn((numel,), dtype=torch.float32, device="cuda")
    index = index.detach()
    x = x.detach().requires_grad_()
    (cpu_grad, t_bw_cpu) = run_basic_indexing_on_device(x, index, numel / 2, "cpu", 1)
    (gpu_grad, t_bw_gpu) = run_basic_indexing_on_device(x, index, numel / 2, "cuda", 1)
    max_delta = torch.max(torch.abs(cpu_grad - gpu_grad.to("cpu")))
    missmatches = torch.nonzero(torch.abs(cpu_grad - gpu_grad.to("cpu")))
    (gpu_grad_perf, t_gpu) = run_basic_indexing_on_device(
        x, index, numel / 2, "cuda", niters
    )
    print(
        "test = {}, delta = {:.5f}, missmatches = {} duration_ms = {:.3f}".format(
            tuple(test_input), max_delta, missmatches, t_gpu * 1000.0
        )
    )
    if torch.numel(missmatches) > 0:
        print("cpu grad = {}", cpu_grad[missmatches])
        print("gpu grad = {}", gpu_grad[missmatches])
```
RESULTS:
```
Default Implementation
test = (1, 0, 12345), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.726
test = (1, 4, 12345), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.867
test = (1, 16, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 80.514
test = (1, 0, 4, 4), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.689
test = (1, 0, 32, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.547
test = (1, 8, 32, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.537
test = (1, 8, 64, 32, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1.199
test = (1, 0, 64, 32, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.584
test = (1, 16, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 80.055
test = (1, 0, 675, 999, 13), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 8.411
test = (1, 0, 123, 456, 31), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 2.419
test = (1, 0, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 8.048
test = (1, 4, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 307.633
test = (1, 2, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 606.403
test = (1, 0, 128, 128, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 4.099
test = (1, 8, 128, 126, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 76.813
test = (1, 4, 128, 126, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 148.760
test = (1, 0, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 16.547
test = (1, 8, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 317.583
test = (1, 2, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1204.800
test = (1, 1, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 2412.133
Small Stride Kernel Version
test = (1, 0, 12345), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.904
test = (1, 4, 12345), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 2.156
test = (1, 16, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 308.878
test = (1, 0, 4, 4), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.566
test = (1, 0, 32, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.540
test = (1, 8, 32, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.550
test = (1, 8, 64, 32, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 2.868
test = (1, 0, 64, 32, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.656
test = (1, 16, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 307.856
test = (1, 0, 675, 999, 13), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 6.624
test = (1, 0, 123, 456, 31), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1.837
test = (1, 0, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 6.274
test = (1, 4, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1127.040
test = (1, 2, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 2123.942
test = (1, 0, 128, 128, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 3.282
test = (1, 8, 128, 126, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 288.997
test = (1, 4, 128, 126, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 547.267
test = (1, 0, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 12.844
test = (1, 8, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1178.934
test = (1, 2, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 4262.042
test = (1, 1, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 8172.318
Stride 1 Kernel Version
test = (1, 0, 12345), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.692
test = (1, 4, 12345), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.834
test = (1, 16, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 81.023
test = (1, 0, 4, 4), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.631
test = (100, 0, 32, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.491
test = (100, 8, 32, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.477
test = (50, 8, 64, 32, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.561
test = (50, 0, 64, 32, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.516
test = (16, 10, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 126.455
test = (10, 0, 675, 999, 13), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 8.238
test = (10, 0, 123, 456, 31), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1.520
test = (10, 0, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 7.854
test = (10, 4, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 306.327
test = (10, 2, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 610.498
test = (5, 0, 128, 128, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 3.684
test = (5, 8, 128, 126, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 75.604
test = (5, 4, 128, 126, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 148.679
test = (1, 0, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 16.525
test = (1, 8, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 315.095
test = (1, 2, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1214.715
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99441
Approved by: https://github.com/ngimel
2023-05-01 22:41:00 +00:00
Lu Fang
d7fa7fa8cf
Introduce fast path in the CPU equal op
...
Differential Revision: D45282119
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100024
2023-04-28 16:00:17 -07:00
kshitij12345
61dffa61c3
[fix] masked_scatter_: non-contiguous self ( #100232 )
...
Fixes https://github.com/pytorch/pytorch/issues/99638
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100232
Approved by: https://github.com/ngimel
2023-04-28 18:12:23 +00:00
dujinhang
9cd48b0575
Add warning information for dtypetensor. ( #99521 )
...
Without affecting the existing cpu/cuda logic, a separate interface is provided for the custom backend, and users can choose whether to use the interface function, which provides 10 tensor types with custom-backend variations.
Therefore, users can use `torch.set_default_tensor_type` to set the default device tensor type, or use torch.xxx.dtypetensor to create a tensor. For example, `torch.set_default_tensor_type(torch.foo.DoubleTensor)` or `torch.foo.DoubleTensor([])`.
@albanD , please review my changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99521
Approved by: https://github.com/albanD
2023-04-28 18:01:45 +00:00
Larry Liu
687afeb686
[dynamo][numpy] Add NumpyTensorVariable to translate ndarray attribute calls to tensor attributes ( #95849 )
...
Issue: #93684
# Problem
Reduce graph breaks when dynamo compiles python functions containing numpy functions and ndarray operations.
# Design (as I know it)
* Use torch_np.ndarray (a wrapper of tensor) to back a `VariableTracker`: `NumpyTensorVariable`.
* Translate all attribute and method calls on ndarray to their torch_np.ndarray equivalents.
This PR adds `NumpyTensorVariable` and supports:
1. tensor to ndarray, ndarray to tensor
2. numpy functions such as numpy.meshgrid()
3. ndarray attributes such as `itemsize`, `stride`
Next PR will handle returning `np.ndarray` and add support for ndarray methods
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95849
Approved by: https://github.com/ezyang
2023-04-27 16:18:35 +00:00
Jiong Gong
e5c9a0fcf5
[dynamo] avoid graph break on repeat_interleave.self_int ( #99528 )
...
Addresses the convit_base failure https://github.com/pytorch/torchdynamo/issues/1886 mentioned in https://github.com/pytorch/pytorch/issues/93777 .
Also helps models like EleutherAI/gpt-j-6B.
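A small repro-style sketch of the pattern this targets, in the same style as other examples in this log; the backend choice is arbitrary:
```python
import torch
import torch._dynamo as dynamo

def f(x):
    return x.repeat_interleave(2)   # the repeat_interleave.self_int overload

opt_f = dynamo.optimize("eager", nopython=True)(f)   # goal: no graph break on this op
print(opt_f(torch.arange(4)))
```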
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99528
Approved by: https://github.com/ezyang
2023-04-25 04:47:39 +00:00
BJ Hargrave
555ab310dc
Add itemsize and nbytes properties to Tensor ( #98322 )
...
Adds properties for itemsize and nbytes to Tensor matching the properties in NumPy.
Fixes https://github.com/pytorch/pytorch/issues/12728
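A minimal example of the new properties:
```python
import torch

t = torch.zeros(2, 3, dtype=torch.float32)
print(t.itemsize)   # 4: bytes per element, matching numpy.ndarray.itemsize
print(t.nbytes)     # 24 == t.numel() * t.itemsize, matching numpy.ndarray.nbytes
```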
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98322
Approved by: https://github.com/ezyang
2023-04-05 12:11:55 +00:00
Jason Ansel
b96fe9b61c
Fix issues related to ClassInstantier in HF models ( #97997 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97997
Approved by: https://github.com/anijain2305
2023-04-04 00:01:08 +00:00
Jason Ansel
71d850a100
[inductor] Fallback on complex64 kernels ( #98155 )
...
Later PRs in this stack fix graph breaks in GoogleFnet, which triggers errors from Inductor trying to compile torch.complex64; this fixes that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98155
Approved by: https://github.com/anijain2305 , https://github.com/ngimel
2023-04-03 01:06:43 +00:00
Nikita Shulga
2af09393f9
masked_scatter should accept only bool masks (#97999 )
...
Modify test_torch to check that an assert is raised in this case.
torch.uint8 usage has been deprecated for a few releases, and errors have been raised for other dtypes on CUDA devices, but not on CPU.
This PR finally restricts the mask to just `torch.bool`.
See https://github.com/pytorch/pytorch/pull/96594 as an example doing it for `torch.masked_fill`
Fixes https://github.com/pytorch/pytorch/issues/94634
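A small sketch of the behavior change (the raise is the expected outcome, shown commented out):
```python
import torch

x = torch.zeros(4)
src = torch.ones(4)
bool_mask = torch.tensor([True, False, True, False])
x.masked_scatter_(bool_mask, src)      # OK: bool mask

byte_mask = bool_mask.to(torch.uint8)
# x.masked_scatter_(byte_mask, src)    # expected to raise after this change
```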
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97999
Approved by: https://github.com/ngimel
2023-04-01 23:25:25 +00:00
Nikita Shulga
a1dc2b1774
[BE] Remove bool dtype from masked_scatter ( #98015 )
...
Simplified a test function for `torch.masked_scatter` in `test/test_torch.py` by removing redundant and unnecessary code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98015
Approved by: https://github.com/ezyang
2023-03-31 01:45:57 +00:00
Aleksei Nikiforov
8289120ef0
Revert "test/test_torch.py: fix TestTorch::test_from_buffer test ( #96952 )" ( #97759 )
...
Tests were already fixed in https://github.com/pytorch/pytorch/pull/92834 , and these changes, instead of also fixing the tests, are now breaking them again.
This reverts commit 7f94ea8492 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97759
Approved by: https://github.com/janeyx99
2023-03-28 18:43:08 +00:00
Nikita Shulga
542fb0b1fa
Specify file encoding in test_torch.py ( #97628 )
...
Attempt to fix
```
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 5260: ordinal not in range(128)
```
in https://github.com/pytorch/pytorch/actions/runs/4522628359/jobs/7965372405
In general, it's good practice to explicitly specify the encoding, as otherwise it depends on environment variables and makes test failures unpredictable.
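For illustration, the pattern being applied; the file name is hypothetical:
```python
from pathlib import Path

path = Path("sample.txt")                    # hypothetical file
path.write_text("höhe", encoding="utf-8")    # specify the encoding when writing...
data = path.read_text(encoding="utf-8")      # ...and when reading, so the locale doesn't matter
```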
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97628
Approved by: https://github.com/dagitses , https://github.com/kit1980
2023-03-26 20:03:25 +00:00
Edward Z. Yang
37faa48844
DCE inference graphs too ( #97275 )
...
I added a bunch of asserts to verify that I didn't accidentally kill copy_ in the graph; hopefully this, combined with our existing tests, is good enough.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97275
Approved by: https://github.com/bdhirsh
2023-03-23 07:02:52 +00:00
Kurt Mohler
fbc803df0c
Only warn once for TypedStorage deprecation ( #97379 )
...
Fixes #97207
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97379
Approved by: https://github.com/ezyang
2023-03-23 05:40:23 +00:00
Aleksei Nikiforov
7f94ea8492
test/test_torch.py: fix TestTorch::test_from_buffer test ( #96952 )
...
Use opposite encoding on big endian systems
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96952
Approved by: https://github.com/ezyang
2023-03-17 14:36:33 +00:00
mingfeima
06054d7df0
fix random output issue on index_select when src is scalar and index is empty ( #96408 )
...
Fix https://github.com/pytorch/pytorch/issues/94340
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96408
Approved by: https://github.com/ngimel
2023-03-16 05:30:45 +00:00
Kurt Mohler
06b7285163
Add torch._check* functions analogous to C++ TORCH_CHECK* ( #88725 )
...
Adds `_check`, `_check_index`, `_check_value`, `_check_type`, `_check_not_implemented`, `_check_tensor_all`
Part of #72948
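A minimal sketch of the Python-side usage; the messages and helper function are made up for illustration:
```python
import torch

def positive_sum(t: torch.Tensor) -> torch.Tensor:
    torch._check(t.numel() > 0, lambda: "expected a non-empty tensor")
    torch._check_tensor_all(t > 0, lambda: "expected all elements to be positive")
    return t.sum()

print(positive_sum(torch.ones(3)))
```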
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88725
Approved by: https://github.com/albanD
2023-03-14 20:44:50 +00:00
kvathupo
2b9d9bcb85
Deprecate non-bool masks in masked_fill ( #96594 )
...
__What?__
Per discussion at #94634 , deprecate `masked_fill` with non-bool masks. Deprecation warnings were previously added by #22261 , but not for Apple MPS. I can revert the MPS changes if deprecation warnings are wanted first, though. See also #96112 .
Fixes #85063 and #89320 .
__Further Development?__
- Fixed the mask dtype checking for the cuda dispatch for `masked_fill` in `aten/src/ATen/native/cuda/Indexing.cu`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96594
Approved by: https://github.com/malfet , https://github.com/ngimel
2023-03-13 01:41:47 +00:00
Nikita Shulga
1cd0929bf7
[BC] Allow only bool tensors as mask in masked_select ( #96112 )
...
`byte` support was marked as deprecated in 1.8, so it's fine to remove this in 2.1 (or even 2.0)
Deprecation warning was added by https://github.com/pytorch/pytorch/pull/22261
Also, fix a bunch of syntactic errors in comments
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96112
Approved by: https://github.com/ezyang
2023-03-07 01:43:14 +00:00
puririshi98
8aa34602f7
Jetson Update for CI Redo ( #94549 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94549
Approved by: https://github.com/ezyang , https://github.com/malfet
2023-02-21 17:13:38 +00:00
Yuxin Wu
9bb2fe3eae
fix numpy1.24 deprecations in unittests ( #93997 )
...
Fixes https://github.com/pytorch/pytorch/issues/91329
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93997
Approved by: https://github.com/ngimel , https://github.com/jerryzh168
2023-02-18 00:59:09 +00:00
Xuehai Pan
b005ec62b9
[BE] Remove dependency on six and future ( #94709 )
...
Remove the Python 2 and 3 compatibility libraries [six](https://pypi.org/project/six ) and [future](https://pypi.org/project/future ), along with `torch._six`. We only support Python 3.8+ now; it's time to retire them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94709
Approved by: https://github.com/malfet , https://github.com/Skylion007
2023-02-14 09:14:14 +00:00
Brian Hirsh
ceb0f1576b
turn functionalization on in aot_autograd inference ( #92857 )
...
still waiting for CI fallout
fixes #90759
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92857
Approved by: https://github.com/ezyang
2023-02-13 17:48:00 +00:00
Nikita Shulga
4869929f32
Update Triton hash ( #94249 )
...
That includes MLIR + the latest packaging changes (which also download ptxas from CUDA-12).
Tweak CI to install gcc-9 to build Triton.
Disable a few tests to make everything correct.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94249
Approved by: https://github.com/Skylion007 , https://github.com/ngimel , https://github.com/weiwangmeta
2023-02-13 13:17:36 +00:00
Aaron Gokaslan
9171f7d4cd
[BE] Modernize PyTorch even more for 3.8 with pyupgrade ( #94520 )
...
Applies some more pyupgrade fixits to PyTorch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94520
Approved by: https://github.com/ezyang
2023-02-10 18:02:50 +00:00
ganler
db6cfff827
fix: forbid multi-index for index_select over scalar ( #94347 )
...
Fixes #88940
According to the [doc](https://pytorch.org/docs/stable/generated/torch.index_select.html ):
1. "The returned tensor has the same number of dimensions as the original tensor (`input`). "
2. "The `dim`th dimension has the same size as the length of `index`; other dimensions have the same size as in the original tensor."
These two conditions cannot be satisfied at the same time if the `input` is a scalar && `index` has multiple values: because a scalar at most holds one element (according to property 1, the output is a scalar), it is impossible to satisfy "The `dim`th dimension has the same size as the length of `index`" when `index` has multiple values.
However, currently, if we do so we either get:
1. Buffer overflow with ASAN;
2. Or (w/o ASAN) silently returns outputs that are not consistent with the doc (`x.index_select(0, torch.Tensor([0, 0, 0]).int())` returns `x`).
As a result, we should explicitly reject such cases.
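A short sketch of the now-rejected case; the behavior noted in comments is what the PR describes:
```python
import torch

x = torch.tensor(5.0)                               # 0-dim (scalar) tensor
print(torch.index_select(x, 0, torch.tensor([0])))  # a single index is still allowed
# Multiple indices on a scalar cannot satisfy the documented shape rules,
# so this is now rejected instead of overflowing or silently returning x:
# torch.index_select(x, 0, torch.tensor([0, 0, 0]))
```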
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94347
Approved by: https://github.com/malfet
2023-02-10 17:17:09 +00:00
min-jean-cho
900e09c872
[Dynamo] Support torch.Tensor.fn as TorchVariable, not UserDefinedObjectVariable, preventing graph break ( #93243 )
...
As found in #92709 , thanks to @ngimel and @jansel, `torch.Tensor.fn` currently points to `UserDefinedObjectVariable` rather than `TorchVariable`. The root cause is https://github.com/pytorch/pytorch/pull/92709#pullrequestreview-1273357406 . To prevent this, build a `TorchVariable` for `torch.Tensor.fn` pointing to `torch.ops.aten.fn`.
This issue propagates to `torch.Tensor.fn`, causing a graph break with `nopython=True`.
```python
import torch
import torch._dynamo as dynamo
#op = torch.ops.aten.abs_ # no graph break
op = torch.Tensor.abs_ # graph break
args = torch.empty(10)
def foo(args):
return op(args)
opt_foo = dynamo.optimize("inductor", nopython=True)(foo)
y_ = opt_foo(args)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93243
Approved by: https://github.com/jansel
2023-02-07 09:26:50 +00:00
min-jean-cho
6e1cfcdf4b
cauchy_ few fixes (1) check gamma > 0 (2) better dtype error log ( #93314 )
...
Related #92047
(1) `torch.Tensor.cauchy_` is missing a check for `gamma > 0` (`torch.distributions.cauchy.Cauchy` correctly checks `gamma > 0`).
(2) Add a better error message on dtype, similar to `exponential_`.
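A small sketch of the added validation; in the Python API the scale parameter is called `sigma`:
```python
import torch

t = torch.empty(5)
t.cauchy_(median=0.0, sigma=1.0)      # OK: positive scale
# t.cauchy_(median=0.0, sigma=-1.0)   # expected to raise after the gamma > 0 check
```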
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93314
Approved by: https://github.com/jgong5 , https://github.com/fritzo , https://github.com/lezcano
2023-02-03 11:56:28 +00:00
min-jean-cho
2f0b0c5dd7
exponential_ few fixes (1) lambda > 0 (2) mkl kernel to continuous (3) better error log on dtype ( #92891 )
...
The exponential distribution is continuous. This fixes the CPU MKL exponential implementation to exclude integer dtypes.
```python
import torch
dtypes = [torch.uint8, torch.int8, torch.int16, torch.int32, torch.int64]
for dtype in dtypes:
x = torch.empty(10000, dtype=dtype).exponential_() # should fail !
print("dtype: ", x.dtype, "sum: ", x.sum())
```
### Additional Context
Related to #92709 . This issue propagates to OpInfo of exponential.
```
AssertionError: The supported dtypes for exponential on device type cpu are incorrect!
The following dtypes worked in forward but are not listed by the OpInfo: {torch.int64, torch.uint8, torch.int8, torch.int16, torch.int32}.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92891
Approved by: https://github.com/CaoE , https://github.com/jgong5 , https://github.com/ngimel
2023-01-28 02:27:16 +00:00
Yanbo Liang
a6b51448f5
[Dynamo] Supports if condition on user defined object ( #90892 )
...
Fixes a Meta-internal use case; see the pattern in the unit test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90892
Approved by: https://github.com/jansel , https://github.com/mlazos
2023-01-26 04:19:32 +00:00
PyTorch MergeBot
9b23fd378f
Revert "Logcumsumexp for complex in CPU and CUDA ( #90847 )"
...
This reverts commit 64985123e4 .
Reverted https://github.com/pytorch/pytorch/pull/90847 on behalf of https://github.com/malfet due to Reverting to decrease build time, let's discuss the alternatives here
2023-01-24 20:49:08 +00:00
pierreHaslee
1c30844eaa
where() function added as a Tensor method as well ( #92849 )
...
Fixes #88470
I added the "method" keyword in `aten/src/ATen/native/native_functions.yaml` for the function `where` with Scalar Overload.
This way, you can now use `Tensor.where()` with a scalar parameter the same way `torch.where()` can.
I added a test in `test/test_torch.py` as requested.
It uses the `where()` method on a tensor and then checks it has the same results as the `torch.where()` function.
The test is roughly the same as the one provided by the author of the issue.
PS: this is the second PR I've made to resolve this issue; the first one is #92747 . I had trouble with commit signatures, so that PR is closed.
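A minimal example of the added method overload:
```python
import torch

x = torch.tensor([-1.0, 2.0, -3.0])
print(x.where(x > 0, 0.0))           # Tensor method with a scalar `other`
print(torch.where(x > 0, x, 0.0))    # equivalent functional form
```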
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92849
Approved by: https://github.com/albanD
2023-01-24 03:09:33 +00:00
mfkasim1
64985123e4
Logcumsumexp for complex in CPU and CUDA ( #90847 )
...
Another PR towards solving #89205 .
What's in this PR:
* The implementation of forward `logcumsumexp` for complex numbers in CPU & CUDA
* The tests on forward call of `logcumsumexp` for complex numbers
* The implementation of backward `logcumsumexp` for complex numbers
What's missing:
* The test on the backward gradient of `logcumsumexp` (it complains `RuntimeError: logcumsumexp does not support automatic differentiation for outputs with complex dtype.` and I don't know how to solve the error, nor where to put the test for the backward computation). If possible, I'd like this to be done in this PR.
It's really tricky to handle the edge cases here (i.e. the ones involving `inf`), but I've tried my best to put comments explaining the reasoning behind my decisions in this PR.
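A small sanity-check sketch of the forward behavior; the values are arbitrary:
```python
import torch

z = torch.tensor([1.0 + 1.0j, 2.0 - 0.5j, 0.3 + 2.0j], dtype=torch.complex64)
out = torch.logcumsumexp(z, dim=0)
# For well-behaved inputs this should agree with the naive definition
ref = torch.log(torch.cumsum(torch.exp(z), dim=0))
print(torch.allclose(out, ref, atol=1e-5))
```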
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90847
Approved by: https://github.com/albanD
2023-01-20 15:10:50 +00:00
Kurt Mohler
647b8f8e3e
Add TORCH_CHECK_TENSOR_ALL ( #89097 )
...
`TORCH_CHECK_TENSOR_ALL(cond, ...)` is a wrapper around `TORCH_CHECK` which allows the condition argument to be a tensor, batched or unbatched. `cond` can be a boolean tensor of any size. If any element is False, or if `cond.numel() == 0`, then `TORCH_CHECK_TENSOR_ALL` raises an error
Part of #72948
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89097
Approved by: https://github.com/zou3519
2023-01-19 21:04:09 +00:00
BowenBao
a72bcb3388
Do not leak SkipFrame exception to parent frames ( #91059 )
...
Discovered by https://github.com/pytorch/torchdynamo/issues/2000 : we noticed that the `SkipFrame` exception, used to avoid repeatedly compiling a frame whose loop has graph breaks, could leak to parent frames while inlining, which then prevents them from being compiled.
This PR checks at inlining time whether such an exception is raised and instead raises `Unsupported` to the outer frame. The original behavior and goal of #88857 are unaffected: the inner frame that has the loop is still skipped.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91059
Approved by: https://github.com/jansel , https://github.com/thiagocrepaldi
2023-01-13 17:11:22 +00:00
XiaobingSuper
1892c75a45
fix narrow_copy correctness issue for non-contiguous input for cpu path (reland) ( #91883 )
...
This PR re-lands https://github.com/pytorch/pytorch/pull/91789 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91883
Approved by: https://github.com/lezcano
2023-01-10 10:56:18 +00:00
PyTorch MergeBot
d85f3c8237
Revert "fix norrow_copy correctness issue for non-contiguous input for cpu path ( #91789 )"
...
This reverts commit 136dadd689 .
Reverted https://github.com/pytorch/pytorch/pull/91789 on behalf of https://github.com/huydhn due to This breaks trunk with XPASS test_vmap_exhaustive_narrow_copy_cpu_float32 136dadd689
2023-01-09 06:50:20 +00:00
XiaobingSuper
136dadd689
fix narrow_copy correctness issue for non-contiguous input for cpu path ( #91789 )
...
Fix https://github.com/pytorch/pytorch/issues/91690 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91789
Approved by: https://github.com/jgong5 , https://github.com/lezcano
2023-01-09 00:55:03 +00:00
PyTorch MergeBot
b3603f8129
Revert "Deduplicate c10 error and PyTorchError hierarchy ( #87855 )"
...
This reverts commit 34f2d3e6ae .
Reverted https://github.com/pytorch/pytorch/pull/87855 on behalf of https://github.com/osalpekar due to perf regression in quantization tests
2023-01-06 19:56:35 +00:00
William Phetsinorath
34f2d3e6ae
Deduplicate c10 error and PyTorchError hierarchy ( #87855 )
...
Fixes #53370
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87855
Approved by: https://github.com/albanD
2023-01-02 15:53:36 +00:00
ecao
274d3b24c3
use scatter_add for index_add when dim is the most inner dim ( #88729 )
...
### Motivation
When dim is -1 and the slice of source or result is non-contiguous, the original `index_add` is slow, as it uses add for the sliced tensor, which is serial over the index and parallel over the sliced tensor to avoid write conflicts. Parallelizing over the sliced tensor is not optimal, as the sliced tensor may not be big enough to parallelize over, and it also causes multiple parallel regions.
`scatter_add` is used to speed up this case, as `scatter_add` parallelizes over the outer dimension of the input and is serial over the inner dimension to avoid write conflicts. `scatter_add` needs only one parallel region, and the size of the outer dimensions is bigger, so parallelizing there is more effective.
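A usage sketch of the case being sped up, with dim as the innermost dimension; the shapes mirror the benchmark below:
```python
import torch

x = torch.zeros(10, 128, 20, 20)
src = torch.randn(10, 128, 20, 20)
index = torch.randint(0, 20, (20,))
x.index_add_(-1, index, src)    # dim = -1: now routed through scatter_add on CPU
```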
### Testing
- Single core:
Before:
shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 2.82E-03 | 2.11E-03
[10, 128, 50, 50] | 0.023604 | 0.023794
After:
shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 9.30E-04 | 1.66E-03
[10, 128, 50, 50] | 0.005995 | 0.010003
- Single socket (28 cores):
Before:
shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 2.96E-03 | 2.52E-03
[10, 128, 50, 50] | 0.012208 | 0.012568
After:
shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 7.44E-05 | 1.33E-04
[10, 128, 50, 50] | 0.000333 | 0.000469
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88729
Approved by: https://github.com/mingfeima , https://github.com/jgong5 , https://github.com/malfet
2022-12-28 12:04:17 +00:00
PyTorch MergeBot
eadd557266
Revert "use scatter_add for index_add when dim is the most inner dim ( #88729 )"
...
This reverts commit 68e9da68cb .
Reverted https://github.com/pytorch/pytorch/pull/88729 on behalf of https://github.com/atalman due to Break internal build
2022-12-22 18:06:45 +00:00
ecao
68e9da68cb
use scatter_add for index_add when dim is the most inner dim ( #88729 )
...
### Motivation
When dim is -1 and the slice of source or result is noncontiguous, the original `index_add` is slow: it applies add to the sliced tensor, serially over the index and in parallel over the sliced tensor to avoid write conflicts. Parallelizing over the sliced tensor is not optimal, as the sliced tensor may be too small to parallelize well, and it also launches multiple parallel regions.
`scatter_add` is used to speed up this case, as `scatter_add` parallelizes over the outer dimensions of the input and is serial over the inner dimension to avoid write conflicts. `scatter_add` needs only one parallel region, and the outer dimensions are larger, so they parallelize better.
### Testing
- Single core:
Before:
shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 2.82E-03 | 2.11E-03
[10, 128, 50, 50] | 0.023604 | 0.023794
After:
shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 9.30E-04 | 1.66E-03
[10, 128, 50, 50] | 0.005995 | 0.010003
- Single socket (28 cores):
Before:
shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 2.96E-03 | 2.52E-03
[10, 128, 50, 50] | 0.012208 | 0.012568
After:
shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 7.44E-05 | 1.33E-04
[10, 128, 50, 50] | 0.000333 | 0.000469
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88729
Approved by: https://github.com/mingfeima , https://github.com/jgong5 , https://github.com/malfet
2022-12-22 01:13:35 +00:00
PyTorch MergeBot
3194281ca7
Revert "use scatter_add for index_add when dim is the most inner dim ( #88729 )"
...
This reverts commit 13dbad6369 .
Reverted https://github.com/pytorch/pytorch/pull/88729 on behalf of https://github.com/desertfire due to causing inductor test failure
2022-12-20 15:19:54 +00:00
ecao
13dbad6369
use scatter_add for index_add when dim is the most inner dim ( #88729 )
...
### Motivation
When dim is -1 and the slice of source or result is noncontiguous, the original `index_add` is slow: it applies add to the sliced tensor, serially over the index and in parallel over the sliced tensor to avoid write conflicts. Parallelizing over the sliced tensor is not optimal, as the sliced tensor may be too small to parallelize well, and it also launches multiple parallel regions.
`scatter_add` is used to speed up this case, as `scatter_add` parallelizes over the outer dimensions of the input and is serial over the inner dimension to avoid write conflicts. `scatter_add` needs only one parallel region, and the outer dimensions are larger, so they parallelize better.
### Testing
- Single core:
Before:
shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 2.82E-03 | 2.11E-03
[10, 128, 50, 50] | 0.023604 | 0.023794
After:
shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 9.30E-04 | 1.66E-03
[10, 128, 50, 50] | 0.005995 | 0.010003
- Single socket (28 cores):
Before:
shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 2.96E-03 | 2.52E-03
[10, 128, 50, 50] | 0.012208 | 0.012568
After:
shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 7.44E-05 | 1.33E-04
[10, 128, 50, 50] | 0.000333 | 0.000469
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88729
Approved by: https://github.com/mingfeima , https://github.com/jgong5 , https://github.com/malfet
2022-12-20 13:12:36 +00:00
Yanbo Liang
511fbad830
[Dynamo] Fix builder for class with metaclass ( #90807 )
...
Fixes a Meta-internal use case: a class with a metaclass can't be identified as `UserDefinedClassVariable`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90807
Approved by: https://github.com/jansel
2022-12-20 05:02:28 +00:00
Edward Z. Yang
e686a442b4
If a torch.* returns non-Tensor, make this unimplemented rather than assert. ( #89918 )
...
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89918
Approved by: https://github.com/albanD
2022-12-15 21:53:54 +00:00
Edward Z. Yang
283cf718ed
Fix _fix_weakref memory leak ( #90823 )
...
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90823
Approved by: https://github.com/eellison , https://github.com/albanD
2022-12-15 01:07:29 +00:00
Edward Z. Yang
cc504ce292
Restore test_warn_types ( #90810 )
...
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90810
Approved by: https://github.com/ngimel
2022-12-14 05:15:32 +00:00
Bin Bao
7035bcdd0f
[inductor] Enable test_torch ( #90518 )
...
Summary: Skipping failures in those tests so that CI can guard other
passing cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90518
Approved by: https://github.com/jansel
2022-12-13 16:21:35 +00:00
Yuxin Wu
5d8618dfbd
Some memory saving in large unittests ( #90148 )
...
Two tests, test_large_cumsum and test_large_cumprod, use a lot of memory. This PR:
* Reduces their memory usage by avoiding `self.assertEqual` and a temporary Python variable.
* Marks their memory requirement with a decorator.
related to https://github.com/pytorch/pytorch/issues/84944
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90148
Approved by: https://github.com/soumith
2022-12-11 21:04:38 +00:00
Edward Z. Yang
2ad6ed8ac9
Fix some typed storage is deprecated warnings. ( #89867 )
...
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89867
Approved by: https://github.com/albanD
2022-12-07 20:09:57 +00:00
eqy
f7520cb51e
Reduce memory usage requirement of test_pdist_norm_large in test_torch.py ( #90075 )
...
Basically the same fix as #85373 ; `/usr/bin/time` indicates that the memory requirement on the host side was actually ~64GiB before the workaround and ~30GiB after.
CC @ptrblck @davidberard98
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90075
Approved by: https://github.com/davidberard98
2022-12-03 05:28:21 +00:00
Yu, Guangye
4144ad16af
add XPU backend to support torch.save and torch.load ( #89679 )
...
# Motivation
We need to add an XPU backend to support torch.save and torch.load when the parameter _use_new_zipfile_serialization=False.
# Solution
We wrap the data as a tensor and:
>1. use an in-place copy for H2D;
>2. directly call tensor.to() for D2H.
This helps us:
>1. unify the generic code for all backends;
>2. support all the non-CPU device backends.
# Additional Context
No new unit tests are needed; test/test_serialization.py covers this code change.
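A minimal sketch of the path this enables, assuming a PyTorch build where the `"xpu"` device string is available (illustrative only):
```python
import torch

t = torch.randn(4, device="xpu")  # requires an XPU-enabled build
torch.save(t, "t.pt", _use_new_zipfile_serialization=False)  # legacy serialization path
loaded = torch.load("t.pt", map_location="xpu")
```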
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89679
Approved by: https://github.com/ezyang
2022-11-30 20:38:02 +00:00
albanD
8713119c89
Stream actually overrides __new__ so we need to patch it as well ( #89592 )
...
Avoids
```
$ python foo.py
Traceback (most recent call last):
File "foo.py", line 3, in <module>
a = torch.cuda.Stream()
File "/home/albandes/local/pytorch/3.8_debug_source/torch/cuda/streams.py", line 34, in __new__
return super(Stream, cls).__new__(cls, priority=priority, **kwargs)
TypeError: object.__new__() takes exactly one argument (the type to instantiate)
```
And now gets
```
$ python foo.py
Traceback (most recent call last):
File "foo.py", line 3, in <module>
a = torch.cuda.Stream()
File "/home/albandes/local/pytorch/3.8_debug_source/torch/cuda/streams.py", line 34, in __new__
return super(Stream, cls).__new__(cls, priority=priority, **kwargs)
File "/home/albandes/local/pytorch/3.8_debug_source/torch/cuda/_utils.py", line 44, in err_fn
raise RuntimeError(
RuntimeError: Tried to instantiate dummy base class Stream
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89592
Approved by: https://github.com/soumith
2022-11-29 21:43:23 +00:00
David Berard
a029ec2c88
Move gpu slow tests to sm86 ( #87880 )
...
NVFuser tests (which are slow tests) would be better to run on more
modern GPU hardware.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87880
Approved by: https://github.com/malfet
2022-11-29 19:29:59 +00:00
Nikita Karetnikov
57af0c8245
Bug fix: make sure copy_impl doesn't read out of bounds ( #88544 )
...
Fixes #88543 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88544
Approved by: https://github.com/lezcano
2022-11-16 13:23:38 +00:00
PyTorch MergeBot
8441443132
Revert "Add nondeterministic error for scatter ( #88244 )"
...
This reverts commit e940a2f8e2 .
Reverted https://github.com/pytorch/pytorch/pull/88244 on behalf of https://github.com/mehtanirav due to Internal test failures
2022-11-10 23:56:49 +00:00
Kurt Mohler
ee28b865ee
Deprecate TypedStorage, its derived classes, and all of their public methods ( #85303 )
...
Part of #85302
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85303
Approved by: https://github.com/ezyang
2022-11-08 18:11:01 +00:00
Kurt Mohler
e940a2f8e2
Add nondeterministic error for scatter ( #88244 )
...
Fixes #88096
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88244
Approved by: https://github.com/ezyang , https://github.com/mruberry
2022-11-04 20:23:59 +00:00
Nikolay Korovaiko
0f6304ef1e
disable the out variants in test_cumprod test for inductor ( #88328 )
...
`out=` variants aren't supported by autograd and it's not a must fix, so disabling the test (https://github.com/pytorch/torchdynamo/issues/1798 ) for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88328
Approved by: https://github.com/desertfire
2022-11-03 16:52:37 +00:00
Nikolay Korovaiko
529ba076c6
add an exclude for test_constructor for inductor ( #88143 )
...
This test (https://github.com/pytorch/torchdynamo/issues/1800 ) fails since none of the c-tor ops support `pin_memory=True`. Natalia suggests it's not a priority to fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88143
Approved by: https://github.com/desertfire
2022-11-03 16:21:18 +00:00
Edward Z. Yang
f884e817d4
Make Python op registration work with torchdeploy/multipy ( #87162 )
...
See strategy at PythonOpRegistrationTrampoline.cpp for the
big picture.
Along the way, I made OperatorHandle support == and hashing,
and slightly changed the low-level python_dispatch impl API
to disallow empty strings for the dispatch key, which had the knock-on
effect of requiring us to explicitly pass in
CompositeImplicitAutograd where we would have passed in "" (I didn't apply
this to the rest of the file because I'm lazy.)
Test strategy is we delete the logic for preventing Python op
registrations in torch from being skipped in a torchdeploy context
and show CI still works.
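For context, a minimal sketch of Python op registration via `torch.library` (illustrative names, not the trampoline machinery itself):
```python
import torch
from torch.library import Library

# Define a toy operator in a custom namespace and register a Python kernel.
my_lib = Library("my_ops", "DEF")
my_lib.define("add_one(Tensor x) -> Tensor")

def add_one(x):
    return x + 1

# Register the implementation under an explicit dispatch key.
my_lib.impl("add_one", add_one, "CompositeImplicitAutograd")

print(torch.ops.my_ops.add_one(torch.tensor([1, 2])))
```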
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87162
Approved by: https://github.com/anjali411 , https://github.com/bdhirsh
2022-11-03 12:56:44 +00:00
Philip Meier
bc73affdad
prepare removal of deprecated functionality in torch.testing ( #87969 )
...
_Redo of #86586 with all BC breaking changes granularly placed into separate commits._
---
Per title. Deprecation happened on Feb 25, 2022 in c6f1bbc0ac , which made it into the 1.12 release. Since it is now 245 days later and the next release will be 1.14, the removals later in the stack comply with the [BC policy](https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#minimizing-the-disruption-of-bc-breaking-changes ).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87969
Approved by: https://github.com/mruberry
2022-11-02 14:04:48 +00:00
Kurt Mohler
1dbc8ad3b7
Add Warning class and refactor C++ warnings to use it ( #84101 )
...
Also adds `TORCH_WARN_WITH` and `TORCH_WARN_DEPRECATION` macros
Part of #72948
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84101
Approved by: https://github.com/albanD
2022-10-18 20:02:42 +00:00
Natalia Gimelshein
1704256b10
Enables where to have cpu scalar args ( #87022 )
...
This is for decompositions only, no attempt made to have good performance for this case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87022
Approved by: https://github.com/ezyang , https://github.com/eellison , https://github.com/mruberry
2022-10-17 17:08:47 +00:00
Mikayla Gawarecki
afaee00fec
Add python nested_tensor and as_nested_tensor constructors in torch.nested ( #85593 )
...
Remove `torch.nested_tensor`, which had erroneous behavior with respect to gradients (the result could be either a leaf or not). Introduce `torch.nested.nested_tensor` and `torch.nested.as_nested_tensor` in the vein of `torch.tensor` and `torch.as_tensor`. Done in the nested `__init__.py` for now but can move to pybind in the future (when we want to load from numpy/nested lists).
Discussed offline with @cpuhrsch ; the pybind constructor (https://github.com/pytorch/pytorch/pull/85536 ) was more gnarly than expected, so we can move to that when we do need loading from numpy etc.
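A minimal sketch of the two new constructors:
```python
import torch

a, b = torch.randn(2, 3), torch.randn(4, 3)

nt = torch.nested.nested_tensor([a, b])      # copies the inputs; result is a leaf
nt2 = torch.nested.as_nested_tensor([a, b])  # analogous to torch.as_tensor
```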
Differential Revision: [D39806622](https://our.internmc.facebook.com/intern/diff/D39806622 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85593
Approved by: https://github.com/drisspg , https://github.com/cpuhrsch
2022-09-28 20:15:02 +00:00
Kurt Mohler
b0a631cd14
Add nondeterministic alert for MaxUnpool1d/2d/3d ( #84766 )
...
Part of #80827
Part of #78249
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84766
Approved by: https://github.com/Lezcano , https://github.com/mruberry , https://github.com/nikitaved
2022-09-17 11:58:18 +00:00
soulitzer
02f654abca
Disable torch.library.Library with PYTORCH_DISABLE_LIBRARY ( #85190 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85190
Approved by: https://github.com/d4l3k
2022-09-17 03:05:43 +00:00
Khushi Agrawal
a9258eba8e
[Testing] Port bernoulli and multinomial to ErrorInputs. ( #74683 )
...
Hi,
The PR aims to port `bernoulli` and `multinomial` to error inputs. Thanks!
cc: @kshitij12345! :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74683
Approved by: https://github.com/kshitij12345 , https://github.com/mruberry
2022-09-16 21:24:09 +00:00
Elias Ellison
f37069aac7
Re-enable fixed dynamo tests ( #84969 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84969
Approved by: https://github.com/bdhirsh , https://github.com/ezyang
2022-09-16 15:36:52 +00:00
Kurt Mohler
95a2c3df31
Replace expectedAlertNondeterministic with simpler check function ( #84808 )
...
Fixes #84807
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84808
Approved by: https://github.com/mruberry
2022-09-16 01:10:12 +00:00
Kurt Mohler
5b58140d1a
Add deterministic impl of scatter_add CUDA for all input sizes ( #79466 )
...
Fixes #50469
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79466
Approved by: https://github.com/ngimel
2022-09-07 03:12:49 +00:00
Natalia Gimelshein
0b363c5c5c
don't synchronize single element any/all reductions ( #84465 )
...
Fixes #84291
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84465
Approved by: https://github.com/ezyang
2022-09-02 21:18:58 +00:00
Elias Ellison
f701cb04fb
Test Dynamo CI w Fake Tensors ( #84282 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84282
Approved by: https://github.com/anijain2305
2022-09-01 00:15:05 +00:00
mattip
4dfa6d28a1
Normalize DLPack stride to 1 where shape < 2 ( #83158 )
...
Fixes #83069 . Also move all the dlpack tests to a new file., `test_dlpack.py`.
The fix involves always allocating a "strides" int array when converting to dlPack and deleting the strides when the capsule descructor is called. Then the strides are copied from the tensor, and `strides[i]` is set to `1` where `shape[i] < 2`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83158
Approved by: https://github.com/ezyang
2022-08-23 15:03:29 +00:00
Brian Hirsh
0c24af4985
Always allow tensor metadata changes ( #83590 )
...
Make it so that it is valid to set metadata after detach calls, like `x.detach().resize_(...)`.
This technically lifts some restrictions around `.data`: you can now call `x.data.resize_(...)`, which directly resizes `x` instead of erroring.
My understanding: Before the tensor-variable merge, when `x` and `x.data` were really different tensors, you could resize `x.data` independently of `x`, and during the merge, this error was added to avoid silent confusing behavior changes.
It was agreed that this error has been around long enough (several years) that it's acceptable to drop. cc @albanD @ezyang.
(Ed already had a prototype PR [here](https://github.com/pytorch/pytorch/pull/83545 ) - I ended up making one to try to slog through test failures).
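A minimal sketch of the newly permitted pattern (illustrative; behavior as described above):
```python
import torch

x = torch.randn(2, 3)
x.detach().resize_(6)  # setting metadata after detach() no longer errors
```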
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83590
Approved by: https://github.com/ezyang
2022-08-19 23:30:43 +00:00
Nikita Shulga
1a09b05c94
Fix torch.equal on CPU ( #83350 )
...
`torch.equal` should not raise an exception when comparing tensors of different types
I.e. `torch.equal(torch.tensor([1, 2]), torch.tensor([1, 2], dtype=torch.float)))` should return True rather than raise an exception.
Also, this makes it consistent with GPU behaviour
Fixes https://github.com/pytorch/pytorch/issues/83314
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83350
Approved by: https://github.com/albanD
2022-08-17 03:22:56 +00:00
Nikita Karetnikov
4010f96121
[primTorch] Fix off by 1 in canonicalize_dim ( #83198 )
...
Also fix an issue in the `unsqueeze` ref due to this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83198
Approved by: https://github.com/ngimel
2022-08-16 17:57:01 +00:00
PyTorch MergeBot
f534b2c627
Revert "Remove split functional wrapper ( #74727 )"
...
This reverts commit a58876ace7 .
Reverted https://github.com/pytorch/pytorch/pull/74727 on behalf of https://github.com/seemethere due to Fails internal use cases, might extend out to external use cases as well. Need to assess overall impact of this change more widely
2022-08-10 19:45:23 +00:00
Peter Bell
a58876ace7
Remove split functional wrapper ( #74727 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74727
Approved by: https://github.com/albanD , https://github.com/khabinov
2022-08-10 17:57:48 +00:00
Kurt Mohler
c379915969
Add nondeterministic alert to CUDA cumsum ( #75693 )
...
Part of #75240
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75693
Approved by: https://github.com/ngimel
2022-08-04 01:58:29 +00:00
Kurt Mohler
14d0296e5c
Rename _Typed/_UntypedStorage to Typed/UntypedStorage and update docs ( #82438 )
...
### Description
Since the major changes for `_TypedStorage` and `_UntypedStorage` are now complete, they can be renamed to be public.
`TypedStorage._untyped()` is renamed to `TypedStorage.untyped()`.
Documentation for storages is improved as well.
### Issue
Fixes #82436
### Testing
N/A
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82438
Approved by: https://github.com/ezyang
2022-07-30 19:37:08 +00:00
Fabio Rocha
fd84c458f4
Add torch.unflatten and improve its docs ( #81399 )
...
unflatten now has a free function version, torch.unflatten, in addition to
the method torch.Tensor.unflatten.
Updated docs to reflect this and polished them a little.
For consistency, changed the signature of the int version of unflatten in
native_functions.yaml.
Some override tests were failing because unflatten has unusual
characteristics: its .int and .Dimname overloads take different numbers
of arguments, so this required some changes to test/test_override.py.
Removed support for mixing integer and string arguments
when specifying dimensions in unflatten.
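A minimal sketch of the two forms:
```python
import torch

x = torch.randn(2, 12)
y = torch.unflatten(x, 1, (3, 4))  # free function
z = x.unflatten(1, (3, 4))         # method
print(y.shape, z.shape)            # torch.Size([2, 3, 4]) torch.Size([2, 3, 4])
```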
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81399
Approved by: https://github.com/Lezcano , https://github.com/ngimel
2022-07-29 15:02:42 +00:00
ecao
1ebe98220c
Optimize the copy of BFloat16 to Float and Float to BFloat16 ( #79685 )
...
Optimize the copy of BFloat16 to Float and Float to BFloat16.
* Vectorize the copy of BFloat16 <-> Float
* Use `at::internal::serial_for_each` instead of directly using `cpu_kernel_vec`, since `cpu_kernel_vec` can't handle inputs and outputs with different data types.
single socket (28cores):
```
before: torch.Size([10, 128, 10, 124]) bf16 -> fp32: 4.18e-05 ms; fp32 -> bf16: 5.04e-05 ms
torch.Size([10, 128, 30, 124]) bf16 -> fp32: 0.00011868 ms; fp32 -> bf16: 0.0001476 ms
after: torch.Size([10, 128, 10, 124]) bf16 -> fp32: 1.35e-05 ms; fp32 -> bf16: 1.97e-05 ms
torch.Size([10, 128, 30, 124]) bf16 -> fp32: 7.32e-05 ms; fp32 -> bf16: 5.70e-05 ms
```
single core:
```
before: torch.Size([10, 128, 10, 124]) bf16 -> fp32: 0.000848 ms; fp32 -> bf16: 0.00105 ms
torch.Size([10, 128, 30, 124]) bf16 -> fp32: 0.00269 ms; fp32 -> bf16: 0.00321 ms
after: torch.Size([10, 128, 10, 124]) bf16 -> fp32: 0.000370 ms; fp32 -> bf16: 0.000382 ms
torch.Size([10, 128, 30, 124]) bf16 -> fp32: 0.00153 ms; fp32 -> bf16: 0.00113 ms
```
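A minimal sketch of the conversions benchmarked above (shape taken from the first row; timing harness omitted):
```python
import torch

x = torch.randn(10, 128, 10, 124).to(torch.bfloat16)
y = x.float()             # BFloat16 -> Float copy
z = y.to(torch.bfloat16)  # Float -> BFloat16 copy
```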
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79685
Approved by: https://github.com/malfet
2022-07-28 14:34:08 +00:00
Huy Do
edf1868e67
Fix test_doc_template regex ( #81755 )
...
### The problem
The original regex abuses `.*` in combination with `re.DOTALL`, which leads to a catastrophic-backtracking performance issue when there is no match. When that happens, test_doc_template runs "forever" and times out. Here is an example timed-out test: https://github.com/pytorch/pytorch/runs/7413337595
Another minor issue with this regex is that it won't match concatenated doc strings like `"""FOO""" + """BAR"""`, which are used for some APIs in `_torch_docs.py`
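For illustration, a generic regex (not the actual doc-template one) whose nested quantifiers backtrack catastrophically when there is no match:
```python
import re
import time

pattern = re.compile(r"(a+)+$")        # nested quantifiers over the same characters

start = time.time()
pattern.match("a" * 25 + "b")          # no match; the engine explores exponentially many paths
print(f"{time.time() - start:.1f}s")   # grows rapidly as the run of 'a's gets longer
```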
### The fix
* Remove most of the match-all `.*` usage. I have tested to make sure that the test finishes even when there is no match, i.e. it fails successfully instead of hanging
* Update the regex to match all the following cases before and after linting (You can also try it out on https://pythex.org ):
BEFORE
```
add_docstr(torch.abs, r"""
abs(input, *, out=None) -> Tensor
Computes the absolute value of each element in :attr:`input`.
.. math::
\text{out}_{i} = |\text{input}_{i}|
""" + r"""
Args:
{input}
Keyword args:
{out}
Example::
>>> torch.abs(torch.tensor([-1, -2, 3]))
tensor([ 1, 2, 3])
""".format(**common_args))
add_docstr(torch.absolute,
r"""
absolute(input, *, out=None) -> Tensor
Alias for :func:`torch.abs`
""")
```
AFTER
```
add_docstr(
torch.abs,
r"""
abs(input, *, out=None) -> Tensor
Computes the absolute value of each element in :attr:`input`.
.. math::
\text{out}_{i} = |\text{input}_{i}|
"""
+ r"""
Args:
{input}
Keyword args:
{out}
Example::
>>> torch.abs(torch.tensor([-1, -2, 3]))
tensor([ 1, 2, 3])
""".format(
**common_args
),
)
add_docstr(
torch.absolute,
r"""
absolute(input, *, out=None) -> Tensor
Alias for :func:`torch.abs`
""",
)
```
This will unblock https://github.com/pytorch/pytorch/pull/81643
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81755
Approved by: https://github.com/atalman
2022-07-21 16:28:29 +00:00
Animesh Jain
1d90d6ee60
Setup for running PyTorch tests with TorchDynamo and skips for known failing tests ( #80106 )
...
@ezyang I am going to keep adding more skips in this PR for now. And once we have the CI running, I will replace with the appropriate decorators.
cc @mlazos , we should add those tests in test_ops.py in this PR as well
cc @jansel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80106
Approved by: https://github.com/ezyang , https://github.com/jansel
2022-07-07 18:57:33 +00:00
Kurt Mohler
4c279994fd
Fix Module.share_memory error ( #80843 )
...
Fixes #80733
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80843
Approved by: https://github.com/malfet
2022-07-05 15:17:36 +00:00
PyTorch MergeBot
f668b7ecb0
Add integer support to index_reduce ( #80464 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80464
Approved by: https://github.com/cpuhrsch
2022-06-30 12:54:51 +00:00
PyTorch MergeBot
d7847ed23e
Add integer support to scatter_reduce ( #80324 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80324
Approved by: https://github.com/cpuhrsch
2022-06-29 21:10:26 +00:00
Alexander Grund
71d9592a72
Only sync CUDA if the operation is run on GPU ( #80328 )
...
This fixes test failures when PyTorch is build without CUDA
Fixes https://github.com/pytorch/pytorch/issues/58563
I used the same is_cuda check that is used in test_nn.py
CC @ailzhang after #58564
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80328
Approved by: https://github.com/mruberry
2022-06-27 14:49:39 +00:00
Alexander Grund
3b8589ac44
Copy Tensor for tests to avoid in-place transform modifying the original tensor ( #80331 )
...
Fixes #48591
CC @mruberry after #60256
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80331
Approved by: https://github.com/mruberry
2022-06-27 14:47:52 +00:00
lezcano
f54e7b4ad6
More forward AD formulas
...
This PR:
- Corrects the forward AD formula of `torch.sgn`.
- The reason why we can't use `auto_element_wise` for this operation is rather subtle. I left a comment.
- This, in turn, fixes a problem we had in forward-over-backward for `linalg.svd` and other spectral decompositions (and `norm`, `linalg.norm`, `linalg.matrix_norm`) that were using `torch.abs` (whose derivative is given by `torch.sgn`); a minimal forward AD sketch follows this list.
- Implements the formula for a number of missing operations: `nansum`, `amax`, `amin`...
- Simplifies a few formulas, most notably the forward AD for `div` and the derivative of `norm`, `linalg.norm` and `vector_norm` for `ord=+-inf`.
- Corrects the formula for `mean`, `std_mean`, `var_mean` when `dim` is provided and equal to `()` (or `None`).
- A few minor improvements to `sum_backward`, `unsqueeze_multiple` and formulas depending on them.
- Fixes the derivatives of `std_mean` and `std_var` (complex support, ASAN, forward AD...).
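A minimal sketch (not from the PR) of exercising the forward AD formula for `torch.sgn` via the public forward AD API:
```python
import torch
import torch.autograd.forward_ad as fwAD

x = torch.randn(3, dtype=torch.cfloat)
tangent = torch.ones(3, dtype=torch.cfloat)

with fwAD.dual_level():
    dual = fwAD.make_dual(x, tangent)
    out = torch.sgn(dual)                  # uses the forward AD formula for sgn
    primal, jvp = fwAD.unpack_dual(out)    # jvp holds the Jacobian-vector product
```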
Fixes: https://github.com/pytorch/pytorch/issues/67539
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80082
Approved by: https://github.com/zou3519
2022-06-23 01:31:08 +00:00
Alex Hedges
cb2b7b1e57
Fix code that triggers BytesWarning ( #79868 )
...
Fixes #74812 .
I have fixed the multiple instances in the repository that trigger
`BytesWarning`, and I have enabled the `-bb` option when tests are run
to prevent regressions.
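For reference, a minimal illustration (not from the PR) of the kind of code that `-bb` turns into an error:
```python
# Run as: python -bb bytes_warning_demo.py
data = b"payload"

# Implicit str() conversion of bytes emits BytesWarning, which -bb raises as an error.
print("got: %s" % data)
```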
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79868
Approved by: https://github.com/janeyx99
2022-06-21 01:12:21 +00:00
PyTorch MergeBot
e10cbe3880
Revert "Fix BytesWarning in torch.load() ( #74813 )"
...
This reverts commit 6c2e8119dd .
Reverted https://github.com/pytorch/pytorch/pull/74813 on behalf of https://github.com/janeyx99 due to Broke slow tests in cuda 10.2 https://github.com/pytorch/pytorch/runs/6944238177?check_suite_focus=true
2022-06-18 03:53:54 +00:00
Alex Hedges
6c2e8119dd
Fix BytesWarning in torch.load() ( #74813 )
...
Fixes #74812 .
I have enabled the `-bb` option when tests are run to prevent regressions. I don't think it will make CI run more slowly, but I'm not entirely sure.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74813
Approved by: https://github.com/kit1980
2022-06-17 22:56:43 +00:00
drisspg
bdcee8f995
update is_same_size to work with nested tensor dispatch
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79297
Approved by: https://github.com/soulitzer
2022-06-11 00:07:27 +00:00
Brian Hirsh
7b3a0ff87a
Port index.Tensor to structured kernels.
...
Tracking issue: #55070
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69607
Approved by: https://github.com/bdhirsh
2022-06-10 17:27:47 +00:00
Peter Bell
7843a5e882
Move Tensor.grad back into C++
...
`Tensor.grad` was moved to python in #30531 to add a warning. However,
that warning has since been lowered into C++ so this wrapper is no
longer necessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76675
Approved by: https://github.com/albanD
2022-06-10 13:44:45 +00:00
PyTorch MergeBot
4b82ef7928
Revert "Port index.Tensor to structured kernels."
...
This reverts commit cfd84125bd .
Reverted https://github.com/pytorch/pytorch/pull/69607 on behalf of https://github.com/zengk95 due to This is breaking mac trunk tests cfd84125bd
2022-06-08 20:16:10 +00:00
Brian Hirsh
cfd84125bd
Port index.Tensor to structured kernels.
...
Tracking issue: #55070
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69607
Approved by: https://github.com/bdhirsh
2022-06-08 18:17:52 +00:00
Kshiteej K
497ae27050
[chalf] warn once on creating a chalf tensor ( #78245 )
...
`chalf` is experimental as the op coverage is low.
The following script raises 6 warnings with `set_warn_always(True)` and only 1 warning otherwise.
```python
import torch
torch.set_warn_always(True)
device='cpu'
t = torch.randn(3, dtype=torch.chalf, device=device)
y = torch.rand(3, dtype=torch.chalf, device=device)
# Allocates new tensor for result
t + y
device='cuda'
t = torch.randn(3, dtype=torch.chalf, device=device)
y = torch.rand(3, dtype=torch.chalf, device=device)
# Allocates new tensor for result
t + y
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78245
Approved by: https://github.com/anjali411
2022-06-01 18:38:31 +00:00
yuguo68
efdb4192bc
set data permits requires_grad=True on integer tensor
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78436
Approved by: https://github.com/albanD , https://github.com/soulitzer
2022-06-01 15:56:32 +00:00
PyTorch MergeBot
fca1f495c2
Revert "Port index.Tensor to structured kernels."
...
This reverts commit 9fe6f1baf5 .
Reverted https://github.com/pytorch/pytorch/pull/69607 on behalf of https://github.com/suo due to this broke master, see: 9fe6f1baf5
2022-06-01 00:12:15 +00:00
Brian Hirsh
9fe6f1baf5
Port index.Tensor to structured kernels.
...
Tracking issue: #55070
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69607
Approved by: https://github.com/bdhirsh
2022-05-31 22:15:20 +00:00
Kurt Mohler
e9afb43676
Add meta device support to _UntypedStorage and _TypedStorage ( #78008 )
...
Fixes #77885
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78008
Approved by: https://github.com/ezyang
2022-05-28 15:33:45 +00:00
Yu Guo
f69c990ecc
fix index_select when source tensor is empty
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77881
Approved by: https://github.com/ezyang
2022-05-26 03:10:47 +00:00
Kurt Mohler
cecb2ad95e
Restore old names for private funcs in legacy storages ( #77861 )
...
Followup from #75459
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77861
Approved by: https://github.com/ezyang
2022-05-20 02:03:34 +00:00
Eric Sauser
2d4291fb81
[torch] Fixed a few test for Windows & Linux GPUs ( #77531 )
...
Summary:
While running these tests on
- my local Windows GPU machine
- a dev server
- an on-demand GPU
I noticed a few test failures; here are some tentative fixes.
Test Plan:
Ran tests on:
- my local Windows GPU machine
- a Linux dev server without a GPU
- a Linux on-demand GPU server
Note that when using CUDA 11, the tests crash (segfault) on calls to torch.nn.ConvTranspose3d. This fails on master but works with CUDA 10.
Differential Revision: D36377288
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77531
Approved by: https://github.com/ezyang
2022-05-19 14:04:13 +00:00
Kurt Mohler
aea6e2c396
Merge torch.cuda._UntypedStorage into torch._UntypedStorage ( #75459 )
...
Fixes #74933
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75459
Approved by: https://github.com/ezyang
2022-05-19 13:54:39 +00:00
Mikayla Gawarecki
841c65f499
Unprivate _index_reduce and add documentation
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76997
Approved by: https://github.com/cpuhrsch
2022-05-13 19:48:38 +00:00
Kulin Seth
e011a8e18b
Enable PyTorch operations on MPS Backend. ( #77343 )
...
Add PyTorch operations to MPS backend.
- https://github.com/pytorch/pytorch/issues/77394
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77343
Approved by: https://github.com/albanD
2022-05-13 18:28:53 +00:00
Mikayla Gawarecki
1141b45e7a
Index reduction CUDA support
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76296
Approved by: https://github.com/cpuhrsch , https://github.com/ngimel
2022-05-13 14:47:52 +00:00
Christian Puhrsch
ce9a477fdf
Support torch.Tensor.to for CSR
...
Fixes #76379
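A minimal sketch of `Tensor.to` on a CSR tensor (illustrative values):
```python
import torch

crow_indices = torch.tensor([0, 2, 4])
col_indices = torch.tensor([0, 1, 0, 1])
values = torch.tensor([1., 2., 3., 4.])
csr = torch.sparse_csr_tensor(crow_indices, col_indices, values)

csr64 = csr.to(torch.float64)  # dtype conversion now supported for CSR
```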
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76400
Approved by: https://github.com/pearu , https://github.com/davidberard98
2022-05-05 21:59:50 +00:00
Natalia Gimelshein
ce76244200
fix where type promotion
...
Fixes #73298
I don't know whether the `where` kernel actually supports type promotion, nor am I in the mood to find out, so it's manual type promotion.
Edit: nah, I can't tell TI to "promote to common dtype" because of the bool condition, so manual type promotion is our only option.
I'll see what tests start failing and fix them.
Uses some parts from #62084
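A minimal sketch of the promotion this enables (illustrative values):
```python
import torch

cond = torch.tensor([True, False, True])
a = torch.tensor([1, 2, 3], dtype=torch.int64)
b = torch.tensor([0.5, 1.5, 2.5])   # float32

out = torch.where(cond, a, b)       # dtypes are promoted to a common type
print(out.dtype)                    # torch.float32
```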
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76691
Approved by: https://github.com/mruberry
2022-05-03 04:40:04 +00:00
kshitij12345
e36d25fbae
[complex32] support printing the tensor
...
Reference: #74537
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76614
Approved by: https://github.com/anjali411
2022-05-01 12:46:09 +00:00
Mikayla Gawarecki
676a4a3969
Prototype _index_reduce (CPU-only)
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75981
Approved by: https://github.com/cpuhrsch
2022-04-27 23:01:00 +00:00
kshitij12345
aa51704ce5
[complex32] add chalf alias for complex32 and chalf method
...
Reference: https://github.com/pytorch/pytorch/issues/74537
Adds a chalf alias for complex32 and also adds a `chalf` method, similar to `cfloat` and `cdouble`
TODO:
* [x] Add docs
* [x] Add override
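A minimal sketch of the alias and the method:
```python
import torch

x = torch.randn(3, dtype=torch.cfloat)
y = x.chalf()                          # method, analogous to .cfloat()/.cdouble()
z = torch.randn(3, dtype=torch.chalf)  # torch.chalf aliases torch.complex32
```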
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75320
Approved by: https://github.com/anjali411
2022-04-20 23:44:47 +00:00
Edward Z. Yang
ee955b8bb9
Cannibalize noarch CI job into crossref CI job
...
crossref is a new strategy for performing tests when you want
to run a normal PyTorch API call, separately run some variation of
the API call (e.g., same thing but all the arguments are meta tensors)
and then cross-reference the results to see that they are consistent.
Any logic you add to CrossRefMode will get run on *every* PyTorch API
call that is called in the course of PyTorch's test suite. This can
be a good choice for correctness testing if OpInfo testing is not
exhaustive enough.
For now, the crossref test doesn't do anything except verify that
we can validly push a mode onto the torch function mode stack for all
functions.
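For illustration, a minimal torch function mode (not the actual CrossRefMode) that intercepts every torch.* call:
```python
import torch
from torch.overrides import TorchFunctionMode

class LoggingMode(TorchFunctionMode):
    def __torch_function__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        print(f"calling {getattr(func, '__name__', func)}")
        return func(*args, **kwargs)

with LoggingMode():
    torch.add(torch.ones(2), torch.ones(2))
```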
Signed-off-by: Edward Z. Yang <ezyangfb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75988
Approved by: https://github.com/seemethere
2022-04-20 11:56:25 +00:00
Edward Z. Yang
30943d1610
Remove noarchTest decorator
...
These tests are cheap so it doesn't matter if we run them on all
configs. This is in preparation for removing the noarch build
configuration entirely.
Signed-off-by: Edward Z. Yang <ezyangfb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75985
Approved by: https://github.com/seemethere , https://github.com/cbalioglu
2022-04-19 00:48:49 +00:00
Beilei Zheng
332086c08d
Add BFloat16 support for multinomial and poisson on CPU
...
Add BFloat16 support for multinomial and poisson on CPU
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63215
Approved by: https://github.com/frank-wei , https://github.com/bigfootjon
2022-04-14 15:42:18 +00:00
Jagadish Krishnamoorthy
26ba7a9297
ROCm: Enable test_masked_scatter_large_tensor
...
#68487 fixes the issue #60190 for ROCm >= 5.0 release.
Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>
Fixes #60190
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75455
Approved by: https://github.com/ezyang
2022-04-08 15:59:40 +00:00
Nikita Karetnikov
936a65056e
Use the same checks in all grid_sampler functions
...
Fixes #73187 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75164
Approved by: https://github.com/albanD
2022-04-04 15:21:44 +00:00
Alban Desmaison
0ce02ea52d
Revert D35284563: Use the same checks in all grid_sampler functions
...
Test Plan: revert-hammer
Differential Revision:
D35284563 (835cc66e5d )
Original commit changeset: 1477c506b875
Original Phabricator Diff: D35284563 (835cc66e5d )
fbshipit-source-id: 7260f4dfda23bd60200e5ba2c5bf3e4f833c2646
(cherry picked from commit fbe082905ef678e7dd70dbc9520dca644383ce01)
2022-04-01 16:45:46 +00:00
kshitij12345
65b65af236
[complex32] cat, fill_(partial), item
...
Reference : #74537
`cat_backwards` (on CUDA) requires support for `fill`, so support for `fill` has been added (and `fill` in turn requires `item` support).
The backward of `fill` requires `sum`, which will be added in a later PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75010
Approved by: https://github.com/anjali411
2022-04-01 15:19:05 +00:00
Nikita Karetnikov
835cc66e5d
Use the same checks in all grid_sampler functions ( #74635 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74635
Fixes #73187 .
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D35284563
Pulled By: albanD
fbshipit-source-id: 1477c506b8755d864ca902ee140bee7bdb0069b0
(cherry picked from commit dcbd5242baaae11f9e323d99a9596e5b88e86bd7)
2022-04-01 14:26:16 +00:00
Mikayla Gawarecki
2bfa018462
[BC-breaking] Use ScatterGatherKernel for scatter_reduce (CPU-only) ( #74226 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74226
Update signature of `scatter_reduce_` to match `scatter_/scatter_add_`
`Tensor.scatter_reduce_(int64 dim, Tensor index, Tensor src, str reduce)`
- Add new reduction options in ScatterGatherKernel.cpp and update `scatter_reduce` to call into the cpu kernel for `scatter.reduce`
- `scatter_reduce` now has the same shape constraints as `scatter_` and `scatter_add_`
- Migrate `test/test_torch.py:test_scatter_reduce` to `test/test_scatter_gather_ops.py`
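A minimal sketch of the updated signature (illustrative values):
```python
import torch

src = torch.randn(5)
index = torch.tensor([0, 1, 0, 1, 2])
out = torch.zeros(3)

# Matches scatter_/scatter_add_: (dim, index, src) plus the reduce string.
out.scatter_reduce_(0, index, src, reduce="sum")
```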
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D35222842
Pulled By: mikaylagawarecki
fbshipit-source-id: 84930add2ad30baf872c495251373313cb7428bd
(cherry picked from commit 1b45139482e22eb0dc8b6aec2a7b25a4b58e31df)
2022-04-01 05:57:45 +00:00
Nikita Shulga
bfac65dfe5
[testing] Update dispatch macros ( #74977 )
...
This PR is reland of #74289
Co-authored-by: Khushi Agrawal <khushiagrawal411@gmail.com>
2022-03-30 14:13:21 -07:00
PyTorch MergeBot
2e4152b118
Revert "[testing] Update dispatch macros"
...
This reverts commit eed19a0f38 .
Reverted https://github.com/pytorch/pytorch/pull/74289 on behalf of https://github.com/malfet
2022-03-30 19:52:37 +00:00
Khushi Agrawal
eed19a0f38
[testing] Update dispatch macros
...
Hi,
This PR is the follow-up to #71561 . (The previous PR had a couple of merge conflicts and was reverted; this PR resolves that.)
Please take a look. Thanks!
cc: @pmeier @mruberry @kshitij12345
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74289
Approved by: https://github.com/pmeier , https://github.com/mruberry
2022-03-30 16:10:16 +00:00
Edward Z. Yang
51e7a3406c
Fix formatting of scalar tensors (don't call item)
...
Signed-off-by: Edward Z. Yang <ezyangfb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74376
Approved by: https://github.com/bdhirsh
2022-03-25 02:22:25 +00:00
Jane Xu
3f9115dc7a
Decorate test_pdist_large for requiring large memory ( #74574 )
...
Summary:
Fixes https://github.com/pytorch/pytorch/issues/74154
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74574
Reviewed By: george-qi
Differential Revision: D35100229
Pulled By: janeyx99
fbshipit-source-id: d7df377318e45c7f5447c034aa025b1422fcc06e
(cherry picked from commit 335a76d9f2a721b30e1b9e1c869bfbe431f01a2a)
2022-03-24 17:25:37 +00:00
kshitij12345
f7ee308dfb
[complex-half] support casting (by updating copy_)
...
Reference https://github.com/pytorch/pytorch/issues/71680
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73847
Approved by: https://github.com/anjali411
2022-03-23 21:42:59 +00:00
Kurt Mohler
79ddc72b85
Virtualize <type>Storage classes ( #66970 )
...
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66228
cc ezyang bhosmer smessmer ljk53 bdhirsh
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66970
Reviewed By: bdhirsh
Differential Revision: D33245612
Pulled By: ezyang
fbshipit-source-id: 4c61c2cb029e2b94b0e68927c377d3e1c358dd7c
(cherry picked from commit d29fcdfb4bc2cc17b1795d4349e4b56fa0d1cf12)
2022-03-22 23:44:48 +00:00
Saketh Are
46a88036af
Refactor error input tests in test_torch.py to OpInfos ( #73981 )
...
Summary:
This PR ports several tests in `test/test_torch.py` over to OpInfo ErrorInputs.
Some tests commented "convert to ErrorInputs" still remain in `test_torch.py`. They fall under two categories:
- Memory overlap tests which specifically test the in-place version of an operator (e.g. [this test](424a054d53/test/test_torch.py (L3788) ) for index_add_).
- Tests with non-trivial behavior calling `torch.cuda.synchronize()` after calling the operator being tested (e.g. [this test](424a054d53/test/test_torch.py (L4948) ) for torch.multinomial).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73981
Reviewed By: qihqi
Differential Revision: D35016669
Pulled By: saketh-are
fbshipit-source-id: bc0016d2b2bfb566a9dfef81ecf44e0adb9e4b14
(cherry picked from commit 99bcbdb05f2c10a717a269b0010aa3a3e24fe5c0)
2022-03-21 22:31:37 +00:00
Nikita Shulga
ef066f0832
Revert D34856571: [pytorch][PR] Replace get_all_ type macros with the ATen dispatch macros.
...
Test Plan: revert-hammer
Differential Revision:
D34856571 (3ded7b1da3 )
Original commit changeset: 0dca038bcad5
Original Phabricator Diff: D34856571 (3ded7b1da3 )
fbshipit-source-id: 594553fa0b710d78beba59d5d2b646f1f1270386
(cherry picked from commit 8090eb9b12dcf452a9e7dc01792a66fb91b563b6)
2022-03-15 22:07:11 +00:00
Khushi Agrawal
3ded7b1da3
Replace get_all_ type macros with the ATen dispatch macros. ( #71561 )
...
Summary:
Hi, Team!
The PR is motivated by https://github.com/pytorch/pytorch/pull/71153#discussion_r782446738 . It aims to replace `get_all` type macros with the ATen dispatch macros.
The files it iterates over are: (Thanks, Lezcano, for the idea!!)
<details>
<summary>
`test/test_autograd.py`</summary>
<p>
```python
43:from torch.testing._internal.common_dtype import get_all_dtypes
8506: floating_dt = [dt for dt in get_all_dtypes() if dt.is_floating_point]
```
</p>
</details>
<details>
<summary>
`test/test_binary_ufuncs.py`</summary>
<p>
```python
26: all_types_and_complex_and, integral_types_and, get_all_dtypes, get_all_int_dtypes, get_all_math_dtypes,
27: get_all_complex_dtypes, get_all_fp_dtypes,
935: dtypes(*get_all_dtypes(include_bool=False, include_complex=False))
1035: dtypes(*get_all_dtypes(
1488: dtypes(*(get_all_dtypes(include_bool=False, include_bfloat16=False)))
1879: dtypes(*product(get_all_dtypes(include_complex=False), get_all_dtypes(include_complex=False)))
1887: dtypes(*(get_all_int_dtypes() + [torch.bool]))
1913: dtypes(*(get_all_fp_dtypes()))
1941: dtypes(*(get_all_fp_dtypes()))
1977: dtypes(*product(get_all_complex_dtypes(), get_all_dtypes()))
2019: dtypes(*product(get_all_fp_dtypes(), get_all_fp_dtypes()))
2048: dtypes(*get_all_dtypes())
2110: dtypes(*product(get_all_dtypes(include_complex=False),
2111: get_all_dtypes(include_complex=False)))
2128: types = [torch.bool, torch.bfloat16] + get_all_int_dtypes()
2173: if dtypes[1] in get_all_fp_dtypes():
2178: dtypes(*product(get_all_fp_dtypes(),
2179: get_all_fp_dtypes()))
2260: dtypesIfCUDA(*set(get_all_math_dtypes('cuda')) - {torch.complex64, torch.complex128})
2261: dtypes(*set(get_all_math_dtypes('cpu')) - {torch.complex64, torch.complex128})
2273: dtypesIfCUDA(*set(get_all_math_dtypes('cuda')) - {torch.complex64, torch.complex128})
2274: dtypes(*set(get_all_math_dtypes('cpu')) - {torch.complex64, torch.complex128})
2307: dtypes(*get_all_math_dtypes('cpu'))
2319: dtypes(*get_all_fp_dtypes(include_bfloat16=False))
2331: dtypes(*get_all_int_dtypes())
2356: dtypes(*get_all_dtypes(include_bfloat16=False, include_bool=False, include_complex=False))
2393: if dtype in get_all_int_dtypes():
2614: dtypes(*get_all_dtypes())
2624: dtypes(*tuple(itertools.combinations_with_replacement(get_all_dtypes(), 2)))
2806: dtypes(*list(product(get_all_dtypes(include_complex=False),
2807: get_all_dtypes(include_complex=False))))
2866: dtypes(*list(product(get_all_complex_dtypes(),
2867: get_all_complex_dtypes())))
2902: dtypes(*product(get_all_dtypes(), get_all_dtypes()))
2906: dtypes(*product(get_all_dtypes(), get_all_dtypes()))
2910: dtypes(*product(get_all_dtypes(), get_all_dtypes()))
3019: dtypes = [torch.float, torch.double] + get_all_complex_dtypes()
3221: dtypes(*get_all_dtypes(include_complex=False))
3407: dtypes(*list(product(get_all_dtypes(include_bool=False),
3408: get_all_dtypes(include_bool=False))))
3504: dtypes(*product(get_all_dtypes(include_complex=False, include_bfloat16=False),
3505: get_all_dtypes(include_complex=False, include_bfloat16=False)))
3516: if x.dtype in get_all_int_dtypes() + [torch.bool]:
3643: dtypes(*product(get_all_dtypes(include_complex=False,
3645: get_all_dtypes(include_complex=False,
```
</p>
</details>
<details>
<summary>
`test/test_complex.py`</summary>
<p>
```python
6:from torch.testing._internal.common_dtype import get_all_complex_dtypes
11: dtypes(*get_all_complex_dtypes())
```
</p>
</details>
<details>
<summary>
`test/test_foreach.py`</summary>
<p>
```python
18: get_all_dtypes, get_all_int_dtypes, get_all_complex_dtypes, get_all_fp_dtypes,
142: if dtype in get_all_int_dtypes():
179: disable_fastpath = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool]
201: disable_fastpath = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool]
205: disable_fastpath |= dtype in get_all_int_dtypes() + [torch.bool]
211: disable_fastpath |= dtype not in get_all_complex_dtypes()
241: bool_int_div = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool]
246: disable_fastpath |= dtype in get_all_int_dtypes() + [torch.bool]
248: disable_fastpath |= dtype not in get_all_complex_dtypes()
250: disable_fastpath |= True and dtype not in get_all_complex_dtypes()
307: disable_fastpath = dtype in get_all_int_dtypes() + [torch.bool]
365: if opinfo.name == "_foreach_abs" and dtype in get_all_complex_dtypes():
376: ops(foreach_unary_op_db, dtypes=get_all_dtypes())
393: dtypes=get_all_dtypes(include_half=True, include_bfloat16=True, include_complex=False))
401: ops(foreach_minmax_op_db, dtypes=get_all_fp_dtypes(include_bfloat16=True, include_half=True))
426: if ord in (1, 2) and dtype in torch.testing.get_all_fp_dtypes():
439: dtypes(*get_all_dtypes())
449: ops(foreach_binary_op_db, dtypes=get_all_dtypes())
481: ops(foreach_binary_op_db, dtypes=get_all_dtypes())
536: if dtype in get_all_int_dtypes() + [torch.bool] and foreach_op == torch._foreach_div:
545: ops(foreach_binary_op_db, dtypes=get_all_dtypes())
637: ops(foreach_pointwise_op_db, allowed_dtypes=get_all_fp_dtypes(include_half=False, include_bfloat16=False))
```
</p>
</details>
<details>
<summary>
`test/test_linalg.py`</summary>
<p>
```python
29: all_types, floating_types, floating_and_complex_types, get_all_dtypes, get_all_int_dtypes, get_all_complex_dtypes,
30: get_all_fp_dtypes,
111: dtypes(*(get_all_dtypes()))
794: float_and_complex_dtypes = get_all_fp_dtypes() + get_all_complex_dtypes()
807: dtypes(*(get_all_int_dtypes()))
828: dtypes(*(get_all_fp_dtypes() + get_all_complex_dtypes()))
841: if dtype in get_all_complex_dtypes():
844: dtypes(*itertools.product(get_all_dtypes(),
845: get_all_dtypes()))
855: for dtypes0, dtypes1, dtypes2 in product(get_all_dtypes(), repeat=3):
5607: *get_all_fp_dtypes(include_half=not CUDA9, include_bfloat16=(CUDA11OrLater and SM53OrLater)))
5608: dtypes(*(set(get_all_dtypes()) - {torch.half, torch.bool}))
5644: dtypes(*(get_all_complex_dtypes() + get_all_fp_dtypes()))
6255: dtypesIfCUDA(*get_all_complex_dtypes(),
6256: *get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater)),
6292: dtypesIfCUDA(*get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater))))
6323: dtypesIfCUDA(*get_all_complex_dtypes(),
6324: *get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater))))
6325: dtypes(*get_all_complex_dtypes(), *get_all_fp_dtypes())
6358: dtypesIfCUDA(*([torch.float, torch.double] + get_all_complex_dtypes()))
6556: dtypes(*get_all_fp_dtypes(), *get_all_complex_dtypes())
6668: dtypes(*get_all_fp_dtypes(), *get_all_complex_dtypes())
6741: dtypes(*get_all_fp_dtypes(), *get_all_complex_dtypes())
```
</p>
</details>
<details>
<summary>
`test/test_nn.py`</summary>
<p>
```python
37:from torch.testing._internal.common_dtype import integral_types, get_all_fp_dtypes, get_all_math_dtypes
50: onlyNativeDeviceTypes, deviceCountAtLeast, largeTensorTest, expectedFailureMeta, skipMeta, get_all_device_types, \
8862: for device in get_all_device_types():
9629: for dt1 in get_all_math_dtypes(device):
9630: for dt2 in get_all_math_dtypes(device):
9631: for dt3 in get_all_math_dtypes(device):
9648: for input_dtype in get_all_math_dtypes(device):
9664: for input_dtype in get_all_math_dtypes(device):
13015: dtypes(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM))
13034: dtypes(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM))
13159: dtypes(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM))
17400: dtypesIfCUDA(*get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM))
17768: dtypesIfCUDA(*get_all_fp_dtypes())
17773: dtypesIfCUDA(*get_all_fp_dtypes())
17778: dtypesIfCUDA(*get_all_fp_dtypes())
17783: dtypesIfCUDA(*get_all_fp_dtypes())
17788: dtypesIfCUDA(*get_all_fp_dtypes())
17793: dtypesIfCUDA(*get_all_fp_dtypes())
17798: dtypesIfCUDA(*get_all_fp_dtypes())
17963: dtypesIfCUDA(*get_all_fp_dtypes())
17977: dtypesIfCUDA(*get_all_fp_dtypes())
18684: def test_cross_entropy_loss_prob_target_all_reductions(self, device):
```
</p>
</details>
<details>
<summary>
`test/test_numpy_interop.py`</summary>
<p>
```python
12:from torch.testing._internal.common_dtype import get_all_dtypes
399: dtypes(*get_all_dtypes())
```
</p>
</details>
<details>
<summary>
`test/test_ops.py`</summary>
<p>
```python
12:from torch.testing._internal.common_dtype import floating_and_complex_types_and, get_all_dtypes
86: for dtype in get_all_dtypes():
```
</p>
</details>
<details>
<summary>
`test/test_reductions.py`</summary>
<p>
```python
16: get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_complex_dtypes, get_all_fp_dtypes,
360: allowed_dtypes=get_all_dtypes(include_bfloat16=False))
366: allowed_dtypes=get_all_dtypes(include_bfloat16=False))
394: allowed_dtypes=get_all_dtypes(include_bfloat16=False))
750: for dtype in [dtype for dtype in get_all_math_dtypes('cpu') if dtype != torch.float16]:
1404: dtypes(*get_all_dtypes(include_bool=False, include_complex=False))
1457: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1458: get_all_complex_dtypes()))
1465: return dtype in get_all_int_dtypes()
1494: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False)))
1501: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False)))
1507: dtypes(*(get_all_complex_dtypes()))
1514: dtypes = list(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False))
1523: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False)))
1531: if dtype in get_all_fp_dtypes():
1608: dtypes(*(get_all_dtypes(include_half=True, include_bfloat16=False,
1837: dtypes(*get_all_dtypes(include_bool=False, include_complex=False))
1855: dtypes(*(set(get_all_dtypes(include_bool=False, include_complex=False)) - {torch.uint8}))
3219: for dtype in get_all_dtypes(include_half=True, include_bfloat16=False,
```
</p>
</details>
<details>
<summary>
`test/test_serialization.py`</summary>
<p>
```python
26:from torch.testing._internal.common_dtype import get_all_dtypes
586: for device, dtype in product(devices, get_all_dtypes()):
589: for other_dtype in get_all_dtypes():
```
</p>
</details>
<details>
<summary>
`test/test_shape_ops.py`</summary>
<p>
```python
18:from torch.testing._internal.common_dtype import get_all_dtypes
230: dtypes(*get_all_dtypes(include_complex=False, include_bool=False, include_half=False,
232: dtypesIfCUDA(*get_all_dtypes(include_complex=False, include_bool=False, include_bfloat16=False))
344: dtypes(*get_all_dtypes())
443: dtypes(*get_all_dtypes())
461: dtypes(*get_all_dtypes())
570: dtypes(*get_all_dtypes(include_complex=False))
```
</p>
</details>
<details>
<summary>
`test/test_sort_and_select.py`</summary>
<p>
```python
12: all_types, all_types_and, floating_types_and, get_all_dtypes, get_all_int_dtypes, get_all_fp_dtypes,
136: dtypes(*set(get_all_dtypes()) - {torch.bool, torch.complex64, torch.complex128})
231: dtypes(*set(get_all_dtypes()) - {torch.bool, torch.complex64, torch.complex128})
296: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
647: dtypesIfCUDA(*get_all_fp_dtypes())
678: dtypesIfCUDA(*(get_all_dtypes(include_complex=False,
682: dtypes(*(get_all_dtypes(include_complex=False, include_bool=False, include_half=False, include_bfloat16=False)))
739: dtypesIfCPU(*set(get_all_dtypes()) - {torch.complex64, torch.complex128})
740: dtypes(*set(get_all_dtypes()) - {torch.bfloat16, torch.complex64, torch.complex128})
799: dtypesIfCPU(*set(get_all_dtypes()) - {torch.complex64, torch.complex128})
800: dtypes(*set(get_all_dtypes()) - {torch.bfloat16, torch.complex64, torch.complex128})
```
</p>
</details>
<details>
<summary>
`test/test_sparse.py`</summary>
<p>
```python
20:from torch.testing import get_all_complex_dtypes, get_all_fp_dtypes
29: floating_and_complex_types, floating_and_complex_types_and, get_all_dtypes, get_all_int_dtypes,
1963: return dtype in get_all_int_dtypes()
1994: dtypes(*get_all_dtypes(include_bool=False, include_half=False,
2103: return dtype in get_all_int_dtypes()
2138: dtypes(*get_all_dtypes(include_bool=False, include_half=False,
2626: all_sparse_dtypes = get_all_dtypes(include_complex=True)
2633: all_sparse_dtypes = get_all_dtypes(include_complex=True)
3230: dtypes(*get_all_complex_dtypes(),
3231: *get_all_fp_dtypes(include_half=False, include_bfloat16=False))
3234: *get_all_fp_dtypes(
```
</p>
</details>
<details>
<summary>
`test/test_sparse_csr.py`</summary>
<p>
```python
7:from torch.testing import get_all_complex_dtypes, get_all_fp_dtypes, floating_and_complex_types, make_tensor
17:from torch.testing._internal.common_dtype import floating_types, get_all_dtypes
120: dtypes(*get_all_dtypes())
133: dtypes(*get_all_dtypes())
150: dtypes(*get_all_dtypes())
180: dtypes(*get_all_dtypes())
201: dtypes(*get_all_dtypes())
210: dtypes(*get_all_dtypes())
225: dtypes(*get_all_dtypes())
244: dtypes(*get_all_dtypes())
263: dtypes(*get_all_dtypes())
285: dtypes(*get_all_dtypes())
411: dtypes(*get_all_dtypes())
482: dtypes(*get_all_dtypes())
502: dtypes(*get_all_dtypes())
562: dtypes(*get_all_dtypes())
588: dtypesIfCUDA(*get_all_complex_dtypes(),
589: *get_all_fp_dtypes(include_half=SM53OrLater, include_bfloat16=SM80OrLater))
745: dtypesIfCUDA(*get_all_complex_dtypes(),
746: *get_all_fp_dtypes(include_half=SM53OrLater and TEST_CUSPARSE_GENERIC,
765: dtypesIfCUDA(*get_all_complex_dtypes(),
766: *get_all_fp_dtypes(include_half=SM53OrLater and TEST_CUSPARSE_GENERIC,
801: *torch.testing.get_all_fp_dtypes(include_bfloat16=SM80OrLater,
841: *torch.testing.get_all_fp_dtypes(include_bfloat16=SM80OrLater,
1182: dtypes(*get_all_dtypes())
1276: dtypes(*get_all_dtypes(include_bool=False, include_half=False, include_bfloat16=False))
1286: dtypes(*get_all_dtypes())
```
</p>
</details>
<details>
<summary>
`test/test_tensor_creation_ops.py`</summary>
<p>
```python
21: onlyCUDA, skipCPUIf, dtypesIfCUDA, skipMeta, get_all_device_types)
23: get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes
150: for dt in get_all_dtypes():
160: for dt in get_all_dtypes():
314: dtypes = [dtype for dtype in get_all_dtypes() if dtype != torch.bfloat16]
1012: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1013: get_all_complex_dtypes()))
1032: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1033: get_all_complex_dtypes()))
1050: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1051: get_all_complex_dtypes()))
1745: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
1779: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
1868: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
1926: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
1954: do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, torch_device)
1956: do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, None)
1957: do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, torch_device)
2538: for device in get_all_device_types():
2645: for dtype in get_all_dtypes():
2678: dtypes(*(get_all_fp_dtypes(include_half=False, include_bfloat16=False) +
2679: get_all_complex_dtypes()))
2716: dtypes(*get_all_fp_dtypes(include_half=False, include_bfloat16=False))
2827: for dt in get_all_dtypes():
2913: dtypes(*get_all_dtypes(include_bool=False, include_half=False))
2914: dtypesIfCUDA(*get_all_dtypes(include_bool=False, include_half=True))
3028: dtypes(*(get_all_fp_dtypes() + get_all_complex_dtypes()))
3033: dtypes(*(get_all_fp_dtypes() + get_all_complex_dtypes()))
3074: dtypes(*get_all_dtypes(include_bool=False, include_half=False, include_complex=False))
3075: dtypesIfCUDA(*((get_all_int_dtypes() + [torch.float32, torch.float16, torch.bfloat16])
3077: else get_all_dtypes(include_bool=False, include_half=True, include_complex=False)))
3873: dtypes(*get_all_dtypes())
3884: dtypes(*get_all_dtypes(include_bool=False))
3916: for other in get_all_dtypes():
3922: dtypes(*get_all_dtypes())
3932: dtypes(*get_all_dtypes(include_bool=False))
3955: dtypes(*get_all_dtypes(include_bool=False))
3961: dtypes(*get_all_dtypes(include_bool=False))
3965: dtypes(*get_all_dtypes())
```
</p>
</details>
<details>
<summary>
`test/test_testing.py`</summary>
<p>
```python
25:from torch.testing._internal.common_dtype import get_all_dtypes
31: dtypes(*(get_all_dtypes(include_half=True, include_bfloat16=False,
```
</p>
</details>
<details>
<summary>
`test/test_torch.py`</summary>
<p>
```python
51: expectedAlertNondeterministic, get_all_device_types, skipXLA)
57: get_all_fp_dtypes, get_all_int_dtypes, get_all_math_dtypes, get_all_dtypes, get_all_complex_dtypes
296: for d in get_all_device_types():
323: for device in get_all_device_types():
324: for dt1 in get_all_dtypes():
325: for dt2 in get_all_dtypes():
343: all_dtypes = get_all_dtypes()
350: all_dtypes = get_all_dtypes()
781: for dtype in get_all_dtypes():
986: for device in get_all_device_types():
1017: for device in get_all_device_types():
1018: for dtype in get_all_math_dtypes(device):
2792: for device in get_all_device_types():
3186: dtypes(*get_all_dtypes())
3195: for error_dtype in get_all_dtypes():
3203: dtypes(*get_all_dtypes())
3212: for error_dtype in get_all_dtypes():
4539: dtypes(*get_all_fp_dtypes())
4545: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
4577: dtypes(*get_all_fp_dtypes(include_half=False, include_bfloat16=False))
4578: dtypesIfCPU(*(get_all_fp_dtypes(include_half=False, include_bfloat16=True)))
4579: dtypesIfCUDA(*(get_all_fp_dtypes(include_bfloat16=False)))
4599: dtypes(*(get_all_fp_dtypes(include_half=False, include_bfloat16=False)))
4600: dtypesIfCPU(*(get_all_dtypes(include_half=False, include_bfloat16=False, include_complex=False)))
4601: dtypesIfCUDA(*(get_all_dtypes(include_bfloat16=False, include_complex=False)))
4613: for p_dtype in get_all_fp_dtypes(include_half=device.startswith('cuda'), include_bfloat16=False):
4628: dtypes(*(get_all_fp_dtypes(include_half=False, include_bfloat16=False)))
4629: dtypesIfCUDA(*(get_all_fp_dtypes(include_bfloat16=False)))
4640: dtypes(*get_all_fp_dtypes())
4723: dtypes(*get_all_fp_dtypes())
4735: dtypes(*get_all_fp_dtypes(include_bfloat16=False))
4736: dtypesIfCUDA(*get_all_fp_dtypes())
4747: dtypes(*get_all_fp_dtypes())
4761: dtypes(*get_all_fp_dtypes())
4771: dtypes(*get_all_fp_dtypes())
4792: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
5302: dtypes(*get_all_dtypes(include_bfloat16=False))
5322: dtypes(*get_all_dtypes(include_half=False, include_bfloat16=False))
5323: dtypesIfCPU(*get_all_dtypes(include_bfloat16=False))
5324: dtypesIfCUDA(*get_all_dtypes(include_bfloat16=False))
5591: for dt in get_all_dtypes():
5611: for dt in get_all_dtypes():
5678: for dt in get_all_dtypes():
5696: dtypesIfCUDA(*set(get_all_math_dtypes('cuda')))
5697: dtypes(*set(get_all_math_dtypes('cpu')))
5746: dtypes(*get_all_dtypes())
5780: dtypes(*get_all_dtypes())
5885: dtypes(*get_all_dtypes())
5902: dtypes(*get_all_dtypes())
5945: dtypes(*get_all_dtypes())
5979: dtypes(*get_all_dtypes(include_bool=False))
6049: dtypes(*get_all_dtypes(include_bool=False))
6092: dtypes(*(get_all_fp_dtypes(include_bfloat16=False, include_half=False) +
6093: get_all_complex_dtypes()))
6094: dtypesIfCPU(*get_all_dtypes())
6095: dtypesIfCUDA(*get_all_dtypes())
6122: dtypes(*(get_all_fp_dtypes(include_bfloat16=False, include_half=False) +
6123: get_all_complex_dtypes()))
6124: dtypesIfCPU(*get_all_dtypes())
6125: dtypesIfCUDA(*get_all_dtypes())
6163: dtypes(*(get_all_fp_dtypes(include_bfloat16=False, include_half=False) +
6164: get_all_complex_dtypes()))
6165: dtypesIfCPU(*get_all_dtypes())
6166: dtypesIfCUDA(*get_all_dtypes())
6190: dtypes(*(get_all_complex_dtypes() +
6191: get_all_int_dtypes()))
6238: dtypes(*get_all_dtypes())
6323: dtypes(*get_all_dtypes())
6389: dtypes(*product(get_all_dtypes(), (torch.uint8, torch.bool)))
6699: dtypesIfCUDA(*set(get_all_math_dtypes('cuda')))
6700: dtypes(*set(get_all_math_dtypes('cpu')))
7452: dtypes(*get_all_dtypes(include_bool=False))
7461: dtypes(*get_all_dtypes(include_bool=False))
7477: dtypes(*get_all_dtypes(include_bool=False))
7496: dtypes(*get_all_dtypes(include_bool=False))
7538: dtypes(*get_all_dtypes(include_bool=False))
8162: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes() +
8163: get_all_complex_dtypes()))
8175: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes() +
8176: get_all_complex_dtypes()))
```
</p>
</details>
<details>
<summary>
`test/test_type_promotion.py`</summary>
<p>
```python
14: get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_fp_dtypes
187: for dtype in get_all_dtypes():
262: dtypes1 = get_all_math_dtypes('cuda')
263: dtypes2 = get_all_math_dtypes(device)
339: dtypes(*itertools.product(get_all_dtypes(), get_all_dtypes()))
468: for dt1 in get_all_math_dtypes(device):
469: for dt2 in get_all_math_dtypes(device):
519: for dt1 in get_all_math_dtypes(device):
520: for dt2 in get_all_math_dtypes(device):
528: for dt in get_all_math_dtypes(device):
561: for dtype in get_all_dtypes():
766: dtypes=get_all_math_dtypes(device))
771: dtypes=get_all_math_dtypes(device))
782: dtypes=get_all_math_dtypes(device))
879: dtypes = get_all_dtypes(include_bfloat16=False)
898: dtypes = get_all_dtypes(include_bfloat16=False, include_bool=False)
965: dtypesIfCUDA(*itertools.product(get_all_dtypes(include_bfloat16=False, include_complex=False),
966: get_all_dtypes(include_bfloat16=False, include_complex=False)))
967: dtypes(*itertools.product(get_all_dtypes(include_half=False, include_bfloat16=False,
969: get_all_dtypes(include_half=False, include_bfloat16=False,
976: return dtype in get_all_int_dtypes() + [torch.bool]
979: return dtype in get_all_fp_dtypes(include_half=True, include_bfloat16=False)
```
</p>
</details>
<details>
<summary>
`test/test_unary_ufuncs.py`</summary>
<p>
```python
24: floating_types_and, all_types_and_complex_and, floating_and_complex_types_and, get_all_dtypes, get_all_math_dtypes,
25: get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes
517: dtypes(*(get_all_int_dtypes() + [torch.bool] +
518: get_all_fp_dtypes(include_bfloat16=False)))
596: dtypes(*get_all_fp_dtypes(include_half=True, include_bfloat16=False))
611: invalid_input_dtypes = get_all_int_dtypes() + \
612: get_all_complex_dtypes() + \
619: for dtype in get_all_fp_dtypes(include_half=True, include_bfloat16=False):
1048: dtypes(*get_all_math_dtypes('cpu'))
1182: dtypesIfCUDA(*get_all_fp_dtypes())
1190: dtypesIfCUDA(*get_all_fp_dtypes())
1205: dtypesIfCUDA(*get_all_fp_dtypes())
1215: dtypesIfCUDA(*get_all_fp_dtypes())
1307: dtypes(*(get_all_dtypes(include_bool=False)))
1349: dtypes(*(get_all_fp_dtypes(include_half=False) +
1350: get_all_complex_dtypes()))
1351: dtypesIfCUDA(*(get_all_fp_dtypes(include_half=True) +
1352: get_all_complex_dtypes()))
```
</p>
</details>
<details>
<summary>
`test/test_view_ops.py`</summary>
<p>
```python
19: get_all_dtypes, get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes
124: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
131: dtypes(*get_all_dtypes(include_bfloat16=False))
213: for view_dtype in [*get_all_fp_dtypes(), *get_all_complex_dtypes()]:
220: dtypes(*get_all_dtypes())
224: for view_dtype in get_all_dtypes():
305: dtypes(*get_all_complex_dtypes(include_complex32=True))
343: dtypes(*get_all_dtypes())
354: dtypes(*get_all_dtypes())
364: dtypes(*get_all_dtypes())
374: dtypes(*get_all_dtypes())
384: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes()))
395: dtypes(*get_all_complex_dtypes())
426: dtypes(*get_all_complex_dtypes())
451: dtypes(*product(get_all_complex_dtypes(), get_all_dtypes()))
1263: dtypes(*(torch.testing.get_all_dtypes()))
1279: dtypes(*(torch.testing.get_all_dtypes()))
1405: dtypes(*(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) +
1406: get_all_complex_dtypes()))
1471: dtypes(*get_all_dtypes(include_bfloat16=False))
1574: dtypes(*get_all_dtypes())
1601: dtypes(*get_all_dtypes(include_bfloat16=False))
1632: dtypes(*get_all_dtypes(include_bfloat16=False))
1711: for dt in get_all_dtypes():
1717: for dt in get_all_dtypes():
1724: for dt in get_all_dtypes():
```
</p>
</details>
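For orientation, a minimal sketch of the internal dtype helpers that the decorators above resolve to (assuming `torch.testing._internal.common_dtype` remains their supported home; the keyword arguments are taken from the calls listed above):
```python
# Illustrative only: the internal dtype helpers referenced throughout these listings.
from torch.testing._internal.common_dtype import get_all_dtypes, get_all_fp_dtypes

# All dtypes except complex and bool, as used by several decorators above.
print(get_all_dtypes(include_complex=False, include_bool=False))

# Floating-point dtypes without bfloat16, another recurring pattern above.
print(get_all_fp_dtypes(include_bfloat16=False))
```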
I'm looking forward to your feedback. Thanks :)
cc: mruberry kshitij12345 anjali411
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71561
Reviewed By: samdow
Differential Revision: D34856571
Pulled By: mruberry
fbshipit-source-id: 0dca038bcad5cf69906245c496d2e61ac3876335
(cherry picked from commit b058f67b4313143efa714ab105f36e74083131b9)
2022-03-15 20:31:41 +00:00
Natalia Gimelshein
967606124a
port torch cov tests to error inputs ( #73977 )
...
Summary:
Per title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73977
Reviewed By: malfet
Differential Revision: D34779552
Pulled By: ngimel
fbshipit-source-id: b4191101a029981eb27c75e1b56d739db046f819
(cherry picked from commit 2c2af726ffdba68f358a4ff0ee07580609bccc34)
2022-03-10 19:04:44 +00:00
Natalia Gimelshein
e47a5a64bb
Back out "Revert D34524207: [pytorch][PR] remove _s_where" ( #73579 )
...
Summary:
Original commit changeset: 87b1220d851c
Original Phabricator Diff: D34524207 (4eb2482568 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73579
Test Plan:
OSS tests
tested with canary https://www.internalfb.com/intern/ads/canary/441912928798660873
Reviewed By: ezyang
Differential Revision: D34688237
Pulled By: ngimel
fbshipit-source-id: 32f3a0046053ef52e95ab45a26bfc1de17e7e061
(cherry picked from commit d1c0acbe3e0ff884c429072923a468ee1d3d447d)
2022-03-08 19:15:30 +00:00
anjali411
37e0d2e361
Fix segfault while real and imaginary attributes are set to a number ( #73867 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73867
Fixes https://github.com/pytorch/pytorch/issues/72947
Test Plan: Imported from OSS
Reviewed By: davidberard98
Differential Revision: D34695956
Pulled By: anjali411
fbshipit-source-id: 2f3eda272a5214335eae506bd387ce8da4d81b8c
(cherry picked from commit fdb07354cac22c30aa047e65fbac9840608db811)
2022-03-08 18:58:26 +00:00
Natalia Gimelshein
55525632ab
Revert D34554432: Back out "Revert D34524207: [pytorch][PR] remove _s_where"
...
Test Plan: revert-hammer
Differential Revision:
D34554432 (9c03c6163f )
Original commit changeset: 2f3601d3d426
Original Phabricator Diff: D34554432 (9c03c6163f )
fbshipit-source-id: db434750f44c6e6ec545a248c462d8fdcbefbaf8
(cherry picked from commit 866d4d0c795edd7ef519925683b5e57dd9b116ad)
2022-03-04 20:32:39 +00:00
Natalia Gimelshein
9c03c6163f
Back out "Revert D34524207: [pytorch][PR] remove _s_where" ( #73579 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73579
Original commit changeset: 87b1220d851c
Original Phabricator Diff: D34524207 (4eb2482568 )
Test Plan: OSS tests
Reviewed By: malfet
Differential Revision: D34554432
fbshipit-source-id: 2f3601d3d4261ebcebb05b4b1aec0c9a8a00ea04
(cherry picked from commit b9cad3f2bc54e12b275567454336cf4d9dcb78c4)
2022-03-04 19:35:41 +00:00
Nikita Karetnikov
eb0d370f14
Write explicit meta-kernels for normal ( #70089 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70089
See #69386 .
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D34089964
Pulled By: bdhirsh
fbshipit-source-id: eb88eb7c4830545d3d43c82b6f3abb98617cee8e
(cherry picked from commit 89c9c02a0fb1c780495fee6370961104f4b1dcd1)
2022-03-01 23:28:14 +00:00
Nikita Shulga
dd9517cc4a
Revert D34524207: [pytorch][PR] remove _s_where
...
Test Plan: revert-hammer
Differential Revision:
D34524207 (4eb2482568 )
Original commit changeset: bc71e27b6d3f
Original Phabricator Diff: D34524207 (4eb2482568 )
fbshipit-source-id: 87b1220d851c3d2b51bdd1cf2f8a493c58ab9b14
(cherry picked from commit af1f0cc9e032b00619a7979bbbd2281f69e0fdf0)
2022-03-01 17:43:16 +00:00
Natalia Gimelshein
4eb2482568
remove _s_where ( #73468 )
...
Summary:
Per title
Fixes https://github.com/pytorch/pytorch/issues/73135
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73468
Reviewed By: albanD
Differential Revision: D34524207
Pulled By: ngimel
fbshipit-source-id: bc71e27b6d3fa50de6737533c92375266d9eadc5
(cherry picked from commit 047b925849370e6e4cbe9e3a722db52bb1e965b9)
2022-03-01 07:30:34 +00:00
Philip Meier
0973c5a1cc
align signature of make_tensor with other creation ops ( #72702 )
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72702
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D34457729
Pulled By: mruberry
fbshipit-source-id: 83d580c4201eef946dc9cf4b9e28a3d36be55609
(cherry picked from commit aa4cf20fbeb4b795595729b8ac2e6ba7707d8283)
2022-02-25 06:30:31 +00:00
Nikita Shulga
cfb6c942fe
scatter_reduce documentation (#73125 )
...
Summary:
Reland of https://github.com/pytorch/pytorch/issues/68580 (which was milestoned for 1.11) plus a partial revert of https://github.com/pytorch/pytorch/pull/72543
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73125
Reviewed By: bdhirsh
Differential Revision: D34355217
Pulled By: malfet
fbshipit-source-id: 325ecdeaf53183d653b44ee5e6e8839ceefd9200
(cherry picked from commit 71db31748a )
2022-02-22 19:33:46 +00:00
Philip Meier
1f74e082e2
only compare attributes for meta tensors ( #72508 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72508
Todo:
- [x] document this behavior
- [x] add tests
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D34262452
Pulled By: ezyang
fbshipit-source-id: bc5c9653d5c3ad5c6efccc9c8e0efc0d28e15104
(cherry picked from commit 233142c88e )
2022-02-17 02:33:08 +00:00
Brian Hirsh
f87f753bb9
avoiding adding some functions to the public python API before 1.11 release ( #72543 )
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72543
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D34085724
Pulled By: bdhirsh
fbshipit-source-id: 941d5a90a6fa5328268d623e0e2b01577e4132ca
(cherry picked from commit 6676a0c79a )
2022-02-14 19:49:01 +00:00
Kurt Mohler
47c6993355
Update from_dlpack tests and documentation ( #70543 )
...
Summary:
Part of https://github.com/pytorch/pytorch/issues/58742
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70543
Reviewed By: soulitzer
Differential Revision: D34172475
Pulled By: mruberry
fbshipit-source-id: d498764b8651a8b7a19181b3421aeebf28a5db2b
(cherry picked from commit 05332f164c )
2022-02-14 03:35:17 +00:00
anjali411
f607af126e
Set correct device id on efficientzerotensors ( #71611 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71611
Fixes https://github.com/pytorch/pytorch/issues/71160 https://github.com/pytorch/pytorch/issues/69925 #69913
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D33897543
Pulled By: anjali411
fbshipit-source-id: f1d8608c351876b8c2619da5ef891f74bad30ab5
(cherry picked from commit 643e666ea3 )
2022-02-02 21:51:32 +00:00
Anjali Chourdia
1e4aefaa2f
Revert D33834916: Set correct device id on efficientzerotensors
...
Test Plan: revert-hammer
Differential Revision:
D33834916 (a18cfb790d )
Original commit changeset: 11cec343e95e
Original Phabricator Diff: D33834916 (a18cfb790d )
fbshipit-source-id: 3d3f60b760b445383768161b1d21ea4dadbe5d7c
(cherry picked from commit eba41aa646 )
2022-01-31 03:49:56 +00:00
anjali411
a18cfb790d
Set correct device id on efficientzerotensors ( #71611 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71611
Fixes https://github.com/pytorch/pytorch/issues/71160 https://github.com/pytorch/pytorch/issues/69925
Test Plan: Imported from OSS
Reviewed By: george-qi
Differential Revision: D33834916
Pulled By: anjali411
fbshipit-source-id: 11cec343e95e2ee188ab7576f26f64aa19317891
(cherry picked from commit f6e86f8a6b )
2022-01-30 20:53:15 +00:00
Mikayla Gawarecki
09c417ae65
Add new reduce options and autograd support for scatter_reduce ( #71788 )
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71788
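The commit message itself is terse, so here is a hedged usage sketch of the public `scatter_reduce` with one of the reduce modes; it assumes the present-day signature (including `include_self`), which may postdate this commit:
```python
import torch

src = torch.tensor([1., 2., 3., 4., 5., 6.])
index = torch.tensor([0, 1, 0, 1, 2, 1])
out = torch.zeros(3)

# "amax" keeps the maximum of all source values scattered into each output slot;
# include_self=False ignores the pre-existing values in `out`.
result = out.scatter_reduce(0, index, src, reduce="amax", include_self=False)
print(result)  # tensor([3., 6., 5.])
```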
Test Plan: Imported from OSS
Reviewed By: mikaylagawarecki
Differential Revision: D33778525
Pulled By: cpuhrsch
fbshipit-source-id: 47b8544e29df3075bc6ede894c59499a7ffec876
(cherry picked from commit ddcddac726 )
2022-01-27 17:38:50 +00:00
Mikayla Gawarecki
fdec94504f
Rename _scatter_reduce to scatter_reduce and make it unstructured ( #71787 )
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71787
Test Plan: Imported from OSS
Reviewed By: mikaylagawarecki
Differential Revision: D33778524
Pulled By: cpuhrsch
fbshipit-source-id: 55a330e1c2227c0eaaa1c0d2f9205a4dee24a11b
(cherry picked from commit 6e4a8a91da )
2022-01-27 16:29:13 +00:00
Mike Ruberry
0891c908bb
Revert D33768645: Set correct device id on efficientzerotensors
...
Test Plan: revert-hammer
Differential Revision:
D33768645 (5dd6cd55ba )
Original commit changeset: 66ce9907630b
Original Phabricator Diff: D33768645 (5dd6cd55ba )
fbshipit-source-id: 4bb1ad46f01cd33aeb813bdc123741cf665194a8
(cherry picked from commit 8ca385b1d8 )
2022-01-26 17:01:32 +00:00
anjali411
5dd6cd55ba
Set correct device id on efficientzerotensors ( #71611 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71611
Fixes https://github.com/pytorch/pytorch/issues/71160
Test Plan: Imported from OSS
Reviewed By: pbelevich, ngimel
Differential Revision: D33768645
Pulled By: anjali411
fbshipit-source-id: 66ce9907630b65a12c0775077147a7e72ff4cee4
(cherry picked from commit 3af98a4d70 )
2022-01-25 23:32:11 +00:00
Jonathan Colen
33403f4848
edge_order check in torch.gradient only applies to dim argument ( #67926 )
...
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67919
The compatibility check on `edge_order` in `pre_check_gradient` now looks only at the `dim` argument if it is present; otherwise it checks all dimensions.
Previously, it would check all dimensions regardless of the dim argument and throw unnecessary errors.
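A small sketch of what the relaxed check allows (the tensor shape and the minimum-size rule of at least `edge_order + 1` points along each differentiated dim are assumptions for illustration):
```python
import torch

# dim 0 has only 2 points, which is too short for edge_order=2,
# but we only differentiate along dim 1 (4 points), so no error
# should be raised once the check is limited to the requested dim.
t = torch.arange(8.).reshape(2, 4)
(g,) = torch.gradient(t, dim=1, edge_order=2)
print(g)
```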
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67926
Reviewed By: albanD
Differential Revision: D33760621
Pulled By: mruberry
fbshipit-source-id: d490cd8610c68ff3787e670fc947de3cbf2db062
(cherry picked from commit 45bc56de9e )
2022-01-25 21:29:31 +00:00
Mike Ruberry
e0d829a266
Kill the test_torch.py mixin and creates test_scatter_gather_ops ( #71691 )
...
Summary:
Per title.
Also annotates test_torch.py with additional cleanup tasks and adds empty sample inputs to elementwise unary and binary OpInfos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71691
Reviewed By: ngimel
Differential Revision: D33735126
Pulled By: mruberry
fbshipit-source-id: 8cc097a7581a8b620540c95b2a5889c1165ecf23
(cherry picked from commit 5c6a245a3f )
2022-01-24 09:32:32 +00:00
Mike Ruberry
7680a0ae9d
Deprecates _aminmax ( #71576 )
...
Summary:
Replaces https://github.com/pytorch/pytorch/pull/62432 . Existing callsites are updated.
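Assuming the public `torch.aminmax` is the intended replacement for the deprecated private op, a minimal usage sketch:
```python
import torch

x = torch.tensor([[1., 5., -2.],
                  [0., 3., 4.]])

mn, mx = torch.aminmax(x)                    # global min and max over the whole tensor
row_min, row_max = torch.aminmax(x, dim=1)   # per-row min and max
print(mn, mx)            # tensor(-2.) tensor(5.)
print(row_min, row_max)  # tensor([-2., 0.]) tensor([5., 4.])
```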
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71576
Reviewed By: ngimel
Differential Revision: D33689960
Pulled By: mruberry
fbshipit-source-id: fad1ba78347ecec7fd48f21862c3eb606662b8f4
(cherry picked from commit 6cd438e9a1 )
2022-01-21 09:23:29 +00:00
Peter Bell
17bb68618f
Copy: Fix CPU transpose path ignoring neg and conj bits ( #69026 )
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69026
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D33064533
Pulled By: anjali411
fbshipit-source-id: 98c25586a1707ac2324f69f652ce5a14dd59c0ad
2022-01-14 10:13:33 -08:00
Emilio Castillo
8dfff8b2e2
Fix scatter for empty indexes ( #70662 )
...
Summary:
This PR fixes an issue with `scatter` where the output is garbage for zero-sized indexes.
```py
import torch
null_index = torch.zeros((0, 4), dtype=torch.int64)
null_arr = torch.zeros((0, 4))
zeros_arr = torch.zeros((1, 4))
result = zeros_arr.scatter(0, null_index, null_arr)
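# Before the fix, `result` contains uninitialized memory:
# the destination is never filled when `index` has zero elements.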
print(null_index)
print(null_arr)
print(zeros_arr)
print(result)
```
```
tensor([], size=(0, 4), dtype=torch.int64)
tensor([], size=(0, 4))
tensor([[0., 0., 0., 0.]])
tensor([[1.7036e+19, 2.9965e+32, 3.9133e-14, 1.3585e-19]])
```
The out array is never filled if the `index` arg has 0 elements.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70662
Reviewed By: dagitses
Differential Revision: D33476807
Pulled By: albanD
fbshipit-source-id: 97dbdd9c0133899e58828c43ecba81838807b8af
2022-01-07 09:20:43 -08:00
Peter Bell
917d56a7e4
Copy: Fix conj bit being ignored on type mismatch ( #68963 )
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68963
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D33064492
Pulled By: anjali411
fbshipit-source-id: 043f927d6bfff46bf5f8ea6fce9409f250bf8ff8
2022-01-05 17:59:32 -08:00
Brian Hirsh
457ba1dd3e
Porting index_add to structured kernels, add an out variant ( #65993 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65993
This PR attempts to port `index_add` to structured kernels, but does more than that:
* Adds an `out=` variant to `index_add`
* Revises `native_functions.yaml` registrations to avoid multiple entries and instead pass a default value for `alpha`.
* Changes the `derivatives.yaml` file for autograd support
* Revises error messages; please see: https://github.com/pytorch/pytorch/pull/65993#issuecomment-945441615
Follow-up PRs in the near future will attempt to refactor the OpInfo test and will give another look at the tests in `test/test_torch.py` for this function (hence the use of ghstack for this).
~This is WIP because there are tests failing for `Dimname` variant on mobile/android builds, and I'm working on fixing them.~
Issue tracker: https://github.com/pytorch/pytorch/issues/55070
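A hedged usage sketch of the `out=` variant and the `alpha` default this PR touches (the functional form with `out=` is assumed to be exposed in Python):
```python
import torch

x = torch.ones(5, 3)
index = torch.tensor([0, 4, 2])
source = torch.arange(9., dtype=torch.float32).reshape(3, 3)
out = torch.empty_like(x)

# Accumulate `alpha * source` rows into the rows of `x` selected by `index`,
# writing the result into the preallocated `out` tensor.
torch.index_add(x, 0, index, source, alpha=2.0, out=out)
print(out)
```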
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D32646426
fbshipit-source-id: b035ecf843a9a27d4d1e18b202b035adc2a49ab5
2021-12-14 11:57:13 -08:00
kshitij12345
5b2586fe09
[testing] Ignore expected_regex in assertRaisesRegex for non-native device ( #68723 )
...
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29719
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68723
Reviewed By: zou3519
Differential Revision: D32797061
Pulled By: mruberry
fbshipit-source-id: 3bcae6d3d62d180059dbe39be520b0e7f9aea19f
2021-12-02 14:52:27 -08:00
Emilio Castillo
533e72e0a4
Fix DLPack CUDA stream convention ( #67618 )
...
Summary:
Apparently, for the array API, the CUDA default stream and per-thread stream should be 1 and 2 instead of 0 and 1:
https://data-apis.org/array-api/latest/API_specification/array_object.html?dlpack-self-stream-none#dlpack-self-stream-none .
This caused a problem in the interop with CuPy https://github.com/cupy/cupy/pull/5970#discussion_r739912926 .
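A guarded sketch of the convention (the stream value semantics follow the array API; the round trip through `torch.from_dlpack` is just for illustration):
```python
import torch

if torch.cuda.is_available():
    x = torch.arange(4, device="cuda")
    # Under the array-API convention, stream=1 means the (legacy) default
    # stream and stream=2 the per-thread default stream, rather than the
    # raw values 0 and 1.
    capsule = x.__dlpack__(stream=1)
    y = torch.from_dlpack(capsule)
    print(torch.equal(x, y))
```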
cc rgommers leofang mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67618
Reviewed By: albanD
Differential Revision: D32521805
Pulled By: mruberry
fbshipit-source-id: 95777e4014e5edf1f88ba10adc03c6e34c13248d
2021-11-18 08:36:05 -08:00
kshitij12345
d5d2096dab
[testing] make @dtypes mandatory when using @dtypesIf ( #68186 )
...
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53647
With this change, if a test forgets to add `dtypes` while using `dtypesIf`, the following error is raised:
```
AssertionError: dtypes is mandatory when using dtypesIf however 'test_exponential_no_zero' didn't specify it
```
**Tested Locally**
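A schematic sketch of the pairing the check enforces (the class and test body are placeholders, and the class is not wired into the device-type test instantiation here):
```python
import torch
from torch.testing._internal.common_device_type import dtypes, dtypesIfCUDA
from torch.testing._internal.common_utils import TestCase

class TestExample(TestCase):
    # A base @dtypes list is now mandatory whenever a @dtypesIf* override is used.
    @dtypes(torch.float, torch.double)
    @dtypesIfCUDA(torch.half, torch.float, torch.double)
    def test_exponential_no_zero(self, device, dtype):
        pass
```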
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68186
Reviewed By: VitalyFedyunin
Differential Revision: D32468581
Pulled By: mruberry
fbshipit-source-id: 805e0855f988b77a5d8d4cd52b31426c04c2200b
2021-11-18 08:29:31 -08:00