For the purposes of this function, `PyTensorType` is essentially being
used as a `pair<Backend, ScalarType>` so it makes more sense to just
take these arguments directly. This simplifies the code and makes it
so that `py_set_default_dtype` doesn't need to search for a valid
`PyTensorType` object just to set the `ScalarType`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73369
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71486
This PR adds upgraders for `linspace` and `linspace.out`, as the optional `steps` argument will be deprecated soon. Old models will use a `steps` value of 100 when none is provided.
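A hedged sketch of the upgrader's behavior in pure Python (not the actual TorchScript upgrader): old serialized models that omitted `steps` implicitly produced 100 points.

```python
# Illustrative stand-in for the linspace upgrader: when an old model
# calls linspace without `steps`, the upgrader supplies the legacy
# default of 100 points.

def linspace_upgrader(start, end, steps=None):
    if steps is None:
        steps = 100  # legacy default for old serialized models
    if steps == 1:
        return [float(start)]
    step = (end - start) / (steps - 1)
    return [start + i * step for i in range(steps)]
```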
Test Plan: buck-out/gen/caffe2/test/jit#binary.par -r TestUpgraders.test_aten_linspace
Reviewed By: cccclai, mruberry
Differential Revision: D33654308
fbshipit-source-id: 0e0138091da0b11d4f49156eeb6bcd7e46102a5b
(cherry picked from commit 931ae4af32)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67032
This PR adds meta backend support to the `range`, `arange`, `linspace`, and `logspace` operators.
Note that the original PR (#66630) was reverted due to two failing unit tests in the Bionic CI. This revision includes a fix for those tests; otherwise its content is identical to the previous PR.
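Since a meta tensor carries shape, dtype, and device but no data, supporting these operators on the meta backend largely amounts to computing the output shape without materializing anything. A hedged pure-Python sketch of that shape computation (illustrative helper names, not the actual ATen code):

```python
import math

# Illustrative sketch: a "meta" implementation of arange/linspace only
# needs the number of output elements, never the data itself.

def arange_meta_numel(start, end, step):
    # arange's length is ceil((end - start) / step), clamped at zero
    # for empty ranges.
    if step == 0:
        raise ValueError("step must be nonzero")
    return max(math.ceil((end - start) / step), 0)

def linspace_meta_numel(start, end, steps):
    # linspace's shape is simply (steps,), independent of start/end.
    return steps
```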
Original commit changeset: 2f9d8d1acbb0
ghstack-source-id: 142487306
Test Plan: Extended the existing tensor creation tests to assert meta backend support.
Reviewed By: zhaojuanmao
Differential Revision: D31834403
fbshipit-source-id: a489858a2a8a38a03234b14408e14d2b208a8d34
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66630
This PR adds meta backend support to the `range`, `arange`, `linspace`, and `logspace` operators.
ghstack-source-id: 140618055
Test Plan: Extended the existing tensor creation tests to assert meta backend support.
Reviewed By: ezyang
Differential Revision: D31656999
fbshipit-source-id: 06e7f3655b94c0d85a28bcd0ca61d9f9ce707f1d
Summary:
This is step 3/7 of https://github.com/pytorch/pytorch/issues/50276. It only adds support for the argument but doesn't implement new indexing modes yet.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62722
Test Plan:
Verified this is not FC breaking by adding logging to both meshgrid
overloads and then calling meshgrid twice:
`meshgrid(*tensors)`
and
`meshgrid(*tensors, indexing='ij')`
This confirmed that the former signature triggered the original native
function and the latter signature triggered the new native function.
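Although this step only adds the argument, a pure-Python sketch (illustrative, not the ATen implementation) shows what the two indexing modes will mean for 2-D inputs once implemented: `'ij'` yields grids of shape `(len(a), len(b))`, while `'xy'` transposes the first two dimensions, matching the NumPy default.

```python
# Illustrative 2-D meshgrid showing the semantics of the `indexing`
# argument; real torch.meshgrid works on tensors of any length.

def meshgrid2d(a, b, indexing="ij"):
    ii = [[x for _ in b] for x in a]   # varies along the first axis
    jj = [[y for y in b] for _ in a]   # varies along the second axis
    if indexing == "xy":
        # 'xy' swaps the first two dimensions of both grids.
        ii = [list(r) for r in zip(*ii)]
        jj = [list(r) for r in zip(*jj)]
    return ii, jj
```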
Reviewed By: H-Huang
Differential Revision: D30394313
Pulled By: dagitses
fbshipit-source-id: e265cb114d8caae414ee2305dc463b34fdb57fa6
Summary:
And replace two existing usages in the codebase with it
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64942
Reviewed By: jbschlosser
Differential Revision: D30906382
Pulled By: malfet
fbshipit-source-id: e7f20f53aff734b0379eded361255543dab4fa4b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63554
Following https://github.com/pytorch/pytorch/pull/61840#issuecomment-884087809, this deprecates all the dtype getters publicly exposed in the `torch.testing` namespace. The reason for this is twofold:
1. If someone is not familiar with the C++ dispatch macros PyTorch uses, the names are misleading. For example `torch.testing.floating_types()` will only give you `float32` and `float64` skipping `float16` and `bfloat16`.
2. The dtype getters provide very minimal functionality that can be easily emulated by downstream libraries.
We thought about [providing a replacement](https://gist.github.com/pmeier/3dfd2e105842ad0de4505068a1a0270a), but ultimately decided against it. The major problem is BC: if we keep it, either the namespace gets messy again after a new dtype is added or we need to somehow version the return values of the getters.
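A hedged sketch of the suggested downstream emulation (plain strings stand in for `torch.dtype` objects; the names here are illustrative, not a proposed API): libraries can hold their own explicit dtype tuples, which also removes the naming ambiguity from point 1.

```python
# Illustrative downstream replacement for the deprecated getters:
# explicit tuples owned by the downstream library.

FLOATING_TYPES = ("float32", "float64")   # what the old getter returned
ALL_FLOATING_TYPES = FLOATING_TYPES + ("float16", "bfloat16")

def floating_types(include_half_precision=False):
    # Explicit flag avoids the surprise that the old
    # torch.testing.floating_types() skipped float16/bfloat16.
    return ALL_FLOATING_TYPES if include_half_precision else FLOATING_TYPES
```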
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D30662206
Pulled By: mruberry
fbshipit-source-id: a2bdb10ab02ae665df1b5b76e8afa9af043bbf56
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61767
## Changes
- [x] Add `torch.concat` alias to `torch.cat`
- [x] Add OpInfo for `cat`/`concat`
- [x] Fix `test_out` skips (Use `at::native::resize_output` or `at::native::resize_output_check`)
- [x] `cat`/`concat`
- [x] `stack`
- [x] `hstack`
- [x] `dstack`
- [x] `vstack`/`row_stack`
- [x] Remove redundant tests for `cat`/`stack`
~I've not added `cat`/`concat` to OpInfo `op_db` yet, since cat is a little more tricky than other OpInfos (should have a lot of tests) and currently there are no OpInfos for that. I can try to add that in a subsequent PR or maybe here itself, whatever is suggested.~
**Edit**: cat/concat OpInfo has been added.
**Note**: I've added the named tensor support for `concat` alias as well, maybe that's out of spec in `array-api` but it is still useful for consistency in PyTorch.
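The aliasing pattern itself can be sketched in pure Python (`cat` below is a minimal 1-D stand-in for `torch.cat`, not the real implementation): an alias is just a second module-level name bound to the same callable, so both spellings dispatch identically.

```python
# Minimal stand-in for torch.cat on 1-D Python lists, used only to
# illustrate the aliasing pattern.

def cat(seqs):
    out = []
    for s in seqs:
        out.extend(s)
    return out

concat = cat  # the array-API-compatible alias: same object, second name
```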
Thanks to krshrimali for guidance on my first PR :))
cc mruberry rgommers pmeier asmeurer leofang AnirudhDagar asi1024 emcastillo kmaehashi heitorschueroff krshrimali
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62560
Reviewed By: saketh-are
Differential Revision: D30762069
Pulled By: mruberry
fbshipit-source-id: 6985159d1d9756238890488a0ab3ae7699d94337
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63572
Addresses #61906. The issue will be fixed later in the stack, when `torch.testing.assert_close` gets the same treatment.
cc ezyang gchanan
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D30633527
Pulled By: mruberry
fbshipit-source-id: c2002a4998a7a75cb2ab83f87190bde43a9d4f7c
Summary:
Context https://github.com/pytorch/pytorch/issues/58545
The logic, which we keep consistent for both `torch.randperm` and `torch.randint`, is:
1. Generators can have either a fully-specified or a non-fully-specified device
2. As long as the device type matches that of the result, we don't error out
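A hedged pure-Python sketch of that acceptance rule (illustrative, not the actual C++ check): a generator's device may be fully specified (`"cuda:0"`) or only a device type (`"cuda"`), and it is accepted whenever the device *type* matches the result tensor's device type.

```python
# Illustrative device-compatibility check mirroring the rule above.

def generator_device_ok(generator_device, result_device):
    # "cuda:0" -> "cuda"; "cuda" stays "cuda" (non-fully-specified).
    gen_type = generator_device.split(":")[0]
    result_type = result_device.split(":")[0]
    # Only the device type has to match; the index may be unspecified.
    return gen_type == result_type
```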
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59352
Test Plan:
```
python test/test_tensor_creation_ops.py -k TestRandomTensorCreation
```
Reviewed By: ngimel
Differential Revision: D28855920
Pulled By: zhouzhuojie
fbshipit-source-id: f8141a2c4b2f177e1aa7baec6999b65916cba02c
Summary:
…evice.
Previously, it was possible for torch.Tensor(tensor, device) or Tensor.new(tensor, device) to map to IntArrayRef or PyObject*.
PyObject* was not a problem because that would error out later.
But IntArrayRef would create an uninitialized tensor, which is confusing.
Fixes https://github.com/pytorch/pytorch/issues/47112
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58108
Reviewed By: agolynski, mruberry
Differential Revision: D28372426
Pulled By: gchanan
fbshipit-source-id: 795ab4f0561939d002a661c5cc14c6cdb579f31a
Summary:
For small tensors, it is known that the GPU operates slower than the CPU. However, offloading to the CPU causes a host <--> device sync. As a result, although offloading to the CPU looks better in microbenchmarks, it often hurts rather than benefits end-to-end performance, and it could be a blocker for CUDA graphs. After discussion with mcarilli and ptrblck, we think it might be good to just remove this piece of code and let it be slow.
Microbenchmarks:
```python
def run50_sync(f):
    for _ in range(50):
        f()
        torch.cuda.synchronize()
    torch.cuda.synchronize()
%timeit run50_sync(lambda: torch.randperm(3, device='cuda'))
%timeit run50_sync(lambda: torch.randperm(30, device='cuda'))
%timeit run50_sync(lambda: torch.randperm(300, device='cuda'))
%timeit run50_sync(lambda: torch.randperm(3000, device='cuda'))
%timeit run50_sync(lambda: torch.randperm(30000, device='cuda'))
%timeit run50_sync(lambda: torch.randperm(300000, device='cuda'))
%timeit run50_sync(lambda: torch.randperm(3000000, device='cuda'))
%timeit run50_sync(lambda: torch.randperm(30000000, device='cuda'))
```
Before this PR:
```
5.79 ms ± 51.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
5.78 ms ± 92.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.17 ms ± 87.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.65 ms ± 69.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
17.6 ms ± 133 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
21 ms ± 120 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
104 ms ± 880 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
944 ms ± 3.49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
After this PR:
```
7.22 ms ± 11.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.28 ms ± 9.03 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.25 ms ± 10.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.19 ms ± 5.83 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.76 ms ± 162 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
12.3 ms ± 11.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
69.3 ms ± 42.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
716 ms ± 1.01 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54113
Reviewed By: ezyang
Differential Revision: D28017958
Pulled By: ngimel
fbshipit-source-id: 660992d43ca449e61ce0cb0aa1dae554c9560a4e
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56822
There was an off-by-one in CPU randperm when checking the limits of the requested range. It also shows up in the "CUDA" version, since that falls back to CPU for small input sizes.
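A hedged illustration of this class of bug (illustrative values and names, not the actual ATen check): `randperm(n)` fills values `0 .. n-1`, so the largest emitted value is `n - 1`, and a limit check that compares `n` instead of `n - 1` against the dtype's maximum rejects a valid boundary case.

```python
# Illustrative off-by-one in a range-limit check for randperm(n).
DTYPE_MAX = 127  # e.g. the maximum of a signed 8-bit dtype

def check_range_buggy(n):
    # Off by one: rejects n == DTYPE_MAX + 1 even though the largest
    # value actually produced, n - 1 == DTYPE_MAX, fits in the dtype.
    return n <= DTYPE_MAX

def check_range_fixed(n):
    # Compare the largest value that will be produced.
    return n - 1 <= DTYPE_MAX
```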
CC zasdfgbnm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56967
Reviewed By: mruberry
Differential Revision: D28031819
Pulled By: ngimel
fbshipit-source-id: 4d25995628997f164aafe94e7eae6c54f018e4e5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56976
Band-aid fix for #54282
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D28020401
Pulled By: ezyang
fbshipit-source-id: 50546d5275eade408d65e9c883999fb3b65ff55a