Commit Graph

232 Commits

Author SHA1 Message Date
Yu, Guangye
7cd48df2da Refine the logic of device construction when only device index is given (#129119)
# Motivation
Before this PR, constructing a device from only a device index yielded the `cuda` type (or the `PrivateUser1` type if a `PrivateUser1` backend was registered).
```bash
>>> import torch
>>> device = torch.device(0)
>>> device.type
'cuda'
>>> a = torch.tensor([1, 2])
>>> b = a.to(0)
>>> b
tensor([1, 2], device='cuda:0')
```
This works well on a CUDA GPU, but it reports misleading information and raises an error when running on XPU.
```bash
>>> import torch
>>> device = torch.device(0)
>>> device.type
'cuda'
>>> a = torch.tensor([1, 2])
>>> b = a.to(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xxx/pytorch/torch/cuda/__init__.py", line 302, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
```
This PR refines the logic to use the currently available device type instead.
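A hedged sketch of the intended behavior after this PR, on a build where XPU is the available accelerator (the output shown is inferred from the PR description, not captured):
```bash
>>> import torch
>>> device = torch.device(0)
>>> device.type  # now resolves to the currently available device type
'xpu'
>>> torch.tensor([1, 2]).to(0)
tensor([1, 2], device='xpu:0')
```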
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129119
Approved by: https://github.com/albanD, https://github.com/gujinghui, https://github.com/EikanWang
ghstack dependencies: #129463, #129205, #129363
2024-07-15 14:34:29 +00:00
cyy
f4dcf2ae93 [1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128301
Approved by: https://github.com/ezyang, https://github.com/r-barnes
2024-07-08 07:03:53 +00:00
Nikita Shulga
2bc6f329b2 Make PyTorch argparser understand complex (#129580)
It understands `float` and `int`, so why not `complex`?

Test plan: `python -c "import torch;print(torch.rand(3, dtype=complex))"`

Fixes https://github.com/pytorch/pytorch/issues/126837
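A hedged usage sketch (the resulting dtype is an assumption: builtin `complex` presumably maps to `torch.complex128`, mirroring how builtin `float` maps to `torch.float64`):
```python
import torch

# The arg parser now accepts Python's builtin `complex` wherever a dtype
# is parsed, just like the builtins `float` and `int`.
x = torch.rand(3, dtype=complex)
print(x.dtype)
```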

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129580
Approved by: https://github.com/albanD
2024-06-29 01:21:12 +00:00
PyTorch MergeBot
846bb30e13 Revert "[1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301)"
This reverts commit bd72e28314.

Reverted https://github.com/pytorch/pytorch/pull/128301 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it fails XLA build bd72e28314. Please rebase your PR before relanding because I think the failure is hidden by an unrelated broken trunk XLA failure from your current base commit ([comment](https://github.com/pytorch/pytorch/pull/128301#issuecomment-2169035822))
2024-06-15 01:58:20 +00:00
cyy
bd72e28314 [1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128301
Approved by: https://github.com/ezyang
2024-06-14 23:21:01 +00:00
cyy
f8c6d43524 Concat namespaces and other fixes in torch/csrc/utils (#127833)
It contains formatting and other minor fixes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127833
Approved by: https://github.com/ezyang
2024-06-04 15:12:45 +00:00
feifan
c9172d4471 print default value in FunctionSignature (#127059)
Fixes #[126758](https://github.com/pytorch/pytorch/issues/126758) and #[126759](https://github.com/pytorch/pytorch/issues/126759)

The output shown in the issue is not accurate because `FunctionSignature::toString()` prints the schema strings without defaults.
cb6ef68caa/torch/csrc/utils/python_arg_parser.cpp (L1282-L1283)
This PR adds a `default_value` field to store the default string that should be printed. Alternatively, a new API could convert `default_bool`/`default_int` back to strings, but that would be slightly more complicated.
result:
![image](https://github.com/pytorch/pytorch/assets/37650440/f58a4cbf-b0f4-4c81-9106-59f0d35c54ea)
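A hedged way to observe the improved message (the failing call is an illustrative assumption; any call that fails overload matching in the arg parser prints the candidate schemas):
```python
import torch

# Passing a float where an int `dim` is expected fails overload matching;
# with this PR the suggested signatures include defaults such as
# `keepdim=False` and `dtype=None` instead of bare parameter names.
try:
    torch.sum(torch.randn(2), 1.5)
except TypeError as e:
    print(e)
```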

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127059
Approved by: https://github.com/janeyx99
2024-05-28 18:04:31 +00:00
Richard Barnes
ed327876f5 [codemod] c10:optional -> std::optional (#126135)
Generated by running the following from PyTorch root:
```
find . -regex ".*\.\(cpp\|h\|cu\|hpp\|cc\|cxx\)$" | grep -v "build/" | xargs -n 50 -P 4 perl -pi -e 's/c10::optional/std::optional/'
```

`c10::optional` is just an alias for `std::optional`. This removes usages of that alias in preparation for eliminating it entirely.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126135
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/albanD, https://github.com/aaronenyeshi
2024-05-14 19:35:51 +00:00
Richard Barnes
98e5238ad8 [codemod][lowrisk] Remove unused exception parameter from caffe2/caffe2/image/image_input_op.h (#123056)
Summary:
`-Wunused-exception-parameter` has identified an unused exception parameter. This diff removes it.

This:
```
try {
    ...
} catch (exception& e) {
    // no use of e
}
```
should instead be written as
```
} catch (exception&) {
```

If the code compiles, this is safe to land.

Test Plan: Sandcastle

Reviewed By: palmje

Differential Revision: D55548497

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123056
Approved by: https://github.com/Skylion007
2024-04-04 17:24:43 +00:00
PyTorch MergeBot
db506762d1 Revert "Change ATEN generator argument type to const std::optional<Generator>& (#120076)"
This reverts commit a52b4e2257.

Reverted https://github.com/pytorch/pytorch/pull/120076 on behalf of https://github.com/atalman due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/120076#issuecomment-2018680656))
2024-03-25 18:52:05 +00:00
cyy
a52b4e2257 Change ATEN generator argument type to const std::optional<Generator>& (#120076)
This PR proposes using `const std::optional<Generator>&` for the underlying functions to avoid unnecessary copy and move operations. The torchgen code was changed to generate the new type.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120076
Approved by: https://github.com/malfet
2024-03-24 02:12:08 +00:00
PyTorch MergeBot
02fee6caec Revert "Change ATEN generator argument type to const std::optional<Generator>& (#120076)"
This reverts commit ecbe82b9ce.

Reverted https://github.com/pytorch/pytorch/pull/120076 on behalf of https://github.com/jeanschmidt due to Reverting in order to check if this will fix XLA trunk jobs ([comment](https://github.com/pytorch/pytorch/pull/120076#issuecomment-2015272644))
2024-03-22 14:53:45 +00:00
cyy
ecbe82b9ce Change ATEN generator argument type to const std::optional<Generator>& (#120076)
This PR proposes using `const std::optional<Generator>&` for the underlying functions to avoid unnecessary copy and move operations. The torchgen code was changed to generate the new type.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120076
Approved by: https://github.com/malfet
2024-03-22 03:49:31 +00:00
PyTorch MergeBot
c0996866f4 Revert "Change ATEN generator argument type to const std::optional<Generator>& (#120076)"
This reverts commit 4305c64fea.

Reverted https://github.com/pytorch/pytorch/pull/120076 on behalf of https://github.com/izaitsevfb due to breaking internal builds(take 3) ([comment](https://github.com/pytorch/pytorch/pull/120076#issuecomment-1986338164))
2024-03-08 20:01:03 +00:00
cyy
4305c64fea Change ATEN generator argument type to const std::optional<Generator>& (#120076)
This PR proposes using `const std::optional<Generator>&` for the underlying functions to avoid unnecessary copy and move operations. The torchgen code was changed to generate the new type.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120076
Approved by: https://github.com/malfet
2024-03-07 09:52:21 +00:00
PyTorch MergeBot
a9d9077f12 Revert "Increased compile time max GPUs to 512. Switched to int16_t DeviceIndex. (#119639)"
This reverts commit 7c556428c7.

Reverted https://github.com/pytorch/pytorch/pull/119639 on behalf of https://github.com/kit1980 due to breaking internal builds, see D54286923 ([comment](https://github.com/pytorch/pytorch/pull/119639#issuecomment-1969634480))
2024-02-28 18:57:09 +00:00
Tobias Ringwald
7c556428c7 Increased compile time max GPUs to 512. Switched to int16_t DeviceIndex. (#119639)
Fixes #115331.

This PR increases the number of valid GPU devices to 512 (from 64) in order to future-proof PyTorch for providers that offer [single nodes with a large device count](https://www.tensorwave.com/). Until now, `DeviceIndex` was an `int8_t`, thus multiple changes were necessary:

- `DeviceIndex` changed to `int16_t`. Updated consumers that assume it to be an `int8_t`.
- Updated bounds checking for `torch.device()` in the Python frontend. Right now, we allow funny things like `torch.device('cpu', 200).index == -56`, which is undefined behavior. I inserted some checks to only allow values between 0 and `c10::Device::MAX_NUM_DEVICES - 1` (see the sketch below).
- Updated the `ArgumentInfo` struct, as it hardcodes the device index as an 8-bit field [^1]. This might be a breaking change; not sure if users rely on this.
- Introduced `c10::Device::MAX_NUM_DEVICES` as a replacement for the old `C10_COMPILE_TIME_MAX_GPUS`

[^1]: This field was unsigned, so I guess this has also been undefined behavior the whole time? Our default device index is -1, so it always wrapped around to 255 when written to the `ArgumentInfo` struct. When I switched `DeviceIndex` to `int16_t`, it actually stayed 255 after unpacking from `ArgumentInfo` again, as the `DeviceIndex` was now wide enough that it didn't wrap back to -1.
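A hedged sketch of the new bounds checking (the exact exception type and the final behavior are assumptions, especially since this commit was later reverted and relanded):
```python
import torch

# Before: the int8_t index silently wrapped, so torch.device('cpu', 200)
# reported .index == -56. With int16_t plus the new checks, 200 is valid.
assert torch.device('cpu', 200).index == 200

# Indices >= c10::Device::MAX_NUM_DEVICES (512) should now be rejected
# (assumption: raises RuntimeError rather than wrapping).
torch.device('cpu', 600)
```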
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119639
Approved by: https://github.com/cyyever, https://github.com/albanD, https://github.com/huydhn
2024-02-27 07:05:48 +00:00
PyTorch MergeBot
fff9d98e58 Revert "Increased compile time max GPUs to 512. Switched to int16_t DeviceIndex. (#119639)"
This reverts commit e0268821dd.

Reverted https://github.com/pytorch/pytorch/pull/119639 on behalf of https://github.com/huydhn due to Sorry for reverting your change but I think the Window failures are legit as they are failing now in trunk, i.e. 450339ab2d ([comment](https://github.com/pytorch/pytorch/pull/119639#issuecomment-1958428416))
2024-02-22 00:12:54 +00:00
Tobias Ringwald
e0268821dd Increased compile time max GPUs to 512. Switched to int16_t DeviceIndex. (#119639)
Fixes #115331.

This PR increases the number of valid GPU devices to 512 (from 64) in order to future-proof PyTorch for providers that offer [single nodes with a large device count](https://www.tensorwave.com/). Until now, `DeviceIndex` was an `int8_t`, thus multiple changes were necessary:

- `DeviceIndex` changed to `int16_t`. Updated consumers that assume it to be an `int8_t`.
- Updated bounds checking for `torch.device()` in the Python frontend. Right now, we allow funny things like `torch.device('cpu', 200).index == -56`, which is undefined behavior. I inserted some checks to only allow values between 0 and `c10::Device::MAX_NUM_DEVICES - 1`.
- Updated the `ArgumentInfo` struct, as it hardcodes the device index as an 8-bit field [^1]. This might be a breaking change; not sure if users rely on this.
- Introduced `c10::Device::MAX_NUM_DEVICES` as a replacement for the old `C10_COMPILE_TIME_MAX_GPUS`

[^1]: This field was unsigned, so I guess this has also been undefined behavior the whole time? Our default device index is -1, so it always wrapped around to 255 when written to the `ArgumentInfo` struct. When I switched `DeviceIndex` to `int16_t`, it actually stayed 255 after unpacking from `ArgumentInfo` again, as the `DeviceIndex` was now wide enough that it didn't wrap back to -1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119639
Approved by: https://github.com/cyyever, https://github.com/albanD
2024-02-21 21:10:49 +00:00
PyTorch MergeBot
dabb90f2a4 Revert "[Exception] [6/N] Remove use of torch::TypeError (#117964)"
This reverts commit 87335fabae.

Reverted https://github.com/pytorch/pytorch/pull/117964 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/117964#issuecomment-1913079096))
2024-01-27 08:44:34 +00:00
cyy
87335fabae [Exception] [6/N] Remove use of torch::TypeError (#117964)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117964
Approved by: https://github.com/albanD
2024-01-25 03:35:58 +00:00
dilililiwhy
b025e5984c Get Device instance with correct type when privateuse1 backend is registered (#117966)
Fixes #ISSUE_NUMBER
If the privateuse1 backend is registered, let torch.device return the corresponding Device instance when only an index is given.
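A hedged sketch (the backend name "foo" is a hypothetical example; this requires an actual privateuse1 backend to be registered):
```python
import torch

# With a registered privateuse1 backend renamed to "foo", a device built
# from a bare index now resolves to that backend instead of 'cuda'.
torch.utils.rename_privateuse1_backend("foo")
print(torch.device(0).type)  # expected: 'foo'
```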
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117966
Approved by: https://github.com/albanD, https://github.com/malfet
2024-01-24 19:03:18 +00:00
cyy
396a5c3091 [Exception] [4/N] Replace torch::IndexError and torch::ValueError with C10 counterparts (#117317)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117317
Approved by: https://github.com/ezyang
2024-01-18 00:35:29 +00:00
Edward Z. Yang
347255809c Make c10::SymInt typecaster support scalar-like fake tensor (#117454)
We can't use `__index__` to do this conversion, because that would trigger a
guard on a data-dependent SymInt if the tensor is a fake tensor; but if
we fetch the item directly and put it in the Scalar, we may still be able to
make it work out.
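For orientation, a minimal eager-mode illustration of the two conversion paths discussed (plain tensors only; the fake-tensor behavior is the subject of this PR and is not reproduced here):
```python
import operator
import torch

t = torch.tensor(5)
print(operator.index(t))  # __index__ path: guards if `t` is a fake tensor
print(int(t.item()))      # item() path: fetch the value into a Scalar directly
```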

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117454
Approved by: https://github.com/yanboliang
ghstack dependencies: #117451, #117452
2024-01-14 15:15:29 +00:00
Edward Z. Yang
796fe40a96 [BE] Delete unnecessary variable fastpath (#117452)
This fastpath is unnecessary because in the logic below we
do the same thing:

```
        auto& var = THPVariable_Unpack(obj);
        if (var.numel() != 1 ||
            !at::isIntegralType(
                var.dtype().toScalarType(), /*include_bool*/ true)) {
          throw_intlist_exception(this, i, obj, idx);
        }
        auto scalar = var.item();
        TORCH_CHECK(scalar.isIntegral(/*include bool*/ false));
        res.push_back(scalar.toSymInt());
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117452
Approved by: https://github.com/yanboliang
ghstack dependencies: #117451
2024-01-14 14:39:46 +00:00
Brian Hirsh
c9ca0dde0d python_arg_parser + dynamic shapes: fix segfault coercing symint to intlist (#111642)
Fixes https://github.com/pytorch/pytorch/issues/104812.

As of https://github.com/pytorch/pytorch/pull/111216, the python arg parser will now guard and cast symints from dynamo into ints when it is forced to (e.g. when we pass a symint to an op that only accepts ints).

But the python arg parser also has logic to try to coerce ints into int[] - we need the same logic for symint -> int[].
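A hedged repro sketch of the scalar-to-list coercion (the op and `dynamic=True` usage are illustrative assumptions; the point is a lone SymInt landing in a list-typed size argument):
```python
import torch

@torch.compile(dynamic=True)
def f(x):
    # Under dynamic shapes x.shape[0] is a SymInt; passing it alone where a
    # size list is expected exercises the parser's scalar -> list coercion,
    # which previously segfaulted for SymInt.
    return x.new_zeros(x.shape[0])

f(torch.randn(4))
```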

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111642
Approved by: https://github.com/ezyang, https://github.com/albanD
ghstack dependencies: #111553
2023-10-22 02:27:14 +00:00
Edward Z. Yang
971f67c988 Allow SymInt to specialize to FLOAT (#111219)
Fixes https://github.com/pytorch/pytorch/issues/111200

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111219
Approved by: https://github.com/Skylion007, https://github.com/bdhirsh
ghstack dependencies: #111216
2023-10-19 12:55:18 +00:00
Edward Z. Yang
40c44c2307 Force specialization on INT_LIST (#111216)
Follow up on https://github.com/pytorch/pytorch/pull/95479

Fixes https://github.com/pytorch/pytorch/issues/111198

Fixes https://github.com/pytorch/pytorch/issues/111197

Fixes https://github.com/pytorch/pytorch/issues/111188

Fixes https://github.com/pytorch/pytorch/issues/111201

Fixes https://github.com/pytorch/pytorch/issues/111202

I can also do this for some other types, will do this stacked on top.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111216
Approved by: https://github.com/voznesenskym
2023-10-19 12:55:18 +00:00
Kurt Mohler
4c5e43574c Reland 2: Add PyObject preservation for UntypedStorage (#109039)
Relands #103907 after it was reverted. This PR makes the new `ignore_hermetic_tls` argument of `check_pyobj` optional to avoid causing a compilation error in torchdistx

Part of #91395
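A hedged sketch of the user-visible meaning of PyObject preservation (the identity assertion is an assumption derived from the PR's goal, not verified output):
```python
import torch

t = torch.empty(2)
s1 = t.untyped_storage()
s2 = t.untyped_storage()
# With preservation, the Python wrapper for the underlying StorageImpl is
# kept and reused, so repeated fetches should return the same object.
assert s1 is s2
```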

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109039
Approved by: https://github.com/ezyang
2023-09-12 22:26:05 +00:00
PyTorch MergeBot
59f605be57 Revert "Reland 2: Add PyObject preservation for UntypedStorage (#109039)"
This reverts commit 419e4e17a2.

Reverted https://github.com/pytorch/pytorch/pull/109039 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing linter job in trunk, probably due to a landrace ([comment](https://github.com/pytorch/pytorch/pull/109039#issuecomment-1715147020))
2023-09-12 07:26:11 +00:00
Kurt Mohler
419e4e17a2 Reland 2: Add PyObject preservation for UntypedStorage (#109039)
Relands #103907 after it was reverted. This PR makes the new `ignore_hermetic_tls` argument of `check_pyobj` optional to avoid causing a compilation error in torchdistx

Part of #91395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109039
Approved by: https://github.com/ezyang
2023-09-12 01:19:40 +00:00
PyTorch MergeBot
68238606f3 Revert "Reland: Add PyObject preservation for UntypedStorage (#103907)"
This reverts commit 56b848157c.

Reverted https://github.com/pytorch/pytorch/pull/103907 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing torchdistx build which uses check_pyobj here 9c1b9f5cb2/src/python/torchdistx/_C/deferred_init.cc (L87) ([comment](https://github.com/pytorch/pytorch/pull/103907#issuecomment-1712121158))
2023-09-08 19:27:07 +00:00
soulitzer
8d863560bd Allow adding extra dispatch keys to wrapper tensor subclass (#108808)
Updated version of https://github.com/pytorch/pytorch/pull/108313 which has more review comments
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108808
Approved by: https://github.com/bdhirsh
2023-09-08 18:46:09 +00:00
Kurt Mohler
56b848157c Reland: Add PyObject preservation for UntypedStorage (#103907)
This relands #97470 after #102553 reverted it. This PR attempts to fix the internal failure by avoiding an unnecessary intermediate storage buffer allocation in `c10::newStorageImplFromRefcountedDataPtr`.

Part of #91395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103907
Approved by: https://github.com/ezyang
2023-09-07 04:24:11 +00:00
cyy
1fd4e787ce [2/N] fix clang-tidy warnings in torch/csrc (#107966)
Apply fixes to some issues found by clang-tidy in torch/csrc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107966
Approved by: https://github.com/Skylion007
2023-08-27 18:06:21 +00:00
FFFrog
6f0d0b3850 fix type check of overflow (#107579)
Fixes #95451 and removes a duplicate check.

**Code:**
```python
import torch
import sys

i = sys.maxsize + 1

input = torch.full((1, 32, 32,), 0.5)
torch.max_pool1d(input, kernel_size=[i] , stride=[i], padding=0, dilation=[i], ceil_mode=True)
```

**Result:**
```shell
Traceback (most recent call last):
  File "/root/Git.d/pytorch/samples/src/simple.py", line 13, in <module>
    torch.max_pool1d(input, kernel_size=[i] , stride=[i], padding=0, dilation=[i], ceil_mode=True)
TypeError: max_pool1d(): argument 'dilation' failed to unpack the object at pos 1 with error "Overflow when unpacking long"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107579
Approved by: https://github.com/albanD
2023-08-23 15:34:40 +00:00
Yukio Siraichi
bcede143bd Do not mutate SymNode expression. (#107492)
This PR stops `SymNode` from mutating (i.e. simplifying) its expression. Instead, the
simplification (without mutation) is deferred to the `SymNode.maybe_as_int` method.

```python
- FakeTensor(size=(s0,), ...)
- FakeTensor(size=(s1, s2, s3), ...)

- Eq(s0, s1 + s2 + s3)

- FakeTensor(size=(s0,), ...)
- FakeTensor(size=(s1, s2, s3), ...)
```

In summary, this PR:
- Replaces `SymNode._expr` by `SymNode.expr`, removing the old property function
    - This makes it so `SymNode` instances never update their expression
- Creates a `SymNode.simplified_expr()` method for actually calling `ShapeEnv.replace` on
  its expression. Note that this doesn't update `SymNode.expr` (see the sketch after this list)
- Changes how `tensor.size()` gets converted to its Python `torch.Size` type
    - Instead of calling the `SymInt::maybe_as_int()` method, we create a new
      `SymInt::is_symbolic()` method for checking whether it is actually a symbolic value
    - This is needed so that when we call `tensor.size()` on the Python side, the returned
      sequence is faithful to the actual data, instead of possibly simplifying it and
      returning an integer
    - 2 files needs this modification:
        - _torch/csrc/Size.cpp_: for handling `torch.Tensor.size` Python calls
        - _torch/csrc/utils/pybind.cpp_: for handling `symint.cast()` C++ calls
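A minimal Python sketch of the new surface described above (only `expr`, `simplified_expr()`, and `is_symbolic()` come from this summary; the surrounding objects are assumptions):
```python
# Given `node`, a SymNode, after this PR:
expr = node.expr                     # immutable; never simplified in place
simplified = node.simplified_expr()  # applies ShapeEnv.replace, no mutation

# On the C++ side, SymInt::is_symbolic() reports symbolic-ness without
# collapsing the value to an int, keeping tensor.size() faithful.
```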

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107492
Approved by: https://github.com/ezyang
ghstack dependencies: #107523
2023-08-22 12:38:05 +00:00
Sam Gross
d0e50d9094 Move overloaded_args from FunctionSignature to PythonArgs (#106983)
This moves the `overloaded_args` field from FunctionSignature to PythonArgs. FunctionSignature is shared by all calls and should be immutable. PythonArgs contains the parsing results for a single call to the PyTorch API.
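A hedged Python analogy of the ownership change (the real types are C++; the class shapes below are illustrative only):
```python
class FunctionSignature:
    """Shared across all calls to an overload; must stay immutable."""

class PythonArgs:
    """Per-call parse result; now owns the per-call mutable state."""
    def __init__(self, signature: FunctionSignature):
        self.signature = signature
        self.overloaded_args = []  # moved here from FunctionSignature
```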

I did not measure a difference in performance in the "overrides_benchmark", although I expect there to be a bit more work in the common case. Note that the noise factor for the benchmark is much larger than the differences reported below:

Before:
```
Type tensor had a minimum time of 2.3615360260009766 us and a standard deviation of 0.7833134150132537 us.
Type SubTensor had a minimum time of 10.473251342773438 us and a standard deviation of 0.1973132457351312 us.
Type WithTorchFunction had a minimum time of 5.484819412231445 us and a standard deviation of 0.13305981701705605 us.
Type SubWithTorchFunction had a minimum time of 11.098146438598633 us and a standard deviation of 0.15598918253090233 us.
```
After:
```
Type tensor had a minimum time of 2.2134780883789062 us and a standard deviation of 0.802064489107579 us.
Type SubTensor had a minimum time of 10.625839233398438 us and a standard deviation of 0.15155907021835446 us.
Type WithTorchFunction had a minimum time of 5.520820617675781 us and a standard deviation of 0.23115111980587244 us.
Type SubWithTorchFunction had a minimum time of 11.227846145629883 us and a standard deviation of 0.23032321769278497 us.
```

Fixes #106974

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106983
Approved by: https://github.com/zou3519, https://github.com/ezyang, https://github.com/albanD
2023-08-16 15:59:26 +00:00
cyy
646fa36875 Add const reference in opportunities detected by clang-tidy (#105931)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105931
Approved by: https://github.com/Skylion007
2023-07-26 21:38:10 +00:00
Anthony Alayo
8d65635378 Prefixing DeviceType with c10 namespace to avoid name collisions (#104364)
Fixes #91338

Follow up from https://github.com/pytorch/pytorch/pull/91342

> 🚀 The feature, motivation and pitch
> We have an existing DeviceType class all over the place in our code base, and it conflicts with the one that is used in torch. Thankfully the PyTorch DeviceType enum class is under the c10 namespace.

```
In file included from /xxx/build/_deps/torch-src/../../aten/src/ATen/ops/view.h:5:
/xxx/_deps/torch-src/aten/src/ATen/Context.h:265:14: error: reference to 'DeviceType' is ambiguous
    if (p == DeviceType::HIP) {
             ^
/xxx/include/Common_types.h:178:8: note: candidate found by name lookup is 'DeviceType'
struct DeviceType {
       ^
/xxx/build/_deps/torch-src/c10/../c10/core/DeviceType.h:32:12: note: candidate found by name lookup is 'c10::DeviceType'
enum class DeviceType : int8_t {
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104364
Approved by: https://github.com/albanD
2023-07-07 13:23:03 +00:00
Will Feng
9541053cca [dynamo] support FakeTensor for SYM_INT/SYM_INT_LIST/INT_LIST param in python-to-cpp argument parsing (#103448)
Before the PR, when compiling a function whose signature takes symint/symintlist/intlist, we got a runtime error like ```argument 'shifts' must be tuple of ints, not FakeTensor```. See the newly added unit test in test/dynamo/test_misc.py for a repro.

After the PR, for a FakeTensor with empty size and numel() == 1, we try to convert it into a symint/symintlist. We will likely see the expected exception ```torch._subclasses.fake_tensor.DataDependentOutputException / aten._local_scalar_dense.default``` during conversion.
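A hedged repro sketch built from the error message above (the specific op is an assumption; `n` becomes a FakeTensor inside dynamo):
```python
import torch

@torch.compile
def f(x, n):
    # `n` is a 0-d tensor; under dynamo it is a FakeTensor with empty size
    # and numel() == 1, which the parser can now convert for `shifts`.
    return torch.roll(x, shifts=n)

f(torch.arange(6), torch.tensor(2))
```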

Reference PRs:
* we handle FakeTensor for symintlist as 1st varags: https://github.com/pytorch/pytorch/pull/97508
* we handle FakeTensor for intlist in a similar way:
https://github.com/pytorch/pytorch/pull/85759/files
* call local_scalar_dense on a FakeTensor:
f7365eca90

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103448
Approved by: https://github.com/yanboliang
2023-06-16 21:33:40 +00:00
Shiyan Deng
685505353a Back out "Add PyObject preservation for UntypedStorage (#97470)" (#102553)
Summary:
Original commit changeset: c24708d18ccb

Original Phabricator Diff: D46159983

Test Plan: SL tests and CI

Differential Revision: D46284986

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102553
Approved by: https://github.com/DanilBaibak
2023-06-01 17:23:43 +00:00
lantiankaikai
17166c2511 python_arg_parser to allow fake tensor element in symint_list when in dynamo mode #95424 (#97508)
Failing mechanism in #95424:
In dynamo mode, a numpy.int_ passed to a 'shape'-like parameter (Sequence[Union[int, SymInt]]) is wrapped as a list containing a FakeTensor. However, in python_arg_parser, the parser expects an int in the symint_list but gets a FakeTensor.

Following #85759, this PR allows tensor elements in a symint_list when in dynamo mode.

This PR also fixes the following tests, which fail through a similar mechanism:
pytest ./generated/test_huggingface_diffusers.py -k test_016
pytest ./generated/test_ustcml_RecStudio.py -k test_036

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97508
Approved by: https://github.com/yanboliang
2023-05-31 19:19:17 +00:00
Kurt Mohler
5fe629e314 Add PyObject preservation for UntypedStorage (#97470)
Part of #91395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97470
Approved by: https://github.com/ezyang
2023-05-23 01:27:30 +00:00
PandaNinjas
f0786ad776 Use %zu instead of %ld when formatting size_t (#101412)
This fixes compiling on systems where `size_t` is an `unsigned int` instead of an `unsigned long int` (32 bit Raspberry Pi OS is one example).
`%ld` expects a `long int`, while `%zu` is the conversion specifier for `size_t`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101412
Approved by: https://github.com/albanD
2023-05-16 02:45:55 +00:00
Edward Z. Yang
756a86d52c Support large negative SymInt (#99157)
The strategy is that we will heap allocate a LargeNegativeIntSymNodeImpl whenever we have a large negative int, so that we can keep the old `is_symbolic` test (now called `is_heap_allocated`) on SymInt. Whenever we need to do something with these ints, though, we convert them back into a plain `int64_t` (and then, e.g., wrap it in whatever user specificed SymNodeImpl they need.) We cannot wrap directly in the user specified SymNodeImpl as we generally do not know what the "tracing context" is from C++. We expect large negative ints to be rare, so we don't apply optimizations like singleton-ifying INT_MIN.  Here's the order to review:

* c10/core/SymInt.h and cpp
  * `is_symbolic` renamed to `is_heap_allocated` as I needed to audit all use sites: the old `is_symbolic` test would return true for a large negative int, but it would be wrong to then try to dispatch on the LargeNegativeIntSymNodeImpl, which supports very few operations. In this file, I had to update expect_int.
  * If you pass in a large negative integer, we instead heap allocate it in `promote_to_negative`. The function is written in a funny way to keep compact constructor code for SymInt (the heap allocation happens out of line)
  * clone is now moved out-of-line
  * New method maybe_as_int which will give you a constant int if it is possible, either because it's stored inline or in LargeNegativeIntSymNodeImpl. This is the preferred replacement for previous use of is_symbolic() and then as_int_unchecked().
  * Rename toSymNodeImpl to toSymNode, which is more correct (since it returns a SymNode)
  * Complete rewrite of `normalize_symints.cpp` to use the new `maybe_as_int`. Cannot easily use the old code structure, so it's now done using a macro and typing out each case manually (it's actually not that bad.)
  * Reimplementations of all the unary operators by hand to use `maybe_as_int`, relatively simple.
* c10/core/LargeNegativeIntSymNodeImpl.h - Just stores a int64_t value, but it has to be big and negative. Most methods are not implemented, since we will rewrap the large negative int in the real SymNodeImpl subclass before doing operations with it
* The rest of the files are just rewriting code to use `maybe_as_int`. There is a nontrivial comment in c10/core/SymIntArrayRef.h
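A hedged Python rendering of the `maybe_as_int` contract from the bullets above (accessor names are assumptions; the real implementation is C++):
```python
def maybe_as_int(symint):
    # Constant stored inline in the SymInt itself.
    if not symint.is_heap_allocated():
        return symint.inline_value            # assumed accessor
    # Constant stored behind a LargeNegativeIntSymNodeImpl.
    node = symint.node                        # assumed accessor
    if isinstance(node, LargeNegativeIntSymNodeImpl):
        return node.value
    # Truly symbolic: no constant int available.
    return None
```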

Very minor test adjustment in c10/test/core/SymInt_test.cpp . Plan to exercise this properly in next PR.

Companion XLA PR: https://github.com/pytorch/xla/pull/4882

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99157
Approved by: https://github.com/albanD
2023-04-15 22:43:51 +00:00
Tugsbayasgalan Manlaibaatar
39fd7f945f Add Symbool support in python to C++ translation (#98453)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98453
Approved by: https://github.com/ezyang
2023-04-12 03:21:57 +00:00
albanD
dda95236c9 Add fast path in our type checks and argparser (#98764)
Add a fast path for common use cases in our Python arg parsing.
This uses the observation that an exact type check (a pointer comparison) is a lot faster than a subtype check (an isinstance call), so we make sure to do exact checks before any isinstance check.

This can be pretty significant: `a.view((1, 1, 1, 1))` goes from ~1.13us to ~800ns.
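A hedged Python analogy of the C++ fast path (the real code compares against PyLong_Type and friends; this mirrors only the ordering):
```python
def parse_int(obj):
    # Fast path: exact type check is just a pointer comparison.
    if type(obj) is int:
        return obj
    # Slow path: subtype check covers int subclasses (e.g. enum.IntEnum).
    if isinstance(obj, int):
        return int(obj)
    raise TypeError(f"expected int, got {type(obj).__name__}")
```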

Full test:

Tested perf locally with cpu freq locked and script pinned to a single core to reduce jitter.
Benchmark results after doing each change in this PR one by one:
```
[albandes@albandes-fedora-K2202N0104138 test]$ # Original
[albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Running  a.view(1)
827 ns ± 0.945 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1))
947 ns ± 1.23 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1))
1.04 µs ± 0.882 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1, 1))
1.14 µs ± 1.59 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze(0)
797 ns ± 0.955 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0,))
937 ns ± 1.51 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0, 1))
1.02 µs ± 3.52 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
[albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Running  a.view(1)
823 ns ± 1.76 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1))
938 ns ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1))
1.03 µs ± 0.801 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1, 1))
1.13 µs ± 0.877 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze(0)
768 ns ± 2.27 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0,))
927 ns ± 0.779 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0, 1))
1.01 µs ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

[albandes@albandes-fedora-K2202N0104138 test]$ # checkLong fastpath
[albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Running  a.view(1)
801 ns ± 0.982 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1))
900 ns ± 0.593 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1))
1 µs ± 1.44 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1, 1))
1.1 µs ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze(0)
782 ns ± 0.968 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0,))
1.11 µs ± 424 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0, 1))
1.09 µs ± 54.7 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
[albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Running  a.view(1)
817 ns ± 0.65 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1))
912 ns ± 0.853 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1))
1.02 µs ± 8.45 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1, 1))
1.11 µs ± 2.53 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze(0)
781 ns ± 0.942 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0,))
939 ns ± 1.57 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0, 1))
1.01 µs ± 0.875 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

[albandes@albandes-fedora-K2202N0104138 test]$ # Tensor check fastpath
[albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Running  a.view(1)
806 ns ± 2.8 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1))
903 ns ± 1.82 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1))
1 µs ± 1.21 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1, 1))
1.1 µs ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze(0)
770 ns ± 1.66 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0,))
931 ns ± 3.36 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0, 1))
1.02 µs ± 0.983 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
[albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Running  a.view(1)
813 ns ± 2.42 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1))
915 ns ± 0.868 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1))
1.02 µs ± 1.09 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1, 1))
1.11 µs ± 1.15 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze(0)
785 ns ± 0.807 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0,))
941 ns ± 1.02 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0, 1))
1.02 µs ± 0.857 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

[albandes@albandes-fedora-K2202N0104138 test]$ # Fast path number in intlist/symintlist
[albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Running  a.view(1)
728 ns ± 0.503 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1))
749 ns ± 0.829 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1))
771 ns ± 0.727 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1, 1))
800 ns ± 0.962 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze(0)
772 ns ± 0.622 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0,))
883 ns ± 0.567 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0, 1))
915 ns ± 0.638 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
[albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Running  a.view(1)
735 ns ± 1.27 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1))
753 ns ± 2.57 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1))
774 ns ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1, 1))
801 ns ± 0.835 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze(0)
773 ns ± 0.677 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0,))
873 ns ± 1.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0, 1))
907 ns ± 0.836 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```

<details>
  <summary>Test script</summary>

```python
import torch
from IPython import get_ipython

a = torch.empty(1)
print("Running ", "a.view(1)")
get_ipython().run_line_magic("timeit", "a.view(1)")
print("Running ", "a.view((1, 1))")
get_ipython().run_line_magic("timeit", "a.view((1, 1))")
print("Running ", "a.view((1, 1, 1))")
get_ipython().run_line_magic("timeit", "a.view((1, 1, 1))")
print("Running ", "a.view((1, 1, 1, 1))")
get_ipython().run_line_magic("timeit", "a.view((1, 1, 1, 1))")

a = torch.empty(1, 1, 1)
print("Running ", "a.squeeze(0)")
get_ipython().run_line_magic("timeit", "a.squeeze(0)")
print("Running ", "a.squeeze((0,))")
get_ipython().run_line_magic("timeit", "a.squeeze((0,))")
print("Running ", "a.squeeze((0, 1))")
get_ipython().run_line_magic("timeit", "a.squeeze((0, 1))")
```

</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98764
Approved by: https://github.com/ngimel
2023-04-11 00:08:26 +00:00
Elias Ellison
5c8fea5647 Reduce overhead in CUDAGraph Trees (#98529)
Significantly reduces the overhead of constructing Tensors and Storages and of checking Storage liveness. Removes the regression for the HF models that I tested and removes 75% of the overhead of the extremely overhead-bound resnet50 training we have in torchbench (0.91x base commit, 1.02x torchinductor default, 1.16x this PR, 1.25x previous cudagraphs impl).

This PR takes care of all of the lower hanging fruit.

- Computes storage aliasing at record time instead of at runtime. We no longer need a runtime storage cache; we can instead index directly into the existing alias if there is one, or construct a new Storage.

- Moves the heavyweight C++ calls into a batch: getting storage weakrefs and constructing tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98529
Approved by: https://github.com/jansel, https://github.com/ngimel
2023-04-07 05:46:08 +00:00
Escapeqyq
3112d2a2b6 Export function symbols to enable Windows build of Intel Extension for PyTorch (#98054)
This PR exports specific function symbols into the .dll shared library on Windows to support the Windows build of [Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch).
TORCH_API/TORCH_PYTHON_API/PYBIND11_EXPORT are macros that mark a function as dllexport during compilation, so that the function symbol is exported into the .dll shared library file on Windows. This is necessary for other libraries (such as IPEX) to import and call these functions by dynamically linking against PyTorch on Windows.
This PR adds these decorations to export the specific functions used by IPEX.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98054
Approved by: https://github.com/ezyang
2023-04-05 23:23:18 +00:00