Commit Graph

205 Commits

Author SHA1 Message Date
Edward Z. Yang
40c44c2307 Force specialization on INT_LIST (#111216)
Follow up on https://github.com/pytorch/pytorch/pull/95479

Fixes https://github.com/pytorch/pytorch/issues/111198

Fixes https://github.com/pytorch/pytorch/issues/111197

Fixes https://github.com/pytorch/pytorch/issues/111188

Fixes https://github.com/pytorch/pytorch/issues/111201

Fixes https://github.com/pytorch/pytorch/issues/111202

I can also do this for some other types; will do that in a stacked PR on top.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111216
Approved by: https://github.com/voznesenskym
2023-10-19 12:55:18 +00:00
Kurt Mohler
4c5e43574c Reland 2: Add PyObject preservation for UntypedStorage (#109039)
Relands #103907 after it was reverted. This PR makes the new `ignore_hermetic_tls` argument of `check_pyobj` optional to avoid causing a compilation error in torchdistx.

Part of #91395
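
A minimal sketch of the shape of the fix, with hypothetical names (the real `check_pyobj` lives in PyTorch's PyObject-slot machinery and takes more context):

```cpp
#include <optional>

// Hypothetical stand-in: only the defaulting of ignore_hermetic_tls
// mirrors the change described above. Giving the new argument a default
// keeps pre-existing call sites (such as torchdistx's deferred_init.cc)
// compiling unchanged.
struct PyObjectSlotSketch {
  std::optional<void*> check_pyobj(bool ignore_hermetic_tls = false) const {
    (void)ignore_hermetic_tls;  // unused in this sketch
    return std::nullopt;
  }
};

// Old callers still compile:   slot.check_pyobj();
// New callers can opt in:      slot.check_pyobj(/*ignore_hermetic_tls=*/true);
```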

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109039
Approved by: https://github.com/ezyang
2023-09-12 22:26:05 +00:00
PyTorch MergeBot
59f605be57 Revert "Reland 2: Add PyObject preservation for UntypedStorage (#109039)"
This reverts commit 419e4e17a2.

Reverted https://github.com/pytorch/pytorch/pull/109039 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing linter job in trunk, probably due to a landrace ([comment](https://github.com/pytorch/pytorch/pull/109039#issuecomment-1715147020))
2023-09-12 07:26:11 +00:00
Kurt Mohler
419e4e17a2 Reland 2: Add PyObject preservation for UntypedStorage (#109039)
Relands #103907 after it was reverted. This PR makes the new `ignore_hermetic_tls` argument of `check_pyobj` optional to avoid causing a compilation error in torchdistx.

Part of #91395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109039
Approved by: https://github.com/ezyang
2023-09-12 01:19:40 +00:00
PyTorch MergeBot
68238606f3 Revert "Reland: Add PyObject preservation for UntypedStorage (#103907)"
This reverts commit 56b848157c.

Reverted https://github.com/pytorch/pytorch/pull/103907 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing torchdistx build which uses check_pyobj here 9c1b9f5cb2/src/python/torchdistx/_C/deferred_init.cc (L87) ([comment](https://github.com/pytorch/pytorch/pull/103907#issuecomment-1712121158))
2023-09-08 19:27:07 +00:00
soulitzer
8d863560bd Allow adding extra dispatch keys to wrapper tensor subclass (#108808)
Updated version of https://github.com/pytorch/pytorch/pull/108313 that addresses more review comments
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108808
Approved by: https://github.com/bdhirsh
2023-09-08 18:46:09 +00:00
Kurt Mohler
56b848157c Reland: Add PyObject preservation for UntypedStorage (#103907)
This relands #97470 after #102553 reverted it. This PR attempts to fix the internal failure by avoiding an unnecessary intermediate storage buffer allocation in `c10::newStorageImplFromRefcountedDataPtr`.

Part of #91395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103907
Approved by: https://github.com/ezyang
2023-09-07 04:24:11 +00:00
cyy
1fd4e787ce [2/N] fix clang-tidy warnings in torch/csrc (#107966)
Apply fixes to some issues found by clang-tidy in torch/csrc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107966
Approved by: https://github.com/Skylion007
2023-08-27 18:06:21 +00:00
FFFrog
6f0d0b3850 fix type check of overflow (#107579)
Fixes #95451 and removes a duplicate check

**Code:**
```python
import torch
import sys

i = sys.maxsize + 1

input = torch.full((1, 32, 32,), 0.5)
torch.max_pool1d(input, kernel_size=[i] , stride=[i], padding=0, dilation=[i], ceil_mode=True)
```

**Result:**
```shell
Traceback (most recent call last):
  File "/root/Git.d/pytorch/samples/src/simple.py", line 13, in <module>
    torch.max_pool1d(input, kernel_size=[i] , stride=[i], padding=0, dilation=[i], ceil_mode=True)
TypeError: max_pool1d(): argument 'dilation' failed to unpack the object at pos 1 with error "Overflow when unpacking long"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107579
Approved by: https://github.com/albanD
2023-08-23 15:34:40 +00:00
Yukio Siraichi
bcede143bd Do not mutate SymNode expression. (#107492)
This PR stops `SymNode` from mutating (i.e. simplifying) its expression. Instead, the
simplification (without mutation) is deferred to the `SymNode.maybe_as_int` method.

```python
- FakeTensor(size=(s0,), ...)
- FakeTensor(size=(s1, s2, s3), ...)

- Eq(s0, s1 + s2 + s3)

- FakeTensor(size=(s0,), ...)
- FakeTensor(size=(s1, s2, s3), ...)
```

In summary, this PR:
- Replaces `SymNode._expr` by `SymNode.expr`, removing the old property function
    - This makes it so `SymNode` instances never update their expression
- Creates a `SymNode.simplified_expr()` method for actually calling `ShapeEnv.replace` on
  its expression. Note that this doesn't update `SymNode.expr`
- Changes how `tensor.size()` gets converted to its Python `torch.Size` type
    - Instead of calling the `SymInt::maybe_as_int()` method, we create a new
      `SymInt::is_symbolic()` method for checking whether it is actually a symbolic value
    - This is needed so that when we call `tensor.size()` on the Python side, the returned
      sequence is faithful to the actual data, instead of possibly simplifying it and
      returning an integer
    - Two files need this modification (a toy sketch of the dispatch follows this list):
        - _torch/csrc/Size.cpp_: for handling `torch.Tensor.size` Python calls
        - _torch/csrc/utils/pybind.cpp_: for handling `symint.cast()` C++ calls
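
A toy sketch of that dispatch (the `wrap_symint` helper and the `ToySymInt` type are hypothetical; the real code lives in torch/csrc/Size.cpp and works on c10::SymInt):

```cpp
#include <Python.h>

// Toy stand-in for c10::SymInt, just enough to show the branch.
struct ToySymInt {
  bool symbolic;
  long long hint;
  bool is_symbolic() const { return symbolic; }
};

PyObject* wrap_symint(const ToySymInt&);  // hypothetical: wraps the SymNode

// Only wrap when the value is actually symbolic; otherwise hand back a
// plain Python int, so tensor.size() stays faithful to the stored data.
PyObject* pack_size(const ToySymInt& s) {
  if (s.is_symbolic()) {
    return wrap_symint(s);
  }
  return PyLong_FromLongLong(s.hint);
}
```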

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107492
Approved by: https://github.com/ezyang
ghstack dependencies: #107523
2023-08-22 12:38:05 +00:00
Sam Gross
d0e50d9094 Move overloaded_args from FunctionSignature to PythonArgs (#106983)
This moves the `overloaded_args` field from FunctionSignature to PythonArgs. FunctionSignature is shared by all calls and should be immutable. PythonArgs contains the parsing results for a single call to the PyTorch API.
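
A hypothetical sketch of the ownership split (field names illustrative, not the actual torch/csrc definitions):

```cpp
#include <Python.h>
#include <string>
#include <vector>

// Shared by every call to an API entry point; nothing here may be
// mutated after construction.
struct FunctionSignatureSketch {
  std::string name;
  // ... parameter metadata, parsed once at startup
};

// One of these exists per call, so it is the right home for per-call
// state such as the __torch_function__ overloads found while parsing.
struct PythonArgsSketch {
  const FunctionSignatureSketch& signature;  // read-only view of shared data
  std::vector<PyObject*> overloaded_args;    // moved here from FunctionSignature
};
```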

I did not measure a difference in performance in the "overrides_benchmark", although I expect there to be a bit more work in the common case. Note that the noise factor for the benchmark is much larger than the differences reported below:

Before:
```
Type tensor had a minimum time of 2.3615360260009766 us and a standard deviation of 0.7833134150132537 us.
Type SubTensor had a minimum time of 10.473251342773438 us and a standard deviation of 0.1973132457351312 us.
Type WithTorchFunction had a minimum time of 5.484819412231445 us and a standard deviation of 0.13305981701705605 us.
Type SubWithTorchFunction had a minimum time of 11.098146438598633 us and a standard deviation of 0.15598918253090233 us.
```
After:
```
Type tensor had a minimum time of 2.2134780883789062 us and a standard deviation of 0.802064489107579 us.
Type SubTensor had a minimum time of 10.625839233398438 us and a standard deviation of 0.15155907021835446 us.
Type WithTorchFunction had a minimum time of 5.520820617675781 us and a standard deviation of 0.23115111980587244 us.
Type SubWithTorchFunction had a minimum time of 11.227846145629883 us and a standard deviation of 0.23032321769278497 us.
```

Fixes #106974

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106983
Approved by: https://github.com/zou3519, https://github.com/ezyang, https://github.com/albanD
2023-08-16 15:59:26 +00:00
cyy
646fa36875 Add const reference in opportunities detected by clang-tidy (#105931)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105931
Approved by: https://github.com/Skylion007
2023-07-26 21:38:10 +00:00
Anthony Alayo
8d65635378 Prefixing DeviceType with c10 namespace to avoid name collisions (#104364)
Fixes #91338

Follow up from https://github.com/pytorch/pytorch/pull/91342

> 🚀 The feature, motivation and pitch
> We have an existing DeviceType class all over the place in our code base, and it conflicts with the one that is used in torch. Thankfully the PyTorch DeviceType enum class is under the c10 namespace.

```
In file included from /xxx/build/_deps/torch-src/../../aten/src/ATen/ops/view.h:5:
/xxx/_deps/torch-src/aten/src/ATen/Context.h:265:14: error: reference to 'DeviceType' is ambiguous
    if (p == DeviceType::HIP) {
             ^
/xxx/include/Common_types.h:178:8: note: candidate found by name lookup is 'DeviceType'
struct DeviceType {
       ^
/xxx/build/_deps/torch-src/c10/../c10/core/DeviceType.h:32:12: note: candidate found by name lookup is 'c10::DeviceType'
enum class DeviceType : int8_t {
```
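
The fix is to qualify the uses with their namespace, e.g.:

```cpp
#include <c10/core/DeviceType.h>

struct DeviceType {};  // user-defined type that used to collide

bool is_hip(c10::DeviceType p) {
  // Qualified lookup resolves unambiguously to the c10 enum, even with a
  // user-defined ::DeviceType in scope.
  return p == c10::DeviceType::HIP;
}
```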

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104364
Approved by: https://github.com/albanD
2023-07-07 13:23:03 +00:00
Will Feng
9541053cca [dynamo] support FakeTensor for SYM_INT/SYM_INT_LIST/INT_LIST param in python-to-cpp argument parsing (#103448)
Before this PR, when compiling a function whose signature takes symint/symintlist/intlist, we hit a runtime error like ```argument 'shifts' must be tuple of ints, not FakeTensor```. See the newly added unit test in test/dynamo/test_misc.py for a repro.

After this PR, for a FakeTensor with empty size and numel() == 1, we try to convert it into a symint/symintlist. We will likely see the expected exception ```torch._subclasses.fake_tensor.DataDependentOutputException / aten._local_scalar_dense.default``` during conversion.

Reference PRs:
* we handle FakeTensor for symintlist as the 1st vararg: https://github.com/pytorch/pytorch/pull/97508
* we handle FakeTensor for intlist in a similar way: https://github.com/pytorch/pytorch/pull/85759/files
* call local_scalar_dense on a FakeTensor: f7365eca90

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103448
Approved by: https://github.com/yanboliang
2023-06-16 21:33:40 +00:00
Shiyan Deng
685505353a Back out "Add PyObject preservation for UntypedStorage (#97470)" (#102553)
Summary:
Original commit changeset: c24708d18ccb

Original Phabricator Diff: D46159983

Test Plan: SL tests and CI

Differential Revision: D46284986

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102553
Approved by: https://github.com/DanilBaibak
2023-06-01 17:23:43 +00:00
lantiankaikai
17166c2511 python_arg_parser to allow fake tensor element in symint_list when in dynamo mode #95424 (#97508)
Failing mechanism in #95424:
In dynamo mode, a numpy.int_ passed to a 'shape'-like param (Sequence[Union[int, SymInt]]) gets wrapped as a list containing a FakeTensor. However, the python_arg_parser expects an int in the symint_list but gets a FakeTensor.

Following #85759, this PR allows tensor elements in symint_list when in dynamo mode.

This PR also fixes the tests below, which fail through a similar mechanism:
pytest ./generated/test_huggingface_diffusers.py -k test_016
pytest ./generated/test_ustcml_RecStudio.py -k test_036

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97508
Approved by: https://github.com/yanboliang
2023-05-31 19:19:17 +00:00
Kurt Mohler
5fe629e314 Add PyObject preservation for UntypedStorage (#97470)
Part of #91395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97470
Approved by: https://github.com/ezyang
2023-05-23 01:27:30 +00:00
PandaNinjas
f0786ad776 Use %zu instead of %ld when formatting size_t (#101412)
This fixes compiling on systems where `size_t` is an `unsigned int` instead of an `unsigned long int` (32-bit Raspberry Pi OS is one example).
`%ld` expects a `long int`, while `%zu` is the dedicated conversion specifier for `size_t`.
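
A minimal illustration of the portable format:

```cpp
#include <cstddef>
#include <cstdio>

int main() {
  std::size_t n = sizeof(long);
  // %zu matches size_t on every platform; %ld assumes long, which is wrong
  // wherever size_t is a plain unsigned int (e.g. 32-bit Raspberry Pi OS).
  std::printf("sizeof(long) = %zu\n", n);
  return 0;
}
```
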
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101412
Approved by: https://github.com/albanD
2023-05-16 02:45:55 +00:00
Edward Z. Yang
756a86d52c Support large negative SymInt (#99157)
The strategy is that we will heap allocate a LargeNegativeIntSymNodeImpl whenever we have a large negative int, so that we can keep the old `is_symbolic` test (now called `is_heap_allocated`) on SymInt. Whenever we need to do something with these ints, though, we convert them back into a plain `int64_t` (and then, e.g., wrap it in whatever user-specified SymNodeImpl they need). We cannot wrap directly in the user-specified SymNodeImpl as we generally do not know what the "tracing context" is from C++. We expect large negative ints to be rare, so we don't apply optimizations like singleton-ifying INT_MIN. (A toy sketch of the representation follows the review list below.) Here's the order to review:

* c10/core/SymInt.h and cpp
  * `is_symbolic` renamed to `is_heap_allocated` as I needed to audit all use sites: the old `is_symbolic` test would return true for a large negative int, but it would be wrong to then try to dispatch on the LargeNegativeIntSymNodeImpl, which supports very few operations. In this file, I had to update expect_int.
  * If you pass in a large negative integer, we instead heap allocate it in `promote_to_negative`. The function is written in a funny way to keep compact constructor code for SymInt (the heap allocation happens out of line)
  * clone is now moved out-of-line
  * New method maybe_as_int which will give you a constant int if it is possible, either because it's stored inline or in LargeNegativeIntSymNodeImpl. This is the preferred replacement for previous use of is_symbolic() and then as_int_unchecked().
  * Rename toSymNodeImpl to toSymNode, which is more correct (since it returns a SymNode)
  * Complete rewrite of `normalize_symints.cpp` to use the new `maybe_as_int`. Cannot easily use the old code structure, so it's now done using a macro and typing out each case manually (it's actually not that bad.)
  * Reimplementations of all the unary operators by hand to use `maybe_as_int`, relatively simple.
* c10/core/LargeNegativeIntSymNodeImpl.h - Just stores a int64_t value, but it has to be big and negative. Most methods are not implemented, since we will rewrap the large negative int in the real SymNodeImpl subclass before doing operations with it
* The rest of the files are just rewriting code to use `maybe_as_int`. There is a nontrivial comment in c10/core/SymIntArrayRef.h

Very minor test adjustment in c10/test/core/SymInt_test.cpp. Plan to exercise this properly in the next PR.

Companion XLA PR: https://github.com/pytorch/xla/pull/4882
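
A self-contained toy of the representation (the names echo the PR, but nothing here is the real c10 code; the inline/boxed cutoff is illustrative):

```cpp
#include <cstdint>
#include <iostream>
#include <memory>
#include <optional>

// Boxed payload; stand-in for LargeNegativeIntSymNodeImpl.
struct HeapBoxSketch {
  int64_t value;
};

class SymIntSketch {
 public:
  explicit SymIntSketch(int64_t v) {
    if (v < kMinInline) {
      // promote_to_negative-style path: heap allocate out of line
      box_ = std::make_shared<HeapBoxSketch>(HeapBoxSketch{v});
    } else {
      inline_ = v;
    }
  }
  bool is_heap_allocated() const { return box_ != nullptr; }
  // Recovers a plain constant whether it is stored inline or boxed; a truly
  // symbolic value (not modeled here) would return nullopt instead.
  std::optional<int64_t> maybe_as_int() const {
    return box_ ? box_->value : inline_;
  }

 private:
  static constexpr int64_t kMinInline = -(int64_t{1} << 62);  // illustrative
  int64_t inline_ = 0;
  std::shared_ptr<HeapBoxSketch> box_;
};

int main() {
  SymIntSketch small(-42);
  SymIntSketch huge(INT64_MIN);
  std::cout << small.is_heap_allocated() << ' ' << huge.is_heap_allocated() << '\n';  // 0 1
  std::cout << *huge.maybe_as_int() << '\n';  // constant recovered from the box
}
```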

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99157
Approved by: https://github.com/albanD
2023-04-15 22:43:51 +00:00
Tugsbayasgalan Manlaibaatar
39fd7f945f Add Symbool support in python to C++ translation (#98453)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98453
Approved by: https://github.com/ezyang
2023-04-12 03:21:57 +00:00
albanD
dda95236c9 Add fast path in our type checks and argparser (#98764)
Add a fast path for common use cases in our Python arg parsing.
This uses the observation that an exact type check (a pointer comparison) is a lot faster than a subtype check (an isinstance call), so we make sure to do the exact checks before any isinstance check.

This can be pretty significant: `a.view((1, 1, 1, 1))` goes from ~1.13us to ~800ns.
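
A hedged sketch of the trick using the CPython C API (the real parser checks many more types, and also special-cases bool):

```cpp
#include <Python.h>

// Fast path first: PyLong_CheckExact is a single pointer comparison
// against PyLong_Type, so it is nearly free for the overwhelmingly common
// case of a plain int. Only then fall back to the subclass-aware check,
// which is what an isinstance call costs.
static bool is_int_like(PyObject* obj) {
  if (PyLong_CheckExact(obj)) {
    return true;
  }
  return PyLong_Check(obj) != 0;  // slow path: accepts int subclasses too
}
```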

Full test:

Tested perf locally with cpu freq locked and script pinned to a single core to reduce jitter.
Benchmark results after doing each change in this PR one by one:
```
[albandes@albandes-fedora-K2202N0104138 test]$ # Original
[albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Running  a.view(1)
827 ns ± 0.945 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1))
947 ns ± 1.23 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1))
1.04 µs ± 0.882 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1, 1))
1.14 µs ± 1.59 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze(0)
797 ns ± 0.955 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0,))
937 ns ± 1.51 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0, 1))
1.02 µs ± 3.52 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
[albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Running  a.view(1)
823 ns ± 1.76 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1))
938 ns ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1))
1.03 µs ± 0.801 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1, 1))
1.13 µs ± 0.877 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze(0)
768 ns ± 2.27 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0,))
927 ns ± 0.779 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0, 1))
1.01 µs ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

[albandes@albandes-fedora-K2202N0104138 test]$ # checkLong fastpath
[albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Running  a.view(1)
801 ns ± 0.982 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1))
900 ns ± 0.593 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1))
1 µs ± 1.44 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1, 1))
1.1 µs ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze(0)
782 ns ± 0.968 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0,))
1.11 µs ± 424 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0, 1))
1.09 µs ± 54.7 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
[albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Running  a.view(1)
817 ns ± 0.65 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1))
912 ns ± 0.853 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1))
1.02 µs ± 8.45 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1, 1))
1.11 µs ± 2.53 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze(0)
781 ns ± 0.942 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0,))
939 ns ± 1.57 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0, 1))
1.01 µs ± 0.875 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

[albandes@albandes-fedora-K2202N0104138 test]$ # Tensor check fastpath
[albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Running  a.view(1)
806 ns ± 2.8 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1))
903 ns ± 1.82 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1))
1 µs ± 1.21 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1, 1))
1.1 µs ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze(0)
770 ns ± 1.66 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0,))
931 ns ± 3.36 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0, 1))
1.02 µs ± 0.983 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
[albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Running  a.view(1)
813 ns ± 2.42 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1))
915 ns ± 0.868 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1))
1.02 µs ± 1.09 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1, 1))
1.11 µs ± 1.15 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze(0)
785 ns ± 0.807 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0,))
941 ns ± 1.02 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0, 1))
1.02 µs ± 0.857 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

[albandes@albandes-fedora-K2202N0104138 test]$ # Fast path number in intlist/symintlist
[albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Running  a.view(1)
728 ns ± 0.503 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1))
749 ns ± 0.829 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1))
771 ns ± 0.727 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1, 1))
800 ns ± 0.962 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze(0)
772 ns ± 0.622 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0,))
883 ns ± 0.567 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0, 1))
915 ns ± 0.638 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
[albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Running  a.view(1)
735 ns ± 1.27 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1))
753 ns ± 2.57 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1))
774 ns ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1, 1))
801 ns ± 0.835 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze(0)
773 ns ± 0.677 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0,))
873 ns ± 1.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0, 1))
907 ns ± 0.836 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```

<details>
  <summary>Test script</summary>

```python
import torch
from IPython import get_ipython

a = torch.empty(1)
print("Running ", "a.view(1)")
get_ipython().run_line_magic("timeit", "a.view(1)")
print("Running ", "a.view((1, 1))")
get_ipython().run_line_magic("timeit", "a.view((1, 1))")
print("Running ", "a.view((1, 1, 1))")
get_ipython().run_line_magic("timeit", "a.view((1, 1, 1))")
print("Running ", "a.view((1, 1, 1, 1))")
get_ipython().run_line_magic("timeit", "a.view((1, 1, 1, 1))")

a = torch.empty(1, 1, 1)
print("Running ", "a.squeeze(0)")
get_ipython().run_line_magic("timeit", "a.squeeze(0)")
print("Running ", "a.squeeze((0,))")
get_ipython().run_line_magic("timeit", "a.squeeze((0,))")
print("Running ", "a.squeeze((0, 1))")
get_ipython().run_line_magic("timeit", "a.squeeze((0, 1))")
```

</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98764
Approved by: https://github.com/ngimel
2023-04-11 00:08:26 +00:00
Elias Ellison
5c8fea5647 Reduce overhead in CUDAGraph Trees (#98529)
Significantly reduces the overhead of constructing Tensors and Storages and of checking Storage liveness. Removes the regression for the HF models that I tested and removes 75% of the overhead of the extremely overhead-bound resnet50 training we have in torchbench (0.91x base commit, 1.02x torchinductor default, 1.16x this PR, 1.25x previous cudagraphs impl).

This PR takes care of all of the lower hanging fruit.

- Computes storage aliasing at record time instead of at runtime. We no longer need to use a runtime storage cache, and can instead index directly into the existing alias if there is one, or construct a new Storage

- Moves the heavyweight C++ calls into a batch - getting storage weakrefs and constructing tensors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98529
Approved by: https://github.com/jansel, https://github.com/ngimel
2023-04-07 05:46:08 +00:00
Escapeqyq
3112d2a2b6 Export function symbols to enable Windows build of Intel Extension for PyTorch (#98054)
This PR exports specific function symbols into the .dll shared library on the Windows platform to support the Windows build of [Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch).
TORCH_API/TORCH_PYTHON_API/PYBIND11_EXPORT are macros that decorate a function as dllexport during compilation, so that the function symbol is exported into the .dll shared library file on Windows. This is necessary for other libraries (such as IPEX) to import and call these functions through dynamic linking against PyTorch on Windows.
This PR adds these decorators to the specific functions used by IPEX.
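
The usual shape of such a macro, sketched with hypothetical names (TORCH_API follows the same dllexport/dllimport pattern):

```cpp
// When building the library itself, the macro expands to dllexport so the
// symbol lands in the .dll's export table; consumers see dllimport instead.
#ifdef _WIN32
#  ifdef BUILDING_MYLIB
#    define MYLIB_API __declspec(dllexport)
#  else
#    define MYLIB_API __declspec(dllimport)
#  endif
#else
#  define MYLIB_API __attribute__((visibility("default")))
#endif

MYLIB_API void function_needed_by_ipex();  // hypothetical IPEX-facing symbol
```
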
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98054
Approved by: https://github.com/ezyang
2023-04-05 23:23:18 +00:00
cyy
f27e09de04 Cleanup Windows warning suppression in CMake and fix some warnings in the source code (#94927)
This PR does two things:
1. It moves some Windows warning suppressions from various CMake files into the main CMakeLists.txt, following the conventions of gcc and clang.
2. It fixes some Windows warnings in the source code. Most importantly, it fixes lots of dll warnings by adjusting C10_API to TORCH_API or TORCH_PYTHON_API. There are still some dll warnings because some TORCH_API functions are actually built as part of libtorch_python.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94927
Approved by: https://github.com/malfet
2023-02-27 19:22:20 +00:00
Edward Z. Yang
d78274b759 Automatically guard when SymInt is converted to int (#95479)
During enablement, we disabled int() conversions because they were
an easy way to footgun guards. We have enough of dynamic shapes
working now that this is now causing spurious errors; e.g., if you feed
a symbolic int to x.size(symint). We now allow implicit conversions
of SymInt to int here, posting a guard. We expect guard provenance
to help people debug overspecialization.
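
A hypothetical toy of "convert, but post a guard" (none of these names are the real dynamo machinery):

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Each conversion records a predicate; compiled code is only reused on
// later calls whose concrete value satisfies every recorded guard.
struct GuardListSketch {
  std::vector<std::function<bool(int64_t)>> guards;
};

struct GuardedSymIntSketch {
  int64_t hint;               // concrete value observed while tracing
  GuardListSketch* guards;

  operator int64_t() const {  // implicit int() conversion specializes
    int64_t seen = hint;
    guards->guards.push_back([seen](int64_t v) { return v == seen; });
    return seen;
  }
};
```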

Fixes https://github.com/pytorch/pytorch/issues/95328

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95479
Approved by: https://github.com/wconstab, https://github.com/voznesenskym, https://github.com/ngimel
2023-02-25 19:41:51 +00:00
cyy
bfe5e1258b avoid unnecessary static_cast (#93898)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93898
Approved by: https://github.com/Skylion007
2023-02-03 03:44:43 +00:00
Eddie Yan
e096d2db5a [BC-Breaking] Separate stream_id, device_index, and device_type in pack and unpack for Streams (#81596)
#75854

A naive attempt at working around the limitations of using a single 64-bit integer to pack `stream_id`, `device_index`, and `device_type`.

Still needs sanity checks, testing, and minimization of BC-breaking changes.

Currently a Holder for the `StreamData3` struct is used for `IValue` compatibility. While doing this seems to work for `ivalue.h` and `ivalue_inl.h`, it doesn't seem to work naively for the JIT CUDA stream wrapper. (Something about ambiguous calls if an `intrusive_ptr` to `c10::ivalue::StreamData3Holder` is used as the return type for `pack()`.) It turns out that the methods required to access the fields for rematerializing a CUDA Stream are basically already present anyway, so `pack` is simply removed in the wrapper for now and the methods to access the required fields are called directly.
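
To see why the single-word packing was limiting, a hypothetical layout (not PyTorch's actual bit assignment):

```cpp
#include <cstdint>

// Every field must fit its slot; a stream_id wider than 48 bits would
// silently corrupt the device fields. The struct trades one word for
// three and drops the bit arithmetic entirely.
constexpr uint64_t pack(uint64_t stream_id, uint64_t device_index,
                        uint64_t device_type) {
  return (stream_id & 0xFFFFFFFFFFFFull) |
         ((device_index & 0xFFull) << 48) |
         ((device_type & 0xFFull) << 56);
}

struct StreamData3Sketch {  // shape of the unpacked replacement
  int64_t stream_id;
  int64_t device_index;
  int64_t device_type;
};
```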

CC @ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81596
Approved by: https://github.com/ezyang
2023-01-12 14:16:49 +00:00
PyTorch MergeBot
b3603f8129 Revert "Deduplicate c10 error and PyTorchError hierarchy (#87855)"
This reverts commit 34f2d3e6ae.

Reverted https://github.com/pytorch/pytorch/pull/87855 on behalf of https://github.com/osalpekar due to perf regression in quantization tests
2023-01-06 19:56:35 +00:00
William Phetsinorath
34f2d3e6ae Deduplicate c10 error and PyTorchError hierarchy (#87855)
Fixes #53370

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87855
Approved by: https://github.com/albanD
2023-01-02 15:53:36 +00:00
Aaron Gokaslan
a34a9c3471 Perf: Apply more clang-tidy fixups to torch headers (#91445)
Applies some more fixes to headers that may have been missed before for performance optimization. cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @EikanWang @ezyang, since this is more in the series of clang-tidy fixups.

This PR fixes 3 main issues:
1. Use emplacement more in headers
2. Avoid unnecessary copies and use const refs when possible
3. Default any special functions when possible to make them potentially trivial and more readable.

There is also one change in this PR that tries to prevent unnecessary math promotion; the rest of those changes are in another PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91445
Approved by: https://github.com/ezyang
2022-12-29 23:43:45 +00:00
Edward Z. Yang
f7365eca90 Add unbacked symints support; item works now (#90624)
The big idea is to add `create_unbacked_symfloat` and `create_unbacked_symint` to ShapeEnv, allowing you to allocate symbolic floats/ints corresponding to data you don't know about at compile time. Then, instead of immediately erroring out when you try to call local_scalar_dense on a FakeTensor, we instead create a fresh symint/symfloat and return that.

There are a bunch of odds and ends that need to be handled:

* A number of `numel` calls converted to `sym_numel`
* When we finally return from item(), we need to ensure we actually produce a SymInt/SymFloat when appropriate. The previous binding code assumed that you would have to get a normal Python item. I add a pybind11 binding for Scalar (to PyObject only) and refactor the code to use that. There is some trickiness where you are NOT allowed to go through c10::SymInt if there isn't actually any SymInt involved. See comment.
* One of our unit tests tripped an implicit data dependent access which occurs when you pass a Tensor as an argument to a sizes parameter. This is also converted to support symbolic shapes
* We now support tracking bare SymInt/SymFloat returns in proxy tensor mode (this was already in symbolic-shapes branch)
* Whenever we allocate an unbacked symint, we record the stack trace it was allocated at. These get printed when you attempt data dependent access on the symint (e.g., you try to guard on it)
* Subtlety: unbacked symints are not necessarily > 1. I added a test for this.

These unbacked symints are not very useful right now as you will almost always immediately raise an error later when you try to guard on them. The next logical step is adding an assertion refinement system that lets ShapeEnv learn facts about unbacked symints so it can do a better job eliding guards that are unnecessary.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90624
Approved by: https://github.com/Skylion007, https://github.com/voznesenskym
2022-12-12 13:33:07 +00:00
Edward Z. Yang
d3c01c722d Fix pybind11 problems with c10::SymInt unregistered (#88011)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88011
Approved by: https://github.com/weiwangmeta, https://github.com/albanD
2022-10-29 07:55:45 +00:00
Edward Z. Yang
1ff52225f1 Unify SymIntNode and SymFloatNode into SymNode (#87817)
This refactor was prompted by challenges handling mixed int/float
operations in C++. A previous version of this patch
(https://github.com/pytorch/pytorch/pull/87722/) added overloads for
each permutation of int/float and was unwieldy. This PR takes a
different approach.

The general outline of the patch is to combine the C++ types SymIntNode
and SymFloatNode into a single type, SymNode. This is type erased; we
no longer know statically in C++ if we have an int/float and have to test
it with the is_int()/is_float() virtual methods. This has a number of
knock-on effects (a toy sketch of the type-erased node follows the list
below).

- We no longer have C++ classes to bind to Python.  Instead, we take an
  entirely new approach to our Python API, where we have a SymInt/SymFloat
  class defined entirely in Python, which hold a SymNode (which corresponds
  to the C++ SymNode).  However, SymNode is not pybind11-bound; instead,
  it lives as-is in Python, and is wrapped into C++ SymNode using PythonSymNode
  when it goes into C++.  This implies a userland rename.

  In principle, it is also possible for the canonical implementation of SymNode
  to be written in C++, and then bound to Python with pybind11 (we have
  this code, although it is commented out.)  However, I did not implement
  this as we currently have no C++ implementations of SymNode.

  Because we do return SymInt/SymFloat from C++ bindings, the C++ binding
  code needs to know how to find these classes.  Currently, this is done
  just by manually importing torch and getting the attributes.

- Because SymInt/SymFloat are easy Python wrappers, __sym_dispatch__ now
  takes SymInt/SymFloat, rather than SymNode, bringing it in line with how
  __torch_dispatch__ works.

Some miscellaneous improvements:

- SymInt now has a constructor that takes SymNode.  Note that this
  constructor is ambiguous if you pass in a subclass of SymNode,
  so an explicit downcast is necessary.  This means toSymFloat/toSymInt
  are no more.  This is a mild optimization as it means rvalue reference
  works automatically.

- We uniformly use the caster for c10::SymInt/SymFloat, rather than
  going the long way via the SymIntNode/SymFloatNode.

- Removed some unnecessary toSymInt/toSymFloat calls in normalize_*
  functions, pretty sure this doesn't do anything.

- guard_int is now a free function, since to guard on an int you cannot
  assume the method exists.  A function can handle both int and SymInt
  inputs.

- We clean up the magic method definition code for SymInt/SymFloat/SymNode.
  ONLY the user classes (SymInt/SymFloat) get magic methods; SymNode gets
  plain methods; this is to help avoid confusion between the two types.
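
A toy of the type-erased design (illustrative only, not the real SymNodeImpl):

```cpp
#include <memory>

// One node type for both ints and floats; the payload kind is discovered
// at runtime via virtual queries rather than encoded in the static type.
struct SymNodeImplSketch {
  virtual ~SymNodeImplSketch() = default;
  virtual bool is_int() const { return false; }
  virtual bool is_float() const { return false; }
};

struct IntNodeSketch : SymNodeImplSketch {
  long long value;
  explicit IntNodeSketch(long long v) : value(v) {}
  bool is_int() const override { return true; }
};

using SymNodeSketch = std::shared_ptr<SymNodeImplSketch>;
```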

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87817
Approved by: https://github.com/albanD, https://github.com/anjali411
2022-10-27 20:56:02 +00:00
albanD
3263bd24be Improve argument printing (#87601)
No more "expected tuple but got tuple".  We appropriately
grovel in the list/tuple for the element that mismatched
and report what exactly twinged the failure.

invalid_arguments.cpp is a shitshow so I did something
slapdash to get it not completely horrible.  See
https://github.com/pytorch/pytorch/issues/87514 for more context.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87601
Approved by: https://github.com/Chillee
2022-10-24 23:55:10 +00:00
samdow
169ec120ef [Modes] refactor modes to only use a stack in cpp (#86458)
Refactors the mode code to only have the C++ mode stack and not the "C++ mode" like we originally had. This also simplifies the mode logic in a number of places
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86458
Approved by: https://github.com/zou3519
2022-10-21 19:18:23 +00:00
Edward Z. Yang
954660a308 Correctly error if you pass in tensors where size arguments expected (#86126)
This also makes symintlist track intlist exception handling,
which eellison fixed.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86126
Approved by: https://github.com/eellison
2022-10-03 20:18:41 +00:00
Edward Z. Yang
07800c9c81 Miscellaneous fixes from symbolic-shapes branch (#86042)
- Make toIValue accept SymIntNode and SymFloatNode where number (aka Scalar) is
  expected
- Binding for symintlistOptional in python arg parser
- Teach translate to convert from IntArrayRef to ArrayRef<int64_t>
- Don't query _symint function for meta info in LTC unless LTC is
  code generating a symint function

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86042
Approved by: https://github.com/Chillee
2022-10-01 13:57:58 +00:00
Elias Ellison
75db0225ad Handle fake tensor in intlist (#85759)
Previously, we were swallowing up the Fake Tensor Exception and throwing `TypeError`, which led to https://github.com/pytorch/torchdynamo/issues/1066. Now, we are propagating back the `DataDependentOutputException`.

If this approach is accepted, I can go ahead and do doublelist and symintlist afterward.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85759
Approved by: https://github.com/ezyang
2022-09-28 21:58:54 +00:00
Edward Z. Yang
9c036aa112 Add SymInt to Scalar (#84958)
This is by no means comprehensive, but adds initial support for SymInt as a Scalar.

Things that don't work yet but need to:
- for some reason `torch.add(tensor, sym_int)` got matched to the `add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor` schema
- `x + sym_int` failed because we tried to turn `x` into a sym int:
```
              "__radd__",
              [](c10::SymIntNode a, py::object b) -> c10::SymIntNode {
                auto snb = toSymIntNode(a, b);
                return a->add(snb);
              })
 ```
- Many more things I'm sure

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84958
Approved by: https://github.com/ezyang
2022-09-25 23:51:06 +00:00
Nikolay Korovaiko
f725009a48 as_strided supports SymInt; codegen supports optional SymInt (#84393)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84393
Approved by: https://github.com/ezyang
2022-09-06 16:39:24 +00:00
Edward Z. Yang
2a332afbf4 Add SymFloat, support SymInt to SymFloat conversion (#84284)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84284
Approved by: https://github.com/albanD
2022-09-03 01:30:32 +00:00
Nikolay Korovaiko
63cbdc92a7 switching the exact check to isinstance check (#84023)
Simplifies the check in `is_symint_node` for whether an object is a SymIntNode by using an isinstance check instead of an exact type check.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84023
Approved by: https://github.com/ezyang
2022-08-25 08:28:40 +00:00
Nikolay Korovaiko
5b621205f4 Revert "Revert "adding a custom caster for c10::SymInt (#82692)"" (#83223)
This should fix the MacOS build errors and reland #82692
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83223
Approved by: https://github.com/albanD
2022-08-12 00:46:50 +00:00
PyTorch MergeBot
daeea7d2c3 Revert "adding a custom caster for c10::SymInt (#82692)"
This reverts commit dee63f4f7b.

Reverted https://github.com/pytorch/pytorch/pull/82692 on behalf of https://github.com/seemethere due to Broke internal builds, see [logs](https://www.internalfb.com/intern/sandcastle/job/4503600373141339/insights)
2022-08-09 22:17:41 +00:00
Nikolay Korovaiko
dee63f4f7b adding a custom caster for c10::SymInt (#82692)
### Description
Adding a custom caster for `c10::SymInt`. This simplifies handling of c10::SymInt on the C++/PyTorch boundary, namely by removing if statements that handle the union nature (SymIntNode vs. plain int) of c10::SymInt.
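
The general shape of such a caster, sketched for a toy value type (this is standard pybind11 machinery, not the PR's actual code):

```cpp
#include <pybind11/pybind11.h>

struct ToySymInt { long long v; };  // stand-in; the real type is c10::SymInt

namespace pybind11 {
namespace detail {
template <>
struct type_caster<ToySymInt> {
 public:
  PYBIND11_TYPE_CASTER(ToySymInt, _("ToySymInt"));

  // Python -> C++: accept a plain int here; a real caster would also
  // accept the symbolic node type, hiding the union from call sites.
  bool load(handle src, bool /*convert*/) {
    if (!PyLong_Check(src.ptr())) return false;
    value.v = PyLong_AsLongLong(src.ptr());
    return !PyErr_Occurred();
  }

  // C++ -> Python
  static handle cast(const ToySymInt& s, return_value_policy, handle) {
    return PyLong_FromLongLong(s.v);
  }
};
}  // namespace detail
}  // namespace pybind11
```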

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82692
Approved by: https://github.com/ezyang
2022-08-08 21:40:53 +00:00
Peter Bell
2c2278a960 Make python TensorOption signatures consistent with JIT schemas (#82241)
Fixes #81774

`TensorOptions` arguments in the JIT schema are optional, but in the Python API these were being translated to non-optional but with a default value. This change makes the arguments accept `None` for consistency with the JIT schema. However, it also means that `dtype=c10::nullopt` was previously completely untested so this also fixes several related bugs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82241
Approved by: https://github.com/ngimel
2022-08-07 00:10:27 +00:00
Edward Z. Yang
a9320e6d96 Delete SymInt::data() in favor of as_int_unchecked() (#82477)
I audited all the sites while I was at it, and marked a few suspicious
ones.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82477
Approved by: https://github.com/Chillee
2022-08-01 15:07:22 +00:00
Edward Z. Yang
50e8abbcad Change SymIntNode into an intrusive pointer (#82548)
This will make the pointer type a single word, which is important
for packing it into an int64_t.

This time, this diff doesn't segfault when you build with DEBUG mode; more details at https://github.com/pybind/pybind11/issues/4099

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82548
Approved by: https://github.com/albanD
2022-08-01 15:07:21 +00:00
Edward Z. Yang
fd5ac1e6b5 Rename SymbolicIntNode to SymIntNodeImpl (#82350)
Done via

```
git grep -l 'SymbolicIntNode' | xargs sed -i 's/SymbolicIntNode/SymIntNodeImpl/g'
```

Reasoning for the change:

* Sym is shorter than Symbolic, and consistent with SymInt
* You usually will deal in shared_ptr<...>, so we're going to
  reserve the shorter name (SymIntNode) for the shared pointer.

But I don't want to update the Python name, so afterwards I ran

```
 git grep -l _C.SymIntNodeImpl | xargs sed -i 's/_C.SymIntNodeImpl/_C.SymIntNode/'
```

and manually fixed up the binding code

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82350
Approved by: https://github.com/Krovatkin
2022-07-28 18:27:45 +00:00
George Qi
393f7f6ad7 add layout to slow path (#80429)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80429
Approved by: https://github.com/ezyang
2022-07-06 18:01:31 +00:00