Commit Graph

1359 Commits

Author SHA1 Message Date
PyTorch MergeBot
fff9d98e58 Revert "Increased compile time max GPUs to 512. Switched to int16_t DeviceIndex. (#119639)"
This reverts commit e0268821dd.

Reverted https://github.com/pytorch/pytorch/pull/119639 on behalf of https://github.com/huydhn due to Sorry for reverting your change but I think the Windows failures are legit as they are failing now in trunk, i.e. 450339ab2d ([comment](https://github.com/pytorch/pytorch/pull/119639#issuecomment-1958428416))
2024-02-22 00:12:54 +00:00
Tobias Ringwald
e0268821dd Increased compile time max GPUs to 512. Switched to int16_t DeviceIndex. (#119639)
Fixes #115331.

This PR increases the number of valid GPU devices to 512 (from 64) in order to future-proof PyTorch for providers that offer [single nodes with a large device count](https://www.tensorwave.com/). Until now, `DeviceIndex` was an `int8_t`, thus multiple changes were necessary:

- Changed `DeviceIndex` to `int16_t` and updated consumers that assumed it was an `int8_t`.
- Updated bounds checking for `torch.device()` in the Python frontend. Previously, we allowed funny things like `torch.device('cpu', 200).index == -56`, which is undefined behavior. I inserted checks that only allow values between 0 and `c10::Device::MAX_NUM_DEVICES - 1` (see the sketch below).
- Updated the `ArgumentInfo` struct, as it hardcodes the device index as an 8-bit field [^1]. This might be a breaking change; I am not sure whether users rely on this.
- Introduced `c10::Device::MAX_NUM_DEVICES` as a replacement for the old `C10_COMPILE_TIME_MAX_GPUS`.

[^1]: This field was unsigned, so I guess this has also been undefined behavior the whole time? Our default device index is -1, so this always wrapped around to 255 when written to the `ArgumentInfo` struct. When I switched the `DeviceIndex` to `int16_t`, it actually stayed 255 after unpacking from `ArgumentInfo` again, as the `DeviceIndex` was now wide enough that it didn't wrap back to -1.
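A minimal Python sketch of the new bounds check (the example index values are hypothetical; this assumes a build containing this PR):

```python
import torch

# Device objects are constructed lazily, so no GPU is needed here.
d = torch.device('cuda', 3)
assert d.index == 3  # a valid index in [0, c10::Device::MAX_NUM_DEVICES - 1]

# Previously a large index silently wrapped in the int8_t DeviceIndex
# (e.g. torch.device('cpu', 200).index == -56); with the new checks an
# out-of-range index should be rejected at construction time instead.
try:
    torch.device('cuda', 100_000)  # hypothetical out-of-range index
except (RuntimeError, ValueError) as err:
    print('rejected:', err)
```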
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119639
Approved by: https://github.com/cyyever, https://github.com/albanD
2024-02-21 21:10:49 +00:00
soulitzer
27c5bbe5cb Add is_nested_int() (#119975)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119975
Approved by: https://github.com/jbschlosser
ghstack dependencies: #119661, #119974
2024-02-21 21:10:02 +00:00
cyy
d5d13ab15e Remove C10_FALLTHROUGH (#120157)
`[[fallthrough]]` is supported by our C++17 compilers, and no other repo uses `C10_FALLTHROUGH`, so the macro can be removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120157
Approved by: https://github.com/Skylion007
2024-02-21 06:18:58 +00:00
soulitzer
312ce35c1f Rename singleton int to nested int (#119661)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119661
Approved by: https://github.com/ezyang
2024-02-16 19:21:17 +00:00
cyy
87c6cd2f00 [1/N] Replace std::tie with structural binding (#119774)
This PR replaces some std::tie calls with structured bindings from C++17. This not only makes the code more compact, but also brings some performance gain.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119774
Approved by: https://github.com/albanD, https://github.com/malfet
2024-02-14 09:25:04 +00:00
Edward Z. Yang
6665b96ebb Rewrite maybe_reduce more carefully for unbacked SymInt (#119562)
Fixes https://github.com/pytorch/pytorch/issues/119476

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119562
Approved by: https://github.com/albanD
ghstack dependencies: #119559
2024-02-13 21:40:06 +00:00
cyy
10f3abc6b8 [DeviceIndex][3/N] Use DeviceIndex in more places (#119635)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119635
Approved by: https://github.com/ezyang
2024-02-12 21:31:27 +00:00
Edward Z. Yang
3f0fd36835 Introduce size oblivious guards (#118579)
Fixes https://github.com/pytorch/pytorch/issues/117361

The implementation here slightly diverges from what was proposed in the issue, so I will recap what this PR is doing here. Today, when doing computations involving size-like unbacked SymInts, we assume for all operations that the compile time range of the integer is `[2, inf]`, even though at runtime we also accept zero and one.

This PR removes the carte blanche assumption, and instead does the analysis in a much more limited and controlled fashion: only for guards which we have designated as "size oblivious" are we willing to do the analysis under the assumption that the range of all size-like unbacked SymInts is `[2, inf]`; otherwise, we will faithfully only do analysis with `[0, inf]` (or whatever the user provided) bounds.

The infra pieces of this PR are:

* Remove runtime_var_to_range from torch/fx/experimental/symbolic_shapes.py; modify `_constrain_range_for_size` to refine the range without clamping min to 2, and instead add the symbol to a `size_like` set in the ShapeEnv
* When evaluating an expression, if the expression is requested to be evaluated in a `size_oblivious` way, we attempt to statically compute the value of the expression with the assumption that all symbols in `size_like` are updated to assume that they are `>= 2`.
* Add Python and C++ APIs for guarding on a SymBool in a size-oblivious way. In C++, I also need to add some helpers for performing symbolic comparisons, since the stock comparisons immediately specialize in the "normal" way.

The rest of the changes of the PR are marking various spots in PyTorch framework code as size oblivious, based on what our current test suite exercises.

As you review the places where we have marked things as size oblivious, it may become clear why I ended up not opting for the "designate a branch as the default branch when it's not statically obvious which way to go": for some of the conditions, this answer is rather non-obvious. I think potentially there is another refinement on top of this PR, which is something like "I don't care if you can't figure it out with ValueRange analysis, go down this path anyway if there are unbacked sizes involved." But even if we add this API, I think we are obligated to attempt the ValueRange analysis first, since it can lead to better outcomes sometimes (e.g., we are able to figure out that something is contiguous no matter what the unbacked size is.)

When is it permissible to mark something as size oblivious? Heuristically, it is OK anywhere in framework code if it gets you past a guard on unbacked SymInt problem. It is somewhat difficult to provide a true semantic answer, however. In particular, these annotations don't have any observational equivalence guarantee; for example, if I have `torch.empty(u0, 1).squeeze()`, we will always produce a `[u0]` size tensor, even though if `u0 == 1` PyTorch will actually produce a `[]` size tensor. The argument that I gave to Lezcano is that we are in fact defining an alternate semantics for a "special" size = 0, 1, for which we have these alternate eager mode semantics. In particular, suppose that we have a constant `special1` which semantically denotes 1, but triggers alternate handling rules. We would define `torch.empty(special1, 1).squeeze()` to always produce a `[special1]` size tensor, making its semantics coincide with unbacked SymInt semantics. In this model, the decision to designate guards as size oblivious is simply a user API question: you put them wherever you need some handling for special1! As we conservatively error out whenever it is not obvious what `special1` semantics should be, it is always valid to expand these semantics to cover more cases (although you can always choose the wrong semantics!)
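A minimal sketch of the Python-side API (assuming a build with this PR; `guard_size_oblivious` is the new guard entry point in `torch.fx.experimental.symbolic_shapes`):

```python
import torch
from torch.fx.experimental.symbolic_shapes import guard_size_oblivious

torch._dynamo.config.capture_scalar_outputs = True

@torch.compile(fullgraph=True)
def f(x):
    u0 = x.item()          # unbacked SymInt, registered as size-like
    t = torch.empty(u0)
    # A plain `t.numel() != 0` guard is data-dependent and would error out;
    # evaluated size-obliviously, size-like symbols are assumed >= 2, so the
    # branch resolves statically.
    if guard_size_oblivious(t.numel() != 0):
        return t.zero_()
    return t

print(f(torch.tensor([5])).shape)  # torch.Size([5])
```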

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118579
Approved by: https://github.com/eellison, https://github.com/lezcano
2024-02-06 19:45:32 +00:00
cyy
4a019047ad Enable nested namespace check in clang-tidy (#118506)
It is time to enable nested namespaces in the code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118506
Approved by: https://github.com/albanD
2024-01-31 00:32:35 +00:00
dilililiwhy
b025e5984c Get Device instance with correct type when privateuse1 backend is registered (#117966)
If a privateuse1 backend is registered, let `torch.device` return the corresponding Device instance when only an index is given.
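A hedged sketch of the intended behavior (the backend name `foo` is hypothetical, and a real privateuse1 backend module would need to be registered for this to run end to end):

```python
import torch

# Rename the reserved privateuse1 dispatch key to a custom backend name.
torch.utils.rename_privateuse1_backend('foo')

# With this PR, a bare index should resolve to the registered privateuse1
# backend rather than to an unrelated default device type.
d = torch.device(0)
print(d)  # expected: device(type='foo', index=0)
```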
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117966
Approved by: https://github.com/albanD, https://github.com/malfet
2024-01-24 19:03:18 +00:00
Nikita Shulga
90b3cf33ac [C10] Make Scalar constructable from longs (#118149)
On Linux and Mac `int64_t` is an alias to either `long` (Linux) or  `long long` (Mac)

Because of that, an attempt to construct `c10::Scalar` from the other type will fail with `conversion from ‘long long int’ to ‘c10::Scalar’ is ambiguous`.

That is, attempting to compile:
```cpp
int main() {
  c10::Scalar s = 1L;
}
```
on MacOS failed with:
```
foo.cpp:3:15: error: conversion from 'long' to 'c10::Scalar' is ambiguous
  c10::Scalar s = 1L;
              ^   ~~
/Users/nshulga/git/pytorch/pytorch/torch/include/c10/core/Scalar.h:59:7: note: candidate constructor
      DEFINE_IMPLICIT_CTOR)
      ^
/Users/nshulga/git/pytorch/pytorch/torch/include/c10/core/Scalar.h:59:7: note: candidate constructor
/Users/nshulga/git/pytorch/pytorch/torch/include/c10/core/Scalar.h:59:7: note: candidate constructor
/Users/nshulga/git/pytorch/pytorch/torch/include/c10/core/Scalar.h:59:7: note: candidate constructor
/Users/nshulga/git/pytorch/pytorch/torch/include/c10/core/Scalar.h:59:7: note: candidate constructor
/Users/nshulga/git/pytorch/pytorch/torch/include/c10/core/Scalar.h:59:7: note: candidate constructor
/Users/nshulga/git/pytorch/pytorch/torch/include/c10/core/Scalar.h:59:7: note: candidate constructor
/Users/nshulga/git/pytorch/pytorch/torch/include/c10/core/Scalar.h:62:3: note: candidate constructor
  Scalar(uint16_t vv) : Scalar(vv, true) {}
  ^
/Users/nshulga/git/pytorch/pytorch/torch/include/c10/core/Scalar.h:63:3: note: candidate constructor
  Scalar(uint32_t vv) : Scalar(vv, true) {}
  ^
/Users/nshulga/git/pytorch/pytorch/torch/include/c10/core/Scalar.h:64:3: note: candidate constructor
  Scalar(uint64_t vv) {
  ^

```

Prevent this by providing the missing constructors when needed. Alas, one cannot use SFINAE, as template constructors on Scalar mess up a lot of implicit conversions, so I use `static_assert`s to detect early on whether the premise for constructing this class holds.

Add `ScalarTest::LongsAndLongLongs`, which is essentially a compile-time test.

Discovered while trying to enable AOTI on MacOS
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118149
Approved by: https://github.com/ezyang, https://github.com/albanD
ghstack dependencies: #118077, #118076
2024-01-24 17:32:29 +00:00
Aaron Orenstein
d280b6ae58 Ensure that deleter is called even for a no-data tensor. (#117418)
Summary:

When using a custom deleter, InefficientStdFunctionContext was using a
std::unique_ptr<> to store the pointer and call the deleter, but this failed to
call the deleter if the pointer was null. Since we have a separate holder class
anyway, take out the std::unique_ptr<> and call the deleter directly.

Fixes #117273

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117418
Approved by: https://github.com/wjakob, https://github.com/yanboliang
2024-01-22 23:27:27 +00:00
Jeff Daily
01abb5af21 additional support for float8_e4m3fnuz and _e5m2fnuz (#115214)
Follow up to #107586.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115214
Approved by: https://github.com/peterbell10, https://github.com/malfet
2024-01-22 18:33:41 +00:00
cyy
3baade4425 Remove calls of c10::guts::conjunction,c10::guts::disjunction,c10::guts::negation (#117926)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117926
Approved by: https://github.com/Skylion007
2024-01-22 00:35:42 +00:00
PyTorch MergeBot
b637fdc8b3 Revert "additional support for float8_e4m3fnuz and _e5m2fnuz (#115214)"
This reverts commit 74e1362499.

Reverted https://github.com/pytorch/pytorch/pull/115214 on behalf of https://github.com/PaliC due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/115214#issuecomment-1900815152))
2024-01-19 17:35:04 +00:00
Jeff Daily
74e1362499 additional support for float8_e4m3fnuz and _e5m2fnuz (#115214)
Follow up to #107586.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115214
Approved by: https://github.com/peterbell10
2024-01-19 00:50:18 +00:00
Jerry Zhang
3e397cefc5 Add uint1 to uint7 dtypes (#117208)
Summary:
These dtypes are added since we see more demand for these sub-byte dtypes, especially with
the popularity of LLMs (https://pytorch.org/blog/accelerating-generative-ai-2/#step-4-reducing-the-size-of-the-weights-even-more-with-int4-quantization-and-gptq-2021-toks)

Note these are just placeholders; operator support for these dtypes will be implemented with tensor subclasses.
E.g., torch.empty(..., dtype=torch.uint1) will return a tensor subclass of uint1 that supports different operations like bitwise ops, add, mul, etc. (to be added later).

Also note that these are not quantized data types; we'll implement quantization logic with tensor subclasses backed by these dtypes as well.
E.g., `Int4GroupedQuantization(torch.Tensor)` will be implemented with torch.uint4 Tensors (see https://github.com/pytorch-labs/ao/pull/13 as an example).
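A small sanity check of the placeholders (a sketch, assuming a build with this PR; remember that kernels for these dtypes mostly do not exist yet):

```python
import torch

# The sub-byte dtypes exist as dtype objects even without operator support.
for bits in range(1, 8):
    dt = getattr(torch, f'uint{bits}')
    print(dt, dt.is_signed)  # e.g. "torch.uint1 False"
```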

Test Plan:
CIs
python test/test_quantization.py -k test_uint1_7_dtype

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117208
Approved by: https://github.com/ezyang
2024-01-13 01:09:23 +00:00
Edward Yang
b4a35632f9 Add function to materialize COW storages (#117053)
Summary: From Kurt Mohler, see https://github.com/pytorch/pytorch/pull/113396 (manually imported due to ghimport problems)

Test Plan: sandcastle, OSS CI

Differential Revision: D52610522

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117053
Approved by: https://github.com/malfet, https://github.com/kurtamohler
2024-01-10 15:34:16 +00:00
Edward Z. Yang
2e983fcfd3 Support unsigned int for randint, item, equality, fill, iinfo, tensor (#116805)
These are some basic utilities that are often used for testing.
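A quick tour of the utilities listed above (a sketch assuming a build with this PR):

```python
import torch

t = torch.tensor([1, 2, 3], dtype=torch.uint16)      # tensor construction
t.fill_(7)                                           # fill
assert t[0].item() == 7                              # item
assert bool((t == 7).all())                          # equality
print(torch.iinfo(torch.uint16).max)                 # iinfo -> 65535
r = torch.randint(0, 100, (4,), dtype=torch.uint16)  # randint
print(r.dtype)                                       # torch.uint16
```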

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116805
Approved by: https://github.com/albanD
2024-01-10 02:17:23 +00:00
Nikita Shulga
fdfdba7c13 [BE] Use __builtin_overflow_sub when available (#117015)
This is faster than the ternary version.

The following script
```python
import torch
from timeit import default_timer

global_setup = """
"""
setup = """
c10::SymInt a = c10::SymInt(123);
"""
code = """
-a;
"""

from torch.utils.benchmark import Timer

t = Timer(stmt=code, setup=setup, global_setup=global_setup, language="c++", timer=default_timer)

print(t.blocked_autorange())
```

reports a median time of 4.17 ns before and 3.61 ns after on x86_64 Linux, and 2.02 ns before and 1.91 ns after on Apple M1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117015
Approved by: https://github.com/albanD
2024-01-10 00:50:09 +00:00
Edward Z. Yang
f4e35e2c3d Proposed mechanism for handling uint64_t in Scalar (#116595)
Here's the problem: if we support unsigned integer types, and in particular if we support uint64_t, we need a way to represent these integers in Scalar. However, Scalar currently stores all integral values inside int64_t, which is not wide enough to accommodate all possible uint64_t values. So we need to do something to Scalar to support it.

The obvious thing to do is add a uint64_t field to the union, and use it in some situations. But when should we use it? The proposal is that we use it if and only if the integer in question is not representable in int64_t. The historical precedent for this is our handling for uint8_t. Because this type is representable inside int64_t, we have historically stored it inside Scalar as an int64_t. In general, the concept behind Scalar is that it doesn't know the signedness/unsignedness/bitwidth of its input; in particular, we typically construct Scalar from Python int, which doesn't have any concept of how wide the integer is! So it doesn't make any sense to allow for a small integer like 255 to be representable under both the HAS_i tag and the HAS_u tag. So we forbid the latter case.

Although I have proposed this, the PR as currently written just chokes when you pass it a uint64_t that's too big. There's some more logic that would have to be written out for this. I'm putting this out to start to get some agreement that this is the way to do it.
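The representation rule in Python terms (a sketch; this assumes a PyTorch recent enough that the full uint64 range is wired up end to end, which this PR alone does not yet guarantee):

```python
import torch

small = torch.tensor(255, dtype=torch.uint64)      # representable in int64_t,
                                                   # so Scalar uses the HAS_i tag
big = torch.tensor(2**63 + 1, dtype=torch.uint64)  # not representable in int64_t,
                                                   # so it needs the HAS_u slot
print(small.item(), big.item())
```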

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116595
Approved by: https://github.com/albanD
2024-01-08 22:02:03 +00:00
Edward Z. Yang
8ddac14a15 Add unsigned integer dtypes to PyTorch (#116594)
The dtypes are mostly useless right now (not even fill works), but this makes torch.uint16, uint32 and uint64 available as dtypes.

Towards https://github.com/pytorch/pytorch/issues/58734

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116594
Approved by: https://github.com/albanD
ghstack dependencies: #116698, #116693
2024-01-07 07:40:49 +00:00
Edward Z. Yang
8e273e23b5 Refactor promoteType to no longer use shifting strategy (#116693)
Instead of manually fixing the indices (extremely error-prone when new
dtypes are added), we just set up a lookup table mapping ScalarType to the
offsets table.
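The user-visible contract that the lookup table implements is unchanged; a minimal check:

```python
import torch

# dtype objects are singletons, so identity comparison is fine here.
assert torch.promote_types(torch.uint8, torch.int8) is torch.int16
assert torch.promote_types(torch.int32, torch.float32) is torch.float32
assert torch.promote_types(torch.bool, torch.bfloat16) is torch.bfloat16
```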

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116693
Approved by: https://github.com/albanD
ghstack dependencies: #116698
2024-01-07 07:40:49 +00:00
Edward Z. Yang
a8a9695047 Move promoteTypes to cpp file (#116685)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116685
Approved by: https://github.com/albanD
2024-01-04 04:42:14 +00:00
cyy
bb2a1e9941 Enable readability-redundant-smartptr-get in clang-tidy (#116381)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116381
Approved by: https://github.com/Skylion007
2023-12-26 06:05:15 +00:00
cyy
9a0c217a0a [9/N] Fixes clang-tidy warnings in c10/util/*.h (#116185)
Continued work to clean headers in c10/util.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116185
Approved by: https://github.com/Skylion007
2023-12-22 09:35:44 +00:00
Nikita Shulga
f2c1fb3ee4 Fix crash in SymInt unary minus (#116160)
Before this change `-SymInt(std::numeric_limits<int64_t>::min()) == 0` would reliably crash with null pointer dereference, as `data_` of the SymInt returned by `operator-` would be `0x8000000000000000`, because of the carry/overflow flags set by `negq`.

Before the change, the x86_64 assembly generated for
4f02cc0670/c10/core/SymInt.cpp (L137)
looked as follows:
```
   0x7ffff7f2f490 <+115>: movq   %rax, %rdx
    0x7ffff7f2f493 <+118>: negq   %rdx
    0x7ffff7f2f496 <+121>: movq   %rdx, (%rbp)
    0x7ffff7f2f49a <+125>: movabsq $0x4000000000000000, %rdx ; imm = 0x4000000000000000
    0x7ffff7f2f4a4 <+135>: cmpq   %rdx, %rax
    0x7ffff7f2f4a7 <+138>: jle    0x7ffff7f2f520            ; <+259> at SymInt.cpp:141:1
```
`negq %rdx` corresponds to the unary minus, and `cmpq %rdx, 0x4000000000000000` is the inverted `check_range`
b6d0d0819a/c10/core/SymInt.h (L247-L249)
Flags raised by `negq` affect the result of `cmpq`, and as a result the value would not be allocated on the heap, but rather left as `nullptr`.

Not sure if it's worth benchmarking, but perhaps using `__builtin_sub_overflow` would be faster, as it does not require an extra comparison and just guarantees that the overflow flag is cleared after the op.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116160
Approved by: https://github.com/Skylion007, https://github.com/colesbury
2023-12-20 20:12:57 +00:00
cyy
968b94bef2 [8/N] Fixes clang-tidy warnings in c10/{core,util}/*.h (#116082)
This patch enables clang-tidy coverage on c10/**/*.h and contains other fixes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116082
Approved by: https://github.com/Skylion007
2023-12-20 12:22:21 +00:00
cyy
1544c37520 [7/N] Fixes clang-tidy warnings in c10/{core,util}/*.h (#115495)
This PR continues to fix clang-tidy warnings for headers in c10/core and c10/util.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115495
Approved by: https://github.com/malfet
2023-12-19 02:14:30 +00:00
Sunita Nadampalli
88207b10ca Enable thp(transparent huge pages) for buffer sizes >=2MB (#107697)
The 2MB THP pages provide better allocation latencies compared to the standard 4KB pages. This change has shown substantial improvement for batch-mode use cases where the tensor sizes are larger than 100MB.

This is only enabled if the THP_MEM_ALLOC_ENABLE environment variable is set.
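A usage sketch (it is assumed here that the allocator reads the variable at allocation time; exporting `THP_MEM_ALLOC_ENABLE=1` in the shell before launching Python is the safest equivalent):

```python
import os

# Assumption: must be set before torch performs large CPU allocations.
os.environ['THP_MEM_ALLOC_ENABLE'] = '1'

import torch

x = torch.empty(64 * 1024 * 1024)  # a 256MB fp32 buffer, eligible for 2MB pages
```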

Relanding https://github.com/pytorch/pytorch/pull/93888 with functionality disabled for Android

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107697
Approved by: https://github.com/malfet
2023-12-16 18:16:19 +00:00
soulitzer
3fa3ed4923 Workaround to avoid MSVC std ambiguous symbol error (#115748)
I don't know what the correct fix is, but this appears to be the known workaround: https://github.com/pytorch/pytorch/issues/18607

Failing windows build: https://hud.pytorch.org/pytorch/pytorch/pull/114897?sha=574a6f7cfe979f1bac62c6b0b51380ff67a31a09

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115748
Approved by: https://github.com/jbschlosser
ghstack dependencies: #114895, #115739
2023-12-13 23:22:52 +00:00
soulitzer
67ce57ff66 Add pragma once to headers (#115739)
This reverts commit 9b93c23b5e2d695c2fbd9c886cc0c8010edab717.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115739
Approved by: https://github.com/Skylion007, https://github.com/jbschlosser
ghstack dependencies: #114895
2023-12-13 23:22:52 +00:00
soulitzer
4d8ad4fb82 Move SingletonSymNodeImpl from c10 to aten (#114895)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114895
Approved by: https://github.com/jbschlosser
2023-12-13 20:01:18 +00:00
cyy
99f222372b [5/N] Fixes clang-tidy warnings in c10/{core,util}/*.h (#115354)
This PR continues to fix clang-tidy warnings for headers in c10/core and c10/util.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115354
Approved by: https://github.com/Skylion007
2023-12-09 17:16:04 +00:00
cyy
516bd4a72c [1/N] Use std::in_place (#115170)
It is time to gradually replace c10::in_place with std::in_place.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115170
Approved by: https://github.com/colesbury
2023-12-09 03:52:39 +00:00
cyy
7b8084d1c6 [5/N] Fixes clang-tidy warnings in c10/core/*.h (#115232)
This PR continues to fix clang-tidy warnings for headers in c10/core.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115232
Approved by: https://github.com/Skylion007
2023-12-07 15:48:03 +00:00
cyyever
1224acc018 [3/N] Fixes clang-tidy warnings in header files (#114431)
This PR series tries to enable clang-tidy for headers in torch/csrc and c10/util.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114431
Approved by: https://github.com/Skylion007
2023-12-05 12:58:27 +00:00
Scott Wolchok
165f4f6ccf [PyTorch] Redirect c10::optional to std::optional (#101995)
We have C++17 now!

I am intentionally dropping the `c10::optional<c10::ArrayRef>` size optimization. It was intended to improve dispatch, but thanks to D34602980 / #70864 we don't use `optional<ArrayRef>` in function arguments anymore anyway.

Differential Revision: [D46079028](https://our.internmc.facebook.com/intern/diff/D46079028/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101995
Approved by: https://github.com/malfet, https://github.com/Skylion007, https://github.com/ezyang
2023-11-30 02:46:41 +00:00
cyy
07e00de8d7 Add missing member initialization in c10::ExtraMeta constructor (#114448)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114448
Approved by: https://github.com/Skylion007
2023-11-24 03:54:11 +00:00
cyy
b76e2949f7 Fix pool_size type in TaskThreadPool (#114063)
Negative values of pool_size mean defaultNumThreads() should be called, so the parameter needs a signed type.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114063
Approved by: https://github.com/Skylion007
2023-11-23 21:20:45 +00:00
Edward Z. Yang
8c4812be80 Replace expect_int with guard_int (#113921)
The idea is that instead of erroring, we will just specialize at these sites.

Fixes https://github.com/pytorch/pytorch/issues/113142

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113921
Approved by: https://github.com/zou3519
2023-11-20 21:27:48 +00:00
drisspg
2b97f5a9a1 Disallow fp8 type promotion (#113975)
Fixes #113663

As well as updating the promotion logic to disallow automatic type promotion between fp8 types, this PR also cleans up the table entries.
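What "disallowed" looks like from Python (a minimal check, assuming a build with this PR):

```python
import torch

# Mixing the two fp8 formats no longer promotes implicitly.
try:
    torch.promote_types(torch.float8_e4m3fn, torch.float8_e5m2)
except RuntimeError as err:
    print('fp8 promotion disallowed:', err)
```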

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113975
Approved by: https://github.com/albanD, https://github.com/malfet
2023-11-20 19:47:43 +00:00
PyTorch MergeBot
f36d09fcb7 Revert "Add function to materialize COW storages (#113396)"
This reverts commit e2f090086b.

Reverted https://github.com/pytorch/pytorch/pull/113396 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/113396#issuecomment-1818769090))
2023-11-20 10:26:01 +00:00
cyy
bae61ecb96 [Reland 1] Cleanup header inclusions in torch_cpu by iwyu (#112311)
Reland https://github.com/pytorch/pytorch/pull/101178 to use IWYU on torch_cpu. The header file changes are excluded to avoid breaking internal jobs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112311
Approved by: https://github.com/ezyang
2023-11-19 04:06:36 +00:00
Edward Z. Yang
33f7c6638f Guard when fetching non-symbolic value out of Scalar (#113911)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113911
Approved by: https://github.com/voznesenskym
ghstack dependencies: #113877
2023-11-18 06:39:09 +00:00
Kurt Mohler
e2f090086b Add function to materialize COW storages (#113396)
Part of #109833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113396
Approved by: https://github.com/ezyang
2023-11-17 01:58:51 +00:00
Nikita Shulga
05d949279c [C10] cpuinfo error handling (#113771)
If `cpuinfo_initialize` returns false, calls to subsequent cpuinfo functions may result in `abort()`.
Also, `defaultNumThreads()` now tries its detection methods in sequence: if one method fails, it tries another, and finally returns 1.

Alas, there is no good way to test this on the x86 platform, but on ARM one can replicate it by running `sudo chmod 750 /sys` and then `python3 -c "import torch;torch._C.profiler.gather_traceback(True, True, True)"`

Partially addresses https://github.com/pytorch/pytorch/issues/113568
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113771
Approved by: https://github.com/atalman
2023-11-15 19:49:34 +00:00
George White
6c187246d6 Add support for float8_e4m3fnuz and _e5m2fnuz (#107586)
This PR relates to the feature in [this feature submission](https://docs.google.com/document/d/1pF2T1xz54IPg1jG7FhykbrpbcJZVelQw0v8vBaoLkfs/edit). It has been based on #104242 which adds similar float8 types.

These new types added in this PR are described in the paper at https://arxiv.org/abs/2206.02915. A brief description and comparison of the types with other float8 types can be also found in the [OpenXLA RFC](https://github.com/openxla/stablehlo/blob/main/rfcs/20230321-fp8_fnuz.md).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107586
Approved by: https://github.com/seemethere, https://github.com/malfet
2023-11-15 15:01:11 +00:00
Chen_Liqing
b910d9eaa6 Add tensor.is_privateuseone (#113421)
We found a scenario where ``tensor.device().is_privateuseone()`` is used to determine whether a tensor is privateuse1, but it fails.
In the code of ``Autograd``, for example:
```
::std::tuple<at::Tensor,at::Tensor,at::Tensor> native_batch_norm(c10::DispatchKeySet ks, const at::Tensor & input, const c10::optional<at::Tensor> & weight, const c10::optional<at::Tensor> & bias, const c10::optional<at::Tensor> & running_mean, const c10::optional<at::Tensor> & running_var, bool training, double momentum, double eps) {
  auto& input_ = unpack(input, "input", 0);
  [[maybe_unused]] auto _any_requires_grad = compute_requires_grad( input, weight, bias );

  [[maybe_unused]] auto _any_has_forward_grad_result0 = (isFwGradDefined(input) || isFwGradDefined(weight) || isFwGradDefined(bias));
  check_no_requires_grad(running_mean, "running_mean", "native_batch_norm");
  check_no_requires_grad(running_var, "running_var", "native_batch_norm");
  std::shared_ptr<NativeBatchNormBackward0> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<NativeBatchNormBackward0>(new NativeBatchNormBackward0(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( input, weight, bias ));
    grad_fn->eps = eps;
    grad_fn->input_ = SavedVariable(input, false);
    grad_fn->running_mean_ = SavedVariable(running_mean, false);
    grad_fn->running_var_ = SavedVariable(running_var, false);
    grad_fn->training = training;
    grad_fn->weight_ = SavedVariable(weight, false);
  }
  ...
}
```
When ``weight`` is ``None``, an empty tensor is automatically generated and will be transferred to the backward calculation:
c7e12c7427/torch/csrc/autograd/saved_variable.cpp (L121-L128)
At the beginning of the backward calculation in our scenario, we need to determine whether the input tensor is ``PrivateUse1`` . However, if we use ``tensor.device().is_privateuseone()``, we will get an error ``"tensor does not have a device"``:
c7e12c7427/c10/core/TensorImpl.h (L1223-L1235)
I think this part of the code can be optimized; what do you think?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113421
Approved by: https://github.com/albanD
2023-11-13 01:51:27 +00:00
Peter Bell
15b61d6c1a TensorImpl: Lazily compute numel and contiguity when symbolic (#112785)
Currently, whenever the sizes or strides are modified for a `TensorImpl`, we
eagerly recompute the numel and memory format flags. This is fine for static
shapes, as it's all fast C++ code, but for symbolic shapes it runs slow Python code.

This instead changes the `SymbolicShapeMeta` object to compute the derived
quantities lazily at the first request. This has the added benefit that we can
now pass assumptions in `empty_tensor_restride` which remove the need to compute
some contiguity flags at all.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112785
Approved by: https://github.com/ezyang
ghstack dependencies: #112689, #112890
2023-11-09 01:36:37 +00:00
Peter Bell
8c4bdac560 TensorImpl: Move symbolic refresh_numel and refresh_contiguous into their own class (#112890)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112890
Approved by: https://github.com/lezcano
ghstack dependencies: #112689
2023-11-09 01:36:37 +00:00
Edward Z. Yang
9e6e9587c1 Make numel/sym_numel PyInterpreter work symmetrically to others (#113065)
Just some better engineering code cleanup.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113065
Approved by: https://github.com/voznesenskym
2023-11-08 17:44:29 +00:00
Peter Bell
c6f435befd Don't recompute numel and contiguous in detach (#112689)
When symbolic shapes are involved, `refresh_numel` and `refresh_contiguous` are
fairly expensive since they dispatch to Python for SymInt handling. However, in
the case of detach we can just copy the existing values instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112689
Approved by: https://github.com/lezcano, https://github.com/ezyang
2023-11-07 10:55:48 +00:00
Edward Z. Yang
3be99012d4 Switch some more SymInt tests to TORCH_CHECK_ALWAYS_SHOW_CPP_STACKTRACE (#112626)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112626
Approved by: https://github.com/bdhirsh
2023-11-03 03:15:26 +00:00
Edward Z. Yang
a1ab22b81d Reland "Trigger specialization when you call size()/stride() from C++ (#111935)" (#112605)
This reverts commit 22221c6d60.

Differential Revision: [D50886564](https://our.internmc.facebook.com/intern/diff/D50886564)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112605
Approved by: https://github.com/voznesenskym
2023-11-02 13:27:31 +00:00
PyTorch MergeBot
22221c6d60 Revert "Trigger specialization when you call size()/stride() from C++ (#111935)"
This reverts commit 5846705e36.

Reverted https://github.com/pytorch/pytorch/pull/111935 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/111935#issuecomment-1782107024))
2023-10-27 00:23:03 +00:00
Edward Z. Yang
5846705e36 Trigger specialization when you call size()/stride() from C++ (#111935)
This should be the last of the "it used to work with static shapes but
it doesn't work with dynamic shapes" hard errors.  Now we will just
specialize if you hit it from C++.

The strategy here is a bit clever. We shunt the size() call to the Python
binding if an error would have occurred.
logic to make sure the newly allocated ints stay live for the duration
of the ArrayRef access.

storage_offset is intentionally omitted because there are some problems
with it.  I will fix them next.

This should let us get rid of the aotautograd_static test configuration.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111935
Approved by: https://github.com/zou3519
2023-10-25 16:17:55 +00:00
ydwu4
f3d02d9ae6 Add support for sym_ite (#111440)
This PR supports sym_ite. This is useful for converting SymBool to SymInt in e.g. #109916. Internally, it uses sympy.Piecewise. We cannot use sympy.ITE because it expects the arguments and output to all be boolean, but we want to return a SymInt when converting a SymBool to SymInt. So we use sympy.Piecewise to denote the symbolic relationship.

Note that this PR uses the range analysis for sympy.Piecewise implemented in https://github.com/pytorch/pytorch/blob/main/torch/utils/_sympy/value_ranges.py.
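With plain Python bools the new op degenerates to a simple select, which makes for an easy smoke test (under tracing, the same call stays symbolic via sympy.Piecewise):

```python
import torch

# Constant case: behaves like `t if b else f`.
assert torch.sym_ite(True, 1, 2) == 1
assert torch.sym_ite(False, 1, 2) == 2
```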

Test Plan:
See added test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111440
Approved by: https://github.com/ezyang
2023-10-23 16:17:43 +00:00
Kurt Mohler
a267d95c2a Reland: Add lazy_clone_storage to create COW storages (#111579)
Relands #110192

NOTE: COW storages do not actually copy on write yet, they just have the COW deleter and deleter context applied to them

Part of #109833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111579
Approved by: https://github.com/ezyang
2023-10-20 15:49:59 +00:00
PyTorch MergeBot
9c7391ea36 Revert " [1/N] Apply clang-tidy to c10 cuda files (#111137)"
This reverts commit 43b023694e.

Reverted https://github.com/pytorch/pytorch/pull/111137 on behalf of https://github.com/malfet due to Was reverted internally due to the failures in torch.cuda.memory_stats(device=0) (presumably) ([comment](https://github.com/pytorch/pytorch/pull/111137#issuecomment-1769274103))
2023-10-18 20:32:53 +00:00
cyy
43b023694e [1/N] Apply clang-tidy to c10 cuda files (#111137)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111137
Approved by: https://github.com/zou3519, https://github.com/Skylion007
2023-10-17 04:52:50 +00:00
PyTorch MergeBot
97a513ed07 Revert "Add lazy_clone_storage to create COW storages (#110192)"
This reverts commit 1c30814417.

Reverted https://github.com/pytorch/pytorch/pull/110192 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, @ezyang please support the author providing further details ([comment](https://github.com/pytorch/pytorch/pull/110192#issuecomment-1765157285))
2023-10-16 19:43:20 +00:00
Kurt Mohler
1c30814417 Add lazy_clone_storage to create COW storages (#110192)
This PR relands #110022 but accounts for the changes in #110191. Also, the function for creating COW storages is called `lazy_clone_storage` in this PR, instead of `try_ensure`

NOTE: COW storages do not actually copy on write yet, they just have the COW deleter and deleter context applied to them

Part of #109833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110192
Approved by: https://github.com/ezyang
2023-10-14 00:53:21 +00:00
PyTorch MergeBot
482782406a Revert "Add lazy_clone_storage to create COW storages (#110192)"
This reverts commit 33f1513486.

Reverted https://github.com/pytorch/pytorch/pull/110192 on behalf of https://github.com/kit1980 due to revert to work around some importing issues ([comment](https://github.com/pytorch/pytorch/pull/110192#issuecomment-1762430374))
2023-10-14 00:48:45 +00:00
PyTorch MergeBot
f68d6e8108 Revert "Move at::{Refcounted,}MapAllocator to c10 (#109881)"
This reverts commit 68a1219f74.

Reverted https://github.com/pytorch/pytorch/pull/109881 on behalf of https://github.com/kit1980 due to breaking internal builds, undefined symbol: _ZN3c1022RefcountedMapAllocator6decrefEv ([comment](https://github.com/pytorch/pytorch/pull/109881#issuecomment-1761950014))
2023-10-13 17:57:53 +00:00
Kazuaki Ishizaki
8162f4170b Fix typo under c10 directory (#111155)
This PR fixes typos in comments and messages in files under the `c10` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111155
Approved by: https://github.com/Skylion007
2023-10-13 16:52:51 +00:00
Kurt Mohler
33f1513486 Add lazy_clone_storage to create COW storages (#110192)
This PR relands #110022 but accounts for the changes in #110191. Also, the function for creating COW storages is called `lazy_clone_storage` in this PR, instead of `try_ensure`

NOTE: COW storages do not actually copy on write yet, they just have the COW deleter and deleter context applied to them

Part of #109833

Differential Revision: [D50265134](https://our.internmc.facebook.com/intern/diff/D50265134)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110192
Approved by: https://github.com/ezyang
2023-10-13 15:33:40 +00:00
Jesse Cai
4c01686027 Public API for constructing NT with jagged layout from tensor list (#111078)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111078
Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer
ghstack dependencies: #109123
2023-10-13 03:27:41 +00:00
Peter Bell
68a1219f74 Move at::{Refcounted,}MapAllocator to c10 (#109881)
`libshm.so` depends on the torch library exclusively for `at::RefcountedMapAllocator`,
 so it makes sense to move it to c10 along with the other memory allocators.

This means `libshm.so` only depends on `c10` and we don't need to relink
`libshm.so` for every ATen change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109881
Approved by: https://github.com/albanD
2023-10-12 10:51:13 +00:00
cyy
f98d6ad8b3 [1/N] Apply clang-tidy to aten/src/ATen/core/ (#110861)
It is time to clang-tidy aten.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110861
Approved by: https://github.com/Skylion007
2023-10-10 23:20:58 +00:00
PyTorch MergeBot
02a02a23ee Revert "Move at::{Refcounted,}MapAllocator to c10 (#109881)"
This reverts commit 0341deb1c7.

Reverted https://github.com/pytorch/pytorch/pull/109881 on behalf of https://github.com/albanD due to It does break buck build ([comment](https://github.com/pytorch/pytorch/pull/109881#issuecomment-1756195823))
2023-10-10 20:39:12 +00:00
soulitzer
fda0a965c7 [reland] Support SingletonSymNode mul with coefficient (#110673)
reland of https://github.com/pytorch/pytorch/pull/110369
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110673
Approved by: https://github.com/ezyang
2023-10-10 19:37:17 +00:00
Peter Bell
0341deb1c7 Move at::{Refcounted,}MapAllocator to c10 (#109881)
`libshm.so` depends on the torch library exclusively for `at::RefcountedMapAllocator`,
 so it makes sense to move it to c10 along with the other memory allocators.

This means `libshm.so` only depends on `c10` and we don't need to relink
`libshm.so` for every ATen change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109881
Approved by: https://github.com/albanD
2023-10-09 23:53:47 +00:00
Kazuaki Ishizaki
50bd252863 Fix typo the the (#110869)
This PR fixes the typo `the the` in comments and exception messages.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110869
Approved by: https://github.com/soulitzer
2023-10-09 19:32:45 +00:00
vinithakv
36e6b0cfa2 Fix cpuinfo related crash on ppc64 (#110708)
The "import  torch" crashes with following cpuinfo error on powerpc64.
==============================================================
>>> import torch
Error in cpuinfo: processor architecture is not supported in cpuinfo
Fatal error in cpuinfo: cpuinfo_get_processors_count called before cpuinfo is initialized
Aborted (core dumped)
==================================================================
The patch fixes this by excluding powerpc from using cpuinfo as it is not supported for ppc64.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110708
Approved by: https://github.com/ezyang
2023-10-08 13:31:54 +00:00
cyy
12f97bb2e9 [Reland][3/N] Add -Wdeprecated and related fixes (#110518)
Fixes the string_view errors and relands the work. The previous changes in torch/csrc/utils/invalid_arguments.cpp were too aggressive and not tested thoroughly; they are discarded.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110518
Approved by: https://github.com/ezyang
2023-10-07 08:38:40 +00:00
soulitzer
69ea214cc2 [reland] Update singleton int to error when inequality relation is undefined (#110672)
reland of https://github.com/pytorch/pytorch/pull/110044
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110672
Approved by: https://github.com/ezyang
2023-10-06 17:50:25 +00:00
PyTorch MergeBot
330db8278b Revert "Update singleton int to error when inequality relation is undefined (#110044)"
This reverts commit 07331c65e6.

Reverted https://github.com/pytorch/pytorch/pull/110044 on behalf of https://github.com/PaliC due to bottom diff is causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/110044#issuecomment-1749805209))
2023-10-05 23:55:37 +00:00
PyTorch MergeBot
1c3fae46ee Revert "Support SingletonSymNode mul with coefficient (#110369)"
This reverts commit eb8feb8ff8.

Reverted https://github.com/pytorch/pytorch/pull/110369 on behalf of https://github.com/PaliC due to bottom diff is causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/110369#issuecomment-1749802899))
2023-10-05 23:51:28 +00:00
Amadeusz Skrzypczak
653f966df0 Fix type promotion of float8_e5m2 and float8_e4m3fn (#110279)
There is an issue with float8 type promotion, because _promoteTypesLookup doesn't contain records for a few types between bfloat16 and float8.
I have simply moved the float8 types just after bfloat16; however, I'm not sure whether this breaks serialization.

Please decide if it can stay like this, or whether I should insert the missing records filled with "ud" into _promoteTypesLookup instead of moving the types.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110279
Approved by: https://github.com/albanD
2023-10-05 01:28:48 +00:00
soulitzer
eb8feb8ff8 Support SingletonSymNode mul with coefficient (#110369)
We want to be able to use SingletonSymNode to represent strides for Jagged layout tensors. The following is for 3D, but easily generalizable to higher dimensions.

Constraints:
- [B, x, D] (where x represents the "variably lengthed dim") can be strided in two ways [x, 1, sum(x)] and [dx, d, 1]. We need two different placeholder values depending on how the jagged tensor is strided.
- When doing operations we need the strides of output tensors to be expressible in terms of the strides and sizes of the inner tensors. Given [B, x, D] @ [D, D'], the output strides are [x * D', D', 1] rather than some opaque [x2, D', 1]. This constraint exists because if I'm tracing, I need a symint to represent the output stride. This symint needs to come from somewhere; I get it in several ways: (1) create a constant, (2) unbacked symint, (3) create a new input using a source, (4) output of an operation on an existing symint. It is clear that (4) is what we want here, which brings us to the design below.

Design:

Given the two constraints, the most straightforward way to implement this is actually to update SingletonSymNode to include some scalar factor; morally, SingletonSymNode then represents `factor * [s_0, s_1, …, s_n]`. This enables us to symbolically compute strides from sizes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110369
Approved by: https://github.com/ezyang
ghstack dependencies: #110044
2023-10-04 22:56:15 +00:00
soulitzer
07331c65e6 Update singleton int to error when inequality relation is undefined (#110044)
Previously, something like j0 >= 3 would return False. In sympy, however, it is not possible to make both j0 >= 3 and j0 < 3 return False: you only get to dispatch on Ge, and the remaining relations are derived, e.g. defining Ge(j0, 3) to be False would force Lt(j0, 3) to be True, which is not what we want.

In this PR, we make it so that both j0 >= 3 and j0 < 3 error, so that in a future PR, when we create the symbolic counterpart of this singleton, the behaviors can be the same.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110044
Approved by: https://github.com/ezyang
2023-10-04 22:55:53 +00:00
PyTorch MergeBot
156aefa89b Revert "[3/N] Add -Wdeprecated and related fixes (#109698)"
This reverts commit c31fcdaa4f.

Reverted https://github.com/pytorch/pytorch/pull/109698 on behalf of https://github.com/PaliC due to breaking quantization tests ( quantization/test_quantize_per_channel_sub_byte and  quantization/test_quantize_per_channel_float_qparams) internally ([comment](https://github.com/pytorch/pytorch/pull/109698#issuecomment-1746999806))
2023-10-04 14:33:47 +00:00
cyy
c31fcdaa4f [3/N] Add -Wdeprecated and related fixes (#109698)
This PR follows #108626. Hopefully we can enable the warning in the next PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109698
Approved by: https://github.com/Skylion007, https://github.com/ezyang
2023-10-03 22:50:53 +00:00
cyy
168f516fae [3/N] Move c10::variant to std::variant (#110141)
This PR moves more c10::variant calls to std::variant

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110141
Approved by: https://github.com/Skylion007
2023-09-28 18:43:55 +00:00
Kurt Mohler
f2c360e3e5 Reorganize and rename COW files and APIs (#110191)
This PR does the following:
* Combine `cow/context.<h/cpp>` and `cow/deleter.<h/cpp>` into `cow/COWDeleter.<h/cpp>`
* Rename `Context` to `COWDeleterContext`
* Rename `delete_context` to `cow_deleter`
* Remove the separate `impl_cow_context` bazel library, combining it with the base c10 core library
* Rename `context_test.cpp` to `cow_test.cpp`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110191
Approved by: https://github.com/ezyang
2023-09-28 17:50:44 +00:00
cyy
a81d083b1c [Reland] Add -Wdeprecated and related fixes (#110019)
This is a reland of PRs https://github.com/pytorch/pytorch/pull/108626 and #109564. We fixed the iOS build failure by changing
```
((CHECK) ? (EXPR) : ([] { assert(!#CHECK); }(), (EXPR)))
```
to
```
((CHECK) ? (EXPR) : ([] { assert(false); }(), (EXPR)))
```
in TR2_OPTIONAL_ASSERTED_EXPRESSION, since the former syntax was invalid on Apple Clang. Anyway, we could apply the simple fix hoping that c10::optional would be replaced by std::optional soon.
We also enabled -Wdeprecated on c10.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110019
Approved by: https://github.com/clee2000
2023-09-28 03:34:29 +00:00
cyy
36eb1bb548 Use constexpr members in ConstantSymNodeImpl (#110142)
A simple refactoring.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110142
Approved by: https://github.com/Skylion007
2023-09-27 18:31:33 +00:00
PyTorch MergeBot
1265400ba6 Revert "Reland: implement a function to convert a storage to copy-on-write (#110022)"
This reverts commit dddf07e56a.

Reverted https://github.com/pytorch/pytorch/pull/110022 on behalf of https://github.com/atalman due to New tests are failing in internal CI ([comment](https://github.com/pytorch/pytorch/pull/110022#issuecomment-1737584693))
2023-09-27 15:05:41 +00:00
mikey dagitses
dddf07e56a Reland: implement a function to convert a storage to copy-on-write (#110022)
Relands #100819

In addition, the `impl_cow_context` library is combined into the base c10 core library, and COW unit tests are combined into just one binary.

Part of #109833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110022
Approved by: https://github.com/ezyang
2023-09-26 03:33:18 +00:00
Nikita Shulga
f87863335c [BE]s/DEFINE_ENUM/DEFINE_ST_ENUM_VAL_/ (#109917)
To avoid potential collisions with other libraries that may define such an enum globally (which is a bad practice, but happens sometimes).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109917
Approved by: https://github.com/Skylion007
2023-09-25 22:19:09 +00:00
PyTorch MergeBot
83deaa16ed Revert "[1/N] Cleanup header inclusions in torch_cpu by iwyu (#101178)"
This reverts commit b7a95f4fdb.

Reverted https://github.com/pytorch/pytorch/pull/101178 on behalf of https://github.com/atalman due to Break internal CI ([comment](https://github.com/pytorch/pytorch/pull/101178#issuecomment-1734384645))
2023-09-25 20:05:25 +00:00
eellison
4734496a0c Extend storage access error api for untyped_storage() (#109750)
In cudagraph trees, we invalidate tensors at some point and drop their storage. Then, when they are accessed with .data_ptr(), a custom error message is thrown. Previously, this invalidation didn't also make untyped_storage()/storage() error, which could result in a segfault.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109750
Approved by: https://github.com/zou3519
2023-09-25 17:51:27 +00:00
cyy
b7a95f4fdb [1/N] Cleanup header inclusions in torch_cpu by iwyu (#101178)
Following our previous IWYU work  #100304 on C10, it makes more sense to try IWYU on torch_cpu. This PR does exactly that. Meanwhile, it fixes issue #48684.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101178
Approved by: https://github.com/ezyang
2023-09-24 05:01:20 +00:00
Brian Hirsh
63526a63f5 Make FunctionalTensor subclass to be more like functorch (interaction with ZeroTensor + Conjugate key) (#109023)
I added some tests for Conj, Neg and ZeroTensor for both Python and C++ functionalization. This also fixes a nasty segfault when running a functorch `jacfwd` test with `torch.compile`, once AOTAutograd is using `FunctionalTensor`.

Changes:

(1) I use Jeffrey's `make_wrapper_subclass(extra_dispatch_keys)` kwarg to plumb extra dispatch keys onto the wrapper, mirroring what C++ functionalization does (C++ functionalization will mirror all dispatch keys from the inner tensor to the wrapper, except for the Python and functorch keys).

(2) FunctionalTensorMode will decompose CompositeImplicitAutograd ops, since (for example) ZeroTensor kernels can send ops like `.to()` directly to the Python key. We'll need a way to toggle this later for pre-dispatch functionalization.

(3) Bound `_ForceDispatchKeyGuard` and BatchedTensorImpl's dispatch keyset to Python.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109023
Approved by: https://github.com/zou3519
ghstack dependencies: #108654, #109662, #109632
2023-09-22 07:09:04 +00:00
Brian Hirsh
dae9aa8925 fix subclass custom sizes dynamic shapes caching (#108654)
This PR fixes the ownership/lifetime handling for tensor subclasses that override sizes/strides, when tensors get resized.

This is needed now, because `FunctionalTensor` is a subclass that has a custom size/stride (so it can plumb requests to its inner tensor), and is also a core piece of infra (it's used during tracing in AOTAutograd, which means that metadata mutation and resizing that happens to work with torch.compile today needs to work with FunctionalTensor).

After a bunch of discussion with @ezyang and @soulitzer, I updated `PyInterpreter::sym_sizes()` (and friends) so that:
(1) They allocate a py::capsule buffer and stash it on the tensor on the first call to size/stride
(2) On a size/stride call where we notice that the number of **dimensions** on the tensor has changed (so our buffer is stale), we re-allocate the buffer
(3) On a size/stride call where we notice that the number of dimensions is the same, but the values are different (this happens whenever a tensor experiences a metadata mutation, like `.transpose_()`), we inplace-modify the buffer and put the new ints/symints into it

I also ended up doing the SmallVector optimization, which was required to fix some tests in AOTAutograd. Ideally we should look into those tests, and nail down the parts of our codebase that rely on SmallVector not re-allocating on a resize... but I'm saving this for a followup.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108654
Approved by: https://github.com/ezyang
2023-09-22 07:09:04 +00:00
rzou
8124a6c40c [TORCH_LIBRARY] Add impl_abstract_pystub (#109529)
We want users to be able to define custom ops in C++ but put the
abstract impl in Python (since it is easier to write them in Python and
the abstract impl better models device semantics and data-dependent
operators).

`m.impl_abstract_pystub(opname, python_module, context)` declares the
abstract_impl of the operator to exist in the given python module.
When the abstract_impl needs to be accessed (either via FakeTensor or
Meta), and it does not exist, the PyTorch Dispatcher will yell
with a descriptive error message.

Some details:
- We construct a new global AbstractImplPyStub mapping in
  Dispatcher.cpp. Read/write to this map is protected by the Dispatcher
  lock.
- We add a new Meta Tensor fallback kernel. The fallback errors out if there is
  no meta kernel, but also offers a nicer error message if we see that there is
  a pystub.
- We create a `torch._utils_internal.throw_abstract_impl_not_imported_error`
  helper function to throw errors. This way, we can throw different error
  messages in OSS PyTorch vs internal PyTorch. To invoke this from C++, we
  added a PyInterpreter::throw_abstract_impl_not_imported_error.

Differential Revision: [D49464753](https://our.internmc.facebook.com/intern/diff/D49464753/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109529
Approved by: https://github.com/ezyang, https://github.com/bdhirsh
2023-09-22 04:55:36 +00:00
Edward Z. Yang
09622d8d49 Allow inferring size-nature from sizes passed to empty constructor (#109720)
This removes the need for many constrain_as_size calls as we now
infer them from error checking for sizes.
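A sketch of the effect (assuming a build with this PR and scalar-output capture enabled):

```python
import torch

torch._dynamo.config.capture_scalar_outputs = True

@torch.compile(fullgraph=True)
def f(x):
    u0 = x.item()  # unbacked SymInt
    # Passing u0 as a size runs the constructor's size error checking, from
    # which u0 >= 0 (size-like) is now inferred; no explicit
    # constrain_as_size call is needed.
    return torch.zeros(u0)

print(f(torch.tensor([4])).shape)  # torch.Size([4])
```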

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109720
Approved by: https://github.com/aakhundov
2023-09-21 17:57:40 +00:00
Aleksei Nikiforov
b91ba226ce Don't use cpuinfo on s390x (#109496)
It doesn't support s390x and just crashes PyTorch on init.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109496
Approved by: https://github.com/huydhn
2023-09-21 12:20:49 +00:00
soulitzer
5252fcb133 Handle constant SymBool in unary and binary operations (#109169)
In this PR:
- When Constant SymNodes are detected in unary/binary ops, demote them to plain int/bool before proceeding. Sometimes this means doing a unary op with a Constant SymNode results in a plain bool.
- Introduce an is_symbolic method, only available from Python. We need this because isinstance(x, SymInt) is no longer sufficient to check whether a given int/SymInt is symbolic or not. See a later PR in the stack for how this is used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109169
Approved by: https://github.com/ezyang
2023-09-20 20:37:15 +00:00
PyTorch MergeBot
1cc052bcab Revert "[1/N] Add -Wdeprecated and related fixes (#108626)"
This reverts commit a53a677b4d.

Reverted https://github.com/pytorch/pytorch/pull/108626 on behalf of https://github.com/clee2000 due to I'm getting errors internally that look like the below on x86_64-apple-ios-simulator with clang 16 ([comment](https://github.com/pytorch/pytorch/pull/108626#issuecomment-1728102447))
2023-09-20 16:49:11 +00:00
cyy
a53a677b4d [1/N] Add -Wdeprecated and related fixes (#108626)
This PR adds -Wdeprecated to CMake warnings and fixes related issues.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108626
Approved by: https://github.com/ezyang, https://github.com/Skylion007
2023-09-19 09:24:04 +00:00
cyy
ac603bc2f8 [Reland] Eliminate invocations of c10::stoi,c10::stod,c10::stoull,c10::stoll (#109566)
This is a reland of #87603, with definitions of c10::stoXX kept for further investigation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109566
Approved by: https://github.com/huydhn
2023-09-19 07:15:25 +00:00
Edward Z. Yang
77d745666b Add TORCH_CHECK_ALWAYS_SHOW_CPP_STACKTRACE (#109373)
Unlike TORCH_CHECK, these always show the C++ stacktrace on error.  Use it
on errors where you frequently need this information.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109373
Approved by: https://github.com/bdhirsh
ghstack dependencies: #109372
2023-09-18 19:46:32 +00:00
Edward Z. Yang
8a1bbf383d Out-of-line cannot call with symbolic error test (#109372)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109372
Approved by: https://github.com/bdhirsh
2023-09-18 19:46:32 +00:00
PyTorch MergeBot
4d44d8c00a Revert "Eliminate c10::stoi,c10::stod,c10::stoull,c10::stoll (#109179)"
This reverts commit 852f1b8417.

Reverted https://github.com/pytorch/pytorch/pull/109179 on behalf of https://github.com/huydhn due to Sorry for reverting your change but this is breaking periodic buck build, so please fix the issue and reland the change https://github.com/pytorch/pytorch/actions/runs/6207458526/job/16852695272 ([comment](https://github.com/pytorch/pytorch/pull/109179#issuecomment-1724168571))
2023-09-18 18:41:12 +00:00
Rockerz
05c31b3b69 typo in DispatchKeySet.h (#109431)
Fixes #108641

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109431
Approved by: https://github.com/Skylion007
2023-09-18 17:34:36 +00:00
cyy
852f1b8417 Eliminate c10::stoi,c10::stod,c10::stoull,c10::stoll (#109179)
We can remove these functions in favor of std ones.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109179
Approved by: https://github.com/colesbury
2023-09-16 07:22:50 +00:00
Brian Hirsh
f22b303f65 Add TorchDispatch version of functionalization (#106404)
This PR adds a new `FunctionalTensor` subclass, and `FunctionalTensorMode` torch dispatch mode. Together, this class/mode are a lightweight wrapper around our existing C++ functionalization logic.

This idea came from Ed - later in the stack, I want to be able to run functionalization **underneath** torch_dispatch, when performing tracing in AOTAutograd. I can't do this easily with vanilla C++ functionalization, because it has a dedicated dispatch key that always runs before TorchDispatch. However, by adding a torch_dispatch mode shim around functionalization, we can use functionalization as a torch_dispatch mode, which will make it easier to run underneath other modes later.

This PR provides the basic new classes, and some light testing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106404
Approved by: https://github.com/ezyang
2023-09-15 20:19:25 +00:00
cyy
8cb96f5f2c [Reland]Use cpuinfo to determine c10::ThreadPool thread number (#107339)
Relands PR #107010 and fixes BUCK builds.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107339
Approved by: https://github.com/ezyang
2023-09-14 23:44:23 +00:00
soulitzer
4667a5c948 Update SingletonSymNode to allow more comparisons (#108315)
In this PR:
- {in,}equality between singleton and plain ints returns false instead of erroring
- Morally, define the semantics of j0 > c as if j0 represented an array [s_0, s_1, ... s_n] with s_k > c for all k
- Just like for equality, we don't actually want to do the comparison one by one; instead j0 is constrained to some range [min, max]. By default this range is [2, int64_t::max] so that it acts like a size and passes 0/1 specialization checks (see the sketch below).
- In the future, we can define some API to allow users to constrain the range of their singletons
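
A plain-Python sketch of the range-based semantics (illustrative only; the real logic lives in the C++ SymNode):

```python
INT64_MAX = 2**63 - 1

class SingletonRange:
    def __init__(self, lo=2, hi=INT64_MAX):
        # Default range [2, int64_t::max] so j0 acts like a size and
        # passes 0/1 specialization checks.
        self.lo, self.hi = lo, hi

    def gt(self, c):
        # j0 > c is True iff every represented s_k exceeds c (lo > c),
        # and False iff none can (hi <= c); otherwise it is an error.
        if self.lo > c:
            return True
        if self.hi <= c:
            return False
        raise RuntimeError("comparison is not determined by the range")
```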

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108315
Approved by: https://github.com/ezyang
2023-09-13 01:58:02 +00:00
Kurt Mohler
4c5e43574c Reland 2: Add PyObject preservation for UntypedStorage (#109039)
Relands #103907 after it was reverted. This PR makes the new `ignore_hermetic_tls` argument of `check_pyobj` optional to avoid causing a compilation error in torchdistx

Part of #91395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109039
Approved by: https://github.com/ezyang
2023-09-12 22:26:05 +00:00
PyTorch MergeBot
59f605be57 Revert "Reland 2: Add PyObject preservation for UntypedStorage (#109039)"
This reverts commit 419e4e17a2.

Reverted https://github.com/pytorch/pytorch/pull/109039 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing linter job in trunk, probably due to a landrace ([comment](https://github.com/pytorch/pytorch/pull/109039#issuecomment-1715147020))
2023-09-12 07:26:11 +00:00
Kurt Mohler
419e4e17a2 Reland 2: Add PyObject preservation for UntypedStorage (#109039)
Relands #103907 after it was reverted. This PR makes the new `ignore_hermetic_tls` argument of `check_pyobj` optional to avoid causing a compilation error in torchdistx

Part of #91395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109039
Approved by: https://github.com/ezyang
2023-09-12 01:19:40 +00:00
cyy
f150f96255 [Reland] increase clang-tidy coverage in torch/csrc (#108875)
Reland of PR #103058, since there was a time gap between this PR and other PRs in terms of torch/csrc modifications

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108875
Approved by: https://github.com/Skylion007
2023-09-12 00:54:53 +00:00
PyTorch MergeBot
68238606f3 Revert "Reland: Add PyObject preservation for UntypedStorage (#103907)"
This reverts commit 56b848157c.

Reverted https://github.com/pytorch/pytorch/pull/103907 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing torchdistx build which uses check_pyobj here 9c1b9f5cb2/src/python/torchdistx/_C/deferred_init.cc (L87) ([comment](https://github.com/pytorch/pytorch/pull/103907#issuecomment-1712121158))
2023-09-08 19:27:07 +00:00
PyTorch MergeBot
fa8bfe5ca2 Revert "increase clang-tidy coverage in torch/csrc (#103058)"
This reverts commit cdf7f3e780.

Reverted https://github.com/pytorch/pytorch/pull/103058 on behalf of https://github.com/atalman due to Sorry for reverting your change, breaks lint ([comment](https://github.com/pytorch/pytorch/pull/103058#issuecomment-1711906915))
2023-09-08 16:07:41 +00:00
cyy
cdf7f3e780 increase clang-tidy coverage in torch/csrc (#103058)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103058
Approved by: https://github.com/Skylion007
2023-09-08 15:07:32 +00:00
Joel Schlosser
b928e08f3d Initial vmap + NT support with unbind fallback (#106786)
PoC demonstrating vmap + NT based on the [design doc](https://docs.google.com/document/d/1dVVk6TOqz93PLTIneU2T3xaxCs9qZ0MaJyCvOAp_bC0). This PR:
* Allows `BatchedTensorImpl`s to contain NTs
* Introduces a `BatchedNestedTensor` dispatch key for NT-specific batching rules
* Provides a batching rule fallback that unbinds the NTs -> performs the computation on the constituents -> rebinds the results into an NT (see the sketch after this list)

Restrictions:
* Only supports one level of vmap
* Only supports vmapping over dim=0 for NTs
    * For operations with mixed NT / dense inputs, support is also limited to dim=0 for the dense inputs
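
A rough Python sketch of the fallback's behavior (the real batching rule is in C++; `op` is any per-constituent operation):

```python
import torch

def nt_unbind_fallback(op, nt):
    # Unbind along dim=0, apply op per constituent, rebind into an NT.
    return torch.nested.nested_tensor([op(t) for t in nt.unbind()])

nt = torch.nested.nested_tensor([torch.randn(3), torch.randn(5)])
out = nt_unbind_fallback(torch.sin, nt)
```
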
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106786
Approved by: https://github.com/zou3519
2023-09-07 13:53:20 +00:00
Kurt Mohler
56b848157c Reland: Add PyObject preservation for UntypedStorage (#103907)
This relands #97470 after #102553 reverted it. This PR attempts to fix the internal failure by avoiding an unnecessary intermediate storage buffer allocation in `c10::newStorageImplFromRefcountedDataPtr`.

Part of #91395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103907
Approved by: https://github.com/ezyang
2023-09-07 04:24:11 +00:00
cyy
01fc6466d1 [Reland] [1/N] fix clang-tidy warnings in torch/csrc (#108114)
Reland of PR #107648 with auto replaced with Py_ssize_t in eval_frame.c. This PR applies fixes to some issues found by clang-tidy in torch/csrc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108114
Approved by: https://github.com/Skylion007
2023-08-30 17:11:16 +00:00
Brian Hirsh
da54f3c519 reorder proxy / fake modes so they always run last (#104482)
**Update:** I refactored the original PR. See the original description below, but here I'll describe the updates:

(1) TLS changes in `TorchDispatchModeTLS.h/cpp`.

I added a `TorchDispatchModeKey` enum, that (for now) just contains PROXY and FAKE. The ModeTLS used to just contain a `std::vector<std::shared_ptr<c10::SafePyObject>>` corresponding to the mode stack. It now **also** contains a separate array of "infra modes", indexed by mode key (PROXY and FAKE, with a new addition, FUNCTIONAL, coming later in the stack).

`TorchDispatchModeTLS::push_onto_stack` and `TorchDispatchModeTLS::pop_stack` are now a bit more complicated. Pushing accepts an optional mode_key, which if set, tells us to add the given mode directly to our "infra_modes" array. Popping will first check the "user mode" stack, before trying to pop anything from the infra mode stack. It also optionally returns the mode key of the mode we popped if there was one - that way if we push that same mode back onto the TLS later, we know where it goes.

`TorchDispatchModeTLS::dispatch_mode_enabled()` now accepts an optional `skip_infra_modes` param, so you can separately query if there are "any modes at all", or if there are "any user modes".

`TorchDispatchModeTLS::get/set/unset_mode()` all take in a mode key, and get/set/unset the mode at that particular mode key (meaning they are only meant to be used for infra modes).

There were also some mild codegen changes to support the new enum

(2) `fake_tensor.py/proxy_tensor.py/_python_dispatch.py`

The way I tell the infra that certain subclasses/modes are "infra" is through the enum: I gave `FakeTensor` and `FakeTensorMode` a `self._mode_key = torch._C.TorchDispatchModeKey.FAKE`. `TorchDispatchMode.__enter/exit__()` (in `_python_dispatch.py`) now check if the current mode has a mode key, and if so they plumb it into any `push_onto_stack()` calls (which eventually instruct `TorchDispatchModeTLS` where to put the mode). Same thing for `ProxyTorchDispatchMode`.

I also had to change both of these modes' enter/exit to handle the fact that there can no longer be multiple proxy/fake modes on the mode stack at once. I updated them both to have a `self.enter_stack: List[Optional[TorchDispatchMode]]` - whenever we push a given mode in `__enter__`, we remove the current ambient fake/proxy mode from the mode stack and save it in `enter_stack`, so that on exit we can reset the state properly.

(3) dispatching logic in `python_arg_parser.cpp`

This is where the core dispatching logic changes are. I added two helpers, `dispatch_on_subclass()` and `dispatch_on_mode()`. The overall dispatching order is now:
```
(a) dispatch_on_mode()  # try user modes first (where the mode stack automatically considers infra modes last)
(b) dispatch_on_subclass() # try user subclasses next (skipping infra subclasses)
(c) dispatch_on_subclass() # try infra subclasses next (skipping user subclasses)
```

Note that we still want "user subclasses" to run before "infra modes". As Ed helped me realize, this will work today: if the proxy/fake modes run in step (a) and see a user subclass, they'll return NotImplemented, allowing us to redispatch to the user subclass.

How do (b) and (c) distinguish between user and infra subclasses? Infra subclasses (FakeTensor, and later FunctionalTensor) are required to have a `_mode_key` hidden on the subclass - so we filter via arguments that do/don't have the _mode_key.
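
A rough sketch of that filtering (assumed helper, not the actual C++ in `python_arg_parser.cpp`):

```python
import torch

def partition_subclasses(overloaded_args):
    user, infra = [], []
    for a in overloaded_args:
        # Infra subclasses (e.g. FakeTensor) carry a hidden _mode_key.
        if getattr(a, "_mode_key", None) is not None:
            infra.append(a)
        else:
            user.append(a)
    return user, infra
```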

(4) I also changed `DoubleTensor` to `TwoTensor` to minimize confusion (@albanD  pointed out that DoubleTensor would be easily confused with `torch.FloatTensor` and friends).

----- original description below -----

The main purpose of this PR is to fix the "ordering problem" between torch_dispatch modes, where we want to ensure that our Fake and Proxy dispatch modes always run **after** any dispatch modes created by the user, regardless of where they are in the stack. See this doc for more details: https://docs.google.com/document/d/1COQ291nOZvtFnzGTQMJqoYZ3sttEYFw_7HbfSyL8gcA/edit

Full set of changes below. I ended up including a few semi-related changes in this PR that I documented - but if folks would rather I separate them out, happy to try to do that.

**(1) Add dedicated TLS slots for FakeTensorMode and ProxyTensorMode**

This is the main component of this PR. There are two new slots, `TorchDispatchModeTLS.fake_mode_` and `TorchDispatchModeTLS.proxy_mode_`, which correspond to a single "global" fake and proxy mode. There is now an invariant that `torchDispatchModeState.stack_` can never contain either of these modes.

I also added a `TorchDispatchModeTLS::maybe_highest_mode()` helper that consults the `stack_` as well as both the proxy and fake slots, and returns the highest priority mode - this is because there are a few places in the codebase where we legitimately want to get the highest priority mode, *including* fake or proxy, if one is set.

This also made the implementations of the existing `disable_proxy_modes_tracing()` and `get_innermost_proxy_mode()` marginally simpler.

**(2) Updated the dispatching logic in handle_torch_function_no_python_arg_parser()**

This is the function that actually figures out which torch_dispatch implementation to call, given the current mode stack and tensor subclass inputs. This function got marginally more complicated as part of the refactor: First we inspect the mode stack and any non-fake subclass inputs. Then we check for the proxy mode slot. Then we check for the Fake mode slot, before finally checking for any fake subclass inputs.

**(3) new python `_get_fake_tensor_mode()` and `_get_proxy_tensor_mode()` API's**

Before, if you wanted to see if proxy or fake modes were active in python, you would have to consult the mode stack. Since these two modes are no longer part of the actual mode stack, I added two new API's to directly check if either proxy or fake modes are active.

**(4) Allow traceable tensor subclasses to access storages from python**

This is convenient later in the stack, where AOTAutograd needs to detect aliasing of inputs and outputs, where those inputs and outputs might be tensor subclasses. Previously, `x.untyped_storage()` would raise an error if `x` was a subclass. In this PR, I tried to relax this constraint as little as possible: `THPVariable_storage()` will only try to return a storage to python if the tensor subclass you are passing in is "traceable".

**(5) Fixed subclass fakeification**

@wanchaol recently added support to be able to fakeify tensor subclasses. That fakeification logic works in most cases, but there is one case it doesn't handle: autograd metadata. In particular, since autograd sees our tensor subclasses and not their desugared tensors, we need to make sure that our fakeified subclass has the same autograd metadata as the original subclass. I updated `meta_utils.py` to make sure that the autograd metadata is correct.

**(6) make tensor subclasses resizeable**

Previously we didn't allow tensor subclasses to be resizeable. I ran into an issue where fakeifying a tensor subclass occasionally requires swapping out its storage, which can involve resizing the tensor. Mechanically, this required updating `at::for_blob()` to expose a way to request that the tensor that you create has resizeable storage, and then using this new API in `_make_wrapper_tensor()`.

**(7) Added a basic DoubleTensor subclass for testing**

I use this subclass more later in this stack in my AOTAutograd tests - but it serves as a simple subclass example to test the dispatch ordering in this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104482
Approved by: https://github.com/ezyang
ghstack dependencies: #107415
2023-08-29 02:36:48 +00:00
PyTorch MergeBot
8cbf77585d Revert "[1/N] fix clang-tidy warnings in torch/csrc (#107648)"
This reverts commit 49eeca00d1.

Reverted https://github.com/pytorch/pytorch/pull/107648 on behalf of https://github.com/osalpekar due to This causes breakages due to underspecified type ([comment](https://github.com/pytorch/pytorch/pull/107648#issuecomment-1696372588))
2023-08-28 20:35:12 +00:00
cyy
d9fb7166d6 [BE] use DeviceIndex instead of int64_t for related device interfaces (#103068)
This PR unifies the device interfaces in aten/*cpp and torch/csrc/*cpp to use  **c10::DeviceIndex**.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103068
Approved by: https://github.com/malfet
2023-08-25 20:16:14 +00:00
cyy
49eeca00d1 [1/N] fix clang-tidy warnings in torch/csrc (#107648)
Apply fixes to some issues found by clang-tidy in torch/csrc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107648
Approved by: https://github.com/Skylion007
2023-08-25 00:30:09 +00:00
soulitzer
d7130e9704 Add SingletonSymIntNode (#107089)
Adds `SingletonSymNodeImpl` (alternatively, `SkolemSymNodeImpl`). This is an int-like object that only allows the `eq` operation; any other operation produces an error.

The main complexity is that operations that dispatch to SymNode are required to take and return SymNodes, but operations involving `SingletonSymNodeImpl` can produce plain non-SymNode bools. For more discussion see [here](https://docs.google.com/document/d/18iqMdnHlUnvoTz4BveBbyWFi_tCRmFoqMFdBHKmCm_k/edit)
- Introduce `ConstantSymNodeImpl` a generalization of `LargeNegativeIntSymNodeImpl` and replace usage of `LargeNegativeIntSymNodeImpl`  in SymInt.
- Also use ConstantSymNodeImpl to enable SymBool to store its data on a SymNode. Remove the  assumption that if SymBool holds a non-null SymNode, it must be symbolic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107089
Approved by: https://github.com/ezyang
ghstack dependencies: #107839
2023-08-24 21:38:47 +00:00
Yukio Siraichi
bcede143bd Do not mutate SymNode expression. (#107492)
This PR stops `SymNode` from mutating (i.e. simplifying) its expression. Instead, the
simplification (without mutation) is deferred to the `SymNode.maybe_as_int` method.

```python
# Before the guard is introduced:
- FakeTensor(size=(s0,), ...)
- FakeTensor(size=(s1, s2, s3), ...)

# Guard introduced:
- Eq(s0, s1 + s2 + s3)

# After the guard: the stored size expressions are unchanged
# (s0 is not silently simplified into s1 + s2 + s3):
- FakeTensor(size=(s0,), ...)
- FakeTensor(size=(s1, s2, s3), ...)
```

In summary, this PR:
- Replaces `SymNode._expr` by `SymNode.expr`, removing the old property function
    - This makes it so `SymNode` instances never update their expression
- Creates `SymNode.simplified_expr()` method for actually calling `ShapeEnv.replace` on
  its expression. Note that this doesn't update `SymNode.expr`
- Changes how `tensor.size()` gets converted to its Python `torch.Size` type
    - Instead of calling `SymInt::maybe_as_int()` method, we create a new
      `SymInt::is_symbolic()` method for checking whether it is actually a symbolic value
    - This is needed so that when we call `tensor.size()` in the Python side, the returned
      sequence is faithful to the actual data, instead of possibly simplifying it and
      returning an integer
    - 2 files needs this modification:
        - _torch/csrc/Size.cpp_: for handling `torch.Tensor.size` Python calls
        - _torch/csrc/utils/pybind.cpp_: for handling `symint.cast()` C++ calls

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107492
Approved by: https://github.com/ezyang
ghstack dependencies: #107523
2023-08-22 12:38:05 +00:00
Alan Ji
2d727c8c3f remove the duplicate method is_private_use1 in class Device (#107198)
In the `Device` class, there are two methods with similar functions called `is_private_use1` and `is_privateuseone`.
ddf36c82b8/c10/core/Device.h (L84-L87)

ddf36c82b8/c10/core/Device.h (L159-L162)
The former is not used, so this PR removes it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107198
Approved by: https://github.com/bdhirsh
2023-08-17 18:23:29 +00:00
PyTorch MergeBot
6c0bba3daf Revert "Use cpuinfo to determine c10::ThreadPool thread number (#107010)"
This reverts commit ad0476540d.

Reverted https://github.com/pytorch/pytorch/pull/107010 on behalf of https://github.com/izaitsevfb due to Breaks internal meta builds ([comment](https://github.com/pytorch/pytorch/pull/107010#issuecomment-1679866821))
2023-08-16 02:20:31 +00:00
Edward Z. Yang
e1ee10e6f5 Add expect_true for irrefutable guards (#106720)
Here's what it does from the comments:

```
Assume that a boolean is true for the purposes of subsequent symbolic
reasoning.  This will keep track of corresponding runtime checks to verify
that the result is upheld: either as a regular guard, or as a special set
of asserts which are triggered when an unbacked SymInt is allocated.

DO NOT use this function for these cases:

 - This is inappropriate for "branching" conditions (where both
   true and false result in valid programs).  We will always assume
   the condition evaluates true, and so it will never be possible
   to trace the false condition when you use it.  For true branching
   on unbacked SymInts, you must use torch.cond.

 - This is inappropriate for situations where you know some other system
   invariant guarantees that this property holds, since you don't
   really need to insert a runtime check in that case.  Use something
   like constrain_range in that case.

This API has a hitch.  To avoid having to reimplement error reporting
capabilities, this function CAN return False.  The invariant is that
the surrounding code must raise an error when this function returns
False.  This is quite low level, so we recommend using other functions
like check() which enforce this in a more intuitive way.

By the way, this name is a nod to the __builtin_expect likely macro,
which is used similarly (but unlike __builtin_expect, you MUST fail
in the unlikely branch.)
```

We don't do anything with this right now, except use it to discharge regular guards.  Follow-up PRs will (1) use it at important error-checking sites, and (2) actually ensure the runtime asserts make their way into the exported IR / inductor-generated code.
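
A sketch of the intended call pattern (using `expect_true` from `torch.fx.experimental.symbolic_shapes`, which this PR adds; the surrounding check is illustrative):

```python
from torch.fx.experimental.symbolic_shapes import expect_true

def check_size_nonnegative(size):
    # expect_true assumes the condition holds for symbolic reasoning
    # and records a runtime check; if it returns False, we MUST raise.
    if not expect_true(size >= 0):
        raise RuntimeError(f"expected size >= 0, got {size}")
```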

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106720
Approved by: https://github.com/ysiraichi, https://github.com/voznesenskym
2023-08-15 18:42:22 +00:00
cyy
ad0476540d Use cpuinfo to determine c10::ThreadPool thread number (#107010)
This PR prefers the "logical processor number" (the CPU core count shown in htop) returned by cpuinfo for determining the c10 thread number. If that fails, it falls back to hardware_concurrency.
The motivation is that on an x86 host with 64 cores and Hyper-Threading disabled, the current behavior uses 32 threads, leaving half of the cores idle.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107010
Approved by: https://github.com/ezyang
2023-08-15 17:26:24 +00:00
Nikita Shulga
22a20d0850 Add isFloat8Type predicate (#106977)
And make float8 dtypes part of `isReducedFloatingType()` predicate
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106977
Approved by: https://github.com/albanD
2023-08-11 20:13:57 +00:00
Nikita Shulga
97396cdbb2 Fix undefined behavior detected by clang-12 (#106354)
Adding a non-zero offset to a null pointer is undefined behavior and a bad habit.

- When `lapackEig` is called to estimate a workspace size, do not add the matrix size to the W pointer.
- When `unpack_pivots_cpu_kernel` is called with zero `dim_size`, exit early.
- When `topk_impl_loop` is called with `k` equal to zero, exit right away, as the output tensors are empty anyway.
- Do not add a non-zero storage offset in `TensorImpl::data_ptr_impl_impl`, which can happen if a tensor is created as `torch.empty(3)[4:]`.
- In `s_addmm_out_sparse_dense_worker`, do not call `axpy` over an empty vector.
- In `_sparse_binary_op_intersection_kernel_impl`, skip computing `ptr_indices_dim` when `sparse_dim` is empty.
- Exit `grid_sample` forward/backward kernels early if either `input` or `grid` is an empty tensor.

Found by UBSan with clang-12.

Before the change, the UBSan report looked as follows:
```
 ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-12/bin/llvm-symbolizer UBSAN_OPTIONS=print_stacktrace=1 LD_PRELOAD=/usr/lib/llvm-12/lib/clang/12.0.1/lib/linux/libclang_rt.asan-x86_64.so python test_fx_experimental.py -v -k test_normalize_operator_exhaustive_linalg_eig_cpu_float32
Test results will be stored in test-reports/python-unittest/test_fx_experimental

Running tests...
----------------------------------------------------------------------
  test_normalize_operator_exhaustive_linalg_eig_cpu_float32 (__main__.TestNormalizeOperatorsCPU) ... /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:111: UserWarning: 'has_cuda' is deprecated, please use 'torch.backends.cuda.is_built()'
  torch.has_cuda,
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:112: UserWarning: 'has_cudnn' is deprecated, please use 'torch.backends.cudnn.is_available()'
  torch.has_cudnn,
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:118: UserWarning: 'has_mps' is deprecated, please use 'torch.backends.mps.is_built()'
  torch.has_mps,
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:119: UserWarning: 'has_mkldnn' is deprecated, please use 'torch.backends.mkldnn.is_available()'
  torch.has_mkldnn,
/var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:937:17: runtime error: applying non-zero offset 20 to null pointer
    #0 0x7f2025794888 in void at::native::lapackEig<float, float>(char, char, int, float*, int, float*, float*, int, float*, int, float*, int, float*, int*) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9945888)
    #1 0x7f20257da256 in void at::native::(anonymous namespace)::apply_linalg_eig<float>(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, bool) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x998b256)
    #2 0x7f20257d902d in at::native::(anonymous namespace)::linalg_eig_kernel(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor const&, bool) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x998a02d)
    #3 0x7f20257b5b3d in at::native::linalg_eig_out_info(at::Tensor const&, at::Tensor&, at::Tensor&, at::Tensor&, bool) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9966b3d)
    #4 0x7f20257b4770 in at::native::linalg_eig_out(at::Tensor const&, at::Tensor&, at::Tensor&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9965770)
    #5 0x7f20280710e6 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor&, at::Tensor&> (at::Tensor const&, at::Tensor&, at::Tensor&), &(at::(anonymous namespace)::(anonymous namespace)::wrapper_CPU_out_linalg_eig_out(at::Tensor const&, at::Tensor&, at::Tensor&))>, std::tuple<at::Tensor&, at::Tensor&>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor&, at::Tensor&> >, std::tuple<at::Tensor&, at::Tensor&> (at::Tensor const&, at::Tensor&, at::Tensor&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor&, at::Tensor&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xc2220e6)
    #6 0x7f202727a045 in at::_ops::linalg_eig_out::call(at::Tensor const&, at::Tensor&, at::Tensor&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xb42b045)
    #7 0x7f20257b7e29 in at::native::linalg_eig(at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9968e29)
    #8 0x7f2028070bf0 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor> (at::Tensor const&), &(at::(anonymous namespace)::(anonymous namespace)::wrapper_CPU__linalg_eig(at::Tensor const&))>, std::tuple<at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&> >, std::tuple<at::Tensor, at::Tensor> (at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xc221bf0)
    #9 0x7f2026b1f787 in std::tuple<at::Tensor, at::Tensor> c10::Dispatcher::redispatch<std::tuple<at::Tensor, at::Tensor>, at::Tensor const&>(c10::TypedOperatorHandle<std::tuple<at::Tensor, at::Tensor> (at::Tensor const&)> const&, c10::DispatchKeySet, at::Tensor const&) const (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xacd0787)
    #10 0x7f20273230a7 in at::_ops::linalg_eig::redispatch(c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xb4d40a7)
    #11 0x7f202c3cc32d in torch::autograd::VariableType::(anonymous namespace)::linalg_eig(c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x1057d32d)
    #12 0x7f202c3cba96 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&), &(torch::autograd::VariableType::(anonymous namespace)::linalg_eig(c10::DispatchKeySet, at::Tensor const&))>, std::tuple<at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&> >, std::tuple<at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x1057ca96)
    #13 0x7f20272798e0 in at::_ops::linalg_eig::call(at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xb42a8e0)
    #14 0x7f2043d97ae3 in torch::autograd::THPVariable_linalg_eig(_object*, _object*, _object*) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_python.so+0x23feae3)
    #15 0x5072d6 in cfunction_call /usr/local/src/conda/python-3.9.17/Objects/methodobject.c:543:19
    ...

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:937:17 in
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106354
Approved by: https://github.com/huydhn, https://github.com/lezcano
2023-08-03 05:36:03 +00:00
Jun Luo
17a3141696 Support is_mtia() (#106396)
Summary: As title.

Test Plan: CI tests.

Reviewed By: yuhc

Differential Revision: D47937061

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106396
Approved by: https://github.com/yuhc, https://github.com/ezyang
2023-08-02 03:24:23 +00:00
cyy
b3e24c53eb use performance-unnecessary-value-param in clang-tidy (#102615)
performance-unnecessary-value-param has been disabled in clang-tidy for a long time. However, this check is actually useful and able to find some interesting performance problems.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102615
Approved by: https://github.com/malfet, https://github.com/Skylion007
2023-07-28 17:37:03 +00:00
cyy
646fa36875 Add const reference in opportunities detected by clang-tidy (#105931)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105931
Approved by: https://github.com/Skylion007
2023-07-26 21:38:10 +00:00
Amadeusz Skrzypczak
b64bd4a5dd Add torch.float8_e5m2 and torch.float8_e4m3 data types (#104242)
Proposal of two float8 variants - e5m2 and e4m3 - based on https://arxiv.org/pdf/2209.05433.pdf

Hide all Float8 operator implementations behind `#if !defined(C10_MOBILE)` guard to keep Android build size almost unchanged
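
A minimal usage sketch (casting is the basic operation available at this point; most other ops remain unimplemented for float8):

```python
import torch

x = torch.randn(4)
y = x.to(torch.float8_e5m2)  # downcast to the 8-bit e5m2 format
z = y.to(torch.float32)      # round-trip back, with e5m2 precision loss
print(z)
```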

TODO:
 - Refactor duplicated code
 - Cleanup unbalanced pragma pop in dtype utils
 - Add a native implementation on the CUDA side

Co-authored-by: Nikita Shulga <nshulga@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104242
Approved by: https://github.com/albanD
2023-07-20 16:09:11 +00:00
PyTorch MergeBot
f2b15772ff Revert "Add torch.float8_e5m2 and torch.float8_e4m3 data types (#104242)"
This reverts commit a9804130e5.

Reverted https://github.com/pytorch/pytorch/pull/104242 on behalf of https://github.com/PaliC due to breaks lint (run lintrunner and remerge) ([comment](https://github.com/pytorch/pytorch/pull/104242#issuecomment-1644150284))
2023-07-20 15:37:53 +00:00
Amadeusz Skrzypczak
a9804130e5 Add torch.float8_e5m2 and torch.float8_e4m3 data types (#104242)
Proposal of two float8 variants - e5m2 and e4m3 - based on https://arxiv.org/pdf/2209.05433.pdf

Hide all Float8 operator implementations behind `#if !defined(C10_MOBILE)` guard to keep Android build size almost unchanged

TODO:
 - Refactor duplicated code
 - Cleanup unbalanced pragma pop in dtype utils
 - Add a native implementation on the CUDA side

Co-authored-by: Nikita Shulga <nshulga@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104242
Approved by: https://github.com/albanD
2023-07-20 09:45:45 +00:00
Edward Z. Yang
1152e86da1 Transmute refined SymInt into int (#104828)
Previously, x.size(0) could return a SymInt, even when the internal
sympy expression was actually already constant (e.g., due to an
introduced guard). We now allow querying the Python object with
maybe_as_int, which lets us transmute these objects back to
int when possible.

It is still possible to end up with a constant SymInt even after this
change, e.g., if you get out a SymInt and while holding onto it
specialize it, but casual users are more likely to get ints when they
want to.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104828
Approved by: https://github.com/Skylion007
2023-07-15 18:46:10 +00:00
PyTorch MergeBot
1c69f363c4 Revert "Transmute refined SymInt into int (#104828)"
This reverts commit 0f322a300e.

Reverted https://github.com/pytorch/pytorch/pull/104828 on behalf of https://github.com/ezyang due to executorch failure ([comment](https://github.com/pytorch/pytorch/pull/104828#issuecomment-1635997559))
2023-07-14 15:08:11 +00:00
Edward Z. Yang
0f322a300e Transmute refined SymInt into int (#104828)
Previously, x.size(0) could return a SymInt, even when the internal
sympy expression was actually already constant (e.g., due to an
introduced guard). We now allow querying the Python object with
maybe_as_int, which lets us transmute these objects back to
int when possible.

It is still possible to end up with a constant SymInt even after this
change, e.g., if you get out a SymInt and while holding onto it
specialize it, but casual users are more likely to get ints when they
want to.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104828
Approved by: https://github.com/Skylion007
2023-07-13 07:02:52 +00:00
PyTorch MergeBot
06a5df8d31 Revert "Transmute refined SymInt into int (#104828)"
This reverts commit 4694f54356.

Reverted https://github.com/pytorch/pytorch/pull/104828 on behalf of https://github.com/ezyang due to broke inductor ([comment](https://github.com/pytorch/pytorch/pull/104828#issuecomment-1633049980))
2023-07-12 18:57:58 +00:00
Edward Z. Yang
4694f54356 Transmute refined SymInt into int (#104828)
Previously, x.size(0) could return a SymInt, even when the internal
sympy expression was actually already constant (e.g., due to an
introduced guard). We now allow querying the Python object with
maybe_as_int, which lets us transmute these objects back to
int when possible.

It is still possible to end up with a constant SymInt even after this
change, e.g., if you get out a SymInt and while holding onto it
specialize it, but casual users are more likely to get ints when they
want to.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104828
Approved by: https://github.com/Skylion007
2023-07-12 16:40:21 +00:00
Jason Ansel
dffcf999bd Misc changes from compiled autograd branch (#104316)
This PR pulls out some standalone changes from #103822

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104316
Approved by: https://github.com/ezyang
2023-07-08 20:59:20 +00:00
Anthony Alayo
8d65635378 Prefixing DeviceType with c10 namespace to avoid name collisions (#104364)
Fixes #91338

Follow up from https://github.com/pytorch/pytorch/pull/91342

> 🚀 The feature, motivation and pitch
> We have an existing DeviceType class all over the place in our code base, and it conflicts with the one that is used in torch.
> Thankfully the PyTorch DeviceType enum class is under the c10 namespace.

```
In file included from /xxx/build/_deps/torch-src/../../aten/src/ATen/ops/view.h:5:
/xxx/_deps/torch-src/aten/src/ATen/Context.h:265:14: error: reference to 'DeviceType' is ambiguous
    if (p == DeviceType::HIP) {
             ^
/xxx/include/Common_types.h:178:8: note: candidate found by name lookup is 'DeviceType'
struct DeviceType {
       ^
/xxx/build/_deps/torch-src/c10/../c10/core/DeviceType.h:32:12: note: candidate found by name lookup is 'c10::DeviceType'
enum class DeviceType : int8_t {
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104364
Approved by: https://github.com/albanD
2023-07-07 13:23:03 +00:00
Elias Ellison
7bb40be143 Fix fake tensor for private use backends (#103090)
Fix for https://github.com/pytorch/pytorch/issues/101244

We need meta to be higher priority than PrivateUse1 (as it is for CPU and CUDA) so that when meta is in the TLS we hit the meta kernel.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103090
Approved by: https://github.com/bdhirsh
2023-06-27 21:17:40 +00:00
Xu Han
6c1ccccf21 Enable mimalloc on pytorch Windows (#102595)
This PR is an implementation of [#102534](https://github.com/pytorch/pytorch/issues/102534), option 2.
Major changes:
1. Add mimalloc as a submodule.
2. Add build option "USE_MIMALLOC".
3. It is only enabled on Windows builds, where it improves PyTorch memory allocation performance.

Additional test: this PR also builds and statically links mimalloc successfully on Linux. (A benchmark screenshot is attached in the PR.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102595
Approved by: https://github.com/jgong5, https://github.com/malfet
2023-06-27 08:53:26 +00:00
Meghan
6ff4548b6e [AMP] Support XLA:TPU (#96370)
With https://github.com/pytorch/xla/pull/5148, https://github.com/pytorch/xla/pull/4740

With these changes
XLA:GPU users should use `torch.cuda.amp.autocast()` for AMP with float16
XLA:TPU users should use `torch.amp.autocast('xla')` for AMP with bfloat16

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96370
Approved by: https://github.com/bdhirsh, https://github.com/malfet
2023-06-23 19:46:42 +00:00
Charlie West-Taylor
5eb7325bc7 Add autocast support for IPU (#103890)
As part of this, a new `AutocastIPU` dispatch key has been added.

There's an existing PR, #85043, to make `Autocast` a proper per-backend functionality key, but it ran into issues with layering with other functionality keys and went stale.

This has been tested in the out-of-tree IPU PyTorch backend.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103890
Approved by: https://github.com/albanD
2023-06-22 15:38:45 +00:00
Brian Hirsh
c3c03e7cb8 Reland of https://github.com/pytorch/pytorch/pull/101818 (#103888)
Original PR broke internal

This reverts commit 5ed618132f.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103888
Approved by: https://github.com/albanD
2023-06-21 21:00:56 +00:00
Brian Hirsh
3cfd677b1f fix inference mode / PyDispatcher / Functionalize interaction (#103275)
Fixes https://github.com/pytorch/pytorch/issues/103132

This is kind of annoying: Functionalization (and also vmap, I think?) manually figures out which ops have C++ CompositeImplicit decomps, and directly registers them to the Functionalize key. This is a problem for the PyDispatcher: We normally want the PyDispatcher to take precedence over the regular dispatcher. But in this case, we have a python decomp registered to `CompositeImplicitAutograd`, and a C++ decomp registered *directly* to the `Functionalize` key, so the C++ decomp gets precedence over the python decomp.

The way this showed up was that a model was running `matmul()` under inference mode, so we never hit the autograd dispatch key, and go straight to the functionalize dispatch key. Matmul has both a python decomp and a c++ decomp, but we were running the C++ decomp. That C++ decomp isn't meant to be used with dynamic shapes, so we were failing with the "tried to call `.sizes()` on a tensor with dynamic shapes" error.

For now, I had the PyDispatcher mimic the behavior of functionalization codegen: when you register a python decomp to the `CompositeImplicitAutograd` key, this PR just automatically registers that decomp to the `Functionalize` key at the same time.

I'm trying to remember now why we didn't just add `Functionalize` (and all of the other functorch transform keys) directly to the `CompositeImplicitAutograd` alias keyset, but I couldn't remember (@zou3519 any chance you remember?).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103275
Approved by: https://github.com/ezyang, https://github.com/zou3519
2023-06-21 15:19:55 +00:00
PyTorch MergeBot
5ed618132f Revert "change pre_autograd to pre_dispatch tracing (#101818)"
This reverts commit b0392de2c3.

Reverted https://github.com/pytorch/pytorch/pull/101818 on behalf of https://github.com/izaitsevfb due to Breaks internal builds see D46629736 TypeError: wrap_key() got an unexpected keyword argument pre_autograd ([comment](https://github.com/pytorch/pytorch/pull/101818#issuecomment-1587837667))
2023-06-12 18:16:37 +00:00
Brian Hirsh
b0392de2c3 change pre_autograd to pre_dispatch tracing (#101818)
We discussed in a composability meeting a few weeks ago that `pre_autograd` should probably be renamed to `pre_dispatch`.

One question in this PR was: should I re-use a dispatch key? Or should I create a new dispatch key (that yet again corresponds to "top of the dispatcher")?

~~For now, I ended up sticking our proxy mode on the mode stack corresponding to `PythonTLSSnapshot`, because it was simple and it works. It looks like one of the functorch dispatch keys has higher priority though, so it's possible that functorch will end up running first. Open to options, but we can consider adding a new dispatch key later if that becomes a problem~~

Update: I added a dedicated dispatch key, `PreDispatch`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101818
Approved by: https://github.com/ezyang, https://github.com/Neilblaze, https://github.com/albanD, https://github.com/zou3519
2023-06-09 17:30:15 +00:00
cyy
87cbfe957a increase clang-tidy coverage to more c10 source files (#102902)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102902
Approved by: https://github.com/Skylion007
2023-06-04 06:33:01 +00:00
shibo19
af50efca24 add nested/sparse/quantized tensor key for privateuse1 (#102696)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102696
Approved by: https://github.com/bdhirsh
2023-06-02 22:35:52 +00:00
Shiyan Deng
685505353a Back out "Add PyObject preservation for UntypedStorage (#97470)" (#102553)
Summary:
Original commit changeset: c24708d18ccb

Original Phabricator Diff: D46159983

Test Plan: SL tests and CI

Differential Revision: D46284986

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102553
Approved by: https://github.com/DanilBaibak
2023-06-01 17:23:43 +00:00
cyy
3ae42cb7db adjust header inclusions in C10 as sugguested by IWYU (#102467)
This PR aims to reduce unused header inclusions in C10.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102467
Approved by: https://github.com/albanD
2023-05-31 19:19:10 +00:00
Kurt Mohler
5fe629e314 Add PyObject preservation for UntypedStorage (#97470)
Part of #91395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97470
Approved by: https://github.com/ezyang
2023-05-23 01:27:30 +00:00
Benson Ma
66a2600b6a [T153220354] Fix header inclusions in c10 (#1541) (#101846)
Summary:
This is a re-attempt to land the iwyu header changes, by taking the diff from [PR 100304](https://github.com/pytorch/pytorch/pull/100304) and adding the bare minimum of changes to make the diff build correctly in the internal builds.

X-link: https://github.com/facebookresearch/pytorch3d/pull/1541

X-link: https://github.com/fairinternal/pytorch3d/pull/44

- Re-work D45769819 to fix header inclusions in c10

Test Plan:
```
buck2 build --no-remote-cache mode/dev-nosan //caffe2/c10/...

buck2 build --no-remote-cache mode/dev-nosan //deeplearning/fbgemm/fbgemm_gpu/...

buck2 build mode/dev-nosan //vision/fair/pytorch3d/pytorch3d:_C
```

Reviewed By: malfet

Differential Revision: D45920611

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101846
Approved by: https://github.com/malfet, https://github.com/Skylion007
2023-05-20 19:35:14 +00:00
Bug Hunter Yan
0c470b17e3 Extend storage create for custom storageImpl (#100237)

For the scenario where users inherit from StorageImpl to implement their own subclasses, the current storage creation method cannot correctly create storage objects.

Following the registration pattern used for Allocator, this PR extends StorageImpl creation so that users can register their own custom StorageImpl constructors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100237
Approved by: https://github.com/albanD
2023-05-17 04:30:13 +00:00
Richard Zou
6bc0f4a4ee [reland][CustomOp] Add Dispatcher error callback (#101452)
Reland of #101015, original stack reverted due to internal test
flakiness.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101452
Approved by: https://github.com/soulitzer
2023-05-16 13:33:31 +00:00
Jean Schmidt
c2e16d8b2c buck1 can't properly handle '/' on rule names, so fixing 'impl/cow/context' and 'core/impl/cow/context_test' build rules (#101552)
This is because PR #101510 can't be landed due to the author not having linked a GitHub account to their internal Meta account.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101552
Approved by: https://github.com/DanilBaibak
2023-05-16 11:28:58 +00:00
PyTorch MergeBot
7912b34789 Revert "[CustomOp] Add Dispatcher error callback (#101015)"
This reverts commit c0e5d7e7fe.

Reverted https://github.com/pytorch/pytorch/pull/101015 on behalf of https://github.com/huydhn due to Revert this as the earlier commits in the stack have been reverted ([comment](https://github.com/pytorch/pytorch/pull/101015#issuecomment-1548476583))
2023-05-15 19:49:53 +00:00
PyTorch MergeBot
cca31f1797 Revert "implement a function to convert a storage to copy-on-write (#100819)"
This reverts commit aec11b8c80.

Reverted https://github.com/pytorch/pytorch/pull/100819 on behalf of https://github.com/jeanschmidt due to added tests are breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/100819#issuecomment-1547929531))
2023-05-15 14:10:23 +00:00
mikey dagitses
aec11b8c80 implement a function to convert a storage to copy-on-write (#100819)

Summary:
This will be used in the _lazy_clone() operator as well as reshape().

Test Plan: 100% coverage of reachable lines.

---
Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/pytorch/pytorch/pull/100819).
* #100821
* #100820
* __->__ #100819

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100819
Approved by: https://github.com/ezyang
2023-05-12 17:45:04 +00:00
Richard Zou
c0e5d7e7fe [CustomOp] Add Dispatcher error callback (#101015)
The PyTorch Dispatcher's "no kernel found for DispatchKey" error message
is a bit long and winded. This PR adds a way to add a custom error
callback and changes the CustomOp API to use the custom error callback
to deliver better error messages.

Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101015
Approved by: https://github.com/ezyang
2023-05-12 13:49:20 +00:00
Elias Ellison
0ec4646588 CUDA Graph Trees - error on deallocated access (#100927)
Turn the warning into an error if we detect that a tensor is accessed after its memory has been overwritten/released by a new invocation of cudagraphs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100927
Approved by: https://github.com/zou3519
2023-05-11 17:17:14 +00:00
PyTorch MergeBot
4eaaa08623 Revert "Fix header inclusions in c10 by iwyu (#100304)"
This reverts commit 6037ee8cc9.

Reverted https://github.com/pytorch/pytorch/pull/100304 on behalf of https://github.com/jeanschmidt due to Breaking meta internal builds and fbgemm builds ([comment](https://github.com/pytorch/pytorch/pull/100304#issuecomment-1543919257))
2023-05-11 12:37:35 +00:00
PyTorch MergeBot
cbfed470bd Revert "CUDA Graph Trees - error on deallocated access (#100927)"
This reverts commit 3941bbc5ba.

Reverted https://github.com/pytorch/pytorch/pull/100927 on behalf of https://github.com/jeanschmidt due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/100927#issuecomment-1543874258))
2023-05-11 12:07:20 +00:00
mikey dagitses
979f55d3bc implementation of DataPtr context for copy-on-write tensors (#100818)

Summary:
Copy-on-write storage
=====================
This library adds support for copy-on-write storage, i.e. lazy copies,
to tensors. The design maintains the PyTorch invariant that tensors
alias if and only if they share a storage. Thus, tensors that are lazy
copies of one another will have distinct storages that share a data
allocation.

Thread-safety
-------------
The correctness of this design hinges on the pre-existing PyTorch user
requirement (and general default programming assumption) that users
are responsible for guaranteeing that writes do not take places
concurrently with reads and other writes.

Lazily copied tensors add a complication to this programming model
because users are not required to know if lazy copies exist and are
not required to serialize writes across lazy copies. For example: two
tensors with distinct storages that share a copy-on-write data context
may be given to different threads that may do whatever they wish to
them, and the runtime is required to guarantee its safety.

It turns out that this is not that difficult to protect because, due
to the copy-on-write requirement, we just need to materialize a tensor
upon writing. This could be done entirely without synchronization if
we materialized each copy; however, we have a common-sense
optimization to elide the copy for the last remaining reference. This
requires waiting for any pending copies.

### Thread-safety detailed design
There are two operations that affect the copy-on-write details of a
tensor:

1) lazy-clone (e.g. an explicit call or a hidden implementation detail
   added through an operator like reshape)
2) materialization (i.e. any write to the tensor)

The key insight that we exploit is that lazy-clone is logically a read
operation and materialization is logically a write operation. This
means that, for a given set of tensors that share a storage, if
materialization is taking place, no other read operation, including
lazy-clone, can be concurrent with it.

However, this insight only applies within a set of tensors that share
a storage. We also have to be concerned with tensors with different
storages that share a copy-on-write context. In this world,
materialization can race with lazy-clone or even other
materializations. _However_, in order for this to be the case, there
must be _at least_ two references to the context. This means that the
context _can not_ vanish out from under you if you are performing a
lazy-clone, and hence, it only requires an atomic refcount bump.

The most complicated case is that all lazy-copies are concurrently
materializing. In this case, because a write is occurring, there are
no in-flight lazy-copies taking place. We must simply ensure that all
lazy-copies are able to materialize (read the data) concurrently. If
we didn't have the aforementioned optimization where the last copy
steals the data, we could get away with no locking whatsoever: each
makes a copy and decrements the refcount. However, because of the
optimization, we require the loser of the materializing race to wait for
the pending copies to finish, and then steal the data without copying
it.

We implement this by taking a shared lock when copying the data and
taking an exclusive lock when stealing the data. The exclusive lock
acquisition ensures that all pending shared locks are finished before
we steal the data.
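
A toy Python model of this protocol (a sketch only; the real implementation uses a shared mutex on the C++ copy-on-write context):

```python
import threading

class CowContext:
    """One shared data allocation, refcounted across lazy copies."""
    def __init__(self, data):
        self.data = data
        self.refcount = 1          # tensors referencing this context
        self.pending_copies = 0    # in-flight materializing copies
        self.cv = threading.Condition()

    def lazy_clone(self):
        # Logically a read: just bump the refcount.
        with self.cv:
            self.refcount += 1
        return self

    def materialize(self):
        # Logically a write: copy the data, or steal it if we hold the
        # last reference, after waiting for in-flight copies to finish.
        with self.cv:
            self.refcount -= 1
            if self.refcount == 0:
                while self.pending_copies:
                    self.cv.wait()
                data, self.data = self.data, None
                return data            # stolen, no copy made
            self.pending_copies += 1
        try:
            return list(self.data)     # stand-in for a real buffer copy
        finally:
            with self.cv:
                self.pending_copies -= 1
                self.cv.notify_all()
```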

Test Plan: 100% code coverage.

---
Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/pytorch/pytorch/pull/100818).
* #100821
* #100820
* #100819
* __->__ #100818

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100818
Approved by: https://github.com/ezyang
2023-05-11 11:13:51 +00:00
cyy
6037ee8cc9 Fix header inclusions in c10 by iwyu (#100304)
This work introduces include-what-you-use support for c10 via a CMake option defaulting to off. We also remove some unused header inclusions and fix a trivial inclusion error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100304
Approved by: https://github.com/ezyang
2023-05-11 05:19:42 +00:00
PyTorch MergeBot
3271413e74 Revert "Fix header inclusions in c10 by iwyu (#100304)"
This reverts commit 39ec5fa722.

Reverted https://github.com/pytorch/pytorch/pull/100304 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, it is almost there but fails on Windows 39ec5fa722, which is in unstable mode after https://github.com/pytorch/pytorch/pull/100548 ([comment](https://github.com/pytorch/pytorch/pull/100304#issuecomment-1542975714))
2023-05-11 00:37:32 +00:00
Elias Ellison
3941bbc5ba CUDA Graph Trees - error on deallocated access (#100927)
Turn the warning into an error if we detect that a tensor is accessed after its memory has been overwritten/released by a new invocation of cudagraphs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100927
Approved by: https://github.com/zou3519
2023-05-10 17:15:33 +00:00
cyy
39ec5fa722 Fix header inclusions in c10 by iwyu (#100304)
This work introduces include-what-you-use support for c10 via a CMake option defaulting to off. We also remove some unused header inclusions and fix a trivial inclusion error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100304
Approved by: https://github.com/ezyang
2023-05-10 15:42:43 +00:00
Shawn Xu
bde7b81f34 [S337714] Back out "[PyTorch] Don't do extra numel() check in TensorImpl::data() (#98090)" (#100893)
Summary:
Original commit changeset: 9875964c3b32

Original Phabricator Diff: D44586464

Reviewed By: drdarshan

Differential Revision: D45664329

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100893
Approved by: https://github.com/xush6528
2023-05-08 21:56:44 +00:00
Sergei Vorobev
116e04be29 Initialize view_replay_enabled_ in the AutogradState ctor (#100822)
Cruise uses [clang static analyzer](https://clang-analyzer.llvm.org/) internally.
In the v2.0.0 release of PyTorch it found this problem

```
In file included from external/pytorch/aten/src/ATen/ATen.h:7:
In file included from external/pytorch/aten/src/ATen/Context.h:3:
In file included from external/pytorch/aten/src/ATen/CPUGeneratorImpl.h:3:
In file included from external/pytorch/aten/src/ATen/core/Generator.h:22:
In file included from external/pytorch/c10/core/GeneratorImpl.h:8:
In file included from external/pytorch/c10/core/TensorImpl.h:6:
external/pytorch/c10/core/InferenceMode.h:58:5: warning: Passed-by-value struct argument contains uninitialized data (e.g., field: 'view_replay_enabled_')
    AutogradState::set_tls_state(AutogradState(
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
```

In other words, the value of `view_replay_enabled_` could be left uninitialized, which may lead to subtle bugs later on.

This PR addresses the warning by explicitly initializing it to `false`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100822
Approved by: https://github.com/Skylion007
2023-05-08 18:57:14 +00:00
Radek Bartoň
bf2258f582 Fix frequent "warning C4141: 'dllimport': used more than once" (#100708)
This warning was introduced recently in MSVC builds.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100708
Approved by: https://github.com/ezyang
2023-05-05 18:45:58 +00:00
Edward Z. Yang
9a3e411a41 More rigorous mixed overloads on SymInt (#100008)
Previously, the change to aten/src/ATen/native/LossNLL.cpp eventually resulted in a double / SymInt division, which ended up calling the int64_t / SymInt overload, truncating the double (bad!). By adding overloads for all the int/float types, we prevent this situation from happening in the future.
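
The hazard and the fix can be illustrated in miniature (a standalone sketch with a stand-in type, not the actual `c10::SymInt` operators):
```
#include <cstdint>
#include <iostream>

struct SymIntLike { int64_t v; };  // stand-in for SymInt

// With only this overload, a double LHS is implicitly converted to
// int64_t, silently truncating the fractional part.
int64_t operator/(int64_t a, SymIntLike b) { return a / b.v; }

// Adding a floating-point overload makes it the exact match, so the
// double is no longer truncated.
double operator/(double a, SymIntLike b) {
  return a / static_cast<double>(b.v);
}

int main() {
  SymIntLike n{4};
  // Prints 1.875; with only the int64_t overload, 7.5 would have been
  // truncated to 7 and this would print 1.
  std::cout << (7.5 / n) << "\n";
}
```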

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100008
Approved by: https://github.com/albanD
2023-04-28 00:54:44 +00:00
mikey dagitses
8cc57593b9 remove redundant trailing semicolons in StorageImpl.h (#97658)
remove redundant trailing semicolons in StorageImpl.h

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97658
Approved by: https://github.com/kit1980, https://github.com/malfet
2023-04-25 21:04:22 +00:00
cyy
dbc7e919b8 add Wmissing-prototypes to clang-tidy (#96805)
This PR introduces the **-Wmissing-prototypes** check in clang-tidy to prevent further coding errors such as the one fixed by PR #96714.
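
For illustration, the check flags non-static functions that are defined without a prior declaration; the usual fixes are internal linkage or a header prototype:
```
// What -Wmissing-prototypes catches (illustrative file-scope example):
int leaky_helper(int x) { return x + 1; }
// warning: no previous prototype for function 'leaky_helper'

// Fix 1: give the function internal linkage if it is file-local.
static int local_helper(int x) { return x + 1; }

// Fix 2: declare it in a header that this translation unit includes,
// so external callers and the definition agree on the signature.
```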

This pull request makes several internal functions static to improve performance and avoid name clashes. It also fixes some typos, formatting, and missing includes in various files. It adds a new .clang-tidy check to warn about missing prototypes for non-static functions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96805
Approved by: https://github.com/malfet, https://github.com/albanD
2023-04-25 18:20:36 +00:00
Animesh Jain
971df458db Reland of "Python binding to set/get CUDA rng state offset" (#99565)
Why?
* To reduce the latency of hot path in https://github.com/pytorch/pytorch/pull/97377

Concern - I had to add `set_offset` in all instances of `GeneratorImpl`. I don't know if there is a better way.

~~~~
import torch
torch.cuda.manual_seed(123)
print(torch.cuda.get_rng_state())
torch.cuda.set_rng_state_offset(40)
print(torch.cuda.get_rng_state())

tensor([123,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0], dtype=torch.uint8)
tensor([123,   0,   0,   0,   0,   0,   0,   0,  40,   0,   0,   0,   0,   0,
          0,   0], dtype=torch.uint8)
~~~~

Reland of https://github.com/pytorch/pytorch/pull/98965

(cherry picked from commit 8214fe07e8)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99565
Approved by: https://github.com/anijain2305
2023-04-20 15:42:25 +00:00
shibo
da322ea874 Enable torch.jit.load for custom device (#99535)
1. Support `torch.jit.load` for a custom device:
```
# custom device named `foo`
ts_model = torch.jit.script(model.to(device="foo"))
ts_model.save("./ts.pt")  # a scripted model on device `foo`

# and then we want to load it and run it
ts_model = torch.jit.load("./ts.pt")
```
2. Add some extra keys for custom devices using `privateuse1`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99535
Approved by: https://github.com/albanD
2023-04-20 05:37:57 +00:00
PyTorch MergeBot
bb2cd4a107 Revert "Python binding to set/get CUDA rng state offset (#98965)"
This reverts commit 8214fe07e8.

Reverted https://github.com/pytorch/pytorch/pull/98965 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-04-19 11:23:32 +00:00
Animesh Jain
8214fe07e8 Python binding to set/get CUDA rng state offset (#98965)
Why?
* To reduce the latency of hot path in https://github.com/pytorch/pytorch/pull/97377

Concern - I had to add `set_offset` in all instances of `GeneratorImpl`. I don't know if there is a better way.

~~~~
import torch
torch.cuda.manual_seed(123)
print(torch.cuda.get_rng_state())
torch.cuda.set_rng_state_offset(40)
print(torch.cuda.get_rng_state())

tensor([123,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0], dtype=torch.uint8)
tensor([123,   0,   0,   0,   0,   0,   0,   0,  40,   0,   0,   0,   0,   0,
          0,   0], dtype=torch.uint8)
~~~~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98965
Approved by: https://github.com/kulinseth, https://github.com/ezyang
2023-04-18 07:52:21 +00:00
Tugsbayasgalan Manlaibaatar
7401f0f8ce Add unbacked symbool support (#98877)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98877
Approved by: https://github.com/ezyang
2023-04-17 17:45:10 +00:00
Edward Z. Yang
756a86d52c Support large negative SymInt (#99157)
The strategy is that we will heap allocate a LargeNegativeIntSymNodeImpl whenever we have a large negative int, so that we can keep the old `is_symbolic` test (now called `is_heap_allocated`) on SymInt. Whenever we need to do something with these ints, though, we convert them back into a plain `int64_t` (and then, e.g., wrap it in whatever user-specified SymNodeImpl they need). We cannot wrap directly in the user-specified SymNodeImpl as we generally do not know what the "tracing context" is from C++. We expect large negative ints to be rare, so we don't apply optimizations like singleton-ifying INT_MIN. Here's the order to review:

* c10/core/SymInt.h and cpp
  * `is_symbolic` renamed to `is_heap_allocated`, as I needed to audit all use sites: the old `is_symbolic` test would return true for large negative ints, but it would be wrong to then try to dispatch on the LargeNegativeIntSymNodeImpl, which supports very few operations. In this file, I also had to update `expect_int`.
  * If you pass in a large negative integer, we instead heap allocate it in `promote_to_negative`. The function is written in a funny way to keep compact constructor code for SymInt (the heap allocation happens out of line)
  * clone is now moved out-of-line
  * New method maybe_as_int which will give you a constant int if it is possible, either because it's stored inline or in LargeNegativeIntSymNodeImpl. This is the preferred replacement for previous use of is_symbolic() and then as_int_unchecked().
  * Rename toSymNodeImpl to toSymNode, which is more correct (since it returns a SymNode)
  * Complete rewrite of `normalize_symints.cpp` to use the new `maybe_as_int`. Cannot easily use the old code structure, so it's now done using a macro and typing out each case manually (it's actually not that bad).
  * Reimplementations of all the unary operators by hand to use `maybe_as_int`, relatively simple.
* c10/core/LargeNegativeIntSymNodeImpl.h - Just stores a int64_t value, but it has to be big and negative. Most methods are not implemented, since we will rewrap the large negative int in the real SymNodeImpl subclass before doing operations with it
* The rest of the files are just rewriting code to use `maybe_as_int`. There is a nontrivial comment in c10/core/SymIntArrayRef.h

Very minor test adjustment in c10/test/core/SymInt_test.cpp. Plan to exercise this properly in the next PR.
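
A hedged sketch of the new call pattern (assuming `maybe_as_int` returns an optional-like value, per the description above):
```
#include <c10/core/SymInt.h>

// Sketch: prefer maybe_as_int() over the old
// is_symbolic()/as_int_unchecked() pair. It yields the constant
// whether the int is stored inline or heap-allocated in a
// LargeNegativeIntSymNodeImpl.
int64_t size_or_default(const c10::SymInt& s, int64_t dflt) {
  if (auto i = s.maybe_as_int()) {
    return *i;   // a constant value is available
  }
  return dflt;   // genuinely symbolic; take the slow path
}
```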

Companion XLA PR: https://github.com/pytorch/xla/pull/4882

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99157
Approved by: https://github.com/albanD
2023-04-15 22:43:51 +00:00
mikey dagitses
286212080f introduce TensorBase::mutable_data_ptr<T> (#97874)
See D44409928 for motivation.

Note that we keep the const-ness of the existing data_ptr<T>() member so
that we don't have to change all references atomically. We just change
the ones here that we have higher confidence in.

Differential Revision: [D44497685](https://our.internmc.facebook.com/intern/diff/D44497685/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D44497685/)!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97874
Approved by: https://github.com/ezyang
2023-04-12 15:13:30 +00:00
mikey dagitses
439a716785 remove unused TensorImpl::unsafe_data<T>() (#98720)
Differential Revision: [D44824809](https://our.internmc.facebook.com/intern/diff/D44824809/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98720
Approved by: https://github.com/ezyang
2023-04-12 09:58:41 +00:00
mikey dagitses
ee0143bf65 distinguish mutability of TensorImpl::data<T>() (#98719)
There already is a mutable_data<T>() with different semantics, so we
introduce new names:
TensorImpl::(mutable_)?data_dtype_initialized<T>().

Differential Revision: [D44824778](https://our.internmc.facebook.com/intern/diff/D44824778/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98719
Approved by: https://github.com/ezyang
2023-04-12 07:24:35 +00:00
Tugsbayasgalan Manlaibaatar
39fd7f945f Add Symbool support in python to C++ translation (#98453)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98453
Approved by: https://github.com/ezyang
2023-04-12 03:21:57 +00:00
mikey dagitses
6145964ec9 distinguish implementation of data() and mutable_data() on TensorImpl (#98732)
The old style had them both going through a mutable method on Storage,
which would prevent us from implementing checks differently depending
on whether we are writing or reading.

Differential Revision: [D44831044](https://our.internmc.facebook.com/intern/diff/D44831044/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98732
Approved by: https://github.com/ezyang
2023-04-11 11:37:29 +00:00
ykddd
537c346117 feat(add method is_private_use1() in class Device) (#98123)
As per the title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98123
Approved by: https://github.com/bdhirsh
2023-04-10 12:30:37 +00:00
mikey dagitses
2400cb1d57 distinguish mutability of TensorImpl::data() (#97776)
See D44409928.

Differential Revision: [D44459999](https://our.internmc.facebook.com/intern/diff/D44459999/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97776
Approved by: https://github.com/ezyang
2023-04-09 20:21:56 +00:00
mikey dagitses
9d36361601 make TensorImpl::data_ptr_impl() non-const and have mutable in the name (#97744)
See D44409928.

Differential Revision: [D44450468](https://our.internmc.facebook.com/intern/diff/D44450468/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97744
Approved by: https://github.com/ezyang
2023-04-09 11:08:41 +00:00
ecao
fdb04c6a86 Add overflow check for stride calculation (#94900)
Fixes #94120 and #94128.
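
A guarded multiply is the typical shape of such a check (a sketch under the assumption of non-negative sizes and strides, not necessarily the exact code in this PR):
```
#include <cstdint>
#include <limits>
#include <stdexcept>

// Multiply stride * size, reporting overflow instead of wrapping.
// Assumes non-negative inputs, as stride computation uses here.
int64_t checked_stride_mul(int64_t stride, int64_t size) {
  if (size != 0 &&
      stride > std::numeric_limits<int64_t>::max() / size) {
    throw std::overflow_error("stride calculation overflowed int64_t");
  }
  return stride * size;
}
```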

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94900
Approved by: https://github.com/ezyang, https://github.com/jgong5
2023-04-09 01:30:55 +00:00
mikey dagitses
387feaa131 add mutable to name of non-const Storage::data_ptr (#97694)
See D44409928.

Differential Revision: [D44432585](https://our.internmc.facebook.com/intern/diff/D44432585/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97694
Approved by: https://github.com/ezyang
2023-04-08 12:44:30 +00:00
mikey dagitses
c68a94c5ea distinguish mutability of untyped Storage::data (#97690)
See D44409928.

Differential Revision: [D44429769](https://our.internmc.facebook.com/intern/diff/D44429769/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97690
Approved by: https://github.com/ezyang
2023-04-08 02:02:28 +00:00
mikey dagitses
49b80c3ea2 [reland] remove typed StorageImpl::data() and StorageImpl::unsafe_data() (#98411)
Original commit changeset: a466b3cb6a0a

Original Phabricator Diff: D44629941

Differential Revision: [D44709004](https://our.internmc.facebook.com/intern/diff/D44709004/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98411
Approved by: https://github.com/ezyang
2023-04-06 17:42:48 +00:00
Paweł Piskorski
5854923c17 Extract ExtraMeta symbolic shape fields into a dedicated SymbolicShapeMeta (#98399)

This modularizes ExtraMeta to bring down its creation cost when it is needed for functions other than symbolic shape handling.

Change-Id: Ife59b201b0c4fd75090fe8be5171a6dd73a10d10

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98399
Approved by: https://github.com/ezyang
2023-04-05 22:06:10 +00:00
PyTorch MergeBot
45edc58e4f Revert "remove typed StorageImpl::data() and StorageImpl::unsafe_data() (#98219)"
This reverts commit 144d5268a1.

Reverted https://github.com/pytorch/pytorch/pull/98219 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2023-04-05 09:08:08 +00:00
mikey dagitses
144d5268a1 remove typed StorageImpl::data() and StorageImpl::unsafe_data() (#98219)
Typed data will now only be a tensor level concept.

Differential Revision: [D44629941](https://our.internmc.facebook.com/intern/diff/D44629941/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98219
Approved by: https://github.com/ezyang
2023-04-05 03:32:02 +00:00
mikey dagitses
3af0228338 remove typed StorageImpl::unsafe_data() (#98218)
Typed data will now only be a tensor level concept.

Differential Revision: [D44629939](https://our.internmc.facebook.com/intern/diff/D44629939/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98218
Approved by: https://github.com/ezyang
2023-04-05 00:10:59 +00:00
Paweł Piskorski
2d9b2bcfba Extend TensorImpl with BackendMeta (#97429)
BackendMeta offers a binary interface for the backend to attach arbitrary data to TensorImpl. TensorImpl has exactly one "slot" for backend metadata; however, the backend is free to compose any structure that is opaque to the framework beyond inheriting the standard BackendMeta base.
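
A backend might use the slot roughly like this (a hedged sketch; the `clone` override and the `set_backend_meta` accessor are my reading of the interface, so treat the exact spellings as assumptions):
```
#include <c10/core/TensorImpl.h>

// Sketch: backend-private metadata hung off TensorImpl's single
// BackendMeta slot; its contents are opaque to the framework.
struct MyBackendMeta : public c10::BackendMeta {
  int64_t device_handle = -1;
  bool needs_sync = false;

  // Assumed hook: lets shallow-copy paths duplicate the metadata.
  c10::intrusive_ptr<c10::BackendMeta> clone(
      const c10::intrusive_ptr<c10::BackendMeta>& self) const override {
    return c10::make_intrusive<MyBackendMeta>(*this);
  }
};

// Assumed usage:
//   impl->set_backend_meta(c10::make_intrusive<MyBackendMeta>());
```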

Change-Id: I670fcdd16dd1c2b00f7eaa1cbc5b5dfea59a6221

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97429
Approved by: https://github.com/ezyang
2023-04-04 23:47:03 +00:00
mingfeima
34c7adf1d7 add Half support for sigmoid on CPU (#96077)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96077
Approved by: https://github.com/jgong5, https://github.com/ezyang
2023-04-04 18:43:27 +00:00
Jun Luo
d47a4bf53f Align settings for new device key. (#98224)
Summary: As title.

Test Plan: All CI tests should pass.

Reviewed By: yuhc

Differential Revision: D44341331

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98224
Approved by: https://github.com/jackm321, https://github.com/ezyang
2023-04-04 08:39:11 +00:00
PyTorch MergeBot
7eaaefafb3 Revert "Extend TensorImpl with BackendMeta (#97429)"
This reverts commit bc38b278bf.

Reverted https://github.com/pytorch/pytorch/pull/97429 on behalf of https://github.com/huydhn due to Sorry for reverting your PR as I am trying to root cause a libtorch build failure on Windows starting from your change bc38b278bf.  AFAICT, there is no other change from the log.  I will reland this if the failure is unrelated
2023-04-04 05:13:18 +00:00
ppiskorski
bc38b278bf Extend TensorImpl with BackendMeta (#97429)
BackendMeta offers a binary interface for the backend to attach arbitrary data to TensorImpl. TensorImpl has exactly one "slot" for backend metadata; however, the backend is free to compose any structure that is opaque to the framework beyond inheriting the standard BackendMeta base.

Change-Id: I670fcdd16dd1c2b00f7eaa1cbc5b5dfea59a6221

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97429
Approved by: https://github.com/ezyang
2023-04-04 03:01:14 +00:00
mikey dagitses
4431509a54 introduce c10::DataPtr::mutable_get() and use it in c10 (#98217)
Differential Revision: [D44629940](https://our.internmc.facebook.com/intern/diff/D44629940/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98217
Approved by: https://github.com/ezyang
2023-04-04 02:26:18 +00:00
Scott Wolchok
f386312ec9 [PyTorch] Don't do extra numel() check in TensorImpl::data() (#98090)
`is_empty()` checks `numel() == 0`, but we don't need to access `numel_` at all (or the policy that `numel()` checks) in our happy path -- we just need the data pointer from `storage_`. Let's do the check we need to do using only the data we strictly need, rather than adding instructions loading other pieces of data.
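
Schematically, the change looks like this (a before/after caricature with placeholder names, not the literal diff):
```
#include <cstdint>

// Stand-in schematic for the change described above (not the real
// TensorImpl; member names are placeholders).
struct ImplSketch {
  void* storage_data = nullptr;  // stand-in for storage_.data()
  int64_t numel = 0;             // stand-in for numel_

  // Before: loads numel_ (via the is_empty() check) just to decide
  // whether to return the pointer -- an extra load on the happy path.
  void* data_before() {
    if (numel == 0) return nullptr;
    return storage_data;
  }

  // After: the happy path reads only the storage pointer; the check
  // that remains uses only the data strictly needed for it.
  void* data_after() {
    return storage_data;
  }
};
```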

Differential Revision: [D44586464](https://our.internmc.facebook.com/intern/diff/D44586464/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98090
Approved by: https://github.com/Skylion007
2023-04-04 00:59:52 +00:00
mikey dagitses
2ac9086987 run buildifier on unified build files (#98141)
This is pretty tricky. buildifier by default doesn't do much to these
files. It does a little more if you tell it that they are
`BUILD.bazel` files with -type=build. But it can do even more if you
remove the target definitions from the `def define_rules()` wrapper
and dedent them.

I wrote a little wrapper that does that. I'll submit it at a later
date.

Differential Revision: [D44606558](https://our.internmc.facebook.com/intern/diff/D44606558/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D44606558/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98141
Approved by: https://github.com/ezyang, https://github.com/PaliC
2023-04-04 00:37:19 +00:00
mikey dagitses
64077ce511 remove redundant typed StorageImpl::data() member (#97650)
This has the same implementation as the unsafe variants and the unsafe
variants match the original semantics of the code, given that they
don't check that the type matches.

Given that we're updating callsites anyways to address the mutability
aspect, we might as well just drop this method now.

Differential Revision: [D44410210](https://our.internmc.facebook.com/intern/diff/D44410210/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97650
Approved by: https://github.com/ezyang
2023-04-01 08:16:54 +00:00
Kazuaki Ishizaki
64b8d20a5c Fix typos under c10 directory (#98079)
This PR fixes typos in comments and messages of files under `c10` directory

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98079
Approved by: https://github.com/Skylion007
2023-03-31 18:31:11 +00:00
mikey dagitses
cb8c0be54d add StorageImpl::mutable_unsafe_data (#97648)
See D44409928.

Differential Revision: [D44409945](https://our.internmc.facebook.com/intern/diff/D44409945/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97648
Approved by: https://github.com/ezyang
2023-03-31 16:04:07 +00:00
mikey dagitses
da28af3286 distinguish mutability of StorageImpl::data_ptr() member (#97651)
See D44409928.

Differential Revision: [D44410323](https://our.internmc.facebook.com/intern/diff/D44410323/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97651
Approved by: https://github.com/ezyang
2023-03-30 19:13:56 +00:00
mikey dagitses
428cb3a868 distinguish mutability of untyped StorageImpl::data() member (#97647)
To implement the warning when transitioning reshape to copy-on-write
storage, we want to be able to detect a write to one view family
followed by a read or a write to another one that shares the same
copy-on-write storage.

Because we have historically not been strict about the mutability of
our data pointers, any warning we have would likely be far too
aggressive.

Therefore, this is the first PR in a long series to ensure a strict
distinction between mutable and const data accessors in TensorBase,
TensorImpl, Storage, and StorageImpl.

The rough plan is to give the mutable accessor a new name that is
explicit about mutation, this will also force us to rewrite any code
that really needs a mutation.
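
The target naming convention looks roughly like this (illustrative signatures only; the concrete names land piecemeal across the series):
```
// Sketch of the const/mutable split this series establishes.
struct StorageSketch {
  // Read-only access: safe even once storage is copy-on-write.
  const void* data() const {
    return data_;
  }
  // Explicitly named mutation: the natural place to later hook
  // copy-on-write materialization or the planned warning.
  void* mutable_data() {
    return data_;
  }
 private:
  void* data_ = nullptr;
};
```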

Differential Revision: [D44409928](https://our.internmc.facebook.com/intern/diff/D44409928/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97647
Approved by: https://github.com/ezyang
2023-03-30 09:45:09 +00:00
Pierre Moulon
01885cea43 [Typo] mulithreading_enabled => multithreading_enabled (#97054)
Summary: Fix typo

Test Plan: Continuous integration - Expected NoOp since it is just a variable renaming

Differential Revision: D44118850

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97054
Approved by: https://github.com/Skylion007
2023-03-21 20:11:59 +00:00
shibo
6b691b99da add amp support for custom backend (#96188)
1. Add AMP support for custom backends.
2. Optimize the file `backend_registration.py` and rename it to `custom_backend_registration.py`. We can then register other funcs for custom backends there.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96188
Approved by: https://github.com/bdhirsh
2023-03-20 20:27:35 +00:00
PyTorch MergeBot
a8f36dd646 Revert "add amp support for custom backend (#96188)"
This reverts commit cf12edee02.

Reverted https://github.com/pytorch/pytorch/pull/96188 on behalf of https://github.com/kit1980 due to Broke some linalg tests: https://github.com/pytorch/pytorch/actions/runs/4420037607/jobs/7750708339
2023-03-15 00:03:19 +00:00
shibo
cf12edee02 add amp support for custom backend (#96188)
1. Add AMP support for custom backends.
2. Optimize the file `backend_registration.py` and rename it to `custom_backend_registration.py`. We can then register other funcs for custom backends there.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96188
Approved by: https://github.com/bdhirsh
2023-03-14 20:43:21 +00:00
PyTorch MergeBot
a22b92d8ba Revert "Enable thp(transparent huge pages) for buffer sizes >=2MB (#95963)"
This reverts commit 3bb16a0842.

Reverted https://github.com/pytorch/pytorch/pull/95963 on behalf of https://github.com/izaitsevfb due to Breaks internal android builds: unused function c10_compute_alignment  [-Werror,-Wunused-function]
2023-03-14 02:15:08 +00:00
Zachary DeVito
48490cec28 [memory profiling] Move Context object to c10 (#96280)
Minor refactor so that a follow-up PR can have objects that meet the GatheredContext
interface without having to depend on CUDA.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96280
Approved by: https://github.com/eellison
2023-03-12 07:24:14 +00:00
Sunita Nadampalli
3bb16a0842 Enable thp(transparent huge pages) for buffer sizes >=2MB (#95963)
The 2MB THP pages provide better allocation latencies compared to the standard 4KB pages. This change has shown substantial improvement for batch-mode use cases where the tensor sizes are larger than 100MB.

Only enabled if the `THP_MEM_ALLOC_ENABLE` environment variable is set.
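
The usual recipe is to align the buffer to the huge-page size and hint the kernel with `madvise` (a Linux-only sketch; the helper below is illustrative, though a `c10_compute_alignment` function is mentioned in the later revert note):
```
#include <cstdlib>
#include <sys/mman.h>

constexpr size_t kHugePageSize = 2 * 1024 * 1024;  // 2MB

// Sketch: allocate nbytes so the kernel can back it with 2MB pages.
void* thp_alloc(size_t nbytes) {
  if (nbytes < kHugePageSize || !std::getenv("THP_MEM_ALLOC_ENABLE")) {
    return std::malloc(nbytes);  // small buffers keep the default path
  }
  void* ptr = nullptr;
  if (posix_memalign(&ptr, kHugePageSize, nbytes) != 0) {
    return nullptr;
  }
  // Advisory only: the kernel may still decline to use huge pages.
  madvise(ptr, nbytes, MADV_HUGEPAGE);
  return ptr;
}
```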

re-landing https://github.com/pytorch/pytorch/pull/93888

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95963
Approved by: https://github.com/malfet
2023-03-10 13:58:01 +00:00
Kurt Mohler
1b59c3feb5 Add PyObjectSlot member to StorageImpl (#93342)
Part of #91395

Also modifies how `StorageImpl`s are stored in JIT static runtime's `MemoryPlanner`, which used to `std::move` `StorageImpl`s into a vector. But `StorageImpl` can no longer be moved. Instead, `MemoryPlanner` now contains a malloced buffer to which we add new `StorageImpl`s using placement new.
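
The placement-new pattern for a non-movable type looks like this (a generic sketch, not the `MemoryPlanner` code itself):
```
#include <cstddef>
#include <cstdlib>
#include <new>

struct NonMovable {
  explicit NonMovable(int v) : value(v) {}
  NonMovable(NonMovable&&) = delete;  // like the new StorageImpl
  int value;
};

int main() {
  constexpr size_t n = 4;
  // A raw buffer instead of std::vector: vector growth would require
  // moving the elements, which is no longer allowed.
  auto* buf =
      static_cast<NonMovable*>(std::malloc(n * sizeof(NonMovable)));
  for (size_t i = 0; i < n; ++i) {
    new (buf + i) NonMovable(static_cast<int>(i));  // placement new
  }
  for (size_t i = 0; i < n; ++i) {
    buf[i].~NonMovable();  // manual destruction before freeing
  }
  std::free(buf);
}
```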

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93342
Approved by: https://github.com/ezyang
2023-03-10 10:40:01 +00:00
cyy
d0e4ca233e some reference and move fixes (#95942)
This PR introduces some modifications:
1. We find const function parameters that can be passed by reference and add the reference.
2. We find more opportunities for passing by value and change them accordingly.
3. Some use-after-move errors are fixed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95942
Approved by: https://github.com/Skylion007
2023-03-10 03:44:09 +00:00
vasiliy
dc70e8175f Add various uninterpreted bit tensor data types (try 2) (#95860)
Summary:

This is a retry of https://github.com/pytorch/pytorch/pull/94992 which was reverted due to CI issues.

This PR adds a set of uninterpreted data types to PyTorch which can be used to implement experimental functionality out of core (think fp8, int4, int16 quant, etc.).

@bypass-github-export-checks

Test Plan:

```
python test/test_quantization.py -k TestBits
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95860
Approved by: https://github.com/atalman
2023-03-04 03:35:59 +00:00
Edward Z. Yang
027ebca4d7 Don't use guardless contiguity/stride-like implementations (#95733)
These prevent us from simplifying tests involving unbacked SymInts,
and then you end up with unbacked SymInt in guards, which is bad.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95733
Approved by: https://github.com/tugsbayasgalan
2023-03-03 21:56:41 +00:00
PyTorch MergeBot
4026c62174 Revert "Don't use guardless contiguity/stride-like implementations (#95733)"
This reverts commit deaf077de8.

Reverted https://github.com/pytorch/pytorch/pull/95733 on behalf of https://github.com/ezyang due to apparently this regresses executorch tests internally
2023-03-03 17:43:05 +00:00
Edward Z. Yang
deaf077de8 Don't use guardless contiguity/stride-like implementations (#95733)
These prevent us from simplifying tests involving unbacked SymInts,
and then you end up with unbacked SymInt in guards, which is bad.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95733
Approved by: https://github.com/tugsbayasgalan
2023-03-01 23:14:58 +00:00
PyTorch MergeBot
2936c8b9ce Revert "Enable thp(transparent huge pages) for buffer sizes >=2MB (#93888)"
This reverts commit 2cc845eb1a.

Reverted https://github.com/pytorch/pytorch/pull/93888 on behalf of https://github.com/seemethere due to Reverting due to internal build issues, Meta employees see: https://fburl.com/sandcastle/1p4zvldk
2023-03-01 22:33:04 +00:00
Sunita Nadampalli
2cc845eb1a Enable thp(transparent huge pages) for buffer sizes >=2MB (#93888)
The 2MB THP pages provide better allocation latencies compared to the standard 4KB pages. This change has shown significant improvement for batch-mode use cases where the tensor sizes are larger than 100MB.

Only enabled if the `THP_MEM_ALLOC_ENABLE` environment variable is set.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93888
Approved by: https://github.com/jgong5, https://github.com/malfet
2023-02-28 21:12:46 +00:00
kshitij12345
407b0f3214 fix for debug crash build (#95464)
Fixes https://github.com/pytorch/pytorch/issues/94376

⚠️ Hacky fix

Details about use of `noop_vtable`:
d677432b70/c10/core/impl/PyInterpreter.h (L92-L102)

Currently, at destruction, `noop_vtable` goes out of scope first while there are dangling references to the object still present in other objects like `PythonKernelHolder`, which is held by the singleton `Dispatcher`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95464
Approved by: https://github.com/ezyang
2023-02-25 19:42:06 +00:00
PyTorch MergeBot
3bafecf719 Revert "Add various uninterpreted bit tensor data types (#94992)"
This reverts commit 9dbfca7840.

Reverted https://github.com/pytorch/pytorch/pull/94992 on behalf of https://github.com/atalman due to breaks libtorch windows nightly builds see: https://github.com/pytorch/pytorch/pull/95406
2023-02-23 23:54:23 +00:00
Peter Bell
56aed2a6bb SymFloat: Expose comparison operators in C++ API (#94812)
This is adapted from the corresponding methods in `SymInt.h`.
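
Usage mirrors `SymInt` (a small illustration; assuming the `SymFloat(double)` constructor, with the comparison operator coming from this PR):
```
#include <c10/core/SymFloat.h>

// Illustration: comparisons on SymFloat now work from C++.
bool is_shrinking(const c10::SymFloat& scale) {
  return scale < c10::SymFloat(1.0);  // operator< added by this PR
}
```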
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94812
Approved by: https://github.com/ezyang
2023-02-23 05:50:45 +00:00
Sergii Dymchenko
f98733e976 Fix disbale typos (#95322)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95322
Approved by: https://github.com/clee2000
2023-02-23 02:08:45 +00:00
vasiliy
9dbfca7840 Add various uninterpreted bit tensor data types (#94992)
Summary:

This PR adds a set of uninterpreted data types to PyTorch which can be used to implement experimental functionality out of core (think fp8, int4, int16 quant, etc.).

Note: this is a copy-pasta of https://github.com/pytorch/pytorch/pull/89990 with a bug fix for clang9; easier to just put up another PR since I'm not sure how commandeering works with Meta-only changes.

@bypass-github-export-checks

Test Plan:

```
python test/test_quantization.py -k TestBits
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94992
Approved by: https://github.com/angelayi
2023-02-18 00:04:30 +00:00
Edward Z. Yang
d9950c5215 Hard code known true contiguity settings for unbacked SymInts (#95003)
Extracted from https://github.com/pytorch/pytorch/pull/94523 which has E2E test

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95003
Approved by: https://github.com/voznesenskym, https://github.com/ngimel
2023-02-17 00:42:41 +00:00
Edward Z. Yang
ef5de0a4cf Don't use PrimTorch decomposition for empty (#94512)
This PR removes the unnecessary == 0 guard when constructing empty tensors, by ensuring that when we create a contiguous tensor we go directly to the C++ torch.empty implementation (instead of indirecting through empty_strided), where we can bypass doing zero tests when computing the size of the storage. This probably also speeds up trace time.

When I did this, I found out that `empty_tensor_restride_symint` was flagrantly wrong (we had never exercised it before because we redirected to `empty_strided` in PrimTorch decomp, which doesn't hit this codepath.) The bugs:

* Stride computation was wrong (only `last_idx` was ever written to)
* Using set_sizes_and_strides with `sym_sizes` input doesn't work, because there is some sort of ordering problem where `clone_symvec` isn't safe when you clone a vector into itself. Probably should fix this.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94512
Approved by: https://github.com/ngimel
2023-02-16 16:04:41 +00:00
Edward Z. Yang
092e28f17f Make the glue compute short circuit only if possible (#94437)
If the inputs are unhinted, they will use the branchless implementation.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94437
Approved by: https://github.com/voznesenskym
2023-02-15 21:06:42 +00:00
Edward Z. Yang
ff7772317b Stub all TensorImpl bools; do not go to Python if not hinted. (#94431)
The basic idea behind this PR is that we want to continue using the guarding implementations of contiguity tests if all of the elements are backed (i.e., have hints). If they don't have hints, we'll have to do something slower (use the non-short-circuiting, non-guarding implementations of contiguity), but most of the time you aren't dealing with unbacked SymInts.

So this PR has three parts.

1. We expose `has_hint` on `SymNode`. This allows us to query from C++ whether a SymInt is backed. Fairly self-explanatory. Will require LTC/XLA updates, but backends that don't support unbacked SymInts can just always return true.
2. We update `compute_non_overlapping_and_dense` to test if the inputs are hinted. If they are all hinted, we use the conventional C++ implementation; otherwise we call into Python (see the sketch after this list). The Python case is not heavily tested right now because I haven't gotten all of the pieces for unbacked SymInts working yet. Coming soon.
3. We add stubs for all of the other contiguity tests. The intention is to apply the same treatment to them as well, but this is not wired up yet for safety reasons.
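
A hedged sketch of the gating in (2), with stand-in types (the helper names here are hypothetical):
```
#include <algorithm>
#include <cstdint>
#include <vector>

// Stand-in for a SymInt that knows whether it is backed (hinted).
struct Sym {
  bool hinted;
  int64_t hint;  // meaningful only when hinted
};

bool fast_cpp_impl(const std::vector<Sym>&) { return true; }    // placeholder
bool python_fallback(const std::vector<Sym>&) { return true; }  // placeholder

bool non_overlapping_and_dense(const std::vector<Sym>& extents) {
  bool all_hinted =
      std::all_of(extents.begin(), extents.end(),
                  [](const Sym& s) { return s.hinted; });
  // Hot path: everything is backed, so the conventional guarding
  // C++ implementation is safe to use.
  if (all_hinted) return fast_cpp_impl(extents);
  // Unbacked SymInts present: defer to the slower Python path.
  return python_fallback(extents);
}
```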

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94431
Approved by: https://github.com/voznesenskym
2023-02-15 21:06:42 +00:00
Sujoy Saraswati
4a5ce921a0 Add HPU to compatible shallow copy list and remove lazy HPU changes (#94673)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94673
Approved by: https://github.com/wconstab
2023-02-14 17:15:25 +00:00
Brian Hirsh
2b36d35b9c add torch.autograd._unsafe_set_version_counter API (#92924)
better description coming soon (but this is meant to fix https://github.com/pytorch/pytorch/issues/91093)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92924
Approved by: https://github.com/ezyang, https://github.com/alanwaketan, https://github.com/albanD
2023-02-11 21:07:08 +00:00
Brian Hirsh
948cd61afc add fallthrough kernel for AutogradMeta key (#94603)
The other `Autograd[Backend]` keys all have fallthrough kernels registered to them, but `AutogradMeta` was missing the fallthrough kernel.

This is a problem for custom ops that don't have autograd support, if you try to run them with meta tensors. If you have a custom op, and register a CPU and a Meta kernel, then:

(1) if you run the op with cpu tensors, it will dispatch straight to the CPU kernel (as expected)

(2) if you run the op with meta tensors, you will error: since there is no fallthrough registered to the AutogradMeta key, dispatch lands on AutogradMeta and fails, because the op author hasn't provided an autograd implementation.
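
For reference, a per-key fallthrough registration looks roughly like this (a sketch; the wildcard `_` form covers every operator, and I'm assuming the key is exposed to the macro as `AutogradMeta`):
```
#include <torch/library.h>

// Sketch: register a fallthrough for every op on the AutogradMeta
// key so dispatch skips past it to the backend (Meta) kernel.
TORCH_LIBRARY_IMPL(_, AutogradMeta, m) {
  m.fallback(torch::CppFunction::makeFallthrough());
}
```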

Here's a repro that I confirmed now works:

```
import torch
from torch._dispatch.python import enable_python_dispatcher
from torch._subclasses.fake_tensor import FakeTensorMode

lib = torch.library.Library("test", "DEF")
impl_cpu = torch.library.Library("test", "IMPL", "CPU")
impl_meta = torch.library.Library("test", "IMPL", "Meta")

def foo_impl(x):
    return x + 1

lib.define("foo(Tensor a) -> Tensor")
impl_meta.impl("foo", foo_impl)
impl_cpu.impl("foo", foo_impl)

with enable_python_dispatcher():
    a = torch.ones(2, device='meta')
    print("@@@@@")
    b = torch.ops.test.foo.default(a)
    print(b)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94603
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-10 22:44:52 +00:00
Sahan Paliskara
47efbd5719 [pytorch] [hygiene] remove legacy buck rules (#94053)
Summary:
Removes legacy buck rules. Specifically, we do the following conversions:
- ["xxx:=yyy"] -> ["xxx[yyy]"]
- "//xxx/yyy" -> "//xxx/yyy:yyy"

Test Plan: CI should pass

Differential Revision: D42999413

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94053
Approved by: https://github.com/osalpekar, https://github.com/malfet
2023-02-09 15:45:29 +00:00
Brian Hirsh
83275d8cdf add torch.autograd._set_view_replay_enabled, use in aot autograd (#92588)
tl;dr: this should fix some minor perf regressions that were caused by adding more as_strided() calls in aot autograd.

This PR adds a new context manager, `torch.autograd._set_view_replay_enabled()`.

Context: AOT Autograd has special handling for "outputs that alias graph intermediates". E.g. given this function:

```
def f(x):
    y = torch.mul(x, 2)
    out = y.view(-1)
    return out
```

AOT Autograd will do the following:

```
def fn_to_compile(x):
    y = torch.mul(x, 2)
    out = y.view(-1)
    # return the graph intermediate
    return y, out

compiled_fn = compile(fn_to_compile)

def wrapper(x):
    y, out = compiled_fn(x)
    # regenerate the alias of the graph intermediate
    return out._view_func(y)
```

What's annoying is that `out._view_func()` will result in a `.as_strided` call, because `out` is an ordinary runtime tensor. This (likely?) caused a perf regression, because when running the backward, our `as_strided_backward()` is slower than our `view_backward()`.

In this PR, I added some TLS for instructing autograd to do view replay instead of as_strided, even when given a normal tensor. I'm definitely interested in thoughts from autograd folks (cc @albanD @soulitzer). A few points that I want to bring up:

(1) One reason that this API seems generally useful to me is because of the case where you `torch.compile()` a function, and you pass in two inputs that alias each other, and mutate one of the inputs. Autograd is forced to add a bunch of as_strided() calls into the graph when this happens, but this would give users an escape hatch for better compiled perf in this situation

(2) To be fair, AOT Autograd probably won't need this TLS in the long term. There's a better (more complicated) solution, where AOT Autograd manually precomputes the view chain off of graph intermediates during tracing, and re-applies them at runtime. This is kind of complicated though and feels lower priority to implement immediately.

(3) Given all of that I made the API private, but lmk what you all think.

This is a followup of https://github.com/pytorch/pytorch/pull/92255.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92588
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-08 01:48:32 +00:00
Mikayla Gawarecki
895d4781b8 [easy] Add NestedTensorMeta to parseDispatchKey (#94279)
Ran into this when trying to use `torch.library.Library("aten", "IMPL", "NestedTensorMeta")`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94279
Approved by: https://github.com/bdhirsh
2023-02-07 19:46:29 +00:00
cyy
3c6bc58f63 use C10_API in libc10.so (#94171)
MSVC emits several C4273 warnings when compiling c10. I think the offending files should use C10_API instead of TORCH_API. If the tests pass, the changes should be safe.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94171
Approved by: https://github.com/Skylion007
2023-02-06 20:16:22 +00:00
Aaron Gokaslan
7b6e948812 Add missing move to torch_dispatch_mode.h (#94154)
Removes an unnecessary copy from torch_dispatch_mode.h
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94154
Approved by: https://github.com/ezyang
2023-02-05 20:43:30 +00:00
cyy
1a32db15e7 Some performance fixes (#94034)
Applies some performance fixes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94034
Approved by: https://github.com/Skylion007
2023-02-04 02:17:48 +00:00