Commit Graph

73 Commits

Author SHA1 Message Date
soulitzer
fda0a965c7 [reland] Support SingletonSymNode mul with coefficient (#110673)
reland of https://github.com/pytorch/pytorch/pull/110369
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110673
Approved by: https://github.com/ezyang
2023-10-10 19:37:17 +00:00
soulitzer
69ea214cc2 [reland] Update singleton int to error when inequality relation is undefined (#110672)
reland of https://github.com/pytorch/pytorch/pull/110044
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110672
Approved by: https://github.com/ezyang
2023-10-06 17:50:25 +00:00
PyTorch MergeBot
330db8278b Revert "Update singleton int to error when inequality relation is undefined (#110044)"
This reverts commit 07331c65e6.

Reverted https://github.com/pytorch/pytorch/pull/110044 on behalf of https://github.com/PaliC due to bottom diff is causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/110044#issuecomment-1749805209))
2023-10-05 23:55:37 +00:00
PyTorch MergeBot
1c3fae46ee Revert "Support SingletonSymNode mul with coefficient (#110369)"
This reverts commit eb8feb8ff8.

Reverted https://github.com/pytorch/pytorch/pull/110369 on behalf of https://github.com/PaliC due to bottom diff is causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/110369#issuecomment-1749802899))
2023-10-05 23:51:28 +00:00
soulitzer
eb8feb8ff8 Support SingletonSymNode mul with coefficient (#110369)
We want to be able to use SingletonSymNode to represent strides for Jagged layout tensor. The following is for 3D, but easily generalizable to higher dimensions.

Constraints:
- [B, x, D] (where x represents the "variably lengthed dim") can be strided in two ways [x, 1, sum(x)] and [dx, d, 1]. We need two different placeholder values depending on how the jagged tensor is strided.
- When doing operations we need the strides of output tensors to be expressable in terms of the strides and sizes of the inner tensors. Given [B, x, D] @ [D, D'], the output strides is [x * D', D', 1] rather than some opaque [x2, D', 1]. This constraint exists because if I'm tracing, I need a symint to represent the output stride. This symint needs to come from somewhere; I get it in several ways: (1) create a constant, (2) unbacked symint, (3) create a new input using a source, (4) output of an operation on an existing symint. It is clear that (4) is what we want here, which brings us to the design below.

Design:

Given the two constraints, the most straightforward way to implement this is actually to update SingletonSymNode to include some scalar factor, i.e. Morally, SingletonSymNode represents `factor * [s_0, s_1, …, s_n]` This enables us to symbolically compute strides from sizes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110369
Approved by: https://github.com/ezyang
ghstack dependencies: #110044
2023-10-04 22:56:15 +00:00
soulitzer
07331c65e6 Update singleton int to error when inequality relation is undefined (#110044)
Previously, something like j0 >= 3, would return False. In sympy however, it is not possible to make it so that both j0 >= 3 and j0 < 3 return False. In sympy, you only get to dispatch on Ge, and the remaining are derived, e.g. defining Ge(j0 >= 3) to be False would force Lt(j0, 3) to be True, which is not what we want.

In this PR, we make it so that both j0 >=3 and j0 < 3 error, so that in a future PR when we create the symbolic counterpart of this singleton, the behaviors can be the same.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110044
Approved by: https://github.com/ezyang
2023-10-04 22:55:53 +00:00
cyy
55905c4a1a [2/N] Enable clang-tidy to c10/test/*cpp (#110270)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110270
Approved by: https://github.com/Skylion007, https://github.com/kit1980
2023-10-01 07:36:23 +00:00
cyy
3dc479e70b [1/N] Apply clang-tidy to c10/test/*cpp (#109278)
This series of PR enables clang-tidy checks in c10/test. We aim to finally add the path to lintrunner.toml
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109278
Approved by: https://github.com/kit1980
2023-09-29 02:20:57 +00:00
Kurt Mohler
f2c360e3e5 Reorganize and rename COW files and APIs (#110191)
This PR does the following:
* Combine `cow/context.<h/cpp>` and `cow/deleter.<h/cpp>` into `cow/COWDeleter.<h/cpp>`
* Rename `Context` to `COWDeleterContext`
* Rename `delete_context` to `cow_deleter`
* Remove the separate `impl_cow_context` bazel library, combining it with the base c10 core library
* Rename `context_test.cpp` to `cow_test.cpp`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110191
Approved by: https://github.com/ezyang
2023-09-28 17:50:44 +00:00
PyTorch MergeBot
1265400ba6 Revert "Reland: implement a function to convert a storage to copy-on-write (#110022)"
This reverts commit dddf07e56a.

Reverted https://github.com/pytorch/pytorch/pull/110022 on behalf of https://github.com/atalman due to New tests are failing in internal CI ([comment](https://github.com/pytorch/pytorch/pull/110022#issuecomment-1737584693))
2023-09-27 15:05:41 +00:00
mikey dagitses
dddf07e56a Reland: implement a function to convert a storage to copy-on-write (#110022)
Relands #100819

In addition, the `impl_cow_context` library is combined into the base c10 core library, and COW unit tests are combined into just one binary.

Part of #109833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110022
Approved by: https://github.com/ezyang
2023-09-26 03:33:18 +00:00
soulitzer
4667a5c948 Update SingletonSymNode to allow more comparisons (#108315)
In this PR:
- {in,}equality between singleton and plain ints returns false instead of erroring
- Morally define the semantic of j0 > c to be as if j0 represented an array [s_0, s_1, ... s_n] and s_k > c for all k
- Just like for equality, we don't actually want to do the comparison one by one, instead j0 is constrained to some range [min, max]. By default this range is [2, int64_t::max] so that it acts like a size and passes 0/1 specialization checks.
- In the future, we can define some API to allow users to constrain the range of their singletons

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108315
Approved by: https://github.com/ezyang
2023-09-13 01:58:02 +00:00
soulitzer
d7130e9704 Add SingletonSymIntNode (#107089)
Adds `SingletonSymNodeImpl` (alternatively, `SkolemSymNodeImpl`). This is a int-like object that only allows  the`eq` operation; any other operation produces an error.

The main complexity is that we require operations that dispatch to SymNode must take and return SymNodes, but when performing operations involving `SingletonSymNodeImpl`, operations involving SymNode can return non-SymNode bools.  For more discussion see [here](https://docs.google.com/document/d/18iqMdnHlUnvoTz4BveBbyWFi_tCRmFoqMFdBHKmCm_k/edit)
- Introduce `ConstantSymNodeImpl` a generalization of `LargeNegativeIntSymNodeImpl` and replace usage of `LargeNegativeIntSymNodeImpl`  in SymInt.
- Also use ConstantSymNodeImpl to enable SymBool to store its data on a SymNode. Remove the  assumption that if SymBool holds a non-null SymNode, it must be symbolic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107089
Approved by: https://github.com/ezyang
ghstack dependencies: #107839
2023-08-24 21:38:47 +00:00
PyTorch MergeBot
cca31f1797 Revert "implement a function to convert a storage to copy-on-write (#100819)"
This reverts commit aec11b8c80.

Reverted https://github.com/pytorch/pytorch/pull/100819 on behalf of https://github.com/jeanschmidt due to added tests are breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/100819#issuecomment-1547929531))
2023-05-15 14:10:23 +00:00
mikey dagitses
aec11b8c80 implement a function to convert a storage to copy-on-write (#100819)
implement a function to convert a storage to copy-on-write

Summary:
This will be used in the _lazy_clone() operator as well as reshape().

Test Plan: 100% coverage of reachable lines.

---
Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/pytorch/pytorch/pull/100819).
* #100821
* #100820
* __->__ #100819

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100819
Approved by: https://github.com/ezyang
2023-05-12 17:45:04 +00:00
mikey dagitses
979f55d3bc implementation of DataPtr context for copy-on-write tensors (#100818)
implementation of DataPtr context for copy-on-write tensors

Summary:
Copy-on-write storage
=====================
This library adds support for copy-on-write storage, i.e. lazy copies,
to tensors. The design maintains the PyTorch invariant that tensors
alias if and only if they share a storage. Thus, tensors that are lazy
copies of one another will have distinct storages that share a data
allocation.

Thread-safety
-------------
The correctness of this design hinges on the pre-existing PyTorch user
requirement (and general default programming assumption) that users
are responsible for guaranteeing that writes do not take places
concurrently with reads and other writes.

Lazily copied tensors add a complication to this programming model
because users are not required to know if lazy copies exist and are
not required to serialize writes across lazy copies. For example: two
tensors with distinct storages that share a copy-on-write data context
may be given to different threads that may do whatever they wish to
them, and the runtime is required to guarantee its safety.

It turns out that this is not that difficult to protect because, due
to the copy-on-write requirement, we just need to materialize a tensor
upon writing. This could be done entirely without synchronization if
we materialized each copy, however, we have a common-sense
optimization to elide the copy for the last remaining reference. This
requires waiting for any pending copies.

### Thread-safety detailed design
There are two operations that affect the copy-on-write details of a
tensor:

1) lazy-clone (e.g. an explicit call or a hidden implementation detail
   added through an operator like reshape)
2) materialization (i.e. any write to the tensor)

The key insight that we exploit is that lazy-clone is logically a read
operation and materialization is logically a write operation. This
means that, for a given set of tensors that share a storage, if
materialization is taking place, no other read operation, including
lazy-clone, can be concurrent with it.

However, this insight only applies within a set of tensors that share
a storage. We also have to be concerned with tensors with different
storages that share a copy-on-write context. In this world,
materialization can race with lazy-clone or even other
materializations. _However_, in order for this to be the case, there
must be _at least_ two references to the context. This means that the
context _can not_ vanish out from under you if you are performing a
lazy-clone, and hence, it only requires an atomic refcount bump.

The most complicated case is that all lazy-copies are concurrently
materializing. In this case, because a write is occurring, there are
no in-flight lazy-copies taking place. We must simply ensure that all
lazy-copies are able to materialize (read the data) concurrently. If
we didn't have the aforementioned optimization where the last copy
steals the data, we could get away with no locking whatsoever: each
makes a copy and decrements the refcount. However, because of the
optimization, we require the loser of the materializing race wait for
the pending copies to finish, and then steal the data without copying
it.

We implement this by taking a shared lock when copying the data and
taking an exclusive lock when stealing the data. The exclusive lock
acquisition ensures that all pending shared locks are finished before
we steal the data.

Test Plan: 100% code coverage.

---
Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/pytorch/pytorch/pull/100818).
* #100821
* #100820
* #100819
* __->__ #100818

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100818
Approved by: https://github.com/ezyang
2023-05-11 11:13:51 +00:00
cyy
dbc7e919b8 add Wmissing-prototypes to clang-tidy (#96805)
This PR introduces **-Wmissing-prototypes** of clang-tidy to prevent further coding errors such as the one fixed by PR #96714.

<!--
copilot:summary
-->
### <samp>🤖 Generated by Copilot at fd2cf2a</samp>

This pull request makes several internal functions static to improve performance and avoid name clashes. It also fixes some typos, formatting, and missing includes in various files. It adds a new .clang-tidy check to warn about missing prototypes for non-static functions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96805
Approved by: https://github.com/malfet, https://github.com/albanD
2023-04-25 18:20:36 +00:00
Edward Z. Yang
756a86d52c Support large negative SymInt (#99157)
The strategy is that we will heap allocate a LargeNegativeIntSymNodeImpl whenever we have a large negative int, so that we can keep the old `is_symbolic` test (now called `is_heap_allocated`) on SymInt. Whenever we need to do something with these ints, though, we convert them back into a plain `int64_t` (and then, e.g., wrap it in whatever user specificed SymNodeImpl they need.) We cannot wrap directly in the user specified SymNodeImpl as we generally do not know what the "tracing context" is from C++. We expect large negative ints to be rare, so we don't apply optimizations like singleton-ifying INT_MIN.  Here's the order to review:

* c10/core/SymInt.h and cpp
  * `is_symbolic` renamed to `is_heap_allocated` as I needed to audit all use sites: the old `is_symbolic` test would return true for large negative int, but it would be wrong to then try to dispatch on the LargeNegativeIntSymNodeImpl which supports very few operations. In this file, I had to update expect_int,
  * If you pass in a large negative integer, we instead heap allocate it in `promote_to_negative`. The function is written in a funny way to keep compact constructor code for SymInt (the heap allocation happens out of line)
  * clone is now moved out-of-line
  * New method maybe_as_int which will give you a constant int if it is possible, either because it's stored inline or in LargeNegativeIntSymNodeImpl. This is the preferred replacement for previous use of is_symbolic() and then as_int_unchecked().
  * Rename toSymNodeImpl to toSymNode, which is more correct (since it returns a SymNode)
  * Complete rewrite of `normalize_symints.cpp` to use new `maybe_as_int`. Cannot easily use the old code structure, so it's now done doing a macro and typing out each case manually (it's actually not that bad.)
  * Reimplementations of all the unary operators by hand to use `maybe_as_int`, relatively simple.
* c10/core/LargeNegativeIntSymNodeImpl.h - Just stores a int64_t value, but it has to be big and negative. Most methods are not implemented, since we will rewrap the large negative int in the real SymNodeImpl subclass before doing operations with it
* The rest of the files are just rewriting code to use `maybe_as_int`. There is a nontrivial comment in c10/core/SymIntArrayRef.h

Very minor test adjustment in c10/test/core/SymInt_test.cpp . Plan to exercise this properly in next PR.

Companion XLA PR: https://github.com/pytorch/xla/pull/4882

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99157
Approved by: https://github.com/albanD
2023-04-15 22:43:51 +00:00
Edward Z. Yang
1ff52225f1 Unify SymIntNode and SymFloatNode into SymNode (#87817)
This refactor was prompted by challenges handling mixed int/float
operations in C++.  A previous version of this patch
added overloads for each permutation of int/float and was unwieldy
https://github.com/pytorch/pytorch/pull/87722/  This PR takes a different
approach.

The general outline of the patch is to combine the C++ types SymIntNode
and SymFloatNode into a single type, SymNode.  This is type erased; we
no longer know statically at C++ if we have an int/float and have to test
it with the is_int()/is_float() virtual methods.  This has a number of
knock on effects.

- We no longer have C++ classes to bind to Python.  Instead, we take an
  entirely new approach to our Python API, where we have a SymInt/SymFloat
  class defined entirely in Python, which hold a SymNode (which corresponds
  to the C++ SymNode).  However, SymNode is not pybind11-bound; instead,
  it lives as-is in Python, and is wrapped into C++ SymNode using PythonSymNode
  when it goes into C++.  This implies a userland rename.

  In principle, it is also possible for the canonical implementation of SymNode
  to be written in C++, and then bound to Python with pybind11 (we have
  this code, although it is commented out.)  However, I did not implement
  this as we currently have no C++ implementations of SymNode.

  Because we do return SymInt/SymFloat from C++ bindings, the C++ binding
  code needs to know how to find these classes.  Currently, this is done
  just by manually importing torch and getting the attributes.

- Because SymInt/SymFloat are easy Python wrappers, __sym_dispatch__ now
  takes SymInt/SymFloat, rather than SymNode, bringing it in line with how
  __torch_dispatch__ works.

Some miscellaneous improvements:

- SymInt now has a constructor that takes SymNode.  Note that this
  constructor is ambiguous if you pass in a subclass of SymNode,
  so an explicit downcast is necessary.  This means toSymFloat/toSymInt
  are no more.  This is a mild optimization as it means rvalue reference
  works automatically.

- We uniformly use the caster for c10::SymInt/SymFloat, rather than
  going the long way via the SymIntNode/SymFloatNode.

- Removed some unnecessary toSymInt/toSymFloat calls in normalize_*
  functions, pretty sure this doesn't do anything.

- guard_int is now a free function, since to guard on an int you cannot
  assume the method exists.  A function can handle both int and SymInt
  inputs.

- We clean up the magic method definition code for SymInt/SymFloat/SymNode.
  ONLY the user classes (SymInt/SymFloat) get magic methods; SymNode gets
  plain methods; this is to help avoid confusion between the two types.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87817
Approved by: https://github.com/albanD, https://github.com/anjali411
2022-10-27 20:56:02 +00:00
Edward Z. Yang
02da9437b0 Store SymInt out of line (#84390)
swolchok reported that non-tracing usage of Tensor we are wasting a lot
of time on is_symbolic() tests, e.g., when destructing SymInts.  This
is a regression for no good reason because we don't actually ever
have SymInts in those cases.  This PR moves the stored SymInts on
Tensor out of line, into a separate ExtraMeta struct, which is only
allocated when we make a Tensor store symbolic sizes/strides.

To avoid adding another word to TensorImpl, I take over the named tensor
metadata field.  This makes named tensor require a double indirection
and use up more space, but it's OK since we're going to delete this
feature anyway soon.

I restore regular int64_t storage on Tensor.  This entailed reverting
https://github.com/pytorch/pytorch/pull/82467 ; there are no other
substantive changes to SizesAndStrides so a close review is not
necessary.

I don't bother optimizes sizes and strides in ExtraMeta in the same
way stock tensor is optimized.  I add a SymDimVector alias.  I make
SymInt UNCHECKED constructor public as it is a useful optimization
in some situations when the int is known to be positive.

I thought about storing the SymInts on the Python object instead.
However, because we can allocate symbolic shape tensors directly
from C++, we cannot guarantee that there is a PyInterpreter for
a Tensor. So we do it this way instead; it's also faster since you
don't have to take out the GIL to do accesses.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84390
Approved by: https://github.com/swolchok, https://github.com/Krovatkin
2022-09-06 20:24:39 +00:00
Nikolay Korovaiko
86e134ddf7 disable c10::SymIntNode tests on mobile (#84066)
This fixes c++ tests' breaks where we were passing pointers and expected `is_symbolic` to return `true`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84066
Approved by: https://github.com/albanD
2022-08-25 17:28:23 +00:00
Edward Z. Yang
a9320e6d96 Delete SymInt::data() in favor of as_int_unchecked() (#82477)
I audited all the sites while I was at it, and marked a few suspicious
ones.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82477
Approved by: https://github.com/Chillee
2022-08-01 15:07:22 +00:00
Edward Z. Yang
50e8abbcad Change SymIntNode into an intrusive pointer (#82548)
This will make the pointer type a single word, which is important
for packing it into an int64_t

This time, this diff doesn't segfault when you build with DEBUG mode; more details at https://github.com/pybind/pybind11/issues/4099

Signed-off-by: Edward Z. Yang <ezyangfb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82548
Approved by: https://github.com/albanD
2022-08-01 15:07:21 +00:00
PyTorch MergeBot
3b9cbb1738 Revert "Change SymIntNode into an intrusive pointer (#82432)"
This reverts commit 7be44f8158.

Reverted https://github.com/pytorch/pytorch/pull/82432 on behalf of https://github.com/ezyang due to segfaults on test but not caught in CI
2022-07-29 20:08:59 +00:00
Edward Z. Yang
7be44f8158 Change SymIntNode into an intrusive pointer (#82432)
This will make the pointer type a single word, which is important
for packing it into an int64_t

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82432
Approved by: https://github.com/albanD, https://github.com/Krovatkin
2022-07-29 17:32:54 +00:00
Edward Z. Yang
fd5ac1e6b5 Rename SymbolicIntNode to SymIntNodeImpl (#82350)
Done via

```
git grep -l 'SymbolicIntNode' | xargs sed -i 's/SymbolicIntNode/SymIntNodeImpl/g'
```

Reasoning for the change:

* Sym is shorter than Symbolic, and consistent with SymInt
* You usually will deal in shared_ptr<...>, so we're going to
  reserve the shorter name (SymIntNode) for the shared pointer.

But I don't want to update the Python name, so afterwards I ran

```
 git grep -l _C.SymIntNodeImpl | xargs sed -i 's/_C.SymIntNodeImpl/_C.SymIntNode/'
```

and manually fixed up the binding code

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82350
Approved by: https://github.com/Krovatkin
2022-07-28 18:27:45 +00:00
Edward Z. Yang
1724e9f21f Refactor functionality and backend keys to reduce duplication (#81752)
Define some macros for stamping these out, and then use them everywhere
applicable.  Parsing should get this treatment too but I leave it to a
follow up.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81752
Approved by: https://github.com/cpuhrsch, https://github.com/bdhirsh
2022-07-21 21:23:54 +00:00
Edward Z. Yang
b24c102209 Add test that all BackendComponents are covered by toString (#81713)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81713
Approved by: https://github.com/bdhirsh
2022-07-20 16:58:04 +00:00
Michael Andreas Dagitses
ab2ca95dd1 turn on -Werror=unused-variable in our Bazel CPU build
Summary:
We also fix any existing issues. Note that we only do this for the CPU
build because nvcc is considered a C++ toolchain but it does not have
the same flag support. Adding flags to the GPU build will cause nvcc
errors.

Test Plan: Built locally, rely on CI to confirm.

Reviewers: malfet

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79156

Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD
2022-06-11 02:46:34 +00:00
Michael Suo
49979c4021 [symint] Make TensorImpl::sizes_and_strides_ contain SymInt
Change our representation of sizes and strides to contain SymInts
instead of int64_t.

Right now it's not actually possible to create a Tensor with symbolic
shape, so this change is intended to be a no-op.

But the intended behavior is:
- If you create a Tensor with symbolic shape, a `CustomSizes` policy
will be set, and the `has_symbolic_sizes_strides_` bit will be set. (not
currently implemented)
- Calling any TensorImpl function that naively interacts with sizes and
strides will throw. For hot-path functions (`sizes()`, `strides()`), we
make use of the existing policy check to throw. For others, we just have
a regular `TORCH_CHECK(!has_symbolic_sizes_strides_)`.

This also undoes the explicit constructor I made in
https://github.com/pytorch/pytorch/pull/77666; it ended up being more
annoying than useful when making these changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78272

Approved by: https://github.com/Krovatkin, https://github.com/Chillee
2022-05-25 20:54:51 +00:00
PyTorch MergeBot
fb84f2223c Revert "[symint] Make TensorImpl::sizes_and_strides_ contain SymInt"
This reverts commit a7a818d9e2.

Reverted https://github.com/pytorch/pytorch/pull/77994 on behalf of https://github.com/seemethere due to Talked with @suo and we decided to revert because of broken [internal builds](https://www.internalfb.com/intern/sandcastle/job/678535557/). Also appears as though internal codegen might be broken as well.
2022-05-24 00:14:02 +00:00
Michael Suo
a7a818d9e2 [symint] Make TensorImpl::sizes_and_strides_ contain SymInt
Change our representation of sizes and strides to contain SymInts
instead of int64_t.

Right now it's not actually possible to create a Tensor with symbolic
shape, so this change is intended to be a no-op.

But the intended behavior is:
- If you create a Tensor with symbolic shape, a `CustomSizes` policy
will be set, and the `has_symbolic_sizes_strides_` bit will be set. (not
currently implemented)
- Calling any TensorImpl function that naively interacts with sizes and
strides will throw. For hot-path functions (`sizes()`, `strides()`), we
make use of the existing policy check to throw. For others, we just have
a regular `TORCH_CHECK(!has_symbolic_sizes_strides_)`.

This also undoes the explicit constructor I made in
https://github.com/pytorch/pytorch/pull/77666; it ended up being more
annoying than useful when making these changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77994

Approved by: https://github.com/Krovatkin
2022-05-20 20:17:06 +00:00
Michael Suo
855c4eb051 [symint] Change SizesAndStrides test back to using negative ints
Since we decided we want to support negative ints in SymInt, this
reverts commit b3e7230efa.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77914

Approved by: https://github.com/Krovatkin
2022-05-20 18:13:02 +00:00
Michael Suo
68e22aa9fc [symint] add support for negative integers
The bit packing scheme is described in the comments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77913

Approved by: https://github.com/Krovatkin
2022-05-20 03:46:29 +00:00
Michael Suo
b3e7230efa [symint] Fix SizesAndStridesTest to not use negative sizes/strides
With SymInt we are using the negative space of `int64_t` in our internal
representation. `SizesAndStridesTest` breaks this because it initializes
`SizesAndStrides` with negative sizes/strides. This PR fixes that.

As an aside: feels like `SizesAndStrides` (and `SymInt`) should really
take a uint64_t, but that would be BC-breaking so I don't do it here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77820

Approved by: https://github.com/ezyang
2022-05-19 05:06:33 +00:00
Scott Wolchok
0a5e788ab2 [PyTorch] Add NestedTensorCPU and NestedTensorCUDA dispatch keys (#75808)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75808

Just as it is often difficult to write a single kernel that can handle both CPU and CUDA, so can it be difficult to do the same for NestedTensor.
ghstack-source-id: 154171542

(Note: this ignores all push blocking failures!)

Test Plan: CI?

Reviewed By: bdhirsh

Differential Revision: D35603836

fbshipit-source-id: fb0ebb19d34531ed96ce176aca325f8e2b5f90e6
(cherry picked from commit 0bcd753f93c04256c1b745f84a74ecccf0dceef5)
2022-04-19 18:12:12 +00:00
Brian Hirsh
5870e84407 add DispatchKeySet function to get highest backend key
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75233

Approved by: https://github.com/ezyang, https://github.com/larryliu0820
2022-04-05 18:06:52 +00:00
Brian Hirsh
1b7d7d9327 Reland: "free up dispatch key space (in C++)" (#74963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74963

This is a re-land of D35192346 (9872a06d77) and D35192317 (a9216cde6c), which together are a diff that changes the internal representation of `DispatchKeySet` in pytorch core to free up the number of dispatch keys that we have available. See a more detailed description of the design in the original PR: https://github.com/pytorch/pytorch/pull/69633.

The original PR broke Milan workflows, which use a pytorch mobile build, and manifested as a memory corruption bug inside of `liboacrmerged.so`.

**Background: Existing Mobile Optimization**
Pytorch mobile builds have an existing optimization (here cc23725e89/c10/core/DispatchKey.h (L382) and here cc23725e89/aten/src/ATen/core/dispatch/OperatorEntry.h (L214)), which works as follows:

Every operator in pytorch has a "dispatch table" of function pointers, corresponding to all of the (up to 64) different kernels that we might dispatch to when we run an operator in pytorch (autograd, cpu, cuda, complex number support, etc).

In mobile builds, the size of that table is shrunk from 64 to 8 to save a bunch of space, because mobile doesn't end up using the functionality associated with most dispatch keys.

The dispatcher also has a notion of "fallback kernels", which are kernels that you can register to a particular dispatch key, but should be able to work for "any operator". The array of fallback kernels is defined here: cc23725e89/aten/src/ATen/core/dispatch/Dispatcher.h (L294).

The mobile-optimization currently does **not** extend to this array (it wouldn't be that useful anyway because there is only one array of fallback kernels globally - vs. there is a separate dispatch table of function pointers per operator). So the per-operator tables on mobile are size 8, while the fallback table is size 64.

**The Bug**
This PR actually makes it difficult to enable that optimization separately for the per-operator arrays vs. the fallback array, and incidentally shrunk the size of the fallback array from 64 to 8 for mobile (that happened on this line: https://github.com/pytorch/pytorch/pull/69633/files#diff-f735cd7aa68f15b624100cbc4bb3b5ea76ffc7c9d3bec3b0ccabaa09609e5319R294).

That isn't a problem by itself (since mobile doesn't actually use any of the fallbacks that can no longer be stored). However, pytorch core will still register all of those fallback kernels on startup in mobile builds, even if they aren't used. When we tried to register one of those fallbacks on startup, it would try to dump the kernel somewhere in memory past the bounds of the (now smaller) array inside of the `Dispatcher` object, `backendFallbackKernels_`.

**Why didn't this problem show up in OSS CI? Why didn't it break other internal mobile workflows aside from Milan?**

Ideally, this failure would show up as part of the OSS signal on GitHub, since we already have mobile OSS builds. Given that it was another memory corruption issue that only affected Milan (subset of mobile), I'm not sure what's specific about Milan's builds that caused it only to manifest there. dreiss I wonder if there's another flavor of mobile builds we could run in OSS CI that could potentially help catch this?

**The debugging experience was pretty difficult**

Debugging the Milan-specific failure was made difficult by the following:

(1) lack of CI
- the original Milan failure didn't surface on my original diff, because the Milan job(s) that failed weren't triggered to run on pytorch changes. There's probably a balance to strike here, since those jobs will only be useful if they aren't flaky, and if they can produce reliable failure logs for debugging.

(2) It's difficult to get a repro.
- my work laptop doesn't have the right specs to run the Milan development workflow (not enough disk space)
- There is an existing OnDemand workflow for Milan, but it appears to be relatively new, and after a bunch of help from MarcioPorto, we ran into issues forwarding the log output from Milan tests on the emulator back to the terminal (see the original discussion here: https://fb.workplace.com/groups/OnDemandFRL/permalink/1424937774645433/)

(3) Lack of stack-traces.
- Most Milan failures didn't include actionable stack traces. phding generously helped me debug by running my suggested patches locally, and reporting back if there were any failures. The failing test didn't include a stack trace though (just the line where the crash appeared), so I ended up making some educated guesses about what the issue was based on the area of the crash.
ghstack-source-id: 152688542

Test Plan: Confirmed with phding that the broken Milan workflow from the previous version of this diff is now passing.

Reviewed By: phding, albanD

Differential Revision: D35222806

fbshipit-source-id: 0ad115a0f768bc8ea5d4c203b2990254c7092d30
(cherry picked from commit 002b91966f11fd55ab3fa3801b636fa39a6dd12c)
2022-03-31 21:52:38 +00:00
Brian Hirsh
9872a06d77 Back out "free up dispatch key space (in C++)" (#74859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74859

Original commit changeset: 6d1dd0fd8144

Original Phabricator Diff: D34227616 (2cbddc0e9b)
ghstack-source-id: 152381077

(Note: this ignores all push blocking failures!)

Test Plan:
Test on Milan with "get weather utterance"
buck build fbsourcefbandroid/mode/opt fbsourcefbandroid/mode/milan_build_rdk  //fbandroid/apps/wearable/system/speechservice:speechservice_target30_xhdpi_armv7_release_debug_keystore -c  pt.has_backtaces=1

Reviewed By: phding

Differential Revision: D35192346

fbshipit-source-id: b962de5d5effaf23f9aa8afd3ef36f8c6383de5b
(cherry picked from commit 913e3027a11457aaa2d97a9d89ebc6133b14213c)
2022-03-29 15:39:17 +00:00
Brian Hirsh
2cbddc0e9b free up dispatch key space (in C++) (#72827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72827

Reland of D34034848 (6690256021)
ghstack-source-id: 152161452

Test Plan: Confirm that Milan tests are passing

Reviewed By: ezyang

Differential Revision: D34227616

fbshipit-source-id: 6d1dd0fd8144dfbd9e194cd7564cce017e7db968
(cherry picked from commit e5c1b29fedd5c2a0bad810cedc94aa784136b6aa)
2022-03-25 17:04:51 +00:00
Brian Hirsh
22ccf448e8 Revert D34034848: free up dispatch key space (in C++)
Test Plan: revert-hammer

Differential Revision:
D34034848 (6690256021)

Original commit changeset: 9677ee2c0a1a

Original Phabricator Diff: D34034848 (6690256021)

fbshipit-source-id: fd50943d915ef813bb9f9ab278fb582429eea3b1
(cherry picked from commit 3acefee1cd)
2022-02-14 23:29:00 +00:00
Brian Hirsh
6690256021 free up dispatch key space (in C++) (#72402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72402

The original PR had an array-out-of-bounds access in `DispatchKeyExtractor.cpp`, that wasn't caught by ASAN and appeared to only manifest in a subset of android internal tests. After fixing the OOB access (and adding more asserts), I confirmed that the android internal test passes.

Reland of D33255193 (20b8653dfa)
ghstack-source-id: 148830728

Test Plan:
Steps to test:

(1) connect to a mobile OD

(2) run `one_world android emulator android-29` in a terminal to start the android emulator

(3) In a separate terminal, run the test: `buck test //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test -c test.external_runner=tpx -- --regex 'testBIXRayModel.*PyTorchBIXRayInstrumentationTest' --force-remote-execution --run-disabled`

I also ran `buck test fbandroid/mode/dbg //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test`, which failed before and passed after the PR.

Reviewed By: albanD

Differential Revision: D34034848

fbshipit-source-id: 9677ee2c0a1afd1183896f7055009445712523c5
(cherry picked from commit 9ab9b12d35)
2022-02-14 16:02:29 +00:00
Jacob Szwejbka
791e7df7d9 Back out "free up dispatch key space (in C++)"
Summary: I think this diff stack broke all the related tasks below.

Test Plan:
For our failing tests:

buck test //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test -c test.external_runner=tpx -- --regex 'testBIXRayModel.*PyTorchBIXRayInstrumentationTest' --force-remote-execution --run-disabled

For the ubn:

Not really sure what to do, trying to build the app and see if I can use an effect?

Reviewed By: shoumikhin

Differential Revision: D34018849

fbshipit-source-id: 3571718cb6621931af931b494e0a70d6e0164e65
(cherry picked from commit 3cc63cb2ea)
2022-02-05 01:25:42 +00:00
Brian Hirsh
20b8653dfa free up dispatch key space (in C++) (#69633)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69633

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33255193

Pulled By: bdhirsh

fbshipit-source-id: 79773e9c15bf4f2f27675121a49ff5ffd1375238
(cherry picked from commit eac0b13005)
2022-02-04 17:57:38 +00:00
Nikita Shulga
59deee8308 Make c10 tests compilable with -Werror (#69711)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69711

Test Plan: Imported from OSS

Reviewed By: r-barnes

Differential Revision: D32997005

Pulled By: malfet

fbshipit-source-id: 369194051ece9d213b48584ca84e5d76b3794dae
2021-12-10 16:47:46 -08:00
Richard Barnes
29d759948e use irange for loops 2 (#66746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66746

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D31705361

fbshipit-source-id: 33fd22eb03086d114e2c98e56703e8ec84460268
2021-12-10 04:26:23 -08:00
Xue Li
2f099c7555 Revert D30652629: use irange for loops
Test Plan: revert-hammer

Differential Revision:
D30652629 (687c2267d4)

Original commit changeset: 0ae6c4bbbb55

fbshipit-source-id: 5c4f067b584a021c8c9656454d1ee60999600fb3
2021-10-15 15:23:10 -07:00
Richard Barnes
687c2267d4 use irange for loops (#66234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.

bypass_size_limit
allow-large-files

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D30652629

fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
2021-10-15 13:50:33 -07:00
Dhruv Matani
013a42bdb1 [PyTorch] Add Device_test.cpp (#63203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63203

Currently, `c10::Device` isn't being tested - i.e. there's no test to ensure that the device string parsing works as expected. This diff adds very basic tests to assert that the stuff we expect to work works, and the stuff that we don't expect to work doesn't work.

ghstack-source-id: 136006962

Test Plan:
New test. Ran as:

```
cd fbsource/fbcode/
buck test //caffe2/c10:c10_test_0 -- -r '.*DeviceTest.*'
```

Reviewed By: dreiss, raziel

Differential Revision: D30286910

fbshipit-source-id: b5699068dcbba89d5d224dbaf74b175f3f785a00
2021-08-17 09:22:35 -07:00
Nikita Shulga
a9b0a921d5 Disable avoid-non-const-global-variables lint check (#62008)
Summary:
As GoogleTest `TEST` macro is non-compliant with it as well as `DEFINE_DISPATCH`

All changes but the ones to `.clang-tidy` are generated using following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`;  do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008

Reviewed By: driazati, r-barnes

Differential Revision: D29838584

Pulled By: malfet

fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
2021-07-22 18:04:40 -07:00