Commit Graph

54 Commits

Author SHA1 Message Date
Aaron Orenstein
2f9d378f7b PEP585 update - torch/utils (#145201)
See #145101 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145201
Approved by: https://github.com/bobrenjc93
2025-01-21 21:04:10 +00:00
William Wen
ee7eaad5c3 [dynamo] add SymNode bitwise and/or (#138777)
Fixes [T203472723](https://www.internalfb.com/intern/tasks/?t=203472723)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138777
Approved by: https://github.com/ezyang
2024-11-22 23:36:16 +00:00
PyTorch MergeBot
c1fe6be202 Revert "[dynamo] add SymNode bitwise and/or (#138777)"
This reverts commit c98ef0279e.

Reverted https://github.com/pytorch/pytorch/pull/138777 on behalf of https://github.com/ezyang due to triggering AssertionError: Guard check failed: 14/2: name 'BitwiseFn_bitwise_or' is not defined ([comment](https://github.com/pytorch/pytorch/pull/138777#issuecomment-2477477776))
2024-11-14 21:52:40 +00:00
William Wen
c98ef0279e [dynamo] add SymNode bitwise and/or (#138777)
Fixes [T203472723](https://www.internalfb.com/intern/tasks/?t=203472723)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138777
Approved by: https://github.com/ezyang
2024-11-13 18:31:06 +00:00
Edward Z. Yang
91ded0576d Add sym_log2 (#137980)
Internal xref: https://fb.workplace.com/groups/1075192433118967/permalink/1515595595745313/

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137980
Approved by: https://github.com/bobrenjc93
2024-10-28 17:03:14 +00:00
PyTorch MergeBot
2487a834a4 Revert "Add sym_log2 (#137980)"
This reverts commit 5d450d7fac.

Reverted https://github.com/pytorch/pytorch/pull/137980 on behalf of https://github.com/jeanschmidt due to lint broke from this onwards on main ([comment](https://github.com/pytorch/pytorch/pull/137980#issuecomment-2441570186))
2024-10-28 13:21:08 +00:00
Edward Z. Yang
5d450d7fac Add sym_log2 (#137980)
Internal xref: https://fb.workplace.com/groups/1075192433118967/permalink/1515595595745313/

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137980
Approved by: https://github.com/bobrenjc93
2024-10-28 03:09:11 +00:00
Laith Sakka
ed313a5ca2 Introduce torch.sym_add, variadic add (#138660)
Tested internally here: https://www.internalfb.com/diff/D64057744
This is a reland after previous internal failures.
main change is
```
 if min is None and max is None:
        torch._check_is_size(size)
        return
```

Partially addresses https://github.com/pytorch/pytorch/issues/128150

When you have big sums of values, we end up computing long chains of
binary addition in our FX graph representation.  Not only is this ugly,
it also is quadratic, as the sympy.Add constructor is O(N) in number
of arguments.  Instead, ensure that we maintain the summation as a
single FX node so we can do the entire addition all in one go.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138660
Approved by: https://github.com/ezyang, https://github.com/bobrenjc93
2024-10-23 17:42:41 +00:00
Bob Ren
38ea487338 Re-raise in _run_sympy_handler to reduce log spew (#138356)
Fixes: https://github.com/pytorch/pytorch/issues/138069

I tested this by running `python test/inductor/test_torchinductor_dynamic_shapes.py DynamicShapesCpuTests.test_builtins_round_float_ndigits_pos_dynamic_shapes_cpu` before and after the change and verifying no more log spew.

I'm uncertain on if it makes sense to add a test for this PR. Question for reviewers: is there a standard paradigm for testing these log spew based fixed? Happy to add a test if someone can point me towards the right direction.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138356
Approved by: https://github.com/ezyang
2024-10-19 16:02:45 +00:00
PyTorch MergeBot
16a2c2cfd4 Revert "Introduce torch.sym_sum (#136429)"
This reverts commit 90bed32b98.

Reverted https://github.com/pytorch/pytorch/pull/136429 on behalf of https://github.com/ezyang due to fails internal stuff ([comment](https://github.com/pytorch/pytorch/pull/136429#issuecomment-2403335147))
2024-10-09 20:08:01 +00:00
Edward Z. Yang
1aac1ffce1 Don't generate implicit value ranges for missing symbols. (#136667)
Instead, callback to a missing handler when needed. This greatly speeds things up with the value ranges dict is large. The missing handler is needed because nested ints don't have VRs, but symbolic sizes involving them occasionally show up in compute.

```
TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="s11" TORCH_LOGS=dynamic PYTORCH_TEST_WITH_DYNAMO=1 python test/test_nestedtensor.py TestNestedTensorAutogradCPU.test_dropout_backward_jagged_cpu
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136667
Approved by: https://github.com/isuruf
ghstack dependencies: #136429
2024-10-08 18:12:57 +00:00
Edward Z. Yang
90bed32b98 Introduce torch.sym_sum (#136429)
Partially addresses https://github.com/pytorch/pytorch/issues/128150

When you have big sums of values, we end up computing long chains of
binary addition in our FX graph representation.  Not only is this ugly,
it also is quadratic, as the sympy.Add constructor is O(N) in number
of arguments.  Instead, ensure that we maintain the summation as a
single FX node so we can do the entire addition all in one go.

update_hint_regression benchmark, before and after:

```
update_hint_regression,compile_time_instruction_count,2648328980
update_hint_regression,compile_time_instruction_count,2563748678
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136429
Approved by: https://github.com/isuruf
2024-10-08 18:12:57 +00:00
Edward Z. Yang
326db8af4c Replace sympy Min/Max with reimplementations (#133319)
Sympy's implementation of Min/Max displays asymptotically bad behavior on `TORCH_COMPILE_CPROFILE=1 python torchrec/distributed/tests/test_pt2_multiprocess.py TestPt2Train.test_compile_multiprocess`. Evidence profile:

![image](https://github.com/user-attachments/assets/142301e9-3a18-4370-b9db-19b32ece7ee8)

On this test case, we spend 42% of all time compiling the network on ShapeEnv.replace, which in turn spends all of its time in xreplace.

The problem appears to be find_localzeros call. By vendoring the implementations of Min/Max, we can potentially reduce the cost of this operation.

The implementation is copy-pasted sympy/functions/elementary/miscellaneous.py but with some adjustments:

* I deleted logic related to differentatiation, evalf and heaviside, as it's not relevant to PyTorch reasoning
* There's some massaging to appease PyTorch's linters, including a lot of noqa and type: ignore (which I could potentially refactor away with substantive changes, but that's better as its own change)
* I deleted the second loop iteration for is_connected, as an attempt at initial optimization (this also simplifies the port, since I can omit some code). I'll comment at that point what the exact difference is.

Before this change, the test in question takes 100s with 40 features; post this change, afterwards, it takes only 69s.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133319
Approved by: https://github.com/Skylion007
2024-08-25 05:05:59 +00:00
Xuehai Pan
30293319a8 [BE][Easy][19/19] enforce style for empty lines in import segments in torch/[o-z]*/ (#129771)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129771
Approved by: https://github.com/justinchuby, https://github.com/janeyx99
2024-08-01 17:07:14 +00:00
Pian Pawakapan
1b3b4c2fb9 [runtime asserts] deduplicate runtime asserts & CSE (#128599) (#130380)
original PR: https://github.com/pytorch/pytorch/pull/128599 (re-created after revert + poisoned diff train)

Summary:
This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example:
```
z = torch.cat([x, x], dim=0)  # 2*s0
w = z.repeat(y.shape[0])  # 2*s0*s1
_w = w.shape[0]

s0 = x.shape[0]
s1 = y.shape[0]
_w0 = 2 * s0
_w = _w0 * s1
```

Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example:
```
torch.sym_constrain_range_for_size(n, min=2, max=16)
torch.sym_constrain_range(n, min=4, max=20)
torch._check(n >= 0)
torch._check(n >= 3)
torch._check(n <= 14)

torch.sym_constrain_range_for_size(n)
torch._check(n >= 4)
torch._check(n <= 14)
```

Test Plan:
contbuild & OSS CI, see 940e4477ab

Original Phabricator Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Differential Revision: D59543603

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130380
Approved by: https://github.com/izaitsevfb
2024-07-10 19:23:37 +00:00
PyTorch MergeBot
9c9744c3ac Revert "[runtime asserts] deduplicate runtime asserts & CSE (#128599)"
This reverts commit 940e4477ab.

Reverted https://github.com/pytorch/pytorch/pull/128599 on behalf of https://github.com/izaitsevfb due to breaking internal APS tests, see D59498864 ([comment](https://github.com/pytorch/pytorch/pull/128599#issuecomment-2218724762))
2024-07-09 21:03:49 +00:00
Pian Pawakapan
940e4477ab [runtime asserts] deduplicate runtime asserts & CSE (#128599)
This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example:
```
z = torch.cat([x, x], dim=0)  # 2*s0
w = z.repeat(y.shape[0])  # 2*s0*s1
_w = w.shape[0]
# something with _w ...

# turns into ->
s0 = x.shape[0]
s1 = y.shape[0]
_w0 = 2 * s0
_w = _w0 * s1
```

Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example:
```
torch.sym_constrain_range_for_size(n, min=2, max=16)
torch.sym_constrain_range(n, min=4, max=20)
torch._check(n >= 0)
torch._check(n >= 3)
torch._check(n <= 14)

# turns into
torch.sym_constrain_range_for_size(n)
torch._check(n >= 4)
torch._check(n <= 14)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128599
Approved by: https://github.com/ezyang
2024-07-07 20:10:14 +00:00
PyTorch MergeBot
963f430d13 Revert "[runtime asserts] deduplicate runtime asserts & CSE (#128599)"
This reverts commit 0267b2ddcb.

Reverted https://github.com/pytorch/pytorch/pull/128599 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to cause a landrace and fails inductor/test_cudagraph_trees in trunk 0267b2ddcb ([comment](https://github.com/pytorch/pytorch/pull/128599#issuecomment-2211690518))
2024-07-06 07:20:05 +00:00
Pian Pawakapan
0267b2ddcb [runtime asserts] deduplicate runtime asserts & CSE (#128599)
This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example:
```
z = torch.cat([x, x], dim=0)  # 2*s0
w = z.repeat(y.shape[0])  # 2*s0*s1
_w = w.shape[0]
# something with _w ...

# turns into ->
s0 = x.shape[0]
s1 = y.shape[0]
_w0 = 2 * s0
_w = _w0 * s1
```

Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example:
```
torch.sym_constrain_range_for_size(n, min=2, max=16)
torch.sym_constrain_range(n, min=4, max=20)
torch._check(n >= 0)
torch._check(n >= 3)
torch._check(n <= 14)

# turns into
torch.sym_constrain_range_for_size(n)
torch._check(n >= 4)
torch._check(n <= 14)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128599
Approved by: https://github.com/ezyang
2024-07-06 03:44:49 +00:00
eellison
c187593418 Prevent expansion of cat indexing to avoid int64 intermediate (#127815)
Fix for https://github.com/pytorch/pytorch/issues/127652

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127815
Approved by: https://github.com/shunting314, https://github.com/peterbell10
2024-06-14 15:42:08 +00:00
Edward Z. Yang
2229884102 Introduce int_oo (#127693)
In a previous life, we used sympy.oo to represent the lower/upper bounds of integer ranges. Later, we changed this to be sys.maxsize - 1 for a few reasons: (1) sometimes we do tests on a value being exactly sys.maxsize, and we wanted to avoid a data dependent guard in this case, (2) sympy.oo corresponds to floating point infinity, so you get incorrect types for value ranges with oo, and (3) you can do slightly better reasoning if you assume that input sizes fall within representable 64-bit integer range.

After working in the sys.maxsize regime for a bit, I've concluded that this was actually a bad idea. Specifically, the problem is that you end up with sys.maxsize in your upper bound, and then whenever you do any sort of size-increasing computation like size * 2, you end up with 2 * sys.maxsize, and you end up doing a ton of arbitrary precision int computation that is totally unnecessary. A symbolic bound is better.

But especially after #126905, we can't go back to using sympy.oo, because that advertises that it's not an integer, and now your ValueRanges is typed incorrectly. So what do we do? We define a new numeric constant `int_oo`, which is like `sympy.oo` but it advertises `is_integer`. **test/test_sympy_utils.py** describes some basic properties of the number, and **torch/utils/_sympy/numbers.py** has the actual implementation.

The rest of the changes of the PR are working out the implications of this change. I'll give more commentary as inline comments.

Fixes https://github.com/pytorch/pytorch/issues/127396

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127693
Approved by: https://github.com/lezcano
ghstack dependencies: #126905
2024-06-13 04:08:20 +00:00
PyTorch MergeBot
f2dcbe89d6 Revert "Prevent expansion of cat indexing to avoid int64 intermediate (#127815)"
This reverts commit 793df7b7cb.

Reverted https://github.com/pytorch/pytorch/pull/127815 on behalf of https://github.com/clee2000 due to the newly added test is failing internally D58444153.  Test exists in opensource and passed in OSS CI, maybe env difference? ([comment](https://github.com/pytorch/pytorch/pull/127815#issuecomment-2163421968))
2024-06-12 16:09:22 +00:00
PyTorch MergeBot
5d8c7f39d4 Revert "Introduce int_oo (#127693)"
This reverts commit 9cab5987bd.

Reverted https://github.com/pytorch/pytorch/pull/127693 on behalf of https://github.com/clee2000 due to sorry executorch CI is a bit weird regarding pins, I'll make a chat with mergen with the choices of what to do and how it'll affect executorch CI, reverting for now to prevent more divergences in the meantime ([comment](https://github.com/pytorch/pytorch/pull/127693#issuecomment-2161775400))
2024-06-11 23:36:08 +00:00
eellison
793df7b7cb Prevent expansion of cat indexing to avoid int64 intermediate (#127815)
Fix for https://github.com/pytorch/pytorch/issues/127652

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127815
Approved by: https://github.com/shunting314, https://github.com/peterbell10
2024-06-11 02:41:07 +00:00
Edward Z. Yang
9cab5987bd Introduce int_oo (#127693)
In a previous life, we used sympy.oo to represent the lower/upper bounds of integer ranges. Later, we changed this to be sys.maxsize - 1 for a few reasons: (1) sometimes we do tests on a value being exactly sys.maxsize, and we wanted to avoid a data dependent guard in this case, (2) sympy.oo corresponds to floating point infinity, so you get incorrect types for value ranges with oo, and (3) you can do slightly better reasoning if you assume that input sizes fall within representable 64-bit integer range.

After working in the sys.maxsize regime for a bit, I've concluded that this was actually a bad idea. Specifically, the problem is that you end up with sys.maxsize in your upper bound, and then whenever you do any sort of size-increasing computation like size * 2, you end up with 2 * sys.maxsize, and you end up doing a ton of arbitrary precision int computation that is totally unnecessary. A symbolic bound is better.

But especially after #126905, we can't go back to using sympy.oo, because that advertises that it's not an integer, and now your ValueRanges is typed incorrectly. So what do we do? We define a new numeric constant `int_oo`, which is like `sympy.oo` but it advertises `is_integer`. **test/test_sympy_utils.py** describes some basic properties of the number, and **torch/utils/_sympy/numbers.py** has the actual implementation.

The rest of the changes of the PR are working out the implications of this change. I'll give more commentary as inline comments.

Fixes https://github.com/pytorch/pytorch/issues/127396

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127693
Approved by: https://github.com/lezcano
ghstack dependencies: #126905
2024-06-10 19:09:53 +00:00
Edward Z. Yang
3964a3ec73 Complete revamp of float/promotion sympy handling (#126905)
At a high level, the idea behind this PR is:

* Make it clearer what the promotion and int/float rules for various Sympy operations are. Operators that previously were polymorphic over int/float are now split into separate operators for clarity. We never do mixed int/float addition/multiplication etc in sympy, instead, we always promote to the appropriate operator. (However, equality is currently not done correctly.)
* Enforce strict typing on ValueRanges: if you have a ValueRange for a float, the lower and upper MUST be floats, and so forth for integers.

The story begins in **torch/utils/_sympy/functions.py**. Here, I make some changes to how we represent certain operations in sympy expressions:

* FloorDiv now only supports integer inputs; to do float floor division, do a truediv and then a trunc. Additionally, we remove the divide out addition by gcd optimization, because sympy gcd is over fields and is willing to generate rationals (but rationals are bad for ValueRange strict typing).
* ModularIndexing, LShift, RShift now assert they are given integer inputs.
* Mod only supports integer inputs; eventually we will support FloatMod (left for later work, when we build out Sympy support for floating operations). Unfortunately, I couldn't assert integer inputs here, because of a bad interaction with sympy's inequality solver that is used by the offline solver
* TrueDiv is split into FloatTrueDiv and IntTrueDiv. This allows for us to eventually generate accurate code for Python semantics IntTrueDiv, which is written in a special way to preserve precision when the inputs are >= 2**53 beyond what first coercing the integer to floats and then doing true division.
* Trunc is split to TruncToFloat and TruncToInt.
* Round is updated to return a float, not an int, making it consistent with the round op handler in Inductor. To get Python-style conversion to int, we call TruncToInt on the result.
* RoundDecimal updated to consistently only ever return a float
* Add ToFloat for explicit coercion to float (required so we can enforce strict ValueRanges typing)

In **torch/__init__.py**, we modify SymInt and SymFloat to appropriately call into new bindings that route to these refined sympy operations.  Also, we modify `torch.sym_min` and `torch.sym_max` to have promotion semantics (if one argument is a float, the return result is always a float), making them inconsistent with builtins.min/max, but possible to do type analysis without runtime information.

We also need to introduce some new op handlers in **torch/_inductor/ops_handler.py**:

* `to_int` for truncation to int64, directly corresponding to TruncToInt; this can be implemented by trunc and dtype, but with a dedicated handler it is more convenient for roundtripping in Sympy
* `int_truediv` for Python-style integer true division, which has higher precision than casting to floats and then running `truediv`

These changes have consequences. First, we need to make some administrative changes:

* Actually wire up these Sympy functions from SymInt/SymFloat in **torch/fx/experimental/sym_node.py**, including the new promotion rules (promote2)
* Add support for new Sympy functions in **torch/utils/_sympy/interp.py**, **torch/utils/_sympy/reference.py**
  * In particular, in torch.utils._sympy.reference, we have a strong preference to NOT do nontrivial compute, instead, everything in ops handler should map to a singular sympy function
  * TODO: I chose to roundtrip mod back to our Mod function, but I think I'm going to have to deal with the C/Python inconsistency this to fix tests here
* Add printer support for the Sympy functions in **torch/_inductor/codegen/common.py**, **torch/_inductor/codegen/cpp_utils.py**, **torch/_inductor/codegen/triton.py**. `int_truediv` and mixed precision equality is currently not implemented soundly, so we will lose precision in codegen for large values. TODO: The additions here are not exhaustive yet
* Update ValueRanges logic to use new sympy functions in **torch/utils/_sympy/value_ranges.py**. In general, we prefer to use the new Sympy function rather than try to roll things by hand, which is what was done previously for many VR analysis functions.

In **torch/fx/experimental/symbolic_shapes.py** we need to make some symbolic reasoning adjustments:

* Avoid generation of rational subexpressions by removing simplification of `x // y` into `floor(x / y)`. This simplification then triggers an addition simplification rule `(x + y) / c --> x / c + y / c` which is bad because x / c is a rational number now
* `_assert_bound_is_rational` is no more, we no longer generate rational bounds
* Don't intersect non-int value ranges with the `int_range`
* Support more sympy Functions for guard SYMPY_INTERP
* Assert the type of value range is consistent with the variable type

The new asserts uncovered necessary bug fixes:

* **torch/_inductor/codegen/cpp.py**, **torch/_inductor/select_algorithm.py**, **torch/_inductor/sizevars.py** - Ensure Wild/Symbol manually allocated in Inductor is marked `is_integer` so it's accepted to build expressions
* **torch/_inductor/utils.py** - make sure you actually pass in sympy.Expr to these functions
* **torch/_inductor/ir.py** - make_contiguous_strides_for takes int/SymInt, not sympy.Expr!
* **torch/export/dynamic_shapes.py** - don't use infinity to represent int ranges, instead use sys.maxsize - 1

Because of the removal of some symbolic reasoning that produced rationals, some of our symbolic reasoning has gotten worse and we are unable to simplify some guards. Check the TODO at **test/test_proxy_tensor.py**

**Reland notes.** This requires this internal fbcode diff https://www.internalfb.com/phabricator/paste/view/P1403322587 but I cannot prepare the diff codev due to https://fb.workplace.com/groups/osssupport/posts/26343544518600814/

It also requires this Executorch PR https://github.com/pytorch/executorch/pull/3911 but the ET PR can be landed prior to this landing.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126905
Approved by: https://github.com/xadupre, https://github.com/lezcano
2024-06-09 06:20:25 +00:00
Aaron Orenstein
8db9dfa2d7 Flip default value for mypy disallow_untyped_defs [9/11] (#127846)
See #127836 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127846
Approved by: https://github.com/ezyang
ghstack dependencies: #127842, #127843, #127844, #127845
2024-06-08 18:50:06 +00:00
PyTorch MergeBot
ac51f782fe Revert "Complete revamp of float/promotion sympy handling (#126905)"
This reverts commit 2f7cfecd86.

Reverted https://github.com/pytorch/pytorch/pull/126905 on behalf of https://github.com/atalman due to Sorry need to revert - failing internally ([comment](https://github.com/pytorch/pytorch/pull/126905#issuecomment-2155118778))
2024-06-07 16:01:46 +00:00
Edward Z. Yang
2f7cfecd86 Complete revamp of float/promotion sympy handling (#126905)
At a high level, the idea behind this PR is:

* Make it clearer what the promotion and int/float rules for various Sympy operations are. Operators that previously were polymorphic over int/float are now split into separate operators for clarity. We never do mixed int/float addition/multiplication etc in sympy, instead, we always promote to the appropriate operator. (However, equality is currently not done correctly.)
* Enforce strict typing on ValueRanges: if you have a ValueRange for a float, the lower and upper MUST be floats, and so forth for integers.

The story begins in **torch/utils/_sympy/functions.py**. Here, I make some changes to how we represent certain operations in sympy expressions:

* FloorDiv now only supports integer inputs; to do float floor division, do a truediv and then a trunc. Additionally, we remove the divide out addition by gcd optimization, because sympy gcd is over fields and is willing to generate rationals (but rationals are bad for ValueRange strict typing).
* ModularIndexing, LShift, RShift now assert they are given integer inputs.
* Mod only supports integer inputs; eventually we will support FloatMod (left for later work, when we build out Sympy support for floating operations). Unfortunately, I couldn't assert integer inputs here, because of a bad interaction with sympy's inequality solver that is used by the offline solver
* TrueDiv is split into FloatTrueDiv and IntTrueDiv. This allows for us to eventually generate accurate code for Python semantics IntTrueDiv, which is written in a special way to preserve precision when the inputs are >= 2**53 beyond what first coercing the integer to floats and then doing true division.
* Trunc is split to TruncToFloat and TruncToInt.
* Round is updated to return a float, not an int, making it consistent with the round op handler in Inductor. To get Python-style conversion to int, we call TruncToInt on the result.
* RoundDecimal updated to consistently only ever return a float
* Add ToFloat for explicit coercion to float (required so we can enforce strict ValueRanges typing)

In **torch/__init__.py**, we modify SymInt and SymFloat to appropriately call into new bindings that route to these refined sympy operations.  Also, we modify `torch.sym_min` and `torch.sym_max` to have promotion semantics (if one argument is a float, the return result is always a float), making them inconsistent with builtins.min/max, but possible to do type analysis without runtime information.

We also need to introduce some new op handlers in **torch/_inductor/ops_handler.py**:

* `to_int` for truncation to int64, directly corresponding to TruncToInt; this can be implemented by trunc and dtype, but with a dedicated handler it is more convenient for roundtripping in Sympy
* `int_truediv` for Python-style integer true division, which has higher precision than casting to floats and then running `truediv`

These changes have consequences. First, we need to make some administrative changes:

* Actually wire up these Sympy functions from SymInt/SymFloat in **torch/fx/experimental/sym_node.py**, including the new promotion rules (promote2)
* Add support for new Sympy functions in **torch/utils/_sympy/interp.py**, **torch/utils/_sympy/reference.py**
  * In particular, in torch.utils._sympy.reference, we have a strong preference to NOT do nontrivial compute, instead, everything in ops handler should map to a singular sympy function
  * TODO: I chose to roundtrip mod back to our Mod function, but I think I'm going to have to deal with the C/Python inconsistency this to fix tests here
* Add printer support for the Sympy functions in **torch/_inductor/codegen/common.py**, **torch/_inductor/codegen/cpp_utils.py**, **torch/_inductor/codegen/triton.py**. `int_truediv` and mixed precision equality is currently not implemented soundly, so we will lose precision in codegen for large values. TODO: The additions here are not exhaustive yet
* Update ValueRanges logic to use new sympy functions in **torch/utils/_sympy/value_ranges.py**. In general, we prefer to use the new Sympy function rather than try to roll things by hand, which is what was done previously for many VR analysis functions.

In **torch/fx/experimental/symbolic_shapes.py** we need to make some symbolic reasoning adjustments:

* Avoid generation of rational subexpressions by removing simplification of `x // y` into `floor(x / y)`. This simplification then triggers an addition simplification rule `(x + y) / c --> x / c + y / c` which is bad because x / c is a rational number now
* `_assert_bound_is_rational` is no more, we no longer generate rational bounds
* Don't intersect non-int value ranges with the `int_range`
* Support more sympy Functions for guard SYMPY_INTERP
* Assert the type of value range is consistent with the variable type

The new asserts uncovered necessary bug fixes:

* **torch/_inductor/codegen/cpp.py**, **torch/_inductor/select_algorithm.py**, **torch/_inductor/sizevars.py** - Ensure Wild/Symbol manually allocated in Inductor is marked `is_integer` so it's accepted to build expressions
* **torch/_inductor/utils.py** - make sure you actually pass in sympy.Expr to these functions
* **torch/_inductor/ir.py** - make_contiguous_strides_for takes int/SymInt, not sympy.Expr!
* **torch/export/dynamic_shapes.py** - don't use infinity to represent int ranges, instead use sys.maxsize - 1

Because of the removal of some symbolic reasoning that produced rationals, some of our symbolic reasoning has gotten worse and we are unable to simplify some guards. Check the TODO at **test/test_proxy_tensor.py**

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126905
Approved by: https://github.com/xadupre, https://github.com/lezcano
2024-06-06 02:29:45 +00:00
PyTorch MergeBot
d5cb5d623a Revert "Complete revamp of float/promotion sympy handling (#126905)"
This reverts commit fb696ef3aa.

Reverted https://github.com/pytorch/pytorch/pull/126905 on behalf of https://github.com/ezyang due to internal user reported ceiling equality simplification problem, I have a plan ([comment](https://github.com/pytorch/pytorch/pull/126905#issuecomment-2148805840))
2024-06-05 03:57:58 +00:00
Edward Z. Yang
fb696ef3aa Complete revamp of float/promotion sympy handling (#126905)
At a high level, the idea behind this PR is:

* Make it clearer what the promotion and int/float rules for various Sympy operations are. Operators that previously were polymorphic over int/float are now split into separate operators for clarity. We never do mixed int/float addition/multiplication etc in sympy, instead, we always promote to the appropriate operator. (However, equality is currently not done correctly.)
* Enforce strict typing on ValueRanges: if you have a ValueRange for a float, the lower and upper MUST be floats, and so forth for integers.

The story begins in **torch/utils/_sympy/functions.py**. Here, I make some changes to how we represent certain operations in sympy expressions:

* FloorDiv now only supports integer inputs; to do float floor division, do a truediv and then a trunc. Additionally, we remove the divide out addition by gcd optimization, because sympy gcd is over fields and is willing to generate rationals (but rationals are bad for ValueRange strict typing).
* ModularIndexing, LShift, RShift now assert they are given integer inputs.
* Mod only supports integer inputs; eventually we will support FloatMod (left for later work, when we build out Sympy support for floating operations). Unfortunately, I couldn't assert integer inputs here, because of a bad interaction with sympy's inequality solver that is used by the offline solver
* TrueDiv is split into FloatTrueDiv and IntTrueDiv. This allows for us to eventually generate accurate code for Python semantics IntTrueDiv, which is written in a special way to preserve precision when the inputs are >= 2**53 beyond what first coercing the integer to floats and then doing true division.
* Trunc is split to TruncToFloat and TruncToInt.
* Round is updated to return a float, not an int, making it consistent with the round op handler in Inductor. To get Python-style conversion to int, we call TruncToInt on the result.
* RoundDecimal updated to consistently only ever return a float
* Add ToFloat for explicit coercion to float (required so we can enforce strict ValueRanges typing)

In **torch/__init__.py**, we modify SymInt and SymFloat to appropriately call into new bindings that route to these refined sympy operations.  Also, we modify `torch.sym_min` and `torch.sym_max` to have promotion semantics (if one argument is a float, the return result is always a float), making them inconsistent with builtins.min/max, but possible to do type analysis without runtime information.

We also need to introduce some new op handlers in **torch/_inductor/ops_handler.py**:

* `to_int` for truncation to int64, directly corresponding to TruncToInt; this can be implemented by trunc and dtype, but with a dedicated handler it is more convenient for roundtripping in Sympy
* `int_truediv` for Python-style integer true division, which has higher precision than casting to floats and then running `truediv`

These changes have consequences. First, we need to make some administrative changes:

* Actually wire up these Sympy functions from SymInt/SymFloat in **torch/fx/experimental/sym_node.py**, including the new promotion rules (promote2)
* Add support for new Sympy functions in **torch/utils/_sympy/interp.py**, **torch/utils/_sympy/reference.py**
  * In particular, in torch.utils._sympy.reference, we have a strong preference to NOT do nontrivial compute, instead, everything in ops handler should map to a singular sympy function
  * TODO: I chose to roundtrip mod back to our Mod function, but I think I'm going to have to deal with the C/Python inconsistency this to fix tests here
* Add printer support for the Sympy functions in **torch/_inductor/codegen/common.py**, **torch/_inductor/codegen/cpp_utils.py**, **torch/_inductor/codegen/triton.py**. `int_truediv` and mixed precision equality is currently not implemented soundly, so we will lose precision in codegen for large values. TODO: The additions here are not exhaustive yet
* Update ValueRanges logic to use new sympy functions in **torch/utils/_sympy/value_ranges.py**. In general, we prefer to use the new Sympy function rather than try to roll things by hand, which is what was done previously for many VR analysis functions.

In **torch/fx/experimental/symbolic_shapes.py** we need to make some symbolic reasoning adjustments:

* Avoid generation of rational subexpressions by removing simplification of `x // y` into `floor(x / y)`. This simplification then triggers an addition simplification rule `(x + y) / c --> x / c + y / c` which is bad because x / c is a rational number now
* `_assert_bound_is_rational` is no more, we no longer generate rational bounds
* Don't intersect non-int value ranges with the `int_range`
* Support more sympy Functions for guard SYMPY_INTERP
* Assert the type of value range is consistent with the variable type

The new asserts uncovered necessary bug fixes:

* **torch/_inductor/codegen/cpp.py**, **torch/_inductor/select_algorithm.py**, **torch/_inductor/sizevars.py** - Ensure Wild/Symbol manually allocated in Inductor is marked `is_integer` so it's accepted to build expressions
* **torch/_inductor/utils.py** - make sure you actually pass in sympy.Expr to these functions
* **torch/_inductor/ir.py** - make_contiguous_strides_for takes int/SymInt, not sympy.Expr!
* **torch/export/dynamic_shapes.py** - don't use infinity to represent int ranges, instead use sys.maxsize - 1

Because of the removal of some symbolic reasoning that produced rationals, some of our symbolic reasoning has gotten worse and we are unable to simplify some guards. Check the TODO at **test/test_proxy_tensor.py**

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126905
Approved by: https://github.com/xadupre, https://github.com/lezcano
2024-06-04 11:47:32 +00:00
Edward Z. Yang
efa36ef092 Natively support int truncation, don't guard on positive/negative (#122827)
This doesn't entirely fix the original problem that prompted this, but
it seems to just be getting stuck in export constraint formatting now
which seems like progress to me.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122827
Approved by: https://github.com/avikchaudhuri
2024-04-11 15:22:32 +00:00
Edward Z. Yang
3178ba0dc9 Don't use sympy Float functions, use an opaque one with no reasoning (#122823)
Sympy simplifications don't obey floating point semantics, so don't
use Sympy for this.  Keep them as is, only evaluate with the reference
implementations when all arguments are known.

This may end up getting subsumed by some other changes later, but I
wanted to understand if this was easy and it seems to be easy.

This doesn't actually depend on the earlier diffs on the stack and I can detach it.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122823
Approved by: https://github.com/lezcano
2024-03-29 19:13:55 +00:00
vfdev-5
7005a4bcb6 [dynamo] Added dyn shapes support for math trigo ops: sin(h), cos(h), tan(h) ... (#114866)
Description:
- Added dynamic shapes support for math trigo ops: sin(h), cos(h), tan(h) ...

```python
import math
import torch

def func(x, a, b):
    c = 0
    c = c + math.sqrt(a)
    c = c + math.cos(a)
    c = c + math.cosh(a)
    c = c + math.sin(a)
    c = c + math.sinh(a)
    c = c + math.tan(a)
    c = c + math.tanh(a)
    c = c + math.asin(b)
    c = c + math.acos(b)
    c = c + math.atan(a)
    y = x + c
    return y

cfunc = torch.compile(func, dynamic=True, fullgraph=True)

device = "cpu"  # or "cuda"
x = torch.tensor([0, 1, 2, 3], dtype=torch.float32, device=device)
a = 12
b = 1

out = cfunc(x, a, b)
expected = func(x, a, b)
torch.testing.assert_close(out, expected)
```

and the graph `TORCH_LOGS=+graph_code python check_math_ops.py`:

<details>
<summary>
graph code
</summary>

```
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG] TRACED GRAPH
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]  ===== __compiled_fn_0 =====
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]  <eval_with_key>.0 class GraphModule(torch.nn.Module):
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]     def forward(self, L_a_ : torch.SymInt, s1 : torch.SymInt, L_x_ : torch.Tensor):
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         l_a_ = L_a_
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         l_x_ = L_x_
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         # File: check_math_ops.py:57, code: c = c + math.sqrt(a)
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         sym_sqrt = torch.sym_sqrt(l_a_)
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         add = 0 + sym_sqrt;  sym_sqrt = None
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         # File: check_math_ops.py:58, code: c = c + math.cos(a)
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         sym_cos = torch.sym_cos(l_a_)
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         add_1 = add + sym_cos;  add = sym_cos = None
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         # File: check_math_ops.py:59, code: c = c + math.cosh(a)
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         sym_cosh = torch.sym_cosh(l_a_)
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         add_2 = add_1 + sym_cosh;  add_1 = sym_cosh = None
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         # File: check_math_ops.py:60, code: c = c + math.sin(a)
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         sym_sin = torch.sym_sin(l_a_)
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         add_3 = add_2 + sym_sin;  add_2 = sym_sin = None
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         # File: check_math_ops.py:61, code: c = c + math.sinh(a)
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         sym_sinh = torch.sym_sinh(l_a_)
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         add_4 = add_3 + sym_sinh;  add_3 = sym_sinh = None
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         # File: check_math_ops.py:62, code: c = c + math.tan(a)
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         sym_tan = torch.sym_tan(l_a_)
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         add_5 = add_4 + sym_tan;  add_4 = sym_tan = None
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         # File: check_math_ops.py:63, code: c = c + math.tanh(a)
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         sym_tanh = torch.sym_tanh(l_a_)
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         add_6 = add_5 + sym_tanh;  add_5 = sym_tanh = None
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         # File: check_math_ops.py:64, code: c = c + math.asin(b)
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         add_7 = add_6 + 1.5707963267948966;  add_6 = None
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         # File: check_math_ops.py:65, code: c = c + math.acos(b)
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         add_8 = add_7 + 0.0;  add_7 = None
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         # File: check_math_ops.py:66, code: c = c + math.atan(a)
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         sym_atan = torch.sym_atan(l_a_);  l_a_ = None
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         add_9 = add_8 + sym_atan;  add_8 = sym_atan = None
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         # File: check_math_ops.py:67, code: y = x + c
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         y = l_x_ + add_9;  l_x_ = add_9 = None
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         return (y,)
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]
[2023-11-30 22:16:10,654] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]
```
</details>

Generated code with `TORCH_LOGS=+output_code python check_math_ops.py`:
<details>
<summary>
C++ code
</summary>

```
[2023-11-30 22:19:09,709] [0/0] torch._inductor.graph.__output_code: [DEBUG] cpp_fused_add_0 = async_compile.cpp('''
[2023-11-30 22:19:09,709] [0/0] torch._inductor.graph.__output_code: [DEBUG] #include "/tmp/torchinductor_root/2l/c2ljzlm4sosod7u6lyrroqdba6hmfcyijrric6p4t3fhbcmw6osp.h"
[2023-11-30 22:19:09,709] [0/0] torch._inductor.graph.__output_code: [DEBUG] extern "C" void kernel(const float* in_ptr0,
[2023-11-30 22:19:09,709] [0/0] torch._inductor.graph.__output_code: [DEBUG]                        float* out_ptr0,
[2023-11-30 22:19:09,709] [0/0] torch._inductor.graph.__output_code: [DEBUG]                        const long ks0,
[2023-11-30 22:19:09,709] [0/0] torch._inductor.graph.__output_code: [DEBUG]                        const long ks1)
[2023-11-30 22:19:09,709] [0/0] torch._inductor.graph.__output_code: [DEBUG] {
[2023-11-30 22:19:09,709] [0/0] torch._inductor.graph.__output_code: [DEBUG]     {
[2023-11-30 22:19:09,709] [0/0] torch._inductor.graph.__output_code: [DEBUG]         #pragma GCC ivdep
[2023-11-30 22:19:09,709] [0/0] torch._inductor.graph.__output_code: [DEBUG]         for(long x0=static_cast<long>(0L); x0<static_cast<long>(ks0); x0+=static_cast<long>(1L))
[2023-11-30 22:19:09,709] [0/0] torch._inductor.graph.__output_code: [DEBUG]         {
[2023-11-30 22:19:09,709] [0/0] torch._inductor.graph.__output_code: [DEBUG]             auto tmp0 = in_ptr0[static_cast<long>(x0)];
[2023-11-30 22:19:09,709] [0/0] torch._inductor.graph.__output_code: [DEBUG]             auto tmp1 = c10::convert<float>(1.57079632679490 + (std::sqrt(ks1)) + (std::atan(ks1)) + (std::cos(ks1)) + (std::cosh(ks1)) + (std::sin(ks1)) + (std::sinh(ks1)) + (std::tan(ks1)) + (std::tanh(ks1)));
[2023-11-30 22:19:09,709] [0/0] torch._inductor.graph.__output_code: [DEBUG]             auto tmp2 = decltype(tmp0)(tmp0 + tmp1);
[2023-11-30 22:19:09,709] [0/0] torch._inductor.graph.__output_code: [DEBUG]             out_ptr0[static_cast<long>(x0)] = tmp2;
[2023-11-30 22:19:09,709] [0/0] torch._inductor.graph.__output_code: [DEBUG]         }
[2023-11-30 22:19:09,709] [0/0] torch._inductor.graph.__output_code: [DEBUG]     }
[2023-11-30 22:19:09,709] [0/0] torch._inductor.graph.__output_code: [DEBUG] }
[2023-11-30 22:19:09,709] [0/0] torch._inductor.graph.__output_code: [DEBUG] ''')
```

</details>

<details>
<summary>
Triton code
</summary>

```
[2023-11-30 22:20:00,383] [0/0] torch._inductor.graph.__output_code: [DEBUG] @pointwise(
[2023-11-30 22:20:00,383] [0/0] torch._inductor.graph.__output_code: [DEBUG]     size_hints=[4],
[2023-11-30 22:20:00,383] [0/0] torch._inductor.graph.__output_code: [DEBUG]     filename=__file__,
[2023-11-30 22:20:00,383] [0/0] torch._inductor.graph.__output_code: [DEBUG]     triton_meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'device_type': 'cuda', 'constants': {}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=(), i
ds_of_folded_args=(), divisible_by_8=())]},
[2023-11-30 22:20:00,383] [0/0] torch._inductor.graph.__output_code: [DEBUG]     inductor_meta={'autotune_hints': set(), 'kernel_name': 'triton_poi_fused_add_0', 'mutated_arg_names': []},
[2023-11-30 22:20:00,383] [0/0] torch._inductor.graph.__output_code: [DEBUG]     min_elem_per_thread=0
[2023-11-30 22:20:00,383] [0/0] torch._inductor.graph.__output_code: [DEBUG] )
[2023-11-30 22:20:00,383] [0/0] torch._inductor.graph.__output_code: [DEBUG] @triton.jit
[2023-11-30 22:20:00,383] [0/0] torch._inductor.graph.__output_code: [DEBUG] def triton_(in_ptr0, out_ptr0, ks0, xnumel, XBLOCK : tl.constexpr):
[2023-11-30 22:20:00,383] [0/0] torch._inductor.graph.__output_code: [DEBUG]     xoffset = tl.program_id(0) * XBLOCK
[2023-11-30 22:20:00,383] [0/0] torch._inductor.graph.__output_code: [DEBUG]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
[2023-11-30 22:20:00,383] [0/0] torch._inductor.graph.__output_code: [DEBUG]     xmask = xindex < xnumel
[2023-11-30 22:20:00,383] [0/0] torch._inductor.graph.__output_code: [DEBUG]     x0 = xindex
[2023-11-30 22:20:00,383] [0/0] torch._inductor.graph.__output_code: [DEBUG]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
[2023-11-30 22:20:00,383] [0/0] torch._inductor.graph.__output_code: [DEBUG]     tmp1 = 1.57079632679490 + (tl.math.sqrt(ks0.to(tl.float32))) + (tl.math.atan((ks0).to(tl.float32))) + (tl.math.cos((ks0).to(tl.float32))) + (tl.math.cosh((ks0).to(tl.float32))) + (tl.math.sin((ks0)
.to(tl.float32))) + (tl.math.sinh((ks0).to(tl.float32))) + (tl.math.tan((ks0).to(tl.float32))) + (tl.math.tanh((ks0).to(tl.float32)))
[2023-11-30 22:20:00,383] [0/0] torch._inductor.graph.__output_code: [DEBUG]     tmp2 = tmp1.to(tl.float32)
[2023-11-30 22:20:00,383] [0/0] torch._inductor.graph.__output_code: [DEBUG]     tmp3 = tmp0 + tmp2
[2023-11-30 22:20:00,383] [0/0] torch._inductor.graph.__output_code: [DEBUG]     tl.store(out_ptr0 + (x0), tmp3, xmask)
[2023-11-30 22:20:00,383] [0/0] torch._inductor.graph.__output_code: [DEBUG] ''')
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114866
Approved by: https://github.com/peterbell10
2024-01-11 11:52:28 +00:00
Philip Meier
505a9e4854 add support for dynamic shapes in round (#115259)
Fixes #114310 and supersedes #114748.

There are two reasons why we have quite a few special cases for `round`:

1. `round` is actually two ops. With `ndigits=None` (default), `round` always returns an integer. When `ndigits` is an integer, the returned type is a float.
2. Although `round` takes two arguments, it is a unary function with a parameter rather than a binary one.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115259
Approved by: https://github.com/peterbell10, https://github.com/lezcano
2023-12-19 15:45:50 +00:00
Edward Z. Yang
793c62b79c Allow binary pointwise operations to cause refinement on unbacked SymInts (#112155)
To do this, there is a little detour to remove hint caching for unbacked
SymInts; now, we just always attempt to update the hint (using
maybe_evaluate_static; this is much better than the replace we were
doing before) if we don't think we know it.

With this change, we now can generally infer that i0 == 1 is false for
a size-like unbacked SymInt.  So if we write the size match /
broadcasting test very carefully (see comment), we will eventually
end up expect_true(sizeA == sizeB), which is good enough to cause
refinement.  Phew!

I think I still want to setup a replacement if you do i0 == s0, but I'm
going to do that in a follow up.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112155
Approved by: https://github.com/aakhundov, https://github.com/voznesenskym
2023-11-01 23:02:17 +00:00
lezcano
47ccf04885 Split SymNode into its own file (#112037)
This PR:

- Moves TrueDiv, LShift, RShift, IsNonOverlappingAndDenseIndicator to `_sympy.functions.py`
- Moves SymNode to `fx.experimental.sym_node`.
  - This file does not have any SymPy dependencies at import time
  - It installs the magic methods in Sym{Bool,Int,Float}.
  - N.b. With this split, we may be able to move Sym{Bool,Int,Float} to this file, and remove quite a few of the hacks around these classes
- Imports `sym_node` in `torch/__init__.py` rather than the whole `symbolic_shapes.py`.
  This breaks the import-time dependency between torch and SymPy

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112037
Approved by: https://github.com/peterbell10
ghstack dependencies: #112035, #112036
2023-10-26 23:32:27 +00:00
ydwu4
6140facf00 Support SymBool input to torch.compile (#107850)
We could have SymBool inputs for torch.compile, e.g. in the following situation:
```
def f(x:torch.Tensor):
  pred = x.size(0) == 3
  torch.compile(f)(pred, x)

make_fx(f, tracing_mode="symbolic")(x)
```

The idea of this PR (credit to @ezyang) is to support SymBool by re-using the infra we've already had for SymInt so that we don't need to replicate a lot of stuff.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107850
Approved by: https://github.com/ezyang
ghstack dependencies: #107662
2023-09-14 21:34:31 +00:00
PyTorch MergeBot
47f79e9a2b Revert "Support SymBool input to torch.compile (#107850)"
This reverts commit 9f6d70b2fd.

Reverted https://github.com/pytorch/pytorch/pull/107850 on behalf of https://github.com/huydhn due to Sorry for reverting this, but test_export_with_symbool_inputs is failing in trunk a08e1370ef ([comment](https://github.com/pytorch/pytorch/pull/107850#issuecomment-1718675877))
2023-09-14 02:53:36 +00:00
ydwu4
9f6d70b2fd Support SymBool input to torch.compile (#107850)
We could have SymBool inputs for torch.compile, e.g. in the following situation:
```
def f(x:torch.Tensor):
  pred = x.size(0) == 3
  torch.compile(f)(pred, x)

make_fx(f, tracing_mode="symbolic")(x)
```

The idea of this PR (credit to @ezyang) is to support SymBool by re-using the infra we've already had for SymInt so that we don't need to replicate a lot of stuff.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107850
Approved by: https://github.com/ezyang
ghstack dependencies: #107662
2023-09-14 01:16:29 +00:00
lezcano
2b6249e209 Wrap indirect indexing on CUDA (#105055)
Lifting this to CPU should be rather easy. @jgong5
Partially fixes https://github.com/pytorch/pytorch/issues/97365. I'd wait to close that issue once this works on CPU as well.

This fix works with dynamic shapes as well.

@voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @muchulee8

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105055
Approved by: https://github.com/peterbell10, https://github.com/jansel
2023-08-23 11:59:20 +00:00
PyTorch MergeBot
b282787409 Revert "Wrap indirect indexing on CUDA (#105055)"
This reverts commit 85c673e6b2.

Reverted https://github.com/pytorch/pytorch/pull/105055 on behalf of https://github.com/peterbell10 due to Causes failure in inductor_torchbench ([comment](https://github.com/pytorch/pytorch/pull/105055#issuecomment-1688871947))
2023-08-22 20:24:41 +00:00
lezcano
85c673e6b2 Wrap indirect indexing on CUDA (#105055)
Lifting this to CPU should be rather easy. @jgong5
Partially fixes https://github.com/pytorch/pytorch/issues/97365. I'd wait to close that issue once this works on CPU as well.

This fix works with dynamic shapes as well.

@voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @muchulee8

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105055
Approved by: https://github.com/peterbell10, https://github.com/jansel
2023-08-22 01:06:35 +00:00
Yukio Siraichi
8e01f75b1b New Mod class for SymPy expressions. (#104968)
This PR introduces a new `Mod` class to be used with SymPy expressions. The main reason
being due to SymPy simplification errors (#97792).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104968
Approved by: https://github.com/ezyang
2023-07-14 13:34:52 +00:00
lezcano
d1fedad080 Perform value range analysis with rationals when possible (#105137)
This is particularly useful for guards to avoid rounding errors, as most
guards (all?) are rational functions.

Fixes https://github.com/pytorch/pytorch/issues/105097

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105137
Approved by: https://github.com/ezyang
2023-07-13 16:45:47 +00:00
lezcano
ff05f81e1d Simplify and extend ValueRanges (#104557)
This PR:
- It adds a few boolean variants of some methods that were missing
- It simplifies the implementation of plenty of the operations
- Adds ModularIndexing to the SymPy interpreter

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104557
Approved by: https://github.com/eellison
2023-07-07 23:52:13 +00:00
lezcano
7ae100628e Move most SymPy functions to their own file (#104556)
All these are standalone implementations of some functions and they
don't depend on anything else, so we better have them under the
`_sympy/` folder on their own

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104556
Approved by: https://github.com/ezyang
2023-07-04 03:53:48 +00:00
Edward Z. Yang
d303665d33 Make int unspecialization actually work (#95621)
OK, so this PR used to be about reducing the number of constants we specialize on, but it turns out that unspecialization was ~essentially never used (because we still constant specialized way too aggressively) and I ended up having to fix a bunch of issues to actually get tests to pass. So this PR is now "make int unspecialization actually work". As part of this, I have to turn off unspecialization by default, as there are still latent bugs in inductor.

The general strategy is that an unspecialized int is represented as a SymInt. Representing it as a 0d tensor (which is what the code used to do) is untenable: (1) we often need unspecialized ints to participate in size computations, but we have no way of propagating sympy expressions through tensor compute, and (2) a lot of APIs work when passed SymInt, but not when passed a Tensor. However, I continue to represent Numpy scalars as Tensors, as they are rarely used for size computation and they have an explicit dtype, so they are more accurately modeled as 0d tensors.

* I folded in the changes from https://github.com/pytorch/pytorch/pull/95099 as I cannot represent unspecialized ints as SymInts without also turning on dynamic shapes. This also eliminates the necessity for test_unspec.py, as toggling specialization without dynamic shapes doesn't do anything. As dynamic shapes defaults to unspecializing, I just deleted this entirely; for the specialization case, I rely on regular static shape tests to catch it. (Hypothetically, we could also rerun all the tests with dynamic shapes, but WITH int/float specialization, but this seems... not that useful? I mean, I guess export wants it, but I'd kind of like our Source heuristic to improve enough that export doesn't have to toggle this either.)
* Only 0/1 integers get specialized by default now
* A hodgepodge of fixes. I'll comment on the PR about them.

Fixes https://github.com/pytorch/pytorch/issues/95469

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95621
Approved by: https://github.com/jansel, https://github.com/Chillee
2023-03-04 01:22:08 +00:00
Edward Z. Yang
3758559a58 Reland "Introduce constrain_range; remove old expr_subs (#95063)" (#95209)
This reverts commit 4e88547c95.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95209
Approved by: https://github.com/albanD
2023-02-22 18:16:25 +00:00
PyTorch MergeBot
cf6e078c34 Revert "Reland "Introduce constrain_range; remove old expr_subs (#95063)" (#95209)"
This reverts commit f7bf31fff1.

Reverted https://github.com/pytorch/pytorch/pull/95209 on behalf of https://github.com/ezyang due to internal sympy is too old
2023-02-22 01:58:58 +00:00