Commit Graph

507 Commits

Author SHA1 Message Date
Aaron Gokaslan
31715be72a [BE]: Update mypy to 1.11.2 (#133816)
Updates mypy to 1.11.2 to improve type inference

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133816
Approved by: https://github.com/ezyang
2024-09-16 19:44:11 +00:00
PyTorch MergeBot
3117f2cf67 Revert "[BE]: Update mypy to 1.11.2 (#133816)"
This reverts commit 55299cfc22.

Reverted https://github.com/pytorch/pytorch/pull/133816 on behalf of https://github.com/jeanschmidt due to seems to have broken https://github.com/pytorch/pytorch/actions/runs/10865710499/job/30155699792 on main ([comment](https://github.com/pytorch/pytorch/pull/133816#issuecomment-2352377684))
2024-09-16 09:11:16 +00:00
Aaron Gokaslan
55299cfc22 [BE]: Update mypy to 1.11.2 (#133816)
Updates mypy to 1.11.2 to improve type inference

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133816
Approved by: https://github.com/ezyang
2024-09-14 21:40:36 +00:00
Laith Sakka
ba6e0f31ab Remove cycle dependency by localizing the import. (#135926)
Summary:
Since https://www.internalfb.com/diff/D62215095 landed, there have been many silent errors due to the dependency between functional_tensor and config.

```
 File "/tmp/torch_deploy_zip5YRJC1/torch_python_modules.zip/torch/export/__init__.py", line 64, in <module>
  File "/tmp/torch_deploy_zip5YRJC1/torch_python_modules.zip/torch/export/dynamic_shapes.py", line 23, in <module>
  File "/tmp/torch_deploy_zip5YRJC1/torch_python_modules.zip/torch/export/exported_program.py", line 26, in <module>
  File "/tmp/torch_deploy_zip5YRJC1/torch_python_modules.zip/torch/_higher_order_ops/__init__.py", line 1, in <module>
  File "/tmp/torch_deploy_zip5YRJC1/torch_python_modules.zip/torch/_higher_order_ops/cond.py", line 6, in <module>
  File "/tmp/torch_deploy_zip5YRJC1/torch_python_modules.zip/torch/_subclasses/functional_tensor.py", line 9, in <module>
  File "/tmp/torch_deploy_zip5YRJC1/torch_python_modules.zip/torch/_inductor/config.py", line 44, in <module>
```

https://fburl.com/logarithm/ol5kx0ee
The log above complains about a cyclic dependency.

This fixes it by localizing the import.
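
As a rough illustration of what "localizing the import" means, here is a minimal sketch. The function name is hypothetical, and `json` is only a stand-in for a module that would otherwise take part in an import cycle; this is not the code changed by the PR.

```python
def dump_settings(settings):
    # The import runs on the first call instead of at module import time,
    # so merely importing this module no longer pulls in the other module.
    import json  # stand-in for the module that would close the cycle

    return json.dumps(settings, sort_keys=True)


result = dump_settings({"b": 1, "a": 2})
```

The trade-off is a small lookup cost on each call in exchange for breaking the module-level dependency.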

Test Plan: buck test multipy/runtime:test_deploy_embedded_cuda_interp_without_cuda_available -- --run-disabled TorchpyTest.AcquireMultipleSessionsInDifferentPackages

Reviewed By: aorenste

Differential Revision: D62616765

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135926
Approved by: https://github.com/aorenste, https://github.com/oulgen, https://github.com/Skylion007
2024-09-13 15:05:41 +00:00
Pian Pawakapan
6df91b5917 real tensor prop for composite ops (#135717)
Fixes #135632

Adds real tensor propagation for decompositions, checking any symbols on their outputs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135717
Approved by: https://github.com/ezyang
2024-09-13 03:35:16 +00:00
Joel Schlosser
525bec804c NJT <-> padded dense conversions (#125947)
This PR:
* Implements the pre-existing `nt.to_padded_tensor(padding_val)` ATen op via the FBGEMM kernel + appropriate view gymnastics (since that kernel only handles 2D values)
* Introduces a new `_nested_from_padded_tensor` op for the reverse conversion, implemented via the reverse FBGEMM kernel + view gymnastics
    * Note: there is currently no public API for this; design booted to a future PR

TODO:
* ~~Propagate min / max sequence length via the new factory function `_nested_from_padded_tensor`~~
* ~~Verify that Inductor does computation fusion via test logic~~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125947
Approved by: https://github.com/soulitzer
2024-09-12 17:54:25 +00:00
Laith Sakka
66dd4577b1 Track base of FunctionalTensor in inference mode. (#135141)
The idea behind the tracking is the following: whenever we see a tensor, if the tensor is a root tensor (it does not have any view metas), then we consider it the base of all the tensors that share its storage.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135141
Approved by: https://github.com/zou3519
2024-09-06 00:10:25 +00:00
Avik Chaudhuri
43f4947d44 fix fake tensor tolist implementation (#135131)
Summary:
When exporting for training with `tolist`, we do not hit `FunctionalTensor.tolist` since we do not functionalize. Unfortunately, this means we hit `FakeTensor.tolist`, which creates unbacked symints that are not backed by proxies.

Rather than trying to patch up this low-level implementation, we replace it with essentially what `FunctionalTensor.tolist` does, which is higher-level: we essentially desugar to `item()` calls and let it take care of unbacked symints.

Test Plan:
Some expected failures are gone now.
Also found a test for `tolist` that was written when `FunctionalTensor.tolist` was implemented but not really doing much; repurposed it now to exercise more modes.

Differential Revision: D62197742

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135131
Approved by: https://github.com/ezyang
2024-09-05 23:20:31 +00:00
IvanKobzarev
1efd341d15 [fake_tensor] Move unrecognized_type NotImplemented before ConstProp (#135033)
We should not try to do ConstProp on the unrecognized types (e.g. Subclasses).
For those types, throwing NotImplemented will jump to the next torch_dispatch.

Test:
```
 python test/functorch/test_aotdispatch.py -k test_aot_test_subclasses_with_tensor_factories
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135033
Approved by: https://github.com/zou3519, https://github.com/bdhirsh
2024-09-05 17:09:41 +00:00
Jack Zhang
8a5c8e5db9 Update unbacked symints in masked_select more precisely (#134899)
## Summary
At the moment, the fake impl for `masked_select` simply sets the upper bound of its size-like SymInt's range to `sys.maxsize` (9223372036854775807, the max value for a signed int64) if there are any SymInts in the original input tensor shape. This PR constrains the range more precisely by using the upper bounds of each SymInt in the input tensor shape.

This solves an issue where a model being lowered to Executorch errors during memory planning because the memory allocated for `masked_select` ended up exceeding the 64-bit address space (`INT_MAX * size(dtype)`).
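
The bound computation can be sketched as follows. This assumes the tighter bound is the product of the per-dimension upper bounds (the output of `masked_select` can never hold more elements than the input); the helper name and the use of `None` for "unbounded" are illustrative choices, not the PR's actual code.

```python
import math
import sys


def masked_select_upper_bound(dim_upper_bounds):
    """Upper bound on the number of selected elements.

    dim_upper_bounds holds one upper bound per input dimension;
    None marks a dimension with no known bound (assumption of this sketch).
    """
    if any(bound is None for bound in dim_upper_bounds):
        # Without a bound on every dimension, fall back to the old behavior.
        return sys.maxsize
    # The output has at most as many elements as the input tensor.
    return min(math.prod(dim_upper_bounds), sys.maxsize)


tight = masked_select_upper_bound([4, 8])
loose = masked_select_upper_bound([4, None])
```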

## Test plan
- Passes existing unit tests (tests case where upper bound is inf)
- Added unit test to verify upper bound reduction calculation
- Tested end-to-end by exporting with TORCH_LOGS="export" and ensuring that the range for `masked_select`'s SymInt size has the correct upper bound
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134899
Approved by: https://github.com/ezyang
2024-09-05 09:01:06 +00:00
Tugsbayasgalan Manlaibaatar
9d705605dd Fix decomp behaviour in export training IR (#134801)
Subset of the changes in https://github.com/pytorch/pytorch/pull/132901; the previous PR can't land because it is too complicated. The rest of the change will be implemented as a follow-up after the export design meeting. This part just makes the training IR -> inference IR decomp take the same path as normal export.

Differential Revision: [D62000525](https://our.internmc.facebook.com/intern/diff/D62000525)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134801
Approved by: https://github.com/avikchaudhuri, https://github.com/angelayi
2024-09-05 06:37:44 +00:00
Laith Sakka
c8ab9b06a2 Redesign custom op functionalization for better re-inplace (#134409)
- The new implementation (auto_functionalized_v2) is enabled by default but can be disabled using an inductor flag.
- In export mode the old implementation is used.

**Motivation**
The previous functionalization fails to re-inplace arguments when they are views over other tensors;
see issue https://github.com/pytorch/pytorch/issues/131192.
With the new functionalization, views are easier to re-inplace.

**A) Functionalization pass**
Consider a program:

```
def func(t):
    x = t[0]
    y = t[1]
    foo(x, y)  # custom operator with x, y mutable
    return (x, y, t)
```

- To functionalize `foo` we generate a function that operates on the base tensors of the inputs (x.base() and y.base())
and record how to regenerate the views out of the base. For argument x this means recording ```ViewInfo=(x.base(), x.size(), x.stride(), x.storage_offset())```

- Due to some limitations on the torch.export argument format, we have to generate a lot of arguments, but this is something we can simplify in the future. For the example above we get the following function.

   ```
   auto_functionalized = torch.ops.higher_order.auto_functionalized(torch.ops.mylib.foo.default,
     _x_base_index = 0, _x_size = (), _x_stride = (), _x_storage_offset = 0 ,
     _y_base_index = 0,_y_size = (), _y_stride = (), _y_storage_offset = 1   ,
     _all_bases = [arg0_1])
   ```
 - In the code above:
        - _all_bases[t]: refers to the unique set of bases for all of foo's arguments.
        - For each argument x we have _x_base_index, _x_size, _x_stride, and _x_storage_offset, which can be used to regenerate x from _all_bases[_x_base_index] or from a copy of that base.

- The output of auto_functionalized is foo's output, followed by one tensor for each base in _all_bases; each is a copy of the base tensor after observing the mutations of all the arguments that are views of that base.

- Each use of a base in _all_bases, or of a view of it, that comes after the call to foo is replaced with a view of the new output.

 For the function above, after functionalization we get:
 ```
    def forward(self, arg0_1: "f32[2][1]cpu"):
        auto_functionalized = torch.ops.higher_order.auto_functionalized(torch.ops.mylib.foo.default, _x_base_index = 0, _x_size = (), _x_stride = (), _x_storage_offset = 0, _y_base_index = 0, _y_size = (), _y_stride = (), _y_storage_offset = 1, _all_bases = [arg0_1])
        getitem_1: "f32[2][1]cpu" = auto_functionalized[1];  auto_functionalized = None
        copy_: "f32[2][1]cpu" = torch.ops.aten.copy_.default(arg0_1, getitem_1);  arg0_1 = copy_ = None

        # No stacktrace found for following nodes
        select_2: "f32[][]cpu" = torch.ops.aten.select.int(getitem_1, 0, 0)
        select_3: "f32[][]cpu" = torch.ops.aten.select.int(getitem_1, 0, 1);  getitem_1 = None
        return (select_2, select_3)
```

**B) Semantics of auto_functionalize**
The new semantics of auto_functionalize are as follows:
1. For each base in all_bases, copy the base to create the all_bases copies (if a base is inplaced, we do not need to copy it).
2. For each arg, regenerate the arg from the copy of its base using the view information above.
3. Return the original foo output followed by the new bases.
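
The three steps can be mimicked with a toy model in plain Python. This is only a sketch: flat lists stand in for base tensors, the hypothetical `ScalarView` class stands in for a 0-d view described by a (base index, storage offset) pair, and none of the names below are PyTorch APIs.

```python
class ScalarView:
    """Hypothetical stand-in for a 0-d view: one slot of a flat base list."""

    def __init__(self, base, offset):
        self.base = base
        self.offset = offset

    def get(self):
        return self.base[self.offset]

    def set(self, value):
        self.base[self.offset] = value


def auto_functionalized_sketch(op, view_infos, all_bases):
    # Step 1: copy every base so the caller's bases are never mutated.
    new_bases = [list(base) for base in all_bases]
    # Step 2: regenerate each argument as a view of the copy of its base.
    args = [ScalarView(new_bases[index], offset) for index, offset in view_infos]
    # Step 3: run the mutating op, then return its output and the new bases.
    out = op(*args)
    return (out, *new_bases)


def foo(x, y):
    # Toy custom op that mutates both of its arguments in place.
    x.set(x.get() + 1.0)
    y.set(y.get() * 10.0)


t = [1.0, 2.0]
out, new_t = auto_functionalized_sketch(foo, [(0, 0), (0, 1)], [t])
```

In this model `t` is left untouched and `new_t` carries both mutations, which is why later uses of `t` or its views have to be rewritten in terms of the new output.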

**C) Re-inplace pass**
Since auto_functionalize now copies the bases, what we actually inplace is the bases
 (the pass runs just like before, but on the bases instead of the args).

1. For each base b in _all_bases, check whether there is any use of the base (or its aliases/views) after auto_functionalize (before it is overwritten with a copy). If there is none, inplace it (avoid copying it in step 1 above).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134409
Approved by: https://github.com/zou3519
2024-09-04 17:08:58 +00:00
IvanKobzarev
33ba952e31 [subclasses] Do not fakeTensor const prop subclass args (#134855)
The issue:

Const propagation only checks that the arguments do not include a FakeTensor. If an argument is a Subclass, it passes this check.

As a result, const propagation runs without FakeTensorMode, and tensor factories inside Subclass.__torch_dispatch__ produce tensors that are not fakified.

Solution:

If there are subclass arguments, do not treat const propagation as doable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134855
Approved by: https://github.com/zou3519
2024-09-03 13:31:49 +00:00
Avik Chaudhuri
92e38a476f preserve aten::to device in export training (#134622)
Summary:
With training IR, we cannot rely on trapping `to()` in `FunctionalTensor` because the regular decomposition kicks in first, and that can cause it to be optimized away.

So instead we preserve it until we functionalize, and then replace it explicitly with `_to_copy()`.

Test Plan: expected test failures go away

Differential Revision: D61883878

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134622
Approved by: https://github.com/zhxchen17, https://github.com/tugsbayasgalan
2024-08-29 14:53:30 +00:00
David Berard
289486d007 Move attention kernels back from fake_impls to meta_registrations (#134288)
See #121528 for additional context.

In #120682, we moved the attention kernels from meta_registrations to fake_impls with the intent of fixing the device handling for seed/offset: these are typically on CPU. We needed to put the registrations in fake_impls to do this because meta_registrations doesn't have a way to specify device, whereas fake_impls does. But when we tried to actually fix the device types (#120839), we had to revert the PR because it broke cudagraph handling (during which seed/offset _are_ on CUDA).

Now, we want to put the registrations back in meta_registrations so that we can call these kernels with meta tensors. The use case is later in this stack - we want to be able to use the flop counter with these kernels.

Also - I specifically skip the `compare_tensor_meta()` check in test_fake / test_fake_autocast tests for the `_efficient_attention_forward` and `_flash_attention_forward` kernels, which fails because of the device mismatch from the seed/offset tensors. Then we can un-skip these opinfos. I verified that the efficient_attention_forward bug (#120842) is now caught by these opinfos if I revert the fix from this PR.

Differential Revision: [D61687369](https://our.internmc.facebook.com/intern/diff/D61687369)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134288
Approved by: https://github.com/drisspg
2024-08-27 21:10:36 +00:00
Aart Bik
1a0d00f1f4 [traced-graph][sparse] enable to_dense() for compressed (#133371)
Fixes https://github.com/pytorch/pytorch/issues/133174

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133371
Approved by: https://github.com/ezyang
2024-08-24 20:33:23 +00:00
IvanKobzarev
8ae4f82243 [aotd] Support HOP effects in backward (#132638)
Support of effectful operations in backward:

1/ AOTD collects metadata from the forward fn only, so backward can use effectful ops that were not used in forward. To handle this, we allow token discovery during the joint function.

FunctionalTensorMode holds _tokens; in the joint function, after tracing forward, we memoize _tokens as `_tokens_forward_output`.

2/ Tokens are added as primals inputs (forward) in EffectTokensWrapper.
Tokens that will be used in backward are in the partitioner's saved values. We do not have control over which positions they are saved at in the forward outputs.

2/ If new tokens are discovered in backward after tracing joint_fn, they are manually added to the result graph at the end of primals.
_aot_autograd/utils.py

3/ All effectful ops during backward are marked with the 'must_be_in_backward' partitioner_tag, to prevent the partitioner from placing them in forward.

For that functional_tensor_mode got new optional state `self._effects_partitioner_tag` for effectful ops, to set after tracing forward.

There are additional changes in partitioner to improve functionality of 'must_be_in_backward'

4/ Unlift tokens now should run for both forward and backward.
- Because tokens saved for backward are placed at non-static positions, we identify the input and output tokens to erase by the inputs and outputs of the `with_effects` operation.
- In forward we can have input tokens, discovered in backward, that are not used by with_effects ops in forward but are saved for backward. We identify them by position in the forward inputs.

5/ Adding aot debug logging for graphs before unlifting and before adding additional primal for backward tokens.

Tests:
```
python test/higher_order_ops/test_with_effects.py
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132638
Approved by: https://github.com/bdhirsh
2024-08-23 15:30:58 +00:00
Aaron Orenstein
d95aedf5fd [BE] typing for decorators - fx/_compatibility (part 1) (#134202)
Part of #134054.

This corresponds to the pytorch mypy changes from D61493706. Updating takes so
long and touches so many files that it's impossible to land as a whole without conflicting with some other intermediate change.
So landing these 'type: ignore' for pytorch in advance of them actually being needed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134202
Approved by: https://github.com/Skylion007
2024-08-22 17:07:33 +00:00
PyTorch MergeBot
2db28a9611 Revert "[BE]: Update Typeguard to TypeIs for better type inference (#133814)"
This reverts commit bce0caba78.

Reverted https://github.com/pytorch/pytorch/pull/133814 on behalf of https://github.com/ezyang due to root cause of internal failures not addressed ([comment](https://github.com/pytorch/pytorch/pull/133814#issuecomment-2302466444))
2024-08-21 16:13:34 +00:00
Aaron Gokaslan
bce0caba78 [BE]: Update Typeguard to TypeIs for better type inference (#133814)
Uses TypeIs instead of TypeGuard for better inference. See https://peps.python.org/pep-0742/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133814
Approved by: https://github.com/ezyang
2024-08-20 17:19:57 +00:00
PyTorch MergeBot
42097f0ec1 Revert "[BE]: Update Typeguard to TypeIs for better type inference (#133814)"
This reverts commit cf60fe53a8.

Reverted https://github.com/pytorch/pytorch/pull/133814 on behalf of https://github.com/jeanschmidt due to Broke 12k internal signals/jobs, @ezyang please help get those changes merged. More details check D61488368 ([comment](https://github.com/pytorch/pytorch/pull/133814#issuecomment-2298210309))
2024-08-20 08:02:49 +00:00
Colin Peppler
0d4eacb9d2 [fake tensor] unbacked symint support for binary op fast path (#133584)
Addresses https://github.com/pytorch/pytorch/issues/133525

We have an unbacked symint in `final_shape` and it's a tuple... So, add `guard_size_oblivious` to do size oblivious checks + `sym_eq` for list equality.

```
op.shape
> torch.Size([1])
final_shape
> (u0 + 1,)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133584
Approved by: https://github.com/ezyang
2024-08-19 20:03:05 +00:00
Aaron Gokaslan
cf60fe53a8 [BE]: Update Typeguard to TypeIs for better type inference (#133814)
Uses TypeIs instead of TypeGuard for better inference. See https://peps.python.org/pep-0742/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133814
Approved by: https://github.com/ezyang
2024-08-18 19:10:16 +00:00
soulitzer
4af4910b1a Reland "Construct NJT without graph breaks" (#133196)
This reverts commit 154d40ca488e6979ce9c2de89d8a35b53129ebea.

and adds changes from https://github.com/pytorch/pytorch/pull/133061

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133196
Approved by: https://github.com/ezyang
ghstack dependencies: #133145
2024-08-14 01:11:13 +00:00
PyTorch MergeBot
656465fc77 Revert "Conversions between strided and jagged layouts for Nested Tensors (#115749)"
This reverts commit ed97fb77f9.

Reverted https://github.com/pytorch/pytorch/pull/115749 on behalf of https://github.com/izaitsevfb due to fails internal jobs, see [S440348](https://www.internalfb.com/sevmanager/view/440348) ([comment](https://github.com/pytorch/pytorch/pull/115749#issuecomment-2285051164))
2024-08-12 23:14:19 +00:00
soulitzer
05de2b2d0f Revert "Construct NJT without graph breaks" (#133145)
This reverts commit 911154271309667b55dfb963ec6384bd0048019b.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133145
Approved by: https://github.com/YuqingJ
2024-08-10 03:11:16 +00:00
xinan.lin
5707c6e952 [Fake tensor] Align the appearance of device_put op in fx_graph generated for CUDA and XPU, which is exposed in the issue #130823 (#132479)
[Fake tensor] Align the appearance of device_put op in fx_graph generated for CUDA and XPU, which is exposed in the issue #130823
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132479
Approved by: https://github.com/EikanWang, https://github.com/zou3519, https://github.com/eellison
2024-08-09 05:31:00 +00:00
Antoni Viros
ed97fb77f9 Conversions between strided and jagged layouts for Nested Tensors (#115749)
This PR does 3 things:
1. Adds a copy-free strided->jagged layout conversion for NT
2. Adds a copy-free jagged->strided layout conversion for NT
3. Modifies and expands the .to() API to support the layout argument for the specific case of NT layout conversion.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115749
Approved by: https://github.com/jbschlosser
2024-08-07 14:18:53 +00:00
soulitzer
f50621989b Construct NJT without graph breaks (#130292)
Combines contributions from https://github.com/pytorch/pytorch/pull/130505

Some context can be found in this large comment block:

a5b64d39fd/test/dynamo/test_subclasses.py (L1667-L1681)

Changes in this PR
- For each tensor fakified, check the nested int registry in eager, and eagerly symbolicize if that tensor has already been associated with nested int in eager.
- Adds a separate counter stored on FakeTensorMode as a fake analog to _tensor_id_counter (which keeps track of unique tensors). This counter is initialized to the global eager tensor id counter upon creation of the FakeTensorMode, and needs to be reset when the same FakeTensorMode is reused to trace again (in this PR, we piggyback on the epoch incrementing logic).
- (refactor) Today, we store FakeTensor -> symbolic nested int in the global registry. With this PR, symbolic nested int is stored directly on the FakeTensor. (Eager still caches nested int in the registry, though we should avoid this at some point.)

Basically unchanged, but worth noting:
- `__tensor_unflatten__` is still responsible for determining whether we should cache for now. The logic is somewhat simplified.
- to_copy is still using the trick of updating two different tensors in the registry to point to the same nested int. This is kind of broken, but we try to leave it as is, and plan a better fix with the UnionFind stack.

Differential Revision: [D60406772](https://our.internmc.facebook.com/intern/diff/D60406772)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130292
Approved by: https://github.com/bdhirsh
ghstack dependencies: #131916, #131803
2024-08-06 17:03:39 +00:00
soulitzer
406b50835b Use FakeTensor cache for subclass inner tensors (#131803)
Rewrite of original PR in https://github.com/pytorch/pytorch/pull/130291

To answer review comments from https://github.com/pytorch/pytorch/pull/130291#pullrequestreview-2166671953:

> At a higher level, do we need this?

Today, this should not change the behavior of anything. But an invariant of "same tensor always corresponds to the same FakeTensor" is nice (from discussion with @bdhirsh).

> Why does this happen?

Today, both dynamo and meta_utils do some recursion when it comes to FakeTensors. So whenever we fakify a subclass, the process would roughly look like:

```
wrap_to_fake (subclass)
   meta_utils (subclass)
      meta_utils (values) -> not cached because we use callback
      meta_utils(offsets) -> not cached because we use callback
  wrap_to_fake (values)
  wrap_to_fake (offsets) -> cached because we rely on top-level meta_utils
```

However, we know that:
- Caching only occurs at the top-level of meta_utils.
- The return value of the top-level wrap_to_fake is returned.

This means that after all of this:
- The fakified subclass holds inner FakeTensors that are NOT part of the cache
- values/offsets are Fakified a second time, and those instances are cached.

Differential Revision: [D60406773](https://our.internmc.facebook.com/intern/diff/D60406773)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131803
Approved by: https://github.com/ezyang
ghstack dependencies: #131916
2024-08-06 17:03:39 +00:00
PyTorch MergeBot
38674bcb45 Revert "Conversions between strided and jagged layouts for Nested Tensors (#115749)"
This reverts commit eca0cb0fbe.

Reverted https://github.com/pytorch/pytorch/pull/115749 on behalf of https://github.com/izaitsevfb due to breaks test_overrides.py::TestTorchFunctionWarning::test_warn_on_invalid_torch_function_tensor_subclass ([comment](https://github.com/pytorch/pytorch/pull/115749#issuecomment-2270213988))
2024-08-06 01:55:41 +00:00
Antoni Viros
eca0cb0fbe Conversions between strided and jagged layouts for Nested Tensors (#115749)
This PR does 3 things:
1. Adds a copy-free strided->jagged layout conversion for NT
2. Adds a copy-free jagged->strided layout conversion for NT
3. Modifies and expands the .to() API to support the layout argument for the specific case of NT layout conversion.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115749
Approved by: https://github.com/jbschlosser
2024-08-05 23:45:48 +00:00
Aart Bik
a8490a0762 [traced-graph][sparse] propagate sparsity in fx graph (#131920)
This PR proceeds with implementing the feature request #117188 by generalizing more cases that already work with COO to work with the compressed sparse formats as well.

Feature request:
https://github.com/pytorch/pytorch/issues/117188

Rebranch of older PRs (for history):
https://github.com/pytorch/pytorch/pull/131474
https://github.com/pytorch/pytorch/pull/128549

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131920
Approved by: https://github.com/ezyang
2024-08-05 15:49:53 +00:00
Pearu Peterson
a4ea776881 Add pinned memory support to sparse COO/CSR/CSC/BSR/BSC tensors (#129645)
As in the title:

To register indices/values of a sparse XYZ tensor with CUDA, the following methods are supported
- `sparse_xyz_tensor(indices, values, pin_memory=True)`
- `sparse_xyz_tensor(indices, values).pin_memory()`
- `sparse_xyz_tensor(indices.pin_memory(), values.pin_memory())`

Fixes https://github.com/pytorch/pytorch/issues/115330

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129645
Approved by: https://github.com/amjames, https://github.com/cpuhrsch, https://github.com/eqy
2024-08-02 08:55:55 +00:00
Brian Hirsh
997f64af38 fastpath FunctionalTensor sizes() (#132084)
Another attempt at fast-pathing sizes() in FunctionalTensor, since it appears to improve compile time perf by up to ~10%. See the investigation from https://github.com/pytorch/pytorch/issues/125977#issuecomment-2122915602.

After looking at some failing tests locally I realized that we need to manually handle metadata mutations now, since the previous "smarter" size dispatch was handling the updates

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132084
Approved by: https://github.com/ezyang
2024-08-01 21:09:22 +00:00
Michael Lazos
93979e7063 Skip frame if torch dispatch mode enabled (#131828)
Fixes https://github.com/pytorch/pytorch/issues/105929

We now skip frames if a dispatch mode is enabled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131828
Approved by: https://github.com/bdhirsh, https://github.com/anijain2305
2024-08-01 19:06:20 +00:00
Oguz Ulgen
72d2dba992 Add None return type to init (#132335)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132335
Approved by: https://github.com/albanD
2024-08-01 15:26:45 +00:00
IvanKobzarev
9c52013559 [subclasses] Fix nested subclasses flattened tensors ordering (#132096)
get_plain_tensors() should result in DFS of leaves.
The error was that plain tensors (leaves) on the same level were returned before the subclasses' plain tensors, even if the subclasses came earlier in the "flatten" list.
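
The intended ordering can be illustrated with a small sketch. Nested lists stand in for subclasses and strings stand in for plain tensors; `flatten_dfs` is a hypothetical helper written for this illustration, not the real `get_plain_tensors()`.

```python
def flatten_dfs(node):
    """Return the leaves of a nested list in depth-first, left-to-right order."""
    if not isinstance(node, list):
        return [node]  # a plain leaf
    leaves = []
    for child in node:
        # Recurse immediately so a nested node's leaves keep their position.
        leaves.extend(flatten_dfs(child))
    return leaves


# The inner "subclass" comes before the plain leaf "c" in the flatten list,
# so its leaves must come first in the result.
ordered = flatten_dfs([["a", "b"], "c"])
```

The ordering described as buggy above would instead yield `c` before `a` and `b`, because same-level leaves were collected before descending into nested subclasses.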

Original issue from AO: https://github.com/pytorch/ao/issues/515

Test: TBD, need to make an asymmetric subclass with dense tensors and subclasses
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132096
Approved by: https://github.com/bdhirsh
2024-07-31 14:12:51 +00:00
Xuehai Pan
e7eeee473c [BE][Easy][14/19] enforce style for empty lines in import segments in torch/_[a-c]*/ and torch/_[e-h]*/ and torch/_[j-z]*/ (#129765)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129765
Approved by: https://github.com/ezyang
2024-07-31 10:42:50 +00:00
Aaron Orenstein
8721b21b38 Fix fake_tensor w/ non-view tensor (#132050)
Summary: This code was overly complex and is confusing some guards - basically if a result cached tensor isn't a view there's no reason to be messing with its storage.

Test Plan: unit tests pass

Differential Revision: D60387821

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132050
Approved by: https://github.com/oulgen
2024-07-30 02:17:18 +00:00
PyTorch MergeBot
945bf78894 Revert "[BE] typing for decorators - fx/_compatibility (#131568)"
This reverts commit 193f62fde9.

Reverted https://github.com/pytorch/pytorch/pull/131568 on behalf of https://github.com/clee2000 due to same as https://github.com/pytorch/pytorch/pull/131572#issuecomment-2254328359 but I clicked the wrong link by accident.  This is where it actually starts ([comment](https://github.com/pytorch/pytorch/pull/131568#issuecomment-2254330781))
2024-07-28 03:43:39 +00:00
Aaron Orenstein
3e0ccb3a9f Fixing fake tensor SymInt caching (#131966)
Summary: Some tests are failing because of a weird interaction between the symbolic sizes and the `set()` - back it out for now.

Differential Revision: D60320595

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131966
Approved by: https://github.com/oulgen
2024-07-27 22:43:57 +00:00
Brian Hirsh
071ac38141 fast-path FakeTensor detach (#131899)
Fixes https://github.com/pytorch/pytorch/issues/128281, see investigation at https://github.com/pytorch/pytorch/issues/128281#issuecomment-2252976926.

benchmark:
```
python benchmarks/dynamo/huggingface.py --performance --timing --explain --backend aot_eager --device cuda --training --float32 --only BertForMaskedLM
```

time before:
```
TIMING: entire_frame_compile:30.85435 backend_compile:23.98599 total_wall_time:30.85435
```

time after:
```
TIMING: entire_frame_compile:24.35898 backend_compile:18.15235 total_wall_time:24.35898
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131899
Approved by: https://github.com/ezyang, https://github.com/zou3519, https://github.com/albanD
2024-07-26 20:16:08 +00:00
Aaron Orenstein
193f62fde9 [BE] typing for decorators - fx/_compatibility (#131568)
See #131429

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131568
Approved by: https://github.com/justinchuby, https://github.com/oulgen, https://github.com/zou3519
2024-07-25 22:24:19 +00:00
Aaron Orenstein
1930698140 Fix fake tensor SymInt caching when there's a SymInt storage_offset (#131500)
Test Plan: Internal unit tests failed before and succeeded after.

Differential Revision: D60131273

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131500
Approved by: https://github.com/clee2000
2024-07-23 23:37:04 +00:00
Aaron Orenstein
5a0068cc69 [BE] mypy: disallow untyped decorators (#131428)
Untyped decorators strip the types from the functions they decorate, so even if the underlying function is fully typed, its callers don't get any benefit from the type annotations.

Step 1 - Enable the error and override in all the offending files.

#131429

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131428
Approved by: https://github.com/justinchuby, https://github.com/oulgen
2024-07-23 21:50:55 +00:00
Aaron Orenstein
b193894b94 FakeTensor cache SymInt support (#127596)
Adds support for SymInts in the FakeTensor cache.

A couple notes:
1. When a SymInt is present in the input key for a FakeTensor operation we cache on the ShapeEnv instead of using the FakeTensorMode cache. This is necessary so we don't have to remember and check the guards. It reduces the cache hits but there's diminishing return on how much work we can do before the cache becomes more of a burden than a gain.
2. When we cache an output SymInt that is a direct copy of an input, we need to be careful on a cache hit to copy the SymNode from the input to the output. This is important because the fx-graph building code actually uses SymNode ids in the process of building the graph, so constructing a same-content-but-different-id SymNode will fail.
3. In the cache key we store SymInts as a _PySymInputStub. These represent SymInt (and friends) but support `__hash__` and `__eq__` (which SymInt do not).
4. In the cache entry we store SymInts as a _SymIntOutputStub.
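
The role of the key stub in note 3 can be sketched in plain Python. Both classes below are hypothetical stand-ins (`FakeSym` for an unhashable symbolic int, `SymInputStub` for the stub); they only show why a hashable, comparable wrapper is needed to use a symbolic value inside a dictionary key.

```python
class FakeSym:
    """Hypothetical stand-in for a symbolic int; deliberately unhashable."""

    __hash__ = None

    def __init__(self, expr):
        self.expr = expr


class SymInputStub:
    """Hashable wrapper that compares by the symbol's expression."""

    def __init__(self, sym):
        self.expr = sym.expr

    def __hash__(self):
        return hash(self.expr)

    def __eq__(self, other):
        return isinstance(other, SymInputStub) and self.expr == other.expr


cache = {}
cache[("add", SymInputStub(FakeSym("s0")), 4)] = "cached-output"

# A later call with an equal symbolic input produces an equal key.
hit = cache.get(("add", SymInputStub(FakeSym("s0")), 4))
miss = cache.get(("add", SymInputStub(FakeSym("s1")), 4))
```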

Perf example:
```
python benchmarks/dynamo/timm_models.py --ci --accuracy --timing
--explain --inductor --dynamic-shapes --dynamic-batch-only --device cuda
--training --amp --total-partitions 2 --partition-id 0 --output
/tmp/training_timm_models.csv --filter crossvit_9_240
```
fake tensor cache before:
```
INFO: FakeTensor cache stats:
INFO:   cache_hits: 68137
INFO:   cache_misses: 837
INFO:   cache_bypasses:
INFO:     symbolic shape:            48224
INFO:     CompositeImplicitAutograd: 917
INFO:     non-fake tensor:           70
INFO:     non-FakeTensor output:     62
INFO:     non-builtin:               8
INFO:     dynamic output shape:      1
```
and after:
```
INFO: FakeTensor cache stats:
INFO:   cache_hits: 88187
INFO:   cache_misses: 14233
INFO:   cache_bypasses:
INFO:     CompositeImplicitAutograd: 1037
INFO:     non-FakeTensor output:     602
INFO:     non-fake tensor:           70
INFO:     unsafe view:               36
INFO:     non-builtin:               8
INFO:     dynamic output shape:      1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127596
Approved by: https://github.com/eellison
ghstack dependencies: #131014, #129780
2024-07-21 19:26:38 +00:00
Aaron Orenstein
ebce85172e FakeTensor cache SymInt support: flatten cache key (#129780)
This is part of #127596, pulled out to make reviewing a little easier.

Flatten the FakeTensor cache key - so it's a list of singular elements and pointing at one requires a single index rather than a PyTree path.  This is used in the next PR to allow us to have the cache entry refer to an input SymInt that it needs to copy directly into the output.
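
A minimal sketch of the idea, assuming the key is built from nested tuples and lists: once flattened, any element (for example an input SymInt) can be referenced by one integer index instead of a path through the nesting. The helper and the sample key are illustrative, not the actual cache-key code.

```python
def flatten_key(obj, out=None):
    """Flatten nested tuples/lists into a single list of leaves, in order."""
    out = [] if out is None else out
    if isinstance(obj, (tuple, list)):
        for item in obj:
            flatten_key(item, out)
    else:
        out.append(obj)
    return out


nested_key = ("aten.add", (("s0", 4), 2), ["cpu"])
flat_key = flatten_key(nested_key)

# Nested form needs a path (nested_key[1][0][0]); flat form needs one index.
symint_index = flat_key.index("s0")
```

A cache entry can then record just `symint_index` to say "copy this input element into the output".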

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129780
Approved by: https://github.com/oulgen, https://github.com/eellison
ghstack dependencies: #131014
2024-07-21 19:26:38 +00:00
Xuehai Pan
4d7bf72d93 [BE][Easy] fix ruff rule needless-bool (SIM103) (#130206)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130206
Approved by: https://github.com/malfet
2024-07-14 08:17:52 +00:00
Aaron Orenstein
567482973d typing fake_tensor.py (#128041)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128041
Approved by: https://github.com/eellison
ghstack dependencies: #129182
2024-07-13 06:07:40 +00:00