Commit Graph

252 Commits

Author SHA1 Message Date
Edward Z. Yang
c67c16bcd2 Switch calling convention back to real tensors (#99320)
Months ago, in order to get dynamic shapes working through to Dynamo backends, we changed the calling convention to pass fake tensors rather than real tensors as example inputs to backends. The motivation at the time was, well, backends shouldn't really be peeking at the real tensors when they are doing compilation, and so it would make more sense to hide the real tensors from backends. But there were a bunch of problems:

* This interacted poorly with our accuracy minifier design: accuracy minifier needs access to the real inputs in order to run the model and figure out what happens!
* The TensorRT backend required real inputs and we never figured out how to fix it.
* In practice, all the backends needed to detect if they were passed real tensors, and fakeify them anyway (certainly AOTAutograd does this)
* Parameters and inputs are treated non-uniformly: parameters had to be passed as real tensors, because CUDA graphs requires knowing what the actual tensors are

Furthermore, there were some more problems discovered after the fact:

* Backends may want to optimize on aspects of tensors which you cannot tell without having real tensors; e.g., alignment of the data pointer

So, this PR decides that changing the calling convention was a bad idea, and switches back to passing real tensors. There is a problem though: AOTAutograd will perform fakeification, which means that in practice backends are still going to end up with fake tensors in the end anyway. I want to change this, but this will require some work with bdhirsh's upcoming AOTAutograd export refactor.
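
A minimal sketch (an assumption about typical backend code, not this PR's implementation) of what a custom Dynamo backend can do again now that it receives real example inputs: fakeify them itself when it only needs metadata for compilation.

```
import torch
from torch._subclasses.fake_tensor import FakeTensorMode

def my_backend(gm: torch.fx.GraphModule, example_inputs):
    fake_mode = FakeTensorMode()
    # Reason about shapes/dtypes without peeking at real data.
    fake_inputs = [fake_mode.from_tensor(t) for t in example_inputs]
    # ... drive compilation from gm and fake_inputs; this sketch just falls back ...
    return gm.forward

compiled = torch.compile(lambda x: x.sin(), backend=my_backend)
```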

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99320
Approved by: https://github.com/voznesenskym
2023-04-19 12:15:52 +00:00
PyTorch MergeBot
ea50d4f146 Revert "Switch calling convention back to real tensors (#99320)"
This reverts commit 780922c24e.

Reverted https://github.com/pytorch/pytorch/pull/99320 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-04-19 09:44:06 +00:00
Nikita Shulga
8a89eec2f8 [BE] Do not use unicode quotes (#99446)
They are mostly used in commented code examples, but even Python-3.12
does not recognize `“foobar”` as a valid string literal

I.e. just `s/[“”]/"/`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99446
Approved by: https://github.com/huydhn, https://github.com/ezyang
2023-04-18 22:59:56 +00:00
Edward Z. Yang
780922c24e Switch calling convention back to real tensors (#99320)
Months ago, in order to get dynamic shapes working through to Dynamo backends, we changed the calling convention to pass fake tensors rather than real tensors as example inputs to backends. The motivation at the time was, well, backends shouldn't really be peeking at the real tensors when they are doing compilation, and so it would make more sense to hide the real tensors from backends. But there were a bunch of problems:

* This interacted poorly with our accuracy minifier design: accuracy minifier needs access to the real inputs in order to run the model and figure out what happens!
* The TensorRT backend required real inputs and we never figured out how to fix it.
* In practice, all the backends needed to detect if they were passed real tensors, and fakeify them anyway (certainly AOTAutograd does this)
* Parameters and inputs are treated non-uniformly: parameters had to be passed as real tensors, because CUDA graphs requires knowing what the actual tensors are

Furthermore, there were some more problems discovered after the fact:

* Backends may want to optimize on aspects of tensors which you cannot tell without having real tensors; e.g., alignment of the data pointer

So, this PR decides that changing the calling convention was a bad idea, and switches back to passing real tensors. There is a problem though: AOTAutograd will perform fakeification, which means that in practice backends are still going to end up with fake tensors in the end anyway. I want to change this, but this will require some work with bdhirsh's upcoming AOTAutograd export refactor.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99320
Approved by: https://github.com/voznesenskym
2023-04-18 02:09:57 +00:00
Edward Z. Yang
a109453df4 Delete use_functionalize feature flag (#99317)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99317
Approved by: https://github.com/voznesenskym
2023-04-18 02:09:57 +00:00
Edward Z. Yang
17d7be68ee Delete functorch use_fake_tensor and debug_fake_cross_ref (#99314)
Using fake tensor with AOTAutograd is now mandatory, simplifying our
logic.  Unfortunately, this means debug_fake_cross_ref must go,
but I don't think anyone has used it recently.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99314
Approved by: https://github.com/eellison, https://github.com/zou3519
2023-04-18 02:09:54 +00:00
Li-Huai (Allan) Lin
6f181aae7c [vmap] Register decomposition for huber_loss_backward (#99236)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99236
Approved by: https://github.com/kshitij12345
2023-04-16 18:50:45 +00:00
Animesh Jain
fdbc8625a1 Functionalization of torch.rand/rand_like ops (#97377)
This PR introduces the functionalization of RNG ops. Key points are

* Introduces a new `philox_rand` prim operator that accepts seed, offset.
* Adds decompositions for random operators that use these philox_rand prims
* Adds a PhiloxStateTracker to track the offset for each occurrence of rand ops
* Changes the calling convention of AOT Autograd and adds <fwd_seed, fwd_base_offset> and <bwd_seed, bwd_base_offset>
* Monkeypatches set_rng_state and get_rng_state while AOT Autograd is tracing to record the rng state behavior
* Raises an assertion for CPU because CPU does not use Philox RNG.

Not dealt with in this PR
* dropout op - offset calculation is different
* other distributions like normal, poisson etc
* Inductor support
* Cudagraph support
* Dynamic shape support

An example
~~~

class Custom(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        a = torch.rand_like(x) * x
        a = torch.rand_like(x) * a
        return a

    @staticmethod
    def backward(ctx, grad_out):
        x, = ctx.saved_tensors
        return grad_out * torch.rand_like(grad_out) * torch.cos(x)

====== Forward graph 0 ======
def forward(self, fwd_seed_1: i64[], fwd_base_offset_1: i64[], primals_1: f32[16, 16]):
    # No stacktrace found for following nodes
    add: i64[] = torch.ops.aten.add.Tensor(fwd_base_offset_1, 0)
    philox_rand: f32[16, 16] = torch.ops.prims.philox_rand.default([16, 16], fwd_seed_1, add, [16, 1], device(type='cuda', index=0), torch.float32);  add = None
    mul: f32[16, 16] = torch.ops.aten.mul.Tensor(philox_rand, primals_1);  philox_rand = None
    add_1: i64[] = torch.ops.aten.add.Tensor(fwd_base_offset_1, 4);  fwd_base_offset_1 = None
    philox_rand_1: f32[16, 16] = torch.ops.prims.philox_rand.default([16, 16], fwd_seed_1, add_1, [16, 1], device(type='cuda', index=0), torch.float32);  fwd_seed_1 = add_1 = None
    mul_1: f32[16, 16] = torch.ops.aten.mul.Tensor(philox_rand_1, mul);  philox_rand_1 = mul = None
    return [mul_1, primals_1]

====== Backward graph 0 ======
def forward(self, bwd_seed_1: i64[], bwd_base_offset_1: i64[], primals_1: f32[16, 16], tangents_1: f32[16, 16]):
    # No stacktrace found for following nodes
    add_2: i64[] = torch.ops.aten.add.Tensor(bwd_base_offset_1, 0);  bwd_base_offset_1 = None
    philox_rand_2: f32[16, 16] = torch.ops.prims.philox_rand.default([16, 16], bwd_seed_1, add_2, [16, 1], device(type='cuda', index=0), torch.float32);  bwd_seed_1 = add_2 = None
    mul_2: f32[16, 16] = torch.ops.aten.mul.Tensor(tangents_1, philox_rand_2);  tangents_1 = philox_rand_2 = None
    cos: f32[16, 16] = torch.ops.aten.cos.default(primals_1);  primals_1 = None
    mul_3: f32[16, 16] = torch.ops.aten.mul.Tensor(mul_2, cos);  mul_2 = cos = None
    return [mul_3]

~~~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97377
Approved by: https://github.com/ezyang
2023-04-16 09:55:56 +00:00
Edward Z. Yang
039faf0dbf Add invariant that all symbolic shapes must be bound in graph (#99089)
Previously, we had a problem when partitioning forward-backward dynamic graphs, which is that we could end up with a backward graph that mentions a symbol in an input tensor (e.g., `f32[s0 + s1]`), but without this symbol being otherwise bound elsewhere. When this happens, we have no way of actually deriving the values of `s0` and `s1`. Our fix for this in https://github.com/pytorch/pytorch/pull/93059 was to just retrace the graph, so that s0 + s1 got allocated a new symbol s2 and everything was happy. However, this strategy had other problems, namely (1) we lost all information from the previous ShapeEnv, including guards and (2) we end up allocating a LOT of fresh new symbols in backwards.

With this change, we preserve the same ShapeEnv between forward and backwards. How do we do this? We simply require that every symbol which may be present inside tensors ALSO be a plain SymInt input to the graph. This invariant is enforced by Dynamo. Once we have done this, we can straightforwardly modify the partitioner to preserve these SymInts as saved for backwards, if they are needed in the backwards graph to preserve the invariant as well.

This apparently breaks yolov3, but since everything else is OK I'm merging this as obviously good and investigating later.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99089
Approved by: https://github.com/voznesenskym
2023-04-16 01:48:19 +00:00
Brian Hirsh
7cb581d42f aot_autograd: more logging on metadata asserts (#99177)
Summary: add better logging to aot autograd asserts to debug internal model issues

Test Plan: let CI run

Differential Revision: D45006044

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99177
Approved by: https://github.com/bertmaher
2023-04-15 00:22:00 +00:00
Brian Hirsh
670c5cf962 AOTAutograd: fix 'Trying to backward through the graph a second time' error (#98960)
Fixes https://github.com/pytorch/pytorch/issues/97745. See discussion and comment in the PR for more details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98960
Approved by: https://github.com/bertmaher, https://github.com/albanD
2023-04-13 10:25:07 +00:00
Jason Ansel
8a6dd0dc97 Disable logging in pattern matcher calls to AotAutograd (#98936)
Fixes #98778

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98936
Approved by: https://github.com/wconstab
2023-04-13 04:51:08 +00:00
Andrew Gu
3c5a825f3c [AOTAutograd] Fix is-duplicate check in de-dup guard logic (#98932)
**Context**
The existing check to see if an arg is duped is `if dupe_arg_pos != kept_pos:`. However, this incorrectly considers every arg after a true duped arg to also be a duped arg.

Consider `flat_args = [a, b, b, c]`, where indices `1` and `2` are duped.
- `add_dupe_map = {0: 0, 1: 1, 2: 1, 3: 2}`
- For `dupe_arg_pos=2, kept_pos=1`, `2 != 1`, so the check correctly identifies the second `b` to be a duped arg.
- For `dupe_arg_pos=3, kept_pos=2`, `3 != 2`, so the check incorrectly identifies the `c` to be a duped arg.

Indeed, if there were more args like `[a, b, b, c, d, e, ...]`, every arg after the second `b` will be considered a duped arg since its `kept_pos` will always be 1 lower than its `dupe_arg_pos`.

**Overview**
This PR changes `add_dupe_map` to be implemented as a `List[int]`, where the list index implicitly represents the `dupe_arg_pos` and the list element represents the `kept_pos`. We use a list to have stable in-order iteration and because we know the keys to be in `{0, 1, ..., len(flat_args) - 1}`.

With `add_dupe_map` as a list, the `is_dupe_arg` condition is whether the entry in `add_dupe_map` shows a new not-yet-seen index in the iteration. One way to do this is to count the number of unique args so far and compare against that.
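
A hedged sketch of that check (illustrative names, not the PR's exact code): build `add_dupe_map` as a list and flag an arg as a dupe only when its `kept_pos` points at an already-seen unique arg.

```
def compute_dupe_flags(flat_args):
    kept_pos_of_id = {}
    add_dupe_map = []
    for arg in flat_args:
        kept_pos_of_id.setdefault(id(arg), len(kept_pos_of_id))
        add_dupe_map.append(kept_pos_of_id[id(arg)])

    dupe_flags = []
    unique_seen = 0
    for kept_pos in add_dupe_map:
        is_dupe = kept_pos < unique_seen  # kept index already encountered
        dupe_flags.append(is_dupe)
        if not is_dupe:
            unique_seen += 1
    return add_dupe_map, dupe_flags

a, b, c = object(), object(), object()
# For [a, b, b, c]: only index 2 is a dupe; the old `dupe_arg_pos != kept_pos`
# check also (incorrectly) flagged index 3 (the `c`).
assert compute_dupe_flags([a, b, b, c]) == ([0, 1, 1, 2], [False, False, True, False])
```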

This closes https://github.com/pytorch/pytorch/issues/98883, where now the guards change from
```
GUARDS ___guarded_code.valid
and ___check_type_id(L['self'], 93996836333040)
and ___check_obj_id(L['self'], 140119034997536)
and not ___are_deterministic_algorithms_enabled()
and ___check_tensors(L['x'])
and L['self']._buf is L['self']._buf_module._buf
and L['self']._buf_module._buf is L['self']._param
```
to without the final incorrect `L['self']._buf_module._buf is L['self']._param` guard.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98932
Approved by: https://github.com/ezyang
2023-04-12 22:22:50 +00:00
Das, Barun
a38ff4cfd1 documentation update (#98782)
Change `parameters_and_buffers` to `parameter_and_buffer_dicts` in the function docstring.

Fixes #98766

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98782
Approved by: https://github.com/ngimel, https://github.com/kit1980
2023-04-12 20:34:30 +00:00
Edward Z. Yang
822464567f Lazily format graphs for debug printing (#98776)
The current code unconditionally formats the graphs, which is a
waste of CPU if no one looks at them.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98776
Approved by: https://github.com/albanD, https://github.com/mlazos
2023-04-10 22:41:33 +00:00
Edward Z. Yang
02cff64784 Assert that there are not duplicate sources for distinct arguments (#98738)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98738
Approved by: https://github.com/albanD, https://github.com/bdhirsh
2023-04-10 19:32:08 +00:00
Liao, Xuan
95621b3c2e [aot] fix disable amp for runtime wrapper (#97864)
For the current runtime wrapper in aot, `disable_amp` is always set to True. In fact, we would like to avoid disabling autocast if possible because accessing TLS is slow. In this PR, `disable_amp` depends on whether any autocast is enabled instead of always being True. Many operators get a performance improvement (inductor vs. eager) with this fix.
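
A minimal sketch of the idea (the internal helper names here are assumptions, not the PR's exact code): only pay for the autocast-disabling guard when some autocast mode is actually enabled.

```
import torch

def make_runtime_wrapper(compiled_fn):
    def wrapper(*args):
        # Touch autocast TLS only when autocast is actually on.
        if torch._C._is_any_autocast_enabled():
            guard = torch._C._DisableAutocast()  # RAII-style guard object
            try:
                return compiled_fn(*args)
            finally:
                del guard
        return compiled_fn(*args)
    return wrapper
```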

Example operators whose speedup in torchbench (inductor vs. eager) was below 0.8:

op | current | new
-- | -- | --
aten.hardsigmoid.default | 0.709372349 | 0.81414306
aten.tanh.default | 0.715227805 | 0.855556349
aten.add.Scalar | 0.682292123 | 0.860371222
aten.sigmoid_backward.default | 0.688039934 | 0.915606579

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97864
Approved by: https://github.com/EikanWang, https://github.com/jansel, https://github.com/jgong5, https://github.com/bdhirsh
2023-04-10 05:00:12 +00:00
Jason Ansel
b9d3b3f595 Improve support for contextlib.nullcontext (#98111)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98111
Approved by: https://github.com/anijain2305
2023-04-02 02:33:14 +00:00
Michael Lazos
ee9a9b7add Remove old logging callsites (#98095)
Get around GH first issue, OSS only changes for https://github.com/pytorch/pytorch/pull/97182

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98095
Approved by: https://github.com/anijain2305
2023-04-01 00:57:37 +00:00
Brian Hirsh
864ab93656 aot_autograd: avoid using intermediate_base logic unnecessarily (#97786)
fixes https://github.com/pytorch/pytorch/issues/97691, see the issue for the proposed design. Now that we are employing AOTAutograd's "intermediate base" logic a lot less frequently, we might see some speedups in the benchmark suite.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97786
Approved by: https://github.com/jansel, https://github.com/soulitzer
2023-03-31 16:25:13 +00:00
Aaron Gokaslan
47dca20d80 [BE] Enable flake8-comprehension rule C417 (#97880)
Enables flake8-comprehension rule C417. Ruff autogenerated these fixes to the codebase.
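
For reference, the kind of rewrite C417 enforces is replacing `map()` with a lambda by a comprehension:

```
squares = list(map(lambda x: x * x, range(5)))  # flagged by C417
squares = [x * x for x in range(5)]             # preferred form
```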

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97880
Approved by: https://github.com/ezyang, https://github.com/kit1980, https://github.com/albanD
2023-03-30 14:34:24 +00:00
Jason Ansel
04ca3a289d Disable modes in preserve_rng_state (#97738)
This one allows make_fx to be called when already in a faketensor mode.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97738
Approved by: https://github.com/ezyang
2023-03-29 22:46:51 +00:00
Edward Z. Yang
8372c5dc68 Refactor dynamic dims api, stateless internals, higher level export API (#96699)
The purpose of this API is to execute a few large components of work:

1) Refactor all the internals of plumbing dynamic dimension information after dynamo to be stateless
2) Decouple allocation controls around dynamic dimensions from verification
3) For (2), for allocation, create an enum that dictates whether we are in DUCK (default today), STATIC (aka assume_static_default in the past), or DYNAMIC (aka user constrained, do not duck shape); a sketch of such an enum follows this list
4) For (2), for verification, we separate out the list of dynamic ranges entirely from allocation. This means the shape_env does no tracking of what we verify on; instead, it is the caller's job to invoke produce_guards() with the various things they want verified, specifically, with the valid ranges. We do use constrain ranges to refine value ranges when doing analysis.
5) We have decided, therefore, as an extension of (4), to double down on "late" checks versus "eager" checks, primarily because the mechanism for gathering what actually matters happens during guards, and should be the purview of the caller seeking guards, not the shape env. However, for dynamo, these structures are essentially one and the same.
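
A sketch of the allocation enum described in point (3); the name and values here are illustrative only, and the actual enum in PyTorch may differ.

```
from enum import Enum, auto

class DimAllocationPolicy(Enum):
    DUCK = auto()     # default: duck-shape, share symbols across equal sizes
    STATIC = auto()   # assume static (formerly assume_static_default)
    DYNAMIC = auto()  # user constrained; do not duck shape
```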

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96699
Approved by: https://github.com/avikchaudhuri, https://github.com/ezyang
2023-03-29 16:55:49 +00:00
Aaron Gokaslan
597b558c51 [BE]: Update flake8 and plugins and fix bugs (#97795)
Update flake8 and flake8-plugins in lintrunner to a modern version. Enables more checks and makes flake8 checks significantly faster. Added a few additional rule ignores that will need to be fixed in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97795
Approved by: https://github.com/alexsio27444, https://github.com/janeyx99, https://github.com/ezyang
2023-03-28 23:51:55 +00:00
kshitij12345
2b369eb3c2 [fix] jacrev and jacfwd : support non-tensor args again (#97746)
Fixes https://github.com/pytorch/pytorch/issues/97636

The code that checks whether argument tensors are complex assumed that all arguments are tensors (which is not the case), which led to the error.
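
A minimal sketch of the kind of call the fix supports again, where one argument is a plain Python int rather than a tensor:

```
import torch
from torch.func import jacrev

def f(x, scale: int):
    return (x * scale).sin()

x = torch.randn(3)
jac = jacrev(f, argnums=0)(x, 2)  # the non-tensor arg no longer trips the complex check
```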

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97746
Approved by: https://github.com/zou3519
2023-03-28 16:37:33 +00:00
Edward Z. Yang
6430dad700 Apparently aot_function doesn't cache anymore (#97610)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97610
Approved by: https://github.com/albanD
2023-03-27 21:07:20 +00:00
Michael Voznesensky
31e858e8fc Add missing aot_autograd_arg_pos_to_source (#97487)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97487
Approved by: https://github.com/malfet, https://github.com/ezyang
2023-03-24 05:17:59 +00:00
Edward Z. Yang
fa4c77e39b Rename PyOperator to HigherOrderOperator (#97493)
Twice this week I have had people confuse "operator defined with Python
operator registration aka torch.library" and "PyOperator which is used
to define control flow operators and other operators that cannot be
represented in JIT schema."  Renaming PyOperator for clarity.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97493
Approved by: https://github.com/SherlockNoMad
2023-03-24 05:04:02 +00:00
Elias Ellison
33dfdedb28 CUDAGraph Trees - Warn on dealloc (#97171)
Differential Revision: [D44228370](https://our.internmc.facebook.com/intern/diff/D44228370)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97171
Approved by: https://github.com/ezyang, https://github.com/jansel
2023-03-24 01:19:19 +00:00
Edward Z. Yang
37faa48844 DCE inference graphs too (#97275)
I added a bunch of asserts to verify that I didn't accidentally kill copy_ in the graph, hopefully this combined with our existing tests is good enough.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97275
Approved by: https://github.com/bdhirsh
2023-03-23 07:02:52 +00:00
Brian Hirsh
545abc292b [aot autograd] refactor to make functionalization self-contained (#96341)
This refactor should make it easier to add an export hook into aot autograd.

(1) I killed `create_forward_or_joint_functionalized()` (and the functions that it called, like `forward_or_joint()`) which used to handle autograd + functionalization all-in-one-go for the joint case, and was also used in the inference case.

I added a few separate helper functions:

`create_functionalized_graph()`: this takes a flat fn, and returns a functionalized fx graph. It is mostly just a thin wrapper around functionalization + make_fx(), but also has some extra logic to manually append `copy_()` ops to the end of the graph.

`fn_no_extra_mutations()`: this creates the fn that we want to trace in the inference code path. It takes in a function that it then calls, and returns the outputs + any (updated) mutated inputs.

`joint_fn_no_external_mutations()`: this creates the fn that we want to trace in the joint code path. It takes in a function, and traces out its joint. It also does the work of cloning inputs that are mutated and require gradients, returning mutated inputs as outputs, and returning intermediate bases as outputs

We should be able to add an export hook by basically adding a similar version of `joint_fn_no_external_mutations` but with a lot more restrictions (guaranteed to have no tangents, not synthetic bases, etc), and calling `create_functionalized_graph()` on it.

Differential Revision: [D44204090](https://our.internmc.facebook.com/intern/diff/D44204090)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96341
Approved by: https://github.com/ezyang
2023-03-22 21:41:52 +00:00
PyTorch MergeBot
a7856e18a7 Revert "DCE inference graphs too (#97275)"
This reverts commit aa3a57b80d.

Reverted https://github.com/pytorch/pytorch/pull/97275 on behalf of https://github.com/ezyang due to this broke a test
2023-03-22 18:55:52 +00:00
Elias Ellison
9c144bc4fe Dont increment generation if forward of backward exists, and warning on deallocation of live tensors (#97168)
Refining the logic for when it is okay to ignore previously live outputs from cudagraphs. If a forward has been invoked without invocation of the corresponding backward, don't allow overwriting outputs.

Differential Revision: [D44228369](https://our.internmc.facebook.com/intern/diff/D44228369)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97168
Approved by: https://github.com/ezyang, https://github.com/jansel
2023-03-22 18:27:36 +00:00
Horace He
e49b4d3827 Changed logging in aotautograd a little (#97289)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97289
Approved by: https://github.com/mlazos
2023-03-22 09:33:30 +00:00
Edward Z. Yang
aa3a57b80d DCE inference graphs too (#97275)
I added a bunch of asserts to verify that I didn't accidentally kill copy_ in the graph, hopefully this combined with our existing tests is good enough.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97275
Approved by: https://github.com/bdhirsh
2023-03-22 01:02:21 +00:00
Brian Hirsh
af440c427b [draft for discussion] add per-dispatch key modes (#97052)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97052
Approved by: https://github.com/ezyang, https://github.com/zou3519
2023-03-21 23:45:45 +00:00
Michael Voznesensky
f9ce593267 Extend aot autograd dedup guards to params, stop using positions (#96774)
The purpose of this PR is to remove reliance on argument positions in dedup guards, AND extend the functionality to params.

A version of this PR was stamped prior to https://github.com/pytorch/pytorch/pull/95831 - but was kinda gross, because it was based on an underlying PR that did way too much with source names.

This PR leaves most of that alone, in favor of just reusing the same name standardization logic that dynamo module registration does.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96774
Approved by: https://github.com/ezyang
2023-03-21 05:59:33 +00:00
Aaron Gokaslan
5471621497 [BE] Remove unnecessary dict comprehensions (#97116)
Removes unnecessary dict comprehensions, optimizing the creation of dicts from iterables.
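
An illustrative example of the cleanup (hedged; the actual touched call sites vary): a dict built from an iterable of pairs does not need a comprehension.

```
pairs = [("a", 1), ("b", 2)]
d = {k: v for k, v in pairs}  # unnecessary comprehension
d = dict(pairs)               # preferred
```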

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97116
Approved by: https://github.com/kit1980
2023-03-20 00:56:57 +00:00
Liao, Xuan
5ee5a164ff [aot] disable inference view tracking (#96478)
For inference, we should disable unnecessary view tracking to save time. Most operators get a performance improvement (inductor vs. eager). This PR fixes the general regression of operators for inductor.

Example of operators' speedup in torchbench (inductor vs. eager):

op | current | new
-- | -- | --
aten.hardsigmoid.default | [0.6426090814905988, 0.6791992931354925, 0.7046010955095103] | [0.7921782106271767, 0.8919522525991529, 0.9128089963571694]
aten.tanh.default | [0.6135534976747065, 0.7588851221588919, 0.898274076411234] | [0.857534066531159, 1.0524121834821605, 1.2535141671420165]
aten.floor.default | [0.6115868728087821, 0.6115868728087821, 0.6115868728087821] | [0.9472870784346195, 0.9472870784346195, 0.9472870784346195]
aten.exp.default | [0.7784016216625718, 0.9279358274876591, 1.1201178548406794] | [0.5777145055206203, 0.8610140436473923, 1.1850714193498957]
aten.mul_.Tensor | [0.14381872531802153, 0.14638969818507447, 0.14947766446663138] | [0.37695307573466363, 0.3832122689450142, 0.38963470437456904]
aten.hardtanh_.default | [0.49502896822398157, 0.5897512505705527, 0.8052969399847189] | [0.4915338157706071, 0.6098169585316151, 0.8587605051115021]
aten.relu_.default | [0.47776870021339685, 0.54452322796367, 0.6516167164223963] | [0.4764791289773786, 0.5608095328163419, 0.6753350976452626]

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96478
Approved by: https://github.com/EikanWang, https://github.com/jansel, https://github.com/jgong5, https://github.com/bdhirsh
2023-03-18 13:58:24 +00:00
Michael Lazos
a1c46e5f8f component-level configurable logging for dynamo, inductor, aot (#94858)
Summary:

Adds NNC-like logging that is configured through an env var `TORCH_LOGS`
Examples:
`TORCH_LOGS="dynamo,guards" python script.py` - prints dynamo logs at level INFO with guards of all functions that are compiled

`TORCH_LOGS="+dynamo,guards,graph" python script.py` - prints dynamo logs at level DEBUG with guards and graphs (in tabular format) of all graphs that are compiled

[More examples with full output](https://gist.github.com/mlazos/b17f474457308ce15e88c91721ac1cce)

Implementation:
The implementation parses the log settings from the environment, finds any components (aot, dynamo, inductor) or other loggable objects (guards, graph, etc.) and generates a log_state object. This object contains all of the enabled artifacts, and a qualified log name -> level mapping. _init_logs then adds handlers to the highest level logs (the registered logs), and sets any artifact loggers to level DEBUG if the artifact is enabled.

Note: set_logs is an alternative for manipulating the log_state, but if the environment contains TORCH_LOGS, the environment settings will be prioritized.
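
A short sketch of that programmatic alternative (the exact `set_logs` keyword arguments are an assumption here and may differ across versions):

```
import logging
import torch._logging

# Roughly equivalent to TORCH_LOGS="+dynamo,guards,graph"
torch._logging.set_logs(dynamo=logging.DEBUG, guards=True, graph=True)
```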

Adding a new log:
To add a new log, a dev should add their log name to torch._logging._registrations (there are examples there already).

Adding a new artifact:
To add a new artifact, a dev should add their artifact name to torch._logging._registrations as well.
Additionally, wherever the artifact is logged, `torch._logging.getArtifactLogger(__name__, <artifact_name>)` should be used instead of the standard logging implementation.

[design doc](https://docs.google.com/document/d/1ZRfTWKa8eaPq1AxaiHrq4ASTPouzzlPiuquSBEJYwS8/edit#)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94858
Approved by: https://github.com/ezyang
2023-03-18 04:17:31 +00:00
PyTorch MergeBot
b5ecf727be Revert "[aot autograd] refactor to make functionalization self-contained (#96341)"
This reverts commit 3cd9c7a16d.

Reverted https://github.com/pytorch/pytorch/pull/96341 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-03-17 09:24:05 +00:00
Brian Hirsh
e9d9151eec [aot autograd] avoid cloning some inputs unnecessarily when they dont require grad (#96342)
When constructing the joint graph, we normally have to clone any inputs that are mutated, so that we can pass in the original, pre-mutation inputs as leaves to autograd.

Previously, we were doing this for all mutated inputs - but we only need to do it for inputs that require gradients and participate in autograd.

Hopefully this should speed up code like batch norm - I think before this we were unnecessarily cloning the running stats during training.
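
A hedged sketch of the condition described above (illustrative, not the PR's exact code): clone a mutated input for the joint graph only if it participates in autograd.

```
import torch

def maybe_clone_for_joint(arg: torch.Tensor, is_mutated: bool) -> torch.Tensor:
    if is_mutated and arg.requires_grad:
        # Keep the pre-mutation value around as the autograd leaf.
        return arg.clone()
    return arg  # e.g. running stats in batch norm: mutated, but no grad needed
```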

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96342
Approved by: https://github.com/albanD, https://github.com/ezyang
2023-03-15 16:34:04 +00:00
Brian Hirsh
3cd9c7a16d [aot autograd] refactor to make functionalization self-contained (#96341)
This refactor should make it easier to add an export hook into aot autograd.

(1) I killed `create_forward_or_joint_functionalized()` (and the functions that it called, like `forward_or_joint()`) which used to handle autograd + functionalization all-in-one-go for the joint case, and was also used in the inference case.

I added a few separate helper functions:

`create_functionalized_graph()`: this takes a flat fn, and returns a functionalized fx graph. It is mostly just a thin wrapper around functionalization + make_fx(), but also has some extra logic to manually append `copy_()` ops to the end of the graph.

`fn_no_extra_mutations()`: this creates the fn that we want to trace in the inference code path. It takes in a function that it then calls, and returns the outputs + any (updated) mutated inputs.

`joint_fn_no_external_mutations()`: this creates the fn that we want to trace in the joint code path. It takes in a function, and traces out its joint. It also does the work of cloning inputs that are mutated and require gradients, returning mutated inputs as outputs, and returning intermediate bases as outputs

We should be able to add an export hook by basically adding a similar version of `joint_fn_no_external_mutations` but with a lot more restrictions (guaranteed to have no tangents, not synthetic bases, etc), and calling `create_functionalized_graph()` on it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96341
Approved by: https://github.com/ezyang
2023-03-15 16:34:04 +00:00
Brian Hirsh
8ce8d49cc4 aot autograd: consolidate metadata (#96340)
Another bonus of factoring the synthetic_base logic into one place: we used to have a `CompiledRuntimeMetadata` object that encapsulated `ViewAndMutationMeta`, plus a bunch of extra synthetic base metadata that was plumbed around. Now I can kill that first metadata object, and use `ViewAndMutationMeta` on its own everywhere.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96340
Approved by: https://github.com/ezyang
2023-03-15 13:45:45 +00:00
Brian Hirsh
070cefaef9 aot_autograd: dont requires_grad on tangents (#96339)
Ed pointed it out a few days ago - I probably added this mistakenly a few months ago. I can't think of any reason it's necessary, and removing it doesn't cause any tests to fail.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96339
Approved by: https://github.com/ezyang
2023-03-15 13:45:45 +00:00
Brian Hirsh
a269469982 aot autograd refactor: make all synthetic base logic layered in a single location (#96235)
This  refactor doesn't significantly change LoC in aot autograd, but I think this nets out to making it clearer (interested in peoples' thoughts).

The idea is that I tried to re-write the part of aot autograd that deals with synthetic bases in a layered way, similar to how Ed wrote the logic for dedup'ing inputs: it happens in one place, and all of the downstream transformation in aot autograd don't have to worry about it.

Specifically, I added a new function `aot_wrapper_synthetic_base`, similar to the existing `aot_wrapper_dedupe`.

The benefit: none of the other code in aot autograd needs to think about synthetic bases (previously, synthetic base code was intertwined in several places).

The downsides: there are two.

(1) `aot_wrapper_synthetic_base()` needs to have its own epilogue. There is one particularly hairy case, where factoring the synthetic base logic to a single location was painful: If you have two inputs that alias each other, where one gets a data mutation, and the other gets a metadata mutation.

Ordinarily, metadata mutations are handled by the runtime epilogue, in `create_runtime_wrapper`. However, now that things are factored this way, the runtime wrapper operates only on synthetic bases instead of operating on the original inputs. For data mutations, it is fine to apply the data mutation to the synthetic base instead of the original input alias. But for metadata mutations, we **need** to apply the metadata mutation directly to the original inputs.

The way that I handled this was by tracking which inputs slot into this specific case (part of a synthetic base, and get metadata mutations), and updating the flat_fn() that we pass downstream to return these updated inputs as extra outputs. From the perspective of downstream logic, these are real user outputs that it can treat like any other user outputs. `aot_wrapper_synthetic_base` will know to grab these extra outputs and use them to apply the metadata mutations.

This was pretty annoying, but has the benefit that all of that logic is encapsulated entirely in `aot_wrapper_synthetic_base()`.

(2) input mutations are now performed on the synthetic base instead of the individual aliases.

You can see the original code comment [here](b0b5f3c6c6/torch/_functorch/aot_autograd.py (L1131)) for details. We used to do the optimized thing in this case, and now we do the less optimized thing (copying the entire synthetic base, instead of the potentially smaller alias).

To be fair, we had no data showing that this optimization was showing improvements on any models in practice. I also think that the main reason anyone would ever run across this problem is because of a graph break - so if you care about perf, you probably want to avoid the extra graph breaks to begin with. I haven't added any warnings for this, but we probably could depending on what people think.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96235
Approved by: https://github.com/ezyang
2023-03-15 13:45:43 +00:00
Brian Hirsh
7a076b7b93 [aot_autograd] only perform functionalization analysis pass once (#95992)
For a while now, we've been re-running our functionalization analysis pass twice - once to get metadata when dedup'ing, and an entire second time during aot_dispatch_base/autograd.

This should also probably speed up compile times pretty noticeably, since we're going from:

(a) inference-only trace case: 3 fw traces -> 2 fw traces
(b) autograd trace case: 2 fw traces + 1 joint trace -> 1 fw trace + 1 joint trace

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95992
Approved by: https://github.com/ezyang
2023-03-15 13:45:40 +00:00
Edward Z. Yang
02f6d14b97 Only allow SymInt across partitioner boundaries, and fixes (#96653)
This PR does a few things all at once, as I needed to fix several bugs on the way here. The main goal of the PR is to fix the `'float' object has no attribute '_has_symbolic_sizes_strides'` error. The general idea is to heavily penalize non-SymInt but still SymNode cuts in the graph. This doesn't work for the default partitioner, so essentially, dynamic shapes with the default partitioner is not supported.

While doing this, I had a fix a few other bugs in the partitioner:
* SymNode operations weren't considered recomputable. But they are very cheap, go wild.
* zeros_like wasn't considered recomputable, and this prevented some gradient formulas (e.g., for angle with real inputs) from successfully finding a cut at all
* AOTAutograd tests use the default partitioner. I switch them to use min-cut partitioner...
* ...but this reveals a bug where if we have nodes in backward outputs that don't depend on tangents, they never get assigned to the backward graph. I fix this by making it mandatory for backward outputs to be assigned to the backward graph. I have to be careful to filter out None backward outputs; those never participate in flow analysis!

This causes some wobbling for the min-cut tests, but these seem legitimate: since we're now willing to recompute, the partitioner can reduce the number of SymInts it transmits by just doing some recompute in the backend.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96653
Approved by: https://github.com/ngimel
2023-03-14 18:30:56 +00:00
Edward Z. Yang
30c4ea138f Assert that there are no None arguments to backwards (#96300)
This assert would have caught https://github.com/pytorch/pytorch/pull/96219

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96300
Approved by: https://github.com/bdhirsh
2023-03-09 22:14:39 +00:00
Edward Z. Yang
e9e6b3b6c5 [EASY] Add complex dtypes to partitioner (#96297)
Also, delete some redundant dtype setting.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96297
Approved by: https://github.com/Chillee
2023-03-08 21:08:26 +00:00
Horace He
5bbec680d7 Fix usages of contextmanager without finally (#96170)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96170
Approved by: https://github.com/ngimel, https://github.com/malfet
2023-03-08 20:59:27 +00:00
Brian Hirsh
98ece75043 [aot autograd] merge all outputs of funtionalization analysis into single metadata (#95991)
This makes the next PR in the stack cleaner: having the top level entry point to aot autograd perform the functionalization analysis pass once, and plumb the metadata everywhere else that we need it.

I put it in a separate PR because I recently learned that this function is used in fbcode, so I'll need to fix up internals when I land this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95991
Approved by: https://github.com/ezyang
2023-03-08 16:22:54 +00:00
Brian Hirsh
29b216acd5 aot autograd: handle detach() and no_grad() mutations on input (#95980)
Fixes https://github.com/pytorch/pytorch/issues/95167

More details are in that issue. To summarize, the issue shows up when we have some code like this:

```
def f(x):
    x.detach().mul_(2) # can also happen if the mul_() happens under torch.no_grad()
    return x + 1
```

AOTAutograd will then spit out code like this:
```
def compiled_fn(x):
    x_updated = x.mul(2)
    out = x_updated + 1
    return x_updated, out

def CompiledFunction.forward(x):  # pseudocode, this is part of an autograd.Function
    x_updated, out = compiled_fn(x)
    return x_updated, out

def runtime_wrapper(x):
    x_updated, out = CompiledFunction.apply(x)
    x.copy_(x_updated)
    return out

x = torch.ones(2, requires_grad=True)
out = runtime_wrapper(x)
```

However, the call to `x.copy_(x_updated)` will fail with the error: `a leaf Variable that requires grad is being used in an in-place operation`. This is because `x` is an autograd leaf, and autograd doesn't allow you to mutate leaves.

In this case though, the data mutation should be entirely opaque to autograd - all mutations happened underneath a `.detach()` or a `torch.no_grad()`.

As Ed pointed out in the issue, we can detect this situation by checking if the mutated input is an autograd leaf. If it is, then it must have been the case that any mutations on it must have been hidden from autograd, since otherwise the eager code would have error'd. The solution I added is to detect this situation, and manually run `x.detach().copy_(x_updated)`, to hide the update from autograd.
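
A sketch of that workaround (illustrative shape, not the exact runtime-wrapper code): route the copy through `.detach()` when the mutated input is an autograd leaf that requires grad.

```
import torch

def apply_input_mutation(x: torch.Tensor, x_updated: torch.Tensor) -> None:
    if x.requires_grad and x.is_leaf:
        # The eager mutation was hidden from autograd, so hide the replay too.
        x.detach().copy_(x_updated)
    else:
        x.copy_(x_updated)
```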

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95980
Approved by: https://github.com/ezyang
2023-03-08 16:11:06 +00:00
Brian Hirsh
f96bd52841 aot autograd: dont allow symint outputs to get tangents in the bw graph (#96219)
Previously, if dynamic shapes were turned on and we had a forward graph that returns a symint, then we would generate a backward graph that takes in a tangent input for that symint fwd output. This causes problems for downstream - inductor will see an input that it expects to be a symint, but it gets a `None` from autograd.

Confirmed that this repro now passes:
```
benchmarks/dynamo/torchbench.py --devices cuda --inductor --dynamic-shapes --unspecialize-int --accuracy --training --only drq
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96219
Approved by: https://github.com/ezyang
2023-03-08 13:02:34 +00:00
Edward Z. Yang
6fff232280 Delete torch._functorch.config.use_dynamic_shapes (#96102)
As requested in
https://github.com/pytorch/pytorch/pull/95975#discussion_r1124837162

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96102
Approved by: https://github.com/Skylion007
2023-03-06 18:50:20 +00:00
Edward Z. Yang
af8dbe7ec2 Fix training enablement in AOTAutograd (#95975)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95975
Approved by: https://github.com/ngimel, https://github.com/voznesenskym
2023-03-05 04:28:29 +00:00
Edward Z. Yang
20dfce591c Add support for Inductor + symbolic shapes + training (#93059)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93059
Approved by: https://github.com/ezyang
2023-02-28 22:44:31 +00:00
Edward Z. Yang
58648822b6 Handle int/float arguments for cpp codegen in inductor (#95533)
This is a little questionable because we don't actually know what the dtype of the sympy expression is, and it's not clear we can rely on the assumptions.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95533
Approved by: https://github.com/ngimel, https://github.com/jansel
2023-02-28 03:57:35 +00:00
Sebastian Raschka
97ec340fe9 Fix double-a typo (#95470)
Fixes a typo where there was a repeated "a" in a warning message.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95470
Approved by: https://github.com/ezyang
2023-02-27 18:54:43 +00:00
Brian Hirsh
bc51ee4ed7 fix spurious aot autograd warning (#95521)
The _make_boxed logic probably needs a cleanup, but this fixes a spurious warning that we should get in before the release.

Confirmed that this used to emit a warning and no longer does:
```
import torch

lin = torch.nn.Linear(100, 10)
def f(x):
    return lin(x)

opt_f = torch.compile(f)
opt_f(torch.randn(10, 100, requires_grad=False))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95521
Approved by: https://github.com/ngimel
2023-02-26 18:17:36 +00:00
Brian Hirsh
a641d60757 hotfix for memory leak in aot autograd induced by saving tensors for backward (#95101)
Workaround fix in AOTAutograd for https://github.com/pytorch/pytorch/issues/94990 (see the comments for more details / discussion)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95101
Approved by: https://github.com/albanD
2023-02-24 03:02:55 +00:00
Edward Z. Yang
89e16c4f18 Assume sympy is always installed (#94903)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94903
Approved by: https://github.com/Skylion007, https://github.com/malfet
2023-02-16 14:09:58 +00:00
Kshiteej K
3fc4bc115f [functorch] jacrev, jacfwd error for complex input or output (#94805)
Related: https://github.com/pytorch/pytorch/issues/94397, https://github.com/pytorch/pytorch/issues/94397#issuecomment-1428452756
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94805
Approved by: https://github.com/lezcano
2023-02-14 16:13:37 +00:00
Brian Hirsh
ce474bc643 fix view + detach graph case for inductor (#94744)
fixes https://github.com/pytorch/pytorch/issues/94175

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94744
Approved by: https://github.com/ezyang
2023-02-14 01:35:23 +00:00
Brian Hirsh
ceb0f1576b turn functionalization on in aot_autograd inference (#92857)
still waiting for CI fallout
fixes #90759

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92857
Approved by: https://github.com/ezyang
2023-02-13 17:48:00 +00:00
Aaron Gokaslan
67d9790985 [BE] Apply almost all remaining flake8-comprehension checks (#94676)
Applies the remaining flake8-comprehension fixes and checks. This change replaces all remaining unnecessary generator expressions with list/dict/set comprehensions, which are more succinct, performant, and better supported by our torch.jit compiler. It also removes useless generators such as `set(a for a in b)`, resolving them into just the set call.
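
For reference, the `set(a for a in b)` case mentioned above resolves to a plain constructor call:

```
words = ["a", "b", "a"]
unique = set(w for w in words)  # useless inner generator
unique = set(words)             # preferred
```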

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676
Approved by: https://github.com/ezyang
2023-02-12 01:01:25 +00:00
Aaron Gokaslan
3d82d8d0ed [BE] Enable more flake8-comprehensions checks (#94601)
I applied some flake8 fixes and enabled checking for them in the linter. I also enabled some checks for my previous comprehensions PR.

This is a follow up to #94323 where I enable the flake8 checkers for the fixes I made and fix a few more of them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94601
Approved by: https://github.com/ezyang
2023-02-10 23:40:29 +00:00
Xuehai Pan
5b1cedacde [BE] [2/3] Rewrite super() calls in functorch and torch (#94588)
Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.

- #94587
- #94588
- #94592

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```

Some cases where the rewrite would change the semantics are kept unchanged. E.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94588
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-10 21:16:33 +00:00
kshitij12345
4f3858c6d8 [functorch] linearize (#94173)
Fixes https://github.com/pytorch/functorch/issues/724

TODO:
* [x] Docs

NOTE: `const_fold` pass raises UserWarning -> https://github.com/pytorch/pytorch/issues/94374
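
A minimal usage sketch of the new API (based on the `torch.func.linearize` signature): compute f(x) once and reuse the linearization for many JVPs.

```
import torch
from torch.func import linearize

x = torch.randn(3)
out, jvp_fn = linearize(torch.sin, x)
tangent_out = jvp_fn(torch.randn(3))  # JVP at x without re-tracing the function
```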

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94173
Approved by: https://github.com/Chillee
2023-02-09 15:45:08 +00:00
PyTorch MergeBot
e0e4f1a890 Revert "[functorch] linearize (#94173)"
This reverts commit b6b9e1e6e0.

Reverted https://github.com/pytorch/pytorch/pull/94173 on behalf of https://github.com/kshitij12345 due to Broke lint runner
2023-02-09 09:22:39 +00:00
Kshiteej K
b6b9e1e6e0 [functorch] linearize (#94173)
Fixes https://github.com/pytorch/functorch/issues/724

TODO:
* [x] Docs

NOTE: `const_fold` pass raises UserWarning -> https://github.com/pytorch/pytorch/issues/94374

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94173
Approved by: https://github.com/Chillee
2023-02-09 08:57:05 +00:00
Edward Z. Yang
c028fc4e25 Decouple PT2 dynamic shapes from the functorch setting (#94469)
The functorch setting still exists, but it is no longer necessary:
we infer use of the Python dispatcher by checking whether the ambient
FakeTensorMode has a ShapeEnv.  The setting now only controls direct
AOTAutograd use; for PT2, it's sufficient to use
torch._dynamo.config.dynamic_shapes.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94469
Approved by: https://github.com/Chillee, https://github.com/voznesenskym, https://github.com/jansel
2023-02-09 06:41:41 +00:00
Edward Z. Yang
dc70b00d0b Track and record hint on SymNode and use when possible (#94201)
Historically, we worked out `size_hint` on the fly by doing a substitution on the sympy expression with the `var_to_val` mapping. With this change, we also maintain the hint directly on SymNode (in `expr._hint`) and use it in lieu of Sympy substitution when it is available (mostly guards on SymInt, etc.; in particular, in idiomatic Inductor code, we typically manipulate Sympy expressions directly and so do not have a way to conveniently maintain hints).

While it's possible this will give us modest performance improvements, this is not the point of this PR; the goal is to make it easier to carefully handle unbacked SymInts, where hints are expected not to be available. You can now easily test if a SymInt is backed or not by checking `symint.node.hint is None`.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94201
Approved by: https://github.com/voznesenskym
2023-02-09 00:00:44 +00:00
Xuehai Pan
b8de1cf007 [functorch][nn] Refactor NN stateless APIs by swapping module tensors (#92536)
- Fixes #92295
- Resolves #86708
- Resolves #92153
- Closes #92401
- Closes #92218

- Requires #91579

Refactor NN stateless APIs by swapping module tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92536
Approved by: https://github.com/jbschlosser
2023-02-08 17:31:38 +00:00
Brian Hirsh
83275d8cdf add torch.autograd._set_view_replay_enabled, use in aot autograd (#92588)
tldr; this should fix some minor perf regressions that were caused by adding more as_strided() calls in aot autograd.

This PR adds a new context manager, `torch.autograd._set_view_replay_enabled()`.

Context: AOT Autograd has special handling for "outputs that alias graph intermediates". E.g. given this function:

```
def f(x):
    y = torch.mul(x, 2)
    out = y.view(-1)
    return out
```

AOT Autograd will do the following:

```
def fn_to_compile(x):
    y = torch.mul(x, 2)
    out = y.view(-1)
    # return the graph intermediate
    return y, out

compiled_fn = compile(fn_to_compile)

def wrapper(x):
    y, out = compiled_fn(x)
    # regenerate the alias of the graph intermediate
    return out._view_func(y)
```

What's annoying is that `out._view_func()` will result in a `.as_strided` call, because `out` is an ordinary runtime tensor. This (likely?) caused a perf regression, because when running the backward, our `as_strided_backward()` is slower than our `view_backward()`.

In this PR, I added some TLS for instructing autograd to do view replay instead of as_strided, even when given a normal tensor. I'm definitely interested in thoughts from autograd folks (cc @albanD @soulitzer). A few points that I want to bring up:

(1) One reason that this API seems generally useful to me is because of the case where you `torch.compile()` a function, and you pass in two inputs that alias each other, and mutate one of the inputs. Autograd is forced to add a bunch of as_strided() calls into the graph when this happens, but this would give users an escape hatch for better compiled perf in this situation

(2) To be fair, AOT Autograd probably won't need this TLS in the long term. There's a better (more complicated) solution, where AOT Autograd manually precomputes the view chain off of graph intermediates during tracing, and re-applies them at runtime. This is kind of complicated though and feels lower priority to implement immediately.

(3) Given all of that I made the API private, but lmk what you all think.

This is a followup of https://github.com/pytorch/pytorch/pull/92255.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92588
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-08 01:48:32 +00:00
Aaron Gokaslan
8fce9a09cd [BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308)
Apply parts of pyupgrade to torch (starting with the safest changes).
This PR only does two things: removes the need to inherit from object and removes unused future imports.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-07 21:10:56 +00:00
Eli Uriegas
567e6152da Revert "[inductor] fix crash issue when input is a view tensor (#90150)" (#94329)
Had to provide a merge conflict resolution due to conflicts with https://github.com/pytorch/pytorch/pull/94118

This was causing issues with internal tests that look similar to:
```
in clone_preserve_strides
    x.size(), x.stride(), x.storage_offset()
AttributeError: 'KeyedJaggedTensor' object has no attribute 'size'
```

See https://fburl.com/testinfra/nc0du2sp for more information

This reverts commit #90150

@jansel can you help @blzheng with re-landing this as a co-development diff?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94329
Approved by: https://github.com/jansel
2023-02-07 20:45:58 +00:00
Edward Z. Yang
170a3e0257 Enable Python dispatcher on inference-only aot_dispatch_base (#94118)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94118
Approved by: https://github.com/voznesenskym
2023-02-04 06:10:21 +00:00
Elias Ellison
e4f11e01bd [Fake Tensor] Allow fake meta by default, delete unused ctor args (#93993)
Two small changes that I'm bundling together because one of them needs to touch fbcode and I'm not sure how to do stacked diffs + internal changes + land before release cut.

Remove allow_meta from ctor, and allow by default: we should be able to trace through meta with fake tensors, so in some senses it's a bit weird to expose an option for the user to disallow this. However, it's still useful debug-wise to error from time to time, so I've added an option to the config that will get back the previous behavior.

Remove `throw_on_data_dependent_ops=True`: this was intended as a temporary behavior while we were smoothing things over as we turned on the erroring. There are no uses anywhere of `throw_on_data_dependent_ops=False` I could find.

These are technically backward-incompatible, but fake tensor is new since the last release / in a private namespace, and I don't want to release it with baggage that would be hard to remove later.

Fix for https://github.com/pytorch/pytorch/issues/92877.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93993
Approved by: https://github.com/bdhirsh, https://github.com/ezyang
2023-02-03 09:23:38 +00:00
blzheng
a71395dd88 [inductor] fix crash issue when input is a view tensor (#90150)
Fix the crash failure mentioned in https://github.com/pytorch/pytorch/issues/93460

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90150
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-02-03 04:54:14 +00:00
PyTorch MergeBot
5d259425fc Revert "[inductor] fix crash issue when input is a view tensor (#90150)"
This reverts commit b11ec270ba.

Reverted https://github.com/pytorch/pytorch/pull/90150 on behalf of https://github.com/clee2000 due to failing test_inplace_unsqueeze3 (__main__.CPUReproTests) https://github.com/pytorch/pytorch/actions/runs/4074618739/jobs/7020199369 b11ec270ba, marking as landrace cuz all jobs are green on pr
2023-02-02 17:06:34 +00:00
blzheng
b11ec270ba [inductor] fix crash issue when input is a view tensor (#90150)
Fix the crash failure mentioned in https://github.com/pytorch/pytorch/issues/93460

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90150
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-02-02 12:49:26 +00:00
Jason Ansel
23d58fedb1 Use ConfigModule for _functorch.config (#93375)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93375
Approved by: https://github.com/Chillee
2023-02-02 00:31:24 +00:00
sli
524ee07143 Fix https://github.com/pytorch/pytorch/issues/92377 (#92379)
Fixes #92377

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92379
Approved by: https://github.com/Chillee
2023-01-31 02:22:16 +00:00
Sherlock Huang
36fe31f537 [Reland] Refactor stack_trace preservation for node meta preservation (#90803) (#92400)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90803
Approved by: https://github.com/jerryzh168, https://github.com/albanD
ghstack-source-id: 5848cca08ef5d6f8868f4f79d8bc29711e9a52c2

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92400
Approved by: https://github.com/jerryzh168
2023-01-30 23:30:43 +00:00
Michael Voznesensky
4ca511c69e Fix positional issues in dedup guards (#93137)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93137
Approved by: https://github.com/bertmaher, https://github.com/wconstab, https://github.com/bdhirsh
2023-01-28 19:21:32 +00:00
soulitzer
f6f46ba3bb [Reland] aot autograd explicitly errors on double backward (#92893)
This reverts commit fb980581a7.

Testing: `python benchmarks/dynamo/timm_models.py  --float32 --training --only=mobilevit_s --performance --inductor --disable-cudagraphs`

```
main:               memory: eager: 12.30 GB, dynamo: 12.28 GB, ratio: 1.00
+ #90896 reverted:  memory: eager: 12.30 GB, dynamo: 8.81 GB, ratio: 1.40
+ this PR:          memory: eager: 12.30 GB, dynamo: 8.81 GB, ratio: 1.40
```

For comparison, if we apply old version of this PR instead:
```
main:
+ #90896 reverted:         memory: eager: 12.30 GB, dynamo: 8.81 GB, ratio: 1.40
+ old version of this PR   memory: eager: 12.30 GB, dynamo: 10.36 GB, ratio: 1.19
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92893
Approved by: https://github.com/bdhirsh
2023-01-26 23:19:27 +00:00
Nikita Shulga
b2f3ff6183 [Py3.11] Remove skip logic from vmap and forward_ad (#91825)
Depends on https://github.com/pytorch/pytorch/pull/91805

Fixes https://github.com/pytorch/pytorch/issues/85506
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91825
Approved by: https://github.com/albanD
2023-01-25 22:40:56 +00:00
Xuehai Pan
a2e1365248 [functorch] Remove not needed named member polyfill functions (#92613)
The `nn.Module` APIs already support the `remove_duplicate` argument, so it's time to retire these no-longer-needed polyfill functions. They are identical to the `nn.Module.named_parameters` and `nn.Module.named_buffers` methods.
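For illustration, a minimal sketch of the `remove_duplicate` behavior the polyfills were reimplementing (the module names here are made up):

```python
import torch.nn as nn

shared = nn.Linear(2, 2)
m = nn.Module()
m.a = shared
m.b = shared  # the same submodule registered twice -> duplicate parameters

# remove_duplicate=True (the default) yields each parameter once
print(len(list(m.named_parameters(remove_duplicate=True))))   # 2
# remove_duplicate=False keeps one entry per (prefix, parameter) pair
print(len(list(m.named_parameters(remove_duplicate=False))))  # 4
```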
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92613
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-01-24 13:15:32 +00:00
soulitzer
fb980581a7 Revert #92688 and #92348 (aot autograd explicitly errors on double backward) (#92863)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92863
Approved by: https://github.com/eellison
2023-01-24 03:04:04 +00:00
Michael Lazos
6f1727b288 Print aot graphs if user specifies aot graph env vars (#92720)
When AOT logging was integrated with the TorchInductor trace, the ability to print graphs to the console when the user specified any of the env vars was removed (in favor of TORCH_COMPILE_DEBUG). This PR restores that behavior by checking whether the user set any of the AOT debug variables *before* setting up the remainder of the logging, and adding a stdout stream if any of those env vars are set.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92720
Approved by: https://github.com/Chillee
2023-01-21 04:46:35 +00:00
Horace He
d6c3468f70 Don't allow recomputing a node that *must* be materialized in the backwards pass (#90896)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90896
Approved by: https://github.com/ezyang
2023-01-20 22:34:41 +00:00
soulitzer
a3efa9d740 Create autograd Function for aot_autograd backward only when needed (#92688)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92688
Approved by: https://github.com/bdhirsh
2023-01-20 21:55:23 +00:00
Edward Z. Yang
c4501593c3 Delete get_pyobj() entirely (#92638)
Opt for the shorter and more direct node attribute access.

I need to do this because I'm going to publicly document
SymInt and SymFloat but I don't want to doc get_pyobj().

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92638
Approved by: https://github.com/Chillee, https://github.com/albanD, https://github.com/voznesenskym, https://github.com/bdhirsh
2023-01-20 19:06:56 +00:00
Sean Ross-Ross
fb3d9f39cc update vmap to accept nones (#91644)
* Fixes https://github.com/pytorch/functorch/issues/1082
* Fixes https://github.com/pytorch/functorch/issues/439
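
A rough sketch of what accepting Nones enables (exact semantics per the linked issues; function and names are illustrative): a `None` argument can now flow through a vmapped function when its `in_dims` entry is `None`.

```python
import torch
from torch.func import vmap

def f(x, bias):
    if bias is None:
        return x.sin()
    return x.sin() + bias

x = torch.randn(4, 3)
# Passing None through vmap no longer errors; in_dims=None marks it as unbatched.
out = vmap(f, in_dims=(0, None))(x, None)
```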

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91644
Approved by: https://github.com/kshitij12345, https://github.com/Chillee
2023-01-20 18:25:22 +00:00
soulitzer
bb7790781f Make aot_autograd explicitly error when double backward (#92348)
Mitigates https://github.com/pytorch/pytorch/issues/91469

Changes:
- ~once_differentiable can now be parametrized to print a custom error message~
- instead of once_differentiable, we do the backward inside another custom Function, which makes sure the graph is connected, but also makes sure to error on double backward
- we now explicitly error when doing double backward with torch.compile + aot_autograd instead of being silently incorrect. ~The niceness of the error message can vary depending on whether your grad_outputs are passed, or whether you are doing `.grad()` or `.backward()`.~

Unchanged:
- doing backward inside compiled function is still allowed. It currently causes a graph break and is equivalent to doing backward outside the compiled function. It might be nice to disallow this explicitly as well, but that can be done in a follow up.
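
For reference, a minimal sketch (assuming an `aot_eager`-style backend and the `torch.compile` spelling) of the situation that now raises instead of returning silently wrong gradients:

```python
import torch

@torch.compile(backend="aot_eager")
def f(x):
    return x.sin().sum()

x = torch.randn(3, requires_grad=True)
(first_grad,) = torch.autograd.grad(f(x), x, create_graph=True)
try:
    # Double backward through the AOTAutograd-compiled graph is unsupported,
    # so this explicitly errors rather than producing incorrect results.
    torch.autograd.grad(first_grad.sum(), x)
except RuntimeError as e:
    print(e)
```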

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92348
Approved by: https://github.com/albanD
2023-01-19 00:13:29 +00:00
Xuehai Pan
ae4ec7de1e Fix and update type hints for make_functional.py (#91579)
Changes in details:

- Fix and update some out-of-date type hints in `_functorch/make_functional.py`.
- ~Explicitly use `OrderedDict` for order-sensitive mappings.~

	In `create_names_map()`, `_swap_state()`, and `FunctionalModuleWithBuffers.__init__()`, a plain (unordered) `dict` was used. The key order of `dict.items()` must be preserved because it is `zip`ped with a tuple of `params`/`buffers`. The built-in dictionary has been insertion-ordered since Python 3.6 ([PEP 468](https://peps.python.org/pep-0468)), but explicit is better than implicit.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91579
Approved by: https://github.com/zou3519
2023-01-18 19:16:32 +00:00
Richard Zou
5d01277fea Deprecate torch.nn.utils.stateless.functional_call (#92280)
This PR:
- Updates the docs to say it is deprecated
- Raises a UserWarning
- Changes most of the callsites inside PyTorch to use
torch.func.functional_call, minus the test_stateless testing.

The motivation behind this is that we can now align behind a single
functional_call API in PyTorch.
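
A minimal migration sketch (module and input shapes here are arbitrary):

```python
import torch
import torch.nn as nn
from torch.func import functional_call

model = nn.Linear(3, 2)
params_and_buffers = {
    **dict(model.named_parameters()),
    **dict(model.named_buffers()),
}
x = torch.randn(4, 3)

# old (now deprecated): torch.nn.utils.stateless.functional_call(model, params_and_buffers, (x,))
out = functional_call(model, params_and_buffers, (x,))
```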

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92280
Approved by: https://github.com/albanD
2023-01-18 14:26:25 +00:00
Richard Zou
a8a44a1aa2 Add deprecation messages for functorch.* function transforms (#92279)
This PR:
- adds deprecation warnings when calling the functorch APIs
- adds documentation saying that those APIs are deprecated

It does this by creating thin wrappers around the original APIs that (1)
raise deprecation warnings and (2) have an additional line in their
documentation that they are deprecated.
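
A rough sketch of the wrapper shape described above (names and the message text are illustrative, not the actual code in functorch):

```python
import functools
import warnings

def _deprecation_wrapper(fn, new_path):
    @functools.wraps(fn)
    def wrapped(*args, **kwargs):
        # UserWarning rather than DeprecationWarning so it is not suppressed by default
        warnings.warn(
            f"functorch.{fn.__name__} is deprecated; please use {new_path} instead.",
            UserWarning,
        )
        return fn(*args, **kwargs)

    wrapped.__doc__ = (fn.__doc__ or "") + f"\n\n.. note:: Deprecated in favor of {new_path}."
    return wrapped
```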

NB:
- Python suppresses DeprecationWarning, so we use UserWarning instead.

Test Plan:
- New tests
- the functorch.* APIs are still tested for correctness because that's
what test/functorch/* use (as opposed to directly calling the
torch.func.* APIs)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92279
Approved by: https://github.com/albanD, https://github.com/soulitzer
2023-01-18 14:26:25 +00:00
Richard Zou
21d2bd782b stack_module_state should return unrelated parameters (#92278)
`torch.func.stack_module_state` is our replacement for
`functorch.combine_state_for_ensemble`. The most common usage for
combine_state_for_ensemble is to
- create stacked parameters and buffers
- use vmap to run the forward pass
- use regular PyTorch autograd to run the backward pass (e.g.,
Tensor.backward)
- optimize directly over the stacked parameters (this is more performant
than optimizing over the unstacked parameters).

Right now, stack_module_state returns stacked parameters that cannot be
optimized directly (only leaf tensors can have a .grad field); this PR
fixes that by turning the stacked parameters back into leaf tensors.
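
A small usage sketch of the intended workflow (the "meta"-device base module trick follows the ensembling docs; shapes are arbitrary):

```python
import copy
import torch
import torch.nn as nn
from torch.func import stack_module_state, functional_call, vmap

models = [nn.Linear(4, 2) for _ in range(3)]
stacked_params, stacked_buffers = stack_module_state(models)

# A stateless "template" used only for its structure.
base = copy.deepcopy(models[0]).to("meta")

def run_one(params, buffers, x):
    return functional_call(base, {**params, **buffers}, (x,))

x = torch.randn(5, 4)
out = vmap(run_one, in_dims=(0, 0, None))(stacked_params, stacked_buffers, x)  # (3, 5, 2)

# After this PR the stacked parameters are leaf tensors, so they can be
# optimized directly, e.g. torch.optim.SGD(stacked_params.values(), lr=0.1).
```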

Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92278
Approved by: https://github.com/soulitzer
2023-01-18 14:26:22 +00:00
Richard Zou
fbf9e379e1 [autograd.Function] update error messages for vmap to point to docs (#92030)
We need to separately update it when 2.0 comes along and the master docs
become stable docs so that users aren't looking at master docs all the
time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92030
Approved by: https://github.com/soulitzer
2023-01-17 13:36:42 +00:00
Richard Zou
7aaad0b832 Rename flag that enables/disables _SingleLevelFunction for functorch (#92025)
functorch used to have a switch that enables/disables autograd.Function.
That switch now enables/disables torch.autograd.function._SingleLevelFunction, so
I've renamed it accordingly.

We could just delete the switch because users should not be directly
working with torch.autograd.function._SingleLevelFunction. However,
it was useful for debugging when something went wrong when I was
implementing the autograd.Function <> functorch interaction, so I want
to keep it around as a debugging tool for a while since the code is
already there.

Test Plan:
- updated tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92025
Approved by: https://github.com/soulitzer
2023-01-17 13:36:41 +00:00
Richard Zou
14ff58d4fa [generate_vmap_rule] Delete unused output_shapes (#92024)
We don't actually need `output_shapes` to implement
`generate_vmap_rule=True` support for autograd.Function.
- We need this in the vjp (backward) case because autograd automatically
  reduces grad_inputs to inputs and we need to replicate that behavior.
  In order to replicate that behavior, we recorded the original input
  shapes so we know how to reduce the grad_input.
- There is no such behavior for forward-mode AD, so we don't need to
  pass an `output_shapes` to reductify.

This PR simplifies the API of `reductify` and `reductify_leaf`. Instead
of accepting `input_shape_without_bdim` and `allow_expanded_grad`, we
now combine these into a single argument,
`reduce_to_input_shape_without_bdim`.
- if it is None, then we don't do anything
- if it is not-None and a shape, then we will reduce the grad to the
  provided shape.

Test Plan:
- updated original unittests
- wait for test suite
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92024
Approved by: https://github.com/soulitzer
2023-01-17 13:36:39 +00:00
Richard Zou
f5af97ef06 [autograd.Function] add nice error message for incorrect usage of vmap (#92023)
This PR:
- adds a nice error message if the user doesn't follow the API of the
  vmap staticmethod correctly. That is, the user must return two
  arguments from the vmap staticmethod API: (outputs, out_dims), and
  out_dims must be a PyTree with either the same structure as `outputs`
  our be broadcastable to the same structure as `outputs`.
- Fixes an edge case for out_dims=None. out_dims is allowed to be None,
  but wrap_outputs_maintaining_identity was treating "None" as "This is
  not the vmap case"

Test Plan:
- new tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92023
Approved by: https://github.com/soulitzer
2023-01-17 13:36:37 +00:00
Richard Zou
2f9166ef89 [autograd.Function] Cleanup asymmetry in generate_vmap_rule and vmap (#91787)
This PR:
- changes generate_vmap_rule to either be True or False. Previously it
  could be True, False, or not set. This simplifies the implementation a
  bit.
- changes the vmap staticmethod to always be on the autograd.Function
  rather than sometimes defined.
  This is how the other staticmethod (forward, backward, jvp) are
  implemented and allows us to document it.

There are 4 possible states for the autograd.Function w.r.t. to the
above:
- generate_vmap_rule is True, vmap staticmethod overriden. This raises
  an error when used with vmap.
- generate_vmap_rule is False, vmap staticmethod overriden. This is
  valid.
- generate_vmap_rule is True, vmap staticmethod not overriden. This is
  valid.
- generate_vmap_rule is False, vmap staticmethod not overriden. This
  raises an error when used with vmap.

Future:
- setup_context needs the same treatment, but that's a bit trickier to
  implement.

Test Plan:
- new unittest
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91787
Approved by: https://github.com/soulitzer
2023-01-17 13:36:34 +00:00
Edward Z. Yang
ded2b47bde Fix AOTAutograd 2.0 perf regression involving as_strided (#92255)
I feel there may be a deeper fix where we avoid as_strided entirely, but in the regressed model the sizes/strides all lined up exactly, so this seems to work to fix the immediate regression.

Repro command: `python benchmarks/dynamo/torchbench.py  --performance  --backend inductor --float16 --training --batch-size-file $(realpath benchmarks/dynamo/torchbench_models_list.txt) --only hf_Bert  `

Before: 1.138x p=0.00
After: 1.162x p=0.00

Natalia pinpointed it to this line by comparing GPU traces and finding that the regressed PyTorch had two extra fill kernels and a memcpy:

Without regression:
![image](https://user-images.githubusercontent.com/13564/212726521-450e183d-7b36-4538-ad14-617e09c689a8.png)

With regression:
![image](https://user-images.githubusercontent.com/13564/212726469-4f3ff4b5-3f68-48cf-94d2-ddebb9216176.png)

...which CPU profiler blamed on `AsStridedBackward`:

![image](https://user-images.githubusercontent.com/13564/212726953-16333bfc-8460-4445-90ad-7fe73c4173c2.png)

...which were then pinpointed to  https://github.com/pytorch/pytorch/pull/92076/files#diff-df954bbf954d2dcb81f687876053267ffa4ddb36ed86b7d2bd76319ff2b94416R486-R489

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92255
Approved by: https://github.com/ngimel, https://github.com/bdhirsh
2023-01-17 06:07:37 +00:00
Edward Z. Yang
7078ad5b8c Reland "AOT Autograd refactor + cleanup, handle intermediate views of bases, use view replay, fix non-tensor input handling" (#92076)
Original PR: https://github.com/pytorch/pytorch/pull/89532

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92076
Approved by: https://github.com/janeyx99, https://github.com/albanD
2023-01-12 21:32:05 +00:00
samdow
515dff7811 [functorch] move batch_norm_replacement to torch.func (#91412)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91412
Approved by: https://github.com/zou3519
2023-01-12 19:15:41 +00:00
samdow
8b3c4bc481 [stateless] add weight tying support (#90477)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90477
Approved by: https://github.com/zou3519
2023-01-11 15:19:09 +00:00
PyTorch MergeBot
498be7ed25 Revert "Refactor stack_trace preservation for node meta preservation (#90803)"
This reverts commit 0f1302eeae.

Reverted https://github.com/pytorch/pytorch/pull/90803 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-01-10 10:44:28 +00:00
Sherlock Huang
0f1302eeae Refactor stack_trace preservation for node meta preservation (#90803)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90803
Approved by: https://github.com/jerryzh168, https://github.com/albanD
2023-01-09 23:23:27 +00:00
samdow
162474d7fd [functorch] add new ensembling api, demonstrate in example (#88850)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88850
Approved by: https://github.com/zou3519
2023-01-04 00:33:14 +00:00
samdow
c5e5916fff [functorch] add functorch functional_call, update tests to test this (#89213)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89213
Approved by: https://github.com/zou3519
2023-01-04 00:33:14 +00:00
joncrall
ad782ff7df Enable xdoctest runner in CI for real this time (#83816)
Builds on #83317 and enables running the doctests. Just need to figure out what is causing the failures.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83816
Approved by: https://github.com/ezyang, https://github.com/malfet
2022-12-29 05:32:42 +00:00
Kurt Mohler
08a47549af Rename Tensor._storage to Tensor.untyped_storage and update docs (#91414)
Fixes #89224

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91414
Approved by: https://github.com/ezyang
2022-12-28 19:21:34 +00:00
Kshiteej K
3fdbf824ae [functorch] jacrev: chunk_size=1 without vmap (#91326)
As discussed at https://github.com/pytorch/pytorch/pull/91157#discussion_r1053679272

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91326
Approved by: https://github.com/zou3519
2022-12-28 04:56:25 +00:00
soulitzer
1b2ee4d0e1 Update functorch supported autograd.Function to allow mark_dirty (#91222)
Fixes https://github.com/pytorch/pytorch/issues/90225
Uses what was originally in 32a57bcdb6

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91222
Approved by: https://github.com/zou3519
2022-12-28 03:53:47 +00:00
Richard Zou
e8393131ee [generate_vmap_rule] support for jvp (#91211)
Support for jvp is very similar to support for backward():
- We need to vmap over a version of the original autograd.Function's jvp
method that does not take ctx as input.
- On the output, we need to reductify to ensure the output tangent has
the same shape as the output. This reductify does not have the
extra reduction semantics, because PyTorch forward-mode AD requires the
output tangent to have the same exact shape as the output.
- setup_context needs to tell us the bdims of the saved_tensors
(necessary for vmap over jvp_no_context), as well
as the output shapes (necessary for reductify).

Test Plan:
- Added jvp support to the *GenVmapAutogradFunction
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91211
Approved by: https://github.com/soulitzer
2022-12-27 23:25:59 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
76a3869fc6 Support functionalization on torch.cond (#89966)
This PR adds a functionalization path for torch.cond. As this is a first pass, we only functionalize very restrictive use cases. We explicitly restrict the following:

- Output of each branch aliasing input
- In-place mutation on inputs given to each branch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89966
Approved by: https://github.com/zou3519
2022-12-22 22:01:47 +00:00
kshitij12345
4437d0d161 [functorch] vmap: chunk_size support (#91157)
Ref: https://github.com/pytorch/functorch/issues/680

We introduce a kwarg `chunk_size` in vmap.

Also, we leverage most of the code from `chunk_vmap` (except for chunking the input based on `chunk_size`)
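
A quick usage sketch (sizes are arbitrary): `chunk_size` bounds how many batch elements are processed per pass, trading throughput for peak memory.

```python
import torch
from torch.func import vmap

def f(x):
    return x.softmax(dim=-1) @ x

x = torch.randn(8192, 256)
# Without chunk_size the whole batch is vmapped at once; with chunk_size=1024
# the batch is processed in chunks of at most 1024 elements.
out = vmap(f, chunk_size=1024)(x)
```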

Benchmarks from https://github.com/pytorch/functorch/pull/774 apply.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91157
Approved by: https://github.com/zou3519
2022-12-22 19:45:45 +00:00
Brian Hirsh
c47bdd7522 *_scatter ops should preserve input stride/storage_offset (#91029)
It turns out that we *do* need to update *_scatter ops to return the exact same strides as their inputs. I added a test to `test/test_functionalization.py`, which now trips thanks to Ed's functionalization stride debugging check. It only actually ends up tripping silent correctness if you try to .backward() on that function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91029
Approved by: https://github.com/ezyang
2022-12-22 19:41:53 +00:00
Richard Zou
fb2e1878cb [torch.func] alias torch.func.vmap as torch.vmap (#91026)
This PR also redirects torch.vmap to torch.func.vmap instead of the old
vmap prototype.

Test Plan:
- tests
- view docs preview
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91026
Approved by: https://github.com/albanD, https://github.com/samdow
2022-12-21 20:51:49 +00:00
Richard Zou
2f37804cae [generate_vmap_rule] Add generate_vmap_rule to autograd.Function (#90966)
Design document:
https://docs.google.com/document/d/1bIQkWXy3J35_20c_a5kchikabBW5M8_uRAhl0BIMwU4/edit

This PR adds a `generate_vmap_rule` option (default False) to autograd.Function.
By setting it to True, a user promises to us that their autograd.Function's
{forward, backward, jvp}, if defined, only uses PyTorch operations, in addition to the other
limitations of autograd.Function+functorch (such as the user not
capturing any Tensors being transformed over from outside of the
autograd.Function).

Concretely, the approach is:
- we update `custom_function_call` to accept an additional
`generate_vmap_rule` argument.
- The vmap rule for `custom_function_call` and `generate_vmap_rule=True`
is: we construct a vmapped version of the autograd.Function and dispatch
on it.
- The vmapped version of the autograd.Function can be thought of like
the following: if we have an autograd.Function Foo, then
VmappedFoo.apply(in_dims, ...) has the same semantics as
vmap(Foo.apply, in_dims...)
- VmappedFoo's forward, setup_context, and backward staticmethod are
vmapped versions of Foo's staticmethods.
- See the design doc for more motivation and explanation
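
A minimal sketch of the intended usage under the constraints above (the function body uses only PyTorch ops, so the generated rule applies):

```python
import torch

class MySin(torch.autograd.Function):
    generate_vmap_rule = True  # promise: forward/setup_context/backward use only torch ops

    @staticmethod
    def forward(x):
        return torch.sin(x)

    @staticmethod
    def setup_context(ctx, inputs, output):
        x, = inputs
        ctx.save_for_backward(x)

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        return grad_output * torch.cos(x)

x = torch.randn(3, 4)
y = torch.vmap(MySin.apply)(x)  # no hand-written vmap staticmethod needed
```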

Test Plan:
- This PR introduces additional autograd.Function with the suffix "GenVmap" to
autograd_function_db.
- There are also some minor UX tests

Future:
- jvp support
- likely more testing to come, but please let me know if you have
cases that you want me to test here.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90966
Approved by: https://github.com/soulitzer
2022-12-21 00:34:44 +00:00
Richard Zou
2a55984139 [generate_vmap_rule] reductify_leaf helper function (#90965)
As seen in
https://docs.google.com/document/d/1bIQkWXy3J35_20c_a5kchikabBW5M8_uRAhl0BIMwU4/edit

`reductify_leaf(grad_input, ...)` is a helper function that processes a
single grad_input Tensor. The reason why we need it is:
- the grad_input has some optional bdim
- the input has some optional bdim
- if these are different, we need to coerce the grad_input into having
the same shape as the input, either by reducing or expanding the
grad_input.

Note that there is a special case in autograd that the user is allowed
to return a grad_input Tensor that is an expanded version of the
original input tensor. In this case, autograd automatically reduces
grad_input to the same shape as the input. Unfortunately this logic
doesn't work when bdims are involved, so we manually handle it in
`reductify_leaf`.
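
A rough, simplified sketch of the coercion described above (not the actual implementation in torch/_functorch; the name and exact edge-case handling are illustrative):

```python
import torch

def reductify_leaf_sketch(grad_input, grad_input_bdim, input_bdim, batch_size):
    if grad_input is None:
        return None
    if grad_input_bdim is None and input_bdim is None:
        return grad_input
    if grad_input_bdim is None:
        # grad has no batch dim but the input was batched: broadcast across the batch.
        shape = list(grad_input.shape)
        shape.insert(input_bdim, batch_size)
        return grad_input.unsqueeze(input_bdim).expand(shape)
    if input_bdim is None:
        # grad is batched but the input was not: sum out the batch dim
        # (the "autograd reduces an expanded grad" case, handled manually here).
        return grad_input.sum(grad_input_bdim)
    # Both are batched: just line the batch dims up.
    return grad_input.movedim(grad_input_bdim, input_bdim)
```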

Test Plan:
- tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90965
Approved by: https://github.com/soulitzer
2022-12-21 00:34:44 +00:00
Richard Zou
53c94ef1bb [generate_vmap_rule] Add mechanism to override ctx.saved_tensors (CtxWithSavedTensors) (#90964)
As seen in
https://docs.google.com/document/d/1bIQkWXy3J35_20c_a5kchikabBW5M8_uRAhl0BIMwU4/edit#heading=h.r3ckcnsh1cxt

This PR creates CtxWithSavedTensors. You can wrap a ctx object in the
backward pass of autograd.Function in CtxWithSavedTensors and specify
the saved_tensors attribute. CtxWithSavedTensors acts like the original
ctx object (all other attribute accesses are forwarded to the original ctx
object) but it has a custom saved_tensors field.
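
Conceptually (a simplified sketch, not the real class in torch/_functorch), it behaves like a proxy object:

```python
class CtxWithSavedTensorsSketch:
    def __init__(self, ctx, new_saved_tensors):
        self._wrapped_ctx = ctx
        self._override_saved_tensors = new_saved_tensors

    @property
    def saved_tensors(self):
        # the one attribute we override
        return self._override_saved_tensors

    def __getattr__(self, name):
        # every other attribute access is forwarded to the original ctx object
        return getattr(self._wrapped_ctx, name)
```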

Test Plan:
- tests that you can use CtxWithSavedTensors to get a new object with
your own saved_tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90964
Approved by: https://github.com/samdow, https://github.com/soulitzer
2022-12-21 00:34:43 +00:00
Richard Zou
31981d0139 [generate_vmap_rule] add restore_vmap helper function (#90963)
As seen in
https://docs.google.com/document/d/1bIQkWXy3J35_20c_a5kchikabBW5M8_uRAhl0BIMwU4/edit

`restore_vmap` is a private helper function. It is vmap but has the
following
differences:
- instead of returning outputs, it returns an (outputs, out_dims) tuple.
  out_dims is a pytree of the same shape as outputs and contains Optional[int]
  specifying where the vmapped dimension, if it exists, is in the
  corresponding output.
- does no validation on in_dims or inputs (vmap expects at least one
  Tensor to be vmapped).
  restore_vmap allows for no inputs to have the vmap dimension
- does no validation on outputs (vmap expects only Tensor outputs)
  restore_vmap allows for return of arbitrary outputs (not just
  Tensors)

Test Plan:
- added some simple test to test restore_vmap
- I am OK with restore_vmap not being a part of vmap right now -- the
implementation of vmap rarely changes and it is a bit difficult to
refactor vmap in a way that restore_vmap is a subroutine.

Other questions:
- Bikeshedding the `restore_vmap` name
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90963
Approved by: https://github.com/samdow, https://github.com/soulitzer
2022-12-21 00:34:41 +00:00
William Wen
289f06434c [dynamo] check buffers when checking accuracy (#91037)
Tested by running `python benchmarks/dynamo/torchbench.py --accuracy --float32 -dcuda --output=inductor_torchbench_float32_training_cuda_performance.csv --training --inductor --no-skip --dashboard --only mobilenet_v2 --cold_start_latency` and breakpointing after the changes to inspect buffers.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91037
Approved by: https://github.com/anijain2305
2022-12-20 13:57:25 +00:00
Richard Zou
41846e205e [torch.func] Setup torch.func, populate it with all transforms (#91016)
This PR sets up torch.func and populates it with the following APIs:
- grad
- grad_and_value
- vjp
- jvp
- jacrev
- jacfwd
- hessian
- functionalize
- vmap

It also renames all instances of `functorch` in the APIs for those docs
to `torch.func`.

We rewrite the `__module__` fields on some of the above APIs so that the
APIs fit PyTorch's public api definition.
- For an API to be public, it must have a `__module__` that points to a
  public PyTorch submodule. However, `torch._functorch.eager_transforms`
  is not public due to the leading underscore.
- The solution is to rewrite `__module__` to point to where the API is
  exposed (torch.func). This is what both Numpy and JAX do for their
  APIs.
- h/t pmeier in
  https://github.com/pytorch/pytorch/issues/90284#issuecomment-1348595246
  for idea and code
- The helper function, `exposed_in`, is confined to
  torch._functorch/utils for now because we're not completely sure if
  this should be the long-term solution.
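
The `exposed_in` helper mentioned above is essentially a tiny decorator; a sketch of the idea (illustrative, not necessarily the exact code):

```python
def exposed_in(module):
    """Mark fn as publicly exposed in `module` by rewriting its __module__."""
    def wrapper(fn):
        fn.__module__ = module
        return fn
    return wrapper

# Usage sketch: a transform defined in a private module but exposed publicly.
@exposed_in("torch.func")
def grad(func, argnums=0, has_aux=False):
    ...
```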

Implication for functorch.* APIs:
- functorch.grad is the same object as torch.func.grad
- this means that the functorch.grad docstring is actually the
  torch.func.grad docstring and will refer to torch.func instead of
  functorch.
- This isn't really a problem since the plan on record is to deprecate
  functorch in favor of torch.func. We can fix these if we really want,
  but I'm not sure if a solution is worth maintaining.

Test Plan:
- view docs preview

Future:
- vmap should actually just be torch.vmap. This requires an extra step
  where I need to test internal callsites, so, I'm separating it into a
  different PR.
- make_fx should be in torch.func to be consistent with `import
  functorch`. This one is a bit more of a headache to deal with w.r.t.
  public api, so going to deal with it separately.
- beef up func.rst with everything else currently on the functorch
  documention website. func.rst is currently just an empty shell.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91016
Approved by: https://github.com/samdow
2022-12-20 00:00:52 +00:00
Richard Zou
cad1ce6158 Stop using :attr: in functorch docs (#91015)
We're using :attr: wrong. :attr: refers to an attribute of a Python
object, not the parameter to a function:
- https://www.sphinx-doc.org/en/master/usage/restructuredtext/domains.html#role-py-attr

This leads to some weird things when moving to torch.func: sphinx
decides to link torch.func for :attr:`func`

Test Plan:
- docs preview.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91015
Approved by: https://github.com/samdow
2022-12-20 00:00:52 +00:00
Kshiteej K
f02e93b584 jacrev : Support chunked computation (#89376)
Ref: https://github.com/pytorch/functorch/issues/680

We introduce a kwarg `chunk_size` in `jacrev` to control whether the Jacobian computation should be chunked and if so then `chunk_size` will dictate the maximum size of the chunks used.
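
A small usage sketch (sizes arbitrary; spelled `functorch.jacrev` as in the benchmark script below):

```python
import torch
import functorch

def f(x):
    return x.sin().sum(0)

x = torch.randn(128, 128)
# Compute the Jacobian at most 64 output rows at a time to bound peak memory.
J = functorch.jacrev(f, chunk_size=64)(x)  # shape: (128, 128, 128)
```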

We try two approaches,
* Stacked Approach: Append the intermediate computation to a list and then stack those results.
* Pre-allocation Approach: Pre-allocate a zeros tensor and copy chunked computation into it.

For Memory Benchmark, see https://github.com/pytorch/pytorch/pull/89376#issuecomment-1348479098

Benchmark CPU : Performs better with more chunks/ smaller chunk_size.

NOTE: There seems to be a lot of noise for shape `(64, 64)`.

<details>

```
[----------------------------------------------- jacrev : device cpu : chunks 2 -----------------------------------------------]
                                     |  with chunk_size and stacked  |  without chunk_size  |  with chunk_size and pre-allocated
1 threads: ---------------------------------------------------------------------------------------------------------------------
      (64, 64) : chunk_size 2080     |               76.2            |          50.9        |                  80.1
      (128, 128) : chunk_size 8256   |             1172.8            |         783.3        |                1225.5
      (128, 144) : chunk_size 9288   |             1475.1            |         990.4        |                1548.3
      (144, 144) : chunk_size 10440  |             1871.3            |        1254.4        |                1971.2

Times are in milliseconds (ms).

[----------------------------------------------- jacrev : device cpu : chunks 3 ----------------------------------------------]
                                    |  with chunk_size and stacked  |  without chunk_size  |  with chunk_size and pre-allocated
1 threads: --------------------------------------------------------------------------------------------------------------------
      (64, 64) : chunk_size 1386    |               39.9            |          25.8        |                  58.8
      (128, 128) : chunk_size 5504  |             1182.6            |         782.2        |                1229.7
      (128, 144) : chunk_size 6192  |             1483.6            |         995.4        |                1550.6
      (144, 144) : chunk_size 6960  |             1879.1            |        1257.7        |                1960.5

Times are in milliseconds (ms).

[----------------------------------------------- jacrev : device cpu : chunks 4 ----------------------------------------------]
                                    |  with chunk_size and stacked  |  without chunk_size  |  with chunk_size and pre-allocated
1 threads: --------------------------------------------------------------------------------------------------------------------
      (64, 64) : chunk_size 1040    |               41.7            |          50.6        |                  29.1
      (128, 128) : chunk_size 4128  |             1171.6            |         782.3        |                1226.7
      (128, 144) : chunk_size 4644  |             1482.2            |         994.6        |                1550.9
      (144, 144) : chunk_size 5220  |             1870.2            |        1254.5        |                1961.4

Times are in milliseconds (ms).

[--------------------------------------------- jacrev : device cpu : chunks 100 ---------------------------------------------]
                                   |  with chunk_size and stacked  |  without chunk_size  |  with chunk_size and pre-allocated
1 threads: -------------------------------------------------------------------------------------------------------------------
      (64, 64) : chunk_size 41     |               46.8            |          50.5        |                  46.4
      (128, 128) : chunk_size 165  |              622.2            |         775.2        |                 656.0
      (128, 144) : chunk_size 185  |              803.9            |         987.3        |                 866.9
      (144, 144) : chunk_size 208  |             1021.1            |        1251.2        |                1088.2

Times are in milliseconds (ms).

[--------------------------------------------- jacrev : device cpu : chunks 200 ---------------------------------------------]
                                   |  with chunk_size and stacked  |  without chunk_size  |  with chunk_size and pre-allocated
1 threads: -------------------------------------------------------------------------------------------------------------------
      (64, 64) : chunk_size 20     |               60.9            |          50.2        |                  62.3
      (128, 128) : chunk_size 82   |              583.1            |         779.4        |                 634.3
      (128, 144) : chunk_size 92   |              834.1            |        1005.8        |                 472.3
      (144, 144) : chunk_size 104  |             1053.6            |        1277.0        |                1033.9

Times are in milliseconds (ms).

[--------------------------------------------- jacrev : device cpu : chunks 300 --------------------------------------------]
                                  |  with chunk_size and stacked  |  without chunk_size  |  with chunk_size and pre-allocated
1 threads: ------------------------------------------------------------------------------------------------------------------
      (64, 64) : chunk_size 13    |              77.7             |          50.4        |                  79.6
      (128, 128) : chunk_size 55  |             578.9             |         782.3        |                 626.9
      (128, 144) : chunk_size 61  |             718.2             |        1024.9        |                 800.4
      (144, 144) : chunk_size 69  |             919.7             |        1313.7        |                1023.0

Times are in milliseconds (ms).
```

</details>

Benchmark CUDA: Performs better with less chunks/bigger chunk_size.

<details>

```
[--------------------------------------------- jacrev : device cuda:1 : chunks 2 ----------------------------------------------]
                                     |  with chunk_size and stacked  |  without chunk_size  |  with chunk_size and pre-allocated
1 threads: ---------------------------------------------------------------------------------------------------------------------
      (64, 64) : chunk_size 2080     |             1485.7            |         923.8        |                1632.3
      (128, 128) : chunk_size 8256   |            25390.2            |       14103.2        |               33557.4
      (128, 144) : chunk_size 9288   |              801.7            |       16854.1        |               42894.6
      (144, 144) : chunk_size 10440  |             1003.5            |       21386.5        |               59648.5

Times are in microseconds (us).

3 / 3 : Shape (144, 144) : Device cuda:1 : chunks: 3
[--------------------------------------------- jacrev : device cuda:1 : chunks 3 ---------------------------------------------]
                                    |  with chunk_size and stacked  |  without chunk_size  |  with chunk_size and pre-allocated
1 threads: --------------------------------------------------------------------------------------------------------------------
      (64, 64) : chunk_size 1386    |             1474.5            |         924.5        |                1655.5
      (128, 128) : chunk_size 5504  |            25368.9            |       10156.0        |               34022.1
      (128, 144) : chunk_size 6192  |            25223.0            |       12933.7        |               56418.5
      (144, 144) : chunk_size 6960  |            24729.3            |       16367.4        |               68744.7

Times are in microseconds (us).

3 / 3 : Shape (144, 144) : Device cuda:1 : chunks: 4
[--------------------------------------------- jacrev : device cuda:1 : chunks 4 ---------------------------------------------]
                                    |  with chunk_size and stacked  |  without chunk_size  |  with chunk_size and pre-allocated
1 threads: --------------------------------------------------------------------------------------------------------------------
      (64, 64) : chunk_size 1040    |             1489.2            |         924.4        |                 1679.6
      (128, 128) : chunk_size 4128  |            25370.4            |        8987.4        |                57201.3
      (128, 144) : chunk_size 4644  |            32239.1            |       10136.2        |                72406.5
      (144, 144) : chunk_size 5220  |            40994.3            |       12867.8        |               108653.4

Times are in microseconds (us).

3 / 3 : Shape (144, 144) : Device cuda:1 : chunks: 100
[------------------------------------------- jacrev : device cuda:1 : chunks 100 --------------------------------------------]
                                   |  with chunk_size and stacked  |  without chunk_size  |  with chunk_size and pre-allocated
1 threads: -------------------------------------------------------------------------------------------------------------------
      (64, 64) : chunk_size 41     |            21121.8            |         924.2        |               22753.5
      (128, 128) : chunk_size 165  |            23679.7            |       14284.4        |               26758.2
      (128, 144) : chunk_size 185  |            30082.3            |       18063.3        |               33553.5
      (144, 144) : chunk_size 208  |            38175.6            |       22839.5        |               42030.0

Times are in microseconds (us).
```

</details>

Benchmark Script

<details>

```python
import functorch
import torch
import itertools
import time
from torch.utils.benchmark import Timer
from torch.utils.benchmark import Compare
import sys
import pickle
from torch import profiler

import math

def prod(l):
    prod = 1
    for el in l:
        prod *= el

    return prod

def fn(x, y):
    return x + y, x.sum(0)

shapes = ((64, 64), (128, 128), (128, 144), (144, 144))

for device in ('cpu', 'cuda:1'):
    if device == 'cuda:1':
        chunks = (2, 3, 4, 100,)
    else:
        chunks = (2, 3, 4, 100, 200, 300)
    for chunk in chunks:
        results = []
        for shape in shapes:
            x = torch.zeros(*shape, dtype=torch.float, device=device)
            y = x.sum()
            chunk_size = (prod(shape) + prod(shape[1:])) // chunk
            jacrev_fn_chunked = functorch.jacrev(fn, (0, 1), chunk_size=chunk_size)
            jacrev_fn_chunked_pre = functorch.jacrev(fn, (0, 1), chunk_size=chunk_size, _preallocate_and_copy=True)
            jacrev_fn = functorch.jacrev(fn, (0, 1), chunk_size=None)

            tasks = [("jacrev_fn_chunked(x, y)", "with chunk_size and stacked"),
                     ("jacrev_fn(x, y)", "without chunk_size"),
                     ("jacrev_fn_chunked_pre(x, y)", "with chunk_size and pre-allocated"),]
            timers = [Timer(stmt=stmt, label=f"jacrev : device {device} : chunks {chunk}", sub_label=f"{(shape)} : chunk_size {chunk_size}", description=desc, globals=globals()) for stmt, desc in tasks]

            for i, timer in enumerate(timers):
                results.append(
                    timer.blocked_autorange(min_run_time=2.)
                )
                print(f"\r{i + 1} / {len(timers)} : Shape {shape} : Device {device} : chunks: {chunk}", end="")
                sys.stdout.flush()

        print()
        comparison = Compare(results)
        comparison.print()
```

</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89376
Approved by: https://github.com/zou3519
2022-12-19 20:04:21 +00:00
Edward Z. Yang
944519a468 Switch use_fake_tensor to True by default (#89663)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89663
Approved by: https://github.com/anjali411, https://github.com/Morgan77523
2022-12-19 07:24:06 +00:00
Michael Voznesensky
b72caf311d Introduce guardexpr, aot autograd guarding of duplicates into torch._guards (#90955)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90955
Approved by: https://github.com/ezyang
2022-12-18 03:05:47 +00:00
Brian Hirsh
4ab81ae80d fix default partitioner: save sizes instead of tensor for backward when possible (#91012)
This should fix hf_Longformer, AllenaiLongformerBase, and tacotron2 with dynamic shapes. Example repro:
```
TORCHDYNAMO_DYNAMIC_SHAPES=1 AOT_DYNAMIC_SHAPES=1 python benchmarks/dynamo/torchbench.py --accuracy --backend aot_eager --training --only hf_Longformer
```

used to fail with:
```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [4, 1024, 12, 513]], which is output 0
 of AsStridedBackward0, is at version 6; expected version 4 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient,
with torch.autograd.set_detect_anomaly(True).
```

The problem is that:

(1) when we have a tensor from the forward whose sizes are needed in the backward, we were saving the actual tensor for backward and directly grabbing the sizes off of it inside the backward graph (bad for perf)

(2) If that tensor happens to be a graph input that gets mutated, we end up with the above error. Autograd yells at you if you try to save a tensor for backward, and later mutate it.
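
For context, a standalone sketch of the same class of failure outside of AOTAutograd (variable names are made up):

```python
import torch

x = torch.randn(4, requires_grad=True)
base = torch.randn(4)

y = base * x     # autograd saves `base` for the backward of mul
base.add_(1)     # the in-place mutation bumps base's version counter

try:
    y.sum().backward()
except RuntimeError as e:
    # "one of the variables needed for gradient computation has been modified
    #  by an inplace operation ..."
    print(e)
```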

I confirmed that this problem doesn't happen for the min cut partitioner.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91012
Approved by: https://github.com/ezyang
2022-12-17 02:06:10 +00:00
Richard Zou
ffa37c9fca Add VmapInterpreter.randomness (in pyfunctorch) provide it in info object (#90789)
This PR:
- adds VmapInterpreter.randomness. This returns the randomness option
the user provided in vmap(..., randomness=...)
- adds randomness in the info object passed to the vmap staticmethod of
autograd.Function. This is so that the user can handle random operations
on their own terms (if randomness="error", and if the autograd.Function
has random operations, then it is the user's responsibility to raise an
error).

Test Plan:
- updated unittest
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90789
Approved by: https://github.com/samdow, https://github.com/soulitzer
2022-12-17 00:43:43 +00:00
Brian Hirsh
7a683eaeb8 aot_autograd: add assert for functional-only graph (#88816)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88816
Approved by: https://github.com/ezyang, https://github.com/ngimel
2022-12-16 21:04:36 +00:00
Edward Z. Yang
e48c91688b DebugInterpreter works with symbolic shapes now, plus test (#90913)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90913
Approved by: https://github.com/voznesenskym
2022-12-16 05:22:56 +00:00
Edward Z. Yang
eef019c14a Lint rule to forbid direct use of logging.info/etc APIs (#90907)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90907
Approved by: https://github.com/jansel
2022-12-16 05:13:51 +00:00
Brian Hirsh
3c637e8007 fix aot autograd for None fw inputs (#89975)
hot fix: Confirmed this fixes an internal model that had None as one of its inputs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89975
Approved by: https://github.com/aazzolini
2022-12-14 18:44:08 +00:00
Richard Zou
f21cb7d77e [pyfunctorch] Generate a more meaningful name for _SingleLevelAutogradFunction (#90418)
The API to do this is not pretty, but at least it works.

Test Plan:
- new test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90418
Approved by: https://github.com/soulitzer
2022-12-14 16:20:57 +00:00
Richard Zou
da42eab48b Fix circular import in torch/autograd/function.py (#90415)
It turns out it is possible to break cycles by not directly importing a
module:
- there's a problem that torch.jit imports torch._ops and torch._ops
imports torch.jit
- there's another problem that torch.autograd.function imports
custom_function_call but torch._functorch.autograd_function imports
torch.autograd.function

The "better" way to handle all of this is to do some large refactoring so
that torch._functorch.autograd_function imports some file that has
_SingleLevelAutogradFunction and then have torch.autograd.function
depend on torch.functorch.autograd_function... (and ditto for torch.jit
vs torch._ops), but I'm scared to move code around too much for BC
reasons and the fix in this PR works well.

Test Plan:
- import torch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90415
Approved by: https://github.com/albanD, https://github.com/soulitzer
2022-12-14 16:20:57 +00:00
Richard Zou
4809e838c1 functorch.jvp support for autograd.Function (#90077)
This PR adds functorch.jvp support for autograd.Function. It does so by
adding a jvp rule for custom_function_call.

For a regular PyTorch operation (like at::sin), the VariableType kernel:
- re-dispatches to at::sin
- calls the jvp rule for at::sin

The jvp rule for custom_function_call does just that. It constructs a
new autograd.Function (because the above logic already exists). Inside
the forward, it re-dispatches to custom_function_call. In the jvp rule,
it just calls whatever the jvp rule is supposed to be.

Since this logic is really close to the custom_function_call_grad, I
just put them together.
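
A minimal sketch of the user-facing behavior this enables (the autograd.Function must use the setup_context style; names are illustrative):

```python
import torch
from torch.func import jvp

class Scale(torch.autograd.Function):
    @staticmethod
    def forward(x):
        return 2 * x

    @staticmethod
    def setup_context(ctx, inputs, output):
        pass  # nothing to save for this toy example

    @staticmethod
    def jvp(ctx, x_tangent):
        return 2 * x_tangent

    @staticmethod
    def backward(ctx, grad_output):
        return 2 * grad_output

x = torch.randn(3)
t = torch.randn(3)
out, out_tangent = jvp(Scale.apply, (x,), (t,))  # out_tangent == 2 * t
```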

Test Plan:
- added jvp rules to the autograd.Function in autograd_function_db
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90077
Approved by: https://github.com/albanD, https://github.com/soulitzer
2022-12-14 16:20:53 +00:00
Richard Zou
24c3ad7851 Move private forward grad mode helpers to torch.autograd.forward_ad (#90240)
Motivation
- These were previously defined in functorch. They are not
functorch-specific, so I'm moving them to torch.autograd.forward_ad and
the autograd python bindings.
- I need this to avoid some of my cyclic import problems.

Should these be public APIs? Probably. Though this needs discussion, so
punting it to the future.

Test Plan:
- moved the tests of these from test/functorch/test_eager_transforms.py
to test/test_autograd.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90240
Approved by: https://github.com/soulitzer
2022-12-13 14:14:02 +00:00
Richard Zou
3049d99027 autograd.Function supports vmap staticmethod (#90037)
This PR adds a `vmap` staticmethod to autograd.Function and a
corresponding vmap kernel for custom_function_call. These two items mean
that autograd.Function with a vmap staticmethod can be used with vmap.

```py
class NumpyMul(torch.autograd.Function):
    @staticmethod
    def forward(x, y):
        return torch.tensor(to_numpy(x) * to_numpy(y), device=x.device)

    @staticmethod
    def setup_context(ctx, outputs, x, y):
        ctx.save_for_backward(x, y)

    @staticmethod
    def backward(ctx, grad_output):
        x, y = ctx.saved_tensors
        gx = None
        if isinstance(x, torch.Tensor) and x.requires_grad:
            gx = NumpyMul.apply(grad_output, y)
        gy = None
        if isinstance(y, torch.Tensor) and y.requires_grad:
            gy = NumpyMul.apply(grad_output, x)
        return gx, gy

    @staticmethod
    def vmap(info, in_dims, x, y):
        x_bdim, y_bdim = in_dims
        x = x.movedim(x_bdim, -1) if x_bdim is not None else x.unsqueeze(-1)
        y = y.movedim(y_bdim, -1) if y_bdim is not None else y.unsqueeze(-1)
        result = NumpyMul.apply(x, y)
        result = result.movedim(-1, 0)
        return result, 0
```

API Spec
- the staticmethod takes two arguments (info, in_dims) as well as the
unexpanded inputs (x, y).
- If we think about it as `vmap(info, in_dims, *args)`, `in_dims` is a
pytree with the same tree structure as args. It has None if the arg is
not being vmapped over and an integer vmapped dimension index if it is.
- `info` is an object with metadata about the vmap. It currently has one
field, `info.batch_size`. In the future we can extend this by adding
things like the randomness information.
- If there is a single vmap going on, (x, y) are NOT BatchedTensors,
they've already been unpacked.
- We expect the user to return a `(outputs, out_dims)` tuple. `out_dims`
must "broadcast" to the same pytree structure as `outputs`.

Semantics
- vmap(NumpyMul.apply)(x) will apply the vmap staticmethod if there is
one and will never actually run NumpyMul.forward.
- In order for the autograd.Function to support nested vmap (e.g.,
`vmap(vmap(NumpyMul.apply))(x)`, then the vmap staticmethod must call
into operations that vmap understands (i.e. PyTorch operators or more
autograd.Function).

At a high level, this PR:
- adds a vmap rule for custom_function_call

Testing
- Added some tests for in_dims and info
- Added vmap staticmethod to most of the autograd.Function in
autograd_function_db and sent them through functorch's vmap-related
OpInfo tests

Future
- Better error messages if the user gets the return contract wrong. I
didn't include them in this PR because it might involve a refactor of
some of the existing code in functorch/_src/vmap.py that will add
~200LOC to the PR, but LMK if you'd prefer it here.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90037
Approved by: https://github.com/samdow, https://github.com/soulitzer
2022-12-13 14:14:02 +00:00
Michael Lazos
730e44bbc7 Add logging for aot autograd and unified debug flag (#88987)
- Adds `log_level` to aot's config
- Outputs log to `<graph_name>_<log_level>.log` in aot_torchinductor subfolder of the debug directory
- Modifies the Inductor debug context to use the graph name when naming the folder instead of the os pid
- Adds `TORCH_COMPILE_DEBUG` flag to enable it, (as well as separate dynamo and inductor logs)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88987
Approved by: https://github.com/Chillee
2022-12-09 17:28:10 +00:00
Elias Ellison
b651e06049 Add Pointwise Tag from pointwise set in DTensor, use in aot_autograd partitioner (#90029)
Takes the pointwise op list from [DTensor](https://github.com/pytorch/pytorch/blob/master/torch/distributed/_tensor/ops/pointwise_ops.py#L36) as an initially starting point for pointwise ops, and feeds them to the aot autograd partitioner.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90029
Approved by: https://github.com/ezyang
2022-12-08 20:21:17 +00:00
Richard Zou
7342251281 functorch.grad support for autograd.Function (#89860)
Happy to split this PR more if it helps.

This PR adds functorch.grad support for autograd.Function. There's a lot
going on; here is the high level picture and there are more details as
comments in the code.

Mechanism (PyOperator)
- Somehow, autograd.Function needs to dispatch with functorch. This is
necessary because every layer of functorch needs to see the
autograd.Function; grad layers need to preserve the backward pass.
- The mechanism for this is via PyOperator. If functorch transforms are
active, then we wrap the autograd.Function in a `custom_function_call`
PyOperator where we are able to define various rules for functorch
transforms.
- `custom_function_call` has a rule for the functorch grad transform.

autograd.Function changes
- I needed to make some changes to autograd.Function to make this work.
- First, this PR splits autograd.Function into a _SingleLevelFunction
(that works with a single level of functorch transform) and
autograd.Function (which works with multiple levels). This is necessary
because functorch's grad rule needs some way of specifying a backward
pass for that level only.
- This PR changes autograd.Function's apply to either call
`custom_function_call` (if functorch is active) or super().apply (if
functorch isn't active).
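
End to end, a small sketch of what this enables (subject to the feature flag mentioned below; the autograd.Function must define setup_context):

```python
import torch
from torch.func import grad

class Square(torch.autograd.Function):
    @staticmethod
    def forward(x):
        return x ** 2

    @staticmethod
    def setup_context(ctx, inputs, output):
        x, = inputs
        ctx.save_for_backward(x)

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        return 2 * x * grad_output

x = torch.tensor(3.0)
print(grad(Square.apply)(x))  # tensor(6.)
```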

Testing
- Most of this PR is just testing. It creates an autograd.Function
OpInfo database that then gets passed to the functorch grad-based tests
(grad, vjp, vjpvjp).
- Since functorch transform tests are autogenerated from OpInfo tests,
this is the easiest way to test various autograd.Function with
functorch.

Future
- jvp and vmap support coming next
- better error message (functorch only supports autograd.Function that
have the optional setup_context staticmethod)
- documentation to come when we remove the feature flag

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89860
Approved by: https://github.com/soulitzer
2022-12-08 19:31:04 +00:00
Edward Z. Yang
2ad6ed8ac9 Fix some typed storage is deprecated warnings. (#89867)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89867
Approved by: https://github.com/albanD
2022-12-07 20:09:57 +00:00
Ram Rachum
351d73b97f Fix exception causes all over the codebase (#90271)
This is the continuation to #90134 and hopefully the final PR in this series.
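
For reference, the pattern being applied is Python's exception chaining (a generic illustration, not code from this PR):

```python
def parse_port(text: str) -> int:
    try:
        return int(text)
    except ValueError as exc:
        # `from exc` records the original exception as __cause__, so tracebacks
        # show the real root cause instead of "During handling of the above
        # exception, another exception occurred".
        raise RuntimeError(f"invalid port: {text!r}") from exc
```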

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90271
Approved by: https://github.com/kit1980
2022-12-07 04:29:00 +00:00
Richard Zou
4068c5467d [Reland] Move functorch/_src to torch/_functorch (#88756) (#90091)
This will be the last disruptive functorch internals change.

Why are we moving these files?
- As a part of rationalizing functorch we are moving the code in
functorch/_src to torch/_functorch
- This is so that we can offer the functorch APIs as native PyTorch APIs
(coming soon) and resolve some internal build issues.

Why are we moving all of these files at once?
- It's better to break developers all at once rather than many times

Test Plan:
- wait for tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90091
Approved by: https://github.com/anijain2305, https://github.com/ezyang
2022-12-03 14:17:15 +00:00
PyTorch MergeBot
218d9c6e09 Revert "Move functorch/_src to torch/_functorch (#88756)"
This reverts commit 52bc5c1cfe.

Reverted https://github.com/pytorch/pytorch/pull/88756 on behalf of https://github.com/clee2000 due to broke imports in tests 52bc5c1cfe https://github.com/pytorch/pytorch/actions/runs/3574742513/jobs/6010814968 probably a landrace
2022-11-29 17:17:11 +00:00