Months ago, in order to get dynamic shapes working through to Dynamo backends, we changed the calling convention to pass fake tensors rather than real tensors as example inputs to backends. The motivation at the time was, well, backends shouldn't really be peeking at the real tensors when they are doing compilation, and so it would make more sense to hide the real tensors from backends. But there were a bunch of problems:
* This interacted poorly with our accuracy minifier design: accuracy minifier needs access to the real inputs in order to run the model and figure out what happens!
* The TensorRT backend required real inputs and we never figured out how to fix it.
* In practice, all the backends needed to detect if they were passed real tensors, and fakeify them anyway (certainly AOTAutograd does this)
* Parameters and inputs are treated non-uniformly: parameters had to be passed as real tensors, because CUDA graphs requires knowing what the actual tensors are
Furthermore, there were some more problems discovered after the fact:
* Backends may want to optimize on aspects of tensors which you cannot tell without having real tensors; e.g., alignment of the data pointer
So, this PR decides that changing the calling convention was a bad idea, and switches back to passing real tensors. There is a problem though: AOTAutograd will perform fakeification, which means that in practice backends are still going to end up with fake tensors in the end anyway. I want to change this, but this will require some work with bdhirsh's upcoming AOTAutograd export refactor.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99320
Approved by: https://github.com/voznesenskym
Previously, we had a problem when partitioning forward-backward dynamic graphs, which is that we could end up with a backward graph that mentions a symbol in an input tensor (e.g., `f32[s0 + s1]`), but without this symbol being otherwise bound elsewhere. When this happens, we have no way of actually deriving the values of `s0` and `s1`. Our fix for this in https://github.com/pytorch/pytorch/pull/93059 was to just retrace the graph, so that s0 + s1 got allocated a new symbol s2 and everything was happy. However, this strategy had other problems, namely (1) we lost all information from the previous ShapeEnv, including guards and (2) we end up allocating a LOT of fresh new symbols in backwards.
With this change, we preserve the same ShapeEnv between forward and backwards. How do we do this? We simply require that every symbol which may be present inside tensors, ALSO be a plain SymInt input to the graph. This invariant is enforced by Dynamo. Once we have done this, we can straightforwardly modify the partitioner to preserve these SymInt as saved for backwards, if they are needed in the backwards graph to preserve the invariant as well.
This apparently breaks yolov3, but since everything else is OK I'm merging this as obviously good and investigating later.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99089
Approved by: https://github.com/voznesenskym
**Context**
The existing check to see if an arg is duped is `if dupe_arg_pos != kept_pos:`. However, this incorrectly considers every arg after a true duped arg to also be a duped arg.
Consider `flat_args = [a, b, b, c]`, where indices `1` and `2` are duped.
- `add_dupe_map = {0: 0, 1: 1, 2: 1, 3: 2}`
- For `dupe_arg_pos=2, kept_pos=1`, `2 != 1`, so the check correctly identifies the second `b` to be a duped arg.
- For `dupe_arg_pos=3, kept_pos=2`, `3 != 2`, so the check incorrectly identifies the `c` to be a duped arg.
Indeed, if there were more args like `[a, b, b, c, d, e, ...]`, every arg after the second `b` will be considered a duped arg since its `kept_pos` will always be 1 lower than its `dupe_arg_pos`.
**Overview**
This PR changes `add_dupe_map` to be implemented as a `List[int]`, where the list index implicitly represents the `dupe_arg_pos` and the list element represents the `kept_pos`. We use a list to have stable in-order iteration and because we know the keys to be in `{0, 1, ..., len(flat_args) - 1}`.
With `add_dupe_map` as a list, an arg is a dupe exactly when its entry in `add_dupe_map` maps to an already-seen kept position rather than introducing a new one. One way to check this is to count the number of unique args seen so far and compare the entry against that count.
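A minimal sketch of that check (illustrative names only, not the exact AOTAutograd internals):
```py
flat_args = ["a", "b", "b", "c"]
add_dupe_map = [0, 1, 1, 2]  # index = dupe_arg_pos, value = kept_pos

seen_args = 0
for dupe_arg_pos, kept_pos in enumerate(add_dupe_map):
    # An arg is a dupe iff it maps to an already-seen kept position
    # rather than introducing a new one.
    is_dupe_arg = kept_pos < seen_args
    if not is_dupe_arg:
        seen_args += 1
    print(dupe_arg_pos, is_dupe_arg)  # only position 2 (the second "b") is a dupe
```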
This closes https://github.com/pytorch/pytorch/issues/98883, where now the guards change from
```
GUARDS ___guarded_code.valid
and ___check_type_id(L['self'], 93996836333040)
and ___check_obj_id(L['self'], 140119034997536)
and not ___are_deterministic_algorithms_enabled()
and ___check_tensors(L['x'])
and L['self']._buf is L['self']._buf_module._buf
and L['self']._buf_module._buf is L['self']._param
```
to the same guards without the final incorrect `L['self']._buf_module._buf is L['self']._param` guard.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98932
Approved by: https://github.com/ezyang
For the current runtime wrapper in aot, `disable_amp` is always set to True. In fact, we would like to avoid disabling autocast if possible because accessing TLS is slow. In this PR, `disable_amp` depends on whether any autocast mode is enabled instead of always being True. Many operators get a performance improvement (inductor vs. eager) with this fix.
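A minimal sketch of the new condition; the exact check used inside the runtime wrapper may differ:
```py
import torch

# Only pay for disabling autocast when some autocast mode is actually active,
# since touching autocast TLS is comparatively slow.
disable_amp = torch.is_autocast_enabled() or torch.is_autocast_cpu_enabled()
```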
Examples of operators whose speedup (inductor vs. eager) was below 0.8 in torchbench, before and after the fix:
| operator | current | new |
| -- | -- | -- |
| aten.hardsigmoid.default | 0.709372349 | 0.81414306 |
| aten.tanh.default | 0.715227805 | 0.855556349 |
| aten.add.Scalar | 0.682292123 | 0.860371222 |
| aten.sigmoid_backward.default | 0.688039934 | 0.915606579 |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97864
Approved by: https://github.com/EikanWang, https://github.com/jansel, https://github.com/jgong5, https://github.com/bdhirsh
The purpose of this API is to execute a few large components of work:
1) Refactor all the internals of plumbing dynamic dimension information after dynamo to be stateless
2) Decouple allocation controls around dynamic dimensions from verification
3) For (2), for allocation, create an enum that dictates whether we are in DUCK (default today), STATIC (aka assume_static_default in the past), or DYNAMIC (aka user constrained, do not duck shape) mode; see the sketch after this list
4) For (2), for verification, we separate out the list of dynamic ranges entirely from allocation. This means shape_env does not track what we verify on; instead, it is the caller's job to invoke produce_guards() with the various things they want verified, specifically, with the valid ranges. We do use constrain ranges to refine value ranges when doing analysis.
5) We have decided, therefore, as an extension of (4), to double down on "late" checks versus "eager" checks, primarily because the mechanisms for gathering what actually matters happen during guards, and should be a purview of the caller seeking guards, not the shape env. However, for dynamo, these structures are essentially one and the same.
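A minimal sketch of the allocation enum described in (3); the real enum lives in the symbolic shapes code and its exact name and members may differ:
```py
from enum import Enum, auto

class DimDynamic(Enum):   # hypothetical name, for illustration only
    DUCK = auto()     # default: duck shaping, reuse symbols with equal hints
    STATIC = auto()   # assume static (the old assume-static-by-default behavior)
    DYNAMIC = auto()  # user constrained: always allocate a fresh symbol
```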
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96699
Approved by: https://github.com/avikchaudhuri, https://github.com/ezyang
Twice this week I have had people confuse "operator defined with Python
operator registration aka torch.library" and "PyOperator which is used
to define control flow operators and other operators that cannot be
represented in JIT schema." Renaming PyOperator for clarity.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97493
Approved by: https://github.com/SherlockNoMad
I added a bunch of asserts to verify that I didn't accidentally kill copy_ in the graph, hopefully this combined with our existing tests is good enough.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97275
Approved by: https://github.com/bdhirsh
This refactor should make it easier to add an export hook into aot autograd.
(1) I killed `create_forward_or_joint_functionalized()` (and the functions that it called, like `forward_or_joint()`) which used to handle autograd + functionalization all-in-one-go for the joint case, and was also used in the inference case.
I added a few separate helper functions:
`create_functionalized_graph()`: this takes a flat fn, and returns a functionalized fx graph. It is mostly just a thin wrapper around functionalization + make_fx(), but also has some extra logic to manually append `copy_()` ops to the end of the graph.
`fn_no_extra_mutations()`: this creates the fn that we want to trace in the inference code path. It takes in a function that it then calls, and returns the outputs + any (updated) mutated inputs.
`joint_fn_no_external_mutations()`: this creates the fn that we want to trace in the joint code path. It takes in a function, and traces out its joint. It also does the work of cloning inputs that are mutated and require gradients, returning mutated inputs as outputs, and returning intermediate bases as outputs
We should be able to add an export hook by basically adding a similar version of `joint_fn_no_external_mutations` but with a lot more restrictions (guaranteed to have no tangents, not synthetic bases, etc), and calling `create_functionalized_graph()` on it.
Differential Revision: [D44204090](https://our.internmc.facebook.com/intern/diff/D44204090)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96341
Approved by: https://github.com/ezyang
The purpose of this PR is to remove reliance on argument positions in dedup guards, AND extend the functionality to params.
A version of this PR was stamped prior https://github.com/pytorch/pytorch/pull/95831 - but was kinda gross, because it was based on an underlying PR that did way too much with source names.
This PR leaves most of that alone, in favor of just reusing the same name standardization logic that dynamo module registration does.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96774
Approved by: https://github.com/ezyang
Summary:
Adds NNC-like logging that is configured through an env var `TORCH_LOGS`
Examples:
`TORCH_LOGS="dynamo,guards" python script.py` - prints dynamo logs at level INFO with guards of all functions that are compiled
`TORCH_LOGS="+dynamo,guards,graph" python script.py` - prints dynamo logs at level DEBUG with guards and graphs (in tabular) format of all graphs that are compiled
[More examples with full output](https://gist.github.com/mlazos/b17f474457308ce15e88c91721ac1cce)
Implementation:
The implementation parses the log settings from the environment, finds any components (aot, dynamo, inductor) or other loggable objects (guards, graph, etc.) and generates a log_state object. This object contains all of the enabled artifacts, and a qualified log name -> level mapping. _init_logs then adds handlers to the highest level logs (the registered logs), and sets any artifact loggers to level DEBUG if the artifact is enabled.
Note: set_logs is an alternative for manipulating the log_state, but if the environment contains TORCH_LOGS, the environment settings will be prioritized.
Adding a new log:
To add a new log, a dev should add their log name to torch._logging._registrations (there are examples there already).
Adding a new artifact:
To add a new artifact, a dev should add their artifact name to torch._logging._registrations as well.
Additionally, wherever the artifact is logged, `torch._logging.getArtifactLogger(__name__, <artifact_name>)` should be used instead of the standard logging implementation.
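For example, logging to the existing "graph" artifact might look like the following sketch:
```py
import torch._logging

# This message only shows up when the "graph" artifact is enabled, e.g.
#   TORCH_LOGS="+dynamo,graph" python script.py
graph_log = torch._logging.getArtifactLogger(__name__, "graph")
graph_log.debug("about to print a compiled graph with %d nodes", 42)
```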
[design doc](https://docs.google.com/document/d/1ZRfTWKa8eaPq1AxaiHrq4ASTPouzzlPiuquSBEJYwS8/edit#)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94858
Approved by: https://github.com/ezyang
When constructing the joint graph, we normally have to clone any inputs that are mutated, so that we can pass in the original, pre-mutation inputs as leaves to autograd.
Previously, we were doing this for all mutated inputs - but we only need to do it for inputs that require gradients and participate in autograd.
Hopefully this should speed up code like batch norm - I think before this we were unnecessarily cloning the running stats during training.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96342
Approved by: https://github.com/albanD, https://github.com/ezyang
Another bonus of factoring the synthetic_base logic into one place: we used to have a `CompiledRuntimeMetadata` object that encapsulated `ViewAndMutationMeta`, plus a bunch of extra synthetic base metadata that was plumbed around. Now I can kill that first metadata object, and use `ViewAndMutationMeta` on its own everywhere.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96340
Approved by: https://github.com/ezyang
Ed pointed it out a few days ago - I probably added this mistakenly a few months ago. I can't think of any reason it's necessary, and removing it doesn't cause any tests to fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96339
Approved by: https://github.com/ezyang
This refactor doesn't significantly change LoC in aot autograd, but I think this nets out to making it clearer (interested in peoples' thoughts).
The idea is that I tried to re-write the part of aot autograd that deals with synthetic bases in a layered way, similar to how Ed wrote the logic for dedup'ing inputs: it happens in one place, and all of the downstream transformation in aot autograd don't have to worry about it.
Specifically, I added a new function `aot_wrapper_synthetic_base`, similar to the existing `aot_wrapper_dedupe`.
The benefit: none of the other code in aot autograd needs to think about synthetic bases (previously, synthetic base code was intertwined in several places).
The downsides: there are two.
(1) `aot_wrapper_synthetic_base()` needs to have its own epilogue. There is one particularly hairy case, where factoring the synthetic base logic to a single location was painful: If you have two inputs that alias each other, where one gets a data mutation, and the other gets a metadata mutation.
Ordinarily, metadata mutations are handled by the runtime epilogue, in `create_runtime_wrapper`. However, now that things are factored this way, the runtime wrapper operates only on synthetic bases instead of operating on the original inputs. For data mutations, it is fine to apply the data mutation to the synthetic base instead of the original input alias. But for metadata mutations, we **need** to apply the metadata mutation directly to the original inputs.
The way that I handled this was by tracking which inputs slot into this specific case (part of a synthetic base, and get metadata mutations), and updating the flat_fn() that we pass downstream to return these updated inputs as extra outputs. From the perspective of downstream logic, these are real user outputs, that it can treat like any other user outputs. `aot_wrapper_synthetic_base` will know to grab these extra outputs and use them to apply the metadata mutations.
This was pretty annoying, but has the benefit that all of that logic is encapsulated entirely in `aot_wrapper_synthetic_base()`.
(2) input mutations are now performed on the synthetic base instead of the individual aliases.
You can see the original code comment [here](b0b5f3c6c6/torch/_functorch/aot_autograd.py (L1131)) for details. We used to do the optimized thing in this case, and now we do the less optimized thing (copying the entire synthetic base, instead of the potentially smaller alias).
To be fair, we had no data showing that this optimization was showing improvements on any models in practice. I also think that the main reason anyone would ever run across this problem is because of a graph break - so if you care about perf, you probably want to avoid the extra graph breaks to begin with. I haven't added any warnings for this, but we probably could depending on what people think.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96235
Approved by: https://github.com/ezyang
For a while now, we've been re-running our functionalization analysis pass twice - once to get metadata when dedup'ing, and an entire second time during aot_dispatch_base/autograd.
This should also probably speed up compile times pretty noticeably, since we're going from:
(a) inference-only trace case: 3 fw traces -> 2 fw traces
(b) autograd trace case: 2 fw traces + 1 joint trace -> 1 fw trace + 1 joint trace
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95992
Approved by: https://github.com/ezyang
This PR does a few things all at once, as I needed to fix several bugs on the way here. The main goal of the PR is to fix the `'float' object has no attribute '_has_symbolic_sizes_strides'` error. The general idea is to heavily penalize non-SymInt but still SymNode cuts in the graph. This doesn't work for default partitioner, so essentially, dynamic shapes with default partitioner is not supported.
While doing this, I had to fix a few other bugs in the partitioner:
* SymNode operations weren't considered recomputable. But they are very cheap, go wild.
* zeros_like wasn't considered recomputable, and this prevented some gradient formulas (e.g., for angle with real inputs) from successfully finding a cut at all
* AOTAutograd tests use the default partitioner. I switch them to use min-cut partitioner...
* ...but this reveals a bug where if we have nodes in backward outputs that don't depend on tangents, they never get assigned to the backward graph. I fix this by making the backward outputs mandatory to be in backwards. I have to be careful to filter out None backward outputs; those never participate in flow analysis!
This causes some wobbling for the min-cut tests, but these seem legitimate: since we're now willing to recompute, the partitioner can reduce the number of SymInts it transmits by just doing some recompute in the backend.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96653
Approved by: https://github.com/ngimel
This makes the next PR in the stack cleaner: having the top level entry point to aot autograd perform the functionalization analysis pass once, and plumb the metadata everywhere else that we need it.
I put it in a separate PR because I recently learned that this function is used in fbcode, so I'll need to fix up internals when I land this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95991
Approved by: https://github.com/ezyang
Fixes https://github.com/pytorch/pytorch/issues/95167
More details are in that issue. To summarize, the issue shows up when we have some code like this:
```
def f(x):
    x.detach().mul_(2)  # can also happen if the mul_() happens under torch.no_grad()
    return x + 1
```
AOTAutograd will then spit out code like this:
```
def compiled_fn(x):
    x_updated = x.mul(2)
    out = x_updated + 1
    return x_updated, out

def CompiledFunction.forward(x):  # pseudocode, this is part of an autograd.Function
    x_updated, out = compiled_fn(x)
    return x_updated, out

def runtime_wrapper(x):
    x_updated, out = CompiledFunction.apply(x)
    x.copy_(x_updated)

x = torch.ones(2, requires_grad=True)
out = runtime_wrapper(x)
```
However, the call to `x.copy_(x_updated)` will fail with the error: `a leaf Variable that requires grad is being used in an in-place operation`. This is because `x` is an autograd leaf, and autograd doesn't allow you to mutate leaves.
In this case though, the data mutation should be entirely opaque to autograd - all mutations happened underneath a `.detach()` or a `torch.no_grad()`.
As Ed pointed out in the issue, we can detect this situation by checking if the mutated input is an autograd leaf. If it is, then any mutations on it must have been hidden from autograd, since otherwise the eager code would have errored. The solution I added is to detect this situation, and manually run `x.detach().copy_(x_updated)`, to hide the update from autograd.
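A simplified sketch of that check; the real logic lives in AOTAutograd's runtime wrapper, and `apply_input_mutation` is just an illustrative name:
```py
import torch

def apply_input_mutation(x: torch.Tensor, x_updated: torch.Tensor) -> None:
    if x.is_leaf and x.requires_grad:
        # The eager mutation must have happened under .detach()/no_grad(),
        # otherwise eager mode would have errored; hide it from autograd too.
        x.detach().copy_(x_updated)
    else:
        x.copy_(x_updated)
```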
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95980
Approved by: https://github.com/ezyang
Previously, if dynamic shapes were turned on and we had a forward graph that returns a symint, then we would generate a backward graph that takes in a tangent input for that symint fwd output. This causes problems for downstream - inductor will see an input that it expects to be a symint, but it gets a `None` from autograd.
Confirmed that this repro now passes:
```
benchmarks/dynamo/torchbench.py --devices cuda --inductor --dynamic-shapes --unspecialize-int --accuracy --training --only drq
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96219
Approved by: https://github.com/ezyang
The _make_boxed logic probably needs a cleanup, but this fixes a spurious warning that we should get in before the release.
Confirmed that this used to emit a warning and no longer does:
```
import torch
lin = torch.nn.Linear(100, 10)
def f(x):
    return lin(x)
opt_f = torch.compile(f)
opt_f(torch.randn(10, 100, requires_grad=False))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95521
Approved by: https://github.com/ngimel
Applies the remaining flake8-comprehensions fixes and checks. This change replaces all remaining unnecessary generator expressions with list/dict/set comprehensions, which are more succinct, performant, and better supported by our torch.jit compiler. It also removes useless generators such as `set(a for a in b)`, resolving them into just the set call.
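Illustrative before/after examples of the two kinds of rewrites described above:
```py
nums = [3, 1, 3, 2]

squares = set(n * n for n in nums)  # before: generator expression passed to set()
squares = {n * n for n in nums}     # after: set comprehension

unique = set(n for n in nums)       # before: useless generator
unique = set(nums)                  # after: just the set call
```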
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676
Approved by: https://github.com/ezyang
I applied some flake8 fixes and enabled checking for them in the linter. I also enabled some checks for my previous comprehensions PR.
This is a follow up to #94323 where I enable the flake8 checkers for the fixes I made and fix a few more of them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94601
Approved by: https://github.com/ezyang
The functorch setting still exists, but it is no longer necessary for PT2:
we infer use of the Python dispatcher by checking if the ambient
FakeTensorMode has a ShapeEnv or not. The setting now only controls
direct AOTAutograd use; for PT2, it's sufficient to use
torch._dynamo.config.dynamic_shapes.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94469
Approved by: https://github.com/Chillee, https://github.com/voznesenskym, https://github.com/jansel
Historically, we compute `size_hint` on the fly by substituting the `var_to_val` mapping into the sympy expression. With this change, we also maintain the hint directly on SymNode (in `expr._hint`) and use it in lieu of sympy substitution when it is available (mostly guards on SymInt, etc.; in particular, in idiomatic Inductor code, we typically manipulate sympy expressions directly and so do not have a way to conveniently maintain hints).
While it's possible this will give us modest performance improvements, this is not the point of this PR; the goal is to make it easier to carefully handle unbacked SymInts, where hints are expected not to be available. You can now easily test if a SymInt is backed or not by checking `symint.node.hint is None`.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94201
Approved by: https://github.com/voznesenskym
tldr; this should fix some minor perf regressions that were caused by adding more as_strided() calls in aot autograd.
This PR adds a new context manager, `torch.autograd._set_view_replay_enabled()`.
Context: AOT Autograd has special handling for "outputs that alias graph intermediates". E.g. given this function:
```
def f(x):
    y = torch.mul(x, 2)
    out = y.view(-1)
    return out
```
AOT Autograd will do the following:
```
def fn_to_compile(x):
    y = torch.mul(x, 2)
    out = y.view(-1)
    # return the graph intermediate
    return y, out

compiled_fn = compile(fn_to_compile)

def wrapper(x):
    y, out = compiled_fn(x)
    # regenerate the alias of the graph intermediate
    return out._view_func(y)
```
What's annoying is that `out._view_func()` will result in a `.as_strided` call, because `out` is an ordinary runtime tensor. This (likely?) caused a perf regression, because when running the backward, `as_strided_backward()` is slower than `view_backward()`.
In this PR, I added some TLS for instructing autograd to do view replay instead of as_strided, even when given a normal tensor. I'm definitely interested in thoughts from autograd folks (cc @albanD @soulitzer). A few points that I want to bring up:
(1) One reason that this API seems generally useful to me is because of the case where you `torch.compile()` a function, and you pass in two inputs that alias each other, and mutate one of the inputs. Autograd is forced to add a bunch of as_strided() calls into the graph when this happens, but this would give users an escape hatch for better compiled perf in this situation
(2) To be fair, AOT Autograd probably won't need this TLS in the long term. There's a better (more complicated) solution, where AOT Autograd manually precomputes the view chain off of graph intermediates during tracing, and re-applies them at runtime. This is kind of complicated though and feels lower priority to implement immediately.
(3) Given all of that I made the API private, but lmk what you all think.
This is a followup of https://github.com/pytorch/pytorch/pull/92255.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92588
Approved by: https://github.com/ezyang, https://github.com/albanD
Two small changes that I'm bundling together because one of them needs to touch fbcode and I'm not sure how to do stacked diffs + internal changes + land before release cut.
Remove allow_meta from ctor, and allow by default: we should be able to trace through meta with fake tensors, so in some senses it's a bit weird to expose an option to the user to disallow this. However, it's still useful debug-wise to error from time to time, so I've added an option to the config that will restore the previous behavior.
Remove `throw_on_data_dependent_ops=True`: this was intended as a temporary behavior as we were smoothing things turning on the erroring. There are no uses anywhere of `throw_on_data_dependent_ops=False` I could find.
These are technically backward-incompatible, but fake tensor is new since the last release / in a private namespace, and I don't want to release it with baggage that would be hard to remove later.
Fix for https://github.com/pytorch/pytorch/issues/92877.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93993
Approved by: https://github.com/bdhirsh, https://github.com/ezyang
When integrating AOT logging with TorchInductor trace, the ability to print graphs to the console if the user specified any of the env vars was removed (in favor of using TORCH_COMPILE_DEBUG). This restores this by checking if the user set any of the aot debug variables *before* setting up the remainder of the logging, and adding a stream to stdout if any of those env vars are set.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92720
Approved by: https://github.com/Chillee
Mitigates https://github.com/pytorch/pytorch/issues/91469
Changes:
- ~once_differentiable can now be parametrized to print a custom error message~
- instead of once_differentiable, we do the backward inside another custom Function, which makes sure the graph is connected, but also makes sure to error on double backward
- we now explicitly error when doing double backward with torch.compile + aot_autograd instead of being silently incorrect. ~The niceness of the error message can vary depending on whether your grad_outputs are passed, or whether you are doing `.grad()` or `.backward()`.~
Unchanged:
- doing backward inside compiled function is still allowed. It currently causes a graph break and is equivalent to doing backward outside the compiled function. It might be nice to disallow this explicitly as well, but that can be done in a follow up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92348
Approved by: https://github.com/albanD
Changes in details:
- Fix and update some out-of-date type hints in `_functorch/make_functional.py`.
- ~Explicitly use `OrderedDict` for order-sensitive mappings.~
In `create_names_map()`, `_swap_state()`, and `FunctionalModuleWithBuffers.__init__()`, an unordered `dict` was used. The key order must be preserved for `dict.items()`, since it is `zip`ped with a tuple of `params`/`buffers`. Although the built-in dictionary has been insertion-ordered since Python 3.6 ([PEP 468](https://peps.python.org/pep-0468)), explicit is better than implicit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91579
Approved by: https://github.com/zou3519
This PR:
- Updates the docs to say it is deprecated
- Raises a UserWarning
- Changes most of the callsites inside PyTorch to use
torch.func.functional_call, minus the test_stateless testing.
The motivation behind this is that we can now align behind a single
functional_call API in PyTorch.
Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92280
Approved by: https://github.com/albanD
This PR:
- adds deprecation warnings when calling the functorch APIs
- adds documentation saying that those APIs are deprecated
It does this by creating thin wrappers around the original APIs that (1)
raise deprecation warnings and (2) have an additional line in their
documentation that they are deprecated.
NB:
- Python suppresses DeprecationWarning by default, so we use UserWarning instead.
Test Plan:
- New tests
- the functorch.* APIs are still tested for correctness because that's
what test/functorch/* use (as opposed to directly calling the
torch.func.* APIs)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92279
Approved by: https://github.com/albanD, https://github.com/soulitzer
`torch.func.stack_module_state` is our replacement for
`functorch.combine_state_for_ensemble`. The most common usage for
combine_state_for_ensemble is to
- create stacked parameters and buffers
- use vmap to run the forward pass
- use regular PyTorch autograd to run the backward pass (e.g.,
Tensor.backward)
- optimize directly over the stacked parameters (this is more performant
than optimizing over the unstacked parameters).
Right now, stack_module_state returns stacked parameters that cannot be
optimized directly (only leaf tensors can have a .grad field); this PR
fixes that by turning the stacked parameters back into leaf tensors.
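A minimal sketch of that workflow; the names `models`, `base`, and `run` are illustrative, not part of the API:
```py
import copy
import torch
from torch.func import stack_module_state, functional_call, vmap

models = [torch.nn.Linear(4, 2) for _ in range(3)]
params, buffers = stack_module_state(models)  # stacked leaf tensors after this PR

# "Stateless" copy of one ensemble member, used only for its structure.
base = copy.deepcopy(models[0]).to("meta")

def run(p, b, x):
    return functional_call(base, {**p, **b}, (x,))

x = torch.randn(3, 4)                # one input row per ensemble member
out = vmap(run)(params, buffers, x)  # vmap runs the forward pass
out.sum().backward()                 # regular autograd; .grad lands on the
                                     # stacked (leaf) parameters
```
The stacked `params` can then be handed directly to an optimizer, which is the use case this PR unblocks.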
Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92278
Approved by: https://github.com/soulitzer
functorch used to have a switch that enables/disables autograd.Function.
That switch now enables/disables torch.autograd.function._SingleLevelFunction, so
I've renamed it accordingly.
We could just delete the switch because users should not be directly
working with torch.autograd.function._SingleLevelFunction. However,
it was useful for debugging when something went wrong when I was
implementing the autograd.Function <> functorch interaction, so I want
to keep it around as a debugging tool for a while since the code is
already there.
Test Plan:
- updated tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92025
Approved by: https://github.com/soulitzer
We don't actually need `output_shapes` to implement
`generate_vmap_rule=True` support for autograd.Function.
- We need this in the vjp (backward) case because autograd automatically
reduces grad_inputs to inputs and we need to replicate that behavior.
In order to replicate that behavior, we recorded the original input
shapes so we know how to reduce the grad_input.
- There is no such behavior for forward-mode AD, so we don't need to
pass an `output_shapes` to reductify.
This PR simplifies the API of `reductify` and `reductify_leaf`. Instead
of accepting `input_shape_without_bdim` and `allow_expanded_grad`, we
now combine these into a single argument,
`reduce_to_input_shape_without_bdim`.
- if it is None, then we don't do anything
- if it is not-None and a shape, then we will reduce the grad to the
provided shape.
Test Plan:
- updated original unittests
- wait for test suite
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92024
Approved by: https://github.com/soulitzer
This PR:
- adds a nice error message if the user doesn't follow the API of the
vmap staticmethod correctly. That is, the user must return two
arguments from the vmap staticmethod API: (outputs, out_dims), and
out_dims must be a PyTree with either the same structure as `outputs`
or be broadcastable to the same structure as `outputs`.
- Fixes an edge case for out_dims=None. out_dims is allowed to be None,
but wrap_outputs_maintaining_identity was treating "None" as "This is
not the vmap case"
Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92023
Approved by: https://github.com/soulitzer
This PR:
- changes generate_vmap_rule to either be True or False. Previously it
could be True, False, or not set. This simplifies the implementation a
bit.
- changes the vmap staticmethod to always be on the autograd.Function
rather than sometimes defined.
This is how the other staticmethod (forward, backward, jvp) are
implemented and allows us to document it.
There are 4 possible states for the autograd.Function w.r.t. to the
above:
- generate_vmap_rule is True, vmap staticmethod overridden. This raises
an error when used with vmap.
- generate_vmap_rule is False, vmap staticmethod overridden. This is
valid.
- generate_vmap_rule is True, vmap staticmethod not overridden. This is
valid.
- generate_vmap_rule is False, vmap staticmethod not overridden. This
raises an error when used with vmap.
Future:
- setup_context needs the same treatment, but that's a bit trickier to
implement.
Test Plan:
- new unittest
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91787
Approved by: https://github.com/soulitzer
Support for jvp is very similar to support for backward():
- We need to vmap over a version of the original autograd.Function's jvp
method that does not take ctx as input.
- On the output, we need to reductify to ensure the output tangent has
the same shape as the output. This reductify does not have the
extra reduction semantics, because PyTorch forward-mode AD requires the
output tangent to have the same exact shape as the output.
- setup_context needs to tell us the bdims of the saved_tensors
(necessary for vmap over jvp_no_context), as well
as the output shapes (necessary for reductify).
Test Plan:
- Added jvp support to the *GenVmapAutogradFunction
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91211
Approved by: https://github.com/soulitzer
This PR adds a functionalization path for torch.cond. As it is the first pass, we only functionalize very restrictive use cases. We explicitly disallow the following:
- Output of each branch aliasing input
- In-place mutation on inputs given to each branch
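For reference, a usage pattern that stays within these restrictions; this sketch assumes the `functorch.experimental.control_flow` entry point for `cond`:
```py
import torch
from functorch.experimental.control_flow import cond

def true_fn(x):
    return x.sin()

def false_fn(x):
    return x.cos()

def f(pred, x):
    # Branches are pure functions of their operands: no aliasing of inputs
    # in the outputs and no in-place mutation of the operands.
    return cond(pred, true_fn, false_fn, [x])

out = f(torch.tensor(True), torch.randn(3))
```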
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89966
Approved by: https://github.com/zou3519
It turns out that we *do* need to update *_scatter ops to return the exact same strides as their inputs. I added a test to `test/test_functionalization.py`, which now trips thanks to Ed's functionalization stride debugging check. It only actually ends up causing silent correctness problems if you try to .backward() on that function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91029
Approved by: https://github.com/ezyang
Design document:
https://docs.google.com/document/d/1bIQkWXy3J35_20c_a5kchikabBW5M8_uRAhl0BIMwU4/edit
This PR adds a `generate_vmap_rule` option (default False) to autograd.Function.
By setting it to True, a user promises to us that their autograd.Function's
{forward, backward, jvp}, if defined, only uses PyTorch operations, in addition to the other
limitations of autograd.Function+functorch (such as the user not
capturing any Tensors being transformed over from outside of the
autograd.Function).
Concretely, the approach is:
- we update `custom_function_call` to accept an additional
`generate_vmap_rule` argument.
- The vmap rule for `custom_function_call` and `generate_vmap_rule=True`
is: we construct a vmapped version of the autograd.Function and dispatch
on it.
- The vmapped version of the autograd.Function can be thought of like
the following: if we have an autograd.Function Foo, then
VmappedFoo.apply(in_dims, ...) has the same semantics as
vmap(Foo.apply, in_dims...)
- VmappedFoo's forward, setup_context, and backward staticmethod are
vmapped versions of Foo's staticmethods.
- See the design doc for more motivation and explanation
Test Plan:
- This PR introduces additional autograd.Function with the suffix "GenVmap" to
autograd_function_db.
- There are also some minor UX tests
Future:
- jvp support
- likely more testing to come, but please let me know if you have
cases that you want me to test here.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90966
Approved by: https://github.com/soulitzer
As seen in
https://docs.google.com/document/d/1bIQkWXy3J35_20c_a5kchikabBW5M8_uRAhl0BIMwU4/edit
`reductify_leaf(grad_input, ...)` is a helper function that processes a
single grad_input Tensor. The reason why we need it is:
- the grad_input has some optional bdim
- the input has some optional bdim
- if these are different, we need to coerce the grad_input into having
the same shape as the input, either by reducing or expanding the
grad_input.
Note that there is a special case in autograd that the user is allowed
to return a grad_input Tensor that is an expanded version of the
original input tensor. In this case, autograd automatically reduces
grad_input to the same shape as the input. Unfortunately this logic
doesn't work when bdims are involved, so we manually handle it in
`reductify_leaf`.
Test Plan:
- tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90965
Approved by: https://github.com/soulitzer
As seen in
https://docs.google.com/document/d/1bIQkWXy3J35_20c_a5kchikabBW5M8_uRAhl0BIMwU4/edit
`restore_vmap` is a private helper function. It is vmap but has the
following
differences:
- instead of returning outputs, it returns an (outputs, out_dims) tuple.
out_dims is a pytree of the same shape as outputs and contains Optional[int]
specifying where the vmapped dimension, if it exists, is in the
corresponding output.
- does no validation on in_dims or inputs (vmap expects at least one
Tensor to be vmapped).
restore_vmap allows for no inputs to have the vmap dimension
- does no validation on outputs (vmap expects only Tensor outputs)
restore_vmap allows for return of arbitrary outputs (not just
Tensors)
Test Plan:
- added some simple test to test restore_vmap
- I am OK with restore_vmap not being a part of vmap right now -- the
implementation of vmap rarely changes and it is a bit difficult to
refactor vmap in a way that restore_vmap is a subroutine.
Other questions:
- Bikeshedding the `restore_vmap` name
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90963
Approved by: https://github.com/samdow, https://github.com/soulitzer
This PR sets up torch.func and populates it with the following APIs:
- grad
- grad_and_value
- vjp
- jvp
- jacrev
- jacfwd
- hessian
- functionalize
- vmap
It also renames all instances of `functorch` to `torch.func` in the docs
for those APIs.
We rewrite the `__module__` fields on some of the above APIs so that the
APIs fit PyTorch's public api definition.
- For an API to be public, it must have a `__module__` that points to a
public PyTorch submodule. However, `torch._functorch.eager_transforms`
is not public due to the leading underscore.
- The solution is to rewrite `__module__` to point to where the API is
exposed (torch.func). This is what both Numpy and JAX do for their
APIs.
- h/t pmeier in
https://github.com/pytorch/pytorch/issues/90284#issuecomment-1348595246
for idea and code
- The helper function, `exposed_in`, is confined to
torch._functorch/utils for now because we're not completely sure if
this should be the long-term solution.
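A minimal sketch of that `__module__` rewriting; `exposed_in` here is a stand-in for the helper kept in torch._functorch/utils:
```py
def exposed_in(module):
    def wrapper(fn):
        fn.__module__ = module
        return fn
    return wrapper

@exposed_in("torch.func")
def grad(func, argnums=0, has_aux=False):
    """Toy placeholder standing in for the real transform."""
    raise NotImplementedError

print(grad.__module__)  # "torch.func", so it satisfies the public-API check
```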
Implication for functorch.* APIs:
- functorch.grad is the same object as torch.func.grad
- this means that the functorch.grad docstring is actually the
torch.func.grad docstring and will refer to torch.func instead of
functorch.
- This isn't really a problem since the plan on record is to deprecate
functorch in favor of torch.func. We can fix these if we really want,
but I'm not sure if a solution is worth maintaining.
Test Plan:
- view docs preview
Future:
- vmap should actually just be torch.vmap. This requires an extra step
where I need to test internal callsites, so, I'm separating it into a
different PR.
- make_fx should be in torch.func to be consistent with `import
functorch`. This one is a bit more of a headache to deal with w.r.t.
public api, so going to deal with it separately.
- beef up func.rst with everything else currently on the functorch
documention website. func.rst is currently just an empty shell.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91016
Approved by: https://github.com/samdow
This should fix hf_Longformer, AllenaiLongformerBase, and tacotron2 with dynamic shapes. Example repro:
```
TORCHDYNAMO_DYNAMIC_SHAPES=1 AOT_DYNAMIC_SHAPES=1 python benchmarks/dynamo/torchbench.py --accuracy --backend aot_eager --training --only hf_Longformer
```
used to fail with:
```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [4, 1024, 12, 513]], which is output 0
of AsStridedBackward0, is at version 6; expected version 4 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient,
with torch.autograd.set_detect_anomaly(True).
```
The problem is that:
(1) when we have a tensor from the forward whose sizes are needed in the backward, we were saving the actual tensor for backward, and directly grabbing the sizes off of it inside of the backward graph (bad for perf)
(2) If that tensor happens to be a graph input that gets mutated, we end up with the above error. Autograd yells at you if you try to save a tensor for backward, and later mutate it.
I confirmed that this problem doesn't happen for the min cut partitioner.
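A standalone illustration of the underlying autograd rule being tripped here (not the Longformer repro itself):
```py
import torch

x = torch.randn(4, requires_grad=True).clone()  # non-leaf, so in-place is allowed
y = x * x            # multiplication saves x for its backward
x.mul_(2)            # bumps x's version counter
y.sum().backward()   # RuntimeError: ... modified by an inplace operation
```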
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91012
Approved by: https://github.com/ezyang
This PR:
- adds VmapInterpreter.randomness. This returns the randomness option
the user provided in vmap(..., randomness=...)
- adds randomness in the info object passed to the vmap staticmethod of
autograd.Function. This is so that the user can handle random operations
on their own terms (if randomness="error", and if the autograd.Function
has random operations, then it is the user's responsibility to raise an
error).
Test Plan:
- updated unittest
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90789
Approved by: https://github.com/samdow, https://github.com/soulitzer
It turns out it is possible to break cycles by not directly importing a
module:
- there's a problem that torch.jit imports torch._ops and torch._ops
import torch.jit
- there's another problem that torch.autograd.function imports
custom_function_call but torch._functorch.autograd_function imports
torch.autograd.function
The "better" way to handle all of this is to do some large refactoring so
that torch._functorch.autograd_function imports some file that has
_SingleLevelAutogradFunction and then have torch.autograd.function
depend on torch._functorch.autograd_function... (and ditto for torch.jit
vs torch._ops), but I'm scared to move code around too much for BC
reasons and the fix in this PR works well.
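A sketch of the pattern: deferring the import into the function body breaks the module-level cycle (the call site shown is illustrative):
```py
def apply(cls, *args, **kwargs):
    # Imported lazily so that importing torch.autograd.function does not pull
    # in torch._functorch.autograd_function at module-import time.
    from torch._functorch.autograd_function import custom_function_call
    return custom_function_call(cls, *args, **kwargs)
```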
Test Plan:
- import torch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90415
Approved by: https://github.com/albanD, https://github.com/soulitzer
This PR adds functorch.jvp support for autograd.Function. It does so by
adding a jvp rule for custom_function_call.
For a regular PyTorch operation (like at::sin), the VariableType kernel:
- re-dispatches to at::sin
- calls the jvp rule for at::sin
The jvp rule for custom_function_call does just that. It constructs a
new autograd.Function (because the above logic already exists). Inside
the forward, it re-dispatches to custom_function_call. In the jvp rule,
it just calls whatever the jvp rule is supposed to be.
Since this logic is really close to the custom_function_call_grad, I
just put them together.
Test Plan:
- added jvp rules to the autograd.Function in autograd_function_db
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90077
Approved by: https://github.com/albanD, https://github.com/soulitzer
Motivation
- These were previously defined in functorch. They are not
functorch-specific, so I'm moving them to torch.autograd.forward_ad and
the autograd python bindings.
- I need this to avoid some of my cyclic import problems.
Should these be public APIs? Probably. Though this needs discussion, so
punting it to the future.
Test Plan:
- moved the tests of these from test/functorch/test_eager_transforms.py
to test/test_autograd.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90240
Approved by: https://github.com/soulitzer
This PR adds a `vmap` staticmethod to autograd.Function and a
corresponding vmap kernel for custom_function_call. These two items mean
that autograd.Function with a vmap staticmethod can be used with vmap.
```py
class NumpyMul(torch.autograd.Function):
    @staticmethod
    def forward(x, y):
        return torch.tensor(to_numpy(x) * to_numpy(y), device=x.device)

    @staticmethod
    def setup_context(ctx, outputs, x, y):
        ctx.save_for_backward(x, y)

    @staticmethod
    def backward(ctx, grad_output):
        x, y = ctx.saved_tensors
        gx = None
        if isinstance(x, torch.Tensor) and x.requires_grad:
            gx = NumpyMul.apply(grad_output, y)
        gy = None
        if isinstance(y, torch.Tensor) and y.requires_grad:
            gy = NumpyMul.apply(grad_output, x)
        return gx, gy

    @staticmethod
    def vmap(info, in_dims, x, y):
        x_bdim, y_bdim = in_dims
        x = x.movedim(x_bdim, -1) if x_bdim else x.unsqueeze(-1)
        y = y.movedim(y_bdim, -1) if y_bdim else y.unsqueeze(-1)
        result = NumpyMul.apply(x, y)
        result = result.movedim(-1, 0)
        return result, 0
```
API Spec
- the staticmethod takes two arguments (info, in_dims) as well as the
unexpanded inputs (x, y).
- If we think about it as `vmap(info, in_dims, *args)`, `in_dims` is a
pytree with the same tree structure as args. It has None if the arg is
not being vmapped over and an integer vmapped dimension index if it is.
- `info` is an object with metadata about the vmap. It currently has one
field, `info.batch_size`. In the future we can extend this by adding
things like the randomness information.
- If there is a single vmap going on, (x, y) are NOT BatchedTensors,
they've already been unpacked.
- We expect the user to return a `(outputs, out_dims)` tuple. `out_dims`
must "broadcast" to the same pytree structure as `outputs`.
Semantics
- vmap(NumpyMul.apply)(x) will apply the vmap staticmethod if there is
one and will never actually run NumpyMul.forward.
- In order for the autograd.Function to support nested vmap (e.g.,
`vmap(vmap(NumpyMul.apply))(x)`, then the vmap staticmethod must call
into operations that vmap understands (i.e. PyTorch operators or more
autograd.Function).
At a high level, this PR:
- adds a vmap rule for custom_function_call
Testing
- Added some tests for in_dims and info
- Added vmap staticmethod to most of the autograd.Function in
autograd_function_db and sent them through functorch's vmap-related
OpInfo tests
Future
- Better error messages if the user gets the return contract wrong. I
didn't include them in this PR because it might involve a refactor of
some of the existing code in functorch/_src/vmap.py that will add
~200LOC to the PR, but LMK if you'd prefer it here.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90037
Approved by: https://github.com/samdow, https://github.com/soulitzer
- Adds `log_level` to aot's config
- Outputs log to `<graph_name>_<log_level>.log` in aot_torchinductor subfolder of the debug directory
- Modifies the Inductor debug context to use the graph name when naming the folder instead of the os pid
- Adds `TORCH_COMPILE_DEBUG` flag to enable it, (as well as separate dynamo and inductor logs)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88987
Approved by: https://github.com/Chillee
Happy to split this PR more if it helps.
This PR adds functorch.grad support for autograd.Function. There's a lot
going on; here is the high level picture and there are more details as
comments in the code.
Mechanism (PyOperator)
- Somehow, autograd.Function needs to dispatch with functorch. This is
necessary because every layer of functorch needs to see the
autograd.Function; grad layers need to preserve the backward pass.
- The mechanism for this is via PyOperator. If functorch transforms are
active, then we wrap the autograd.Function in a `custom_function_call`
PyOperator where we are able to define various rules for functorch
transforms.
- `custom_function_call` has a rule for the functorch grad transform.
autograd.Function changes
- I needed to make some changes to autograd.Function to make this work.
- First, this PR splits autograd.Function into a _SingleLevelFunction
(that works with a single level of functorch transform) and
autograd.Function (which works with multiple levels). This is necessary
because functorch's grad rule needs some way of specifying a backward
pass for that level only.
- This PR changes autograd.Function's apply to either call
`custom_function_call` (if functorch is active) or super().apply (if
functorch isn't active).
Testing
- Most of this PR is just testing. It creates an autograd.Function
OpInfo database that then gets passed to the functorch grad-based tests
(grad, vjp, vjpvjp).
- Since functorch transform tests are autogenerated from OpInfo tests,
this is the easiest way to test various autograd.Function with
functorch.
Future
- jvp and vmap support coming next
- better error message (functorch only supports autograd.Function that
have the optional setup_context staticmethod)
- documentation to come when we remove the feature flag
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89860
Approved by: https://github.com/soulitzer
This will be the last disruptive functorch internals change.
Why are we moving these files?
- As a part of rationalizing functorch we are moving the code in
functorch/_src to torch/_functorch
- This is so that we can offer the functorch APIs as native PyTorch APIs
(coming soon) and resolve some internal build issues.
Why are we moving all of these files at once?
- It's better to break developers all at once rather than many times
Test Plan:
- wait for tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90091
Approved by: https://github.com/anijain2305, https://github.com/ezyang