This is a PoC of AOTDispatch support. This PR actually works on basic examples, and I'm working on testing it out on `DTensor` (with @wanchaol), `SemiStructuredSparsityTensor` (with @jcaip), and `FP8Tensor`.
There are some design decisions baked into the PR that I think we need consensus on though - so I'm planning on writing a larger design doc to go over the changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104483
Approved by: https://github.com/ezyang
The first reland broke internal (failing diff: D49617462).
The major error looks like it's because there's an internal-only higher order op that needs a new functionalization rule. I'm going to land an internal diff for that and confirm tests pass before relanding this PR.
Also confirmed that the issue from https://github.com/pytorch/pytorch/issues/110121 is fixed, and added a test.
This reverts commit 1b90f07f5a.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110079
Approved by: https://github.com/ezyang
I'm pretty sure this is fixed, but I'll run inductor and trunk CI. The previously failing trunk test was due to the recently landed selective activation checkpointing (SAC) code assuming it can detect whether or not AOTAutograd is running by checking whether the inputs to SAC are C++ `FunctionalTensorWrapper`s.
The previous land broke some inductor trunk tests.
This reverts commit 629a628cc8.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109906
Approved by: https://github.com/ezyang
Now that FunctionalTensor and `FunctionalTensorMode` are lower down in this stack, the changes in this PR are more mechanical: Everywhere in AOTAutograd that I used to use the C++ functionalization API, I now use the python functionalization API.
Note that this doesn't actually cause functionalization to run underneath torch_dispatch. I'm saving that re-ordering for later in the stack.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106406
Approved by: https://github.com/ezyang
ghstack dependencies: #108654, #109662, #109632, #109023
Reland - the previous PR was reverted by internal with this error:
```
File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/buck-out/v2/gen/fbcode/363cd7e240f5d021/caffe2/torch/fb/trainer/data_modules/tests/__test_dataloader__/test_dataloader#link-tree/torch/__init__.py", line 29, in <module>
from ._utils_internal import _functionalize_sync as _sync
ImportError: cannot import name '_functionalize_sync' from 'torch._utils_internal'
```
I couldn't figure out why internal was unhappy with the import. One potential reason is that I see a build rule for *another* `_utils_internal.py` in the fb folder here ([link](https://www.internalfb.com/code/fbsource/[30ed85cd88409af98b7490be137aaa5dfd7afd01]/fbcode/caffe2/TARGETS?lines=444))
Rather than burn more time investigating, I confirmed internally that the error goes away if I move the util from `torch/_utils_internal.py` to `torch/_utils.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109518
Approved by: https://github.com/albanD
Added two new utils to help with turning python functionalization on in AOTAutograd (next PR):
(1) Updated `torch._sync()`. Previously, this API could only handle `torch.Tensor` instances that had a `FunctionalTensorWrapper` TensorImpl. It now needs to handle python `FunctionalTensor`s. In theory I can probably break BC and change this API (since it's private?), but I decided not to do that in this PR stack to minimize the chance of reverts. Instead of updating that API directly (which is in C++), I just added a python shim that first tries to unwrap the python `FunctionalTensor` if there is one, then calls the existing C++ logic.
(2) `mirror_autograd_meta` is now a standalone API that mirrors the `requires_grad` and `is_leaf` autograd metadata from one tensor to another. Previously this was hardcoded into `torch._to_functional_tensor()`. But I now need to use it in a more standalone way: later in AOTAutograd, when we unwrap and re-wrap a tensor subclass, we need to manually mirror the autograd metadata from the original to the updated version of the subclass.
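A rough sketch of the two utils described above (the `.elem` attribute and the `_cpp_sync` stand-in for the existing C++ entry point are illustrative assumptions, not the exact names landed here):

```python
import torch
from torch._subclasses.functional_tensor import FunctionalTensor

def _cpp_sync(t):
    # Stand-in (hypothetical name) for the existing C++ sync logic, which only
    # understands tensors backed by a FunctionalTensorWrapper TensorImpl.
    pass

def sync(t):
    # Python shim: unwrap a python FunctionalTensor first (if there is one),
    # then hand the inner tensor to the existing C++ logic.
    if isinstance(t, FunctionalTensor):
        t = t.elem  # assumed attribute holding the wrapped tensor
    _cpp_sync(t)

def mirror_autograd_meta(source, target):
    # Copy requires_grad / is_leaf metadata from `source` onto `target`, e.g.
    # after unwrapping and re-wrapping a tensor subclass in AOTAutograd.
    if source.requires_grad:
        if source.is_leaf:
            target.requires_grad_(True)
        else:
            # mirroring a non-leaf requires attaching a grad_fn; elided here
            ...
    return target
```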
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107917
Approved by: https://github.com/ezyang
ghstack dependencies: #106404
The way the AOT Autograd sequence_nr tracking works is that when we run the AOT export logic, the dynamo-captured forward graph is run under an fx.Interpreter, which iterates through the nodes of the forward graph while setting `current_metadata`.
Since what runs during backward doesn't correspond to any node from the forward, we fall back to the global `current_metadata`. And since this global metadata ends up being shared between runs, that leads to weirdness if we forget to reset things; e.g., depending on whether this is the first test run, the printed results will be different.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107210
Approved by: https://github.com/bdhirsh
This is a partial fix for https://github.com/pytorch/pytorch/issues/106457. In the examples with the Shampoo optimizer that I ran, these changes were enough to remove the parameter aliasing in Shampoo.
I added some new logic for detecting whether two inputs have overlapping memory in a specific case: when they're both 2D tensors with stride 1. In that case (the case for Shampoo), I compute a bunch of contiguous intervals on the two tensors and check whether any of the intervals overlap. In theory this is slow, since if our two tensors are e.g. of size (256, N), we'll need to create 256 intervals to check for overlap. This seems... probably fine, since I think we do more egregious things in the compile stack to cause slowness. Open to suggestions though!
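A hedged sketch of that interval check, assuming "stride 1" refers to the last dimension being contiguous; the helper names are illustrative, not the ones used in AOTAutograd:

```python
import torch

def rows_as_intervals(t: torch.Tensor):
    # Each row of a (rows, cols) tensor with strides (s0, 1) occupies the
    # half-open storage interval [offset + i*s0, offset + i*s0 + cols).
    assert t.dim() == 2 and t.stride(1) == 1
    offset, s0, (rows, cols) = t.storage_offset(), t.stride(0), t.shape
    return [(offset + i * s0, offset + i * s0 + cols) for i in range(rows)]

def tensors_overlap(a: torch.Tensor, b: torch.Tensor) -> bool:
    # Assumes both tensors have the same dtype; different storages can't overlap.
    if a.untyped_storage().data_ptr() != b.untyped_storage().data_ptr():
        return False
    events = [(lo, hi, 0) for lo, hi in rows_as_intervals(a)]
    events += [(lo, hi, 1) for lo, hi in rows_as_intervals(b)]
    events.sort()
    max_end = [float("-inf"), float("-inf")]  # furthest interval end seen per tensor
    for lo, hi, src in events:
        # If this interval starts before the other tensor's furthest end, they overlap.
        if lo < max_end[1 - src]:
            return True
        max_end[src] = max(max_end[src], hi)
    return False

base = torch.zeros(1024)
x = base[:512].view(16, 32)
y = base[256:768].view(16, 32)   # shares storage elements [256, 512) with x
print(tensors_overlap(x, y))     # True
```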
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106461
Approved by: https://github.com/albanD
ghstack dependencies: #106460
Fixes #102375
Sequence_nr increments in the forward pass and decrements in the backward pass. Backward ops with the same sequence_nr as a forward op represent the backward implementation for that op. The long-term goal is to make this information available to the profiler so users can observe which ops are fused by the Inductor-generated OpenAI Triton kernels.
Added a test for this feature: **test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_aot_sequence_nr**. The test case uses **aot_export_module()** to create a joint fwd/bwd fx graph, then walks all the nodes in the fx graph via fx_graph.graph.nodes. The seq_nr of each node is recorded in node.meta. During the fwd pass the seq_nr increments, and it decrements during the bwd pass. This allows the user to map forward ops to their corresponding bwd ops, which is useful for performance analysis.
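A hedged sketch of producing such a joint graph and reading the mapping back out of node.meta (the toy module and the exact meta key names, "seq_nr" and "source_fn", are assumptions):

```python
import torch
from torch._functorch.aot_autograd import aot_export_module

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 4)

    def forward(self, x):
        # trace_joint=True requires the forward to return a scalar loss
        return self.fc(x).abs().mean()

gm, signature = aot_export_module(
    M(), [torch.randn(2, 4)], trace_joint=True, output_loss_index=0
)
print("SeqNr|OrigAten|SrcFn")
for node in gm.graph.nodes:
    if node.op == "call_function":
        print(f"{node.meta.get('seq_nr')}|{node.target}|{node.meta.get('source_fn', '')}")
```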
Expected output from the test case.
```
SeqNr|OrigAten|SrcFn
0|aten.convolution.default|l__self___conv1
0|aten.add.Tensor|l__self___bn1
1|aten._native_batch_norm_legit_functional.default|l__self___bn1
2|aten.relu.default|l__self___relu1
3|aten.add.Tensor|add
4|aten.view.default|flatten
5|aten.t.default|l__self___fc1
6|aten.unsqueeze.default|l__self___fc1
7|aten.mm.default|l__self___fc1
8|aten.squeeze.dim|l__self___fc1
9|aten.add.Tensor|l__self___fc1
10|aten.sub.Tensor|l__self___loss_fn
11|aten.abs.default|l__self___loss_fn
12|aten.mean.default|l__self___loss_fn
12|aten.ones_like.default|
12|aten.expand.default|
12|aten.div.Scalar|
11|aten.sgn.default|
11|aten.mul.Tensor|
8|aten.unsqueeze.default|
7|aten.t.default|
7|aten.mm.default|
7|aten.t.default|
7|aten.t.default|
7|aten.mm.default|
6|aten.squeeze.dim|
5|aten.t.default|
4|aten.view.default|
2|aten.threshold_backward.default|
1|aten.native_batch_norm_backward.default|
0|aten.convolution_backward.default|
0|aten.add.Tensor|
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103129
Approved by: https://github.com/soulitzer
Fixes https://github.com/pytorch/pytorch/issues/102970. See the comment [here](https://github.com/pytorch/pytorch/issues/102970#issuecomment-1577223773) for details.
We normally treat "outputs that alias inputs" specially in AOTAutograd, by replaying the views at runtime, instead of baking them into the graph. For views that are part of custom autograd functions though, we can't do that view-replay, since it will clobber the backwards function that the user specified in their custom autograd.Function.
Right now in this PR, I distinguish between "aliased inputs that are normal views" vs. "aliased inputs that are views that came from an autograd.Function call" by checking the output's `.grad_fn` field, to see if it inherits from our custom CBackward function class. Then I added a new `OutputType` enum value, which we effectively treat the "normal" way (the same way that we treat ordinary, non-aliased outputs). The new enum val is mostly for debugging - so we can print it and know that our graph had custom autograd.Function aliased outputs in it.
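A minimal illustration of the distinction that check relies on (the toy Function below is illustrative):

```python
import torch
from torch.autograd.function import BackwardCFunction

class AliasingFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x[:]  # output aliases the input

    @staticmethod
    def backward(ctx, grad):
        return grad * 2  # user-specified backward that view-replay would clobber

x = torch.randn(4, requires_grad=True)
custom_out = AliasingFn.apply(x)
plain_view = x.view(2, 2)
# The custom-Function output's grad_fn inherits from BackwardCFunction, while an
# ordinary view's grad_fn (e.g. ViewBackward0) does not.
print(isinstance(custom_out.grad_fn, BackwardCFunction))  # expected: True
print(isinstance(plain_view.grad_fn, BackwardCFunction))  # expected: False
```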
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102992
Approved by: https://github.com/ezyang, https://github.com/zou3519
This branch:
1) converts the autograd tape into an FX graph
2) caches that conversion using a "shadow" graph
3) compiles and runs the generated FX graph instead of the normal autograd
What works currently:
1) Caching, capture, and initial integration
2) Backwards hooks
3) Inlining AotAutograd generated subgraphs
4) torch.compiling the generated FX graph
5) Auto-detecting dynamic shapes based on changes
Future work:
1) Larger scale testing
2) Boxed calling convention, so memory can be freed incrementally
3) Support hooks on SavedTensor
4) Additional testing by running eager autograd tests under compiled_autograd.enable()
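A hedged usage sketch of the `compiled_autograd.enable()` entry point referenced in the last item (the module path and the compiler_fn signature, which takes the generated FX GraphModule, are assumptions based on this description):

```python
import torch
from torch._dynamo import compiled_autograd

def compiler_fn(gm):
    # gm is the FX graph built from the autograd tape; compile it like any graph
    return torch.compile(gm, backend="inductor", fullgraph=True)

model = torch.nn.Linear(8, 8)
loss = model(torch.randn(4, 8)).sum()
with compiled_autograd.enable(compiler_fn):
    loss.backward()  # backward runs through the generated (and compiled) FX graph
print(model.weight.grad.shape)
```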
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103822
Approved by: https://github.com/ezyang, https://github.com/albanD
Previously, we made backwards graph compilation lazy to avoid paying
for compilation if the user didn't actually end up using the backwards
graph. This was useful in the old days when a lot of things in Inductor
didn't work and we could bypass errors this way.
However, this has a bad implication for dynamic shapes: the backwards
graph compilation can trigger extra guards, which are too late to
install in the Dynamo context if we wait until backwards is being run.
So in this PR I move us back to compiling backwards graph immediately
if we capture any SymInts for backwards.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104971
Approved by: https://github.com/Chillee
The general idea is to do a separate CUDA graph for each size. Because of cuda graph trees, these graphs will all share the same memory pool, so your memory usage will only be the worst case memory usage of the biggest dynamic size you want. This requires an extra dispatch in the cudagraphified callable. You must pay for a CUDA graph recording for every dynamic size you encounter, but this is MUCH cheaper than running the entire PT2 compile stack, so I expect you to still see benefits.
This was surprisingly easy to do.
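Conceptually, the size-keyed dispatch might look like this sketch (a toy stand-in for illustration, not the cudagraph-trees implementation; requires a CUDA device):

```python
import torch

class PerSizeCudaGraphs:
    """Record one CUDA graph per input size; all recordings share a memory pool."""
    def __init__(self, fn):
        self.fn = fn
        self.pool = torch.cuda.graph_pool_handle()   # shared pool across sizes
        self.graphs = {}                             # size -> (graph, static_in, static_out)

    def __call__(self, x):
        key = tuple(x.shape)
        if key not in self.graphs:
            static_in = x.clone()
            # warm up on a side stream, as the CUDA graphs docs recommend
            s = torch.cuda.Stream()
            s.wait_stream(torch.cuda.current_stream())
            with torch.cuda.stream(s):
                self.fn(static_in)
            torch.cuda.current_stream().wait_stream(s)
            g = torch.cuda.CUDAGraph()
            with torch.cuda.graph(g, pool=self.pool):
                static_out = self.fn(static_in)
            self.graphs[key] = (g, static_in, static_out)
        g, static_in, static_out = self.graphs[key]
        static_in.copy_(x)   # the extra dispatch: copy inputs in, replay, copy result out
        g.replay()
        return static_out.clone()

fn = PerSizeCudaGraphs(lambda t: (t * 2).relu())
for n in (8, 16, 8):     # new size -> one-time recording; repeated size -> replay
    print(fn(torch.randn(n, device="cuda")).shape)
```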
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105064
Approved by: https://github.com/voznesenskym
This prefigures a refactor that will move the backward compilation
to entirely ahead of time, so I need to extract these strides some
other way. Straight from the compiler's mouth will do it.
I can't easily get the information via the return result of `fw_compiler` without changing the calling convention, so instead I smuggle it via TracingContext. TracingContext may be None when we are compiling patterns for the joint graph pattern matcher.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105010
Approved by: https://github.com/shunting314
The one sort of tricksy thing about this PR is that `num_symints_saved_for_bw` is populated later; we compute the metadata with a forward pass, but we only know `num_symints_saved_for_bw` once we run partitioning. This seems... fine.
Also, by pushing the conditionals into the slices, I can remove the top level if...else branch, for a nice simplification.
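As a toy illustration of that simplification (not the actual AOTAutograd code):

```python
# With a possibly-zero count, the slice bounds themselves handle the empty case,
# so no top-level if/else is needed.
saved = ["t0", "t1", "s0", "s1", "s2"]      # saved tensors followed by symints
num_symints_saved_for_bw = 3                # populated later, after partitioning

# before: branch on whether any symints were saved
if num_symints_saved_for_bw > 0:
    tensors = saved[:-num_symints_saved_for_bw]
    symints = saved[-num_symints_saved_for_bw:]
else:
    tensors, symints = saved, []

# after: the conditional lives inside the slice bounds
split = len(saved) - num_symints_saved_for_bw
tensors, symints = saved[:split], saved[split:]
print(tensors, symints)   # ['t0', 't1'] ['s0', 's1', 's2']
```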
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105009
Approved by: https://github.com/albanD
This PR introduces a translation validator for dynamo guards. In summary, it verifies
whether the guards issued as Python code are sound, w.r.t the initial guards.
The main changes in this PR are:
- Create an FX graph for dynamic shapes
- Translate "the original" guards from the FX graph to Z3
- Check if the guards produced by `produce_guards` are sound w.r.t. the ones from the FX graph
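As a small illustration of the kind of check involved, using z3 directly (this is not the validator's implementation; here "sound" is taken to mean that any assignment satisfying the issued guards also satisfies the original ones):

```python
import z3

s0 = z3.Int("s0")
original = s0 > 0                     # guard recorded on the FX graph while tracing
issued = z3.And(s0 > 1, s0 % 2 == 0)  # guard produced as Python code (stronger)

# Soundness check: the negation of (issued -> original) must be unsatisfiable,
# i.e. there is no size that the issued guards accept but the original guards reject.
solver = z3.Solver()
solver.add(z3.Not(z3.Implies(issued, original)))
print("sound" if solver.check() == z3.unsat else f"unsound, counterexample: {solver.model()}")
```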
gh-stack version of the PR #101146.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102563
Approved by: https://github.com/ezyang
This PR handles inference; I will do a similar thing for training later.
Some manual testing shows this can improve inference perf by 2-3% (absolute improvement, not relative):
- convmixer: 4.285x -> 4.309x
- resnet50: 2.170x -> 2.203x
The PR is built upon freezing: without freezing, the weight input for a conv node may not be a parameter directly but rather the output of precision-converting ops, so it's much easier to implement this PR after freezing.
Commands
```
TORCHINDUCTOR_FREEZING=1 python benchmarks/dynamo/timm_models.py --backend inductor --amp --performance --only convmixer_768_32 --inference
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103642
Approved by: https://github.com/eellison
Fixes https://github.com/pytorch/pytorch/issues/103153
AOTAutograd has some logic for handling the case when we have:
* a graph output that is a view of an intermediate
* None of the other aliases of that output escape the graph, so from the perspective of the user + the autograd engine, we can pretend that the output is not a view
However, that logic would inject an `_unsafe_view()` call into the graph at trace time. This isn't wrong, but inductor will just immediately decompose `_unsafe_view()` into `view()`, and so the output tensor will continue to show up as having view metadata w.r.t. autograd.
This PR changes the `_unsafe_view()` call to happen in the runtime epilogue, instead of being part of the graph (where the compiler might do bad things to it - the compiler also shouldn't have to concern itself with autograd metadata).
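A small illustration of the property being relied on here: unlike `view()`, `_unsafe_view()` does not register its result as an autograd view of its input (the tensors below are placeholders):

```python
import torch

x = torch.randn(2, 3, requires_grad=True)
intermediate = x * 2
regular = intermediate.view(6)
unsafe = torch.ops.aten._unsafe_view.default(intermediate, [6])
# regular carries view metadata w.r.t. autograd; unsafe does not
print(regular._is_view(), unsafe._is_view())  # expected: True False
```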
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103919
Approved by: https://github.com/ezyang
Fixes #95900
Using the following repro as guide:
```python
import torch
import torch._dynamo
from torch._subclasses import fake_tensor
from torch.fx.experimental.symbolic_shapes import ShapeEnv
from torch._dynamo.output_graph import config
class Model(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(2, 2)
        self.linear2 = torch.nn.Linear(2, 2)

    def forward(self, x):
        out = self.linear(x)
        out = self.linear2(out)
        return out

fake_mode = fake_tensor.FakeTensorMode(
    allow_non_fake_inputs=False,
    allow_fallback_kernels=True,
    shape_env=ShapeEnv(
        allow_scalar_outputs=config.capture_scalar_outputs,
        allow_dynamic_output_shape_ops=config.capture_dynamic_output_shape_ops,
        frame_id=0,
    ),
)

# Fakefying input/model before calling torch._dynamo.export
with fake_mode:
    fake_x = torch.rand(5, 2, 2)
    model = Model()

# Calling torch._dynamo.export without active fake mode
graph_module, guards = torch._dynamo.export(
    model,
    fake_x,
    aten_graph=True,
    fake_mode=fake_mode,
)

graph_module.print_readable()
graph_module.graph.print_tabular()
```
Summary of changes:
* Plumb fake_mode through the `torch._dynamo.export` API. When specified, it replaces the creation of a new FakeTensorMode at InstructionTranslator on behalf of OutputGraph
* Hack `FakeTensor.__new__` to prevent a `torch.Tensor._make_subclass` call for inputs that are already fakefied by the user. This probably needs to be fixed in a nicer way. Any ideas?
* Removed a few asserts that didn't want fake tensors coming from the user script
* Added `torch._subclasses.fake_tensor.FakeTensor` to the type list in a few assert checks to allow fake inputs
The changes above allowed symbolic tracing with both static and dynamic shapes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100017
Approved by: https://github.com/ezyang
Adds a freezing pass, gated by inductor `config.freezing`, that will constant-fold parameters. This occurs post-functionalization in AOT Autograd, so that we capture the dispatched ops and allow passes to run on the functionalized graph. A few notes:
- There is an option to discard parameters, `config.freezing_discard_parameters`, which will take the current eager modules' parameters and wrap them in a Tensor subclass that errors if used.
- I needed to expose flat_params in aot_autograd in order to discard old references when we constant fold away parameters, like with amp. I also exposed `fw_metadata` to avoid constant folding mutated parameters.
- Caching parameter transformations/constant folding across different inference runs is not yet implemented.
- Checking the version_counter of constant-folded params is not yet implemented.
I'm not really sure what the actual naming should be. In JIT there was both "freezing", which was platform agnostic, and "optimize for inference", which made device-specific optimizations. We're doing the latter here, but maybe freezing is a better name.
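A hedged usage sketch of the config flags named above (the model is a placeholder, and the exact interaction with compile modes is an assumption):

```python
import torch
import torch._inductor.config as inductor_config

inductor_config.freezing = True
# inductor_config.freezing_discard_parameters = True  # optionally drop eager params

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU()).eval()
compiled = torch.compile(model)
with torch.no_grad():
    out = compiled(torch.randn(2, 8))  # parameters get constant-folded at compile time
print(out.shape)
```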
Differential Revision: [D46244033](https://our.internmc.facebook.com/intern/diff/D46244033)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100652
Approved by: https://github.com/jansel
Previously, cudagraphs and dynamic_shapes were incompatible and enabling
dynamic shapes would forcibly disable cudagraphs. I think this new strategy
is better. The idea is essentially that cudagraphs is an
"optimization" that happens to guard on every input. When cudagraphs
is on, we force everything static, and this automatically does the right
thing because we will force a recompile if sizes change.
This obsoletes https://github.com/pytorch/pytorch/pull/101813
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103290
Approved by: https://github.com/voznesenskym, https://github.com/eellison
We previously compared the FakeTensor's strides with the real tensor's strides. This caused dynamic dimensions of the FakeTensor to be specialized to static ints, which may cause a graph specialized for one shape to be used for another shape, which is wrong.
Use stride hints for the comparison instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103342
Approved by: https://github.com/malfet
There are some I can't easily switch due to reasons like:
- Dynamo modelling the guard
- BC concerns (for torch.autograd.set_multithreading_enabled)
Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102642
Approved by: https://github.com/albanD
Key change - seed, offset are the last 2 args in both the fwd and bwd graphs
Reason - The cudagraphs implementation in inductor currently relies on very simple ordering guarantees, i.e., that the first n inputs are static for both fwd and bwd graphs. In the current implementation of functionalization of rng ops, this assumption is broken because the first 2 inputs are seed and offset.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102344
Approved by: https://github.com/eellison
The main addition in this PR is two new APIs in AOTAutograd.
**APIs**
`aot_export_module`: Given a module, exports it into a functionalized FX graph. Returns an `fx.GraphModule`, `GraphSignature` pair. The `GraphSignature` tells you various information about the graph, such as which graph inputs correspond to module params/buffers (and their fqn's), how to pytree-ify the inputs and the outputs of the graph. If you specify `trace_joint=True`, then you'll get back a joint forward-backward graph, that also returns parameter gradients in addition to the user outputs.
There are several restrictions on this API, detailed in the comments. The most notable one is probably that this API does not handle partial graphs: if you want a backward graph, then your module's forward function is **required** to return a scalar loss that we can backprop through. It also does not support capturing the optimizer step.
I (gratefully) used @SherlockNoMad and @suo's internal version of the `GraphSignature` object for this API, with a few minor changes in order to integrate it into AOTAutograd.
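A hedged sketch of calling `aot_export_module` for an inference graph; the `GraphSignature` field names printed below are assumptions:

```python
import torch
from torch._functorch.aot_autograd import aot_export_module

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 4)

    def forward(self, x):
        return self.fc(x)

gm, sig = aot_export_module(M(), [torch.randn(2, 4)], trace_joint=False)
gm.print_readable()
print(sig.parameters)            # fqns of the lifted parameters (assumed field name)
print(sig.inputs_to_parameters)  # graph input name -> parameter fqn (assumed field name)
```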
`aot_export_joint_simple`: Given a function, we'll trace it into a joint forward-backward graph and return it. Unlike the above API, the function is **not** required to return a scalar loss. However, this API makes the guarantee that you **do not** need to make any calling convention changes between the original function and the exported one, provided that you do the following:
* If you pass `trace_joint=False`, no work is needed: we'll export a functionalized forward graph with the same set of inputs as the original function
* If you pass `trace_joint=True`, then you will need to manually use the `default_partitioner` or `min_cut_partitioner` from functorch. If you do, and get back a fw and bw graph, then the forward graph will be runnable identically to the original user function.
The main use case for this API is higher order ops: a higher order op like `torch.cond()` can implement its derivative formula by using this API to export a joint graph (for both the true subgraph and the false subgraph), partition it into a fw/bw graph, and run cond on the `true_bw`, `false_bw` subgraphs. cc @zou3519 @Chillee
**Implementation Strategy**
A lot of the work in this PR went into trying to find a reasonable way to re-use existing AOTAutograd components to expose these APIs. Concretely:
* The two new APIs are both thin wrappers around `_aot_export_function`: this is a general purpose export API that just re-uses `create_aot_dispatcher_function`. If we want to add e.g. an export API that includes the optimizer step in the future, we could probably implement it using `_aot_export_function`.
* `aot_export_module` works extra hard to re-use as much of AOTAutograd as possible. For example, when tracing an inference graph, I perform the export under `torch.no_grad()` to make sure we don't accidentally trace out a backwards graph. When exporting a joint graph, I manually `.detach()` all user outputs except the loss, to make sure that we don't accidentally compute gradients for any other user outputs (even if the user forgot to manually detach them).
* A large portion of `aot_export_module` comes from parsing out and creating a `GraphSignature` object. We discussed a few weeks ago that there's potentially a lot more information that we could stuff into this object (see [doc](https://docs.google.com/document/d/1_qzdKew5D1J2Q2GkZ1v5jsczSsIU-Sr0AJiPW7DdGjE/edit?usp=sharing)). For now, I ended up deciding to support the more limited use case of exporting a fwd-bwd full graph, without some of the extra annotations in that doc (for example, if we were to export partial graphs, we would need annotations for saved activations). My thought is that once a more concrete use case comes up that the existing API doesn't satisfy, we can revisit the annotations then.
* I factored out `create_functional_call()` and `create_tree_flattened_fn()` for pytree-flattening and lifting-params-and-buffers, since I also need them in the export code
* I added an `AOTConfig.is_export` flag. The export API re-uses all of the same code paths as the rest of AOTAutograd, but there are a few points where we need to either exit early (and avoid making a runtime epilogue), or add extra error checking, that is only valuable for export.
* `aot_dispatch_autograd()` now exits early if it's being called in an export context, so it returns the full graph instead of also trying to create an `autograd.Function`. I think we probably want to factor this out, although I figured it would be safer to wait a bit for clarity on how functional RNG works with export.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100587
Approved by: https://github.com/ezyang, https://github.com/SherlockNoMad
Fixes https://github.com/pytorch/pytorch/issues/100348; see the discussion in the issue for details. The problem was that for code like this:
```
def f(x):
out = ...
return out, out.detach()
```
The `.detach()` would turn into an `.alias()` call, and inductor turns `.alias()` calls into no-ops. Inductor would effectively see that the two graph outputs have the same metadata, and return `out, out`. cc @ngimel - alternatively, we could have inductor try to detect when it's not ok to make `.alias()` a no-op, but that would probably require some custom logic instead of making `.alias()` a decomposition.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100430
Approved by: https://github.com/ngimel
Previously, anomaly detection was only enabled on the inner forward function, and not on the overall joint function that calls backward. I believe this impeded us from printing "this is the forward that triggered the backward" because that printing only happens if anomaly mode is enabled when you run backward(). This PR fixes it.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101047
Approved by: https://github.com/albanD, https://github.com/bdhirsh